0G trains 107B parameter decentralized model with China Mobile, a first for AI above 100 billion parameters

The DiLoCoX framework reportedly achieved 357x better communication efficiency than traditional AllReduce-based training, all over standard 1 Gbps network links.

Training a 107-billion-parameter AI model is hard enough when you have a warehouse full of cutting-edge GPUs connected by ultra-fast networking. Doing it across decentralized clusters on a standard 1 Gbps network? That’s a fundamentally different engineering challenge. 0G Labs claims to have pulled it off.

The project, completed in July 2025 in partnership with China Mobile, is described by 0G Labs as the first successful decentralized training of an AI model exceeding 100 billion parameters. The research paper detailing the methodology was published on arXiv on June 26, 2025, under the identifier arXiv:2506.21263.

How DiLoCoX actually works

The standard approach to data-parallel training relies on AllReduce, a collective operation in which all nodes exchange and combine their gradient updates at every training step. DiLoCoX instead lets clusters of NVIDIA A800 GPUs work semi-independently, synchronizing far less frequently.
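To see why synchronization frequency matters at 1 Gbps, here is a rough back-of-the-envelope sketch. It assumes 2 bytes per parameter (16-bit values) and a single idealized transfer of one full set of gradients. Neither figure comes from 0G Labs, real AllReduce traffic patterns differ, and the function names and the sync interval of 100 are hypothetical.

```python
def transfer_seconds(num_params, bytes_per_param, link_gbps):
    """Seconds to push one full set of values over a link (idealized, no overhead)."""
    bits = num_params * bytes_per_param * 8
    return bits / (link_gbps * 1e9)


def total_comm_seconds(total_steps, sync_every, per_sync_seconds):
    """Communication time when workers synchronize once every `sync_every` steps."""
    return (total_steps // sync_every) * per_sync_seconds


# Assumption: 16-bit values, i.e. 2 bytes per parameter.
per_sync = transfer_seconds(107_000_000_000, 2, 1.0)
print(per_sync / 60)  # minutes for one full exchange at 1 Gbps

every_step = total_comm_seconds(1000, 1, per_sync)
every_100 = total_comm_seconds(1000, 100, per_sync)  # hypothetical interval
print(every_step / every_100)  # reduction factor from syncing less often
```

Under these assumptions a single full exchange takes roughly half an hour, which is why per-step synchronization is impractical on such links. Syncing less often cuts traffic in proportion to the interval, before any compression or overlap is applied.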

The framework combines four techniques to make this possible. Pipeline parallelism splits the model into consecutive stages that are placed on different devices. A dual optimizer policy uses different optimization strategies for local and global training steps. One-step-delay overlap allows computation to continue while synchronization happens in the background. And adaptive gradient compression shrinks the data that needs to travel between clusters.
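The loop below is a toy sketch of how three of these ideas could fit together, not 0G Labs' implementation. It assumes plain SGD as the local optimizer, SGD with momentum as the global one, and top-k sparsification as a stand-in for the adaptive compression scheme. The article does not specify any of these, and all function names and hyperparameters are hypothetical. Pipeline parallelism is omitted, and the "workers" are simulated in a single process.

```python
def top_k_compress(vec, k):
    """Keep the k largest-magnitude entries and zero the rest (stand-in compressor)."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


def local_steps(params, target, steps, lr):
    """Local optimizer: plain SGD on the toy loss 0.5 * ||params - target||^2."""
    p = list(params)
    for _ in range(steps):
        p = [x - lr * (x - t) for x, t in zip(p, target)]
    return p


def train(targets, dim, rounds, inner_steps=10, inner_lr=0.1,
          outer_lr=0.5, momentum=0.3, k=None):
    """Simulate one worker per target; k=None disables compression."""
    k = dim if k is None else k
    global_params = [0.0] * dim
    velocity = [0.0] * dim
    in_flight = None  # averaged update from the previous round, still "in transit"
    for _ in range(rounds):
        # Each worker trains locally, starting from the current global parameters.
        deltas = []
        for target in targets:
            local = local_steps(global_params, target, inner_steps, inner_lr)
            pseudo_grad = [g - l for g, l in zip(global_params, local)]
            deltas.append(top_k_compress(pseudo_grad, k))
        averaged = [sum(col) / len(deltas) for col in zip(*deltas)]
        # One-step delay: apply the update computed in the previous round, as if
        # its transfer had overlapped with this round's local computation.
        if in_flight is not None:
            # Global optimizer: SGD with momentum on the averaged update.
            velocity = [momentum * v + d for v, d in zip(velocity, in_flight)]
            global_params = [p - outer_lr * v
                             for p, v in zip(global_params, velocity)]
        in_flight = averaged
    return global_params


# Two simulated workers whose data pull toward different targets.
result = train([[1.0, 3.0], [3.0, 5.0]], dim=2, rounds=200)
print(result)  # approaches the mean of the two targets
```

Setting `k` below `dim` turns on the sparsifier. Practical compression schemes usually add safeguards such as error feedback, which this sketch leaves out.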

The result, according to 0G Labs, is a 357x improvement in communication efficiency compared to traditional AllReduce methods, without sacrificing model convergence.

Why China Mobile matters here

China Mobile is the world’s largest mobile network operator. Its involvement signals something broader than a one-off research collaboration, because telecom companies sit on vast distributed infrastructure, including cell towers, edge data centers, and network backbone. If decentralized AI training can genuinely work over standard-bandwidth links, telecom providers could host distributed training networks without specialized high-bandwidth interconnects.

0G Labs CEO Michael Heinrich framed the achievement in democratization terms:

“DiLoCoX marks a pivotal step in democratizing LLM training.”

What this means for the decentralized AI landscape

DiLoCoX directly challenges the concentration of large-model training in tightly interconnected data centers. According to 0G Labs, clusters of A800 GPUs, the export-compliant version of NVIDIA’s A100 available in China, were coordinated across geographically distributed locations over 1 Gbps links to train a 100B+ parameter model.

In March 2026, 0G Labs announced plans to publicly retrain the model with full transparency and committed to open-sourcing its technologies. That would allow independent verification of the efficiency claims and enable other teams to build on the methodology.

The key risk is reproducibility. A 357x efficiency improvement is an extraordinary claim, and independent teams will need to validate the results before the market prices in a paradigm shift. The arXiv paper provides a starting point for that scrutiny, and the planned open-source release will determine whether DiLoCoX becomes a building block for the broader ecosystem or remains an impressive but isolated demonstration.

Disclosure: This article was edited by the Editorial Team. For more information on how we create and review content, see our Editorial Policy.
