Z.AI releases GLM-5.2 with 1M context window on Hugging Face

Z.AI, the Beijing-based AI company formerly known as Zhipu AI, just dropped GLM-5.2, a large language model with a 1 million-token context window and a focus on coding tasks. The model weights are live on Hugging Face under an MIT license, which means developers can download, modify, and deploy them, including commercially, with few restrictions.

Here’s the thing about that context window: GLM-5.2’s predecessor, GLM-5.1, topped out at 200,000 tokens. This new version handles five times that amount, putting it in rare company among open-weight models that can actually process book-length inputs in a single pass.

What’s under the hood

GLM-5.2 runs on a mixture of experts (MoE) architecture. The full model contains roughly 744 to 753 billion parameters, but only about 40 billion are active for any given token, which keeps the compute per token far below what the total parameter count would suggest.

The Hugging Face release includes an FP8 variant, which stores weights in an 8-bit floating-point format and so further lowers the memory and compute needed to run the model.
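To put those figures in perspective, here is a rough back-of-envelope sketch in Python. It uses the approximate parameter counts quoted above and assumes the non-FP8 baseline is a 16-bit format (BF16), which the release notes cited here do not confirm. It counts weight storage only and ignores the KV cache, activations, and any quantization scaling factors, so real memory needs would be higher.

```python
# Back-of-envelope estimate only. Parameter counts are the article's
# approximate figures; the BF16 baseline is an assumption for comparison.

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}  # standard storage widths per weight


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of parameters used per token in a mixture-of-experts model."""
    return active_params / total_params


if __name__ == "__main__":
    for total in (744e9, 753e9):
        for fmt in ("bf16", "fp8"):
            gb = weight_memory_gb(total, fmt)
            print(f"{total / 1e9:.0f}B params in {fmt}: about {gb:,.0f} GB")
        share = active_fraction(40e9, total)
        print(f"active share per token: {share:.1%}")
```

Under these assumptions, FP8 halves weight storage relative to a 16-bit format, and only around 5 percent of the parameters participate in any single token's forward pass.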

Z.AI has positioned GLM-5.2 explicitly as a coding and engineering tool rather than a general-purpose chatbot. The model is optimized for project-level engineering workflows and what the company calls “long-horizon agentic tasks,” meaning multi-step processes where the AI needs to maintain context across extended interactions.

Third release in four months

GLM-5.2 marks the third major release in the GLM-5 series since the original GLM-5 launched roughly four months ago.

Notably, Z.AI did not publish performance benchmarks alongside this launch.

All GLM Coding Plan subscribers can access the model through Z.AI’s platform and through compatible tools like Claude Code and OpenClaw. The subscription tiers, Lite, Pro, Max, and Team, start at $18 per month. For direct API access, Z.AI has set pricing at $1.40 per million input tokens and $4.40 per million output tokens.
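For a sense of what the API rates mean per request, the following Python sketch multiplies token counts by the listed prices. The function name and the example token counts are illustrative, and the sketch assumes flat per-token billing with no caching discounts or tiered rates, which this article does not cover.

```python
# Illustrative cost arithmetic using the API prices quoted in the article.
# Assumes flat per-token billing; any discounts or tiers are not modeled.

INPUT_USD_PER_MILLION = 1.40
OUTPUT_USD_PER_MILLION = 4.40


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in US dollars for a single API request."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: a full 1M-token prompt with a 4,000-token reply.
    print(f"${request_cost_usd(1_000_000, 4_000):.4f}")
```

At these rates, a prompt that fills the entire 1 million-token window costs about $1.40 in input tokens before any output is generated.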

What this means for developers and the broader market

The open-source coding model market has gotten crowded fast. Meta’s Llama series, Alibaba’s Qwen family, DeepSeek’s models, and Mistral’s offerings all compete for developer attention.

The 1 million-token context window is the headline feature for a reason. Most coding tasks that involve entire repositories, documentation sets, or multi-file refactoring jobs demand the ability to hold large amounts of context simultaneously. A model that can ingest an entire codebase in one pass, rather than requiring chunking strategies, removes a significant friction point for developers building AI-powered coding assistants.
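Whether a given codebase actually fits in one pass depends on its token count. The Python sketch below gives a rough estimate. It assumes about four characters per token, a common rule of thumb rather than a property of GLM-5.2's tokenizer, and the helper names, file suffixes, and output reserve are all hypothetical choices for illustration.

```python
from pathlib import Path

CONTEXT_LIMIT_TOKENS = 1_000_000  # context window reported in the article
CHARS_PER_TOKEN = 4               # assumed rule of thumb, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token count: character length divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def repo_token_estimate(root: str, suffixes=(".py", ".md")) -> int:
    """Sum rough token estimates over files under root with matching suffixes."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total


def fits_in_context(token_count: int, reserved_for_output: int = 8_000) -> bool:
    """True if the prompt plus a reserve for the reply fits in the window."""
    return token_count + reserved_for_output <= CONTEXT_LIMIT_TOKENS


if __name__ == "__main__":
    tokens = repo_token_estimate(".")
    print(f"about {tokens:,} tokens; fits in one pass: {fits_in_context(tokens)}")
```

A repository that fails this check would still need chunking or retrieval, and an accurate count would require the model's own tokenizer.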

The absence of benchmarks is a legitimate concern. Without published results from Z.AI or third-party evaluations, it’s difficult to assess how GLM-5.2 stacks up against models like DeepSeek-V3 or GPT-4.1 on specific coding tasks.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.
