Mistral AI launches OCR 4 with 72% win rate in blind tests and support for 170 languages

The French AI company's latest document intelligence model undercuts competitors on price while topping accuracy benchmarks

Mistral AI just dropped the fourth generation of its optical character recognition model, and the numbers suggest the French AI lab is quietly building one of the most capable document processing tools on the market.

Mistral OCR 4, unveiled on June 23, scored a 72% average win rate in blind human preference evaluations against competing OCR systems. The tests covered more than 600 real-world documents in more than 12 languages.
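Mistral's announcement, as summarized here, does not spell out how the "average win rate" is computed. One common reading is sketched below, under stated assumptions: each human judgment is scored from the evaluated model's point of view, a tie counts as half a win, and the per-competitor win rates are averaged with equal weight. The competitor names and data format are hypothetical.

```python
from collections import defaultdict

# Score each human judgment from the evaluated model's point of view.
# Assumption: a tie counts as half a win (the article does not say).
OUTCOME_SCORE = {"win": 1.0, "tie": 0.5, "loss": 0.0}


def average_win_rate(judgments):
    """Macro-average of per-competitor win rates.

    `judgments` is an iterable of (competitor_name, outcome) pairs, where
    outcome is "win", "tie", or "loss" for the model under test.
    Assumption: each competitor is weighted equally, regardless of how
    many comparisons it received. Raises ZeroDivisionError if empty.
    """
    totals = defaultdict(lambda: [0.0, 0])  # competitor -> [score_sum, count]
    for competitor, outcome in judgments:
        totals[competitor][0] += OUTCOME_SCORE[outcome]
        totals[competitor][1] += 1
    per_competitor = [score / count for score, count in totals.values()]
    return sum(per_competitor) / len(per_competitor)


# Hypothetical votes against two unnamed systems.
votes = [
    ("system_a", "win"), ("system_a", "win"), ("system_a", "loss"), ("system_a", "tie"),
    ("system_b", "win"), ("system_b", "loss"),
]
print(average_win_rate(votes))  # 0.5625
```

Weighting competitors equally (rather than pooling all votes) keeps a heavily sampled rival from dominating the headline number; a pooled average is an equally plausible reading of the same phrase.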

What makes OCR 4 different

Mistral OCR 4 topped the public OlmOCRBench leaderboard, an independent benchmark of how well models handle real-world document extraction, with a score of 85.20.

The model supports 170 languages across 10 language groups, with particular strength in rare and low-resource languages.

Beyond raw text extraction, OCR 4 introduces several features aimed at making its output actually useful for downstream applications. Paragraph-level bounding boxes tell you exactly where on a page each block of text lives. Typed block labels classify content into categories like titles, tables, and equations. Per-word and per-page confidence scores let developers programmatically flag sections that might need human review.
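To make the review workflow concrete, here is a minimal sketch of how a developer might combine block labels, bounding boxes, and confidence scores to route pages to a human. The `Page`, `Block`, and `Word` classes, their field names, and both thresholds are hypothetical stand-ins, not Mistral's actual response schema.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    text: str
    confidence: float  # hypothetical per-word score in [0.0, 1.0]


@dataclass
class Block:
    label: str  # hypothetical typed label, e.g. "title", "table", "equation"
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) on the page
    words: list[Word] = field(default_factory=list)


@dataclass
class Page:
    index: int
    confidence: float  # hypothetical per-page score in [0.0, 1.0]
    blocks: list[Block] = field(default_factory=list)


def flag_for_review(pages, page_threshold=0.90, word_threshold=0.60):
    """Return (page_index, block_label, bbox, low_confidence_words) tuples.

    A block is flagged if its whole page scored below `page_threshold`
    or if any of its words scored below `word_threshold`.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    flagged = []
    for page in pages:
        for block in page.blocks:
            weak = [w.text for w in block.words if w.confidence < word_threshold]
            if page.confidence < page_threshold or weak:
                flagged.append((page.index, block.label, block.bbox, weak))
    return flagged
```

Returning the bounding box alongside the label lets a review tool highlight exactly the region a person needs to check, instead of sending them the whole page.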

The output comes in markdown-structured text, which slots neatly into the retrieval-augmented generation (RAG) pipelines that enterprises are building to let AI agents search and reason over their internal documents.
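Markdown output is convenient for RAG because its headings give natural split points. The sketch below shows one simple, assumed chunking strategy: split at headings, keep each section's heading as context for the retriever, and cap chunk length. It is not part of Mistral's tooling, and `max_chars` is an arbitrary illustrative value.

```python
import re


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Split markdown at ATX headings (# to ######), then cap chunk size.

    Each chunk is prefixed with the heading of the section it came from.
    Limitation: lines starting with '#' inside fenced code blocks are
    also treated as headings in this simple sketch.
    """
    chunks: list[str] = []
    heading = ""
    buffer: list[str] = []

    def flush() -> None:
        body = "\n".join(buffer).strip()
        buffer.clear()
        if not body:
            return
        # Long sections are cut into fixed-size pieces, each keeping the heading.
        for start in range(0, len(body), max_chars):
            piece = body[start:start + max_chars]
            chunks.append(f"{heading}\n{piece}" if heading else piece)

    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s", line):
            flush()  # close the previous section before switching headings
            heading = line.strip()
        else:
            buffer.append(line)
    flush()
    return chunks


doc = "# Invoice\nNumber 42\n\n## Line items\n| Item | Qty |\n|---|---|\n| Widget | 3 |"
for chunk in chunk_markdown(doc):
    print(chunk, end="\n---\n")
```

Splitting on headings first keeps a markdown table together with the section title that explains it, which tends to help retrieval more than blind fixed-size windows.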

Pricing and deployment

Mistral set API pricing at $4 per 1,000 pages for standard processing and $2 per 1,000 pages for batch jobs.
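At those rates a page costs $0.004 on the standard tier and $0.002 in batch, so 250,000 pages come to $1,000 or $500 respectively. A small estimator, assuming simple linear per-page billing with no minimums, rounding rules, or volume tiers (none of which the article mentions):

```python
# Prices quoted in the article, in US dollars per 1,000 pages.
STANDARD_USD_PER_1K_PAGES = 4.00
BATCH_USD_PER_1K_PAGES = 2.00


def estimate_cost_usd(pages: int, batch: bool = False) -> float:
    """Estimate API cost, assuming flat linear per-page billing.

    Assumption: no minimum charges, rounding, free quotas, or volume
    discounts; the article quotes only these two flat rates.
    """
    rate = BATCH_USD_PER_1K_PAGES if batch else STANDARD_USD_PER_1K_PAGES
    return pages / 1000 * rate


monthly_pages = 250_000
print(estimate_cost_usd(monthly_pages))              # 1000.0
print(estimate_cost_usd(monthly_pages, batch=True))  # 500.0
```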

The model is optimized for single-container deployment, which matters for enterprises with strict data sovereignty requirements. Rather than routing sensitive documents through a third-party cloud API, companies can run OCR 4 on-premises or in sovereign cloud environments.

Early user feedback has highlighted lower latency than established competitors when processing structured documents.

The rapid iteration story

Mistral’s pace of development in this space tells its own story. The original Mistral OCR launched in March 2025. OCR 3 followed in December 2025, reportedly achieving a 74% win rate over its predecessor. Now OCR 4 arrives roughly six months later.

What this means for the market

The document processing market has historically been dominated by legacy players who built their businesses on on-premises scanning solutions and enterprise licensing agreements. Companies like ABBYY and Kofax have owned this space for years. More recently, cloud giants including Google, Amazon, and Microsoft have rolled out their own document AI services.

Mistral’s combination of competitive accuracy, aggressive pricing, and flexible deployment options positions OCR 4 as a credible alternative to all of them. The 72% win rate in blind human evaluations is the kind of metric that procurement teams can point to when justifying a vendor switch.

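The two headline numbers are not directly comparable: OCR 3's 74% win rate was measured against its own predecessor, while OCR 4's 72% was measured against the broader competitive field. Holding a similar margin against a wider set of rivals suggests the gains are real, even as the comparisons get harder.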

Disclosure: This article was edited by the Editorial Team. For more information on how we create and review content, see our Editorial Policy.