PixelRAG outperforms text parsers, reduces AI agent token costs by 10x

A new framework from UC Berkeley, Princeton, EPFL, and Databricks skips text parsing entirely, treating web pages as screenshots to dramatically improve retrieval accuracy

Every enterprise AI pipeline has the same dirty secret: the first step is usually the worst. When a retrieval-augmented generation system needs to pull knowledge from a document or web page, it starts by converting that content into plain text. Tables get flattened. Layouts get destroyed. Visual context vanishes. And according to new research, that single conversion step is responsible for the majority of wrong answers these systems produce.

A team from UC Berkeley, Princeton University, EPFL, and Databricks has a fix. Their new framework, called PixelRAG, skips text parsing entirely. Instead, it renders pages as screenshots, indexes those images, and feeds the retrieved visual tiles directly to a vision-language model. Tested against 30 million screenshot tiles covering all of Wikipedia, the system improves accuracy by up to 18.1% over traditional text-based RAG across six benchmarks.

How PixelRAG actually works

Rather than parsing HTML or PDFs into raw strings, PixelRAG renders source material into screenshot tiles. Those tiles are then embedded into a visual index that preserves the original layout, tabular structures, and design signals that text parsers strip away. When a query comes in, the system retrieves the most relevant tiles and passes them to a vision-language model reader that can interpret both the visual and textual information simultaneously.
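The retrieval flow described above can be illustrated with a small sketch. This is not PixelRAG's actual code: the names `Tile`, `split_into_tiles`, and `retrieve`, the 1024-pixel tile height, and the three-dimensional vectors are all assumptions made for illustration. A real system would take its embeddings from a visual embedding model and hand the retrieved tiles to a vision-language model; here hand-written toy vectors stand in for both.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """One screenshot tile cut from a rendered page (hypothetical structure)."""
    page_id: str
    top: int      # vertical offset of the tile within the page, in pixels
    height: int   # tile height in pixels


def split_into_tiles(page_id, page_height, tile_height):
    """Cut a rendered page of `page_height` pixels into vertical tiles."""
    tiles = []
    for top in range(0, page_height, tile_height):
        # The last tile may be shorter than tile_height.
        tiles.append(Tile(page_id, top, min(tile_height, page_height - top)))
    return tiles


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, index, k):
    """Return the k tiles whose embeddings are most similar to the query.

    `index` is a list of (Tile, embedding) pairs.
    """
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [tile for tile, _ in ranked[:k]]


# Toy usage: a 2500-pixel-tall page, 1024-pixel tiles (assumed sizes).
tiles = split_into_tiles("wiki/Example", 2500, 1024)

# Stand-in embeddings; a real pipeline would compute these from the tile images.
toy_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
index = list(zip(tiles, toy_embeddings))

top_tiles = retrieve([0.9, 0.1, 0.0], index, k=2)
# top_tiles would then be passed, as images, to a vision-language model reader.
```

The point of the sketch is that the unit of retrieval is an image region rather than a text chunk, so whatever layout or table structure the tile contains reaches the reader model intact.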

Across six question-answering benchmarks, PixelRAG consistently outperformed text-based RAG systems, with accuracy gains as high as 18.1%. The framework also cuts token costs for AI agents by up to 10x compared to text-parsing pipelines.

Why this matters for AI infrastructure costs

The research team also found that PixelRAG achieves higher question-answering accuracy than Google Search while maintaining costs that are 2 to 4x lower.

Key contributors to the project include Yichuan Wang, Zirui Wang, and Matei Zaharia, the Databricks CTO and co-creator of Apache Spark. The framework’s code is openly available on GitHub at github.com/StarTrail-org/PixelRAG, and the accompanying paper is published on arXiv under identifier 2506.05209.

Disclosure: This article was edited by Editorial Team. For more information on how we create and review content, see our Editorial Policy.

by Editorial Team

Jun. 12, 2026
