PixelRAG outperforms text parsers, cuts AI agent token costs by up to 10x
A new framework from UC Berkeley, Princeton, EPFL, and Databricks skips text parsing entirely, treating web pages as screenshots to dramatically improve retrieval accuracy
Every enterprise AI pipeline has the same dirty secret: the first step is usually the worst. When a retrieval-augmented generation system needs to pull knowledge from a document or web page, it starts by converting that content into plain text. Tables get flattened. Layouts get destroyed. Visual context vanishes. And according to new research, that single conversion step is responsible for the majority of wrong answers these systems produce.
A team from UC Berkeley, Princeton University, EPFL, and Databricks has a fix. Their new framework, called PixelRAG, skips text parsing entirely. Instead, it renders pages as screenshots, indexes those images, and feeds the retrieved visual tiles directly to a vision-language model. Tested against 30 million screenshot tiles covering all of Wikipedia, the system improves accuracy by up to 18.1% over traditional text-based RAG across six benchmarks.
How PixelRAG actually works
Rather than parsing HTML or PDFs into raw strings, PixelRAG renders source material into screenshot tiles. Those tiles are then embedded into a visual index that preserves the original layout, tabular structures, and design signals that text parsers strip away. When a query comes in, the system retrieves the most relevant tiles and passes them to a vision-language model reader that can interpret both the visual and textual information simultaneously.
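To make the tile, index, and retrieve flow concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not PixelRAG's actual code: the names `Tile`, `tile_page`, and `TileIndex` are hypothetical, pages are assumed to be cut into fixed-height vertical strips, and the hand-written two-dimensional vectors stand in for the embeddings a real visual encoder would produce. Rendering the page and calling the vision-language model reader are left out.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Tile:
    """One screenshot tile: a horizontal strip of a rendered page (hypothetical layout)."""
    page_url: str
    top: int      # pixel offset of the strip's top edge within the rendered page
    height: int   # strip height in pixels


def tile_page(page_url: str, page_height: int, tile_height: int) -> List[Tile]:
    """Cut a rendered page of `page_height` pixels into fixed-height strips.

    Assumption: simple vertical tiling; the last strip may be shorter.
    """
    return [
        Tile(page_url, top, min(tile_height, page_height - top))
        for top in range(0, page_height, tile_height)
    ]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TileIndex:
    """Brute-force visual index: stores (tile, embedding) pairs in memory."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Tile, List[float]]] = []

    def add(self, tile: Tile, embedding: Sequence[float]) -> None:
        self._entries.append((tile, list(embedding)))

    def search(self, query_embedding: Sequence[float], k: int) -> List[Tile]:
        """Return the k tiles whose embeddings are most similar to the query."""
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine(query_embedding, entry[1]),
            reverse=True,
        )
        return [tile for tile, _ in ranked[:k]]


# Toy run: a 2,500-pixel page becomes three tiles.
tiles = tile_page("https://example.org/page", page_height=2500, tile_height=1000)

# Placeholder embeddings; a real system would get these from an image encoder.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

index = TileIndex()
for tile, embedding in zip(tiles, toy_embeddings):
    index.add(tile, embedding)

# The query vector is also a placeholder for an encoded user question.
retrieved = index.search([0.1, 0.9], k=2)
print([tile.top for tile in retrieved])  # prints [1000, 2000]
```

In a full pipeline, the retrieved tiles would be handed to the vision-language model as images. A corpus of tens of millions of tiles would also call for an approximate nearest-neighbor index instead of the linear scan used here.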
Across six question-answering benchmarks, PixelRAG consistently outperformed text-based RAG systems, with accuracy gains as high as 18.1%. The framework also cuts token generation costs for AI agents by up to 10x compared with legacy pipelines.
Why this matters for AI infrastructure costs
The research team also found that PixelRAG achieves higher question-answering accuracy than Google Search at a cost that is 2x to 4x lower.
Key contributors to the project include Yichuan Wang, Zirui Wang, and Matei Zaharia, the Databricks CTO who co-created Apache Spark. The framework’s code is openly available on GitHub at github.com/StarTrail-org/PixelRAG, and the accompanying paper is published on arXiv under the identifier 2506.05209.