OpenAI’s Mark Chen says AI models are approaching the point of generating their own innovations

The chief research officer outlined how pre-training and chain-of-thought reasoning are pushing models toward genuine autonomy in long-horizon tasks.

Mark Chen, OpenAI’s Chief Research Officer, laid out a vision of AI models that don’t just follow instructions but actively develop their own innovations. In a June 2026 interview for Latent Space, Chen described a trajectory where pre-training remains the foundational layer powering increasingly autonomous systems.

Pre-training as the engine of autonomy

Chen’s argument centers on a deceptively simple idea. The better a model’s pre-training, the more capable it becomes at handling tasks that stretch over long time horizons without constant human guidance.

The knowledge a model absorbs during pre-training is what allows it to undertake what Chen described as “long-horizon tasks,” meaning projects that require sustained focus, planning, and adaptation over extended periods rather than quick one-shot answers.

Chen pointed to OpenAI’s o1 series of reasoning models, first introduced in 2024, as a pivotal milestone in this evolution. Those models represented an early step toward machines that could genuinely reason through multi-step problems rather than pattern-match their way to answers.

Chain-of-thought reasoning, where a model works through problems step by step rather than jumping to conclusions, is being refined to the point where models can autonomously generate elaborate research agendas and manage complex, extended projects.

From reasoning to agency

Chen has been at the center of several of OpenAI’s most significant projects. He directed both DALL-E and Codex, and contributed to the reasoning model work that produced the o1 series.

The interview also touched on multimodal systems, where models don’t just process text but interpret and generate images as well. This builds on advancements seen with GPT-4’s vision integration, where models began engaging with visual information alongside language.

Discussions around GPT-5, which emerged through 2025, have centered on merging faster response times with deeper reasoning capabilities while extending these multimodal functions.

The compute question and what investors should watch

Chen’s remarks repeatedly circled back to compute demand and the persistent need for better evaluation benchmarks, particularly for long-context and long-horizon tasks.

For crypto-adjacent investors hoping for a direct connection between AI breakthroughs and token prices, the interview offered little to work with. Chen made no mention of cryptocurrency tokens, protocols, or digital assets. The conversation stayed squarely within the lane of AI research and its infrastructure requirements.

The evaluation benchmark problem is worth watching too. Chen flagged that existing benchmarks struggle to measure model performance on long-context and long-horizon tasks. As new evaluation frameworks emerge, they’ll likely reshape how the industry assesses which models are actually leading the field versus which are simply performing well on outdated tests.

Disclosure: This article was edited by the Editorial Team. For more information on how we create and review content, see our Editorial Policy.
