xAI launches Voice Agent Builder in beta with aggressive per-minute pricing
The no-code platform lets users spin up production-ready voice agents powered by Grok Voice for $0.05 per minute, undercutting established competitors.
Elon Musk’s AI company just made it significantly cheaper to build a voice agent. xAI launched the beta version of its Voice Agent Builder on July 1, offering a no-code platform where users can create custom voice agents powered by Grok Voice in under two minutes.
The pricing is straightforward: $0.05 per minute of audio for agents, plus an additional $0.01 per minute for telephony on provisioned phone numbers.
What the Voice Agent Builder actually does
The platform supports over 25 languages and more than 80 distinct voices, and xAI claims sub-second latency in production VoIP environments.
xAI is positioning this for production deployments across customer support, sales, and other business-critical functions. The platform includes integration with various external tools, provisioned phone numbers at no extra charge, and direct SIP connectivity for enterprises that want to plug voice agents into existing telephony infrastructure.
The Builder sits on top of xAI’s broader voice technology stack, which includes the Grok Voice Agent API introduced earlier in 2026. That API enables real-time speech interactions over WebSocket connections. The company also rolled out custom voice cloning, which lets users create a voice clone from a one-minute audio recording.
How the pricing stacks up
At $0.05 per minute for agent audio plus $0.01 per minute for telephony, xAI is pitching itself as the budget option in a market where competitors charge meaningfully more. Its rates are reportedly lower than those of established players like ElevenLabs and Vapi.
A ten-minute customer support call over a provisioned number would cost roughly $0.60 total through xAI’s platform: $0.50 for agent audio plus $0.10 for telephony. The phone number itself carries no extra charge.
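For readers who want to run their own numbers, the arithmetic behind that estimate can be sketched in a few lines of Python. This is an illustrative calculation, not anything published by xAI: the function name call_cost is made up for this example, and it assumes billing scales linearly with call length, with no minimum charge or rounding to whole minutes, which the announcement does not specify.

```python
from decimal import Decimal

# Rates quoted in the announcement, in US dollars per minute.
AGENT_RATE_PER_MIN = Decimal("0.05")      # Grok Voice agent audio
TELEPHONY_RATE_PER_MIN = Decimal("0.01")  # calls over a provisioned phone number


def call_cost(minutes, uses_telephony=True):
    """Estimate the cost of one call in dollars.

    Assumption: strictly linear per-minute billing, with no minimum
    charge and no rounding up to whole minutes.
    """
    rate = AGENT_RATE_PER_MIN
    if uses_telephony:
        rate += TELEPHONY_RATE_PER_MIN
    return Decimal(str(minutes)) * rate


if __name__ == "__main__":
    print(call_cost(10))         # ten-minute phone call: 0.60
    print(call_cost(10, False))  # same call without telephony: 0.50
```

Decimal is used instead of floating point so that cent-level amounts stay exact when they are summed across many calls.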
What this means for developers and businesses
The no-code angle dramatically expands the potential user base. Previously, building production-grade voice agents required developers comfortable with WebSocket APIs, audio streaming protocols, and telephony integration.
The custom voice cloning capability lets a company create a consistent brand voice across all its automated interactions from one minute of audio input.