In 2025, entrepreneurs will unleash a flood of AI-powered apps. Finally, generative AI will deliver on the hype with a new crop of affordable consumer and business apps. This is not the consensus view today. OpenAI, Google, and xAI are locked in an arms race to train the most powerful large language model (LLM) in pursuit of artificial general intelligence, known as AGI, and their gladiatorial battle dominates the mindshare and revenue share of the fledgling Gen AI ecosystem.
For example, Elon Musk raised $6 billion to launch the newcomer xAI and bought 100,000 Nvidia H100 GPUs, the costly chips used to process AI, costing north of $3 billion to train its model, Grok. At those prices, only techno-tycoons can afford to build these giant LLMs.
The incredible spending by companies such as OpenAI, Google, and xAI has created a lopsided ecosystem that’s bottom heavy and top light. The LLMs trained by these huge GPU farms are usually also very expensive for inference, the process of entering a prompt and generating a response from large language models that is embedded in every app using AI. It’s as if everyone had 5G smartphones, but using data was too expensive for anyone to watch a TikTok video or surf social media. As a result, excellent LLMs with high inference costs have made it unaffordable to proliferate killer apps.
This lopsided ecosystem of ultra-rich tech moguls battling each other has enriched Nvidia while forcing application developers into a catch-22 of either using a low-cost and low-performance model bound to disappoint users, or face paying exorbitant inference costs and risk going bankrupt.
In 2025, a new approach will emerge that can change all that. This will return to what we’ve learned from previous technology revolutions, such as the PC era of Intel and Windows or the mobile era of Qualcomm and Android, where Moore’s law improved PCs and apps, and lower bandwidth cost improved mobile phones and apps year after year.
But what about the high inference cost? A new law for AI inference is just around the corner. The cost of inference has fallen by a factor of 10 per year, pushed down by new AI algorithms, inference technologies, and better chips at lower prices.
As a reference point, if a third-party developer used OpenAI’s top-of-the-line models to build AI search, in May 2023 the cost would be about $10 per query, while Google’s non-Gen-AI search costs $0.01, a 1,000x difference. But by May 2024, the price of OpenAI’s top model came down to about $1 per query. At this unprecedented 10x-per-year price drop, application developers will be able to use ever higher-quality and lower-cost models, leading to a proliferation of AI apps in the next two years.
Source