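Recently, in my column for The Wired World in 2025, I explored where generative AI is headed and its far-reaching impact on society, technology, and business models.
Today, computing investments that routinely run into the billions of dollars, together with expensive inference costs, are eroding generative AI’s potential for innovation. To achieve further breakthroughs, large language models (LLMs) urgently need to become lighter, more efficient, and more affordable.
I believe 2025 will be an important turning point. Powered by models that perform well yet are far more lightweight, a wave of AI-First applications will emerge and profoundly change our lives.
The full text of the column follows: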
Author: Kai-Fu Lee
CEO of 01.AI and Chairman of Sinovation Ventures
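Looking ahead to 2025, I expect a wave of AI-First applications powered by generative AI (GenAI) to launch. Generative AI will then give rise to a new generation of reasonably priced consumer and enterprise solutions, both living up to the broad expectations it has ignited across the industry and demonstrating its enormous potential value to society at large.
This view, however, is not the consensus today. Silicon Valley tech giants such as OpenAI, Google, and xAI are locked in a fierce “technology arms race,” competing to build the most powerful ultra-large models in pursuit of artificial general intelligence, known as AGI. Their tug-of-war commands the global spotlight on the generative AI ecosystem and dominates how revenue in that ecosystem is shared.
Take Elon Musk as an example. He raised $6 billion to found a new company, xAI, and bought 100,000 Nvidia H100 GPUs. These expensive chips are used to train AI, and the cost of its Grok model exceeded $3 billion. At this astonishing level of spending, only wealthy tech giants can afford to build such ultra-large language models.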
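The no-expense-spared spending by companies such as OpenAI, Google, and xAI has created a lopsided ecosystem. The large models trained on these enormous GPU clusters are usually very expensive at inference time, and that cost is ultimately passed on to every application that connects to them.
It is as if everyone owned a 5G smartphone, but data charges were prohibitively high, too expensive to watch short videos or browse social media throughout the day. So although model performance keeps improving, killer apps cannot realistically reach the mass market as long as inference costs stay high.
This unbalanced ecosystem, created by competition among the ultra-rich and the tech giants, has made Nvidia the biggest winner while trapping application developers in a dilemma: either use low-cost, low-performance models that are bound to fall short of user expectations, or face steep inference costs and risk bankruptcy to build their applications.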
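In 2025, a new approach will offer hope of breaking this impasse. Consider what we learned from earlier waves of technological revolution: Intel and Windows rose in the PC era, and Qualcomm and Android became the new leaders in the mobile era. In those eras, Moore’s Law improved the performance of PCs and applications year after year, while lower bandwidth and connectivity costs greatly improved the experience of mobile applications.
As for high inference costs, I predict the industry is about to see a revolutionary law of AI inference: thanks to optimized next-generation AI algorithms, advanced inference techniques, and more cost-effective chips, the cost of AI inference is likely to fall tenfold every year.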
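To illustrate how significant this decline is, consider a simple comparison. If a third-party developer had used OpenAI’s top model to build an AI search application, the cost per search in May 2023 would have been about $0.75, whereas a Google search without generative AI costs far less than $0.01, a gap of at least 75x.
Yet just one year later, in May 2024, the cost per query with OpenAI’s top model had fallen to about $0.04, bringing it far closer to the cost of a Google search.
A tenfold annual decline in inference cost is unprecedented. On this trajectory, application developers will soon be able to use better-performing and more affordable models, and AI-First applications will spread rapidly over the next two years.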
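I believe this will lead to a new way of building a large-model company. Rather than concentrating on the AGI “arms race,” founders will start to focus on building models that perform well but are far more lightweight, enabling extremely fast and extremely cheap inference.
These models, designed specifically for commercial use, will adopt innovative architectures and become leaner. That will sharply reduce training costs while keeping performance good enough for consumers and enterprises. This approach may not produce “a Nobel Prize-winning AI,” but such models could become the catalyst for the spread of AI-First applications and set off a virtuous cycle in the AI ecosystem.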
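For example, a startup team I incubated is building a model, an inference engine, and an application all at once. It trained a model whose performance is nearly on par with OpenAI’s top model for just $3 million. By comparison, Sam Altman has said that training OpenAI’s GPT-4 cost more than $100 million. [1]
Applied to the AI search app BeaGo, this model brings the inference cost of a single search down to just $0.001, only 3% of the cost of GPT-4. What is more, the team developed and launched this AI search app in two months with only five engineers.
How was this achieved? As far as I know, the startup team optimized the whole process of inference, model, and application development through deep vertical integration.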
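Over the course of AI’s development, we have all witnessed the power of large language models as a revolutionary technology. I firmly believe that generative AI will fundamentally change how we learn, work, and live, as well as our business models. The entire ecosystem must work together to overcome the cost barrier, adjust its strategy, and reach a balance, so that AI truly contributes to our society.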
This article was translated from the English-language column in THE WIRED WORLD IN 2025. The original text follows:
How Do You Get to Artificial General Intelligence? Think Lighter
BY KAI-FU LEE
CEO of 01.AI and Chairman of Sinovation Ventures
In 2025, entrepreneurs will unleash a flood of AI-powered apps. Finally, generative AI will deliver on the hype with a new crop of affordable consumer and business apps. This is not the consensus view today. OpenAI, Google, and xAI are locked in an arms race to train the most powerful large language model (LLM) in pursuit of artificial general intelligence, known as AGI, and their gladiatorial battle dominates the mindshare and revenue share of the fledgling GenAI ecosystem.
For example, Elon Musk raised $6 billion to launch the newcomer xAI and bought 100,000 Nvidia H100 GPUs, the costly chips used to process AI, costing north of $3 billion to train its model, Grok. At those prices, only techno-tycoons can afford to build these giant LLMs.
The incredible spending by companies such as OpenAI, Google, and xAI has created a lopsided ecosystem that’s bottom heavy and top light. The LLMs trained on these huge GPU farms are usually also very expensive at inference, the process of entering a prompt and generating a response from a large language model, which is embedded in every app using AI. It’s as if everyone had 5G smartphones, but using data was too expensive for anyone to watch a TikTok video or surf social media. As a result, excellent LLMs with high inference costs have made it unaffordable to proliferate killer apps.
This lopsided ecosystem of ultra-rich tech moguls battling each other has enriched Nvidia while forcing application developers into a catch-22 of either using a low-cost, low-performance model bound to disappoint users, or paying exorbitant inference costs and risking bankruptcy.
In 2025, a new approach will emerge that can change all that. It will return to what we’ve learned from previous technology revolutions, such as the PC era of Intel and Windows or the mobile era of Qualcomm and Android, where Moore’s Law improved PCs and apps, and lower bandwidth costs improved mobile phones and apps, year after year.
But what about the high inference cost? A new law for AI inference is just around the corner. The cost of inference has fallen by a factor of 10 per year, pushed down by new AI algorithms, inference technologies, and better chips at lower prices.
As a reference point, if a third-party developer used OpenAI’s top-of-the-line models to build AI search, in May 2023 the cost would be about $0.75 per query, while Google’s non-Gen-AI search costs well less than $0.01, a 75x difference. But by May 2024, the price of OpenAI’s top model came down to about $0.04 per query. At this unprecedented 10x-per-year price drop, application developers will be able to use ever higher-quality and lower-cost models, leading to a proliferation of AI apps in the next two years.
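The arithmetic behind these figures can be checked with a short sketch. The per-query prices below are the ones quoted in this column; the function names and the forward projection are illustrative assumptions, and the projection simply applies a flat 10x-per-year decline rather than modeling any real pricing.

```python
# Back-of-the-envelope check of the per-query prices quoted in the column.
# Function names and the projection are illustrative assumptions.

PRICE_MAY_2023 = 0.75      # USD per query, OpenAI top model (figure from the text)
PRICE_MAY_2024 = 0.04      # USD per query, OpenAI top model (figure from the text)
GOOGLE_UPPER_BOUND = 0.01  # USD per query; the text says "well less than" this


def price_ratio(higher: float, lower: float) -> float:
    """Return how many times larger `higher` is than `lower`."""
    return higher / lower


def projected_price(start: float, years: int, annual_factor: float = 10.0) -> float:
    """Extrapolate a price, assuming it falls by `annual_factor` every year."""
    return start / (annual_factor ** years)


if __name__ == "__main__":
    gap = price_ratio(PRICE_MAY_2023, GOOGLE_UPPER_BOUND)
    drop = price_ratio(PRICE_MAY_2023, PRICE_MAY_2024)
    print(f"Gap to Google search, May 2023: at least {gap:.0f}x")
    print(f"Observed drop, May 2023 to May 2024: {drop:.2f}x")
    for years in (1, 2):
        price = projected_price(PRICE_MAY_2024, years)
        print(f"Hypothetical price after {years} more year(s): ${price:.4f}")
```

On the quoted figures, the one-year drop from $0.75 to $0.04 works out to roughly 19x, which is steeper than the 10x-per-year rate. If a flat 10x decline were to continue from $0.04, the hypothetical price a year later would be $0.004 per query, below the $0.01 ceiling cited for a non-generative Google search.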
I believe this will drive a different way to build an LLM company. Rather than focusing on the AGI arms race, founders will start to focus on building models that are almost as good as the top LLMs, but lightweight and thus ultra-fast and ultra-cheap. These models and apps, purpose-built for commercial applications using leaner models and innovative architecture, will cost a fraction as much to train and achieve levels of performance good enough for consumers and enterprises. This approach will not lead to a Nobel Prize-winning AI, but it will be the catalyst for proliferating AI apps, leading to a healthy AI ecosystem.
For instance, I’m backing a team that’s building a model, an inference engine, and an app all at the same time. This Silicon Valley-based AI startup trained a model almost as good as the best from OpenAI for $3 million, compared to the more than $100 million that Sam Altman said it cost to train OpenAI’s GPT-4 [1]. The inference cost of this model applied to an AI search app such as BeaGo is $0.001 per query, only 3% of GPT-4’s price. The team also built and launched an AI search app with just five engineers working for two months.
How was that accomplished? Vertical and deep integration that optimized inference, model, and application development holistically.
On the path of AI progression, we have all witnessed the power of LLMs as a revolutionary technology. I am a firm believer that generative AI will disrupt the way we learn, work, live, and do business. The ecosystem must work together to get over the cost hurdle and adjust the formula, achieving equilibrium to make AI really work for our society.