April 3: The Arc Prize Foundation, which maintains and administers the ARC-AGI benchmark, last week substantially revised its estimate of what it costs to run OpenAI’s o3 “reasoning” AI model on ARC-AGI. Under the revised estimate, o3 is far more expensive to run than previously thought.
When OpenAI unveiled o3 last December, it worked with the developers of ARC-AGI to showcase the model’s ability to solve complex problems. The cost estimate at the time already put the model’s running costs at a considerable level. A few months later, that estimate has changed dramatically. The Arc Prize Foundation now says that o3 high, the best-performing o3 configuration, may cost about $30,000 (about 218,000 yuan at the current exchange rate) to solve a single ARC-AGI problem, up from an earlier estimate of about $3,000 (about 21,800 yuan).
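The size of the revision can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable and function names are illustrative, and the exchange rate is not an official figure but simply the rate implied by the article's own dollar and yuan amounts.

```python
# Figures quoted in the article (all approximate).
OLD_ESTIMATE_USD = 3_000    # earlier per-problem estimate for o3 high
NEW_ESTIMATE_USD = 30_000   # revised per-problem estimate for o3 high
NEW_ESTIMATE_CNY = 218_000  # yuan amount the article gives for the revised estimate


def revision_factor(old_usd: float, new_usd: float) -> float:
    """How many times larger the revised estimate is than the earlier one."""
    return new_usd / old_usd


def implied_exchange_rate(usd: float, cny: float) -> float:
    """Yuan per dollar implied by a quoted USD/CNY pair (not an official rate)."""
    return cny / usd


factor = revision_factor(OLD_ESTIMATE_USD, NEW_ESTIMATE_USD)
rate = implied_exchange_rate(NEW_ESTIMATE_USD, NEW_ESTIMATE_CNY)

print(f"Revised estimate is {factor:.0f}x the earlier one")
print(f"Implied exchange rate: {rate:.2f} yuan per dollar")
```

In other words, the revision is roughly a tenfold increase, and the yuan figures correspond to a rate of about 7.27 yuan per dollar.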
The sharp upward revision shows how expensive today’s most advanced AI models can be on certain tasks. OpenAI has neither priced nor officially released o3, but the Arc Prize Foundation believes the pricing of o1-pro, OpenAI’s most expensive model to date, is a reasonable reference point. Mike Knoop, a co-founder of the Arc Prize Foundation, told TechCrunch: “We believe o1-pro is closer to the true cost of o3, because the two use a similar amount of test-time compute. But this is still only a reference value. Until official pricing is announced, we will keep o3 labeled as a preview on the leaderboard to reflect the uncertainty.”
There is a reason o3 high costs so much. According to the Arc Prize Foundation, o3 high used 172 times as much compute on ARC-AGI tasks as o3 low, the lowest-compute o3 configuration. That level of resource consumption is what pushes the cost of o3 high up so sharply.
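To get a feel for what the 172x figure means in dollars, here is a rough back-of-the-envelope sketch. It assumes that cost scales linearly with compute, which the article does not state, so the result is an illustration rather than a figure reported by the Arc Prize Foundation. The names are hypothetical.

```python
# Figures quoted in the article.
O3_HIGH_COST_USD = 30_000  # revised per-problem estimate for o3 high
COMPUTE_RATIO = 172        # o3 high compute relative to o3 low


def implied_low_cost(high_cost_usd: float, compute_ratio: float) -> float:
    """Per-problem cost of the low-compute configuration.

    ASSUMPTION: cost is directly proportional to compute used.
    """
    return high_cost_usd / compute_ratio


low_cost = implied_low_cost(O3_HIGH_COST_USD, COMPUTE_RATIO)
print(f"Implied o3 low cost per problem: about ${low_cost:,.0f}")
```

Under that linear assumption, o3 low would come out at a little under $175 per problem.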
Rumors that OpenAI plans to charge steep prices for high-end offerings aimed at enterprise customers have also circulated for some time. In early March, The Information reported that the company may plan to charge up to $20,000 per month (about 145,000 yuan at the current exchange rate) for specialized AI “agents,” such as a software developer agent.
Some argue that even the most expensive AI models cost far less than human contractors or employees. But AI researcher Toby Ord pointed out in a post on X that these models may be less efficient than people expect. For example, o3 high needed 1,024 attempts to reach its best score on ARC-AGI.
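Combining the attempt count with the revised cost estimate gives a rough sense of the cost per attempt. This sketch assumes that the roughly $30,000 estimate and the 1,024 attempts apply to the same problem and that cost is spread evenly across attempts; neither assumption is confirmed by the article, and the names are illustrative.

```python
# Figures quoted in the article.
COST_PER_PROBLEM_USD = 30_000  # revised estimate for o3 high
ATTEMPTS = 1_024               # attempts o3 high needed for its best score


def cost_per_attempt(total_cost_usd: float, attempts: int) -> float:
    """Average cost of one attempt.

    ASSUMPTION: the total is split evenly across all attempts.
    """
    return total_cost_usd / attempts


per_attempt = cost_per_attempt(COST_PER_PROBLEM_USD, ATTEMPTS)
print(f"Average cost per attempt: about ${per_attempt:.2f}")
```

On these assumptions each attempt averages a little over $29, so the headline figure reflects the sheer number of attempts as much as the price of any single one.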
As AI technology advances and commercialization accelerates, controlling costs while maintaining high model performance has become a major challenge for the entire industry.