Pricing LLMs: The True Cost of Compute

On July 23, 2024, Meta announced their Llama 3.1 models. Alongside the new 8B, 70B, and 405B parameter models, they also released a technical paper titled The Llama 3 Herd of Models.

While the benchmarking results were impressive, the sheer amount of compute used to train the 405B model caught my attention as a software developer.

/Graphics Processing Units and Training Costs

Meta used ~16K H100 GPUs to train the 405B model. While the specs of these GPUs might not mean much to the average reader, let’s break it down.

The H100 is Nvidia’s state-of-the-art GPU. According to analysts at Raymond James Financial, the price of each H100 ranges between $25,000 and $30,000. Generally, these GPUs are not sold individually to consumers but rather to data centers and enterprises. If you want to get your hands on one through a reseller on eBay, expect to pay over $30,000. Additionally, hosting an H100 requires a server rack with capable CPUs, memory, storage, and a robust cooling system. 

Assuming Meta paid $25,000 per GPU, the cost of GPUs alone would be $400 million. This figure doesn’t include the entire hardware infrastructure needed to support these GPUs. It’s also rumored that OpenAI used 25,000 Nvidia A100 GPUs (each costing around $10,000 in 2023) to train GPT-4. 
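As a sanity check on these figures, here is a minimal back-of-envelope calculation. The per-unit prices and GPU counts are the estimates and rumors quoted above, not confirmed numbers:

```python
# Back-of-envelope GPU capital cost, using the figures quoted above.
# Per-unit prices are analyst estimates; GPU counts are approximate/rumored.

def gpu_capex(num_gpus: int, unit_price_usd: float) -> float:
    """Purchase cost of the GPUs alone, excluding servers, networking, and cooling."""
    return num_gpus * unit_price_usd

llama_405b = gpu_capex(16_000, 25_000)    # Meta's H100 fleet
gpt4_rumored = gpu_capex(25_000, 10_000)  # rumored OpenAI A100 fleet

print(f"Llama 3.1 405B GPUs: ${llama_405b / 1e6:.0f}M")   # $400M
print(f"GPT-4 GPUs (rumor):  ${gpt4_rumored / 1e6:.0f}M") # $250M
```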

Epoch, 2023 | Chart: 2024 AI Index Report

So, what is the actual cost to train a large foundation model? Of course, these hardware components aren’t discarded after training one model. According to the Artificial Intelligence Index Report 2024 published by Stanford University, the cost to train GPT-4 is estimated at around $78 million, and the cost to train Gemini Ultra is approximately $191 million. Both GPT-4 and Gemini Ultra are rumored to have over 1.5 trillion parameters. These costs are calculated based on the hardware used, the duration of the training process, and the utilization rate of the hardware.
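Estimates like these can be approximated with a simple amortization model: charge a training run only for the fraction of the hardware's useful life it occupies, scaled by utilization. This is a sketch in the spirit of that methodology, not the AI Index's actual formula; the lifetime, run length, and utilization values below are hypothetical placeholders chosen for illustration:

```python
# Illustrative amortized-cost model: hardware is not discarded after one run,
# so a training run is charged only for the share of the hardware's useful
# life it occupies. All inputs below are hypothetical placeholders.

def training_cost(num_gpus: int, gpu_price_usd: float, training_days: float,
                  useful_life_days: float, utilization: float) -> float:
    """Share of hardware cost attributable to one training run."""
    hardware_capex = num_gpus * gpu_price_usd
    occupancy = training_days / useful_life_days  # fraction of lifetime used
    # Lower utilization means more wall-clock time (and cost) per unit of work.
    return hardware_capex * occupancy / utilization

# e.g. 16,000 GPUs at $25k each, a 90-day run, 4-year lifetime, 40% utilization
estimate = training_cost(16_000, 25_000, 90, 4 * 365, 0.40)
print(f"~${estimate / 1e6:.0f}M")  # ~$62M
```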

As shown in the chart, training LLMs is becoming exponentially more expensive. On the power side, The Verge reported that training GPT-3 consumed about 1,300 megawatt-hours (MWh) of electricity. For context, one megawatt-hour can power the average American home for roughly 1.2 months. Put another way, the electricity used to train GPT-3 could power roughly 1,500 average American homes for an entire month.
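A rough sketch of the household-equivalence arithmetic, assuming an average US home uses about 10.5 MWh of electricity per year (an assumption for illustration, not a figure from The Verge article):

```python
# Rough conversion of GPT-3's reported training energy into household terms.
# Assumes an average US home uses ~10.5 MWh per year (~0.875 MWh per month);
# this consumption figure is an assumption, not from the cited article.

TRAINING_ENERGY_MWH = 1_300        # reported energy to train GPT-3
HOME_MWH_PER_MONTH = 10.5 / 12     # ~0.875 MWh per home per month

homes_for_one_month = TRAINING_ENERGY_MWH / HOME_MWH_PER_MONTH
print(f"~{homes_for_one_month:,.0f} homes powered for a month")  # ~1,486
```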

Epoch, 2023 | Chart: 2024 AI Index Report

/What About the Smaller Players?

The high cost of training these high-end models means that only a few industry leaders are driving LLM development forward. In response, in October 2023, President Biden issued an Executive Order on Artificial Intelligence aimed at leveling the playing field. The order includes the launch of a pilot program for the National AI Research Resource, which will provide resources and data to AI researchers and students.

In addition, the order aims to give small developers and entrepreneurs access to technical assistance and resources to help commercialize their AI innovations. The effectiveness of this Executive Order remains to be seen, but the dominance of a few powerful industry leaders in LLM development is unlikely to change in the near future.

/My Thoughts

In 2017, Google published the landmark paper Attention Is All You Need, which introduced the Transformer architecture. Since then, all major advancements in generative language models have been built on top of the success of attention. I recall my professor, Trac Tran, once told me that one breakthrough in a specific technology domain typically pushes innovation for 10 years. Here we are in 2024, seven years into harnessing the power of the Transformer model, and we might be nearing the end of this wave of rapid innovation sparked by the concept of attention.

We are already seeing diminishing returns on large models when it comes to benchmark performance (though I recognize criticisms about the validity of public benchmarks). Additionally, model sizes are growing so large that we are running out of high-quality data to train them effectively (Stanford University, 2024).

A new breakthrough is needed to drive the development of more efficient and capable generative language models. I believe the next revolution will come from a new architectural design, comparable in impact to attention, or from an algorithmic breakthrough that enables faster computation. Simply adding more compute and increasing parameter counts is not only making training exponentially more expensive but may also fail to deliver the next level of performance.

The Llama 3.1 405B model is 50.6 times the size of the Llama 3.1 8B model, yet the average benchmark improvement between them is only about 0.27.

Sources:

Stanford University. (2024). AI Index Report 2024, Chapter 1: Research and Development. Stanford Institute for Human-Centered Artificial Intelligence. https://aiindex.stanford.edu/wp-content/uploads/2024/04/HAI_AI-Index-Report-2024_Chapter1.pdf

Llama Team, AI @ Meta. (2024). The Llama 3 Herd of Models. https://arxiv.org/pdf/2407.21783

Vincent, J. (2024). How much electricity does AI consume? The Verge. https://www.theverge.com/24066646/ai-electricity-energy-watts-generative-consumption

Zhenbo Yan

Zhenbo is a software developer with experience in full-stack mobile development and networking, currently working on high-performance computing software at a BioTech institute. He is interested in exploring the application of LLMs and emerging AI tools in medical research.
