
ChatGPT: The Cloud Costs of the Most Famous AI Language Model

Tobias Jonas · 5 min read

The Fascination of ChatGPT

The entire internet is fascinated by ChatGPT, but what does such a service actually cost to operate, and why can a language model with this many parameters only run on a hyperscaler? Let's find out, using some assumptions and a calculation already made by a professor at the University of Maryland.

The Costs of ChatGPT’s Cloud Infrastructure

First, it must be noted that the exact details of ChatGPT's cloud infrastructure are not public, so this article has to rely on many assumptions. Still, if you search the internet, or ask ChatGPT itself, you find some very interesting figures that we can use as a basis for a calculation. The largest GPT-3 model has 175 billion parameters and is offered as ChatGPT by OpenAI in the Microsoft Azure Cloud.

Following the reasoning of Tom Goldstein, a professor of artificial intelligence at the University of Maryland, a model with 3 billion parameters can predict one token, i.e., a word or punctuation mark, in about 6 ms on an NVIDIA A100 GPU. Scaled up linearly to 175 billion parameters, predicting a single token in ChatGPT should take around 350 ms. Since the trained language model is far too large for a single A100 GPU, an entire cluster must be used for inference. ChatGPT produces approximately 15-20 words per second, which could be achieved with an 8-GPU A100 server in the Azure Cloud. As of today, an A100 GPU costs EUR 4.50 per hour in the Azure Cloud in the USA. With 8 such GPUs delivering 20 words per second, you arrive at a price of roughly EUR 0.0005 per generated word.

However, this only covers inference, i.e., serving an already trained model. Training a model with 175 billion parameters probably requires additional GPU clusters with hundreds of GPUs. Assuming training is complete and we are only dealing with live prediction, the next question is how many users ChatGPT has already reached. In the first 5 days alone, over a million users registered for ChatGPT. For several days, it has been…
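A minimal Python sketch of this calculation, using the figures quoted above as assumptions (6 ms per token for a 3-billion-parameter model, EUR 4.50 per A100-hour, one 8-GPU server, 20 words per second); it only reproduces the rough arithmetic, not how Azure actually bills or how OpenAI really serves the model.

```python
# Back-of-the-envelope per-word inference cost; all figures are the article's assumptions.

ms_per_token_3b = 6            # ~6 ms per token for a 3B-parameter model on one A100
params_small = 3e9
params_gpt3 = 175e9

# Assumption: latency scales roughly linearly with parameter count
ms_per_token_gpt3 = ms_per_token_3b * params_gpt3 / params_small
print(f"Latency per token on a single A100: ~{ms_per_token_gpt3:.0f} ms")   # ~350 ms

eur_per_gpu_hour = 4.50        # assumed Azure price for one A100
gpus = 8                       # one 8-GPU inference server
words_per_second = 20          # observed ChatGPT output speed

cost_per_second = gpus * eur_per_gpu_hour / 3600
eur_per_word = cost_per_second / words_per_second
print(f"Inference cost: ~EUR {eur_per_word:.4f} per generated word")        # ~EUR 0.0005
```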

This extreme AI example clearly shows why such a service can only be made possible by a hyperscaler. Hardly any company is able to keep this many A100 graphics cards available permanently, especially considering that a single card costs over EUR 10,000 to purchase.

How Much Did Training Cost and How Much Power Was Consumed?

These figures can be found on a ChatGPT-focused LinkedIn page, even though they have not been verified by OpenAI. A scientific estimate suggests that 1024 A100 GPUs were used for 34 days of training. On that basis, OpenAI would have needed around $4.6 million for training. Energy consumption has likewise not been officially confirmed, but the training run is estimated to have consumed 936 MWh. That corresponds to the daily consumption of almost 100,000 average European households.
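The short sketch below reproduces the arithmetic behind these unverified figures; the 1024 GPUs, 34 days, $4.6 million and 936 MWh are the estimates quoted above, while the 10 kWh per day for an average European household is an additional assumption of ours.

```python
# Sanity check of the quoted training estimates (all figures unverified by OpenAI).

gpus = 1024
days = 34
total_cost_usd = 4.6e6
energy_mwh = 936

gpu_hours = gpus * days * 24
print(f"Total GPU-hours: {gpu_hours:,}")                                  # 835,584
print(f"Implied price per GPU-hour: ${total_cost_usd / gpu_hours:.2f}")   # ~$5.50

# Assumption: an average European household consumes roughly 10 kWh per day
household_kwh_per_day = 10
households = energy_mwh * 1_000 / household_kwh_per_day
print(f"936 MWh ~= one day of consumption for {households:,.0f} households")  # ~93,600
```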

What Would the Minimum Operation of GPT-3 Cost?

There are many sources that looked at GPT-3 or GPT-2 before ChatGPT was released. LambdaLabs, for example, calculated quite simply that training GPT-3 with all 175 billion parameters on a single V100 GPU (innFactory note: A100 GPUs are newer and faster) would take over 350 years. In theory, the trained model could then also be served from a single GPU, but it only runs smoothly with around 350 GB of VRAM, far more than a single V100 (16-32 GB) offers. At AWS, for example, there is an EC2 instance of type p4de.24xlarge that would be up to the job. A normal company without special discounts would have to pay over EUR 30,000 per month for it, and again we are only talking about operation, not about training the AI model.
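As a rough check of that monthly figure, here is a minimal sketch assuming an on-demand rate of about EUR 42 per hour for a p4de.24xlarge (8x A100 80 GB, i.e. 640 GB of GPU memory) running around the clock; actual prices vary by region, currency and contract.

```python
# Rough monthly cost of keeping GPT-3 inference running on one p4de.24xlarge,
# assuming ~EUR 42 per on-demand hour (actual prices vary by region and contract).

eur_per_hour = 42.0
hours_per_month = 24 * 365 / 12     # ~730 hours

print(f"Estimated 24/7 cost: ~EUR {eur_per_hour * hours_per_month:,.0f} per month")  # ~EUR 30,700
```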

GPT-4 vs. LaMDA: What Will AI Language Models Bring Us in the Future?

ChatGPT is currently based on GPT-3, an AI model first released in 2020. The successor model GPT-4 is expected to be much better. The new model is rumored to use not 175 billion but 100 trillion parameters. (Update Jan 22, 2023: OpenAI has disputed the parameter counts circulating online for GPT-4. Performance should nevertheless be significantly better.)

But Google isn't sleeping either. Google is working on LaMDA and will soon make it available to the general public as Bard. This model also promises remarkable capabilities; a former Google engineer made headlines when he publicly said he believed LaMDA had developed a consciousness of its own. Unlike GPT, Bard is also supposed to be able to access current information and will be integrated directly into Google Search very soon. GPT-3 vs. LaMDA or GPT-4 vs. LaMDA – we are excited. In our view, such language models will significantly shape the coming decade.

The conversion of text to fluent speech will also reach an almost unbelievable level. Google, for example, is working on Tacotron 2, and Microsoft on VALL-E, which can imitate a voice from a sample of only 3 seconds. There are already other services that make it easy to create AI-generated videos. In the following example, I had the text written by GPT-3, while the voice and the video were generated by another generative AI (D-ID). The result is still a bit rough, but the AI also had exactly one photo of me to work from.

Looking for a Keynote Speaker on Artificial Intelligence – For Example on Generative Models Like ChatGPT?

More information about Tobias Jonas as a keynote speaker for AI can be found at: tobias-jonas.de. In addition, innFactory’s subsidiary, innFactory AI Consulting, advises medium-sized businesses on artificial intelligence topics and trains their employees as AI Managers.

The Twitter thread by Tom Goldstein referenced above, along with several additional sources on GPT-3, is available online.

Written by Tobias Jonas, CEO

Cloud architect and expert in AWS, Google Cloud, Azure, and STACKIT. Worked at Siemens and BMW before founding innFactory.
