How Does ChatGPT Work?

Updated May 2026
ChatGPT is a large language model built on the transformer architecture, trained in three phases: pre-training on trillions of words of text (much of it from the internet) to learn language patterns, supervised fine-tuning on human-written conversations to learn the chat format, and reinforcement learning from human feedback (RLHF) to align outputs with human preferences for helpfulness and safety. The result is a system that, given everything that came before, predicts a probability distribution over the next token and picks one. Repeating this step hundreds or thousands of times produces a full response.

The Transformer Architecture

ChatGPT is built on the transformer, an architecture introduced in the 2017 paper "Attention Is All You Need" by researchers at Google. The transformer's key innovation is the self-attention mechanism, which allows every position in a sequence to attend to every other position simultaneously.

Before transformers, language models used recurrent neural networks (RNNs) that processed text one word at a time, passing information forward through a hidden state. This sequential processing created two problems: it was slow to train (the computation could not be parallelized across positions in a sequence), and long-range dependencies were difficult because information had to survive through many sequential steps, degrading along the way.

Self-attention solves both problems. Each token computes a weighted sum of the representations of all tokens in the sequence, including itself. The weights are not fixed parameters: they are computed for each input from the similarity between learned "query" and "key" projections of the tokens, so the same token can attend to different positions in different contexts. A token at position 500 in a document can directly attend to a token at position 1 without the information passing through 499 intermediate steps. This makes transformers both faster to train (because attention computations can be parallelized across all positions) and better at capturing long-range relationships.
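To make the mechanism concrete, here is a minimal single-head sketch of scaled dot-product self-attention in plain Python. The learned parameters are the projection matrices `w_q`, `w_k`, and `w_v`; the attention weights are recomputed for every input from query-key dot products. The function names, toy dimensions, and identity projections are illustrative assumptions; production models use many attention heads, far larger dimensions, and optimized tensor libraries.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: n x d token representations; w_q, w_k, w_v: d x d_k learned projections.
    Returns (outputs, weights), where weights[i][j] is how much token i attends to token j.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_q[0])
    # Score for (i, j): query of token i dotted with key of token j, scaled by sqrt(d_k).
    scores = [[sum(qi * kj for qi, kj in zip(q[i], k[j])) / math.sqrt(d_k)
               for j in range(len(x))] for i in range(len(x))]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    # Each output is a weighted sum of every token's value vector.
    outputs = matmul(weights, v)
    return outputs, weights


# Three toy tokens with d = 2; identity projections keep the arithmetic readable.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, w = self_attention(x, identity, identity, identity)
```

Here token 0's query has the same dot product with the keys of tokens 0 and 2, so token 0 splits its attention equally between them and gives less weight to token 1.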

GPT (Generative Pre-trained Transformer) uses only the decoder half of the original transformer. Each position is allowed to attend only to itself and earlier positions, never to later ones (causal masking). This design naturally suits text generation: the model predicts the next token based on all previous tokens, which is exactly the task it is trained on. During training, the mask also lets the model make predictions for every position of a sequence in parallel without seeing the answers.
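Causal masking is usually implemented by setting the scores for future positions to negative infinity before the softmax, so those positions receive exactly zero weight. A minimal plain-Python sketch (the function name is hypothetical):

```python
import math


def causal_softmax_weights(scores):
    """Apply a causal mask to an n x n score matrix, then softmax each row.

    Position i may attend only to positions 0..i; later positions get -inf,
    which the softmax turns into a weight of exactly 0.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        masked = [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        m = max(masked)  # finite, because position i can always see itself
        exps = [math.exp(s - m) for s in masked]  # math.exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# With identical scores everywhere, each position spreads its attention
# evenly over itself and the positions before it.
w = causal_softmax_weights([[0.0] * 4 for _ in range(4)])
for row in w:
    print([round(p, 2) for p in row])
# [1.0, 0.0, 0.0, 0.0]
# [0.5, 0.5, 0.0, 0.0]
# [0.33, 0.33, 0.33, 0.0]
# [0.25, 0.25, 0.25, 0.25]
```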

Phase 1: Pre-training

The first and most expensive phase of training teaches the model to predict the next token (a word or a piece of a word). The model processes enormous amounts of text (books, websites, academic papers, code repositories, forum discussions, and more), and at each position it tries to predict what comes next.

For example, given the text "The capital of France is," the model should assign high probability to "Paris" and low probability to random words like "banana" or "seventeen." By training on trillions of these prediction tasks, the model learns grammar, facts, reasoning patterns, coding conventions, mathematical relationships, and much more. All of this knowledge is encoded implicitly in the model's parameters.
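The standard way to score these predictions is cross-entropy loss: the negative log of the probability the model assigned to the token that actually came next. The sketch below uses a made-up four-word distribution; training adjusts the parameters to lower this loss averaged over every position in the training data.

```python
import math


def next_token_loss(probs, target):
    """Cross-entropy loss for one prediction: -log P(target).

    probs maps candidate tokens to the model's predicted probabilities.
    """
    return -math.log(probs[target])


# Hypothetical predicted distributions after "The capital of France is".
good_model = {"Paris": 0.90, "Lyon": 0.05, "banana": 0.03, "seventeen": 0.02}
bad_model = {"Paris": 0.10, "Lyon": 0.30, "banana": 0.30, "seventeen": 0.30}

print(round(next_token_loss(good_model, "Paris"), 3))  # 0.105
print(round(next_token_loss(bad_model, "Paris"), 3))   # 2.303
```

A confident, correct prediction costs little; putting most of the probability elsewhere costs much more, and that gap is the signal that drives learning.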

Pre-training is self-supervised: the labels (the actual next words) come from the text itself, so no human labeling is needed. This is what makes it possible to train on datasets measured in trillions of tokens. The compute cost is staggering: GPT-3's pre-training reportedly cost around $4.6 million in compute, and GPT-4's cost is estimated at over $100 million.
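Turning raw text into training examples needs nothing more than pairing each prefix with the token that follows it. Here is a sketch with word-level tokens for readability; real models use subword tokens, and a transformer computes all of these predictions for a sequence in one masked, parallel pass rather than one example at a time.

```python
def make_training_pairs(tokens):
    """Turn one token sequence into (context, next-token) training examples.

    The label at each position is simply the token that follows it in the
    text, so no human annotation is needed.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


tokens = ["The", "capital", "of", "France", "is", "Paris"]
for context, label in make_training_pairs(tokens):
    print(" ".join(context), "->", label)
# The -> capital
# The capital -> of
# The capital of -> France
# The capital of France -> is
# The capital of France is -> Paris
```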

After pre-training, the model is a powerful text predictor, but it is not a useful chatbot. It tends to continue text rather than answer questions. Ask it a question and it might generate ten more related questions rather than an answer, because question-followed-by-more-questions is a pattern it has seen in text. The next two phases fix this.

Phase 2: Supervised Fine-Tuning (SFT)

In the second phase, human writers create examples of ideal conversations. A prompt like "Explain quantum entanglement in simple terms" is paired with a carefully written response that is clear, accurate, helpful, and appropriately detailed. Thousands of these prompt-response pairs are created.

The pre-trained model is then fine-tuned on this dataset, adjusting its parameters so that it produces outputs more like the human-written examples. This phase teaches the model the format of a helpful conversation: answer the question directly, provide relevant detail, avoid unnecessary hedging, and stop when the answer is complete.
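SFT reuses the same next-token objective, applied to curated conversations. The sketch below assumes a common convention, not a documented detail of ChatGPT's training: the prompt is given as context, but only the response tokens are scored. The role markers `<user>` and `<assistant>`, the `predict` callback, and the toy uniform model are all hypothetical.

```python
import math


def sft_loss(prompt_tokens, response_tokens, predict):
    """Average next-token cross-entropy over the response tokens only.

    predict(context) is a hypothetical stand-in for the model: it returns a
    dict mapping candidate next tokens to probabilities. Prompt tokens serve
    as context but are not scored (an assumed, commonly used convention).
    """
    sequence = prompt_tokens + response_tokens
    losses = []
    for i in range(len(prompt_tokens), len(sequence)):
        probs = predict(sequence[:i])
        losses.append(-math.log(probs[sequence[i]]))
    return sum(losses) / len(losses)


VOCAB = ["<user>", "<assistant>", "hi", "hello", "there", "<end>"]


def uniform_predict(context):
    # Toy "model" that has learned nothing: equal probability for every token.
    return {tok: 1 / len(VOCAB) for tok in VOCAB}


prompt = ["<user>", "hi", "<assistant>"]
response = ["hello", "there", "<end>"]
print(round(sft_loss(prompt, response, uniform_predict), 3))  # ln(6), about 1.792
```

Fine-tuning would lower this number by making the model assign higher probability to each token of the human-written response.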

SFT is relatively cheap compared to pre-training because the dataset is much smaller (thousands of examples versus trillions of tokens) and only a few epochs of training are needed. But the quality of the SFT data has an outsized impact on the model's behavior. The human writers effectively define what "good" looks like, and the model learns to imitate their style, depth, and judgment.

Phase 3: RLHF (Reinforcement Learning from Human Feedback)

The final phase uses reinforcement learning to further align the model with human preferences. This phase has three steps.

Step 1: Collect comparison data. The model generates multiple responses to each prompt. Human annotators read the responses and rank them from best to worst. These rankings capture nuanced preferences that are hard to specify in a rule: which response is more helpful, more accurate, more appropriately cautious, and less likely to produce harm.
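A common way to use a ranking is to expand it into every pairwise comparison it implies, since each pair is a separate training signal for the reward model. A minimal sketch:

```python
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    A ranking of K responses yields K * (K - 1) / 2 pairwise comparisons.
    combinations() preserves input order, so the better response comes first.
    """
    return list(combinations(ranked_responses, 2))


ranking = ["response A", "response C", "response B", "response D"]  # best first
pairs = ranking_to_pairs(ranking)
print(len(pairs))  # 6
print(pairs[0])    # ('response A', 'response C')
```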

Step 2: Train a reward model. The ranking data is used to train a separate neural network, the reward model, that predicts how much a human would prefer a given response. This reward model takes a prompt and a response as input and outputs a scalar score. It effectively learns to simulate human judgment about response quality.
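One standard training objective for this step is a pairwise loss: for each (preferred, rejected) pair, the reward model is penalized unless it scores the preferred response higher. The sketch shows that loss for a single pair, assuming the reward model's two scalar scores have already been computed.

```python
import math


def pairwise_reward_loss(score_preferred, score_rejected):
    """Pairwise (Bradley-Terry style) loss for training a reward model.

    Equals -log(sigmoid(score_preferred - score_rejected)): small when the
    human-preferred response is scored well above the rejected one.
    """
    margin = score_preferred - score_rejected
    # log1p(exp(-margin)) == -log(sigmoid(margin)); fine for moderate margins.
    return math.log1p(math.exp(-margin))


print(round(pairwise_reward_loss(2.0, -1.0), 3))  # 0.049: ordering is right
print(round(pairwise_reward_loss(-1.0, 2.0), 3))  # 3.049: ordering is reversed
```

Minimizing this loss over many pairs pushes the reward model to reproduce the annotators' preferences as a single score.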

Step 3: Optimize with PPO. The language model is then fine-tuned using Proximal Policy Optimization (PPO), a reinforcement learning algorithm. The model generates responses, the reward model scores them, and the scores serve as the reward signal. The model's parameters are adjusted to produce responses that the reward model scores highly, subject to a penalty (based on KL divergence) for drifting too far from the SFT model. Without that penalty, the model can find degenerate outputs that exploit weaknesses in the reward model, a failure often called reward hacking, instead of genuinely improving.
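Two pieces of this step can be written down compactly. The sketch shows (1) a reward shaped with a KL-style penalty, using the log-probability ratio between the model being trained and the frozen SFT model for one sampled response, and (2) PPO's clipped surrogate objective for a single sample. It is a simplified, sequence-level illustration under assumed values (`beta=0.1`, `epsilon=0.2`); real implementations typically apply the penalty per token, estimate advantages with a learned value function, and operate on batches.

```python
def kl_shaped_reward(rm_score, logp_policy, logp_sft, beta=0.1):
    """Reward-model score minus a KL-style penalty for drifting from the SFT model.

    logp_policy / logp_sft: log-probability of the sampled response under the
    model being trained and under the frozen SFT model. beta (assumed 0.1)
    sets how strongly drift is penalized.
    """
    return rm_score - beta * (logp_policy - logp_sft)


def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """PPO's clipped surrogate objective for one sample (to be maximized).

    ratio = pi_new(action) / pi_old(action). Clipping it to [1 - eps, 1 + eps]
    removes the incentive to move the policy too far in a single update.
    """
    clipped = max(min(ratio, 1 + epsilon), 1 - epsilon)
    return min(ratio * advantage, clipped * advantage)


# Same reward-model score, but the second response has drifted from SFT behavior.
print(kl_shaped_reward(1.0, logp_policy=-10.0, logp_sft=-10.0))  # 1.0
print(kl_shaped_reward(1.0, logp_policy=-2.0, logp_sft=-12.0))   # 0.0 (1.0 - 0.1 * 10)
print(ppo_clipped_objective(ratio=1.5, advantage=2.0))           # 2.4 (ratio clipped to 1.2)
```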

Building on SFT, RLHF is a large part of what makes modern chatbots dramatically more useful than the raw pre-trained model. With SFT alone, the model's outputs are conversational but can still be unhelpful, evasive, or unsafe. With RLHF, the model learns to prioritize helpfulness, honesty, and safety in ways that pure next-token prediction cannot capture.

How Generation Works at Inference Time

When you type a prompt, ChatGPT splits it into tokens and processes them through the transformer's layers. At the final layer, the output at the last position of your prompt is converted into a probability distribution over the model's entire vocabulary (on the order of 100,000 to 200,000 tokens, depending on the tokenizer). The system samples from this distribution, selecting one token as the first piece of its response.
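Converting the final layer's raw scores (logits) into a distribution and drawing one token can be sketched as follows. The four-token vocabulary and logit values are made up; real vocabularies are vastly larger.

```python
import math
import random


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(vocab, logits, rng):
    """Draw one token from the softmax distribution over the vocabulary."""
    probs = softmax(logits)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Toy vocabulary with made-up logits for the position after a prompt.
vocab = ["Paris", "Lyon", "banana", "seventeen"]
logits = [5.0, 2.0, -1.0, -2.0]
rng = random.Random(0)  # seeded so the sketch is reproducible
print(sample_next_token(vocab, logits, rng))
```

With these logits "Paris" gets roughly 95% of the probability, so it is drawn most of the time, but the other tokens remain possible.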

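That token is then appended to the input, and the extended sequence is processed again to predict the next token. (In practice, implementations avoid recomputing everything from scratch by caching the attention keys and values of earlier tokens, a technique called a KV cache.) This repeats, one token at a time, until the model produces a stop token or reaches its maximum output length. A 500-word English response is roughly 650 to 700 tokens (a common rule of thumb is that a token averages about three-quarters of a word), so it requires that many individual predictions, each conditioned on the full sequence up to that point.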
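The loop itself is simple; all the intelligence lives in the model call. In this sketch, `next_token` is a hypothetical stand-in for a forward pass plus sampling, and the toy lookup table exists only so the example runs.

```python
def generate(prompt, next_token, stop="<end>", max_new_tokens=20):
    """Autoregressive decoding: predict one token, append it, repeat.

    next_token(tokens) is a hypothetical stand-in for a forward pass plus
    sampling; it receives the full sequence generated so far.
    """
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == stop:  # the model signals that the response is complete
            break
        tokens.append(tok)
    return tokens[len(prompt):]


# Toy "model": a fixed lookup on the previous token only
# (a real model conditions on the whole sequence).
TABLE = {"is": "Paris", "Paris": ".", ".": "<end>"}


def toy_next_token(tokens):
    return TABLE.get(tokens[-1], "<end>")


print(generate(["The", "capital", "of", "France", "is"], toy_next_token))
# ['Paris', '.']
```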

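The temperature parameter controls how random the sampling is: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. At temperature 0, the model always picks the highest-probability token (greedy decoding, which is predictable but prone to repetition). At temperature 1, it samples from the model's unmodified probabilities (varied but occasionally erratic), and higher values flatten the distribution further. Many applications choose values somewhat below 1 (for example, between 0.5 and 0.8) to balance coherence and diversity; the best setting depends on the task.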
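Because temperature is applied by dividing the logits by T before the softmax, its effect is easy to see on a small example. The logits below are made up, and treating T = 0 as greedy selection is a convention used here to avoid dividing by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a softmax.

    T < 1 sharpens the distribution toward the top token; T > 1 flattens it.
    T == 0 is treated as greedy decoding (all probability on the argmax).
    """
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

The printed rows show the top token's probability shrinking as T rises from 0.5 to 2, which is exactly the trade between coherence and diversity described above.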

What ChatGPT Does Not Do

The underlying model does not search the internet during a conversation; when ChatGPT shows web results, a separate search or retrieval tool fetches them and adds them to the prompt. The model's own knowledge comes entirely from its training data, which has a cutoff date. It does not remember previous conversations unless that history, or information saved from it, is explicitly included in the prompt. It does not learn from your interactions; its parameters are frozen during inference.
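In a chat application, the appearance of memory comes from re-sending earlier turns with every request. The sketch below flattens a conversation history into a single prompt; the `role: text` format and the `build_prompt` name are hypothetical, since real systems use model-specific chat templates.

```python
def build_prompt(history, new_message):
    """Flatten earlier turns plus the new message into one model input.

    The model itself is stateless: it "remembers" the conversation only
    because the application re-sends prior turns on every request.
    """
    turns = history + [("user", new_message)]
    return "\n".join(f"{role}: {text}" for role, text in turns)


history = [
    ("user", "My name is Ada."),
    ("assistant", "Nice to meet you, Ada."),
]
print(build_prompt(history, "What is my name?"))
print(build_prompt([], "What is my name?"))  # a fresh chat: the name is gone
```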

ChatGPT does not understand truth. It predicts likely token sequences, and likely sequences sometimes contain false information. When the model states an incorrect fact confidently, it is not lying; it is generating a probable continuation given its training data, and a probable continuation is not always factually correct. This is a major reason language models hallucinate, producing plausible-sounding but fabricated information.

Key Takeaway

ChatGPT is a transformer-based language model trained in three phases: pre-training on massive text to learn language, supervised fine-tuning to learn the conversational format, and RLHF to align with human preferences. It generates text one token at a time by sampling from its predicted distribution over likely continuations, and it has no understanding of truth, no memory between conversations beyond what is placed in its prompt, and no internet access of its own.