Vol.01 · No.10 CS · AI · Infra May 15, 2026

AI Glossary

LLM & Generative AI · CS Fundamentals

Token


Plain Explanation

A token is a chunk of text that an LLM uses as its processing unit. One English word can be one token, but long words may split into multiple tokens, and spaces or punctuation can become tokens too. That is why word count and token count are not the same.

A tokenizer converts text into an array of token IDs. The model does not read raw letters directly; it reads those IDs and predicts the next likely token. The output is then decoded back from token IDs into text.
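The encode, predict, decode loop can be sketched with a toy word-level vocabulary. Everything here is illustrative: real tokenizers learn tens of thousands of subword entries from data, and this tiny hand-written vocabulary and greedy matcher are assumptions, not any model's actual tokenizer.

```python
# Toy tokenizer: text -> token IDs -> text. The vocabulary below is
# hand-picked for illustration; real vocabularies are learned from data.
TOY_VOCAB = {"Hello": 0, ",": 1, " world": 2, "!": 3}
ID_TO_TOKEN = {i: t for t, i in TOY_VOCAB.items()}

def encode(text: str) -> list[int]:
    """Greedily match the longest vocabulary entry at each position."""
    ids = []
    while text:
        match = max((t for t in TOY_VOCAB if text.startswith(t)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"no token covers: {text!r}")
        ids.append(TOY_VOCAB[match])
        text = text[len(match):]
    return ids

def decode(ids: list[int]) -> str:
    """Map IDs back to strings and concatenate."""
    return "".join(ID_TO_TOKEN[i] for i in ids)

ids = encode("Hello, world!")
print(ids)          # [0, 1, 2, 3]
print(decode(ids))  # Hello, world!
```

The model only ever sees the integer IDs; the round trip back to text happens outside the model.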

Examples & Analogies

Think of tokens as Lego pieces for text. A word such as "unbelievable" may be split into pieces like "un", "believ", and "able". Common fragments may become larger reusable pieces.
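The Lego analogy can be made concrete with a greedy longest-prefix split over a hand-picked subword set. The pieces below are assumptions chosen for the example; real BPE or WordPiece tokenizers learn their pieces from a corpus and would split differently.

```python
# Greedy longest-prefix split over a hand-picked subword set.
# Real tokenizers (BPE, WordPiece) learn these pieces from data.
SUBWORDS = {"un", "believ", "able", "re", "token", "ize"}

def split_word(word: str) -> list[str]:
    pieces = []
    while word:
        # Take the longest known prefix; fall back to one character.
        prefix = max((s for s in SUBWORDS if word.startswith(s)),
                     key=len, default=word[0])
        pieces.append(prefix)
        word = word[len(prefix):]
    return pieces

print(split_word("unbelievable"))  # ['un', 'believ', 'able']
print(split_word("retokenize"))    # ['re', 'token', 'ize']
```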

The same sentence can have different token counts depending on whether it contains English, Korean, code, emojis, or symbols. Pricing pages use units such as "1M input tokens" because tokens map more directly to model work than characters or words do.

At a Glance

Unit      | Meaning                  | Relationship to tokens
Character | A visible writing unit   | Often too small for efficient model input
Word      | A human reading unit     | Boundaries vary across languages
Token     | A model processing unit  | Basis for cost, context limits, and latency
Token ID  | Numeric vocabulary index | The actual input used for embedding lookup

Where and Why It Matters

Token count directly affects LLM cost. System prompts, conversation history, RAG documents, tool outputs, and user messages all add input tokens. Longer answers add output tokens.
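A back-of-envelope cost estimate follows directly from those token counts. The prices below are hypothetical placeholders, not any provider's actual rates; check the pricing page of whatever API you use.

```python
# Hypothetical per-token prices, for illustration only.
PRICE_PER_1M_INPUT = 3.00    # USD per 1M input tokens (assumed)
PRICE_PER_1M_OUTPUT = 15.00  # USD per 1M output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the assumed rates."""
    return (input_tokens / 1_000_000 * PRICE_PER_1M_INPUT
            + output_tokens / 1_000_000 * PRICE_PER_1M_OUTPUT)

# A prompt (system + history + RAG + user) totaling 6,000 input tokens
# that produces an 800-token answer:
print(f"${request_cost(6_000, 800):.4f}")  # $0.0300
```

Note that output tokens are often priced several times higher than input tokens, so capping answer length can matter more than trimming the prompt.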

Context windows are also measured in tokens. A model with a 128k context window gives you a shared budget of roughly 128,000 tokens across input and output, not a character budget. Long documents should therefore be split by measured token count, not by rough word count.
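Splitting by token budget can be sketched like this. The `count_tokens` stand-in below just counts whitespace-separated pieces, which is an assumption for the demo; in practice you would call the model's actual tokenizer, since real token counts differ from word counts.

```python
# Pack paragraphs into chunks that each stay under a token budget.
def count_tokens(text: str) -> int:
    # Stand-in only: whitespace pieces. Use the real tokenizer in practice.
    return len(text.split())

def chunk_by_tokens(paragraphs: list[str], budget: int) -> list[list[str]]:
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for p in paragraphs:
        n = count_tokens(p)
        if current and used + n > budget:
            chunks.append(current)   # flush the full chunk
            current, used = [], 0
        current.append(p)
        used += n
    if current:
        chunks.append(current)
    return chunks

print(chunk_by_tokens(["a b c", "d e", "f g h i"], budget=5))
# [['a b c', 'd e'], ['f g h i']]
```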

Common Misconceptions

A token is not always a word. The same text may tokenize differently depending on language, tokenizer, and model family.

Reducing tokens does not automatically improve quality. Removing repetition is useful, but removing necessary context can make the model less grounded.

Tokens are not only the visible user input. System instructions, developer messages, tool schemas, retrieved context, and model output can all count against the token budget.
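Adding up every source that lands in the context makes this concrete. The component names and token counts below are illustrative assumptions, not measurements from any real system.

```python
# Everything in the context counts against the budget, not just the
# visible user message. All numbers here are illustrative.
budget = 128_000
usage = {
    "system prompt": 900,
    "tool schemas": 2_400,
    "conversation history": 11_000,
    "retrieved context": 18_000,
    "user message": 350,
}
reserved_for_output = 4_000  # room left for the model's answer

input_total = sum(usage.values())
remaining = budget - input_total - reserved_for_output
print(f"input: {input_total} tokens, headroom: {remaining} tokens")
```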

How It Sounds in Conversation

"This prompt burns too many tokens, so we should trim the RAG chunks."

"Lowering the output-token cap cuts cost, but the answer may get truncated."

"For Korean documents, we should measure token count with the actual tokenizer instead of guessing from character count."
