Token
Plain Explanation
A token is a chunk of text that an LLM uses as its processing unit. One English word can be one token, but long words may split into multiple tokens, and spaces or punctuation can become tokens too. That is why word count and token count are not the same.
A tokenizer converts text into an array of token IDs. The model does not read raw letters directly; it reads those IDs and predicts the next likely token. The output is then decoded back from token IDs into text.
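The encode/decode round-trip can be sketched with a tiny hand-made vocabulary. This is a toy illustration, not a real tokenizer: the vocabulary, the `encode`/`decode` helpers, and the pre-split input are all assumptions for the example; real tokenizers learn their vocabularies from data and handle the splitting themselves.

```python
# Hypothetical toy vocabulary mapping token strings to token IDs.
VOCAB = {"un": 0, "believ": 1, "able": 2}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}

def encode(tokens):
    """Map an already-split token sequence to token IDs."""
    return [VOCAB[t] for t in tokens]

def decode(ids):
    """Map token IDs back to text."""
    return "".join(ID_TO_TOKEN[i] for i in ids)

ids = encode(["un", "believ", "able"])
print(ids)          # [0, 1, 2]
print(decode(ids))  # unbelievable
```

The model only ever sees the list of IDs; text exists at the boundaries, where the tokenizer encodes the input and decodes the output.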
Examples & Analogies
Think of tokens as Lego pieces for text. A word such as "unbelievable" may be split into pieces like "un", "believ", and "able". Fragments that appear frequently in the training data tend to become larger reusable pieces in the vocabulary.
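The Lego-piece splitting can be approximated with a greedy longest-match rule. This is a simplification of how subword tokenizers such as BPE or WordPiece break rare words into known pieces; the `split_subwords` helper and the tiny vocabulary are made up for illustration.

```python
def split_subwords(word, vocab):
    """Greedily match the longest known piece at each position;
    fall back to single characters for unknown spans."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest candidate first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown span: emit one character
            i += 1
    return pieces

vocab = {"un", "believ", "able"}
print(split_subwords("unbelievable", vocab))  # ['un', 'believ', 'able']
```

The single-character fallback is why any input can be tokenized, even words the tokenizer has never seen; unfamiliar text simply costs more tokens.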
The same sentence can have different token counts depending on whether it contains English, Korean, code, emojis, or symbols. Pricing pages use units such as "1M input tokens" because tokens map more directly to model work than characters or words do.
At a Glance
| Unit | Meaning | Relationship to tokens |
|---|---|---|
| Character | A visible writing unit | Often too small for efficient model input |
| Word | A human reading unit | Boundaries vary across languages |
| Token | A model processing unit | Basis for cost, context limits, and latency |
| Token ID | Numeric vocabulary index | The actual input used for embedding lookup |
Where and Why It Matters
Token count directly affects LLM cost. System prompts, conversation history, RAG documents, tool outputs, and user messages all add input tokens. Longer answers add output tokens.
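Because pricing is quoted per million tokens, cost estimation is a small arithmetic exercise. The rates below are made-up placeholders, not real provider prices; check the actual pricing page for the model you use.

```python
# Hypothetical placeholder rates in USD per 1M tokens (not real prices).
PRICE_PER_1M_INPUT = 3.00
PRICE_PER_1M_OUTPUT = 15.00

def estimate_cost(input_tokens, output_tokens):
    """Estimate request cost from token counts and per-1M rates."""
    return (input_tokens / 1_000_000 * PRICE_PER_1M_INPUT
            + output_tokens / 1_000_000 * PRICE_PER_1M_OUTPUT)

# e.g. 2,000 prompt tokens (system + history + RAG) and a 500-token answer
print(round(estimate_cost(2_000, 500), 4))  # 0.0135
```

Note that input and output tokens are usually priced differently, so trimming a verbose system prompt and capping output length are separate levers.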
Context windows are also measured in tokens. A model with a 128k context window gives you a token budget across input and output, not a character budget. Long documents should therefore be split by measured token count, not by rough word count.
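Splitting by measured token count can be sketched as: encode the whole document, slice the ID sequence into fixed-size windows, and decode each window back to text. The whitespace-based `encode`/`decode` stand-ins below are assumptions purely to keep the example self-contained; in practice you would pass the real model tokenizer's functions (e.g. from tiktoken or a Hugging Face tokenizer).

```python
def chunk_by_tokens(text, max_tokens, encode, decode):
    """Split text into chunks of at most max_tokens tokens each,
    using the supplied tokenizer's encode/decode functions."""
    ids = encode(text)
    return [decode(ids[i:i + max_tokens])
            for i in range(0, len(ids), max_tokens)]

# Stand-in tokenizer: one token per whitespace-separated word.
encode = lambda text: text.split()
decode = lambda toks: " ".join(toks)

doc = "one two three four five six seven"
print(chunk_by_tokens(doc, 3, encode, decode))
# ['one two three', 'four five six', 'seven']
```

Chunking on decoded token windows rather than raw character counts keeps each chunk within the model's budget regardless of language or script.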
Common Misconceptions
A token is not always a word. The same text may tokenize differently depending on language, tokenizer, and model family.
Reducing tokens does not automatically improve quality. Removing repetition is useful, but removing necessary context can make the model less grounded.
Tokens are not only the visible user input. System instructions, developer messages, tool schemas, retrieved context, and model output can all count against the token budget.
How It Sounds in Conversation
"This prompt burns too many tokens, so we should trim the RAG chunks."
"Lowering the output-token cap cuts cost, but the answer may get truncated."
"For Korean documents, we should measure token count with the actual tokenizer instead of guessing from character count."
References
- Neural Machine Translation of Rare Words with Subword Units (ACL). Foundational paper that popularized BPE-style subword tokenization.
- SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer (EMNLP). Explains subword tokenizer design for languages with and without explicit spaces.
- Tokenizers (Hugging Face Docs). Official docs for tokenizer pipelines, vocabularies, and encode/decode behavior.
- tiktoken (GitHub). OpenAI's tokenizer implementation and examples for BPE-based model tokenization.