LLM & Generative AI
Large language models, generative AI, agents, RLHF, multimodal
92 terms
Agent Evaluation
에이전트 평가
Agent evaluation assesses LLM-based agents that plan, remember, and call tools in external environments by scoring full …
Agent Loop
에이전트 루프
An agent loop is the control cycle that assembles input and accumulated context, invokes an LLM to plan and choose the n…
Agentic RAG
에이전틱 RAG
Agentic RAG is a retrieval-augmented generation architecture with an explicit planner (policy) that interleaves multi-st…
Agentic workflows
에이전트 워크플로우
Agentic workflows are dynamic workflows in which multiple specialized AI agents collaborate to plan, reason, use tools, …
AI Agent
AI 에이전트
An AI agent is a system that pursues a stated goal by perceiving its environment, reasoning and planning, and repeatedly…
AI Inference
AI 추론
AI inference is the runtime phase in which a trained model with fixed weights processes new inputs to produce prediction…
Anthropic
앤트로픽
Anthropic is an AI company that provides the Claude family of large language models and a developer platform, distributi…
Attention
어텐션
Attention is a neural mechanism that computes a weighted aggregation of information by scoring the similarity between a …
AUC (Area Under the Curve)
곡선 아래 면적
AUC represents the area under the ROC curve and is used as a metric to evaluate the performance of a classification mode…
Batch Inference
배치 추론
Batch inference is an offline prediction method that aggregates a large, fixed set of inputs and generates outputs in bu…
Bedrock
베드록
Amazon Bedrock is a fully managed AWS service that provides secure, enterprise-grade access to multiple foundation model…
BM25
BM25
BM25 is a probabilistic information-retrieval scoring function that ranks documents by summing per-query-term contributi…
Browser Agent
브라우저 에이전트
A browser agent is an autonomous system that controls a real web browser to complete tasks by running a closed-loop of o…
ChatGPT
챗GPT
ChatGPT is OpenAI's conversational AI application that turns natural-language user requests into answers or task outputs…
Claude
클로드
Claude is Anthropic’s family of large language models delivered through a developer platform — exposed via the Messages …
Computer Use
컴퓨터 사용
Computer Use is a tool-and-harness integration pattern where a model perceives the current UI via screenshots, proposes …
Context Engineering
컨텍스트 엔지니어링
Context engineering is the disciplined selection, organization, and formatting of everything a language model reads on a…
Context Window
컨텍스트 윈도우
A context window is the finite working memory of a language model that encompasses all tokens it can reference during ge…
CoT
사고 과정
CoT, or Chain-of-Thought, is a reasoning technique that prompts or trains large language models to produce or emulate in…
Cross-Encoder
크로스 인코더
A Cross-Encoder is an interaction-based neural ranker that concatenates the query and document into a single Transformer…
edge deployment
에지 배포
Edge deployment means running AI models or apps close to where data is created — for example on factory lines, inside re…
Embedding
임베딩
An embedding is a learned vector representation that maps discrete objects or high-dimensional inputs into a continuous …
Evals
모델 평가
Evals are the practice of turning benchmark or user‑study measurements into decision‑ready evidence about a model’s capa…
Evaluation Harness
평가 하니스
An evaluation harness is a testing framework that runs language models or agents against standardized datasets, prompts,…
Fine-tuning
파인튜닝
Fine-tuning is the process of continuing training from a pretrained model to adapt it to a specific task, domain, style,…
Gemini
제미나이
Gemini is Google’s family of multimodal generative models delivered through the Gemini API and Vertex AI, handling text,…
GPT-4o
GPT-4o
GPT-4o is an OpenAI large language model that can handle text, speech, and images all at once. It’s designed to be…
grouped-query attention
그룹 쿼리 어텐션
Grouped-query attention is a method used in large language models (LLMs) and transformer-based AI systems to process sev…
Hallucination
환각
Hallucination is a failure mode where an LLM produces fluent content that conflicts with source evidence, real-world fac…
Hybrid Search
하이브리드 검색
Hybrid search is an information-retrieval design that runs lexical (keyword/BM25 over an inverted index) and vector (emb…
In-Context Learning
문맥 내 학습
In-context learning is a pre-trained language model’s ability to adapt to a task at inference by conditioning on instruc…
Inference
추론
Inference is the execution phase where a trained model receives new inputs and computes predictions, classifications, or…
inference cost
추론 비용
Inference cost is the operational compute-and-infrastructure expense incurred each time a deployed LLM tokenizes a promp…
inference latency
추론 지연 시간
Inference latency is the actual time it takes for an AI model to process an input and return an output. It typically ref…
Inference-Time Scaling
추론 시점 스케일링
Inference-Time Scaling is a technique that improves a trained model’s outputs without retraining by allocating more comp…
KV Cache
KV 캐시
A KV cache is the inference-time memory structure that stores previously computed attention key/value tensors in an auto…
KV Offloading
KV 오프로딩
KV offloading is an inference technique that tiers the self-attention Key/Value cache from GPU memory to CPU RAM or stor…
LLM
대규모 언어 모델
A large language model is a deep learning system trained on vast text corpora to understand and generate natural languag…
LoRA
로라
LoRA is a parameter-efficient fine-tuning method that freezes the base model and trains small low-rank adapters instead.
MCP
모델 컨텍스트 프로토콜
Model Context Protocol (MCP) is a stateful JSON-RPC 2.0 client-server protocol that lets AI hosts discover and invoke se…
MCP Server
MCP 서버
An MCP server is the service-side component of Model Context Protocol that exposes server capabilities such as tools, re…
Mistral AI
미스트랄 AI
Mistral AI is a platform company offering a family of large language models via a first-party API and enterprise product…
Model Cascading
모델 캐스케이딩
Model cascading is a dynamic routing technique that speculatively runs small, low-cost models first, validates draft res…
Model Distillation
모델 증류
Model distillation is a training method that teaches a smaller student model to imitate a larger teacher model's output …
Model parallelism
모델 병렬 처리
Model parallelism is a distributed technique that enables training or serving neural networks too large for a single GPU…
Model Router
모델 라우터
A model router is an orchestration layer that selects which model should handle a request based on difficulty, modality,…
Model Serving
모델 서빙
Model serving is the operational system that deploys a trained model behind APIs, batch jobs, or streaming endpoints and…
MoE
전문가 혼합
Mixture of Experts (MoE) is a sparse conditional-computation architecture in which a gating/router network selects a sma…
multi-agent system
다중 에이전트 시스템
A multi-agent system is a network of multiple artificial intelligence agents that interact within a shared environment, …
multi-hop retrieval
다중 홉 검색
Multi-hop retrieval is a technique where an AI system answers complex queries by sequentially retrieving and connecting …
multi-stage training
다단계 학습
Multi-stage training is a method for developing AI models—especially large language models (LLMs)—by progressively impro…
Multimodal Model
멀티모달 모델
A multimodal model is an AI model designed to process or generate across two or more data modalities, such as text, imag…
Multimodal RAG
멀티모달 RAG
Multimodal RAG is an extension of retrieval-augmented generation that embeds and indexes heterogeneous data such as text…
NLP
자연어 처리
Natural Language Processing (NLP) is an AI discipline that enables computers to interpret and produce human language by …
NVIDIA
엔비디아
NVIDIA provides an end-to-end AI software stack—NVIDIA AI Enterprise—spanning deployment microservices (NIM) and develop…
on-device AI
온디바이스 AI
On-device AI means running artificial intelligence directly on your own device—like a phone, laptop, or tablet—instead o…
open-source LLM
오픈소스 대형 언어 모델
An open-source large language model (open-source LLM) is a type of AI language model whose underlying code and trained d…
OpenAI
오픈AI
OpenAI is an AI platform and API provider that offers models such as GPT‑5.5 and hosted tools to developers, exposing a …
OpenAI Codex
오픈AI 코덱스
OpenAI Codex is a cloud-based coding agent optimized for software engineering that can implement features, fix bugs, exp…
output tokens
출력 토큰
Output tokens are pieces of text generated by an AI model in response to input, where the model predicts the next most l…
PagedAttention
페이지드 어텐션
PagedAttention is an LLM-serving memory technique that partitions the attention key–value (KV) cache into fixed-size pag…
post-training
후훈련
Post-training is the stage that adapts a pretrained model to instructions, safety requirements, domain behavior, and hum…
pre-training
사전 훈련
Pre-training is the upstream phase that optimizes a model on large, broad data with self-supervised objectives such as n…
Prompt Caching
프롬프트 캐싱
Prompt caching is an inference optimization where the provider reuses the model’s prefill state for an exact, sufficient…
Prompt Injection
프롬프트 인젝션
Prompt injection is an attack that causes an LLM application to treat untrusted user input or external content as higher…
PyTorch
파이토치
PyTorch is an open-source deep learning framework used to build and train neural networks. With its Python-based intuiti…
RAG
검색 증강 생성
Retrieval-Augmented Generation (RAG) couples a retriever with a generator so an LLM conditions on top‑K query‑relevant c…
Re-ranking
리랭킹
Re-ranking is a second-stage ranking step in RAG and search pipelines that jointly processes the user query with each in…
real-time inference
실시간 추론
Real-time inference is a serving paradigm that exposes a trained model as an API to execute and respond immediately upon…
Reasoning Model
추론 모델
A reasoning model is a specialization of large language models that augments next‑token generation with intermediate rea…
recurrent mechanism
순환 메커니즘
A recurrent mechanism refers to an architectural design in AI models where the output from a previous step is fed back a…
RLHF
인간 피드백 강화학습
Reinforcement Learning from Human Feedback (RLHF) is a post-training alignment method that treats a language model as a …
RoPE
RoPE(회전 위치 인코딩)
RoPE, or Rotary Position Embedding, is a Transformer positional encoding method that rotates query and key vectors by po…
RRF
상호 순위 융합
Reciprocal Rank Fusion (RRF) is a rank-aggregation algorithm that merges result lists from different retrievers by ignor…
Self-Attention
셀프 어텐션
Self-attention is a mechanism where each element in an input sequence compares itself with all other elements to compute…
Self-Supervised Pretext Tasks
자기지도 사전학습 과제
Self-supervised pretext tasks are label-free training objectives that exploit intrinsic structure in unlabeled data to l…
SLM
소형 언어 모델
A Small Language Model (SLM) is a language model that performs natural-language understanding or generation with a small…
Speculative Decoding
추측적 디코딩
Speculative decoding is an inference acceleration method where a smaller drafter proposes multiple candidate tokens and …
Structured Outputs
구조화된 출력
Structured outputs let an LLM generate only data that matches a user‑provided schema (typically JSON Schema), so require…
supervised fine-tuning
지도 미세 조정
Supervised fine-tuning is the process of further training a pre-trained AI model using additional labeled data, where hu…
SWE-bench
SWE-bench
SWE-bench is a software engineering benchmark that evaluates language models and agents on real GitHub issues by providi…
Synthetic Data
합성 데이터
Synthetic data is artificially generated data created from rules, simulations, statistical models, or generative models …
TensorFlow
텐서플로우
TensorFlow is an open-source machine learning and deep learning framework developed by the Google Brain team, designed f…
Test-Time Compute
테스트 타임 컴퓨트
Test-time compute is the inference-time budget of model evaluations, generated tokens, and wall-clock latency an LLM spe…
Token
토큰
A token is the basic unit an LLM uses to represent and process text instead of reading raw characters or full words dire…
Tool Calling
툴 호출
Tool calling is an interaction mechanism where an LLM, given definitions and input schemas of external tools, emits a st…
Tool Use
도구 사용
Tool use is an interaction pattern where an LLM emits structured calls against declared tool interfaces while the host a…
Transformer
트랜스포머
A Transformer is a neural network architecture that stacks self-attention and feed-forward blocks to learn relationships…
Vector Database
벡터 데이터베이스
A vector database is a specialized storage and retrieval system for embeddings that supports low-latency nearest-neighbo…
vision-language model
비전-언어 모델
A vision-language model is an artificial intelligence model designed to simultaneously understand and process both visua…
Visual Instruction Tuning
시각 지시 학습
Visual instruction tuning is a supervised fine-tuning approach that aligns a vision encoder with a large language model …
vLLM
vLLM
vLLM is an open-source LLM serving engine that boosts throughput by managing the attention KV cache with PagedAttention—…