
LLM & Generative AI

Large language models, generative AI, agents, RLHF, multimodal

92 terms

LLM & Generative AI
Agent Evaluation
에이전트 평가
Agent evaluation assesses LLM-based agents that plan, remember, and call tools in external environments by scoring full …
LLM & Generative AI
Agent Loop
에이전트 루프
An agent loop is the control cycle that assembles input and accumulated context, invokes an LLM to plan and choose the n…
LLM & Generative AI
Agentic RAG
에이전틱 RAG
Agentic RAG is a retrieval-augmented generation architecture with an explicit planner (policy) that interleaves multi-st…
ML Fundamentals LLM & Generative AI
Agentic workflows
에이전트 워크플로우
Agentic workflows are dynamic workflows in which multiple specialized AI agents collaborate to plan, reason, use tools, …
LLM & Generative AI
AI Agent
AI 에이전트
An AI agent is a system that pursues a stated goal by perceiving its environment, reasoning and planning, and repeatedly…
Infra & Hardware LLM & Generative AI
AI Inference
AI 추론
AI inference is the runtime phase in which a trained model with fixed weights processes new inputs to produce prediction…
Products & Platforms LLM & Generative AI AI Safety & Ethics
Anthropic
앤트로픽
Anthropic is an AI company that provides the Claude family of large language models and a developer platform, distributi…
Deep Learning LLM & Generative AI
Attention
어텐션
Attention is a neural mechanism that computes a weighted aggregation of information by scoring the similarity between a …
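As a rough illustration, the sketch below assumes the common scaled dot-product form of attention for a single query vector, using plain Python lists. Each key is scored against the query, the scores become weights through a softmax, and the output is the weighted sum of the value vectors. The function names are illustrative only.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Similarity between the query and each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Weighted aggregation of the value vectors.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)], weights

out, w = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print(w)  # the first key matches the query better, so it gets the larger weight
```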
LLM & Generative AI
AUC (Area Under the Curve)
곡선 아래 면적
AUC represents the area under the ROC curve and is used as a metric to evaluate the performance of a classification mode…
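The ROC AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting as half. The minimal sketch below computes AUC that way by comparing every positive/negative pair. It is O(n·m), which is fine for illustration; a library implementation would normally sort by score instead.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```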
Infra & Hardware LLM & Generative AI Data Engineering
Batch Inference
배치 추론
Batch inference is an offline prediction method that aggregates a large, fixed set of inputs and generates outputs in bu…
Products & Platforms Infra & Hardware LLM & Generative AI
Bedrock
베드록
Amazon Bedrock is a fully managed AWS service that provides secure, enterprise-grade access to multiple foundation model…
CS Fundamentals Data Engineering LLM & Generative AI
BM25
BM25
BM25 is a probabilistic information-retrieval scoring function that ranks documents by summing per-query-term contributi…
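The per-term summation can be sketched as follows. This assumes pre-tokenized documents, the usual defaults k1 = 1.5 and b = 0.75, and the smoothed IDF ln(1 + (N − n + 0.5)/(n + 0.5)), which is one common variant; other implementations use slightly different IDF formulas.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: number of docs containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # absent terms contribute nothing
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["cats", "and", "dogs"]]
print(bm25_scores(["cat"], docs))  # only the first doc contains "cat"
```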
LLM & Generative AI
Browser Agent
브라우저 에이전트
A browser agent is an autonomous system that controls a real web browser to complete tasks by running a closed-loop of o…
Products & Platforms LLM & Generative AI
ChatGPT
챗GPT
ChatGPT is OpenAI's conversational AI application that turns natural-language user requests into answers or task outputs…
Products & Platforms LLM & Generative AI Deep Learning
Claude
클로드
Claude is Anthropic’s family of large language models delivered through a developer platform — exposed via the Messages …
LLM & Generative AI
Computer Use
컴퓨터 사용
Computer Use is a tool-and-harness integration pattern where a model perceives the current UI via screenshots, proposes …
LLM & Generative AI
Context Engineering
컨텍스트 엔지니어링
Context engineering is the disciplined selection, organization, and formatting of everything a language model reads on a…
LLM & Generative AI
Context Window
컨텍스트 윈도우
A context window is the finite working memory of a language model that encompasses all tokens it can reference during ge…
LLM & Generative AI ML Fundamentals
CoT
사고 과정
CoT, or Chain-of-Thought, is a reasoning technique that prompts or trains large language models to produce or emulate in…
LLM & Generative AI Data Engineering Deep Learning
Cross-Encoder
크로스 인코더
A Cross-Encoder is an interaction-based neural ranker that concatenates the query and document into a single Transformer…
Infra & Hardware LLM & Generative AI
edge deployment
에지 배포
Edge deployment means running AI models or apps close to where data is created — for example on factory lines, inside re…
LLM & Generative AI Deep Learning Data Engineering
Embedding
임베딩
An embedding is a learned vector representation that maps discrete objects or high-dimensional inputs into a continuous …
LLM & Generative AI
Evals
모델 평가
Evals are the practice of turning benchmark or user‑study measurements into decision‑ready evidence about a model’s capa…
LLM & Generative AI
Evaluation Harness
평가 하니스
An evaluation harness is a testing framework that runs language models or agents against standardized datasets, prompts,…
LLM & Generative AI Deep Learning ML Fundamentals
Fine-tuning
파인튜닝
Fine-tuning is the process of continuing training from a pretrained model to adapt it to a specific task, domain, style,…
Products & Platforms LLM & Generative AI Deep Learning
Gemini
제미나이
Gemini is Google’s family of multimodal generative models delivered through the Gemini API and Vertex AI, handling text,…
Products & Platforms LLM & Generative AI
GPT-4o
GPT-4o
GPT-4o is an OpenAI large language model that can handle text, speech, and images all at once. It’s designed to be…
Deep Learning LLM & Generative AI
grouped-query attention
그룹 쿼리 어텐션
Grouped-query attention is a method used in large language models (LLMs) and transformer-based AI systems to process sev…
AI Safety & Ethics LLM & Generative AI
Hallucination
환각
Hallucination is a failure mode where an LLM produces fluent content that conflicts with source evidence, real-world fac…
Data Engineering LLM & Generative AI
Hybrid Search
하이브리드 검색
Hybrid search is an information-retrieval design that runs lexical (keyword/BM25 over an inverted index) and vector (emb…
LLM & Generative AI
In-Context Learning
문맥 내 학습
In-context learning is a pre-trained language model’s ability to adapt to a task at inference by conditioning on instruc…
LLM & Generative AI Infra & Hardware Deep Learning
Inference
추론
Inference is the execution phase where a trained model receives new inputs and computes predictions, classifications, or…
LLM & Generative AI Infra & Hardware Products & Platforms
inference cost
추론 비용
Inference cost is the operational compute-and-infrastructure expense incurred each time a deployed LLM tokenizes a promp…
Infra & Hardware LLM & Generative AI
inference latency
추론 지연 시간
Inference latency is the actual time it takes for an AI model to process an input and return an output. It typically ref…
LLM & Generative AI
Inference-Time Scaling
추론 시점 스케일링
Inference-Time Scaling is a technique that improves a trained model’s outputs without retraining by allocating more comp…
Infra & Hardware LLM & Generative AI
KV Cache
KV 캐시
A KV cache is the inference-time memory structure that stores previously computed attention key/value tensors in an auto…
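The toy sketch below shows the core idea for one attention head: at each decoding step only the newest token's key and value are computed and appended, while all earlier ones are read back from the cache. The `project` helper stands in for learned projection matrices and is hypothetical, as is taking the query projection to be the identity.

```python
import math

class KVCache:
    """Toy single-head KV cache: one key and one value vector per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def project(x, scale):
    # Hypothetical stand-in for a learned projection matrix.
    return [scale * xi for xi in x]

def decode_step(x, cache):
    """Attend from the newest token x over all cached positions."""
    # Only the new token's key and value are computed; earlier ones are reused.
    cache.append(project(x, 0.5), project(x, 2.0))
    q = x  # query projection taken as identity for brevity
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in cache.keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, cache.values)) / z for i in range(d)]

cache = KVCache()
for token_embedding in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    out = decode_step(token_embedding, cache)
print(len(cache.keys))  # 3: one cached key per processed position
```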
Infra & Hardware LLM & Generative AI
KV Offloading
KV 오프로딩
KV offloading is an inference technique that tiers the self-attention Key/Value cache from GPU memory to CPU RAM or stor…
LLM & Generative AI Deep Learning ML Fundamentals
LLM
대규모 언어 모델
A large language model is a deep learning system trained on vast text corpora to understand and generate natural languag…
LLM & Generative AI Deep Learning ML Fundamentals
LoRA
로라
LoRA is a parameter-efficient fine-tuning method that freezes the base model and trains small low-rank adapters instead.
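A minimal forward-pass sketch, assuming the formulation from the original LoRA paper: the frozen weight W is augmented with a scaled low-rank product (alpha / r) · B · A, where B starts at zero so the adapted layer initially matches the base layer. Only A and B would be trained. Training code is omitted, and the class name is illustrative.

```python
import random

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pretrained weight, d_out x d_in
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero-initialized
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)

W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
layer = LoRALinear(W, r=1)
print(layer.forward([1.0, 2.0, 3.0, 4.0]))  # equals W @ x at init because B is zero
print(layer.trainable_params())  # 8 trainable values vs. 16 entries in W
```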
LLM & Generative AI
MCP
모델 컨텍스트 프로토콜
Model Context Protocol (MCP) is a stateful JSON-RPC 2.0 client-server protocol that lets AI hosts discover and invoke se…
LLM & Generative AI
MCP Server
MCP 서버
An MCP server is the service-side component of Model Context Protocol that exposes server capabilities such as tools, re…
Products & Platforms LLM & Generative AI
Mistral AI
미스트랄 AI
Mistral AI is a platform company offering a family of large language models via a first-party API and enterprise product…
LLM & Generative AI
Model Cascading
모델 캐스케이딩
Model cascading is a dynamic routing technique that speculatively runs small, low-cost models first, validates draft res…
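The routing logic can be sketched as follows, under a loud simplifying assumption: each model returns a self-reported confidence, and "validation" is simply a confidence threshold. Real cascades may use separate verifiers or learned deferral rules. Both models and their costs here are hypothetical.

```python
from typing import Callable, List, Tuple

# A model maps a prompt to (answer, confidence); both models below are hypothetical.
Model = Callable[[str], Tuple[str, float]]

def cascade(prompt: str, tiers: List[Tuple[str, float, Model]], threshold: float = 0.8):
    """Run tiers from cheapest to most expensive until a draft passes validation."""
    if not tiers:
        raise ValueError("no tiers configured")
    spent = 0.0
    for i, (name, cost, model) in enumerate(tiers):
        answer, confidence = model(prompt)
        spent += cost
        # Accept a confident draft; the final tier's answer is accepted unconditionally.
        if confidence >= threshold or i == len(tiers) - 1:
            return name, answer, spent

def small(prompt: str) -> Tuple[str, float]:
    return "maybe 4", 0.55  # cheap model, low confidence

def large(prompt: str) -> Tuple[str, float]:
    return "4", 0.97  # expensive model, high confidence

print(cascade("2 + 2?", [("small", 0.1, small), ("large", 1.0, large)]))
# ('large', '4', 1.1)
```

When the small model's draft clears the threshold, the expensive tier is never called, which is where the savings come from.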
Deep Learning LLM & Generative AI
Model Distillation
모델 증류
Model distillation is a training method that teaches a smaller student model to imitate a larger teacher model's output …
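One widely used distillation objective, popularized by Hinton et al., compares temperature-softened teacher and student distributions with KL divergence and scales the result by T². The sketch below computes only that soft-target term. It assumes raw logits as inputs and omits the usual hard-label loss and the optimizer.

```python
import math

def softmax(logits, T=1.0):
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student's softened prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * T * T

teacher = [4.0, 1.0, 0.5]
print(distillation_loss([4.0, 1.0, 0.5], teacher))  # 0.0 when the student matches
print(distillation_loss([0.5, 1.0, 4.0], teacher) > 0)  # True
```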
Infra & Hardware LLM & Generative AI
Model parallelism
모델 병렬 처리
Model parallelism is a distributed technique that enables training or serving neural networks too large for a single GPU…
LLM & Generative AI Infra & Hardware
Model Router
모델 라우터
A model router is an orchestration layer that selects which model should handle a request based on difficulty, modality,…
Infra & Hardware LLM & Generative AI Products & Platforms
Model Serving
모델 서빙
Model serving is the operational system that deploys a trained model behind APIs, batch jobs, or streaming endpoints and…
LLM & Generative AI Deep Learning
MoE
전문가 혼합
Mixture of Experts (MoE) is a sparse conditional-computation architecture in which a gating/router network selects a sma…
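A minimal sketch of top-k routing, assuming a linear router, softmax gating renormalized over only the selected experts, and experts whose output has the same size as the input. The three toy experts are hypothetical. The point to notice is that only the k chosen experts are ever evaluated.

```python
import math

def moe_forward(x, router_weights, experts, k=2):
    """Route input x to the top-k experts and mix their outputs by gate weight."""
    # Router logit per expert: dot product of x with that expert's router row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over only the selected experts' logits.
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    out = [0.0] * len(x)
    for i in top:  # only k experts run: this is the conditional computation
        y = experts[i](x)
        out = [o + (exps[i] / z) * yi for o, yi in zip(out, y)]
    return out, top

experts = [
    lambda x: [2 * v for v in x],  # hypothetical expert 0
    lambda x: [v + 1 for v in x],  # hypothetical expert 1
    lambda x: [-v for v in x],     # hypothetical expert 2
]
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
out, chosen = moe_forward([3.0, 1.0], router, experts, k=2)
print(chosen)  # [0, 1]
```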
LLM & Generative AI
multi-agent system
다중 에이전트 시스템
A multi-agent system is a network of multiple artificial intelligence agents that interact within a shared environment, …
LLM & Generative AI Data Engineering
multi-hop retrieval
다중 홉 검색
Multi-hop retrieval is a technique where an AI system answers complex queries by sequentially retrieving and connecting …
ML Fundamentals LLM & Generative AI
multi-stage training
다단계 학습
Multi-stage training is a method for developing AI models—especially large language models (LLMs)—by progressively impro…
LLM & Generative AI Deep Learning
Multimodal Model
멀티모달 모델
A multimodal model is an AI model designed to process or generate across two or more data modalities, such as text, imag…
LLM & Generative AI
Multimodal RAG
멀티모달 RAG
Multimodal RAG is an extension of retrieval-augmented generation that embeds and indexes heterogeneous data such as text…
LLM & Generative AI Deep Learning ML Fundamentals
NLP
자연어 처리
Natural Language Processing (NLP) is an AI discipline that enables computers to interpret and produce human language by …
Products & Platforms LLM & Generative AI Infra & Hardware
NVIDIA
엔비디아
NVIDIA provides an end-to-end AI software stack—NVIDIA AI Enterprise—spanning deployment microservices (NIM) and develop…
Infra & Hardware LLM & Generative AI
on-device AI
온디바이스 AI
On-device AI means running artificial intelligence directly on your own device—like a phone, laptop, or tablet—instead o…
LLM & Generative AI
open-source LLM
오픈소스 대형 언어 모델
An open-source large language model (open-source LLM) is a type of AI language model whose underlying code and trained d…
Products & Platforms LLM & Generative AI AI Safety & Ethics
OpenAI
오픈AI
OpenAI is an AI platform and API provider that offers models such as GPT‑5.5 and hosted tools to developers, exposing a …
LLM & Generative AI CS Fundamentals
OpenAI Codex
오픈AI 코덱스
OpenAI Codex is a cloud-based coding agent optimized for software engineering that can implement features, fix bugs, exp…
LLM & Generative AI
output tokens
출력 토큰
Output tokens are pieces of text generated by an AI model in response to input, where the model predicts the next most l…
Infra & Hardware LLM & Generative AI
PagedAttention
페이지드 어텐션
PagedAttention is an LLM-serving memory technique that partitions the attention key–value (KV) cache into fixed-size pag…
ML Fundamentals LLM & Generative AI
post-training
후훈련
Post-training is the stage that adapts a pretrained model to instructions, safety requirements, domain behavior, and hum…
ML Fundamentals LLM & Generative AI
pre-training
사전 훈련
Pre-training is the upstream phase that optimizes a model on large, broad data with self-supervised objectives such as n…
LLM & Generative AI
Prompt Caching
프롬프트 캐싱
Prompt caching is an inference optimization where the provider reuses the model’s prefill state for an exact, sufficient…
AI Safety & Ethics LLM & Generative AI
Prompt Injection
프롬프트 인젝션
Prompt injection is an attack that causes an LLM application to treat untrusted user input or external content as higher…
LLM & Generative AI
PyTorch
파이토치
PyTorch is an open-source deep learning framework used to build and train neural networks. With its Python-based intuiti…
LLM & Generative AI Data Engineering
RAG
검색 증강 생성
Retrieval-Augmented Generation (RAG) couples a retriever with a generator so an LLM conditions on top‑K query‑relevant c…
LLM & Generative AI Data Engineering
Re-ranking
리랭킹
Re-ranking is a second-stage ranking step in RAG and search pipelines that jointly processes the user query with each in…
Infra & Hardware LLM & Generative AI
real-time inference
실시간 추론
Real-time inference is a serving paradigm that exposes a trained model as an API to execute and respond immediately upon…
LLM & Generative AI
Reasoning Model
추론 모델
A reasoning model is a specialization of large language models that augments next‑token generation with intermediate rea…
Deep Learning LLM & Generative AI
recurrent mechanism
순환 메커니즘
A recurrent mechanism refers to an architectural design in AI models where the output from a previous step is fed back a…
LLM & Generative AI Deep Learning
RLHF
인간 피드백 강화학습
Reinforcement Learning from Human Feedback (RLHF) is a post-training alignment method that treats a language model as a …
CS Fundamentals Deep Learning LLM & Generative AI
RoPE
RoPE(회전 위치 인코딩)
RoPE, or Rotary Position Embedding, is a Transformer positional encoding method that rotates query and key vectors by po…
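The sketch below assumes the adjacent-pair layout, which rotates (x[2i], x[2i+1]) by pos · base^(−2i/d) with base 10000. Some implementations pair the first and second halves of the vector instead. It also checks the property that makes RoPE useful: the query/key dot product depends only on the relative offset between positions.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle pos * base**(-2i/d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):  # i = 2j, so the exponent -i/d equals -2j/d
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

q, k = [0.3, -1.2, 0.8, 0.5], [1.0, 0.4, -0.7, 0.2]
# Attention scores depend only on the relative offset (here 3) between positions.
print(abs(dot(rope(q, 5), rope(k, 2)) - dot(rope(q, 13), rope(k, 10))) < 1e-9)  # True
```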
Data Engineering LLM & Generative AI
RRF
상호 순위 융합
Reciprocal Rank Fusion (RRF) is a rank-aggregation algorithm that merges result lists from different retrievers by ignor…
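RRF uses only each document's rank in each list, giving it a score of Σ 1/(k + rank). A minimal sketch follows, assuming ranks start at 1 and the commonly cited constant k = 60. The two input lists are made-up BM25 and vector-search results.

```python
from collections import defaultdict

def rrf(result_lists, k=60):
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)), ranks starting at 1."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

bm25_hits = ["d3", "d1", "d7"]
vector_hits = ["d1", "d5", "d3"]
print(rrf([bm25_hits, vector_hits]))  # ['d1', 'd3', 'd5', 'd7']
```

Because raw scores never enter the formula, BM25 scores and cosine similarities can be fused without normalizing them onto a common scale.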
LLM & Generative AI Deep Learning ML Fundamentals
Self-Attention
셀프 어텐션
Self-attention is a mechanism where each element in an input sequence compares itself with all other elements to compute…
Deep Learning LLM & Generative AI
Self-Supervised Pretext Tasks
자기지도 사전학습 과제
Self-supervised pretext tasks are label-free training objectives that exploit intrinsic structure in unlabeled data to l…
LLM & Generative AI Infra & Hardware
SLM
소형 언어 모델
A Small Language Model (SLM) is a language model that performs natural-language understanding or generation with a small…
LLM & Generative AI Infra & Hardware
Speculative Decoding
추측적 디코딩
Speculative decoding is an inference acceleration method where a smaller drafter proposes multiple candidate tokens and …
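The sketch below shows a simplified greedy variant of speculative decoding, not the original rejection-sampling scheme. The drafter proposes k tokens. The target keeps the longest prefix it agrees with and substitutes its own token at the first disagreement, so the output is identical to greedy decoding with the target alone. Both "models" are hypothetical lookup functions. Verification calls run one at a time here for readability, whereas a real system batches them into one target forward pass, which is where the speedup comes from.

```python
from typing import Callable, List

# Greedy next-token function: context tokens -> next token (hypothetical models).
NextToken = Callable[[List[str]], str]

def speculative_decode(prefix: List[str], draft: NextToken, target: NextToken,
                       k: int = 4, max_new: int = 8) -> List[str]:
    """Greedy speculative decoding; the output equals greedy decoding with `target`."""
    assert k >= 1
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1) The cheap drafter proposes k tokens autoregressively.
        proposal: List[str] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) The target verifies each proposed position (one batched pass in practice).
        for tok in proposal:
            expected = target(out)
            out.append(expected)
            if expected != tok or len(out) - len(prefix) >= max_new:
                break  # on a mismatch keep the target's token and drop the rest
    return out[len(prefix):]

sentence = "the quick brown fox jumps over the lazy dog".split()

def target(ctx: List[str]) -> str:
    return sentence[len(ctx)] if len(ctx) < len(sentence) else "<eos>"

def draft(ctx: List[str]) -> str:
    return "red" if len(ctx) == 3 else target(ctx)  # drafter errs at position 3

result = speculative_decode(["the"], draft, target, k=4, max_new=5)
print(result)  # ['quick', 'brown', 'fox', 'jumps', 'over']
```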
LLM & Generative AI
Structured Outputs
구조화된 출력
Structured outputs let an LLM generate only data that matches a user‑provided schema (typically JSON Schema), so require…
ML Fundamentals LLM & Generative AI
supervised fine-tuning
지도 미세 조정
Supervised fine-tuning is the process of further training a pre-trained AI model using additional labeled data, where hu…
LLM & Generative AI
SWE-bench
SWE-bench
SWE-bench is a software engineering benchmark that evaluates language models and agents on real GitHub issues by providi…
Data Engineering LLM & Generative AI ML Fundamentals
Synthetic Data
합성 데이터
Synthetic data is artificially generated data created from rules, simulations, statistical models, or generative models …
LLM & Generative AI
TensorFlow
텐서플로우
TensorFlow is an open-source machine learning and deep learning framework developed by the Google Brain team, designed f…
LLM & Generative AI
Test-Time Compute
테스트 타임 컴퓨트
Test-time compute is the inference-time budget of model evaluations, generated tokens, and wall-clock latency an LLM spe…
LLM & Generative AI CS Fundamentals
Token
토큰
A token is the basic unit an LLM uses to represent and process text instead of reading raw characters or full words dire…
LLM & Generative AI
Tool Calling
툴 호출
Tool calling is an interaction mechanism where an LLM, given definitions and input schemas of external tools, emits a st…
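On the host side, tool calling comes down to parsing the model's structured call, checking it against the declared schema, and executing it. In the sketch below, the `{"name": ..., "arguments": {...}}` call shape, the `get_weather` tool, and its stub result are all hypothetical and do not follow any specific provider's wire format. Validation checks only the schema's `required` keys rather than the full JSON Schema.

```python
import json

# Hypothetical tool registry: name -> (input schema, implementation).
TOOLS = {
    "get_weather": (
        {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
        lambda args: {"city": args["city"], "temp_c": 21},  # stub implementation
    ),
}

def dispatch(model_output: str) -> str:
    """Parse a structured tool call emitted by the model, validate it, and run it."""
    call = json.loads(model_output)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    schema, fn = TOOLS[name]
    missing = [key for key in schema.get("required", []) if key not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    # The host executes the tool; the result goes back to the model as a new message.
    return json.dumps(fn(args))

print(dispatch('{"name": "get_weather", "arguments": {"city": "Seoul"}}'))
# {"city": "Seoul", "temp_c": 21}
```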
LLM & Generative AI
Tool Use
도구 사용
Tool use is an interaction pattern where an LLM emits structured calls against declared tool interfaces while the host a…
Deep Learning LLM & Generative AI
Transformer
트랜스포머
A Transformer is a neural network architecture that stacks self-attention and feed-forward blocks to learn relationships…
Data Engineering LLM & Generative AI
Vector Database
벡터 데이터베이스
A vector database is a specialized storage and retrieval system for embeddings that supports low-latency nearest-neighbo…
Deep Learning LLM & Generative AI
vision-language model
비전-언어 모델
A vision-language model is an artificial intelligence model designed to simultaneously understand and process both visua…
Deep Learning LLM & Generative AI
Visual Instruction Tuning
시각 지시 학습
Visual instruction tuning is a supervised fine-tuning approach that aligns a vision encoder with a large language model …
Infra & Hardware LLM & Generative AI
vLLM
vLLM
vLLM is an open-source LLM serving engine that boosts throughput by managing the attention KV cache with PagedAttention—…