Data Engineering

Data pipelines, storage, processing, formats

13 terms

Infra & Hardware LLM & Generative AI Data Engineering
Batch Inference
배치 추론
Batch inference is an offline prediction method that aggregates a large, fixed set of inputs and generates outputs in bu…
CS Fundamentals Data Engineering LLM & Generative AI
BM25
BM25
BM25 is a probabilistic information-retrieval scoring function that ranks documents by summing per-query-term contributi…
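As a rough illustration of the per-query-term summation mentioned above, here is a minimal sketch of a BM25 scorer. The whitespace tokenizer, the `k1=1.5` and `b=0.75` defaults, and the smoothed IDF form `log(1 + (N - df + 0.5) / (df + 0.5))` are assumptions chosen for the example; real search engines differ in these details. The function name `bm25_scores` is hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` (illustrative sketch only)."""
    tokenized = [d.lower().split() for d in docs]  # assumed: naive whitespace tokens
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n     # average document length
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

docs = ["the quick brown fox", "the lazy dog", "quick quick dog"]
scores = bm25_scores("quick fox", docs)
```

In this toy corpus the first document matches both query terms, the third matches only one, and the second matches none, so the scores come out in that order.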
LLM & Generative AI Data Engineering Deep Learning
Cross-Encoder
크로스 인코더
A Cross-Encoder is an interaction-based neural ranker that concatenates the query and document into a single Transformer…
LLM & Generative AI Deep Learning Data Engineering
Embedding
임베딩
An embedding is a learned vector representation that maps discrete objects or high-dimensional inputs into a continuous …
Data Engineering LLM & Generative AI
Hybrid Search
하이브리드 검색
Hybrid search is an information-retrieval design that runs lexical (keyword/BM25 over an inverted index) and vector (emb…
CS Fundamentals Data Engineering
Inverted Index
역색인
An inverted index is a search-oriented data structure that maps each term to a postings list of documents — often includ…
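The term-to-postings mapping can be sketched in a few lines. This is a minimal in-memory example, assuming whitespace tokenization and lowercase normalization; storing token positions in each posting is also an assumption made for illustration, and `build_inverted_index` is a hypothetical name.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted postings list of (doc_id, positions)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            # Record every position at which the term occurs in this document.
            index[term].setdefault(doc_id, []).append(pos)
    # Sort postings by document id so lists can be intersected efficiently.
    return {term: sorted(postings.items()) for term, postings in index.items()}

docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick quick dog"}
index = build_inverted_index(docs)
quick_postings = index["quick"]
```

Looking up a term is then a single dictionary access, rather than a scan over every document.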
LLM & Generative AI Data Engineering
Multi-hop Retrieval
다중 홉 검색
Multi-hop retrieval is a technique where an AI system answers complex queries by sequentially retrieving and connecting …
LLM & Generative AI Data Engineering
RAG
검색 증강 생성
Retrieval-Augmented Generation (RAG) couples a retriever with a generator so an LLM conditions on top-K query-relevant c…
LLM & Generative AI Data Engineering
Re-ranking
리랭킹
Re-ranking is a second-stage ranking step in RAG and search pipelines that jointly processes the user query with each in…
Data Engineering LLM & Generative AI
RRF
상호 순위 융합
Reciprocal Rank Fusion (RRF) is a rank-aggregation algorithm that merges result lists from different retrievers by ignor…
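A minimal sketch of RRF, assuming each retriever returns an ordered list of document ids with the best result first. Each document receives `1 / (k + rank)` from every list it appears in, and the contributions are summed. The constant `k=60` is a commonly used default, taken here as an assumption; the tie-break by document id and the name `reciprocal_rank_fusion` are illustrative choices.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of doc ids into one list, best first."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the rank matters; the retrievers' raw scores are never used.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

lexical = ["a", "b", "c"]   # hypothetical keyword-search results
vector = ["b", "d", "a"]    # hypothetical vector-search results
fused = reciprocal_rank_fusion([lexical, vector])
```

Here `b` ends up ahead of `a` because it is ranked near the top of both lists, while `d` and `c` each appear in only one.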
Data Engineering LLM & Generative AI ML Fundamentals
Synthetic Data
합성 데이터
Synthetic data is artificially generated data created from rules, simulations, statistical models, or generative models …
Data Engineering LLM & Generative AI
Vector Database
벡터 데이터베이스
A vector database is a specialized storage and retrieval system for embeddings that supports low-latency nearest-neighbo…
Data Engineering
Vector Search
벡터 검색
Vector search is a similarity-based retrieval method that represents documents and queries as embedding vectors in the s…