Infra & Hardware
GPU, cloud, MLOps, deployment, optimization
16 terms
AI-native framework
AI 네이티브 프레임워크
An AI-native framework refers to systems or processes designed from the ground up with AI as a core component, with AI c…
Bedrock
베드록
Amazon Bedrock is an AWS fully managed service that provides access to high-performing foundation models from multiple p…
cuDNN
cuDNN
cuDNN is a GPU-accelerated library from NVIDIA that provides highly optimized implementations of core deep learning oper…
edge deployment
에지 배포
Edge deployment means running AI models or apps close to where data is created — for example on factory lines, inside re…
FlashAttention-4
플래시어텐션-4
FlashAttention-4 is a highly optimized GPU kernel for computing 'attention' operations in large-scale AI models, deliver…
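To make concrete what FlashAttention-style kernels compute, here is a minimal pure-Python reference for the attention operation, softmax(QK^T / √d)·V. This is a readability sketch only; the whole point of FlashAttention is to compute the same result with far better GPU memory access patterns.

```python
import math

def attention(Q, K, V):
    """Naive attention: softmax(Q K^T / sqrt(d)) V, one query row at a time."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Scaled dot-product scores against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        m = max(scores)                      # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # softmax over keys
        # Weighted sum of value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```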
GPU
그래픽 처리 장치
A GPU (Graphics Processing Unit) is a processor built with thousands of small cores to execute many operations in parall…
GPU cluster
GPU 클러스터
A GPU cluster is a system where multiple GPUs (Graphics Processing Units) are networked together to function as a single…
inference cost
추론 비용
Inference cost is the ongoing cost of running data through a trained AI model to produce an output—like a prediction, ge…
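As an illustration, inference cost is often estimated per request and scaled by traffic. The sketch below uses hypothetical placeholder prices, not any provider's real rates:

```python
# Back-of-envelope inference cost estimate.
# Prices are illustrative placeholders, not any provider's real rates.
PRICE_PER_1K_INPUT_TOKENS = 0.0005   # USD per 1,000 input tokens (hypothetical)
PRICE_PER_1K_OUTPUT_TOKENS = 0.0015  # USD per 1,000 output tokens (hypothetical)

def monthly_inference_cost(requests_per_day: int,
                           input_tokens: int,
                           output_tokens: int,
                           days: int = 30) -> float:
    """Estimate monthly spend for a fixed per-request token profile."""
    per_request = (input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
                   + output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS)
    return per_request * requests_per_day * days

cost = monthly_inference_cost(requests_per_day=10_000,
                              input_tokens=500, output_tokens=200)
print(f"${cost:,.2f} per month")  # → $165.00 per month
```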
inference latency
추론 지연 시간
Inference latency is the actual time it takes for an AI model to process an input and return an output. It typically ref…
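A minimal sketch of how inference latency is commonly measured: warm up first, time repeated calls with a high-resolution clock, and report the median. The "model" here is a stand-in function, not a real network:

```python
import time

def measure_latency_ms(model_fn, inputs, warmup=3, runs=20):
    """Time a single-input inference call; return the median latency in ms."""
    for _ in range(warmup):          # warm caches/JIT before timing
        model_fn(inputs)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        model_fn(inputs)
        samples.append((time.perf_counter() - t0) * 1000)
    samples.sort()
    return samples[len(samples) // 2]

# Example with a stand-in "model": a pure-Python dot product.
vec = list(range(10_000))
latency = measure_latency_ms(lambda v: sum(x * x for x in v), vec)
print(f"median latency: {latency:.3f} ms")
```

Reporting a median (or a percentile such as p95) rather than a single run avoids being skewed by one-off stalls.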
model parallelism
모델 병렬 처리
Model parallelism is a technique where a large AI model is partitioned across multiple devices (such as GPUs), with each…
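A toy sketch of the idea, with two simulated "devices" each holding half of a four-layer model. The layers and partition below are invented for illustration; real frameworks transfer activations between GPUs over an interconnect:

```python
# Toy model parallelism: a "model" of four matrix layers is partitioned
# across two simulated devices; activations flow from device to device.

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

# Four 2x2 layers with simple integer weights, easy to follow by hand.
layers = [[[1, 0], [0, 1]], [[2, 0], [0, 2]],
          [[1, 1], [0, 1]], [[1, 0], [1, 1]]]

# Partition: device 0 holds layers 0-1, device 1 holds layers 2-3.
device0, device1 = layers[:2], layers[2:]

def forward(x):
    for layer in device0:   # runs on "device 0"
        x = matvec(layer, x)
    # In a real system the activation is transferred over the interconnect here.
    for layer in device1:   # runs on "device 1"
        x = matvec(layer, x)
    return x

print(forward([1, 1]))  # → [4, 6]
```

Because each device only stores its own layers, the model as a whole can exceed any single device's memory.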
Nvidia
엔비디아
Nvidia is a technology company best known for its graphics processing units (GPUs) and a full-stack AI platform that inc…
NVIDIA Blackwell
NVIDIA 블랙웰
NVIDIA Blackwell is a GPU architecture designed for next-generation AI performance, serving as a core technology for AI …
NVIDIA DGX Cloud
NVIDIA DGX 클라우드
NVIDIA DGX Cloud is a cloud-based AI supercomputer designed for large-scale AI development. It provides a comprehensive …
on-device AI
온디바이스 AI
On-device AI means running artificial intelligence directly on your own device—like a phone, laptop, or tablet—instead o…
real-time inference
실시간 추론
Real-time inference refers to the process where a trained machine learning model accepts live input data and generates p…
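A minimal serving-loop sketch that enforces a latency budget for real-time inference. The budget value and the fallback policy are illustrative assumptions, and `predict` is a stand-in for a trained model's forward pass:

```python
import time

LATENCY_BUDGET_MS = 50  # illustrative SLO, not a universal number

def predict(x):
    """Stand-in for a trained model's forward pass."""
    return x * 2

def serve(x, fallback=0):
    """Run inference; return a fallback answer if the budget is exceeded."""
    start = time.perf_counter()
    result = predict(x)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result if elapsed_ms <= LATENCY_BUDGET_MS else fallback

print(serve(21))
```

Falling back to a cached or default answer keeps downstream systems responsive when the model occasionally runs slow.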
Trainium
트레이늄
Trainium is Amazon Web Services’ (AWS) custom AI training chip, designed to train large deep learning models—especially …