DeepSpeed 문서읽기 1

Extreme Speed and Scale for DL Training and Inference

Train/Inference dense or sparse models with billions or trillions of parameters
Achieve excellent system throughput and efficiently scale to thousands of GPUs
Train/Inference on resource constrained GPU systems
Achieve unprecedented low latency and high throughput for inference
Achieve extreme compression for an unparalleled inference latency and model size reduction with low costs

결국, 강조하고 싶은 부분을 세가지 키워드로 요약하자면 Speed, Scale, Compression 입니다.

즉, large model을 학습/추론하기 위해 여러 대의 GPU로 스케일링하여 분산처리하고, 해당 과정에서 latency 및 throughput을 위한 compression까지 제공합니다.

deepspeed는 Training, Inference, Compression의 세가지 영역에서 혁신을 제공합니다.

Untitled