AMD, the AMD Arrow logo, AMD Instinct, Ryzen and combinations thereof, are trademarks of Advanced Micro Devices, Inc. Other names are for informational purposes only and may be trademarks of their respective owners.
____________________________
1 Measurements conducted by AMD Performance Labs as of November 11th, 2023 on the AMD Instinct MI300X (750W) GPU designed with AMD CDNA 3 5nm | 6nm FinFET process technology at 2,100 MHz peak boost engine clock resulted in 163.4 TFLOPs peak theoretical double precision Matrix (FP64 Matrix), 81.7 TFLOPs peak theoretical double precision (FP64), 163.4 TFLOPs peak theoretical single precision Matrix (FP32 Matrix), 163.4 TFLOPs peak theoretical single precision (FP32), 653.7 TFLOPs peak theoretical TensorFloat-32 (TF32), 1307.4 TFLOPs peak theoretical half precision (FP16), 1307.4 TFLOPs peak theoretical Bfloat16 format precision (BF16), 2614.9 TFLOPs peak theoretical 8-bit precision (FP8), and 2614.9 TOPs peak theoretical INT8 integer performance.
Published results on the Nvidia H100 SXM (80GB) GPU resulted in 66.9 TFLOPs peak theoretical double precision tensor (FP64 Tensor), 33.5 TFLOPs peak theoretical double precision (FP64), 66.9 TFLOPs peak theoretical single precision (FP32), 494.7 TFLOPs peak theoretical TensorFloat-32 (TF32)*, 989.4 TFLOPs peak theoretical half precision tensor (FP16 Tensor), 133.8 TFLOPs peak theoretical half precision (FP16), 989.4 TFLOPs peak theoretical Bfloat16 tensor format precision (BF16 Tensor), 133.8 TFLOPs peak theoretical Bfloat16 format precision (BF16), 1,978.9 TFLOPs peak theoretical 8-bit precision (FP8), and 1,978.9 TOPs peak theoretical INT8 integer performance.
Nvidia H100 source:
https://resources.nvidia.com/en-us-tensor-core/
* Nvidia H100 GPUs don't support FP32 Tensor.
MI300-18
2 Text generated with Llama2-70b chat using an input sequence length of 4096 and 32 output tokens, compared using a custom Docker container for each system, based on AMD internal testing as of 11/17/2023.
Configurations: 2P Intel Xeon Platinum CPU server using 4x AMD Instinct MI300X (192GB, 750W) GPUs, ROCm® 6.0 pre-release, PyTorch 2.2.0, vLLM for ROCm, Ubuntu® 22.04.2. Vs. 2P AMD EPYC 7763 CPU server using 4x AMD Instinct MI250 (128 GB HBM2e, 560W) GPUs, ROCm® 5.4.3, PyTorch 2.0.0, HuggingFace Transformers 4.35.0, Ubuntu 22.04.6.
Four GPUs were used on each system in this test. Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations. MI300-33
3 An AMD Ryzen "Strix Point" processor is projected to offer 3x faster NPU performance for AI workloads when compared to an AMD Ryzen 7040 Series processor. Performance projection by AMD engineering staff. Engineering projections are not a guarantee of final performance. Specific projections are based on reference design platforms and are subject to change when final products are released in market. STX-01.
Contact:
Brandi Martina
AMD Communications
(512) 705-1720
Suresh Bhaskaran
AMD Investor Relations
(408) 749-2845