AMD unveils its MI100 GPU, said to be its most powerful silicon for supercomputers, high-end AI processing

AMD announced on Monday its Instinct MI100 accelerator, a GPU aimed at speeding up AI and other vector-heavy work done by supercomputers and high-end servers.
This is a 7nm TSMC-fabricated chip code-named Arcturus, and is the first to feature AMD’s CDNA architecture. We’re told the hardware features 120 compute units and 7,680 stream processors capable of performing up to 11.5 TFLOPS of FP64 math. The silicon peaks at 184.6 TFLOPS for FP16 matrix operations, and 92.3 TFLOPS for trendy bfloat16 math, AMD boasted. It ships on a PCIe card.
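That 11.5 TFLOPS figure falls out of the shader count and clock speed. Here's a back-of-the-envelope check, assuming a peak engine clock of roughly 1,502 MHz and an FP64 rate of half the FP32 rate, neither of which AMD stated in its announcement:

```python
# Rough sanity check of the quoted peak throughput figures.
# Assumptions (not from AMD's announcement): ~1,502 MHz peak clock,
# one fused multiply-add (2 FLOPs) per stream processor per cycle
# at FP32, and FP64 running at half the FP32 rate.
STREAM_PROCESSORS = 7_680
PEAK_CLOCK_HZ = 1.502e9
FLOPS_PER_FMA = 2

fp32_tflops = STREAM_PROCESSORS * FLOPS_PER_FMA * PEAK_CLOCK_HZ / 1e12
fp64_tflops = fp32_tflops / 2

print(f"FP32: {fp32_tflops:.1f} TFLOPS")  # ~23.1
print(f"FP64: {fp64_tflops:.1f} TFLOPS")  # ~11.5, matching AMD's figure
```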
“Today AMD takes a major step forward in the journey toward exascale computing as we unveil the AMD Instinct MI100 – the world’s fastest HPC GPU,” said Brad McCredie, AMD’s corporate veep of datacenter GPU and accelerated processing.
“Squarely targeted toward the workloads that matter in scientific computing, our latest accelerator, when combined with the AMD ROCm open software platform, is designed to provide scientists and researchers a superior foundation for their work in HPC.”
AMD did not reveal the number of transistors or the die size to reporters in a briefing last week. The specs that are public, however, show that each chip uses PCIe 4 to interface, contains 32GB of HBM2 memory, can sustain a maximum of 1.2 TB per second of memory bandwidth, and has a max TDP of 300W. Each card can also shuttle up to 340GB per second over its three AMD Infinity Fabric links.
The MI100 accelerator is designed to compete against Nvidia’s latest A100 GPUs, and the two parts each score some wins against the other. For example, the A100 has more RAM and memory bandwidth (up to 80GB and 2,039 GB/s); the MI100 has higher FP64 performance (11.5 TFLOPS versus the A100’s 9.7 TFLOPS); and the A100, it seems, has higher performance at lower precision, along with a higher max TDP. Whichever one gives you more bang for your buck depends on the workload.
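Those lower-precision modes trade accuracy for throughput. The bfloat16 format keeps float32's eight-bit exponent, and thus its range, but only seven stored bits of mantissa, so it can be sketched in plain Python by truncating a float32 to its top 16 bits. This is an illustration of the format's precision loss, not of how the silicon implements it (real hardware rounds rather than truncates):

```python
import struct

def to_bfloat16(x: float) -> float:
    """Approximate bfloat16 by keeping the top 16 bits of a float32.

    Real hardware uses round-to-nearest-even; simple truncation is
    used here to keep the sketch short.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

print(to_bfloat16(3.141592653589793))  # → 3.140625
```

With only seven mantissa bits, pi survives to about three significant figures. That's fine for many machine-learning workloads, and useless for the FP64-heavy simulations the previous paragraph's comparison turns on.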
AMD reckons its MI100 accelerator will offer customers a cheaper pathway towards building exascale supercomputers. The chip gives users 1.8X to 2.1X more performance per dollar compared to the A100, it claimed. It’s supported by AMD’s ROCm 4.0 open source platform, which can accelerate the machine-learning frameworks PyTorch and TensorFlow.
Designed to be used alongside AMD’s Epyc server processors, the MI100 GPUs are expected to crunch through heavy machine-learning workloads and simulations for things like climate modelling, astrophysics, and fluid dynamics. The MI100 will be available through various vendors, including HPE, Dell, Supermicro, and Gigabyte, and is expected to start shipping this month. ®