AMD's Instinct MI355X accelerator will reportedly consume 1,400 watts

(Image credit: AMD)

Mark Papermaster, AMD's chief technology officer, formally introduced the company's Instinct MI355X accelerators for AI and HPC at ISC 2025, revealing massive performance improvements for AI inference but also nearly doubled power consumption for the new flagship GPU compared to its 2023 predecessor, reports ComputerBase.

AMD's CDNA 4 enters the scene

AMD's Instinct MI350X-series GPUs are based on the CDNA 4 architecture, which introduces support for the FP4 and FP6 precision formats alongside FP8 and FP16. These lower-precision formats have grown in relevance for AI workloads, particularly for inference. AMD positions its Instinct MI350X processors primarily for inference, which makes sense given that the scale-out world size of the MI350X remains limited to eight GPUs, reducing its competitiveness against Nvidia's Blackwell GPUs. Still, Pegatron is readying a 128-way MI350X machine.

AMD's Instinct MI350X family of AI and HPC GPUs consists of two models: the standard Instinct MI350X module, rated at 1,000W and designed for air cooling, and the higher-performance Instinct MI355X, which will consume up to 1,400W and is designed primarily for direct liquid cooling (although AMD believes some of its clients will be able to air-cool the MI355X as well).

Both SKUs will come with 288GB of HBM3E memory offering up to 8 TB/s of bandwidth, but the MI350X tops out at 18.45 PFLOPS of peak FP4/FP6 compute, whereas the MI355X is said to push peak FP4/FP6 performance to 20.1 PFLOPS. On paper, both Instinct MI350X models outperform Nvidia's B300 (Blackwell Ultra) GPU, which tops out at 15 FP4 PFLOPS, though it remains to be seen how AMD's MI350X and MI355X perform in real-world applications.

| | AMD Instinct MI325X GPU | AMD Instinct MI350X GPU | AMD Instinct MI350X Platform (8x OAM) | AMD Instinct MI355X GPU | AMD Instinct MI355X Platform (8x OAM) |
| --- | --- | --- | --- | --- | --- |
| GPUs | Instinct MI325X OAM | Instinct MI350X OAM | 8x Instinct MI350X OAM | Instinct MI355X OAM | 8x Instinct MI355X OAM |
| GPU Architecture | CDNA 3 | CDNA 4 | CDNA 4 | CDNA 4 | CDNA 4 |
| Dedicated Memory Size | 256 GB HBM3E | 288 GB HBM3E | 2.3 TB HBM3E | 288 GB HBM3E | 2.3 TB HBM3E |
| Memory Bandwidth | 6 TB/s | 8 TB/s | 8 TB/s per OAM | 8 TB/s | 8 TB/s per OAM |
| Peak Half Precision (FP16) Performance | 2.61 PFLOPS | 4.6 PFLOPS | 36.8 PFLOPS | 5.03 PFLOPS | 40.27 PFLOPS |
| Peak Eight-bit Precision (FP8) Performance | 5.22 PFLOPS | 9.228 PFLOPS | 72 PFLOPS | 10.1 PFLOPS | 80.53 PFLOPS |
| Peak Six-bit Precision (FP6) Performance | - | 18.45 PFLOPS | 148 PFLOPS | 20.1 PFLOPS | 161.06 PFLOPS |
| Peak Four-bit Precision (FP4) Performance | - | 18.45 PFLOPS | 148 PFLOPS | 20.1 PFLOPS | 161.06 PFLOPS |
| Cooling | Air | Air | Air | DLC / Air | DLC / Air |
| Typical Board Power (TBP) | 1000W Peak | 1000W Peak | 1000W Peak per OAM | 1400W Peak | 1400W Peak per OAM |

When it comes to performance compared against its predecessor, the FP8 compute throughput of the MI350X is listed at approximately 9.3 PFLOPS, while the faster MI355X is rated at 10.1 PFLOPS, up from 2.61/5.22 FP8 PFLOPS (without/with structured sparsity) in the case of the Instinct MI325X, which represents a significant generational improvement. Meanwhile, the MI355X also edges out Nvidia's B300 by 0.1 FP8 PFLOPS.
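For a rough sense of the generational step, the uplift can be sanity-checked from the peak figures quoted above. The sketch below is just back-of-the-envelope arithmetic on those numbers; the roughly 10 PFLOPS FP8 figure for Nvidia's B300 is an assumption inferred from the 0.1 PFLOPS gap cited here, not an official spec.

```python
# Back-of-the-envelope uplift from the quoted peak FP8 figures (PFLOPS).
mi325x_fp8_sparse = 5.22   # MI325X, with structured sparsity
mi350x_fp8 = 9.3           # MI350X, as listed above
mi355x_fp8 = 10.1          # MI355X, as listed above
b300_fp8 = 10.0            # assumed from the "0.1 PFLOPS" gap mentioned in the text

print(f"MI350X vs MI325X: {mi350x_fp8 / mi325x_fp8_sparse:.2f}x")   # ~1.78x
print(f"MI355X vs MI325X: {mi355x_fp8 / mi325x_fp8_sparse:.2f}x")   # ~1.93x
print(f"MI355X vs B300:  +{mi355x_fp8 - b300_fp8:.1f} PFLOPS")      # ~+0.1
```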

Faster GPUs incoming

Papermaster expressed confidence that the industry will continue to develop even more powerful CPUs and accelerators for supercomputers to achieve zettascale performance about a decade from now. However, that performance will come at the cost of a steep increase in power consumption, which is why a supercomputer delivering ZettaFLOPS-class performance could consume 500 MW of power, about half of what a nuclear power plant can produce.

At ISC 2025, AMD presented data showing that top supercomputers have consistently followed a trajectory where compute performance doubles roughly every 1.2 years. The graph covered performance from 1990 to the present, charting peak system GFLOPS. Early growth was driven by CPU-only systems, but from around 2005, a shift to heterogeneous architectures mixing CPUs with GPUs and accelerators took over. Now, in what AMD calls the 'AI Acceleration Era,' systems like El Capitan and Frontier are pushing beyond 1 ExaFLOP, continuing the exponential growth trend with increasingly AI-specialized hardware.
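As a rough illustration of where that cadence leads (my extrapolation from the slide's stated doubling rate, not an AMD figure), extending the trend from today's roughly 1-ExaFLOPS leaders points to zettascale in a little over a decade, consistent with Papermaster's timeline:

```python
import math

# Extrapolating the "doubling roughly every 1.2 years" cadence from AMD's slide.
# Assumes today's leading systems sit at ~1 ExaFLOPS (El Capitan / Frontier class).
doubling_period_years = 1.2
growth_needed = 1000            # 1 ZettaFLOPS / 1 ExaFLOPS

doublings = math.log2(growth_needed)                       # ~9.97 doublings
years_to_zettascale = doublings * doubling_period_years
print(f"~{years_to_zettascale:.1f} years to zettascale")   # ~12.0 years
```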


But performance comes at the cost of power consumption. To maintain performance growth, memory bandwidth and power scaling have become urgent challenges. AMD's slide indicated that GPU memory bandwidth must more than double every two years to preserve the ratio of bandwidth per FLOPS. This has required increasing the number of HBM stacks per GPU, which in turn results in larger and more power-hungry GPUs and modules.
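To put that bandwidth pressure in concrete terms, here is a minimal sketch. It assumes roughly 1 TB/s per HBM3E stack, which is consistent with the 8 TB/s the MI350X spreads across its eight stacks; the growth factors are illustrative, not AMD's roadmap.

```python
# Illustrative sketch of the HBM scaling pressure AMD describes: if total GPU memory
# bandwidth must double every two years while per-stack bandwidth stays roughly flat,
# the stack count (and with it package size and power) has to climb.
PER_STACK_TB_S = 1.0        # assumed HBM3E bandwidth per stack (~8 TB/s over 8 stacks)
start_bandwidth_tb_s = 8.0  # MI350X/MI355X figure from the table above

for years in (0, 2, 4, 6):
    required = start_bandwidth_tb_s * 2 ** (years / 2)   # "double every two years"
    stacks = required / PER_STACK_TB_S
    print(f"+{years} yr: ~{required:.0f} TB/s -> ~{stacks:.0f} stacks "
          f"at {PER_STACK_TB_S:.0f} TB/s each")
```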

Indeed, the power consumption of accelerators for supercomputers is increasing rapidly. While AMD's Instinct MI300X, introduced in mid-2023, had a peak power consumption of 750W, the Instinct MI355X, set to be formally unveiled this week, will feature a peak power consumption of 1,400W. Papermaster envisions 1,600W accelerators in 2026 to 2027 and then 2,000W processors later this decade. By contrast, AMD's rivals at Nvidia seem to be even more ambitious when it comes to power consumption, as their Rubin Ultra GPUs, featuring four reticle-sized compute chiplets, are expected to consume as much as 3,600W.

The good news is that, alongside the increases in power consumption, supercomputers and accelerators have also been gaining energy efficiency rapidly. Another of AMD's ISC 2025 keynote slides illustrated that efficiency increased from about 3.2 GFLOPS/W in 2010 to approximately 52 GFLOPS/W by the time exascale systems like Frontier arrived.

Looking ahead, maintaining this pace of performance scaling will require doubling energy efficiency every 2.2 years. A projected zettascale system delivering 1,000x exaflop-class performance would need around 500 MW of power at an efficiency level of 2,140 GFLOPS/W (a 41-fold increase over today). Without such gains, future supercomputers could demand gigawatt-scale energy, comparable to an entire nuclear power plant, making them prohibitively expensive to operate.
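Those figures hang together, as a quick check shows. This is just arithmetic on the numbers quoted above, assuming 1 ZettaFLOPS means 10^21 FLOPS:

```python
import math

# Quick consistency check of the zettascale projection quoted above.
target_flops = 1e21            # 1 ZettaFLOPS
efficiency_gflops_w = 2140     # projected efficiency, GFLOPS/W
today_gflops_w = 52            # Frontier-era efficiency from AMD's slide
doubling_period_years = 2.2    # required efficiency-doubling cadence

power_mw = target_flops / (efficiency_gflops_w * 1e9) / 1e6
gain = efficiency_gflops_w / today_gflops_w
years = doubling_period_years * math.log2(gain)

print(f"Power at 2,140 GFLOPS/W:  ~{power_mw:.0f} MW")   # ~467 MW, i.e. roughly 500 MW
print(f"Efficiency gain needed:   ~{gain:.0f}x")          # ~41x
print(f"Years at 2.2-yr doubling: ~{years:.0f}")          # ~12 years
```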

AMD believes that to increase the performance of supercomputers dramatically a decade from now, not only will it need to make a number of architectural breakthroughs, but the industry will also have to provide memory bandwidth that keeps pace with compute capabilities. Still, using nuclear reactors to power supercomputers in the 2030s looks like an increasingly realistic possibility.


Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.
