Comments on: Top500 Supercomputers: Who Gets The Most Out Of Peak Performance?
https://www.nextplatform.com/2023/11/13/top500-supercomputers-who-gets-the-most-out-of-peak-performance/
In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.
Last updated: Fri, 09 Feb 2024 15:45:31 +0000

By: Hubert (Wed, 22 Nov 2023 04:50:01 +0000)
https://www.nextplatform.com/2023/11/13/top500-supercomputers-who-gets-the-most-out-of-peak-performance/#comment-216626

Congratulations to all future HPC professional experts who took part in the SC23 Student Cluster Competition (Smackdown: https://www.studentclustercompetition.us/2023/index.html)! Congrats to the Indy winners in Zhejiang, Finland, and Córdoba! Congrats also to the Peking Radiance of Weiming team (EPYC9654+A800) that won HPL (182 TF/s) and MLPerf (tie), the Tsinghua Diablo team (Xeon8490H+H100) that won HPCG (3 TF/s), and the UCSD Triton LLC team (SuperMicro EPYC9654+MI210 mini-Frontier) that won MLPerf (tie)!

Mostly though, in the spirit of HPC Gastronomy, a hearty digestive congratulation to the SCC23 Overall Winning team:

The 4-node Piora Swiss Cheese Racklette Fondue (Milan+A100), from ETH Zürich

(kid you not!) Bon HPC appétit!

By: Slim Albert (Mon, 20 Nov 2023 07:20:44 +0000)
https://www.nextplatform.com/2023/11/13/top500-supercomputers-who-gets-the-most-out-of-peak-performance/#comment-216495

In reply to Hubert.

Hubert, it is notable, I think, that Frontier already gets 10 EF/s in HPL-MxP and so there should certainly be hope for 1 km, even today (networking seems ok). Converting SCREAM to mixed-precision might be a demanding task, but should be well worth it in the long run, especially if the so-developed methods also apply broadly to other models. From such efforts, NOAA could finally get its 1000x Cactus and Dogwood (#75-76 at 10 PF/s) for 1-km forecasts ( https://www.nextplatform.com/2022/06/28/noaa-gets-3x-more-oomph-for-weather-forecasting-it-needs-3300x/ )!
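For readers wondering where the HPL-MxP speedup comes from: the expensive O(N^3) factorization is done in low precision, and double-precision accuracy is recovered afterwards by iterative refinement on the residual. A minimal NumPy sketch of that idea (not the benchmark code; a real implementation would factor once and reuse the low-precision LU factors rather than re-solving, and would use FP16/tensor cores rather than float32):

```python
import numpy as np

def mixed_precision_solve(A, b, iters=5):
    """Solve Ax = b: heavy lifting in float32, residual refinement in float64."""
    A32 = A.astype(np.float32)
    # Low-precision solve stands in for the O(N^3) low-precision LU stage.
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x  # residual computed in full float64, only O(N^2) work
        d = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
        x += d         # each pass shrinks the error by roughly cond(A)*eps32
    return x

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))  # residual near float64 epsilon
```

The point is that the refinement loop costs only O(N^2) per pass, so almost all the arithmetic runs at the low-precision (much higher) flop rate while the answer still meets the FP64 accuracy bar.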

The new 40 million-core Sunway team is also doing well here with 5 EF/s in MxP (SC23) described in detail in Open Access ( https://dl.acm.org/doi/10.1145/3581784.3607030 ) which should be good for 1.4 km x-y grids. The EU’s LUMI and Japan’s Fugaku, at 2.4 and 2.0 EF/s MxP, respectively, might do nicely at around 2 km of horizontal resolution.

Hopefully, SCREAM’s vertical-Hamiltonian, horizontal-spectral, and temporal IMEX high-CFL RK, provide a bounty of opportunities for accurate, effective, and stable mixed-precision implementations.

The 10 EF/s 1-km goal also makes me wonder if anything interesting might have come out of last year’s DOE/Oak-Ridge “Advanced Computing Ecosystem” RFI, aimed at that performance range ( https://www.nextplatform.com/2022/06/30/so-you-think-you-can-design-a-20-exaflops-supercomputer/ )? If MxP were to give a 5x to 10x speedup on those, we’d be looking at 100 EF/s of practical effective oomph (one full tentacle of ZettaCthulhu! eh-eh-eh!)!

By: Hubert (Sat, 18 Nov 2023 03:04:20 +0000)
https://www.nextplatform.com/2023/11/13/top500-supercomputers-who-gets-the-most-out-of-peak-performance/#comment-216428

Very loud congratulations to everyone in the 19-member Exascale SCREAM team (Taylor, Guba, Sarat Sreepathi, …) for winning the first ACM Gordon Bell Prize for Climate Modelling (now awarded: https://awards.acm.org/bell-climate , https://sc23.supercomputing.org/program/awards/ )!

Keep up the great work, all the way to 1 km at 10 EF/s, and beyond!

By: Slim Albert (Fri, 17 Nov 2023 03:36:36 +0000)
https://www.nextplatform.com/2023/11/13/top500-supercomputers-who-gets-the-most-out-of-peak-performance/#comment-216386

In reply to Slim Jim.

I join you in your congratulations, and add to them the 3 new CPU-only systems that are in the Top-25 for both HPL and HPCG:

MareNostrum 5 GPP (Xeon), Shaheen III-CPU (EPYC), and Crossroads (Xeon)

Easy to program, flexible, and powerful; what’s not to like!

By: Matt (Wed, 15 Nov 2023 07:36:32 +0000)
https://www.nextplatform.com/2023/11/13/top500-supercomputers-who-gets-the-most-out-of-peak-performance/#comment-216318

In reply to Paul Berry.

Speaking of which, it was mentioned to me recently, with some irony, that the cost of building power infrastructure for a Linpack run could be saved, since we never run the machine that hard again after acceptance.

By: Hubert (Wed, 15 Nov 2023 06:50:41 +0000)
https://www.nextplatform.com/2023/11/13/top500-supercomputers-who-gets-the-most-out-of-peak-performance/#comment-216316

In reply to Nebojsa Novakovic.

Speaking of the Gordon Bell Prize (GBP): Sarat of E3SM SCREAM! just commented (back on the GBP TNP piece https://www.nextplatform.com/2023/09/15/chinas-1-5-exaflops-supercomputer-chases-gordon-bell-prize-again/ ) that their very detailed and impressive physically-based (PDEs) whole-earth cloud-cover prediction presentation, at 3.25 km horizontal resolution (128 vertical levels, 10 billion parameters for physics + dynamics, MI250X ROCm and V100 CUDA), is also Wednesday morning, Nov. 15 ( https://sc23.conference-program.com/presentation/?id=gbv102&sess=sess298 ), with an Open Access paper ( https://dl.acm.org/doi/10.1145/3581784.3627044 ). A must-see by all accounts!

Interestingly, the European equivalent to SCREAM! is the model named EXCLAIM! (go figure!)! (eh-eh-eh!)

By: Slim Jim (Wed, 15 Nov 2023 04:46:49 +0000)
https://www.nextplatform.com/2023/11/13/top500-supercomputers-who-gets-the-most-out-of-peak-performance/#comment-216312

This Top500 list sure has the most hopscotch-leapfrogging around expectations I’ve seen in a while … but I’m glad Aurora (sleeping beauty) came out of its coma after being kissed by skunkworks engineer(s), even if it’s not yet in 100% shape. Doing the 585 PF/s tango on a first outing, with all-new glitter and gear (SR+PV), is nothing to be ashamed of in my mind (#2)!

Cloud HPC, which on the face of it seemed like a very dumb idea until recently, is apparently not that bad at all, with MS’ Azure Eagle machine hitting a very impressive 561 PF/s on HPL (#3)! Nvidia’s Eos SuperPOD (#9) also looks interesting as a pre-built 121 PF/s system that you can just buy, plug in, and run (I guess), rather than having to go through months of instrument tuning and qualifier exams (as in that next log10 perf level: Exaflopping)!

And who could have ever expected the dual Spanish Inquisition minds of MareNostrum 5 ACC and GPP (#8 and #19) that convert HPC unfaithfuls (with surprise, ruthless efficiency, and torture by pillow and comfy chair) at a combined rate of 178 PF/s!?

The list is completely different from my expectations, but a great one nonetheless! Congrats to all, and especially the new entries!

By: Timothy Prickett Morgan (Wed, 15 Nov 2023 04:13:56 +0000)
https://www.nextplatform.com/2023/11/13/top500-supercomputers-who-gets-the-most-out-of-peak-performance/#comment-216310

In reply to Eric Olson.

They were fat for their time, I suppose. I would like 1,024-bit vectors. HA!

By: Eric Olson (Tue, 14 Nov 2023 23:41:14 +0000)
https://www.nextplatform.com/2023/11/13/top500-supercomputers-who-gets-the-most-out-of-peak-performance/#comment-216296

In reply to Timothy Prickett Morgan.

I think staying up without losing a node is important. How much wall time do these exascale HPL computations take anyway?
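On wall time: HPL performs roughly (2/3)N^3 + 2N^2 floating-point operations for an N x N problem, so a back-of-the-envelope estimate is just flops divided by Rmax. A sketch, assuming the run sustains Rmax throughout and using an illustrative problem size rather than any published run parameters:

```python
def hpl_hours(n, rmax_flops_per_s):
    """Estimated HPL wall time in hours for problem size n at sustained Rmax."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # HPL's nominal operation count
    return flops / rmax_flops_per_s / 3600.0

# A Frontier-class machine: Rmax ~ 1.2e18 flop/s; N in the tens of millions
# is the scale needed to fill the aggregate memory (illustrative guess).
print(f"{hpl_hours(24_000_000, 1.2e18):.1f} hours")  # about 2 hours
```

Since the flop count grows as N^3 while memory (and thus the largest usable N) grows as N^2, bigger machines tend to need longer runs to reach their best efficiency, which is part of why staying up without losing a node matters.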

I’m confused why the A64FX is said to be “using special accelerators that put what is in essence a CPU and a fat vector on a single-socket processor.” I thought the Scalable Vector Extension on the A64FX was 512 bits wide. Isn’t that the same width as AVX-512 on the Xeon?

Maybe the difference is the integrated HBM? But now Xeon MAX also has that.

I appreciate your analysis and find it amazing how much fun people have with the high-performance Linpack. Whether a relevant indicator for practical computation or not, HPL is still a good stress test to make sure the hardware works and meets design specifications.

By: Timothy Prickett Morgan (Tue, 14 Nov 2023 23:17:37 +0000)
https://www.nextplatform.com/2023/11/13/top500-supercomputers-who-gets-the-most-out-of-peak-performance/#comment-216295

In reply to Hubert.

That smells like a typo to me….
