Comments on: Power Efficiency, Customization Will Drive Arm’s Role In AI https://www.nextplatform.com/2024/04/17/power-efficiency-customization-will-drive-arms-role-in-ai/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds. Thu, 25 Apr 2024 17:38:24 +0000 hourly 1 https://wordpress.org/?v=6.5.5 By: Ranga https://www.nextplatform.com/2024/04/17/power-efficiency-customization-will-drive-arms-role-in-ai/#comment-223539 Wed, 24 Apr 2024 06:46:23 +0000 https://www.nextplatform.com/?p=143999#comment-223539 While the Arm power-efficiency claims are impressive, the lack of real data is not. Is there a single CPU-to-CPU perf/W comparison of best-in-class x86 and best-in-class Arm from the currently shipping generation of CPUs? Also, how much of the GH or GB efficiency benefit comes purely from the GPU and the NVLink between the CPU and the GPU, versus something fundamental in the Arm architecture? Inquiring minds want to know 🙂

]]>
By: Calamity Jim https://www.nextplatform.com/2024/04/17/power-efficiency-customization-will-drive-arms-role-in-ai/#comment-223410 Sun, 21 Apr 2024 19:21:21 +0000 https://www.nextplatform.com/?p=143999#comment-223410 In reply to Slim Albert.

Good points! Overall efficiency improvements in hybrid CPU+accelerator architectures may hinge on whether some type of RISC-vs-CISC duality can also be identified in mixed-precision matrix-vector units. This could be key to bringing us closer to zettascale, as Kathy Yelick will discuss in her ISC 2024 keynote ( https://www.isc-hpc.com/conference-keynote-2024.html ).

Scaling Frontier straight to 1.2 ZF/s would see it consume 23 GW of juice. Assuming that MxP is acceptable for FP64 accuracy and gives a 10x performance boost reduces that to 2.3 GW — which is already less than Microsoft’s 5 GW “death”-star-gate AI project ( https://www.nextplatform.com/2024/04/01/microsoft-stargate-the-next-ai-platform-will-be-an-entire-cloud-region/ , https://www.theregister.com/2024/04/01/microsoft_openai_5gw_dc/ ). Switching to a RISC CPU and RISC accelerator (if possible) might get this down to 1 GW, still large, but 30 years ago (June 1994) the Top500 #1 was Sandia’s XP/S140 at just 184 kW (and 0.14 TF/s). If it was acceptable to increase power consumption by 100x to get from there to the ExaFlop of Frontier, maybe it is okay to increase it another 50x for an FP64 MxP ZettaFlopper?
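The back-of-envelope arithmetic above can be sketched out like so (figures are the rough ones quoted in this thread: Frontier at ~1.2 EF/s and ~23 MW, an assumed 10x MxP speedup at FP64-equivalent accuracy):

```python
# Rough power scaling from exascale to zettascale, using the thread's figures.
frontier_flops = 1.2e18   # ~1.2 EF/s
frontier_power_w = 23e6   # ~23 MW

target_flops = 1.2e21     # 1.2 ZF/s

# Naive linear scaling: 1000x the performance needs 1000x the power.
naive_power_w = frontier_power_w * (target_flops / frontier_flops)
print(f"naive zettascale: {naive_power_w / 1e9:.1f} GW")   # 23.0 GW

# Assumed 10x mixed-precision (MxP) boost at acceptable FP64 accuracy.
mxp_speedup = 10
mxp_power_w = naive_power_w / mxp_speedup
print(f"with 10x MxP:     {mxp_power_w / 1e9:.1f} GW")     # 2.3 GW

# That is still 100x Frontier's draw, echoing the 1994-to-Frontier jump.
print(f"vs Frontier:      {mxp_power_w / frontier_power_w:.0f}x")
```

The hypothetical RISC-CPU-plus-RISC-accelerator saving down to 1 GW is not modeled here, since no factor for it is given.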

]]>
By: Slim Albert https://www.nextplatform.com/2024/04/17/power-efficiency-customization-will-drive-arms-role-in-ai/#comment-223346 Sat, 20 Apr 2024 06:16:14 +0000 https://www.nextplatform.com/?p=143999#comment-223346 In reply to HuMo.

Thanks for the correction. I see it now: Green500 #79 Leonardo-CPU ( https://top500.org/lists/green500/list/2023/11/ ), right there with MareNostrum 5 GPP, Snellius Phase 2 CPU, Shaheen III – CPU, … I’m glad I had it wrong (it puts my mind back at ease to see the expected 2x-3x better efficiency for A64FX Arm in CPU-only action. The universe is back in harmony!).

]]>
By: HuMo https://www.nextplatform.com/2024/04/17/power-efficiency-customization-will-drive-arms-role-in-ai/#comment-223318 Fri, 19 Apr 2024 14:49:59 +0000 https://www.nextplatform.com/?p=143999#comment-223318 In reply to Slim Albert.

Hmmmm … no, and yes! The MN-3 is accelerated by custom MN-Core processors, as you can see here: https://projects.preferred.jp/supercomputers/en/ — it is not a CPU-only system, and so A64FX wins on perf/Watt in that category at 16 GF/W. The first x86-only system looks to be Leonardo at 7 GF/W (Crossroads is near 5 GF/W).

But yes, the Ballroom blitz Battle Royale between the swashbuckling stronger swagger of El Capitan, Santa’s red-nosed rocket sleigh, the Swiss-cheese-grater of the Alps, and this new EXA1 atomic leaping frog, should be most entertaining (not to mention the recently awakened sleeping beauty!)! q^8

Hopefully that all takes place next month, hosted by the delicious Hamburgers of HPC gastronomy! (May 12-16: https://www.isc-hpc.com/ )

]]>
By: Slim Albert https://www.nextplatform.com/2024/04/17/power-efficiency-customization-will-drive-arms-role-in-ai/#comment-223308 Fri, 19 Apr 2024 12:57:59 +0000 https://www.nextplatform.com/?p=143999#comment-223308 I sure am looking forward to seeing some impressive combinations of performance and efficiency from ARM devices in Top500 and MLPerf (for example from Grace-Grace, GH200, and/or GB200). At present though, it bears noting that for CPU-only machines, the highest rank on Green500 is at #13 (MN-3, Xeon Platinum 8260M 24C 2.4GHz), which gets 41 GF/Watt, while the A64FX systems run at approx. 16 GF/Watt (e.g. #48 Central Weather PrimeHPC). I think that when looking at high-performance CPUs (e.g. Neoverse V versus Neoverse N), it remains necessary to pay special attention to the whole system in order for the potential “power-sipping” characteristics of the cores to propagate to the whole machine. Apple’s M-series laptops, for example, co-package DRAM with the CPU for improvements in both performance and efficiency. Similarly, the LPDDR5 used by Nvidia likely helps the Grace systems in power efficiency. Meanwhile, in most AI-oriented systems, the performance of accelerators probably trumps that of CPUs (w.r.t. overall performance and efficiency).

Comparing the performance and efficiency of LANL’s Crossroads (CPU-only) to that of Venado’s Grace-Grace partition, and also that of MI300A-based systems (El Capitan) to Grace-Hopper machines (Venado, Alps, CEA EXA1 HE, …), should be most informative here, IMHO. Maybe we’ll see more power-sipping accelerators from ARM (beyond the 4-TOPS Ethos-U85)?
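For reference, the Green500 efficiency figures traded back and forth in this thread are just HPL Rmax divided by total system power. A minimal sketch (the Rmax and power figures below are illustrative round numbers, not exact list entries):

```python
# Green500-style efficiency metric: GFlop/s per Watt.
def gf_per_watt(rmax_gflops: float, power_kw: float) -> float:
    """HPL Rmax (in GFlop/s) divided by total system power (in Watts)."""
    return rmax_gflops / (power_kw * 1e3)

# Illustrative: a machine with ~2.18 PF/s Rmax drawing ~53.3 kW lands in
# MN-3 territory, near the ~41 GF/W quoted above.
print(round(gf_per_watt(2_181_000, 53.3), 1))  # → 40.9
```

Dividing the quoted 41 GF/W by the 16 GF/W for A64FX also recovers the 2x-3x gap mentioned in the reply downthread.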

]]>
By: JustNobody https://www.nextplatform.com/2024/04/17/power-efficiency-customization-will-drive-arms-role-in-ai/#comment-223253 Thu, 18 Apr 2024 06:17:18 +0000 https://www.nextplatform.com/?p=143999#comment-223253 It strikes me that the big AI models arrived only after the zero/negative interest rate phase had ended, even though most of the model makers are not concerned with profits right now.

The cheap money/easy funding of the past decade got us business models like Uber and DoorDash, with a lot of contenders and little (or negative) margin, desperate to turn spending and market share into margin and profit. It didn’t work well for most of them unless they sold their shop early enough.

AI, and especially GenAI, seems a lot like that. Except this time the biggest players are the ones most able to drop 9-13 figures, and the least likely to make money by being acquired and/or going public for the first time.

There’s a lot of money to be made with AI, but I’m not convinced it’s in the flagship Geminis, Copilots, Llamas and MM1s of this world. I see a lot more revenue in engineering and science applications, as Synopsys seems to envision with Ansys, or in internal models to manage systems/networks and that sort of application. But that’s not as flashy or impressive for the public and boards to drop money on out of FOMO.

]]>