Comments on: Billion-Dollar AI Promise a Bright Spot in Gloomy Quarter for Cisco
https://www.nextplatform.com/2023/11/16/billion-dollar-ai-promise-a-bright-spot-in-gloomy-quarter-for-cisco/
In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.

By: Slim Jim (Fri, 17 Nov 2023 22:29:17 +0000)
https://www.nextplatform.com/2023/11/16/billion-dollar-ai-promise-a-bright-spot-in-gloomy-quarter-for-cisco/#comment-216422
In reply to Slim Albert.

Right on! Lambos don’t get great performance on dirt roads … In the SCREAM paper, the coarse problem’s (110km) strong scaling looks best at around 8 nodes (GPU speedup approx 3.5x), which gives a compute density of roughly 700 spectral elements per node (Table 1). That same density also holds at the upper end of the Frontier scaling runs for the 3.25km grid (6.3 million spectral elements over 8,192 nodes). Neat! Can’t wait for the 10 EF/s box that will run the 1km-grid version of this (with great comps & comms)!
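A quick back-of-envelope check of that density claim (using only the numbers quoted in the comment above, not recomputed from the paper itself):

```python
# Does the 3.25km Frontier run at 8,192 nodes land near the ~700
# spectral-elements-per-node sweet spot seen at the 110km scale?
elements_3km = 6_300_000       # spectral elements in the 3.25km grid
nodes_frontier = 8_192         # upper end of the Frontier strong-scaling run
density = elements_3km / nodes_frontier
print(f"~{density:.0f} spectral elements per node")  # ~769, close to ~700
```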

By: Slim Albert (Thu, 16 Nov 2023 23:05:17 +0000)
https://www.nextplatform.com/2023/11/16/billion-dollar-ai-promise-a-bright-spot-in-gloomy-quarter-for-cisco/#comment-216376

Great and very timely! The MS Azure Eagle placing number 3 on the Top500 strongly suggests (to me at least) that the prospect of cloud HPC, and cloud AI, is becoming much more realistic (unlike RISC-V). Taking that diagonally over to the excellent SC23 GBP-candidate SCREAM paper (E3SM on Frontier and Summit; Taylor, Guba, Sarat Sreepathi, … https://dl.acm.org/doi/10.1145/3581784.3627044 ), and particularly the line with blue circles and the one with red squares in their Figure 3, one finds the following (Section 7.1):

“Running at the strong scaling limit, the performance is dominated by communication costs, and GPU speedups are diminished”

The CPU-only system actually overtakes the GPU-accelerated one at this limit (40 nodes, 110km coarse configuration) because the networking can’t keep up (this is on Perlmutter; Frontier’s networking holds up fine, as shown later in Fig. 4)! All this to say that the user-available performance of the gargantuan machines being fielded these days (for HPC and AI) relies in no small part on equally performant networking, not to be forgotten as we copiously salivate over the specs of mammoth CPUs and elephantine accelerators!
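The crossover can be seen in a toy model (entirely my own sketch with made-up parameters, not the SCREAM paper’s performance model): per-step time is compute work divided across nodes plus a roughly scale-independent communication/offload cost. The GPU computes faster per node but pays a much larger fixed cost per step, so once compute shrinks at the strong-scaling limit the CPU wins.

```python
# Toy strong-scaling model: time = work / (nodes * rate) + fixed cost.
# rate = elements processed per ms per node; fixed_cost = per-step
# communication/offload overhead in ms (assumed scale-independent here).
def step_time(work, nodes, rate, fixed_cost):
    return work / (nodes * rate) + fixed_cost

WORK = 1_000_000  # made-up work units per timestep
for nodes in (1, 8, 40):
    cpu = step_time(WORK, nodes, rate=1.0, fixed_cost=100.0)
    gpu = step_time(WORK, nodes, rate=8.0, fixed_cost=30_000.0)
    winner = "GPU" if gpu < cpu else "CPU"
    print(f"{nodes:>3} nodes: CPU {cpu:>9.0f} ms, GPU {gpu:>9.0f} ms -> {winner}")
```

With these (invented) numbers the GPU wins at 1 and 8 nodes, but the CPU-only configuration overtakes it by 40 nodes, qualitatively mirroring the Figure 3 crossover described above.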
