Comments on: What To Do When You Can’t Get Nvidia H100 GPUs
https://www.nextplatform.com/2023/11/17/what-to-do-when-you-cant-get-nvidia-h100-gpus/

By: Amit
https://www.nextplatform.com/2023/11/17/what-to-do-when-you-cant-get-nvidia-h100-gpus/#comment-219465
Mon, 29 Jan 2024 06:56:33 +0000

If you need H100s, you can contact me on LinkedIn (Amit Kuperman). At Nebius we have large capacity and really good prices for many kinds of GPU clusters.

By: Slim Jim
https://www.nextplatform.com/2023/11/17/what-to-do-when-you-cant-get-nvidia-h100-gpus/#comment-216424
Sat, 18 Nov 2023 00:10:12 +0000

Interesting! MLPerf 3.1 Training (11/08/23, https://mlcommons.org/benchmarks/training/) has some notable results for the L40, L40S, and others. With DLRM-dcnv2, for example, training is reported to take 25 minutes with 8x L40, 23 minutes with 8x L40S, 12 minutes with 8x H100-PCIe, and just 4 minutes with 8x H100-SXM5 (scaling beyond 8x is not extra super though, as 64x H100-SXM5 gives 1.4 minutes and 128x yields 1 minute).
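To put rough numbers on that "not extra super" scaling, here is a quick back-of-the-envelope sketch in Python using the quoted DLRM-dcnv2 times (the scaling-efficiency framing is my own, not an MLPerf metric):

    # Speedup and parallel-scaling efficiency relative to the 8x H100-SXM5
    # baseline, computed from the MLPerf 3.1 Training times quoted above.
    baseline_gpus, baseline_minutes = 8, 4.0

    results = {64: 1.4, 128: 1.0}  # GPU count -> reported training minutes

    for gpus, minutes in results.items():
        speedup = baseline_minutes / minutes  # actual speedup over the 8x run
        ideal = gpus / baseline_gpus          # perfect linear scaling would give this
        print(f"{gpus}x: {speedup:.2f}x faster (ideal {ideal:.0f}x), "
              f"{speedup / ideal:.0%} scaling efficiency")

Run as-is, this prints roughly 2.9x (36% efficiency) for 64 GPUs and 4x (25% efficiency) for 128 GPUs, which is why the bigger runs look much less impressive on a per-GPU basis.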

I would expect that the Ultrarack-x16, or even a -x32 (if planned), would be the more interesting ones for AI trainers (e.g., Wagner mentions a satisfied customer who wanted to “put 20 GPUs behind the server”, so maybe a -x24 too). And composability (or de-composability) should be a great bonus when switching over to inference-oriented workloads. Cool stuff!
