Comments on: Opening Up The Future “Venado” Grace-Hopper Supercomputer At Los Alamos https://www.nextplatform.com/2022/05/31/opening-up-the-future-venado-grace-hopper-supercomputer-at-los-alamos/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds. Tue, 07 Jun 2022 19:43:03 +0000 By: Timothy Prickett Morgan https://www.nextplatform.com/2022/05/31/opening-up-the-future-venado-grace-hopper-supercomputer-at-los-alamos/#comment-192165 Fri, 03 Jun 2022 14:59:32 +0000 https://www.nextplatform.com/?p=140676#comment-192165 In reply to Scott Atchley.

I meant “Summit.” Damnit. My bad.

By: Scott Atchley https://www.nextplatform.com/2022/05/31/opening-up-the-future-venado-grace-hopper-supercomputer-at-los-alamos/#comment-192164 Fri, 03 Jun 2022 12:06:28 +0000 https://www.nextplatform.com/?p=140676#comment-192164 “Lujan did say that storage would be directly integrated into the network, not bolted on through a different network, as Oak Ridge is doing with Frontier, and added that it would very likely be an all-flash storage system based on NVM-Express flash and very likely running the open source Lustre parallel file system.”

Frontier’s Orion file system sits directly on the dragonfly network with over 18.5 TB/s of bandwidth between the compute and storage dragonfly groups. Those storage groups also have a high number of gateway (in HPE parlance) or router (in Lustre parlance) nodes to provide 1.6 TB/s connectivity to external resources including the analysis cluster, data transfer nodes, and cloud-based workflow orchestration services, as well as access to Summit’s GPFS file system, Alpine.

An all-flash file system like the one on Perlmutter is convenient for users who want a single namespace in addition to speed. Unfortunately, flash still costs an order of magnitude more than disk, which means that for the same cost, capacity will be a tenth of that of a disk-based system. Frontier’s Orion file system strives to provide both within a single namespace (~10 PB of flash and ~600 PB of disk), which requires more complicated software to manage migration between the tiers. Time will tell if this meets the needs of our users.

By: Fred https://www.nextplatform.com/2022/05/31/opening-up-the-future-venado-grace-hopper-supercomputer-at-los-alamos/#comment-192134 Thu, 02 Jun 2022 03:59:25 +0000 https://www.nextplatform.com/?p=140676#comment-192134 That photo of LANL is really old.

By: Jim Z https://www.nextplatform.com/2022/05/31/opening-up-the-future-venado-grace-hopper-supercomputer-at-los-alamos/#comment-192098 Wed, 01 Jun 2022 00:59:37 +0000 https://www.nextplatform.com/?p=140676#comment-192098 fp8 is a vanity metric. fp16 is already more or less unusable since the dynamic range is too small, which is why everyone ended up using bf16, which has a full 8 bits of just exponent. Since floating point loses a bit for the sign, fp8 has only 7 bits total between the exponent and the mantissa, which isn’t going to let you train anything. GIGO.
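For readers who want to check the bit arithmetic in this comment, here is a rough Python sketch that tabulates the sign/exponent/mantissa split and the resulting normal range for each format. The `FloatFormat` class and its property names are made up for illustration, and the sketch assumes IEEE-754-style conventions throughout (the top exponent code reserved for infinity and NaN). Real fp8 proposals do not all follow that convention: the commonly used E4M3 variant reclaims most of the top exponent code and reaches 448 rather than the 240 computed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    """Hypothetical helper describing a binary floating-point layout."""
    name: str
    exp_bits: int   # exponent field width
    man_bits: int   # stored mantissa (fraction) width

    @property
    def total_bits(self) -> int:
        # One bit is always spent on the sign.
        return 1 + self.exp_bits + self.man_bits

    @property
    def bias(self) -> int:
        return 2 ** (self.exp_bits - 1) - 1

    @property
    def max_normal(self) -> float:
        # Assumption: the all-ones exponent code is reserved for inf/NaN.
        max_exp = (2 ** self.exp_bits - 2) - self.bias
        return (2 - 2.0 ** -self.man_bits) * 2.0 ** max_exp

    @property
    def min_normal(self) -> float:
        # Smallest positive normal number (subnormals ignored).
        return 2.0 ** (1 - self.bias)


FORMATS = [
    FloatFormat("fp16", exp_bits=5, man_bits=10),
    FloatFormat("bf16", exp_bits=8, man_bits=7),
    FloatFormat("fp8-e5m2", exp_bits=5, man_bits=2),
    FloatFormat("fp8-e4m3", exp_bits=4, man_bits=3),
]

if __name__ == "__main__":
    for f in FORMATS:
        print(
            f"{f.name:9s} bits={f.total_bits:2d} "
            f"exp={f.exp_bits} man={f.man_bits:2d} "
            f"min_normal={f.min_normal:.3e} max_normal={f.max_normal:.3e}"
        )
```

Under these assumptions, fp16 tops out at 65504 while bf16 reaches roughly 3.4e38 at the cost of mantissa precision, which is the dynamic-range trade-off the comment describes. Either fp8 split leaves 7 bits to share between exponent and mantissa.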
