Comments on: Greasing The Skids To Move AI From InfiniBand To Ethernet https://www.nextplatform.com/2024/05/09/greasing-the-skids-to-move-ai-from-infiniband-to-ethernet/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds. Mon, 20 May 2024 16:12:00 +0000

By: Mark Hahn https://www.nextplatform.com/2024/05/09/greasing-the-skids-to-move-ai-from-infiniband-to-ethernet/#comment-224462 Thu, 16 May 2024 23:35:02 +0000 https://www.nextplatform.com/?p=144127#comment-224462 In reply to Timothy Prickett Morgan.

Can you expand on this static-vs-dynamic thing? Are hyperscalers actually modifying the fabric topology dynamically, in response to job layout on nodes?

By: Mark Hahn https://www.nextplatform.com/2024/05/09/greasing-the-skids-to-move-ai-from-infiniband-to-ethernet/#comment-224461 Thu, 16 May 2024 23:32:00 +0000 https://www.nextplatform.com/?p=144127#comment-224461 In reply to Adir Zevulun.

Risk management: IB is a safe and well-understood fabric for clusters. And arguably, until recently, Ethernet simply didn’t scale easily.

Another interesting point is that IB has traditionally carried aspirations of smarter networking – verbs, offloading collectives, etc. Is it true that IB efforts in this direction have failed? I still see marketing about smarter IB NICs – or is it just that this kind of offload isn’t leveraged by the central AI frameworks?

By: Timothy Prickett Morgan https://www.nextplatform.com/2024/05/09/greasing-the-skids-to-move-ai-from-infiniband-to-ethernet/#comment-224324 Mon, 13 May 2024 16:11:30 +0000 https://www.nextplatform.com/?p=144127#comment-224324 In reply to Henry.

InfiniBand had low latency in a port-to-port hop inside the switch, and that often translated into a very big performance difference across hundreds to thousands of endpoints. But now we are at tens of thousands of endpoints, moving on to hundreds of thousands and perhaps a million, and the game is different. It is a statically configured network (InfiniBand) versus a dynamically configured one (Ethernet), and it is 3X the cost versus 1X the cost. There was a time when InfiniBand was cheaper and better and equally scalable. I remember it. Now it is more expensive, and not as scalable for a given number of layers in a network, and soon it will not have lower end-to-end latency for real applications. Nvidia would do well to license whatever Microsoft did with InfiniBand and add it to the Quantum line to add multitenancy and security.
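
[Editor's aside] The point about scalability "for a given number of layers" can be illustrated with the textbook sizing rule for a non-blocking folded-Clos (fat-tree) network built from identical switches: with radix k and L tiers, the maximum endpoint count is 2 × (k/2)^L. The sketch below is only that rule in Python. The function name and the radix values 64 and 128 are illustrative assumptions, not figures from the article or from any vendor's datasheet.

```python
def fat_tree_endpoints(radix: int, tiers: int) -> int:
    """Max endpoints of a non-blocking fat-tree built from identical switches.

    Every non-top-tier switch uses half of its ports facing down and half
    facing up, which gives 2 * (radix / 2) ** tiers endpoints in total.
    """
    if radix < 2 or radix % 2 != 0:
        raise ValueError("radix must be an even number >= 2")
    if tiers < 1:
        raise ValueError("tiers must be >= 1")
    return 2 * (radix // 2) ** tiers


# Hypothetical radix values, chosen only to show how the count scales.
for radix in (64, 128):
    for tiers in (2, 3):
        count = fat_tree_endpoints(radix, tiers)
        print(f"radix={radix:>3} tiers={tiers} -> {count:>7} endpoints")
```

Under these assumptions, doubling the switch radix multiplies the reachable endpoints by 2^L for the same number of tiers, which is why a higher-radix switch needs fewer layers to reach a given cluster size.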

By: Timothy Prickett Morgan https://www.nextplatform.com/2024/05/09/greasing-the-skids-to-move-ai-from-infiniband-to-ethernet/#comment-224323 Mon, 13 May 2024 16:07:30 +0000 https://www.nextplatform.com/?p=144127#comment-224323 In reply to Calamity Jim.

It sure looks like it to me.

By: Calamity Jim https://www.nextplatform.com/2024/05/09/greasing-the-skids-to-move-ai-from-infiniband-to-ethernet/#comment-224313 Mon, 13 May 2024 08:19:18 +0000 https://www.nextplatform.com/?p=144127#comment-224313 In reply to Adir Zevulun.

It’s been like the Hundred Years’ War between The Ethernets and the InfiniBands, but it looks like the high-command field strategists and generals of the Ultra Ethernet Consortium have finally developed the low-latency packet spraying, dynamic load balancing, and congestion control weaponry needed for their ultimate takeover of AI and HPC comms territories … but by 2025 this time (almost right away, but not quite, due to tail latency?). Will the “confederates” eventually win this one out against the “secessionists”? History will tell:

Greasing The Skids To Move AI From InfiniBand To Ethernet (2024)
Ethernet Consortium Shoots For 1 Million Node Clusters That Beat InfiniBand (2023)
CISCO Guns for Infiniband (2023)
Broadcom Takes on Infiniband (2023)
Google abandons ethernet to outdo infiniband (2022)
The Eternal Battle Between InfiniBand And Ethernet In HPC (2021)
Infiniband, still setting the pace for HPC (2020)

By: Henry https://www.nextplatform.com/2024/05/09/greasing-the-skids-to-move-ai-from-infiniband-to-ethernet/#comment-224305 Mon, 13 May 2024 02:30:22 +0000 https://www.nextplatform.com/?p=144127#comment-224305

Historically, InfiniBand had appreciably lower latency than Ethernet and quicker time-to-market for each new performance generation.

By: Jim https://www.nextplatform.com/2024/05/09/greasing-the-skids-to-move-ai-from-infiniband-to-ethernet/#comment-224228 Fri, 10 May 2024 19:59:23 +0000 https://www.nextplatform.com/?p=144127#comment-224228

It is utterly astonishing how much money is being dumped in for so little gain.

In the early days of computing, you could spend the equivalent of a few million dollars to move from binary coding to assembly to higher-level languages, and get efficiency improvements in coding time of several thousand, if not hundreds of thousands, of percent.

Now it’s tens to possibly hundreds of billions, for nebulous promises of tens of percent, both in the performance of the system itself and in what its output can do.

By: Adir Zevulun https://www.nextplatform.com/2024/05/09/greasing-the-skids-to-move-ai-from-infiniband-to-ethernet/#comment-224178 Fri, 10 May 2024 01:08:45 +0000 https://www.nextplatform.com/?p=144127#comment-224178

AI DC networking:
Nvidia InfiniBand 2023: $6.47 billion
Arista Ethernet 2025: $0.75 billion (?)
Why do customers choose InfiniBand if “Ethernet is proving to offer at least 10 percent improvement of job completion performance across all packet sizes versus InfiniBand”?
