Comments on: Intel Is Counting On AI Inference To Save The Xeon CPU
https://www.nextplatform.com/2023/10/30/intel-is-counting-on-ai-inference-to-save-the-xeon-cpu/
In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.

By: Timothy Prickett Morgan (Fri, 05 Jan 2024 14:22:21 +0000)
https://www.nextplatform.com/2023/10/30/intel-is-counting-on-ai-inference-to-save-the-xeon-cpu/#comment-218521
In reply to Jeff Brower.

Well put.

By: Jeff Brower (Fri, 05 Jan 2024 10:46:00 +0000)
https://www.nextplatform.com/2023/10/30/intel-is-counting-on-ai-inference-to-save-the-xeon-cpu/#comment-218510

Very insightful article, covering both the technology involved and the associated business aspects. I have no doubt Gelsinger finds your Intel datacenter business revenue chart as “inconceivable” as you do. But the whole GenAI effort is on the wrong track. The underlying need is for a novel inference architecture, combined with a fundamental change in training algorithms. There is zero chance the human brain, on its 40 watt power budget, is performing complex math (such as gradient descent) with the level of numerical accuracy and error-free calculation required by Big AI algorithms. Huge numbers of memory parameters are fine if access can be slow and errors can be tolerated, rather than wasting power-hungry circuitry on EDAC and every possible incremental tick in bandwidth. Big AI, instead of striving for efficiency, which evolution prizes above all else, strives for continuously increasing capex, energy usage, and complexity. I guess that increases the width of their moat, but it doesn’t come anywhere close to matching the human brain.

By: John Lee (Fri, 22 Dec 2023 13:30:50 +0000)
https://www.nextplatform.com/2023/10/30/intel-is-counting-on-ai-inference-to-save-the-xeon-cpu/#comment-217890
In reply to Timothy Prickett Morgan.

I agree Emerald Rapids is a stop-gap. The next big step forward from Intel is Granite Rapids. But the Multiplexer Combined Ranks (MCR) DIMMs supported by Granite Rapids are much slower than HBM. NVIDIA is currently getting 5 TB/sec from six stacks of HBM3e. The second generation HBM3e, which is already sampling to NVIDIA and AMD, delivers 1.2 TB/sec per stack, or 7.2 TB/sec across six stacks. Twelve channels of MCR DIMMs deliver only 860 GB/sec, which is 8.4X less bandwidth than 7.2 TB/sec. Granite Rapids will not be competitive for large language model inference if all it has is twelve channels of MCR DIMMs. Intel needs a Xeon Max version with six stacks of HBM. Even AMD has an HBM version of Turin, which is its server processor after Genoa. Unlike AMD and NVIDIA, Intel does not have the TSMC supply constraints for packaging HBM with Xeon.
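A back-of-the-envelope sketch of that bandwidth gap, assuming 8,800 MT/sec MCR DIMMs on a 64-bit channel (the transfer rate is an assumption here; the HBM3e figures come from the comment above):

```python
# Rough bandwidth comparison; the 8,800 MT/s MCR DIMM speed is an assumption.
mcr_channels = 12
mcr_mts = 8_800                  # assumed MCR DIMM transfer rate (MT/s)
bytes_per_transfer = 8           # 64-bit data path per channel
mcr_bw_gbs = mcr_channels * mcr_mts * 1e6 * bytes_per_transfer / 1e9

hbm3e_gen2_stack_gbs = 1_200     # GB/sec per second-generation HBM3e stack
hbm3e_gen2_total_gbs = 6 * hbm3e_gen2_stack_gbs

print(f"MCR DIMMs (12 ch): {mcr_bw_gbs:,.0f} GB/sec")
print(f"HBM3e (6 stacks):  {hbm3e_gen2_total_gbs:,.0f} GB/sec")
print(f"Ratio:             {hbm3e_gen2_total_gbs / mcr_bw_gbs:.1f}X")
```

With the assumed transfer rate this works out to roughly 845 GB/sec on the MCR side, in the same ballpark as the 860 GB/sec figure cited above.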

By: Timothy Prickett Morgan (Wed, 20 Dec 2023 14:47:23 +0000)
https://www.nextplatform.com/2023/10/30/intel-is-counting-on-ai-inference-to-save-the-xeon-cpu/#comment-217816
In reply to John Lee.

They want to use double-pumped DDR5 memory instead. I think it is called MCR. We’re going to dig into it. Emerald Rapids is a stop-gap. The real show is Granite Rapids.

By: John Lee (Wed, 20 Dec 2023 11:43:29 +0000)
https://www.nextplatform.com/2023/10/30/intel-is-counting-on-ai-inference-to-save-the-xeon-cpu/#comment-217811

If Intel is counting on AI inference to save Xeon, why isn’t there a Xeon Max version of Emerald Rapids with High Bandwidth Memory (HBM)? The performance of large language model inference is limited by the bandwidth of reading neural net weights from DRAM, not by arithmetic. The arithmetic for large language model inference is just one 8-bit by 8-bit integer multiply-accumulate per byte read from DRAM.
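To make that bandwidth-bound argument concrete, here is a minimal sketch of single-stream token generation throughput; the 70B parameter count, 8-bit weights, batch size of one, and bandwidth figures are all assumptions for illustration:

```python
# At batch size 1, each generated token streams roughly all of the weights
# from memory once, so tokens/sec is about bandwidth / model size in bytes.
# The model size and bandwidth figures below are assumptions.
params = 70e9                    # assumed model size (parameters)
bytes_per_param = 1              # 8-bit integer weights
weight_bytes = params * bytes_per_param

for name, bw_gb_per_sec in [("12-channel MCR DIMM", 860), ("6-stack HBM3e", 7_200)]:
    tokens_per_sec = bw_gb_per_sec * 1e9 / weight_bytes
    print(f"{name:>20}: ~{tokens_per_sec:.0f} tokens/sec per stream")
```

Under these assumptions the CPU memory subsystem supports on the order of a dozen tokens per second per stream, versus roughly a hundred for the HBM-fed device, which is why the weights traffic, not the math, sets the ceiling.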

By: Timothy Prickett Morgan (Tue, 07 Nov 2023 13:28:15 +0000)
https://www.nextplatform.com/2023/10/30/intel-is-counting-on-ai-inference-to-save-the-xeon-cpu/#comment-215996

No, there is not.

By: Slim Jim (Wed, 01 Nov 2023 15:41:20 +0000)
https://www.nextplatform.com/2023/10/30/intel-is-counting-on-ai-inference-to-save-the-xeon-cpu/#comment-215783
In reply to HuMo.

I don’t know the details of your broomstick, but it was sure fun to see a few kids in costumes go from store to store yesterday (café, bakery, …), for free fistfuls of candy, in that French version of Halloween (not quite the North-American house-to-house approach yet … too many Hansel-&-Gretel witches maybe?)!

Speaking of SC23 though, a recent Eviden press release (10/04/23) suggests that 2024’s Jupiter (to be described at SC23) will be a GH200-type affair (Booster rocket), with Rhea1 possibly playing a mostly secondary role (Cluster ammunition) (https://eviden.com/insights/press-releases/). It’ll be a RISC Exaflopper either way, but one may have hoped (unrealistically?) for a more central place for Rhea1 … (?)

By: Paul Berry (Wed, 01 Nov 2023 14:06:11 +0000)
https://www.nextplatform.com/2023/10/30/intel-is-counting-on-ai-inference-to-save-the-xeon-cpu/#comment-215779
In reply to JayN.

Even if Ponte Vecchio were amazing, Argonne would still be a disaster given how much program slip it has had. Even the alternate-alternate-backup plan is two years late. The point of the machine isn’t to pass a benchmark; the point is to deliver useful science, and it is still not doing that.

By: Darren Starr (Wed, 01 Nov 2023 13:34:23 +0000)
https://www.nextplatform.com/2023/10/30/intel-is-counting-on-ai-inference-to-save-the-xeon-cpu/#comment-215777

In my current position, I am working on deploying HPC resources for AI at a national scale for a European country. I could easily write a book on the topic at this point, and by the time I finished writing it, everything in it would be old and somewhat irrelevant news. The article is absolutely correct that the two key areas where AI resources are consumed are training and inference. Beyond that, I have issues with the article.
We have worked with AI/ML researchers across Europe to identify their computational requirements for inference and training. Our goal is to deploy systems that sit comfortably near the top of the Top500, but only when measured on AI workloads. We will, of course, employ LUMI at CSC in Finland for a large portion of our workload.
What we learned is that most forms of AI and ML training can run fairly well on run-of-the-mill servers. GPUs help, but while we have more than a few Nvidia H100s and AMD MIxxx accelerators, and we also have access to some of the world’s largest Nvidia DGX clusters, most of the researchers we are working with do not need these kinds of resources. In fact, many run their science on their laptops.
We identified only a limited number of cases that need HPC resources for training: large language models, diffusion-based image processing, and the like. Effectively, only the models whose weights are trained against billions or trillions of source documents tend to need massive processing.
Interestingly, while we have not substantiated our hypothesis yet, we believe that a relatively small and inexpensive cluster of special-purpose tensor processors (such as those you can run on Google Cloud or similar) could be far more cost effective than general-purpose GPUs, perhaps by a margin as drastic as the gap between CPUs and GPUs for graphics processing.
Also, training is a short-term investment, and I personally predict that it is nearly impossible to see a return on investment when building systems just for training. It is also unlikely that paying to power a supercomputer to train a large language model will provide a return on investment, especially at the electricity prices expected this coming winter.
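For a sense of scale on the power argument, here is a quick sketch of just the electricity bill for a hypothetical three-month training run; the power draw and the price per kWh are assumptions for illustration only:

```python
# Hypothetical electricity cost of a training campaign; all figures are
# assumptions for illustration, not quotes of real prices or power draws.
power_mw = 10                    # assumed average draw of the cluster
run_hours = 3 * 30 * 24          # a three-month training run
price_per_kwh = 0.20             # assumed European industrial rate, EUR/kWh

energy_kwh = power_mw * 1_000 * run_hours
print(f"Energy used:      {energy_kwh:,.0f} kWh")
print(f"Electricity cost: EUR {energy_kwh * price_per_kwh:,.0f}")
```

Even with these deliberately modest assumptions, the power bill alone runs to several million euros before any hardware amortization is counted.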
On the other hand, once a model is trained, there is incredible potential for ROI across many fields in inference. And there are numerous potential use cases for inference and transformation in HPC. But not for the reasons one would typically think.
National or even international scale supercomputing environments are not optimal for AI training because we do not build these machines for a single task to hog for months at a time. There are certainly oil companies that own HPC environments dedicated to weather simulation and exploration, but the massive computers we make have to be shared, since there is no way to pay for such a machine if there are not multiple projects benefiting from it. Running a job that consumes 50 percent of a supercomputer for three months would be unrealistic and simply wasteful.
This is okay because, when we do need to do that, we can go to an environment like AWS, where rather than building a $100 million computer we can rent someone else’s for a few hundred thousand dollars and be done with it.
But there are some excellent use cases for ML and GenAI once a model is trained. For example, we can use GenAI on exponentially hard tasks such as protein folding. We are far from having useful quantum computers today (we are building a big one now, LUMI-Q), and ML cannot provide as accurate an answer as a real quantum computer eventually might (current quantum computers cannot do it either), but we can use inference to estimate the outcome in a much shorter time, comparable to what a quantum computer promises.
For systems like this, it is far smarter to use something like a Gaudi2 system, or better yet a large-scale Huawei Atlas 9000 system, than a bunch of H100s, DGXs, and MI250s. But keep in mind that the technology is moving so impressively fast that I am quite sure that in the next year or two we will see developments in this field that completely change how we handle neural networks.
My thought on this is a play on other quotes: all it will take is some kids sitting in a garage, maybe in Ukraine, to come up with an innovation that squashes all the AI efforts from Google, Microsoft, AWS, OpenAI, and more. And I think it is going to happen sooner rather than later.

By: Timothy Prickett Morgan (Wed, 01 Nov 2023 13:17:12 +0000)
https://www.nextplatform.com/2023/10/30/intel-is-counting-on-ai-inference-to-save-the-xeon-cpu/#comment-215776
In reply to Slim Albert.

Appreciate you all. Thanks.
