Comments on: Arm Neoverse Roadmap Brings CPU Designs, But No Big Fat GPU
https://www.nextplatform.com/2024/02/21/arm-neoverse-roadmap-brings-cpu-designs-but-no-big-fat-gpu/
In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.

By: Jack Harvard (Fri, 05 Apr 2024 16:33:00 +0000)
https://www.nextplatform.com/2024/02/21/arm-neoverse-roadmap-brings-cpu-designs-but-no-big-fat-gpu/#comment-222781
In reply to Timothy Prickett Morgan.

You’ll have seen that the two CEOs have been interacting on LinkedIn with ‘likes’ since around the time the Telegraph story broke; just stating the facts.

By: Timothy Prickett Morgan (Fri, 05 Apr 2024 12:15:06 +0000)
https://www.nextplatform.com/2024/02/21/arm-neoverse-roadmap-brings-cpu-designs-but-no-big-fat-gpu/#comment-222773
In reply to Jack Harvard.

I have not. But wouldn’t that be neat! Arm thinks I am being nuts or mean or both by suggesting it should have a beefy GPU/XPU.

By: Jack Harvard (Fri, 05 Apr 2024 02:52:27 +0000)
https://www.nextplatform.com/2024/02/21/arm-neoverse-roadmap-brings-cpu-designs-but-no-big-fat-gpu/#comment-222759

Have you thought about the idea that Graphcore will get folded into Arm, so that the data center GPU/AI accelerator story becomes more complete? Read the Telegraph story: https://www.telegraph.co.uk/business/2024/02/17/british-ai-champion-graphcore-explores-foreign-sale/

By: Lyman (Thu, 22 Feb 2024 19:33:39 +0000)
https://www.nextplatform.com/2024/02/21/arm-neoverse-roadmap-brings-cpu-designs-but-no-big-fat-gpu/#comment-220659

GPUs and AI accelerators are not equivalent classes; Tenstorrent, Cerebras, and the rest are not GPUs. Arm needs a discrete GPU like it needs another hole in the head. AI/ML inference doesn’t necessarily need exactly what AI/ML training needs, either. Long term, the substantive part of delivered AI services is going to be inference, not training. Nvidia has a temporary scarcity lock on training; it doesn’t have anywhere near as tight a lock on inference.

The new CSS blocks appear to have UCIe chiplet interconnect as well. An integrated inference die attached via UCIe has lots of potential upside without Arm spreading its pragmatically limited budget even thinner. (UCIe is a bit of a double-edged sword, because over the long term it could also do real harm to Arm’s iGPU business.) Arm also has a substantial number of architecture license holders, which are a long-term viability threat to CSS-like products. Again, a very high load on its limited R&D budget.

N3 cranks up AI/ML performance in some areas by 100%. It is extremely likely they are targeting inference loads with very low latency tolerances rather than the training ‘bubble’ we are in right now. Long term, inference is going to grow bigger than training; it simply has more usable utility and more privacy value.

By: Hubert (Thu, 22 Feb 2024 00:06:42 +0000)
https://www.nextplatform.com/2024/02/21/arm-neoverse-roadmap-brings-cpu-designs-but-no-big-fat-gpu/#comment-220618

Great presentation and analysis! I’m sure glad the Neoverse Poseidon V3 is coming under the CSS Voyager umbrella for straightforward cut-and-paste SoC design. Like you, I wonder about the combination “PCIe Gen5 & CXL 3.0” on that slide, as CXL 3.0 is commonly viewed as coming with PCIe 6 (maybe an ARM-fingertip typo?).

There seems to be a nice progression of performance core counts from A64FX at 48 cores, to Rhea1 at 64 cores, Grace at 72 cores, Graviton4 at 96 cores, and now this V3 at 128 cores (2×64, out of the box). As you suggest, a 4-die, 256-core package might be possible if some DDR5 or I/O modules are replaced with die-to-die blocks, but then performance may suffer from a memory bottleneck.
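As a rough illustration of that bottleneck worry, here is a minimal back-of-envelope sketch in Python; the 12-channel DDR5-5600 configuration and the core counts are assumptions for illustration only, not disclosed Arm or partner specs:

# Hypothetical figures: per-core DRAM bandwidth if a 4-die package
# doubles cores without adding memory channels.
def ddr5_bandwidth_gbs(channels, megatransfers):
    """Peak bandwidth in GB/s for 64-bit (8-byte) DDR5 channels."""
    return channels * megatransfers * 8 / 1000

total = ddr5_bandwidth_gbs(channels=12, megatransfers=5600)  # assumed config
for cores in (128, 256):
    print(f"{cores} cores: {total:.0f} GB/s total, {total / cores:.1f} GB/s per core")

Doubling cores on the same memory subsystem halves per-core bandwidth (roughly 4.2 GB/s down to 2.1 GB/s in this made-up configuration), which is where the squeeze would show up.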

The big mystery, though, is where the AI uplift comes from (both V3’s near-100% and N3’s near-200%). Are the vector units updated in a way that makes software better able to use them, or did they just double them up, to “eight 128-bit vectors” for example? Or maybe HBM3 (but then only in the V3)?
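On the doubling hypothesis, here is a minimal sketch of the arithmetic in Python; the pipe counts, clock speed, and core count are hypothetical placeholders, not anything Arm has disclosed:

# Hypothetical figures: peak dense FP32 throughput scaling with the
# number of 128-bit vector pipes per core (2 ops per lane assumes FMA).
def peak_gflops(cores, pipes, vector_bits, ghz, element_bits=32):
    lanes = vector_bits // element_bits
    return cores * pipes * lanes * 2 * ghz

base    = peak_gflops(cores=128, pipes=4, vector_bits=128, ghz=3.0)  # e.g. 4x128-bit
doubled = peak_gflops(cores=128, pipes=8, vector_bits=128, ghz=3.0)  # e.g. 8x128-bit
print(f"baseline {base/1000:.1f} TFLOPS vs doubled pipes {doubled/1000:.1f} TFLOPS")

Going from four to eight 128-bit pipes per core doubles the peak (about 12.3 TFLOPS to 24.6 TFLOPS with these made-up numbers), which would line up with a near-100% uplift claim without any change in how software uses the units.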

In any case, with so many countries wanting chip-making sovereignty after the recent supply-chain disruptions, seeing ARM make the performance V-series of its twelve cheeky Olympians easier to integrate into one’s own chip design is definitely a great thing in my view!
