Comments on: Chiplet Cloud Can Bring The Cost Of LLMs Way Down https://www.nextplatform.com/2023/07/12/microsofts-chiplet-cloud-to-bring-the-cost-of-llms-way-down/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.

By: Nicolas Dujarrier https://www.nextplatform.com/2023/07/12/microsofts-chiplet-cloud-to-bring-the-cost-of-llms-way-down/#comment-212483 Fri, 18 Aug 2023 20:41:57 +0000 In reply to Doctorfouad.

It could be that emerging Non-Volatile Memory (NVM) MRAM, like the VG-SOT-MRAM from the European research center IMEC, could help increase cache density (replacing part or all of the SRAM cache) AND also improve power efficiency, since it is non-volatile: load the parameters into the cache once and they stay there without needing energy to retain the information, a bit like bi-stable E-ink.

By: HuMo https://www.nextplatform.com/2023/07/12/microsofts-chiplet-cloud-to-bring-the-cost-of-llms-way-down/#comment-211095 Fri, 14 Jul 2023 01:27:07 +0000 In reply to Doctorfouad.

I think this goes back to the 85 TB/s figure in the “unfortunately using …” paragraph of this TNP article, which refers back to Figure 4 in the Chiplet Cloud paper. For HBM4 at 10 TB/s, the V100 would still need to perform 10-20 flops-of-processing/byte-of-accessed-mem for fullish utilization of the GPU’s oomph — if I correctly mis-exfoliate … (100 flop/byte at 0.9 TB/s on the fig.)
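As a rough sanity check on those ratios, here is a minimal sketch; the peak-throughput figure (~125 FP16 tensor TFLOPS for a V100) is an assumption added here, not a number from the comment or the paper:

# Back-of-the-envelope arithmetic intensity needed to keep a GPU compute-bound.
# Assumed: V100 peak ~125 TFLOPS (FP16 tensor cores); bandwidths in TB/s.
PEAK_TFLOPS = 125.0

def flops_per_byte_needed(mem_bandwidth_tbps: float) -> float:
    """Minimum flops of work per byte of memory traffic for full utilization."""
    return PEAK_TFLOPS / mem_bandwidth_tbps

print(flops_per_byte_needed(0.9))   # ~139 flop/byte with V100-class HBM2 (~0.9 TB/s)
print(flops_per_byte_needed(10.0))  # ~12.5 flop/byte with a hypothetical 10 TB/s HBM4 stack

That lands in the same 10-20 flop/byte ballpark the comment cites for a 10 TB/s memory system.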

By: Timothy Prickett Morgan https://www.nextplatform.com/2023/07/12/microsofts-chiplet-cloud-to-bring-the-cost-of-llms-way-down/#comment-211093 Thu, 13 Jul 2023 23:00:27 +0000 In reply to emerth.

I fully expect that to happen at some point, but heaven only knows when. This Dot Cog Boom will probably not take as long as the Dot Com one did. We are a lot faster at the hype cycle these days, eh?

By: Timothy Prickett Morgan https://www.nextplatform.com/2023/07/12/microsofts-chiplet-cloud-to-bring-the-cost-of-llms-way-down/#comment-211092 Thu, 13 Jul 2023 22:59:14 +0000 In reply to deadbeef.

Well said. Thanks.

By: deadbeef https://www.nextplatform.com/2023/07/12/microsofts-chiplet-cloud-to-bring-the-cost-of-llms-way-down/#comment-211088 Thu, 13 Jul 2023 20:11:02 +0000 The memory limitation problem dogging the immortal computing paradigm is correctly identified by the paper and, to some extent, this article. Already the HBM in an H100 costs more to manufacture than the GPU die itself, and it (the memory) is still too slow and too small. At the same time, though, it's far too early to invest in fixed-function ASICs optimised specifically for these transformer workloads. The mooted TCO gains would only ever materialize if the algorithms fail to advance meaningfully in the next 5-10 years, which is not a likely outcome. That said, the points made in the article and paper about the real-world economics of LLMs are quite correct.

SRAM scaling is dead after 3nm, so I am not sure that the way forward is to expand SRAM until these huge models fit inside it. This is where Graphcore started; either they were early, or wrong. A related point on SRAM scaling is that the compute becomes ever cheaper, and thus ever more irrelevant in $ and mm2, versus the memory. So why prematurely optimise for an extremely fixed-function compute solution? Surely it is better to have something more general purpose (but not GPU-sized). The SRAM can always be stacked vertically to make room for that.

The parameter count is still vast versus the size of the training set. The training algorithms, the model structure, and the underlying assumptions need to be fixed to deal with this fundamental flaw before genAI for all becomes economic; a very fixed-function ASIC-writ-large doesn't alter that situation. I expect more innovations to follow before long that should improve efficiencies from the top down. Attention is, seemingly, *not* all that is needed.

By: emerth https://www.nextplatform.com/2023/07/12/microsofts-chiplet-cloud-to-bring-the-cost-of-llms-way-down/#comment-211083 Thu, 13 Jul 2023 17:31:28 +0000 Two things come to mind:
– Wouldn’t it be wonderful if demand for GPUs from big tech dropped off like it has from crypto miners? Normal people could buy decent GPUs again.
– Google search vs Bing search in the number of ops needed: since Bing is so much more of an advertising engine, this makes sense. Serving paid results would involve more simple hash lookups and less inference.

By: Timothy Prickett Morgan https://www.nextplatform.com/2023/07/12/microsofts-chiplet-cloud-to-bring-the-cost-of-llms-way-down/#comment-211080 Thu, 13 Jul 2023 16:32:32 +0000 In reply to asdf2jkl.

Dojo is using pretty advanced packaging. Similar in concept, for sure, with precise interconnects.

By: asdf2jkl https://www.nextplatform.com/2023/07/12/microsofts-chiplet-cloud-to-bring-the-cost-of-llms-way-down/#comment-211076 Thu, 13 Jul 2023 14:59:13 +0000 So what is the difference (architecturally) between this Chiplet Cloud and the Tesla Dojo that was presented at the last Hot Chips?

By: John S https://www.nextplatform.com/2023/07/12/microsofts-chiplet-cloud-to-bring-the-cost-of-llms-way-down/#comment-211066 Thu, 13 Jul 2023 12:30:58 +0000 In reply to Timothy Prickett Morgan.

ASICs are way faster and much cheaper in quantity. You pay in turnaround time and debugging effort. I've worked at places where the original product went out as an FPGA so they could do updates as needed. Once it was stable, they did a cost-reduction ASIC spin to save money. It didn't lower the price of the hardware… but it kept more of the money.

I would assume that in this case going straight to ASIC is the way to go, especially since these are logically fairly simple designs: they just want close memory, connectivity, and a bunch of matrix engines. Once they debug it in FPGAs… they cost-reduce like crazy, since the payback is super quick. It's just three to six months from sending the design files to TSMC before you get hardware back… that's the only reason they'd do this on FPGAs first.

By: EJ222 https://www.nextplatform.com/2023/07/12/microsofts-chiplet-cloud-to-bring-the-cost-of-llms-way-down/#comment-211053 Thu, 13 Jul 2023 05:13:30 +0000 There is other low-hanging LLM fruit that ASICs can pick, like the chunked 3-8 bit quantization that modern GPUs aren't really designed for, or optimization for specific software attention schemes.

If I am multiplying it right, 175B GPT-3 at minimum latency calls for 414 GB of SRAM. That doesn't sound like ~4-bit quantization to me.
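The multiplication is easy to check; the sketch below just divides the two numbers in the comment, nothing more:

# Rough check: SRAM per parameter implied by the comment's own numbers.
params = 175e9        # GPT-3 parameter count
sram_bytes = 414e9    # SRAM cited for the minimum-latency configuration

bytes_per_param = sram_bytes / params
print(bytes_per_param)       # ~2.37 bytes per parameter
print(bytes_per_param * 8)   # ~18.9 bits per parameter

That works out to roughly 19 bits per parameter, which indeed looks more like 16-bit weights plus overhead than ~4-bit quantization.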
