Comments on: Building A Hassle-Free Way To Port CUDA Code To AMD GPUs https://www.nextplatform.com/2023/12/20/building-a-hassle-free-way-to-port-cuda-code-to-amd-gpus/ In-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds. Fri, 12 Jan 2024 06:35:23 +0000 hourly 1 https://wordpress.org/?v=6.5.5 By: Dmartins https://www.nextplatform.com/2023/12/20/building-a-hassle-free-way-to-port-cuda-code-to-amd-gpus/#comment-218801 Fri, 12 Jan 2024 06:35:23 +0000 https://www.nextplatform.com/?p=143424#comment-218801 Well, this aged as it should’ve.
AMD is supporting HIP on 780M GPUs and Ryzen processors; the 680M is just asking too much. It wasn't even based on the same generation as the 6000-series desktop parts.

]]>
By: Thomas Hoberg https://www.nextplatform.com/2023/12/20/building-a-hassle-free-way-to-port-cuda-code-to-amd-gpus/#comment-218375 Tue, 02 Jan 2024 09:38:26 +0000 https://www.nextplatform.com/?p=143424#comment-218375 I remember QuickTransit with fondness and sadness: fondness, because it was extremely cool, and sadness, because IBM bought it and put it into their ‘poison locker’, so it couldn’t be used “the right way”, which would have been a better Hercules (z/Arch emulator) or what Gene Amdahl had been working on in his later career.

That they licensed it to mainframe customers who might want to run x86 workloads on z/Arch was almost too painful to bear!

The problem is that something as simplistic as QT won’t do the job here. QT operates at a similar level to the internal translation engines of x86 CPUs, which take the x86 ISA and turn it into their respective native codes, or to Transmeta’s Crusoe.

If you’re in want of a metaphor: it enables a guy who’s used to building a cottage from mud bricks to erect a similar structure from autoclaved aerated concrete blocks.

But in CUDA the job is to build a Cheops pyramid before people forget who Cheops was (at least they managed to preserve the corpse while it was waiting). You can’t just dump thousands of cottage makers on a pile of bricks and have them build a pyramid: they need to be finely orchestrated in variable sized teams meshed in several levels of gears. Any individual stepping out of the carefully synchronized dances creates havoc.

And that dance script cannot be translated to an Eiffel type wrought iron or pre-stressed concrete variant of the pyramid by translating the worker’s instructions from ancient Egyptian to the Hindi the current crew understands.

The challenge here is that it’s not about translating the code, it’s about translating the various layers of abstractions that CUDA also had to create in order to have thousands of cores work with the least amount of waste on a huge undertaking. And that IMHO also creates a potential IP minefield, as those abstractions by themselves represent a significant body of work.

What helps is that those who create the current wave of pyramid-building instructions are very intent on avoiding coding in the Egyptian long promoted by team Nile-Green, and try to use something more commonly spoken, like this new Babylonian.

]]>
By: Cristian Vasile https://www.nextplatform.com/2023/12/20/building-a-hassle-free-way-to-port-cuda-code-to-amd-gpus/#comment-217972 Sun, 24 Dec 2023 14:15:41 +0000 https://www.nextplatform.com/?p=143424#comment-217972 A not-so-hassle-free approach could be transpiling/transcoding (a word borrowed from video processing workflows) Nvidia CUDA high-level code into Python-style code and feeding that to the new Mojo compiler.
Some smart people would augment the conversion phase with the new AI LLM methods for more accuracy.
Language2Language Transformers: machine learning to build transpilers.
(https://tomassetti.me/language2language-transformers-machine-learning-to-build-transpilers/)

]]>
By: hoohoo https://www.nextplatform.com/2023/12/20/building-a-hassle-free-way-to-port-cuda-code-to-amd-gpus/#comment-217920 Sat, 23 Dec 2023 01:21:36 +0000 https://www.nextplatform.com/?p=143424#comment-217920 In reply to ConsumerJoe.

For the Ryzen 7940HS processor, you can get HIP code running as follows. Install Visual Studio 2019. Install AMD’s HIP stack for Windows. Open one of the rocm-examples in VS. Change the Properties > General [AMD HIP for clang] > Offload Architectures value, adding the device string gfx1103 to the list. The code will compile and run. I have the Floyd-Warshall example running this way. There is also an environment variable, HSA_OVERRIDE_GFX_VERSION, which can be set to 11.0.3, and I am told this works for Linux too.
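The Linux equivalent of those steps might look roughly like the following sketch — this assumes the ROCm toolchain is installed, and the example filename is only an illustration (the arch string and the override variable are the ones named above):

```shell
# Compile a HIP example directly for the Phoenix iGPU (gfx1103);
# this replaces the Offload Architectures setting in Visual Studio.
hipcc --offload-arch=gfx1103 floyd_warshall.hip -o floyd_warshall

# If the runtime rejects the unofficial target, present the GPU as
# gfx1103 via the override mentioned above, then run the binary.
export HSA_OVERRIDE_GFX_VERSION=11.0.3
./floyd_warshall
```

The override is per-process, so it can be set only for the runs that need it rather than system-wide.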

This is much like CUDA with its compute_NN and sm_NN values, though CUDA is friendlier: it will fall back to JIT-compiling embedded PTX on newer silicon if your config strings are old. HIP just dumps core at kernel launch if I haven’t included the right arch string at compile time.
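For comparison, the CUDA fallback described above comes from embedding PTX alongside the native binary — a sketch, with the kernel filename as a placeholder:

```shell
# code=sm_70 embeds a native binary for Volta; code=compute_70 also
# embeds the PTX, which the driver can JIT for GPUs newer than sm_70.
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_70,code=compute_70 \
     kernel.cu -o kernel
```

Omitting the second `-gencode` clause reproduces HIP-style behavior: the binary simply fails to launch on architectures it was not built for.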

I almost have HIP Caffe compiling on Windows 11 lol.

]]>
By: NonStandardz https://www.nextplatform.com/2023/12/20/building-a-hassle-free-way-to-port-cuda-code-to-amd-gpus/#comment-217919 Sat, 23 Dec 2023 00:42:17 +0000 https://www.nextplatform.com/?p=143424#comment-217919 In reply to Alan Brown.

The Blender Foundation stopped supporting OpenCL with Blender 3D 3.0 and only natively supports Nvidia’s CUDA or Apple’s Metal GPU graphics/compute APIs! And so for AMD’s GPUs to be able to be used with Blender There has to be the ROCm/HIP translation stack employed to take the CUDA and translate that to a form that can be run on AMD’s Radeon hardware. And ditto for Intel’s OneAPI and making that CUDA work for Intel’s GPUs. So just you try and Get Blender 3D 3.0/later to emit anything but some CUDA or Metal compatible Intermediate Representation there for anything but Nvidia’s or Apple’s respective GPU drivers to consume. AMD and Intel have to eat the CUDA and translate that to work with Blencer3D 3.0/later editions on their respective GPU hardware.

And Linux has never had a good cross-platform modern OpenCL implementation in the Mesa driver package, and that’s part of the reason the Blender Foundation dropped OpenCL support in favor of CUDA/Metal. Linux/Mesa is only now getting a more modern OpenCL with the Rusticl project (a modern OpenCL implementation written in Rust) added to the Mesa driver packages that ship on most Linux distros. It’s good for any Linux applications that still use OpenCL as their GPU compute API, but too late for Blender 3D 3.0 and later editions.

]]>
By: Timothy Prickett Morgan https://www.nextplatform.com/2023/12/20/building-a-hassle-free-way-to-port-cuda-code-to-amd-gpus/#comment-217914 Fri, 22 Dec 2023 23:03:51 +0000 https://www.nextplatform.com/?p=143424#comment-217914 In reply to ConsumerJoe.

I am not doing any such thing. Perhaps the tech people need to take a gander.

]]>
By: Timothy Prickett Morgan https://www.nextplatform.com/2023/12/20/building-a-hassle-free-way-to-port-cuda-code-to-amd-gpus/#comment-217913 Fri, 22 Dec 2023 23:03:09 +0000 https://www.nextplatform.com/?p=143424#comment-217913 In reply to hoohoo.

No apologies, man. You’re good.

]]>
By: Alan Brown https://www.nextplatform.com/2023/12/20/building-a-hassle-free-way-to-port-cuda-code-to-amd-gpus/#comment-217905 Fri, 22 Dec 2023 18:52:11 +0000 https://www.nextplatform.com/?p=143424#comment-217905 Um…. How long has OpenCL been around? (Quick check: Wikipedia sez over 14 years)

Multi-platform, multi-OS, works on all GPUs (and many CPUs), supported by AMD, Intel, Nvidia, and most other GPU houses

So why are programmers persisting with proprietary extensions? (Hint: MSIE wasn’t a “standard” either)

]]>
By: hoohoo https://www.nextplatform.com/2023/12/20/building-a-hassle-free-way-to-port-cuda-code-to-amd-gpus/#comment-217866 Thu, 21 Dec 2023 20:26:35 +0000 https://www.nextplatform.com/?p=143424#comment-217866 In reply to HuMo.

Let’s kick a verse for my man called Miles
‘Cause seems to me he’s gonna be ’round for a long while
‘Cause he’s a multi-talented and gifted musician
Who can play any position
It’s no mystery that you’re no risk to me
‘Cause I’m the lover and tell your girl to throw a kiss to me
And hop in bed and have a fight with the pillow
Turn off the lights and let the J give it to ya
And let the trumpet blow as I kick this
‘Cause rap is fundamental and Miles sounds so wicked
A little taste of bebop sound with the backdrop of doo-hop
And this is why we can call it the doo-bop
Now go ahead and play like a wannabe
You know it’s gonna be
I hate to cut the throats of MC’s up in front of me
When I’m out blow make the A want to sing
My rhymes be shining on brothers
Like they flippin’ on they high beams
And when I just come through
You think you bad ’cause somebody has seen you
Climbing the tree like Jack Be Nimble
Yo, Miles blow the trumpet off the symbol
Miles Davis style is different, you could describe it as specific
He rip, rage and roar, no time for watchin’ Andy Griffith
You can (whistle) all you want, go ahead
While he take to doo-hop and mix it with bebop
Just like a maker in the shoe shop
Easy Mo Bee will cream you like the nougat
And usually we doo-wop but since Miles wanna cool out
You can do that Miles, blow your trumpet
Show the people, just when it’s to doo-bop
– Miles Davis

Sorry. Hadda do it

]]>
By: deksman2 https://www.nextplatform.com/2023/12/20/building-a-hassle-free-way-to-port-cuda-code-to-amd-gpus/#comment-217863 Thu, 21 Dec 2023 18:37:43 +0000 https://www.nextplatform.com/?p=143424#comment-217863 In reply to Jim.

The main purpose, I would argue, is that this translation exists so that AMD GPUs can be used for hardware acceleration in software that’s CUDA-bound, with no loss in performance or efficiency. So in reality there would be no need to worry about whether it works with NV hardware or not, since the software can just default to original CUDA if an NV GPU/accelerator is detected in the system.

Unless they are looking to replace CUDA with CuPBoP-AMD, but I’m not sure how that would work if the software relies on CUDA for making use of the GPU in the first place, and CuPBoP-AMD just acts as a translation layer so AMD accelerators and GPUs can be used for hardware acceleration (effectively enabling CUDA to run on AMD).

So I’m not really sure why they are bothering with NV hardware to start with, since they can work off the original CUDA code.
At this point it’s just a matter of integrating CuPBoP-AMD so AMD hardware can be used in AI and pro software instead.
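The default-to-CUDA-on-NV-hardware idea above amounts to a vendor check at startup. A minimal sketch of that dispatch logic, where the vendor string is assumed to come from some real probe (e.g. parsing `nvidia-smi` or `rocminfo` output — `select_backend` and the backend names are purely illustrative, not part of CuPBoP-AMD):

```python
def select_backend(vendor: str) -> str:
    """Pick a kernel-dispatch path for the detected GPU vendor."""
    if vendor == "nvidia":
        return "cuda"          # native CUDA path, no translation needed
    if vendor == "amd":
        return "cupbop-hip"    # route the CUDA calls through the translator
    return "cpu-fallback"      # no supported GPU found

print(select_backend("nvidia"))  # -> cuda
print(select_backend("amd"))     # -> cupbop-hip
```

The point of the sketch is that the translation layer only ever sits on the AMD branch; the NV branch is untouched, which is why testing on NV hardware buys little.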

]]>