▲BarraCUDA Open-source CUDA compiler targeting AMD GPUs(github.com)

269 points by rurban 11 hours ago | 22 comments

▲h4kunamata 9 hours ago

>Requirements

>A will to live (optional but recommended)

>LLVM is NOT required. BarraCUDA does its own instruction encoding like an adult.

>Open an issue if theres anything you want to discuss. Or don't. I'm not your mum.

>Based in New Zealand

Oceania sense of humor is like no other haha

The project owner strongly emphasize the no LLM dependency, in a world of AI slope this is so refreshing.

The sheer amount of knowledge required to even start such a project is really something else, and to prove the manual wrong on the machine language level is something else entirely.

When it comes to AMD, "no CUDA support" is the biggest "excuse" to join NVIDIA's walled garden.

Godspeed to this project, the more competition the less NVIDIA can continue destroying the PC parts pricing.

▲querez 9 hours ago

> The project owner strongly emphasize the no LLM dependency, in a world of AI slope this is so refreshing.

The project owner is talking about LLVM, a compiler toolkit, not an LLM.

▲kmaitreys 7 hours ago

It's actually quite easy to spot if LLMs were used or not.

Very few commits in total, AI-like documentation and code comments.

But even if LLMs were used, the overall project does feel steered by a human, given some decisions like not using bloated build systems. If this actually works then that's great.

▲butvacuum 7 hours ago

Since when is squashing noisy commits an AI activity instead of good manners?

▲sigmoid10 20 minutes ago

The first commit was 17k lines. So this was either developed without using version control or at least without using this gh repo. Either way I have to say certain sections do feel like they would have been prime targets for having an LLM write them. You could do all of this by hand in 2026, but you wouldn't have to. In fact it would probably take forever to do this by hand as a single dev. But then again there are people who spend 2000 hours building a cpu in minecraft, so why not. The result speaks for itself.

▲kmaitreys 1 hour ago

Can you prove that this is what happened?

▲natvert 7 hours ago

Says the clawdbot

▲kmaitreys 1 hour ago

It's quite amusing the one time I did not make an anti-AI comment, I got called a clanker myself.

I'm glad the mood here is shifting towards the right side.

▲luckydata 2 hours ago

This type of project is the perfect project for an LLM: LLVM and CUDA work as harnesses, so the output is easy to compare.

▲kmaitreys 1 hour ago

What do you mean by harnesses?

▲wild_egg 8 hours ago

This project very most definitely has significant AI contributions.

Don't care though. AI can work wonders in skilled hands and I'm looking forward to using this project

▲ZaneHam 7 hours ago

Hello! I didn't realise my project was posted here but I can actually answer this.

I do use LLMs (specifically Ollama), particularly for test summarisation and writing up some boilerplate, and I've also used Claude/ChatGPT on the web when my free tier allows. It's good for when I hit problems such as AMD SOP prefixes being different than I expected.

▲8note 3 hours ago

since nobody else seems to have said it, this is exciting! keep up the fun work!

▲magicalhippo 8 hours ago

> Oceania sense of humor is like no other haha

Reminded me of the beached whale animated shorts[1].

[1]: https://www.youtube.com/watch?v=ezJG0QrkCTA&list=PLeKsajfbDp...

▲ekianjo 6 hours ago

LLVM, nothing to do with LLMs

▲samrus 8 hours ago

> >LLVM is NOT required. BarraCUDA does its own instruction encoding like an adult.

> The project owner strongly emphasize the no LLM dependency, in a world of AI slope this is so refreshing.

"Has tech literacy deserted the tech insider websites of silicon valley? I will not beleove it is so. ARE THERE NO TRUE ENGINEERS AMONG YOU?!"

▲colordrops 7 hours ago

I'm still blown away that AMD hasn't made it their top priority. I've said this for years. If I was AMD I would spend billions upon billions if necessary to make a CUDA compatibility layer for AMD. It would certainly still pay off, and it almost certainly wouldn't cost that much.

▲woctordho 4 hours ago

They've been doing it all the time and it's called HIP. Nowadays it works pretty well on a few supported GPUs (CDNA 3 and RDNA 4).

▲colordrops 2 hours ago

Please. If HIP worked so well they would be eating into Nvidia's market share.

First, it's a porting kit, not a compatibility layer, so you can't run arbitrary CUDA apps on AMD GPUs. Second, it only runs on some of their GPUs.

This absolutely does not solve the problem.

▲KennyBlanken 13 minutes ago

HIP is just one of many examples of how utterly incompetent AMD is at software development.

GPU drivers, Adrenalin, Windows chipset drivers...

How many generations into the Ryzen platform are they, and they still can't get USB to work properly all the time?

▲mathisfun123 2 hours ago

it's astounding to me how many people pop off about "AMD SHOULD SUPPORT CUDA" not knowing that HIP (and hipify) has been around for literally a decade now.

▲colordrops 2 hours ago

Please explain to me why all the major players are buying Nvidia then? Is HIP a drop in replacement? No.

You have to port every piece of software you want to use. It's ridiculous to call this a solution.

▲woctordho 1 hour ago

Major players in China don't play like that. MooreThreads, Lisuan, and many other smaller companies all have their own porting kits, which are basically copied from HIP. They just port every piece of software and it just works.

If you want to fight against Nvidia monopoly, then don't just rant, but buy a GPU other than Nvidia and build on it. Check my GitHub and you'll see what I'm doing.

▲mathisfun123 1 hour ago

> Is HIP a drop in replacement? No.

You don't understand what HIP is. HIP is AMD's runtime API. It resembles the CUDA runtime APIs, but it's not the same thing and it doesn't need to be: the hard part of porting CUDA isn't the runtime APIs. hipify is the thing that translates both runtime and kernels. Now, is hipify a drop-in replacement? No, of course not, but that's because the two vendors have different architectures. So it's absolutely laughable to imagine that some random could come anywhere near "drop-in replacement" when AMD can't (again: because of fundamental architecture differences).
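
To make the runtime-API half of that concrete, here is a toy Python sketch of the kind of source-to-source renaming a hipify-style tool performs. This is not hipify's actual implementation: `toy_hipify` and `CUDA_TO_HIP` are hypothetical names made up for illustration, and the table holds only a handful of well-known CUDA-to-HIP runtime renames. It says nothing about kernel code, which is where the architectural differences bite.

```python
import re

# Illustrative subset of CUDA -> HIP runtime renames (an assumption-labelled
# sample; the real tools cover far more of the API and many edge cases).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Match whole identifiers only, so names missing from the table
# (e.g. cudaMemcpyAsync) are left untouched instead of half-rewritten.
_IDENT = r"[A-Za-z0-9_]"
_PATTERN = re.compile(
    rf"(?<!{_IDENT})(?:"
    + "|".join(re.escape(name) for name in CUDA_TO_HIP)
    + rf")(?!{_IDENT})"
)


def toy_hipify(source: str) -> str:
    """Rename the known CUDA runtime identifiers in `source` to HIP ones."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)
```

Renaming calls like this is the easy part, which is the point being made above: anything that depends on warp size, inline PTX, or vendor libraries needs actual porting work.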

▲colordrops 1 hour ago

Who said "some random"? Read the whole thread. I was suggesting AMD invest BILLIONS to make this happen. You're arguing with a straw man.

▲bigyabai 36 minutes ago

I think you misunderstand what's fundamentally possible with AMD's architecture. They can't wave a magic wand for a CUDA compatibility layer any better than Apple or Qualcomm can, it's not low-hanging fruit like DirectX or Win32 translation. Investing billions into translating CUDA on raster GPUs is a dead end.

AMD's best option is a greenfield GPU architecture that puts CUDA in the crosshairs, which is what they already did for datacenter customers with AMD Instinct.

▲KeplerBoy 15 minutes ago

This is a big part of AMD still not having a proper foothold in the space: AMD Instinct is quite different from what regular folks can easily put in their workstation. In Nvidia-land I can put anything from mid-range gaming cards, over a 5090 to an RTX 6000 Pro in my machine and be confident that my CUDA code will scale somewhat acceptably to a datacenter GPU.

▲KennyBlanken 16 minutes ago

Wow you're so very smart! You should tell all the llm and stablediffusion developers who had no idea it existed! /s

HIP has been dismissed for years because it was a token effort at best. Linux only until the last year or two, and even now it only supports a small number of their cards.

Meanwhile CUDA runs on damn near anything, and both Linux and Windows.

Also, have you used AMD drivers on Windows? They can't seem to write drivers or Windows software to save their lives. AMD Adrenalin is a slow, buggy mess.

Did I mention that compute performance on AMD cards was dogshit until the last generation or so of GPUs?

▲ddtaylor 6 hours ago

AMD did hire someone to do this and IIRC he did, but they were afraid of Nvidia lawyers and he released it outside of the company?

▲colordrops 2 hours ago

Surely they could hire some good lawyers if that means they make billions upon billions? AFAIK there's nothing illegal about creating compatibility layers. Otherwise WINE would have shut down long ago.

▲andy_ppp 6 hours ago

Moving target. Honestly, just get PyTorch working fully (loads of stuff just doesn't work on AMD hardware) and also make it work on all graphics cards from a certain generation. The support matrix needed across GFX cards, architectures and software is quite astounding, but still, yes, they should at least have that working, along with equivalent custom kernels.

▲colordrops 2 hours ago

That would be a great start.

▲dboreham 7 hours ago

Unrelated: just returned from a month in NZ. Amazing people.

▲ZaneHam 3 hours ago

Hope you enjoyed it!!

▲lambda 3 hours ago

> The project owner strongly emphasize the no LLM dependency, in a world of AI slope this is so refreshing.

Huh? This is obvious AI slop from the readme. Look at that "ASCII art" diagram with misaligned "|" at the end of the lines. That's a very clear AI slop tell, anyone editing by hand would instinctively delete the extra spaces to align those.

▲ZaneHam 3 hours ago

Hello!

Didn't realise this was posted here (again lol), but where I originally posted, on the r/Compilers subreddit, I do mention I used ChatGPT to generate some ASCII art for me. I was tired and it was 12am, and I then had to spend another few minutes deleting all the emojis it threw in there.

I've also been open about my AI use with people who know me and people I work with in the OSS space. I have a lil Ollama model that helps me from time to time, especially with test result summaries (if you've ever seen what happens when a mainframe emulator explodes on a NIST test you'd want AI too lol, 10k lines of individual errors ain't fun to walk through), and you can even see some ChatGPT-generated CUDA in notgpt.cu, which I mixed and mashed a little bit. All in all, I'm of the opinion that this is a perfectly acceptable use of AI.

▲cmdr2 2 hours ago

> This is obvious AI slop from the readme

I keep hoping that low-effort comments like these will eventually get downvoted (because it's official HN policy). I get that it's fashionable to call things AI slop, but please put some effort into reading the code and making an informed judgment.

It's really demeaning to call someone's hard work "AI slop".

What you're implying is that the quality of the work is poor. Did you actually read the code? Do you think the author didn't obsessively spend time over the code? Do you have specific examples to justify calling this sloppy? Besides a misaligned "|" symbol?

And I doubt you even read anything because the author never talked about LLMs in the first place.

My beef isn't with you personally, it's with this almost auto-generated trend of comments on HN calling everyone's work "AI slop". One might say, low-effort comments like these are arguably "AI slop", because you could've generated them using GPT-2 (or even simple if-conditionals).

▲kmaitreys 41 minutes ago

While I would not call this AI slop, the probability that LLMs were used is high.

> It's really demeaning to call someone's hard work "AI slop".

I agree. I browsed through some files and found AI-like comments in the code. The readme and several other places have AI-like writing. Regarding the author not spending time on this project: this is presumably a 16k LOC project that was committed in a single commit two days ago. So the author never committed any draft/dev version in that time. I find that quite hard to believe. Again, my opinion is that LLMs were used, not that the code is slop. It may be. It may not be.

Yes this whole comment chain is the top comment misreading LLVM as LLMs which is hilarious.

> My beef isn't with you personally, it's with this almost auto-generated trend of comments on HN calling everyone's work "AI slop".

Now, this isn't necessarily about this particular project, but if you post something on a public forum for reactions then you are seeking the time of the people who will read and interact with it. So if they encounter something that the original author did not even bother to write, why should they read it? You're seeing many comments like that because there's just a lot of slop like that. And I think people should continue calling that out.

Again, this project specifically may or may not be slop. So here the reactions are a bit too strong.

▲RockRobotRock 3 hours ago

>No LLVM. No HIP translation layer. No "convert your CUDA to something else first." Just ......

Another obvious tell.

https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing#...

▲ZaneHam 3 hours ago

Oh gosh, em dashes are already ruined for me and now I can't use that too? I've already had to drop boldface in some of my writings because it's become prolific too.

This is also just what I intentionally avoided when making this by the way. I don't really know how else to phrase this because LLVM and HIP are quite prolific in the compiler/GPU world it seems.

▲cmdr2 2 hours ago

For what it's worth - for people who're in this space - this project is awesome, and I hope you keep going with it! The compiler space for GPUs really needs truly open-source efforts like this.

Code/doc generators are just another tool. A carpenter uses power tools to cut or drill things quickly, instead of screwing everything manually. That doesn't mean they're doing a sloppy job, because they're still going to obsessively pore over every detail of the finished product. A sloppy carpenter will be sloppy even without power tools.

So yeah, I don't think it's worth spending extra effort to please random HN commenters, because the people who face the problem that you're trying to solve will find it valuable regardless. An errant bold or pipe symbol doesn't matter to people who actually need what you're building.

▲m-schuetz 2 hours ago

TIL: I'm an LLM.

▲croes 1 hour ago

Parent confused LLVM with LLM

▲freakynit 3 hours ago

"If this doesn't work, your gcc is broken, not the Makefile." ... bruh.. the confidence.

▲bigyabai 8 hours ago

> and prove the manual wrong on the machine language level

I'll be the party pooper here, I guess. The manual is still right, and no amount of reverse-engineering will fix the architecture AMD chose for their silicon. It's absolutely possible to implement a subset of CUDA features on a raster GPU, but we've been doing that since OpenCL and CUDA is still king.

The best thing the industry can do is converge on a GPGPU compute standard that doesn't suck. But Intel, AMD and Apple are all at-odds with one another so CUDA's hedged bet on industry hostility will keep paying dividends.

▲freakynit 3 hours ago

The first issue created by someone other than the author is from geohot himself.. the goat: https://github.com/Zaneham/BarraCUDA/issues/17

I would love to see these folks working together on this to break apart Nvidia's stranglehold on the GPU market (which, according to the internet, allows them to have insane 70% profit margins, thereby raising costs for all users worldwide).

▲piker 9 hours ago

> # It's C99. It builds with gcc. There are no dependencies.

> make

Beautiful.

▲parlortricks 8 hours ago

You gotta love it, simple and straight to the point.

▲esafak 9 hours ago

Wouldn't it be funny and sad if a bunch of enthusiasts pulled off what AMD couldn't :)

▲h4kunamata 9 hours ago

Many projects turned out to be far better than proprietary because open-source doesn't have to please shareholders.

What sucks is that such projects at some point become too big and make so much noise that big techs are forced to buy them, and everybody gets fuck all.

All it takes to beat a proprietary walled garden is somebody with knowledge and a will to make things happen. Linus with git and Linux is the perfect example of it.

Fun fact: BitKeeper said fuck you to the Linux community in 2005, and Linus created git within 10 days.

BitKeeper made their code open source in 2016, but by then nobody knew who they were lol

So give it time :)

▲throwa356262 36 minutes ago

I think it was the other way around. It was the community that told bitkeeper to fuck off.

It all ended up good because of one man's genius, but let's not rewrite history.

▲bri3d 8 hours ago

The lack of CUDA support on AMD is absolutely not because AMD "couldn't" (although I certainly won't deny that their software has generally been lacking); it's clearly a strategic decision.

Supporting CUDA on AMD would only build a bigger moat for NVidia; there's no reason to cede the entire GPU programming environment to a competitor and indeed, this was a good gamble; as time goes on CUDA has become less and less essential or relevant.

Also, if you want a practical path towards drop-in replacing CUDA, you want ZLUDA; this project is interesting and kind of cool but the limitation to a C subset and no replacement libraries (BLAS, DNN, etc.) makes it not particularly useful in comparison.

▲enlyth 7 hours ago

Even disregarding CUDA, NVidia has had like 80% of the gaming market for years without any signs of this budging any time soon.

When it comes to GPUs, AMD just has the vibe of a company that basically shrugged and gave up. It's a shame because some competition would be amazing in this environment.

▲cebert 7 hours ago

What about PlayStation and Xbox? They use AMD graphics and are a substantial user base.

▲bigyabai 2 minutes ago

PlayStation and Xbox are two extremely low-margin, high volume customers. Winning their bid means shipping the most units of the cheapest hardware, which AMD is very good at.

▲ekianjo 6 hours ago

Because AMD has the APU category that mixes x86_64 cores with powerful integrated graphics. Nvidia does not have that.

▲fdefitte 7 hours ago

Agreed on ZLUDA being the practical choice. This project is more impressive as a "build a GPU compiler from scratch" exercise than as something you'd actually use for ML workloads. The custom instruction encoding without LLVM is genuinely cool though, even if the C subset limitation makes it a non-starter for most real CUDA codebases.

▲guerrilla 8 hours ago

> couldn't

More like wouldn't* most of the time.

Well isn't that the case with a few other things? FSR4 on older cards is one example right now. AMD still won't officially support it. I think they will though. Too much negativity around it. Half the posts on r/AMD are people complaining about it.

▲DiabloD3 8 hours ago

Because FSR4 is currently slower on RDNA3 due to lack of support of FP8 in hardware, and switching to FP16 makes it almost as slow as native rendering in a lot of cases.

They're working the problem, but slandering them over it isn't going to make it come out any faster.

▲guerrilla 8 hours ago

> Because FSR4 is currently slower on RDNA3 due to lack of support of FP8 in hardware, and switching to FP16 makes it almost as slow as native rendering in a lot of cases.

It works fine.

> They're working the problem, but slandering them over it isn't going to make it come out any faster.

You have insider info everyone else doesn't? They haven't said any such thing yet last I checked. If that were true, they should have said that.

▲wmf 8 hours ago

We have HIP at home.

▲pyuser583 1 hour ago

I was hoping AMD would keep making gaming cards, now that NVIDIA is an AI company. Somebody has to, right?

▲bigyabai 18 minutes ago

Nowadays, all you need is Vulkan 1.2 compliance and Linux to run most of Steam's library. A lot of AI-oriented hardware is usable for gaming.

▲exabrial 2 hours ago

Is OpenCL a thing anymore? I sorta thought that's what it was supposed to solve.

But I digress, just a quick put around... I don't know what I'm looking at. But it's impressive.

▲ByThyGrace 7 hours ago

How feasible is it for this to target earlier AMD archs down to even GFX1010, the original RDNA series aka the poorest of GPU poor?

▲monster_truck 5 hours ago

Don't let anyone dissuade you, it's going to be annoying but it can be done. When diffusion was new and rocm was still a mess I was manually patching a lot to get a vii, 1030, then 1200 working well enough.

It's a LOT less bad than it used to be, amd deserves serious credit. Codex should be able to crush it once you get the env going

▲ZaneHam 2 hours ago

Hey, I am actually working on making this compatible with earlier AMD GPUs as well, because I have an old gaming laptop with an RX5700m, which is GFX10. I'm reading up on the ISA documentation to see where the differences are, and I'll have to adjust some binary encoding to get it to work.

I mean this with respect to the other person, though: please don't vibe code this if you want to contribute or keep the compiler for yourself. This isn't because I'm against using AI assistance when it makes sense; it's because LLMs will really fail in this space. There are things in the specs you won't find until you try them, and LLMs find it really hard to get things right when literal bits matter.
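
For anyone wondering what "literal bits matter" looks like in practice, here is a small Python sketch of packing one scalar ALU instruction word by hand. It is an illustration only, not BarraCUDA's code: `encode_sop2` is a hypothetical helper, and the field layout in the docstring is the SOP2 format as I understand it from AMD's public ISA guides, so check it against the guide for your target before relying on it. Opcode numbers in particular are not guaranteed to stay the same from one GPU generation to the next, so the values used with it should be treated as placeholders.

```python
def encode_sop2(op: int, sdst: int, ssrc0: int, ssrc1: int) -> int:
    """Pack a SOP2-style scalar instruction into one 32-bit word.

    Assumed layout (verify against the ISA guide for your target):
      [31:30] fixed encoding bits 0b10
      [29:23] opcode
      [22:16] scalar destination register
      [15:8]  scalar source 1
      [7:0]   scalar source 0
    """
    fields = (("op", op, 7), ("sdst", sdst, 7),
              ("ssrc0", ssrc0, 8), ("ssrc1", ssrc1, 8))
    for name, value, bits in fields:
        # A value that overflows its field would silently corrupt a neighbour.
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (0b10 << 30) | (op << 23) | (sdst << 16) | (ssrc1 << 8) | ssrc0


def to_le_bytes(word: int) -> bytes:
    """Serialise the instruction word as 4 little-endian bytes."""
    return word.to_bytes(4, "little")
```

Getting a single field width or shift wrong still produces a plausible-looking 32-bit word, which is why this kind of code has to be checked against real hardware or a known-good assembler rather than eyeballed.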

▲whizzter 10 hours ago

Not familiar with CUDA development, but doesn't CUDA support C++ ? Skipping Clang/LLVM and going "pure" C seems to be quite limiting in that case.

▲ZaneHam 2 hours ago

I'm parsing the features of C++ that CUDA actually uses, not the full C++ spec, as that would take a very large amount of time. The compiler itself being written in C99 is just because that's how I write my C, and is a separate thing.

▲woctordho 4 hours ago

I'm also wondering this. The compiler itself is written in C99, but looking from the tests, it can parse some C++ features such as templates.

▲hackyhacky 8 hours ago

Honestly I'm not sure how good LLVM's support for AMD GFX11 machine code is. It's a pretty niche backend. Even if it exists, it may not produce ideal output. And it's a huge dependency.

▲bri3d 6 hours ago

Quite good, it’s first party supported by AMD (ROCm LLVM, with a lot upstreamed as well) where it’s fairly widely used in production.

This project is a super cool hobby/toy project but ZLUDA is the “right” drop in CUDA replacement for almost any practical use case.

▲h4kunamata 8 hours ago

Real developers never depended on AI to write good quality code; in fact, the amount of slop code flying left and right is due to LLMs.

Open-source projects are being inundated with PR from AIs, not depending on them doesn't limit a project.

That project owner seems pretty knowledgeable about what is going on, and keeping it free of dependencies is not an easy skill. Many developers would have written the code with tons of dependencies and copy/paste from an LLM. Some call the latter coding :)

▲gsora 8 hours ago

LLVM and LLM are not the same thing

▲brookman64k 8 hours ago

LLVM (Low Level Virtual Machine) != LLM (Large Language Model)

▲bravetraveler 9 hours ago

> No HIP translation layer.

Storage capacity everywhere rejoices

▲BatteryMountain 2 hours ago

In the old days we had these kinds of wars with CPU instruction sets & extensions (SSE, MMX, x64). In a way I feel that CUDA should be opened up & generalized so that other manufacturers can use it too, the same way CPUs equalled out on most instruction sets. That way the whole world won't be beholden to one manufacturer (Big Green), and it would calm down the scarcity effect we have now. I'm not an expert on GPU tech; would this be something that is possible? Is CUDA a driver feature or a hardware feature?

▲dokyun 1 hour ago

Love to see just a simple compiler in C with a Makefile instead of some amalgamation of 5 languages 20 libraries and some autotools cmake shit.

▲quantumwoke 1 hour ago

There's a lot of people in this thread that don't seem to have caught up with the fact that AMD has worked very hard on their cuda translation layer and for the most part it just works now, you can build cuda projects on amd just fine on modern hardware/software.

▲jillesvangurp 59 minutes ago

Nice repeat of history given that AMD started out emphasizing x86 compatibility with Intel's CPUs. It's a good strategy. And open sourcing it means it might be adapted to other hardware platforms too.

▲skipants 5 hours ago

Perusing the code, the translation seems quite complex.

Shout out to https://github.com/vosen/ZLUDA which is also in this space and quite popular.

I got Zluda to generally work with comfyui well enough.

▲ZaneHam 2 hours ago

This, this and this! Was really inspired by ZLUDA when I made this.

▲gzread 7 hours ago

Nice! It was only a matter of time until someone broke Nvidia's software moat. I hope Nvidia's lawyers don't know where you live.

▲saagarjha 4 hours ago

This isn't a production grade effort though.

▲latchkey 7 hours ago

Note that this targets GFX11, which is RDNA3. Great for consumer, but not the enterprise (CDNA) level at all. In other words, not a "cuda moat killer".

▲ZaneHam 2 hours ago

Hello,

I'm not the one who posted to HN but I am the project author. I'm working my way into doing multiple architectures as well as more modern GPUs too. I only did this because I used LLVM to check my work and I have an AMD GFX11 card on my partner's desktop (which I use to test on sometimes when it's free).

If you do have access to this kind of hardware and you're willing to test my implementations on it then I'm all ears! (You don't have to, obviously :-) )

▲phoronixrly 9 hours ago

Putting a registered trademark in your project's name is quite a brave choice. I hope they don't get a c&d letter when they get traction...

▲battle-racket 48 minutes ago

BarraCUDA is also a bioinformatics toolset? https://www.biocentric.nl/biocentric/nvidia-cuda-bioinformat...

▲cadamsdotcom 9 hours ago

Maybe a rename to Barra. Everyone will still get the pun :)

▲HenrikB 9 hours ago

... or Baccaruda or Baba-rara-cucu-dada (https://youtu.be/2tvIVvwXieo)

▲dboreham 7 hours ago

Or bacaruda.

▲bee_rider 7 hours ago

I wonder if they could change the name to Barracuda if pressed. The capitalization is all that keeps it from being a normal English word, right?

▲Alifatisk 9 hours ago

Are you thinking of Seagate Barracuda?

▲adzm 8 hours ago

They mean the CUDA part

▲gclawes 8 hours ago

What's the benefit of this over tinygrad?

▲bri3d 8 hours ago

Completely different layer; tinygrad is a library for performing specific math ops (tensor, nn), this is a compiler for general CUDA C code.

If your needs can be expressed as tensor operations or neural network stuff that tinygrad supports, might as well use that (or one of the ten billion other higher order tensor libs).

▲7speter 7 hours ago

Will this run on cards that don't have ROCm/latest ROCm support? Because if not, it's only gonna be a tiny subset of a tiny subset of cards that this will allow CUDA to run on.

▲woctordho 4 hours ago

Yes. It outputs a hsaco binary that just runs on the GPU (as long as you have the driver). No ROCm needed.
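
As a quick sanity check on such an output file, the Python sketch below inspects the first bytes of a file and reports whether it looks like an AMD GPU ELF object. The assumptions here are mine, not the project's: that a hsaco file is an ELF container, and that its machine field is 224 (the `EM_AMDGPU` value as I recall it from LLVM's ELF definitions). `looks_like_amdgpu_elf` is a hypothetical helper name.

```python
import struct

ELF_MAGIC = b"\x7fELF"
EM_AMDGPU = 224  # assumed ELF machine number for AMD GPU code objects


def looks_like_amdgpu_elf(header: bytes) -> bool:
    """Return True if `header` starts like an ELF file targeting AMD GPUs.

    Only the first 20 bytes are examined: the 16-byte e_ident block,
    the 2-byte e_type field, and the 2-byte e_machine field at offset 18.
    """
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return False
    # e_ident[5] is EI_DATA: 1 means little-endian, 2 means big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return e_machine == EM_AMDGPU
```

Usage would be along the lines of opening the compiler's output in binary mode and passing `f.read(20)` to the function. It only confirms the container format and target machine, not that the kernel inside will actually run on a given card.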

▲sam_goody 9 hours ago

Wow!! Congrats to you on launch!

Seeing insane investments (in time/effort/knowledge/frustration) like this makes me enjoy HN!!

(And there is always the hope that someone at AMD will see this and actually pay you to develop the thing.. Who knows)

▲latchkey 7 hours ago