Pedantic note: rust-cuda was created by https://github.com/RDambrosio016 and he is not currently involved in VectorWare. rust-gpu was created by the folks at Embark Studios. We are the current maintainers of both.
We didn't post this or the title; we would never claim we created the projects from scratch.
My bad! "contributors" is more accurate, but HN doesn't allow editing titles, sadly :(
HN allows the submitter to edit the title, at least it did last time I checked.
No worries, just wanted to correct it for folks. Thanks for posting!
`rust-gpu` and `rust-cuda` fall, to me, into the category of "Rust is great, let's build the X ecosystem in Rust." Meanwhile, they've been in a broken and dormant state for years. There was a leadership/dev change recently (are the creators of VectorWare the creators of Rust-CUDA, or the new leaders?), and more activity, but I haven't tried them since.
If you have a Rust application or library and want to use the GPU, these approaches are comparatively smooth:
- WGPU: great for 3D graphics
- Ash and other Vulkan bindings: low-level graphics bindings
- Cudarc: a nice API for running CUDA kernels
I am using WGPU and Cudarc for structural biology + molecular dynamics computations, and they work well.
Rust-CUDA feels like lots of PR, but not as good a toolkit as these quieter alternatives. What would be cool for them to deliver, and I think it is in their objectives: cross-API abstractions, so you could, for example, write code that runs on Vulkan Compute in addition to CUDA.
Something else that would be cool: High-level bindings to cuFFT and vkFFT. You can FFI them currently, but that's not ideal. (Not too bad to impl though, if you're familiar with FFI syntax and the `cc` crate)
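For a sense of what the hand-rolled route looks like, here is a minimal sketch of declaring the two core cuFFT calls over FFI. It assumes libcufft is on the linker path (e.g. a `cargo:rustc-link-lib=cufft` directive emitted from build.rs), that the buffers passed in are device allocations, and it omits all host-side setup; signatures follow the documented C API, but treat the details as illustrative rather than a finished binding:

```rust
// Minimal FFI sketch for cuFFT (declarations only). Return values are the
// cufftResult enum, where 0 means success. The buffers passed to
// cufftExecC2C must be device allocations, not host memory.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct CufftComplex {
    pub x: f32, // real part
    pub y: f32, // imaginary part
}

pub type CufftHandle = i32; // opaque integer handle from cufft.h

pub const CUFFT_C2C: i32 = 0x29;   // complex-to-complex transform
pub const CUFFT_FORWARD: i32 = -1; // forward transform direction

#[link(name = "cufft")]
extern "C" {
    pub fn cufftPlan1d(plan: *mut CufftHandle, nx: i32, kind: i32, batch: i32) -> i32;
    pub fn cufftExecC2C(
        plan: CufftHandle,
        idata: *mut CufftComplex,
        odata: *mut CufftComplex,
        direction: i32,
    ) -> i32;
    pub fn cufftDestroy(plan: CufftHandle) -> i32;
}
```

A safe wrapper around this (plan creation, RAII destruction, typed device buffers) is most of what a high-level binding would add; the `cc` crate only comes into play if you also want to compile a small C shim alongside.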
Yes, it is all these folks getting together and getting resources to push those projects to the next level: https://www.vectorware.com/team/
wgpu, ash, and cudarc are great. We're focusing on the actual code that runs on the GPU in Rust, and we work with those projects. We have cust in rust-cuda, but that existed before cudarc and we have been seriously discussing just killing it in favor of cudarc.
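For anyone who hasn't seen it, here is a rough sketch of what "code that runs on the GPU in Rust" looks like with rust-gpu: an entry point annotated with the `spirv` attribute, compiled to SPIR-V, and dispatched from the host with wgpu, ash, or similar. Attribute syntax below is recalled from the rust-gpu examples and may differ slightly between versions:

```rust
// Sketch of a rust-gpu compute entry point; the whole crate is built as a
// SPIR-V module by the rust-gpu codegen backend.
#![no_std]

use spirv_std::glam::UVec3;
use spirv_std::spirv;

#[spirv(compute(threads(64)))]
pub fn scale(
    #[spirv(global_invocation_id)] id: UVec3,
    #[spirv(storage_buffer, descriptor_set = 0, binding = 0)] data: &mut [f32],
) {
    let i = id.x as usize;
    if i < data.len() {
        data[i] *= 2.0; // each invocation scales one element
    }
}
```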
+1 for cudarc. I've been using it for a couple of years now and it has worked great. I'm using it for financial markets backtesting.
Be sure to turn on "pedantic mode" to get the footnotes that make this post make more sense. Some examples of what the post means by "applications" would help. I don't think the prediction here is that Excel's main event loop is going to run on the GPU, but I can see that its calculation engine might.
More software than you think can run fully on the GPU, especially with datacenter cards. We'll be sharing some demos in the coming weeks.
With current GPU architectures, this seems unlikely. You would need a ton of cells with almost perfectly aligned inputs before even the DMA roundtrip over the bus pays for itself.
We're talking at least hundreds of thousands of cells, depending on the calculation; in practice, a number that will make the UI very sad long before you see a slowdown from calculation (rough numbers sketched below).
Databases, on the other hand…
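A rough back-of-envelope under stated assumptions (PCIe 4.0 x16 at roughly 25 GB/s effective, about 10 µs of launch/sync overhead, one f64 per cell), just to put numbers on why a modest recalculation doesn't amortize the roundtrip:

```rust
// Back-of-envelope estimate, not a benchmark: all constants are assumptions.
fn main() {
    let cells = 100_000.0_f64;
    let bytes_per_cell = 8.0; // one f64 per cell
    let pcie_bw = 25e9; // bytes/s, assumed effective PCIe 4.0 x16 bandwidth
    let overhead_s = 10e-6; // assumed kernel launch + synchronization cost

    let transfer_s = 2.0 * cells * bytes_per_cell / pcie_bw; // upload + download
    println!(
        "~{:.0} us of transfer + ~{:.0} us of overhead before any math runs",
        transfer_s * 1e6,
        overhead_s * 1e6
    );
    // ~64 us + ~10 us is tiny in absolute terms, but a CPU can recalculate
    // 100k simple cells in a comparable window, so the GPU only wins when the
    // per-cell work is heavy or the cell count is much larger.
}
```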
There isn't always a DMA roundtrip; unified memory is a thing. But programming for the GPU is very awkward at a systems level. Even with unified memory, there is generally no real equivalent to virtual memory or mmap(), so you have to shuffle your working set in and out of VRAM by hand anyway (i.e. backing and residency are managed explicitly, even with "sparse" allocation APIs that might otherwise be expected to ease some of the work). Better GPU drivers, along with broad-based standardization of some current vendor-specific extensions, may be enough to mitigate this (it's not clear that real HW changes are needed), but it creates a very real limitation on the scale of software (including the AI kind) you can realistically run on any given device.
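To make the "residency is managed explicitly" point concrete, here is a minimal sketch using the raw CUDA runtime API over FFI: even a unified/managed allocation is usable from both sides, yet you still hint residency by hand with prefetches. It assumes libcudart is on the linker path; signatures follow the documented C API, and the rest is illustrative:

```rust
use std::ffi::c_void;
use std::ptr;

#[link(name = "cudart")]
extern "C" {
    // Return values are the cudaError_t enum; 0 means cudaSuccess.
    fn cudaMallocManaged(dev_ptr: *mut *mut c_void, size: usize, flags: u32) -> i32;
    fn cudaMemPrefetchAsync(ptr: *const c_void, count: usize, dst_device: i32, stream: *mut c_void) -> i32;
    fn cudaFree(ptr: *mut c_void) -> i32;
}

const CUDA_MEM_ATTACH_GLOBAL: u32 = 0x01; // cudaMemAttachGlobal

fn main() {
    unsafe {
        let mut buf: *mut c_void = ptr::null_mut();
        let bytes: usize = 1 << 20; // 1 MiB working set
        assert_eq!(cudaMallocManaged(&mut buf, bytes, CUDA_MEM_ATTACH_GLOBAL), 0);

        // The pointer is valid on both CPU and GPU, but residency is still an
        // explicit concern: prefetch to device 0 before doing GPU work there,
        // or pages fault across the bus on first touch.
        assert_eq!(cudaMemPrefetchAsync(buf, bytes, 0, ptr::null_mut()), 0);

        assert_eq!(cudaFree(buf), 0);
    }
}
```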
Sounds interesting.
But languages like Java or Python simply lack even the programming constructs to program GPUs easily.
The lack of a standardised ISA across GPUs also means compilers can't really provide a translation layer.
Let’s hope things get better over time!
You might be interested in a previous blog post where we showed one codebase running on many types of GPUs: https://rust-gpu.github.io/blog/2025/07/25/rust-on-every-gpu...
Python has decorators which can be used to add sugar to methods for things like true parallelization. For example, see modal.com’s Python snippets.
https://modal.com/docs/examples/batched_whisper
After reading this page, I still don't know what GPU-native software they want to work on.
> If you look at existing GPU applications, their software implementations aren't truly GPU-native. Instead, they are architected as traditional CPU software with a GPU add-on.
I feel that this is due to the current hardware architecture, not the fault of software.
We have some demos coming in the next couple weeks. The hardware is there, the software isn't!
What does this mean for the rust-gpu and rust-cuda projects themselves? Will they go unmaintained now that the creators are running a business?
(Don't miss the "Pedantic mode" switch on the linked page, it adds relevant and detailed footnotes to the blog post.)
We are investing in them and they form the basis of what we are doing. That being said, we are also exploring other technical avenues with different tradeoffs... we don't want to assume a solution merely because we are familiar with those projects.
One of the founders here, feel free to ask whatever. We purposefully didn't put much technical detail in the post as it is an announcement post (other people posted it here, we didn't).