NVIDIA Unveils Experimental 'cuda-oxide' Compiler: Write GPU Kernels in Rust, Compile Directly to PTX
NVIDIA AI researchers have released cuda-oxide, an experimental compiler backend that enables developers to write CUDA SIMT (Single Instruction, Multiple Threads) GPU kernels entirely in standard Rust. It compiles Rust source code directly to PTX (Parallel Thread Execution), the low-level intermediate representation CUDA uses to target NVIDIA GPUs – without requiring domain-specific languages, foreign function interface bindings, or any C/C++ code.

“cuda-oxide brings the CUDA programming model into Rust, allowing developers to write __global__ functions natively in safe Rust,” said a spokesperson for NVIDIA AI. “This is a fundamentally different approach from previous attempts, which abstracted CUDA away or required bridging through other languages.”
The project is still experimental, but its release marks a significant step toward integrating Rust’s memory safety and concurrency guarantees with high-performance GPU computing.
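For readers unfamiliar with the SIMT model the compiler targets, the core idea is that one kernel function runs once per thread, and each thread identifies its work item from its block and thread indices. A minimal CPU-side simulation in plain Rust sketches the idea (illustrative only – this is not cuda-oxide's API, and the sequential loops stand in for what the GPU does in parallel):

```rust
// CPU-side simulation of a CUDA SIMT launch: the "kernel" closure runs
// once per (block_idx, thread_idx) pair, mirroring how a __global__
// function is invoked across a grid of blocks.
fn launch<F: FnMut(u32, u32)>(grid_dim: u32, block_dim: u32, mut kernel: F) {
    for block_idx in 0..grid_dim {
        for thread_idx in 0..block_dim {
            kernel(block_idx, thread_idx);
        }
    }
}

// Element-wise vector add, one element per simulated thread.
fn vector_add(a: &[f32], b: &[f32]) -> Vec<f32> {
    let n = a.len();
    assert_eq!(n, b.len());
    let mut c = vec![0.0f32; n];
    let block_dim: u32 = 4;
    let grid_dim = (n as u32 + block_dim - 1) / block_dim; // ceil(n / block_dim)
    launch(grid_dim, block_dim, |block_idx, thread_idx| {
        // The classic CUDA global index: blockIdx.x * blockDim.x + threadIdx.x
        let gid = (block_idx * block_dim + thread_idx) as usize;
        if gid < n {
            c[gid] = a[gid] + b[gid]; // guard against the partial last block
        }
    });
    c
}

fn main() {
    let a: Vec<f32> = (0..6).map(|i| i as f32).collect();
    let b: Vec<f32> = (0..6).map(|i| (10 * i) as f32).collect();
    println!("{:?}", vector_add(&a, &b)); // [0.0, 11.0, 22.0, 33.0, 44.0, 55.0]
}
```

On a GPU, each (block, thread) pair executes concurrently; the indexing scheme is what a kernel body actually encodes, which is why it carries over unchanged to Rust.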
Background
Traditionally, writing GPU kernels has meant using C++ with the CUDA programming model directly, or relying on Python-level abstractions like Triton that generate GPU code under the hood. The Rust GPU ecosystem has seen multiple efforts to close this gap:
- Rust-GPU targets SPIR-V for Vulkan and graphics compute.
- rust-cuda uses a custom rustc codegen backend targeting NVVM IR.
- CubeCL employs an embedded DSL with a JIT runtime that cross-compiles to CUDA/ROCm/WGPU.
- std::offload relies on LLVM’s implicit offload path.
cuda-oxide occupies a distinct niche. “Our design center is ‘bringing CUDA into Rust’ – kernel authoring, device intrinsics, the SIMT execution model, and the CUDA programming model expressed natively in safe Rust,” the NVIDIA AI team explained. “This is closer in spirit to writing a __global__ function in C++ than to writing a generic Rust function that happens to run on a GPU.”
By contrast, rust-cuda focuses on “bringing Rust to NVIDIA GPUs,” emphasizing Rust ergonomics like async/.await and a Rust-first programming model that abstracts over CUDA concepts. The NVlabs team stated it has coordinated with rust-cuda maintainers and considers the two projects complementary.
The Compilation Pipeline
At the core of cuda-oxide is a custom rustc codegen backend – the layer in the Rust compiler responsible for generating machine code. Instead of emitting native CPU code, the rustc-codegen-cuda crate intercepts the compiler at the CodegenBackend::codegen_crate() entry point and runs a separate pipeline for device code:
- Rust Source → rustc frontend → rustc_public (Stable MIR) → dialect-mir → mem2reg → dialect-llvm → LLVM IR (.ll) → PTX (.ptx)
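In principle, a custom backend like this plugs in through rustc's unstable -Zcodegen-backend flag, which loads an alternative codegen backend as a dynamic library on a nightly toolchain. A hypothetical invocation might look like the following – the library path and target triple are illustrative assumptions, not documented cuda-oxide usage:

```shell
# Hypothetical build invocation (path and target are illustrative).
# -Zcodegen-backend is rustc's real, unstable mechanism for loading a
# custom codegen backend as a dynamic library on nightly.
RUSTFLAGS="-Zcodegen-backend=/path/to/librustc_codegen_cuda.so" \
  cargo +nightly build --target nvptx64-nvidia-cuda
```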
Two key technologies enable this pipeline:

Why rustc_public (Stable MIR)?
The raw internal MIR representation in rustc changes between nightly versions with no stability guarantees. cuda-oxide uses rustc_public, also known as Stable MIR – Rust’s official versioned, stable API over the compiler’s internals. This allows the backend to read MIR without breaking on every nightly update.
What is Pliron?
The middle stages use Pliron, an MLIR-like IR framework written entirely in Rust. Choosing Pliron over upstream MLIR means the entire compiler builds with cargo – no C++ toolchain, no CMake, no TableGen. cuda-oxide defines custom Pliron dialects – dialect-mir (modeling Rust MIR semantics such as places and projections) and dialect-llvm – that bridge the gap between Rust’s high-level representation and low-level PTX generation.
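The dialect-lowering idea – a pass that rewrites each high-level op into one or more lower-level ops – can be sketched in plain Rust with enums. This is a toy illustration of the concept only; Pliron's actual API models dialects, ops, and passes as richer traits and types:

```rust
// Toy model of two IR dialects. (Illustrative only; not Pliron's API.)
#[derive(Debug, PartialEq)]
enum MirOp {
    // "place = a + b" at the Rust-MIR level of abstraction.
    AssignAdd { place: &'static str, a: i64, b: i64 },
}

#[derive(Debug, PartialEq)]
enum LlvmOp {
    Add { a: i64, b: i64 },        // %tmp = add a, b
    Store { place: &'static str }, // store %tmp into the place
}

// A lowering pass: each high-level op expands into low-level ops.
fn lower(op: &MirOp) -> Vec<LlvmOp> {
    match op {
        MirOp::AssignAdd { place, a, b } => vec![
            LlvmOp::Add { a: *a, b: *b },
            LlvmOp::Store { place: *place },
        ],
    }
}

fn main() {
    let prog = MirOp::AssignAdd { place: "x", a: 2, b: 3 };
    println!("{:?}", lower(&prog));
}
```

A real pipeline chains several such passes (here, MIR-level ops down through mem2reg and an LLVM-flavored dialect) before handing off to LLVM for PTX emission.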
What This Means
For developers, cuda-oxide potentially lowers the barrier for writing high-performance GPU code in Rust. Writing kernels directly in Rust without FFI or C/C++ glue could reduce safety bugs while still leveraging NVIDIA’s powerful SIMT execution model. “This is a game-changer for the Rust GPU ecosystem,” commented an independent GPU programming expert. “It allows Rust developers to tap into CUDA’s full potential without sacrificing the language’s guarantees.”
However, the project is explicitly experimental. The team warns that it is not yet ready for production use. Future work includes improving code generation quality, supporting more device intrinsics, and stabilizing the Pliron-based pipeline. The release invites community contributions to help mature the compiler.
Interested developers can find the source code on NVIDIA’s GitHub. The team encourages experimentation and feedback, emphasizing that while cuda-oxide is young, its architecture – built on Stable MIR and Pliron – is designed for longevity and cross-platform potential. “We’re just getting started,” the NVIDIA AI spokesperson added. “The goal is to make Rust a first-class language for GPU computing, and this is the first big step.”