Memory-Safe Until It Isn’t: The Rust Kernel Bug That Broke Linux
The disclosure of CVE-2025-68260, the first publicly assigned CVE affecting Rust code in the Linux kernel, triggered a disproportionate level of attention compared to its immediate technical impact.
Headlines framed it as a symbolic failure: “Rust breaks,” “memory safety promises collapse,” or “Linux’s Rust experiment backfires.” These interpretations obscure what actually happened and, more importantly, what the event teaches about systems programming, concurrency, and language guarantees.
This article examines three tightly related topics:
What CVE-2025-68260 actually was, technically
The goals and constraints of the Rust-for-Linux initiative
Why race conditions remain a hard problem even in Rust, especially in kernel code
The goal is not to defend Rust, nor to criticize Linux developers, but to clarify where responsibility lies: in invariants, concurrency design, and the unavoidable complexity of kernel-level programming.
Background: The Rust-for-Linux Initiative
The Linux kernel is historically written in C. Over decades, this has produced unmatched performance and portability, but also a persistent class of vulnerabilities: memory safety errors. Use-after-free, buffer overflows, double frees, and invalid pointer dereferences account for a large fraction of kernel CVEs every year.
Rust was introduced into the kernel with a narrowly defined objective:
Reduce the number of memory safety vulnerabilities in new code, especially drivers.
This objective is often misrepresented as “make the kernel safe” or “eliminate CVEs.” That was never technically plausible.
Key constraints of kernel Rust:
Rust is opt-in, not a rewrite
Existing C subsystems remain dominant
Rust code must interoperate with C
Many kernel operations require unsafe
std is unavailable; only core and alloc are provided
Rust in Linux follows a pattern common in high-assurance systems:
Safe outer layers (API usage, object ownership)
Unsafe inner cores (raw pointers, synchronization, hardware access)
This design deliberately concentrates risk rather than attempting to eliminate it.
What Was CVE-2025-68260?
CVE-2025-68260 affected the Android Binder driver, which had been partially rewritten in Rust. The vulnerability was a race condition in a data structure managing object lifetimes (commonly referred to as a “death list”). Under specific concurrent interleavings, an object could be accessed after it was logically invalid, leading to memory corruption.
Key facts:
The bug occurred inside an unsafe block
The root cause was incorrect lifetime/concurrency assumptions
Rust’s compiler guarantees were explicitly suspended
The vulnerability was real and exploitable
Why This Became “The First Rust CVE”
A CVE (Common Vulnerabilities and Exposures) is a standardized identifier assigned to a publicly disclosed security vulnerability. CVEs are maintained under the CVE Program, historically operated by MITRE, and serve as a common reference point across vendors, researchers, security tools, and security advisories.
A CVE entry does not imply that a vulnerability is novel, severe, or indicative of a systemic failure. It simply means that the issue is considered security-relevant, is reproducible or theoretically exploitable, and merits tracking, coordination, and public disclosure. In practice, CVEs function as index numbers, not technical judgments. Thousands are issued each year, many for routine bugs in long-mature systems.
In the Linux kernel specifically, C-based code receives a large volume of CVEs annually. Many are minor, highly localized, or low-impact, and most receive little attention outside vulnerability databases.
Technically, Rust code in the kernel had already contained bugs prior to CVE-2025-68260. What made this vulnerability notable was not its technical uniqueness, but that it crossed the formal threshold for CVE assignment, was clearly attributable to Rust code rather than C glue, and directly contradicted popular narratives that equated Rust adoption with the complete elimination of kernel memory vulnerabilities. This combination made the issue symbolically significant, even though it was not unprecedented from an engineering standpoint.
Understanding Rust’s Safety Model
To interpret the CVE correctly, it is necessary to understand what Rust does—and does not—guarantee.
What Safe Rust Guarantees
In safe Rust, the compiler enforces:
No data races
No use-after-free
No dangling references
No invalid aliasing
No out-of-bounds memory access
These guarantees are enforced statically.
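As a brief illustration (a generic example, unrelated to the CVE itself), the borrow checker rejects dangling references before the program ever runs; the compiling counterpart ties every reference to data that provably outlives it:

```rust
// The commented-out version fails to compile with
// "`s` does not live long enough":
//
//     let r;
//     {
//         let s = String::from("kernel");
//         r = &s;
//     } // `s` is dropped here, so `r` would dangle
//     println!("{}", r);
//
// The accepted version ties the returned reference's lifetime to its input:
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let text = String::from("memory safety");
    let w = first_word(&text); // `text` outlives `w`, so this compiles
    assert_eq!(w, "memory");
    println!("{}", w);
}
```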
The Meaning of unsafe
In Rust, the unsafe keyword does not mean that code is incorrect or reckless. It means that the compiler’s ability to enforce certain guarantees ends, and responsibility for maintaining those guarantees is explicitly transferred to the programmer.
More precisely, unsafe allows operations that the Rust compiler cannot statically verify, including:
Dereferencing raw pointers
Calling foreign (FFI) functions
Accessing mutable global state
Implementing custom synchronization primitives
Performing low-level memory manipulation
Inside an unsafe block, Rust’s usual guarantees—such as freedom from data races, correct lifetimes, and valid aliasing—are no longer enforced automatically. The compiler assumes that the programmer has upheld the required invariants manually.
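A minimal user-space sketch (hypothetical, not kernel code) of what this transfer of responsibility looks like in practice: the SAFETY comment records the invariant the compiler can no longer check.

```rust
fn read_first(v: &[u64]) -> u64 {
    assert!(!v.is_empty());
    let p: *const u64 = v.as_ptr();
    // SAFETY: `v` is non-empty (asserted above) and `p` points into its
    // live allocation, which outlives this call. The compiler cannot
    // verify this; the comment documents the invariant upheld manually.
    unsafe { *p }
}

fn main() {
    let data = vec![7u64, 8, 9];
    assert_eq!(read_first(&data), 7);
}
```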
Why unsafe Is Essential in the Linux Kernel
The Linux kernel operates in an environment that fundamentally violates many of Rust’s core assumptions. As a result, unsafe is not an exception in kernel code; it is a necessity.
Specifically, kernel development requires:
Direct interaction with hardware
Memory-mapped I/O, device registers, DMA buffers, and interrupt controllers cannot be expressed in safe Rust because they involve raw addresses and aliasing that the compiler cannot reason about.
Manual memory management and lifetimes
Kernel objects often have lifetimes that depend on global state, reference counts, or external subsystems. These lifetimes are not hierarchical or lexical, making them incompatible with Rust's borrow checker without unsafe.
Custom concurrency and synchronization models
The kernel uses spinlocks, RCU (Read-Copy-Update), atomic operations, and lock-free data structures tailored to preemption, interrupt context, and multi-CPU execution. These patterns cannot be implemented purely with Rust's standard safe concurrency primitives.
Interoperation with existing C code
The Linux kernel is predominantly written in C. Rust code must call into, and be called from, C code that does not follow Rust's ownership or aliasing rules. Crossing this boundary requires unsafe by definition.
In the context of the Android Binder driver, unsafe was used to express invariants such as:
“This pointer remains valid while this lock is held”
“This object cannot be freed while it is present on this list”
“These two fields are mutated under mutually exclusive conditions”
These invariants are real, necessary, and kernel-specific—but they are also outside the compiler’s ability to verify.
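As a sketch of how such an invariant can instead be carried by ownership (hypothetical types, using user-space std primitives rather than the kernel's own), the second invariant — "this object cannot be freed while it is present on this list" — becomes automatic when the list holds a strong reference:

```rust
use std::sync::{Arc, Mutex};

struct Obj {
    id: u64,
}

// Sketch only (not Binder's actual design): the list owns an `Arc` clone
// of each entry, so an object cannot be deallocated while it is still on
// the list. The invariant is enforced by ownership, not by discipline.
struct DeathList {
    entries: Mutex<Vec<Arc<Obj>>>,
}

impl DeathList {
    fn new() -> Self {
        Self { entries: Mutex::new(Vec::new()) }
    }

    fn add(&self, obj: Arc<Obj>) {
        self.entries.lock().unwrap().push(obj);
    }

    fn remove(&self, id: u64) -> Option<Arc<Obj>> {
        let mut list = self.entries.lock().unwrap();
        let idx = list.iter().position(|o| o.id == id)?;
        Some(list.remove(idx))
    }
}

fn main() {
    let list = DeathList::new();
    let obj = Arc::new(Obj { id: 1 });
    list.add(Arc::clone(&obj));
    // Two strong references: the caller's handle plus the list's entry.
    assert_eq!(Arc::strong_count(&obj), 2);
    assert_eq!(list.remove(1).unwrap().id, 1);
}
```

In the kernel, sleeping mutexes and `Arc` as shown here are often unavailable or too costly, which is exactly why such invariants end up expressed manually inside unsafe regions.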
What unsafe Actually Signals
Rather than indicating a flaw, unsafe serves as a risk boundary marker:
It localizes trust assumptions
It highlights code requiring deeper review
It documents where correctness depends on reasoning beyond the type system
CVE-2025-68260 did not arise because unsafe was used unnecessarily. It arose because one of the invariants upheld inside an essential unsafe region was violated under a rare concurrency interleaving. This distinction is critical: the presence of unsafe is expected in kernel Rust, and the vulnerability reflects the inherent difficulty of concurrent systems programming, not a misuse of the language itself.
Why Race Conditions Are Special
Memory Safety vs. Concurrency Correctness
Rust is exceptionally good at preventing spatial and lifetime errors. Race conditions are temporal errors. They depend on when things happen, not where they live.
Examples of race conditions that Rust cannot prevent by itself:
Check-then-act races
Incorrect lock ordering
Lifetime assumptions across callbacks
Concurrent teardown and use
RCU grace-period violations
These are logic problems, not type problems.
A Simple Safe Rust Race Example
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};
use std::thread;

fn main() {
    let ready = Arc::new(AtomicBool::new(false));
    let mut handles = Vec::new();
    for _ in 0..4 {
        let ready = ready.clone();
        handles.push(thread::spawn(move || {
            // Check-then-act race: several threads can observe `false`
            // here before any of them stores `true`.
            if !ready.load(Ordering::Relaxed) {
                initialize();
                ready.store(true, Ordering::Relaxed);
            }
        }));
    }
    // Join so the spawned threads are not killed when main returns.
    for h in handles {
        h.join().unwrap();
    }
}

fn initialize() {
    println!("initializing");
}
This code is 100% safe Rust. There is no memory corruption. But the logic is incorrect if initialization must be one-time. Rust does not (and cannot) prevent this. This example should instead be rewritten using a synchronization primitive that enforces one-time initialization semantics, such as std::sync::Once or OnceLock, which are designed to prevent check-then-act races. These abstractions encode the required temporal invariant directly in the API, removing the need for timing-dependent reasoning by the programmer.
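A sketch of that corrected version, using std::sync::OnceLock (a user-space primitive; the kernel has its own one-time-init mechanisms), with an atomic counter added here purely to demonstrate that initialization runs exactly once:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

static READY: OnceLock<()> = OnceLock::new();
// Counter exists only to verify the once-only property in this example.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

fn initialize() {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    println!("initializing");
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                // `get_or_init` guarantees `initialize` runs exactly once,
                // regardless of how the threads interleave.
                READY.get_or_init(|| initialize());
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(INIT_CALLS.load(Ordering::SeqCst), 1);
}
```

The temporal invariant ("at most one initialization") now lives in the API rather than in the programmer's head.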
Why Kernel Races Are Harder
Kernel concurrency adds additional complexity:
Preemption
Interrupt context
Multiple CPUs
Weak memory ordering
Lock-free structures
Cross-language invariants
Many kernel paths cannot block, which eliminates simple locking strategies.
Why “Just Use a Mutex” Is Not a Valid Kernel Answer
In user space, the fix for many races is trivial:
Mutex<Vec<T>>
In the kernel, this approach often fails for structural reasons.
Kernel Constraints
Blocking may be illegal (interrupt context)
Mutexes may sleep
Some code runs with interrupts disabled
Some paths require lock-free progress
Performance constraints prohibit coarse locking
The Binder driver uses carefully designed concurrency patterns that cannot be replaced with naïve locking without breaking semantics or performance.
What Actually Went Wrong in the Binder Driver
While details vary, the class of bug was:
A shared object list
Concurrent modification and teardown
A missing or incorrect synchronization boundary
A lifetime assumption violated under a rare interleaving
This is exactly the kind of bug Rust does not automatically prevent once unsafe is involved.
Importantly:
The same bug in C would likely be larger and harder to audit
Rust localized the unsafe region
The blast radius was smaller than comparable C bugs
Below is a simplified illustrative example, not the actual Binder code.
use core::ptr::NonNull;

struct Node {
    value: u64,
}

// A raw pointer published through mutable global state: the compiler can
// no longer track whether the pointee is still alive.
static mut GLOBAL: Option<NonNull<Node>> = None;

unsafe fn publish(node: &mut Node) {
    GLOBAL = Some(NonNull::new_unchecked(node));
}

unsafe fn read() -> u64 {
    // Nothing guarantees the Node behind GLOBAL still exists.
    GLOBAL.unwrap().as_ref().value
}
If one thread calls publish, and another concurrently frees node, read becomes a use-after-free. Rust cannot prevent this because:
Raw pointers are involved
Lifetime is external
Synchronization is manual
This is structurally similar to many kernel lifetime bugs.
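For contrast, a user-space sketch of how shared ownership removes the hazard (std is unavailable in the kernel, so this only illustrates the ownership idea, not a drop-in kernel fix): the node cannot be freed while any reader still holds a handle.

```rust
use std::sync::{Arc, Mutex, OnceLock};

struct Node {
    value: u64,
}

// The global slot owns an `Arc`, so the published node stays alive at
// least as long as it remains published.
static GLOBAL: OnceLock<Mutex<Option<Arc<Node>>>> = OnceLock::new();

fn slot() -> &'static Mutex<Option<Arc<Node>>> {
    GLOBAL.get_or_init(|| Mutex::new(None))
}

fn publish(node: Arc<Node>) {
    *slot().lock().unwrap() = Some(node);
}

fn read() -> Option<u64> {
    // Cloning out of the lock would also work; either way, no reader can
    // ever observe a freed Node.
    slot().lock().unwrap().as_ref().map(|n| n.value)
}

fn main() {
    assert_eq!(read(), None);
    publish(Arc::new(Node { value: 42 }));
    assert_eq!(read(), Some(42));
}
```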
Best Practices for Rust in Kernel-Like Environments
1. Minimize unsafe Surface Area
Keep unsafe blocks as small as possible
Wrap unsafe internals in safe abstractions
Document invariants explicitly
/// SAFETY: caller must hold `lock` and ensure the object behind `ptr`
/// stays alive for the duration of the returned borrow.
unsafe fn access_raw<'a, T>(ptr: *const T) -> &'a T {
    &*ptr
}
2. Treat unsafe Like Cryptography
Mandatory review
No “obvious correctness” assumptions
Explicit reasoning about interleavings
3. Prefer Proven Concurrency Patterns
RCU-like schemes
Reference counting with strict ownership rules
Epoch-based reclamation
4. Avoid Cross-Language Lifetime Assumptions
Assume C code may violate Rust expectations
Treat FFI boundaries as hostile
Use defensive programming
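A small hypothetical sketch of that defensive posture: validate whatever C hands over before building a safe Rust view of it, rather than assuming the C side honored the contract.

```rust
/// Build a byte slice from a pointer received across a hypothetical FFI
/// boundary.
///
/// SAFETY: if `ptr` is non-null, the caller must guarantee it points to
/// `len` valid, immutable bytes for the lifetime 'a.
unsafe fn bytes_from_c<'a>(ptr: *const u8, len: usize) -> Option<&'a [u8]> {
    if ptr.is_null() {
        // Defensive: C callers may pass NULL even when the contract says not to.
        return None;
    }
    Some(std::slice::from_raw_parts(ptr, len))
}

fn main() {
    let data = [1u8, 2, 3];
    let view = unsafe { bytes_from_c(data.as_ptr(), data.len()) };
    assert_eq!(view, Some(&data[..]));
    assert!(unsafe { bytes_from_c(std::ptr::null(), 0) }.is_none());
}
```

The null check cannot make the function safe by itself (length and lifetime are still trusted), but it shrinks the set of C misbehaviors that turn into memory corruption.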
5. Test Concurrency Aggressively
Stress tests
Fuzzing with thread injection
CPU affinity variation
Artificial delays
Rust removes many bug classes, but testing remains mandatory.
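A minimal example of the stress-test idea (a hypothetical user-space harness, not a kernel test): release all threads simultaneously with a Barrier so contended interleavings are actually exercised, then assert an invariant afterwards.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

// Spawn `threads` workers, start them at the same instant, and hammer a
// shared counter. The invariant checked afterwards: no increment is lost.
fn stress(threads: usize, iters: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let barrier = Arc::new(Barrier::new(threads));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                barrier.wait(); // maximize contention: all threads start together
                for _ in 0..iters {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    assert_eq!(stress(8, 10_000), 80_000);
}
```

Harnesses in this spirit, combined with varied CPU affinity and injected delays, are what surface rare interleavings like the one behind the Binder bug.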
Does CVE-2025-68260 Undermine Rust in Linux?
From a technical standpoint: no.
What it demonstrates:
Rust does not eliminate concurrency bugs
Kernel programming inherently requires unsafe code
Language guarantees stop at abstraction boundaries
What it does not demonstrate:
That Rust is unsuitable for kernels
That Rust increases risk
That memory safety guarantees were false
Empirical data still shows that Rust significantly reduces vulnerability density in new kernel code.
The disproportionate reaction to this CVE stems from a mismatch between:
What Rust actually promises
What people believed it promised
Rust promises:
Memory safety where the compiler is allowed to enforce it
It does not promise:
Correct algorithms
Race-free designs
Automatic concurrency safety in unsafe code
When expectations are corrected, the event appears routine rather than shocking.