V.E.L.O.C.I.T.Y.-OS, a bare-metal operating system described by its developer in a multi-part series, expands its multitasking and update capabilities while pursuing self-evolution. In the swarms update (Part 11), the system introduces a Nexus Core Swarm Runtime to run background compilation and model inference concurrently with GUI rendering. The runtime spawns sandboxed “agents” under a cooperative scheduler and uses lock-free shared-memory message rings with Merkle-hash validation on write and read. The same release also describes a headless “Beacon” streaming protocol that divides the display into an 80×50 grid, computes per-cell signatures each tick, and streams RLE delta frames at 30+ FPS over serial or Ethernet; it also injects remote keyboard and mouse events into the kernel input queues. For reliability, the OS implements zero-downtime OTA hot-patching for kernel drivers using cryptographic signature verification, atomic compare-and-swap pointer swapping, and Read-Copy-Update (RCU) reclamation after CPU quiescent ticks. In the self-evolution update (Part 12), the developer claims a telemetry-driven loop that uses CPU cycle counters and page-fault signals to prompt a local Qwen-Coder analyzer to generate optimized AST candidates, which are sandboxed and hot-swapped into the running kernel.
V.E.L.O.C.I.T.Y.-OS adds swarms, headless streaming, and RCU zero-downtime driver updates
V.E.L.O.C.I.T.Y.-OS, a bare-metal operating system described by its developer in a multi-part series, expands its multitasking and update capabilities while pursuing self-evolution. In the swarms upda...
- V.E.L.O.C.I.T.Y.-OS describes a Nexus Core Swarm Runtime that spawns sandboxed agents under a cooperative scheduler and uses Merkle-hash-validated message rings for inter-agent communication.
- The system supports background tasks (e.g., compilation and model inference) running alongside GUI rendering to address multitasking bottlenecks.
- A headless “Beacon” protocol streams display updates by sending RLE-encoded delta frames for an 80×50 cell grid at 30+ FPS, and sends remote input back into kernel input queues.
- Zero-downtime OTA hot-patching uses cryptographic payload signature verification, atomic pointer swaps (CAS), and Read-Copy-Update (RCU) reclamation after CPU cores reach quiescent ticks.
- The self-healing/self-evolution loop is described as using CPU telemetry (e.g., cycle counters) to detect latency anomalies, then generating and hot-swapping optimized code candidates via a local LLM.
I had arrived at the final frontier. My bare-metal kernel was booting in QEMU, driving NVMe block storage, running multi-agent swarms, and rendering a force-directed canvas. But to make V.E.L.O.C.I.T.Y.-OS a truly next-generation system, I needed to close the loop: the operating system had to be able to evolve and compile itself without human intervention. The V.E.L.O.C.I.T.Y.-OS 12-Part Roadmap We are building a bare-metal, self-healing operating system running entirely inside the CPU's L3 cache. Here is the roadmap for this 12-part series: Part 1: The Spark — Exposing the "Safe-Room" security leak and building the compiler gate. Part 2: The NDA Language — Designing a content-addressed triplet representation to cure context bloat. Part 3: Ditching the Web Stack — Building a native 30MB IDE with 1,500,000x IPC latency drops. Part 4: The Closure JIT — Compiling AST blocks to nested closures and bypassing borrow checker limits. Part 5: JIT Math Optimizations — Replacing division operations with precomputed 16-bit lookup tables. Part 6: x86-64 Assembler & SCEV-Lite — Compiling scalar loops directly to native code in constant time. Part 7: Classic Compiler Passes — Implementing inter-procedural Dead Code Elimination and loop unrolling. Part 8: Reclaiming Ring 0 — Exiting UEFI boot services and transitioning the kernel to Ring 0. Part 9: Bare-Metal Drivers — Writing a PCI scanner, NVMe block storage controller, and FAT32 parser. Part 10: Synaptic Canvas — Rendering a spatial, force-directed GUI based on model token activation vectors. Part 11: Swarms & Hot-Patching — Building multi-agent scheduling and zero-downtime RCU driver updates. Part 12: Self-Evolution — Handing system control over to a local LLM Terminal that self-optimizes via telemetry. (You are here) During the final hours of my Sunday morning sprint, I completed the self-healing loop, the Biosphere P2P registry, and the Boot-to-NDA LLM Terminal handover. To achieve self-healing, I built a Ring 0 telemetry system. The kernel monitors JIT execution speeds using the CPU’s Time Stamp Counter (RDTSC). If telemetry detects performance degradation or anomalous page faults in a module, it feeds the module’s AST and performance log directly to the local Qwen-Coder-0.5B analyzer. The model reasons over the code, JIT-compiles optimized candidates, sandboxes them for safety, and hot-swaps them dynamically in memory, improving execution speeds on-the-fly. Here is the closed-loop self-evolution pipeline mapping how telemetry metrics trigger AST optimization passes and hot-swapping: Fig 1: The closed-loop self-evolution cycle of the operating system. Here is the self-healing loop code from src/evolution.rs that detects latency anomalies, triggers AST optimization passes, JIT-compiles the clean candidates, and registers the optimized function pointer dynamically: // velocity-bootloader/src/evolution.rs — Self-Healing Loop pub static GLOBAL_ASTS: Mutex<BTreeMap<u64, NdaNode>> = Mutex::new(BTreeMap::new()); // Track function latency via RDTSC; trigger healing if average cycles exceed 1,500,000 pub fn track_latency(hash: u64, cycles: u64) { let mut stats = TELEMETRY.lock(); if let Some(node) = stats.iter_mut().find(|n| n.hash == hash) { node.total_cycles += cycles; node.call_count += 1; let avg = node.total_cycles / node.call_count; if avg > 1_500_000 && node.call_count == 10 { // Performance degradation limit crate::serial_println!("[Self-Evolution] Latency warning on hash {:016X}. Avg: {}", hash, avg); trigger_healing_loop(hash); } } else { stats.push(TelemetryNode { hash, total_cycles: cycles, call_count: 1 }); } } fn trigger_healing_loop(hash: u64) { crate::serial_println!("[Self-Evolution] Initiating reflection self-healing loop for {:016X}...", hash); // 1. Retrieve raw function AST from global sitemap register let node_opt = GLOBAL_ASTS.lock().get(&hash).cloned(); let node = match node_opt { Some(n) => n, None => { return; } }; let func_nodes = match &node { NdaNode::Scope { children } => children.clone(), _ => alloc::vec![node.clone()], }; // 2. Run AST optimizer passes (Constant folding, DCE, Loop unrolling) let opt_nodes = crate::nda_jit::optimize_ast(&func_nodes); // 3. JIT compile optimized AST candidate inside the safety sandbox let program = crate::nda_jit::compile(&opt_nodes); // 4. Hot-swap the compiled function pointer atomically in the Sitemap table if let Some(opt_fn) = program.fns.first() { crate::profile::register_optimized_kernel(hash, opt_fn.clone()); crate::serial_println!("[Self-Evolution] Swap complete. Function {:016X} hot-patched.", hash); } } 2. The P2P Registry Biosphere (biosphere.rs) To share modules safely across nodes, I built The Biosphere—a content-addressed P2P registry. Modules import dependencies directly by their Merkle hash (import "8f2ca9..."). If a duplicate dependency is requested, the registry maps it to the same physical memory page in my Single Address Space. This dynamically deduplicates code and ensures that identical dependencies share physical RAM. 3. SMP Core Pinning & IRQ-C (cognitive_bus.rs) Running model inference at the same time as system execution was causing frame drops. I implemented SMP Core Pinning: I pinned background LLM inference tasks exclusively to Core 3, leaving Cores 0-2 free to handle low-latency system ticks and compositor frame rendering. I added Predictive KV Cache Pre-fetching (predictive.rs), which tokenizes ahead of typing to pre-calculate K/V attention mappings in the background, rendering predictions instantly. 4. Boot-to-NDA: The Pure-Glass Handover (pure_glass.rs) The ultimate phase was removing the bootloader scaffolding. During the Boot-to-NDA handover, the UEFI bootloader transfers control to BOOT_ND.BIN. The kernel relinquishes all native Rust registers and execution scopes. All system operations—including the parser, JIT compiler, and GOP canvas compositor—run entirely within JIT-compiled bytecode, accessing hardware ports and MMIO via standardized bytecode shims (sys_in_u8, sys_write_mem32). No native Rust or C code remains active in memory. velocity:> draw a red square at 100 100 [LLM Terminal] Parsing intent -> JIT bytecode compiled in 62us -> GOP rendering executed. In this environment, you don't type syntax. The LLM Terminal acts as your shell. Because the model knows the exact system state via the live Merkle root, you give it plaintext commands, and it compiles opcode-level JIT instructions on-the-fly to execute them. What's Next: The Universal Application Translators What started on June 23rd as a casual comment thread about Kimi K2.7 pricing transformed in just 5 days into a working, 1.1ms-booting bare-metal operating system running in 6MB of L3 cache. I proved that by designing the data structure and JIT compilation to match the model’s internal representation, I could close the gap between developer intent and execution correctness to zero. But this is not the end of the journey—it is just the first major milestone. I will be publishing future updates on this blog as an ongoing series to document the development of V.E.L.O.C.I.T.Y.-OS. The biggest upcoming challenge is answering the question: How do we run legacy software? In the next phases, I will be deep-diving into two major architectural blueprints: The Universal Application Translator (WASI to NDA): A pipeline that takes standard applications (Rust, C++, Go) compiled to WebAssembly (WASI) and translates them into native NDA bytecode, bridging legacy OS dependencies (file I/O, threading) into native V.E.L.O.C.I.T.Y. kernel syscalls. The Universal Binary-to-NDA Lifter: A static decompilation engine that lifts raw compiled binaries (x86-64 Windows PE/Linux ELF) into high-level NDA AST representation. This will allow the kernel to run Auto-Vectorization optimization passes on legacy loops and execute them natively with software-enforced safety. This is how we will get legacy apps like Notepad++ running natively in 2-bit quantized bytecode. A Final Thank You This first major milestone would have never been achieved without the intense, daily design critiques from Pascal CESCATOFollow Full-stack dev sharing practical guides on WordPress, n8n automation, AI tools, Docker & self-hosting. Always experimenting with new tech to make life easier. . Pascal pushed me to move beyond simple prompts, to challenge Node.js/Electron bloat, to solve distributed consensus, and to think about the bootstrap path of Forth and Lisp machines. V.E.L.O.C.I.T.Y.-OS is as much a testament to our collaboration in that comment section as it is to the code itself. The system is booting, the framework is standing, and the horizon is wide open. Stay tuned for the next phase of updates! 🛸 Discussion What are your thoughts on self-evolving software architectures? How do we build guardrails to ensure that AI-driven code modification remains stable, secure, and predictable at bare metal? Let's discuss in the comments below! Special thanks to Pascal CESCATOFollow Full-stack dev sharing practical guides on WordPress, n8n automation, AI tools, Docker & self-hosting. Always experimenting with new tech to make life easier. for grounding my bare-metal sprint in the historical wisdom of Forth and Lisp machines. Disclaimer: AI was used throughout this project, it is just fitting that it would co-author with me, so special thanks to the Foundry for its tireless hours toiling away and Gemini for producing the cover image.
2 hours agoWith the Synaptic Canvas GUI rendering, my bare-metal kernel was fully functional. However, as I expanded the OS features, I ran into multitasking bottlenecks: how do I run background compilation, model inference, and GUI rendering concurrently without crashing the system? Last night, I solved this by implementing three core infrastructure services: Nexus Swarms, Beacon Headless Streaming, and Zero-Downtime OTA Hot-Patching. The V.E.L.O.C.I.T.Y.-OS 12-Part Roadmap We are building a bare-metal, self-healing operating system running entirely inside the CPU's L3 cache. Here is the roadmap for this 12-part series: Part 1: The Spark — Exposing the "Safe-Room" security leak and building the compiler gate. Part 2: The NDA Language — Designing a content-addressed triplet representation to cure context bloat. Part 3: Ditching the Web Stack — Building a native 30MB IDE with 1,500,000x IPC latency drops. Part 4: The Closure JIT — Compiling AST blocks to nested closures and bypassing borrow checker limits. Part 5: JIT Math Optimizations — Replacing division operations with precomputed 16-bit lookup tables. Part 6: x86-64 Assembler & SCEV-Lite — Compiling scalar loops directly to native code in constant time. Part 7: Classic Compiler Passes — Implementing inter-procedural Dead Code Elimination and loop unrolling. Part 8: Reclaiming Ring 0 — Exiting UEFI boot services and transitioning the kernel to Ring 0. Part 9: Bare-Metal Drivers — Writing a PCI scanner, NVMe block storage controller, and FAT32 parser. Part 10: Synaptic Canvas — Rendering a spatial, force-directed GUI based on model token activation vectors. Part 11: Swarms & Hot-Patching — Building multi-agent scheduling and zero-downtime RCU driver updates. (You are here) Part 12: Self-Evolution — Handing system control over to a local LLM Terminal that self-optimizes via telemetry. 1. The Nexus Core Swarm Runtime (nexus.rs) To support concurrent compilation and optimization, I built the Nexus Core Swarm Runtime. The runtime allows JIT threads or the LLM shell to launch child agents via sys_spawn_agent(source_ptr, source_len, mem_limit). Each spawned agent (such as the translator_agent or optimizer_agent) runs in an isolated heap with sandboxed PIDs under a cooperative scheduler. Agents communicate using Synaptic Message Rings—lock-free circular ring buffers in shared memory. Every packet header contains a rolling Merkle hash calculated on write and validated on read to prevent message corruption. Here is the cooperative context switcher implementation in src/gui.rs showing the raw assembly context swap and how task registers are pushed and popped to switch execution stacks on core quiescent ticks: // velocity-bootloader/src/gui.rs — Cooperative Context Switcher pub struct JitTask { pub id: usize, pub title: String, pub program: Arc<crate::nda_jit::JitProgram>, pub stack: Vec<u8>, pub rsp: u64, pub completed: bool, } pub struct CooperativeScheduler { pub tasks: Vec<JitTask>, pub current_task_idx: Option<usize>, pub scheduler_rsp: u64, } // Low-level assembly context switcher (Win64 calling convention) #[cfg(target_os = "uefi")] #[unsafe(naked)] pub unsafe extern "win64" fn switch_context(from_rsp: *mut u64, to_rsp: u64) { core::arch::naked_asm!( // 1. Preserve floating-point and SIMD context registers "sub rsp, 160", "movdqu [rsp + 0], xmm6", "movdqu [rsp + 16], xmm7", "movdqu [rsp + 32], xmm8", "movdqu [rsp + 48], xmm9", "movdqu [rsp + 64], xmm10", "movdqu [rsp + 80], xmm11", "movdqu [rsp + 96], xmm12", "movdqu [rsp + 112], xmm13", "movdqu [rsp + 128], xmm14", "movdqu [rsp + 144], xmm15", // 2. Preserve standard registers "push rbx", "push rbp", "push rdi", "push rsi", "push r12", "push r13", "push r14", "push r15", // 3. Swap stack pointer registers "mov [rcx], rsp", // Save old stack pointer "mov rsp, rdx", // Load new stack pointer // 4. Restore new task's registers "pop r15", "pop r14", "pop r13", "pop r12", "pop rsi", "pop rdi", "pop rbp", "pop rbx", "movdqu xmm15, [rsp + 144]", "movdqu xmm14, [rsp + 128]", "movdqu xmm13, [rsp + 112]", "movdqu xmm12, [rsp + 96]", "movdqu xmm11, [rsp + 80]", "movdqu xmm10, [rsp + 64]", "movdqu xmm9, [rsp + 48]", "movdqu xmm8, [rsp + 32]", "movdqu xmm7, [rsp + 16]", "movdqu xmm6, [rsp + 0]", "add rsp, 160", "ret" ); } 2. The Beacon Remote Headless Protocol (beacon.rs) For edge VMs or headless servers without physical displays, I developed the Beacon headless Protocol. The compositor divides the screen into an 80×50 grid of cells. On every tick, the protocol computes signatures for each cell, detects pixel changes, and streams Run-Length Encoded (RLE) delta frames over COM1 serial or Ethernet at 30+ FPS. Incoming packets from Beacon clients decode keyboard and mouse movements, injecting them directly into the kernel's keyboard::INPUT_QUEUE and mouse registers. (Note: This custom protocol will be replaced with V.E.L.O.C.I.T.Y. Remote soon). 3. Zero-Downtime OTA Hot-Patching (ota.rs) If a core OS driver (such as fat or nvme) has a bug, rebooting a live JIT compiler is dangerous. I built a cryptographic Zero-Downtime OTA Hot-Patching module. // Atomic CAS swap of the active FAT32 read pointer let old_ptr = FAT_READ_PTR.swap(new_ptr, Ordering::SeqCst); Core driver entrypoints are stored in a global Sitemap Dispatch Table. When an update is pushed, the kernel: Allocates fresh memory pages and compiles the new driver code. Cryptographically verifies the payload signature against the public developer key embedded in the bootloader. Swaps the function pointers atomically using a Compare-And-Swap (lock cmpxchg) instruction. Reclaims the old memory pages using a Read-Copy-Update (RCU) reclamation pattern once all active CPU cores pass their quiescent ticks. Here is the architectural overview comparing the multi-agent cooperative stack switcher and RCU pointer hot-patching pipeline: Fig 1: Cooperative task context switching and RCU driver hot-patching architecture. Pascal's Analysis: Distributed Transactions Pascal CESCATOFollow Full-stack dev sharing practical guides on WordPress, n8n automation, AI tools, Docker & self-hosting. Always experimenting with new tech to make life easier. analyzed the agent coordination and hot-patching architecture: "The pre-commit notification pattern... is essentially a distributed transaction with optimistic concurrency. The discourse board is your conflict resolution layer... The audit trail isn't just for debugging — it's a record of why each change was made and who agreed to it." Pascal noted that by utilizing RCU pointer swapping and Merkle message verification, the OS was executing kernel-level code updates with identical safety guarantees as database transactions. But to make this OS self-improving, I needed a way to let the local LLM optimize its own kernel code on-the-fly. In the next post, I'll document how I completed the self-healing loop, the content-addressed Biosphere registry, and the Boot-to-NDA LLM Terminal handover. Discussion How do you handle task scheduling and state consensus in multi-agent environments? Have you implemented cooperative context switching or dynamic RCU hot-patching in low-level systems? Let's discuss in the comments below! Special thanks to Pascal CESCATOFollow Full-stack dev sharing practical guides on WordPress, n8n automation, AI tools, Docker & self-hosting. Always experimenting with new tech to make life easier. for helping me conceptualize the conflict resolution board for multi-agent state consensus. Disclaimer: AI was used throughout this project, it is just fitting that it would co-author with me, so special thanks to the Foundry for its tireless hours toiling away and Gemini for producing the cover image.
3 hours agox64 Assembly ICMP sniffer demonstrates Linux raw sockets and manual IP-to-ASCII handling
Two Dev.to posts describe an x86_64 Linux ICMP sniffer implemented without standard libraries, using direct system calls...
Jasmine Bhasin Hospitalised in Dubai on Birthday, Boyfriend Aly Goni Shares Update
Actress Jasmine Bhasin is hospitalised in Dubai after falling ill during her birthday trip. Her boyfriend, actor Aly Gon...
Amazon cuts prices on M5 MacBook Air and MacBook Pro, alongside Apple Watch and AirPods deals
Multiple outlets report broad Amazon markdowns across Apple’s latest Mac lineup, including M5 MacBook Air and M5 MacBook...