Why we are eliminating virtualization:
Bare-metal manifesto

For more than a decade, the virtual machine (VM) and the hypervisor have been the default foundation of the cloud. From AWS EC2 to Google Compute Engine, public clouds are built on virtualization. It splits one large host into many small instances so that multiple tenants share the same die, the same bus, and the same power. That extreme sharing and oversubscription built a massive business and a rich SaaS ecosystem.

But the hardware has changed, especially on Apple Silicon (M-series), where the ARM design reset desktop and workstation performance. Pasting the old x86 cloud model onto Mac compute runs into a hard engineering fact: virtualization chokes the very thing that makes Apple chips fast.

Today, the NOVAKVM team publishes this architecture note. We go deep: why, for high-performance compile paths and on-device LLM inference, virtualization is poison; and how we rebuilt the full stack to ship what we believe is the first pure, lossless, 100% native Apple Silicon bare-metal control plane in production.

To see why virtualization hurts, start with the Apple design advantage: unified memory architecture (UMA). On classic x86 workstations, system RAM and discrete GPU VRAM are separate. To move large data—textures, huge model weights—between CPU and GPU, everything crosses the slow, narrow PCIe link. That transfer cost is the so-called “PCIe wall.”
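
As a rough back-of-the-envelope sketch of that transfer cost: copy time is payload size divided by link bandwidth. The 32 GB/s figure below is an assumption (roughly the theoretical peak of a PCIe 4.0 x16 link), and the 40 GB payload is a hypothetical example, not a measurement from our lab.

```python
def transfer_seconds(payload_gb: float, link_gb_per_s: float) -> float:
    """Time to move payload_gb over a link that sustains link_gb_per_s."""
    if link_gb_per_s <= 0:
        raise ValueError("link bandwidth must be positive")
    return payload_gb / link_gb_per_s


# Assumption: ~32 GB/s, roughly the theoretical peak of PCIe 4.0 x16.
PCIE4_X16_GB_PER_S = 32.0

# Hypothetical payload: 40 GB of model weights copied from host RAM to VRAM.
copy_time = transfer_seconds(40.0, PCIE4_X16_GB_PER_S)
print(f"{copy_time:.2f} s per full copy")
```

Under these assumptions every full copy costs more than a second before any compute starts, and real links rarely sustain their theoretical peak.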

M4 and M4 Pro break that wall. CPU performance cores, efficiency cores, the multi-core GPU, and the Neural Engine sit on the same die and connect directly to a single large pool of high-bandwidth memory, up to 64GB on M4 Pro. When you run parallel Xcode builds or an MLX workflow with a 70B model, data is effectively zero-copy: the bytes stay in place, every core works from pointers into the same memory, and nothing is shuttled over PCIe.

Insert a hypervisor, and the picture breaks.

The virtualization layer sits between guest software and the hardware. Guest memory accesses go through an extra stage of address translation, and I/O such as concurrent filesystem operations has to be intercepted and then emulated or forwarded by the host. For compile workloads measured in minutes, that is a slow, avoidable, self-inflicted cost.

In our internal lab we used a real iOS app codebase with more than 2M lines of mixed Swift and Objective-C. The gap was large:

  • 100% bare-metal M4 Pro (14 cores / 64GB): build time 4m 15s. Steady system load, NVMe queues not saturated.
  • VM with the same M4 Pro profile (14 vCPU / 64GB): build time jumps to 12m 40s. Host fans at full speed; kernel data shows a large share of sys CPU time spent in nested page-table walks and blocked I/O virtualization paths.

BENCHMARK_EXECUTION.LOG
root@mg-lab-01:~$ xcodebuild -project Core_Enterprise.xcodeproj -benchmark
> Initiating Build Sequence...
> Allocating parallel compilation threads: 14

[VM Environment Detected - KVM/Hypervisor Hooked]
> Translation overhead: SEVERE
> I/O wait time spiking: > 4200ms
> Result: Build finished in 760.5s. (12m 40s)

root@mg-lab-02:~$ xcodebuild -project Core_Enterprise.xcodeproj -benchmark
> Initiating Build Sequence...

[Bare Metal Environment Detected - Direct Hardware Access]
> Native Metal framework hooked. Zero-copy memory mapped.
> Direct NVMe PCIe lanes confirmed.
> Result: Build finished in 255.0s. (4m 15s)
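
To put the two results side by side, the arithmetic can be sketched as follows. The only inputs are the two wall-clock times from the log above; the helper name `fmt` is ours, for illustration.

```python
def fmt(seconds: float) -> str:
    """Render a duration in seconds as 'Xm Y.Ys'."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}m {rest:.1f}s"


vm_s = 760.5    # VM result from the log above
bare_s = 255.0  # bare-metal result from the log above

slowdown = vm_s / bare_s  # how many times longer the VM build takes
extra = vm_s - bare_s     # wall time lost per build

print(f"VM: {fmt(vm_s)}, bare metal: {fmt(bare_s)}")
print(f"slowdown: {slowdown:.2f}x, extra wall time per build: {fmt(extra)}")
```

On these numbers the VM build takes about 2.98 times as long, which is 8m 25.5s of extra wall time per build.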

Beyond the instruction and I/O tax, colocated VMs bring the well-known cloud problem called the “noisy neighbor” effect. With multiple VMs on one M4, your instance jitters whenever another tenant hammers the CPU, and L2 and system cache lines thrash. For CI/CD that must be stable, that variability is a serious failure mode.

Worse, sharing weakens the hardware security boundary. Side-channel attacks, from Spectre and Meltdown to ARM-specific variants, remain a real class of issues.

At NOVAKVM, we end this the blunt way, because it works: physical isolation. You are not buying abstract “compute”. You get a real Mac mini with a board and an aluminum enclosure. When the lease ends, the control plane invokes the chip's highest-security path to the Secure Enclave to destroy the hardware-wrapped key material, and then performs a deep physical overwrite of the entire NVMe drive following the DoD 5220.22-M standard.

After mapping the cost of virtualization, the team had a hard choice: use the same VM model everyone else uses and oversubscribe, or take the harder, less traveled path. We took the second.

Human operators cannot be the bottleneck for cloud. At NOVAKVM, the control plane is code, end to end.

After a long design and bring-up cycle, our network and platform engineers built a high-concurrency Mac bare-metal orchestration daemon from scratch in Go, on top of Apple enterprise MDM (Mobile Device Management).

When you click DEPLOY INSTANCE in the console, the following sequence runs automatically, and fast, in a quiet colo:

  1. Schedule and pick: the scheduler claims a physical machine in the resource pool that is still powered off.
  2. Reset and network boot: a smart PDU powers the unit; the switch places the port on an isolated recovery VLAN.
  3. Native image deploy: a deploy server on a 10G internal uplink streams a clean macOS image; typical time is under 90 seconds.
  4. Handoff: MDM issues commands for static public IPv4 routing and writes your SSH public key into the system authorized_keys trust path.
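
The four steps above can be sketched as a small pipeline. This is an illustrative Python model, not the production Go daemon: `Machine`, `Stage`, `claim_idle_machine`, `provision`, and the four callback parameters are all hypothetical names, and the callbacks stand in for the real PDU, switch, deploy-server, and MDM integrations.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Stage(Enum):
    SCHEDULED = "scheduled"
    NETBOOTED = "netbooted"
    IMAGED = "imaged"
    HANDED_OFF = "handed_off"


@dataclass
class Machine:
    serial: str
    powered_on: bool = False
    leased: bool = False
    history: List[Stage] = field(default_factory=list)


def claim_idle_machine(pool: List[Machine]) -> Optional[Machine]:
    """Step 1: claim the first machine that is powered off and unleased."""
    for machine in pool:
        if not machine.powered_on and not machine.leased:
            machine.leased = True
            machine.history.append(Stage.SCHEDULED)
            return machine
    return None


def provision(
    pool: List[Machine],
    ssh_public_key: str,
    power_on: Callable[[Machine], None],
    move_to_recovery_vlan: Callable[[Machine], None],
    stream_image: Callable[[Machine], None],
    mdm_configure: Callable[[Machine, str], None],
) -> Machine:
    machine = claim_idle_machine(pool)
    if machine is None:
        raise RuntimeError("no idle machine available")

    # Step 2: power via the PDU, then isolate the port on the recovery VLAN.
    power_on(machine)
    machine.powered_on = True
    move_to_recovery_vlan(machine)
    machine.history.append(Stage.NETBOOTED)

    # Step 3: stream a clean macOS image from the deploy server.
    stream_image(machine)
    machine.history.append(Stage.IMAGED)

    # Step 4: MDM sets up routing and installs the tenant's SSH key.
    mdm_configure(machine, ssh_public_key)
    machine.history.append(Stage.HANDED_OFF)
    return machine


# Usage with recording stubs in place of real hardware integrations.
calls: List[str] = []
pool = [Machine("A1", powered_on=True), Machine("B2")]
node = provision(
    pool,
    "ssh-ed25519 EXAMPLEKEY user@example",
    power_on=lambda m: calls.append("pdu"),
    move_to_recovery_vlan=lambda m: calls.append("vlan"),
    stream_image=lambda m: calls.append("image"),
    mdm_configure=lambda m, key: calls.append("mdm"),
)
print(node.serial, [s.value for s in node.history], calls)
```

Passing the integrations in as callbacks keeps the ordering logic testable without touching a PDU or a switch; a real daemon would also need timeouts, retries, and rollback on failure, which this sketch omits.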

Bare metal is not just for Xcode. The biggest workload today is local LLM research and on-device inference.

Thanks to unified memory, a physical M4 Pro with 64GB lets the GPU address system RAM as if it were VRAM, at full memory bandwidth. With Apple's open-source MLX, you load very large models on the bare host without the usual memory sharding and cross-device chatter.

MLX_INFERENCE_TEST.LOG
# On a 64GB NOVAKVM bare-metal node, load a model that is not heavily pruned
root@mlx-engine:~$ python -m mlx_lm.generate \
    --model meta-llama/Meta-Llama-3-70B-Instruct \
    --prompt "Explain quantum entanglement."

> Allocating unified memory buffer...
[OK] 48.2 GB mapped to GPU address space directly.
> Bootstrapping Neural Engine...

> Generation completed.
> Profiling: Token generation speed: 18.5 tokens/sec
> VRAM Overflow: FALSE (100% Native UMA Support)
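
For sizing, a quick estimate of the weight footprint helps decide which checkpoint fits a given node. The sketch below counts weights only and assumes decimal GB; KV cache, activations, and the share of unified memory the OS keeps for itself are ignored, so treat the result as a lower bound. The function name `weights_gb` is ours, for illustration.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal GB."""
    return params_billion * bits_per_weight / 8


NODE_GB = 64  # memory size of the node discussed above

for bits in (16, 8, 4):
    size = weights_gb(70, bits)
    verdict = "fits" if size < NODE_GB else "does not fit"
    print(f"70B at {bits}-bit: {size:.1f} GB -> {verdict} in {NODE_GB} GB")
```

By this estimate, a 70B model needs about 140 GB at 16-bit, 70 GB at 8-bit, and 35 GB at 4-bit for weights alone, so fitting it on a 64GB node implies a quantized checkpoint.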

Cloud should not be built by stacking more heavy software on top of hardware. The right direction is the opposite: smooth automation and great APIs on top, direct access to silicon on the bottom so none of the hardware budget is left on the table.

At NOVAKVM, we take infrastructure-as-code to the physical layer. The long-term plan includes Terraform-style providers so you can declare, in a few lines, a fleet of real Macs across regions—similar in spirit to EC2, but on metal only.

The era of easy virtualization margins will not last forever. The performance tax is real, and engineering teams should not have to pay it by default. Today, NOVAKVM rejects that model and redefines the boundary of what “compute” means for Mac in the cloud.

Do not let slow, janky builds break your focus. This is the bare-metal era: pure, raw, and fully yours when you are on the box.