Seamless Bare-Metal Migration: NVMe Lift-and-Shift Between Identical Nodes

The Context (Business Challenge / Problem): When a critical client workstation or edge node suffers a catastrophic hardware failure (e.g., motherboard short or degraded power delivery), time is of the essence. Reinstalling the OS, configuring specific enterprise software, and restoring data from backups can easily result in 4 to 8 hours of business downtime. In this case, the client experienced a hardware failure, but the NVMe storage remained intact. The business required the system to be back online immediately with zero configuration loss. ...

June 16, 2026 · 2 min · Aleksei Riumin

Proxmox Server Crash: Recovering XFS Data After a Catastrophic NVMe Failure

The Context (Business Challenge / Problem): In enterprise infrastructure, hardware reliability is paramount, but every architect needs an R&D environment to test limits. Recently, the primary 2TB NVMe drive (a budget-tier brand) in my Proxmox R&D lab suffered a sudden, catastrophic controller failure. The hypervisor crashed, dropping into an initramfs boot loop. Standard filesystem checks (fsck.xfs) reported severe input/output errors, indicating hardware-level NAND or controller death. While a robust monthly backup strategy was in place, the “delta” data—crucial documents created over the last few weeks—was trapped on the failing drive. The objective was clear: diagnose the storage stack, salvage the recent documents without causing further corruption, and migrate the hypervisor to enterprise-grade hardware. ...

June 14, 2026 · 2 min · Aleksei Riumin