Posts

Automating a Senior Architect's Portfolio: Hugo, GitHub Pages, and AI

The Context (Business Challenge / Problem): As an IT entrepreneur and Senior Infrastructure Architect managing infrastructure for 25 B2B clients, time is my most critical asset. Maintaining a traditional dynamic CMS for a technical diary, digital resume, and client case studies requires unnecessary overhead, security patching, and resource allocation. The challenge was to design a web platform that reflects the “Infrastructure as Code” (IaC) philosophy: zero maintenance, robust security, blazing fast loading speeds, and a frictionless content publishing pipeline. ...

Mitigating 99% Disk Exhaustion Caused by AI & SEO Scrapers

The Context (Business Challenge / Problem): A partner running a sports news portal contacted us with a critical issue: the editorial team was entirely unable to publish new articles. A quick infrastructure health check revealed a severe problem: the server’s disk space was at 99% utilization. Further investigation showed that legitimate readers were not the issue. The disk was being aggressively consumed by massive cache files generated by relentless requests from AI bots (GPTBot, ClaudeBot, Applebot) and SEO scrapers (SemrushBot, AhrefsBot). This uncontrolled automated traffic was actively draining disk space, CPU, and RAM, effectively paralyzing the business’s publishing workflow. ...
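A minimal sketch of how the crawler share of traffic could be measured from a web server access log. It assumes the combined log format (User-Agent as the last quoted field) and that each crawler named above identifies itself with that token in its User-Agent. The function name `count_bot_hits` is hypothetical, and this is not necessarily the mitigation the post goes on to describe.

```python
import re
from collections import Counter

# Crawler names taken from the post; matching them as case-insensitive
# User-Agent substrings is an assumption of this sketch.
BOT_TOKENS = ("GPTBot", "ClaudeBot", "Applebot", "SemrushBot", "AhrefsBot")

# Assumes the combined log format, where the User-Agent is the
# last double-quoted field on each line.
_UA_RE = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(log_lines):
    """Count requests per known crawler token in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = _UA_RE.search(line)
        if not match:
            continue  # line does not end in a quoted field; skip it
        user_agent = match.group(1).lower()
        for token in BOT_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break  # attribute each request to at most one crawler
    return counts
```

Feeding the function an open log file (for example `count_bot_hits(open(path))`, with `path` pointing at the server's access log) gives a per-crawler request count that can guide rate limiting or blocking decisions.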

Seamless Bare-Metal Migration: NVMe Lift-and-Shift Between Identical Nodes

The Context (Business Challenge / Problem): When a critical client workstation or edge node suffers a catastrophic hardware failure (e.g., motherboard short or degraded power delivery), time is of the essence. Reinstalling the OS, configuring specific enterprise software, and restoring data from backups can easily result in 4 to 8 hours of business downtime. In this case, the client experienced a hardware failure, but the NVMe storage remained intact. The business required the system to be back online immediately with zero configuration loss. ...

Physics vs. Marketing: Real AV1300 PowerLine Speeds on Old Aluminum Wire

The Context (Business Challenge / Problem): Ground movement at the client’s site crushed the underground mounting pipe carrying the main network lines between the primary building and a remote garage. Trenching and laying a new conduit was economically unviable in the short term. The business required continuous network access in the remote facility to support a TP-Link Mesh zone, an IP digital intercom, and IP surveillance cameras. The only available physical medium was the existing 220V power line—an old, 2-core aluminum cable without proper grounding. ...

Cost-Effective Kubernetes: Hardware Isolating Dev & Prod Environments in a Single Cluster

The Context (Business Challenge / Problem): A common dilemma for mid-sized enterprises is balancing infrastructure costs with application stability. Running entirely separate Kubernetes clusters for Development, Staging, and Production doubles or triples the overhead (paying for multiple Control Planes and idle compute resources). However, mixing them in a single cluster is dangerous: a memory leak in a developer’s test pod can consume all node resources, causing a cascading failure that takes down the live Production application. The business needs a way to combine these environments to save money, without ever compromising Production SLA. ...

Proxmox Server Crash: Recovering XFS Data After a Catastrophic NVMe Failure

The Context (Business Challenge / Problem): In enterprise infrastructure, hardware reliability is paramount, but every architect needs an R&D environment to test limits. Recently, the primary 2TB NVMe drive (a budget-tier brand) in my Proxmox R&D lab suffered a sudden, catastrophic controller failure. The hypervisor crashed, dropping into an initramfs boot loop. Standard filesystem checks (fsck.xfs) reported severe input/output errors, indicating hardware-level NAND or controller death. While a robust monthly backup strategy was in place, the “delta” data—crucial documents created over the last few weeks—was trapped on the failing drive. The objective was clear: diagnose the storage stack, salvage the recent documents without causing further corruption, and migrate the hypervisor to enterprise-grade hardware. ...

Welcome to My Digital Garden: Infrastructure & DevOps

Hello! I am an Infrastructure Architect, System Administrator, and Independent IT Consultant. This repository serves as my engineering diary and professional portfolio. Here, I document everything from hands-on hardware repair to high-level network architecture and cloud-native solutions.

🛠 What I Do

My expertise spans the entire IT stack, from the physical layer to advanced virtualization:

- Network Architecture & Physical Infrastructure: Designing, planning, and deploying local area networks. My most extensive hands-on project involved engineering a network utilizing over 4,500 meters of UTP cable.
- Server Virtualization & Management: Building and maintaining high-availability clusters using Proxmox and Docker containerization.
- IT Consulting & Business Support: Operating as an independent contractor, I currently manage infrastructure and provide SLA-backed support for 23 corporate clients (ranging from 3 to 100 workstations per site).
- Hardware & Systems Troubleshooting: Diagnosing and repairing servers, setting up IP/VoIP telephony, and integrating CCTV/Access Control systems.

📚 Certifications & Continuous Learning

I believe in a solid security foundation followed by advanced routing and orchestration: ...

title: “Architect’s Workstation Upgrade & RTX 3080 Thermal Throttling Analysis”
date: 2026-06-21T12:00:00+03:00
draft: false
description: “Upgrading a local infrastructure architect’s lab. Diagnosing severe RTX 3080 thermal throttling and planning VRM/VRAM maintenance to restore 320W TDP.”
tags: [Hardware, Workstation, RTX3080, Troubleshooting]

The Context (Business Challenge / Problem): Maintaining an optimal local environment is critical for an Infrastructure Architect working with Proxmox, Docker, and preparing for the CKA (Kubernetes) certification. I recently migrated my primary workstation to an Intel Core i5-12400 with 32GB of DDR4, currently the most cost-effective platform for running virtualized local labs. However, during stress testing, the Palit RTX 3080 GPU exhibited severe thermal throttling. Instead of operating at its design power envelope of 320W-330W under heavy load, the card drastically downclocked, dropping power consumption to just 140W to prevent critical silicon damage.

The Architecture & Work (Solution): The core platform rebuild was a success, providing a stable and highly efficient foundation for complex routing and containerized workloads. To address the GPU bottleneck, I analyzed the hardware telemetry. The issue stems from degraded thermal pads on the GDDR6X memory modules and VRM. GDDR6X runs notoriously hot, and once it hits the 105°C+ junction temperature limit, the firmware aggressively throttles performance. The engineering solution requires a complete teardown of the cooling system, precise PCB cleaning, and the application of high-performance thermal pads with exact thickness tolerances to bridge the gap between the components and the heatsink. ...
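The throttling signature described above (power draw collapsing while memory junction temperature sits at its limit) can be sketched as a simple check on telemetry samples. The 320 W and 105 °C figures come from the post; the `Sample` type, the function names, and the `min_ratio` threshold of 0.6 are hypothetical choices for illustration, not part of any monitoring tool.

```python
from dataclasses import dataclass

DESIGN_POWER_W = 320.0        # lower bound of the design envelope quoted in the post
MEM_JUNCTION_LIMIT_C = 105.0  # memory junction limit quoted in the post


@dataclass
class Sample:
    """One telemetry reading (hypothetical structure for this sketch)."""
    power_w: float
    mem_junction_c: float


def power_ratio(power_w, design_power_w=DESIGN_POWER_W):
    """Fraction of the design power envelope the card is actually drawing."""
    return power_w / design_power_w


def looks_throttled(sample, min_ratio=0.6):
    """Flag a sample where memory is at its limit and power has collapsed.

    min_ratio is an arbitrary illustrative threshold, not a vendor figure.
    """
    at_limit = sample.mem_junction_c >= MEM_JUNCTION_LIMIT_C
    return at_limit and power_ratio(sample.power_w) < min_ratio
```

With the numbers from the post, `power_ratio(140.0)` evaluates to 0.4375, meaning the card was drawing under half of its 320 W envelope while throttled.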