Build logs and technical write-ups from the lab.
2026-03-31 · AI, Benchmarks, Ollama
Squeezing Blood from a Stone: Testing TurboQuant KV Cache Compression on a 12GB GPU
Google's TurboQuant compresses KV cache to ~3 bits at inference time. We benchmarked it on a Qwen3 14B model with an RTX 4070 — 2GB VRAM savings, 25% speed penalty, and lessons learned from community forks.
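To make the memory math concrete, here is a minimal sketch of low-bit KV cache quantization. This is a generic round-to-nearest absmax quantizer, not TurboQuant's actual scheme: a signed 3-bit code plus one float scale per channel, which is roughly a 5x reduction over fp16 storage. The tensor shape and function names are illustrative.

```python
import numpy as np

def quantize_3bit(x, axis=-1):
    # Per-channel absmax scaling into the signed 3-bit range [-4, 3].
    scale = np.abs(x).max(axis=axis, keepdims=True) / 4.0
    q = np.clip(np.round(x / scale), -4, 3).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Toy KV tensor: (heads, seq_len, head_dim) -- sizes are arbitrary.
kv = np.random.randn(8, 128, 64).astype(np.float32)
q, s = quantize_3bit(kv)
err = np.abs(kv - dequantize(q, s)).mean()
```

The stored codes fit in 3 bits each (packed storage is omitted here for clarity); the mean reconstruction error stays small because the scale tracks each channel's dynamic range.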
2026-03-29 · AI, Hardware, Ollama
RTX 5060 Blackwell vs RTX 4070: LLM Inference Benchmarks
The RTX 5060 was underperforming the 4070 despite being newer. Root cause: 8GB VRAM, not software. IQ4_XS quantization brought it to 67 t/s — within 8% of the 4070.
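For readers wanting to reproduce this, here is a config sketch for running an IQ4_XS quant under Ollama by importing a local GGUF through a Modelfile. The file path and model name are hypothetical; `ollama run --verbose` reports the eval rate in tokens/s after each response.

```shell
# Sketch: import a locally downloaded IQ4_XS GGUF (path is hypothetical).
cat > Modelfile <<'EOF'
FROM ./Qwen3-14B-IQ4_XS.gguf
EOF
ollama create qwen3-iq4xs -f Modelfile
# --verbose prints prompt and generation speed (tokens/s) after the reply.
ollama run qwen3-iq4xs --verbose "Hello"
```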
2026-03-27 · SIEM, Monitoring
Replacing Graylog with Splunk Enterprise
Migrated from Graylog (Docker on cainfra01) to a dedicated Splunk Enterprise VM. Native install for real admin experience — service management, conf files, and proper resource isolation.
2026-03-26 · AI, Proxmox, Infrastructure
Repurposing a Proxmox Node as an AI Workstation
Freed pve1 (i9-13900H, 64GB) from hypervisor duty and deployed it as a bare-metal Conductor host running a 35B MoE model at 8.2 t/s on CPU.
2026-03-25 · Security, MFA, Active Directory
Deploying Cisco Duo MFA Across the Lab
Replaced self-hosted PrivacyIDEA with Cisco Duo. Push-based MFA for Windows logon and RDP with RADIUS proxy for future VPN integration.
2026-03-20 · AI, Architecture
Launching the AI Agent Swarm
14-service multi-agent system deployed on cainfra02. Conductor, four subagents, three observers, and N8N gatekeeper — all communicating via RabbitMQ.
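A swarm like this usually standardizes on a message envelope before anything else. Below is a minimal sketch of what a Conductor-to-subagent task message might look like; the field names, roles, and queue names are hypothetical, not the lab's actual schema, and the RabbitMQ publish step is only noted in a comment.

```python
import json
import uuid
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical task envelope for the swarm; fields are illustrative.
@dataclass
class Task:
    task_id: str
    role: str        # which subagent should pick this up
    payload: dict
    reply_to: str    # queue the Conductor listens on for results
    created_at: str

def make_task(role, payload, reply_to="conductor.results"):
    return Task(
        task_id=str(uuid.uuid4()),
        role=role,
        payload=payload,
        reply_to=reply_to,
        created_at=datetime.now(timezone.utc).isoformat(),
    )

# The serialized body is what would go over RabbitMQ, e.g. via pika's
# channel.basic_publish(exchange="", routing_key="agent.researcher", body=body)
task = make_task("researcher", {"query": "summarize build log"})
body = json.dumps(asdict(task)).encode()
```

Keeping `reply_to` inside the message lets any subagent route results back without hard-coding the Conductor's queue.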