The New Paradigm
We have officially moved past the era of the messy, half-built corporate IT lab. Provisioning a multi-tier infrastructure designed to enforce zero-trust access, maintain immutable audit trails, and actively orchestrate a localized swarm of AI agents isn't just tinkering; it is engineering a fully compliant, end-to-end custom solution with a strict SysOps and NetOps mindset.
This level of rigorous infrastructure build used to require an entire team of high-tier specialists: dedicated sysadmins, network engineers, and security analysts working in tandem. Instead, this environment was architected and executed by a collaborative triad: a single human director (holding CISSP and AIGP certifications) providing the vision and constraints, Claude serving as the hands-on execution engineer, and Gemini acting as the strategic sounding board and validation engine.
This is an exploration of what modern infrastructure looks like when it is designed from the ground up to be tested and audited prior to deployment. From integrating Cisco Duo MFA across an Active Directory environment to centralizing observability through Splunk Enterprise, every component supporting the rpc-cyberflight.com domain adheres to strict enterprise standards.
Rather than spending weeks manually configuring Proxmox clusters, writing OpenWrt firewall rules, and deploying Docker containers, the paradigm has shifted. The human architect writes the requirements, establishes the risk register, and defines the deployment boundaries in rigorous markdown documentation. The AI assistants ingest that context, generate the infrastructure-as-code, and execute the changes.
This is the blueprint for the future of engineering. Welcome to the era of the single-human, AI-native IT department.
The Compute Foundation: Right-Sizing for a Distributed Swarm
When building an environment capable of running autonomous agents, the first mistake most architects make is assuming they need a single, monolithic GPU server. An enterprise-grade Agent Swarm instead operates on a distributed architecture. Orchestrating multiple agents requires a strategic separation of duties: you need a highly capable "brain" to handle complex reasoning and task delegation, and a separate, high-speed "execution tier" to actually carry out the commands.
The Orchestrator: The Bare-Metal Conductor
The "brain" of the operation runs on a Minisforum MS-01, repurposed specifically as the AI workstation. Running Ubuntu 24.04 directly on bare metal, this node features an Intel Core i9-13900H processor and 64GB of RAM.
This node is dedicated entirely to running the Conductor — the central orchestrator API of the Agent Swarm. Because the Conductor is responsible for parsing user intents, validating complex enterprise constraints (like ensuring new subnets don't conflict with existing DHCP scopes), and routing tasks, it requires a model with deep reasoning capabilities. By leveraging the 64GB of system RAM, the MS-01 comfortably runs the massive 35-billion parameter cyberpilot (Qwen3.5 MoE) model via Ollama. It acts as the slow, deliberate, and highly intelligent routing engine for the entire lab.
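To make that constraint-validation role concrete, here is a minimal Python sketch of the kind of check the Conductor performs before approving a new subnet. The scope values and function name are hypothetical; the production Conductor's rule set is not reproduced here.

```python
import ipaddress

# Hypothetical DHCP scopes already in use on the lab network.
EXISTING_SCOPES = [
    ipaddress.ip_network("10.0.10.0/24"),
    ipaddress.ip_network("10.0.20.0/24"),
]

def subnet_is_safe(proposed: str) -> bool:
    """Reject any proposed subnet that overlaps an existing DHCP scope."""
    candidate = ipaddress.ip_network(proposed, strict=True)
    return not any(candidate.overlaps(scope) for scope in EXISTING_SCOPES)

print(subnet_is_safe("10.0.30.0/24"))    # True: no overlap
print(subnet_is_safe("10.0.10.128/25"))  # False: collides with an existing scope
```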
The Execution Tier: High-Speed GPU Inference
Once the Conductor determines the required action, the task is passed via RabbitMQ to specialized subagents like Sapper (for firewall and network policy) and DaVinci (for infrastructure-as-code). These execution agents do not need 35B parameter reasoning; they need to rapidly write scripts and execute terminal commands.
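As a rough illustration of that hand-off, the sketch below publishes a task to a per-agent queue using the pika RabbitMQ client. The queue name, broker hostname, and payload shape are assumptions standing in for the Swarm's actual message contract.

```python
import json
import pika

# Connect to the lab's RabbitMQ broker (hostname assumed for illustration).
connection = pika.BlockingConnection(pika.ConnectionParameters(host="cainfra02"))
channel = connection.channel()

# One durable queue per specialized subagent.
channel.queue_declare(queue="sapper.tasks", durable=True)

task = {
    "agent": "sapper",
    "action": "add_firewall_rule",
    "params": {"port": 443, "proto": "tcp", "zone": "dmz"},
}

channel.basic_publish(
    exchange="",
    routing_key="sapper.tasks",
    body=json.dumps(task),
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)
connection.close()
```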
To support this, the subagents are hosted on the BigBrain node, utilizing an NVIDIA RTX 4070 passed directly to the VM via PCIe passthrough.
The Sweet Spot: The RTX 4070's 12GB of VRAM is a near-perfect fit for modern 9B parameter models (like Qwen3.5 9B). Using 4-bit quantization, these models consume only roughly 5-7GB of VRAM, leaving the remaining capacity free for the KV cache (the arithmetic is sketched below).
Massive Context, Rapid Speed: This generous cache headroom allows the subagents to ingest thousands of lines of local markdown documentation, like the lab's exact As-Built configurations, and still generate infrastructure-as-code at a blazing ~73 tokens per second.
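The back-of-the-envelope math behind that claim is worth showing. A minimal sketch of the arithmetic, assuming a runtime overhead figure that varies by inference engine and quantization scheme:

```python
# Rough VRAM budget for a 9B parameter model at 4-bit quantization.
params = 9e9
bytes_per_param = 0.5                          # 4 bits = 0.5 bytes
weights_gb = params * bytes_per_param / 1e9    # ~4.5 GB of weights
runtime_overhead_gb = 1.5                      # assumed: CUDA context, buffers
total_gb = weights_gb + runtime_overhead_gb    # ~6 GB, inside the quoted 5-7 GB
kv_cache_gb = 12 - total_gb                    # ~6 GB left on a 12 GB RTX 4070

print(f"weights={weights_gb:.1f} GB, free for KV cache={kv_cache_gb:.1f} GB")
```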
The Agent Swarm: Infrastructure as Code (Written by AI)
The functional core of the architecture is the Agent Swarm itself, a distributed deployment of over a dozen containerized services running on an Ubuntu VM (cainfra02). Rather than a single monolithic script, the Swarm utilizes RabbitMQ as a high-speed message broker to route specific tasks to specialized subagents, maintaining a strict separation of duties.
Securing the Front Door: The MFA-Gated Zero-Trust Interface
The most glaring vulnerability of any autonomous AI agent is its input vector. If a bad actor — or an insider threat — can issue prompts to the agent, they can effectively hijack the infrastructure. While routing commands through a chat app might suffice for a sandbox, it instantly fails an enterprise audit due to a lack of non-repudiation and step-up authentication.
To mitigate this, the primary user assistant, "Ralph" (running on the NemoClaw framework), is deployed behind a strict defense-in-depth security model.
First, the network boundary is absolute. NemoClaw resides on an isolated virtual machine with no public internet exposure. To issue commands, the human operator must cryptographically authenticate through the external OpenWrt firewall via the site-to-site WireGuard VPN.
However, network access does not equal execution rights. The interface to NemoClaw is protected by a mandatory Duo MFA challenge, mirroring the identity perimeter established for the lab's Active Directory environment. Before "Ralph" can even process a natural language request to alter a firewall rule or provision a VM, the human operator must verify their identity. This ensures absolute accountability and satisfies the core "Human-in-the-Loop" requirement for compliance.
Second, NemoClaw operates on the principle of least privilege. Even after a successful MFA challenge, the agent itself is completely unprivileged. It possesses no SSH keys, no root passwords, and no direct sysadmin capabilities. It exists in a secure sandbox where its only technical capability is reading local markdown documentation and constructing JSON payloads.
When the authenticated user instructs NemoClaw to provision a new server, NemoClaw formats the request and sends it to the Conductor API. The Conductor then acts as the ultimate gatekeeper — validating the payload against a hardcoded set of enterprise constraints before delegating the task to a specialized execution subagent. NemoClaw can ask for the world, but the Conductor decides if it is safe to build it.
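A minimal sketch of that gatekeeping pattern, with a hypothetical allow-list of actions and parameter ceilings standing in for the Conductor's real constraint set:

```python
# Hypothetical enterprise constraints enforced by the Conductor.
ALLOWED_ACTIONS = {"provision_vm", "add_firewall_rule"}
MAX_VM_RAM_GB = 16

def validate_request(payload: dict) -> list[str]:
    """Return a list of constraint violations; an empty list means approved."""
    errors = []
    if payload.get("action") not in ALLOWED_ACTIONS:
        errors.append(f"action {payload.get('action')!r} is not on the allow-list")
    if payload.get("action") == "provision_vm":
        if payload.get("params", {}).get("ram_gb", 0) > MAX_VM_RAM_GB:
            errors.append("requested RAM exceeds the enterprise ceiling")
    return errors

request = {"action": "provision_vm", "params": {"ram_gb": 8, "os": "ubuntu-24.04"}}
violations = validate_request(request)
print("approved" if not violations else violations)
```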
The MFA Gateway and the Continuous Audit Pipeline
To enforce this strict entry requirement, the architecture places an NGINX reverse proxy directly in front of the NemoClaw interface. This proxy acts as the sole ingress point and is configured to mandate a Duo MFA handshake before a single packet of the user's prompt is allowed to enter the NemoClaw sandbox.
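The NGINX-side wiring is configuration rather than code, but the handshake itself maps onto Duo's Auth API. A hedged Python sketch using the duo_client library (the integration keys and API hostname are placeholders, and the real gateway may broker the challenge through an NGINX module instead):

```python
import duo_client

# Placeholder credentials; real values come from the Duo admin panel.
auth_api = duo_client.Auth(
    ikey="DIXXXXXXXXXXXXXXXXXX",
    skey="secret-key-from-duo",
    host="api-xxxxxxxx.duosecurity.com",
)

def require_mfa(username: str) -> bool:
    """Block until the operator approves a Duo push; deny on anything else."""
    response = auth_api.auth(factor="push", username=username, device="auto")
    return response.get("result") == "allow"

if require_mfa("operator"):
    print("MFA passed: forwarding prompt to NemoClaw")
else:
    print("MFA denied: dropping request at the gateway")
```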
But authentication is only half of the zero-trust equation; the other half is continuous observability. Every interaction at the NGINX gateway — along with the resulting JSON payloads generated by NemoClaw and the specific execution steps taken by the Conductor — is immediately forwarded to a centralized Splunk Enterprise instance.
This creates an unbroken, verifiable chain of custody. By logging everything to Splunk, the system enables an automated "Auditor" function to continuously compare the initial, authenticated human intent against the actual infrastructure commands executed by the Swarm's subagents. If a user asks the agent to open a specific port for a web server, the Auditor verifies that only that specific port was opened.
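A toy version of that Auditor comparison, assuming both the authenticated intent and the executed changes arrive as structured events with illustrative field names:

```python
def audit_change(intent: dict, executed: list[dict]) -> list[str]:
    """Flag any executed firewall change not covered by the approved intent."""
    approved = {(intent["port"], intent["proto"])}
    findings = []
    for event in executed:
        key = (event["port"], event["proto"])
        if key not in approved:
            findings.append(f"unauthorized change: opened {key}")
    return findings

# Operator asked for 443/tcp; the subagent also opened 22/tcp.
intent = {"user": "operator", "port": 443, "proto": "tcp"}
executed = [
    {"port": 443, "proto": "tcp"},
    {"port": 22, "proto": "tcp"},
]
print(audit_change(intent, executed))  # flags the 22/tcp change
```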
Enterprise Security & Observability
This level of autonomous, infrastructure-altering AI is only deployable because the underlying environment was treated like a corporate network from day one. In an enterprise swarm, security cannot be an afterthought; it must be the foundation.
Algorithmic Air-Gapping: The Ultimate Separation of Duties
Perhaps the most critical security control in this architecture is the physical and logical isolation of the AI models themselves. In traditional monolithic AI deployments, a single instance of model drift or a single hallucination can compromise the entire workflow.
This swarm avoids that risk through strict separation of duties across isolated local models. The "Conductor" orchestrator runs exclusively on the bare-metal MS-01, utilizing a heavy 35-billion parameter reasoning model. Meanwhile, the execution subagents run on the BigBrain node using entirely different, specialized 9B parameter models.
Crucially, these models do not share a context window and cannot communicate in fluid natural language. They exchange data strictly through rigid, schema-validated JSON payloads over RabbitMQ. This creates a functional algorithmic air-gap. The models cannot "collude," scheme, or compound each other's errors. If an update to the Conductor's model introduces a behavioral flaw, it cannot force the subagent models to act outside their hardcoded boundaries, and the Splunk Auditor will immediately flag the resulting anomalous payload.
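A sketch of what schema-validated means in practice, using the jsonschema library with additionalProperties disabled so no agent can smuggle free-form natural language past the contract (the schema itself is a hypothetical stand-in):

```python
from jsonschema import ValidationError, validate

# Rigid inter-agent contract: fixed fields, no free-form extras.
TASK_SCHEMA = {
    "type": "object",
    "properties": {
        "agent": {"enum": ["sapper", "davinci"]},
        "action": {"type": "string", "maxLength": 64},
        "params": {"type": "object"},
    },
    "required": ["agent", "action", "params"],
    "additionalProperties": False,  # no side-channel for natural language
}

payload = {"agent": "sapper", "action": "add_firewall_rule", "params": {"port": 443}}

try:
    validate(instance=payload, schema=TASK_SCHEMA)
    print("payload conforms: safe to route over RabbitMQ")
except ValidationError as err:
    print(f"rejected: {err.message}")
```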
The Live Risk Register
Because the infrastructure is heavily monitored and logically isolated, a live Risk Register can be maintained. Rather than hiding technical debt, vulnerabilities are documented and systematically queued for the AI Swarm to remediate. Armed with precise As-Built markdown documentation, the human architect simply instructs the Swarm to draft the exact firewall rules and container boundary constraints needed to resolve specific risk items, knowing the execution will be fully audited.
Bridging Local to Cloud: The Hybrid Architecture
An enterprise lab is only as valuable as the production workloads it supports. While the Agent Swarm operates strictly within the local DMZ, its purpose is to maintain and secure the broader rpc-cyberflight.com ecosystem, which extends directly into Google Cloud Platform (GCP).
To maintain the zero-trust posture across environments, the local OpenWrt routing infrastructure maintains a persistent, site-to-site WireGuard VPN tunnel with the GCP environment. This secure bridge allows the local Splunk Enterprise SIEM to natively ingest logs forwarded directly from the cloud instances. If a web application firewall in GCP triggers an alert, the local observability stack sees it instantly.
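Log shipping in the lab is handled by native forwarders over the tunnel, but for illustration, here is a minimal Python sketch of pushing a cloud alert into Splunk via the HTTP Event Collector. The endpoint hostname and token are placeholders, and HEC would need to be enabled on the Splunk side:

```python
import requests

SPLUNK_HEC = "https://splunk.example.internal:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token

event = {
    "event": {"source": "gcp-waf", "severity": "high", "message": "rule triggered"},
    "sourcetype": "gcp:waf:alert",
    "index": "cloud_security",
}

response = requests.post(
    SPLUNK_HEC,
    json=event,
    headers={"Authorization": f"Splunk {HEC_TOKEN}"},
    timeout=5,
    verify=False,  # lab-internal CA; use proper certificates in production
)
response.raise_for_status()
```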
AI-Native Aviation RAG
Beyond infrastructure management, the local compute resources directly power high-fidelity aviation tools. Because the MS-01 has the overhead to run massive LLMs locally, it also hosts an Open WebUI instance configured as an Aviation-specific Retrieval-Augmented Generation (RAG) system.
By ingesting dozens of FAA manuals — including the Airplane Flying Handbook, Federal Aviation Regulations, Advisories, and other data — the local AI can instantly synthesize complex flight training procedures, airspace regulations, and mechanical troubleshooting steps. This ensures that sensitive, proprietary flight planning data never has to leave the local network to be processed by a public cloud AI provider.
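Open WebUI manages this pipeline end to end, but a stripped-down version of the retrieve-then-generate loop looks roughly like the sketch below, using the ollama Python client. The embedding model, chat model tag, and document chunks are all illustrative:

```python
import ollama

# Pre-chunked excerpts from locally stored FAA manuals (illustrative text).
chunks = [
    "Slow flight is maneuvering just above the stall speed...",
    "Class D airspace generally extends from the surface to 2,500 feet AGL...",
]

def embed(text: str) -> list[float]:
    """Embed text with a local embedding model served by Ollama."""
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

question = "What defines Class D airspace?"
q_vec = embed(question)

# Retrieve the most relevant chunk, then generate with it as context.
best = max(chunks, key=lambda c: cosine(q_vec, embed(c)))
answer = ollama.chat(
    model="qwen3:8b",  # placeholder tag; substitute the lab's local 9B model
    messages=[{"role": "user", "content": f"Context: {best}\n\nQuestion: {question}"}],
)
print(answer["message"]["content"])
```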
Conclusion: The Future of Engineering
The deployment of the rpc-cyberflight Agent Swarm represents a fundamental shift in systems engineering. We are moving away from the era of the human sysadmin manually typing bash commands over SSH, and into the era of intent-based, AI-executed architecture.
When you combine a human's architectural vision with the rapid execution of an AI like Claude and the strategic validation of an AI like Gemini, the limitations of a single-person IT department vanish.
This triad approach proves that you don't need a massive team to build an audit-ready, enterprise-grade data center. By strictly enforcing a zero-trust MFA perimeter, algorithmically air-gapping local models, and maintaining an immutable Splunk audit trail, you can safely hand the keys of your infrastructure to an autonomous swarm.
The human writes the constraints. The Swarm builds the reality.