Case Study — Tech / Software
A persistent AI operating layer on a dedicated Windows machine — not a chatbot, not a script runner. A production system that gives Claude, GPT-4o, and Gemini authenticated access to shell commands, SSH, WordPress APIs, DNS management, finance databases, and more. Built around the principle that AI capability without auditability is just chaos at scale.
Running a portfolio of businesses (BamPav Ltd for photography, Mars Purifier Ltd for water purification, FocusGoods for e-commerce, Worthington for client engagement, and Lab Locum Marketplace), plus personal finance and infrastructure, creates a surface area that no single person can manage efficiently with conventional tools. The result is constant context-switching, credential sprawl, and repetitive operations whose cognitive cost outweighs their value.
The off-the-shelf answer is "use more SaaS tools." The real answer is to build an operating layer that understands my entire system and can act within it — with appropriate controls so that acting quickly doesn't mean acting dangerously.
The core is a Node.js MCP gateway server that exposes 19 tools to any connected AI model. The gateway handles authentication, credential injection (from the encrypted vault), and audit logging. Models never see raw credentials — they call a tool name and receive a result.
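A minimal sketch of how a gateway like this might keep credential handling on its own side of the boundary. It assumes each tool declares which vault keys it needs and that the vault is reachable through a simple `read(key)` interface. `Gateway`, `ToolDefinition` and `SecretSource` are hypothetical names for illustration, not the MCP SDK's API.

```typescript
// Sketch of gateway-side dispatch: the model supplies a tool name and
// arguments; the gateway resolves secrets and passes them to the handler.
// Gateway, ToolDefinition and SecretSource are hypothetical names.

type Args = Record<string, unknown>;

// Stand-in for the encrypted vault; decryption itself is not shown.
interface SecretSource {
  read(key: string): string;
}

interface ToolDefinition {
  name: string;
  secretKeys: string[]; // vault keys this tool needs
  handler: (args: Args, secrets: Record<string, string>) => Promise<unknown>;
}

class Gateway {
  private readonly tools = new Map<string, ToolDefinition>();
  private readonly vault: SecretSource;

  constructor(vault: SecretSource) {
    this.vault = vault;
  }

  register(tool: ToolDefinition): void {
    this.tools.set(tool.name, tool);
  }

  // The secrets object is local to this call; only the handler's result is
  // returned to the model.
  async call(toolName: string, args: Args): Promise<unknown> {
    const tool = this.tools.get(toolName);
    if (!tool) throw new Error(`Unknown tool: ${toolName}`);
    const secrets: Record<string, string> = {};
    for (const key of tool.secretKeys) secrets[key] = this.vault.read(key);
    return tool.handler(args, secrets);
  }
}
```

Because the handler receives secrets as a separate argument, nothing sensitive has to appear in the tool's schema or in the model's prompt.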
shell_exec
Runs shell commands on the local machine. Scoped to an allowlist — no arbitrary command execution.
ssh_exec
Authenticated SSH to the VPS (69.62.122.2). Credentials injected from vault — no plaintext key paths in prompts.
sqlite_query
Read/write access to the T-V2 SQLite database. All queries logged.
finance_query
Read-only SQL access to the finance database (Barclays + Nationwide transactions, BamPav Ltd, Mars Purifier Ltd, personal).
wp_rest
WordPress REST API integration for all 7 managed sites — post creation, editing, media upload, plugin state.
hostinger_dns
Hostinger DNS API — A/CNAME/TXT record management for all domains. Used for Traefik routing and mail authentication.
stalwart
Stalwart Mail Server API — account management, DKIM key rotation, delivery logs, ManageSieve rule updates.
openai_workers
Dispatches generation tasks to OpenAI (GPT-4o, DALL-E). Tracks usage in llm_usage_ledger with cost in micro-USD.
ge_wiki_price
Live Old School RuneScape (OSRS) Grand Exchange price feed via the OSRS Wiki prices API. Powers the RuneStock margin and arbitrage engine.
Plus 10 additional tools covering file operations, calendar, clipboard, system monitoring, Stripe webhooks, and inter-agent messaging.
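The shell_exec allowlist could be enforced along these lines. This is a sketch under assumptions: the allowlist contents are hypothetical, and the command is assumed to be executed afterwards without a shell (for example via Node's `child_process.execFile`, which takes a binary and an argument array), so shell metacharacters are rejected rather than interpreted.

```typescript
// Hypothetical allowlist: binary -> permitted first arguments.
// An empty array means the binary may be run with any arguments.
const ALLOWLIST = new Map<string, string[]>([
  ["git", ["status", "log", "pull"]],
  ["docker", ["ps", "logs"]],
  ["hostname", []],
]);

// Shell metacharacters and quotes are rejected, never interpreted.
const SHELL_META = /[;&|`$<>(){}'"\\\r\n]/;

// Returns the argv to execute, or throws if the command is not allowlisted.
function parseAllowed(command: string): string[] {
  if (SHELL_META.test(command)) throw new Error("Shell metacharacters are not allowed");
  const argv = command.trim().split(/\s+/);
  const [bin, sub] = argv;
  const permitted = bin ? ALLOWLIST.get(bin) : undefined;
  if (!permitted) throw new Error(`Command not allowlisted: ${bin ?? ""}`);
  if (permitted.length > 0 && (sub === undefined || !permitted.includes(sub))) {
    throw new Error(`Subcommand not allowlisted: ${bin} ${sub ?? ""}`);
  }
  return argv;
}
```

The returned array can be passed as `execFile(argv[0], argv.slice(1))`, so no shell ever parses the original string.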
Every secret (API keys, SSH credentials, WordPress application passwords, Stripe keys, DNS API tokens) is stored in a DPAPI-encrypted credentials vault on the local machine. DPAPI (the Windows Data Protection API) derives its encryption keys from the Windows user account's logon credentials, so the vault can only be decrypted by the same Windows user account that created it.
This means no plaintext secrets in config files, no .env files committed to git, and no credentials in prompt history. When a tool needs a secret, the gateway calls vault_read(key) on the model's behalf and injects the decrypted value into that tool call only; the model receives the tool's result, never the secret. The value is never written to disk in plaintext. More than 20 secrets are under management, covering every service in the stack.
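Keeping credentials out of prompt history also means handling the case where a tool echoes a secret back, for example in an error message that includes a connection string. A minimal sketch, assuming the gateway keeps the list of values it injected during the current call; `redactSecrets` is a hypothetical helper, and DPAPI decryption itself is not shown.

```typescript
// Scrubs every secret value injected during a call from text that is about
// to be returned to the model or written to a log.
function redactSecrets(text: string, secretValues: string[]): string {
  // Replace longer secrets first so a secret that contains a shorter one is
  // removed whole rather than partially.
  const sorted = secretValues
    .filter((s) => s.length > 0)
    .sort((a, b) => b.length - a.length);
  let out = text;
  for (const secret of sorted) {
    out = out.split(secret).join("[REDACTED]");
  }
  return out;
}
```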
Beyond the core gateway, three specialist MCP servers extend the stack for specific workloads:
The gateway is model-agnostic: Claude (Anthropic), GPT-4o (OpenAI), and Gemini (Google) all connect via the same MCP protocol. Different models have different access scopes based on the task. Claude handles most operational work; GPT-4o is routed for content generation tasks (Worthington pipeline, FocusGoods product descriptions); Gemini is available for cross-validation and long-context document work.
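Per-model scoping can be sketched as a lookup checked before dispatch. The scope table below is purely illustrative, loosely following the division of labour described above; the real mapping is not published here, and `authorize` is a hypothetical name.

```typescript
type ModelId = "claude" | "gpt-4o" | "gemini";

// Hypothetical scope table for illustration only.
const SCOPES: Record<ModelId, ReadonlySet<string>> = {
  claude: new Set(["shell_exec", "ssh_exec", "sqlite_query", "finance_query", "wp_rest", "hostinger_dns", "stalwart"]),
  "gpt-4o": new Set(["openai_workers", "wp_rest"]),
  gemini: new Set(["sqlite_query", "finance_query"]),
};

// Throws before any tool code runs if the model is not scoped for the tool.
function authorize(model: ModelId, tool: string): void {
  if (!SCOPES[model].has(tool)) {
    throw new Error(`${model} is not scoped for ${tool}`);
  }
}
```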
Every tool call — regardless of which model made it — is logged to the audit table with timestamp, model identifier, tool name, input parameters (credentials redacted), and output status. Nothing runs without a trail.
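A sketch of building one audit row with credential fields redacted by key name. The `AuditEntry` shape mirrors the fields listed above; the key-name pattern is a hypothetical, deliberately broad heuristic, so it will also redact some harmless fields (anything containing "token", for instance).

```typescript
interface AuditEntry {
  timestamp: string; // ISO 8601
  model: string;
  tool: string;
  input: Record<string, unknown>;
  status: "ok" | "error";
}

// Hypothetical, deliberately broad pattern for credential-like key names.
const SENSITIVE_KEY = /pass(word)?|secret|token|api[_-]?key|private[_-]?key|authorization/i;

// Recursively replaces the values of sensitive keys, including nested ones.
function redactInput(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redactInput);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = SENSITIVE_KEY.test(k) ? "[REDACTED]" : redactInput(v);
    }
    return out;
  }
  return value;
}

function auditEntry(
  model: string,
  tool: string,
  input: Record<string, unknown>,
  status: "ok" | "error",
  now: Date = new Date(),
): AuditEntry {
  return {
    timestamp: now.toISOString(),
    model,
    tool,
    input: redactInput(input) as Record<string, unknown>,
    status,
  };
}
```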
Errors escalate. No tool call swallows an exception and returns a synthetic success. If something breaks, the model sees the error and escalates — it doesn't proceed as if nothing happened.
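One way to make "errors escalate" structural rather than a convention: a wrapper whose only outcomes are an explicit success or an explicit error carrying the original message. `runTool` and `ToolResult` are hypothetical names; MCP tool results also carry an `isError` flag that a wrapper like this could map onto.

```typescript
type ToolResult<T> =
  | { status: "ok"; value: T }
  | { status: "error"; tool: string; message: string };

// There is deliberately no fallback value: a caught exception always becomes
// status "error" with the original message, so the model sees the failure.
async function runTool<T>(tool: string, fn: () => Promise<T>): Promise<ToolResult<T>> {
  try {
    return { status: "ok", value: await fn() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { status: "error", tool, message };
  }
}
```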
Deletions, DNS changes, production deployments, and financial operations require explicit human confirmation before execution. The model proposes; the operator approves.
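A minimal propose/approve flow, assuming the gateway parks gated calls in a queue until the operator approves them. Which calls count as gated is an assumption: `hostinger_dns` is one of the gateway's tools, while `deploy_production` and `payment_transfer` are hypothetical tool names, and deletions are detected through a hypothetical `action` argument.

```typescript
interface Proposal {
  id: string;
  tool: string;
  args: Record<string, unknown>;
  approved: boolean;
}

// Illustrative gate list: hostinger_dns is a real tool in this stack; the
// other two names are hypothetical.
const GATED_TOOLS = new Set(["hostinger_dns", "deploy_production", "payment_transfer"]);

function requiresApproval(tool: string, args: Record<string, unknown>): boolean {
  return GATED_TOOLS.has(tool) || args["action"] === "delete";
}

class ApprovalQueue {
  private readonly pending = new Map<string, Proposal>();
  private nextId = 1;

  // The model proposes: the call is recorded but not run.
  propose(tool: string, args: Record<string, unknown>): Proposal {
    const proposal: Proposal = { id: `p${this.nextId++}`, tool, args, approved: false };
    this.pending.set(proposal.id, proposal);
    return proposal;
  }

  // The operator approves.
  approve(id: string): void {
    const proposal = this.pending.get(id);
    if (!proposal) throw new Error(`No pending proposal ${id}`);
    proposal.approved = true;
  }

  // Runs an approved proposal exactly once; anything else is refused.
  async execute<T>(id: string, run: (p: Proposal) => Promise<T>): Promise<T> {
    const proposal = this.pending.get(id);
    if (!proposal || !proposal.approved) throw new Error(`Proposal ${id} is not approved`);
    this.pending.delete(id);
    return run(proposal);
  }
}
```

Routing only calls for which `requiresApproval` returns true through the queue keeps routine reads fast while gated operations wait for a human.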
Every LLM call writes to llm_usage_ledger with cost_micro_usd BIGINT. Token spend is queryable at any time. No surprise API bills.
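Storing cost as an integer count of micro-dollars keeps every sum exact, which floating-point dollars would not. A sketch assuming a per-model price table (the figures are placeholders, not actual provider pricing) and a ledger row with a `model` field alongside cost_micro_usd; in SQL the aggregation would be a `SUM(cost_micro_usd)` grouped by that column.

```typescript
// Rates in integer micro-USD per one million tokens, so $2.50 / 1M tokens is
// 2_500_000n. Placeholder figures only.
interface ModelPrice {
  inputMicroUsdPerMTok: bigint;
  outputMicroUsdPerMTok: bigint;
}

interface LedgerRow {
  model: string; // assumed column
  costMicroUsd: bigint; // mirrors cost_micro_usd BIGINT
}

const MILLION = 1_000_000n;

// Integer cost of one call, rounded up so fractional micro-dollars are never dropped.
function costMicroUsd(price: ModelPrice, inputTokens: number, outputTokens: number): bigint {
  const scaled =
    BigInt(inputTokens) * price.inputMicroUsdPerMTok +
    BigInt(outputTokens) * price.outputMicroUsdPerMTok;
  return (scaled + MILLION - 1n) / MILLION;
}

// Total spend per model across ledger rows.
function spendByModel(rows: LedgerRow[]): Map<string, bigint> {
  const totals = new Map<string, bigint>();
  for (const r of rows) totals.set(r.model, (totals.get(r.model) ?? 0n) + r.costMicroUsd);
  return totals;
}

// Formats micro-USD as dollars without passing through floating point.
function formatUsd(micro: bigint): string {
  const sign = micro < 0n ? "-" : "";
  const abs = micro < 0n ? -micro : micro;
  const fraction = (abs % MILLION).toString().padStart(6, "0");
  return `${sign}$${abs / MILLION}.${fraction}`;
}
```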
On any given day the operating system might: pull the latest Barclays transactions, categorise spend, and surface a finance summary; check Stalwart delivery logs for a domain's outbound mail; push a WordPress post to three sites; run a DuckDB query against the RuneStock price archive; rotate a DNS record; or monitor a FocusGoods canary probe and decide whether to increment the v2 traffic weight. All from natural language. All logged. All authenticated without ever passing a credential through the model context window.
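The canary decision can be reduced to a small, auditable rule. The thresholds, step size and roll-back-to-zero behaviour below are assumptions for illustration, `nextWeight` is a hypothetical name, and how the weight is then applied at the routing layer is not shown.

```typescript
interface CanaryStats {
  requests: number;
  errors: number;
}

interface CanaryPolicy {
  minRequests: number; // do not decide on too little traffic
  maxErrorRate: number; // e.g. 0.01 = 1%
  step: number; // percentage points added per healthy check
}

// Returns the next v2 traffic weight (0-100): hold when there is not enough
// data, roll back to 0 when the error rate exceeds the threshold, otherwise
// step up without passing 100.
function nextWeight(current: number, stats: CanaryStats, policy: CanaryPolicy): number {
  if (stats.requests === 0 || stats.requests < policy.minRequests) return current;
  const errorRate = stats.errors / stats.requests;
  if (errorRate > policy.maxErrorRate) return 0;
  return Math.min(100, current + policy.step);
}
```

Holding rather than stepping up when traffic is thin keeps a quiet hour from being mistaken for a healthy one.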
This isn't theoretical capability — these are operations I run every week across a real portfolio of businesses, on infrastructure I built and maintain.
Happy to walk through design decisions, the vault implementation, or how MCP compares to alternative integration patterns.