Foundry — Private AI Infrastructure on Apple Silicon

The problem

CTOs buying Mac Studios for local AI hit the same wall: models crash, memory runs out, switching is slow, nothing is observable, and they can't trust the stack for real work. But even if the models run perfectly, they still have to figure out how to actually use them — routing workloads, building agent logic, connecting to their systems, keeping sessions alive, handling failures. A reliable local endpoint isn't enough. They need the whole stack working together.

The product

Foundry is a complete private AI infrastructure stack for Apple Silicon. Not just local inference — inference, orchestration, messaging, and observability, all working together out of the box.

Three offers:

Foundry Advisory (£299 one-time)

We assess your hardware, workloads, and current AI spend. We tell you which Mac to buy, which models to run, which workloads to move local, and which to keep on cloud. You get a written recommendation pack and setup scripts.

Foundry Managed (£999 setup + £99/mo)

We set up your Mac Studio as a complete private AI infrastructure node:

Local inference with capacity guardrails and health monitoring
Your choice of harness: OpenClaw for agent orchestration, or Hermes for messaging integrations (or both for advanced setups)
Observability dashboard showing models, memory, activity, and health
You don't need to understand agentic architecture. We configure it for your workflows.

Foundry On-Prem (£2–5k/mo)

For compliance-sensitive teams. Everything in Managed, plus audit logging, no-cloud mode, a support contract, and a runbook your risk team can read. Your data never leaves your network.

What's in the stack

Layer	What it does	What the CTO sees
Foundry	Model management, capacity planning, health checks, benchmarks	"Which models are running? Are they healthy? What fits in memory?"
OpenClaw (or Hermes)	Agent orchestration OR messaging integration	"My support queries get handled automatically" / "The AI reads our Slack and responds"
llm_stats	Real-time observability — endpoints, memory, activity, crash risk	"I can see at a glance that everything is working."

The harness layer is one or the other, not both. A CTO chooses:

OpenClaw if they want agent orchestration: sessions, memory, cron, specialist routing, workflow automation
Hermes if they want messaging integration: connecting models to Slack, email, databases, APIs

Combining both is possible for advanced setups but is not the default configuration.

Why this works

Buyers are already spending £500–3k+/mo on OpenAI/Anthropic. Anything under that is a saving.
Compliance teams often have no acceptable cloud option. Local inference isn't a preference — it's a requirement.
No one else offers a complete managed stack on Apple Silicon. The market is DIY pain or GPU cloud.
We already built every layer. This isn't a pitch deck — it's running on our Mac Studio right now.

Why us

512GB M3 Ultra running every model, every quant, daily — we know what fits and what crashes
Foundry: model management, capacity, health, drift detection
OpenClaw: agent orchestration with sessions, memory, cron, and specialist agents
or Hermes: messaging and integration layer connecting to Slack, email, databases, APIs
llm_stats: real-time observability dashboard
Benchmark evidence: measured performance data, not theory

How a CTO actually uses it

You don't need to understand agentic architecture. You tell us:

"I want my support team's routine queries handled locally"
"I want code reviews running on our own hardware"
"I want internal document search that doesn't phone home to OpenAI"

We configure the full stack — models, agents, integrations, monitoring — and hand you a working system. Your team interacts with it through your existing tools (Slack, email, API). The infrastructure is invisible.

MVP scope

Supported hardware: M3 Ultra 512GB (expand later)
Supported runtimes: omlx + Ollava
Full stack: Foundry + (OpenClaw or Hermes) + llm_stats
Pre-configured workflow templates: support routing, code review, document search
Customer chooses harness: OpenClaw for agent orchestration, Hermes for messaging integration, or both for advanced setups
3 paid design partners to start

What it's not

Not just a local inference endpoint (that's useless without orchestration)
Not competing with Groq/Together on raw throughput
Not an enterprise SaaS platform (yet)
Not training or fine-tuning models
Not Windows/Linux

The ask

We're looking for 3 design partners — CTOs who want to cut their AI bill and run private AI on their own hardware. You tell us what workloads you want local. We make the whole stack work.