PRD - Foundry: Managed Private Local Inference for Apple Silicon
1. One-line pitch
For startup and mid-market CTOs who need to cut AI API spend or keep sensitive data off cloud providers, Foundry delivers a managed Apple Silicon local-inference platform that makes private LLM serving reliable, observable, and commercially usable.
2. Problem
CTOs and engineering leads buying Mac Studios for local AI hit the same wall:
- model choice is confusing and changes weekly
- memory pressure, OOM crashes, and endpoint drift make local inference unreliable
- observability is weak, so CTOs cannot trust the stack for real business workflows
- compliance-sensitive teams cannot use OpenAI/Gemini cloud APIs, but DIY local serving is brittle
- every hour spent babysitting local models erodes the cost-saving case
The job-to-be-done is not "run a model locally." It is:
"Give me a local inference stack that my team can depend on, justify to finance, and defend to compliance."
3. Target customers
Primary (V1)
1. Startup CTO / technical founder
- already spending £500-£3k+/month on OpenAI/Anthropic/Gemini
- willing to buy Mac Studio hardware to reduce recurring spend for suitable workloads
- wants fast setup, clear model recommendations, and low babysitting overhead
- values developer-friendly endpoints, benchmark evidence, and a credible migration path from cloud to local
2. Mid-market CTO / head of engineering
- 50-500 person company
- meaningful AI usage but limited appetite for GPU infrastructure
- wants a vendor-like solution, not a GitHub science project
- needs visibility, support boundaries, auditability, and rollback discipline
3. Compliance-constrained organisations
- healthcare, finance, legal, government, defence-adjacent, internal R&D
- cannot or should not send sensitive prompts/data to cloud AI providers
- values on-prem reliability, auditability, and support over raw lowest price
Secondary (V2+)
4. Service businesses / document-heavy operators (trades, field service, logistics)
- high volumes of repetitive admin: job intake, quotes, invoices, completion packs
- value privacy and local control for sensitive customer/job data
- want admin turnaround improvement, not flashy AI demos
- need human review before anything reaches customers or systems of record
- This is a real market, but the sales motion, channel, and product shape are different enough that it belongs in V2 after the core offer is proven
4. Product thesis
Foundry is a complete private AI infrastructure product - not just local inference.
A CTO doesn't need "a reliable local endpoint." They need a working system that handles their workloads, connects to their tools, and runs without babysitting. A local model endpoint without orchestration is a car engine without a steering wheel.
The product has three layers. The harness layer is a choice between two options, not a requirement to run both:
1. Foundry (inference layer) - which model, which quant, which runtime, what fits safely; health checks, capacity guardrails, restart discipline, drift detection
2. OpenClaw OR Hermes (harness layer) - agent orchestration or messaging integration, not both by default:
- OpenClaw for agent orchestration: sessions, memory, cron, specialist routing, workflow automation
- Hermes for messaging integration: connecting models to Slack, email, databases, APIs
- Both together is an advanced setup, not the default
3. llm_stats (observability layer) - live status, memory pressure, loaded models, activity, crash risk, benchmark evidence
Without a harness, the CTO has a working model but no way to route work to it. Without observability, they can't trust it's working. The stack is the product, but the harness is chosen based on the buyer's needs.
For V1, the CTO doesn't need to understand agentic architecture. They describe their workloads and we configure the appropriate harness for them. Pre-built workflow templates (support routing, code review, document search) give them working examples from day one.
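To make "they describe their workloads and we configure the appropriate harness" concrete, the onboarding output could be as small as a per-customer stack descriptor. A minimal sketch, assuming a Python-based config layer; every field name and value here is hypothetical, not a committed schema:

```python
from dataclasses import dataclass, field

# Hypothetical per-customer stack descriptor; field names are illustrative,
# not a committed Foundry schema.
@dataclass
class StackConfig:
    runtime: str                 # e.g. "omlx" or "ollama"
    harness: str                 # "openclaw" (orchestration) or "hermes" (messaging)
    workflows: list[str] = field(default_factory=list)  # templates to enable

# A CTO who says "route support questions from Slack" might resolve to:
config = StackConfig(
    runtime="omlx",
    harness="hermes",            # messaging integration, not agent orchestration
    workflows=["support-routing"],
)
```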
Current building blocks:
- Project Foundry → inference layer
- OpenClaw → orchestration harness
- Hermes → integration harness
- llm_stats → observability layer
- Tes benchmark + capacity work → evidence and model-fit credibility
5. Goals (MVP)
Business goals
- Prove that CTOs will pay for local inference reliability, not just raw benchmarks
- Close first 3 paying design-partner customers (CTO/local-inference buyers)
- Demonstrate one of two primary value stories:
- cost reduction vs cloud API spend
- compliance / data-sovereignty viability without cloud AI
Product goals
- Give customers one supported path to run and monitor local inference on Apple Silicon
- Make runtime state legible enough that a CTO can trust it for internal workflows
- Reduce "DIY local AI ops" time-to-value from weeks to one working day
6. Non-goals (MVP)
- Competing with hyperscaler inference providers on raw throughput
- Training or fine-tuning foundation models
- Windows/Linux fleet support
- General-purpose multi-tenant SaaS for every hardware setup
- Replacing every runtime; MVP wraps existing runtimes rather than inventing a new one
- Service-business workflow automation (V2)
- Headcount reduction promises
- Unsupervised automation of customer-facing, financial, or operational actions
7. Customer promises
Cost story
"Reduce recurring API spend by moving suitable workloads to Apple Silicon local inference."
Reliability story
"Know what is loaded, what is healthy, what fits, and what is about to fall over."
Compliance story
"Keep sensitive inference local, auditable, and under your control."
8. MVP definition
Offer A - Foundry Advisory (£299 one-time)
- Personalised hardware + model-fit report
- Recommended stack and runtime selection
- Workload suitability assessment
- Setup scripts / deployment checklist
Offer B - Foundry Managed Setup (£999 setup + £99/mo)
- Install and configure the full stack on customer Mac Studio: Foundry + (OpenClaw or Hermes) + llm_stats
- Harness choice based on customer needs: OpenClaw for agent orchestration, Hermes for messaging integration, or both for advanced setups
- Pre-configured workflow templates based on customer's stated workloads (support routing, code review, document search, or custom)
- Health checks, capacity profiles, benchmark baseline, observability dashboard
- Basic runbook and support period
- Customer interacts through existing tools (Slack, email, API) - the infrastructure is invisible
Offer C - Foundry On-Prem (£2k-£5k/mo)
- For compliance-sensitive teams
- Single-node on-prem deployment with audit logging and support
- Explicit supported use cases only
- No-cloud mode, support contract, operational runbook
MVP software scope
- Supported runtimes: omlx + Ollama (+ LM Studio visibility where practical)
- Full stack: Foundry + (OpenClaw or Hermes) + llm_stats
- Harness choice: OpenClaw for orchestration, Hermes for messaging integration, or both for advanced setups
- Endpoint health probing
- Capacity guard with named profiles (operational / benchmark / all) - see the sketch after this list
- Benchmark evidence store
- Storage hygiene + duplicate awareness
- Observability dashboard (menu bar + lightweight web)
- Simple recommendation engine for "what should I run for this workload?"
- Pre-configured workflow templates: support routing, code review, internal document search
- Runbook + installer
- Customer onboarding: we configure the full stack based on their stated workloads
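To make the capacity guard concrete: a minimal sketch of named memory profiles and a safe-fit check, assuming macOS `sysctl` for unified memory size. The threshold values are illustrative, not tuned recommendations:

```python
import subprocess

# Illustrative headroom thresholds per profile; not tuned recommendations.
PROFILES = {
    "operational": 0.70,  # conservative: keep 30% of unified memory free
    "benchmark":   0.85,  # more pressure allowed during controlled benchmark runs
    "all":         0.95,  # near the wire; supervised experiments only
}

def total_memory_bytes() -> int:
    """Total unified memory on macOS, via sysctl hw.memsize."""
    return int(subprocess.check_output(["sysctl", "-n", "hw.memsize"]).strip())

def fits(model_bytes: int, in_use_bytes: int, profile: str) -> bool:
    """Safe-fit check: would loading this model stay inside the profile's budget?"""
    budget = total_memory_bytes() * PROFILES[profile]
    return in_use_bytes + model_bytes <= budget
```

A real guard would also need to account for KV-cache growth with context length, which is why the check takes current usage as an input rather than assuming a static per-model cost.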
9. User stories
Startup CTO
- As a CTO, I want to know which models safely fit on my Mac Studio so I can stop guessing.
- As a CTO, I want one dashboard showing runtime health and memory pressure so I can trust local AI in team workflows.
- As a CTO, I want a migration path for suitable prompts from OpenAI to local inference so I can reduce monthly spend.
- As a CTO, I want my routine workloads (support queries, code reviews, document search) running automatically on local infrastructure without me building agent logic from scratch.
- As a CTO, I want to choose between agent orchestration (OpenClaw) and messaging integration (Hermes) based on my use case, not have to learn both.
- As a CTO, I want the AI to work through my existing tools (Slack, email) so my team doesn't have to learn a new interface.
Mid-market / compliance buyer
- As an engineering lead, I want audit-friendly logs and explicit deployment boundaries so I can justify local AI to risk/compliance.
- As an ops owner, I want a supported on-prem deployment with a runbook so this does not depend on one internal tinkerer.
10. Core workflow (golden path)
1. Customer describes hardware, workloads, sensitivity, and current API spend.
2. Foundry assesses fit and recommends runtime/model profiles.
3. Foundry deploys or guides setup on Apple Silicon hardware.
4. Customer sees live status via llm_stats/dashboard.
5. Team routes selected internal workloads to the local endpoint (client sketch after this list).
6. Foundry tracks health, capacity, storage, and benchmark evidence.
7. Customer expands local usage only where economics and reliability hold.
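Step 5 is deliberately low-friction. If the team already uses an OpenAI-compatible client, routing a workload locally can be a one-line base-URL change. The sketch below assumes Ollama's OpenAI-compatible endpoint and an illustrative model name:

```python
from openai import OpenAI

# Existing OpenAI-client code, repointed at the local runtime. Ollama serves an
# OpenAI-compatible API; the key is required by the client but ignored locally.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="local")

resp = client.chat.completions.create(
    model="llama3.1:8b",  # illustrative; use whatever the fit analysis recommends
    messages=[{"role": "user", "content": "Summarise this support ticket: ..."}],
)
print(resp.choices[0].message.content)
```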
11. Technical approach
Runtime strategy
- Wrap existing runtimes rather than forking them.
- Primary serving substrate: omlx where possible.
- Support visibility for Ollama and LM Studio because buyers already use them.
- Foundry owns orchestration, fit analysis, health visibility, and policy.
MVP technical components
1. Model inventory - local model discovery, duplicate/storage accounting
2. Capacity guard - named memory profiles, safe-fit checks, drift notes
3. Health probe - endpoint checks, latency, loaded model state (sketched after this list)
4. Benchmark store - structured benchmark history and comparisons
5. Recommendation engine - runtime/model suggestions by workload class
6. Observability surface - menu bar and/or lightweight dashboard
7. Runbook + installer - deployment scripts, config validation, support playbook
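For the health probe, a minimal sketch against an Ollama endpoint. `/api/ps` (loaded models) exists in recent Ollama versions; omlx or LM Studio would need their own adapters:

```python
import time
import requests

def probe(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> dict:
    """Check endpoint liveness, measure latency, and list loaded models."""
    start = time.monotonic()
    try:
        resp = requests.get(f"{base_url}/api/ps", timeout=timeout)
        resp.raise_for_status()
        latency_ms = round((time.monotonic() - start) * 1000, 1)
        loaded = [m["name"] for m in resp.json().get("models", [])]
        return {"healthy": True, "latency_ms": latency_ms, "loaded": loaded}
    except requests.RequestException as exc:
        return {"healthy": False, "error": str(exc)}
```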
Compliance/enterprise technical requirements
- local-only network mode option
- audit log of runtime changes and incidents (see the sketch after this list)
- explicit no-cloud mode for sensitive environments
- backup/restore story for configs and benchmark state
- safe update path with rollback guidance
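A sketch of the audit-log requirement: append-only JSON lines recording runtime changes and incidents. The log path and event vocabulary are assumptions, not a defined format:

```python
import json
import time
from pathlib import Path

# Hypothetical log location; a real deployment would pick a path the service
# user can write to, and rotate/back it up alongside configs.
AUDIT_LOG = Path("/var/log/foundry/audit.jsonl")

def audit(event: str, detail: dict) -> None:
    """Append one audit entry; JSON lines keep the log greppable and diffable."""
    AUDIT_LOG.parent.mkdir(parents=True, exist_ok=True)
    entry = {"ts": time.time(), "event": event, "detail": detail}
    with AUDIT_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

audit("model_loaded", {"model": "llama3.1:8b", "runtime": "ollama",
                       "profile": "operational"})
```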
12. Logistics and operations
Delivery models
1. Advisory only - document, scripts, recommendation pack
2. Remote managed setup - customer-owned Mac Studio, remote configuration/support
3. On-prem managed pilot - customer-owned or customer-sited hardware, restricted support contract
Operational realities to solve
- hardware procurement recommendations (which Mac Studio config is enough?)
- remote access/support process for managed customers
- SSD/storage policy for model libraries
- update cadence for model/runtime changes
- incident response when models OOM, endpoints drift, or disks fill
- boundaries: what workloads should stay cloud even if local is available?
Commercial/logistics questions
- do we require customer-owned hardware for MVP?
- do we support one standard hardware profile first (e.g. M3 Ultra 512GB) before smaller configs?
- what is the support window and SLA for design partners?
- how do we package onboarding so it feels like a product, not consultancy chaos?
13. Success metrics
Design-partner phase
- 10 qualified conversations with target buyers
- 3 paid pilots or advisory engagements
- at least 1 customer using local inference for a real recurring workflow
Product metrics
- time from kickoff to working local endpoint < 1 day for supported hardware
- customer can identify runtime health state in < 30 seconds
- measurable reduction in cloud AI usage for suitable workloads
- zero critical incidents caused by unsupported auto-actions in MVP
14. Risks and objections
1. Market confusion - buyers may want "cheap AI" when the real sell is "reliable local AI ops"
2. Hardware narrowness - M3 Ultra 512GB is powerful but niche; smaller configs need separate guidance
3. Support burden - bespoke environments can turn MVP into consulting soup
4. Reliability gap - if local inference is still flaky, the cost story collapses
5. Procurement friction - some buyers will need budget approval before hardware or pilot spend
6. Compliance sales cycle - lucrative, but slower than startup founder sales
7. Naming collision - Microsoft has a product called "Foundry Local" in the same category
15. Open questions
- Is the first sale better packaged as advisory, managed setup, or on-prem pilot?
- How much of llm_stats should remain free vs become part of paid Foundry?
- Should MVP support only one blessed hardware profile to keep ops sane?
- What workloads are strong local wins vs clear stay-in-cloud cases?
16. V2 path
Service businesses and document-heavy operators (trades, field service, logistics, professional services) represent a real secondary market. They have repetitive admin pain, sensitive data, and no good local AI option. But:
- the sales motion is different (channel/partnership vs direct)
- the product shape is different (workflow orchestration, not just inference ops)
- the evidence for headcount reduction is thin
- incumbents like Housecall Pro and ServiceTitan already own the workflow/budget
V2 plan: After proving the core CTO/local-inference offer with 3+ design partners, run one bounded service-business pilot. Prove measurable admin turnaround reduction with human review. Then decide whether to productise vertical workflow packs.