PRD - Foundry: Managed Private Local Inference for Apple Silicon

1. One-line pitch

For startup and mid-market CTOs who need to cut AI API spend or keep sensitive data off cloud providers, Foundry delivers a managed Apple Silicon local-inference platform that makes private LLM serving reliable, observable, and commercially usable.

2. Problem

CTOs and engineering leads buying Mac Studios for local AI hit the same wall.

The job-to-be-done is not "run a model locally." It is:

"Give me a local inference stack that my team can depend on, justify to finance, and defend to compliance."

3. Target customers

Primary (V1)

1. Startup CTO / technical founder

2. Mid-market CTO / head of engineering

3. Compliance-constrained organisations

Secondary (V2+)

4. Service businesses / document-heavy operators (trades, field service, logistics)

4. Product thesis

Foundry is a complete private AI infrastructure product - not just local inference.

A CTO doesn't need "a reliable local endpoint." They need a working system that handles their workloads, connects to their tools, and runs without babysitting. A local model endpoint without orchestration is a car engine without a steering wheel.

The product has three layers; the harness layer is a choice between two options, not a requirement to run both:

1. Foundry (inference layer) - which model, which quant, which runtime, what fits safely; health checks, capacity guardrails, restart discipline, drift detection

2. OpenClaw OR Hermes (harness layer) - agent orchestration or messaging integration, not both by default

3. llm_stats (observability layer) - live status, memory pressure, loaded models, activity, crash risk, benchmark evidence

Without a harness, the CTO has a working model but no way to route work to it. Without observability, they can't trust it's working. The stack is the product, but the harness is chosen based on the buyer's needs.

For V1, the CTO doesn't need to understand agentic architecture. They describe their workloads and we configure the appropriate harness for them. Pre-built workflow templates (support routing, code review, document search) give them working examples from day one.
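To make the template idea concrete, here is a minimal sketch of what a pre-built workflow template descriptor could look like. The field names, harness identifiers, and the `SUPPORT_ROUTING` example are all illustrative assumptions, not Foundry's actual schema.

```python
# Hypothetical shape of a pre-built workflow template.
# Field names and values are illustrative, not Foundry's real schema.
from dataclasses import dataclass, field


@dataclass
class WorkflowTemplate:
    name: str
    harness: str               # "openclaw" (agents) or "hermes" (messaging)
    workload_class: str        # drives the model/quant recommendation
    routes: list[str] = field(default_factory=list)


# One of the day-one templates mentioned above, sketched as data:
SUPPORT_ROUTING = WorkflowTemplate(
    name="support-routing",
    harness="hermes",
    workload_class="classification",
    routes=["support-inbox -> triage-model"],
)
```

Representing templates as plain data keeps the V1 promise intact: the customer describes workloads, and configuration stays on Foundry's side rather than the CTO's.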

Current building blocks:

5. Goals (MVP)

Business goals

Product goals

6. Non-goals (MVP)

7. Customer promises

Cost story

"Reduce recurring API spend by moving suitable workloads to Apple Silicon local inference."
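A back-of-envelope breakeven calculation makes the cost story auditable. The £99/mo figure comes from Offer B below; the hardware cost and API spend in the usage example are hypothetical assumptions, not quoted prices.

```python
# Breakeven sketch for the cost story. The managed fee matches Offer B;
# hardware cost and API spend inputs are hypothetical assumptions.
def breakeven_months(hardware_cost: float, monthly_api_spend: float,
                     monthly_managed_fee: float = 99.0) -> float:
    """Months until hardware plus the managed fee beats continued API spend."""
    monthly_saving = monthly_api_spend - monthly_managed_fee
    if monthly_saving <= 0:
        return float("inf")  # local never pays back at this spend level
    return hardware_cost / monthly_saving


# E.g. a hypothetical £8,000 Mac Studio vs £1,500/mo of API spend:
print(round(breakeven_months(8000, 1500), 1))  # → 5.7
```

The `inf` branch is the honest part of the pitch: below roughly £100/mo of API spend, "move it local" has no cost case and the sell has to rest on reliability or compliance instead.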

Reliability story

"Know what is loaded, what is healthy, what fits, and what is about to fall over."

Compliance story

"Keep sensitive inference local, auditable, and under your control."

8. MVP definition

Offer A - Foundry Advisory (£299 one-time)

Offer B - Foundry Managed Setup (£999 setup + £99/mo)

Offer C - Foundry On-Prem (£2-5k/mo)

MVP software scope

9. User stories

Startup CTO

Mid-market / compliance buyer

10. Core workflow (golden path)

1. Customer describes hardware, workloads, sensitivity, and current API spend.

2. Foundry assesses fit and recommends runtime/model profiles.

3. Foundry deploys or guides setup on Apple Silicon hardware.

4. Customer sees live status via llm_stats/dashboard.

5. Team routes selected internal workloads to local endpoint.

6. Foundry tracks health, capacity, storage, and benchmark evidence.

7. Customer expands local usage only where economics and reliability hold.

11. Technical approach

Runtime strategy

MVP technical components

1. Model inventory - local model discovery, duplicate/storage accounting

2. Capacity guard - named memory profiles, safe-fit checks, drift notes

3. Health probe - endpoint checks, latency, loaded model state

4. Benchmark store - structured benchmark history and comparisons

5. Recommendation engine - runtime/model suggestions by workload class

6. Observability surface - menu bar and/or lightweight dashboard

7. Runbook + installer - deployment scripts, config validation, support playbook
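Component 3 (health probe) can be sketched with nothing but the standard library, assuming the runtime exposes an OpenAI-compatible `/v1/models` endpoint. The URL, latency thresholds, and status names are assumptions for illustration, not the product's actual API.

```python
# Hypothetical health-probe sketch. Endpoint URL, thresholds, and status
# labels are illustrative assumptions, not Foundry's real interface.
import json
import time
import urllib.request

HEALTH_URL = "http://localhost:1234/v1/models"  # assumed local endpoint


def classify(latency_ms: float) -> str:
    """Map probe latency onto a coarse health state."""
    if latency_ms < 200:
        return "healthy"
    if latency_ms < 1000:
        return "degraded"
    return "at-risk"


def probe(url: str = HEALTH_URL, timeout: float = 2.0) -> dict:
    """Check the endpoint, measure latency, and list loaded models."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
        latency_ms = (time.monotonic() - start) * 1000
        return {
            "ok": True,
            "latency_ms": round(latency_ms, 1),
            "models": [m.get("id") for m in body.get("data", [])],
            "status": classify(latency_ms),
        }
    except OSError as exc:  # covers refused connections and timeouts
        return {"ok": False, "error": str(exc), "status": "down"}
```

A probe like this, run on a schedule, feeds both the observability surface (live status) and the restart-discipline logic in the inference layer.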

Compliance/enterprise technical requirements

12. Logistics and operations

Delivery models

1. Advisory only - document, scripts, recommendation pack

2. Remote managed setup - customer-owned Mac Studio, remote configuration/support

3. On-prem managed pilot - customer-owned or customer-sited hardware, restricted support contract

Operational realities to solve

Commercial/logistics questions

13. Success metrics

Design-partner phase

Product metrics

14. Risks and objections

1. Market confusion - buyers may want "cheap AI" when the real sell is "reliable local AI ops"

2. Hardware narrowness - M3 Ultra 512GB is powerful but niche; smaller configs need separate guidance

3. Support burden - bespoke environments can turn MVP into consulting soup

4. Reliability gap - if local inference is still flaky, the cost story collapses

5. Procurement friction - some buyers will need budget approval before hardware or pilot spend

6. Compliance sales cycle - lucrative, but slower than startup founder sales

7. Naming collision - Microsoft has a product called "Foundry Local" in the same category

15. Open questions

16. V2 path

Service businesses and document-heavy operators (trades, field service, logistics, professional services) represent a real secondary market. They have repetitive admin pain, sensitive data, and no good local AI option. But:

V2 plan: After proving the core CTO/local-inference offer with 3+ design partners, run one bounded service-business pilot. Prove measurable admin turnaround reduction with human review. Then decide whether to productise vertical workflow packs.