Foundry MVP Path
What exists right now (today)
Working:
- omlx serving on 8081 (Qwen3-Coder-Next + GLM-5.1)
- MLX server on 8085 (8 models loaded)
- LM Studio on 1234 (9 models)
- Ollama on 11434
- llm_stats CLI + menubar: endpoint health, model listing, memory, swap, RSS, Foundry capacity data, request activity, crash risk
- Foundry: capacity guard (operational/benchmark/all profiles), benchmark store, model inventory, storage hygiene, health probe — code exists, not yet packaged
- OpenClaw + Hermes exist as local orchestration/messaging building blocks, but are not part of the core Foundry MVP unless used for one bounded workflow pilot
Not working / not yet built:
- no packaged installer
- no customer-facing onboarding flow
- no web dashboard (only CLI + menubar)
- no config validation tool
- no update/rollback procedure
- no documentation beyond internal files
- no packaged vertical workflow templates
- no customer-safe workflow review UI or approval audit trail
Positioning guardrails for MVP
Foundry MVP stays horizontal at the core: private/local inference ops on Apple Silicon.
The primary buyer remains the CTO / engineering lead who wants reliable local inference, reduced cloud API exposure, clear observability, and supportable deployment.
A secondary buyer path is allowed for service businesses and document-heavy operators, but only through one narrow workflow pilot. Foundry should not become a broad automation consultancy in MVP.
OpenClaw/Hermes positioning:
- optional add-on for pure CTO/local-inference buyers
- central operating/workflow layer for service-business buyers
- not required for every Foundry Managed setup
- not a reusable vertical workflow-pack product until V2
MVP build sequence
Step 1: Package the full stack (1 week)
Goal: A customer could run one command and get Foundry + their chosen harness + observability.
- [ ] llm_stats installable via `pip install llm-stats` or a single `curl | sh`
- [ ] `llm_stats summary` works out of the box with zero config on a standard Mac Studio
- [ ] OpenClaw installable and configurable for a standard set of workflow templates
- [ ] Hermes installable and configurable for standard messaging integrations (Slack, email)
- [ ] Foundry capacity guard runnable as a standalone CLI: `foundry capacity check --profile operational`
- [ ] Foundry benchmark store runnable: `foundry benchmark list`, `foundry benchmark compare`
- [ ] Foundry storage hygiene runnable: `foundry storage report`
- [ ] `foundry status` — one-command overview wrapping llm_stats + capacity + health + harness status
- [ ] One README that explains the full stack, harness choice (OpenClaw vs Hermes), and how the layers work together
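The endpoint-probe portion of `foundry status` could be sketched as a thin aggregator over the services listed above. The ports are the ones running today; the service names, function names, and output format are hypothetical, not the real CLI:

```python
# Minimal sketch of a `foundry status` endpoint aggregator (hypothetical layout).
# Probes each serving endpoint with a TCP connect and prints one line per service.
import socket

SERVICES = {
    "omlx": ("127.0.0.1", 8081),
    "mlx-server": ("127.0.0.1", 8085),
    "lm-studio": ("127.0.0.1", 1234),
    "ollama": ("127.0.0.1", 11434),
}

def probe(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def status_report(services=SERVICES) -> str:
    """One line per service: name, endpoint, up/down."""
    lines = []
    for name, (host, port) in services.items():
        state = "up" if probe(host, port) else "down"
        lines.append(f"{name:<12} {host}:{port} {state}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(status_report())
```

The real command would also fold in capacity, health, and harness status; this only shows the cheap liveness layer underneath.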
Why first: If we can't give a customer a working full stack in 5 minutes, nothing else matters. This is the proof that the product works outside our machine.
Step 2: Advisory pack (1 week, parallel with Step 1)
Goal: We can deliver the £299 advisory offer tomorrow if someone asks.
- [ ] Hardware recommendation matrix: which Mac Studio config for which workload class
- [ ] Model-fit worksheet: "I want to run X, what model/quant/runtime should I use?"
- [ ] Cost comparison template: cloud API spend vs local TCO
- [ ] Local-fit decision framework: what should stay cloud, what should move local
- [ ] Setup checklist: the exact steps to go from bare Mac Studio to working local inference
- [ ] Service-business workflow-fit worksheet: which repetitive admin workflow is narrow enough for a private/local AI pilot
- [ ] Bundle into a Notion doc or PDF template we can customise per customer
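The cost-comparison template item lends itself to a small worked sketch. All figures and function names below are placeholders for the template, not real pricing:

```python
# Illustrative cost-comparison arithmetic for the advisory template.
# Every number here is a placeholder to be customised per customer.

def cloud_monthly_cost(tokens_per_month: float, price_per_mtok: float) -> float:
    """Cloud API spend: tokens per month times price per million tokens."""
    return tokens_per_month / 1_000_000 * price_per_mtok

def local_monthly_cost(hardware_cost: float, amortise_months: int,
                       power_watts: float, kwh_price: float,
                       support_monthly: float = 0.0) -> float:
    """Local TCO: amortised hardware + always-on power + support retainer."""
    power = power_watts / 1000 * 24 * 30 * kwh_price  # kWh for a 30-day month
    return hardware_cost / amortise_months + power + support_monthly

cloud = cloud_monthly_cost(500_000_000, 3.0)       # 500M tokens at $3/Mtok
local = local_monthly_cost(10_000, 36, 200, 0.30)  # placeholder Mac Studio figures
print(f"cloud ~ {cloud:.0f}/mo, local ~ {local:.0f}/mo")
```

The template would wrap this in the customer's actual token volumes, model pricing, and electricity rate.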
Why parallel: We can sell this before any packaging is done. If someone asks tomorrow, we deliver manually. The packaging catches up.
Step 3: Managed setup scripts (2 weeks)
Goal: A repeatable install process that turns a bare Mac Studio into a working Foundry node with the full stack.
- [ ] `foundry init` — detects hardware, installs supported runtimes, configures OpenClaw, validates environment
- [ ] `foundry doctor` — checks health of the full stack (inference + orchestration + messaging + observability), flags problems
- [ ] `foundry configure --profile operational` — sets capacity profile, starts appropriate services, configures workflow templates
- [ ] `foundry status` — one-command overview (inference health + agent status + integration status + memory/capacity)
- [ ] Config templates for omlx, Ollama, and LM Studio
- [ ] Pre-configured workflow templates: support routing, code review, internal document search
- [ ] Auto-start on boot (launchd plist generation)
- [ ] Storage policy: where models live, how to avoid SSD fill
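The auto-start item could be sketched with the standard library's `plistlib`; the label, binary path, and arguments are assumptions for illustration, not the shipped defaults:

```python
# Hypothetical sketch of launchd plist generation for `foundry init`.
import plistlib

def launchd_plist(label: str, program_args: list[str]) -> bytes:
    """Build a minimal launchd agent plist as XML bytes."""
    plist = {
        "Label": label,
        "ProgramArguments": program_args,
        "RunAtLoad": True,   # start at boot/login
        "KeepAlive": True,   # restart the service if it crashes
    }
    return plistlib.dumps(plist)

# Placeholder binary path and port; write the result to
# ~/Library/LaunchAgents/<label>.plist and load it with `launchctl`.
data = launchd_plist("com.foundry.omlx",
                     ["/usr/local/bin/omlx", "serve", "--port", "8081"])
```

Generating the plist from a dict rather than a string template keeps the XML well-formed and lets `foundry doctor` round-trip it for validation.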
Step 4: Remote support baseline (1 week)
Goal: We can support a customer's Mac Studio without being in the room.
- [ ] Tailscale installed as part of `foundry init` (zero-config mesh VPN)
- [ ] `foundry support` command that dumps a health bundle (logs, config, capacity state, benchmark history)
- [ ] Remote health check: we can run `foundry doctor` on a customer machine via Tailscale SSH
- [ ] Monitoring alert: customer's llm_stats can send us a webhook when something breaks
- [ ] Support runbook: what we check, what we fix, what we escalate
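The monitoring webhook could be sketched as below; the payload fields, endpoint URL, and function names are assumptions, not the actual llm_stats interface:

```python
# Hypothetical sketch of the llm_stats breakage webhook.
# Payload shape and URL are placeholders, not a designed protocol.
import json
import urllib.request

def build_alert(host: str, check: str, detail: str) -> bytes:
    """Serialise an alert payload; separate from sending so it is testable offline."""
    return json.dumps({"host": host, "check": check, "detail": detail}).encode()

def send_alert(webhook_url: str, payload: bytes) -> None:
    """POST the alert to the support webhook."""
    req = urllib.request.Request(
        webhook_url, data=payload,
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=5)

payload = build_alert("customer-01", "capacity", "swap usage exceeded threshold")
# send_alert("https://support.example/hook", payload)  # placeholder URL
```

Keeping the alert one-way and minimal (host, check, detail) fits the support runbook: we see that something broke, then pull the full picture via `foundry support` over Tailscale.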
Why before dashboard: We need to be able to support the first 3 customers. A fancy dashboard can wait. A way to see what's wrong on their machine can't.
Step 5: One bounded workflow pilot (optional, from Week 3+)
Goal: Prove Foundry can reduce repetitive admin workload for one service-business customer without turning MVP into broad workflow automation.
Pick one workflow only, for one customer:
- job intake/triage
- quote drafting
- dispatch prep
- invoice or completion-pack admin
- customer comms drafting
- document classification/retrieval
MVP boundary:
- [ ] workflow is document-backed, repetitive, and safe to draft/classify/retrieve locally
- [ ] OpenClaw/Hermes are installed only for this pilot workflow
- [ ] every AI output is marked draft or recommendation
- [ ] a named human reviews before any customer message, dispatch action, invoice/finance step, or system-of-record update
- [ ] audit notes record what the AI drafted/classified and who approved it
- [ ] success is measured by admin turnaround, staff effort, and error review, not by headcount reduction
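The audit-note boundary above could be captured as a minimal record like this sketch; the field names and values are assumptions, not a designed schema:

```python
# Minimal sketch of a pilot audit record (hypothetical field names).
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AuditNote:
    workflow: str     # e.g. "quote drafting"
    ai_output: str    # what the model drafted or classified
    status: str       # "draft" or "recommendation" only, per the MVP boundary
    reviewer: str     # named human who approved
    approved_at: str  # ISO timestamp of approval

note = AuditNote(
    workflow="quote drafting",
    ai_output="Draft quote for job intake item",
    status="draft",
    reviewer="j.smith",
    approved_at=datetime.now(timezone.utc).isoformat(),
)
assert note.status in {"draft", "recommendation"}  # enforce the boundary in code
print(asdict(note))
```

Even a flat record like this is enough for the pilot: it names the workflow, the AI's contribution, and the human who signed off, without building a workflow product.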
Why bounded: This gives service-business buyers something concrete while keeping reusable workflow packs out of core MVP until V2.
Step 6: First 3 customers (ongoing from Step 2)
Goal: Real paying customers using the stack.
- [ ] Identify 3 design partners: primarily startup CTOs / engineering leads, plus optionally one service-business or compliance-heavy operator
- [ ] Deliver advisory pack to at least 1
- [ ] Deploy managed setup on at least 1 customer Mac Studio
- [ ] Verify they can route real workloads to local inference
- [ ] If using the workflow pilot, verify the human-review boundary in writing before any live use
- [ ] Collect: what broke, what confused them, what they actually used
- [ ] Get 1 testimonial or ROI/admin-time case study
Not a build step — this runs in parallel from day 1. We start selling as soon as Step 2 is done.
What we skip in MVP
- Web dashboard — CLI + menubar is enough for 3 customers
- Multi-node/fleet — single Mac Studio per customer
- Reusable vertical workflow packs — V2, after one bounded pilot proves the pattern
- Broad OpenClaw/Hermes rollout — optional for CTOs; used centrally only for the one service-business pilot
- Auto-remediation — we fix things, not scripts
- Unsupervised workflow automation — humans approve customer, dispatch, finance, and system-of-record actions
- Billing system — invoice manually for the first 3
- Self-serve signup — it's a conversation, not a form
- Windows/Linux — Apple Silicon only
- Smaller Mac support — M3 Ultra 512GB first, smaller configs get a different advisory pack
Tes-specific questions for the discussion
1. omlx vs Ollama as primary runtime for customers — omlx is more capable but less known. Do we standardise on one or support both from day 1?
2. Capacity guard test suite — the code exists but no venv/pytest is wired up. What's the path to making it testable and shippable?
3. Benchmark pack standardisation — what benchmark packs should we ship as the default evidence base for customers?
4. llm_stats → Foundry integration — llm_stats already reads Foundry data. How tight should the packaged coupling be?
5. Hardware profile scope — M3 Ultra 512GB only for MVP, or do we need M4 Mac Pro / M4 MacBook Pro profiles too?
6. omlx production readiness — is omlx stable enough to recommend as a customer's primary serving substrate, or does it need more hardening?
7. Workflow pilot safety — what minimal review/audit mechanism is enough for a service-business pilot without building a full workflow product too early?
Timeline
| Week | Step | Deliverable |
| --- | --- | --- |
| 1 | Package the full stack + Advisory pack | `pip install llm-stats` works; advisory template ready to sell |
| 2-3 | Managed setup scripts | `foundry init/doctor/configure/status` works end-to-end |
| 3-4 | Remote support baseline | Tailscale + health bundle + support runbook |
| 3-4+ | Optional bounded workflow pilot | One human-reviewed admin workflow using OpenClaw/Hermes, not a reusable pack |
| 1-4+ | First 3 customers | Sell advisory, deploy managed, collect feedback |
Total: 3-4 weeks to a sellable, deployable MVP.
The core MVP is Foundry private/local inference ops. The workflow pilot is a deliberately constrained proof point for the secondary service-business buyer path.