Foundry MVP Path
What exists right now (today)
Working:
- omlx serving on 8081 (Qwen3-Coder-Next + GLM-5.1)
- MLX server on 8085 (8 models loaded)
- LM Studio on 1234 (9 models)
- Ollama on 11434
- llm_stats CLI + menubar: endpoint health, model listing, memory, swap, RSS, Foundry capacity data, request activity, crash risk
- Foundry: capacity guard (operational/benchmark/all profiles), benchmark store, model inventory, storage hygiene, health probe — code exists, not yet packaged
- OpenClaw + Hermes exist as local orchestration/messaging building blocks, but are not part of the core Foundry MVP unless used for one bounded workflow pilot
Not working / not yet built:
- no packaged installer
- no customer-facing onboarding flow
- no web dashboard (only CLI + menubar)
- no config validation tool
- no update/rollback procedure
- no documentation beyond internal files
- no packaged vertical workflow templates
- no customer-safe workflow review UI or approval audit trail
Positioning guardrails for MVP
Foundry MVP stays horizontal at the core: private/local inference ops on Apple Silicon.
The primary buyer remains the CTO / engineering lead who wants reliable local inference, reduced cloud API exposure, clear observability, and supportable deployment.
A secondary buyer path is allowed for service businesses and document-heavy operators, but only through one narrow workflow pilot. Foundry should not become a broad automation consultancy in MVP.
OpenClaw/Hermes positioning:
- optional add-on for pure CTO/local-inference buyers
- central operating/workflow layer for service-business buyers
- not required for every Foundry Managed setup
- not a reusable vertical workflow-pack product until V2
MVP build sequence
Step 1: Package the full stack (1 week)
Goal: A customer could run one command and get Foundry + their chosen harness + observability.
- [ ] llm_stats installable via `pip install llm-stats` or a single `curl | sh`
- [ ] `llm_stats summary` works out of the box with zero config on a standard Mac Studio
- [ ] OpenClaw installable and configurable for a standard set of workflow templates
- [ ] Hermes installable and configurable for standard messaging integrations (Slack, email)
- [ ] Foundry capacity guard runnable as a standalone CLI: `foundry capacity check --profile operational`
- [ ] Foundry benchmark store runnable: `foundry benchmark list`, `foundry benchmark compare`
- [ ] Foundry storage hygiene runnable: `foundry storage report`
- [ ] `foundry status` — one-command overview wrapping llm_stats + capacity + health + harness status
- [ ] One README that explains the full stack, harness choice (OpenClaw vs Hermes), and how the layers work together
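The endpoint-probe portion of `foundry status` could be sketched as a thin aggregator over the services listed above. The ports are the ones running today; the service names, function names, and output format are hypothetical, not the real CLI:

```python
# Minimal sketch of a `foundry status` endpoint aggregator (hypothetical layout).
# Probes each serving endpoint with a TCP connect and prints one line per service.
import socket

SERVICES = {
    "omlx": ("127.0.0.1", 8081),
    "mlx-server": ("127.0.0.1", 8085),
    "lm-studio": ("127.0.0.1", 1234),
    "ollama": ("127.0.0.1", 11434),
}

def probe(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def status_report(services=SERVICES) -> str:
    """One line per service: name, endpoint, up/down."""
    lines = []
    for name, (host, port) in services.items():
        state = "up" if probe(host, port) else "down"
        lines.append(f"{name:<12} {host}:{port} {state}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(status_report())
```

The real command would also fold in capacity, health, and harness status; this only shows the cheap liveness layer underneath.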
Why first: If we can't give a customer a working full stack in 5 minutes, nothing else matters. This is the proof that the product works outside our machine.
Step 2: Advisory pack (1 week, parallel with Step 1)
Goal: We can deliver the £299 advisory offer tomorrow if someone asks.
- [ ] Hardware recommendation matrix: which Mac Studio config for which workload class
- [ ] Model-fit worksheet: "I want to run X, what model/quant/runtime should I use?"
- [ ] Cost comparison template: cloud API spend vs local TCO
- [ ] Local-fit decision framework: what should stay cloud, what should move local
- [ ] Setup checklist: the exact steps to go from bare Mac Studio to working local inference
- [ ] Service-business workflow-fit worksheet: which repetitive admin workflow is narrow enough for a private/local AI pilot
- [ ] Bundle into a Notion doc or PDF template we can customise per customer
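The cost-comparison template item lends itself to a small worked sketch. All figures and function names below are placeholders for the template, not real pricing:

```python
# Illustrative cost-comparison arithmetic for the advisory template.
# Every number here is a placeholder to be customised per customer.

def cloud_monthly_cost(tokens_per_month: float, price_per_mtok: float) -> float:
    """Cloud API spend: tokens per month times price per million tokens."""
    return tokens_per_month / 1_000_000 * price_per_mtok

def local_monthly_cost(hardware_cost: float, amortise_months: int,
                       power_watts: float, kwh_price: float,
                       support_monthly: float = 0.0) -> float:
    """Local TCO: amortised hardware + always-on power + support retainer."""
    power = power_watts / 1000 * 24 * 30 * kwh_price  # kWh for a 30-day month
    return hardware_cost / amortise_months + power + support_monthly

cloud = cloud_monthly_cost(500_000_000, 3.0)       # 500M tokens at $3/Mtok
local = local_monthly_cost(10_000, 36, 200, 0.30)  # placeholder Mac Studio figures
print(f"cloud ~ {cloud:.0f}/mo, local ~ {local:.0f}/mo")
```

The template would wrap this in the customer's actual token volumes, model pricing, and electricity rate.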
Why parallel: We can sell this before any packaging is done. If someone asks tomorrow, we deliver manually. The packaging catches up.
Step 3: Managed setup scripts (2 weeks)
Goal: A repeatable install process that turns a bare Mac Studio into a working Foundry node with the full stack.
- [ ] `foundry init` — detects hardware, installs supported runtimes, configures OpenClaw, validates environment
- [ ] `foundry doctor` — checks health of the full stack (inference + orchestration + messaging + observability), flags problems
- [ ] `foundry configure --profile operational` — sets capacity profile, starts appropriate services, configures workflow templates
- [ ] `foundry status` — one-command overview (inference health + agent status + integration status + memory/capacity)
- [ ] Config templates for omlx, Ollama, and LM Studio
- [ ] Pre-configured workflow templates: support routing, code review, internal document search
- [ ] Auto-start on boot (launchd plist generation)
- [ ] Storage policy: where models live, how to avoid SSD fill
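The auto-start item could be sketched with the standard library's `plistlib`; the label, binary path, and arguments are assumptions for illustration, not the shipped defaults:

```python
# Hypothetical sketch of launchd plist generation for `foundry init`.
import plistlib

def launchd_plist(label: str, program_args: list[str]) -> bytes:
    """Build a minimal launchd agent plist as XML bytes."""
    plist = {
        "Label": label,
        "ProgramArguments": program_args,
        "RunAtLoad": True,   # start at boot/login
        "KeepAlive": True,   # restart the service if it crashes
    }
    return plistlib.dumps(plist)

# Placeholder binary path and port; write the result to
# ~/Library/LaunchAgents/<label>.plist and load it with `launchctl`.
data = launchd_plist("com.foundry.omlx",
                     ["/usr/local/bin/omlx", "serve", "--port", "8081"])
```

Generating the plist from a dict rather than a string template keeps the XML well-formed and lets `foundry doctor` round-trip it for validation.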
Step 4: Remote support baseline (1 week)
Goal: We can support a customer's Mac Studio without being in the room.
- [ ] Tailscale installed as part of `foundry init` (zero-config mesh VPN)
- [ ] `foundry support` command that dumps a health bundle (logs, config, capacity state, benchmark history)
- [ ] Remote health check: we can run `foundry doctor` on a customer machine via Tailscale SSH
- [ ] Monitoring alert: customer's llm_stats can send us a webhook when something breaks
- [ ] Support runbook: what we check, what we fix, what we escalate
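The monitoring webhook could be sketched as below; the payload fields, endpoint URL, and function names are assumptions, not the actual llm_stats interface:

```python
# Hypothetical sketch of the llm_stats breakage webhook.
# Payload shape and URL are placeholders, not a designed protocol.
import json
import urllib.request

def build_alert(host: str, check: str, detail: str) -> bytes:
    """Serialise an alert payload; separate from sending so it is testable offline."""
    return json.dumps({"host": host, "check": check, "detail": detail}).encode()

def send_alert(webhook_url: str, payload: bytes) -> None:
    """POST the alert to the support webhook."""
    req = urllib.request.Request(
        webhook_url, data=payload,
        headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=5)

payload = build_alert("customer-01", "capacity", "swap usage exceeded threshold")
# send_alert("https://support.example/hook", payload)  # placeholder URL
```

Keeping the alert one-way and minimal (host, check, detail) fits the support runbook: we see that something broke, then pull the full picture via `foundry support` over Tailscale.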
Why before dashboard: We need to be able to support the first 3 customers. A fancy dashboard can wait. A way to see what's wrong on their machine can't.
Step 5: One bounded workflow pilot (optional, from Week 3+)
Goal: Prove Foundry can reduce repetitive admin workload for one service-business customer without turning MVP into broad workflow automation.
Pick one workflow only, for one customer:
- job intake/triage
- quote drafting
- dispatch prep
- invoice or completion-pack admin
- customer comms drafting
- document classification/retrieval
MVP boundary:
- [ ] workflow is document-backed, repetitive, and safe to draft/classify/retrieve locally
- [ ] OpenClaw/Hermes are installed only for this pilot workflow
- [ ] every AI output is marked draft or recommendation
- [ ] a named human reviews before any customer message, dispatch action, invoice/finance step, or system-of-record update
- [ ] audit notes record what the AI drafted/classified and who approved it
- [ ] success is measured by admin turnaround, staff effort, and error review, not by headcount reduction
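The audit-note boundary above could be captured as a minimal record like this sketch; the field names and values are assumptions, not a designed schema:

```python
# Minimal sketch of a pilot audit record (hypothetical field names).
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AuditNote:
    workflow: str     # e.g. "quote drafting"
    ai_output: str    # what the model drafted or classified
    status: str       # "draft" or "recommendation" only, per the MVP boundary
    reviewer: str     # named human who approved
    approved_at: str  # ISO timestamp of approval

note = AuditNote(
    workflow="quote drafting",
    ai_output="Draft quote for job intake item",
    status="draft",
    reviewer="j.smith",
    approved_at=datetime.now(timezone.utc).isoformat(),
)
assert note.status in {"draft", "recommendation"}  # enforce the boundary in code
print(asdict(note))
```

Even a flat record like this is enough for the pilot: it names the workflow, the AI's contribution, and the human who signed off, without building a workflow product.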
Why bounded: This gives service-business buyers something concrete while keeping reusable workflow packs out of core MVP until V2.
Step 6: First 3 customers (ongoing from Step 2)
Goal: Real paying customers using the stack.
- [ ] Identify 3 design partners: primarily startup CTOs / engineering leads, plus optionally one service-business or compliance-heavy operator
- [ ] Deliver advisory pack to at least 1
- [ ] Deploy managed setup on at least 1 customer Mac Studio
- [ ] Verify they can route real workloads to local inference
- [ ] If using the workflow pilot, verify the human-review boundary in writing before any live use
- [ ] Collect: what broke, what confused them, what they actually used
- [ ] Get 1 testimonial or ROI/admin-time case study
Not a build step — this runs in parallel from day 1. We start selling as soon as Step 2 is done.
What we skip in MVP
- Web dashboard — CLI + menubar is enough for 3 customers
- Multi-node/fleet — single Mac Studio per customer
- Reusable vertical workflow packs — V2, after one bounded pilot proves the pattern
- Broad OpenClaw/Hermes rollout — optional for CTOs; used centrally only for the one service-business pilot
- Auto-remediation — we fix things, not scripts
- Unsupervised workflow automation — humans approve customer, dispatch, finance, and system-of-record actions
- Billing system — invoice manually for the first 3
- Self-serve signup — it's a conversation, not a form
- Windows/Linux — Apple Silicon only
- Smaller Mac support — M3 Ultra 512GB first, smaller configs get a different advisory pack
Tes-specific questions for the discussion
1. omlx vs Ollama as primary runtime for customers — omlx is more capable but less known. Do we standardise on one or support both from day 1?
2. Capacity guard test suite — the code exists but no venv/pytest is wired up. What's the path to making it testable and shippable?
3. Benchmark pack standardisation — what benchmark packs should we ship as the default evidence base for customers?
4. llm_stats → Foundry integration — llm_stats already reads Foundry data. How tight should the packaged coupling be?
5. Hardware profile scope — M3 Ultra 512GB only for MVP, or do we need M4 Mac Pro / M4 MacBook Pro profiles too?
6. omlx production readiness — is omlx stable enough to recommend as a customer's primary serving substrate, or does it need more hardening?
7. Workflow pilot safety — what minimal review/audit mechanism is enough for a service-business pilot without building a full workflow product too early?
Timeline
| Week | Step | Deliverable |
| --- | --- | --- |
| 1 | Package the full stack + Advisory pack | `pip install llm-stats` works; advisory template ready to sell |
| 2-3 | Managed setup scripts | `foundry init/doctor/configure/status` works end-to-end |
| 3-4 | Remote support baseline | Tailscale + health bundle + support runbook |
| 3-4+ | Optional bounded workflow pilot | One human-reviewed admin workflow using OpenClaw/Hermes, not a reusable pack |
| 1-4+ | First 3 customers | Sell advisory, deploy managed, collect feedback |
Total: 3-4 weeks to a sellable, deployable MVP.
The core MVP is Foundry private/local inference ops. The workflow pilot is a deliberately constrained proof point for the secondary service-business buyer path.