Introducing Local Models in Workshop Desktop
Run fully local AI models inside Workshop Desktop — free, private, and on your own hardware. Research Preview now available.
TL;DR
Workshop Desktop now supports local AI models running on your own hardware. Connect any Anthropic Messages API-compatible server — like llama.cpp — and use it just like a frontier model, inside a structured agent workspace. No API keys. No cloud calls. No cost per request.
This is a Research Preview. It works, but expect rough edges.
Why Local Models Matter Now
For the past few years, AI has mostly meant one thing: send your data to the cloud, wait for a response, hope the model behaves.
That's starting to change.
Models are getting smaller and more capable. Quantized variants run on consumer hardware. Open-weight ecosystems are evolving fast. And privacy expectations are rising.
But the tooling hasn't caught up. Running a local model today usually means managing terminals, wrestling with CLI flags, and hacking together workflows with no real agent experience.
Workshop Desktop changes that.
What You Can Do
With local model support, Workshop Desktop becomes a general AI workspace powered by your own hardware:
- Research and analyze data
- Build apps and tools
- Refactor code
- Prototype ideas
- Chain workflows
All inside a structured workspace — not a terminal.
Private by Default
Your prompts. Your files. Your experiments. They stay on your machine.
No API calls to external services. No cloud inference. No hidden data flows. When you run a local model, nothing leaves your computer.
Seamless Switching Between Local and Frontier Models
Sometimes you want maximum privacy, offline capability, or zero cost per request. Other times you want a frontier model for the hardest reasoning tasks or massive context windows.
Workshop lets you move between these modes fluidly. Select "Local" in the model dropdown when you want local inference. Switch back to Claude, GPT, or Gemini when you need more horsepower. You're not choosing a side — you're choosing flexibility.
Free and Private Forever
Running AI locally should not be a premium feature.
Workshop Desktop's local model support is free and private, forever. No artificial throttling. No paywalls for autonomy. No "upgrade to unlock privacy."
You own your machine. You should own your AI.
How It Works
Under the hood, Workshop connects to any server that implements the Anthropic Messages API (/v1/messages endpoint). The recommended setup is llama.cpp with its built-in Anthropic compatibility mode.
- Run a local model server (like llama-server from llama.cpp)
- In Workshop, go to Settings > Agent > Local Model Setup
- Enter your server's base URL (e.g., http://0.0.0.0:8082)
- Hit Test Connection to verify
- Select Local from the model dropdown
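If you want to sanity-check your server before pointing Workshop at it, you can hit the /v1/messages endpoint directly. This is a minimal sketch, assuming llama-server is running locally with its Anthropic compatibility mode enabled and that the base URL matches what you entered in Workshop; the exact shape of the response body follows the Anthropic Messages API, where the reply text lives under content[0]["text"].

```python
import json
import urllib.request

# Assumption: your local server's base URL, as entered in Workshop's settings.
BASE_URL = "http://0.0.0.0:8082"

def build_messages_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build a minimal Anthropic Messages API payload (model, max_tokens, messages)."""
    return {
        "model": "local",  # llama-server answers for whichever model it has loaded
        "max_tokens": max_tokens,  # required field in the Messages API
        "messages": [{"role": "user", "content": prompt}],
    }

def send(prompt: str) -> dict:
    """POST the payload to /v1/messages and return the parsed JSON response."""
    data = json.dumps(build_messages_request(prompt)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=data,
        headers={"content-type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (requires a running server):
#   reply = send("Say hello in one sentence.")
#   print(reply["content"][0]["text"])
```

If the call succeeds, Workshop's Test Connection button should succeed against the same URL.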
That's it. Workshop handles the rest — tool definitions, streaming, context management — the same way it does for cloud models.
For a detailed walkthrough, see the Local Models Setup Guide.
Research Preview: What to Expect
This is an early release. Here's what you should know:
What works well:
- General conversation and reasoning
- Code generation and analysis
- Research and summarization
- Streaming responses
Known limitations:
- Tool calling varies by model. Workshop's agent uses tool calls for file editing, code execution, and search. Some local models handle this well; others struggle. When tool calls fail, Workshop falls back to text extraction, but the experience may be rougher.
- Hardware matters. You'll need a machine with enough memory and compute to run the model you choose. Apple Silicon Macs with 16GB+ RAM are a good starting point. See the setup guide for recommended specs.
- Context windows are smaller. Most local models support 8k–128k tokens, compared to 200k+ for frontier models. Workshop sets a conservative 100k default, but your actual limit depends on the model and your hardware.
- No cloud deployment. Local model support is desktop-only by design. It doesn't work in Workshop's cloud environments.
We'll keep improving compatibility as local models and inference servers evolve. If you hit issues, let us know.
Who This Is For
If you're:
- A developer who wants private code analysis without sending proprietary code to the cloud
- A builder who doesn't want to live in the terminal to use local AI
- A founder who wants autonomy without managing infra
- A power user exploring quantized models on your own hardware
This is for you.
The Bigger Picture
We think the future of AI isn't just cloud APIs, just open models, just agents, or just apps. It's a unified workspace where you can research, analyze, prototype, and deploy — across both local and cloud environments, without friction.
Local model support in Workshop Desktop is the first step toward that future.
Local models are ready for real work. And now, they have a real home.
