A browser-native multi-agent coordinator. Spin up a team of AI agents that message each other, call tools, and converge on a result — using any model you choose: local (Ollama, LM Studio, WebLLM) or cloud (OpenAI, Anthropic, Gemini, Groq, OpenRouter, Together, or any OpenAI-compatible endpoint). No server required.
Gremlin runs a team of AI agents from a browser tab. Each agent has its own system prompt, model, and role. They send messages to each other, call tools (web search, file system, browser), and converge on a result. You watch the whole conversation happen in real time.
The Vite dev server is the runtime — there is no separate backend. The coordinator and UI run locally; inference can run locally (Ollama, LM Studio, WebLLM) or against any cloud provider (OpenAI, Anthropic, Gemini, Groq, OpenRouter, Together, or any OpenAI-compatible endpoint). API keys live in localStorage and only ever reach the provider you point them at.
A TypeScript coordinator (coordinator.ts) holds the in-memory agent state and routes messages. Each agent returns JSON:
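The exact schema isn't reproduced here; a plausible sketch of the per-turn reply, with field names that are assumptions apart from done and result:

```typescript
// Hypothetical shape of the JSON each agent returns per turn.
// "to" and "content" are illustrative names; done/result match
// the behavior described below.
interface AgentReply {
  to: string;      // recipient agent, e.g. "synthesizer"
  content: string; // message body delivered to that agent
  done?: boolean;  // true when this agent is finished
  result?: string; // final answer, set alongside done: true
}

const reply: AgentReply = {
  to: "synthesizer",
  content: "Here are the three sources I found.",
};
```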
The coordinator delivers each message to its recipient and runs that agent next. When an agent sets done: true with a result, it stops. The synthesizer's result is shown to the user.
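The routing loop can be sketched in a few lines. This is not Gremlin's actual coordinator.ts; names and signatures are assumptions:

```typescript
// Minimal sketch of the coordinator loop: deliver each reply to its
// recipient until some agent sets done: true with a result.
type Reply = { to: string; content: string; done?: boolean; result?: string };

async function run(
  agents: Map<string, (msg: string) => Promise<Reply>>,
  first: string,
  task: string
): Promise<string> {
  let current = first;
  let message = task;
  while (true) {
    const agent = agents.get(current);
    if (!agent) throw new Error(`unknown agent: ${current}`);
    const reply = await agent(message);
    if (reply.done && reply.result !== undefined) return reply.result;
    current = reply.to;      // route to the recipient...
    message = reply.content; // ...and run that agent next
  }
}
```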
Agents can also call tools via the standard tool-calling API:
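A tool is declared in the OpenAI-style tool-calling format that most of the supported providers accept. The web_search tool and its parameters below are illustrative, not Gremlin's exact definitions:

```typescript
// Illustrative tool declaration in the common OpenAI-compatible
// tool-calling format; the model responds with a tool_calls entry
// naming the function and its JSON arguments.
const tools = [
  {
    type: "function",
    function: {
      name: "web_search",
      description: "Search the web and return the top results",
      parameters: {
        type: "object",
        properties: {
          query: { type: "string", description: "The search query" },
        },
        required: ["query"],
      },
    },
  },
];
```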
Each mode loads a preset roster of agents. You can edit any agent, save a new mode from your current team, or delete custom modes.
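A mode is essentially a named roster. A hypothetical sketch of the data involved, with all field names assumed for illustration:

```typescript
// Hypothetical shape of a mode: a name plus a roster of agents,
// each with its own model and system prompt as described above.
interface AgentConfig {
  name: string;
  model: string; // e.g. a local Ollama tag or a cloud model id
  systemPrompt: string;
}

interface Mode {
  name: string;
  agents: AgentConfig[];
}

const research: Mode = {
  name: "research",
  agents: [
    { name: "researcher", model: "llama3.1:8b", systemPrompt: "You find sources." },
    { name: "synthesizer", model: "gpt-4o-mini", systemPrompt: "You write the final answer." },
  ],
};
```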
Gremlin is a coordinator, not a model. Point it at any supported provider and pick a model per agent — mix local and cloud across the same team if you want.
API keys live in localStorage and are sent only to the provider you point them at. Nothing is proxied through Gremlin's servers — there are no Gremlin servers.
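At request time, the stored key only needs to become an Authorization header on the call to the provider. A sketch of that lookup; in the browser the store is localStorage, an in-memory map stands in here so the sketch runs anywhere, and the key format and Bearer header are assumptions:

```typescript
// Sketch of per-provider key lookup at request time. The
// "gremlin:apiKey:<provider>" key format is an assumption.
const store = new Map<string, string>(); // localStorage in the browser

function setApiKey(provider: string, key: string): void {
  store.set(`gremlin:apiKey:${provider}`, key);
}

function authHeader(provider: string): Record<string, string> {
  const key = store.get(`gremlin:apiKey:${provider}`);
  return key ? { Authorization: `Bearer ${key}` } : {};
}
```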
Four commands. The dev server is the app — open the URL it prints.
$ git clone https://github.com/aosmith/gremlin.git
$ cd gremlin/web
$ npm install
$ npm run dev
# → http://localhost:5173
Requires Node.js 18+. For local inference, install Ollama; Gremlin detects your GPU on first run and recommends per-agent models. For cloud inference, drop an API key into Settings.