Self-Hosting AI on Your Laptop
Demo stack for the DevOpsDays Austin self-hosting talk. This brings up a local AI stack in Docker with no cloud dependencies. Welcome to freedom...
What's included
| Service | URL | Description |
|---|---|---|
| Open WebUI | http://localhost:8080 | ChatGPT-like interface for local models |
| Perplexica | http://localhost:3000 | AI-powered web search (like Perplexity) |
| SearXNG | http://localhost:8888 | Private meta-search engine |
| Dockhand | http://localhost:3100 | Docker container management UI |
| OpenCode | Terminal | AI coding assistant backed by local models |
| Ollama API | http://localhost:11434 | LLM inference backend (native on M-series, containerized on Intel/Linux) |
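Once the stack from one of the quick starts is running, the URLs in the table can be probed from a terminal before a demo. The following is a minimal sketch, not part of the repository: `check_services` is a hypothetical helper name, and it assumes `curl` is installed on the host and that each service answers a plain GET on its root URL.

```shell
# check_services
# Requests each URL from the table above and prints one status line per service.
# Returns non-zero if any service did not answer successfully.
check_services() {
  failed=0
  for entry in \
    "Open WebUI|http://localhost:8080" \
    "Perplexica|http://localhost:3000" \
    "SearXNG|http://localhost:8888" \
    "Dockhand|http://localhost:3100" \
    "Ollama API|http://localhost:11434"
  do
    name=${entry%%|*}   # text before the first "|"
    url=${entry#*|}     # text after the first "|"
    # -f: fail on HTTP errors, -sS: quiet but show errors, 5 second timeout
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "ok    $name ($url)"
    else
      echo "FAIL  $name ($url)"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: check_services || echo "some services are not up yet" >&2
```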
Prerequisites
All platforms:
- Docker Desktop (or Docker Engine on Linux)
- Node.js 18+ (for OpenCode)
- 20+ GB free disk space for models
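The disk space requirement can be checked up front. This is a rough sketch: `has_free_gb` is a hypothetical helper, and which path to check is an assumption (models pulled by a native Ollama and models pulled inside Docker may live on different volumes).

```shell
# has_free_gb PATH MIN_GB
# Succeeds when the filesystem holding PATH has at least MIN_GB GiB available.
has_free_gb() {
  path=$1
  min_gb=$2
  # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
  # Filesystem names containing spaces would shift the columns.
  avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  avail_gb=$((avail_kb / 1024 / 1024))
  echo "${avail_gb} GiB free on ${path}"
  [ "$avail_gb" -ge "$min_gb" ]
}

# Usage: has_free_gb "$HOME" 20 || echo "Free up disk space before pulling models" >&2
```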
Apple Silicon Mac only:
- Ollama for Mac — runs natively for full Metal GPU acceleration
Linux / WSL2 (NVIDIA GPU) only:
- NVIDIA drivers (Linux) or Windows NVIDIA driver + WSL2 (Windows)
- NVIDIA Container Toolkit
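If you are unsure which quick start applies, the choice can be derived from the platform. The sketch below is illustrative only: `stack_dir` and `detect_stack_dir` are hypothetical names, the folder names are the ones used in the quick starts, and the detection heuristics (presence of `nvidia-smi`, "microsoft" in `/proc/version`) are assumptions. A terminal running under Rosetta may report `x86_64` on Apple Silicon.

```shell
# stack_dir OS ARCH HAS_NVIDIA IS_WSL
# Prints the repository folder for the described platform.
# OS and ARCH are the values of `uname -s` and `uname -m`; the last two are "yes" or "no".
stack_dir() {
  os=$1 arch=$2 has_nvidia=$3 is_wsl=$4
  case "$os" in
    Darwin)
      if [ "$arch" = "arm64" ]; then echo mac-arm64; else echo mac-intel; fi ;;
    Linux)
      if [ "$has_nvidia" != "yes" ]; then echo cpu-only
      elif [ "$is_wsl" = "yes" ]; then echo windows-wsl2-nvidia
      else echo linux-nvidia
      fi ;;
    *) echo cpu-only ;;
  esac
}

# detect_stack_dir
# Gathers the four inputs from the current machine and calls stack_dir.
detect_stack_dir() {
  has_nvidia=no
  is_wsl=no
  command -v nvidia-smi >/dev/null 2>&1 && has_nvidia=yes
  # WSL kernels include "microsoft" in /proc/version.
  grep -qi microsoft /proc/version 2>/dev/null && is_wsl=yes
  stack_dir "$(uname -s)" "$(uname -m)" "$has_nvidia" "$is_wsl"
}

# Usage: cd "$(detect_stack_dir)"
```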
Quick Start — Apple Silicon (M1/M2/M3/M4/M5)
1. Install and configure Ollama
Download and install Ollama for Mac, then configure it to accept connections from Docker containers:
# Allow Ollama to listen on all interfaces (not just localhost)
launchctl setenv OLLAMA_HOST "0.0.0.0"
Restart the Ollama app from the menu bar after running that command. Verify it's working:
ollama list
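`ollama list` only shows that the local CLI can reach the server. To check the path that containers will use, one option is to run curl from a throwaway container. This is a sketch under assumptions: `ollama_reachable_from_docker` is a hypothetical helper, `curlimages/curl` is a public image used here only for convenience, and `host.docker.internal` is the hostname Docker Desktop provides for the host (whether the compose files in this repo use the same hostname is not confirmed here).

```shell
# ollama_reachable_from_docker [URL]
# Runs curl inside a temporary container and requests Ollama's model list.
ollama_reachable_from_docker() {
  url=${1:-http://host.docker.internal:11434/api/tags}
  if docker run --rm curlimages/curl -fsS --max-time 5 "$url" >/dev/null; then
    echo "reachable: $url"
  else
    echo "NOT reachable: $url" >&2
    return 1
  fi
}

# Usage: ollama_reachable_from_docker
```

If this fails while `ollama list` works, the server is probably still bound to localhost only.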
2. Pull models
ollama pull llama3.2
ollama pull mistral
ollama pull nomic-embed-text
- llama3.2 (~2 GB): chat model for Open WebUI
- mistral (~4.1 GB): chat model for Perplexica and OpenCode
- nomic-embed-text (~275 MB): embeddings for Open WebUI RAG
Start these early in the talk so they're ready for the live demo at the end.
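To start all downloads at once and walk away, the pulls can run in the background. A minimal sketch, assuming a native `ollama` CLI on the PATH; `pull_models` is a hypothetical helper, and progress output from the parallel pulls will interleave.

```shell
# pull_models MODEL...
# Starts one `ollama pull` per model in the background, then waits for all of them.
# Returns non-zero if any pull failed.
pull_models() {
  pids=""
  for model in "$@"; do
    ollama pull "$model" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=1
  done
  return "$failed"
}

# Usage: pull_models llama3.2 mistral nomic-embed-text
```

On platforms where Ollama runs in a container, the `ollama pull` line would need to become `docker exec ollama ollama pull`.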
3. Pull and start the stack
cd mac-arm64
docker compose pull
./start.sh
4. Set up OpenCode
npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json
5. Open the UIs
- Open WebUI → http://localhost:8080
- Perplexica → http://localhost:3000
- SearXNG → http://localhost:8888
- Dockhand → http://localhost:3100
- OpenCode → run opencode in any project directory
Quick Start — Intel Mac
1. Tune Docker Desktop memory
Open Docker Desktop → Settings → Resources:
- Memory: 10 GB+ (8 GB minimum)
- CPUs: 4+
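The settings can be confirmed from a terminal, since `docker info` reports what the engine actually sees. A small sketch; `check_docker_resources` is a hypothetical helper name and the thresholds are passed in by the caller.

```shell
# check_docker_resources MIN_GIB MIN_CPUS
# Prints the memory and CPU count visible to the Docker engine and
# succeeds only if both meet the given minimums.
check_docker_resources() {
  min_gib=$1
  min_cpus=$2
  # MemTotal is reported in bytes, NCPU as a plain count.
  bytes=$(docker info --format '{{.MemTotal}}')
  cpus=$(docker info --format '{{.NCPU}}')
  gib=$((bytes / 1024 / 1024 / 1024))
  echo "Docker sees ${gib} GiB RAM and ${cpus} CPUs"
  [ "$gib" -ge "$min_gib" ] && [ "$cpus" -ge "$min_cpus" ]
}

# Usage: check_docker_resources 7 4
```

The Docker Desktop VM typically reports slightly less memory than the configured value, which is why the usage line checks for 7 GiB rather than 8.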
2. Pull and start the stack
cd mac-intel
docker compose pull
./start.sh
3. Pull models
docker exec ollama ollama pull llama3.2:3b-instruct-q4_K_M
docker exec ollama ollama pull mistral
docker exec ollama ollama pull nomic-embed-text
- llama3.2:3b-instruct-q4_K_M (~2 GB): chat model for Open WebUI
- mistral (~4.1 GB): chat model for Perplexica and OpenCode
- nomic-embed-text (~275 MB): embeddings for Open WebUI RAG
4. Set up OpenCode
npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json
5. Open the UIs
- Open WebUI → http://localhost:8080
- Perplexica → http://localhost:3000
- SearXNG → http://localhost:8888
- Dockhand → http://localhost:3100
- OpenCode → run opencode in any project directory
Quick Start — Windows (WSL2 + NVIDIA GPU)
Prerequisites
- Windows 10 22H2+ or Windows 11
- WSL2 with Ubuntu 22.04+ (wsl --install)
- NVIDIA Game Ready or Studio driver for Windows (no separate CUDA install needed)
- NVIDIA Container Toolkit installed inside WSL2
- Docker Engine inside WSL2 (recommended) or Docker Desktop with WSL2 backend
1. Verify GPU access inside WSL2
Open a WSL2 terminal and confirm the GPU is visible:
nvidia-smi
You should see your GPU listed. If not, update your Windows NVIDIA driver.
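Seeing the GPU in the WSL2 shell does not yet prove that containers can use it. A common follow-up check is to run `nvidia-smi` inside a container with GPU access requested. This is a sketch: `gpu_visible_in_docker` is a hypothetical helper, and the `ubuntu` image is an arbitrary choice (the NVIDIA Container Toolkit injects `nvidia-smi` into the container).

```shell
# gpu_visible_in_docker
# Runs nvidia-smi in a temporary container that requests all GPUs.
gpu_visible_in_docker() {
  if docker run --rm --gpus all ubuntu nvidia-smi >/dev/null 2>&1; then
    echo "GPU is visible inside containers"
  else
    echo "GPU is not visible inside containers; check the NVIDIA Container Toolkit" >&2
    return 1
  fi
}

# Usage: gpu_visible_in_docker
```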
2. Pull and start the stack
cd windows-wsl2-nvidia
docker compose pull
./start.sh
3. Pull models
docker exec ollama ollama pull llama3.1:8b
docker exec ollama ollama pull nomic-embed-text
- llama3.1:8b (~4.7 GB): chat model for Open WebUI, Perplexica, and OpenCode
- nomic-embed-text (~275 MB): embeddings for Open WebUI RAG
4. Set up OpenCode
Inside WSL2:
npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json
Edit ~/.config/opencode/opencode.json and set the default model:
"model": "ollama-local/llama3.1:8b"
5. Open the UIs
From your Windows browser:
- Open WebUI → http://localhost:8080
- Perplexica → http://localhost:3000
- SearXNG → http://localhost:8888
- Dockhand → http://localhost:3100
- OpenCode → run opencode in any project directory inside WSL2
WSL2 automatically forwards ports to Windows, so localhost works in your Windows browser.
Quick Start — CPU-only (Linux, WSL2, or any x86)
For systems without a supported GPU. Inference is slower, so use small, highly quantized models.
Prerequisites
- Docker Engine (Linux / WSL2) or Docker Desktop (Windows/Mac)
- 16 GB RAM recommended (8 GB minimum)
1. Pull and start the stack
cd cpu-only
docker compose pull
./start.sh
2. Pull models
docker exec ollama ollama pull llama3.2:3b-instruct-q4_K_M
docker exec ollama ollama pull nomic-embed-text
- llama3.2:3b-instruct-q4_K_M (~2 GB): smallest usable chat model on CPU
- nomic-embed-text (~275 MB): embeddings for Open WebUI RAG
Skip mistral on CPU: it is too slow for interactive use without a GPU.
3. Set up OpenCode
npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json
Edit ~/.config/opencode/opencode.json and set:
"model": "ollama-local/llama3.2:3b-instruct-q4_K_M"
4. Open the UIs
- Open WebUI → http://localhost:8080
- Perplexica → http://localhost:3000
- SearXNG → http://localhost:8888
- Dockhand → http://localhost:3100
- OpenCode → run opencode in any project directory
WSL2 users: ports are forwarded to Windows automatically, so localhost works in your Windows browser.
Quick Start — x86 Linux (NVIDIA GPU)
1. Pull and start the stack
cd linux-nvidia
docker compose pull
./start.sh
2. Pull models
docker exec ollama ollama pull llama3.1:8b
docker exec ollama ollama pull nomic-embed-text
- llama3.1:8b (~4.7 GB): chat model for Open WebUI, Perplexica, and OpenCode
- nomic-embed-text (~275 MB): embeddings for Open WebUI RAG
3. Set up OpenCode
npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json
Edit ~/.config/opencode/opencode.json and change the default model:
"model": "ollama-local/llama3.1:8b"
4. Open the UIs
- Open WebUI → http://localhost:8080
- Perplexica → http://localhost:3000
- SearXNG → http://localhost:8888
- Dockhand → http://localhost:3100
- OpenCode → run opencode in any project directory
Using the services
Open WebUI — Select a model from the dropdown and start chatting. Enable web search via the search icon in the chat bar. No login required (auth disabled for demo).
Perplexica — AI-powered search that cites sources. It uses SearXNG under the hood, so no API keys are needed. On first launch, go to Settings and set the chat model to the one you pulled for your platform before running any searches: mistral (Apple Silicon / Intel Mac), llama3.1:8b (Linux or WSL2 with NVIDIA), or llama3.2:3b-instruct-q4_K_M (CPU-only).
Dockhand — Browse and manage all running containers. On first load, click the "No environments" dropdown in the top bar and select Local Docker to activate the pre-configured environment.
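OpenCode — Terminal AI coding assistant. Navigate to any project directory and run opencode. It uses mistral by default; on Linux or WSL2 with NVIDIA, set it to llama3.1:8b, and on CPU-only systems to llama3.2:3b-instruct-q4_K_M, as described in the quick starts. It works best for single-file edits and explanations, since models of this size are hit-or-miss on multi-step agentic tasks.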
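The model switch in ~/.config/opencode/opencode.json can also be scripted instead of edited by hand. The following is a minimal sketch, assuming the copied file is pretty-printed and holds its default as a single `"model": "..."` string on one line (the file's full layout is not shown in this README). `set_opencode_model` is a hypothetical helper, and a JSON-aware tool would be more robust than `sed`.

```shell
# set_opencode_model FILE MODEL
# Rewrites the value of the "model" string entry in FILE to MODEL.
set_opencode_model() {
  file=$1
  model=$2
  tmp=$(mktemp)
  # Matches "model": "<anything>" and swaps in the new value.
  # Keys such as "models" do not match because of the closing quote.
  sed -E 's|("model"[[:space:]]*:[[:space:]]*")[^"]*"|\1'"$model"'"|' "$file" > "$tmp" &&
    cat "$tmp" > "$file" &&
    rm -f "$tmp"
}

# Usage: set_opencode_model ~/.config/opencode/opencode.json "ollama-local/llama3.1:8b"
```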
Stopping the stack
# From inside the platform directory you started
docker compose down
To also delete all data volumes (full reset):
docker compose down -v
Tips
Start pulls early: Kick off docker compose pull and your ollama pull commands at the beginning of the talk so everything is ready for the live demo at the end.
Slow first response: The first request after startup loads the model into memory (~30 seconds). Subsequent requests are much faster.
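To keep that delay out of a live demo, the model can be loaded ahead of time. Ollama's generate endpoint loads a model into memory when it receives a request without a prompt. A small sketch; `warm_up_model` is a hypothetical helper and the 120 second timeout is an arbitrary choice.

```shell
# warm_up_model MODEL [BASE_URL]
# Asks Ollama to load MODEL into memory without generating any text.
warm_up_model() {
  model=$1
  base_url=${2:-http://localhost:11434}
  # A generate request that carries only the model name triggers a load.
  curl -fsS --max-time 120 "$base_url/api/generate" \
    -d "{\"model\": \"$model\"}" >/dev/null
}

# Usage: warm_up_model mistral
```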
Out of memory (Intel Mac / Linux): If containers crash, use a smaller model. llama3.2:3b-instruct-q4_K_M is the most conservative option.
Ollama not reachable from containers (M-series): Make sure you ran launchctl setenv OLLAMA_HOST "0.0.0.0" and restarted the Ollama app. You can confirm the server itself is up with curl http://localhost:11434 from your terminal; it should reply "Ollama is running".
Taking it further — NixOS
The nixOS/ folder contains illustrative NixOS configurations for four common self-hosting use cases. These aren't meant to be run directly — they're reference material for the talk showing how the same services you're running in Docker Compose can be expressed declaratively in NixOS.
| Host | Role |
|---|---|
| hosts/ai-server | GPU-accelerated Ollama + Open WebUI |
| hosts/media-server | Jellyfin + Immich photo management |
| hosts/home-gateway | AdGuard DNS blocking + Caddy reverse proxy |
| hosts/cloud-vps | Public blog, SSO, and Tailscale mesh VPN |
A shared modules/common.nix applies SSH hardening, user config, and firewall defaults to every host automatically — no repetition.