Demo stack and NixOS sample config for Matthew Brahms' Self-Hosting talk.


Self-Hosting AI on Your Laptop

Demo stack for the DevOpsDays Austin self-hosting talk. This brings up a local AI stack in Docker with no cloud dependencies. Welcome to freedom...

What's included

Service	URL	Description
Open WebUI	http://localhost:8080	ChatGPT-like interface for local models
Perplexica	http://localhost:3000	AI-powered web search (like Perplexity)
SearXNG	http://localhost:8888	Private meta-search engine
Dockhand	http://localhost:3100	Docker container management UI
OpenCode	Terminal	AI coding assistant backed by local models
Ollama API	http://localhost:11434	LLM inference backend (native on Apple Silicon, containerized on Intel Mac, Linux, WSL2, and CPU-only setups)
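Once the stack is up, a short loop can confirm that each URL in the table answers. This is a sketch rather than a script from the repo: SERVICES, check_url, and report_services are hypothetical names, and it assumes curl is installed and that each service returns a non-error HTTP status on its root path.

```shell
# Hypothetical helper (not part of the repo): report which demo services answer.
# URLs are taken from the service table above.
SERVICES='open-webui=http://localhost:8080
perplexica=http://localhost:3000
searxng=http://localhost:8888
dockhand=http://localhost:3100
ollama=http://localhost:11434'

# check_url URL: succeeds if URL answers with an HTTP status below 400 within 3 seconds.
check_url() {
  curl -fsS -o /dev/null --max-time 3 "$1"
}

# report_services [CHECKER]: prints "name: up" or "name: down" for each service.
# CHECKER defaults to check_url; pass another function name to test offline.
report_services() {
  checker="${1:-check_url}"
  printf '%s\n' "$SERVICES" | while IFS='=' read -r name url; do
    if "$checker" "$url" 2>/dev/null; then
      echo "$name: up"
    else
      echo "$name: down"
    fi
  done
}

# Usage: report_services
```

Run report_services after ./start.sh; any "down" line points at the container worth inspecting in Dockhand.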

Prerequisites

All platforms:

Docker Desktop (or Docker Engine on Linux)
Node.js 18+ (for OpenCode)
20+ GB free disk space for models

Apple Silicon Mac only:

Ollama for Mac — runs natively for full Metal GPU acceleration

Linux / WSL2 (NVIDIA GPU) only:

NVIDIA drivers (Linux) or Windows NVIDIA driver + WSL2 (Windows)
NVIDIA Container Toolkit
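If you're not sure which platform directory to use, the choice the Quick Start sections make can be sketched as a function. This is a hypothetical helper, not part of the repo; it assumes uname is available and, on NVIDIA systems, nvidia-smi is on the PATH. A macOS terminal running under Rosetta reports x86_64 even on Apple Silicon, so sanity-check the answer.

```shell
# Hypothetical helper: map machine facts to one of the repo's platform directories.
# Args: OS (uname -s), ARCH (uname -m), GPU (yes/no: usable NVIDIA GPU), WSL (yes/no)
platform_dir() {
  os="$1"; arch="$2"; gpu="$3"; wsl="$4"
  case "$os" in
    Darwin)
      if [ "$arch" = "arm64" ]; then echo mac-arm64; else echo mac-intel; fi ;;
    Linux)
      if [ "$gpu" = "yes" ] && [ "$wsl" = "yes" ]; then echo windows-wsl2-nvidia
      elif [ "$gpu" = "yes" ]; then echo linux-nvidia
      else echo cpu-only
      fi ;;
    *) echo cpu-only ;;
  esac
}

# detect_platform: gathers the inputs from the current machine and prints a directory.
detect_platform() {
  gpu=no; wsl=no
  # Treat the GPU as usable only if nvidia-smi exists and runs successfully.
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1 && gpu=yes
  # WSL kernels mention "microsoft" in /proc/version.
  grep -qi microsoft /proc/version 2>/dev/null && wsl=yes
  platform_dir "$(uname -s)" "$(uname -m)" "$gpu" "$wsl"
}

# Usage: cd "$(detect_platform)"
```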

Quick Start — Apple Silicon (M1/M2/M3/M4/M5)

1. Install and configure Ollama

Download and install Ollama for Mac, then configure it to accept connections from Docker containers:

# Allow Ollama to listen on all interfaces (not just localhost)
launchctl setenv OLLAMA_HOST "0.0.0.0"

Restart the Ollama app from the menu bar after running that command. Verify it's working:

ollama list

2. Pull models

ollama pull llama3.2
ollama pull mistral
ollama pull nomic-embed-text

llama3.2 (~2 GB) — chat model for Open WebUI
mistral (~4.1 GB) — chat model for Perplexica and OpenCode
nomic-embed-text (~275 MB) — embeddings for Open WebUI RAG

Start these early in the talk so they're ready for the live demo at the end.
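If you rerun setup (say, once at home and again at the venue), a small wrapper can skip models that are already downloaded. A sketch under assumptions: has_model and pull_models are hypothetical helpers, awk is available, and ollama list prints a header row followed by one model per line with the name in the first column. Set OLLAMA="docker exec ollama ollama" to reuse it on the containerized platforms.

```shell
# Hypothetical helper: pull each model only if `ollama list` does not show it yet.
# OLLAMA is the command prefix: "ollama" for the native Mac app, or
# "docker exec ollama ollama" for the containerized setups.
OLLAMA="${OLLAMA:-ollama}"

# has_model NAME < list-output: succeeds if NAME appears in the NAME column.
# A name without a tag is compared as NAME:latest, which is how Ollama lists it.
has_model() {
  want="$1"
  case "$want" in *:*) ;; *) want="$want:latest" ;; esac
  awk -v w="$want" 'NR > 1 && $1 == w { found = 1 } END { exit !found }'
}

# pull_models NAME...: pulls only the models that are missing.
pull_models() {
  # $OLLAMA is left unquoted on purpose so a multi-word prefix splits into words.
  installed="$($OLLAMA list)"
  for m in "$@"; do
    if printf '%s\n' "$installed" | has_model "$m"; then
      echo "already present: $m"
    else
      $OLLAMA pull "$m"
    fi
  done
}

# Usage: pull_models llama3.2 mistral nomic-embed-text
```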

3. Pull and start the stack

cd mac-arm64
docker compose pull
./start.sh
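If a UI doesn't respond right after ./start.sh, its container may still be initializing. A generic retry helper can wait for it; wait_for is a hypothetical name, and the usage line assumes curl is installed.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Args: ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final failure.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: wait up to about two minutes for Open WebUI on port 8080.
# wait_for 24 5 curl -fsS -o /dev/null http://localhost:8080 && echo "Open WebUI is up"
```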

4. Set up OpenCode

npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json

5. Open the UIs

Open WebUI → http://localhost:8080
Perplexica → http://localhost:3000
SearXNG → http://localhost:8888
Dockhand → http://localhost:3100
OpenCode → opencode in any project directory

Quick Start — Intel Mac

1. Tune Docker Desktop memory

Open Docker Desktop → Settings → Resources:

Memory: 10 GB+ (8 GB minimum)
CPUs: 4+
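To confirm the new limits took effect, docker info reports the memory (MemTotal, in bytes) and CPU count (NCPU) available to Docker. The comparison below is a sketch with a hypothetical mem_ok helper; it uses decimal gigabytes on the assumption that a VM set to 8 GB reports slightly less than 8 GiB once its kernel reserves some memory.

```shell
# Hypothetical helper: succeed if BYTES is at least MIN_GB decimal gigabytes.
# Decimal GB leaves headroom for memory the Docker VM's kernel keeps for itself.
mem_ok() {
  bytes="$1"; min_gb="$2"
  [ "$bytes" -ge $((min_gb * 1000000000)) ]
}

# Usage (MemTotal and NCPU are fields of docker info):
# mem_ok "$(docker info --format '{{.MemTotal}}')" 8 || echo "Docker has less than 8 GB"
# [ "$(docker info --format '{{.NCPU}}')" -ge 4 ] || echo "Docker has fewer than 4 CPUs"
```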

2. Pull and start the stack

cd mac-intel
docker compose pull
./start.sh

3. Pull models

docker exec ollama ollama pull llama3.2:3b-instruct-q4_K_M
docker exec ollama ollama pull mistral
docker exec ollama ollama pull nomic-embed-text

llama3.2:3b-instruct-q4_K_M (~2 GB) — chat model for Open WebUI
mistral (~4.1 GB) — chat model for Perplexica and OpenCode
nomic-embed-text (~275 MB) — embeddings for Open WebUI RAG

4. Set up OpenCode

npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json

5. Open the UIs

Open WebUI → http://localhost:8080
Perplexica → http://localhost:3000
SearXNG → http://localhost:8888
Dockhand → http://localhost:3100
OpenCode → opencode in any project directory

Quick Start — Windows (WSL2 + NVIDIA GPU)

Prerequisites

Windows 10 22H2+ or Windows 11
WSL2 with Ubuntu 22.04+ (wsl --install)
NVIDIA Game Ready or Studio driver for Windows (no separate CUDA install needed)
NVIDIA Container Toolkit installed inside WSL2
Docker Engine inside WSL2 (recommended) or Docker Desktop with WSL2 backend

1. Verify GPU access inside WSL2

Open a WSL2 terminal and confirm the GPU is visible:

nvidia-smi

You should see your GPU listed. If not, update your Windows NVIDIA driver.
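nvidia-smi on its own only proves the WSL2 driver works. A second check runs it inside a container, which also exercises the NVIDIA Container Toolkit; it relies on the toolkit to provide nvidia-smi inside the container, as in NVIDIA's own sample workload. This is a sketch: is_wsl and gpu_in_docker are hypothetical helpers, and gpu_in_docker pulls the public ubuntu image the first time it runs.

```shell
# Hypothetical helper: succeed if the kernel version string marks this as WSL.
# WSL kernels include "microsoft" in /proc/version; the path is a parameter for testing.
is_wsl() {
  grep -qi microsoft "${1:-/proc/version}" 2>/dev/null
}

# gpu_in_docker: run nvidia-smi in a throwaway container with all GPUs attached.
gpu_in_docker() {
  docker run --rm --gpus all ubuntu nvidia-smi
}

# Usage: is_wsl && nvidia-smi && gpu_in_docker
```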

2. Pull and start the stack

cd windows-wsl2-nvidia
docker compose pull
./start.sh

3. Pull models

docker exec ollama ollama pull llama3.1:8b
docker exec ollama ollama pull nomic-embed-text

llama3.1:8b (~4.7 GB) — chat model for Open WebUI, Perplexica, and OpenCode
nomic-embed-text (~275 MB) — embeddings for Open WebUI RAG

4. Set up OpenCode

Inside WSL2:

npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json

Edit ~/.config/opencode/opencode.json and set the default model:

"model": "ollama-local/llama3.1:8b"

5. Open the UIs

From your Windows browser:

Open WebUI → http://localhost:8080
Perplexica → http://localhost:3000
SearXNG → http://localhost:8888
Dockhand → http://localhost:3100
OpenCode → opencode in any project directory inside WSL2

WSL2 automatically forwards ports to Windows, so localhost works in your Windows browser.

Quick Start — CPU-only (Linux, WSL2, or any x86)

For systems without a supported GPU. Inference is slower, so use small, highly quantized models.

Prerequisites

Docker Engine (Linux / WSL2) or Docker Desktop (Windows/Mac)
16 GB RAM recommended (8 GB minimum)
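On Linux and WSL2, /proc/meminfo shows how much RAM the stack can actually use. Under WSL2 this is the VM's allocation, which by default is only part of the Windows host's memory and can be raised with the memory setting in .wslconfig. ram_gb is a hypothetical helper, sketched under the assumption that MemTotal is reported in kB as usual.

```shell
# Hypothetical helper: print total RAM in whole decimal GB from a meminfo-style file.
# /proc/meminfo reports MemTotal in kB (units of 1024 bytes). Linux/WSL2 only.
ram_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 * 1024 / 1000000000 }' "${1:-/proc/meminfo}"
}

# Usage:
# gb=$(ram_gb)
# if [ "$gb" -lt 8 ]; then echo "below the 8 GB minimum"
# elif [ "$gb" -lt 16 ]; then echo "works, but 16 GB is recommended"
# fi
```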

1. Pull and start the stack

cd cpu-only
docker compose pull
./start.sh

2. Pull models

docker exec ollama ollama pull llama3.2:3b-instruct-q4_K_M
docker exec ollama ollama pull nomic-embed-text

llama3.2:3b-instruct-q4_K_M (~2 GB) — smallest usable chat model on CPU
nomic-embed-text (~275 MB) — embeddings for Open WebUI RAG

Skip mistral on CPU — it's too slow for interactive use without a GPU.

3. Set up OpenCode

npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json

Edit ~/.config/opencode/opencode.json and set:

"model": "ollama-local/llama3.2:3b-instruct-q4_K_M"

4. Open the UIs

Open WebUI → http://localhost:8080
Perplexica → http://localhost:3000
SearXNG → http://localhost:8888
Dockhand → http://localhost:3100
OpenCode → opencode in any project directory

WSL2 users: ports are forwarded to Windows automatically, so localhost works in your Windows browser.

Quick Start — x86 Linux (NVIDIA GPU)

1. Pull and start the stack

cd linux-nvidia
docker compose pull
./start.sh

2. Pull models

docker exec ollama ollama pull llama3.1:8b
docker exec ollama ollama pull nomic-embed-text

llama3.1:8b (~4.7 GB) — chat model for Open WebUI, Perplexica, and OpenCode
nomic-embed-text (~275 MB) — embeddings for Open WebUI RAG
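After the first chat request, ollama ps shows whether the model landed on the GPU. A sketch, assuming the PROCESSOR column reads like 100% GPU, 100% CPU, or a CPU/GPU split, as in recent Ollama releases; runs_on_gpu is a hypothetical helper and treats a partial split as GPU.

```shell
# Hypothetical helper: read `ollama ps` output on stdin and succeed if any
# loaded model (any row after the header) mentions GPU in its PROCESSOR column.
runs_on_gpu() {
  awk 'NR > 1 && /GPU/ { found = 1 } END { exit !found }'
}

# Usage: load the model with one prompt, then inspect where it runs.
# docker exec ollama ollama run llama3.1:8b "hello" >/dev/null
# docker exec ollama ollama ps | runs_on_gpu && echo "on GPU" || echo "running on CPU"
```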

3. Set up OpenCode

npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json

Edit ~/.config/opencode/opencode.json and set the default model:

"model": "ollama-local/llama3.1:8b"

4. Open the UIs

Open WebUI → http://localhost:8080
Perplexica → http://localhost:3000
SearXNG → http://localhost:8888
Dockhand → http://localhost:3100
OpenCode → opencode in any project directory

Using the services

Open WebUI — Select a model from the dropdown and start chatting. Enable web search via the search icon in the chat bar. No login required (auth disabled for demo).

Perplexica — AI-powered search that cites sources. Uses SearXNG under the hood, so no API keys are needed. On first launch, go to Settings and set the chat model before running any searches: mistral on Apple Silicon or Intel Mac, llama3.1:8b on Linux or WSL2 with an NVIDIA GPU, or llama3.2:3b-instruct-q4_K_M on CPU-only systems.

Dockhand — Browse and manage all running containers. On first load, click the "No environments" dropdown in the top bar and select Local Docker to activate the pre-configured environment.

OpenCode — Terminal AI coding assistant. Navigate to any project directory and run opencode. Uses mistral by default, or the model you set in opencode.json (llama3.1:8b on Linux and WSL2, llama3.2:3b-instruct-q4_K_M on CPU-only). Best for single-file edits and explanations; small local models are hit-or-miss on multi-step agentic tasks.

Stopping the stack

# Run from the platform directory where you started the stack
docker compose down

To also delete all data volumes (full reset):

docker compose down -v

Tips

Start pulls early: Kick off docker compose pull and your ollama pull commands at the beginning of the talk so everything is ready for the live demo at the end.

Slow first response: The first request after startup loads the model into memory (~30 seconds). Subsequent requests are much faster.
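You can pay the load cost before the demo instead of during it. Ollama's /api/generate endpoint loads a model into memory when the request has no prompt, and keep_alive sets how long it stays loaded. warmup_payload and warmup are hypothetical helpers; the 30m default is an arbitrary choice for a talk-length session.

```shell
# Hypothetical helper: build the JSON body that asks Ollama to load MODEL
# and keep it in memory for KEEP_ALIVE (default 30m).
warmup_payload() {
  printf '{"model": "%s", "keep_alive": "%s"}' "$1" "${2:-30m}"
}

# warmup MODEL [KEEP_ALIVE]: send a prompt-less generate request to load the model.
warmup() {
  curl -fsS http://localhost:11434/api/generate -d "$(warmup_payload "$1" "$2")" >/dev/null
}

# Usage: warmup llama3.2
```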

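Out of memory (Intel Mac, Linux, WSL2, CPU-only): If containers crash, switch to a smaller model; llama3.2:3b-instruct-q4_K_M is the most conservative option. On Intel Mac, you can also raise the Docker Desktop memory allocation (Settings → Resources).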

Ollama not reachable from containers (M-series): Make sure you ran launchctl setenv OLLAMA_HOST "0.0.0.0" and restarted the Ollama app. Running curl http://localhost:11434 from your terminal should print "Ollama is running"; that confirms the app is up, though not that containers can reach it.
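To test the path the containers actually use, run curl from inside a throwaway container against host.docker.internal, Docker Desktop's hostname for the host machine. This sketch assumes the mac-arm64 compose setup points the UIs at host.docker.internal (check the compose file if in doubt) and pulls the public curlimages/curl image; both helper names are hypothetical.

```shell
# Hypothetical helper: query Ollama's root endpoint from inside a container.
# curlimages/curl uses curl as its entrypoint, so the arguments go straight to curl.
ollama_from_container() {
  docker run --rm curlimages/curl -fsS --max-time 5 http://host.docker.internal:11434
}

# is_ollama_banner TEXT: succeeds if TEXT is Ollama's root-endpoint reply.
is_ollama_banner() {
  [ "$1" = "Ollama is running" ]
}

# Usage:
# is_ollama_banner "$(ollama_from_container)" && echo reachable || echo "not reachable from containers"
```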

Taking it further — NixOS

The nixOS/ folder contains illustrative NixOS configurations for four common self-hosting use cases. These aren't meant to be run directly — they're reference material for the talk showing how the same services you're running in Docker Compose can be expressed declaratively in NixOS.

Host	Role
`hosts/ai-server`	GPU-accelerated Ollama + Open WebUI
`hosts/media-server`	Jellyfin + Immich photo management
`hosts/home-gateway`	AdGuard DNS blocking + Caddy reverse proxy
`hosts/cloud-vps`	Public blog, SSO, and Tailscale mesh VPN

A shared modules/common.nix applies SSH hardening, user config, and firewall defaults to every host automatically — no repetition.