
Self-Hosting AI on Your Laptop

Demo stack for the DevOpsDays Austin self-hosting talk. This brings up a local AI stack in Docker with no cloud dependencies. Welcome to freedom...

What's included

Service      URL                       Description
Open WebUI   http://localhost:8080     ChatGPT-like interface for local models
Perplexica   http://localhost:3000     AI-powered web search (like Perplexity)
SearXNG      http://localhost:8888     Private meta-search engine
Dockhand     http://localhost:3100     Docker container management UI
OpenCode     Terminal                  AI coding assistant backed by local models
Ollama API   http://localhost:11434    LLM inference backend (native on M-series, containerized on Intel/Linux)
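
Once the stack is running, a quick way to confirm every HTTP service is answering is to loop over the ports above with curl (nothing assumed here beyond the URLs in the table):

for url in http://localhost:8080 http://localhost:3000 http://localhost:8888 http://localhost:3100 http://localhost:11434; do
  # -s silences progress output, -o /dev/null drops the body, -w prints only the HTTP status code
  echo "$url -> $(curl -s -o /dev/null -w '%{http_code}' "$url")"
done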

Prerequisites

All platforms:

  • Docker Desktop (or Docker Engine on Linux)
  • Node.js 18+ (for OpenCode)
  • 20+ GB free disk space for models

Apple Silicon Mac only:

  • Ollama installed natively via the Mac app (Docker containers on macOS can't use the Apple GPU, so inference runs on the host)

Linux only:

  • NVIDIA driver and the NVIDIA Container Toolkit, so the containerized Ollama can use the GPU


Quick Start — Apple Silicon (M1/M2/M3/M4/M5)

1. Install and configure Ollama

Download and install Ollama for Mac, then configure it to accept connections from Docker containers:

# Allow Ollama to listen on all interfaces (not just localhost)
launchctl setenv OLLAMA_HOST "0.0.0.0"

Restart the Ollama app from the menu bar after running that command. Verify it's working:

ollama list
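
To double-check that containers will be able to reach Ollama, hit the API from a throwaway container. On Docker Desktop the host is reachable as host.docker.internal (a quick sketch, assuming that alias is available):

# should return the JSON list of models you've pulled
docker run --rm curlimages/curl -s http://host.docker.internal:11434/api/tags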

2. Pull models

ollama pull llama3.2
ollama pull mistral
ollama pull nomic-embed-text

  • llama3.2 (~2 GB) — chat model for Open WebUI
  • mistral (~4.1 GB) — chat model for Perplexica and OpenCode
  • nomic-embed-text (~275 MB) — embeddings for Open WebUI RAG

Start these early in the talk so they're ready for the live demo at the end.
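
If you'd rather kick all three off with a single command, a simple loop works (sequential, so the downloads don't compete for bandwidth):

for model in llama3.2 mistral nomic-embed-text; do
  ollama pull "$model"
done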

3. Pull and start the stack

cd mac-arm64
docker compose pull
docker compose up -d
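
Once it's up, a quick status check confirms every container started and stayed running:

docker compose ps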

4. Set up OpenCode

npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json

5. Open the UIs

  • Open WebUI: http://localhost:8080
  • Perplexica: http://localhost:3000
  • Dockhand: http://localhost:3100


Quick Start — Intel Mac

1. Tune Docker Desktop memory

Open Docker Desktop → Settings → Resources:

  • Memory: 10 GB+ (8 GB minimum)
  • CPUs: 4+
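
After applying the settings, you can confirm what the Docker VM actually got (NCPU and MemTotal are standard docker info fields):

# prints the CPUs and memory (in bytes) allocated to the Docker Desktop VM
docker info --format '{{.NCPU}} CPUs, {{.MemTotal}} bytes of RAM'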

2. Pull and start the stack

cd mac-intel
docker compose pull
docker compose up -d

3. Pull models

docker exec ollama ollama pull llama3.2:3b-instruct-q4_K_M
docker exec ollama ollama pull mistral
docker exec ollama ollama pull nomic-embed-text

  • llama3.2:3b-instruct-q4_K_M (~2 GB) — chat model for Open WebUI
  • mistral (~4.1 GB) — chat model for Perplexica and OpenCode
  • nomic-embed-text (~275 MB) — embeddings for Open WebUI RAG
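
You can confirm the downloads landed by listing the models inside the container:

docker exec ollama ollama list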

4. Set up OpenCode

npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json

5. Open the UIs

  • Open WebUI: http://localhost:8080
  • Perplexica: http://localhost:3000
  • Dockhand: http://localhost:3100


Quick Start — x86 Linux (NVIDIA GPU)

1. Pull and start the stack

cd linux-nvidia
docker compose pull
docker compose up -d
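
Before pulling models, it's worth confirming the container can actually see the GPU. With the NVIDIA Container Toolkit in place, nvidia-smi is available inside the container:

docker exec ollama nvidia-smi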

2. Pull models

docker exec ollama ollama pull llama3.1:8b
docker exec ollama ollama pull nomic-embed-text

  • llama3.1:8b (~4.7 GB) — chat model for Open WebUI, Perplexica, and OpenCode
  • nomic-embed-text (~275 MB) — embeddings for Open WebUI RAG
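
After your first chat request, you can check that the model is running on the GPU rather than falling back to CPU (ollama ps shows loaded models and where they're running):

docker exec ollama ollama ps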

3. Set up OpenCode

npm install -g opencode-ai
mkdir -p ~/.config/opencode
cp ../opencode/opencode.json ~/.config/opencode/opencode.json

Linux users: edit ~/.config/opencode/opencode.json and change the default model:

"model": "ollama-local/llama3.1:8b"

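If you prefer to make that change non-interactively, a jq one-liner works (a sketch, assuming jq is installed; it changes only the model key and leaves the rest of the file alone):

jq '.model = "ollama-local/llama3.1:8b"' ~/.config/opencode/opencode.json > /tmp/opencode.json \
  && mv /tmp/opencode.json ~/.config/opencode/opencode.json
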
4. Open the UIs

  • Open WebUI: http://localhost:8080
  • Perplexica: http://localhost:3000
  • Dockhand: http://localhost:3100


Using the services

Open WebUI — Select a model from the dropdown and start chatting. Enable web search via the search icon in the chat bar. No login required (auth disabled for demo).

Perplexica — AI-powered search that cites sources. Uses SearXNG under the hood — no API keys needed. On first launch, go to Settings and set the chat model to mistral (M-series / Intel Mac) or llama3.1:8b (Linux) before running any searches.

Dockhand — Browse and manage all running containers. On first load, click the "No environments" dropdown in the top bar and select Local Docker to activate the pre-configured environment.

OpenCode — Terminal AI coding assistant. Navigate to any project directory and run opencode. Uses mistral by default (or llama3.1:8b on Linux). Best for single-file edits and explanations — 7B models are hit-or-miss on multi-step agentic tasks.

Stopping the stack

# From inside the platform directory you started the stack in
docker compose down

To also delete all data volumes (full reset):

docker compose down -v
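
Note that on Apple Silicon the models live in the native Ollama install, so -v won't reclaim that space. If you want it back, remove the models with the ollama CLI:

ollama rm llama3.2
ollama rm mistral
ollama rm nomic-embed-text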

Tips

Start pulls early: Kick off docker compose pull and your ollama pull commands at the beginning of the talk so everything is ready for the live demo at the end.

Slow first response: The first request after startup loads the model into memory (~30 seconds). Subsequent requests are much faster.
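
If you want to hide that pause during the demo, you can pre-load a model ahead of time; sending a generate request with no prompt makes Ollama load the model into memory without producing any output:

curl http://localhost:11434/api/generate -d '{"model": "mistral"}'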

Out of memory (Intel Mac / Linux): If containers crash, use a smaller model. llama3.2:3b-instruct-q4_K_M is the most conservative option.
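
A quick way to see which container is actually eating the memory before you swap models:

docker stats --no-stream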

Ollama not reachable from containers (M-series): Make sure you ran launchctl setenv OLLAMA_HOST "0.0.0.0" and restarted the Ollama app. You can verify with curl http://localhost:11434 from your terminal.


Taking it further — NixOS

The nixOS/ folder contains illustrative NixOS configurations for four common self-hosting use cases. These aren't meant to be run directly — they're reference material for the talk showing how the same services you're running in Docker Compose can be expressed declaratively in NixOS.

Host                 Role
hosts/ai-server      GPU-accelerated Ollama + Open WebUI
hosts/media-server   Jellyfin + Immich photo management
hosts/home-gateway   AdGuard DNS blocking + Caddy reverse proxy
hosts/cloud-vps      Public blog, SSO, and Tailscale mesh VPN

A shared modules/common.nix applies SSH hardening, user config, and firewall defaults to every host automatically — no repetition.