These tutorials are written by Freedom Lab members in their free time. If you find them helpful, please consider supporting our work.

Self-Hosting AI Models

Tags: Self-hosting, Privacy, AI, Linux

Run your own AI inference server by renting GPU hardware and deploying open-source models with a web interface.

This guide covers renting a GPU server on Vast.ai, deploying models with Ollama, and setting up Open WebUI with TLS.

Overview

  1. Rent a server with GPUs on Vast.ai
  2. Download AI models with Ollama
  3. Set up a web interface for interacting with the models
  4. Add TLS so the interface can be shared and accessed securely

Key Concepts

Inference

The process of using live data with a trained AI model to make predictions or solve tasks. This occurs after training, when the model is used for chatting or querying.

GGUF

GPT-Generated Unified Format (GGUF) is a specialized file format designed for optimized storage and fast loading of large language models (LLMs). It converts the original model file into a more efficient format, improving performance and usability during inference tasks.

Inference Framework

A software system optimized to execute trained AI models for real-world tasks: loading model weights, managing memory, and serving requests on the available hardware.

Unlike training frameworks, it focuses purely on running models (not building them), prioritizing speed, compatibility, and ease of use.

Tools Used

  • Ollama: Lightweight inference framework for downloading and running AI models
  • Open WebUI: ChatGPT-like web interface that integrates with Ollama's API
  • Caddy: Lightweight reverse proxy that handles TLS
  • Cloudflare: Domain registration and DNS API for TLS certificate verification

Step 1 - Rent a Server

  1. Go to Vast.ai
  2. Fund the account
  3. Select the template NVIDIA CUDA (Ubuntu)
  4. Modify the template:
    • Open port 3000
    • Select launch mode Interactive shell server, SSH
    • Check Use direct SSH connection
    • Set at least 200 GB of disk space
  5. Save the template
  6. Go to Search:
    • Select the template
    • In Machine Options check Secure Cloud and Static IP Address
    • Select a server with your desired GPU configuration

Step 2 - SSH into the Server

Go to Instances and click on the server IP. Note the IP and port mapping:

Public IP Address: <SERVER_IP>

Open Ports:
<SERVER_IP>:<SSH_PORT> -> 22/tcp
<SERVER_IP>:<WEB_PORT> -> 3000/tcp

The ports are automatically proxied by Vast.ai.

Create an SSH keypair:

ssh-keygen -t ed25519

Back on your instance, click the Key icon and add your SSH public key.

Connect to the server:

ssh -i ~/.ssh/id_ed25519 root@<SERVER_IP> -p <SSH_PORT>
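
Optionally, add a host entry on your local machine so you don't have to retype the IP and port each time (the alias vast below is just an example name):

# ~/.ssh/config on your local machine
Host vast
    HostName <SERVER_IP>
    Port <SSH_PORT>
    User root
    IdentityFile ~/.ssh/id_ed25519

After this, ssh vast is enough.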

Disable the preset tmux (optional):

touch ~/.no_auto_tmux

Step 3 - Setup Ollama

Download and install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

Verify the installation:

ollama --version

Since we're inside a container without systemd, the install script can't start Ollama as a background service. Use tmux to keep it running:

tmux new-session -s 'ollama'
ollama serve

Press Ctrl+b d to detach. Ollama is now running and exposes its API on port 11434.

Download the models you want:

ollama pull deepseek-r1:7b

Browse available models at ollama.com/search
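
To confirm everything is wired up, you can exercise the API directly (this assumes deepseek-r1:7b is one of the models you pulled):

# list the models Ollama knows about
curl http://localhost:11434/api/tags

# run a one-off prompt without the web interface
curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:7b", "prompt": "Hello", "stream": false}'

For an interactive chat in the terminal, ollama run deepseek-r1:7b works as well.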

Step 4 - Setup Open WebUI

Install uv (Python package manager):

curl -LsSf https://astral.sh/uv/install.sh | sh

Create the data directory:

mkdir ~/.open-webui

Open a tmux session and start Open WebUI on port 3001:

tmux new-session -s 'webui'
DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve --port 3001

We use port 3001 because Caddy will handle TLS on port 3000 and proxy to 3001.

Press Ctrl+b d to detach.
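
To check that Open WebUI is answering before putting a proxy in front of it, you can probe it locally; any HTTP response means the service is up:

curl -I http://localhost:3001

If there's no response, reattach to the session with tmux attach -t webui and read the logs.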

Step 5 - Setup Cloudflare

  1. Go to Cloudflare and register a domain (or use an existing one)
  2. Create a DNS A record pointing the domain at <SERVER_IP>, set to DNS only (grey cloud) so requests reach the server's ports directly
  3. Go to My Profile > API Tokens
  4. Create a Token:
    • Select Create custom token
    • Give the token a name
    • Set permissions: Zone : DNS : Edit
  5. Save the API token

This token allows Caddy to verify domain ownership via DNS challenge (since ports 80/443 may not be available).
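
You can confirm the token is valid before handing it to Caddy by calling Cloudflare's token verification endpoint:

curl -H "Authorization: Bearer YOUR_API_TOKEN" https://api.cloudflare.com/client/v4/user/tokens/verify

The response should report "status": "active".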

Step 6 - Setup Caddy

Download Caddy with the Cloudflare DNS plugin:

  1. Go to caddyserver.com/download
  2. Select Linux AMD64
  3. Add the caddy-dns/cloudflare plugin
  4. Copy the download link

On the server:

wget "<DOWNLOAD_URL>" -O /usr/bin/caddy
chmod +x /usr/bin/caddy

Verify the installation:

caddy version

Create the configuration directory:

mkdir -p /etc/caddy

Create /etc/caddy/Caddyfile with the following content:

example.com:3000 {
    tls {
        dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    }
    reverse_proxy localhost:3001
}

Replace example.com with your domain.
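
Before starting Caddy for real, you can ask it to check the configuration (the token is passed here too because the Caddyfile references it):

CLOUDFLARE_API_TOKEN="YOUR_API_TOKEN" caddy validate --config /etc/caddy/Caddyfile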

Start Caddy in a tmux session:

tmux new-session -s 'caddy'
CLOUDFLARE_API_TOKEN="YOUR_API_TOKEN" caddy run --config /etc/caddy/Caddyfile

Press Ctrl+b d to detach.
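
Once Caddy has obtained a certificate (the first run may take a minute), you can verify TLS from your local machine, substituting your domain and the <WEB_PORT> mapping from Step 2:

curl -vI https://example.com:<WEB_PORT>

The output should show a certificate issued for your domain.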

Step 7 - Access and Configure

Access Open WebUI at https://example.com:<WEB_PORT>/

The first user to register becomes the admin. After that, you can manage access from the Admin Panel, for example by disabling open sign-ups or requiring approval before new accounts become active.

Summary

Your self-hosted AI server is now running with:

  • Ollama serving open-source models through its API on port 11434
  • Open WebUI providing a ChatGPT-like interface on port 3001
  • Caddy terminating TLS on port 3000 and proxying to Open WebUI
  • HTTPS access at your own domain via the Vast.ai port mapping