Self-Hosting AI Models
Run your own AI inference server by renting GPU hardware and deploying open-source models with a web interface.
This guide covers renting a GPU server on Vast.ai, deploying models with Ollama, and setting up Open WebUI with TLS.
Overview
- Rent a server with GPUs on Vast.ai
- Download AI models with Ollama
- Set up a web interface for interacting with the models
- Add TLS so access can be shared securely
Key Concepts
Inference
The process of using live data with a trained AI model to make predictions or solve tasks. This occurs after training, when the model is used for chatting or querying.
GGUF
GPT-Generated Unified Format (GGUF) is a file format designed for compact storage and fast loading of large language models (LLMs). Models are converted from their original weights into GGUF so they can be loaded and run more efficiently during inference.
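As an illustration, converter scripts in the llama.cpp project produce GGUF files from Hugging Face checkpoints. The sketch below assumes a local llama.cpp checkout and an already-downloaded model directory (paths are placeholders, and the script name may differ between llama.cpp versions); Ollama pulls models that are already packaged as GGUF, so this step is not required for the rest of this guide.
# fetch llama.cpp, which ships the GGUF converter script and its Python requirements
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && pip install -r requirements.txt
# convert a Hugging Face model directory into a single GGUF file with 16-bit weights
python convert_hf_to_gguf.py /path/to/hf-model --outfile model.gguf --outtype f16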
Inference Framework
A software system optimized to execute trained AI models for real-world tasks. It handles:
- Loading models (e.g., GGUF or custom formats)
- Efficiently processing inputs/outputs
- Leveraging hardware (CPU/GPU) for speed
Unlike training frameworks, it focuses purely on running models (not building them), prioritizing speed, compatibility, and ease of use.
Tools Used
| Tool | Purpose |
|---|---|
| Ollama | Lightweight inference framework for downloading and running AI models |
| Open WebUI | ChatGPT-like web interface that integrates with Ollama's API |
| Caddy | Lightweight reverse proxy to handle TLS |
| Cloudflare | Domain registration and DNS API for TLS verification |
Step 1 - Rent a Server
- Go to Vast.ai
- Fund the account
- Select the template NVIDIA CUDA (Ubuntu)
- Modify the template:
  - Open port 3000
  - Select launch mode Interactive shell server, SSH
  - Check Use direct SSH connection
  - Set at least 200 GB of disk space
  - Save the template
- Go to Search:
  - Select the template
  - In Machine Options check Secure Cloud and Static IP Address
  - Select a server with your desired GPU configuration
Step 2 - SSH into the Server
Go to Instances and click on the server IP. Note the IP and port mapping:
Public IP Address: <SERVER_IP>
Open Ports:
<SERVER_IP>:<SSH_PORT> -> 22/tcp
<SERVER_IP>:<WEB_PORT> -> 3000/tcp
The ports are automatically proxied by Vast.ai.
Create an SSH keypair:
ssh-keygen -t ed25519
Back on your instance, click the Key icon and add your SSH public key.
Connect to the server:
ssh -i ~/.ssh/id_ed25519 root@<SERVER_IP> -p <SSH_PORT>
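Optionally, add a host entry to your local ~/.ssh/config so you don't have to retype the IP and port each time (the alias name below is just an example):
Host vast-gpu
    HostName <SERVER_IP>
    Port <SSH_PORT>
    User root
    IdentityFile ~/.ssh/id_ed25519
After that, ssh vast-gpu is enough to connect.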
Disable the preset tmux (optional):
touch ~/.no_auto_tmux
Step 3 - Setup Ollama
Download and install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
Verify the installation:
ollama --version
Since we're inside a container without systemd, the Ollama server doesn't start automatically; use tmux to keep it running:
tmux new-session -s 'ollama'
ollama serve
Press Ctrl+b d to detach. Ollama is now running and exposes its API on port 11434.
Download the models you want:
ollama pull deepseek-r1:7b
Browse available models at ollama.com/search
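You can verify the download and the running server from the same shell; the generate endpoint below is part of Ollama's local REST API:
# list locally downloaded models and their sizes
ollama list
# send a one-off, non-streaming prompt to the API
curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:7b", "prompt": "Hello", "stream": false}'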
Step 4 - Setup Open WebUI
Install uv (Python package manager):
curl -LsSf https://astral.sh/uv/install.sh | sh
Create the data directory:
mkdir ~/.open-webui
Open a tmux session and start Open WebUI on port 3001:
tmux new-session -s 'webui'
DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve --port 3001
We use port 3001 because Caddy will handle TLS on port 3000 and proxy to 3001.
Press Ctrl+b d to detach.
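Before adding TLS, you can confirm the web interface answers locally (plain HTTP at this point):
# should return an HTTP response header block from Open WebUI
curl -I http://localhost:3001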
Step 5 - Setup Cloudflare
- Go to Cloudflare and register a domain (or use an existing one)
- Go to My Profile > API Tokens
- Create a Token:
- Select Create custom token
- Give the token a name
- Set permissions: Zone : DNS : Edit
- Save the API token
This token allows Caddy to verify domain ownership via DNS challenge (since ports 80/443 may not be available).
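You can sanity-check the token before handing it to Caddy, using Cloudflare's token verification endpoint:
# a valid token reports "status": "active" in the JSON response
curl -s -H "Authorization: Bearer YOUR_API_TOKEN" https://api.cloudflare.com/client/v4/user/tokens/verify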
Step 6 - Setup Caddy
Download Caddy with the Cloudflare DNS plugin:
- Go to caddyserver.com/download
- Select Linux AMD64
- Add the caddy-dns/cloudflare plugin
- Copy the download link
On the server:
wget "<DOWNLOAD_URL>" -O /usr/bin/caddy
chmod +x /usr/bin/caddy
Verify the installation:
caddy version
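Also confirm the Cloudflare DNS module is compiled into the binary you downloaded, since the TLS setup below depends on it:
# should list dns.providers.cloudflare among the modules
caddy list-modules | grep cloudflare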
Create the configuration directory and file:
mkdir -p /etc/caddy
Create /etc/caddy/Caddyfile with the following content:
example.com:3000 {
    tls {
        dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    }
    reverse_proxy localhost:3001
}
Replace example.com with your domain.
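Optionally check the file for syntax errors before starting the server:
# loads and validates the Caddyfile without starting the proxy
CLOUDFLARE_API_TOKEN="YOUR_API_TOKEN" caddy validate --config /etc/caddy/Caddyfile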
Start Caddy in a tmux session:
tmux new-session -s 'caddy'
CLOUDFLARE_API_TOKEN="YOUR_API_TOKEN" caddy run --config /etc/caddy/Caddyfile
Press Ctrl+b d to detach.
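Once Caddy has obtained a certificate, you can test TLS from the server itself by pinning the domain name to the local listener (replace example.com with your domain):
# verifies the certificate and the reverse proxy without relying on public DNS
curl --resolve example.com:3000:127.0.0.1 -I https://example.com:3000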
Step 7 - Access and Configure
In Cloudflare, create a DNS A record pointing your domain at <SERVER_IP> (set it to DNS only; Cloudflare's proxy forwards only a fixed set of ports). Then access Open WebUI at https://example.com:<WEB_PORT>/
The first user to register becomes the admin. Configure user management:
- Create groups to manage permissions
- By default, AI models are private
- Grant access to specific groups or make models public
Summary
Your self-hosted AI server is now running with:
- Ollama serving models on port 11434 (internal)
- Open WebUI on port 3001 (internal)
- Caddy handling TLS on port 3000
- Models accessible via https://your-domain.com:<WEB_PORT>/
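Because each service runs in a detached tmux session, a quick health check after reconnecting over SSH is to list the sessions:
# expected sessions: ollama, webui, caddy
tmux ls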