How to Install and Use Ollama on Ubuntu: Run LLMs Locally

Running a large language model does not require sending every prompt to a hosted API. Ollama packages models so you can run them on your Ubuntu machine, use them offline after downloading, and access them from terminal commands or local applications.

When you select a local model, inference and prompt processing stay on your machine. Ollama also supports optional cloud models, which are separate from the local workflow covered here. This guide explains how to install Ollama on Ubuntu, run your first local model, manage downloads, check GPU acceleration, and use the built-in API.

Quick Reference

Command                    Description
ollama run MODEL           Download a model if needed and start a chat
ollama pull MODEL          Download a model without running it
ollama ls                  List downloaded models
ollama ps                  Show models currently loaded in memory
ollama show MODEL          Display model details
ollama stop MODEL          Unload a running model
ollama rm MODEL            Remove a downloaded model
systemctl status ollama    Check the Ollama service
journalctl -e -u ollama    View recent service logs

Step 1: Install Ollama

The official installation script detects the system architecture, installs the Ollama binary, and configures a systemd service. Install curl and zstd first so Ubuntu can download and extract the current package:

Terminal
sudo apt update
sudo apt install curl zstd

You can then run the official installer:

Terminal
curl -fsSL https://ollama.com/install.sh | sh

The script adds the ollama command to your PATH, creates a dedicated ollama system user, and enables the background service when systemd is available.

If you prefer to inspect remote scripts before running them, download this one first:

Terminal
curl -fsSL https://ollama.com/install.sh -o ollama-install.sh
less ollama-install.sh
sh ollama-install.sh

Confirm that the command is available:

Terminal
ollama --version

The command prints the installed Ollama version. The number changes as new releases become available.

Step 2: Verify the Ollama Service

Ollama listens on 127.0.0.1:11434 by default. Check that its systemd service is running:

Terminal
systemctl status ollama --no-pager

The relevant part of the output should look similar to this:

output
● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
     Active: active (running)

The enabled state means the service starts during boot. active (running) confirms that the local server is ready to accept commands and API requests.
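The same check can be scripted, for example in a provisioning step. The sketch below assumes the default address from this guide and that curl is installed; the helper name ollama_ready is hypothetical and only illustrates the idea of probing the HTTP port after asking systemd for the unit state.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if something answers HTTP at the given base URL.
# Assumes curl is installed (Step 1) and the default address 127.0.0.1:11434.
ollama_ready() {
    base_url="${1:-http://127.0.0.1:11434}"
    # -f: fail on HTTP errors, -sS: quiet but keep errors, --max-time: do not hang
    curl -fsS --max-time 5 "$base_url/" >/dev/null 2>&1
}

# systemctl is-active prints the unit state, "active" for a running unit.
if command -v systemctl >/dev/null 2>&1; then
    echo "service state: $(systemctl is-active ollama 2>/dev/null)"
fi

if ollama_ready; then
    echo "Ollama API is reachable"
else
    echo "Ollama API is not reachable on 127.0.0.1:11434"
fi
```

A failed probe while systemd reports the unit as active usually points at the bind address or a firewall rule rather than at the service itself.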

Step 3: Run Your First Model

The ollama run command downloads a model the first time you use it, then opens an interactive chat. Llama 3.2 is a practical first model because its default 3B variant is about 2 GB:

Terminal
ollama run llama3.2

After the download finishes, enter a prompt at the >>> marker:

output
>>> Explain what a reverse proxy does in one sentence.
A reverse proxy receives client requests and forwards them to one or more
backend servers, often handling TLS, caching, or load balancing.

>>> /bye

Type /bye or press Ctrl+D to leave the chat. Enter /? inside the session to see the available interactive commands.
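For scripts, ollama run also accepts the prompt as an argument and exits after printing the reply, which is the form used later in Step 6. The wrapper below is a minimal sketch; the function name ask_model is hypothetical, and the call is guarded so nothing happens when the ollama command is missing.

```shell
#!/bin/sh
# Hypothetical wrapper: send one prompt to a local model and print the reply.
ask_model() {
    model="$1"
    prompt="$2"
    ollama run "$model" "$prompt"
}

# Only call the model when the ollama command is installed.
if command -v ollama >/dev/null 2>&1; then
    ask_model llama3.2 "Explain what a reverse proxy does in one sentence."
fi
```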

Step 4: Manage Models

The Ollama model library lists available models, variants, file sizes, and context limits. Download another model without opening a chat by using ollama pull:

Terminal
ollama pull mistral

List the models stored on the machine:

Terminal
ollama ls

output
NAME               ID              SIZE      MODIFIED
llama3.2:latest    a80c4f17acd5    2.0 GB    5 minutes ago
mistral:latest     f974a74358d6    4.1 GB    1 minute ago
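To see how much disk space the downloads use in total, the listing can be summed with awk. This is a rough sketch that assumes the column layout shown above and decimal units (1 GB = 1000 MB); the helper name total_model_size is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ollama ls` output on stdin and print the total size in GB.
# Assumes the layout shown above: field 3 is the size value, field 4 its unit.
# Decimal units (1 GB = 1000 MB) are an assumption of this sketch.
total_model_size() {
    awk 'NR > 1 {
        size = $3
        if ($4 == "MB") size = size / 1000
        else if ($4 == "TB") size = size * 1000
        total += size
    }
    END { printf "%.1f GB\n", total }'
}

# Only query Ollama when the command is installed.
if command -v ollama >/dev/null 2>&1; then
    ollama ls | total_model_size
fi
```

For the two models in the sample listing, the function prints 6.1 GB.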

Use ollama show to inspect a model and ollama ps to see which models are loaded in memory:

Terminal
ollama show llama3.2
ollama ps

Ollama keeps a recently used model loaded for five minutes by default. Unload it immediately with ollama stop:

Terminal
ollama stop llama3.2
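The five-minute default can also be overridden per request through the keep_alive field of the API, where 0 unloads the model right away and -1 keeps it loaded. The sketch below assumes the default local address; the helper name set_keep_alive is hypothetical and accepts only numeric values in seconds.

```shell
#!/bin/sh
# Hypothetical helper: send a request with no prompt so that only keep_alive takes effect.
# keep_alive is numeric seconds in this sketch: 0 unloads now, -1 keeps the model loaded.
set_keep_alive() {
    model="$1"
    keep_alive="$2"
    curl -s http://localhost:11434/api/generate \
        -d "{\"model\": \"$model\", \"keep_alive\": $keep_alive}"
}

# Example: unload llama3.2 immediately, similar in effect to `ollama stop llama3.2`.
# set_keep_alive llama3.2 0
```

To change the default for every request, the server reads the OLLAMA_KEEP_ALIVE environment variable, which can be set in the service environment.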

Remove an unused download to recover disk space:

Terminal
ollama rm mistral

Models installed by the system service are stored under /usr/share/ollama/.ollama/models. When you pick a model, match its size to the memory you have. A small 1B to 3B model runs acceptably on a CPU, while larger models need more RAM or GPU memory and generate text much faster when a supported GPU handles them.
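A quick way to sanity-check that sizing advice is to compare a model's download size with the memory currently available. The sketch below is only a rough estimate: the helper name fits_in_memory and the 2x headroom factor are illustrative assumptions, chosen because a loaded model needs more memory than its file size (the 2.0 GB llama3.2 download appears as 3.5 GB in the ollama ps output later in this guide).

```shell
#!/bin/sh
# Hypothetical helper: does a model of MODEL_GB fit in AVAIL_GB with headroom?
# The 2x headroom factor is an illustrative assumption, not an Ollama rule.
fits_in_memory() {
    awk -v model="$1" -v avail="$2" 'BEGIN { exit !(model * 2 <= avail) }'
}

# MemAvailable in /proc/meminfo is in kB; dividing by 1048576 gives GiB,
# which is close enough to GB for a rough check.
avail_gb=$(awk '/^MemAvailable:/ { printf "%.1f", $2 / 1048576 }' /proc/meminfo 2>/dev/null)
avail_gb="${avail_gb:-0}"

if fits_in_memory 2.0 "$avail_gb"; then
    echo "A 2.0 GB model should fit in ${avail_gb} GB of available RAM"
else
    echo "A 2.0 GB model may not fit in ${avail_gb} GB of available RAM"
fi
```

The check looks at system RAM only. On a machine with a GPU, the card's memory is the figure that matters for full GPU inference.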

Step 5: Use the Ollama API

The local API is available at http://localhost:11434/api. Send a prompt to the generate endpoint with curl, setting stream to false so the server returns one completed JSON response:

Terminal
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Name three Linux distributions.",
  "stream": false
}'

The JSON response contains the generated text in the response field, along with timing and token-count information. Because the request stays on your machine, local models need no API key or authentication.
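In scripts it is usually more convenient to print only the generated text. The sketch below assumes jq is installed (sudo apt install jq) and uses it both to build the request body, so that quotes or newlines in the prompt are escaped correctly, and to extract the response field. The function name generate is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: send one prompt to /api/generate and print only the generated text.
# Assumes jq is installed; jq -n --arg builds valid JSON from arbitrary strings.
generate() {
    model="$1"
    prompt="$2"
    payload=$(jq -n --arg model "$model" --arg prompt "$prompt" \
        '{model: $model, prompt: $prompt, stream: false}')
    curl -s http://localhost:11434/api/generate -d "$payload" | jq -r '.response'
}

# Example call, guarded so the script does nothing when the tools are missing.
if command -v jq >/dev/null 2>&1 && command -v ollama >/dev/null 2>&1; then
    generate llama3.2 "Name three Linux distributions."
fi
```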

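The same local service also exposes chat, embeddings, model-management, and OpenAI-compatible endpoints, so many applications written for those APIs can use Ollama by pointing at the local base URL. Browser front ends served from a different origin may also need that origin allowed through the OLLAMA_ORIGINS environment variable.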
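As an illustration of the OpenAI-compatible side, the sketch below posts a chat request to /v1/chat/completions and prints the reply text, which sits at .choices[0].message.content in the OpenAI response shape. It assumes jq is installed and the default local address; the function name openai_chat is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: call the OpenAI-compatible chat endpoint on the local server.
# The request follows the OpenAI chat format: a model name and a list of messages.
openai_chat() {
    model="$1"
    prompt="$2"
    payload=$(jq -n --arg model "$model" --arg prompt "$prompt" \
        '{model: $model, messages: [{role: "user", content: $prompt}]}')
    curl -s http://localhost:11434/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d "$payload" | jq -r '.choices[0].message.content'
}

# Example call, guarded so the script does nothing when the tools are missing.
if command -v jq >/dev/null 2>&1 && command -v ollama >/dev/null 2>&1; then
    openai_chat llama3.2 "Name three Linux distributions."
fi
```

Client libraries written for the OpenAI API can usually be pointed at the same server by setting their base URL to http://localhost:11434/v1.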

Step 6: Check GPU Acceleration

If no supported GPU is present, Ollama falls back to the CPU and still works, just more slowly. When the Linux installer detects supported NVIDIA or AMD hardware, it pulls in the matching GPU libraries for you. You still need a working GPU driver on the host, since Ollama relies on it to reach the card.

For NVIDIA hardware, verify the installed driver:

Terminal
nvidia-smi

Ollama does not require the full CUDA Toolkit for inference. It supports NVIDIA GPUs with compute capability 5.0 or newer, subject to the driver requirements listed in the official hardware documentation.

For supported AMD hardware on Linux, Ollama uses ROCm and requires a compatible ROCm 7 driver. The installer downloads the additional ROCm package when it detects an AMD GPU.

Run a model, then check which processor it uses:

Terminal
ollama run llama3.2 "Respond with the word ready."
ollama ps

output
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    3.5 GB    100% GPU     4 minutes from now

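A PROCESSOR value of 100% GPU confirms that the model is loaded entirely on the GPU. A split value such as 48%/52% CPU/GPU means the model did not fit in GPU memory and runs partly on the CPU. If it shows 100% CPU, inspect the service logs and verify that the GPU driver can see the card: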

Terminal
journalctl -e -u ollama
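The PROCESSOR check can be automated by parsing the ollama ps output. The sketch below assumes the column layout shown above, where the PROCESSOR value occupies whitespace-separated fields 5 and 6; the helper name check_processor is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ollama ps` output on stdin and flag models not fully on the GPU.
# Assumes the layout shown above: field 5 is the percentage, field 6 the processor.
check_processor() {
    awk 'NR > 1 {
        status = ($5 == "100%" && $6 == "GPU") ? "ok" : "check"
        printf "%s %s %s %s\n", $1, $5, $6, status
    }'
}

# Only query Ollama when the command is installed.
if command -v ollama >/dev/null 2>&1; then
    ollama ps | check_processor
fi
```

Each loaded model produces one line ending in ok when it runs fully on the GPU, or check when it is worth looking at the logs.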

Step 7: Expose Ollama on the Network

Ollama binds to the loopback address by default, which prevents other devices from reaching it. To use a separate GPU server from a trusted workstation, create a systemd override:

Terminal
sudo systemctl edit ollama

Add the following setting:

ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

Reload systemd and restart Ollama:

Terminal
sudo systemctl daemon-reload
sudo systemctl restart ollama

Warning
The local Ollama API does not require authentication. Binding it to 0.0.0.0 exposes model execution to every client that can reach port 11434. Allow only trusted source addresses with a firewall, and do not publish the port directly to the internet.
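For unattended setups, the same override can be written without opening an editor. systemctl edit stores its result as override.conf in /etc/systemd/system/ollama.service.d/, so a script can create that file directly. The sketch below takes the directory as a parameter so it can be tried against a scratch directory first; the function name write_ollama_override is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write the override shown above into a drop-in directory.
# The real location is /etc/systemd/system/ollama.service.d/ (requires root).
write_ollama_override() {
    dropin_dir="$1"
    bind_addr="${2:-0.0.0.0:11434}"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nEnvironment="OLLAMA_HOST=%s"\n' "$bind_addr" \
        > "$dropin_dir/override.conf"
}

# Dry run against a temporary directory; nothing under /etc is touched.
scratch=$(mktemp -d)
write_ollama_override "$scratch"
cat "$scratch/override.conf"
rm -rf "$scratch"
```

When the function is pointed at the real directory as root, the daemon-reload and restart commands above are still required for the change to take effect, and the firewall advice in the warning applies unchanged.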

For example, this UFW rule permits a private subnet. Other clients stay blocked only if UFW is enabled and its default incoming policy is deny:

Terminal
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp

Replace the subnet with the address or network that should have access. See the Ubuntu UFW guide for rule management and verification.
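On the workstation side, the ollama command line client reads the same OLLAMA_HOST variable to decide which server to contact. The wrapper below is a minimal sketch; the function name remote_ollama is hypothetical and 192.168.1.50 is a placeholder address, not one taken from this guide.

```shell
#!/bin/sh
# Hypothetical wrapper: run an ollama CLI command against a remote server.
# OLLAMA_HOST is set only for the single command, so local use is unaffected.
remote_ollama() {
    server="$1"
    shift
    OLLAMA_HOST="$server" ollama "$@"
}

# Example: list the models stored on the GPU server (placeholder address).
# remote_ollama 192.168.1.50:11434 ls
```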

Update Ollama

When a new release lands, you do not need a special upgrade command. Run the same installation script again and it updates Ollama in place:

Terminal
curl -fsSL https://ollama.com/install.sh | sh

The installer replaces the program files and restarts the service without deleting downloaded models.

Troubleshooting

Ollama cannot connect to the local service
Start the service with sudo systemctl start ollama, then inspect its status with systemctl status ollama --no-pager.

The service starts but model commands fail
Read the recent logs with journalctl -e -u ollama. For live output while reproducing the problem, use journalctl -u ollama --no-pager --follow --pager-end.

A model runs out of memory
Unload other loaded models with ollama stop MODEL, then choose a smaller model or a lower-precision (more heavily quantized) variant from the model library.

Ollama uses the CPU instead of an NVIDIA GPU
Run nvidia-smi and update the NVIDIA driver if it cannot see the card. After a Linux suspend and resume cycle, reloading the nvidia_uvm kernel module (sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm) or rebooting may restore GPU detection.

Ollama does not detect an AMD GPU
Confirm that the host uses a supported GPU and ROCm 7 driver. Check /dev/kfd permissions and the Ollama logs for ROCm or GPU-discovery errors.

Conclusion

Ollama provides a direct way to run local language models on Ubuntu through the terminal or an HTTP API. Start with a small model, confirm its memory and processor use with ollama ps, and keep port 11434 restricted to trusted systems.

About the authors

Dejan Panovski

Dejan Panovski is the founder of Linuxize, an RHCSA-certified Linux system administrator and DevOps engineer based in Skopje, Macedonia. Author of 800+ Linux tutorials with 20+ years of experience turning complex Linux tasks into clear, reliable guides.