How to Install and Use Ollama on Ubuntu: Run LLMs Locally

Running a large language model does not require sending every prompt to a hosted API. Ollama packages models so you can run them on your Ubuntu machine, use them offline after downloading, and access them from terminal commands or local applications.
When you select a local model, inference and prompt processing stay on your machine. Ollama also supports optional cloud models, which are separate from the local workflow covered here. This guide explains how to install Ollama on Ubuntu, run your first local model, manage downloads, check GPU acceleration, and use the built-in API.
Quick Reference
| Command | Description |
|---|---|
| ollama run MODEL | Download a model if needed and start a chat |
| ollama pull MODEL | Download a model without running it |
| ollama ls | List downloaded models |
| ollama ps | Show models currently loaded in memory |
| ollama show MODEL | Display model details |
| ollama stop MODEL | Unload a running model |
| ollama rm MODEL | Remove a downloaded model |
| systemctl status ollama | Check the Ollama service |
| journalctl -e -u ollama | View recent service logs |
Step 1: Install Ollama
The official installation script detects the system architecture, installs the Ollama binary, and configures a systemd service. Install curl and zstd first so Ubuntu can download and extract the current package:
sudo apt update
sudo apt install curl zstd
You can then run the official installer:
curl -fsSL https://ollama.com/install.sh | sh
The script adds the ollama command to your path, creates a dedicated ollama system user, and enables the background service when systemd is available.
If you prefer to inspect remote scripts before running them, download this one first:
curl -fsSL https://ollama.com/install.sh -o ollama-install.sh
less ollama-install.sh
sh ollama-install.sh
Confirm that the command is available:
ollama --version
The command prints the installed Ollama version. The number changes as new releases become available.
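A small helper can report which of the commands this step relies on are still missing. This is only a sketch: missing_tools is a hypothetical function name, and it checks nothing beyond whether each name resolves on the PATH.

```shell
# Print every command name from the arguments that is not found on the PATH.
# missing_tools is a hypothetical helper name, not part of Ollama.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Calling missing_tools curl zstd ollama prints nothing once all three are installed, and prints the name of each one that is not.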
Step 2: Verify the Ollama Service
Ollama listens on 127.0.0.1:11434 by default. Check that its systemd service is running:
systemctl status ollama --no-pager
The relevant part of the output should look similar to this:
● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
     Active: active (running)
The enabled state means the service starts during boot. active (running) confirms that the local server is ready to accept commands and API requests.
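In a script, it can help to wait until the server actually answers on its default address before sending commands. The following is a minimal sketch under that assumption; wait_for_ollama is a hypothetical helper name, and the one-second polling interval is arbitrary.

```shell
# Poll the default local address until the server answers or the tries run out.
# wait_for_ollama is a hypothetical helper; the address is the default from this step.
wait_for_ollama() {
    tries=${1:-10}
    while [ "$tries" -gt 0 ]; do
        # -f makes curl return a non-zero status on HTTP errors; -s hides progress.
        if curl -fs -o /dev/null http://127.0.0.1:11434/ 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

For example, wait_for_ollama 30 && ollama ls lists the models only after the service has come up, and gives up after roughly 30 seconds otherwise.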
Step 3: Run Your First Model
The ollama run command downloads a model the first time you use it, then opens an interactive chat. Llama 3.2 is a practical first model because its default 3B variant is about 2 GB:
ollama run llama3.2
After the download finishes, enter a prompt at the >>> marker:
>>> Explain what a reverse proxy does in one sentence.
A reverse proxy receives client requests and forwards them to one or more
backend servers, often handling TLS, caching, or load balancing.
>>> /bye
Type /bye or press Ctrl+D to leave the chat. Enter /? inside the session to see the available interactive commands.
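ollama run also accepts the prompt as an argument, in which case it prints the answer and exits instead of opening a chat. The sketch below uses that form to ask a question about a text file; ask_about_file is a hypothetical helper name, and it assumes the file is small enough to fit in the model's context.

```shell
# Ask one question about a text file and print the answer without opening a chat.
# ask_about_file is a hypothetical helper; the prompt is passed as an argument to ollama run.
ask_about_file() {
    model=$1
    file=$2
    question=$3
    # The prompt is the question, a blank line, then the file contents.
    ollama run "$model" "$question

$(cat "$file")"
}
```

A call such as ask_about_file llama3.2 notes.txt "Summarize this text:" would send the contents of a hypothetical notes.txt along with the question.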
Step 4: Manage Models
The Ollama model library lists available models, variants, file sizes, and context limits. Download another model without opening a chat by using ollama pull:
ollama pull mistral
List the models stored on the machine:
ollama ls
NAME               ID              SIZE      MODIFIED
llama3.2:latest    a80c4f17acd5    2.0 GB    5 minutes ago
mistral:latest     f974a74358d6    4.1 GB    1 minute ago
Use ollama show to inspect a model and ollama ps to see which models are loaded in memory:
ollama show llama3.2
ollama ps
Ollama keeps a recently used model loaded for five minutes by default. Unload it immediately with ollama stop:
ollama stop llama3.2
Remove an unused download to recover disk space:
ollama rm mistral
Models installed by the system service are stored under /usr/share/ollama/.ollama/models.
When you pick a model, match its size to the memory you have. A small 1B to 3B model runs comfortably on a CPU, while larger models want more RAM or GPU memory and feel much faster once a supported GPU handles them.
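When provisioning a machine, you may want to pull a model only if it is not already on disk. The sketch below does that by reading the NAME column of ollama ls; ensure_model is a hypothetical helper name, and it assumes the output has one header row followed by one model per line, as in the listing above.

```shell
# Pull a model only when `ollama ls` does not already list it.
# ensure_model is a hypothetical helper; it assumes NAME is the first column.
ensure_model() {
    model=$1
    # A bare name such as "mistral" is listed with the ":latest" tag.
    case $model in
        *:*) want=$model ;;
        *)   want=$model:latest ;;
    esac
    if ollama ls | awk 'NR > 1 { print $1 }' | grep -Fxq "$want"; then
        echo "$want already downloaded"
    else
        ollama pull "$model"
    fi
}
```

Running ensure_model mistral twice downloads the model the first time and only reports that it is present the second time.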
Step 5: Use the Ollama API
The local API is available at http://localhost:11434/api. Send a prompt to the generate endpoint with curl, setting stream to false so the server returns one completed JSON response:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Name three Linux distributions.",
"stream": false
}'
The JSON response contains the generated text in the response field, along with timing and token-count information. Because the request stays on your machine, local models need no API key or authentication.
The same local service also exposes chat, embeddings, model-management, and OpenAI-compatible endpoints, so many applications and locally served browser front ends can talk to Ollama with little or no extra setup.
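In scripts, you usually want only the generated text rather than the whole JSON document. The following sketch builds the request body and extracts the response field with jq, which is an assumption here: this guide does not install it, so add it with sudo apt install jq first. The function names are hypothetical.

```shell
# Build the JSON body with jq so quotes and newlines in the prompt are escaped.
# generate_payload and generate are hypothetical helper names; jq is assumed to be installed.
generate_payload() {
    jq -cn --arg model "$1" --arg prompt "$2" \
        '{model: $model, prompt: $prompt, stream: false}'
}

# Send the prompt to the generate endpoint and print only the generated text.
generate() {
    generate_payload "$1" "$2" |
        curl -fsS http://localhost:11434/api/generate -d @- |
        jq -r '.response'
}
```

For example, generate llama3.2 "Name three Linux distributions." prints just the model's answer. Building the body with jq instead of pasting the prompt into a quoted string avoids broken JSON when the prompt itself contains quotes.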
Step 6: Check GPU Acceleration
If no supported GPU is present, Ollama falls back to the CPU and still works, just more slowly. When the Linux installer detects supported NVIDIA or AMD hardware, it pulls in the matching GPU libraries for you. You still need a working GPU driver on the host, since Ollama relies on it to reach the card.
For NVIDIA hardware, verify the installed driver:
nvidia-smi
Ollama does not require the full CUDA Toolkit for inference. It supports NVIDIA GPUs with compute capability 5.0 or newer, subject to the driver requirements listed in the official hardware documentation.
For supported AMD hardware on Linux, Ollama uses ROCm and requires a compatible ROCm 7 driver. The installer downloads the additional ROCm package when it detects an AMD GPU.
Run a model, then check which processor it uses:
ollama run llama3.2 "Respond with the word ready."
ollama ps
NAME               ID              SIZE      PROCESSOR    UNTIL
llama3.2:latest    a80c4f17acd5    3.5 GB    100% GPU     4 minutes from now
A PROCESSOR value of 100% GPU confirms GPU inference. If it shows 100% CPU, inspect the service logs and verify that the GPU driver can see the card:
journalctl -e -u ollama
Step 7: Expose Ollama on the Network
Ollama binds to the loopback address by default, which prevents other devices from reaching it. To use a separate GPU server from a trusted workstation, create a systemd override:
sudo systemctl edit ollama
Add the following setting:
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Reload systemd and restart Ollama:
sudo systemctl daemon-reload
sudo systemctl restart ollama
0.0.0.0 exposes model execution to every client that can reach port 11434. Allow only trusted source addresses with a firewall, and do not publish the port directly to the internet.
For example, this UFW rule permits a private subnet. With UFW enabled and its default policy of denying incoming traffic, other clients remain blocked:
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
Replace the subnet with the address or network that should have access. See the Ubuntu UFW guide for rule management and verification.
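After the restart, you can confirm which address the service is bound to by filtering the output of ss -ltn. The helper below is a sketch: listeners_on_port is a hypothetical name, and it assumes the local address sits in the fourth column, as ss prints it for TCP listeners.

```shell
# Print the local addresses of TCP listeners on a given port.
# listeners_on_port is a hypothetical helper; it reads `ss -ltn` output from stdin
# and assumes the local address is in column 4.
listeners_on_port() {
    awk -v port="$1" 'NR > 1 {
        n = split($4, parts, ":")      # the port is the text after the last colon
        if (parts[n] == port) print $4
    }'
}
```

Run it as ss -ltn | listeners_on_port 11434. A result of 127.0.0.1:11434 means the override has not taken effect, while a wildcard address such as 0.0.0.0:11434 or *:11434 means Ollama is listening on all interfaces.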
Update Ollama
When a new release lands, you do not need a special upgrade command. Run the same installation script again and it updates Ollama in place:
curl -fsSL https://ollama.com/install.sh | sh
The installer replaces the program files and restarts the service without deleting downloaded models.
Troubleshooting
Ollama cannot connect to the local service
Start the service with sudo systemctl start ollama, then inspect its status with systemctl status ollama --no-pager.
The service starts but model commands fail
Read the recent logs with journalctl -e -u ollama. For live output while reproducing the problem, use journalctl -u ollama --no-pager --follow --pager-end.
A model runs out of memory
Unload other models from memory with ollama stop MODEL, then choose a smaller model or a lower-precision variant from the model library.
Ollama uses the CPU instead of an NVIDIA GPU
Run nvidia-smi and update the NVIDIA driver if it cannot see the card. After a Linux suspend and resume cycle, reloading nvidia_uvm or rebooting may restore GPU detection.
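To check the processor from a script, you can read the PROCESSOR column of ollama ps for one model. This is a sketch that assumes the column layout shown in Step 6 (NAME, ID, a two-word SIZE, then a two-word PROCESSOR value); processor_of is a hypothetical helper name.

```shell
# Print the PROCESSOR value for one loaded model, for example "100% GPU".
# processor_of is a hypothetical helper; it assumes the `ollama ps` column layout
# shown in Step 6, where PROCESSOR occupies fields 5 and 6.
processor_of() {
    ollama ps | awk -v name="$1" 'NR > 1 && $1 == name { print $5, $6 }'
}
```

After running a model, processor_of llama3.2:latest prints its processor split. Empty output means the model is not currently loaded.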
Ollama does not detect an AMD GPU
Confirm that the host uses a supported GPU and ROCm 7 driver. Check /dev/kfd permissions and the Ollama logs for ROCm or GPU-discovery errors.
Conclusion
Ollama provides a direct way to run local language models on Ubuntu through the terminal or an HTTP API. Start with a small model, confirm its memory and processor use with ollama ps, and keep port 11434 restricted to trusted systems.
About the authors

Dejan Panovski
Dejan Panovski is the founder of Linuxize, an RHCSA-certified Linux system administrator and DevOps engineer based in Skopje, Macedonia. Author of 800+ Linux tutorials with 20+ years of experience turning complex Linux tasks into clear, reliable guides.