AI tutorial - 1.1: Hello world to AI with Ollama
AI language models are evolving continuously, with new models and new versions of existing ones constantly appearing, such as the different versions of GPT (GPT-3.5, GPT-4, etc.) from OpenAI, Gemini from Google, Claude from Anthropic, Grok from xAI, DeepSeek-V3 from DeepSeek, or Copilot from Microsoft.
These models are offered online under subscription or pay-per-token pricing, but some models, or some of their versions (usually earlier ones), are available for download and local use.
For this local use, it is necessary to have applications that manage AI language models, such as:
- Ollama. Very important: Ollama is an AI language model management application, and its library also hosts models named "llama". Don't confuse the two.
- LMStudio
- Jan
- GPT4All
We'll explain how to install an AI language model management application, specifically Ollama, and how to download and use AI language models locally.
To install Ollama, visit the following URL:
https://ollama.com/download/linux
This page shows how to install Ollama on each operating system, and the site also lists the models available for use with Ollama.
First, we install Ollama, which provides a console CLI and a REST API for interacting with and managing everything.
All the actions detailed in this article were performed on an Ubuntu 25.04 desktop virtualized with VirtualBox with 154 GB of RAM and 4 CPUs, but everything should work on most operating systems with minor adjustments.
We will perform the installation as detailed at https://ollama.com/download/linux by running:
curl -fsSL https://ollama.com/install.sh | sh
This installation will create an ollama service that must be running for this console CLI and REST API to be available:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ sudo systemctl status ollama
● ollama.service - Ollama Service
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
Active: active (running) since Fri 2025-09-19 11:57:22 CEST; 1h 32min ago
Invocation: 1e26e8d051b443d2b8c9ba46e07725af
Main PID: 2143 (ollama)
Tasks: 11 (limit: 16054)
Memory: 4.5G (peak: 9.5G)
CPU: 34.230s
CGroup: /system.slice/ollama.service
└─2143 /usr/local/bin/ollama serve
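The service is managed like any other systemd unit, so if you ever need to restart it, or stop it to free up memory, you can use the usual systemctl commands:
sudo systemctl restart ollama
sudo systemctl stop ollama
sudo systemctl start ollama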
Once Ollama is installed on our computer, we look at the "Models" link on the ollama.com website for the available models and proceed to download the one we prefer. For this explanation, we'll download two models:
- llama3:8b
- llama3:instruct
If you have powerful hardware, you can try downloading llama3:70b or any other model with a higher number of parameters. With a greater number of parameters on the same hardware, the token generation rate per second is lower, but the model can handle more complex tasks and generate more detailed responses.
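As a very rough rule of thumb (an approximation, not an exact figure), a Q4-quantized model occupies about half a byte per parameter plus some overhead, which is consistent with the 4.7 GB shown later for the 8B model:
8,000,000,000 parameters x ~0.5 bytes ≈ 4 GB (llama3:8b)
70,000,000,000 parameters x ~0.5 bytes ≈ 35 GB (llama3:70b)
This is why models with tens of billions of parameters are usually only comfortable on machines with a very large amount of RAM or a dedicated GPU.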
The "ollama pull" command is used to download the model you want to use, and after the model you have the ability to specify tags, which are reported in each model which are available for it, such as the following:
- To specify the parameter count: in general, the larger it is, the more knowledge the model has but the more resources it requires. Values look like 1b, 8b, 70b, etc.
- To specify the quantization, which reduces the size of the model and speeds up its execution at the cost of numerical precision. Example values are f16, q2_K, q4_K_M, etc. Lower numbers mean more compression and therefore a greater loss of precision. For example, f16 is 16-bit floating point and loses very little precision, as does f32, but both require far more resources. Generally, the K variants (_K_S, _K_M, _K_L) are a good balance between size and quality.
- To specify the variant
- instruct: to follow instructions
- chat: to chat
- code: for programming
- vision: for images
- uncensored: with less security filtering
- latest: the default if no tag is specified
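As an illustration, several of these tags are usually combined into a single identifier. The exact tags available differ from model to model (check the "Tags" tab of each model on ollama.com), so the following commands are only examples of the syntax, not tags that necessarily exist for every model:
ollama pull llama3:70b
ollama pull llama3:8b-instruct-q4_K_M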
We download the models with the following commands:
ollama pull llama3:8b
ollama pull llama3:instruct
These models are stored, each with its Modelfile and the other files it needs, in different locations depending on the OS and installation type (see the listing example after this list):
- macOS: in the user's home directory at ~/.ollama/models
- Windows: in the user's home directory at %USERPROFILE%\.ollama\models
- Linux with a non-systemd installation: in the user's home directory at ~/.ollama/models
- Linux with a Systemd service installation (as detailed in this article): in /usr/share/ollama/.ollama/models
- Docker installation: in /root/.ollama/models inside the container
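For example, with the systemd installation used in this article, you can check where the downloaded models live and how much space they occupy (adjust the path to your installation type; sudo may be needed because the directory belongs to the ollama user):
sudo ls /usr/share/ollama/.ollama/models
sudo du -sh /usr/share/ollama/.ollama/models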
Once we have downloaded the two models with Ollama, we can check that they are ready with the following command:
ollama list
An example of the execution of the previous command is the following, where we see that it provides the name of the model, its ID, the size and the modification date:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ ollama list
NAME ID SIZE MODIFIED
llama3:instruct 365c0bd3c000 4.7 GB 3 weeks ago
llama3:8b 365c0bd3c000 4.7 GB 3 weeks ago
llm-04@ubuntu2504desktop-llm-04-ts04:~$
A model on our computer can be deleted with the following command, for example deleting the "llama3:instruct" model:
ollama rm llama3:instruct
A model on our computer can be copied, for example to create a variant that we can later customize to our needs, with the following command:
ollama cp llama3:instruct llama3:instruct-evo
Once we have the model we want to run available on our computer, we proceed to run it with the following command, example for the model "llama3:8b":
ollama run llama3:8b
This presents us with a prompt to interact with:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ ollama run llama3:8b
>>> Send a message (/? for help)
When it is run, the model is loaded into memory and stays there for a period of time after the last access; by default, around 5 minutes of inactivity unloads it from memory and it is no longer active.
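This behavior can be adjusted. As a sketch, and depending on your Ollama version: the REST API accepts a keep_alive field that controls how long the model stays loaded after a request, and recent versions include an "ollama stop" command to unload a model immediately:
curl http://localhost:11434/api/generate -d '{ "model": "llama3:8b", "keep_alive": "30m" }'
ollama stop llama3:8b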
And now it is time to ask the model any question, such as "Explain to me the concept of the Napierian logarithm" (you can interact in different languages; I have tried English and Spanish and it responded without problems in both, although it seemed to me that it is more detailed in English):
llm-04@ubuntu2504desktop-llm-04-ts04:~$ ollama run llama3:8b
>>> Explain to me the concept of the Napierian logarithm
A delightful topic!
The Napierian logarithm, also known as the natural logarithm (ln), is a fundamental concept in mathematics that plays a crucial role in many areas of
science and engineering. It's named after John Napier, a Scottish mathematician who introduced it in 1614.
**What is the Napierian logarithm?**
The Napierian logarithm is a mathematical function that maps any positive real number to its logarithmic value with respect to the base e, where e ≈
2.71828 (Euler's number). In other words, it measures the power to which the base e must be raised to produce the original value.
Formally, for any positive real number x, the Napierian logarithm is defined as:
ln(x) = loge(x)
**Key properties:**
1. **Base**: The Napierian logarithm has a base of e, which is an irrational number approximately equal to 2.71828.
2. **Monotonicity**: The function is monotonically increasing, meaning that ln(x) increases as x increases.
3. **Continuity**: The Napierian logarithm is continuous at all points in its domain (all positive real numbers).
4. **Invertibility**: The function is invertible, meaning that there exists a unique value of x for each given value of ln(x).
**Relationship to other logarithms**
The Napierian logarithm is closely related to other types of logarithms:
1. **Common logarithm (log10)**: This is the logarithm with base 10, which is commonly used in everyday applications. The relationship between log10
and ln is:
log10(x) = ln(x) / ln(10)
2. **Binary logarithm (log2)**: This is the logarithm with base 2, which is used in computer science and information theory. The relationship between
log2 and ln is:
log2(x) = ln(x) / ln(2)
**Importance in mathematics and science**
The Napierian logarithm has numerous applications across various fields:
1. **Calculus**: It appears in the study of limits, derivatives, and integrals.
2. **Probability theory**: The logarithmic function is used to model random processes and calculate probabilities.
3. **Information theory**: The entropy of a probability distribution is closely related to the Napierian logarithm.
4. **Physics**: Logarithms appear in problems involving exponential growth or decay, such as population dynamics, chemical reactions, and
thermodynamics.
In conclusion, the Napierian logarithm is a fundamental mathematical function that plays a crucial role in many areas of science and engineering. Its
unique properties and relationships to other logarithmic functions make it an essential tool for solving problems in various fields.
>>> Send a message (/? for help)
To exit you can press Ctrl+D.
Within the interactive session there are commands that are invoked with "/" followed by the command name and its options. Some basic examples (see the session example after this list):
- /?: displays a help menu.
- /show info: displays information about the current model on the screen. This command, within the running model, provides the same information as the "ollama show" command detailed below.
- /bye: exits the interactive chat session and unloads the model from memory.
- /exit: the same as /bye.
- /save session_name: creates a new model that stores the current chat history.
- /load session_name: loads a saved session, restoring the conversation history so you can continue where you left off. Note: it can also be resumed with "ollama run session_name".
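For example, a hypothetical session called "mysession" (the name is just an example) could be saved and later resumed like this:
>>> /save mysession
>>> /bye
ollama run mysession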
We can see the active models in memory with the following command:
ollama ps
An example of its output is this:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
llama3:8b 365c0bd3c000 5.3 GB 100% CPU 4096 4 minutes from now
llm-04@ubuntu2504desktop-llm-04-ts04:~$
We can see all the details of a model with the following command, in this example asking for the model "llama3:8b":
ollama show llama3:8b
Which returns an output like the following:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ ollama show llama3:8b
Model
architecture llama
parameters 8.0B
context length 8192
embedding length 4096
quantization Q4_0
Capabilities
completion
Parameters
num_keep 24
stop "<|start_header_id|>"
stop "<|end_header_id|>"
stop "<|eot_id|>"
License
META LLAMA 3 COMMUNITY LICENSE AGREEMENT
Meta Llama 3 Version Release Date: April 18, 2024
...
llm-04@ubuntu2504desktop-llm-04-ts04:~$
We see in this information that it is a Q4_0-quantized model with 8 billion parameters, and that it has configuration parameters, four of which appear here: one num_keep and three stop. Models therefore carry internal configuration through these parameters.
In this example, the "num_keep" configuration parameter has a value of 24. It specifies how many tokens from the beginning of the prompt are kept when the context window fills up and older tokens have to be discarded: a higher value preserves more of the initial instructions and gives more coherent behavior in long conversations, while a lower value leaves more room for new content. A suitable value depends on the hardware available and the type of task the model is performing.
The "stop" configuration parameter is used to identify specific text sequences that the model will immediately stop producing further output upon generating, thus avoiding generating unwanted or irrelevant content. You can specify keywords to stop when you reach that word, or specify keywords to prevent unwanted content due to sensitive topics.
A list of parameters for Ollama's models (not the AI language management tool) is specified at https://ollama.readthedocs.io/en/modelfile/#valid-parameters-and-values .
These parameters can be modified within the running model with the "/set parameter" command, for example:
/set parameter num_keep 30
This change is temporary and applies only to the current run of the model; it is lost when the model is reloaded.
Parameters can be changed more permanently via the Ollama API or in the model's Modelfile, as shown below.
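Through the API there are two options: the chat and generate endpoints accept a per-request "options" object, and a new model with modified parameters can be created with POST /api/create. A minimal sketch of a per-request override, with purely illustrative values:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3:8b",
  "messages": [ { "role": "user", "content": "Hello" } ],
  "options": { "num_keep": 30, "temperature": 0.7 }
}'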
You can ask for the Modelfile of a model with the following command, example with "llama3:8b":
ollama show --modelfile llama3:8b
Which returns the following, where you can see the location of the Modelfile and the parameters on the lines that begin with "PARAMETER":
llm-04@ubuntu2504desktop-llm-04-ts04:~$ ollama show --modelfile llama3:8b
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM llama3:8b
FROM /usr/share/ollama/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
TEMPLATE "{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"
PARAMETER num_keep 24
PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>
LICENSE "META LLAMA 3 COMMUNITY LICENSE AGREEMENT
Meta Llama 3 Version Release Date: April 18, 2024
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
“Documentation” means the specifications, manuals and documentation accompanying Meta Llama 3 distributed by Meta at https://llama.meta.com/get-started/.
“Licensee” or “you” means you, or your employer or any other person or
...
...
of Meta Llama 3: LlamaUseReport@meta.com
"
llm-04@ubuntu2504desktop-llm-04-ts04:~$
Ollama, as a manager of AI language models, also allows you to create your own model from a Modelfile with the "ollama create" command, an aspect we will not go into in depth now.
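Just to give an idea of what this looks like, here is a minimal sketch of a custom Modelfile; the model name, parameter values and system prompt are purely illustrative:
# Modelfile
FROM llama3:8b
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
SYSTEM "You are a concise assistant that answers in short paragraphs."
It would be built and run with:
ollama create llama3-concise -f Modelfile
ollama run llama3-concise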
The Ollama tool also has a REST API available locally at "http://localhost:11434", and we can verify that it is running with "curl", in the browser for GET requests, or with an API client such as Postman:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ curl http://localhost:11434
Ollama is running
llm-04@ubuntu2504desktop-llm-04-ts04:~$
With this API you can, for example, list the available models, similarly to what the Ollama CLI does in the console with "ollama list", by calling the REST API with a GET request to "/v1/models":
llm-04@ubuntu2504desktop-llm-04-ts04:~$ curl http://localhost:11434/v1/models | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 189 100 189 0 0 29329 0 --:--:-- --:--:-- --:--:-- 31500
{
"object": "list",
"data": [
{
"id": "llama3:instruct",
"object": "model",
"created": 1756027924,
"owned_by": "library"
},
{
"id": "llama3:8b",
"object": "model",
"created": 1756027280,
"owned_by": "library"
}
]
}
llm-04@ubuntu2504desktop-llm-04-ts04:~$
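The "/v1/..." routes are Ollama's OpenAI-compatible endpoints; the native endpoint for listing local models is "/api/tags", which returns the same models with a different JSON layout:
curl http://localhost:11434/api/tags | jq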
Just as you can list the models, you can create a model with a REST API call to POST /api/create; in general, the same Ollama functions available in the console CLI are also available through its REST API. For example, we can chat with a model through POST /api/chat:
curl http://localhost:11434/api/chat -d '{
"model": "llama3:8b",
"messages": [
{ "role": "user", "content": "Explain to me the concept of the Napierian logarithm" }
]
}'
Which gives the following response, streamed as a series of JSON objects, one fragment of the answer at a time, with the final object ("done": true) carrying the timing statistics:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ curl http://localhost:11434/api/chat -d '{
"model": "llama3:8b",
"messages": [
{ "role": "user", "content": "Explain to me the concept of the Napierian logarithm" }
]
}'
{"model":"llama3:8b","created_at":"2025-09-19T14:38:36.319638065Z","message":{"role":"assistant","content":"A"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:38:36.648205025Z","message":{"role":"assistant","content":" fascinating"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:38:36.977830334Z","message":{"role":"assistant","content":" topic"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:38:37.319596095Z","message":{"role":"assistant","content":"!\n\n"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:38:37.691332083Z","message":{"role":"assistant","content":"The"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:38:38.038542472Z","message":{"role":"assistant","content":" Nap"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:38:38.374558726Z","message":{"role":"assistant","content":"ier"},"done":false}
...
...
{"model":"llama3:8b","created_at":"2025-09-19T14:42:51.954438674Z","message":{"role":"assistant","content":" and"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:42:52.367103924Z","message":{"role":"assistant","content":" mathematical"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:42:52.808416659Z","message":{"role":"assistant","content":" models"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:42:53.274296402Z","message":{"role":"assistant","content":"."},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:42:53.678305521Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":260023194239,"load_duration":271466683,"prompt_eval_count":23,"prompt_eval_duration":2388628695,"eval_count":563,"eval_duration":257362217533}
llm-04@ubuntu2504desktop-llm-04-ts04:~$
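If you prefer a single JSON object instead of a token-by-token stream, the chat endpoint accepts a "stream": false field; a minimal sketch:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3:8b",
  "stream": false,
  "messages": [
    { "role": "user", "content": "Explain to me the concept of the Napierian logarithm" }
  ]
}'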
With this, we have a basic understanding of how to manage and work with the different AI models available to us. In the following parts of this tutorial, we will go deeper into these different aspects.
NOTE: The author of this article also has it published at feitam.es.