AI tutorial - 1.1: Hello world to AI with Ollama
AI language models are evolving continuously, with new models and new versions of existing ones constantly appearing, such as the different versions of GPT (GPT-3.5, GPT-4, etc.) from OpenAI, Gemini from Google, Claude from Anthropic, Grok from xAI, DeepSeek-V3 from DeepSeek, or Copilot from Microsoft.
These models are offered online under subscription or pay-per-token pricing, but some models, or some of their versions (usually earlier ones), are available for download and local use.
For this local use, it is necessary to have applications that manage AI language models, such as:
- Ollama. Very important: Ollama is an AI language model management application, and its library also hosts models named "llama". Don't confuse the two.
- LMStudio
- Jan
- GPT4All
We'll explain how to install an AI language model management application, specifically Ollama, and how to download and use AI language models locally.
To install Ollama, visit the following URL:
https://ollama.com/download/linux
This page shows how to install Ollama on each operating system, and the site also lists the models available for use with Ollama.
First, we install Ollama, which provides a console CLI and a REST API for interacting with and managing everything.
All the actions detailed in this article were performed on an Ubuntu 25.04 desktop virtualized with VirtualBox with 154 GB of RAM and 4 CPUs, but everything should work on most operating systems with minor adjustments.
We will perform the installation as detailed at https://ollama.com/download/linux by running:
curl -fsSL https://ollama.com/install.sh | sh
This installation will create an ollama service that must be running for this console CLI and REST API to be available:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ sudo systemctl status ollama
● ollama.service - Ollama Service
Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
Active: active (running) since Fri 2025-09-19 11:57:22 CEST; 1h 32min ago
Invocation: 1e26e8d051b443d2b8c9ba46e07725af
Main PID: 2143 (ollama)
Tasks: 11 (limit: 16054)
Memory: 4.5G (peak: 9.5G)
CPU: 34.230s
CGroup: /system.slice/ollama.service
└─2143 /usr/local/bin/ollama serve
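The service is managed like any other systemd unit, so if you ever need to restart it, or stop it to free up memory, you can use the usual systemctl commands:
sudo systemctl restart ollama
sudo systemctl stop ollama
sudo systemctl start ollama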
Once Ollama is installed on our computer, we look at the "Models" link on the ollama.com website for the available models and proceed to download the one we prefer. For this explanation, we'll download two models:
- llama3:8b
- llama3:instruct
If you have powerful hardware, you can try downloading llama3:70b or any other model with a higher number of parameters. With a greater number of parameters on the same hardware, the token generation rate per second is lower, but the model can handle more complex tasks and generate more detailed responses.
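As a very rough rule of thumb (an approximation, not an exact figure), a Q4-quantized model occupies about half a byte per parameter plus some overhead, which is consistent with the 4.7 GB shown later for the 8B model:
8,000,000,000 parameters x ~0.5 bytes ≈ 4 GB (llama3:8b)
70,000,000,000 parameters x ~0.5 bytes ≈ 35 GB (llama3:70b)
This is why models with tens of billions of parameters are usually only comfortable on machines with a very large amount of RAM or a dedicated GPU.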
The "ollama pull" command is used to download the model you want to use, and after the model you have the ability to specify tags, which are reported in each model which are available for it, such as the following:
- To specify the parameter count: in general, the larger it is, the more knowledge the model has but the more resources it requires. Values look like 1b, 8b, 70b, etc.
- To specify the quantization, which reduces the size of the model and speeds up its execution at the cost of numerical precision. Example values are f16, q2_K, q4_K_M, etc. Lower numbers mean more compression and therefore a greater loss of precision. For example, f16 is 16-bit floating point and loses very little precision, as does f32, but both require far more resources. Generally, the K variants (_K_S, _K_M, _K_L) are a good balance between size and quality.
- To specify the variant
- instruct: to follow instructions
- chat: to chat
- code: for programming
- vision: for images
- uncensored: with less security filtering
- latest: the default if no tag is specified
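As an illustration, several of these tags are usually combined into a single identifier. The exact tags available differ from model to model (check the "Tags" tab of each model on ollama.com), so the following commands are only examples of the syntax, not tags that necessarily exist for every model:
ollama pull llama3:70b
ollama pull llama3:8b-instruct-q4_K_M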
We download the models with the following commands:
ollama pull llama3:8b
ollama pull llama3:instruct
These models are stored, each with its Modelfile and the other files it needs, in different locations depending on the OS and installation type (see the listing example after this list):
- macOS: in the user's home directory at ~/.ollama/models
- Windows: in the user's home directory at %USERPROFILE%\.ollama\models
- Linux with a non-systemd installation: in the user's home directory at ~/.ollama/models
- Linux with a Systemd service installation (as detailed in this article): in /usr/share/ollama/.ollama/models
- Docker installation: in /root/.ollama/models inside the container
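For example, with the systemd installation used in this article, you can check where the downloaded models live and how much space they occupy (adjust the path to your installation type; sudo may be needed because the directory belongs to the ollama user):
sudo ls /usr/share/ollama/.ollama/models
sudo du -sh /usr/share/ollama/.ollama/models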
Once we have downloaded the two models with Ollama, we can check that they are ready with the following command:
ollama list
An example of the execution of the previous command is the following, where we see that it provides the name of the model, its ID, the size and the modification date:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ ollama list
NAME ID SIZE MODIFIED
llama3:instruct 365c0bd3c000 4.7 GB 3 weeks ago
llama3:8b 365c0bd3c000 4.7 GB 3 weeks ago
llm-04@ubuntu2504desktop-llm-04-ts04:~$
A model on our computer can be deleted with the following command, for example deleting the "llama3:instruct" model:
ollama rm llama3:instruct
A model on our computer can be copied, for example to create a variant that we can later customize to our needs, with the following command:
ollama cp llama3:instruct llama3:instruct-evo
Once we have the model we want to run available on our computer, we proceed to run it with the following command, example for the model "llama3:8b":
ollama run llama3:8b
This presents us with a prompt to interact with:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ ollama run llama3:8b
>>> Send a message (/? for help)
When it is run, the model is loaded into memory and stays there for a period of time after the last access; by default, around 5 minutes of inactivity unloads it from memory and it is no longer active.
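This behavior can be adjusted. As a sketch, and depending on your Ollama version: the REST API accepts a keep_alive field that controls how long the model stays loaded after a request, and recent versions include an "ollama stop" command to unload a model immediately:
curl http://localhost:11434/api/generate -d '{ "model": "llama3:8b", "keep_alive": "30m" }'
ollama stop llama3:8b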
And now it is time to ask the model any question, such as "Explain to me the concept of the Napierian logarithm" (you can interact in different languages; I have tried English and Spanish and it responded without problems in both, although it seemed to me that it is more detailed in English):
llm-04@ubuntu2504desktop-llm-04-ts04:~$ ollama run llama3:8b
>>> Explain to me the concept of the Napierian logarithm
A delightful topic!
The Napierian logarithm, also known as the natural logarithm (ln), is a fundamental concept in mathematics that plays a crucial role in many areas of
science and engineering. It's named after John Napier, a Scottish mathematician who introduced it in 1614.
**What is the Napierian logarithm?**
The Napierian logarithm is a mathematical function that maps any positive real number to its logarithmic value with respect to the base e, where e ≈
2.71828 (Euler's number). In other words, it measures the power to which the base e must be raised to produce the original value.
Formally, for any positive real number x, the Napierian logarithm is defined as:
ln(x) = loge(x)
**Key properties:**
1. **Base**: The Napierian logarithm has a base of e, which is an irrational number approximately equal to 2.71828.
2. **Monotonicity**: The function is monotonically increasing, meaning that ln(x) increases as x increases.
3. **Continuity**: The Napierian logarithm is continuous at all points in its domain (all positive real numbers).
4. **Invertibility**: The function is invertible, meaning that there exists a unique value of x for each given value of ln(x).
**Relationship to other logarithms**
The Napierian logarithm is closely related to other types of logarithms:
1. **Common logarithm (log10)**: This is the logarithm with base 10, which is commonly used in everyday applications. The relationship between log10
and ln is:
log10(x) = ln(x) / ln(10)
2. **Binary logarithm (log2)**: This is the logarithm with base 2, which is used in computer science and information theory. The relationship between
log2 and ln is:
log2(x) = ln(x) / ln(2)
**Importance in mathematics and science**
The Napierian logarithm has numerous applications across various fields:
1. **Calculus**: It appears in the study of limits, derivatives, and integrals.
2. **Probability theory**: The logarithmic function is used to model random processes and calculate probabilities.
3. **Information theory**: The entropy of a probability distribution is closely related to the Napierian logarithm.
4. **Physics**: Logarithms appear in problems involving exponential growth or decay, such as population dynamics, chemical reactions, and
thermodynamics.
In conclusion, the Napierian logarithm is a fundamental mathematical function that plays a crucial role in many areas of science and engineering. Its
unique properties and relationships to other logarithmic functions make it an essential tool for solving problems in various fields.
>>> Send a message (/? for help)
To exit you can press Ctrl+D.
Within the interactive session there are commands that are invoked with "/" followed by the command name and its options. Some basic examples (see the session example after this list):
- /?: displays a help menu.
- /show info: displays information about the current model on the screen. This command, within the running model, provides the same information as the "ollama show" command detailed below.
- /bye: exits the interactive chat session and unloads the model from memory.
- /exit: the same as /bye.
- /save session_name: creates a new model that stores the current chat history.
- /load session_name: loads a saved session, restoring the conversation history so you can continue where you left off. Note: it can also be resumed with "ollama run session_name".
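For example, a hypothetical session called "mysession" (the name is just an example) could be saved and later resumed like this:
>>> /save mysession
>>> /bye
ollama run mysession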
We can see the active models in memory with the following command:
ollama ps
An example of its output is this:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ ollama ps
NAME ID SIZE PROCESSOR CONTEXT UNTIL
llama3:8b 365c0bd3c000 5.3 GB 100% CPU 4096 4 minutes from now
llm-04@ubuntu2504desktop-llm-04-ts04:~$
We can see all the details of a model with the following command, in this example asking for the model "llama3:8b":
ollama show llama3:8b
Which returns an output like the following:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ ollama show llama3:8b
Model
architecture llama
parameters 8.0B
context length 8192
embedding length 4096
quantization Q4_0
Capabilities
completion
Parameters
num_keep 24
stop "<|start_header_id|>"
stop "<|end_header_id|>"
stop "<|eot_id|>"
License
META LLAMA 3 COMMUNITY LICENSE AGREEMENT
Meta Llama 3 Version Release Date: April 18, 2024
...
llm-04@ubuntu2504desktop-llm-04-ts04:~$
We see in this information that it is a Q4_0-quantized model with 8 billion parameters, and that it has configuration parameters, four of which appear here: one num_keep and three stop. Models therefore carry internal configuration through these parameters.
In this example, the "num_keep" configuration parameter has a value of 24. It specifies how many tokens from the beginning of the prompt are kept when the context window fills up and older tokens have to be discarded: a higher value preserves more of the initial instructions and gives more coherent behavior in long conversations, while a lower value leaves more room for new content. A suitable value depends on the hardware available and the type of task the model is performing.
The "stop" configuration parameter is used to identify specific text sequences that the model will immediately stop producing further output upon generating, thus avoiding generating unwanted or irrelevant content. You can specify keywords to stop when you reach that word, or specify keywords to prevent unwanted content due to sensitive topics.
A list of parameters for Ollama's models (not the AI language management tool) is specified at https://ollama.readthedocs.io/en/modelfile/#valid-parameters-and-values .
These parameters can be modified within the running model with the "/set parameter" command, for example:
/set parameter num_keep 30
This change is temporary and applies only to the current run of the model; it is lost when the model is reloaded.
Parameters can be changed more permanently via the Ollama API or in the model's Modelfile, as shown below.
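Through the API there are two options: the chat and generate endpoints accept a per-request "options" object, and a new model with modified parameters can be created with POST /api/create. A minimal sketch of a per-request override, with purely illustrative values:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3:8b",
  "messages": [ { "role": "user", "content": "Hello" } ],
  "options": { "num_keep": 30, "temperature": 0.7 }
}'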
You can ask for the Modelfile of a model with the following command, example with "llama3:8b":
ollama show --modelfile llama3:8b
Which returns the following, where you can see the location of the Modelfile and the parameters on the lines that begin with "PARAMETER":
llm-04@ubuntu2504desktop-llm-04-ts04:~$ ollama show --modelfile llama3:8b
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this, replace FROM with:
# FROM llama3:8b
FROM /usr/share/ollama/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
TEMPLATE "{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"
PARAMETER num_keep 24
PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>
LICENSE "META LLAMA 3 COMMUNITY LICENSE AGREEMENT
Meta Llama 3 Version Release Date: April 18, 2024
“Agreement” means the terms and conditions for use, reproduction, distribution and modification of the Llama Materials set forth herein.
“Documentation” means the specifications, manuals and documentation accompanying Meta Llama 3 distributed by Meta at https://llama.meta.com/get-started/.
“Licensee” or “you” means you, or your employer or any other person or
...
...
of Meta Llama 3: LlamaUseReport@meta.com
"
llm-04@ubuntu2504desktop-llm-04-ts04:~$
Ollama, as a manager of AI language models, also allows you to create your own model from a Modelfile with the "ollama create" command, an aspect we will not go into in depth now.
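Just to give an idea of what this looks like, here is a minimal sketch of a custom Modelfile; the model name, parameter values and system prompt are purely illustrative:
# Modelfile
FROM llama3:8b
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
SYSTEM "You are a concise assistant that answers in short paragraphs."
It would be built and run with:
ollama create llama3-concise -f Modelfile
ollama run llama3-concise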
The Ollama tool also has a REST API available locally at "http://localhost:11434", and we can verify that it is running with "curl", in the browser for GET requests, or with an API client such as Postman:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ curl http://localhost:11434
Ollama is running
llm-04@ubuntu2504desktop-llm-04-ts04:~$
With this API you can, for example, list the available models, similarly to what the Ollama CLI does in the console with "ollama list", by calling the REST API with a GET request to "/v1/models":
llm-04@ubuntu2504desktop-llm-04-ts04:~$ curl http://localhost:11434/v1/models | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 189 100 189 0 0 29329 0 --:--:-- --:--:-- --:--:-- 31500
{
"object": "list",
"data": [
{
"id": "llama3:instruct",
"object": "model",
"created": 1756027924,
"owned_by": "library"
},
{
"id": "llama3:8b",
"object": "model",
"created": 1756027280,
"owned_by": "library"
}
]
}
llm-04@ubuntu2504desktop-llm-04-ts04:~$
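The "/v1/..." routes are Ollama's OpenAI-compatible endpoints; the native endpoint for listing local models is "/api/tags", which returns the same models with a different JSON layout:
curl http://localhost:11434/api/tags | jq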
Just as you can list the models, you can create a model with a REST API call to POST /api/create; in general, the same Ollama functions available in the console CLI are also available through its REST API. For example, we can chat with a model through POST /api/chat:
curl http://localhost:11434/api/chat -d '{
"model": "llama3:8b",
"messages": [
{ "role": "user", "content": "Explain to me the concept of the Napierian logarithm" }
]
}'
Which gives the following response, streamed as a series of JSON objects, one fragment of the answer at a time, with the final object ("done": true) carrying the timing statistics:
llm-04@ubuntu2504desktop-llm-04-ts04:~$ curl http://localhost:11434/api/chat -d '{
"model": "llama3:8b",
"messages": [
{ "role": "user", "content": "Explain to me the concept of the Napierian logarithm" }
]
}'
{"model":"llama3:8b","created_at":"2025-09-19T14:38:36.319638065Z","message":{"role":"assistant","content":"A"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:38:36.648205025Z","message":{"role":"assistant","content":" fascinating"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:38:36.977830334Z","message":{"role":"assistant","content":" topic"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:38:37.319596095Z","message":{"role":"assistant","content":"!\n\n"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:38:37.691332083Z","message":{"role":"assistant","content":"The"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:38:38.038542472Z","message":{"role":"assistant","content":" Nap"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:38:38.374558726Z","message":{"role":"assistant","content":"ier"},"done":false}
...
...
{"model":"llama3:8b","created_at":"2025-09-19T14:42:51.954438674Z","message":{"role":"assistant","content":" and"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:42:52.367103924Z","message":{"role":"assistant","content":" mathematical"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:42:52.808416659Z","message":{"role":"assistant","content":" models"},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:42:53.274296402Z","message":{"role":"assistant","content":"."},"done":false}
{"model":"llama3:8b","created_at":"2025-09-19T14:42:53.678305521Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":260023194239,"load_duration":271466683,"prompt_eval_count":23,"prompt_eval_duration":2388628695,"eval_count":563,"eval_duration":257362217533}
llm-04@ubuntu2504desktop-llm-04-ts04:~$
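If you prefer a single JSON object instead of a token-by-token stream, the chat endpoint accepts a "stream": false field; a minimal sketch:
curl http://localhost:11434/api/chat -d '{
  "model": "llama3:8b",
  "stream": false,
  "messages": [
    { "role": "user", "content": "Explain to me the concept of the Napierian logarithm" }
  ]
}'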
With this, we have a basic understanding of how to manage and work with the different AI models available to us. In the following parts of this tutorial, we will go deeper into these different aspects.
NOTE: The author of this article also has it published at feitam.es.