Ollama

Context

Ollama is one of the quickest ways to run LLMs on your local system.
It is best suited to running SLMs (Small Language Models) locally.
Ollama exposes a REST API by default for running the models it manages, including an OpenAI-compatible interface.
For a web interface, use Open WebUI. It supports various LLM runners, including Ollama and OpenAI-compatible APIs.
The Ollama library lists 100+ models.
See Examples
Much as Docker builds an image from a Dockerfile, Ollama creates a model from a Modelfile.
GitHub
Ollama is not recommended for production. Prefer a cloud-provider deployment (e.g. Vertex AI, Bedrock, Azure AI), vLLM, or a similar serving solution.

ollama list
ollama run phi3

FROM llama3.2
PARAMETER temperature 1
SYSTEM """
You are Mario from super mario bros, acting as an assistant.
"""

ollama create mymodel -f ./Modelfile
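
If several Modelfile variants are needed, the text can be generated rather than written by hand. The following is a minimal sketch, assuming only the FROM, PARAMETER and SYSTEM instructions shown above; `render_modelfile` is a hypothetical helper name, not part of Ollama.

```python
def render_modelfile(base_model, system_prompt, **parameters):
    """Return Modelfile text using FROM, PARAMETER and SYSTEM instructions."""
    lines = [f"FROM {base_model}"]
    # One PARAMETER line per keyword argument, e.g. temperature=1
    for name, value in parameters.items():
        lines.append(f"PARAMETER {name} {value}")
    # The system prompt goes between triple quotes, as in the example above
    lines.append(f'SYSTEM """\n{system_prompt}\n"""')
    return "\n".join(lines) + "\n"


modelfile_text = render_modelfile(
    "llama3.2",
    "You are Mario from super mario bros, acting as an assistant.",
    temperature=1,
)
```

Writing `modelfile_text` to `./Modelfile` and then running `ollama create mymodel -f ./Modelfile` should give the same result as the hand-written file.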

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
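
The same request can be made from Python using only the standard library. This is a rough sketch, not an official client: the helper names (`build_chat_request`, `extract_reply`, `chat`) are made up for illustration, and it assumes the server is listening on the default address used in the curl call. It sets `"stream": false` so the server replies with a single JSON object, and it can optionally target the OpenAI-compatible endpoint at `/v1/chat/completions`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # same local address as the curl example


def build_chat_request(prompt, model="llama3.2", base_url=OLLAMA_URL,
                       openai_compatible=False):
    """Build (but do not send) a POST request for a single-turn chat."""
    # Native endpoint, or the OpenAI-compatible one when requested
    path = "/v1/chat/completions" if openai_compatible else "/api/chat"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of streamed chunks
    }
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(body, openai_compatible=False):
    """Pull the assistant text out of a decoded JSON response."""
    if openai_compatible:
        return body["choices"][0]["message"]["content"]
    return body["message"]["content"]


def chat(prompt, model="llama3.2", openai_compatible=False):
    """Send the request; needs a running Ollama server with the model pulled."""
    request = build_chat_request(prompt, model=model,
                                 openai_compatible=openai_compatible)
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.load(response), openai_compatible)
```

With the server running, `print(chat("why is the sky blue?"))` should print the model's answer. Building the request separately from sending it keeps the payload easy to inspect without a live server.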

pip install open-webui
open-webui serve