Ollama

Context #

  • Ollama is one of the quickest ways to run LLMs on your local system.
  • Best for running SLMs (Small Language Models) locally.
  • Ollama serves a REST API by default for running the models it manages, and also offers an OpenAI-compatible interface.
  • For a web interface, use Open WebUI. It supports various LLM runners, including Ollama and OpenAI-compatible APIs.
  • The model library lists 100+ models.
  • See Examples
  • Similar to Docker and its Dockerfile, Ollama lets you create a model from a Modelfile.
  • GitHub
  • Not recommended for production. It is better to use a cloud-provider-hosted deployment (e.g. Vertex AI, Bedrock, Azure AI), vLLM, or a similar solution.
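
To illustrate the OpenAI-compatible interface mentioned above, here is a minimal Python sketch using only the standard library. It assumes Ollama is listening on its default address (localhost:11434), that the OpenAI-style routes live under /v1, and that the llama3.2 model has already been pulled. The helper names build_openai_request and extract_reply are made up for this example.

```python
import json
import urllib.request

# Assumption: default Ollama address, OpenAI-compatible routes under /v1.
OPENAI_COMPAT_URL = "http://localhost:11434/v1/chat/completions"


def build_openai_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request in the OpenAI chat-completions shape."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENAI_COMPAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant text out of a chat-completions style response."""
    return response_body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model available locally.
    request = build_openai_request("llama3.2", "why is the sky blue?")
    with urllib.request.urlopen(request) as response:
        print(extract_reply(json.load(response)))
```

Because the request and response follow the OpenAI shape, existing OpenAI client code can usually be pointed at the local server by changing only the base URL.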

Running Locally on Mac #

ollama list
ollama run phi3
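
The same information that ollama list prints can also be read over HTTP. The sketch below assumes the server is on its default address and that the /api/tags endpoint returns a JSON object with a "models" array whose entries carry a "name" field. The helper name model_names is made up for this example.

```python
import json
import urllib.request

# Assumption: default Ollama address; /api/tags lists locally available models.
TAGS_URL = "http://localhost:11434/api/tags"


def model_names(tags_response: dict) -> list[str]:
    """Return the model names from an /api/tags style response."""
    return [model["name"] for model in tags_response.get("models", [])]


if __name__ == "__main__":
    # Requires a running Ollama server.
    with urllib.request.urlopen(TAGS_URL) as response:
        for name in model_names(json.load(response)):
            print(name)
```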

Create a Local Model #

  • Create a Modelfile.
FROM llama3.2
PARAMETER temperature 1
SYSTEM """
You are Mario from super mario bros, acting as an assistant.
"""
  • Run the following command to create the model:
ollama create mymodel -f ./Modelfile
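
If you want to generate several variants of a model, the two steps above can be scripted. This is only a sketch: render_modelfile and create_command are hypothetical helper names, and it assumes the ollama binary is on your PATH. It produces the same Modelfile shown above and then calls the same create command.

```python
import subprocess
from pathlib import Path


def render_modelfile(base_model: str, temperature: float, system_prompt: str) -> str:
    """Return Modelfile text using the FROM, PARAMETER, and SYSTEM instructions."""
    return (
        f"FROM {base_model}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """\n{system_prompt}\n"""\n'
    )


def create_command(model_name: str, modelfile_path: str) -> list[str]:
    """Build the argument list for `ollama create <name> -f <path>`."""
    return ["ollama", "create", model_name, "-f", modelfile_path]


if __name__ == "__main__":
    text = render_modelfile(
        "llama3.2",
        1,
        "You are Mario from super mario bros, acting as an assistant.",
    )
    Path("Modelfile").write_text(text, encoding="utf-8")
    # Assumption: the ollama CLI is installed and on PATH.
    subprocess.run(create_command("mymodel", "./Modelfile"), check=True)
```

Once created, the model can be started like any other, for example with ollama run mymodel.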

Access a REST API #

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
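
By default this endpoint streams its answer as newline-delimited JSON objects, each carrying a fragment of the reply, with the last one marked "done": true. A rough Python equivalent of the curl call, assuming that response shape and the default address, might look like this (build_chat_request and collect_stream are names invented for the sketch):

```python
import json
import urllib.request
from typing import Iterable

# Assumption: default Ollama address and the native chat endpoint.
CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def collect_stream(lines: Iterable[bytes]) -> str:
    """Concatenate message fragments from a newline-delimited JSON stream."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between chunks
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final chunk signals the end of the reply
    return "".join(parts)


if __name__ == "__main__":
    # Requires a running Ollama server with the model available locally.
    request = build_chat_request("llama3.2", "why is the sky blue?")
    with urllib.request.urlopen(request) as response:
        print(collect_stream(response))
```

If you would rather receive a single JSON object, adding "stream": false to the request body should disable streaming.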

Install Web UI #

pip install open-webui
open-webui serve