Context
- Ollama is one of the quickest ways to run LLMs on your local system.
- Best suited for running SLMs (Small Language Models) locally.
- Ollama exposes a REST API (on port 11434 by default) for running the models it manages, along with an OpenAI-compatible endpoint.
- For a web interface, use Open WebUI. It supports various LLM runners, including Ollama and OpenAI-compatible APIs.
- The Ollama library lists 100+ models.
- See Examples
- Similar to building a Docker image from a Dockerfile, Ollama can create a model from a Modelfile.
- GitHub
- Not recommended for production. Prefer a cloud provider's managed deployment (e.g. Vertex AI, Bedrock, Azure AI) or a dedicated serving engine such as vLLM or similar solutions.
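The OpenAI-compatible endpoint mentioned in the list above is served under the /v1 path on the same port as the native API, so OpenAI-style clients can point at it. A minimal sketch using only the Python standard library (not the official openai client), assuming Ollama is running at its default address http://localhost:11434 and llama3.2 has already been pulled; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_V1 = "http://localhost:11434/v1"


def build_openai_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to the OpenAI-style /chat/completions route (hypothetical helper)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{OLLAMA_V1}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response_json["choices"][0]["message"]["content"]


def chat_openai_compat(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one user message and return the reply (requires a running Ollama server)."""
    req = build_openai_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(json.load(resp))


# Usage (with Ollama running):
#   print(chat_openai_compat("llama3.2", "why is the sky blue?"))
```

Because the request and response shapes follow the OpenAI chat format, the same two helpers would work against any other OpenAI-compatible server by changing only OLLAMA_V1.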
Running Locally on Mac
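```shell
# List the models already downloaded to this machine
ollama list
# Pull phi3 if it is not present, then start an interactive chat session
ollama run phi3
```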
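ollama list has a REST counterpart, GET /api/tags, which returns the locally available models as JSON under a "models" key. A minimal standard-library sketch, assuming the default address; the function names are hypothetical, and only the name and size fields of each entry are read.

```python
from __future__ import annotations

import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default host and port


def parse_tags(payload: dict) -> list[tuple[str, int]]:
    """Return (name, size_in_bytes) for each model in an /api/tags response."""
    return [(m["name"], m.get("size", 0)) for m in payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL) -> list[tuple[str, int]]:
    """REST equivalent of `ollama list` (requires a running Ollama server)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return parse_tags(json.load(resp))


# Usage (with Ollama running):
#   for name, size in list_local_models():
#       print(f"{name:30} {size / 1e9:.1f} GB")
```

Splitting parsing from the HTTP call keeps the JSON handling testable without a server.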
Create a Local Model
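- Save the following as a file named Modelfile:
```
# Base model to build on
FROM llama3.2
# Higher temperature gives more varied, creative replies
PARAMETER temperature 1
# System prompt applied to every conversation with this model
SYSTEM """
You are Mario from super mario bros, acting as an assistant.
"""
```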
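- Run the following command to create the model:
```shell
# Build a model named mymodel from ./Modelfile; chat with it afterwards via `ollama run mymodel`
ollama create mymodel -f ./Modelfile
```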
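When several variants share the same structure (different base models, temperatures, or personas), the Modelfile can be generated rather than hand-edited. A minimal Python sketch; render_modelfile is a hypothetical helper that emits only the FROM, PARAMETER, and SYSTEM instructions used above, and it assumes the system prompt does not itself contain triple quotes.

```python
from __future__ import annotations

from pathlib import Path


def render_modelfile(base: str, system: str, parameters: dict[str, object] | None = None) -> str:
    """Render Modelfile text with FROM, PARAMETER, and SYSTEM instructions."""
    lines = [f"FROM {base}"]
    for name, value in (parameters or {}).items():
        lines.append(f"PARAMETER {name} {value}")
    # Triple quotes let the system prompt span multiple lines.
    lines.append(f'SYSTEM """\n{system.strip()}\n"""')
    return "\n".join(lines) + "\n"


# Usage: write ./Modelfile, then build it with `ollama create mymodel -f ./Modelfile`
#   text = render_modelfile(
#       "llama3.2",
#       "You are Mario from super mario bros, acting as an assistant.",
#       {"temperature": 1},
#   )
#   Path("Modelfile").write_text(text, encoding="utf-8")
```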
Access the REST API
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
```
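/api/chat streams by default: the response body is a sequence of JSON objects, one per line, each carrying a text fragment in message.content, with "done": true on the last one (adding "stream": false to the request returns a single object instead). A minimal standard-library sketch that reassembles the streamed reply, assuming the default address; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator

OLLAMA_CHAT = "http://localhost:11434/api/chat"  # assumption: default address


def iter_chunks(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield assistant text from newline-delimited JSON chunks until done is true."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        yield chunk.get("message", {}).get("content", "")
        if chunk.get("done"):
            break


def chat_native(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one user message and join the streamed reply (requires a running server)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_CHAT, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Iterating the response yields one line (one JSON object) at a time.
        return "".join(iter_chunks(resp))


# Usage (with Ollama running):
#   print(chat_native("llama3.2", "why is the sky blue?"))
```

Printing each fragment from iter_chunks as it arrives, instead of joining them, gives the token-by-token output seen in the terminal.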
Install Web UI
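```shell
# Install Open WebUI from PyPI
pip install open-webui
# Start the server; the UI is served on http://localhost:8080 by default
open-webui serve
```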