AI & ML · advanced

Ollama

Run LLMs locally

Ollama is the simplest way to run open-weight LLMs locally. I use it for offline development, prompt iteration without API costs, and embedded model scenarios.

1+ years in production
8+ projects shipped
advanced proficiency

My take

Why I use Ollama

Ollama removed all the friction from local model serving. A single `ollama run llama3` and I'm experimenting in seconds. It's now part of my standard local AI dev loop.
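
For context, my iteration loop looks roughly like the sketch below. It uses the official ollama Python client (`pip install ollama`); the model name and prompts are illustrative placeholders, not a fixed workflow.

```python
# Minimal prompt-iteration loop against a local Ollama server.
# Assumes `ollama serve` is running and `ollama pull llama3` has already been done;
# the prompts here are placeholders to compare wording side by side.
import ollama

PROMPTS = [
    "Summarize this release note in one sentence: ...",
    "Summarize this release note in one sentence, plain English, no jargon: ...",
]

for prompt in PROMPTS:
    response = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": prompt}],
    )
    # Print each prompt variant next to its completion for quick comparison.
    print(f"--- {prompt[:50]}\n{response['message']['content']}\n")
```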

Want the broader stack philosophy? Read about how Sri picks tools or browse engineering insights.

Honest assessment

Strengths & tradeoffs

No tool is perfect. Here's what shines and what to watch for.

Strengths

  • One-command install and model pull
  • OpenAI-compatible API endpoint (see the sketch after this list)
  • Apple Silicon GPU acceleration out of the box
  • Wide model library (Llama, Mistral, Gemma, Qwen)
  • Quantized models that fit on laptops
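
As an example of the OpenAI-compatible endpoint mentioned above, here's a rough sketch that points the standard openai Python client at a local Ollama server. http://localhost:11434/v1 is Ollama's default endpoint; the API key is ignored by Ollama but the SDK requires one, and the model name assumes llama3 has been pulled.

```python
# Sketch: reuse the OpenAI SDK against Ollama's OpenAI-compatible endpoint.
# Assumes Ollama is running locally on its default port with llama3 pulled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # required by the SDK, ignored by Ollama
)

completion = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Give me three test cases for a date parser."}],
)
print(completion.choices[0].message.content)
```

The practical upside is that code written against a hosted OpenAI-style API can usually be pointed at a local model by swapping the base URL, which keeps local and cloud paths interchangeable during development.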

Tradeoffs (honestly)

  • Performance bound by local hardware
  • Production deployments need vLLM or TGI for throughput
  • Less observability than managed APIs

Fit assessment

When to reach for Ollama

Pick the right tool for the job.

Best fits

Local development against local models

Offline demos and prototypes

Privacy-sensitive personal tooling

Cost-free prompt iteration

Not ideal for

Production-scale serving (use vLLM or TGI)

Multi-user concurrent workloads

Common use cases

Local LLM dev · Offline inference · Prompt iteration · Embedded use

Resources

Learn more

Curated official docs, tutorials, and writing on Ollama.

Stack

Pairs well with Ollama

Tools and platforms I commonly combine with this one.

Need help with Ollama?

Whether you're starting fresh or optimizing an existing implementation, I can help you get the most out of this technology. Read more in insights or get in touch.
