
DeepSeek, Llama, Qwen - When Open Models Win

Open-weights aren't just for hobbyists anymore. Real production use cases now favor them.

February 12, 2025 · 8 min read

Open-weights models have closed enough of the gap with frontier closed-source models that real production workloads now make sense on them. DeepSeek-V3, Llama 3.3, Qwen 2.5 - these are production-grade models, not toys. Here's where they win, where closed still wins, and how I split my own deployments.

When open wins

  • Cost. Self-hosted Llama 3.3 70B on a single H100 costs ~$3/hr. With batching, an inference server can sustain roughly 1,000 aggregate output tokens/sec, which works out to ~$0.0008/1K output tokens. Compared to GPT-4o at $0.015/1K, that's nearly 20x cheaper.
  • Privacy. Some workloads can't ship data to a third party: healthcare, defense, regulated financial data. Open-weights are the only option.
  • Customization. Fine-tuning open-weights is straightforward. Fine-tuning closed-source models requires vendor cooperation (and is rarely worth it).
  • Predictable performance. No surprises from "we updated the model" emails.
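
The cost bullet above is just arithmetic; a tiny helper makes the assumptions explicit. GPU price and aggregate throughput are the two inputs that matter, and the numbers below are the ones assumed in this post, not measurements:

```python
def cost_per_1k_tokens(gpu_hourly_usd: float, agg_tokens_per_sec: float) -> float:
    """Cost per 1K output tokens for a self-hosted deployment."""
    tokens_per_hour = agg_tokens_per_sec * 3600
    return gpu_hourly_usd / (tokens_per_hour / 1000)

# $3/hr H100 at ~1,000 aggregate tokens/sec (batched):
print(round(cost_per_1k_tokens(3.0, 1_000), 4))  # 0.0008
# At 200 tokens/sec the advantage shrinks to roughly 3.5x vs $0.015/1K:
print(round(cost_per_1k_tokens(3.0, 200), 4))    # 0.0042
```

The sensitivity to throughput is the point: batched serving is what makes self-hosting cheap, not the GPU price alone.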

When closed wins

  • Frontier capability. GPT-4o and Claude 3.5+ still beat open models on hard reasoning, multi-step planning, and code generation.
  • Tool-use reliability. This is the biggest gap. Closed models call tools more reliably.
  • Operational simplicity. Hosted inference is easier than self-hosting. If your team doesn't have ML-ops experience, closed wins.

My current production split

For client work:

  • Internal tools, classification, draft generation: open-weights via Groq or Together AI
  • Customer-facing high-stakes work: Claude or GPT
  • Privacy-sensitive workloads: self-hosted DeepSeek or Llama on the client's own infra
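
That split can be sketched as a tiny router. The model identifiers below are illustrative labels, not exact API model names:

```python
def pick_model(workload: str, privacy_sensitive: bool = False) -> str:
    """Route a request per the split above (illustrative names)."""
    if privacy_sensitive:
        # Never leaves the client's infra.
        return "self-hosted:deepseek-v3"
    if workload in {"internal-tool", "classification", "draft"}:
        # Cheap hosted open-weights.
        return "groq:llama-3.3-70b"
    # Customer-facing, high-stakes: frontier closed model.
    return "anthropic:claude"
```

In practice the routing signal comes from the calling service, not free text, but the shape is the same.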

For my own chat widget - open-weights via Groq. Cost-effective, fast, good enough for the workload.

Self-hosting reality

Running open-weights yourself is operationally non-trivial:

  • GPU procurement (H100s are still hard to get on demand)
  • Inference server (vLLM or TGI)
  • Load balancing, auto-scaling
  • Monitoring (model-specific metrics matter)
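
If you do self-host, vLLM's OpenAI-compatible server is a common starting point. A minimal launch sketch; the tensor-parallel size depends on your GPUs, and flags should be checked against current vLLM docs:

```shell
pip install vllm

# Serve Llama 3.3 70B across 4 GPUs with an OpenAI-compatible API on :8000.
vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --tensor-parallel-size 4 \
  --max-model-len 8192 \
  --port 8000
```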

For most teams, hosted open-weights (via Groq, Together, Fireworks, etc.) is the right answer. You get the open-weights cost benefit without the ops overhead.
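
Hosted open-weights is just an HTTP call: Groq, Together, and Fireworks all expose OpenAI-compatible chat-completions endpoints. A sketch of the request body (the endpoint URL and model name follow Groq's public docs and may change):

```python
import json

# OpenAI-compatible endpoint (per Groq's docs; verify before use).
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(model: str, prompt: str) -> str:
    """Serialize an OpenAI-style chat-completions request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

body = build_request("llama-3.3-70b-versatile", "Classify this support ticket.")
```

POST that body with an Authorization: Bearer header and you're done. Because the API shape is shared, swapping providers is usually just a base-URL change.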

What I'd watch

  • DeepSeek-class models continuing to close the frontier gap
  • Better tool-use in open-weights (it's improving fast)
  • Local inference on consumer hardware getting more capable (Llama 3.3 70B runs on a Mac M3 Ultra; that's mind-blowing)

The open-weights era is real. Hybrid is the right architecture.


Tags: ai, open-source, deepseek, llama

Want to discuss this topic?

I'm always happy to dive deeper. Reach out if you have questions or want to collaborate.
