Once you have picked a tool (see Which tool should I use?), the next question is which model to run inside it. Most modern coding tools let you switch between several models, and the right choice depends on the task, not just on which model is “best” in the abstract.
If you are unsure how you actually reach a model, start with Model access. If you specifically want a local or offline route, see Local models and Ollama.
After working through this page, students should be better able to:
The table below is a rough practical map, optimised for actual work rather than benchmark bragging.
| Model family | Best use | Relative quality | Main tradeoff |
|---|---|---|---|
| Claude Opus | hardest debugging, architecture, forensics, ambiguous root-cause work, long chains of reasoning | usually the strongest overall reasoning and writing in this set | slower, more expensive, more session-limit anxiety |
| Claude Sonnet | everyday serious coding, implementation, reviews, normal debugging | close to Opus on many tasks, but less reliable on the nastiest multi-step cases | cheaper and faster, but gives up some depth |
| GPT-5.4 | strong general coding agent, implementation, edits across a repo, good default when you want throughput plus solid quality | roughly in the top working tier; usually not as prose-heavy as Opus, but very capable | quality can depend more on harness and provider policy |
| GPT-5 mini | cheaper, faster helper for boilerplate, small transforms, grep-and-fix style tasks, subagents | clearly below the big frontier models, but often good enough for scoped tasks | weaker judgement, easier to drift on larger changes |
| Qwen larger open models | local or self-hosted coding help, decent code generation, useful when privacy, control, or cost matter | surprisingly strong for open weights, but usually below top proprietary models on hard reasoning | more setup, more variance by size and quantisation, weaker long-horizon reliability |
| Kimi or Kimi K2 style models | long-context reading, broad synthesis, sometimes very good value | can be excellent for some analysis and synthesis workloads | availability and deployment paths are less universal; coding reliability still depends on task and harness |
| Small Ollama local models (7B-ish class) | quick snippet help, toy examples, autocomplete-style assistance, introductory-course use | good for lightweight help, not frontier-class | weak on difficult repo work, planning, and deep debugging |
| Mid or large Ollama open models (14B, 32B, 70B-ish class) | better local coding assistant, document reading, medium-difficulty implementation | much better than tiny local models; still usually below top proprietary models on the hardest work | hardware hungry, slower, and still less reliable than Opus or top GPTs on tough tasks |
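
One way to read the table is as a lookup from task type to model family. The sketch below is only an illustration of that idea in Python: the task labels, the `ROUTES` dictionary, and the `pick_model` function are hypothetical names invented here, and the mapping loosely paraphrases the "Best use" column rather than defining any official rule. The fallback follows the table's description of GPT-5.4 as a good default.

```python
# Hypothetical routing table. Task labels are made up for illustration;
# the model families come from the "Best use" column of the table above.
ROUTES = {
    "hard_debugging": "Claude Opus",
    "architecture": "Claude Opus",
    "everyday_coding": "Claude Sonnet",
    "code_review": "Claude Sonnet",
    "repo_wide_edits": "GPT-5.4",
    "boilerplate": "GPT-5 mini",
    "private_or_offline": "Qwen larger open models",
    "long_context_reading": "Kimi K2 style models",
    "quick_snippet": "Small Ollama local models",
}

# Assumption: fall back to the family the table calls a "good default".
DEFAULT_MODEL = "GPT-5.4"


def pick_model(task: str) -> str:
    """Return the suggested model family for a task label, or the default."""
    return ROUTES.get(task, DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("hard_debugging", "boilerplate", "something_else"):
        print(f"{task}: {pick_model(task)}")
```

A lookup like this is deliberately crude. It ignores cost, speed, and the harness the model runs in, all of which the table lists as tradeoffs, so treat it as a starting point for discussion rather than a policy.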
A short practical ranking by task:
A simple decision rule:
Very roughly, from strongest to weakest on hard work:
The biggest caveat is that the harness around the model matters a lot. A good client with good context handling, subagents, and tooling can easily make a slightly weaker model feel better in practice than a stronger model in a worse harness.
In other words:
AGENTS.md and skills) often beats a stronger model in browser chat
A practical pattern several practitioners in this course use is:
This is not the only pattern, but it illustrates that model choice is rarely “pick one and stick with it”. Most serious workflows mix models depending on the task.
Try these short scenarios before looking at the suggested answers:
Suggested answers: