Prompt engineering uses carefully written instructions to guide a standard AI model, while fine-tuning retrains the model itself on your specific data. For most productivity work, good prompting is faster and cheaper; fine-tuning becomes worthwhile when you need the model to handle an entirely new domain or recognize very specialized patterns. The choice comes down to whether you're asking the model to do something within its existing capability (prompt engineering) or teaching it a genuinely new skill (fine-tuning).
Fine-tuning means training an AI model on your specific data. You provide hundreds or thousands of examples (e.g., "here's how I write project summaries"), and the model learns your style, priorities, and patterns. Once fine-tuned, it behaves differently from the base model: more like you.
Prompt engineering means optimizing the instructions you give the model at request time. You refine your wording, add context, and provide examples, all without changing the underlying model. This is what most productivity users do and should do.
The common mistake: assuming fine-tuning is necessary for customization. It rarely is. Fine-tuning is expensive (typically $25–100+ per model creation), slow (24+ hours of training time), and requires 100+ high-quality examples. For most productivity use cases, a well-engineered prompt is the better trade-off.
You need the AI to summarize meetings your way. Instead of fine-tuning, write a 300-word example of your ideal summary, show it to the AI, and ask it to match that style going forward. Otter.ai or Claude can follow that instruction in real time, with zero training required.
You want AI to classify tasks into your custom categories ("urgent-external," "quick-win," "blocked-by-other"). Rather than train a model, write 10 clear examples of each category and include them in every prompt. Todoist AI and Zapier integrations handle this pattern effortlessly.
The rule of thumb: if you can express the rule in text, prompt engineering is faster and cheaper. Fine-tuning only wins when the rule is too complex or implicit to write down, or when you need to process thousands of items and can't afford per-request API costs.
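The task-classification scenario above can be sketched as a small prompt builder. This is a minimal illustration, not a definitive implementation: the category names come from the text, while the example tasks, the `EXAMPLES` dictionary, and the `build_classification_prompt` function are hypothetical. The resulting string would be sent to whichever model or tool you already use.

```python
# Hypothetical examples: category names are from the text above,
# the task descriptions are invented purely for illustration.
EXAMPLES = {
    "urgent-external": [
        "Client asked for the revised contract by end of day",
        "Vendor invoice is overdue and they are calling",
    ],
    "quick-win": [
        "Rename the shared folder to match the new project name",
        "Approve the pending calendar invite",
    ],
    "blocked-by-other": [
        "Publish the report once legal signs off",
        "Deploy after the design team delivers icons",
    ],
}


def build_classification_prompt(task: str, examples: dict[str, list[str]]) -> str:
    """Return a few-shot prompt asking the model to label one task."""
    lines = [
        "Classify the task into exactly one of these categories: "
        + ", ".join(examples)
        + ".",
        "",
        "Examples:",
    ]
    # Show every labeled example in a consistent Task/Category layout.
    for category, samples in examples.items():
        for sample in samples:
            lines.append(f"Task: {sample}")
            lines.append(f"Category: {category}")
            lines.append("")
    # End with the new task and an open label for the model to complete.
    lines.append(f"Task: {task}")
    lines.append("Category:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_classification_prompt("Reply to the customer's refund request", EXAMPLES))
```

Because the rules live in `EXAMPLES`, changing a category means editing a dictionary entry rather than retraining anything.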
You're running a business where the AI generates customer communications, project reports, or internal documentation. You have 500+ examples of perfect outputs ("here's how we write project status reports"). The cost of a slightly off tone across thousands of documents exceeds the $50 fine-tuning cost. In this case, fine-tuning a small model (like GPT-3.5) to your exact style creates measurable ROI.
You're building a dedicated productivity tool for your team and need to embed AI logic directly (not via API). Fine-tuning a smaller model and deploying it locally or on-device becomes practical. This saves API costs on high-volume queries and avoids per-request latency.
You have continuous, proprietary data (meeting recordings, chat logs, decision archives) that gives you a structural edge. Fine-tuning on that proprietary data creates a moat competitors can't replicate. But this is rare for individual productivity; it's enterprise-level customization.
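The volume argument in these scenarios is a break-even calculation: a one-time training cost is repaid by no longer sending in-prompt examples with every request. The sketch below is a rough model under stated assumptions. The $50 figure comes from the text; the 1,500 example tokens per request and the price of $0.001 per 1,000 tokens are placeholders, not real pricing. It also ignores that a fine-tuned model may be billed at a different per-token rate.

```python
import math


def break_even_requests(
    fine_tune_cost: float,
    example_tokens: int,
    price_per_1k_tokens: float,
) -> int:
    """Requests needed before shorter prompts repay the one-time training cost."""
    # Money saved per request by dropping the in-prompt examples.
    saving_per_request = example_tokens / 1000 * price_per_1k_tokens
    if saving_per_request <= 0:
        raise ValueError("example_tokens and price_per_1k_tokens must be positive")
    return math.ceil(fine_tune_cost / saving_per_request)


if __name__ == "__main__":
    # Assumed inputs: $50 training cost (from the text),
    # 1,500 example tokens per prompt and $0.001 per 1k tokens (placeholders).
    print(break_even_requests(50.0, 1500, 0.001))  # -> 33334
```

With these placeholder numbers the training cost pays for itself only after tens of thousands of requests, which is why volume matters so much to the decision.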
Fine-tuning does not shrink a model, but it can let a smaller, faster model do a job that would otherwise need a larger general-purpose one. The trade-off is adaptability. If you fine-tune on Q1–Q2 data and your workflow changes in Q3, your fine-tuned model is now misaligned and retraining is expensive. A prompt-engineered system adapts immediately because the rules live in the prompt, not the weights.
Fine-tuning also requires careful data quality. Even a handful of poorly labeled training examples can teach the model bad behavior. Prompt engineering is more forgiving: a bad example in your prompt is fixed by editing the prompt, and it never permanently changes the model.
There's also a middle ground: retrieval-augmented generation (RAG). Instead of fine-tuning, you feed the AI your recent examples at prompt time ("here are 5 previous summaries I approved"). This gives fine-tuning-like customization without the training cost, though it consumes tokens and requires engineering to implement.
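The retrieval step described above can be sketched as follows. This is a toy illustration under loud assumptions: word-overlap (Jaccard) similarity stands in for the embedding search a production RAG system would typically use, and the function names and sample summaries are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_examples(query: str, approved: list[str], k: int = 5) -> list[str]:
    """Return the k approved summaries most similar to the query."""
    q = tokenize(query)
    ranked = sorted(approved, key=lambda doc: jaccard(q, tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, approved: list[str], k: int = 5) -> str:
    """Assemble a prompt that shows retrieved examples before the new notes."""
    parts = ["Here are previous summaries I approved:"]
    for i, example in enumerate(retrieve_examples(query, approved, k), start=1):
        parts.append(f"{i}. {example}")
    parts.append("")
    parts.append("Summarize the following notes in the same style:")
    parts.append(query)
    return "\n".join(parts)


if __name__ == "__main__":
    # Invented sample data for illustration only.
    approved = [
        "Budget review: approved Q3 spend, owner Dana",
        "Hiring sync: two offers sent",
        "Budget planning: Q4 forecast due Friday",
    ]
    print(build_prompt("budget meeting about Q3 spend", approved, k=2))
```

The token cost mentioned above shows up directly here: every retrieved example lengthens the prompt, so `k` is the knob that trades customization against per-request cost.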
Do you have 100+ labeled examples? **Yes** → Fine-tuning might be worth exploring. **No** → Prompt engineer.
Is your customization rule complex and non-verbal (e.g., "matching the energy of my notes")? **Yes** → Fine-tuning. **No** → Show examples in the prompt.
Do you process 10,000+ requests monthly? **Yes** → The fine-tuning cost per request becomes negligible. **No** → API cost isn't your bottleneck, so stay with prompting.
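One possible reading of these three questions, written as a small helper. The thresholds (100 examples, 10,000 monthly requests) come from the text; how the answers combine is an assumption of this sketch, and the `recommend` function is hypothetical.

```python
def recommend(
    labeled_examples: int,
    rule_is_hard_to_verbalize: bool,
    monthly_requests: int,
) -> str:
    """Map the three checklist answers to a suggested approach."""
    # Without enough labeled data, fine-tuning is not a realistic option.
    if labeled_examples < 100:
        return "prompt engineering"
    # Assumed combination: either an implicit rule or high volume
    # is enough to make fine-tuning worth evaluating.
    if rule_is_hard_to_verbalize or monthly_requests >= 10_000:
        return "consider fine-tuning"
    # Enough data exists, but the rule can be written down and volume is low.
    return "prompt engineering"


if __name__ == "__main__":
    print(recommend(labeled_examples=500, rule_is_hard_to_verbalize=False, monthly_requests=200))
```

Treat the output as a starting point for the decision rather than a verdict; the text's advice to test a good in-prompt example first still applies.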
Try this: Next time you're tempted to "train the AI" on your style, write a 200-word example of what you want, add it to your prompt, and test. Most people find that a single good example in the prompt matches or exceeds the quality they expected from fine-tuning, with zero setup cost and instant iteration.