Brand voice, not bot voice — what we learned wiring AI into 100 captions
If you've ever asked ChatGPT to "write me an Instagram caption for a bakery," you know what AI sounds like by default. It opens with "🍞 Indulge in the warm aroma of freshly baked bread!" and ends with five hashtags including #foodie and #yum. It is recognisably not you. Customers can tell.
We spent a month wiring AI caption generation into Postpilot's Compose page and watched what happens when SMBs try to make AI sound like them. Here are the three changes that actually moved the needle.
1. Feed the model your last 20 posts — not a brand-voice questionnaire
The intuitive approach is to ask the customer 10 questions about their brand voice: tone, formality, emoji policy, preferred openers. We tried that. The captions came back sounding like brand-voice answers — generic enough to fit a wedding planner or a tax advisor.
The version that worked: skip the questionnaire entirely. On first connect, Postpilot reads the customer's last 20 published posts directly from Instagram and Facebook. That's the brand voice. The AI sees the actual hooks, the actual emoji density, the actual sentence rhythm.
For Müller Bakery, that meant the AI noticed:
- The owner opens with one short factual statement ("Three hours of laminating.") and never with a question.
- They use one emoji per post, usually at the end.
- They never use #ad, #sponsored, or #foodie — only the bakery name and the city.
You can't get that from a questionnaire. You get it from the data.
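The context pass described above can be sketched in a few lines. This is a minimal illustration, not Postpilot's actual code — names like `Post` and `build_voice_context` are hypothetical, and the real system reads posts via the platform APIs rather than taking them as a list:

```python
from dataclasses import dataclass

@dataclass
class Post:
    caption: str
    platform: str  # e.g. "instagram" or "facebook"

def build_voice_context(posts: list[Post], limit: int = 20) -> str:
    """Turn the most recent published captions into a few-shot context block.

    The captions themselves carry the brand voice: hooks, emoji density,
    sentence rhythm. No questionnaire answers are involved.
    """
    recent = posts[:limit]
    blocks = [
        f"--- Post {i + 1} ({p.platform}) ---\n{p.caption}"
        for i, p in enumerate(recent)
    ]
    header = (
        f"These are the brand's last {len(recent)} published captions. "
        "Match their tone, emoji density, and sentence rhythm.\n\n"
    )
    return header + "\n\n".join(blocks)
```

The key design choice is that the captions go into the prompt verbatim, as examples to imitate, rather than being summarised into adjectives first — summarising is exactly the questionnaire failure mode in data form.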
2. Show the AI which past posts performed — not just which ones exist
The first version weighted all 20 posts equally. The captions got better at sounding like the brand but didn't get better at engaging.
The second version added a ranking pass: for each platform, we sort past posts by saves + comments + shares per follower. The AI is told "these eight posts performed well; the other twelve didn't." Now the suggested captions skew toward the hooks that actually convert.
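The scoring behind that ranking pass could look like the sketch below. Field names and the `split_by_performance` helper are illustrative assumptions, not the production implementation:

```python
def engagement_score(post: dict, followers: int) -> float:
    """Saves + comments + shares, normalised per follower."""
    raw = post["saves"] + post["comments"] + post["shares"]
    return raw / max(followers, 1)  # guard against zero-follower accounts

def split_by_performance(posts: list[dict], followers: int, top_n: int = 8):
    """Split past posts into high performers and the rest, best first."""
    ranked = sorted(
        posts,
        key=lambda p: engagement_score(p, followers),
        reverse=True,
    )
    return ranked[:top_n], ranked[top_n:]
```

Normalising per follower is what makes the ranking comparable across accounts of very different sizes; raw save counts would just reward the biggest audience.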
The shift was sharp. Before this change, AI-drafted captions were accepted unchanged in 31% of cases. After, that jumped to 58%. Time-to-post dropped from 4 minutes per draft to 90 seconds.
3. Make the model refuse, not invent
The most common failure mode of generic AI captioning is the hallucinated specific. Asked to draft a caption for a flat-lay of sourdough, GPT happily invents "Our 100-year-old starter from Bavaria" — when the bakery's starter is three years old and from Bordeaux.
The fix is a small but boring one: every prompt now includes "If you don't have a specific fact about the brand from the last 20 posts, don't invent one. Stay generic." Combined with the brand-voice context, the model stops hallucinating specifics about the business. The captions get less colourful — but they get correct.
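In prompt terms, the rule is just a fixed instruction appended to every request. A minimal sketch, with `build_prompt` as a hypothetical helper (the real prompt assembly is more involved):

```python
# The grounding rule quoted above, included verbatim in every request.
GROUNDING_RULE = (
    "If you don't have a specific fact about the brand from the "
    "last 20 posts, don't invent one. Stay generic."
)

def build_prompt(voice_context: str, image_brief: str) -> str:
    """Assemble one caption request: brand voice, grounding rule, task."""
    return "\n\n".join([
        voice_context,
        GROUNDING_RULE,
        f"Draft a caption for: {image_brief}",
    ])
```

The rule works precisely because it sits next to the brand-voice context: the model has real facts to draw on, so "stay generic" becomes the fallback rather than the default.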
This is the change customers notice most. "It doesn't make stuff up about my business anymore" is the phrase we hear back from pilots.
What we still can't do
- Voice drift over time. As a brand evolves (new products, new tone), the 20-post window goes stale. We don't yet auto-refresh; you have to manually re-run the brand-voice analysis. Phase 2.
- Multilingual voice in one account. If you post in German and English from the same account, the brand voice gets averaged. Workaround for now: keep one Postpilot org per language.
- Photo-aware captioning. The AI sees only the caption text, not the image. A description-aware model is on the Phase 2 roadmap.
If you want to see this in action: start a free Pro trial and connect your Instagram. The first AI caption you see is generated from your last 20 posts — no questionnaire, no setup.