
A New Study Puts Leading Tools to the Test
As AI continues to reshape how we communicate and work, email remains a high-stakes proving ground for machine-generated language. From internal memos to customer outreach, writing with clarity, empathy, and persuasion is still hard to fake — even for the most advanced AI assistants.
A recent Washington Post study set out to compare five of today’s most popular AI assistants in a “bake-off” to determine which one actually writes the most human-sounding email. As a company that builds search and AI-driven applications, we were intrigued by how the assistants performed, and by what the results suggest about the future of natural language interfaces.
The Setup: 5 Prompts, 5 AIs, Expert Judging
Columnist Geoffrey A. Fowler tested five email-writing prompts, from difficult apologies to awkward breakups to a humorous work proposal, and collected responses from these tools:
- Microsoft Copilot
- ChatGPT (OpenAI)
- Gemini (Google)
- DeepSeek
- Claude (Anthropic)
He also submitted his own human-written drafts and had a panel of expert judges evaluate all responses blindly. The judging panel included bestselling authors and communications coaches like Ann Handley, Erica Dhawan, and Carmine Gallo.
AI Assistants: Writing Ranked by Human-Likeness
5. Microsoft Copilot (23/100)
Copilot, integrated with Microsoft 365, was flagged as overly formal, robotic, and generic. Judges criticized its overuse of template-like phrases and a lack of emotional warmth.
4. ChatGPT (43/100)
OpenAI’s ChatGPT scored points for directness and clarity, especially in business contexts. However, it often came across as stiff and impersonal in emotionally charged messages.
3. Google’s Gemini (44/100)
Gemini performed much like ChatGPT: its emails were structurally sound but still felt “off” to human readers. The writing lacked warmth and occasionally sounded automated or insincere.
2. DeepSeek (45/100)
This lesser-known Chinese AI surprised judges with well-reasoned arguments and emotional clarity. However, verbosity and awkward word choice held it back from the top spot.
1. Claude by Anthropic (50/100) 🥇
Claude stood out for delivering tone, empathy, and emotional intelligence that felt genuinely human. It even edged out the human-written responses in some cases. Judges praised its balance of humility, humor, and context-aware writing — critical in business and personal communications alike.
“Claude uses precise, respectful language without being overly corporate or impersonal.”
– Erica Dhawan, judge and author of Digital Body Language
🔍 Why This Matters for AI-Driven Applications
At Pureinsights, we build AI applications that interact with humans — whether it’s hybrid search powered by LLMs or conversational interfaces for enterprise content. So while this study focused on email, it highlights a much bigger point:
The effectiveness of AI in enterprise settings hinges not just on language generation — but on emotional intelligence and human-centered communication.
As AI gets integrated into everyday tools like Gmail, Outlook, and Slack, the ability to sound natural, helpful, and trustworthy becomes a competitive advantage.
This also reinforces a key principle behind our Discovery platform: AI needs orchestration, context, and customization to deliver value. Generic responses don’t cut it — whether you’re writing an email or building a virtual agent.
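As a brief aside on the “hybrid search powered by LLMs” point: one common way to combine keyword results with vector results is reciprocal rank fusion (RRF). The sketch below is a generic illustration of that fusion step, not a description of Discovery’s internals; the two input rankings are assumed to come from a keyword engine (e.g. BM25) and an embedding index.

```python
# Hypothetical sketch of hybrid search via reciprocal rank fusion (RRF):
# merge a keyword ranking and a vector ranking into one result list.
# Only the fusion step is shown; the input rankings are assumed given.

def reciprocal_rank_fusion(rankings, k=60):
    """rankings: a list of ranked doc-id lists; returns fused ids, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of any ranking contribute more score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]   # e.g. from BM25
vector_hits  = ["doc1", "doc5", "doc3"]   # e.g. from an embedding index
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# doc1 and doc3 rise to the top because both rankings agree on them
```

The design point is the same one the study makes about email: no single signal is enough, and blending sources of evidence produces results that feel more relevant to humans.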
💡 What This Taught Us About AI Assistants and Context
After reading the Washington Post bake-off, we ran our own experiment: we asked ChatGPT to draft this blog — and then submitted that draft to Claude for a rewrite. We also asked a few humans to weigh in.
Surprisingly (or maybe not), the ChatGPT version was the favorite. While Claude produced a polished, emotionally intelligent draft, ChatGPT’s version more closely matched Pureinsights’ voice and writing style. That likely wasn’t accidental: we’ve used ChatGPT extensively across past content, so its output has been shaped by that familiarity and repetition.
This small but meaningful test reminded us of a broader truth: AI assistants don’t operate in a vacuum. Context and continuity matter.
✅ Key Takeaways
- Context trumps general capability. ChatGPT’s output aligned more naturally with our brand because it’s been exposed to more of our content — not because it’s inherently “better” than Claude.
- Claude still demonstrated impressive tone and empathy. While it won the Washington Post’s bake-off, Claude lacked the familiarity and grounding in Pureinsights’ writing voice, which proved more important in this context.
- This study tested general-purpose AI assistants. These were broad LLMs, not purpose-built email writing tools. As specialized assistants emerge, they may outperform in targeted use cases like sales, support, or executive comms.
- RAG and long-term context matter. This experiment reinforces a core Pureinsights belief: pairing AI with your own content, through Retrieval-Augmented Generation or ongoing interaction, makes output more relevant, useful, and human-aligned (a minimal sketch follows this list).
- AI assistants work best in tuned ecosystems. The future of communication-focused AI isn’t just about model strength — it’s about context, content integration, and how well the system knows you.
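To make the RAG takeaway concrete, here is a minimal, hypothetical sketch of the pattern: retrieve on-brand writing samples, then ground the drafting prompt in them. The `toy_embed` and `toy_generate` helpers are placeholders we invented so the example runs end to end; a real system would use actual embedding and LLM endpoints plus a proper vector index. This is the general shape of the technique, not how Discovery itself is implemented.

```python
# Minimal RAG sketch, assuming a generic embed()/generate() pair.
# toy_embed() and toy_generate() are stand-ins so the example runs;
# in production they would be your embedding model and LLM endpoints.

import math
from collections import Counter

def toy_embed(text):
    """Toy bag-of-words vector over a fixed vocabulary (illustration only)."""
    vocab = ["search", "email", "customer", "apology", "platform", "ai"]
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def toy_generate(prompt):
    """Stand-in for an LLM call; a real system would call a model here."""
    return f"[LLM draft grounded in:\n{prompt}]"

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    """Return the k corpus snippets most similar to the query."""
    q = toy_embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, toy_embed(doc)), reverse=True)[:k]

def draft_email(task, corpus):
    """Ground the drafting prompt in retrieved, on-brand writing samples."""
    samples = "\n---\n".join(retrieve(task, corpus))
    prompt = (f"Task: {task}\n\n"
              f"Match the voice of these writing samples:\n{samples}")
    return toy_generate(prompt)

brand_corpus = [
    "Our AI search platform helps customers find answers fast.",
    "We apologize for the delay and are making it right.",
    "Email updates from our team are short, warm, and direct.",
]
print(draft_email("apology email to a customer about a delayed rollout", brand_corpus))
```

The retrieval step is what gives the model your voice: the generator never has to guess your style, because examples of it arrive with every prompt.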
Want to see how GenAI can speak in your voice?
If you’re exploring how to apply RAG, LLMs, or hybrid search to your enterprise content, get in touch with us or learn more about our Pureinsights Discovery platform. We specialize in building AI solutions that don’t just generate — they understand.