When a General-Purpose AI Stops Being Enough
The large language models available today — ChatGPT, Claude, Gemini — have genuinely democratised assisted writing. For emails, summaries, short articles, and one-off tasks, they're extraordinary.
The problems start when the document exceeds 20–30 pages. That's where these models run into structural limits that no amount of prompting can overcome. Understanding those limits is the starting point for choosing the right tool.
This guide analyses what's actually available in 2025 for long-document work, what each tool's real strengths are, and when a specialised tool makes more sense than a general-purpose model.
Technical Background: Context Windows and Coherence
To understand the differences between tools, you need to understand the concept of the context window: the maximum amount of text a model can process in a single interaction.
| Model | Context Window | Approximate Pages |
|---|---|---|
| GPT-4o | 128,000 tokens | ~96 pages |
| Claude 3.5 Sonnet | 200,000 tokens | ~150 pages |
| Gemini 1.5 Pro | 1,000,000 tokens | ~750 pages |
| GPT-5.4 | 128,000 tokens | ~96 pages |
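The page estimates in the table are consistent with a simple conversion, sketched below. It rests on two assumptions the table does not state: roughly 0.75 English words per token and roughly 1,000 words per dense page. Both constants are rules of thumb rather than vendor figures, and the function name is illustrative.

```python
# Rough conversion from a context window (in tokens) to pages of prose.
# WORDS_PER_TOKEN and WORDS_PER_PAGE are assumed rules of thumb, chosen
# because they reproduce the page estimates in the table above.
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 1000


def approx_pages(context_tokens: int) -> int:
    """Estimate how many pages of prose fit in a context window."""
    words = context_tokens * WORDS_PER_TOKEN
    return round(words / WORDS_PER_PAGE)


for name, tokens in [
    ("GPT-4o", 128_000),
    ("Claude 3.5 Sonnet", 200_000),
    ("Gemini 1.5 Pro", 1_000_000),
]:
    print(f"{name}: ~{approx_pages(tokens)} pages")
```

Changing the page density changes the result proportionally: at 500 words per page the same windows would hold about twice as many pages, so the figures are best read as order-of-magnitude guides.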
On paper, Gemini 1.5 Pro looks like the obvious choice for long documents. In practice, two additional problems apply to all of them:
- Degradation in long contexts: models retrieve and use content less reliably when it sits deep inside a very long context, particularly in the middle. This is well documented as the "lost in the middle" phenomenon.
- No enforced coherence across sessions: if you need to generate a document across multiple sessions (because it's very long, or because you need revisions), each session starts from scratch.
Specialised tools address both problems with multi-agent architectures: instead of fitting everything into one context window, they divide the work across agents that communicate with each other and share a persistent global context.
General-Purpose Models: Honest Analysis
ChatGPT (GPT-4o and GPT-5.4)
Strengths:
- Excellent writing quality across most registers and styles
- Well-suited to clearly-defined individual sections
- Strong comprehension of complex, multi-part instructions
- Familiar interface with a gentle learning curve
Limitations for long documents:
- No automatic coherence between sessions or chapters
- You have to manually manage context (pasting summaries, re-establishing scope)
- No native way to upload 15 sources and have them all integrated coherently
- Word export only via third-party plugins
Best for: documents up to 30–40 pages with active user supervision; individual sections that the user assembles manually.
Claude (Anthropic)
Strengths:
- 200k-token context window (the most practically useful in this class)
- Excellent for analysing long documents and extracting structured information
- Better than GPT for tasks requiring reasoning across extended texts
- Consistently strong writing quality
Limitations for long documents:
- No native document export to Word
- The web interface has limits on the size of file attachments
- No persistent glossary management across sessions
- No document-specific architecture for generating structured, multi-chapter outputs
Best for: analysis of existing documents, writing long sections with substantial prior context, reviewing and editing drafts.
Gemini 1.5 Pro and Gemini 2.0
Strengths:
- Largest available context window (1M tokens)
- Can read long PDFs directly
- Integration with Google Workspace (Docs, Drive)
- Strong multimodal capabilities (text + images together)
Limitations for long documents:
- Writing quality in formal prose is generally below GPT and Claude
- "Lost in the middle" effect is more pronounced with very long windows
- No chapter-level coherence management for separately generated sections
- Documents produced in Gemini Advanced have basic formatting
Best for: documents in English, analysis of extensive PDFs, workflows deeply integrated with Google tools.
Specialised Tools: When They Have a Clear Advantage
Tools built specifically for long documents (like Nomos) use a fundamentally different architecture from general-purpose chatbots.
How Multi-Agent Architecture Works
Instead of a single conversation, the process is divided into phases:
- Analysis: the system reads all sources you upload and builds a map of concepts, terminology, and document structure
- Planning: a complete document outline is generated before any section is written
- Parallel generation: multiple specialised agents write chapters simultaneously — but all with access to the same global context
- Active coherence checking: an "editor" agent ensures terminology and tone are consistent throughout the full document
This addresses two of the main problems general-purpose models have with long documents: cross-chapter coherence and multi-source integration.
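The four phases can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of Nomos or any other product: every name here (`GlobalContext`, `analyse`, `plan`, `write_chapter`, `edit`, `generate_document`) is hypothetical, and the analysis, planning, and writer functions are stubs standing in for model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class GlobalContext:
    """Shared state that every agent reads (hypothetical structure)."""
    glossary: dict[str, str] = field(default_factory=dict)  # variant -> preferred term
    outline: list[str] = field(default_factory=list)        # chapter titles in order


def analyse(sources: list[str]) -> dict[str, str]:
    """Analysis phase (stub): a real system would derive terminology from the sources."""
    return {"log-in": "login", "sign-in": "login"}


def plan(sources: list[str]) -> list[str]:
    """Planning phase (stub): fix the full outline before anything is written."""
    return ["Introduction", "Methods", "Conclusion"]


def write_chapter(title: str, ctx: GlobalContext) -> str:
    """Writer agent (stub): stands in for a model call that receives the shared context."""
    position = ctx.outline.index(title) + 1
    return f"{title} (chapter {position} of {len(ctx.outline)}): the sign-in flow is described here."


def edit(chapters: list[str], ctx: GlobalContext) -> list[str]:
    """Editor agent: replace every glossary variant with its preferred term."""
    edited = []
    for text in chapters:
        for variant, preferred in ctx.glossary.items():
            text = text.replace(variant, preferred)
        edited.append(text)
    return edited


def generate_document(sources: list[str]) -> list[str]:
    ctx = GlobalContext(glossary=analyse(sources), outline=plan(sources))
    # Parallel generation: each writer gets the same ctx; map() keeps outline order.
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(lambda title: write_chapter(title, ctx), ctx.outline))
    return edit(drafts, ctx)


for chapter in generate_document(["source text"]):
    print(chapter)
```

The design point is that the writers never talk to the user or to a chat history. They read one shared context object, so each chapter knows where it sits in the outline, and the editor pass can enforce the glossary over the whole document at once.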
Use Cases Where Specialised Tools Win Clearly
Theses and dissertations: Academic structure requires each section to explicitly reference preceding ones. The introduction's hypotheses must be answered in the conclusions; the theoretical framework must connect directly to the methodology. A general-purpose model can't do this automatically when sections are generated in separate sessions.
Annual reports and corporate documents: Brand identity requires tonal consistency across 150 pages. The previous year's report must serve as the stylistic reference. A general-purpose model doesn't remember the previous report unless you paste it in full with every prompt.
Technical manuals: Terminology must be perfectly consistent. In a 200-page manual, a term that appears 80 times must be used identically in every instance. This is exactly the kind of constraint multi-agent systems are built to handle.
Book translation: Characters, locations, and the author's voice must be preserved from page 1 to page 400. A model that translates fragment by fragment simply cannot guarantee this.

Full Comparison: All Use Cases vs. All Tools
| Use Case | ChatGPT | Claude | Gemini | Specialised Tool |
|---|---|---|---|---|
| Email or short article | Ideal | Ideal | Good | Unnecessary |
| Section of 10–20 pages | Good | Very good | Good | Optional |
| Document of 50 pages | Adequate | Good | Adequate | Recommended |
| Thesis / dissertation (80–150 pages) | No | Marginal | No | Necessary |
| Corporate report (150 pages) | No | No | No | Necessary |
| Book translation (300 pages) | No | No | Marginal | Necessary |
The Pricing Question
General-purpose models offer monthly subscription plans (~$20/month) that comfortably cover short tasks, subject to usage caps. For long documents via API, per-token costs can add up significantly.
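One reason API costs grow faster than document length is context re-sending: if each chapter is generated in its own call and all earlier chapters are passed back in as context, input tokens grow roughly with the square of the chapter count. The sketch below illustrates that arithmetic. The function name is illustrative and the prices are hypothetical placeholders, not any vendor's actual rates.

```python
def api_cost_usd(
    chapters: int,
    tokens_per_chapter: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of writing chapters one call at a time, re-sending all earlier chapters."""
    input_tokens = 0
    output_tokens = 0
    for i in range(chapters):
        input_tokens += i * tokens_per_chapter  # everything written so far
        output_tokens += tokens_per_chapter     # the new chapter
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000  # prices are quoted per million tokens


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
cost = api_cost_usd(
    chapters=10,
    tokens_per_chapter=10_000,
    input_price_per_mtok=3.0,
    output_price_per_mtok=15.0,
)
print(f"${cost:.2f}")  # $2.85
```

Under the same assumed prices, doubling the document to 20 chapters gives $8.70, about three times the cost for twice the length, and each full revision pass repeats the spend.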
Specialised tools typically operate on credits or per-project pricing. For a 100-page thesis or annual report, the cost in a specialised tool is usually in the $10–$20 range — versus the hours of manual assembly and coherence-fixing you'd spend doing it piecemeal with a general model.
The right comparison isn't tool price vs. tool price. It's total time to reach a coherent, high-quality document with each approach.
Conclusion
In 2025, general-purpose models are outstanding for writing tasks up to 30–40 pages. For longer documents, the absence of cross-session coherence and global context management makes them unsuitable without significant manual intervention.
Specialised long-document tools don't compete with ChatGPT or Claude on general versatility — they solve a specific problem: coherence at scale. That's exactly the problem that matters when you're writing a dissertation, an annual report, or translating a book.
The right choice depends on document length and type. For short tasks, any major general-purpose model is excellent. For documents over 50 pages with coherence requirements, a specialised tool will save more time than you'd expect — and produce a result that a general-purpose tool, used the same way, simply cannot match.