Overview
A comprehensive study tested how different LLMs handle massive structured-data contexts (up to 10,000 SQL tables) across various file formats. The research reveals that model choice matters far more than format optimization for large-scale structured-data tasks.
The Breakdown
- Frontier models significantly outperform open-source models on large structured-data tasks: Claude 4.5, GPT-5.2, and Gemini 2.5 Pro consistently beat DeepSeek, Llama 4, and other open-weight alternatives
- Filesystem-based context retrieval only works well for frontier models: open-source models struggle with file-native agentic operations, which helps explain why terminal and coding benchmarks are dominated by commercial models
- Token-efficient formats can backfire spectacularly: the TOON format consumed 740% more tokens than YAML on 10,000-table schemas because models couldn't effectively parse the unfamiliar syntax
- The 'grep tax' emerges at scale: unfamiliar data formats force models into expensive iterative-refinement loops, consuming far more tokens overall than simple, familiar formats like YAML
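To make the format trade-off concrete, here is a minimal sketch contrasting a familiar YAML-style schema listing with a compact, TOON-like layout. The compact syntax shown is an approximation for illustration (the study summary does not specify TOON's exact grammar), and the token counts here are just character counts, a crude proxy: the point is that the compact format wins on raw size, while the study found that parsing familiarity, not raw size, dominates total token cost at scale.

```python
# Two renderings of the same toy schema (2 tables; the study used 10,000).
tables = [
    {"name": "users", "columns": ["id", "email", "created_at"]},
    {"name": "orders", "columns": ["id", "user_id", "total"]},
]

# Familiar, verbose YAML-style rendering.
yaml_lines = []
for t in tables:
    yaml_lines.append(f"- name: {t['name']}")
    yaml_lines.append("  columns:")
    for c in t["columns"]:
        yaml_lines.append(f"    - {c}")
yaml_text = "\n".join(yaml_lines)

# Compact TOON-like rendering (hypothetical approximation of the syntax).
toon_lines = [f"tables[{len(tables)}]{{name,columns}}:"]
for t in tables:
    toon_lines.append(f"  {t['name']},{'|'.join(t['columns'])}")
toon_text = "\n".join(toon_lines)

# The compact form is smaller on disk, but per the study, an unfamiliar
# syntax can still cost far more tokens end-to-end once the model's
# iterative retrieval loops (the 'grep tax') are counted.
print(len(yaml_text), len(toon_text))
```

The compact rendering is always shorter here, which is exactly why such formats look attractive in benchmarks that measure serialized size alone; the study's finding is that this metric misleads once agentic retrieval overhead is included.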