Overview
A research paper systematically tested how different LLMs handle large-scale structured data contexts, using SQL generation tasks across schemas with up to 10,000 tables. The study finds that a model's familiarity with the context format matters more than the format's file-size efficiency when working with massive datasets.
The Breakdown
- Comprehensive testing methodology - 9,649 experiments across 11 models, testing 4 data formats (YAML, Markdown, JSON, TOON) on schemas ranging from 10 to 10,000 tables to systematically measure context-handling performance
- Model capability hierarchy confirmed - frontier models (Claude Opus, GPT, Gemini) significantly outperformed open-source models, benefiting from filesystem-based context retrieval where open-source models struggled
- The “grep tax” phenomenon discovered - TOON format, despite producing files 25% smaller, caused models to spend 138% more tokens on medium schemas and 740% more on large schemas because they were unfamiliar with its syntax
- Context format familiarity trumps efficiency - models performed better with familiar formats like YAML even when the files were larger, because they could construct effective search and refinement patterns instead of struggling with unknown syntax
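To make the size-versus-familiarity tradeoff concrete, here is a minimal sketch (not taken from the paper) that renders the same toy two-table schema in YAML and in a TOON-style tabular encoding. The TOON syntax shown is approximated from public examples, and raw character counts stand in for tokens; the point is only that a terser on-disk encoding does not imply the model retrieves from it efficiently.

```python
# Illustrative sketch (hypothetical schemas, not from the study):
# the same two-table schema as YAML and as an approximated
# TOON-style encoding, compared by raw character count.

yaml_schema = """\
tables:
  - name: users
    columns:
      - {name: id, type: int}
      - {name: email, type: text}
  - name: orders
    columns:
      - {name: id, type: int}
      - {name: user_id, type: int}
"""

# TOON-style header declares array length and field names once,
# then packs each record onto one line (syntax approximated).
toon_schema = """\
tables[2]{name,columns}:
  users,"id:int;email:text"
  orders,"id:int;user_id:int"
"""

# The terse encoding is smaller on disk, yet the paper's "grep tax"
# result shows models burned far more tokens searching it.
print(len(yaml_schema), len(toon_schema))
assert len(toon_schema) < len(yaml_schema)
```

The YAML version repeats keys like `name` and `type` for every column, which inflates file size but also gives a model familiar anchors to grep for; the TOON version removes that redundancy, and with it the landmarks the models apparently relied on.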