

On May 5, 2026, a Miami-based startup called Subquadratic emerged from stealth with a bold claim: they had built the first frontier-scale large language model (LLM) using a fully sub-quadratic sparse attention architecture. Dubbed SubQ (or SubQ 1M-Preview in its initial release), the model boasts a functional 12 million token context window, dramatic efficiency gains, and competitive performance on long-context and coding benchmarks.
This isn't just another incremental model release with a larger context window slapped on top of a standard transformer. SubQ's core innovation, Subquadratic Sparse Attention (SSA), fundamentally rethinks how attention works, achieving roughly linear scaling in compute and memory for long sequences. If the claims hold under broader scrutiny, it could mark a meaningful architectural shift away from the quadratic bottleneck that has constrained LLMs since the original Transformer paper in 2017.
Standard self-attention in transformers computes relationships between every pair of tokens in a sequence. For a context of length n, this leads to O(n²) complexity in both time and (naively) memory. Techniques like FlashAttention have optimized the practical implementation, making it faster and more memory-efficient by avoiding materialization of the full attention matrix, but they do not change the underlying scaling law. Doubling the context still roughly quadruples the attention compute.
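To make that scaling concrete, here is a minimal single-head attention sketch in NumPy. It is illustrative only; optimized kernels like FlashAttention compute the same result block-wise without materializing the full score matrix, but the total work still grows quadratically:

```python
import numpy as np

def naive_attention(Q, K, V):
    """Standard softmax attention for one head. The (n, n) score matrix
    is the quadratic bottleneck: at n = 1M tokens it has ~10^12 entries."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # (n, n): O(n^2) time and memory
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (n, d)

n, d = 4096, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = naive_attention(Q, K, V)  # doubling n quadruples the score-matrix work
```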
At modern scales (hundreds of thousands to millions of tokens), this becomes prohibitive. Real-world applications, such as analyzing entire code repositories, processing months of legal documents or chat histories, and running long-lived agents with persistent state, demand far more than the typical 128K-1M token windows of today's leading models.
Workarounds like Retrieval-Augmented Generation (RAG), chunking, summarization, and multi-agent orchestration have proliferated precisely because the base architecture fights against long, coherent reasoning. These hacks introduce fragility: lost context, compounding errors, and engineering overhead.
Subquadratic's thesis is that efficiency is intelligence for these workloads. By making long context practical and affordable, models can reason over full artifacts in one pass, preserving positional, hierarchical, and cross-reference information that fragmented approaches lose.
SSA is a content-dependent sparse attention mechanism. Instead of computing attention over all token pairs (or fixed positional patterns), the model learns to dynamically select only the relevant positions for each query token and performs exact attention over that sparse subset.
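Subquadratic has not published SSA's internals, so the PyTorch sketch below shows only the generic idea of content-dependent sparsity: each query keeps its top-k highest-scoring keys and runs exact softmax attention over that subset. Note that this toy version still does a dense scoring pass to pick the subset; a genuinely sub-quadratic method must also make the selection step cheap, for example via learned routing, clustering, or hashing:

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(Q, K, V, k=64):
    """Illustrative content-dependent sparse attention (not the published
    SSA algorithm). Each query attends exactly, but only over the k keys
    it scores highest, rather than over all n positions."""
    d = Q.shape[-1]
    scores = Q @ K.T / d ** 0.5            # (n, n) selection scores (dense here)
    vals, idx = scores.topk(k, dim=-1)     # keep the k best positions per query
    weights = F.softmax(vals, dim=-1)      # exact softmax on the sparse subset
    return torch.einsum("nk,nkd->nd", weights, V[idx])  # gather + weighted sum

n, d = 1024, 64
Q, K, V = (torch.randn(n, d) for _ in range(3))
out = topk_sparse_attention(Q, K, V)       # (1024, 64), ~k/n of the attention work
```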
The company highlights several key properties of SSA that, it says, address the limitations of prior sparse-attention approaches.
SubQ was trained in three stages; the details are promised in the forthcoming technical report.
The company positions SubQ as frontier-level, built by a team with PhDs and experience spanning Meta, Google, Oxford, Cambridge, BYU, and others. The co-founders are CEO Justin Dangel, a serial entrepreneur, and CTO Alexander Whedon, formerly of Meta and Head of Generative AI at TribeAI. They raised $29 million in seed funding at a reported ~$500M valuation from notable backers, including Tinder co-founder Justin Mateen, ex-SoftBank executive Javier Villamizar, and early investors in Anthropic, OpenAI, and Stripe.
Subquadratic reports strong results, and notes some third-party validation.
Cost claims: $8 vs. $2,600. On the RULER 128K benchmark, SubQ reportedly scored 95.1% accuracy at a cost of about $8, where the comparable Opus run cost roughly $2,600. With input/output rates of ~$0.50/$1.50 per 1M tokens and inference at 150+ tokens/sec, the company argues the "long-context tax" is effectively gone.
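Taking the quoted per-token rates at face value, the pricing is easy to sanity-check with back-of-envelope arithmetic (the request sizes below are hypothetical):

```python
# Claimed rates: ~$0.50 per 1M input tokens, ~$1.50 per 1M output tokens.
input_rate, output_rate = 0.50, 1.50       # USD per 1M tokens
in_tokens, out_tokens = 1_000_000, 10_000  # hypothetical long-context request

cost = in_tokens / 1e6 * input_rate + out_tokens / 1e6 * output_rate
print(f"${cost:.3f}")  # ~$0.515 to read a full 1M-token context and write a 10k reply
```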
Caveats: as a brand-new release, broader independent reproduction is still pending, and the model card and fuller technical report are "coming soon." Skeptics, on forums and among researchers, point to the high valuation for a seed-stage company, a history of overhyped claims in the space, and the need for public weights or API stress-testing. SWE-Bench and similar evals can be saturated or gamed, though SubQ's scores align with strong but not outlier performance.
If SSA delivers on its promises, the impacts could be substantial: affordable reasoning over entire repositories and document archives in a single pass, simpler application stacks, and far less reliance on RAG-style workarounds.
SubQ represents an ambitious bet that the next leap in LLMs comes from architectural efficiency rather than pure scale. The combination of claimed linear scaling, strong long-context benchmarks, aggressive pricing, and practical coding tools makes it one of the more intriguing releases of 2026. Early access is available now via subq.ai, with more technical details forthcoming.
For builders, researchers, and enterprises frustrated with context limitations and RAG complexity, this merits hands-on evaluation. The industry has seen many "revolutionary" claims; some deliver incrementally, and a few reshape the field. SubQ has the technical narrative, team pedigree, funding, and initial numbers to be taken seriously. The coming weeks of independent validation and user feedback will reveal whether SSA truly cracks the long-context barrier or joins the list of promising but partial solutions.
Ready to explore what SubQ (or similar frontier efficiency breakthroughs) can do for your business?
Book a discovery call with Codiste today. Our senior AI engineers can help you assess, prototype, and productionize next-generation long-context solutions tailored to your needs. Visit codiste.com or reach out directly to start the conversation. Don't let context limitations hold back your AI initiatives. The tools to move forward are here.


