SubQ is billed as the first fully subquadratic LLM. Launched in May 2026 by the Miami startup Subquadratic, it claims a 12-million-token context window with attention costs 1,000 times lower than standard transformers, thanks to an architecture called Subquadratic Sparse Attention (SSA). These figures come from the company's own benchmarks and have not yet been independently reproduced.
A startup from Miami appears out of nowhere, raises 29 million dollars and announces it has solved the problem that has been dragging down AI economics since 2017. SubQ promises costs divided by 1,000 on long contexts, a 12-million-token window and an architecture that the major labs supposedly never managed to make work. If it's true, it's the breakthrough of the decade. If it's not, it's well-packaged vaporware. And either way, it won't change anything for your AI projects this year.
- ⚠️ Unverified promise: no technical report published, closed weights, private beta only.
- 📉 Unfavorable track record: Mamba, RWKV, DeepSeek Sparse Attention. No earlier subquadratic attempt has rivaled dense transformers at frontier scale.
- 💡 Wrong bottleneck: for an SME, model cost matters less than integration cost.
- 🎯 Immediate action: existing models, properly integrated, already deliver measurable value.
SubQ: the startup promising to cut costs by 1,000x
SubQ is the model from the Miami startup Subquadratic, founded by Justin Dangel (CEO) and Alexander Whedon (CTO, former Head of Generative AI at Meta), which raised 29 million dollars in seed funding in May 2026. It claims to be the first LLM built on an entirely subquadratic architecture, with attention costs divided by 1,000 on long context windows. These claims have not been independently verified to date.
On May 5, 2026, Subquadratic emerged from stealth mode. The company, co-founded by Justin Dangel (CEO) and Alexander Whedon (CTO, former Head of Generative AI at Meta), announced SubQ 1M-Preview: the first LLM built on a fully subquadratic attention architecture.
The pitch fits in one sentence: where standard transformers compare every token to every other token (quadratic cost), SubQ selects only the relevant relationships. Announced result: a cost that grows linearly instead of quadratically.
How does the SSA architecture work?
Standard attention in a transformer is dense. Every token looks at every other token. Double the input, and computation quadruples. That's the quadratic wall.
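To make the quadratic wall concrete, here is a minimal sketch that counts query-key comparisons for a single attention head in a single layer. The function name is hypothetical and the count ignores constants such as head dimension; it only illustrates the scaling.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Query-key score computations in dense attention (one head, one layer)."""
    # Every token attends to every token, itself included.
    return n_tokens * n_tokens


for n in (1_000, 2_000, 4_000):
    print(f"{n:>6} tokens -> {dense_attention_pairs(n):>12,} comparisons")

# Doubling the input multiplies the number of comparisons by four.
ratio = dense_attention_pairs(2_000) / dense_attention_pairs(1_000)
print("growth when input doubles:", ratio)
```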
SubQ replaces this with what they call Subquadratic Sparse Attention (SSA). For each token, the model dynamically selects a small subset of relevant positions, then computes exact attention only on those. This is not fixed sparse attention like Longformer, nor a state-space approach like Mamba. SSA keeps the attention mechanism but makes it selective.
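The general idea of "select a few positions, then run exact attention on them" can be sketched as follows. This is a toy illustration, not Subquadratic's method: no technical report has been published, so the selector below (`select_positions`, a plain top-k on dot-product scores) is an assumption made purely for demonstration. This toy selector still scores every key, so it does not save any compute by itself; a genuinely subquadratic system would need a cheaper way to pick candidates.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_positions(query, keys, k):
    """Hypothetical selector: indices of the k keys with the highest score.

    Assumption for illustration only. It scores all keys, so it is O(n) per
    query and does not make the overall computation subquadratic.
    """
    ranked = sorted(range(len(keys)), key=lambda j: dot(query, keys[j]), reverse=True)
    return sorted(ranked[:k])


def sparse_attention(queries, keys, values, k):
    """Exact softmax attention, restricted to k selected positions per query."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        idx = select_positions(q, keys, k)
        weights = softmax([dot(q, keys[j]) * scale for j in idx])
        out = [sum(w * values[j][d] for w, j in zip(weights, idx)) for d in range(dim)]
        outputs.append(out)
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
queries = [[2.0, 0.0]]

top1 = sparse_attention(queries, keys, values, k=1)   # attends to one position
full = sparse_attention(queries, keys, values, k=3)   # k = n: same as dense attention
```

With `k` equal to the sequence length, the function reduces to ordinary dense attention, which is the sense in which this family of approaches "keeps the attention mechanism but makes it selective".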
In terms of algorithmic complexity, SSA is claimed to bring attention cost down from O(n²), where every token is compared against all others, to O(n·k), where k is the average number of tokens selected per position. According to The New Stack, the architecture runs 52 times faster than FlashAttention at 1 million tokens.
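A back-of-the-envelope reading of O(n²) versus O(n·k): if cost is proportional to the number of comparisons, the reduction factor is simply n / k. The sketch below applies that simplified model to the 1,000x reduction claimed at 12 million tokens. The function names are hypothetical, and the implied k is an inference from this toy model, not a parameter Subquadratic has disclosed. Wall-clock speedups depend on kernels and hardware, so they will not match operation counts exactly.

```python
def dense_cost(n: int) -> int:
    return n * n          # every token compared with every token


def sparse_cost(n: int, k: int) -> int:
    return n * k          # every token compared with k selected tokens


def reduction_factor(n: int, k: int) -> float:
    return dense_cost(n) / sparse_cost(n, k)   # simplifies to n / k


def implied_k(n: int, claimed_reduction: float) -> float:
    """Average selected positions per token that would yield the claimed reduction."""
    return n / claimed_reduction


print(reduction_factor(1_000_000, 10_000))   # 100.0 under this model
print(implied_k(12_000_000, 1_000))          # 12000.0 under this model
```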
According to VentureBeat, at 12 million tokens the architecture is claimed to reduce attention compute by nearly 1,000x compared to current frontier models. According to SiliconANGLE, on the RULER 128K benchmark SubQ is reported to reach 95% accuracy for 8 dollars, compared to 94.8% and roughly 2,600 dollars for Claude Opus 4.6.
Numbers that would make any CTO salivate.
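For scale, the reported RULER 128K figures can be turned into a cost ratio with simple arithmetic. The inputs below are the numbers quoted above from SiliconANGLE (unverified vendor benchmarks); the variable names are illustrative.

```python
subq_cost_usd, subq_accuracy = 8, 95.0          # reported for SubQ
opus_cost_usd, opus_accuracy = 2_600, 94.8      # reported for Claude Opus 4.6

cost_ratio = opus_cost_usd / subq_cost_usd
accuracy_delta = round(subq_accuracy - opus_accuracy, 1)

print(f"cost ratio: {cost_ratio:.0f}x")              # cost ratio: 325x
print(f"accuracy delta: {accuracy_delta:+.1f} pts")  # accuracy delta: +0.2 pts
```

At 128K tokens the reported gap is therefore in the hundreds, not the thousands: the 1,000x figure applies only to the 12-million-token case.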
The fundraise confirms that serious people believe in it: 29 million dollars in seed, a valuation reported at 500 million dollars by The New Stack, and investors that include the co-founder of Tinder (Justin Mateen), a former SoftBank Vision Fund partner (Javier Villamizar), as well as early investors in Anthropic, OpenAI, Stripe and Brex.
What do the benchmarks show?
| Benchmark | Claude Opus * | GPT-5.5 | SubQ 1M-Preview | What it measures | Trend |
|---|---|---|---|---|---|
| SWE-Bench Verified | 87.6% (4.7) | n/r | 81.8% | Real-world software engineering | ↓ behind |
| RULER 128K | 94.8% (4.6) | n/r | 95.0% | Long-context accuracy | ↑ +0.2 pts |
| MRCR v2 (1M, 8 needles) | 32.2% (4.7) | 74.0% | 65.9% (deployed) | Long coreference resolution | → middle of the pack |
SOURCE: subq.ai benchmarks + VentureBeat · Updated 05/2026. * Subquadratic used Claude Opus 4.6 for RULER and Claude Opus 4.7 for SWE-Bench / MRCR. The SubQ MRCR column shows the deployed model's score (65.9%); the research configuration claims 83%.
The numbers are interesting on long context, but SubQ trails on SWE-Bench Verified (81.8% versus 87.6% for Claude Opus 4.7). A cheaper model that codes worse isn't necessarily a good deal for an autonomous AI agent that needs to produce reliable code.
Why researchers remain skeptical
The problem isn't that the claims are impossible. It's that they're unverifiable.
What evidence is still missing?
According to FelloAI, the full technical report has not been published. The model weights remain closed. All products (API, SubQ Code, SubQ Search) are in private beta. And the benchmarks, though presented as third-party validated, have not been independently reproduced by the community.
This is not a minor detail. The history of subquadratic architectures is a graveyard of promises.
Mamba proposed a state-space approach that was supposed to replace attention. RWKV tried to reconcile RNNs and transformers. DeepSeek introduced its own sparse attention. Every time, the benchmarks on paper were promising and the production results were disappointing. None of these architectures managed to rival dense transformers at frontier scale.
A second red flag concerns the MRCR benchmarks themselves. According to DataCamp, SubQ's research configuration reaches 83% on MRCR v2, but the deployed API model achieves only 65.9%, a gap of roughly 17 points between lab and production. This kind of gap between internal benchmarks and real-world performance is precisely what the community is waiting to see explained publicly.
The Magic.dev precedent is also instructive. According to The New Stack and VentureBeat, that startup announced a 100-million-token context in August 2024, with a similar 1,000x efficiency advantage, and raised roughly 500 million dollars. As of early 2026, there was still no public evidence that its LTM-2-mini model was used in production outside the company. Grand contextual efficiency announcements already have a track record.
SubQ argues that SSA is fundamentally different because it preserves exact attention on selected tokens, rather than replacing it with an alternative mechanism. That's an interesting technical argument. But until the community can reproduce the results, skepticism remains the rational position.
As VentureBeat puts it, researcher reactions range "from genuine curiosity to open accusations of vaporware." Not exactly a consensus.
The real problem: your clients aren't waiting for a cheaper model
Even if SubQ delivered on every promise tomorrow morning, model cost would rarely be the top expense in an enterprise AI project. What actually holds back deployments is integration with existing tools, not the token bill.
Let's assume for a moment that SubQ delivers on all its promises. A 12-million-token context, linear costs, frontier quality. What does that concretely change for a 50-person SME looking to automate customer service or streamline prospecting?
Not much this year.
Why isn't model cost your bottleneck?
I see it every week while working with SMEs on their AI projects: token cost is almost never the blocker. What's expensive is integration. Connecting an LLM to the CRM, to emails, to the knowledge base, training the teams, handling errors, iterating on prompts. The real cost of LLMs isn't on the API bill.
According to McKinsey, companies that capture value from AI are the ones investing in integration with existing workflows, not the ones chasing the cheapest model. The pattern is always the same: an impressive demo, then months of integration before the first euro of ROI.
Why does integration matter more than architecture?
A model that's 1,000x cheaper doesn't fix the fact that your ERP exports to CSV, that your sales team doesn't use the CRM properly, or that nobody on the team knows how to write a structured prompt. In my experience with SMEs, these problems absorb the vast majority of an AI project's budget: rarely less than 70 to 80%.
The companies I work with that get concrete results aren't the ones waiting for the next architectural breakthrough. They're the ones that integrate AI into their departments with the models available today, starting with a specific and measurable use case.
"The real value isn't in the model, it's in the integration with your business processes. SubQ or not, that equation doesn't change."
Vincent, May 2026
What you should do instead of waiting
Don't postpone your AI projects waiting for SubQ. Existing models already deliver measurable value, and SubQ won't be available for enterprise production until late 2026 at the earliest, probably not before 2027.
The natural reflex when an announcement like SubQ drops is to think: "let's wait, prices will come down." That's exactly the wrong calculation.
Should you delay your AI projects waiting for SubQ?
No. For three reasons.
First, SubQ is in private beta with no announced general availability date. Even if the model works, you won't be able to use it in production for months, probably not before 2027 for reliable enterprise use.
Second, the costs of existing models are already dropping. OpenAI offers free fine-tuning, Anthropic has significantly reduced its model pricing over the past year, and open-source models like Llama allow local inference for certain use cases. You don't need an architectural breakthrough to get reasonable costs.
Third, every month of waiting is a month without the operational gains AI can already deliver. A well-configured AI agent on your sales pipeline generates value from the first week. A model that's 1,000x cheaper but doesn't exist yet generates none.
What signals should you watch to know if SubQ is serious?
Three indicators to look for:
- The publication of the full technical report. Without it, any discussion of the architecture remains speculative.
- Independent reproduction of the benchmarks by at least two recognized research teams.
- The opening of a public API with verifiable pricing, not a private, invite-only beta.
Until all three conditions are met, SubQ remains a promise, not a tool. And promises don't reduce your operating costs.
The right strategy hasn't changed: identify the task that costs you the most in time and money, plug an existing model into it, measure ROI in six weeks, iterate. It's less spectacular than a 29-million-dollar funding announcement, but it's what works. Companies that put AI at the heart of their operations today, with today's tools, will have a structural advantage over those waiting for the perfect model. At GoLive Software, we support exactly this kind of transition: pragmatic, measurable, without waiting for the next revolution.
Frequently asked questions
Is SubQ really 1,000 times cheaper than Claude or GPT?
That's what Subquadratic claims for very long contexts (12 million tokens). At 128K tokens, the announced reduction would be closer to 300x according to SiliconANGLE. These numbers have not been independently reproduced, and the model is not publicly accessible. Until the technical report is published, these claims remain unverifiable.
Can SubQ be used in production today?
No. All three products (API, SubQ Code, SubQ Search) are in private beta by request. No general availability date has been communicated. For enterprise use requiring reliability and support, you'll likely need to wait until at least late 2026, possibly 2027.
Why have subquadratic architectures always failed?
Previous attempts (Mamba, RWKV, DeepSeek Sparse Attention) replaced attention with alternative mechanisms or used fixed sparsity patterns. They performed well on benchmarks but lost quality at frontier scale. SubQ claims SSA is different because it preserves exact attention, but this claim remains to be validated.
Should an SME wait for LLM costs to drop before launching an AI project?
No. Token cost is rarely the main expense in an SME's AI project. Integration with existing tools, team training, and iterating on use cases absorb most of the budget. Waiting for a cheaper model delays operational gains that are already achievable with current models.
Could SubQ replace RAG and context pipelines?
That's the stated ambition: with 12 million tokens, there's no need to chunk, index and retrieve documents, everything fits in context. In theory, this would drastically simplify architectures. In practice, nobody has yet been able to verify that quality holds up on real-world use cases at this scale.
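Whether "everything fits in context" even applies to a given corpus is easy to estimate. The sketch below uses the common rough heuristic of about 4 characters per token for English text, which is an assumption and varies by tokenizer and language. The function names and the example corpus sizes are hypothetical.

```python
def estimated_tokens(char_count: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4 chars/token ratio is a heuristic, not a spec."""
    return int(char_count / chars_per_token)


def fits_in_window(doc_char_counts, window_tokens, chars_per_token=4.0, reserve_tokens=0):
    """Return (fits, total_tokens); reserve_tokens leaves room for prompt and answer."""
    total = sum(estimated_tokens(c, chars_per_token) for c in doc_char_counts)
    return total + reserve_tokens <= window_tokens, total


# Hypothetical knowledge base: three document sets, sizes in characters.
corpus = [2_000_000, 15_000_000, 6_000_000]

fits_12m, total = fits_in_window(corpus, window_tokens=12_000_000)
fits_1m, _ = fits_in_window(corpus, window_tokens=1_000_000)

print(f"estimated tokens: {total:,}")
print("fits in a 12M-token window:", fits_12m)
print("fits in a 1M-token window:", fits_1m)
```

Fitting in the window only answers the capacity question. Whether answer quality holds at that scale is exactly what remains unverified.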
Articles & resources
- Miami startup Subquadratic claims 1,000x AI efficiency gain with SubQ model; researchers demand independent proof · VentureBeat
- SubQ Review: The First Subquadratic LLM with a 12 Million Token Context · FelloAI
- Subquadratic launches with $29M to bring 12M-token context windows to AI · SiliconANGLE
- Subquadratic · Efficiency is Intelligence · subq.ai
