Something changed in the last few weeks. By demonstrating source-visible vulnerability discovery across large codebases, the release of Claude Mythos moved AI-powered AppSec from an interesting experiment to a board-level conversation. Executives who weren't asking about AI code scanning in Q1 are asking now.
The question they're asking isn't just "can it work?"
It's "what's it going to cost us to run this at scale?"
That's a harder question to answer than it looks. Provider pricing is public, but it doesn't tell you how many tokens a real security scan consumes, how active vs. stale repositories should be treated differently, or how model choice interacts with cost at your specific scale. The math is non-trivial, but getting it wrong in either direction has real consequences.
Size the raw provider exposure before you commit to an architecture. That's what the Arnica AI Cost Calculator was built to do.
The Model Benchmark That Started the Conversation
Anthropic's Claude Code Review benchmarks reference PR reviews averaging $15–25 each, billed on token usage. At first glance, that sounds manageable. But consider what it means at scale:
- A midmarket company with 400 developers generates roughly 5,200 PRs per month based on Arnica's benchmark of 13 PRs per developer per month.
- At $15–25 per PR, that's $78,000–$130,000 per month in provider costs — before you factor in full repository scans.
- Add weekly full scans of active repositories, and the number climbs further.
Those numbers are attention-grabbing. But they're planning anchors, not foregone conclusions. Model choice matters enormously: follow-up testing of the Mythos showcase cases showed that multiple smaller, cheaper models recovered the same vulnerability analyses as the flagship model. The right architecture, with the right model routing, caching, and scan design, can reduce costs dramatically without sacrificing coverage.
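The scale math in the bullets above can be verified with a quick back-of-envelope calculation (all figures come from the text: 13 PRs per developer per month is Arnica's benchmark, and $15–25 per PR is the Anthropic range):

```python
# Back-of-envelope check of the monthly PR review numbers above.
developers = 400
prs_per_dev_per_month = 13                  # Arnica benchmark
cost_per_pr_low, cost_per_pr_high = 15, 25  # Anthropic's per-PR range, USD

monthly_prs = developers * prs_per_dev_per_month
low = monthly_prs * cost_per_pr_low
high = monthly_prs * cost_per_pr_high
print(f"{monthly_prs:,} PRs/month -> ${low:,}-${high:,}/month")
# -> 5,200 PRs/month -> $78,000-$130,000/month
```

And that's PR reviews alone, before any full repository scans are layered on top.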
The Arnica AI Cost Calculator lets you test all of those variables against your specific workload before you commit to an approach.
What the Calculator Models
The calculator models the raw provider spend you'd pay directly to an AI provider (including OpenAI, Anthropic Claude, and Google Gemini) when scanning your repositories and pull requests with frontier models. It doesn't include engineering time, orchestration infrastructure, CI minutes, or triage operations. Those matter, but this is where to start.
Here's what you can configure:
Your environment
- Developer count. Drives defaults for repository count and pull request volume using Arnica's benchmark cohorts: SMB (under 100 developers), Mid-market (100–999), and Enterprise (1,000+). Mid-market defaults to 4.0 repos per developer and 60% stale repositories. Every value is editable.
- Active vs. stale repositories. This is the variable that surprises most teams. Stale repos still need periodic coverage for newly disclosed vulnerability classes, but active repos drive most continuous scan activity. Treating both identically inflates spend significantly. The calculator lets you set independent scan cadences, as often as every other day for active repos, as infrequently as annually for stale ones.
- Pull request volume. Defaults to 13 PRs per developer per month from Arnica's benchmarks. Editable.
Model selection and comparison
- Compare up to three models side by side. Input and output pricing is pre-loaded from current provider rates. See exactly how model choice affects your annual cost at your specific workload.
- The pricing multiplier is derived from relative input/output token rates, so you're comparing apples to apples across providers with different pricing structures.
Scan cadence
- Full repository scans can be set from every other day to monthly for active repos, and from every other day to annually for stale ones.
- PR scans are calculated from your developer count and PR volume.
The output is an annual provider cost estimate you can bring to a budget conversation, a vendor evaluation, or an architecture discussion.
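Putting those inputs together, the general shape of the estimate can be sketched as follows. This is a minimal sketch, not the calculator's actual internals: the per-scan dollar figures and the weekly active-repo cadence are illustrative assumptions; only the defaults named above (4.0 repos per developer, 60% stale, 13 PRs per developer per month) come from the text.

```python
def annual_provider_cost(
    developers: int,
    repos_per_dev: float = 4.0,        # mid-market default from the text
    stale_fraction: float = 0.60,      # mid-market default from the text
    active_scans_per_year: int = 52,   # weekly full scans (assumption)
    stale_scans_per_year: int = 1,     # annual coverage for stale repos
    cost_per_repo_scan: float = 40.0,  # illustrative figure, not a benchmark
    prs_per_dev_per_month: int = 13,   # Arnica benchmark from the text
    cost_per_pr_scan: float = 20.0,    # midpoint of the $15-25 PR range
) -> float:
    repos = developers * repos_per_dev
    active = repos * (1 - stale_fraction)
    stale = repos * stale_fraction
    # Repo scans: active and stale fleets get independent cadences.
    repo_cost = (active * active_scans_per_year
                 + stale * stale_scans_per_year) * cost_per_repo_scan
    # PR scans: derived from developer count and PR volume.
    pr_cost = developers * prs_per_dev_per_month * 12 * cost_per_pr_scan
    return repo_cost + pr_cost

# A 400-developer mid-market org under these assumptions:
print(f"${annual_provider_cost(400):,.0f}/year")  # -> $2,617,600/year
```

Even with rough per-scan figures, the structure makes the point: treating stale repos on the same cadence as active ones, or routing every scan to a flagship model, moves the total by millions, not percentages.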
→ Run your numbers at ai-cost-calculator.arnica.io
The Two Problems Behind the Number
The calculator surfaces a number. But the number is really a proxy for two distinct operational problems every team deploying AI code scanning at scale has to solve.
Backlog discovery. Frontier models like Mythos raise the urgency of finding latent vulnerabilities before attackers or auditors do. The question is no longer whether AI can find buried issues; it's how often you can afford to look across all repositories. For most organizations, the cost of not scanning the backlog is higher than the cost of scanning it. But that calculus only works if you're not overspending on unchanged code or applying the same expensive model to repos that don't warrant it.
Forward prevention. Backlog scans address existing risk. PR scans prevent new risk from entering the codebase while developers still have context to fix it quickly. These two workloads have different cost profiles, different urgency levels, and different tolerance for false positives. The calculator models them separately for exactly this reason.
If the Number Is Higher Than Expected
Good. That's the point of running it before you build. There are several high-impact levers that reduce provider spend without reducing coverage:
- Caching and deduplication. Repeatedly scanning unchanged code pays the provider again for analysis you've already done. Intelligent caching is one of the highest-ROI cost reduction strategies available, saving compute by reusing prior analysis when neither the file nor the prompt has changed.
- Model routing. Not every scan needs the most capable model. Cheaper, smaller models recover the same vulnerability analyses in many cases. Routing heuristic scans to lower-cost models and reserving frontier models for deep analysis can dramatically reduce cost-per-scan without sacrificing quality.
- Active vs. stale targeting. Stale repositories don't need weekly scans. Calibrating cadence to actual change velocity is one of the most straightforward ways to reduce spend.
- Operational controls. Provider limits, budget enforcement, retry behavior, and failed scan handling all need owners. Without them, a runaway scan job or misconfigured retry can spike your monthly bill unexpectedly.
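The caching lever from the first bullet reduces to a simple idea: key the stored analysis on a hash of the file contents plus the prompt, so the provider is only paid on a cache miss. A minimal sketch, where `scan_with_model` is a hypothetical stand-in for a real (paid) provider call:

```python
import hashlib

cache: dict[str, str] = {}

def cached_scan(file_contents: str, prompt: str, scan_with_model) -> str:
    """Reuse a prior analysis when neither the file nor the prompt changed."""
    key = hashlib.sha256((prompt + "\x00" + file_contents).encode()).hexdigest()
    if key not in cache:  # only pay the provider on a miss
        cache[key] = scan_with_model(file_contents, prompt)
    return cache[key]

# Usage: the second call is served from cache, so the paid model runs once.
calls = []
fake_model = lambda code, prompt: calls.append(code) or "no issues found"
cached_scan("def f(): pass", "find vulns", fake_model)
cached_scan("def f(): pass", "find vulns", fake_model)
print(len(calls))  # -> 1
```

A production system would also scope cache entries by model version, since a model upgrade invalidates prior analyses, but the cost dynamic is the same: unchanged code should never be re-billed.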
Arnica reduces the provider bill through scan orchestration, caching, deduplication, active-repo targeting, and model operations built for application security, while bringing results into developer workflows so token spend becomes resolved risk, not alert volume.
Raw Provider Cost Is Only One Lever
The calculator is intentionally scoped to provider spend because that's the number finance asks about and the number that's easiest to underestimate. But it's worth being explicit about what it doesn't include:
- Engineering time to build and maintain the scanning pipeline
- Orchestration infrastructure and CI minutes
- Alert routing and triage operations: raw model output still needs prioritization, suppression, fix context, and developer-native comments. Without those, token spend turns into alert volume instead of resolved risk
- Model upgrade costs as frontier models, pricing, and deprecation timelines shift
Use the calculator to size the raw provider exposure. Then compare it with architectures that avoid unnecessary scans, route the right workload to the right model, and prevent new risk before it becomes backlog.
Get Your Number
The Arnica AI Cost Calculator is free, requires no signup, and takes under two minutes to run.
→ ai-cost-calculator.arnica.io
Already past the estimate stage? Talk to us about how Arnica reduces provider spend through scan orchestration, caching, and model operations, while driving 100% developer adoption across the enterprise.
Nir Valtman is the CEO and Co-Founder of Arnica. He previously served as VP Security at Finastra and CISO at Kabbage, and is a frequent speaker at Black Hat, DEF CON, RSA, and BSides conferences globally.
Reduce Risk and Accelerate Velocity
Integrate Arnica ChatOps with your development workflow to eliminate risks before they ever reach production.