Two AI agencies quote the same chatbot project. One charges $1,200/month all-in. The other charges $800/month plus pass-through API costs. Most buyers pick the lower number. Six months later they find out the $800/month vendor is invoicing them $2,400/month because the API costs are billed separately. Or the $1,200/month vendor is making 60% gross margin on bundled API fees the buyer never saw. Both buyers feel ripped off — and both could have avoided the surprise by understanding how AI API costs actually work in 2026. This guide explains BYO vs. bundled pricing in plain language, with the math, the markup ranges, and the questions to ask before signing.
What you're actually paying for
Every AI chatbot, voice agent, or automation calls one or more underlying LLMs: Anthropic Claude, OpenAI GPT, Google Gemini, Meta Llama, or an open-source equivalent. Each call consumes "tokens" (one token is roughly 0.75 of an English word), and the LLM provider charges per token, usually at different rates for input and output.
There's also typically infrastructure cost (cloud hosting, vector database, telephony for voice agents), but the LLM API is the largest variable cost and the one most often obscured in pricing.
Approximate 2026 LLM API pricing (varies by model and provider; check current pricing pages for exact figures):
| Model tier | Input ($/M tokens) | Output ($/M tokens) |
|---|---|---|
| Lightweight / Haiku / GPT-4o-mini class | $0.10–$1 | $0.30–$5 |
| Flagship / Sonnet / GPT-4o class | $1–$15 | $5–$75 |
| Premium / Opus / GPT-4 class | $15–$30 | $60–$120 |
| Open-source (self-hosted) | $0.05–$1 effective | $0.05–$1 effective |
A typical chatbot conversation uses 3,000–8,000 tokens total. A typical voice agent call uses 5,000–15,000 tokens plus voice synthesis costs.
You can verify pricing directly at the Anthropic pricing page, OpenAI pricing page, and Google AI pricing page.
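As a rough illustration of the per-token math, the sketch below estimates the API cost of a single conversation. The function name, the 5,000/1,000 input/output split, and the $3 and $15 per million token rates (picked from inside the flagship range in the table) are illustrative assumptions, not any provider's published prices.

```python
def conversation_cost(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated API cost in dollars for one conversation.

    Prices are given in dollars per million tokens, matching the table above.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Assumed example: 5,000 input + 1,000 output tokens at $3 / $15 per million.
cost = conversation_cost(5_000, 1_000, 3.0, 15.0)
print(f"${cost:.3f} per conversation")  # $0.030 per conversation
```

Swap in the rates from your provider's current pricing page to get a figure for your own model.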
What "BYO" means
BYO — "bring your own API key" or "pass-through pricing" — means the agency or platform uses your account with the underlying LLM provider, and you pay the LLM provider directly. The agency charges separately for their work (setup, integration, hosting, maintenance).
The pros:
- You see the actual cost. Your bill is exactly what the LLM provider charges.
- No markup. The agency makes money on their services, not on hidden API margins.
- You control the relationship with the LLM provider. Negotiate enterprise pricing, switch models, set rate limits — all yours.
- Audit trail. You can see every call, every token, every cost line item.
The cons:
- You need to set up the LLM provider account, billing, and access keys.
- You manage rate limits, quota issues, and outage scenarios with the vendor directly.
- Onboarding is slightly more involved.
- Some buyers find the dual billing confusing in the first month.
For most engagements above $1,000/month in API spend, BYO is the better economic choice.
What "bundled" means
Bundled pricing means the agency or platform charges one all-in fee that includes the API costs. You don't see the underlying LLM bill — the agency does, and they pass it along inside their flat fee.
The pros:
- Simple invoicing. One number per month.
- No vendor-management overhead.
- Predictable budget if usage is steady.
The cons:
- You don't see the markup. Industry markup ranges from 0% (rare but exists) to 5x or more (sadly common).
- You can't switch underlying models easily.
- You lose negotiating leverage at the LLM provider level.
- You can't audit usage independently.
Bundled pricing isn't inherently wrong — it's a legitimate way to package services. The problem is when it hides aggressive markup behind opaque billing.
What the markup actually looks like
A specific example. Suppose your chatbot handles 1,000 conversations per month, each using ~6,000 tokens (~5,000 input, ~1,000 output) on a flagship model. The real API cost:
- 5M input tokens × ~$3/M = $15
- 1M output tokens × ~$15/M = $15
- Real API cost: ~$30/month
A reputable bundled vendor builds in ~10–20% buffer for variability: $33–$36/month.
An aggressive bundled vendor builds in 2–4x markup: $60–$120/month, charged to you as "$0.06–$0.12 per conversation."
An exploitative bundled vendor builds in 5x+ markup: $150–$250/month for the same $30 of real cost.
The buyer typically doesn't notice because $250/month for "AI included" sounds reasonable next to a $100,000 software licensing bill. But over 24 months, you've paid $6,000 for $720 worth of real API cost: $5,280 of pure margin.
Multiply this by every business in the vendor's book. These economics are why some agencies push bundled pricing so hard.
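The markup arithmetic can be checked with a few lines of code. This is a minimal sketch: the function names are hypothetical, and the token totals and rates are the approximate figures from the example (5M input tokens at ~$3/M, 1M output tokens at ~$15/M, a $250/month bundled charge).

```python
def monthly_api_cost(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Real monthly API cost in dollars, with prices in dollars per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def markup_multiple(billed: float, real_cost: float) -> float:
    """How many times the real cost the vendor is charging."""
    return billed / real_cost


def contract_margin(billed_per_month: float, real_cost_per_month: float,
                    months: int) -> float:
    """Total amount paid above real API cost over the contract term."""
    return (billed_per_month - real_cost_per_month) * months


real = monthly_api_cost(5_000_000, 1_000_000, 3.0, 15.0)
print(real)                                # 30.0
print(round(markup_multiple(250, real), 1))  # 8.3
print(contract_margin(250, real, 24))      # 5280.0
```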
Voice agents make markup bigger
For voice AI agents, the API cost is bigger and the markup opportunity is bigger. A typical 4-minute call uses:
- 12,000 tokens of LLM context
- 4 minutes of speech-to-text (~$0.01–$0.03)
- 4 minutes of text-to-speech (~$0.05–$0.40 depending on quality)
- Telephony at ~$0.005–$0.02 per minute
Real per-minute API cost is typically $0.03–$0.15. Bundled vendors often quote $0.30–$0.80 per minute. The markup on voice is routinely 3–8x. For deeper detail on voice agent economics, see the real ROI of an AI calling agent.
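To see how the components add up, here is a hedged sketch of the per-minute math for one 4-minute call. It reads the speech-to-text and text-to-speech figures as totals for the whole call, and it assumes the 12,000 LLM tokens split 10,000 input / 2,000 output at $3 / $15 per million tokens. Both the split and the rates are assumptions for illustration, as are the names used.

```python
CALL_MINUTES = 4


def per_minute_cost(llm: float, stt: float, tts: float,
                    telephony_per_min: float, minutes: int = CALL_MINUTES) -> float:
    """Real cost per minute: per-call components plus per-minute telephony."""
    total_for_call = llm + stt + tts + telephony_per_min * minutes
    return total_for_call / minutes


# Assumed LLM cost for the call: 10,000 input + 2,000 output tokens
# at $3 / $15 per million tokens.
llm_cost = (10_000 * 3.0 + 2_000 * 15.0) / 1_000_000

# Low and high ends of the ranges quoted in the list above.
low = per_minute_cost(llm_cost, stt=0.01, tts=0.05, telephony_per_min=0.005)
high = per_minute_cost(llm_cost, stt=0.03, tts=0.40, telephony_per_min=0.02)
print(f"${low:.3f} to ${high:.3f} per minute")
```

Under these assumptions the result falls inside the $0.03–$0.15 range; text-to-speech quality is the component that moves it most.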
How to spot bundled markup
Three diagnostic questions:
- What model am I using? If the vendor won't tell you, the markup is likely high.
- What's the per-conversation or per-minute cost? Compare against the real-API math above.
- Can I see a token-usage breakdown for my account? A reputable vendor will provide one even on bundled pricing.
If they refuse all three, assume meaningful markup.
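The second question can be turned into a quick check: divide the vendor's quoted unit price by your own estimate of the real API cost. The sketch below is illustrative only. The function names and the example quote are hypothetical, the tier boundaries follow the markup ranges used in this guide, and the "elevated" label for the gap between 1.5x and 2x is not a term from this guide.

```python
def implied_markup(quoted_per_unit: float, real_per_unit: float) -> float:
    """Ratio of the vendor's quoted unit price to the estimated real API cost."""
    if real_per_unit <= 0:
        raise ValueError("real_per_unit must be positive")
    return quoted_per_unit / real_per_unit


def classify_markup(multiple: float) -> str:
    """Map a markup multiple onto rough tiers (boundaries are approximate)."""
    if multiple <= 1.5:
        return "modest"
    if multiple < 2.0:
        return "elevated"  # assumed label for the gap between the named tiers
    if multiple < 5.0:
        return "aggressive"
    return "exploitative"


# Hypothetical quote: $0.40 per minute against an estimated real cost of $0.10.
multiple = implied_markup(0.40, 0.10)
print(f"{multiple:.1f}x -> {classify_markup(multiple)}")  # 4.0x -> aggressive
```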
When bundled is genuinely worth paying for
To be balanced — bundled pricing is sometimes the right choice. The cases:
- Your monthly AI usage is low (<$200 of real API cost). Markup is small in absolute terms, and the simplicity is worth it.
- You don't have anyone internal to manage the LLM provider relationship.
- The vendor's bundling includes meaningful value beyond API access (e.g., fine-tuned models, proprietary tooling, premium support).
- You strongly prefer one predictable monthly bill for budgeting reasons.
In those cases, modest markup (10–50%) is fair compensation for the simplicity. Aggressive markup (2x+) usually isn't.
A practical comparison
Same chatbot, same scope, same volume, two pricing structures:
BYO pricing example:
- Setup: $20,000 (one-time)
- Monthly agency services: $1,200 (hosting + monitoring + maintenance)
- Monthly API costs (paid directly to LLM provider): $300
- Total monthly: $1,500
Bundled pricing example:
- Setup: $20,000 (one-time)
- Monthly all-in: $1,800 (includes API costs)
- Total monthly: $1,800
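Over 24 months, BYO saves you $7,200, or about 17% of the bundled monthly cost. If usage scales, the gap widens. Assuming the API portion of the bundled fee ($600 of the $1,800) scales with volume at the same markup, at 3x the volume the BYO advantage grows to roughly $21,600 over 24 months.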
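The comparison can be sketched in code using the figures from the two examples. One assumption is needed to model growth: that the bundled fee contains the same $1,200 of agency services, so the remaining $600 is the API portion and scales with volume. The constant and function names are hypothetical.

```python
AGENCY_FEE = 1_200       # monthly agency services (from the BYO example)
REAL_API_COST = 300      # monthly API cost paid directly to the LLM provider
BUNDLED_ALL_IN = 1_800   # monthly all-in bundled fee


def byo_savings(months: int, volume_multiplier: float = 1.0) -> float:
    """Dollars saved by BYO over the term, versus the bundled structure.

    Assumption: only the API portion of each bill scales with volume, and the
    bundled API portion is whatever remains after the agency fee.
    """
    bundled_api_portion = BUNDLED_ALL_IN - AGENCY_FEE
    byo_monthly = AGENCY_FEE + REAL_API_COST * volume_multiplier
    bundled_monthly = AGENCY_FEE + bundled_api_portion * volume_multiplier
    return (bundled_monthly - byo_monthly) * months


print(byo_savings(24))  # 7200.0
```

Passing a larger `volume_multiplier` shows how the gap grows under this particular scaling assumption; a vendor whose fee scales differently would give a different figure.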
Open-source self-hosting as a third option
A third option that doesn't get enough discussion: self-hosting open-source models (Llama, Mistral, Qwen, etc.). You pay only the cloud infrastructure bill — typically $0.05–$1 per million tokens equivalent.
The pros:
- Lowest possible per-token cost
- Total data privacy (data never leaves your infrastructure)
- No vendor lock-in
- Customization through fine-tuning
The cons:
- Higher engineering complexity. You need real ML/infra expertise.
- Setup cost is higher: $30,000–$120,000 typically
- Performance lags closed-source flagships for some tasks
- Monitoring and scaling are your problem
For most SMBs, self-hosting isn't the right answer in 2026 — the engineering overhead exceeds the API savings. For enterprises with strict data residency requirements, very high volume, or strong internal AI teams, self-hosting starts to make sense above ~$5,000/month of equivalent API spend.
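A simple break-even estimate helps frame that threshold. The sketch below divides the setup cost by the monthly saving. The $1,000/month self-hosting infrastructure figure is purely a placeholder assumption, and the function name is hypothetical; ongoing engineering time is not modeled.

```python
from typing import Optional


def breakeven_months(setup_cost: float, monthly_api_spend: float,
                     monthly_selfhost_cost: float) -> Optional[float]:
    """Months until self-hosting setup cost is recovered, or None if it never is."""
    monthly_saving = monthly_api_spend - monthly_selfhost_cost
    if monthly_saving <= 0:
        return None  # self-hosting costs as much or more each month
    return setup_cost / monthly_saving


# Setup range from the list above, $5,000/month of equivalent API spend,
# and an assumed $1,000/month infrastructure bill.
print(breakeven_months(30_000, 5_000, 1_000))   # 7.5
print(breakeven_months(120_000, 5_000, 1_000))  # 30.0
print(breakeven_months(30_000, 800, 1_000))     # None
```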
How to structure a contract that protects you
If you're signing for a multi-year engagement, build these terms into the contract:
- Clear delineation of agency services vs. API pass-through. Even bundled contracts should itemize the components.
- Right to audit usage. You can request monthly token-usage reports.
- Model-change rights. If the vendor switches the underlying model, you get notified, and pricing adjusts accordingly.
- Capped overage. A monthly cap so a runaway prompt or attack can't generate a $30,000 surprise invoice.
- Migration support. If you leave, the vendor provides exportable data, configuration, and training assets.
- Termination terms. No multi-year lock-in without clear off-ramp.
For more on the broader contract structure, see how to evaluate an AI agency.
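The capped-overage term can also be backed by a technical guard on whichever side makes the LLM calls. The following is a minimal application-side sketch with hypothetical names; it is not a feature of any particular provider or agency platform, and a real system would also need persistence and a monthly reset.

```python
class MonthlySpendCap:
    """Tracks estimated API spend and refuses charges that would exceed a cap."""

    def __init__(self, cap_dollars: float) -> None:
        self.cap = cap_dollars
        self.spent = 0.0

    def try_charge(self, estimated_cost: float) -> bool:
        """Record the cost and return True if it fits under the cap, else False."""
        if self.spent + estimated_cost > self.cap:
            return False  # caller should skip the LLM call and alert someone
        self.spent += estimated_cost
        return True


cap = MonthlySpendCap(cap_dollars=500.0)
if cap.try_charge(0.03):
    pass  # safe to make the LLM call here
```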
What changes when models get cheaper
LLM API pricing has dropped meaningfully every 12–18 months since 2022, and that trajectory is expected to continue. A bundled contract signed in 2026 is typically priced against today's API costs, while the vendor's actual costs fall as API prices drop. Unless the contract adjusts pricing, the agency pockets the difference.
A BYO contract automatically benefits when API prices drop. Your costs go down without renegotiation. This is a major long-term advantage of BYO that few buyers factor in.
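To make the effect concrete, the sketch below compares cumulative API spend over a contract when prices fall. The 50% price drop after 12 months is a hypothetical figure chosen for illustration, not a forecast, and the bundled API portion is set equal to the starting real cost (no markup) to isolate the effect of the price drop.

```python
def cumulative_api_costs(months: int, start_api_cost: float,
                         bundled_api_fee: float, annual_price_drop: float):
    """Return (byo_total, bundled_total) in dollars over the term.

    BYO cost steps down once per full year by `annual_price_drop`
    (0.5 means prices halve); the bundled fee stays fixed.
    """
    byo_total = 0.0
    for month in range(months):
        years_elapsed = month // 12
        byo_total += start_api_cost * (1 - annual_price_drop) ** years_elapsed
    bundled_total = bundled_api_fee * months
    return byo_total, bundled_total


byo, bundled = cumulative_api_costs(24, 300, 300, annual_price_drop=0.5)
print(byo, bundled, bundled - byo)  # 5400.0 7200 1800.0
```

In this scenario the buyer on BYO pays $1,800 less over 24 months without renegotiating anything.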
What we do at SpeedX Marketing
To be transparent — our default is BYO. We bill our services (setup, integration, hosting, maintenance) and pass API costs through at vendor cost. We can do bundled pricing if a client prefers the simpler invoicing, but we'll disclose the buffer (10–20%) and you can opt back to BYO at any time. We never run aggressive markup.
For service-specific overviews, browse our AI automation services in New York, AI chatbot development services in Los Angeles, or AI calling agent development services in San Francisco.
For related pricing topics, see what AI chatbots actually cost in 2026 and free AI tools vs. agency hidden costs.
Free pricing transparency call
If you have an existing AI vendor quote and you'd like a second opinion on whether the API costs look reasonable — book a free 30-minute call. We'll review the quote, run the real-API math against your expected volume, and tell you where the markup likely sits. Message us on WhatsApp, email info@speedxmarketing.com, or reach out through our contact page.



