How ARKAbrain intelligently routes requests to save you money while maintaining quality. Understand the economics of AI model selection.
Different AI models have dramatically different costs. ARKAbrain analyzes each request and routes it to the most cost-effective model that can handle the task well. This typically saves 40-60% compared to always using premium models.
Simple tasks go to cheap models, complex tasks to powerful ones.
Cheaper models are often faster, giving you results quickly.
Only pay for the capabilities you actually need.
Here's what each model costs per 1,000 tokens. ARKAbrain picks from this list based on your task:
| Model | Input (per 1K tokens) | Output (per 1K tokens) | Relative Cost (by output price) |
|---|---|---|---|
| Gemini 1.5 Flash | $0.000075 | $0.0003 | 1x (cheapest) |
| DeepSeek Coder | $0.00014 | $0.00028 | ~1x |
| GPT-4o Mini | $0.00015 | $0.0006 | ~2x |
| Claude 3 Haiku | $0.00025 | $0.00125 | ~4x |
| Llama 3.1 70B | $0.00052 | $0.00075 | ~3x |
| Gemini 1.5 Pro | $0.00125 | $0.005 | ~17x |
| Claude 3.5 Sonnet | $0.003 | $0.015 | ~50x |
| o1 Mini | $0.003 | $0.012 | ~40x |
| GPT-4o | $0.005 | $0.015 | ~50x |
| GPT-4 Turbo | $0.01 | $0.03 | ~100x |
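The relative-cost figures appear to track each model's output price as a multiple of Gemini 1.5 Flash's output price. The short check below is a sketch under that assumption; the dictionary and function names are illustrative and not part of ARKAbrain.

```python
# Output price per 1,000 tokens (USD), copied from the pricing table.
OUTPUT_PRICE_PER_1K = {
    "Gemini 1.5 Flash": 0.0003,
    "DeepSeek Coder": 0.00028,
    "GPT-4o Mini": 0.0006,
    "Claude 3 Haiku": 0.00125,
    "Llama 3.1 70B": 0.00075,
    "Gemini 1.5 Pro": 0.005,
    "Claude 3.5 Sonnet": 0.015,
    "o1 Mini": 0.012,
    "GPT-4o": 0.015,
    "GPT-4 Turbo": 0.03,
}

# Assumption: the row marked "1x (cheapest)" is the baseline.
BASELINE = "Gemini 1.5 Flash"


def relative_output_cost(model: str) -> float:
    """Return the model's output price as a multiple of the baseline's."""
    return OUTPUT_PRICE_PER_1K[model] / OUTPUT_PRICE_PER_1K[BASELINE]


if __name__ == "__main__":
    for name in OUTPUT_PRICE_PER_1K:
        print(f"{name}: ~{relative_output_cost(name):.1f}x")
```

Input prices give somewhat different ratios (GPT-4 Turbo's input price is roughly 133x that of Gemini 1.5 Flash), so the column is best read as a rough guide.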
The Math
A simple rewrite using GPT-4o Mini costs about $0.001. The same task with GPT-4 Turbo would cost about $0.05, which is 50x more. ARKAbrain knows when the cheaper model is sufficient.
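As a rough illustration of "pick the cheapest model that is sufficient", the sketch below chooses the lowest-cost model from a list the caller has already judged capable. How ARKAbrain actually decides capability is not described here, so `capable_models` and both function names are hypothetical.

```python
# Prices per 1,000 tokens (USD) as (input, output), from the pricing table.
PRICES_PER_1K = {
    "GPT-4o Mini": (0.00015, 0.0006),
    "GPT-4o": (0.005, 0.015),
    "GPT-4 Turbo": (0.01, 0.03),
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at the listed per-1K-token prices."""
    input_price, output_price = PRICES_PER_1K[model]
    return (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price


def cheapest_capable(capable_models, input_tokens, output_tokens):
    """Return the lowest-cost model among those judged capable of the task.

    Assumption: the caller supplies `capable_models`; this sketch does not
    attempt to classify the task itself.
    """
    if not capable_models:
        raise ValueError("no capable model for this task")
    return min(
        capable_models,
        key=lambda model: request_cost(model, input_tokens, output_tokens),
    )
```

For a rewrite with 700 input tokens and 300 output tokens, `cheapest_capable(["GPT-4o Mini", "GPT-4 Turbo"], 700, 300)` returns `"GPT-4o Mini"`.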
Here's how ARKAbrain routing compares to always using a premium model:
| Task | ARKAbrain Pick | Always Premium | Note |
|---|---|---|---|
| 500-word article → bullet points | GPT-4o Mini (~$0.0008) | GPT-4 Turbo (~$0.025) | |
| Title + meta description for landing page | GPT-4o (~$0.008) | GPT-4o (~$0.008) | For creative tasks, ARKAbrain uses the premium model because quality matters. |
| 50,000-word report summary | Gemini 1.5 Pro (~$0.15) | GPT-4 Turbo (~$0.90) | Gemini 1.5 Pro has a 2M-token context window, well suited to long documents. |
| Multi-file bug with stack trace | GPT-4o + Claude fallback (~$0.02, Pro plan) | GPT-4 Turbo (~$0.04) | |
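To put the example costs above in percentage terms, here is a small calculation over the quoted figures. It is only arithmetic on the approximate numbers shown; the names are illustrative.

```python
# (task, ARKAbrain pick cost, always-premium cost) in USD, from the examples.
EXAMPLES = [
    ("500-word article to bullet points", 0.0008, 0.025),
    ("Title + meta description", 0.008, 0.008),
    ("50,000-word report summary", 0.15, 0.90),
    ("Multi-file bug with stack trace", 0.02, 0.04),
]


def savings_percent(routed_cost, premium_cost):
    """Percentage saved by routing instead of always using the premium model."""
    return (1 - routed_cost / premium_cost) * 100


for task, routed, premium in EXAMPLES:
    print(f"{task}: {savings_percent(routed, premium):.1f}% saved")
```

Savings vary widely by task: nothing when the premium model is the right choice, and most of the cost on simple or long-context work.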
ARKAbrain also optimizes output length based on task complexity. This prevents over-generation (and wasted tokens):
| Complexity | Starter Max | Pro Max |
|---|---|---|
| Low | 300 tokens | 700 tokens |
| Medium | 700 tokens | 1,600 tokens |
| High | 1,200 tokens | 2,800 tokens |
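The limits above amount to a lookup by plan and complexity. A minimal sketch, assuming lowercase keys and a hypothetical helper name:

```python
# Maximum output tokens by plan and complexity, from the table above.
MAX_OUTPUT_TOKENS = {
    "starter": {"low": 300, "medium": 700, "high": 1200},
    "pro": {"low": 700, "medium": 1600, "high": 2800},
}


def output_token_cap(plan: str, complexity: str) -> int:
    """Return the output-token limit for a plan and complexity level."""
    try:
        return MAX_OUTPUT_TOKENS[plan.lower()][complexity.lower()]
    except KeyError:
        raise ValueError(
            f"unknown plan or complexity: {plan!r}, {complexity!r}"
        ) from None
```

For example, `output_token_cap("Starter", "Low")` returns `300` and `output_token_cap("Pro", "High")` returns `2800`.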
ARKAbrain estimates the cost of each request before sending it. This estimate appears in your workspace so you always know what you're spending:
estimated_cost = (input_tokens / 1000) × input_price
+ (output_tokens / 1000) × output_price
Input tokens are known before the request. Output tokens are estimated based on complexity and capped by plan limits.
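The formula and the cap can be combined in a few lines. This is a sketch of the calculation as described, not ARKAbrain's actual code; the function and parameter names are assumptions.

```python
def estimate_cost(
    input_tokens,
    expected_output_tokens,
    output_cap,
    input_price_per_1k,
    output_price_per_1k,
):
    """Estimate request cost in USD, with output tokens capped by the plan limit."""
    output_tokens = min(expected_output_tokens, output_cap)
    return (
        (input_tokens / 1000) * input_price_per_1k
        + (output_tokens / 1000) * output_price_per_1k
    )


# Worked example: GPT-4o Mini ($0.00015 input, $0.0006 output per 1K tokens),
# 800 input tokens, 500 expected output tokens, capped at 300
# (Starter plan, low complexity).
cost = estimate_cost(800, 500, 300, 0.00015, 0.0006)
print(f"${cost:.6f}")  # $0.000300
```

Here the cap reduces the billable output from 500 to 300 tokens, so the estimate is 0.8 × $0.00015 + 0.3 × $0.0006 = $0.0003.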
Add both OpenAI and OpenRouter API keys. This gives ARKAbrain more options for cost-effective routing and better fallback coverage.
Clear, specific requests help ARKAbrain classify tasks accurately. Saying "summarize this in 3 bullet points" is better than "tell me about this."
Tools are pre-classified, which means instant routing without classification overhead. Use "Summarize This" for summaries instead of typing "please summarize" in chat.
If you mostly do simple tasks (rewrites, summaries, explanations), the Starter plan routes to efficient models that are more than capable.
Your Keys, Your Costs
Remember: ARKA-AI uses BYOK (Bring Your Own Key). You pay your AI provider directly for API usage. ARKA-AI only charges the subscription fee for the routing intelligence and tools.