AI Inference · google-gemini25-flash · ✓ Verified June 4, 2026

Google Gemini 2.5 Flash

Google Gemini 2.5 Flash: $0.075 per 1M input tokens, 580 tok/s throughput, 350 ms latency. Compare on BenchNode.io.

$0.075 per 1M input tokens
$0.30 per 1M output tokens
Free tier available

Specifications

Model ID: gemini-2.5-flash
Model family: Gemini 2.5
Parameters: undisclosed
Context window: 1,000,000 tokens
Modality: text + vision + audio
Reasoning: No

Performance

Latency (time to first token): 350 ms (lower is better)
Throughput: 580 tok/s
Uptime SLA: 99.95%
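
To see how the 350 ms time to first token and the 580 tok/s throughput combine, here is a rough sketch of end-to-end response time. It assumes a simple model in which the first token arrives after the TTFT and the remaining output streams at the quoted throughput. Queueing, network time and prompt length are ignored, and the function name is hypothetical.

```python
TTFT_S = 0.350          # time to first token, from the Performance table
THROUGHPUT_TOK_S = 580  # output tokens per second, from the Performance table


def estimated_response_seconds(output_tokens: int) -> float:
    """Rough wall-clock time for one streamed response.

    Assumption: a fixed TTFT followed by decoding at the quoted throughput;
    queueing, network time and prompt length are ignored.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_S + output_tokens / THROUGHPUT_TOK_S


if __name__ == "__main__":
    for n in (100, 500, 2_000):
        print(f"{n:>5} output tokens: ~{estimated_response_seconds(n):.2f} s")
```

Under this simple model, a 500-token answer takes about 1.21 s, and the 350 ms TTFT accounts for less than a third of that.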

Pricing Detail

Input: $0.075 per 1M tokens
Output: $0.30 per 1M tokens
Free tier available: Yes
Rate limit: 1,000,000 tokens per minute (TPM)
Rate limit: 2,000 requests per minute (RPM)
Context-caching input: $0.019 per 1M tokens

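The per-token prices above translate into per-request cost as shown in this minimal sketch. It assumes that cached input tokens are billed at the $0.019 context-caching rate instead of the standard input rate, and it ignores any cache storage fees. The function and its `cached_tokens` split are hypothetical illustrations, not part of any Google API.

```python
# Prices in USD per 1M tokens, taken from the Pricing Detail list.
INPUT_USD_PER_M = 0.075
OUTPUT_USD_PER_M = 0.30
CACHED_INPUT_USD_PER_M = 0.019  # assumption: replaces the input rate for cached tokens


def request_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    `cached_tokens` is the (hypothetical) portion of `input_tokens` served
    from a context cache; cache storage fees, if any, are ignored.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    fresh_tokens = input_tokens - cached_tokens
    return (
        fresh_tokens * INPUT_USD_PER_M
        + cached_tokens * CACHED_INPUT_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000


if __name__ == "__main__":
    print(f"${request_cost_usd(2_000, 500):.6f}")                        # no caching
    print(f"${request_cost_usd(2_000, 500, cached_tokens=1_500):.6f}")   # mostly cached prompt
```

At list prices, a request with 2,000 input tokens and 500 output tokens costs about $0.0003. Input and output contribute equally here because output is priced at 4× input.
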
AI Analysis · gpt-4o-mini

Technical Verdict

At $0.075 per 1M input tokens and $0.30 per 1M output tokens, this model sits in the mid-price range for its class, and its 580 tok/s throughput indicates fast generation. Rate limits of 1,000,000 TPM and 2,000 RPM suit high-volume production workloads. The 350 ms time to first token, however, favors batch use over latency-sensitive real-time applications.
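
Which of the two rate limits actually caps throughput depends on request size. The sketch below finds the tighter limit, under the assumption that every token of a request (input plus output) counts toward TPM. Check the provider's rate-limit documentation for how tokens are actually counted. The function names are hypothetical.

```python
RATE_LIMIT_TPM = 1_000_000  # tokens per minute, from the Pricing Detail list
RATE_LIMIT_RPM = 2_000      # requests per minute, from the Pricing Detail list


def requests_allowed_by_tpm(tokens_per_request: int) -> int:
    """Requests per minute that the TPM limit alone would allow.

    Assumption: every token of a request (input + output) counts toward TPM.
    """
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return RATE_LIMIT_TPM // tokens_per_request


def max_requests_per_minute(tokens_per_request: int) -> int:
    """Sustainable requests per minute: the tighter of the two limits."""
    return min(RATE_LIMIT_RPM, requests_allowed_by_tpm(tokens_per_request))


def binding_limit(tokens_per_request: int) -> str:
    """Name the limit that caps throughput ("RPM" on a tie)."""
    return "RPM" if RATE_LIMIT_RPM <= requests_allowed_by_tpm(tokens_per_request) else "TPM"


if __name__ == "__main__":
    for size in (200, 500, 2_500):
        print(size, max_requests_per_minute(size), binding_limit(size))
```

With these figures the two limits balance at 1,000,000 ÷ 2,000 = 500 tokens per request. Below that size, RPM is the cap. Above it, TPM is the cap, so 2,500-token requests top out at 400 per minute.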

Ideal Use Case

This tier suits high-volume production teams running batch document-processing pipelines at more than 500k requests per day.
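
Before committing to this tier, a team can sanity-check a workload of that size against the rate limits and price it per day. This rough sketch assumes traffic is spread evenly over 24 hours and that input and output tokens both count toward TPM. The per-request token sizes in the example are hypothetical.

```python
RATE_LIMIT_RPM = 2_000
RATE_LIMIT_TPM = 1_000_000
INPUT_USD_PER_M = 0.075
OUTPUT_USD_PER_M = 0.30
MINUTES_PER_DAY = 24 * 60


def daily_plan(requests_per_day: int, input_tokens: int, output_tokens: int) -> dict:
    """Check a steady daily workload against the rate limits and estimate its cost.

    Assumptions: traffic is spread evenly over 24 hours, and input plus
    output tokens both count toward TPM.
    """
    rpm_needed = requests_per_day / MINUTES_PER_DAY
    tpm_needed = rpm_needed * (input_tokens + output_tokens)
    per_request_usd = (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
    return {
        "rpm_needed": rpm_needed,
        "tpm_needed": tpm_needed,
        "fits_limits": rpm_needed <= RATE_LIMIT_RPM and tpm_needed <= RATE_LIMIT_TPM,
        "daily_cost_usd": requests_per_day * per_request_usd,
    }


if __name__ == "__main__":
    # Hypothetical workload: 500k documents/day, 2,000 input and 500 output tokens each.
    print(daily_plan(500_000, 2_000, 500))
```

For that hypothetical workload, the sketch needs about 347 RPM and 868k TPM, which is within both limits, and costs about $150 per day at list prices. Doubling the tokens per request would push the workload past the TPM limit unless the traffic is throttled or batched.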
