AI Inference google-gemini25-pro ✓ Verified June 4, 2026

Google Gemini 2.5 Pro

Google Gemini 2.5 Pro: $1.25/1M input tokens, 380 tok/s throughput, 800ms latency. Compare on BenchNode.io.

$1.25 per 1M input tokens
$5 per 1M output tokens

Specifications

Model ID: gemini-2.5-pro
Model family: Gemini 2.5
Parameters: undisclosed
Context window: 2,000,000 tokens
Modality: text + vision + audio
Reasoning: Yes

Performance

Latency (TTFT): 800 ms (lower is better)
Uptime SLA: 99.95%
Throughput: 380 tok/s
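
As a rough illustration of how the latency and throughput figures combine, the sketch below estimates end-to-end response time as time to first token plus generation time at the listed throughput. The function name and the 500-token response length are hypothetical, and the estimate assumes throughput stays constant for the whole response, which real traffic may not match.

```python
TTFT_MS = 800          # listed time to first token, in milliseconds
THROUGHPUT_TPS = 380   # listed throughput, in output tokens per second


def estimated_response_seconds(output_tokens: int) -> float:
    """Hypothetical estimate: TTFT plus streaming time at constant throughput."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_MS / 1000 + output_tokens / THROUGHPUT_TPS


if __name__ == "__main__":
    # Illustrative 500-token response (an assumed length, not a page figure).
    print(f"{estimated_response_seconds(500):.2f} s")
```

Under these assumptions, the fixed 800 ms start-up delay dominates short replies, while generation time dominates long ones.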

Pricing Detail

Input (per 1M tokens, USD): 1.25
Output (per 1M tokens, USD): 5
Free tier available: No
Rate limit (TPM): 4,000,000
Rate limit (RPM): 150
Context caching input (USD): 0.31
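
To make the pricing figures concrete, here is a minimal cost sketch for a single request. It assumes the context caching price is quoted per 1M cached input tokens and replaces the standard input rate for those tokens; the page does not state the unit, and any cache storage fees are ignored. The function name and token counts are hypothetical.

```python
INPUT_USD_PER_M = 1.25         # listed input price per 1M tokens
OUTPUT_USD_PER_M = 5.0         # listed output price per 1M tokens
CACHED_INPUT_USD_PER_M = 0.31  # assumption: per 1M cached input tokens


def request_cost_usd(input_tokens: int, output_tokens: int,
                     cached_tokens: int = 0) -> float:
    """Hypothetical per-request cost; cached tokens are billed at the cached rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * INPUT_USD_PER_M
             + cached_tokens * CACHED_INPUT_USD_PER_M
             + output_tokens * OUTPUT_USD_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: 100k input tokens, 2k output tokens.
    print(request_cost_usd(100_000, 2_000))
    # Same request with 80k of the input served from cache.
    print(request_cost_usd(100_000, 2_000, cached_tokens=80_000))
```

If the caching assumption holds, serving most of a large prompt from cache cuts the cost of this example request by more than half.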
AI Analysis · gpt-4o-mini

Technical Verdict

At $1.25 per 1M input tokens and $5 per 1M output tokens, this API tier sits at a premium price point. Throughput of 380 tok/s is fast enough for streaming output, but the 800 ms time to first token means every response starts with a noticeable delay. The rate limits pair a large token budget (4,000,000 TPM) with a modest request budget (150 RPM, or at most 216,000 requests per day), which suits fewer, larger requests better than many small ones.
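
The listed rate limits can be turned into capacity figures with some simple arithmetic, sketched below. This assumes the limits can be sustained around the clock and that TPM counts all tokens in a request; neither is stated on the page, and the function names are hypothetical.

```python
RATE_LIMIT_TPM = 4_000_000  # listed tokens per minute
RATE_LIMIT_RPM = 150        # listed requests per minute


def max_requests_per_day() -> int:
    """Upper bound on daily requests if the RPM limit is saturated all day."""
    return RATE_LIMIT_RPM * 60 * 24


def sustainable_rpm(tokens_per_request: int) -> float:
    """Requests per minute allowed once both RPM and TPM limits are applied."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return min(RATE_LIMIT_RPM, RATE_LIMIT_TPM / tokens_per_request)


if __name__ == "__main__":
    print(max_requests_per_day())             # daily ceiling from RPM alone
    print(RATE_LIMIT_TPM / RATE_LIMIT_RPM)    # tokens per request that saturate both limits
    print(sustainable_rpm(100_000))           # large requests hit the TPM limit first
```

With these numbers the request limit, not the token limit, is the binding constraint for any request under roughly 26,000 tokens.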

Ideal Use Case

Streaming chat or assistant API run by a team of 5-10 engineers, handling up to roughly 200k requests per day (the 150 RPM limit caps volume at 216,000 per day), where a sub-second time to first token is acceptable.
