✓ Verified June 4, 2026
Google Gemini 2.5 Pro
Google Gemini 2.5 Pro: $1.25 per 1M input tokens, 380 tok/s throughput, 800 ms latency (TTFT). Compare on BenchNode.io.
$1.25 per 1M input tokens
$5 per 1M output tokens
Specifications
- model id: gemini-2.5-pro
- model family: Gemini 2.5
- parameters: undisclosed
- context window (tokens): 2,000,000
- modality: text + vision + audio
- reasoning: Yes
Performance
- Latency (TTFT): 800 ms (lower is better)
- Uptime SLA: 99.95%
- Throughput: 380 tok/s
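The latency and throughput figures above can be combined into a rough end-to-end response time. The sketch below is a back-of-the-envelope model, not a benchmark: it assumes the listed TTFT and throughput hold steadily for the whole response, and the function name `estimate_response_seconds` is purely illustrative.

```python
# Rough response-time model built only from the listed figures.
# Assumption: throughput is constant once the first token has arrived.
TTFT_SECONDS = 0.800        # listed time to first token (800 ms)
THROUGHPUT_TOK_S = 380.0    # listed output throughput (tok/s)


def estimate_response_seconds(output_tokens: int) -> float:
    """Return TTFT plus the time to stream `output_tokens` at the listed rate."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / THROUGHPUT_TOK_S


if __name__ == "__main__":
    for n in (100, 500, 2000):
        print(f"{n:>5} output tokens: ~{estimate_response_seconds(n):.2f} s")
```

Real responses will vary with prompt length, load, and reasoning effort, so this is best treated as an optimistic planning estimate.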
Pricing Detail
- input per 1M tokens (USD): 1.25
- output per 1M tokens (USD): 5
- free tier available: No
- rate limit (TPM): 4,000,000
- rate limit (RPM): 150
- context caching input (USD): 0.31
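To make the listed prices concrete, here is a minimal cost estimator for a single request. It is a sketch under stated assumptions: the context caching price is taken to be per 1M cached input tokens (the listing does not give the unit), cached tokens are assumed to be billed at that rate instead of the normal input rate, and `estimate_cost_usd` is a hypothetical helper, not part of any SDK.

```python
# Per-request cost estimate from the listed prices (USD per 1M tokens).
INPUT_USD_PER_M = 1.25
OUTPUT_USD_PER_M = 5.00
# Assumption: the listed caching price is per 1M cached input tokens.
CACHED_INPUT_USD_PER_M = 0.31


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one request; cached tokens are a subset of input."""
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    fresh_input = input_tokens - cached_input_tokens
    total = (fresh_input * INPUT_USD_PER_M
             + cached_input_tokens * CACHED_INPUT_USD_PER_M
             + output_tokens * OUTPUT_USD_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    print(f"no cache:   ${estimate_cost_usd(10_000, 1_000):.5f}")
    print(f"80% cached: ${estimate_cost_usd(10_000, 1_000, 8_000):.5f}")
```

Under these assumptions, caching most of a long, repeated prompt lowers the input portion of the bill substantially, while output cost is unaffected.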
AI Analysis · gpt-4o-mini
Technical Verdict
The input cost of $1.25 per 1M tokens and output cost of $5 per 1M tokens place this API tier at a premium price point. Throughput of 380 tok/s is fast enough for real-time applications: a 500-token response streams in about 1.3 s after the first token arrives. The 4,000,000 TPM limit allows high token volume, but the 150 RPM limit caps traffic at 216,000 requests per day, so the tier suits fewer, larger requests better than many small ones. The 800 ms first-token latency is workable for streaming or chat applications.
Ideal Use Case
Real-time streaming API for a team of 5-10 engineers handling up to about 216,000 daily requests (the ceiling set by the 150 RPM limit), with a focus on low-latency interaction.
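The listed rate limits imply a daily request ceiling that is worth checking against any target volume. The sketch below works through that arithmetic; it assumes the limits apply to a single account or key and that traffic is spread perfectly evenly across the day, and the helper names are illustrative only.

```python
# Capacity check derived from the listed rate limits.
# Assumptions: one account/key, traffic spread evenly over 24 hours.
RPM_LIMIT = 150            # listed requests per minute
TPM_LIMIT = 4_000_000      # listed tokens per minute
MINUTES_PER_DAY = 24 * 60


def max_requests_per_day(rpm: int = RPM_LIMIT) -> int:
    """Upper bound on daily requests if the RPM limit is saturated all day."""
    return rpm * MINUTES_PER_DAY


def token_budget_per_request(tpm: int = TPM_LIMIT, rpm: int = RPM_LIMIT) -> int:
    """Average tokens available per request when both limits are saturated."""
    return tpm // rpm


def fits_rate_limit(target_daily_requests: int) -> bool:
    """True if the target volume stays within the daily request ceiling."""
    return target_daily_requests <= max_requests_per_day()


if __name__ == "__main__":
    print("max requests/day:", max_requests_per_day())
    print("tokens per request at full RPM:", token_budget_per_request())
    print("500,000 requests/day fits:", fits_rate_limit(500_000))
```

With these figures the ceiling is 216,000 requests per day, with roughly 26,666 tokens available per request, so a 500,000-request daily target would need a higher RPM quota or multiple projects.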