✓ Verified June 4, 2026
Google Gemini 2.5 Flash
Google Gemini 2.5 Flash: $0.075/1M input tokens, 580 tok/s throughput, 350ms latency. Compare on BenchNode.io.
$0.075 per 1M input tokens
$0.30 per 1M output tokens
Free tier available
Specifications
- Model ID: gemini-2.5-flash
- Model family: Gemini 2.5
- Parameters: undisclosed
- Context window: 1,000,000 tokens
- Modality: text + vision + audio
- Reasoning: No
Performance
- Latency (TTFT): 350 ms (lower is better)
- Uptime SLA: 99.95%
- Throughput: 580 tok/s
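To see what the latency and throughput figures imply for a single response, here is a rough sketch. It assumes total response time is the time to first token plus output tokens divided by a steady throughput; real latency varies with prompt size, load, and network conditions, and the function name is hypothetical.

```python
# Figures taken from the listing; the additive model itself is an assumption.
TTFT_SECONDS = 0.350       # time to first token
THROUGHPUT_TOK_S = 580.0   # output tokens per second


def estimate_response_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time to stream a full response."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / THROUGHPUT_TOK_S


if __name__ == "__main__":
    for n in (100, 580, 2000):
        print(f"{n} output tokens: ~{estimate_response_seconds(n):.2f} s")
```

Under these assumptions, generation time dominates TTFT once a response exceeds a couple of hundred tokens.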
Pricing Detail
- Input per 1M tokens (USD): 0.075
- Output per 1M tokens (USD): 0.30
- Free tier available: Yes
- Rate limit (TPM): 1,000,000
- Rate limit (RPM): 2,000
- Context caching input (USD): 0.019
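The listed prices can be turned into a per-request estimate with a few lines of arithmetic. This is a minimal sketch, not a billing tool: it assumes the context caching price is also quoted per 1M tokens, that cached tokens replace regular input tokens at that rate, and it ignores any cache storage fees. The function and parameter names are hypothetical.

```python
# Prices in USD per 1M tokens, copied from the listing.
INPUT_USD_PER_M = 0.075
OUTPUT_USD_PER_M = 0.30
# Assumption: the caching price uses the same per-1M-token unit.
CACHED_INPUT_USD_PER_M = 0.019


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    cached_tokens is the portion of input_tokens assumed to be billed at
    the context-caching rate instead of the regular input rate.
    """
    if output_tokens < 0 or not 0 <= cached_tokens <= input_tokens:
        raise ValueError("token counts must be non-negative and cached_tokens <= input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_USD_PER_M
        + cached_tokens * CACHED_INPUT_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 1,000 output tokens.
    print(f"${estimate_cost_usd(10_000, 1_000):.5f} per request")
```

Multiplying the per-request figure by expected daily volume gives a first-order budget; actual invoices depend on the provider's current price sheet.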
AI Analysis · gpt-4o-mini
Technical Verdict
At $0.075 per 1M input tokens and $0.30 per 1M output tokens, this model sits in the mid-price range for its class, and its throughput of 580 tok/s is fast. The rate limits of 1,000,000 TPM and 2,000 RPM suit high-volume production environments. The time to first token (TTFT) of 350 ms favors batch use over latency-sensitive real-time applications.
Ideal Use Case
This tier suits a high-volume production team handling more than 500k requests per day in a batch document-processing architecture.
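Whether 500k requests per day fits within the listed rate limits depends on request size. The sketch below is a back-of-the-envelope check that assumes traffic is spread evenly across the day and that the TPM limit counts all tokens in a request; how the provider actually meters tokens is not stated in the listing, and the function name is hypothetical.

```python
# Rate limits from the listing.
RPM_LIMIT = 2_000          # requests per minute
TPM_LIMIT = 1_000_000      # tokens per minute
MINUTES_PER_DAY = 24 * 60


def max_daily_requests(tokens_per_request: int) -> int:
    """Upper bound on requests per day under both limits, with even traffic."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    # Whichever limit is hit first caps the per-minute request rate.
    per_minute = min(RPM_LIMIT, TPM_LIMIT // tokens_per_request)
    return per_minute * MINUTES_PER_DAY


if __name__ == "__main__":
    for size in (500, 2_000, 4_000):
        print(f"{size} tokens/request: up to {max_daily_requests(size):,} requests/day")
```

Under these assumptions the RPM limit alone allows 2,880,000 requests per day, but the TPM limit becomes the binding constraint for larger requests: at 4,000 tokens per request the ceiling drops to 360,000 per day, below the 500k target.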