DeepInfra Llama 3.3 70B

DeepInfra Llama 3.3 70B: $0.59/1M input tokens, 700 tok/s throughput, 270ms latency. Compare on BenchNode.io.

$0.59 per 1M input tokens

$0.59 per 1M output tokens

Free tier available

Specifications

Model ID: llama-3.3-70b
Model family: Llama 3.3
Parameters: 70B
Context window: 128,000 tokens
Modality: text
Reasoning: No

Performance

Latency (TTFT): 270 ms (lower is better)

Uptime SLA: 99.9%

Throughput: 700 tok/s
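
As a rough illustration of how the latency and throughput figures combine, the sketch below estimates end-to-end response time as first-token latency plus streaming time. It assumes the listed 700 tok/s is the per-request output speed and that the rate stays constant for the whole response; neither is stated on this page, and the function name is hypothetical.

```python
def estimate_response_seconds(
    output_tokens: int,
    ttft_ms: float = 270.0,           # listed first-token latency
    throughput_tok_s: float = 700.0,  # listed throughput, ASSUMED per-request
) -> float:
    """Estimate total response time in seconds for one request.

    Model: time to first token, then tokens stream at a constant rate.
    This is a simplification; real latency varies with load and prompt size.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if throughput_tok_s <= 0:
        raise ValueError("throughput_tok_s must be positive")
    return ttft_ms / 1000.0 + output_tokens / throughput_tok_s


if __name__ == "__main__":
    for n in (100, 700, 2000):
        print(f"{n} output tokens: ~{estimate_response_seconds(n):.2f} s")
```

Under these assumptions, the fixed 270 ms dominates short responses, while streaming time dominates long ones.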

Pricing Detail

Input price (USD per 1M tokens): 0.59
Output price (USD per 1M tokens): 0.59
Free tier available: Yes
Rate limit (tokens per minute): 15,000
Rate limit (requests per minute): 60
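
To make the pricing and rate-limit figures concrete, here is a minimal sketch that computes the cost of a single request and the number of requests per minute the limits allow. It assumes the TPM limit counts input and output tokens together, which this page does not specify; the `Listing` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """Figures copied from the pricing detail above."""
    input_usd_per_1m: float = 0.59
    output_usd_per_1m: float = 0.59
    tpm_limit: int = 15_000  # tokens per minute
    rpm_limit: int = 60      # requests per minute


def request_cost_usd(listing: Listing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-1M-token prices."""
    total = (input_tokens * listing.input_usd_per_1m
             + output_tokens * listing.output_usd_per_1m)
    return total / 1_000_000


def max_requests_per_minute(listing: Listing, tokens_per_request: int) -> int:
    """Requests per minute allowed by whichever limit binds first.

    ASSUMPTION: the TPM limit counts input and output tokens together.
    """
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return min(listing.rpm_limit, listing.tpm_limit // tokens_per_request)


if __name__ == "__main__":
    listing = Listing()
    print(f"cost: ${request_cost_usd(listing, 1_000, 500):.6f}")
    print(f"requests/min at 1,500 tokens each: {max_requests_per_minute(listing, 1_500)}")
```

With these numbers, the two limits meet at 250 tokens per request (15,000 / 60). Larger requests are capped by the token limit before the request limit.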

AI Analysis · gpt-4o-mini

Technical Verdict

Input and output are both priced at $0.59 per 1M tokens, which is mid-range for a model of this scale, and the 700 tok/s throughput is relatively fast. The rate limits of 15,000 TPM and 60 RPM suit low-volume prototyping more than sustained production traffic. The 270 ms first-token latency is better suited to batch processing than to latency-critical real-time applications.