Groq Llama 3.3 70B

Groq Llama 3.3 70B: $0.59 per 1M input tokens, 750 tok/s throughput, 250 ms first-token latency. Compare on BenchNode.io.

$0.59 / 1M input

$0.79 / 1M output

Free tier available

Specifications

model id: llama-3.3-70b
model family: Llama 3.3
parameters: 70B
context window tokens: 128,000
modality: text
reasoning: No

Performance

Latency (TTFT) 250 ms

lower is better

Uptime SLA 99.9%

Throughput 750 tok/s
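
As a rough illustration of how the latency and throughput figures combine, the sketch below estimates wall-clock time for a streamed response as first-token latency plus generation time. It assumes the listed 750 tok/s is sustained output throughput and ignores network and queueing delays; the function name `estimate_response_seconds` is hypothetical and not part of any Groq or BenchNode API.

```python
# Figures copied from the listing above.
TTFT_S = 0.250          # time to first token, in seconds
THROUGHPUT_TPS = 750.0  # assumed sustained output tokens per second


def estimate_response_seconds(
    output_tokens: int,
    ttft_s: float = TTFT_S,
    throughput_tps: float = THROUGHPUT_TPS,
) -> float:
    """Rough estimate: first-token latency plus time to stream the output."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / throughput_tps


if __name__ == "__main__":
    for n in (100, 750, 2000):
        print(f"{n} output tokens: ~{estimate_response_seconds(n):.2f} s")
```

Under these assumptions a 750-token reply takes about 1.25 s end to end, so the 250 ms first-token latency accounts for only a small share of the total on longer outputs.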

Pricing Detail

input per 1M tokens (USD): 0.59
output per 1M tokens (USD): 0.79
free tier available: Yes
rate limit (TPM): 6,000
rate limit (RPM): 30
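
The pricing and rate-limit figures above can be turned into a quick budget check. The following is a minimal sketch, not an official calculator: it assumes the TPM limit counts input and output tokens together (the listing does not say), and the helper names `request_cost_usd` and `max_requests_per_minute` are hypothetical.

```python
# Figures copied from the listing above.
INPUT_USD_PER_M = 0.59   # USD per 1M input tokens
OUTPUT_USD_PER_M = 0.79  # USD per 1M output tokens
RATE_LIMIT_TPM = 6_000   # tokens per minute
RATE_LIMIT_RPM = 30      # requests per minute


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed per-million-token prices."""
    return (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000


def max_requests_per_minute(tokens_per_request: int) -> int:
    """Requests per minute allowed once both limits are applied.

    Assumption: the TPM limit counts input plus output tokens.
    """
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return min(RATE_LIMIT_RPM, RATE_LIMIT_TPM // tokens_per_request)


if __name__ == "__main__":
    print(f"1,000 in + 500 out: ${request_cost_usd(1_000, 500):.6f}")
    print(f"Requests/min at 1,000 tokens each: {max_requests_per_minute(1_000)}")
```

With these numbers, requests averaging more than 200 tokens hit the TPM limit before the RPM limit, so the token budget is likely the binding constraint for most prompts.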

AI Analysis · gpt-4o-mini

Technical Verdict

Input costs $0.59 and output costs $0.79 per 1M tokens, which positions this API tier as mid-range for its model class. Throughput of 750 tok/s is fast, and a first-token latency of 250 ms is low enough for interactive use. The rate limits of 6,000 TPM and 30 RPM are the main constraint: at this tier the setup suits low-volume prototyping better than sustained real-time traffic.