Fireworks AI Llama 3.3 70B

Fireworks AI Llama 3.3 70B: $0.9/1M input tokens, 680 tok/s throughput, 260ms latency. Compare on BenchNode.io.

$0.9 per 1M input tokens · $0.9 per 1M output tokens · Free tier available

Specifications

model id: llama-3.3-70b
model family: Llama 3.3
parameters: 70B
context window tokens: 128,000
modality: text
reasoning: No

Performance

Latency (TTFT): 260 ms (lower is better)
Uptime SLA: 99.9%
Throughput: 680 tok/s
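
As a rough illustration of how the latency and throughput figures combine, the sketch below estimates the wall-clock time of a single streamed response. It assumes the 680 tok/s figure is a steady output rate for one request after the first token arrives; the listing does not state how the numbers were measured, and the function name is hypothetical.

```python
# Sketch only: assumes throughput is a steady per-request output rate
# that applies after the first token. The listing does not confirm this.
TTFT_MS = 260.0           # time to first token, from the listing
THROUGHPUT_TOK_S = 680.0  # output tokens per second, from the listing


def estimated_response_seconds(output_tokens: int,
                               ttft_ms: float = TTFT_MS,
                               throughput_tok_s: float = THROUGHPUT_TOK_S) -> float:
    """Approximate response time: time to first token plus streaming time."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_ms / 1000.0 + output_tokens / throughput_tok_s


# A 500-token reply: 0.26 s to first token plus 500 / 680 s of streaming.
print(f"{estimated_response_seconds(500):.3f} s")
```

Under these assumptions a 500-token reply takes just under one second, most of it spent streaming rather than waiting for the first token.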

Pricing Detail

input price (USD per 1M tokens): 0.9
output price (USD per 1M tokens): 0.9
free tier available: Yes
rate limit (tokens per minute, TPM): 10,000
rate limit (requests per minute, RPM): 60
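
The per-token prices above translate into a per-request cost with simple arithmetic. The following is a minimal sketch using only the listed prices; the function name and the example token counts are hypothetical, and free-tier allowances are ignored because the listing does not describe them.

```python
# Sketch only: prices are taken from the listing; free-tier usage is ignored.
INPUT_USD_PER_1M = 0.9
OUTPUT_USD_PER_1M = 0.9


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD at the listed per-1M-token prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_1M
            + output_tokens * OUTPUT_USD_PER_1M) / 1_000_000


# Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
cost = request_cost_usd(2_000, 500)
print(f"${cost:.5f} per request")
```

Because input and output are priced identically, the cost depends only on the total token count: 2,500 tokens cost the same regardless of how they split between prompt and completion.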

AI Analysis · gpt-4o-mini

Technical Verdict

Input and output tokens both cost $0.9 per 1M, which is mid-range for this model class, and throughput of 680 tok/s is fast. The rate limits of 10,000 TPM and 60 RPM point to low-volume prototyping rather than sustained production traffic: at 680 tok/s, a single continuous stream would generate 10,000 tokens in about 15 seconds. The analysis rates the 260 ms first-token latency as more favorable for batch use than for real-time applications.
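
To make the rate-limit point concrete, the sketch below computes how many requests per minute fit under both caps and how quickly one stream would use up the token budget. It assumes the TPM limit counts input and output tokens together for each request, which the listing does not specify; the function names are hypothetical.

```python
# Sketch only: assumes TPM counts input + output tokens per request.
TPM_LIMIT = 10_000        # tokens per minute, from the listing
RPM_LIMIT = 60            # requests per minute, from the listing
THROUGHPUT_TOK_S = 680.0  # output tokens per second, from the listing


def max_requests_per_minute(tokens_per_request: int) -> int:
    """Requests per minute allowed once both the TPM and RPM caps apply."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return min(RPM_LIMIT, TPM_LIMIT // tokens_per_request)


def seconds_to_exhaust_tpm(throughput_tok_s: float = THROUGHPUT_TOK_S) -> float:
    """Seconds of continuous generation that would consume one minute's TPM budget."""
    return TPM_LIMIT / throughput_tok_s


print(max_requests_per_minute(2_500))          # token cap binds: 4 requests
print(max_requests_per_minute(100))            # request cap binds: 60 requests
print(f"{seconds_to_exhaust_tpm():.1f} s")     # roughly 14.7 s
```

With requests of a few thousand tokens each, the token cap rather than the request cap is the binding limit, which supports the low-volume reading of these numbers.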