Groq Llama 3.1 8B

Groq Llama 3.1 8B: $0.05 per 1M input tokens, 1,200 tok/s throughput, 120 ms latency. Compare on BenchNode.io.

Specifications

Model ID: llama-3.1-8b
Model family: Llama 3.1
Parameters: 8B
Context window: 128,000 tokens
Modality: text
Reasoning: No

Performance

Latency (time to first token, TTFT): 120 ms (lower is better)
Uptime SLA: 99.9%
Throughput: 1,200 tok/s
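The two performance figures can be combined into a rough estimate of how long a streamed reply takes: the time to first token plus the number of output tokens divided by throughput. The sketch below assumes throughput holds at the listed 1,200 tok/s for the entire response, which a shared endpoint will not guarantee; the function name `estimate_response_seconds` is hypothetical.

```python
# Minimal sketch: estimated streaming time = time to first token + decode time.
# Assumption: throughput stays at the listed rate for the whole response; real
# endpoints vary with load and prompt length, so treat the result as a rough guide.

TTFT_S = 0.120          # listed latency (time to first token), in seconds
THROUGHPUT_TPS = 1_200  # listed throughput, in output tokens per second


def estimate_response_seconds(output_tokens: int,
                              ttft_s: float = TTFT_S,
                              throughput_tps: float = THROUGHPUT_TPS) -> float:
    """Return the estimated wall-clock seconds to stream `output_tokens` tokens."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / throughput_tps


if __name__ == "__main__":
    for n in (100, 500, 2_000):
        print(f"{n:>5} output tokens -> ~{estimate_response_seconds(n):.2f} s")
```

Under these assumptions a 500-token reply takes about 0.54 s, of which the first 120 ms is spent waiting for the first token.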

Pricing Detail

Input: $0.05 per 1M tokens
Output: $0.08 per 1M tokens
Free tier available: Yes
Rate limit: 20,000 tokens per minute (TPM)
Rate limit: 30 requests per minute (RPM)
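Because input and output tokens are priced separately, the cost of a workload is each token count scaled by its per-million rate. The sketch below uses the listed prices and `Decimal` to avoid float rounding in money arithmetic. It deliberately ignores the free tier, whose terms are not described here, and the function name and example workload are hypothetical.

```python
from decimal import Decimal

# Listed prices in USD per 1M tokens; Decimal avoids float rounding in money math.
INPUT_USD_PER_M = Decimal("0.05")
OUTPUT_USD_PER_M = Decimal("0.08")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Pay-as-you-go cost for a workload; ignores the free tier entirely."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_M / TOKENS_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_M / TOKENS_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens and 2M output tokens.
    print(estimate_cost_usd(10_000_000, 2_000_000))
```

For that example workload the estimate is $0.66: $0.50 for input plus $0.16 for output.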

AI Analysis · gpt-4o-mini

Technical Verdict

Input costs $0.05 per 1M tokens and output costs $0.08 per 1M tokens, a mid-tier price point for this model class, and the 1,200 tok/s throughput is fast. The rate limits of 20,000 tokens per minute (TPM) and 30 requests per minute (RPM) suit low-volume prototyping: if all 30 request slots are used, the token budget works out to roughly 667 tokens per request on average. The 120 ms time to first token favors real-time applications such as chat interfaces.
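To stay under both rate limits on the client side, a caller can track admitted requests over a sliding 60-second window and refuse any request that would exceed either budget. The sketch below is a hypothetical `MinuteWindowLimiter`, not part of any Groq SDK. It assumes the provider counts usage per rolling minute and that the `tokens` argument is the caller's estimate of prompt plus maximum completion tokens; the provider's actual accounting may differ, so this is a conservative guard rather than a model of server behavior.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class MinuteWindowLimiter:
    """Hypothetical client-side guard for per-minute request and token budgets.

    Keeps a sliding 60-second log of (timestamp, tokens) for admitted requests
    and refuses a new request if it would exceed either budget.
    """

    WINDOW_S = 60.0

    def __init__(self, rpm: int = 30, tpm: int = 20_000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm        # listed requests-per-minute limit
        self.tpm = tpm        # listed tokens-per-minute limit
        self.clock = clock    # injectable so the logic can be tested without sleeping
        self._log: Deque[Tuple[float, int]] = deque()

    def _evict_expired(self, now: float) -> None:
        # Drop entries that are at least one full window old.
        while self._log and now - self._log[0][0] >= self.WINDOW_S:
            self._log.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Admit and record a request of `tokens` tokens if both budgets allow it.

        Assumption: `tokens` is the caller's estimate of prompt tokens plus the
        maximum completion tokens requested.
        """
        if tokens < 0 or tokens > self.tpm:
            raise ValueError("tokens must be between 0 and the per-minute budget")
        now = self.clock()
        self._evict_expired(now)
        used = sum(t for _, t in self._log)
        if len(self._log) >= self.rpm or used + tokens > self.tpm:
            return False
        self._log.append((now, tokens))
        return True


if __name__ == "__main__":
    limiter = MinuteWindowLimiter()
    admitted = sum(limiter.try_acquire(500) for _ in range(40))
    print(f"admitted {admitted} of 40 back-to-back 500-token requests")
```

With 500-token requests the 30 RPM cap binds first (30 admitted, 15,000 tokens used); with 1,000-token requests the 20,000 TPM cap binds first, after 20 requests. Injecting `clock` keeps the window logic testable without real waiting.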