Together AI Llama 3.1 8B Turbo

Together AI Llama 3.1 8B Turbo: $0.18 per 1M input tokens, 1,100 tok/s throughput, 150 ms latency (TTFT). Compare on BenchNode.io.

$0.18 per 1M input tokens
$0.18 per 1M output tokens
Free tier available

Specifications

model id: llama-3.1-8b
model family: Llama 3.1
parameters: 8B
context window tokens: 128,000
modality: text
reasoning: No

Performance

Latency (TTFT): 150 ms (lower is better)
Uptime SLA: 99.9%
Throughput: 1,100 tok/s

Pricing Detail

input per 1m tokens usd: 0.18
output per 1m tokens usd: 0.18
free tier available: Yes
rate limit tpm: 60,000
rate limit rpm: 60
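
The prices and limits above can be turned into a rough per-request cost and a requests-per-minute ceiling. The following is a minimal sketch using only the figures listed on this page. The function names and the example workload are hypothetical, and it assumes billing is strictly linear in token count and that the TPM limit counts input and output tokens together, neither of which this page confirms.

```python
# Prices and limits copied from the listing above.
INPUT_USD_PER_1M = 0.18
OUTPUT_USD_PER_1M = 0.18
RATE_LIMIT_TPM = 60_000
RATE_LIMIT_RPM = 60


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, assuming purely linear per-token billing."""
    return (input_tokens * INPUT_USD_PER_1M
            + output_tokens * OUTPUT_USD_PER_1M) / 1_000_000


def max_requests_per_minute(tokens_per_request: int) -> int:
    """Requests per minute allowed by whichever limit binds first.

    Assumption: the TPM limit counts input and output tokens together.
    """
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    by_tokens = RATE_LIMIT_TPM // tokens_per_request
    return min(RATE_LIMIT_RPM, by_tokens)


if __name__ == "__main__":
    # Hypothetical workload: 800 input and 200 output tokens per request.
    print(f"cost per request: ${request_cost_usd(800, 200):.6f}")
    print(f"max requests/min at 1,000 tokens each: {max_requests_per_minute(1000)}")
    print(f"max requests/min at 4,000 tokens each: {max_requests_per_minute(4000)}")
```

Under these assumptions, requests of up to 1,000 tokens are capped by the RPM limit, while larger requests hit the TPM limit first.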

AI Analysis · gpt-4o-mini

Technical Verdict

At $0.18 per 1M tokens for both input and output, pricing is mid-tier for the model class. Throughput of 1,100 tok/s and a first-token latency of 150 ms favor real-time streaming and chat applications. The rate limits of 60,000 TPM and 60 RPM work out to an average of one request per second and, with both limits saturated, about 1,000 tokens per request, which indicates suitability for low-volume prototyping.
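
To make the latency and throughput figures concrete, the sketch below estimates how long a streamed response might take end to end. It is a simplified model, not a measurement: it assumes the listed 1,100 tok/s is the per-request output speed and that generation proceeds at a constant rate after the first token. The function name and the example response lengths are hypothetical.

```python
TTFT_SECONDS = 0.150           # first-token latency from the listing
THROUGHPUT_TOK_PER_S = 1_100   # throughput from the listing


def estimated_response_seconds(output_tokens: int) -> float:
    """Approximate time to stream a full response.

    Simple model: time to first token plus generation time at the
    listed throughput. Assumes throughput is per request and constant.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / THROUGHPUT_TOK_PER_S


if __name__ == "__main__":
    # Hypothetical response lengths for a chat-style workload.
    for tokens in (50, 550, 2_000):
        seconds = estimated_response_seconds(tokens)
        print(f"{tokens} output tokens: ~{seconds:.2f} s")
```

Real response times also depend on network latency, prompt length, and provider load, none of which this model captures.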