Together AI Qwen3 32B

Together AI Qwen3 32B: $0.8/1M input tokens, 520 tok/s throughput, 320ms latency. Compare on BenchNode.io.

$0.80 per 1M input tokens · $0.80 per 1M output tokens · Free tier available

Specifications

Model ID: qwen3-32b
Model family: Qwen3
Parameters: 32B
Context window: 32,768 tokens
Modality: text
Reasoning: Yes
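The prompt and the completion share the 32,768-token context window, so the room left for output shrinks as the prompt grows. The Python sketch below computes that remainder. The function name is hypothetical, and it assumes that reasoning tokens (the model supports reasoning) are drawn from the same window. That is typical for decoder-only models but worth confirming with the provider.

```python
# Context window from the Specifications section.
CONTEXT_WINDOW_TOKENS = 32_768

def max_output_tokens(prompt_tokens: int,
                      context_window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Tokens left for the completion (assumed to include any reasoning tokens).

    Assumes prompt and completion share a single window; raises ValueError
    if the prompt alone leaves no room for output.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"a {prompt_tokens}-token prompt leaves no room in a "
            f"{context_window}-token window"
        )
    return remaining

print(max_output_tokens(30_000))  # 2768
```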

Performance

Latency (TTFT, time to first token): 320 ms (lower is better)
Uptime SLA: 99.9%
Throughput: 520 tok/s
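One rough way to combine the two performance figures is to add the time to first token to the time needed to stream the output at the quoted throughput. The sketch below assumes 520 tok/s is a per-request decoding rate (the page does not say whether it is per stream or aggregate) and ignores network overhead. The function name is hypothetical.

```python
# Figures from the Performance section; treating throughput as a per-request
# streaming rate is an assumption, not something the page states.
TTFT_S = 0.320
THROUGHPUT_TOK_S = 520

def estimated_response_time_s(output_tokens: int,
                              ttft_s: float = TTFT_S,
                              throughput_tok_s: float = THROUGHPUT_TOK_S) -> float:
    """Approximate wall-clock seconds until the last output token arrives."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    # TTFT plus streaming time for the whole output (a slight overestimate,
    # since the first token is already covered by TTFT).
    return ttft_s + output_tokens / throughput_tok_s

print(round(estimated_response_time_s(520), 2))    # 1.32
print(round(estimated_response_time_s(2_000), 2))  # 4.17
```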

Pricing Detail

Input: $0.80 per 1M tokens
Output: $0.80 per 1M tokens
Free tier available: Yes
Rate limit: 10,000 tokens per minute (TPM)
Rate limit: 60 requests per minute (RPM)
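Because input and output share the same price, the cost of a request depends only on its total token count. The Python sketch below estimates cost from the listed prices. The helper name is hypothetical, and it ignores any free-tier allowance because the page does not state its size.

```python
# Listed prices in USD per 1M tokens (from the Pricing Detail section).
INPUT_USD_PER_1M = 0.8
OUTPUT_USD_PER_1M = 0.8

def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price: float = INPUT_USD_PER_1M,
                     output_price: float = OUTPUT_USD_PER_1M) -> float:
    """Estimated cost of one request in USD, excluding any free-tier credit."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# 2,000 prompt tokens + 500 completion tokens = 2,500 tokens at $0.80 per 1M.
print(f"${request_cost_usd(2_000, 500):.4f}")  # $0.0020

# Upper bound on hourly spend if the 10,000 TPM limit is fully used
# (prices are equal, so the input/output split does not matter).
print(f"${request_cost_usd(10_000 * 60, 0):.2f} per hour")  # $0.48 per hour
```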

AI Analysis · gpt-4o-mini

Technical Verdict

Input and output tokens are both priced at $0.80 per 1M, which places this API tier in the mid-price range for its model class, and the 520 tok/s throughput is relatively fast. The rate limits, however, cap sustained usage well below that throughput: 10,000 TPM averages out to about 167 tokens per second, and 60 RPM limits request volume. This makes the setup better suited to low-volume prototyping than to high-volume production. The 320 ms first-token latency is better aligned with batch use than with real-time applications.
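To see which rate limit decides the workload size, the Python sketch below computes how many requests fit in one minute for a given average request size. It assumes both input and output tokens count toward the TPM limit, which is common but not stated on this page. The function name is hypothetical.

```python
# Limits from the Pricing Detail section.
RATE_LIMIT_TPM = 10_000  # tokens per minute
RATE_LIMIT_RPM = 60      # requests per minute

def max_requests_per_minute(tokens_per_request: int,
                            tpm: int = RATE_LIMIT_TPM,
                            rpm: int = RATE_LIMIT_RPM) -> int:
    """Requests per minute allowed when each request uses `tokens_per_request` tokens.

    Whichever limit is reached first caps the rate. Assumes input and output
    tokens both count toward TPM.
    """
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    return min(rpm, tpm // tokens_per_request)

print(max_requests_per_minute(100))    # 60: the RPM limit binds
print(max_requests_per_minute(2_500))  # 4: the TPM limit binds
```

Once requests average more than about 166 tokens (10,000 / 60 ≈ 166.7), the TPM limit binds first, so even modest prompts quickly reduce the achievable request rate.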