NVIDIA: Llama 3.3 Nemotron Super 49B V1.5
NVIDIA · Budget · Context 131K
nvidia/llama-3.3-nemotron-super-49b-v1.5
Data as of:
LLM API list prices change frequently (new models and price cuts are common) and vary by tier, region, batch / cache usage and time. These are list prices captured at the time shown; always verify the current price with the provider before relying on it.
Price summary
per 1M input tokens
per 1M output tokens
0.75×input + 0.25×output (factual)
Blended $/1M is a published convenience figure: 0.75 × input + 0.25 × output (a stated 3:1 input:output mix). It is descriptive arithmetic, not a value verdict.
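The blended figure described above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function name `blended_price` and the example prices are illustrative assumptions, not values published on this page.

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Blended $ per 1M tokens for a 3:1 input:output mix.

    Both arguments are list prices in $ per 1M tokens.
    """
    return 0.75 * input_price + 0.25 * output_price


# Hypothetical list prices in $ per 1M tokens (placeholders; verify with the provider).
example_input_price = 0.10
example_output_price = 0.40

blended = blended_price(example_input_price, example_output_price)
print(f"${blended:.3f} per 1M blended tokens")
```

The weights 0.75 and 0.25 encode the stated 3:1 mix; a workload with a different input:output ratio would need different weights.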
Specifications
Capability
Capability values are the per-model scores published by the Open LLM Leaderboard (Hugging Face), shown unedited and with no “best” verdict. The leaderboard evaluates open-weight models only and lags the newest releases, so many models (including closed/proprietary APIs) have no value and show “—”. Different benchmarks rank models differently; treat this as one signal among many. As of 2026-05-25. Source: Open LLM Leaderboard (Hugging Face) (Apache-2.0).
Try it / official references
- OpenRouter model page (specs + try-it chat)
- Provider API documentation — NVIDIA
- Provider playground — NVIDIA
External links open the provider's own pages; list prices and availability there are authoritative.
Estimated cost per use case
| Use case | Input tokens per request | Output tokens per request | Cost per 1,000 requests |
|---|---|---|---|
| Chat / assistant | 1,000 | 500 | $0.30 |
| RAG / Q&A | 8,000 | 800 | $1.12 |
| Coding agent | 6,000 | 2,000 | $1.40 |
| Summarization | 12,000 | 600 | $1.44 |
Each row is ((input_tokens / 1M) × input_price + (output_tokens / 1M) × output_price) × 1,000 requests, with prices in $ per 1M tokens. The per-request token counts are assumptions, as shown in the table. Not a recommendation.
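The per-row arithmetic can be sketched as follows. The function name `cost_per_1000_requests` and the two price constants are assumptions made for illustration: this section does not state the list prices, and the values used here ($0.10 input, $0.40 output per 1M tokens) were chosen only because they are consistent with the table rows. Check the provider's current prices before reusing them.

```python
def cost_per_1000_requests(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
) -> float:
    """Cost in $ for 1,000 requests.

    Token counts are per request; prices are in $ per 1M tokens.
    """
    per_request = (
        (input_tokens / 1_000_000) * input_price
        + (output_tokens / 1_000_000) * output_price
    )
    return per_request * 1_000


# Assumed prices in $ per 1M tokens (inferred for illustration, not stated in this section).
ASSUMED_INPUT_PRICE = 0.10
ASSUMED_OUTPUT_PRICE = 0.40

# Per-request token counts taken from the table above.
USE_CASES = {
    "Chat / assistant": (1_000, 500),
    "RAG / Q&A": (8_000, 800),
    "Coding agent": (6_000, 2_000),
    "Summarization": (12_000, 600),
}

for name, (tokens_in, tokens_out) in USE_CASES.items():
    cost = cost_per_1000_requests(
        tokens_in, tokens_out, ASSUMED_INPUT_PRICE, ASSUMED_OUTPUT_PRICE
    )
    print(f"{name}: ${cost:.2f} per 1,000 requests")
```

Replacing the token counts with figures measured from a real workload gives a closer estimate than the generic rows in the table.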