The provider object is accepted on Chat Completions, Anthropic Messages, and Gemini Generate requests, and it is enforced consistently across non-streaming and streaming executions (including fallback paths).
Quick Start
Use the provider object to control routing behavior per request:
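A minimal sketch of a Chat Completions request carrying the provider object. The base URL, API key variable, and provider tags are placeholders, not values from this page:

```python
import os

import requests

# Placeholder base URL and API key variable; substitute your deployment's values.
BASE_URL = "https://adaptive.example.com"
HEADERS = {"Authorization": f"Bearer {os.environ['ADAPTIVE_API_KEY']}"}

body = {
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "Summarize this ticket in one sentence."}],
    "provider": {
        "sort": "throughput",                  # prefer high-capacity endpoints
        "only": ["provider-a", "provider-b"],  # illustrative provider tags
        "allow_fallbacks": True,               # keep provider retries enabled
    },
}

resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=body, headers=HEADERS, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```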
provider.allow_fallbacks simply toggles the new fallback.enabled flag. Use the fallback object when you need finer control over mode, retries, or circuit breakers.
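For example, these two request fragments disable retries in equivalent ways (a sketch using only fields documented on this page):

```python
# Shorthand: the convenience flag on the provider object.
body_shorthand = {"provider": {"allow_fallbacks": False}}

# Explicit form: the fallback object, which also carries the finer-grained
# settings (mode, retries, circuit breakers) not shown here.
body_explicit = {"fallback": {"enabled": False}}
```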
Real Examples

Cost Control
“Set maximum prices for all requests”
Use max_price to enforce budget limits and prevent unexpected costs.

Compliance
“Only use zero data retention providers”
Enforce data privacy requirements with zdr and data_collection filters.

Performance
“Prioritize throughput for batch jobs”
Use sort: "throughput" or :nitro shortcuts for high-volume processing.

Reliability
“Restrict to trusted providers”
Use only to whitelist specific providers for critical applications.

Configuration Options

Provider Ordering
Control which providers are tried first:
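A sketch of a provider block with explicit ordering; the tags are illustrative, not registry values from this page:

```python
provider = {
    # Providers to try first, in this exact order.
    "order": ["provider-a", "provider-b", "provider-c"],
    # Hard whitelist: if no allowed provider remains after filtering, the request is rejected.
    "only": ["provider-a", "provider-b", "provider-c"],
}
```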
Cost Optimization

Set maximum prices to control spending:
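A sketch of price ceilings. The max_price sub-field names (prompt, completion, request) are assumptions inferred from the descriptions below; confirm them against your schema:

```python
provider = {
    "max_price": {
        "prompt": 1.0,      # USD per million prompt tokens (sub-field name assumed)
        "completion": 3.0,  # USD per million completion tokens (sub-field name assumed)
        "request": 0.01,    # USD per request (sub-field name assumed)
    },
    "sort": "price",        # order remaining providers by price when no explicit order is given
}
```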
Compliance & Security

Enforce data privacy and security requirements:
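A sketch combining the privacy-related filters; the whitelist tag is illustrative:

```python
provider = {
    "zdr": True,                # only Zero Data Retention endpoints
    "data_collection": "deny",  # drop providers marked as data-retaining
    "only": ["provider-a"],     # optional: additionally pin to trusted providers
}
```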
Provider Parameters

- order – Explicit list of provider tags to try first. When omitted, Adaptive’s heuristics determine the initial ordering.
- only – Whitelist of providers/endpoint tags. Requests are rejected if no allowed provider remains.
- Blacklist of providers/endpoint tags to skip even when the router selects them.
- sort – Secondary ordering when order is absent. Options: price, throughput, and latency, which map to cost, capacity, and responsiveness heuristics.
- quantizations – Require specific quantization levels (e.g., ["int8", "fp8"]). Endpoint metadata is used; models lacking the requested format are filtered out.
- When true, the model must advertise support for every parameter implied by the request (tools, response_format, etc.).
- data_collection – allow (default) or deny. When deny, only providers marked as non-retentive in the registry remain. (Falls back to current metadata; future registry updates will make this stricter.)
- zdr – Restrict routing to Zero Data Retention endpoints.
- Filter to models whose publishers have opted into distillable outputs.
- allow_fallbacks – Convenience flag that maps to fallback.enabled. Set to false to disable provider retries entirely.
- max_price – Ceilings for prompt/completion/request/image pricing. Providers lacking explicit pricing are treated as exceeding the cap.
  - Maximum price for prompt tokens (USD per million tokens).
  - Maximum price for completion tokens (USD per million tokens).
  - Maximum price per request (USD).
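Putting several of the parameters above together in one request (a sketch; the endpoint URL, API key variable, provider tags, and max_price sub-field names are placeholders or assumptions):

```python
import os

import requests

body = {
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "Draft a one-paragraph changelog entry."}],
    "provider": {
        "order": ["provider-a", "provider-b"],            # try these first
        "quantizations": ["fp8"],                         # only fp8-served endpoints
        "data_collection": "deny",                        # skip data-retaining providers
        "allow_fallbacks": True,                          # retry on the next provider on failure
        "max_price": {"prompt": 2.0, "completion": 6.0},  # sub-field names assumed
    },
}

resp = requests.post(
    "https://adaptive.example.com/v1/chat/completions",   # placeholder base URL
    json=body,
    headers={"Authorization": f"Bearer {os.environ['ADAPTIVE_API_KEY']}"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```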
Intelligent Routing + Provider Constraints
- Logical model selection still happens through the Adaptive router (unless you hard-code model).
- Provider constraints (order/only/quantization/price/etc.) are applied when building the physical execution plan.
- Fallback now respects fallback.enabled. When disabled, the first provider failure surfaces directly.
- If you request quantizations: ["fp8"], every provider in the execution plan satisfies that requirement, as sketched below.
Nitro / Floor Shortcuts
- Append :nitro to any model slug to imply provider.sort = "throughput".
- Append :floor to imply provider.sort = "price".
- These shortcuts apply when you set model directly (e.g., meta-llama/llama-3.1-70b-instruct:nitro); see the sketch below.
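The shortcut is equivalent to setting provider.sort yourself; a sketch of both spellings (request bodies only):

```python
# Shortcut: append :nitro to the slug...
body_nitro = {
    "model": "meta-llama/llama-3.1-70b-instruct:nitro",
    "messages": [{"role": "user", "content": "ping"}],
}

# ...which is the same as asking for throughput-sorted providers explicitly.
body_explicit = {
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "ping"}],
    "provider": {"sort": "throughput"},
}
```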
Endpoint Coverage
| Endpoint | Support |
|---|---|
| /v1/chat/completions | Full provider object + fallback.enabled |
| /v1/messages | Same provider fields + fallback toggle |
| /v1/models/:generate | Same provider fields + fallback toggle |
The Gemini streaming API now builds the same provider execution plan as the non-streaming route, so ordering and filtering are consistent everywhere.
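The same provider block carries over unchanged to the Messages endpoint; a sketch (the model slug, URL, API key variable, and auth header scheme are placeholders or assumptions):

```python
import os

import requests

body = {
    "model": "claude-3-5-sonnet",  # illustrative slug
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {"only": ["provider-a"], "sort": "latency"},  # same fields as Chat Completions
    "fallback": {"enabled": True},
}

resp = requests.post(
    "https://adaptive.example.com/v1/messages",  # placeholder base URL
    json=body,
    headers={"Authorization": f"Bearer {os.environ['ADAPTIVE_API_KEY']}"},  # auth scheme assumed
    timeout=60,
)
print(resp.status_code)
```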
Migration Tips
- Existing code: No changes required unless you want to leverage the new controls. Previous behavior (no provider object) is unchanged.
- Fallback: If you relied on “unset mode = disabled,” switch to fallback.enabled=false (or provider.allow_fallbacks=false).
- Registry metadata: Some filters (data collection, ZDR, distillable text) depend on registry tags. They currently act as “best effort” switches and will grow stricter as the registry schema expands.



