The provider object is accepted on Chat Completions, Anthropic Messages, and Gemini Generate requests, and it is enforced consistently across non-streaming and streaming executions (including fallback paths).
Quick Start
Use the provider object to control routing behavior per request:
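A minimal sketch of a Chat Completions request carrying the provider object. The base URL, API key variable, and provider tags are placeholders, not values from this page:

```python
import os

import requests

# Placeholder base URL and API key variable; substitute your deployment's values.
BASE_URL = "https://adaptive.example.com"
HEADERS = {"Authorization": f"Bearer {os.environ['ADAPTIVE_API_KEY']}"}

body = {
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "Summarize this ticket in one sentence."}],
    "provider": {
        "sort": "throughput",                  # prefer high-capacity endpoints
        "only": ["provider-a", "provider-b"],  # illustrative provider tags
        "allow_fallbacks": True,               # keep provider retries enabled
    },
}

resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=body, headers=HEADERS, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```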
provider.allow_fallbacks simply toggles the new fallback.enabled flag. Use the fallback object when you need finer control over mode, retries, or circuit breakers.
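For example, these two request fragments disable retries in equivalent ways (a sketch using only fields documented on this page):

```python
# Shorthand: the convenience flag on the provider object.
body_shorthand = {"provider": {"allow_fallbacks": False}}

# Explicit form: the fallback object, which also carries the finer-grained
# settings (mode, retries, circuit breakers) not shown here.
body_explicit = {"fallback": {"enabled": False}}
```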
Real Examples

Cost Control
“Set maximum prices for all requests”
Use max_price to enforce budget limits and prevent unexpected costs.

Compliance
“Only use zero data retention providers”
Enforce data privacy requirements with zdr and data_collection filters.

Performance
“Prioritize throughput for batch jobs”
Use sort: "throughput" or :nitro shortcuts for high-volume processing.

Reliability
“Restrict to trusted providers”
Use only to whitelist specific providers for critical applications.

Configuration Options

Provider Ordering
Control which providers are tried first:
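A sketch of a provider block with explicit ordering; the tags are illustrative, not registry values from this page:

```python
provider = {
    # Providers to try first, in this exact order.
    "order": ["provider-a", "provider-b", "provider-c"],
    # Hard whitelist: if no allowed provider remains after filtering, the request is rejected.
    "only": ["provider-a", "provider-b", "provider-c"],
}
```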
Cost Optimization

Set maximum prices to control spending:
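A sketch of price ceilings. The max_price sub-field names (prompt, completion, request) are assumptions inferred from the descriptions below; confirm them against your schema:

```python
provider = {
    "max_price": {
        "prompt": 1.0,      # USD per million prompt tokens (sub-field name assumed)
        "completion": 3.0,  # USD per million completion tokens (sub-field name assumed)
        "request": 0.01,    # USD per request (sub-field name assumed)
    },
    "sort": "price",        # order remaining providers by price when no explicit order is given
}
```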
Compliance & Security

Enforce data privacy and security requirements:
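A sketch combining the privacy-related filters; the whitelist tag is illustrative:

```python
provider = {
    "zdr": True,                # only Zero Data Retention endpoints
    "data_collection": "deny",  # drop providers marked as data-retaining
    "only": ["provider-a"],     # optional: additionally pin to trusted providers
}
```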
Provider Parameters

- order – Explicit list of provider tags to try first. When omitted, Adaptive’s heuristics determine the initial ordering.
- only – Whitelist of providers/endpoint tags. Requests are rejected if no allowed provider remains.
- Blacklist of providers/endpoint tags to skip even when the router selects them.
- sort – Secondary ordering when order is absent. Options: price, throughput, and latency, which map to cost, capacity, and responsiveness heuristics.
- quantizations – Require specific quantization levels (e.g., ["int8", "fp8"]). Endpoint metadata is used; models lacking the requested format are filtered out.
- When true, the model must advertise support for every parameter implied by the request (tools, response_format, etc.).
- data_collection – allow (default) or deny. When deny, only providers marked as non-retentive in the registry remain. (Falls back to current metadata; future registry updates will make this stricter.)
- zdr – Restrict routing to Zero Data Retention endpoints.
- Filter to models whose publishers have opted into distillable outputs.
- allow_fallbacks – Convenience flag that maps to fallback.enabled. Set to false to disable provider retries entirely.
- max_price – Ceilings for prompt/completion/request/image pricing. Providers lacking explicit pricing are treated as exceeding the cap.
  - Maximum price for prompt tokens (USD per million tokens).
  - Maximum price for completion tokens (USD per million tokens).
  - Maximum price per request (USD).
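Putting several of the parameters above together in one request (a sketch; the endpoint URL, API key variable, provider tags, and max_price sub-field names are placeholders or assumptions):

```python
import os

import requests

body = {
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "Draft a one-paragraph changelog entry."}],
    "provider": {
        "order": ["provider-a", "provider-b"],            # try these first
        "quantizations": ["fp8"],                         # only fp8-served endpoints
        "data_collection": "deny",                        # skip data-retaining providers
        "allow_fallbacks": True,                          # retry on the next provider on failure
        "max_price": {"prompt": 2.0, "completion": 6.0},  # sub-field names assumed
    },
}

resp = requests.post(
    "https://adaptive.example.com/v1/chat/completions",   # placeholder base URL
    json=body,
    headers={"Authorization": f"Bearer {os.environ['ADAPTIVE_API_KEY']}"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```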
Intelligent Routing + Provider Constraints
- Logical model selection still happens through the Adaptive router (unless you hard-code model).
- Provider constraints (order/only/quantization/price/etc.) are applied when building the physical execution plan.
- Fallback now respects fallback.enabled. When disabled, the first provider failure surfaces directly.
- If you request quantizations: ["fp8"], every provider in the execution plan satisfies that requirement, as sketched below.
Nitro / Floor Shortcuts
- Append :nitro to any model slug to imply provider.sort = "throughput".
- Append :floor to imply provider.sort = "price".
- These shortcuts apply when you set model directly (e.g., meta-llama/llama-3.1-70b-instruct:nitro); see the sketch below.
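The shortcut is equivalent to setting provider.sort yourself; a sketch of both spellings (request bodies only):

```python
# Shortcut: append :nitro to the slug...
body_nitro = {
    "model": "meta-llama/llama-3.1-70b-instruct:nitro",
    "messages": [{"role": "user", "content": "ping"}],
}

# ...which is the same as asking for throughput-sorted providers explicitly.
body_explicit = {
    "model": "meta-llama/llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "ping"}],
    "provider": {"sort": "throughput"},
}
```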
Endpoint Coverage
| Endpoint | Support |
|---|---|
| /v1/chat/completions | Full provider object + fallback.enabled |
| /v1/messages | Same provider fields + fallback toggle |
| /v1/models/:generate | Same provider fields + fallback toggle |
The Gemini streaming API now builds the same provider execution plan as the non-streaming route, so ordering and filtering are consistent everywhere.
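The same provider block carries over unchanged to the Messages endpoint; a sketch (the model slug, URL, API key variable, and auth header scheme are placeholders or assumptions):

```python
import os

import requests

body = {
    "model": "claude-3-5-sonnet",  # illustrative slug
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {"only": ["provider-a"], "sort": "latency"},  # same fields as Chat Completions
    "fallback": {"enabled": True},
}

resp = requests.post(
    "https://adaptive.example.com/v1/messages",  # placeholder base URL
    json=body,
    headers={"Authorization": f"Bearer {os.environ['ADAPTIVE_API_KEY']}"},  # auth scheme assumed
    timeout=60,
)
print(resp.status_code)
```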
Migration Tips
- Existing code: No changes required unless you want to leverage the new controls. Previous behavior (no provider object) is unchanged.
- Fallback: If you relied on “unset mode = disabled,” switch to fallback.enabled=false (or provider.allow_fallbacks=false).
- Registry metadata: Some filters (data collection, ZDR, distillable text) depend on registry tags. They currently act as “best effort” switches and will grow stricter as the registry schema expands.



