Why Use This?
Use Adaptive’s intelligence, run inference wherever you want:
- “I have my own OpenAI/Anthropic accounts” - Get optimal model selection, pay your providers directly
- “I run models on-premise” - Get routing decisions for your local infrastructure
- “I have enterprise contracts” - Use your existing provider relationships with intelligent routing
- “I need data privacy” - Keep inference local while getting smart model selection
Request
Provider-agnostic format - send your available models and prompt, get intelligent selection back (see the sketch after this list).
- Models - array of available model specifications in provider:model_name format. Adaptive automatically queries the Model Registry to fill in pricing, capabilities, and other details for known models.
- Prompt - the prompt text to analyze for optimal model selection.
- Cost optimization preference - 0.0 = cheapest, 1.0 = best performance. Defaults to the server configuration; override it to prioritize cost savings or performance for this specific selection.
- Semantic cache - semantic cache configuration for this request.
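A minimal request sketch in Python, assuming a JSON body with fields named `models`, `prompt`, and `cost_bias`, and a placeholder endpoint path of `/v1/select-model`; these names are illustrative only, so check the API reference for the exact schema:

```python
# Illustrative sketch only: the endpoint path and field names
# (models, prompt, cost_bias) are assumptions, not the confirmed schema.
import requests

payload = {
    "models": [                      # available models in provider:model_name format
        "openai:gpt-4o",
        "openai:gpt-4o-mini",
        "anthropic:claude-3-5-sonnet",
    ],
    "prompt": "Summarize this incident report and list the action items.",
    "cost_bias": 0.3,                # 0.0 = cheapest, 1.0 = best performance (assumed name)
}

resp = requests.post(
    "https://YOUR_ADAPTIVE_HOST/v1/select-model",        # placeholder URL
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json=payload,
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```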
Response
- Selected model - complete model information for the chosen model.
- Alternatives (optional) - fallback model options if the primary selection is unavailable. Each alternative is a complete RegistryModel object.
- Cache hit information - indicates whether the selection came from cache (“semantic_exact”, “semantic_similar”, or empty if not cached).
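For illustration, a possible response shape and how a client might consume it; the field names (`selected_model`, `alternatives`, `cache_tier`) are assumptions based on the descriptions above, not the authoritative schema:

```python
# Assumed response shape, for illustration only; field names here are not
# guaranteed to match the real schema.
selection = {
    "selected_model": {"provider": "anthropic", "model_name": "claude-3-5-sonnet"},
    "alternatives": [{"provider": "openai", "model_name": "gpt-4o-mini"}],
    "cache_tier": "semantic_similar",
}

chosen = selection["selected_model"]            # complete model info for the chosen model
fallbacks = selection.get("alternatives", [])   # optional fallback models
cache_tier = selection.get("cache_tier", "")    # "semantic_exact", "semantic_similar", or "" on a miss

print(f"Route to {chosen['provider']}:{chosen['model_name']} (cache: {cache_tier or 'miss'})")
```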
Authentication
Same as chat completions.
No Inference = Fast & Cheap
This endpoint is:
- ✅ Fast - No LLM inference, just routing logic
- ✅ Cheap - Doesn’t count against token usage
- ✅ Accurate - Uses exact same selection logic as real completions
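Putting it together, a hedged end-to-end sketch: ask Adaptive for a routing decision, then run inference on your own provider account so tokens are billed to you directly. The endpoint path and response field names are assumptions, and the inference call uses the openai Python SDK (v1+):

```python
# Hedged end-to-end sketch: get a routing decision from Adaptive, then run
# inference on your own provider account. The endpoint path and field names
# are assumptions; the inference call assumes the openai>=1.0 Python SDK.
import requests
from openai import OpenAI

prompt = "Draft a polite follow-up email to a customer about a delayed order."

selection = requests.post(
    "https://YOUR_ADAPTIVE_HOST/v1/select-model",        # placeholder URL
    headers={"Authorization": "Bearer YOUR_ADAPTIVE_KEY"},
    json={"models": ["openai:gpt-4o", "openai:gpt-4o-mini"], "prompt": prompt},
    timeout=10,
).json()

provider = selection["selected_model"]["provider"]       # assumed field names
model = selection["selected_model"]["model_name"]

if provider == "openai":
    # Inference runs against your own OpenAI account; tokens are billed there,
    # not by Adaptive.
    client = OpenAI(api_key="YOUR_OPENAI_KEY")
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(reply.choices[0].message.content)
```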



