Influxion

Deployments Overview

If this is your first time here, we recommend reading our Mission Statement before starting.

Deployments are the central components of Influxion's adaptive orchestration. They are where you configure your application's requirements and monitor routing behavior. Think of a Deployment as a virtual model that is customized to your needs and evolves with them.

A Deployment is, at its core, a collection of models and behavioral requirements. Once configured, Influxion's intelligent routing selects the best model for each request, achieving better results than any single model can on its own. A control system continuously monitors model behavior and adjusts routing to satisfy your custom requirements, based on dimensions such as:

  • Performance metrics
  • Cost optimization
  • Reliability and availability
  • Accuracy metrics from integrated eval platforms
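To make the control-system idea concrete, here is a minimal sketch of how per-model behavior might be tracked with a smoothed (exponentially weighted) estimate and checked against a requirement. This is an illustration only, not Influxion's implementation; the class, model name, and metric values are all made up:

```python
class LatencyMonitor:
    """Tracks a smoothed latency estimate per model (EWMA)."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha
        self.estimates: dict[str, float] = {}

    def observe(self, model: str, latency_s: float) -> None:
        # Blend the new observation into the rolling estimate.
        prev = self.estimates.get(model, latency_s)
        self.estimates[model] = (1 - self.alpha) * prev + self.alpha * latency_s

    def within(self, model: str, limit_s: float) -> bool:
        """True if the model's smoothed latency satisfies the limit."""
        return self.estimates.get(model, float("inf")) <= limit_s


mon = LatencyMonitor()
for obs in (1.0, 1.2, 0.9):
    mon.observe("small-fast", obs)
print(mon.within("small-fast", limit_s=5.0))  # True
```

A real control system would track many dimensions at once (cost, error rate, eval scores) and feed them into routing decisions, but the shape is the same: observe, smooth, compare against requirements.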

Creating Deployments

What do you care about?

Just as there's no single answer to this question, there's no single best way to design a Deployment. However, there are certain properties that make some Deployments more useful than others.

Good Deployments specify desired behaviors that are achievable.

Constraints are quantifiable limits that Influxion's adaptive routing should respect. They are specified as inequalities, e.g., latency ≤ 5 seconds.

Deployments also support an objective dimension that is optimized at runtime. Objectives are minimized or maximized depending on the dimension: latency and cost are minimized, whereas throughput dimensions like Tokens Per Second are maximized. Objectives are optimized subject to constraints, i.e., constraints are the stricter requirement.
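The relationship between constraints and objectives can be sketched as "filter, then optimize": drop any model that violates a constraint inequality, then pick the best remaining model along the objective dimension. This is a hypothetical illustration with made-up model names and metric values, not Influxion's actual selection logic:

```python
# Candidate models with observed per-dimension metrics (made-up values).
models = {
    "small-fast":  {"latency_s": 1.2, "cost_per_1k": 0.002, "tps": 180},
    "mid-tier":    {"latency_s": 2.5, "cost_per_1k": 0.008, "tps": 120},
    "frontier-xl": {"latency_s": 4.0, "cost_per_1k": 0.030, "tps": 60},
}

# Constraints: inequalities every routed model must satisfy.
constraints = {"latency_s": 5.0, "cost_per_1k": 0.010}


def select(models, constraints, objective="tps", maximize=True):
    """Filter by constraints, then optimize the objective dimension."""
    feasible = {name: dims for name, dims in models.items()
                if all(dims[d] <= limit for d, limit in constraints.items())}
    best = max if maximize else min
    return best(feasible, key=lambda name: feasible[name][objective])


print(select(models, constraints))                                 # small-fast
print(select(models, constraints, "cost_per_1k", maximize=False))  # small-fast
```

Note that "frontier-xl" is excluded before the objective is even considered: it violates the cost constraint, so no objective value can make it eligible.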

Being realistic about your requirements helps you get the most out of Deployments.

Evals

Influxion integrates with LLM eval platforms, starting with DeepEval. Deployments can use a subset of DeepEval metrics, either as part of their behavior settings or purely for monitoring. We currently support:

  • Answer Relevancy
  • Bias
  • Toxicity
  • PII Leakage
  • Summarization

Evals require additional LLM usage, so they are billed in addition to your Deployment gateway requests, using the same pricing structure.
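Conceptually, eval scores behave like any other monitored dimension: scores accumulate per model and metric, and a model can be flagged when its rolling average drops below a threshold. The sketch below is illustrative plain Python, not DeepEval's API; the class, metric name, and scores are hypothetical:

```python
from collections import defaultdict, deque


class EvalMonitor:
    """Keeps a rolling window of eval scores per (model, metric) and
    flags models whose rolling average falls below a threshold."""

    def __init__(self, window: int = 50):
        self.scores = defaultdict(lambda: deque(maxlen=window))

    def record(self, model: str, metric: str, score: float) -> None:
        self.scores[(model, metric)].append(score)

    def passing(self, model: str, metric: str, threshold: float) -> bool:
        window = self.scores[(model, metric)]
        if not window:
            return True  # no data yet: assume healthy
        return sum(window) / len(window) >= threshold


mon = EvalMonitor()
for score in (0.9, 0.8, 0.95):
    mon.record("small-fast", "answer_relevancy", score)
print(mon.passing("small-fast", "answer_relevancy", threshold=0.7))  # True
```

When eval metrics are part of a Deployment's behavior settings rather than monitoring-only, failing this kind of check would influence routing instead of just being reported.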

Model Routing

Influxion automatically selects initial models for your Deployment based on your requirements; you can modify this selection later. Models exhibit tradeoffs across the dimensions you care about, and both model and application workload behaviors fluctuate over short and long time periods. Leaving a buffer between your requirements and typical behavior gives Influxion's adaptive routing room to optimize while still satisfying them. Your application requirements may also change over time, e.g., to reduce costs or improve performance.

A Deployment might include several models from the same provider with different cost/accuracy tradeoffs, e.g., cheaper, less capable models alongside more expensive frontier models.

Alternatively, a Deployment might include models from multiple providers to be more robust to provider system errors or rate limits.

Open source models can enable more predictable latency, cost, and accuracy tradeoffs. For example, a Deployment might include several models derived from a common base model, e.g., with different parameter counts and/or quantization levels.
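The multi-provider robustness described above can be sketched as a simple fallback chain: try providers in order and move on when one errors or rate-limits. The provider functions below are hypothetical stand-ins for real SDK calls, and Influxion's actual routing is adaptive rather than a static ordered list:

```python
# Hypothetical provider clients; a real Deployment calls provider SDKs.
def call_provider_a(prompt: str) -> str:
    raise TimeoutError("provider A rate-limited")  # simulate an outage


def call_provider_b(prompt: str) -> str:
    return f"answer from provider B: {prompt}"


def route_with_fallback(prompt, providers):
    """Try providers in priority order; fall back on errors or rate limits."""
    errors = []
    for call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # in practice: catch provider-specific errors
            errors.append(exc)
    raise RuntimeError(f"all providers failed: {errors}")


print(route_with_fallback("hello", [call_provider_a, call_provider_b]))
# answer from provider B: hello
```

Spreading a Deployment across providers this way means a single provider outage degrades only part of your candidate pool instead of your whole application.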

Get Started with Deployments

See the Deployments guide for instructions on creating and managing Deployments.