Deployments
Create and Manage Deployments
Background
If this is your first time here, we recommend reading our Mission Statement before starting. You will find Deployments immensely more useful—and easier to configure—if you can answer the question, "What do you care about?"
For additional background, we also recommend reading the Deployments Overview.
Only API keys owned by your account can use your Deployments in the Influxion gateway.
Create a Deployment
To create a new Deployment:
- Navigate to Projects → All in the sidebar.
- Click the card for the Project you want to use.
- Click New Deployment.
- Specify a Deployment Name, choose a Category, and optionally specify Tags.
- [Optional, but recommended] Specify your Requirements, which are grouped by Performance, Cost, and Accuracy.
- [Optional, but recommended] Choose an Optimization Target.
- [Optional] Add Observability metrics that use LLM-as-a-Judge.
- [Optional] Specify Model Presets that Influxion will automatically use with your requests.
- Click Create Deployment.
Being realistic about your requirements helps you get the most out of Deployments. Constraints limit options at runtime—we recommend starting simple, e.g., with 1 or 2 constraints, and building up from there.
LLM-as-a-Judge Evals
Influxion provides LLM-as-a-Judge Evals through an integration with DeepEval. Eval dimensions can be used as part of your behavioral requirements, or added purely for observability as an additional usage metric.
LLM Evals currently execute using the openai/gpt-4o-mini model.
When creating or editing a Deployment, you can specify the sampling probability, i.e., the likelihood of evaluating each individual gateway request.
This value defaults to 1.0, meaning 100% of requests in the Deployment will be evaluated.
A smaller value like 0.1 means roughly 10% of requests will be evaluated.
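The sampling decision can be pictured as an independent coin flip per request. This is only an illustrative sketch of the idea, not Influxion's internals; the function name is made up:

```python
import random

def should_evaluate(sampling_probability: float) -> bool:
    """Illustrative only: decide whether a single gateway request gets an eval."""
    return random.random() < sampling_probability

# With probability 0.1, roughly 10% of requests end up evaluated.
sampled = sum(should_evaluate(0.1) for _ in range(100_000))
print(f"{sampled / 100_000:.1%} of requests sampled")
```

Because each draw is independent, the realized fraction fluctuates around the configured probability rather than hitting it exactly.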
Evals require additional LLM usage, so they are billed in addition to your Deployment gateway requests, using the same pricing structure.
Active Deployments
Once deployed, use the Deployment Slug as the model parameter in your API requests.
This works just like a plain model name, except that the slug begins with an @ symbol.
```shell
curl -X POST https://api.influxion.io/v1/chat/completions \
  -H "Authorization: Bearer $INFLUXION_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "@your-username/your-deployment-name",
    "messages": [
      {
        "role": "user",
        "content": "Hello, world!"
      }
    ]
  }'
```

```python
import os

import requests

response = requests.post(
    "https://api.influxion.io/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['INFLUXION_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "@your-username/your-deployment-name",
        "messages": [
            {
                "role": "user",
                "content": "Hello, world!"
            }
        ]
    },
)
print(response.json())
```

```javascript
const response = await fetch(
  "https://api.influxion.io/v1/chat/completions",
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.INFLUXION_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "@your-username/your-deployment-name",
      messages: [
        {
          role: "user",
          content: "Hello, world!",
        },
      ],
    }),
  }
);
const data = await response.json();
console.log(data);
```

Train a Deployment
Deployments learn from your request workloads at runtime so they can satisfy your requirements.
When a Deployment is created, it has a Ready (Unverified) status.
This means it's in a learning phase, where it will route requests to many different models to understand how they respond to your workload.
After behaviors are determined, the Deployment will move to a Ready (Verified) status where it continually satisfies your requirements.
In the Continuous Learning tab in the sidebar, you can initiate Learning Jobs with your own datasets.
We'll run the jobs for you to help get you from Ready (Unverified) to Ready (Verified) automatically.
See Continuous Learning for instructions.
You're also free to run datasets yourself, of course.
Monitor a Deployment
As you use a Deployment in your application, behavior metrics are measured and summarized in the Metrics page for the Deployment. Expand individual Performance, Cost, Usage, and Error metrics to view a time series of each behavior dimension.
By default, the behavior for the entire Deployment is shown. Using the dropdown on each figure, you can show the behaviors of the individual models in the Deployment, too.
You can also switch between average behaviors (the default), and p50, p90, and p99 values.
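As a refresher on what those views mean, here is a small sketch computing average, p50, p90, and p99 values over a list of request latencies; the sample data is made up:

```python
import statistics

# Hypothetical per-request latencies in milliseconds, including one slow outlier.
latencies_ms = [120, 95, 110, 480, 105, 130, 98, 2100, 115, 125]

avg = statistics.mean(latencies_ms)
# quantiles with n=100 yields 99 cut points: index 49 -> p50, 89 -> p90, 98 -> p99.
# method="inclusive" keeps the percentiles monotone for small samples like this one.
q = statistics.quantiles(latencies_ms, n=100, method="inclusive")
p50, p90, p99 = q[49], q[89], q[98]

print(f"avg={avg:.0f}ms p50={p50:.0f}ms p90={p90:.0f}ms p99={p99:.0f}ms")
```

Note how a single slow request pulls the average well above the median: comparing the average against p50/p90/p99 is what surfaces tail behavior like this.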
Modify a Deployment
To modify a Deployment, click the Edit button in the upper right corner of the Deployment's Summary page.
Similar to the Deployment creation page, you can add/remove/adjust your:
- Requirements
- Optimization Target
- LLM-as-a-Judge selections
- Guardrails
- Presets
- [Advanced] Platform models
If you add new requirements or models, the Deployment may enter a learning phase again with a Ready (Unverified) status.
See: Continuous Learning.
Delete a Deployment
To delete a Deployment, click the Edit button in the upper right corner of the Deployment's Summary page, then:
- Scroll to the bottom of the page.
- In the Danger Zone section, click Delete Deployment.
- Type the Deployment name and click Delete to confirm.
Deleting a Deployment cannot be undone.
The Deployment will be deleted and no longer accessible through the gateway.
Caveats
The Chat Completions API is publicly known and used by different model providers. However, providers may respond in their own "dialect" or with other quirks. Influxion currently returns their responses unmodified. We recommend making your application robust to these variations, especially when using Deployments that route across multiple providers. See the Model Routing page for a Deployment to view its model details.
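One common defensive pattern is to avoid hard-coding assumptions about the response shape when extracting the assistant's text. The field names below follow the standard Chat Completions schema, but the fallbacks are an illustrative sketch, not a complete treatment of every provider dialect:

```python
def extract_content(response_json: dict) -> str:
    """Pull assistant text out of a Chat Completions-style response,
    tolerating missing or oddly shaped fields from provider dialects."""
    choices = response_json.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    content = message.get("content")
    # Some providers return content as a list of parts rather than a string.
    if isinstance(content, list):
        return "".join(
            part.get("text", "") for part in content if isinstance(part, dict)
        )
    return content or ""
```

Centralizing this kind of parsing in one helper keeps the rest of your application insulated when a Deployment routes a request to a provider with a slightly different response shape.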
Influxion does not currently route based on features in individual requests. We recommend that you choose models that support any capabilities your application needs, like tool calling or different modalities.
Deployment constraints are satisfied as average behaviors. Individual request behavior is generally too noisy to control without concrete SLAs from downstream providers. That typically requires expensive provisioning of reserved GPUs.
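To see why averages are controllable while individual requests are noisy, consider a toy check with made-up latency numbers:

```python
# Hypothetical per-request latencies in milliseconds.
latencies_ms = [90, 110, 95, 105, 400, 85, 100, 95, 120, 100]

avg = sum(latencies_ms) / len(latencies_ms)
# An average-latency constraint of 150ms is satisfied...
assert avg < 150
# ...even though one individual request blew far past it.
assert max(latencies_ms) > 150
print(f"avg={avg:.0f}ms, worst request={max(latencies_ms)}ms")
```

Guaranteeing that the worst request also stays under the bound would require per-request latency SLAs from the underlying providers, which is exactly the expensive reserved-capacity scenario described above.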