Integrate with Your Custom Backend

This guide explains how to extend Michelangelo Inference with custom serving backends, model configuration providers, and traffic routing.

Overview

Michelangelo Inference uses a plugin-based architecture with three main extension points:

Interface	Purpose	Reference Implementation
`Backend`	Provision inference server infrastructure	Triton
`ModelConfigProvider`	Manage model configurations	ConfigMap-based
`RouteProvider`	Route traffic to models	Gateway API HTTPRoute

Each interface is designed to be idempotent—implementations should handle repeated calls gracefully.

1. Backend Interface

The Backend interface abstracts inference server provisioning for different frameworks (Triton, vLLM, TensorRT-LLM, etc.).

Interface: go/components/inferenceserver/backends/interface.go

type Backend interface {
    CreateServer(ctx context.Context, logger *zap.Logger, kubeClient client.Client, inferenceServer *v2pb.InferenceServer) (*ServerStatus, error)
    GetServerStatus(ctx context.Context, logger *zap.Logger, kubeClient client.Client, inferenceServerName string, namespace string) (*ServerStatus, error)
    DeleteServer(ctx context.Context, logger *zap.Logger, kubeClient client.Client, inferenceServerName string, namespace string) error
    IsHealthy(ctx context.Context, logger *zap.Logger, kubeClient client.Client, inferenceServerName string, namespace string) (bool, error)
    CheckModelStatus(ctx context.Context, logger *zap.Logger, kubeClient client.Client, httpClient *http.Client, inferenceServerName string, namespace string, modelName string) (bool, error)
}

Reference Implementation: go/components/inferenceserver/backends/triton.go

Registry: go/components/inferenceserver/backends/registry.go

To add a new backend:

Create a new file (e.g., torchserve.go) implementing the Backend interface
Register it in the Registry with the appropriate BackendType

2. ModelConfigProvider Interface

The ModelConfigProvider manages model configurations for inference servers. This enables a sidecar pattern where a sidecar container watches the config and loads/unloads models accordingly.

Interface: go/components/inferenceserver/modelconfig/interface.go

type ModelConfigProvider interface {
    CreateModelConfig(ctx context.Context, logger *zap.Logger, kubeclient client.Client, inferenceServerName string, namespace string, labels map[string]string, annotations map[string]string) error
    CheckModelConfigExists(ctx context.Context, logger *zap.Logger, kubeclient client.Client, inferenceServerName string, namespace string) (bool, error)
    DeleteModelConfig(ctx context.Context, logger *zap.Logger, kubeclient client.Client, inferenceServerName string, namespace string) error
    GetModelsFromConfig(ctx context.Context, logger *zap.Logger, kubeclient client.Client, inferenceServerName string, namespace string) ([]ModelConfigEntry, error)
    AddModelToConfig(ctx context.Context, logger *zap.Logger, kubeclient client.Client, inferenceServerName string, namespace string, entry ModelConfigEntry) error
    RemoveModelFromConfig(ctx context.Context, logger *zap.Logger, kubeclient client.Client, inferenceServerName string, namespace string, modelName string) error
}

Reference Implementation: go/components/inferenceserver/modelconfig/provider.go

How It Works

The InferenceServer controller creates/deletes the model config, while the Deployment controller adds/removes individual model entries:

InferenceServer Controller → CreateModelConfig(), DeleteModelConfig()
Deployment Controller → AddModelToConfig(), RemoveModelFromConfig()

3. RouteProvider Interface

The RouteProvider manages traffic routing to deployed models.

Interface: go/components/deployment/route/interface.go

type RouteProvider interface {
    EnsureDeploymentRoute(ctx context.Context, logger *zap.Logger, client dynamic.Interface, deploymentName string, namespace string, inferenceServerName string, modelName string) error
    CheckDeploymentRouteStatus(ctx context.Context, logger *zap.Logger, client dynamic.Interface, deploymentName string, namespace string, inferenceServerName string, modelName string) (bool, error)
    DeploymentRouteExists(ctx context.Context, logger *zap.Logger, client dynamic.Interface, deploymentName string, namespace string) (bool, error)
    DeleteDeploymentRoute(ctx context.Context, logger *zap.Logger, client dynamic.Interface, deploymentName string, namespace string) error
}

Reference Implementation: go/components/deployment/route/httproute.go

Default Behavior

The default implementation creates HTTPRoutes that:

Match requests on path /{inferenceServerName}/{deploymentName}
Rewrite the path to /v2/models/{modelName} (Triton V2 inference protocol)
Route to the inference server's Service

Best Practices

Idempotency: All methods should handle repeated calls. Use errors.IsAlreadyExists and client.IgnoreNotFound appropriately.
Structured Logging: Use the provided *zap.Logger with contextual fields.
Error Handling: Wrap errors with context using fmt.Errorf("message: %w", err).
Health Checks: Prefer checking Kubernetes resource status (Deployment conditions) over making HTTP calls when possible.

Overview​

1. Backend Interface​

To add a new backend:​

2. ModelConfigProvider Interface​

How It Works​

3. RouteProvider Interface​

Default Behavior​

Best Practices​