Sandbox Setup

Set up a local Michelangelo environment on your machine. This gives you a fully functional cluster with the API server, controller manager, workflow engine, object storage, and all supporting services.

Time estimate: ~20 minutes (assuming prerequisites are installed).

Prerequisites

Before you begin, make sure you have the following installed. Run each verification command to confirm:

Tool          Install                                                   Verify
Docker        Get Docker or Colima                                      docker --version
kubectl       brew install kubectl or the official guide                kubectl version --client
k3d           brew install k3d                                          k3d --version
Python 3.11+  python.org                                                python3 --version
Poetry        curl -sSL https://install.python-poetry.org | python3 -   poetry --version
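If you prefer to check everything in one pass, a small standard-library Python sketch (illustrative only, not part of the Michelangelo tooling) can report which tools are missing from your PATH:

```python
import shutil

# Tools required for the sandbox; report anything missing before continuing.
TOOLS = ["docker", "kubectl", "k3d", "python3", "poetry"]

def missing_tools(tools: list[str]) -> list[str]:
    """Return the subset of tools not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

gone = missing_tools(TOOLS)
print("all prerequisites found" if not gone else "missing: " + ", ".join(gone))
```

Note that this only confirms the tools are installed, not that they meet the version requirements above.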

Configure host.docker.internal

Docker containers need to communicate with services on your host machine. Verify this hostname resolves correctly:

  1. Open your hosts file: sudo nano /etc/hosts
  2. Look for this line:
    127.0.0.1 host.docker.internal
  3. If missing, add it to the end of the file and save.
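To confirm the hostname actually resolves after editing the file, this short standard-library sketch (illustrative, not part of the Michelangelo CLI) does a quick lookup:

```python
import socket

def resolves(hostname: str) -> bool:
    """Return True if the hostname resolves to an IP address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

if resolves("host.docker.internal"):
    print("host.docker.internal resolves; containers can reach the host")
else:
    print("host.docker.internal does not resolve; add it to /etc/hosts")
```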

Install Python dependencies

From the repository root, install the Michelangelo Python packages:

cd <repo-root>/python
poetry install

Tip: Replace <repo-root> with the path where you cloned the Michelangelo repository (e.g., ~/michelangelo).


Quick start

The fastest way to get a working Michelangelo environment:

# 1. Install dependencies (from the repository root)
cd <repo-root>/python
poetry install
source .venv/bin/activate

# 2. Create the sandbox (~10-15 min on first run)
ma sandbox create

# 3. Verify everything works by running the demo pipeline
ma sandbox demo pipeline

When ma sandbox create completes successfully, you should see all Michelangelo services starting up in your K3d cluster. You can verify with:

kubectl get pods

All pods should show Running status. See Sandbox Ports and Endpoints for the full list of services and their URLs.
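If you want to script that check, one approach is to parse kubectl's JSON output and inspect each pod's status.phase. A minimal sketch (the helper name and sample data are illustrative; note that phase alone won't flag a crash-looping container, but it catches pods that never scheduled or failed outright):

```python
import json
import subprocess

def not_running(pods_json: dict) -> list[str]:
    """Return names of pods whose phase is neither Running nor Succeeded."""
    return [
        item["metadata"]["name"]
        for item in pods_json.get("items", [])
        if item["status"]["phase"] not in ("Running", "Succeeded")
    ]

# Illustrative sample of the structure `kubectl get pods -o json` returns:
sample = {
    "items": [
        {"metadata": {"name": "apiserver"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "minio"}, "status": {"phase": "Pending"}},
    ]
}
print(not_running(sample))  # ['minio']

# Against a live cluster:
# pods = json.loads(subprocess.check_output(["kubectl", "get", "pods", "-o", "json"]))
# print(not_running(pods) or "all pods healthy")
```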


Sandbox commands

The ma sandbox command manages your local Kubernetes development environment.

For a complete command reference, see the CLI Reference - Sandbox Commands.

Lifecycle

The typical sandbox workflow:

create → (develop) → stop → start → (develop) → delete

Create

ma sandbox create [OPTIONS]
Flag                                Description                                                     Default
--workflow cadence|temporal         Choose workflow engine                                          cadence
--exclude [services]                Exclude services: apiserver, controllermgr, ui, worker          none
--create-compute-cluster            Create an additional Ray compute cluster for distributed jobs   disabled
--compute-cluster-name <name>       Custom name for the compute cluster                             auto-generated
--include-experimental [services]   Include experimental services                                   none

Examples:

# Full sandbox with all services (default: Cadence workflow engine)
ma sandbox create

# Sandbox with Temporal workflow engine
ma sandbox create --workflow temporal

# Sandbox without UI, with a Ray compute cluster
ma sandbox create --exclude ui --create-compute-cluster

Stop / Start

Pause and resume your sandbox without losing state:

ma sandbox stop     # preserves state
ma sandbox start    # resume where you left off

Delete

Tear down the cluster and remove all resources:

ma sandbox delete

Demo

Create pre-configured demo resources for testing:

ma sandbox demo pipeline     # registers and runs a sample pipeline
ma sandbox demo inference    # sets up a demo inference server

Running your first workflow

Once your sandbox is running, you can run Uniflow workflows locally or remotely.

Local execution

Local execution runs workflows directly in your Python environment -- great for rapid development and debugging.

cd <repo-root>/python
poetry install --extras example
PYTHONPATH=. poetry run python ./examples/bert_cola/bert_cola.py

Note: Local execution doesn't support caching, retries, or resource constraints. Use remote execution for production-like behavior.

Remote execution

Remote execution deploys workflows to your sandbox's Kubernetes cluster, with full caching, retries, and resource management.

Setup:

  1. Build a Docker image with your workflow code:

    cd <repo-root>/python
    docker build -t examples:latest -f ./examples/Dockerfile .
  2. Import the image into your K3d cluster:

    k3d image import examples:latest -c michelangelo-sandbox
  3. Set up MinIO storage (object storage for workflow artifacts):

    • Open the MinIO Console at http://localhost:9090
    • Log in with username minioadmin and password minioadmin (these are default sandbox credentials, not for production use)
    • Click "Create Bucket" and create a bucket named default
  4. Set up the Cadence workflow domain (if using Cadence):

    brew install cadence-workflow
    cadence --do default d re    # short for: cadence --domain default domain register
  5. Run your workflow:

    PYTHONPATH=. poetry run python ./examples/bert_cola/bert_cola.py \
    remote-run \
    --image docker.io/library/examples:latest \
    --storage-url s3://default \
    --yes

Monitoring your workflow:

Service          URL                                               What to check
Cadence Web UI   http://localhost:8088/domains/default/workflows   Workflow status and history
MinIO Console    http://localhost:9090/browser/default             Stored artifacts and data
Ray Dashboard    http://localhost:8265                             Ray task execution (requires port-forward, see below)

To access the Ray Dashboard for tasks running in the cluster:

  1. Find the Ray head service: kubectl get svc | grep ray
  2. Port-forward it: kubectl port-forward svc/<ray-head-svc-name> 8265:8265 -n default

For more details on execution modes, see Pipeline Running Modes.


Troubleshooting

ModuleNotFoundError: No module named 'grpc_reflection'

This error occurs when Python dependencies aren't fully installed. Fix it by reinstalling from the python/ directory:

cd <repo-root>/python
poetry install

If the error persists, try removing the virtual environment and reinstalling:

rm -rf .venv
poetry install

Pods stuck in ImagePullBackOff or ErrImagePull

The cluster can't pull a Docker image. Check which image is failing:

kubectl describe pod <pod-name> | grep -A 5 "Events"

Common causes:

  • Network issues: Ensure Docker can reach ghcr.io (try docker pull ghcr.io/michelangelo-ai/worker:latest)
  • Image doesn't exist: Verify the image tag matches what's available in the registry

Pods stuck in CrashLoopBackOff

A service is starting but immediately crashing. Check its logs:

kubectl logs <pod-name>

To restart a single service (e.g., MinIO):

kubectl delete pod minio
kubectl apply -f <repo-root>/python/michelangelo/cli/sandbox/resources/minio.yaml

Port already in use

If ma sandbox create fails because a port is already bound:

# Find what's using the port (e.g., port 9090)
lsof -i :9090

# Kill the process if it's safe to do so
kill <PID>
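If lsof isn't available, a quick standard-library Python check (illustrative helper; the function name is ours, not part of any Michelangelo tooling) can tell you whether a port is already bound:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 on a successful connection, an errno otherwise
        return s.connect_ex((host, port)) == 0

print("port 9090 is busy" if port_in_use(9090) else "port 9090 is free")
```

This only detects listeners on the loopback interface; a process bound to another interface won't be reported.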

See Sandbox Ports and Endpoints for the full list of ports used.

Poetry install fails with build errors on macOS

If you see C++ compilation errors during poetry install:

export CC=clang
export CXX=clang++
poetry install

Add those exports to your ~/.zshrc to make them permanent.


What's next?