Sandbox Setup

Set up a local Michelangelo environment on your machine. This gives you a fully functional cluster with the API server, controller manager, workflow engine, object storage, and all supporting services.

Time estimate: ~20 minutes (assuming prerequisites are installed).

Prerequisites

Before you begin, make sure you have the following installed. Run each verification command to confirm:

| Tool | Install | Verify |
| --- | --- | --- |
| Docker | Get Docker or Colima | `docker --version` |
| kubectl | `brew install kubectl` or official guide | `kubectl version --client` |
| k3d | `brew install k3d` | `k3d --version` |
| Python 3.11+ | python.org | `python3 --version` |
| Poetry | `curl -sSL https://install.python-poetry.org \| python3 -` | `poetry --version` |
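
If you prefer to check all prerequisites in one pass, the sketch below looks up each command from the Verify column on your PATH. It only confirms that the executables exist and does not check versions. The helper name `missing_tools` is made up for this example and is not part of the Michelangelo CLI.

```python
import shutil
from typing import Callable, Iterable, Optional

# Command names taken from the "Verify" column of the prerequisites table.
REQUIRED_TOOLS = ("docker", "kubectl", "k3d", "python3", "poetry")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH, in the order given.

    `which` is injectable so the function can be exercised without
    touching the real PATH (hypothetical helper, for illustration only).
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All prerequisite commands were found on PATH.")
```

A tool that is reported as found can still be too old (for example, a Python older than 3.11), so the individual verification commands remain the authoritative check.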

Configure `host.docker.internal`

Docker containers need to communicate with services on your host machine. Verify this hostname resolves correctly:

Open your hosts file: sudo nano /etc/hosts
Look for this line:
```
127.0.0.1 host.docker.internal
```
If missing, add it to the end of the file and save.
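
The manual check above can also be scripted. The following is a minimal sketch that scans hosts-file text for a line mapping 127.0.0.1 to host.docker.internal, ignoring comments. The function name `hosts_entry_exists` is an assumption of this example, and the sketch only reads text; it never edits the file.

```python
def hosts_entry_exists(
    hosts_text: str,
    hostname: str = "host.docker.internal",
    address: str = "127.0.0.1",
) -> bool:
    """Return True if some uncommented line maps `address` to `hostname`."""
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # First field is the address; the remaining fields are host names.
        if fields[0] == address and hostname in fields[1:]:
            return True
    return False
```

To run it against your machine, pass in the contents of the hosts file, for example `hosts_entry_exists(open("/etc/hosts").read())`.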

Install Python dependencies

From the repository root, install the Michelangelo Python packages:

cd <repo-root>/python
poetry install

Tip: Replace <repo-root> with the path where you cloned the Michelangelo repository (e.g., ~/michelangelo).

Quick start

The fastest way to get a working Michelangelo environment:

# 1. Install dependencies (from the repository root)
cd <repo-root>/python
poetry install
source .venv/bin/activate

# 2. Create the sandbox (~10-15 min on first run)
ma sandbox create

# 3. Verify everything works by running the demo pipeline
ma sandbox demo pipeline

When ma sandbox create completes successfully, the Michelangelo services start up in your k3d cluster. You can verify with:

kubectl get pods

All pods should show Running status. See Sandbox Ports and Endpoints for the full list of services and their URLs.
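
Reading a long pod list by eye is error-prone, so here is a small sketch that parses the default tabular output of kubectl get pods and reports every pod whose STATUS is not Running. The pod names in the sample are invented for illustration, and the function name `pods_not_running` is hypothetical.

```python
def pods_not_running(kubectl_output: str) -> dict[str, str]:
    """Map pod name to status for every pod whose STATUS is not 'Running'.

    Expects the default table printed by `kubectl get pods`, whose header
    row contains NAME and STATUS columns.
    """
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    if "NAME" not in header or "STATUS" not in header:
        # For example: "No resources found in default namespace."
        return {}
    name_col = header.index("NAME")
    status_col = header.index("STATUS")

    result: dict[str, str] = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= status_col:
            continue  # skip malformed rows
        if fields[status_col] != "Running":
            result[fields[name_col]] = fields[status_col]
    return result


# Invented sample output; real pod names will differ.
SAMPLE = """\
NAME         READY   STATUS             RESTARTS      AGE
minio        1/1     Running            0             5m
worker-abc   0/1     CrashLoopBackOff   3 (20s ago)   2m
"""

if __name__ == "__main__":
    print(pods_not_running(SAMPLE))
```

Note that this simple check also flags pods in a Completed state, which can be normal for one-off jobs, so treat the result as a list of pods to look at rather than a list of failures.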

Sandbox commands

The ma sandbox command manages your local Kubernetes development environment.

For a complete command reference, see the CLI Reference - Sandbox Commands.

Lifecycle

The typical sandbox workflow:

create → (develop) → stop → start → (develop) → delete
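
One way to read the lifecycle line is as a small state machine. The sketch below is an illustrative model only, not taken from the CLI's source: the state names (`absent`, `running`, `stopped`) are invented here, and the assumption that delete is also valid while the sandbox is stopped is not stated in this guide.

```python
# Hypothetical model of the lifecycle; the real CLI may behave differently.
TRANSITIONS = {
    ("absent", "create"): "running",
    ("running", "stop"): "stopped",
    ("stopped", "start"): "running",
    ("running", "delete"): "absent",
    ("stopped", "delete"): "absent",  # assumption: delete works while stopped
}


def next_state(state: str, command: str) -> str:
    """Return the state after `command`, or raise if the step is unexpected."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(
            f"'{command}' is not expected while the sandbox is {state}"
        ) from None


def replay(commands: list[str], state: str = "absent") -> str:
    """Apply a sequence of sandbox commands and return the final state."""
    for command in commands:
        state = next_state(state, command)
    return state
```

For example, `replay(["create", "stop", "start", "delete"])` walks the full cycle shown above and ends back at `absent`.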

Create

ma sandbox create [OPTIONS]

| Flag | Description | Default |
| --- | --- | --- |
| `--workflow cadence\|temporal` | Choose workflow engine | `cadence` |
| `--exclude [services]` | Exclude services: `apiserver`, `controllermgr`, `ui`, `worker` | none |
| `--create-compute-cluster` | Create an additional Ray compute cluster for distributed jobs | disabled |
| `--compute-cluster-name <name>` | Custom name for the compute cluster | auto-generated |
| `--include-experimental [services]` | Include experimental services | none |

Examples:

# Full sandbox with all services (default: Cadence workflow engine)
ma sandbox create

# Sandbox with Temporal workflow engine
ma sandbox create --workflow temporal

# Sandbox without UI, with a Ray compute cluster
ma sandbox create --exclude ui --create-compute-cluster
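
If you script sandbox creation, validating options before calling the CLI gives earlier and clearer errors. The sketch below builds the argument list for the three examples above and rejects values not listed in the flag table. The function `create_command` is hypothetical, and it deliberately accepts only a single excluded service because this page does not show the syntax for excluding several.

```python
import shlex
from typing import Optional

# Allowed values copied from the flag table above.
WORKFLOW_ENGINES = ("cadence", "temporal")
EXCLUDABLE_SERVICES = ("apiserver", "controllermgr", "ui", "worker")


def create_command(
    workflow: str = "cadence",
    exclude: Optional[str] = None,
    compute_cluster: bool = False,
) -> list[str]:
    """Build the argv for `ma sandbox create` from validated options."""
    if workflow not in WORKFLOW_ENGINES:
        raise ValueError(f"unknown workflow engine: {workflow!r}")
    if exclude is not None and exclude not in EXCLUDABLE_SERVICES:
        raise ValueError(f"service cannot be excluded: {exclude!r}")

    argv = ["ma", "sandbox", "create"]
    if workflow != "cadence":  # cadence is the documented default
        argv += ["--workflow", workflow]
    if exclude is not None:
        argv += ["--exclude", exclude]
    if compute_cluster:
        argv.append("--create-compute-cluster")
    return argv


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(create_command(exclude="ui", compute_cluster=True)))
```

The returned list can be passed to `subprocess.run(argv, check=True)` once you are happy with the printed command.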

Stop / Start

Pause and resume your sandbox without losing state:

ma sandbox stop    # preserves state
ma sandbox start   # resume where you left off

Delete

Tear down the cluster and remove all resources:

ma sandbox delete

Demo

Create pre-configured demo resources for testing:

ma sandbox demo pipeline    # registers and runs a sample pipeline
ma sandbox demo inference   # sets up demo inference server

Running your first workflow

Once your sandbox is running, you can run Uniflow workflows locally or remotely.

Local execution

Local execution runs workflows directly in your Python environment, which makes it well suited to rapid development and debugging.

cd <repo-root>/python
poetry install --extras example
PYTHONPATH=. poetry run python ./examples/bert_cola/bert_cola.py

Note: Local execution doesn't support caching, retries, or resource constraints. Use remote execution for production-like behavior.

Remote execution

Remote execution deploys workflows to your sandbox's Kubernetes cluster, with full caching, retries, and resource management.

Setup:

Build a Docker image with your workflow code:

cd <repo-root>/python
docker build -t examples:latest -f ./examples/Dockerfile .

Import the image into your k3d cluster:

k3d image import examples:latest -c michelangelo-sandbox

Set up MinIO storage (object storage for workflow artifacts):
- Open the MinIO Console at http://localhost:9090
- Log in with username minioadmin and password minioadmin (these are default sandbox credentials, not for production use)
- Click "Create Bucket" and create a bucket named default

Set up the Cadence workflow domain (if using Cadence):

brew install cadence-workflow
# Register the "default" domain (abbreviated form: cadence --do default d re)
cadence --domain default domain register

Run your workflow:

PYTHONPATH=. poetry run python ./examples/bert_cola/bert_cola.py \
  remote-run \
  --image docker.io/library/examples:latest \
  --storage-url s3://default \
  --yes

Monitoring your workflow:

| Service | URL | What to check |
| --- | --- | --- |
| Cadence Web UI | http://localhost:8088/domains/default/workflows | Workflow status and history |
| MinIO Console | http://localhost:9090/browser/default | Stored artifacts and data |
| Ray Dashboard | http://localhost:8265 | Ray task execution (requires port-forward, see below) |

To access the Ray Dashboard for tasks running in the cluster:

Find the Ray head service: kubectl get svc | grep ray
Port-forward it: kubectl port-forward svc/<ray-head-svc-name> 8265:8265 -n default

For more details on execution modes, see Pipeline Running Modes.

Troubleshooting

`ModuleNotFoundError: No module named 'grpc_reflection'`

This error occurs when Python dependencies aren't fully installed. Fix it by reinstalling from the python/ directory:

cd <repo-root>/python
poetry install

If the error persists, try removing the virtual environment and reinstalling:

rm -rf .venv
poetry install

Pods stuck in `ImagePullBackOff` or `ErrImagePull`

The cluster can't pull a Docker image. Check which image is failing:

kubectl describe pod <pod-name> | grep -A 5 "Events"

Common causes:

Network issues: Ensure Docker can reach ghcr.io (try docker pull ghcr.io/michelangelo-ai/worker:latest)
Image doesn't exist: Verify the image tag matches what's available in the registry

Pods stuck in `CrashLoopBackOff`

A service is starting but immediately crashing. Check its logs:

kubectl logs <pod-name>

To restart a single service (e.g., MinIO):

kubectl delete pod minio
kubectl apply -f <repo-root>/python/michelangelo/cli/sandbox/resources/minio.yaml

Port already in use

If ma sandbox create fails because a port is already bound:

# Find what's using the port (e.g., port 9090)
lsof -i :9090

# Kill the process if it's safe to do so
kill <PID>

See Sandbox Ports and Endpoints for the full list of ports used.
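
As an alternative to checking ports one at a time with lsof, the sketch below tries to connect to each port on localhost and reports the ones that already have a listener. The port tuple contains only the two ports mentioned on this page (8088 and 9090); it is not the full sandbox port list, and the function names are made up for this example.

```python
import socket
from typing import Iterable

# Only the ports named on this page; extend with the full list as needed.
PORTS_TO_CHECK = (8088, 9090)


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports: Iterable[int] = PORTS_TO_CHECK) -> list[int]:
    """Return the subset of `ports` that already have a listener."""
    return [port for port in ports if port_in_use(port)]


if __name__ == "__main__":
    busy = busy_ports()
    if busy:
        print("Already in use: " + ", ".join(str(port) for port in busy))
    else:
        print("None of the checked ports are in use.")
```

Run this before ma sandbox create; once the sandbox is up, the same ports are expected to show as in use.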

Poetry install fails with build errors on macOS

If you see C++ compilation errors during poetry install:

export CC=clang
export CXX=clang++
poetry install

Add those exports to your ~/.zshrc to make them permanent.

What's next?

Build your first pipeline -- Follow Getting Started with ML Pipelines to create a training workflow (~30 min)
Explore example projects -- Try Boston Housing XGBoost, BERT Text Classification, or GPT Fine-tuning
Learn the CLI -- See the CLI Reference for managing pipelines and projects
