# MLflow Integration Guide
This guide covers integrating DeepFix with MLflow for experiment tracking, artifact storage, and model management.
## Overview
DeepFix integrates seamlessly with MLflow to provide:
- Automatic experiment tracking
- Artifact storage and retrieval
- Run metadata logging
- Model versioning support
- Experiment comparison
## Prerequisites
- DeepFix installed and configured
- MLflow installed (`pip install mlflow`)
- MLflow tracking server running (optional; local file storage can be used instead)
- Python 3.11 or higher
## Basic Setup

### Step 1: Start MLflow Server (Optional)
For centralized tracking, start an MLflow tracking server:
```bash
# Local MLflow server
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./mlruns \
  --host 0.0.0.0 \
  --port 5000

# Or using Docker
docker run -p 5000:5000 \
  -e MLFLOW_BACKEND_STORE_URI=sqlite:///mlflow.db \
  -e MLFLOW_DEFAULT_ARTIFACT_ROOT=/mlruns \
  -v $(pwd)/mlruns:/mlruns \
  ghcr.io/mlflow/mlflow:latest \
  mlflow server --host 0.0.0.0 --port 5000
```
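Before pointing DeepFix at the server, a quick reachability check can save debugging time later. The MLflow tracking server exposes a `/health` endpoint that returns `OK`; a minimal sketch using only the standard library, assuming the local server started above:

```python
from urllib.request import urlopen

# The MLflow tracking server exposes a /health endpoint that returns "OK"
with urlopen("http://localhost:5000/health") as resp:
    print(resp.status, resp.read().decode())  # expect: 200 OK
```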
### Step 2: Configure MLflow in DeepFix
```python
from deepfix_sdk.client import DeepFixClient
from deepfix_sdk.config import MLflowConfig

# Create MLflow configuration
mlflow_config = MLflowConfig(
    tracking_uri="http://localhost:5000",  # MLflow server URI
    experiment_name="deepfix-analysis",    # Experiment name
    run_name="dataset-diagnosis-1"         # Run name
)

# Initialize client with MLflow
client = DeepFixClient(
    api_url="http://localhost:8844",
    mlflow_config=mlflow_config
)
```
### Step 3: Use DeepFix with MLflow
All operations are automatically tracked in MLflow:
```python
from deepfix_sdk.data.datasets import ImageClassificationDataset

# Ingest dataset - tracked in MLflow
client.ingest(
    dataset_name="my-dataset",
    train_data=train_dataset,
    test_data=test_dataset,
    overwrite=False
)

# Diagnose dataset - results tracked in MLflow
result = client.diagnose_dataset(dataset_name="my-dataset")

# Results and artifacts are automatically logged to MLflow
```
## Complete Example
Here's a complete example with MLflow integration:
```python
from deepfix_sdk.client import DeepFixClient
from deepfix_sdk.config import MLflowConfig
from deepfix_sdk.zoo.datasets.foodwaste import load_train_and_val_datasets
from deepfix_sdk.data.datasets import ImageClassificationDataset

# Configure MLflow
mlflow_config = MLflowConfig(
    tracking_uri="http://localhost:5000",
    experiment_name="foodwaste-analysis",
    run_name="run-2024-01-15"
)

# Initialize client with MLflow
client = DeepFixClient(
    api_url="http://localhost:8844",
    mlflow_config=mlflow_config,
    timeout=120
)

# Load dataset
dataset_name = "cafetaria-foodwaste"
train_data, val_data = load_train_and_val_datasets(
    image_size=448,
    batch_size=8,
    num_workers=4,
    pin_memory=False
)

# Wrap datasets
train_dataset = ImageClassificationDataset(
    dataset_name=dataset_name,
    dataset=train_data
)
val_dataset = ImageClassificationDataset(
    dataset_name=dataset_name,
    dataset=val_data
)

# Ingest - tracked in MLflow
client.ingest(
    dataset_name=dataset_name,
    train_data=train_dataset,
    test_data=val_dataset,
    train_test_validation=True,
    data_integrity=True,
    batch_size=8,
    overwrite=False
)

# Diagnose - results tracked in MLflow
result = client.diagnose_dataset(dataset_name=dataset_name)

# View results
print(result.to_text())

# All artifacts and metadata are now in MLflow
```
## MLflow Artifact Storage
DeepFix stores various artifacts in MLflow (a sketch for browsing them follows the lists below):
### Dataset Metadata
- Dataset statistics
- Class distributions
- Feature information
- Data splits
### Deepchecks Reports
- Data quality checks
- Drift detection results
- Integrity check results
- Visual reports
### Model Checkpoints
- Model state
- Model configuration
- Checkpoint metadata
- Deployment information
### Training Artifacts
- Training metrics
- Training logs
- Hyperparameters
- Validation results
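To see which of these artifact groups a particular run actually contains, you can walk the run's artifact tree with the MLflow client. A minimal sketch, where `<your-run-id>` is a placeholder for a real run ID and the exact tree layout depends on your DeepFix configuration:

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
client = mlflow.tracking.MlflowClient()

def walk_artifacts(run_id: str, path: str = "") -> None:
    """Recursively print a run's artifact tree."""
    for artifact in client.list_artifacts(run_id, path or None):
        print(artifact.path)
        if artifact.is_dir:
            walk_artifacts(run_id, artifact.path)

walk_artifacts("<your-run-id>")
```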
## Accessing MLflow Data

### Using MLflow Python API
```python
import mlflow

# Set tracking URI
mlflow.set_tracking_uri("http://localhost:5000")

# Get experiment
experiment = mlflow.get_experiment_by_name("deepfix-analysis")
experiment_id = experiment.experiment_id

# Get all runs
runs = mlflow.search_runs(experiment_ids=[experiment_id])
print(runs)

# Get specific run
run_id = runs.iloc[0]["run_id"]
run = mlflow.get_run(run_id)
print(f"Run ID: {run_id}")
print(f"Status: {run.info.status}")
print(f"Parameters: {run.data.params}")
print(f"Metrics: {run.data.metrics}")

# List top-level artifacts (list_artifacts lives on MlflowClient,
# not the top-level mlflow module)
client = mlflow.tracking.MlflowClient()
for artifact in client.list_artifacts(run_id):
    print(f" - {artifact.path}")

# Download artifact
mlflow.artifacts.download_artifacts(
    run_id=run_id,
    artifact_path="dataset_metadata"
)
```
### Using MLflow UI
1. Start the MLflow UI with `mlflow ui` (or use the tracking server from Step 1)
2. Open a browser to `http://localhost:5000`
3. Navigate to experiments and runs
4. View metrics, parameters, and artifacts
## Advanced Usage

### Custom MLflow Logging
Log additional information to MLflow:
```python
import mlflow

from deepfix_sdk.client import DeepFixClient

# Set experiment context
mlflow.set_experiment("deepfix-analysis")

with mlflow.start_run(run_name="custom-run"):
    # Log parameters
    mlflow.log_param("dataset_name", "my-dataset")
    mlflow.log_param("batch_size", 8)

    # Use DeepFix
    client = DeepFixClient(...)
    result = client.diagnose_dataset(...)

    # Log metrics (calculate_severity is a user-defined helper)
    mlflow.log_metric("num_issues", len(result.agent_results))
    mlflow.log_metric("severity_score", calculate_severity(result))

    # Log artifacts
    with open("diagnosis_report.txt", "w") as f:
        f.write(result.to_text())
    mlflow.log_artifact("diagnosis_report.txt")

    # Log tags
    mlflow.set_tag("dataset_type", "image")
    mlflow.set_tag("analysis_version", "1.0")
```
### Multiple Experiments
Organize analyses into multiple experiments:
```python
from deepfix_sdk.client import DeepFixClient
from deepfix_sdk.config import MLflowConfig

# Experiment 1: Image classification
image_config = MLflowConfig(
    tracking_uri="http://localhost:5000",
    experiment_name="image-classification",
    run_name="run-1"
)

# Experiment 2: Tabular analysis
tabular_config = MLflowConfig(
    tracking_uri="http://localhost:5000",
    experiment_name="tabular-analysis",
    run_name="run-1"
)

# Use different configs for different analyses
image_client = DeepFixClient(api_url="...", mlflow_config=image_config)
tabular_client = DeepFixClient(api_url="...", mlflow_config=tabular_config)
```
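The Overview lists experiment comparison as one of the integration's benefits; one way to compare runs across the two experiments above is `mlflow.search_runs` with `experiment_names`. A sketch, assuming both experiments already contain runs:

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

# Pull runs from both experiments into a single pandas DataFrame
runs = mlflow.search_runs(
    experiment_names=["image-classification", "tabular-analysis"],
    order_by=["start_time DESC"]
)
print(runs[["experiment_id", "run_id", "status"]].head())
```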
### MLflow Model Registry
Register models in MLflow Model Registry:
```python
import mlflow

# After diagnosis, if you train a model
# ... training code ...

# Log the model to the active run
mlflow.sklearn.log_model(model, "model")

# Register the model (run_id is the ID of the run the model was logged in)
model_version = mlflow.register_model(
    model_uri=f"runs:/{run_id}/model",
    name="my-model"
)

# Promote to staging (note: stages are deprecated in recent MLflow
# releases in favor of model version aliases; see below)
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="my-model",
    version=model_version.version,
    stage="Staging"
)
```
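Since model stages are deprecated in recent MLflow releases, on MLflow 2.3+ you can use model version aliases instead. A sketch continuing from the block above, with `champion` as an arbitrary alias name:

```python
import mlflow

client = mlflow.tracking.MlflowClient()

# Point the "champion" alias at the freshly registered version
client.set_registered_model_alias(
    name="my-model",
    alias="champion",
    version=model_version.version
)

# Later, resolve the alias back to a concrete model version
mv = client.get_model_version_by_alias(name="my-model", alias="champion")
print(mv.version)
```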
## Configuration Options

### Local File Storage

Use local file storage instead of a server:
```python
mlflow_config = MLflowConfig(
    tracking_uri="file:./mlruns",  # Local file path
    experiment_name="local-experiment",
    run_name="run-1"
)
```
### Remote MLflow Server

Connect to a remote MLflow server:
```python
mlflow_config = MLflowConfig(
    tracking_uri="https://mlflow.example.com",  # Remote server
    experiment_name="remote-experiment",
    run_name="run-1"
)
```
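If the remote server sits behind basic authentication, MLflow reads credentials from the `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` environment variables; the values below are placeholders:

```python
import os

# Basic-auth credentials for the remote tracking server (placeholders)
os.environ["MLFLOW_TRACKING_USERNAME"] = "<username>"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "<password>"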
### Databricks MLflow
Connect to Databricks MLflow:
```python
mlflow_config = MLflowConfig(
    tracking_uri="databricks",  # Use Databricks
    experiment_name="/Shared/deepfix-analysis",
    run_name="run-1"
)
```
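The `databricks` tracking URI relies on Databricks credentials being available, for example from a configured `~/.databrickscfg` profile or from environment variables (the values below are placeholders):

```python
import os

# Databricks workspace credentials (placeholders)
os.environ["DATABRICKS_HOST"] = "https://<your-workspace>.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "<personal-access-token>"
```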
## Best Practices

### Experiment Organization

- Use Descriptive Names: Use clear experiment and run names
- Tag Runs: Add tags for filtering
- Version Control: Track code versions
### Artifact Management
- Store Only Necessary Artifacts: Don't log everything
- Use Relative Paths: Use consistent artifact paths
- Clean Up Old Runs: Archive or delete old experiments
### Performance

- Async Logging: Use async logging for high-throughput scenarios
- Batch Logging: Batch multiple log operations (see the sketch after this list)
- Artifact Compression: Compress large artifacts before logging
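A minimal sketch of the batching and async points above, assuming MLflow 2.8+ for the `synchronous` flag:

```python
import mlflow

with mlflow.start_run(run_name="perf-demo"):
    # Batch logging: one call instead of one round-trip per value
    mlflow.log_metrics({"num_issues": 3, "severity_score": 0.7})
    mlflow.log_params({"dataset_name": "my-dataset", "batch_size": 8})

    # Async logging (MLflow 2.8+): returns immediately, flushed in the background
    mlflow.log_metric("loss", 0.42, synchronous=False)
```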
## Troubleshooting

### Common Issues
**Problem:** Cannot connect to MLflow server

```python
# Solution: Verify the server is running and the URI is correct
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
try:
    mlflow.search_experiments()  # should not raise an error
except Exception as e:
    print(f"Connection error: {e}")
```
**Problem:** Artifacts not appearing

```python
# Solution: Verify the artifact root is accessible
from deepfix_sdk.config import MLflowConfig

mlflow_config = MLflowConfig(
    tracking_uri="http://localhost:5000",
    experiment_name="test",
    run_name="test-run"
)
# Check artifact root permissions and path
```
**Problem:** Large artifacts causing performance issues

```python
# Solution: Compress large artifacts before logging
import gzip, shutil
import mlflow

with open("large_file.csv", "rb") as src, gzip.open("large_file.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)
mlflow.log_artifact("large_file.csv.gz", artifact_path="compressed")

# Or use external storage and log only a reference
mlflow.log_param("data_uri", "s3://bucket/data.csv")
```
## Next Steps
- Quickstart Guide - Basic DeepFix usage
- Configuration Guide - Configure DeepFix
- API Reference - Complete API documentation