
MLflow Integration Guide

This guide covers integrating DeepFix with MLflow for experiment tracking, artifact storage, and model management.

Overview

DeepFix integrates seamlessly with MLflow to provide:

  • Automatic experiment tracking
  • Artifact storage and retrieval
  • Run metadata logging
  • Model versioning support
  • Experiment comparison

Prerequisites

  • DeepFix installed and configured
  • MLflow installed (pip install mlflow)
  • MLflow tracking server running (optional, can use local file storage)
  • Python 3.11 or higher

Basic Setup

Step 1: Start MLflow Server (Optional)

For centralized tracking, start an MLflow tracking server:

# Local MLflow server
mlflow server \
    --backend-store-uri sqlite:///mlflow.db \
    --default-artifact-root ./mlruns \
    --host 0.0.0.0 \
    --port 5000

# Or using Docker
docker run -p 5000:5000 \
    -e MLFLOW_BACKEND_STORE_URI=sqlite:///mlflow.db \
    -e MLFLOW_DEFAULT_ARTIFACT_ROOT=/mlruns \
    -v $(pwd)/mlruns:/mlruns \
    ghcr.io/mlflow/mlflow:latest \
    mlflow server --host 0.0.0.0 --port 5000

Step 2: Configure MLflow in DeepFix

from deepfix_sdk.client import DeepFixClient
from deepfix_sdk.config import MLflowConfig

# Create MLflow configuration
mlflow_config = MLflowConfig(
    tracking_uri="http://localhost:5000",  # MLflow server URI
    experiment_name="deepfix-analysis",    # Experiment name
    run_name="dataset-diagnosis-1"         # Run name
)

# Initialize client with MLflow
client = DeepFixClient(
    api_url="http://localhost:8844",
    mlflow_config=mlflow_config
)

Step 3: Use DeepFix with MLflow

All operations are automatically tracked in MLflow:

from deepfix_sdk.data.datasets import ImageClassificationDataset

# Ingest dataset - tracked in MLflow
client.ingest(
    dataset_name="my-dataset",
    train_data=train_dataset,
    test_data=test_dataset,
    overwrite=False
)

# Diagnose dataset - results tracked in MLflow
result = client.diagnose_dataset(dataset_name="my-dataset")

# Results and artifacts are automatically logged to MLflow

Complete Example

Here's a complete example with MLflow integration:

from deepfix_sdk.client import DeepFixClient
from deepfix_sdk.config import MLflowConfig
from deepfix_sdk.zoo.datasets.foodwaste import load_train_and_val_datasets
from deepfix_sdk.data.datasets import ImageClassificationDataset

# Configure MLflow
mlflow_config = MLflowConfig(
    tracking_uri="http://localhost:5000",
    experiment_name="foodwaste-analysis",
    run_name="run-2024-01-15"
)

# Initialize client with MLflow
client = DeepFixClient(
    api_url="http://localhost:8844",
    mlflow_config=mlflow_config,
    timeout=120
)

# Load dataset
dataset_name = "cafetaria-foodwaste"
train_data, val_data = load_train_and_val_datasets(
    image_size=448,
    batch_size=8,
    num_workers=4,
    pin_memory=False
)

# Wrap datasets
train_dataset = ImageClassificationDataset(
    dataset_name=dataset_name,
    dataset=train_data
)
val_dataset = ImageClassificationDataset(
    dataset_name=dataset_name,
    dataset=val_data
)

# Ingest - tracked in MLflow
client.ingest(
    dataset_name=dataset_name,
    train_data=train_dataset,
    test_data=val_dataset,
    train_test_validation=True,
    data_integrity=True,
    batch_size=8,
    overwrite=False
)

# Diagnose - results tracked in MLflow
result = client.diagnose_dataset(dataset_name=dataset_name)

# View results
print(result.to_text())

# All artifacts and metadata are now in MLflow

MLflow Artifact Storage

DeepFix stores various artifacts in MLflow:

Dataset Metadata

  • Dataset statistics
  • Class distributions
  • Feature information
  • Data splits

Deepchecks Reports

  • Data quality checks
  • Drift detection results
  • Integrity check results
  • Visual reports

Model Checkpoints

  • Model state
  • Model configuration
  • Checkpoint metadata
  • Deployment information

Training Artifacts

  • Training metrics
  • Training logs
  • Hyperparameters
  • Validation results
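
Each of these groups typically appears as an artifact path under the corresponding MLflow run and can be retrieved with the standard MLflow artifact API. A minimal sketch is shown below; the artifact path "deepchecks" is an assumption, so check a run in the MLflow UI for the exact paths your DeepFix version writes.

import mlflow

mlflow.set_tracking_uri("http://localhost:5000")

# Download one artifact group (here: the Deepchecks reports) to a local directory.
# "deepchecks" is an assumed artifact path; adjust it to what your runs contain.
local_dir = mlflow.artifacts.download_artifacts(
    run_id="<your-run-id>",  # placeholder: copy a run ID from the MLflow UI
    artifact_path="deepchecks"
)
print(f"Artifacts downloaded to: {local_dir}")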

Accessing MLflow Data

Using MLflow Python API

import mlflow

# Set tracking URI
mlflow.set_tracking_uri("http://localhost:5000")

# Get experiment
experiment = mlflow.get_experiment_by_name("deepfix-analysis")
experiment_id = experiment.experiment_id

# Get all runs
runs = mlflow.search_runs(experiment_ids=[experiment_id])
print(runs)

# Get specific run
run_id = runs.iloc[0]['run_id']
run = mlflow.get_run(run_id)
print(f"Run ID: {run_id}")
print(f"Status: {run.info.status}")
print(f"Parameters: {run.data.params}")
print(f"Metrics: {run.data.metrics}")

# List artifacts with the tracking client
mlflow_client = mlflow.tracking.MlflowClient()
artifacts = mlflow_client.list_artifacts(run_id)
for artifact in artifacts:
    print(f"  - {artifact.path}")

# Download an artifact directory to a local path
local_path = mlflow.artifacts.download_artifacts(
    run_id=run_id,
    artifact_path="dataset_metadata"
)
print(f"Downloaded to: {local_path}")

Using MLflow UI

  1. Start MLflow UI:

    mlflow ui --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns
    

  2. Open browser to http://localhost:5000

  3. Navigate to experiments and runs

  4. View metrics, parameters, and artifacts

Advanced Usage

Custom MLflow Logging

Log additional information to MLflow:

import mlflow

# Set experiment context
mlflow.set_experiment("deepfix-analysis")

with mlflow.start_run(run_name="custom-run"):
    # Log parameters
    mlflow.log_param("dataset_name", "my-dataset")
    mlflow.log_param("batch_size", 8)

    # Use DeepFix
    client = DeepFixClient(...)
    result = client.diagnose_dataset(...)

    # Log metrics (calculate_severity stands for your own scoring helper)
    mlflow.log_metric("num_issues", len(result.agent_results))
    mlflow.log_metric("severity_score", calculate_severity(result))

    # Log artifacts
    with open("diagnosis_report.txt", "w") as f:
        f.write(result.to_text())
    mlflow.log_artifact("diagnosis_report.txt")

    # Log tags
    mlflow.set_tag("dataset_type", "image")
    mlflow.set_tag("analysis_version", "1.0")

Multiple Experiments

Organize analyses into multiple experiments:

from deepfix_sdk.config import MLflowConfig

# Experiment 1: Image classification
image_config = MLflowConfig(
    tracking_uri="http://localhost:5000",
    experiment_name="image-classification",
    run_name="run-1"
)

# Experiment 2: Tabular analysis
tabular_config = MLflowConfig(
    tracking_uri="http://localhost:5000",
    experiment_name="tabular-analysis",
    run_name="run-1"
)

# Use different configs for different analyses
image_client = DeepFixClient(api_url="...", mlflow_config=image_config)
tabular_client = DeepFixClient(api_url="...", mlflow_config=tabular_config)

MLflow Model Registry

Register models in MLflow Model Registry:

import mlflow

# After diagnosis, if you train a model
# ... training code ...

# Log model
mlflow.sklearn.log_model(model, "model")

# Register model
model_version = mlflow.register_model(
    model_uri=f"runs:/{run_id}/model",
    name="my-model"
)

# Promote to staging
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="my-model",
    version=model_version.version,
    stage="Staging"
)

Configuration Options

Local File Storage

Use local file storage instead of a tracking server:

mlflow_config = MLflowConfig(
    tracking_uri="file:./mlruns",  # Local file path
    experiment_name="local-experiment",
    run_name="run-1"
)

Remote MLflow Server

Connect to a remote MLflow server:

mlflow_config = MLflowConfig(
    tracking_uri="https://mlflow.example.com",  # Remote server
    experiment_name="remote-experiment",
    run_name="run-1"
)

Databricks MLflow

Connect to Databricks MLflow:

mlflow_config = MLflowConfig(
    tracking_uri="databricks",  # Use Databricks
    experiment_name="/Shared/deepfix-analysis",
    run_name="run-1"
)

Best Practices

Experiment Organization

  1. Use Descriptive Names: Give experiments and runs clear, specific names

    experiment_name="foodwaste-classification-2024"
    run_name="baseline-model-v1"
    

  2. Tag Runs: Add tags for filtering

    mlflow.set_tag("dataset", "foodwaste")
    mlflow.set_tag("model_type", "resnet50")
    

  3. Version Control: Track code versions

    mlflow.log_param("git_commit", get_git_commit())
    
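The get_git_commit() call above is not provided by MLflow or DeepFix; it stands for your own helper. A minimal sketch, assuming the code runs inside a git checkout:

import subprocess

def get_git_commit() -> str:
    # Return the current commit hash, or "unknown" when not in a git checkout
    try:
        return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "unknown"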

Artifact Management

  1. Store Only Necessary Artifacts: Don't log everything
  2. Use Relative Paths: Use consistent artifact paths
  3. Clean Up Old Runs: Archive or delete old experiments
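
A minimal sketch of points 2 and 3, assuming the "deepfix-analysis" experiment from earlier and a 90-day retention window (both just examples):

import mlflow
import pandas as pd

mlflow.set_tracking_uri("http://localhost:5000")

# 2. Log under consistent, relative artifact paths
# (assumes diagnosis_report.txt was written earlier, as in the custom logging example)
with mlflow.start_run(run_name="report-logging"):
    mlflow.log_artifact("diagnosis_report.txt", artifact_path="reports")

# 3. Delete runs older than the retention window (example: 90 days)
cutoff = pd.Timestamp.now(tz="UTC") - pd.Timedelta(days=90)
runs = mlflow.search_runs(experiment_names=["deepfix-analysis"])
mlflow_client = mlflow.tracking.MlflowClient()
if not runs.empty:
    for run_id in runs.loc[runs["start_time"] < cutoff, "run_id"]:
        mlflow_client.delete_run(run_id)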

Performance

  1. Async Logging: Use async logging for high-throughput scenarios
  2. Batch Logging: Batch multiple log operations
  3. Artifact Compression: Compress large artifacts before logging
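
A minimal sketch of batched and asynchronous logging; the synchronous flag assumes your MLflow version is 2.8 or newer:

import mlflow

with mlflow.start_run(run_name="batched-logging"):
    # Batch logging: one call for many parameters / metrics instead of many calls
    mlflow.log_params({"dataset_name": "my-dataset", "batch_size": 8})
    mlflow.log_metrics({"num_issues": 3, "severity_score": 0.7})  # illustrative values

    # Async logging (MLflow >= 2.8): returns immediately and flushes in the background
    mlflow.log_metrics({"loss": 0.12, "accuracy": 0.93}, synchronous=False)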

Troubleshooting

Common Issues

Problem: Cannot connect to MLflow server

# Solution: Verify server is running and URI is correct
import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
try:
    mlflow.search_experiments()  # Should not raise if the server is reachable
except Exception as e:
    print(f"Connection error: {e}")

Problem: Artifacts not appearing

# Solution: Verify artifact root is accessible
mlflow_config = MLflowConfig(
    tracking_uri="http://localhost:5000",
    experiment_name="test",
    run_name="test-run"
)
# Check that the artifact root exists and is writable by both client and server;
# a run's actual artifact location is available via mlflow.get_run(run_id).info.artifact_uri

Problem: Large artifacts causing performance issues

# Solution: Compress large artifacts before logging, or keep them in external storage
import gzip
import shutil
import mlflow

with open("large_file.csv", "rb") as src, gzip.open("large_file.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)  # gzip the file before logging it
mlflow.log_artifact("large_file.csv.gz", artifact_path="compressed")
# Or leave the data in external storage and log only its URI
mlflow.log_param("data_uri", "s3://bucket/data.csv")

Next Steps