Configuration
This guide covers configuring DeepFix for your environment, including server configuration, client configuration, and MLflow integration.
Server Configuration
The DeepFix server requires configuration for the LLM backend used by the analysis agents.
Environment Variables
Create a .env file in the deepfix-server directory with the following variables:
# LLM Configuration
DEEPFIX_LLM_API_KEY=sk-... # API key for your LLM provider
DEEPFIX_LLM_BASE_URL=https://api.your-llm.com/v1 # Base URL of the LLM API
DEEPFIX_LLM_MODEL_NAME=gpt-4o # Model name (e.g., gpt-4o, llama-3.1)
DEEPFIX_LLM_TEMPERATURE=0.4 # Temperature (0.0-2.0)
DEEPFIX_LLM_MAX_TOKENS=6000 # Maximum tokens per request
DEEPFIX_LLM_CACHE=true # Enable caching ("true"/"false")
DEEPFIX_LLM_TRACK_USAGE=true # Track token usage ("true"/"false")
# Server Configuration
LIT_SERVER_API_KEY=ACCESS-TOKEN-TO-BE-DEFINED # Optional: API key for server
LLM Provider Configuration
OpenAI / Compatible API
DEEPFIX_LLM_API_KEY=sk-...
DEEPFIX_LLM_BASE_URL=https://api.openai.com/v1
DEEPFIX_LLM_MODEL_NAME=gpt-4o
DEEPFIX_LLM_TEMPERATURE=0.4
DEEPFIX_LLM_MAX_TOKENS=6000
Anthropic Claude
DEEPFIX_LLM_API_KEY=sk-ant-...
DEEPFIX_LLM_BASE_URL=https://api.anthropic.com/v1
DEEPFIX_LLM_MODEL_NAME=claude-3-opus-20240229
DEEPFIX_LLM_TEMPERATURE=0.7
DEEPFIX_LLM_MAX_TOKENS=8000
Local LLM (Ollama, etc.)
DEEPFIX_LLM_API_KEY=not-needed
DEEPFIX_LLM_BASE_URL=http://localhost:11434/v1
DEEPFIX_LLM_MODEL_NAME=llama3
DEEPFIX_LLM_TEMPERATURE=0.7
DEEPFIX_LLM_MAX_TOKENS=4000
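Whichever provider you use, it can help to sanity-check the endpoint before pointing DeepFix at it. The sketch below is a minimal check, assuming an OpenAI-compatible API and the openai Python package (an assumption, not a DeepFix dependency); it lists the models the endpoint exposes so you can confirm the value you plan to use for DEEPFIX_LLM_MODEL_NAME is available.
import os
from openai import OpenAI
# Minimal sketch: verify the configured endpoint responds before launching the server.
llm = OpenAI(
    base_url=os.environ.get("DEEPFIX_LLM_BASE_URL", "http://localhost:11434/v1"),
    api_key=os.environ.get("DEEPFIX_LLM_API_KEY", "not-needed"),
)
# If this call succeeds, the base URL and key are usable; check that your
# chosen model name appears in the returned list.
for model in llm.models.list():
    print(model.id)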
Server Launch Configuration
Launch the server with configuration:
# Using .env file
uv run deepfix-server launch -e deepfix-server/.env -port 8844 -host 127.0.0.1
# Using environment variables directly
export DEEPFIX_LLM_API_KEY=sk-...
export DEEPFIX_LLM_BASE_URL=https://api.openai.com/v1
uv run deepfix-server launch -port 8844 -host 0.0.0.0
Server Configuration Options
- -host: Server host (default: 127.0.0.1)
- -port: Server port (default: 8844)
- -e: Path to .env file
- --version: Show server version
Client Configuration
Configure the DeepFix SDK client for your needs.
Basic Client Configuration
from deepfix_sdk.client import DeepFixClient
# Minimal configuration
client = DeepFixClient(api_url="http://localhost:8844")
# With custom timeout
client = DeepFixClient(
api_url="http://localhost:8844",
timeout=120 # Request timeout in seconds
)
MLflow Configuration
Configure MLflow integration for experiment tracking:
from deepfix_sdk.client import DeepFixClient
from deepfix_sdk.config import MLflowConfig
# Create MLflow configuration
mlflow_config = MLflowConfig(
tracking_uri="http://localhost:5000", # MLflow tracking server URI
experiment_name="my-ml-experiment", # Experiment name
run_name="baseline-run" # Run name
)
# Initialize client with MLflow
client = DeepFixClient(
api_url="http://localhost:8844",
mlflow_config=mlflow_config
)
Artifact Configuration
Configure which artifacts to load:
from deepfix_sdk.config import ArtifactConfig
artifact_config = ArtifactConfig(
load_dataset_metadata=True, # Load dataset metadata
load_checks=True, # Load deepchecks reports
load_model_checkpoint=False, # Load model checkpoints
load_training=False # Load training artifacts
)
# Use with pipelines (see API reference)
Environment Variables for Client
The client can also use environment variables:
# Optional: Set API key if needed
export DEEPFIX_API_KEY=your-api-key
# MLflow configuration
export MLFLOW_TRACKING_URI=http://localhost:5000
export MLFLOW_EXPERIMENT_NAME=my-experiment
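A minimal sketch of wiring these variables into the client, using the MLflowConfig and DeepFixClient shown above; the fallback values are illustrative defaults, not required settings.
import os
from deepfix_sdk.client import DeepFixClient
from deepfix_sdk.config import MLflowConfig
# Build the MLflow configuration from environment variables, falling back to
# local defaults when they are unset.
mlflow_config = MLflowConfig(
    tracking_uri=os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000"),
    experiment_name=os.environ.get("MLFLOW_EXPERIMENT_NAME", "my-experiment"),
)
client = DeepFixClient(
    api_url="http://localhost:8844",
    mlflow_config=mlflow_config,
)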
MLflow Configuration
DeepFix integrates with MLflow for experiment tracking and artifact storage.
Setting Up MLflow Server
Option 1: Local MLflow Server
# Start MLflow tracking server
mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns --host 0.0.0.0 --port 5000
Option 2: Docker MLflow Server
# docker-compose.yml
services:
  mlflow:
    image: ghcr.io/mlflow/mlflow:latest
    ports:
      - "5000:5000"
    command: >
      mlflow server
      --backend-store-uri sqlite:///mlflow.db
      --default-artifact-root /mlruns
      --host 0.0.0.0
      --port 5000
    volumes:
      - ./mlruns:/mlruns
Option 3: Remote MLflow Server
Use a hosted MLflow server (e.g., Databricks, MLflow on AWS):
mlflow_config = MLflowConfig(
tracking_uri="https://your-mlflow-server.com",
experiment_name="my-experiment"
)
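For Databricks-hosted MLflow, the tracking URI is typically the literal string "databricks" and experiments are addressed by workspace path, with authentication supplied via DATABRICKS_HOST and DATABRICKS_TOKEN; the values below are illustrative placeholders, not DeepFix requirements.
from deepfix_sdk.config import MLflowConfig
# Hedged example for a Databricks-hosted tracking server; replace the
# experiment path with your own workspace path.
mlflow_config = MLflowConfig(
    tracking_uri="databricks",
    experiment_name="/Users/you@example.com/deepfix-analysis"
)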
Configuring MLflow in Code
from deepfix_sdk.client import DeepFixClient
from deepfix_sdk.config import MLflowConfig
# Configure MLflow
mlflow_config = MLflowConfig(
tracking_uri="http://localhost:5000",
experiment_name="deepfix-analysis",
run_name="dataset-diagnosis-1"
)
client = DeepFixClient(
api_url="http://localhost:8844",
mlflow_config=mlflow_config
)
# All operations will be tracked in MLflow
client.ingest(...)
result = client.diagnose_dataset(...)
MLflow Artifact Storage
DeepFix stores artifacts in MLflow:
- Dataset Metadata: Dataset statistics and properties
- Deepchecks Reports: Data quality check results
- Model Checkpoints: Model state and configuration
- Training Artifacts: Training metrics and logs
Access artifacts via the MLflow UI or the Python API:
import mlflow
from mlflow.tracking import MlflowClient
# Access MLflow run
run = mlflow.get_run(run_id="...")
# List artifacts
artifacts = MlflowClient().list_artifacts(run_id="...")
# Download artifacts
mlflow.artifacts.download_artifacts(run_id="...", artifact_path="dataset_metadata")
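If you do not have a run ID at hand, one way to locate DeepFix runs is to search by experiment name; the sketch below assumes the deepfix-analysis experiment configured earlier and the dataset_metadata artifact path shown above.
import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
# Find runs in the experiment used by DeepFix; returns a pandas DataFrame,
# ordered by start time (most recent first).
runs = mlflow.search_runs(experiment_names=["deepfix-analysis"])
if not runs.empty:
    run_id = runs.iloc[0]["run_id"]
    # Download the dataset metadata artifact from the most recent run.
    local_path = mlflow.artifacts.download_artifacts(
        run_id=run_id, artifact_path="dataset_metadata"
    )
    print(local_path)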
Advanced Configuration
Custom Timeout Settings
# Increase timeout for large datasets
client = DeepFixClient(
api_url="http://localhost:8844",
timeout=300 # 5 minutes
)
Batch Size Configuration
# Adjust batch size based on available memory
client.ingest(
dataset_name="my-dataset",
train_data=train_dataset,
batch_size=16 # Increase for better throughput, decrease for memory constraints
)
Server URL Configuration
# Local development
client = DeepFixClient(api_url="http://localhost:8844")
# Remote server
client = DeepFixClient(api_url="http://deepfix.example.com:8844")
# With authentication (if configured)
client = DeepFixClient(
api_url="https://deepfix.example.com:8844",
timeout=120
)
Configuration Best Practices
Security
- Never commit .env files: Add .env to .gitignore
- Use environment variables in production: Set via deployment system
- Rotate API keys regularly: Update DEEPFIX_LLM_API_KEY periodically
- Use secure connections: Prefer HTTPS for remote servers
Performance
- Optimize batch sizes: Balance memory usage and throughput
- Configure appropriate timeouts: Based on dataset size and network latency
- Enable caching: Set DEEPFIX_LLM_CACHE=true for repeated analyses
- Use local MLflow: For faster artifact access during development
Development
- Use separate configs: Different .env files for dev/staging/prod
- Version control config templates: Keep env.example updated
- Document configuration: Note any custom settings
- Test configurations: Verify settings before production deployment
Troubleshooting
Configuration Issues
Problem: Server fails to start
# Solution: Verify all required environment variables are set
cat deepfix-server/.env
# Check for missing variables
DEEPFIX_LLM_API_KEY=...
DEEPFIX_LLM_BASE_URL=...
DEEPFIX_LLM_MODEL_NAME=...
Problem: LLM connection errors
# Solution: Verify API key and base URL
echo $DEEPFIX_LLM_API_KEY
echo $DEEPFIX_LLM_BASE_URL
# Test LLM API connectivity
curl -H "Authorization: Bearer $DEEPFIX_LLM_API_KEY" $DEEPFIX_LLM_BASE_URL/models
Problem: MLflow connection errors
# Solution: Verify MLflow server is running
import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.search_experiments() # Should not raise an error
Configuration Validation
# Validate client configuration
client = DeepFixClient(api_url="http://localhost:8844")
# Test connection
try:
    # Attempt a simple operation
    result = client.diagnose_dataset("test-dataset")
except Exception as e:
    print(f"Configuration error: {e}")
Next Steps
- Quickstart Guide - Get started with configured DeepFix
- MLflow Integration Guide - Advanced MLflow setup
- Deployment Guide - Production configuration
- API Reference - Configuration API details