NLP

import os
from deepfix_sdk import DeepFixClient

# Placeholder key; replace "sk-empty" with a real key if your DeepFix server requires one
os.environ["DEEPFIX_API_KEY"] = "sk-empty"

client = DeepFixClient(api_url="https://deepfix.delcaux.com", timeout=120)

Sentence classification

from deepfix_sdk.data.datasets import NLPDataset
from deepfix_sdk.zoo.datasets import load_tweet_emotion_classification

train_data, test_data = load_tweet_emotion_classification(
    as_train_test=True, include_embeddings=True
)
dataset_name = "tweet_emotion_classification"
# Wrap both splits so they are analyzed under the same dataset name
train_data = NLPDataset(dataset_name=dataset_name, dataset=train_data)
test_data = NLPDataset(dataset_name=dataset_name, dataset=test_data)

# Run the diagnosis on the train/test pair
result = client.get_diagnosis(
    train_data=train_data,
    test_data=test_data,
    language="english",
)

deepchecks - WARNING - Could not find model's classes, using the observed classes. In order to make sure the classes used by the model are inferred correctly, please use the model_classes argument
UserWarning: n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism.

FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.

# Visualize results
result.to_text()

╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                               DEEPFIX ANALYSIS RESULT                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

╭────────────────────────────────────────────────────── Summary ───────────────────────────────────────────────────────╮
│ The cross-artifact analysis was partially successful. The DatasetArtifactsAnalyzer encountered a technical error and │
│ could not provide results. However, the DeepchecksArtifactsAnalyzer identified critical data quality issues. The     │
│ most severe finding is a high-confidence label distribution drift between train and test sets, which poses a         │
│ significant risk to model validity. Additional concerns include a high ratio of outliers in text toxicity and the    │
│ presence of unknown tokens, both rated as medium severity. A low-severity text embeddings drift was also noted.      │
│ Recommendations focus on data preprocessing, tokenizer updates, and careful performance monitoring to ensure model   │
│ robustness. The failure of one analyzer highlights a potential need to review the artifact analysis pipeline for     │
│ stability.                                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

                                      Summary Statistics                                      
 Metric                          Value                                                        
 Total Findings                  4                                                            
 Severity Distribution           MEDIUM: 2  HIGH: 1  LOW: 1

                                  HIGH Severity Issues (1)                                   
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ #   ┃ Finding                                  ┃ Action                                   ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1   │ Significant label distribution drift     │ Re-examine the train-test split          │
│     │ between train and test sets              │ methodology to ensure proper             │
│     │ Evidence: Label drift check failed with  │ stratification                           │
│     │ Cramer's V score of 0.22, exceeding the  │ Label distribution mismatch can lead to  │
│     │ 0.15 threshold, indicating substantial   │ unreliable performance metrics and model │
│     │ distribution shift in emotion labels     │ overfitting to train-specific patterns   │
└─────┴──────────────────────────────────────────┴──────────────────────────────────────────┘
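To make the label drift finding concrete, here is a minimal sketch of how a Cramér's V score between train and test label distributions could be computed, together with a simple stratified split that keeps it near zero. This is the plain, uncorrected statistic; the check that produced the 0.22 above may apply a bias correction or other adjustments, so the numbers need not match exactly. The function names `cramers_v` and `stratified_split` are hypothetical helpers and are not part of `deepfix_sdk`.

```python
import random
from collections import Counter, defaultdict
from math import sqrt


def cramers_v(train_labels, test_labels):
    """Uncorrected Cramér's V between split membership (train/test) and label."""
    train_counts, test_counts = Counter(train_labels), Counter(test_labels)
    classes = set(train_counts) | set(test_counts)
    n_train, n_test = len(train_labels), len(test_labels)
    n = n_train + n_test
    if n_train == 0 or n_test == 0 or len(classes) < 2:
        return 0.0
    chi2 = 0.0
    for cls in classes:
        col_total = train_counts[cls] + test_counts[cls]
        for observed, row_total in (
            (train_counts[cls], n_train),
            (test_counts[cls], n_test),
        ):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    # The table is 2 x K, so min(rows, cols) - 1 == 1 and V = sqrt(chi2 / n)
    return sqrt(chi2 / n)


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) where every label is split at the same rate."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Toy labels (made up for illustration, not taken from the tweet dataset)
labels = ["joy"] * 80 + ["anger"] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
drift = cramers_v([labels[i] for i in train_idx], [labels[i] for i in test_idx])
print(f"Cramér's V after stratified split: {drift:.3f}")
```

A score above the 0.15 threshold reported in the table suggests the two splits do not share the same label proportions; re-splitting with stratification is one way to bring it back down.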

                                 MEDIUM Severity Issues (2)                                  
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ #   ┃ Finding                                  ┃ Action                                   ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1   │ High outlier ratio in text properties,   │ Investigate and clean outliers in the    │
│     │ particularly Toxicity                    │ Toxicity property, or implement robust   │
│     │ Evidence: Text property outliers check   │ preprocessing                            │
│     │ failed with Toxicity property showing    │ High outlier ratios can distort feature  │
│     │ 16.43% outlier ratio, significantly      │ relationships and model training,        │
│     │ above the 5% threshold                   │ potentially leading to poor              │
│     │                                          │ generalization                           │
│ 2   │ Presence of unknown tokens indicating    │ Update tokenizer vocabulary or           │
│     │ tokenizer coverage gaps                  │ preprocess text to handle unknown tokens │
│     │ Evidence: Unknown tokens check failed    │ appropriately                            │
│     │ with ratios of 0.79% and 0.68%,          │ Unknown tokens can degrade model         │
│     │ indicating unsupported tokens in the     │ performance and introduce noise in text  │
│     │ dataset                                  │ representations                          │
└─────┴──────────────────────────────────────────┴──────────────────────────────────────────┘
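The two medium-severity findings are both simple ratios, and a rough sketch can help when sanity-checking them on your own data. The code below assumes a standard 1.5 × IQR rule for outliers and plain whitespace tokenization against a known vocabulary; the actual checks may use a different outlier rule and a real tokenizer, so treat this as an approximation. `iqr_outlier_ratio` and `unknown_token_ratio` are hypothetical helper names.

```python
from statistics import quantiles


def iqr_outlier_ratio(values, k=1.5):
    """Fraction of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = sum(1 for v in values if v < low or v > high)
    return outliers / len(values)


def unknown_token_ratio(texts, vocabulary):
    """Fraction of whitespace-separated tokens that are missing from `vocabulary`."""
    tokens = [tok for text in texts for tok in text.lower().split()]
    if not tokens:
        return 0.0
    unknown = sum(1 for tok in tokens if tok not in vocabulary)
    return unknown / len(tokens)


# Toy inputs (made up for illustration)
toxicity_scores = [1] * 19 + [100]
print(iqr_outlier_ratio(toxicity_scores))

vocab = {"so", "happy", "today"}
print(unknown_token_ratio(["so happy today", "zzzq happy"], vocab))
```

Comparing these ratios per split (train vs. test) can show whether the problem is concentrated in one of them before deciding how to clean or re-tokenize the text.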

                                   LOW Severity Issues (1)                                   
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ #   ┃ Finding                                  ┃ Action                                   ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1   │ Moderate text embeddings drift           │ Monitor model performance closely and    │
│     │ suggesting domain shift                  │ consider domain adaptation techniques if │
│     │ Evidence: Text embeddings drift showed   │ performance degrades                     │
│     │ AUC of 0.6, indicating some domain shift │ Domain shift can affect model            │
│     │ between train and test distributions     │ generalization, though the current level │
│     │                                          │ may be acceptable for deployment         │
└─────┴──────────────────────────────────────────┴──────────────────────────────────────────┘
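The AUC in the embeddings drift finding is easiest to read as "how well can a classifier tell train samples from test samples". Assuming the score comes from such a domain classifier (an assumption about the check's internals), the sketch below computes the AUC from the classifier's per-sample scores using the rank-based definition. `domain_auc` is a hypothetical helper; training the classifier itself is out of scope here.

```python
def domain_auc(scores_train, scores_test):
    """AUC for separating test from train samples given 'looks like test' scores.

    Equals the probability that a random test sample scores higher than a
    random train sample, counting ties as half.
    """
    wins = 0.0
    for s_test in scores_test:
        for s_train in scores_train:
            if s_test > s_train:
                wins += 1.0
            elif s_test == s_train:
                wins += 0.5
    return wins / (len(scores_train) * len(scores_test))


# Toy scores (made up for illustration)
print(domain_auc([0.1, 0.2], [0.8, 0.9]))  # fully separable splits
print(domain_auc([0.5, 0.5], [0.5, 0.5]))  # indistinguishable splits
```

An AUC of 0.5 means the two splits look the same to the classifier and 1.0 means they are fully separable, which is why the reported 0.6 is rated as low severity.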
