
NLP

import os
from deepfix_sdk import DeepFixClient

# Set the API key and create a client pointed at the DeepFix server
os.environ["DEEPFIX_API_KEY"] = "sk-empty"
client = DeepFixClient(api_url="https://deepfix.delcaux.com", timeout=120)

Sentence classification

from deepfix_sdk.data.datasets import NLPDataset
from deepfix_sdk.zoo.datasets import load_tweet_emotion_classification

# Load the tweet emotion classification dataset (with precomputed embeddings)
train_data, test_data = load_tweet_emotion_classification(
    as_train_test=True, include_embeddings=True
)

# Wrap both splits as DeepFix NLP datasets
dataset_name = "tweet_emotion_classification"
train_data = NLPDataset(dataset_name=dataset_name, dataset=train_data)
val_data = NLPDataset(dataset_name=dataset_name, dataset=test_data)

# Run the cross-artifact diagnosis on the train/test pair
result = client.get_diagnosis(
    train_data=train_data,
    test_data=val_data,
    language="english",
)
deepchecks - WARNING - Could not find model's classes, using the observed classes. In order to make sure the classes used by the model are inferred correctly, please use the model_classes argument
UserWarning: n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism.
FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
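
The UserWarning and FutureWarning lines above come from third-party libraries (UMAP and pandas) used inside the
analysis pipeline and can usually be ignored. If you prefer a quieter run, you can filter them before calling
get_diagnosis. This is an optional sketch using only the standard library; the "deepchecks" logger name is assumed
from the log prefix above:

import logging
import warnings

# Hide the third-party warning noise shown above
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

# The deepchecks message is a log record, not a warning; raise the logger's
# level if you also want to hide it (logger name assumed from the prefix)
logging.getLogger("deepchecks").setLevel(logging.ERROR)
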
# Visualize results
result.to_text()
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                               DEEPFIX ANALYSIS RESULT                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯


Summary

The cross-artifact analysis was partially successful. The DatasetArtifactsAnalyzer encountered a technical error and
could not provide results. However, the DeepchecksArtifactsAnalyzer identified critical data quality issues. The
most severe finding is a high-confidence label distribution drift between train and test sets, which poses a
significant risk to model validity. Additional concerns include a high ratio of outliers in text toxicity and the
presence of unknown tokens, both rated as medium severity. A low-severity text embeddings drift was also noted.
Recommendations focus on data preprocessing, tokenizer updates, and careful performance monitoring to ensure model
robustness. The failure of one analyzer highlights a potential need to review the artifact analysis pipeline for
stability.


Summary Statistics

Metric                   Value
Total Findings           4
Severity Distribution    MEDIUM: 2  HIGH: 1  LOW: 1


HIGH Severity Issues (1)

1. Finding: Significant label distribution drift between train and test sets
   Evidence: Label drift check failed with Cramer's V score of 0.22, exceeding the 0.15 threshold, indicating
   substantial distribution shift in emotion labels
   Action: Re-examine the train-test split methodology to ensure proper stratification
   Impact: Label distribution mismatch can lead to unreliable performance metrics and model overfitting to
   train-specific patterns
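
One way to act on this recommendation is to re-split the raw data while stratifying on the emotion label. Below is a
minimal sketch using scikit-learn; texts and labels are hypothetical placeholders for the raw tweets and their
labels, not objects returned by the SDK:

from sklearn.model_selection import train_test_split

# Hypothetical placeholders: use the raw tweets and their emotion labels here
texts = [f"tweet {i}" for i in range(100)]
labels = ["joy" if i % 2 == 0 else "anger" for i in range(100)]

# Stratifying on the labels keeps the emotion distribution identical across
# the train and test splits, which addresses the label drift flagged above
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)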


MEDIUM Severity Issues (2)

1. Finding: High outlier ratio in text properties, particularly Toxicity
   Evidence: Text property outliers check failed with the Toxicity property showing a 16.43% outlier ratio,
   significantly above the 5% threshold
   Action: Investigate and clean outliers in the Toxicity property, or implement robust preprocessing
   Impact: High outlier ratios can distort feature relationships and model training, potentially leading to poor
   generalization

2. Finding: Presence of unknown tokens indicating tokenizer coverage gaps
   Evidence: Unknown tokens check failed with ratios of 0.79% and 0.68%, indicating unsupported tokens in the dataset
   Action: Update tokenizer vocabulary or preprocess text to handle unknown tokens appropriately
   Impact: Unknown tokens can degrade model performance and introduce noise in text representations
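
Both medium-severity actions can be approximated with ordinary text preprocessing. The sketch below assumes a texts
list and a parallel toxicity_scores list (for example, produced by a separate text-property model); both names are
hypothetical and not part of the SDK:

import re

# Hypothetical inputs: raw tweets and a per-text toxicity estimate
texts = ["Check http://example.com NOW!!!", "have a nice day"]
toxicity_scores = [0.91, 0.02]

TOXICITY_THRESHOLD = 0.8  # assumed cut-off; tune it to your data

def normalize(text: str) -> str:
    """Light normalization that tends to reduce unknown-token ratios."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "<url>", text)  # collapse URLs into a known token
    text = re.sub(r"\s+", " ", text).strip()       # normalize whitespace
    return text

# Drop (or route to manual review) texts flagged as toxicity outliers,
# then normalize the remaining ones before re-tokenizing
cleaned = [
    normalize(text)
    for text, score in zip(texts, toxicity_scores)
    if score < TOXICITY_THRESHOLD
]
print(cleaned)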


LOW Severity Issues (1)

1. Finding: Moderate text embeddings drift suggesting domain shift
   Evidence: Text embeddings drift showed an AUC of 0.6, indicating some domain shift between train and test
   distributions
   Action: Monitor model performance closely and consider domain adaptation techniques if performance degrades
   Impact: Domain shift can affect model generalization, though the current level may be acceptable for deployment
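
A simple way to keep monitoring the embeddings drift is to train a domain classifier that tries to separate train
from test embeddings and to track its ROC AUC, which mirrors the AUC-based evidence above. This is a minimal sketch
with scikit-learn; the random matrices only stand in for your real train/test embeddings:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Stand-in embeddings so the sketch runs as-is; replace with your own matrices
rng = np.random.default_rng(0)
train_embeddings = rng.normal(size=(500, 32))
test_embeddings = rng.normal(loc=0.1, size=(500, 32))

# A domain classifier tries to tell train from test: an AUC near 0.5 means the
# distributions overlap, while values close to 1.0 indicate strong drift
X = np.vstack([train_embeddings, test_embeddings])
y = np.array([0] * len(train_embeddings) + [1] * len(test_embeddings))
probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
)[:, 1]
print(f"domain-classifier AUC: {roc_auc_score(y, probs):.2f}")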

