deepchecks - WARNING - Could not find model's classes, using the observed classes. In order to make sure the classes used by the model are inferred correctly, please use the model_classes argument
UserWarning: n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism.
FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
# Visualize results
result.to_text()
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                               DEEPFIX ANALYSIS RESULT                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────────────────────── Summary ───────────────────────────────────────────────────────╮
│ The cross-artifact analysis was partially successful. The DatasetArtifactsAnalyzer encountered a technical error and │
│ could not provide results. However, the DeepchecksArtifactsAnalyzer identified critical data quality issues. The     │
│ most severe finding is a high-confidence label distribution drift between train and test sets, which poses a         │
│ significant risk to model validity. Additional concerns include a high ratio of outliers in text toxicity and the    │
│ presence of unknown tokens, both rated as medium severity. A low-severity text embeddings drift was also noted.      │
│ Recommendations focus on data preprocessing, tokenizer updates, and careful performance monitoring to ensure model   │
│ robustness. The failure of one analyzer highlights a potential need to review the artifact analysis pipeline for     │
│ stability.                                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Summary Statistics
Metric                 Value
Total Findings         4
Severity Distribution  MEDIUM: 2
                       HIGH: 1
                       LOW: 1
HIGH Severity Issues (1)
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ #   ┃ Finding                                  ┃ Action                                   ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1   │ Significant label distribution drift     │ Re-examine the train-test split          │
│     │ between train and test sets              │ methodology to ensure proper             │
│     │ Evidence: Label drift check failed with  │ stratification                           │
│     │ Cramer's V score of 0.22, exceeding the  │ Label distribution mismatch can lead to  │
│     │ 0.15 threshold, indicating substantial   │ unreliable performance metrics and model │
│     │ distribution shift in emotion labels     │ overfitting to train-specific patterns   │
└─────┴──────────────────────────────────────────┴──────────────────────────────────────────┘
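To make the label drift score above easier to interpret, here is a minimal sketch of how a Cramer's V between dataset membership (train or test) and label could be computed by hand. It uses the plain, uncorrected formula and is not the deepchecks implementation, which may apply corrections or binning. The function name `cramers_v` and the toy labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(train_labels, test_labels):
    """Uncorrected Cramer's V for the 2 x K table (split x label)."""
    if not train_labels or not test_labels:
        raise ValueError("both label lists must be non-empty")

    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)
    classes = sorted(set(train_counts) | set(test_counts))

    n_train, n_test = len(train_labels), len(test_labels)
    n = n_train + n_test

    # Pearson chi-squared statistic over every (split, label) cell.
    chi2 = 0.0
    for label in classes:
        col_total = train_counts[label] + test_counts[label]
        for observed, row_total in (
            (train_counts[label], n_train),
            (test_counts[label], n_test),
        ):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected

    # min(rows - 1, cols - 1); with two rows this is 1 once there are >= 2 labels.
    k = min(2, len(classes)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy labels: identical label mixes give 0, disjoint mixes give 1.
same = cramers_v(["joy", "anger"] * 50, ["joy", "anger"] * 50)
disjoint = cramers_v(["joy"] * 10, ["anger"] * 10)
```

A score near 0 means the two splits share the same label mix. The report flags anything above 0.15, so 0.22 suggests the split was probably not stratified by label.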
MEDIUM Severity Issues (2)
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ #   ┃ Finding                                  ┃ Action                                   ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1   │ High outlier ratio in text properties,   │ Investigate and clean outliers in the    │
│     │ particularly Toxicity                    │ Toxicity property, or implement robust   │
│     │ Evidence: Text property outliers check   │ preprocessing                            │
│     │ failed with Toxicity property showing    │ High outlier ratios can distort feature  │
│     │ 16.43% outlier ratio, significantly      │ relationships and model training,        │
│     │ above the 5% threshold                   │ potentially leading to poor              │
│     │                                          │ generalization                           │
│ 2   │ Presence of unknown tokens indicating    │ Update tokenizer vocabulary or           │
│     │ tokenizer coverage gaps                  │ preprocess text to handle unknown tokens │
│     │ Evidence: Unknown tokens check failed    │ appropriately                            │
│     │ with ratios of 0.79% and 0.68%,          │ Unknown tokens can degrade model         │
│     │ indicating unsupported tokens in the     │ performance and introduce noise in text  │
│     │ dataset                                  │ representations                          │
└─────┴──────────────────────────────────────────┴──────────────────────────────────────────┘
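As a rough illustration of what an outlier ratio such as the 16.43% reported for Toxicity can mean, the sketch below flags values outside an interquartile-range fence and returns the flagged fraction. The IQR rule with a 1.5 scale is an assumption made for illustration; the check that produced the report may use different percentiles or another method. `iqr_outlier_ratio` and the sample values are hypothetical.

```python
from statistics import quantiles


def iqr_outlier_ratio(values, scale=1.5):
    """Fraction of values outside [Q1 - scale*IQR, Q3 + scale*IQR]."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate quartiles")

    # Quartiles with linear interpolation over the observed data range.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - scale * iqr, q3 + scale * iqr

    outliers = [v for v in values if v < lower or v > upper]
    return len(outliers) / len(values)


# Hypothetical property values: one extreme score among ten samples.
ratio = iqr_outlier_ratio([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Comparing a ratio like this against the 5% threshold from the report shows which text properties to inspect before deciding whether to clean the samples or leave them in.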
LOW Severity Issues (1)
┏━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ #   ┃ Finding                                  ┃ Action                                   ┃
┡━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 1   │ Moderate text embeddings drift           │ Monitor model performance closely and    │
│     │ suggesting domain shift                  │ consider domain adaptation techniques if │
│     │ Evidence: Text embeddings drift showed   │ performance degrades                     │
│     │ AUC of 0.6, indicating some domain shift │ Domain shift can affect model            │
│     │ between train and test distributions     │ generalization, though the current level │
│     │                                          │ may be acceptable for deployment         │
└─────┴──────────────────────────────────────────┴──────────────────────────────────────────┘
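The AUC in the embeddings drift finding presumably comes from a domain classifier trained to tell train samples from test samples, although the report does not say so. Under that assumption, the sketch below shows how such an AUC can be read: it is the probability that a randomly chosen test sample gets a higher "looks like test" score than a randomly chosen train sample. `domain_auc` and the scores are hypothetical, and the classifier itself is out of scope here.

```python
def domain_auc(train_scores, test_scores):
    """Pairwise (Mann-Whitney) AUC: P(test score > train score), ties count half."""
    if not train_scores or not test_scores:
        raise ValueError("both score lists must be non-empty")

    wins = 0.0
    for s_test in test_scores:
        for s_train in train_scores:
            if s_test > s_train:
                wins += 1.0
            elif s_test == s_train:
                wins += 0.5
    return wins / (len(train_scores) * len(test_scores))


# Hypothetical classifier scores for "this sample belongs to the test set".
separable = domain_auc([0.1, 0.2], [0.8, 0.9])       # splits are trivially distinguishable
overlapping = domain_auc([0.2, 0.6], [0.4, 0.8])     # partial overlap
identical = domain_auc([0.5, 0.5], [0.5, 0.5])       # no detectable shift
```

An AUC of 0.5 means the splits are indistinguishable and 1.0 means they are fully separable, so the reported 0.6 sits close to the no-drift end, consistent with the LOW severity rating.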