Performance Divergence
Procedure
This procedure assumes a chosen model and performance metric, and that the same k-fold cross-validation setup is applied to the train and test sets so that the comparison is apples-to-apples.
Gather Performance Scores for Train and Test Sets
- What to do: Apply k-fold cross-validation or another resampling method to evaluate the model’s performance (a short sketch follows this step):
- Collect a sample of performance scores for the train set using the chosen metric (e.g., accuracy, RMSE, F1 score).
- Collect a sample of performance scores for the test set using the same evaluation procedure and metric.
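A minimal sketch of this step is shown below. The synthetic dataset, logistic regression model, accuracy metric, and k=10 folds are placeholder assumptions used only to make the sketch runnable; substitute your own model, data, and metric.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
# Placeholder data and model so the sketch runs end to end
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = X[:800], X[800:], y[:800], y[800:]
model = LogisticRegression(max_iter=1000)
# One sample of scores per split, using the same metric and number of folds
train_scores = cross_val_score(model, X_train, y_train, cv=10, scoring="accuracy")
test_scores = cross_val_score(model, X_test, y_test, cv=10, scoring="accuracy")
print(f"Train CV scores: mean={train_scores.mean():.3f}, std={train_scores.std():.3f}")
print(f"Test CV scores: mean={test_scores.mean():.3f}, std={test_scores.std():.3f}")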
Compare Performance Distributions
- What to do: Use statistical techniques to assess the similarity between the train and test performance distributions (a plotting sketch follows this step):
- Compute the KL-divergence or JS-divergence to measure the distance between the distributions.
- Use visualizations such as density plots or histograms to compare the distributions qualitatively.
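The sketch below illustrates the qualitative side of this step with overlaid histograms. It assumes matplotlib is installed and reuses the train_scores and test_scores arrays collected in the previous step.
import matplotlib.pyplot as plt
# Overlay the two score distributions: similar shapes and locations suggest
# consistent performance, while a visible shift suggests divergence
plt.hist(train_scores, bins=10, density=True, alpha=0.5, label="Train CV scores")
plt.hist(test_scores, bins=10, density=True, alpha=0.5, label="Test CV scores")
plt.xlabel("Metric value")
plt.ylabel("Density")
plt.title("Train vs. test cross-validation score distributions")
plt.legend()
plt.show()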
Perform Statistical Test on Divergence
- What to do: Quantify the divergence between the distributions (a significance sketch follows this step):
- If using KL-divergence, determine whether the value is within an acceptable range (values close to zero indicate similar distributions).
- If using JS-divergence, check whether the value is statistically significant or negligible based on a predefined threshold.
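A fixed threshold (as used in the function further below) is the simplest check. If an explicit significance statement is wanted, one option is a permutation test on the pooled scores; the sketch below illustrates that idea, assuming the train_scores and test_scores arrays from the first step are available. It is an illustrative add-on, not part of the diagnostic function reported later.
import numpy as np
from scipy.spatial.distance import jensenshannon
def js_distance(a, b, bins=10, value_range=(0, 1)):
    # Bin each sample of scores, then compute the Jensen-Shannon distance
    # (scipy returns the square root of the JS divergence)
    p, _ = np.histogram(a, bins=bins, range=value_range, density=True)
    q, _ = np.histogram(b, bins=bins, range=value_range, density=True)
    return jensenshannon(p + 1e-10, q + 1e-10)
observed = js_distance(train_scores, test_scores)
# Permutation test: if both splits come from the same score distribution,
# random re-splits of the pooled scores should often diverge at least as much
rng = np.random.default_rng(42)
pooled = np.concatenate([train_scores, test_scores])
n_train = len(train_scores)
permuted = []
for _ in range(1000):
    rng.shuffle(pooled)
    permuted.append(js_distance(pooled[:n_train], pooled[n_train:]))
p_value = np.mean(np.array(permuted) >= observed)
print(f"Observed JS distance: {observed:.3f}, permutation p-value: {p_value:.3f}")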
Interpret Results
- What to do: Assess whether the divergence is small enough to conclude consistent performance:
- If the divergence is small, the model generalizes well and has consistent performance on the train and test sets.
- If the divergence is large, investigate potential issues such as overfitting, data leakage, or differences in data distribution.
Report Findings
- What to do: Summarize the analysis and provide actionable insights:
- Report the divergence metrics and their interpretation.
- Highlight any discrepancies and recommend next steps for addressing them if divergence is large.
Code Example
Below is a Python function that calculates the KL-divergence and JS-divergence between the cross-validation scores of the train and test sets and interprets the divergence results.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon
def divergence_diagnostic(X_train, y_train, X_test, y_test, model, metric, k, threshold=0.1):
    """
    Perform a diagnostic test that calculates the KL-divergence and JS-divergence between train and test CV scores.
    Parameters:
        X_train, y_train: Train set X and y elements.
        X_test, y_test: Test set X and y elements.
        model: The machine learning model to evaluate.
        metric: Performance metric function compatible with scikit-learn.
        k (int): Number of folds for cross-validation.
        threshold (float): Divergence threshold (default=0.1).
    Returns:
        dict: Dictionary containing KL-divergence, JS-divergence, and interpretations.
    """
    # Cross-validation scores for the train set
    cv_train_scores = cross_val_score(model, X_train, y_train, cv=k, scoring=make_scorer(metric))
    # Cross-validation scores for the test set
    cv_test_scores = cross_val_score(model, X_test, y_test, cv=k, scoring=make_scorer(metric))
    # Bin the scores into histograms over [0, 1]; entropy and jensenshannon normalize internally
    train_probs, _ = np.histogram(cv_train_scores, bins=10, range=(0, 1), density=True)
    test_probs, _ = np.histogram(cv_test_scores, bins=10, range=(0, 1), density=True)
    train_probs += 1e-10  # Avoid zero probabilities
    test_probs += 1e-10
    # Calculate KL-divergence and JS-divergence (scipy's jensenshannon returns the
    # Jensen-Shannon distance, i.e., the square root of the JS divergence)
    kl_divergence = entropy(train_probs, test_probs)
    js_divergence = jensenshannon(train_probs, test_probs)
    # Interpret each value against the threshold
    kl_interpretation = "Similar Distributions" if kl_divergence < threshold else "Dissimilar Distributions"
    js_interpretation = "Similar Distributions" if js_divergence < threshold else "Dissimilar Distributions"
    return {
        "KL-Divergence": kl_divergence,
        "JS-Divergence": js_divergence,
        "KL-Divergence Interpretation": kl_interpretation,
        "JS-Divergence Interpretation": js_interpretation
    }
# Demo Usage
if __name__ == "__main__":
    # Generate synthetic dataset
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = X[:800], X[800:], y[:800], y[800:]
    # Define model and metric
    chosen_model = RandomForestClassifier(random_state=42)
    chosen_metric = accuracy_score
    # Run the divergence diagnostic test
    results = divergence_diagnostic(
        X_train, y_train,
        X_test, y_test,
        chosen_model,
        chosen_metric,
        10
    )
    # Print Results
    print("Divergence Diagnostic Test Results:")
    for key, value in results.items():
        print(f" - {key}: {value}")
Example Output
Divergence Diagnostic Test Results:
- KL-Divergence: 0.10939293383499822
- JS-Divergence: 0.1921801942367432
- KL-Divergence Interpretation: Dissimilar Distributions
- JS-Divergence Interpretation: Dissimilar Distributions