Performance Divergence

Procedure

This procedure assumes a chosen model and performance metric, and that k-fold cross-validation is applied to both the train and test sets so that the performance comparison is apples-to-apples.

  1. Gather Performance Scores for Train and Test Sets

    • What to do: Apply k-fold cross-validation or another resampling method to evaluate the model’s performance:
      • Collect a sample of performance scores for the train set using the chosen metric (e.g., accuracy, RMSE, F1 score).
      • Collect a sample of performance scores for the test set using the same evaluation procedure and metric.
  2. Compare Performance Distributions

    • What to do: Use statistical techniques to assess the similarity between the train and test performance distributions:
      • Compute the KL-divergence or JS-divergence to measure the distance between the two distributions (the formulas are sketched after this list).
      • Use visualizations such as density plots or histograms to compare the distributions qualitatively (see the plotting sketch after this list).
  3. Perform Statistical Test on Divergence

    • What to do: Decide whether the computed divergence indicates a meaningful difference between the distributions:
      • If using KL-divergence, check whether the value is within an acceptable range (a value close to zero indicates similar distributions).
      • If using JS-divergence, compare the value against a predefined threshold, or use a resampling test to judge whether the observed divergence is larger than expected by chance (a permutation-test sketch follows the example output).
  4. Interpret Results

    • What to do: Assess whether the divergence is small enough to conclude consistent performance:
      • If the divergence is small, the model generalizes well and has consistent performance on the train and test sets.
      • If the divergence is large, investigate potential issues such as overfitting, data leakage, or differences in data distribution.
  5. Report Findings

    • What to do: Summarize the analysis and provide actionable insights:
      • Report the divergence metrics and their interpretation.
      • Highlight any discrepancies and recommend next steps for addressing them if divergence is large.

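Also as mentioned in step 2, a quick qualitative check is to overlay the two score distributions. The sketch below is a minimal example assuming matplotlib is available and that cv_train_scores and cv_test_scores are the per-fold score arrays produced by the code example that follows; the function name plot_score_distributions is illustrative.

import matplotlib.pyplot as plt

def plot_score_distributions(cv_train_scores, cv_test_scores):
    # Overlay histograms of the per-fold train and test scores on the same axes
    plt.hist(cv_train_scores, bins=10, range=(0, 1), alpha=0.5, label="Train CV scores")
    plt.hist(cv_test_scores, bins=10, range=(0, 1), alpha=0.5, label="Test CV scores")
    plt.xlabel("Score")
    plt.ylabel("Number of folds")
    plt.legend()
    plt.title("Train vs. Test Cross-Validation Score Distributions")
    plt.show()
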
Code Example

Below is a Python function that calculates the KL-divergence and JS-divergence between the cross-validation scores of the train and test sets and interprets the divergence results.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer
from scipy.stats import entropy
from scipy.spatial.distance import jensenshannon

def divergence_diagnostic(X_train, y_train, X_test, y_test, model, metric, k, threshold=0.1):
    """
    Perform a diagnostic test to calculate the KL-divergence and JS-divergence between train and test CV scores.

    Parameters:
        X_train, y_train: Train set X and y elements.
        X_test, y_test: Test set X and y elements.
        model: The machine learning model to evaluate.
        metric: Performance metric function compatible with scikit-learn, assumed to be bounded in [0, 1] (e.g., accuracy).
        k (int): Number of folds for cross-validation.
        threshold (float): Divergence threshold below which distributions are treated as similar (default=0.1).

    Returns:
        dict: Dictionary containing KL-divergence, JS-divergence, and interpretation.
    """
    # Cross-validation scores for the train set
    cv_train_scores = cross_val_score(model, X_train, y_train, cv=k, scoring=make_scorer(metric))
    # Cross-validation scores for the test set
    cv_test_scores = cross_val_score(model, X_test, y_test, cv=k, scoring=make_scorer(metric))

    # Bin the fold scores into histograms over [0, 1] (this assumes a metric bounded in
    # [0, 1], such as accuracy) and normalize the counts to probability distributions
    train_counts, _ = np.histogram(cv_train_scores, bins=10, range=(0, 1))
    test_counts, _ = np.histogram(cv_test_scores, bins=10, range=(0, 1))
    train_probs = train_counts + 1e-10  # smooth to avoid zero probabilities
    test_probs = test_counts + 1e-10
    train_probs /= train_probs.sum()
    test_probs /= test_probs.sum()

    # Calculate KL-divergence and JS-divergence between the binned distributions.
    # Note: SciPy's jensenshannon returns the Jensen-Shannon distance,
    # i.e. the square root of the JS-divergence.
    kl_divergence = entropy(train_probs, test_probs)
    js_divergence = jensenshannon(train_probs, test_probs)

    # Interpretation
    kl_interpretation = "Similar Distributions" if kl_divergence < threshold else "Dissimilar Distributions"
    js_interpretation = "Similar Distributions" if js_divergence < threshold else "Dissimilar Distributions"

    return {
        "KL-Divergence": kl_divergence,
        "JS-Divergence": js_divergence,
        "KL-Divergence Interpretation": kl_interpretation,
        "JS-Divergence Interpretation": js_interpretation
    }

# Demo Usage
if __name__ == "__main__":
    # Generate synthetic dataset
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
    X_train, X_test, y_train, y_test = X[:800], X[800:], y[:800], y[800:]

    # Define model and metric
    chosen_model = RandomForestClassifier(random_state=42)
    chosen_metric = accuracy_score

    # Run the divergence diagnostic test
    results = divergence_diagnostic(
        X_train, y_train,
        X_test, y_test,
        chosen_model,
        chosen_metric,
        10
    )

    # Print Results
    print("Divergence Diagnostic Test Results:")
    for key, value in results.items():
        print(f"  - {key}: {value}")

Example Output

Divergence Diagnostic Test Results:
  - KL-Divergence: 0.10939293383499822
  - JS-Divergence: 0.1921801942367432
  - KL-Divergence Interpretation: Dissimilar Distributions
  - JS-Divergence Interpretation: Dissimilar Distributions
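
Step 3 mentions checking whether the divergence is statistically significant rather than only comparing it to a fixed threshold. One way to do that, sketched below under the assumption that the per-fold score arrays from the code example above are available, is a permutation test: pool the train and test fold scores, repeatedly reassign them to two groups at random, and count how often the permuted JS value is at least as large as the observed one. The helper names js_between and permutation_pvalue are illustrative.

import numpy as np
from scipy.spatial.distance import jensenshannon

def js_between(scores_a, scores_b, bins=10, value_range=(0, 1)):
    # Bin two score samples over the same range and return the JS distance between them
    p, _ = np.histogram(scores_a, bins=bins, range=value_range)
    q, _ = np.histogram(scores_b, bins=bins, range=value_range)
    return jensenshannon(p + 1e-10, q + 1e-10)

def permutation_pvalue(train_scores, test_scores, n_permutations=1000, seed=42):
    # Estimate how often a divergence at least as large as the observed one arises
    # when the pooled fold scores are split into two groups at random
    rng = np.random.default_rng(seed)
    observed = js_between(train_scores, test_scores)
    pooled = np.concatenate([train_scores, test_scores])
    n_train = len(train_scores)
    count = 0
    for _ in range(n_permutations):
        permuted = rng.permutation(pooled)
        if js_between(permuted[:n_train], permuted[n_train:]) >= observed:
            count += 1
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value

With only k scores per group, such a test has limited resolution, so the resulting p-value is best treated as a rough guide rather than a precise significance level.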