Split Sensitivity Analysis By Performance Central Tendency

We can perform a sensitivity analysis of train/test split sizes by comparing the distributions of model performance on each side of the split.

The goal is to evaluate the quality of different train/test split ratios based on the consistency of model performance metrics, ensuring that both sides of a split represent the dataset equally well from a modeling perspective.

Procedure:

  1. Define Split Ratios:

    • Specify a range of train/test split ratios to evaluate (e.g., 50/50, 70/30, 80/20, 90/10).
  2. Select a Model:

    • Use a standard baseline model (e.g., a dummy model, random forest, or logistic regression) to ensure consistent results.
    • Ensure the model can handle the dataset’s characteristics (classification/regression, numerical/categorical).
  3. Generate Random Splits:

    • For each split ratio, perform multiple random train/test splits to account for variability.
  4. Cross-Validation on Subsets:

    • Perform k-fold cross-validation (CV) on the training set of each split to obtain a distribution of performance metrics (e.g., accuracy, RMSE, F1-score).
    • Independently, perform k-fold CV on the test set (treating it as a surrogate for unseen data) to obtain a comparable performance distribution.
  5. Compare Performance Metric Distributions:

    • For each split ratio and each performance metric:
      • Use statistical tests to compare the central tendencies of the train and test CV performance distributions.
        • Parametric test: a t-test if the metric distributions are approximately normal.
        • Non-parametric test: a Mann-Whitney U test otherwise (a test-selection sketch follows this list).
    • Record whether the train and test performance distributions are statistically similar for each metric.
  6. Aggregate Results:

    • For each split ratio, compute:
      • The percentage of splits where train and test distributions are statistically similar.
      • The percentage similarity for each performance metric across splits (a multi-metric extension is sketched after the example output below).
  7. Evaluate Split Quality:

    • “Good” splits yield train and test CV performance distributions that are consistently similar across metrics and splits.
    • “Bad” splits show significant differences between the train and test performance distributions, indicating that the split is not representative.
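Step 5 leaves the choice of statistical test to the shape of the metric distributions. Below is a minimal sketch of that decision, assuming a Shapiro-Wilk normality check on both score samples at the same significance level; the helper name compare_central_tendency is illustrative and not part of the main example further down.

from scipy.stats import shapiro, ttest_ind, mannwhitneyu

def compare_central_tendency(train_scores, test_scores, alpha=0.05):
    """
    Returns True when the two CV score samples have statistically similar
    central tendencies at the given significance level, selecting a
    parametric or non-parametric test based on a normality check.
    """
    # Shapiro-Wilk tests the null hypothesis that a sample is normally
    # distributed (requires at least 3 observations, so use k >= 3 folds).
    _, p_train = shapiro(train_scores)
    _, p_test = shapiro(test_scores)

    if p_train > alpha and p_test > alpha:
        # Both samples look normal: Welch's t-test (no equal-variance assumption).
        _, p_value = ttest_ind(train_scores, test_scores, equal_var=False)
    else:
        # At least one sample looks non-normal: fall back to Mann-Whitney U.
        _, p_value = mannwhitneyu(train_scores, test_scores, alternative='two-sided')

    # Similar central tendencies if we fail to reject the null hypothesis.
    return p_value > alpha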

Code Example

Below is a code example that performs the sensitivity analysis for a single metric, applying k-fold cross-validation to both the train and test sets of each split and comparing their central tendencies with a Mann-Whitney U test.

import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import make_scorer
from scipy.stats import mannwhitneyu

def split_ratio_sensitivity_test(X, y, split_ratios, model, metric, k=5, num_trials=20, alpha=0.05):
    """
    Evaluates the impact of train/test split ratios on model performance by applying k-fold cross-validation
    to both train and test sets and comparing their central tendencies using a non-parametric test.

    Parameters:
        X (np.ndarray): Feature dataset (numerical variables only).
        y (np.ndarray): Target array.
        split_ratios (list of tuples): Train/test split ratios (e.g., [(0.7, 0.3), (0.8, 0.2)]).
        model: A scikit-learn compatible model instance.
        metric: A callable metric function (e.g., sklearn.metrics.accuracy_score).
        k (int): Number of folds for cross-validation.
        num_trials (int): Number of random trials for each split ratio.
        alpha (float): Significance level for the statistical test (default is 0.05).

    Returns:
        dict: A dictionary summarizing the percentage of trials with equivalent central tendencies for each split ratio.
    """
    results = {}
    scorer = make_scorer(metric)

    for train_ratio, test_ratio in split_ratios:
        ratio_key = f"Train:{train_ratio}, Test:{test_ratio}"
        trial_consistency = []

        for _ in range(num_trials):
            # Draw a fresh random split each trial (random_state=None varies the split)
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, train_size=train_ratio, test_size=test_ratio, random_state=None
            )

            # Perform k-fold cross-validation on the train set
            train_cv_scores = cross_val_score(model, X_train, y_train, cv=k, scoring=scorer)

            # Perform k-fold cross-validation on the test set
            test_cv_scores = cross_val_score(model, X_test, y_test, cv=k, scoring=scorer)

            # Compare central tendencies using Mann-Whitney U test
            _, p_value = mannwhitneyu(train_cv_scores, test_cv_scores, alternative='two-sided')

            # Evaluate consistency based on p-value
            trial_consistency.append(p_value > alpha)  # Consistent if p-value > alpha

        # Summarize results for this split ratio
        results[ratio_key] = {
            "Consistent Trials (%)": np.mean(trial_consistency) * 100
        }

    return results

# Demo Usage
if __name__ == "__main__":
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Generate synthetic dataset
    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

    # Define split ratios to evaluate
    split_ratios = [(0.5, 0.5), (0.7, 0.3), (0.8, 0.2), (0.9, 0.1)]

    # Initialize model and metric
    model = LogisticRegression(max_iter=1000)  # raise max_iter to avoid convergence warnings on some splits
    metric = accuracy_score

    # Run the sensitivity analysis
    results = split_ratio_sensitivity_test(
        X, y, split_ratios, model, metric, k=10, num_trials=30, alpha=0.01
    )

    # Print results
    for split, details in results.items():
        print(f"Split Ratio {split}:")
        for key, value in details.items():
            print(f"  - {key}: {value:.2f}%")
        print("-" * 40)

Example Output:

Split Ratio Train:0.5, Test:0.5:
  - Consistent Trials (%): 100.00%
----------------------------------------
Split Ratio Train:0.7, Test:0.3:
  - Consistent Trials (%): 100.00%
----------------------------------------
Split Ratio Train:0.8, Test:0.2:
  - Consistent Trials (%): 100.00%
----------------------------------------
Split Ratio Train:0.9, Test:0.1:
  - Consistent Trials (%): 90.00%
----------------------------------------
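The function above evaluates a single metric per run. One way to obtain the per-metric similarity percentages described in step 6 is to repeat the analysis over a dictionary of metric callables. The wrapper below, multi_metric_sensitivity, is a hypothetical convenience name; it reuses split_ratio_sensitivity_test unchanged.

from sklearn.metrics import accuracy_score, f1_score

def multi_metric_sensitivity(X, y, split_ratios, model, metrics, **kwargs):
    """
    Runs the single-metric analysis once per metric.
    metrics: dict mapping a metric name to a metric callable.
    Returns a nested dict: {metric_name: {split_ratio: summary}}.
    """
    return {
        name: split_ratio_sensitivity_test(X, y, split_ratios, model, metric, **kwargs)
        for name, metric in metrics.items()
    }

# Example: compare accuracy and F1 consistency across the same ratios.
metrics = {"accuracy": accuracy_score, "f1": f1_score}
per_metric = multi_metric_sensitivity(X, y, split_ratios, model, metrics, k=5, num_trials=10)
for name, by_ratio in per_metric.items():
    print(name, by_ratio)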