Split Sensitivity Analysis By Performance Variance

Procedure

  1. Define Split Ratios and Metric

    • Identify train/test split ratios to evaluate, such as 50/50, 70/30, 80/20, and 90/10.
    • Select a performance metric (e.g., accuracy, precision, recall, F1 score, or RMSE) for evaluation.
  2. Set Up Model Evaluation Framework

    • Choose a simple baseline or standard model, such as logistic regression or a decision tree.
    • Implement k-fold cross-validation (e.g., k=5) for each split ratio on the training set.
    • Record the score for the chosen metric on each fold.
  3. Test Set Evaluation

    • Train the model on the complete training set for each split ratio.
    • Evaluate the model on the corresponding test set and record the performance score.
  4. Compare Variances of Performance Scores

    • Use a statistical test such as Levene’s test or a variance-ratio F-test to compare the variance of performance scores obtained from k-fold cross-validation with the test set evaluations for each split ratio (a minimal sketch of both tests follows this procedure).
    • Count the number of cases where the variances are statistically equivalent (i.e., p-value above the chosen significance level, e.g., 0.05).
  5. Perform Multiple Trials

    • Repeat the entire process (steps 2–4) multiple times (e.g., 10–20 trials per split ratio) to account for variability and ensure robust results.
  6. Aggregate and Analyze Results

    • Calculate the percentage of trials where the variance of k-fold and test set performance scores is statistically equivalent for each split ratio.
    • Identify the split ratio(s) whose k-fold and test set performance variances agree most consistently across trials.
  7. Report Findings

    • Summarize the results, including:
      • Average percentage of trials with equivalent variances for each split ratio.
      • Number of trials where the k-fold and test set performance measures closely agree.
    • Recommend split ratio(s) that show stable and reliable performance evaluations across trials.
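
To make the variance comparison in step 4 concrete, here is a minimal sketch that applies both Levene’s test and a variance-ratio F-test to two hypothetical score samples. The arrays cv_scores and test_scores below are placeholder values for illustration only; in practice they would be the scores collected from the cross-validation folds and the test set evaluations, and the F-test additionally assumes roughly normally distributed scores. Levene’s test is generally the safer choice here because it is robust to departures from normality.

import numpy as np
from scipy.stats import levene, f

# Placeholder score samples for illustration (hypothetical values)
cv_scores = np.array([0.81, 0.84, 0.79, 0.83, 0.80])
test_scores = np.array([0.82, 0.78, 0.85, 0.80, 0.81])

alpha = 0.05  # significance level

# Levene's test: robust to departures from normality
levene_stat, levene_p = levene(cv_scores, test_scores)

# Variance-ratio F-test: assumes approximately normal scores
f_stat = np.var(cv_scores, ddof=1) / np.var(test_scores, ddof=1)
dfn, dfd = len(cv_scores) - 1, len(test_scores) - 1
f_p = 2 * min(f.cdf(f_stat, dfn, dfd), f.sf(f_stat, dfn, dfd))  # two-sided p-value

print(f"Levene's test: p = {levene_p:.3f}, equivalent variances: {levene_p > alpha}")
print(f"F-test:        p = {f_p:.3f}, equivalent variances: {f_p > alpha}")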

Code Example

Below is a Python function that performs the diagnostic described above: for each train/test split ratio, it compares the variance of k-fold cross-validation scores against the held-out test score for a chosen evaluation metric, repeated over multiple trials.

import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import make_scorer
from scipy.stats import levene

def evaluate_split_sensitivity(X, y, split_ratios, model, metric, k=5, num_trials=20, alpha=0.05):
    """
    Perform sensitivity analysis of train/test split ratios by comparing model performance variances.

    Parameters:
        X (np.ndarray): Feature dataset (numerical variables only).
        y (np.ndarray): Target array.
        split_ratios (list of tuples): List of train/test split ratios (e.g., [(0.5, 0.5), (0.7, 0.3)]).
        model: A scikit-learn compatible model instance.
        metric: A scikit-learn compatible scoring function (e.g., accuracy_score).
        k (int): Number of folds for cross-validation.
        num_trials (int): Number of trials to repeat for statistical consistency testing.
        alpha (float): Significance level for Levene's test.

    Returns:
        dict: A dictionary containing variance consistency results for each split ratio.
    """
    scorer = make_scorer(metric)
    results = {}

    for train_ratio, test_ratio in split_ratios:
        ratio_key = f"Train:{train_ratio}, Test:{test_ratio}"
        results[ratio_key] = []

        for _ in range(num_trials):
            # Split the dataset
            X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=train_ratio, test_size=test_ratio)

            # Cross-validation performance scores
            cv_scores = cross_val_score(model, X_train, y_train, cv=k, scoring=scorer)

            # Train and evaluate on test set
            model.fit(X_train, y_train)
            test_score = metric(y_test, model.predict(X_test))

            # Compare the spread of the CV scores against the single test-set score
            # using Levene's test; the test set contributes a one-element group per trial
            stat, p_value = levene(cv_scores, [test_score])
            results[ratio_key].append(p_value > alpha)  # consistent if p-value > alpha

        # Summarize results for the split ratio
        results[ratio_key] = {
            "Consistent Trials (%)": np.mean(results[ratio_key]) * 100
        }

    return results

# Demo Usage
if __name__ == "__main__":
    # Generate synthetic dataset
    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=10, random_state=42)

    # Define split ratios to evaluate
    split_ratios = [(0.5, 0.5), (0.7, 0.3), (0.8, 0.2), (0.9, 0.1)]

    # Initialize model and metric
    model = LogisticRegression()
    metric = accuracy_score

    # Perform the split sensitivity analysis
    results = evaluate_split_sensitivity(X, y, split_ratios, model, metric, k=10, num_trials=30, alpha=0.05)

    # Print results
    for split, details in results.items():
        print(f"Split Ratio {split}:")
        for key, value in details.items():
            print(f"  - {key}: {value:.2f}%")
        print("-" * 40)

Example Output

Split Ratio Train:0.5, Test:0.5:
  - Consistent Trials (%): 100.00%
----------------------------------------
Split Ratio Train:0.7, Test:0.3:
  - Consistent Trials (%): 96.67%
----------------------------------------
Split Ratio Train:0.8, Test:0.2:
  - Consistent Trials (%): 100.00%
----------------------------------------
Split Ratio Train:0.9, Test:0.1:
  - Consistent Trials (%): 96.67%
----------------------------------------
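
As a follow-up to step 7, the sketch below shows one way to turn the dictionary returned by evaluate_split_sensitivity into a recommendation by selecting the split ratio(s) with the highest percentage of consistent trials. The results values here are placeholders copied from the example output above; when run after the demo script, the actual returned dictionary would be used instead.

# Placeholder results mirroring the example output; in practice, use the
# dictionary returned by evaluate_split_sensitivity
results = {
    "Train:0.5, Test:0.5": {"Consistent Trials (%)": 100.00},
    "Train:0.7, Test:0.3": {"Consistent Trials (%)": 96.67},
    "Train:0.8, Test:0.2": {"Consistent Trials (%)": 100.00},
    "Train:0.9, Test:0.1": {"Consistent Trials (%)": 96.67},
}

best_pct = max(details["Consistent Trials (%)"] for details in results.values())
recommended = [split for split, details in results.items()
               if details["Consistent Trials (%)"] == best_pct]

print(f"Recommended split ratio(s): {recommended} ({best_pct:.2f}% consistent trials)")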