Split Sensitivity Analysis By Performance Variance
Procedure
1. Define Split Ratios and Metric
- Identify train/test split ratios to evaluate, such as 50/50, 70/30, 80/20, and 90/10.
- Select a performance metric (e.g., accuracy, precision, recall, F1 score, or RMSE) for evaluation.
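A minimal sketch of this step, assuming scikit-learn is available; the ratio pairs and the choice of accuracy are illustrative placeholders.
from sklearn.metrics import accuracy_score

# Train/test split ratios to evaluate, expressed as (train fraction, test fraction) pairs.
split_ratios = [(0.5, 0.5), (0.7, 0.3), (0.8, 0.2), (0.9, 0.1)]

# A single performance metric used throughout the analysis (accuracy here as an example).
metric = accuracy_score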
2. Set Up Model Evaluation Framework
- Choose a simple baseline model, such as logistic regression or a decision tree.
- Implement k-fold cross-validation (e.g., k=5) for each split ratio on the training set.
- Record the performance score (using the chosen metric) for each fold.
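A minimal sketch of this step, using synthetic data and logistic regression as stand-ins for the real dataset and chosen baseline model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data; in practice X and y come from the dataset under study.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)

model = LogisticRegression(max_iter=1000)  # a standard baseline model

# k-fold cross-validation (k=5) on the training set only; one score per fold.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print(cv_scores)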
3. Test Set Evaluation
- Train the model on the complete training set for each split ratio.
- Evaluate the model on the corresponding test set and record the performance score.
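Continuing the sketch above (reusing its model, training split, and held-out data), the model is then fit on the full training split and scored once on the test set.
from sklearn.metrics import accuracy_score

# Fit on the entire training split for this ratio and record a single test-set score.
model.fit(X_train, y_train)
test_score = accuracy_score(y_test, model.predict(X_test))
print(f"Test-set accuracy: {test_score:.3f}")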
4. Compare Variances of Performance Scores
- Use statistical tests like Levene’s test or an F-test to compare the variance of performance scores obtained from k-fold cross-validation and test set evaluations for each split ratio.
- Count the number of cases where the variances are statistically equivalent (e.g., p-value > 0.05).
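A sketch of the variance comparison, assuming arrays of cross-validation and test-set scores have been collected for one split ratio; Levene's test comes from scipy.stats, and the two-sided F-test is computed directly from the sample variances. The score values below are placeholders.
import numpy as np
from scipy.stats import levene, f

# Placeholder score arrays; in practice these come from the evaluations above.
cv_scores = np.array([0.84, 0.86, 0.83, 0.85, 0.87])
test_scores = np.array([0.85, 0.82, 0.88, 0.84, 0.86])

# Levene's test: null hypothesis of equal variances.
levene_stat, p_levene = levene(cv_scores, test_scores)

# Two-sided F-test on the ratio of sample variances.
f_stat = np.var(cv_scores, ddof=1) / np.var(test_scores, ddof=1)
dfn, dfd = len(cv_scores) - 1, len(test_scores) - 1
p_f = 2 * min(f.cdf(f_stat, dfn, dfd), f.sf(f_stat, dfn, dfd))

# Variances are treated as statistically equivalent when the p-value exceeds 0.05.
print(f"Levene p={p_levene:.3f}, F-test p={p_f:.3f}, equivalent={p_levene > 0.05}")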
5. Perform Multiple Trials
- Repeat the entire process (steps 2–4) multiple times (e.g., 10–20 trials per split ratio) to account for variability and ensure robust results.
6. Aggregate and Analyze Results
- Calculate the percentage of trials where the variance of k-fold and test set performance scores is statistically equivalent for each split ratio.
- Identify the split ratio(s) with the highest percentage of trials showing statistically equivalent variances between k-fold and test-set evaluations.
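A sketch of the aggregation, assuming each split ratio has a list of per-trial booleans recording whether the variances were judged statistically equivalent; the flag values are placeholders.
import numpy as np

# Placeholder per-trial flags for each split ratio (True = variances equivalent).
trial_flags = {
    "Train:0.7, Test:0.3": [True, True, False, True, True],
    "Train:0.8, Test:0.2": [True, True, True, True, True],
}

# Percentage of consistent trials per ratio, and the most consistent ratio overall.
consistency = {ratio: 100 * np.mean(flags) for ratio, flags in trial_flags.items()}
best_ratio = max(consistency, key=consistency.get)
print(consistency, best_ratio)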
7. Report Findings
- Summarize the results, including:
  - Average percentage of trials with equivalent variances for each split ratio.
  - Number of trials in which the k-fold and test-set performance scores closely agree.
- Recommend split ratio(s) that show stable and reliable performance evaluations across trials.
Code Example
Below is a Python function that performs this diagnostic: for each train/test split ratio it compares, over multiple trials, the variance of the k-fold cross-validation scores against the corresponding test-set score using Levene's test and a chosen evaluation metric.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import make_scorer
from scipy.stats import levene
def evaluate_split_sensitivity(X, y, split_ratios, model, metric, k=5, num_trials=20, alpha=0.05):
    """
    Perform sensitivity analysis of train/test split ratios by comparing model performance variances.

    Parameters:
        X (np.ndarray): Feature dataset (numerical variables only).
        y (np.ndarray): Target array.
        split_ratios (list of tuples): List of train/test split ratios (e.g., [(0.5, 0.5), (0.7, 0.3)]).
        model: A scikit-learn compatible model instance.
        metric: A scikit-learn compatible scoring function (e.g., accuracy_score).
        k (int): Number of folds for cross-validation.
        num_trials (int): Number of trials to repeat for statistical consistency testing.
        alpha (float): Significance level for Levene's test.

    Returns:
        dict: A dictionary containing variance consistency results for each split ratio.
    """
    scorer = make_scorer(metric)
    results = {}
    for train_ratio, test_ratio in split_ratios:
        ratio_key = f"Train:{train_ratio}, Test:{test_ratio}"
        results[ratio_key] = []
        for _ in range(num_trials):
            # Split the dataset
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, train_size=train_ratio, test_size=test_ratio
            )
            # Cross-validation performance scores on the training set
            cv_scores = cross_val_score(model, X_train, y_train, cv=k, scoring=scorer)
            # Train on the full training set and evaluate on the test set
            model.fit(X_train, y_train)
            test_score = metric(y_test, model.predict(X_test))
            # Compare variances using Levene's test
            stat, p_value = levene(cv_scores, [test_score])
            results[ratio_key].append(p_value > alpha)  # Consistent if p-value > alpha
        # Summarize results for the split ratio
        results[ratio_key] = {
            "Consistent Trials (%)": np.mean(results[ratio_key]) * 100
        }
    return results


# Demo Usage
if __name__ == "__main__":
    # Generate synthetic dataset
    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=10, random_state=42)

    # Define split ratios to evaluate
    split_ratios = [(0.5, 0.5), (0.7, 0.3), (0.8, 0.2), (0.9, 0.1)]

    # Initialize model and metric
    model = LogisticRegression()
    metric = accuracy_score

    # Perform the split sensitivity analysis
    results = evaluate_split_sensitivity(X, y, split_ratios, model, metric, k=10, num_trials=30, alpha=0.05)

    # Print results
    for split, details in results.items():
        print(f"Split Ratio {split}:")
        for key, value in details.items():
            print(f" - {key}: {value:.2f}%")
        print("-" * 40)
Example Output
Split Ratio Train:0.5, Test:0.5:
- Consistent Trials (%): 100.00%
----------------------------------------
Split Ratio Train:0.7, Test:0.3:
- Consistent Trials (%): 96.67%
----------------------------------------
Split Ratio Train:0.8, Test:0.2:
- Consistent Trials (%): 100.00%
----------------------------------------
Split Ratio Train:0.9, Test:0.1:
- Consistent Trials (%): 96.67%
----------------------------------------