Residual Error Effect Size
Procedure
This procedure evaluates whether the residual errors (model prediction errors) on the train set and test set have the same distribution, which helps assess how consistently the model generalizes.
Gather Residual Error Samples
- What to do: Ensure you have prediction error samples for the train set and test set for each machine learning algorithm you are evaluating; a minimal sketch follows this list.
- Calculate errors as the difference between actual and predicted values for each data point.
- Confirm the data has been split into train and test sets and that errors are collected separately for each set.
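To make the inputs concrete, here is a minimal sketch of gathering residual samples; the synthetic data and LinearRegression model are placeholders for your own split and algorithms, and for simplicity it uses in-sample predictions on the train set (the full example below uses out-of-fold cross-validation predictions instead):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Placeholder data and model; substitute your own split and algorithm
X = np.random.normal(size=(200, 5))
y = X[:, 0] * 2 + np.random.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

model = LinearRegression().fit(X_train, y_train)

# Residuals: actual minus predicted, one value per data point in each set
train_residuals = y_train - model.predict(X_train)
test_residuals = y_test - model.predict(X_test)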
Calculate Effect Size
- What to do: Quantify the magnitude of the difference between the residual distributions using an effect size measure; a standalone helper is sketched after this list.
- Compute Cohen’s d to measure the standardized mean difference between train and test residuals:
- $d = \frac{M_1 - M_2}{SD_{\text{pooled}}}$, where $M_1$ and $M_2$ are the means of the train and test residuals, and $SD_{\text{pooled}}$ is the pooled standard deviation.
- Interpret the value against Cohen's conventional benchmarks (d = 0.2 small, d = 0.5 medium, d = 0.8 large):
  - Small effect size: |d| < 0.2
  - Medium effect size: 0.2 ≤ |d| < 0.8
  - Large effect size: |d| ≥ 0.8
- Ensure both train and test samples are large enough for a reliable estimate of Cohen’s d.
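As a standalone illustration of the formula above, here is a minimal sketch; the helper name cohens_d is chosen for illustration and is not part of any library:

import numpy as np

def cohens_d(sample1, sample2):
    # Standardized mean difference between two samples
    m1, m2 = np.mean(sample1), np.mean(sample2)
    # Pooled standard deviation; this equal-weight form assumes similarly sized samples
    pooled_std = np.sqrt((np.var(sample1, ddof=1) + np.var(sample2, ddof=1)) / 2)
    return (m1 - m2) / pooled_std

# Example: two residual samples whose means differ by about 0.1 standard deviations
rng = np.random.default_rng(0)
print(cohens_d(rng.normal(0.0, 1.0, 100), rng.normal(0.1, 1.0, 100)))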
Interpret the Effect Size
- What to do: Use the effect size to determine whether the train and test residuals are meaningfully different.
- A small effect size indicates negligible differences, suggesting consistent model behavior across train and test sets.
- A large effect size suggests substantial differences, indicating potential issues such as overfitting, underfitting, or data leakage.
Report the Findings
- What to do: Summarize your findings based on the calculated effect size.
- Clearly state the calculated Cohen’s d value and its interpretation (small, medium, or large).
- Discuss any implications for the model’s generalization ability.
- Provide actionable recommendations, such as revisiting feature engineering, regularization techniques, or data preprocessing if large effect sizes are observed.
- Include visualizations, such as histograms or box plots comparing the residual distributions, to support the conclusions; a plotting sketch follows this list.
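As a sketch of such a visualization, assuming matplotlib is available (the residual arrays here are synthetic placeholders for your own samples):

import numpy as np
import matplotlib.pyplot as plt

# Placeholder residual samples for illustration
rng = np.random.default_rng(42)
train_residuals = rng.normal(0.0, 0.5, 100)
test_residuals = rng.normal(0.1, 0.6, 100)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# Overlaid histograms of the two residual distributions
ax1.hist(train_residuals, bins=20, alpha=0.6, label="Train")
ax1.hist(test_residuals, bins=20, alpha=0.6, label="Test")
ax1.set_title("Residual Histograms")
ax1.legend()
# Side-by-side box plots of the same residuals
ax2.boxplot([train_residuals, test_residuals], labels=["Train", "Test"])
ax2.set_title("Residual Box Plots")
plt.tight_layout()
plt.show()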
Code Example
This Python function evaluates the effect size of the difference in residual errors between the train and test sets for a suite of machine learning algorithms, helping assess generalization consistency.
import numpy as np
from sklearn.model_selection import cross_val_predict

def residual_effect_size_test(X_train, y_train, X_test, y_test, algorithms, k=10):
"""
Evaluate the effect size of the difference in residual errors between train and test sets.
Parameters:
X_train (np.ndarray): Train set features.
y_train (np.ndarray): Train set target.
X_test (np.ndarray): Test set features.
y_test (np.ndarray): Test set target.
algorithms (list): List of machine learning models to evaluate.
k (int): Number of folds for cross-validation (default=10).
Returns:
dict: Dictionary with effect size and interpretation for each algorithm.
"""
results = {}
for alg in algorithms:
# Cross-validation predictions on the train set
train_preds = cross_val_predict(alg, X_train, y_train, cv=k)
train_residuals = y_train - train_preds
        # Cross-validation predictions on the test set (out-of-fold, so the
        # test residuals are estimated the same way as the train residuals)
test_preds = cross_val_predict(alg, X_test, y_test, cv=k)
test_residuals = y_test - test_preds
# Calculate means and pooled standard deviation
train_mean = np.mean(train_residuals)
test_mean = np.mean(test_residuals)
pooled_std = np.sqrt((np.std(train_residuals, ddof=1)**2 + np.std(test_residuals, ddof=1)**2) / 2)
# Calculate Cohen's d effect size
cohen_d = (train_mean - test_mean) / pooled_std
# Interpret Cohen's d
        if abs(cohen_d) < 0.2:
            interpretation = "Small effect size, negligible difference in residual errors."
        elif abs(cohen_d) < 0.8:
            interpretation = "Medium effect size, moderate difference in residual errors."
        else:
            interpretation = "Large effect size, substantial difference in residual errors."
# Store results
results[alg.__class__.__name__] = {
"Cohen's d": cohen_d,
"Interpretation": interpretation,
}
return results
# Demo Usage
if __name__ == "__main__":
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
# Generate synthetic data
np.random.seed(42)
X_train = np.random.normal(loc=0, scale=1, size=(100, 5))
y_train = X_train[:, 0] * 2 + np.random.normal(loc=0, scale=0.5, size=100)
X_test = np.random.normal(loc=0.2, scale=1.2, size=(100, 5))
y_test = X_test[:, 0] * 2 + np.random.normal(loc=0, scale=0.5, size=100)
# Define models
algorithms = [LinearRegression(), DecisionTreeRegressor(), RandomForestRegressor(n_estimators=10)]
# Perform the diagnostic test
results = residual_effect_size_test(X_train, y_train, X_test, y_test, algorithms, k=5)
# Print results
print("Residual Effect Size Test Results:")
for model, result in results.items():
print(f"{model}:")
for key, value in result.items():
print(f" - {key}: {value}")
Example Output
Residual Effect Size Test Results:
LinearRegression:
- Cohen's d: -0.00640635333539964
- Interpretation: Small effect size, negligible difference in residual errors.
DecisionTreeRegressor:
- Cohen's d: 0.08863288026208745
- Interpretation: Small effect size, negligible difference in residual errors.
RandomForestRegressor:
- Cohen's d: -0.0891434923059157
- Interpretation: Small effect size, negligible difference in residual errors.