Residual Error Variance
Procedure
This procedure checks whether the residual errors (prediction errors) a model produces on the train set and on the test set have the same variance, giving insight into model consistency and generalization.
Gather Residual Error Samples
- What to do: Collect residual errors from both the train set and test set for each machine learning algorithm in your suite.
- Residual errors are calculated as the difference between actual values and predicted values for each data point.
- Ensure these samples are derived using the same test harness (e.g., 10-fold cross-validation), as shown in the sketch below.
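A minimal sketch of this step, assuming synthetic placeholder data and a LinearRegression model chosen purely for illustration (none of these names come from the procedure itself):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

# Synthetic data used only to make the sketch runnable
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
y_train = X_train[:, 0] * 2 + rng.normal(scale=0.5, size=100)
X_test = rng.normal(size=(100, 5))
y_test = X_test[:, 0] * 2 + rng.normal(scale=0.5, size=100)

model = LinearRegression()

# Out-of-fold predictions from the same 10-fold harness on each split
train_preds = cross_val_predict(model, X_train, y_train, cv=10)
test_preds = cross_val_predict(model, X_test, y_test, cv=10)

# Residual error = actual value minus predicted value for each data point
train_residuals = y_train - train_preds
test_residuals = y_test - test_preds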
Choose a Variance Test
- What to do: Select an appropriate statistical test to compare the variances of the residual errors.
- Use Levene’s Test if the data may not follow a normal distribution.
- Use an F-Test if the residuals are approximately normally distributed (a simple normality check is sketched below).
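One way to automate this choice is a quick normality check on each residual sample. The sketch below uses SciPy's Shapiro-Wilk test; both the helper name choose_variance_test and the use of Shapiro-Wilk are illustrative assumptions, not part of the procedure:

from scipy.stats import shapiro

def choose_variance_test(train_residuals, test_residuals, alpha=0.05):
    # Shapiro-Wilk: a small p-value is evidence against normality
    _, p_train = shapiro(train_residuals)
    _, p_test = shapiro(test_residuals)
    if p_train > alpha and p_test > alpha:
        return "F-Test"  # both samples look approximately normal
    return "Levene's Test"  # robust default when normality is doubtful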
Perform the Variance Test
- What to do: Conduct the chosen test to compare the variances of the residual errors.
- For Levene’s Test, compute the test statistic and p-value using the train and test residual samples.
- For the F-Test, calculate the ratio of variances (train variance over test variance) and derive the test statistic and p-value.
- Ensure the samples meet the assumptions of the selected test (e.g., independence of observations); both test options are sketched below.
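A sketch covering both options, assuming train_residuals and test_residuals are 1-D NumPy arrays of residual errors. Levene's Test comes directly from SciPy; because SciPy has no built-in two-sample variance F-test, the two-sided p-value is derived manually from the F distribution here, and the helper name compare_residual_variances is hypothetical:

import numpy as np
from scipy.stats import levene, f

def compare_residual_variances(train_residuals, test_residuals, test="levene"):
    if test == "levene":
        # Levene's Test: robust to departures from normality
        stat, p_value = levene(train_residuals, test_residuals)
    else:
        # F-Test: ratio of sample variances (train over test), assumes normal residuals
        var_train = np.var(train_residuals, ddof=1)
        var_test = np.var(test_residuals, ddof=1)
        stat = var_train / var_test
        dfn, dfd = len(train_residuals) - 1, len(test_residuals) - 1
        # Two-sided p-value from the F distribution
        p_value = 2 * min(f.cdf(stat, dfn, dfd), f.sf(stat, dfn, dfd))
    return stat, p_value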
Interpret the Test Results
- What to do: Analyze the output of the variance test to determine if there is evidence of variance differences.
- A p-value less than the chosen significance level (e.g., 0.05) indicates significant variance differences between train and test residuals.
- A non-significant p-value suggests similar variances, supporting consistency between train and test sets (the decision rule is sketched below).
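The decision rule itself is a single comparison against the chosen significance level; a minimal sketch, using illustrative numbers in the spirit of the example output further below:

alpha = 0.05
stat, p_value = 5.16, 0.024  # illustrative values only

if p_value <= alpha:
    print(f"statistic={stat:.3f}, p={p_value:.3f}: variances differ significantly")
else:
    print(f"statistic={stat:.3f}, p={p_value:.3f}: no significant difference in variances")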
Evaluate Practical Implications
- What to do: Assess the impact of the findings on model performance and generalization.
- Significant variance differences might suggest overfitting or issues with data splitting.
- Similar variances support the assumption that the model generalizes well to unseen data.
Report the Findings
- What to do: Summarize your results and their implications.
- Include the test statistic, p-value, and interpretation of the variance comparison.
- Highlight any concerns, such as significant variance differences, and suggest potential next steps (e.g., refining the model or reconsidering the train-test split).
- Use visualizations (e.g., box plots or density plots of residuals) to support your analysis, as in the plotting sketch below.
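A plotting sketch for the visual comparison mentioned above, assuming matplotlib is available and that train_residuals and test_residuals hold the two residual samples (the helper name plot_residual_spread is hypothetical):

import matplotlib.pyplot as plt

def plot_residual_spread(train_residuals, test_residuals):
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    # Box plots: compare the spread of the two residual samples side by side
    axes[0].boxplot([train_residuals, test_residuals], labels=["Train", "Test"])
    axes[0].set_title("Residual spread by split")
    axes[0].set_ylabel("Residual error")
    # Overlaid histograms approximate the density plots mentioned above
    axes[1].hist(train_residuals, bins=30, density=True, alpha=0.5, label="Train")
    axes[1].hist(test_residuals, bins=30, density=True, alpha=0.5, label="Test")
    axes[1].set_title("Residual distributions")
    axes[1].legend()
    plt.tight_layout()
    plt.show()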
Code Example
This Python function uses Levene's Test to evaluate whether the residual errors on the train and test sets have the same variance, interprets the result for each model, and summarizes the implications for model consistency.
import numpy as np
from scipy.stats import levene
from sklearn.model_selection import cross_val_predict

def residual_variance_test(
    X_train, y_train, X_test, y_test, algorithms, k=10, alpha=0.05
):
    """
    Test whether the residual errors on the train and test sets have the same variance.

    Parameters:
        X_train (np.ndarray): Train set features.
        y_train (np.ndarray): Train set target.
        X_test (np.ndarray): Test set features.
        y_test (np.ndarray): Test set target.
        algorithms (list): List of machine learning models to evaluate.
        k (int): Number of folds for cross-validation (default=10).
        alpha (float): P-value threshold for statistical significance (default=0.05).

    Returns:
        dict: Dictionary with variance test results and interpretation for each algorithm.
    """
    results = {}
    for alg in algorithms:
        # Cross-validation predictions and residuals on the train set
        train_preds = cross_val_predict(alg, X_train, y_train, cv=k)
        train_residuals = y_train - train_preds

        # Cross-validation predictions and residuals on the test set
        test_preds = cross_val_predict(alg, X_test, y_test, cv=k)
        test_residuals = y_test - test_preds

        # Perform Levene's Test (robust to non-normality)
        stat, p_value = levene(train_residuals, test_residuals)

        # Interpret the results
        significance = "significant" if p_value <= alpha else "not significant"
        if significance == "significant":
            interpretation = "The variances of train and test residuals are significantly different, indicating potential generalization issues."
        else:
            interpretation = "The variances of train and test residuals are not significantly different, suggesting consistent generalization."

        # Store results keyed by the model's class name
        results[alg.__class__.__name__] = {
            "Test Statistic": stat,
            "P-Value": p_value,
            "Significance": significance,
            "Interpretation": interpretation,
        }
    return results
# Demo Usage
if __name__ == "__main__":
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor

    # Generate synthetic data
    np.random.seed(42)
    X_train = np.random.normal(loc=0, scale=1, size=(100, 5))
    y_train = X_train[:, 0] * 2 + np.random.normal(loc=0, scale=0.5, size=100)
    X_test = np.random.normal(loc=0.2, scale=1.2, size=(100, 5))
    y_test = X_test[:, 0] * 2 + np.random.normal(loc=0, scale=0.5, size=100)

    # Define models
    algorithms = [LinearRegression(), DecisionTreeRegressor(), RandomForestRegressor(n_estimators=10)]

    # Perform the diagnostic test
    results = residual_variance_test(X_train, y_train, X_test, y_test, algorithms, alpha=0.05)

    # Print results
    print("Residual Variance Test Results:")
    for model, result in results.items():
        print(f"{model}:")
        for key, value in result.items():
            print(f" - {key}: {value}")

Example Output
Residual Variance Test Results:
LinearRegression:
- Test Statistic: 5.156627785560461
- P-Value: 0.024233939944840216
- Significance: significant
- Interpretation: The variances of train and test residuals are significantly different, indicating potential generalization issues.
DecisionTreeRegressor:
- Test Statistic: 5.858548835856423
- P-Value: 0.016404195921629616
- Significance: significant
- Interpretation: The variances of train and test residuals are significantly different, indicating potential generalization issues.
RandomForestRegressor:
- Test Statistic: 5.592681930473287
- P-Value: 0.01900260083551664
- Significance: significant
- Interpretation: The variances of train and test residuals are significantly different, indicating potential generalization issues.