Are the Residual Error Distributions Correlated?
Procedure
This procedure evaluates whether the residual errors from model predictions on the train and test sets are highly correlated, providing insights into model consistency and generalization.
Gather Residual Error Samples
- What to do: Ensure you have residual error samples for the train set and test set for your chosen machine learning algorithms.
- Residual errors are calculated as the difference between the actual and predicted values for each data point (a minimal sketch follows this step).
- These should already be split into train and test samples.
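As a minimal sketch of this step, residuals for any scikit-learn regressor can be gathered by subtracting predictions from actual values. The helper name collect_residuals and the variable names are illustrative, not part of the procedure above (the full code example later in this section uses cross-validation predictions instead).

import numpy as np
from sklearn.linear_model import LinearRegression

def collect_residuals(model, X_train, y_train, X_test, y_test):
    """Fit on the train set, then return train and test residuals (actual - predicted)."""
    model.fit(X_train, y_train)
    train_residuals = y_train - model.predict(X_train)
    test_residuals = y_test - model.predict(X_test)
    return train_residuals, test_residuals

# Assumes X_train, y_train, X_test, y_test are already defined
# train_residuals, test_residuals = collect_residuals(LinearRegression(), X_train, y_train, X_test, y_test)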
Calculate Correlation
- What to do: Compute the correlation between residual errors on the train and test sets.
- Use Pearson’s correlation coefficient if you expect a linear relationship.
- Use Spearman’s rank correlation coefficient if the relationship might be non-linear but monotonic.
- Ensure that both residual error samples have the same length: the element-wise correlation functions require paired samples of equal length (see the sketch after this list).
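A minimal sketch of the correlation computation with SciPy, assuming train_residuals and test_residuals are equal-length NumPy arrays (for example, from the sketch in the previous step):

from scipy.stats import pearsonr, spearmanr

# Pearson assumes a roughly linear relationship between the paired residuals
r_p, p_p = pearsonr(train_residuals, test_residuals)
# Spearman only assumes a monotonic relationship (rank-based)
r_s, p_s = spearmanr(train_residuals, test_residuals)
print(f"Pearson r={r_p:.3f} (p={p_p:.3f}); Spearman rho={r_s:.3f} (p={p_s:.3f})")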
Interpret Correlation Coefficient
- What to do: Analyze the correlation value to assess the relationship between train and test residuals.
- A correlation coefficient close to +1 indicates a strong positive relationship, suggesting consistent residual patterns.
- A coefficient close to 0 suggests no relationship, indicating potential inconsistencies in model behavior.
- A negative coefficient implies an inverse relationship, which may be unexpected and requires further investigation.
Evaluate Statistical Significance
- What to do: Assess whether the correlation is statistically significant.
- Perform a hypothesis test for the correlation coefficient to check whether it differs significantly from 0 (a worked sketch follows this list).
- Record the p-value. A p-value less than the chosen significance level (e.g., 0.05) indicates a significant correlation.
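SciPy's pearsonr and spearmanr already return a two-sided p-value, but the test can also be computed by hand from the t-statistic for a correlation coefficient. A sketch, assuming r is the computed correlation and n the number of paired residuals:

import numpy as np
from scipy.stats import t as t_dist

def correlation_p_value(r, n):
    """Two-sided p-value for H0: correlation = 0, using t = r * sqrt(n - 2) / sqrt(1 - r**2)."""
    t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
    return 2 * t_dist.sf(abs(t_stat), df=n - 2)

# For example, correlation_p_value(0.3, 100) is roughly 0.002,
# significant at alpha = 0.05.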
Interpret the Results
- What to do: Derive conclusions based on the correlation and its significance.
- If the correlation is high and positive, it confirms consistency between train and test residuals, which is desirable.
- If the correlation is low or non-significant, it suggests inconsistencies, indicating that the model may generalize poorly.
- If the correlation is unexpectedly negative, it may suggest overfitting or issues in data partitioning or preprocessing.
Report the Findings
- What to do: Summarize your results and their implications.
- Include the calculated correlation coefficient, p-value, and interpretation of the results.
- Highlight any unexpected findings, such as low or negative correlations, and suggest potential next steps (e.g., model refinement or data investigation).
- Use visualizations (e.g., scatter plots of residuals) to support your analysis; a plotting sketch follows this list.
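As a supporting visual, a scatter plot of train residuals against test residuals makes the strength and direction of the relationship easy to see. This sketch assumes Matplotlib and the equal-length residual arrays from the earlier steps:

import matplotlib.pyplot as plt

plt.scatter(train_residuals, test_residuals, alpha=0.6)
plt.axhline(0, color="gray", linewidth=0.5)  # reference lines at zero error
plt.axvline(0, color="gray", linewidth=0.5)
plt.xlabel("Train residuals")
plt.ylabel("Test residuals")
plt.title("Train vs. test residual errors")
plt.show()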
Code Example
This Python function evaluates whether the residual errors from model predictions on the train and test sets are highly correlated, automatically determining whether the residuals follow a normal distribution and using the appropriate correlation method.
import numpy as np
from scipy.stats import pearsonr, spearmanr, shapiro
from sklearn.model_selection import cross_val_predict

def residual_correlation_test(
    X_train, y_train, X_test, y_test, algorithms, k=10, alpha=0.05
):
    """
    Test whether the residual errors on the train and test sets are highly correlated.

    Parameters:
        X_train (np.ndarray): Train set features.
        y_train (np.ndarray): Train set target.
        X_test (np.ndarray): Test set features.
        y_test (np.ndarray): Test set target.
        algorithms (list): List of machine learning models to evaluate.
        k (int): Number of folds for cross-validation (default=10).
        alpha (float): P-value threshold for statistical significance (default=0.05).

    Returns:
        dict: Dictionary with correlation results and interpretation for each algorithm.
    """
    results = {}
    for alg in algorithms:
        # Cross-validation predictions and residuals on the train set
        train_preds = cross_val_predict(alg, X_train, y_train, cv=k)
        train_residuals = y_train - train_preds

        # Cross-validation predictions and residuals on the test set
        test_preds = cross_val_predict(alg, X_test, y_test, cv=k)
        test_residuals = y_test - test_preds

        # Element-wise correlation requires paired samples of equal length
        if len(train_residuals) != len(test_residuals):
            raise ValueError("Train and test residual samples must have the same length.")

        # Check normality of the residuals with the Shapiro-Wilk test
        _, p_train_normal = shapiro(train_residuals)
        _, p_test_normal = shapiro(test_residuals)

        # Choose the appropriate correlation method: Pearson if both residual
        # samples look normal, otherwise the rank-based Spearman coefficient
        if p_train_normal > alpha and p_test_normal > alpha:
            method = "Pearson"
            correlation, p_value = pearsonr(train_residuals, test_residuals)
        else:
            method = "Spearman"
            correlation, p_value = spearmanr(train_residuals, test_residuals)

        # Assess statistical significance of the correlation
        significance = "significant" if p_value <= alpha else "not significant"

        # Interpret the correlation coefficient
        if correlation > 0.8:
            interpretation = "The residuals have a high positive correlation, indicating strong consistency and good generalization."
        elif 0.5 < correlation <= 0.8:
            interpretation = "The residuals have a moderate positive correlation, indicating reasonable consistency but potential minor generalization issues."
        elif 0.2 < correlation <= 0.5:
            interpretation = "The residuals have a low positive correlation, suggesting weak consistency and potential generalization concerns."
        elif -0.2 <= correlation <= 0.2:
            interpretation = "The residuals have little to no correlation, indicating poor consistency and likely generalization problems."
        else:
            interpretation = "The residuals have a negative correlation, suggesting inverse relationships and severe generalization issues."

        # Store results
        results[alg.__class__.__name__] = {
            "Correlation Method": method,
            "Correlation": correlation,
            "P-Value": p_value,
            "Significance": significance,
            "Interpretation": interpretation,
        }
    return results

# Demo Usage
if __name__ == "__main__":
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.ensemble import RandomForestRegressor

    # Generate synthetic data
    np.random.seed(42)
    X_train = np.random.normal(loc=0, scale=1, size=(100, 5))
    y_train = X_train[:, 0] * 2 + np.random.normal(loc=0, scale=0.5, size=100)
    X_test = np.random.normal(loc=0.2, scale=1.2, size=(100, 5))
    y_test = X_test[:, 0] * 2 + np.random.normal(loc=0, scale=0.5, size=100)

    # Define models
    algorithms = [LinearRegression(), DecisionTreeRegressor(), RandomForestRegressor(n_estimators=10)]

    # Perform the diagnostic test
    results = residual_correlation_test(X_train, y_train, X_test, y_test, algorithms, alpha=0.05)

    # Print results
    print("Residual Correlation Test Results:")
    for model, result in results.items():
        print(f"{model}:")
        for key, value in result.items():
            print(f"  - {key}: {value}")
Example Output
Residual Correlation Test Results:
LinearRegression:
- Correlation Method: Pearson
- Correlation: -0.13994348478847515
- P-Value: 0.1649287316055379
- Significance: not significant
- Interpretation: The residuals have little to no correlation, indicating poor consistency and likely generalization problems.
DecisionTreeRegressor:
- Correlation Method: Pearson
- Correlation: -0.09194504725426
- P-Value: 0.3629175215329332
- Significance: not significant
- Interpretation: The residuals have little to no correlation, indicating poor consistency and likely generalization problems.
RandomForestRegressor:
- Correlation Method: Pearson
- Correlation: -0.22060791982493722
- P-Value: 0.02741252648616371
- Significance: significant
- Interpretation: The residuals have a negative correlation, suggesting inverse relationships and severe generalization issues.