Calculate Numerical Effect Size
Procedure
This procedure checks whether the train and test sets are practically similar in distribution, with a specific focus on determining whether each numerical variable shows only a small effect size (e.g., Cohen's d).
Test for Normality
- What to do: Determine whether the data in each variable of the train and test sets follows a normal distribution.
- Use the Shapiro-Wilk test or Anderson-Darling test for small datasets.
- For larger datasets, use the Kolmogorov-Smirnov test or visualize distributions with histograms and Q-Q plots to support interpretation.
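As a sketch of this step, the normality check can be automated with SciPy, using Shapiro-Wilk for smaller samples and a Kolmogorov-Smirnov test against a fitted normal for larger ones. The function name and the sample-size cutoff below are illustrative assumptions, not a fixed convention.

```python
import numpy as np
from scipy.stats import shapiro, kstest

def check_normality(values, alpha=0.05, small_n=5000):
    """Return True if `values` is consistent with a normal distribution.

    Uses Shapiro-Wilk for samples up to `small_n`, and a Kolmogorov-Smirnov
    test on standardized values for larger samples. Note that estimating the
    mean and std from the data makes the KS p-value approximate.
    """
    values = np.asarray(values, dtype=float)
    if len(values) <= small_n:
        _, p = shapiro(values)
    else:
        # Standardize, then compare against the standard normal CDF
        z = (values - values.mean()) / values.std(ddof=1)
        _, p = kstest(z, "norm")
    return p > alpha

rng = np.random.default_rng(0)
print(check_normality(rng.normal(size=500)))   # usually True for a normal sample
print(check_normality(rng.uniform(size=500)))  # False: uniform data is clearly non-normal
```

Histograms and Q-Q plots (e.g., `scipy.stats.probplot`) remain useful alongside this, since p-values alone can be misleading at large sample sizes.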
Compute Effect Size for Numerical Variables
- What to do: Choose and calculate the effect size for each variable in the train and test sets.
- If the data is normally distributed, calculate Cohen’s d to measure the standardized difference in means.
- For non-normally distributed data, consider using an equivalent non-parametric effect size measure, such as Cliff’s delta.
- Document the calculated effect size for each variable.
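For reference, both effect sizes named above can be computed in a few lines; this is a minimal sketch (function names are illustrative), with Cliff's delta vectorized via NumPy broadcasting.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def cliffs_delta(a, b):
    """P(a > b) - P(a < b), computed over all cross-sample pairs."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diff = a[:, None] - b[None, :]  # all pairwise differences
    return ((diff > 0).sum() - (diff < 0).sum()) / diff.size

# Identical samples give zero effect for both measures
x = np.array([1.0, 2.0, 3.0, 4.0])
print(cohens_d(x, x))      # 0.0
print(cliffs_delta(x, x))  # 0.0
```

Cliff's delta ranges from -1 to 1 and requires no distributional assumptions, which is why it serves as the non-parametric fallback here.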
Interpret Effect Size Results
- What to do: Assess the computed effect sizes to determine whether each variable exhibits a small effect size.
- Small effect sizes (e.g., Cohen’s d < 0.2) indicate minimal practical difference between train and test sets for that variable.
- Larger effect sizes suggest meaningful differences, warranting further investigation or data adjustments.
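The conventional cutoffs above can be encoded as a small helper; the thresholds follow the commonly cited conventions (0.2/0.5 for Cohen's d, 0.147/0.33/0.474 for Cliff's delta), and the function name is an assumption for illustration.

```python
def interpret_effect_size(value, measure="cohens_d"):
    """Map an effect size to a qualitative label using conventional cutoffs."""
    v = abs(value)
    if measure == "cohens_d":
        thresholds = [(0.2, "Small effect"), (0.5, "Medium effect")]
    elif measure == "cliffs_delta":
        thresholds = [
            (0.147, "Negligible effect"),
            (0.33, "Small effect"),
            (0.474, "Medium effect"),
        ]
    else:
        raise ValueError(f"Unknown measure: {measure}")
    for cutoff, label in thresholds:
        if v < cutoff:
            return label
    return "Large effect"

print(interpret_effect_size(0.1))                  # Small effect
print(interpret_effect_size(0.6, "cliffs_delta"))  # Large effect
```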
Report the Findings
- What to do: Summarize the results of the effect size analysis and statistical tests in a clear, concise format.
- Highlight variables with small versus large effect sizes and explain their potential impact on model performance.
- Provide actionable recommendations, such as rebalancing datasets or applying feature scaling, for variables with significant differences.
Code Example
This Python function evaluates whether the train and test sets show small effect sizes for each numerical variable, calculating Cohen's d when both samples pass a normality check and Cliff's delta otherwise.
```python
import numpy as np
from scipy.stats import shapiro


def effect_size_test(X_train, X_test, alpha=0.05):
    """
    Test whether the train and test sets have small effect sizes
    (Cohen's d or Cliff's delta) for numerical variables.

    Parameters:
        X_train (np.ndarray): Train set features.
        X_test (np.ndarray): Test set features.
        alpha (float): P-value threshold for the normality test (default=0.05).

    Returns:
        dict: Dictionary with effect size and interpretation for each feature.
    """
    results = {}
    for feature_idx in range(X_train.shape[1]):
        train_feature = X_train[:, feature_idx]
        test_feature = X_test[:, feature_idx]

        # Check normality with the Shapiro-Wilk test
        _, p_train_normal = shapiro(train_feature)
        _, p_test_normal = shapiro(test_feature)

        if p_train_normal > alpha and p_test_normal > alpha:
            # Both distributions look normal: use Cohen's d
            mean_diff = np.mean(train_feature) - np.mean(test_feature)
            pooled_std = np.sqrt((np.var(train_feature) + np.var(test_feature)) / 2)
            effect_size = mean_diff / pooled_std if pooled_std != 0 else np.nan
            effect_size_type = "Cohen's d"
        else:
            # At least one distribution is non-normal: use Cliff's delta
            n = len(train_feature)
            m = len(test_feature)
            concordant = sum(x > y for x in train_feature for y in test_feature)
            discordant = sum(x < y for x in train_feature for y in test_feature)
            effect_size = (concordant - discordant) / (n * m)
            effect_size_type = "Cliff's delta"

        # Interpret the effect size against conventional thresholds
        if effect_size_type == "Cohen's d":
            interpretation = (
                "Small effect" if abs(effect_size) < 0.2 else
                "Medium effect" if abs(effect_size) < 0.5 else
                "Large effect"
            )
        else:  # Cliff's delta
            interpretation = (
                "Negligible effect" if abs(effect_size) < 0.147 else
                "Small effect" if abs(effect_size) < 0.33 else
                "Medium effect" if abs(effect_size) < 0.474 else
                "Large effect"
            )

        # Store results
        results[f"Feature {feature_idx}"] = {
            "Effect Size Type": effect_size_type,
            "Effect Size": effect_size,
            "Interpretation": interpretation,
        }
    return results


# Demo Usage
if __name__ == "__main__":
    # Generate synthetic data
    np.random.seed(42)
    X_train = np.random.normal(loc=0, scale=1, size=(100, 5))
    X_test = np.random.normal(loc=0.2, scale=1.2, size=(100, 5))  # Slightly shifted test set

    # Perform the diagnostic test
    results = effect_size_test(X_train, X_test, alpha=0.05)

    # Print results
    print("Effect Size Test Results:")
    for feature, result in results.items():
        print(f"{feature}:")
        for key, value in result.items():
            print(f" - {key}: {value}")
```

Example Output
```
Effect Size Test Results:
Feature 0:
 - Effect Size Type: Cohen's d
 - Effect Size: -0.2546196652120215
 - Interpretation: Medium effect
Feature 1:
 - Effect Size Type: Cohen's d
 - Effect Size: 0.021417499810950664
 - Interpretation: Small effect
Feature 2:
 - Effect Size Type: Cohen's d
 - Effect Size: -0.37319109615675333
 - Interpretation: Medium effect
Feature 3:
 - Effect Size Type: Cohen's d
 - Effect Size: -0.08099639963739871
 - Interpretation: Small effect
Feature 4:
 - Effect Size Type: Cohen's d
 - Effect Size: -0.389333931668137
 - Interpretation: Medium effect
```