Calculate Numerical Effect Size
Procedure
This procedure checks whether the train and test sets are practically similar in distribution, with a specific focus on determining whether each numerical variable shows only a small effect size (e.g., Cohen's d).
Test for Normality
- What to do: Determine whether the data in each variable of the train and test sets follows a normal distribution.
- Use the Shapiro-Wilk test or Anderson-Darling test for small datasets.
- For larger datasets, use the Kolmogorov-Smirnov test or visualize distributions with histograms and Q-Q plots to support interpretation.
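As a sketch of this step, the normality check can be automated with SciPy, using Shapiro-Wilk for smaller samples and a Kolmogorov-Smirnov test against a fitted normal for larger ones. The function name and the sample-size cutoff below are illustrative assumptions, not a fixed convention.

```python
import numpy as np
from scipy.stats import shapiro, kstest

def check_normality(values, alpha=0.05, small_n=5000):
    """Return True if `values` is consistent with a normal distribution.

    Uses Shapiro-Wilk for samples up to `small_n`, and a Kolmogorov-Smirnov
    test on standardized values for larger samples. Note that estimating the
    mean and std from the data makes the KS p-value approximate.
    """
    values = np.asarray(values, dtype=float)
    if len(values) <= small_n:
        _, p = shapiro(values)
    else:
        # Standardize, then compare against the standard normal CDF
        z = (values - values.mean()) / values.std(ddof=1)
        _, p = kstest(z, "norm")
    return p > alpha

rng = np.random.default_rng(0)
print(check_normality(rng.normal(size=500)))   # usually True for a normal sample
print(check_normality(rng.uniform(size=500)))  # False: uniform data is clearly non-normal
```

Histograms and Q-Q plots (e.g., `scipy.stats.probplot`) remain useful alongside this, since p-values alone can be misleading at large sample sizes.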
Compute Effect Size for Numerical Variables
- What to do: Choose and calculate the effect size for each variable in the train and test sets.
- If the data is normally distributed, calculate Cohen’s d to measure the standardized difference in means.
- For non-normally distributed data, consider using an equivalent non-parametric effect size measure, such as Cliff’s delta.
- Document the calculated effect size for each variable.
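For reference, both effect sizes named above can be computed in a few lines; this is a minimal sketch (function names are illustrative), with Cliff's delta vectorized via NumPy broadcasting.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def cliffs_delta(a, b):
    """P(a > b) - P(a < b), computed over all cross-sample pairs."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    diff = a[:, None] - b[None, :]  # all pairwise differences
    return ((diff > 0).sum() - (diff < 0).sum()) / diff.size

# Identical samples give zero effect for both measures
x = np.array([1.0, 2.0, 3.0, 4.0])
print(cohens_d(x, x))      # 0.0
print(cliffs_delta(x, x))  # 0.0
```

Cliff's delta ranges from -1 to 1 and requires no distributional assumptions, which is why it serves as the non-parametric fallback here.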
Interpret Effect Size Results
- What to do: Assess the computed effect sizes to determine whether each variable exhibits a small effect size.
- Small effect sizes (e.g., Cohen’s d < 0.2) indicate minimal practical difference between train and test sets for that variable.
- Larger effect sizes suggest meaningful differences, warranting further investigation or data adjustments.
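The conventional cutoffs above can be encoded as a small helper; the thresholds follow the commonly cited conventions (0.2/0.5 for Cohen's d, 0.147/0.33/0.474 for Cliff's delta), and the function name is an assumption for illustration.

```python
def interpret_effect_size(value, measure="cohens_d"):
    """Map an effect size to a qualitative label using conventional cutoffs."""
    v = abs(value)
    if measure == "cohens_d":
        thresholds = [(0.2, "Small effect"), (0.5, "Medium effect")]
    elif measure == "cliffs_delta":
        thresholds = [
            (0.147, "Negligible effect"),
            (0.33, "Small effect"),
            (0.474, "Medium effect"),
        ]
    else:
        raise ValueError(f"Unknown measure: {measure}")
    for cutoff, label in thresholds:
        if v < cutoff:
            return label
    return "Large effect"

print(interpret_effect_size(0.1))                  # Small effect
print(interpret_effect_size(0.6, "cliffs_delta"))  # Large effect
```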
Report the Findings
- What to do: Summarize the results of the effect size analysis and statistical tests in a clear, concise format.
- Highlight variables with small versus large effect sizes and explain their potential impact on model performance.
- Provide actionable recommendations, such as rebalancing datasets or applying feature scaling, for variables with significant differences.
Code Example
This Python function evaluates whether the train and test sets show small effect sizes for each numerical variable, calculating Cohen's d when both samples pass a normality check and Cliff's delta otherwise.
```python
import numpy as np
from scipy.stats import shapiro


def effect_size_test(X_train, X_test, alpha=0.05):
    """
    Test whether the train and test sets have small effect sizes
    (Cohen's d or Cliff's delta) for numerical variables.

    Parameters:
        X_train (np.ndarray): Train set features.
        X_test (np.ndarray): Test set features.
        alpha (float): P-value threshold for the normality test (default=0.05).

    Returns:
        dict: Dictionary with effect size and interpretation for each feature.
    """
    results = {}
    for feature_idx in range(X_train.shape[1]):
        train_feature = X_train[:, feature_idx]
        test_feature = X_test[:, feature_idx]

        # Check normality with the Shapiro-Wilk test
        _, p_train_normal = shapiro(train_feature)
        _, p_test_normal = shapiro(test_feature)

        if p_train_normal > alpha and p_test_normal > alpha:
            # Both distributions look normal: use Cohen's d
            mean_diff = np.mean(train_feature) - np.mean(test_feature)
            pooled_std = np.sqrt((np.var(train_feature) + np.var(test_feature)) / 2)
            effect_size = mean_diff / pooled_std if pooled_std != 0 else np.nan
            effect_size_type = "Cohen's d"
        else:
            # At least one distribution is non-normal: use Cliff's delta
            n = len(train_feature)
            m = len(test_feature)
            concordant = sum(x > y for x in train_feature for y in test_feature)
            discordant = sum(x < y for x in train_feature for y in test_feature)
            effect_size = (concordant - discordant) / (n * m)
            effect_size_type = "Cliff's delta"

        # Interpret the effect size against conventional thresholds
        if effect_size_type == "Cohen's d":
            interpretation = (
                "Small effect" if abs(effect_size) < 0.2 else
                "Medium effect" if abs(effect_size) < 0.5 else
                "Large effect"
            )
        else:  # Cliff's delta
            interpretation = (
                "Negligible effect" if abs(effect_size) < 0.147 else
                "Small effect" if abs(effect_size) < 0.33 else
                "Medium effect" if abs(effect_size) < 0.474 else
                "Large effect"
            )

        # Store results
        results[f"Feature {feature_idx}"] = {
            "Effect Size Type": effect_size_type,
            "Effect Size": effect_size,
            "Interpretation": interpretation,
        }
    return results


# Demo Usage
if __name__ == "__main__":
    # Generate synthetic data
    np.random.seed(42)
    X_train = np.random.normal(loc=0, scale=1, size=(100, 5))
    X_test = np.random.normal(loc=0.2, scale=1.2, size=(100, 5))  # Slightly shifted test set

    # Perform the diagnostic test
    results = effect_size_test(X_train, X_test, alpha=0.05)

    # Print results
    print("Effect Size Test Results:")
    for feature, result in results.items():
        print(f"{feature}:")
        for key, value in result.items():
            print(f" - {key}: {value}")
```

Example Output
```
Effect Size Test Results:
Feature 0:
 - Effect Size Type: Cohen's d
 - Effect Size: -0.2546196652120215
 - Interpretation: Medium effect
Feature 1:
 - Effect Size Type: Cohen's d
 - Effect Size: 0.021417499810950664
 - Interpretation: Small effect
Feature 2:
 - Effect Size Type: Cohen's d
 - Effect Size: -0.37319109615675333
 - Interpretation: Medium effect
Feature 3:
 - Effect Size Type: Cohen's d
 - Effect Size: -0.08099639963739871
 - Interpretation: Small effect
Feature 4:
 - Effect Size Type: Cohen's d
 - Effect Size: -0.389333931668137
 - Interpretation: Medium effect
```