Robustness to Algorithm Hyperparameter Noise

Procedure

This procedure assesses a model’s robustness by measuring the variance in its performance when Gaussian noise is applied to key hyperparameters of the learning algorithm. It helps determine whether the model is overly sensitive to small hyperparameter perturbations, which can indicate overfitting or instability.

  1. Select Hyperparameters to Perturb

    • What to do: Identify key hyperparameters of the model to introduce noise.
      • Choose hyperparameters critical to the model’s performance, such as learning rate, regularization parameters, or tree depth.
      • Focus on hyperparameters with a significant impact on the model’s training dynamics or output.
  2. Define Parameters for Gaussian Noise

    • What to do: Specify the Gaussian noise distribution for hyperparameter perturbation.
      • Select a mean (typically 0) and a standard deviation equal to a small fraction of the hyperparameter’s typical value (e.g., 10% of the baseline value).
      • Ensure the noise level is small enough to reflect realistic, minor adjustments; a minimal sampling sketch appears after this list.
  3. Perturb Hyperparameters and Retrain the Model

    • What to do: Apply Gaussian noise to the chosen hyperparameters and retrain the model.
      • Generate multiple sets of hyperparameters by adding noise (e.g., 5-10 variations).
      • Retrain the model on the training dataset using each set of perturbed hyperparameters.
  4. Evaluate Performance on Test Data

    • What to do: Test the model’s performance on the same test dataset for each hyperparameter variation.
      • Keep the test dataset, evaluation metric, and other configurations constant to isolate the effect of hyperparameter changes.
      • Record the model’s performance metrics for each variation.
  5. Calculate Variance in Performance

    • What to do: Compute the variance (or standard deviation) of the performance metrics across all hyperparameter variations.
      • Use statistical tools or libraries (e.g., Pandas, NumPy) to calculate the variance.
      • Summarize the results to assess performance stability.
  6. Interpret Variance Results

    • What to do: Analyze the variance to evaluate model robustness.
      • Low variance indicates the model is robust to small hyperparameter changes, suggesting good generalization.
      • High variance suggests the model is sensitive to hyperparameter settings, potentially indicating overfitting or excessive reliance on precise tuning.
  7. Report the Findings

    • What to do: Summarize the diagnostic test results and their implications.
      • Present the computed variance alongside the individual performance metrics for each perturbed hyperparameter set.
      • Highlight any significant trends or anomalies and recommend actions (e.g., simplifying the model, fine-tuning hyperparameter ranges, or increasing training data diversity).
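
The short script below is the sketch referenced in step 2: a minimal, self-contained illustration of steps 2, 3, and 5. It uses scikit-learn’s Ridge and its alpha regularization strength purely as a stand-in (any estimator with an accessible hyperparameter works the same way), and the clamp to a small positive value is an assumption to keep alpha within Ridge’s valid range.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

rng = np.random.default_rng(0)
base_alpha = 1.0                # baseline value of Ridge's alpha (step 2)
noise_std = 0.1 * base_alpha    # std dev = 10% of the baseline value (step 2)

scores = []
for _ in range(10):             # 5-10 perturbed variations (step 3)
    # Draw a noisy value; clamp to keep alpha positive (Ridge requires alpha >= 0)
    noisy_alpha = max(base_alpha + rng.normal(0.0, noise_std), 1e-6)
    model = Ridge(alpha=noisy_alpha).fit(X_train, y_train)
    scores.append(r2_score(y_test, model.predict(X_test)))  # step 4

print("mean score:", np.mean(scores))
print("variance:  ", np.var(scores))  # step 5: spread across the variations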

Code Example

This Python function evaluates a model’s robustness by analyzing the variance in its performance metrics when Gaussian noise is added to a chosen hyperparameter before each retraining run.

import numpy as np
from sklearn.metrics import accuracy_score

def robustness_with_hyperparameter_noise(X_train, y_train, X_test, y_test, model, metric,
                                         hyperparam_name, base_value, noise_levels, n_runs,
                                         integer_valued=False, min_value=None):
    """
    Evaluate a model's robustness by analyzing performance variance with Gaussian noise added to a hyperparameter.

    Parameters:
        X_train (np.ndarray): Training features.
        y_train (np.ndarray): Training labels.
        X_test (np.ndarray): Testing features.
        y_test (np.ndarray): Testing labels.
        model: Configured machine learning model with accessible hyperparameters.
        metric (callable): Performance metric function.
        hyperparam_name (str): The name of the hyperparameter to perturb (must match the model's parameter name).
        base_value (float): The baseline value of the hyperparameter.
        noise_levels (list): List of standard deviations for Gaussian noise to test.
        n_runs (int): Number of retraining iterations per noise level.
        integer_valued (bool): If True, round each perturbed value to the nearest integer (for integer hyperparameters such as max_depth).
        min_value (float, optional): If set, clamp each perturbed value to this minimum so it stays in the valid range.

    Returns:
        list: A list of dictionaries, one per noise level, each containing the noise level, performance variance, mean performance, and an interpretation.
    """
    results = []

    for noise_std in noise_levels:
        performance_scores = []

        for _ in range(n_runs):
            # Add Gaussian noise to the hyperparameter
            noisy_value = base_value + np.random.normal(0, noise_std)

            # Round integer-valued hyperparameters (e.g., max_depth) and clamp
            # to the valid range so the perturbed value remains legal
            if integer_valued:
                noisy_value = int(round(noisy_value))
            if min_value is not None:
                noisy_value = max(noisy_value, min_value)

            # Update the hyperparameter and retrain the model
            model.set_params(**{hyperparam_name: noisy_value})
            model.fit(X_train, y_train)

            # Evaluate the model on the test set
            y_pred = model.predict(X_test)
            score = metric(y_test, y_pred)
            performance_scores.append(score)

        # Calculate variance and mean performance
        variance = np.var(performance_scores)
        mean_performance = np.mean(performance_scores)

        # Interpret the results
        if variance < 0.01:
            interpretation = "High robustness: very low performance variance at this noise level."
        elif variance < 0.05:
            interpretation = "Moderate robustness: acceptable performance variance at this noise level."
        else:
            interpretation = "Low robustness: high sensitivity to noise at this level."

        results.append({
            "Noise Level (Std Dev)": noise_std,
            "Variance": variance,
            "Mean Performance": mean_performance,
            "Interpretation": interpretation,
        })

    return results

# Demo Usage
if __name__ == "__main__":
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification

    # Generate synthetic dataset
    X, y = make_classification(
        n_samples=500, n_features=10, n_informative=5, random_state=42
    )

    # Split into train and test sets
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Initialize model
    model = RandomForestClassifier(random_state=42)

    # Perform robustness test on 'max_depth' hyperparameter
    results = robustness_with_hyperparameter_noise(
        X_train, y_train, X_test, y_test, model,
        accuracy_score, hyperparam_name='max_depth', base_value=5,
        noise_levels=[0, 1, 2, 3, 4, 5], n_runs=30,
        integer_valued=True, min_value=1  # max_depth must be a positive integer
    )

    # Print results
    print("Robustness Test Results with Hyperparameter Noise:")
    for res in results:
        for key, value in res.items():
            print(f"{key}: {value}")
        print()

Example Output

Robustness Test Results with Hyperparameter Noise:
Noise Level (Std Dev): 0
Variance: 1.232595164407831e-32
Mean Performance: 0.9299999999999999
Interpretation: High robustness: very low performance variance at this noise level.

Noise Level (Std Dev): 1
Variance: 0.0008573333333333348
Mean Performance: 0.9060000000000001
Interpretation: High robustness: very low performance variance at this noise level.

Noise Level (Std Dev): 2
Variance: 0.0018862222222222242
Mean Performance: 0.8973333333333334
Interpretation: High robustness: very low performance variance at this noise level.

Noise Level (Std Dev): 3
Variance: 0.0031578888888888906
Mean Performance: 0.8923333333333333
Interpretation: High robustness: very low performance variance at this noise level.

Noise Level (Std Dev): 4
Variance: 0.003695555555555558
Mean Performance: 0.8866666666666667
Interpretation: High robustness: very low performance variance at this noise level.

Noise Level (Std Dev): 5
Variance: 0.004709888888888889
Mean Performance: 0.8663333333333335
Interpretation: High robustness: very low performance variance at this noise level.
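
The same function applies to continuous hyperparameters; only the rounding flag changes. As a hedged sketch (reusing the dataset, imports, and function from the demo above; LogisticRegression and its inverse-regularization strength C are stand-ins not used elsewhere in this section):

from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression(max_iter=1000, random_state=42)
results_c = robustness_with_hyperparameter_noise(
    X_train, y_train, X_test, y_test, log_reg,
    accuracy_score, hyperparam_name='C', base_value=1.0,
    noise_levels=[0.0, 0.1, 0.25, 0.5], n_runs=30,
    integer_valued=False, min_value=1e-3  # C must stay positive
)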