Nested Cross-Validation
Nested cross-validation is used to estimate the generalization performance of a machine learning model while its hyperparameters are being tuned, avoiding the overfitting and optimistic bias that arise when the same data are used both to select hyperparameters and to evaluate the final model.
Procedure
It involves two levels of cross-validation:
- Outer Loop (Model Evaluation): This loop assesses the overall performance of the model. The dataset is split into multiple folds, and in each iteration one fold is held out as the test set while the remaining folds are used for training and hyperparameter tuning.
- Inner Loop (Hyperparameter Tuning): Within the training folds of the outer loop, another cross-validation is performed to tune the hyperparameters. The inner loop identifies the best hyperparameters by further splitting the outer training data into smaller training and validation sets.
Key Advantages:
- Provides an unbiased estimate of the model’s performance on unseen data.
- Prevents data leakage from the hyperparameter tuning process into the final model evaluation (the contrast with non-nested tuning is sketched after this list).
- Ensures the reported metric reflects how the full tuning-and-training pipeline can be expected to perform on new data.
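To make the leakage point concrete, the following sketch (assuming scikit-learn and a small synthetic dataset; the dataset sizes and parameter grid are illustrative, not part of the example below) compares the score reported by the grid search itself, which was used to pick the hyperparameters, against a nested estimate obtained by wrapping the grid search in an outer cross-validation; the non-nested score is typically optimistic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
# Small synthetic dataset (illustrative only)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={'max_depth': [3, 5, None]},
    cv=3,
    scoring='accuracy',
)
# Non-nested: best_score_ comes from the same folds that chose the
# hyperparameters, so it tends to overestimate generalization.
grid.fit(X, y)
print(f'Non-nested CV accuracy: {grid.best_score_:.4f}')
# Nested: each outer fold is scored with hyperparameters chosen only
# on that fold's outer training portion.
nested_scores = cross_val_score(grid, X, y, cv=5, scoring='accuracy')
print(f'Nested CV accuracy:     {nested_scores.mean():.4f}')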
Code Example
Below is a complete example of nested cross-validation with scikit-learn on a synthetic classification dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import numpy as np
# Create a synthetic dataset
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=15, n_redundant=5, random_state=42
)
# Define the outer cross-validation loop
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Define the model and hyperparameter grid for the inner loop
model = RandomForestClassifier(random_state=42)
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
}
# Store scores for each fold in the outer loop
outer_scores = []
# Nested cross-validation
for train_idx, test_idx in outer_cv.split(X, y):
    # Split the data into training and test sets for the outer loop
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Inner cross-validation for hyperparameter tuning
    inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
    grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=inner_cv, scoring='accuracy')
    grid_search.fit(X_train, y_train)
    # Evaluate the best model (refit on the full outer training split) on the outer test set
    best_model = grid_search.best_estimator_
    y_pred = best_model.predict(X_test)
    outer_scores.append(accuracy_score(y_test, y_pred))
# Aggregate and print the performance metrics
mean_accuracy = np.mean(outer_scores)
std_accuracy = np.std(outer_scores)
print(f'Mean Accuracy: {mean_accuracy:.4f} ({std_accuracy:.4f})')
Example Output
Mean Accuracy: 0.8940 (0.0393)
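For comparison, the same nested procedure can be written more compactly by passing a GridSearchCV object directly to cross_val_score, which refits the inner search within every outer fold. A minimal sketch, reusing model, param_grid, outer_cv, X, and y from the example above:
# Compact equivalent: cross_val_score runs the inner grid search
# inside each outer fold and returns one score per outer fold.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
clf = GridSearchCV(estimator=model, param_grid=param_grid, cv=inner_cv, scoring='accuracy')
nested_scores = cross_val_score(clf, X, y, cv=outer_cv, scoring='accuracy')
print(f'Mean Accuracy: {nested_scores.mean():.4f} ({nested_scores.std():.4f})')
The explicit loop in the main example is useful when you also want per-fold artifacts such as the selected hyperparameters or the fitted best_estimator_; the compact form is sufficient when only the scores are needed.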