Data Science Diagnostics

Data Science Diagnostics Diagnostics About

CTRL K

Diagnostics
About
Help
FAQ
Glossary
Disclaimer

Data Science Diagnostics

My Model is worse on the Test Set than the Train Set!

This problem is typically called “overfitting” or the “generalization gap”.

There are many causes for this problem, and we must know the specific cause in order to address it.

Discover the cause of your problem with diagnostic tests

We can use diagnostic tests to probe your project and discover the cause of the performance mismatch.

Below are diagnostic tests that we can use to gather evidence for the specific cause on your project:

Diagnostic Test Categories

Generalization Gap Problem

Discover more about the biggest problem in machine learning typically referred to simply as ‘overfitting’.

Dataset Split Procedure

Discover diagnostic checks to confirm that you followed best practices when splitting your dataset.

Split Size Sensitivity

Discover diagnostic checks to confirm the chosen train/test split size is appropriate for your data.

Gap Quantification Tests

Discover diagnostic tests to quantify the size of the gap in performance and determine whether it is a concern.

Challenge Performance Tests

Discover diagnostic tests to challenge and develop more robust estimates of model performance.

Data Distribution Tests

Discover diagnostic tests to determine whether train and test sets have the same distributions of data.

Model Performance Tests

Discover diagnostic tests to determine whether performance is consistent across train and test sets.

Prediction Error Tests

Determine diagnostic tests to determine whether hard to predict examples are consistent across sets.

Residual Distribution Tests

Determine diagnostic tests to determine whether residual error distributions are consistent across train and test sets.

Model Overfitting Tests

Discover diagnostic tests to determine whether your model is overfitting the training dataset.

Model Robustness Tests

Discover diagnostic tests to determine whether your model is fragile to small changes.

Test Set Leakage

Discover diagnostic tests to determine whether knowledge of the test set has leaked into the train set

Discover ideas on how to fix your data and/or model once the cause has been identified

© 2024 Data Science Diagnostics