About
Hi there, I’m Jason Brownlee (Twitter, LinkedIn), the author of this website.
Decade of Coaching
I’ve worked as an engineer on modeling and high-performance computing projects in industry, government, and startups. A long time ago, I had more academic aspirations and completed a Master’s and a PhD in stochastic optimization.
I’ve been helping data scientists for more than a decade through coaching, consulting, and answering tens of thousands of emailed questions.
I’ve also authored 1,000+ tutorials and 20+ books on machine learning and deep learning over at my former company Machine Learning Mastery, which I sold in 2021.
Generalization Gap
I’ve worked with hundreds (probably thousands) of budding and professional data scientists one-on-one over the years.
The most common problem that we work on together is the Generalization Gap.
This is where performance on new data in the test set or in production is worse than the expected performance from the test harness.
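As a rough illustration of that definition, the gap can be expressed as the score expected from the test harness minus the score observed on new data. The sketch below is only a minimal example: the function name `generalization_gap`, the use of accuracy (where higher is better), and all of the numbers are hypothetical and are not taken from the checklist itself.

```python
from statistics import mean


def generalization_gap(harness_scores, new_data_score):
    """Return the expected (test harness) score minus the score on new data.

    Assumes a metric where higher is better, such as accuracy.
    A positive result means the model did worse on new data than expected.
    """
    expected = mean(harness_scores)
    return expected - new_data_score


# Hypothetical numbers, for illustration only.
cv_fold_accuracy = [0.91, 0.89, 0.90, 0.92, 0.88]  # scores from the test harness
holdout_accuracy = 0.84                            # score on new, unseen data

gap = generalization_gap(cv_fold_accuracy, holdout_accuracy)
print(f"Generalization gap: {gap:.3f}")
```

A gap near zero suggests the test harness is giving a trustworthy estimate, while a large positive gap is the situation described above.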
Diagnostic Checklist
This website is based on a long checklist that I used as a diagnostic tool to work through this problem with my clients.
I don’t do much consulting anymore, but I know from experience how valuable this checklist is for those struggling with the generalization gap.
I recently decided to share an updated version of my checklist publicly: Data Science Diagnostic Checklist.
It generated some interest among my data science friends, so I decided to expand on it with this site, adding more information, code snippets, and some worked examples for each test.
I hope that you find these data science diagnostic tests useful too!
Reach Out
If the tests help you make progress on your project, or if you have ideas for more or better tests, please email me any time: Jason.Brownlee05@gmail.com