Covariate Shift

These are some additional notes I am taking on David Foster's incredible book, Generative Deep Learning.

This is a change in the distribution of values at the internal nodes of a deep network. Foster notes in chapter 2 of Generative Deep Learning that the longer a network trains, the further the weights can drift from their random initial values; if the activations grow too large as a result, the computation can eventually produce NaN errors.
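As a rough illustration of the mechanism (my own sketch, not an example from the book), the NumPy snippet below pushes a batch of inputs through a hypothetical stack of ReLU layers whose weights have drifted to a larger scale than at initialization. The mean activation magnitude grows layer by layer until float32 overflows, at which point inf and NaN values appear:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, n_layers=60, weight_scale=1.5):
    """Return the mean absolute activation after each ReLU layer."""
    scales = []
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(n_layers):
            # Weights drawn at a larger scale than typical initialization,
            # standing in for weights that have drifted during training.
            w = rng.normal(scale=weight_scale, size=(64, 64)).astype(np.float32)
            x = np.maximum(x @ w, 0.0)  # linear layer + ReLU, no normalization
            scales.append(float(np.mean(np.abs(x))))
    return scales

x0 = rng.normal(size=(8, 64)).astype(np.float32)
scales = forward(x0)
print(scales[0], scales[-1])  # first value is finite; last is inf or NaN
```

With a per-layer growth factor well above 1, sixty layers are far more than enough to exceed the float32 maximum (about 3.4e38), so the final activations are no longer finite numbers.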

Machine Learning Mastery calls this internal covariate shift, a term originally introduced by Ioffe and Szegedy in the batch normalization paper.