These are some additional notes I am taking on David Foster's book Generative Deep Learning.
One problem that arises during training is a change in the distributions of a deep network's internal node activations. Foster states in chapter 2 of Generative Deep Learning that the longer a network trains, the further the weights can drift from their random initial values, and this drift can lead to exploding activations and NaN errors.
Machine Learning Mastery calls this internal covariate shift (the term comes from Ioffe and Szegedy's batch normalization paper).
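A quick numerical sketch of the drift described above. This is my own toy example, not code from the book: a stack of random linear layers whose weights are slightly too large, so the activation scale compounds layer by layer until float32 overflows to inf/NaN. Standardizing each layer's output (a rough stand-in for batch normalization) keeps the activations bounded.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, layers, normalize=False):
    """Pass x through a stack of random linear layers.

    With normalize=False the activation scale compounds from layer
    to layer; with normalize=True each layer's output is
    standardized, a crude stand-in for batch normalization.
    """
    for w in layers:
        x = x @ w
        if normalize:
            x = (x - x.mean()) / (x.std() + 1e-8)
    return x

# 50 layers whose weights are a bit "too large" (std 1.5):
# each matmul multiplies the activation scale by roughly
# sqrt(64) * 1.5 = 12, so float32 overflows within ~36 layers.
layers = [rng.normal(0, 1.5, size=(64, 64)).astype(np.float32)
          for _ in range(50)]
x = rng.normal(size=(32, 64)).astype(np.float32)

raw = forward(x, layers)                  # overflows to inf/NaN
normed = forward(x, layers, normalize=True)

print(np.isfinite(raw).all())             # False: values blew up
print(np.isfinite(normed).all())          # True: scale stays controlled
```

The unnormalized pass reproduces the NaN failure mode Foster describes, while the normalized pass shows why keeping each layer's output distribution stable avoids it.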