Deep Learning Dictionary
Additive Smoothing : When calculating the maximum likelihood estimate $\theta_j$, a small pseudo-count is added to every category so that even possibilities never observed in the training data can still be generated by the model. Discussed in Generative Deep Learning - July 2019 - David Foster - Chapter 1
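A minimal sketch of the smoothed estimate, assuming a categorical distribution and a pseudo-count `alpha` (the function name and example counts below are illustrative; `alpha = 1` is classic Laplace smoothing):

```python
import numpy as np

def additive_smoothing(counts, alpha=1.0):
    """Smoothed maximum likelihood estimate of categorical probabilities.

    counts : observed frequency of each category (may contain zeros)
    alpha  : pseudo-count added to every category before normalising
    """
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * len(counts))

# The third category was never observed, yet it still gets probability 1/13 ≈ 0.077.
print(additive_smoothing([8, 2, 0]))  # -> [0.692 0.231 0.077] (rounded)
```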
Activations : These are the nonlinearities that are introduced within Dense layers.
- Sigmoid - This is used for binary classification and for multi-label classification (when an item can belong to more than one class). It is used when you want the output values to be between 0 and 1. This is represented as $\frac{1}{1 + e^{-x}}$ (see the sketch after this entry).
Discussed in Generative Deep Learning - July 2019 - David Foster - Chapter 2
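A minimal NumPy sketch of the sigmoid above (variable names are illustrative). Because each output lies in (0, 1) independently of the others, sigmoid outputs can be read as per-class probabilities in a multi-label setting:

```python
import numpy as np

def sigmoid(x):
    """Squash any real value into the range (0, 1): 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([-2.0, 0.0, 3.0])
print(sigmoid(logits))  # -> approximately [0.119, 0.5, 0.953]
```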
Autoregressive Model : This is a unidirectional model that predicts the next value in a sequence conditioned only on the values that came before it.
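A minimal sketch of the idea using a plain linear autoregressive fit on a toy series (function name, lag order, and data are illustrative); deep autoregressive models replace the linear map with a neural network but keep the same "predict the next value from previous values" structure:

```python
import numpy as np

def fit_ar(series, p=2):
    """Fit a linear AR(p) model by least squares: x_t ≈ w_1*x_{t-p} + ... + w_p*x_{t-1} + b."""
    # Each row of X holds the p values preceding the corresponding target in y.
    X = np.stack([series[i:len(series) - p + i] for i in range(p)], axis=1)
    y = series[p:]
    X = np.hstack([X, np.ones((len(y), 1))])  # bias column
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

series = np.sin(np.linspace(0, 20, 200))
w = fit_ar(series, p=2)
next_value = series[-2:] @ w[:2] + w[2]  # predict the next point from the last two
print(next_value)
```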
Catastrophic Forgetting : A situation where a model, after being trained or fine-tuned on a new task, forgets the task on which it was originally trained. Discussed in NAACL 2019 Transfer Learning Tutorial Slides
Naive Bayes : This modeling technique makes the assumption that each feature is independent of every other feature (in a classifier, conditionally independent given the class label).
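A minimal NumPy sketch of a Bernoulli-style Naive Bayes classifier on toy binary features (all names and data are illustrative). The product inside `predict` is exactly where the independence assumption enters, and `alpha` applies the additive smoothing defined earlier:

```python
import numpy as np

def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate class priors and per-class feature probabilities for binary features,
    treating every feature as conditionally independent given the class."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    # P(feature_j = 1 | class c), with additive smoothing so no estimate is exactly 0 or 1.
    likelihoods = np.array([(X[y == c].sum(axis=0) + alpha) / ((y == c).sum() + 2 * alpha)
                            for c in classes])
    return classes, priors, likelihoods

def predict(x, classes, priors, likelihoods):
    """Score each class as P(c) * prod_j P(x_j | c) -- the 'naive' independence assumption."""
    scores = priors * np.prod(np.where(x == 1, likelihoods, 1 - likelihoods), axis=1)
    return classes[np.argmax(scores)]

# Toy dataset: two binary features, two classes.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y = np.array([0, 0, 1, 1])
print(predict(np.array([1, 0]), *fit_naive_bayes(X, y)))  # -> 0
```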
Sequential Adaptation : Intermediate fine-tuning on related datasets and tasks before adapting to the target task. Discussed in NAACL 2019 Transfer Learning Tutorial Slides
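A minimal Keras sketch of the idea, assuming a shared encoder and two synthetic datasets standing in for the related (intermediate) task and the final target task; all layer sizes and data are illustrative:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Shared encoder whose weights carry over from one fine-tuning stage to the next.
encoder = tf.keras.Sequential([layers.Dense(64, activation="relu"),
                               layers.Dense(64, activation="relu")])

def build_classifier(num_classes):
    """Attach a fresh task-specific head to the shared encoder."""
    model = tf.keras.Sequential([encoder, layers.Dense(num_classes, activation="softmax")])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model

# Stage 1: intermediate fine-tuning on a related task (synthetic data here).
x_related, y_related = np.random.rand(256, 20), np.random.randint(0, 5, 256)
build_classifier(num_classes=5).fit(x_related, y_related, epochs=2, verbose=0)

# Stage 2: fine-tune the same, now adapted, encoder on the target task.
x_target, y_target = np.random.rand(128, 20), np.random.randint(0, 2, 128)
build_classifier(num_classes=2).fit(x_target, y_target, epochs=2, verbose=0)
```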