Interesting read. Was surprised to learn how much damage can be done to a model's parameters without making any discernible difference in its quality of output.
I didn't see any mention of dropout in the article. During training, individual parameters or even whole layers are randomly zeroed out in different places on each pass, which forces the network into a distributed representation where no single unit is load-bearing.
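For anyone unfamiliar, here's a minimal sketch of what dropout does at the activation level (using PyTorch just for illustration; the article doesn't specify a framework):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Each activation is zeroed independently with probability p, and the
    # survivors are scaled by 1/(1-p) so the expected value stays the same.
    drop = nn.Dropout(p=0.5)
    x = torch.ones(1, 8)
    print(drop(x))   # roughly half the entries are 0, the rest are 2.0

    drop.eval()      # at inference time dropout is a no-op
    print(drop(x))   # all ones

Because a different random subset of units is disabled on every training step, the model can't rely on any one of them, which is plausibly part of why zeroing chunks of a trained model's parameters degrades output so gracefully.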
Archive link: https://archive.is/0Bl3Z