Following Belkin et al., Nakkiran et al. further showed that double descent occurs not only as a function of model size but also as a function of the number of training epochs (so-called epoch-wise double descent).