Beyond Empirical Risk Minimization: the lessons of deep learning

Mikhail Belkin, Professor, The Ohio State University - Department of Computer Science and Engineering, Department of Statistics, Center for Cognitive Science

Abstract: "A model with zero training error is overfit to the training data and will typically generalize poorly," goes statistical textbook wisdom. Yet, in modern practice, over-parametrized deep networks with a near-perfect fit on the training data still show excellent test performance. This apparent contradiction points to troubling cracks in the foundations of machine learning.
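As a minimal sketch of the phenomenon the abstract describes (not taken from the talk; the model, data, and all parameters below are illustrative assumptions), the following Python snippet fits a heavily over-parametrized random-Fourier-feature regressor to noisy data using the minimum-norm least-squares solution. The model interpolates the training set almost exactly, yet its error on held-out points remains small.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: noisy samples of a smooth target function (illustrative choice).
def target(x):
    return np.sin(2 * np.pi * x)

n_train, n_test = 40, 500
x_train = rng.uniform(0, 1, n_train)
y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
x_test = rng.uniform(0, 1, n_test)
y_test = target(x_test)

# Over-parametrized random Fourier feature model: 2000 features >> 40 samples.
n_features = 2000
w = 10.0 * rng.standard_normal(n_features)          # random frequencies
b = rng.uniform(0, 2 * np.pi, n_features)           # random phases

def features(x):
    return np.sqrt(2.0 / n_features) * np.cos(np.outer(x, w) + b)

# Minimum-norm least-squares solution: because the system is under-determined,
# it interpolates the noisy training labels (training error is essentially zero).
coef, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)

train_mse = np.mean((features(x_train) @ coef - y_train) ** 2)
test_mse = np.mean((features(x_test) @ coef - y_test) ** 2)
print(f"train MSE: {train_mse:.2e}")   # near zero: the model fits the noise
print(f"test MSE:  {test_mse:.2e}")    # still small: the interpolant generalizes
```

This linear-in-features example is only a stand-in for the deep networks discussed in the talk, but it exhibits the same tension with the textbook warning: near-zero training error together with good test performance.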