Resources

Lectures

  • Sanjeev Arora. Theory of Deep Learning. Princeton University. [Link]

  • Soheil Feizi. Foundations of Deep Learning. University of Maryland, College Park. [Link]

  • Nathan Srebro (TTIC), Zhaoran Wang (Northwestern University), Dongning Guo (Northwestern University). Theory of Deep Learning. [Link]

  • David Donoho, Vardan Papyan, and Yiqiao Zhong. Analyses of Deep Learning. Stanford University. [Link]

Papers

Self-supervised Learning Theory
  • Yuandong Tian, Lantao Yu, Xinlei Chen, Surya Ganguli. Understanding self-supervised learning with dual deep networks. arXiv preprint arXiv:2010.00578 (2020). [Paper Link]

  • Colin Wei, Kendrick Shen, Yining Chen, Tengyu Ma. Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data. ICLR 2021. [Paper Link]

  • Yamini Bansal, Gal Kaplun, Boaz Barak. For self-supervised learning, Rationality implies generalization, provably. arXiv preprint arXiv:2010.08508 (2020). [Paper Link]

  • Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu. Contrastive learning, multi-view redundancy, and linear models. arXiv preprint arXiv:2008.10150 (2020). [Paper Link]

  • Jason D. Lee, Qi Lei, Nikunj Saunshi, Jiacheng Zhuo. Predicting what you already know helps: Provable self-supervised learning. arXiv preprint arXiv:2008.01064 (2020). [Paper Link]

  • Sanjeev Arora, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, Nikunj Saunshi. A theoretical analysis of contrastive unsupervised representation learning. ICML 2019. [Paper Link]

Algorithm-based Learning Theory
  • Vaishnavh Nagarajan, J. Zico Kolter. Uniform convergence may be unable to explain generalization in deep learning. NeurIPS 2019. [Paper Link] [Blog]

  • Samuel L. Smith et al. On the Origin of Implicit Regularization in Stochastic Gradient Descent. arXiv preprint arXiv:2101.12176 (2021). [Paper Link]

  • Idan Amir, Tomer Koren, Roi Livni. SGD Generalizes Better Than GD (And Regularization Doesn’t Help). arXiv preprint arXiv:2102.01117 (2021). [Paper Link]

  • Gergely Neu. Information-Theoretic Generalization Bounds for Stochastic Gradient Descent. arXiv preprint arXiv:2102.00931 (2021). [Paper Link]

  • Lijia Zhou, D. J. Sutherland, Nathan Srebro. On Uniform Convergence and Low-Norm Interpolation Learning. arXiv preprint arXiv:2006.05942 (2020). [Paper Link]

  • Yunbei Xu, Assaf Zeevi. Towards Problem-dependent Optimal Learning Rates. NeurIPS 2020. [Paper Link]

Interesting Empirical Results

Others

Recent Learning Theory for Deep Learning
  • Fengxiang He, Dacheng Tao. Recent advances in deep learning theory. arXiv preprint arXiv:2012.10931 (2020). [Paper Link]

  • Guillermo Valle-Pérez, Ard A. Louis. Generalization bounds for deep learning. arXiv preprint arXiv:2012.04115 (2020). [Paper Link]

  • Qinghua Liu, Zhou Lu. A Tight Lower Bound for Uniformly Stable Algorithms. arXiv preprint arXiv:2012.13326 (2020). [Paper Link]

Causality
  • Gintare Karolina Dziugaite et al. In search of robust measures of generalization. NeurIPS 2020. [Paper Link]

Optimization
  • Quynh Nguyen. On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths. arXiv preprint arXiv:2101.09612 (2021). [Paper Link]

  • Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek. Non-convergence of stochastic gradient descent in the training of deep neural networks. Journal of Complexity (2020): 101540. [Paper Link]