I recently stumbled upon a Hacker News post saying that in 2020 Sutskever (of AlexNet fame) listed 30 papers, claiming that “If you really learn all of these, you’ll know 90% of what matters today.” Based on some light googling, there has been some debate about which exact papers those were, but at least these three sources seem to provide the same list, the last of which claims that the list was provided by one of Sutskever's colleagues at OpenAI. So this might at least be a reasonable approximation.
The list, copied from here, is below. Some of the entries are blog posts, others are scientific articles, and others are full textbooks. The list seemed cool at first glance, so I'm planning to start reading through it in no particular order. In separate future posts I'll also try to share a few thoughts about each one, or groups of them, but this post will function as the main diary of my progress through the list.
Papers
- The Annotated Transformer [Status: Unread]
- The First Law of Complexodynamics [Status: Read (before starting this project)]
- The Unreasonable Effectiveness of RNNs [Status: Read (before starting this project)]
- Understanding LSTM Networks [Status: Read (before starting this project)]
- Recurrent Neural Network Regularization [Status: Unread]
- Keeping Neural Networks Simple by Minimizing the Description Length of the Weights [Status: Unread]
- Pointer Networks [Status: Unread]
- ImageNet Classification with Deep CNNs [Status: Unread]
- Order Matters: Sequence to sequence for sets [Status: Unread]
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism [Status: Unread]
- Deep Residual Learning for Image Recognition [Status: Unread]
- Multi-Scale Context Aggregation by Dilated Convolutions [Status: Unread]
- Neural Message Passing for Quantum Chemistry [Status: Unread]
- Attention Is All You Need [Status: Read (before starting this project)]
- Neural Machine Translation by Jointly Learning to Align and Translate [Status: Unread]
- Identity Mappings in Deep Residual Networks [Status: Unread]
- A Simple NN Module for Relational Reasoning [Status: Unread]
- Variational Lossy Autoencoder [Status: Unread]
- Relational RNNs [Status: Unread]
- Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton [Status: Unread]
- Neural Turing Machines [Status: Unread]
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin [Status: Unread]
- Scaling Laws for Neural LMs [Status: Unread]
- A Tutorial Introduction to the Minimum Description Length Principle [Status: Unread]
- Machine Super Intelligence Dissertation [Status: Unread]
- Kolmogorov Complexity (Page 434 onwards) [Status: Link is broken, trying to find the PDF]
- CS231n Convolutional Neural Networks for Visual Recognition [Status: Unread]