‘Must-Read’ AI Papers Suggested by Experts — Pt 2
Due to the overwhelming response to our previous expert paper suggestion blog, we had to do another. We asked some of the experts in our community which papers they would suggest everyone working in the field should read.
Haven’t seen the first blog? You can read the recommendations of Andrew Ng, Jeff Clune, Myriam Cote and more here.
Alexia Jolicoeur-Martineau, PhD Researcher, MILA
f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization — Sebastian Nowozin et al.
https://arxiv.org/pdf/1606.00709.pdf
Alexia suggested this paper as it explains how many classifiers can be thought of as estimating an f-divergence. Thus, GANs can be interpreted as estimating and minimizing a divergence. This paper from Microsoft Research clearly lays out the method, the experiments undertaken and the related work that supports this view. Read this paper here.
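For readers who want the core idea in a single formula, the f-GAN framework rests on the variational lower bound of an f-divergence (notation follows the paper: f* is the convex conjugate of f, and T is the variational function played by the discriminator):

D_f(P \,\|\, Q) \;\geq\; \sup_{T} \Big( \mathbb{E}_{x \sim P}\big[ T(x) \big] - \mathbb{E}_{x \sim Q}\big[ f^{*}(T(x)) \big] \Big)

Training the discriminator tightens this bound, giving an estimate of the divergence between the data distribution P and the generator distribution Q; training the generator then minimizes that estimate, which is exactly the "estimate and minimize a divergence" reading described above.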
Sobolev GAN — Youssef Mroueh et al.
https://arxiv.org/pdf/1711.04894.pdf
This paper shows how the gradient norm penalty (used in the very popular WGAN-GP) can be thought of as constraining the discriminator to have its gradient in a unit ball. The paper is very mathematical and complicated, but the key message is that we can apply a wide variety of constraints to the discriminator/critic. These constraints help prevent the discriminator from becoming too strong. I recommend focusing on Table 1, which shows the various constraints that can be used. I have come back to this paper many times just to look at Table 1. You can read this paper here.
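As a concrete illustration of the gradient norm penalty mentioned above, here is a minimal PyTorch-style sketch of the WGAN-GP penalty (the function and variable names are my own, and the critic network plus the real and fake batches are assumed to come from the surrounding training loop; the Sobolev GAN paper treats this as just one of several possible constraints on the critic):

import torch

def gradient_penalty(critic, real, fake):
    # Interpolate between real and fake samples (WGAN-GP style).
    eps = torch.rand([real.size(0)] + [1] * (real.dim() - 1), device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).detach().requires_grad_(True)

    # Gradient of the critic's output with respect to its input.
    out = critic(x_hat)
    grads = torch.autograd.grad(
        outputs=out,
        inputs=x_hat,
        grad_outputs=torch.ones_like(out),
        create_graph=True,
    )[0]

    # Softly constrain the gradient norm to stay close to 1,
    # i.e. keep the critic's gradient in a unit ball.
    grad_norm = grads.reshape(grads.size(0), -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()

Adding this term (scaled by a coefficient) to the critic's loss is what keeps the critic from becoming too strong; the constraints collected in Table 1 of the Sobolev GAN paper play a similar regularizing role.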
Jane Wang, Senior Research Scientist, DeepMind
To be honest, I don’t believe in singling out any one paper as being more important than the rest, since I think all papers build on each other, and we should acknowledge science as a collaborative effort. I will say that there are some papers I’ve enjoyed reading more than others, and that I’ve learned from, but others might have different experiences depending on their interests and backgrounds. That said, I’ve enjoyed reading the following:
Where Do Rewards Come From? — Satinder Singh et al.
https://all.cs.umass.edu/pubs/2009/singh_l_b_09.pdf
This paper advances a general computational framework for reward that places it in an evolutionary context, formulating a notion of an optimal reward function given a fitness function and some distribution of environments. Novel results from computational experiments show how traditional notions of extrinsically and intrinsically motivated behaviors may emerge from such optimal reward functions. You can read this paper here.
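To make the "optimal reward function" idea concrete, here is a deliberately simplified Python sketch (the brute-force search and every name in it are illustrative assumptions, not the authors' algorithm): given a space of candidate reward functions, pick the one whose induced behaviour yields the highest expected fitness over environments drawn from the assumed distribution.

def optimal_reward(candidate_rewards, sample_environment, train_agent, fitness, n_envs=100):
    """Brute-force search for the reward function maximizing expected fitness.

    candidate_rewards: iterable of candidate reward functions
    sample_environment: draws one environment from the assumed distribution
    train_agent: returns the behaviour an agent learns in an environment under a given reward
    fitness: scores that behaviour against the designer's true objective
    """
    envs = [sample_environment() for _ in range(n_envs)]
    best_reward, best_score = None, float("-inf")
    for reward_fn in candidate_rewards:
        # Expected fitness of the behaviour this reward function induces.
        score = sum(fitness(train_agent(env, reward_fn), env) for env in envs) / len(envs)
        if score > best_score:
            best_reward, best_score = reward_fn, score
    return best_reward

The point of the paper is that the reward function which wins such a search need not coincide with the fitness function itself; that gap is where both extrinsically and intrinsically motivated behaviours can emerge.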
Building machines that learn and think like people — Brenden Lake et al.
https://arxiv.org/pdf/1604.00289.pdf
This paper reviews progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it. Specifically, the authors argue that these machines should (1) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (2) ground learning in intuitive theories of physics and psychology to support and enrich the knowledge that is learned; and (3) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. Read more on this paper here.
Jekaterina Novikova, Director of Machine Learning, WinterLight Labs
Attention Is All You Need — Ashish Vaswani et al.
https://arxiv.org/abs/1706.03762
Novel large neural language models like BERT or GPT-2/3 were developed soon after NLP scientists realized in 2017 that “Attention is All You Need”. The exciting results produced by these models caught the attention of not just ML/NLP researchers but also the general public. For example, GPT-2 caused near mass hysteria in 2019 as a model deemed “too dangerous to be public” because it can potentially generate fake news indistinguishable from real news articles. GPT-3, which was released only several weeks ago, has already been called “the biggest thing since bitcoin”. You can read this paper here.
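For context on the title, the paper's central building block is scaled dot-product attention, in which queries Q, keys K and values V are combined as

\mathrm{Attention}(Q, K, V) \;=\; \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V

where d_k is the key dimension; the Transformer architecture stacks layers of this attention (plus feed-forward blocks) in place of recurrence or convolution, and it is this design that BERT and the GPT models build on.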
Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data — Emily M. Bender et al.
https://www.aclweb.org/anthology/2020.acl-main.463.pdf
To counterbalance the hype, I would recommend that everyone read a great paper that was presented and recognized as the best theme paper at the ACL conference at the beginning of July 2020 — “Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data”. In the paper, the authors argue that while existing models, such as BERT or GPT, are undoubtedly useful, they are not even close to a human-analogous understanding of language and its meaning. The authors explain that understanding happens when one is able to recover the communicative intent of what was said. As such, it is impossible to learn and understand language if language is not associated with some real-life interaction, or, in other words, “meaning cannot be learned from form alone”. This is why even very large and complex language models can only learn a “reflection” of meaning but not meaning itself. Read more on the paper here.
See the full blog here.