The Most Cited Paper of the Decade – Can We Learn from It?

Published 2024-02-14
“Adam: A Method for Stochastic Optimization” is one of the most highly cited papers ever published. Moreover, it was written in 2014 by two PhD students! Let’s see why it became so popular and whether we can learn anything from it.

A relevant video by @SimonClark: “I read the top 100 scientific papers ...”

00:00 Incredible number of citations
02:37 Authors of the “Adam” paper
04:40 What is the Adam method?
05:26 Let’s check the paper!
10:10 Can we learn from it?
10:38 YES
12:26 NO
13:52 Other highly cited papers
14:49 “Adam” is an unusual paper
16:02 Attitude to research
17:15 Other features

Andrey Churkin (Андрей Чуркин) 2024
andreychurkin.ru/

All Comments (21)
  • @zuruumi9849
    There are several points to learn from the most cited paper:
    - Practical usability: make non-scientists want to use it (that's why breakthroughs aren't the most cited; they aren't immediately usable outside academia)
    - Concise and readable: leave details non-essential for use in separate sections that can easily be skipped (that's why the related literature and proofs are separate)
    - Graphs and images: if you want to say something important, make it into a graph or image; that's what people scanning through will actually notice
    - Advertise it: big conferences, cite/link/use it in open-source libraries, etc.
    In other words, if you want to get lots of citations, don't write for academia; write a manual that people on the periphery (not strictly working for universities etc.) can notice, read, understand, and easily use.
  • @Drudge.Miller
    Make a paper about making papers and become the meta paper publisher 😂
  • @0xnika
You coincidentally stumbled across a fundamental fact: reviewing ML papers is quite a successful strategy for YouTube channels ;)
  • @SirGisebert
Two more points: First, the Adam paper entered a feedback loop, where its popularity resulted in a lot of deep learning tutorials on the internet mentioning it. Then, a lot of people with no idea about optimization algorithms pick it because it was recommended in a tutorial, further increasing its citations. Second, the name is in the title. That wouldn't matter for lesser-known methods, but when you want to use the software implementation of a method you know nothing about (based on a tutorial you read), it is very easy to figure out which paper to cite when the name is in the title.
  • @osman7900
I have a friend who is a scientist at DeepMind. He says there are two criteria for measuring researchers' performance: one is coming up with significant research ideas that may contribute to the development of a general AI, and the second is convincing fellow researchers to work on those ideas. So their success criteria are not quantitative but qualitative.
  • @TheCheesyNachos
Two things off the top of my head: 1. At 12:45, about the paper being a method paper: this is very typical in CS, where a paper will introduce a problem and then propose a method to solve it, rather than making some discovery alone. Maybe it will also prove some result. 2. The original Adam paper had an incorrect proof that was eventually corrected (probably why its arXiv version was edited a few times). I just thought that is worth mentioning.
  • @allinclusive169
Saying that this is "only" a method paper would be a great understatement. Firstly, because a lot of ML papers are "just method" papers: you develop a new method and test it on a set of well-known datasets to show that it works better than others. Another factor in the adoption of Adam (which is now basically the go-to optimization technique, used everywhere all the time) was that it was really easy to implement in popular machine learning libraries, which some other optimizers were not. Also... it's simply a great idea wrapped in a very well written paper.
  • @leohuang990
Having the Related Work section right before the Conclusion is common in computer science. This style may be specific to research subareas or simply to advisors. I previously worked on crypto side channels and now work on embedded system security; my previous advisor preferred "related work" as a subsection of the Introduction, while my current one prefers the other style. I 100% agree with you about the advantage of method papers over discovery papers. However, it is not strange to see short but impactful papers in theoretical computer science. Hao Huang's 2019 paper is an example: it is 5 pages long, the main body (a proof) is two pages, and the math is at most graduate-level, but it settles the 30-year-old Sensitivity Conjecture for Boolean functions. The paper's value lies in its simplicity against a long-standing problem.
  • @Militaizi
I think this highlights the importance of practicality in publishing and research. I use Adam, or nowadays its better-performing derivatives, almost daily (or at least weekly) in ML engineering. I must admit I no longer understand it 100%, but I have seen that it is the most robust optimization algorithm. I have re-read the papers from my master's thesis only two or three times since. Other optimization techniques are many times harder to grasp.
  • @hahne9
I would assume that most of the people citing this paper have never read it. Adam is just the standard optimization algorithm in deep learning and is integrated into every deep learning library. People use it without even knowing how exactly it works, and they mention it in their papers. So obviously it gets cited a lot. It doesn't matter at all how well written the paper is in this case.
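To see how low the bar is, here is a minimal sketch of what "just using Adam" looks like in PyTorch; the model, data, and hyperparameters are toy placeholders chosen for illustration, not anything from the video or the paper.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                                   # toy placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # the one line most users ever write
loss_fn = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)             # dummy batch
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()  # the Adam update happens here, sight unseen
```

Citing the paper then takes one BibTeX entry, with no need to ever open the PDF.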
  • @workforyouraims
It was the era of the machine learning boom. Previously, many fields used non-machine-learning techniques to model data, but from 2015 on, many of those techniques are barely used anymore, since deep learning models, for example, can do all the intermediate data-extraction steps: you don't need many layers of algorithms to model the data, you just need one. To be honest, I think luck is also a great factor. Of course these researchers are really smart and hardworking, but it was the right time: the era of switching from pre-machine-learning to machine-learning methods.
  • Adam is very important; everyone who works with DL knows that without Adam it won't work so well, or it will take ages to train, making it just impractical. It's a paper with many gaps (and a wrong proof), but its impact is unquestionable. However, RMSProp was nearly as good as Adam, yet it never even got a publication lol (it is cited via a blog post and a Coursera video).
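For context, the RMSProp update that comment refers to fits in a few lines. This is a rough sketch assuming the usual formulation from Hinton's Coursera lecture; the default hyperparameters are illustrative, not canonical. Adam essentially adds a first-moment (momentum) estimate and bias correction on top of this.

```python
import numpy as np

def rmsprop_step(theta, g, v, lr=1e-3, rho=0.9, eps=1e-8):
    """One RMSProp step: scale the gradient by a running RMS of past gradients."""
    v = rho * v + (1 - rho) * g**2               # running average of squared gradients
    theta = theta - lr * g / (np.sqrt(v) + eps)  # per-coordinate scaled update
    return theta, v
```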
  • @kilogods
    Wow man your videos are fantastic.
  • Geoffrey Hinton was the PhD supervisor of Jimmy Ba (one of the two Adam paper authors). I feel like this may have contributed heavily to the paper's success, as Geoffrey is literally called "the Godfather of AI". He is one of the most cited people ever and has an enormous influence on the whole scientific community.
  • @john.darksoul
    I absolutely love this paper. You can open it, rewrite the algorithm presented in your programming language of choice, and it just works :D
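That claim is easy to verify: Algorithm 1 of the paper translates almost line for line into NumPy. A minimal sketch follows; the toy quadratic at the end is just an illustrative test, not from the paper.

```python
import numpy as np

def adam(grad, theta0, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Adam, following Algorithm 1 of Kingma & Ba (2014)."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)  # first moment estimate (mean of gradients)
    v = np.zeros_like(theta)  # second moment estimate (mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)  # bias-corrected first moment
        v_hat = v / (1 - beta2**t)  # bias-corrected second moment
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy check: minimize f(x) = ||x||^2 (gradient 2x); the result lands near [0, 0].
print(adam(lambda x: 2 * x, [3.0, -2.0], steps=5000))
```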
  • @Benforeva
The Related Work section showing up late in the paper is advice I've seen from CS researcher Simon Peyton Jones. He has popular Microsoft Research talks on YouTube describing this format. As you noted, it allows the reader to dive into your own original content as quickly as possible.
  • It's a nice video, and I would like to highlight a point that should also be examined, especially since the video compares the authors' publication counts only in the year the Adam paper appeared. The publications per author could also be evaluated for the years before the initial Adam publication. The question is how intensely they focused on this one project before it was published, and whether we can trace that through the number of papers they published beforehand. If they really focused on one big project, it should show up in their publication counts leading up to the Adam paper. I believe this would be another interesting aspect to look at.