The moment we stopped understanding AI [AlexNet]

Published 2024-07-01
Thanks to KiwiCo for sponsoring today's video! Go to www.kiwico.com/welchlabs and use code WELCHLABS for 50% off your first month of monthly lines and/or for 20% off your first Panda Crate.

Activation Atlas Posters!
www.welchlabs.com/resources/5gtnaauv6nb9lrhoz9cp60…
www.welchlabs.com/resources/activation-atlas-poste…
www.welchlabs.com/resources/large-activation-atlas…
www.welchlabs.com/resources/activation-atlas-poste…

Special thanks to the Patrons:
Juan Benet, Ross Hanson, Yan Babitski, AJ Englehardt, Alvin Khaled, Eduardo Barraza, Hitoshi Yamauchi, Jaewon Jung, Mrgoodlight, Shinichi Hayashi, Sid Sarasvati, Dominic Beaumont, Shannon Prater, Ubiquity Ventures, Matias Forti

Welch Labs
Ad free videos and exclusive perks: www.patreon.com/welchlabs
Watch on TikTok: www.tiktok.com/@welchlabs
Learn More or Contact: www.welchlabs.com/
Instagram: www.instagram.com/welchlabs
X: twitter.com/welchlabs

References
AlexNet Paper
proceedings.neurips.cc/paper_files/paper/2012/file…

Original Activation Atlas article (great interactive atlas, explore here): distill.pub/2019/activation-atlas/
Carter, et al., "Activation Atlas", Distill, 2019.

Feature Visualization Article: distill.pub/2017/feature-visualization/
Olah, et al., "Feature Visualization", Distill, 2017.

Great LLM Explainability work: transformer-circuits.pub/2024/scaling-monosemantic…
Templeton, et al., "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet", Transformer Circuits Thread, 2024.

Jason Yosinski's "Deep Visualization Toolbox" video inspired many visuals:
   • Deep Visualization Toolbox  

Great LLM/GPT Intro paper
arxiv.org/pdf/2304.10557

3B1B's GPT videos are excellent, as always:
   • Attention in transformers, visually e...  
   • But what is a GPT?  Visual intro to t...  

Andrej Karpathy's walkthrough is amazing:
   • Let's build GPT: from scratch, in cod...  

Goodfellow’s Deep Learning Book
www.deeplearningbook.org/

OpenAI’s 10,000 V100 GPU cluster (1+ exaflop): news.microsoft.com/source/features/innovation/open…

GPT-3 size, etc.: Language Models are Few-Shot Learners, Brown et al., 2020.

Unique token count for ChatGPT: cookbook.openai.com/examples/how_to_count_tokens_w…
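
The cookbook page above counts tokens with OpenAI's tiktoken library; a minimal sketch (the encoding name and sample text here are illustrative):

```python
# Count tokens with tiktoken, as the OpenAI cookbook describes.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-3.5/GPT-4-era models

text = "The moment we stopped understanding AI"
tokens = enc.encode(text)
print(len(tokens), tokens)   # token count and the token ids
print(enc.decode(tokens))    # round-trips back to the original text
print(enc.n_vocab)           # vocabulary size: the unique token count (~100k)
```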

GPT-4 training size, etc. (speculative):
patmcguinness.substack.com/p/gpt-4-details-reveale…
www.semianalysis.com/p/gpt-4-architecture-infrastr…

Historical Neural Network Videos
   • Convolutional Network Demo from 1989  
   • Perceptron Research from the 50's & 6...  

All Comments (21)
  • @EdgarVerona
    30 years ago, I used to work with an older guy who retired from IBM. I was barely out of high school, and he used to tell me that neural networks were going to change the world once people figured out how to train them properly. He didn't live to see his dream become reality unfortunately, but he was totally right.
  • @JustSayin24
    That real-time kernel activation map was life-changing. If, whilst editing these videos, you've ever questioned whether the vast amount of effort is worth what amounts to a brief, 10-second clip, just know that it's these moments which have stuck with me. Easy sub
  • @drhxa
    I've been in the field for 10 years and never had anyone describe this so clearly and visually. Brilliant, thank you!
  • Computers not being fast enough to make a correct algorithm practically usable reminds me of Reed–Solomon error correcting codes. They were developed in 1960 but computers were too slow for them to be practical. They went unused until 1982 when they were used in Compact Discs after computers had become fast enough.
  • Most people think AI is a brand-new technology, while in reality there have been studies on neural networks going all the way back to the 1940s. That's insane.
  • @emrahe468
    Amazing intro with scissors and cardboard 👏
  • @frostebyte
    I really appreciate how well you communicate non-verbally despite using very little A-roll. Your expressions are clear yet natural even while reading, enunciating, and employing tone, and there's no fluff; you have a neutral point for your hands to signal that there's no gesture to pay attention to. I couldn't find anything to critique in your vids if I tried, but this seemed like the easiest to overlook. Thanks for every absolute banger!
  • @somnvm37
    "one way to think about this vector, is as a point in 4096 dimentional space" give me a minute, I now gotta visualise a 4096 dimentional space in my head.
  • Awesome video! Funny how the moment we stopped understanding AI also appears to be the moment it started working lol
  • @ernestuz
    I was working with deep neural networks at the university during the late 90s. The main issue that stopped all progress was the kind of function used between layers (the sigmoid as activation function): it effectively stopped the learning signal from backpropagating past the last layers, limiting how many layers you could use (the problem is called the vanishing gradient). Once people rediscovered ReLU (it was invented in the early 70s, I believe, but I think the inventor published it in Japanese, so it went unnoticed), deep neural networks became possible. High computation needs were only a problem if you wanted real time or low latency; back then we used to leave the computer calculating overnight to get something the next day.
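
A small sketch of the vanishing-gradient effect described above, assuming the standard approximation that the backpropagated gradient scales like a product of per-layer activation-function derivatives (the pre-activation value is illustrative):

```python
# Compare how sigmoid vs. ReLU derivatives compound across layers.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.5                                   # arbitrary pre-activation value
sig_grad = sigmoid(x) * (1 - sigmoid(x))  # sigmoid derivative, at most 0.25
relu_grad = 1.0 if x > 0 else 0.0         # ReLU derivative on the active side

for n_layers in (5, 20, 50):
    print(n_layers, sig_grad ** n_layers, relu_grad ** n_layers)
# sigmoid's factor (~0.235 here) shrinks to ~1e-32 by 50 layers; ReLU's stays 1.0
```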
  • It's rare to find an AI video this informative and interesting. Great pacing great focus, this is wonderful. I'm a particular fan of the sort of stop-motion / sped-up physical manipulation of papers on your desk with that overhead lighting. Very clean and engaging effect. Seeing the face-detecting kernel emerge after so few blocks was also mind-blowing!
  • @optiphonic_
    Your visualisations helped a few concepts click for me around the layers and activations I've struggled to understand for years. Thanks!
  • Fun fact: the kernels used in vision models work pretty much the same way our retinas perceive objects. In a similar structure, our eyes have cells that perceive edges at certain angles, then shapes, then objects, in increasing abstraction.
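
A minimal sketch of such an oriented-edge detector, using a Sobel-style kernel (the image and values here are illustrative):

```python
# A convolution kernel that responds to vertical edges, like the
# oriented-edge detectors early in biological and artificial vision.
import numpy as np
from scipy.signal import convolve2d

# 5x5 image: dark left half, bright right half -> one vertical edge
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# Sobel-style kernel sensitive to left-to-right intensity change
kernel = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
], dtype=float)

response = convolve2d(image, kernel, mode="valid")
print(np.abs(response))  # strong responses where intensity changes, zero on flat regions
```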
  • @iccuwarn1781
    Fantastic presentation on the inner workings of machine learning!
  • @ben9089
    This was an incredible introduction in just 18 minutes. I continue to be blown away by this channel.
  • Great video. The only nitpick is with the title: we didn't stop understanding AI at AlexNet (and the video clearly shows that we've only been getting better at understanding it since that moment); rather, we finally had working "AI" starting from AlexNet. All those "expert handcrafted" AIs before were no simpler to understand (if not harder) despite being handcrafted, and they largely didn't work, and it was very hard to understand why.
  • The amount of work you must put into videos is mind boggling. Thank you for making them.