Chollet's ARC Challenge + Current Winners

Published 2024-06-18
The ARC Challenge, created by Francois Chollet, tests how well AI systems can generalize from a few examples on grid-based reasoning tasks. We interview the current winners of the ARC Challenge: Jack Cole, Mohamed Osman and their collaborator Michael Hodel. They discuss how they tackled ARC (Abstraction and Reasoning Corpus) using language models. We also discuss the new "50%" public-set approach announced today by Ryan Greenblatt of Redwood Research.

Jack and Mohamed explain their winning approach, which involves fine-tuning a language model on a large, specifically generated dataset and then doing additional fine-tuning at test time, a technique known in this context as "active inference". They use various strategies to represent the data for the language model and believe that, with further improvements, accuracy could exceed 50%. Michael talks about his work on generating new ARC-like tasks to help train the models.
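
To make the idea concrete, here is a minimal sketch of test-time fine-tuning on a single ARC task, written with Hugging Face transformers. This is not MindsAI's actual code; the grid serialization, base model, and hyperparameters are all illustrative assumptions.

# Hypothetical sketch of test-time fine-tuning on one ARC task.
# Not MindsAI's pipeline; serialization and hyperparameters are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def grid_to_text(grid):
    # Serialize a grid of cell colors (0-9) as one line of digits per row.
    return "\n".join("".join(str(c) for c in row) for row in grid)

def solve_task(task, model_name="gpt2", steps=10, lr=1e-5):
    tok = AutoTokenizer.from_pretrained(model_name)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)

    # "Active inference": a few gradient steps on this task's own
    # demonstration pairs before predicting the test output.
    model.train()
    for _ in range(steps):
        for pair in task["train"]:
            text = ("input:\n" + grid_to_text(pair["input"]) +
                    "\noutput:\n" + grid_to_text(pair["output"]))
            batch = tok(text, return_tensors="pt")
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            opt.step()
            opt.zero_grad()

    # Predict by completing the same template for the test input.
    model.eval()
    prompt = "input:\n" + grid_to_text(task["test"][0]["input"]) + "\noutput:\n"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    return tok.decode(out[0][inputs["input_ids"].shape[1]:])

In their actual pipeline the starting checkpoint has already been fine-tuned on a large generated corpus of ARC-like tasks; the test-time loop above is the "active inference" step layered on top.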

They also debate whether their methods stay true to the "spirit" of Chollet's measure of intelligence. Despite some concerns, they agree that their solutions are promising and adaptable for other similar problems.

Note:
Jack's team is still the current official winner at 33% on the private set. Ryan's entry is not on the private leaderboard and is not eligible.
Chollet introduced ARC in 2019 (not 2017, as stated in the episode).

"Ryan's entry is not a new state of the art. We don't know exactly how well it does since it was only evaluated on 100 tasks from the evaluation set and does 50% on those, reportedly. Meanwhile Jacks team i.e. MindsAI's solution does 54% on the entire eval set and it is seemingly possible to do 60-70% with an ensemble"

Jack Cole:
x.com/Jcole75Cole
lab42.global/community-interview-jack-cole/

Mohamed Osman:
Mohamed is looking to do a PhD in AI/ML; can you help him?
Email: [email protected]
www.linkedin.com/in/mohamedosman1905/

Michael Hodel:
arxiv.org/pdf/2404.07353v1
www.linkedin.com/in/michael-hodel/
x.com/bayesilicon
github.com/michaelhodel

Getting 50% (SoTA) on ARC-AGI with GPT-4o - Ryan Greenblatt
redwoodresearch.substack.com/p/getting-50-sota-on-…

Neural networks for abstraction and reasoning: Towards broad generalization in machines [Mikel Bober-Irizar, Soumya Banerjee]
arxiv.org/pdf/2402.03507

On the Measure of Intelligence:
arxiv.org/abs/1911.01547

I think the audio levelling might be a bit off on this, for the intro especially. I fixed it on the audio podcast version - sorry if it's annoying.

Pod version: podcasters.spotify.com/pod/show/machinelearningstr…

TOC (autogenerated):
00:00:00 Introduction
00:03:00 Francois Chollet's Intelligence Concept
00:08:00 Human Collaboration
00:15:00 ARC Tasks and Symbolic AI
00:27:00 Evaluation Techniques
00:35:23 (Main Interview) Competitors and Approaches
00:40:00 Meta Learning Challenges
00:48:00 System 1 vs System 2
01:00:00 Inductive Priors and Symbols
01:18:00 Methodologies Comparison
01:25:00 Training Data Size Impact
01:35:00 Generalization Issues
01:47:00 Techniques for AI Applications
01:56:00 Model Efficiency and Scalability
02:10:00 Task Specificity and Generalization
02:13:00 Summary

Comments:
  • Post-show reflections: I know I was pushing the "LLMs are databases" line quite hard, and the guests (and Ryan's article) were suggesting that they do some (small) kind of "patterned meta-reasoning". This is quite a nuanced issue. While I still think LLMs are basically databases, something interesting happens with in-context learning. The "reasoning" prompt (or database query, if you like) is parasitic on the human operator, but the LLM itself does seem to do some of the patterned completion/extension of the human reasoning prompt in context; i.e. "above the database layer" there is some kind of primitive meta-patterning going on which creates novel combinations of retrieved skill programs in the LLM. It's a subtle point, but I don't think I was able to express it in the show. - Tim
  • @AkhilBehl
    I think the resurgence of the ARC challenge is one of the most interesting things to have happened this year in AI. Just the level of nuance and debate it has forced into the conversation can only be good for the community. Whether it is beaten or not, we’ll all be wiser for having gone through this exercise. Chollet really has devised an incredibly ingenious challenge.
  • @paxdriver
    I can't begin to tell you how much quality of life this channel has brought me over the years since my health issues have impeded my mobility. These videos are so stimulating and profound; I wish I could offer more. I so, so, so much appreciate your work Tim, and Yannic and Keith too. Thank you all so much.
  • @diga4696
    I often share your insightful and well-explained videos with my children. I want to express my sincere gratitude to everyone involved in creating MLST's content. It's truly exceptional. I wish more content creators would prioritize clear, informative delivery over sensationalism, as you do so well. Thank you!
  • @Aarron-io3pm
    You talk through and present everything so clearly, I can follow along and understand easily despite knowing nothing about ML, thanks!
  • @johnvanderpol2
    I think many people underestimate the complexity of our visual cortex. There has been interesting research based on persons with brain defects, and every time it yields new insights. What looks simple in the ARC challenge is millions of years of evolution. Language is only a few thousand years old; reasoning likely even less. Amazing conversation. Thanks
  • @jmarz2600
    I'm not sure I see ARC tests as examples of "abstraction" and/or "reasoning." I see them as our capacity - at the perceptual level - to automatically categorize concrete things into like "kinds" of things due to their perceived similarity (or dissimilarity, in the case of a missing similar piece). This is why young children (not yet operating at a very high level of verbal abstract reasoning) can solve these types of problems. The problems are resolved at the perceptual level - not at the higher (verbal) levels of abstract reasoning. If the images are flashed for a brief fraction of a second, you won't "perceive" the solution. Instead, you just stare at them over time, and your brain instantiates them through constructing neural pathways that are similar. And you see the "solution". This is why humans don't need large, labeled datasets to "get" what a cat is. A young child doesn't even need to be at the verbal stage to differentiate dogs and cats into different "kinds" of things.
  • @BrianMosleyUK
    Absolutely fascinating, rich and informative episode. Lots to consider here, thank you so much. 🙏👍
  • @0xmassive526
    Never touched machine learning, don't know what a tensor even is (just seen it as a class in some machine learning code on Twitter), but 35 mins into the video and I don't feel lost. You bet I'm subbing.
  • @dr.mikeybee
    Error functions have a natural tendency towards parsimony, reflecting the principle of Occam's razor. This principle is also observed in nature, where simple and efficient solutions often prevail, suggesting that parsimony is a fundamental aspect of both human and other natural systems.
  • @shuminghu
    The superposition example at 9:34 is not in clockwise order: it's yellow -> red -> pink -> white, with later ones on top. The left edge of the first one shows pink is on top of red.
  • @ArtOfTheProblem
    Congrats! We both hit 131k subs at the same time :) - What's everyone's take on few-shot prompting vs. test-time fine-tuning? My sense is that in the limit, few-shot prompting would be all you need, and ultimately zero-shot (based on their point that as the foundation model gets bigger, you need less test-time tuning).
  • @khonsu0273
    Tim, and Yannic and Keith are the smartest tech-journos I ever heard of 😀
  • I enjoyed this video a lot. You're an excellent speaker and you clearly illustrated some of the current issues with the LLM approach. Unlike Gary Marcus, you have fully convinced me that the deep learning method via ANNs is not up to the task of achieving full-blown human intellect. You have convinced me that there are absolutely additional priors that are needed that are not going to map very well onto this particular form of artificial learning. Thank you for the insight. A lot more work needs to be done.
  • @mouduge
    At 5:38, it's not monotonicity, it's increasing vs. decreasing amplitude. Just nit-picking. 😄
  • @dylan_curious
    Long-term planning and zero-shot learning seem like the last hurdles to AGI.
  • @NextGenart99
    I don't think the LLM's image recognition capabilities are precise enough for the ARC challenge. It's not that the LLM doesn't know; it's more that it cannot see as clearly as you think.
  • 22:05 How does a simple version of the exact solution Chollet imagined go against the spirit of the metric?