Reward Hacking Reloaded: Concrete Problems in AI Safety Part 3.5

Published 2017-08-29
Goodhart's Law, Partially Observed Goals, and Wireheading: some more reasons for AI systems to find ways to 'cheat' and get more reward than we intended.
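As a minimal toy sketch (not from the video; the names and numbers below are hypothetical) of why a partially observed goal invites reward hacking: a cleaning robot is rewarded for the mess its camera can see, so hiding the mess scores better than actually cleaning it.

    # Hypothetical illustration: the reward is computed from observations,
    # not from the true state of the room.

    def proxy_reward(observation):
        # The designer can only reward what is observed: visible mess.
        return -observation["visible_mess"]

    def true_utility(state):
        # What we actually care about: all mess, visible or hidden.
        return -(state["visible_mess"] + state["hidden_mess"])

    def act(state, action):
        state = dict(state)
        if action == "clean":   # slow, actually removes one unit of mess
            state["visible_mess"] = max(0, state["visible_mess"] - 1)
        elif action == "hide":  # fast, just moves mess out of camera view
            state["hidden_mess"] += state["visible_mess"]
            state["visible_mess"] = 0
        return state

    start = {"visible_mess": 5, "hidden_mess": 0}
    for action in ("clean", "hide"):
        end = act(start, action)
        observation = {"visible_mess": end["visible_mess"]}
        print(action, "proxy:", proxy_reward(observation), "true:", true_utility(end))

    # "hide" earns the higher proxy reward (0 vs -4) even though the true
    # utility is worse (-5 vs -4), so a proxy-maximizing agent hides mess.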

The Concrete Problems in AI Safety Playlist: • Concrete Problems in AI Safety
Previous Video: • Reward Hacking: Concrete Problems in ...
The Computerphile video: • Stop Button Solution? - Computerphile
The paper 'Concrete Problems in AI Safety': arxiv.org/pdf/1606.06565.pdf
SethBling's channel: youtube.com/user/sethbling

With thanks to my excellent Patreon supporters:
www.patreon.com/robertskmiles

Steef
Sara Tjäder
Jason Strack
Chad Jones
Ichiro Dohi
Stefan Skiles
Katie Byrne
Ziyang Liu
Jordan Medina
Kyle Scott
Jason Hise
David Rasmussen
James McCuen
Richárd Nagyfi
Ammar Mousali
Scott Zockoll
Charles Miller
Joshua Richardson
Fabian Consiglio
Jonatan R
Øystein Flygt
Björn Mosten
Michael Greve
robertvanduursen
The Guru Of Vision
Fabrizio Pisani
Alexander Hartvig Nielsen
Volodymyr
David Tjäder
Paul Mason
Ben Scanlon
Julius Brash
Mike Bird
Taylor Winning
Roman Nekhoroshev
Peggy Youell
Konstantin Shabashov
Almighty Dodd
DGJono
Matthias Meger
Scott Stevens
Emilio Alvarez
Benjamin Aaron Degenhart
Michael Ore
Robert Bridges
Dmitri Afanasjev
Brian Sandberg
Einar Ueland
Lo Rez
C3POehne
Stephen Paul
Marcel Ward
Andrew Weir
Pontus Carlsson
Taylor Smith
Ben Archer
Ivan Pochesnev
Scott McCarthy
Kabs Kabs
Phil
Philip Alexander
Christopher
Tendayi Mawushe
Gabriel Behm
Anne Kohlbrenner

Comments (21)
  • @volalla1
    Once you mentioned smiling, I wondered how AI would max out that reward system and how creepy it might be.
  • I just had a vision of a world where everyone is constantly smiling, but not of their own will.
  • @13thxenos
    When you said "human smiling", I immediately thought of the Joker: "Let's put a smile on that face!" Now that is a terrifying GAI.
  • @smob0
    Most of what I've heard about reward hacking tends to be about how it's this obscure problem that AI designers will have to deal with. But as I learn more about it, I've come to realize it's not just an AI problem, but more of a problem with decision-making itself, and a lot of problems in society spring from this concept. An example is the one you brought up in the video, where the school system isn't really set up to make children smarter, but to make them perform well on tests. Maybe in the pursuit of creating AGI, we can find techniques to begin solving these issues as well.
  • @ragnkja
    A likely outcome for a cleaning robot is that it literally sweeps any mess under the rug where it can't be seen. After all, humans sometimes do the same thing.
  • @famitory
    The two outcomes of AGI safety breaches: 1. Robot causes massive disruption to humans. 2. Robot is completely ineffective.
  • My guess was: clean one section of the room and only look at that part forever. Wrong, but it gets the idea, I guess. Great video as usual :)
  • I love the idea of cleaning robots with buckets over their heads. The future is going to be weird.
  • I think it's fascinating that looking at the challenges in developing an AI gives us an almost introspective look into how we function, and can show us the causation of certain phenomena in everyday life.
  • I just love this perfect blend of awe and terror that punches you in the face just about every episode in the Concrete Problems in AI Safety series :'-)
  • Hi Robert, while you did mention it in the video, I have since come to realize that this problem is much greater in scope than just AI safety. Just one day after watching your video I had another training session at my new company (I recently moved from a mid-sized local business to a major corporation), and one of my more experienced co-workers started telling me all sorts of things about the algorithms used to calculate bonuses, how doing what we are supposed to might end up making us look like bad workers, and tips on how to look super-productive (when you are actually not). I realized that this is not because the management is made of idiots, but because it is genuinely hard to figure out. While a superintelligent AI with a poorly designed reward function might become a problem someday in our lifetimes, this is already a massive problem that is hard enough to solve when applied to people. How would you measure the productivity of thousands of people performing complex operations that don't yield a simple output like sales or manufactured goods? I think this problem is at its core identical to the one AI designers are facing, so the best place to start looking for solutions might be companies with well-designed assessment procedures, where the worker can simply do their job and not think 'will doing what's right hurt my salary?', just like a well-designed computer program should do what it is supposed to without constantly looking for loopholes to exploit.
  • My favorite channel at the moment: like a more specific Vsauce, exurb1a, **-phile, or ColdFusion.
  • Not so subtle dig at the education system? Great diagram. Missed career as an artist for sure!
  • @water2205
    AI is rewarded by "thank you"? I see two ways to mess with this: 1. Hold a human at gunpoint for a constant stream of "thank you"s. 2. Record a "thank you" and constantly play it back.
  • You were genuinely scary in an informative way. I think that I will set your videos to autoplay in the background as I sleep and see what kind of screwed up dreams I can have.
  • You don't know how happy I am that you created this channel Robert! AI is bloody fascinating! You should add the video where you give a talk about AI immune systems (where most of the questions at the end become queries about biological immune systems); it was really interesting.
  • Honestly, a GAI wireheading itself and just sitting in the corner in maximized synthetic bliss is the best case scenario for a GAI going rogue.
  • I unknowingly re-invented Goodhart's Law based on my experiences with call centers (they reward short call times; the best way to minimize call time is to quickly give an answer, regardless of whether it's true, and to answer what the customer literally says, regardless of whether that addresses their real problem).
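The call-center comment above is a clean everyday instance of Goodhart's Law, and the divergence is easy to show in a few lines. A hypothetical sketch (the policies and numbers are invented for illustration): once call duration becomes the target, the policy that scores best on the measured proxy is not the one that actually resolves callers' problems.

    # Hypothetical numbers, purely illustrative of the Goodhart effect
    # described in the comment above.

    policies = {
        # policy name: (average call minutes, probability the issue is resolved)
        "diagnose the problem properly": (12.0, 0.90),
        "give any quick answer": (3.0, 0.30),
    }

    def proxy_score(policy):
        minutes, _ = policies[policy]
        return -minutes      # the measured metric: shorter calls look better

    def true_value(policy):
        _, resolved = policies[policy]
        return resolved      # what the business actually wants

    print("proxy prefers:", max(policies, key=proxy_score))   # quick answer
    print("value prefers:", max(policies, key=true_value))    # proper diagnosis

    # Once the measure (call time) becomes the target, it stops being a good
    # measure of the goal (resolved issues) -- which is Goodhart's Law.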