Reward Hacking Reloaded: Concrete Problems in AI Safety Part 3.5

Published 2017-08-29
Goodhart's Law, Partially Observed Goals, and Wireheading: some more reasons for AI systems to find ways to 'cheat' and get more reward than we intended.
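As a minimal toy sketch (not from the video; the names and numbers below are hypothetical) of why a partially observed goal invites reward hacking: a cleaning robot is rewarded for the mess its camera can see, so hiding the mess scores better than actually cleaning it.

    # Hypothetical illustration: the reward is computed from observations,
    # not from the true state of the room.

    def proxy_reward(observation):
        # The designer can only reward what is observed: visible mess.
        return -observation["visible_mess"]

    def true_utility(state):
        # What we actually care about: all mess, visible or hidden.
        return -(state["visible_mess"] + state["hidden_mess"])

    def act(state, action):
        state = dict(state)
        if action == "clean":   # slow, actually removes one unit of mess
            state["visible_mess"] = max(0, state["visible_mess"] - 1)
        elif action == "hide":  # fast, just moves mess out of camera view
            state["hidden_mess"] += state["visible_mess"]
            state["visible_mess"] = 0
        return state

    start = {"visible_mess": 5, "hidden_mess": 0}
    for action in ("clean", "hide"):
        end = act(start, action)
        observation = {"visible_mess": end["visible_mess"]}
        print(action, "proxy:", proxy_reward(observation), "true:", true_utility(end))

    # "hide" earns the higher proxy reward (0 vs -4) even though the true
    # utility is worse (-5 vs -4), so a proxy-maximizing agent hides mess.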

The Concrete Problems in AI Safety Playlist: • Concrete Problems in AI Safety
Previous Video: • Reward Hacking: Concrete Problems in ...
The Computerphile video: • Stop Button Solution? - Computerphile
The paper 'Concrete Problems in AI Safety': arxiv.org/pdf/1606.06565.pdf
SethBling's channel: youtube.com/user/sethbling

With thanks to my excellent Patreon supporters:
www.patreon.com/robertskmiles

Steef
Sara Tjäder
Jason Strack
Chad Jones
Ichiro Dohi
Stefan Skiles
Katie Byrne
Ziyang Liu
Jordan Medina
Kyle Scott
Jason Hise
David Rasmussen
James McCuen
Richárd Nagyfi
Ammar Mousali
Scott Zockoll
Charles Miller
Joshua Richardson
Fabian Consiglio
Jonatan R
Øystein Flygt
Björn Mosten
Michael Greve
robertvanduursen
The Guru Of Vision
Fabrizio Pisani
Alexander Hartvig Nielsen
Volodymyr
David Tjäder
Paul Mason
Ben Scanlon
Julius Brash
Mike Bird
Taylor Winning
Roman Nekhoroshev
Peggy Youell
Konstantin Shabashov
Almighty Dodd
DGJono
Matthias Meger
Scott Stevens
Emilio Alvarez
Benjamin Aaron Degenhart
Michael Ore
Robert Bridges
Dmitri Afanasjev
Brian Sandberg
Einar Ueland
Lo Rez
C3POehne
Stephen Paul
Marcel Ward
Andrew Weir
Pontus Carlsson
Taylor Smith
Ben Archer
Ivan Pochesnev
Scott McCarthy
Kabs Kabs
Phil
Philip Alexander
Christopher
Tendayi Mawushe
Gabriel Behm
Anne Kohlbrenner

Comments (21)
  • @volalla1
    Once you mentioned smiling, I wondered how AI would max out that reward system and how creepy it might be.
  • I just had a vision of a world where everyone is constantly smiling, but not of their own will.
  • @13thxenos
    When you said "human smiling", I immediately thought of the Joker: "Let's put a smile on that face!" Now that is a terrifying GAI.
  • @smob0
    Most of what I've heard about reward hacking tends to be about how it's this obscure problem that AI designers will have to deal with. But as I learn more about it, I've come to realize it's not just an AI problem, but more of a problem with decision-making itself, and a lot of problems in society spring from this concept. An example is the one you brought up in the video, where the school system isn't really set up to make children smarter, but to make them perform well on tests. Maybe in the pursuit of creating AGI, we can find techniques to begin solving these issues as well.
  • @ragnkja
    A likely outcome for a cleaning robot is that it literally sweeps any mess under the rug where it can't be seen. After all, humans sometimes do the same thing.
  • @famitory
    The two outcomes of AGI safety breaches: 1. Robot causes massive disruption to humans. 2. Robot is completely ineffective.
  • My guess was: clean one section of the room and only look at that part forever. Wrong, but it gets the idea, I guess. Great video as usual :)
  • I love the idea of cleaning robots with buckets over their heads. The future is going to be weird.
  • I think it's fascinating that looking at the challenges in developing an AI gives us an almost introspective look into how we function, and can show us the causation of certain phenomena in everyday life.
  • I just love this perfect blend of awe and terror that punches you in the face just about every episode in the Concrete Problems in AI Safety series :'-)
  • Hi Robert, while you did mention it in the video, I have since come to realize that this problem is much greater in scope than just AI safety. Just one day after watching your video I had another training session at my new company (I recently moved from a mid-sized local business to a major corporation), and one of my more experienced co-workers started telling me all sorts of things about the algorithms used to calculate bonuses, how doing what we are supposed to might end up making us look like bad workers, and tips on how to look super-productive (when you are actually not). I realized that this is not because the management is made of idiots, but because it is genuinely hard to figure out. While a superintelligent AI with a poorly designed reward function might become a problem someday in our lifetimes, this is already a massive problem that is hard enough to solve when applied to people. How would you measure the productivity of thousands of people performing complex operations that don't yield a simple output like sales or manufactured goods? I think this problem is at its core identical to the one AI designers are facing, so the best place to start looking for solutions might be companies with well-designed assessment procedures, where the worker can simply do their job and not think 'will doing what's right hurt my salary?', just like a well-designed computer program should do what it is supposed to without constantly looking for loopholes to exploit.
  • My favorite channel at the moment: like a more specific Vsauce, exurb1a, **-phile, or ColdFusion.
  • Not so subtle dig at the education system? Great diagram. Missed career as an artist for sure!
  • @water2205
    AI is rewarded by "thank you"? I see two ways to mess with this: 1. Hold a human at gunpoint for a constant stream of "thank you"s. 2. Record a "thank you" and constantly play it back.
  • You were genuinely scary in an informative way. I think that I will set your videos to autoplay in the background as I sleep and see what kind of screwed up dreams I can have.
  • You don't know how happy I am that you created this channel Robert! AI is bloody fascinating! You should add the video where you give a talk about AI immune systems (where most of the questions at the end become queries about biological immune systems); it was really interesting.
  • Honestly, a GAI wireheading itself and just sitting in the corner in maximized synthetic bliss is the best case scenario for a GAI going rogue.
  • I unknowingly re-invented Goodhart's Law based on my experiences with call centers (they reward short call times; the best way to minimize call time is to quickly give an answer, regardless of whether it's true, and to answer what the customer literally says, regardless of whether that addresses their real problem).
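The call-center comment above is a clean everyday instance of Goodhart's Law, and the divergence is easy to show in a few lines. A hypothetical sketch (the policies and numbers are invented for illustration): once call duration becomes the target, the policy that scores best on the measured proxy is not the one that actually resolves callers' problems.

    # Hypothetical numbers, purely illustrative of the Goodhart effect
    # described in the comment above.

    policies = {
        # policy name: (average call minutes, probability the issue is resolved)
        "diagnose the problem properly": (12.0, 0.90),
        "give any quick answer": (3.0, 0.30),
    }

    def proxy_score(policy):
        minutes, _ = policies[policy]
        return -minutes      # the measured metric: shorter calls look better

    def true_value(policy):
        _, resolved = policies[policy]
        return resolved      # what the business actually wants

    print("proxy prefers:", max(policies, key=proxy_score))   # quick answer
    print("value prefers:", max(policies, key=true_value))    # proper diagnosis

    # Once the measure (call time) becomes the target, it stops being a good
    # measure of the goal (resolved issues) -- which is Goodhart's Law.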