DeepMind and OpenAI's "Learning from Human Preferences"

So apparently either nobody's spotted this or nobody thought it would be interesting to post here, but DeepMind and OpenAI have jointly developed a system for training an AI on rewards learned from human preferences, while cutting down on the amount of time humans have to spend giving feedback.

I feel like if we can manage to implement this for use in making weaker – or even teaching – bots for go, then beginners who are too afraid to face other humans could have a much nicer time improving and going up the ranks.

But probably the most interesting part is just how much it seems to expand what AI could do. Wanna have a bot that plays more for influence than a normal bot? More for territory? Maybe you wanna make it look more like a pro than usual (or maybe more like you). These all seem much more possible now that we have a system that takes a small amount of human input and learns to replicate it, so that the input goes much further.
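For anyone curious how "learning from preferences" can work in principle, here's a rough toy sketch, not the actual DeepMind/OpenAI code. As I understand it, the idea is that a human compares two short clips of behaviour, and a reward model is fit so that the clip the human preferred gets the higher total predicted reward. Everything below is my own assumption for illustration: the linear reward model (the real system uses a neural network and then trains an agent against it), the function names, and the made-up two-number "influence vs. territory" features per move.

```python
import math


def segment_return(weights, segment):
    """Sum the predicted (linear) reward over every step in a segment."""
    return sum(sum(w * x for w, x in zip(weights, step)) for step in segment)


def preference_probability(weights, seg_a, seg_b):
    """Probability that seg_a is preferred over seg_b (logistic in the return gap)."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    # Numerically stable sigmoid.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def train_reward_model(comparisons, n_features, lr=0.1, epochs=200):
    """Fit reward weights by gradient descent on the cross-entropy loss.

    comparisons: list of (seg_a, seg_b, label) where label is 1.0 if the
    human preferred seg_a and 0.0 if they preferred seg_b.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for seg_a, seg_b, label in comparisons:
            p = preference_probability(weights, seg_a, seg_b)
            for i in range(n_features):
                # Difference in summed feature i between the two segments.
                feat_diff = sum(s[i] for s in seg_a) - sum(s[i] for s in seg_b)
                weights[i] -= lr * (p - label) * feat_diff
    return weights


# Hypothetical features per move: [influence-ish, territory-ish].
# This imaginary human always prefers the influence-oriented clip.
comparisons = [
    ([[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]], 1.0),
    ([[0.0, 1.0]], [[1.0, 0.0]], 0.0),
]
weights = train_reward_model(comparisons, n_features=2)
print(weights)  # the influence weight ends up larger than the territory weight
```

The point is just that a handful of "I like this one better" answers is enough to shape a reward signal, which a bot could then be trained to maximise.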

Of course, for those of us who are less AI-savvy, or would just rather watch a video on the topic than read the actual blog post or paper, there’s a nice video on it here:


This is very cool. I had never heard about this aspect of AI before. Thank you very much for sharing this :smiley: