Speaker: Tilman Hartwig

Title: Be careful what you wish for: Reward Modelling in AI

A computer program will do what you say and not what you mean. In the
everyday life of an astronomer, this can, for example, lead to poor
fitting results, which can be spotted and corrected with human
intuition. However, this problem becomes more serious for artificial
intelligence with explicit, human-designed reward functions. I will
present examples from various scientific domains where an AI naively
exploits the reward function, which leads to undesired behaviour.
Finally, I will present Reward Modelling as a novel solution to this
problem of specification learning.

Last-modified: 2020-11-19 (Thu) 16:26:56