Concrete AI safety problems

2016-06-22

Artificial Intelligence > Machine Learning

The “Concrete AI safety problems” paper by _Dario Amodei (Google Brain), Chris Olah (Google Brain), Jacob Steinhardt (Stanford University), Paul Christiano (UC Berkeley), John Schulman (OpenAI), Dan Mané (Google Brain) _suggests a new approach to the Machine Learning (ML) and Artificial Intelligence (AI) research, which focuses more on productivity of forward-looking applications while building cutting-edge AI systems.

A number of key problems considered in the paper as well as their descriptions are listed below. In the paper authors also elaborate on how to approach each of the given problem.

Avoiding Negative Side Eﬀects: How can we ensure that our cleaning robot will not

disturb the environment in negative ways while pursuing its goals, e.g. by knocking over a

vase because it can clean faster by doing so? Can we do this without manually specifying

everything the robot should not disturb?
- Avoiding Reward Hacking: How can we ensure that the cleaning robot won’t game its
reward function? For example, if we reward the robot for achieving an environment free of

messes, it might disable its vision so that it won’t ﬁnd any messes, or cover over messes with

materials it can’t see through, or simply hide when humans are around so they can’t tell it

about new types of messes.
- Scalable Oversight: How can we eﬃciently ensure that the cleaning robot respects aspects of
  
  the objective that are too expensive to be frequently evaluated during training? For instance, it
  
  should throw out things that are unlikely to belong to anyone, but put aside things that might
  
  belong to someone (it should handle stray candy wrappers diﬀerently from stray cellphones).
  
  Asking the humans involved whether they lost anything can serve as a check on this, but this
  
  check might have to be relatively infrequent – can the robot ﬁnd a way to do the right thing
  
  despite limited information?
- Safe Exploration: How do we ensure that the cleaning robot doesn’t make exploratory
  
  moves with very bad repercussions? For example, the robot should experiment with mopping
  
  strategies, but putting a wet mop in an electrical outlet is a very bad idea.
- Robustness to Distributional Shift: How do we ensure that the cleaning robot recognizes,
  
  and behaves robustly, when in an environment diﬀerent from its training environment? For
  
  example, heuristics it learned for cleaning factory workﬂoors may be outright dangerous in an
  
  oﬃce.

Source: Concrete Problems in AI Safety

Contents