Summarizing https://arxiv.org/pdf/1906.01820.pdf

Here's my try:

This paper introduces the concept of mesa-optimization, which occurs when a learned model is itself an optimizer. The authors identify two key problems related to mesa-optimization: unintended optimization and inner alignment. They also distinguish between the two different, independent alignment problems that appear in the case of mesa-optimization. A mesa-optimizer need not be robustly aligned with the base optimizer that created it; it may be only pseudo-aligned, meaning it looks aligned on the training data without being robustly aligned. The paper seeks to understand what sorts of machine learning systems are likely to exhibit mesa-optimization and what sorts are not.

Furthermore, the authors' analysis suggests that a time complexity penalty (as opposed to a description length penalty) is a double-edged sword. In Section 2, they suggest that penalizing time complexity might reduce the likelihood of mesa-optimization. However, such a penalty would also promote pseudo-alignment in those cases where mesa-optimizers do arise. If the cost of fully modeling the base objective in the mesa-optimizer is large, then a pseudo-aligned mesa-optimizer might be preferred simply because it reduces time complexity, even if it would result in suboptimal plans for the base objective.
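To make the time complexity trade-off concrete for myself, here is a toy sketch in Python. It is not from the paper: the `Candidate` class, the `select` function, the additive scoring rule, and every number are my own assumptions, chosen only to show how a penalty on compute could tip selection toward a cheaper, pseudo-aligned model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical learned model considered by the base optimizer."""
    name: str
    train_loss: float  # loss on the base objective (made-up value)
    time_cost: float   # compute needed per decision (made-up value)


def select(candidates, time_weight):
    """Pick the candidate minimizing train_loss + time_weight * time_cost.

    The additive score is an assumption for illustration, not the paper's
    formalism.
    """
    return min(candidates, key=lambda c: c.train_loss + time_weight * c.time_cost)


# Assumed scenario: fully modeling the base objective is expensive, so the
# robustly aligned model is slower but slightly better on the base objective.
CANDIDATES = [
    Candidate("robustly_aligned", train_loss=0.10, time_cost=9.0),
    Candidate("pseudo_aligned", train_loss=0.12, time_cost=3.0),
]

without_penalty = select(CANDIDATES, time_weight=0.0)
with_penalty = select(CANDIDATES, time_weight=0.01)

print(without_penalty.name)  # robustly_aligned (scores 0.10 vs 0.12)
print(with_penalty.name)     # pseudo_aligned (scores 0.19 vs 0.15)
```

In this toy setup the pseudo-aligned model wins under the penalty even though it does slightly worse on the base objective, which is the double-edged effect described above. The result depends entirely on the assumed numbers; a smaller cost gap or a smaller weight would flip it back.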
