⚡️🤖 NEW - Anthropic researchers teach language models to fine-tune themselves
Anthropic and partners have developed a new method called “Internal Coherence Maximization” (ICM) that lets pretrained language models fine-tune themselves on their own generated labels, without any human supervision.
The model evaluates the consistency of its own responses and improves itself by comparing inconsistent statements and correcting them. On benchmarks such as TruthfulQA and GSM8K, ICM matched or exceeded models trained with conventional human-supervised fine-tuning. Strikingly, ICM-trained models also often scored better on subjective criteria such as helpfulness.
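For intuition, here is a minimal Python sketch of the kind of search the paper describes: a candidate labeling of the model's own training examples is scored by mutual predictability (how well the model predicts each label from all the others) minus a penalty for logically inconsistent label pairs, and the labeling is refined with a simulated-annealing-style search. The helpers `model_logprob` and `inconsistent`, the weight `alpha`, and the temperature schedule are illustrative assumptions, not values from the paper.

```python
import math
import random

def mutual_predictability(examples, labels, model_logprob):
    """Sum of log-probs the model assigns each label given all the others."""
    total = 0.0
    for i, (x, y) in enumerate(zip(examples, labels)):
        context = [(examples[j], labels[j]) for j in range(len(examples)) if j != i]
        total += model_logprob(x, y, context)
    return total

def num_inconsistencies(examples, labels, inconsistent):
    """Count label pairs that fail a task-specific logical-consistency check."""
    n = len(examples)
    return sum(
        inconsistent(examples[i], labels[i], examples[j], labels[j])
        for i in range(n)
        for j in range(i + 1, n)
    )

def icm_search(examples, model_logprob, inconsistent,
               alpha=50.0, steps=1000, t0=10.0, t_min=0.01, cooling=0.99):
    """Simulated-annealing search over labelings (naive O(n^2) sketch)."""
    labels = [random.choice([0, 1]) for _ in examples]

    def score(ls):
        return (alpha * mutual_predictability(examples, ls, model_logprob)
                - num_inconsistencies(examples, ls, inconsistent))

    current, temp = score(labels), t0
    for _ in range(steps):
        candidate = labels.copy()
        i = random.randrange(len(candidate))
        candidate[i] = 1 - candidate[i]  # flip one label and rescore
        cand = score(candidate)
        # Accept improvements always; accept regressions with decaying probability.
        if cand > current or random.random() < math.exp((cand - current) / temp):
            labels, current = candidate, cand
        temp = max(t_min, temp * cooling)
    return labels
```

The resulting labels can then be used as fine-tuning targets in place of human annotations.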
ICM can also be used to train a capable reward model for reinforcement learning. The method has limits, though: it struggles with concepts the model has not already internalized and with very long text inputs. Still, the results suggest ICM is a promising step toward more autonomous and consistent AI systems.
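The reward-model claim can be pictured the same way: ICM-chosen preference labels stand in for human preference data. The sketch below fits a toy pairwise reward head with the standard Bradley-Terry loss on such labels; the encoder `embed`, the hidden size, and the hyperparameters are assumptions for illustration, not the paper's actual setup.

```python
import torch
import torch.nn as nn

class RewardHead(nn.Module):
    """Maps a response embedding to a scalar reward."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x).squeeze(-1)

def train_reward_model(pairs, embed, dim=768, epochs=3, lr=1e-4):
    """pairs: list of (chosen_text, rejected_text) labeled by ICM.
    embed: assumed text-to-tensor encoder returning shape (dim,)."""
    model = RewardHead(dim)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for chosen, rejected in pairs:
            r_chosen = model(embed(chosen))
            r_rejected = model(embed(rejected))
            # Bradley-Terry: maximize P(chosen preferred) = sigmoid(r_c - r_r).
            loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

A reward model trained this way can then score policy outputs during reinforcement learning, just as one trained on human preferences would.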
