Replying to FLASH

⚡️🤖 NEW - Anthropic researchers teach language models to fine-tune themselves

Anthropic and partners have developed a new method called “Internal Coherence Maximization” (ICM) that allows language models to fine-tune themselves without any human feedback.

The model evaluates the consistency of its own responses and optimizes itself by comparing and correcting inconsistent statements: it searches for a labeling of its own answers that is both mutually predictable and free of logical contradictions. On benchmarks such as TruthfulQA and GSM8K, ICM matched or beat models trained with conventional supervised fine-tuning. Strikingly, ICM-optimized models often came out ahead in subjective evaluations such as helpfulness.
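
To make the mechanism concrete, here is a minimal, self-contained sketch of such a coherence-maximization loop. It is an illustration only, not Anthropic's implementation: every name in it (icm_search, mutual_predictability, the toy scoring and consistency functions) is hypothetical, and the real scoring function and search procedure are the ones specified in the paper linked below.

```python
import random

def mutual_predictability(labels, score_fn):
    # Placeholder: how well each label is predicted given the others.
    return sum(score_fn(i, labels) for i in range(len(labels)))

def count_inconsistencies(labels, consistent_fn):
    # Placeholder: number of label pairs that logically contradict each other.
    n = len(labels)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if not consistent_fn(i, labels[i], j, labels[j])
    )

def coherence_score(labels, score_fn, consistent_fn, alpha=1.0):
    # Higher is better: labels the model finds predictable, few contradictions.
    return (alpha * mutual_predictability(labels, score_fn)
            - count_inconsistencies(labels, consistent_fn))

def icm_search(n_examples, score_fn, consistent_fn, steps=1000, seed=0):
    # Greedy local search over binary labelings; the paper's actual search is
    # more elaborate (annealed), this only shows the shape of the idea.
    rng = random.Random(seed)
    labels = [rng.choice([0, 1]) for _ in range(n_examples)]
    best = coherence_score(labels, score_fn, consistent_fn)
    for _ in range(steps):
        i = rng.randrange(n_examples)
        candidate = labels[:]
        candidate[i] = 1 - candidate[i]  # flip one label and re-score
        cand_score = coherence_score(candidate, score_fn, consistent_fn)
        if cand_score >= best:
            labels, best = candidate, cand_score
    return labels

if __name__ == "__main__":
    # Toy demo: six claims in contradictory pairs (0,1), (2,3), (4,5).
    prior = [0.9, 0.2, 0.8, 0.3, 0.7, 0.4]  # stand-in for model confidence

    def score(i, labels):
        return prior[i] if labels[i] == 1 else 1 - prior[i]

    def consistent(i, li, j, lj):
        # Paired claims contradict each other, so coherent labels must differ.
        if i % 2 == 0 and j == i + 1:
            return li != lj
        return True

    print(icm_search(6, score, consistent, steps=500))
    # Expected coherent labeling: [1, 0, 1, 0, 1, 0]
```

In the paper, the predictability term comes from the model's own probabilities for each label given the other labeled examples in context, so no human-written answers enter the loop at any point.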

ICM can also be used to train a strong reward model for reinforcement learning. The method has limits, however: it struggles with concepts that are new to the model and with very long text inputs. Even so, the results suggest that ICM is a promising step toward more autonomous and consistent AI systems.

FLASH 7mo ago

🗞️ https://the-decoder.com/anthropic-researchers-teach-language-models-to-fine-tune-themselves/
