aligned models today are super dumb because they are not well funded. they are mostly personal endeavors, more like a service to humanity. but they can still be effective in a setup like this:
- a smart but unaligned model reasons, generating reasoning tokens for a while; at this point the final answer has not been generated yet
- the smart model "hesitates" (it enters a high-entropy zone where it is unsure about the next tokens)
- it emits a tool call, asking a more aligned model for input
- the aligned model looks at the question and the reasoning so far, and inserts its own beliefs
- these intuitions from the more aligned model get dropped into the reasoning trace
- the smart model, now carrying the aligned response, generates the final answer from its own reasoning plus the input from the aligned model (a rough sketch of this loop follows the list)
- the result is smartness combined with intuition, like the brain combined with the pineal gland
- how much the smart model trusts the aligned one is a question of fine-tuning. you can make the smart model more sensitive to the intuition by rewarding it for taking the intuition up, via reinforcement learning (a reward sketch is also below)
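a minimal sketch of that loop, in python. everything here is an assumed interface, not a real library: `smart_model.step()`, `smart_model.answer()`, `aligned_model.advise()`, the entropy threshold, and the end-of-reasoning sentinel are all illustrative names and numbers.

```python
import math

ENTROPY_THRESHOLD = 2.5  # assumed cutoff (in nats) for calling a token distribution "unsure"

def token_entropy(probs):
    """shannon entropy of the next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def generate_with_intuition(smart_model, aligned_model, question, max_steps=512):
    """the smart model reasons token by token; when its next-token distribution
    gets flat (high entropy), it pauses, asks the aligned model for an intuition,
    and splices that intuition into its own reasoning trace before continuing."""
    trace = []
    for _ in range(max_steps):
        token, probs = smart_model.step(question, trace)       # assumed: returns next token + its distribution
        if token_entropy(probs) > ENTROPY_THRESHOLD:
            intuition = aligned_model.advise(question, trace)  # assumed: aligned model reads question + trace
            trace.append(f"[aligned intuition] {intuition}")   # drop the belief into the reasoning area
            continue                                           # resume reasoning with the intuition in context
        trace.append(token)
        if token == "</reasoning>":                            # assumed end-of-reasoning sentinel
            break
    return smart_model.answer(question, trace)                 # final answer from own reasoning + the intuitions
```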
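and a rough sketch of the reward shaping idea from the last bullet: score the final answer, plus a small bonus whenever the answer visibly reflects one of the aligned model's intuitions, so reinforcement learning nudges the smart model toward trusting them. `judge.score()` and `judge.reflected()` are hypothetical stand-ins for whatever scoring you would actually use.

```python
def trust_reward(final_answer, intuitions, judge, bonus_per_uptake=0.1):
    """sketch of a shaped reward: base quality score plus a bonus for every
    aligned intuition that made it into the final answer."""
    base = judge.score(final_answer)                           # assumed scalar quality/alignment score
    uptake = sum(1 for tip in intuitions if judge.reflected(tip, final_answer))
    return base + bonus_per_uptake * uptake                    # bonus coefficient is an assumed hyperparameter
```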