having bad LLMs around could actually help us find truth faster. the reinforcement signal could be: "take what a proper model says and negate what a bad LLM says". with those two wings, convergence should be faster!
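a minimal sketch of the idea in python, assuming we have per-token logits from both models (this is close in spirit to contrastive decoding, where an expert's logits are offset by an amateur's; the `alpha` weight and the toy arrays here are hypothetical):

```python
import numpy as np

def contrastive_next_token(good_logits: np.ndarray,
                           bad_logits: np.ndarray,
                           alpha: float = 1.0) -> int:
    """Pick the next token by rewarding what the good model likes
    and penalizing what the bad model likes."""
    # subtracting the bad model's logits "negates" its preferences
    combined = good_logits - alpha * bad_logits
    return int(np.argmax(combined))

# toy vocabulary of 4 tokens (hypothetical numbers)
good = np.array([2.0, 1.0, 0.5, 0.1])  # good model slightly prefers token 0
bad = np.array([1.8, 0.2, 0.3, 0.1])   # bad model strongly prefers token 0
print(contrastive_next_token(good, bad))  # -> 1: token 1 wins once the bad model's bias is subtracted
```

the toy example shows the point of the second wing: the good model alone would pick token 0, but the bad model's enthusiasm for token 0 flags it as suspect, and the combined signal lands elsewhere.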