I think it would always end in a stalemate, but you're on to something. I would like to see them debate a topic they both have programmed biases on, such as some issue involving China or Israel.

Discussion

Most LLM benchmarks are designed with specific targets in mind, such as coding or language understanding. However, I believe the time is ripe for cross-model challenges as well. I am curious whether anyone has already explored or implemented this approach.