Most LLM benchmarks are designed with a specific target in mind, such as coding or language understanding. However, I believe the time is ripe for cross-model challenges as well, and I am curious whether anyone has already explored or implemented this approach.