Are there good tools to make benchmark testing easy? I’m trying to test with MMAU but kinda wish it was easier than copy/pasting from the GitHub repo and getting ChatGPT to build a script for me.
It’s taking a lot of back-and-forth: grabbing sample data from Hugging Face, reviewing the script’s output to make sure it’s formatted correctly, etc…

Source: x.com/yoheinakajima/status/1848058566571393105