I've used Mixtral (the quantized Dolphin fork) to turn subtitle files into essay-style documents. It works perfectly until a file exceeds its token limit, at which point it either summarizes or plain errors out. The biggest version of Mixtral you can run might do a good job of understanding .vtt files (renamed to .txt if it refuses the original file).
Speaking of which, the problem might not be Llama but a token limit. If you feed it a file with more tokens than the model will accept, you can get really subpar results if you expect it to understand the whole file.
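If it is the token limit, one workaround is to strip the subtitle timing lines and feed the text in pieces. Here is a rough sketch of that idea, not a tested pipeline. The function names are made up for illustration, and the "4 characters per token" figure is only a rule of thumb I'm assuming; the real count depends on the model's tokenizer, so leave yourself some headroom.

```python
import re

# Assumption: roughly 4 characters per token. Real tokenizers vary by model,
# so treat this as a coarse estimate and keep max_tokens well under the limit.
CHARS_PER_TOKEN = 4

# Matches cue timing lines such as "00:00:01.000 --> 00:00:04.000"
# (the hours part is optional in WebVTT).
TIMING = re.compile(r"^\d{2,}:\d{2}(:\d{2})?\.\d{3}\s+-->\s+")


def vtt_to_lines(vtt_text):
    """Keep only spoken text: drop the WEBVTT header, cue numbers, timing lines."""
    lines = []
    for raw in vtt_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("WEBVTT") or line.isdigit() or TIMING.match(line):
            continue
        lines.append(line)
    return lines


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def chunk_lines(lines, max_tokens):
    """Group lines into chunks whose estimated size stays within max_tokens.

    A single line longer than max_tokens still becomes its own chunk.
    """
    chunks, current, used = [], [], 0
    for line in lines:
        cost = estimate_tokens(line)
        if current and used + cost > max_tokens:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("\n".join(current))
    return chunks
```

You would then send each chunk to the model separately (for example, asking it to rewrite that part as prose) and stitch the results together yourself. This simple version does not handle NOTE blocks or styling tags in .vtt files.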