Not on the actual video file but on the VTT files, which are essentially text files with timestamps and content).
Discussion
I think you could convert the VTT file to a .txt file, load that into Msty desktop as a knowledge stack, and then query it. I can try it when I get home.
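In case it helps, here is a minimal sketch of the VTT-to-text step in Python. It assumes a fairly standard WebVTT layout (a header, optional cue numbers, timing lines containing `-->`, then the caption text); the function name `vtt_to_text` is just something I made up, and real files may have quirks this does not cover.

```python
import re

# Cue timing lines, e.g. "00:00:01.000 --> 00:00:04.000" (the hours part is optional).
TIMING = re.compile(
    r"^\s*(?:\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}\.\d{3}"
)
# Inline markup such as <v Speaker>, <c>, or <00:00:01.500>.
TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Return only the caption text from a WebVTT string (hypothetical helper)."""
    lines_out = []
    in_cue = False  # True only between a timing line and the next blank line
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line:
            in_cue = False
            continue
        if TIMING.match(line):
            in_cue = True
            continue
        if not in_cue:
            # Header, NOTE/STYLE blocks, and cue identifiers are skipped.
            continue
        text = TAG.sub("", line).strip()
        # Auto-generated captions often repeat a line across cues; drop
        # consecutive duplicates.
        if text and (not lines_out or lines_out[-1] != text):
            lines_out.append(text)
    return "\n".join(lines_out)
```

To use it on a file, read the file into a string (for example with `pathlib.Path(...).read_text(encoding="utf-8")`), pass it to `vtt_to_text`, and write the result to a .txt file. Note that stripping the `<v ...>` tags also drops speaker names, which may or may not be what you want.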
I cleaned up the VTT file by removing the timestamps and the author, but it seems that Llama3.1-8B cannot handle a 100 KB file; it appears to be too much for its 128K-token context window.
It's a shame.
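One possible workaround is to estimate how many tokens the cleaned transcript takes up and split it into smaller pieces that can be queried one at a time. The sketch below is only a rough approximation: it assumes about 4 characters per token, which is a common rule of thumb for English text and not the actual Llama tokenizer, and the names `estimate_tokens` and `chunk_text` and the 2000-token default are my own choices.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; chars_per_token is an assumed average."""
    return int(len(text) / chars_per_token)


def chunk_text(
    text: str, max_tokens: int = 2000, chars_per_token: float = 4.0
) -> list[str]:
    """Split text on line boundaries into chunks of roughly max_tokens each."""
    budget = int(max_tokens * chars_per_token)  # character budget per chunk
    chunks, current, size = [], [], 0
    for line in text.splitlines():
        cost = len(line) + 1  # +1 for the newline that joins the lines
        if current and size + cost > budget:
            chunks.append("\n".join(current))
            current, size = [], 0
        # A single line longer than the budget still gets its own chunk.
        current.append(line)
        size += cost
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Under that 4-characters-per-token assumption, a 100 KB file of plain English comes out at roughly 25K tokens, so it may also be worth checking what context length the app is actually configured to use.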