Nostr Web Client

Replying to

daniele

What's the best AI model for analyzing video transcriptions (VTT files)?

I tried Llama3.1-8B with AnythingLLM, but the results are really bad.

By comparison, Claude understands the content perfectly.

#asknostr

Nate 5mo ago

I've used Mixtral (the quantized Dolphin fork) to turn subtitle files into essay-style documents. It works perfectly until you hit files that exceed its token limit; then it either summarizes or errors out entirely. The biggest version of Mixtral you can run might do a good job of understanding VTT files (rename them to .txt if it refuses the original file).
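Converting a VTT file to plain text first, rather than just renaming it, strips the cue timings and markup that would otherwise take up part of the model's token budget. The Python below is a minimal sketch, not a tool mentioned in this thread. It assumes reasonably well-formed WebVTT (blocks separated by blank lines, each cue with a `-->` timing line), drops speaker tags such as `<v Alice>`, and only collapses exact consecutive duplicate lines, so rolling auto-generated captions may still need extra cleanup. The function name `vtt_to_text` is hypothetical.

```python
import html
import re

# A WebVTT cue timing line, e.g. "00:01:02.500 --> 00:01:05.000 align:start".
# The hours field is optional in WebVTT, so "01:02.500" is accepted too.
TIMING = re.compile(
    r"^(?:\d+:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d+:)?\d{2}:\d{2}\.\d{3}"
)
# Inline cue markup such as <v Alice>, <c.yellow>, </c>, or <00:00:02.000>.
TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt: str) -> str:
    """Return only the spoken text of a WebVTT file, one caption line per line."""
    out: list[str] = []
    # Normalize Windows line endings and drop a leading byte-order mark, if any.
    text = vtt.replace("\r\n", "\n").lstrip("\ufeff")
    # WebVTT blocks (header, NOTE, STYLE, REGION, cues) are separated by blank lines.
    for block in re.split(r"\n[ \t]*\n", text.strip()):
        rows = block.split("\n")
        # Only cue blocks contain a timing line; everything else is skipped.
        idx = next((i for i, row in enumerate(rows) if TIMING.match(row.strip())), None)
        if idx is None:
            continue
        # Lines after the timing line are the caption payload.
        for row in rows[idx + 1:]:
            # Remove tags first, then decode entities such as &amp; and &lt;.
            clean = html.unescape(TAG.sub("", row)).strip()
            # Skip empty lines and exact repeats of the previous caption line.
            if clean and (not out or out[-1] != clean):
                out.append(clean)
    return "\n".join(out)
```

For example, `vtt_to_text(Path("talk.vtt").read_text(encoding="utf-8"))` (with `from pathlib import Path`) returns the transcript text, where `talk.vtt` is a placeholder filename.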

Speaking of which, the problem might not be Llama but the token limit. If you feed it a file with more tokens than the model will accept, you can get really subpar results when you expect it to understand the whole file.
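To check whether a file is likely to exceed a model's limit before feeding it in, a rough character-based estimate is often enough. The sketch below assumes about 4 characters per token, a common rule of thumb for English text rather than any specific model's tokenizer; exact counts need the model's own tokenizer. Splitting the text into chunks that fit the budget is one possible workaround, not one proposed in this thread. `max_tokens` should be set below the context window your runner is actually configured with, to leave room for the prompt and the reply. Both function names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from character length (a heuristic, not a real tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def chunk_by_budget(
    paragraphs: list[str], max_tokens: int, chars_per_token: float = 4.0
) -> list[str]:
    """Greedily pack paragraphs into chunks whose estimated size stays within max_tokens.

    The "\n\n" separators are not counted, so the budget is approximate.
    A single paragraph larger than the budget becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for para in paragraphs:
        cost = estimate_tokens(para, chars_per_token)
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A typical call would be `chunk_by_budget(text.split("\n\n"), max_tokens=...)`, with each chunk then sent to the model separately.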


Discussion

Nate 5mo ago

Edit: reading the replies, yeah, sounds like you found out it's a token limit. LM Studio does let you set a custom token limit, with mixed results. It might be worth trying a couple of LLMs to see whether they handle bigger files.

daniele 5mo ago

In the end it's not a max-token problem. A 100 KB text file (I had already cleaned the VTT and extracted the plain text) should be roughly 20,000 to 25,000 tokens, and Llama3.1:8B supports a context of up to 128K tokens.
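The estimate above can be spelled out as follows, assuming mostly ASCII text (one byte per character), 100 KB taken as 100,000 bytes, 128K taken as 128,000 tokens, and the common rule of thumb of roughly 4 to 5 characters per token for English; the real count depends on the model's tokenizer.

```latex
\frac{100\,000\ \text{characters}}{5\ \text{characters/token}} = 20\,000\ \text{tokens},
\qquad
\frac{100\,000\ \text{characters}}{4\ \text{characters/token}} = 25\,000\ \text{tokens}

\frac{25\,000}{128\,000} \approx 0.195
```

So even the upper estimate uses only about a fifth of the advertised window.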

But its comprehension of the text is simply wrong. I also tried gemma3:12B, with the same problem.

Claude Sonnet 4 on Claude.ai, on the other hand, gives me a perfect reply.