I'm going to share some steps I would take if I were doing what you're doing (if I understand correctly, summarizing a large text document with local models). Some of these might help you improve your results.
- First, I would check that the context window configuration in Ollama meets your needs and isn't still at the default 2048 tokens. If it is, the model only sees whatever part of your document fits in that window and everything past that point gets truncated; the sketch after this list shows one way to raise it.
- I would ensure that I'm providing a good system prompt that clearly specifies the model's role, the task, and the traits you want in the output (length, tone, format).
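Here's a minimal sketch of both points using the Ollama Python client; the model name, context size, filename, and system prompt are just placeholders for whatever you're actually running:

```python
import ollama  # pip install ollama

document_text = open("your_document.txt").read()  # placeholder: the document you're summarizing

response = ollama.chat(
    model="gemma2:12b",            # placeholder: use the 12B model you have pulled
    options={"num_ctx": 16384},    # raise the context window above the 2048 default
    messages=[
        {
            "role": "system",
            "content": (
                "You are a careful technical summarizer. "
                "Produce a concise, faithful summary of the user's document."
            ),
        },
        {"role": "user", "content": document_text},
    ],
)
print(response["message"]["content"])
```

You can also set `num_ctx` interactively with `/set parameter num_ctx 16384` inside `ollama run`, or bake it into a Modelfile with `PARAMETER num_ctx 16384`. Just remember that a bigger window costs more RAM/VRAM, so check the model still loads comfortably.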
If you've already covered these points and are still not getting good results, I would then try more deliberate ways of feeding the data to the model. For example, I would chunk the large text into contextually coherent segments, summarize each chunk, and then summarize the combined chunk summaries (a map-reduce style approach, sketched below).
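A rough sketch of that map-reduce flow, again with placeholder model and chunking choices you'd tune for your document:

```python
import ollama  # pip install ollama

MODEL = "gemma2:12b"          # placeholder: your local model
OPTIONS = {"num_ctx": 8192}   # keep each call comfortably inside the window

def summarize(text: str, instruction: str) -> str:
    """One summarization call against the local model."""
    response = ollama.chat(
        model=MODEL,
        options=OPTIONS,
        messages=[
            {"role": "system", "content": "You are a precise, faithful summarizer."},
            {"role": "user", "content": f"{instruction}\n\n{text}"},
        ],
    )
    return response["message"]["content"]

def chunk(text: str, max_chars: int = 8000) -> list[str]:
    """Naive paragraph-based chunking; max_chars is a crude proxy for tokens."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = ""
        current += para + "\n\n"
    if current:
        chunks.append(current)
    return chunks

def map_reduce_summary(document: str) -> str:
    # Map: summarize each chunk independently.
    partial = [
        summarize(c, "Summarize this section in a few sentences.")
        for c in chunk(document)
    ]
    # Reduce: merge the chunk summaries into one final summary.
    return summarize(
        "\n\n".join(partial),
        "Combine these section summaries into one coherent summary of the whole document.",
    )
```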
If that still doesn't work, I would focus on building an agentic workflow as a data pipeline, so you can experiment with these chunking and summarization steps in a more controlled, repeatable way. I would also consider using DSPy for this; a rough sketch follows.
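For example, the same map-reduce pipeline expressed as DSPy modules (the model name, port, and signatures are all assumptions; adapt them to your setup):

```python
import dspy  # pip install dspy

# Point DSPy at a local Ollama model (model name and port are placeholders).
lm = dspy.LM("ollama_chat/gemma2:12b", api_base="http://localhost:11434", api_key="")
dspy.configure(lm=lm)

class SummarizeChunk(dspy.Signature):
    """Summarize one section of a larger document."""
    section: str = dspy.InputField()
    summary: str = dspy.OutputField(desc="3-5 sentence faithful summary")

class MergeSummaries(dspy.Signature):
    """Merge section summaries into one coherent document summary."""
    summaries: str = dspy.InputField()
    final_summary: str = dspy.OutputField()

class SummarizePipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.map_step = dspy.ChainOfThought(SummarizeChunk)
        self.reduce_step = dspy.ChainOfThought(MergeSummaries)

    def forward(self, sections: list[str]):
        # Map: summarize each section, then reduce: merge the partial summaries.
        partial = [self.map_step(section=s).summary for s in sections]
        return self.reduce_step(summaries="\n\n".join(partial))

# Usage: pipeline = SummarizePipeline(); result = pipeline(sections=my_chunks)
```

The upside of DSPy here is that once the steps are modules with explicit signatures, you can later optimize the prompts against a small set of reference summaries instead of hand-tuning them.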
Just keep in mind that a 12B model is significantly smaller than the enormous proprietary models, so to get comparable results we'll need to be smarter about how we use the local ones.