OK, so seriously, limiting the response output gave a giant speed boost. Token generation is roughly 10x slower than prompt processing.

Outlines lets you constrain model output, but I don't see how to express a conditional requirement, e.g. `result: boolean` with `quote` required only when `result` is true.

So I'm going to break it into two calls: one for the result and one for the quote.
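A minimal sketch of that two-call flow, assuming a constrained-generation helper in the shape of `outlines.generate.json(model, schema)` (the `generate_json` callable and the field names here are stand-ins, not a fixed Outlines API):

```python
import json

# Schema for call 1: just the boolean verdict. A tiny constrained output,
# which keeps the slow token-generation phase short.
result_schema = {
    "type": "object",
    "properties": {"result": {"type": "boolean"}},
    "required": ["result"],
}

# Schema for call 2: the quote. We only issue this call when result is
# true, which makes "quote" conditionally required without needing
# schema-level conditionals.
quote_schema = {
    "type": "object",
    "properties": {"quote": {"type": "string"}},
    "required": ["quote"],
}

def classify_then_quote(generate_json, prompt):
    """Two-pass flow: cheap boolean verdict first, quote only if needed.

    `generate_json(prompt, schema)` stands in for a constrained JSON
    generator (e.g. one built with outlines.generate.json).
    """
    verdict = generate_json(prompt, result_schema)
    if not verdict["result"]:
        return {"result": False}
    follow_up = prompt + "\nQuote the passage that supports this."
    answer = generate_json(follow_up, quote_schema)
    return {"result": True, "quote": answer["quote"]}
```

The second (expensive, longer-output) call is skipped entirely on negative verdicts, so the average cost drops along with it.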
