maybe? LLMs are weird animals. i think it will work somewhat because instead of giving the same material lots of times i can now give slightly different versions, causing less overfitting.

another use case may be RL using LLM feedbacks. also the bad answer and the good answer can be generated by different LLMs.

i also thought about doing the reverse. like system message "you are an evil LLM" and provide the answers inverted. then it may learn what is evil better? fun times.

Reply to this note

Please Login to reply.

Discussion

No replies yet.