DeepSeek unveils new AI reasoning method as anticipation for its next-gen model rises

In collaboration with researchers from Tsinghua University, DeepSeek developed a technique that combines two methods, generative reward modelling (GRM) and self-principled critique tuning, according to a paper published on Friday. The dual approach aims to enable large language models (LLMs) to deliver better and faster answers to general queries. The resulting DeepSeek-GRM models outperformed existing methods, having "achieved competitive performance" with strong public reward models, the researchers wrote. Reward modelling is a process that guides an LLM towards human preferences.

