
DeepSeek launches new method to improve reasoning in large language models

This photo illustration shows the DeepSeek app on a mobile phone in Beijing on January 28, 2025. (AFP Photo)
By Anadolu Agency
Apr 7, 2025 10:13 AM

Chinese AI start-up DeepSeek has introduced a new technique to improve the reasoning capabilities of large language models (LLMs), claiming it outperforms existing methods.

According to the South China Morning Post on Sunday, DeepSeek, working with researchers from Tsinghua University, developed a dual approach that combines generative reward modeling (GRM) with self-principled critique tuning (SPCT).

A paper released on Friday says the dual method is designed to improve LLMs' ability to answer general queries with greater accuracy and speed.
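The report does not detail the mechanism, but in general a generative reward model scores an answer by writing out a critique in text and deriving a rating from it, rather than emitting a bare number from a scoring head. The Python sketch below illustrates only that general pattern; the generate stub, prompt wording, and score format are hypothetical and are not taken from DeepSeek's paper.

    import re

    def generate(prompt: str) -> str:
        """Stand-in for a call to a judge LLM; hard-coded so the demo runs."""
        return "Critique: The answer is correct and clearly stated.\nScore: 8"

    def generative_reward(question: str, answer: str) -> tuple[str, int]:
        """Ask the judge model for a written critique, then parse a numeric
        score out of the generated text (the generative-reward pattern)."""
        prompt = (
            f"Question: {question}\nAnswer: {answer}\n"
            "Critique the answer, then end with a line 'Score: <0-10>'."
        )
        output = generate(prompt)
        match = re.search(r"Score:\s*(\d+)", output)
        score = int(match.group(1)) if match else 0
        return output, score

    critique, score = generative_reward("What is 2 + 2?", "4")
    print(score)  # -> 8 with the stubbed judge above

Because the reward is read out of generated text, the same judge model can explain its rating, which is one reason generative approaches are attractive for scoring open-ended queries.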

The researchers said the resulting DeepSeek-GRM models outperformed existing techniques, achieving “competitive performance” with robust public reward models. Reward modeling is a process used to align an LLM’s behavior with human preferences.
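As a minimal illustration of that alignment objective (generic practice, not DeepSeek's specific method), the standard Bradley-Terry pairwise loss trains a reward model to score a human-preferred response above a rejected one; the scores below are made-up toy values.

    import math

    def preference_loss(r_chosen: float, r_rejected: float) -> float:
        """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).
        Training pushes the reward model to rank the human-preferred
        response above the rejected one by a wide margin."""
        return -math.log(1.0 / (1.0 + math.exp(r_rejected - r_chosen)))

    # Toy scores a reward model might assign to two candidate answers:
    print(preference_loss(2.1, 0.3))  # ~0.15: ranking agrees with the human label
    print(preference_loss(0.3, 2.1))  # ~1.95: ranking disagrees, so the loss is large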

DeepSeek plans to make its GRM models open source, the researchers said, although no specific timeline was given.

The paper, published on the online scientific repository arXiv, comes amid growing interest in the company’s future developments. It follows the global attention drawn by its V3 foundation model and R1 reasoning model.
