Baidu paper improves open-ended reasoning with RL via multiple-choice reformulation
Researchers at Baidu have developed a new method for applying reinforcement learning (RL) to open-ended tasks, such as writing and subjective questions, that have no single correct answer. By reformulating…
