In authors or contributors

1 resource

  • Peiyi Wang, Lei Li, Zhihong Shao | Feb 19th, 2024 | preprint

    In this paper, we present an innovative process-oriented math process reward model called Math-Shepherd, which assigns a reward score to each step of math problem solutions. The training of Math-Shepherd is achieved using automatically constructed process-wise supervision data, breaking the bottleneck of heavy reliance on manual annotation in existing work. We explore the effectiveness of Math-Shepherd in two scenarios: 1) Verification: Math-Shepherd is utilized for...
