2 resources

  • Siyuan Wang, Zhuohan Long, Zhihao Fan | Feb 17th, 2024 | preprint

    This paper presents a benchmark self-evolving framework to dynamically evaluate rapidly advancing Large Language Models (LLMs), aiming for a more accurate assessment of their capabilities and limitations. We utilize a multi-agent system to manipulate the context or question of original instances, reframing new evolving instances with high confidence that dynamically extend existing benchmarks. Towards a more scalable, robust and fine-grained evaluation, we implement six reframing operations...

  • Siyuan Wang, Zhuohan Long, Zhihao Fan | Feb 17th, 2024 | preprint

    This paper presents a benchmark self-evolving framework to dynamically evaluate rapidly advancing Large Language Models (LLMs), aiming for a more accurate assessment of their capabilities and limitations. We utilize a multi-agent system to manipulate the context or question of original instances, reframing new evolving instances with high confidence that dynamically extend existing benchmarks. Towards a more scalable, robust and fine-grained evaluation, we implement six reframing operations...

Last update from database: 28/10/2025, 06:15 (UTC)