6 resources

  • Ming Zhong, Yang Liu, Da Yin | Jan 22nd, 2022 | preprint

    Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural Language Generation (NLG), i.e., evaluating the generated text from multiple explainable dimensions, such as coherence and fluency. However, automatic evaluation in NLG is still dominated by similarity-based metrics, and we lack a reliable framework for a more comprehensive evaluation of advanced models. In this paper, we propose a unified multi-dimensional evaluator UniEval for NLG. We re-frame NLG...

  • Ming Zhong, Yang Liu, Da Yin | Jan 22nd, 2022 | preprint

    Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural Language Generation (NLG), i.e., evaluating the generated text from multiple explainable dimensions, such as coherence and fluency. However, automatic evaluation in NLG is still dominated by similarity-based metrics, and we lack a reliable framework for a more comprehensive evaluation of advanced models. In this paper, we propose a unified multi-dimensional evaluator UniEval for NLG. We re-frame NLG...

  • Matthew S. Johnson, Xiang Liu, Daniel F.... | Sep 22nd, 2022 | journalArticle

  • Yan Zhuang, Qi Liu, Zhenya Huang | Jun 28th, 2022 | journalArticle

    Computerized Adaptive Testing (CAT) refers to an efficient and personalized test mode in online education, aiming to accurately measure student proficiency level on the required subject/domain. The key component of CAT is the "adaptive" question selection algorithm, which automatically selects the best suited question for student based on his/her current estimated proficiency, reducing test length. Existing algorithms rely on some manually designed and pre-fixed informativeness/uncertainty...

  • Nigel Fernandez, Aritra Ghosh, Naiming L... | Jan 22nd, 2022 | preprint

    Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring often leverage textual representations based on pre-trained language models such as BERT and GPT as input to scoring models. Most existing approaches train a separate model for each item/question, which is suitable for scenarios such as essay scoring where items can be quite different from one another. However, these approaches have two...

  • Iddo Drori, Sarah Zhang, Reece Shuttlewo... | Aug 2nd, 2022 | journalArticle

    We demonstrate that a neural network pretrained on text and fine-tuned on code solves mathematics course problems, explains solutions, and generates questions at a human level. We automatically synthesize programs using few-shot learning and OpenAI’s Codex transformer and execute them to solve course problems at 81% automatic accuracy. We curate a dataset of questions from Massachusetts Institute of Technology (MIT)’s largest mathematics courses (Single Variable and Multivariable Calculus,...

Last update from database: 22/01/2026, 11:15 (UTC)