-
Ming Zhong, Yang Liu, Da Yin | Jan 22nd, 2022 | preprint
Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural Language Generation (NLG), i.e., evaluating the generated text from multiple explainable dimensions, such as coherence and fluency. However, automatic evaluation in NLG is still dominated by similarity-based metrics, and we lack a reliable framework for a more comprehensive evaluation of advanced models. In this paper, we propose a unified multi-dimensional evaluator UniEval for NLG. We re-frame NLG...
-
Matthew S. Johnson, Xiang Liu, Daniel F.... | Sep 22nd, 2022 | journal article
-
Yan Zhuang, Qi Liu, Zhenya Huang | Jun 28th, 2022 | journal article
Computerized Adaptive Testing (CAT) refers to an efficient and personalized test mode in online education, aiming to accurately measure student proficiency level on the required subject/domain. The key component of CAT is the "adaptive" question selection algorithm, which automatically selects the best-suited question for the student based on his/her current estimated proficiency, reducing test length. Existing algorithms rely on some manually designed and pre-fixed informativeness/uncertainty...
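The abstract contrasts its approach with selection rules built on hand-designed informativeness metrics. As an illustration of that conventional baseline only (not the method proposed in this paper), here is a minimal sketch of maximum Fisher information selection under a two-parameter logistic (2PL) item response model. The item bank, parameter values, and function names are hypothetical and chosen for illustration.

```python
import math
from typing import Sequence, Set, Tuple

# Hypothetical item representation: (a, b) = (discrimination, difficulty) in a 2PL IRT model.
Item = Tuple[float, float]


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT: probability that a student with proficiency theta answers correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def fisher_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta_hat: float, items: Sequence[Item], answered: Set[int]) -> int:
    """Pick the unanswered item that is most informative at the current proficiency estimate."""
    candidates = [i for i in range(len(items)) if i not in answered]
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda i: fisher_information(theta_hat, *items[i]))


if __name__ == "__main__":
    bank = [(1.0, -2.0), (1.5, 0.1), (0.8, 0.0), (2.0, 3.0)]  # hypothetical items
    print(select_next_item(0.0, bank, set()))  # item 1: moderately hard and most discriminating near theta = 0
```

The rule is fixed in advance: it always favors items whose difficulty sits near the current estimate and whose discrimination is high, regardless of the student. This kind of pre-fixed criterion is what the abstract describes as the limitation of existing algorithms. In a full CAT loop, the proficiency estimate would be re-estimated (e.g. by maximum likelihood) after each response before the next selection.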
-
Nigel Fernandez, Aritra Ghosh, Naiming L... | Jan 22nd, 2022 | preprint
Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring often leverage textual representations based on pre-trained language models such as BERT and GPT as input to scoring models. Most existing approaches train a separate model for each item/question, which is suitable for scenarios such as essay scoring where items can be quite different from one another. However, these approaches have two...
-
Iddo Drori, Sarah Zhang, Reece Shuttlewo... | Aug 2nd, 2022 | journal article
We demonstrate that a neural network pretrained on text and fine-tuned on code solves mathematics course problems, explains solutions, and generates questions at a human level. We automatically synthesize programs using few-shot learning and OpenAI’s Codex transformer and execute them to solve course problems at 81% automatic accuracy. We curate a dataset of questions from Massachusetts Institute of Technology (MIT)’s largest mathematics courses (Single Variable and Multivariable Calculus,...