130 resources
-
Fengchun Miao, Wayne Holmes|Jul 1st, 2023|book
-
Ramon Pires, Hugo Abonizio, Thales Sales...|Jul 1st, 2023|preprint
As the capabilities of language models continue to advance, it is conceivable that a "one-size-fits-all" model will remain the main paradigm. For instance, given the vast number of languages worldwide, many of which are low-resource, the prevalent practice is to pretrain a single model on multiple languages. In this paper, we add to the growing body of evidence that challenges this practice, demonstrating that monolingual pretraining on the target language significantly improves models...
-
Shashank Sonkar, Naiming Liu, Debshila M...|Jul 1st, 2023|conferencePaper
-
Valdemar Švábenský, Ryan S. Baker, André...|Jul 1st, 2023|conferencePaper
-
Jiaan Wang, Yunlong Liang, Fandong Meng,...|Jul 1st, 2023|journalArticle
Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering that assessing the quality of natural language generation (NLG) models is an arduous task and NLG metrics notoriously show poor correlation with...
-
Kevin P. Yancey, Geoffrey Laflair, Antho...|Jul 1st, 2023|conferencePaper
Essay scoring is a critical task used to evaluate second-language (L2) writing proficiency on high-stakes language assessments. While automated scoring approaches are mature and have been around for decades, human scoring is still considered the gold standard, despite its high costs and well-known issues such as human rater fatigue and bias. The recent introduction of large language models (LLMs) brings new opportunities for automated scoring. In this paper, we evaluate how well GPT-3.5 and...
-
Shuyan Zhou, Uri Alon, Sumit Agarwal|Jul 1st, 2023|conferencePaper
-
Zihao Zhou, Maizhen Ning, Qiufeng Wang|Jul 1st, 2023|conferencePaper