214 resources

  • Ramon Pires, Hugo Abonizio, Thales Sales... | Dec 1st, 2023 | Preprint

    As the capabilities of language models continue to advance, it is conceivable that a "one-size-fits-all" model will remain the main paradigm. For instance, given the vast number of languages worldwide, many of which are low-resource, the prevalent practice is to pretrain a single model on multiple languages. In this paper, we add to the growing body of evidence that challenges this practice, demonstrating that monolingual pretraining on the target language significantly improves models...

  • Partha Pratim Ray | Dec 1st, 2023 | Journal article

    In recent years, artificial intelligence (AI) and machine learning have been transforming the landscape of scientific research. Among these technologies, chatbots have advanced tremendously, with ChatGPT emerging as a notable AI language model. This comprehensive review delves into the background, applications, key challenges, and future directions of ChatGPT. We begin by exploring its origins, development, and underlying technology, before examining...

  • Shashank Sonkar, Naiming Liu, Debshila M... | Dec 1st, 2023 | Conference paper
  • Mirac Suzgun, Nathan Scales, Nathanael S... | Dec 1st, 2023 | Preprint

    BIG-Bench (Srivastava et al., 2022) is a diverse evaluation suite that focuses on tasks believed to be beyond the capabilities of current language models. Language models have already made good progress on this benchmark, with the best model in the BIG-Bench paper outperforming average reported human-rater results on 65% of the BIG-Bench tasks via few-shot prompting. But on what tasks do language models fall short of average human-rater performance, and are those tasks actually unsolvable by...

  • Valdemar Švábenský, Ryan S. Baker, André... | Dec 1st, 2023 | Conference paper
  • Ekaterina Svikhnushina, Pearl Pu | Dec 1st, 2023 | Preprint

    As conversational models become increasingly available to the general public, users are engaging with this technology in social interactions. Such unprecedented interaction experiences may pose considerable social and psychological risks to the users unless the technology is properly controlled. This highlights the need for scalable and robust evaluation metrics for conversational chatbots. Existing evaluation metrics aim to automate offline user evaluation and approximate human judgment of...

  • Yan Tao, Olga Viberg, Ryan S. Baker | Dec 1st, 2023 | Journal article

    Culture fundamentally shapes people's reasoning, behavior, and communication. Generative artificial intelligence (AI) technologies may cause a shift towards a dominant culture. As people increasingly use AI to expedite and even automate various professional and personal tasks, cultural values embedded in AI models may bias authentic expression. We audit large language models for cultural bias, comparing their responses to nationally representative survey data, and evaluate country-specific...

  • Jiaan Wang, Yunlong Liang, Fandong Meng,... | Dec 1st, 2023 | Journal article

    Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering that assessing the quality of natural language generation (NLG) models is an arduous task and that NLG metrics notoriously correlate poorly with...

  • Scott Wood | Dec 1st, 2023 | Report
  • Kevin P. Yancey, Geoffrey Laflair, Antho... | Dec 1st, 2023 | Conference paper

    Essay scoring is a critical task used to evaluate second-language (L2) writing proficiency on high-stakes language assessments. While automated scoring approaches are mature and have been around for decades, human scoring is still considered the gold standard, despite its high costs and well-known issues such as human rater fatigue and bias. The recent introduction of large language models (LLMs) brings new opportunities for automated scoring. In this paper, we evaluate how well GPT-3.5 and...

  • Eric Zelikman, Wanjing Anya Ma, Jasmine ... | Dec 1st, 2023 | Conference paper
  • Shuyan Zhou, Uri Alon, Sumit Agarwal | Dec 1st, 2023 | Conference paper
  • Zihao Zhou, Maizhen Ning, Qiufeng Wang | Dec 1st, 2023 | Conference paper
Last update from database: 01/12/2025, 15:15 (UTC)