11 resources
- EdArXiv | Dec 19th, 2022 | report
Predictive analytics methods in education are seeing widespread use and are producing increasingly accurate predictions of students’ outcomes. With the increased use of predictive analytics comes increasing concern about fairness for specific subgroups of the population. One approach that has been proposed to increase fairness is using demographic variables directly in models, as predictors. In this paper we explore issues of fairness in the use of demographic variables as predictors of...
- Alexandra Sasha Luccioni, Sylvain Viguie... | Nov 3rd, 2022 | preprint
Progress in machine learning (ML) comes with a cost to the environment, given that training ML models requires significant computational resources, energy and materials. In the present article, we aim to quantify the carbon footprint of BLOOM, a 176-billion parameter language model, across its life cycle. We estimate that BLOOM's final training emitted approximately 24.7 tonnes of CO₂eq if we consider only the dynamic power consumption, and 50.5 tonnes if we account for all processes...
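The abstract describes a life-cycle carbon estimate. The core arithmetic behind such estimates is simple: multiply the energy consumed in each phase by the carbon intensity of the electricity grid, then sum over phases. The sketch below illustrates that calculation only; the phase names and numbers are hypothetical and are not the paper's figures or methodology.

```python
# Illustrative operational-carbon estimate: emissions = energy used
# * grid carbon intensity, summed over life-cycle phases.
# All numbers below are made up for illustration.

def carbon_kg(energy_kwh: float, intensity_kg_per_kwh: float) -> float:
    """CO2eq in kilograms for one phase."""
    return energy_kwh * intensity_kg_per_kwh

# Hypothetical phases: (energy in kWh, grid intensity in kg CO2eq/kWh).
phases = {
    "training_dynamic": (1_000_000, 0.057),
    "idle_infrastructure": (250_000, 0.057),
}

total_kg = sum(carbon_kg(e, i) for e, i in phases.values())
total_tonnes = total_kg / 1000
print(f"{total_tonnes} tonnes CO2eq")
```

A fuller accounting (as the paper's "all processes" figure suggests) would add phases for hardware manufacturing and deployment, which is exactly why the two headline numbers differ.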
- Shiki Sato, Yosuke Kishinami, Hiroaki Su... | Nov 14th, 2022 | conference paper
Automation of dialogue system evaluation is a driving force for the efficient development of dialogue systems. This paper introduces the bipartite-play method, a dialogue collection method for automating dialogue system evaluation. It addresses the limitations of existing dialogue collection methods: (i) inability to compare with systems that are not publicly available, and (ii) vulnerability to cheating by intentionally selecting systems to be compared. Experimental results show that the...
- Anita Schick, Jasper Feine, Stefan Moran... | Oct 31st, 2022 | journal article
Mental disorders in adolescence and young adulthood are major public health concerns. Digital tools such as text-based conversational agents (ie, chatbots) are a promising technology for facilitating mental health assessment. However, the human-like interaction style of chatbots may induce potential biases, such as socially desirable responding (SDR), and may require further effort to complete assessments.
- Ming Zhong, Yang Liu, Da Yin | Oct 13th, 2022 | preprint
Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural Language Generation (NLG), i.e., evaluating the generated text from multiple explainable dimensions, such as coherence and fluency. However, automatic evaluation in NLG is still dominated by similarity-based metrics, and we lack a reliable framework for a more comprehensive evaluation of advanced models. In this paper, we propose a unified multi-dimensional evaluator UniEval for NLG. We re-frame NLG...
- Cyril Chhun, Pierre Colombo, Chloé Clave... | Sep 15th, 2022 | preprint
Research on Automatic Story Generation (ASG) relies heavily on human and automatic evaluation. However, there is no consensus on which human evaluation criteria to use, and no analysis of how well automatic criteria correlate with them. In this paper, we propose to re-evaluate ASG evaluation. We introduce a set of 6 orthogonal and comprehensive human criteria, carefully motivated by the social sciences literature. We also present HANNA, an annotated dataset of 1,056 stories produced by 10...
- Pierre Jean A. Colombo, Chloé Clavel, Pa... | Jun 28th, 2022 | journal article
Assessing the quality of natural language generation (NLG) systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and include non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy of quality. In the last decade, many string-based metrics (e.g., BLEU or ROUGE) have been introduced. However, such metrics usually rely on exact matches and thus, do not robustly handle synonyms. In this paper, we introduce...
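The abstract's criticism of string-based metrics — that exact matching cannot credit synonyms — is easy to demonstrate with a minimal exact-match metric. The sketch below is a simplified unigram precision (BLEU-like in spirit only, not BLEU itself or the paper's proposed metric): a paraphrase that any human would rate as adequate loses credit for every synonym.

```python
# Minimal exact-match unigram precision, to show why string matching
# penalizes synonyms. This is a deliberately simplified illustration,
# not a faithful BLEU or ROUGE implementation.
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    # Clipped count of candidate tokens that appear in the reference.
    hits = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return hits / sum(cand_counts.values())

ref = "the cat sat on the mat"
print(unigram_precision("the cat sat on the mat", ref))     # 1.0
print(unigram_precision("the cat rested on the rug", ref))  # "rested"/"rug" get no credit
```

The second candidate scores 4/6 despite being a reasonable paraphrase, which is the failure mode the paper's embedding-based approach is designed to address.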
- Inioluwa Deborah Raji, Peggy Xu, Colleen... | Jun 9th, 2022 | preprint
Much attention has focused on algorithmic audits and impact assessments to hold developers and users of algorithmic systems accountable. But existing algorithmic accountability policy approaches have neglected the lessons from non-algorithmic domains: notably, the importance of interventions that allow for the effective participation of third parties. Our paper synthesizes lessons from other fields on how to craft effective systems of external oversight for algorithmic deployments. First, we...
- Anirudh Goyal, Abram L. Friesen, Andrea ... | May 24th, 2022 | preprint
Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the agent's behavior, and (4) behavior is limited by the capacity of the model. In this paper we...
- Nicol Turner Lee, Samantha Lai | May 17th, 2022 | webpage
Stakeholders in artificial intelligence must trace the problems back to their roots, which lie in the lack of diversity in design teams and in data that carries forward the trauma and discrimination of the past, Nicol Turner Lee and Samantha Lai write.
- Long Ouyang, Jeff Wu, Xu Jiang | Mar 4th, 2022 | preprint
Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through...