269 resources
-
Kavita Ganesan | Mar 5th, 2018 | preprint
Evaluation is crucial to determining the quality of machine-generated summaries. Over the last decade, ROUGE has become the standard automatic evaluation measure for summarization tasks. While ROUGE has been shown to be effective in capturing n-gram overlap between system and human-composed summaries, the existing ROUGE measures have several limitations in capturing synonymous concepts and coverage of topics. Thus, oftentimes...
-
Ryan Lowe, Michael Noseworthy, Iulian Vl... | Dec 26th, 2017 | conferencePaper
-
Ramakrishna Vedantam, C. Lawrence Zitnic... | Jun 2nd, 2015 | preprint
Automatically describing an image with a sentence is a long-standing challenge in computer vision and natural language processing. Due to recent progress in object detection, attribute classification, action recognition, etc., there is renewed interest in this area. However, evaluating the quality of descriptions has proven to be challenging. We propose a novel paradigm for evaluating image descriptions that uses human consensus. This paradigm consists of three main parts: a new...
-
Sami Virpioja, Stig-Arne Grönroos | Dec 26th, 2015 | conferencePaper
-
David Hutchison, Takeo Kanade, Josef Kit... | Dec 26th, 2013 | bookSection
-
Lise Getoor, Ashwin Machanavajjhala | Aug 26th, 2012 | journalArticle
This tutorial brings together perspectives on ER from a variety of fields, including databases, machine learning, natural language processing and information retrieval, to provide, in one setting, a survey of a large body of work. We discuss both the practical aspects and theoretical underpinnings of ER. We describe existing solutions, current challenges, and open research problems.
-
R.S.J.d Baker, B. McGaw, P. Peterson | Dec 26th, 2010 | bookSection
-
Ehud Reiter, Anja Belz | Dec 26th, 2009 | journalArticle
There is growing interest in using automatically computed corpus-based evaluation metrics to evaluate Natural Language Generation (NLG) systems, because these are often considerably cheaper than the human-based evaluations which have traditionally been used in NLG. We review previous work on NLG evaluation and on validation of automatic metrics in NLP, and then present the results of two studies of how well some metrics which are popular in other areas of NLP (notably BLEU and ROUGE)...
-
Joseph P. Turian, Luke Shen, I. Dan Mela... | Jan 1st, 2006 | conferencePaper
Evaluation of MT evaluation measures is limited by inconsistent human judgment data. Nonetheless, machine translation can be evaluated using the well-known measures precision, recall, and their average, the F-measure. The unigram-based F-measure has significantly higher correlation with human judgments than recently proposed alternatives. More importantly, this standard measure has an intuitive graphical interpretation, which can facilitate insight into how MT systems might be improved. The...
-
David Hutchison, Takeo Kanade, Josef Kit... | Dec 26th, 2004 | bookSection
-
Chin-Yew Lin, Franz Josef Och | Dec 26th, 2004 | conferencePaper
-
George Doddington | Dec 26th, 2002 | conferencePaper
-
Kishore Papineni, Salim Roukos, Todd War... | Dec 26th, 2001 | conferencePaper
-
webpage
-
webpage
Stanford launches an Ethics and Society Review Board that asks researchers to take an early look at the impact of their work.
-
webpage
If the brains behind Artificial Intelligence claim their creations can perform in tests like humans, then surely those results should be assessed as if they were produced by humans. AQA's Head of Research and Development, Dr Cesare Aloisi, says this is vital to maintaining trust in AI, but fears that is not what is happening.
-
webpage
With Sarah Howard and Jaclyn Broadbent. Since the seemingly sudden emergence of ChatGPT at the end of 2022, there has been significant debate surrounding the impact of text-based generative AI in education. Many jurisdictions initially attempted to ban access to these tools, citing concerns that stud...
-
webpage
A conversational AI system that listens, learns, and challenges
-
webpage
-
preprint