706 resources
- Cynthia Lee, Kelvin C.K. Wong, William K... | Feb 4th, 2009 | journalArticle
- Klaus Zechner, Derrick Higgins, Xiaoming... | Oct 1st, 2007 | journalArticle
- Anat Ben-Simon, Randy Elliot Bennett | May 4th, 2007 | journalArticle
  This study evaluated a “substantively driven” method for scoring NAEP writing assessments automatically. The study used variations of an existing commercial program, e-rater®, to compare the performance of three approaches to automated essay scoring: a brute-empirical approach in which variables are selected and weighted solely according to statistical criteria, a hybrid approach in which a fixed set of variables more closely tied to the characteristics of good writing was used but the...
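  The “brute-empirical” approach the abstract contrasts amounts to choosing feature weights purely by statistical fit against human scores. A minimal sketch of that idea using ordinary least squares; the feature matrix, feature meanings, and score values below are invented for illustration:

  ```python
  import numpy as np

  # Hypothetical features, one row per essay: length in words,
  # error rate, vocabulary sophistication (all made up here).
  X = np.array([[550, 0.02, 0.61],
                [320, 0.07, 0.40],
                [710, 0.01, 0.72],
                [410, 0.05, 0.55]])
  human_scores = np.array([5.0, 3.0, 6.0, 4.0])

  # Brute-empirical weighting: weights chosen solely by statistical
  # fit to the human scores (least squares, with an intercept).
  X1 = np.column_stack([X, np.ones(len(X))])
  weights, *_ = np.linalg.lstsq(X1, human_scores, rcond=None)
  print((X1 @ weights).round(2))
  ```

  A hybrid approach, by contrast, would fix the feature set (and possibly constrain the weights) on substantive grounds before any fitting.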
- Joseph P. Turian, Luke Shen, I. Dan Mela... | Jan 1st, 2006 | conferencePaper
  Evaluation of MT evaluation measures is limited by inconsistent human judgment data. Nonetheless, machine translation can be evaluated using the well-known measures precision, recall, and their average, the F-measure. The unigram-based F-measure has significantly higher correlation with human judgments than recently proposed alternatives. More importantly, this standard measure has an intuitive graphical interpretation, which can facilitate insight into how MT systems might be improved. The...
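  As a concrete reference point, the unigram F-measure the abstract describes is the harmonic mean of unigram precision and recall computed over the multiset overlap of candidate and reference tokens. A minimal sketch, assuming whitespace tokenization and a single reference (real implementations handle tokenization and multiple references more carefully):

  ```python
  from collections import Counter

  def unigram_f_measure(candidate: str, reference: str) -> float:
      """Harmonic mean of unigram precision and recall
      (clipped multiset overlap between the two token bags)."""
      cand = Counter(candidate.lower().split())
      ref = Counter(reference.lower().split())
      overlap = sum((cand & ref).values())  # matched unigrams
      if overlap == 0:
          return 0.0
      precision = overlap / sum(cand.values())
      recall = overlap / sum(ref.values())
      return 2 * precision * recall / (precision + recall)

  print(unigram_f_measure("the cat sat on the mat",
                          "the cat was sitting on the mat"))
  ```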
- Alon Lavie, Kenji Sagae, Shyamsundar Jay... | May 4th, 2004 | bookSection
- Chin-Yew Lin, Franz Josef Och | May 4th, 2004 | conferencePaper
- Paul Deane, Kathleen M. Sheehan | Apr 4th, 2003 | conferencePaper
- George Doddington | May 4th, 2002 | conferencePaper
- Lawrence M. Rudner, T. Liang | May 4th, 2002 | conferencePaper
- Kishore Papineni, Salim Roukos, Todd War... | May 4th, 2001 | conferencePaper
- Jill Burstein, Martin Chodorow | May 4th, 1999 | conferencePaper
  The e-rater™ system is an operational automated essay scoring system developed at Educational Testing Service (ETS). The average agreement between human readers, and between independent human readers and e-rater, is approximately 92%. There is much interest in the larger writing community in examining the system's performance on nonnative speaker essays. This paper focuses on the results of a study that show e-rater's performance on Test of Written English (TWE) essay responses written by...
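  For context on the 92% figure: agreement rates reported for essay scoring systems like e-rater are typically "exact-plus-adjacent" agreement, i.e. the fraction of essays on which two scores differ by at most one point. A sketch under that assumption; the scores below are made up:

  ```python
  def adjacent_agreement(scores_a, scores_b, tolerance=1):
      """Fraction of essays where the two raters' scores differ by
      at most `tolerance` points (exact-plus-adjacent agreement)."""
      pairs = list(zip(scores_a, scores_b))
      hits = sum(abs(a - b) <= tolerance for a, b in pairs)
      return hits / len(pairs)

  # Hypothetical human vs. machine scores on a 6-point scale.
  print(adjacent_agreement([4, 3, 5, 2, 4], [4, 4, 5, 3, 2]))  # 0.8
  ```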
- Satanjeev Banerjee, Alon Lavie | journalArticle
  We describe METEOR, an automatic metric for machine translation evaluation that is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations. Unigrams can be matched based on their surface forms, stemmed forms, and meanings; furthermore, METEOR can be easily extended to include more advanced matching strategies. Once all generalized unigram matches between the two strings have been found, METEOR computes a score...
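  A simplified sketch of the staged matching the abstract describes: unigrams are aligned first on surface form, then on stemmed form, with each unigram matching at most once overall, and the match count feeds the recall-weighted harmonic mean from the original METEOR paper (10PR/(R+9P)). The WordNet synonym stage and the chunk-based fragmentation penalty are omitted here, and the crude prefix "stemmer" is a stand-in for a real one such as Porter's:

  ```python
  def meteor_like_score(candidate: str, reference: str) -> float:
      """Staged unigram matching (exact, then stemmed), combined
      into the recall-weighted harmonic mean 10PR / (R + 9P)."""
      cand = candidate.lower().split()
      ref = reference.lower().split()
      stem = lambda w: w[:4]            # stand-in for a real stemmer
      unmatched_c, unmatched_r = list(cand), list(ref)
      matches = 0
      for key in (lambda w: w, stem):   # stage 1: exact, stage 2: stem
          for c in list(unmatched_c):
              for r in unmatched_r:
                  if key(c) == key(r):
                      unmatched_c.remove(c)
                      unmatched_r.remove(r)
                      matches += 1
                      break
      if matches == 0:
          return 0.0
      precision = matches / len(cand)
      recall = matches / len(ref)
      return 10 * precision * recall / (recall + 9 * precision)

  print(meteor_like_score("the cat sat on the mat",
                          "the cat was sitting on the mat"))
  ```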
- Jill Burstein | journalArticle
- Lei Chen, Klaus Zechner, Su-Youn Yoon | report