64 resources

  • Iddo Drori, Sarah Zhang, Reece Shuttlewo...
    |
    Aug 2nd, 2022
    |
    journalArticle

    We demonstrate that a neural network pretrained on text and fine-tuned on code solves mathematics course problems, explains solutions, and generates questions at a human level. We automatically synthesize programs using few-shot learning and OpenAI’s Codex transformer and execute them to solve course problems at 81% automatic accuracy. We curate a dataset of questions from Massachusetts Institute of Technology (MIT)’s largest mathematics courses (Single Variable and Multivariable Calculus,...

  • Yigal Attali, Andrew Runge, Geoffrey T. ...
    |
    Jul 22nd, 2022
    |
    journalArticle

    Automatic item generation (AIG) has the potential to greatly expand the number of items for educational assessments, while simultaneously allowing for a more construct-driven approach to item development. However, the traditional item modeling approach in AIG is limited in scope to content areas that are relatively easy to model (such as math problems), and depends on highly skilled content experts to create each model. In this paper we describe the interactive reading task, a...

  • Rishi Bommasani, Drew A. Hudson, Ehsan A...
    |
    Jul 12th, 2022
    |
    preprint

    AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical...

  • Pierre Jean A. Colombo, Chloé Clavel, Pa...
    |
    Jun 28th, 2022
    |
    journalArticle

    Assessing the quality of natural language generation (NLG) systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and include non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy of quality. In the last decade, many string-based metrics (e.g., BLEU or ROUGE) have been introduced. However, such metrics usually rely on exact matches and thus, do not robustly handle synonyms. In this paper, we introduce...

  • Yan Zhuang, Qi Liu, Zhenya Huang
    |
    Jun 28th, 2022
    |
    journalArticle

    Computerized Adaptive Testing (CAT) refers to an efficient and personalized test mode in online education, aiming to accurately measure student proficiency level on the required subject/domain. The key component of CAT is the "adaptive" question selection algorithm, which automatically selects the best suited question for student based on his/her current estimated proficiency, reducing test length. Existing algorithms rely on some manually designed and pre-fixed informativeness/uncertainty...

  • Qiao Wang
    |
    Jun 21st, 2022
    |
    journalArticle

    This study searched for open-source semantic similarity tools and evaluated their effectiveness in automated content scoring of fact-based essays written by English-as-a-Foreign-Language (EFL) learners. Fifty writing samples under a fact-based writing task from an academic English course in a Japanese university were collected and a gold standard was produced by a native expert. A shortlist of carefully selected tools, including InferSent, spaCy, DKPro, ADW, SEMILAR and Latent Semantic...

  • David W. Dorsey, Hillary R. Michaels
    |
    Jun 9th, 2022
    |
    journalArticle

    In this concluding article of the special issue, we provide an overall discussion and point to future emerging trends in AI that might shape our approach to validity and building validity arguments.

  • Inioluwa Deborah Raji, Peggy Xu, Colleen...
    |
    Jun 9th, 2022
    |
    preprint

    Much attention has focused on algorithmic audits and impact assessments to hold developers and users of algorithmic systems accountable. But existing algorithmic accountability policy approaches have neglected the lessons from non-algorithmic domains: notably, the importance of interventions that allow for the effective participation of third parties. Our paper synthesizes lessons from other fields on how to craft effective systems of external oversight for algorithmic deployments. First, we...

  • Anirudh Goyal, Abram L. Friesen, Andrea ...
    |
    May 24th, 2022
    |
    preprint

    Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the agent's behavior, and (4) behavior is limited by the capacity of the model. In this paper we...

  • Nicol Turner Lee, Samantha Lai
    |
    May 17th, 2022
    |
    webpage

    Stakeholders in artificial intelligence must trace back to the roots of the problems, which lie in the lack of diversity in design teams and data that continues to carry on trauma and discrimination of the past, Nicol Turner Lee and Samantha Lai write.

  • Mark D. Shermis
    |
    May 15th, 2022
    |
    journalArticle

    One of the challenges of discussing validity arguments for machine scoring of essays centers on the absence of a commonly held definition and theory of good writing. At best, the algorithms attempt to measure select attributes of writing and calibrate them against human ratings with the goal of accurate prediction of scores for new essays. Sometimes these attributes are based on the fundamentals of writing (e.g., fluency), but quite often they are based on locally developed rubrics that may...

  • Eva M. Campo, Sadasivan Shankar, Alexand...
    |
    Apr 13th, 2022
    |
    journalArticle
  • Norah Almusharraf, Hind Alotaibi
    |
    Apr 5th, 2022
    |
    journalArticle
  • Derek Justice
    |
    Apr 1st, 2022
    |
    conferencePaper
  • Jill Burstein, Geoffrey T. LaFlair, Anto...
    |
    Mar 23rd, 2022
    |
    report

    The Duolingo English Test is a groundbreaking, digital-first, computer-adaptive English language proficiency test intended to support stakeholder admissions decisions at English-medium institutions. The test measures four key constructs for university English language proficiency: Speaking, Writing, Reading, and Listening (SWRL), and is aligned with the Common European Framework of Reference for Languages (CEFR) proficiency levels and descriptors. As a digital-first assessment, the test...

  • Riordan Brennan, Debbie Perouli
    |
    Mar 21st, 2022
    |
    conferencePaper
  • Amber Dood, Blair Winograd, Solaire Fink...
    |
    Mar 21st, 2022
    |
    conferencePaper
  • Mohammadreza Tavakoli, Abdolali Faraji, ...
    |
    Mar 21st, 2022
    |
    conferencePaper
  • Long Ouyang, Jeff Wu, Xu Jiang
    |
    Mar 4th, 2022
    |
    preprint

    Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through...

Last update from database: 01/12/2025, 16:15 (UTC)