218 resources
- Inioluwa Deborah Raji, Peggy Xu, Colleen... | Jun 9th, 2022 | preprint
Much attention has focused on algorithmic audits and impact assessments to hold developers and users of algorithmic systems accountable. But existing algorithmic accountability policy approaches have neglected the lessons from non-algorithmic domains: notably, the importance of interventions that allow for the effective participation of third parties. Our paper synthesizes lessons from other fields on how to craft effective systems of external oversight for algorithmic deployments. First, we...
- Anirudh Goyal, Abram L. Friesen, Andrea ... | May 24th, 2022 | preprint
Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the agent's behavior, and (4) behavior is limited by the capacity of the model. In this paper we...
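The non-parametric alternative this abstract gestures at can be illustrated with a toy nearest-neighbour memory over stored experiences. This is a generic sketch of retrieval-based control, not the paper's actual method: the `ExperienceStore` class, its Euclidean-distance retrieval, and the action-averaging rule are all illustrative assumptions.

```python
import numpy as np

class ExperienceStore:
    """Toy non-parametric memory: store (state, action) pairs and act by
    averaging the actions taken in the k nearest stored states."""

    def __init__(self, k=3):
        self.k = k
        self.states, self.actions = [], []

    def add(self, state, action):
        self.states.append(np.asarray(state, dtype=float))
        self.actions.append(float(action))

    def act(self, state):
        # Retrieve the k closest past states by Euclidean distance and
        # return the mean of their actions. There are no gradient updates,
        # so a newly added experience influences behaviour immediately --
        # the contrast with parametric distillation that the abstract draws.
        dists = np.linalg.norm(
            np.stack(self.states) - np.asarray(state, dtype=float), axis=1
        )
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean(np.asarray(self.actions)[nearest]))
```

The trade-off the abstract lists runs the other way here: lookup cost grows with the number of stored experiences, but integration of new experience is instant and capacity is not bounded by model parameters.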
- Nicol Turner Lee, Samantha Lai | May 17th, 2022 | webpage
Stakeholders in artificial intelligence must trace the problems back to their roots, which lie in the lack of diversity in design teams and in data that carries forward the trauma and discrimination of the past, Nicol Turner Lee and Samantha Lai write.
- Long Ouyang, Jeff Wu, Xu Jiang | Mar 4th, 2022 | preprint
Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through...
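A central ingredient of the fine-tuning-with-human-feedback pipeline this abstract describes is a reward model trained on pairwise human preferences. The sketch below shows only that pairwise (Bradley-Terry style) loss in numpy; it is a minimal illustration, not the paper's implementation, and the function name and scalar-reward interface are assumptions.

```python
import numpy as np

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss for training a reward model:
    -log sigmoid(r_chosen - r_rejected). It is minimised when the
    reward assigned to the human-preferred response exceeds the
    reward assigned to the rejected one."""
    margin = reward_chosen - reward_rejected
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))))
```

In the full pipeline, a reward model trained with this kind of objective then supplies the training signal for reinforcement-learning fine-tuning of the language model itself.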
- Kerstin Denecke, Alaa Abd-Alrazaq, Mowaf... | Oct 31st, 2021 | journal article
Background: In recent years, an increasing number of health chatbots have been published in app stores and described in the research literature. Given the sensitive data they process and the care settings for which they are developed, evaluation is essential to avoid harm to users. However, evaluations of these systems are reported inconsistently and without a standardized set of evaluation metrics. Missing standards in health chatbot evaluation prevent...
- Vasilis Efthymiou, Kostas Stefanidis, Ev... | Oct 26th, 2021 | conference paper
- Gabriel Oliveira dos Santos, Esther Luna... | Sep 28th, 2021 | preprint
This paper shows that CIDEr-D, a traditional evaluation metric for image description, does not work properly on datasets where sentences are significantly longer than those in the MS COCO Captions dataset. We also show that CIDEr-D's performance is hampered by the lack of multiple reference sentences and by high variance in sentence length. To bypass these problems, we introduce CIDEr-R, which improves on CIDEr-D, making it more flexible in dealing with datasets with high...
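The consensus idea underlying the CIDEr family can be sketched as cosine similarity between n-gram count vectors of a candidate and a reference caption. This is a deliberately simplified stand-in: real CIDEr-D adds corpus-level TF-IDF weighting, multiple references per image, and a length penalty, none of which appear here.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_cosine(candidate, reference, n=2):
    """Cosine similarity between the n-gram count vectors of two token
    lists -- the core consensus measure behind CIDEr, minus its TF-IDF
    weighting and length penalty."""
    c, r = ngrams(candidate, n), ngrams(reference, n)
    dot = sum(c[g] * r[g] for g in c)
    norm = (math.sqrt(sum(v * v for v in c.values()))
            * math.sqrt(sum(v * v for v in r.values())))
    return dot / norm if norm else 0.0
```

The sensitivity to sentence length that the abstract criticises is visible even in this toy form: longer candidates dilute the overlap in the count vectors, shrinking the cosine score.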
- Elizabeth Clark, Tal August, Sofia Serra... | Jul 7th, 2021 | preprint
Human evaluations are typically considered the gold standard in natural language generation, but as models' fluency improves, how well can evaluators detect and judge machine-generated text? We run a study assessing non-experts' ability to distinguish between human- and machine-authored text (GPT2 and GPT3) in three domains (stories, news articles, and recipes). We find that, without training, evaluators distinguished between GPT3- and human-authored text at random chance level. We explore...
- Xu Han, Michelle Zhou, Matthew J. Turner... | May 6th, 2021 | conference paper
- Michael McTear | Mar 14th, 2021 | book section
- University of Wolverhampton, UK, Hadeel ... | Mar 14th, 2021 | conference paper
- Jing Xu, Da Ju, Margaret Li | Mar 14th, 2021 | conference paper
- Weizhe Yuan, Graham Neubig, Pengfei Liu,... | Mar 14th, 2021 | journal article
A wide variety of NLP applications, such as machine translation, summarization, and dialog, involve text generation. One major challenge for these applications is how to evaluate whether such generated texts are actually fluent, accurate, or effective. In this work, we conceptualize the evaluation of generated text as a text generation problem, modeled using pre-trained sequence-to-sequence models. The general idea is that models trained to convert the generated text to/from a reference...
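The "evaluation as text generation" idea can be illustrated by scoring a reference by its likelihood under a model conditioned on the generated text. In the real formulation a pretrained sequence-to-sequence model provides that likelihood; the smoothed unigram model below is a toy stand-in, and the function name and smoothing parameters are assumptions.

```python
import math
from collections import Counter

def generation_score(candidate, reference, vocab_size=10_000, alpha=0.1):
    """Toy 'evaluation as generation' score: average log-probability of
    the reference tokens under an additively smoothed unigram model
    built from the candidate. A pretrained seq2seq model plays this
    role in the real formulation."""
    counts = Counter(candidate)
    total = sum(counts.values())

    def log_p(token):
        # Additive (Laplace-style) smoothing so unseen tokens get
        # nonzero probability.
        return math.log((counts[token] + alpha) / (total + alpha * vocab_size))

    return sum(log_p(t) for t in reference) / len(reference)
```

Higher (less negative) scores mean the candidate makes the reference more probable, which is the direction of correlation with quality that this line of work relies on.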
- Tianyi Zhang, Varsha Kishore, Felix Wu | Feb 24th, 2020 | preprint
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore correlates better with human judgments and provides stronger model selection...
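The greedy-matching computation the abstract describes can be sketched directly over token embedding matrices. The embeddings below are arbitrary arrays standing in for the contextual embeddings a pretrained encoder would produce, and the importance weighting and baseline rescaling of the full metric are omitted.

```python
import numpy as np

def bertscore_f1(cand_emb, ref_emb):
    """Greedy-matching F1 over token embeddings, as in BERTScore:
    each candidate token is matched to its most similar reference
    token by cosine similarity (precision), and vice versa (recall)."""
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = c @ r.T                       # pairwise cosine similarities
    precision = sim.max(axis=1).mean()  # best reference match per candidate token
    recall = sim.max(axis=0).mean()     # best candidate match per reference token
    return float(2 * precision * recall / (precision + recall))
```

Because matching is done in embedding space rather than on surface forms, paraphrases with no exact n-gram overlap can still score highly, which is the metric's advantage over exact-match measures.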
- Esin Durmus, He He, Mona Diab | Mar 14th, 2020 | conference paper
- Shikib Mehri, Maxine Eskenazi | Mar 14th, 2020 | conference paper
- Thibault Sellam, Dipanjan Das, Ankur Par... | Mar 14th, 2020 | conference paper
- V Vijayaraghavan, Jack Brian Cooper, oth... | Mar 14th, 2020 | journal article