Classification of Human- and AI-Generated Texts for English, French, German, and Spanish

Article Status
Published
Authors/contributors
Schaaff, K.; Schlippe, T.; Mindner, L.
Title
Classification of Human- and AI-Generated Texts for English, French, German, and Spanish
Abstract
In this paper, we analyze features to classify human- and AI-generated text for English, French, German, and Spanish and compare them across languages. We investigate two scenarios: (1) the detection of text generated by AI from scratch, and (2) the detection of text rephrased by AI. For training and testing the classifiers in this multilingual setting, we created a new text corpus covering 10 topics for each language. For the detection of AI-generated text, the combination of all proposed features performs best, indicating that our features are portable to other related languages: the F1-scores are close, with 99% for Spanish, 98% for English, 97% for German, and 95% for French. For the detection of AI-rephrased text, the systems with all features outperform systems with other features in many cases, but using only document features performs best for German (72%) and Spanish (86%), and using only text vector features leads to the best results for English (78%).
Repository
arXiv
Archive ID
arXiv:2312.04882
Date
2023-12-08
Accessed
14/06/2024, 20:45
Library Catalogue
Extra
arXiv:2312.04882 [cs] <Title>: Classification of Human- and AI-Generated Texts in English, French, German, and Spanish Citation Key: schaaff2023
Citation
Schaaff, K., Schlippe, T., & Mindner, L. (2023). Classification of Human- and AI-Generated Texts for English, French, German, and Spanish (arXiv:2312.04882). arXiv. http://arxiv.org/abs/2312.04882