6 resources

  • Abhimanyu Dubey, Abhinav Jauhri, Abhinav... | Aug 15th, 2024 | preprint

    Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language...

  • Bhashithe Abeysinghe, Ruhan Circi | Jun 5th, 2024 | preprint

    Chatbots have been an interesting application of natural language generation since the field's inception. With novel transformer-based generative AI methods, building chatbots has become trivial, and chatbots targeted at specific domains such as medicine, psychology, and general information retrieval are implemented rapidly. This, however, should not distract from the need to evaluate chatbot responses, especially because the natural language generation community does not entirely agree...

  • Juan D. Pinto, Luc Paquette | May 22nd, 2024 | preprint

    The challenge of creating interpretable models has been taken up by two main research communities: ML researchers primarily focused on lower-level explainability methods that suit the needs of engineers, and HCI researchers who have more heavily emphasized user-centered approaches often based on participatory design methods. This paper reviews how these communities have evaluated interpretability, identifying overlaps and semantic misalignments. We propose moving towards a unified framework...

  • Nicolò Cosimo Albanese | Apr 20th, 2024 | conferencePaper

    Ensuring fidelity to source documents is crucial for the responsible use of Large Language Models (LLMs) in Retrieval Augmented Generation (RAG) systems. We propose a lightweight method for real-time hallucination detection, with potential to be deployed as a model-agnostic microservice to bolster reliability. Using in-context learning, our approach evaluates response factuality at the sentence level without annotated data, promoting transparency and user trust. Compared to other...
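    The sentence-level detection described above can be sketched as follows. This is an illustrative outline, not the paper's implementation: the LLM judge is stubbed with a lexical-overlap heuristic, and in practice `judge` would call a model with a few-shot prompt like `FEW_SHOT_PROMPT`.

    ```python
    # Sketch of sentence-level hallucination detection for a RAG response.
    # All names here (FEW_SHOT_PROMPT, judge, flag_hallucinations) are
    # hypothetical; the overlap heuristic stands in for an LLM call.
    import re

    FEW_SHOT_PROMPT = """\
    Decide whether the sentence is supported by the context.
    Context: {context}
    Sentence: {sentence}
    Answer (supported / not supported):"""

    def split_sentences(text: str) -> list[str]:
        # Naive splitter on end-of-sentence punctuation; a production
        # system would use a proper sentence tokenizer.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def judge(context: str, sentence: str) -> bool:
        # Stand-in for the LLM judge: treat a sentence as supported if
        # most of its longer words appear in the source context.
        words = {w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3}
        if not words:
            return True
        ctx = set(re.findall(r"[a-z']+", context.lower()))
        return len(words & ctx) / len(words) >= 0.5

    def flag_hallucinations(context: str, response: str) -> list[str]:
        """Return the sentences of `response` not supported by `context`."""
        return [s for s in split_sentences(response) if not judge(context, s)]

    context = "The Eiffel Tower is in Paris and was completed in 1889."
    response = "The Eiffel Tower was completed in 1889. It is painted bright green."
    print(flag_hallucinations(context, response))
    # → ['It is painted bright green.']
    ```

    Because the check runs per sentence against only the retrieved context, it needs no annotated data and can sit in front of any model as a microservice, which is the model-agnostic deployment the abstract describes.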

  • Jon Saad-Falcon, Omar Khattab, Christoph... | Mar 31st, 2024 | preprint

    Evaluating retrieval-augmented generation (RAG) systems traditionally relies on hand annotations for input queries, passages to retrieve, and responses to generate. We introduce ARES, an Automated RAG Evaluation System, for evaluating RAG systems along the dimensions of context relevance, answer faithfulness, and answer relevance. By creating its own synthetic training data, ARES finetunes lightweight LM judges to assess the quality of individual RAG components. To mitigate potential...
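    The three evaluation dimensions named above can be sketched as a minimal scoring function. This is an assumption-laden illustration, not ARES itself: ARES finetunes lightweight LM judges on synthetic training data, whereas the stubbed judge here is a simple word-overlap heuristic, and the function name `evaluate_rag` is hypothetical.

    ```python
    # Illustrative sketch of scoring one RAG example along the three
    # ARES dimensions: context relevance, answer faithfulness, answer
    # relevance. The judges are stubbed as keyword-overlap heuristics.
    import re

    def _overlap(a: str, b: str) -> float:
        # Fraction of a's words that also appear in b (stub LM judge).
        wa = set(re.findall(r"[a-z]+", a.lower()))
        wb = set(re.findall(r"[a-z]+", b.lower()))
        return len(wa & wb) / max(len(wa), 1)

    def evaluate_rag(query: str, passage: str, answer: str) -> dict[str, float]:
        """Score one (query, passage, answer) triple on the three dimensions."""
        return {
            "context_relevance": _overlap(query, passage),     # passage addresses the query?
            "answer_faithfulness": _overlap(answer, passage),  # answer grounded in the passage?
            "answer_relevance": _overlap(query, answer),       # answer addresses the query?
        }

    scores = evaluate_rag(
        query="When was the Eiffel Tower completed?",
        passage="The Eiffel Tower was completed in 1889.",
        answer="It was completed in 1889.",
    )
    ```

    Scoring each component separately is what lets a system like ARES localize failures: a low context-relevance score implicates the retriever, while low faithfulness with high context relevance implicates the generator.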

  • Zhen Li, Xiaohan Xu, Tao Shen | Apr 28th, 2024 | journalArticle

    In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This survey aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to...

Last update from database: 28/04/2025, 20:15 (UTC)
Powered by Zotero and Kerko.