1 resource

  • Ted Zadouri, Ahmet Üstün, Arash Ahmadian... | Sep 11th, 2023 | preprint

    The Mixture of Experts (MoE) is a widely known neural architecture in which an ensemble of specialized sub-models improves overall performance at a constant computational cost. However, conventional MoEs pose challenges at scale because all experts must be stored in memory. In this paper, we push MoE to the limit. We propose an extremely parameter-efficient MoE by uniquely combining the MoE architecture with lightweight experts. Our MoE architecture outperforms standard parameter-efficient...

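The abstract above describes an MoE layer whose experts are lightweight modules rather than full sub-networks. The following is a minimal illustrative sketch of that general idea, not the paper's implementation: a frozen shared linear layer plus low-rank (LoRA-style) experts that are softly merged by a router. The class name LightweightMoELayer and the parameters num_experts and rank are assumptions chosen for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightMoELayer(nn.Module):
    """Illustrative sketch (not the paper's code): MoE with low-rank experts."""

    def __init__(self, d_model: int, num_experts: int = 4, rank: int = 8):
        super().__init__()
        # Shared dense projection, frozen so only the lightweight parts train.
        self.base = nn.Linear(d_model, d_model)
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Each expert contributes only two small low-rank matrices (A, B).
        self.expert_A = nn.Parameter(torch.randn(num_experts, d_model, rank) * 0.01)
        self.expert_B = nn.Parameter(torch.zeros(num_experts, rank, d_model))
        # Router produces soft per-token mixing weights over the experts.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        gates = F.softmax(self.router(x), dim=-1)                      # (b, s, e)
        # Low-rank update from every expert: x @ A_e @ B_e
        updates = torch.einsum("bsd,edr,erk->bsek", x, self.expert_A, self.expert_B)
        mixed = torch.einsum("bse,bsek->bsk", gates, updates)          # soft merge
        return self.base(x) + mixed

# Quick shape check: only the experts and router are trainable.
layer = LightweightMoELayer(d_model=64)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])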