MaMMUT: A simple vision-encoder text-decoder architecture for multimodal tasks
May 14, 2023
