Sparse video tubes for joint video and image vision transformers
Posted by AJ Piergiovanni and Anelia Angelova, Research Scientists, Google Video understanding is a challenging problem that requires reasoning about both spatial information (e.g., for objects in a scene, including...