RO-ViT: Region-aware pre-training for open-vocabulary object detection with vision transformers
Posted by Dahun Kim and Weicheng Kuo, Research Scientists, Google

The ability to detect objects in the visual world is crucial for computer vision and machine intelligence, enabling applications like...