GroupViT is a framework for learning semantic segmentation purely from text captions, without using any mask supervision. It learns to perform bottom-up hierarchical spatial grouping of ...
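To make "bottom-up hierarchical grouping" concrete, here is a toy sketch. At each stage, every token is hard-assigned to the most similar of a few group vectors. Each group is then averaged into a single segment token, and the next stage groups those segments again. This is a simplified stand-in, not GroupViT's implementation. The group vectors here are fixed inputs (the real model learns its group tokens during training), similarity is a plain dot product with argmax, and the names `group_tokens` and `hierarchical_grouping` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity between two vectors."""
    return sum(x * y for x, y in zip(a, b))


def group_tokens(tokens: List[Vector], centers: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Hypothetical helper: hard-assign each token to its most similar center,
    then average the members of each group into one segment token."""
    # argmax over centers; ties go to the lowest-index center
    assignments = [
        max(range(len(centers)), key=lambda g: dot(tok, centers[g])) for tok in tokens
    ]
    segments: List[Vector] = []
    for g, center in enumerate(centers):
        members = [tok for tok, a in zip(tokens, assignments) if a == g]
        if members:
            # element-wise mean of all tokens assigned to group g
            segments.append([sum(col) / len(members) for col in zip(*members)])
        else:
            # assumption: an empty group simply keeps its center vector
            segments.append(list(center))
    return assignments, segments


def hierarchical_grouping(
    tokens: List[Vector], stages: List[List[Vector]]
) -> Tuple[List[List[int]], List[Vector]]:
    """Run group_tokens once per stage; each stage groups the previous stage's segments."""
    segments = tokens
    history: List[List[int]] = []
    for centers in stages:
        assignments, segments = group_tokens(segments, centers)
        history.append(assignments)
    return history, segments


if __name__ == "__main__":
    # four 2-D "patch" tokens forming two obvious clusters
    patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # stage 1 uses two group vectors, stage 2 merges everything into one
    stages = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
    history, final = hierarchical_grouping(patches, stages)
    print(history)  # per-stage assignments
    print(final)    # final segment token(s)
```

Because each stage uses fewer group vectors than the one before, the segments get coarser as you go up. Here the four patches first collapse into two segments and then into one.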