This is the official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge. The train_reranker.sh script is used to fine-tune the reranker module with specific ...
The main novelty seems to be an extra layer of indirection with the prior network (whether it is an autoregressive transformer or a diffusion network), which predicts an image embedding based on the ...