Abstract: Large Vision-Language Models (LVLMs) suffer from severe object hallucination: they frequently generate outputs that do not correspond to the image content, significantly reducing ...
Abstract: Unlike traditional single-modal visual tracking, vision-language tracking leverages textual descriptions to provide semantic understanding, aiding in handling challenges such as appearance ...