Abstract: In recent years, Audio Visual Scene-Aware Dialog (AVSD) has been an active research task in the multimodal dialogue community and a core part of the Dialog System Technology ...