Abstract: In an age when the Internet is dominated by visual content, the generation of animated captions has become essential. It has long been a topic of interest for researchers in the Department ...
Abstract: With the emergence of audio-language models, constructing large-scale paired audio-language datasets has become essential yet challenging for model development, primarily due to the ...