Title | VSRNet: End-to-end video segment retrieval with text query |
Authors | Sun, Xiao; Long, Xiang; He, Dongliang; Wen, Shilei; Lian, Zhouhui |
Affiliation | Peking Univ, Wangxuan Inst Comp Technol, Beijing 100871, Peoples R China; Meituan Inc, Beijing 100102, Peoples R China; Baidu Inc, Dept Comp Vis VIS Technol, Beijing 100085, Peoples R China; Baidu Inc, Beijing 100085, Peoples R China |
Issue Date | Nov-2021 |
Publisher | PATTERN RECOGNITION |
Abstract | Users of video search engines are sometimes interested in specific segments of an untrimmed video. To address this demand, we explore a novel research topic: text-query-based video segment retrieval (VSR). Unlike conventional video retrieval or the localization of text descriptions within a single video, VSR requires retrieving the most relevant video from a large collection and localizing the start and end timestamps of the segment in that video that best matches the text query. A direct solution is to perform video-level matching first and then apply description localization to the resulting video candidates. Such two-stage methods cannot exploit the complementary information of each stage and are time-consuming at inference. In this paper, we propose VSRNet, an end-to-end framework with two branches that efficiently retrieves video at segment granularity. In the first branch, individual videos and texts are mapped to a common space for stand-alone ranking. In the second branch, we propose a supervised text-aligned attention mechanism and calculate the response of every frame to the text query; frames with high scores are then aggregated into segment proposals. Extensive experiments on ActivityNet Captions and DiDeMo verify the effectiveness of our method and show that our solution significantly outperforms the state of the art. (C) 2021 Elsevier Ltd. All rights reserved. |
URI | http://hdl.handle.net/20.500.11897/623853 |
ISSN | 0031-3203 |
DOI | 10.1016/j.patcog.2021.108027 |
Indexed | EI SCI(E) |
Appears in Collections: | Unclaimed |
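
The abstract states that frames with high response scores to the text query are aggregated into segment proposals. A minimal sketch of one way such an aggregation could work is given below. It is an illustration only, not the paper's implementation: the fixed threshold, the grouping of consecutive frames, the mean-score ranking, and the names `propose_segments`, `frame_scores`, and `threshold` are all assumptions.

```python
from typing import List, Tuple


def propose_segments(
    frame_scores: List[float], threshold: float = 0.5
) -> List[Tuple[int, int, float]]:
    """Group consecutive frames scoring >= threshold into segment proposals.

    Returns (start_frame, end_frame, mean_score) tuples with inclusive
    frame indices, sorted by mean score in descending order.
    The thresholding rule is a hypothetical stand-in for the paper's method.
    """
    proposals = []
    start = None  # index where the current high-score run began, if any
    for i, score in enumerate(frame_scores):
        if score >= threshold and start is None:
            start = i  # a new run of high-response frames begins
        elif score < threshold and start is not None:
            run = frame_scores[start:i]
            proposals.append((start, i - 1, sum(run) / len(run)))
            start = None
    if start is not None:  # close a run that extends to the last frame
        run = frame_scores[start:]
        proposals.append((start, len(frame_scores) - 1, sum(run) / len(run)))
    return sorted(proposals, key=lambda p: p[2], reverse=True)
```

Calling `propose_segments([0.1, 0.8, 0.9, 0.2, 0.6, 0.1])` yields two proposals, covering frames 1 to 2 and frame 4, with the higher-scoring run ranked first.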