1

Temporal action localization (TAL) is a fundamental yet challenging task in video understanding. Existing TAL methods rely on pre-training a video encoder through action classification supervision. This results in a task discrepancy problem for the …

Boundary-sensitive Pre-training for Temporal Localization in Videos

Most existing models for temporal localization tasks are pre-trained on video classification tasks. The domain gap between action recognition and localization can be addressed by a temporal boundary datasets. • For the first time, we investigate pre-training for localization by introducing a novel boundary-sensitive pretext task. • We propose to synthesize temporal boundaries in existing video classification datasets to help localize action. • Extensive experiments show that the proposed BSP is superior and complementary to the existing action classification based pre-training counterpart, and achieves new state-of-the-art performance on several temporal localization tasks.

1

Multi-modal few-shot temporal action detection via vision-language meta-adaptation

Negative frames matter in egocentric visual query 2d localization

Segtad: Precise temporal action detection via semantic segmentation

Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization

Boundary-sensitive Pre-training for Temporal Localization in Videos

BAOD: Budget-Aware Object Detection

Boundary-sensitive pre-training for temporal localization in videos

Low-fidelity video encoder optimization for temporal action localization

Relation-aware video reading comprehension for temporal language grounding

Vlg-net: Video-language graph matching network for video grounding