Boundary-denoising for video activity localization

Video activity localization aims at understanding the semantic content in long, untrimmed videos and retrieving actions of interest. The retrieved action with its start and end locations can be used for highlight generation, temporal action …

ETAD: Training Action Detection End to End on a Laptop

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing

Mindstorms in Natural Language-Based Societies of Mind

Query Localization in Long-form Videos

Where is my wallet? Modeling object proposal sets for egocentric visual query localization

Contrastive language-action pre-training for temporal localization

Ego4D: Around the world in 3,000 hours of egocentric video

LC-NAS: Latency constrained neural architecture search for point cloud networks

Multi-modal few-shot temporal action detection via vision-language meta-adaptation