Most existing models for temporal localization tasks are pre-trained on video classification tasks. The domain gap between action recognition and temporal localization can be narrowed by pre-training on datasets with temporal boundary annotations.
• For the first time, we investigate pre-training for localization by introducing a novel boundary-sensitive pretext task.
• We propose to synthesize temporal boundaries in existing video classification datasets to help localize actions.
• Extensive experiments show that the proposed BSP is superior and complementary to the existing action-classification-based pre-training counterpart, and achieves new state-of-the-art performance on several temporal localization tasks.
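To make the second contribution concrete, one way to synthesize a temporal boundary from classification data is to stitch together clips from two different classes, which yields a boundary position as a free label. The following is a minimal sketch of this idea; the function name and clip representation are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch (names and data layout are assumptions):
# create a synthetic temporal boundary by stitching a prefix of one
# classification clip to a suffix of a clip from a different class.
import random


def synthesize_boundary(clip_a, clip_b, rng=random):
    """Stitch clip_a's prefix to clip_b's suffix.

    clip_a, clip_b: lists of frames (any objects) from different classes.
    Returns the stitched clip and the index of the synthetic boundary,
    which can supervise a boundary-sensitive pretext task.
    """
    cut_a = rng.randint(1, len(clip_a) - 1)  # keep >=1 frame on each side
    cut_b = rng.randint(1, len(clip_b) - 1)
    stitched = clip_a[:cut_a] + clip_b[cut_b:]
    boundary_idx = cut_a  # frames before this index come from clip_a
    return stitched, boundary_idx


# Usage: frames tagged with their source class for demonstration.
clip_a = [("classA", t) for t in range(8)]
clip_b = [("classB", t) for t in range(8)]
stitched, boundary = synthesize_boundary(clip_a, clip_b)
# The frame just before the boundary comes from clip_a, the one at it from clip_b.
```

A model pre-trained to regress or classify `boundary_idx` on such stitched clips would be exposed to boundary-like temporal structure without any manual localization annotation.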