Multi-modal few-shot temporal action detection via vision-language meta-adaptation

Publication
arXiv preprint arXiv:2211.14905