Publications

(2023). Where is my wallet? Modeling object proposal sets for egocentric visual query localization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

(2023). Mindstorms in Natural Language-Based Societies of Mind. arXiv preprint arXiv:2305.17066.

(2023). GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation. CVPR 2024.

(2023). FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing. ICLR 2024.

(2023). ETAD: Training Action Detection End to End on a Laptop. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

(2023). Boundary-denoising for video activity localization. ICLR 2024.

(2022). SegTAD: Precise temporal action detection via semantic segmentation. European Conference on Computer Vision.

(2022). Negative frames matter in egocentric visual query 2D localization. arXiv preprint arXiv:2208.01949.

(2022). Multi-modal few-shot temporal action detection via vision-language meta-adaptation. arXiv preprint arXiv:2211.14905.

(2022). LC-NAS: Latency constrained neural architecture search for point cloud networks. 2022 International Conference on 3D Vision (3DV).

(2022). Ego4D: Around the world in 3,000 hours of egocentric video. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

(2022). Contrastive language-action pre-training for temporal localization. arXiv preprint arXiv:2204.12293.

(2021). Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization. NeurIPS 2021.

(2021). Boundary-sensitive Pre-training for Temporal Localization in Videos. ICCV 2021.

(2021). VLG-Net: Video-language graph matching network for video grounding. Proceedings of the IEEE/CVF International Conference on Computer Vision.

(2021). Relation-aware video reading comprehension for temporal language grounding. The 2021 Conference on Empirical Methods in Natural Language Processing.

(2021). Low-fidelity video encoder optimization for temporal action localization. Advances in Neural Information Processing Systems.

(2021). BAOD: Budget-Aware Object Detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

(2020). Learning Heat Diffusion for Network Alignment. Thirty-seventh International Conference on Machine Learning (ICML) Workshop.

(2020). Improve Baseline for Temporal Action Detection: HACS Challenge 2020 Solution of IVUL-KAUST team. The Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

(2019). Semantic Part RCNN for Real-World Pedestrian Detection. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

(2019). Missing Labels in Object Detection. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

(2019). Logistic Regression is Still Alive and Effective: The 3rd YouTube 8M Challenge Solution of the IVUL-KAUST team. The IEEE International Conference on Computer Vision (ICCV) Workshops.

(2019). Beyond weakly supervised: Pseudo ground truths mining for missing bounding-boxes object detection. IEEE Transactions on Circuits and Systems for Video Technology.

(2019). G-TAD: Sub-Graph Localization for Temporal Action Detection. CVPR 2020.
