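Faculty Profile
Sibei Yang is an Associate Professor (Talent Recruitment Track) at the School of Computer Science, Sun Yat-sen University, a doctoral supervisor, and a Yat-sen Scholar. Her research covers cross-modal visual perception, understanding, generation, and interaction, with a particular focus on: 1) multimodal large models (LLMs/MLLMs); 2) unified vision-language understanding and generation; 3) embodied intelligence; and 4) open-world visual perception and understanding. To date, she has published nearly 60 papers in CCF-A venues or CAS Zone 1 journals, more than 40 of them as first or corresponding author, and her work has been cited over 3,000 times on Google Scholar. She has led a number of research projects, including an NSFC General Program grant, an NSFC Young Scientists Fund grant, the Shanghai Pujiang Talent Program, and the Shanghai Leading Talent (Overseas) Program. She has served as an Area Chair (AC) for top conferences including ICCV, ICLR, ECCV, and WACV, and has been named to the World's Top 2% Scientists list.
Sibei Yang received her Ph.D. from the University of Hong Kong (Hong Kong Government Scholarship) in 2020 and her bachelor's degree from Zhejiang University (Chu Kochen Honors College) in 2016. From 2020 to 2021, she was a Research Assistant Professor and doctoral supervisor at The Hong Kong Polytechnic University. From 2021 to 2025, she was an Assistant Professor, Research Fellow, and doctoral supervisor at ShanghaiTech University. In 2012, she was selected for the Ministry of Education's Everest Program (珠峰计划).
For more details, see her personal homepage: https://sibeiyang.github.io/
[Recruitment-2025/12]: We are recruiting Research Interns in multimodal large models and embodied intelligence. Undergraduates should email their transcript and CV to sibeiyang9@gmail.com with the subject line [研究实习生-姓名] (i.e., [Research Intern-Your Name]). Master's and Ph.D. students must obtain their supervisor's consent before emailing. All else being equal, priority will be given to applicants who plan to apply for master's or Ph.D. study in the group.
P.S. Sibei Yang has supervised more than 20 students, including 7 undergraduates, in publishing papers as first or co-first author at CCF-A conferences.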
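Research Interests
Cross-modal visual perception, understanding, generation, and interaction, especially 1) multimodal large models (LLMs/MLLMs), 2) unified vision-language understanding and generation, 3) embodied intelligence, and 4) open-world visual perception and understanding.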
News
2026/3: 3 papers accepted to CVPR 2026~
2026/1: 2 papers accepted to ICLR 2026~
2025/12: Sibei Yang will serve as Area Chair for ECCV 2026.
2025/9/19: 5 papers accepted to NeurIPS 2025~
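Education
2012/09-2016/07: Zhejiang University, Chu Kochen Honors College, Computer Science and Technology, Bachelor's degree
2016/09-2020/09: The University of Hong Kong, Hong Kong Government Scholarship, Computer Science, Ph.D.
Employment
2020/10-2021/05: The Hong Kong Polytechnic University, Department of Computing, Research Assistant Professor, doctoral supervisor
2021/06-2025/05: ShanghaiTech University, School of Information Science and Technology, Assistant Professor, Research Fellow, doctoral supervisor
2025/06-present: Sun Yat-sen University, School of Data and Computer Science, Associate Professor, doctoral supervisor
Research Projects
She has led multiple research projects, including an NSFC General Program grant, an NSFC Young Scientists Fund grant, the Shanghai Pujiang Talent Program, the Shanghai Leading Talent (Overseas) Program, and a start-up cultivation project.
Selected Publications
(*) denotes corresponding author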
[1] Sibei Yang, Guanbin Li, and Yizhou Yu. Relationship-Embedded Representation Learning for Grounding Referring Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021, 43: 2765-2779. [CCF A][CAS Zone 1]
[2] Sibei Yang, Meng Xia, Guanbin Li, Hong-Yu Zhou, and Yizhou Yu. Bottom-up shift and reasoning for referring image segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021: 11266-11275. [CCF A]
[3] Sibei Yang, Guanbin Li, and Yizhou Yu. Graph-structured referring expression reasoning in the wild. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020: 9952-9961. [CCF A]
[4] Sibei Yang, Guanbin Li, and Yizhou Yu. Dynamic graph attention for referring expression comprehension. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV Oral). 2019: 4644-4653. [CCF A]
[5] Sibei Yang, Guanbin Li, and Yizhou Yu. Cross-modal relationship inference for grounding referring expressions. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019: 4145-4154. [CCF A]
[6] Sibei Yang, Guanbin Li, and Yizhou Yu. Propagating over phrase relations for one-stage visual grounding. Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIX 16. Springer International Publishing (ECCV), 2020: 589-605. [CCF B]
[7] Sibei Yang* (corresponding author), Ge Zheng, Jiajin Tang, Jiaye Qian, Hanzhuo Huang, Cheng Shi. Discovering Compositional Hallucination in LVLMs. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]
[8] Xiang He, Sibei Yang* (co-first author), Guanbin Li, Haofeng Li, Huiyou Chang, and Yizhou Yu. Non-local context encoder: Robust biomedical image segmentation. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI Oral). 2019, 33(01): 8417-8424. [CCF A]
[9] Cheng Shi, Yizhou Yu, Sibei Yang* (corresponding author). Vision Transformers Need More Than Register. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2026. [CCF A]
[10] Yulin Zhang, Cheng Shi, Sibei Yang* (corresponding author). WeaveTime: Streaming from Earlier Frames into Emergent Memory in VideoLLM. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2026. [CCF A]
[11] Jiajin Tang, Gaoyang, Wenjie Wang, Sibei Yang* (corresponding author), Xing Chen. Chart Deep Research in LVLMs via Parallel Relative Policy Optimization. The International Conference on Learning Representations (ICLR). 2026. [CCF A]
[12] Hanzhuo Huang, Qingyang Bao, Zekai Gu, Zhongshou Du, Cheng Lin, Yuan Liu*, Sibei Yang* (corresponding author). RefAny3D: 3D Asset-Referenced Diffusion Models for Image Generation. The International Conference on Learning Representations (ICLR). 2026. [CCF A]
[13] Cheng Shi, Yizhou Yu, Sibei Yang* (corresponding author). Vision Function Layer in Multimodal LLMs. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]
[14] Yulin Zhang, Cheng Shi, Yang Wang, Sibei Yang* (corresponding author). Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]
[15] Jiaye Qian, Ge Zheng, Yuchen Zhu, Sibei Yang* (corresponding author). Intervene-All-Paths: Unified Mitigation of LVLM Hallucinations across Alignment Formats. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]
[16] Yue Xu, Chengyan Fu, Li Xiong, Sibei Yang, Wenjie Wang. Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]
[17] Jiajin Tang, Zhengxuan Wei, Yuchen Zhu, Cheng Shi, Guanbin Li, Liang Lin, Sibei Yang* (corresponding author). Sim-DETR: Unlock DETR for Temporal Sentence Grounding. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]
[18] Bin Yang, Yulin Zhang, Hong-Yu Zhou, Sibei Yang* (corresponding author). No More Sibling Rivalry: Debiasing Human-Object Interaction Detection. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]
[19] Ge Zheng, Jiaye Qian, Jiajin Tang, Sibei Yang* (corresponding author). Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]
[20] Zhengxuan Wei, Jiajin Tang, Sibei Yang* (corresponding author). Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]
[21] Jiajin Tang, Zhengxuan Wei, Sibei Yang* (corresponding author). Closed-Loop Transfer for Weakly-supervised Affordance Grounding. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]
[22] Qiyuan Dai, Hanzhuo Huang, Yu Wu, and Sibei Yang* (corresponding author). Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2025. [CCF A]
[23] Qiyuan Dai and Sibei Yang* (corresponding author). Enhancing Flexibility in Test-Time Adaptation with Online EM. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2025. [CCF A]
[24] Yuchen Zhu, Cheng Shi, Dingyou Wang, Jiajin Tang, Zhengxuan Wei, Yu Wu, Guanbin Li, Sibei Yang* (corresponding author). Rethinking Query-based Transformer for Continual Image Segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2025. [CCF A]
[25] Chaoqi Chen, Yushuang Wu, Qiyuan Dai, Hong-Yu Zhou, Mutian Xu, Sibei Yang* (corresponding author), Xiaoguang Han*, and Yizhou Yu*. A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024. [CCF A][CAS Zone 1]
[26] Qiyuan Dai and Sibei Yang* (corresponding author). Curriculum point prompting for weakly-supervised referring image segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024: 13711-13722. [CCF A]
[27] Ge Zheng, Bin Yang, Jiajin Tang, Hong-Yu Zhou, and Sibei Yang* (corresponding author). DDCoT: Duty-distinct chain-of-thought prompting for multimodal reasoning in language models. Advances in Neural Information Processing Systems (NeurIPS), 2023, 36: 5168-5191. [CCF A]
[28] Hanzhuo Huang, Yufan Feng, Cheng Shi, Lan Xu, Jingyi Yu, and Sibei Yang* (corresponding author). Free-Bloom: Zero-shot text-to-video generator with LLM director and LDM animator. Advances in Neural Information Processing Systems (NeurIPS), 2023, 36: 26135-26158. [CCF A]
[29] Cheng Shi and Sibei Yang* (corresponding author). EdaDet: Open-vocabulary object detection using early dense alignment. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 15724-15734. [CCF A]
[30] Jiajin Tang, Ge Zheng, and Sibei Yang* (corresponding author). Temporal collection and distribution for referring video object segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 15466-15476. [CCF A]
[31] Cheng Shi and Sibei Yang* (corresponding author). LogoPrompt: Synthetic text images can be good visual prompts for vision-language models. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 2932-2941. [CCF A]
[32] Jiajin Tang, Ge Zheng, Jingyi Yu, and Sibei Yang* (corresponding author). CotDet: Affordance knowledge prompting for task driven object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 3068-3078. [CCF A]
[33] Longwen Zhang, Qiwei Qiu, Hongyang Lin, Qixuan Zhang, Cheng Shi, Wei Yang, Ye Shi, Sibei Yang* (corresponding author), Lan Xu*, and Jingyi Yu*. DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance. ACM Transactions on Graphics (TOG), SIGGRAPH, 2023, 42(4): 1-16. [CCF A]
[34] Jiajin Tang, Ge Zheng, Cheng Shi, and Sibei Yang* (corresponding author). Contrastive grouping with transformer for referring image segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023: 23570-23580. [CCF A]
[35] Xuyang Liu, Bingbing Wen, and Sibei Yang* (corresponding author). CCQ: cross-class query network for partially labeled organ segmentation. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2023, 37(2): 1755-1763. [CCF A]
[36] Cheng Shi, Yuchen Zhu, and Sibei Yang* (corresponding author). Plain-Det: A Plain Multi-Dataset Object Detector. European Conference on Computer Vision. Cham: Springer Nature Switzerland (ECCV), 2024: 210-226. [CCF B]
[37] Cheng Shi, Yulin Zhang, Bin Yang, Jiajin Tang, Yuexin Ma, and Sibei Yang* (corresponding author). Part2Object: Hierarchical Unsupervised 3D Instance Segmentation. European Conference on Computer Vision. Cham: Springer Nature Switzerland (ECCV), 2024: 1-18. [CCF B]
[38] Cheng Shi and Sibei Yang* (corresponding author). Spatial and visual perspective-taking via view rotation and relation reasoning for embodied reference understanding. European Conference on Computer Vision. Cham: Springer Nature Switzerland (ECCV), 2022: 201-218. [CCF B]
[39] Hanzhuo Huang, Yuan Liu, Ge Zheng, Jiepeng Wang, Zhiyang Dou, and Sibei Yang* (corresponding author). MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow. The Thirteenth International Conference on Learning Representations (ICLR). 2025. [CCF A]
[40] Cheng Shi and Sibei Yang* (corresponding author). The Devil is in the Object Boundary: Towards Annotation-free Instance Segmentation using Foundation Models. The Twelfth International Conference on Learning Representations (ICLR). 2024. [CCF A]



