Sibei Yang (杨思蓓)

Associate Professor

Email: sibeiyang9@gmail.com

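Biography

Sibei Yang is an Associate Professor (talent recruitment track) and Ph.D. supervisor at the School of Computer Science and Engineering, Sun Yat-sen University, and a Yat-sen Scholar. Her research covers cross-modal visual perception, understanding, generation, and interaction, with a particular focus on: 1) multimodal large models (LLMs/MLLMs); 2) unified vision-language understanding and generation; 3) embodied AI; 4) open-world visual perception and understanding. To date she has published nearly 60 papers in CCF A venues and CAS Tier 1 journals, including more than 40 CCF A papers as first or corresponding author, with over 3,600 Google Scholar citations. She has led projects funded by the NSFC General Program, the NSFC Young Scientists Fund, the Pujiang Talent Program, and the Shanghai Leading Talent (Overseas) Program, as well as university-industry collaborations such as the Tencent Rhino-Bird special program. She serves as Area Chair (AC) for top conferences including ICCV, ICLR, ECCV, and NeurIPS, and has been named to the World's Top 2% Scientists list.

Sibei Yang received her Ph.D. from The University of Hong Kong (Hong Kong Government Scholarship) in 2020 and her bachelor's degree from Zhejiang University (Chu Kochen Honors College) in 2016. From 2020 to 2021, she was a Research Assistant Professor and Ph.D. supervisor at The Hong Kong Polytechnic University. From 2021 to 2025, she was an Assistant Professor, Researcher, and Ph.D. supervisor at ShanghaiTech University. In 2012, she was selected for the Ministry of Education's Everest Program (珠峰计划).

For details, see her personal homepage: https://sibeiyang.github.io/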

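* Has supervised more than 25 students, including 8 undergraduates, who published CCF A conference papers as first author or co-first author.

* The group's first Ph.D. graduates (class of 2026) received offers from the talent programs of several leading technology companies.

[Admissions-2026/06]: Openings for master's and Ph.D. students, Fall 2027. Research directions include multimodal large models, embodied AI, and world models. Please use [27Fall-Your Name] as the email subject and send your CV and transcript to: sibeiyang9@gmail.com.

[Recruitment-2025/12]: Recruiting Research Interns in multimodal large models and embodied AI. Undergraduates should use [Research Intern-Your Name] (original tag: [研究实习生-姓名]) as the email subject and send their transcript and CV to sibeiyang9@gmail.com; master's and Ph.D. students must obtain their supervisor's consent before emailing. Other things being equal, priority goes to applicants who plan to apply to this group for a master's or Ph.D. degree.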

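Research Areas

Cross-modal visual perception, understanding, generation, and interaction, in particular: 1) multimodal large models (LLMs/MLLMs); 2) unified vision-language understanding and generation; 3) embodied AI; 4) open-world visual perception and understanding.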

News

2026/5 Several papers are accepted by ICML, TPAMI, MICCAI, and TIP. Special congratulations to Zhendong for publishing a first-author ICML paper as an undergraduate and pursuing his graduate studies with our group~

2026/3 3 papers are accepted by CVPR 2026~

2026/1 2 papers are accepted by ICLR 2026~

2025/12 Sibei Yang will serve as Area Chair for ECCV 2026.

2025/9/19 5 papers are accepted by NeurIPS 2025~

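Education

2012/09-2016/07: Zhejiang University, Chu Kochen Honors College, Computer Science and Technology, bachelor's degree

2016/09-2020/09: The University of Hong Kong, Hong Kong Government Scholarship, Computer Science, Ph.D.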

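Work Experience

2020/10-2021/05: The Hong Kong Polytechnic University, Department of Computing, Research Assistant Professor, Ph.D. supervisor

2021/06-2025/05: ShanghaiTech University, School of Information Science and Technology, Assistant Professor, Researcher, Ph.D. supervisor

2025/06-present: Sun Yat-sen University, School of Data and Computer Science, Associate Professor, Ph.D. supervisor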

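Research Projects

Principal investigator of multiple research projects, including the NSFC General Program, the NSFC Young Scientists Fund, the Pujiang Talent Program, the Shanghai Leading Talent (Overseas) Program, the Tencent Rhino-Bird special program, and a start-up cultivation project.

Selected Publications

(*) denotes corresponding author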

[1] Sibei Yang, Guanbin Li, and Yizhou Yu. Relationship-Embedded Representation Learning for Grounding Referring Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021, 43: 2765-2779. [CCF A][CAS Tier 1]

[2] Sibei Yang, Meng Xia, Guanbin Li, Hong-Yu Zhou, and Yizhou Yu. Bottom-up shift and reasoning for referring image segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021: 11266-11275. [CCF A]

[3] Sibei Yang, Guanbin Li, and Yizhou Yu. Graph-structured referring expression reasoning in the wild. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020: 9952-9961. [CCF A]

[4] Sibei Yang, Guanbin Li, and Yizhou Yu. Dynamic graph attention for referring expression comprehension. Proceedings of the IEEE/CVF international conference on computer vision (ICCV Oral). 2019: 4644-4653. [CCF A]

[5] Sibei Yang, Guanbin Li, and Yizhou Yu. Cross-modal relationship inference for grounding referring expressions. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). 2019: 4145-4154. [CCF A]

[6] Sibei Yang, Guanbin Li, and Yizhou Yu. Propagating over phrase relations for one-stage visual grounding. Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIX 16. Springer International Publishing (ECCV), 2020: 589-605. [CCF B]

[7] Sibei Yang* (corresponding author), Ge Zheng, Jiajin Tang, Jiaye Qian, Hanzhuo Huang, Cheng Shi. Discovering Compositional Hallucination in LVLMs. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]

[8] Xiang He, Sibei Yang* (co-first author), Guanbin Li, Haofeng Li, Huiyou Chang, and Yizhou Yu. Non-local context encoder: Robust biomedical image segmentation. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI Oral). 2019, 33(01): 8417-8424. [CCF A]

[9] Cheng Shi, Yizhou Yu*, Sibei Yang* (corresponding author). Vision Transformers Need More Than Register. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2026. [CCF A]

[10] Yulin Zhang, Cheng Shi, Sibei Yang* (corresponding author). WeaveTime: Streaming from Earlier Frames into Emergent Memory in VideoLLM. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2026. [CCF A]

[11] Jiajin Tang, Gaoyang, Wenjie Wang, Sibei Yang* (corresponding author), Xing Chen. Chart Deep Research in LVLMs via Parallel Relative Policy Optimization. The International Conference on Learning Representations (ICLR). 2026. [CCF A]

[12] Hanzhuo Huang, Qingyang Bao, Zekai Gu, Zhongshou Du, Cheng Lin, Yuan Liu*, Sibei Yang* (corresponding author). RefAny3D: 3D Asset-Referenced Diffusion Models for Image Generation. The International Conference on Learning Representations (ICLR). 2026. [CCF A]

[13] Cheng Shi, Yizhou Yu, Sibei Yang* (corresponding author). Vision Function Layer in Multimodal LLMs. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]

[14] Yulin Zhang, Cheng Shi, Yang Wang, Sibei Yang* (corresponding author). Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]

[15] Jiaye Qian, Ge Zheng, Yuchen Zhu, Sibei Yang* (corresponding author). Intervene-All-Paths: Unified Mitigation of LVLM Hallucinations across Alignment Formats. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]

[16] Yue Xu, Chengyan Fu, Li Xiong, Sibei Yang, Wenjie Wang. Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]

[17] Jiajin Tang, Zhengxuan Wei, Yuchen Zhu, Cheng Shi, Guanbin Li, Liang Lin, Sibei Yang* (corresponding author). Sim-DETR: Unlock DETR for Temporal Sentence Grounding. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]

[18] Bin Yang, Yulin Zhang, Hong-Yu Zhou, Sibei Yang* (corresponding author). No More Sibling Rivalry: Debiasing Human-Object Interaction Detection. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]

[19] Ge Zheng, Jiaye Qian, Jiajin Tang, Sibei Yang* (corresponding author). Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]

[20] Zhengxuan Wei, Jiajin Tang, Sibei Yang* (corresponding author). Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]

[21] Jiajin Tang, Zhengxuan Wei, Sibei Yang* (corresponding author). Closed-Loop Transfer for Weakly-supervised Affordance Grounding. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]

[22] Qiyuan Dai, Hanzhuo Huang, Yu Wu, and Sibei Yang* (corresponding author). Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2025. [CCF A]

[23] Qiyuan Dai, and Sibei Yang* (corresponding author). Enhancing Flexibility in Test-Time Adaptation with Online EM. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2025. [CCF A]

[24] Yuchen Zhu, Cheng Shi, Dingyou Wang, Jiajin Tang, Zhengxuan Wei, Yu Wu, Guanbin Li, Sibei Yang* (corresponding author). Rethinking Query-based Transformer for Continual Image Segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2025. [CCF A]

[25] Chaoqi Chen, Yushuang Wu, Qiyuan Dai, Hong-Yu Zhou, Mutian Xu, and Sibei Yang* (corresponding author), Xiaoguang Han*, Yizhou Yu*. A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024. [CCF A][CAS Tier 1]

[26] Qiyuan Dai, and Sibei Yang* (corresponding author). Curriculum point prompting for weakly-supervised referring image segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024: 13711-13722. [CCF A]

[27] Ge Zheng, Bin Yang, Jiajin Tang, Hong-Yu Zhou, and Sibei Yang* (corresponding author). Ddcot: Duty-distinct chain-of-thought prompting for multimodal reasoning in language models. Advances in Neural Information Processing Systems (NeurIPS), 2023, 36: 5168-5191. [CCF A]

[28] Hanzhuo Huang, Yufan Feng, Cheng Shi, Lan Xu, Jingyi Yu, and Sibei Yang* (corresponding author). Free-bloom: Zero-shot text-to-video generator with llm director and ldm animator. Advances in Neural Information Processing Systems (NeurIPS), 2023, 36: 26135-26158. [CCF A]

[29] Cheng Shi, and Sibei Yang* (corresponding author). EdaDet: Open-vocabulary object detection using early dense alignment. Proceedings of the IEEE/CVF international conference on computer vision (ICCV). 2023: 15724-15734. [CCF A]

[30] Jiajin Tang, Ge Zheng, and Sibei Yang* (corresponding author). Temporal collection and distribution for referring video object segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 15466-15476. [CCF A]

[31] Cheng Shi, and Sibei Yang* (corresponding author). LogoPrompt: Synthetic text images can be good visual prompts for vision-language models. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 2932-2941. [CCF A]

[32] Jiajin Tang, Ge Zheng, Jingyi Yu, and Sibei Yang* (corresponding author). CotDet: Affordance knowledge prompting for task driven object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 3068-3078. [CCF A]

[33] Longwen Zhang, Qiwei Qiu, Hongyang Lin, Qixuan Zhang, Cheng Shi, Wei Yang, Ye Shi, Sibei Yang* (corresponding author), Lan Xu*, and Jingyi Yu*. DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance. ACM Transactions on Graphics (TOG), SIGGRAPH, 2023, 42(4): 1-16. [CCF A]

[34] Jiajin Tang, Ge Zheng, Cheng Shi, and Sibei Yang* (corresponding author). Contrastive grouping with transformer for referring image segmentation. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). 2023: 23570-23580. [CCF A]

[35] Xuyang Liu, Bingbing Wen, and Sibei Yang* (corresponding author). CCQ: cross-class query network for partially labeled organ segmentation. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2023, 37(2): 1755-1763. [CCF A]

[36] Cheng Shi, Yuchen Zhu, and Sibei Yang* (corresponding author). Plain-Det: A Plain Multi-Dataset Object Detector. European Conference on Computer Vision. Cham: Springer Nature Switzerland (ECCV), 2024: 210-226. [CCF B]

[37] Cheng Shi, Yulin Zhang, Bin Yang, Jiajin Tang, Yuexin Ma, and Sibei Yang* (corresponding author). Part2Object: Hierarchical Unsupervised 3D Instance Segmentation. European Conference on Computer Vision. Cham: Springer Nature Switzerland (ECCV), 2024: 1-18. [CCF B]

[38] Cheng Shi, and Sibei Yang* (corresponding author). Spatial and visual perspective-taking via view rotation and relation reasoning for embodied reference understanding. European Conference on Computer Vision. Cham: Springer Nature Switzerland (ECCV), 2022: 201-218. [CCF B]

[39] Hanzhuo Huang, Yuan Liu, Ge Zheng, Jiepeng Wang, Zhiyang Dou, and Sibei Yang* (corresponding author). MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow. The Thirteenth International Conference on Learning Representations (ICLR). 2025. [CCF A]

[40] Cheng Shi, and Sibei Yang* (corresponding author). The Devil is in the Object Boundary: Towards Annotation-free Instance Segmentation using Foundation Models. The Twelfth International Conference on Learning Representations (ICLR). 2024. [CCF A]
