Peiyang Xu
I'm a first-year Ph.D. student in Electrical and Computer Engineering at Princeton University. I earned my Bachelor's degree from Tsinghua University, majoring in Computer Science and Technology. Previously, I was fortunate to work as a research intern at the University of Chicago under the supervision of Professor Bo Li. I also had the honor of being advised by Professor Xiaolin Hu at Tsinghua University.
Email / CV / Google Scholar
Research
I'm deeply interested in Trustworthy Machine Learning, Adversarial Machine Learning, and Multimodal Large Language Models. Questions like “How can we make AI more reliable?” (e.g., adversarial attacks, hallucination in MLLMs) and “How can we fully harness AI for the benefit of humanity?” (e.g., guardrail models) are at the heart of my research, and I am committed to exploring them through my graduate studies.
Mitigating Object Hallucination in Large Vision-Language Models through Adversarial Contrastive Finetuning
Peiyang Xu, Xiaopei Zhu, Jun Zhu, Xiaolin Hu
AAAI 2026, Under review
We propose Adversarial Contrastive Finetuning (ACFT), a novel method to mitigate object hallucination in LVLMs.
SafeVision: Efficient Image Guardrail with Robust Policy Adherence and Explainability
Peiyang Xu, Minzhou Pan, Zhaorun Chen, Xue Lin, Chaowei Xiao, Bo Li
NeurIPS 2025, Under review
We propose SafeVision, a novel image guardrail system that integrates human-like understanding and reasoning.
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Chejian Xu, Jiawei Zhang, Zhaorun Chen, Chulin Xie, Mintong Kang, Zhuowen Yuan, Zidi Xiong, Chenhui Zhang, Lingzhi Yuan, Yi Zeng, Peiyang Xu, Chengquan Guo, Andy Zhou, Jeffrey Ziwei Tan, Zhun Wang, Alexander Xiong, Xuandong Zhao, Yu Gai, Francesco Pinto, Yujin Potter, Zhen Xiang, Zinan Lin, Dan Hendrycks, Dawn Song, Bo Li
ICLR 2025
We present the first unified platform, MMDT, designed to provide a comprehensive safety and trustworthiness evaluation for multimodal foundation models.
Natural Language Induced Adversarial Images
Xiaopei Zhu*, Peiyang Xu*, Guanning Zeng, Yinpeng Dong, Xiaolin Hu
ACM Multimedia 2024
We propose a natural language induced adversarial attack method. The core idea is to leverage a text-to-image model to generate adversarial images given maliciously constructed prompts.
CurBench: Curriculum Learning Benchmark
Yuwei Zhou, Zirui Pan, Xin Wang, Hong Chen, Haoyang Li, Yanwen Huang, Zhixiao Xiong, Fangzhou Xiong, Peiyang Xu, Shengnan Liu, Wenwu Zhu
ICML 2024
We develop CurBench, the first benchmark that supports systematic evaluation of curriculum learning, spanning 3 research domains.