Research
I'm deeply interested in Trustworthy Machine Learning, Adversarial Machine Learning, and the Safety of Multimodal Models. Questions like “How can we make AI more reliable?” (e.g., adversarial attacks, hallucination in MLLMs) and “How can we fully harness AI for the benefit of humanity?” (e.g., guardrail models) are at the heart of my research pursuits, and I am committed to exploring these issues through my graduate studies.
SafeVision: Efficient Image Guardrail with Robust Policy Adherence and Explainability
Peiyang Xu, Minzhou Pan, Zhaorun Chen, Xue Lin, Chaowei Xiao, Bo Li
ICLR 2025, Under review
We propose SafeVision, a novel image guardrail system that integrates human-like understanding and reasoning.
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Chejian Xu, Jiawei Zhang, Zhaorun Chen, Chulin Xie, Mintong Kang, Zhuowen Yuan, Zidi Xiong, Chenhui Zhang, Lingzhi Yuan, Yi Zeng, Peiyang Xu, Chengquan Guo, Andy Zhou, Jeffrey Ziwei Tan, Zhun Wang, Alexander Xiong, Xuandong Zhao, Yu Gai, Francesco Pinto, Yujin Potter, Zhen Xiang, Zinan Lin, Dan Hendrycks, Dawn Song, Bo Li
ICLR 2025, Under review
We present the first unified platform, MMDT, designed to provide a comprehensive safety and trustworthiness evaluation for multimodal foundation models.
Natural Language Induced Adversarial Images
Xiaopei Zhu*, Peiyang Xu*, Guanning Zeng, Yinpeng Dong, Xiaolin Hu
ACM Multimedia 2024
We propose a natural-language-induced adversarial attack method. The core idea is to leverage a text-to-image model to generate adversarial images from maliciously constructed prompts.
CurBench: Curriculum Learning Benchmark
Yuwei Zhou, Zirui Pan, Xin Wang, Hong Chen, Haoyang Li, Yanwen Huang, Zhixiao Xiong, Fangzhou Xiong, Peiyang Xu, Shengnan Liu, Wenwu Zhu
ICML 2024
We develop CurBench, the first benchmark supporting systematic evaluation of curriculum learning, consisting of 15 datasets spanning 3 research domains.