Hehai Lin
Hehai Lin is an incoming PhD student at The Hong Kong University of Science and Technology (Guangzhou). Before that, he received his Bachelor's degree from the School of Artificial Intelligence at Sun Yat-sen University, where he was supervised by Prof. Zhenhui Peng and Prof. Xiaobin Chang. He also works closely with Prof. Wenya Wang in the College of Computing and Data Science at Nanyang Technological University. His current research interests include multimodal learning and reasoning, especially the self-evolution ability of large vision-language models (LVLMs).
Sun Yat-sen University
B.S. in Artificial Intelligence Sep. 2020 - Jun. 2024
Nanyang Technological University
Research Assistant (Supervisor: Prof. Wenya Wang) Jun. 2024 - Nov. 2024
Hehai Lin, Hui Liu, Shilei Cao, Haoliang Li, Wenya Wang
Under review 2024
Numerous multimodal misinformation benchmarks exhibit bias toward specific modalities, allowing detectors to make predictions based solely on one modality. Training detectors on such datasets can significantly degrade performance in real-world applications. While previous research has quantified modality bias at the dataset level or manually identified spurious correlations between modalities and labels, these approaches offer little insight at the sample level and struggle to scale to the vast amount of online information. In this paper, we investigate how to automatically recognize modality bias at the sample level. Specifically, we introduce three views, namely modality benefit, modality flow, and modality causal effect, to quantify each sample's modality contribution based on different theories. To verify their effectiveness and uncover bias patterns, we conduct a human evaluation on two benchmarks, Fakeddit and MMFakeBench, and compare the performance of each individual view against their ensemble, multi-view analysis. The experimental results indicate that multi-view analysis achieves the highest performance and aligns with human judgment on most samples. We further discuss the sensitivity and consistency of each view.
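For intuition, below is a minimal, hypothetical Python sketch of the multi-view analysis idea: per-sample contribution scores from the three views (assumed here to be pre-computed and named benefit, flow, and causal_effect) are averaged and thresholded to flag samples whose prediction likely relies on a single modality. The function name, score values, and threshold are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch only: ensembling per-sample modality-contribution scores
# from the three views; the threshold and score scale are illustrative.
from statistics import mean
from typing import Dict

def multi_view_analysis(view_scores: Dict[str, float], threshold: float = 0.1) -> bool:
    """Flag a sample as modality-biased when, averaged over the three views,
    one modality (here, the image) contributes little to the prediction."""
    assert set(view_scores) == {"benefit", "flow", "causal_effect"}
    return mean(view_scores.values()) < threshold

# Usage: scores would be computed per sample from a trained detector.
sample_scores = {"benefit": 0.02, "flow": 0.05, "causal_effect": 0.01}
print(multi_view_analysis(sample_scores))  # True -> likely solvable from text alone
```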
Jiayi He*, Hehai Lin*, Qingyun Wang, Yi Fung, Heng Ji
Under review 2024
While Vision-Language Models (VLMs) have shown remarkable abilities in visual and language reasoning tasks, they invariably generate flawed responses. Self-correction, which instructs models to refine their outputs, presents a promising solution to this issue. Previous studies have mainly concentrated on Large Language Models (LLMs), while the self-correction abilities of VLMs, particularly concerning both visual and linguistic information, remain largely unexamined. This study investigates the self-correction capabilities of VLMs during both the inference and fine-tuning stages. We introduce a Self-Correction Learning (SCL) approach that enables VLMs to learn from their self-generated self-correction data through Direct Preference Optimization (DPO) without relying on external feedback, facilitating self-improvement. Specifically, we collect preferred and disfavored samples based on the correctness of initial and refined responses, which are obtained by two-turn self-correction with VLMs during the inference stage. Experimental results demonstrate that although VLMs struggle to self-correct effectively during iterative inference without additional fine-tuning and external feedback, they can enhance their performance and avoid previous mistakes through preference fine-tuning when their self-generated self-correction data are categorized into preferred and disfavored samples. This study emphasizes that self-correction is not merely a refinement process; rather, it should enhance models' reasoning abilities through additional training, enabling them to generate high-quality responses directly without further refinement.
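As a rough illustration of the data-construction step described above, the following hypothetical sketch turns two-turn self-correction records into DPO preference pairs by comparing the correctness of the initial and refined responses. The record fields and helper names are assumptions for illustration, not the released implementation.

```python
# Hypothetical sketch: build DPO preference pairs from self-generated
# self-correction data, categorized by correctness of each turn.
from dataclasses import dataclass
from typing import List

@dataclass
class SelfCorrectionRecord:
    question: str
    initial_response: str
    refined_response: str
    initial_correct: bool
    refined_correct: bool

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # preferred (correct) response
    rejected: str  # disfavored (incorrect) response

def build_dpo_pairs(records: List[SelfCorrectionRecord]) -> List[PreferencePair]:
    """Keep only records where exactly one of the two turns is correct,
    so each pair has a clear preferred / disfavored response."""
    pairs = []
    for r in records:
        if r.initial_correct and not r.refined_correct:
            pairs.append(PreferencePair(r.question, r.initial_response, r.refined_response))
        elif r.refined_correct and not r.initial_correct:
            pairs.append(PreferencePair(r.question, r.refined_response, r.initial_response))
        # Records where both turns are correct, or both wrong, give no preference signal here.
    return pairs
```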