(† corresponding author, * equal contribution)

2024

Multi-view Analysis for Modality Bias in Multimodal Misinformation Benchmarks

Hehai Lin, Hui Liu, Shilei Cao, Haoliang Li, Wenya Wang

Under review 2024

Numerous multimodal misinformation benchmarks exhibit bias toward specific modalities, allowing detectors to make predictions based solely on one modality. Training detectors on such datasets can significantly degrade performance in real-world applications. While previous research has quantified modality bias at the dataset level or manually identified spurious correlations between modalities and labels, these approaches lack meaningful insights at the sample level and struggle to scale to the vast amount of online information. In this paper, we investigate how to automatically recognize modality bias at the sample level. Specifically, we introduce three views, namely modality benefit, modality flow, and modality causal effect, to quantify each sample's modality contribution based on different theories. To verify their effectiveness and discover patterns of bias, we conduct a human evaluation on two benchmarks, Fakeddit and MMFakeBench, and compare the performance of each view and their ensemble, multi-view analysis. The experimental results indicate that multi-view analysis yields the highest performance and aligns with human judgment on most samples. We further discuss the sensitivity and consistency of each view.
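
The abstract does not spell out how each view is computed, so the following Python sketch only illustrates the general idea behind one of them: a "modality benefit"-style score obtained by comparing unimodal and multimodal detector confidences on a single sample. The predict_* callables and the binary fake/real setting are assumptions for illustration, not the paper's actual formulation.

# Hypothetical sketch: score a single sample's modality contribution by
# comparing unimodal and multimodal detector confidences.
# The predict_* callables and the binary setting are assumptions, not the paper's method.
from typing import Callable, Dict

def modality_benefit(
    text: str,
    image_path: str,
    label: int,
    predict_text: Callable[[str], Dict[int, float]],            # text-only detector
    predict_image: Callable[[str], Dict[int, float]],           # image-only detector
    predict_multimodal: Callable[[str, str], Dict[int, float]],
) -> Dict[str, float]:
    """Quantify how much each modality alone already explains the label."""
    p_text = predict_text(text)[label]
    p_image = predict_image(image_path)[label]
    p_multi = predict_multimodal(text, image_path)[label]
    return {
        "text_benefit": p_text - 0.5,                   # confidence above chance from text alone (binary setting)
        "image_benefit": p_image - 0.5,                 # confidence above chance from image alone
        "fusion_gain": p_multi - max(p_text, p_image),  # what combining modalities adds on top
    }

# A sample would be flagged as text-biased when, for instance, text_benefit is
# high while fusion_gain is near zero, i.e. the label is predictable from text alone.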

Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks

Jiayi He*, Hehai Lin*, Qingyun Wang, Yi Fung, Heng Ji

Under review 2024

While Vision-Language Models (VLMs) have shown remarkable abilities in visual and language reasoning tasks, they invariably generate flawed responses. Self-correction that instructs models to refine their outputs presents a promising solution to this issue. Previous studies have mainly concentrated on Large Language Models (LLMs), while the self-correction abilities of VLMs, particularly concerning both visual and linguistic information, remain largely unexamined. This study investigates the self-correction capabilities of VLMs during both inference and fine-tuning stages. We introduce a Self-Correction Learning (SCL) approach that enables VLMs to learn from their self-generated self-correction data through Direct Preference Optimization (DPO) without relying on external feedback, facilitating self-improvement. Specifically, we collect preferred and disfavored samples based on the correctness of initial and refined responses, which are obtained by two-turn self-correction with VLMs during the inference stage. Experimental results demonstrate that although VLMs struggle to self-correct effectively during iterative inference without additional fine-tuning and external feedback, they can enhance their performance and avoid previous mistakes through preference fine-tuning when their self-generated self-correction data are categorized into preferred and disfavored samples. This study emphasizes that self-correction is not merely a refinement process; rather, it should enhance the reasoning abilities of models through additional training, enabling them to generate high-quality responses directly without further refinement.
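
As a rough illustration of the data-collection step described above, the sketch below shows one plausible way to turn two-turn self-correction outcomes into preference pairs for DPO. The record fields and the pairing rule are assumptions, not the paper's exact recipe.

# Hypothetical sketch: assembling DPO preference pairs from two-turn self-correction.
# Field names and the pairing rule are illustrative assumptions.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SelfCorrectionRecord:
    prompt: str
    initial_response: str
    refined_response: str
    initial_correct: bool
    refined_correct: bool

def to_preference_pairs(records: List[SelfCorrectionRecord]) -> List[Dict[str, str]]:
    """Keep only records where exactly one response is correct, so the correct
    one becomes 'chosen' and the incorrect one 'rejected'."""
    pairs = []
    for r in records:
        if r.initial_correct and not r.refined_correct:
            pairs.append({"prompt": r.prompt,
                          "chosen": r.initial_response,
                          "rejected": r.refined_response})
        elif r.refined_correct and not r.initial_correct:
            pairs.append({"prompt": r.prompt,
                          "chosen": r.refined_response,
                          "rejected": r.initial_response})
        # records where both turns agree in correctness carry no preference signal
    return pairs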

CoNewsReader: Supporting Critical News Reading with Comments in Social Media

Kangyu Yuan, Guanzheng Chen, Sizhe Liang, Hehai Lin, Qingyu Guo, Zhenhui Peng

Under review 2024

Critical news reading (CNR), which requires grasping the main ideas of the news and raising critical thoughts about it, is beneficial yet challenging for people who want to get comprehensive information on social media. Comments under the news can aid CNR by providing additional information and other readers' opinions. However, how to leverage these comments to support users in CNR remains under-investigated. In this paper, we first derive user requirements for a comment-based CNR tool from the literature and a formative study (N=6). Then, we develop CoNewsReader, a comment-based interactive CNR tool powered by a large language model. CoNewsReader supports users in grasping the main idea of the news with supplementary information from comments, filtering comments useful for CNR, and receiving questions generated from the comments to prompt critical thinking. Our within-subjects study (N=24) indicates that, compared to a baseline social media news reading interface, participants using CoNewsReader have a more engaging CNR experience and perform better at comprehending the news and raising critical thoughts. We discuss design considerations for supporting reading tasks with user- and machine-generated content.
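
A minimal sketch of the comment-processing idea described above might look as follows; the call_llm callable and the prompts are placeholders for whatever LLM backend the system uses, not CoNewsReader's actual implementation.

# Hypothetical sketch of the comment-processing pipeline described above.
# call_llm stands in for an arbitrary LLM backend; the prompts are illustrative only.
from typing import Callable, Dict, List

def support_cnr(
    news: str,
    comments: List[str],
    call_llm: Callable[[str], str],
) -> Dict[str, object]:
    """Filter comments useful for critical news reading and generate questions."""
    useful = [
        c for c in comments
        if "yes" in call_llm(
            f"News: {news}\nComment: {c}\n"
            "Does this comment add information or a distinct opinion that helps "
            "a reader think critically about the news? Answer yes or no."
        ).lower()
    ]
    questions = call_llm(
        f"News: {news}\nUseful comments: {useful}\n"
        "Based on these comments, ask three questions that prompt the reader to "
        "think critically about the news."
    )
    return {"useful_comments": useful, "critical_questions": questions}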

SAFE: A Spatial-aware Framework for Arable Land Quality Evaluation

Hehai Lin, Wei Liu†, Mengting Li, Kangyu Yuan, Zhao Liu, Huaijie Zhu, Jianxing Yu, Jian Yin

(ICONIP '24) The International Conference on Neural Information Processing 2024 CCF-C

Identifying high-quality arable land is essential for effective arable land protection. In arable land assessment, where each sampling unit is recorded in a tabular format, current approaches primarily rely on decision tree ensembles and deep learning (DL) techniques. However, these methods readily suffer from data imbalance, a common issue in real-world arable land assessment that can lead to sub-optimal outcomes. Furthermore, traditional methods tend to treat each land grid as an independent sample, overlooking the significant spatial and topological interactions among adjacent areas. To overcome these challenges, we introduce SAFE, a Spatial-aware Framework for arable land quality Evaluation that integrates convolutional neural network (CNN) and graph neural network (GNN) components into a deep learning-based architecture. This framework is designed to discern local spatial features and capture topological relationships within tabular data. Additionally, we incorporate a self-supervised regularization technique based on contrastive learning into the training objective, which refines the feature embeddings and mitigates the effects of data imbalance. Experimental results demonstrate the distinct advantages of SAFE on a real-world dataset across multiple counties.
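
As a very rough sketch of the kind of architecture the abstract describes, the PyTorch snippet below combines a small 1-D CNN over each grid's tabular features with a single adjacency-based graph-convolution step, and adds a supervised contrastive regulariser. Every layer size, the fusion scheme, and the exact loss are assumptions for illustration, not SAFE's actual design.

# Hypothetical sketch (not SAFE's actual architecture): CNN + graph convolution
# over per-grid tabular features, with a simplified supervised contrastive term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAwareNet(nn.Module):
    """Illustrative CNN + graph-convolution hybrid for per-grid tabular features."""
    def __init__(self, n_features: int, n_classes: int, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(          # local patterns within one grid's feature vector
            nn.Conv1d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(16 + n_features, hidden)
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor, adj: torch.Tensor):
        # x: (N, F) tabular features per land grid; adj: (N, N) row-normalised adjacency
        local = self.cnn(x.unsqueeze(1)).squeeze(-1)           # (N, 16)
        h = F.relu(self.proj(torch.cat([local, x], dim=1)))    # (N, hidden)
        h = adj @ h                                            # one message-passing step over neighbours
        return self.classifier(h), h                           # logits and embeddings

def contrastive_regulariser(h: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """Pull embeddings of same-label grids together (simplified supervised contrastive term)."""
    h = F.normalize(h, dim=1)
    sim = (h @ h.T) / tau
    eye = torch.eye(len(labels), dtype=torch.bool, device=h.device)
    sim = sim.masked_fill(eye, float("-inf"))                  # drop self-similarity
    positives = (labels[:, None] == labels[None, :]) & ~eye    # same label, different grid
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -log_prob[positives].mean() if positives.any() else h.new_zeros(())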

2023

CriTrainer: An Adaptive Training Tool for Critical Paper Reading

Kangyu Yuan*, Hehai Lin*, Shilei Cao*, Zhenhui Peng†, Qingyu Guo, Xiaojuan Ma

(UIST '23) Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology 2023 CCF-A

Learning to read scientific papers critically, which requires first grasping their main ideas and then raising critical thoughts, is important yet challenging for novice researchers. The traditional ways to develop critical paper reading (CPR) skills, e.g., checking general tutorials or taking reading courses, often cannot provide individuals with adaptive and accessible support. In this paper, we first derive user requirements for a CPR training tool based on the literature and a survey study (N=52). Then, we develop CriTrainer, an interactive tool for CPR training. It leverages text summarization techniques to train readers' skills in grasping a paper's main ideas. It further utilizes questions generated from templates to help them learn how to raise critical thoughts. A mixed-design study (N=24) shows that, compared to a baseline tool with general CPR guidance, students trained with CriTrainer perform better at independently raising critical thinking questions on a new paper. We conclude with design considerations for CPR training tools.
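
As a small illustration of what template-based question generation could look like, the snippet below fills a few generic critical-thinking templates with claims extracted from a paper summary. The templates and the extract_key_claims helper are invented for illustration and are not CriTrainer's actual question bank.

# Hypothetical sketch of template-based critical-thinking question generation.
# The templates and extract_key_claims are illustrative, not CriTrainer's own.
from typing import List

TEMPLATES = [
    "What evidence in the paper supports the claim that {claim}?",
    "What alternative explanations could account for the finding that {claim}?",
    "How would the conclusion change if the assumption behind '{claim}' did not hold?",
]

def extract_key_claims(summary: str) -> List[str]:
    """Placeholder: treat each sentence of the summary as a candidate claim."""
    return [s.strip() for s in summary.split(".") if s.strip()]

def generate_questions(summary: str, max_questions: int = 6) -> List[str]:
    questions = [
        template.format(claim=claim)
        for claim in extract_key_claims(summary)
        for template in TEMPLATES
    ]
    return questions[:max_questions]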
