The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

[3Yin-A-23] Analyzing Image and Text Contributions to Social Bias in VLMs

〇Daiki Shirafuji¹, Makoto Takenaka¹, Tatsuhiko Saito¹ (1. Mitsubishi Electric Corporation)

Keywords:

Social Bias, VLM, AI Ethics

In recent years, social bias in Vision Language Models (VLMs) has been increasingly recognized as a serious problem. However, prior work has not sufficiently investigated which modality is responsible for biased outputs. In this paper, we propose a bias contribution analysis method that uses the probability distribution over tokens generated by VLMs. Our method consists of the following steps: (1) preparing queries that mention occupations likely to induce bias, together with corresponding anti-stereotypical images and texts as contextual inputs; (2) feeding the data of each modality, together with the queries, into a VLM and calculating a score for each modality by subtracting the log probability of anti-stereotypical tokens from that of stereotypical tokens; and (3) constructing an evaluation metric by subtracting the text score from the image score. A positive value of the proposed metric indicates that the image contributes to social bias, whereas a negative value indicates that the text does. We conduct experiments on a dataset based on FACET. The experimental results show that many models exhibit a larger contribution from images for both gender and race bias, suggesting that images tend to influence socially biased outputs more strongly than text in generative tasks.
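As a rough illustration of steps (2) and (3), the per-modality score and the final metric can be computed from token log probabilities as sketched below. This is a minimal sketch, not the authors' implementation. It assumes the VLM's log probabilities for one stereotypical and one anti-stereotypical token have already been extracted for each context; the abstract does not say how multi-token words or multiple queries are aggregated, so the simple averaging in `mean_contribution` is an assumption. The function names, the example tokens "she"/"he", and all numbers are hypothetical toy values, not experimental results.

```python
import math
from statistics import fmean
from typing import Mapping, Sequence, Tuple

# Hypothetical input format (an assumption, not from the paper): a mapping from
# candidate next tokens to the log probabilities the VLM assigns to them after
# reading one modality's context together with the occupation query.
LogProbs = Mapping[str, float]


def modality_score(logprobs: LogProbs, stereo: str, anti: str) -> float:
    """Step (2): log p(stereotypical token) - log p(anti-stereotypical token)."""
    return logprobs[stereo] - logprobs[anti]


def bias_contribution(image_lp: LogProbs, text_lp: LogProbs,
                      stereo: str, anti: str) -> float:
    """Step (3): image score minus text score.

    > 0: the image contributes more to social bias; < 0: the text does.
    """
    return (modality_score(image_lp, stereo, anti)
            - modality_score(text_lp, stereo, anti))


def mean_contribution(items: Sequence[Tuple[LogProbs, LogProbs, str, str]]) -> float:
    """Dataset-level summary by simple averaging (an assumed aggregation)."""
    return fmean([bias_contribution(*item) for item in items])


# Toy values for a single hypothetical query (e.g. about a nurse), NOT real results.
image_lp = {"she": math.log(0.6), "he": math.log(0.3)}
text_lp = {"she": math.log(0.2), "he": math.log(0.7)}

metric = bias_contribution(image_lp, text_lp, "she", "he")
print(round(metric, 3))  # log(0.6/0.3) - log(0.2/0.7) = log(7), prints 1.946
```

Working in log space turns each modality's probability ratio p(stereotypical)/p(anti-stereotypical) into a difference, so the metric is itself a log ratio of the two modalities' ratios; the sign convention follows the abstract.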
