Presentation Information
[3Yin-A-23] Analyzing Image and Text Contributions to Social Bias in VLMs
〇Daiki Shirafuji¹, Makoto Takenaka¹, Tatsuhiko Saito¹ (1. Mitsubishi Electric Corporation)
Keywords:
Social Bias, VLM, AI Ethics
In recent years, social bias in Vision Language Models (VLMs) has been increasingly recognized as a serious problem. Prior work, however, has not sufficiently investigated which input modality is responsible for biased outputs. In this paper, we propose a bias contribution analysis method that uses the probability distribution over tokens generated by a VLM. The method consists of three steps: (1) preparing queries that include occupations likely to induce bias, together with corresponding anti-stereotypical images and texts as contextual inputs; (2) feeding each modality's context into a VLM along with the query and computing a score for that modality by subtracting the log probability of the anti-stereotypical tokens from that of the stereotypical tokens; and (3) constructing an evaluation metric by subtracting the text score from the image score. A positive value of the proposed metric indicates that the image contributes more to social bias, whereas a negative value indicates that the text contributes more. We conduct experiments on a dataset based on FACET. The results show that many models exhibit a larger contribution from images for both gender and race bias, suggesting that images tend to influence socially biased outputs more strongly than text in generative tasks.
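
The arithmetic in steps (2) and (3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the single-token simplification, and the probabilities in the example are all hypothetical, and obtaining the log probabilities from an actual VLM is left out. It only shows how the per-modality score and the final metric relate.

```python
import math
from typing import Mapping


def modality_score(logprobs: Mapping[str, float], stereo: str, anti: str) -> float:
    """Step (2): log P(stereotypical token) minus log P(anti-stereotypical token).

    `logprobs` maps candidate tokens to log probabilities that a VLM assigned
    when given the query plus one modality's context (hypothetical input format).
    """
    return logprobs[stereo] - logprobs[anti]


def bias_contribution(
    image_logprobs: Mapping[str, float],
    text_logprobs: Mapping[str, float],
    stereo: str,
    anti: str,
) -> float:
    """Step (3): image score minus text score.

    Positive: the image context contributes more to the biased output.
    Negative: the text context contributes more.
    """
    image_score = modality_score(image_logprobs, stereo, anti)
    text_score = modality_score(text_logprobs, stereo, anti)
    return image_score - text_score


# Made-up probabilities for illustration only (not results from the paper).
# With the anti-stereotypical image as context, the model still prefers "he".
image_ctx = {"he": math.log(0.6), "she": math.log(0.3)}
# With the anti-stereotypical text as context, the model is indifferent.
text_ctx = {"he": math.log(0.4), "she": math.log(0.4)}

metric = bias_contribution(image_ctx, text_ctx, stereo="he", anti="she")
print(f"bias contribution metric: {metric:.3f}")
```

In this toy case the image score is log 2 and the text score is 0, so the metric is positive, which under the abstract's definition would point to the image as the larger contributor. If the stereotypical or anti-stereotypical answer spans several tokens, one plausible extension (again an assumption) is to sum the token log probabilities before taking the difference.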
