Presentation Information
[3Yin-A-23] Analyzing Image and Text Contributions to Social Bias in VLMs
〇Daiki Shirafuji¹, Makoto Takenaka¹, Tatsuhiko Saito¹ (1. Mitsubishi Electric Corporation)
Keywords:
Social Bias, VLM, AI Ethics
In recent years, social bias in Vision Language Models (VLMs) has been increasingly recognized as a serious problem. Prior work has not sufficiently investigated which modality is responsible for biased outputs. In this paper, we propose a bias contribution analysis method that uses the probability distribution over tokens generated by a VLM. The method consists of three steps: (1) preparing queries that include occupations likely to induce bias, together with corresponding anti-stereotypical images and texts as contextual inputs; (2) feeding each modality's data, together with the queries, into a VLM and computing a score for each modality by subtracting the log probability of anti-stereotypical tokens from that of stereotypical tokens; and (3) constructing an evaluation metric by subtracting the text score from the image score. A positive value of the proposed metric indicates that the image contributes more to social bias, whereas a negative value indicates that the text contributes more. We conduct experiments on a dataset based on FACET. The results show that many models exhibit a larger contribution from images for both gender and race bias, suggesting that images tend to influence socially biased outputs more strongly than text in generative tasks.
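Steps (2) and (3) reduce to two subtractions, which the following minimal Python sketch illustrates. It is not the authors' implementation: the function names and the token probabilities are hypothetical, and the abstract does not say how log probabilities are extracted from a VLM or aggregated over multiple tokens, so those parts are left out.

```python
import math


def modality_score(logp_stereo: float, logp_anti: float) -> float:
    """Step (2): log p(stereotypical tokens) minus log p(anti-stereotypical tokens)."""
    return logp_stereo - logp_anti


def contribution_metric(image_score: float, text_score: float) -> float:
    """Step (3): image score minus text score."""
    return image_score - text_score


def interpret(metric: float) -> str:
    """Map the sign of the metric to the modality that contributes more."""
    if metric > 0:
        return "image"
    if metric < 0:
        return "text"
    return "neither"


# Hypothetical token probabilities, invented purely for illustration.
# In practice these would come from a VLM given an image or a text context.
image_score = modality_score(math.log(0.6), math.log(0.3))  # image as context
text_score = modality_score(math.log(0.4), math.log(0.5))   # text as context

metric = contribution_metric(image_score, text_score)
print(f"image score = {image_score:.3f}")
print(f"text score  = {text_score:.3f}")
print(f"metric      = {metric:.3f} -> larger contribution from: {interpret(metric)}")
```

With these made-up numbers the metric is positive, which under the abstract's definition points to the image as the larger contributor; swapping the two contexts' probabilities would flip the sign.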
