Presentation Information

[3K2-OS-27b-01] Comparative Analysis of Human and Large Language Model Evaluations of Real Estate Floor Plans — Visual Impression vs. Structural Understanding —

〇Taro Narahara1, Toshihiko Yamasaki2 (1. New Jersey Institute of Technology (NJIT), 2. The Univ. of Tokyo)

Keywords:

Floor Plan Evaluation, Subjective Score Prediction, Multimodal Learning, Large Language Models, Human–AI Collaboration

This study investigates whether large language models (LLMs) can predict “living comfort” from real-estate floor plan images and compares their predictions with human ratings used as ground truth. Experiments in zero-shot, few-shot, and fine-tuning settings show that GPT performs well on visually driven impressions such as spaciousness, modernity, and luxuriance, and its zero-shot results already indicate strong general visual knowledge. However, performance declines on criteria that require structural understanding of room adjacency, circulation, and privacy, and fine-tuning does not fully resolve this limitation. These results highlight the need for more effective methods of conveying structural layout information to LLMs.
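
The abstract does not say which measure is used to compare model predictions with human ground truth. As a minimal sketch only, assuming that agreement is assessed per criterion with a rank correlation, the Python below computes Spearman's coefficient between two lists of scores. The function names `average_ranks` and `spearman` are hypothetical, and the choice of Spearman correlation is an assumption rather than the authors' stated method.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(human_scores, model_scores):
    """Spearman rank correlation: Pearson correlation of the two rank lists.

    Assumed metric for illustration; the source does not name one.
    """
    if len(human_scores) != len(model_scores):
        raise ValueError("score lists must have the same length")
    rx, ry = average_ranks(human_scores), average_ranks(model_scores)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one list is constant
    return cov / (var_x * var_y) ** 0.5
```

Under this assumption, `spearman` would be called once per criterion (for example, once for spaciousness and once for privacy), with the human scores and the model scores for the same set of floor plans. A rank-based measure is used in this sketch because subjective ratings are ordinal, so only the ordering of the plans is compared.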