Presentation Information
[3K2-OS-27b-01] Comparative Analysis of Human and Large Language Model Evaluations of Real Estate Floor Plans — Visual Impression vs. Structural Understanding —
〇Taro Narahara¹, Toshihiko Yamasaki² (1. New Jersey Institute of Technology (NJIT), 2. The Univ. of Tokyo)
Keywords:
Floor Plan Evaluation, Subjective Score Prediction, Multimodal Learning, Large Language Models, Human–AI Collaboration
This study investigates whether large language models (LLMs) can predict “living comfort” from real-estate floor plan images, and compares their predictions with human ground-truth ratings. Experiments in zero-shot, few-shot, and fine-tuning settings show that GPT performs well on visually driven impressions such as spaciousness, modernity, and luxuriousness; even the zero-shot setting demonstrates strong general visual knowledge. However, performance declines on criteria that require a structural understanding of room adjacency, circulation, and privacy, and fine-tuning does not fully resolve this limitation. These results highlight the need for more effective methods of conveying structural layout information to LLMs.
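As an illustration of how model predictions might be compared with human ratings on a per-criterion basis, the sketch below computes a Spearman rank correlation for each criterion. The abstract does not state which agreement metric the study uses, so the choice of Spearman correlation is an assumption, as are the function names (`average_ranks`, `spearman`, `per_criterion_agreement`). The scores at the bottom are placeholder values invented only to exercise the code; they are not results from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def per_criterion_agreement(human, model):
    """Map each criterion name to the human-vs-model rank correlation."""
    return {name: spearman(human[name], model[name]) for name in human}


# Placeholder scores (hypothetical, NOT data from the study):
# one list entry per floor plan, one dict key per evaluation criterion.
human_scores = {
    "spaciousness": [4.0, 2.5, 3.5, 1.0, 5.0],
    "privacy": [3.0, 4.5, 2.0, 1.5, 4.0],
}
model_scores = {
    "spaciousness": [3.8, 2.0, 3.9, 1.2, 4.6],
    "privacy": [2.5, 3.0, 4.0, 2.0, 3.5],
}

if __name__ == "__main__":
    for name, rho in per_criterion_agreement(human_scores, model_scores).items():
        print(f"{name}: rho = {rho:.2f}")
```

A rank-based measure is used here because subjective ratings are often ordinal, and human and model scores need not share a scale; other agreement measures would fit into `per_criterion_agreement` the same way.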
