Presentation Information
[3Yin-A-21] A Multi-Persona AI Music Critique System with Real-time Performance Analysis and RAG
〇Kiwamu Sato1 (1. Accenture Japan Ltd)
Keywords: Music Critique Generation, Retrieval-Augmented Generation, Real-time Audio Analysis
We present Otoprism, a multi-persona AI music critique system that integrates real-time acoustic analysis with
retrieval-augmented generation (RAG) to support understanding of expressive performance during live music listening.
The system analyzes incoming audio on a client device, aggregates acoustic features over a sliding time
window, and converts them into natural-language descriptions across five categories—tempo, dynamics, timbre,
harmony, and expression—via a rule-based module called FeatureToNLP. These descriptions serve as queries to
retrieve relevant human critiques from a performance-criticism corpus, which are injected into LLM prompts to
generate short persona-specific critiques at approximately 40-second intervals. As a pilot validation, we compared
generated critiques for two contrasting performances of the same string-quartet piece and confirmed, across three
independent runs under fixed conditions, that the generated text consistently exhibits vocabulary differences aligned
with the intended expressive contrast.
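
The aggregation and description steps can be illustrated with a minimal Python sketch. It is not the system's actual FeatureToNLP module: the `Frame` fields, the `SlidingWindow` and `describe` names, the 40-second window span, and every threshold are assumptions chosen for illustration, and only two of the five categories (tempo and dynamics) are covered.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean


@dataclass
class Frame:
    """One analysis frame. The fields are hypothetical, not the system's feature set."""
    time_s: float
    tempo_bpm: float
    loudness_db: float


class SlidingWindow:
    """Keeps only the frames that fall within the last `span_s` seconds."""

    def __init__(self, span_s: float = 40.0):  # span is an assumed value
        self.span_s = span_s
        self.frames = deque()

    def push(self, frame: Frame) -> None:
        self.frames.append(frame)
        # Drop frames older than the window span relative to the newest frame.
        while frame.time_s - self.frames[0].time_s > self.span_s:
            self.frames.popleft()

    def summary(self) -> dict:
        # Aggregate by simple averaging; call only after at least one push.
        return {
            "tempo_bpm": mean(f.tempo_bpm for f in self.frames),
            "loudness_db": mean(f.loudness_db for f in self.frames),
        }


def describe(summary: dict) -> dict:
    """Rule-based mapping from aggregated features to short phrases.

    All thresholds are illustrative assumptions.
    """
    tempo = summary["tempo_bpm"]
    if tempo < 70:
        tempo_text = "slow tempo"
    elif tempo < 120:
        tempo_text = "moderate tempo"
    else:
        tempo_text = "fast tempo"

    loudness = summary["loudness_db"]
    if loudness < -25:
        dynamics_text = "quiet dynamics"
    elif loudness < -12:
        dynamics_text = "moderate dynamics"
    else:
        dynamics_text = "loud dynamics"

    return {"tempo": tempo_text, "dynamics": dynamics_text}
```

In a pipeline of the kind described, the phrases returned by `describe` would act as retrieval queries against the critique corpus; the retrieval and prompting stages are outside the scope of this sketch.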
