Presentation Information
[5O3-IS-5b-02] Embedded Smart Voice Control System
HusanFu Wang2, KuanYu Lee2, 〇SULIN CHI1, ChinChin Lin2, YuJui Chen2, KaiHung Fang2 (1. Otemon Gakuin University, 2. National Formosa University)
regular
Keywords:
Mel-Frequency Cepstral Coefficients (MFCC)、Voiceprint Recognition、Deep Learning、Long Short-Term Memory (LSTM)、Human-Computer Interaction (HCI)
With the increasing demand from Internet of Things (IoT) and smart home technologies for advanced human–computer interaction, this study proposes a novel interaction interface based on user-specific clap sound-print recognition as an alternative to traditional voice wake-up words. The proposed method addresses privacy concerns and reduces the accidental triggering inherent in voice-based systems. A portable audio collection device is constructed using a low-cost microcontroller and a digital microphone, and a four-class dataset is collected. Raw audio signals are amplified and segmented into ten time steps, from which 72-dimensional Mel-Frequency Cepstral Coefficient (MFCC) features, including first- and second-order derivatives, are extracted to form a (10, 72) temporal feature sequence. An LSTM network is used for classification, achieving a recognition accuracy of 97.25%. A real-time recognition system is further implemented using the Flask framework, enabling user-specific task execution. Experimental results demonstrate the feasibility and practical value of the proposed system as a personalized smart control hub.
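A minimal NumPy sketch of the (10, 72) feature assembly described in the abstract. The 24/24/24 split into base MFCCs, first-order, and second-order derivatives is an assumption (the abstract only states the 72-dimensional total including derivatives), and the random placeholder stands in for base MFCCs that would in practice come from a library such as librosa:

```python
import numpy as np

# Assumption (not stated in the abstract): 72 = 24 base MFCCs
# + 24 first-order deltas + 24 second-order deltas.
TIME_STEPS = 10
N_MFCC = 24

def stack_mfcc_derivatives(mfcc: np.ndarray) -> np.ndarray:
    """Append first- and second-order derivatives along the time axis."""
    delta = np.gradient(mfcc, axis=0)    # first-order derivative
    delta2 = np.gradient(delta, axis=0)  # second-order derivative
    return np.concatenate([mfcc, delta, delta2], axis=1)

# Placeholder for the base MFCCs of one clap segment, shape (10, 24);
# in a real pipeline this would come from e.g. librosa.feature.mfcc.
base_mfcc = np.random.randn(TIME_STEPS, N_MFCC)
features = stack_mfcc_derivatives(base_mfcc)
print(features.shape)  # (10, 72) -- the temporal sequence fed to the LSTM
```

The resulting (10, 72) array matches the sequence shape the abstract feeds to the LSTM classifier, with each of the ten time steps carrying one 72-dimensional feature vector.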
