Presentation Information
[17p-K307-14] An YMnO3 Single Crystal-Based In-Materio Physical Reservoir Computing Device for Voice Recognition
〇Muzhen Xu1, Kyoka Furuta2, Ahmet Karacali3, Yuki Umezaki2, Yuki Usami1,3, Yoichi Horibe1,2, Hirofumi Tanaka1,3 (1.Kyushu Inst. Tech. Neumorph Center, 2.Kyushu Inst. Tech. Mater. Sci. Eng., 3.Kyushu Inst. Tech. Hum. Intel. Sys.)
Keywords:
in-materio physical reservoir computing, yttrium manganese oxide (YMnO3)
Physical reservoir computing (PRC) is a computational paradigm that leverages the intrinsic nonlinearity of physical systems to perform complex tasks efficiently. Yttrium manganese oxide (YMnO3), a ferroelectric material, features a network of semiconductive domains and domain walls analogous to a reservoir layer. This study evaluates the voice recognition performance of YMnO3 as an in-materio PRC device.
The surface of YMnO3 crystallites was polished perpendicular to the crystallographic c-axis (denoted YMO⊥), and a 16-electrode array was deposited on each of the top and bottom surfaces. The Free Spoken Digit Dataset, containing voice recordings of the digits zero to nine, each pronounced 47 times by six speakers (George, Lucas, Jackson, Theo, Nicolas, and Yweweler), was used for voice recognition. This dataset was augmented 15 times using a sulfonated polyaniline network, followed by rectification, downsampling, and normalization to the range 0-5 in Python. The pretreated voice signals were then amplified threefold (Thurlby Thandar Instruments, WA301 Wide Band Amplifier, 30 V pk-pk) and applied to the YMnO3 PRC system as time-series bias voltages using LabVIEW. Thirty-one output signals from the device were recorded simultaneously at a sampling rate of 1000 points/s. For voice recognition, all output signals were treated as real-valued features, a one-hot vector was used as the target, and ridge regression was applied for classification in Python. The device achieved recognition accuracies of up to 75% for digits and 98% for speakers. These results highlight the potential of YMnO3-based PRC for practical speech recognition applications, offering both high performance and energy efficiency.
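The preprocessing and readout steps described above can be illustrated with a minimal Python sketch. It is not the authors' code, and several details are assumptions, since the abstract does not specify them: full-wave rectification, block-average downsampling, one feature per output channel, the regularization strength `alpha`, and the function names themselves. Synthetic data stands in for the 31 recorded device outputs.

```python
import numpy as np

def preprocess(signal, factor, v_max=5.0):
    """Rectify, downsample, and scale a waveform to the range [0, v_max]."""
    rectified = np.abs(signal)                       # full-wave rectification (assumed)
    n = len(rectified) // factor * factor            # drop samples that do not fill a block
    down = rectified[:n].reshape(-1, factor).mean(axis=1)  # block-average downsampling (assumed)
    span = down.max() - down.min()
    if span == 0:
        return np.zeros_like(down)
    return (down - down.min()) / span * v_max

def one_hot(labels, n_classes):
    """Encode integer class labels as one-hot target rows."""
    targets = np.zeros((len(labels), n_classes))
    targets[np.arange(len(labels)), labels] = 1.0
    return targets

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression readout: W = (X^T X + alpha I)^-1 X^T Y."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])    # bias column (also regularized here)
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def predict(X, W):
    """Return the class whose readout value is largest for each sample."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W, axis=1)

# Synthetic stand-in for device responses: 31 features per utterance, 10 digit classes.
rng = np.random.default_rng(0)
n_channels, n_classes, n_samples = 31, 10, 500
centers = rng.normal(size=(n_classes, n_channels))
labels = rng.integers(0, n_classes, size=n_samples)
features = centers[labels] + 0.1 * rng.normal(size=(n_samples, n_channels))

# Train the readout on the first 400 samples and evaluate on the remainder.
W = fit_ridge(features[:400], one_hot(labels[:400], n_classes))
accuracy = np.mean(predict(features[400:], W) == labels[400:])
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Only the linear readout is trained in this scheme; the reservoir itself (here, the YMnO3 device) is left unchanged, which is why a closed-form ridge solution suffices. The accuracy printed by this sketch reflects the synthetic data only and says nothing about the reported device results.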