Presentation Information

[EDP2-03] Speech Recognition with TDGL-Based Superconducting Physical Reservoir

*Ken Arita¹, Jukiya Kusuki¹, Edmund Soji Otabe¹, Ahmet Karacali¹, Xu Muzhen¹, Yuki Usami¹, Hirofumi Tanaka¹, Tetsuya Matsuno² (1. Kyushu Institute of Technology (Japan), 2. National Institute of Technology Ariake College (Japan))

Keywords:

Time-Dependent Ginzburg–Landau (TDGL), Affine Integrator (AFI), Reservoir computing, Speech recognition

[Purpose]
Physical reservoir computing (PRC) is a computational framework that exploits the inherent dynamics of physical systems to process temporal information with minimal training cost. Superconducting systems are attractive candidates for PRC because they inherently combine nonlinearity, high-speed dynamics, and low dissipation. A superconducting reservoir modeled by the two-dimensional Time-Dependent Ginzburg–Landau (TDGL) equation has previously been benchmarked on waveform generation and Nonlinear Autoregressive Moving Average (NARMA2) tasks, with the coefficient of determination as the figure of merit, demonstrating its potential as a versatile temporal processor [1]. In this study, the applicability of the superconducting reservoir is extended to a speech-recognition task involving isolated digit classification.
[Method]
Audio inputs are first converted into Mel-frequency cepstral coefficients (MFCCs) [2], a widely used feature representation in speech processing. The preprocessed signals are then applied to the superconducting reservoir as time-dependent current-density drives. The dynamical response of the reservoir is simulated by the Affine Integrator (AFI) scheme [3] for the TDGL equations, which provides gauge-invariant and stable numerical integration. From the simulated fields, electric-field values at 50 spatial points are selected to serve as reservoir nodes. These values constitute the reservoir state vectors, which are passed to a ridge regression readout for supervised training. Importantly, as in standard PRC, the internal reservoir parameters are fixed and only the readout layer is trained, maintaining the efficiency of the framework.
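The readout stage described above can be illustrated with a minimal sketch. It assumes ten digit classes, time-averaging of each state trajectory into a single 50-dimensional feature vector, and a closed-form ridge solution onto one-hot targets; none of these details are given in the abstract. The function `fake_reservoir` is a hypothetical stand-in that returns synthetic states in place of the TDGL/AFI simulation, so the printed accuracy says nothing about the real system.

```python
import numpy as np

N_NODES = 50    # electric-field sampling points used as reservoir nodes (from the text)
N_CLASSES = 10  # isolated digits 0-9 (assumed)


def pool_states(states):
    """Collapse a (time, N_NODES) trajectory into one feature vector.

    Time-averaging is an assumption; the abstract does not say how
    trajectories are reduced before the readout.
    """
    return states.mean(axis=0)


def add_bias(features):
    """Append a constant column so the readout can learn an offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_ridge_readout(features, labels, alpha=1e-2):
    """Closed-form ridge regression onto one-hot targets.

    The bias weight is regularized together with the others, for brevity.
    """
    X = add_bias(features)
    Y = np.eye(N_CLASSES)[labels]                 # one-hot targets
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)            # shape (N_NODES + 1, N_CLASSES)


def predict(W, features):
    """Pick the class whose readout output is largest."""
    return np.argmax(add_bias(features) @ W, axis=1)


def fake_reservoir(label, rng, class_patterns, n_steps=40):
    """Hypothetical placeholder for the TDGL simulation: class pattern plus noise."""
    return class_patterns[label] + 0.5 * rng.standard_normal((n_steps, N_NODES))


rng = np.random.default_rng(0)
class_patterns = rng.standard_normal((N_CLASSES, N_NODES))
train_labels = np.repeat(np.arange(N_CLASSES), 20)
test_labels = np.repeat(np.arange(N_CLASSES), 5)
train_X = np.array([pool_states(fake_reservoir(c, rng, class_patterns)) for c in train_labels])
test_X = np.array([pool_states(fake_reservoir(c, rng, class_patterns)) for c in test_labels])

# Only the readout weights W are fitted; the reservoir itself is never trained.
W = fit_ridge_readout(train_X, train_labels)
accuracy = float(np.mean(predict(W, test_X) == test_labels))
print(f"test accuracy on synthetic states: {accuracy:.2f}")
```

In an actual run, `fake_reservoir` would be replaced by the simulated electric-field traces at the 50 selected points, and the regularization strength `alpha` would be chosen on held-out data.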
[Results]
Classification accuracy is evaluated as a function of pinning density, pinning strength, and reduced temperature. Systematic parameter scans are performed to clarify the trade-offs between memory retention and nonlinear separability inherent to the vortex dynamics. The results, summarized as a heatmap in Figure 1, show that an overall classification accuracy of approximately 0.6 is achieved under the optimal conditions.
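A scan of this kind can be organized as a simple grid search. The sketch below is illustrative only: the grid values are hypothetical because the abstract does not list them, and `evaluate_accuracy` is a toy placeholder for the full "simulate, train readout, score" step, not a model of the reported results.

```python
import itertools
import numpy as np

# Hypothetical grids; the abstract does not list the scanned values.
pin_densities = [0.0, 0.1, 0.2, 0.3]
pin_strengths = [0.0, 0.25, 0.5, 0.75, 1.0]
reduced_temps = [0.3, 0.5, 0.7]


def evaluate_accuracy(density, strength, temp):
    """Stand-in for 'simulate reservoir, train readout, score test set'.

    Toy smooth function with a single interior peak, used only so the
    scan runs end to end; it does not reproduce any measured accuracy.
    """
    return float(np.exp(-((density - 0.2) ** 2 / 0.02
                          + (strength - 0.5) ** 2 / 0.2
                          + (temp - 0.5) ** 2 / 0.1)))


# Fill a 3-D accuracy array indexed by (density, strength, temperature).
acc = np.zeros((len(pin_densities), len(pin_strengths), len(reduced_temps)))
for (i, d), (j, s), (k, t) in itertools.product(
        enumerate(pin_densities), enumerate(pin_strengths), enumerate(reduced_temps)):
    acc[i, j, k] = evaluate_accuracy(d, s, t)

# Locate the best grid point and take a 2-D slice suitable for a heatmap.
bi, bj, bk = np.unravel_index(np.argmax(acc), acc.shape)
best = (pin_densities[bi], pin_strengths[bj], reduced_temps[bk])
heatmap = acc[:, :, bk]  # density x strength at the best temperature
print("best (density, strength, temperature):", best)
```

Storing the scores in one array keeps the three scan axes aligned, so any two-parameter slice can be plotted as a heatmap without rerunning the simulations.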
[Discussion]
The analysis further indicates that weak pinning enhances memory capacity, as vortices exhibit more reversible motion and energy is temporarily stored in the order parameter before relaxation. In contrast, strong pinning produces highly nonlinear responses because vortices enter and exit pinning sites in a discontinuous manner, amplifying local field variations but reducing long-term memory. Intermediate regimes are therefore found to balance these effects and yield the best recognition performance.
[Conclusion]
These findings demonstrate that the TDGL-based superconducting reservoir can be adapted beyond benchmark temporal tasks to practical classification problems such as speech recognition. The method relies only on established thin-film superconductors, patterned pinning arrays, and standard cryogenic readout, suggesting compatibility with future hardware implementations. The present results reinforce the potential of superconducting reservoirs as a physically grounded platform for information processing, with implications not only for speech recognition but also for broader real-time signal-processing applications in cryogenic environments.

Presentation Materials
https://iss-archives.jp/iss2025.jp/slides/EDP2-03.pdf