Presentation Information
[O6-06] An investigation of auditory rhythms with a spiking neural network autoencoder
*Rodrigo Manríquez1,2, Sonja A. Kotz2,3, Andrea Ravignani4,5, Bart de Boer1 (1. Vrije Universiteit Brussel (Belgium), 2. Maastricht University (Netherlands), 3. Max Planck Institute for Human Cognitive and Brain Sciences (Germany), 4. Sapienza University of Rome (Italy), 5. Aarhus University & The Royal Academy of Music (Denmark))
Keywords:
Spiking Neural Networks, Auditory Processing, Rhythm Processing
Here, we present a biologically inspired spiking neural network (SNN) framework that learns auditory rhythms from acoustic data by exploiting exact spike timing. Although classic deep learning models have been applied to investigate temporal sequences, SNNs more accurately reflect the temporal dynamics of biological neural systems.
We first encoded acoustic waveforms containing rhythmic information into spike trains using a subcortical model of the peripheral auditory pathway [1]. This model reproduces cochlear transduction and auditory-nerve firing across characteristic frequencies, yielding parallel streams of precisely timed spikes that retain the temporal structure of the input. These spike trains were then used to train a purely spike-based autoencoder. In this framework, the encoder compresses input data into a latent representation, i.e., a simplified representation that captures underlying features of the data, while the decoder reconstructs the amplitude envelope of the original sound, preserving rhythmic features.
By training on isochronous sequences, where consecutive onsets are separated by identical intervals, we demonstrate that rhythmic structure is preserved in the latent space representation. Moreover, the network develops predictive behaviour, anticipating subsequent beat onsets even when a beat is absent. This sensitivity reflects a form of temporal expectation embedded in the SNN. To evaluate how the network internalises rhythmic structure, we tested it with sequences containing omitted beats and inspected the resulting latent representations. By analysing the spiking activity and internal variables within this hidden layer, we revealed how the model encodes temporal regularities and reconstructs the expected onset pattern, in a way that would not be possible in a non-spiking neural network.
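The omitted-beat test can be sketched as follows: given an isochronous onset sequence with one beat removed, the expected time of the missing onset is recoverable from the regular inter-onset interval. This is only an illustration of the stimulus design and the regularity the latent layer is probed for; the onset times and gap-detection heuristic are hypothetical, not the network's actual mechanism.

```python
import numpy as np

# Isochronous onset times (in samples) with the beat at 500 omitted.
onsets = np.array([0, 100, 200, 300, 400, 600, 700])

# Inter-onset intervals: a gap much larger than the median interval
# marks where a beat was omitted.
ioi = np.diff(onsets)
period = np.median(ioi)
gap = np.flatnonzero(ioi > 1.5 * period)[0]

# The expected onset falls one period after the onset preceding the gap,
# mirroring the temporal expectation the trained SNN exhibits.
expected = onsets[gap] + period
print(expected)  # 500.0 — where the omitted beat was anticipated
```

A network with genuine temporal expectation should show latent activity at this predicted time despite receiving no acoustic input there, which is the behaviour reported above.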
1. Zuk, N., Carney, L., Lalor, E. 2018. Preferred Tempo and Low-Audio-Frequency Bias Emerge From Simulated Subcortical Processing of Sounds With a Musical Beat. Front. Neurosci., 12.