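CNNs achieve high accuracy on image classification, and a speech signal can be turned into a time-frequency image by a Fourier transform or a similar transform. The natural question is whether CNNs can be applied to speech signal analysis as well. That question led to the resources collected below. After working through them, you should be able to implement a CNN for audio classification and understand how handling a time-frequency image differs from handling an ordinary image.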
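To make the idea of a time-frequency image concrete, here is a minimal NumPy sketch of a short-time Fourier transform. The function name `stft_magnitude`, the 8 kHz sample rate, the 1 kHz test tone, and the frame and hop sizes are all assumptions chosen for illustration; they do not come from any of the resources listed below.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One real FFT per frame, then transpose to (frequency, time).
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr = 8000                              # sample rate in Hz (assumed)
t = np.arange(sr) / sr                 # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)    # 1 kHz test tone (assumed input)

spec = stft_magnitude(tone)
log_spec = np.log(spec + 1e-6)         # log scaling compresses the dynamic range
peak_bin = int(spec.mean(axis=1).argmax())
print(spec.shape, peak_bin * sr / 256) # array shape and peak frequency in Hz
```

The result is a 2-D array that can be fed to a CNN like a grayscale image. Unlike in a photograph, though, the two axes mean different things (frequency and time), which is one reason a time-frequency image behaves differently from an ordinary image.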
1. A Quora question: Can we apply CNN to frequency domain? https://www.quora.com/Can-we-apply-CNN-to-frequency-domain
2. What is a Fourier Convolutional Neural Network (FCNN)? Harry Pratt, Bryan Williams, Frans Coenen, and Yalin Zheng, “FCNN: Fourier Convolutional Neural Networks” (http://ecmlpkdd2017.ijs.si/paper…, ECML PKDD 2017 - European Conf. in ML). The name may suggest otherwise, but the method simply transforms ordinary images into the frequency domain, which makes large images sparse and speeds up the network's computation.
3. Paper: Spectral Representations for Convolutional Neural Networks. This one is similar to the FCNN paper above.
4. Audio Classification using FastAI and On-the-Fly Frequency Transforms: An experiment with generating spectrograms from raw audio at training time with PyTorch and fastai v1. This blog post shows how to extract spectrograms and train a network with fastai.
5. GitHub: Audio visualization & analysis using the RTFI
6. Common time-frequency representations and the libraries that implement them: Time–frequency representation
7. Audio features for web-based ML. The article cites the paper Learning the Speech Front-end With Raw Waveform CLDNNs, and it comes with source code that extracts a log-mel-spectrogram using librosa (#^.^#).
8. How to save a spectrogram as an image: Store the Spectrogram as Image in Python; Save spectrogram (only content, without axes or anything else) to a file using Matplotlib
9. Code from competitors in a Kaggle speech recognition competition: Log Spectrogram and MFCC, Filter Bank Example; Keras Sequential Conv1D Model Classification
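10. Open-source code on GitHub: Keras Audio Preprocessors. Preprocessing code for speech recognition, including feature extraction and data augmentation. It comes from the ICML 2017 paper Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras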
11. A slide-deck (PPT) tutorial on sound classification: https://web.njit.edu/~usman/courses/cs698_spring18/CNNforSoundClassifcation.pptx
12. Explanations of why a DCNN works for audio classification, and related questions: (1) A paper: Explaining Deep Convolutional Neural Networks on Music Classification (2) A Stack Overflow question: Convolutional Neural Network (CNN) for Audio (3) General Study of audio detection (Spectrogram) in Convolutional Neural Networks (4) What’s wrong with CNNs and spectrograms for audio processing?
13. How to build a mel-spectrogram as the input feature: How do I use mel-spectrogram as the input of a CNN? The discussion also points to an open-source project for reference: panotti
14. Blog posts on urban sound classification, from feature extraction to classification: (1) Urban Sound Classification, Part 1 (2) Urban Sound Classification, Part 2 (3) GitHub: Urban-Sound-Classification
15. Blog post: Audio Classification Using CNN — An Experiment (source code included)
16. GitHub: tensorflow-speech-recognition. Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks
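Entries 7 and 13 above both centre on the log-mel spectrogram as the CNN input feature. Entry 7 mentions librosa-based code for it; the sketch below instead uses only NumPy so that each step is visible. It is an illustration under stated assumptions, not the code from any linked resource. The function names, the HTK-style mel formula, the 16 kHz sample rate, the 512-point FFT, the 40 mel bands, the white-noise stand-in clip, and the channels-last layout are all choices made for this example.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention; assumed here).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, evenly spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def log_mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    """Log-mel spectrogram, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([
        signal[i * hop : i * hop + n_fft] * window
        for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    return np.log(mel + 1e-10)                         # small offset avoids log(0)

sr = 16000                                   # assumed sample rate in Hz
rng = np.random.default_rng(0)
clip = rng.standard_normal(sr)               # one second of noise as a stand-in clip

features = log_mel_spectrogram(clip, sr)
# Standardize, then add batch and channel axes: (batch, mel, time, channel).
features = (features - features.mean()) / (features.std() + 1e-8)
cnn_input = features[np.newaxis, ..., np.newaxis]
print(cnn_input.shape)
```

The channels-last layout shown here matches the default expected by Keras convolution layers. PyTorch expects channels first, so there the channel axis would go right after the batch axis.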