I've become interested in a personal project that will have the following features:
* The audio clip will be in MP3 format
* The audio will contain different people speaking at a mass gathering
* The system should strip out a particular person's voice
* The system will be provided with some sample voice clips of each person
I understand the basics of digital signal processing (FT, FFT, DFT, STFT, etc.). The subject is a little intimidating, so I would like to know specifically:
* Which theories should I focus on most deeply for this case? Any reference documents would be appreciated.
* Which tools should I use for this particular case? I'm thinking of doing this with the NAudio library for .NET, but I'm willing to change my programming platform if better libraries exist elsewhere.