Analysis
Filter Bank Interpretation
The filter bank is the original form of the phase vocoder, and it is also the first system that was implemented. It breaks the signal up into its short-term amplitude and phase spectra. In speech transmission it reduces the bandwidth and bit rate needed to send the signal. The name is literal: the system is a bank of filters. The signal is sent through a series of encoding channels, multiplexed, and then regenerated.
To understand the encoding, it is best to focus on one channel. The signal is fed through parallel channels, each operating at a different center frequency. Within one channel, the signal is modulated by a cosine wave at that center frequency and then fed through a low pass filter. At the same time it is modulated by a sine wave at the same center frequency and fed through a low pass filter with a cutoff at that center frequency. In essence this splits the signal into a real and an imaginary component for that center frequency (remember z = mag*(cos(theta) + j*sin(theta))). From the real and imaginary signals we find the magnitude and phase. If the real part is ‘a’ and the imaginary part is ‘b’, then mag = sqrt(a^2 + b^2) and theta = arctan(b/a).
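As a rough illustration of the single channel described above, here is a minimal Python sketch. It is not the implementation used for this project: the function names, the moving-average low pass filter, the minus sign on the sine branch (so the pair behaves like multiplication by e^(-j*w*n)), and the factor of 2 that restores the original amplitude are all assumptions made for the example. It uses only the standard library so that it runs as-is.

```python
import math

def lowpass_moving_average(x, length):
    """Crude low pass filter: causal running mean over `length` samples.
    (Assumed stand-in for whatever low pass filter a real channel uses.)"""
    out = []
    acc = 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= length:
            acc -= x[i - length]  # drop the sample that left the window
        out.append(acc / length)
    return out

def analyze_channel(signal, center_hz, sample_rate, filter_len):
    """Return (magnitude, phase) tracks for one channel at `center_hz`."""
    w = 2.0 * math.pi * center_hz / sample_rate
    # Heterodyne: modulate by a cosine and a sine at the center frequency.
    # The minus sign is a convention chosen for this sketch.
    real_in = [s * math.cos(w * n) for n, s in enumerate(signal)]
    imag_in = [-s * math.sin(w * n) for n, s in enumerate(signal)]
    # Low pass both branches to keep only the part shifted down near 0 Hz.
    a = lowpass_moving_average(real_in, filter_len)
    b = lowpass_moving_average(imag_in, filter_len)
    # Magnitude and phase from the real part a and the imaginary part b.
    # The factor 2 compensates for the half amplitude left after mixing.
    mag = [2.0 * math.hypot(ai, bi) for ai, bi in zip(a, b)]
    phase = [math.atan2(bi, ai) for ai, bi in zip(a, b)]
    return mag, phase
```

For example, with an 8000 Hz sample rate, a 500 Hz center frequency and an 8-sample average, a test tone 0.5*cos(w*n + 0.3) at the center frequency comes back with a magnitude near 0.5 and a phase near 0.3 once the filter window has filled. Those numbers were picked so that the averaging window spans exactly one period of the unwanted double-frequency term; a real channel would use a better low pass filter.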
It is important to note that although the signal is literally being low pass filtered, in effect it is being bandpass filtered around the center frequency (the cos/sin frequency and the cutoff frequency). The modulation shifts all frequencies of the input signal by +/- Fo (the frequency of the sin/cos). When the shifted signal is then low pass filtered, only the portion of the signal that was shifted down in frequency gets through. This portion is similar to what a bandpass filter would capture. This concept is referred to as heterodyning.
So now, for one channel, we have the magnitude and phase of a defined frequency band centered around Fo. In order to encode the entire signal it is necessary to have multiple channels at increasing center frequencies. Once one time segment of the signal has been processed, you have N magnitudes and N phases, one pair for each of the N frequency bands. For a voice signal at a moderate sample rate, 30 channels are appropriate. With a higher sample rate (bandwidth), more channels are needed to fully capture the signal’s qualities. In a communication system these channel components could be transmitted with a limited bandwidth, or number of bits. In my implementation there was no need to alter the encoded output, since I would be using the encoding directly to manipulate an audio signal instead of transmitting bits on a wire.
On the receiving end the signal operations are reversed to get back the original signal. Each channel is fed through a channel decoder, and the outputs of the channel decoders are summed to get the composite signal. For each channel the magnitude and phase signals are interpolated (separately). The interpolated phase is then integrated and used to phase modulate a cosine at the channel frequency. The phase modulation yields cos(wi*n + phi(wi, n)), where wi is the frequency of channel i and n is the time index. The phase modulated signal is then multiplied by the interpolated magnitude.
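The decoding side can be sketched in the same spirit. The Python below is only an illustration under stated assumptions, not the decoder used for this project: the function names are made up, interpolation is plain linear interpolation, and each track is assumed to hold the phase itself (already unwrapped). If a track held the phase derivative instead, a running sum would be needed before the cosine.

```python
import math

def interpolate(track, hop):
    """Linearly interpolate a decimated track back up by a factor of `hop`.
    Phase tracks are assumed to be unwrapped before interpolation."""
    out = []
    for k in range(len(track) - 1):
        for j in range(hop):
            t = j / hop
            out.append((1.0 - t) * track[k] + t * track[k + 1])
    out.append(track[-1])
    return out

def decode_channel(mag, phase, center_hz, sample_rate):
    """Rebuild one channel as mag[n] * cos(w*n + phase[n])."""
    w = 2.0 * math.pi * center_hz / sample_rate
    return [m * math.cos(w * n + p)
            for n, (m, p) in enumerate(zip(mag, phase))]

def decode(channels, sample_rate):
    """Sum the decoded channels into the composite signal.
    `channels` is a list of (center_hz, mag_track, phase_track) tuples
    whose tracks are already at the full sample rate."""
    length = min(len(mag) for _, mag, _ in channels)
    out = [0.0] * length
    for center_hz, mag, phase in channels:
        y = decode_channel(mag, phase, center_hz, sample_rate)
        for n in range(length):
            out[n] += y[n]
    return out
```

A typical call would interpolate each channel's magnitude and phase tracks separately with `interpolate`, collect them as `(center_hz, mag, phase)` tuples, and pass the list to `decode`. Any audio effect would modify the tracks between the encoding and this step.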
The outputs of these decoded channels are then summed, as mentioned previously, and the signal is reconstructed. It is important to note that the audio effects take place between the encoding and the decoding. After creating the phase vocoder with this implementation, it became apparent that I could not properly modulate two vocoded signals to get the classic robot channel vocoder sound. Even though I could successfully encode and then decode the signal, I could not get the sound I was looking for. I assumed this was due to a poor implementation of the phase vocoder. That assumption was incorrect, but it did lead me to the Fourier Transform interpretation.