Summer Of Blog

Thursday, October 10, 2013

Epilogue: Fourier analysis and testing

Hi.

It's time for the final update of my GSoC project. The last part of my GSoC was all about improving the resampler test cases for PulseAudio. Since a lot of DSP testing depends on the Fourier transform, I'll try to explain it briefly here.

The Fourier transform decomposes a signal into a sum of sines and cosines of varying frequencies and phases. This way we can see the amplitude and frequency of the individual sine waves that make up our signal.

Let's take a look at a 440Hz sine wave like the one below.

~100 samples of a 440Hz sine wave

This is a simple sine wave without any higher harmonics, so we don't expect to see much in our transformed signal apart from a single peak. The sample format of our sine wave is 16-bit PCM. We can see that our amplitude is at about 50% (full scale for 16-bit PCM is 32768 and we peak at about 16000), or -6dB.

Below we can see the Fourier transform of our sine wave.

We can clearly see a nice spike at 440Hz with an amplitude of -6dB. If our signal consisted of a fundamental sine wave at 440Hz and a higher harmonic at 880Hz, we would see two spikes.

Now there are some other spikes, albeit rather small ones, which means that our signal does not contain only the fundamental wave at 440Hz. The other spikes are considered to be noise, and this gives us a way to measure the signal-to-noise ratio (SNR): we take the amplitude of our signal and the highest amplitude of an unwanted spike and divide them (on a dB scale this becomes a subtraction). Easy, isn't it?

Let's take a final look at a transformed signal. Below is the Fourier transform of a logarithmic chirp (logarithmically swept sine wave) from 440Hz to 48kHz which was run through a resampler.

We can see that the start frequency is indeed around 440Hz, but at about 20kHz our amplitude starts to fall off. This happens because our resampler was smart enough to filter out frequencies above the audible range (or, more precisely, frequencies that are higher than half of the sampling frequency) to avoid aliasing.

These tests and graphs are now easily reproducible using the new and shiny 'resampler-quality-test', which generates arbitrary sine or chirp signals and runs them through the resampler (the result can be saved), and a fft.py script which plots the Fourier transform of a WAVE file.

Some more improvements were made to the already existing resampler test case but these are not so interesting.

That's all for this year's GSoC. It was a fun and productive experience. I want to take a moment and thank my mentor for this year, Peter Meerwald, for all the help and friendly exchanges during the summer, and last year's mentor, Tanu Kaskinen, for the same nice treatment last year.

Goodbye.

Monday, August 12, 2013

Vol 2: Resampling methods

Hi.
Time for an update. This time I will talk a little bit about the different resampling methods and resampling in general. So let's start with a quick introduction to resampling.

Below is a figure of a discrete time representation of a 440Hz sine wave. The sine wave is sampled at a rate (or sampling frequency) of 48kHz. This means that roughly every 0.02ms a sample of our sine wave is taken, so for one second of sound 48000 samples are needed.

440Hz sine wave sampled at 48kHz

Now if the clock in our sound card supports this frequency, we can just feed it our sine wave and it will play it just fine. If this is not the case, we would get pitch shifted output just like when the playback speed on an old tape recorder or turntable is increased or decreased.

Conceptually, we could reconstruct our analog signal and sample it at our new sample rate to obtain our desired discrete time representation. But this is not really a practical solution, so we use purely mathematical solutions instead. One of them is linear interpolation. Shown below is our original sine wave resampled via linear interpolation to 96kHz, which means that we now have twice as many samples as in our original sampled sine wave.
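Linear interpolation is simple enough to sketch in a few lines of C. The function below is written just for this post and is not one of the PulseAudio resamplers; it handles a single channel of float samples and processes the whole signal in one go instead of working on chunks.

```c
#include <stddef.h>

/* Resample in[0..in_len) from in_rate to out_rate by drawing straight
 * lines between neighbouring input samples. Writes at most out_cap
 * samples to out and returns the number of samples written. */
static size_t resample_linear(const float *in, size_t in_len, unsigned in_rate,
                              float *out, size_t out_cap, unsigned out_rate) {
    size_t n;
    if (out_rate == 0)
        return 0;
    for (n = 0; n < out_cap; n++) {
        /* Position of output sample n, measured in input samples. */
        double pos = (double) n * (double) in_rate / (double) out_rate;
        size_t i = (size_t) pos;
        double frac = pos - (double) i;
        if (i + 1 < in_len)
            out[n] = (float) ((1.0 - frac) * in[i] + frac * in[i + 1]);
        else if (i + 1 == in_len && frac == 0.0)
            out[n] = in[i];      /* exactly on the last input sample */
        else
            break;               /* ran past the end of the input */
    }
    return n;
}
```

Going from 48kHz to 96kHz, every second output sample is an original sample and the ones in between are the average of their two neighbours. Since this sketch never extrapolates past the last input sample, N input samples give 2N-1 output samples here.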

440Hz sine wave resampled to 96kHz

There are many different resampling methods and implementations. PulseAudio already supports quite a few. I added support for some more and tested their performance.

Here are the newly added methods:

libswresample (lswr below)
libavresample (lavr below)
sox resampler (soxr below)

And here are the test results:

Performance using signed 16-bit integers as the sample format

These results should be taken with a grain of salt because the different resampling methods do not deliver the same quality. The most interesting resampler here seems to be soxr using cubic interpolation.

Below is the same test but this time using floating point numbers as the sample format:

Performance using floating point numbers as the sample format

Again, soxr seems to be the most promising here.

Which of these new resampling methods will find their way into the master tree of PulseAudio remains to be seen.

This ended up somewhat longer than anticipated, but I hope it was interesting.

Thanks for your attention!

Sunday, July 14, 2013

Vol 1. Refactoring

Hi. Time for a quick update about my Summer of Code project.
I'm going to talk a little about my first checklist item, refactoring the resampling code. While refactoring isn't quite as exciting as implementing new features, it is a necessity to ensure code maintainability.

So let's take a look at the resampler interface and see what we can do. The interface consists of a pa_resampler structure which we create if we want to do some resampling. This structure holds all of the settings the resampler cares about (sample specifications, different buffers), but also some specific data for different resampling implementations.
The interesting bits are shown below.

struct pa_resampler {
...
    void (*impl_free)(pa_resampler *r);
    void (*impl_update_rates)(pa_resampler *r);
    void (*impl_resample)(pa_resampler *r, const pa_memchunk *in, 
                          unsigned in_samples, pa_memchunk *out, 
                          unsigned *out_samples);
    void (*impl_reset)(pa_resampler *r);

    struct { /* data specific to the trivial resampler */
        unsigned o_counter;
        unsigned i_counter;
    } trivial;

    struct { /* data specific to the peak finder pseudo resampler */
        unsigned o_counter;
        unsigned i_counter;

        float max_f[PA_CHANNELS_MAX];
        int16_t max_i[PA_CHANNELS_MAX];

    } peaks;
...
};

After the implementation-specific function pointers we can see multiple structures holding implementation-specific data. Since the resampler can't switch implementations on the fly (without destroying and recreating a resampler), only one of these structures is used at a time.
There are six of those structures contained inside pa_resampler, and some of them have only a single member, which is quite pointless.

Further down in the file we see a big init_table containing the mapping between a resampling method (an enumeration) and its initialization function.

static int (* const init_table[])(pa_resampler*r) = {
    [PA_RESAMPLER_SRC_SINC_BEST_QUALITY]   = libsamplerate_init,
    [PA_RESAMPLER_SRC_SINC_MEDIUM_QUALITY] = libsamplerate_init,
    [PA_RESAMPLER_SRC_SINC_FASTEST]        = libsamplerate_init,
    [PA_RESAMPLER_SRC_ZERO_ORDER_HOLD]     = libsamplerate_init,
    [PA_RESAMPLER_SRC_LINEAR]              = libsamplerate_init,
    [PA_RESAMPLER_TRIVIAL]                 = trivial_init,
    [PA_RESAMPLER_SPEEX_FLOAT_BASE+0]      = speex_init,
    [PA_RESAMPLER_SPEEX_FLOAT_BASE+1]      = speex_init,
    ...
    [PA_RESAMPLER_SPEEX_FLOAT_BASE+10]     = speex_init,
    [PA_RESAMPLER_SPEEX_FIXED_BASE+0]      = speex_init,
    ...
    [PA_RESAMPLER_SPEEX_FIXED_BASE+10]     = speex_init,
    ...

As we can see, there are quite a few duplicates here: a total of 32 entries but only 6 distinct init functions. There is another big table like this containing the implementation names. That table doesn't contain any duplicates, but it would be nice if we could group the implementation names into smaller implementation-specific tables.

So without further ado here is the first code snippet with the appropriate changes.

struct pa_resampler {
    ...
    pa_resampler_implementation implementation;
    ...
};

All the implementation-specific data is now contained inside a single structure.
And here is what the equivalent of the second code snippet looks like.

static pa_resampler_implementation *impl_table[] = {
    [PA_RESAMPLER_SRC_SINC_FASTEST] = &libsamplerate_impl,
    [PA_RESAMPLER_TRIVIAL] = &trivial_impl,
    [PA_RESAMPLER_SPEEX_FLOAT_BASE] = &speex_impl,
    [PA_RESAMPLER_FFMPEG] = &ffmpeg_impl,
    [PA_RESAMPLER_AUTO] = &auto_impl,
    [PA_RESAMPLER_COPY] = &copy_impl,
    [PA_RESAMPLER_PEAKS] = &peaks_impl,
};

No more duplicate entries here.
And finally, here is how the pa_resampler_implementation structure is defined.

struct pa_resampler_implementation {
    int (*init)(pa_resampler *r);
    void (*free)(pa_resampler *r);
    void (*update_rates)(pa_resampler *r);
    void (*resample)(pa_resampler *r, const pa_memchunk *in,
                     unsigned in_samples, pa_memchunk *out,
                     unsigned *out_samples);
    void (*reset)(pa_resampler *r);
    void *data;
    const char *names[PA_RESAMPLER_MAX_VARIANTS];
};

The implementation-specific structures are replaced by a simple opaque data pointer, and the implementation's init function takes care of allocating it. The big names table is also split up now and contained inside this structure.

These changes aren't merged upstream yet, and some of them may change if needed. Further details are on my GitHub page.
That's it for now; next time there should be a more interesting topic. Thanks for your attention and bye.
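To illustrate the opaque data idea, here is a stripped-down, self-contained sketch. The structures are reduced to the members needed for the example, `struct trivial_data` and the bodies of `trivial_init` and `trivial_free` are hypothetical, and plain calloc()/free() stand in for PulseAudio's own allocation helpers. The real code may well look different.

```c
#include <stdlib.h>

typedef struct pa_resampler pa_resampler;

/* Reduced version of the structure shown above. */
typedef struct pa_resampler_implementation {
    int (*init)(pa_resampler *r);
    void (*free)(pa_resampler *r);
    void *data;                     /* owned by the implementation */
} pa_resampler_implementation;

struct pa_resampler {
    pa_resampler_implementation implementation;
};

/* Hypothetical private state of the trivial resampler. */
struct trivial_data {
    unsigned o_counter;
    unsigned i_counter;
};

/* Allocate the private state and hang it off the opaque pointer. */
static int trivial_init(pa_resampler *r) {
    struct trivial_data *d = calloc(1, sizeof(*d));
    if (!d)
        return -1;
    r->implementation.data = d;
    return 0;
}

/* Release the private state again. */
static void trivial_free(pa_resampler *r) {
    free(r->implementation.data);
    r->implementation.data = NULL;
}
```

The point of the pattern is that pa_resampler no longer has to know the layout of any implementation's state; only the implementation itself casts the data pointer back to its own type.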

Sunday, June 2, 2013

Prelude

Hello and welcome to my new blog.

This blog should mainly serve as a small developer log over the duration of this year's Google Summer of Code. I'll try to keep it short and technical.

This year, like last year, I'm participating as a student for PulseAudio. My main task for this year is to update the resampling code.

So here's a list of things I have the privilege to work on over the summer:

refactor and clean up the resampling code
enable resampling with libavresample
enable resampling with libswresample
deprecate the ffmpeg resampling method
replace libspeex with opus tools
improve resampling tests

What all of this exactly means will be explained some other time.

That's all for now. Bye!