Low pass shape curve with low cycle count

Question

Hi, I want to achieve the above curve in software without using a DSP function. I was hoping to use a fast, low-cycle-count ARM instruction such as multiply-accumulate.

Is there a fast way of doing this in C on an embedded ARM processor?

Answer 1

The curve shown is that of the simplest possible first-order filter, characterised by a -3dB cut-off frequency $f_c$ and a roll-off of 6dB/octave (20dB/decade). As an analogue filter it could be implemented as a simple passive RC filter.

In the digital domain such a filter would be implemented by:

$$y_n = a_0 x_n + b_1 y_{n-1}$$

where $x_n$ are the input samples and $y_n$ the output samples. Or in code:

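#include <stddef.h>

typedef float tSample ;

// Filter coefficients (derived below for fc = 1kHz at 32ksps)
static const float a0 = 0.178275f ;
static const float b1 = 0.821725f ;

void lowPassFilter( const tSample* x, tSample* y, size_t sample_count )
{
    static tSample y_1 = 0 ;   // previous output, retained between calls

    for( size_t i = 0; i < sample_count; ++i)
    {
        y[i] = a0 * x[i] + b1 * y_1 ;
        y_1 = y[i];
    }
}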
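Taking the z-transform of the difference equation shows why the two coefficients are tied together. The transfer function has a single pole at $z = b_1$, and unity gain at DC ($z = 1$) requires $a_0 + b_1 = 1$:

```latex
% z-transform of y_n = a_0 x_n + b_1 y_{n-1}
Y(z) = a_0 X(z) + b_1 z^{-1} Y(z)
\;\Rightarrow\;
H(z) = \frac{Y(z)}{X(z)} = \frac{a_0}{1 - b_1 z^{-1}}

% Gain at DC (z = 1)
H(1) = \frac{a_0}{1 - b_1} = 1 \quad\Longleftrightarrow\quad a_0 = 1 - b_1

% Stability: the pole z = b_1 lies inside the unit circle when 0 < b_1 < 1
```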

The filter is characterised by the coefficients:

$$a_0 = 1 - x$$
$$b_1 = x$$

where $x$ is a value between 0 and 1, not to be confused with the input samples $x_n$ (I'll address eliminating the implied floating-point operations in due course):

$$x = e^{-2\pi f_c}$$

where $f_c$ is the desired -3dB cut-off frequency expressed as a fraction of the sample rate. So for a sample rate of 32ksps and a cut-off frequency of 1kHz, $f_c = 1000/32000 = 0.03125$, so:

$$b_1 = x = e^{-2\pi f_c} = e^{-2\pi \times 0.03125} \approx 0.821725$$
$$a_0 = 1 - x \approx 0.178275$$

Now, naïvely plugging those constants into lowPassFilter() will generate floating-point code. On an MCU without an FPU that might be prohibitive, and even with an FPU it might be sub-optimal. So in this case we might use a fixed-point representation. Since both coefficients are less than one and the machine is 32-bit, a UQ0.16 representation (value × 65536) would be appropriate: the product of a 16-bit coefficient and a 16-bit sample then always fits in a 32-bit unsigned machine word. This does require the sample width to be 16 bits or less (or scaled accordingly). So using fixed-point the code might look like:

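#include <stddef.h>
#include <stdint.h>

typedef uint16_t tSample ;

#define b1 53853u           // 0.821725 * 65536, rounded (UQ0.16)
#define a0 (65536u - b1)    // 1.0 - b1 in UQ0.16 (= 11683, i.e. 0.178275)

// UQ0.16 multiply: the casts force 32-bit unsigned arithmetic, so the
// product of two 16-bit values cannot overflow
#define FIXED_MUL( x, y ) (((uint32_t)(x) * (uint32_t)(y)) >> 16)

void lowPassFilter( const tSample* x, tSample* y, size_t sample_count )
{
    static tSample y_1 = 0 ;

    for( size_t i = 0; i < sample_count; ++i)
    {
        // Both products are truncated, so the sum never exceeds 65535
        y[i] = (tSample)(FIXED_MUL(a0, x[i]) + FIXED_MUL(b1, y_1)) ;
        y_1 = y[i];
    }
}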
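The fixed-point arithmetic can be sanity-checked on a host PC before moving to the target. The block below is a hypothetical test harness: lowPassStep() and stepResponseAfter() are illustrative names. It uses the same UQ0.16 coefficients, but passes the filter state explicitly. From rest, a full-scale step should rise by roughly $a_0$ of the remaining distance each sample. It then settles slightly short of 65535, because truncating both products biases the result downwards:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t tSample ;

#define B1_Q16 53853u              /* 0.821725 in UQ0.16 */
#define A0_Q16 (65536u - B1_Q16)   /* 0.178275 in UQ0.16 */

/* One step of the fixed-point filter, with the previous output passed in */
static tSample lowPassStep( tSample x, tSample y_1 )
{
    uint32_t acc = ( A0_Q16 * (uint32_t)x ) >> 16 ;
    acc += ( B1_Q16 * (uint32_t)y_1 ) >> 16 ;
    return (tSample)acc ;   /* each term truncates, so the sum stays below 65535 */
}

/* Feed a constant full-scale input for n samples from rest; return the last output */
tSample stepResponseAfter( size_t n )
{
    tSample y = 0 ;

    for( size_t i = 0; i < n; ++i )
    {
        y = lowPassStep( 0xFFFF, y ) ;
    }
    return y ;
}
```

With these coefficients the step settles about 10 LSB below full scale. If that bias matters, rounding the products (with saturation) or keeping extra fractional bits in the state would reduce it.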

That is not a significant amount of processing for most ARM processors at the 32ksps suggested in this example. Obviously it depends on what other demands are placed on the processor, but on its own this would not be a significant load, even without compiler optimisation. As with any optimisation, you should implement it, measure it and improve it if necessary.

As a first stab I'd trust the compiler's optimiser to generate code that in most cases will meet requirements, or at least be as good as you might achieve with handwritten assembler. Whether or not it chooses to use a multiply-accumulate instruction is out of your hands, but if it doesn't, the chances are that is because there is no advantage.
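Since $b_1 = 1 - a_0$, the recurrence can also be rearranged as $y_n = y_{n-1} + a_0 (x_n - y_{n-1})$, which needs only one multiply per sample instead of two. The following is a minimal sketch of that rearrangement, assuming 16-bit unsigned samples and a signed 32-bit intermediate for the difference. lowPassFilterMac() and A0_Q16 are hypothetical names. Whether the compiler then emits a multiply-accumulate instruction is, again, up to the compiler:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t tSample ;

#define A0_Q16 11683   /* a0 = 0.178275 in UQ0.16 (0.178275 * 65536, rounded) */

/* y[n] = y[n-1] + a0 * (x[n] - y[n-1]): algebraically equal to
   a0*x[n] + b1*y[n-1] when b1 = 1 - a0, but with a single multiply */
void lowPassFilterMac( const tSample* x, tSample* y, size_t sample_count )
{
    static int32_t y_1 = 0 ;   /* previous output, held signed for the subtraction */

    for( size_t i = 0; i < sample_count; ++i )
    {
        int32_t diff = (int32_t)x[i] - y_1 ;   /* -65535 .. 65535 */

        /* |diff * A0_Q16| <= 765645405, so it fits in int32_t; signed division
           truncates toward zero, so the output never overshoots the input */
        y_1 += ( diff * A0_Q16 ) / 65536 ;
        y[i] = (tSample)y_1 ;
    }
}
```

Because the update truncates toward zero, the output stops moving once the product falls below one LSB. With $a_0 = 11683/65536$ that happens at $|x_n - y_{n-1}| \le 5$, so this form has a small dead band rather than a one-sided bias.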

Bear in mind that the ARM Cortex-M4 and M7, for example, include DSP instructions not supported by the M0 or M3. The compiler may or may not use these. The simplest way to guarantee that they are used, without resorting to assembler, would be to use the CMSIS-DSP library. Whether that provides greater performance or better fidelity than the above, you would have to test.

It is worth noting that lowPassFilter() retains its state statically, so it can be called repeatedly on successive "blocks" of samples (from an ADC DMA transfer, for example). You might then have:

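// waitEvent(), DMA_BUFFER_READY, dma_buffer, output_buffer and DMA_BLOCK_SIZE
// stand in for whatever your RTOS and DMA driver provide
int dma_buffer_n = 0 ;
for(;;)
{
    waitEvent( DMA_BUFFER_READY ) ;  // block until DMA has filled dma_buffer[dma_buffer_n]
    lowPassFilter( dma_buffer[dma_buffer_n], output_buffer, DMA_BLOCK_SIZE ) ;
    dma_buffer_n = dma_buffer_n == 0 ? 1 : 0 ; // Flip buffers
}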
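Because y_1 is a single function-static variable, one lowPassFilter() can only serve one signal. Filtering two ADC channels through it would mix their histories. Below is a sketch of a variant that holds the state in a caller-owned structure and uses the same UQ0.16 coefficients. tLowPassState and lowPassFilterCh() are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t tSample ;

#define B1_Q16 53853u              /* 0.821725 in UQ0.16 */
#define A0_Q16 (65536u - B1_Q16)   /* 0.178275 in UQ0.16 */

/* Hypothetical per-channel filter state, replacing the function-static y_1 */
typedef struct
{
    tSample y_1 ;   /* previous output of this channel */
} tLowPassState ;

void lowPassFilterCh( tLowPassState* state, const tSample* x, tSample* y, size_t sample_count )
{
    tSample y_1 = state->y_1 ;

    for( size_t i = 0; i < sample_count; ++i )
    {
        y_1 = (tSample)( ( ( A0_Q16 * (uint32_t)x[i] ) >> 16 )
                       + ( ( B1_Q16 * (uint32_t)y_1 ) >> 16 ) ) ;
        y[i] = y_1 ;
    }

    state->y_1 = y_1 ;   /* carry the state into the next block */
}
```

Each channel gets its own tLowPassState, one per ADC input for example. Processing a signal in several blocks gives exactly the same output as processing it in one call.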

The use of DMA double-buffering is likely to be far more important to performance than the filter implementation itself. I have worked on a DSP application sampling two channels at 48ksps on a 72MHz Cortex-M3, with far more complex requirements than this: each channel had a high-pass IIR, an 18-coefficient FIR and a Viterbi decoder. So I really do think your assumption that this simple filter will not be fast enough is somewhat premature.


1 answers

solution1
1 2022-02-06 12:00:14
