
FIR Filter

Figure 16: FIR filter using the impulse response of the analogue filter $h(t)$
\includegraphics[width=\linewidth]{fir}
What happens if we sample the impulse response $h(t)$ of an analogue filter? Let's find out:
  $\displaystyle
h(t)=\sum_{n=0}^\infty h(nT) \delta(t-nT)
$ (102)
If we transform it to the Laplace space it looks like this:
  $\displaystyle
H(s)=\sum_{n=0}^\infty h(nT) {\underbrace{{\left(e^{-sT}\right)}}_
{z^{-1}}}^n
$ (103)
Remember that $e^{-sT}$ has a very handy meaning: it is a delay by the unit time step (Eq. 80). Thus $z^{-n}$ is a delay by $n$ time steps. We can rewrite Eq. 103:
  $\displaystyle H(z)=\sum_{n=0}^\infty h(nT) {(z^{-1})}^n
$ (104)
This is the z-transform of the impulse response $h(t)$ of the filter.

We now filter the signal $X(z)$ with $H(z)$:

  $\displaystyle H(z)X(z)=\underbrace{\sum_{n=0}^\infty h(nT) z^{-n}}_{H(z)} X(z)
$ (105)

This sum is a direct recipe for filtering the signal $X(z)$. We only need the impulse response of the filter $h(nT)$ and we can set up a digital filter (see Fig. 16). Of course, in practice this impulse response cannot run till infinity but only for a limited number of samples. These are often called “taps”. So, for example, a filter with $100$ samples of $h(nT)$ has $100$ “taps”. This in turn requires a delay line which can hold $M=100$ samples:

  $\displaystyle
H(z)X(z)=\underbrace{\sum_{m=0}^M h(mT) z^{-m}}_{H(z)} X(z)
$ (106)

Figure 17: Digital FIR filter with normalised frequency
\includegraphics[width=\linewidth]{fir_digital}

This is the formula of a Finite Impulse Response (FIR) filter where we have sampled an analogue impulse response $h(t)$ at time intervals of $T$. However, usually the impulse response of the filter is derived directly in the digital domain, where the argument $n$ of $h(n)$ represents just the sample index and the sampling interval is implicitly the inverse of the sampling rate. For that reason one needs to distinguish between:


$\displaystyle H(z)X(z)$ $\textstyle =$ $\displaystyle \sum_{m=0}^M h_\textrm{\footnotesize {analogue}}(mT) z^{-m} \, X(z)$ (107)
$\displaystyle H(z)X(z)$ $\textstyle =$ $\displaystyle \sum_{m=0}^M h_\textrm{\footnotesize {digital}}(m) z^{-m} \, X(z)$ (108)
  $\textstyle =$ $\displaystyle \sum_{m=0}^M h(m) z^{-m} \, X(z)$ (109)

Eq. 107 is the filter where the impulse response is based on an analogue filter, and Eq. 108 is based on an impulse response originating from a digital filter design (see sections 6.7.6 and 6.7.7) where the frequencies are normalised frequencies from $0$ to $0.5$. From now on we will always refer to the “digital” one (Eq. 108) and the subscript will be omitted (Eq. 109 & Fig. 17).

In the time domain the FIR filter is then doing the following operation:

  $\displaystyle y(n) = \sum_{m=0}^{M-1} h(m) x(n-m)
$ (110)
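As a minimal sketch (the function name and variables are made up for illustration, not from a library), Eq. 110 can be implemented sample by sample in Python with a delay line:

import numpy as np

def fir_filter(h, x):
    # y(n) = sum_m h(m) x(n-m) for every sample of x (Eq. 110)
    M = len(h)
    buf = np.zeros(M)            # delay line holding x(n), ..., x(n-M+1)
    y = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)    # shift the delay line by one sample
        buf[0] = x[n]            # feed in the newest sample
        y[n] = np.dot(h, buf)    # weighted sum over the taps
    return y

Calling this once per signal (or running the loop body once per incoming sample) yields exactly the structure depicted in Fig. 17.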

Generally FIR filters do not perform a convolution operation

Remember that we have derived the FIR filter from the analogue domain by filtering a signal (Eq. 105). The time domain representation of this is a discrete and causal convolution operation (using the sampled Laplace transform). However, the problem is that this convolution runs to infinite time (see Eq. 105) and would thus require an infinite number of delay steps and infinite time to complete. So the FIR filter at best performs an approximation of the convolution operation, and the finite number of taps ($M$) causes serious problems which require a technique called windowing, moving it even further away from a convolution operation. On the other hand, if the impulse response is strictly limited in time, one can exploit this property, for example for matched filters. The bottom line is that sweeping statements such as “FIR filters perform a convolution” are generally wrong if nothing is known about the impulse response.

FIR filter implementations

No matter the design process, the implementation is always the same: we need a delay line for the incoming signal and then weight the delayed outputs with the different coefficients; for this reason they are called (water-)“taps” (see Eqs. 107 & 108). We now present different implementations.

More sophisticated code can be found in Teukolsky et al. (2007). This book is strongly recommended for any C programmer who needs efficient solutions.

Fixed point FIR filters

Figure 18: Fixed point FIR filter. The output signal is bit shifted to the right by $W$ bits while the coefficients are scaled up by $2^W$.
\includegraphics[width=\linewidth]{fir_fixed}
These filters receive integer numbers as input, perform integer multiplications/additions, and their outputs are integers as well. Thus, these filters do not require a floating point unit on a processor.

Fig. 18 shows a fixed point FIR filter. The input $x(n)$ is an integer variable with $I$ bits, the accumulator is an integer variable with $A$ bits, and so is the output (usually with the same bit width as the input).

In contrast to a floating point FIR filter we need to scale up the coefficients so that they use the full integer range to avoid quantisation errors. For example, if the coefficients of $h(n)$ range from $-0.75$ to $+0.75$ and we have signed 16 bit integers then the scaling factor is $2^W$ with $W=15$.

However, the accumulator $A$ which collects the data needs to have more bits because it receives input values at $I$ bits precision which are multiplied by the factor $2^W$. If we have $M$ taps then the number of additional bits we need is $\log_2 M$. The total number of bits we need in the accumulator in the worst case is:

  $\displaystyle A = I + W + \log_2 M
$ (111)
However, this is the worst case scenario: if the gain of the FIR filter is below one then the summations over the $M$ taps will only create temporary overflows, because integer numbers are cyclic in their representation. Thus, in case the gain of the FIR filter is below one, this can be relaxed to:
  $\displaystyle A = I + W
$ (112)
The actual bit width of the accumulator is usually the next integer size available, which also makes sure that the filter still works in case the gain unexpectedly goes slightly over one. For example, if $I$ has $16$ bits the accumulator will probably have $32$ bits.
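A minimal sketch of such a fixed point FIR filter in Python, using plain integers to mimic what would be integer registers in C or on a DSP (the coefficients and bit widths are example values):

import numpy as np

W = 15                                          # scaling: 2^15 for signed 16 bit coefficients
h = np.array([0.25, 0.5, 0.25])                 # example coefficients, gain <= 1
h_fixed = np.round(h * 2**W).astype(np.int64)   # scaled-up integer coefficients
buf = np.zeros(len(h_fixed), dtype=np.int64)    # integer delay line

def filter_fixed(x):
    # x is a 16 bit integer input sample
    global buf
    buf = np.roll(buf, 1)
    buf[0] = x
    acc = np.dot(h_fixed, buf)   # accumulator needs I + W (+ log2 M) bits
    return int(acc >> W)         # shift right by W bits as in Fig. 18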

Constant group delay or linear phase filter

So far nothing guarantees that the FIR filter has a constant group delay, which is defined by:
  $\displaystyle \tau_\omega= - {d\phi(\omega)\over d\omega}
$ (113)
This means that different frequencies arrive at the output of the filter earlier or later. This is not desirable. The group delay $\tau_\omega$ should be constant for all frequencies $\omega$ so that all frequencies arrive at the same time at the output $y$ of the filter.

A constant group delay can be achieved by restricting ourselves to the transfer function:

  $\displaystyle H(e^{i\omega})=B(\omega)e^{-i\omega\tau+i\phi}
$ (114)
where $B(\omega)$ is real and the phase is defined only by the exponential. The phase of this transfer function is then trivially $-\omega\tau+\phi$. The group delay is the negative derivative of this term, which is the constant $\tau$.

Eq. 114 now imposes restrictions on the impulse response $h(n)$ of the filter. To show this we use the inverse Fourier transform of Eq. 114 to get the impulse response:

  $\displaystyle h(n)=\frac{1}{2\pi}\int_{-\pi}^{+\pi} H(e^{i\omega})
e^{i\omega n}d \omega
$ (115)
After some transformations we get:
  $\displaystyle h(n+\tau)=\frac{1}{2\pi}e^{i\phi}b(n)
$ (116)
where $b(n)$ represents the Fourier coefficients of $B(\omega)$. Since $B$ is real the coefficients $b(n)$ have the property $b(n)=b^*(-n)$. Thus we get a second impulse response:
  $\displaystyle h(n+\tau)=\frac{1}{2\pi}e^{i\phi}b^*(-n)
$ (117)
Now we can eliminate $b$ by equating Eq. 116 and Eq. 117 which yields:
  $\displaystyle h(n+\tau)=e^{2i\phi} h^*(-n+\tau)
$ (118)
With the shift $\tau$ we have the chance to make the filter “more” causal. We can shift the impulse response in positive time so that $h(n)$ is zero for $n<0$. In a practical application we shift the impulse response by half the number of delays: if we have a filter with $M$ taps we have to delay by $\tau=M/2$.

The factor $e^{2i\phi}$ restricts the values of $\phi$ because the impulse response must be real. This gives us the final FIR design rule:

  $\displaystyle h(n+M/2)=(-1)^k h(-n+M/2)
$ (119)
where $k=0$ or $1$. This means that the filter is either symmetric or antisymmetric and the impulse response has to be delayed by $\tau=M/2$.

Window functions

Figure 19: Different window functions applied to a low pass filter (Eq. 132) with cutoff at $f=0.1$ and 100 taps.
\includegraphics[width=\linewidth]{window_functions}
So far we still have an infinite number of coefficients for the FIR filter because there is no guarantee that the impulse response becomes zero after $M/2$ delays. Thus, we have to find a way to truncate the impulse response without distorting the frequency response.

The standard technique is to multiply the coefficients with a window function $w(n)$ which becomes and stays zero beyond a certain coefficient $n>N$, so that Eq. 106 need not run to infinity:

  $\displaystyle
H(z)X(z)=\sum_{n=0}^N h(nT) w(nT) z^{-n} X(z)
$ (120)

  1. Rectangular window: simple truncation of the impulse response. Problem: we get ripples in the frequency response and the stop-band damping is poor.

  2. Triangular or Bartlett window: greatly improved stop-band attenuation.
  3. Hanning and Hamming window: standard windows in many applications; $\alpha=0.5$ gives the Hanning and $\alpha=0.54$ the Hamming window.
      $\displaystyle w(n) = \alpha - (1-\alpha) \cos\left(\frac{2\pi n}{M}\right)
$ (121)

  4. Blackman window:
      $\displaystyle w(n) = 0.42 - 0.5 \cos \left(\frac{2 \pi n}{M}\right) +
0.08 \cos \left( \frac{4 \pi n}{M} \right)
$ (122)
  5. Kaiser window: control over stop- and passband. No closed form equation available (see the numpy sketch after this list for generating all of these windows).
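All of these windows can be generated directly with numpy (a sketch; the Kaiser $\beta$ value is just an example):

import numpy as np

M = 100                       # number of taps
rectangular = np.ones(M)      # plain truncation
triangular = np.bartlett(M)
hanning = np.hanning(M)
hamming = np.hamming(M)
blackman = np.blackman(M)
kaiser = np.kaiser(M, 6.0)    # beta controls the stop/passband trade-off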
To illustrate how window functions influence the frequency response we have taken an impulse response of a lowpass filter ($f_c=0.1$) and applied different window functions to it (Fig. 19).

Note that the higher the stopband damping, the wider the transition from pass- to stopband. This can be seen when comparing the Blackman window with the Hamming window (Fig. 19). For the lowpass filter the two results look quite similar. However, for a bandstop filter the wider transition width might actually lead to very poor stopband damping. In such a case a Hamming window might be the better choice.


Python code: impulse response from the inverse DFT - The frequency sampling method

Imagine that we want to remove $50Hz$ from a signal with a sampling rate of $1kHz$. We define an FIR filter with 100 taps. The midpoint $N/2=50$ corresponds to $500Hz$, and $50Hz$ corresponds to index $5$.
import numpy as np

f_resp=np.ones(100)
# note we need to add "+1" to the end because the end of
# the range is not included.
f_resp[4:6+1]=0
# ...and the mirrored frequencies at the upper end of the spectrum
f_resp[94:96+1]=0
hc=np.fft.ifft(f_resp)
h=np.real(hc)
# swap the two halves so that the impulse response is
# symmetric around sample 50:
# this is from index 0 to index 49 on the left
# and on the right hand side from index 50 to index 99
h_shift=np.zeros(100)
h_shift[0:50]=h[50:100]
h_shift[50:100]=h[0:50]
h_wind=h_shift*np.hamming(100)
To get a nice symmetric impulse response we need to rotate the output of the inverse FFT by $50$ samples, as done above with the array h_shift.


FIR filter design from ideal frequency response – The analytical way

For many cases the impulse response can be calculated analytically. The idea is always the same: define a function with the ideal frequency response
  $\displaystyle \vert H(e^{j\omega})\vert = \underbrace{B(e^{j\omega})}_{\mbox{real}}
$ (123)
and perform an inverse Fourier transform to get the impulse response $h(n)$.

We demonstrate this with a lowpass filter:

  $\displaystyle \vert H(e^{j\omega})\vert =
\left\{
\begin{array}{ll}
1 & \mbox{for } \vert\omega\vert \leq \omega_c \\
0 & \mbox{for } \omega_c < \vert\omega\vert \leq \pi
\end{array}\right.
$ (124)

Use the inverse Fourier transform to get the impulse response:

$\displaystyle h(n)$ $\textstyle =$ $\displaystyle \frac{1}{2 \pi} \int_{-\pi}^{\pi} H(e^{j \omega}) e^{j \omega n} d\omega$ (125)
  $\textstyle =$ $\displaystyle \frac{1}{2 \pi} \int_{- \omega_{c}}^{+ \omega_{c}} e^{j \omega n} d\omega$ (126)
  $\textstyle =$ $\displaystyle \frac{1}{2 \pi} \left[\frac{1}{jn} e^{j \omega n} \right]_{- \omega_{c}}^{+ \omega_{c}}$ (127)
  $\textstyle =$ $\displaystyle \frac{1}{2 \pi jn} \left( e^{j \omega_{c} n} - e^{-j \omega_{c}n} \right)$ (128)

With these handy equations:
$\displaystyle \sin z$ $\textstyle =$ $\displaystyle \frac{1}{2j} (e^{zj} - e^{-zj})$ (129)
$\displaystyle \cos z$ $\textstyle =$ $\displaystyle \frac{1}{2} (e^{zj} + e^{-zj})$ (130)

we get for the filter:
  $\displaystyle h(n) =
\left\{
\begin{array}{cc}
\frac{1}{\pi n} \sin( \omega_{c} n) & \mbox{for } n \neq 0 \\
\frac{\omega_{c}}{\pi} & \mbox{for } n = 0
\end{array}\right.
$ (131)
This response is acausal! However, we know that we can shift the response by any number of samples to make it causal! And we have to window the response to get rid of any remaining contributions at negative times and to improve the frequency response.

Highpass, bandstop and bandpass filters can be calculated in exactly the same way and lead to the following ideal filter characteristics (with lower and upper cutoffs $\omega_1 < \omega_2$):

  $\displaystyle h_{\textrm{\footnotesize lowpass}}(n) =
\left\{
\begin{array}{cc}
\frac{\sin(\omega_{c} n)}{\pi n} & \mbox{for } n \neq 0 \\
\frac{\omega_{c}}{\pi} & \mbox{for } n = 0
\end{array}\right.
$ (132)

  $\displaystyle h_{\textrm{\footnotesize highpass}}(n) =
\left\{
\begin{array}{cc}
-\frac{\sin(\omega_{c} n)}{\pi n} & \mbox{for } n \neq 0 \\
1 - \frac{\omega_{c}}{\pi} & \mbox{for } n = 0
\end{array}\right.
$ (133)

  $\displaystyle h_{\textrm{\footnotesize bandpass}}(n) =
\left\{
\begin{array}{cc}
\frac{\sin(\omega_{2} n) - \sin(\omega_{1} n)}{\pi n} & \mbox{for } n \neq 0 \\
\frac{\omega_{2} - \omega_{1}}{\pi} & \mbox{for } n = 0
\end{array}\right.
$ (134)

  $\displaystyle h_{\textrm{\footnotesize bandstop}}(n) =
\left\{
\begin{array}{cc}
\frac{\sin(\omega_{1} n) - \sin(\omega_{2} n)}{\pi n} & \mbox{for } n \neq 0 \\
1 - \frac{\omega_{2} - \omega_{1}}{\pi} & \mbox{for } n = 0
\end{array}\right.
$ (135)

See Diniz (2002, p.195) for more impulse responses.

Here is example code for a bandstop filter which fills the array h with the analytically calculated impulse response:

import numpy as np

fs = 1000.0  # sampling rate in Hz (assumed, as in the example above)
f1 = 45.0/fs
f2 = 55.0/fs
n = np.arange(-200,200)
# this divides by zero at n = 0, which is fixed in the next line
h = (1/(n*np.pi))*(np.sin(f1*2*np.pi*n)-np.sin(f2*2*np.pi*n))
h[200] = 1-(f2*2*np.pi-f1*2*np.pi)/np.pi
h = h * np.hamming(400)
After the impulse response has been calculated it is windowed so that the cut-off is smooth. The code is not very elegant as it first causes a division by zero at $n=0$ and then fixes the coefficient at index 200 afterwards. Do it better!

Design steps for FIR filters

Here are the design steps for an $M$ tap FIR filter.
  1. Get yourself an impulse response for your filter:
    1. Create a frequency response “by hand” just by filling an array with the desired frequency response. Then, perform an inverse Fourier transform (see section 6.7.6).
    2. Define a frequency response analytically. Do an inverse Fourier transform (see section 6.7.7).
    3. Use an analogue circuit, get its impulse response and use these as filter coefficients.
    4. Directly dream up an impulse response (for example averagers, differentiators, etc.)

  2. Mirror the impulse response (if not already symmetrical)
  3. Window the impulse response from an infinite number of samples to $M$ samples.
  4. Move the impulse response to positive time so that it becomes causal (move it $M/2$ steps to the right). A sketch combining these steps is shown below.
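As a sketch, steps 1b, 3 and 4 look like this for the analytically derived lowpass of Eq. 131 (the cutoff and tap count are example values):

import numpy as np

M = 100                           # number of taps
wc = 2*np.pi*0.1                  # normalised cutoff frequency
n = np.arange(-M//2, M//2)        # a-causal time axis: -M/2 ... M/2-1
h = np.zeros(M)
h[n != 0] = np.sin(wc*n[n != 0]) / (np.pi*n[n != 0])  # Eq. 131, n != 0
h[M//2] = wc/np.pi                                    # Eq. 131, n = 0
h = h * np.hamming(M)             # step 3: window it down to M taps
# step 4 is implicit: array index 0 corresponds to n = -M/2 so the
# response has effectively been shifted M/2 steps into positive time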

FIR filter design with Python's high level functions

The “firwin” command generates the impulse response of a filter with m taps and also applies a default window function. For example:
from scipy.signal import firwin
m = 101   # number of taps
f = 0.1   # normalised cutoff frequency
h = firwin(m,2*f)
generates a lowpass FIR filter with the normalised frequency f. Note the factor of two: in scipy the frequencies are normalised not to the sampling rate but to the Nyquist frequency. To be compatible with the maths here the easiest approach is to multiply the frequency by 2. With this command one can also design highpass, bandstop and bandpass filters, as sketched below. Typing help(firwin) provides examples for all filter types and shows which parameters need to be set.
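For example (a sketch; the cutoff values are arbitrary examples, again multiplied by 2 for scipy, and highpass/bandstop designs need an odd number of taps):

from scipy.signal import firwin

m = 101                                      # odd number of taps
lowpass = firwin(m, 2*0.1)
highpass = firwin(m, 2*0.1, pass_zero=False)
bandstop = firwin(m, [2*0.05, 2*0.15])       # pass_zero=True is the default
bandpass = firwin(m, [2*0.05, 2*0.15], pass_zero=False)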

Signal Detection: Matched filter

How can we detect a certain event in a signal? With a correlator, which is defined as:
  $\displaystyle e(t) = \int_{0}^{t} \underbrace{s(\tau)}_{\mbox{signal}} \underbrace{r(\tau)}_{\mbox{template}} d\tau
$ (136)

How can we build a correlator with a filter? Recall the definition of filtering:

  $\displaystyle e(t) = \int_{0}^{\infty} s(\tau) h(t - \tau) d\tau
$ (137)
NB: in the convolution the impulse response $h$ runs backwards in time, whereas in the correlator (Eq. 136) the template runs forwards. The trick is to define the impulse response as the time-reversed template: $h(t) := r(T - t)$, only valid for $0\ldots T$.


$\displaystyle e(t)$ $\textstyle =$ $\displaystyle \int_{0}^{T} s(\tau) r \left(T - (t - \tau)\right) d\tau$ (138)
  $\textstyle =$ $\displaystyle \int_{0}^{T} s(\tau) r(T - t + \tau) d\tau$ (139)

For $t := T$ we get:
  $\displaystyle e(T) = \int_{0}^{T} s(\tau) r (\tau) d\tau
$ (140)
  $\displaystyle \underbrace{h(t) := r(T - t)}_{\mbox{matched filter!}}
$ (141)
In order to design a detector we just create an impulse response $h$ by reversing the template $r$ in time and constructing an FIR filter with it.

How can we improve the matching process? Square the output of the filter!
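A minimal sketch of such a detector (the template and signal here are made up for illustration):

import numpy as np

def detect(signal, template):
    h = template[::-1]                        # time reversed template (Eq. 141)
    e = np.convolve(signal, h, mode='valid')  # FIR filtering of the signal
    return e**2                               # squaring improves the matching

template = np.sin(2*np.pi*0.1*np.arange(50))  # the event we look for
signal = np.concatenate([np.random.randn(200), template, np.random.randn(200)])
print(np.argmax(detect(signal, template)))    # peaks close to sample 200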

Adaptive FIR Least Mean Squares (LMS) filters

Figure 20: Adaptive FIR filter: A) Negative feedback loop where a desired signal $d(n)$ is compared with the output of the FIR filter $y(n)$ and creates the error signal $e(n)$ which then tunes the FIR filter. B) Internal workings of the adaptive FIR filter where the error signal $e(n)$ is multiplied with the delayed input signal $x(n-m)$ and then changes the FIR filter coefficients $h_m$.
\includegraphics[width=\linewidth]{fir_lms}
So far the coefficients of the FIR filter have been constant but now we are going to allow them to change while the FIR filter is operating so that it can learn its own coefficients. The idea is to use an error signal $e(n)$ to tune the coefficients $h(n)$ of the FIR filter (see Fig. 20A) by comparing its output $y(n)$ with a desired output $d(n)$:
  $\displaystyle e(n) = d(n) - y(n)
$ (142)
and tuning the FIR filter in a negative feedback loop till the average of $e(n)$ is zero.

We need to derive how the error $e(n)$ can change the FIR filter coefficients so that the error is actually minimised. This can be expressed as a so-called “gradient descent” which minimises the squared error $\frac{1}{2} e(n)^2$ because both a positive and a negative error are equally bad:

  $\displaystyle \Delta h_m = - \mu \frac{\partial\left( \frac{1}{2}e(n)^2 \right)}{\partial h_m}
$ (143)
where $\Delta h_m(n)$ is the change of the coefficient $h_m(n)$ at every time step: $h_m(n+1) = h_m(n) + \Delta h_m(n)$. $\mu \ll 1$ defines how quickly the coefficients $h_m$ change at every time step and is called the “learning rate”. Note here the change in notation of the FIR filter coefficients $h_m$: they change much more slowly than the sampled signals $e(n), d(n)$ and $y(n)$ and can still be seen as constant for the realtime filtering operation. Thus, we have $n$ for the sample by sample processing as before and the index $m$ of $h_m$ for the very slowly changing FIR filter coefficients. To gain some intuition as to why Eq. 143 minimises the error $e(n)$ we look at what happens if we increase the FIR coefficient $h_m$ a little bit, for example by the small amount $\vert\epsilon\vert \ll 1$: $h_m := h_m + \epsilon$. Then we can observe two cases:
  1. The squared error $e(n)^2$ increases so we need to decrease $h_m$ as it makes it worse.
  2. The squared error $e(n)^2$ decreases so we need to increase $h_m$ as it makes it better.
This also works in the same way if we decrease $h_m$ by a small amount. Thus from both directions this always minimises the error. This is basically a carrot & stick approach where the coefficients $h_m$ are being “rewarded” if they minimise the error and “punished” if they make the error larger. This approach is called “Least Mean Squares” (LMS) as it minimises the squared error on average. It's also known from neural networks where the $h_m$ are called the “weights” of a neuron (see delta rule).

Eq. 143 now needs to be turned into a form which can run directly in software, by solving its partial derivative after inserting Eqs. 142 and 110:

$\displaystyle \Delta h_m$ $\textstyle =$ $\displaystyle - \mu \frac{1}{2}\frac{\partial \left(d(n)-y(n)\right)^2}{\partial h_m}$ (144)
  $\textstyle =$ $\displaystyle - \mu \frac{1}{2}\frac{\partial \left(d(n)-\sum_{m=0}^{M-1} h_m \cdot x(n-m)\right)^2}{\partial h_m}$ (145)
  $\textstyle =$ $\displaystyle \mu \left(d(n)-y(n)\right) \cdot x(n-m)$ (146)
  $\textstyle =$ $\displaystyle \mu \cdot e(n) \cdot x(n-m)$ (147)

Note that $x(n-m)$ emerges because of the chain rule when partially differentiating the output $y(n)$, which depends on the sum of the delay lines of the FIR filter (see Eq. 110). Eq. 147 is now our “learning” rule which can simply be applied to the FIR filter as shown in Fig. 20B.

Now we have the recipe for an adaptive filter where the FIR filter minimises its own error $e(n)$ in a negative feedback loop, generating an output $y(n)$ which is as close as possible to the desired signal $d(n)$. The signal $y(n)$ is often called “the remover” as it cancels out the signal $d(n)$. However, not everything in $d(n)$ is cancelled out, but only what is correlated with the input $x(n)$. The error signal $e(n)$ will therefore not be zero but will contain any signal components which are not correlated with $x(n)$. This means that the error signal is actually also the cleaned up output signal of the adaptive filter.

Imagine you want to remove $50$ Hz mains from a signal

  $\displaystyle d(n) = \mathrm{signal}(n) + \mathrm{50Hz~noise}(n)
$ (148)
then one would provide $50$ Hz mains via
  $\displaystyle x(n) = \mathrm{50Hz~noise}(n)
$ (149)
so that the filter learns to remove the powerline interference. One might think that the filter will remove everything from $d(n)$ because it tries to minimise the error $e(n)$, but only the 50 Hz in the contaminated signal will be removed, because only signal components which are correlated between $x(n)$ and $d(n)$ will be removed. The cleaned up signal is identical to the error signal:
  $\displaystyle e(n) = \mathrm{signal}(n)
$ (150)
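A minimal sketch of this 50 Hz remover using the LMS rule of Eq. 147 (the sampling rate, learning rate, tap count and signals are example values):

import numpy as np

fs = 1000                              # sampling rate in Hz
t = np.arange(5000)/fs
signal = np.sin(2*np.pi*5*t)           # the wanted signal
d = signal + np.sin(2*np.pi*50*t)      # desired input d(n) contaminated by mains
x = np.sin(2*np.pi*50*t + 1.0)         # reference 50 Hz with arbitrary phase
M = 32                                 # number of taps
mu = 0.01                              # learning rate
h = np.zeros(M)                        # the adaptive FIR coefficients h_m
buf = np.zeros(M)                      # delay line holding x(n-m)
e = np.zeros(len(t))                   # error = cleaned up signal
for n in range(len(t)):
    buf = np.roll(buf, 1)
    buf[0] = x[n]
    y = np.dot(h, buf)                 # the remover y(n) (Eq. 110)
    e[n] = d[n] - y                    # the error e(n) (Eq. 142)
    h = h + mu * e[n] * buf            # LMS learning rule (Eq. 147)
# after convergence e contains the 5 Hz signal with the 50 Hz removed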
