In [1]:
%matplotlib inline
import matplotlib.pyplot as pl
from matplotlib import gridspec
import seaborn as sb
import numpy as np
from scipy.fftpack import fft
from scipy.signal import hamming, convolve, dimpulse, tf2zpk, zpk2tf, lfilter, freqz
from scipy.io import wavfile as wv

sb.set(style='ticks', font_scale=1.5)
blue, green, red, purple, gold, cyan = sb.color_palette("deep",6)

Finite Impule Response (FIR) filters and moving average (MA) models

Suppose we have a signal that is the combination of two sinusoids, one with low frequency and one with high frequency:

In [2]:
fs = 1000 # sampling frequency = 1000 Hz
t = np.arange(0,.2,1/fs) # time vector from 0 to 20 ms

# sinusoidal components
xa = 1.3*np.cos(2*np.pi*10*t + np.pi/5)
xb = 0.75*np.cos(2*np.pi*350*t + np.pi/4)
x = xa + xb

# plotting
fig, ax = pl.subplots(1,1,figsize=(15,5))
ax.plot(x, color=blue, lw=2); ax.set(ylim=(-2.25,2.25), xlabel='n', ylabel='x[n]')
sb.despine(ax=ax,trim=True)

Now suppose we calculate the five-point moving average to this sequence, i.e., we create a new signal which, at time $n$, is the mean of the five samples immediately preceding and including sample $n$. In this case, the output $y[n]$ is given by:

$y[n] = \displaystyle{\frac{1}{5}x[n] + \frac{1}{5}x[n-1] + \frac{1}{5}x[n-2] + \frac{1}{5}x[n-3] + \frac{1}{5}x[n-4] = \frac{1}{5}\sum_{k=0}^4 x[n-k]}$

We can easily define a moving average function and apply it to the signal $x$ defined above.

Here is a function:

In [3]:
# moving average function
def moving_average(x_in, npt=5):
    w = 1/npt # reciprocal of number of points
    L = len(x_in) # number of points
    x_out = np.zeros(L) # initialize output array
    for n in range(L):
        k = min(npt,n) # for avoiding problems with the first npt-1 samples
        x_out[n] = w*np.sum(x_in[n-k:n]) # moving average
    return x_out

And here is what happens when we apply it:

In [4]:
y = moving_average(x) # apply function to x

# plot original and new signal
fig,ax = pl.subplots(1,1,figsize=(15,5))
la, = ax.plot(x, lw=2, color=blue); lb, = ax.plot(y,'r',lw=2, color=red)
ax.set(xlabel='n',ylabel='y[n], x[n]'); ax.legend([la,lb],['x[n]','moving_average(x[n])'])
sb.despine(ax=ax,trim=True)

Convolution in MA processes, FIR filters

The moving average process $\displaystyle{y[n] = \frac{1}{5}\sum_{k=0}^4 x[n-k]}$ is an example of an operation called convolution.

We can make it more general by letting the weights (or coefficients) have arbitrary values (i.e., not all be equal to $1/5$). We'll call the general set of $M$ coefficients $h[k]$. The convolution of two sequences $x$ and $h$ is defined as:

$\displaystyle{y[n] = \sum_{k=0}^{M-1} h[k] x[n-k]} = h[k]\ast x[n]$

In our moving average example, $M=5$, and $h[k] = \displaystyle{\frac{1}{5}}$ for $k = 0,1,2,3,4$.

Below is a sample-by-sample illustration of convolution for the first 10 points of $x[n]$ using our five-point moving average coefficients $h[k]$. For the purposes of illustration, we'll define a couple functions for handling and plotting the samples of the signal $x$, the coefficients $h$, and the convolution $y$.

Here are the functions:

In [5]:
def adj_ax(ax,pix):
    ylabs = ['x[n]','h[n-k]','y[n]']; 
    yticks = [np.arange(0,1.8,.4),np.arange(0,.5,.2),np.arange(0,1.8,.4)];
    ylims = [(-.1,1.8),(-.1,.5),(-.1,1.8)]
    ax.set(ylabel=ylabs[pix-1],ylim=ylims[pix-1],yticks=yticks[pix-1],xlim=(-.1,13.1),xticks=np.arange(14));
    if pix==3:
        ax.set_xticklabels(range(-4,10));
    else:
        ax.set_xticklabels([]);
        
def conv_ill(n_in, clrs=[blue,red,purple]):
    fig, axes = pl.subplots(3,2,figsize=(16,6))
    ax1d = axes.ravel()
    for ix,ni in enumerate(n_in):
        ax = ax1d[ix]
        ax.plot(xs,'o',color=clrs[0],ms=10); adj_ax(ax,1);
        ax.set_title('n = ' + str(ni-4))
        hkt = np.concatenate((np.zeros(ni-4),hk[::-1],np.zeros(14-ni-1)));
        ax = ax1d[ix+2]; ax.plot(hkt,'o',color=clrs[1],ms=10); adj_ax(ax,2);
        yt[ni] = sum(hkt*xs)
        ax = ax1d[ix+4]; ax.plot(yt,'o',color=clrs[2],ms=10); adj_ax(ax,3);
In [6]:
xs = np.concatenate((np.zeros(4),x[:10])) # zero-padded signal
hk = .2*np.ones(5) # five-point moving average coefficients
yt = np.zeros(14) # initialize output array

[conv_ill([j,j+1]) for j in range(4,13,2)]; # list comprehension for plotting multiple steps

Some facts about convolution

  • Convolution is commutative, i.e., the order of $h$ and $x$ doesn't matter:

\begin{equation} y[n] = \displaystyle{\sum_{k=-\infty}^{\infty} h[k] x[n-k] = \sum_{k=-\infty}^{\infty} h[n-k] x[k]} \end{equation}

  • Convolution is associative, i.e., if you're convolving $g$, $h$, and $x$, it doesn't matter which two you convolve first:

\begin{equation} y[n] = g[i] \ast (h[k] \ast x[n]) = (g[i] \ast h[k]) \ast x[n] \end{equation}

  • Convolution is distributive, i.e., if you convolve $h$ with the sum of $x$ and $w$, this is the same as summing the separate convolutions of $h$ with $x$ and $h$ with $w$:

\begin{equation} y[n] = h[k] \ast (x[n] + w[n]) = h[k] \ast x[n] + h[k] \ast w[n] \end{equation}

  • Finally (at least for our purposes right now), if $y[n] = h[k] \ast x[n]$, and if $Y[m]$ is the DFT of $y[n]$, $H[m]$ the DFT of $h[k]$, and $X[m]$ the DFT of $x[n]$, then $Y[m] = H[m]X[m]$. That is, convolution in the time domain corresponds to multiplication in the frequency domain. In fact, the opposite is true as well, namely that multiplication in the time domain corresponds to convolution in the frequency domain. One important implication of this is that we can analyze any $h[k]$ by taking the DFT of the sequence $h[k]$.

A brief note about impulses and impulse response functions

With $x[n]$ as our input and $y[n]$ as our output, it is useful to think of $h[k]$ as a system. One way to characterize a dynamic system like this is with its impulse response. An impulse is defined as:

$\delta[n] = \left\{\begin{array}{ll}1 & n=0\\0 & n \neq 0 \end{array}\right.$

That is, it is a single, unitary input. We can get the impulse response from a filter by using the $\delta[n]$ as input. As it happens, the impulse response for a FIR filter is just the filter coefficients. For now, using the term impulse response is convenient. We'll come back to it later and analyze it in more depth.

A brief note on zero-padding

Our $h[k]$ from above has just five coefficients. If we were to take the DFT of it, we would have just five points in the frequency domain, which wouldn't tell us all that much about the frequency response of our filter. We can get a finer-grained frequency domain representation by appending zeros to the impulse response function. This also allows us to have a power-of-two length input to the fft() function.

DFTs of impulse response/FIR coefficients

A FIR filter can change the amplitude and phase of an input sinusoid, but not the frequency. The magnitude spectrum of a set of FIR coefficients tells us the frequency response of the filter (i.e. changes in amplitude of difference input frequencies), and the phase angle tells us how the filter changes the phase of different input frequencies.

In [7]:
Npt = 128 # number of fft point
hkz = np.concatenate((hk,np.zeros(Npt-len(hk)))) # zero-padding coefficients hk
Hk = fft(hkz) # FFT of zero-padded hk

# plots
fig, axes = pl.subplots(2,1,figsize=(15,10)); ax1,ax2 = axes
ax1.plot(np.absolute(Hk[:int(Npt/2+1)]),'s'); 
ax1.set(xlim=(-.5,Npt/2+.5),xticks=np.arange(0,Npt/2+1,Npt/4),
        xticklabels=['0','$f_s$/4','$f_s$/2'],ylim=(-.05,1.05),ylabel='|H[m]|')
ax2.plot(np.angle(Hk[:int(Npt/2+1)]),'s'); 
ax2.set(xlim=(-.5,Npt/2+.5),xticks=np.arange(0,Npt/2+1,Npt/4),xticklabels=['0','$f_s$/4','$f_s$/2'],
        ylim=(-np.pi,np.pi),yticks=np.arange(-np.pi,np.pi+.01,np.pi/2),
        yticklabels=['-$\pi$','-$\pi$/2','0','$\pi$/2','$\pi$'],ylabel='H$_\phi$[m]')
sb.despine(ax=ax1,trim=True); sb.despine(ax=ax2,trim=True)

The lower the value of $|H[m]|$, the more the 5-point MA filter reduces the amplitude of signal components at frequency $m$. Components at the two points at which $|H[m]|$ is equal to zero will be removed completely.

We can illustrate this by creating a signal with components at the specific frequencies at which $|H[m]|$ equals zero (i.e., the stop frequencies):

In [8]:
# find and print stop frequencies
Hm = np.absolute(Hk[:int(Npt/2)]) # magnitude response of hk
print('stop frequencies = ' + str(np.where(Hm<.02)[0]*fs/Npt) + ' Hz') # print frequencies where Hm is near zero

# create signal with components at stop frequencies
t = np.arange(0,.2,1/fs)
xa = 1.3*np.cos(2*np.pi*10*t + np.pi/5)
xb = 0.5*np.cos(2*np.pi*200*t + np.pi/4)
xc = 0.75*np.cos(2*np.pi*400*t + 3*np.pi/5)
x = xa + xb + xc
y = moving_average(x, 5)

# plot original and filtered signal
fig, ax = pl.subplots(1,1,figsize=(12,4))
ax.plot(x, color=blue, lw=2); ax.plot(y, color=red, lw=3); ax.set(ylim=(-2.25,2.25),xlabel='n',ylabel='y[n], x[n]');
sb.despine(ax=ax,trim=True)
stop frequencies = [ 203.125   398.4375] Hz

The coefficients of a FIR filter need not be identical. For example, we could set $h[k] = [0.1, 0.2, 0.2, 0.2, 0.1]$ or $h[k] = [0.04,0.12,0.20,0.12,0.04]$.

Here are the frequency responses (i.e., FFT magnitude spectra) of all three $h[k]$ sequences we've discussed:

In [9]:
hk2 = np.array([.1,.2,.2,.2,.1]); hk2 = hk2/sum(hk2) # hk sequence #2, normalized
hk2z = np.concatenate((hk2,np.zeros(Npt-len(hk2)))) # zero padding
Hk2 = fft(hk2z); Hm2 = np.absolute(Hk2[:int(Npt/2)]) # FFT of hk2
hk3 = np.array([.04,.12,.2,.12,.04]); hk3 = hk3/np.sum(hk3) # hk sequence #3, normalized
hk3z = np.concatenate((hk3,np.zeros(Npt-len(hk3)))) # zero padding
Hk3 = fft(hk3z); Hm3 = np.absolute(Hk3[:int(Npt/2)]) # FFT of hk3

# plots
fig, axes = pl.subplots(2,1,figsize=(15,10)); ax1,ax2 = axes
la, = ax1.plot(hk,'s:',ms=12,lw=3); lb, = ax1.plot(hk2,'o:',ms=12,lw=3); lc, = ax1.plot(hk3,'^:',ms=12,lw=3);
ax1.set(ylabel='h[k]',xticks=np.arange(0,5))
ax1.legend([la,lb,lc],['h$_1$[k]','h$_2$[k]','h$_3$[k]'])
la, = ax2.plot(Hm,'s',ms=10); lb, = ax2.plot(Hm2,'o',ms=10); lc, = ax2.plot(Hm3,'^',ms=10);
ax2.set(xlim=(-.5,Npt/2+.5),xticks=np.arange(0,Npt/2+1,Npt/4),
        xticklabels=['0','$f_s$/4','$f_s$/2'],ylim=(-.05,1.05),ylabel='|H[m]|')
ax2.legend([la,lb,lc],['h$_1$[k]','h$_2$[k]','h$_3$[k]'])
sb.despine(ax=ax1,trim=True); sb.despine(ax=ax2,trim=True)

Let's see how these three FIR filters affect the same signal. We will illustrate both multiplication in the frequency domain and convolution in the time-domain:

In [10]:
t = np.arange(0,.2,1/fs)
xa = 1.3*np.cos(2*np.pi*10*t + np.pi/5)
xb = 0.5*np.cos(2*np.pi*185*t + np.pi/4)
xc = 0.75*np.cos(2*np.pi*315*t + 3*np.pi/5)
x = xa + xb + xc
y1 = convolve(hk,x); y2 = convolve(hk2,x); y3 = convolve(hk3,x)
hw = hamming(Npt); XX = fft(hw*x[:Npt]); Xm = np.absolute(XX[:int(Npt/2)])
fig, axes = pl.subplots(4,2,figsize=(15,12))
axes[0,0].plot(Xm/max(Xm),'o-',color=cyan);
axes[0,1].plot(x,color=cyan); axes[0,1].set(xlim=(0,len(x)),ylim=(-2.5,2.5),xticklabels='')
axes[1,0].plot(Hm,'s',color=la.get_color()); axes[1,0].plot(Xm/max(Xm)*Hm,'o-',color=cyan)
axes[1,1].plot(y1,color=cyan); axes[1,1].set(xlim=(0,len(y1)),ylim=(-2.5,2.5),xticklabels='')
axes[2,0].plot(Hm2,'o',color=lb.get_color()); axes[2,0].plot(Xm/max(Xm)*Hm2,'o-',color=cyan)
axes[2,1].plot(y2,color=cyan); axes[2,1].set(xlim=(0,len(y2)),ylim=(-2.5,2.5),xticklabels='');
axes[3,0].plot(Hm3,'^',color=lc.get_color()); axes[3,0].plot(Xm/max(Xm)*Hm3,'o-',color=cyan)
axes[3,1].plot(y3,color=cyan); axes[3,1].set(xlim=(0,len(y3)),ylim=(-2.5,2.5))
[axes[i,0].set(xlim=(-.5,Npt/2+.5),xticks=np.arange(0,Npt/2+1,Npt/4),
        xticklabels=[],ylim=(-.05,1.05),ylabel='|H[m]|') for i in range(4)]
axes[3,0].set(xticklabels=['0','$f_s$/4','$f_s$/2']);
[[sb.despine(ax=axes[i,j],trim=True) for j in range(2)] for i in range(4)];

Impulse response function and FIR filter coefficients

Here's the definition of convolution again:

$\displaystyle{y[n] = \sum_{k=0}^{M-1} h[k] x[n-k]} = h[k]\ast x[n]$

The impulse response function of a system (e.g, a filter) is the output of the system when the input is an impulse:

$\delta[n] = \left\{\begin{array}{ll}1 & n=0\\0 & n \neq 0 \end{array}\right.$

For a FIR filter, the impulse response function is just the set of coefficients $h[k]$:

$\begin{align} &\\ y[-1] &= h[0]\delta[-1-0] + h[1]\delta[-1-1] + h[2]\delta[-1-2] + h[3]\delta[-1-3] + h[4]\delta[-1-4] = 0\\ &\\ y[0] &= h[0]\delta[0-0] + h[1]\delta[0-1] + h[2]\delta[0-2] + h[3]\delta[0-3] + h[4]\delta[0-4] = h[0]\\ &\\ y[1] &= h[1]\delta[1-0] + h[1]\delta[1-1] + h[2]\delta[1-2] + h[3]\delta[1-3] + h[4]\delta[1-4] = h[1]\\ &\\ y[2] &= h[1]\delta[2-0] + h[1]\delta[2-1] + h[2]\delta[2-2] + h[3]\delta[2-3] + h[4]\delta[2-4] = h[2]\\ &\\ y[3] &= h[1]\delta[3-0] + h[1]\delta[3-1] + h[2]\delta[3-2] + h[3]\delta[3-3] + h[4]\delta[3-4] = h[3]\\ &\\ y[4] &= h[1]\delta[4-0] + h[1]\delta[4-1] + h[2]\delta[4-2] + h[3]\delta[4-3] + h[4]\delta[4-4] = h[4]\\ &\\ y[5] &= h[1]\delta[5-0] + h[1]\delta[5-1] + h[2]\delta[5-2] + h[3]\delta[5-3] + h[4]\delta[5-4] = 0\\ \end{align}$

By (zero-padding and then) taking the DFT of the impulse response function of our moving average filter, we analyzed the frequency response of the filter.

DFTs, complex numbers, and $z$-transforms

If we let $\omega = 2\pi m /N$, the definition of the discrete Fourier transform is simplified (at least with respect to notation):

$\begin{equation}X[m] = \displaystyle{\sum_{n=0}^{N-1} x[n] e^{-i\omega n}}\end{equation}$

Recall that the complex exponential is defined as:

$e^{-i\omega n} = \cos(\omega n) - i\sin(\omega n)$

The $z$-transform is a generalization of the Fourier transform:

$X(z) = \displaystyle{\sum_{n=-\infty}^\infty x[n] z^{-n}}$

We can express $z$ in complex exponential (or polar) form $z = r e^{i\omega} = r\big(\cos(\omega) + i\sin(\omega)\big)$. Plugging this into the equation for $H(z)$ above, we get:

$X(z) = \displaystyle{\sum_{n=-\infty}^\infty x[n] z^{-n} = \sum_{n=-\infty}^\infty x[n] (re^{i\omega})^{-n} = \sum_{n=-\infty}^\infty x[n]r^{-n}e^{-i\omega n}}$

This is just the Fourier transform of the sequence $x[n]$ multiplied by the exponential sequence $r^{-n}$.

If $|r| > 1$, $r^{-n}$ is a decreasing sequence, while if $|r| = 1$, it's a sequence of ones, and if $|r| < 1$, $r^{-n}$ is an increasing sequence. Here are illustrations of three exponential sequences:

In [11]:
r = np.array([.9, 1, 1.2]) # values of r less than, equal to, and greater than one
lfmt = ['-','-','-'] # line styles
mfmt = ['o','o','o'] # marker styles
clrt = [la.get_color(),lb.get_color(),lc.get_color()] # colors
N = 25 # number of points to plot
rn = np.zeros((3,N)) # initialized array

# plots
fig, axes = pl.subplots(3,1,figsize=(15,9))
for i, ax in enumerate(axes): # loop through r values, axes
    for n in range(N): # loop through samples
        rn[i,n] = r[i]**(-n) # calculate r sequence for value i, sample n
    ml,sl,bl = ax.stem(rn[i,:])
    pl.setp(ml,'color',clrt[i])
    pl.setp(sl,'color',clrt[i])
    pl.setp(bl,'color',clrt[i])
    sb.despine(ax=ax,trim=True)

When $|z| = 1$, $H(z)$ gives the frequency response of a filter, which is also the Fourier transform.

If we plot complex numbers on a plane with the real part on the x axis and the imaginary part on the y axis, we can see that $|z| = 1$ is the unit circle (and $\omega$ is the angle of the complex number $e^{i\omega}$:

Using $z$-transforms to analyze filters

In the five-point moving average filter example above, the output $y[n]$ was the sum of weighted and delayed inputs, $x[n-1]$, $x[n-2]$, etc...

In order to see how the $z$-transform relates to delays, we'll consider a very simple example, in which the output is just the input delayed by one sample: $y[n] = x[n-1]$

Let $Y(z)$ be the $z$-transform of $y[n]$ and let $X(z)$ be the $z$-transform of $x[n]$. Then

\begin{equation} \displaystyle{Y(z) = \sum_{n=-\infty}^\infty y[n]z^{-n} = \sum_{n=-\infty}^\infty x[n-1]z^{-n} } \end{equation}

Let $k=n-1$, i.e., rearranging terms, $n=k+1$, and substitute this into the equation above to show that the $z$-transform of a unit-delayed sequence is the $z$-transform of the non-delayed sequence multiplied by $z^{-1}$:

\begin{equation} \displaystyle{Y(z) = \sum_{k=-\infty}^\infty x[k]z^{-(k+1)} = \sum_{k=-\infty}^\infty x[k]z^{-k}z^{-1} = z^{-1}\sum_{k=-\infty}^\infty x[k]z^{-k} = z^{-1}X(z)} \end{equation}

You can do this repeatedly to show that the delaying $m$ units leads to multiplication by $z^{-m}$.

Let's use this fact to analyze a simple three-point moving average FIR filter: $y[n] = \frac{1}{3}x[n] + \frac{1}{3}x[n-1] + \frac{1}{3}x[n-2]$.

Taking the $z$-transform (and exploiting the fact that the $z$-transform is a linear operator, and so distributes across the elements of a sum):

\begin{equation} Y(z) = \frac{1}{3}X(z) + \frac{1}{3}X(z)z^{-1} + \frac{1}{3}X(z)z^{-2} = \left(\frac{1}{3} + \frac{1}{3}z^{-1} + \frac{1}{3}z^{-2}\right)X(z) \end{equation}

Recall that if $y[n] = h[k] \ast x[n]$, then $Y[m] = H[m]X[m]$. This result applies to $z$-transforms, as well.

For our purposes right now, this means that $H(z) = \left(\frac{1}{3} + \frac{1}{3}z^{-1} + \frac{1}{3}z^{-2}\right)$.

$H(z)$ is called the transfer function of the filter. When we design a filter, we work to create a desired transfer function.

If we plug $z = e^{i\omega} = \cos(\omega) + i\sin(\omega)$ in, we get $H(\omega)$, the Fourier transform. For our three-point moving average, we get:

\begin{equation} H(\omega) = \frac{1}{3} + \frac{1}{3}\big(\cos(\omega) + i\sin(\omega)\big) + \frac{1}{3}\big(\cos(2\omega) + i\sin(2\omega)\big) \end{equation}

In [12]:
npt = 64 # number of points to plot
omega = np.arange(0,2*np.pi,2*np.pi/npt) # frequency in radians
Ho = 1/3 + 1/3*np.exp(1j*omega) + 1/3*np.exp(1j*2*omega) # mathematical frequency response of 3-pt MA filter
h = np.array([1/3,1/3,1/3]) # 3-pt MA coefficients
hz = np.concatenate((h,np.zeros(npt-len(h)))) # zero-padding
Hm = np.absolute(fft(hz)) # frequency response

# plot
fig, ax = pl.subplots(1,1,figsize=(15,5))
ax.plot(omega,Hm,'s',ms=10) # computed freq. resp.
ax.plot(omega,np.absolute(Ho),color=red,linewidth=3); # mathematical freq. resp.
ax.set(xticks=[0,np.pi/2,np.pi,3*np.pi/2,2*np.pi],
       xticklabels=['0','$\pi/2$','$\pi$','$3\pi/2$','$2\pi$'])
sb.despine(ax=ax,trim=True)

IIR filters, auto-regression models

The FIR filters we've looked at so far have the output solely as a function of the input, but a filter can operate on delayed samples of the output, too. Filters that do this are called Infinite Impulse Response (IIR) filters. (In time series analysis, they are called auto-regression models.)

Here's a simple example, in which the output at time $n$ is a weighted function of the current input $x[n]$, delayed inputs $x[n-k]$, and delayed outputs $y[n-k]$:

\begin{equation} y[n] = a[1] y[n-1] + a[2] y[n-2] + b[0] x[n] + b[1] x[n-1] + b[2] x[n-2] \end{equation}

If we rearrange terms to put all instances of $y$ on the left and all instances of $x$ on the right, we get:

\begin{equation} y[n] - a[1] y[n-1] - a[2] y[n-2] = b[0] x[n] + b[1] x[n-1] + b[2] x[n-2] \end{equation}

Then, if we then take the $z$-transform of all terms on each side, we get:

\begin{equation} Y(z) - a[1]z^{-1}Y(z) - a[2]z^{-2}Y(z) = b[0]X(z) + b[1]z^{-1}X(z) + b[2]z^{-2}X(z) \end{equation}

Factoring $Y(z)$ and $X(x)$ out, we get:

\begin{equation} \displaystyle{Y(z)\big(1 - a[1]z^{-1} - a[2]z^{-2}\big) = \big(b[0] + b[1]z^{-1} + b[2]z^{-2}\big)X(z)} \end{equation}

Dividing both sides by $\big(1 - a[1]z^{-1} - a[2]z^{-2}\big)$, we get:

\begin{equation} \displaystyle{Y(z) = \frac{b[0] + b[1]z^{-1} + b[2]z^{-2}}{1 - a[1]z^{-1} - a[2]z^{-2}}X(z)} \end{equation}

Hence, the transfer function of this more complicated filter is:

\begin{equation} H(z) = \displaystyle{\frac{b[0] + b[1]z^{-1} + b[2]z^{-2}}{1 - a[1]z^{-1} - a[2]z^{-2}}} \end{equation}

If we multiply $H(z)$ by $\displaystyle{\frac{z^2}{z^2}}$ to get positive powers of $z$, we get:

\begin{equation} H(z) = \displaystyle{\frac{b[0]z^2 + b[1]z + b[2]}{z^2 - a[1]z - a[2]} = \frac{(z-z_0)(z-z_1)}{(z-p_0)(z-p_1)}} \end{equation}

Here, $z_0, z_1$ are the roots of the polynomial in the numerator, and $p_0, p_1$ are the roots of the polynomial in the denominator. When $z=z_0$ or $z = z_1$, $|H(z)| = 0$. Hence, these roots are called zeros of the $z$-transform. When $z=p_0$ or $z=p_1$, the denominator is zero, and $|H(z)|$ is infinite. These roots are called poles.

A useful way to visualize a filter's transfer function is with a pole-zero plot. On the $z$ plane, each zero is plotted as an "o", and each pole is plotted as an "x". So, for example, suppose the $H(z)$ from above has the following $a[k]$ and $b[k]$ coefficients:

\begin{equation} H(z) = \displaystyle{\frac{0.0605 z^2 + 0.121 z + 0.0605}{z^2 - 1.194z + 0.436}} \end{equation}

We can find the roots of these two polynomials using the roots() function in numpy:

In [13]:
# numerator
print(np.roots(np.array([0.0605,0.121,0.0605])))
# denominator
print(np.roots(np.array([1,-1.194,0.436])))
[-1. -1.]
[ 0.597+0.28211877j  0.597-0.28211877j]

Plugging these values into the equation for $H(z)$, we get:

\begin{equation} H(z) = \displaystyle{\frac{(z+1)(z+1)}{(z-0.597-i0.282)(z-0.597+i0.282)}} \end{equation}

We can visualize the poles and zeros in a pole-zero plot:

In [14]:
circ = pl.Circle((0,0),1,color='k',ls='dashed',lw=2,fill=False) # unit circle object for reference
fig,ax = pl.subplots(1,1,figsize=(8,8))
ax.set(ylim=(-1.5,1.5),xlim=(-1.5,1.5),aspect='equal')
ax.add_artist(circ) # draw circle
ax.plot([-1.5,1.5],[0,0],'k:') # draw x axis
ax.plot([0,0],[-1.5,1.5],'k:') # draw y axis
ax.plot(-1,0,'ko',mfc='white',mec='black',ms=12,mew=2) # plot zero
ax.plot(.597,.282,'kx',ms=12,mew=2) # plot pole
ax.plot(.597,-.282,'kx',ms=12,mew=2) # plot pole
sb.despine(ax=ax,trim=True)

As above, if we plug in $z=e^{i\omega}$, we get the Fourier transform. If we take the magnitude of the function when $z=e^{i\omega}$, we get the frequency response of the filter. The height of this curve is $|H(e^{i\omega})|$, with $\omega$ ranging from 0 to $2\pi$, rotation counterclockwise around the unit circle:

In [15]:
# using omega (radian frequency array) from above, plug in exp(iw) for z
Ho = (.0605*np.exp(2j*omega) + 0.121*np.exp(1j*omega) + 0.0605)/(np.exp(2j*omega - 1.194*np.exp(1j*omega) + 0.436))
fig,ax = pl.subplots(1,1,figsize=(15,5)) # plot it
ax.plot(omega,np.absolute(Ho),lw=3)
ax.set(xticks=[0,np.pi/2,np.pi,3*np.pi/2,2*np.pi],xticklabels=['0','$\pi/2$','$\pi$','$3\pi/2$','$2\pi$'],
       ylabel='$|H(e^{i\omega})|$',xlabel='$\omega$')
sb.despine(ax=ax,trim=True)

The general form of an IIR filter

The general form of an IIR filter is, in the time domain:

\begin{equation} y[n] = \displaystyle{\sum_{k=1}^N a[k]y[n-k] + \sum_{k=0}^M b[k]x[n-k]} \end{equation}

And if we take the $z$-transform and rearrange terms, we get:

\begin{equation} Y(z) = \displaystyle{\frac{\displaystyle{\sum_{k=0}^M} b[k]z^{-k}}{1-\displaystyle{\sum_{k=1}^N} a[k]z^{-k}}X(z)} \end{equation}

So, we can express the transfer function of an IIR filter as:

\begin{equation} H(z) = \frac{\displaystyle{\sum_{k=0}^M} b[k]z^{-k}}{1-\displaystyle{\sum_{k=1}^N} a[k]z^{-k}} \end{equation}

A few notes on IIR filters:

  • An IIR filter is stable if all the poles are inside the unit circle. On the other hand, an IIR filter is unstable if any poles are on or outside of the unit circle.

  • A filter is stable if its impulse response function is bounded, and unstable otherwise.

  • If there is a complex pole (i.e., a pole with a non-zero imaginary part), its complex conjugate will also be a pole. That is, complex poles show up in conjugate pairs. Recall that a complex conjugate of a complex number $z$ has the sign of the imaginary part flipped, e.g.: $a + jb$ and $a - jb$ are complex conjugates of one another.

  • In general, the phase response of an IIR filter is non-linear (cf. FIR filters).

As with the FIR filters, if you are designing a filter, you are figuring out where to put poles and zeros to get a particular frequency and phase response.

Pole-zero plots, frequency response, and impulse response functions

In order to see how poles and zeros affect the frequency response and impulse response functions, we can plot some sets of pole-zero plots, frequency responses, and impulse response functions.

First, we'll define a function that takes zeros, poles, a frequency array, a frequency response, and an impulse response and generates a pole-zero plot, a spectral magnitude plot, and a stem plot of the impulse response function:

In [16]:
def plot_filter(z,p,om,Hm,ir):
    fig = pl.figure(figsize=(18,4))
    gs = gridspec.GridSpec(1,5); ax1 = pl.subplot(gs[0,0]); ax2 = pl.subplot(gs[0,1:3]); ax3 = pl.subplot(gs[0,3:]);
    # create unit circle object
    circ = pl.Circle((0,0),1,color='k',ls='dashed',lw=2,fill=False)
    # make pole-zero plot in left panel
    ax1.set(ylim=(-1.5,1.5),xlim=(-1.5,1.5),xticks=[],yticks=[],aspect='equal');
    ax1.add_artist(circ); ax1.plot([-2,2],[0,0],'k:'); ax1.plot([0,0],[-2,2],'k:')
    [ax1.plot(zz.real,zz.