Polyphase Filter Banks for Embedded Sample Rate Changes in Digital Radio Front-Ends
Mehmood Awan, Yannick Le Moullec, Peter Koch, and Fred Harris
[Abstract] This paper presents efficient processing engines for software-defined radio (SDR) front-ends. These engines are based on a polyphase channelizer and perform arbitrary sample-rate changes, frequency selection, and bandwidth control. We present an M-path polyphase filter bank based on a modified N-path polyphase filter. Such a system allows resampling by arbitrary ratios while aliasing to baseband channels whose center frequencies lie in Nyquist zones that are not multiples of the output sample rate. This resampling technique is based on a sliding cyclic data load interacting with cyclically shifted coefficients. In a non-maximally decimated polyphase filter bank, where the number of samples per data load is not equal to the number M of subfilters, the M subfilters are processed in a time period that is shorter or longer than M input samples. A polyphase filter bank with five resampling modes is used as a case study for embedded resampling in SDR front-ends. These modes are (i) maximally decimated, (ii) under-decimated, (iii) over-decimated, and combined up- and down-sampling with (iv) a single stride length and (v) multiple stride lengths. These modes can be used to obtain any required rational sample-rate change in an SDR front-end based on a polyphase channelizer. They can also be used for translation to and from arbitrary center frequencies that are unrelated to the output sample rates.
[Keywords] SDR; digital front-ends; polyphase filter bank; embedded resampling
There are several generations of architectures for digital radio transceivers. A base station in a cellular mobile communication system is an example of a multichannel radio receiver that simultaneously down-converts and demodulates narrowband radio frequency (RF) channels , . The traditional heterodyne architecture, considered the first generation of digital radio architecture, is shown in Fig. 1(a) for an N-channel receiver. Each subreceiver consists of a dual-stage down-converter, and only the baseband processing is done in the digital domain . In the first stage, the RF signal is down-converted to a band-limited intermediate frequency (IF). In the second stage, the IF filter output is down-converted to baseband by a matched quadrature mixer and matched baseband filters that perform the final bandwidth control. The signal then passes into the digital domain, where the output of the analog-to-digital converter (ADC) is processed by digital signal processing (DSP) engines. These engines perform the required baseband processing: synchronization, equalization, demodulation, detection, and decoding. The problem with this architecture is gain and phase imbalance between the in-phase (I) and quadrature (Q) paths. The imbalance drifts as the analog components of the quadrature down-converter age and as temperature varies, and it causes cross-talk between the narrowband channels. Each imbalance-related spectral image must remain well below the desired spectral term, and this is difficult to sustain over time and at varying temperatures.
The need for extreme I/Q balance gave rise to the next generation of radios, in which the second-stage (IF) down-conversion and, consequently, the channelization process are digitized, as shown in Fig. 1(b). Digital conversion at IF provides greater control over the imbalance, because the imbalance is set by the number of bits used in the arithmetic. The precision of the coefficients used in the filtering process sets an upper limit for spectral artifacts of about -5 dB/bit, which means that a 12-bit ADC can achieve image levels below -60 dB . DSP-based complex down-conversion has two further advantages: the spectral images can be held below the quantization noise floor of the ADC involved in the conversion process, and the digital filters before and after the mixers can be designed to have linear phase characteristics . The second generation of radio, with a digital front-end, is a realizable version of SDR. The range of applications for the second-generation architecture shown in Fig. 1(b) is restricted to IF center frequencies of up to a few hundred megahertz because of the limited dynamic range of high-speed ADCs. The dynamic range is often extended by a hybrid scheme in which the initial complex down-conversion is performed by analog I/Q mixers and the channelization is performed digitally after the ADC. DSP techniques are then applied to the digitized I/Q data to correct the gain and phase offsets of the analog I/Q path .
A standard digital front-end, which performs frequency selection, bandwidth reduction, and sample rate reduction, is one of the most power- and time-critical functions of an SDR terminal. This is due to the large bandwidth and high dynamic range of the signal to be processed, which lead to high sample rates and large word lengths. High sample rates not only increase power consumption but also make the use of time-shared hardware infeasible . Multirate signal processing, however, offers ways of performing DSP tasks that are not normally available in traditional single-rate DSP designs. A multirate polyphase filter can perform the tasks of a multichannel receiver whose input signal is composed of many equal-bandwidth, equally spaced frequency-division-multiplexed (FDM) channels. These channels are digitally down-converted to baseband, band-limited by digital filters, and subjected to a sample rate reduction commensurate with the bandwidth reduction. This significantly reduces the system resources required for multichannel processing and, consequently, reduces costs , .
The remainder of this paper is organized as follows. In Section 2, we briefly introduce a polyphase channelizer and describe how it is formulated from a conventional channelizer. In Section 3, we categorize the embedded resampling cases described in  into five resampling cases: maximally decimated, under-decimated, over-decimated, and combined up- and down-sampled with single and multiple commutator stride lengths. In Section 4, we present MATLAB simulation results that demonstrate polyphase channelizers delivering the targeted output sample rates. Section 5 concludes the paper.
A multirate polyphase filter can perform the tasks of a multichannel receiver: down-conversion, filtering, and resampling of multiple narrowband signals . The step-by-step conversion of a standard single-channel demodulator into a multichannel polyphase channelizer is described in  and ; a brief summary is given here. In the standard single-channel demodulation process, shown in Fig. 2(a), the carrier-centered spectrum is translated to baseband, a filter reduces the bandwidth, and a resampler reduces the sample rate in proportion to the bandwidth reduction. Standard single-channel demodulation is described by
$$y(n,k) = \left[x(n)\,e^{-j\theta_k n}\right] * h(n) = \sum_{r} x(n-r)\,e^{-j\theta_k (n-r)}\,h(r) \tag{1}$$
where x(n) is the carrier-centered input signal, θk is the carrier angular frequency of the kth channel, h(n) is the baseband filter, and y(n, k) is the output baseband signal of the kth channel.
According to the Equivalency Theorem , down-conversion followed by baseband filtering can be reordered so that filtering at the carrier occurs first, followed by down-conversion, which is the opposite of the traditional channelization order. Fig. 2(b) shows this reordered operation, which is also described by
$$y(n,k) = \left[x(n) * h(n)\,e^{j\theta_k n}\right] e^{-j\theta_k n} = \left[\sum_{r} x(n-r)\,h(r)\,e^{j\theta_k r}\right] e^{-j\theta_k n} \tag{2}$$
To avoid down-converting samples that the resampler then discards, the heterodyne and the down-sampler are reordered so that only the retained samples are down-converted, as shown in Fig. 2(c). The frequency of the heterodyne at the reduced sample rate is then Mθk rad/sample rather than θk. If the center frequency θk is a multiple of the output sample rate 2π/M, that is, θk = k(2π/M), then the center frequency is aliased to 0 by the M-to-1 resampling. Under this condition, the down-sampled heterodyne reduces to unity and can be discarded, as shown in Fig. 2(d).
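Written out with the symbols defined for Fig. 2(a), the down-sampled form of the reordered demodulator makes this condition explicit. The following short derivation is reconstructed from the surrounding text rather than copied from the original figure:
$$y(nM,k) = e^{-j\theta_k nM}\sum_{r} x(nM-r)\,h(r)\,e^{j\theta_k r}, \qquad \theta_k = k\frac{2\pi}{M} \;\Rightarrow\; e^{-j\theta_k nM} = e^{-j2\pi kn} = 1,$$
$$y(nM,k) = \sum_{r} x(nM-r)\,h(r)\,e^{j\frac{2\pi}{M}kr}.$$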
The filter still computes an output for every input sample, and M-1 of every M computed outputs are discarded by the down-sampler. To eliminate this wasted work, the resampling and filtering operations are reordered so that one output is computed for every M input samples. This is achieved by applying the Noble identity , which states that a filter H(Z^M), whose taps are spaced M samples apart, followed by an M-to-1 down-sampler is equivalent to an M-to-1 down-sampler followed by the filter H(Z) processing every sample at the reduced rate. The original up-converted filter is partitioned into M subfilters that operate at the reduced output rate rather than the original input rate. The mathematical expressions in (4) map the filter's Z-transform at the input rate to a sum of Z-transforms at the output rate:
$$H(Z) = \sum_{r=0}^{M-1} Z^{-r} H_r(Z^M), \qquad H\!\left(Z e^{-j\theta_k}\right) = \sum_{r=0}^{M-1} Z^{-r}\, e^{j\frac{2\pi}{M} r k}\, H_r(Z^M) \tag{4}$$
where H(Ze^{-jθk}) is the Z-transform of the up-converted filter h(n)e^{jθk n}, and h_r(n) = h(r + nM) is the impulse response of the rth subfilter.
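The Noble-identity reordering can be checked numerically. The following minimal Python sketch (standard library only; the function names are hypothetical) compares filtering at the input rate followed by M-to-1 down-sampling with the partitioned form, in which subfilter r, with taps h(r + mM), runs at the output rate on the commutated samples x(nM - r - mM). Only the low-pass case (k = 0) is shown, so the phase rotators are left out.

```python
import random

def filter_then_downsample(x, h, M):
    """Full-rate FIR filtering, then keep every Mth output (n = 0, M, 2M, ...)."""
    y = []
    for n in range(0, len(x), M):
        acc = 0.0
        for r, hr in enumerate(h):
            if 0 <= n - r < len(x):
                acc += hr * x[n - r]
        y.append(acc)
    return y

def polyphase_downsample(x, h, M):
    """Partitioned form: subfilter r has taps h(r + mM) and sees x(nM - r - mM)."""
    subfilters = [h[r::M] for r in range(M)]
    y = []
    for n in range(0, len(x), M):
        acc = 0.0
        for r, taps in enumerate(subfilters):
            for m, coeff in enumerate(taps):
                idx = n - r - m * M
                if 0 <= idx < len(x):
                    acc += coeff * x[idx]
        y.append(acc)
    return y

random.seed(0)
M = 5
h = [random.uniform(-1, 1) for _ in range(30)]   # arbitrary 30-tap filter
x = [random.uniform(-1, 1) for _ in range(60)]
direct = filter_then_downsample(x, h, M)
poly = polyphase_downsample(x, h, M)
print(max(abs(a - b) for a, b in zip(direct, poly)))  # round-off level only
```

The two results differ only by floating-point summation order, because each subfilter simply collects every Mth tap of the original filter.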
The phase rotator $e^{j(2\pi/M)rk}$ of each subfilter is a constant for that subfilter. Fig. 2(e) shows the block diagram for (4). The output resampler is moved to the input side of each filter stage by applying the Noble identity, and the input delay elements $Z^{-r}$ and the resamplers at each stage are replaced by a rotary switch called a commutator.
In the final step of forming the polyphase filter bank, the sum formed by the phase rotators is recognized as one output port of a discrete Fourier transform (DFT). The DFT can be implemented as a fast Fourier transform (FFT) that extracts time samples of each narrowband process located at a multiple of the output sample rate (and aliased to baseband by the resampler) . This is shown in Fig. 2(f) and given by
$$y(nM,k) = \sum_{r=0}^{M-1} y_r(nM)\, e^{j\frac{2\pi}{M} r k} \tag{5}$$
where y_r(nM) is the output of the rth subfilter, which processes x(nM - r) at the reduced rate.
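A minimal sketch of the maximally decimated channelizer, assuming the conventions above: h(n) is the low-pass prototype, path r holds x(nM - r - mM), and the sum over paths uses the rotators e^{j(2π/M)rk}. It compares the polyphase output with a direct heterodyne, filter, and down-sample computation at every Mth sample. The helper names and random test data are illustrative, and a practical engine would use an FFT rather than the explicit sum.

```python
import cmath
import random

def direct_channel(x, h, M, k, n):
    """Single-channel demodulator: heterodyne by exp(-j*2*pi*k*m/M), then filter by h."""
    return sum(x[n - r] * cmath.exp(-2j * cmath.pi * k * (n - r) / M) * hr
               for r, hr in enumerate(h) if n - r >= 0)

def polyphase_channelizer(x, h, M, n):
    """M subfilters h(r + mM) evaluated once per M inputs, then an M-point rotator sum."""
    y_paths = []
    for r in range(M):
        acc = 0j
        for m, coeff in enumerate(h[r::M]):
            idx = n - r - m * M
            if idx >= 0:
                acc += coeff * x[idx]
        y_paths.append(acc)
    return [sum(y_paths[r] * cmath.exp(2j * cmath.pi * r * k / M) for r in range(M))
            for k in range(M)]

random.seed(1)
M = 5
h = [random.uniform(-1, 1) for _ in range(30)]
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
worst = 0.0
for n in range(0, len(x), M):          # one output per M inputs
    Y = polyphase_channelizer(x, h, M, n)
    for k in range(M):
        worst = max(worst, abs(Y[k] - direct_channel(x, h, M, k, n)))
print(worst)                            # round-off level only
```

Because the outputs are taken only at multiples of M, the residual heterodyne e^{-j2πkn/M} equals 1 and no extra phase correction is needed in this mode.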
The relationship between the sampling frequency, the channel spacing, and the number of channels for the polyphase channelizer is
$$f_s = N\,\Delta f \tag{6}$$
where fs is the input sampling frequency, N is the number of channels (the FFT size), which here equals M, and Δf is the inter-channel spacing .
The polyphase channelizer uses the input M-to-1 resampling to alias the spectral terms residing at multiples of the output sample rate to baseband. Consequently, for a standard polyphase channelizer that processes M input samples per output, the output sample rate equals the channel spacing. In this mode, the system is a maximally decimated filter bank. We experimented with polyphase filter banks using embedded resampling and present here the under-decimated, over-decimated, and combined up- and down-sampling modes (the last for single and multiple commutator stride lengths).
3 Non-Maximally Decimated Filter Bank
We have briefly presented the polyphase channelizer, in which the output sample rate equals the channel spacing. In practice, however, an output sample rate that differs from the channel spacing is often required. A straightforward way to uncouple the output sample rate from the channel spacing is to resample each channel output with a P/Q resampler ; by changing P and Q, the required sample rate can be obtained. An alternative is to embed the resampling process in (i) the polyphase commutator, that is, in the interaction between the input data registers and the polyphase coefficients, and (ii) the interaction between the polyphase outputs and the FFT input. This alternative requires only a state machine to schedule the interactions and adds no arithmetic cost.
Two schemes  for these interactions are (i) serpentine shifting of the input data, which interacts with a fixed set of coefficients, combined with circular buffering of the filtered data prior to the FFT, and (ii) a sliding cyclic data load that interacts with a cyclically shifted coefficient memory.
In the serpentine-shift and circular-buffering scheme, each input data set (whose length is not equal to M) is always fed to the same registers, and the polyphase subfilter coefficients are fixed. Consider a single tapped delay line in which all the data is moved to the right before the next input data set is loaded; the data is moved by a number of addresses equal to the length of the next input data set. When this one-dimensional tapped delay line is folded into the two-dimensional memory of the polyphase filter, the data move becomes a serpentine shift between the columns. Because the length of each loaded data set differs from M, the data time-origin moves with respect to the FFT time-origin. To keep the two origins aligned, the computed output of the polyphase filter is circularly shifted by the residue of the data time-origin address mod M before the FFT is performed.
In the scheme with a sliding cyclic data load and a cyclic shift of the coefficient memory, the data registers stay in place instead of being cycled, and the coefficient sets are rotated. The input commutator feeds the data as a sliding cyclic load to a fixed set of registers, and the subfilter coefficients are cyclically shifted by the same residue of the data time-origin address mod M before the FFT is performed. Considered subfilter by subfilter, the first scheme appears to require more read and write operations to synthesize the serpentine data shift. In practice, however, this shift is achieved by circular wrapping of block memory, which is purely an address-control task. In the second scheme, only the subfilters being loaded undergo a data shift.
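The equivalence of the two schemes can be illustrated with a small simulation. In this sketch (hypothetical function names, standard library only), the serpentine scheme is modeled as a folded delay line whose row j always holds the samples j, j + M, ... positions behind the newest one, so its coefficients stay fixed. The sliding scheme writes each sample into a fixed register row, walking down the rows mod M, and rotates the coefficient sets instead. Circularly shifting the serpentine outputs by the row that holds the newest sample (the data time-origin) reproduces the sliding-scheme outputs. The starting row R((D-1) mod M) is an assumption chosen so that the first load ends at R0.

```python
import random

def serpentine_outputs(history, h, M):
    """Scheme (i): row j holds x(n - j - cM) and always uses fixed taps h(j + cM)."""
    n = len(history) - 1                       # index of the newest sample
    out = []
    for j in range(M):
        acc = 0.0
        for c, coeff in enumerate(h[j::M]):
            idx = n - j - c * M
            if idx >= 0:
                acc += coeff * history[idx]
        out.append(acc)
    return out

def sliding_outputs(x, h, M, D, num_outputs):
    """Scheme (ii): fixed register rows fed by a sliding cyclic load; coefficients rotate."""
    K = len(h) // M                            # taps per subfilter
    regs = [[0.0] * K for _ in range(M)]
    ptr, m, results = (D - 1) % M, 0, []
    for _ in range(num_outputs):
        for _ in range(D):                     # one stride of D new samples
            regs[ptr] = [x[m]] + regs[ptr][:-1]    # push this row's delay line right
            origin = ptr                       # row holding the newest sample
            ptr, m = (ptr - 1) % M, m + 1      # commutator moves down one row, mod M
        # Row i uses coefficient set C_j with j = (i - origin) mod M.
        u = [sum(c * v for c, v in zip(h[(i - origin) % M::M], regs[i])) for i in range(M)]
        results.append((origin, u))
    return results

random.seed(2)
M, D = 5, 3
h = [random.uniform(-1, 1) for _ in range(30)]
x = [random.uniform(-1, 1) for _ in range(200)]
worst = 0.0
for s, (origin, u) in enumerate(sliding_outputs(x, h, M, D, 40)):
    n = (s + 1) * D - 1                        # newest sample index after s + 1 strides
    v = serpentine_outputs(x[: n + 1], h, M)
    shifted = [v[(i - origin) % M] for i in range(M)]  # circular shift by the origin
    worst = max(worst, max(abs(a - b) for a, b in zip(u, shifted)))
print(worst)
```

With D = 3, the origin row advances by 2 mod 5 per output (rows 0, 2, 4, 1, 3), which is the residue that the circular shift or the coefficient rotation has to absorb.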
To demonstrate the embedded resampling, we describe the example shown in Fig. 3. The system has 5 channels whose center frequencies are separated by 6 MHz. Each channel has a 2.5 MHz symbol rate shaped by a square-root Nyquist filter with 20% excess bandwidth, and the channels together form an FDM signal sampled at 30 MHz. To satisfy the Nyquist criterion, the output sample rate must be greater than the occupied channel bandwidth. The occupied bandwidth of 3 MHz (the symbol rate plus the 20% excess bandwidth) is chosen to be smaller than the channel spacing of 6 MHz, which allows down-sampling by large factors within the channelizer. The down-sampling channelizer uses a 30-tap prototype low-pass filter with approximately 60 dB side-lobe attenuation, partitioned into a 5-path polyphase filter with 6-tap subfilters. Both the data bank and the filter coefficient bank are two-dimensional memories of 5 rows and 6 columns, and each row corresponds to a subfilter. According to (6), the output sample rate of the maximally decimated system is 30/5 = 6 MHz, which equals the channel spacing. Four other resampling factors are also introduced, two of which have an embedded up-sampling factor of two. Together, the five resampling factors are 5, 3, 6, 5/2, and 15/2, delivering output sample rates of 6, 10, 5, 12, and 4 MHz, respectively. These correspond to the maximally decimated, under-decimated, over-decimated, and combined up- and down-sampling cases. Each case is presented for the scheme with a sliding cyclic data load and a cyclic shift of the coefficient memory.
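The arithmetic of the example can be tabulated with a short sketch. The mode list restates the five factors above as an up-sampling factor P and a commutator stride D, so the overall factor is D/P. The state count uses lcm(M, P)/gcd(lcm(M, P), D). This closed form is our own inference from Cases 1 to 5 below, not a formula given in the text, and it assumes that the load and coefficient pattern repeats once the newest-sample index has advanced by a multiple of lcm(M, P).

```python
from fractions import Fraction
from math import gcd

FS_MHZ = 30     # input sample rate of the example (MHz)
M = 5           # paths = channels = FFT size

# (mode, up-sampling factor P, commutator stride D); overall factor is D / P.
MODES = [
    ("maximally decimated", 1, 5),
    ("under-decimated", 1, 3),
    ("over-decimated", 1, 6),
    ("up/down, single stride", 2, 5),
    ("up/down, multiple strides", 2, 15),
]

def mode_summary(P, D, fs=FS_MHZ, M=M):
    """Return (output rate in MHz, number of state-machine states)."""
    rate = Fraction(fs) * P / D
    period = M * P // gcd(M, P)          # lcm(M, P)
    states = period // gcd(period, D)    # strides until loads and coefficients repeat
    return rate, states

channel_spacing = Fraction(FS_MHZ, M)    # (6): fs = N * delta_f, so 6 MHz here
for name, P, D in MODES:
    rate, states = mode_summary(P, D)
    print(f"{name:26s} factor {str(Fraction(D, P)):>4}  {str(rate):>2} MHz  {states} state(s)")
```

Exact fractions are used so that rates such as 30 · 2/15 = 4 MHz come out without rounding.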
Case 1: Maximally-Decimated Mode
In the maximally decimated system, data is loaded in stride lengths of 5 mod 5, and each computed output sample corresponds to 5-to-1 down-sampling. Fig. 4(a) shows the data-loading process for two outputs. The subfilters' data registers and coefficients are denoted R and C, respectively. In every data load, loading starts at subfilter R4 and proceeds to R0, and the loaded subfilter's tapped delay line is pushed one tap to the right before a new data element is loaded. For every computed output, all the subfilters are fed with input data. Because there is no residue (no non-loaded subfilter), there is no offset between the data time-origin and the FFT time-origin, so the subfilter coefficients are fixed from C0 to C4. The state machine therefore has a single state, in which the 5-point data load of the register bank always forms inner products with the fixed set of subfilter coefficients. Table 1 shows the register-loading sequence and the corresponding subfilter coefficients.
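A minimal sketch of the Case 1 state machine, assuming that the commutator walks down the register rows mod 5 and that the first load ends at R0. The function name and output format are illustrative and do not reproduce the layout of Table 1. With a stride of 5, there is a single state and the coefficient map is the identity.

```python
from math import gcd

def load_schedule(D, M=5):
    """Per state: registers written (commutator order) and the set C_j used by R0..R(M-1)."""
    ptr, table = (D - 1) % M, []
    for _ in range(M // gcd(M, D)):          # number of distinct states
        loads = []
        for _ in range(D):
            loads.append(ptr)
            ptr = (ptr - 1) % M
        origin = loads[-1]                   # register holding the newest sample
        coeffs = [(i - origin) % M for i in range(M)]
        table.append((loads, coeffs))
    return table

for state, (loads, coeffs) in enumerate(load_schedule(5)):
    print(state, ["R%d" % r for r in loads], ["C%d" % c for c in coeffs])
# 0 ['R4', 'R3', 'R2', 'R1', 'R0'] ['C0', 'C1', 'C2', 'C3', 'C4']
```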
Case 2: Under-Decimated Mode
In the under-decimated (non-maximally decimated) system, data is loaded in stride lengths of 3 mod 5, and each computed output sample corresponds to 3-to-1 down-sampling within a 5-stage polyphase filter. The least common multiple (LCM) of 3 and 5 is 15, which means that the state engine cycles every 15 inputs; because 3 samples are delivered at a time, the state machine must have 15/3 = 5 distinct states. Fig. 4(b) shows the data-loading processes for the first two states.
In the first state, data loading starts at subfilter R2 and proceeds to R0; in the second state, it starts at R4 and proceeds to R2; and so on for the five distinct states. Consequently, there is a residue of 2 for each data-loading operation. To align the data time-origin with the FFT time-origin, the subfilter coefficients are cyclically shifted by the residue address of the data time-origin mod 5. The time-origin that is being cyclically shifted is also periodic in the LCM of 3 and 5. Thus, the cyclic shift of the polyphase subfilter coefficients has the same period as the data-register load and is controlled by the same state machine. Each load always goes to the next 3 registers, indexed mod 5, so the next register to accept data when moving from state 0 to state 1 is R((0-1) mod 5), that is, R4. Similarly, the filter coefficients assigned to the inner products with the registers are always offset by 3 mod 5 relative to the previous filter set. Table 2 shows the state machine for the register loading and the coefficients of each corresponding subfilter for 3-to-1 down-sampling in a 5-stage polyphase filter.
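For the under-decimated case, the same bookkeeping can be written in closed form. The sketch below (hypothetical names; same assumed commutator direction and starting register) derives the state count from the LCM, the first register of each state, and the coefficient set C_j for each register. This makes it possible to check directly the residue of 2 in the data time-origin and the coefficient offset of 3 mod 5 between states.

```python
from math import gcd

def state_table(D, M=5):
    """Closed-form states for a sliding cyclic load of D samples into M register rows."""
    lcm = D * M // gcd(D, M)                         # inputs per full cycle
    table = []
    for s in range(lcm // D):
        start = (D - 1 - s * D) % M                  # first register written in state s
        loads = [(start - t) % M for t in range(D)]  # commutator walks down, mod M
        coeffs = [(i + s * D) % M for i in range(M)] # C index used by register Ri
        table.append((loads, coeffs))
    return lcm, table

lcm, table = state_table(3)
for s, (loads, coeffs) in enumerate(table):
    print(s, ["R%d" % r for r in loads], ["C%d" % c for c in coeffs])
```

The last register of each load (R0, R2, R4, R1, R3) is the data time-origin; it advances by 2 mod 5 per state, while the coefficient index of every register advances by 3 mod 5.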
Case 3: Over-Decimated Mode
In the over-decimated system, data is loaded in stride lengths of 6 mod 5, and each computed output sample corresponds to 6-to-1 down-sampling within a 5-stage polyphase filter. The LCM of 6 and 5 is 30, which means that the state engine cycles every 30 inputs; because 6 samples are delivered at a time, the state machine must have 30/6 = 5 distinct states. Fig. 4(c) shows the data-loading processes for the first two states.
In the first data load, loading starts at subfilter R0 and then proceeds from R4 down to R0, so R0 receives two samples; the second load proceeds from R4 down to R0 and then to R4 again; and so on for the 5 distinct states. Consequently, there is a residue of 4 for each data-loading operation. To align the data time-origin with the FFT time-origin, the subfilter coefficients are cyclically shifted by the residue address of the data time-origin mod 5. Table 3 shows the state machine for the register loading and the corresponding coefficients for 6-to-1 down-sampling in a 5-stage polyphase filter.
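To check that the rotated coefficient sets really deliver the channel outputs, the following sketch runs the over-decimated engine end to end and compares its five DFT outputs with a direct heterodyne-and-filter computation at the newest-sample times. It is a standard-library illustration with random data and a random 30-tap filter, not the authors' MATLAB model. The DFT is taken over the fixed register index, which absorbs the heterodyne. The comparison includes a constant phase e^{j2πk c0/M}, where c0 = (D-1) mod M is the register that receives the first sample; under the assumed starting register, this phase equals 1 for D = 6.

```python
import cmath
import random

def sliding_channelizer(x, h, M, D, num_outputs):
    """Sliding cyclic load + rotated coefficient sets + M-point DFT over the fixed
    register index. Yields (n, [Y_0, ..., Y_{M-1}]) with n the newest sample index."""
    K = len(h) // M
    regs = [[0j] * K for _ in range(M)]
    ptr, m = (D - 1) % M, 0
    for _ in range(num_outputs):
        for _ in range(D):
            regs[ptr] = [x[m]] + regs[ptr][:-1]
            origin, ptr, m = ptr, (ptr - 1) % M, m + 1
        u = [sum(c * v for c, v in zip(h[(i - origin) % M::M], regs[i])) for i in range(M)]
        Y = [sum(u[i] * cmath.exp(2j * cmath.pi * k * i / M) for i in range(M))
             for k in range(M)]
        yield m - 1, Y

def direct_baseband(x, h, M, k, n):
    """Reference: heterodyne by exp(-j*2*pi*k*m/M), filter with h, evaluate at time n."""
    return sum(x[n - r] * cmath.exp(-2j * cmath.pi * k * (n - r) / M) * hr
               for r, hr in enumerate(h) if n - r >= 0)

random.seed(3)
M, D = 5, 6                                   # over-decimated: 6-to-1 in a 5-path filter
h = [random.uniform(-1, 1) for _ in range(30)]
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(400)]
c0 = (D - 1) % M                              # register that receives the first sample
worst = 0.0
for n, Y in sliding_channelizer(x, h, M, D, 60):
    for k in range(M):
        ref = direct_baseband(x, h, M, k, n) * cmath.exp(2j * cmath.pi * k * c0 / M)
        worst = max(worst, abs(Y[k] - ref))
print(worst)                                  # round-off level only
```

Changing D to 3 or 5 in the same sketch exercises the under-decimated and maximally decimated schedules, where c0 is no longer 0 and the constant phase is no longer 1.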
Case 4: Combined Up- and Down-Sampling Mode (Single Stride)
In the previous three cases, down-sampling was performed by different integer factors. The polyphase filter can also embed an up-sampling factor together with the down-sampling so that the sample rate change is rational. In this case, we up-sample by a factor of 2 and then down-sample by a factor of 5 to obtain a 5/2 sample rate change. The up-sampling, which corresponds to zero-packing the input data, is actually achieved by data-load addressing in which one address is skipped after each sample, realizing 1-to-2 up-sampling. Down-sampling is performed by cyclically loading the (zero-packed) data through the filter in strides of length 5. Fig. 4(d) shows the two data-loading cycles for a 5/2 sample rate change in a 5-path polyphase filter.
In the first data load, 3 data samples are delivered across the 5 register addresses; in the second load, 2 data samples are delivered across the 5 register addresses. The data-loading process is periodic in 2 load cycles, so 2 states are needed to control it. Table 4 lists the data-loading process for the 2 states and the corresponding coefficient sets. Over the 2 states, 5 inputs are delivered and 2 outputs are taken from the polyphase engine, realizing the desired 5/2 embedded resampling. The loading scheme has a constant offset of -2 mod 5 within a sequence and also in the transition between sequences; this -2 offset results from the 1-to-2 up-sampling, represented by the zero packing.
A 5-stage polyphase filter normally has 5 subfilters in its polyphase partition. Because of the 1-to-2 up-sampling implemented by the zero-packing, only half the coefficients in each stage actually contribute to the subfilter output . Each stage is therefore further partitioned into 2 subsets of coefficients, which results in 10 subfilter coefficient sets. These sets are denoted C0, C1, ..., C9, where the integer is the starting index in the original non-partitioned prototype filter. From one register row to the next, the filter index increments by 6 mod 10, and between the states, it increments by 5 mod 10. The integer 6 is the offset between the data samples of the zero-packed load in two adjacent rows. Because of the up-sampling by a factor of 2, the prototype filter has to be designed to operate at 2 × fs, that is, 60 MHz. Consequently, the filter becomes twice as long as in the standard design (60 taps instead of 30). However, because only half of it is used per processing cycle, there is no processing penalty .
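The Case 4 schedule can be reproduced under the same assumptions, with zero-packing modeled as skipped addresses: zero-packed position m maps to register (D - 1 - m) mod 5, and only even positions carry real samples. For each register, the sketch finds how many zero-packed positions its newest real sample lies behind the newest position overall. That offset j selects the coefficient set C_j = {h(j + 10c)}. The function name is hypothetical, and the sketch assumes gcd(P, M) = 1, as in the example.

```python
def upsampled_schedule(P, D, M=5, num_states=2):
    """Per state: registers that receive real samples, and the set C_j used by R0..R(M-1)."""
    c0 = (D - 1) % M
    table = []
    for s in range(num_states):
        first, n = s * D, s * D + D - 1          # zero-packed positions of this stride
        loads = [(c0 - m) % M for m in range(first, n + 1) if m % P == 0]
        coeffs = []
        for i in range(M):
            # Offset d of register i's newest real sample behind position n (0 <= d < P*M).
            j = next(d for d in range(P * M)
                     if (c0 - (n - d)) % M == i and (n - d) % P == 0)
            coeffs.append(j)
        table.append((loads, coeffs))
    return table

for s, (loads, coeffs) in enumerate(upsampled_schedule(2, 5)):
    print(s, ["R%d" % r for r in loads], ["C%d" % c for c in coeffs])
# 0 ['R4', 'R2', 'R0'] ['C0', 'C6', 'C2', 'C8', 'C4']
# 1 ['R3', 'R1'] ['C5', 'C1', 'C7', 'C3', 'C9']
```

Reading the coefficient lists from R0 to R4 shows the increment of 6 mod 10 between rows, and each register's index advances by 5 mod 10 between the two states. Calling upsampled_schedule(2, 15) gives 8 and 7 real samples per state with the same two coefficient maps.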
Case 5: Combined Up- and Down-Sampling Mode (Multiple Strides)
This case is similar to Case 4, but the data is down-sampled by a factor of 15 to obtain a 15/2 sample rate change. Up-sampling is again performed by data-load addressing that skips every other address, and down-sampling is performed by cyclically loading the data through the filter in strides of length 15. Fig. 4(e) shows the two states of the loading cycle for the 15/2 sample rate change in a 5-path polyphase filter.
In the first data load, 8 data samples are delivered across the 5 register addresses; in the second load, 7 data samples are delivered. The data-loading process is periodic in 2 load cycles, so 2 states are needed to control it. Table 5 lists the data-loading process for the 2 states and the corresponding coefficient sets. In this process, 15-to-2 down-sampling is performed in a 5-path polyphase filter. The filter down-converts the spectral regions at multiples of fs/5 (30/5 = 6 MHz) and delivers an output sample rate of fs(2/15) (60/15 = 4 MHz).
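Finally, the following sketch runs the combined mode end to end under the same assumptions. Each register row stores only real samples (6 per row), the zero addresses are skipped, and the coefficient set C_j (taps h(j + 10c) of a 60-tap prototype) is chosen according to how far the row's newest real sample lies behind the newest zero-packed position. The reference zero-packs the input, heterodynes at the doubled rate by e^{-j2πkm/5}, and filters. The prototype is random here because only the bookkeeping is being checked. Tracking each row's newest position stands in for the precomputed Table 5 state machine, and the function names are hypothetical.

```python
import cmath
import random

def up_down_channelizer(x, h, M, P, D, num_outputs):
    """1-to-P up-sampling by skipped addresses, then D-sample strides through M rows."""
    K = len(h) // (P * M)                  # real samples stored per register row
    c0 = (D - 1) % M
    regs = [[0j] * K for _ in range(M)]
    newest = [None] * M                    # zero-packed position of each row's newest real sample
    out, m = [], 0
    for _ in range(num_outputs):
        for _ in range(D):
            if m % P == 0:                 # real sample; odd positions are skipped zeros
                i = (c0 - m) % M
                regs[i] = [x[m // P]] + regs[i][:-1]
                newest[i] = m
            m += 1
        n = m - 1                          # newest zero-packed position
        u = [0j if newest[i] is None else
             sum(c * v for c, v in zip(h[n - newest[i]::P * M], regs[i]))
             for i in range(M)]
        Y = [sum(u[i] * cmath.exp(2j * cmath.pi * k * i / M) for i in range(M))
             for k in range(M)]
        out.append((n, Y))
    return out

def direct_zero_packed(x, h, M, P, k, n):
    """Reference: zero-pack by P, heterodyne by exp(-j*2*pi*k*m/M), filter with h."""
    z = [0j] * (len(x) * P)
    z[::P] = x
    return sum(z[n - r] * cmath.exp(-2j * cmath.pi * k * (n - r) / M) * hr
               for r, hr in enumerate(h) if n - r >= 0)

random.seed(4)
M, P, D = 5, 2, 15                         # 15/2 sample-rate change in a 5-path filter
h = [random.uniform(-1, 1) for _ in range(60)]
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(300)]
c0 = (D - 1) % M
worst = 0.0
for n, Y in up_down_channelizer(x, h, M, P, D, 30):
    for k in range(M):
        ref = direct_zero_packed(x, h, M, P, k, n) * cmath.exp(2j * cmath.pi * k * c0 / M)
        worst = max(worst, abs(Y[k] - ref))
print(worst)                               # round-off level only
```

One output is produced per 15 zero-packed positions, that is, 60/15 = 4 outputs per microsecond when the input is sampled at 30 MHz.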
MATLAB simulations demonstrate the embedded sample rate changes in a 5-path polyphase filter and DFT operating as a 5-channel channelizer. The FDM input signal has 5 channels, each 16-QAM modulated, with center frequencies separated by 6 MHz. The sample rate is 30 MHz, and each channel has a 2.5 MHz symbol rate shaped by a square-root Nyquist filter with 20% excess bandwidth. Three of the five channels, occupied by 3 MHz bandwidth signals, are centered at 0, 6, and 12 MHz; the remaining two channels, centered at -12 and -6 MHz, are intentionally kept empty. Fig. 5(a) shows the spectrum of the 5-channel input signal sampled at 30 MHz.
In the maximally decimated mode, the input data is channelized and down-sampled 5-to-1 for an output rate of 6 MHz. Each of the 5 polyphase filter stages is 6 taps long and is anchored to one of the 5 input registers fed by the periodic input commutator. Fig. 5(b) shows the spectra of the 5 output channels at the 6 MHz output rate. In the under-decimated mode, the same input data is channelized and down-sampled 3-to-1 for an output rate of 10 MHz; Fig. 5(c) shows the spectra of the 5 output channels at the 10 MHz output rate. Similarly, in the over-decimated mode, the same input data is channelized and down-sampled 6-to-1 for an output rate of 5 MHz; Fig. 5(d) shows the spectra of the 5 output channels at the 5 MHz output rate.
In the combined up- and down-sampling modes, the input is up-sampled by a factor of 2, channelized, and down-sampled 5-to-1 and 15-to-1 for output rates of 12 MHz and 4 MHz, respectively. Because of the up-sampling by a factor of 2, there are 10 polyphase filter coefficient sets, each with 6 taps. The filter coefficients are periodically rotated through the 5 input registers (which are fed by a periodic sliding input commutator) according to the state machines described in Table 4 and Table 5. Processing the up-sampled data in the polyphase filter reorders the spectral locations of the channels , . The 5-point FFT delivers the channel frequencies in the order [0, 2, 4, 1, 3], that is, with an index stride of 2 mod 5, and these outputs are reordered back to their natural order. Figs. 5(e) and 5(f) show the spectra of the 5 output channels at 12 MHz and 4 MHz, which correspond to the 5/2 and 15/2 sample rate changes, respectively.
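The [0, 2, 4, 1, 3] ordering follows from where the 5-point FFT bins land once the 60 MHz zero-packed spectrum is folded back onto the 30 MHz input band. The small sketch below assumes that bin k is centered at k · 60/5 = 12k MHz and that zero-packing images repeat every 30 MHz. Channel indices are taken mod 5, so index 3 is the -12 MHz channel and index 4 is the -6 MHz channel.

```python
FS_IN, P, M = 30, 2, 5          # input rate (MHz), up-sampling factor, FFT size
FS_UP = FS_IN * P               # 60 MHz rate at which the 5-path filter runs
SPACING = FS_IN // M            # 6 MHz channel spacing at the input

def bin_to_channel(k):
    """Index (mod M) of the 6 MHz input channel that appears in FFT bin k."""
    f_bin = k * FS_UP // M      # bin centre at the high rate: 12k MHz
    f_alias = f_bin % FS_IN     # zero-packing images repeat every 30 MHz
    return (f_alias // SPACING) % M

order = [bin_to_channel(k) for k in range(M)]
print(order)                                  # [0, 2, 4, 1, 3]
natural = [order.index(c) for c in range(M)]  # FFT bin that carries channel c
print(natural)                                # [0, 3, 1, 4, 2]
```

Since 2 · 3 ≡ 1 mod 5, the inverse permutation is a stride of 3 mod 5, which is one simple way to restore the natural order.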
The simulations show that embedded sample rate changes can be implemented successfully in a polyphase channelizer. All the output channels have about 60 dB of spectral side-lobe attenuation, as set by the prototype low-pass filter. The processing engines used in all 5 cases are identical except for their state machines, register-loading schemes, and subfilter coefficient sets.
In this paper, we have shown the versatility of a polyphase engine that performs embedded resampling uncoupled from frequency selection and bandwidth control. Five embedded resampling modes in polyphase filter banks have been presented: maximally decimated, under-decimated, over-decimated, and combined up- and down-sampling with single and multiple stride lengths. These correspond to commutator stride lengths that are equal to, shorter than, and longer than the number of filter paths, and to strides combined with embedded up-sampling. These modes can be used to obtain any required rational sample-rate change in an SDR front-end based on a polyphase channelizer, which makes them useful for designing flexible, resource-optimal architectures for advanced software radios. In a subsequent paper, “Hardware Architecture Analysis of Polyphase Filter Banks Performing Embedded Resampling for Software Defined Radio Front-Ends” , we analyze FPGA-based hardware architectures of these resampling engines in terms of area, time, and power tradeoffs.
 A. M. Badda and M. Donati, “The software defined radio technique applied to the RF front-end for cellular mobile systems,” in Software Radio Technologies and Services, E. Del Re, Ed., Berlin, Germany: Springer-Verlag, 2001.
f. harris, C. Dick, and M. Rice, “Digital receivers and transmitters using polyphase filter banks for wireless communications,” IEEE Trans. Microw. Theory Tech., vol. 51, no. 4, pp. 1395-1412, 2003.
f. harris, Multirate Signal Processing for Communication Systems. New York: Prentice Hall, 2006.
T. Hentschel, M. Henker, and G. Fettweis, “The digital front-end of software radio terminals,” IEEE Personal Commun., vol. 6, no. 4, pp. 40-46, Aug. 1999.
f. harris and C. Dick, “Performing simultaneous arbitrary spectral translation and sample rate change in polyphase interpolating or decimating filters in transmitters and receivers,” in Proc. Software Defined Radio Tech. Conf. and Product Expo, San Diego, CA, Nov. 2002.
M. Awan, Y. Le Moullec, P. Koch, and f. harris, “Hardware architecture analysis of polyphase filter banks performing embedded resampling for software-defined radio front-ends,” to appear in Special Issue on Digital Front-End and Software Radio Frequency, ZTE Communications, Mar. 2012.
Mehmood Awan (email@example.com) received his MSc degree in electronic engineering with specialization in applied signal processing and implementation from Aalborg University in 2007. He was a research assistant for one year and started his Ph.D. in resource-optimal SDR front-ends in 2008. His research interests include multirate signal processing, SDR, hardware architectures, and embedded systems.
Yannick Le Moullec (firstname.lastname@example.org) received his Ph.D. degree in electrical engineering from Université de Bretagne Sud, Lorient, France, in 2003. From 2003 to 2005, he was a post-doctoral fellow at the Center for Embedded Software Systems, Aalborg University, Denmark. From 2005 to 2008, he was an assistant professor at the Department of Electronic Systems, Aalborg University, where he is now an associate professor. His research interests include methods and tools for HW/SW co-design, embedded systems, and reconfigurable computing.
Peter Koch (email@example.com) received his M.Sc. and Ph.D. degrees in electrical engineering from Aalborg University, Denmark, in 1989 and 1996, respectively. Since 1997, he has been an associate professor at the Department of Electronic Systems, Aalborg University, working in the interdisciplinary field between DSP and resource-optimal real-time architectures. From 2006 to 2010, he headed the Center for Software Defined Radio, Aalborg University. His research interests include optimization between DSP algorithms and architectures, and low-energy HW/SW design.
Fred Harris (firstname.lastname@example.org) holds the Signal Processing Chair of the Communication Systems and Signal Processing Institute at San Diego State University, where he teaches DSP and communication systems. He holds 20 patents for digital receivers, and he lectures around the world on DSP applications. He is an adjunct of the Princeton IDA-CCR Center for Communications Research and is the author of “Multirate Signal Processing for Communication Systems.”