

# COM-7003SOFT Turbo code encoder/decoder VHDL source code overview / IP core

### Overview

The COM-7003SOFT is an error correction turbocode encoder/decoder written in generic VHDL.

The entire VHDL source code is deliverable.

#### Key features and performance:

- Flexible dynamic (i.e. at runtime) user-selected configuration:
  - Block length up to 8000 bits
  - Puncturing patterns for rates 1/3,1/2,2/3,3/4,4/5,5/6,6/7,7/8
- Frame error rate examples:
  - 2032-bit frame, rate 1/3, 5-bit soft quantization, 15 iterations:
    FER = $10^{-2}$ @ $E_b/N_o$ = 1.4 dB; FER = $10^{-3}$ @ $E_b/N_o$ = 1.6 dB
  - 768-bit frame, rate 3/4, 5-bit soft quantization, 15 iterations:
    FER = $10^{-2}$ @ $E_b/N_o$ = 3.1 dB; FER = $10^{-3}$ @ $E_b/N_o$ = 3.5 dB
- Provided with IP core:
  - VHDL source code
  - Matlab .m file for simulating the encoding and decoding algorithms, for generating stimulus files for VHDL simulation and for end-to-end BER performance analysis at various signal to noise ratios
  - VHDL testbench
- Complies with ETSI EN 301 545-2, DVB-RCS2.

## Target Hardware

The code is written in generic standard VHDL so that it can be ported to a variety of FPGAs. The code was developed and tested on a Xilinx 7-series FPGA but is expected to work similarly on other targets.

# Configuration

### Synthesis-time configuration parameters

The following constants are user-defined in the decoder component generic section prior to synthesis. These parameters generally define the size of the decoder embodiment.

| Parameter | Configuration |
|-----------|---------------|
| <b>NQBITS</b><br>Number of soft-quantized bits at the decoder input | Typical value: 4. A minor performance improvement can be achieved with 5 bits. |
| <b>NADDRBITS</b><br>log2 of the maximum payload size in Bytes, rounded up | While the actual payload size is user-programmable at run-time, the maximum payload size is an important factor that affects the number of RAM blocks used in the FPGA.<br>For the maximum payload size of 1000 Bytes, set <b>NADDRBITS</b> = 10. |
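As a quick check of the NADDRBITS rule, the sketch below computes the rounded-up log2 of a maximum payload size. The function name `naddrbits` is hypothetical and only illustrates the arithmetic; it is not part of the delivered source code.

```python
def naddrbits(max_payload_bytes: int) -> int:
    """Return ceil(log2(max_payload_bytes)) using integer arithmetic only."""
    if max_payload_bytes < 1:
        raise ValueError("maximum payload size must be at least 1 Byte")
    # (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1.
    return (max_payload_bytes - 1).bit_length()


if __name__ == "__main__":
    # 1000 Bytes is the maximum payload size quoted in the table.
    print(naddrbits(1000))
```

For the 1000-Byte maximum this gives 10, matching the value recommended in the table.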

#### **Run-time configuration parameters**

The user can set and modify the following controls at run-time through the top level component interface:

| Parameter | Configuration |
|-----------|---------------|
| Frame size<br><b>BURST_PAYLOAD_SIZE</b> | Uncoded/decoded frame size, expressed in Bytes.<br>Valid range: 1 – 1000 Bytes.<br>Constraints:<br>- when using puncturing, <b>BURST_PAYLOAD_SIZE</b> \* 4 must be an integer multiple of the puncturing period.<br>- must NOT be an integer multiple of 15. |
| Encoding rate <b>R1/R2</b> | Valid values are 1/3, 1/2, 2/3, 3/4, 4/5, 5/6, 6/7, 7/8. |
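The run-time constraints can be checked in software before the controls are written to the codec. The following is a minimal sketch, not part of the IP core: the function name and the `puncturing_period` argument are hypothetical, and the "multiple of 15" rule is applied to BURST_PAYLOAD_SIZE as the table reads.

```python
VALID_RATES = {(1, 3), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)}


def check_runtime_config(payload_bytes, r1, r2, puncturing_period=None):
    """Return a list of constraint violations (empty when the settings are valid)."""
    problems = []
    if not 1 <= payload_bytes <= 1000:
        problems.append("BURST_PAYLOAD_SIZE must be in the range 1 to 1000 Bytes")
    if payload_bytes % 15 == 0:
        problems.append("BURST_PAYLOAD_SIZE must not be an integer multiple of 15")
    if (r1, r2) not in VALID_RATES:
        problems.append("R1/R2 is not one of the supported encoding rates")
    # The puncturing period comes from the detailed configuration; pass None
    # when no puncturing is used.
    if puncturing_period is not None and (payload_bytes * 4) % puncturing_period != 0:
        problems.append("BURST_PAYLOAD_SIZE*4 must be a multiple of the puncturing period")
    return problems


if __name__ == "__main__":
    print(check_runtime_config(1000, 3, 4))  # valid settings give an empty list
    print(check_runtime_config(30, 1, 3))    # 30 is a multiple of 15
```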

# I/Os

## General

#### CLK: input

The synchronous clock. The user must provide a global clock (use BUFG). The CLK timing period must be constrained in the .xdc file associated with the project.

### SYNC\_RESET: input

Synchronous reset. The reset MUST be exercised at least once to initialize the internal variables. It must be exercised whenever a control parameter is changed.

### **Encoder/Decoder controls**

Users can define the encoder and decoder controls with one of two possible levels of abstraction: simple and detailed.

The simplest form is described by the payload size **BURST\_PAYLOAD\_SIZE** and the code rate **R1/R2**, as described in the run-time configuration section above.

A more detailed configuration consists of several arcane parameters **BURST\_PAYLOAD\_SIZE**, P, **Q0**, Q1, Q2, Q3, Y\_PUNCTURING\_PERIOD, Y\_PUNCTURING\_PATTERN, W\_PUNCTURING\_PATTERN, defined in Table A-1 of [1].

To simplify operation, a VHDL component (TC\_CODEC\_CONFIG.VHD) and a Matlab table1.m program are provided to look up the optimum detailed configuration from just the payload size BURST\_PAYLOAD\_SIZE and the code rate R1/R2.

### Encoder

| TC_ENCODER_DVB_RCS2 signal group | Signals |
|----------------------------------|---------|
| Clock and reset (inputs) | CLK<br>SYNC_RESET |
| Input bits | DATA_IN(1:0) (input)<br>DATA_IN_VALID (input)<br>SOF_IN (input)<br>CTS_OUT (output) |
| Encoded output | DATA_OUT(1:0)<br>DATA_OUT_VALID |
| Controls (inputs) | BURST_PAYLOAD_SIZE(9:0)<br>R1(2:0)<br>R2(3:0)<br>P(6:0)<br>Q0(3:0)<br>Q1(3:0)<br>Q2(3:0)<br>Q3(2:0)<br>Y_PUNCTURING_PERIOD(4:0)<br>Y_PUNCTURING_PATTERN(27:0)<br>W_PUNCTURING_PATTERN |

**DATA\_IN(1:0)**: input. Data is read two bits at a time: A (bit 0) and B (bit 1).

**DATA\_IN\_VALID**: input. 1 CLK-wide pulse indicating that DATA\_IN is valid.

**SOF\_IN**: input. Start Of Frame. 1 CLK-wide pulse, aligned with **DATA\_IN\_VALID**. Note that there is no need for an end of frame as the input frame size is defined as a control parameter.

#### CTS\_OUT: output.

Clear-To-Send flow control. '1' indicates that the encoder is ready to accept another input dibit. IMPORTANT: relying on CTS\_OUT alone for flow control may not be sufficient because of the latency in stopping the flow. NEVER send the next SOF\_IN when CTS\_OUT = '0'. This implies that the sender must count the data symbols in a frame, stop at N, and wait at least 2 CLKs before checking CTS\_OUT again.
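The sender rule above can be modelled clock by clock. The sketch below is an illustrative software model only, not the VHDL interface: `schedule_frames`, its `cts` callback and the `max_clk` safety limit are hypothetical names, and it assumes the sender also pauses within a frame whenever CTS\_OUT is '0'.

```python
from typing import Callable, List, Sequence, Tuple


def schedule_frames(frames: Sequence[Sequence[int]],
                    cts: Callable[[int], bool],
                    guard_clks: int = 2,
                    max_clk: int = 1_000_000) -> List[Tuple[int, int, bool]]:
    """Return (clk, dibit, sof) events for a sender that obeys CTS_OUT.

    cts(clk) models the CTS_OUT level sampled at clock index clk.
    """
    events = []
    clk = 0
    for frame in frames:
        for index, dibit in enumerate(frame):
            # Hold off while CTS_OUT is low; for index 0 this also guarantees
            # that SOF_IN is never sent when CTS_OUT = '0'.
            while not cts(clk):
                clk += 1
                if clk > max_clk:
                    raise RuntimeError("CTS_OUT never asserted")
            events.append((clk, dibit, index == 0))
            clk += 1
        # All N symbols of the frame were sent: idle for at least 2 CLKs
        # before sampling CTS_OUT again for the next frame.
        clk += guard_clks
    return events


if __name__ == "__main__":
    for event in schedule_frames([[0, 1, 2], [3, 0, 1]], lambda clk: True):
        print(event)
```

With CTS\_OUT permanently high, the second frame in this model starts two idle clocks after the last dibit of the first frame.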

The encoder outputs mirror its inputs: DATA\_OUT(1:0), DATA\_OUT\_VALID, SOF\_OUT, EOF\_OUT, CTS\_IN.

[Timing diagram: encoder input interface, showing CLK, SYNC\_RESET, CTS\_OUT, DATA\_IN[1:0], DATA\_IN\_VALID and SOF\_IN.]

### Decoder

| TC_DECODER_DVB_RCS2 signal group | Signals |
|----------------------------------|---------|
| Clock and reset (inputs) | CLK<br>SYNC_RESET |
| Input samples | DATA_A_IN(NQBITS-1:0)<br>DATA_B_IN(NQBITS-1:0)<br>SAMPLE_CLK_IN<br>SOF_IN<br>EOF_IN<br>CTS_OUT |
| Decoded output | DATA_OUT(1:0)<br>SAMPLE_CLK_OUT<br>SOF_OUT<br>CTS_IN |
| Controls (inputs) | BURST_PAYLOAD_SIZE(NADDRBITS-1:0)<br>P(6:0)<br>Q0(3:0)<br>Q1(3:0)<br>Q2(3:0)<br>Q3(2:0)<br>Y_PUNCTURING_PERIOD(4:0)<br>Y_PUNCTURING_PATTERN(27:0)<br>W_PUNCTURING_PATTERN<br>N_ITER(3:0) |

**DATA\_A\_IN / DATA\_B\_IN**: Two soft-quantized input samples. The precision (**NQBITS**) is selectable at the time of synthesis. A 4-bit soft-quantization is considered a good trade-off between decoding performance and FPGA occupancy. A 5-bit soft-quantization may yield a minor performance improvement.

Usage: it is expected that the demodulator preceding this decoder will normalize the demodulated samples prior to soft-quantization by using an AGC loop. The AGC target level is important in maximizing the decoder BER performance.

**DATA\_IN\_VALID**: input. 1 CLK-wide pulse indicating that the input samples DATA\_A\_IN and DATA\_B\_IN are valid.

**SOF\_IN / EOF\_IN**: inputs. Start Of Frame and End Of Frame. 1 CLK-wide pulses, aligned with **DATA\_IN\_VALID**.

#### CTS\_OUT: output.

Clear-To-Send flow control. '1' indicates that the decoder is ready to accept another pair of input samples.

The decoder outputs mirror its inputs: DATA\_OUT(1:0), DATA\_OUT\_VALID, SOF\_OUT, CTS\_IN.

**N\_ITER(3:0)**: input. Number of decoder iterations. MUST be an odd number between 1 and 15. The more iterations, the lower the BER. However, the decoder latency is nearly proportional to the number of iterations. 7 is a good tradeoff between performance and latency.

# Performance

# **Encoder throughput**

The maximum encoder throughput is as follows:

- Encoded output: $2 \cdot f_{clk}$ bits/s
- Uncoded input: $2 \cdot f_{clk} \cdot R$ bits/s, where $R$ is the encoding rate and $f_{clk}$ is the FPGA clock frequency.
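As a worked example of the two throughput expressions, the sketch below evaluates them for a given clock and rate. The function name is hypothetical; the 212 MHz figure is the typical Artix-7 -2 encoder clock quoted in the clock speed table of this document, and rate 3/4 is chosen arbitrarily.

```python
from fractions import Fraction


def encoder_throughput_bps(f_clk_hz, r1, r2):
    """Return (encoded_output_bps, uncoded_input_bps) for clock f_clk_hz and rate r1/r2."""
    encoded = 2 * f_clk_hz                 # two encoded bits leave the encoder per clock
    uncoded = encoded * Fraction(r1, r2)   # scaled by the encoding rate R = R1/R2
    return encoded, uncoded


if __name__ == "__main__":
    encoded, uncoded = encoder_throughput_bps(212_000_000, 3, 4)
    print(f"encoded output: {float(encoded) / 1e6:.0f} Mbit/s")
    print(f"uncoded input:  {float(uncoded) / 1e6:.0f} Mbit/s")
```

For these inputs the expressions give 424 Mbit/s of encoded output and 318 Mbit/s of uncoded input.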

# **Decoder latency**

The decoder can only handle one frame at a time. The latency between input SOF and decoded output EOF is a function of **BURST\_PAYLOAD\_SIZE**, the coding rate **R1/R2** and the selected number of decoding iterations **N\_ITER**:

Latency (in processing clocks CLK) = (BURST\_PAYLOAD\_SIZE \* 4 + 25) \* (2 \* N\_ITER + 1/( R1/R2))

For example, in the case of a 1000 Bytes payload, rate 3/4 and 7 iterations, the latency is 61717 clocks (including 5333 clocks for encoded input samples, 4000 clocks for output decoded bits).
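The latency formula is easy to evaluate for other settings. The sketch below is a plain transcription of the formula; the function names are hypothetical, rounding up to a whole clock is an assumption, and the 155 MHz clock used for the time conversion is the typical Artix-7 -2 decoder figure quoted elsewhere in this document.

```python
import math
from fractions import Fraction


def decoder_latency_clks(burst_payload_size, n_iter, r1, r2):
    """Latency in CLK periods between input SOF and decoded output EOF."""
    if n_iter % 2 == 0 or not 1 <= n_iter <= 15:
        raise ValueError("N_ITER must be an odd number between 1 and 15")
    # 1/(R1/R2) is written as R2/R1 and kept exact with Fraction.
    exact = (burst_payload_size * 4 + 25) * (2 * n_iter + Fraction(r2, r1))
    return math.ceil(exact)  # assumption: round up to a whole clock


def latency_us(clks, f_clk_hz):
    """Convert a latency in clocks to microseconds."""
    return clks / f_clk_hz * 1e6


if __name__ == "__main__":
    clks = decoder_latency_clks(1000, 7, 3, 4)
    print(clks, "clocks,", round(latency_us(clks, 155e6), 1), "us at 155 MHz")
```

For the 1000-Byte, rate 3/4, 7-iteration case this reproduces the 61717 clocks quoted above.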

## Frame Error Rate

The decoded errors are somewhat bursty in nature, with many error-free decoded frames followed by an occasional erroneous frame with many bit errors. Therefore, we prefer to measure the decoder performance in terms of frame error rate (FER).

Frame error rate examples:

- 2032-bit frame, rate 1/3, 5-bit soft quantization, 15 iterations:
  FER = $10^{-2}$ @ $E_b/N_o$ = 1.4 dB; FER = $10^{-3}$ @ $E_b/N_o$ = 1.6 dB
- 768-bit frame, rate 3/4, 5-bit soft quantization, 15 iterations:
  FER = $10^{-2}$ @ $E_b/N_o$ = 3.1 dB; FER = $10^{-3}$ @ $E_b/N_o$ = 3.5 dB
- 472-bit frame, rate 1/2, 5-bit soft quantization, 15 iterations:
  FER = $10^{-2}$ @ $E_b/N_o$ = 1.9 dB; FER = $10^{-3}$ @ $E_b/N_o$ = 2.2 dB

# Software Licensing

The COM-7003SOFT is supplied under the following key licensing terms:

1. A nonexclusive, nontransferable license to use the VHDL source code internally, and
2. An unlimited, royalty-free, nonexclusive transferable license to make and use products incorporating the licensed materials, solely in bit stream format, on a worldwide basis.

The complete VHDL/IP Software License Agreement can be downloaded from http://www.comblock.com/download/softwarelicense.pdf

# **Configuration Management**

The current software revision is 3.

| Directory | Contents                                                                                                                                                                                                        |
|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| /doc      | Specifications, user manual, implementation documents                                                                                                                                                           |
| /src      | .vhd source code, .pkg packages, .xdc constraint files (Xilinx)<br>One component per file. |
| /sim      | VHDL test benches                                                                                                                                                                                               |
| /matlab   | Matlab .m file for simulating the encoding<br>and decoding algorithms, for generating<br>stimulus files for VHDL simulation and for<br>end-to-end BER performance analysis at<br>various signal to noise ratios |
| /bin      | .bit configuration files (for use with<br>ComBlock COM-1800 FPGA development<br>platform)                                                                                                                       |

Project files:

- Xilinx ISE 14 project file: com-7003.xise
- Xilinx Vivado v2015.2 project file: project\_1.xpr

# VHDL development environment

The VHDL software was developed using the following development environment:

(a) Xilinx ISE 14.7 for synthesis, place and route

(b) Xilinx Vivado 2015.2 for synthesis, place and route and VHDL simulation

The entire project fits easily within a Xilinx Artix7-100T. Therefore, the ISE project can be processed using the free Xilinx WebPack tools.

## **Device Utilization Summary**

The encoder size is fixed (not parameterized).

Device: Xilinx Artix7-100T

| Encoder        |     | % of Xilinx<br>Artix7-100T |
|----------------|-----|----------------------------|
| Registers      | 721 | 0.6%                       |
| LUTs           | 996 | 1.6%                       |
| Block RAM/FIFO | 3.5 | 2.6%                       |
| DSP48          | 1   | 0.4%                       |
| GCLKs          | 1   | 3.1%                       |

The decoder size depends essentially on two key parameters defined in the generic section of *tc\_decoder\_dvb\_rcs2.vhd*, namely:

- The maximum payload size defined by the constant NADDRBITS
- The number of soft-quantized bits at the decoder input **NQBITS**

| Decoder<br>4-bit soft-quantization<br>Frame size < 2048 bits |       | % of Xilinx<br>Artix7-100T |
|--------------------------------------------------------------|-------|----------------------------|
| Registers                                                    | 3558  | 2.8%                       |
| LUTs                                                         | 10652 | 16.8%                      |
| Block RAM/FIFO                                               | 15    | 11.1%                      |
| DSP48                                                        | 1     | 0.4%                       |
| GCLKs                                                        | 1     | 3.1%                       |

| Decoder<br>4-bit soft-quantization<br>Frame size ≤ 8000 bits |       | % of<br>Artix7-100T |
|--------------------------------------------------------------|-------|---------------------|
| Registers                                                    | 3591  | 2.8%                |
| LUTs                                                         | 10726 | 16.9%               |
| Block RAM/FIFO                                               | 39.5  | 29.3%               |
| DSP48                                                        | 1     | 0.4%                |
| GCLKs                                                        | 1     | 3.1%                |

### Clock and decoding speed

The entire design uses a single global clock CLK. Typical maximum clock frequencies for various FPGA families are listed below:

| Device family                     | Encoder | Decoder |
|-----------------------------------|---------|---------|
| Xilinx Artix 7 -2 speed<br>grade  | 212 MHz | 155 MHz |
| Xilinx Kintex-7 -2<br>speed grade | 294 MHz | 230 MHz |

# Ready-to-use Hardware

The COM-7003SOFT was developed on, and is therefore ready to use on, the following commercial off-the-shelf hardware platform:

### FPGA development platform

COM-1800 FPGA (XC7A100T) + ARM + DDR3 SODIMM socket + GbE LAN development platform

## Xilinx-specific code

The VHDL source code is written in generic VHDL and thus can be ported to FPGAs from various vendors. Neither Xilinx COREs nor Xilinx primitives are used.

# VHDL components overview

### Top level

- COM7003_TOP - Behavioral (src\com7003_top.vhd)
  - CLKGEN7C - CLKGEN7C_MMCM - xilinx (src\CLKGEN7C_MM…
  - CLKGEN7D - CLKGEN7D_MMCM - xilinx (src\CLKGEN7D_MM…
  - Inst_LFSR11P - LFSR11P - behavior (src\lfsr11p.vhd)
  - TC_CODEC_CONFIG_001 - TC_CODEC_CONFIG - Behavioral (…
    - TC_CODEC_TABLEA1_001 - TC_CODEC_TABLEA1 - Behavi…
  - TC_ENCODER_DVB_RCS2_001 - TC_ENCODER_DVB_RCS2 - Be…
    - ARITH_001 - ARITH - Behavioral (src\arith\arith.vhd)
    - INPUT_BUF_001 - BRAM_DP2 - Behavioral (src\bram_dp2.…
    - INPUT_BUF_002 - BRAM_DP2 - Behavioral (src\bram_dp2.…
    - OUTBUF_001 - BRAM_DP2 - Behavioral (src\bram_dp2.vh…
    - OUTBUF_002 - BRAM_DP2 - Behavioral (src\bram_dp2.vh…
  - TC_DECODER_DVB_RCS2_001 - TC_DECODER_DVB_RCS2 - …
    - PERMUTATION_TABLE_001 - PERMUTATION_TABLE - Beh…
    - Y_DEPUNCTURING_004 - BRAM_DP2 - Behavioral (src\bra…
    - INPUT_BUF_001 - BRAM_DP2 - Behavioral (src\bram_dp2.…
    - INPUT_BUF_003B - BRAM_DP2 - Behavioral (src\bram_dp…
    - INPUT_BUF_004 - BRAM_DP2 - Behavioral (src\bram_dp2.…
    - INPUT_BUF_005 - BRAM_DP2 - Behavioral (src\bram_dp2.…
    - INPUT_BUF_006 - BRAM_DP2 - Behavioral (src\bram_dp2.…
    - LLR_BUF_007 - BRAM_DP2 - Behavioral (src\bram_dp2.vh…
    - BM_GEN_001 - BM_GEN - Behavioral (src\bm_gen.vhd)
    - FORWARD_STATE_GEN_001 - FORWARD_STATE_GEN - Beh…
    - BACKWARD_STATE_GEN_001 - BACKWARD_STATE_GEN - …
    - BSM_BUF_001 - BRAM_DP2 - Behavioral (src\bram_dp2.vh…
    - BSM_BUF_001x - BRAM_DP2 - Behavioral (src\bram_dp2.…
    - LLR_GEN_001 - LLR - Behavioral (src\LLR.vhd)
    - OB - BRAM_DP2 - Behavioral (src\bram_dp2.vhd)
*TC\_CODEC\_CONFIG.vhd* generates the detailed configuration parameters for the encoder and decoder. The user enters the burst payload size (in Bytes) and the coding rate R1/R2. This component looks up the optimum detailed configuration in Table A-1 of [1].

*TC\_ENCODER\_DVB\_RCS2.vhd* is the encoder top component.

The *ARITH.vhd* component performs minor arithmetic operations to compute the initial permutation indices 3, (4Q1+3) modulo N, (4Q2+3+ 4Q0P) modulo N, (4Q3+3+4Q0P) modulo N.
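The four index expressions can be written out directly. The sketch below is illustrative only: the function name is hypothetical, N is assumed to be the number of dibits in the frame, and the parameter values in the usage example are arbitrary placeholders rather than entries from Table A-1 of [1].

```python
def initial_permutation_indices(n, p, q0, q1, q2, q3):
    """Return the four initial permutation indices computed by the expressions above."""
    return (
        3,
        (4 * q1 + 3) % n,
        (4 * q2 + 3 + 4 * q0 * p) % n,
        (4 * q3 + 3 + 4 * q0 * p) % n,
    )


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the arithmetic.
    print(initial_permutation_indices(n=100, p=7, q0=1, q1=2, q2=3, q3=4))
```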

*BRAM\_DP2.vhd* is a generic dual-port memory, used as input and output elastic buffers. Memory is inferred (no Xilinx primitive is used).

*TC\_DECODER\_DVB\_RCS2.vhd* is the decoder top component. It processes one frame at a time, i.e. the input flow must be stopped until the entire frame is decoded.

*PERMUTATION\_TABLE.vhd* generates the permutation and inverse permutation lookup tables.

*BM\_GEN.vhd* generates the 16 branch metric values, based on the received samples ABYW and the associated erasure information (when puncturing is enabled).

*FORWARD\_STATE\_GEN.vhd* generates the forward state metrics a(k+1,s) from the previous state metrics a(k,s').

*BACKWARD\_STATE\_GEN.vhd* generates the backward state metrics b(k,s) from the next state metrics b(k+1,s).
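To illustrate the shape of such a recursion, here is a minimal sketch of one forward state-metric update. It assumes a max-log style update (this document does not state which approximation the VHDL uses) and a toy 2-state trellis with made-up branch metrics; the function name and data layout are hypothetical.

```python
def forward_step(alpha, branches):
    """Compute a(k+1, s) from a(k, s') on a generic trellis.

    alpha    : list of state metrics a(k, s'), indexed by state
    branches : iterable of (s_prev, s_next, branch_metric) tuples
    """
    nxt = [float("-inf")] * len(alpha)
    for s_prev, s_next, bm in branches:
        # Max-log assumption: keep the best path metric entering each state.
        nxt[s_next] = max(nxt[s_next], alpha[s_prev] + bm)
    return nxt


if __name__ == "__main__":
    # Toy 2-state trellis, values chosen only for illustration.
    toy_branches = [(0, 0, 1.0), (0, 1, -1.0), (1, 0, 0.5), (1, 1, 2.0)]
    print(forward_step([0.0, -1.0], toy_branches))
```

The backward recursion has the same structure with the roles of `s_prev` and `s_next` exchanged and the trellis traversed from the end of the frame.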

*LLR.vhd* generates the log likelihood ratio (LLR) from a(k), bm(s,s') and b(k+1). See the Matlab program turbo.m.

*COM7003\_TOP.vhd* is mainly a usage example showing the turbo-codec implemented on a ComBlock COM-1800 FPGA development platform. This component includes the encoder, the decoder, the detailed codec configuration, clock generation and an interface to a supervisory microcontroller (8-bit address/data bus to exchange control registers REG and status registers SREG). CLK\_P is the main processing clock.

*INFILE2SIM.vhd* reads an input file. This component is used by the testbench to read a soft-quantized encoded bit stream generated by the turbo.m Matlab program for various Eb/No cases.

*SIM2OUTFILE.vhd* writes three 12-bit data variables to a tab delimited file which can be subsequently read by Matlab (load command) for plotting or analysis.

## Matlab simulation

The turbo.m program:

- Generates a stimulus file fecdecin.txt for use as input to the decoder VHDL simulation. The file includes a frame of pseudo-random (PRBS11) data bits, turbo code encoding, additive white Gaussian noise and soft-quantization.
- Performs end-to-end BER performance analysis of the turbo-codec over a noisy (AWGN) channel.

The turbo.m program uses treillis\_diagram.m to generate the trellis state diagram (input state, input data, output state, output parity bits).

The tc\_dec\_ber.m program reads a file of decoded data, tcout.txt, generated by VHDL simulation and compares it with the original PRBS-11 test sequence. It counts the number of bit errors.
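The comparison step can be sketched in a few lines. The code below is not the delivered tc\_dec\_ber.m; it is a hypothetical Python model that assumes the common PRBS-11 polynomial x^11 + x^9 + 1 and an all-ones seed, neither of which is stated in this document, and it takes the decoded bits as an in-memory list instead of reading tcout.txt.

```python
def prbs11(n_bits, seed=0x7FF):
    """Generate n_bits of a PRBS-11 sequence (assumed polynomial x^11 + x^9 + 1)."""
    state = seed & 0x7FF
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        feedback = ((state >> 10) ^ (state >> 8)) & 1  # taps at stages 11 and 9
        bits.append((state >> 10) & 1)
        state = ((state << 1) | feedback) & 0x7FF
    return bits


def count_bit_errors(decoded, reference):
    """Count positions where the decoded bits differ from the reference bits."""
    if len(decoded) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(d != r for d, r in zip(decoded, reference))


if __name__ == "__main__":
    reference = prbs11(2032)
    decoded = list(reference)
    decoded[10] ^= 1  # inject a single bit error for demonstration
    errors = count_bit_errors(decoded, reference)
    print(errors, "bit error(s), BER =", errors / len(reference))
```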



### **Reference documents**

[1] ETSI EN 301 545-2, "Digital Video Broadcasting (DVB); Second Generation DVB Interactive Satellite System (DVB-RCS2); Part 2: Lower Layers for Satellite standard", section 7.3.5.1 "Turbo FEC Encoder".

# Implementation Overview

## **Turbo Code Encoder**

Encoding requires four passes of the input block through an encoder core:

- Pass #1: natural order. Determines the circulation state C1.
- Pass #2: natural order, starting at encoder state C1.
- Pass #3: interleaved order. Determines the circulation state C2.
- Pass #4: interleaved order, starting at encoder state C2.

For maximum throughput, two encoder cores are used in parallel according to the sequencing below:



# **ComBlock Ordering Information**

COM-7003SOFT Turbo code encoder/decoder, VHDL source code / IP core

# **Contact Information**

MSS • 845-N Quince Orchard Boulevard • Gaithersburg, Maryland 20878-1676 • U.S.A.

Telephone: (240) 631-1111 Facsimile: (240) 631-1676 E-mail: info@comblock.com