Detection Algorithms for Wireless Communications
To my parents, Ezio and Ester, for letting me detect my path
Gianluigi Ferrari

To my wife Laura
Giulio Colavolpe

To Annapaola, Enrica and Alberto
Riccardo Raheli
Detection Algorithms for Wireless Communications
With Applications to Wired and Storage Systems

Gianluigi Ferrari, Giulio Colavolpe and Riccardo Raheli
All of the University of Parma, Italy
John Wiley & Sons, Ltd
Copyright © 2004 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England

Telephone: (+44) 1243 779777
Email (for orders and customer service enquiries): [email protected]
Visit our Home Page on www.wileyeurope.com or www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770571.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices

John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

ISBN 0-470-85828-1

Typeset by the author using LaTeX software.
Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire.
This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.
Contents

Preface xi
Acknowledgements xiii
List of Figures xv
List of Tables xxix
1 Wireless Communication Systems 1
1.1 Introduction 1
1.2 Overview of Wireless Communication Systems 4
1.3 Wireless Channel Models 5
1.3.1 Additive White Gaussian Noise Channel 6
1.3.2 Frequency Nonselective Fading Channel 6
1.3.3 Frequency Selective Fading Channel 8
1.3.4 Phase-Uncertain Channel: Channel with Phase and Frequency Instabilities 9
1.4 Demodulation, Detection, and Parameter Estimation 10
1.5 Information Theoretic Limits 12
1.5.1 Additive White Gaussian Noise Channel 12
1.5.2 Frequency Nonselective Fading Channel 12
1.5.3 Phase-Uncertain Channel 14
1.6 Coding and Modulation 15
1.6.1 Block and Convolutional Coding 15
1.6.2 Linear Modulation without Memory 16
1.6.3 Combined Coding and Modulation 17
1.7 Approaching Shannon Limits: Turbo Codes and Low Density Parity Check Codes 19
1.8 Space Time Coding 20
1.9 Summary 21
1.10 Problems 21
2 A General Approach to Statistical Detection for Channels with Memory 25
2.1 Introduction 25
2.2 Statistical Detection Theory 26
2.3 Transmission Systems with Memory 32
2.3.1 Causality and Finite Memory 35
2.3.2 Stochastic Channels: Channels with Infinite Memory 38
2.4 Overview of Detection Algorithms for Stochastic Channels 40
2.5 Summary 43
2.6 Problems 43
3 Sequence Detection: Algorithms and Applications 49
3.1 Introduction 49
3.2 MAP Sequence Detection Principle 50
3.3 Viterbi Algorithm 51
3.4 Soft-Output Viterbi Algorithm 54
3.5 Finite Memory Sequence Detection 54
3.5.1 Inter-Symbol Interference Channel 57
3.5.2 Flat Slow Fading Channel 58
3.6 Estimation-Detection Decomposition 59
3.7 Data-Aided Parameter Estimation 63
3.8 Joint Detection and Estimation 66
3.8.1 Phase-Uncertain Channel 67
3.8.2 Dispersive Slow Fading Channel 69
3.9 Per-Survivor Processing 71
3.9.1 Phase-Uncertain Channel 75
3.9.2 Dispersive Slow Fading Channel 75
3.9.3 Remarks 75
3.10 Complexity Reduction Techniques for VA-Based Detection Algorithms 76
3.10.1 State Reduction by Memory Truncation 77
3.10.2 State Reduction by Set Partitioning 80
3.10.3 A Case Study: TCM on an ISI Channel 83
3.10.4 Reduced-Search Algorithms 87
3.11 Applications to Wireless Communications 88
3.11.1 Adaptive Sequence Detection: Preliminaries and Least Mean Squares Estimation 89
3.11.2 Noncoherent Sequence Detection for Phase-Uncertain Channels 95
3.11.3 Noncoherent Sequence Detection for Slowly Varying Frequency Nonselective Fading Channels 111
3.11.4 Linear Predictive Sequence Detection for Phase-Uncertain Channels 124
3.11.5 Linear Predictive Sequence Detection for Frequency Flat Fading Channels 134
3.11.6 Linear Predictive Sequence Detection for Frequency Selective Fading Channels 141
3.12 Summary 146
3.13 Problems 148
4 Symbol Detection: Algorithms and Applications 155
4.1 Introduction 155
4.2 MAP Symbol Detection Principle 156
4.3 Forward Backward Algorithm 157
4.4 Iterative Decoding and Detection 162
4.5 Extrinsic Information in Iterative Decoding: a Unified View 168
4.5.1 A Review of the Use of the Extrinsic Information 169
4.5.2 Forward Backward Algorithm 172
4.5.3 Soft-Output Viterbi Algorithm 178
4.6 Finite Memory Symbol Detection 185
4.7 An Alternative Approach to Finite Memory Symbol Detection 191
4.8 State Reduction Techniques for Forward Backward Algorithms 200
4.8.1 Forward-Only RS-FB Algorithms 201
4.8.2 Examples of Application of Fwd-Only RS-FB Algorithms 204
4.8.3 Forward-Only RS FB-type Algorithms 213
4.8.4 Examples of Application of Fwd-Only RS FB-type Algorithms 216
4.8.5 Generalized RS-FB Algorithms 222
4.8.6 Examples of Application of Generalized RS-FB Algorithms 237
4.9 Applications to Wireless Communications 246
4.9.1 Noncoherent Iterative Detection of Binary Linear Coded Modulation 246
4.9.2 Noncoherent Iterative Detection of Spectrally Efficient Linear Coded Modulation 260
4.9.3 Pilot Symbol-Assisted Iterative Detection for Phase-Uncertain Channels 272
4.9.4 Linear Predictive Iterative Detection for Phase-Uncertain Channels 285
4.9.5 Noncoherent Iterative Detection for Slow Frequency Nonselective Fading Channels 292
4.9.6 Linear Predictive Iterative Detection for Fading Channels 294
4.10 Summary 296
4.11 Problems 297
5 Graph-Based Detection: Algorithms and Applications 301
5.1 Introduction 301
5.2 Factor Graphs and the Sum-Product Algorithm 303
5.3 Finite Memory Graph-Based Detection 307
5.4 Complexity Reduction for Graph-Based Detection Algorithms 312
5.5 Strictly Finite Memory: Inter-Symbol Interference Channels 313
5.5.1 Factor Graph 314
5.5.2 Modified Graph 318
5.6 Applications to Wireless Communications 322
5.6.1 Noncoherent Graph-Based Detection 323
5.6.2 Linear Predictive Graph-Based Detection for Phase-Uncertain Channels 323
5.6.3 Linear Predictive Graph-Based Detection for Frequency Flat Fading Channels 326
5.7 Strong Phase Noise: An Alternative Approach to Graph-Based Detection 329
5.7.1 System Model and Exact Sum-Product Algorithm 329
5.7.2 Proposed Algorithms 333
5.7.3 Numerical Results 343
5.8 Summary 347
5.9 Problems 349
A Discretization by Sampling 353
A.1 Introduction 353
A.2 Continuous-Time Signal Model 353
A.2.1 Power Spectrum of a Rayleigh Faded Signal 355
A.2.2 Signal Oversampling 359
A.2.3 Signal Symbol-Rate Sampling 363
A.3 Discrete-Time Signal Model 364
References 367
List of Acronyms 387
Index 391
Preface

This book presents, in a unitary and novel perspective, some of the research work the authors have carried out over the last decade, along with several collaborators and students. The roots of this book can be traced back to the design of adaptive sequence detection algorithms for channels with parametric uncertainty. The explosion of turbo codes and iterative decoding around the middle of the Nineties then motivated the design of iterative (turbo and graph-based) detection algorithms.

This book aims at providing the reader with a unified perspective on the design of detection algorithms for wireless communications. What does this statement really mean? It has become clear to us, in recent years, that most of the proposed detection algorithms evolve from a simple idea, which can be described as finite-memory detection and synthesized by a simple metric. This unique metric is the key ingredient to derive:

• sequence detection algorithms based on the Viterbi algorithm;
• symbol detection algorithms based on the forward-backward algorithm;
• graph-based detection algorithms based on the sum-product algorithm.

Although simple, and probably familiar to several researchers working in this area, to the best of our knowledge a unified approach to the design of detection algorithms, based on a single metric, has never been clearly proposed in the literature. This book tries to fill this gap by giving a comprehensive treatment, with several examples of application. This book should, however, be interpreted by the reader as a starting point, rather than a purely tutorial work. In fact, we believe that the proposed simple unifying idea can find many applications beyond those explored in this book. We would like to mention a single (and significant) example. In current and future wireless communication systems, it will be more and more important to support high data-rate transmissions. Multiple-input multiple-output systems, based on the use of multiple antennas, have received significant interest from the research community over the
last few years. All the detection algorithms presented in this book apply to single-input single-output systems. The reader is therefore invited to entertain herself/himself by trying to extend these algorithms to multiple-input multiple-output communication scenarios.

A final comment is related to the subtitle: "With Applications to Wired and Storage Systems." As the reader will see, most of the examples presented in this book are related to wireless communication systems. However, several of the proposed communication scenarios apply also to storage and wired systems: for example, proper inter-symbol interference channels may characterize several storage systems. Moreover, the proposed approach is general and, therefore, suitable for application to scenarios different from those considered explicitly. Again, the reader is invited to use the tools proposed in this book and apply them to solve her/his own communication problems.

As an extra resource we have set up a companion website for our book containing a solutions manual and a sample chapter. Also, for those wishing to use this material for lecturing purposes, electronic versions of most of the figures from our book are available. Please go to the following URL and take a look: ftp://ftp.wiley.co.uk/pub/books/ferrari.
Acknowledgements

This book would never have been possible without the help of several people across the years of research, pain and happiness at the University of Parma (and not only). Since it is impossible to thank all of them explicitly, we would like to take this opportunity to "unitarily" thank all of them, guaranteeing that their help has never been forgotten, but is well remembered. In particular, we first wish to thank the many students at the University of Parma who have indirectly contributed to this book with their thesis work.

Although explicit and comprehensive acknowledgments are impossible, several researchers must be explicitly mentioned for particularly significant and important contributions. We would like to thank Prof. Achilleas Anastasopoulos (University of Michigan, Ann Arbor, USA), whose collaboration led to some results presented in Chapter 4. Moreover, we would also like to thank him for kindly proof-reading the entire manuscript. Many thanks go to Prof. Keith M. Chugg (University of Southern California, Los Angeles, USA) and Dr. Phunsak Thiennviboon (TrellisWare Technologies Inc., San Diego, USA), for the collaboration (while Gianluigi Ferrari was visiting the University of Southern California in 2000-2001) from which some results presented in Chapter 4 originate. The invaluable collaboration of Prof. Giuseppe Caire (Institut Eurecom, Sophia Antipolis, France) in part of Chapter 5 is acknowledged. This collaboration started while Giulio Colavolpe was visiting the Institut Eurecom in 2000 and has continued and consolidated in the following years. Thanks to Dr. Alberto Ginesi and Dr. Riccardo De Gaudenzi (ESA-ESTEC, Noordwijk, The Netherlands) for their appreciation and encouragement of some results in Chapter 5. Alan Barbieri (PhD student, University of Parma, Italy) and Gianpietro Germi are also acknowledged for their contribution to part of Chapter 5. Going back in time, we would like to acknowledge the contribution of Prof. Andreas Polydoros (University of Athens, Greece) for a long research collaboration which led to per-survivor processing, described in Chapter 3. We would also like to acknowledge the contribution of Prof. Piero Castoldi (Scuola Superiore Sant'Anna, Pisa, Italy) to some of the results described in Chapter 3 and the Appendix. Finally, we wish to acknowledge
the encouragement and support through the years of our senior colleagues Prof. Giorgio Picchi (University of Parma, Italy) and Prof. Giancarlo Prati (Scuola Superiore Sant'Anna, Pisa, Italy), who first appreciated our scientific achievements.

Besides research collaborators, several people have helped in the process of editing the manuscript. Among them, we would like to thank Alessandra De Conti, who read the entire manuscript very carefully, providing extremely valuable comments on the English style. Luca Consolini (PhD student, University of Parma, Italy) is also thanked for reading the entire manuscript and providing comments. We also wish to thank Annuccia Babayan, for proof-reading parts of the manuscript, and Michele Franceschini (PhD student, University of Parma, Italy), for providing useful comments and helping significantly in the editing process. Prof. Enrico Forestieri (Scuola Superiore Sant'Anna, Pisa, Italy) is finally acknowledged for using his vast knowledge of LaTeX to solve a few (unsolvable to us) editing problems and make the manuscript more compliant with the requests of the Publisher.

Last, but not least, we heartily wish to thank several people at John Wiley and Sons Ltd, who made the realization of this book possible. First of all, we are greatly indebted to our Development Editor, Sarah Hinton, who first believed in this project and promoted it. Our gratitude goes also to our Publishing Editor, Mark Hammond, who supported the project along its entire realization. Finally, the Project Editor, Dan Gill, is thanked for managing the editorial phase of the manuscript very efficiently, and the Copyeditor, Helen Heyes, is also thanked for providing extremely detailed corrections to the final version of the manuscript.
List of Figures

1.1 Examples of wireless channel models 7
1.2 Classical model of a communication system 11
1.3 Lower bound on the noncoherent channel capacity for different values of N. Reproduced from [44] with permission of John Wiley & Sons 15
1.4 Rate-1/2 convolutional encoder with generators G_1 = 5 and G_2 = 7 22
2.1 M-ary signaling and detection 26
2.2 Discretization of the received signal 29
2.3 Decision regions 30
2.4 Transmission system 33
2.5 Constellation for 32-APSK 44
2.6 Possible communication systems. Reproduced from [92] by permission of John Wiley & Sons 45
3.1 Add-compare-select operation in a VA 53
3.2 Receiver based on estimation-detection decomposition 61
3.3 Training and tracking operational mode 67
3.4 PSP-based detection 73
3.5 Trellis evolution: universal and PSP-based estimation 74
3.6 Pictorial description of trellis folding 78
3.7 Set partitioning for 8-PSK constellation 81
3.8 TCM encoder and mapping for 16-QAM (the subsets are specified in Figure 3.9) 84
3.9 Set partition and mapping rule for 16-QAM constellation 84
3.10 Equivalent discrete-time channel response of an ISI channel 85
3.11 Performance of TCM with 16-QAM for transmission over the 4-tap ISI channel considered in Figure 3.10. Reproduced from [106], ©1996 IEEE, by permission of the IEEE 86
3.12 SER performance of uncoded QPSK transmission over a 3-tap dispersive fading channel and LMS-based adaptive detection. The normalized Doppler rate is f_D T = 1.85 x 10^-3. Reproduced from [34], ©1995 IEEE, by permission of the IEEE 92
3.13 SER performance of uncoded QPSK transmission over a 3-tap dispersive fading channel and LMS-based adaptive detection with dual diversity. The normalized Doppler rate is f_D T = 3.69 x 10^-3 93
3.14 SER performance of PSP-based detection of a TCM with 8-PSK. For comparison, the performance of a conventional data-aided receiver (with d = 2) is also shown. Reproduced from [34], ©1995 IEEE, by permission of the IEEE 95
3.15 BER of NSD detection schemes for 16-DQAM with various degrees of complexity. Reproduced from [47], ©1999 IEEE, by permission of the IEEE 100
3.16 BER of NSD detection schemes for 8-state TC-16-QAM. Reproduced from [47], ©1999 IEEE, by permission of the IEEE 101
3.17 BER of the proposed detection schemes for 16-DQAM on the two considered ISI channels and various values of N. The noncoherent detectors search a trellis with S' = 256 states. Reproduced from [47], ©1999 IEEE, by permission of the IEEE 102
3.18 BER of the proposed receiver for DQPSK with N = 5 and S' = 1 for various values of phase jitter standard deviation (white marks) and frequency offset (black marks). Reproduced from [47], ©1999 IEEE, by permission of the IEEE 103
3.19 System model in the case of a channel with phase and frequency uncertainty 104
3.20 Examples of indistinguishable sequences: (a) noncoherent receiver (zero-th order); (b) advanced receiver with frequency estimation (first order). Reproduced from [117], ©2002 IEEE, by permission of the IEEE 107
3.21 BER of the receiver based on (3.137) (white marks) for DQPSK and comparison with NSD (black marks) and coherent receivers. The frequency offset is v = 0. Reproduced from [117], ©2002 IEEE, by permission of the IEEE 109
3.22 BER of the receiver based on (3.137) (white marks) for DQPSK, N = 6, L = 11, and S' = 16. The performance of an NSD receiver (black marks) with N = 6 and S' = 16 is also shown for comparison. Reproduced from [117], ©2002 IEEE, by permission of the IEEE 110
3.23 BER at an SNR of 10 dB versus the normalized frequency offset of the receiver based on (3.137) for DQPSK and N = 6, L = 6, and S' = 16. Method 1 (white marks) based on a limitation of the estimation interval and method 2 (black marks) based on DDE are considered. The performance of an NSD receiver with N = 6 and S' = 16 is also shown for comparison. Reproduced from [117], ©2002 IEEE, by permission of the IEEE 111
3.24 System model for transmission over a frequency nonselective fading channel. Reproduced from [120], ©2000 IEEE, by permission of the IEEE 112
3.25 BER of the proposed receivers with and without CSI, S' = 4 and N = 2, for differentially encoded 16-QAM and Rice fading with K_R = 10 dB. The performance of an ideal coherent receiver is also shown for comparison. Reproduced from [120], ©2000 IEEE, by permission of the IEEE 122
3.26 BER of the proposed receivers with and without CSI, S' = 4 and N = 2, for differentially encoded 16-QAM and Rayleigh fading. The performance of an ideal coherent receiver is also shown for comparison. Reproduced from [120], ©2000 IEEE, by permission of the IEEE 123
3.27 BER of the proposed receiver without CSI, S' = 4 and N = 3, for differentially encoded QPSK, slow Rayleigh fading and various values of phase noise standard deviation (black marks). The performance of a coherent receiver based on a decision-directed PLL (white marks) is also shown for comparison. Reproduced from [120], ©2000 IEEE, by permission of the IEEE 124
3.28 E_b/N_0 as a function of the phase noise standard deviation for BER equal to 10^-3 of the proposed receiver without CSI, S' = 1 and N = 3, for differentially encoded QPSK, slow Rayleigh fading (black marks), and comparison with a coherent receiver based on a decision-directed PLL (white marks). Reproduced from [120], ©2000 IEEE, by permission of the IEEE 125
3.29 System model for linear prediction-based receivers 126
3.30 BER of a TCM scheme with 16-QAM. Linear predictive receivers with various complexity levels are considered. For comparison, the performance of the equivalent coherent receiver is also shown 130
3.31 Prediction coefficients as a function of the phase noise standard deviation σ_Δ, for an equal-energy modulation, prediction order N = 4 and E_b/N_0 = 4 dB. Various values of the frequency offset intensity a are considered. Reproduced from [128], ©2003 IEEE, by permission of the IEEE 132
3.32 BER as a function of the phase noise standard deviation σ_Δ for DQPSK, symbol-by-symbol decision, and various values of the frequency offset intensity a. Reproduced from [128], ©2003 IEEE, by permission of the IEEE 133
3.33 BER of a linear predictive receiver for transmission of QPSK over a time-varying flat Rayleigh fading channel. Various values of the normalized maximum Doppler rate are considered. Reproduced from [124], ©1995 IEEE, by permission of the IEEE 141
3.34 BER of the blind recursive detector with β = 1 for E_b/N_0 = 40 dB and BPSK as a function of the assumed memory N. Reproduced from [121] by permission of John Wiley & Sons 146
3.35 BER versus E_b/N_0 of the blind recursive detector with β = 1 for QPSK modulation and p = 0.998. Reproduced from [121] by permission of John Wiley & Sons 147
3.36 BER versus E_b/N_0 of the blind recursive detector with β = 1 and β = 2 for BPSK modulation and p = 0.99. Reproduced from [121] by permission of John Wiley & Sons 148
4.1 Transmission system and MAP symbol detection 156
4.2 Parallel concatenated convolutional code, or turbo code 163
4.3 Turbo decoder for a PCCC 164
4.4 Typical BER performance curves of a turbo code for an increasing number of iterations. Reprinted from [33], ©IEEE, by permission of the IEEE 166
4.5 Decoder for a turbo code of rate 1/2 170
4.6 BER of a turbo code and the FB algorithm. The extrinsic information generated by each decoder is either modeled as a Gaussian-distributed random variable (first method) or used to update the a priori probabilities (second method). The considered numbers of iterations are 1, 3, 6 and 18. Reproduced from [147], ©2001 IEEE, by permission of the IEEE 177
4.7 Average value of ratio η_z/σ_z² versus the number of iterations, for various values of SNR and a turbo code. The component decoders use the FB algorithm. The extrinsic information generated by each decoder is modeled as a Gaussian-distributed random variable (first method). Reproduced from [147], ©2001 IEEE, by permission of the IEEE 178
4.8 BER of a serially concatenated code and FB algorithm. The considered numbers of iterations are 1, 3, 6 and 18. Reproduced from [147], ©2001 IEEE, by permission of the IEEE 179
4.9 Average value of ratio η_z/σ_z² versus number of iterations, for various values of SNR and a serially concatenated code. The component decoders use the FB algorithm. The extrinsic information generated by each decoder is modeled as a Gaussian-distributed random variable (first method). Reproduced from [147], ©2001 IEEE, by permission of the IEEE 180
4.10 BER of the proposed detection schemes for a turbo code and SOVA. The extrinsic information generated by each decoder is either modeled as a Gaussian-distributed random variable (first method) or used to update the a priori probabilities (second method) or heuristically weighted (third method). The considered numbers of iterations are 1, 3 and 18. Reproduced from [147], ©2001 IEEE, by permission of the IEEE 183
4.11 Average value of the ratio η_z/σ_z² versus the number of iterations, for various values of SNR by considering a turbo code. The component decoders use SOVA. The extrinsic information generated by each decoder is considered as a Gaussian-distributed random variable (first method) 184
4.12 BER of a serially concatenated code and SOVA. The considered numbers of iterations are 1, 3, 6 and 18. Reproduced from [147], ©2001 IEEE, by permission of the IEEE 185
4.13 Average value of the ratio η_z/σ_z² versus the number of iterations, for various values of SNR and a serially concatenated code. The component decoders use SOVA. The extrinsic information generated by each decoder is considered as a Gaussian-distributed random variable (first method) 186
4.14 Communication system 186
4.15 Implicit phase estimation in the forward recursion, backward recursion and completion for the NCSOa algorithm 198
4.16 Implicit phase estimation in the forward recursion, backward recursion and completion for the NCSOb algorithm 199
4.17 Noncoherent iterative decoding of a PCCC with BPSK. For comparison the performance of the corresponding coherent receiver is also shown. In all cases 1, 5, and 10 decoding iterations are considered 200
4.18 Noncoherent iterative decoding of an SCCC with 8-PSK. For comparison the performance of the corresponding coherent receiver is also shown. In all cases 1 and 5 decoding iterations are considered 201
4.19 Forward recursion for the computation of α_k(s_k) for a Fwd-only RS-FB algorithm in the case of coherent detection for an ISI channel. Reproduced from [156], ©2001 IEEE, by permission of the IEEE 206
4.20 Backward recursion for the computation of β_k(s'_k) for a Bwd-only RS-FB algorithm in the case of coherent detection for an ISI channel. The survivor map is constructed during this recursion. Reproduced from [156], ©2001 IEEE, by permission of the IEEE 207
4.21 Application of the proposed technique to iterative decoding/detection for an ISI channel. Receivers with various levels of complexity are considered and compared with the full-state receiver (S = 16). The considered numbers of iterations are 1 and 6 in all cases. The performance in the case of coded transmission over an AWGN channel, without ISI, is also shown (solid lines with circles). Reproduced from [156], ©2001 IEEE, by permission of the IEEE 209
4.22 Application of the proposed Fwd-only state reduction technique to iterative decoding/detection, through linear prediction, for flat fading channels with f_D T = 0.01. Receivers with various levels of complexity (in terms of prediction order N and reduced-state parameter Q) are shown. The considered numbers of iterations are 1 and 6 in all cases. The performance in the case of decoding with perfect knowledge of the fading coefficients is also shown (solid lines). Reproduced from [156], ©2001 IEEE, by permission of the IEEE 214
4.23 Forward recursion of the pdf α_k(t_k) for a general Fwd-only RS FB-type algorithm. Reproduced from [156], ©2001 IEEE, by permission of the IEEE 217
4.24 Backward recursion of the pdf β_k(e_k) for a general Fwd-only RS FB-type algorithm. The metric φ_k is calculated using the survivor map previously constructed in the forward recursion. Reproduced from [156], ©2001 IEEE, by permission of the IEEE 217
4.25 Application of the RS-NCSOb algorithm to noncoherent decoding of an RSC code. Receivers with various levels of complexity are considered and compared with a full-state receiver (N = 2) and the coherent receiver. Reproduced from [156], ©2001 IEEE, by permission of the IEEE 222
4.26 Transmitter and receiver for the uncoded BPSK transmission over ISI/AWGN channels 239
4.27 Performance comparisons of various self-iterative detection algorithms for Channel A assuming perfect CSI. The number of considered self-iterations I is 1 (solid lines) or 5 (dashed lines), and the number of states is indicated by S' in the case of state reduction. For comparison, the performance of the full-state receiver (S = 2048) is also shown. Reproduced from [166], ©2002 IEEE, by permission of the IEEE 240
4.28 Performance comparisons of various self-iterative detection algorithms for Channel B assuming perfect CSI. The number of considered self-iterations I is 1 (solid lines) or 5 (dashed lines), and the number of states, in the case of state reduction, is indicated by S'. For comparison, the performance of the full-state receiver (S = 2048) is also shown. Reproduced from [166], ©2002 IEEE, by permission of the IEEE 241
4.29 Performance comparisons of various self-iterative detection algorithms for Channel C assuming perfect CSI. The number of considered self-iterations I is 1 (solid lines) or 5 (dashed lines), and the number of states is indicated, in the case of state reduction, by S'. For comparison, the performance of the full-state receiver (S = 2048) is also shown. Reproduced from [166], ©2002 IEEE, by permission of the IEEE 242
4.30 Performance of a (2,1,9) NRC code with generators G_1 = 7604 and G_2 = 4174 with BPSK on an AWGN channel. I = 1 (solid lines) and I = 5 (dashed lines) self-iterations are considered for different numbers of reduced states S' and different packet lengths K. The performance in the full-state case (S = 512 states) is also shown 243
4.31 Performance of a (2,1,12) NRC code with generators G_1 = 42554 and G_2 = 77304 with BPSK on an AWGN channel. I = 1 (solid lines) and I = 5 (dashed lines) self-iterations are considered for different numbers of reduced states S'. A packet length K = 128 is considered in all cases, and for comparison, the performance of a full-state (S = 128) (2,1,7) convolutional code is also shown 244
4.32 Transmitter and receiver for the TCM system over an ISI/AWGN channel 245
4.33 Performance comparison of various iterative detection algorithms for the TCM/ISI channel (Opt. 1: (I_i, I_o) = (1,1), (λ_i, λ_o) = (1,1); Opt. 2: (I_i, I_o) = (3,5), (λ_i, λ_o) = (1,1); Opt. 3: (I_i, I_o) = (2,6), (λ_i, λ_o) = (0.625,0.375); Opt. 4: (I_i, I_o) = (2,10), (λ_i, λ_o) = (0.625,0.375)). Reproduced from [166], ©2002 IEEE, by permission of the IEEE 246
4.34 Communication system model for transmission over phase-uncertain channels 247
4.35 Schemes with separate detection and decoding using the proposed soft-output noncoherent algorithms: (a) transmitter and (b) receiver 250
4.36 Receiver with combined detection and decoding for a PCCC of rate 1/2. Reproduced from [162], ©2000 IEEE, by permission of the IEEE 251
4.37 BER of the proposed iterative detection schemes using the NCSOb algorithm with predetection (dot-dashed curves), combined detection and decoding (solid curves), coherent decoding (dashed curves), and coherent predetection (dotted curves). The numbers of iterations are 1, 3, 6, and 18 in all cases. Reproduced from [162], ©2000 IEEE, by permission of the IEEE 252
4.38 BER of the proposed receivers using SO-NSD with combined detection and decoding for asymmetric (dotted curves) and symmetric (dashed curves) schemes. The numbers of iterations are 1, 3, 6 and 18 in both cases. The performance for coherent decoding (solid curve) and 18 iterations is also shown. Reproduced from [162], ©2000 IEEE, by permission of the IEEE 254
4.39 Application of the Fwd-only RS NCSOb algorithm to noncoherent decoding of a PCCC. Receivers with various levels of complexity are considered and compared with a full-state receiver (with N = 3) and a coherent receiver. The considered numbers of iterations are 1, 3 and 6 in all cases. Reproduced from [156], ©2000 IEEE, by permission of the IEEE 255
4.40 BER of the proposed detection scheme using SO-NSD with combined detection and decoding for various levels of phase noise. In all cases, the number of iterations is 6. Reproduced from [162], ©2000 IEEE, by permission of the IEEE 256
4.41 Iterative decoding of serially concatenated interleaved codes. Reproduced from [162], ©2000 IEEE, by permission of the IEEE 257
4.42 BER of the proposed detection scheme using the NCSOb algorithm with N = 2 (dashed curves) and N = 4 (dotted curves) for the serial concatenation of two interleaved convolutional codes. For comparison, the performance of iterative coherent decoding (solid curves) is also shown. The numbers of iterations are 1, 3, 6 and 18. Reproduced from [162], ©2000 IEEE, by permission of the IEEE 258
4.43 Transmission scheme relative to a concatenated code constituted by an outer rate-1/2 RSC code and an inner differential encoder 259
4.44 BER of the proposed detection scheme using the NCSOb algorithm with N = 3 (dashed curves) for the serial concatenation of a convolutional code, an interleaver and a differential encoder. For comparison, the performance of iterative coherent decoding (dotted curves) and optimal coherent decoding of the single convolutional code (solid curve) is also shown. In the cases with iterative detection, the numbers of iterations are 1, 3, and 10. Reproduced from [162], ©2000 IEEE, by permission of the IEEE 260
4.45 Berrou-type PCCC followed by differential encoding on the modulated symbols. Reproduced from [163] by permission of GET/Hermes Science 261
4.46 Performance of the system shown in Figure 4.45. The considered numbers of inner iterations are 1, 3 and 5 in all cases. Reproduced from [163] by permission of GET/Hermes Science 262
4.47 Benedetto-type PCCC followed by differential encoding on the modulated symbols. Reproduced from [163] by permission of GET/Hermes Science 263
4.48 Performance of the system shown in Figure 4.47. The considered numbers of iterations are 1, 3 and 6 in all cases. Reproduced from [163] by permission of GET/Hermes Science 264
4.49 SCCC constituted by an outer convolutional code and an inner Ungerboeck code. Reproduced from [163] by permission of GET/Hermes Science 265
4.50 Performance of the system shown in Figure 4.49. The outer code has 8 states and the number of iterations is 10 in all cases. Reproduced from [163] by permission of GET/Hermes Science 266
4.51 Performance of the system shown in Figure 4.49. The outer code has 16 states and the number of iterations is 10 in all cases. Reproduced from [163] by permission of GET/Hermes Science 267
4.52 Performance of the system shown in Figure 4.49. The modulation format is 16-QAM and the number of iterations is 1, 5 and 10 in all cases. Reproduced from [163] by permission of GET/Hermes Science 268
4.53 Turbo trellis-coded scheme by Benedetto et al. with 8-PSK modulation. Puncturing may be embedded in the component Ungerboeck codes to take into account QPSK modulation. Reproduced from [163] by permission of GET/Hermes Science 269
4.54 Performance of the system proposed in Figure 4.53. The modulation format is 8-PSK and the number of iterations is 1, 3 and 6 in all cases. Reproduced from [163] by permission of GET/Hermes Science 271
4.55 Parallel concatenated coding scheme with separate detection and decoding: transmitter, channel and iterative decoder constituted by the concatenation of an A-SODEM and a coherent turbo decoder. Reproduced from [129], ©2004 IEEE, by permission of the IEEE 277
4.56 BER of the separate scheme with rate-1/2 PCCC and QPSK output modulation. In all cases, I_e = 5 external iterations between the A-SODEM and the turbo decoder are considered. Various numbers I_i of inner decoding iterations are considered. In the CL case, the adaptive algorithm is characterized by N = 0 for σ_Δ = 5 degrees and by N = 1 for σ_Δ = 10 degrees. In the OL case, the detection algorithm is characterized by (N, Q) = (7,3). Reproduced from [129], ©2004 IEEE, by permission of the IEEE 279
4.57 Serially concatenated coding scheme with combined detection and decoding: transmitter, channel and adaptive iterative decoder. Reproduced from [129], ©2004 IEEE, by permission of the IEEE 280
4.58 Parallel concatenated coding scheme with combined detection and decoding: transmitter, channel and iterative decoder constituted by two adaptive component decoders. Reproduced from [129], ©2004 IEEE, by permission of the IEEE 281
4.59 BER of an SCCC with OL and CL inner decoding algorithm, for phase jitter standard deviation

then P{a_0^{k-D-1} | μ_{k-D}} = 0. Hence, for any sequence a_0^{k-D-1} compatible with μ_{k-D}, one can write:
where the last equality is based on the FMC (2.17). Hence, (2.19) becomes:
Since
from (2.21) it follows that
which completes the proof. 2. The second useful consequence of causality and finite memory conditions is that
Proof. In fact,
Based on the independence between information symbols and on the system causality assumption, it follows that
Applying the chain rule, one can immediately write:
where, based on the causality and finite memory conditions, each term in (2.27) can be expressed as
Using (2.28) in (2.27) we may write:
Finally, (2.25) becomes:
which completes the proof. The encoder/modulator block in Figure 2.4 can often be decomposed into a cascade of an encoder and a memoryless mapper. In this case, the causality and finite memory conditions imply analogous relations between the observation sequence r and the code sequence c = c_0^{K-1}, where c_k is a generic code symbol. These conditions can accordingly be formulated as follows:
These conditions, however, involve only the transmission channel and do not imply (2.16) and (2.17) in general. A case of interest may be that of a linear block code followed by a memoryless modulator. In particular, a linear block code is not guaranteed to be causal,5 so that channel causality (2.31) does not imply system causality (2.16). In the case of a linear block code, a trellis representation is possible, but the trellis is time-variant both in terms of states and branches [54,55]. In this case, the evolution of the encoder/modulator over a trellis diagram should be described by a time-variant next-state function ns_k(·, ·). One can immediately conclude that a Tanner graph [81] (or other suitable graphical) representation for a linear block code, where the parity checks determine the structure of the graph, can be more appealing, especially if the parity check equations involve a few symbols (as in the case of low density parity check, LDPC, codes) [78].

A significant example where causality and finite memory conditions strictly hold is given by transmission over channels with ISI, possibly encompassing a nonlinearity with finite memory. The pdf at the right-hand side of (2.17) simplifies, by dropping the conditioning observables, to
where in this case the finite memory parameter C equals the channel dispersion. While it will be shown in much more detail in the following chapters, we now remark that the pdf in (2.33) can be directly used both in a Viterbi algorithm (VA) and a forward backward (FB) algorithm. A similar property holds for (2.32): this is of interest, for example, in the case of transmission of linear block codes over ISI channels. Hence, a sum-product (SP) algorithm based on the same basic metric can also be applied (see Chapter 5 for more details).
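As a concrete illustration of how this simplified pdf is used as a branch metric, the following minimal Python sketch evaluates, in the log domain, a Gaussian metric of the form p(r_k | a_{k-C}, ..., a_k) for a known ISI channel; the channel taps, noise model and all names are illustrative assumptions, not quantities defined in the text.

```python
import numpy as np

def isi_log_branch_metric(r_k, window, h, sigma2):
    """Log of the finite-memory pdf p(r_k | a_{k-C}, ..., a_k) for a known ISI
    channel with taps h = [h_0, ..., h_C] and complex AWGN of variance sigma2.
    `window` holds the hypothesized symbols (a_{k-C}, ..., a_k), oldest first."""
    # Noiseless channel output for this hypothesis: sum_i h_i * a_{k-i}
    mean = np.dot(h, np.asarray(window)[::-1])
    # Complex Gaussian log-pdf of the observation around that hypothesized mean
    return -np.log(np.pi * sigma2) - np.abs(r_k - mean) ** 2 / sigma2
```

The same quantity, accumulated along a path, is what a VA maximizes and what an FB or SP algorithm uses as its elementary factor.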
2.3.2 Stochastic Channels: Channels with Infinite Memory
While in Section 2.3.1 the case of detection for channels with strictly finite memory has been considered, it is important to underline that this condition is rarely observed in the case of wireless communication channels. In the case of a channel affected by stochastic uncertainty, the observations {r_k} are dependent, so that the channel memory may not be finite. A very general parametric model for the observation r_k is the following:
5 Block-wise causality must indeed be satisfied.
where L is an integer, {θ_k} is a sequence of stochastic parameters independent of a, and n_k is an additive noise sample.6 Under this channel model, the following conditional Markov property (CMP)
where N is the order of Markovianity, is sufficient to guarantee an FMC. It is possible to show that (2.35) implies the following:
where the finite memory parameter is C = N + L. In fact, based on (2.35), one can write
The conditional pdf in the numerator of (2.37) can be expressed, by applying the total probability theorem, as follows:
Owing to the considered observation model (2.34), it is immediately concluded that
where C = N + L. As the stochastic parameters are independent of the information symbols, the second pdf inside the integral in (2.38) can be equivalently expressed as p(θ | a_{k-L-C}^{k-1}, r_{k-L-C}^{k-1}). Finally, the integral (2.38) becomes:
6 The additive noise is not required to be Gaussian for the validity of the following result. We also point out that the symbol θ will be used, in the remainder of the book, to indicate a possible channel phase rotation, besides a generic channel stochastic parameter, as in the subsection at hand. The context should eliminate any ambiguity.
Applying the same reasoning to the denominator of (2.37) (taking also into account the causality of the system), one can conclude that
which corresponds to (2.36). One can immediately recognize that (2.36) represents a special case of (2.17). As a consequence, all the derivations in the previous section hold. A statistical description of the stochastic parameter allows one to compute the basic pdf in (2.36) as follows:
Unfortunately, the above exact result is limited by the fact that in realistic scenarios the CMP (2.35) is seldom exactly met [47,95]. However, this result suggests a reasonable approach to devising good approximate detection algorithms whenever the conditional observations are asymptotically independent for increasing index difference.
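For a simple stochastic parameter, the marginalization suggested above can be carried out numerically. The sketch below is offered only as an illustration, with invented names and a channel model assumed purely for the example (a constant phase rotation uniform in [0, 2π) plus complex AWGN); it approximates the finite-memory pdf as a ratio of two marginal window pdfs, each obtained by averaging the conditional Gaussian pdf over a grid of phase hypotheses.

```python
import numpy as np

def window_pdf(r_win, c_win, sigma2, n_phase=64):
    """Marginal pdf of an observation window given hypothesized code symbols,
    averaged over an unknown phase rotation uniform in [0, 2*pi) (discrete grid)."""
    thetas = 2 * np.pi * np.arange(n_phase) / n_phase
    vals = [np.exp(-np.sum(np.abs(r_win - c_win * np.exp(1j * th)) ** 2) / sigma2)
            for th in thetas]
    return np.mean(vals) / (np.pi * sigma2) ** len(r_win)

def finite_memory_metric(r_win, c_win, sigma2):
    """Approximation of p(r_k | a_{k-C}^k, r_{k-C}^{k-1}) as the ratio of the
    marginal pdfs of the full window and of the window without r_k."""
    return window_pdf(r_win, c_win, sigma2) / window_pdf(r_win[:-1], c_win[:-1], sigma2)
```

In this spirit, invoking the conditional Markov property amounts to assuming that observations outside the length-C window add little information about the current observation, which is reasonable when the conditional observations decorrelate for large index differences.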
2.4 Overview of Detection Algorithms for Stochastic Channels

As described in Section 2.1, the design of detection algorithms involves many aspects, such as the channel model, the considered modulation/coding format, and possible constraints on the complexity of the algorithm. In particular, a major classification among detection algorithms is the following.

• Hard-output detection algorithms generate "hard" decisions on the transmitted message, in the sense that their output is one of the possible messages in the transmission alphabet. The most common of these algorithms is the VA [96,97], which performs MAP sequence detection. This algorithm will be described in more detail in Chapter 3.

• Soft-output detection algorithms, instead of generating hard decisions, compute reliability values on the transmitted messages. The most common algorithm is
the FB algorithm [54]. These reliability values can be of various types and they are usually related to the APP of the transmitted messages P{m | r}. In particular, this type of algorithm is generally used to perform (exact or approximated) MAP symbol detection, so that the calculated APP can be written as P{a_k | r}. In the case of binary information symbols, a commonly considered reliability value is given by the logarithmic likelihood ratio (LLR), defined as

LLR_k = ln [ P{a_k = 1 | r} / P{a_k = 0 | r} ]
One can immediately recognize that an LLR captures within a single quantity the relationship between the APP of a transmitted "1" and that of a transmitted "0." Note that the formulation based on the use of the LLR can also be extended to the case of larger information symbol cardinality. In the case of M-ary symbols, (M - 1) LLRs are needed: the LLR relative to the m-th symbol, m = 1, . . . , M - 1, is given by the logarithm of the ratio between P{a_k = m | r} and P{a_k = 0 | r}; in other words, the reference probability remains the APP of the zero symbol, for which the LLR is thus 0.

The transmission model described in Section 2.3 can be viewed as based on a single M^K-ary signaling act or K repetitions of M-ary signaling acts. In the former interpretation, the message is the entire information sequence, whereas in the latter the message is each individual information symbol. According to these interpretations, two MAP detection strategies are obtained. MAP sequence detection is optimal in the sense that it minimizes the probability of erroneously detecting the entire sequence, i.e., selecting a sequence not equal to the transmitted one. MAP symbol detection minimizes the probability of erroneously detecting each information symbol. Specializing (2.4), we may derive the following general formulations of these detection strategies:
Although, in principle, which of these detection strategies is best may depend on the specific application, the difference is mainly of a conceptual nature.7 In fact, the most frequent erroneous decisions occurring in sequence detection are those characterized by only very few symbol errors. For high signal-to-noise ratios (SNRs), where
7 Note, however, that the use of symbol detection becomes essential in the case of iterative detection, as will be shown in detail in Chapter 4.
most frequent errors dominate, most sequence errors entail only very few symbol errors, typically one in uncoded systems or the minimum allowed number in coded systems. As a consequence, the symbol error rates attained by receivers designed according to these two strategies tend to become equal for increasing SNR.

The receiver complexity necessary to implement these detection strategies may be significantly different. Sequence detection is often simpler to implement thanks to the well-known and celebrated VA. Symbol detection, however, has an important inherent feature which has attracted much attention in recent years, namely it generates the APP of each information symbol. This APP can be regarded as soft information about a symbol, in the sense that, besides symbol decisions, taken by selecting the symbol with maximum APP, it provides information about the reliability of these decisions. If the APP of a specific symbol is much greater than those of the other symbols, a decision in favor of that symbol is more reliable with respect to the case when that APP is just a little greater than the others. The usefulness of soft information can be immediately understood considering a receiver implementation, as in the "decomposed" block diagram in Figure 1.2. The decoding function can be significantly facilitated, and its performance improved, if the demodulation function outputs soft information about the code sequence, instead of just the code symbol decisions. This soft information is also the basis of iterative detection techniques [33,98]. In this case, in order to compute P{a_k | r}, one could average (by applying the total probability theorem) over the probabilities of all paths containing a_k. However, this is not efficient. A much more efficient algorithm is the FB algorithm, which will be described in more detail in Chapter 4.

Particular attention must finally be devoted to a soft-output decoding/detection technique which has been developing over the last few years, namely graph-based decoding. As mentioned in Section 1.6.1, while a trellis representation is attractive for a convolutional code, in the case of a linear block code a graphical representation is more appealing. For example, one can consider a factor graph constituted by two kinds of node: variable nodes (relative to the transmitted coded symbols) and check nodes (relative to the parity check equations characterizing the linear block code). Such a graphical representation is practical if the number of edges in the considered graph is not too large. This is the case, for instance, for a linear block code such that each parity check equation involves a limited number of coded symbols. Linear block codes of this type are, for example, LDPC codes. In particular, a simple and, under certain conditions, almost optimal decoding algorithm, referred to as a message passing algorithm, is based on the exchange of soft information between check and variable nodes.

As will be shown in the following chapters, several of the proposed algorithms will guarantee almost optimal performance. However, the price to pay in terms of
complexity will sometimes be very heavy. In particular, one of the main conclusions which stem from the general framework proposed in this book is that the goal in designing a detection algorithm is not that of approaching8 the optimal performance of a receiver which perfectly knows the channel (regardless of the algorithmic complexity), but rather that of approaching the ideal performance with limited complexity.

8 Note that reaching the optimal performance might be basically impossible in some cases.
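To make the difference between the two strategies concrete, the short sketch below computes exact symbol APPs by brute-force averaging over all sequences, exactly as described above, and converts them into LLRs. It is only an illustrative sketch under assumed conditions (binary antipodal symbols, a known ISI channel, invented function and variable names); the FB algorithm of Chapter 4 obtains the same APPs with far lower complexity.

```python
import numpy as np
from itertools import product

def brute_force_symbol_apps(r, h, sigma2):
    """Exact APPs P{a_k = +1 | r} for i.i.d. binary symbols a_k in {+1, -1}
    transmitted over a known ISI channel with taps h and AWGN variance sigma2.
    Obtained by summing the likelihoods of all 2^K candidate sequences
    (total probability theorem), hence exponential in the block length K."""
    K = len(r)
    app_plus = np.zeros(K)
    total = 0.0
    for a in product((+1.0, -1.0), repeat=K):
        s = np.convolve(a, h)[:K]                         # noiseless channel output
        w = np.exp(-np.sum(np.abs(r - s) ** 2) / sigma2)  # sequence likelihood
        total += w
        app_plus += w * (np.array(a) > 0)                 # accumulate paths with a_k = +1
    app_plus /= total
    llr = np.log(app_plus / (1.0 - app_plus))             # LLR of each symbol
    return app_plus, llr
```

Hard decisions follow by thresholding each LLR at zero, while the LLR magnitude is precisely the reliability information that iterative receivers exchange.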
2.5 Summary

This chapter represents the "key" to understanding, with a unitary perspective, the entire book. After a brief discussion of statistical detection theory, the focus moved to transmission systems with memory. A statistical definition of causality and finite memory represents the root for the derivation of most of the detection algorithms presented in the following chapters. In fact, these simple techniques lead to a unified approach to finite memory detection, which allows one to systematically derive MAP (trellis-based) sequence and (trellis-based and graph-based) symbol detection algorithms. In the case of stochastic channels, which correspond to the reality of wireless communication systems, we have shown that the application of a conditional Markov property still leads to finite memory detection. In particular, while not rigorous in most cases, the conditional Markov property allows one to describe a stochastic channel well in practice, provided that the order of Markovianity is sufficiently large.
2.6 Problems

Problem 2.1: Consider a 12-cross QAM constellation, obtained from a 16-QAM constellation by removing the four points at the corners of the constellation. Assuming uncoded transmission, determine the decision regions for symbol-by-symbol detection at the receiver side.

Problem 2.2: Consider a 32-ary amplitude PSK (32-APSK) constellation, as shown in Figure 2.5. The three radii r_1, r_2 and r_3 are suitably optimized values. Assuming uncoded transmission, determine the decision regions for symbol-by-symbol decision at the receiver side.

Problem 2.3: [92, Section 4.2] In the transmission schemes considered in Figure 2.6, n_1 and n_2 are additive noise vectors independent from each other and
from the discrete-time transmitted signal vector s. Determine, in each of the three cases shown in Figure 2.6, if vector r_2 is irrelevant in detecting the transmitted signal, i.e., if r_1 is a sufficient statistic (see Appendix A).

Figure 2.5: Constellation for 32-APSK.

Problem 2.4: Consider a single-input single-output channel with additive noise whose output is a scalar r = s + n, where the noise n is a Gaussian random variable with zero mean and variance σ². Assume a binary transmission: s ∈ {s_1, s_2}, with P{s_1} = p. Determine the decision regions depending on p.

Problem 2.5: Consider a bidimensional channel with additive Gaussian noise. The received discrete-time vector can be written as r = s + n, where the signal vector s can assume the following values (ternary transmission):
and the noise vector n has independent components, with zero mean and variance σ². Determine the decision regions in the following cases.

A. Equally probable messages.

B. The probabilities of the messages are P{m_1} = 1/9, P{m_2} = 4/9, P{m_3} = 4/9.

Problem 2.6: The observation vector at the output of a fading channel can be expressed as
Figure 2.6: Possible communication systems. Reproduced from [92] by permission of John Wiley & Sons.

where ⊙ denotes an element-by-element product of vectors or matrices. Show that the conditional mean vector and covariance matrix of the observation vector can be written as follows:
where η_f and C_f indicate, respectively, the mean vector and covariance matrix of the fading vector f, and C_n is the covariance matrix of the additive noise n.

Problem 2.7: In a binary transmission system over a fading channel, the observation can be written as:
where s ∈ {s_1, s_2}, with
Assume that the random variables f_1 and f_2 are zero-mean, independent Gaussian variables with equal variance σ_f², and that n_1 and n_2 are zero-mean, independent Gaussian variables as well, with equal variance σ_n². Derive the MAP decision strategy and the decision regions.

Problem 2.8: A transmission system uses QPSK with the following signals:
where ω = 2π/T and
Derive the structure of the optimal receiver assuming that the signals are equally likely and transmitted over an AWGN channel.

Problem 2.8: Provide an intuitive or physical explanation of the reason why the MAP detection strategy for known signals {s_i(t)} transmitted over an AWGN channel cannot be expressed as an explicit function of the following integral:
where T_0 is a suitable time interval at least equal to the transmission interval.

Problem 2.9: Consider transmission of independent and identically distributed antipodal binary symbols {a_k} belonging to {+1, -1}. Assume that these information symbols are coded into symbols {c_k} through the following differential encoding rule: c_k = c_{k-1} a_k. Show that the MAP sequence and MAP symbol detection criteria coincide.

Problem 2.10: Consider the transmission of PSK symbols {a_k} belonging to the set {exp(j2πk/M), k = 0, . . . , M - 1} and suppose that they are encoded through the following differential encoding rule: c_k = c_{k-1} a_k. Show that the optimal receiver, according to the MAP sequence detection criterion, reduces to a symbol-by-symbol receiver for the coded symbols {c_k}, emitting symbol decisions {ĉ_k} from which the estimated information symbols are recovered according to â_k = ĉ_k ĉ*_{k-1}.

Problem 2.11: Suppose that the discrete-time observation at the receiver can be written as r_k = a_k + n_k. Suppose that a_k is real and n_k is complex. Show that Im{n_k} is irrelevant for any detection strategy.
Problem 2.12: Let the observation vector be the concatenation of two subvectors:
and assume that the following condition is satisfied:
Show that vector r_2 is irrelevant, given r_1, in the decision problem and can be discarded (theorem of irrelevance). Hint: formulate the MAP detection strategy in terms of the conditional joint pdf of these vectors and use chain factorization.

Problem 2.13: Consider an M-ary signaling scheme with signal set {s_i(t)}. Assuming signal s_i(t) is sent, the received signal at the output of an AWGN phase-noncoherent channel is
where θ is uniformly distributed over [0, 2π).

A. Determine a discretization process for the received signal which provides a sufficient statistic for MAP detection.

B. Derive the noncoherent MAP strategy.

C. Give examples of signal sets suitable for noncoherent detection.

Problem 2.14: Prove that, in the case of transmission over a channel with strictly finite memory, (2.33) holds.
3 Sequence Detection: Algorithms and Applications

3.1 Introduction

In this chapter, we consider a generic approach to maximum a posteriori (MAP) sequence detection, with emphasis on detection over stochastic channels. The principle of MAP sequence detection, equivalent to the maximum likelihood (ML) sequence detection principle for equal a priori sequence probabilities, was introduced in the literature long ago [2]. The most celebrated and widely used MAP sequence detection algorithm, the Viterbi algorithm (VA), was introduced by Viterbi in 1967 and represents a very efficient solution for hard-output sequence detection [96,97]. In the remainder of this chapter, the principle of sequence detection will first be introduced and revisited, recalling the main characteristics of a VA and describing some details about one of its possible extensions, namely the soft-output Viterbi algorithm (SOVA) [99]. The application of the general approach for detection over channels with memory proposed in Chapter 2 will then be specialized to the case of sequence detection. We will also consider a general discussion on detection and estimation, culminating with the description of the principle of per-survivor processing (PSP) [34].
3.2 MAP Sequence Detection Principle
As seen in Chapter 1, the MAP sequence detection strategy can be formulated, referring to the equivalent discrete-time observation r = r_0^{K-1}, where K is the transmission length, as
In particular, a brute-force approach to the identification of â consists of the evaluation of P{a | r} for all possible information sequences a, selecting the sequence which maximizes the a posteriori probability. By chain factorization and owing to the independence of the information symbols, the MAP sequence detection strategy in (3.1) can be formulated in terms of the conditional probability density functions (pdfs) p(r | a) and the a priori sequence probabilities P(a):
where the symbol ~ indicates that two quantities are monotonically related with respect to the variable of interest (in this case, a); a discrete-time version of the system causality (see Chapter 2) is assumed to hold in the second line; and the monotonicity of the logarithm is used in the third line. In particular, if the a priori probabilities {P{a_k}} are equal, the MAP sequence detection criterion coincides with the ML sequence detection criterion. The maximization of (3.2) over all possible sequences a can be implemented as a search for a path in a tree diagram where each branch is in one-to-one correspondence with an information symbol a_k. Consequently, a path is in one-to-one correspondence with a partial sequence a_0^k up to epoch k. Assign first a metric equal to the k-th term of (3.2) to a branch at epoch k associated with symbol a_k. Defining a path metric as the sum of the metrics of the branches forming that path, the MAP sequence detection strategy implements a search for the path with the largest metric in this tree diagram. As assumed in Chapter 1, the encoder/modulator can be described as a finite state machine (FSM) with state μ_k and characterized by the following "next-state" and
"output" functions:
Considering a channel with complex additive white Gaussian noise (AWGN) with circular symmetry, i.e., with independent real and imaginary components, each with variance σ² (a simple and very common example of a memoryless channel), the generic observation at epoch k can be written as
where n_k is the AWGN sample. In this case, it follows that
and (3.2) can be reformulated as
As mentioned above, a brute-force approach to the implementation of the MAP sequence detection criterion would consist of evaluating (3.6) for all possible information sequences and choosing the maximizing sequence. Assuming M-ary information symbols, the complexity of this brute-force approach would be M^K, i.e., exponential in the transmission length. Hence, this implementation of the MAP sequence detection principle is feasible only for short transmission lengths, whereas it becomes highly inefficient for longer transmission lengths. A much more efficient and appealing MAP sequence detection algorithm is the VA, which will be described in the next section.
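As a purely illustrative sketch (not taken from the book), the following Python fragment contrasts the two approaches for a generic FSM encoder/modulator over an AWGN channel: the brute-force search enumerates all M^K information sequences, while the Viterbi-style recursion performs one add-compare-select (ACS) step per epoch. The next_state and output functions, the state set and the symbol priors are user-supplied placeholders, not quantities defined in the text.

    # Hypothetical illustration: brute-force MAP sequence search versus a
    # Viterbi-style ACS recursion for an FSM encoder/modulator over AWGN.
    import itertools
    import numpy as np

    def branch_metric(r_k, c_k, sigma2, prior_k):
        # log-domain metric: log P(a_k) - |r_k - c_k|^2 / (2 sigma^2)
        return np.log(prior_k) - abs(r_k - c_k) ** 2 / (2.0 * sigma2)

    def brute_force_map(r, symbols, next_state, output, mu0, sigma2, priors):
        # Complexity M^K: enumerate every information sequence.
        best_seq, best_metric = None, -np.inf
        for a in itertools.product(symbols, repeat=len(r)):
            mu, metric = mu0, 0.0
            for k, a_k in enumerate(a):
                metric += branch_metric(r[k], output(mu, a_k), sigma2, priors[a_k])
                mu = next_state(mu, a_k)
            if metric > best_metric:
                best_seq, best_metric = a, metric
        return best_seq

    def viterbi_map(r, symbols, next_state, output, mu0, states, sigma2, priors):
        # Complexity linear in K: one add-compare-select step per epoch.
        path_metric = {mu: (0.0 if mu == mu0 else -np.inf) for mu in states}
        survivors = {mu: [] for mu in states}
        for r_k in r:
            new_metric = {mu: -np.inf for mu in states}
            new_surv = {mu: [] for mu in states}
            for mu in states:
                for a_k in symbols:
                    cand = path_metric[mu] + branch_metric(r_k, output(mu, a_k), sigma2, priors[a_k])
                    nxt = next_state(mu, a_k)
                    if cand > new_metric[nxt]:
                        new_metric[nxt], new_surv[nxt] = cand, survivors[mu] + [a_k]
            path_metric, survivors = new_metric, new_surv
        best_final = max(states, key=lambda mu: path_metric[mu])
        return tuple(survivors[best_final])

    # Tiny demonstration with uncoded BPSK (a trivial single-state FSM).
    states, symbols = (0,), (+1.0, -1.0)
    priors = {+1.0: 0.5, -1.0: 0.5}
    r = np.array([0.9, -1.2, 0.4, -0.2])
    print(brute_force_map(r, symbols, lambda mu, a: 0, lambda mu, a: a, 0, 0.5, priors))
    print(viterbi_map(r, symbols, lambda mu, a: 0, lambda mu, a: a, 0, states, 0.5, priors))

Both calls return the same decision sequence, illustrating why the trellis search of the next section is preferred for any nontrivial transmission length.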
3.3 Viterbi Algorithm
In the case of a strictly finite memory channel, MAP sequence detection can be formulated as indicated in (3.6), possibly by redefining symbol c_k and the underlying FSM model. In particular, the optimal tree diagram can be folded into a trellis diagram, where the possible states at each epoch are given by all possible values of the
state μ_k of the encoder/modulator FSM. In the remainder of this chapter the number of states of the encoder/modulator FSM will be indicated by S_c. Denoting by t_k = (μ_k, a_k) a transition in the trellis diagram, a branch metric associated with this transition can be defined as follows:
Upon the definition of the branch metric (3.7), the a posteriori sequence probability can be written as follows:
Without entering into the details (the interested reader can find plenty of literature regarding the VA [17,56]), the implementation principle of the VA is that of associating with each state μ_n a partial path metric relative to the corresponding path originating from a known state μ_0 at epoch k = 0 and terminating in μ_n. This partial path metric, indicated by Λ_n(μ_n), can be written as follows:
Obviously, P{a | r} = Λ_K(μ_K). If J_{Q+1} = ... = J_L = 1, the definition of the reduced state can be simplified as follows:
Figure 3.7: Set partitioning for 8-PSK constellation. If J_1 = ... = J_Q = M', memory truncation, i.e., the state reduction technique considered in Section 3.10.1, arises as a special case. As an example of state complexity reduction, in Figure 3.7 a possible partition of an 8-ary phase shift keying (8-PSK) constellation is shown, where the starting constellation (A) and two successive constellations (B and C) are shown. Assuming uncoded transmission (see footnote 13) and a channel length L = 2, the full-complexity state is
and the number of states is ξ = M^L = 8² = 64. Various set partitionings are possible, as indicated in the following. 1. A reduced state could be obtained by memory truncation considering Q = 1. This corresponds to considering a set partitioning state reduction with J_1 = 8 and J_2 = 1. In this case, the reduced state can be written as
and the number of reduced states is ξ' = J_1 J_2 = 8 × 1 = 8.
Footnote 13: In this case, there is no encoder at the transmitter side, so that, as the modulation is memoryless, the possible encoder/modulator state μ_k disappears from the definition of state S_k.
2. A reduced state can be obtained by considering set partitioning with
The reduced state can be written as follows:
and the number of reduced states is given by ξ' = J_1 J_2 = 4 × 2 = 8. 3. Finally, we consider the set partitioning characterized by
In this case the reduced state can be written as
and the number of reduced states is ξ' = J_1 J_2 = 4 × 4 = 16. It is worth observing that set partitioning should follow the partition rules used in trellis coded modulation (TCM) [9]. Note also that if J_1 < M' parallel transitions may occur (they do, for sure, in the uncoded case). In particular, if J_1 < M' a state transition and the corresponding branch metric are defined as (I_k(1), w_k) and λ_k(I_k(1), w_k), respectively. The branch metric can then be written as follows:
where the pseudo state S_k(w_k) must be compatible with w_k in the sense that c_{k-i} ∈ I_{k-i}(i). The missing information can be based on tentative decisions or PSP. In particular, the PSP-based pseudo state can be expressed as
where (c_{k-1}(w_k), ..., c_{k-Q}(w_k)) are code symbols compatible with state w_k, to be found in the survivor history of state w_k, and (c_{k-Q-1}(w_k), ..., c_{k-L}(w_k)) are code symbols in the survivor of state w_k. Considering linear modulation on a static dispersive channel, the branch metric in the full-state trellis (with ξ = S_c M^L states) has the expression (3.76). In the case of
state reduction by set partitioning, the branch metric in the corresponding RS trellis can be written as follows:
where a_k(c_k, w_k) is the information symbol associated with code symbol c_k and reduced state w_k (recall that we are considering an invertible coding rule). In particular, (c_{k-1}(w_k), ..., c_{k-Q}(w_k)) are code symbols associated with the reduced state w_k, while (c_{k-Q-1}(w_k), ..., c_{k-L}(w_k)) are code symbols in the survivor associated with state w_k.
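As a small illustration of the set-partitioning state reduction discussed above for the uncoded 8-PSK example, the following sketch (hypothetical code; natural labeling of the 8-PSK points is assumed, so that an Ungerboeck-style subset index at a level with J subsets can be taken as the symbol label modulo J) builds a reduced state from the profile (J_1, ..., J_L):

    # Hypothetical sketch of reduced-state construction by set partitioning for
    # the uncoded 8-PSK example above (L = 2), assuming natural labeling so that
    # the point with label m is exp(j 2 pi m / M).
    import numpy as np

    M = 8  # 8-PSK

    def subset_index(m, J):
        # Index of the subset (out of J) containing the symbol with label m.
        return m % J

    def reduced_state(past_labels, J_profile):
        # past_labels = (m_{k-1}, ..., m_{k-L}); J_profile = (J_1, ..., J_L).
        # Positions with J_i = 1 carry no information and are dropped.
        return tuple(subset_index(m, J) for m, J in zip(past_labels, J_profile) if J > 1)

    # Example: the partition J_1 = 4, J_2 = 2 of the text gives 4 x 2 = 8 reduced states.
    print(reduced_state((5, 3), (4, 2)))
    print(len({reduced_state((m1, m2), (4, 2)) for m1 in range(M) for m2 in range(M)}))  # -> 8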
3.10.3 A Case Study: TCM on an ISI Channel
As an example of an application of the state reduction techniques considered above, namely memory truncation and set partitioning, we consider the case of TCM signaling on an ISI channel. In particular, we consider TCM characterized by a rate-3/4 convolutional code followed by mapping over a 16-ary quadrature amplitude modulation (QAM) constellation. The convolutional code has S_c = 4 states. The structure of the convolutional encoder is shown in Figure 3.8. The information bits are {a_k^(i)}_{i=0}^{2} and the code bits are indicated as {c_k^(i)}_{i=0}^{3}. Note that the gross spectral efficiency of the TCM code is 3 bits/s/Hz. Using raised cosine pulses with roll-off ε = 0.3 [17], the spectral efficiency has to be reduced by the expansion factor 1 + ε = 1.3, so that the net spectral efficiency is 2.3 bits/s/Hz. In Figure 3.8, the mapping over the branches relative to a trellis transition is also indicated. In particular, note that each branch indicated in the section of the trellis diagram corresponds to 2² = 4 parallel branches; as one can see, two of the three bits at the input of the convolutional encoder do not influence the evolution of the encoder and are systematic bits. In Figure 3.9, the set partition for the considered 16-QAM modulation and the mapping rule are shown. In particular, note that each branch in the trellis section shown in Figure 3.9 is associated with a subset of C. The model of the discrete observable is the following:
Figure 3.8: TCM encoder and mapping for 16-QAM (the subsets are specified in Figure 3.9).
Figure 3.9: Set partition and mapping rule for 16-QAM constellation. where {f_l}_{l=0}^{L} is the white noise discrete equivalent impulse response of the ISI channel, {n_k} is an iid Gaussian noise sequence with variance σ² per component, and {c_k} is the code sequence. The parameter L, representing the discrete-time equivalent channel length, must be large enough to accommodate the significant pulses. In particular, we consider L = 3, and the distribution of the channel coefficients {f_l} is
Figure 3.10: Equivalent discrete-time channel response of an ISI channel. shown in Figure 3.10. The full-complexity state, in this case, is
and the number of states is ξ = S_c M^L = 4 × 8³ = 2048. Various state reductions are possible and in the following we enumerate a few interesting options. 1. The first possibility is to consider memory truncation with Q = 1, in which case the reduced state can be written as follows:
and the number of reduced states is ξ' = S_c M^Q = 4 × 8¹ = 32. Note that memory truncation corresponds to set partitioning with J_1 = 16, J_2 = J_3 = 1. 2. It is possible to consider set partitioning with
In this case, the partial state can be written as
and the number of reduced states is ξ' = S_c J_1 = 4 × 4 = 16. 3. Another possible set partition is characterized by
In this case, the partial state is
and the number of reduced states is ξ' = S_c J_1 J_2 = 4 × 2 × 2 = 16.
Figure 3.11: Performance of TCM with 16-QAM for transmission over the 4-tap ISI channel considered in Figure 3.10. Reproduced from [106], ©1996 IEEE, by permission of the IEEE. 4. Another possible set partition is characterized by
In this case, the partial state becomes
and the number of reduced states is ξ' = S_c J_1 = 4 × 2 = 8.
5. Finally, one can consider the case of memory truncation with Q = 0, i.e., considering only the code trellis. This results in w_k = μ_k and the number of states is ξ' = S_c = 4. We now analyze the performance of a receiver using VA-based algorithms for combined detection/decoding and the state complexity reduction techniques enumerated above. The results are shown in Figure 3.11, in terms of bit error rate (BER) versus E_b/N_0, E_b being the received energy per information bit and N_0 the one-sided noise power spectral density. In particular, two numbers are indicated in correspondence to each curve: the first one corresponds to the number of reduced states (recall
that the number of full states in this case is ξ = 2048) and the second one corresponds to the number of estimators considered in the PSP-based VA (more details can be found in [106]). In particular:
• the case with ξ' = 32 corresponds to the reduced combined code/ISI trellis (case 1);
• the case with ξ' = 16 corresponds to the reduced combined code/ISI trellis (case 2);
• the case with ξ' = 4 corresponds to the code trellis only (case 5).
For comparison, the performance curve relative to the case of absence of ISI is also shown. The state counts of the options enumerated above can be checked with the short sketch below.
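The following snippet is only an arithmetic check of the state counts quoted for this case study; the interpretation of each option is read from the text and should be treated as a sketch.

    # Illustrative state-count check for the TCM/ISI case study above
    # (S_c = 4 encoder states, M = 8 information symbols per step, L = 3).
    S_c, M, L = 4, 8, 3

    full_states = S_c * M**L            # 4 * 8^3 = 2048
    option_1 = S_c * M**1               # memory truncation, Q = 1: 32
    option_2 = S_c * 4                  # set partitioning: 4 * 4 = 16
    option_3 = S_c * 2 * 2              # set partitioning: 4 * 2 * 2 = 16
    option_4 = S_c * 2                  # set partitioning: 4 * 2 = 8
    option_5 = S_c                      # code trellis only: 4

    print(full_states, option_1, option_2, option_3, option_4, option_5)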
3.10.4 Reduced-Search Algorithms
If optimal processing is infeasible, any type of suboptimal processing may deserve our attention. Nevertheless, ranking suboptimal solutions is difficult because of the lack of reference criteria. Note that RSSD must be considered but one alternative among many others. More precisely, while the state reduction techniques proposed in the previous subsection allow for systematic state reduction over successive trellis transitions, another approach to performing complexity reduction is possible. In particular, reduced-search (or sequential) algorithms may be used to search a small part of a large FSM trellis diagram or a non-FSM tree diagram. As opposed to state-complexity reduction, the original full-complexity trellis (or tree) diagram is partially searched. These algorithms date back to the pre-VA era [107]. They were first proposed to decode convolutional codes. The denomination "sequential" emphasizes the "novelty" compared to the then-established algebraic decoding of block codes [4]. These algorithms can be applied to any system characterized by large memory or state complexity (if an FSM model holds). In the following, we assume that an FSM model exists, and we refer to its corresponding trellis diagram. A general formulation of breadth-first algorithms is presented in [108,109]. Assuming that an FSM model holds, let ξ be the number of states in the full-size trellis. It is possible to partition the ξ states into n_cl disjoint classes, that is, these classes constitute a partition of the ensemble of ξ states. We assume that n_pa paths per class are maintained, by selecting those which maximize the a posteriori probabilities (APPs) under the constraint imposed by the partition rule and the class structure. The resulting search algorithm is denoted as SA(n_pa, n_cl) [108]. In the following, M corresponds to the cardinality of the symbols at the input of the FSM. Special interesting cases can be summarized as follows.
• n_pa > 1 and n_cl = ξ: the SA(n_pa, n_cl) reduces to the list Viterbi algorithm (LVA) with n_pa survivors per state.
• n_pa = 1 and n_cl = ξ: the classical VA is obtained.
• n_pa = 1 and n_cl < ξ: this situation corresponds to RSSD with n_cl states.
• n_pa > 1 and n_cl < ξ: this corresponds to list RSSD with n_pa survivors per state and n_cl states.
• n_pa = M and n_cl = 1: the M-algorithm is obtained (a sketch is given below).
Whenever n_cl < ξ, the branch metric can be computed by applying the principle of PSP. Moreover, PSP also allows the above formalization to be applied when an FSM model does not hold (ξ → ∞). For this class of algorithms the complexity level is related to the total number of paths being traced, i.e., the product n_pa n_cl. Imposing a constraint on the complexity, i.e., n_pa n_cl ≤ η, where η is a suitable complexity threshold, constrained optimality can be defined. According to this constrained optimality criterion, it is possible to show that the M-algorithm is the constrained optimal search algorithm [108]. Finally, we mention another reduced-search technique, namely the T-algorithm, where T stands for threshold. This algorithm is applicable to a VA running over a full-complexity (or possibly reduced-complexity) trellis. The basic idea of this algorithm consists of maintaining only the paths with partial metric above a predefined threshold. If a path has a partial path metric below the threshold, the path is discarded. Note that while the SA(n_pa, n_cl) class of algorithms has a predetermined complexity (in terms of total number of paths being traced), the T-algorithm has a time-varying complexity, since the number of paths maintained can vary across consecutive trellis transitions.
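A minimal sketch of the M-algorithm mentioned above is given below; the branch_metric function is a user-supplied placeholder, and the pruning rule (keep the n_best paths with the largest accumulated metrics at each epoch) is the defining feature of the algorithm.

    # Hypothetical sketch of the M-algorithm as a breadth-first tree search: at
    # each epoch every surviving path is extended by all M input symbols and only
    # the n_best paths with the largest accumulated metrics are retained.
    def m_algorithm(r, symbols, branch_metric, n_best):
        paths = [((), 0.0)]                      # list of (partial sequence, metric)
        for r_k in r:
            extended = []
            for seq, metric in paths:
                for a_k in symbols:
                    new_seq = seq + (a_k,)
                    extended.append((new_seq, metric + branch_metric(r_k, new_seq)))
            # Keep only the n_best candidates (breadth-first pruning).
            extended.sort(key=lambda p: p[1], reverse=True)
            paths = extended[:n_best]
        return max(paths, key=lambda p: p[1])[0]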
3.11 Applications to Wireless Communications
In this section, we show how the general approach proposed in Section 3.5 allows one to derive, in a unified manner, novel as well as known sequence detection algorithms. In particular, we focus on the case of sequence detection for phase-uncertain channels and fading channels. Various algorithms, depending on the considered channel model and the particular estimation/detection approach, will be derived.
3.11.1 Adaptive Sequence Detection: Preliminaries and Least Mean Squares Estimation
Broadly speaking, the channel model parameters can be time-varying (e.g., carrier phase, timing epoch and channel impulse response). A receiver based on the estimation-detection decomposition must be able to track these time variations, provided they are not too fast. The receiver must adapt itself to the time-varying channel conditions: the concept of PSP can then be successfully applied to the design of adaptive receivers.
• The per-survivor estimator associated with the best survivor is derived from data information which can be perceived as high-quality zero-delay decisions. PSP is thus useful in fast time-varying channels.
• Many hypothetical data sequences are simultaneously considered in the parameter estimation process. In this sense, acquisition without training (blind acquisition) may be facilitated.
Assume a parameter-conditional FSM model with state S_k (i.e., we are implicitly assuming that the FMC holds). In this case, it is possible to consider two types of PSP-based parameter estimation strategy: feed-forward and feedback.
• A PSP-based feed-forward data-aided parameter estimator at time k can be written as follows:
where p[·, ·] is a suitable function of the observations and coded symbols. The estimator is a function of the N most recent signal observations r_{k-N}^{k-1} (footnote 14: a CMP, implying an FMC, is implicitly assumed) and the per-survivor aiding data sequence c_{k-N}^{k-1}(S_k) associated with state S_k. Based on the computation of a parameter estimate, the branch metric associated with a transition at epoch k starting from state S_k can be written as follows:
and the update rule of the parameter estimate at epoch k + 1 becomes
Note that these estimates are simply recomputed for the new observation vector r_{k-N+1}^{k} and each new survivor sequence c_0^k(S_{k+1}). In this sense, they are
formally independent from the estimates at the previous epoch. Examples of applications, considered in the following, are the designs of linear predictive and noncoherent detection algorithms.
• We indicate a PSP-based feedback data-aided parameter estimate at time k as θ̂_k(S_k). Based on this estimate, the branch metric associated with a branch starting from state S_k is
and the update rule of the parameter estimate at epoch k + 1 can be written as
where p[·, ·] has the same meaning as in the feed-forward parameter estimate in (3.101), whereas q[·, ·] is a function of the n_fb previous feedback estimates. These estimates are computed for the N most recent observations r_{k-N+1}^{k} and each new survivor sequence c_0^k(S_{k+1}). The previous n_fb parameter values θ̂_{k-n_fb+1}^{k}(S_k) are those associated with the survivors of states S_k in the transitions (S_k → S_{k+1}) selected during the ACS step. Feedback parameter estimation is usually used in adaptive receivers [105]. Note that in both feed-forward and feedback parameter estimation, tentative decisions ĉ_0^k can be used in place of the survivor data sequences c_0^k(S_{k+1}) for updating the parameter estimate. If this is the case, then the parameter estimate becomes universal, i.e., identical for all survivors. Formally, the updating recursions yield identical estimates for all survivors. The parameter estimator becomes external to the detection block. During training the correct data sequence would be used. In the following, we consider a few significant examples of sequence detection algorithms using feedback parameter estimation.
Tracking of a Dispersive Time-Varying Channel
Assume the following model for a linearly modulated discrete observable (slow variation):
where: f_k = (f_{0,k}, f_{1,k}, ..., f_{L,k})^T is the overall time-varying discrete equivalent impulse response at the k-th epoch; c_k = (c_k, c_{k-1}, ..., c_{k-L})^T is a code sequence of
length L relative to the encoder/modulator FSM with state μ_k; and n_k is an AWGN sample with variance per component equal to σ². We define as S_k = (a_{k-1}, a_{k-2}, ..., a_{k-L}, μ_{k-L}) the system state including the encoder/modulator and the channel. The code symbol vector uniquely associated with the considered trellis branch (a_k, S_k), in accordance with the coding rule, is indicated as follows:
• Considering LMS adaptive identification, the parameter estimator updating rule can be written as
where d ≥ 1 represents a suitable delay and complies with the causality condition upon the data. The parameter β represents a compromise between adaptation speed and algorithm stability [105]. In this case, the branch metric can be expressed as
In the (tentative) decision-directed tracking mode, the recursive estimator updating rules are
In the case of PSP-based LMS adaptive identification, the branch metric becomes
where the channel estimate f̂_k(S_k) depends in this case on the survivor associated with state S_k. The channel estimate update recursion can be written as follows:
where
Figure 3.12: SER performance of uncoded QPSK transmission over a 3-tap dispersive fading channel and LMS-based adaptive detection. The normalized Doppler rate is f_D T = 1.85 × 10⁻³. Reproduced from [34], ©1995 IEEE, by permission of the IEEE. The parameter estimate update recursions must take place along the transitions (S_k → S_{k+1}) which extend the survivors of states {S_k}, i.e., those selected during the ACS step at time epoch k. As an example of application, we consider the case of transmission of uncoded quaternary PSK (QPSK) (M = 4) over a Rayleigh fading channel with 3 independent tap weights. The channel power delay profile (standard deviations of the tap gains) is proportional to (1, 2, 1). We consider data blocks of K = 60 symbols, with training preamble and tail. In Figure 3.12, the performance, in terms of symbol error rate (SER) versus signal-to-noise ratio (SNR), in the case of a normalized Doppler rate f_D T = 1.85 × 10⁻³ is shown. In particular, in the 1.8 GHz band this Doppler rate corresponds to (i) 1/T = 24.3 kHz in the case of a speed of 32.5 km/h and to (ii) 1/T = 270.8 kHz for a speed equal to 300 km/h. Note that the results shown in Figure 3.12 refer to the case of full-state sequence detection with Q = L = 2 (ξ = 16 states). In Figure 3.13, the performance in the case of a normalized Doppler rate f_D T = 3.69 × 10⁻³ is shown. In particular, in the 1.8 GHz band this Doppler rate corresponds to (i) 1/T = 24.3 kHz in the case of a speed of 65 km/h and to (ii) 1/T = 270.8 kHz for a speed equal
Figure 3.13: SER performance of uncoded QPSK transmission over a 3-tap dispersive fading channel and LMS-based adaptive detection with dual diversity. The normalized Doppler rate is f_D T = 3.69 × 10⁻³. to 600 km/h. In this case as well, a full-state receiver is considered and dual diversity is used.
Joint Detection and Phase Synchronization
The problem of optimal detection of a possibly encoded information sequence transmitted over a bandpass AWGN channel is commonly approached by trying to approximately implement coherent detection. In applications in which a coherent phase reference is not available, this approximation is based on the use of a phase synchronization scheme, which extracts a phase reference from the incoming signal, in conjunction with a detection scheme which is optimal under the assumption of perfect synchronization. Since the reconstructed phase reference is only an approximation of the correct one, the overall detection scheme is only an approximation of ideal (i.e., with perfect phase reference) coherent detection. Although widely adopted, this solution should be regarded as just an ad hoc heuristic procedure based on one possible logical approach and will be referred to as pseudocoherent. In this subsection, we examine a simple strategy characterized by PLL-based estimation.
The model of the linearly modulated discrete observable is the following (valid in the case of slow variation):
where: θ_k is a channel-induced phase rotation; {c_k} is the coded/modulated sequence generated by an encoder/modulator modeled as an FSM with state μ_k; and {n_k} is an iid Gaussian noise sequence with variance per component equal to σ². In this case, the feedback parameter estimation is based on the use of a first-order data-aided PLL and reads as follows:
where the parameter η controls the loop bandwidth. In the following, we briefly recall two possible phase tracking strategies: decision-directed and PSP-based.
• In the case of decision-directed phase tracking, the branch metric can be written as follows:
where c_k(a_k, μ_k) is the code symbol associated with transition (a_k, μ_k). The PLL phase-update (feedback) recursion is
The tentative decision delay d must comply with the causality condition upon the detected data, which implies d ≥ 1.
• In the case of PSP-based phase tracking, the branch metric can be written as follows:
The phase estimate update recursion in this case is
where c_k(a_k, μ_k) is the code symbol associated with transition (a_k, μ_k). The phase estimate update recursions must take place along the transitions (μ_k → μ_{k+1}) which extend the survivors of states {μ_k}, i.e., those selected, for each state μ_{k+1}, during the ACS step at the trellis section at epoch k.
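The following fragment is a rough illustration of a first-order data-aided PLL of the kind described above, tracking a slowly varying phase along a single symbol stream; the loop gain η and the phase-error detector Im{r_k c_k* e^{-jθ̂_k}} are plausible choices for a sketch and are not claimed to reproduce the exact recursions of the text.

    # Illustrative first-order data-aided PLL on the model r_k = c_k exp(j theta_k) + n_k.
    import numpy as np

    def pll_track(r, c_hat, eta):
        theta_hat = np.zeros(len(r) + 1)
        for k in range(len(r)):
            # Phase-error detector and first-order loop update (assumed forms).
            error = np.imag(r[k] * np.conj(c_hat[k]) * np.exp(-1j * theta_hat[k]))
            theta_hat[k + 1] = theta_hat[k] + eta * error
        return theta_hat

    # Small self-contained example: QPSK symbols, slow linear phase drift.
    rng = np.random.default_rng(0)
    c = np.exp(1j * (np.pi / 2) * rng.integers(0, 4, 500))
    theta = 0.002 * np.arange(500)                          # slow phase ramp
    r = c * np.exp(1j * theta) + 0.05 * (rng.standard_normal(500) + 1j * rng.standard_normal(500))
    print(np.abs(pll_track(r, c, eta=0.1)[-1] - theta[-1]))  # small residual phase error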
Figure 3.14: SER performance of PSP-based detection of TCM with 8-PSK. For comparison, the performance of a conventional data-aided receiver (with d = 2) is also shown. Reproduced from [34], ©1995 IEEE, by permission of the IEEE. As an example of application, we consider the transmission of TCM over a channel introducing a phase noise modeled as a Wiener process according to
where Δ_k are Gaussian, iid random variables with standard deviation σ_Δ.
i.e., the usual differential encoding rule for QPSK modulations applied to symbols {p_n}. The branch metric (3.126) must be expressed in terms of the information symbols in such a way that the VA also implements differential decoding. Multiplying the terms in (3.126), including Σ_{i=1}^{N} r_{n-i} c*_{n-i}, by |q_n| = 1, and noting that (see footnote 16)
Footnote 16: Symbols μ_n, p_n, q_n are defined by this differential encoding rule applied to the data sequence {a_n}.
and |c_n| = |a_n| = |μ_n|, from (3.126) one can immediately obtain:
where the trellis state is evidently defined as
The number of states ξ = M^N/4 depends exponentially on the integer N. Complexity reduction techniques may also be adopted. We only consider schemes with a number of states ξ' = 4 (i.e., the reduced state is defined as w_n = μ_n) and ξ' = 1 (i.e., symbol-by-symbol detection with decision feedback is performed). Figure 3.15 shows the relevant performance, obtained by computer simulation, for various values of N and compares it with that of optimal coherent detection. It may be observed that the proposed receivers exhibit a loss of only 0.5 dB at a BER of 10⁻⁵ with respect to coherent detection, with affordable complexity. For larger complexity, the performance approaches that of coherent detection. We also note that the performance tends to that of coherent detection at a rate which is independent of the SNR. We then consider the application of NSD to TCM. Two 8-state trellis coded (TC) 16-QAM schemes, optimal under coherent detection, are considered: the first is a nonrotationally invariant (NRI) code [9] and the second a 90° rotationally invariant (RI) code [57]. The RI code is used in conjunction with a differential encoder and, in the proposed algorithms, the VA also implements differential decoding. Various noncoherent receivers with different complexities have been analyzed. Figure 3.16 shows the performance of the considered receivers along with that of coherent detection. Receivers based on the code trellis (ξ' = 8) exhibit a performance loss of about 1.5 dB (for N = 8), but with an increase of the number of states to ξ' = 64 the performance loss becomes negligible. The state of the receivers with ξ' = 16 is defined by a complete representation of the previous information symbol a_{n-1} and a partial representation of symbol a_{n-2}. The proposed approach can also be extended to the case of ISI channels. In fact, proceeding as in [112], an alternative set {z_k} of sufficient statistics can be obtained through a whitened matched filter (WMF) whose output may be expressed as
Figure 3.15: BER of NSD detection schemes for 16-DQAM with various degrees of complexity. Reproduced from [47], ©1999 IEEE, by permission of the IEEE. in which {n_k} are Gaussian zero-mean independent complex random variables with variance per component equal to σ² = N_0 and
where L is a channel memory parameter. The equivalent discrete-time channel impulse response {f_k} can be obtained from the sequence {g_k} following the method described in [112], where g_k = g(kT) represents a sample of the overall channel impulse response g(t). Reasoning in the same way as in the absence of ISI, it can be immediately concluded that, in this case, the branch metric can eventually be written as follows:
As an example of application, we consider the case of transmission of 16-DQAM over two ISI channels with L = 2. The receiver is composed of a WMF and a VA based on branch metric (3.133). The considered channels correspond to the following overall discrete impulse responses at the output of the WMF: (f_0, f_1, f_2) =
Figure 3.16: BER of NSD detection schemes for 8-state TC-16-QAM. Reproduced from [47], ©1999 IEEE, by permission of the IEEE. (0.3, 0.9, 0.3) for Channel 1 and (f_0, f_1, f_2) = (0.408, 0.816, 0.408) for Channel 2 [17]. The optimal coherent receiver is characterized by S_c = 256 states; in this case the ISI channel is the underlying FSM. In Figure 3.17, the performance of the optimal coherent receiver for both channels is shown and compared with the performance of the proposed NSD schemes with various values of N. For a given value of N, the number of states resulting from (3.133) is ξ = M^{N+L}/4 = 4 × 16^{N+1}. In the simulations, we used the described state-complexity reduction techniques and considered a number of states ξ' = 256, i.e., the same number of states as the optimal coherent receivers for the two channels. The state of these noncoherent receivers is defined by a complete representation of the symbols a_{n-1} and a_{n-2}. Despite the constraint on state complexity, for increasing values of N the BER tends to approach that of coherent detection. State-complexity reduction has also been considered, both in noncoherent and in coherent receivers. In the figure, the performance for ξ' = 16 states is shown for the considered channels. The state definition takes into account a complete representation of the single symbol a_{n-1}. We then have a two-fold comparison between a coherent receiver and an NSD-based receiver. With reference to the proposed noncoherent receivers, the figure shows their performance loss, under an equal state-complexity constraint, and their complexity increase, under an approxi-
Figure 3.17: BER of the proposed detection schemes for 16-DQAM on the two considered ISI channels and various values of N. The noncoherent detectors search a trellis with ξ' = 256 states. Reproduced from [47], ©1999 IEEE, by permission of the IEEE. mate equal-performance constraint. As stated in Chapter 1, it is possible to derive an algorithm considering a particular channel model (which usually simplifies the design) and then apply the same algorithm to other channel models, which must obviously be related to the original channel model and somehow compatible with it. Owing to the FMC applied in the derivation of the proposed detection algorithm for phase-noncoherent channels, these detection algorithms can also be applied in the case of dynamic channel conditions. As an example, the performance under dynamic channel conditions has also been investigated, assuming the transmitter and receiver filters have square root raised-cosine frequency response with roll-off 0.5. Two types of time-varying phase model are jointly considered (a small simulation sketch of these impairments is given below).
• The first is the well-known stochastic model of phase jitter. Accordingly, the phase of the received signal is modeled as a time-continuous Wiener process θ(t) with incremental variance over a signaling interval equal to σ_Δ².
• The second is a deterministic model of a frequency offset Δf.
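For reference, the two impairments can be emulated as in the following sketch (illustrative code, not from the book): a Wiener phase process with per-symbol standard deviation σ_Δ and a deterministic frequency offset normalized to the symbol rate.

    # Illustrative generation of matched-filter output samples affected by a
    # Wiener phase jitter (per-symbol standard deviation sigma_delta, radians)
    # and a deterministic normalized frequency offset df_T = delta_f * T.
    import numpy as np

    def impaired_observations(c, sigma_delta, df_T, sigma_noise, rng):
        K = len(c)
        jitter = np.cumsum(sigma_delta * rng.standard_normal(K))   # Wiener phase process
        offset = 2 * np.pi * df_T * np.arange(K)                   # linear phase ramp
        noise = sigma_noise * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
        return c * np.exp(1j * (jitter + offset)) + noise

    rng = np.random.default_rng(1)
    c = np.exp(1j * (np.pi / 2) * rng.integers(0, 4, 1000))        # QPSK code symbols
    r = impaired_observations(c, sigma_delta=np.deg2rad(5.0), df_T=1e-2, sigma_noise=0.1, rng=rng)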
Figure 3.18: BER of the proposed receiver for DQPSK with N = 5 and ξ' = 1 for various values of the phase jitter standard deviation (white marks) and frequency offset (black marks). Reproduced from [47], ©1999 IEEE, by permission of the IEEE. The proposed noncoherent receivers are robust to phase jitter and frequency offset, as may be observed in Figure 3.18 for differential QPSK (DQPSK), N = 5 and ξ' = 1. A jitter standard deviation up to 5 degrees (per signaling interval) does not degrade the receiver performance significantly. Values of the frequency offset Δf, normalized to the symbol frequency 1/T, up to Δf T = 10⁻² do not entail appreciable degradation. The proposed approach can also be extended to the case of transmission over a channel affected by frequency uncertainty, besides phase noise. In this case, the assumed model is depicted in Figure 3.19, where θ is modeled as a random variable with uniform distribution in [0, 2π), while the frequency offset ν is assumed deterministic. We consider the case of linearly modulated signals, so that the information-bearing signal can be written as
where: K denotes the number of transmitted code symbols; T is the signaling interval; and h(t) is a properly normalized shaping pulse. Assuming perfect symbol
Figure 3.19: System model in the case of a channel with phase and frequency uncertainty. synchronization and absence of frequency offset, the output, sampled at time t = kT, of a filter matched to the shaping pulse h(t) is a sufficient statistic for this detection problem. If a moderate frequency offset is present, this is not true in a strict sense; however, this sampled output may still be considered as an approximate sufficient statistic. This assumption is commonly used in the derivation of frequency estimation algorithms (see [113,114] and references therein, and footnote 17). Assuming that a genie tells us the value of the (unknown but deterministic) frequency offset, one could use the following metric, obtained by directly extending the metric in (3.125):
where σ² is the variance per component of the AWGN sample. In particular, it is possible to consider an estimate based on the L most recent observations r_{k-L}^{k} and code symbols c_{k-L}^{k}, so that we can write:
Footnote 17: For larger values of the frequency offset ν, a different front-end, possibly based on oversampling techniques, could be used [94].
and rewrite (3.135) as follows:
The frequency estimate ν̂ may be obtained, for instance, from ML data-aided estimation based on the algorithm proposed in [115]. However, in general, a simpler data-aided algorithm may be used. We consider some of the algorithms described in [114], namely those in [116] and in [113]. In the technical literature, these algorithms have been used for M-ary PSK (M-PSK) signals [113,114]. They may be easily extended to the general case of non-equal-energy signals such as M-ary QAM (M-QAM) [117]. In particular, in [117] the starting point for the derivation of this class of algorithms is the likelihood function for joint data detection and frequency estimation, which can be written as
where: c = c_0^{K-1} is the code sequence uniquely associated with the information sequence a by the given coding rule; ν denotes a trial value of the frequency offset; and r_k = r(t) ∗ h(-t)|_{t=kT} is the matched filter output, sampled at time t = kT. The likelihood function (3.138) is similar to that obtained in [47] in the absence of frequency offsets; the likelihood function considered in [47] may be obtained by letting ν = 0 in (3.138). A property of interest of the likelihood function (3.138) is now described. Let us assume that two code sequences c^(1) and c^(2), corresponding to distinct information sequences a^(1) and a^(2), respectively, are such that c_k^(2) = c_k^(1) e^{j(2πν_0 kT + θ)}, where ν_0 is some frequency value and θ is some phase rotation. The structure of the likelihood function (3.138) implies that
A consequence of this property is that if the coding rule admits such sequences, a decoding strategy based on (3.138), directly or by means of some approximations, will not be able to distinguish between the information sequences a^(1) and a^(2). For generality, the concept of phase rotation of order l may be introduced. A time-varying phase rotation φ_k is said to be of order l if it may be expressed as
where φ_0, ..., φ_l are suitable constants. The above property (3.139) may be reformulated in the following terms: two code sequences which differ by a phase rotation of the first order are indistinguishable. For this reason, a code which admits code sequences that differ by a first-order phase shift is catastrophic when decoded by means of the strategy entailed by (3.138). Note that this problem resembles, in a higher order sense, the characteristic of noncoherent receivers of being unable to distinguish code sequences which differ by a phase rotation of zero-th order [47]. Although the usual differential encoding (DE) rule solves this zero-th order ambiguity, it is catastrophic in a first order sense. In fact, DE has the property that two code sequences which differ by a zero-th order phase rotation are associated with the same information sequence, but code sequences which differ by a first order phase rotation may derive from different information sequences. The proposed detector with branch metric (3.135) may be affected by the catastrophic property (3.139). In order to overcome this problem, two possible solutions can be derived by extending simple countermeasures for the zero-th order ambiguity problems of noncoherent receivers to the first order case. An example of indistinguishable sequences in a noncoherent receiver for uncoded QPSK modulation is shown in Figure 3.20(a). Sequence a^(2) is a rotation of sequence a^(1) by π/2 and cannot be distinguished. However, if the phase rotation θ is limited as |θ| < π/4, as in the case of the sequence {z_k} = {a_k e^{jθ}}, the sequence a^(1) can be correctly recovered. Alternatively, differential encoding/decoding associates code sequences which differ by a constant phase shift with the same information sequence, eliminating this ambiguity. In Figure 3.20(b), a first order analogy is shown. Sequence a^(2) is a first order rotation of sequence a^(1) with νT = 1/4 and cannot be distinguished (see footnote 18). However, if the frequency offset is limited to |νT| < 1/8, as in the case of sequence {z_k} = {a_k e^{j2πνkT}}, the ambiguity may be resolved. Alternatively, double differential encoding (DDE) with suitable differential decoding may be adopted [45,118]. These solutions are described in the following. The first solution (method 1) consists of limiting the frequency offset range and estimation interval. Denote by ψ_0 the angle of invariance of the considered constel-
Footnote 18: In (3.140), φ_1 = 2πν.
Figure 3.20: Examples of indistinguishable sequences: (a) noncoherent receiver (zero-th order); (b) advanced receiver with frequency estimation (first order). Reproduced from [117], ©2002 IEEE, by permission of the IEEE. lation. In the case of M-PSK signals, ψ_0 = 2π/M, whereas, in the case of square M-QAM, ψ_0 = π/2. As 2πνT is the phase rotation introduced by the channel in one signaling interval, when -ψ_0/2 < 2πνT < ψ_0/2 this phase rotation does not yield ambiguities. At the same time, this condition should also be satisfied by the estimate ν̂. To this purpose, each estimate such that |2πν̂T| > ψ_0/2 may be replaced by the closest value in the allowed interval. This method may only be applied for limited values of the frequency offset. This means that for QPSK the condition |νT| < 1/8 = 0.125 must be satisfied, whereas the allowed interval reduces to |νT| < 0.03125 for 16-PSK constellations. To avoid zero-th order ambiguities, DE
can be used. An alternative solution (method 2) consists of the use of DDE, possibly followed by a code invariant to first order rotations (see footnote 19). As a consequence, unlike the previous case in which a limitation of the estimation interval must be imposed at the receiver, this solution does not require any modification of the receiver operation (except for the fact that the new encoding rule must be taken into account). DDE is described for M-PSK modulations in [45, Chapter 8] and [118]. In this case, symbols {c_k} belonging to an M-PSK alphabet are derived from symbols {a_k}, belonging to the same alphabet, by means of the following DDE rule: In the case of M-QAM signals, quadrant DDE may be used. Expressing the generic information symbol of an M-QAM constellation as a_k = μ_k p_k, where μ_k belongs to the first quadrant and p_k ∈ {±1, ±j}, the encoded symbol c_k is given by c_k = μ_k q_k, where the symbols q_k ∈ {±1, ±j} are defined by the DDE rule for QPSK modulations applied to the symbols {p_k}, that is, according to the rule At this point it is possible to consider a PSP-based estimate of the frequency offset. In other words, the frequency estimate could be based on the survivor associated with a reduced state w_k, so that one can substitute ν̂ by a per-survivor estimate ν̂(w_k), defined as where the state w_k has to be properly defined and FS(w_k) indicates the survivor (see footnote 20) associated with the reduced state w_k. Note that in (3.136) the definition of FS(w_k) is very general, since, depending on the coding/modulation structure, the definition of the survivor could change. Moreover, this technique allows one to choose the parameters N and L independently of the number of states ξ of the VA. The number of reduced states ξ' may therefore be limited while retaining sufficiently large values of N and L. In order to compute the branch metric in an RS trellis, the necessary symbols not included or not completely specified in the reduced state definition w_k are found in the survivor history embedded in FS(w_k). The performance of the receiver based on branch metric (3.137) for DQPSK, N = 6, ξ' = 16, various values of L, and limitation of the estimation interval (method 1), is shown in Figure 3.21. The algorithm in [116] has been used for frequency
Footnote 19: These codes may be viewed as an extension of the usual RI codes [119]. Footnote 20: The terminology used to indicate the survivor, i.e., FS, recalls the fact that the recursion in the VA proceeds, time-wise, forwards. In Chapter 4 it will be shown that in an FB algorithm the backward recursion might rely on a backward survivor.
Figure 3.21: BER of the receiver based on (3.137) (white marks) for DQPSK and comparison with NSD (black marks) and coherent receivers. The frequency offset is ν = 0. Reproduced from [117], ©2002 IEEE, by permission of the IEEE. estimation, but no differences can be observed using more complex algorithms such as those in [113,115]. The optimal coherent receiver and an NSD receiver with N = 6 and ξ' = 16 are also considered for comparison. The frequency offset is ν = 0. It may be observed that, for increasing values of L, the performance of the NSD receiver may be approached, at least at high SNR. For L = 6, a loss of about 1 dB is exhibited at a BER of 10⁻⁴. This power loss reduces to 0.2 dB for L = 11 and is negligible for L = 15. In Figure 3.22, the performance of the proposed receiver based on branch metric (3.137) under dynamic channel conditions is evaluated. It is also compared with that of an NSD receiver in order to assess its robustness. The modulation format and the considered receiver are those of the previous figure, with L = 11. The transmit and receive filters have square root raised-cosine frequency response with roll-off 0.5 (footnote 21: under dynamic channel conditions, this specification is necessary as well). In addition to a frequency offset, a phase noise is considered, modeled as a time-continuous Wiener process with incremental standard deviation over a signaling interval equal to σ_Δ. The performance of the NSD receiver rapidly degrades for values of normalized frequency offset greater than 10⁻², whereas the proposed receiver is
Figure 3.22: BER of the receiver based on (3.137) (white marks) for DQPSK, N = 6, L = 11, and ξ' = 16. The performance of an NSD receiver (black marks) with N = 6 and ξ' = 16 is also shown for comparison. Reproduced from [117], ©2002 IEEE, by permission of the IEEE. practically unaffected by frequency offset up to 10⁻¹ and phase noise standard deviation up to 5 degrees. As already mentioned, in the case of DQPSK, the limiting value of |νT| is 0.125 (method 1). On the other hand, for higher values of |νT| the receiver performance would be limited, in any case, by the front-end bandwidth and the interference generated by the resulting mismatch. As previously noted, the algorithm used for frequency estimation is irrelevant. In the previous figures, receivers based on branch metric (3.137) and limitation of the estimation interval (method 1) for DQPSK were considered. Alternatively, method 2 can be used, but in this case DDE has to be employed. The different behavior of the receivers based on the two methods, as well as the robustness of the proposed schemes, is emphasized in Figure 3.23, in which the BER versus the normalized frequency offset νT, for E_b/N_0 = 10 dB, is shown for two receivers. The modulation format and the transmit and receive filters are those of the previous figure. The curve with white marks corresponds to the performance of the receiver based on branch metric (3.137) with N = 6, L = 6, and ξ' = 16 for DQPSK with limitation of the estimation interval (method 1). The curve with black marks corresponds to
Figure 3.23: BER at an SNR of 10 dB versus the normalized frequency offset for the receiver based on (3.137) for DQPSK and N = 6, L = 6, and ξ' = 16. Method 1 (white marks), based on a limitation of the estimation interval, and method 2 (black marks), based on DDE, are considered. The performance of an NSD receiver with N = 6 and ξ' = 16 is also shown for comparison. Reproduced from [117], ©2002 IEEE, by permission of the IEEE. a receiver based on the same branch metric, the same values of N, L, and ξ', and using DDE (method 2). Frequency offset values up to 10% of the symbol rate do not affect the receiver performance. We can also note that the performance of a receiver based on method 1 rapidly degrades when the normalized frequency offset exceeds the limiting value which, for QPSK, is 0.125. On the contrary, the receiver based on method 2 has a performance which degrades only slowly with the frequency offset, due to the front-end filter. The performance of the NSD receiver already degrades rapidly for a normalized frequency offset νT equal to 1%.
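To make method 2 concrete, the following sketch (an illustration consistent with the description of DDE above, not code from [45,118]) encodes an M-PSK sequence with two cascaded differential stages and checks that the resulting code sequence is insensitive to a first-order phase rotation θ + 2πνkT, apart from edge effects on the first two symbols.

    # Minimal sketch of double differential encoding (DDE) for M-PSK and a check
    # of its invariance to a first-order phase rotation.
    import numpy as np

    def dde_encode(a, b0=1.0, c0=1.0):
        # Two cascaded differential stages: b_k = b_{k-1} a_k, c_k = c_{k-1} b_k.
        b, c, out = b0, c0, []
        for a_k in a:
            b = b * a_k
            c = c * b
            out.append(c)
        return np.array(out)

    def dde_decode(c, b0=1.0, c0=1.0):
        # Invert the two stages: b_k = c_k conj(c_{k-1}), a_k = b_k conj(b_{k-1}).
        c_ext = np.concatenate(([c0], c))
        b = c_ext[1:] * np.conj(c_ext[:-1])
        b_ext = np.concatenate(([b0], b))
        return b_ext[1:] * np.conj(b_ext[:-1])

    rng = np.random.default_rng(2)
    M = 4
    a = np.exp(1j * 2 * np.pi * rng.integers(0, M, 50) / M)        # QPSK information symbols
    c = dde_encode(a)
    rotated = c * np.exp(1j * (0.7 + 2 * np.pi * 0.05 * np.arange(1, 51)))
    print(np.allclose(dde_decode(c), a))                # True: the encoding is invertible
    print(np.allclose(dde_decode(rotated)[2:], a[2:]))  # True: only the first two symbols feel the rotation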
3.11.3 Noncoherent Sequence Detection for Slowly Varying Frequency Nonselective Fading Channels
The system model is shown in Figure 3.24. An information sequence {a_n}, composed of iid symbols belonging to an M-ary alphabet, is mapped into a code sequence by means of some coding rule. We assume a burst-mode transmission with K code
Figure 3.24: System model for transmission over a frequency nonselective fading channel. Reproduced from [120], ©2000 IEEE, by permission of the IEEE. symbols per burst and denote with c = (c_0, c_1, ..., c_{K-1})^T a column vector whose elements are the code symbols transmitted in a burst. The code sequence is further mapped by a modulator into a time-continuous signal s(t, c) which is transmitted over a frequency nonselective slowly fading channel, represented by a multiplicative complex random variable f, modeled as Gaussian with mean η_f and variance σ_f². The transmitted signal also undergoes a phase rotation θ, modeled as a random variable with uniform distribution in the interval [0, 2π) and independent of f. Although both f and θ are assumed to be constant during the transmission of each data burst, the application of the CMP has the convenient feature of allowing one to remove the channel time-invariance assumption and encompass time-varying models. The choice of the value of η_f allows us to take into account both Rayleigh (η_f = 0) and Rice fading distributions, with Rice factor K_R = |η_f|²/σ_f². The channel also introduces AWGN of complex envelope n(t) with independent real and imaginary components, each with single-sided power spectral density σ². Assuming η_f = 1 and σ_f² = 0, one obtains the limiting case of an AWGN channel. Assuming that signal s(t, c) is linearly modulated, it may be easily shown that sampling the output of a filter matched to the shaping pulse, with one sample per signaling interval, yields a sufficient statistic for optimal detection of the information sequence (see footnote 22). In the absence of ISI, this sampled output may be expressed as
where {n_k} are iid complex noise samples with independent components, each with variance σ². In the following, the samples {r_n} will be referred to as observations.
Perfect Channel State Information
We now assume that the receiver has perfect channel state information (CSI), represented by the channel gain |f|. As an example, this knowledge may be well ap-
Footnote 22: This is not true in the case of a time-varying fading channel, where a sufficient statistic may require the use of oversampling [121].
proximated by means of a simple data-aided estimate based on a known preamble preceding the data burst. In the case of perfect CSI, an alternative sufficient statistic may be obtained by normalizing the received samples {r_n} by the known gain value. Using polar coordinates to express the fading coefficient as f = |f| e^{jα}, with respect to a channel which introduces only an unknown phase shift, considered in detail in Section 3.11.2, we now have two differences: (i) the phase shift is θ' and (ii) the variance of the noise samples is σ²/|f|². The random variable θ' has uniform distribution, regardless of the distribution of α. Under the final channel dependence approximation, the noise variance does not affect the receiver strategy. Therefore, the results in Section 3.11.2 still apply and the receiver, in the case of perfect CSI, is based on a VA with branch metric
where T_k = (a_k, S_k) is a transition in a suitably expanded trellis and N is the order of Markovianity of the channel.
Limited Channel State Information
We now turn our attention to the case characterized by the absence of CSI, in the sense that the receiver has statistical information about the fading gain only. Specifically, using polar coordinates to express the mean η_f of the random variable f as η_f = |η_f| e^{jθ_f}, we assume that the receiver knows the value |η_f|, the fading variance σ_f² and the noise variance σ². In principle, an estimate of these parameters could be obtained from the observation of a few preambles of consecutive bursts. However, it is shown in the following that in all significant cases estimation of these parameters is not necessary. Recalling (3.144), given a code vector c and a hypothetical phase θ, the conditional pdf of the observation vector r = (r_0, r_1, ..., r_{K-1})^T is obviously Gaussian:
where [·]^H is the transpose conjugate operator and
are the conditional mean and covariance matrix, respectively. Using equation (3.54) of [122], it is possible to express the inverse covariance matrix as
The following relation may also be easily verified
where det(·) denotes the determinant of a matrix. Substituting (3.152) and (3.153) into (3.149), after some tedious but straightforward manipulations, one obtains:
where, in the last step, ζ = σ²/2 ... the k-th signaling interval
and an incremental metric
The difficulty inherent in the incremental metric (3.159) is its unlimited memory. In fact, this metric depends on the entire previous code sequence. This implies that the maximization of the sequence metric may, in principle, be realized by a search of the path in a properly defined tree diagram. From an implementation viewpoint, approximate tree search algorithms must be used, unless a very short transmission length is assumed. In order to limit the memory of the incremental metric (3.159) we apply a CMP, which allows one to search a trellis diagram by means of a VA. To this end, in (3.159) we may consider only the most recent N + 1 observations; the
resulting approximate finite memory incremental, or branch, metric is
where S_k = (a_{k-1}, ..., a_{k-N}, μ_{k-N}) and the transition T_k = (a_k, S_k) uniquely identify the code symbols c_{k-N}^{k} used in (3.160). As already mentioned, the corresponding receiver is based on a VA, requires knowledge of the parameters |η_f|, σ_f² and σ², and depends only on the specific linear modulation format. For example, in the case of 16-QAM, it is easy to show that
where E_s is the average symbol energy. As an example of application, we consider approximate MAP sequence detection, based on linear prediction, of an eight-state rate-3/4 TCM with 16-QAM output modulation and 90 degree rotational invariance [57]. The performance of the system, shown in Figure 3.30, was investigated for a phase jitter standard deviation σ_Δ equal to 0 degrees and to 5 degrees. For comparison, the performance of the simplified algorithm proposed above was also evaluated. The number of states was kept fixed at 64 (Q = 2) in all linear predictive receivers. While for σ_Δ = 0 degrees the receiver with prediction order N = 7 asymptotically approaches the performance of the coherent system, for σ_Δ = 5 degrees the performance is appreciably worse. The case of a random constant channel phase can be derived from the Wiener
Figure 3.31: Prediction coefficients as a function of the phase noise standard deviation σ_Δ, for an equal-energy modulation, prediction order N = 4 and E_b/N_0 = 4 dB. Various values of the frequency offset intensity a are considered. Reproduced from [128], ©2003 IEEE, by permission of the IEEE.
phase process model previously considered by imposing σ_Δ = 0. Assuming equal-energy modulation, the Wiener-Hopf system can be written as R p = b, where the system matrix R is
I being the identity matrix and b = [1 ··· 1]^T. After a few simple manipulations, it follows that the solution of the Yule-Walker system consists of a sequence of identical prediction coefficients, i.e., p_i = p, i ∈ {1, ..., N}, where
Figure 3.32: BER as a function of the phase noise standard deviation σ_Δ for DQPSK, symbol-by-symbol decision, and various values of the frequency offset intensity a. Reproduced from [128], ©2003 IEEE, by permission of the IEEE. Hence, the estimate e^{jθ̂_k} becomes
and the corresponding branch metric can be written, in the case of equally probable information symbols, as follows:
In [129], it is shown that the basic metric used in [47] can be approximately written, in the case of equally probable information symbols, as
where
After simple manipulations, it can be immediately concluded that the branch metric (3.189) coincides with (3.188) in the case of equal-energy modulation. More generally, the metric (3.188) coincides with that proposed in [130]. Considering a generic modulation format (not necessarily with constant amplitude), for large SNR, i.e., σ²/|c_k|² → 0, the system matrix R in (3.185) can be approximated as follows:
which implies p_i ≈ 1/N, i ∈ {1, ..., N}. The same phasor estimate (3.187) (exact in the case of equal-energy modulation) can therefore be approximately used in this case (i.e., for any linear modulation format) as well.
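As a rough numerical illustration of the uniform-coefficient case just discussed, the following sketch builds the phasor estimate from the N previous modulation-stripped observations with equal weights p_i = 1/N and uses it in a Euclidean-type branch metric; the metric form is a plausible stand-in for (3.187) and (3.188), not a verbatim transcription of those equations.

    # Uniform-coefficient phasor estimate and a Euclidean-type branch metric
    # (illustrative forms, assumed rather than copied from the text).
    import numpy as np

    def phasor_estimate(r_past, c_past):
        # r_past, c_past: the N previous observations and hypothesized code symbols.
        z = np.mean(r_past * np.conj(c_past))      # equal weights 1/N
        return z / np.abs(z) if np.abs(z) > 0 else 1.0

    def branch_metric(r_k, c_k, r_past, c_past):
        return -np.abs(r_k - c_k * phasor_estimate(r_past, c_past)) ** 2

    # Example with QPSK and a constant unknown phase of 1 radian.
    rng = np.random.default_rng(3)
    c = np.exp(1j * (np.pi / 2) * rng.integers(0, 4, 10))
    r = c * np.exp(1j * 1.0) + 0.05 * (rng.standard_normal(10) + 1j * rng.standard_normal(10))
    N = 4
    print(np.angle(phasor_estimate(r[:N], c[:N])))  # close to the true phase of 1 rad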
3.11.5 Linear Predictive Sequence Detection for Frequency Flat Fading Channels
In the case of linear coded modulation, we assume that the discrete observable can be written as follows:

r_k = f_k c_k + n_k
where: {f_k} is a circular complex Gaussian random sequence; {c_k} is a code sequence; and {n_k} is an iid Gaussian noise sequence with variance per component equal to σ². We assume that the encoder/modulator can be modeled as an FSM with state μ_k, characterized by "output" and "next-state" functions as given in (3.3). The statistics of the observable, conditioned on the data, are Gaussian. More precisely, the conditional pdf of the observation at time epoch k can be written as follows:²⁴
²⁴ We are implicitly assuming that the starting state μ_0 of the encoder/modulator is known.
where the conditional mean and conditional variance are
In particular, the conditional mean and variance depend on the entire previous information sequence, so that the memory of the system is unlimited. By considering the CMP, one can conclude that
where the second equality follows immediately from the considered observation model in (3.192). The intuitive motivation is that "old" observations do not contribute much information regarding the current observation. If this condition were strictly met, the random sequence r_k would be Markov of order N, conditionally²⁵ on the data sequence. This Markov assumption is never verified in an exact sense for a realistic fading channel. Even assuming a Markov fading model, the thermal noise destroys the Markovianity of the observation. The quality of this approximation depends on the autocovariance sequence of the fading process {f_k} and on the value of N, which is an important design parameter. Upon the application of the CMP, we can concentrate on
The conditional mean and variance in (3.197), i.e.,
are, respectively, (i) the N-th order mean square prediction of the current observation r_k, given the previous N observations and the information sequence, and (ii) the relevant prediction error. Note the difference with respect to the previously introduced notation r̂_k(a_0^k) and σ_k²(a_0^k), which denote similar quantities given the entire previous observation history r_0^{k-1} (k-th order prediction at time k). For Gaussian random sequences, the conditional mean (i.e., the mean square estimate) is linear in the observations:
²⁵ Note that the CMP can be equivalently reinterpreted as a Markovianity condition of order N.
At time epoch k, the linear prediction coefficients p_{i,k} and the mean square prediction error σ² are the solution of the following (linear) matrix equation (Wiener-Hopf system):
where
The observation correlation matrix R_k(a_0^k) incorporates the dependence on the data sequence a_0^k, and the vectors p and q contain the unknowns. Given the flat fading model, the observation vector can be expressed as
where C_k^{(N)} = diag(c_{k-N}^k), f_{k-N}^k = (f_k, f_{k-1}, . . . , f_{k-N}) and n_{k-N}^k = (n_k, n_{k-1}, . . . , n_{k-N}). Hence, the observation correlation matrix is
where F = E{f_{k-N}^k (f_{k-N}^k)^H} is the fading correlation matrix, which does not depend on k under the assumption of stationary fading. This expression shows that the dependence of R_k(a_0^k) on the information sequence can be compacted²⁶ into the dependence on the code sequence c_{k-N}^k. Hence, considering the "usual" expanded state S_k = (a_{k-1}, . . . , a_{k-N}, μ_{k-N}), we can further write:
It can be immediately concluded that a similar dependence on the couple (a_k, S_k) characterizes the prediction coefficients, the conditional mean and variance, and the entire conditional statistics of the observation. More precisely, carefully analyzing the previously introduced Wiener-Hopf system, it is possible to write:

²⁶ It is possible to arrive at the same conclusion immediately after the application of the CMP, since a limited dependence on the observations implies a limited dependence on the information sequence as well, as shown in Chapter 2.
where the unnecessary time indexes can be dropped assuming a stationary fading regime. The resulting branch metric to be used in a VA is
The branch metric (3.212) makes use of the linear prediction r̂_k(S_k) of the current observation r_k, based on the previous observations and on path-dependent prediction coefficients. The derived linear predictive detection algorithm can be given the following interpretation. Based on the conditional Gaussian nature of the observation and on the CMP, we can concentrate on the Gaussian pdf p(r_k | r_{k-N}^{k-1}, a_k, S_k). The conditional mean r̂_k(a_k, S_k) and variance σ²(a_k, S_k) can be viewed as system parameters to be estimated, and the following steps should be considered.
1. Adopt a linear feed-forward data-aided parameter estimator of order N (see Section 3.11.1).
2. Use a set of estimators by associating one estimator with each trellis path.
3. Compute the estimation coefficients in order to minimize the mean square estimation error with respect to the random variable r_k, conditional on the path data sequence.
The resulting estimator is exactly the described path-dependent linear predictor. Linear prediction of r_k based on the previous observations is a form of PSP-based feed-forward parameter estimation. We obtained it naturally in the derivation of the detection algorithm.
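The following sketch (not the book's pseudo-code) illustrates how such a path-dependent predictor and branch metric could be evaluated numerically; the fading correlation matrix F, the noise normalization and all function and variable names are assumptions introduced only for illustration.

    import numpy as np

    def path_prediction(code_syms, fading_corr, noise_var):
        # code_syms: hypothesized code symbols (c_k, c_{k-1}, ..., c_{k-N}) along a path;
        # fading_corr: (N+1)x(N+1) fading correlation matrix F (assumed known);
        # noise_var: variance of the complex noise samples (an assumed normalization).
        C = np.diag(code_syms)
        R = C @ fading_corr @ C.conj().T + noise_var * np.eye(len(code_syms))
        r00, r0p, Rpp = R[0, 0].real, R[0, 1:], R[1:, 1:]
        coeffs = np.linalg.solve(Rpp, r0p.conj())        # MMSE prediction coefficients
        err_var = r00 - np.real(r0p @ coeffs)            # mean square prediction error
        return coeffs, err_var

    def branch_metric(obs, code_syms, fading_corr, noise_var):
        # obs: observations (r_k, r_{k-1}, ..., r_{k-N}) along the hypothesized path.
        coeffs, err_var = path_prediction(code_syms, fading_corr, noise_var)
        r_hat = np.vdot(coeffs, obs[1:])                 # linear prediction of r_k
        return np.abs(obs[0] - r_hat) ** 2 / err_var + np.log(err_var)

One such evaluation would be performed per trellis branch, with the code symbols taken from the hypothesized transition and, under state reduction, partly from the survivor of the reduced state.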
It is possible to obtain an alternative formulation of the branch metric. In fact, the observation prediction can be expressed as follows:
where c_{k-i}(S_k) indicates the coded symbols associated with state S_k. In particular, f̂_k(S_k) denotes a state-dependent linear prediction of the fading coefficient at time k, based on previous observations.²⁷ The coefficients {p_i^f(S_k)} are state-dependent prediction coefficients of the fading process, based on previous observations of the noisy fading-like state-dependent sequence {r_{k-i}/c_{k-i}(S_k)}_{i=1}^N. The mean square prediction errors of the observation and of the fading coefficient are similarly related as follows:
The branch metric can then be expressed as
In particular, the fading prediction coefficients p_i^f(S_k) and the mean square prediction error σ_f²(a_k, S_k) are the solution of the following Wiener-Hopf equation:
²⁷ Note that in the case of application of state reduction techniques, some of the symbols c_{k-i} have to be retrieved from the survivor associated with the reduced state replacing S_k. In this case, the linear predictions of the fading coefficients can be interpreted as path-dependent. The same comment holds also for the prediction coefficients {p_i^f(S_k)}.
where
The dependence of the solution on the hypothetical sequence is only through the moduli of the code symbols. For code symbols with constant modulus, the prediction coefficients do not depend on the state S_k and on the symbol a_k. For instance, this is the case for PSK, i.e., |c_k| = 1. In this case, the branch metric simplifies as follows:
This solution is remarkably similar to what we would obtain in a decomposed estimation-detection strategy by estimating the "undesired" parameter f_k according to PSP. Note that the (Gaussian) prediction error variance σ_f² affects the "overall" thermal noise power. It is possible to interpret the designed algorithm as embedding linear predictive estimation of the fading coefficient f_k. In particular, the observation model r_k = c_k f_k + n_k satisfies a parameter-conditional FMC by viewing f_k as an undesired parameter. In order to estimate this parameter, the following steps should be considered.
1. Adopt a linear feed-forward data-aided parameter estimator of order N (see Section 3.11.1).
2. Use a set of estimators by associating one estimator with each trellis path.
3. Compute the estimation coefficients in order to minimize the mean square estimation error with respect to the random variable r_k/c_k, conditional on the path data sequence.
The resulting estimator is exactly the described state-dependent linear predictor. Hence, linear prediction of f_k based on previous observations is a form of PSP-based feed-forward parameter estimation. The state complexity of a linear predictive receiver can be naturally decoupled from the prediction order N by means of state reduction techniques. For the sake of simplicity, we consider folding by memory truncation, but set partitioning could be used as well. Let Q < N denote the memory parameter to be taken into account in the definition of a reduced trellis state, so that a reduced state w_k can be written as follows:
The branch metric can be obtained by defining a pseudo-state²⁸ as
where (a_{k-Q-1}(w_k), . . . , a_{k-N}(w_k)) and μ_{k-N}(w_k) are information symbols and a state in the survivor of state w_k, respectively. The branch metric in the RS trellis can be defined as usual according to
For coded PSK, the branch metric is
The prediction order N and the RS parameter Q are design parameters to be jointly optimized by experiment to yield a good compromise between performance and complexity. As an example, we consider the case of uncoded QPSK signaling (the cardinality of the information symbols is M = 4) over a time-varying flat fading channel. The BER performance, as a function of the SNR, is shown in Figure 3.33. The normalized maximum Doppler rate is indicated as f_D T. In particular, we consider a prediction order N = 10 and a reduced-state parameter Q = 2 (16 possible reduced states). Pilot symbols are periodically inserted in the transmitted sequence (the insertion rate is 1 every 9 data symbols). For comparison, the curve relative to the case of perfect channel state information (i.e., when the fading process is perfectly known at the receiver side) is shown.

²⁸ Note that the formulation in (3.225) is very general and accounts for the case of a recursive state as well. In the case of a nonrecursive state, a simplified formulation is possible.

Figure 3.33: BER of a linear predictive receiver for transmission of QPSK over a time-varying flat Rayleigh fading channel. Various values of the normalized maximum Doppler rate are considered. Reproduced from [124], ©1995 IEEE, by permission of the IEEE.
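For constant-modulus code symbols, the fading prediction coefficients can be precomputed once from the fading statistics. The sketch below does this for an assumed first-order fading autocorrelation E{f_k f_{k-n}^*} = p^n (the same kind of model used in the simulations later in this chapter); the autocorrelation choice, the noise normalization and all names are illustrative assumptions, not the book's.

    import numpy as np

    def psk_fading_prediction_coefficients(N, p, noise_var):
        # With |c_k| = 1, the normalized observations r_{k-i}/c_{k-i} = f_{k-i} + n_{k-i}/c_{k-i}
        # have a correlation matrix that does not depend on the hypothesized path,
        # so a single set of coefficients serves every trellis state.
        idx = np.arange(1, N + 1)
        R = p ** np.abs(idx[:, None] - idx[None, :]) + noise_var * np.eye(N)
        b = p ** idx.astype(float)        # correlation between f_k and the past observations
        coeffs = np.linalg.solve(R, b)
        err_var = 1.0 - b @ coeffs        # mean square fading prediction error
        return coeffs, err_var

    print(psk_fading_prediction_coefficients(N=4, p=0.99, noise_var=0.1))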
3.11.6 Linear Predictive Sequence Detection for Frequency Selective Fading Channels

A frequency selective fading channel can be visualized as a time-varying ISI channel. In particular, if the channel response is time-varying according to an arbitrary, but known, stochastic model, we assume that the detector has knowledge of the underlying statistics. In the following, we apply the CMP to derive sequence detection algorithms for signals transmitted over frequency selective fading channels with arbitrary statistics (i.e., amplitude distribution and correlation). No assumption of Gaussian fading or additive noise is therefore made a priori, although specific solutions are proposed for the case of Rayleigh fading channels. The final discrete-time expression (A.32) for the observable derived in Appendix A can be rewritten by making explicit the indexing of the overall symbol period and of the sampling instants inside each period:
Defining the following quantities²⁹
one can compactly indicate the oversamples relative to the k-th signaling interval as follows: The CMP extends naturally to the case of oversampling, provided that a vector notation is used. In other words, one can write: where N is the order of Markovianity, C is the finite memory parameter given by C = N + L, and r_{k-N}^{k-1} is a β(N + 1) × 1 column vector defined as
In other words, in the case of oversampling we assume that the set of β conditional observations relative to a signaling interval depends only on the N most recent previous sets. Hence, the branch metric formally has the general expression given by (3.17). By defining S_k = a_{k-C}^{k-1}, one can write:
²⁹ Note that while the observation r_k is a vector whose elements are the oversamples considered in the k-th signaling interval, the information sequence {a_k} is such that there is only one symbol per signaling interval.
Note that (3.236) holds for any fading or noise statistics, as long as the CMP is satisfied. This metric could thus be used for non-Gaussian fading (e.g., Nakagami fading) and non-Gaussian additive noise (e.g., impulsive noise or noise generated by interfering signals). In the important case of Rayleigh fading, the conditional observation and the relevant pdfs are multivariate zero-mean Gaussian and further simplifications are possible. Defining the following covariance matrices:
simple algebra leads one to the following expression for the branch metric (3.236):
We note that the matrices R_{k-N}^k(a_{k-C}^k) and R_{k-N}^{k-1}(a_{k-C}^{k-1}) are actually independent of the time epoch k if the channel and noise are stationary. In the considered case of a Rayleigh fading channel, the branch metric (3.239) can also be given a different equivalent formulation. In fact, by applying the chain factorization rule to the conditional pdf (3.234) over the corresponding samples of a vectorial observation r_k = [r_{k,1}, . . . , r_{k,β}]^T, one can write:
Each conditional observation sample has Gaussian distribution, so that one can further rewrite (3.240) as follows:
where r̂_{k,i|k,i-1}(r_{k,i-1}, . . . , r_{k,1}, r_{k-N}^{k-1}, a_{k-C}^k) and σ²_{k,i|k,i-1}(a_{k-C}^k) indicate the conditional mean and variance, respectively, and can be evaluated by solving a suitable MMSE problem.³⁰ We remark that only the conditional mean depends on both observations and information symbols, whereas the conditional variance is indeed independent of the observations [131]. The complexity of the proposed algorithms can be straightforwardly reduced by applying the state reduction techniques introduced in Section 3.10.1. In particular, considering the application of memory truncation techniques, it is possible to define a reduced state as
where Q ∈ {1, . . . , L} represents the RS parameter. In order to compute the branch metric (3.241), the information symbols (a_{k-L}, . . . , a_{k-Q-1}) are recovered from the survivor (based on stored decision feedback) associated with the reduced state s_k. For Rayleigh fading, assuming that the tap coefficients {f_{i,n}}_{i=0}^L in (A.32) obey an autoregressive moving average (ARMA) model, the conditional mean and conditional variance which appear in (3.241) may be computed recursively using a bank of Kalman filters, as shown in [132, 133]. Although these algorithms may be operated with trellis diagrams of various state complexities, this aspect is only implicit in [132, 133], where the ISI trellis diagram is often used. These detection schemes based on per-survivor Kalman channel estimation may be shown to coincide with the proposed detection algorithm in the case Q = L, if the CMP expressed by (3.234) holds. As a matter of fact, a Kalman filtering technique is hidden in the proposed algorithms, as shown in [121]. We now present a performance analysis of the proposed detection algorithms for frequency selective fading channels. The assumed performance measure is the average BER versus E_b/N_0, where E_b denotes the received signal energy per information bit, averaged over the data and channel statistics. The assumed receiver filter is a root Nyquist filter regardless of the selected oversampling factor. In the case of an oversampled front-end, a root Nyquist filter frequency response with vestigial symmetry around 1/(2T) satisfies both (A.19) and (A.20) if
where δ is the roll-off factor of the receiving filter.

³⁰ Note that for an optimal filtering approach, the dependence on the entire observation sequence should be considered. However, in (3.241) we are already considering the application of the CMP.

The overall time-discrete channel is modeled according to (A.32), in which the span of the ISI is set to L = 2 and the time-varying channel tap coefficients {f_{i,n}}_{i=0}^L are assumed to be characterized by the following first-order ARMA model:
where p is a forgetting factor and {z_{i,n}}_{i=0}^L are independent, white, complex Gaussian random processes with independent real and imaginary components. This simple time-discrete model is suitable both for symbol-spaced sampled detectors (β = 1) and for oversampled detectors (β > 1). However, the received signal is not strictly bandlimited, and the observation sequence is only an approximate sufficient statistic, even for β > 1. The correlation between fading samples whose distance in time is a symbol interval is always p, regardless of the selected oversampling factor β. The simulations are performed assuming that (A.31) is met for β = 1 and neglecting the cross-correlation among taps for β > 1. The latter assumption is known to be penalizing with respect to the case in which this cross-correlation is taken into account [133]. The standard deviation of the tap gains (f_{0,n}, f_{1,n}, f_{2,n}) is set to the values (0.407, 0.815, 0.407). Different values of p have been considered (the smaller the parameter p, the faster the corresponding channel variations), along with two different modulation formats: binary phase shift keying (BPSK, M = 2) and QPSK (M = 4). An information block length of K = 60 symbols is assumed, with a known preamble and postamble of 2 symbols.³¹

³¹ The presence of the preamble and postamble is expedient for the initialization and the termination of the VA. More details can be found in [121].

In Figure 3.34, we show the BER of the blind recursive receiver employing one sample per symbol interval (β = 1) as a function of the overall assumed memory C, for BPSK modulation. This figure is intended to validate the CMP, which allows us to factorize the conditional pdf. Since L = 2 is fixed because of the assumed channel model and since we consider a full-complexity solution (Q = L), the figure allows us to infer empirically the value of the channel dependence parameter N. We can see that, for the considered values of p, the BER levels out for increasing values of N. In fact, from an order of Markovianity equal to N = 8 (C = 10) onwards, the BER curves exhibit an error floor, indicating that no further BER improvement is achievable by increasing the assumed order of Markovianity N.

Figure 3.34: BER of the blind recursive detector with β = 1 for E_b/N_0 = 40 dB and BPSK as a function of the assumed memory N. Reproduced from [121] by permission of John Wiley & Sons.

Figure 3.35 shows the performance curves of the blind recursive detector with β = 1 for QPSK modulation and p = 0.998. One can notice that a larger (but not dramatic) performance loss, with respect to BPSK, is exhibited by the reduced-complexity receiver (compare C = 5, Q = 5, 4, 3). Furthermore, for an equal number of trellis states (Q = 5), the use of state reduction techniques allows one to improve the performance significantly by increasing the assumed channel memory from C = 5 to 6 and 7.

In Figure 3.36, we present a performance comparison between symbol-spaced sampled and oversampled blind detectors, for BPSK modulation and p = 0.99. The oversampled receivers employing two samples per symbol interval (β = 2) exhibit an error floor which is remarkably lower than that of the corresponding symbol-spaced sampled receivers. Oversampling the received signal may also offer a favorable trade-off between performance and computational load. For example, we can compare the curve obtained with C = Q = 6 and β = 2 with that obtained with C = Q = 7 and β = 1: the oversampled receiver performs slightly better. This is due to the fact that, even if it has half the number of states, its branch metric makes use of matrices and observation vectors which have double the length of those of a symbol-spaced receiver.
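The time-varying taps used in such simulations can be generated with a few lines of code. The sketch below assumes the first-order autoregressive form f_{i,n} = p·f_{i,n-1} + sqrt(1 - p²)·σ_i·z_{i,n} with unit-variance circular complex z_{i,n}, chosen so that each tap keeps the standard deviation quoted above; this scaling and all names are illustrative assumptions, not the book's exact model.

    import numpy as np

    def generate_ar1_taps(num_symbols, p=0.998, tap_std=(0.407, 0.815, 0.407), seed=0):
        # Generate L+1 time-varying channel taps with a first-order autoregressive model.
        rng = np.random.default_rng(seed)
        n_taps = len(tap_std)
        std = np.array(tap_std)
        f = np.zeros((n_taps, num_symbols), dtype=complex)
        # Start from the stationary distribution of each tap.
        f[:, 0] = std * (rng.standard_normal(n_taps) + 1j * rng.standard_normal(n_taps)) / np.sqrt(2)
        for n in range(1, num_symbols):
            z = (rng.standard_normal(n_taps) + 1j * rng.standard_normal(n_taps)) / np.sqrt(2)
            f[:, n] = p * f[:, n - 1] + np.sqrt(1 - p ** 2) * std * z
        return f

    taps = generate_ar1_taps(num_symbols=64)
    print(np.std(taps, axis=1))  # close to (0.407, 0.815, 0.407) for long runs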
Figure 3.35: BER versus E_b/N_0 of the blind recursive detector with β = 1 for QPSK modulation and p = 0.998. Reproduced from [121] by permission of John Wiley & Sons.

3.12 Summary

This chapter has been devoted to MAP sequence detection. After describing this concept, the Viterbi algorithm, the most famous and successful detection algorithm to date, has been described and shown to exactly implement MAP sequence detection. Finite memory sequence detection has been introduced as an instance of the general approach presented in Chapter 2. After a discussion of the relation between detection and estimation, a few preliminary examples of data-aided parameter estimation and joint detection and estimation have been considered. The concept of per-survivor processing has been presented, along with examples of application to phase-uncertain channels and dispersive slow fading channels. This concept turns out to be related, explicitly or implicitly, to several detection algorithms considered in this book. In order to limit the complexity of finite memory sequence detection, state reduction techniques for detection strategies based on the Viterbi algorithm have been introduced. Finally, several applications to wireless communications have been considered: adaptive sequence detection based on least mean squares estimation for tracking a dispersive fading channel and joint detection and phase synchronization; noncoherent sequence detection for phase-uncertain and slowly varying frequency nonselective fading channels; and linear predictive sequence detection for phase-uncertain and fading channels.
Figure 3.36: BER versus E_b/N_0 of the blind recursive detector with β = 1 and β = 2 for BPSK modulation and p = 0.99. Reproduced from [121] by permission of John Wiley & Sons.

3.13 Problems
Problem 3.1: Consider a base-band binary transmission system which uses equally probable symbols a_k ∈ {-1, +1} and linear modulation. The channel is affected by AWGN. The samples at the output of a matched filter can be expressed as
where {c_k} are the transmitted symbols and {g_k} is the overall discrete impulse response such that
The noise process {n_k} is colored, with autocorrelation equal to R_n(k).
A. Design a digital noise-whitening filter to process the observations and determine the overall discrete-time impulse response.
B. Consider MAP sequence detection based on the samples at the output of the whitening filter: define a suitable state and design the corresponding trellis diagram.

Problem 3.2: Consider a linearly modulated signal with QPSK. The complex envelope of the transmitted signal can be written as follows:
where the symbols {c_k} are differentially encoded according to the rule c_k = c_{k-1} a_k. The information and code symbols belong to the alphabet {±1, ±j}. Assume that there is AWGN, the information symbols are equally likely, and the condition for absence of ISI is satisfied. At the receiver we consider a detector based on the VA.
A. Draw the trellis diagram of the detector.
B. Show that the survivors at epoch k + 1 can be obtained by directly extending the survivor at epoch k with the best metric (in other words, the depth of convergence of the survivors is one).
C. Show that sequence detection based on the VA is, in this case, equivalent to symbol-by-symbol demodulation.
D. Discuss the effects on the decision process of a phase rotation in the received signal equal to a multiple of π/2.

Problem 3.3: Independent and equally likely information symbols {a_k} ∈ {±1} are coded according to the following rule: A coded symbol c_k thus belongs to the alphabet {-2, 0, +2}. The coded symbols are transmitted through a base-band linear modulation with shaping pulse p(t), whose Fourier transform is a root raised cosine with no excess bandwidth (i.e., it is a rectangular spectrum). The transmitted signal therefore has the following expression:
Considering transmission over an AWGN channel and assuming that the receiver front-end is a sampled matched filter, derive the VA to implement MAP sequence detection.
Problem 3.4: Consider the general class of reduced-state algorithms, indicated as SA(n_pa, n_cl) in Section 3.10.4 and originally described in [108]. Justify why, for a given maximum complexity level given by the product n_pa · n_cl, the reduced-state sequence detection algorithm which provides the best performance is the M-algorithm, i.e., the algorithm with M = n_pa and n_cl = 1. Compare your solution with the results in [108, 109].
where nk is an AWGN sample. It is easy to show that
A gradient algorithm for estimating the channel phase process would be based on the following recursive estimation:
Comment on the relationship of this algorithm with the joint phase synchronization and MAP sequence detection strategy considered in Section 3.8.1 (data-aided) and Section 3.9.1 (PSP-based), where a first-order PLL is used.

Problem 3.6: Suppose transmission of independent and equally distributed symbols belonging to an 8-PSK constellation over an ISI channel with memory L = 4. Draw all possible trellis diagrams with four states to perform reduced-state detection. Considering various ISI channel impulse responses, describe the pros and cons of the considered state reduction techniques.

Problem 3.7: In the case of application of NSD to fading channels, derive the basic branch metric (3.160) by considering the approach specular to that proposed in the text. In other words, first apply the CMP, and then average over the (known) channel statistics.

Problem 3.8: Considering the observation model (3.30) for a flat slow fading channel in Section 3.5.2, show that linear predictive sequence detection is equivalent to the strategy proposed in Section 3.5.2 and based on the correlation properties of the observation sequence.
Problem 3.9: Starting from (3.138), derive the metric (3.135). Note that this approach is the "specular" version of that proposed in this book, where, first, an approximate application of the CMP is used, i.e., memory truncation is introduced, and then statistical averaging is performed.

Problem 3.10: Derive (3.160) using the "standard" approach proposed in this book: first apply (approximately) the CMP and then average out the channel statistical parameters.

Problem 3.11: Verify by computer simulations that the use of the approximation ln I_0(x) ≈ x does not entail major degradation in the performance of a VA-based receiver using (3.160).

Problem 3.12: Verify that the performance of a VA-based receiver using (3.164) and (3.165) is the same.

Problem 3.13: Verify that the performance of a VA-based receiver using (3.165) is equivalent, in the case of Rice fading, to that obtained by (3.160). Devise suitable approximations which allow one to derive (3.165) from (3.160).

Problem 3.14: Consider uncoded transmission of binary symbols a_k ∈ {±1} through a static dispersive channel with white noise discrete equivalent impulse response f = (1, 2, 1)/√6.
A. Define a suitable system state μ_k and draw the relevant trellis diagram.
B. Express explicitly the branch metric as a function of the received signal sample r_k for any possible transition. Assume that the received sequence is
(r_0, r_1, r_2, r_3, r_4, r_5, r_6, r_7) = (1.7, 1.2, 1.1, 0.3, -0.2, -1.1, 0.7, 0.4) and the initial state is μ_0 = (+1, +1).
C. Use the VA to detect the MAP sequence.

Problem 3.15: Consider the model of discrete observable where θ is the overall phase rotation introduced by the channel. Let θ̂ be an estimate of the channel phase and define the "phase-synchronized" observation
A. Derive an explicit expression of the MSE E{|r̃_k - c_k|²} as a function of θ̂.
B. Obtain an estimate of θ minimizing the MSE.
C. Formulate a data-aided iterative stochastic gradient algorithm with the aim of minimizing the MSE.
D. Comment on the functional relationship of the obtained synchronization scheme with a first-order PLL.
Hint: Define a stochastic gradient by differentiating the MSE with respect to θ̂ and discarding the expectation.

Problem 3.16: Consider the model of discrete observable:
where f is the overall discrete equivalent channel impulse response. Let f̂ be an estimate of the channel response. Assume that the code symbols are zero-mean and uncorrelated.
A. Derive an explicit expression of the MSE E{|r_k - f̂^T c_k|²} as a function of f̂.
B. Formulate a data-aided iterative stochastic gradient algorithm to minimize the MSE.
C. Comment on the functional relationship between the obtained identification scheme and the LMS algorithm.
Hint: Define a stochastic gradient by differentiating the MSE with respect to f̂ and discarding the expectation.

Problem 3.17: Assume a first-order autoregressive fading model
where p ∈ [0, 1] is a constant and {v_k} is an iid zero-mean Gaussian sequence with variance σ_v².
A. Show that the fading sequence is Markovian of first order. Assume f_0 is Gaussian with variance σ_f².
B. Show that f_k is a stationary Gaussian sequence.
C. Check if the conditional observation {r_k} satisfies a Markov property.
Problem 3.18: Consider the random-phase discrete channel model
Define a feed-forward data-aided phase estimate θ̂ based on the previous N observations by minimizing the MSE.
A. Show that this estimate must verify the condition
B. Show that the result in part A coincides with the data-aided ML phase estimate based on the previous N observations.
4 Symbol Detection: Algorithms and Applications

4.1 Introduction

This chapter focuses on the application of the general framework proposed in Chapter 2 to the case of maximum a posteriori (MAP) symbol detection. In particular, performing MAP symbol detection requires the explicit computation of a symbol a posteriori probability (APP), i.e., the probability that a particular symbol has been transmitted given the observation of the entire received signal, or, considering an optimal discretization of the received signal, the entire sequence of discrete-time channel observations. Since, in various cases, the calculation of the exact APP may be computationally very intensive, one can resort to the computation of suboptimal values which approximate the APPs. While the bit error rate (BER) performance of a given transmission system based on the MAP symbol detection criterion is substantially identical¹ to that obtained by applying the MAP sequence detection criterion, the discovery of the concept of iterative decoding/detection [33] spurred a strong interest in symbol detection algorithms. The most commonly used algorithm to perform MAP symbol detection is the so-called forward backward (FB) algorithm. Seminal work in the design of algorithms for soft-output decoding dates back to the late Sixties [134, 135]. An instance of the FB algorithm was proposed in [136], but a clear formalization is due to Bahl, Cocke, Jelinek and Raviv in 1974 [54]; for this reason, the FB algorithm is also often referred to as the BCJR algorithm, from the initials of the last names of the authors of [54].

¹ This is exactly correct for large signal-to-noise ratios, but it represents a good approximation also at low signal-to-noise ratios (SNRs).

Figure 4.1: Transmission system and MAP symbol detection.
4.2 MAP Symbol Detection Principle

We assume that the transmission system can be represented as shown in Figure 4.1. In particular, a source generates a sequence a = a_0^{K-1} of information symbols, which is transformed by the encoder/modulator into a time-continuous signal s(t, a). For the sake of simplicity, we assume that the duration of the signal is KT, where T denotes the symbol interval. The MAP symbol detection criterion leads to the choice of the symbol â_k which minimizes the probability of error with respect to the received signal. More precisely, this criterion can be formulated as follows:
where r(t) is the received signal observed over the entire information-bearing interval T_0. Due to the system memory, T_0 may include the interval (0, KT), possible border intervals or, ideally, the entire time axis. Considering a suitable discretization process, through which the received signal r(t) is converted into an equivalent sequence of discrete-time observations r, whose dimension depends on the number of samples per symbol interval T, the strategy (4.1) can be reformulated as follows:
In order to compute the APP P{a_k | r} one can write:
where the notation a : a_k denotes all possible information sequences containing a_k, or compatible with a_k. Note that the computation of the APP, as indicated in (4.3), requires the computation of the same metric as in the case of MAP sequence detection (i.e., p(r|a)P{a}) and a further marginalization based on the sum over all the information sequences compatible with symbol a_k. Assuming that the information symbols are independent, one can further express the APP as follows:
The first and simplest possible approach for the evaluation of the APP could be based on the computation of expression (4.4). It is immediately concluded that the efficiency of this computation is very low, since one must compute a large number of sequence metrics and then marginalize by adding them together. The complexity would obviously be exponential in the transmission length K. The FB algorithm, introduced in more detail in the following section, represents an efficient way to compute the APP, with a complexity linear in the transmission length K, as in the case of the Viterbi algorithm (VA) for MAP sequence detection.
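To make the cost of the direct marginalization tangible, the following toy sketch (mine, not the book's) enumerates every sequence for uncoded BPSK over AWGN and sums the joint metrics compatible with each symbol value; the O(2^K) loop is precisely what the FB algorithm avoids.

    import numpy as np
    from itertools import product

    def brute_force_app(r, noise_var=1.0):
        # Exhaustive computation of symbol APPs: enumerate all sequences a in {-1,+1}^K,
        # compute p(r|a)P{a}, and marginalize over the sequences compatible with each a_k.
        K = len(r)
        app = np.zeros((K, 2))                   # app[k, 0] -> a_k = -1, app[k, 1] -> a_k = +1
        for a in product((-1, +1), repeat=K):
            metric = np.exp(-np.sum((r - np.array(a)) ** 2) / (2 * noise_var)) * 0.5 ** K
            for k, ak in enumerate(a):
                app[k, (ak + 1) // 2] += metric
        return app / app.sum(axis=1, keepdims=True)

    print(brute_force_app(np.array([0.9, -0.3, 1.4])))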
4.3 Forward Backward Algorithm

As already mentioned, the first clear formulation of the FB algorithm can be found in [54]. In the following, we propose a simple derivation of the FB algorithm for transmission over a memoryless channel. We assume that the encoder/modulator can be represented as a finite state machine (FSM), with state μ_k and output symbol c_k. As in Chapter 2, we assume that the next-state and output functions are known:²
The couple (μ_k, a_k) uniquely identifies a transition in the trellis diagram of the encoder/modulator FSM. We denote this transition by t_k. After a suitable discretization process with one sample per information symbol, we assume that the observable at epoch k can be written as
² Note that the derivation shown in the following holds also in the case of a channel with strictly finite memory, the only difference being the interpretation of μ_k as the state of the FSM obtained by concatenating the encoder/modulator and the channel. Moreover, we assume generation of a single output symbol c_k in correspondence to an information symbol a_k, but the derivation can be straightforwardly extended to the case of multiple output symbols by using a vector notation.
In this case, the APP can be expressed as follows:
Owing to the independence between the information symbols, and since μ_k may depend on a_0^{k-1}, it follows that
Upon the assumption of transmission over a memoryless channel, the remaining conditional probability density functions (pdfs) in (4.8) can be simplified as
Hence, the APP in (4.8) can be rewritten as follows:
By defining
the desired symbol APP finally becomes
where, for the sake of notational simplicity, the dependence of μ_{k+1} on μ_k and a_k is not explicitly indicated.³ In the following, since the generated soft-output value is a quantity monotonically related to (or approximating) the APP, we will indicate this value with the general notation S[a_k]. In other words, one can write:

³ This simplifying notational assumption (and other similar assumptions) will also be considered later in the book. The context should eliminate any ambiguity.
Note that the operation (4.18), where the quantities {α_k(μ_k)} and {β_{k+1}(μ_{k+1})} are combined to generate the extrinsic information, is usually referred to as completion. The quantities α_k(μ_k) and β_{k+1}(μ_{k+1}) can be computed by means of forward and backward recursions, respectively. More precisely, one can write:
where the index of the summation indicates all possible transitions compatible, through the next-state function, with μ_k; this notation is general and accounts also for the case of underlying recursive and nonrecursive FSM models. The summation in (4.19) can be reinterpreted as being carried out over all possible trellis branches {t_{k-1}} compatible with the final state μ_k. Since we are considering possible combinations of μ_{k-1} and a_{k-1} compatible with μ_k, it follows that
Based on the independence between the information symbols and on the absence of channel memory, one can write:
Finally, a step in the forward recursion in (4.19) can be concisely expressed as follows:
A similar derivation holds also for the backward recursion. More precisely, one can write:
Based on the independence between information symbols and the absence of memory of the considered transmission channel, the following simplifications hold:
A step in the backward recursion, as indicated in (4.25), can be rewritten as follows:
Note that, unlike the forward recursion, in the backward recursion there is no ambiguity in the summation index. In fact, the trellis diagram corresponding to any possible coding/modulation scheme, either recursive or nonrecursive, is such that branches exiting from a state are associated with different information symbols. Given the inherent complexity of the FB algorithm, a number of suboptimal approximate versions have been proposed in the literature. Among the various approximations, we consider one which arises naturally once the algorithm is formulated in the metric (or logarithmic) domain. Unlike sequence detection, taking the natural logarithm of (4.18), (4.24), and (4.29) would not seem very helpful, because the logarithm does not commute with the summations. However, given two values x and y, and noting that for large |x - y|

ln(e^x + e^y) ≃ max(x, y)

and, consequently, that
it is possible to formulate a computationally efficient approximation of the FB algorithm. This formulation of the FB algorithm is usually referred to either as max-log [137] or min-sum [98]. The following definitions are expedient:⁴
In the metric domain, the completion operation in (4.18) therefore becomes:
where, in the last step, the max-log approximation has been used. Similarly, the two recursions can be written as follows:

⁴ Note that the second definition derives naturally from the definitions of the branch metric λ_k(a_k, μ_k) in Chapter 2 and of γ_k(a_k, μ_k). In particular, γ_k can be interpreted as an "exponential metric."
The "beauty" of the max-log approximation is that forward and backward recursions in the metric domain, given by (4.36) and (4.37), respectively, can be implemented by two VAs and the two quantities A]T(/^) and Aj? (//&) can be interpreted directly as forward and backward path metrics associated with state // fc , respectively. Specifically, the first VA is run forward, with usual branch metric A fc (t fc ), to compute the forward path metrics {A£(^)}, whereas the second VA is run backward, still with branch metric Ajt(t/ c ), to compute the backward path metrics {Aj? (//&)}. Finally, the forward and backward path metrics are combined with the branch metrics and a comparison is performed to determine the APPs, or the symbol decisions, according to (4.35). The computational efficiency of the max-log FB algorithm obviously comes at the price of a slight performance degradation; hence, a number of different approximations have been studied as intermediate solutions between the full-complexity optimal FB algorithm and its max-log approximation. For conciseness, we do not pursue this topic further and refer the interested reader to the existing literature [98, 137]. The max-log approximation of the FB algorithm is sufficient for our purposes because of its direct interpretation in terms of VAs running in direct and inverse time directions.
4.4 Iterative Decoding and Detection

The concept of iterative detection is a natural extension of the concept of iterative decoding. The latter, originally introduced by Gallager in his Ph.D. thesis [31], was crystallized by Berrou and Glavieux in 1993 with the introduction of turbo codes and the concept of iterative decoding [33, 66]. In this revolutionary work, the authors showed that a complicated code, with a particular structure, can be decoded efficiently with limited complexity. In particular, they considered a parallel concatenated convolutional code (PCCC), constituted by the parallel concatenation, through interleaving, of two convolutional codes. The structure of a parallel concatenated convolutional code is shown in Figure 4.2, where the interleaver is indicated with the symbol π. As shown in Figure 4.2, the output sequences from the two component convolutional codes can be punctured in order for the overall code to have the desired code rate.

Figure 4.2: Parallel concatenated convolutional code, or turbo code.

The receiver is based on two component decoders corresponding to the two constituent convolutional encoders. In Figure 4.3, the basic structure of a turbo decoder, corresponding to the encoder in Figure 4.2, is shown. The two component decoders exchange soft information, which can be obtained by using soft-output algorithms at each component decoder. More precisely, the decoders exchange extrinsic information, which represents the component of the generated soft output on a symbol not depending on the soft-input information on the same symbol [33]. Referring to the FB algorithm formulation proposed in Section 4.3, one can immediately recognize that the final soft-output quantity (4.18) can always be written as follows: where
and
Figure 4.3: Turbo decoder for a PCCC.

In Figure 4.3, the extrinsic information values on information symbol a_k generated by the first and second decoders in the n-th iteration are denoted by S_{e1}^{(n)}[a_k] and S_{e2}^{(n)}[a_k], respectively. Note that the a priori probability P{a_k} of an information symbol used by each component decoder is given, in the iterative decoding process, by the extrinsic information generated by the other decoder. Assuming, as a convention, that an iteration is constituted by the sequence of decoding acts of the first and second component decoders, the soft-output values generated by each component decoder can be rewritten as follows:
In other words, the soft-output value generated by the first decoder at the n-th iteration is the product of the soft value at its input, corresponding to the extrinsic information generated by the second decoder at the (n - 1)-th iteration, and the generated extrinsic information. The soft-output value generated by the second decoder at the n-th iteration is the product of the soft value at its input, corresponding to the extrinsic information generated by the first decoder at the same iteration, and the generated extrinsic information. The two soft-output values in (4.41) and (4.42) indicate clearly that the soft-output decoding processes (based on the FB algorithm) in the two component decoders are coupled. A decoding iteration is basically constituted by four steps:
1. The first decoder generates the extrinsic information on an information symbol a_k by using the extrinsic information generated by the second decoder; note that at the very first iteration the first decoder cannot rely on the extrinsic information generated by the second decoder.
2. The extrinsic information sequence generated by the first decoder is interleaved and passed to the second decoder.
3. The second decoder, upon receiving the sequence of soft reliability values generated by the first decoder, generates a sequence of extrinsic information values.
4. The extrinsic information sequence generated by the second decoder is deinterleaved and passed to the first decoder.
At this point, a single iteration ends, and a new iteration can start. A skeleton of this exchange is sketched below. The concept of extrinsic information is very subtle. This information can be interpreted as the "surplus" of information, on the reliability of a symbol, generated by a component decoder. The extrinsic information value of a symbol at epoch k is unbiased with respect to the corresponding soft-input information at the same epoch.⁵ In this way, considering the turbo decoder scheme shown in Figure 4.3, each decoder receives from the other decoder a "new" suggestion regarding the reliability of a symbol. This is fundamental for the iterative decoding process to converge to a correct decision. In fact, suppose that the first decoder does not decode symbol a_k correctly, i.e., it attributes to this symbol a wrong reliability value; this means that the extrinsic information value corresponding to a_k suggests that a specific wrong realization is more likely than the others. Upon receiving this wrong reliability value, if the second decoder passed back to the first decoder a "complete" soft-output value S[a_k], then the latter decoder would be heavily influenced by previous decisions of the former decoder. In other words, the second decoder would "push" the first decoder to decide in the same way. Instead, if the second decoder manages to generate a correct surplus of information (i.e., the extrinsic information is a reliability value consistent with the correct information symbol a_k), then this surplus of information can positively influence the first decoder, which can decide in the right direction in the following iterations.

⁵ Note that the extrinsic information at epoch k depends, however, on all soft-input information values at epochs different from k.
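The four steps above can be summarized in a short decoding loop. In the sketch below, decode_siso stands for any component soft-input soft-output decoder returning extrinsic LLRs (e.g., a max-log FB pass), and interleave/deinterleave act on LLR vectors; every name and the final decision rule are placeholders of this illustration, not the book's notation.

    import numpy as np

    def turbo_decode(obs1, obs2, systematic_llr, decode_siso, interleave, deinterleave,
                     num_iterations=10):
        # decode_siso(obs, a_priori_llr) is assumed to return extrinsic LLRs on the
        # information symbols; obs1/obs2 are the observations seen by the two decoders.
        K = len(systematic_llr)
        extrinsic_2 = np.zeros(K)                           # decoder 2 has produced nothing yet
        for _ in range(num_iterations):
            # Step 1: decoder 1 uses the extrinsic information of decoder 2 as a priori info.
            extrinsic_1 = decode_siso(obs1, extrinsic_2)
            # Steps 2-3: interleave and let decoder 2 work in the interleaved domain.
            extrinsic_2_interleaved = decode_siso(obs2, interleave(extrinsic_1))
            # Step 4: deinterleave before feeding decoder 1 at the next iteration.
            extrinsic_2 = deinterleave(extrinsic_2_interleaved)
        # Final decision: combine the systematic channel LLRs with both extrinsic terms
        # (LLRs assumed defined as ln P{a=+1}/P{a=-1}).
        total_llr = systematic_llr + extrinsic_1 + extrinsic_2
        return np.where(total_llr >= 0, 1, -1)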
Figure 4.4: Typical BER performance curves of a turbo code for an increasing number of iterations. Reprinted from [33], ©IEEE, by permission of the IEEE.

Note that performing MAP sequence or symbol decoding of the entire turbo code, seen as a single FSM, is practically impossible, since the presence of the interleaver makes the overall FSM possess a huge number of states. Hence, nobody knows exactly the ultimate BER performance of a turbo code, but various analyses suggest that it is reasonable to assume that the performance obtained with iterative decoding basically approaches that obtained by performing sequence decoding of the overall code. Typical BER curves for iterative decoding of a parallel concatenated code are shown in Figure 4.4. In particular, the BER curves shown in Figure 4.4 refer to the turbo code originally introduced in [66], which is a rate-1/2 turbo code with constituent 16-state recursive systematic convolutional (RSC) codes with generators (in octal notation) G_1 = 37, G_2 = 21, and 256 × 256 interleaving (i.e., the transmitted information sequence has length K = 65 536), as described in [33]. The considered number of iterations increases from 1 to 18. One can immediately recognize a characteristic waterfall effect of the BER curves, as the number of decoding iterations increases, above a specific threshold value of the signal-to-noise ratio (SNR). In the particular case considered in Figure 4.4, the threshold SNR value is around 0.7 dB. In other words, if the SNR value at the channel output (i.e., at the input of the turbo decoder) is larger than this value, then the iterative decoding process converges, in the sense that the number of errors (on the eventually decided symbols) rapidly reduces to a very small value. If, on the other hand, the SNR is below this value, then the iterative decoding process does not converge and the BER remains unacceptable. In the light of the previous interpretation of the iterative decoding process, the existence of a threshold in terms of SNR can be further interpreted as follows. If the SNR is too low, then the first decoder, at the very first iteration, passes completely wrong reliability values to the second decoder. The latter decoder is "pushed too hard" in the wrong direction by the first decoder, and it passes back soft-output values which aim in the same wrong decision direction. The successive iterations worsen the situation, so that the final decisions made by the iterative decoder are completely wrong. In the last few years, in order to avoid long computer simulations with numerical problems, new techniques have been developed to evaluate the performance of concatenated codes decoded by iterative decoding techniques. In particular, as indicated in the previous paragraph, a figure of interest may be the threshold SNR value above which there is convergence, i.e., above which the BER rapidly decreases to zero for an increasing number of iterations. Among these techniques, we recall two interesting ones.
• A first technique is based on density evolution [84]. According to this approach, the generated extrinsic information is associated with a pdf, and the evolution of this pdf over successive decoding iterations is tracked; this is related to the approach proposed in Section 4.5.
• A second technique is based on the extrinsic information transfer (EXIT) charts [86, 87], which characterize the input-output SNR relation for a considered code. For example, comparing the EXIT curves of both component convolutional encoders of a turbo code allows determination of the value of the SNR threshold and the speed of the iterative decoding process.
In the following Section 4.5, we propose a different (and simpler) approach to analyzing the iterative decoding process on the basis of the mean value and variance of the generated extrinsic information. In particular, it will be shown that simply scaling the generated soft-output information, whenever produced by algorithms introducing suboptimal approximations, may lead to substantial performance improvements. As an example, this will be the case using state reduction techniques.
4.5 Extrinsic Information in Iterative Decoding: a Unified View

As described in the previous sections, the principle of iterative decoding is based on the exchange, between component subdecoders, of soft information [33, 138]. More precisely, in [33, 138] the original concept of extrinsic information was introduced to identify the component of the reliability value generated by a decoding subsystem, relative to a specific input or output variable, which does not depend on the input soft information on the same variable.⁶ In this section, we analyze the extrinsic information, showing that simply scaling, in the logarithmic domain, the soft information generated by each decoder can in some cases improve the overall performance. For the sake of simplicity, we will refer to the case of transmission over a memoryless channel. The same conclusions hold, however, also in the case of iterative detection over channels with memory, as will be shown in the following sections. We consider, as examples, a PCCC, or turbo code, and a serially concatenated convolutional code (SCCC) transmitted over an additive white Gaussian noise (AWGN) channel. In particular, we consider as component algorithms at the receiver side the FB algorithm and the soft-output VA (SOVA) [99, 139]. In both cases, we derive the two algorithms in the logarithmic domain. In fact, a natural reliability value, in the binary case,⁷ is the logarithmic likelihood ratio (LLR), defined as
where the word "inputs" refers to all decoder inputs. We remark that we consider a binary input ak belonging to the set {—1, +1}, in a one-to-one correspondence to the set {0,1}. We consider this notation because it will make the following derivation simpler. The LLR can be exactly computed employing the FB algorithm, which allows one to calculate the APPs P{ak = i|inputs}, i e {±1} [54]. As we have seen, the FB algorithm is the optimal algorithm to generate the sequence of APPs, but its computational complexity is large with respect to that of the VA. Besides "hard" symbol decisions, SOVA provides a symbol reliability information which can be interpreted as an approximation of the LLR [33,139,140]. For both the FB algorithm and SOVA, in the literature there exist essentially two methods to process the extrinsic information at the input of a soft-output decoder. 6
In [138], the extrinsic information is referred to as 'refi nement factor." In this section, we limit our attention to the case of binary information symbols. The proposed derivation can, however, be extended to the case of information symbols with larger cardinality. 7
1. In a first method, the extrinsic information at the input of a decoder is modeled as the output of an AWGN meta-channel [33, 141, 142].
2. In a second method, the extrinsic information is used to update the a priori probabilities used in the next decoding step, in the sense that the APPs computed by a decoder become a priori probabilities for the other one [73, 143, 144]; this method was discussed in Section 4.4 when introducing the concept of iterative decoding.
In this section, we present a unified interpretation of these two methods and emphasize their commonalities and differences. More precisely, we show that, either using the FB algorithm or SOVA, the two methods differ only in a multiplicative factor used in the metric computation. When the input is modeled as a Gaussian random variable, this multiplicative factor depends on the variance and mean of the sequence of LLRs, whereas, in the case of extraction of the a priori probabilities, it is a constant equal to 1/2. We finally consider the use of a heuristic multiplicative parameter for both algorithms and evaluate the performance of the considered decoding schemes for various values of this parameter.
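A purely illustrative rendering of this multiplicative-factor view is sketched below: the extrinsic LLR sequence is scaled before being reused as a priori information. The constant 1/2 corresponds to the a priori probability method, while the mean/variance-based factor stands in for the Gaussian meta-channel method; the specific estimator used here is an assumption made for illustration, not the book's expression.

    import numpy as np

    def scale_extrinsic(extrinsic_llrs, factor=0.5):
        # Multiply the extrinsic LLRs by a single factor before the next decoding step.
        return factor * np.asarray(extrinsic_llrs, dtype=float)

    def llr_based_factor(extrinsic_llrs):
        # Heuristic factor built from the empirical mean and variance of the LLR sequence
        # (an illustrative assumption standing in for the Gaussian meta-channel model).
        llrs = np.asarray(extrinsic_llrs, dtype=float)
        mean_abs, var = np.mean(np.abs(llrs)), np.var(llrs)
        return 2.0 * mean_abs / var if var > 0 else 1.0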
4.5.1 A Review of the Use of the Extrinsic Information
As mentioned in Section 4.4, decoding of parallel and serially concatenated codes is based on a suboptimal iterative processing in which each component decoder takes advantage of the extrinsic information produced by the other decoder at the previous step [33]. This iterative decoding process is made possible by employing soft-output component decoders. As an example, for the rate-1/2 turbo code described in [33], the turbo decoder is shown in Figure 4.5; we recall that this turbo code is composed of two rate-1/2 RSC codes, with puncturing of the coded symbols generated by the two encoders. In the figure: blocks Π and Π⁻¹ denote the interleaver and deinterleaver, respectively; {r_k^{(j)}}, j = 1, 2, denote the channel output sequences; and {z_k^{(j)}}, j = 1, 2, denote the extrinsic information sequences at the input of the j-th soft-output decoder (i.e., produced by the other one). These sequences are derived, by means of an interleaver or a deinterleaver, from the sequences {w_k^{(j)}}, j = 1, 2, produced by the other component decoder. Obviously, a serially concatenated decoder presents a serial concatenation, instead of a parallel concatenation, of two component decoders. In particular, while in a parallel decoder the input of each component subdecoder includes channel observations, in a serial decoder only the inner subdecoder is directly connected to the channel.
Figure 4.5: Decoder for a turbo code of rate 1/2.

In this subsection, we describe the possible methods for the use of the extrinsic information at the input of each decoder. To this purpose we consider, without loss of generality, a soft-output decoder which receives a sequence {r_k} of channel outputs and a sequence {z_k} of extrinsic information values generated by another decoder, and produces a sequence {w_k} of soft-output values; this is the case for decoder 2 in Figure 4.5, but it may be easily generalized to the other decoder with an extended vector notation for r_k. Moreover, we assume that the input sequence {z_k} and the generated sequence {w_k} are both relative to the sequence {a_k} of information symbols. The proposed formulation can be generalized by computing the extrinsic information relative to the code symbols {c_k} if the relevant soft outputs are needed, besides those for the information symbols, as in the case of nonsystematic codes. In [83, 138], this generalization is carried out considering the FB algorithm; however, an extension to SOVA is straightforward. For simplicity, we consider an RSC code (usually the component code of a turbo code and the inner code of a serially concatenated code), in which case the extrinsic information on an information symbol needs to be computed. We assume that a channel output, after suitable filtering and sampling, can be expressed as where {n_k} is a sequence of independent, zero-mean, real Gaussian random variables, with variance σ². In (4.44) we assume that c_k

embedded in the survivor associated with state s_k. A more refined formalism will be considered in Section 4.8.3. The ensemble of the survivors for all states and all epochs is denoted by a^f and we call it the (forward) survivor map. Assuming a forward survivor map has been constructed, it is reasonable to compute the soft output relative to a symbol a_k as follows:
In particular, it is worth remarking that the quantities α_k(s_k) and β_{k+1}(s_{k+1}) are formally equivalent to the corresponding quantities in the full-state case, but, due to state reduction, they are quantitatively different. We now show how a forward survivor map a^f can be recursively constructed during the forward recursion. We assume that the forward survivors are known up to epoch k − 1. A generic step at epoch k of the forward recursion can be written as follows:
As the survivors are known up to epoch k − 1, a reasonable way to compute the pdf p(r_{k−1} | r^{k−2}, e_{k−1}) is the following:
so that one can write:
Hence, the forward recursion can be written as
The forward survivor associated with state s_k is the survivor a^f of the beginning state s_{k−1}^{max} such that the corresponding term in the sum (4.140) is the largest one. Once the forward survivor map has been constructed, it can be used in the backward recursion, which can be written in the following way:
The computational complexity of the forward and backward recursions is related to the number of branches in a section of the receiver trellis, which is proportional to the number of states. Hence, the complexity of the proposed RS-FB algorithms is approximately M^{N−Q} times lower than that of the full-state case. A direct extension of the Fwd-only RS-FB algorithm proposed in this section can be defined as the backward-only (Bwd-only) RS-FB algorithm. In this case, a backward survivor map a^b is constructed during the backward recursion, run first, and then used in the forward recursion and in the completion. We remark that a linear time-invariant (LTI) maximum-phase system is a causal and stable system with the maximum energy delay property among all systems with the same frequency response magnitude [155]. As we will see in the case of iterative detection for inter-symbol interference (ISI) channels, use of Bwd-only RS-FB algorithms can be greatly beneficial for maximum-phase ISI channels. The basic structural idea can be further generalized. One could construct two distinct survivor maps during the forward and backward recursions, and use an extended metric in the completion. This has been partially proposed in [151], and will be considered in Section 4.8.5.
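A schematic rendering of one step of the forward recursion with survivor construction, as used by the Fwd-only RS-FB approach described above, is sketched below in Python. The data structures and the branch-metric callback are illustrative assumptions and not the book's formulation; the survivor of each new state is inherited from the beginning state that contributes the largest term, in the spirit of (4.140).

```python
import numpy as np

def forward_step_with_survivors(alpha_prev, branches_into, gamma, surv_prev):
    """One epoch of the forward recursion of a Fwd-only reduced-state FB algorithm.

    alpha_prev[s]       : forward metric of reduced state s at epoch k-1
    branches_into[s]    : list of (s_prev, symbol) pairs for branches entering state s
    gamma(s_prev, a, fb): branch metric; fb is the decision-feedback symbol list
                          taken from the survivor of s_prev (user-supplied function)
    surv_prev[s]        : list of past symbols (survivor) of state s at epoch k-1
    """
    n_states = len(alpha_prev)
    alpha_new = np.zeros(n_states)
    surv_new = [None] * n_states
    for s_next, branches in enumerate(branches_into):
        terms = [gamma(s_prev, a, surv_prev[s_prev]) * alpha_prev[s_prev]
                 for (s_prev, a) in branches]
        alpha_new[s_next] = sum(terms)
        # survivor of s_next extends the survivor of the largest-term beginning state
        s_best, a_best = branches[int(np.argmax(terms))]
        surv_new[s_next] = surv_prev[s_best] + [a_best]
    return alpha_new, surv_new

# Toy usage: 2-state binary trellis with a constant branch metric.
branches_into = [[(0, 0), (1, 0)], [(0, 1), (1, 1)]]
alpha, surv = forward_step_with_survivors(
    np.array([0.6, 0.4]), branches_into, lambda s, a, fb: 1.0, [[], []])
```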
4.8.2
Examples of Application of Fwd-Only RS-FB Algorithms
As examples of application of the Fwd-only state reduction technique for FB algorithms introduced in Section 4.8.1, we consider the cases of coherent detection for an ISI channel (i.e., assuming perfect knowledge of the ISI channel coefficients), noncoherent detection (using the noncoherent FB algorithm, indicated as NCSOa, introduced in Section 4.6) and linear predictive detection for a Rayleigh flat fading channel.
ISI Channel
In the case of an ISI channel, we assume uncoded transmission, i.e., there is no state μ_k inside the expanded state S_k (suitably defined as explained in the following). The observation sample at the output of a whitened matched filter (WMF) can be expressed as [100]
The noise samples {n_k} are real independent Gaussian random variables, with zero mean and variance σ². The distortion on the elementary shaping pulse makes each sample r_k dependent on L information symbols, each weighted by a different coefficient f_i. By defining an expanded state²¹ as S_k = a_{k−L}^{k−1} (consequently, T_k = a_{k−L}^{k}), it is easy to show that the FB algorithm can be directly applied [54]. Hence, the metric γ_k(T_k) reduces to
where the symbol ∝ denotes proportionality. We remark that an LTI system is minimum-phase if it is causal and stable and has a causal and stable inverse [155]. A minimum-phase system has the minimum energy delay property among all systems with the same frequency response magnitude [155]. If the impulse response of the equivalent time-discrete channel {f_i}_{i=0}^{L} is minimum-phase, effective state reduction is obtained by defining a reduced state s_k = a_{k−Q}^{k−1}, with Q < L, and, consequently, a transition e_k = (s_k, s_{k+1}) = a_{k−Q}^{k} [100, 152–154]. Hence, searching in the survivor path associated with state s_k, it is possible to recover²² the remaining symbols a_{k−L}^{k−Q−1} and then write
²¹ This would correspond to setting the finite memory parameter C to the length L of the ISI channel, and suppressing the dependence of an observation on previous observations.
²² For the sake of notational simplicity, in the following we do not explicitly indicate the dependence of the information symbols associated with the survivor path on the state s_k. The correct interpretation should be clear from the context.
Figure 4.19: Forward recursion for the computation of α_k(s_k) for a Fwd-only RS-FB algorithm in the case of coherent detection for an ISI channel. Reproduced from [156], ©2001 IEEE, by permission of the IEEE.
In Figure 4.19, it is shown how the forward recursion proceeds in the reduced-state trellis in the case of coherent detection for an ISI channel. In this case, we associate with each state s_k a state s_{k−1}, chosen among the beginning states of the M branches ending in s_k, according to the corresponding path metrics—the chosen state s_{k−1} is such that the corresponding path metric is the largest. The backward recursion proceeds similarly, by using the survivors selected during the forward recursion. If the overall impulse response {f_i}_{i=0}^{L} is maximum-phase, e.g., if a causal whitening filter is selected [100], efficient definitions of trellis state and transition are s'_k = a_{k−L}^{k−L+Q−1} and e'_k = a_{k−L}^{k−L+Q}, respectively. A Bwd-only RS-FB algorithm can be designed, which starts with the backward recursion, constructs a backward survivor map compatible with the new reduced state s'_k, and runs the forward recursion. Related reverse-time processing structures, suited to impulse responses with energy concentrated towards the end, are considered in [157,158]. In this "specular" version of the previously introduced algorithm, during the backward recursion the information symbol a_{k−L} is relative to the transition from state s'_{k+1} to state s'_k and symbol a_{k−L+Q} is discarded. The formulation is a straightforward extension of that previously introduced, the only modification being a termination of the reduced-state trellis necessary to better initialize the backward recursion.
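As a concrete illustration of the reduced-state metric with decision feedback for the (minimum-phase) ISI channel discussed above, the sketch below reconstructs the noiseless sample from the symbols carried by the reduced-state transition and the older symbols taken from the forward survivor. The BPSK alphabet, the Gaussian form of the metric and the numerical values are illustrative assumptions, not the exact expressions of the text.

```python
import numpy as np

def isi_branch_metric(r_k, recent_syms, feedback_syms, f, sigma2, p_a=0.5):
    """Reduced-state branch metric for an ISI channel with forward decision feedback.

    recent_syms   : symbols a_k, ..., a_{k-Q} carried by the reduced-state transition
    feedback_syms : symbols a_{k-Q-1}, ..., a_{k-L} taken from the forward survivor
    f             : channel coefficients [f_0, ..., f_L] (assumed minimum-phase)
    p_a           : a priori probability of the information symbol (assumed uniform)
    """
    syms = np.concatenate([recent_syms, feedback_syms])   # a_k ... a_{k-L}
    y_hat = np.dot(np.asarray(f), syms)                   # reconstructed noiseless sample
    return p_a * np.exp(-(r_k - y_hat) ** 2 / (2.0 * sigma2))

# Toy usage: L = 3 taps, reduced state keeping Q = 1 past symbol (BPSK symbols +-1).
f = [0.8, 0.5, 0.3, 0.1]
print(isi_branch_metric(1.1, np.array([1.0, -1.0]), np.array([1.0, 1.0]), f, 0.2))
```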
Figure 4.20: Backward recursion for the computation of β_k(s'_k) for a Bwd-only RS-FB algorithm in the case of coherent detection for an ISI channel. The survivor map is constructed during this recursion. Reproduced from [156], ©2001 IEEE, by permission of the IEEE.
Figure 4.20 schematically shows how the backward recursion proceeds in a Bwd-only RS-FB algorithm. This example leads to a more general conclusion on the possible applications of the proposed state reduction techniques. Survivors can be constructed during both recursions, depending on the overall channel impulse response and, consequently, on the structure of the observations. As an example, a proper use of survivor maps constructed during both recursions may prove useful if the overall impulse response is "mixed-phase."²³ This will be considered in detail in Section 4.8.5. As an example of coherent iterative detection for an ISI channel we consider the scheme of iterative equalization proposed in [73]. The performance is assessed by means of computer simulations in terms of BER versus the SNR E_b/N₀, E_b being the received signal energy per information bit and N₀ being the one-sided noise power
²³ In this context, the term "mixed-phase" channel will be loosely used for channels in which most of the energy is not concentrated around the channel coefficients that are associated with the most recent or the earliest input symbols.
spectral density. In any component decoder we consider a simplified logarithmic version of the corresponding FB or RS-FB algorithm [156]. More precisely, we consider a binary (M = 2) transmission system characterized by a rate-1/2 16-state RSC encoder with generators G₁ = (23)₈, G₂ = (35)₈ (octal notation), followed by a 64 × 64 pseudorandom interleaver [33]. The bits at the output of the interleaver are sent through the channel with BPSK. The minimum-phase discrete-time channel impulse response is characterized by the following coefficients [73]:
The receiver is based on a serial concatenation of an inner detector, which uses the RS-FB algorithm, and an outer decoder, which is a standard FB module [83]. The extrinsic information is used according to the heuristic method proposed in Section 4.5. By trial and error, we found that a good performance is obtained when the extrinsic information generated by the inner detector is weighted, i.e., multiplied, in the logarithmic domain, by a constant²⁴ equal to 0.6 and the extrinsic information generated by the outer decoder is weighted by a constant equal to 1 (i.e., it is not modified). The state reduction technique is applied to the inner detector. In Figure 4.21, the performance of the full-state receiver (inner detector with S = 16 states) is compared with the performance of the receiver with reduced complexity (inner detector with S' = 8, 4 or 2 states). In all cases, we consider 1 and 6 decoding iterations. At 6 decoding iterations, the performance loss for a detector with S' = 4 states with respect to that of a receiver without state reduction is only 0.75 dB at a BER of 10⁻⁴, and it reduces to 0.25 dB for a reduced-state detector with S' = 8 states. For comparison, the performance in the absence of ISI, i.e., for coded transmissions over an AWGN channel, is also shown. In this case, the receiver reduces to the outer decoder of the RSC code considered above.
Noncoherent Channel
We consider the transmission scenario described at the end of Section 4.7, and we assume linear modulation. The channel introduces an unknown phase rotation, which is modeled as a time-invariant random variable with uniform distribution in [0, 2π). A sample at the output of a matched filter has the following expression:
²⁴ Note that in Section 4.5, since we were referring to LLRs, the multiplicative constant used in the heuristic method had to be lower than 0.5. However, without considering the LLR and referring directly to the generated soft-output values, the multiplicative constant is between 0 and 1.
Figure 4.21: Application of the proposed technique to iterative decoding/detection for an ISI channel. Receivers with various levels of complexity are considered and compared with the full-state receiver (S = 16). The considered numbers of iterations are 1 and 6 in all cases. The performance in the case of coded transmission over an AWGN channel, without ISI, is also shown (solid lines with circles). Reproduced from [156], ©2001 IEEE, by permission of the IEEE.
where c_k represents the modulated, and possibly coded, symbol and n_k is a complex AWGN sample with variance per component equal to σ². Considering the NCSOa algorithm derived and used in Section 4.7, the exponential metric γ_k(T_k) can be expressed as follows:
Introducing a reduced-state parameter Q, which replaces N, a reduced-state trellis is obtained and a forward survivor map a^f can be constructed during the forward recursion. Based on this map, a full-state metric can be associated with each reduced
transition e_k as follows
where the notation c̃_{k−i} is relative to a coded and modulated symbol obtained by using the constructed forward survivors. We do not show any numerical result relative to this particular application, since in Section 4.9.1 detailed examples of noncoherent iterative detection schemes will be proposed.
Fading Channel
We consider transmission of differentially encoded M-ary phase shift keying (M-PSK) over a Rayleigh flat fading channel. We refer to the transmission system considered in [159] and denote by f_D T the normalized Doppler rate of the channel. We assume that each information symbol a_k corresponds to a group of log₂ M = m bits, i.e., a_k = (a_k^(1), ..., a_k^(m)). The bits (a_k^(1), ..., a_k^(m)) are mapped to an M-PSK symbol a_k through Gray mapping. The differentially encoded M-ary symbols c_k are defined by the rule c_k = a_k c_{k−1}, with c₀ = 1 (c₀ acts as a reference symbol). The corresponding received signal at the matched filter output can be written as
where the sequence of channel coefficients {f_k} is a complex random sequence whose real and imaginary components are independent, Gaussian with zero mean, and {n_k} are samples of a zero-mean complex-valued circularly symmetric white Gaussian noise process. The autocorrelation of the fading process is assumed to follow the classical isotropic scattering model [20]. Assuming that the information bits are independent within each symbol, we can consider P{a_k} = P{a_k^(1)} ⋯ P{a_k^(m)}. Considering a bit-wise interleaver as in [159], soft outputs on the bits {a_k^(i)} have to be calculated. To obtain these soft outputs, equation (4.92) (or (4.134)) can be modified as follows:
where the notation T_k : a^(i)(T_k) indicates all the transitions {T_k} compatible with the i-th bit component of the symbol a_k associated with T_k. Since the specific realizations of the channel coefficients {f_k} are unknown, linear prediction [101,123,125,126] can be used to derive estimates of these coefficients. Denoting by N the prediction order and setting the finite memory parameter as C = N, the metric γ_k(T_k) can be written as follows [159]:
where f̂_k is the fading gain prediction, {p_n} are the prediction coefficients and σ_e² is the variance of the prediction error, which can be computed as shown in [160]. We may observe that c_k c*_{k−n} = ∏_{j=0}^{n−1} a_{k−j}. By defining S_k = a_{k−N+1}^{k−1}, the corresponding trellis diagram has M^{N−1} states.²⁵ Assuming that the autocorrelation sequence of the channel fading process is known, the optimal prediction coefficients p_n, 1 ≤ n ≤ N,
²⁵ Note that this definition of state, depending on only (N − 1) symbols instead of C = N, is due to the use of differential encoding.
are obtained by solving the Wiener-Hopf equations [159]. Hence
Noting that |a_k|² and |r_k|² do not depend on T_k and defining
one finally obtains
The first term inside the exponential in (4.154) was dropped in [159]. However, it depends on S_k and should not be neglected, unless one assumes perfect CSI. To reduce the number of trellis states, we proceed as in the case of an ISI channel. We may define a reduced state s_k = a_{k−Q+1}^{k−1}, with Q < N, obtaining (the formulation
is similar to that used in Section 4.8.2)
where the notation s⁻(e_k) indicates the beginning state s_k of transition e_k. Similarly, hereafter the notation s⁺(e_k) will refer to the ending state s_{k+1} of transition e_k. An identical notation will be used in the full-state case, too. As an example of application, we consider transmission of differentially encoded quaternary PSK (DQPSK) signals over a flat fading channel, as in [159]. The outer code is a 64-state nonrecursive convolutional (NRC) code with generators G₁ = (133)₈ and G₂ = (171)₈. This code is concatenated, through a 64 × 64 nonuniform bit interleaver, with an inner differential encoder. In fact, bit interleaving is an appropriate means to combat the effects of fading [65,161]. The normalized fading rate is f_D T = 0.01. The differential inner detector uses linear prediction and state reduction. In Figure 4.22, the performance of the full-state receiver, with prediction orders N = Q = 3 and 4, respectively, is compared with the performance of a receiver with various levels of complexity reduction, specified by the couple of parameters (N, Q). In all cases, 1 and 6 decoding iterations are considered. At 6 decoding iterations, a detector with N = 3 and Q = 2 exhibits a performance loss of only 0.2 dB at a BER of 10⁻⁴, with respect to the full-state detector (N = Q = 3). For (N, Q) = (5, 3), the performance gain with respect to the full-state receiver with N = 3 is about 0.6 dB at a BER of 10⁻⁴. For comparison, the performance curve in the case of perfect knowledge of the fading coefficients (coherent detection) is also shown.
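The prediction coefficients {p_n} mentioned above can be computed, for the isotropic scattering model, along the following lines. This is only a sketch: the normalization of the autocorrelation, the way the noise variance enters the equations, and the prediction-error variance expression are simplifying assumptions rather than the exact formulas of [159,160].

```python
import numpy as np
from scipy.special import j0
from scipy.linalg import toeplitz

def prediction_coefficients(N, fD_T, noise_var):
    """Wiener-Hopf solution for N fading-prediction coefficients {p_n}.

    Assumes the isotropic-scattering autocorrelation R(n) = J0(2*pi*fD*T*n)
    and observation noise of variance noise_var added to each fading sample.
    """
    R = np.array([j0(2.0 * np.pi * fD_T * n) for n in range(N + 1)])
    A = toeplitz(R[:N]) + noise_var * np.eye(N)   # autocorrelation of the noisy samples
    b = R[1:N + 1]                                # cross-correlation with the sample to predict
    p = np.linalg.solve(A, b)                     # prediction coefficients p_1 ... p_N
    sigma2_e = R[0] - np.dot(p, b)                # (approximate) prediction-error variance
    return p, sigma2_e

p, s2e = prediction_coefficients(N=3, fD_T=0.01, noise_var=0.1)
print(p, s2e)
```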
4.8.3 Forward-Only RS FB-type Algorithms
We now consider the FB-type algorithms proposed in Section 4.7. We consider the same communication scenario (an encoder/modulator followed by a channel with memory) as in Section 4.6, and we adopt the same formalism.
Figure 4.22: Application of the proposed Fwd-only state reduction technique to iterative decoding/detection, through linear prediction, for flat fading channels with f_D T = 0.01. Receivers with various levels of complexity (in terms of prediction order N and reduced-state parameter Q) are shown. The considered numbers of iterations are 1 and 6 in all cases. The performance in the case of decoding with perfect knowledge of the fading coefficients is also shown (solid lines). Reproduced from [156], ©2001 IEEE, by permission of the IEEE.
We recall the definitions of the following quantities:
where N is the order of Markovianity. According to the structure of the introduced FB-type algorithms, the soft-output reliability value on an information symbol a_k can be expressed as follows:
where the summation is extended over all transitions of epoch k compatible with the information symbol a_k, and P{S_k} represents the a priori probability of state S_k,
respectively. As seen in Section 4.7, α_k(T_k) and β_k(T_k) can be computed by means of forward and backward recursions, as in (4.119) and (4.123), respectively. We proceed exactly as in the case of Fwd-only RS-FB algorithms. We consider a trellis diagram with a reduced number of states S' < S and denote its generic state by s_k = (μ_{k−Q}, a_{k−Q+1}^{k−1}), with Q < N. The question of how to define a survivor in the case of RS FB-type algorithms arises. A full-state FB-type algorithm runs first a forward recursion to compute α_k(T_k) for each transition T_k and epochs k = 0, 1, ..., K − 1. Due to the structure of FB-type algorithms, α_k is associated with a transition T_k, instead of a single state S_k as in the case of an FB algorithm. Hence, a survivor associated with a single transition has to be defined. By considering the algorithm in the logarithmic domain, the forward recursion of α_k (equation (4.119)) can be expressed as
where S⁺(T_{k−1}) indicates the final state of transition T_{k−1} and S⁻(T_k) indicates the initial state of transition T_k. We can approximate this recursion by using the max-log approximation [137], obtaining
In equation (4.161), it is intuitive to interpret the term to be maximized as a "metric" associated with a path ending with transition T_k, as in a classical VA. However, this VA is an "extended" VA, where a single step involves two consecutive trellis transitions. Considering the reduced-state trellis and assuming that the forward survivor (FS) of each transition e_{k−1} is known, we now show how the survivors can be extended to epoch k by using the forward recursion. We define²⁶ by FS^i_{k−j}(e_{k−1}) the sequence of i transitions reaching epoch k − j along the survivor of transition e_{k−1}, i.e., FS^i_{k−j}(e_{k−1}) = (e^f_{k−j−i}, ..., e^f_{k−j−1}). Any transition e_{k−l}, for l ∈ {j, ..., j + i − 1}, in FS^i_{k−j}(e_{k−1}) and any information symbol a^f_{k−h}, for h ∈ {Q, ..., Q + i − 1}, depend on the transition e_{k−1}. Therefore, the couple (FS^i_{k−2}(e_{k−1}), e_{k−1}) uniquely identifies the sequence of information symbols (a^{f,k−Q−2}_{k−N−1}, a^{k−1}_{k−Q−1}), where a^{f,k−Q−2}_{k−N−1}
²⁶ In Section 4.8.1 we considered the generic notation a^f to indicate the forward decision feedback. In the current section we use an explicit notation to indicate a segment of an FS. We also suppress the explicit dependence of the recovered information symbols on the transition e_{k−1} to simplify the notation. Any ambiguity should be clear from the context.
represent the information symbols in the path history of transition e_{k−1}. With these definitions, it is easy to extend the survivors by a step, i.e., to epoch k. In the reduced-state trellis, taking into account the survivor path FS^i_{k−1}(e_{k−1}) associated with transition e_{k−1}, equation (4.161) reduces to
For each transition e_k, the branch e^{max}_{k−1} that maximizes (4.162) has to be stored. Given the above definitions, we can reformulate equations (4.159)-(4.162), obtaining:
When a recursive state μ_k is embedded in s_k, one can use the following approximation for the a priori probability of state s_k: P{s_k} ≃ ∏_{i=1}^{Q−1} P{a_{k−i}} [162]—this implicitly assumes that all recursive states are considered equiprobable. In Figure 4.23, it is shown how the forward recursion proceeds in the reduced-state trellis according to equation (4.164). In order to compute α_k(e_k), one should consider the M quantities {α_{k−1}(e_{k−1})} such that s⁺(e_{k−1}) = s⁻(e_k) and, for each of them, compute the metric ψ_k by considering the symbols associated with the survivor of transition e_{k−1}. The backward recursion proceeds similarly using the survivor map generated during the forward recursion, as shown in Figure 4.24.
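The max-log approximation invoked in (4.161) and in the recursions above replaces a log-domain sum of exponentials by its largest term. A minimal numerical illustration, not tied to any specific trellis, is the following.

```python
import numpy as np

def log_sum_exp(x):
    """Exact log-domain sum: ln(sum_i exp(x_i)), computed in a numerically stable way."""
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def max_log(x):
    """Max-log approximation: ln(sum_i exp(x_i)) is replaced by max_i x_i."""
    return float(np.max(x))

metrics = np.array([-1.0, -3.2, -0.4])
# The two values differ only by a (usually small) correction term.
print(log_sum_exp(metrics), max_log(metrics))
```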
4.8.4 Examples of Application of Fwd-Only RS FB-type Algorithms
Noncoherent Channel
We consider the so-called NCSOb algorithm proposed in Section 4.7. We remark that in this case, the application of the proposed state reduction technique is "tricky." We now show a possible derivation, successfully employed in [163].
Figure 4.23: Forward recursion of the pdf α_k(e_k) for a general Fwd-only RS FB-type algorithm. Reproduced from [156], ©2001 IEEE, by permission of the IEEE.
Figure 4.24: Backward recursion of the pdf β_k(e_k) for a general Fwd-only RS FB-type algorithm. The metric γ_k is calculated using the survivor map previously constructed in the forward recursion. Reproduced from [156], ©2001 IEEE, by permission of the IEEE.
Assuming a forward survivor map has been constructed, the pdf γ_k can be computed as follows:
In the reduced-state case, we recall that the two quantities α_k and β_k can be written, directly extending (4.156) and (4.158), as
If Q < N, then α_k(e_k), as defined in (4.167) for the reduced-state case, is different from α_k(T_k) in (4.156) for the full-state case. Similarly, β_k(e_k) ≠ β_k(T_k). We now show the mathematical derivation which leads to the forward recursion in the reduced-state trellis. More precisely, assuming the survivor map is known up to epoch (k − 1), we show how to extend it to epoch k. The detailed mathematical derivation of the forward recursion proposed in Section 4.7 for the full-state case cannot be applied. For ease of derivation, we consider the case of an encoder/modulator FSM such that the next-state function is invertible. In other words, we assume that (a_{k−Q−1}, e_k) uniquely identifies e_{k−1}. A correct formulation of a step in the reduced-state forward recursion is the following:
Assuming Q < N (state reduction), a_{k−Q−1} depends on r^{k−1}_{k−N}. Hence P{a_{k−Q−1} | e_k, r^{k−1}_{k−N}} ≠ P{a_{k−Q−1}}, making it impossible to evaluate this probability. As a consequence, another approach has to be considered. More precisely, we
may express α_k as follows:
Since P{a_{k−Q−1} | e_k} = P{a_{k−Q−1}}, observing that e_{k−1} is uniquely determined by (a_{k−Q−1}, e_k) and applying the CMP, i.e.:
the following forward recursion in the reduced-state trellis is obtained:
where α_{k−1}(e_{k−1}) is in agreement with (4.167). The problem in the computation of (4.172) is the evaluation of the two pdfs p(r^{k−1}_{k−N−1} | a_{k−Q−1}, e_k) and p(r^{k−1}_{k−N} | e_k). In fact, since Q < N, each of the two pdfs should be correctly computed by averaging over previous information symbols. Since at epoch k the survivor of each transition e_{k−1} is known and since (a_{k−Q−1}, e_k) = (e_{k−1}, e_k), we replace p(r^{k−1}_{k−N−1} | a_{k−Q−1}, e_k) = p(r^{k−1}_{k−N−1} | e_{k−1}, e_k) with the pdf p(r^{k−1}_{k−N−1} | FS^i_{k−Q}(e_{k−1}), e_{k−1}, e_k), obtaining the following modified
recursion:
We now express the forward recursion (4.174) in the logarithmic domain as follows:
and using the max-log approximation one obtains:
The choice of the survivor associated with e_k may be based on the maximization operation in (4.176), which can be correctly carried out since the quantities involved can be computed. The term ln p(r^{k−1}_{k−N} | e_k) does not affect the maximization (and, consequently, the survivor selection), but affects the numerical value of ln α_k(e_k). We denote by e^{max}_{k−1} the transition selected by maximization in (4.176). Equivalently, the symbol a^{max}_{k−Q−1} may be considered. Once the transition e^{max}_{k−1} has been associated with e_k, we replace ln p(r^{k−1}_{k−N} | e_k) with the following pdf in the logarithmic domain:
The resulting forward recursion becomes:
The obtained forward recursion in the reduced-state trellis exhibits some analogies with the corresponding forward recursion in the full-state trellis. This indirectly confirms the validity of the proposed intuitive approximations. The backward recursion can be similarly obtained, with the further simplification that the survivor map is now already available because it was previously determined during the forward recursion. More precisely, observing that (e_k, a_k) uniquely identifies e_{k+1}, the backward recursion may be written as follows:
As an example of application, we consider a single 16-state RSC binary code with generators G₁ = (37)₈, G₂ = (21)₈ and rate 2/3, obtained by means of puncturing,
Figure 4.25: Application of the RS-NCSOb algorithm to noncoherent decoding of an RSC code. Receivers with various levels of complexity are considered and compared with a full-state receiver (N = 2) and the coherent receiver. Reproduced from [156], ©2001 IEEE, by permission of the IEEE.
used as a component of the turbo code presented in [33]. The modulation format is BPSK. In Figure 4.25, the performance of the noncoherent decoder using the RS-NCSOb algorithm is assessed for various levels of state reduction and compared with that of a coherent receiver. With respect to a decoder with N = 2 and S = 64 states, for N = 5 and S' = 64 states the performance is appreciably improved at low SNRs. For N = 1 and S' = 16, the performance loss with respect to the full-state receiver with N = 2 is less than 1 dB for every SNR, and reduces to only 0.5 dB for an SNR value larger than 5 dB. As mentioned previously, we do not consider any example of noncoherent iterative detection, since Section 4.9.1 will be entirely dedicated to this topic.
4.8.5
Generalized RS-FB Algorithms
In this subsection, we present a generalized class of RS-FB algorithms. We first consider a probabilistic derivation, and we then introduce a generalized structure. Various examples of applications of these generalized state reduction techniques for
FB algorithms are proposed. More details about generalized RS-FB algorithms can be found in [164].
Probabilistic Derivation
In this section, a probabilistic approach to deriving generalized RS-FB algorithms is proposed [165]. Let a_k, S_k, and x_k be the input, state, and output of an FSM (it could be the encoder/modulator FSM or the encoder/modulator/channel FSM), respectively, at epoch k ∈ {0, 1, ..., K − 1}, where K represents the length of the information sequence at the input of the FSM. The input symbols {a_k}_{k=0}^{K−1} are assumed to be iid. A valid transition T_k is defined by (S_k, a_k, S_{k+1}), where S_{k+1} = ns_k(S_k, a_k). The generated output symbol is denoted by x_k = o_k(S_k, a_k). The next-state (ns_k(·,·)) and output (o_k(·,·)) functions are determined by the encoder/modulator structure. In order to simplify the notation, we assume, in the remainder of this subsection, that a single output symbol x_k is generated in correspondence to a single input symbol a_k. The extension to a more general case is straightforward by considering a vector notation. To simplify the derivation, a state of an FSM is assumed to be defined by a sequence of consecutive input symbols S_k = a_{k−L}^{k−1}, where the parameter L is related to the memory of the FSM. We refer to this kind of FSM as simple [98]. This is the case, for example, for the FSM corresponding to an ISI channel with L + 1 taps—as considered, for instance, in Section 4.8.2. Therefore, in the full-state case, a transition T_k is associated with L + 1 information symbols, i.e., T_k = (S_k, a_k, S_{k+1}) = a_{k−L}^{k}. For the reduced-state case, a state is simply defined as s_k = a_{k−L₁}^{k−1}, with L₁ < L. Similarly to the full-state trellis case, a reduced-state transition t_k is also defined by (s_k, a_k, s_{k+1}) = a_{k−L₁}^{k}. Thus, there are L − L₁ + 1 possible reduced-state transitions {t_{k−(L−L₁)}, ..., t_k} embedded in (i.e., uniquely determined by) a full-state transition T_k. In both reduced-state recursions we want to associate with each trellis branch, at any epoch, a metric corresponding to one in the full-state case. Hence, a reduced-state transition has to be associated in a unique way with a particular full-state transition, and this will be done on the basis of decision feedback. In general, a full-state transition T_k can be truncated differently in the two recursions. Specifically, T_k can be reduced to t_{k−n_f} in the forward recursion, and to t_{k−n_b} in the backward recursion, where n_f and n_b are suitable integer parameters such that 0 ≤ n_f ≤ n_b ≤ L − L₁. Let us suppose that the forward recursion is run first. An intuitive approach to performing reduced-state detection consists of reducing T_k to t_k in the forward recursion (i.e., n_f = 0) and to t_{k−L₂} in the backward recursion (i.e., n_b = L₂), where L₂ ∈ {0, ..., L − L₁}. In the following, we derive an RS-FB algorithm for this specific case, and generalizations to other cases can be dealt with in the same way. For
the sake of simplicity, we refer to a discrete-time communication system, where the output sequence x_0^{K−1} of a simple FSM is transmitted over an AWGN channel. The observation at the output of the channel at epoch k, indicated as r_k, can be written as
where n_0^{K−1} is a sequence of zero-mean uncorrelated Gaussian random variables. The goal of this derivation is to compute the a posteriori likelihood, i.e., the extrinsic soft output, of a_k based on the observations r_0^{K−1}. The extrinsic soft output, indicated as SO[a_k], can be written as
where P(r_0^{K−1}, a_k) is the joint probability distribution function²⁷ of the observation sequence r_0^{K−1} and information symbol a_k, S[a_k] is the "complete" soft output (corresponding, in principle, to the APP) and SI[a_k] is the a priori input soft information on a_k (corresponding, in principle, to the a priori probability on the symbol). In the following, we will focus on the computation of the joint probability distribution function P(r_0^{K−1}, a_k) over a reduced-state trellis. Let us assume that forward and backward survivors have been somehow already associated with each reduced state at each epoch (at this point of the derivation we do not know how, yet). The ensembles of forward and backward survivors are indicated with the general notation a^f and a^b, respectively. A reasonable approximation, useful for computing the desired soft output (4.181), is the following:
²⁷ We use the term probability distribution function to denote a continuous pdf with some discrete probability masses. For a probability distribution function we will use the symbol P(·).
Using elementary conditional probability relations, one obtains:
where the pdfs in (4.183), denoted by p₁, p₂, and p₃, respectively, are further analyzed below.
p₁. Based on the causality of the system (i.e., t_k = a_{k−L₁}^{k}), p₁ can be written as
which we define as follows:²⁸
p₂. As the channel is memoryless and assuming that a forward survivor a^f(s_k) and a backward survivor a^b(s_{k+1}) are associated with states s_k and s_{k+1}, respectively, the extended metric p₂ can be computed as
²⁸ In fact, since P(r_0^{k−1}, s_k | a^f) = p(r_0^{k−1} | s_k, a^f) P{s_k}, the quantity defined in (4.185) is exactly the quantity α_k(s_k).
and we correspondingly define:
p₃. The quantity p₃ can be expressed in different ways, depending on L₂. If L₂ = L − L₁ (e.g., [151]), we can write:
However, if L₂ < L − L₁ (e.g., L₂ = 0 in [156]), then r_{k+L₂+1} depends on all the information symbols embedded in a reduced transition t_k and, possibly, on a portion of a^f. Thus, it is not true that r^{K−1}_{k+L₂+1} depends only on s_{k+1} and a^b. However, by assuming that a^f contains the information, relative to t_k, not included in s_{k+1}, one can write:
and one can correspondingly define:
The likelihood of (i.e., the extrinsic soft-output value associated with) a_k can be finally obtained through the following completion operation:
As one can see, we have dropped from the definitions of the introduced quantities (SO[a_k], α_k(s_k), G_{k+L₂}(t_k) and β_{k+1}(s_{k+1})) the dependence on the survivors. They have been used, however, to justify and motivate the above derivation. The notation is used to indicate the time interval relative to the considered observation
sequence, i.e., from k to k + L₂. This represents a difference with respect to a standard FB algorithm. In fact, instead of considering just the observation at epoch k, we consider a window of observations. The quantities α_k(s_k) and β_{k+1}(s_{k+1}) can be computed by means of forward and backward recursions, respectively. During these recursions, the survivors a^f and a^b are recursively constructed. Soft-output values relative to output symbols {x_k} can be generated in a similar way. Let us consider the computation associated with the trellis transition at epoch k − 1. Assuming that a forward survivor a^f(s_{k−1}) is associated with each state s_{k−1}, the corresponding step in the forward recursion allows one to recursively compute α_k(s_k) on the reduced-state trellis and, simultaneously, to extend the existing survivors to the states at epoch k. In fact, α_k(s_k) can be expressed as follows:
By defining
where the information symbols not contained in the reduced transition depend on the forward survivor, the (k − 1)-th step in the forward recursion becomes:
A reasonable way to associate a survivor with s_k consists of associating with it the survivor relative to the state s_{k−1} such that the corresponding term in the summation (4.194) is the largest one [151,156]. Proper boundary conditions, in terms of α₀(s₀), have to be considered, depending on the initial state of the FSM. Similarly to the forward recursion, the quantity β_k(s_k) can be computed through a backward recursion assuming that backward survivors are known down to states
{s_{k+1}}. Moreover, the backward survivors are simultaneously extended to the states at the k-th epoch during the corresponding step of the backward recursion. Thus,
Defining
a step in the backward recursion at epoch k reads:
Similarly to the forward recursion, it is reasonable to associate with state s_k the backward survivor relative to the state s_{k+1} corresponding to the largest term in the summation in (4.197). Note that, unlike the case where forward-only state reduction techniques are used, in this case the step at epoch k in the backward recursion involves the observation at epoch k + L₂, rather than that at epoch k + 1. This is due to the extended observation window used in the completion operation. For the sake of clarity, we consider a few interesting cases for the backward recursion by focusing on the metric G_{k+L₂}(t_k):
• If L₂ = L − L₁, then
In this case, the backward recursion relies only on a backward survivor, which means that backward and forward recursions can be run independently [151].
• If 0 < L₂ < L − L₁, during the backward recursion backward decision-feedback survivors are constructed relying on both forward and backward survivors. This means that the forward recursion has to be run first, and must rely only on forward survivors.
• At the other extreme, i.e., L₂ = 0 [156], no survivor is built during the backward recursion. In other words, the decision feedback relies completely on the forward survivors, and one obtains:
This corresponds to the approach considered in Section 4.8.1.
Generalized Structures of RS-FB Algorithms
The probabilistic derivation in Section 4.8.5 provides a unified framework for deriving RS-FB algorithms, generalizing previously published solutions [151,156]. However, further improvements, based on intuition, are possible (for more details, see [164,166]). In fact, further generalizations can be obtained by modifying various aspects of the derived RS-FB algorithmic structure, including the forward recursion, the backward recursion, and the soft output generation. As in the previous section, for the sake of simplicity we will consider only the generation of soft-output values relative to the inputs of the FSM, i.e., {a_k}. First, let us consider forward and backward recursions. In the previous derivation, the forward recursion is run first, by using only forward decision feedback. The backward recursion, where a backward decision feedback might be constructed, can
utilize both forward and backward decision feedbacks. However, both recursions could be designed to use either forward or backward decision feedback, or both, thus providing different RS-FB structures. Moreover, the soft input relative to a_k does not necessarily need to be included in the metric corresponding to the reduced-state trellis section at epoch k. In fact, it could be included at any epoch l, provided that a_k can be extracted from the reduced transition at epoch l. Indicating by a^f(s_k) the forward decision feedback associated with the reduced state s_k and by a^b(s_{k+1}) the backward decision feedback associated with s_{k+1}, the soft-input information on x_{k+i}, relative to the reduced-state transition t_k = (s_k, a_k, s_{k+1}), can be generally indicated with the following notation:
in which the sequence of symbols a_{k+i−L}^{k+i} is extracted from a^f(s_k), t_k, and a^b(s_{k+1}). For example, if t_k = a_{k−L₁}^{k} (simple FSM), the forward and backward recursions in (4.194) and (4.197) can then be equivalently rewritten as follows:
where m_f and m_b are suitable integers such that −L₁ ≤ m_f ≤ 0 and −L₁ ≤ m_b ≤ 0. These integer values indicate which soft-input information is embedded in the transition at epoch k. Therefore, the forward and backward recursions corresponding to the case where T_k is reduced to t_{k−n_f} in the forward recursion and to t_{k−n_b} in the backward recursion, with 0 ≤ n_f ≤ n_b ≤ L − L₁, −L₁ ≤ m_f ≤ 0 and −L₁ ≤ m_b ≤ 0, can be generally written as
If the parameters n_f and n_b in the two recursions are set so that both recursions are based only on the forward (backward) decision feedback, the derived algorithm corresponds to a Fwd-only (Bwd-only) RS-FB algorithm. On the other hand, if one
of these two recursions makes use of both forward and backward decision feedbacks, the corresponding RS-FB algorithm will be referred to as a BiD RS-FB algorithm. At this point, let us consider the completion operation in (4.191). By using the notation introduced in (4.200), one can rewrite (4.191) as follows:
where SO_k[a_k] represents a soft-output value on a_k, generated through the completion corresponding to the reduced-state transition t_k (the notation will be clear in the following). In this case, the forward and backward recursions are defined as in (4.201) and (4.202), with 0 ≤ L₂ ≤ L − L₁ and m_f = m_b = 0 (as used in Section 4.8.5).
• The Fwd-only RS-FB algorithm introduced in Section 4.8.1 is obtained by setting L₂ = 0:
One can immediately notice that this completion is formally identical to that in the full-state case. This structure will be referred to as the standard completion operation.
• At the other extreme, if L₂ = L − L₁, only backward decision feedback is used during the backward recursion. Therefore, for this BiD RS-FB algorithm, the backward recursion can be run independently from the forward recursion. The completion operation (4.205) for the generation of a soft-output value on a_k at the trellis section corresponding to time epoch k can be written as
Comparing this BiD RS-FB algorithm with the BiD RS-FB algorithm proposed in [151], one can immediately recognize that both algorithms utilize the same forward and backward recursions. Nevertheless, there is a difference in the completion operation. In fact, the completion operation proposed in [151] can be written as
The completion in (4.208) has the same structure as the completion of the (full-state) FB algorithm (i.e., a standard completion). Hence, the completions in (4.207) and (4.208) utilize a different amount of soft-input information in order to generate the desired soft-output information. In fact, it is possible to show that the proposed BiD RS-FB algorithm, with the completion (4.207), uses each of the soft inputs SI[x_k] and SI[a_k], for k = 0, ..., K − 1, exactly once²⁹ to generate the quantity SO_k[a_k]. On the other hand, for the BiD RS-FB algorithm in [151] (with completion in (4.208)), the soft output SO_k[a_k] does not depend on the soft-input values {SI[x_{k+1}], SI[x_{k+2}], ..., SI[x_{k+(L−L₁)}]}. The set of missing soft-input values will be referred to as the gap. In the RS-FB algorithm in [151], this gap is due to the time misalignment between the soft input SI[x_k] used in the metric of the forward recursion step at epoch k and the soft input SI[x_{k+L₂}] used in the metric of the backward recursion step at epoch k. In this sense, the completion (4.207) of the proposed BiD RS-FB algorithm fully fills the gap.
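The notion of the gap can be illustrated with a few lines of code: for the BiD case with independent recursions (L₂ = L − L₁), the sketch below lists which soft inputs SI[x_{k+i}] enter the completion for SO_k[a_k] under the standard completion (4.208) and under the gap-filling completion (4.207). The index bookkeeping is a simplified illustration of the discussion above, not an exact transcription of the equations.

```python
def completion_soft_inputs(L, L1):
    """Indices i of the soft inputs SI[x_{k+i}] entering the completion for SO_k[a_k],
    for the BiD RS-FB case L2 = L - L1 (independent recursions).

    standard (4.208)    : only i = 0, as in the full-state FB completion
    gap_filling (4.207) : the whole window i = 0, ..., L2
    The difference between the two sets is the 'gap'.
    """
    L2 = L - L1
    standard = {0}
    gap_filling = set(range(L2 + 1))
    return standard, gap_filling, gap_filling - standard

print(completion_soft_inputs(L=4, L1=1))   # gap = {1, 2, 3}
```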
In order to further generalize the completion in (4.205), weight exponents for the soft-input values can be introduced as follows:
The weight exponents w_x^c(k, i) are used to identify whether the soft inputs SI[x_{k+i}] should be included in the completion. If a particular soft input is included (excluded) in the completion, its associated weight exponent is set to 1 (0). In general, a weight exponent can be any real number θ ∈ (0, 1]. The completion (4.209) refers to the sum-product version of an RS-FB algorithm—in the min-sum version, a weight exponent becomes a scaling factor that multiplies the soft-input information, i.e., it corresponds to the heuristic parameter introduced in Section 4.5 to scale the extrinsic information for optimized iterative detection. Referring to the generalized forward and backward recursions (4.203) and (4.204), the corresponding generalized completion
²⁹ A situation where a soft input SI[x_k] or SI[a_k], k = 0, ..., K − 1, is used more than once will be referred to as double counting and should be avoided if possible.
operation in the reduced-state trellis at epoch k can be written as
where the weight exponents {w_a^c(k, m)} have been introduced for the soft-input values on the information symbols and the parameters n_f, n_b, m_f and m_b previously introduced are such that n_f ≤ n_b and m_f ≤ m_b. In this case, each soft input is utilized at most once. The weight exponents w_x^c(k, n) and w_a^c(k, m) can be tuned to use the desired amount of soft-input information. Specifically, {SI[a_i]}_{i=0}^{k+m_f−1} ∪ {SI[x_i]}_{i=0}^{k+n_f−1} and {SI[x_i]}_{i=k+n_b+1}^{K−1} ∪ {SI[a_i]}_{i=k+m_b+1}^{K−1} are the soft-input values used during the forward and backward recursions to generate the state metrics α_k(s_k) and β_{k+1}(s_{k+1})—note that SI[x_{k+n_f}]SI[a_{k+m_f}] and SI[x_{k+n_b}]SI[a_{k+m_b}] are the metrics of the forward and backward recursions in (4.203) and (4.204), respectively. Standard completion is equivalent to setting w_a^c(k, m) = δ_K(m − m_f) and w_x^c(k, n) = δ_K(n − n_f), where δ_K(·) is the Kronecker delta function defined as
The corresponding RS-FB algorithm is identified by not filling the gap (Nfg). The gap is in this case constituted by {SI[x_{k+i}]}_{i=n_f+1}^{n_b} ∪ {SI[a_{k+i}]}_{i=m_f+1}^{m_b}. For instance, there is no gap if n_f = n_b and m_f = m_b (e.g., n_f = n_b = m_f = m_b = 0). On the other hand, the gap can be fully filled if w_a^c(k, m) = w_x^c(k, n) = 1, ∀m, n. This option will be referred to as fully filling the gap (Ffg). The option where only a portion of the gap is filled, by setting w_a^c(k, m) = δ_K(m − m_f) + δ_K(m − m_b) and/or w_x^c(k, n) = δ_K(n − n_f) + δ_K(n − n_b), will be referred to as partially filling the gap (Pfg). Another possible improvement can be obtained by considering multiple completion, as originally suggested in [98, Problem 3.10]. The idea is simple. Instead of simply generating the soft output on a_k when averaging over t_k, a soft output relative to a_k could be generated by averaging over any transition embedding a_k. To be more specific, we could obtain various reliability values on a_k by averaging over t_k, t_{k+1}, ..., t_{k+L₁}. Due to the suboptimality of RS-FB algorithms, based on the construction and use of survivors, it is likely that the L₁ generated soft outputs are different. Suitably combining them could refine, owing to a diversity effect, the final soft-output information on a_k. In the logarithmic domain (min-sum version), an intuitive way to
combine them is considering the arithmetic mean—in fact, this operation returns the same output as the FB algorithm in the absence of state reduction. In the probability domain (sum-product version), this corresponds to the geometric mean. Indicating by SO_{k+i}[a_k] the soft output on a_k obtained by averaging over t_{k+i}, the final soft output can be expressed as
In this case as well, weight exponents can be introduced to generalize the multiple completion in (4.212). More precisely, the soft output on a_k could be written as
where the weight exponents w_a^o(k, i) in (4.213) have to be properly set—in (4.212), we are implicitly considering θ = 1/(L₁ + 1). Typically, a soft-output value on a_k is generated through a single completion at epoch k. This corresponds to considering w_a^o(k, i) = θ δ_K(i), and will be referred to as a single completion (sc). At the other extreme, the final soft-output value on a_k could be obtained by using all the possible quantities in (4.213). An instance of this option is obtained by setting w_a^o(k, i) = θ/(1 + L₁), ∀i ∈ {0, ..., L₁}, and will be referred to as full multiple completion (Fmc). Finally, if only a portion of the soft-input values in (4.213) is considered to generate the final value on a_k, such as w_a^o(k, i) = (θ/2) δ_K(i) + (θ/2) δ_K(i − L₁), this option will be called partial multiple completion (Pmc). The factor θ introduced above is used to control the influence of the soft-input information in the generation of the soft output. This might be very useful when considering the exchange of soft information in iterative data detectors, as seen in Section 4.5. In fact, RS-FB algorithms usually lead, because of suboptimality, to overestimation of the soft-output reliability information. The beneficial effect of using weight exponents lower than 1 will be confirmed by the considered numerical results. A probabilistic derivation of generalized RS-FB algorithms has been proposed. However, further useful and effective modifications (e.g., use of weight exponents and multiple completion) are dictated by intuition and do not have an analytical basis. A graphical approach represents a systematic way to design RS-FB algorithms, which can take into account all the considered intuitive modifications. In fact, a suitable graph structure should possess the following characteristics:
• it should allow the application of different state reduction techniques;
• it should account for multiple completion;
• in the absence of state reduction, the probabilistic propagation in the graph should provide the same (or exact) soft output as the FB algorithm.
More details about state reduction techniques based on the use of suitable graphical structures, based on the concept of an over-structured graph, can be found in [164,167]. In the following subsection, we further comment on RS-FB algorithms, in order to provide the reader with more intuition regarding state reduction techniques.
Comments on Generalized RS-FB Algorithms for ISI Channels
For any specific communication system, it would be of interest to find the RS-FB algorithm that guarantees the best possible performance for a given complexity—this is a legitimate question, since an RS-FB algorithm is an approximation of the FB algorithm. The case of an ISI channel, whose corresponding FSM output x_k can be written as
is important and easy to understand. Hence, we will comment on the proposed RS-FB algorithms referring to an ISI channel, according to the values of the channel coefficients {f₀, ..., f_L}. This analysis can be extended to various types of FSM, especially any LTI system. In the following, the discussion will be based on the three general categories of RS-FB algorithm previously introduced: Fwd-only, Bwd-only, and BiD RS-FB algorithms. First, let us consider Fwd-only RS-FB algorithms, where both forward and backward recursions utilize only forward decision feedback in the metric computation. In other words, this algorithm uses RSSD only during the forward recursion, where a forward survivor (corresponding to a sequence of "past" decided symbols) is associated with each state at each epoch. This algorithm can be obtained from the generic recursions (4.203) and (4.204), by setting n_f = n_b = m_f = m_b = 0. Since the reduced state s_k = a_{k−L₁}^{k−1} is the most recent portion of the full state S_k = a_{k−L}^{k−1}, the Fwd-only RS-FB algorithm is the best choice if the ISI channel is minimum-phase, because in this case the taps of the ISI channel, where most of the energy is concentrated, are the ones relative to the information symbols embedded in the reduced state. In this way, the past information symbols, recovered by decision feedback, correspond to the smallest taps and have little influence. The error propagation is not significant, and the backward recursion is then based on a set of reliable survivors.
However, if the channel is not minimum-phase, the symbols associated with the reduced state may not be the ones corresponding to the taps of the channel with most of the energy. In the worst case, if the channel is maximum-phase, they correspond to the taps of the channel with lower energy. The information symbols corresponding to the taps with most of the energy are thus recovered with decision feedback, and error propagation, with severe performance degradation, is very likely. In any case, there is no gap in the completion operation. Multiple completion is also unnecessary, since completion operations at different epochs, used to derive a soft output on a_k, provide the same extrinsic information. On the other hand, the Bwd-only RS-FB algorithm is obtained by applying RSSD only in the backward recursion, where a backward survivor (corresponding to a sequence of "future" decided symbols) is associated with each state at each epoch. In this case, the backward recursion is run first and the forward recursion is then operated with the same metric as the backward recursion. As an example, this algorithm can be obtained from the generic recursions (4.203) and (4.204), by setting n_f = n_b = L − L₁ and m_f = m_b = 0. The Bwd-only RS-FB algorithm is a specular version of the Fwd-only RS-FB algorithm. Reasoning as in the case of the Fwd-only RS-FB algorithm, one can conclude that the Bwd-only RS-FB algorithm is the best choice if the ISI channel is maximum-phase. In fact, in this case the taps of the ISI channel, where most of the energy is concentrated, are the ones relative to the information symbols embedded in the reduced state. Error propagation and, consequently, performance degradation increase if the channel is not maximum-phase. The worst case is given by a minimum-phase channel. As for a Fwd-only RS-FB algorithm, in this case as well there is no need to fill the gap and to consider multiple completion. In BiD RS-FB algorithms, forward and backward decision feedback is used. Either one of the recursions, run first, must be designed in such a way that only the decision feedback in its direction is used. The other recursion, run later, can partly rely on the decision feedback generated in the first recursion. At the extreme, each recursion utilizes only the decision feedback in the corresponding direction. In this case, the two recursions can be run independently—this is the only case, relative to BiD RS-FB algorithms, considered for the numerical results in Section 4.8.6. As an example, this BiD RS-FB algorithm can be obtained from the general recursions (4.203) and (4.204), by setting n_f = m_f = m_b = 0 and n_b = L₂ (0 ≤ L₂ ≤ L − L₁). The special case of independent recursions is obtained by setting L₂ = L − L₁. Obviously, for a given number of (reduced) states, BiD RS-FB algorithms have a larger complexity than Fwd-only and Bwd-only RS-FB algorithms. This complexity increase is due to the fact that BiD RS-FB algorithms utilize both forward and backward survivor maps. Filling the gap and multiple completion further increase the complexity. Hence, a trade-off between complexity and performance has to be
properly found. Performance-wise, since each recursion contains the decision feedback in its direction, it might happen that even if there is severe error propagation in one of the recursions (e.g., the forward recursion using only forward decision feedback for maximum-phase channels), only little error propagation may affect the other recursion. A sort of diversity is then provided by the two recursions, and multiple completion can exploit this phenomenon. By properly filling the gap, setting multiple completion, and selecting suitable weight exponents, more reliable soft-input values could dominate over less reliable soft-input values. Unfortunately, an appropriate selection is likely to be problem-dependent and possibly difficult to predict in advance without a trial-and-error process. Nevertheless, intuitively, it is possible to find a BiD RS-FB algorithm that performs relatively well in almost any ISI channel. This is important especially when the ISI channel is mixed-phase (or for any FSM with an equivalent property), in which case Fwd-only and Bwd-only RS-FB algorithms offer unsatisfying performance. Moreover, BiD RS-FB algorithms may yield a better performance, with respect to Fwd-only or Bwd-only RS-FB algorithms, in the case of strong state reduction, especially for mixed-phase ISI channels.
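A rough, purely illustrative way of turning the above discussion into a rule of thumb is to look at how much of the channel energy is covered by the symbols embedded in the reduced state in either direction. The threshold and the decision rule below are arbitrary assumptions for the sake of the example, not a criterion taken from the text.

```python
import numpy as np

def suggest_rs_fb_variant(f, L1, threshold=0.9):
    """Heuristic (illustrative only) choice of RS-FB variant for an ISI channel f.

    If most of the channel energy lies in the first L1+1 taps, the reduced state
    captures it and a Fwd-only algorithm is natural; if it lies in the last L1+1
    taps, a Bwd-only algorithm is preferable; otherwise the channel is treated as
    mixed-phase and a BiD algorithm is suggested.
    """
    f = np.asarray(f, dtype=float)
    total = np.sum(f ** 2)
    early = np.sum(f[:L1 + 1] ** 2) / total     # energy covered by the forward reduced state
    late = np.sum(f[-(L1 + 1):] ** 2) / total   # energy covered by the backward reduced state
    if early >= threshold:
        return "Fwd-only RS-FB"
    if late >= threshold:
        return "Bwd-only RS-FB"
    return "BiD RS-FB"

print(suggest_rs_fb_variant([0.9, 0.4, 0.1, 0.05], L1=1))   # minimum-phase-like -> Fwd-only
```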
4.8.6 Examples of Application of Generalized RS-FB Algorithms
The performance of the considered receivers is assessed by means of computer simulations in terms of BER versus bit SNR E_b/N₀, E_b being the received signal energy per information bit and N₀ being the one-sided noise power spectral density. In all cases, we assume perfect CSI at the receiver side. All considered FSMs are simple FSMs with memory L. The simple FSM corresponding to a reduced-state trellis has memory L₁. In any component decoder, the min-sum version of the particular RS-FB/FB algorithm is considered. According to the general structure proposed in Section 4.8.5, a particular structure of RS-FB algorithms can be realized through an assignment of L₁, L₂, m_f, n_f, m_b, n_b, w_a^c(k, m), w_x^c(k, n), and w_a^o(k, q) in (4.209), (4.210), and (4.213). In Table 4.1, a summary of a few possibilities, which are considered in the following, is given. In particular, in all considered schemes m_f = n_f = m_b = 0, so that these parameters are not indicated in Table 4.1. The parameter θ ∈ (0, 1] appearing in Table 4.1 represents the scaling factor embedded in the weight exponents. Note that Fwd-only and BiD-Nfg-sc RS-FB algorithms are equivalent to the existing Fwd-only and BiD RS-FB algorithms in [156] and [151], respectively. The BiD-Ffg-Fmc and BiD-Pfg-Pmc RS-FB algorithms may be viewed as extensions of the BiD-Nfg-sc RS-FB algorithms, where the completion and soft-output combining make use of a larger portion of the available information. The particular structure of the BiD-Pfg-Pmc RS-
Table 4.1: Different parameter settings for RS-FB algorithms.
The parameter N ≥ 0 is, in this case, a design choice and determines the expanded trellis size.³⁸ A transition in the expanded trellis is denoted by T_k = (S_k, a_k) = (μ_{k−N}, a_{k−N}). In this case, the APP can be approximated as follows [186]:
where θ̂_k^i(S_k), i ∈ {f, b}, are forward and backward phase estimates, respectively, which are obtained as described in the following. For the sake of simplicity, we will simply use the notation θ̂_k^i to indicate an estimate of θ_k; the dependence on the state S_k should be clear from the context. By defining
the soft output on the right-hand side of (4.231) generated by an adaptive CL algorithm, which we denote by S_CL[a_k], can be written as
It is important to remark that we explicitly consider the dependence on the observation and the phase estimate only in the expression of the exponential metric γ_k^CL, assuming an implicit dependence on them in the expressions of the quantities α_k and β_{k+1}. In [186], it is shown that the metric γ_k^CL(T_k) characterizing the completion operation (4.235) can be written as
³⁸ N = 0 corresponds to the case S_k = μ_k, i.e., there is no trellis expansion.
where θ̂ is either θ̂_k^f or θ̂_{k+1}^b, and (see [186] for details)
where the meaning of the parameter A, which is related to the recursive phase estimation strategy, is clarified below. The quantities α_k(S_k) and β_{k+1}(S_{k+1}) in (4.235) can be computed by means of forward and backward recursions, during which forward and backward phase estimates are computed, respectively. Let us consider the forward recursion. In general, it can be written as follows:
The phase estimate θ̂^f is updated in a per-survivor processing (PSP) fashion, using a first-order phase-locked loop (PLL). The PLL update equation is given by
where the transition T_k^max (and the corresponding beginning state S_k^max) is the one determined by an add-compare-select operation similar to the one in (4.239), obtained by exchanging the summation operation Σ_{T_k : S_{k+1}} with a maximization operation max_{T_k : S_{k+1}}. The backward recursion, during which a backward phase estimate θ̂_k^b is updated, is similar to the forward recursion.

At this point, we would like to emphasize that the CL algorithm only keeps and updates a limited number of phase estimates. Each of these estimates corresponds to one of the survivors, i.e., one of the tree paths that are explored in the limited tree search procedure. However, due to the recursive nature of the parameter update equations, the entire memory of the channel is retained in each of these estimates (for this reason, this class of adaptive algorithms is referred to as closed-loop adaptive algorithms). This is a unique feature of the CL algorithm, which allows almost coherent performance for slow phase dynamics. This feature is, however, one of its drawbacks for fast phase dynamics, as will be evident from the results shown in the following. In all considered transmission schemes where a CL detection strategy is used, pilot symbols are inserted in the output modulated symbols, as indicated in Figures 4.55, 4.57 and 4.58.
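The per-survivor PLL update can be sketched as follows (a minimal illustration, not the book's code: the error signal, the step-size name gamma and the toy random-walk test are assumptions; the book's actual update involves the parameter A and is derived in [186]).

import numpy as np

def psp_pll_update(theta_f, r_k, c_hat_max, gamma=0.05):
    """First-order PLL step for the forward phase estimate of one survivor.
    theta_f: current forward phase estimate associated with the survivor ending in S_k;
    r_k: received sample; c_hat_max: code symbol on the winning transition T_k^max."""
    # phase error detector: imaginary part of the rotated, data-wiped sample
    e_k = np.imag(r_k * np.conj(c_hat_max) * np.exp(-1j * theta_f))
    return theta_f + gamma * e_k

# toy usage: track a slowly rotating phase with known BPSK symbols
rng = np.random.default_rng(0)
theta, theta_hat = 0.0, 0.0
for k in range(200):
    theta += rng.normal(scale=np.deg2rad(1.0))          # random-walk channel phase
    c_k = rng.choice([-1.0, 1.0])                        # transmitted symbol
    r_k = c_k * np.exp(1j * theta) + 0.1 * (rng.normal() + 1j * rng.normal())
    theta_hat = psp_pll_update(theta_hat, r_k, c_k, gamma=0.1)

Because each survivor carries its own recursively updated estimate, the whole past of the channel is folded into theta_f, which is exactly the closed-loop property discussed above.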
Open-Loop Adaptive FB Algorithms

We recall, for ease of comparison, the expression of the exponential metric of the NCSOa algorithm:³⁹
Comparing the expressions in (4.237) and (4.241), it can be observed that there is no explicit phase estimate involved in the metric γ^OL. However, by proper manipulations and slight approximations it is possible to interpret this metric as implicitly performing phase estimation based on a window of consecutive observations. More precisely, observing that I_0(x) ≈ e^x for sufficiently large x, and that, for a generic complex number c, |c| = c e^{−j∠c} = Re{c e^{−j∠c}}, (4.129) can be approximated as follows:
where the implicit phase estimates are defined as
³⁹ The current notation for the exponential metric is slightly different with respect to that used in the description of the NCSOa algorithm in Section 4.7. This is expedient to make the comparison with CL FB algorithms clearer.
If θ̂_k^(N+1) ≈ θ̂_k^(N), then (4.242) can be further approximated as
For a time-invariant channel phase rotation, the last approximation is reasonable for values of N large enough [163, 187], while for a time-varying channel phase rotation an optimal value of N exists, as will be evident from the numerical results. Moreover, the final expression in (4.244) is formally identical to the corresponding metric for the CL approach in (4.237), and clearly shows the connection between γ_k^OL(T_k, r_{k−N}^k) and the implicit OL phase estimate θ̂_k in (4.243). Since the OL phase estimate depends on the current observation window, two consecutive estimates θ̂_{k−1} and θ̂_k are not explicitly related by a recursive formula. In this sense, it is possible to interpret the phase estimation strategy embedded in (4.244) as open-loop phase estimation, as opposed to the CL estimation in CL algorithms. In all the considered transmission schemes with an OL detection strategy, pilot symbols are inserted in the input information stream, as indicated in Figures 4.55, 4.57 and 4.58.

Separate Detection and Decoding

In Sections 4.9.1 and 4.9.2, the schemes with separate detection and decoding were based on the use of differential encoding before transmission over the channel. Hence, the terminology "separate" is not formally correct since, in the detection process, besides taking into account the memory of the channel, the memory introduced by the differential encoder was also considered. A differential code is the simplest noncoherently noncatastrophic code [150], and its use allows one to overcome the unknown phase rotation introduced by the channel. However, the insertion of pilot symbols is an alternative simple way to combat the unknown channel phase rotation, making the overall code noncoherently noncatastrophic. In Figure 4.55, a possible scheme is shown. The output bits of a binary PCCC are grouped and mapped, through a memoryless mapper, to modulated symbols {c_k}. The pilot symbols can be inserted either before mapping (for example, N_p binary pilot symbols could be inserted every N_d information symbols) or after mapping (as complex pilot symbols). In either case, the noncoherent adaptive soft demodulator (A-SODEM), which generates soft-output reliability values based on the received observations, must be given knowledge of the pilot symbol insertion strategy. The considered noncoherent soft-output algorithms simply have to take advantage of this knowledge. For example, if the pilot symbols are inserted among the information bits as zero bits, the soft input SI[a_k] (in the natural or probabilistic domain) at an epoch k corresponding to the insertion of a pilot bit is set as
Figure 4.55: Parallel concatenated coding scheme with separate detection and decoding: transmitter, channel and iterative decoder constituted by the concatenation of an A-SODEM and a coherent turbo decoder. Reproduced from [129], ©2004 IEEE, by permission of the IEEE.
If, instead, we consider insertion of pilot symbols among the modulated output symbols {c_k}, a simple way to modify the basic exponential metric γ_k is to assume that pilot symbols are sent as supplementary complex symbols. Considering a vector notation, it is very simple to take into account the inserted pilot symbols: a generic correlation sum Σ_i r_{k−i} c*_{k−i} involved in this metric is extended by adding an extra term relative to the inserted pilot symbol.
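As a concrete illustration (a minimal sketch, not the book's code; the window handling, normalization and names are assumptions), the windowed correlation underlying the OL metric, the implicit phase estimate it induces (cf. (4.243)), and the extra pilot term can be computed as follows.

import numpy as np

def ol_correlation(r_win, c_win, r_pilot=None, c_pilot=None):
    """r_win, c_win: the most recent observations and hypothesized symbols in the
    window (r_{k-N}..r_k, c_{k-N}..c_k).  If a pilot falls in the window, its
    observation/symbol pair simply adds one more term to the sum."""
    corr = np.vdot(c_win, r_win)              # sum_i r_{k-i} * conj(c_{k-i})
    if r_pilot is not None:
        corr += r_pilot * np.conj(c_pilot)    # extra term relative to the pilot symbol
    theta_hat = np.angle(corr)                # implicit open-loop phase estimate
    return corr, theta_hat

Since the estimate is recomputed from each observation window, consecutive estimates are not recursively related, which is the open-loop behavior described above.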
The performance of the proposed receivers is assessed by means of computer simulations, mainly in terms of BER and frame error rate (FER) versus E_b/N_0, E_b being the received energy per information bit. The SNR loss due to the insertion of pilot symbols is accounted for in all the results presented herein. In all cases, N_p = 1 pilot symbol is inserted every N_d symbols. The performance of the considered systems under dynamic channel conditions is investigated. The time-varying phase process {θ_k} used in the simulations is a random walk with independent Gaussian increments, with variance over a signaling interval equal to σ_Δ². In the following, we will also assume that any adaptive FB algorithm is in the min-sum form, while any coherent FB algorithm is the standard FB algorithm [54]. In all presented results, we assume that the initial forward and backward phase estimates (only forward in the OL case and both in the CL case) are ideal. This assumption is justified, since the insertion of an initial and a final training sequence results in negligible bandwidth efficiency and energy loss for the considered codeword lengths. In all systems examined in this work, a number of simulations were run to roughly optimize the different system parameters (e.g., N, Q, N_d, A, etc.). However, for conciseness, only a limited number of these results are presented in order to demonstrate the main conclusions.

In Figure 4.56, the performance of the PCCC-based separate detection and decoding scheme considered in Figure 4.55 is shown. The purpose of these simulations is not to compare separate and combined strategies (see [186] for such a comparison), but rather to compare CL and OL strategies in the specific case of separate detection/decoding. In this case, the PCCC is almost identical to the combined scheme [33] described earlier. The only difference is that at every epoch the two output bits are mapped to a QPSK symbol with Gray mapping, resulting in an overall spectral efficiency of 1 bit/s/Hz. At the receiver side, the inner A-SODEM uses either the OL FB algorithm or the CL FB algorithm. The iterative detection and decoding process can be characterized by I_e external iterations between the A-SODEM and the turbo decoder, and by I_i internal iterations in the turbo decoder. The performance of the proposed adaptive algorithms is compared with the performance of the corresponding coherent scheme. In all cases, the number of external iterations I_e is set to 5. In the CL case, for σ_Δ = 5 degrees, the performance for N = 0 (single-state decoder) is shown; it is the same for either I_i = 2 or I_i = 3 internal decoding iterations. The best performance is obtained in this case by considering N_d = 4. When increasing the phase jitter to σ_Δ = 10 degrees, the performance for N = 1 and I_i = 2 internal decoding iterations is shown. As one can see, the loss with respect to the coherent limit is significant. Increasing the pilot insertion rate to N_d = 2, a performance improvement of around 3 dB is observed at a BER of 10^−4. In the OL case, the A-SODEM uses the proposed noncoherent algorithm with (N, Q) = (7, 3); I_i = 2 and I_i = 3 internal decoding iterations are considered. For σ_Δ = 5 degrees the best performance is obtained with N_d = 16, while for σ_Δ = 10 degrees it is obtained with N_d = 8.
Figure 4.56: BER of the separate scheme with rate-1/2 PCCC and QPSK output modulation. In all cases, I_e = 5 external iterations between the A-SODEM and the turbo decoder are considered. Various numbers I_i of inner decoding iterations are considered. In the CL case, the adaptive algorithm is characterized by N = 0 for σ_Δ = 5 degrees and by N = 1 for σ_Δ = 10 degrees. In the OL case, the detection algorithm is characterized by (N, Q) = (7, 3). Reproduced from [129], ©2004 IEEE, by permission of the IEEE.

Both for σ_Δ = 5 degrees and σ_Δ = 10 degrees, increasing I_i from 2 to 3 leads to a performance improvement of less than 0.3 dB. The complexity with (I_e, I_i) = (5, 2) is roughly comparable to that of the perfect CSI scheme with 10 decoding iterations. As one can see, the OL approach is more robust to strong phase variations with a reduced insertion rate with respect to the CL case, i.e., with a reduced bandwidth expansion. However, this comes at the expense of an increased number of states in the A-SODEM (Q = 3 corresponds to 64 states).

Combined Detection and Decoding

We consider both an SCCC and a PCCC. The equivalent baseband discrete-time transmission system of an SCCC is shown in Figure 4.57.
Figure 4.57: Serially concatenated coding scheme with combined detection and decoding: transmitter, channel and adaptive iterative decoder. Reproduced from [129], ©2004 IEEE, by permission of the IEEE.

The bit sequence {b_k} is encoded using an outer code and interleaved using a symbol-wise or bit-wise interleaver. The resulting sequence of M-ary symbols {a_k} is coded by an inner code, producing the coded sequence {c_k}. The resulting coded symbols are further mapped to complex symbols and transmitted over an AWGN channel which, in addition, introduces an unknown carrier phase offset. The statistics of the phase process need not be specified at this point. The complex equivalent signal can be written, after a suitable discretization process, as in (4.230). The receiver consists of an adaptive inner block, which jointly estimates the phase and produces soft information on the symbols {a_k}, and a nonadaptive outer block that produces soft decisions on {a_k}, as well as hard decisions on {b_k}. The details of the inner adaptive decoder, which can be either a CL or an OL FB algorithm, are discussed in the next section.

In Figure 4.58, a transmission scheme employing a PCCC is shown. In this case, the PCCC is constituted by two component RSC codes. For simplicity, rate-1/2 RSC codes are considered. After possible puncturing, the sequences of information and coded bits are serialized, mapped to a BPSK constellation and transmitted over the channel. At the receiver side, the turbo decoder consists of two adaptive component decoders. It should be emphasized at this point that a combined detection scheme for PCCCs, such as the one described here, can become quite complicated when higher-order constellations are used, as demonstrated in [186]. Therefore, only BPSK modulation is considered in this case, while higher-order modulations are considered in conjunction with separate detection and decoding, as described in the following subsection.

The CL and OL algorithms are first compared considering iterative decoding of two SCCCs using combined detection and decoding.
Figure 4.58: Parallel concatenated coding scheme with combined detection and decoding: transmitter, channel and iterative decoder constituted by two adaptive component decoders. Reproduced from [129], ©2004 IEEE, by permission of the IEEE.

The first SCCC consists of an outer 4-state, rate-1/2 convolutional code connected through a length-1024 pseudorandom interleaver to an inner 4-state, rate-2/3 convolutional code.⁴⁰ The respective generator matrices are given by
The output symbols are mapped to an 8-PSK constellation with natural mapping,

⁴⁰ The constituent codes in all SCCC and PCCC schemes examined herein are properly terminated using tail bits.
Figure 4.59: BER of an SCCC with OL and CL inner decoding algorithm, for phase jitter standard deviation σ_Δ = 5 degrees and 16

and this is in agreement with a result in [173]. Similarly, values of N > 17 are not considered since they do not produce any performance improvement. Therefore, the value of N = 17 (i.e., −8 ≤ i ≤ 8 in all the equations of Section 5.7.2) can be considered as optimal for σ_Δ = 6 degrees. Hence, a gap of about 0.2 dB with respect to the curve labeled "known phase" is only due to the loss in channel capacity for a time-varying channel phase.

In the binary case considered in the previous figure, the proposed algorithms have a practically optimal performance and a similar complexity. However, for a modulation format characterized by a denser constellation, while for the discretization-based algorithm the optimal number of discretization levels, and thus the complexity, must be increased, it can be expected that the number N of Fourier coefficients considered in the proposed algorithm remains practically the same. This aspect is shown in Figure 5.20, where QPSK (M = 4) is used. The phase noise has σ_Δ = 6 degrees and, also in this case, a pilot symbol is present in every block of 20 transmitted symbols. For the discretization-based algorithm L = 8 × M = 32 quantization levels are considered, whereas for the algorithm based on Fourier parameterization the number of Fourier coefficients is still N = 17.

In Figure 5.21, the performance of the algorithms based on Tikhonov and Gaussian parameterizations is shown under the same conditions as in Figure 5.19. One can observe that, despite their very low complexity, these algorithms have practically the same performance as the more computationally demanding algorithms based on discretization and Fourier parameterization. This fact can also be observed in Figure 5.22, where all the considered algorithms are compared for a Wiener phase model with σ_Δ = 6 degrees. In this figure, the performance of two other algorithms described in the literature is also shown for the sake of comparison: the first is the "ultra fast" algorithm with overlapped windows described in [93], with the value of N optimized by computer simulation; the second is based on the EM algorithm [220-222]. In order to adapt the latter algorithm to a time-varying channel phase, different phase estimates are computed for each code symbol, taking into account the contribution of the adjacent symbols belonging to a window whose size is optimized by computer simulation. For this reason, the algorithm is denoted as EM with sliding window (EM-SW). We found that the optimal window has a width of 60 symbols for the considered phase noise. In both cases, the performance loss is due to the fact that these two algorithms are designed for a different phase model, i.e., a block-constant phase.
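For reference, the Wiener phase model used in these comparisons is straightforward to simulate; the following minimal sketch (function and variable names are illustrative, not from the text) generates a random walk with independent Gaussian increments of standard deviation σ_Δ per signaling interval.

import numpy as np

def wiener_phase(n_symbols, sigma_delta_deg, rng=None):
    """Random-walk (Wiener) phase: theta_k = theta_{k-1} + Delta_k, Delta_k Gaussian."""
    rng = rng or np.random.default_rng()
    increments = rng.normal(scale=np.deg2rad(sigma_delta_deg), size=n_symbols)
    return np.cumsum(increments)

theta = wiener_phase(1000, sigma_delta_deg=6.0)   # e.g., the sigma_Delta = 6 degree case above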
Figure 5.20: Performance of the algorithms based on discretization of channel parameters and Fourier parameterization. QPSK and the Wiener model with σ_Δ = 6 degrees are considered.

Based on the above experiments and on extensive numerical evidence, it is possible to conclude that all the algorithms proposed in Section 5.7.2 exhibit a practically optimal performance (i.e., they perform as well as the discretization approach). Among them, those based on Tikhonov and Gaussian parameterizations, because of their low complexity (roughly equivalent to that of the EM-SW algorithm), represent the best candidates. For this reason, these two algorithms will be considered in the remaining results. By comparing the results in Figure 5.22 with those in Figure 5.15, one can also observe that, for σ_Δ = 6 degrees, the linear-predictive algorithm described in Section 5.6.2 has a practically optimal performance, although with a complexity higher than that of the algorithm based on Tikhonov parameterization.

The sensitivity to the distribution of the pilot symbols is considered in Figure 5.23. In the case of the Wiener model with σ_Δ = 6 degrees, two different distributions are considered, namely 1 pilot symbol in each block of 20 consecutive bits and 20
Figure 5.21: Performance of the algorithms based on Tikhonov and Gaussian parameterizations. BPSK and two different phase models are considered. pilots in each block of 400 consecutive bits (hence, the effective transmission rate is the same). We may observe that the algorithm based on Tikhonov parameterization is almost insensitive to the pilot symbol insertion strategy thanks to the algorithm modification described in Section 5.7.2. A similar modification is not possible in the case of the algorithm based on Gaussian parameterization, since it can be shown that the choice of a dominant term corresponds to a hard decision based uniquely on the decoder outcome Pd{ck}- We verified that this modification of the algorithm does not provide any performance improvement. Note that, in general, the distribution of the pilots has to be optimized for the specific detection algorithm employed. Finally, we consider the application of the algorithms to the DVB-S2 system. We consider two standardized LDPC codes with codeword length equal to 64,800 [226]: the first one has rate 2/3 and is mapped onto an 8-PSK modulation; the second one has rate 4/5 and is mapped onto a 32-amplitude phase shift keying (APSK) modulation.11 A maximum number of 50 iterations is considered and 36 pilot symbols 11
¹¹ In this case, the constellation is composed of three concentric PSK constellations with different radii, namely a QPSK, a 12-PSK and a 16-PSK. See Problem 2.2.
Figure 5.22: Performance of all the proposed algorithms and comparison with other algorithms proposed in the literature. BPSK and the Wiener phase model with σ_Δ = 6 degrees are considered.

The above-mentioned ESA phase noise model is considered. The performance is shown in Figure 5.24. For the algorithm based on Tikhonov parameterization, the loss due to phase noise is less than 0.1 dB in both cases. Notice that a further improvement in performance may be obtained if the maximum number of iterations is not limited to 50. The Kalman smoother (Gaussian parameterization) does not perform as well, mainly because of the bursty allocation of pilot symbols.
5.8 Summary
In this chapter, we have extended the general framework developed in the previous chapters for VA-based MAP sequence detection and FB-based MAP symbol detection to the case of graph-based MAP symbol detection. In particular, using the FMC or the CMP, we have described how to build a factor graph representing the joint a posteriori probability of the information symbols, taking into account channel statistics and code constraints.
Figure 5.23: Performance of the algorithms based on Tikhonov and Gaussian parameterizations. BPSK and two different pilot distributions are considered.
In this FG, the channel parameters are not explicitly represented. The desired marginals necessary to implement MAP symbol detection are approximately derived using the SP algorithm. The cases of ISI, phase-uncertain and flat-fading channels have been considered in the numerical results, showing the effectiveness of the described approach.

An alternative approach, inspired by [192], has also been considered for phase-uncertain channels. In this case, the channel parameters are explicitly represented in the FG and the method of canonical distributions has been used to simplify the implementation of the SP algorithm. We have shown that a clever choice of the canonical distribution allows one to derive graph-based detection algorithms with very low complexity and practically optimal performance.
Figure 5.24: Performance of the algorithms based on Tikhonov and Gaussian parameterizations. The ESA phase model is considered along with 8-PSK and 32-APSK modulations.
5.9 Problems

Problem 5.1: Consider a rate-2/3 binary block code with parity check matrix
and recall that c is a codeword iff H c^T = 0, where c = (c_1, c_2, c_3). Assume that this code is transmitted over a channel which introduces AWGN, and let r = (r_1, r_2, r_3) denote the received sequence.
A. Draw the factor graph of the joint APP P(c|r) and verify that the graph does not contain cycles.
B. Using the sum-product algorithm, compute the marginals P(c_i|r), i = 1, 2, 3.
C. Compute the LLRs
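For readers who wish to check their hand computation, the following hedged sketch obtains the exact marginals and LLRs by enumerating the codewords. Since the parity check matrix of the problem statement is not reproduced in this text, a hypothetical single-parity-check H, a BPSK mapping and a noise variance are assumed purely for illustration.

import itertools
import numpy as np

H = np.array([[1, 1, 1]])                      # hypothetical H (replace with the problem's)
sigma2 = 0.5                                   # assumed AWGN variance
codewords = [c for c in itertools.product([0, 1], repeat=3)
             if not np.any(H.dot(c) % 2)]      # all c with H c^T = 0

def bit_marginals(r):
    """P(c_i = 1 | r), assuming BPSK mapping 0 -> +1, 1 -> -1 over real AWGN."""
    post = []
    for c in codewords:
        x = 1 - 2 * np.array(c)                                   # BPSK symbols
        post.append(np.exp(-np.sum((np.array(r) - x) ** 2) / (2 * sigma2)))
    post = np.array(post) / np.sum(post)                          # P(c | r)
    return [sum(p for p, c in zip(post, codewords) if c[i] == 1) for i in range(3)]

p1 = bit_marginals(r=[0.9, -0.2, 1.1])
llr = [np.log((1 - p) / p) for p in p1]        # LLR_i = log P(c_i=0|r)/P(c_i=1|r) (sign convention assumed)

For a cycle-free graph such as the one in part A, the sum-product marginals must coincide with this brute-force result.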
Problem 5.2: Derive the message computations (5.9) and (5.10) from (5.3) and (5.4).

Problem 5.3: Show that the FB algorithm described in Chapter 3 can be interpreted as an instance of the SP algorithm applied to Wiberg graphs with a natural forward-backward schedule.

Problem 5.4: For an ISI channel, demonstrate that if the differences between the indexes of the channel interferers are all different, the graph has girth 6.

Problem 5.5: For an ISI channel with L = 4 and f_2 = 0 and an uncoded transmission (lower part of the graph in Figure 5.7), identify all possible stretching transformations which allow one to obtain a girth-6 graph.

Problem 5.6: For an ISI channel with L = 4 and f_2 = 0 and an uncoded transmission (lower part of the graph in Figure 5.7), transform the original graph into a cycle-free Wiberg graph.

Problem 5.7: Substituting (5.31) into (5.17), show that for frequency-flat fading channels the joint a posteriori pmf of the information symbols P(a|r) has the expression (5.32).

Problem 5.8: Compute the phase noise power spectral density for the ESA model described in Section 5.7.1.

Problem 5.9: Restate the algorithm based on Fourier parameterization in Section 5.7.2 assuming the following phase model:

where T is the symbol interval and {ν_n} is a sequence of independent and identically distributed frequency offsets with uniform distribution in [−a/T, a/T] (a is a known constant).

Problem 5.10: Let p(x) and q(x) be two probability density functions. The divergence D(p‖q) (also known as cross-entropy, or Kullback-Leibler distance [35]) is defined by
A. Let f(x) be a probability density function of a real random variable with mean μ_f and variance σ_f². Show that the solution to the divergence minimization problem min_{g∈G} D(f‖g), where G is the set of all real Gaussian pdfs, is given by g(μ_f, σ_f², x), i.e., it is the Gaussian pdf with the same mean and variance.
B. Let f(x) be the probability density function of a complex random variable with mean μ_f and variance σ_f². Show that the solution to the divergence minimization problem min_{g∈G'} D(f‖g), where G' is the set of all complex circularly symmetric Gaussian pdfs, is given by the circularly symmetric Gaussian pdf with the same mean and variance.
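A quick numerical check of part A (a hedged sketch: the Laplace test density, the grid and all names are illustrative assumptions, not from the text) confirms that, over a grid of Gaussian means and variances, the divergence is minimized by the moment-matched Gaussian.

import numpy as np

b = 1.0
mean_f, var_f = 0.0, 2 * b**2                   # Laplace(0, b): mean 0, variance 2*b^2
h_f = 1.0 + np.log(2 * b)                       # differential entropy of Laplace(0, b)

def divergence(mu, var):
    # D(f||g) = -h(f) - E_f[log g(x)] for g = N(mu, var)
    e_log_g = -0.5 * np.log(2 * np.pi * var) - (var_f + (mean_f - mu)**2) / (2 * var)
    return -h_f - e_log_g

mus = np.linspace(-1.0, 1.0, 41)
vars_ = np.linspace(0.5, 4.0, 71)
D = np.array([[divergence(m, v) for v in vars_] for m in mus])
i, j = np.unravel_index(np.argmin(D), D.shape)
print(mus[i], vars_[j])                         # approximately 0 and 2*b^2: the moment-matched Gaussian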