DATA HANDLING IN SCIENCE AND TECHNOLOGY — VOLUME 18
Signal treatment and signal analysis in NMR
DATA HANDLING IN SCIENCE AND TECHNOLOGY
Advisory Editors: B.G.M. Vandeginste and S.C. Rutan

Other volumes in this series:
Volume 1  Microprocessor Programming and Applications for Scientists and Engineers by R.R. Smardzewski
Volume 2  Chemometrics: A Textbook by D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte and L. Kaufman
Volume 3  Experimental Design: A Chemometric Approach by S.N. Deming and S.L. Morgan
Volume 4  Advanced Scientific Computing in BASIC with Applications in Chemistry, Biology and Pharmacology by P. Valkó and S. Vajda
Volume 5  PCs for Chemists, edited by J. Zupan
Volume 6  Scientific Computing and Automation (Europe) 1990, Proceedings of the Scientific Computing and Automation (Europe) Conference, 12-15 June, 1990, Maastricht, The Netherlands, edited by E.J. Karjalainen
Volume 7  Receptor Modeling for Air Quality Management, edited by P.K. Hopke
Volume 8  Design and Optimization in Organic Synthesis by R. Carlson
Volume 9  Multivariate Pattern Recognition in Chemometrics, illustrated by case studies, edited by R.G. Brereton
Volume 10 Sampling of Heterogeneous and Dynamic Material Systems: theories of heterogeneity, sampling and homogenizing by P.M. Gy
Volume 11 Experimental Design: A Chemometric Approach (Second, Revised and Expanded Edition) by S.N. Deming and S.L. Morgan
Volume 12 Methods for Experimental Design: principles and applications for physicists and chemists by J.L. Goupy
Volume 13 Intelligent Software for Chemical Analysis, edited by L.M.C. Buydens and P.J. Schoenmakers
Volume 14 The Data Analysis Handbook by I.E. Frank and R. Todeschini
Volume 15 Adaption of Simulated Annealing to Chemical Optimization Problems, edited by J.H. Kalivas
Volume 16 Multivariate Analysis of Data in Sensory Science, edited by T. Naes and E. Risvik
Volume 17 Data Analysis for Hyphenated Techniques by E.J. Karjalainen and U.P. Karjalainen
Volume 18 Signal Treatment and Signal Analysis in NMR, edited by D.N. Rutledge
DATA HANDLING IN SCIENCE AND TECHNOLOGY — VOLUME 18
Advisory Editors: B.G.M. Vandeginste and S.C. Rutan
Signal treatment and signal analysis in NMR
edited by
D.N. RUTLEDGE
Laboratoire de Chimie Analytique, Institut National Agronomique, 16, rue Claude Bernard, 75005 Paris, France
1996 ELSEVIER
Amsterdam — Lausanne — New York — Oxford — Shannon — Tokyo
ELSEVIER SCIENCE B.V. Sara Burgerhartstraat 25 P.O. Box 211, 1000 AE Amsterdam, The Netherlands
ISBN 0-444-81986-X
© 1996 Elsevier Science B.V. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher, Elsevier Science B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM Amsterdam, The Netherlands. Special regulations for readers in the USA: This publication has been registered with the Copyright Clearance Center Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the USA. All other copyright questions, including photocopying outside of the USA, should be referred to the copyright owner, Elsevier Science B.V., unless otherwise specified. No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. This book is printed on acid-free paper. Printed in The Netherlands.
Preface

Signal analysis and signal treatment are integral parts of all types of Nuclear Magnetic Resonance. In the last ten years, much has been achieved in the development of new methods for analysing standard NMR signals such as relaxation curves and 1-dimensional spectra. At the same time, new NMR techniques such as NMR Imaging and multidimensional spectroscopy have appeared, requiring entirely new methods of signal analysis. However, until now, most NMR texts and reference books limited their presentation of signal processing to a short introduction to the principles of the Fourier Transform, signal convolution, apodisation and noise reduction. Therefore, if one wished to understand the mathematics of the newer signal processing techniques, it was usually necessary to go back to the primary references in NMR, chemometrics and mathematics journals. The objective of this book was to fill this void by presenting, in a single volume, both the theory and applications of most of these new techniques to Time-Domain, Frequency-Domain and Space-Domain NMR signals. Although the book is primarily aimed at NMR users in the medical, industrial and academic fields, it should also interest chemometricians and programmers working with other techniques. In the same way that concepts derived from astronomy (Maximum Entropy Method), Near Infrared Spectroscopy (Multivariate Statistical Analysis) and chromatography (Stochastic Excitation) have proven to be fertile in NMR, I am sure that the ideas presented by the authors in this book will be of great use in other fields where similar signals - images, relaxation curves, spectra, multidimensional sequential data - are to be treated and analysed. The procedures are presented in detail and some of the computer programs are included, either as source code or in executable form. For those who wish to refer to the primary articles, almost 1000 references are given.
I wish to thank Bernard Vandeginste who was the first to realise there was a need for such a book, the authors whose enthusiasm for this project echoed my own, and my family, both near and far, without whose support it would not have been possible.
Douglas N. Rutledge
Contents

PREFACE
Douglas N. RUTLEDGE
GENERAL THEORY OF SIGNAL ANALYSIS IN NMR

1 Fourier Transform and Signal Manipulation
Luc J. EVELEIGH
Basics of signal processing
The Fourier transform: main features
Fourier transform of sampled data
References

2 Maximum Entropy Methods in NMR Data Processing
Kevin M. WRIGHT
Introduction
Approaches to the Maximum Entropy Formalism
MEM algorithms
Applications of MEM in NMR data processing
The definition of Entropy
References

3 Analysis of NMR Relaxation Data
Kenneth P. WHITTALL
Introduction
Mathematical description
Nonlinear parameter estimation
Linear inverse theory
Orthogonal function expansions
Linear Prediction and Singular Value Decomposition
Rational function methods
Deconvolution methods
Nonuniqueness and resolution
Conclusions
References
4 Nonlinear Regression
Ricardo S. CARMENES
Introduction
Numerical methods
Further topics
Acknowledgments
References
5 The Padé-Laplace Analysis of NMR Signals
Jean AUBARD and Patrick LAVOIR
Introduction
General theory of the Padé-Laplace method
Padé-Laplace analysis of multiexponential curves
The particular case of Padé-Laplace analysis of NMR FID
Conclusion
References
6 Digital filtering
Keith J. CROSS
Introduction: why use digital filtering?
Mathematical background
Computational pitfalls
Applications
Conclusion
References
Appendix A: Glossary of symbols and abbreviations
Appendix B: PASCAL code for tan Butterworth filter
Appendix C: The Mth roots of −1
7 Binomial Filters
Ēriks KUPČE
Introduction
Theory
Postacquisition filters
Chemical shift filters
Multiple quantum filters
J filters
Conclusion
Appendix
References
8 Linear Prediction and Singular Value Decomposition in NMR Signal Analysis
Mihaela LUPU & Dorin TODOR
Linear Prediction: elementary theory
Linear Prediction of NMR signals
SVD and its properties
Solving the prediction coefficients
LPSVD
HSVD (State Space Approach)
Conclusion
References
APPLICATIONS IN THE TIME-DOMAIN

9 A Windows Program for Relaxation Parameter Estimation
Douglas N. RUTLEDGE
Introduction
Marquardt method of nonlinear parameter estimation
File menu
Acquisition menu
Simulate menu
Calculate menu
View menu
Plot menu
Convert menu
Options menu
Help menu
References
10 Continuous Relaxation Time Distribution Decomposition by MEM
François MARIETTE, Jean-Pierre GUILLEMENT, Charles TELLIER and Philippe MARCHAL
Introduction
The Maximum Entropy Method
Validation of MEM by simulation
Validation of MEM with experimental data
Conclusion
References
11 Examples of the Use of Padé-Laplace in NMR
Denis LE BOTLAN
Introduction
Experimental conditions
Study of a model
Study of a starch suspension
Conclusion
References
12 Analysis and interpretation of NMR water relaxation and diffusion data
Brian P. HILLS

Introduction
Relaxation in spatially homogeneous solutions and gels
Exchange in spatially heterogeneous systems
Activity coefficients and NMR water relaxation
Electrical conductivity and NMR water relaxation
Conclusion
Appendix
References
13 Scattering Wavevector Analysis of Pulsed Gradient Spin Echo Data
Andrew COY and Paul T. CALLAGHAN

The Pulsed Gradient Spin Echo Method
Restricted diffusion
Direct imaging of molecular motion
Conclusion
References
APPLICATIONS IN THE FREQUENCY-DOMAIN

14 Accuracy and Precision of Intensity Determinations in Quantitative NMR
Jean-Philippe GRIVET

Introduction
Relaxation times and relaxation delays
Baseline and phase anomalies
Integration algorithms and integration range
Noise in NMR spectrometers
Precision of integrals
Least squares methods
Linear Prediction
Maximum Entropy Methods
The use of modulus or power spectra
Precision of derived parameters
Conclusion
References
15 Least Squares Estimation of Parameters Affecting NMR Line-shapes in Multi-Site Chemical Exchange
Guido CRISPONI
Introduction
General equations for signal shape
Least Squares method
Computer program
Problems in Least Squares optimisation
Symbols
References
16 Reference Deconvolution in NMR
Gareth A. MORRIS
Reference deconvolution
Practical implementation
Applications
Conclusions
Acknowledgments
References

17 Continuous Wave and Rapid Scan Correlation NMR
Peter S. BELTON
Introduction
The principles of CW NMR
Rapid scan methods
Applications of Rapid Scan Correlation NMR
Conclusions
References

18 Data processing in High-Resolution Multidimensional NMR
Marc A. DELSUC
Introduction
Nature of multidimensional data set - some mathematics
Basic processing
Display
Quality improvement
Non-FT spectral analysis
Alternate samplings
Parameter evaluation
Minimisation of the computer burden
References
19 Neural Networks for 2D NMR Spectroscopy
Simon A. CORNE
Introduction
Pattern recognition and Neural Networks
Data abstraction and Neural Networks
Assignment of protein spectra
Acknowledgements
References
20 Analysis of Nuclear Magnetic Resonance Spectra of Mixtures Using Multivariate Techniques
Trond BREKKE and Olav M. KVALHEIM
Introduction
Carbon-13 Nuclear Magnetic Resonance Spectroscopy
Data pretreatment
Multivariate Analysis
Applications
Conclusions
References
APPLICATIONS IN THE SPATIAL-DOMAIN

21 Quantitative Magnetic Resonance Imaging: Applications and Estimation of Errors
Simon J. DORAN
Introduction
The uses of quantitative imaging
Practical problems in quantitative imaging
Data fitting
Random errors
Systematic errors
Conclusion
Acknowledgements
References
22 Stochastic Spectroscopic Imaging
Helge NILGENS & Bernhard BLÜMICH
Introduction
Noise
Processing of the linear response
Nonlinear response
Processing of the nonlinear response
Experimental examples
Summary
References

23 Application of Multivariate Data Analysis Techniques to NMR Imaging
Hans GRAHN & Paul GELADI
Summary
Introduction
Tomography, MR Imaging and MR Spectra
Multivariate Images
The experiment
Experimental Design
Multivariate Image Analysis
Acknowledgements
References
ACRONYMS

INDEX
LIST OF CONTRIBUTORS

J. AUBARD
Institut de Topologie et de Dynamique des Systèmes, 1, rue Guy de la Brosse, 75005 Paris, FRANCE
P. BELTON
BBSRC, Institute of Food Research Norwich Research Park Colney, Norwich, NR4 7UA, UNITED KINGDOM
B. BLUMICH
Lehrstuhl für Makromolekulare Chemie, RWTH Aachen, Sammelbau Chemie, Worringerweg 1, 52056 Aachen, GERMANY
T. BREKKE
Department of Chemistry University of Bergen N-5007 Bergen, NORWAY
P. T. CALLAGHAN
Department of Physics & Biophysics, Massey University, Private Bag 11-222, Palmerston North, NEW ZEALAND
R. S. CARMENES
Departamento de Biologia Funcional Area de Bioquimica y Biologia Molecular Universidad de Oviedo, 33071 Oviedo, SPAIN
S. CORNE
School of Geography University of Leeds Leeds LS2 9JT, UNITED KINGDOM
A. COY
Department of Physics & Biophysics Massey University, Private Bag 11-222 Palmerston North, NEW ZEALAND
G. CRISPONI
Dipartimento di Chimica e Tecnologie Inorganiche e Metallorganiche, Università di Cagliari, via Ospedale 72, 09124 Cagliari, ITALY
K. CROSS
42 Patrick Close Greensborough, Victoria 3088, AUSTRALIA
M-A. DELSUC
Centre de Biochimie Structurale, Faculté de Pharmacie, Université de Montpellier I, 34060 Montpellier cedex, FRANCE
S. J. DORAN
Department of Physics University of Surrey Guildford, Surrey, UNITED KINGDOM
L. EVELEIGH
Laboratoire de Chimie Analytique Institut National Agronomique 16, rue Claude Bernard, 75005 Paris, FRANCE
P. GELADI
Department of Organic Chemistry Umea University Umea, S-90187, SWEDEN
H. GRAHN
Karolinska Institute, Dept. of Surgery, MR Center, Karolinska Hospital, S-171 76 Stockholm, SWEDEN
J.-P. GRIVET
Centre de Biophysique Moléculaire, Université d'Orléans, 1A, avenue de la Recherche Scientifique, 45071 Orléans cedex 2, FRANCE
J. P. GUILLEMENT
Laboratoire de RMN et Réactivité Chimique, Université de Nantes, 2, rue de la Houssinière, 44072 Nantes, FRANCE
B. P. HILLS
BBSRC, Institute of Food Research, Norwich Research Park, Colney, Norwich, NR4 7UA, UNITED KINGDOM
Ē. KUPČE
Department of Biochemistry University of Oxford, South Parks Road Oxford OX1 3QU, UNITED KINGDOM
O. KVALHEIM
Department of Chemistry University of Bergen N-5007 Bergen, NORWAY
D. LE BOTLAN
Laboratoire de RMN et Réactivité Chimique, Université de Nantes, 2, rue de la Houssinière, 44072 Nantes, FRANCE
M. LUPU
Institute of Physics and Nuclear Engineering, Div 1, Bucharest-Magurele, PO Box MG6, ROMANIA
F. MARIETTE
CEMAGREF Divison Technologie 17 Avenue de Cucill6, 35044 Rennes cedex, FRANCE
G. MORRIS
Department of Chemistry University of Manchester, Oxford Road Manchester M13 9PL, UNITED KINGDOM
H. NILGENS
Max-Planck-Institut für Polymerforschung, Mainz, GERMANY
D. N. RUTLEDGE
Laboratoire de Chimie Analytique Institut National Agronomique 16, rue Claude Bernard, 75005 Paris, FRANCE
C. TELLIER
Laboratoire de RMN et Réactivité Chimique, Université de Nantes, 2, rue de la Houssinière, 44072 Nantes, FRANCE
D. TODOR
Institute of Physics and Nuclear Engineering, Div 1, Bucharest-Magurele, PO Box MG6, ROMANIA
K. WRIGHT
BBSRC, Institute of Food Research Norwich Research Park Colney, Norwich, NR4 7UA, UNITED KINGDOM
K. P. WHITTALL
Dept. of Radiology, University of British Columbia 2211 Wesbrook Mall, Vancouver B.C., CANADA
Signal Treatment and Signal Analysis in NMR
Ed. by D.N. Rutledge
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 1

FOURIER TRANSFORM AND SIGNAL MANIPULATION

Luc J. Eveleigh
Laboratoire de Chimie Analytique, Institut National Agronomique Paris-Grignon, Paris, FRANCE

1 BASICS OF SIGNAL PROCESSING

The aim of this chapter is to present the main mathematical tools used in Fourier transform NMR spectroscopy and in the associated signal processing. As an introduction, the advantage of Fourier transform spectroscopies compared to classical ones is first discussed. Then the behaviour of a linear system quite similar to a relaxing nuclear magnetic moment is analysed. This analysis can be done in the time domain as well as in the frequency domain, provided some mathematical tools are introduced. Those wishing to have more details should consult the references given at the end of the chapter (Marshall and Verdun, 1990; Oppenheim and Schafer, 1989; Jansson, 1984).

1.1 Preliminaries: the Fellgett advantage

A spectrum, whatever the type of spectroscopy, can be seen as a set of measurements of various physical properties, for instance the absorbance at various wavelengths. These physical properties can be measured one at a time, as in a classical spectrometer, where a slot successively selects the wavelengths. Several properties can be measured simultaneously if several detectors are available, for instance the photodiodes in a diode array detector. It is also possible to measure different combinations of the physical properties and use some adequate method, usually based on mathematics, to recover the value of each individual property. This "multiplex" strategy is used for instance in experimental design, when several experimental parameters are varied at the same time. A classical example of this method is the weighing of objects using a double-pan balance: in this example, one must weigh three objects with unknown masses m1, m2 and m3. Each weighing gives a result yi = xi + ei, where xi is the mass of the weights needed to balance the mass of the unknown, and ei an experimental error. The classical method is to weigh one object at a time, as shown in Fig. 1.
Fig. 1 : Classical weighing of an object.

yi = mi + ei    (1)
The experiment thus gives m1, m2 and m3, each with an error e. In the multiplex method, the three objects are measured together: two are put on the right pan and one on the left pan, as in Fig. 2.
Fig. 2 : Multiplex weighing of a set of objects.

y1 = m1 + m2 − m3 + e1
y2 = m1 − m2 + m3 + e2
y3 = −m1 + m2 + m3 + e3

The masses are evaluated by:

2m1 = y1 + y2, with error e1 + e2 = e1,2
2m2 = y1 + y3, with error e1 + e3 = e1,3
2m3 = y2 + y3, with error e2 + e3 = e2,3    (2)
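The decoding in equation (2) is easy to check numerically. The sketch below (plain Python; the function names and the example masses are mine, not from the chapter) codes the three weighings of Fig. 2 and then inverts them:

```python
import random

def weigh_multiplex(m1, m2, m3, noise=0.0):
    """Three weighings, each combining the unknown masses as in Fig. 2."""
    e = lambda: random.gauss(0.0, noise) if noise else 0.0
    y1 = m1 + m2 - m3 + e()
    y2 = m1 - m2 + m3 + e()
    y3 = -m1 + m2 + m3 + e()
    return y1, y2, y3

def decode(y1, y2, y3):
    """Invert the coding: each mass is half the sum of two weighings (eq. 2)."""
    return (y1 + y2) / 2, (y1 + y3) / 2, (y2 + y3) / 2

y = weigh_multiplex(3.0, 5.0, 2.0)   # noiseless case: decoding is exact
print(decode(*y))                     # -> (3.0, 5.0, 2.0)
```

With `noise` set to a non-zero standard deviation, repeating the experiment shows that each decoded mass carries the combined error of the two weighings it uses, which is the point made in the text below.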
The error e1,2 = e1 + e2 is only √2 times as high as e1 or e2, if the latter are independent. Thus equation (2), compared to equation (1), gives m1 from a signal increased by a factor of 2 and a noise increased by only a factor of √2. This multiplex or so-called Fellgett advantage comes from the fact that two measurements are used for the evaluation of m1. The disadvantage is that a "decoding step" is required to get m1, m2 and m3 from y1, y2 and y3, and that the coding function used when various physical properties are measured together must be invertible. This is the case in Fourier spectroscopy. All the components of a spectrum are measured together, resulting in a dramatic improvement in signal-to-noise ratio. The Fellgett advantage is therefore one of the most striking features of Fourier spectroscopies. 13C NMR spectroscopy, for instance, exists only thanks to the enhancement of sensitivity provided by this advantage.

1.2 Encoding a signal: the various coding functions available

Several "coding functions" are used in analytical chemistry to take advantage of the multiplex method. Among them are:
1.2.1 Hadamard matrix

This is very similar to the procedure explained in the weighing experiment, and is used in optical (IR) spectroscopy. The spectral window is divided into N channels using an N-channel mask interposed between the spectral disperser and the detector. Each channel can be set open or closed, and N different combinations of the N light intensities can be measured. Each intensity can be calculated from the measurements by resolving the set of N equations in N unknowns. If the combinations are made following a Hadamard matrix (a special matrix), this equation solving (the inversion of the matrix) is very easy.

1.2.2 Pseudo Random Binary Sequence

When a PRBS is used, the same signal, which lasts a long time (for instance a chromatogram), is measured N times. The N analyses are not performed at exactly the same time: a sequence of successive analyses, overlapping in time, is made. In chromatography, the same mixture is injected or not injected every few seconds, following a binary random sequence in which 0 codes for non-injection and 1 for injection. The recorded signal is a combination of overlapping chromatograms. Since the sequence has some special properties (this is why it is a PSEUDO random sequence), it is possible to recover by correlation (see below) the desired chromatogram from the complicated detector signal and from the injection sequence. A number of papers have described this so-called correlation chromatography (Annino, 1976; Smit, 1983; Kaljurand et al., 1990; Yang et al., 1992).

1.2.3 Fourier coding function

This is the "natural" code for the measurement of physical phenomena related to waves (light, sound). In this coding procedure, each amplitude of a spectrum is associated with a sinusoidal function, in fact the vibration itself. The measured signal is the sum (i.e. combination) of the vibrations involved in the observed phenomenon. Since a sinusoidal function varies along time, successive measurements in time give different combinations
which can be decoded using the Fourier transform. At least N measurements in time are required to calculate N amplitudes in the spectrum. This presentation is helpful in order to understand Fellgett's advantage, but might sound somewhat artificial. We will see below the better-known and more classical physical meaning of Fourier transforms.

1.3 Decoding the Fourier code: the Fourier transform

In the Fourier code, each intensity is associated with a vibration. Therefore, it can be recovered with a classical spectrometer. What does a classical spectrometer in fact do? Firstly, it separates vibrations, which are time-dependent phenomena, according to their characteristic frequency. Secondly, it measures the intensity of each vibration. The separating device might be a disperser or a filter. Thus, the Fourier code can be decoded if a mathematical separating device is available. In this chapter, we will see that the Fourier decoding operator, namely the Fourier transform, is very similar to a classical RLC electrical filter. We shall then examine how a frequency is extracted by such a system. Another reason for choosing an RLC filter is that it behaves like a set of nuclei in a magnetic field: when displaced from its equilibrium position and released, it oscillates with an exponential decay in a way very similar to the Free Induction Decay of NMR. The advantage of the RLC over NMR is that we have become familiar with it since secondary school!

Suppose that an input time-domain signal¹ vi(t) is filtered by a band-pass filter to give an output signal vo(t), as shown in Figure 3. To simplify, we consider that there are no leaks at points X and Y, so that the intensity is the same at R, C and L.
Fig. 3 : A band-pass RLC filter (input vi(t), output vo(t) taken across R).

1.3.1 Resolution for an infinite sine wave

If the input signal vi(t) is an infinite sinusoidal wave:
¹ In this chapter, lower case will be used for signals when considered as being functions of time, and upper case when they are considered as functions of the frequency. For instance fid(t) is the classical Free Induction Decay, and FID(ν) is the NMR spectrum.
vi(t) = E cos(2πνt) = E [exp(2jπνt) + exp(−2jπνt)] / 2    (3)

the form of the output is known: vo(t) is also an infinite sinusoidal wave, but of a smaller amplitude, because of the attenuation of the filter, which we can easily calculate in the frequency domain. The complex impedances of the resistor, the capacitor and the inductor being R, 1/(jC2πν) and jL2πν, the intensity in the circuit is:

i(t) = vi(t) / [R + 1/(jC2πν) + jL2πν]    (4)

and it is obvious that the tension at the terminals of the resistor is:

vo(t) = R i(t) = FR(ν) vi(t)    (5)
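Equations (4) and (5) can be evaluated numerically. A short sketch (plain Python; the component values R, L and C are arbitrary illustrative choices, not taken from the chapter) computes the transmittance and confirms that at the resonance frequency the reactances of L and C cancel, so the filter transmits without attenuation:

```python
import cmath, math

def FR(nu, R=50.0, L=1e-3, C=1e-9):
    """Transmittance of the band-pass RLC filter of Fig. 3 (equations 4-5)."""
    omega = 2 * math.pi * nu
    Z = R + 1 / (1j * C * omega) + 1j * L * omega   # series impedance
    return R / Z                                     # v_o / v_i

nu0 = 1 / (2 * math.pi * math.sqrt(1e-3 * 1e-9))    # resonance frequency
print(abs(FR(nu0)))       # at resonance: essentially 1 (no attenuation)
print(abs(FR(nu0 / 10)))  # far from resonance: strongly attenuated
```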
The frequency response FR(ν) (or transmittance) of the system is shown in Fig. 4.

Fig. 4 : Frequency response of a band-pass RLC filter (real and imaginary parts, with peaks at ±ν0).

The transmittance is maximum for the frequencies ±ν0 = ±1/(2π√(LC)). The width at half height of the peaks associated with these frequencies in the real part of FR(ν) is W½ = R/(2πL). If the input is a real function, vi(t) for instance, the output is a real function too, because the real part of FR(ν) is even and its imaginary part odd. Some dephasing may occur.

1.3.2 Resolution for a transient signal

On the contrary, if the input is a transient signal, the output has to be calculated in the time domain by solving the following differential equation, where q(t) is the charge of the capacitance C:
L d²q/dt² + R dq/dt + q/C = vi(t)    (6)

The output is the tension at the ends of the resistor:

vo(t) = R dq/dt
The output for various transient signals is shown in Fig. 5.

Fig. 5 : Output vo(t) of an RLC filter for various transient inputs vi(t).

Let us now study the more general case of any signal. In the time domain, equation (6) cannot be solved if vi(t) is not a simple function (step, exponential...); in the frequency domain, a complex impedance can be defined only for a given frequency, that is for a pure sinusoidal signal (in fact, the impedance defines a particular solution of the time-domain equation when vi(t) is a sinusoid). However, in both cases, these problems can be overcome because the filter is a linear system: if vo1 and vo2 are two solutions of (6), then vo1 + vo2 is a solution too. If vi(t) can be decomposed into a sum of simple functions for which the solution of the differential equation is known, then the solution of the equation for vi(t) is merely the sum of these solutions. There are two main ways of decomposing vi(t): as a sum of wave functions, or as a sum of successive "slices".
1.3.3 Decomposition in the frequency domain

The first decomposition is given by the Fourier theorem: if vi(t) is finite in time, it can be coded in the Fourier code, that is in the form of a sum of sinusoidal functions:

vi(t) = ∫_-∞^+∞ Vi(ν) exp(2jπνt) dν    (7)

The Fourier theorem alone does not give the function Vi(ν), but does guarantee it exists. The signal Vi(ν) at frequency ν is modified by the filter and becomes Vo(ν):

Vo(ν) = FR(ν) Vi(ν)

vo(t) is the sum of the outputs corresponding to the input functions involved in the decomposition:

vo(t) = ∫_-∞^+∞ Vo(ν) exp(2jπνt) dν = ∫_-∞^+∞ FR(ν) Vi(ν) exp(2jπνt) dν    (8)
1.3.4 Decomposition in the time domain

The second decomposition is to cut the time-domain signal into a series of thin "slices" which, if gathered together, would give the signal again. When infinitely thin, a slice is called an impulsion. Let us note δx(t) the unitary impulsion at time x. In order to use a more general definition, we can decide that δx(t) is the result of the translation to the date x of the impulse δ(t) at time t = 0, as illustrated by Fig. 6.

Fig. 6 : Dirac impulsion at t = 0 and at t = x.

δx(t) = δ(t−x)    (9)

If we take a "slice" of the input function at time τ, we get vi(τ) δ(t−τ). The sum of the "slices" must give the original input:

vi(t) = ∫_-∞^+∞ vi(τ) δ(t−τ) dτ    (10)

and the solution is just the sum of the solutions for the various components of the input. Each component δ(t−τ) has a coefficient vi(τ), so that the components of the output are ir(t−τ) vi(τ), where ir(t), the impulse response, is the solution of equation (6) when the input is the δ function:

vo(t) = ∫_-∞^+∞ vi(τ) ir(t−τ) dτ    (11)
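A discrete version of equation (11) is an ordinary convolution sum. The sketch below (plain Python; the damped-cosine impulse response and its parameters are arbitrary stand-ins for ir(t), not values from the chapter) checks the superposition property on which the whole argument rests:

```python
import math

def impulse_response(n, tau=10.0):
    """A damped oscillation standing in for ir(t) (hypothetical parameters)."""
    return math.exp(-n / tau) * math.cos(0.5 * n) if n >= 0 else 0.0

def convolve(v_in, n_out=40):
    """Discrete version of equation (11): v_o(n) = sum_k v_i(k) ir(n-k)."""
    return [sum(v_in[k] * impulse_response(n - k) for k in range(len(v_in)))
            for n in range(n_out)]

x1 = [1.0] + [0.0] * 19            # an impulsion at n = 0 ...
x2 = [0.0, 0.0, 2.0] + [0.0] * 17  # ... and a scaled, delayed impulsion
y1, y2 = convolve(x1), convolve(x2)
y12 = convolve([a + b for a, b in zip(x1, x2)])
# linearity: the response to a sum is the sum of the responses
assert all(abs(a - (b + c)) < 1e-12 for a, b, c in zip(y12, y1, y2))
```

The output for the single impulsion `x1` is, as expected, just the impulse response itself, shifted and scaled copies of which build up the response to any input.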
1.3.5 The Fourier transform

As we already mentioned, the Fourier theorem indicates that for most functions a Fourier code exists: a time function f(t) is the sum of sinusoidal forms, each at a frequency ν0 with a given amplitude F(ν0) in the sum:

f(t) = ∫_-∞^+∞ F(ν0) exp(2jπν0t) dν0    (12)

Thanks to our study of RLC filters, we are now able to find out how to decode the Fourier-encoded signal, that is, given f(t), find the amplitude F(ν0). We just have to imagine a filter which transmits only the signal at frequency ν0. The frequency response of such a filter, called the "Fourier transform", is shown in Fig. 7.

Fig. 7 : Frequency response of a filter which keeps complex frequency ν0 only.

It is possible to build, or at least to imagine, an RLC filter with such characteristics: as its bandwidth is null, R/L must be null too; L and C must be chosen so that its nominal frequency is ν0. It is easy to show that the impulse response of such a filter is:

2cos(2πν0t) = exp(2jπν0t) + exp(−2jπν0t)    (13)

This would be a real filter which keeps imaginary frequencies ν0 and −ν0. As the filter is to keep the positive frequency only, its impulse response must be:

exp(2jπν0t)    (14)

If this filter is applied to a signal f(t), we get an output f_ν0(t) which is a sine wave at frequency ν0, with an amplitude corresponding to the contribution of this frequency to f(t). This output signal can be derived from our previous reasoning on filters, using a formula similar to equation (11):

f_ν0(t) = ∫_-∞^+∞ f(u) exp[2jπν0(t−u)] du    (15)

f_ν0(t) = exp(2jπν0t) ∫_-∞^+∞ f(u) exp(−2jπν0u) du    (16)

As expected, f_ν0(t) is a sine wave with frequency ν0, and an amplitude F(ν0) given by the Fourier transform of f(t):

F(ν0) = ∫_-∞^+∞ f(u) exp(−2jπν0u) du    (17)
It is noticeable that the Fourier coding formula (12) and the Fourier decoding formula (17) are very similar, except for the sign in the complex exponential and, more importantly, the integration variable. One might have the feeling that there is something wrong with equation (17): it cannot be applied to the function f(t) = A exp(2jπν0t) to get its amplitude A at frequency ν0, because the sum becomes infinite! The reason is that, in this case, f(t) is not a finite function. Fortunately, there is another pair of Fourier transforms, with the corresponding Fourier theorem: a periodic function f(t) of period T can be decomposed into a sum of harmonics:

f(t) = Σ_{n=-∞}^{+∞} F(νn) exp(2jπνnt)    where νn = n/T    (18)

F(νn) = (1/T) ∫_0^T f(t) exp(−2jπνnt) dt    (19)

In fact, these two transforms are the ones initially discovered by Fourier at the end of the 18th century.

2 THE FOURIER TRANSFORM: MAIN FEATURES

In this section, using our filter experiment, we shall see the main definitions and the theorems required for signal processing with the Fourier transformation.

2.1 The Fourier transform and its symmetries

The direct and inverse Fourier transforms may be defined as what we called the decoding and coding operations. F is the transform of f, and f the inverse transform of F, when:
F(ν) = ∫_-∞^+∞ f(t) exp(−2jπνt) dt    (20)

f(t) = ∫_-∞^+∞ F(ν) exp(2jπνt) dν    (21)
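The pair (20)-(21) has a discrete analogue that can be checked numerically. The sketch below (plain Python, my own naive O(N²) sums rather than an FFT) verifies that the decoding formula inverts the coding formula, up to the 1/N normalisation conventional in the discrete case, and that a pure complex exponential is decoded to a single spectral amplitude:

```python
import cmath

def dft(f):
    """Discrete analogue of equation (20)."""
    N = len(f)
    return [sum(f[t] * cmath.exp(-2j * cmath.pi * v * t / N) for t in range(N))
            for v in range(N)]

def idft(F):
    """Discrete analogue of equation (21); the 1/N keeps the pair consistent."""
    N = len(F)
    return [sum(F[v] * cmath.exp(2j * cmath.pi * v * t / N) for v in range(N)) / N
            for t in range(N)]

f = [complex(x) for x in (1.0, 2.0, 0.5, -1.0, 0.0, 3.0)]
g = idft(dft(f))
assert all(abs(a - b) < 1e-9 for a, b in zip(f, g))   # decoding inverts coding
```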
From these definitions, it is easy to demonstrate the symmetries reported in Table 1.

2.2 Time and frequency shifting. The first-order dephasing

Any shift in the origin of time has an effect on the spectrum:

S(ν) = ∫_-∞^+∞ f(t+t0) exp(−2jπνt) dt = exp(2jπνt0) F(ν)    (22)
In NMR. it is not possible to perfectly synchronise the beginning of the relaxation process and the observation of the rid. so that the acquired signal s(t) is not the free induction decay, but the rid with a slight change in time origin: s(t) = fid(t+to). The Fourier transformation is performed on s(t) and the result is not the true NMR spectrum, but the spectrum S(v) with a dephasing which is a linear function of the frequency: exp(2jnvto). A similar problem is observed in the case of the inverse transform :
∫_{-∞}^{+∞} F(v+v0) exp(2jπvt) dv = exp(-2jπv0t) f(t)   (23)

Table 1: Main symmetries of the Fourier transform (* denotes complex conjugation)

if f(t) is real                then F(-v) = [F(v)]*
if f(t) is imaginary           then F(-v) = -[F(v)]*
if f(t) is even                then F(v) is even
if f(t) is odd                 then F(v) is odd
if f(t) is real and even       then F(v) is real and even
if f(t) is real and odd        then F(v) is imaginary and odd
if f(t) is imaginary and even  then F(v) is imaginary and even
if f(t) is imaginary and odd   then F(v) is real and odd
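The first order dephasing described by equation (22) is easy to reproduce numerically. A minimal sketch with numpy (the signal and the shift are arbitrary illustrative choices): a circular shift of a sampled signal multiplies its DFT by a phase that is linear in frequency:

```python
import numpy as np

# Discrete analogue of equation (22): a circular shift of the sampled
# signal multiplies its spectrum by a phase linear in the frequency index.
N = 64
t = np.arange(N)
x = np.exp(-t / 20.0) * np.cos(2 * np.pi * 8 * t / N)  # toy "fid"

k = 5                                # shift of k sampling periods
x_shifted = np.roll(x, -k)           # x_shifted[n] = x[n + k] (circularly)

X = np.fft.fft(x)
X_shifted = np.fft.fft(x_shifted)

# Predicted first-order phase: exp(+2j*pi*m*k/N) at frequency index m
phase = np.exp(2j * np.pi * np.arange(N) * k / N)
print(np.allclose(X_shifted, X * phase))  # True
```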
2.3 Parseval-Plancherel theorem

The total power of a signal f is defined as:

P = ∫_{-∞}^{+∞} f(t) f*(t) dt   (24)
If f is expressed in P as the inverse Fourier transform of F, and the order of integration is changed, one gets:

P = ∫_{-∞}^{+∞} F(v) [∫_{-∞}^{+∞} exp(2jπvt) f*(t) dt] dv = ∫_{-∞}^{+∞} F(v) F*(v) dv
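The discrete counterpart of this identity can be checked directly with numpy (the 1/N factor reflects numpy's unnormalised forward transform; the test signal is an arbitrary example):

```python
import numpy as np

# Discrete Parseval check: with numpy's FFT convention the identity is
# sum |x[n]|^2 = (1/N) sum |X[m]|^2.
rng = np.random.default_rng(0)
x = rng.standard_normal(256) + 1j * rng.standard_normal(256)

X = np.fft.fft(x)
power_time = np.sum(np.abs(x) ** 2)
power_freq = np.sum(np.abs(X) ** 2) / len(x)
print(np.isclose(power_time, power_freq))  # True
```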
Thus the total power is kept unchanged by Fourier transformation and can be expressed in the time domain as well as in the frequency domain.

2.4 Hilbert transform

The real part R(v) and the imaginary part I(v) of a spectrum both contain the same information: the amplitude of each frequency in a signal. Actually, these two functions may be derived one from the other using the Hilbert transform:

R(v) = (1/π) ∫_{-∞}^{+∞} I(u)/(v-u) du   (25)

I(v) = -(1/π) ∫_{-∞}^{+∞} R(u)/(v-u) du   (26)
R(v) is normally used in spectroscopy, whereas I(v) is discarded. Some applications of the Hilbert transform can be found in a paper by Williams and Marshall (Williams, 1992), where they take advantage of the information content of I(v).

2.5 Dirac δ impulsion and convolution product

At this stage, we introduce two fundamental concepts in signal manipulation. We first get some new information about what we called an impulsion, which is in fact Dirac's δ impulsion. Following our reasoning, its definition can be:

δ(t - t0) = 0 if t ≠ t0,   ∫_{-∞}^{+∞} f(t) δ(t - t0) dt = f(t0)   (27)

There is a slight difference between equations (10) and (27): t - t0 instead of t0 - t but, as δ is an infinitely thin slice, it is symmetrical. δ is even, so that δ(t - t0) = δ(t0 - t).
Secondly, we introduce a new operation between functions f and g, the convolution product, which we will denote with the sign ⊗:

f ⊗ g(t) = g ⊗ f(t) = ∫_{-∞}^{+∞} f(u) g(t - u) du   (28)
The convolution product of any function f(t) with a Dirac impulsion gives:

f ⊗ δx(t) = ∫_{-∞}^{+∞} f(t - u) δ(u - x) du = f(t - x)   (29)

so the convolution by the impulse centred in x is simply equivalent to a translation by x.

2.6 Fourier transform of the impulse response of a linear system

We can summarise our previous results in the following scheme: when a signal passes through a linear system, it is
- in the time domain, convoluted by the impulse response of the system;
- in the frequency domain, multiplied by the frequency response of the system.
Fig. 8: Signal transformation through a linear system. Representation in the time and frequency domains.
Something is missing in Fig. 8: the relationship, if there is one, between the impulse and the frequency responses of the system. It is easy to find, since we have a function which has no effect when put in the convolution product: the δ impulse as introduced in equation (10). The Fourier transform of δ is given by:

Δ(v) = ∫_{-∞}^{+∞} δ(t) exp(-2jπvt) dt = exp(-2jπv·0) = 1   (30)

Δ is thus a constant function equal to 1. (Pianists and guitar players already know this, as they always use the same stroke or pinch, the impulsion corresponding to their instrument, to excite any string, whatever its nominal frequency may be; NMR users know it too, since they use an impulsion to give energy to any magnetic momentum, whatever its Larmor frequency may be.) If δ and Δ are inserted in our previous diagram, we get Fig. 9. The bottom of this figure then shows that the frequency response of a linear system is equal to the Fourier transform of its impulse response. The question mark in Fig. 8 can simply be replaced by the direct and inverse Fourier transforms.
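Both properties, the flat spectrum of the impulse (equation 30) and translation by convolution with a shifted impulse (equation 29), can be checked in their discrete form with numpy (sizes, the test signal and the shift are arbitrary):

```python
import numpy as np

# (i) The spectrum of a unit impulse is flat (equation 30).
N = 16
delta = np.zeros(N)
delta[0] = 1.0
print(np.allclose(np.fft.fft(delta), np.ones(N)))   # True: flat spectrum

# (ii) Circular convolution with an impulse centred at x is a pure
# translation by x (equation 29), computed here via the FFT.
f = np.arange(N, dtype=float)                       # any test signal
x = 3
delta_x = np.zeros(N)
delta_x[x] = 1.0
conv = np.fft.ifft(np.fft.fft(f) * np.fft.fft(delta_x)).real
print(np.allclose(conv, np.roll(f, x)))             # True: translation by x
```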
Fig. 9: Relationship between time impulse response and frequency response.
2.7 Convolution theorem

We have just shown that the whole right-hand part of the figure is the Fourier transform of the left-hand part. This result is known as "the convolution theorem": f(t) and g(t) being two functions and F(v) and G(v) their Fourier transforms, then:

FT[f ⊗ g](v) = F(v).G(v)   (31)

This theorem has its own symmetries:

FT[f.g](v) = F ⊗ G(v)   (32)

FT⁻¹[F ⊗ G](t) = f(t).g(t)   (33)

FT⁻¹[F.G](t) = f ⊗ g(t)   (34)
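For sampled data, equation (31) holds exactly for circular convolution, which can be checked with numpy (the test sequences are arbitrary):

```python
import numpy as np

# Discrete convolution theorem: the DFT of the circular convolution of
# f and g equals the product of their DFTs.
N = 32
rng = np.random.default_rng(1)
f = rng.standard_normal(N)
g = rng.standard_normal(N)

# Circular convolution computed directly from definition (28)
conv = np.zeros(N)
for n in range(N):
    for u in range(N):
        conv[n] += f[u] * g[(n - u) % N]

lhs = np.fft.fft(conv)
rhs = np.fft.fft(f) * np.fft.fft(g)
print(np.allclose(lhs, rhs))  # True
```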
When a product (classical or convolution) is applied in one domain (time or frequency), the other product is applied in the other domain.

2.8 Correlation theorem

The correlation product of two functions f and g is defined as:

Corr[f,g](t) = ∫_{-∞}^{+∞} f(t + u) g(u) du   (35)
This product is very useful to characterise some random processes (for instance, Brownian motion is often described by its correlation time) and has its own theorem, in fact quite similar to the convolution theorem:

FT[Corr(f,g)](v) = F(v).G(-v)   (36)

For real functions, the correlation theorem becomes:

FT[Corr(f,g)](v) = F(v).G*(v)   (37)
2.9 Wiener-Khinchin theorem

The correlation of a function f with itself defines the autocorrelation Corr(f,f). The Fourier transform of the autocorrelation is derived from the correlation theorem:

FT[Corr(f,f)](v) = F(v).F*(v) = |F(v)|²   (38)
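The discrete form of equation (38) can be checked with numpy, computing the circular autocorrelation directly from definition (35) (the test sequence is an arbitrary real example):

```python
import numpy as np

# Wiener-Khinchin check: the DFT of the circular autocorrelation of a
# real sequence is its power spectrum |F|^2.
N = 64
rng = np.random.default_rng(2)
f = rng.standard_normal(N)

# circular autocorrelation from definition (35): c[t] = sum_u f[t+u] f[u]
autocorr = np.array([np.sum(np.roll(f, -t) * f) for t in range(N)])

print(np.allclose(np.fft.fft(autocorr), np.abs(np.fft.fft(f)) ** 2))  # True
```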
3 FOURIER TRANSFORM OF SAMPLED DATA

In contrast to what we have seen above, which applies to mathematical functions defined continuously for time or frequency domains ranging from -∞ to +∞, true signals are limited in time or frequency and can be recorded only for a finite number of points. It is then necessary to investigate the consequences of these limitations.

3.1 Truncation of the signal

3.1.1 Limitation of acquisition in time

In NMR, the time domain signal does not exist before the impulsion, so that it is not a problem to perform the Fourier transformation for positive time only:

FID(v) = ∫_{-∞}^{+∞} fid(t) exp(-2jπvt) dt = ∫_0^{+∞} fid(t) exp(-2jπvt) dt   (39)
since fid(t) = 0 if t < 0. On the other hand, it is not possible to record the signal forever! The fid is then truncated after time A to give the observed signal fidA, which can be defined as: fidA(t) = fid(t).bA(t), where the boxcar function bA is given by: bA(t) = 1 if 0 ≤ t ≤ A, and bA(t) = 0 otherwise. According to the convolution theorem, the NMR spectrum is:

FIDA(v) = FID(v) ⊗ BA(v)   (40)

The consequences of limiting the acquisition in time are illustrated in Fig. 10. The experimental spectrum is distorted by the convolution by BA(v). Some extra peaks, usually called "feet", appear on either side of each "true peak". The peak is said to "leak". This is bad in two ways: it is not possible to know whether a small peak is a "foot" or a true peak, and a part of the area of the true peak is transferred to its feet.
3.1.2 Apodisation

It is possible to remove the feet. This operation (literally "apodisation", from the Greek for removing the feet) is performed in the time domain before the Fourier transformation. As the feet come from the abrupt truncation of the signal, one can multiply the fid by a function which smooths out this change in signal amplitude. Such a function might be a triangle, for instance. The result, shown in Fig. 11, is a spectrum without supplementary wiggles. In order to smooth out the signal extinction, the apodisation function has to begin the extinction earlier than the classical boxcar b(t). More information is lost, and the resolution of an apodised spectrum is always poorer than that of the initial one: the price to pay for the removal of the feet is a broadening of the peaks. Various apodisation functions (often called windows) are available. Three of them are compared in Fig. 12.
"(~.IJl~13 JoJ ,(lUO u~d [l~OJ) Axopu!Ax Ol~tXt'!J1 13JO SHI3OtU ,~q uo.tles!podv " I I ~ ! d
1
I
1
,-i~ ...................... """'"wP"vvv"wrvtl"ltl'?l]v~VP~~ "(fil!Jel~ JOj POllOl d s! ued lt~:~J ,~:lUO) uouotuou~tld ~t~ll~OI " O! "~!d
A
......... ,ill v|
....... V|VV. . . . . . .
|
~x
9I
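The effect of truncation and apodisation can be reproduced numerically. A minimal sketch with numpy (all parameter values are arbitrary illustrative choices): a cosine whose frequency falls between two DFT bins leaks strongly after plain boxcar truncation, while a Hanning window lowers the feet at the cost of a broader peak:

```python
import numpy as np

# A 10.5-cycle cosine over N points: worst-case leakage for the boxcar.
N = 128
n = np.arange(N)
x = np.cos(2 * np.pi * 10.5 * n / N)

spec_boxcar = np.abs(np.fft.rfft(x))
spec_hanning = np.abs(np.fft.rfft(x * np.hanning(N)))

def relative_feet(spec, lo=5, hi=17):
    """Largest magnitude outside the peak region, relative to the peak."""
    mask = np.ones(len(spec), dtype=bool)
    mask[lo:hi] = False
    return spec[mask].max() / spec.max()

# The Hanning window lowers the "feet" compared with the boxcar.
print(relative_feet(spec_hanning) < relative_feet(spec_boxcar))  # True
```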
3.2 Sampling theorem

As unlimited observation time is not possible, infinite observation resolution is not feasible either. The information has to be sampled, that is, the recorded data are limited to a finite number of points representative of the continuous phenomenon studied. The most common sampling is periodic. As all the points between two samples are discarded, the loss of information may seem infinite. In fact, the analysis of the sampling process in the frequency domain shows that the complete information content is conserved, provided the sampling period is adapted to the component frequencies of the signal. This result is known as the Nyquist theorem, which can be proved as follows.
Fig. 12: Comparison with the boxcar window of three apodisation functions (Welch, triangle, Hanning); for each window, the time domain signal, the corresponding acquisition window and the resulting spectrum are shown.
3.2.1 Nyquist criterion

Sampling in the time domain is equivalent to multiplication with a Dirac Impulse Train (DIT), a function equal to zero everywhere except at the sampling points, the sampling period being T. The spectrum of the signal is therefore convoluted with the Fourier transform of the DIT, which is itself a DIT with period 1/T. The spectrum of the time-sampled signal is then a periodic function, as shown in Fig. 13.
Fig. 13: Frequency representation of a time sampled signal (real part only).

As the time domain signal is a real function, the modulus of its Fourier transform is even. Therefore, if the highest frequency in the signal is vmax, the spectrum contains -vmax. In the spectrum of the sampled signal, the "first" copy of the original spectrum then contains some signal at 1/T - vmax. The spectrum of the initial signal can be extracted from the spectrum of its sampled version if the original spectrum and its first copy do not overlap, that is, if:

1/T - vmax > vmax,   or   vacq > 2 vmax
If the sampling period is T, the maximum frequency that can be correctly sampled is 1/2T, called the Nyquist frequency (the Nyquist theorem is sometimes known as "the sampling theorem" or "the Shannon theorem"). The Nyquist criterion is obeyed in Fig. 14. This very important result is known as the Nyquist criterion.

3.2.2 Aliasing

If the Nyquist criterion is not obeyed, the replicates overlap. As shown in the following figures, in a real signal, a peak at frequency vNyquist + Δv has its counterpart at frequency -(vNyquist + Δv). Therefore, the first replica has a peak at 2vNyquist - (vNyquist + Δv) = vNyquist - Δv. Thus, after sampling, this peak has an alias in the spectrum. If the peak in the unsampled spectrum is at a frequency Δv above the maximum authorised frequency, the alias is situated Δv below this frequency, so that the spectrum appears "folded back" around the Nyquist frequency. This phenomenon, called aliasing, must be avoided by choosing a sampling frequency at least twice as high as the maximum frequency in the signal, and/or by using a lowpass filter which eliminates every component of the signal above the Nyquist frequency of the sampling device used.
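A minimal numerical illustration of aliasing (the frequencies are arbitrary choices): a 600 Hz sine sampled at 1000 Hz, that is, above the 500 Hz Nyquist frequency, appears folded back at 400 Hz:

```python
import numpy as np

# A 600 Hz sine sampled at 1000 Hz: Nyquist frequency is 500 Hz, so the
# peak aliases to 1000 - 600 = 400 Hz.
fs = 1000.0                    # sampling frequency, Hz (T = 1 ms)
N = 1000
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 600.0 * t)

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(N, d=1 / fs)
print(freqs[np.argmax(spectrum)])  # 400.0, not 600.0
```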
3.2.3 Missing initial points

Missing initial points lead to a problem equivalent to time shifting (see 2.2), but for a sampled signal.
Fig. 14: Sampling following the Nyquist condition (real part only).
Fig. 15: Sampling outside the Nyquist condition (real part only): the original spectrum and its first copies (translated by 1/T or -1/T) overlap.
Fig. 16: Aliasing (real part only). Peaks 1 and 2 belong to the original spectrum and peak 3 to its copy translated by 1/T. As the spectrum is expected between frequencies -1/2T and 1/2T, peak 3 is seen as peak 2 folded back around 1/2T.

3.3 Discrete Fourier Transform, zero-filling

The computer manipulation of continuous spectra is no more possible than the manipulation of continuous time domain signals, and spectra also have to be sampled. Frequency domain sampling has of course consequences in the time domain: it too becomes periodic. The sampling interval can be defined with respect to the width of one replicate in the spectrum (that is, 1/T): if N points are sampled for each replica, then
the sampling interval is 1/NT, and the period of the replicates in the time domain is NT, as shown in Fig. 17. The time domain signal fs(t) of a sampled spectrum being now a periodic function, the Fourier theorem for such functions (equation 19) indicates that:

fs(t) = Σ_{n=-∞}^{+∞} cn exp(2jπnt/NT)   (41)
The coefficients cn have a period N in n, and all the information is contained between n = 0 and n = N-1. It is convenient to define new coefficients Fn instead of cn:

fs(t) = Σ_{n=0}^{N-1} Fn exp(2jπnt/NT)   (42)

Fn = (1/NT) ∫_{t0}^{t0+NT} f(t) exp(-2jπnt/NT) dt   (43)
The time signal is sampled, the sampling being limited between k = 0 and k = N-1:

fs(t) = Σ_{k=0}^{N-1} f(t) δ(t - kT)   (44)

and then:

Fsn = (1/NT) Σ_{k=0}^{N-1} f(kT) exp(-2jπnk/N)   (45)
A very similar reasoning can be applied to F(v), which is a periodic function with period 1/T:

Fs(v) = Σ_{n=0}^{N-1} fn exp(2jπnvT)   (46)

fsn = T Σ_{k=0}^{N-1} Fs(k/NT) exp(2jπnk/N)   (47)
Fig. 17: Sampling in the time and the frequency domains: the discrete Fourier transform (real part only).
It is usual to remove T from the previous expressions to define a pair of transforms that can be applied to a series of indexed values (for instance, some data in an array in the memory of a computer). If f(kT) is defined as the kth sampled point, f(kT) = f[k], and F(k/NT) as the kth point of the spectrum, F(k/NT) = F[k], then:

F[k] = Σ_{n=0}^{N-1} f[n] exp(-2jπnk/N)   (48)

f[k] = (1/N) Σ_{n=0}^{N-1} F[n] exp(2jπnk/N)   (49)
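Equations (48) and (49) can be implemented directly and checked against a library FFT, which follows the same convention (with the 1/N factor on the inverse transform). The same sketch also illustrates the zero-filling property: padding the dataset with zeros samples the same underlying transform more finely (the test data are arbitrary):

```python
import numpy as np

# Direct implementation of equation (48), checked against numpy's FFT.
def dft(f):
    N = len(f)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)
    return W @ f           # F[k] = sum_n f[n] exp(-2j*pi*n*k/N)

rng = np.random.default_rng(3)
f = rng.standard_normal(64)
print(np.allclose(dft(f), np.fft.fft(f)))              # True

# Zero-filling: padding f from N to 4N points samples the same transform
# four times more finely; every fourth point is unchanged.
F1 = np.fft.fft(f)
F4 = np.fft.fft(np.concatenate([f, np.zeros(3 * len(f))]))
print(np.allclose(F4[::4], F1))                        # True
```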
It should be noted that the discrete Fourier transform is not a "low resolution" transform that would give the average value of the true transform between the frequencies (2k-1)/2NT and (2k+1)/2NT; the discrete Fourier transform is the sampled version of the true transform. F[k] is exactly equal to F(v) for v = k/NT. If the sampling interval of the transform, that is 1/NT, is too large, it is very easy to change it. One just has to add some points to the dataset to increase N. These points should not bring any information that would change the spectrum, so their value must be zero. This operation is called zero-filling and is frequently used to "increase the resolution" of DFT spectra. In fact the total information content is unchanged, but the spectra become nicer and easier to read.

3.4 Fast Fourier Transform
It can be seen from the previous definitions that the computation of the discrete Fourier transform of a set of N points requires N² multiplications and almost as many additions: the spectrum has N points, and each F[k] is the sum of N expressions of the form f[n] exp(-2jπnk/N). Even with fast computers, the computation time becomes too long for many applications as soon as datasets contain more than 100,000 points. Fortunately, Cooley and Tukey proposed in 1965 an algorithm called the Fast Fourier Transform (FFT). The number of operations in the FFT grows as N log₂N, which leads to a dramatic improvement in speed for large N. One drawback of the FFT is that it requires (at least in its standard version) that N be a power of 2. Zero-filling can always be used to adjust the size of the data. More information about the algorithms can be found in the books by Press et al. (Press, 1988) or Zelniker and Taylor (Zelniker, 1994).

4 REFERENCES
Annino, R., 1976. Cross-Correlation Techniques in Chromatography. J. Chromatogr. Sci., 14, 265-274.
Cooley, J.W., Tukey, J.W., 1965. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19, 297-307.
Jansson, P.A., 1984. Deconvolution, with Application in Spectroscopy. Academic Press, Orlando.
Kaljurand, M., Koel, M., Küllik, E., 1990. Multiplex advantage and peak resolution in correlation chromatography. Anal. Chim. Acta, 239, 317-319.
Marshall, A.G., Verdun, F.R., 1990. Fourier Transforms in NMR, Optical and Mass Spectrometry. Elsevier, Amsterdam.
Oppenheim, A.V., Schafer, R.W., 1989. Discrete-Time Signal Processing (International edition). Prentice-Hall, Englewood Cliffs.
Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T., 1988. Numerical Recipes in C. Cambridge University Press, Cambridge.
Smit, H.C., 1983. Correlation chromatography. TrAC, 2, 1-7.
Williams, C.P., Marshall, A.G., 1992. Hartley/Hilbert Transform Spectroscopy: Absorption-Mode Resolution with Magnitude-Mode Precision. Anal. Chem., 64, 916-923.
Yang, M.J., Pawliszyn, B., Pawliszyn, J., 1992. Optimization of Calculation Parameters and Experimental Conditions for Multiplex Gas Chromatography using Computer Simulation. J. Chromatogr. Sci., 30, 306-314.
Zelniker, G., Taylor, F., 1994. Advanced Digital Signal Processing. Marcel Dekker, New York.
Signal Treatment and Signal Analysis in NMR
Ed. by D.N. Rutledge
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 2

MAXIMUM ENTROPY METHODS IN NMR DATA PROCESSING

Kevin M. Wright
Institute of Food Research, Norwich Research Park, Norwich NR4 7UA, United Kingdom

1 INTRODUCTION
For many years the Fourier transform, often in conjunction with some form of linear filtering, has been the standard method of spectrum analysis in pulsed nuclear magnetic resonance spectroscopy. With suitable filtering, improvements in either the spectral resolution or the signal-to-noise ratio can be achieved, enhancing spectra and sometimes recovering additional information by splitting overlapping peaks or recovering weak signals lost in the noise. It is difficult to achieve both enhancements simultaneously. Performance is ultimately limited by the Nyquist theorem, with the spectral resolution determined by the 'dwell time' between the time-domain data. However, it is possible to go beyond this limit using nonlinear methods of spectrum analysis which make use of additional or prior information, such as statistical models of the underlying signal and the noise, and the variance of the noise. Such methods can achieve simultaneous resolution and signal-to-noise enhancements. One such method is the Maximum Entropy Method, or MEM (also known as MaxEnt). MEM is based on a very general principle which has been applied in many areas ranging from NMR to X-ray crystallography, tomography, radio astronomy and general image processing. The reason for its widespread applicability is that it is not a physical theory of anything in particular; rather, it is an extension of the principles of rational inference in science (Skilling, 1991). Maximum entropy provides a powerful and very general method of inferring the form of a probability distribution, given only the expectation values or moments of quantities drawn from that distribution. MEM has been
claimed to be the only correct and consistent method of inference from incomplete and noisy data. It has its roots deep in information theory and statistical mechanics. In truth the application of MEM to NMR data processing is not straightforward. The claims made for MEM have sometimes led to false expectations of what it can actually achieve. It has been criticized on the grounds of requiring excessive execution times compared to linear methods, and of failing to produce good quantitative peak intensities in spectra. There is even controversy over the most fundamental quantity in the theory, the entropy functional or information content of a spectrum. MEM is not a panacea, but it has taken its place as a very useful data processing technique in many areas of NMR spectroscopy. MEM must be used with care; it cannot rescue bad data, but it is sometimes capable of achieving quite remarkable results. Other chapters in this book cover specific applications of MEM: the analysis of NMR relaxation data (Chapters 3 and 10); quantitative peak intensity determination (Chapter 14); and high-resolution multidimensional NMR (Chapter 18). This chapter is an introduction to maximum entropy, reviewing the basic principles and algorithms and the difficulties associated with it.

2 APPROACHES TO THE MAXIMUM ENTROPY FORMALISM

Like NMR itself, the maximum entropy method can be understood from several different viewpoints. Several of these viewpoints or approaches are introduced here, beginning with the derivation of MEM from information theory.
2.1 The information theory approach

2.1.1 Measurement and information

In the context of information theory (Shannon and Weaver, 1949) the word 'information' has a precise meaning which is much more restricted than that in common usage. Information, as it is commonly understood, has certain general features or attributes such as content, quantity, meaning in the light of background knowledge, validity, accuracy and usefulness. The quantity of information conveyed by a message sent along some kind of communication channel is held by information theorists to depend only on the probability of the message being transmitted, and is independent of the other attributes of the message. In particular, the quantity of information is independent of its meaning. This is an important point, as it means that even white noise may be said to carry quantifiable information. The measurement or reception of transmitted information is commonly associated with an increase in knowledge, or conversely with a reduction in uncertainty. 'Information' is a quantitative measure of the reduction in uncertainty which occurs when a message is received, or a measurement is performed. To fix these ideas, suppose an experimenter A tosses a coin, and transmits the result as a message to a bystander B. If the coin is unbiased, the message received by B resolves B's 50-50 uncertainty about the outcome of the coin toss. The information conveyed may be quantified as one bit (binary digit), which is all that is required to encode the message by the rule 0=heads, 1=tails. But if B knew in advance that the coin was biased and always landed heads, B could predict the result of A's experiment with certainty and the
message from A would convey no additional information. Thus the information conveyed by a message depends on the probability of its occurrence. More generally, in a situation where the message conveys the result of an experiment with N possible outcomes, each with a known probability pi (i=1,2,...,N), the information content of the message depends on all the probabilities pi. Consider two extreme cases. (I) When one outcome is certain (e.g. p1=1, p2=p3=...=pN=0) the information conveyed is zero. (II) When all outcomes are equally probable (p1=p2=...=pN=1/N), a maximum of information is transmitted. In the latter case, the amount of information is defined as log2N bits. E.g. for N=8, the information is 3 bits. This implies that 3 is the minimum number of binary digits required for the most efficient encoding of the result of the experiment, which suffices to distinguish the 8 equiprobable outcomes. The message can be encoded as one of the eight 3-bit patterns 000, 001, 010, 011, 100, 101, 110, 111. Since the information is a function of the probabilities, it appears to be an intrinsic property of the message source. Shannon took this view and referred to this property as the entropy of the source. It is synonymous with the mean information content of the messages originating from the source; it also represents the mean number of bits required to encode all the possible messages, using the most efficient encoding scheme. To clarify this, consider an experiment with 4 possible outcomes A, B, C and D having probabilities pA=1/2, pB=1/4, pC=pD=1/8. The results of a series of such experiments may be transmitted as a binary-coded message using the codes A=0, B=10, C=110, D=111. E.g. the bit string 011010111010 would denote the outcomes ACBDAB. The mean number of bits per message is

Σi pi·(code length)i = (1/2)·1 + (1/4)·2 + (1/8)·3 + (1/8)·3 = 1.75 bits.

No other encoding scheme can be more efficient than this. In this example, therefore, the source entropy is 1.75 bits per message.
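This entropy can be computed directly from the probabilities; a short check reproduces the 1.75 bits found above by counting code bits:

```python
import math

# Entropy of the four-outcome source A, B, C, D above, in bits.
p = [1 / 2, 1 / 4, 1 / 8, 1 / 8]
S = -sum(pi * math.log2(pi) for pi in p)
print(S)  # 1.75
```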
2.1.2 The Shannon entropy formula

The source entropy may be defined by the following general expression (Shannon and Weaver, 1949):

S = -Σi pi log2 pi   (1)

where -log2 pi is the information conveyed by a single message occurring with probability pi, and the weighted sum gives the mean information, i.e. the entropy S measured in bits. S is more usually defined using natural logarithms,

S = -Σi pi ln pi   (2)

whose units are 'nats' (one nat = log2e bits). Up to this point, the term 'information' has been used in the sense usually employed in information theory, where it is synonymous with entropy. However, in the physical sciences including NMR, 'information' is often used to refer to a degree of order or polarisation in a system, which is the opposite of thermodynamic entropy. It is important to be aware of these two opposite meanings of information in the literature. For the
remainder of this chapter, 'information' will usually be employed in the physical sense: increasing information implies decreasing entropy S, and vice versa. Equation (2) gives the functional form of entropy S in terms of a probability distribution p. Shannon arrived at this formula by suggesting that entropy should satisfy a few simple conditions if it is to coincide with intuitive ideas of how an information measure should behave. These requirements, which may be stated in the form of four axioms, are as follows:

(I) S should be a continuous function of the pi.
(II) If all the pi are equal to 1/N, S should be a monotonic increasing function of N, i.e. when there are N equiprobable outcomes, S should increase with N.
(III) For a given N, S should reach a maximum when all the pi values are equal, and it should reach a minimum when one of the pi values is unity and the rest are zero.
(IV) Consider two experiments whose outcomes have discrete probability distributions pi (i=1,2,...,M) and qj (j=1,2,...,N), respectively. If we perform a composite experiment in which each outcome corresponds to a pair of results (i, j) from the individual experiments, then the information derived from the composite experiment is the same as that derived from the individual experiments performed in succession. In other words, the entropy of the composite experiment is the sum of the entropies of the individual experiments.

Shannon proved that, apart from an arbitrary scale factor, there is only one expression for entropy which is consistent with these conditions, i.e. equation (2). In particular, the entropy of the composite experiment is given by

S(pq) = -Σ_{k=1}^{MN} rk ln rk = -Σ_{i=1}^{M} Σ_{j=1}^{N} pi qj ln(pi qj) = -Σ_{i=1}^{M} pi ln pi - Σ_{j=1}^{N} qj ln qj = S(p) + S(q)

where rk = pi qj is the probability of the kth outcome of the composite experiment which combines the outcomes of the individual experiments, and Σi pi = Σj qj = 1.
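The additivity S(pq) = S(p) + S(q) of axiom (IV) can be checked numerically (the two distributions below are arbitrary examples):

```python
import math

# Entropy in nats of a discrete distribution.
def entropy(dist):
    return -sum(x * math.log(x) for x in dist if x > 0)

p = [0.5, 0.3, 0.2]
q = [0.7, 0.2, 0.1]
# Composite experiment: r_k = p_i * q_j over all pairs (i, j).
r = [pi * qj for pi in p for qj in q]
print(math.isclose(entropy(r), entropy(p) + entropy(q)))  # True
```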
Note that when all the outcomes are equally probable, S = ln N, which (apart from a constant factor which sets the units) is Boltzmann's famous expression for the thermodynamic entropy of a system with N equiprobable states. There is a close connection between information theory and thermodynamics. It is possible to derive most of equilibrium statistical mechanics from the conservation of energy law and the maximum entropy principle (Tribus, 1961). The latter states that the entropy of a system at equilibrium is maximized, subject to all known constraints on the possible states of the system.
2.1.3 The entropy of images

In many of the earliest applications of MEM to image processing (e.g. Frieden, 1972; Gull and Daniell, 1978; Daniell and Gull, 1980; Wernecke and D'Addario, 1977; Burch,
Gull and Skilling, 1983) the information content of an optical image was defined by assuming that an image was built up from a large number of individual contributions to the overall intensity (photons) falling upon the pixels of an electronic detector such as a CCD, or localized in the pixels of a digitized photographic image. Given a sufficiently large number of incident photons, the relative fraction of the photons falling in a given pixel is proportional to the probability of a photon falling in that pixel. The array of pixel intensities fij therefore defines a two-dimensional probability distribution over the image:

pij = fij / Σ_{k,l} fkl   (3)

The entropy of the image, or the mean information conveyed by the photons, is then defined by

S = -Σ_{i,j} pij ln pij   (4)
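Equations (3) and (4) are straightforward to evaluate; a sketch with numpy (the images below are arbitrary examples) confirms that a uniform image has the maximum possible entropy ln M (M pixels), while a concentrated image has much less:

```python
import numpy as np

# Entropy of an image via equations (3) and (4).
def image_entropy(f):
    p = f / f.sum()                       # equation (3)
    p = p[p > 0]
    return -np.sum(p * np.log(p))         # equation (4)

flat = np.ones((16, 16))                  # uniform image, M = 256 pixels
peaked = np.full((16, 16), 1e-6)          # nearly all intensity in one pixel
peaked[8, 8] = 1.0

print(np.isclose(image_entropy(flat), np.log(256)))   # True: S = ln M
print(image_entropy(peaked) < image_entropy(flat))    # True
```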
An image F is an array of intensities fij. By 'restoration' of an image is meant simultaneous resolution enhancement ('deblurring' or deconvolution, to reduce the effects of imperfect focussing and optical aberrations introduced by the imaging system) and signal-to-noise enhancement (reduction of background noise). To achieve this, an observed image G (the data) is often modelled by the equation

G = H ⊗ F + N   (5)

where F is the true or 'source' image which is required to be found. H is a matrix representing the point spread function (PSF) of the imaging system, which is a blurred image of a point light source (Rosenfeld and Kac, 1982). N is an array of noise values which are typically assumed to obey Gaussian or Poisson statistics. Equation (5) denotes the convolution of the source image with the PSF, followed by the addition of noise, which may include, for example, thermal noise in a CCD imaging system and photon noise. Since the values of the individual array elements of N are unknown (only the statistics of N can be measured), F cannot be recovered exactly, but it can be estimated. The estimated or restored image F' satisfies the equation

G = H ⊗ F' + N'   (6)

where N' is some array of noise values having the same statistical properties as N. There will in general be an infinite number of arrays F' and N' which satisfy equation (6). It is necessary to select just one possible F' as the restored image. The central principle of the maximum entropy method is to select that image F' which has maximum entropy S and therefore has minimum information content, subject to a statistical constraint which restricts the allowed images to a subset of all possible images. The maximum entropy image is the smoothest (in a sense to be examined in section 2.2) which is consistent with the original data. Assuming that the noise obeys Gaussian statistics, this consistency is typically measured by a chi-squared criterion

χ² = |G - H ⊗ F'|² / σ² = Σ_{i,j} N'ij² / σ² = M   (7)
where σ is the standard deviation of the noise (assumed here to be uniform over the image, and uncorrelated between pixels). The quantity χ² is constrained to be equal to the number of pixels M in the image, which is the most probable value of χ². The maximum entropy image is the most unbiased estimate of the true image F; among the subset of images satisfying equation (7), it contains the least possible information and therefore guarantees that instrumental distortions and noise will be minimized. In a sense, choosing the maximum entropy image is an implementation of Occam's Razor, the rule that the simplest possible explanation for observed data should always be chosen. The MEM image F' is the best choice available. This assumes that the PSF H has been accurately determined, and the noise σ is properly characterized. To summarize, maximum entropy image processing is carried out by maximizing the entropy S (given by equations (3) and (4)) subject to the quadratic constraint equation (7). This is a non-linear optimization problem in a space of typically a million dimensions (the number of pixels). Since there is no known analytical solution, numerical methods must be used. The scale of the problem prevents use of standard optimization methods such as conjugate gradient, and it is necessary to use a sophisticated specialized algorithm (the 'Cambridge algorithm', Skilling and Bryan, 1984) discussed in section 3.1.
2.1.4 The entropy of NMR spectra

It is now possible to understand how MEM can be applied to NMR signal processing to enhance the resolution and signal-to-noise ratio of spectra. A simple NMR spectrum can be regarded as a one-dimensional image, whose intensities at each frequency are proportional to the probability of a photon being detected at that frequency. The entropy of a spectrum can therefore be defined by equation (2) with the probabilities given by

p_i = f_i / Σ_j f_j    (8)
where the quantities f_i are just the spectral intensities as a function of frequency. (This assumes that all the intensities are positive; if this is not the case, for example in selective inversion and spectral editing experiments, there is a problem, and discussion is postponed to section 5.1.) Equation (6) is modified to

g = Uf' + n'    (9)

where the vector f' is the spectral estimate (array of intensities), g is the data (free induction decay or FID) and U is typically a unitary operator which performs the Fourier transform from the frequency domain to the time domain. It may also include the action of a window or line-broadening function. For example, U may consist of the product of a diagonal matrix whose diagonal elements are a decaying exponential (representing line-broadening in an inhomogeneous magnetic field) with a Fourier transform matrix of coefficients. n' is an estimate of the noise vector. Spectral restoration or 'reconstruction' is then performed by varying f' to maximize S subject to the quadratic constraint
χ² = |g − Uf'|²/σ² = Σ_i n'_i²/σ² = N    (10)
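The operator U of equation (9) and the statistic of equation (10) can be sketched as follows; the matrix size, the DFT sign convention and the decay constant below are arbitrary illustrative choices:

```python
import numpy as np

N = 64
# Unitary DFT matrix mapping a frequency-domain spectrum to a time-domain FID (eq. 9).
k = np.arange(N)
U_ft = np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)

# Optional line-broadening: a diagonal exponential decay applied in the time domain.
T2 = 20.0                                 # hypothetical decay constant, in points
W = np.diag(np.exp(-k / T2))
U = W @ U_ft

def chi2(g, f_est, sigma):
    """Constraint statistic of eq. (10): |g - U f'|^2 / sigma^2."""
    r = g - U @ f_est
    return np.sum(np.abs(r) ** 2) / sigma ** 2

# A toy two-line 'spectrum'.
f = np.zeros(N); f[5] = 1.0; f[17] = 0.5
```

Without the window W, U_ft alone is unitary, so it preserves norms (Parseval's theorem); with the window, U is no longer unitary but equation (10) is evaluated in exactly the same way.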
Once again the Cambridge algorithm (Skilling and Bryan, 1984) can be used to perform the numerical optimization. This discussion has so far ignored the difficult problem posed by the denominator in equation (8), since in the theory the total intensity has a different status to the individual intensities. Its value is not usually known in advance. The difficulty is resolved by generalizing the entropy formula in two stages. The first stage is to modify the entropy formula to the form

S = −Σ_i p_i ln(p_i/q_i)    (11)
The negative of this quantity is sometimes known as the Kullback-Leibler cross-entropy, employed in cross-entropy minimization (Shore and Johnson, 1980). A new probability distribution, known as the prior probability q, is introduced with discrete probabilities q_i. q is the limiting distribution which p approaches when S is maximized with no constraints other than the normalization of p. It represents prior information about the spectrum before further information is introduced via the constraint equation (10). If the data are good (small σ) the choice of q (typically taken to be a flat spectrum) has negligible influence on the optimized p. But if the data are bad, p will tend to approach q in the limit σ → ∞, where the data provide no new information. The second generalization is to allow p and q to be 'improper' or unnormalized probability distributions, otherwise known as PADs (positive additive distributions, see Skilling, 1989). Equation (11) is extended to

S = Σ_i (p_i − q_i − p_i ln(p_i/q_i))    (12)
Since p is no longer required to be normalized, equation (8) can be simplified by identifying p directly with the spectrum f: p_i = f_i, and the prior q is identified with a prior estimate m of the spectrum. Unconstrained maximization then leads to f = m, and the scaling problem associated with the denominator of equation (8) is eliminated.
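The generalized entropy of equation (12) and the role of the prior are easy to check numerically; the function below is an illustrative sketch. Its unconstrained maximum is S = 0, attained exactly at p = q, and any departure from the prior makes S negative:

```python
import numpy as np

def gen_entropy(p, q):
    """Generalized (PAD) entropy of eq. (12):
    S = sum_i (p_i - q_i - p_i * ln(p_i / q_i)).
    Neither p nor q needs to be normalized, only positive.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(p - q - p * np.log(p / q))

m = np.full(8, 2.0)          # a (hypothetical) flat prior estimate of the spectrum
```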
2.2 The constrained regularization approach

The process of maximizing S subject to constraints was justified in the previous section in terms of information theory. It is also possible to regard MEM in a simpler light as one possible version of constrained regularization (Provencher, 1982). This is a technique for solving inverse problems in which a 'smoothness' functional is defined on an image or spectrum. This smoothness is then maximized, subject to the constraint that the convolution of an image with the PSF, or the Fourier transform of the spectrum, is a good statistical fit to experimental data, typically measured by χ². Constrained regularization or deconvolution can be applied to NMR (Belton and Wright, 1986) using
the sum-of-squares of the second differential coefficients of the spectrum as a smoothness measure; this sum is at a minimum when the spectrum is flat. It is easily demonstrated that the entropy S in equation (2) is a smoothness measure, as follows. Consider the entropy change if any two points, say p_1 and p_2 with p_1 > p_2, are changed so that (p_1 − p_2) decreases while normalization is preserved; specifically, let p_1 → p_1 − ε and p_2 → p_2 + ε. The data may be said to be smoother after this change. The change in S is

ΔS = −(p_1 − ε) ln(p_1 − ε) − (p_2 + ε) ln(p_2 + ε) + p_1 ln p_1 + p_2 ln p_2
   = −p_1 ln(1 − ε/p_1) − p_2 ln(1 + ε/p_2) + ε ln((p_1 − ε)/(p_2 + ε))

As ε → 0, ln(1 + x) → x, hence the first two terms on the right hand side cancel, leaving

ΔS ≈ ε ln(p_1/p_2)

ignoring terms of order ε². Since p_1 > p_2 by assumption, ΔS is positive. Thus when any two points in the spectrum are adjusted to be closer together in value, the entropy increases. A flat spectrum will maximize the entropy.
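This first-order result is easy to confirm numerically; the distribution and step size below are arbitrary:

```python
import numpy as np

def shannon(p):
    """Shannon entropy of a normalized distribution (eq. 2)."""
    return -np.sum(p * np.log(p))

p = np.array([0.5, 0.2, 0.3])                    # p1 > p2
eps = 1e-4
p_smoothed = p + np.array([-eps, +eps, 0.0])     # move p1 and p2 closer together

dS = shannon(p_smoothed) - shannon(p)            # actual entropy change
predicted = eps * np.log(p[0] / p[1])            # first-order estimate derived above
```

The entropy change is positive and agrees with ε ln(p₁/p₂) to within terms of order ε².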
2.3 The axiomatic approach

Shannon derived his entropy formula from axioms which an information measure should satisfy (section 2.1.2). This approach can be extended to justify the maximum entropy procedure using equation (12). Tikochinsky, Tishby and Levine (1984) carried out early work in this field by proposing consistency conditions which must be satisfied by any algorithm for inferring a discrete probability distribution with given averages measured in reproducible experiments. They showed that the only consistent algorithm is one leading to the distribution of maximal entropy, subject to the given constraints. Shore and Johnson (1980) (later corrected by Johnson and Shore, 1983) gave a formal axiomatic treatment showing that maximum entropy and minimum cross-entropy are the uniquely correct and consistent methods for inductive inference of probability distributions from observed data. Later Skilling (1988) presented maximum entropy as a universal method of finding a 'best' PAD constrained by incomplete data, and showed that the generalized entropy equation (12) is the only form which selects acceptable distributions. Not using MEM leaves one open to logical inconsistencies and contradictions.

2.4 The Bayesian statistical approach

To some extent, the procedure outlined above has been superseded by much more powerful methods based on Bayesian statistics (Skilling, 1990 and 1991). The 'Quantified MaxEnt' algorithm no longer selects a single spectrum or image as the 'best'; instead it computes a probability distribution over the space of all possible images which are consistent with the original data, allowing one to place error bars on features in the reconstructed image. Commercial software is available, and the technique has been used with considerable success for NMR spectral analysis with quantitative peak intensity and
error estimation (Sibisi, 1990). However, implementation of this technique is very complex, even compared to the earlier Cambridge algorithm. Although it is not based directly on entropy maximization, the work of Bretthorst (1988, 1990a, 1990b, 1990c) has opened up a new and powerful field of Bayesian statistical estimation of NMR parameters and associated errors, with an accuracy far surpassing that achievable by Fourier transform methods.

2.5 The Burg method

The Burg (or All Poles) method (Chen, 1982) is an alternative (and older) maximum entropy method based on the definition of spectral entropy in terms of the power spectral density |f(ν)|² of a spectrum:

S = ∫ ln|f(ν)|² dν    (13)
At first glance, this looks completely unrelated to the Shannon definition and has sometimes been criticized for being unphysical. However it can be derived from the Shannon formula on the basis of certain (arguable) assumptions about the statistical properties of time series data such as an FID (Haykin and Kesler, 1979). Further details and algorithms, which are radically different to conventional MEM, are given by Chen (1982) and Press, Flannery, Teukolsky and Vetterling (1988). The method has been used with considerable success in NMR (Viti, Massaro, Guidoni and Barone, 1986).

3 MEM ALGORITHMS

The maximization of entropy subject to linear constraints is a straightforward problem which can be solved numerically using an algorithm by Agmon, Alhassid and Levine (1979). It transforms the problem to one in which the independent variables are Lagrange multipliers, one for each constraint. The optimization is done by a standard method such as Newton-Raphson. Various algorithms have been proposed for carrying out the nonlinear optimization required by MEM when some constraints are quadratic, e.g. the condition that χ² equals the number of data. The most widely used, the so-called 'Cambridge' algorithm for 'historic MaxEnt' (Skilling and Bryan, 1984), is outlined below, followed by a brief survey of some of the alternatives.

3.1 The Cambridge Algorithm

3.1.1 Statement of the problem

The problem to be solved can be formally expressed as follows: to maximize the entropy functional
S(f) = Σ_i {f_i − m_i − f_i ln(f_i/m_i)}    (14)
of the reconstructed image (or spectrum) f relative to the prior m, where the summation is carried out over N array elements f_i. The maximization is subject to a constraint on the goodness-of-fit (chi-squared) statistic

C(f) = χ² = Σ_k (F_k − D_k)²/σ²    (15)
where the array D is the data, the summation is carried out over the M data, and the blurred image (or signal) F is some known linear function of f:

F_k = Σ_j R_kj f_j    (16)

The matrix R might be, for example, a matrix of Fourier coefficients or it might represent convolution with a PSF. A reconstruction f is said to be feasible if the simulated data F agree with the actual data D to within the noise, which is expressed by

C(f) ≤ C_aim    (17)

where C_aim is some preset target. The most probable value of C_aim is M. S(f) is to be maximized subject to equation (17). The solution of this typical constrained maximization problem lies at the extremum of the Lagrangian function

Q(f) = S(f) − λC(f)    (18)

for some value of the Lagrange multiplier λ, to be chosen so that equation (17) is satisfied.
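On a toy scale the problem of equations (14)-(18) can be solved directly. The sketch below takes R as the identity (pure denoising), so the stationarity condition of Q, namely ln(f_i/m_i) + 2λ(f_i − D_i)/σ² = 0, can be solved pixel-by-pixel by bisection, with an outer bisection on λ to enforce C(f) = C_aim. All names and the synthetic data are invented; a realistically sized image requires the Cambridge algorithm described next:

```python
import numpy as np

def mem_denoise(D, m, sigma, C_aim, n_iter=60):
    """Toy 'historic MaxEnt' reconstruction with R = identity.

    Solves ln(f_i/m_i) + 2*lam*(f_i - D_i)/sigma^2 = 0 (stationarity of
    Q = S - lam*C) by per-pixel bisection, with an outer bisection on lam
    so that C(f) = C_aim (eq. 17). Brute force, for illustration only.
    """
    def f_of_lambda(lam):
        lo = np.full_like(D, 1e-12)
        hi = np.maximum(m, D) + 1.0            # the residual h is > 0 here
        for _ in range(80):                    # h is increasing in f: bisect
            mid = 0.5 * (lo + hi)
            h = np.log(mid / m) + 2.0 * lam * (mid - D) / sigma**2
            lo = np.where(h < 0, mid, lo)
            hi = np.where(h < 0, hi, mid)
        return 0.5 * (lo + hi)

    def C(f):
        return np.sum((f - D) ** 2) / sigma**2

    lam_lo, lam_hi = 0.0, 1.0
    while C(f_of_lambda(lam_hi)) > C_aim:      # find an upper bracket for lambda
        lam_hi *= 2.0
    for _ in range(n_iter):                    # bisect lambda: C decreases with lambda
        lam = 0.5 * (lam_lo + lam_hi)
        if C(f_of_lambda(lam)) > C_aim:
            lam_lo = lam
        else:
            lam_hi = lam
    return f_of_lambda(0.5 * (lam_lo + lam_hi))

rng = np.random.default_rng(0)
M = 32
truth = np.full(M, 1.0); truth[10:13] = 5.0     # flat baseline plus a peak
sigma = 0.3
D = np.clip(truth + sigma * rng.standard_normal(M), 0.05, None)
m_prior = np.full(M, D.mean())                  # flat prior estimate
f_mem = mem_denoise(D, m_prior, sigma, C_aim=M)
```

The reconstruction is positive everywhere, satisfies the constraint C(f) = M of equation (17), and retains the peak while smoothing the baseline toward the prior.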
3.1.2 Constructing a subspace of the image space

The derivatives of the entropy S(f) are the gradient vector ∇S and Hessian matrix ∇∇S with components

S,j = ∂S/∂f_j = −ln(f_j/m_j)    (19)

S,ij = ∂²S/∂f_i∂f_j = −δ_ij/f_j    (20)

The derivatives of the constraint C(f) are the gradient vector ∇C and Hessian matrix ∇∇C with components

C,j = ∂C/∂f_j = Σ_l (∂C/∂F_l)(∂F_l/∂f_j) = 2 Σ_l R_lj (F_l − D_l)/σ²    (21)

C,ij = ∂²C/∂f_i∂f_j = 2 Σ_l R_li R_lj/σ²    (22)
f may be regarded as a point in an N-dimensional image space. In this space, surfaces of constant C(f) are convex ellipsoids, and surfaces of constant S(f) are also strictly convex. The maximum entropy reconstruction is therefore unique. The dimensionality N may be several thousand if f is a spectrum, or a million or more if f is a large image. The size of the problem is such that vector operations such as addition, scalar multiplication and scalar products are permissible in N-dimensional image space or M-dimensional data space, but matrix operations of order O(N²) or O(M²) must be prohibited because of memory space and execution time restrictions. This immediately rules out most standard numerical optimization methods such as Newton-Raphson. Skilling and Bryan (1984) adopted a modified form of the conjugate gradient method which arrives at the solution iteratively. In each iteration the trial image f is modified by a vector δf which lies wholly within a subspace of the image space. By keeping the
dimensionality of the subspace small (3 is usually adequate) the scale of the problem can be dramatically reduced. The choice of search directions in the subspace is critical, and is discussed below. A further simplification is achieved when searching within the subspace; instead of using the full expressions for S(f) and C(f), quadratic approximation models for S(f) and C(f) are used in the vicinity of a particular trial solution f. These models are constructed anew during each iteration, by employing gradient information at f (equations (19)-(22)). Even though the models are regularly updated, a limit must be placed on the difference δf between successive iterates, since the models for S and C will be inaccurate at large distances. A method is required for quantifying and controlling 'distance' within the subspace, in other words the magnitude of a jump δf. Skilling recommends using the distance constraint

l² = Σ_i (δf_i)²/f_i ≤ l₀²    (23)
where l₀² is typically chosen to be of the order 0.1 Σf_i to 0.5 Σf_i. It discriminates in favour of allowing high f_i values (intensity peaks) to change more than low ones, but not too much so. Equation (23) controls the size of the jump in image space, performed at the end of each iteration. Measuring 'distance' in image space by (23) is equivalent to imposing a metric or distance-measure on the space. A metric is a tensor g_ij such that in general distance is given by

δs² = Σ_i Σ_j g_ij δf^i δf^j    (24)

(See, for example, any standard text on tensor calculus and differential geometry.) The metric tensor of image space is therefore

g_ij = δ_ij/f^i    (25)

Note that the index i in f^i has now been raised to a superscript, which is strictly necessary to conform with the standard notation of differential geometry. The same change must be made in all the foregoing equations. Since the metric is non-Euclidean, it is important to keep careful track of contravariant indices (superscripts) and covariant indices (subscripts). Comparing (20) and (25), the metric is

g_ij = −∂²S/∂f^i∂f^j = −(∇∇S)_ij    (26)

Using minus the entropy curvature as the metric in image space is the single most important key to the development of a robust algorithm. It leads to a 'slowing down' of the iterative process, reducing the magnitude of the allowed jump δf when the entropy curvature is large. The gradients (19) and (21) must be treated as covariant vectors. To compute the contravariant quantity δf^i, their indices must be raised. The contravariant entropy gradient becomes

s^i = Σ_j g^ij ∂S/∂f^j = f^i ∂S/∂f^i    (27)
where the contravariant form of the metric tensor is

g^ij = (cofactor of g_ij in the matrix g)/det(g) = δ^ij f^i    (28)

Similarly the contravariant chi-squared gradient becomes

c^i = Σ_j g^ij ∂C/∂f^j = f^i ∂C/∂f^i    (29)

The chi-squared curvature matrix must be premultiplied by g^ki to give a matrix which will map contravariant vectors onto contravariant vectors:

Σ_i g^ki ∂²C/∂f^i∂f^j = f^k ∂²C/∂f^k∂f^j    (30)
The two column vectors and the matrix with components given by (27), (29) and (30) are denoted in the following discussion by f(∇S), f(∇C) and f(∇∇C) respectively. The analogous quantity f(∇∇S) is not required (it is just minus the unit matrix). These first- and second-order gradients are used to construct vectors which span the subspace of image space in which the search is carried out. Skilling recommends the use of at least three vectors (search directions):

e₁ = f(∇S)    (31)

e₂ = f(∇C)    (32)

e₃ = |∇S|⁻¹ f(∇∇C)·f(∇S) − |∇C|⁻¹ f(∇∇C)·f(∇C)    (33)

In the expression for e₃, the terms f(∇∇C)·f(∇S) and f(∇∇C)·f(∇C) are the products of a matrix and a column vector. They are premultiplied by the inverses of the lengths of the gradient vectors. The lengths are determined by the choice of metric,

|∇S|² = Σ_i f^i (∂S/∂f^i)²    (34)

and similarly

|∇C|² = Σ_i f^i (∂C/∂f^i)²    (35)
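With the metric of equation (25), 'raising an index' just multiplies component-wise by f, so the search directions (31)-(33) reduce to a few array operations. The sketch below uses an arbitrary small test problem; all names are illustrative:

```python
import numpy as np

def search_directions(f, grad_S, grad_C, hess_C):
    """Subspace search vectors of eqs. (31)-(33).

    With g_ij = delta_ij / f_i, f(grad) = f * grad component-wise and
    f(hess) = f[:, None] * hess (eq. 30).
    """
    e1 = f * grad_S                            # eq. (31)
    e2 = f * grad_C                            # eq. (32)
    fH = f[:, None] * hess_C                   # the matrix f(VVC)
    norm_S = np.sqrt(np.sum(f * grad_S**2))    # |grad S|, eq. (34)
    norm_C = np.sqrt(np.sum(f * grad_C**2))    # |grad C|, eq. (35)
    e3 = fH @ e1 / norm_S - fH @ e2 / norm_C   # eq. (33)
    return e1, e2, e3

# A tiny synthetic problem (invented data, N = 6).
rng = np.random.default_rng(1)
n = 6
f = rng.uniform(0.5, 2.0, n)
m = np.ones(n)
R = rng.standard_normal((n, n)) * 0.3 + np.eye(n)
D = R @ m
sigma = 0.5
grad_S = -np.log(f / m)                        # eq. (19)
grad_C = 2.0 * R.T @ (R @ f - D) / sigma**2    # eq. (21)
hess_C = 2.0 * R.T @ R / sigma**2              # eq. (22)
e1, e2, e3 = search_directions(f, grad_S, grad_C, hess_C)
```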
Each of the quantities e₁, e₂ and e₃ is a column vector whose N contravariant components are the projections of the search vectors along the N dimensions of the original image space. As previously mentioned, this gradient information is used to construct quadratic approximations or models for S and C in the vicinity of the current trial image f:

S(x) = S₀ + Σ_μ S_μ x^μ + ½ Σ_μ Σ_ν S_μν x^μ x^ν    (36)

C(x) = C₀ + Σ_μ C_μ x^μ + ½ Σ_μ Σ_ν C_μν x^μ x^ν    (37)
In these equations the indices μ and ν are summed over the range 1, 2 and 3. S₀ and C₀ are the entropy and chi-squared for the trial image f. The quantities x^μ represent displacements along the three search directions (31)-(33). The model coefficients are scalar quantities found by the vector products of the search vectors and gradients,

S_μ = (e_μ)ᵀ·∇S    (38)

C_μ = (e_μ)ᵀ·∇C    (39)

and by products of the search vectors and the Hessian matrices,

S_μν = (e_μ)ᵀ·∇∇S·(e_ν) = −(e_μ)ᵀ·g·(e_ν) = −g_μν    (40)

C_μν = (e_μ)ᵀ·∇∇C·(e_ν)    (41)
The control algorithm described below is used to find the displacements x^μ. The trial image or spectrum is then updated by a jump in the subspace,

f^(new) = f + δf = f + Σ_μ x^μ e_μ    (42)

The length squared of the increment δf, given by

l² = Σ_μ Σ_ν g_μν x^μ x^ν    (43)

is used to control the length of the jump. In order to compute all the model coefficients, it is found that 6 transformations of the form (16) (typically Fourier transforms) from the image space to data space must be performed per iteration. Typically 10-50 iterations are performed. The process is terminated when C(f) reaches C_aim to within some required tolerance, and the quantity

TEST = ½ |∇S/|∇S| − ∇C/|∇C||²    (44)

which measures the angle between the entropy and chi-squared gradients, is sufficiently close to zero (typically 0.1). At the true MaxEnt solution, ∇S and ∇C are parallel and TEST is 0. A complete implementation of this algorithm is a major programming effort, but it performs well and is robust.

3.1.3 The control algorithm
The control algorithm performs a constrained maximization of S(x) subject to the constraint

C(x) ≤ C_aim    (45)

[...]

Values of λ > 1 are useful when the untransformed error has a long upper tail, and λ < 1 when there is a long lower tail. Obviously, the same transformation has to be applied to the model to preserve the functional relationship.

Outliers

A totally different kind of departure from normality is due to the presence of outliers. According to Barnett and Lewis (1994), we can define outliers as observations which appear to be inconsistent with the remainder of data in an experiment. In spite of the fact that Laplace and Legendre had already faced this problem, and that it concerns researchers, statisticians have until recently been reluctant to consider their effect and treat them properly. Fortunately, the situation is changing and the subject is now progressing rapidly (Huber, 1981; Barnett and Lewis, 1994). The presence of outliers is due to a series of factors usually considered as disturbing, and therefore uninteresting for explaining the behaviour of the experimental system. Unfortunately, they frequently appear in observations and their effect on the regression analysis of affected experiments is too important to be ignored. The natural outcome is to discard anomalous observations that spoil experiment interpretation. The regression methods that detect and diminish the effect of outliers are said to be robust. Discarding data that are conspicuously wrong by observation is already a robust technique, and if used with caution, it is perfectly acceptable. However, often we cannot proceed in this way, either because it is not clear whether the data concerned are really outliers, because of the huge amounts of data to be analyzed, or because data are analyzed automatically or by untrained personnel. Therefore it is highly desirable to have some kind of proper robust regression technique at hand whenever we suspect data.
The subject of robust estimation is very important in its own right, so here we will only outline a solution. This consists essentially in minimizing a merit function that incorporates a robust loss function, ρ(r_i/σ_i), instead of the simple residual sum of squares. A robust loss function is any function that downweights residuals more or less smoothly, such that, when r_i/σ_i is small, a unit weight is attributed and the point is fully considered. When r_i/σ_i is big, a zero weight is attributed (equivalent to rejecting the point), and in all other cases an intermediate weight value is attributed. Many robust loss functions have been proposed by different authors, but most perform similarly, and at our level we do not need to be concerned with the differences. A popular one, implemented in the LSTSQ package (Cármenes, 1991), known as Tukey's biweight (Mosteller and Tukey, 1977), is

ρ(r_i/σ_i) = ρ(t_i) = (t_i(1 − (t_i/k)²))²   if |t_i| < k
                    = 0                        if |t_i| ≥ k    (65)
where k is some suitable robust constant (a k between 2 and 3 is adequate). The sum of absolute deviations (5) that leads to L1-norm minimization can also be used as a robust loss function. Apart from the differences in the merit function, nonlinear robust regression algorithms are identical to their non-robust counterparts. However, robust fitting is always more difficult than ordinary regression, and there may exist several solutions, depending on which data are considered correct and which are outliers. Data sets with many outliers should simply be rejected, and robust methods only used when outliers are the exception. A simple way to accelerate calculations which is useful in most cases (but not always) is to perform an ordinary fit first, and to use the calculated parameters as starting values for the robust regression. An appropriate word of caution: points can be downweighted because the model is not able to predict the behaviour of the system at some parts of the curve. Beware in particular of outliers that occur in clusters: they may not really be outliers, it may just mean that the model is incomplete. This is important especially when we use nonlinear regression to compare or validate models.

2.4.2 Inappropriate modelling
Another important cause of failure in regression analysis is the use of an inappropriate model. We can distinguish two situations. One is given by models which cannot satisfactorily explain the results. This can be easily detected by plotting the residuals (r_i) versus the regressor variable (x_i). The presence of trends on these plots almost invariably suggests that the model is inappropriate¹³ (Fig. 7). The obvious solution to this problem is to change or complete the model.
Fig. 7: Plots of residual versus regressor values in a bad fit with trends (A) and a correct fit without trends (B), using heteroscedastic data.

A second type of problem appears when the model is essentially correct, but has been badly parametrized. This happens when one of the parameters is a linear combination of the others, and this is known as parameter redundancy. In this case the main symptom is

¹³ If the fitting were carried out using a simplistic algorithm, it might also mean that it had stopped at a false minimum.
that the J'J matrix is persistently singular at every iteration, and the regression program is unable to converge. The solution consists in carefully studying the model in order to identify and eliminate any redundant parameter. True parameter redundancy is unlikely if the model has been designed carefully, but near parameter redundancy may not be obvious until we observe that the regression program converges painfully, and that the J'J matrix is singular or almost singular. After convergence, the resulting parameters will show inadmissibly wide variance and covariance values. Sometimes the problem can be reduced by expanding the range of regressor values within which the system is studied. More frequently, we will need to reparametrize the model until we achieve low covariance values, or even to obtain some of the parameters by other means, as in the nonidentifiability problem below.
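Near redundancy is easy to diagnose numerically from the conditioning of J'J. The sketch below (the model and time constants are invented) compares the Jacobian of a two-exponential model with nearly identical time constants against one with well-separated time constants:

```python
import numpy as np

# Jacobian columns of y = a*exp(-x/t1) + b*exp(-x/t2) with respect to (a, b).
x = np.linspace(0.0, 5.0, 50)
J_redundant = np.column_stack([np.exp(-x / 2.00), np.exp(-x / 2.01)])  # t1 ~ t2
J_distinct = np.column_stack([np.exp(-x / 2.00), np.exp(-x / 8.00)])   # t1 != t2

cond_redundant = np.linalg.cond(J_redundant.T @ J_redundant)
cond_distinct = np.linalg.cond(J_distinct.T @ J_distinct)
```

A huge condition number of J'J signals near redundancy: parameter estimates along the nearly null direction (here, the split of amplitude between the two exponentials) are essentially undetermined.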
2.4.3 Poor experimental design

A different source of failure is the design and execution of the experiment. It is evident that we cannot expect to obtain useful information from badly run experiments having considerable experimental error or many outliers. Therefore, when data are obtained and analyzed automatically, or by untrained personnel, it is important to have some kind of internal or external quality control to uncover these situations. Less evident, but equally important, is the design of the experiment, that is, which points need to be collected to obtain information leading to accurate parameter estimation. There is not a simple answer to this subject, sometimes referred to as optimal design, but we will try to provide some basic directions. If we are comparing or validating models, it is important to have information on the widest possible range of values. It is usually a good idea to obtain enough points from the parts of the curve which change rapidly. If we know that a particular section of the curve can take alternative shapes depending on the value of some parameters, we should obtain data on that section. In heteroscedastic models the use of replicates may be important to study the structure of the error in order to introduce appropriate weights. These comments are drawn almost from common sense, and may seem unnecessary, but it is surprising how easily they are overlooked. For example, one common error is to take equally spaced samples in decay experiments. A more sophisticated approach, useful especially when planning very difficult or costly experiments, involves using the model under study to generate a few artificial data sets, to which we add some normally distributed random error. We then fit these synthetic data and compare the accuracy obtained with different prospective experimental designs. This method can sometimes save a considerable amount of time, resources and effort. For example, data shown in Fig. 4 have been generated by this means.
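A deterministic shortcut to the same comparison is to evaluate the linearized covariance σ²(J'J)⁻¹ for each candidate design, without simulating noise at all. The decay model, parameter values and the two designs below are hypothetical:

```python
import numpy as np

def design_sd(x, a=1.0, tau=2.0, sigma=0.05):
    """Predicted parameter standard deviations for the hypothetical decay
    model y = a*exp(-x/tau), from the linearized covariance sigma^2*(J'J)^-1.
    """
    e = np.exp(-x / tau)
    J = np.column_stack([e, a * x / tau**2 * e])   # dy/da, dy/dtau
    C = sigma**2 * np.linalg.inv(J.T @ J)
    return np.sqrt(np.diag(C))                     # (sd_a, sd_tau)

equal = np.linspace(0.1, 20.0, 10)     # equally spaced samples
early = np.geomspace(0.1, 20.0, 10)    # concentrated where the curve changes fast
sd_equal = design_sd(equal)
sd_early = design_sd(early)
```

For these particular values, the geometrically spaced design predicts smaller standard deviations for both the amplitude and the time constant, quantifying the warning above against equally spaced sampling of a decay.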
2.4.4 Nonidentifiability

The problem of parameter identifiability is very closely related to that of near parameter redundancy. It means that in spite of dealing with the right model, with carefully designed and performed experiments, it is still impossible to determine some of the parameters with a minimally acceptable confidence, no matter which regression method is used. A classical example is given by models involving sums of exponential functions. These models arise in decay experiments, or as the solution to sets of differential equations that describe compartmental systems. When the parameters that appear in the exponents are too close, they may not be identifiable. For example, 7e^(−x/2) + 11e^(−x/7) and 11.78e^(−x/3.1) + 6.06e^(−x/9.4) are almost indistinguishable (Seber and Wild, 1989). Similarly, it may not be possible at all to distinguish between the sum of two exponential terms and the sum of three terms, as for example with 5e^(−x/1.5) + 5e^(−x/4) + 9e^(−x/14) and 8.54e^(−x/2.1) + 10.40e^(−x/12.85) (Cornish-Bowden, 1976). The simulation technique outlined in the last paragraph of the previous section is probably the easiest way to detect problems of identifiability. For example, reaction velocity depends on temperature according to the Arrhenius equation, a mechanistic model with two parameters, the activation energy (ΔG‡) and the so-called Arrhenius constant (A):
k = A e^(−ΔG‡/RT)    (66)

where T, the absolute temperature, is the independent variable. Some synthetic data were generated by using typical temperature and parameter values and random error (A = 10⁸ in arbitrary units, ΔG‡ = 50 kJ mol⁻¹, σ ≈ 5% of average measurements). After fitting, the resulting determination coefficient was R² = 0.998, calculated parameter values were A = 7.5×10⁷ and ΔG‡ = 49.3 kJ mol⁻¹, and the corresponding relative standard deviations were 33% for A, and 1.7% for ΔG‡. For this kind of experiment, confidence intervals are consistently about 20 times bigger for A than for ΔG‡. We conclude from this simulation that, while activation energies can be safely calculated using this procedure, it is of no value to try to estimate the Arrhenius constant in this way. One solution to overcome this is to obtain affected parameters from some completely different property of the system, that is, using a different model that does not suffer from the same problem. Another approach, useful in models leading to exponential sums, consists of studying the system under different conditions, for example measuring concentrations in several compartments, or decay rates at multiple settings in relaxation experiments. This results in a multi-response model, and the corresponding data are analysed altogether (Beauchamp and Cornell, 1966; Beechem, 1992).
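The simulation just described is easy to reproduce. The sketch below uses the linearized Arrhenius plot (ln k versus 1/T) rather than a direct nonlinear fit, which makes the covariance calculation elementary; the parameter values follow the text, while the temperature range and noise level are assumptions:

```python
import numpy as np

R_GAS = 8.314                       # J mol^-1 K^-1
A_true, dG_true = 1e8, 50e3         # values used in the text (arbitrary units)
T = np.linspace(290.0, 320.0, 12)   # assumed temperature range (K)
rng = np.random.default_rng(2)
k_obs = A_true * np.exp(-dG_true / (R_GAS * T)) \
        * (1 + 0.05 * rng.standard_normal(T.size))   # ~5% relative error

# Arrhenius plot: ln k = ln A - (dG/R)*(1/T); ordinary linear least squares.
coef, cov = np.polyfit(1.0 / T, np.log(k_obs), 1, cov=True)
dG_fit = -coef[0] * R_GAS
A_fit = np.exp(coef[1])
rel_sd_dG = np.sqrt(cov[0, 0]) * R_GAS / dG_fit
rel_sd_A = np.sqrt(cov[1, 1])       # sd of ln A, i.e. the relative sd of A
```

The activation energy comes out with a relative uncertainty of a few percent, while the intercept (and hence A) is roughly an order of magnitude less precise, in line with the conclusion drawn in the text.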
2.4.5 Fitting failure

Before concluding this section, we will briefly mention what can be done when the regression fails to converge. We may successively try the following simple actions: (1) use different starting values, (2) rescale the problem so that the order of magnitude of variables and parameters is close to 1, (3) limit the step-length, especially when the fitting has failed due to overflow errors, (4) if performing robust regression, try to make an ordinary fit first, (5) use a more reliable regression algorithm, and finally (6) consider repeating the experiment or revising the model.

3 FURTHER TOPICS

In addition to the basic practical and theoretical topics covered so far in this chapter, there are many others that we cannot examine in detail without employing too much space. However, some of these are important enough to merit some consideration. We will briefly mention them and provide some pointers to more specialized literature which interested readers may find useful.
We have already mentioned that, in addition to the estimation of parameter values, we may also use nonlinear regression to obtain confidence intervals for parameters, for predicted responses, or for predicted regressor values (in calibration). The necessity of obtaining intervals cannot be overemphasized when parameters are important in their own right. In some fields it is not yet customary to calculate and provide them, and this practice is scientifically incorrect and should be modified. Calculations are usually based on the linearly asymptotic properties stated by expressions (19) to (24) for the linear case with uniform variance, or through equivalent expressions for nonlinear WLS. More sophisticated techniques are based on simulations (Monte-Carlo methods) and on resampling techniques (bootstrap). However, if we are interested only in one-dimensional (or single-parameter) confidence intervals, we can calculate them from the approximate covariance matrix (C = s²(J'V⁻¹J)⁻¹ in nonlinear GLS), where the i-th diagonal element, c_ii, is the variance of the corresponding parameter, a_i. If we are interested in multi-dimensional intervals, we need to use more complicated techniques and the reader should refer to more specific readings. See for example Press et al. (1992), Seber and Wild (1989), Tiede and Pagano (1979), Lieberman et al. (1967), Miller (1981), and Wu (1986), mentioned in order of increasing difficulty. Most nonlinear regression methods need to calculate Jacobians (first derivatives of the model), and some even the Hessian. Although the whole theory is built on the use of analytical derivatives, in our experience, numerical derivatives can be used safely without any significant loss of efficiency for most practical problems.
Only in the rare event that the model does not have derivatives, or, more frequently, when they are very difficult to calculate, is it advisable to use techniques that do not use derivatives at all, of which the polytope algorithm is a good candidate. Another class of algorithms that do not use derivatives are secant methods, similar in spirit to the secant method of root finding (Gill et al., 1981; Seber and Wild, 1989). Nonlinear regression is also very well suited to the study of multi-response models, while fitting every response separately is strongly discouraged. Compartmental models are probably the most typical examples. The extension of WLS theory and algorithms to cope with multiresponse models is fairly easy. The only important advice is to detect linear dependencies among the various responses, and not to include in the analysis those that can be calculated from the others. For example, if we measure metabolite concentrations in three body compartments or in three reaction steps and we know that c1 + c2 + c3 = constant, it is safer to use only two of the responses (Beauchamp and Cornell, 1966; Box et al., 1973). One of the basic conditions for least-squares mentioned in Section 2.1 is that the regressors should be essentially free of error. For practical purposes it is usually sufficient that the regressor error be small in comparison to the response error. Sometimes this may not be true, for example, when we use a model to relate two kinds of experimental observations. We call these situations error-in-variables models. Here, the methods we have seen may give biased estimates, although on most occasions WLS will suffice for practical purposes. Proposed solutions are complicated and interested readers are referred to very specialized literature (Hodges and Moore, 1972; Narula, 1974). The regression methods described in this chapter do not pose any restriction on which
parameter values are acceptable; they only seek to optimize the merit function. However, when the parameters represent physical, chemical or biological magnitudes, some sets of values may be meaningless. For example, kinetic constants and spectral band widths and heights are expected to be positive values. When the usual methods provide unreasonable values, we may use constrained regression methods, which optimize the merit function subject to conditions known as the constraints of the problem. It is advisable to use primarily unconstrained (ordinary) regression methods, which, with well-designed models and experiments, will usually provide correct answers. Data or models that do not lead to reasonable estimates are, frequently, suspect. If we still find the use of constraints necessary, a simple approach involves transforming the model (for example, squaring parameters that must necessarily be greater than zero), or modifying the merit function so that when the parameters approach forbidden values S(a) tends to infinity¹⁴. For more sophisticated approaches and further references see for example Gill et al. (1981). All usual optimization and regression methods find local minima, rather than the global minimum, which would be desirable to find. Usually this is not a serious problem, since spurious local minima usually give fits that are conspicuously inappropriate. The solution consists in repeating the fit using different starting values. Sometimes this may not be so obvious, especially when the problem has many similar minima.
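A minimal sketch of the parameter-transformation trick mentioned above: the rate constant of a single exponential is forced to be non-negative by fitting θ with k = θ². The model, the data, and the crude derivative-free minimizer are hypothetical illustrations, not the LSTSQ implementation:

```python
import numpy as np

# enforce k > 0 by fitting theta with k = theta**2 (the squaring transform);
# hypothetical single-exponential data y = exp(-k t) with true k = 0.3
t = np.linspace(0.0, 8.0, 60)
y = np.exp(-0.3 * t)

def sse(theta):
    k = theta ** 2                      # always >= 0, whatever theta is
    r = y - np.exp(-k * t)
    return r @ r

# crude derivative-free minimization: coarse scan, then golden-section refinement
thetas = np.linspace(0.05, 1.5, 300)
lo_i = int(np.argmin([sse(th) for th in thetas]))
a, b = thetas[max(lo_i - 1, 0)], thetas[min(lo_i + 1, len(thetas) - 1)]
g = (np.sqrt(5.0) - 1.0) / 2.0
for _ in range(60):
    c_, d_ = b - g * (b - a), a + g * (b - a)
    if sse(c_) < sse(d_):
        b = d_
    else:
        a = c_
theta_hat = 0.5 * (a + b)
k_hat = theta_hat ** 2
print("fitted k:", k_hat)               # close to 0.3, guaranteed non-negative
```

Whatever value of θ the minimizer visits, the physical parameter k = θ² can never become negative, which is exactly the purpose of the transformation.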
For problems with many similar local minima we need totally different algorithms aimed at global optimization (based, for example, on Monte Carlo techniques (Binder and Stauffer, 1985; Aluffi-Pentini et al., 1988), simulated annealing (Kirkpatrick et al., 1983; Corana et al., 1987; Ingber, 1989; Ingber and Rosen, 1992) or genetic algorithms (Holland, 1975; Goldberg, 1989; Wright, 1991; Ingber and Rosen, 1992)), but this is a difficult subject that is currently under active investigation. In Section 1.2 we mentioned that splines were not appropriate for evaluating nonlinear assays in spite of the apparently excellent fittings they give. We outline here the reasons, which will be the object of a forthcoming specific report. Given any set of n distinct points (with different abscissae) there is always a polynomial of degree n - 1 that passes exactly through every point. However, as can be appreciated in Fig. 8-A (continuous line), the actual curve varies widely between the fixed points, making such polynomials conspicuously inappropriate for interpolation. Cubic spline functions use third-degree polynomials which join every pair of consecutive points (known here as knots), subject to the condition that the first derivative must be continuous everywhere. As a result, spline curves still pass through every point, but they are noticeably smoother than polynomials, and consequently they are extremely eye-catching (dashed line). The bad side of the story is that if any point is erroneous (and we know that experimental error is ubiquitous), it pulls the two adjacent spline sections with it, so that this whole part of the curve is badly affected. Compare the spline in Fig. 8-B with the curve resulting from fitting a suitable nonlinear parametric model (continuous line) to typical immunoassay data. Thus, splines are useful only when we have no indication of what the underlying model is like, whereas in all other cases only aesthetics would justify their use.
In general, mechanistic models, or even empirical parametric models, are preferable to nonparametric models.

14. This is known as a barrier function, and it can be easily implemented using the LSTSQ package.
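The interpolation behaviour criticized above is easy to reproduce numerically. The sketch below interpolates 11 equally spaced samples of the smooth Runge function 1/(1+x²) (a standard textbook example, not data from this chapter) with the unique degree-10 polynomial: it passes through every point but swings far outside the data range between them:

```python
import numpy as np

# 11 equally spaced samples of the smooth function 1/(1+x^2) on [-5, 5]
x = np.linspace(-5.0, 5.0, 11)
y = 1.0 / (1.0 + x ** 2)

# the unique degree-10 polynomial through all 11 points
coeffs = np.polyfit(x, y, deg=10)

xx = np.linspace(-5.0, 5.0, 2001)
pp = np.polyval(coeffs, xx)
print("data range:        ", y.min(), "to", y.max())   # data stay within (0, 1]
print("interpolant range: ", pp.min(), "to", pp.max()) # wild excursions between nodes
```

The data never exceed 1, yet the interpolant reaches values near 2 between the outer nodes, which is the "varies widely between the fixed points" behaviour of Fig. 8-A.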
[Fig. 8 shows two panels: Part A compares a 5th-degree interpolating polynomial with a cubic spline; Part B compares a parametric model with a cubic spline; the abscissa is x.]
Fig. 8: Polynomial interpolation (continuous line in Part A), spline interpolation (dashed line), and parametric fitting (continuous line in Part B) to typical immunoassay data.

Nonlinear regression is a powerful tool which can increase the quality of results obtained from existing data, and even open the way to new kinds of experimental setups that would otherwise be useless. We have seen that, as with any other technique, it is not exempt from risks. Although not everyone needs to be an expert, all scientific users should know its basic principles, and at least be able to decide knowledgeably when they need to use it, how to make sensible use of it, and when they need to seek assistance. We hope that this chapter has contributed to this purpose.

4 ACKNOWLEDGMENTS

This work was supported, in part, by the Comisión Interministerial de Ciencia y Tecnología (Ministerio de Educación y Ciencia, Spain), Research Grant PTR93-0041. I thank David González Pisano for his assistance in the preparation of this manuscript.

5 REFERENCES
Aluffi-Pentini, F., Parisi, V. and Zirilli, F., 1988. Global optimization using a stochastic integration algorithm. ACM Trans. Math. Software, 14, 366-380.
Barker, B.E. and Fox, M.F., 1980. Computer resolution of overlapping electronic absorption bands. Chem. Soc. Rev., 9, 143-184.
Barnett, V. and Lewis, T., 1994. Outliers in statistical data. 3rd Ed. Wiley, New York.
Bates, D.M. and Watts, D.G., 1981. A relative offset orthogonality convergence criterion for nonlinear least squares. Technometrics, 23, 179-183.
Baxter, R.C., 1983. Methods of measuring confidence limits in radioimmunoassay. Meth. Enzymol., 92, 601-610.
Beauchamp, J.J. and Cornell, R.G., 1966. Simultaneous nonlinear estimation. Technometrics, 8, 319-326.
Beechem, J.M., 1992. Global analysis of biochemical and biophysical data. Meth. Enzymol., 210, 37-54.
Bevington, P.R. and Robinson, D.K., 1992. Data reduction and error analysis for the physical sciences. 2nd Ed. McGraw-Hill, New York.
Binder, K. and Stauffer, D., 1985. A simple introduction to Monte Carlo simulations and
some specialized topics. In: K. Binder (Editor), Applications of the Monte Carlo method in statistical physics, pp. 1-36, Springer-Verlag, Berlin.
Box, G.E.P. and Cox, D.R., 1964. An analysis of transformations. J. R. Statist. Soc. B, 26, 211-252.
Box, G.E.P. and Cox, D.R., 1982. An analysis of transformations revisited, rebutted. J. Amer. Statist. Assoc., 77, 209-210.
Box, G.E.P., Hunter, W.G., MacGregor, J.F. and Erjavec, J., 1973. Some problems associated with the analysis of multiresponse data. Technometrics, 15, 33-51.
Carroll, R.J. and Ruppert, D., 1984. Power transformations when fitting theoretical models to data. J. Amer. Statist. Assoc., 79, 321-328.
Carroll, R.J. and Ruppert, D., 1988. Transformation and weighting in regression. Chapman and Hall, New York.
Cármenes, R.S., 1991. LSTSQ: a module for reliable constrained and unconstrained nonlinear regression. Comput. Appl. Biosci., 7, 373-378.
Corana, A., Marchesi, M., Martini, C. and Ridella, S., 1987. Minimizing multimodal functions of continuous variables with the "simulated annealing" algorithm. ACM Trans. Math. Software, 13, 272-280.
Cornish-Bowden, A.J., 1976. Principles of enzyme kinetics. Butterworths, London.
Dennis, J.E., Gay, D.M. and Welsch, R.E., 1981. An adaptive nonlinear least-squares algorithm. ACM Trans. Math. Software, 7, 348-368.
Dennis, J.E. and Schnabel, R.B., 1983. Numerical methods for unconstrained optimization and nonlinear equations. Prentice-Hall, Englewood Cliffs, N.J.
Dixon, M. and Webb, E.C., 1979. Enzymes. 3rd Ed. Longman, London.
Dowd, J.E. and Riggs, D.S., 1965. A comparison of estimates of Michaelis-Menten kinetic constants from various linear transformations. J. Biol. Chem., 240, 863-869.
Ferretti, J.A. and Weiss, G.H., 1989. One-dimensional nuclear Overhauser effects and peak intensity measurements. Meth. Enzymol., 176, 3-11.
Galat, A., 1986. Computer-aided analysis of infrared, circular dichroism and absorption spectra. Comput. Appl. Biosci., 2, 201-205.
Gallant, A.R., 1987. Nonlinear statistical models. Wiley, New York.
Galton, F., 1886. Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute, 15, 246-263.
Gauss, C.F., 1809. Theoria motus corporum coelestium. Perthes et Besser, Hamburg, Germany.
Gill, P.E. and Murray, W., 1978. Algorithms for the solution of the nonlinear least-squares problem. SIAM J. Numer. Anal., 15, 977-992.
Gill, P.E., Murray, W. and Wright, M.H., 1981. Practical optimization. Academic Press, London.
Goldberg, D.E., 1989. Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading, Massachusetts.
Hartley, H.O., 1961. The modified Gauss-Newton method for the fitting of nonlinear regression functions by least squares. Technometrics, 3, 269-280.
Hodges, S.D. and Moore, P.G., 1972. Data uncertainties and least squares regression. Appl. Statist., 21, 185-195.
Holland, J.H., 1975. Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor, Michigan.
Huber, P.J., 1981. Robust statistics. Wiley, New York.
Ingber, L., 1989. Very fast simulated re-annealing. J. Math. Comput. Modelling, 12, 967-973.
Ingber, L. and Rosen, B., 1992. Genetic algorithms and very fast simulated reannealing: a comparison. J. Math. Comput. Modelling, 16, 87-100.
Jacquez, J.A., 1972. Compartmental analysis in biology and medicine. Elsevier, New York.
Kirkpatrick, S., Gelatt, C.D. and Vecchi, M.P., 1983. Optimization by simulated annealing. Science, 220, 671-680.
Laplace, P.S., 1820. Théorie analytique des probabilités. 3rd Ed. Courcier, Paris.
Led, J.J. and Gesmar, H., 1994. Quantitative information from complicated nuclear magnetic resonance spectra of biological macromolecules. Meth. Enzymol., 239, 318-345.
Legendre, A.M., 1805. Nouvelles méthodes pour la détermination des orbites des comètes. Courcier, Paris.
Levenberg, K., 1944. A method for the solution of certain problems in least squares. Quart. Appl. Math., 2, 164-168.
Lieberman, G.J., Miller, R.G. Jr. and Hamilton, M.A., 1967. Unlimited simultaneous discrimination intervals in regression. Biometrika, 54, 133-145.
Marquardt, D.W., 1963. An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math., 11, 431-441.
Marschner, I., Erhardt, F. and Scriba, P.C., 1978. Calculation of the immunoassay standard curve by spline function. In: Radioimmunoassay and related procedures in medicine, pp. 111-122, Int. Atomic Energy Agency, Vienna.
McCullagh, P., 1983. Quasi-likelihood functions. Ann. Statist., 11, 59-67.
Meyer, R.R. and Roth, P.M., 1972. Modified damped least squares: an algorithm for nonlinear estimation. J. Inst. Math. Appl., 9, 218-233.
Miller, R.G. Jr., 1981. Simultaneous statistical inference. 2nd Ed.
McGraw-Hill Book Company, New York.
Mosteller, F. and Tukey, J.W., 1977. Data analysis and regression. Addison-Wesley, Reading, Massachusetts.
Narula, S.C., 1974. Predictive mean square error and stochastic regressor variables. Appl. Statist., 23, 11-16.
Nelder, J.A. and Mead, R., 1965. A simplex method for function minimization. Comput. J., 7, 308-313.
Powell, M.J.D., 1970. A hybrid method for nonlinear equations. In: P. Rabinowitz (Editor), Numerical methods for nonlinear algebraic equations, pp. 87-114, Gordon and Breach, London.
Press, W.H., Teukolsky, S.A., Vetterling, W.T. and Flannery, B.P., 1992. Numerical recipes in C: The art of scientific computing. 2nd Ed. Cambridge University Press, Cambridge, England.
Prince, J.R., Bancroft, S. and Dukstein, W.G., 1980. Pharmacokinetics of pertechnetate administered after pretreatment with 400 mg of potassium perchlorate: concise communication. J. Nucl. Med., 21, 763-766.
Raab, G.M., 1983. Comparison of a logistic and a mass-action curve for radioimmunoassay data. Clin. Chem., 29, 1757-1761.
Reinsch, C.H., 1967. Smoothing by spline functions. Numerische Mathematik, 10, 177-183.
Richards, F.S.G., 1961. A method of maximum-likelihood estimation. J. R. Statist. Soc. B, 23, 469-476.
Seber, G.A.F., 1977. Linear regression analysis. Wiley, New York.
Seber, G.A.F. and Wild, C.J., 1989. Nonlinear regression. Wiley, New York.
Seshadri, K.S. and Jones, R.N., 1963. The shapes and intensities of infrared absorption bands. Spectrochim. Acta, 19, 1013-1085.
Stigler, S.M., 1981. Gauss and the invention of least squares. Ann. Statist., 9, 465-474.
Stigler, S.M., 1986. The history of statistics: the measurement of uncertainty before 1900. The Belknap Press of Harvard University Press, Cambridge, Massachusetts, and London, England.
Tiede, J.J. and Pagano, M., 1979. The application of robust calibration to radioimmunoassay. Biometrics, 35, 567-574.
Wilkinson, G.N., 1961. Statistical estimations in enzyme kinetics. Biochem. J., 80, 324-332.
Wright, A., 1991. Genetic algorithms for real parameter optimization. In: G. Rawlins (Editor), Foundations of genetic algorithms, Morgan Kaufmann Publishers, San Mateo, California.
Wu, C.F.J., 1986. Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Statist., 14, 1261-1295.
Yule, G.U., 1897. On the theory of correlation. J. R. Statist. Soc., 60, 812-854.
Signal Treatment and Signal Analysis in NMR. Ed. by D.N. Rutledge. © 1996 Elsevier Science B.V. All rights reserved.
Chapter 5

THE PADÉ-LAPLACE ANALYSIS OF NMR SIGNALS

Jean Aubard and Patrick Levoir
Institut de Topologie et de Dynamique des Systèmes, Université Paris 7, Paris, FRANCE
1 INTRODUCTION
In the last decade, new methods of signal analysis in NMR have been developed either to obtain accurate estimates of spectral parameters (i.e. amplitudes, frequencies, linewidths and phases) directly from the free induction decay (FID) or to extract relaxation time data (T1 and/or T2) from NMR decay curves. Thus, the maximum entropy method (MEM) and linear prediction (LP) belong to a series of recently used methods which overcome the usual limitations encountered when processing NMR FIDs with the discrete Fourier transform (DFT), and which improve spectral analysis even in the case of poor signal-to-noise (S/N) ratios. On the other hand, the problem of recovering the exact number of components in multiexponential NMR relaxation curves is difficult, and various approaches have been employed for a long time with more or less success (see the classification proposed in Aubard et al., 1987). Let us cite quite recent procedures using constrained regularization and nonlinear regression methods, both described in this book, which have proved to be powerful, in certain cases, for the analysis of NMR relaxation decay curves. Aubard et al. (1985, 1987), Boudhabhay et al. (1992), Yeramian and Claverie (1987) and Claverie et al. (1989) have recently described a new method well suited to analysing multiexponential functions, which its authors named Padé-Laplace (PL), since it combines the Laplace transform and Padé approximants. In the particular case of NMR signals this method is interesting because it has been shown to be very efficient for the treatment of relaxation signals such as decay curves (Tellier et al., 1991) and to be potentially adapted to resolving exponentially damped sinusoidal signals like NMR FIDs (Boudhabhay et al., 1992). Thus, we have at our disposal a unique tool which allows us to tackle the various situations encountered in the analysis of NMR signals, i.e. either to estimate spectral parameters from FIDs or to determine relaxation times from decay curves.
In the first part of this chapter, we will give an overview of the PL theory, highlighting the most important mathematical properties of the method. Illustrations of PL applied to the analysis of experimental multiexponential relaxation curves will be given in the second part to show the performance of the Padé-Laplace treatment. Lastly, the third part will be devoted to the processing of FIDs using the PL method, emphasizing the specificities of such a treatment for this particular class of signals. Both simulated and
experimental FIDs will then be presented and discussed in order to show how PL works in NMR spectral analysis. Finally, before getting to the heart of the matter, the reader is informed that the purpose of this chapter is not to enter into a debate concerning the merits and weaknesses of PL as an effective technique in signal analysis in general, and for the treatment of NMR relaxation decay curves and FIDs in particular. The reader will probably be able to form an opinion after reading this chapter and the literature concerned with this question (Tang and Norris, 1988; Yeramian, 1988; Matheson, 1989; Bowers et al., 1992; Clayden, 1992).
2 GENERAL THEORY OF THE PADÉ-LAPLACE METHOD

In this chapter the signals under investigation may be represented by functions of the form:

    f(t) = Σ_{k=1}^{n} C_k exp(μ_k t),   (t > 0)        (1)
where C_k and μ_k have their general meaning and n is the number of exponentials. In the most general case the μ_k are complex numbers, μ_k = -α_k + iω_k,
and f(t) is now the classical analytical expression representing a complex FID (assuming zero phase),
    f(t) = Σ_{k=1}^{n} C_k exp(-α_k t) exp(iω_k t)        (2)
with C_k the intensity of the k-th spectral line, α_k the decay constant related to the linewidth and ω_k the resonance frequency. On the other hand, when the μ_k are real numbers, i.e. μ_k = -1/τ_k (τ_k > 0, relaxation time), f(t) appears as a typical relaxation function such as a multiexponential NMR decay curve:
    f(t) = Σ_{k=1}^{n} C_k exp(-t/τ_k)        (3)
Lastly, when the μ_k are purely imaginary, μ_k = iω_k, the detection of the components in equation (1) is easily performed by means of the Fourier transform, which leads to n Dirac peaks corresponding to the n components of f(t) (position and amplitude). By analogy with Fourier's approach, but in the case where the μ_k are arbitrary (i.e. real and/or complex numbers), we started from this general idea of using some integral transform. Thus, the first step of the PL method consists in evaluating the Laplace transform Lf(p) of f(t):
    Lf(p) = ∫_0^∞ exp(-pt) f(t) dt        (4)
with p a complex number. Applying this transform to equation (1) one obtains
    Lf(p) = Σ_{k=1}^{n} C_k / (p - μ_k)        (5)
with Re(p) > sup_k Re(μ_k), this condition ensuring the convergence of the Laplace transform as well as of its derivatives. Then Lf(p) (and all its derivatives) exists in a right half complex plane and is therefore analytic in this region. Let us now consider the actual experimental signals, which are in fact of the form given by equation (1), sampled at regular intervals t_j = jΔt, with Δt the sampling step arising from the digital recording of the signal over N_0 data points. Thus, the evaluation of the Laplace transform (Eq. 4) may now only be performed using some numerical integration, but even in this case Lf(p) remains an analytical function of the variable p. At this stage the detection problem, i.e. the search for the unknown parameters C_k, μ_k and n, seems to be theoretically solved by a mere identification of the poles (μ_k) and the corresponding residues (C_k) of the rational expression (5). However, since Lf(p) only converges in a region which does not contain the poles μ_k, it is not possible to detect these poles by a direct numerical integration. Thus, we consider the analytic continuation of Lf(p) in the whole complex plane, and this continuation will obviously be:
    Σ_{k=1}^{n} C_k / (p - μ_k),   (for every p)
To achieve this goal, according to the theory of analytic functions of a complex variable, we first evaluate, at some point p_0 suitably chosen in the convergence half plane, the Taylor series S representing Lf(p) at the point p_0:
    S(p) = Σ_{r=0}^{∞} c_r (p - p_0)^r        (6)
where

    c_r = (1/r!) [d^r Lf / dp^r]|_{p=p_0}        (7)
and

    d^r Lf / dp^r = ∫_0^∞ (-t)^r f(t) exp(-pt) dt        (8)
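Equations (7) and (8) can be checked numerically on a hypothetical two-exponential signal (rates 2 and 0.5, chosen so that a short integration window suffices): the trapezoidal estimates of the Taylor coefficients c_r agree with the closed form obtained by differentiating Lf(p) = 3/(p+2) + 1/(p+0.5):

```python
import numpy as np

# hypothetical two-exponential test signal f(t) = 3 exp(-2t) + exp(-0.5t)
dt = 1e-3
t = np.arange(0.0, 40.0, dt)
f = 3.0 * np.exp(-2.0 * t) + np.exp(-0.5 * t)

p0 = 1.0                                # inside the convergence half plane
w = f * np.exp(-p0 * t)

def trapz(yv):
    """Plain trapezoidal rule on the uniform grid t."""
    return dt * (yv.sum() - 0.5 * (yv[0] + yv[-1]))

# c_r = (1/r!) * integral of (-t)^r f(t) exp(-p0 t) dt   (Eqs. 7 and 8)
c_num, fact = [], 1.0
for r in range(5):
    if r:
        fact *= r
    c_num.append(trapz((-t) ** r * w) / fact)

# closed form from Lf(p) = 3/(p+2) + 1/(p+0.5), differentiated r times at p0 = 1
c_exact = [(-1.0) ** r * (3.0 / 3.0 ** (r + 1) + 1.0 / 1.5 ** (r + 1))
           for r in range(5)]
print(np.allclose(c_num, c_exact, rtol=1e-4))
```

The agreement confirms that the "local knowledge" of Lf(p) at p_0 is accessible by direct numerical integration of the sampled signal.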
From this local knowledge of the Taylor coefficients, the crucial problem is now to obtain the desired rational expression of Lf(p), valid in the whole complex plane. This is the key point of our method, and the appropriate solution which was found was to resort to the Padé
approximants method. Hence, the authors proposed the name "Padé-Laplace" to designate this novel method of signal analysis. A Padé approximant of the variable p, usually noted [N/M], is a rational fraction obtained by dividing two polynomials A_N and B_M of degree N and M respectively:
    [N/M] = A_N(p) / B_M(p) = (Σ_{s=0}^{N} a_s p^s) / (Σ_{v=0}^{M} b_v p^v),   (b_0 = 1)        (9)
which is evidently related to the rational expression of the Laplace transform (Eq. 5). Indeed, rearranging equation (5), which by reduction to the same denominator may be rewritten as:
    Lf(p) = (Σ_{s=0}^{n-1} a_s p^s) / (Σ_{v=0}^{n} b_v p^v)        (10)
clearly reveals the relation between [N/M] and Lf(p). Thus, in principle, a Padé approximant [N/M] may represent the Laplace transform Lf(p) on the condition that their Taylor series expansions agree up to order N+M, i.e. that the following formal identity be satisfied:
    Σ_{r=0}^{∞} c_r p'^r = (Σ_{s=0}^{N} a_s p'^s) / (Σ_{v=0}^{M} b_v p'^v) + O(p'^{N+M+1})        (11)
with p' = p - p_0. The above expression (11) leads to a system of N+M+1 linear equations which may be used to determine the sets of coefficients {a_s} and {b_v} from the known values of the Taylor coefficients c_r. Now, from equation (10) we remark that Lf(p) is a rational function of degree n-1 in the numerator and degree n in the denominator, and it becomes obvious that the [(n-1)/n] Padé approximant will then represent the Laplace transform exactly. However, n is not known!... The way in which the component detection is achieved through the PL method is in fact quite simple. Once the calculation of the Taylor series coefficients c_r has been completed and the {a_s} and {b_v} coefficients are available (from the solution of the linear equation system (11)), we only consider the successive [(N-1)/N] approximants of the paradiagonal in the [N/M] Padé table, which must reduce, when N > n, to the [(n-1)/n] approximant (Baker, 1965) and will then represent exactly the analytic expression of
Lf(p). This procedure provides the basis for the determination of n, the unknown number of exponential components, without any "a priori" assumption. The following flowchart summarizes the main steps, along with the corresponding subroutine programs (S/P) used in the PL procedure:

BEGIN
  → Read data set t_j, f(t_j)
  → Enter p_0 value
  → Calculate the Taylor series coefficients by Simpson or Filon trapeze method (S/P SIMPSON)
  → Compute the Padé approximants table by Longman algorithm (S/P PENTI)
  → Calculate the poles μ_k and the amplitudes C_k (S/P POLRT)
  → Search for pole stability (S/P AMPLIT)
  → Display and/or print results
END

Fig. 1: Flowchart of the PL procedure with the names of the subroutines used (cf. Fortran program source on CD-ROM)
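The linear system implied by expression (11) can be sketched as follows. The routine below is an illustrative reimplementation, not the chapter's Fortran code, and is checked against the classical [2/2] Padé approximant of exp(x), (1 + x/2 + x²/12)/(1 - x/2 + x²/12):

```python
import numpy as np

def pade(c, N, M):
    """[N/M] Pade approximant from Taylor coefficients c[0..N+M].

    Returns (a, b): numerator coefficients a[0..N] and denominator
    coefficients b[0..M] with b[0] = 1, matching the series to order N+M.
    """
    c = np.asarray(c, dtype=float)
    # homogeneous conditions at orders r = N+1 .. N+M (with c_j = 0 for j < 0)
    G = np.array([[c[r - v] if r - v >= 0 else 0.0
                   for v in range(1, M + 1)]
                  for r in range(N + 1, N + M + 1)])
    b = np.concatenate(([1.0], np.linalg.solve(G, -c[N + 1:N + M + 1])))
    # numerator from the orders 0 .. N of the product (series * denominator)
    a = np.array([sum(b[v] * c[s - v] for v in range(min(s, M) + 1))
                  for s in range(N + 1)])
    return a, b

# Taylor coefficients of exp(x): 1, 1, 1/2, 1/6, 1/24
c = [1.0, 1.0, 1.0 / 2, 1.0 / 6, 1.0 / 24]
a, b = pade(c, 2, 2)
print(a)   # numerator coefficients: 1, 1/2, 1/12
print(b)   # denominator coefficients: 1, -1/2, 1/12
```

The same routine, applied to the Taylor coefficients of Lf(p) at p_0, yields the {a_s} and {b_v} of the paradiagonal approximants discussed above.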
In order to illustrate how it works in actual practice, let us consider a detailed Padé-Laplace analysis of a simulated relaxation signal consisting of the sum of two exponentials, as expressed by the following expression:

    f(t) = 25 exp(-0.05 t) + exp(-0.002 t)        (12)

The results of the PL analysis are displayed in Table 1, which also gives complementary information relevant to our numerical analysis. As we can see, we consider the successive Padé approximants [(N-1)/N] computed from the Taylor coefficients of the Laplace transform of f(t), using Longman's recursive algorithm (1971).

Table 1: Padé-Laplace analysis of f(t) (Eq. 12) with the parameter p_0 = 0.0043 chosen in the optimal range. Numerical integration was performed with the trapezoidal rule; Δt = 0.5 s and N_0 = 5000 points. Below the headings LAMBDA and AMPLITUDE the original listing gives, for each pole, the values Re λ_k, Im λ_k, Re C_k, Im C_k; the stable (genuine) roots, marked by arrows in the original, are:

  [0/1]:  λ_1 = -0.014087                           C_1 = 11.385
  [1/2]:  λ_1 = -0.050025,  λ_2 = -0.0020008        C_1 = 25.011,  C_2 = 1.0004
  [2/3]:  λ_1 = -0.050007,  λ_2 = -0.0019997        C_1 = 25.005,  C_2 = 0.99993
  [3/4]:  λ_1 = -0.050015,  λ_2 = -0.0020010        C_1 = 25.007,  C_2 = 1.0002
  [4/5]:  λ_1 = -0.050014,  λ_2 = -0.0020006        C_1 = 25.007,  C_2 = 1.0002
  [5/6]:  λ_1 = -0.050023,  λ_2 = -0.0020320        C_1 = 25.008,  C_2 = 0.99045

The remaining roots at each order are spurious, having either very small negative amplitudes or appearing as complex-conjugate "Froissart doublet" pairs. Reprinted from Aubard et al., Comput. Chem., Copyright (1987), with kind permission from Elsevier Science Ltd.
The first Padé approximant, [0/1], provides (through its pole and the corresponding residue) an approximation of the original function with a single exponential. As expected, the two correct poles (Re μ_k) and the corresponding amplitudes (Re C_k) appear for the [1/2] approximant, and they remain stable in the following approximants (they are indicated by arrows in Table 1). From the [2/3] approximant onwards, one can notice, along with the genuine roots, the presence of spurious roots, with either very small negative amplitudes or appearing as pairs of complex-conjugate numbers, called "Froissart doublets" (Gillewicz, 1978). Such "extraneous" or "artificial" roots are very unstable on going from one Padé approximant to the next, while the expected roots and corresponding amplitudes are very stable over several consecutive Padé approximants, on the condition that the value of the parameter p_0 is suitably chosen in the so-called p_0 optimal range (see the theoretical selection of p_0 in Aubard et al., 1987). As can be seen in Table 1, in the present numerical example the PL method gives, up to approximant [5/6], the exact theoretical expression of the original function (Eq. 12), with an accuracy of better than two or three digits. However, when noise affects the data, a test of stability must be introduced so that the stable genuine roots, found for several Padé approximants, are easily distinguished from spurious ones. Last of all, since one is interested in the detection problem, it is important to note here the relation which exists between the development order of the Taylor series and the maximum number of exponential components that can be detected. Indeed, in actual practice the Taylor series representing Lf(p) at the point p_0 (Eq. 6) is truncated at some order l, so that equation (6) now becomes:

    S(p) = Σ_{r=0}^{l} c_r (p - p_0)^r        (13)
Given that there are N+M+1 terms in the Taylor series expansion of the [N/M] approximants (Eq. 11), it is clear that l = M+N. Hence, since only [(N-1)/N] approximants are considered in the PL method, (l-1)/2 is the maximum number of exponentials that can be detected when the Taylor series expansion is developed up to order l using the detection strategy defined above. For instance, in Table 1, the Taylor series of the Laplace transform of the original function (Eq. 12) has been developed at p_0 = 0.0043 up to order eleven. Under these conditions, computation of Padé approximants up to [5/6] is possible, and in this case it would have been possible to detect five components just by following the stability of the roots (with respect to a given stability criterion, see Aubard et al., 1987) on going from the [4/5] approximant to the next, i.e. the [5/6] approximant. In the next part of this chapter, we will consider applications of PL to the analysis of experimental relaxation signals.
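The whole detection pipeline can be sketched in a few lines on the simulated signal of Eq. (12), with the same sampling and p_0 as Table 1. This is an illustrative Python reconstruction, not the authors' Fortran program; the recovered [1/2] poles and amplitudes should come out close to the arrow-marked values of Table 1:

```python
import numpy as np

# Eq. (12): f(t) = 25 exp(-0.05 t) + exp(-0.002 t); dt = 0.5 s, 5000 points
dt, n_pts, p0 = 0.5, 5000, 0.0043
t = np.arange(n_pts) * dt
f = 25.0 * np.exp(-0.05 * t) + np.exp(-0.002 * t)

w = f * np.exp(-p0 * t)
trapz = lambda yv: dt * (yv.sum() - 0.5 * (yv[0] + yv[-1]))

# Taylor coefficients c_r of Lf(p) at p0 (Eqs. 6-8), up to order 3
c, fact = [], 1.0
for r in range(4):
    if r:
        fact *= r
    c.append(trapz((-t) ** r * w) / fact)
c = np.array(c)

# [1/2] Pade approximant: solve the linear system for b1, b2 (b0 = 1)
N = 2
G = np.array([[c[r - v] for v in range(1, N + 1)] for r in range(N, 2 * N)])
b = np.concatenate(([1.0], np.linalg.solve(G, -c[N:2 * N])))
a = np.array([sum(b[v] * c[s - v] for v in range(s + 1)) for s in range(N)])

# poles give the rates, residues give the amplitudes
roots = np.roots(b[::-1])                  # roots of b2 p'^2 + b1 p' + 1
mu = p0 + roots                            # back from p' = p - p0
Bp = np.polyder(np.poly1d(b[::-1]))
amps = np.poly1d(a[::-1])(roots) / Bp(roots)   # residue C_k = A(p'_k)/B'(p'_k)

for k in np.argsort(mu.real):
    print("mu = %.6f   C = %.4f" % (mu[k].real, amps[k].real))
```

With noise-free data the two genuine components are already recovered at the [1/2] stage, exactly as Table 1 shows; on real data one would continue up the paradiagonal and apply the stability test.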
3 PADÉ-LAPLACE ANALYSIS OF MULTIEXPONENTIAL CURVES

The Padé-Laplace method has been successfully applied by the authors to the analysis of various classes of experimental signals other than NMR decay curves or FIDs, including relaxation data (Aubard et al., 1985, 1987), spectral line decomposition (Denis and Aubard, 1985), time-resolved fluorescence decays (Claverie et al., 1987), neurophysiological data (Aubard et al., 1985), and convoluted stress relaxation in polymers (Fulchiron et al., 1993). Moreover, the literature abounds in miscellaneous applications of PL; let us cite the analysis of electrophysiological signals from cardiac cells (Yeramian and Claverie, 1987), fluorescence intensity decays (Bajzer et al., 1989, 1990), transient electric birefringence decay curves (Bowers et al., 1992), oscillatory data from linear viscoelastic spectra (Simhambhatla and Leonov, 1993) and even (last but not least) economic data (Claverie et al., 1990). In the following we restrict ourselves to some illustrative experimental examples performed by the authors (Aubard et al., 1985, 1987; Claverie et al., 1987).

3.1 Analysis of relaxation data
The transient response of a chemical system to an external perturbation is of the form of equation (3), i.e. f(t) = Σ_{k=1}^{n} C_k exp(-t/τ_k), where C_k and τ_k are the amplitudes and relaxation times respectively, and is often characterized by a very small intensity blurred by a strong noise level. Therefore, the relaxation signal sometimes cannot be accurately analyzed with the available methods, due to its poor signal-to-noise ratio (S/N). The Padé-Laplace method was used, for the first time, to analyse fast relaxation signals from biological tautomeric equilibria in aqueous solutions of nucleic acid bases, obtained with a very fast T-jump relaxation apparatus (Aubard, 1981). Figure 2 shows a typical relaxation signal (curve 1) obtained after 3000 accumulations. As observed, the signal is blurred by substantial noise and its analysis as a sum of exponentials is not very easy. However, the PL procedure gives very good results, which are displayed in Table 2.
[Figure 2: experimental trace (1) with the computed relaxation curves (2, 3); time axis in microseconds. The instrument results panel in the right frame reads, as far as legible: "This signal consists of 2 exponentials; Points = 199; NB Accu = 3000; T = 2.00 microsec; Baseline: 120; p0, step: 0.56, 0.02; Results: Tau = 0.34025E-04, Ampli = 16.14; Tau = 0.23550E-.., Ampli = 39.79".]
Fig. 2: Padé-Laplace analysis of the cytosine relaxation signal. Trace (1) is the experimental signal. Traces (2, 3) are the computed relaxation curves. Reprinted from Aubard et al., Comput. Chem., Copyright (1987), with kind permission from Elsevier Science Ltd.

As mentioned in the preceding section, PL analysis of the signal in figure 2 consisted first in evaluating (through numerical integration) the value of the Laplace transform Lf(p) and a certain number of its derivatives at a point p0, and second in representing Lf(p) through Padé approximants. Before performing these numerical integrations the procedure requires a search for the baseline of the signal, in order to eliminate the d.c. offset. In the example shown in figure 2 we selected the last 120 points for the baseline, which is then subtracted from the raw signal values. The results displayed in the right frame of figure 2 were obtained from numerical integration over the first 80 points of the signal at p0 = 0.56 (see also Table 2). It should be noted that they show good stability over a rather large interval of p0, the so-called optimal p0 range, spanning from 0.45 to 0.65. The quality of this PL analysis is evident just by looking at the "fit" between the experimental signal and the curve obtained from the explicit analytical expression computed from the results of the PL analysis (Fig. 2).
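To make this first step concrete, a minimal sketch of the baseline subtraction followed by numerical evaluation of the Taylor coefficients of Lf(p) about p0 might look as follows. The trapezoidal integration and the derivative formula Lf^(j)(p0) = (−1)^j ∫ t^j f(t) e^(−p0 t) dt are standard, but the function names and details below are ours, not those of the authors' OPTIPAD program:

```python
import math
import numpy as np

def laplace_taylor_coeffs(f, dt, p0, n_coeffs, n_baseline=120):
    """Taylor coefficients a_j of Lf(p) about p = p0, from sampled data.

    Uses a_j = Lf^(j)(p0) / j! = ((-1)^j / j!) * integral of t^j f(t) e^(-p0 t) dt,
    approximated by the trapezoidal rule, after subtracting the baseline
    estimated as the mean of the last n_baseline points.
    """
    f = np.asarray(f, dtype=float)
    f = f - f[-n_baseline:].mean()                  # eliminate the d.c. offset
    t = np.arange(f.size) * dt
    coeffs = []
    for j in range(n_coeffs):
        g = (t ** j) * f * np.exp(-p0 * t)
        integral = dt * (g.sum() - 0.5 * (g[0] + g[-1]))   # trapezoidal rule
        coeffs.append((-1) ** j / math.factorial(j) * integral)
    return coeffs

# check on a single exponential with an offset: f(t) = 2 e^(-t/0.1) + 0.5,
# whose (offset-free) Laplace transform is 2/(p + 10), so a_0 = 2/11 at p0 = 1
dt = 1e-3
t = np.arange(10000) * dt
a = laplace_taylor_coeffs(2.0 * np.exp(-t / 0.1) + 0.5, dt, p0=1.0, n_coeffs=3)
print(a[0])   # close to 2/11 = 0.1818...
```

The coefficients so obtained are exactly the inputs required by the Padé-approximant stage described in section 2.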
Table 2: PL analysis (p0 = 0.56) of the experimental relaxation signal (N0 = 200 points, Δt = 2 μs) shown in Fig. 2. See Table 1 for the explanations concerning the results of the Padé-Laplace analysis. Reprinted from Aubard et al., Comput. Chem., Copyright (1987), with kind permission from Elsevier Science Ltd.
         LAMBDA (Re, Im)                      AMPLITUDE (Re, Im)
1   -0.57935047E+00   0.00000000E+00     0.92894203E+05   0.00000000E+00

         LAMBDA (Re, Im)                      AMPLITUDE (Re, Im)
1   -0.29452735E+00   0.00000000E+00     0.48526223E+05   0.00000000E+00
2   -0.42863226E+01   0.00000000E+00     0.11992406E+06   0.00000000E+00

         LAMBDA (Re, Im)                      AMPLITUDE (Re, Im)
1    0.42871296E+00   0.00000000E+00     0.24818084E+00   0.00000000E+00
2   -0.29389703E+00   0.00000000E+00     0.48408824E+05   0.00000000E+00
3   -0.42462540E+01   0.00000000E+00     0.11938287E+06   0.00000000E+00

THIS SIGNAL CONSISTS OF 2 EXPONENTIALS
    -0.29389703E+00    0.48408824E+05
    -0.42462540E+01    0.11938287E+06

TAU = 0.34026E-04    AMPLI = 0.16136E+02
TAU = 0.23550E-05    AMPLI = 0.39794E+02
3.2 PL analysis of convoluted data set
Experimental relaxation signals can sometimes be described by a convolution product of the instrument response, h(t), with the genuine multiexponential decay, f(t), of the system under study (Aubard, 1981). Thus the recovery of f(t) can be achieved, in principle, by deconvolution. Although non-linear least-squares methods are classically used to determine the discrete relaxation time spectrum when the convolution has to be considered (Bouchy, 1982; Visser, 1985), these methods require an a priori assumption concerning the number of exponentials in the decay, as well as initial guesses near the solution for the relaxation times and their weights. We therefore applied the PL method to this problem (Claverie et al., 1987; Fulchiron et al., 1993), since it is naturally suited to the deconvolution operation, as shown below. Let us consider that the actual signal, s(t), is described by the convolution product:

s(t) = f(t) * h(t)                                        (14)

From the well known convolution theorem (Sneddon, 1972), the convolution product above (Eq. 14) is converted by the Laplace transform into a simple product:

Ls(p) = Lf(p) . Lh(p)                                     (15)
Then Lf(p) is the ratio of Ls(p) and Lh(p),

Lf(p) = Ls(p) / Lh(p),                                    (16)

and it appears from the above equation that the recovery of f(t) and its analysis through the PL method is straightforward. Indeed, the PL method needs only the knowledge of the Laplace transform of f(t) and some of its derivatives at a certain value p0, and these can be built up from equation (16). Evaluating the Taylor series coefficients of both the experimental signal s(t) and the apparatus function h(t) (by numerical integration of the sampled data), the series representing Ls(p) and Lh(p) may be written:

Ls(p) = Σ_q c_q (p − p0)^q,                               (17)

Lh(p) = Σ_m b_m (p − p0)^m.                               (18)

The Taylor series of Lf(p) is then obtained by dividing Ls(p) by Lh(p), as expressed by equation (16). Therefore, if we define:

Lf(p) = Σ_j a_j (p − p0)^j,                               (19)

the Taylor coefficients a_j of Lf(p) at p = p0 are simply obtained by formal power series division in equation (16), leading to the following recurrence relationships:

a_0 = c_0 / b_0
a_1 = (c_1 − a_0 b_1) / b_0
a_j = (c_j − Σ_{i=0}^{j−1} a_i b_{j−i}) / b_0.            (20)
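The recurrence relationships (20) amount to ordinary power-series long division, and can be sketched directly (a hypothetical helper, not the authors' code):

```python
def divide_series(c, b):
    """Formal power-series division a = c / b, as in equation (20).

    c : Taylor coefficients of Ls(p) about p0
    b : Taylor coefficients of Lh(p) about p0 (b[0] must be non-zero)
    Returns the Taylor coefficients a of Lf(p) = Ls(p)/Lh(p) about p0.
    """
    if b[0] == 0:
        raise ValueError("b0 must be non-zero for the division to exist")
    a = []
    for j in range(len(c)):
        # accumulated cross terms a_i * b_{j-i}, i = 0 .. j-1
        s = sum(a[i] * b[j - i] for i in range(j) if j - i < len(b))
        a.append((c[j] - s) / b[0])
    return a

# dividing (1 + 3z + 5z^2) by (1 + z) recovers 1 + 2z + 3z^2
print(divide_series([1.0, 3.0, 5.0], [1.0, 1.0]))   # [1.0, 2.0, 3.0]
```

The resulting a_j coefficients feed the Padé-approximant stage unchanged, which is why the deconvolution costs essentially nothing extra within the PL framework.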
The rest of this deconvolution procedure is then performed directly within the frame of the original PL method. Indeed, from the knowledge of the a_j coefficients, successive [N−1/N] Padé approximants are computed through the algebraic methods described in the previous section 2, and this provides the key to the analysis of f(t).

4 THE PARTICULAR CASE OF THE PADÉ-LAPLACE ANALYSIS OF NMR FIDs

4.1 Problem statement
Using the theoretical considerations developed above, the application of the PL method to the determination of NMR relaxation parameters is fairly straightforward. Various specific examples of the use of PL in low-field NMR T2 relaxation experiments are given elsewhere in this book (see Le Botlan). In this section we will now consider the analysis of NMR Free Induction Decay interferograms (FIDs) which, because of their oscillating character, may appear to be less easy to resolve using the PL method.
Conventionally, the discrete Fourier transform (DFT) has been the most popular tool for processing NMR FIDs, because of the computational simplicity and efficiency of the Fast Fourier Transform (FFT). However, due to the well known limitations of the DFT, including implicit windowing leading to "sinc" wiggles and spectral distortions, limited resolution, spectral folding, etc., new methods of spectral analysis have appeared in spectroscopy in recent years to alleviate these difficulties (Stephenson, 1988; Kauppinen et al., 1994 and references therein). The maximum entropy method (MEM), Fourier self-deconvolution (FSD), factor analysis and linear prediction (LP) belong to this series of new methods, which seem to be very promising in spectroscopy in general. In the particular case of NMR FIDs, MEM and LP are the most appropriate procedures, although they require extensive computational time. However, this point is no longer crucial with the latest generation of computers. As stated in the beginning of this chapter, the model function f(t) that describes NMR FID interferograms consists of a linear combination of exponentially damped sinusoids:

f(t) = Σ_{k=1}^{n} A_k e^(−α_k t) cos(ω_k t + φ_k),   (t > 0)        (21)

assuming that the phase φ_k = 0, and with A_k the intensity of the k-th spectral line, α_k the decay constant (related to the linewidth) and ω_k the resonance frequency. From equation (21), it appears that the PL analysis of f(t) is natural in the frame of the theory developed above (see part 2). Surprisingly, however, the potential of PL for resolving exponentially damped sinusoidal signals like NMR FIDs has been exploited only recently (Boudhabhay et al., 1992). In the last part of this chapter, we present the application of the PL method to the extraction of NMR spectral parameters (amplitude, frequency and linewidth) from both simulated and experimental FIDs. In the latter case, quadrupolar 14N FID signals obtained from N-methyl imidazole (Witanowski et al., 1972) were analysed.

4.2 Extension of the Padé-Laplace method to NMR FIDs
Let us consider the FID expressed by Eq. (21), sampled at regular time intervals Δt:

f(jΔt) = Σ_{k=1}^{n} A_k e^(−α_k jΔt) cos(ω_k jΔt),   (j = 1, ..., N0)        (22)

Using cos(ωt) = (e^(iωt) + e^(−iωt)) / 2, equation (22) becomes

f(jΔt) = Σ_{k=1}^{n} A_k e^(−α_k jΔt) (e^(iω_k jΔt) + e^(−iω_k jΔt)) / 2        (23)

and complex conjugate exponents now appear,

f(jΔt) = Σ_{k=1}^{n} (A_k/2) e^(λ_k jΔt) + Σ_{k=1}^{n} (A_k/2) e^(λ_k* jΔt),        (24)

with λ_k = −α_k + iω_k. From equation (24) it is clear that the Laplace transform of f(t) will then be exactly represented by the [(2n−1)/2n] Padé approximant, and the detection problem regarding FID signals can be expressed as follows: for an FID containing n components, the [(2n−1)/2n] Padé approximant represents exactly the Laplace transform of this signal. In the PL analysis the genuine roots will appear, from this approximant and the following ones, as pairs of stable complex conjugates (λ_k, λ_k*). In order to obtain these approximants, the Taylor series coefficients must be computed numerically, and we found it convenient in this case to adopt an integration method based on the simple trapezoidal rule and Filon's idea (Boudhabhay et al., 1992). Moreover, to calculate these coefficients properly, i.e. integrals of the form ∫ f(t) e^(−pt) cos(ωt) dt (see part 2, Eq. (8)), the FID must be sampled over a sufficient number of data points. From practical and theoretical considerations we have established that at least 12 points per cycle of the highest frequency present in the signal are necessary to estimate these integrals precisely (Boudhabhay et al., 1992). However, this sampling condition is seldom met in actual experimental signals, and we shall see later in this chapter that under-sampling (with respect to the above sampling condition, not to the Shannon theorem) leads to substantial inaccuracy in the numerical integrations, so that the genuine roots appear at higher approximants than expected (see the numerical simulations below).

4.3 Numerical simulations
The applicability of the PL method to NMR FIDs was first checked using different data sets of multiexponentially damped sinusoids (Boudhabhay et al., 1992). We give here only two illustrative examples showing the effectiveness of, and the pitfalls encountered in, the PL analysis of FIDs. As a first example, we consider a simulated signal consisting of two noiseless damped sinusoids:

f(jΔt) = A1 e^(−b1 jΔt) cos(2πf1 jΔt) + A2 e^(−b2 jΔt) cos(2πf2 jΔt),        (25)

where b1 = 500 Hz, b2 = 1000 Hz; f1 = 500 Hz, f2 = 900 Hz; A1 = 2, A2 = 1. The results displayed in Table 3 were obtained for p0 = 5, taking into account the above sampling condition (at least 12 points per cycle of 1/fmax, i.e. Δt < 1/(900 × 12) ≈ 92.6 μs; see Table 3) and using the Filon trapezoidal method of integration. Under these conditions the recovery of the two genuine (complex conjugate) components is provided, as expected, by the [3/4] approximant, and remains stable for the following approximants (up to [9/10] in Table 3). This stability was preserved even when varying p0 from 2 to 10, with a test of stability equal to 0.1 (see part 2 above). However, this example was not fully representative of our approach, since the simulated FID was oversampled compared to experimental situations usually encountered (fewer points, more noise, ...). Therefore we consider the analysis of a simulated signal containing two noisy (1% white Gaussian noise) damped sinusoids, as expressed by equation (25), but with the set of parameters used by Tang et al. (1985) in their comparative study of various techniques for the analysis of FIDs (resp., 1/b1 = 0.5 μs, 1/b2 = 0.4 μs; f1 = 15 MHz, f2 = 15.5 MHz; A1 = 1, A2 = 0.5). As previously observed by these authors, the cosine FFT spectrum of this simulated signal, over the entire data set (256 points), consists of only one peak. The results of the PL analysis obtained for p0 = 3 show that only one average value is detected up to approximant [6/7]; the "exact" parameters really appear only from approximant [7/8], and remain stable up to approximant [11/12]. In this case it is clear that the recovery of the components starts at approximant orders definitely larger than expected ([3/4]). As already mentioned above, this phenomenon is due to the inaccuracy of the numerical integration. Indeed, the simulated signal studied was defined over 256 data points using a 3 μs time window (Tang et al., 1985), which leads to a sampling step of ca. 12 ns. Under these conditions only about 5 points cover a cycle of the highest frequency (15.5 MHz), leading to substantial inaccuracy in the numerical results. Moreover, the white noise added to the data, although very weak, also contributes to the inaccuracy of the results.

Table 3: PL analysis (p0 = 5) of the simulated FID of equation (25); N0 = 200 points, Δt = 92 μs. Reprinted from Boudhabhay et al., Comput. Chem., Copyright (1992), with kind permission from Elsevier Science Ltd.
[Table 3 (columns: Re λ_k, Im λ_k, Re A_k, Im A_k) lists the roots and amplitudes found for each successive approximant from [0/1] to [9/10]. From the [3/4] approximant onwards the two genuine complex-conjugate pairs appear and remain stable, with Re λ ≈ −0.6505 and −1.3011, Im λ ≈ ±4.0854 and ±7.3537, and amplitudes Re A ≈ 1.00 and 0.50 (Im A ≈ 0); the additional roots change from one approximant to the next and carry negligible amplitudes.]
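The first simulated example (equation (25), together with the 12-points-per-cycle sampling condition) can be reproduced as follows. This is a sketch for generating the test signal only; the integration and Padé-approximant stages of the full analysis are omitted, and the function names are ours:

```python
import numpy as np

def simulated_fid(dt, n_points, components):
    """Damped-cosine FID of equation (25).

    components : list of (A, b, f) with amplitude A, decay constant b (Hz)
    and frequency f (Hz).
    """
    t = np.arange(n_points) * dt
    return sum(A * np.exp(-b * t) * np.cos(2 * np.pi * f * t)
               for A, b, f in components)

def points_per_cycle(dt, f_max):
    """Sampling density at the highest frequency present in the signal."""
    return 1.0 / (f_max * dt)

# first example: b1 = 500 Hz, b2 = 1000 Hz; f1 = 500 Hz, f2 = 900 Hz; A1 = 2, A2 = 1
components = [(2.0, 500.0, 500.0), (1.0, 1000.0, 900.0)]
dt = 92e-6                              # 92 microseconds, just under 1/(900*12)
fid = simulated_fid(dt, 200, components)
print(points_per_cycle(dt, 900.0))      # about 12.1 -> sampling condition (just) met
```

With the Tang et al. (1985) parameters instead (Δt ≈ 12 ns, f_max = 15.5 MHz), `points_per_cycle` drops to about 5, which is the under-sampling responsible for the delayed detection described above.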
4.4 PL analysis of N-methyl imidazole FIDs

Padé-Laplace analysis of N-methyl imidazole FIDs was carried out to check the ability of the method on real NMR signals. Figure 3 shows the two broad 14N resonances of the N-methyl imidazole NMR spectrum. This spectrum was recorded using a WP200 Bruker instrument operating at 14.46 MHz (Boudhabhay et al., 1992). Approximately 3.2 million transient FIDs were accumulated under the following conditions: 512 data points were acquired in 0.0256 s, corresponding to a spectral width of 10,000 Hz (quadrature detection), and were zero-filled to 2K before FFT processing. The rolling baseline resulting from acoustic ringing of the probe for about 250 μs is clearly evident in the spectrum, precluding accurate determination of linewidths and peak positions. Of the 256 real data points available in the experimental FID, only 160 were used in the PL analysis. In order to ensure detection at high approximants, the number of Taylor coefficients was set at 40 and we chose p0 = 50. As the first data points are spoiled by the acoustic ringing of the probe receiver (Gerothanassis, 1987), the first four points were eliminated and reconstructed using an efficient interpolation procedure (Boudhabhay et al., 1992). The results of the PL analysis are listed in Table 4, and compare well with the FFT values. Previous results obtained using a CW NMR spectrometer (Witanowski et al., 1972) are also provided in the Table 4 caption for comparison. Though these previous values are in the same frequency range as the present ones, the observed discrepancy between CW and DFT might arise from solvent effects.
Fig. 3: Real part of the 14N NMR FFT spectrum of N-methyl imidazole. Reprinted from Boudhabhay et al., Comput. Chem., Copyright (1992), with kind permission from Elsevier Science Ltd.
Table 4: Comparison of 14N NMR spectral parameters obtained from PL and FFT on N-methyl imidazole in DMSO*. Reprinted from Boudhabhay et al., Comput. Chem., Copyright (1992), with kind permission from Elsevier Science Ltd.

Method   Nucleus   bk (Hz)   fk (Hz)   Ak
PL       NCH3        555      3510      2
         N          1000      4968      1
FFT      NCH3        555      3480      2
         N          1280      4950      1
"Values of bk and fk, previously measured in CC14 using an NMR continuous wave spectrometer (Witanowski et al., 1972), are respectively, 470 and 1620 Hz for NCH3 and 1020 and 3030 Hz for N. 5 CONCLUSION A brief but substantial survey of the Pad6-Laplace method along with various significant applications to the time-domain in spectroscopy (e.g. NMR) have been covered in this chapter. The purpose was to give the most important mathematical tools and principles of the PL analysis and their use in a lot of experimental situations. Therefore, different classes of relaxation signals and FID were studied, emphazing the specifications of the PL treatment and the pitfalls encountered for each class of signals. The reader will now probably be able to appreciate the utility and the efficiency of the PL method compared to the popular methods currently used in NMR signal treatment. Let us recall however the most striking points of PL : Pad6-Laplace is an analytic method perfectly suited to the analysis of multiexponential curves. In this method there is no need to introduce initial values as in the case of least-squares methods ; the number of components (with real and/or complex exponents) and the exponential parameters are given by the method itself, provided the Laplace parameter Po is chosen in the optimal range. Such a detection procedure, without any a priori, is a great advantage when the solution of a problem requires the exact identification of the number of exponentials in a signal (as in the case of complex relaxation processes where this knowledge allows one to determine the number of species involve in the mechanism ). Furthermore, the PL method can be easily associated with an iterative procedure : this procedure consists in introducing the output of the PL computation as starting values in a non linear least-squares method. 
In this way the PL method is enriched by the elements of the statistical method, although the procedure remains essentially analytic (Levoir and Aubard, 1989; Tellier et al., 1991; Fulchiron et al., 1993).
In addition to these advantages, it should be noted that PL is a simple, clear and efficient procedure which needs only low-cost computing algorithms.
Acknowledgements

We wish to take this opportunity to evoke the names of all those who have worked with us on the advancement and use of the Padé-Laplace method. First of all, A. Denis and the late P. Claverie, who were the inventors of the original PL method and conducted various applications. The collaboration with S. Boudhabhay, R. Fulchiron, R. Topol and E. Yeramian, who contributed to different stages of this work, was gratefully appreciated. Lastly, we would also like to thank J.J. Meyer for his technical advice, useful discussions and encouragement throughout this work.

6 REFERENCES

Aubard, J., 1981. Study of ultrafast biological phenomena by means of T-jump relaxation techniques. Doctoral thesis, Université Paris 7.
Aubard, J., Levoir, P., Denis, A. & Claverie, P., 1985. Application of the Padé-Laplace Method to Multiexponential Decays: Analysis of Chemical Relaxation Signals. 10th Symp. Signal and Image Processing, GRETSI, Nice, France, pp. 1077-1081.
Aubard, J., Levoir, P., Denis, A. & Claverie, P., 1987. Direct Analysis of Chemical Relaxation Signals by a Method Based on the Combination of Laplace Transform and Padé Approximants. Comput. Chem., 11, 163-178.
Bajzer, Z., Myers, A. C., Sedarous, S. S. & Prendergast, F. G., 1989. Padé-Laplace method for analysis of fluorescence intensity decay. Biophys. J., 56, 79-93.
Bajzer, Z., Sharp, J. C., Sedarous, S. S. & Prendergast, F. G., 1990. Padé-Laplace Method for the Analysis of Time-resolved Fluorescence Decay Curves. Eur. Biophys. J., 18, 101-115.
Baker, G. A., 1965. The Theory and Application of the Padé Approximant Method. Adv. Theor. Phys., 1, 1-58.
Bouchy, M., 1982. Déconvolution et reconvolution de signaux analytiques : application à la spectroscopie de fluorescence. M. Bouchy (Editor), ENSIC-INPL, France.
Boudhabhay, S., Ancian, B., Levoir, P., Dubest, R. & Aubard, J., 1992. Spectral Analysis of Quadrupolar NMR Signals by the Padé-Laplace Method. Comput. Chem., 271-276.
Bowers, J. S., Prud'homme, R. K. & Farinato, R. S., 1992. An Assessment of the Padé-Laplace Method for Transient Electric Birefringence Decay Analysis. Comput. Chem., 249-259.
Claverie, P., Levoir, P. & Aubard, J., 1987. Application de la Méthode Padé-Laplace à la Déconvolution de Signaux Transitoires en Spectroscopie Résolue dans le Temps. 11th Symp. Signal and Image Processing, GRETSI, Nice, France, Vol. 2, pp. 607-611.
Claverie, P., Denis, A. & Yeramian, E., 1989. The Representation of Functions through the Combined Use of Integral Transforms and Padé Approximants: Padé-Laplace Analysis of Functions as Sums of Exponentials. Comput. Phys. Rep., 9, 247-299.
Claverie, P., Szpiro, D. & Topol, R., 1990. Identification des Modèles à Fonction de Transfert : la Méthode Padé-transformée en z. Annales d'économie et de statistique, 17, 145-161.
Clayden, N. J., 1992. Padé-Laplace Analysis in the Fitting of Multi-exponential Nuclear Magnetic Resonance Relaxation Decay Curves. J. Chem. Soc. Faraday Trans., 88, 2481-2486.
Denis, A. & Aubard, J., 1985. Spectral Lines Analysis Method. In: F. Rostas (Editor), Spectral Line Shapes, Vol. 3, pp. 735-737, W. de Gruyter, New York.
Filon, L. N. G., 1928. On a Quadrature Formula for Trigonometric Integrals. Proc. R. Soc. Edinburgh, 49, 38-47.
Fulchiron, R., Verney, V., Cassagnau, P., Michel, A., Levoir, P. & Aubard, J., 1993. Deconvolution of Polymer Melt Stress Relaxation by the Padé-Laplace Method. J. Rheol., 37, 17-34.
Gerothanassis, I. P., 1987. Methods of Avoiding the Effects of Acoustic Ringing in Pulsed Fourier Transform Nuclear Magnetic Resonance Spectroscopy. Prog. NMR Spectrosc., 19, 267-330.
Gilewicz, J., 1978. Approximants de Padé. In: A. Dold & B. Eckmann (Editors), Lecture Notes in Mathematics, 667, Springer Verlag.
Kauppinen, J. K., Saarinen, P. E. & Hollberg, M. R., 1994. Linear Prediction in Spectroscopy. J. Mol. Struct., 324, 61-74.
Levoir, P. & Aubard, J., 1989. OPTIPAD, an Optimised Computer Program for the Padé-Laplace Method (available upon request).
Longman, I. M., 1971. Computation of the Padé Table. Int. J. Computer Math., 3, 53-64.
Matheson, I. B. C., 1989. The Non-equivalence of Padé-Laplace and Non-linear Least Squares Data Fitting: A Padé-Laplace Bias towards Slower Processes. Computers Chem., 13, 385-386.
Simhambhatla, M. & Leonov, A. I., 1993. The Extended Padé-Laplace Method for Efficient Discretization of Linear Viscoelastic Spectra. Rheol. Acta, 32, 589-600.
Sneddon, I. N., 1972. The Use of Integral Transforms. McGraw-Hill, New York.
Stephenson, D. S., 1988. Linear Prediction and Maximum Entropy Methods in NMR Spectroscopy. In: J. W. Emsley, J. Feeney & L. H. Sutcliffe (Editors), Prog. NMR Spectrosc., Vol. 20, pp. 515-626.
Tang, J., Lin, C. P., Bowman, M. K. & Norris, J. R., 1985. An Alternative to Fourier Transform Spectral Analysis with Improved Resolution. J. Magn. Reson., 62, 167-171.
Tang, J. & Norris, J. R., 1988. Padé Approximations and Linear Prediction Methods. Nature, 333, 216.
Tellier, C., Guillou-Charpin, M., Le Botlan, D. & Pelissolo, F., 1991. Analysis of Low-resolution, Low-field NMR Relaxation Data with the Padé-Laplace Method. Magn. Reson. Chem., 29, 164-167.
Visser, A. J. W. G., 1985. Time Resolved Fluorescence Spectroscopy. Anal. Instrum., 14, 193-546.
Witanowski, M., Stefaniak, L., Januszewski, H., Grabowski, Z. & Webb, G. A., 1972. Nitrogen-14 Nuclear Magnetic Resonance of Azoles and their Benzo-derivatives. Tetrahedron, 28, 637-653.
Yeramian, E. & Claverie, P., 1987. Analysis of Multiexponential Functions without a Hypothesis as to the Number of Components. Nature, 169-174.
Yeramian, E., 1988. Further Analysis of Multiexponential Functions. Nature, 335, 501-502.
Signal Treatment and Signal Analysis in NMR
Ed. by D.N. Rutledge
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 6 DIGITAL FILTERING Keith J. Cross
NMR Facility, Peter MacCallum Cancer Institute, 381 Royal Parade, Parkville, Victoria, Australia 3052

1 INTRODUCTION: WHY USE DIGITAL FILTERING?
Fourier transform NMR spectrometers are a familiar item in many chemical, biochemical and biophysical laboratories. The raw data acquired by such instruments consists of a digitized time series of voltage readings, referred to as a free induction decay (FID). The NMR spectrum is obtained by Fourier transformation of the FID. The sampling interval, T, of the FID restricts the frequencies that may appear in the spectrum to values less than the Nyquist frequency, F. Frequencies present in the original signal that are higher than the Nyquist frequency are folded back and appear as artifacts or noise in the NMR spectrum. In normal practice, the sampling rate is sufficiently fast that all the spectral features have frequencies less than the Nyquist frequency and only noise occurs at frequencies higher than the Nyquist frequency. Since the high-frequency noise will be folded back to appear at frequencies lower than the Nyquist frequency, it has been the usual practice to pass the audio-frequency signal through a low-pass electronic filter before digitization. The electronic filters consist of a network of resistive and capacitive elements whose output approximates the behaviour of some mathematical filter (eg. a Butterworth or elliptical filter). There are a number of well recognized problems involved with using electronic filters. The transient response of the filter results in intensity distortions of the first few data points in the FID, leading to curvature in the baseline of the NMR spectrum and a constant (DC) offset of the baseline from zero (Hoult et al., 1983). The gradual transition from the bandpass to the bandstop region, characteristic of electronic filters, requires that the filter bandpass be set significantly larger than the Nyquist frequency, otherwise signals at the edges of the spectrum will be attenuated. The output from an electronic filter necessarily lags behind the input in a frequency dependent manner.
Consequently, the spectrum will require phasing with a frequency dependent phase, an operation that is known to introduce further distortions into the final NMR spectrum (Stejskal and Schaefer, 1974; Marion and Bax, 1988). Finally, apart from the distortions of the NMR spectrum introduced by the electronic filter, we should also consider the cost of the electronic filter. The requirement that the filter cutoff frequency be changed as a function of the spectral sweep width implies a fairly sophisticated piece of electronics, and therefore an expensive item to manufacture. While these problems could be eliminated by recording spectra with a fast digitization rate, eg. using a 100 kHz wide sweep width and a fixed frequency (100 kHz) electronic filter to acquire a 6 kHz wide spectrum, storing the final data in this form is clearly inefficient. The FID is always filtered to produce a data set that contains only the information of interest. The question is then not whether to filter the FID, but how to filter the FID most effectively. Digital filtering offers a number of advantages compared to electronic filters. First, the transition between bandpass and bandstop regions can be made to occur over a much narrower frequency interval than with electronic filters. Second, there is always a delay between data acquisition and data processing, even if the delay is only the short one required to get the data into computer RAM. Consequently, the filtering operation can be symmetric in time and need not introduce a frequency dependent phase error. Third, the software can be written in such a way as to minimize the transient response of the filter, and therefore give flatter baseplanes to the spectra. Fourth, the software required amounts to a few lines of code and will run on the computer required to do the Fourier transform steps, saving money.
A final feature is that a digital filter is very versatile compared to a low-pass electronic filter; an additional two or three lines of code can convert a low-pass digital filter to a high-pass filter. For example, we can use a digital filter to eliminate unwanted spectral features, eg. residual solvent peaks, or to select a region of a spectrum for further processing. The remainder of this chapter consists of an introduction to the mathematical background of digital filtering, which also serves to introduce some of the terminology. This is followed by a survey of literature applications of digital filters to NMR spectroscopy.

2 MATHEMATICAL BACKGROUND

2.1 The transfer function

The most general form of digital filter can be expressed as a difference equation of the form:
y_j = Σ_{k=N}^{K} b_k x_{j−k} − Σ_{k=1}^{M} a_k y_{j−k}        (2.1)
The current output from the filter consists of a weighted sum of input values, x, minus a weighted sum of previous output values, y. (See Appendix A.) Filters that are implemented electronically in hardware are causal in nature, that is the output depends solely on the points acquired before the point of current interest. In contrast, digital filters
can act on points acquired both before and after the point of current interest, that is, N may be negative in equation (2.1). Such filters are referred to as non-causal. The order of a filter is determined by the parameter M in equation (2.1): a first order filter has M=1 and a second order filter has M=2. A common application of digital filters is to modify a time-series of points, x, so as to modify the amplitude of points having a particular frequency, ν, in the Fourier transform of x, denoted by the symbol X(ν). If we represent the filtered time series by y and its Fourier transform by Y(ν), then the ratio of the output signal at frequency ν to the input signal at that frequency is given by:

H(\nu) = \frac{Y(\nu)}{X(\nu)}    (2.2)
H(ν) is referred to as the transfer function. Note that H(ν) is, in general, a complex function. We want to derive an expression for the transfer function, H(ν), in terms of the coefficients of the digital filter defined by equation (2.1). In later sections we will be concerned with filters up to the second order, and so we will restrict our attention to a second order filter defined by the equation:

y_j = (b_0 x_j + b_1 x_{j-1} + b_2 x_{j-2}) - (a_1 y_{j-1} + a_2 y_{j-2})    (2.3)

The definition of the Fourier transform of y_j is:

Y(\nu) = T \sum_{j=-\infty}^{\infty} y_j \exp(-i 2\pi\nu j T)    (2.4)
A similar expression relates x to X(ν). To simplify the notation we will replace the complex exponential exp(-i2πνT) by Z^{-1}. The calculation of H(ν) is readily achieved by replacing y_j in equation (2.4) by its expansion given by the right-hand side of equation (2.3):

Y(\nu) = T \sum_{j=-\infty}^{\infty} \{(b_0 x_j + b_1 x_{j-1} + b_2 x_{j-2}) - (a_1 y_{j-1} + a_2 y_{j-2})\} Z^{-j}    (2.5)
The Fourier transform shift theorem:

X(\nu)\, Z^{-k} = T \sum_{j=-\infty}^{\infty} x_{j-k} Z^{-j}    (2.6)
allows us to replace each of the sums in equation (2.5) by Fourier transforms of either x or y as appropriate. Equation (2.5) becomes:

Y(\nu) = (b_0 X(\nu) + b_1 X(\nu) Z^{-1} + b_2 X(\nu) Z^{-2}) - (a_1 Y(\nu) Z^{-1} + a_2 Y(\nu) Z^{-2})    (2.7)

which can be readily rearranged to give the transfer function H(ν):

H(\nu) = \frac{Y(\nu)}{X(\nu)} = \frac{b_0 + b_1 Z^{-1} + b_2 Z^{-2}}{1 + a_1 Z^{-1} + a_2 Z^{-2}}    (2.8)

The transfer function for the general digital filter described by equation (2.1) is a trivial extension of equation (2.8):
H(\nu) = \frac{\sum_{k=N}^{K} b_k Z^{-k}}{1 + \sum_{k=1}^{M} a_k Z^{-k}} = \frac{N(Z)}{D(Z)}    (2.9)
where D(Z) and N(Z) are polynomials in Z^{-1}. The behaviour of the function H(ν) can be completely characterized (except for a constant multiplicative scale factor) by the roots of the polynomials D and N, the roots of the former being referred to as the poles of H. Equations (2.8) and (2.9) are rational functions of Z^{-1} (Press et al., 1988, page 439). Rational functions are better able than polynomials to approximate functions having sharp peaks and corners. Since the ideal band-pass filter has a sharp transition from the band-pass to the band-stop region, we should expect filters having a transfer function of the form of (2.8) to be generally better than filters with a simple polynomial dependence for their transfer function.
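As an illustration, the transfer function of equation (2.8)/(2.9) can be evaluated numerically by substituting Z^{-1} = exp(-i2πνT). The following sketch is not from the original text; the filter coefficients and sampling interval are arbitrarily chosen for the example:

```python
import numpy as np

def transfer_function(b, a, nu, T):
    """Evaluate H(nu) = sum_k b_k Z^-k / (1 + sum_k a_k Z^-k),
    with Z^-1 = exp(-i 2 pi nu T), as in eqs. (2.8)/(2.9)."""
    z_inv = np.exp(-2j * np.pi * nu * T)
    num = sum(bk * z_inv**k for k, bk in enumerate(b))
    den = 1.0 + sum(ak * z_inv**(k + 1) for k, ak in enumerate(a))
    return num / den

# Example: a second order filter with illustrative coefficients.
T = 1.0e-4                            # sampling interval, seconds
nu = np.linspace(0.0, 0.5 / T, 256)   # frequencies up to the Nyquist frequency
H = transfer_function([0.25, 0.5, 0.25], [-0.1, 0.2], nu, T)
gain = np.abs(H)                      # the filter gain G(nu)
phase = np.angle(H)                   # the filter phase
```

At ν = 0 the substitution gives Z^{-1} = 1, so the gain reduces to the ratio of the coefficient sums, a convenient sanity check on any implementation.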
2.2 Properties of the transfer function

Before proceeding to a discussion of filter implementation, we should briefly look at some of the properties of the transfer function. The transfer function, H(ν), is in general a complex function of ν. The frequency dependent gain of the filter is given by the function:

G(\nu) = |H(\nu)|    (2.10)

In practice, the maximum of G(ν) is normalized to unity, and G(ν) describes the attenuation of signals as a function of frequency. In many applications it is convenient to use a decibel scale to describe the filter attenuation:

dB = 20 \log_{10}(G(\nu))    (2.11)

Negative values of dB correspond to attenuation. The 6 dB point is the point at which the filter gain drops to a half. The width of a filter pass-band or band-stop region is usually referenced to the 6 dB point of the filter. With the definition of the filter gain from equation (2.10) we can rewrite the transfer function as follows:

H(\nu) = G(\nu) \exp(i\phi(\nu))    (2.12)

where φ(ν) is the frequency dependent filter phase. The filter phase can be determined from the following expression:

\phi(\nu) = \tan^{-1}\left\{\frac{\mathrm{Im}\, H(\nu)}{\mathrm{Re}\, H(\nu)}\right\}    (2.13)
The filter phase can be thought of as a measure of the amount by which the output signal lags behind the input signal. Digital filters are divided into two broad classes: those where the a_i terms are all zero, referred to as finite impulse response (FIR) or non-recursive filters, and those with non-zero a_i terms, referred to as infinite impulse response (IIR) or recursive filters. FIR filters have been more frequently used in the NMR literature. An FIR filter that is symmetric in time, b_k = b_{-k}, is characterized by a filter phase of zero at all frequencies. IIR filters have a filter phase that varies with frequency, and this must be considered when implementing such a digital filter. The description of filters as FIR or IIR refers to the duration in time of the filter response to an impulse, a transient signal. The response of an FIR filter is limited by the length of the series defined by equation (2.1), whereas the response of a stable IIR filter asymptotically approaches zero. The description of a filter as recursive refers to the fact that the current output of the filter depends on previous output generated by the filter.

2.3 Higher order filters

While a digital filter can always be implemented by coding equation (2.1) directly for the computer of your choice, this may not necessarily be the most efficient method of implementing any given class of filter. Consider for example an M'th order filter of the form:

y_j = b_0 x_j - \sum_{k=1}^{M} a_k y_{j-k}    (2.14)
According to our earlier derivation, the transfer function has the form (see equation (2.9)):

H(\nu) = \frac{b_0}{1 + \sum_{k=1}^{M} a_k Z^{-k}} = \frac{b_0}{D(Z)}    (2.15)
The roots of D(Z) are of two types: single roots having real Z, and complex roots, which occur as complex conjugate pairs of numbers. The M'th order polynomial D(Z) can be rewritten as a product of M terms, one term for each of the roots, denoted by its (complex) frequency ν_m:

D(Z) = 1 + \sum_{k=1}^{M} a_k Z^{-k} = \prod_{m=1}^{M} \{1 - \exp(i 2\pi\nu_m T)\, Z^{-1}\}    (2.16)

Let us reconsider the different types of roots of D(Z) bearing in mind equation (2.16). The single roots having real Z are characterized by imaginary frequencies ν_m. Because the complex exponential in equation (2.16) is periodic, we can restrict our attention to roots of the form iα/2πT or i(α/2πT) + F, with α > 0, where F is the Nyquist frequency for a signal sampled every T seconds. The terms in equation (2.16) corresponding to real roots are then of the form {1 ∓ exp(-α)Z^{-1}}: the minus sign gives low-pass filters and is derived from the iα/2πT roots, while the plus sign corresponds to high-pass filters and is derived from the i(α/2πT) + F roots. These filters are described as single pole filters. Second, the complex roots of D(Z) are characterized by complex frequencies ν_m. For each root of the form (iα + β)/2πT, there exists a second root of the form (iα - β)/2πT. Treating the two roots together in equation (2.16), we obtain {1 - 2exp(-α) cos β Z^{-1} + exp(-2α) Z^{-2}}, which can be recognized as having the form of the denominator of a second order filter.
We have shown that the transfer function H(ν) of some M'th order filters can be written as the product of P first order filters and Q/2 second order filters, such that M = P + Q:

H(\nu) = \prod_{p=1}^{P} H_p(\nu) \prod_{q=1}^{Q/2} H_q(\nu)    (2.17)
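The cascade form of equation (2.17) can be sketched as follows. This is an illustrative Python fragment (not from the chapter), in which each stage is a first or second order section applied via equation (2.1), and the filter order is changed simply by adding or removing sections:

```python
import numpy as np

def section(x, b, a):
    """Apply one first/second order section of eq. (2.1):
    y[j] = sum_k b[k] x[j-k] - sum_k a[k] y[j-k-1]."""
    y = np.zeros_like(x, dtype=float)
    for j in range(len(x)):
        acc = sum(bk * x[j - k] for k, bk in enumerate(b) if j - k >= 0)
        acc -= sum(ak * y[j - k - 1] for k, ak in enumerate(a) if j - k - 1 >= 0)
        y[j] = acc
    return y

def cascade(x, sections):
    """Run the signal through each section in turn, as in eq. (2.17)."""
    for b, a in sections:
        x = section(x, b, a)
    return x

# Two illustrative stages: a first order and a second order section.
stages = [([0.5, 0.5], [-0.2]), ([0.25, 0.5, 0.25], [-0.1, 0.05])]
y = cascade(np.ones(64), stages)
```

On a constant input the steady-state output equals the product of the DC gains of the individual sections, which is one way to check that a cascade implementation is wired correctly.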
A filter implementation of this sort is referred to as a cascade or serial filter. There are two reasons why this type of implementation is preferable to the direct application of equation (2.1). First, the computer program can be written in such a way that the order of the filter can be readily changed without having to rewrite the code that performs the filtering; the order simply controls the number of iterations of the filter, using different filter coefficients at each iteration. Second, the requirement for numeric stability of M'th order filters is a complicated function of the coefficients in equation (2.1) (Otnes and Enochson, 1978; Press et al., 1988, pages 440-443). By contrast, the requirements for numeric stability of first and second order filters are simple functions of the coefficients and are easy to apply. The numeric stability of the filter in equation (2.17) is assured if each of the filters H_p and H_q is stable. The above discussion of filter implementation was in terms of a fairly simple M'th order filter having only a b_0 term. More complex filters can also be written in the form of a cascade of 1st and 2nd order filters, provided their transfer function can be written in the form of a product of filters, as in equation (2.17). The possibilities for filter implementation are not limited to the direct and cascade forms. For example, if the transfer function of equation (2.9) can be written as a sum of terms as follows:

H(\nu) = \frac{N(Z)}{D(Z)} = \sum_{r=1}^{R} \frac{N_r(Z)}{D_r(Z)}    (2.18)

then the filter can be implemented using a parallel type algorithm. Clearly, the possibility then exists for the individual terms in the summation to be written in a cascade form, for example, leading to a wide variety of hybrid filter algorithms.

3 COMPUTATIONAL PITFALLS

The discussion so far has been concerned with the mathematical properties of digital filters. In practice, the filter is going to be implemented on a computer, using finite approximations to the equations that appear in the earlier sections and operating on short sequences of numbers rather than the infinitely long signals implied by the treatment above. We must look at the consequences of these approximations. There are two facets to this problem: filter stability and distortion. Stability requires that the filter output remain finite for all possible input signal frequencies. Distortion is a more subtle problem: can we be sure that the filter has not modified the intensities of the spectral peaks in the filtered spectrum?
3.1 Numeric Stability of Digital Filters
Consider the transfer function of a general second order filter, equation (2.8). The poles of the transfer function are determined by setting the denominator equal to zero:

1 + a_1 Z^{-1} + a_2 Z^{-2} = 0    (3.1)

Multiplying through by Z^2 and then applying the well known formula for the roots of a quadratic equation yields:

Z = \frac{-a_1 \pm \sqrt{a_1^2 - 4a_2}}{2}    (3.2)

There are three possibilities to consider: either a_1^2 > 4a_2, which leads to real, unequal roots; or a_1^2 = 4a_2, and equation (3.1) has two equal real roots; or a_1^2 < 4a_2, and equation (3.1) has complex conjugate roots. The parabola defined by a_1^2 = 4a_2 separates filters having real poles from those having complex poles. Let us consider the real roots first; these lie on or below the parabola defined above. We will label the two roots from equation (3.2) as r_1 and r_2 respectively. Then the denominator of equation (3.1) can be written as:

(1 - r_1 Z^{-1})(1 - r_2 Z^{-1})    (3.3)

using the same reasoning as led to equation (2.16). Stability of the filter is assured if neither of the bracketed terms in (3.3) can pass through zero as Z^{-1} varies. This constraint is easily met by requiring that the magnitudes of r_1 and r_2 be less than unity, hence:

-1 < \frac{-a_1 \pm \sqrt{a_1^2 - 4a_2}}{2} < 1
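The stability condition can also be checked numerically. The following fragment is an illustration (not from the chapter): it finds the poles of a second order filter from equation (3.1), multiplied through by Z^2, and tests whether their magnitudes are below unity:

```python
import numpy as np

def is_stable(a1, a2):
    """A second order filter with denominator 1 + a1 Z^-1 + a2 Z^-2 is
    stable when both roots of Z^2 + a1 Z + a2 = 0 lie inside the unit
    circle."""
    roots = np.roots([1.0, a1, a2])
    return bool(np.all(np.abs(roots) < 1.0))

# Poles at Z = 0.2 and Z = 0.3, well inside the unit circle: stable.
stable = is_stable(-0.5, 0.06)
# Poles at Z = 0.5 and Z = 2.0, one outside the unit circle: unstable.
unstable = is_stable(-2.5, 1.0)
```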
σ_i ≥ σ_{i+1} ≥ 0 for i = 1..p. The values σ_i are the singular values of A, and the vectors u_i and v_i are respectively the i-th left and right singular vectors. The singular values can be considered as quantitative measures of the qualitative notion of rank: algebraically, a matrix has a well determined rank, which is a non-negative integer, but in practice the effects of rounding errors and noisy data make numerical rank determination a non-trivial exercise.
The SVD reveals a great deal about the structure of a matrix, as demonstrated by the following corollary.

Corollary: Let A be a matrix on which one applies the SVD, with

\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_p = 0

- Rank property: r(A) = r, N(A) = span{v_{r+1}, ..., v_n} and R(A) = span{u_1, ..., u_r}.
- Dyadic decomposition:

A = \sum_{i=1}^{r} \sigma_i\, u_i v_i^T

The dyadic decomposition provides a canonical description of a matrix as a sum of r(A) rank-one matrices of decreasing importance, as measured by the singular values.
- Norms: ‖A‖_2 = σ_1 and ‖A‖_F = (σ_1^2 + ... + σ_p^2)^{1/2}.

In the analysis of the effect of perturbations of A and b on the least squares solution, the generalised condition number for the 2-norm plays a significant role:

k(A) = \|A\|_2 \|A^{-1}\|_2 = \sigma_1 / \sigma_p

since ‖A‖_2 = σ_1 and ‖A^{-1}‖_2 = σ_p^{-1}. In particular, a matrix is singular if its condition number is infinite. If the condition number is too 'large' the matrix is said to be ill-conditioned. The definition of 'large' may differ from problem to problem and depends on the accuracy of the data and the accuracy needed for the solutions. The linear systems arising from the linear prediction equations are examples of ill-conditioned problems, since the matrix is not of full rank (some columns are linear combinations of a few others) and the experimental data are affected by errors. For singular matrices the concepts of nullspace and range are important. If A is a singular matrix there is some subspace of x, called the nullspace, that is mapped to zero: Ax = 0; the dimension of this space (the number of linearly independent vectors x which can be found in it) is called the nullity of A. There is also a subspace of b that can be 'reached' by A (there are some x which can be mapped there): this subspace is called the range of A, and its dimension is called the rank of A, R(A). A relevant theorem is: "rank plus nullity equals N". For a nonsingular matrix the rank is N (its range will be all of the vector space b). By doing an SVD of a matrix, one actually constructs orthonormal bases for the nullspace and range of the matrix, in the sense that the columns of U whose corresponding singular values are nonzero are an orthonormal set of vectors that span the range; reciprocally, the columns of V whose corresponding singular values are zero are an orthonormal basis for the nullspace. SVD is also a powerful method for solving most linear least squares problems:
Theorem. Given the matrix A ∈ R^{p×n} and the vector b ∈ R^p, consider the general linear least squares problem:

\min \|x\|_2, \quad x \in S = \{x \in R^n : \|b - Ax\|_2 = \min\}

where rank(A) = r ≤ min(p, n). This problem always has a unique solution, the pseudo-inverse solution, which can be written x = A^+ b, where A^+ is the unique pseudo-inverse of A. Let the SVD of A be:

A = U \begin{pmatrix} \Sigma_r & 0 \\ 0 & 0 \end{pmatrix} V^T

where U and V are orthogonal matrices, and Σ_r = diag(σ_1, σ_2, ..., σ_r), with σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0 the ordered nonzero singular values of A. Then the pseudo-inverse is:

A^+ = V \begin{pmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^T
4 SOLVING THE PREDICTION COEFFICIENTS

For N known samples of the signal and a given prediction order M, it is obvious that we can apply the prediction equations (14, 15) (N-M) times. The 'prediction matrix' is a rectangular matrix of dimensions (N-M) × M. The maximum rank of this matrix is the minimum of M and (N-M). The problem of finding the prediction coefficients is now reduced to solving the linear system of equations. The linear LS procedure used is based on singular value decomposition (SVD). This enhances the numerical stability of the mathematical process. In addition, analysis of the singular values enables one to distinguish between signal and noise. We have used a particular SVD implementation based on the Eigensystem Subroutine Package, EISPACK (Smith et al. 1976). The subroutine computes the singular values and complete orthogonal decomposition of a real rectangular coefficient matrix A of the linear system:

Ax = b    (17)
A is decomposed into U·Σ·V^T with U^T·U = V^T·V = I. Our interest was to find a reliable routine for computing both the full SVD decomposition and the LS solutions of minimal norm. If A is a (M × N) matrix and b is a given column vector with M components, then the N-component vector x is defined to be the least squares solution of the linear system Ax = b if it minimises the Euclidean norm of the residual vector b - Ax. If A is not of full rank, then the solution is not unique: in the absence of other criteria, it is common to choose the one with minimum norm ‖x‖. If A = UΣV^T then the desired solution x can be written x = VΣ^+U^T b, where Σ^+ is the diagonal matrix with elements:

\sigma_j^+ = \begin{cases} 1/\sigma_j & \text{if } \sigma_j > \tau \\ 0 & \text{otherwise} \end{cases}

for a certain specified value of τ. The minimum norm solution x depends upon the choice of τ. If τ is increased to a value which causes one or more additional singular values to be neglected, then ‖b - Ax‖ will be increased and ‖x‖ will be decreased. One starting point for a 'good' τ value could be τ > max_{i,j} |Δ_{i,j}|, where Δ_{i,j} are the errors in a_{i,j}, which are actually the experimental data. Our implementation (like the EISPACK routine) computes the U^T b product as well as Σ and V, so in order to compute x one need only determine Σ^+ (based on the selected τ) and form the product. Since we sometimes need a simultaneous solution for both forward and backward prediction, b may be a matrix with more than one column. The implementation of the SVD follows the Golub and Reinsch (Golub & Reinsch, 1970) description of the two-step algorithm: a first step in which the matrix is reduced to bidiagonal form, and a second in which a variant of the QR method is used to decompose a symmetric tridiagonal form. Since we are sometimes interested in computing only a few terms of the SVD expansion of a matrix, special algorithms can be very advantageous in the case of large matrices. These are the so-called Partial SVD methods, based on conjugate gradient search methods (Haimi-Cohen & Cohen, 1987) or the power method (Golub & van Loan, 1989). An example is given in the Pascal program LPS.PAS (and LP.PAS), which uses an SVD technique for solving the forward prediction coefficients, or LPFB.PAS, which uses the same technique for finding both the forward and backward prediction coefficients. In the given examples a signal is simulated (as an FID or relaxation signal) and the values are put into the 'xs' vector. The Hankel matrix A generated by the prediction equations is filled in the following sequence (in pseudocode):

for i = 1:nrow
  for j = 1:ncolumns
    a[i,j] = xs[i+j-1]

The right-hand term of the system (17), the b matrix, has 2 columns (for the forward and backward directions of prediction), filled as follows:

for i = 1:nrow
  b[i,1] = xs[ncolumns+i]    { for forward prediction }
  b[i,2] = xs[i-1]           { for backward prediction }
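In Python the same prediction system can be assembled as follows. This is an illustrative sketch with 0-based indexing (not the chapter's Pascal code); each row holds a window of M samples, the forward target is the sample just after the window, and the backward target the sample just before it:

```python
import numpy as np

def prediction_system(xs, M):
    """Assemble the Hankel 'prediction matrix' and the two right-hand sides
    (forward and backward) for the linear prediction equations."""
    N = len(xs)
    nrow = N - M - 1
    # Row i holds the M samples xs[i+1 .. i+M]; A[i, j] = xs[i + j + 1].
    A = np.array([xs[i + 1:i + 1 + M] for i in range(nrow)])
    b = np.column_stack([xs[M + 1:N],      # forward prediction targets
                         xs[0:nrow]])      # backward prediction targets
    return A, b

xs = np.arange(10.0)
A, b = prediction_system(xs, 3)
```

The Hankel structure (equal elements along each anti-diagonal) is easy to assert, which makes this construction straightforward to unit-test.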
Thus a simultaneous solution is found for both directions of prediction after performing the SVD by calling:

SVD (a, b, ncolumns, nrow, w)
where 'w' is the singular values vector. Analysing this vector is extremely important for the numerical stability of the system. A typical plot of the singular values for NMR data is given in Fig. 1.
Fig. 1: The singular values of a (100 × 30) prediction matrix. The signal is a superposition of three damped sinusoids with noise.

One then establishes a 'low limit' for the singular values and, implicitly, an estimate of the rank of the A matrix. Σ^+ is then calculated:

for i = 1:ncolumns
  if (w[i] > low_limit) then svarray[i,i] = 1/w[i]

The solution is calculated as x = V Σ^+ U^T b, using a procedure for matrix multiplication, 'matmu':
matmu (a, svarray, mexp, nrow, ncolumns, ncolumns);
matmu (mexp, b, mexp, nrow, ncolumns, 2);

The prediction coefficients are mexp[i,1] for the forward direction and mexp[i,2] for the backward direction. Now we are able to reconstruct (or to extend) the experimental data in the forward and/or backward direction by using the prediction formulae (14, 15). In the following examples we apply LP methods for reconstructing (in both the forward and backward directions) FID and relaxation signals.
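The forward reconstruction step can be sketched in Python as follows (an illustration of the same prediction model, not the chapter's Pascal code): once the coefficients are known, each new point is a weighted sum of the previous M points.

```python
import numpy as np

def lp_extend(xs, coeffs, npoints):
    """Extend the signal forward by linear prediction:
    x[n] = sum_m coeffs[m] * x[n-1-m], coeffs[0] weighting the most
    recent sample."""
    out = list(xs)
    M = len(coeffs)
    for _ in range(npoints):
        out.append(sum(coeffs[m] * out[-1 - m] for m in range(M)))
    return np.array(out)

# A noiseless decaying exponential x[n] = 0.9**n is predicted exactly by
# the single coefficient 0.9.
xs = 0.9 ** np.arange(8)
ext = lp_extend(xs, [0.9], 4)
```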
Fig. 2: First 132 points of an FID signal consisting of three noisy damped exponentials.
Fig. 3: Results of LP forward prediction for 1024 points, using the signal from Fig. 2. The prediction order is 30.
Fig. 4: Differences between the original data and the data reconstructed by LP in the forward direction. The first reconstructed point is 133, so the error is zero for the first 132 points.
Fig. 5: An attempt to predict forward the same data as in Fig. 2, but with a low prediction order (10). Note the difference compared to Fig. 3.
Fig. 6: Differences between the original data and the data reconstructed by LP in the forward direction. The first reconstructed point is 133, so the error is zero for the first 132 points. The errors in this case are much larger than the noise level.
Fig. 7: Singular values of the signal from Fig. 2 in the case of prediction order M=30. It is clear that 6 singular values are 'signal related'. (For k damped sinusoids the number of signal-related singular values is 2k.)
Fig. 8: Singular values of the signal from Fig. 2 in the case of prediction order M=10. One can no longer distinguish the 6 true 'signal related' singular values.
Fig. 9: A relaxation signal consisting of two damped exponentials with added noise.
Fig. 10: Data reconstructed with forward LP.
Fig. 11: Differences between original and reconstructed data by LP in the forward direction. The first reconstructed point is 41, so the error is zero for the first 40 points. The errors are of the same order as the added noise.
Fig. 12: Reconstructed data from backward prediction, together with the original data used for LP.
Fig. 13: Differences between original and reconstructed data by LP in the backward direction are of the same order as the added noise.
5 LPSVD

Spectral analysis of sampled time domain signals is usually accomplished by the FFT (Fast Fourier Transform). A fundamental limitation of this method is that the smallest observable splitting is approximately equal to the inverse of the time duration of the signal. Obtaining higher resolution than the FFT involves fitting a model directly to the time domain data (Barkhuijsen et al., 1985; Beer & Ormondt, 1991). The parameters of models represented by a sum of known complex damped exponentials fitted to noisy data can be estimated using Prony's method. Such estimation problems arise in NMR spectroscopy (Vandewalle & Moor, 1988). These data may be modelled as a sum of many damped complex exponentials, and for complex molecules there may be tens of exponentials in the model (the pattern of peaks in the spectrum can be used to hypothesise models of the couplings within the molecule). The LPSVD method (Kumaresan & Tufts, 1982) involves extending the length of the prediction equations (so that the number of linear prediction coefficients is much greater than the number of terms in the model) and then using a truncated singular value
decomposition as a method for the determination of the linear prediction coefficients (Kot et al., 1993). From these estimates, the aim is to obtain model parameters which are less sensitive to noise, and to investigate the performance of the truncated singular value decomposition in the LPSVD method when applied to noisy data (Delsuc et al., 1987). Here truncation amounts to omitting the contribution of the 'noise related' singular values to the LP coefficients. The model being considered is:

s_n = \sum_{k=1}^{K} r_k \exp[i\phi_k + (-b_k + i 2\pi f_k)\,\Delta t\, n]    (18)

where the parameters r_k, φ_k, b_k and f_k are the amplitude, phase, damping and frequency of the k-th term in the above equation. Estimates of the frequencies and the damping factors are obtained from the roots of the M-degree characteristic polynomial:

z^M - a_1 z^{M-1} - \cdots - a_M = 0    (19)
The LP coefficients a_1, ..., a_M appear in the difference equation (in the forward direction):

s_n = \sum_{m=1}^{M} a_m s_{n-m}    (20)

which may be written A·a = s, where A, a and s are defined as follows:

A = \begin{pmatrix} s_M & s_{M-1} & \cdots & s_1 \\ s_{M+1} & s_M & \cdots & s_2 \\ \vdots & & & \vdots \\ s_{N-1} & s_{N-2} & \cdots & s_{N-M} \end{pmatrix}, \quad a = \begin{pmatrix} a_1 \\ \vdots \\ a_M \end{pmatrix}, \quad s = \begin{pmatrix} s_{M+1} \\ \vdots \\ s_N \end{pmatrix}    (21)
The least squares solution for a is given by a = A^+ s, where A^+ is the pseudo-inverse of A. In the LPSVD method, the pseudo-inverse A^+ is based on the SVD of A, truncated to K terms rather than using all M terms (Dologlou & Carayannis, 1991). The numerical rank K of A is defined as the number of singular values of A strictly larger than a threshold value τ:

\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_K > \tau \ge \sigma_{K+1}

This definition is reasonable, provided there is a well defined gap between the singular values σ_K and σ_{K+1}. If such a gap does not exist, then the numerical rank is ill-defined (Umapathi & Biradar, 1993). In the case of K noiseless exponentially damped sinusoids the quantities exp[(-b_k + i2πf_k)Δt], with k = 1, ..., K, are equal to the roots of the polynomial:

z^{2K} - a_1 z^{2K-1} - \cdots - a_{2K} = 0    (22)
When more LP coefficients than necessary for the noiseless case are taken, the order of the polynomial increases accordingly. There is an important feature of the backward and forward linear prediction polynomials: the position of the roots. For the forward prediction polynomial, all the roots z_k of the polynomial lie inside the unit circle. For backward prediction the roots can lie either inside or outside the unit circle: the outside roots are related to the signal (damped sinusoids), while the roots located inside the unit circle are related either to noise or to exponentially increasing signals. This special feature of the backward linear prediction technique provides an efficient way to sort signal from noise, if the SNR is not too low. A further way to improve the sorting is to use both directions of prediction, then reflect the backward roots and pick the (forward, backward) pairs with minimum distances |z_k^b - z_k^f|. Thus, depicting a signal characteristic root as a vector in the complex plane (z_k = z_{kr} + i z_{ki}), the damping factor can be obtained from the radius and the frequency from the angle with the real axis:

b_k = -\ln|z_k| / \Delta t, \qquad f_k = \frac{1}{2\pi\,\Delta t} \tan^{-1}(z_{ki}/z_{kr})
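This relation between a characteristic root and the line parameters can be checked with a minimal sketch (an illustration with arbitrarily chosen values, not part of the original text):

```python
import cmath
import math

dt = 0.001                       # sampling interval, seconds
b_true, f_true = 20.0, 150.0     # damping (1/s) and frequency (Hz)

# The signal root of a damped sinusoid, z = exp[(-b + i 2 pi f) dt] ...
z = cmath.exp((-b_true + 2j * math.pi * f_true) * dt)

# ... returns the damping from its radius and the frequency from its angle.
b_est = -math.log(abs(z)) / dt
f_est = cmath.phase(z) / (2 * math.pi * dt)
```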
H = \begin{pmatrix} y_0 & y_1 & \cdots & y_{q-1} \\ y_1 & y_2 & \cdots & y_q \\ \vdots & & & \vdots \end{pmatrix}, \qquad H_{i,j} = y_{i+j-1}    (31)
whose rows and columns consist of consecutive outputs of the system. A necessary condition for such an estimate is that the last column of the Hankel matrix does not contain significant extra information on the signal or, in other terms, that the phenomenon to be modelled has been measured for a sufficiently long time. For matrices that possess Hankel structure (i.e. all elements on an anti-diagonal are equal) it is easy to perform the following factorisation via the SVD:

H = U·Σ·V^T = (U·Σ^{1/2})·(Σ^{1/2}·V^T) = F·G

into an observability matrix F = U·Σ^{1/2} and a controllability matrix G = Σ^{1/2}·V^T, where U and V are orthogonal and Σ is diagonal. Using the above assumptions in describing the system:
H = \begin{pmatrix} c^T \\ c^T A \\ \vdots \\ c^T A^{p-1} \end{pmatrix} \begin{pmatrix} x_0 & A x_0 & \cdots & A^{q-1} x_0 \end{pmatrix} = F\,G    (32)
The system matrix A appears more explicitly and can be reconstructed using the fact that the shifted Hankel matrix H̄, with entries H̄_{i,j} = y_{i+j}, satisfies an analogous factorisation:

H̄ = F·A·G = (U Σ^{1/2})·A·(Σ^{1/2} V^T)    (33)

Since (Σ^{-1/2} U^T)·F = I and G·(V Σ^{-1/2}) = I, the system matrix A may be obtained from the shifted Hankel matrix H̄:

A = (Σ^{-1/2} U^T)·H̄·(V Σ^{-1/2})    (34)
The exponents can be retrieved from the matrix A using the relation between its diagonal elements and the exponents: a_ii = exp(b_i·Δt), where Δt is the sampling time of the signal. The coefficients c_i are then found by a least squares procedure, again using an SVD-based routine.

6.2 Algorithm implementation

The program which illustrates how the algorithm of Zeiger-McEwen can be used in multiexponential fitting was written in Borland PASCAL V7.0 and is HSVD.PAS. In the following, we present a brief description of the main steps of the program:

- simulation of the relaxation signal using a theoretical formula and adding noise;
- initialisation of the Hankel matrix a[i,j] and the shifted Hankel matrix h_shift[i,j];
- SVD of the matrix A;
- estimation of the reduced rank;
- formation of the pseudo-inverse matrix Σ^{-1/2};
- computation of the system matrix A;
- computation of the exponents from the diagonal elements of A;
- computation of the exponential coefficients, solving the linear system via SVD.
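Those steps can be sketched in Python (an illustrative reimplementation under the stated assumptions, not the HSVD.PAS program itself). For clarity the exponents are taken from the eigenvalues of the reconstructed system matrix, which reduce to its diagonal elements in the ideal diagonalised case:

```python
import numpy as np

def hsvd_fit(y, dt, K):
    """Hankel-SVD estimation of exponents b_i and coefficients c_i for
    y(t) ~ sum_i c_i exp(b_i t), sampled at t = n*dt (cf. eqs. 31-34)."""
    N = len(y)
    p = N // 2
    H = np.array([y[i:i + p] for i in range(p)])               # Hankel matrix
    H_shift = np.array([y[i + 1:i + 1 + p] for i in range(p)]) # shifted Hankel

    U, s, Vt = np.linalg.svd(H)
    Uk, sk, Vk = U[:, :K], s[:K], Vt[:K, :].T                  # truncate to rank K
    S_inv_half = np.diag(1.0 / np.sqrt(sk))

    # System matrix, eq. (34): A = Sigma^-1/2 U^T Hbar V Sigma^-1/2,
    # whose eigenvalues are exp(b_i * dt).
    A = S_inv_half @ Uk.T @ H_shift @ Vk @ S_inv_half
    b = np.sort(np.log(np.linalg.eigvals(A).real) / dt)

    # Coefficients c_i by least squares on y_n = sum_i c_i exp(b_i dt n).
    basis = np.exp(np.outer(dt * np.arange(N), b))
    c = np.linalg.lstsq(basis, y, rcond=None)[0]
    return b, c

# Noiseless test signal: 60 exp(-5 t) + 40 exp(-3 t), dt = 0.01 s.
t = 0.01 * np.arange(100)
y = 60.0 * np.exp(-5.0 * t) + 40.0 * np.exp(-3.0 * t)
b, c = hsvd_fit(y, 0.01, K=2)
```

On noiseless data of exact rank K the truncation is exact and the exponents are recovered to machine precision; with noise, K is chosen by inspecting the singular value gap, as discussed below.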
6.3 Results obtained on simulated data

In the following we have applied the algorithm to identify the coefficients and exponents of a set of simulated signals, which may be written in the theoretical form:

f(t) = \sum_{i=1}^{3} c_i \exp(b_i t) + w(t)    (35)

where w(t) is the noise term, given by w(t) = α·f(0)·(random number). For the noise level α we have chosen three values, namely 0.01, 0.025 and 0.05. Testing the SVD algorithm on simulated data is very important before applying it to real data; in this way one may control both the input data and the results obtained at different stages of running the program. Table 2 presents the results obtained in the case of the singular value decomposition of a sum of two exponentials (in which the coefficients are c_i = 60, 40 and the exponents b_i = -5, -3 respectively).
Table 2. Results obtained on simulated data (sum of two exponentials)

Noise level (α) | Exponent 1 | Std. dev. | Exponent 2 | Std. dev.
0.00            | -5.000     | -         | -3.000     | -
0.01            | -5.0035    | 0.0241    | -3.0025    | 0.0121
0.025           | -5.0056    | 0.0532    | -3.0108    | 0.0337
0.05            | -4.9988    | 0.0635    | -3.0166    | 0.0554

Noise level (α) | Coefficient 1 | Std. dev. | Coefficient 2 | Std. dev.
0.00            | 60.00         | -         | 40.000        | -
0.01            | 59.8288       | 0.8364    | 40.1664       | 0.8389
0.025           | 59.6548       | 1.9036    | 40.1800       | 1.7186
0.05            | 59.5692       | 2.8709    | 40.3044       | 2.9525
Each decomposition was repeated 15 times in the presence of noise in order to determine the errors. We also increased the noise level in order to obtain information about the noise robustness of the algorithm. We may conclude that, in the case of a very good signal/noise ratio, the SVD method discriminates with great accuracy the number of exponentials describing the data. This information is clear from an examination of the singular values, as shown in Fig. 16.
Fig. 16: The singular values obtained from the decomposition of the simulated signal (sum of two exponentials) at different noise levels (α = 0.010, 0.025 and 0.050).

The singular values that characterise the signal are positive, decreasing numbers, which differ by at least one order of magnitude. On the other hand, the singular values that characterise the noise are very similar and therefore easily detected. As the noise increases, one observes an increase in the level of the noise-associated singular values, at the same time as a decrease in the gap between the largest 'noise' singular value and the smallest 'signal' singular value. In the same way, we have applied this method to the decomposition of a signal formed by a sum of three exponentials (in which the coefficients are c_i = 50, 30, 20 and the exponents b_i = -5, -3, -1.5 respectively).
The results are presented in Table 3 for the same noise levels. As the number of exponentials is increased, the noise effect on the obtained results also increases.

Table 3. Results obtained on simulated data (sum of three exponentials)

Noise level (α) | Exponent 1 | Std. dev. | Exponent 2 | Std. dev. | Exponent 3 | Std. dev.
0.00            | -5.000     | -         | -3.000     | -         | -1.5000    | -
0.01            | -5.0116    | 0.0472    | -3.0380    | 0.0734    | -1.5048    | 0.0051
0.025           | -5.0474    | 0.0876    | -3.0362    | 0.3147    | -1.5232    | 0.0240
0.05            | -5.2198    | 0.1832    | -3.3786    | 0.2648    | -1.5394    | 0.0272

Noise level (α) | Coefficient 1 | Std. dev. | Coefficient 2 | Std. dev. | Coefficient 3 | Std. dev.
0.00            | 50.000        | -         | 30.000        | -         | 20.00         | -
0.01            | 49.4002       | 2.0329    | 30.3048       | 1.7339    | 20.293        | 0.2576
0.025           | 48.4978       | 4.2621    | 32.4438       | 3.0850    | 19.5664       | 1.7534
0.05            | 45.4988       | 3.0410    | 32.5455       | 3.1025    | 22.9190       | 3.1580
It is obvious that, in the case of a good signal/noise ratio, the gap between the largest 'noise' singular value and the smallest 'signal' singular value is easily detected.
Fig. 17: The singular values obtained from the decomposition of the simulated signal (sum of three exponentials) at different noise levels (σ = 0.010; 0.025; 0.050).

In order to justify this, Figs. 16 and 17 present the values obtained from the singular value matrix for the different noise levels (σ = 0.01; 0.025; 0.05). In the first two cases we observed a substantial gap between the smallest 'signal' singular value and the largest 'noise' singular value; this gap diminishes as the noise level increases, as does the accuracy of the corresponding reconstructed exponents. As a result, it is shown (Vandewalle & Moor, 1988) that under mild conditions the error in the computed exponents is of the order:
O(n) = σ_(n+1) / (σ_n - σ_(n+1))          (36)
where O(n) is the quotient of the largest 'noise' singular value and the gap between the smallest 'signal' singular value and the largest 'noise' singular value. It is obvious that the less the measurements are corrupted by noise, the greater will be the gap between σ_n and σ_(n+1), and hence the better the accuracy of the estimate. For this reason, easily repeated experiments, such as NMR relaxation experiments where signal accumulation can be performed, make it possible for SVD to provide reliable results even for signals with S/N ratios smaller than those shown in this paper. An important result of this method is the possibility of detecting the number of exponentials present in a signal by examining the values in the singular value matrix and deciding on the matrix rank.

6.4 Results obtained on real data
For a better evaluation of the possibilities offered by SVD exponential fitting, we performed an NMR experiment on a two-compartment system formed of different water samples (labelled with different concentrations of MnCl2), which were measured separately first and then together. The signal was obtained by applying the Carr-Purcell-Meiboom-Gill sequence to the two-compartment system, and 200 points were used for the fitting. We applied the SVD algorithm to the data obtained from the samples placed concentrically in the NMR probe, which also allowed us to compare the results with those of the control samples determined beforehand. The results are presented in Table 4.

Table 4. Results obtained on real data
Sample, as a function of MnCl2 content (mM)   T2A (ms)   T2B (ms)   Amplitude A   Amplitude B
c1 = 22.5, c2 = 2.0                           1.04       11.5       160           90
c1 + c2                                       1.11       10.8       153           83
c1 = 15.0, c2 = 5.0                           1.6        4.7        250           100
c1 + c2                                       1.8        5.0        220           87
c1 = 15.0, c2 = 10.0                          1.6        2.57       250           90
c1 + c2                                       1.44       2.81       225           83
We applied the same method in a study concerning the possibilities of using NMR for the authentication of edible oils (EC project COST 901). The experimental results showed that for all 42 investigated samples the relaxation functions (both spin-spin and spin-lattice) cannot be described in terms of a single decaying exponential, and therefore one cannot assign a single relaxation time to this class of compounds. In order to decide the mathematical structure of the relaxation functions (e.g. superposition of several decaying
exponentials or a continuous distribution of exponentials), we have used the State Space-SVD method. Using this procedure one may predict in advance the number of exponentials involved in the relaxation process (Brown, 1989). Applying the SVD-based procedure, it turned out that for all oil samples the relaxation functions can be described by a sum of two decaying exponentials. This is illustrated in Figs. 18 and 19.
Fig. 18: Tile singular value decomposition analysis (SVD) ofT 2 relaxation data in sunflower oils. |
Fig. 19: The experimental spin-spin relaxation data in sunflower oils. The full line represents the SVD fit with a sum of two decaying exponentials.

7 CONCLUSION

We consider that the results are in good agreement with the expected values, and these tests enable us to apply the method described in this paper to T2 determinations in unknown biological systems in which prior knowledge of the constituents is improbable. The advantages of this algorithm over others are:
• the algorithm provides a clear answer to the question of whether the quality of the data justifies fitting by a sum of exponentials;
• no need to know in advance the number of exponentials that characterise the data;
• no need to provide a suitable starting point for iterations;
• less chance of breakdown if the exponents are close;
• shorter computation time.
8 REFERENCES

Barkhuijsen, H., Beer, R. de, Bovee, W.M.M.J., Ormondt, D. van, 1985. Retrieval of Frequencies, Amplitudes, Damping Factors and Phases from Time-Domain Signals Using a Linear Least-Squares Procedure. J. Magnetic Resonance, 61, 465-481
Barkhuijsen, H., Beer, R. de, Bovee, W.M.M.J., Ormondt, D. van, 1985. Aspects of the Computational Efficiency of LPSVD. J. Magnetic Resonance, 64, 343-346
Barkhuijsen, H., Beer, R. de, Bovee, W.M.M.J., Ormondt, D. van, 1986. Error Theory for Time-Domain Signal Analysis with Linear Prediction and Singular Value Decomposition. J. Magnetic Resonance, 67, 371-375
Beer, R. de, Ormondt, D. van, 1991. Analysis of NMR Data Using Time Domain Fitting Procedures. In: M. Rudin and J. Seelig (Editors), In Vivo Magnetic Resonance Spectroscopy, Springer.
Brown, R.J.S., 1989. Information Available and Unavailable from Multiexponential Relaxation Data. J. Magnetic Resonance, 82, 539-561
Delsuc, M.A., Ni, F., Levy, G.C., 1987. Improvement of Linear Prediction Processing of NMR Spectra Having Very Low Signal-to-Noise. J. Magnetic Resonance, 73, 548-552
Dologlou, I., Carayannis, G., 1991. Physical Interpretation of Signal Reconstruction from Reduced Rank Matrices. IEEE Transactions on Signal Processing, 39(7), 1681-1683
Dooren, P.M. van, 1991. Structured Linear Algebra Problems in Digital Signal Processing. In: G.H. Golub, P. Van Dooren (Editors), Numerical Linear Algebra, Digital Signal Processing and Parallel Algorithms. Springer-Verlag, Berlin/Heidelberg.
Ferrari, A., Alengrin, G., Pitarque, T., 1992. Improvement of a State-Space Iterative Noise Reduction Algorithm for Harmonic Retrieval. IEEE Transactions on Signal Processing, 40(5), 1263-1266
Gersho, A., Gray, R.M., 1992. Linear Prediction. In: Vector Quantisation and Signal Compression, Kluwer Academic Publishers, Boston/Dordrecht/London.
Golub, G.H. and Reinsch, C., 1970. Singular Value Decomposition and Least Squares Solutions. Numer. Math., 14, 403-420
Golub, G.H., Van Loan, C.F., 1989. Matrix Computations, The Johns Hopkins University Press, Baltimore and London.
Haimi-Cohen, R., Cohen, A., 1987. Gradient Type Algorithms for Partial Singular Value Decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(1), 137-142
Ho, B.L. and Kalman, R.E., 1965. Effective Construction of Linear State Variable Models from I/O Data. Proc. 3rd Allerton Conf., 449-459
Kot, A.C., Tufts, D.W., Vaccaro, R.J., 1993. Analysis of Linear Prediction by Matrix Approximation. IEEE Transactions on Signal Processing, 41(11), 3175-3177
Kumaresan, R., Tufts, D.W., 1982. Estimating the Parameters of Exponentially Damped Sinusoids and Pole-Zero Modelling in Noise. IEEE Trans. on ASSP, 30(6), 833-840
Makhoul, J., 1975. Linear Prediction: A Tutorial Review. Proceedings of the IEEE, 63(4), 561-580
Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T., 1988. Numerical Recipes: The Art of Scientific Computing, Cambridge University Press.
Smith, B.T. et al., 1976. Matrix Eigensystem Routines - EISPACK Guide, 2nd Edition, vol. 6 of Lecture Notes in Computer Science, New York: Springer-Verlag.
Umapathi, R.V., Biradar, L.S., 1993. SVD-Based Information Theoretical Criteria for Detection of the Number of Damped/Undamped Sinusoids and Their Performance Analysis. IEEE Transactions on Signal Processing, 41(9), 2872-2881
Vandewalle, J., Moor, B. de, 1988. A Variety of Applications of Singular Value Decomposition in Identification and Signal Processing. In: E.F. Deprettere (Editor), SVD and Signal Processing: Algorithms, Applications and Architectures, Elsevier Science Publishers B.V.
Zeiger, H.P. and McEwen, A.J., 1974. Approximate Linear Realisations of Given Dimensions via Ho's Algorithm. IEEE Transactions on Automatic Control, 19, 153-
Signal Treatment and Signal Analysis in NMR
Ed. by D.N. Rutledge
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 9

A WINDOWS PROGRAM FOR RELAXATION PARAMETER ESTIMATION

Douglas N. Rutledge
Laboratoire de Chimie Analytique
Institut National Agronomique
Paris, FRANCE

1 INTRODUCTION
Time Domain-Nuclear Magnetic Resonance is a rapid instrumental technique widely used in the agro-food, pharmaceutical and chemical industries for quality control. NMR parameters such as the longitudinal relaxation time (T1) and the transverse relaxation time (T2) may be correlated with quality parameters such as solid content, moisture content, water activity, degree of unsaturation of oils or molecular diffusion rates (Rutledge, 1990, 1995). For a transverse relaxation curve, the parameters to be estimated are the Initial Amplitudes (M0i), the T2i and possibly the Baseline offset (B); for a longitudinal relaxation curve, the parameters are the M0i, the T1i and the Pulse Phase (P). The equations for these relaxation processes are as follows :

I(t) = Σ(i=1..n) M0i · exp(-t / T2i) + B          (1)

I(t) = Σ(i=1..n) M0i · (1 - P · exp(-t / T1i))          (2)
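Equations (1) and (2) translate directly into code. The sketch below is an illustrative Python version (the original program is in Visual BASIC; function and parameter names here are mine):

```python
import math

def t2_curve(t, amps, t2s, baseline=0.0):
    """Transverse relaxation, eq. (1): I(t) = sum_i M0i * exp(-t/T2i) + B."""
    return sum(m * math.exp(-t / t2) for m, t2 in zip(amps, t2s)) + baseline

def t1_curve(t, amps, t1s, phase=1.0):
    """Longitudinal relaxation, eq. (2): I(t) = sum_i M0i * (1 - P * exp(-t/T1i))."""
    return sum(m * (1.0 - phase * math.exp(-t / t1)) for m, t1 in zip(amps, t1s))

# A biexponential T2 curve starts at the sum of the initial amplitudes.
print(t2_curve(0.0, [5.0, 4.0], [100.0, 80.0]))   # 9.0
```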
A program based on the Marquardt method was developed in order to estimate these NMR relaxation parameters from the Time-Domain signals. The first part of this chapter will give a succinct presentation of the theoretical basis of the Marquardt method. Details will then be given on the structure and characteristics of the program itself. The implementation presented here is derived, with modifications,
from procedures published in "Numerical Recipes in C. The Art of Scientific Computing" (Press et al., 1988). For a more complete explanation of non-linear regression in general, see also the chapter by R.S. Cármenes. The program "MARQT" is written in Microsoft Visual BASIC 3.0 Professional Version in order to take advantage of the Windows graphical user interface environment and its memory management possibilities. It is supplied with an accompanying Help file.
2 MARQUARDT METHOD OF NONLINEAR PARAMETER ESTIMATION In non-linear regressions and parameter estimation procedures, it is usually necessary to iteratively vary all parameters simultaneously, adjusting the increments so that the movement in the multidimensional parameter space leads to a decrease in the optimisation criterion. The optimisation criterion is a function of the dependent variable and the value of the dependent variable is a function of the parameters. Therefore, instead of blindly searching the parameter space, one can create a linear model for the variation of the optimisation criterion as a function of the parameters. Creating such a model and using it to predict the optimal set of parameter values accelerates the minimisation as fewer calculations are required. The procedure consists in first performing a Taylor series expansion to give an approximation of the criterion at a given point (the Gauss-Newton method). For a function that depends on only one variable the Taylor series expansion can be written as :
f(x + Δx) = f(x) + (df/dx)·Δx + (d²f/dx²)·(Δx²/2) + ...          (3)
In the case of an optimisation criterion C = f[a1...an] which depends on several variables or parameters, we have :

C(a + Δa) = C(a) + Σ(i=1..n) (∂C/∂ai)·Δai + (1/2)·Σ(i=1..n) Σ(j=1..n) (∂²C/∂ai∂aj)·Δai·Δaj + ...          (4)

where:
a = [a1 ... an] :      a vector of parameters to be estimated
Δa = [Δa1 ... Δan] :   a vector of variations of the parameters

Using matrix notation, this can be rewritten as :

C(a + Δa) = C(a) + gᵀ·Δa + (1/2)·Δaᵀ·H·Δa + ...          (5)
where:

g = [∂C/∂a1 ... ∂C/∂an]ᵀ :   vector of partial derivatives (or "gradient") of C ;

H, with elements Hij = ∂²C/∂ai∂aj :   matrix of second partial derivatives ("Hessian matrix") of C ;
The objective is then to estimate the parameter increments Δa such that the optimisation function is minimised :

C(a + Δa) = C(amin) = Cmin          (6)
When far from the minimum, where the model is not well adapted, one can simplify the relation by using :

C(a + Δa) ≈ C(a) + gᵀ·Δa          (7)

The best way to proceed is then simply along the direction of steepest descent in the parameter space :

Δa = -(1/λ)·g          (8)

where the value of the multiplicative factor 1/λ which should lead to the optimum can be calculated from :

∂C(a - (1/λ)·g) / ∂λ = 0          (9)

On the other hand, when close to the optimum, it is more efficient to use the extra information contained in the Hessian component of the Taylor expansion. Since at the optimum the variation ΔC(a) = C(a + Δa) - C(a) must vanish, one can simplify and set :

gᵀ·Δa + Δaᵀ·H·Δa = 0          (10)

whence :

Δa = -H⁻¹·g          (11)
Most optimisation criteria, such as the Chi², are functions of the differences between the observed and predicted values at the current position a in parameter space :

C(a) = χ²(a) = Σ(p=1..N) [yp - ŷp(a)]²          (12)
Therefore, by setting

β = (1/2)·dC(a)/da = (1/2)·d(χ²(a))/da = (1/2)·d(χ²(a))/d(y - ŷ(a)) · d(y - ŷ(a))/da          (13)

we can derive :

β = -(y - ŷ(a)) · dŷ(a)/da          (14)

that is :

βk = -Σ(p=1..N) [yp - ŷp(a)] · ∂ŷp(a)/∂ak          (15)

and from

α = (1/2)·d²C(a)/da² = (1/2)·d²(χ²(a))/da²          (16)

we can derive :

α = dŷ(a)/da · dŷ(a)/da - d²ŷ(a)/da² · (y - ŷ(a))          (17)

that is :

αkj = Σ(p=1..N) [ ∂ŷp(a)/∂ak · ∂ŷp(a)/∂aj - ∂²ŷp(a)/∂ak∂aj · (yp - ŷp(a)) ]          (18)
To accelerate the calculations, the above formula can be simplified by leaving out the second term, which would in any case become negligible as the optimum is approached, giving :

αkj = Σ(p=1..N) ∂ŷp(a)/∂ak · ∂ŷp(a)/∂aj          (19)
Calculations carried out on simulated data demonstrated that this simplification does not significantly influence the results of the decomposition. Therefore, far from the optimum one should use :

λ·Δa = -β          (20)

while close to the optimum it is preferable to use :

α·Δa = -β          (21)
The Marquardt method (Marquardt, 1963) combines the two procedures by defining :

α'jj = αjj·(1 + λ)          (22)

α'jk = αjk   (j ≠ k)          (23)

and using :

α'·Δa = -β          (24)
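A single update of equations (22)-(24) fits in a few lines. The following is an illustrative Python/NumPy sketch (not the program's Visual BASIC implementation; the function name and the example matrix are mine):

```python
import numpy as np

def marquardt_step(alpha, beta, lam):
    """Solve alpha' . delta_a = -beta, where alpha' is alpha with its
    diagonal inflated by (1 + lambda), as in equations (22)-(24)."""
    alpha_prime = alpha + lam * np.diag(np.diag(alpha))
    return np.linalg.solve(alpha_prime, -beta)

alpha = np.array([[4.0, 1.0], [1.0, 3.0]])
beta = np.array([1.0, -2.0])
# A tiny lambda gives essentially the inverse-Hessian step of equation (21);
# a huge lambda shrinks the step towards scaled steepest descent, equation (20).
small = marquardt_step(alpha, beta, 1e-6)
large = marquardt_step(alpha, beta, 1e6)
print(small, large)
```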
The expansion factor (λ) is initialised at 0.001 and then adjusted as the iterations proceed so that increments are either in the direction of steepest descent or based on the inverse Hessian method. The set of simultaneous linear equations (equation 24) is solved for Δa by Gauss-Jordan elimination. To illustrate the evolution of these different values during the iterations, a noisy monoexponential T2 curve was generated and decomposed, starting far from the initial position found by linearisation. As can be seen in Figure 1, the value of the optimisation criterion initially decreases rapidly and so λ (Lambda) also decreases, to favour the inverse Hessian model. If the direction of decreasing slope had not been found, Lambda would have increased. In the implementation presented here, once the optimisation criterion has stabilised for 3 iterations, Lambda is reset at its initial value of 0.001 and the decomposition procedure reinitiated using the current parameter values as the starting position. This reinitialisation, which produces the cyclic increases in Lambda seen in Figure 1, is repeated 3 times unless a new optimum is found.
Fig. 1 : Evolution of the optimisation criterion, Chi² (- - -) and Lambda (solid line).

Many optimisation criteria are possible (Droblos, 1987). The following functions are available in the program :
• Euclid² :          Σ(i=1..N) (yi,observed - yi,model)² / (N - 2m - 1)

• Minkowski :        [Σ(i=1..N) |yi,observed - yi,model|³]^(1/3) / (N - 2m - 1)

• Canberra :         Σ(i=1..N) |yi,observed - yi,model| / |yi,observed + yi,model| / (N - 2m - 1)

• City block :       Σ(i=1..N) |yi,observed - yi,model| / (N - 2m - 1)

• Chebychev :        (2m + 1) · max(|yi,observed - yi,model|)

• Correlation :      1 - |Σ(i=1..N) (yi,model - ȳmodel)·(yi,observed - ȳobserved)| / [Σ(i=1..N) (yi,model - ȳmodel)² · Σ(i=1..N) (yi,observed - ȳobserved)²]^(1/2)

• Autocorrelation :  |Σ(i=1..N) (yi,observed - yi,model)·(yi+p,observed - yi+p,model)| / (N - 2m - 1)
where :
N = number of data points
m = number of exponentials
p = autocorrelation step

The functions most commonly used as optimisation criteria in similar parameter estimation programs are the Chi² and the Euclidian distance between the observed and model values (both are comparable to Euclid²). However, the Autocorrelation function is often preferable as it favours models which give more randomly distributed residues (differences between observed and model values). This can be seen in Table 1, where the results of the decomposition of a biexponential T2 curve are presented. It is clear that when noise is present in the signal, the results using Euclid² as the optimisation criterion are very different from those obtained in the absence of noise.

Table 1 : Comparison of Euclid² and Autocorrelation as optimisation criteria
                 Euclid²                                  Autocorrelation (step 1)
Noise            0%                   5%                   0%                   5%
                 T2    Amp    Crit.   T2    Amp    Crit.   T2    Amp    Crit.   T2    Amp    Crit.
Initial          100   4.0    15352   100   4.0    16912   100   4.0    8927    100   4.0    8960
                 100   4.0            100   4.0            100   4.0            100   4.0
Final            81    4.5    0.0062  67    1.3    1140    81    4.4    0.0018  81    4.5    18
                 101   4.5            95    7.7            101   4.6            101   4.5
The simulation parameters were : T2(1) = 100, Amp(1) = 5.0; T2(2) = 80, Amp(2) = 4.0; 200 points and Gaussian noise.

The following code presents the calculations of the optimisation criteria.

Select Case critere%
Case 0 ' Euclid²
    For i% = 1 To data_nmb%
        Crit! = Crit! + dY!(i%) * dY!(i%)
    Next i%
Case 1 ' Minkowski
    For i% = 1 To data_nmb%
        Crit! = Crit! + Abs(dY!(i%) * dY!(i%) * dY!(i%))
    Next i%
    Crit! = Crit! ^ (1 / 3)
Case 2 ' Canberra
    For i% = 1 To data_nmb%
        Ymod! = Y!(i%) - dY!(i%)
        sumY! = Y!(i%) + Ymod!
        Crit! = Crit! + Abs(dY!(i%)) / Abs(sumY!)
    Next i%
Case 3 ' City block
    For i% = 1 To data_nmb%
        Crit! = Crit! + Abs(dY!(i%))
    Next i%
Case 4 ' Chebychev
    For i% = 1 To data_nmb%
        Temp = Abs(dY!(i%))
        If Temp > Crit! Then Crit! = Temp
    Next i%
Case 5 ' Autocorrelation
    b1 = 0
    For h% = 1 To data_nmb%
        g = h% + AutoCorrStep%
        If g > data_nmb% Then g = g - (data_nmb% + 1)
        b1 = b1 + dY!(g) * dY!(h%)
    Next h%
    Crit! = Abs(b1)
Case 6 ' Correlation
    TheoMoy! = 0
    ObsMoy! = 0
    For i% = 1 To data_nmb%
        Ymod! = Y!(i%) - dY!(i%)
        TheoMoy! = TheoMoy! + Ymod!
        ObsMoy! = ObsMoy! + Y!(i%)
    Next i%
    TheoMoy! = TheoMoy! / data_nmb%
    ObsMoy! = ObsMoy! / data_nmb%
    SumDifTheo2! = 0
    SumDifObs2! = 0
    SumProd! = 0
    For i% = 1 To data_nmb%
        Ymod! = Y!(i%) - dY!(i%)
        SumProd! = SumProd! + (Ymod! - TheoMoy!) * (Y!(i%) - ObsMoy!)
        SumDifObs2! = SumDifObs2! + (Y!(i%) - ObsMoy!) * (Y!(i%) - ObsMoy!)
        SumDifTheo2! = SumDifTheo2! + (Ymod! - TheoMoy!) * (Ymod! - TheoMoy!)
    Next i%
    Temp = Sqr(SumDifTheo2! * SumDifObs2!)
    Crit! = 1 - Abs(SumProd!) / Temp
End Select

If (critere% <> 4) And (critere% <> 6) Then
    df_less! = 1 / (data_nmb% - 2 * mfit% - 1)
ElseIf (critere% = 4) Then
    df_less! = 2 * mfit% + 1
ElseIf (critere% = 6) Then
    df_less! = 1
End If
Crit! = Crit! * df_less!

where :
dY!(i%) = difference between observed and model at point i%
mfit% = number of exponentials
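The Euclid² and Autocorrelation cases of the listing above can be written compactly outside BASIC. This is an illustrative Python/NumPy sketch (function names are mine), including the division by the number of degrees of freedom; the lag wraps around the end of the residual vector, as in the BASIC code.

```python
import numpy as np

def euclid2(residuals, m):
    """Euclid² criterion: sum of squared residuals over the degrees of freedom."""
    n = len(residuals)
    return np.sum(residuals ** 2) / (n - 2 * m - 1)

def autocorrelation(residuals, m, step=1):
    """Autocorrelation criterion: |sum of lagged residual products| over
    the degrees of freedom."""
    n = len(residuals)
    lagged = np.roll(residuals, -step)   # residual at point i + step, wrapping
    return abs(np.sum(residuals * lagged)) / (n - 2 * m - 1)

rng = np.random.default_rng(1)
white = rng.normal(0.0, 1.0, 200)      # structureless residuals: a good fit
trend = np.linspace(-1.0, 1.0, 200)    # correlated residuals: a poor model
print(autocorrelation(white, 2), autocorrelation(trend, 2))
```

The correlated residuals score much worse, which is why this criterion favours models with randomly distributed residues.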
The gradient, or first partial derivatives or slopes, -Beta(1) and -Beta(2), in the direction of the two parameters, Initial Amplitude and T2, progressively approach zero (Figure 2). If there were no noise and the model were correctly chosen, these values would finally reach zero. The second partial derivatives also evolve during the iterations. The curvature for the T2 (Alpha(1,1)) increases near the optimum. On the other hand, the curvature for the Initial Amplitude (Alpha(2,2)) decreases, although it started at a much lower level (Figure 3). This shows that, in this case, the region of the optimum is a long narrow valley with steep walls in the direction of the T2.
Fig. 2 : Evolution of the gradient, -Beta(1) (- - -) and -Beta(2) (solid line).
Fig. 3 : Evolution of Alpha(1,1) (- - -), Alpha(1,2) and Alpha(2,1) (solid line), Alpha(2,2) (+ +).

The following code shows the calculations to estimate the intensity at each point as well as the slope for a sum of mfit% decreasing exponentials (T2). At each point (X!) on the relaxation curve, the theoretical value of the dependent variable (Ymod!) and its derivatives with respect to the parameters (dYdA()) are calculated.

Ymod! = 0
For i% = 1 To mfit% - 1 Step 2
    arg = X! / A_val!(i%)
    ex = Exp(-arg)
    fac = A_val!(i% + 1) * ex
    Ymod! = Ymod! + fac
    dYdA!(i% + 1) = ex
    dYdA!(i%) = fac * arg / A_val!(i%)
Next i%
If Baseline% = True Then
    Ymod! = Ymod! + A_val!(0)
    dYdA!(0) = 1
Else
    dYdA!(0) = 0
End If

where :
A_val!(0) = Baseline offset
A_val!(i%) = T2 Relaxation Time of component i%
A_val!(i% + 1) = Initial Amplitude of component i%
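The same calculation in an illustrative Python form (keeping the parameter layout of the BASIC code: baseline in slot 0, then T2/amplitude pairs; the function name is mine):

```python
import numpy as np

def model_and_derivs(x, a_val, baseline=True):
    """Model value and d(model)/d(parameter) at one point x.

    a_val[0] is the baseline offset; a_val[i], a_val[i+1] (i = 1, 3, ...)
    are the T2 and initial amplitude of one component.
    """
    y = 0.0
    dyda = np.zeros(len(a_val))
    for i in range(1, len(a_val) - 1, 2):
        arg = x / a_val[i]
        ex = np.exp(-arg)
        fac = a_val[i + 1] * ex
        y += fac
        dyda[i + 1] = ex                  # derivative w.r.t. the amplitude
        dyda[i] = fac * arg / a_val[i]    # derivative w.r.t. the T2
    if baseline:
        y += a_val[0]
        dyda[0] = 1.0
    return y, dyda

y, d = model_and_derivs(0.0, [0.5, 100.0, 5.0])
print(y, d)   # 5.5 and derivatives [1.0, 0.0, 1.0]
```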
Now that the calculations involved in the Marquardt method of nonlinear parameter estimation have been outlined, the rest of this chapter will be devoted to a presentation of the various functionalities of the Windows-based program "MARQT" that we have developed to decompose NMR relaxation curves into a sum of up to four exponentials. These functionalities include not only the possibility to select options in order to adapt the decomposition process to the particular signal being treated, but also to import and export different file formats and apply a range of mathematical transformations to the data.
3 FILE MENU

The available file input-output operations are indicated in Figure 4.
Open ...
Merge ...
Save ...
Save Initial Values ...
Read Initial Values ...
Options ...
Exit

Fig. 4 : File menu options.
3.1 File Open
Data can be read from a Bruker Minispec data file. In this case the default extensions are .T2D or .T1D. ASCII data files may also be read. The default extension is then .TXT.

Table 2 : Example of a Bruker Minispec *.T2D data file.

T2
1994/8/19
100
5      6.666717
10     6.242849
...
490    .1382047
495    .1139484
500    .1499974
When importing data from a text file, a new window opens showing the first lines of data in the file. An input box also appears, indicating the number of columns and lines of data found in the file and asking the user to validate these values, to indicate whether the
first line and first column contain titles, and whether the values should be considered as T1 or T2 data.

3.2 File Merge
Data can be read from a Bruker Minispec *.T2D or *.T1D data file (Table 2). ASCII data files (*.TXT) may also be read, if all lines contain the same number of columns. The data is appended to the end of the current data in memory.

3.3 File Save
Data can be written to a Bruker Minispec *.T2D or *.T1D data file, or exported as an ASCII data file (*.TXT).
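A reader for the *.T2D layout of Table 2 might look as follows. This is a Python sketch, not Bruker's specification: it assumes two header lines (type and date), a point count, and then one time/intensity pair per line, which is how Table 2 reads.

```python
import io

def read_t2d(stream):
    """Parse a Minispec-style text stream: type line, date line, point
    count, then one 'time intensity' pair per line."""
    kind = stream.readline().strip()    # e.g. "T2"
    date = stream.readline().strip()    # e.g. "1994/8/19"
    n = int(stream.readline())          # number of points (assumption)
    points = []
    for _ in range(n):
        t, y = stream.readline().split()
        points.append((float(t), float(y)))
    return kind, date, points

sample = "T2\n1994/8/19\n2\n5 6.666717\n10 6.242849\n"
kind, date, pts = read_t2d(io.StringIO(sample))
print(kind, pts)
```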
3.4 Save Initial Values and Read Initial Values
When treating a series of relaxation curves which change in a regular way, e.g. as a function of temperature or of water content, it may be useful to take the final values of one decomposition as the starting values of another. The Save Initial Values option allows one to save the final values of a decomposition to an Initial Values File, while Read Initial Values can be used to select an Initial Values File to be read.

Table 3 : Example of an Initial Values file.

[Parameters]
Type=T2
Baseline=107.159
Phase=0
Number=2
Time1=96.6629
Time2=41.62804
Time3=0
Time4=0
Amp1=5554.087
Amp2=1569.476
Amp3=0
Amp4=0

The values for the amplitudes and the baseline are multiplied by 1000. If an *.INI file has not already been selected, you will be prompted for a name.
3.5 File Options
This option allows the user to activate :
• automatic save of current estimates to an initial values file (*.INI).
• automatic read of starting estimates from an initial values file (*.INI).

3.6 Program Exit
Exit the Marquardt decomposition program.
4 ACQUISITION MENU
Fig. 5 : Data acquisition menu.

This module is designed for use with the RS232C data transfer cards found in Bruker Minispec instruments. The data acquisition routine is compatible with the data packets transmitted by both the uni-directional and the bi-directional RS232C cards.

4.1 Acquisition Go
Start serial data acquisition via RS232C with the current interface settings. Data acquisition can be performed via the COM1 or COM2 serial interface ports. Transfer rates of between 300 baud and 19200 baud may be used. The data decoding module assumes that there is first a header of two Carriage Return terminated lines of information (Date and Status). The Status line contains an indicator of the type of data, T1 or T2.

4.2 Acquisition Stop
Clicking on this menu option will interrupt the data acquisition. It may take a little while for the program to react and stop. The data acquired just before the interruption is lost!
4.3 Acquisition Options
The following parameters may be selected using the option panel : the COM port (COM1 or COM2), the transfer rate (300 to 19200 baud), the number of stop bits, the parity and the number of data bits (4 to 8).
Fig. 6 : Selection ofRS232C data acquisition options. These serial data acquisition options should be set in accordance with the parameters of the RS232C card in the NMR instrument.
5 SIMULATE MENU
Fig. 7 : Data simulation menu.

5.1 Simulate Go
Generate multiexponential T1 or T2 relaxation data, or decaying sinusoids. The data is automatically plotted after simulation. A new set of data, with the same parameters but new noise, is generated each time one clicks on Simulate Go.

5.2 Simulate Options
Select values for the generation of multiexponential T1, T2 or decaying sinusoid data with :
• up to 4 exponentials in variable proportions
• constant (for T2) or progressive (for T1) pulse spacing
• variable levels of Uniform, Gaussian or Flicker noise

The intensities are entered in Volts and converted to milliVolts. If the most intense point has an intensity of less than 5 Volts, the curve is adjusted to bring it up to this level. The Relaxation Times and Interpulse Delays are entered in milliseconds. In the current version, Inversion-Recovery T1 curves cannot be simulated, only Progressive-Saturation curves. The noise level is entered as a percentage of the intensity of the first point in the T2 curves and of the last point in the T1 curves. The baseline for T2 curves is also calculated as a percentage of the first point. For the sinusoids, frequencies are entered in Hertz. If the number of points selected exceeds the dimensions of the current data vectors and matrices, these are all automatically redimensioned.
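As a rough counterpart to Simulate Go, the sketch below generates a noisy multiexponential T2 curve with the noise expressed as a percentage of the first point (illustrative Python only; the function name, sampling interval and defaults are assumptions, not the program's code):

```python
import numpy as np

def simulate_t2(t2s, amps, n_points=200, delta=0.5, noise_pct=1.0, seed=0):
    """Multiexponential T2 decay with Gaussian noise whose standard
    deviation is a percentage of the intensity of the first point."""
    t = np.arange(n_points) * delta
    clean = sum(a * np.exp(-t / t2) for a, t2 in zip(amps, t2s))
    sigma = noise_pct / 100.0 * clean[0]
    rng = np.random.default_rng(seed)
    return t, clean + rng.normal(0.0, sigma, n_points)

t, y = simulate_t2([100.0, 80.0], [5.0, 4.0], noise_pct=0.0)
print(y[0])   # 9.0 at t = 0 when no noise is added
```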
Vedluos
M_odol
I
|
I
02
O 12
0 3
O sPc
04 N_oi~
Point=
I 1000 ,,I
Gauzsian
DoRa t
I0"50
0 Unifom 0 FIk.-km
Expansi~ 11"1)5 Facto,
OK
I
Noise level (Z) [ 0
Fig. 8 : Data simulation options.
6 CALCULATE MENU

Multiexponential decomposition of T1 and T2 data by non-linear regression using the Marquardt algorithm. The program estimates the different relaxation times, the initial magnetisations and, in the case of T2s, the baseline offset. By default, the number of exponentials is one, but this number will automatically vary to attain the optimal adjustment, and may be set at a higher value by the user.

Calculations start with the current initial values when the user clicks on Calculate Go. These values may be user-selected, obtained by linearisation of the relaxation curve, or read from an Initial Values File. The default values are those obtained by linearisation. The program then proceeds to iteratively vary these estimates in order to minimise the current optimisation criterion. The default optimisation criterion is the autocorrelation of the differences between observed and model values at different points on the relaxation curve, corrected for the number of degrees of freedom.

The iterations usually stop when the optimisation criterion reaches the current stopping value. The default stopping value for all optimisation criteria is equal to the number of data points in the relaxation curve. If the stopping value is not attained, the iterations will stop when both the optimisation criterion and the estimates no longer vary significantly. Before stopping, however, the program will test whether increasing or decreasing the number of exponentials improves the optimisation criterion. The user may interrupt the iterations by clicking on Calculate Stop. The calculations may be started again from a slightly different position in the multidimensional space of the parameters by clicking on Calculate Reiterate, while Jackknife estimates may be obtained by clicking on Calculate JackKnife. All default choices may be changed using Calculate Options.
Go
Reiterate
Stop
Jack-knife
Save ...
Optimisation Criterion ...
Starting Values ...
Stopping Criterion ...

Fig. 9 : Calculate menu.
6.1 Calculate Go
Start non-linear parameter estimation using the current initial values for the estimates as the starting point in the multidimensional space of the parameters.
When a new set of data is acquired or read from file, the initial values are calculated by linearisation. However, if the user reads an Initial Values File, these default values will be replaced by those contained in the file. Before starting the iterations, it is also possible, by using Calculate Options Starting Values, to change the initial values calculated by linearisation or read from an Initial Values File.

6.2 Calculate Reiterate
The Reiterate option allows one to test whether slightly different starting points lead to significantly different final values. This can often be the case with iterative procedures such as Marquardt which explore the topography of the optimisation criterion in the multidimensional space of the parameters. One sometimes encounters and is stopped by a local minimum, whereas, with another set of initial values, one would have evolved towards a different minimum giving another set of parameter estimates. The non-linear parameter estimation procedure is reinitiated using the calculated final values +/- twice the estimated errors on these values.
6.3 Calculate Stop Imerrupt calculations before the stopping value has been reached. The optimum is attained when the optimisation criterion is less than the selected stopping value, ff at the end of the iteration loop the optimisation criterion and the parameter values have stabilised, the counter is incrememed, otherwise it is reset to zero. The iterations cominue umil either the coumer equals 2, or the user imerrupts the calculations. The system is stabilised if the maximal relative variation of a parameter estimation between two iterations is less than 0.001 and the optimisation criterion varies by less than 0.1 .ff an optimum has not been reached,, the number of exponemials is automatically increased or decreased up to 3 times in order to try to improve the model. ff the iterations seem to evolve in a ridiculous direction (excessively large or small values), it is possible to stop the program and change the values with Calculate Options Starting Values and then start over again with Calculate Go. Because of the time taken by each calculation step, the imerruption may not take effect immediately. The estimates corresponding to the lowest criterion value encoumered are conserved as the final values. 6.4 Calculate JackKnife The results found by Calculate Go may be used to repeat parameter estimations using the Jackknife method (Efron, 1982). One poim at a time is taken from the data set and the estimation performed using the other poims. To increase the variability, a new starting poim in the multidimensional space of the parameters is used at each repetition. The default method for choosing the starting poim for the estimations is by linearisation based on the remaining poims. Calculate Options JackKnife allows one to choose between this global method and a more local method where the starting poim is displaced by twice the currem estimated errors. 
At the end of the jackknife procedure, or if the procedure is interrupted by the user, the individual values are used to calculate a Jackknife estimate of the mean and of the standard deviation for each parameter. The Plot JackKnife option then becomes active.
This option allows one to visualise the dispersion and correlation of the parameter estimates.

6.5 Calculate Save
Save results in a text file (default extension .RES). The results may be appended to an existing file. This is useful when one wishes to compare a series of relaxation curves. Each line in the file contains the File Name followed by the Phase (for T1 files) or Baseline (for T2 files) and then the Relaxation Times (T1 or T2) and Initial Amplitudes for each component found by the multiexponential decomposition. The components are sorted in order of increasing Amplitude.

Table 4 : Result (*.RES) file containing results of 5 decompositions.

Filename           Baseline    T2         Amplitude   T2         Amplitude
C:\100 50G1.T2D    1.076857    44.12901   1.366325    94.73029   5.634373
C:\100 50G2.T2D    0.9925858   40.68137   1.356073    95.93311   5.6666
C:\100 50U2.T2D    1.158471    29.4194    0.8017524   90.56707   6.313432
C:\100 50F1.T2D    1.071686    41.62452   1.569245    96.66005   5.554363
C:\100 50F2.T2D    1.07159     41.62804   1.569476    96.6629    5.554087
The results may be saved automatically at the end of each calculation by choosing Calculate Options Auto Save. If a *.RES file has not yet been selected, the user is prompted for a file name.
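The one-line-per-decomposition layout of the *.RES file (section 6.5, Table 4) can be sketched as follows. This is an illustrative Python fragment, not the program's own BASIC; the tab separator and `%g` number formatting are assumptions, since the text does not specify them:

```python
def append_res_line(path, filename, baseline, components):
    """Append one decomposition to a *.RES-style text file.

    Each line holds the file name, the baseline (or phase for T1 files),
    then a (relaxation time, initial amplitude) pair per component,
    sorted in order of increasing amplitude as in Table 4."""
    parts = [filename, "%g" % baseline]
    for T, amp in sorted(components, key=lambda c: c[1]):
        parts += ["%g" % T, "%g" % amp]
    with open(path, "a") as f:          # append, so curves can be compared
        f.write("\t".join(parts) + "\n")
```

Appending is what makes the comparison of a series of relaxation curves convenient: each call adds one row to the same table.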
6.6 Calculate Options
To a certain extent the user may configure the program to adapt it to the type of relaxation data being analysed. The following options are available :
- select optimisation criterion
- set starting values or initial estimates
- change stopping value for optimisation criterion
- select between global or local Jackknifing
- automatically write final estimates to file.

6.6.1 Calculate Options Starting Values
Select initial values to start non-linear parameter estimation. These initial values are obtained by linearisation of the curve or by reading the values from the *.INI file. The values may be changed by the user if he wishes to try other starting points in the multidimensional space of the parameters. To calculate the Initial Values for a T2 relaxation curve, the signal is segmented and the segments linearised. The slopes and the extrapolated intersections are determined as shown in the code below :

decalage = data_nmb% / 8
A_val!(2) = 0
For i% = 1 To num_mfit_max% Step 2
    TempT1 = t!(data_nmb% - (i% + 1) * decalage + 1)
    TempT2 = t!(data_nmb% - i% * decalage)
    TempAmp1 = Abs(amp!(data_nmb% - (i% + 1) * decalage + 1))
    TempAmp2 = Abs(amp!(data_nmb% - i% * decalage))
    slope = (Log(TempAmp2) - Log(TempAmp1)) / (TempT2 - TempT1)
    A_val!(i%) = 1 / slope                                '{T2 guess}
    temp = Log(TempAmp2) - TempT2 * slope
    A_val!(i% + 1) = Abs(Exp(temp) * 1000) - A_val!(2)    '{Amp guess}
Next i%

The pairs of estimates are then sorted in decreasing order of the Initial Amplitude so that the major components are taken into account first.
Fig. 10 : Starting values interface.
6.6.2 Calculate Options Optimisation Criterion
If the default criterion, Autocorrelation, is chosen, it is possible to vary the step (the number of points between the two values used to calculate the function). The optimal value depends on the frequencies of the noise and of the information in the signal.
6.6.3 Calculate Options Stopping Criterion
The iteration process normally stops once the value of the optimisation criterion has stabilised below the Stopping Value. The default Stopping Value is equal to the number of data points. As some optimisation criteria are sensitive to the size of the data values, the user may wish to change this default by introducing a multiplying factor :
- greater than 1 to cause the optimisation process to stop earlier
- less than 1 to cause the optimisation process to continue longer.
This multiplying factor will be used for all following decompositions unless it is changed or the program is stopped.
Fig. 11 : Selection of optimisation criteria.

6.6.4 Calculate Options JackKnife Global or Local
Select between a more global Jackknife procedure, where the starting values are obtained by linearisation, and a more local method using plus / minus twice the estimated errors. The default method is Global. The Local method underestimates the variability of the Jackknife estimates, as each iteration is based on the best results obtained in the previous iteration. If the estimated errors are small, the procedure may remain stuck in a local minimum. The Global method produces a greater variability as the starting point is usually further from the current optimum.

6.6.5 Calculate Options AutoSave Results
Select this option to automatically append final results to a *.RES file.

7 VIEW MENU
Use a spreadsheet-like grid to view and, in most cases, edit data. This menu option allows one to visualise and edit the raw data read from a file, acquired via the RS232C port or generated by simulation. The data may also be saved to a file or printed on a printer or plotter. After multi-exponential decomposition, the spreadsheet can also be used to visualise, edit, print and save the individual component data, the sum of the individual components, the raw data and the difference between these observed values and the model.
Fig. 12 : Edit menu. Clicking on the Right Mouse Button makes a toolbar containing a range of mathematical functions appear.
Fig. 13 : Data edit Toolbar.

One selects an individual cell by clicking on it with the Left Mouse Button, a whole column by selecting its topmost cell, and a row by selecting its leftmost cell. If one wishes to apply a function to a column or row of data, an individual data cell or a range of data cells, one should select it and then click on the Right Mouse Button to activate the toolbar. Click on the desired function button in the toolbar to apply the function. To select another set of cells, it is necessary to close the toolbar first. However, double-clicking when selecting a cell, row or column causes the most recently used toolbar function to be directly applied to it. The available functions are :
+ : Add operand to selected Row, Column or Cell
- : Subtract operand from selected Row, Column or Cell
x : Multiply selected Row, Column or Cell by operand
/ : Divide selected Row, Column or Cell by operand
After clicking on one of these buttons (or pressing the corresponding key), an InputBox appears requesting the value to be used as the second operand. If a number is entered, the selected cells are replaced by the result of the operation between the current contents of the cells and that input value. If the letter 'C' or 'R' followed by a number is entered, the second operand will be the contents of the corresponding Column or Row. For example, "+ C2" will result in the contents of the selected column being replaced by its sum with Column 2. Some of the following functions will also require the user to give a numerical value as second operand.
Diff : Replace by deviation from mean for the selected Row or Column
Prog : Calculate progression (difference from value of previous cell)
Inverse : Invert selected Row, Column or Cell (zero values set to "**")
Baseline : Adjust lowest value (baseline) of selected Row or Column to given value
Normalise : Adjust all selected values so that maximum point has given value
Smooth : Perform Savitsky-Golay smoothing of selected Row, Column or Cell using given step (Savitsky and Golay, 1964; Steiner et al., 1972)
Deriv : Calculate Savitsky-Golay derivative of selected Row, Column or Cell using given step
Sort : Sort complete table of data based on selected Row or Column, using Quicksort algorithm
Log10 : Log(10) of selection (negative or zero values set to "**")
Ln : Natural Log(e) of selection (negative or zero values set to "**")
Exp : Exponential of selected Row, Column or Cell
Square : Square selected Row, Column or Cell
Root : Square Root of selected Row, Column or Cell (negative values set to "**")
Power : Use values in selected Row or Column as the power to which the given number is raised
FFT : Fast Fourier Transform magnitude spectrum of selected Row or Column
Inv FFT : Inverse Fast Fourier Transform of selected Row or Column
MEM : Maximum Entropy magnitude spectrum of selected Row or Column
Insert : Insert a Row or Column above or to the left of the selected Row or Column
Mark : Mark selected Row, Column or Cell for subsequent deletion
Del : Delete Marked Row or Column, or Selected Cell (in this case the other Cells may be shifted either to the left or upwards)
Copy : Copy selected Row, Column or Cell to ClipBoard
Paste : Paste contents of ClipBoard to selected Row, Column or Cell
Close : Close ToolBar
To definitively replace the existing data by the modified data, click on "USE". If there are more than two columns, the user will be prompted to choose the columns to become the independent and the dependent variables. All these results may be saved to an ASCII file or to a Bruker Minispec file (*.T1D or *.T2D). When saving to a Bruker file the user will also be prompted, if necessary, to choose the columns to become the independent and the dependent variables.

7.1 View Statistics
The statistical values associated with the model generated by the decomposition can also be visualised, but not edited. One can visualise :
- Mean values and estimated errors for the parameter estimates
- Chi²
- Standard Error of the Estimates (SEE)
- Residual Standard Deviation (RSD)
- Coefficient of Determination (R²)
- Total Variance, Explained Variance for the model, Residual Variance
- Correlation matrix of the estimated parameters.
One may sometimes obtain a negative Chi² or an Explained Variance greater than the Total Variance, but this is simply indicative of a very bad set of estimated parameters. The following code shows the calculation of these statistics in the case of a T2 curve.

For i% = 1 To data_nmb%
    mean_y! = mean_y! + amp!(i%)
Next i%
mean_y! = mean_y! / data_nmb%
For i% = 1 To data_nmb%
    sum_xy! = sum_xy! + amp!(i%) * t!(i%)
    sum_x2! = sum_x2! + t!(i%) * t!(i%)
    sum_y2! = sum_y2! + amp!(i%) * amp!(i%)
    diff_y2! = diff_y2! + dY!(i%) * dY!(i%)
    Khi2! = Khi2! + dY!(i%) * dY!(i%) / modele!(i%)
    tot_var! = tot_var! + (amp!(i%) - mean_y!) * (amp!(i%) - mean_y!)
    expl_var! = expl_var! + (modele!(i%) - mean_y!) * (modele!(i%) - mean_y!)
    res_var! = res_var! + (amp!(i%) - modele!(i%)) * (amp!(i%) - modele!(i%))
Next i%
RSD! = Sqr(Abs(diff_y2! / data_nmb% / (data_nmb% - 1)))
SEE! = Sqr(diff_y2! / data_nmb%)
R2! = (1 - res_var! / tot_var!)

where :
ls%           number of exponentials
t!(i%)        time at point i%
amp!(i%)      observed intensity at point i%
modele!(i%)   predicted intensity at point i%
dY!(i%)       difference between observed and predicted at point i%
data_nmb%     number of data points
8 PLOT MENU
All graphs may be saved as either a bitmap (*.BMP) or as a Windows metafile (*.WMF). The latter format is preferable, as the graphs are line-based (vectorial) and vectorial files are significantly smaller than bitmaps, which must describe the state of every image bit whether it contributes to the image or not. Plot Data gives graphic plots of the raw data and Plot Results gives the component curves after decomposition.
Fig. 14 : Plot menu.
8.1 Plot Histogram
This allows one to plot :
- a histogram of the added noise, for simulated data
- a histogram of the frequencies of the deviations.
The latter may be useful to detect non-gaussian error distributions, which could indicate an incorrect adjustment of the model to the data. As the data are distributed across twenty classes, at least 20 points are necessary to plot the histogram.
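The binning into twenty classes can be sketched as follows. This is an illustrative Python fragment, not the program's own code; the equal-width bin edges and the handling of the extreme value are assumptions, as the text only fixes the number of classes:

```python
def histogram_20_classes(deviations):
    """Count the deviations in 20 equal-width classes spanning their range."""
    lo, hi = min(deviations), max(deviations)
    width = (hi - lo) / 20 or 1.0           # guard: all deviations identical
    counts = [0] * 20
    for d in deviations:
        k = min(int((d - lo) / width), 19)  # the maximum falls in the last class
        counts[k] += 1
    return counts
```

A markedly asymmetric or multi-peaked set of counts is the kind of non-gaussian pattern the plot is meant to reveal.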
8.2 Plot Lack of Fit
This graph presents the deviations as a function of the position in the curve.
Fig. 15 : Histogram of deviations.

Fig. 16 : Lack of Fit.
8.3 Plot Parameter Leverage
Plot of the variation in the Lack of Fit between model and observed data as a function of the percent variation in the estimated value of each parameter. Each parameter is varied in a maximum of 50 steps between its mean value plus and minus twice its estimated standard deviation, as calculated by the Marquardt method. If twice the estimated standard deviation is less than 1 percent, this latter value is used instead. The change in Lack of Fit is calculated at each step as the sum of the absolute values of the differences, for all the points in the curve, between the current model and the model found using the mean parameters. As is clear from Plot JackKnife and View Statistics, the parameters are in fact correlated. However, for speed of execution all parameters except the one being studied are kept constant. These graphics are therefore oversimplifications, as they neglect the interactions. In the case of T2 curves, the influence of the baseline is also neglected.

8.4 Plot JackKnife
The set of individual pairs of values (Amp(i%) and T1(i%) or T2(i%)) calculated by the Jackknife procedure is presented in a scatterplot. This allows one to visualise the dispersion of the parameter estimates. The points are usually scattered along a diagonal, indicating a strong correlation between the parameters.
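The leverage scan of section 8.3 can be sketched as follows. This is an illustrative Python paraphrase; `model(t, params)` is a hypothetical callable standing in for the multiexponential model, and only the ±2σ / ±1 % sweep rule is taken from the text:

```python
def leverage_curve(times, model, params, idx, sigma, n_steps=50):
    """Lack of Fit as one parameter sweeps from mean - range to mean + range.

    The half-range is twice the estimated standard deviation, or 1 % of the
    mean value if that is larger; all other parameters stay fixed, so the
    curve deliberately ignores parameter interactions, as noted in the text."""
    half_range = max(2 * sigma, 0.01 * abs(params[idx]))
    reference = [model(t, params) for t in times]
    curve = []
    for step in range(n_steps + 1):
        varied = list(params)
        varied[idx] = params[idx] - half_range + 2 * half_range * step / n_steps
        lof = sum(abs(model(t, varied) - r) for t, r in zip(times, reference))
        curve.append((varied[idx], lof))
    return curve
```

By construction the Lack of Fit is zero at the mean value and grows on either side; a flat curve signals a parameter with little leverage on the fit.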
Fig. 17 : Lack of Fit as a function of variation in parameter estimate.
Fig. 18 : Distribution of parameter estimates by JackKnife.
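The jackknife mean and standard deviation reported at the end of the Calculate JackKnife procedure (section 6.4) can be obtained from the n leave-one-out estimates with the standard formulae (Efron, 1982). A Python sketch — the program's exact expressions are not printed in the text:

```python
import math

def jackknife_summary(estimates):
    """Jackknife mean and standard error of a parameter, computed from the
    n estimates obtained by leaving out one data point at a time."""
    n = len(estimates)
    mean = sum(estimates) / n
    # the (n - 1)/n factor compensates for the similarity of the
    # leave-one-out estimates, which each share n - 1 points
    var = (n - 1) / n * sum((e - mean) ** 2 for e in estimates)
    return mean, math.sqrt(var)
```

Applied to each parameter in turn, this gives the pairs whose dispersion the scatterplot of Fig. 18 displays.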
9 CONVERT MENU
These menu options allow one to export data in other file formats.
Fig. 19 : Convert menu.

9.1 Convert to JCAMP.DX
The JCAMP.DX format is a standard developed by the Joint Committee on Atomic and Molecular Physical Data to facilitate the exchange of spectral data (McDonald and Wilks, 1988). This conversion is useful when one wishes to analyse the complete relaxation curve using one of the many statistical software packages capable of importing this standard data transfer format. A complete analysis of the relaxation curves has the advantage of not assuming a particular model, and of not reducing a large number of data points to a limited number of model-based relaxation parameters. It is possible to append several JCAMP.DX files together to produce a compound file. In this case, it should be remembered that a compound JCAMP.DX file must include a global header indicating the number of subfiles it contains, and each of these subfiles must contain the same number of data points. When this option is chosen, the user may introduce physico-chemical data such as the chemical composition or properties of the analysed samples. This information may be used to perform multi-dimensional regressions, such as Principal Components Regression or Partial Least Squares Regression, between the relaxation data and the physico-chemical data. Default file name : current name with the extension .DX.

9.2 Convert to SPC (Bruker WIN-EPR Format)
Both a *.SPC and a *.PAR file are produced. These files are compatible with the WIN-EPR program from Bruker. It is possible to append several data sets together to produce a 2-dimensional *.SPC file. In that case the corresponding 2-dimensional parameter file *.PAR is also generated. Exporting files in this format may be interesting if one wishes to carry out Medium Resolution studies using a Time-Domain NMR instrument in an Off-Resonance mode. The Interferograms may then be phased, Fourier Transformed and the resulting spectra integrated using programs such as WIN-EPR.
The 2-dimensional option is particularly interesting, not necessarily for exporting true 2D-NMR data, but for the 2D-analysis of a series of 1D-spectra or even Time-Domain signals which evolve in a coherent manner as a function of a factor such as time, water content, temperature, etc. PAR files contain Carriage Return terminated lines (no Line Feed) of the data parameters. SPC files contain the data in standard IEEE 32-bit (4 bytes) single-precision
floating-point format. Data values can range from -3.402823 E38 to -1.401298 E-45 for negative numbers and from 1.401298 E-45 to 3.402823 E38 for positive numbers. Default file names : current file name with the extensions .SPC and .PAR.

9.3 Convert to Flat ASCII Format
The values of the first (independent) variable are in a Carriage Return+Line Feed terminated line. The values of the second (dependent) variable are in a second Carriage Return+Line Feed terminated line. The user may select to separate individual values within a line either by a SPACE (ASCII 32), a TABULATION (ASCII 9), a COMMA (ASCII 44) or a Carriage Return+Line Feed pair (ASCII 13+ASCII 10). It is possible to append several data sets together to produce a compound *.TXT file. In this case, the independent variable (the first line of data) is only exported to the file once and it is assumed that the user has verified that the number of data points is always the same. Exporting in this format may be interesting if one wishes to transfer the raw data to another software package that can only import Flat ASCII files with the data in lines rather than columns. This is also a way of transposing the data from columns to lines. Default file name : current file name with the extension .TXT
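The flat ASCII layout just described takes only a few lines to produce; an illustrative Python fragment (not the program's own code):

```python
def write_flat_ascii(path, x, y, sep=" "):
    """Write the independent variable on one CRLF-terminated line and the
    dependent variable on a second, values separated by sep (space, tab,
    comma, or itself a CRLF pair)."""
    with open(path, "w", newline="") as f:   # newline="" keeps the explicit \r\n
        f.write(sep.join(str(v) for v in x) + "\r\n")
        f.write(sep.join(str(v) for v in y) + "\r\n")
```

With sep="\r\n" each value lands on its own line, which is the column-style variant mentioned in the text.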
10 OPTIONS MENU

Fig. 20 : Options menu.
10.1 Options Printer
This option allows one to choose among the installed printers or plotters, and the corresponding fonts. One can then preview the output which would be produced.

10.2 Options Dimensions
This option can be used to preselect the size of the matrices and data tables used by the program. However, if the number of data points acquired, simulated or read from file is too great, this function will be called automatically to readjust these dimensions.

11 HELP MENU
This menu gives access to information on :
- General NMR theory
- Time-Domain NMR theory and applications
- Time-Domain NMR data analysis
- Use of this program
Fig. 21 : Help menu.

Help Contents allows access to the Table of Contents of the Help file, Help Search activates the Index to the Help file and Help About supplies copyright information on the program.

12 REFERENCES
Droblos, F., 1987. Symmetric distance measures for mass spectra. Analytica Chimica Acta, 201, 225-239.
Efron, B., 1982. "The Jackknife, the Bootstrap and Other Resampling Plans", Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania.
Marquardt, D.W., 1963. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2), 431-441.
McDonald, R.S. and Wilks, P.A., 1988. A standard form for exchange of infrared spectra in computer readable form. Applied Spectroscopy, 42, 151-162.
Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T., 1988. "Numerical Recipes in C. The Art of Scientific Computing", Cambridge University Press, Cambridge.
Rutledge, D.N. and Tomé, D., 1991. "La résonance magnétique nucléaire" in Techniques d'analyse et de contrôle dans les industries agro-alimentaires, 234-251 (ed. G. Linden), Lavoisier, Paris.
Rutledge, D.N., 1995. "Nuclear Magnetic Resonance Spectroscopy: Applications in Food Analysis" in Encyclopaedia of Analytical Science (ed. A. Townsend), Academic Press, London.
Savitsky, A. and Golay, M.J.E., 1964. Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8), 1627-1639.
Steiner, J., Termonia, Y., Deltour, J., 1972. Comments on smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 44(11), 1906-1909.
Signal Treatment and Signal Analysis in NMR
Ed. by D.N. Rutledge
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 10

CONTINUOUS RELAXATION TIME DISTRIBUTION DECOMPOSITION BY MEM

F. Mariette¹, J.P. Guillement², C. Tellier³ and P. Marchal¹
¹Cemagref, Division Technologie, 17 avenue de Cucillé, Rennes, FRANCE
²Département de Mathématiques Appliquées, URA CNRS, Nantes, FRANCE
³Laboratoire de RMN et Réactivité Chimique, URA CNRS, 2 rue de la Houssinière, Nantes, FRANCE

1 INTRODUCTION
There is a wide range of physical methods, such as fluorescence spectroscopy, chemical relaxation and NMR spectroscopy, where data can be represented by a sum of exponential decays. In NMR spectroscopy, multiexponential relaxation decays occur in diverse systems which present chemical or physical heterogeneity. In these cases, the analysis of multiexponential decays may provide information on diffusion processes, exchange processes and compartment size (Lillford et al., 1980; Belton & Hills, 1987; Hills et al., 1989). Because of the great potential of relaxation measurements for the determination and understanding of the composition or the structure of a sample, numerous mathematical methods have been developed to analyse the multiexponential decay function. These methods can be classified in two groups, depending on the number of exponential terms which are taken into account. The first approach is to fit the decay curves with the smallest number of discrete exponential terms which provide a satisfactory representation of the experimental data. The other approach is to analyse the data in terms of a continuous distribution of relaxation terms. One of the best known methods of discrete analysis of relaxation data is the Marquardt algorithm (1963), which uses an initial guess for the exponential terms. More recently, other techniques have been proposed which do not need the introduction of initial values. These techniques use either a combination of inverse Laplace Transform and Padé approximation (Yeramian & Claverie, 1987; Tellier et al., 1991) or linear prediction and singular value decomposition (Lin et al., 1993).
The inversion program CONTIN (Provencher, 1976, 1982) was used to analyse the NMR relaxation decay curves of biological samples in terms of continuous distributions of relaxation times (Kroeker and Henkelman, 1986). CONTIN is a general package used to resolve ill-conditioned problems such as the inverse Laplace transform in relaxation experiments. The strategy used by Provencher is based on the principle of Parsimony, which states that the most appropriate solution is the simplest one. This method has been used recently by Hills and Le Floc'h (1994) to study the effect of temperature on the distribution of water proton relaxation in frozen potato tissue. Independently of the performance of all these methods (discrete or continuous) in estimating the relaxation times from noisy, and sometimes incomplete, measurements, their choice is important because it will govern the interpretation of the proton relaxation times in the sample. The first mathematical method available was the discrete method, and consequently all the relaxation components were assumed to correspond to different populations of protons. For example, in the study of macromolecule hydration, the two-state model of Zimmerman and Brittin (1957), in which the water molecule is either associated with the macromolecule or with the bulk solvent, was widely used. Extensions to the two-state model, such as the three-state model in which the water is highly and loosely bound to the macromolecule, or the two-state model with anisotropic motion of water (Halle et al., 1981; Peemoeller et al., 1986), have been proposed. Another class of models supposes that all solvent molecules are in continuous, long range interaction with the macromolecule (Hallenga & Koenig, 1976). Lillford et al. (1980) preferred to interpret the relaxation parameters of a muscle sample with a continuous distribution analysis instead of a discrete analysis. In the same way, Kroeker & Henkelman (1986) and Hills and Le Floc'h
(1994) showed that relaxation in biological and food samples can be analysed in terms of a continuous distribution of relaxation times. These authors conclude that for biological systems the second class of model is more realistic, despite the fact that there is no exact theory to take the complex distribution into account. However, the continuous approach has the advantage of requiring few a priori assumptions about the relaxation behaviour. In this paper we discuss the application of the Maximum Entropy Method (MEM) to analyse NMR relaxation decay curves in terms of a continuous distribution of relaxation components. The MEM is a powerful data analysis technique which has found many applications in image analysis, radioastronomy, crystallography and medical imaging (for a review see Gull & Skilling, 1984; Bevensee, 1993). In NMR, MEM data analysis has been widely used for the reconstruction of NMR spectra and magnetic resonance images (Sibisi, 1983; Hore, 1985). In the first part of this paper we critically discuss the application of the MEM to characterise decay curves in terms of a continuous distribution of relaxation times. We present the algorithm, which is based on the Jaynes Entropy, and we use simulated decay curves to illustrate the application of the technique. In the second part we compare the results obtained on simulated data with the MEM and the Marquardt algorithm. In the third part, the advantages of the MEM are discussed using experimental data.
2 THE MAXIMUM ENTROPY METHOD
The general equation describing multiexponential relaxation is :

I(t) = ∫₀^∞ F(T) exp(-t/T) dT    (1)
where I(t) is the intensity of the signal at time t, after a pulse sequence, and F(T) the unknown amplitude of the spectral component at relaxation time T. The data yk = I(tk) + εk, corresponding to the values of I(t) at times tk spoiled by noise εk, are known. We suppose that the noise (εk) has a zero mean value and a known standard deviation σ. We wish to determine F(T), the unknown distribution of relaxation times. Before describing our numerical approach, we wish to recall the main properties and problems of the transformation F → I.

2.1 Mathematical properties
The transformation F → I is related to the Laplace transform. In fact, I(t) is the value at time t of the Laplace transform of the function u → (1/u²) F(1/u) :

I(t) = ∫₀^∞ exp(-ut) (1/u²) F(1/u) du    (2)
I(t) is defined when F(T) is integrable over ]0,∞[, or when F(T) satisfies

|F(T)| ≤ C / T^(1+ε),  ε > 0    (3)
The transformation F → I is linear. When F(T) ≥ 0 then I(t) ≥ 0 and I(t) decreases. Under broad conditions, I(t) is infinitely differentiable, I(0) = ∫₀^∞ F(T) dT and

|I(t)| ≤ ∫₀^∞ |F(T)| dT    (4)
We emphasise, as pointed out by many authors (Bellman, 1966; Provencher, 1976, 1982; Mc Whirter and Pike, 1987; Livesey and Brochon, 1987), that inverting the Laplace transform is a very ill-conditioned problem. We will illustrate this with three examples.
The first example emphasises the difficulty of determining the width of a dispersion F(T). The transform of the Dirac delta function δ(T-T0) is exp(-t/T0). Let us consider the triangular peak F(T) = P_T0,h(T), centred on T0 with a base equal to 2h and a height of 1/h. When h is small the triangular function is a good approximation of the delta function δ(T-T0). The deviation between exp(-t/T0), the exact transform of the delta function, and I(t), the transform of the peak P_T0,h(T), can be seen in Fig. 1. A variation in the height of the peak for the same area (therefore a variation of the dispersion) induces a very small perturbation on I(t). Therefore, for a given I(t) it will be difficult to retrieve the exact height and width of the dispersion, especially if the width is small.
Fig. 1 : Comparison between the exact transform of a discrete function and the transform of a peak, for different values of T0 and h.

The second example illustrates the effect of a periodic function F(T) on the signal I(t). The transform I(t) of the sinusoidal distribution

F(T) = A sin(wT), T ∈ [0, 2nπ/w], and F(T) = 0, T > 2nπ/w,    (5)

satisfies

|I(t)| ≤ 2A / w    (6)
Therefore, if a distribution is perturbed by a sinusoidal function A sin(wT), the effect on I(t) can be very small even if A is large. Hence, it will be difficult to detect sinusoidal perturbations on a distribution F(T).
The last example illustrates the difficulty of distinguishing a broad distribution of relaxation times from several discrete values of relaxation times. Let Ic(t) be the transform of the uniform distribution

Fc(T) = 1/Tm, T ∈ [0, Tm], and Fc(T) = 0, T > Tm,    (7)
and Id(t) the transform of

Fd(T) = c1 δ(T-T1) + c2 δ(T-T2) + c3 δ(T-T3),    (8)

with Tm = 100, T1 = 5.68, T2 = 27.22, T3 = 75.14, and c1 = 0.12, c2 = 0.34, c3 = 1 - c1 - c2 = 0.54. It is found that Id(t) deviates from Ic(t) by less than 5·10⁻⁴. Therefore, it is possible to obtain a good fit of Fc(T), through the transform F → I, with only 3 delta functions. This example shows that a good fit of I(t) can be obtained with a continuous distribution as well as with a family of discrete components. The problem will be to eliminate, among all possible solutions, those which do not correspond to the true unknown function F(T).
2.2 Discretisation
We make the choice of conserving the continuous aspect of the spectrum by discretising I(t) :

yk = ∫₀^∞ F(T) exp(-tk/T) dT + εk,  k = 1, ..., N    (9)
with a large enough number of points M defining Tj uniformly distributed on an interval [0, TM] (M = 160, 200). The rectangle integration formula is used to obtain :

yk = h Σ(j=1..M) Pj exp(-tk/Tj) + εk,  k = 1, ..., N    (10)
where h = TM/M is the step of discretisation and Pj = F(Tj) are the unknowns. Typically N = 200, 400 or 800 data points, and TM must be chosen so that F(T) = 0 when T ≥ TM.
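Equation (10) can be evaluated directly. An illustrative Python version (the original program is written elsewhere and not in Python):

```python
import math

def forward_model(times, T, P, h):
    """x_k = h * sum_j P_j * exp(-t_k / T_j), the discretised transform of eq. (10).

    T is the uniform grid of relaxation times, P the unknown amplitudes
    F(T_j), and h = T_M / M the discretisation step."""
    return [h * sum(p * math.exp(-t / Tj) for Tj, p in zip(T, P)) for t in times]
```

This is the map whose inversion the rest of the section addresses: given noisy yk, find positive Pj reproducing them.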
We say that a family (Tj, Pj) is an acceptable solution if the reconstructed quantities

xk = h Σ(j=1..M) Pj exp(-tk/Tj),  k = 1, ..., N    (11)
present a deviation from the data (yk) with the same statistical properties as the noise (εk), that is, if

χ² = Σ(k=1..N) (yk - xk)²    (12)

is close to Nσ².
Unfortunately, as said before, a lot of very different (Tj, Pj) which satisfy this criterion exist. Classical non-linear deconvolution and least-squares analyses such as the Prony method (Marple, 1987), the Marquardt algorithm (Marquardt, 1963) or the Laplace-Padé method (Tellier et al., 1991) work mostly with a minimal number of discrete exponential terms (M < 5) and are not really suited to complex samples.

2.3 The Maximum Entropy Method
The Maximum Entropy Method selects the solution that requires the least information among all acceptable solutions (Jaynes, 1982; Stephenson, 1988). To measure the information contained in a family (Tj, Pj), the Shannon Entropy function, redefined by Skilling and Bryan (1984), is used :

S = - Σ(j=1..M) Pj (ln(Pj/A) - 1)    (13)
In the absence of constraints, this expression is maximum when Pj = A, corresponding to a uniform distribution F(T). For other possible Entropy functions, see Frieden (1972), Livesey and Brochon (1987), Stephenson (1988) and Xu (1990). The problem is therefore to minimise

-S = Σ(j=1..M) Pj (ln(Pj/A) - 1)    (14)
subject to the condition

χ² = Σ(k=1..N) (yk - xk)² ≤ Nσ²    (15)
We can specify the condition χ² ≈ Nσ² by χ² = Nσ² or Nσ² ≥ χ². In both cases, it can be proven that (14) has a unique solution.
2.3.1 Solving the optimisation problem
Following Skilling and Bryan, a Lagrange multiplier λ is introduced. The solution of (14) will lie at the minimum of

Q = -S + λχ²    (16)
for a given value of ;L. To find the corresponding ~,, we start with a small value, and repeatedly compute the minimum of Q, for increasing L values, until 7,2 is acceptable. The automatic update of Z, depends of the behaviour of Z", grad(z:), grad(Q). It must be noted that small values of ~, give more importance to -S, while large ones favour Z2. The
entropy function is therefore used to obtain the first approximations of the solution, while during the last iterations it is the χ² constraint which dominates the minimisation.
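The λ loop and the minimisation of Q can be sketched in a few lines. This is a toy stand-in for the PMEM program described below: a clipped Newton step on the full vector (P_j) replaces the Skilling-Bryan sub-space search, and the grid, starting value A and λ schedule are illustrative assumptions.

```python
import numpy as np

def entropy(P, A):
    """Eq. (13): S = -sum_j P_j (ln(P_j / A) - 1)."""
    return -np.sum(P * (np.log(P / A) - 1.0))

def mem_fit(t, y, T_grid, sigma, A, lam=1e-3, lam_factor=2.0,
            n_outer=30, n_newton=20):
    """Increase lambda until chi^2 <= N*sigma^2; for each lambda,
    minimise Q = -S + lambda*chi^2 by Newton steps on P,
    clipping P to stay positive."""
    K = np.exp(-t[:, None] / T_grid[None, :])   # discretised kernel
    P = np.full(T_grid.size, A)                 # uniform start = max entropy
    target = y.size * sigma ** 2
    for _ in range(n_outer):
        for _ in range(n_newton):
            r = K @ P - y
            grad = np.log(P / A) + 2.0 * lam * (K.T @ r)   # grad of -S + lam*chi^2
            H = np.diag(1.0 / P) + 2.0 * lam * (K.T @ K)   # Hessian of Q
            P = np.maximum(P - np.linalg.solve(H, grad), 1e-12)
        if np.sum((K @ P - y) ** 2) <= target:
            break                               # chi^2 is now acceptable
        lam *= lam_factor
    return P
```

The entropy barrier keeps every P_j positive, which is how MEM enforces a physically meaningful distribution without explicit constraints.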
2.3.2 Minimisation of Q = -S + λχ² for a fixed λ

Starting from a uniform distribution (Pj), we have to improve the quality of the solution obtained for different values of λ. To do this, a quadratic Taylor approximation of Q is computed on a small descent space containing grad(Q). The improved (Pj) is taken as the solution of the unconstrained minimisation of this quadratic approximation on the descent sub-space. In order to facilitate this task and guarantee a good result, some mathematical transformations are performed (see Skilling and Bryan (1984) for details). We have written a program, PMEM, and computer simulations have been performed to determine the ability of this program to reproduce known input distributions.

3 VALIDATION OF MEM BY SIMULATION

A monomodal and a bimodal gaussian distribution of T2 relaxation times were generated, and the effects of the width of the distribution δ (δ is the ratio between the width at half height and the value of T2 corresponding to the maximum amplitude), the signal to noise ratio (SNR) and the signal truncation were tested. A baseline value of zero was used for all simulations. The decay curves were simulated with N = 600 equally spaced times t, with uncorrelated added noise (standard deviation σ = 0.02%). All the distributions F(T) were normalised to 100 arbitrary units. Fig. 2 and Fig. 3 show examples of the reconstituted monomodal and bimodal spectra with 180 points equally spaced in T2 and the associated residuals. For both simulations the convergence between the simulated spectra and the MEM solutions is good.

3.1 Correlation between distribution width δ output and width δ input

The effect of the dispersion on the MEM solution was studied on simulated relaxation data corresponding to a continuous monomodal gaussian spectrum. The mean T2 value was 60 ms and the width δ ranged from 0 (a mono-delta function) to 40% (wide dispersion). Figure 4 shows the δ obtained from MEM analysis as a function of the simulated δ input values. From 10 to 40% the MEM solution is very close to the simulated parameters. δ = 10% is the limit below which the accuracy of MEM is insufficient. Below this value, and in the particular case of a mono-delta function, we obtained a distribution with δ ≈ 7% and the solution is very unstable. However, the T2 value and the area of the peak calculated from MEM are independent of the width and coherent with the input values. From all the simulations with T2 = 60 ms and area = 100%, we obtained a mean T2 value of 60.1 ± 1.5 ms and a mean area of 100.5 ± 0.2%.
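The simulated data just described (N = 600 equally spaced times, a gaussian F(T) normalised to 100 units, additive noise scaled to the initial amplitude) can be generated in outline as follows; the time range, T2 grid and random seed are illustrative assumptions.

```python
import numpy as np

def gaussian_t2_distribution(T, T_mean=0.060, delta=0.30, area=100.0):
    """Gaussian F(T); delta is the full width at half height divided by
    the T2 of maximum amplitude, as defined in the text."""
    fwhm = delta * T_mean
    s = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> standard deviation
    F = np.exp(-0.5 * ((T - T_mean) / s) ** 2)
    return area * F / F.sum()                      # normalise to `area` units

def simulate_decay(T, F, n_points=600, t_max=0.300, noise=0.0002, seed=0):
    """Multiexponential decay I(t) = sum_j F_j exp(-t / T_j) sampled at
    n_points equally spaced times, with uncorrelated gaussian noise whose
    standard deviation is `noise` times the initial amplitude."""
    t = np.linspace(0.0, t_max, n_points)
    I = np.exp(-t[:, None] / T[None, :]) @ F
    rng = np.random.default_rng(seed)
    return t, I + rng.normal(0.0, noise * I[0], t.size)

# 180 T2 grid points, as in Figs. 2 and 3
T = np.linspace(0.001, 0.150, 180)
F = gaussian_t2_distribution(T)
t, y = simulate_decay(T, F)
```

Feeding such curves to the inversion and comparing the recovered F(T) with the input is exactly the validation exercise of this section.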
Fig. 2: Monomodal spectrum with the MEM solution (solid line) and simulated distribution (ooooo) for different widths (δ = 10% and δ = 30%). The residuals are the difference between the simulated function I(t) and the function I'(t) reconstituted from the MEM solution spectrum. The mean input T2 value was 60 ms.
Fig. 3: Bimodal spectrum with the MEM solution (solid line) and simulated distribution (ooooo) for different widths. The residuals are the difference between the simulated function I(t) and the function I'(t) reconstituted from the MEM solution spectrum. The mean input values were T2a = 60 ms and T2b = 240 ms.
Fig. 4: Correlation between the δ input and the δ obtained by MEM analysis. The line represents the expected δ output values.

Fig. 5: Influence of the dispersion δ on the Marquardt decomposition. T2 obtained from the Marquardt algorithm (solid triangles) and mean input T2 (open circles).
As we do not know, a priori, whether the experimental relaxation curves are characterised by a dispersion in T2, we also evaluated the Marquardt decomposition on the monomodal simulation. The effect of increasing the width of the dispersion is very important for the results of the Marquardt decomposition (Fig. 5). For wide dispersions, the discrete method adds discrete exponential terms with values corresponding to the edges of the dispersion. The total area is constant, but the relaxation parameters and their relative percentages change with the dispersion.

3.2 Effect of signal to noise ratio

Noise was added to the simulated decay curves so that the SNR ranged between 10 and infinity. The SNR was calculated from the ratio of the amplitude to the noise standard deviation (Kroeker and Henkelman, 1986). Fig. 6 represents the effect of noise on the reconstituted MEM solutions from three different input distributions (δ = 15%, 30%, 40%). For SNR values below 100 the MEM solutions are still monomodal, but the dispersion width increases as the SNR decreases. Above SNR = 1000, the dispersion width is independent of the SNR. Nevertheless, the mean T2 outputs are constant (60 ± 2 ms) over the SNR range tested.

3.3 Truncation effect for a bimodal distribution of relaxation

In numerous physically and chemically heterogeneous samples a wide range of relaxation parameters is observed. Until now, most commercial low field NMR spectrometers were not able to store large amounts of data. Consequently it was not possible to sample the decay curves until the baseline was reached; the signals were truncated and the baseline was not known. The effect is more pronounced in the case of multiexponential decays. To correctly sample the fast decaying component, a short sample
time has to be chosen, and this prevents a complete sampling of the relaxation data. This phenomenon induces errors in the determination of T2 values and their relative amplitudes.
Fig. 6: Effect of the SNR on the δ measured from MEM for three simulated dispersions: 10% (circles), 15% (squares), 30% (triangles).
Fig. 7: Calculated relaxation times T2 as a function of the % of the long component. Square symbols refer to the short (open) and long (solid) relaxation times simulated. Circle symbols refer to the short (open) and long (solid) relaxation times obtained from MEM. Triangle symbols refer to the short (open) and long (solid) relaxation times obtained from Marquardt analysis.

Truncated relaxation decay curves were generated from bimodal relaxation spectra. The relaxation curves were sampled until 1.3 times the largest T2. The mean input T2 values were T2A = 90 ms and T2B = 1000 ms, and the relative percentage of the two modes varied between 10 and 70%. The decay curves were simulated with N = 600 equally spaced time points with added noise (σ = 0.02%) and a baseline value of zero. Fig. 7 shows the evolution of the relaxation times T2 obtained from MEM and Marquardt analysis as a function of the percentage of the long T2 component. The agreement between the results of the discrete and continuous methods is good for the evolution of the short component. A discrepancy appears for the determination of the long T2 component.
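The truncated bimodal curves of this section can be generated as below: sampling is stopped at 1.3 times the largest T2, and the noise is scaled to the initial amplitude as in the earlier simulations (the seed is arbitrary).

```python
import numpy as np

def truncated_bimodal_decay(p_long, T2A=0.090, T2B=1.000,
                            n_points=600, noise=0.0002, seed=1):
    """Bimodal decay with components T2A (short) and T2B (long),
    truncated at 1.3 * T2B; p_long is the fraction of the long mode."""
    t = np.linspace(0.0, 1.3 * T2B, n_points)
    I = 100.0 * ((1.0 - p_long) * np.exp(-t / T2A)
                 + p_long * np.exp(-t / T2B))
    rng = np.random.default_rng(seed)
    return t, I + rng.normal(0.0, noise * I[0], n_points)

t, y = truncated_bimodal_decay(p_long=0.30)
# At t_max the long component has only decayed to exp(-1.3), about 27 %
# of its initial value, so the baseline is never reached.
```

This is precisely the situation in which discrete fits misplace the long component while the MEM mean T2 stays stable.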
The mean T2 from MEM is constant over the percentages tested. But the discrete analysis fails to recover the correct value of the long relaxation time, especially for small percentages. From these simulated curves it is apparent that discrete analysis of relaxation curves does not recover the maximum of a continuous spectrum when the corresponding relaxation curve is incomplete.

4 VALIDATION OF MEM WITH EXPERIMENTAL DATA

4.1 Quantitative determination of water and fat in a heterogeneous medium

Cheese products are by nature very heterogeneous. The heterogeneity lies at the chemical level because the product is composed of water, fat and protein. It lies also at the physical level because water is compartmentalized by the protein gel network (Callaghan et al., 1983). This structural complexity explains the discrepancy observed between relaxation parameters obtained from the discrete and continuous fitting methods (Fig. 8).
Fig. 8: Bimodal distribution determined by MEM from the relaxation curves of a high fat cheese (A) and a low fat cheese (B). The Marquardt results are indicated as single bars.

T2 measurements were obtained with a CPMG sequence (845 points and interpulse spacing 500 μs) on a Minispec PC 120. Before the measurements, the cheese samples were heated at 40°C to melt the fat fraction. This temperature was maintained during the experiment with a thermostable NMR probe. Fig. 9 compares the amounts of fat and water determined by chemical analysis, MEM analysis and Marquardt analysis. From the Marquardt analysis, the relaxation curves are decomposed into four exponentials for both cheeses. The respective populations of these components cannot be related to the chemical composition of the cheeses. On the other hand, the relaxation distributions obtained from MEM analysis are bimodal (Fig. 8). The short relaxation time can be attributed to the water protons and the long relaxation time to the fat protons. The relative population of these modes is in close agreement with the ratio of fat and water content in cheese found by chemical analysis (Fig. 9, A1 and B1).
Fig. 9: Comparison between NMR and chemical determination of fat and water in high fat (A) and low fat (B) cheeses. The numbers 1, 2 and 3 are respectively the result of the chemical analysis, the MEM analysis and the Marquardt analysis.

4.2 Syneresis of milk gel

Our previous study on milk coagulation and syneresis (Tellier et al., 1993) has provided a reliable example of the effect of incomplete decay sampling on the interpretation of the relaxation parameters. After milk enzyme coagulation, the coagulum encloses the entire aqueous phase of the milk. A few minutes later at 40°C, the water components of the milk start to be expelled as the coagulum contracts; this phenomenon is called syneresis. During this process, the relaxation is analysed in terms of a biexponential, independently of the fitting method applied. The short relaxation time corresponds to the water trapped in the coagulum, and the long one is the relaxation time of the expelled whey. Due to the technical characteristics of the Minispec PC 110 the
number of sample points is fixed at 845 and the relaxation decay curve is therefore truncated. Fig. 10 reports the evolution of the relaxation time as a function of syneresis time. The agreement between the discrete and continuous methods is good for the evolution of the short component, but a discrepancy appears for the evolution of the long component. On the basis of the simulations reported in Fig. 7 we can now interpret the time evolution of the long relaxation time during the syneresis as a mathematical artefact of the discrete analysis: it has no physical meaning.
Fig. 10: Evolution of the water transverse relaxation times as a function of time at 40°C and 10 MHz. Open symbols refer to the component with short relaxation time determined by MEM analysis (triangles) and biexponential deconvolution (circles). Solid symbols refer to the component with long relaxation time determined by MEM analysis (triangles) and biexponential deconvolution (circles).

4.3 Droplet-size distribution in emulsion

In heterogeneous samples water molecules are confined in cavities for a gel and in droplets for an emulsion. In these cases relaxation time measurements can be sensitive to the morphology of the system, and pore or droplet size can be determined (D'Orazio et al., 1989; Davies and Packer, 1990; Davies et al., 1990; Whittall, 1991). The simplest physical model is the two-fraction fast-exchange model (Zimmerman and Brittin, 1957; D'Orazio et al., 1989), which assumes that there is fast exchange between bulk water and water molecules at the droplet surface, and slow exchange between droplets. The water relaxation can be expressed as:

1/T2 = 1/T2b + (λ/T2s)(S/V)    (17)

where T2b is the bulk water relaxation, T2s the surface water relaxation and λ the thickness of the water layer in interaction with the droplet surface S; V is the droplet volume. For the entire sample the magnetisation decay becomes:

I(t) = ∫₀^∞ P(d) exp[-t(1/T2b + mλ/(d·T2s))] dd    (18)

where P(d) is the pore size distribution function, d is the pore diameter and m is a constant which depends on the droplet geometry (m = 6 for a spherical geometry). If we know T2b and the ratio λ/T2s, P(d) can be determined from the MEM distribution of the relaxation times F(T). To verify this, we compared the droplet size distribution determined by NMR relaxation measurement and MEM analysis with the distribution measured by microscopy and image analysis. Different water-in-oil emulsions (water droplets in a fat liquid phase) were prepared with different mean pore sizes.
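Since eq. (18) associates a single diameter with each relaxation time, the MEM distribution F(T) converts point by point into P(d). A sketch of that conversion, using the calibration constants quoted in the Fig. 11 caption and assuming λ/T2s carries units of length per time:

```python
import numpy as np

def diameter_from_t2(T2, inv_T2b=0.445, lam_over_T2s=3.1e-7, m=6.0):
    """Invert the rate in eq. (18): 1/T2 = 1/T2b + m*(lambda/T2s)/d,
    hence d = m*(lambda/T2s) / (1/T2 - 1/T2b).
    inv_T2b (s^-1) and lam_over_T2s (assumed m/s) follow the Fig. 11
    caption; m = 6 for spherical droplets, since then S/V = 6/d."""
    surface_rate = 1.0 / np.asarray(T2, float) - inv_T2b
    return m * lam_over_T2s / surface_rate

# Example: T2 values of 0.5 s and 0.8 s map to micrometre-scale diameters
d = diameter_from_t2(np.array([0.5, 0.8]))
```

Longer T2 maps to larger droplets, as expected: a bigger droplet has proportionally less surface water, so its rate approaches the bulk rate 1/T2b.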
Fig. 11: Distribution of droplet sizes from MEM (solid circles), with λ/T2s = 3.1 × 10⁻⁷ s⁻¹ and 1/T2b = 0.445 s⁻¹, and image analysis histogram in volume. For the two emulsions shown, the NMR mean radii were 1.39 μm and 2.28 μm, and the image analysis mean radii 1.42 μm and 2.04 μm respectively.
Fig. 11 shows the NMR and image analysis droplet-size distribution results. For the small mean droplet-size emulsion, the analyses are in good agreement. The width of the dispersion and the mean diameter are, within experimental error, identical (the mean NMR radius is 1.39 μm and the mean image analysis radius is 1.42 μm). For the medium droplet-size emulsion, the dispersion widths are the same but the mean diameter from MEM is slightly overestimated. For larger mean radii, the level of heterogeneity of the emulsion increases. At this level the agreement with MEM analysis is poor. This disagreement can be explained using our model: as the diameter increases, the ratio between bulk water and surface water increases and the effect of the heterogeneity becomes insignificant. This determines the size limit for the NMR relaxation method to study sample heterogeneity.
5 CONCLUSION

The Maximum Entropy Method appears to be a powerful technique to analyse relaxation data. The method is robust against experimental constraints such as the signal to noise ratio and signal truncation. This method needs no assumptions for initial values and number of components. The analysis of the relaxation curve as a continuous distribution with MEM gives more realistic physical results which take into account the heterogeneity of most food samples. The analysis of multiexponential decay curves is the first step in the interpretation of the NMR relaxation data. The second step is the interpretation of the relaxation parameters on the basis of the physical model, which should be proposed according to the physical or chemical properties of the sample. However, in the literature the physical model has often been adjusted to the result of data analysis, which unfortunately depends on the numerical method used to fit the data. In order to avoid this problem, it is essential to test several numerical fitting methods before deciding which is the most realistic solution.

6 REFERENCES
Bellman, R.E., 1966. Numerical Inversion of the Laplace Transform. Elsevier.
Belton, P.S. and Hills, B.P., 1987. The Effects of Diffusive Exchange in Heterogeneous Systems on NMR Line Shapes and Relaxation Processes. Molecular Physics, 61, 4, 999-1018.
Bevensee, R.M., 1993. Maximum Entropy Solutions to Scientific Problems. Prentice Hall, New Jersey.
Callaghan, P.T., Jolley, K.W. and Humphrey, R.S., 1983. Diffusion of Fat and Water in Cheese as Studied by Pulsed Field Gradient Nuclear Magnetic Resonance. Journal of Colloid and Interface Science, 93, 521-529.
Davies, S. and Packer, K.J., 1990. Pore-size Distributions from Nuclear Magnetic Resonance Spin-lattice Relaxation Measurements of Fluid-saturated Porous Solids. I. Theory and Simulation. Journal of Applied Physics, 67, 3163-3170.
Davies, S., Kalam, M.Z., Packer, K.J. and Zelaya, F.O., 1990. Pore-size Distributions from Nuclear Magnetic Resonance Spin-lattice Relaxation Measurements of Fluid-Saturated Porous Solids. II. Applications to Reservoir Core Samples. Journal of Applied Physics, 67, 3171-3176.
D'Orazio, F., Tarczon, J.C., Halperin, W.P., Eguchi, K. and Misusaki, T., 1989. Application of Nuclear Magnetic Resonance Pore Structure Analysis to Porous Silica Glass. Journal of Applied Physics, 65, 742-751.
Frieden, B.R., 1972. Restoring with Maximum Likelihood and Maximum Entropy. Journal of the Optical Society of America, 62.
Gull, S.F. and Skilling, J., 1984. Maximum Entropy Method. In: Indirect Imaging, Roberts, J.A. (Ed.), Cambridge University Press, Cambridge, 267-279.
Halle, B., Anderson, T., Forsén, S. and Lindman, B., 1981. Protein Hydration from Water 17O Magnetic Relaxation. Journal of the American Chemical Society, 103, 500-508.
Hallenga, K. and Koenig, S.H., 1976. Protein Rotational Relaxation as Studied by Solvent 1H and 2H Magnetic Relaxation. Biochemistry, 15, 19, 4255-4264.
Hills, B.P., Takacs, S.F. and Belton, P.S., 1989. NMR Studies of Water Proton Relaxation in Sephadex Bead Suspensions. Molecular Physics, 67, 209-216.
Hills, B.P. and Le Floc'h, G., 1994. NMR Studies of Non-freezing Water in Cellular Plant Tissue. Food Chemistry, 51 (3), 331-336.
Hore, P.J., 1985. NMR Data Processing using the Maximum Entropy Method. Journal of Magnetic Resonance, 62, 561-567.
Jaynes, E.T., 1982. On the Rationale of Maximum Entropy Methods. Proc. IEEE, 70, 939-952.
Kroeker, R.M. and Henkelman, R.M., 1986. Analysis of Biological NMR Relaxation Data with Continuous Distributions of Relaxation Times. Journal of Magnetic Resonance, 69, 218-235.
Lillford, P.J., Clark, A.H. and Jones, D.V., 1980. Distribution of Water in Heterogeneous Food and Model Systems. In: Water in Polymers, ACS Symposium No. 127, 177-195.
Lin, Y.Y., Ge, N.H. and Hwang, L.P., 1993. Multiexponential Analysis of Relaxation Decays Based on Linear Prediction and Singular-Value Decomposition. Journal of Magnetic Resonance, Series A, 105, 64-71.
Livesey, A.K. and Brochon, J.C., 1987. Analysing the Distribution of Decay Constants in Pulse Fluorimetry using the Maximum Entropy Method. Biophysical Journal, 52, 693-706.
McWhirter, J.G. and Pike, E.R., 1978. On the Numerical Inversion of the Laplace Transform and Similar Fredholm Integral Equations of the First Kind. Journal of Physics A: Mathematical and General, 11, 1729-1745.
Marple, S.L., 1987. Digital Spectral Analysis with Applications. Prentice Hall.
Marquardt, D.W., 1963. An Algorithm for Least-Squares Estimation of Nonlinear Parameters. J. Soc. Indust. Appl. Math., 11, 2, 431-441.
Peemoeller, H., Yeomans, F.G., Kydon, D.W. and Sharp, A.R., 1986. Water Molecule Dynamics in Hydrated Lysozyme: A Deuteron Magnetic Resonance Study. Biophysical Journal, 49, 943-948.
Provencher, S.W., 1976. An Eigenfunction Expansion Method for the Analysis of Exponential Decay Curves. Journal of Chemical Physics, 64, 7.
Provencher, S.W., 1982. A Constrained Regularization Method for Inverting Data Represented by Linear Algebraic or Integral Equations. Computer Physics Communications, 27, 213-227.
Sibisi, S., 1983. Two-dimensional Reconstructions from One-dimensional Data by Maximum Entropy. Nature, 301, 134-136.
Skilling, J. and Bryan, R.K., 1984. Maximum Entropy Image Reconstruction: General Algorithm. Monthly Notices of the Royal Astronomical Society, 211, 111-124.
Stephenson, D.S., 1988. Linear Prediction and Maximum Entropy Methods in NMR Spectroscopy. Progress in Nuclear Magnetic Resonance Spectroscopy, 20, 515-626.
Tellier, C., Guillou-Charpin, M., Le Botlan, D. and Pelissolo, F., 1991. Analysis of Low-Resolution, Low-Field NMR Relaxation Data with the Padé-Laplace Method. Magnetic Resonance in Chemistry, 29, 164-167.
Tellier, C., Mariette, F., Guillement, J.P. and Marchal, P., 1993. Evolution of Water Proton Nuclear Magnetic Relaxation during Milk Coagulation and Syneresis: Structural Implications. Journal of Agricultural and Food Chemistry, 41, 2259-2266.
Whittall, K.P., 1991. Recovering Compartment Sizes from NMR Relaxation Data. Journal of Magnetic Resonance, 94, 486-492.
Xu, F., 1990. Reconstitution en Résonance Magnétique Nucléaire par la Méthode d'Entropie Maximum. Thèse, Université de Paris-Orsay.
Yeramian, E. and Claverie, P., 1987. Analysis of Multiexponential Functions without a Hypothesis as to the Number of Components. Nature, 326, 169-174.
Zimmerman, J.R. and Brittin, W.E., 1957. Nuclear Magnetic Resonance Studies in Multiple Phase Systems: Lifetime of a Water Molecule in an Adsorbing Phase on Silica Gel. Journal of Physical Chemistry, 61, 1328-1333.
Signal Treatment and Signal Analysis in NMR
Ed. by D.N. Rutledge
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 11

EXAMPLES OF THE USE OF PADÉ-LAPLACE IN NMR

D. Le Botlan
Laboratoire de RMN et Réactivité Chimique, URA CNRS 472, Faculté des Sciences, Nantes, FRANCE

1 INTRODUCTION
Although high resolution NMR spectroscopy has very often been used to study problems in the area of foodstuffs (Belton and Colquhoun, 1989; Hills and Belton, 1989) or in life science (Moser et al., 1989 A; Escanyé et al., 1984), low resolution NMR can also provide a wealth of information (Desbois and Le Botlan, 1994; Moser et al., 1989 B; Lee et al., 1992). In fact, for many heterogeneous products in this field, the solid parts perturb the homogeneity of the magnetic field B0, and the relatively long correlation times of the hydrogen atoms present in many of these samples imply short relaxation times T2 and therefore some very large peaks. Furthermore, it is often impossible to make use of solid state high resolution NMR for a number of these samples, because they would need to be rotated at a high frequency, and either it is very difficult to obtain a homogeneous distribution of the sample in the tubes, or the centrifugal force provokes a migration of the constituents. In low resolution NMR, the inhomogeneity of the field B0 is such that the width of the peaks that can be obtained no longer permits determination, and therefore exploitation, of the chemical shift. The relaxation times T1 and T2 are then chosen as the observed parameters. While in high resolution NMR the relaxation times, T1 most frequently, are determined from the heights of the peaks of particular sites, the values then being treated according to a monoexponential model, in low resolution NMR it is the raw signal, without treatment by Fourier Transform, which is exploited. Moreover, measurement of T2 relaxation times by high resolution NMR can pose so many problems that most of the time we make do with the value of the apparent spin-spin relaxation time T2* obtained from the width at half height of the average peaks, 1/T2* = π·Δν0.5
(Martin et al., 1980), whilst making use of CPMG sequences (Meiboom and Gill, 1958) is so much simpler on low resolution NMR spectrometers. Whether for a simple product such as a triglyceride or a complex product such as grain, a starch suspension or a gel, there is quite a large number of protons for which the relaxation time is different. The problem is then to know which is the best method to extract the most exploitable form of information from the relaxation curves. Taking into account the speed of acquisition of the CPMG relaxation curves, the sequence which allows the effects of the inhomogeneity of B0 to be compensated for, it is the T2 relaxation times which are most frequently measured. The problem then arises of determining T2 from these relaxation curves. In the products that were studied, the liquid phase is composed of water and/or fat. The most commonly tackled practical problems are the measurement of the level of humidity and/or the level of fat (by setting the temperature at 60-70°C, all the fat becomes liquid) (Brosio, 1983; Gambhir and Agarwala, 1985), and the state of the water and its quantification (Leung et al., 1976; Brosio et al., 1984); from this last parameter, the level of denaturation of proteins (Lambelet, 1988), the optimum level of drying (Monteiro-Marques et al., 1991) or the behaviour of a plant under stress at low temperatures (Abass and Rajashekar, 1991) can be deduced. In general, the results appear in the form of discrete values, the CPMG relaxation curves being analysed as the sum of 2 or 3 components:

I = I0 [ Pa exp(-t/T2a) + Pb exp(-t/T2b) + Pc exp(-t/T2c) ]    [1]

without indications being given on the actual number of distinct physical components; in some cases only the distribution of T2 is presented (Kroeker and Henkelman, 1986). However, in most cases, not only one sample is studied, but rather series of samples, according to a parameter such as the water content, the fat content, or the temperature. In this case, variation of one or more of the NMR parameters (T2 or P) is representative of an evolution in the sample, so such a treatment remains interesting. In this chapter the Padé-Laplace method will be applied to the decomposition of CPMG relaxation curves as a sum of discrete exponentials.

2 EXPERIMENTAL CONDITIONS

2.1 Baseline

The first problem when treating CPMG relaxation curves by the Padé-Laplace method (Yeramian and Claverie, 1987; Claverie et al., 1989; Tellier et al., 1991) is to know whether it is required for the curve to reach the baseline. In actual fact, knowledge of the baseline of the signal would not be theoretically necessary if a constant component is used (Aubard et al., 1987). A starch-water suspension CPMG curve (pulse spacing τ = 0.6 ms, 169 measurements every 2 echoes) has been used as a model. As can be seen in the amplified view (Fig. 1), the baseline is reached at about t = 90 ms (150th value; this point corresponds to 6 times the T2max of the T2 distribution determined by the Contin program), although, from the general shape, it appears to be reached at about t = 50 ms. Five treatments have been performed with 169, 150, 93, 80 and 65 points
(such that, for the last three cases, the last value corresponds to 0.5%, 1% and 3.5% of the first point of the curve). It can be seen in Table 1 that although the FIT value, given by

FIT = 100[1 - (Σ_i (f(t_i) - y_i)² / (N - 2n))^0.5]    [2]

is better with 169 values, the NMR parameters are not very different when fewer points
Fig. 1: CPMG relaxation curve of a starch-water suspension; B: amplified view of the last 50 points.

Table 1: NMR parameters obtained with the Padé-Laplace program according to the percentage, P, of the CPMG relaxation curve treated; N, corresponding number of points; FIT, see the text.

  N    P %    T2a ms   T2b ms   T2c ms   Pa %   Pb %   Pc %   FIT %
 169   100     113      6.44     1.83    54.8   17.2   27.9   99.908
 150   100     115      6.46     1.76    52.2   19.9   27.9   99.906
  93    99.5   116      6.72     1.77    50.2   21.8   28.0   99.895
  80    99.0   114      6.28     1.74    53.1   19.1   27.8   99.825
  65    96.5   106       --      2.2     67.2    --    32.8   99.495
are used, except for the last case (P = 96.5%), from which we only got two components. With the Iteratif program (Tellier et al., 1991), using the Padé-Laplace results as a starting point, the FIT values are better for each case (Table 2), but the NMR parameters are much more sensitive, especially the short component T2c (-8%) and Pc (-10%); the interpretation of the first two components will be tackled later.

Table 2: NMR parameters obtained with the Iteratif program according to the percentage, P, of the CPMG relaxation curve treated; N, corresponding number of points; *, method does not converge.
  N    P %    T2a ms   T2b ms   T2c ms   Pa %   Pb %   Pc %   FIT %
 169   100     14.3     9.0      1.78    18.4   52.0   29.6   99.958
 150   100     13.7     8.7      1.77    23.5   47.2   29.3   99.957
  93    99.5   11.7     6.27     1.66    51.4   21.4   27.1   99.955
  80    99.0   11.5     5.80     1.64    54.2   19.2   26.6   99.938
  65    96.5   *
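The FIT criterion of eq. [2], quoted in Tables 1 and 2, can be computed as follows. The printed formula is partly illegible, so the normalisation of the root-mean-square residual by the mean signal is an assumption made here so that the result comes out as a percentage.

```python
import numpy as np

def fit_percent(y, f, n_components):
    """FIT = 100 * (1 - sqrt(sum((f_i - y_i)^2) / (N - 2n)) / mean(y)),
    with N data points and n exponential components (2n parameters).
    The division by mean(y) is an assumption; the published formula
    may differ in that normalisation."""
    N = y.size
    rms = np.sqrt(np.sum((f - y) ** 2) / (N - 2 * n_components))
    return 100.0 * (1.0 - rms / np.mean(y))

# A perfect fit gives FIT = 100 %
t = np.linspace(0.0, 0.3, 169)
y = 1000.0 * np.exp(-t / 0.1)
```

A fit that matches the data exactly returns 100; residuals comparable to the noise pull the value just below 100, in line with the 99.x% entries of the tables.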
2.2 Measurements

We are now led to the problem of the conditions necessary to acquire the CPMG curves. For many of the products studied, there were found to be short T2 values of the order of milliseconds, corresponding to "bound" water for example, and long T2 values of the order of hundreds of milliseconds for liquid fat and/or free water (as in suspensions). If the measurement step is large, the rapid decrease is insufficiently sampled; if the step is too small, the baseline is not reached due to the limited memory of the NMR apparatus used, a Minispec (Bruker). The solution that we found, and proposed in the Bruker software for driving the Minispec from a PC, is to use the same pulse spacing but to take measurements every 2, 4 and 10 echoes (Le Botlan and Helie, 1994). Then the 3 relaxation curves can be perfectly superimposed and can be merged, which is not the case when different pulse spacing times are used, because of chemical exchange, molecular diffusion (Hills et al., 1990), or pulse imperfection. However, when the noise from the treatment of these curves is considered, it can be seen that the residues are not distributed in the same way over the time span, because of the greater weight of the first part of the curve in the calculation. This problem can be avoided by writing the pulse sequence in the following way:
~1 "measurement - x 1 - 180~
~2 " 180~ -- ---. X2)n.
In this way the relaxation curve is sampled from 2 x2 every 10 echoes, that is to say with a step of 20 r2. Then, by changing the position of the measurement in the sequence to 4 x2, then to 6 x2 etc., a data set of 845 (169,5) values is obtained; so a data set of measurements every even echo is obtained from the second echo to the 1690 th one.
2.3 Offset value

2.3.1 Origin of the non-null offset value
When the value of the horizontal asymptote of the CPMG curves is determined, it is noticed that its value is different from the offset of the amplifier and, for the same product, it changes with the pulse spacing, whatever the nature of the liquid phase, water or oil, of the sample. The evolution of the CPMG baseline for a starch suspension is presented in Table 3, and with an oil sample, asymptotic values of -63.7 (τ = 400 μs) to -11.0 (τ = 4000 μs) have been measured, whilst their standard deviations were identical (1.1 and 1.24). This problem has already been tackled by Hughes and Lindblom (1974) concerning high resolution NMR CPMG curves, and a modelisation of the magnetisation drift has been propounded. According to the authors, "the baseline drift causes no error in the measured value of T2, provided the equilibrium CPMG baseline is taken as the zero".

Table 3: Horizontal asymptote (HA) of CPMG relaxation curves according to the pulse spacing τ used.

 τ (μs)     200     250     300     350     400     500    1000
 HA (au)   -166   -156.3  -146.6  -136.6  -128.1  -109.8   -47.3
However, if we aim to optimise the smoothing of the curve by an iterative process on the offset value, the optimum value is always different from both the amplifier offset and the CPMG curve asymptote (table 4).

Table 4: NMR parameters obtained by using 3 offsets: a) the amplifier offset; b) the horizontal asymptote of the CPMG relaxation curve; c) the value giving the minimum standard deviation σ.

   Offset (a.u.)   σ     I0 (a.u.)   T2a (ms)   Pa (%)   T2b (ms)   Pb (%)
a) -32.0           4.8   1728        10.3       67.3     1.51       32.6
b) -44.5           14    1746        9.3        73.1     1.05       26.8
c) -27.0           3.7   1724        10.7       65.5     1.66       34.5
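The iterative offset optimisation of case (c) can be sketched as follows. This is an illustrative simulation with hypothetical parameter values, not the software actually used:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
t = np.arange(0.2, 60.0, 0.2)                       # time (ms)
true_offset = -27.0
y = (1160 * np.exp(-t / 10.3) + 560 * np.exp(-t / 1.6)
     + true_offset + rng.normal(0.0, 3.5, t.size))  # synthetic CPMG curve

def biexp(t, a, t2a, b, t2b):                       # two-component decay
    return a * np.exp(-t / t2a) + b * np.exp(-t / t2b)

best = None
for off in np.arange(-45.0, -10.0, 0.5):            # scan trial offsets
    try:
        p, _ = curve_fit(biexp, t, y - off, p0=(1000, 8, 500, 2), maxfev=5000)
    except RuntimeError:
        continue
    sigma = np.std(y - off - biexp(t, *p))          # residual sigma
    if best is None or sigma < best[0]:
        best = (sigma, off)

print(best[1])   # retained offset, close to the true value of -27.0
```

The residual standard deviation is smallest when the trial offset matches the true baseline, since a constant left in the data cannot be absorbed by the biexponential model.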
We can see that the signal is not null after a 180° pulse along either the Ox or Oy axes (fig. 2), and varies with the sample height; nor is it null after the last 180° pulse of a CPMG sequence, even though the baseline is reached (fig. 3). The problem of the efficiency of these 180° pulses is evident when measuring the spin-lattice relaxation time T1. By applying the relation I = I0[1 - β exp(-t/T1)] (β = 1 - cos θ; β = 2 if θ = 180°) (Gerhards and Dietrich, 1976) to the measurements obtained with the inversion-recovery sequence, we always get an angle of less than 180° (Le Botlan et al., 1983). In the same way, Meiboom and Gill (1958) modified the Carr-Purcell sequence in view of this problem. By applying a 90° phase to
[Figs. 2/3: FID after a 180° pulse, in arbitrary units]

under conditions of fast diffusive averaging of the magnetisation density so that the observed relaxation is single exponential. As we
have seen, whenever the NMR observation timescale is shorter than the diffusive timescale the relaxation becomes, in general, multiple exponential. In contrast, multiple activity coefficients are never observed because the observational timescale for equilibrium thermodynamic quantities is essentially infinite. This distinction is important because it may explain some anomalies in the effect of reduced water content on the growth and death of microorganisms in porous media such as foods. A bacterial cell has a typical size of the order of 2-4 microns, and in a porous medium is confined to a particular pore or capillary. The cell does not therefore respond to the volume averaged water activity but to the local water activity in the pore in which it is located. If water is removed by capillary forces from that pore before it is removed from other smaller pores in the matrix then the local activity sensed by the cell will be much lower than the volume averaged activity measured by e.g. equilibrium vapour pressure methods. In this circumstance NMR water proton relaxation time distributions may be a more useful sensor of microbiological survival and growth rates than conventional water activity measurements.
Fig. 16: The equilibrium sorption isotherm for randomly packed beds of Sephadex G25-50 and G25-300 microspheres at 298K. The solid line is the fit of equation (22), the dotted line the best fit of the B.E.T. isotherm.
5 ELECTRICAL CONDUCTIVITY AND NMR WATER RELAXATION

According to Archie's law (D'Orazio et al., 1990), the electrical conductivity σ of a porous medium is related to its water content as

σ/σw = φ^p S^q        (25)

with σw the conductivity of the saturating electrolyte,
where φ is the porosity (the ratio of the total void volume to the total volume), S is the degree of saturation, and p and q are constants of the order of 2-3. Archie's law has been derived for water-saturated beds by purely geometric arguments; the extension to unsaturated systems is, however, empirical.
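As a minimal numerical illustration of Archie's law (equation 25), with hypothetical values φ = 0.4 and p = q = 2:

```python
phi, p, q = 0.4, 2.0, 2.0       # porosity and Archie exponents (assumed)

def conductivity_ratio(S):
    """sigma / sigma_w predicted by Archie's law for saturation S."""
    return phi**p * S**q

for S in (1.0, 0.5, 0.25):
    print(S, conductivity_ratio(S))
# With q = 2, halving the degree of saturation quarters the conductivity.
```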
[Fig. 17 plot: log of the conductivity ratio versus -log(Wy/Wy,sat) for beds of Sephadex G25-300, silica 40-60 µm and silica 130-250 µm]
Fig 17: The dependence of the conductivity ratio on water content for the indicated types of randomly packed beds. The lines show the fit of Archie's law (equation 25) to the data.

Figure 17 shows that Archie's law is reasonably well obeyed for many porous systems containing varying amounts of a dilute potassium chloride solution. If Archie's law is combined with equation (23), we deduce that the water proton relaxation rates are related to conductivity as

σ/σsat = [(ysat - yb)/(y - yb)]^q        (26)

where y denotes the water proton relaxation rate and yb its value in bulk water,
and where ysat is the relaxation rate in the water-saturated system. Equation (26) remains valid as long as the exchange is fast on the NMR timescale and the relaxation is single exponential. Figure 18 shows that the single-exponential water proton transverse relaxation times for beds of Sephadex G25-50 conform satisfactorily to equation (26). When the NMR exchange is slow, the changing intensities of the relaxation time peaks are not so obviously related to conductivity, and much work remains to be done in elucidating the relationship.
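A minimal numerical illustration of equation (26), with hypothetical relaxation rates and q = 2, assuming fast exchange:

```python
q = 2.0          # Archie saturation exponent (assumed value)
y_b = 0.4        # bulk water relaxation rate 1/T2b (s^-1, hypothetical)
y_sat = 2.0      # rate in the water-saturated bed (s^-1, hypothetical)
y = 8.0          # rate in a partially dried bed (s^-1, hypothetical)

ratio = ((y_sat - y_b) / (y - y_b)) ** q   # predicted sigma / sigma_sat
print(ratio)     # conductivity falls as water is removed (ratio < 1)
```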
[Fig. 18 plot: log of the conductivity ratio versus log((1/T2 - 1/T2b)/(1/T2,sat - 1/T2b)) for beds of silica 40-60 µm and Sephadex G25-300]

Fig 18: The dependence of electrical conductivity on water proton relaxation rates for the indicated randomly packed porous beds. The lines are the fits of equation (26) to the data.
6 CONCLUSION

The above discussion shows that water relaxation in heterogeneous materials can be extremely complicated, and elucidating the relationship between microstructure and water relaxation will remain a challenge for many years to come. The discussion in this Chapter has centred on water proton relaxation. However, many of the principles also apply to deuterium (in D2O) and oxygen-17 (in H2 17O) relaxation. Indeed, there is much to be gained by a combined multinuclear approach to water relaxation, since the three nuclei have different relaxation pathways. Deuterium and oxygen-17 relaxation is dominated by intramolecular rotational modulation of the electric quadrupole interaction. However, chemical exchange can also contribute to the deuterium relaxation, but not to the oxygen-17 relaxation (at least not under conditions of proton decoupling). A
comparative multinuclear study can therefore, in principle, reveal the relative importance of chemical exchange and of intermolecular dipole interactions to the water relaxation. Another area that has not been addressed in this Chapter is the relationship between water proton relaxation and MRI. By combining the relaxation pulse sequences with imaging sequences it is possible to produce spatially resolved relaxation time maps. If the spatial resolution is low (e.g. millimeters), water diffusion between volume elements (voxels) in these maps can be neglected and the previous theory can be used to interpret the relaxation behaviour of each voxel in the image. However, in NMR microscopy voxel sizes can be as small as a few microns and water diffusion between neighbouring voxels during the imaging can no longer be ignored. When this occurs the observed MRI relaxation behaviour will depend on image resolution. Elucidating this relationship remains another challenge for the future.
7 APPENDIX

The two-site chemical exchange expressions are:

1/T2 = (1/2)α+ - (1/2τ) ln Λ        (27)

ln Λ = ln[(D+ cosh²ξ - D- cos²η)^(1/2) + (D+ sinh²ξ + D- sin²η)^(1/2)]        (28)

2D± = ±1 + (ψ + 2δω²)/(ψ² + ζ²)^(1/2)        (29)

ξ = (τ/√2)[+ψ + (ψ² + ζ²)^(1/2)]^(1/2)        (30)

η = (τ/√2)[-ψ + (ψ² + ζ²)^(1/2)]^(1/2)        (31)

ψ = (α-)² - δω² + 4/(τaτb)        (32)

ζ = 2 δω α-        (33)

α- = 1/T2a - 1/T2b + 1/τa - 1/τb        (34)

α+ = 1/T2a + 1/T2b + 1/τa + 1/τb        (35)

δω = ωb - ωa        (36)

where: T2a and T2b are the transverse relaxation times of the a and b sites, respectively; τa and τb are the lifetimes on the a and b sites, respectively, such that Pa/τa = Pb/τb = 1/(τa + τb); and τ is the pulse spacing of the CPMG sequence.
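These expressions (essentially the Carver and Richards two-site result) can be transcribed directly into code. The sketch below is illustrative, not from the original text; the symmetric-site check at the end verifies that the exchange contribution vanishes when δω = 0:

```python
import numpy as np

def r2_cpmg(tau, t2a, t2b, tau_a, tau_b, dw):
    """Effective transverse relaxation rate 1/T2 for two-site exchange.
    tau: CPMG pulse spacing (s); dw = omega_b - omega_a (rad/s)."""
    a_minus = 1/t2a - 1/t2b + 1/tau_a - 1/tau_b        # (34)
    a_plus = 1/t2a + 1/t2b + 1/tau_a + 1/tau_b         # (35)
    psi = a_minus**2 - dw**2 + 4/(tau_a * tau_b)       # (32)
    zeta = 2 * dw * a_minus                            # (33)
    root = np.sqrt(psi**2 + zeta**2)
    d_p = 0.5 * (1 + (psi + 2*dw**2)/root)             # D+ (29)
    d_m = 0.5 * (-1 + (psi + 2*dw**2)/root)            # D- (29)
    xi = (tau/np.sqrt(2)) * np.sqrt(psi + root)        # (30)
    eta = (tau/np.sqrt(2)) * np.sqrt(root - psi)       # (31)
    ln_lam = np.log(np.sqrt(d_p*np.cosh(xi)**2 - d_m*np.cos(eta)**2)
                    + np.sqrt(d_p*np.sinh(xi)**2 + d_m*np.sin(eta)**2))  # (28)
    return 0.5*a_plus - ln_lam/(2*tau)                 # (27)

# Identical sites and no shift difference: 1/T2 reduces to the intrinsic rate.
print(r2_cpmg(1e-3, 0.1, 0.1, 0.01, 0.01, 0.0))   # ~10.0 s^-1
```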
8 REFERENCES
Barker, G.C. and Mehta, A., 1992. Vibrated powders: structure, correlations and dynamics. Phys. Rev. A, 45, 3435-3446.
Belton, P.S. and Hills, B.P., 1987. The effects of diffusive exchange in heterogeneous systems on NMR lineshapes and relaxation processes. Molec. Phys., 61, 999-1018.
Brownstein, K.R. and Tarr, C.E., 1979. Importance of classical diffusion in NMR studies of water in biological cells. Phys. Rev. A, 19, 2446-2453.
Brunauer, S., Emmett, P.H. and Teller, E., 1938. Adsorption of gases in multimolecular layers. J. Am. Chem. Soc., 60, 309.
Carver, J.P. and Richards, R.E., 1972. A general two-site solution for the chemical exchange produced dependence of T2 upon the Carr-Purcell pulse spacing. J. Magn. Reson., 6, 89-105.
Davis, S. and Packer, K.J., 1990. Pore-size distributions from NMR spin-lattice relaxation measurements of fluid-saturated porous solids. J. Applied Physics, 67, 3163-3170.
D'Orazio, F., Bhattacharja, S., Halperin, W.P., Eguchi, K. and Mizusaki, T., 1990. Molecular diffusion and NMR relaxation of water in unsaturated porous silica glass. Phys. Rev. B, 42, 9810.
Franks, F., 1991. Water activity: a credible measure of food safety and quality? Trends in Food Sci. and Technology, 68-72.
Goldman, M. and Shen, L., 1966. Spin-spin relaxation in LaF3. Phys. Rev., 144, 321-331.
Grad, J. and Bryant, R.G., 1990. Nuclear magnetic cross-relaxation spectroscopy. J. Magn. Reson., 90, 1-8.
Hills, B.P., 1990. NMR relaxation studies of proton exchange in methanol-water mixtures. J. Chem. Soc. Faraday Trans., 86, 481-487.
Hills, B.P., 1992a. The proton exchange cross-relaxation model of water relaxation in biopolymer systems. Molec. Phys., 76, 489-508.
Hills, B.P., 1992b. The proton exchange cross-relaxation model of water relaxation in biopolymer systems II. The sol and gel states of gelatine. Molec. Phys., 76, 509-523.
Hills, B.P. and Babonneau, F., 1994. A quantitative study of water proton relaxation in packed beds of porous particles with varying water content. Magnetic Resonance Imaging, 12, 909-922.
Hills, B.P., Belton, P.S. and Quantin, V.M., 1993. Water proton relaxation in heterogeneous systems. I. Saturated randomly packed suspensions of impenetrable particles. Molec. Phys., 78, 893-908.
Hills, B.P., Cano, C. and Belton, P.S., 1991. Proton NMR relaxation studies of aqueous polysaccharide systems. Macromolecules, 24, 2944-2950.
Hills, B.P. and Duce, S.L., 1990. The influence of chemical and diffusive exchange on water proton transverse relaxation in plant tissues. Magnetic Resonance Imaging, 8, 321-331.
Hills, B.P. and LeFloc'h, G., 1994a. NMR studies of non-freezing water in cellular plant tissue. Food Chemistry, 51, 331-336.
Hills, B.P. and LeFloc'h, G., 1994b. NMR studies of non-freezing water in randomly packed beds of porous particles. Molec. Phys., 82, 751-763.
Hills, B.P. and Pardoe, K., 1995. Proton and deuterium NMR studies of the glass transition in a 10% water-maltose solution. J. Molec. Liquids (in press).
Hills, B.P. and Quantin, V.M., 1993. Water proton relaxation in dilute and unsaturated suspensions of non-porous particles. Molec. Phys., 79, 77-93.
Hills, B.P. and Snaar, J.E.M., 1992. Dynamic q-space studies of cellular plant tissue. Molec. Phys., 76, 979.
Hills, B.P., Takacs, S.F. and Belton, P.S., 1989. The effects of proteins on the proton NMR transverse relaxation times of water. I. Native bovine serum albumin. Molec. Phys., 67, 903-918.
Hills, B.P., Wright, K.M. and Belton, P.S., 1989a. NMR studies of water proton relaxation in Sephadex bead suspensions. Molec. Phys., 67, 193-208.
Hills, B.P., Wright, K.M. and Belton, P.S., 1989b. Proton NMR studies of chemical and diffusive exchange in carbohydrate systems. Molec. Phys., 67, 1309-1326.
Kenwright, A.M. and Packer, K.J., 1990. On T1 cancellation schemes in Goldman-Shen type experiments. Chem. Phys. Letts., 173, 471-475.
Koenig, S.H., Bryant, R.G., Hallenga, K. and Jacob, G.S., 1978. Magnetic cross-relaxation among protons in protein solutions. Biochemistry, 17, 4348-4358.
Neuhaus, D. and Williamson, M., 1989. The Nuclear Overhauser Effect. VCH.
Otting, G., Liepinsh, E. and Wuthrich, K., 1991. Protein hydration in aqueous solution. Science, 254, 974-985.
Provencher, S.W., 1982. A constrained regularization method for inverting data represented by linear algebraic or integral equations. Comput. Phys. Commun., 27, 213.
Santyr, G.E., Henkelman, R.M. and Bronskill, M.J., 1988. Variation in measured transverse relaxation in tissue resulting from spin locking with the CPMG sequence. J. Magn. Reson., 79, 28-44.
Torrey, H.C., 1956. Bloch equations with diffusion terms. Phys. Rev., 104, 563-565.
Wu, J.Y., Bryant, R.G. and Eads, T.M., 1992. Detection of solid-like components in starch using cross-relaxation and Fourier transform wide-line 1H NMR methods. J. Agric. Food Chem., 40, 449-455.
Signal Treatment and Signal Analysis in NMR
Ed. by D.N. Rutledge
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 13

SCATTERING WAVEVECTOR ANALYSIS OF PULSED GRADIENT SPIN ECHO DATA

Andrew Coy and Paul T. Callaghan
Department of Physics, Massey University, Palmerston North, New Zealand.

1 THE PULSED GRADIENT SPIN ECHO METHOD
The pulsed gradient spin echo (PGSE) nuclear magnetic resonance (NMR) technique has been widely used since the late 1960's (Stejskal and Tanner, 1965, Stejskal, 1965) to measure molecular self-diffusion coefficients. In recent years the applications of this technique have been expanded (Callaghan et al., 1990, Cory and Garroway, 1990, Callaghan, 1991) to include a number of more complex examples of molecular motion. In this chapter we focus on two specific examples which present interesting problems in signal and data analysis. These are the case of restricted diffusion involving a scattering or diffraction analysis (Callaghan, 1991, Callaghan et al., 1991, Coy and Callaghan, 1994a, Coy and Callaghan, 1994b), which is dealt with in section 2, and the case of heterogeneous motion involving imaging analysis (see Mansfield and Morris, 1982 and Callaghan, 1991). This latter example is dealt with in section 3. A typical PGSE pulse sequence is shown in figure 1. The spins are disturbed from equilibrium by the first 90° pulse and are then dephased by the first gradient pulse, where the amount of dephasing depends on the position of the spin in the sample at the time when the gradient pulse was applied. (At this point it should be noted that any motion of the spin during the gradient pulse can also affect the amount of spin dephasing.) The 180° pulse, at time τ, then inverts all spin phases so that the effect of the second gradient pulse is to restore the initial phases of all the spins. Any motion of the spins during the period Δ between the two gradient pulses will result in a difference in the phase shifts generated by the two gradient pulses. This time interval Δ is often referred to as the diffusion time. A distribution of phase shifts throughout all spins in the sample will then cause an attenuation of the echo formed at time 2τ. By observing the echo attenuation, E, as a function of
[Fig. 1: a typical PGSE pulse sequence, showing the 90°x and 180°y r.f. pulses, the pulsed field gradients, and the echo centre at time 2τ]

As long as the value of the FID at t = 0+ is not modified by the window, integrals are not changed, but the very wide "wings" of the Lorentz shape are removed.
For a given integration interval, the bias is much diminished. Thus, η = 0.98 for a pure gaussian shape. When the parameter α is large, the line acquires a characteristic shape, with negative "feet": even in that case, the integral value is not seriously altered.
4.3 The full width-height estimator

When base-line flattening or deconvolution are not applicable, a simple and accurate estimation of the area of a lorentzian line can still be obtained by computing the full width-amplitude product. For the lorentzian of full width 2λ, the amplitude is L(0) = A/πλ, and the area is A = (π/2)(2λ)(A/πλ). This amplitude-width estimator has been shown to be rather immune to phase errors (Weiss et al., 1983), up to an error of 60°, but its behaviour in the presence of noise was not investigated. At this point, one should take up the problem of overlapping lines. In the author's opinion, it has no convincing solution when straightforward integration is to be used. An expert system has been described (Chow et al., 1993); it is claimed that this program can automatically quantify in vivo 31P NMR spectra as reliably as a human spectroscopist. The program fits each line to an asymmetrical triangle shape. Holak et al. (1987) showed that NOESY cross-peak volumes could be well approximated from the scaled product of two integrals, one along f1, the other along f2; by a suitable choice of row and column, the interference from a second peak may be minimized.

5. NOISE IN NMR SPECTROMETERS

Noise in an NMR spectrometer is almost always taken as additive, stationary, gaussian and of zero mean. These hypotheses are very convenient for theoretical derivations, but experimental support is scant. In fact, the first demonstration of the gaussian property is quite recent (Rouh et al., 1994). There are some indications that spectrometer noise is not stationary: the uncertainty on peak heights was found to depend on the recovery delay used in a T1 measurement (Skelton et al., 1993). Except in Ernst's classic paper (Ernst, 1966), the effects of non-white noise have never been investigated for any data processing scheme in NMR, even though 1/f noise, for instance, occurs often and can be easily simulated (Laeven et al., 1984).
In fact, the noise is both band-limited (by an input filter of pass-band approximately equal to the spectral width Ws) and time-limited (by the acquisition time Ta). Taking these features rigorously into account leads to computational difficulties. Authors have therefore assumed that the noise is unbounded in time or frequency or both.

5.1 Peak-picking routines

A knowledge of the statistical properties of the noise is necessary in order to design automatic peak-picking routines. Commercial software packages usually assume that no signals are present in the first and last 5 or 10 percent of the spectral width, compute the standard deviation of the noise in these regions, and label as "peak" any feature that rises more than three standard deviations above the baseline. A parabola may then be fitted by least-squares to the central part of the line, to obtain an accurate value of the resonance frequency. Such a procedure will not perform satisfactorily without a clean base-line and
can hardly be generalized to multidimensional spectra, because spikes, ghosts, and t1 ridges will give rise to a large number of false resonances. The first defect can be addressed by first removing base-line artifacts as described above, or by using algorithms that correct the base-line simultaneously with peak recognition (Chylla and Markley, 1993). The second problem is partially solved by using additional information to reject spurious peaks. Statistical criteria may be used (Mitschang et al., 1991, Rouh et al., 1993, Rouh et al., 1994). Information about the width or the symmetry of the cross-peaks can be used (Kleywegt et al., 1990, Rouh et al., 1994).

6. PRECISION OF INTEGRALS

In this section, we consider random errors in the integral caused by noise superimposed on the spectrum. This topic was first treated by Ernst (1966) in the case of continuous wave NMR, then independently by Smit and coworkers for chromatography signals (Smit and Walg, 1975, Laeven and Smit, 1985, Smit, 1990), by Weiss and collaborators (Weiss and Ferretti, 1983, Weiss et al., 1987, 1988, Ferretti and Weiss, 1991) and by Nadjari and Grivet (1991, 1992).
6.1 Frequency domain approach

Either of two approaches can be taken to compute the standard deviation of the integral of the noise: in the frequency or in the time domain. Let us begin with the more direct method, using the frequency domain. We wish to integrate some line between the limits νinf = ν0 - Δ/2 and νsup = ν0 + Δ/2. The integral of the region of interest is then I(Δ) = J(Δ) + K(Δ), the sum of signal and noise contributions. The expectation value of I is J, because we assume that the noise has zero mean. The standard deviation of I is entirely due to the fluctuating quantity K, that is σI² = σK². It can be shown (Papoulis, 1966, Nadjari, 1992) that σK² reduces to:

σK² = ∫_(-Δ)^(+Δ) ΓR(µ) (Δ - |µ|) dµ        (1)
where ΓR is the real part of the complex noise correlation function; it may be obtained from the real part of the Fourier transform of the noise intensity q(t) in the time domain. The effect of windows appears through the function q(t). If we use no filter, q(t) = q0 times the unit rectangle from 0 to Ta. Then ΓR(µ) = q0 Ta sinc(2πµTa) for white noise and σK² = (q0Δ/π) Si(ΔTa), where Si stands for the sine integral. The quantity ΔTa is often much larger than unity, and σK² reduces to q0Δ/2.

6.2 Time domain approach

Another relation may be obtained (Ernst, 1966) by working in the time domain. The integral of the region of interest can be considered as the integral of x(ν) rect(ν - ν0; Δ), where the second factor is a unit rectangle extending from νinf to νsup. The integral of any function in the frequency domain is the value, at t = 0, of its inverse Fourier transform.
In the present case:

FT⁻¹{x(ν) rect(ν - ν0; Δ)} = s(t) ⊗ [Δ sinc(πΔt) exp(-2iπν0t)]        (2)

where FT⁻¹{...} stands for the inverse Fourier transform and ⊗ is the symbol for a convolution product; s(t) is the FID (signal plus noise, possibly multiplied by a window function) and sinc(x) = sin(x)/x. The value of the convolution product at t = 0 is the integral that we seek. Here again, we are interested in the standard deviation of I or of K, the integral of the noise, which can be obtained by retaining only the noise component b(t) of s(t). In the case of white noise, multiplied by the window function f(t), the variance of K is (Nadjari, 1992):

σK² = q0 ∫_0^∞ f²(t) Δ² sinc²(πΔt) dt        (3)
If we do not use a filter, f(t) = 1, the integral of sinc² is 1/2, and σK² = q0Δ/2, identical to the previous result. The standard deviation of the integral is proportional to the square root of the integration interval. It is useful to define the signal to noise ratio for the integral, rI = I(Δ)/σK. On the other hand, the (usual) signal to noise ratio is rs = A/σν, if A is the amplitude of the peak and σν the standard deviation of the noise in the frequency domain. For an unfiltered lorentzian line, rs = (A/πλ)/√(q0Ta). The ratio rI/rs is then:

rI/rs = √(πTa/T2) · arctan(Δ/2λ) / √(Δ/2λ)        (4)
The function arctan(x)/√x has a maximum value of 0.8 for x = 1.4. Several points of interest emerge from a consideration of rI or of the ratio rI/rs. (i) Since Ta is often several times T2*, rI/rs is often slightly larger than unity, showing that peak integration performs somewhat better than measuring the peak height. (ii) There is an optimum integration range, Δ = 2.8λ. However, this value will entail a large systematic error. (iii) The above formula can be used to determine the signal to noise ratio (i.e. the number of scans) required in order to achieve a given precision on the integral. As mentioned previously, the area of a constant-width signal can often be replaced with advantage by the signal amplitude. It has been reported that peak amplitudes, rather than integrals, lead to better precision in T1 determinations (Akke et al., 1993), although no explanation was offered. It can be surmised that systematic errors due to a faulty baseline or to very low frequency ("flicker" or 1/f) noise will be less important for amplitude than for area measurements.
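Two of the numerical statements above are easy to verify (an illustrative check): with f(t) = 1 the integral in equation (3) reduces to q0Δ/2, because the integral of Δ²sinc²(πΔt) over t ≥ 0 equals Δ/2, and the ratio rI/rs is proportional to arctan(x)/√x, which peaks near x = 1.4 with a maximum of about 0.8:

```python
import numpy as np

delta = 40.0                                  # integration range (Hz)
dt = 2.5e-5
t = np.arange(1, 2_000_000) * dt              # t > 0, up to 50 s
integrand = delta**2 * np.sinc(delta * t)**2  # np.sinc(x) = sin(pi x)/(pi x)
val = integrand.sum() * dt + 0.5 * delta**2 * dt   # trapezoid rule from t = 0
print(round(val, 2))                          # ~ delta/2 = 20.0

x = np.linspace(0.01, 10, 100_000)
f = np.arctan(x) / np.sqrt(x)
print(round(x[np.argmax(f)], 1), round(f.max(), 2))   # ~1.4 and ~0.8
```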
6.3 The effects of windows

The previous formalism is easily extended to any type of window: exponential, gaussian or sinusoidal, for instance. Theoretical results have often been supplemented by computer simulations. Some authors have chosen to incorporate the uncertainty due to noise and the systematic error due to a finite integration interval into a single "error term", obtained either by summing the two contributions (Smit and Walg, 1975) or by a mean square combination (Ferretti and Weiss, 1991). The main conclusion to be drawn is that the use of filters is ineffective in increasing the precision of an integral. The main reason is that computing an integral is of itself equivalent to filtering the signal. The transfer function is H(ω) = 1/iω and the high frequency components of the noise are effectively strongly attenuated. Another, equivalent explanation is that the integral of the whole spectrum is equal to the value of the FID at time 0+, which is kept constant for all filters. Various windows differ in the way they modify linewidths. Best results will be obtained with procedures that diminish linewidths and prevent peak overlap: the Lorentz-Gauss deconvolution is thus recommended.

6.4 Two-dimensional spectra

Some two-dimensional experiments are designed to provide quantitative information as a function of two frequencies. The volume of a NOESY cross-peak depends on the distance between the two relevant nuclei. In the case of the EXSY experiment, the peak volume is a function of the rate constants of the reactions carrying the nucleus between the corresponding sites. Relative concentrations can also be obtained from COSY cross-peaks. The two-dimensional signal is made up of a series of FID's, recorded as functions of t2, for various values of the parameter t1. This array is then interpreted as a sampled version of a function of two variables, s(t1,t2). Each FID is corrupted by the noise b(t2).
This series of noise traces is then interpreted as a sampled realisation of a two-dimensional random process b(t1,t2). Just as in the one-dimensional case, we assume that the noise is stationary, white and gaussian. In fact, in the first dimension (t1), successive noise samples are uncorrelated, since they are recorded far apart in real time. Further, there is no correlation between the two dimensions. These hypotheses allow the separation of the two dimensions, t1,ν1 on one hand and t2,ν2 on the other. The precision of volume estimation for two-dimensional spectra can then be computed (Nadjari and Grivet, 1992) as a product of two factors, one for each dimension. These factors are identical to the results of the previous section. Thus, when no filters are used:

σK² = q1q2Δ1Δ2/4 = (q/4W1)Δ1Δ2        (5)
The Δi are the integration ranges in each direction, q is the noise intensity (in the physical t2 direction), W1 is the spectral width in the first direction, and the qi are the mathematical noise intensities for the two-dimensional random process. Under the hypotheses that pure absorption lineshapes are obtained and that an exponential filter is used in both dimensions, the signal to noise ratio for the peak volume can be written as:

rv = 2A ∏_(i=1,2) (2λi/√(qiΔi)) arctan(Δi/2λi)        (6)
A is the peak amplitude, the λi are the half-widths of the peak in each direction, after broadening by the exponential filter. The ratio of rv to rs, the usual signal to noise ratio, is:

rv/rs = 4 ∏_(i=1,2) arctan(Δi/2λi) √(λi/(Δiδfi))        (7)
where the filter broadening is called δfi. This ratio is not very different from unity, for any reasonable choice of parameters: we verify again that filtering is ineffective in improving the precision of volume integrals. Just as in 1D NMR, windowing can be helpful in reducing overlap and truncation artefacts.

7 LEAST SQUARES METHODS

The previous sections have shown that intensity determinations are fraught with many difficulties: the first points of the FID can be corrupted, the last points may be missing, lines may overlap. Further, spectral processing must often rely on subjective judgment by the spectroscopist, in order to choose the baseline or the integration limits. Therefore, there is strong interest in methods that can use incomplete data, provide an objective estimate of the intensity and its probable error, and possibly be used in an automatic mode, with minimal operator intervention. The least squares (LS) fit of either free precession signals or spectra is one such method. The maximum likelihood formalism is also a powerful data processing method (Gelfgat et al., 1993); it has been applied to solid state NMR spectra (Kosarev, 1990). In most implementations, the noise is assumed to be white and gaussian. In that case, the maximum likelihood approach is completely equivalent to least squares. The two formalisms will be considered together. The precision of LS-derived parameters was first investigated by Posener (1974) and Chen et al. (1986), for continuous wave NMR. In the case of FT NMR, one can work either in the time domain ("measurement domain") or in the frequency domain. The two approaches are theoretically equivalent. The quality of the fit is more easily assessed in the frequency domain (Martin, 1994), but the model function in the time domain (sum of damped sinusoids) is simpler than in the frequency domain (sum of Abildgaard functions).
Further, the properties of exponential functions make some algebraic shortcuts possible, thus saving computer time. It is also worth noticing that frequencies, phases and relaxation times enter the model in a non-linear manner, while amplitudes are linear parameters. It is possible to make use of this difference in the algorithm (Golub and Pereyra, 1973), albeit with a much more complex formalism. The minimisation of the sum of squared residuals is
usually accomplished with the Levenberg-Marquardt algorithm, although simulated annealing has been used (Sekihara and Ohyama, 1990); this method is attractive in principle, since it seeks the global minimum, but it is computer intensive. Since each line is described by four parameters, an n-line spectrum results in a problem with 4n unknowns. The EM algorithm (Miller and Greene, 1989, Miller et al., 1993, Chen et al., 1993) attempts to fit each line sequentially. On the other hand, it was observed (Montigny et al., 1990) that small artefacts (or lines) distant from the region of interest could be omitted from the fit without hindering the convergence. This favourable property has been used to perform a frequency-selective fit in the time domain, allowing for a large reduction in the problem size (Knijn et al., 1992). In an original approach, Webb et al. (1992) proposed to represent the spectrum as a large (several 10^5) number of lorentzian "spectral elements" of constant amplitude but varying frequency and width. The probable errors on line intensities were very close to the Cramer-Rao lower bounds (see below). The large number of recent reports on the use of least-squares makes an exhaustive survey difficult, but recent reviews are available (de Beer and van Ormondt, 1992, Gesmar et al., 1990). The performance of a least squares method can be assessed by computing the Cramer-Rao lower bounds on the statistical errors of the parameter estimates. In the case of uncorrelated gaussian noise and well-resolved signals, the estimated variance of the signal amplitude is nearly equal to the variance of the noise in the time domain (Macovski and Spielman, 1986). Not surprisingly, the method performs better the higher the signal to noise ratio. Thus, it proved possible to retrieve an accurate 13C abundance from a fit to the 13C satellites of a high-sensitivity proton spectrum (Montigny et al., 1990).
When the signal to noise ratio is too small, the least-squares method breaks down, the uncertainties rising much above the Cramer-Rao limits (de Beer and van Ormondt, 1992).
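A minimal time-domain illustration (synthetic data; all parameter values are arbitrary): the FID is modelled as a single damped sinusoid, the four parameters are refined with the Levenberg-Marquardt algorithm, and the covariance matrix provides the probable errors:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
t = np.arange(0, 0.5, 1/2048)                        # time (s)

def fid(t, a, r2, f, phi):                           # model: damped sinusoid
    return a * np.exp(-r2 * t) * np.cos(2*np.pi*f*t + phi)

truth = (1.0, 8.0, 53.0, 0.3)
data = fid(t, *truth) + rng.normal(0, 0.02, t.size)  # noisy synthetic FID

popt, pcov = curve_fit(fid, t, data, p0=(0.8, 5.0, 52.5, 0.0))  # LM method
perr = np.sqrt(np.diag(pcov))                        # probable errors
print(np.round(popt, 2))                             # close to the truth
```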
7.1 The importance of previous knowledge

Another approach that can be taken to improve the robustness and precision of an LS fit is the incorporation of previous knowledge. Obviously, the use of an accurate model (exact lineshape, true number of lines) amounts to the incorporation of a vast amount of such knowledge, but information on phases, relative intensities, and the frequency and width of some lines will also lead to a better result (Decanniere et al., 1994). Incorporation of the number of resonances among the unknown parameters has been attempted (Sekihara et al., 1990). The sheer size of two-dimensional data sets and parameter spaces has probably prevented the appearance of complete least-squares treatments, except for some exploratory or model problems (Jeong et al., 1993, Miller et al., 1993, Chen et al., 1993). However, the LS method can be put to good use for a less ambitious goal, the determination of the volumes of cross-peaks. The method of Denk et al. (1986), incorporated in the program EASY (Eccles et al., 1991), implements an LS fit of a cross-section to a sum of reference line-shapes; these are derived empirically from isolated peaks. This procedure is in effect an LS version of the reference line deconvolution method. Another approach (Gesmar et al., 1994) combines the volume decomposition of Holak et al. (1987) (see section 4.3) with an LS fit to exact line-shapes in the f1 and f2 dimensions. A very efficient
deconvolution of overlapping peaks results, as well as very accurate values of linewidths (Led and Gesmar, 1994).
8 LINEAR PREDICTION
Recent years have seen a flurry of papers describing the applications of various linear prediction (LP) algorithms to the processing of magnetic resonance signals. Among the reasons that justify this intense interest are the wide applicability of such methods and their ability to handle truncated signals. It is also very important that LP algorithms do not require any starting values and thus no operator intervention, hence the qualifiers "automatic" or "blackbox" often applied to such methods. Readers can find an introduction to LP in the chapter by Lupu and Todor or in the review by Led and Gesmar (1991); a somewhat older but more general survey (Lin and Wong, 1990) lists more than four hundred references. On the negative side, one must mention the complex theory, the intricate algebra and the heavy computer load. It would take us too far afield to describe the algorithms. Suffice it to say that the LP method models the FID as a sum of damped sinusoids, first retrieves the frequencies and damping factors by a non-linear procedure, then determines the amplitude and phase of each sinusoid by a linear least-squares fit. Since the number of spectral lines is unknown at the outset, the model must incorporate more components than can possibly be present in the FID. This number, often called the order of the model, also influences the resolution (Gesmar and Led, 1988). Further, using more sinusoids than strictly necessary will allow the algorithm to accommodate lines of non-lorentzian shape (Pijnappel et al., 1992, de Beer and van Ormondt, 1994). The computation load increases roughly as the third power of the order.
8.1 Tests of accuracy and precision of LP
Some authors have examined the accuracy and precision of frequency determinations for the simple cases of one or two sinusoids at high signal to noise ratios (Okhovat and Cruz, 1989, Rao and Hari, 1989, Li et al., 1990), but intensities were not considered. LP methods do not perform well for noisy signals (Joliot et al., 1991, Mazzeo and Levy, 1991). This means that the algorithm will sometimes yield meaningless values of frequencies, and consequently of intensities, the proportion of failures increasing with the noise level (Zaim-Wadghiri et al., 1992). Further, and apparently in contradiction to theoretical results (Li et al., 1990), amplitude values are biased for spectra of low signal to noise ratio (Diop et al., 1992, Koehl et al., 1994a). Simultaneously, the probable errors in the parameters deviate more and more from the optimal Cramér-Rao bounds (Diop et al., 1994). A particular variant of LP, called total least squares (Tirendi and Martin, 1989, van Huffel et al., 1994), recovers a higher proportion of the spectral components. LP is especially useful for truncated signals, such as those recorded in multinuclear experiments; an interesting application to the quantitation of a two-dimensional heteronuclear NOESY map was presented recently (Mutzenhardt et al., 1993).
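The two-step LP procedure described above (a non-linear step for frequencies and dampings, then a linear step for amplitudes and phases) can be sketched as follows. This is a minimal SVD-truncated forward-prediction variant written for illustration, assuming a noiseless signal of well-separated lines; it is not the code of any published algorithm, and the function name `lp_fit`, the model order and the test signal are invented.

```python
import numpy as np

def lp_fit(fid, dt, order, nlines):
    N = fid.size
    # Forward prediction equations: fid[n] = sum_k c[k] * fid[n-1-k]
    A = np.column_stack([fid[order - 1 - k : N - 1 - k] for k in range(order)])
    b = fid[order:]
    # Non-linear step: SVD-truncated solution of the prediction equations...
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    c = Vh[:nlines].conj().T @ ((U[:, :nlines].conj().T @ b) / s[:nlines])
    # ...roots of z^M - c[0] z^(M-1) - ... - c[M-1] give poles of the signal
    roots = np.roots(np.concatenate(([1.0], -c)))
    roots = roots[np.abs(roots) < 1.0]            # keep decaying components
    # Linear step: complex amplitudes by least squares on the surviving poles
    V = roots[np.newaxis, :] ** np.arange(N)[:, np.newaxis]
    amps = np.linalg.lstsq(V, fid, rcond=None)[0]
    keep = np.argsort(np.abs(amps))[::-1][:nlines]
    z, a = roots[keep], amps[keep]
    return np.angle(z) / (2 * np.pi * dt), -np.log(np.abs(z)) / dt, a

# Noiseless two-line test signal (frequencies 50 and -80 Hz)
dt = 1e-3
t = np.arange(256) * dt
fid = (np.exp((2j * np.pi * 50 - 10) * t)
       + 0.5 * np.exp((2j * np.pi * -80 - 20) * t))
freq, damp, amp = lp_fit(fid, dt, order=20, nlines=2)
```

The truncation of the SVD to the assumed number of components is what rejects noise; with noisy data the extraneous roots acquire non-zero amplitudes and the failure modes discussed in this section appear.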
8.2 Improving the reliability of LP
Several approaches may be taken to improve the performance and reliability of LP data processing. It is known that true quadrature detection yields better results than sequential sampling and Redfield pseudo-quadrature (Zaim-Wadghiri et al., 1992). Cadzow (1988) presented a signal conditioning algorithm that has proved very useful for noisy NMR signals (Diop et al., 1992, Lin and Hwang, 1993, Diop et al., 1994a). A computationally more efficient variant has been proposed that is claimed to have superior robustness in the presence of noise (Chen et al., 1994). Regularization techniques may also be successful (Kölbel and Schäfer, 1992, Diop et al., 1994b). Oversampling was shown to improve the robustness of LP, probably because artifacts are thereby reduced (Koehl et al., 1994b). In stark contrast to signal preconditioning techniques, Barone et al. (1994) proposed that LP processing be applied with many different model orders. The resulting frequencies and amplitudes are considered as random variables whose mean values are taken to be the best estimates of the actual spectral parameters. It is claimed that intensities of closely spaced lines are reliably estimated.
9 MAXIMUM ENTROPY METHODS
In the previous sections, we examined data processing methods that made full use of the information available about the NMR signal. It may therefore seem odd that we now turn to a data processing algorithm which is, in principle, indifferent to any previous knowledge. However, the maximum entropy method (MEM) has proved successful for NMR data processing (see chapter by K.M. Wright). We refer here to the Jaynes-Skilling MEM (Sibisi et al., 1984) and not to the Burg algorithm, which belongs to the class of linear prediction methods. In outline, the MEM method is a constrained optimization algorithm. One starts with a trial spectrum, s(v), and obtains, by inverse Fourier transform, a trial FID f(t), which is compared to the actual data F(t).
Many different s(v) functions will give an f(t) "close" to F(t) in a mean-square sense. One retains that s(v) which maximises an entropy function, for a given maximum mean-square distance between f and F. Lineshape information can be incorporated by multiplying f(t) by some weighting function. The algorithm easily accommodates truncated FIDs and other forms of signal degradation (Laue et al., 1987). It is again a computer intensive method. There is but a single systematic study of the performance of MEM for quantitation purposes (Jones and Hore, 1991). This Monte-Carlo investigation concluded that the probable error on line integrals was comparable to what could be obtained using a conventional least-squares fit. The advantages of MEM therefore do not lie in the precision or accuracy of the resulting intensities, but rather in its automatic or "blackbox" operation and its robustness (Hodgkinson et al., 1993).
10 THE USE OF MODULUS OR POWER SPECTRA
Motion of the "sample" can occur during some in vivo NMR spectroscopy experiments, with especially deleterious effects for those comprising echo sequences. Motion causes an attenuation of the FID and random phase shifts, which in turn produce further signal losses through averaging. Therefore, a gain in sensitivity is expected if one
averages spectra rather than FIDs. This is borne out by experiment (Ziegler and Decorps, 1993): the co-addition of phased spectra (block averaging) improves T2 measurements. In a similar vein, modulus or squared-modulus spectra can be averaged, also with beneficial results.
11 PRECISION OF DERIVED PARAMETERS
Some intensity measurements are used directly to determine concentrations, and the uncertainty in peak area translates directly into a probable error on the concentration. In contrast, determinations of reaction rates or internuclear distances incorporate intensities (or amplitudes) in a complicated and non-linear manner. The question then arises as to the probable errors (or confidence intervals) of such derived parameters. The propagation of errors for parameters which are known analytic functions of the intensities is given by a well-known if computationally tedious formula (Bevington, 1969). This formalism has been applied to reaction rate constants, as derived from two-dimensional EXSY experiments (Kuchel et al., 1988, Perrin and Dwyer, 1990). In small, rigid molecules, the dipolar relaxation rates are known functions of a few geometric parameters. Thus Trudeau et al. (1993) used the results of an extensive series of Monte Carlo simulations to determine the uncertainties of relaxation parameters in the case of the phosphite anion (PHO3^2-). The relative probable error on the internuclear distance was found to be a linear function of the noise standard deviation. Macura (1994, 1995) has derived error propagation formulas for model 3- and 4-spin systems. He used expressions for the partial derivatives of cross-peak volumes presented by Yip and Case (1989). The future will show whether this work can be practically applied to a large biomolecule. Let us first remark that it is usually not possible to derive probable errors on atomic coordinates from crystallographic data, due to the extensive computations required.
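The propagation-of-errors formula cited above (Bevington, 1969) reads sigma_y^2 = sum_i (df/dx_i)^2 sigma_i^2 for a derived parameter y = f(x_1, ..., x_n), and is easy to apply numerically. The sketch below propagates intensity uncertainties into a rate constant; it is an illustration written for this section, and the equal-population two-site exchange expression k = arctanh(Icross/Idiag)/tm is a textbook relation, not a formula from the cited papers.

```python
import numpy as np

def propagate(f, x, sigma, h=1e-6):
    """First-order error propagation with numerical partial derivatives:
    sigma_y = sqrt( sum_i (df/dx_i)^2 * sigma_i^2 )."""
    x = np.asarray(x, dtype=float)
    grad = np.empty_like(x)
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = h
        grad[i] = (f(x + dx) - f(x - dx)) / (2 * h)   # central difference
    return float(np.sqrt(np.sum((grad * np.asarray(sigma)) ** 2)))

# Illustrative two-site, equal-population exchange: Icross/Idiag = tanh(k*tm)
tm = 0.5                                  # mixing time (s), invented value

def rate(I):
    return np.arctanh(I[1] / I[0]) / tm   # I = (Idiag, Icross)

k = rate(np.array([1.0, 0.3]))            # derived rate constant (1/s)
sig_k = propagate(rate, [1.0, 0.3], [0.02, 0.02])
```

Because the derivatives are taken numerically, the same routine serves any derived parameter that is an explicit function of the measured intensities.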
An average uncertainty for all coordinates can be estimated from a Luzzati plot (Luzzati, 1952). The situation is both less clear-cut and less favourable in the case of NMR. We refer the reader to reviews describing the derivation of structures from NMR data (Clore and Gronenborn, 1991, James and Basus, 1991, Sherman and Johnson, 1993). Using simulated data, several authors have addressed the accuracy and precision question (Liu et al., 1992, Nibedita et al., 1992, Clore et al., 1993, Brunger et al., 1993, Zhao and Jardetzky, 1994, McArthur et al., 1994). We will be content to quote a few salient remarks from these papers. It appears that the results (atomic coordinates) are always somewhat biased, whatever the algorithm used. One reason is that the theory relating the most useful data (NOESY cross-peak intensities) to interatomic distances is oversimplified: it generally assumes a rigid molecule, with a single correlation time. Another reason is that the information provided by NMR is only of short range, in the form of distances smaller than 6 Å and dihedral angles. Moreover, the NMR data are not really translated into coordinates by a mathematical transformation. Instead, the macromolecule conformation space is searched for those conformations compatible with the experimental constraints. This step is sensitive to the researcher's personal bias. For instance, was a large enough part of the conformational space sampled? The end result is usually an ensemble of
structures. The quality of a model can be estimated from an R factor, for which a possible definition is (Gonzalez et al., 1991, McArthur et al., 1994):
R_nmr = Σ_ij ( [V_ij^calc]^(1/6) - [V_ij^obs]^(1/6) ) / Σ_ij [V_ij^obs]^(1/6)          (8)
where V_ij^obs and V_ij^calc are the observed and calculated cross-peak volumes for the spin pair i,j. There is as yet not much experience in the use of these factors. Taking for granted that NMR will provide a family of structures, it appears that the accuracy depends on the quality and quantity of the data but can hardly be better than 1 Å, while the precision is almost insensitive to the quality of the data and can be of the order of 0.5 Å: it is limited by the experimental errors on peak volumes, but only through a highly favourable V^(1/6) dependence. In fact, because of the flexible or disordered nature of a protein in solution, it has been advocated that a statistical description be used (McArthur et al., 1994).
12 CONCLUSION
At the present time, there is no universal data handling procedure that will lead to clean baselines and to accurate and precise integrals for every type of NMR experiment. It is in fact fortunate that the application which is the most "integral-intensive" (NOESY spectra) is also the most tolerant of errors. Semi-automatic (least-squares) or automatic (linear prediction or maximum entropy) methods should be used whenever possible, because they relieve the spectroscopist of much tedious work, can be reproducible and objective, and provide estimates of probable errors. The main argument raised against them is the long computation times involved. This time should be compared to the time spent in preparing the sample, recording the data and assigning the spectra. The ready availability of fast workstations will also make these methods accessible to a growing number of spectroscopists.
13 REFERENCES
Abilgaard F., Gesmar H., and Led J.J., 1988. Quantitative analysis of complicated nonideal Fourier transform NMR spectra. J. Magn. Reson., 79, 78-89.
Akke M., Skelton N.J., Kördel J., Palmer A.G. III, and Chazin W.J., 1993. Effects of ion binding on the backbone dynamics of calbindin D9k determined by 15N NMR relaxation.
Biochemistry, 32, 9832-9844.
Balacco G., 1994. New criterion for automatic phase correction of high resolution NMR spectra which does not require isolated or symmetrical lines. J. Magn. Reson., A110, 19-25.
Bardet M., Foret M.-F., and Robert D., 1985. Use of the DEPT pulse sequence to facilitate the 13C NMR structural analysis of lignins. Makromol. Chem., 186, 1495-1504.
Barone P., Guidoni L., Ragona R., Viti V., Furman E., and Degani H., 1994. Modified Prony method to resolve and quantify in vivo 31P NMR spectra of tumors. J. Magn. Reson., B105, 137-146.
Barsukov I.L. and Arseniev A.S., 1987. Base-plane correction in 2D NMR. J. Magn. Reson., 73, 148-149.
Becker E.D., 1969. High resolution nuclear magnetic resonance. Academic Press, New York.
Beer R. de and Ormondt D. van, 1992. Analysis of NMR data using time domain fitting procedures. NMR Basic Principles and Progress, 26, 201-248.
Beer R. de and Ormondt D. van, 1994. Background features in magnetic resonance signals, addressed by SVD-based state space modelling. Appl. Magn. Reson., 6, 379-390.
Bevington P.R., 1969. Data reduction and error analysis for the physical sciences. McGraw-Hill, New York.
Bodenhausen G., Vold R.L., and Vold R.R., 1980. Multiple quantum spin-echo spectroscopy. J. Magn. Reson., 37, 93-106.
Bracewell R.N., 1986. The Fourier transform and its applications. 2nd edition, McGraw-Hill, New York.
Brunger A.T., Clore G.M., Gronenborn A.M., Saffrich R., and Nilges M., 1993. Assessing the quality of solution nuclear magnetic resonance structures by complete cross-validation. Science, 261, 328-331.
Cadzow J.A., 1988. Signal enhancement - a composite property mapping algorithm. IEEE Trans. Acoust. Speech Sig. Process., 36, 49-62.
Chen S.C., Schaewe T.J., Teichman R.S., Miller M.I., Nadel S.N., and Greene A.S., 1993. Parallel algorithms for maximum-likelihood nuclear magnetic resonance spectroscopy. J. Magn. Reson., A102, 16-23.
Chen H., Huffel S. van, Decanniere C., and Hecke P. van, 1994. A signal-enhancement algorithm for the quantification of NMR data in the time domain. J. Magn. Reson., A109, 46-55.
Chen L., Cottrell C.E., and Marshall A.G., 1986. Effects of signal-to-noise ratio and number of data points upon precision in measurement of peak amplitude, position and width in Fourier transform spectrometry. Chem. Intel. Lab. Sys., 1, 51-58.
Chow J.K., Levitt K.N., and Kost G.J., 1993. NMRES: an artificial intelligence expert system for quantification of cardiac metabolites from phosphorus nuclear magnetic resonance spectroscopy. Ann. Biomed. Eng., 21, 247-258.
Chylla R.A. and Markley J.L., 1993. Simultaneous basepoint correction and base-line recognition in multidimensional NMR spectra. J. Magn. Reson., B102, 148-154.
Clore G.M. and Gronenborn A.M., 1991. Two-, three-, and four-dimensional methods for obtaining larger and more precise three-dimensional structures of proteins in solution. Annu. Rev. Biophys. Biophys. Chem., 20, 29-63.
Clore G.M., Robien M.A., and Gronenborn A.M., 1993. Exploring the limits of precision and accuracy of protein structures determined by nuclear magnetic resonance
spectroscopy. J. Mol. Biol., 231, 82-102.
Cross K.J., 1993. Improved digital filtering for solvent suppression. J. Magn. Reson., A101, 220-224.
Daubenfeld J.-M., Boubel J.-C., and Delpuech J.-J., 1985. Automatic intensity, phase and baseline corrections in quantitative carbon-13 spectroscopy. J. Magn. Reson., 62, 195-208, and references therein.
Decanniere C., Hecke P. van, Vanstapel F., Chen H., Huffel S. van, Voort C. van der, Tongeren B. van, and Ormondt D. van, 1994. Evaluation of signal processing methods for the quantitation of strongly overlapping peaks in 31P NMR spectra. J. Magn. Reson., B105, 31-37.
Delsuc M.-A. and Lallemand J.-Y., 1986. Improvement of dynamic range by oversampling. J. Magn. Reson., 69, 504-507.
Denk W., Baumann R., and Wagner G., 1986. Quantitative evaluation of cross-peak intensities by projection of two-dimensional NOE spectra on a linear space spanned by a set of reference resonance lines. J. Magn. Reson., 67, 386-390.
Dietrich W., Rüdel C.H., and Neumann M. Fast and precise automatic base-line correction of one- and two-dimensional NMR spectra. J. Magn. Reson., 91, 1-11.
Diop A., Briguet A., and Graveron-Demilly D., 1992. Automatic in vivo NMR data processing based on an enhancement procedure and linear prediction method. Magn. Reson. Med., 27, 318-328.
Diop A., Zaim-Wadghiri Y., Briguet A., and Graveron-Demilly D., 1994a. Improvements of quantitation by using the Cadzow enhancement procedure prior to any linear prediction methods. J. Magn. Reson., B105, 17-24.
Diop A., Kölbel W., Michel D., Briguet A., and Graveron-Demilly D., 1994b. Full automation of quantitation of in vivo NMR by LPSVD(CR) and EPLPSVD. J. Magn. Reson., B103, 217-221.
Dobbenburgh J.O. van, Lekkerkerk C., Echteld C.J.A. van, and Beer R. de, 1994. Saturation correction in human cardiac P-31 NMR spectroscopy at 1.5 T. NMR Biomed., 7, 218-224.
Eccles C., Güntert P., Billeter M., and Wüthrich K., 1991.
Efficient analysis of protein 2D NMR spectra using the software package EASY. J. Biomol. NMR, 1, 111-130.
Ellet J.D., Gibby M.G., Huber L.M., Mehring M., Pines A., and Waugh J.S., 1971. Spectrometers for multiple pulse NMR. Adv. Magn. Reson., 5, 117-176.
Ernst R.R., 1966. Sensitivity enhancement in magnetic resonance. Adv. Magn. Reson., 2, 1-137.
Ferretti J.A. and Weiss G.H., 1991. One dimensional nuclear Overhauser effects and peak intensity measurements. Meth. Enzymol., 176, 3-11.
Froystein N.A., 1993. Removal of all baseline and phase distortions from 2D NOE spectra by tailored spin-echo evolution and detection. J. Magn. Reson., A103, 332-337.
Gard J.K., Kichura J.M., Ackerman J.J.H., Eisenberg J.D., Billadello J.J., Sobel B.S., and Gross R.W., 1985. Quantitative 31P nuclear magnetic resonance analysis of metabolite concentrations in Langendorff-perfused rabbit hearts. Biophys. J., 48, 803-813.
Gelfgat V.I., Kosarev E.L., and Podolyak E.R., 1993. Programs for signal recovery from noisy data using the maximum likelihood principle. Comput. Phys. Commun., 74, 335-357.
Gesmar H. and Led J.J., 1988. Spectral estimation of complex time-domain NMR signals by linear prediction. J. Magn. Reson., 76, 183-192.
Gesmar H., Led J.J., and Abilgaard F., 1990. Improved methods for quantitative spectral analysis of NMR data. Prog. NMR Spectr., 22, 255-288.
Gesmar H., Nielsen P.F., and Led J.J., 1994. Simple least-squares estimation of intensities of overlapping signals in 2D NMR spectra. J. Magn. Reson., B103, 10-18.
Golub G.H. and Pereyra V., 1973. The differentiation of pseudo-inverses and non-linear least squares problems whose variables separate. SIAM J. Numer. Anal., 10, 413-432.
Gonzalez C., Rullmann J.A.C., Bonvin A.M.J.J., Boelens R., and Kaptein R., 1991. Toward an NMR R factor. J. Magn. Reson., 91, 659-664.
Guéron M., Plateau P., and Decorps M., 1991. Solvent signal suppression in NMR. Prog. NMR Spectr., 23, 135-209.
Güntert P. and Wüthrich K., 1992. FLATT - a new procedure for high quality baseline correction of multidimensional NMR spectra. J. Magn. Reson., 96, 403-407.
Hamming R.W., 1962. Numerical methods for engineers and scientists. Reprinted 1986, Dover, New York.
Herring F.G. and Phillips P.S., 1985. Integration errors in digitized magnetic resonance spectra. J. Magn. Reson., 62, 19-28.
Heuer A. and Haeberlen U., 1989. A new method for suppressing base-line distortions in FT NMR. J. Magn. Reson., 85, 79-94.
Hockings P.D. and Rogers P.J., 1994. 1H NMR determination of intracellular volume in cell suspensions. Arch. Biochem. Biophys., 311, 383-388.
Hodgkinson P., Mott H.R., Driscoll P.C., Jones J.A., and Hore P.J., 1993. Application of maximum entropy methods to three-dimensional NMR spectroscopy. J. Magn. Reson., B101, 218-222.
Holak T.A., Scarsdale J.N., and Prestegard J.H., 1987.
A simple method for quantitative evaluation of cross-peak intensities in two-dimensional NOE spectra. J. Magn. Reson., 74, 546-549.
Hoult D.I., Chen C.N., Eden H., and Eden M., 1983. Elimination of baseline artifacts in spectra and their integrals. J. Magn. Reson., 51, 110-117.
Huffel S. van, Chen H., Decanniere C., and Hecke P. van, 1994. Algorithm for time-domain NMR data fitting based on total least squares. J. Magn. Reson., A110, 228-237.
James T.L. and Basus V.J., 1991. Generation of high-resolution protein structures in solution from multidimensional NMR. Annu. Rev. Phys. Chem., 42, 501-542.
Jeong G.-W., Borer P.N., Wang S.S., and Levy G.C., 1993. Maximum-likelihood constrained deconvolution of two-dimensional NMR spectra. Accuracy of spectral quantification. J. Magn. Reson., A103, 123-134.
Joliot M., Mazoyer B.M., and Huesman R.H., 1991. In vivo spectral parameter estimation: a comparison between time and frequency domain methods. Magn. Reson.
Med., 18, 358-370.
Jones J.A. and Hore P.J., 1991. The maximum entropy method. Appearance and reality. J. Magn. Reson., 92, 363-376.
Kleywegt G.J., Boelens R., and Kaptein R., 1990. A versatile approach toward the partially automatic recognition of cross-peaks in 2D 1H NMR spectra. J. Magn. Reson., 88, 601-608.
Knijn A., Beer R. de, and Ormondt D. van, 1992. Frequency selective quantification in the time domain. J. Magn. Reson., 97, 444-450.
Koehl P., Ling C., and Lefèvre J.-F., 1994a. Statistics and limits of linear-prediction quantification of magnetic resonance spectral parameters. J. Magn. Reson., A109, 32-40.
Koehl P., Ling C., and Lefèvre J.-F., 1994b. Improving the performance of linear prediction on magnetic resonance signals by oversampling. J. Chim. Phys., 91, 595-606.
Kölbel W. and Schäfer H., 1992. Improvement and automation of the LPSVD algorithm by continuous regularization of the singular values. J. Magn. Reson., 100, 598-603.
Kosarev E.L., 1990. Shannon's superresolution limit for signal recovery. Inv. Problems, 6, 55-76.
Kuchel P.W., Bulliman B.T., Chapman B.E., and Mendz G.L., 1988. Variances of rate constants estimated from 2D NMR exchange spectra. J. Magn. Reson., 76, 136-142.
Kuchel P.W., Chapman B.E., and Lennon A.J., 1993. Diffusion of hydrogen in aqueous solutions containing protein. Pulsed field-gradient NMR techniques. J. Magn. Reson., A103, 329-331.
Laeven J.M., Smit H.C., and Lankelma J.V., 1984. A software package for the generation of noise with widely divergent spectral properties. Analyt. Chim. Acta, 157, 273-290.
Laeven J.M. and Smit H.C., 1985. Optimal peak area determination in the presence of noise. Analyt. Chim. Acta, 176, 77-104.
Lambert J., Burba P., and Buddrus J., 1992. Quantification of partial structures in aquatic humic substances by volume integration of two-dimensional 13C nuclear magnetic resonance spectra. Comparison of one- and two-dimensional techniques. Magn. Reson. Chem., 30, 221-227.
Laue E.D., Pollard K.O.B., Skilling J., Staunton J., and Sutkowski A.C., 1987. Use of the maximum entropy method to correct for acoustic ringing and pulse feedthrough in 17O NMR spectra. J. Magn. Reson., 72, 493-501.
Led J.J. and Gesmar H., 1991. Application of the linear prediction method to NMR spectroscopy. Chem. Rev., 91, 1413-1426.
Led J.J. and Gesmar H., 1994. Quantitative information from complicated NMR spectra of biological macromolecules. Methods in Enzymology, 239, 318-348.
Li F., Vaccaro R.J., and Tufts D.W., 1990. Unified performance analysis of subspace-based estimation algorithms. Proc. IEEE ICASSP 1990, 8, 2575-2578.
Lin D.M. and Wong E.K., 1990. A survey of the maximum entropy method and parameter spectral estimation. Physics Reports, 193, 41-135.
Lin Y.-Y. and Hwang L.-P., 1993. NMR signal enhancement based on matrix property mappings. J. Magn. Reson., A103, 109-114.
Lippens G. and Hellenga K., 1990. Perfectly flat baselines in 1D and 2D spectra with optimized spin-echo detection. J. Magn. Reson., 88, 619.
Liu Y., Zhao D., Altmann R., and Jardetzky O., 1992. A systematic comparison of three structure determination methods for NMR data: dependence upon quality and quantity of data. J. Biomol. NMR, 2, 373-388.
Luzzati P.V., 1952. Traitement statistique des erreurs dans la détermination des structures cristallines. Acta Cryst., 5, 802-810.
McArthur M.W., Laskowski R.A., and Thornton J.M., 1994. Knowledge-based validation of protein structure coordinates derived by X-ray crystallography and NMR spectroscopy. Curr. Opinion Struct. Biol., 4, 731-737.
Macovski A. and Spielman D., 1986. In vivo spectroscopic magnetic resonance imaging using estimation theory. Magn. Reson. Med., 3, 97-104.
Macura S., 1994. Evaluation of errors in 2D exchange spectroscopy. J. Magn. Reson., B104, 168-171.
Macura S., 1995. Full matrix analysis of the error propagation in two-dimensional chemical exchange and cross relaxation spectroscopy. J. Magn. Reson., A112, 152-159.
Manolera N. and Norton R.S., 1992. Spectral processing methods for the removal of t1 noise and solvent artifacts from NMR spectra. J. Biomol. NMR, 2, 485-494.
Marion D. and Bax A., 1988. Baseline distortion in real Fourier transform NMR spectra. J. Magn. Reson., 79, 352-356.
Marion D. and Bax A., 1989. Baseline correction of 2D FT NMR spectra using a simple linear prediction extrapolation of the time-domain data. J. Magn. Reson., 83, 205-211.
Martin M.-L. and Martin G.J., 1990. Deuterium NMR in the study of site-specific natural isotope fractionation (SNIF-NMR). NMR Basic Principles and Progress, 23, 1-61.
Martin Y.-L., 1994. A global approach to accurate and automatic quantitative analysis of NMR spectra by complex least-squares fitting. J. Magn. Reson., A111, 1-10.
Mazzeo A.R. and Levy G.C., 1991. An evaluation of new processing protocols for in vivo NMR spectroscopy. Magn. Reson. Med., 17, 483-495.
Miller M.L. and Greene A.S., 1989. Maximum-likelihood estimation for magnetic resonance spectroscopy. J. Magn. Reson., 83, 525-548.
Miller M.L., Chen S.C., Kuefler D.A., and d'Avignon D.A., 1993. Maximum likelihood and the EM algorithm for 2D NMR spectroscopy. J. Magn. Reson., A104, 247-257.
Mitschang L., Cieslar C., Holak T.A., and Oschkinat H., 1991. Application of the Karhunen-Loève transformation to the suppression of undesired resonances in three-dimensional NMR. J. Magn. Reson., 92, 208-217.
Montigny J., Brondeau J., and Canet D., 1990. Analysis of time-domain NMR data by standard non-linear least-squares. Chem. Phys. Lett., 170, 175-180.
Mutzenhardt P., Palmas P., Brondeau J., and Canet D., 1993. Quantitative time-domain analysis of two-dimensional heteronuclear Overhauser effect (HOE) data by the HD
(Hankel decomposition) method. J. Magn. Reson., A104, 180-189.
Nadjari R. and Grivet J.-P., 1991. Precision of integrals in quantitative NMR. J. Magn. Reson., 91, 353-361.
Nadjari R., Ph.D. Thesis, Université d'Orléans, 1992.
Nadjari R. and Grivet J.-P., 1992. Precision of volume integrals in two-dimensional NMR. J. Magn. Reson., 98, 259-270.
Netzel D.A., 1987. Quantitation of carbon types using DEPT/QUAT NMR pulse sequences: application to fossil-fuel-derived oils. Anal. Chem., 59, 1775-1779.
Nibedita R., Kumar R.A., Majumdar A., and Hosur R.V., 1992. Quantitative comparison of experimental and simulated NOE intensities. Correlation with accuracy of oligonucleotide structure determination. J. Biomol. NMR, 2, 477-482.
Nuzillard J.-M. and Freeman R., 1994. Oversampling in two-dimensional NMR. J. Magn. Reson., A110, 252-256.
Okhovat A. and Cruz J.R., 1989. Statistical analysis of the Tufts-Kumaresan and principal Hankel components method for estimating damping factors of single complex exponentials. Proc. IEEE ICASSP 1989, 4, 2286-2289.
Otting G., Widmer H., Wagner H., and Wüthrich K., 1986. Origins of t1 and t2 ridges in 2D NMR spectra and procedures for suppression. J. Magn. Reson., 66, 187-193.
Pasternack L.B., Laude D.A. Jr., and Appling D.A., 1994. Whole-cell detection by 13C NMR of metabolic flux through the C1-tetrahydrofolate synthase/serine hydroxymethyl transferase enzyme system and effect of antifolate exposure in Saccharomyces cerevisiae. Biochemistry, 33, 7166-7173.
Perrin C.L. and Dwyer T.J., 1990. Application of two-dimensional NMR to kinetics of chemical exchange. Chem. Rev., 90, 935-967.
Pijnappel W.W.F., Boogart A. van den, Beer R. de, and Ormondt D. van, 1992. SVD-based quantification of magnetic resonance signals. J. Magn. Reson., 97, 122-134.
Pople J.A., Schneider W.G., and Bernstein H.J., 1959. High resolution nuclear magnetic resonance. McGraw-Hill, New York.
Posener D.W., 1974. Precision in measuring resonance spectra. J. Magn.
Reson., 14, 121-128.
Rao B.D. and Hari K.V.S., 1989. Performance analysis of Root-Music. IEEE Trans. Acoust. Speech Sig. Process., 37, 1939-1949.
Redfield A.G. and Gupta R.K., 1971. Pulsed Fourier transform NMR spectrometer. Adv. Magn. Reson., 5, 82-116.
Redfield A.G. and Kunz S.D., 1975. Quadrature Fourier NMR detection: simple multiplex for dual detection and discussion. J. Magn. Reson., 19, 250-254.
Redfield A.G. and Kunz S.D., 1994. Simple NMR input system using a digital signal processor. J. Magn. Reson., A108, 234-237.
Rosen M.E., 1994. Selective detection in NMR by time-domain digital filtering. J. Magn. Reson., A107, 119-125.
Rouh A., Delsuc M.-A., Bertrand G., and Lallemand J.-Y., 1993. The use of classification in baseline correction of FT NMR spectra. J. Magn. Reson., A102, 357-359.
Rouh A., Louis-Joseph A., and Lallemand J.-Y., 1994. Bayesian signal extraction from noisy NMR spectra. J. Biomol. NMR, 4, 505-518.
Saffrich R., Beneicke W., Neidig K.-P., and Kalbitzer H.R., 1993. Baseline correction in n-dimensional NMR spectra by sectionally linear interpolation. J. Magn. Reson., B101, 304-308.
Sekihara K. and Ohyama N., 1990. Parameter estimation for in vivo magnetic resonance spectroscopy (MRS) using simulated annealing. Magn. Reson. Med., 13, 332-339.
Sekihara K., Haneishi H., and Ohyama N., 1990. Maximum-likelihood parameter estimation for in vivo magnetic resonance spectroscopy using modified cost function. Method when exact number of components is unknown. J. Magn. Reson., 90, 192-197.
Sherman S.A. and Johnson M.E., 1993. Derivation of locally accurate spatial protein structure from NMR data. Prog. Biophys. Molec. Biol., 59, 285-339.
Sibisi S., Skilling J., Brereton R.G., Laue E.D., and Staunton J., 1984. Maximum entropy signal processing in practical NMR spectroscopy. Nature, 311, 446-447.
Skelton N.J., Kördel J., Akke M., and Chazin W.J., 1992. Nuclear magnetic resonance studies of the internal dynamics in apo, (Cd2+)1, and (Ca2+)2 calbindin D9k. The rate of amide proton exchange with solvent. J. Mol. Biol., 227, 1100-1117.
Smit H.C., 1990. Specification and estimation of noisy analytical signals. Part I. Chem. Intel. Lab. Sys., 8, 15-27. Part II, ibid., 8, 29-41.
Smit H.C. and Walg H.L., 1975. Base-line noise and detection limits in signal-integrating analytical methods. Application to chromatography. Chromatographia, 8, 311-323.
Sotak C.H., Dumoulin C.L., and Levy G.C., 1983. Software for quantitative analysis by carbon-13 Fourier transform nuclear magnetic resonance spectroscopy. Anal. Chem., 55, 782-787.
Spitzfaden C., Braun W., Wider G., Widmer H., and Wüthrich K., 1994. Description of the NMR solution structure of the cyclophilin A-cyclosporin A complex. J. Biomol. NMR, 4, 463-482.
Starčuk Z. Jr, Bartušek K., and Starčuk Z., 1994.
First data point problem and the baseline distortion in Fourier transform NMR spectroscopy with simultaneous sampling. J. Magn. Reson., A108, 177-188.
Tang C., 1994. An analysis of baseline distortion and offset in NMR spectra. J. Magn. Reson., A109, 232-240.
Tirendi C.F. and Martin J.F., 1989. Quantitative analysis of NMR spectra by linear prediction and total least squares. J. Magn. Reson., 85, 162-169.
Trudeau J.D., Bohmann J., and Farrar T.C., 1993. Parameter estimation from longitudinal relaxation studies in coupled two-spin-1/2 systems using Monte Carlo simulations. J. Magn. Reson., A105, 151-166.
Vaals J.J. van and Gerwen P.H.J. van, 1990. Novel methods for automatic phase corrections of NMR spectra. J. Magn. Reson., 86, 127-147.
Webb S., Collins D.J., and Leach M.O., 1992. Quantitative magnetic resonance spectroscopy by optimized numerical curve fitting. NMR Biomed., 5, 87-94.
329 Weiss G.H. and Ferretti J.A., 1983. Accuracy and precision in the estimation of peak areas and NOE factors. J. Magn. Reson., 55, 397-407. Weiss G.H., Ferretti J.A., Kiefer J.E., and Jacobson L., 1983. A method for eliminating errors due to phase imperfection on NOE measurements. J. Magn. Reson., 53, 7-13. Weiss G.H., Ferretti J.A., and Byrd R.A., 1987. Accuracy and precision in the estimation of peak areas and NOE factors. II. The effects of apodization. J. Magn. Reson., 71, 97-105. Weiss G.H., Kiefer J.E., and Ferretti J.A., 1988. Accuracy and precision in the estimation of peak areas. The effects of apodization. Chem. Intel. Lab. Sys., 4, 223229. Weiss G.H., Kiefer J.E., and Ferretti J.A., 1992. Accuracy and precision in the estimation of internuclear distances for structure determination. J. Magn. Reson.,97, 227-234. Wenzel T.J., Ashley M.E., and Sievers R.E., 1982. Water-soluble paramagnetic relaxation reagents for carbon-13 nuclear magnetic resonance spectrometry. Anal. Chem., 54, 615-620. Wider G., 1990. Elimination of baseline artifacts in NMR spectra by oversampling. J. Magn. Reson., 89, 406-409. Yip P. and Case P.A., 1989. A new method for refinement of macromolecular structure based on nuclear Overhauser effect spectra. J. Magn. Reson., 83, 643-648. Zaim-Wadghiri Y., Diop A., Graveron-Demilly D., and Briguet A., 1992. Improving data acquisition parameters of 31p in vivo spectra for signal analysis in the time domain. Biochimie, 74,769-776. Zhao D. and Jardetzky O., 1994. An assessment of the precision and accuracy of protein structures determined by NMR. Dependence on distance errors. J. Mol. Biol., 239, 601-607. Zhu G., Torchia D.A., and Bax A., 1993. Discrete Fourier transformation of NMR signals. The relationship between sampling delay time and spectral baseline. J. Magn. Reson., A105, 219-222. Ziegler A. and Decorps M., 1993. Signal-to-noise improvement in in Vivo spin-echo spectroscopy in the presence of motion. J. Magn. Reson., B102, 26-34. 
Zolnai Z., Macura S., and Markley J.L., 1989.Spline method for correcting baseplane distortions in two-dimensional NMR spectra. J. Magn. Reson., 82, 596-504.
Signal Treatment and Signal Analysis in NMR
Ed. by D.N. Rutledge
© 1996 Elsevier Science B.V. All rights reserved.
Chapter 15

LEAST-SQUARES ESTIMATION OF PARAMETERS AFFECTING NMR LINE-SHAPES IN MULTI-SITE CHEMICAL EXCHANGE

Guido Crisponi
Dipartimento di Chimica e Tecnologie Inorganiche e Metallorganiche, Università degli Studi di Cagliari, Cagliari, Italy

1 INTRODUCTION

The modifications in NMR spectra arising from chemical exchange processes contain a great deal of kinetic and thermodynamic information on the studied system which would hardly be obtainable otherwise (Gutowsky et al., 1953; Anderson and Weiss, 1953; Kubo, 1957; McConnell, 1958; Sack, 1958; Johnson, 1965; Gutowsky et al., 1965). In order to extract all the available information it is essential to have phenomenological equations to describe the systems, as well as mathematical tools to estimate the correct numerical parameters. In this chapter we present a computer program for the evaluation of exchange parameters and discuss the problems which can be encountered in optimization procedures. A treatment of the exchange processes themselves is beyond the scope of this chapter, and in the following section we simply present the equations used in the program for describing the signal shapes. These equations are based on suitable modifications of the Bloch equations (Whitesides and Fleming, 1967; Yandle and Maher, 1969; Reeves and Shaw, 1970; Reeves and Shaw, 1971; Reeves, Shaddick and Shaw, 1971; Chan and Reeves, 1973) and allow the signal intensities to be obtained by real algebra (Caminiti, Lai, Saba and Crisponi, 1983; Nurchi, Crisponi and Ganadu, 1990), as opposed to other literature programs (Binsch, 1969; Meakin et al., 1976; English et al., 1976) which resort to complex algebra.

2 GENERAL EQUATIONS FOR SIGNAL SHAPE

The equations for obtaining NMR signal shapes in the case of exchange between different sites which are used in our least-squares program are obtained with the
procedure proposed by McConnell (McConnell, 1958) and reported by Pople et al. (Pople, Schneider and Bernstein, 1959). With n types of sites, each characterized by a Larmor frequency ωj and a transverse relaxation time T2j, the original Bloch equations:

dGj/dt + [1/T2j − i(ω − ωj)]Gj = −iγH1M0j    (1)
are modified to take into account exchange phenomena (the complex moment G is defined as G = u + iv, where u and v are the transverse components of the macroscopic moment M in phase and out of phase with the rotating field H1, and v is proportional to the absorption intensity). We therefore define the probability that a nucleus makes a jump from a j to a k position by τjk⁻¹. For j = k, τjk⁻¹ equals zero, the jumps between equal sites being ineffective. The Bloch equations, corrected for the change in the complex moment Gj connected with the jumps, are:

dGj/dt + [1/T2j − i(ω − ωj)]Gj = −iγH1M0j + Σk (τkj⁻¹Gk − τjk⁻¹Gj)    (2)
The steady-state solution of eqs. (2) for all types of sites is reached by putting all dGj/dt equal to zero and solving the resultant matrix equation:

([T + K] + iΩ)·G = −iω1M0·p    (3)
in which G = u + iv; u and v are the u-mode and v-mode magnetisation vectors, with elements uj and vj respectively; T is the n×n diagonal matrix whose elements [T2j]⁻¹ are the inverses of the transverse relaxation times, including a contribution due to magnetic field inhomogeneity; Ω is the n×n diagonal matrix with elements (ω − ωj); K is the rate matrix with off-diagonal elements Kjk = −τkj⁻¹ and diagonal elements Kjj = Σk≠j τjk⁻¹; ω1 = γH1; and p is the vector of the Pj, the populations of the j-th sites. The imaginary part of G, which gives the absorption at the frequency ω, is easily obtained as (Scheid, 1968):

v(ω) = −ω1M0([T + K] + Ω[T + K]⁻¹Ω)⁻¹·p    (4)
and the signal shape I(ω) can be calculated as:

I(ω) = Σj vj(ω)    (5)
Besides ω, I(ω) also depends on the parameters ωj, T2j, τjk and Pj. If one assumes that ωj and T2j do not vary during the exchange process, then the line shapes can be considered a function of the τjk and Pj only. In order to obtain the best estimates of the latter parameters, we make use of the Gauss-Newton non-linear least-squares method (Draper and Smith, 1966; Davies and Goldsmith, 1972), which will be briefly discussed in the following section.
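To make the real-algebra route of eqs. (3)-(5) concrete, the sketch below evaluates the steady-state absorption for a two-site case. Python/NumPy is used purely for illustration (the chapter's own program is written in Visual Basic), and the parameter values are hypothetical, chosen only to show the slow- and fast-exchange limits:

```python
import numpy as np

def exchange_lineshape(omega, omega_sites, T2, P, inv_tau, w1M0=1.0):
    """Absorption line shape I(omega) of eqs. (3)-(5), using real algebra only.

    inv_tau[j, k] = 1/tau_jk, the jump rate from site j to site k (0 on the
    diagonal); omega_sites, T2, P are the site frequencies, relaxation times
    and populations; w1M0 plays the role of omega_1 * M_0.
    """
    inv_tau = np.asarray(inv_tau, float)
    # rate matrix K: diagonal = sum of outgoing rates, off-diagonal = -incoming rate
    K = np.diag(inv_tau.sum(axis=1)) - inv_tau.T
    A = np.diag(1.0 / np.asarray(T2, float)) + K          # [T + K]
    W = np.diag(omega - np.asarray(omega_sites, float))   # diagonal (omega - omega_j)
    # eq (4): v = -w1*M0 * ([T+K] + W [T+K]^-1 W)^-1 p
    v = -w1M0 * np.linalg.solve(A + W @ np.linalg.inv(A) @ W, np.asarray(P, float))
    return v.sum()                                        # eq (5): I = sum_j v_j

# symmetric two-site example: long lifetime -> two lines, short lifetime -> one
sites, T2, P = [-50.0, 50.0], [0.5, 0.5], [0.5, 0.5]
slow = np.array([[0.0, 0.5], [0.5, 0.0]])      # tau_12 = 2 s
fast = np.array([[0.0, 500.0], [500.0, 0.0]])  # tau_12 = 0.002 s
```

With the long lifetime the intensity at a site frequency far exceeds that at the midpoint (two separate lines); with the short lifetime the coalesced line at the mean frequency dominates instead.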
3 LEAST-SQUARES METHOD

The determination of the parameters b in a functional relationship:

y = f(x, b)    (6)
from a set of N experimental determinations yi of the dependent variable is generally accomplished by least-squares procedures. Two different approaches are used depending on whether equation (6) is linear or not with respect to the parameters b. In the first case the N determinations can be ordered as a function of p independent variables as follows:

y1 = b1(x1)1 + b2(x2)1 + ... + bp(xp)1
y2 = b1(x1)2 + b2(x2)2 + ... + bp(xp)2
...
yN = b1(x1)N + b2(x2)N + ... + bp(xp)N    (7)
Equation (7) can be formulated in the more compact matrix form:

y = Xb    (8)

where X is the N×p matrix of measurements of the independent variables, called the 'design matrix'. The parameters obtained by matrix algebra as:

b = (XᵀX)⁻¹Xᵀy    (9)
are those which minimize the sum of squared residuals ΣRi² according to the least-squares criterion. The matrix XᵀX is known as the 'information matrix' and its inverse as the 'dispersion matrix'. When f(x, b) in equation (6) is non-linear with respect to the parameters, equation (7) can no longer be used to represent the N experimental measurements. Nevertheless, if a set b0 of guesses of the p parameters is known, the function y = f(x, b) can be replaced by its Taylor expansion truncated to the first term:

y = f(x, b0) + Σl=1,p [∂f(x, b0)/∂bl](bl − b0l)    (10)
In this case equation (6) becomes:

Δy1 = Δb1(δ1)1 + Δb2(δ2)1 + ... + Δbp(δp)1
Δy2 = Δb1(δ1)2 + Δb2(δ2)2 + ... + Δbp(δp)2
...
ΔyN = Δb1(δ1)N + Δb2(δ2)N + ... + Δbp(δp)N    (11)

where Δyi = yi − f(xi, b0) and (δl)i = ∂f(xi, b0)/∂bl; in matrix notation:

Δy = JΔb    (12)
where J, the matrix of derivatives, is called the Jacobian. As in the linear case, the solution is:

Δb = (JᵀJ)⁻¹Jᵀ·Δy    (13)
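The correction of eq. (13), applied repeatedly from an initial guess until the sum of squared residuals stops changing, can be sketched in a few lines. The snippet below is generic Python/NumPy (the chapter's OPTAO program is in Visual Basic); the finite-difference Jacobian and the test model used afterwards are illustrative assumptions, not the NMR line-shape model:

```python
import numpy as np

def gauss_newton(f, x, y, b0, h_rel=0.02, tol=1e-10, max_iter=50):
    """Iterate b <- b + (J^T J)^-1 J^T (y - f(x, b)), eq. (13), until the
    change in the sum of squared residuals falls below 'tol'."""
    b = np.asarray(b0, float)
    ssr_old = np.inf
    for _ in range(max_iter):
        r = y - f(x, b)                           # residual vector, Delta-y
        J = np.empty((len(y), len(b)))
        for l in range(len(b)):                   # numerical Jacobian, column l
            h = h_rel * b[l] if b[l] != 0 else h_rel
            bp = b.copy()
            bp[l] += h
            J[:, l] = (f(x, bp) - f(x, b)) / h
        b = b + np.linalg.solve(J.T @ J, J.T @ r)  # correction Delta-b, eq. (13)
        ssr = float(((y - f(x, b)) ** 2).sum())
        if abs(ssr_old - ssr) < tol:
            break
        ssr_old = ssr
    return b
```

Fitting synthetic noise-free data, e.g. y = b0·exp(−b1·x), from a nearby starting guess recovers the generating parameters in a handful of iterations.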
The Taylor expansion (10) is a good approximation of f(x, b) only if the initial guesses are good enough for the higher-order terms to be neglected. The calculation of the correction terms Δb therefore has to be iterated until the change in ΣRi² is less than an established value. The information on the reliability of the parameters estimated by equation (9) is contained in the variance-covariance matrix. This is the product of the dispersion matrix by the variance of the pure experimental error σ²:

V = σ²(XᵀX)⁻¹ = σ²C    (14)
The variance of the residuals, S² = ΣRi²/(N−p), gives an estimate of σ², and C represents the dispersion matrix. The element Vl,l^(1/2) is the standard deviation associated with the parameter bl. The correlation between two parameters bl1 and bl2 is:

rl1,l2 = Cl1,l2/(Cl1,l1 · Cl2,l2)^(1/2)    (15)

while the total correlation coefficient for the parameter bl is:

rl = [1 − 1/(Cl,l · (C⁻¹)l,l)]^(1/2)    (16)
For the non-linear case, according to Draper and Smith, '...all the usual formulae and analyses of linear regression theory can be applied. Any results obtained are, however, valid only to the extent that the linearized form provides a good approximation to the true model.'

4 COMPUTER PROGRAM

On the basis of equations (4) and (5) for the calculation of the signal shape, and of the Gauss-Newton non-linear least-squares procedure (equation (13)), we developed the computer program OPTAO. This program is written in Microsoft Professional Visual Basic 1.0 for DOS because this language offers:
• wide limits in dimensioning matrices;
• instructions for matrix algebra;
• interactive graphic capabilities.
The program, which easily manages up to ten exchange sites, optimizes all the related population and exchange parameters. It is supplied in executable form together with detailed directions for use.
5 PROBLEMS IN LEAST-SQUARES OPTIMIZATION

The optimization of non-linear parameters by equation (13) is in many instances perturbed by problems connected with different factors which affect the convergence of the iterative procedure. In order to point out some of these factors we examined a set of illustrative cases using the simplest system of two exchanging sites. This set (Fig. 1) is composed of 18 cases according to the scheme in Table 1, in which the population ratios

Table 1: Conventional marks for the 18 simulated cases.

τ12 (s)  | P1 = 16.66, P2 = 83.33 | P1 = 33.33, P2 = 66.66 | P1 = 50.0, P2 = 50.0
2        | 1A                     | 1B                     | 1C
0.2      | 2A                     | 2B                     | 2C
0.1      | 3A                     | 3B                     | 3C
0.05     | 4A                     | 4B                     | 4C
0.02     | 5A                     | 5B                     | 5C
0.002    | 6A                     | 6B                     | 6C
are 1:5, 1:2 and 1:1 respectively (the 5:1 and 2:1 cases maintain the symmetry), and the lifetimes vary from τ = 2 to τ = 0.002 s, i.e. from a slow-exchange situation to a fast one. The first factor which can introduce uncertainties is the precision in the numerical calculation of the derivatives. In fact the use of equations (4) and (5), which implies a double matrix inversion, prevents the analytical differentiation of the signal intensity I with respect to the parameters to be optimized. The necessary numerical differentiation implies two conflicting requirements: to obtain reliable results and to avoid long calculation times. According to Scheid (Scheid, 1968), the following three different approximations can be used:

A) I′(b, ωi) ≈ [I(b+h, ωi) − I(b, ωi)]/h
B) I′(b, ωi) ≈ [I(b, ωi) − I(b−h, ωi)]/h
C) I′(b, ωi) ≈ [I(b+h, ωi) − I(b−h, ωi)]/2h

where h is the differentiation interval. They are derived from the forward Newton, backward Newton and Stirling formulae respectively. Since the program calculates the function I at the N ωi values in a step before the optimization procedure, the first and second formulae require only one further calculation of I at the N (b+h, ωi) or (b−h, ωi) values for each parameter, while the third formula requires two calculations for each parameter, at both the N (b+h, ωi) and the N (b−h, ωi) values, which implies double calculation time. A good choice of the step h is, at any rate, surely an important factor: too great an interval would lead to an underestimate or an overestimate of the value according to the curvature of the function in the chosen interval, while too small an interval would produce problems connected with calculation precision. The derivatives at varying relative parameter increments in different situations were therefore calculated.
Fig. 1: The signal shapes for variable populations and lifetimes are shown.

The simplest formulae, A and B, were used in double precision for the cases 1A, 3B and 6C, with (h/b) ≈ 0.04, 0.02, 0.01, 0.005 and 0.001. The results, reported in Fig. 2 for the case 3B, give evidence of the following facts: i) when the ascending formula A is used, the derivatives show a linear trend with respect to the amplitude h of the differentiation interval; this trend can be positive, negative or almost null according to the examined case and parameter; ii) exactly the same behaviour is presented when the descending formula B is used, but the slopes of the straight lines are reversed, as can be seen in Fig. 2; iii) the values of the derivatives calculated with formula C are almost constant and equal to the values obtained by extrapolating the A and B results to zero increment. For values of (h/b) < 0.005, sudden variations of the derivatives, which are not easily predictable, are observed with all three formulae (to a lesser extent with C), due to numerical precision problems. Points i) and ii) are easily explained by recalling that, for a continuous function f(x), the derivative calculated with the ascending formula will increase with h if f(x) has an upward curvature, will decrease with h for a downward curvature, and will show no trend if f(x) is a linear function; the opposite trend is observed with the descending formula. On the basis of the above considerations formula C, which gives reliable estimates of the derivatives, was chosen. Considering the computer technology available nowadays, doubling the calculation time for the derivatives is not unacceptably time consuming. In the first version of the program, on IBM XT or AT computers, the calculation time was a limiting factor, and formula A had to be used despite the loss of accuracy. The derivatives for the 18 cases in Table 1, calculated with formula C using an h/b value of 0.02, are reported in Fig. 3.
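The trends in points i)-iii) are generic properties of one-sided versus central differences and can be checked on any smooth function. A small self-contained sketch (exp(x) is an arbitrary stand-in for the line-shape function, not the NMR model):

```python
import numpy as np

def derivative_estimates(f, b, h):
    """The three numerical-derivative formulae discussed in the text."""
    fwd = (f(b + h) - f(b)) / h              # A: forward (ascending)
    bwd = (f(b) - f(b - h)) / h              # B: backward (descending)
    ctr = (f(b + h) - f(b - h)) / (2.0 * h)  # C: central (Stirling)
    return fwd, bwd, ctr

# any smooth test function with non-zero curvature will do
f, b = np.exp, 1.0
exact = np.exp(b)
# errors: forward ~ +h f''(b)/2, backward ~ -h f''(b)/2 (opposite slopes,
# linear in h), central ~ h^2 f'''(b)/6 (nearly constant on this h scale)
err = {h: tuple(d - exact for d in derivative_estimates(f, b, h))
       for h in (0.04, 0.02, 0.01)}
```

Halving h roughly halves the forward and backward errors (with opposite signs), while the central estimate stays close to the true derivative throughout.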
Fig. 2: Derivatives with respect to τ12 for the case [P1 = 33.3, P2 = 66.7 and τ12 = 0.1], calculated at variable relative increments h/b with the three formulae A, B and C.

The following factors affecting the reliability of the final results are taken into consideration:
1. the number of examined points or, alternatively, the frequency step used in recording the signal;
2. truncation of the signal used in the optimization procedure (in some cases part of the signal may be unavailable because of overlapping with other signals);
3. symmetry of the examined range of signals;
4. the values taken by the parameters characterizing the examined system;
5. the correctness of the initial parameter estimates.
The strategy used for pointing out how each factor affects the final results (Crisponi et al., 1993) is based on equation (14) and will now be illustrated in some detail. The variance of each parameter depends on two different factors: S², which is an estimate of the experimental error, and Cll, which is only a function of the way in which the values of the independent variables of the system are chosen and does not depend on the values of the dependent variable. In order to get the best estimates of the parameters (those affected by the minimum Vll values), S² has to be minimized by a proper choice of experimental apparatus and procedures, and then the experiments have to be planned to obtain the lowest values for the Cll terms. Knowledge of the behaviour of the dispersion matrix in the actual system is therefore necessary: this could give a clear picture of the precision which can be achieved in the determination of signal parameters in various possible real situations. The five points listed above will now be examined with this procedure.
5.1 Number of points used in scanning the frequency axis
Twenty sets of the signals for the case [P1 = P2 = 50, τ12 = 0.1] were simulated in the same frequency range. A variable step was used in such a way that the number of points varied from 10 in the first set to 200 in the 20th. The square roots of the diagonal elements of the dispersion matrix related to the populations, E, calculated for all the above cases, are reported in Fig. 4. A quasi-linear trend in E can be observed in the range 60 ≤ N ≤ 200 (where E at 200 points is practically half the 60-point value). When fewer than 50 points are used, E increases exponentially. In our opinion the use of about 100 points for the two exchange sites represents a reasonable compromise between good precision and a manageable number of experimental points.

5.2 Truncation of the signal
To know what amount of the external signal can be disregarded without significant loss in precision, nine sets of data for the case [P1 = P2 = 50, τ12 = 0.1] were simulated. From a starting interval of 200, ten frequency units were progressively removed from both sides. The E values calculated from these sets of data are reported in Fig. 5 as a function of the disregarded amount. This behaviour can be understood by taking into account the derivatives shown in Fig. 3, which are the real independent variables of a non-linear system: clearly the precision of the parameters decreases when the highest values of the derivatives are disregarded.

5.3 Symmetry of the range
As in the previous case, nine sets of data were simulated starting from a 200 frequency unit interval: 10 frequency units were progressively removed only from the right side of the interval. The E values calculated from these sets of data are reported in Fig. 6 as a function of the disregarded amount. The results reported here are very similar to those in Fig. 5, but show a lower variation for P1 and τ12. These findings could have been foreseen, since the left side for P1 is unchanged.
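The trend in Section 5.1 — E values falling roughly as the square root of the number of points — can be reproduced with a toy model. The sketch below (a single Lorentzian with hypothetical parameters, not the two-site OPTAO calculation) builds the dispersion matrix C = (JᵀJ)⁻¹ from a finite-difference Jacobian and compares E = Cll^(1/2) for 50 and 200 sampled frequencies:

```python
import numpy as np

def E_values(model, b, omegas, h_rel=0.01):
    """Square roots of the diagonal of the dispersion matrix C = (J^T J)^-1,
    built from a numerically differentiated Jacobian (cf. eq. (14))."""
    b = np.asarray(b, float)
    J = np.empty((len(omegas), len(b)))
    for l in range(len(b)):
        h = h_rel * b[l]
        bp = b.copy()
        bp[l] += h
        J[:, l] = (model(omegas, bp) - model(omegas, b)) / h
    C = np.linalg.inv(J.T @ J)
    return np.sqrt(np.diag(C))

# toy model: one Lorentzian with (amplitude, centre, half-width) parameters
lorentz = lambda w, b: b[0] * b[2]**2 / (b[2]**2 + (w - b[1])**2)
b = [1.0, 20.0, 5.0]
E50 = E_values(lorentz, b, np.linspace(-100.0, 100.0, 50))
E200 = E_values(lorentz, b, np.linspace(-100.0, 100.0, 200))
```

Quadrupling the number of points over the same interval roughly halves every E value, since JᵀJ grows in proportion to N.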
339
153
E] ] IO~
oJ
~'o
t 50 of p o i n t s
160
Number
zoo
Fig. 4: E values related to the populations P as a function of the number of points for the case [P1 = P2 = 50, τ12 = 0.1]. E values for τ12 are lower but show a very similar trend.
Fig. 5: E values related to the populations (left) and τ12 (right), reported as a function of the frequency interval excluded on both sides of the graph. [P1 = P2 = 50, τ12 = 0.1].
Fig. 6: The E values related to the populations (left) and τ12 (right) as a function of the frequency interval excluded on the right side of the graph. [P1 = P2 = 50, τ12 = 0.1].
5.4 Values of the parameters
Signal intensity can be assumed to be proportional to (P1 + P2)/10. The related error on the measurements can therefore be estimated by dividing S by (P1 + P2)/10. Equation (14) can easily be changed to obtain the relative standard deviation on each parameter as:

(SD)l = Cl,l^(1/2) · [10/(P1 + P2)] · S = El · [10/(P1 + P2)] · S    (17a)
1300 patterns by creating four new patterns from each peak by shifting cross-peaks by ±0.2 and ±0.1 ppm. In training, the order of the patterns was randomised and a low gain (0.05) was used to avoid local-minima problems. With a momentum of 0.9, satisfactory training was achieved within 2000 epochs. Primary sequence information was also incorporated into some training exercises by introducing into a given output unit an extra bias proportional to the number of occurrences of that particular amino acid in the protein. A large negative factor was used for amino acids absent from the test protein, spinach ACP. Classification was evaluated according to ambiguity: the number of output units that produced an activation greater than or equal to that of the correct unit. An ambiguity of nine or more was deemed to indicate an incorrect classification. If the largest output value was less than 0.05, the classification was deemed undetermined. It was found that the number of correct classifications and the ambiguity were optimised with a hidden layer of four nodes. The generalisation error, the difference between the fraction of correctly assigned patterns and the fraction of all patterns classified, was found to be 0.27; this could be improved by increasing the number of patterns in the training set. Constraints derived from this network were then used by a second network for sequential assignment.
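The "gain" and "momentum" quoted above correspond to the standard back-propagation weight update Δw = −gain·∂E/∂w + momentum·Δw_prev. A generic sketch on a one-dimensional quadratic error surface (purely illustrative; not the authors' network or data):

```python
def momentum_step(w, grad, prev_dw, gain=0.05, momentum=0.9):
    """One weight update: dw = -gain * dE/dw + momentum * (previous dw)."""
    dw = -gain * grad + momentum * prev_dw
    return w + dw, dw

# minimise E(w) = w^2 (gradient 2w): the momentum term reuses a fraction of
# the previous step, so progress is made even with a deliberately low gain
w, dw = 1.0, 0.0
for _ in range(200):
    w, dw = momentum_step(w, 2.0 * w, dw)
```

The low gain keeps individual steps small (helping to avoid the local-minima problems mentioned above), while the momentum carries the search over shallow features of the error surface.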
4.2 Sequential assignment
The network for sequential assignment was a Boltzmann machine incorporating simulated annealing in a constraint-satisfaction algorithm. A square of units in the network is used to represent a residue position. Each unit along one side of the square represents a spin system identified by the previous network as one of the best three candidates for the given residue; unassigned spin systems are also included. The choice of three represents a practical compromise. Units are binary, with associated probabilities calculated as a function of a simulated-annealing temperature. All units are fully connected by a symmetric weight matrix. Training comprised weight modification according to a protocol optimised for the highly α-helical region of spinach ACP (residues 5-24, where every residue displays at least one sequential NOE connection). Negative changes were used to discourage multiple assignments, positive changes for connected units exhibiting
sequential NOE connections (NH-NH, NH-CαH). This protocol may require modification for proteins with different secondary structures. In testing, inter-residue cross-peaks from the 9-3 ppm region of a 2D NOESY spectrum of spinach ACP were used in conjunction with the constraints derived from the spin-system identification network. The training algorithm proceeded by calculating the outputs of randomly selected units. Simulated annealing was incorporated, decreasing the temperature from two to zero in non-equal steps over 2000 epochs. In practice the network was able to correctly identify 15 out of 20 residues; this same result was observed independently of the random selection of starting configurations, suggesting a global result for these data. Erroneous assignments could be attributed to spin systems incorrectly identified by the first network. Additional experiments or analysis of complementary data could help avoid misassignments. Increasing the number of training patterns could help avoid the effects of the uncommon chemical shifts of some protons. Ambiguities were also found and were traced to two residues of the same type each having a sequential NOE connection to the same residue. For this type of network the size of the weight matrix will be a limiting factor with large proteins, the size being of the order N⁴ × A²/20², where N is the number of residues in the primary sequence and A is the average amino acid ambiguity (e.g. eight in the example in 4.1). Overall, the networks achieved a 75% correct sequential assignment based upon a single NOESY and a single TOCSY spectrum. The authors suggest that the spin-system identification network could be extended to incorporate other types of information such as heteronuclear data, coupling constants and secondary structure.

5 ACKNOWLEDGEMENTS

I would like to thank the Wellcome Trust for their generous financial support of the MetJ protein research.
Co-holders of this research grant at the University of Leeds are Dr Julie Fisher (School of Chemistry) and Dr John Arnold (Department of Genetics); I am grateful to them both for their support and for allowing me time to continue my interest in neural network applications in NMR. Finally, I would like to thank my brother, David Corne, of the Artificial Intelligence Department at the University of Edinburgh, for suggesting neural networks as a possible solution to our NMR data abstraction problem.

6 REFERENCES
Beale, R., Jackson, T., 1990. Neural Computing: an Introduction. Adam Hilger, Bristol.
Carrara, E. A., Pagliari, F., Nicolini, C., 1993. Neural Networks for the Peak-Picking of Nuclear Magnetic Resonance Spectra. Neural Networks, 6, 1023-1032.
Corne, S. A., Johnson, A. P., Fisher, J., 1992. An Artificial Neural Network for Classifying Cross Peaks in Two-Dimensional NMR Spectra. J. Magnetic Resonance, 100, 256-266.
Corne, S. A., Fisher, J., Johnson, A. P., Newell, W. R., 1993. Cross-Peak Classification in Two-Dimensional Nuclear Magnetic Resonance Spectra Using a Two-Layer Neural Network. Analytica Chimica Acta, 278, 149-158.
Garrett, D. S., Powers, R., Gronenborn, A. M., Clore, G. M., 1991. A Common Sense Approach to Peak Picking in Two-, Three-, and Four-Dimensional Spectra using Automatic Computer Analysis of Contour Diagrams. J. Magnetic Resonance, 95, 214-220.
Hare, B. J., Prestegard, J. H., 1994. Application of Neural Networks to Automated Assignment of NMR Spectra of Proteins. J. Biomolecular NMR, 4, 35-46.
Hinton, G. E., Sejnowski, T. J., Ackley, D. H., 1984. Boltzmann Machines: Constraint Satisfaction Networks that Learn. Technical Report CMU-CS-84-119, Department of Computer Science, Carnegie Mellon University.
Hopfield, J. J., 1982. Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proceedings of the National Academy of Sciences USA, 79, 2554-2558.
Kleywegt, G. J., Boelens, R., Kaptein, R., 1990. A Versatile Approach toward the Partially Automatic Recognition of Cross Peaks in 2D 1H NMR Spectra. J. Magnetic Resonance, 88, 601-608.
Kjaer, M., Poulsen, F. M., 1991. Identification of 2D 1H NMR Antiphase Cross Peaks Using a Neural Network. J. Magnetic Resonance, 94, 659-663.
Kohonen, T., 1982. Self-Organised Formation of Topologically Correct Feature Maps. Biological Cybernetics, 43, 59-69.
McClelland, J. L., Rumelhart, D. E., 1986. Parallel Distributed Processing, Vol. 1. MIT Bradford Press, Cambridge.
Meadows, R. P., Olejniczak, E. T., Fesik, S. W., 1994. A Computer-Based Protocol for Semiautomated Assignments and 3D Structure Determination of Proteins. J. Biomolecular NMR, 4(1), 79-96.
Neidig, K. P., Bodenmueller, H., Kalbitzer, H. R., 1984. Computer-Aided Evaluation of 2-Dimensional NMR Spectra of Proteins. Biochemistry Biophysics Research Communications, 125, 1143-1150.
Otting, G., Widmer, H., Wagner, G., Wuthrich, K., 1986. Origin of t1 and t2 Ridges in 2D NMR Spectra and Procedures for Suppression. J. Magnetic Resonance, 66, 187-193.
Panaye, A., Doucet, J. P., Fan, B. T., Feuilleaubois, E., Rahali el Azzouzi, S., 1994. Artificial Neural Network Simulation of 13C NMR Shifts for Methyl-Substituted Cyclohexanes. Chemometrics Intelligent Laboratory Systems, 24(2), 129-135.
Radomski, J. P., van Halbeek, H., Meyer, B., 1994.
Neural Network Based Recognition of Oligosaccharide 1H NMR Spectra. Nature Structural Biology, 1(4), 217-218.
Rosenblatt, F., 1962. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books, Washington DC.
Wasserman, P. D., 1989. Neural Computing: Theory and Practice. Van Nostrand Reinhold, New York.
Widrow, B., Stearns, S. D., 1985. Adaptive Signal Processing. Prentice Hall, Englewood Cliffs, NJ.
Widrow, B., Lehr, M. A., 1990. Thirty Years of Adaptive Linear Networks - Perceptron, MADALINE and Backpropagation. Proceedings IEEE, 78, 1415-1442.
Wuthrich, K., Wider, G., Wagner, G., Braun, W., 1982. Sequential Resonance Assignments as a Basis for Determination of Spatial Protein Structures by High Resolution Proton Nuclear Magnetic Resonance. J. Molecular Biology, 155, 311-319.
Chapter 20
ANALYSIS OF NUCLEAR MAGNETIC RESONANCE SPECTRA OF MIXTURES USING MULTIVARIATE TECHNIQUES
Trond Brekke and Olav M. Kvalheim
Department of Chemistry, University of Bergen, Norway
1 INTRODUCTION

The aim of this chapter is to present data-analytical methods that can cope with the amounts of potential information obtained by modern nuclear magnetic resonance (NMR) spectroscopy. The systems investigated are petroleum fractions and related systems of hydrocarbon mixtures. However, the methodology is the main issue. The NMR mixture spectra are subjected to correlation analyses, projection analyses, principal component analysis, and partial-least-squares regression analysis. These methods provide: i) correlation of samples, ii) spectral interpretation and thereby identification of constituents, and iii) quantification of constituents. During the last twenty years, nuclear magnetic resonance (NMR) spectroscopy has experienced a dramatic development towards increased resolution and sensitivity. This development has been accompanied by the application of NMR spectroscopy to the analysis of increasingly complex systems, e.g. petroleum and biological samples. The huge amounts of data that are produced for such systems create serious problems in the data-analytical step. Methods based on integration of certain chemical shift ranges are useful for many purposes but fail to recognise the full information content of the spectra. It is clear that more efficient data-analytical methods are needed, particularly for the analysis of mixtures. A characteristic feature of spectroscopic data, as opposed to chromatographic data, is that most chemical substances are represented by several peaks. This simple fact is the
basis for most analysis methods aiming at identification of constituents from spectroscopic data. The most obvious way to assign a spectrum of an unknown compound is by comparison with pure component spectra in spectral libraries (Gray, 1982). Alternatively, the chemist may propose candidate structures, simulate their spectra, and compare the spectrum of the unknown to the simulated spectra. A number of computer-assisted methods have been proposed for this purpose (Wilkins et al., 1974, Brunner et al., 1975, Lipkus and Munk, 1985, Hyman et al., 1988, Hashimoto and Tanaka, 1988, 1989, Jurs et al., 1989). However, searching methods are rather limited when it comes to the analysis of complex mixtures, due to the large number of false matches that crowded NMR spectra may produce. Laude and Wilkins (1986) proposed to use "quantitative" 13C NMR spectra to identify constituent subspectra. In such spectra constituents are identified as groups of peaks with intensities that are integer ratios of each other. However, the experimental conditions necessary to record such spectra degrade both resolution and sensitivity, so that the method fails to exploit the performance of modern spectrometers. Stilbs (1981) showed that it was possible to separate constituents of different size by the FT-PGSE (Fourier transform - pulsed gradient spin-echo) experiment. In this experiment, peaks belonging to the same constituent are revealed by having the same self-diffusion coefficient. As with the Laude and Wilkins method, the FT-PGSE experiment allows for a constituent analysis from a single spectrum. However, due to the magnetic field gradient pulse used in the FT-PGSE experiment the resolution is degraded, so that its application is restricted to relatively simple mixtures. Allerhand and Maple (1986) demonstrated that appropriate temperature stability, composite pulse 1H decoupling, and simple one-pulse acquisition yield "ultra-high" resolution 13C NMR spectra.
By this procedure extremely complex mixtures (gasoline) give spectra where the individual constituents are represented by pure, non-overlapping subspectra. They identified these subspectra by the addition method: if an added compound was originally present in the gasoline, no new peaks would appear in the new spectrum, but some peaks of the original spectrum would grow in intensity. Windig (1988) reviewed a number of multivariate methods for spectral mixture analysis based on principal component analysis (PCA). Applications to 13C NMR data are reported for some of the methods (Hearmon et al., 1987, Kormos and Waugh, 1983). In this chapter it is assumed that the observed system is a set of samples consisting of the same chemical constituents varying only in relative amounts. Petroleum or related systems of non-polar hydrocarbon mixtures analysed by NMR spectroscopy are used as example systems. When such spectra are collected in a data matrix, underlying factors are revealed in the form of collinear spectra or collinear variable arrays. We show in this chapter that multivariate analysis (MVA) is a proper tool for the identification, interpretation, and quantification of such collinearity patterns. The experimental conditions involved in the data acquisition are deliberately chosen to optimise the S/N ratio and resolution in a minimum of instrumental time. Only one-dimensional 13C NMR spectroscopy is applied. (For MVA in two-dimensional NMR spectroscopy see e.g. Grahn et al. (1988, 1989).) WALTZ 1H decoupling (Shaka and Keeler, 1987) and ambient temperature are found to give single-constituent resolution even for complex petroleum samples. The prime goal of the analysis is identification and quantification of constituents. However, in a broader scope, any factor that affects the appearance of the data may be investigated: solvent effects, temperature effects, geochemical processes, etc. A subsidiary, but not less important, issue is how to prepare a consistent data matrix from a set of NMR mixture spectra. This is mandatory for all multivariate analysis. Due to the large number of variables involved, the only practical approach is to use some automated procedure. Unfortunately, the digitised NMR spectra cannot be used directly due to the solvent shifts induced by varying sample compositions. To be able to arrange the variables consistently, a knowledge of the relative solvent shifts between the individual peaks is required. As first shown by Malinowski et al. (Weiner et al., 1970, Malinowski and Howery, 1980), PC analysis, and a number of methods based on PC analysis, collectively called factor analysis, are powerful tools for the study of solvent effects.
2 CARBON-13 NUCLEAR MAGNETIC RESONANCE SPECTROSCOPY
The 13C spectrum of a non-viscous liquid sample typically exhibits sharp, well-resolved Lorentzian lines, separated by random baseline noise. Carbon-13 NMR spectrometry provides very high resolution, as demonstrated by the spectrum of eicosane (n-C20) (Figure 1). Seven resonances can be seen, five of which can be assigned to the chemically similar C2 to C6 methylene carbons. For mixtures, such an amount of detail leads to spectra of an overwhelming complexity. 13C spectra of petroleum fractions may consist of 1000 - 2000 peaks.
To be sufficiently defined, each peak should be represented by at least three data points. Hence, a 13C mixture spectrum is truly a multivariate object.
2.1 Experimental conditions
Excellent textbooks on experimental NMR are available (see e.g. Martin et al., 1980, Derome, 1987). However, for the analysis of complex mixtures a few topics require a brief recapitulation.
2.1.1 Quantitative conditions
The term "quantitative conditions" is often used to mean experimental conditions that give equal response for all carbon-13 nuclei in the observed sample. The advantage of quantitative spectra is that relative peak intensities give relative constituent concentrations directly. Two factors are responsible for the non-uniform response exhibited by different carbon types under routine conditions: the variable values of the 13C spin-lattice relaxation times (T1) and variations in the NOE. Quaternary carbons in particular have exceptionally long T1 values (frequently 60 seconds or more) and weak NOEs, so their intensities appear much reduced compared to protonated carbons.
Fig. 1: Carbon-13 NMR spectrum of eicosane.
There are two well-known experimental procedures that avoid these difficulties: (i) A relaxation delay long enough to allow full relaxation for all carbons in the sample is inserted between consecutive transients. During this delay the decoupler is gated off to let the NOE die out. This is known as the gated broad-band technique. (ii) By doping the sample with a paramagnetic species the NOE is suppressed altogether, and the T1's are reduced to a value that allows rapid pulsing without discriminating significantly between the different carbon types. Both techniques suffer from the forced quenching of the NOE. In addition, the gated broad-band technique is extremely time-consuming due to the long relaxation delay (up to ten times T1) that is required for non-protonated carbons, while paramagnetic agents cause line broadening that severely degrades both the
resolution and the S/N ratio in the spectra. These are serious drawbacks in the analysis of mixtures. Hence, if the aim is to obtain information on a detailed level, such as single constituents, it may be necessary to optimise the S/N ratio at the cost of quantitative conditions.
2.1.2 Signal-to-noise ratio
Due to the low inherent sensitivity of the 13C isotope, and its low natural abundance (1.1%), the experimental conditions in 13C NMR spectroscopy are primarily directed towards a maximum signal-to-noise (S/N) ratio. This is particularly crucial for mixtures, where the individual constituents are present at low concentrations. Assuming that the probe is tuned and the magnet shimmed to the manufacturer's specifications, the S/N ratio may be improved in four ways: (i) By maximising the amount of material in the available sample volume. However, at the same time it is often preferable to dilute the sample in order to decrease viscosity and to minimise solvent effects due to variable sample compositions among a set of samples. For most applications a 1:1 dilution of the material is an acceptable compromise. (ii) By adding the intensity from n transients (signal averaging). In this process, coherent intensity adds proportionally to n, while random noise adds proportionally to √n. Thus the S/N ratio after n transients is:
(S/N)n = √n · (S/N)1    (1)
where (S/N)1 denotes the signal-to-noise ratio after a single transient. This means that the signal-to-noise ratio increases with the square root of the number of transients. (iii) By irradiating the proton band (1H broad-band decoupling) during 13C observation (Shaka and Keeler, 1987). The multiplets due to 1H-13C spin-spin couplings then collapse to intense singlets. At the same time the nuclear Overhauser enhancement (NOE) provides an intensity enhancement of up to 2.98 for protonated carbons. Thus, for many applications the information inherent in the coupling patterns is suppressed in favour of an improved S/N ratio. The spectrum shown in Figure 1 was recorded using a standard single-pulse technique (Ernst and Anderson, 1966) and "WALTZ-16" 1H broad-band decoupling (Shaka and Keeler, 1987). During the last fifteen years, pulse sequences have been designed that retain the information available from the largest couplings (one-bond couplings) in decoupled spectra through the phase of the signals (Brown et al., 1981, Doddrell et al., 1982, Cookson and Smith, 1983). After the phases have been prepared by an appropriate pulse sequence, the protons are decoupled, and a simple decoupled 13C spectrum appears in which the intensities of methyl, methylene, methine, and quaternary carbons are differently modulated. In a typical application the conditions are chosen so that methyl and methine carbons give positive peaks, while methylene and quaternary carbons give negative peaks. The subspectra for the different carbon types may even be
isolated by linear combination of GASPE (gated spin-echo) or DEPT (distortionless enhancement by polarisation transfer) spectra (Madsen et al., 1986, Bendall and Pegg, 1983). This is known as subspectral editing. (iv) Finally, Allerhand and Maple (1987) and Maple and Allerhand (1987) have demonstrated the importance of temperature stability during signal averaging. Small temperature variations during signal averaging lead to a slight spread in resonance frequencies for the observed nuclei, resulting in incoherent averaging and broader lines. The elimination of temperature drift thus leads to an improved S/N ratio as well as to improved resolution.
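The √n gain of Equation (1) is easy to verify numerically. The following minimal sketch (modern Python/NumPy, not part of the original text; the spectrum and noise level are made up) simulates signal averaging of a single resonance and compares the S/N ratio after 1 and after 100 transients:

```python
import numpy as np

rng = np.random.default_rng(0)

n_points = 512
signal = np.zeros(n_points)
signal[100] = 50.0  # one sharp resonance on a flat baseline

def snr(spectrum):
    """Peak height divided by the noise level of a peak-free baseline region."""
    return spectrum[100] / spectrum[200:500].std()

def averaged_spectrum(n_transients):
    """Accumulate n noisy transients and return their average."""
    acc = np.zeros(n_points)
    for _ in range(n_transients):
        acc += signal + rng.normal(0.0, 5.0, n_points)
    return acc / n_transients

# Average each estimate over 50 repetitions to stabilise the comparison
snr_1 = np.mean([snr(averaged_spectrum(1)) for _ in range(50)])
snr_100 = np.mean([snr(averaged_spectrum(100)) for _ in range(50)])

# Equation (1) predicts a gain of sqrt(100) = 10 for 100 transients
print(snr_100 / snr_1)
```

The printed ratio comes out close to 10, as Equation (1) predicts.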
2.2 Solvent shift
The main contribution to the total chemical shift of a 13C nucleus is determined by the molecular structure, or more precisely the electronic surroundings, within its four nearest neighbouring nuclei (Lindeman and Adams, 1971). Structural differences beyond its fourth sphere of nuclei in the molecular framework lead to chemical shifts of the same magnitude as shifts induced by solvent and temperature effects. In particular, solvent shifts constitute a major problem in crowded spectra of mixtures, since peaks that are separated by less than the possible solvent shifts cannot be assigned unambiguously to separate molecules based on chemical-shift information alone. Temperature shifts do not normally impose such problems since they are averaged over time as noted above. However, systematic temperature fluctuations due to decoupler switching or long-time drift may give significant shifts. When dealing with mixtures it is convenient to define the solvent as all the molecules which surround an observed molecule. Hence, the mixture itself must be regarded as the solvent. Due to the low sensitivity of 13C NMR, the solvent shifts induced by the mixture cannot be removed by extensive dilution. Therefore one has to cope with the solvent shifts caused by variation in the analysed mixtures; the best one can do is to keep the concentration of the diluent constant. The most practical chemical shift referencing system is to use internal references. With this method, the observed shifts are not true solvent shifts. The reference carbon is itself shifted by solvent effects, so that the observed solvent shift for a given nucleus is the difference between the solvent shift for that nucleus and that for the reference nucleus. Still, for simplicity, we will in this chapter refer to solvent-induced shifts relative to an internal standard as solvent shifts.
Solvent shifts may be estimated with an almost unlimited experimental precision by using sufficient sampling points and curve-fitting procedures (Weiss et al., 1982). Obviously, very detailed information on solvent-solute interactions is present in such data, but solvent shifts are complex observables, and the significance of solvent shifts in complex mixtures is only crudely understood. Weiner et al. (1970) proposed to express the solvent shift, Δik, for a nucleus k in solvent i, as a linear combination of A different solvent-solute factors:

Δik = Σ(l=1..A) Fil Slk    (2)
Fil is the l'th solvent effect in solvent i, while Slk is the response of solute variable k to the l'th solvent effect. Slk depends to some extent on the electronic characteristics of the observed nucleus, but its distance from the surface of the solute molecule is more important. Hence, nuclei that are positioned at the surface of a molecule are more exposed than those at central sites. Slk is therefore known as the "site" factor (Rummens, 1976, Gans et al., 1978). The different solvent factors are usually related to bulk magnetic susceptibility, magnetic anisotropy, Van der Waals interactions, permanent dipole interactions, and stronger interactions such as hydrogen bonding and complex formation (see e.g. Gans et al., 1978).
3 DATA PRETREATMENT
3.1 Spectral processing
Data pretreatment is usually seen as part of the multivariate scheme of data processing. In the present context it is therefore important to be aware that an NMR spectrum very rarely represents the primary data measured by the detector. Modern spectrometers operate in the pulsed Fourier-transform mode, by which a free induction decay (FID) is recorded. The more familiar frequency spectrum is obtained by a Fourier transformation of the FID. Prior to Fourier transformation it is customary to scale the FID by different "window" functions that affect the resolution and the magnitude of the noise in the spectra. Another common transformation is effectuated by extending the FID with zeros prior to Fourier transformation. This increases the digital resolution and the amount of resolved information in the Fourier-transformed spectrum. For a detailed treatment of the effects of such manipulations of the FID see e.g. Lindon and Ferrige (1980).
3.2 Peak picking
A consequence of high resolution is the large number of data points required to avoid serious digitisation errors. To use the full digitised spectrum would thus easily lead to prohibitively large data matrices.
However, in most cases large parts of the spectrum are baseline, which need not be included in the analysis. The baseline can be removed through peak-picking routines, which are implemented in most modern spectrometer software. Peak picking consists of picking only those peaks that are above a predefined intensity level. Since the NMR lineshape in most cases is truly Lorentzian, peak picking should be performed by the use of a Lorentzian curve-fitting procedure. The intensities and chemical shifts estimated from peak picking are actually more precise measures than the raw digitised values (Weiss et al., 1982, Weiss and Ferretti, 1983, Verdun et al., 1988). By assuming constant line widths throughout the spectrum (usually a good approximation for small molecules), the peak heights can be used as quantitative measures. Simple peak picking thus provides very useful data reduction.
3.3 Maximum-entropy reduction
The maximum-entropy method proposed by Full et al. (1984) offers a more exhaustive procedure for data reduction when the total data profile needs to be accounted for. It is appropriate for spectra of viscous or highly complex samples, for
which constant lineshapes cannot be assumed, and the less intense regions of the spectra may contain significant information. Full et al. developed this method for grain-size frequency plots, but the algorithm is also applicable to spectroscopic profiles. However, the maximum-entropy criterion, as it is used in this method, produces a completely flat average spectrum. This obscures the features in the raw spectra so much that subsequent spectral assignment becomes very difficult. Karstang and Eastgate (1987) developed an algorithm for reduction of X-ray diffraction data based on this maximum-entropy approach, where the more intense data points are left unchanged, while the less intense data points are summed. Thus, the main features in the raw spectra are retained in the reduced spectra. Provided that the summed variables in a reduced variable belong to the same peak, no component information is lost. The algorithm is described in detail by Karstang and Eastgate (1987) and by Brekke et al. (1990). Figure 2 shows the result of using the algorithm for reducing a digitised 13C NMR profile from an oil fraction.
3.4 Normalisation
The total intensity of a spectrum is likely to depend on experimental conditions (Martin et al., 1980, Derome, 1987) such as minute differences in tube shapes or tuning/matching optima for the different samples, and the amount of available sample material. In order to eliminate this factor, it is necessary to bring the spectra to a common intensity level. This may be achieved by adding known amounts of known compounds to the samples (internal standards). The standard compounds should be chosen so that their spectra cover the observed spectral range without overlapping with the spectrum of the observed system. The spectral intensities of the internal standards provide the normalisation constant.
For high-resolution 13C NMR spectra several standards should be used in order to ensure a statistically sufficient number of data points for the normalisation constant. Alternatively, the spectra may be normalised to constant total intensity. In that case, quantitative conditions should be ensured in order to avoid variations in total intensity related to differences in T1's and NOEs among the constituents. DEPT spectra represent a special problem in this context since the DEPT technique neglects the quaternary carbons altogether. Thus, if the content of quaternary carbons varies, normalisation to constant intensity is not a valid procedure for DEPT spectra.
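As a minimal numerical illustration of the two normalisation strategies (a NumPy sketch with made-up intensities, not data from this chapter), two spectra recorded with different overall gains are brought to a common intensity level by dividing each spectrum by the summed intensity of its internal-standard peaks:

```python
import numpy as np

# Two synthetic mixture spectra (rows) recorded with different overall gains.
# Columns 0-2 are assumed to hold the peaks of an added internal standard.
spectra = np.array([
    [4.0, 2.0, 2.0, 10.0, 6.0],
    [8.0, 4.0, 4.0, 14.0, 30.0],
])
standard_cols = [0, 1, 2]

def normalise_to_standard(X, cols):
    """Divide each spectrum by the summed intensity of its standard peaks."""
    norm = X[:, cols].sum(axis=1, keepdims=True)
    return X / norm

def normalise_total(X):
    """Normalise each spectrum to unit total intensity.

    Only valid under quantitative conditions (and not for DEPT spectra
    when the quaternary-carbon content varies)."""
    return X / X.sum(axis=1, keepdims=True)

Xn = normalise_to_standard(spectra, standard_cols)
# After normalisation the standard peaks are identical in all spectra,
# so the remaining variation reflects the constituents only.
print(Xn)
```

After normalisation, columns 0-2 are identical across the two rows, confirming that the gain difference has been removed.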
3.5 Scaling Different scaling procedures and their effects on the results are discussed in section 4.1. See also Table 1. 3.6 Preparation of the data matrix Before the analysis can proceed, the spectra must be tabulated into a consistent data matrix, which means that the variables should be arranged in the same order for all spectra. Without a consistent data matrix, MVA will produce nonsense, or may in the worst cases give apparently "good" but misleading results. The combination of high resolution and solvent shifts makes this task particularly crucial in 13C NMR spectroscopy.
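Arranging peaks consistently across spectra can, in the simplest case, be sketched as tolerance-based matching of peak lists against a reference spectrum. The NumPy example below is a hypothetical, much simplified stand-in for the PCA-assisted procedure discussed in section 5 (the shift values and the tolerance are illustrative):

```python
import numpy as np

# Hypothetical peak lists (chemical shifts in ppm) from two spectra of the
# same mixture; solvent effects shift each peak slightly between spectra.
reference = np.array([14.10, 22.70, 29.40, 31.90])
observed = np.array([14.12, 22.67, 29.44, 31.88, 128.50])  # plus one extra peak

def match_peaks(ref, obs, tol):
    """Assign each observed peak to the nearest reference peak within tol (ppm).

    Returns one column index per observed peak, or -1 when no reference
    peak lies within the tolerance."""
    idx = []
    for shift in obs:
        d = np.abs(ref - shift)
        j = int(np.argmin(d))
        idx.append(j if d[j] <= tol else -1)
    return np.array(idx)

print(match_peaks(reference, observed, tol=0.05))  # [ 0  1  2  3 -1]
```

Matched peaks define the columns of the data matrix; unmatched peaks (index -1) signal either new constituents or a tolerance chosen smaller than the actual residual solvent shifts.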
Fig. 2: Spectra of a C18 petroleum fraction before (above) and after data reduction. The CH, CH2, and CH3 subspectra are separated using the DEPT technique and recombined as described in Brekke et al. (1990). The most intense peaks are numbered for assignment purposes. The variable numbers on the bottom scale refer to the raw spectrum before data reduction.
When the maximum of a resonance is shifted from one point in one spectrum to another point in another spectrum, the inherent collinearity between the variable vectors is destroyed. Furthermore, the intensity variation introduced by such shifting is large, and obscures other factors. To be able to correct for the solvent shifts, knowledge of how the individual nuclei respond to the solvent effects is required. Due to the complex nature of the solvent effects, and the large number of variables involved, this is
itself a multivariate problem, and not a trivial one. An example where a multivariate strategy is used to arrive at a consistent data matrix is discussed in section 5.

Table 1: Different forms of dispersion. Scaling applies to variable vectors. n is the number of samples.

Dispersion           | Centering | Scaling         | Comments
Variance/Covariance  | Yes       | 1/n             | Magnitude in variables retained.
Correlation          | Yes       | Variance^(-1/2) | Pure qualitative information. Magnitude in variables destroyed. Noise in small variables enhanced.
Mean-scaled          | No        | Mean^(-1)       | Emphasises variables with large variances relative to their means.
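The three forms of Table 1 can be sketched in a few lines of NumPy (illustrative data only, not from this chapter):

```python
import numpy as np

# A small illustrative data matrix: n = 3 spectra (rows), p = 3 variables.
X = np.array([[1.0, 10.0, 3.0],
              [2.0, 30.0, 3.5],
              [3.0, 20.0, 4.5]])
n = X.shape[0]

# Variance/covariance: centre the columns, scale the dispersion by 1/n.
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / n

# Correlation: centre and scale each column to unit variance (standardisation).
Xstd = Xc / Xc.std(axis=0)
R = Xstd.T @ Xstd / n

# Mean scaling: divide the *uncentred* variables by their column means.
Xm = X / X.mean(axis=0)

print(np.diag(R))  # all ones: variable magnitudes destroyed, directions kept
```

The diagonal of C reproduces the variable variances, while the diagonal of R is identically one, in line with the comments in Table 1.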
4 MULTIVARIATE ANALYSIS The term "multivariate analysis" is here used for the analysis of objects that are characterised by many variables. The data are arranged in a (n x p) matrix X, in which each row is a one-dimensional spectrum of p numbers, and each variable is represented by a column of n numbers. Hence, p is the number of variables, and n is the number of spectra. Matrices and vectors are represented by bold capital and small letters, respectively. Unless otherwise stated the data matrix is assumed column-centred. This means that the mean of each variable is zero. Variables will be loosely referred to as "peaks", "data points", "intensities", "chemical shifts", etc., depending on the context. 4.1 Variable dispersion It should be kept in mind that spectra and variable arrays are vectors characterised by lengths and directions. Most MVA concepts are intuitively grasped within a geometrical terminology. Assume that the data matrix consists of intensities tabulated from a set of spectra such as those shown in Figure 3. When the concentration of toluene is increased from sample 1 to sample 2, the intensities for all toluene peaks (the peaks marked with triangles) increase proportionally. In the data matrix the corresponding column vectors differ only by a constant factor. Geometrically, this means that the vectors have the same direction in multivariate space, but differ in magnitude. The direction of these vectors can thus be assigned to toluene, while their magnitudes are proportional to quantities such as the number of magnetically equivalent carbon nuclei, the size of the NOE, and the rate of relaxation for the different carbon sites in toluene. This simple example thus illustrates two important geometric aspects of multivariate data:
1) qualitative information is connected to directions in the multivariate space;
2) quantities are connected to the relative magnitudes of variable and object vectors.
It is the inherent collinearity of the variable vectors that makes spectroscopic data so well adapted for MVA. We shall look closer at how the qualitative information connected to the dispersion in the variables can be extracted.
Fig. 3: Carbon-13 NMR spectra of two 12-component mixtures. The peaks marked by triangles are due to toluene.
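The collinearity of variable vectors belonging to one constituent can be illustrated numerically (a NumPy sketch with hypothetical toluene-like intensities, not the data of Figure 3): two peak columns that both scale with the same concentration point in the same direction, so the cosine of the angle between them is one.

```python
import numpy as np

# Hypothetical intensities of two toluene peaks across five mixtures; both
# columns scale with the toluene concentration and are therefore collinear.
conc = np.array([0.10, 0.25, 0.40, 0.55, 0.70])
peak_a = 3.0 * conc  # e.g. a peak from several equivalent carbons
peak_b = 1.0 * conc  # a single-carbon peak

cos_angle = peak_a @ peak_b / (np.linalg.norm(peak_a) * np.linalg.norm(peak_b))
print(cos_angle)  # ~1: identical direction, different magnitudes
```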
4.1.1 Measures of dispersion
In matrix notation the variable dispersion is calculated as the data matrix premultiplied by its transpose:

D = Xs^T Xs    (3)
where D is the p x p dispersion matrix (Windig, 1988), X is the n x p data matrix (n spectra, p variables), the superscript T denotes transposition, and the subscript s means that the data may be scaled in some way. It follows that D is symmetric. Geometrically, each element, dkl, is the scalar product between two columns (variable vectors), xk and xl, of the data matrix (see also Figure 4). The actual computations are better expressed in the algebraic form
dkl = Σ(i=1..n) xik,s xil,s
where k and l are variable indices as shown in the figure, while i is the sample index. The subscript s means that the variables may be scaled in some way, e.g. as given in Table 1.
Fig. 4: Schematic representation of the dispersion matrix. Each element in D, dkl, is the scalar product between two columns (variable vectors), xk and xl:

dkl = xks^T xls = ||xks|| ||xls|| cos α;  k, l = 1, ..., p    (4)
where α is the angle between xk and xl, and || || denotes the Euclidean norm, e.g.

||xk|| = ( Σ(i=1..n) xik² )^(1/2)

with i as the sample index. Hence, D gives the dispersion among the variables. When the data are scaled by 1/n, the variance-covariance matrix, C, is obtained. This matrix has variances along the diagonal and covariances elsewhere. It reflects the true magnitudes of the peaks in the spectra. By scaling the columns of X to unit magnitude (Euclidean norm), the correlation matrix, R, is obtained. This scaling procedure is known as standardisation. In R each element is equal to a correlation coefficient, cos α. The correlation matrix is a pure representation of the directional relations between the variables. Another useful scaling procedure for NMR data is to divide the variables by their means (Brekke et al., 1990), in which case we are operating on the uncentred data
matrix. This scaling emphasises variables with relatively large variances at the expense of those with relatively low variances. It is useful for an investigation of the variables which discriminate most between the samples. The various forms of the dispersion matrix treated here are summarised in Table 1.
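Equations (3) and (4) translate directly into matrix products. The following NumPy sketch (synthetic data, not from the chapter) builds a small data matrix in which two variables are exact multiples of the same hypothetical constituent concentration, and shows that their correlation coefficient, cos α, is one:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data matrix: n = 6 spectra, p = 4 variables. Variables 0 and 1
# are exact multiples of the same (hypothetical) constituent concentration;
# variables 2 and 3 are unrelated.
conc = rng.uniform(0.2, 1.0, size=6)
X = np.column_stack([2.0 * conc,
                     0.5 * conc,
                     rng.uniform(0.2, 1.0, size=6),
                     rng.uniform(0.2, 1.0, size=6)])
n = X.shape[0]

Xc = X - X.mean(axis=0)                # column centring
C = Xc.T @ Xc / n                      # variance-covariance matrix
Xu = Xc / np.linalg.norm(Xc, axis=0)   # columns standardised to unit norm
R = Xu.T @ Xu                          # correlation matrix: elements are cos(alpha)

print(R[0, 1])  # ~1: the two constituent peaks cluster together
```

The covariance C[0, 1] depends on the magnitudes of the two columns, whereas R[0, 1] is bounded by one, which is why correlated peaks are easier to recognise in R.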
4.1.2 Congruence
In D each row or column contains the dispersions for one variable relative to all other variables. The scalar product between two such rows or columns thus supplies yet another measure of dispersion (Brekke et al., 1989a, Kvalheim, 1988). In matrix notation this dispersion is expressed by the congruence matrix, S:

S = Ds^T Ds    (5)
Here the columns of D are scaled to unit magnitude. Congruence is a more multivariate measure of collinearity than correlation, since all variables contribute to a congruence coefficient while only two variables contribute to a correlation coefficient. For this reason the congruence between two variables is more sensitive to interference from other variables in the spectra (e.g. by resonance overlap).
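Equation (5) can be sketched as follows (a NumPy example on a small, made-up dispersion matrix; the numbers are illustrative only):

```python
import numpy as np

# A small, made-up dispersion matrix D (p = 3). Equation (5): S = Ds^T Ds,
# where the columns of D are first scaled to unit magnitude.
D = np.array([[4.0, 2.0, 0.1],
              [2.0, 1.0, 0.0],
              [0.1, 0.0, 3.0]])

Ds = D / np.linalg.norm(D, axis=0)  # unit-magnitude columns
S = Ds.T @ Ds

# Variables 0 and 1 have near-proportional dispersion patterns, so their
# congruence coefficient approaches one despite the small interference (0.1):
print(S[0, 1])
```

Note how the 0.1 interference element pulls the congruence coefficient slightly below one, illustrating the sensitivity to interference mentioned above.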
4.1.3 Analysis of dispersion
The dispersion matrix may be analysed visually in the same way as a two-dimensional COSY NMR spectrum (Derome, 1987). The contour plot of a covariance matrix and one of its rows are shown in Figure 5a and Figure 5b, respectively. Owing to resonance overlap, in this case the covariance patterns represent average structures in the samples rather than single constituents. Note that the covariance matrix is displayed in absolute mode; it actually contains both negative and positive elements, as seen in Figure 5b. This spectrum corresponds to a row in the covariance matrix shown in Figure 5a. The negative covariance between the peripheral n-alkyl carbons and the middle n-alkyl carbons is due to a variation in average chain length among the samples. A spectrum of variances is found along the diagonal from upper left to lower right in Figure 5a; covarying peaks are recognised by "large" off-diagonal elements. However, from Equation (4) it is seen that when cos α for two variable vectors is non-zero, the covariance can take any value depending on the magnitudes of xk and xl. This means that there is no definite limit between high and low covariance. In the correlation matrix, on the other hand, correlated variables are recognised by correlation coefficients equal to one (or close to one, since noise can never be eliminated completely), corresponding to a cluster of variables in the multivariate space. Hence, peaks representing a given constituent always cluster together, provided that the peaks do not overlap with peaks belonging to other constituents. Note, however, that by standardising the variables to unit length, all variables are given the same importance. Thus, random noise embedded in variables with poor S/N ratios is enhanced while the quantitative importance of intense variables is suppressed.
Fig. 5a: Covariances for a set of petroleum fractions. - Contour plot of the whole covariance matrix. Further details about the data set are given in Brekke et al. (1990).
Fig. 5b: Covariances for a set of petroleum fractions. - Spectrum of covariances for the peak representing the middle carbon atoms of n-alkyl chains. Further details about the data set are given in Brekke et al. (1990).
4.2 Principal component analysis
The dispersion matrices contain redundant information due to collinearity in the spectra. As an example, all the rows with an n-alkane peak on the diagonal in Figure 5 contain information on the n-alkanes. Furthermore, the dispersion matrix, as it is defined here, will be of dimension p x p while its rank cannot exceed min(n, p). For spectroscopic data the number of variables, p, is usually much higher than the number of samples. The most condensed way to express the variation in the data is given by the principal component solution. In matrix notation the PC solution consists of an orthogonal decomposition of the (scaled) data according to

Xs = T · P^T + E    (6)
in which T · P^T ideally describes the significant variation in Xs and the noise is contained in the residual matrix E. The columns of T are orthogonal, and the columns of P are orthonormal. The solution is obtained by minimising the squared residuals E^T E, and corresponds to a singular value decomposition of Xs (see e.g. Joliffe (1986)):

Xs = U · G^(1/2) · P^T + E    (7)
U and P are orthonormal, i.e. U^T U = P^T P = I. A (at most the lesser of n and p) is the rank of Xs, G is a diagonal matrix containing the eigenvalues of the dispersion matrix in decreasing order of magnitude, while P contains the corresponding eigenvectors (principal components). Thus, PC analysis may be seen as a decomposition of the dispersion matrix, Xs^T Xs = P · G · P^T. In factor analysis, described by Malinowski and Howery (1980), principal components are called abstract factors because they describe the mathematical structure in the data. In the chemometric terminology, T = U · G^(1/2) is an n x A matrix of object scores, and P is a p x A matrix of variable loadings. Geometrically, scores are coordinates for objects in the space spanned by the A PCs, while loadings are coefficients of correlation between the original variables and the PCs. The number of significant principal components is commonly determined by a cross-validation procedure. For PC number m (m = 1, 2, ..., A) a fraction of the elements in the data matrix is kept out of the calculations, and their values are predicted from a PC model of the remaining elements. This is repeated until all elements have been predicted once, and the total squared and summed error of prediction is calculated. As long as this sum is less than the total variance in the residual matrix for m-1 PCs, the new PC is considered significant. In this way it is ensured that the m'th principal component represents systematic variation. The proportion of explained variance (the elements of G^(1/2)) in each PC relative to the total variance in the raw data is also a useful criterion for the estimation of the "true" rank of the data matrix (Malinowski 1977a, 1977b). In most cases the number of PCs necessary to reproduce the data within the experimental error is small compared to the number of variables in the raw spectra, so
that a significant data reduction is achieved. Cross-validation and related methods are reviewed by Efron and Gong (1983).
4.3 Constrained methods
The methods presented so far are based on mathematical criteria. However, it is often necessary to investigate the data with more specific aims by applying chemically meaningful constraints in the analysis.
4.3.1 Marker-object projections
One way to obtain a chemically meaningful description of the data is by marker projections (Kvalheim, 1987). In this technique, objects or variables that represent specific chemical or physical properties are selected as axes for the measured variation. Mathematically, marker projections are the scalar products between selected marker vectors and the rows or columns of the data matrix. The marker vectors need not be orthogonal to each other, and variation that is not represented by the markers is not accounted for in a marker-object or marker-variable projection analysis. One application of marker projections is to check for the presence of a constituent in mixtures, using the pure component spectrum of the anticipated compound as a marker object. If the marker-object spectrum and the object spectra are normalised to a common standard, the marker-object spectrum serves as a unit vector for the marker compound relative to the mixture spectra, providing a scale for quantification of the marker compound in the mixtures (Brekke et al., 1990).
4.3.2 Target-factor analysis
The philosophy and aims of marker projections are similar to those of target-factor analysis (Hearmon et al., 1987, Malinowski and Howery, 1980, Malinowski and McCue, 1979, Knorr and Futrell, 1979, Gillette et al., 1983). Target-factor analysis consists of a PC decomposition followed by a transformation of the PCs:

X = (T · Q^(-1)) · (Q · P^T)    (8)
The transformation matrix, Q, is constructed by testing the overlap between the PC's and judiciously selected target vectors. This step in a factor analysis corresponds to a marker projection applied to the PC's instead of the raw data vectors. When the appropriate target vectors are found, the PC's are rotated in a least-squares manner to obtain the best fit with the target vectors (Malinowski and Howery, 1980). A particularly useful rotation is obtained when Q is defined so that (Q·Pᵀ) coincides with spectra of pure components. In this case (T·Q⁻¹) becomes a matrix of concentrations for these components in the samples. However, it should be noted that the number of chemical components that can be isolated by target-factor analysis is limited to A, the number of PC's available for rotation (Ritter et al., 1976). For a comprehensive treatment of the various methods of factor analysis see Malinowski and Howery (1980).
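A minimal numerical sketch of such a target rotation, on simulated noise-free mixtures (assuming numpy; the concentration and pure-spectra matrices are invented for the illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated mixtures: C holds concentrations (6 samples x 2 components),
# S holds the two pure-component spectra (2 x 40 variables).
C = rng.uniform(0.1, 1.0, size=(6, 2))
S = np.abs(rng.normal(size=(2, 40)))
X = C @ S                                  # noise-free mixture spectra

# Step 1: PC decomposition, keeping A = 2 components (no centring here,
# so that the concentrations stay directly interpretable).
U, g, Pt = np.linalg.svd(X, full_matrices=False)
A = 2
T, P = U[:, :A] * g[:A], Pt[:A].T          # scores (6x2), loadings (40x2)

# Step 2: least-squares fit of the targets (pure spectra) in the PC space;
# we seek Q such that Q . P' reproduces S, and since P'P = I, Q = S . P.
Q = S @ P

# Step 3: rotate -- X = (T . Q^-1) . (Q . P'), so T . Q^-1 estimates C.
C_hat = T @ np.linalg.inv(Q)
print(np.max(np.abs(C_hat - C)))           # close to zero for noise-free data
```

Because the data are noise-free and the targets lie exactly in the space spanned by the two PC's, the rotated scores recover the concentration matrix exactly; with real data the fit is only least-squares.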
4.3.3 Partial-Least-Squares regression

In order to obtain a relation between a block of spectral variables, X, and some external variable, y, we start from the following expression:

y = X·b + e   (9)

The best fit to this relation is obtained when the squared residual error eᵀe is minimised, leading to the least-squares solution:

XᵀX·b = Xᵀy   (10)
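When X is collinear the system above is singular and has infinitely many exact solutions. The following sketch (assuming numpy, with synthetic rank-deficient data) shows that the Moore-Penrose generalised inverse selects the minimum-norm one:

```python
import numpy as np

rng = np.random.default_rng(2)

# Collinear "spectral" data: 10 samples, 30 variables, but rank 3 only,
# so X'X is singular and Equation (10) has infinitely many solutions.
n, p, r = 10, 30, 3
X = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))
y = X @ rng.normal(size=p)

# The Moore-Penrose generalised inverse X+ picks the minimum-norm b.
b = np.linalg.pinv(X) @ y

# Any other exact solution differs from b by a null-space vector and is longer.
null_proj = np.eye(p) - np.linalg.pinv(X) @ X   # projector onto the null space of X
b_other = b + null_proj @ rng.normal(size=p)

print(np.linalg.norm(b), np.linalg.norm(b_other))
```

PLS approaches this limiting solution as components are added; in practice the number of components is truncated by cross-validation, which is what distinguishes the PLS estimate from the raw generalised inverse.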
If the inverse of XᵀX exists, the solution for b is straightforward. Due to collinearity this is generally not the case for spectroscopic data, so that Equation (10) has multiple solutions. What the partial-least-squares (PLS) method performs is the calculation of the Moore-Penrose generalised inverse, X⁺, of X, which gives the minimum-norm solution for b. The number of PLS components used to obtain b is commonly established by cross-validation of the y block, in a similar manner as in PC analysis. The properties of the Moore-Penrose generalised inverse in relation to the PLS method have been discussed by Manne (1987) and Lorber (1987).

5 APPLICATIONS
5.1 Data preparation by use of principal component analysis

As noted in section 3.6, solvent shifts create problems in the preparation of the data matrix. Simple manual data tabulation based on eye-balling from spectrum to spectrum becomes practically impossible when the spectra become complex. This section deals with how this procedure can be automated by PCA. There is a close analogy between Equations (2) and (6). Hence, the solute site factors, Si, for the different nuclei are analogous to variable loadings, while sample scores are analogous to the influence of the various solvent effects in the system. The loadings are particularly useful in the data-preparation step since they reveal how the individual nuclei respond to the solvent effects. Brekke et al. (1989) developed a stepwise algorithm which combines qualified input from the operator with PC analysis to obtain a consistent data matrix. The algorithm asks for appropriate reference peaks to account for the main solvent factors in the system, a typical reference spectrum that is matched against all the other spectra after the large solvent shifts have been subtracted, and a solvent-shift tolerance range to allow for residual solvent shifts in the matching step. With this information a data matrix is produced. The data matrix is then analysed and evaluated by PC analysis. Evaluation is based on visual inspection and interpretation of scores and loadings, and on the number of missing elements in the resulting data matrix. The whole process of input, PC analysis and evaluation is repeated until a consistent data matrix is achieved (Brekke et al., 1989). For the systems tested in that reference, it turned out that a single PC explained 85% to 90% of the shift variance. From the variable-loading pattern, three distinctly different site factors were identified: one for the methyl carbons, one for the aromatic methine
carbons, and one for all the remaining carbons. Fortunately, the three carbon-type groups are easily separated by appropriate pulse sequences (DEPT and GASPE) and rough chemical-shift criteria, so that internal referencing by choosing a typical reference peak within each group is possible. This algorithm makes it possible to arrive at a consistent shift matrix in a semi-automated way, a significant improvement compared to manual data preparation. Finally, when the shift matrix is consistent, intensities (or any other NMR parameter) can be tabulated in the same order. Note that for systems containing polar constituents, additional solvent factors should be expected (Weiner et al., 1970, Gans et al., 1978). This would probably require a different and extended set of chemical-shift references. Recently, Vogels et al. (1993) developed an alternative way of peak shifting. The method, called partial linear fit (PLF), shifts small regions at a time by maximising the cross-correlation between spectra.
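The idea of shifting a region to maximise the cross-correlation with a reference can be sketched as follows. This is a simplified illustration in the spirit of PLF, not the published implementation (assuming numpy; the peak positions and shift tolerance are invented):

```python
import numpy as np

def align_segment(target, segment, max_shift=5):
    """Shift 'segment' left/right (up to max_shift points) to maximise its
    correlation with 'target'.  A simplified sketch, not the published PLF code."""
    best_shift, best_corr = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        shifted = np.roll(segment, s)
        c = np.dot(shifted, target)
        if c > best_corr:
            best_shift, best_corr = s, c
    return np.roll(segment, best_shift), best_shift

x = np.linspace(0, 10, 200)
reference = np.exp(-(x - 5.0)**2 / 0.05)   # one peak at x = 5
spectrum = np.exp(-(x - 5.2)**2 / 0.05)    # same peak, solvent-shifted

aligned, shift = align_segment(reference, spectrum)
print(shift)   # the shift (in points) that restores the overlap
```

In a PLF-like scheme this alignment would be applied region by region rather than to the whole spectrum at once, so that different peaks can move by different amounts.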
5.2 Interpretation of solvent factors

A well-known criticism of the PC method is that it gives a mathematical solution, devoid of chemical meaning. Therefore, it is often appropriate to combine PC analysis with other techniques in order to obtain chemically meaningful results. Brekke et al. (1989a) used marker-object projections in addition to PC analysis to investigate 13C NMR chemical-shift variation as a function of mixture composition and temperature. The quantitatively most important variation was modelled by two PC's. By marker-object projections it was established that the first PC was due to the aromaticity of the sample, and the second due to temperature. This shows the importance of temperature stability in NMR, and that solvent shifts should be expected when analysing samples of varying aromaticity.

The use of factor analysis to study 1H NMR solvent shifts was introduced by Weiner, Malinowski and Levinstone in 1970. A similar study of 13C NMR solvent shifts has also been reported (Bacon and Maciel, 1973). The method was later criticised by Rummens (1976) for two reasons: (i) It did not produce as many solvent factors as are found by traditional one-variable analysis. This must be expected, since the PC's are orthogonal, and each may express several correlated solvent factors. An example is the correlation between Aa, the anisotropy contribution, and Aw, the Van der Waals dispersion contribution. According to theory (Rummens, 1976), both are functions of the solvent-solute distance, which in turn is a function of the composition of the solvent, particularly its aromaticity (Brekke et al., 1989b). Hence, Aa and Aw cannot be isolated by the PC approach. (ii) To be able to select appropriate targets for the rotation: ..."one possesses already the knowledge one wants to obtain, and subsequent factor analysis reveals nothing further of interest." (Rummens, 1976, page 5). This statement is not correct, since one does not need any exact knowledge in order to select the targets. However, a hypothesis is required, as in any analytical
scheme. Target-factor analysis and related multivariate techniques, such as marker projections, merely provide a tool for hypothesis testing when the number of experimental variables becomes large. The use of multivariate techniques does not produce new information as such, but for large and complex data sets the information is not readily available, and a multivariate strategy may be the only way to reveal it.
5.3 Constituent analysis

5.3.1 Single constituents

The pure-peak method (Windig, 1988, Hearmon et al., 1987, Malinowski and Howery, 1991) aims at qualitative and quantitative isolation of single-constituent subspectra. This is a target-factor analysis method that uses peaks that are unique to single constituents as targets to construct the rotation matrix. The crucial point of the method is thus how to find the pure peaks. According to Knorr and Futrell (1979), pure peaks should be sought as the variables that behave least like the average, and least like each other. Hence, in a PC model of the standardised data, the first pure peak is found as the peak with the lowest loading in the first PC, because this PC is regarded as the most average direction in the space spanned by the variables. The subsequent pure peaks are found from the subsequent PC's as the variables with loadings least like the average of all the previous ones. There is no proof to show that this should be so, but the merit of the method in picking pure peaks is confirmed in several mass-spectrometric applications (Hearmon et al., 1987, Knorr and Futrell, 1979, Gillette et al., 1983). Hearmon et al. (1987) showed that the method also works for 13C NMR spectra of ternary mixtures, but a closer look at their data reveals that 70-85% of the peaks in their spectra are pure peaks. In such data the chance of randomly selecting a pure peak is high. Hence, their report is not strong evidence in favour of the method. It should also be noted that the number of constituents isolated by this method cannot exceed the number of PC's that can be safely extracted from the data. To our knowledge, applications of the method are so far limited to test mixtures containing no more than five constituents.

Despite its limitations, the pure-peak method does provide a quantitative isolation of the individual constituents. However, in real systems it is very rare that the individual constituents vary completely independently of each other, so that the individual constituents can be separated into isolated components. For qualitative analysis only, variable correlations can be used. Variable correlations do not require complete orthogonality in the constituents, and may be extracted from a PC model in the form of variable-loading clusters in the PC space (see Figure 6), or directly from the correlation matrix as described in section 4.1 and Brekke et al. (1989b). The ability of the correlation method to identify individual constituents is based on two properties of the data: (i) the dispersion in the constituents must be larger than the dispersion caused by the noise in the data; (ii) the individual constituents must be characterised by two or more pure peaks that can produce a cluster of variable points in the multivariate space. In other words, high resolution is required.
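A small numerical sketch of the correlation method (assuming numpy; the mixtures, constituents and peak assignments are invented): peaks belonging to the same constituent are proportional to its concentration across the mixtures, so they correlate perfectly and show up as clusters in the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(4)

# Seven mixtures of three constituents; each constituent gives intensity at
# its own set of "peaks" (variables), scaled by its concentration.
conc = rng.uniform(0.1, 1.0, size=(7, 3))
peaks = [[0, 1, 2], [3, 4], [5, 6, 7]]        # pure-peak indices per constituent
X = np.zeros((7, 8))
for k, idx in enumerate(peaks):
    X[:, idx] = np.outer(conc[:, k], rng.uniform(0.5, 2.0, size=len(idx)))

R = np.corrcoef(X, rowvar=False)              # 8 x 8 variable correlation matrix

# Peaks of the same constituent correlate perfectly; clusters in R identify
# the single-constituent subspectra with no orthogonality requirement.
print(np.round(R[0, 1], 3), np.round(R[0, 3], 3))
```

Peaks 0 and 1 (same constituent) give a correlation of one, while peaks of different constituents correlate only as much as their concentrations happen to co-vary.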
In Brekke et al. (1989b) the correlation method is used to identify nine single-constituent subspectra from seven 13C NMR spectra of test mixtures, showing that constituent orthogonality is indeed no requirement for the extraction of qualitative information. As a rigorous test, the method was applied to a system of petroleum fractions in which the number of possible constituents was estimated to be at least one thousand. Figure 7 shows the spectrum for one sample.
[Figure 7: 13C NMR spectrum of one petroleum-fraction sample; labelled peaks include EB, MB, PB/Tr-1,2-DMCHx and Cis-1,2-DMCHx.]
H2s(Ω2) = < ... >   (9a)

H3s(Ω21, Ω22, Ω23) = < Y3(Ω21 + Ω22 + Ω23) X*(Ω21) X*(Ω22) X*(Ω23) >   (9b)
Fig. 2: Spin echo: (a) The RF pulse flips the magnetisation vector into the x-y plane (transverse plane). Immediately after, the different components of the magnetisation vector begin to dephase ("fan out"). After some time a 180-degree pulse rotates the magnetisation from one transverse plane to the opposite transverse plane. The precession continues and an echo is formed. (b) The spin-echo pulse sequence consists of one 90-degree pulse and one or several 180-degree pulses. The first 180-degree pulse is applied after a delay (TE/2) and subsequent 180-degree pulses after TE. (c) The recorded signal is the train of spin echoes.
One technique to circumvent the dephasing, and hence in an indirect way "adjust" for the magnetic-field inhomogeneities and the subsequent loss of signal, is the spin-echo technique illustrated in Figure 2. The spin system starts to lose phase coherence shortly after the 90° pulse. This is caused mainly by inhomogeneities in the magnetic field. As the magnetisation precesses, the total magnetisation is divided into fast (F) and slow (S) components, where the boundaries (F and S) are seen in Figure 2. If a 180° pulse is applied, the separate magnetisation vectors change place, i.e. the fast component will be behind the slow. This leads to a re-phasing of the magnetisation vectors, and an echo is formed which can be recorded. The spin-echo technique is one of the most common methods to acquire images, mostly because the image formation and the contrast in the image can easily be monitored by the T2 process, which will be explained later.
The spatial distribution and physical characteristics of a certain region are based on two different encodings of an element, namely frequency and phase encoding. Since this terminology is commonly used, it deserves an explanation. In frequency encoding the spin system is not subjected to any gradient until the signal is recorded; it is at the time of the signal recording that the gradient is present. Phase encoding is performed before recording the signal, but in the presence of a gradient. The main difference is thus that phase encoding is performed before sampling the signal, whereas frequency encoding is applied during the sampling. In phase encoding the phase contains spatial information, and if the phases of two signals with the same frequency are compared, information can be gained about their spatial location. This information can easily be presented in grey-level images by using the Fourier transformation (Ernst & Anderson, 1966). Today's standard image-formation technique is based on both phase and frequency encoding, is called the 2D-FT technique, and is illustrated in Figure 3.
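The 2D-FT reconstruction chain (row-wise transforms of the echoes, a transpose, then a second set of transforms) can be sketched on a toy slice. This is a purely numerical illustration (assuming numpy); the forward "acquisition" is idealised as a plain 2D Fourier transform of the spin density:

```python
import numpy as np

# A toy "slice": two bright phantoms in a 64x64 field of view.
image = np.zeros((64, 64))
image[20:28, 10:20] = 1.0
image[40:50, 35:50] = 0.7

# Idealised acquisition: each phase-encoding step fills one row of k-space,
# each row being sampled during frequency encoding -- together, the 2D
# Fourier transform of the spin density.
kspace = np.fft.fft2(image)

# 2D-FT reconstruction: a transform per echo, then a transform across the
# phase direction (after the transpose) -- equivalent to an inverse 2D FFT.
step1 = np.fft.ifft(kspace, axis=1)      # FT of every echo
recon = np.fft.ifft(step1, axis=0)       # FT across the phase direction
print(np.max(np.abs(recon.real - image)))   # essentially zero
```

The separability of the 2D Fourier transform is exactly what allows the two passes (one per encoding direction) in Figure 3 to recover the image.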
Fig. 3: Image formation: (a) A slice through a set of three phantoms. The total frequency-encoding projection for the slice is shown at the rear. (b) The collection of data from the slice in (a) results in a number of spin echoes (free induction decays, FID's), depending on the number of phase-encoding steps. (c) All echoes are Fourier transformed into different projections along the frequency axis. (d) The data matrix, with its two directions (frequency and spin echoes), is transposed into a new matrix where the axes are interchanged. (e) Each row is Fourier transformed and an image is formed.
Three phantoms form a slice and are put into a magnet. Collection of MR data from the "phantom slice" produces a set of spin echoes (usually 128 or 256), where each echo is recorded from the whole slice. After a Fourier transformation of all echoes, a set of projections of the slice onto the frequency-encoding axis is obtained. A transpose operation is applied and a new data set is formed. In our case two of the newly formed slices contain signal (water signal). The top one contains one frequency and the lower one contains two different frequencies. After a second Fourier transform of all the newly formed slices (following the transpose) the image is created.

How does the contrast in the image reflect the MR characteristics of the spins in the sample? One parameter is the concentration of spins, which in 1H imaging is a measure of the mobile protons in a particular region. If the imaging parameters are optimised for such images, the resulting images are called "density weighted". Often the intensity is quite similar between different structures in the image, i.e. the contrast remains constant. Other, perhaps more important, ways of increasing the contrast in tissue are through the relaxation parameters T1 and T2. The relaxation behaviour of different tissues usually varies quite significantly. In Figure 4 a set of different images, acquired from an identical anatomical slice in the brain of a healthy volunteer, is shown. Some of these images correspond to the "pure" states of MR-weighted images, i.e. density (upper left corner), T1 (lower right corner) and T2 (lower left corner). The important pulse-sequence parameters that control the contrast in a spin-echo sequence are TR (relaxation time delay) and TE (spin-echo time). To visualise the change in contrast, CSF (cerebrospinal fluid) can be taken as an example. The parameters TR and TE can be changed according to Figure 4.
For short values of TE and TR (a T1-weighted image) CSF is dark, but in T2-weighted images (long TE, long TR) it is bright. This is actually the best rule of thumb for distinguishing T1- and T2-weighted MR images: look for water-like tissue, which is dark in T1-weighted images and bright in T2-weighted images. By similar arguments grey and white matter can be distinguished.
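This rule of thumb follows from the standard spin-echo signal model, S = ρ·(1 − exp(−TR/T1))·exp(−TE/T2). A sketch with rough textbook relaxation times (the tissue values below are illustrative assumptions, not taken from this chapter):

```python
import numpy as np

def spin_echo_signal(rho, T1, T2, TR, TE):
    """Standard spin-echo signal model: S = rho * (1 - exp(-TR/T1)) * exp(-TE/T2)."""
    return rho * (1.0 - np.exp(-TR / T1)) * np.exp(-TE / T2)

# Illustrative relaxation times in ms (rough textbook values, assumed here).
csf   = dict(rho=1.0, T1=3000.0, T2=2000.0)
white = dict(rho=0.7, T1=600.0,  T2=80.0)

# T1-weighted: short TR, short TE -> CSF darker than white matter.
s_csf_t1w = spin_echo_signal(**csf,   TR=500.0,  TE=17.0)
s_wm_t1w  = spin_echo_signal(**white, TR=500.0,  TE=17.0)

# T2-weighted: long TR, long TE -> CSF brighter than white matter.
s_csf_t2w = spin_echo_signal(**csf,   TR=4210.0, TE=68.0)
s_wm_t2w  = spin_echo_signal(**white, TR=4210.0, TE=68.0)

print(s_csf_t1w < s_wm_t1w, s_csf_t2w > s_wm_t2w)
```

With a short TR the long-T1 CSF has not recovered and stays dark; with a long TE the long-T2 CSF still carries signal while white matter has decayed away, so the contrast inverts, exactly as the rule of thumb states.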
3.3 MR Spectra

Another MR modality which has become increasingly useful is MR spectroscopy (MRS), i.e. NMR spectroscopy in vivo (Howe et al., 1993). Common nuclei to study are 31P, 13C, 23Na and 19F. It is now also possible to obtain high-resolution proton spectra of various metabolites when the magnetic field has been subjected to ordinary shimming procedures, just as in high-resolution NMR. Technical difficulties such as the elimination of the water and lipid signals have basically been solved, and techniques for volume selection have been developed. MRS has become a method which is predominantly used for studies of diseases in the human brain, such as cerebral ischemia and cancer. It is quite likely that a few years from now more routine clinical work will be performed. The combination of MR imaging and MRS presents thrilling perspectives for the future. The biochemical changes that can be studied through MRS add to the MRI visual information, and the result of an MRI/MRS examination can improve the diagnostic relevance of each.
Fig. 4: The 5 images from an MRI examination based on the experimental design in Figure 9. The images are laid out as in the design, with the centre-point image in the middle. It is clearly visible that the factors influence the image contrast. In "classical" MR terms, the image in the lower right corner is T1 weighted, the one in the lower left corner is T2 weighted and the one in the upper left corner is density weighted.
4 MULTIVARIATE IMAGES

An image is in general continuous in both spatial co-ordinates and intensity. It can be presented as a function A(x,y), where x and y are the horizontal and vertical co-ordinates and A is the intensity function. All computer memory storage and calculations require digitisation, and so does multivariate imaging. Digitisation is done by taking the continuous analog image and making it discrete. This must be done in two ways. First, the co-ordinates are digitised. If a 2-dimensional image has co-ordinates x and y, then these are replaced by the integer co-ordinates I and J. The image is divided up, like a chessboard, into small rectangles or squares, called pixels. Then, for each of these squares, an intensity level is registered. This is a digitised value of the average analog intensity of the pixel. The digitisation of the intensity is the second way of making the image discrete (Figure 5). This is more or less like pictures in newspapers: a number of small fields with different reflective values gives the impression of a continuous intensity function that is interpreted as an image.
Fig. 5: A digitised image is an array of square (rectangular) elements called pixels. They are organised in rows and columns so that each pixel has a row index and a column index. Each pixel represents a grey value. In radiology, the range 0-4095 is used for the grey values.

Because of the digitisation, some of the spatial resolution may be lost. Also, for each pixel, some intensity resolution is lost. In many cases square images are collected, although this is not necessary; any rectangular size may do. Popular image sizes are 256x256, 512x512 and 1024x1024. For reasons of hardware construction it is useful to work with powers of 2: 256 = 2⁸, 512 = 2⁹ and 1024 = 2¹⁰. The discrete values for the pixel intensities are also powers of 2. 256 grey levels, from 0 to 255, are often used. In radiology, 2¹² grey levels, from 0 to 4095, are a tradition. See also Geladi et al. (1992). The digital images described up to now are univariate or grey-level images. A lot of operations are possible on grey-level images, and a significant literature is available describing all kinds of contrast improvement, error
correction, feature extraction, etc. (Pratt, 1978; Gonzalez & Wintz, 1987; Lim, 1990; Rosenfeld & Kak, 1982; Gonzalez & Woods, 1992). There are always different ways of imaging the same object. In microscopy and in satellite imaging, different wavelength bands are used for making images of the same microscope slide or geographic region. In electron microscopy, X-ray energy, wavelength or electron energy is used to create many images of the same sample. More examples are given in Geladi et al. (1992), Esbensen et al. (1992), Geladi & Esbensen (1991), Van Espen et al. (1992), Pedersen (1994), Bonnet et al. (1992) and Trebbia & Bonnet (1990). In general, one may say that all imaging can be done in different variables. The rule that counts here is pixel correspondence, also called congruence. A stack of congruent images, all collected at different wavelengths, forms a multivariate image (Figure 6). It will be shown further on in this chapter that MRI imaging may also be done at different variables to create a stack of congruent images.
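The intensity digitisation described above can be sketched as a simple quantisation (assuming numpy; the "analog" image is random noise used only to exercise the grey-level ranges):

```python
import numpy as np

rng = np.random.default_rng(6)

# A continuous-intensity "analog" image, sampled on a 256x256 pixel grid.
analog = rng.random((256, 256))

# Digitise the intensity: map [0, 1) onto 256 grey levels (8 bits), or onto
# the 4096 levels (12 bits, 0-4095) traditional in radiology.
grey8  = np.floor(analog * 256).astype(np.uint8)
grey12 = np.floor(analog * 4096).astype(np.uint16)

print(grey8.min(), grey8.max(), grey12.max())
```

The 12-bit version keeps 16 times finer intensity steps than the 8-bit one over the same range, which is the intensity resolution that would otherwise be lost.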
Fig. 6: It is often possible to collect many images of the same object, e.g. in different wavelength bands. When there is congruence (pixel correspondence) the images may be put together in an array, a multivariate image. The multivariate image shown here has I columns, J rows and K variables or bands.
The stack formed by the congruent images may also be presented as a 3-way array of size IxJxK, where I and J are the image sizes and K is the number of variables (Figure 6). Since the variables are very often wavelengths, the number K is also called the spectral resolution. An image may have a high spatial resolution and a low spectral resolution, or it may have a high spectral resolution and a low spatial resolution (Figure 7).
Fig. 7: Some multivariate images have a high spatial resolution (many pixels) and a low spectral resolution (few variables). Other multivariate images have a high spectral resolution and a low spatial resolution. Memory and calculation speed often restrict the use of images with both high spatial and spectral resolution. Also technical problems such as data collection times, physical constraints and others may limit spatial and / or spectral resolution.
Images with high spectral and spatial resolution would be ideal, but they also create a number of problems. Part of this is because of the huge memory and calculation requirements of such large arrays of numbers. Other constraints may be of a technical or physical nature. Image collection may be too slow or too expensive when both high spatial and high spectral resolution are used. There may also be physical limits to detectors that make high resolution unnecessary. Very often there may be a need for first making an image of high spatial and low spectral resolution (Figure 8) and finding interesting regions in it. When the interesting regions in this image are found, the spectral resolution is then increased locally.
Fig. 8: In some cases, it is necessary to identify regions of special interest in a multivariate image of low spectral resolution and to increase the spectral resolution locally in the interesting regions.
5 THE EXPERIMENT

The experimental data are a 5x256x256 MR image of a 5 mm slice through the head of a healthy control. The 5 variables used are described below under "experimental design". The raw images are shown in Figure 4. The total data set has 17 anatomical slices, but only one is used here as an example. Furthermore, Figure 4 shows the image responses as a function of their position in the design. This gives an idea of how the values of the factors influence the response. It is also notable that different image contrasts are obtained by changing the factors TE and TR. Instead of choosing the one and only "correct" image, it is better to use all five as a multivariate image.

6 EXPERIMENTAL DESIGN

In order to make optimal combinations of the parameters (factors) influencing an experiment, experimental design is used. 2ᴷ factorial designs are easy to understand and construct, and very popular. The parameters influencing the images in MRI are TR (relaxation time delay) and TE (spin-echo time). A design may be used to span the space of both. The response in each design point is an image. For the example, the design of Figure 9 was used to span the space of TR and TE. This is a 2² design with one centre point. For reasons of practical constraints, this design is not perfect: it was not possible to get a centre point exactly in the centre.
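The design can be written out explicitly. A sketch in plain Python (the level values are those read from the chapter's design and loading-plot figures):

```python
# A 2^2 factorial design in TE and TR with one (approximately central)
# centre point, using the levels read from the chapter's figures.
TE_levels = (17.0, 68.0)       # ms
TR_levels = (500.0, 4210.0)    # ms

design = [(te, tr) for tr in TR_levels for te in TE_levels]
design.append((34.0, 2105.0))  # centre point, not exactly central in practice

for te, tr in design:
    print(f"TE = {te:4.0f} ms   TR = {tr:5.0f} ms")
```

The five design points correspond one-to-one to the five images of Figure 4, the centre point supplying the middle image.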
The interpretation of the design is done as a function of the analysis of the multivariate image obtained. In principle, it would be possible to obtain a response surface where each response is an image.
Fig. 9: The design used for creating the 5x256x256 didactic example. Two values for both TE and TR were chosen (TE = 17, 68 ms; TR = 500, 4210 ms), with a centre point at (34, 2105). These values are reflected in Figure 4, where the image in the top left corner has TE = 17 and TR = 500 and the lower right image has TE = 68 and TR = 4210.
7 MULTIVARIATE IMAGE ANALYSIS
7.1 Principal Component Analysis

When multivariate images are available, one would like to use the extra information in the K variables in the analysis of the images. When K is large, one would also like to extract the relevant information and do data reduction. A technique that is very appropriate for treating data sets with many variables is Principal Component Analysis (PCA). Principal Component Analysis is closely related to the Singular Value Decomposition in Chapter 8 (Lupu & Todor). A short description of the method is necessary. In the literature, PCA on multivariate images is described in Geladi et al. (1989), Esbensen & Geladi (1989), Geladi et al. (1992), Geladi & Esbensen (1991), Grahn et al. (1989), Lindgren & Geladi (1992), Geladi et al. (1994), Van Espen et al. (1992), Pedersen (1994), Bonnet et al. (1992) and Trebbia & Bonnet (1990). Assume a data matrix X, with N rows (objects) and K columns (variables), is available. The goal is to reduce K to some lower dimension A with an optimal representation of the interesting properties of the data and separation of the noise into a residual E:
X = TP' + E = Σ (a = 1...A) ta pa' + E   (1)
X : an N x K matrix
T : an N x A matrix in which the score vectors ta are collected
P : a K x A matrix in which the loading vectors pa are collected
E : an N x K residual matrix
ta : a score vector of N elements
pa : a loading vector of K elements
a = 1...A : an index over the components; A is called the "rank"

The vectors ta are called scores and the vectors pa are called loadings. They can be used to study relationships between objects and variables, respectively, as is explained later on. The ta and pa may also be calculated by Singular Value Decomposition (Chapter 8). It is important to remember that the significant characteristics of the data are retained in the score and loading vectors and that a noisy residual is discarded. When dealing with images, the data array is at least 3-way, with dimensions IxJxK, where K is the number of variables. Principal Component Analysis is still useful. How this is done is better shown in a figure than with equations: Figure 10 shows how PCA is used to make loading vectors, score images and a residual array. Also for a multivariate image, the significant properties are retained in a reduced form and the noisy residual is discarded.
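Applying PCA to an I x J x K image amounts to unfolding it so that every pixel becomes an object. A minimal sketch (assuming numpy; the multivariate image is synthetic):

```python
import numpy as np

rng = np.random.default_rng(7)

# A small multivariate image: I x J pixels, K congruent variable bands.
I, J, K = 32, 32, 5
base = rng.random((I, J))                       # one underlying "anatomy"
bands = np.stack([c * base + 0.01 * rng.random((I, J))
                  for c in (1.0, 0.8, 0.5, 0.3, 0.2)], axis=-1)   # I x J x K

# Unfold the image to an (I*J) x K matrix: every pixel becomes an object.
Xim = bands.reshape(I * J, K)
Xim = Xim - Xim.mean(axis=0)                    # mean-centre over the pixels

# Ordinary PCA on the unfolded matrix; the scores fold back into score images.
U, g, Pt = np.linalg.svd(Xim, full_matrices=False)
T = U * g
score_images = T.reshape(I, J, K)               # component a -> score_images[:, :, a]

print(np.round(g**2 / np.sum(g**2), 3))         # the first component dominates
```

Folding the score vectors back to I x J arrays gives exactly the score images of Figure 10: same spatial resolution as the original bands, with the spectral information concentrated in the first few components.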
7.2 Using loading vectors

The loading vectors pa may be used to make scatter plots, also called loading plots. These plots reveal the relationships between the K variables. Figure 11 shows a typical loading plot. Generally speaking, variables close to the origin of the loading plot are ones that contribute little to the PCA model. Variables that appear close to each other give a similar contribution to the PCA model; they are said to be correlated. More advanced interpretations of loading plots may be made, but these are often problem-dependent, so no general rules are applicable. The scree plot and loading plots from the example are shown in Figure 12. The scree plot shows the eigenvalues as a function of component number and is sometimes used to get a quick and rough idea of how many components are important. In this case, the square roots of the eigenvalues are shown in the scree plot. The eigenvalues represent nothing but the amount of the total sum of squares of the data that is explained by each component. It is to be expected that important components explain a large percentage of the total sum of squares and that small components are noisy and maybe less important. In this lies the importance of showing scree plots.
According to the scree plot, 2-3 components are necessary. As expected, the p1-p2 loading plot distributes the images in a way similar to the design.
Fig. 10: With Principal Component Analysis, a multivariate image may be decomposed into a structural part and a residual. The structural part consists of a number of principal-component loading vectors and score images (2 here). The score images have the same spatial resolution as the original images, but the spectral information in them is more concentrated.
Fig. 11: A typical loading plot. It is important to note the position of the origin. Variables close to the origin are of little importance to the model under study. Variables that are close together are said to be correlated. Variables may form clusters of more than 2, or they may be on their own (describing unique properties of the data).
Fig. 12: Some plots resulting from the PCA analysis of the 5x256x256 multivariate image. To the left is the "scree" plot of λ^(1/2) against component number. To the right is the p1-p2 loading plot. The scree plot shows that 2-3 components will suffice. In the loading plot, the design points are identified. The p1-p2 loading plot shows how the design is repeated in the multivariate space, with a slight deformation, as was expected.
Fig. 13: The score images produced from the raw images in Figure 4. Component 1 is in the upper left corner, component 5 in the lower right.
7.3 Using score images

In principal component analysis of a matrix X, the score vectors are studied in scatter plots called score plots. These reveal the relationships between the objects, and interpretation may be done as for the loading plots. The score images are a more powerful tool than mere vectors: since they are images, they may be studied as such and used for a visual interpretation. The data reduction makes the score images less noisy than the original ones and gives more condensed visual information. Figure 13 shows the score images obtained from the 5x256x256 multivariate image in Figure 4. It may be seen how the important visual information gradually disappears with increasing component number, the score images becoming noisier. Even more powerful than score images are the score plots. Figures 14 and 15 show how these are constructed from score images. Because of the huge number of pixels in an image (256x256 in the examples shown here) the score plots become rather cluttered. Therefore they are shown with a pixel-density coding, or even better with a colour transformation of the density. An example may be seen in Figure 16. The density in the score plots gives an idea of existing clustering, gradients and outlier pixels.
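The pixel-density coding of a score plot can be sketched with a 2D histogram (assuming numpy; the score values below are synthetic stand-ins for the first two score images):

```python
import numpy as np

rng = np.random.default_rng(8)

# Synthetic scores for two components of a 256x256 image: 65536 pixel pairs,
# drawn from two "tissue" clusters.
t1 = np.concatenate([rng.normal(0, 1, 40000), rng.normal(5, 0.5, 25536)])
t2 = np.concatenate([rng.normal(0, 1, 40000), rng.normal(4, 0.5, 25536)])

# With this many points an ordinary scatter plot is cluttered, so the score
# plot is rendered as a 2D pixel-density histogram instead.
density, xedges, yedges = np.histogram2d(t1, t2, bins=64)

# Dense bins reveal pixel classes (clusters); near-empty bins hold outliers.
print(density.shape, int(density.max()))
```

The density array is what a colour transformation is applied to in Figure 16: each bin count is mapped to a grey level or colour, turning 65536 points into a readable plot.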
Fig. 14: Two score images (here the first and second scores) may be used to construct a scatter plot, or score plot. Paired intensity values are used for this. The principle is shown for one pixel pair, but the operation is repeated for all pixel pairs available.
Fig. 15: With the huge number of pixel pairs (see Figure 14), the scatter plots become pixel-density plots showing dense classes, gradients and outlier pixels. The classes may be of different size and intensity. Examples of score plots are shown in Figure 16.
Fig. 16: The figure shows two score plots as density plots: T1-T2 and T1-T3 for the example in Figures 4 and 13.
Fig. 17: In order to study the pixel classes or clusters in the score plot, they may be marked by a polygon. The pixels inside the polygon are then backprojected to a binary image. The pixel groups in the binary image can sometimes be interpreted as special phenomena. This method is called multivariate image segmentation (MIS).
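The polygon-marking and backprojection step of Figure 17 can be sketched as follows. This is a minimal illustration, not the authors' implementation; the ray-casting point-in-polygon test, the toy 8x8 score images and the rectangular polygon are all assumptions made for the example:

```python
import numpy as np

def points_in_polygon(x, y, poly):
    """Even-odd ray-casting test; poly is a list of (x, y) vertices."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    inside = np.zeros(x.shape, dtype=bool)
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        crosses = (y1 > y) != (y2 > y)     # edge straddles the horizontal ray
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (x2 - x1) * (y - y1) / (y2 - y1)
        inside ^= crosses & (x < x_cross)  # toggle on each crossing to the right
    return inside

# Two score images (toy 8x8 example; 256x256 in the chapter's examples).
rng = np.random.default_rng(1)
T1 = rng.uniform(-1, 1, size=(8, 8))
T2 = rng.uniform(-1, 1, size=(8, 8))

# A polygon drawn around a cluster in the T1-T2 score plot.
polygon = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Backprojection: mark every pixel whose (T1, T2) pair falls inside the
# polygon, giving the binary segmentation image of Figure 17.
mask = points_in_polygon(T1.ravel(), T2.ravel(), polygon).reshape(T1.shape)

print(mask.dtype, mask.shape)   # bool (8, 8)
```

In practice the polygon would be drawn interactively on the density-coded score plot, and the resulting binary mask would be overlaid on the original image to reveal where the selected pixel class sits in image space.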
An important use of the clustering in score plots is multivariate image segmentation. The principles of this technique are given in Figure 17. A cluster of pixels is identified by a polygon, and the pixels in the cluster are then highlighted in image space. In this way, different tissue types may be identified. Color images are shown on the CD-ROM and their explanation is given in the appendix.

Multivariate images may also be used for regression and calibration purposes. The success of latent variable techniques (PCA) has led to the use of latent variable regression on multivariate images. Principal Component Regression (PCR) and Partial Least Squares (PLS) have been used. Examples of this, and the theory of the regression models, may be found in Grahn et al. (1989), Geladi & Esbensen (1991), Esbensen et al. (1992), Grahn & Sääf (1992), Geladi et al. (1994), Lindgren et al. (1993, 1994), de Jong (1993) and de Jong & Ter Braak (1994). In Grahn et al. (1989) the use of PLS regression for modelling and finding certain types of tissue in multivariate MRI images is discussed. In Geladi & Esbensen (1991) PCR is applied to a multivariate satellite image and techniques for studying the regression model visually are introduced. Further applications are given in Esbensen et al. (1992) with MRI and chemical examples. In Geladi et al. (1994), regression by MLR, RR, PCR and PLS is applied to a multivariate microscopic image. In Lindgren et al. (1993, 1994), de Jong (1993) and de Jong & Ter Braak (1994), theoretical aspects of PLS applied to large data matrices (with many objects, i.e. many pixels) are explained.

8 ACKNOWLEDGEMENTS

The authors acknowledge valuable discussions with, and experimental work carried out by, Dr. Jan Sääf. The MacLisPix software of David Bright, NIST, Gaithersburg, MD was used to make the score plots.

9 REFERENCES
Bonnet N., Simova E., Lebonvallet S. & Kaplan H., 1992. New Applications of Multivariate Statistical Analysis in Spectroscopy and Microscopy. Ultramicroscopy, 40, 1-11.
de Jong S., 1993. SIMPLS: An Alternative to Partial Least Squares Regression. Chemometrics and Intelligent Laboratory Systems, 18, 251-263.
de Jong S. & Ter Braak C., 1994. Comments on the PLS Kernel Algorithm. Journal of Chemometrics, 8, 169-174.
Ernst R.R. & Anderson W.A., 1966. Application of Fourier Transform Spectroscopy to Magnetic Resonance. Rev. Sci. Instr., 37, 93.
Esbensen K. & Geladi P., 1989. Strategy of Multivariate Image Analysis (MIA). Chemometrics and Intelligent Laboratory Systems, 7, 67-86.
Esbensen K., Geladi P. & Grahn H., 1992. Strategies for Multivariate Image Regression. Chemometrics and Intelligent Laboratory Systems, 14, 357-374.
Geladi P., Isaksson H., Lindqvist L., Wold S. & Esbensen K., 1989. Principal Component Analysis of Multivariate Images. Chemometrics and Intelligent Laboratory Systems, 5, 209-220.
Geladi P. & Esbensen K., 1991. Regression on Multivariate Images: Principal Component Regression for Modeling, Prediction and Visual Diagnostic Tools. Journal of Chemometrics, 5, 97-111.
Geladi P. & Esbensen K., 1991. Multivariate Image Analysis in Chemistry: an Overview. In: J. Devillers and W. Karcher (Editors), Applied Multivariate Analysis in SAR and Environmental Studies, pp. 415-445, Kluwer Academic Publishers, Dordrecht.
Geladi P., Bengtsson E., Esbensen K. & Grahn H., 1992. Image Analysis in Chemistry. Part 1: Properties of Images, Greylevel Operations, The Multivariate Image. Trends in Analytical Chemistry, 11, 41-53.
Geladi P., Grahn H., Esbensen K. & Bengtsson E., 1992. Image Analysis in Chemistry. Part 2: Multivariate Image Analysis. Trends in Analytical Chemistry, 11, 121-130.
Geladi P., Swerts J. & Lindgren F., 1994. Multiwavelength Microscopic Image Analysis of a Piece of Painted Chinaware. Chemometrics and Intelligent Laboratory Systems, 24, 145-167.
Gonzalez R. & Wintz P., 1987. Digital Image Processing, 2nd ed. Addison-Wesley, Reading, MA.
Gonzalez R. & Woods R., 1992. Digital Image Processing. Addison-Wesley, Reading, MA.
Grahn H., Szeverenyi N., Roggenbuck M., Delaglio F. & Geladi P., 1989. Data Analysis of Magnetic Resonance Images: 1. A Principal Component Analysis Approach. Chemometrics and Intelligent Laboratory Systems, 5, 311-322.
Grahn H., Szeverenyi N., Roggenbuck M. & Geladi P., 1989. Tissue Discrimination in Magnetic Resonance Imaging: A Predictive Multivariate Approach. Chemometrics and Intelligent Laboratory Systems, 7, 87-93.
Grahn H. & Sääf J., 1992. Multivariate Image Regression and Analysis. Useful Techniques for the Evaluation of Clinical Magnetic Resonance Images. Chemometrics and Intelligent Laboratory Systems, 14, 391-396.
Howe F.A., Maxwell R.J., Saunders D.E., Brown M.M. & Griffiths J.R., 1993. Proton Spectroscopy in Vivo. Magnetic Reson. Quart., 9, 31-59.
Lauterbur P.C., 1973. Image Formation by Induced Local Interactions: Examples Employing Nuclear Magnetic Resonance. Nature, 242, 190.
Lim J., 1990. Two-dimensional Signal and Image Processing. Prentice-Hall, Englewood Cliffs, NJ.
Lindgren F. & Geladi P., 1992. Multivariate Spectrometric Image Analysis: An Illustrative Study with Two Constructed Examples of Metal Ions in Solution. Chemometrics and Intelligent Laboratory Systems, 14, 397-412.
Lindgren F., Geladi P. & Wold S., 1993. The Kernel Algorithm for PLS. Journal of Chemometrics, 7, 45-59.
Lindgren F., Geladi P. & Wold S., 1994. Kernel-based Regression; Cross-Validation and Applications to Spectral Data. Journal of Chemometrics, 8, 377-389.
Pedersen F., 1994. Interactive Explorative Analysis of Multivariate Images Using Principal Components. Ph.D. Thesis, Uppsala University.
Pratt W., 1978. Digital Image Processing. Wiley, New York.
Rosenfeld A. & Kak A., 1982. Digital Picture Processing (2 volumes). Academic Press, New York.
Trebbia P. & Bonnet N., 1990. EELS Elemental Mapping with Unconventional Methods. I. Theoretical Basis: Image Analysis with Multivariate Statistics and Entropy Concepts. Ultramicroscopy, 34, 165-178.
Van Espen P., Janssens G., Vanhoolst W. & Geladi P., 1992. Imaging and Image Processing in Analytical Chemistry. Analusis, 20, 81-90.
Acronyms

1D 1 Dimensional
2D 2 Dimensional
ADALINE ADAptive Linear NEuron
COSY COrrelation SpectroscopY
CPMG Carr-Purcell-Meiboom-Gill
CWNMR Continuous Wave NMR
DAISY Direct Assignment Interconnection SpectroscopY
DEPT Distortionless Enhancement Polarisation Transfer
DFT Discrete Fourier Transform
DIGGER Discrete Isolation from Gradient Governed Elimination of Resonances
DOUBTFUL DOUBle quantum Transition for Finding Unresolved Lines
FFT Fast Fourier Transform
FID Free Induction Decay
FIR Fast Inversion-Recovery or Finite Impulse Response
FLASH Fast Low-Angle SHot
FT Fourier Transform
GASPE GAted SPin Echo
HMBC Heteronuclear Multiple Bond Correlation spectroscopy
HMQC Heteronuclear Multiple Quantum Correlation spectroscopy
HSQC Heteronuclear Single Quantum Correlation spectroscopy
HSVD Hankel Singular Value Decomposition
IFT Inverse Fourier Transform
IIR Infinite Impulse Response
INADEQUATE Incredible Natural Abundance DoublE QUAntum Transfer Experiment
INEPT Insensitive Nuclei Enhanced by Polarisation Transfer
IR Inversion-Recovery
LP Linear Prediction
LPSVD Linear Prediction Singular Value Decomposition
MAS Magic Angle Spinning
MEM Maximum Entropy Method
MLBS Maximum Length Binary Sequence
MLM Maximum Likelihood Method
MRI Magnetic Resonance Imaging
MRS Magnetic Resonance Spectroscopy
MVA MultiVariate Analysis
nD n Dimensional
NLLS Non Linear Least Squares
NNLS Non Negative Least Squares
NMR Nuclear Magnetic Resonance
NOE Nuclear Overhauser Effect
NOED Nuclear Overhauser Effect Difference spectroscopy
NOESY Nuclear Overhauser Effect SpectroscopY
OLS, LS Ordinary Least Squares
PAD Positive Additive Distribution
PCA Principal Components Analysis
PEP Preservation of Equivalent Pathways
PFG Pulsed Field Gradient
PGSE Pulsed Gradient Spin Echo
PL Padé-Laplace
PLM Padé-Laplace Method
PLS Partial Least Squares
PRBS Pseudo Random Binary Sequence
PSF Point Spread Function
QMRI Quantitative Magnetic Resonance Imaging
RELAY RELAYed coherence transfer spectroscopy
RLSA Reference LineShape Adjustment
ROESY Rotating-frame Overhauser Effect SpectroscopY
SNR, S/N Signal to Noise Ratio
SR Saturation-Recovery
SVD Singular Value Decomposition
T1 Longitudinal or Spin-Lattice Relaxation Time
T1ρ Rotating Frame Longitudinal Relaxation Time
T2 Transverse or Spin-Spin Relaxation Time
TOCSY TOtal Correlation SpectroscopY
TPPI Time Proportional Phase Incrementation
Index

1-9
1D NMR, 347, 354, 496, 500 1D spectra, 215 2D analysis, 215 2D Fourier Transform, 517 2D NMR, 151, 153ff, 157ff, 348, 353, 354, 357, 374, 407ff, 496 2D spectra, 315 3D NMR, 374, 391 4D NMR, 374, 383, 407 A Acceptable error, 411, 416 Accuracy, 307, 318, 415-418, 420, 457, 469 Acoustic ringing, 308 ADALINE, 409 Adaptive smoothing, 308 Algorithm (see also Method), 33, 86, 87, 91, 93ff -, back-projection, 504 -, backpropagation, 413 -, baseline flattening, 310 -, Cambridge, 30, 31, 33 -, FIDDLE, 350ff -, Gauss-Newton, 81-85, 97 -, Genetic, 95, 97ff -, integration, 311 -, Marquardt, 84, 85, 97, 218 -, Meyer-Roth, 85, 87 -, Newton, 81-85 -, Perceptron, 411 -, phase correction, 310 -, simplex, polytope, 79, 80, 94, 98 -, steepest descent, 81, 83-85 -, Zeiger-McEwen, 184
Aliasing, 18, 155 Amplitude, 178, 180, 181, 187, 188, 199, 200, 214, 236, 237, 376, 384, 393 -, modulation, 376, 384, 393 Apodisation, 15, 368, 381, 391 Appraisal, 44, 45, 62 Artefacts, 414, 414 Assessment of results, 461 Audio-frequency signal, 120 Autocorrelation, 14, 196, 198, 490, 491, 494 -, method, 166 Autoregression (AR) modelling, 168 Axiomatic approach, 32 B
Background gradients, 480 Back-projection, 504 Backpropagation, 410-412, 415-419 Backus-Gilbert, 59 Band-pass, 134, 142 Band-reject, 134 Baseline, 191, 236, 239, 307, 310, 313, 314 -, rolling, 309 -, correction, 387 -, errors, 350 Bayesian method, 394 Bayesian statistics, 32, 408 Bias, 307 Bicomplex, 378 Bisection, 79 Bloch, 257, 261, 330, 331 Bloch-Torrey, 263-266, 269, 270 Boltzmann machine, 413, 418, 419 Burg (All Poles) method, 33, 392
Burst, 480 C Calculated images, 455 Calibration samples, 462 Canberra, 196, 198 Carr-Purcell-Meiboom-Gill (CPMG), 478, 187, 228, 250, 236-241, 243, 245, 264-266 Carver-Richards, 250 Centering, 432 Chebychev, 196, 198 Chemical shift, 412, 419, 420, 453-455, 483 Chi-squared (χ²), 29, 30, 31, 33, 47, 194, 211, 222, 223, 286, 378, 379, 380, 382 -, maps, 456, 457, 458, 460, 462 -, surface, 286 City block, 196, 198 Classical dimension, 375 Coefficient -, congruence, 434, 443 -, correlation, 341, 433, 435 -, of diffusion (D), 263, 283ff Coherent flow, 480 Complex Fourier Transform, 376, 380 Composite display, 386 Computer burden, 397 Concentration, 306, 320 Confidence interval, 72, 75, 79, 93-96 Conformation space, 320 Congruence -, coefficient, 434, 443 -, matrix, 443 -, of variables, 433, 443, 444 Conjugate gradient method, 34 Constituent analysis, 441 Constrained -, methods, 438 -, maximisation, 34, 37 -, regularisation, 31 Construction, 44, 45
CONTIN, 52, 219, 236, 242, 264, 265, 270, 284 Continuous Wave (CW) NMR, 316, 348, 362ff Contrast, 508 Contrast agent, 454 Convection, 480 Convergence, 82, 86, 87, 92, 96 Convolution, 11, 14, 492 Cooley-Tukey, 23 Correlation, 3, 196, 198, 211, 362ff, 431 -, analysis, 422 -, coefficient, 341, 433, 435 -, matrix, 433, 434, 441, 443, 448 -, of samples, 422 -, of variables, 443, 444 -, theorem, 14 -, time, 249, 258, 261 COSY, 153, 155, 157, 315, 381, 382, 389, 397, 415, 434, 500 Covariance -, matrix, 78, 88, 89, 92 -, method, 167 Cramer-Rao, 317, 318 Criterion -, autocorrelation, 196, 198 -, Canberra, 196, 198 -, Chebychev, 196, 198 -, city block, 196, 198 -, Euclid, 196, 197 -, FIT, 237 -, Minkowski, 196, 197 -, misfit, 47 -, optimisation, convergence, 86, 194, 196, 197, 208 Cross Entropy, 31, 32 Cross peak, 310, 313, 320, 414-416 Cross-correlation, 440, 489, 494, 496, 498 Cross-validation, 437-439 Curve fitting, see Regression Curves, Multiexponential, 100ff CW-NMR, 362
DAISY, 158 Data abstraction, 407, 413-418 Data compression, 398 Data filling, 391, 394 Data preparation, 429, 439 Deconvolution, 346ff, 365, 392, 397 -, methods, 58 Dephasing, 10 DEPT, 308, 427, 429 Derived parameters, 456 Design -, Experimental, 513, 523 -, Factorial, 513 Diagonal, 385 -, suppression, 390 Difference Spectroscopy, 348, 352, 353, 356 Diffraction -, PGSE and, 285 -, unrestricted Brownian, 283 -, analogy, 285 -, analysis, 281 -, in porous sample, 291 Diffusion, 263, 265, 278, 454 -, restricted, 281, 284, 293 -, coefficient (D), 263, 283ff -, coefficient (D) in QMRI, 453, 454, 479 -, in T1 measurements, 476 -, measurements, 480 DIGGER, 506 Digitisation noise, 129 Dirac delta (δ) function, 11, 128, 153, 220 Discretisation, 222 Dispersion, 265, 270, 431, 441, 454 -, variable, 431 -, curve, 250, 251, 254 -, matrix, 432, 434, 437 Display, 386ff Distinction between models, 460 Distortion, 121 Distribution, 70, 74-76, 79, 88-90
Distribution -, continuous, 218ff, 264ff, 460 -, discrete, 218, 221, 236, 460 -, droplet size, 230 -, error, 462 -, Normal, 76, 78, 88-90, 92 -, Pascal, 77, 79 -, pore size, 231, 265ff, 454 -, relaxation times, 218ff, 236, 242 -, of sampling points, 467 Domain -, Fourier, 365, 504 -, frequency, 6, 146, 150, 153, 313, 366, 499 -, measurement, 306 -, spatial, 286 -, spectral, 306 -, time, 7, 25, 146, 177, 191ff, 248, 286, 313, 316, 317, 366, 498 DOUBTFUL, 157-158 Dynamic, 379 -, range, 157, 465 E
Eddy currents, 479 Eigenfunctions, 53ff Eigenvalues, 53ff, 437 Eigenvectors, 437 Entropy, 27, 30, 31, 32, 39, 40 -, Jaynes, 219 -, definition, 39 -, of images, 28 -, of NMR spectra, 30 -, Shannon, 223 EPI, 474 Error, 71, 73-77, 79, 87-90, 92-96, 98 -, acceptable, 412, 417 -, baseline, 350, 387 -, phase, 346 -, random, 346, 461, 463 -, systematic, 346, 462, 471 -, function, 408, 409, 411, 415 -, of prediction, 437
-, propagation, 320 Euclid, 196, 197 Euclidean norm, 171, 433 Evolution, 375 Exchange -, chemical, 249, 278, 330ff -, diffusion, 263, 265 -, fast, 230, 250, 257 -, slow, 230 -, rate, 250-252, 255-258 Existence, 44, 45, 62 Experimental Design, 62, 523 Explained variance, 437 Exponential sums, 71, 85, 92, 93 EXSY, 315 F Fast Fourier Transform (FFT), 23, 174, 177, 180, 181, 211 Feature vector, 408, 409, 417 Fejér, 309 Fellgett advantage, 1, 4 FID, see Free Induction Decay FIDDLE, 350ff FIR, 469-470 Field gradient, 504 Field/Frequency ratio instability, 346 Filters, 145ff -, band selective, 155 -, binomial, 145ff -, Blackman, 138 -, cascade, 125, 134 -, chemical shift, 153 -, digital, 120ff, 121, 122, 139, 145ff, 310 -, electronic, 120, 121 -, Finite Impulse Response (FIR), 145, 123, 124, 127, 131ff -, Gaussian, 152 -, high-pass, 134 -, Infinite Impulse Response (IIR), 123, 124, 127, 129-133, 135 -, J, 160
-, low-pass, 124, 133, 138 -, low-pass Butterworth, 134 -, low-pass electronic, 120 -, low-pass FIR, 132 -, matched, 500, 501 -, multi-quantum, 157 -, parallel, 125 -, phase, 123 -, second order, 126 -, serial, 125 -, sine bell, 152 -, stable digital, 126 -, tan Butterworth, 134, 136, 137 Filtering, 25 Finite element simulation, 456 Finite Impulse Response (FIR), 123ff, 130-133, 135, 140 First point problem, 309 FLASH, 471, 476-477, 505 Flow measurements -, turbulence, 481 -, velocity spectrum, 482 -, velocity, 453, 457, 459, 480 Fluid -, Non-Newtonian, 297 -, velocity, 297ff Fourier Domain, 365, 504 Fourier Transform (FT), 1-24, 25, 31, 33, 37, 39, 53, 57, 120, 122, 146, 177, 235, 294, 295, 301, 306, 309, 314, 319, 346, 350ff, 365, 367, 372, 376, 380, 381, 384, 386, 398, 428, 458, 491-495, 497-499, 517 -, 2D, 517 -, Complex, 349, 376, 380 -, Discrete, 309 -, Hypercomplex, 357, 380, 384 -, Inverse (IFT), 8, 10, 13, 314, 346, 349, 350ff, 365-368, 392 -, Real, 376, 380
Free Induction Decay (FID), 4, 30, 33, 100, 110ff, 120, 121, 129, 130, 131, 149, 153, 157, 172, 173, 180, 240, 245, 259, 309-311, 315, 366, 370, 374, 375, 381, 391, 392, 395, 428, 474, 491, 494, 497, 501, 515, 517 Frequency dependent filter phase, 121, 123, 132, 142 Frequency dependent gain, 123, 142 Function, see also Model -, Gaussian, 132 -, likelihood, 77, 98 -, merit, 75, 76, 79, 83, 85, 86, 88, 90, 91, 95 -, polynomial, 70, 71, 79, 95, 96 -, rational, 123 -, robust loss, 90, 91 -, spline, 70, 72, 95, 96, 98, 99 -, QMRI, 452, 463 G GASPE, 428 Gauss, Gaussian, 289, 293, 311, 312, 317, 355, 356 Gauss-Jordan, 195 Gauss-Markov condition, 76, 78, 88 Gauss-Newton, 192, 331, 333 Generalisation, 411, 416, 418, 419 Generalised Least Squares (GLS), see Least squares, generalised Golden section, 79 Goldman-Shen, 259, 262 Grad, 223, 224 Gradient, 77, 78, 81, 83, 85, 87, 94, 95, 193, 199, 333 Gradient vector, 34 H
Hadamard, 2 -, imaging, 505 -, spectroscopy, 495 -, Transformation, 490, 495 Hahn Spin Echo, 156 Hankel
-, matrix, 172, 182ff -, Singular Value Decomposition (HSVD), 181ff, 393 Hartmann-Hahn, 158 Hessian, 81-85, 94, 193, 199, 200 -, matrix, 34 Heteroscedasticity, see Variance heterogeneity Hilbert, 11 -, Transform, 349-351, 398 HMBC, 352 HMQC, 155, 157-160, 386 HOGWASH, 349 Hopfield network, 413 HSQC, 155 HSVD, see Hankel Singular Value Decomposition Hypercomplex, 377ff Hyperplane, 375 I
Identifiability problem, 88, 92, 93 Identification, 422 IEEE-754/854, 129 Image space, 34 Images -, digital, 513, 520 -, entropy of, 28 -, multivariate, 520ff -, score, 528, 529ff Imaging, 285, 452ff, 489ff, 513ff -, Hadamard, 505 -, Magnetic Resonance (MRI), 452ff, 489ff, 513ff -, magnitude, 299-302 -, molecular motion, 294 -, q encoded, 295 -, velocity, 296, 299-302 Imperfect 180° pulses -, and T1, 475 -, and T2, 239ff Impulse response function, 141 INADEQUATE, 157-158
INEPT, 154, 308 Infinite Impulse Response, see Filters, Infinite Impulse Response (IIR) Information, 26 Initial estimates, Starting values, 80, 87, 91, 93, 95, 188, 202, 340 Integral, 307 Intensity, Peak, 26 Interatomic distance, 307 Instrument instability -, and T1, 476 -, and T2, 241ff Inverse Fourier Transform (IFT), 8, 10, 13, 314, 350ff, 365-368, 392 Inverse method, 393 Inversion-Recovery (I-R), 239, 257, 464, 469, 471, 475 Iterative schemes, 134 J
Jackknife, 206, 214 Jacobian, see Gradient K
k-space, 503 Kullback-Leibler, 31 L Lack of Fit, 213 Lagrange, Lagrangian, 34, 38, 223 Laplace Transform, 44, 45, 53, 58, 100ff, 218, 220 Larmor, 13 Layer -, hidden, 410-412, 415, 416, 419 -, input, 410-412, 415, 416, 419 -, output, 409, 410, 412, 415, 416, 419 Learning (Training) -, supervised, 408 -, unsupervised, 408, 410 Least squares, see also Regression -, generalised (GLS), 88, 89, 94
-, nonlinear (NLLS), 284, 286, 289, 291, 292, 331, 332, 458, 462 -, nonnegative (NNLS), 49, 284 -, ordinary (OLS), 49, 50, 68, 69, 76-80, 84, 86, 88-94, 97, 98, 248, 264, 286, 307, 312, 316, 330ff, 438, 439, 498 -, weighted, 68, 76, 89, 91, 94 Likelihood, 77 -, concentrated, 89 -, maximum, 76, 99, 307, 316 -, quasi, 85, 89, 98 -, method, Maximum (MLM), 393 Line fitting, 397 Linear inverse theory, 49 -, minimum structure solutions, 50 -, minimum misfit solutions, 49 -, T partition, 49 Linear Prediction (LP), 111, 55ff, 164ff, 307, 318, 358, 390, 395, 399 -, and Singular Value Decomposition (LPSVD), 55ff, 164ff, 177, 392 -, backward, 56 -, forward, 55 Linear programming, 50, 51, 60 Linear systems, 491 Linearisation (see also Transformation), 73-75, 207, 208 Lineshape, 376-378, 384, 387, 393, 396 -, correction, 346, 354 Loadings, 439, 441 -, vectors, 525 Local Maxima, 414-417 Local Minima, 412, 417, 419 Lorentz, Lorentzian, 311, 312, 314, 318, 348, 350, 354, 363, 376, 378, 390, 396, 428 Lorentz-Gauss, 315 LSTSQ (see also Meyer-Roth), 85, 95, 97 M
MADALINE, 409, 416, 417 Magic angle, 262 -, spinning (MAS), 500
Magnetic Resonance Imaging (MRI), 452ff, 489ff, 513ff Magnetisation, 306 Marker -, object spectrum, 438 -, objects, 438, 440, 447, 448 -, projections, 438, 441 -, variable, 438 Markov parameters, 182 Marquardt, 48, 191ff, 218, 223, 227, 228, 317, 458 Matrix -, congruence, 443 -, correlation, 433, 434, 441, 443, 448 -, covariance, 78, 88, 89, 92 -, dispersion, 341, 432, 434, 437 -, Hankel, 172, 182ff -, Hessian, 34 -, information, 332 -, prediction, 171 -, residual, 437 -, Toeplitz, 167 -, transformation, 438 -, variance / covariance, 433 Maximisation, Constrained, 34, 37 Maximum Entropy Method (MEM, MaxEnt), 25-43, 44, 111, 218ff, 248, 264, 307, 319, 349, 358, 392, 393, 458 Maximum Entropy reduction, 428 Maximum Length Binary Sequence (MLBS), 490 Maximum Likelihood, 307, 316 Maximum Likelihood Method (MLM), 393 Measures of dispersion, 432 Memory function, 492 Method, see also Algorithm -, adaptive, 79, 85, 97 -, autocorrelation, 166 -, Bayesian, 394 -, Burg (All Poles), 33, 392 -, bootstrap, 94, 99 -, combined, 79, 84, 86
-, conjugate gradient, 34 -, covariance, 167 -, damped, 84, 98 -, descent, 81, 83-85 -, direct, 79, 82 -, Gauss-Newton, 81, 84 -, Hartley's, 84, 85 -, hybrid, 79, 85, 98 -, iterative, 79-81, 83, 84, 86, 87 -, Marquardt, 191ff, 458 -, Maximum Entropy (MEM, MaxEnt), 25-43, 44, 111, 218ff, 264, 307, 319, 349, 358, 392, 393, 458 -, Maximum Likelihood (MLM), 393 -, mirror image, 392 -, modified, 84, 85, 97, 98 -, Monte-Carlo, 94, 95, 264-270 -, Newton, 82, 84 -, Newton-Raphson, 34, 291 -, Powell's, 85 -, regularisation, 248 -, resampling, 94, 99 -, secant, 94 -, Zhu, 392 Methods -, constrained, 438 -, deconvolution, 58 -, inverse, 393 -, rational function, 56 Minimisation, 224, 394 -, L1-norm, 77, 91 Minimum, 77, 79, 83, 86, 87, 95 -, false, 87, 91 -, norm, 172, 439 Minkowski, 196, 197 Mixing, 376 Mixture spectra, 423 Model, see also Function -, compartmental, 70, 71, 92, 94, 98 -, components, 69, 71 -, empirical, 69, 70, 72, 95 -, inappropriate, 91 -, mechanistic, 69, 70, 72, 87, 93, 95
Monte Carlo, 319, 320, 264-266, 270, 461 Moore-Penrose generalised inverse, 439 Multi-channel advantage, 493 Multiexponential, 209, 218, 100ff Multiplex advantage, 1, 4 Multiplexing, 362 Multivariate Analysis (MVA), 423 Multivariate Data Analysis, 513ff Multivariate Image Segmentation (MIS), 513ff Multivariate Regression, 73, 75, 93, 94 N
nD NMR, 374, 379ff, 497ff Neural Network, 407ff Newton-Raphson, 34, 291 NMR microscopy, q encoded, 295 NMR spectra, Entropy of, 30 NOE, 258, 425, 429, 431, 445 NOED, 352, 356 NOESY, 155, 258, 308, 312, 315, 320, 321, 385, 396, 414 Noise, 130, 131, 133, 184ff, 195, 197, 220, 241-243, 307, 312, 414, 416, 489ff -, binary, 490 -, coloured, 490 -, computational, 129 -, flicker, 314 -, Gaussian, 180, 490 -, pseudo-random, 490 -, quaternary, 490 -, systematic, 494 -, T1, 131, 133 -, t1, 308, 348, 353 -, tailored, 491 -, white, 491, 497 -, sources, 463 Noisy data, 177 Non-Classical dimension, 375 Nonlinear -, parameter estimation, 48, 192, 205 -, regression, see Regression
-, systems, 490 Nonlinear Least squares (NLLS), 284, 286, 289, 291, 292, 331, 332, 458, 462 Nonnegative Least squares (NNLS), 49, 284 Nonuniqueness, 59, 62 Normalisation, 430 Nyquist, 18, 25, 120, 127, 137, 141, 368, 374, 377 O
Object spectra, 438 Occam's Razor, 30 Optimal Design, 92 Optimisation, 75, 76, 85, 95-99 -, criterion, see Criteria, optimisation Orthogonal -, expansion, 498 -, function expansion, 53, 60 Orthogonality principle, 165 Outlier, 90-92, 96 Oversampling, 310 P
Padé approximants, 100ff, 218 Padé-Laplace (PL), 56, 100ff, 218, 223, 235ff, 248 Parameter, 69-73, 75-80, 83, 87-89, 91-95, 98, 192, 320 -, estimation, 48, 330ff -, maps, 454 -, redundancy, 91, 92 Parseval-Plancherel, 11 Partial Least Squares (PLS), 422, 439, 446-448 Partial Linear Fit (PLF), 440 Pattern recognition, 407, 408-413 PCA, see Principal Components Analysis Peak -, picking, 307, 396, 428 -, shifting, 440 -, volume, 311, 312, 315, 317 Perceptron
-, multilayer, 410-412, 415-419 -, single layer, 409, 410, 416, 417 Phase, Phasing, 121, 178, 180, 181, 307, 368, 377, 378, 380, 387, 391 -, correction, 150, 381, 397 -, cycling, 353, 356 -, errors, 346 -, filter, 123 -, modulation, 377 -, roll, 387 -, shift, 156 -, twist, 377 Plot -, contour, 385 -, stacked, 385 Point Spread Function (PSF), 29, 30, 31 Polynomial, 310 Pore size, 284ff, 454 Porosity, 454, 455, 475 Positive Additive Distribution (PAD), 31, 32 Power spectral density, 33 Preacquisition delay, 308 Precision, 307, 314-316, 318, 320 Prediction, 72, 73 Preservation of Equivalent Paths (PEP), 384 Principal Components Analysis, 422, 437, 439, 440, 447, 448, 515, 524ff Processing -, in memory, 382 -, in place, 383 -, off place, 383 -, on file, 382 Projection, 385 -, analysis, 422 -, marker, 438, 441 Prony, 177, 223 Propagation, 320 Propagator -, difference, 298, 300-302 -, diffusion and flow, 283
-, self, 282 Proton density, 474ff Pseudo Random Binary Sequence (PRBS), 11 Pulse feed through, 308 Pulse rate, 250, 270 Pulsed Field gradients (PFG), 384 Pulsed Gradient Spin Echo (PGSE), 281ff, 479, 423 Pulsed Gradient Stimulated Echo, 479-480
Q q encoded NMR microscopy, 295 q-space, 286 Quadrature detection, 309 Quantification, 422 Quantitative Magnetic Resonance Imaging (QMRI), 452ff R
r space, 503 Random error, 346 Randomisation, 409, 419 Rank, 170, 437, 525 Rapid passage, 366 Rapid Scan, 365 -, Correlation NMR, 362ff RARE, 473 Rate constants, 307 Ratio, 60 Rational function, 123 -, methods, 56 Real Fourier Transform, 376, 380 Redfield trick, 309 Reference deconvolution, 308, 317, 346ff Reference Lineshape Adjustment (RLSA), 349 Reference spectrum, 439 References, 62 Regression, see also Least-squares -, constrained, 95, 97 -, linear, 68, 70, 71, 74, 77, 78, 80, 82, 87, 94, 97, 99
-, multivariate, 73, 75, 93, 94 -, nonlinear, 68, 69, 71-73, 75, 77, 79, 84-89, 91, 94, 96, 97-99 -, nonparametric, 70, 95 -, robust, 90, 91, 93 Regressor, 71, 75-77, 88, 91, 92, 94, 97 Regularisation, 248 -, constrained, 31 Regularising parameter, 51 Relaxation, 44ff, 100ff, 164ff, 191ff, 218ff, 234ff, 248ff, 307 -, cross, 258, 259, 261, 262 -, longitudinal/spin-lattice, 191, 256, 257, 258, 261, 263, 307, 357, 363, 453 -, rotating frame, 259, 261, 262 -, transverse/spin-spin, 191, 249, 254, 255, 263, 265, 267, 268, 271, 273, 331, 363, 368, 453, 477 -, reagents, 308 -, time dispersion, 39 RELAY, 155 Residual matrix, 438 Resolution, 62 -, spectral, 25, 39, 508 -, enhancement, 355 Reverse FT, see Inverse Fourier Transform Rheology, 453 Ridges, 413, 414, 416, 418 Rotation, 438, 440 -, matrix, 441, 444 S Sampling, 17, 395 -, interval, 141 -, rate, 120 Saturation-Recovery (SR), 469ff Savitzky-Golay, 211 Scaling, 429, 431 Scatter plot, 530 Scattering analogy, 282 Scores, 439, 445, 447, 529ff Scree plot, 527 Search, 79, 85, 87, 97
-, grid, 79, 87 -, line, 79, 84-87 -, random, 79, 87 Separability, 380 -, linear, 408 -, nonlinear, 408, 410 Sequential assignment, 407, 418-420 Shannon, 18, 27, 39 Shifting -, frequency, 10 -, time, 10, 19, 150 Shimming, 346 Signal-to-Noise (SNR, S/N), 25, 39, 148, 169, 226, 317, 370, 371, 372, 426, 445, 447, 448, 461, 464ff, 501 Signal averaging, 466 Simpson, 311 Simulated annealing, 95, 97, 98, 413, 417, 419, 420 Simulation (see also Method, Monte-Carlo), 93, 94, 96, 461 SINC, 506 Sine-bell function, 132 Singular Value Decomposition (SVD), 54, 55, 60, 164ff, 525 Smoothing, 211 -, Adaptive, 308 Smoothness, 31 Solvent -, response, 131 -, shift, 424, 427, 439, 448 -, suppression, 131, 310, 389, 413, 414, 416 Spatial domain, 286 Spectral -, domain, 306 -, interpretation, 422 -, processing, 428 -, width, 308 Spectrum -, 1D, 215 -, 2D, 315 -, absolute value, 40
-, of covariances, 436 -, of variances, 434 -, mixture, 422 -, object, 438 -, oversampled, 131 -, power, 40, 319 -, reference, 439 -, marker object, 438 -, recognition, 412 Spin-locking, 261 Spinning sidebands, 346, 353 Spline, see Function, spline Squared residual error, 439 Standard Error of Prediction (SEP), 446 Standardising, 445 Starting values, see Initial estimates State space, 181, 188 Stejskal-Tanner, 283, 480 Step, 73, 80-82, 84-86, 93 -, direction, 81, 83-85 -, length, 84, 85, 93 Stochastic -, differential equations, 496 -, excitation, 489ff -, NMR, 489ff Submatrix, 383 Symmetry, Symmetrisation, 388, 389 Systematic errors, 346 T T1, 44, 46, 100, 191, 203-205, 207, 214, 235, 239, 256-258, 308, 312, 314, 347, 363, 370, 453ff, 500, 508, 518, 519 T1 in QMRI, 460, 475 T1 measurement, 469ff T1 noise, 131, 133 T1ρ, 261 t1, 315, 375-377 t1 noise, 308, 347-348, 353, 386, 387 t1 ridge, 308, 313, 386
T2, 44, 46, 100, 180, 181, 187, 188, 191, 195, 197, 199, 200, 203-205, 207, 214, 224-228, 230, 231, 235ff, 249, 254, 255, 260, 310, 311, 331, 341, 363, 453ff, 500, 508, 516, 518, 519 T2 in QMRI, 477 T2 measurement -, stimulated echoes, 478 -, mirror artefacts, 478 T2*, 235ff, 310-311, 314, 368, 370, 454, 474 t2, 315, 375, 376 Target Factor Analysis (TFA), 438, 441, 444 Targets, 440 Taylor, 332, 333 -, approximation, 224 -, series, 102, 103, 106, 110, 192 Time domain, 7, 25, 146, 177, 191ff, 248, 286, 313, 316, 317, 366, 498 Threshold -, sigmoid, 410, 411 -, step, 409 Toeplitz matrix, 167 Total Least Squares, 318 Total variance, 437 Training set, 409, 415-419 Transfer function, 121-123, 125, 128, 134, 141, 492 Transformation (see also Linearisation), 73, 74, 90, 97 -, Box-Cox, 89 -, data, 73, 75, 77, 89 -, Eadie-Hofstee, 73 -, Hanes-Woolf, 73 -, Lineweaver-Burk, 73 -, model, 73, 89, 90 -, matrix, 438 Transient response, 120, 121, 128 Transposition, 383 Trends, 91 Truncation, Truncation error, 15, 226, 227, 311, 318, 337, 354, 390
Tukey's biweight (see also Regression, robust), 90 Two-point fits, 458 U Unconstrained maximisation, 31 Unconstrained minimisation problem, 224 Underflow, 130 Uniqueness, 44, 45, 59, 61, 62 V Variable, 68, 69, 71, 73, 77, 93, 94, 97, 98 -, marker, 438 -, dispersion, 431 Variance, 76, 78, 88, 89, 92, 94 -, explained, 437 -, heterogeneity, 88, 89 -, total, 437 Variance / covariance, 431 -, matrix, 433 Velocity, Fluid, 297ff Volterra -, series, 497 -, theory, 497 W
WALTZ, 424, 426 Water activity, 273 Weights, 408, 409 Wiener-Khinchin, 14 Wiener -, kernel, 498 -, series, 497 -, theory, 490, 497 Window, 313, 315 X,Y,Z Z-spectroscopy, 259 Zeiger-McEwen, 182ff Zero filling, 20, 350, 398, 458 Zhu method, 392