DATA HANDLING IN SCIENCE AND TECHNOLOGY — VOLUME 26
Practical Data Analysis in Chemistry
DATA HANDLING IN SCIENCE AND TECHNOLOGY Advisory Editors: S. Rutan and B. Walczak
Other volumes in this series:

Volume 1    Microprocessor Programming and Applications for Scientists and Engineers, by R.R. Smardzewski
Volume 2    Chemometrics: A Textbook, by D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte and L. Kaufman
Volume 3    Experimental Design: A Chemometric Approach, by S.N. Deming and S.L. Morgan
Volume 4    Advanced Scientific Computing in BASIC with Applications in Chemistry, Biology and Pharmacology, by P. Valkó and S. Vajda
Volume 5    PCs for Chemists, edited by J. Zupan
Volume 6    Scientific Computing and Automation (Europe) 1990, Proceedings of the Scientific Computing and Automation (Europe) Conference, 12–15 June 1990, Maastricht, The Netherlands, edited by E.J. Karjalainen
Volume 7    Receptor Modeling for Air Quality Management, edited by P.K. Hopke
Volume 8    Design and Optimization in Organic Synthesis, by R. Carlson
Volume 9    Multivariate Pattern Recognition in Chemometrics, illustrated by case studies, edited by R.G. Brereton
Volume 10   Sampling of Heterogeneous and Dynamic Material Systems: Theories of Heterogeneity, Sampling and Homogenizing, by P.M. Gy
Volume 11   Experimental Design: A Chemometric Approach (Second, Revised and Expanded Edition), by S.N. Deming and S.L. Morgan
Volume 12   Methods for Experimental Design: Principles and Applications for Physicists and Chemists, by J.L. Goupy
Volume 13   Intelligent Software for Chemical Analysis, edited by L.M.C. Buydens and P.J. Schoenmakers
Volume 14   The Data Analysis Handbook, by I.E. Frank and R. Todeschini
Volume 15   Adaption of Simulated Annealing to Chemical Optimization Problems, edited by J. Kalivas
Volume 16   Multivariate Analysis of Data in Sensory Science, edited by T. Næs and E. Risvik
Volume 17   Data Analysis for Hyphenated Techniques, by E.J. Karjalainen and U.P. Karjalainen
Volume 18   Signal Treatment and Signal Analysis in NMR, edited by D.N. Rutledge
Volume 19   Robustness of Analytical Chemical Methods and Pharmaceutical Technological Products, edited by M.W.B. Hendriks, J.H. de Boer and A.K. Smilde
Volume 20A  Handbook of Chemometrics and Qualimetrics: Part A, by D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. de Jong, P.J. Lewi and J. Smeyers-Verbeke
Volume 20B  Handbook of Chemometrics and Qualimetrics: Part B, by B.G.M. Vandeginste, D.L. Massart, L.M.C. Buydens, S. de Jong, P.J. Lewi and J. Smeyers-Verbeke
Volume 21   Data Analysis and Signal Processing in Chromatography, by A. Felinger
Volume 22   Wavelets in Chemistry, edited by B. Walczak
Volume 23   Nature-inspired Methods in Chemometrics: Genetic Algorithms and Artificial Neural Networks, edited by R. Leardi
Volume 24   Handbook of Chemometrics and Qualimetrics, by D.L. Massart, B.M.G. Vandeginste, L.M.C. Buydens, S. de Jong, P.J. Lewi and J. Smeyers-Verbeke
Volume 25   Statistical Design — Chemometrics, by R.E. Bruns, I.S. Scarminio and B. de Barros Neto
DATA HANDLING IN SCIENCE AND TECHNOLOGY — VOLUME 26 Advisory Editors: S. Rutan and B. Walczak
Practical Data Analysis in Chemistry MARCEL MAEDER School of Environmental and Life Sciences The University of Newcastle Callaghan, NSW 2308, Australia
YORCK-MICHAEL NEUHOLD School of Environmental and Life Sciences The University of Newcastle Callaghan, Australia
Amsterdam – Boston – Heidelberg – London – New York – Oxford
Paris – San Diego – San Francisco – Singapore – Sydney – Tokyo
Elsevier Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands Linacre House, Jordan Hill, Oxford OX2 8DP, UK
First edition 2007

Copyright © 2007 Elsevier B.V. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: [email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.

Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-444-53054-7
ISSN: 0922-3487
For information on all Elsevier publications visit our website at books.elsevier.com
Printed and bound in The Netherlands 07 08 09 10 11 10 9 8 7 6 5 4 3 2 1
Contents

PREFACE
SYMBOLS

1 INTRODUCTION

2 MATRIX ALGEBRA
  2.1 Matrices, Vectors, Scalars
    2.1.1 Elementary Matrix Operations
      Transposition
      Addition and Subtraction
      Multiplication
    2.1.2 Special Matrices
      Square Matrix
      Symmetric Matrix
      Diagonal Matrix
      Identity Matrix
      Inverse Matrix
      Orthogonal and Orthonormal Matrices
  2.2 Solving Systems of Linear Equations

3 PHYSICAL/CHEMICAL MODELS
  3.1 Beer-Lambert's Law
  3.2 Chromatography / Gaussian Curves
  3.3 Titrations, Equilibria, the Law of Mass Action
    3.3.1 A Simple Case: Fe3+ + SCN-
    3.3.2 The General Case, Definitions
      A Chemical Example, Cu2+, Ethylenediamine, Protons
    3.3.3 Solving Complex Equilibria
      The Newton-Raphson Algorithm
      Example: General 3-Component Titration
      Example: pH Titration of Acetic Acid
      Equilibria in Excel
      Complex Equilibria Including Activity Coefficients
      Special Case: Explicit Calculation for Polyprotic Acids
    3.3.4 Solving Non-Linear Equations
      One Equation, One Parameter
      Systems of Non-Linear Equations
  3.4 Kinetics, Mechanisms, Rate Laws
    3.4.1 The Rate Law
    3.4.2 Rate Laws with Explicit Solutions
    3.4.3 Complex Mechanisms that Require Numerical Integration
      The Euler Method
      Fourth Order Runge-Kutta Method in Excel
    3.4.4 Interesting Kinetic Examples
      Autocatalysis
      0th Order Reaction
      The Steady-State Approximation
      Lotka-Volterra / Predator-Prey Systems
      The Belousov-Zhabotinsky (BZ) Reaction
      Chaos, the Lorenz Attractor

4 MODEL-BASED ANALYSES
  4.1 Background to Least-Squares Methods
    4.1.1 The Residuals and the Sum of Squares
      Linear Example: Straight Line
      Non-Linear Example: Exponential Decay
  4.2 Linear Regression
    4.2.1 Straight Line Fit - Classical Derivation
    4.2.2 Matrix Notation
    4.2.3 Generalised Matrix Notation
    4.2.4 The Normal Equations
      The Pseudo-Inverse
      Linear Dependence, Rank of a Matrix
      Numerical Difficulties
    4.2.5 Errors in the Fitted Parameters
    4.2.6 Excel Linest
    4.2.7 Applications of Linear Least-Squares Fitting
      Linearisation of Non-Linear Problems
      Polynomials, the Savitzky-Golay Digital Filter
      Smoothing of Noisy Data
      Calculation of the Derivative of a Curve
      Polynomial Interpolation
    4.2.8 Linear Regression with Multivariate Data
      Applications
      Computation of Component Spectra, Known Concentrations
      Computation of Component Concentrations, Known Spectra
      The Pseudo-Inverse in Excel
  4.3 Non-Linear Regression
    4.3.1 The Newton-Gauss-Levenberg/Marquardt Algorithm
      A First, Minimal Algorithm
      Termination Criterion, Numerical Derivatives
      The Levenberg/Marquardt Extension
      Standard Errors of the Parameters
      Multivariate Data, Separation of the Linear and Non-Linear Parameters
      Constraint: Positive Component Spectra
      Structures, Fixing Parameters
      Known Spectra, Uncoloured Species
      Reduced Eigenvector Space
      Global Analysis
    4.3.2 Non-White Noise, χ2-Fitting
      Linear χ2-Fitting
      Non-Linear χ2-Fitting
    4.3.3 Finding the Correct Model
  4.4 General Optimisation
    4.4.1 The Newton-Gauss Algorithm
    4.4.2 The Simplex Algorithm
    4.4.3 Optimisation in Excel, the Solver
      χ2-Fitting in Excel

5 MODEL-FREE ANALYSES
  5.1 Factor Analysis, FA
    5.1.1 The Singular Value Decomposition, SVD
    5.1.2 The Rank of a Matrix
      Magnitude of the Singular Values
      The Structure of the Eigenvectors
      The Structure of the Residuals
      The Standard Deviation of the Residuals
    5.1.3 Geometrical Interpretations
      Two Components
      Reduction in the Number of Dimensions
      Lawton-Sylvestre
      Three and More Components
      Mean Centring, Closure
      HELP Plots
      Noise Reduction
  5.2 Target Factor Analyses, TFA
    5.2.1 Projection Matrices
    5.2.2 Iterative Target Transform Factor Analysis, ITTFA
    5.2.3 Target Transform Search/Fit
      Parameter Fitting via Target Testing
  5.3 Evolving Factor Analyses, EFA
    5.3.1 Evolving Factor Analysis, Classical EFA
    5.3.2 Fixed-Size Window EFA, FSW-EFA
    5.3.3 Secondary Analyses Based on Window Information
      Iterative Refinement of the Concentration Profiles
      Explicit Computation of the Concentration Profiles
  5.4 Alternating Least-Squares, ALS
    5.4.1 Initial Guesses for Concentrations or Spectra
    5.4.2 Alternating Least-Squares and Constraints
    5.4.3 Rotational Ambiguity
  5.5 Resolving Factor Analysis, RFA
  5.6 Principal Component Regression and Partial Least Squares, PCR and PLS
    5.6.1 Principal Component Regression, PCR
      Mean-Centring, Normalisation
      PCR Calibration
      PCR Prediction
      Cross Validation
    5.6.2 Partial Least Squares, PLS
      PLS Calibration
      PLS Prediction / Cross Validation
    5.6.3 Comparing PCR and PLS

FURTHER READING
LIST OF MATLAB FILES
LIST OF EXCEL SHEETS
INDEX
Preface

The word 'practical' in the title describes a characteristic feature of this book. However, it could easily be misunderstood: this is not a book that is meant to be taken into the laboratory to remain on the lab bench next to instruments and test tubes. The book is practical insofar as every bit of theory applicable to data analysis is exemplified in a short program in Matlab or in an Excel spreadsheet. The philosophy of the book is that the reader can study the programs, play with them and observe what happens. They are short and concise and thus invite and encourage meddling and improvement by the reader.

Suitable data are generated for each example in short routines. This ensures that the reader has a clear understanding of the structure of the data and thus a better chance of comprehending the analysis. In fact, the programs, rather than complex equations, are often used to elucidate the principles of the analysis. The programs are written in a modular way; the reader can replace the artificial data, generated by a function, with real data from the lab. There is extensive use of graphical output. While the plots are minimal, they efficiently illustrate the results of the analyses. In order to keep the programs concise, no effort was made to build comfortable user interfaces.

In Chapter 2, we give a brief introduction to matrix algebra and its implementation in Matlab and Excel. The next three chapters form the core of the book. We distinguish two types of data analysis: model-based and model-free analyses. For both, appropriate data have to be generated for subsequent analysis. In Chapter 3, we supply the theory required for the modelling of chemical processes. Many of the example data sets used for both kinds of analyses are taken from kinetics and equilibrium processes; this reflects the background of both authors.
In fact, this part of the book serves as a solid introduction to the simulation of equilibrium processes such as titrations and the simulation of complex kinetic processes. The example routines are easily adapted to the processes investigated by the reader. They are very general and there is essentially no limit to the complexity of the processes that can be simulated.

Chapter 4 is an introduction to linear and non-linear least-squares fitting. The theory is developed and exemplified in several stages, each demonstrated with typical applications. The chapter culminates with the development of a very general Newton-Gauss-Levenberg/Marquardt algorithm.

Chapter 5 comprises a collection of several methods for model-free data analyses. It starts with classical Factor Analysis, employing many
geometrical visualisations, covers popular methods such as EFA and ALS, and concludes with a brief introduction to the calibration-based methods PCR and PLS.

A fair amount of effort has been put into writing short and concise but still readable code. A few highlights:

lolipop.m, a very short function of 7 lines of code that performs polynomial interpolation of any degree and complexity. It can be used for interpolation and, of course with caution, for extrapolation.

NewtonRaphson.m, a 30-line function that solves chemical equilibrium problems of any degree of complexity.

nglm3.m, a function with 40 lines of code for general non-linear least-squares fitting based on the Newton-Gauss algorithm. It incorporates very efficient handling of parameters.

A package of compact PCR and PLS programs that includes cross validation.

While many aspects of data analysis are introduced, starting from very basic facts, the book is not primarily written for the beginner. Its main audience is expected to come from post-graduate students, research and industrial chemists with sufficient interest in data analysis to warrant the development of their own software rather than relying on other people's packages, which all too often are rather black boxes.

Statistics plays a crucial role in any data analysis, and accordingly, the statistical aspects are mentioned and appropriate equations and code are supplied. Examples are given for the least-squares analysis of data with white noise as well as for χ2-analyses of data with non-uniformly distributed noise. However, the statistical background for the appropriate choice between the two methods and, more importantly, the effects of wrong assumptions about the noise structure are not included.

Many of our students, colleagues and friends deserve to be acknowledged. Most important are the students. They have repeatedly forced us to think and re-think the concepts of data analysis. The principle is straightforward: it is not possible to explain anything properly without having understood it in depth. Most important were those students who have been involved in chemometrics projects over the years: Andrew Whitson, Arnaldo Cumbana, Caroline Mason, Eric Wilkes, Graeme Puxty, Jeff Harmer, Kirsten Molloy, Maryam Vosough, Monica Rossignoli, Nichola McCann, Pascal Bugnon, Peter Lye, Porn Jandanklang, Raylene Dyson, Rod Williams, and Sarah Norman.
Important are also all the colleagues who helped educate us and have been part of some or many aspects of data analysis; they include: Alan Williams, André Merbach, Andreas Zuberbühler, Anna de Juan, Arne Zilian, Bernhard Jung, Bill Tolman, Charlie Meyer, Christoph Borchers, Dom Swinkels, Ed Constable, Elmars Krausz, Geoff Lawrance, Hans Brintzinger, Harald Gampp, Helmut Mäcke, Ira Brinn, Jean Clerc, Jim Ferguson, Ken Karlin, Konrad Hungerbühler, Liselotte Siegfried, Manuel Martínez, Martin
Schumacher, Paul Gemperline, Peter King, Peter Comba, Robert Binstead, Romá Tauler, Sigrid Mönkeberg, Silvio Fallab, Susan Kaderli, Thomas Kaden, Tom Callcott.

Special thanks to our colleagues at the Department of Chemistry, The University of Newcastle, for taking over MM's teaching and, more importantly, his share of administration while he was on sabbatical leave, sweating over this book. Thanks also to Jenny Helman and Gudrun Ludescher for the incredible effort of proofreading a text without understanding the content. They deserve a warm thank-you from us and also from every reader.
Marcel Maeder Yorck-Michael (Bobby) Neuhold Newcastle, Australia September 2006
"Alles für die Wissenschaft", Gerda Maeder
Symbols

y, Y        vector (m×1, ns×1) or matrix (m×n, ns×nl) of data (e.g. single or multi-wavelength absorbance data)
C           matrix (m×nc, ns×nc) of component concentrations
a, A        vector (np×1, nc×1) or matrix (nc×nl) of linear parameters (e.g. molar absorptivities of pure species spectra)
r, R        vector (m×1, ns×1) or matrix (m×n or ns×nl) of residuals
F           design matrix for linear regression (m×np or ns×np)
J           Jacobian matrix of derivatives with respect to parameters
U           column matrix of all eigenvectors of YYt
S           diagonal matrix of all corresponding singular values in decreasing order of their magnitude
V           row matrix of all eigenvectors of YtY
Ū           column matrix of significant eigenvectors (ns×ne)
S̄           diagonal matrix of significant singular values (ne×ne)
V̄           row matrix of significant eigenvectors (ne×nl)
Ȳ           factor-analytically reproduced data matrix (ns×nl)
Yred        data matrix of absorbances in reduced eigenspace (ns×ne)
Ared        matrix of molar absorptivities in reduced eigenspace (nc×ne)
Yglob       vertically concatenated data matrices Y1…Ynm
Cglob       vertically concatenated concentration matrices C1…Cnm
p           vector of parameters (np×1)
T           transformation matrix (nc×nc), score matrix in PLS (ns×ne)
P           loading matrix in PLS (ne×nl)
W           matrix of loading weights in PLS (ne×nl)
vprog       prognostic vector in PCR or PLS (nl×1)
m, ns       number of spectra (e.g. number of rows in Y and C)
n, nl       number of wavelengths (e.g. number of columns in Y or A)
nc          number of components or species (e.g. number of columns in C or rows in A)
ne          number of factors (e.g. number of columns in Ū and V̄t or diagonal elements in S̄, number of factors used for PCR or PLS prediction)
nm          number of measurements (e.g. number of submatrices in Yglob and Cglob)
nd          polynomial degree
df          degrees of freedom
np          number of parameters
nu          number of unknown spectra
mp          Marquardt parameter
A, B, C, …  names of chemical species
[A]         concentration of species A
[A]tot      total concentration of component A
[A]0        initial concentration of species A
K           equilibrium constant
βxyz        formation constant of species XxYyZz
k           rate constant
λj          j-th wavelength or j-th eigenvalue
ssq         sum of squared residuals
χ2          sum of squared weighted residuals
σr          standard deviation of the residuals
σy          standard deviation of the noise in y or Y
σpi         standard deviation of the parameter pi
1 Introduction

As the title Practical Data Analysis in Chemistry indicates, there are different facets to this book: the book is about data analysis, the data are taken from chemistry, and the emphasis is on practical considerations.

Data Analysis in Chemistry is an ambitious title; of course we cannot cover all aspects of data analysis in chemistry. A substantial fraction of the examples investigated in the different chapters is based on data from absorption spectroscopy. Absorption spectroscopy is a very powerful and readily available technique; there are very few laboratories that do not have a spectrophotometer. Further, Beer-Lambert's law establishes a very neat and simple relationship between the signal and the concentrations of the species in solution. While most of the examples discussed in this book deal with spectra measured in the visible wavelength region, this is not important: Beer-Lambert's law is valid at any wavelength and covers UV as well as NIR and IR spectroscopy. Identical laws govern CD spectroscopy and thus the methods can be adapted immediately; the only difference is that CD signals can be negative, so those methods that rely on positive molar absorptivities need to be modified. Also, light emission spectroscopy often obeys laws that are very similar to Beer-Lambert's law. Other examples of data types used in the book include potentiometric data (pH) and data from monovariate chromatography detectors, such as flame ionisation or refractive index detectors. The crucial feature is that there is a clearly defined relationship between signal and concentration.

All the numerical methods are developed to a level that allows the analysis of complete absorption spectra, e.g. complete absorption spectra measured as a function of time in a kinetic investigation. More traditional single-wavelength measurements are just a special case of spectra measured at one wavelength only. This multivariate ability is also a significant aspect of the book.
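The signal-concentration relationship just mentioned can be sketched in a few lines. The following is an illustrative Python sketch (not code from the book, whose programs are in Matlab and Excel); all molar absorptivities and concentrations are made-up numbers. It evaluates Beer-Lambert's law for a two-component mixture: the total absorbance at each wavelength is the sum of the contributions of all absorbing species.

```python
# Beer-Lambert's law for a mixture: A(lambda) = l * sum_k eps_k(lambda) * c_k.
# Hypothetical numbers, for illustration only.

def absorbance(concentrations, epsilons, path_length=1.0):
    """Absorbance at one wavelength for a mixture of absorbing species."""
    return path_length * sum(c * e for c, e in zip(concentrations, epsilons))

# Two species, three wavelengths (made-up molar absorptivities, M^-1 cm^-1)
eps = [[1000.0, 400.0, 50.0],    # species 1
       [200.0, 900.0, 300.0]]    # species 2
conc = [1e-3, 2e-3]              # molar concentrations

# The measured spectrum of the mixture, one absorbance per wavelength
spectrum = [absorbance(conc, [eps[k][j] for k in range(len(conc))])
            for j in range(3)]
```

Because the total absorbance is linear in the concentrations, a series of such spectra forms a matrix product of concentrations and component spectra, which is what makes the matrix notation used throughout the book so natural.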
There are few commercial and publicly available programs that include the analysis of multivariate data. The collection of examples is extensive and includes relatively simple data analysis tasks, such as polynomial fits, which are used to develop the principles of data analysis. Some chemical processes will be discussed extensively; they include kinetics, equilibrium investigations and chromatography. Kinetics and equilibrium investigations are often reasonably complex processes that deliver complicated data sets and thus require fairly complex modelling and fitting algorithms. These processes serve as examples for the advanced analysis methods.
There are many types of data in chemistry that are not specifically covered in this book. For example, we do not discuss NMR data. NMR spectra of solutions that do not involve fast equilibria (fast on the NMR time scale) can be treated essentially in the same way as absorption spectra; if fast equilibria are involved, e.g. protonation equilibria, other methods need to be applied. We do not discuss the highly specialised data analysis problems arising from single-crystal X-ray diffraction measurements. Further, we do not investigate any kind of molecular modelling or molecular dynamics methods; while these methods use a lot of computing time and power, they are more concerned with data generation than with data analysis. Also, we do not cover several typical chemometrics types of analysis, such as cluster analysis, experimental design, pattern recognition, classification, neural networks, wavelet transforms, qualimetrics, etc. This explains our decision not to include the word 'chemometrics' in the title.

What is practical about this book? The book is not meant to be taken into the lab. It is practical in a different way: all methods and equations that are developed are translated immediately into a short computer program that performs the particular analysis under investigation. We decided not to supply a collection of data files from real experiments that could be read by the analysis programs for further processing. Instead, we provide short files that generate 'measurements'. The main advantage of this practice is that the reader will be able to analyse and understand the structure of the data. The results of the analyses can be compared with the input, e.g. the rate constants resulting from a kinetic fit can be compared with the rate constants used to generate the data. The practice also invites the reader to 'play' with the data, investigating the influence of noise level and noise structure.
The reader can observe the effects of changing the parameters used to generate the data, such as rate or equilibrium constants, the absorption spectra of the reacting species, and general conditions such as initial concentrations. The data generation or modelling functions are a powerful educational tool.

The extensive collection of example programs is a unique feature of this book. They are meant to be an invitation to the reader: to be used, to be incorporated into the reader's own packages, and also to be fiddled with and improved. We have put considerable effort into writing good code, but no doubt there is room for improvement.

Matlab is a matrix-oriented language that is just about perfect for most data analysis tasks. Those readers who already know Matlab will agree with that statement; those who have not used Matlab so far will be amazed by the ease with which rather sophisticated programs can be developed. This strength of Matlab is a weak point of Excel. While Excel does include matrix operations, they are clumsy and, probably for this reason, not well known and little used. An additional shortcoming of Excel is the lack of functions for Factor Analysis or the Singular Value Decomposition. Nevertheless, Excel is very powerful and allows the analysis of fairly complex data.

The book is structured in four main chapters.
Chapter 2, Matrix Algebra, gives a very brief introduction to matrix algebra. Most tasks in numerical data analysis are advantageously formulated in elegant and efficient matrix notation. Matlab is a matrix-based language and thus ideally suited for the development of programs dealing with numerical analyses; this point cannot be over-stressed. The few short programs presented in this chapter may also serve as a very rudimentary introduction to Matlab. Readers not familiar with Matlab but otherwise proficient in an alternative language will be surprised at the almost complete lack of for … end loops. We also introduce matrix operations in Excel, assuming that the other, more common aspects of Excel are known to the reader. While there is a reasonable collection of matrix operations available in Excel, their usage is rather cumbersome; we believe that many readers will appreciate the short introduction to this aspect of Excel. Of course, parts of the chapter, or the whole of it, can be skipped by those readers who are already proficient in the basics of matrix algebra and their implementation in Matlab and Excel.

Chapter 3, Physical/Chemical Models, starts with a review of Beer-Lambert's law and, very importantly, demonstrates its compatibility with matrix notation. After a short discussion of chromatographic concentration profiles (these are used heavily in Chapter 5, Model-Free Analyses), we start the development of a toolbox for the computational analysis of equilibrium problems. All these computations are based on the law of mass action. While a few simple equilibrium systems can be solved by analytical expressions, all other systems require modelling by iterative procedures. We explore the Newton-Raphson method and develop it into an incredibly powerful algorithm that resolves equilibrium systems of any complexity. We subsequently incorporate the algorithm into programs that model potentiometric pH titrations and spectrophotometric titrations.
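The iterative idea behind the Newton-Raphson method can be shown in miniature. The hedged sketch below (Python rather than the book's Matlab, with invented constants) solves a single 1:1 complexation equilibrium M + L ⇌ ML for the concentration of the complex; the book's NewtonRaphson.m generalises the same iteration to multi-component systems of arbitrary complexity.

```python
# Newton-Raphson for a 1:1 equilibrium M + L <=> ML with formation constant K.
# With x = [ML], mass balance gives [M] = Mtot - x and [L] = Ltot - x, so the
# mass-action residual K*(Mtot-x)*(Ltot-x) - x must vanish at equilibrium.
# Toy illustration only; all numbers are hypothetical.

def newton_raphson(f, x0, tol=1e-12, max_iter=100):
    """Generic one-dimensional Newton-Raphson with a numerical derivative."""
    x = x0
    for _ in range(max_iter):
        h = 1e-8 * max(abs(x), 1.0)
        dfdx = (f(x + h) - f(x - h)) / (2 * h)   # central-difference slope
        step = f(x) / dfdx
        x -= step
        if abs(step) < tol:
            break
    return x

K, Mtot, Ltot = 1.0e4, 1.0e-3, 1.5e-3            # made-up constants

def residual(x):
    """Zero exactly when x equals the equilibrium concentration of ML."""
    return K * (Mtot - x) * (Ltot - x) - x

x_ml = newton_raphson(residual, 0.5 * Mtot)       # equilibrium [ML]
free_m, free_l = Mtot - x_ml, Ltot - x_ml         # free [M] and [L]
```

A few iterations suffice here; the real power of the method, developed in Chapter 3, lies in applying the same Newton step to whole vectors of component concentrations at once.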
At this stage, the collection of routines can serve as an educational tool; later, in Chapter 4, Model-Based Analyses, it will be incorporated into a general non-linear least-squares fitting program for the analysis of equilibrium processes and the determination of equilibrium constants. Those equilibrium processes that can be resolved explicitly are straightforwardly modelled in Excel. While it is possible to solve equilibrium problems of essentially any complexity in Excel, it is virtually impossible to develop a reasonable spreadsheet for the modelling of a complex titration; iterative methods are generally difficult to implement in Excel. The Newton-Raphson algorithm is further developed into a fairly general tool for solving sets of non-linear equations.

The equivalent in kinetics of the law of mass action in equilibria is the set of differential equations defined by the chemical reaction scheme. Again, there are explicit solutions for very simple models, but most other models lead to sets of differential equations that need to be integrated numerically. Matlab supplies an extensive collection of functions for
numerical integration dealing with just about any conceivable case, in particular the so-called 'stiff problems'. It would go well beyond the limits and scope of this book to develop such algorithms. We do, however, explain the principles of numerical integration and also develop an Excel spreadsheet with a 4th-order Runge-Kutta algorithm. We demonstrate the use of Matlab's numerical integration routines (ODE solvers) and apply them to a representative collection of interesting mechanisms of increasing complexity, such as an autocatalytic reaction, predator-prey kinetics, oscillating reactions and chaotic systems. This section demonstrates the educational usefulness of data modelling. The collection of kinetic modelling programs will be adapted in the subsequent chapter for the non-linear least-squares analysis of kinetic data and the determination of rate constants.

Chapter 4, Model-Based Analyses, is essentially an introduction to least-squares fitting. It is crucial to distinguish clearly between linear and non-linear least-squares fitting: linear problems have explicit solutions, while non-linear problems need to be solved iteratively. Linear regression forms the basis for just about 'everything' and thus requires particular consideration. For non-linear regression there are several iterative methods available. The simplex algorithm is a popular method; its concept is simple, but convergence and execution times are slow. In this chapter we present the Newton-Gauss algorithm enhanced by the Levenberg/Marquardt method. The method is developed using a representative collection of worked examples that illustrate the different aspects of data fitting. We end up with a collection of programs that can fit any reaction mechanism in kinetics and any system in equilibrium studies.
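The explicit solution for the linear case mentioned above can be made concrete with the simplest possible example, the straight-line fit of Chapter 4. The sketch below is an illustrative Python version of the classical closed-form (normal-equation) expressions for slope and intercept, with made-up data; it is not code from the book.

```python
# Straight-line least squares, y = a0 + a1*x: the linear problem has an
# explicit solution via the classical normal-equation formulas.
# Invented data points, for illustration only.

def straight_line_fit(x, y):
    """Return intercept a0 and slope a1 minimising sum((y - a0 - a1*x)^2)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    a0 = (sy - a1 * sx) / n                          # intercept
    return a0, a1

x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.1, 6.9]          # roughly y = 1 + 2x with 'noise'
a0, a1 = straight_line_fit(x, y)
```

No iteration is needed; for the non-linear problems of Chapter 4 no such closed form exists, which is exactly why the iterative Newton-Gauss-Levenberg/Marquardt machinery is developed there.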
The most advanced features include multivariate data, the inclusion of known spectra, efficient handling of variable and non-variable parameters, and global analysis of series of multivariate measurements.

Chapter 5, Model-Free Analyses. Model-based data fitting relies crucially on the choice of the correct model; model-free analyses allow insight into the data without prior chemical knowledge about the process. Model-free analysis is based on restrictions imposed on the results of the analysis, restrictions that are demanded by the physics of the measurement rather than by the scientist. Typical restrictions of this kind are that concentrations and molar absorptivities have to be positive. Only multivariate (e.g. multi-wavelength) data are amenable to model-free analyses; while this is a restriction, it is not a serious one. The goal of the analysis is to decompose the matrix of data into a product of two physically meaningful matrices, usually a matrix containing the concentration profiles of the components taking part in the chemical process and a matrix that contains their absorption spectra (Beer-Lambert's law). If there are no model-based equations that quantitatively describe the data, model-free analyses are the only method of analysis. Otherwise, the results of model-free analyses can guide the researcher in the choice of the correct model for a subsequent model-based analysis.

An important group of methods relies on the inherent order of the data, typically time in kinetics or chromatography. These methods are often based on Evolving Factor Analysis and its derivatives. Another well-known family of model-free methods is based on the Alternating Least-Squares algorithm, which relies solely on restrictions such as positive spectra and concentrations. There is a rich collection of publications describing novel methods for model-free analyses. The selection presented here does not cover the complete range; it attempts to select the more useful and interesting methods. Such a selection is always influenced by personal preferences and thus can be biased.

Excel does not provide functions for the factor analysis of matrices. Further, Excel does not support iterative processes. Consequently, there are no Excel examples in Chapter 5, Model-Free Analyses. There are vast numbers of free add-ins available on the internet, e.g. for the Singular Value Decomposition. Alternatively, it is possible to write Visual Basic programs for the task and link them to Excel. We strongly believe that such algorithms are much better written in Matlab and decided not to include such options in our Excel collection.

This chapter ends with a short description of the important methods Principal Component Regression (PCR) and Partial Least-Squares (PLS). Attention is drawn to the similarity of the two methods. Both methods aim at predicting properties of samples based on spectroscopic information; the required information is extracted from a calibration set of samples with known spectra and properties.

In any book, there are relevant issues that are not covered. The most obvious in this book is probably the lack of an in-depth statistical analysis of the results of model-based and model-free analyses.
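The model-free decomposition described above has a simple bilinear structure, Y = C A: the rows of C are the component concentrations at each measurement, and the rows of A are the component spectra. The toy Python sketch below (invented numbers, plain lists instead of the book's Matlab matrices) builds such a noise-free Y in the forward direction; a model-free method works in the opposite direction, recovering C and A from Y under constraints such as positivity.

```python
# Bilinear structure of multivariate data: Y (ns x nl) = C (ns x nc) * A (nc x nl).
# Toy numbers only, for illustration.

def matmul(X, Z):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * z for x, z in zip(row, col)) for col in zip(*Z)]
            for row in X]

C = [[1.0, 0.0],          # concentration profiles: 3 'times' x 2 components
     [0.5, 0.5],
     [0.0, 1.0]]
A = [[2.0, 1.0, 0.0],     # component spectra: 2 components x 3 'wavelengths'
     [0.0, 1.0, 2.0]]

Y = matmul(C, A)          # noise-free data matrix, 3 x 3 but only rank 2
```

Note that although Y is 3×3, its rank is only 2, the number of components; detecting that rank from noisy data is one of the central tasks of Factor Analysis in Chapter 5.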
Data fitting does produce standard deviations for the fitted parameters, but translation into confidence limits is much more difficult for reasonably complex models. Also, the effects of the separation of linear and non-linear parameters are, to our knowledge, not well investigated. Very little is known about errors and confidence limits in the area of model-free analysis. We tried to keep the programs short; they perform only the essential numerical operations, followed by minimal data output. Usually there are one or two graphs of output and occasionally a few lines of numerical output. The graphs are designed to be simple but instructive. They do not make use of the richness of the Matlab graphics routines. We did not include any graphical user-interfaces (GUIs) in order to avoid difficulties with different versions of Matlab. Matlab versions starting from 5.3 should be compatible with all programs in this book.
Iterative refining processes invariably depend on the quality of initial guesses. If these are too far from the optimum, the process can diverge and collapse, sometimes seriously. The code provided for all iterative processes is minimal and works for most reasonably well-behaved problems; however, the routines are not fool-proof. In the case of divergence and collapse we recommend the user investigates the appropriateness of the initial guesses supplied to the function. A few words regarding programming style are appropriate. Any computer program is a compromise between readability, length of code and speed of execution. Matlab, in particular, offers a substantial range of powerful commands that allows the composition of extremely compact code. The reader will find a few instances where commands perform very complex operations at the expense of being almost incomprehensible. In many instances explanations will be given in the form of comments or in adjoining text, but there may be several occasions when the novice will struggle to understand the line of code.

    The process of preparing programs for a digital computer is especially attractive, not only because it can be economically and scientifically rewarding, but also because it can be an aesthetic experience much like composing poetry or music.
    Donald Knuth, The Art of Computer Programming
2 Matrix Algebra

In this chapter, we present the basic matrix mathematics that is required for understanding the methods introduced later in the book. In line with the philosophy that all concepts are immediately implemented in Matlab and/or Excel, this will be done here as well. This way, Chapter 2 not only revises the basic mathematics, it also serves as a very short introduction to the Matlab and Excel languages. It is not meant to be a manual on Matlab or Excel; the reader will need to refer to more specialised texts and proper manuals. Several more advanced features of both languages are not covered at this introductory stage but will be explained as they emerge in later chapters. Generally, most chemists possess some knowledge of one or more classical programming languages such as Basic or Fortran. While this is certainly helpful, it also produces some 'bad habits'. Matlab is a matrix-based language; it incorporates an extensive function library for matrices, vectors and data arrays in general and so is particularly well designed for the numerical analysis of multivariate data. Even though classical loop-based programming is possible for matrix operations, in Matlab there is usually a much shorter and faster way of performing the same task. Matlab programs are very readable since matrix equations are written almost as in 'real life'. Very important properties of Matlab are the direct availability of a vast number of functions that allow high-level graphical output and the fact that Matlab works on an interpreter basis, i.e. compilation is done in the background and all variables and data can be accessed at the prompt. This makes Matlab one of the most used development tools in engineering and the current standard in chemometrics. Most programs provided in this book have been developed and tested in the standard version of Matlab 6.1 and do not require any additional toolboxes (e.g. the optimisation toolbox).
Our philosophy is to keep the algorithms as simple as feasible. Additionally, we avoid Matlab's capabilities in programming a graphical user interface (GUI). That way, backwards compatibility of our programs to Matlab 5.3 as well as upwards to Matlab 7.x is very likely, yet not always guaranteed. As an integral component of Microsoft Office, the spreadsheet program Excel is installed on many personal computers. Thus, a widespread basic expertise can be assumed. Although initially designed for business calculations and graphics, Excel is also extremely useful for scientific purposes. Its matrix capabilities, as well as the optimisation add-in 'solver', are not widely known but can often be applied in order to quickly resolve quite complex multivariate problems. We have used Excel 2002 but any other version will do equally well. As mentioned before, this chapter has two goals, (a) to refresh some basic matrix mathematics and (b) to familiarise the reader with the essentials of both Matlab and Excel, particularly with respect to multivariate data
analysis of chemical problems. In order to minimise abstractness we provide many examples. It is helpful to distinguish matrices, vectors, scalars and indices by typographic conventions. Matrices are denoted in boldface capital characters (A), vectors in boldface lowercase (a) and scalars in lowercase italic characters (s). For indices, lower case characters are used (i). The symbol 't' indicates matrix and vector transposition (At, at). All chemical applications discussed later in this book will deal exclusively with real numbers. Thus, we introduce matrix algebra for real numbers only and do not include matrices formed by complex numbers.
2.1 Matrices, Vectors, Scalars

A matrix is a rectangular array of numbers, e.g.

A = \begin{pmatrix}
a_{1,1} & \cdots & a_{1,j} & \cdots & a_{1,n} \\
\vdots  &        & \vdots  &        & \vdots  \\
a_{i,1} & \cdots & a_{i,j} & \cdots & a_{i,n} \\
\vdots  &        & \vdots  &        & \vdots  \\
a_{m,1} & \cdots & a_{m,j} & \cdots & a_{m,n}
\end{pmatrix}    (2.1)
The size of a matrix is defined by its number of rows, m, and number of columns, n. We refer to the dimensions by m×n. In Matlab, the appropriate notation is [m,n]=size(A). For any matrix A, ai,j is the element in row i (i=1…m) and column j (j=1…n). Vectors and scalars can be seen as special cases of matrices where one or both dimensions have collapsed to 1. Thus, a row vector represents a 1×n matrix, a column vector an m×1 matrix and a scalar a 1×1 matrix. Matlab is based on the philosophy that Everything is a Matrix. In order to visualise matrices, column and row vectors, it is convenient to use rectangles, vertical and horizontal lines, as outlined in Figure 2-1.
Figure 2-1. A matrix, a column vector and a row vector

Sometimes it is helpful to specifically distinguish between row and column vectors. In such instances, we borrow Matlab's colon (:) notation. A vector x
is represented by x1,: if it is a row vector (the row dimension is 1) and by x:,1 if it is a column vector (the column dimension is 1). Furthermore, every row of a matrix A can be seen as a row vector or sub-matrix of A with the dimensions 1×n, while every column of A represents a column vector or sub-matrix of the dimensions m×1. Thus, the second row of matrix A can be referred to as the row vector a2,:, the third column of A as the column vector a:,3, etc. With this notation it is generally possible to denote any sub-matrix of A. For example, A2:4,3:6 is a matrix of dimensions 3×4 comprised of the elements of A that are within the rectangle defined by rows 2 to 4 and columns 3 to 6. Let's see how this is done in Matlab:

A=[1 2 3 4 5 6; ...   % 1st row of A
   2 3 4 5 6 7; ...   % 2nd
   3 4 5 6 7 8; ...   % 3rd
   4 5 6 7 8 9]       % 4th

a_r2c3=A(2,3)      % extract element of A in row 2 column 3
a_r2=A(2,:)        % extract 2nd row of A
a_c3=A(:,3)        % extract 3rd column of A
A_sub=A(2:4,3:6)   % extract 2nd to 4th row and 3rd to 6th column

A =
     1     2     3     4     5     6
     2     3     4     5     6     7
     3     4     5     6     7     8
     4     5     6     7     8     9
a_r2c3 =
     4
a_r2 =
     2     3     4     5     6     7
a_c3 =
     3
     4
     5
     6
A_sub =
     4     5     6     7
     5     6     7     8
     6     7     8     9
A few remarks are in order:
• all matrix entries are enclosed by two square brackets; the elements within each row are separated by blanks (or equivalently by commas); the end of a row is indicated by a semicolon
• three dots at the end of a line tell Matlab there is continuing input in the next line
• the percent (%) character introduces a comment
• for references to individual elements, rows, columns or sub-matrices, the corresponding row and column indices (or colon operators) are separated by a comma and put between parentheses
• there is an equivalent command for the last line that creates the identical sub-matrix (A2:4,3:6) but refers to all rows and columns individually: A_sub=A([2 3 4],[3 4 5 6]). The sub-matrix is created according to the two vectors comprising the row and column indices. This can be handy if rows and/or columns that are not in a sequence are to be combined.
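For readers who later want to mirror these steps outside the book's Matlab/Excel toolset, the same indexing can be sketched in NumPy (a Python library; this is an aside, not part of the book's code). Note that NumPy counts rows and columns from 0 and excludes the upper slice bound, so Matlab's A(2:4,3:6) becomes A[1:4, 2:6]:

```python
import numpy as np

# the same 4x6 matrix as in the Matlab example above
A = np.array([[1, 2, 3, 4, 5, 6],
              [2, 3, 4, 5, 6, 7],
              [3, 4, 5, 6, 7, 8],
              [4, 5, 6, 7, 8, 9]])

a_r2c3 = A[1, 2]       # element in row 2, column 3 (0-based indices 1, 2)
a_r2   = A[1, :]       # 2nd row of A
a_c3   = A[:, 2]       # 3rd column of A
A_sub  = A[1:4, 2:6]   # rows 2-4 and columns 3-6 (upper bounds exclusive)
```

As in Matlab, a list of indices such as A[[1, 2, 3]][:, [2, 3, 4, 5]] selects the same sub-matrix row- and column-wise.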
2.1.1 Elementary Matrix Operations
Transposition

The transposed matrix At is defined as the interchange of rows and columns of A. This can also be seen as the reflection of all elements of A at its main diagonal (along ai,j, i=j), according to

a_{i,j} \to a_{j,i}    (2.2)
Thus, the row dimension of A becomes the column dimension of At, and the column dimension of A becomes the row dimension of At.

Figure 2-2. A matrix A and its transpose At
A few lines in Matlab illustrate this on an example.

A=[1 2 3; ...    % definition of matrix A
   4 5 6]

At=A'            % matrix transposition

dim_A=size(A)    % retrieving the dimensions of A
dim_At=size(At)  % retrieving the dimensions of At

A =
     1     2     3
     4     5     6
At =
     1     4
     2     5
     3     6
dim_A =
     2     3
dim_At =
     3     2
Note that Matlab uses the quote (') as the transposition operator. In an Excel spreadsheet, the TRANSPOSE function can be applied to an array of data. For this, we need to become familiar with the two most important rules to perform matrix operations in Excel:
• The result has to be pre-selected, i.e. the dimensions of the result have to be known.
• The SHIFT+CTRL keys must be held while pressing the ENTER key to confirm the operation.
Figure 2-3 shows an example spreadsheet. Cells A3:C4 contain the elements of the 2×3 matrix A. In order to perform the matrix transposition, cells E3:F5, which will contain the result, At, have to be pre-selected. Next, =TRANSPOSE(A3:C4) is typed on the Excel command line, followed by the SHIFT+CTRL+ENTER key combination.
Figure 2-3. Matrix transposition in Excel

Note the curly braces that have appeared in the command line; they indicate a matrix operation applied to a block of cells. The curly braces must not be typed explicitly. As for most Excel functions, there is always an alternative way via the main menu's Insert-Function feature and a graphical user interface that leads you through the process of selecting the correct function and the source cells corresponding to the matrix A. This takes a bit longer but has the advantage that you do not have to recall the exact syntax. However, the target cells, i.e. the elements corresponding to the transposed matrix At, still
have to be pre-selected. When the TRANSPOSE function is selected from the Insert-Function menu, a window similar to the one shown in Figure 2-4 appears.
Figure 2-4. Matrix transposition via Excel's interactive graphical user interface

With the correct source cells in the input line, pushing the OK button while holding the SHIFT+CTRL keys gives the equivalent result to Figure 2-3. Importantly, once an array of cells has been declared a matrix by applying the SHIFT+CTRL+ENTER or the SHIFT+CTRL+OK combination, it is no longer possible to alter or delete its individual cells. The whole array always has to be pre-selected, and thus only the complete array can be modified. As you have probably already noticed, Excel uses alphabetical letters for the column index and numbers for the row index in order to specify a cell on the spreadsheet. The column index appears before the row index, e.g. cell A3 refers to column A, row 3. This is contrary to the general convention as introduced earlier and, if desired, you can change Excel's default settings (Tools-Options-General) to accommodate a row-column notation. The notation A3 will then be altered into R3C1 (row 3, column 1).

Addition and Subtraction

Addition or subtraction of matrices is done element-wise and thus is straightforward. Obviously, the dimensions of the matrices to be added have to match.
A \pm B = \begin{pmatrix}
a_{1,1} \pm b_{1,1} & \cdots & a_{1,j} \pm b_{1,j} & \cdots & a_{1,n} \pm b_{1,n} \\
\vdots              &        & \vdots              &        & \vdots              \\
a_{i,1} \pm b_{i,1} & \cdots & a_{i,j} \pm b_{i,j} & \cdots & a_{i,n} \pm b_{i,n} \\
\vdots              &        & \vdots              &        & \vdots              \\
a_{m,1} \pm b_{m,1} & \cdots & a_{m,j} \pm b_{m,j} & \cdots & a_{m,n} \pm b_{m,n}
\end{pmatrix}    (2.3)
Matrix addition and subtraction are commutative

A \pm B = \pm B + A    (2.4)

and associative.

(A \pm B) \pm C = A \pm (B \pm C)    (2.5)
In Matlab, the standard mathematical operators for addition (+) and subtraction (-) can be used directly with matrices. As with transposition, Matlab automatically calls the appropriate functions to perform the operations.

A=[1 2 3; ...        % definition of A
   4 5 6]

B=[0.1 0.2 0.3; ...  % definition of B
   0.4 0.5 0.6]

Y=A+B                % matrix addition

A =
     1     2     3
     4     5     6
B =
    0.1000    0.2000    0.3000
    0.4000    0.5000    0.6000
Y =
    1.1000    2.2000    3.3000
    4.4000    5.5000    6.6000
Note that Matlab directly adds/subtracts a single scalar element-wise to/from matrices. Suppose you want to subtract element b1,2 (a scalar) from all elements of matrix A. A valid Matlab command would be

A=[1 2 3; 4 5 6];
B=[0.1 0.2 0.3; 0.4 0.5 0.6];
Y=A-B(1,2)    % subtracting element b1,2 from all elements of A

Y =
    0.8000    1.8000    2.8000
    3.8000    4.8000    5.8000
In Excel, mathematical operations of one or more cells can be dragged to other cells. Since a cell represents one element of an array or matrix, the effect will be an element-wise matrix calculation. Thus, addition and subtraction of matrices are straightforward. An example:
Figure 2-5. Matrix addition in Excel

Cells A3:C4 and E3:G4 comprise the elements of arrays A and B respectively. First, in cell I3, the addition is done for one pair of elements, A3+E3 (a1,1+b1,1). Then, the calculation is repeated by dragging cell I3 to the remaining cells of the rectangle I3:K4. It is worthwhile mentioning that matrix addition can alternatively be performed by pre-selection of all cells of the prospective Y, assigning the array addition according to A3:C4+E3:G4 and applying the SHIFT+CTRL+ENTER key combination. Usually, this has no particular advantage.
Figure 2-6. Matrix addition by array command

Note the difference in the command line between Figure 2-5 and Figure 2-6. The capability of dragging results from one cell to others is a very useful property of Excel and becomes even more powerful in combination with the dollar operator ($) correctly applied within the cell reference. Referring to the previous Matlab example, if the scalar element b1,2 (cell F3) is to be subtracted from matrix A (A3:C4) in Excel, putting the dollar operator ($) in front of the column and row reference of the source cell containing the scalar b1,2 ($F$3) prevents "dragging-over" of the source cell F3 in both the column and row direction.
Figure 2-7. Subtracting element b1,2 from all elements of A
Similarly, it is possible to add/subtract one column, say b:,j (or one row bi,:) of B to/from all columns (or rows) of A. Then, the $ symbol is put before the column (row) index only. In the example below, the third column of B (b:,3), containing the elements of cells G3:G4, is subtracted from all columns of A and a matrix Y of the same dimensions as A is formed.
Figure 2-8. Subtracting the third column of B from all columns of A

In the same way the '$' symbol can be used for other element-wise operations in Excel. The power and importance of the $ operator cannot be overrated and we apply it in several additional examples later. In Matlab, the plus (+) and minus (-) operators cannot be directly applied to equations that involve vectors or matrices of different dimensions. In order to perform the same operation as in the former Excel example, column vector b:,3 must be replicated three times to match the dimensions of A. For this, the Matlab command repmat can be used.

A=[1 2 3; 4 5 6];
B=[0.1 0.2 0.3; 0.4 0.5 0.6];
Y=A-repmat(B(:,3),1,3)   % replicating the third column of B 3 times
                         % and subtracting the result from A

Y =
    0.7000    1.7000    2.7000
    3.4000    4.4000    5.4000
Matlab employs B(:,3) as the notation for the third column of B, b:,3. By using repmat(B(:,3),1,3) a matrix is created consisting of a 1-by-3 (horizontal) tiling of copies of B(:,3). Naturally, this function can also be used to create a vertical tiling of copies of row vectors, e.g. if row vector b2,: is to be added/subtracted to/from all rows of A. An appropriate function call would then be repmat(B(2,:),2,1). We refer to the Matlab manuals for further details on this function. The repmat command in the above application can be replaced by a more conventional loop:

A=[1 2 3; 4 5 6];
B=[0.1 0.2 0.3; 0.4 0.5 0.6];
for i=1:3
    Y(:,i)=A(:,i)-B(:,3);
end
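As an aside outside the book's Matlab/Excel scope, NumPy needs neither repmat nor a loop for this operation: its broadcasting rules implicitly replicate a column vector across all columns. A hypothetical equivalent of the example above:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
B = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])

# B[:, [2]] keeps the column shape (2x1); broadcasting expands it to 2x3
Y = A - B[:, [2]]
```

The key detail is the index [2] in brackets: B[:, 2] would yield a flat vector, whereas B[:, [2]] preserves the 2×1 column shape that broadcasting then stretches to match A.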
It is a matter of opinion whether the loop or the repmat option is preferable. One is shorter and the other is easier to comprehend; one is Matlab specific and the other is more general. In this book, we tend to use the shorter, Matlab-style version, at least as long as the readability is not severely compromised.

Multiplication
A matrix product Y = C A is defined in the following way

y_{i,j} = \sum_{k=1}^{n_c} c_{i,k} a_{k,j} = c_{i,:} \times a_{:,j}    (2.6)
It is only defined if the number of columns of matrix C matches the number of rows of matrix A. The column dimension nc of C (i.e. the length of ci,:) has to be the same as the row dimension of A (i.e. the length of a:,j).

Figure 2-9. Matrix multiplication

Let ci,: be the i-th row vector of C and a:,j be the j-th column vector of A; then each element yi,j of Y is calculated as the scalar product ci,:×a:,j. In other words, the element in the i-th row and j-th column of Y is the sum over the element-wise products of the i-th row of matrix C and the j-th column of matrix A. Thus, if the dimensions of C are m×nc and the dimensions of A are nc×n, Y has the dimensions m×n. Figure 2-10 illustrates the multiplication Y=CA on a simple example. The factor matrices C (4×2) and A (2×3) can be arranged in such a way that the rows of C align with the rows of Y and the columns of A align with the columns of Y.
C = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ 7 & 8 \end{pmatrix} \qquad
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \qquad
Y = \begin{pmatrix} 9 & 12 & 15 \\ 19 & 26 & 33 \\ 29 & 40 & 51 \\ 39 & 54 & 69 \end{pmatrix}

Figure 2-10. Matrix multiplication Y = C A

That way, the dimensions of Y (4×3) become immediately obvious. One particular element, e.g. y2,3, is calculated as the scalar product of the second row of C with the third column of A according to c2,:×a:,3 = 3×3 + 4×6 = 33. In Matlab the asterisk operator (*) is used for the matrix product. If the corresponding dimensions match, all individual scalar products ci,:×a:,j are evaluated to form Y.

C=[1 2; ...
   3 4; ...
   5 6; ...
   7 8]

A=[1 2 3; ...
   4 5 6]

Y=C*A    % the matrix product

C =
     1     2
     3     4
     5     6
     7     8
A =
     1     2     3
     4     5     6
Y =
     9    12    15
    19    26    33
    29    40    51
    39    54    69
Matlab automatically determines the correct dimensions of Y. In Excel, the cells comprising the prospective result Y have to be pre-selected as we have already seen for matrix transposition. For this, we need to predict the dimensions of Y from the row dimension of C and column dimension of A. Also, there is no direct operator for matrix multiplication in Excel. The function MMULT in conjunction with the SHIFT+CTRL+ENTER key
combination needs to be applied in order to perform the operation. MMULT is called with two arguments, the cell range B3:C6, containing the elements of C, and the cell range E8:G9, containing the elements of A. The correct syntax can be taken from the Excel command line in Figure 2-11.
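The same product can be cross-checked in NumPy (again an aside, not part of the book's toolset), where the @ operator plays the role of Matlab's * and the result dimensions follow the same (4,2)·(2,3) → (4,3) rule:

```python
import numpy as np

C = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
A = np.array([[1, 2, 3], [4, 5, 6]])

Y = C @ A   # matrix product; shapes (4, 2) @ (2, 3) -> (4, 3)
```

Swapping the factors (A @ C) raises a shape error here, which mirrors the dimension rule stated above.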
Figure 2-11. Matrix multiplication in Excel

Vectors have earlier been introduced as matrices with one dimension being reduced to one, i.e. they are comprised of one column or one row only. When the rule for matrix multiplication given by equation (2.6) is formally applied to vectors and the appropriate dimensions match, there are two immediate consequences:

(1) The product of a row vector and a column vector of the same length results in a scalar (the scalar product), e.g.

\begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix} = 1 \times 4 + 2 \times 5 + 3 \times 6 = 32

(2) The product of a column vector with m rows and a row vector with n columns results in a matrix with m rows and n columns. This is the so-called outer product, e.g.

\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \begin{pmatrix} 5 & 6 & 7 \end{pmatrix} = \begin{pmatrix} 5 & 6 & 7 \\ 10 & 12 & 14 \\ 15 & 18 & 21 \\ 20 & 24 & 28 \end{pmatrix}
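Both vector products are available in NumPy as well (an aside, using the same numbers as the two examples above):

```python
import numpy as np

row = np.array([1, 2, 3])
col = np.array([4, 5, 6])

inner = row @ col                           # scalar product: 1*4 + 2*5 + 3*6
outer = np.outer([1, 2, 3, 4], [5, 6, 7])   # column (4x1) times row (1x3) -> 4x3
```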
There are a few useful rules for dealing with matrix products. The list is not complete and for more details, we refer to a textbook on linear algebra.
If a matrix is to be multiplied by a scalar x, the multiplication is performed on every element of the matrix. Obviously, this operation is commutative.

x A = \begin{pmatrix} x a_{1,1} & \cdots & x a_{1,n} \\ \vdots & \ddots & \vdots \\ x a_{m,1} & \cdots & x a_{m,n} \end{pmatrix} = A x    (2.7)

The multiplication of matrices, however, is generally not commutative; i.e. the order of the factors must not be changed.

C A \neq A C    (2.8)

The multiplication of a matrix with more than one scalar (e.g. x, y) is associative and commutative

(x A) y = x (A y) = x y A    (2.9)

and the product with a sum of scalars is distributive.

A (x + y) = A x + A y    (2.10)

When three or more matrices are to be multiplied, the operation is also associative

(C A) D = C (A D)    (2.11)

and with the sum of two matrices involved in the product, it is also distributive.

(A + B) D = A D + B D    (2.12)

The transpose of the product of matrices is the product of the transposed individual matrices in reversed order.

(C A D)^t = D^t A^t C^t    (2.13)
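Rules such as (2.13) are easy to verify numerically. A small NumPy check (an illustration on random matrices, not a proof) might read:

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((4, 2))
A = rng.standard_normal((2, 3))
D = rng.standard_normal((3, 5))

lhs = (C @ A @ D).T     # transpose of the product
rhs = D.T @ A.T @ C.T   # product of the transposes in reversed order

ok = np.allclose(lhs, rhs)
```

Note that the reversed order is forced by the dimensions: C^t A^t D^t would not even be a defined product.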
As stated earlier, Matlab's philosophy is to read everything as a matrix. Consequently, the basic operators for multiplication, right division, left division, power (*, /, \, ^) automatically perform corresponding matrix operations (^ will be introduced shortly in the context of square matrices, / and \ will be discussed later, in the context of linear regression and the calculation of a pseudo inverse, see The Pseudo-Inverse, p.117). Element-wise operations can, however, be enforced. If this is desired, the dot (.) needs to be placed before each operator (.*, ./, .\, .^).
Figure 2-12. Element-wise matrix operations
Note that the operators for addition (+) and subtraction (-) need not be preceded by a dot. These two operations are always done element-wise. Some examples:

A=[1 2 3; 4 5 6];
B=[0.1 0.2 0.3; 0.4 0.5 0.6];
X=A.*B    % element-wise multiplication
Y1=A./B   % element-wise right division
Y2=A.\B   % element-wise left division
Z=A.^B    % element-wise raising to the power

X =
    0.1000    0.4000    0.9000
    1.6000    2.5000    3.6000
Y1 =
    10    10    10
    10    10    10
Y2 =
    0.1000    0.1000    0.1000
    0.1000    0.1000    0.1000
Z =
    1.0000    1.1487    1.3904
    1.7411    2.2361    2.9302
Element-wise right division (./) leads to the inverse result of element-wise left division (.\) and the operator '.^' raises all elements to the corresponding power. For element-wise operations, the dimensions of the matrices always have to match. In contrast to Matlab, where the defaults are the matrix operators, in Excel the default is the element-wise operation. In fact, all basic operations (e.g. +, -, *, /, ^) and functions (e.g. EXP, LN, LOG) work element-wise in Excel. All matrix functions such as TRANSPOSE, MMULT, and MINVERSE require a
pre-selection of a target cell block and the SHIFT+CTRL+ENTER key combination to perform the calculation.
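For comparison (outside the book's scope), NumPy behaves like Excel in this respect: its arithmetic operators are element-wise by default, and the matrix product requires the dedicated @ operator. A sketch mirroring the Matlab example above:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
B = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])

X  = A * B    # element-wise multiplication (Matlab's .*)
Y1 = A / B    # element-wise right division  (./)
Z  = A ** B   # element-wise power           (.^)
```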
2.1.2 Special Matrices

There are several matrices that have special properties and thus require closer attention. The following list is not complete but it is sufficient for many applications.

Square Matrix
Matrices that have the same number of rows and columns are called square matrices. From what we have learnt so far there are two immediate consequences with respect to matrix multiplication. (1) The product of any matrix A with its transpose At, or vice versa, results in square matrices that are symmetric (see below); but recall that AAt ≠ AtA. (2) Any square matrix can be multiplied with itself repeatedly and the resulting matrix is also square. This is identical to raising the power of the matrix. A few examples in Matlab:

A=[1 2 3; 4 5 6];
Y1=A*A'    % multiplication of A with its transpose
Y2=A'*A    % or vice versa
Z1=Y2*Y2   % multiplication of Y2 with itself
Z1=Y2^2    % or, equivalently, raising the power of Y2

Y1 =
    14    32
    32    77
Y2 =
    17    22    27
    22    29    36
    27    36    45
Z1 =
        1502        1984        2466
        1984        2621        3258
        2466        3258        4050
Z1 =
        1502        1984        2466
        1984        2621        3258
        2466        3258        4050
Symmetric Matrix
Symmetric matrices are square matrices that are identical to their transpose. They are invariant to a reflection at their main diagonal, i.e. invariant to the interchange of row and column index. In the former Matlab example both the 2×2 matrix Y1 and the 3×3 matrix Y2 are symmetric.

Diagonal Matrix
A square matrix comprised of zeros except for the main diagonal elements is called a diagonal matrix. Naturally, a diagonal matrix is symmetric. The Matlab command diag(x) forms a diagonal matrix D with the entries of vector x as its diagonal elements. Reversely, the command diag(D) extracts the diagonal elements of D into a vector.

x=[1 2 3];
D=diag(x)   % forming a diagonal matrix
x=diag(D)   % extracting diagonal elements into a vector

D =
     1     0     0
     0     2     0
     0     0     3
x =
     1
     2
     3

Diagonal matrices are handy when individual rows or columns of a matrix are to be multiplied by different scalar factors s1…sn. One typical example is the normalisation of B so that the square root of the sum of all squared elements in, for example, each row of B becomes one, i.e. unit length of each row vector.

B=[0.1 0.2 0.3; 0.4 0.5 0.6];
s=1./sqrt(sum(B.^2,2))   % vector of row-wise normalisation coeff.
B_n=diag(s)*B            % row-wise normalisation

Note that the Matlab command sum(B.^2,2) performs a row-wise addition of all squared elements of B. For a column-wise summation sum(B.^2,1) or just sum(B.^2) could be used.

s =
    2.6726
    1.1396
B_n =
    0.2673    0.5345    0.8018
    0.4558    0.5698    0.6838
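The same row-wise normalisation can be sketched in NumPy (an aside, not from the book), where np.diag plays the role of Matlab's diag:

```python
import numpy as np

B = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])

s = 1.0 / np.sqrt((B ** 2).sum(axis=1))  # row-wise normalisation coefficients
B_n = np.diag(s) @ B                     # each row scaled to unit length
```

As a check, the square root of the row-wise sums of squares of B_n is 1 for every row.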
Another common task is to normalise in such a way that the maximum value in, for example, each column of B becomes one.

B=[0.1 0.2 0.3; 0.4 0.5 0.6];
s=1./max(B)    % vector of column-wise normalisation coeff.
B_n=B*diag(s)  % column-wise normalisation

Note that the command max(B) finds the column-wise maxima of B; the row-wise maxima are obtained with max(B,[],2). (Note that max(B,2) would instead compare every element of B with the scalar 2.)

s =
    2.5000    2.0000    1.6667
B_n =
    0.2500    0.4000    0.5000
    1.0000    1.0000    1.0000
Note that the order of the multiplication with the square matrix diag(s) is different in the two examples. In the first example the rows are normalised, in the second, the columns.

Identity Matrix
A diagonal matrix with only ones as diagonal elements is called an identity matrix, commonly abbreviated as I. It is the neutral element with respect to matrix multiplication; i.e. left or right multiplication of a matrix A with an identity matrix I of appropriate dimensions results in A itself. In Matlab the command eye(n) can be used to build an identity matrix of dimensions n×n.

A=[1 2 3; 4 5 6];
I1=eye(2)   % 2×2 identity matrix
Y1=I1*A     % left multiplication
I2=eye(3)   % 3×3 identity matrix
Y2=A*I2     % right multiplication

I1 =
     1     0
     0     1
Y1 =
     1     2     3
     4     5     6
I2 =
     1     0     0
     0     1     0
     0     0     1
Y2 =
     1     2     3
     4     5     6
Inverse Matrix
The inverse X^{-1} of a matrix X is defined in such a way that

X X^{-1} = X^{-1} X = I    (2.14)
The left and right product of a square matrix with its inverse results in an identity matrix. In Matlab the command inv(X), or equivalently X^(-1), is used for matrix inversion. Only square matrices can be inverted.

X=[1 2; 3 4];
X_inv=inv(X)   % matrix inversion
I=X*X_inv

X_inv =
   -2.0000    1.0000
    1.5000   -0.5000
I =
    1.0000         0
    0.0000    1.0000

Singular matrices cannot be inverted. They have linearly dependent rows or columns. In Matlab, or any other computer language, singularity can simply be an issue of numerical precision. Consider the following example:

X=[1 2; 1+1e-16 2];
rank_X=rank(X)   % rank of X
X_inv=inv(X)     % matrix inversion

rank_X =
     1
Warning: Matrix is singular to working precision.
X_inv =
   Inf   Inf
   Inf   Inf
Within Matlab's numerical precision X is singular, i.e. the two rows (and columns) are identical, and this represents the simplest form of linear dependence. In this context, it is convenient to introduce the rank of a matrix as the number of linearly independent rows (and columns). If the rank of a square matrix is less than its dimensions, the matrix is called rank-deficient and singular. In the latter example, rank(X)=1, which is less than the dimensions of X. Thus, matrix inversion is impossible due to singularity, while, in the former example, matrix X must have had full rank. Matlab provides the function rank in order to test for the rank of a matrix. For more information on this topic see Chapter 2.2, Solving Systems of Linear Equations, the Matlab manuals or any textbook on linear algebra. In Excel, matrix inversion can be performed similarly to matrix transposition (see earlier). Figure 2-13 gives an example. Cells D3:E4, defining the target matrix, have to be pre-selected, and then the MINVERSE function is applied to the source cells A3:B4. Finally, the SHIFT+CTRL+ENTER key combination is used to confirm the matrix operation.
Figure 2-13. Matrix inversion in Excel
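NumPy offers direct analogues of inv and rank (again an aside, outside the book's toolset); the main behavioural difference is that an exactly singular matrix raises an exception instead of producing a warning and Inf entries:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])
X_inv = np.linalg.inv(X)   # matrix inverse, like Matlab's inv(X)
I = X @ X_inv              # X times its inverse gives the identity

rank_full = np.linalg.matrix_rank(X)   # 2: full rank, invertible

# 1 + 1e-16 rounds to 1.0 in double precision, so this matrix is singular
X_sing = np.array([[1.0, 2.0], [1.0 + 1e-16, 2.0]])
rank_sing = np.linalg.matrix_rank(X_sing)   # 1: rank-deficient

# unlike Matlab, which warns and returns Inf, NumPy raises an exception
singular = False
try:
    np.linalg.inv(X_sing)
except np.linalg.LinAlgError:
    singular = True
```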
Orthogonal and Orthonormal Matrices
Matrices with exclusively orthogonal column (or row) vectors are called orthogonal matrices. For any two columns x:,i and x:,j of a matrix X to be orthogonal, the necessary condition states that their scalar product is zero.

x_{:,i}^t \, x_{:,j} = 0 \quad \text{for all } i \neq j    (2.15)
In our three-dimensional world we can perceive three vectors to be orthogonal. However, in a higher-dimensional space the set of equations defined by (2.15) must suffice. If the columns (or rows) of X are normalised to the square root of the sum of their squared elements (i.e. to unit length), the matrix is called orthonormal. Recall that earlier this kind of normalisation was solved most elegantly by right (left) multiplication with a diagonal matrix comprising the appropriate normalisation coefficients. See the section introducing diagonal matrices for more details. Alternatively, Matlab's built-in function norm can be used to determine normalisation coefficients and perform the same task. An example for column-wise normalisation of a matrix X with orthogonal columns is given below. It is worthwhile to compare X with equation (2.15); the subspace command can be used to determine the angle between the vectors (in rad) and reconfirm orthogonality.

X=[1 4; ...
   2 3; ...
   5 -2]    % orthogonal column matrix

angle=subspace(X(:,1),X(:,2))   % angle (in rad, pi/2=90°)
angle=rad2deg(angle)            % angle (in degrees)

Xn=[];
for i=1:2
    Xn(:,i)=X(:,i)./norm(X(:,i));   % column-wise normalisation
end
Xn

X =
     1     4
     2     3
     5    -2
angle =
   1.5708e+000
angle =
    90
Xn =
   1.8257e-001   7.4278e-001
   3.6515e-001   5.5709e-001
   9.1287e-001  -3.7139e-001
Note that this kind of normalisation, via the norm function, can only be performed column- (or row-) wise via a loop, as seen in the Matlab box above. Calling norm with a matrix as its single argument computes a matrix norm rather than column-wise normalisation coefficients. We refer to the Matlab help and function references for more detail.

Orthonormal matrices have very special properties. If a matrix X is comprised of orthonormal rows, then

X X^T = I    (2.16)

If matrix X is comprised of orthonormal columns, then

X^T X = I    (2.17)

with the appropriate dimensions of the identity matrices. If matrix X is square and has orthonormal rows, its columns are also orthonormal. The inverse is then equal to the transpose,

X^-1 = X^T    (2.18)

and consequently

X X^T = X^T X = I    (2.19)
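These relations are easy to verify numerically. The following NumPy sketch (a Python aside, not part of the book's own Matlab code) normalises the columns of the example matrix X from above and checks equation (2.17):

```python
import numpy as np

X = np.array([[1.0, 4.0],
              [2.0, 3.0],
              [5.0, -2.0]])          # matrix with orthogonal columns

Xn = X / np.linalg.norm(X, axis=0)   # column-wise normalisation to unit length

# for orthonormal columns, Xn^T Xn equals the identity matrix, eq. (2.17)
print(np.allclose(Xn.T @ Xn, np.eye(2)))   # True
```

Since X is not square, only equation (2.17) holds here; Xn Xn^T is a 3×3 projection matrix, not the identity.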
2.2 Solving Systems of Linear Equations

Matrix multiplication and inversion provide very useful means of representing and solving systems of linear equations. Consider the following matrix equation:

y = c A    (2.20)

where

y = [y1 y2 y3],  c = [c1 c2 c3],  A = | a1,1  a1,2  a1,3 |
                                      | a2,1  a2,2  a2,3 |
                                      | a3,1  a3,2  a3,3 |
Figure 2-14. System of linear equations in their matrix form.

According to the rule for matrix multiplication introduced earlier, each element of y is calculated as the scalar product between c and the corresponding column of A. These linear operations are represented exactly by the following system of inhomogeneous linear equations:

y1 = c1 a1,1 + c2 a2,1 + c3 a3,1
y2 = c1 a1,2 + c2 a2,2 + c3 a3,2
y3 = c1 a1,3 + c2 a2,3 + c3 a3,3    (2.21)
Let's assume the elements c1, c2 and c3 of vector c are the unknowns. Thus, the system is comprised of three equations with three unknowns. Such systems of n equations with n unknowns have exactly one solution if none of the individual equations can be expressed as a linear combination of the remaining ones, i.e. if they are linearly independent. Then, the coefficient matrix A is of full rank and non-singular, and its inverse, A^-1, exists, such that right multiplication of equation (2.20) with A^-1 allows the determination of the unknowns:

c = y A^-1    (2.22)

Figure 2-15. Solving systems of linear equations.

A typical example arises from Beer-Lambert's law. In spectrophotometry, it describes the linear relationship between the concentration of a chemical species and the measured absorbance at a particular wavelength. The corresponding coefficients are called molar absorptivities. They are specific for each species and wavelength. We refer to Chapter 3.1, Beer-Lambert's Law, for a more detailed introduction. Consider a mixture of three species of unknown concentrations c1, c2 and c3 for which the absorbances y1, y2 and y3 have been measured at three
different wavelengths. Suppose that the molar absorptivity coefficients for the three individual species have been determined independently at all three wavelengths beforehand and are known. These absorptivities are collected in a matrix A such that each individual row contains the values for one specific species at the three wavelengths. This case is exactly covered by equations (2.20) and (2.21). If the wavelengths have been chosen reasonably, A will be invertible and the individual concentrations c1, c2 and c3 can be determined from equation (2.22). A small Matlab routine could be as follows:

y=[0.8 0.6 0.9];     % absorbances
A=[90 30 80; ...
   20 70 50; ...
   10 50 40];        % molar absorptivities
c=y*inv(A)           % concentrations

c =
    0.0079    0.0033    0.0026
It is important to stress that for this to work, the independently known matrix A of absorptivity coefficients needs to be square, i.e. it has previously been determined at as many wavelengths as there are chemical species. Often complete spectra are available with information at many more wavelengths. It would, of course, not be reasonable to simply ignore this additional information. However, if the number of wavelengths exceeds the number of chemical species, the corresponding system of equations will be overdetermined, i.e. there are more equations than unknowns. Consequently, A will no longer be a square matrix and equation (2.22) does not apply, since the inverse is only defined for square matrices. In Chapter 4.2, we introduce a technique called linear regression that copes exactly with these cases in order to find the best possible solution.
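To anticipate that idea numerically, a least-squares solver can handle the overdetermined case directly. In the NumPy sketch below (a Python aside, not the book's code), the fourth wavelength column and the 'true' concentrations are invented for illustration:

```python
import numpy as np

# molar absorptivities of 3 species at 4 wavelengths (hypothetical values)
A = np.array([[90.0, 30.0, 80.0, 60.0],
              [20.0, 70.0, 50.0, 40.0],
              [10.0, 50.0, 40.0, 30.0]])

c_true = np.array([0.008, 0.003, 0.003])   # assumed concentrations
y = c_true @ A                             # noise-free absorbances, y = c A

# A is no longer square, so inv(A) does not exist;
# solve y = c A in the least-squares sense instead
c, *_ = np.linalg.lstsq(A.T, y, rcond=None)
print(c)   # recovers c_true exactly for noise-free data
```

With noisy measurements the same call returns the best solution in the least-squares sense, which is precisely the topic of Chapter 4.2.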
3 Physical/Chemical Models

Any textbook on Physical Chemistry is full of mathematical equations that quantitatively describe chemical and physical processes. Often these equations are explicit, often they are not. Some are simple equations and some are complex. Explicit equations are relatively straightforward to deal with: all that is required is to translate the equation into computer code, be it Matlab or Excel or any other language. As an example, let us consider the ideal gas law

p V = n R T    (3.1)
where p is the pressure, V the volume, n the number of moles, R the gas constant and T the temperature in Kelvin. The equation can be rearranged to allow the calculation of the pressure as a function of volume and temperature of a gas sample of a given number of moles. The results can be plotted in a mesh plot of pressure vs. volume and temperature. We create a range of temperatures and volumes. The command logspace(-1.2,0,50) creates a vector of 50 logarithmically spaced values between 10^-1.2 and 10^0=1; the command meshgrid produces the matrices V and T that contain all the volumes and temperatures required for the grid of values needed for the plot.

MatlabFile 3-1. Gas_Laws.m
%Gas_Laws
n=1;           % number of moles
R=8.206e-2;    % [L atm mol-1 K-1]
volume=logspace(-1.2,0,50)';
temp=200:10:300;
[V,T]=meshgrid(volume,temp);
% Ideal gas
pressure=n*R*T./V;
mesh(log10(volume),temp,pressure);
view(20,30)
xlabel('log(volume)');ylabel('temp');zlabel('pressure');
Figure 3-1. The pressure of an ideal gas as a function of log(volume) and temperature.

Ideal gases do not exist and for real gases several approximate equations have been developed to describe the pressure as a function of volume and temperature. The first useful equation is due to van der Waals:

p = n R T / (V - n b) - a (n/V)^2    (3.2)

The van der Waals coefficients a and b are determined experimentally for each gas. For CO2 they are a=3.610 atm L^2 mol^-2 and b=0.0429 L mol^-1. The approximation is not perfect and negative pressures are computed at certain ranges of volume and temperature. These values are replaced by zero.

MatlabFile 3-2. Gas_Laws.m ...continued
%Gas_Laws ... continued
%Van der Waals corrections for CO2
a=3.61;     % [L^2 atm mol-2]
b=.0429;    % [L mol-1]
pressure_vdW=n*R*T./(V-n*b)-a*n^2./(V.*V);
pressure_vdW(pressure_vdW<0)=0;   % negative pressures replaced by zero

c = [[X] [Y] [Z]]    (3.36)
Equation (3.34) now reads as:

d(c + δc) = d(c) + δc × ∂d(c)/∂c    (3.37)

It is useful to represent this equation graphically. All vectors are 3-element row vectors, while the matrix of derivatives is a 3×3 matrix.

Figure 3-10. Graphical representation of equation (3.37).

The task is to determine the shift vector δc for which the new vector of differences d(c+δc) is minimal or, within the Taylor approximation, for which this difference is zero. In Chapter 2.2, Solving Systems of Linear Equations, we have given the solution:

δc = -d(c) × [∂d(c)/∂c]^-1    (3.38)

What is left is the matrix of derivatives, ∂d(c)/∂c. It is called the Jacobian, J.
            | ∂d1/∂c1   ∂d2/∂c1   ∂d3/∂c1 |
J = ∂d/∂c = | ∂d1/∂c2   ∂d2/∂c2   ∂d3/∂c2 |    (3.39)
            | ∂d1/∂c3   ∂d2/∂c3   ∂d3/∂c3 |
The determination of the first element of J, as an example, is detailed below:

∂dX/∂[X] = ∂([X]tot - Σ x β_xyz [X]^x [Y]^y [Z]^z) / ∂[X]
         = -Σ x^2 β_xyz [X]^(x-1) [Y]^y [Z]^z
         = -Σ x^2 [XxYyZz] / [X]    (3.40)

where dX denotes the difference between the true and the calculated total concentration of component X. It is relatively easy to continue along these lines and determine the total Jacobian as:
                 | Σ x^2[XxYyZz]/[X]   Σ xy[XxYyZz]/[X]    Σ xz[XxYyZz]/[X]  |
J = ∂d(c)/∂c = - | Σ xy[XxYyZz]/[Y]    Σ y^2[XxYyZz]/[Y]   Σ yz[XxYyZz]/[Y]  |    (3.41)
                 | Σ xz[XxYyZz]/[Z]    Σ yz[XxYyZz]/[Z]    Σ z^2[XxYyZz]/[Z] |
The Jacobian is not symmetric. However, we can rewrite this equation in a way that takes advantage of a 'hidden' symmetry. This allows faster computation, as only the upper triangular part of the symmetric matrix J̄ needs to be computed:

                      | 1/[X]  0      0     |   | Σ x^2[XxYyZz]   Σ xy[XxYyZz]    Σ xz[XxYyZz]  |
J = -diag(c)^-1 J̄ = - | 0      1/[Y]  0     | × | Σ xy[XxYyZz]    Σ y^2[XxYyZz]   Σ yz[XxYyZz]  |    (3.42)
                      | 0      0      1/[Z] |   | Σ xz[XxYyZz]    Σ yz[XxYyZz]    Σ z^2[XxYyZz] |
The Jacobian can thus be written as the product of a diagonal matrix, containing the inverse component concentrations, and the symmetric matrix J̄, so that J^-1 = -J̄^-1 diag(c). The shifts are then calculated as:

δc = -d(c) J^-1 = d(c) J̄^-1 diag(c)    (3.43)
We are now in a position to write a function NewtonRaphson.m that calculates the species concentrations for a solution for which the total component concentrations, the chemical model, and its formation constants are known. The flow diagram in Figure 3-11 illustrates the procedure:

1. Guess initial values for the free component concentrations [X], [Y], [Z].
2. Calculate the concentrations of all species, e.g. [XxYyZz] = β_xyz [X]^x [Y]^y [Z]^z.
3. Calculate the total concentrations of the components, e.g. [X]tot_calc = Σ x β_xyz [X]^x [Y]^y [Z]^z.
4. Compare the actual total concentrations with the computed ones, e.g. dx = [X]tot - [X]tot_calc.
5. If the difference is ≈ 0, exit; otherwise calculate the Jacobian, calculate the shift vector, add it to the component concentrations and return to step 2.

Figure 3-11. A flow diagram for the Newton-Raphson algorithm.
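As a minimal illustration of this flow diagram, the following Python sketch (not the book's Matlab code; the one-component dimerisation model and its constants are invented for illustration) iterates steps 2-5 for a single component X forming a dimer X2 with [X2] = β[X]^2:

```python
def newton_raphson_x(beta, x_tot, tol=1e-12, max_it=99):
    """Free concentration [X] for a hypothetical dimerisation
    X + X = X2 with [X2] = beta*[X]**2, i.e. [X]tot = [X] + 2*beta*[X]**2.
    One-component version of the Newton-Raphson flow diagram."""
    x = x_tot                              # initial guess for [X]
    for _ in range(max_it):
        x_tot_calc = x + 2 * beta * x**2   # computed total concentration
        d = x_tot - x_tot_calc             # difference d
        if abs(d) < tol:                   # difference ~ 0: exit
            break
        J = -(1 + 4 * beta * x)            # scalar Jacobian, d(d)/d[X]
        x = x - d / J                      # shift and update, eq. (3.38)
    return x

# with beta=100 and [X]tot=0.01 the exact answer is [X]=0.005
x = newton_raphson_x(beta=100.0, x_tot=0.01)
```

The result can be checked by hand: 0.005 + 2·100·0.005^2 = 0.01, the given total concentration.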
MatlabFile 3-9. NewtonRaphson.m
function c_spec=NewtonRaphson(Model,beta,c_tot,c,i)
ncomp=length(c_tot);      % number of components
nspec=length(beta);       % number of species
c_tot(c_tot==0)=1e-15;    % numerical difficulties if c_tot=0
it=0;
while it ...

% 2A -> B, explicit solution
d=dsolve('Da=-2*k*a^2','Db=k*a^2','a(0)=a_0','b(0)=0');
pretty(simplify(d.a))

         a_0
    -------------
    2 k t a_0 + 1
or in a readable form:

[A] = [A]0 / (2 k t [A]0 + 1)    (3.77)
Note that Matlab's symbolic toolbox demands lower case characters for species names. Below, the attempt to use the symbolic toolbox for the integration of a slightly more complex mechanism:

A --k1--> B
2B --k2--> C    (3.78)

% A -> B, explicit solution
% 2B -> C
d=dsolve('Da=-k1*a','Db=k1*a-2*k2*b^2','Dc=k2*b^2', ...
    'a(0)=a_0','b(0)=0','c(0)=0');
pretty(simplify(d.c))
As it turns out, the explicit solution is very complex, including several Bessel functions. We leave it to the reader to explore the output. As a matter of fact, most mechanisms do not have explicit solutions and require numerical integration; the few mechanisms discussed so far are exceptions rather than the rule. Fortunately, numerical integration is always possible and next we demonstrate how this can be achieved.
3.4.3 Complex Mechanisms that Require Numerical Integration

Numerical integration of sets of differential equations is a well developed field of numerical analysis; e.g. most engineering problems involve differential equations. Here, we only give a very brief introduction to numerical integration. We start with the Euler method, proceed to the much more useful Runge-Kutta algorithm and finally demonstrate the use of the routines that are part of the Matlab package.

The Euler Method

The simplest method for the numerical integration of a system of differential equations is named after Euler. Not surprisingly, it can be seen as an adaptation of the truncated Taylor series expansion, equation (3.81), which is the standard tool for non-linear problems. We have already encountered it in Solving Complex Equilibria (p.48), and we employ it again for non-linear least-squares fitting in Chapter 4.3, Non-Linear Regression. Please note: the Euler method should not be used. It is very slow even if only a modest accuracy is required. But because of its simplicity it is ideally suited to demonstrate the general principles of the numerical integration of ordinary differential equations. As usual, we start with a simple example; consider the reversible reaction:

2A <==> B    (rate constants k+ and k-)    (3.79)
While there is an analytical solution for this mechanism, the formula for the calculation of the concentration profiles of A and B is fairly complex, involving the tan and atan functions (according to Matlab's symbolic toolbox). We use it to demonstrate the basic ideas of numerical integration. The Euler method can be represented graphically, see Figure 3-29.

Figure 3-29. Euler method for numerical integration: at each time point t0, t1, t2, ... the slope is used to extrapolate the concentration [A] over the next interval.
At any time t, knowing the concentrations [A]t and [B]t, the derivatives of the concentrations of A and B, [A]'t and [B]'t, can be calculated by applying the rate law:

[A]'t = -2k+[A]t^2 + 2k-[B]t
[B]'t = -(1/2)[A]'t = k+[A]t^2 - k-[B]t    (3.80)

Figure 3-29 only deals with the concentration [A]; the principle for the treatment of concentration [B] is identical. Starting at time t0 the initial concentrations are [A]0 and [B]0. The derivatives [A]'0 and [B]'0 are calculated according to equation (3.80). This allows the computation of the new concentrations [A]1 and [B]1 after a short time interval Δt = t1-t0:

[A]1 = [A]0 + Δt [A]'0
[B]1 = [B]0 + Δt [B]'0    (3.81)

These new concentrations at time t1 in turn allow the determination of new derivatives and thus another set of concentrations [A]2 and [B]2 after the second time interval t2-t1. As shown in Figure 3-29, this procedure is simply repeated until the desired final reaction time is reached. The main disadvantage of the Euler method is that the calculated approximations for the concentrations are systematically wrong at each step.
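The scheme of equations (3.80) and (3.81) can be sketched in a few lines of Python (a language aside; the rate constants and step size below are invented for illustration):

```python
import numpy as np

def euler_2a_b(kp, km, a0, b0, dt, n_steps):
    """Euler integration of 2A <=> B with the rate law of eq. (3.80).
    For demonstration only -- as the text stresses, the Euler
    method is too inaccurate for practical work."""
    a, b = a0, b0
    traj = [(a, b)]
    for _ in range(n_steps):
        a_dot = -2 * kp * a**2 + 2 * km * b       # [A]' from eq. (3.80)
        b_dot = kp * a**2 - km * b                # [B]' = -[A]'/2
        a, b = a + dt * a_dot, b + dt * b_dot     # Euler step, eq. (3.81)
        traj.append((a, b))
    return np.array(traj)

C = euler_2a_b(kp=0.5, km=0.1, a0=1.0, b0=0.0, dt=0.01, n_steps=1000)
```

Because [A]' + 2[B]' = 0, the total mass [A] + 2[B] is conserved exactly, even by the crude Euler steps; the systematic error shows up in the shape of the profiles, not in the mass balance.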
In the example, the tangent [A]' always overestimates the change of the concentration for any time interval; the longer the interval, the larger the deviation. Thus, to maintain a good accuracy, step sizes have to be very small; but then computation times are very long and, additionally, other numerical problems start to interfere. Fortunately, there are many better methods available. Amongst them, algorithms of the Runge-Kutta type are frequently used in chemical kinetics.

In contrast to our preferred standard mode in this book, we do not develop a Matlab function for the task of numerical integration of the differential equations pertinent to chemical kinetics. While it would be fairly easy to develop basic functions that work reliably and efficiently with most mechanisms, it was decided not to include such functions since Matlab, in its basic edition, supplies a good suite of fully fledged ODE solvers. ODE solvers play a very important role in many applications outside chemistry and thus high level routines are readily available. An important aspect for fast computation is the automatic adjustment of the step-size, depending on the required accuracy. Also, it is important to differentiate between stiff and non-stiff problems. Proper discussion of the difference between the two is clearly outside the scope of this book; however, we indicate the stiffness of problems in a series of examples discussed later.

So, instead of developing our own ODE solver in Matlab, we will learn how to use the routines supplied by Matlab. This will be done in a quite extensive series of examples. The situation is different for those readers who do not have access to Matlab and rely completely on Excel. In the following, we explain how a fourth order Runge-Kutta method can be incorporated into a spreadsheet and used to solve non-stiff ODEs.

Fourth Order Runge-Kutta Method in Excel

The fourth order Runge-Kutta method is the workhorse for the numerical integration of ODEs.
Elaborate routines with automatic step-size control are available in Matlab. Here, we develop an Excel spreadsheet for the numerical integration of the reaction mechanism 2A <==> B based on the 4th order Runge-Kutta method, see Figure 3-30. The 4th order Runge-Kutta method requires four evaluations of concentrations and derivatives per step. This appears to be a serious disadvantage, but as it turns out, significantly larger step sizes can be taken for an acceptable accuracy and overall the computation times are very much shorter.
ExcelSheet 3-6. Chapter2.xls-RungeKutta

=A6-A5                        (cell E5, time interval)
=-2*$B$1*B5^2+2*$B$2*C5       (cell F5, derivative at time 0)
=B5+E5/2*F5                   (cell H5, concentration at the midpoint)
=-2*$B$1*H5^2+2*$B$2*I5       (cell J5, derivative at the midpoint)
=B5+E5/2*J5                   (cell L5, second midpoint concentration)
=-2*$B$1*L5^2+2*$B$2*M5       (cell N5, second midpoint derivative)
=B5+E5*N5                     (cell P5, concentration at time 1)
=-2*$B$1*P5^2+2*$B$2*Q5       (cell R5, derivative at time 1)
=B5+E5/6*(F5+2*J5+2*N5+R5)    (cell B6, new concentration)

Figure 3-30. Excel spreadsheet for the numerical integration of the rate law for the reaction 2A <==> B using 4th order Runge-Kutta equations.

The 4th order Runge-Kutta method is reasonably complex. Without developing the equations, we demonstrate their application by applying them stepwise in an Excel spreadsheet that is written specifically for the reaction 2A <==> B.
We start at time zero where the initial concentrations are [A]0=1 and [B]0=0 (cells B5 and C5). The time interval Δt=1 is calculated in cell E5.

1. Calculate the derivatives of the concentrations at time point 0:

[A]'0 = -2k+[A]0^2 + 2k-[B]0    (cell F5)
[B]'0 = k+[A]0^2 - k-[B]0       (cell G5)

In the Excel language, for [A]'0, this translates into =-2*$B$1*B5^2+2*$B$2*C5, as indicated in Figure 3-30. Note, in the figure only the cell formulas for the computations of component A are given.

2. Calculate approximate concentrations at the intermediate time point Δt/2, based on the concentrations and derivatives at time 0:

[A]1 = [A]0 + (Δt/2)[A]'0    (cell H5)
[B]1 = [B]0 + (Δt/2)[B]'0    (cell I5)

Again, the Excel formula for component A is given in Figure 3-30.
3. Calculate the derivatives at the intermediate time point:

[A]'1 = -2k+[A]1^2 + 2k-[B]1    (cell J5)
[B]'1 = k+[A]1^2 - k-[B]1       (cell K5)

4. Calculate another set of concentrations at the intermediate time point, now based on the concentrations at time 0 and the derivatives at the intermediate time point:

[A]2 = [A]0 + (Δt/2)[A]'1    (cell L5)
[B]2 = [B]0 + (Δt/2)[B]'1    (cell M5)

5. Compute a new set of derivatives at the intermediate time point, based on the concentrations just calculated:

[A]'2 = -2k+[A]2^2 + 2k-[B]2    (cell N5)
[B]'2 = k+[A]2^2 - k-[B]2       (cell O5)

6. Next, the concentrations at the new time point 1, after the complete time interval, are computed, based on the concentrations at time point 0 and these new derivatives at the intermediate time point:

[A]3 = [A]0 + Δt [A]'2    (cell P5)
[B]3 = [B]0 + Δt [B]'2    (cell Q5)

7. Computation of the derivatives at time point 1:

[A]'3 = -2k+[A]3^2 + 2k-[B]3    (cell R5)
[B]'3 = k+[A]3^2 - k-[B]3       (cell S5)

8. Finally the new concentrations after the full time interval are computed as:

[A]new = [A]0 + (Δt/6)([A]'0 + 2[A]'1 + 2[A]'2 + [A]'3)    (cell B6)
[B]new = [B]0 + (Δt/6)([B]'0 + 2[B]'1 + 2[B]'2 + [B]'3)    (cell C6)

These concentrations are put as the next elements into the columns B and C.
These final equations can be compared with the equivalent in the Euler approach, equation (3.81). In the Runge-Kutta method a weighted average of 4 different approximations for the derivatives is used instead of the derivative at the beginning of the time interval. Figure 3-31 displays the resulting concentration profiles for species A and B. As it is a reversible reaction, an equilibrium is established after a certain time.
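Steps 1-8 also translate directly into a compact Python function (a sketch mirroring the spreadsheet, not the book's own code; the rate constants in the example call are invented):

```python
def rk4_step(a, b, dt, kp, km):
    """One 4th order Runge-Kutta step for 2A <=> B,
    following steps 1-8 of the spreadsheet."""
    def derivs(a, b):
        a_dot = -2 * kp * a**2 + 2 * km * b
        return a_dot, kp * a**2 - km * b

    a0d, b0d = derivs(a, b)                              # step 1
    a1d, b1d = derivs(a + dt/2 * a0d, b + dt/2 * b0d)    # steps 2-3
    a2d, b2d = derivs(a + dt/2 * a1d, b + dt/2 * b1d)    # steps 4-5
    a3d, b3d = derivs(a + dt * a2d, b + dt * b2d)        # steps 6-7
    a_new = a + dt/6 * (a0d + 2*a1d + 2*a2d + a3d)       # step 8
    b_new = b + dt/6 * (b0d + 2*b1d + 2*b2d + b3d)
    return a_new, b_new

a, b = 1.0, 0.0          # [A]0 = 1, [B]0 = 0
for _ in range(10):      # ten steps of width dt = 1
    a, b = rk4_step(a, b, 1.0, kp=0.1, km=0.05)
```

Exactly as in the Euler case, the linear mass balance [A] + 2[B] is preserved by every step; the gain over Euler is in the accuracy of the profile for a given step size.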
Figure 3-31. Concentration profiles for a reaction 2A <==> B ([A]: solid line, [B]: dotted line) as modelled in Excel using a 4th order Runge-Kutta for numerical integration.

While the above spreadsheet looks moderately complex, it nevertheless allows the accurate numerical integration of a real chemical reaction in a very short time. We might get the impression that "this is it". Far from it; there are several important aspects that we can only point out very briefly here. As mentioned before, the spreadsheet has been set up specifically for the numerical integration of the differential equations for the reaction 2A <==> B. It is not convenient to adapt this spreadsheet for the computation of a different reaction scheme. Most equations need to be rewritten. Such manual changes are error prone, and ensuing wrong results can easily go unnoticed. It is of course possible to set up a more elaborate spreadsheet that more readily allows adaptation for different reaction mechanisms. (The web-site http://www.cse.csiro.au/poptools/index.htm offers a large downloadable selection of tools for Excel, amongst them a package for the numerical integration of ODEs.)
For fast computation the determination of the best step-size (interval) is crucial: steps that are too small result in correct concentrations at the expense of long computation times; steps that are too long save computation time but result in poor approximations. The best intervals lead to the fastest computation of concentration profiles within some pre-defined error limits. This of course requires knowledge about the required accuracy. The ideal step-size is not constant during the reaction and so needs to be adjusted continuously. If more complex mechanisms, and thus larger systems of differential equations, are to be integrated, adaptive step size control is absolutely essential.

The Runge-Kutta algorithm cannot handle so-called stiff problems; computation times are astronomical and thus the algorithm is useless. For that class of ordinary differential equations, specialised 'stiff solvers' have been developed. In our context, a system of ODEs sometimes becomes stiff if it comprises very fast and also very slow steps and/or very high and very low concentrations. As a typical example we model an oscillating reaction in The Belousov-Zhabotinsky (BZ) Reaction (p.95).

It is well outside the scope of this chapter to expand on the intricacies of modern numerical integration routines. Matlab provides an excellent selection of routines for any situation. For further reading we refer to the relevant literature and the Matlab manuals. Thus, rather than trying to explain how they work, we demonstrate how they are used.
3.4.4 Interesting Kinetic Examples

Fast and accurate ODE solvers are very complex algorithms. In particular, the design of adaptive step size control and the analysis of stiff problems require sophisticated algorithms. The development of such algorithms is beyond the scope of this book. Matlab supplies a good collection of routines that cater for all the needs of the kineticist dealing with any reasonably complex mechanism. In contrast to the other parts of this book, we do not develop a generally applicable algorithm. Instead, we demonstrate in a representative collection of interesting and exemplary mechanisms how to use the Matlab solvers.

In this chapter, we concentrate on the simulation of chemical kinetics, i.e. based on a given chemical mechanism and the relevant rate constants, the concentration profiles (the matrix C) of all reacting species are computed. The next chapter incorporates these functions into a general fitting routine that can be used to fit the optimal rate constants for a given mechanism to a particular measurement. We start with simple chemical examples; later we examine a few interesting and surprising non-chemical examples.
Autocatalysis

Processes are called autocatalytic if the products of a reaction accelerate their own formation. Autocatalytic reactions get faster as the reaction proceeds, sometimes dramatically, sometimes slowly and steadily. Exponential growth is a very basic non-chemical example. Of course the acceleration cannot be permanent; the reaction will slow down and eventually come to an end once the starting materials have been used up. Only economists believe in sustainable growth. An extreme example of an autocatalytic reaction is an explosion. In this case, it is not directly a chemical product that accelerates the reaction, it is the heat generated by the reaction. The more heat produced, the faster the reaction; the faster the reaction, the more heat, etc. There are many mechanisms that display autocatalytic behaviour. A minimal and very basic autocatalytic reaction scheme is presented below:

A --k1--> B
A + B --k2--> 2B    (3.82)

Starting with component A, there is a relatively slow first order reaction to form the product B. The second reaction is of order two; it opens another path for the formation of component B. As it is a second order reaction, the higher the concentration of B, the faster the decomposition of A to form more B. The system of differential equations for this reaction scheme is given in (3.83):

[A]' = -k1[A] - k2[A][B]
[B]' = k1[A] + k2[A][B]    (3.83)
The organisation of the Matlab ODE solvers requires some explanation. For this example, the core is a function, ode_autocat.m, that returns the derivatives of the concentrations at any particular time or, better, for any set of concentrations of the reacting species. Essentially it is the Matlab code for equation (3.83).

MatlabFile 3-20. ode_autocat.m
function c_dot=ode_autocat(t,c,flag,k)
% A --> B
% A + B --> 2 B
c_dot(1,1)=-k(1)*c(1)-k(2)*c(1)*c(2);   % A_dot
c_dot(2,1)= k(1)*c(1)+k(2)*c(1)*c(2);   % B_dot

c_dot is a column vector of the two derivatives [A]' and [B]'. The vectors k and c contain the rate constants and the actual concentrations; t is the time at which the derivatives are computed; it is not used within this particular function. The flag is not used here either.
This function is called numerous times from the Matlab ODE solver. In the example it is ode45, which is the standard Runge-Kutta algorithm. ode45 requires as parameters the file name of the inner function, ode_autocat.m, the vector of initial concentrations, c0, the rate constants, k, and the total amount of time for which the reaction should be modelled (20 time units in the example). The solver returns the vector t at which the concentrations were calculated and the concentrations themselves, the matrix C. Note that due to the adaptive step size control, the concentrations are computed at times t which are not predefined.

MatlabFile 3-21. autocat.m
% autocat
% A --> B
% A + B --> 2 B
c0=[1;0];      % initial conc of A and B
k=[1e-6;2];    % rate constants k1 and k2
[t,C]=ode45('ode_autocat',20,c0,[],k);   % call ode-solver
plot(t,C)                                % plotting C vs t
xlabel('time');ylabel('conc.');
Figure 2-32. Concentration profiles for the autocatalytic reaction A --k1--> B; A + B --k2--> 2B.
Figure 2-32 shows the corresponding calculated concentration profiles using the rate constants k1=10^-6 s^-1 and k2=2 M^-1 s^-1 for the initial concentrations [A]0=1 M and [B]0=0 M. After an induction period of some 5 time units the reaction accelerates dramatically. At around 10 time units, when component A is almost used up, the reaction decelerates and quickly comes to an end.

We are using the solvers here in their very basic version. Many additional parameters can be controlled, such as maximal step size or required accuracy. We refer to the original documentation for more information about these topics. In the above program, autocat.m, the 20 represents the total time. The ODE solver calculates the optimal step size automatically and returns the time vector t with the concentrations C. The ODE solver can also be forced to return concentrations at specific times by passing the complete vector of times instead of only the total time.

0th Order Reaction

In strict terms, 0th order reactions do not really exist. They are always macroscopically observed reactions where the rate of the reaction is independent of the concentrations of the reactants for a certain time period. Formally, the ODE for a basic 0th order reaction is defined below:

[A]' = -k [A]^0 = -k    (3.84)

A simple mechanism that mimics a 0th order reaction is the catalytic transformation of A to C. A reacts with the catalyst Cat to form an intermediate activated complex B. B in turn reacts further to form the product C, releasing the catalyst that continues reacting with A.

A + Cat --k1--> B
B --k2--> C + Cat    (3.85)
The total concentration of catalyst is much smaller than the concentrations of the reactants or products. Note that in real systems the reactions are reversible and usually there are more intermediates, but for the present purpose this minimal reaction mechanism is sufficient. The system of ODEs:

[A]'   = -k1[A][Cat]
[Cat]' = -k1[A][Cat] + k2[B]
[B]'   = k1[A][Cat] - k2[B]
[C]'   = k2[B]    (3.86)

MatlabFile 3-22. ode_zero_order.m
function c_dot=ode_zero_order(t,c,flag,k)
% 0th order kinetics
% A + Cat --> B
% B --> C + Cat
c_dot(1,1)=-k(1)*c(1)*c(2);             % A_dot
c_dot(2,1)=-k(1)*c(1)*c(2)+k(2)*c(3);   % Cat_dot
c_dot(3,1)= k(1)*c(1)*c(2)-k(2)*c(3);   % B_dot
c_dot(4,1)= k(2)*c(3);                  % C_dot
The production of C is governed by the amount of intermediate B, which is constant over an extended period of time. As long as there is an excess of A with respect to the catalyst, essentially all of the catalyst exists as complex B and thus this concentration is constant. The crucial differential equation is the last one; it is a 0th order reaction as long as [B] is constant. The kinetic profiles displayed in Figure 3-32 have been integrated numerically with Matlab's stiff solver ode15s using the rate constants k1=1000 M^-1 s^-1 and k2=100 s^-1 for the initial concentrations [A]0=1 M, [Cat]0=10^-4 M and [B]0=[C]0=0 M. For this model the standard Runge-Kutta routine is far too slow and thus useless.

MatlabFile 3-23. zero_order.m
% zero_order
% A + Cat --> B
% B --> C + Cat
c0=[1;1e-4;0;0];   % initial conc of A, Cat, B and C
k=[1000;100];      % rate constants k1 and k2
[t,C] = ode15s('ode_zero_order',200,c0,[],k);   % call ode-solver
figure(1); plot(t,C)                            % plotting C vs t
xlabel('time');ylabel('conc.');
Figure 3-32. Concentration profiles for the reaction A + Cat --k1--> B, B --k2--> C + Cat. The reaction is approximately 0th order for about 100 s.
The Steady-State Approximation

Traditionally, reaction mechanisms of the kind above have been analysed based on the steady-state approximation. The differential equations for this mechanism cannot be integrated analytically. Numerical integration was not readily available and thus approximations were the only option available to the researcher. The concentrations of the catalyst and of the intermediate, activated complex B are always very low, and even more so their derivatives [Cat]' and [B]'. In the steady-state approach these two derivatives are set to 0:

[B]' = -[Cat]' = k1[A][Cat] - k2[B] = 0    (3.87)

This equation allows the computation of the concentration [B],

[B] = k1[A][Cat] / k2    (3.88)

and the conservation of mass for the catalyst, which either exists as Cat or as B,

[Cat] = [Cat]0 - [B]    (3.89)

Introduction of (3.89) into (3.88) and a few rearrangements result in

[B] = k1[A][Cat]0 / (k2 + k1[A])    (3.90)
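Plugging in the numbers used for Figure 3-32 shows how well the steady-state expression explains the observed 0th order behaviour (a small Python check, written for this discussion rather than taken from the book):

```python
k1, k2 = 1000.0, 100.0   # [M^-1 s^-1] and [s^-1]
A0, Cat0 = 1.0, 1e-4     # initial concentrations [M]

# steady-state concentration of the activated complex B, eq. (3.90)
B_ss = k1 * A0 * Cat0 / (k2 + k1 * A0)

# while [B] is constant, C is produced at the constant (0th order) rate k2*[B]
rate = k2 * B_ss
print(B_ss, rate)   # about 9.1e-05 and 9.1e-03
```

At this rate, roughly [A]0/rate ≈ 110 s are needed to consume A, in line with the 'approximately 0th order for about 100 s' remark in Figure 3-32.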
For most of the time, up to 100 sec, k2 ...

% wolf + sheep --> 2 wolves
% wolf --> dead wolf
c_dot(1,1)=k(1)*c(1)-k(2)*c(1)*c(2);   % sheep_dot
c_dot(2,1)=k(2)*c(1)*c(2)-k(3)*c(2);   % wolf_dot
The kinetic population profiles displayed in Figure 3-34 have been obtained by numerical integration using Matlab's Runge-Kutta solver ode45 with the rate constants k1=2, k2=5, k3=6 for the initial populations [sheep]0=2, [wolf]0=2. For simplicity, we ignore the units. In ode_lotka_volterra.m the function that generates the differential equations is given. It is repeatedly called by the ODE-solver ode45.

MatlabFile 3-26. Lotka_Volterra.m
% Lotka_Volterra
% sheep --> 2 sheep
% wolf + sheep --> 2 wolves
% wolf --> dead wolf
c0=[2;2];     % initial 'conc' of sheep and wolves
k=[2;5;6];    % rate constants k1, k2 and k3
[t,C] = ode45('ode_lotka_volterra',10,c0,[],k);   % call ode-solver
figure(1); plot(t,C)                              % plotting C vs t
xlabel('time');ylabel('conc.')
legend('sheep','wolves');
Surprisingly, the dynamics of such a population are completely cyclic. All properties of the cycle depend on the initial populations and the 'rate constants'. This behaviour is best seen in a plot of the wolf vs. the sheep 'concentration'. For any set of initial 'concentrations' and 'rate constants', this cyclic behaviour is maintained.
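The closed orbits can be verified numerically. The sketch below (plain Python, written for this discussion rather than taken from the book) integrates the Lotka-Volterra equations with a fixed-step 4th order Runge-Kutta scheme and monitors V = k2(x+y) - k3 ln x - k1 ln y, a quantity that is exactly conserved on the continuous trajectory:

```python
import math

def lv_rk4(x, y, dt, k1, k2, k3, n_steps):
    """Fixed-step RK4 for x' = k1*x - k2*x*y, y' = k2*x*y - k3*y."""
    def f(x, y):
        return k1*x - k2*x*y, k2*x*y - k3*y
    for _ in range(n_steps):
        dx1, dy1 = f(x, y)
        dx2, dy2 = f(x + dt/2*dx1, y + dt/2*dy1)
        dx3, dy3 = f(x + dt/2*dx2, y + dt/2*dy2)
        dx4, dy4 = f(x + dt*dx3, y + dt*dy3)
        x += dt/6*(dx1 + 2*dx2 + 2*dx3 + dx4)
        y += dt/6*(dy1 + 2*dy2 + 2*dy3 + dy4)
    return x, y

def invariant(x, y, k1, k2, k3):
    # constant along exact trajectories of the Lotka-Volterra system
    return k2*(x + y) - k3*math.log(x) - k1*math.log(y)

k1, k2, k3 = 2.0, 5.0, 6.0
x0, y0 = 2.0, 2.0
x, y = lv_rk4(x0, y0, 1e-3, k1, k2, k3, 10000)   # integrate to t = 10
```

The drift of this invariant measures the numerical error of the integration; with a small fixed step it stays tiny, and a growing drift would show up as exactly the kind of imperfect closure of the orbit discussed below for Figure 3-35.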
Figure 3-34. Lotka-Volterra's predator and prey 'kinetics'.

MatlabFile 3-27. Lotka_Volterra.m ...continued
%Lotka_Volterra ...continued
figure(2); plot(C(:,1),C(:,2))
xlabel('[sheep]');ylabel('[wolf]')
Figure 3-35. The concentration of wolves plotted versus the concentration of sheep in the Lotka-Volterra predator-prey kinetics.
Physical/Chemical Models
Note the imperfect coincidence of the line. This effect is due to small numerical errors; increasing the accuracy of the solver reduces these differences.

The Belousov-Zhabotinsky (BZ) Reaction

Chemical mechanisms for real oscillating reactions are very complex and presently not understood in every detail. Nevertheless, there are approximate mechanisms which correctly model several crucial aspects of real oscillating reactions. In these simplified systems, often not all physical laws are strictly obeyed, e.g. the law of conservation of mass. The Belousov-Zhabotinsky (BZ) reaction involves the oxidation of an organic species such as malonic acid (MA) by an acidified aqueous bromate solution in the presence of a metal ion catalyst such as the Ce(III)/Ce(IV) couple. At excess [MA] the stoichiometry of the net reaction is

$$2BrO_3^- + 3CH_2(COOH)_2 + 2H^+ \xrightarrow{\text{catalyst}} 2BrCH(COOH)_2 + 3CO_2 + 4H_2O \qquad (3.94)$$
A short induction period is typically followed by an oscillatory phase, visible by the alternating colour of the aqueous solution due to the different oxidation states of the metal catalyst. Addition of a coloured redox indicator, such as the Fe(II)/(III)(phen)3 couple, results in more dramatic colour changes. Typically, several hundred oscillations with a periodicity of approximately one minute gradually die out within a couple of hours and the system slowly drifts towards its equilibrium state. In order to understand the BZ system Field, Körös and Noyes developed the so-called FKN mechanism. From this, Field and Noyes later derived the Oregonator model, an especially convenient kinetic model to match individual experimental observations and predict experimental conditions under which oscillations might arise.

$$\begin{aligned}
BrO_3^- + Br^- &\xrightarrow{k_1} HBrO_2 + HOBr \\
BrO_3^- + HBrO_2 &\xrightarrow{k_2} 2HBrO_2 + 2M_{ox} \\
HBrO_2 + Br^- &\xrightarrow{k_3} 2HOBr \\
2HBrO_2 &\xrightarrow{k_4} BrO_3^- + HOBr \\
MA + M_{ox} &\xrightarrow{k_5} \tfrac{1}{2}Br^-
\end{aligned} \qquad (3.95)$$
Mox represents the metal ion catalyst in its oxidised form (Ce(IV)). It is important to note that this model is based on an experimentally determined empirical rate law and clearly does not comprise stoichiometrically correct elementary processes. The five reactions in the model provide the means to kinetically describe the four essential stages of the BZ reaction:

- formation of HBrO2
- autocatalytic formation of HBrO2
- consumption of HBrO2
- oxidation of malonic acid (MA)
For the calculation of the kinetic profiles displayed in Figure 3-36, we used the rate constants k1=1.28 M-1s-1, k2=33.6 M-1s-1, k3=2.4×10^6 M-1s-1, k4=3×10^3 M-1s-1, k5=1 M-1s-1 which result in approximate concentration profiles in acidic solution. The initial concentrations are [BrO3-]0=0.063 M, [Ce(IV)]0=0.002 M (=[Mox]0) and [MA]0=0.275 M. The code is fairly complex and thus its development can be error prone.

MatlabFile 3-28. ode_BZ.m
function c_dot = ode_BZ(t,c,flag,k)
% BZ
% BrO3  + Br    --> HBrO2 + HOBr
% BrO3  + HBrO2 --> 2 HBrO2 + 2 Mox
% HBrO2 + Br    --> 2 HOBr
% 2 HBrO2       --> BrO3 + HOBr
% MA + Mox      --> 0.5 Br
c_dot(1,1)=-k(1)*c(1)*c(2)-k(2)*c(1)*c(3)+k(4)*c(3).^2;                  % BrO3_dot
c_dot(2,1)=-k(1)*c(1)*c(2)-k(3)*c(3)*c(2)+0.5*k(5)*c(6)*c(5);            % Br_dot
c_dot(3,1)= k(1)*c(1)*c(2)+k(2)*c(1)*c(3)-k(3)*c(3)*c(2)-2*k(4)*c(3).^2; % HBrO2_dot
c_dot(4,1)= k(1)*c(1)*c(2)+2*k(3)*c(3)*c(2)+k(4)*c(3).^2;                % HOBr_dot
c_dot(5,1)= 2*k(2)*c(1)*c(3)-k(5)*c(6)*c(5);                             % Mox_dot
c_dot(6,1)=-k(5)*c(6)*c(5);                                              % MA_dot

MatlabFile 3-29. BZ.m
% BZ
% BrO3  + Br    --> HBrO2 + HOBr
% BrO3  + HBrO2 --> 2 HBrO2 + 2 Mox
% HBrO2 + Br    --> 2 HOBr
% 2 HBrO2       --> BrO3 + HOBr
% MA + Mox      --> 0.5 Br
BrO3_0=0.063; Mox_0=0.002; MA_0=0.275;
k=[1.28;33.6;2.4e6;3e3;1];
options=odeset('RelTol',1e-6,'AbsTol',1e-10);
[t,C] = ode15s('ode_BZ',1000,[BrO3_0 0 0 0 Mox_0 MA_0],options,k);
plot(t,log10(C(:,[1 4 6])));axis([0 1000 -8 0]);     % BrO3,HOBr,MA
hold;plot(t,log10(C(:,[2 3 5])),'linewidth',2);hold; % Br,HBrO2,Mox
xlabel('time');ylabel('log(conc)')
legend('BrO3','HOBr','MA','Br','HBrO2','Mox')
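The same rate laws are easily transcribed outside Matlab. A minimal Python sketch of the Oregonator derivative function (the function name is ours), evaluated at the initial concentrations, confirms that at t=0 only the MA + Mox step is active:

```python
def oregonator_rates(c, k):
    """c = [BrO3, Br, HBrO2, HOBr, Mox, MA]; returns the six derivatives."""
    BrO3, Br, HBrO2, HOBr, Mox, MA = c
    k1, k2, k3, k4, k5 = k
    return [
        -k1*BrO3*Br - k2*BrO3*HBrO2 + k4*HBrO2**2,                 # BrO3_dot
        -k1*BrO3*Br - k3*HBrO2*Br + 0.5*k5*MA*Mox,                 # Br_dot
         k1*BrO3*Br + k2*BrO3*HBrO2 - k3*HBrO2*Br - 2*k4*HBrO2**2, # HBrO2_dot
         k1*BrO3*Br + 2*k3*HBrO2*Br + k4*HBrO2**2,                 # HOBr_dot
         2*k2*BrO3*HBrO2 - k5*MA*Mox,                              # Mox_dot
        -k5*MA*Mox,                                                # MA_dot
    ]

k = [1.28, 33.6, 2.4e6, 3e3, 1]
c0 = [0.063, 0, 0, 0, 0.002, 0.275]   # BrO3, Br, HBrO2, HOBr, Mox, MA
print(oregonator_rates(c0, k))
```

Integrating this stiff system would still require an implicit solver such as Matlab's ode15s; the sketch only checks the rate expressions themselves.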
Figure 3-36. The BZ reaction as represented by the Oregonator model. The species Br-, HBrO2 and Mox display regular oscillations while the species BrO3-, HOBr and MA change their concentrations slowly and more steadily.

One important note for this system: we had to increase the default accuracy of the integration (RelTol and AbsTol) and also use the stiff solver ode15s. We leave it to the reader to experiment with the Runge-Kutta solver ode45 or the default accuracy.

Chaos, the Lorenz Attractor

Our chemical experience suggests that differential equations are something stable, and by that we mean that, if there is a small change in one of the conditions, either initial concentrations or rate constants, we expect small changes in the outcomes as well. The classical example of a stable system is our solar system of planets orbiting the sun. Their trajectories are defined by their masses and initial locations and velocities, all of which are the initial parameters of a relatively simple system of differential equations. As we all know, the system is very stable and we can predict the trajectories with incredible precision, e.g. the eclipses and even the returns of comets. For a long time, humanity believed that the whole universe behaves in a similarly predictable way, of course much more complex but still essentially predictable. Descartes was the first to formally propose such a point of view.
In the 1960's the meteorologist Edward Lorenz worked on systems of differential equations describing weather patterns, and found something utterly different. The smallest modification in the initial conditions can have a dramatic effect, resulting in a completely different outcome after a certain time. Such behaviour is called chaotic. The sets of differential equations initially were rather complex but later he developed a simpler set which shows the same effect.

$$\begin{aligned}
\dot{A} &= k_1(B - A) \\
\dot{B} &= k_2 A - B - AC \\
\dot{C} &= AB - k_3 C
\end{aligned} \qquad (3.96)$$
MatlabFile 3-30. ode_Lorenz.m
function c_dot=ode_Lorenz(t,c,flag,k)
% A_dot = k1(B-A)
% B_dot = k2A-B-AC
% C_dot = AB-k3C
c_dot(1,1)= k(1)*(c(2)-c(1));           % A_dot
c_dot(2,1)= k(2)*c(1)-c(2)-c(1)*c(3);   % B_dot
c_dot(3,1)= c(1)*c(2)-k(3)*c(3);        % C_dot
Naturally, A, B and C as well as the constants ki have a completely different meaning than the ones we are used to from chemical kinetics; they are not species with a certain concentration. Chaotic behaviour is restricted to certain ranges of initial values and parameters. It is up to the reader to play with these options. The short program Lorenz.m calculates the 'concentrations' for A, B and C for the initial conditions c0=[1;1;20]. Figure 3-37 displays the trajectories in a fashion that is not common in chemical kinetics. It is a plot of the time evolution of the values of A vs. B vs. C (see also Figure 3-35). Most readers will recognise the characteristic butterfly shape of the trajectory. The important aspect is that, in contrast to Figure 3-35, the trajectory is different each time. This time, it is not the effect of numerical errors but an essential aspect of the outcome. Even if the starting values for A, B and C are away from the 'butterfly', the trajectory moves quickly into it; it is attracted by it and thus the name, Lorenz attractor.

MatlabFile 3-31. Lorenz.m
% Lorenz
c0=[1;1;20];                            % initial conc of A, B, C
k=[10;30;3];                            % parameters k1, k2, k3
[t,C]=ode45('ode_Lorenz',30,c0,[],k);   % ode-solver
figure(1);
plot3(C(:,1),C(:,2),C(:,3));
grid;
xlabel('A');ylabel('B');zlabel('C');
Figure 3-37. The trajectory for the Lorenz attractor.

And now the chaotic aspect. Let's start with very similar initial conditions of c0=[1;1;20.00001], and store the result in the matrix C1. A plot of A only for the two calculations as a function of time is most revealing.

MatlabFile 3-32. Lorenz.m …continued
% Lorenz ...continued
c0=[1,1,20.00001];
[t1,C1]=ode45('ode_lorenz',30,c0,[],k);   % ode-solver
figure(2);
plot(t,C(:,1),t1,C1(:,1));
xlabel('time');ylabel('A');
axis([10 20 -20 20])
Figure 3-38. Two trajectories with very slightly different initial conditions. They are indistinguishable for a relatively long time and then suddenly move apart.
For the first 14 time units the two traces are virtually indistinguishable and then, rather suddenly, they move apart. Each trajectory still stays within the original 'butterfly' but follows a completely different path.
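This sensitivity can be checked in a few lines of Python. The fixed-step RK4 integrator below is a sketch (the step size and the function names are our own choices); it integrates the Lorenz equations twice with the book's parameters and the two nearly identical starting points, and tracks the separation of the A values:

```python
def lorenz(c, k):
    """Lorenz equations (3.96): A_dot, B_dot, C_dot."""
    A, B, C = c
    return [k[0]*(B - A),        # A_dot = k1(B - A)
            k[1]*A - B - A*C,    # B_dot = k2*A - B - A*C
            A*B - k[2]*C]        # C_dot = A*B - k3*C

def rk4_path(c0, k, t_end, dt):
    """Fixed-step 4th-order Runge-Kutta; returns the list of states."""
    c, path = list(c0), [list(c0)]
    for _ in range(int(round(t_end / dt))):
        k1v = lorenz(c, k)
        k2v = lorenz([c[i] + 0.5*dt*k1v[i] for i in range(3)], k)
        k3v = lorenz([c[i] + 0.5*dt*k2v[i] for i in range(3)], k)
        k4v = lorenz([c[i] + dt*k3v[i] for i in range(3)], k)
        c = [c[i] + dt/6.0*(k1v[i] + 2*k2v[i] + 2*k3v[i] + k4v[i])
             for i in range(3)]
        path.append(list(c))
    return path

k, dt = [10.0, 30.0, 3.0], 0.001
p1 = rk4_path([1.0, 1.0, 20.0],     k, 30, dt)
p2 = rk4_path([1.0, 1.0, 20.00001], k, 30, dt)
sep = [abs(a[0] - b[0]) for a, b in zip(p1, p2)]
# tiny at first, of the order of the attractor size later on
print(max(sep))
```

The separation stays tiny for a while and eventually grows to the size of the attractor itself, exactly the behaviour seen in Figure 3-38.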
4 Model-Based Analyses

Very rarely are measurements themselves of much use or of great interest. The statement "the absorption of the solution increased from 0.6 to 0.9 in ten minutes" is of much less use than the statement "the reaction has a half-life of 900 sec". The goal of the model-based analysis methods presented in this chapter is to facilitate the above 'translation' from original data to useful chemical information. The result of a model-based analysis is a set of values for the parameters that quantitatively describe the measurement, ideally within the limits of experimental noise. The most important prerequisite is the model, the physical-chemical, or other, description of the process under investigation.

An example helps clarify the statement. The measurement is a series of absorption spectra of a reaction solution; the spectra are recorded as a function of time. The model is a second order reaction A+B→C. The parameter of interest is the rate constant of the reaction.

The purpose of this chapter is to develop a collection of methods that allow the determination of the 'best' set of parameters for a particular given model and one or a collection of measurements. In other words, we fit the parameter(s) to the measurement(s). It cannot be over-stressed that the task of finding the 'best' model for the measurement is a much more difficult undertaking. A crucial difference between finding the optimal parameters for a given model and finding the optimal model lies in the fact that the parameters of a model form a continuous space, while models are discrete entities. Model-based parameter fitting relies on the continuous relationship between the quality of the fit and the parameters. There are no equivalent continuous transitions from one trial model to the next and thus all the powerful fitting algorithms are useless. A lot of chemical intuition, experience, knowledge etc. is involved in the process of establishing the correct model.
It is not the goal of this chapter to offer much help on this subject. The usual procedure is to choose a selection of reasonable models and fit them all, and subsequently make a decision on the 'best' or 'correct' one by analysing the individual results of these analyses. Some data fitting algorithms provide statistical information that allows an estimation of the quality of the fit and thus of the suitability of the model.

The tools we created in Chapter 3, Physical/Chemical Models, form the core of the fitting algorithms of this chapter. The model defines a mathematical function, either explicitly (e.g. first order kinetics) or implicitly (e.g. complex equilibria), which in turn is quantitatively described by one or several parameters. In many instances the function is based on such a physical model, e.g. the law of mass action. In other instances an empirical function is chosen because it is convenient (e.g. polynomials of any degree) or because it is a reasonable approximation (e.g. Gaussian functions and their linear combinations are used to represent spectral peaks).
A crucial point, not mentioned so far, is the question about the meaning of the expression 'best' parameters. Intuitively it seems to be clear; they are the parameters for which the calculated data match the measured data as closely as possible. Almost invariably the sum of the squares of the differences between the measured data and the calculated model function is minimised and is the measure for the quality of the fit.
4.1 Background to Least-Squares Methods

There are several reasons why the sum of squares, i.e. the sum of squared differences between the measured and modelled data, is used to define the quality of a fit and thus is minimised as a function of the parameters. It is instructive to consider alternatives to the sum of squares.

(a) Minimal sum of differences - not an option, as positive and negative differences cancel each other out. Huge deviations in both directions can result in zero sums.
(b) This suggests minimising the sum of the absolute values of the differences.
(c) Another possibility is to take the sum of the differences to the power of 4, 6, …. The higher the power, the more weight is applied to the relatively large differences.
(d) The ultimate is to minimise the maximal difference, which is identical to taking a very high power of the differences prior to summation.

It is beyond the scope of this book to discuss the statistical theories behind the very common least-squares fitting. We refer the reader to the list of reference books in Further Reading for more details. Here we give a glimpse into the reasoning. Given a set of measurements, it is obvious that parameters producing function values that are very different from the data are less likely correct than parameters producing function values very similar to the measurement. The statistical argument goes the other way. Given a set of parameters, what is the probability that the particular measurement could have occurred? Assuming normally distributed errors in the measurements, one can start the mathematical formalism and come to the expected end: those parameters that result in the minimal sum of squares are the ones most likely to produce the actual measurement. Least-squares fitting delivers a maximum likelihood estimation of the parameters. We need to stress again: this is the case only if the measurement errors are independent, normally distributed and of constant standard deviation.
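The alternatives (a)-(d) are easy to compare on a toy set of residuals. A small Python sketch (the residual values are invented purely for illustration):

```python
# Four candidate measures of misfit, applied to the same set of residuals.
residuals = [1.0, -1.0, 0.5, -0.5, 2.0, -2.0]

sum_r   = sum(residuals)                  # (a) signed sum: cancels to zero
sum_abs = sum(abs(r) for r in residuals)  # (b) sum of absolute values
sum_sq  = sum(r**2 for r in residuals)    # least squares
sum_p4  = sum(r**4 for r in residuals)    # (c) higher power: large r dominates
max_abs = max(abs(r) for r in residuals)  # (d) minimax criterion

print(sum_r, sum_abs, sum_sq, sum_p4, max_abs)
# -> 0.0 7.0 10.5 34.125 2.0
```

The signed sum is exactly zero despite the sizeable deviations, illustrating why alternative (a) is unusable, while the 4th-power sum is dominated by the two largest residuals.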
Often these rather stringent statistical requirements are not met. However, ignoring this fact and applying the method of least-squares fitting anyway, usually does not result in a disaster. All alternatives to the least-squares measure are computationally more demanding. Note that non-uniformly distributed noise is not a problem and can easily be incorporated into the fit. Weighting according to known standard deviations of the noise results in χ2 fitting, see Non-White Noise, χ2-Fitting (p.189).
4.1.1 The Residuals and the Sum of Squares

The measured data are approximated by an appropriate function. For each measured data pair (xi, yi) there is a calculated value ycalc,i for that particular xi. The value ycalc,i is computed as a function of the parameters and xi. The difference between the measurement yi and its calculated value ycalc,i is defined as the residual ri. This is represented in Figure 4-1.
Figure 4-1. The residual ri is the difference between the measured yi and the calculated ycalc,i.

$$r_i = y_i - y_{calc,i} = y_i - f(x_i, parameters) \qquad (4.1)$$
The residuals are a function of the parameters. Note that they are also a function of the model and the data, but we take these as given and ignore this for the time being. The sum of squares, ssq, is the sum of all the squares of the individual residuals and thus is also a function of the parameters:

$$ssq = \sum_i r_i^2 = \sum_i (y_i - y_{calc,i})^2 = f(parameters) \qquad (4.2)$$
It is instructive to represent the situation for two typical examples.

Linear Example: Straight Line

For a straight line, the function for ycalc describing a vector of measurements is:

$$y_{calc,i} = intercept + slope \cdot x_i \qquad (4.3)$$
Thus ssq is a function of the parameters intercept and slope and therefore can be displayed in a 3-dimensional plot, see Figure 4-3.
First some noisy data are generated; they are scattered around a straight line:

MatlabFile 4-1. Data_mxb.m
function [x,y]=Data_mxb
x=(1:10)';
y=20+6*x;
randn('seed',2);          % initialise random number generator
y=y+5*randn(size(y));     % adding normally distributed noise

MatlabFile 4-2. Main_mxb.m
% Main_mxb
[x,y]=Data_mxb;
plot(x,y,'+',[1 10],[26 80])
axis([0 11 0 100])
xlabel('x');ylabel('y');
Figure 4-2. Noisy data scattered around the underlying straight line.

We can calculate ssq for a range of values for slope and intercept and plot the result in a mesh-plot, see Figure 4-3.

MatlabFile 4-3. Main_mxb.m …continued
% Main_mxb ...continued
intercepts=-20:5:60;
slopes=0:12;
for i=1:length(intercepts);
  for j=1:length(slopes);
    SSQ(i,j)=sum((y-(intercepts(i)+slopes(j).*x)).^2);
  end
end
mesh(slopes,intercepts,SSQ+5e4);
colormap([0 0 0]);
hold on;
contour(slopes,intercepts,SSQ,50);
xlabel('slopes');
ylabel('intercepts');
zlabel('ssq+5e4');
hold off;
Figure 4-3. ssq vs. slope and intercept. The minimum of ssq is near the true values slope=6 and intercept=20 that were used to generate the data (see Data_mxb.m). ssq is continuously increasing for parameters moving away from their optimal values. Analysing that behaviour more closely, we can observe that the valley is parabolic in all directions. In other words, any vertical plane cutting through the surface results in a parabola. In particular, this is also the case for vertical planes parallel to the axes, i.e. ssq versus only one parameter is also a parabola. This is a property of so-called linear parameters. Non-Linear Example: Exponential Decay In order to further explore the properties of the landscape formed by the sum of squares as a function of parameters, we concentrate on a slightly more
complex function. As an example, we use the exponential decay of the intensity of the radiation of a sample of a radioisotope.

$$I = I_0\, e^{-kt} \qquad (4.4)$$
The two parameters defining this function are the rate constant k and the initial intensity I0. First we create and plot a noisy measurement:

MatlabFile 4-4. Data_Decay.m
function [t,y]=Data_Decay
t=[0:50]';
k=0.05;
I_0=100;
randn('seed',0);
y=I_0*exp(-k*t);
y=y+10*randn(size(y));

MatlabFile 4-5. Main_Decay_2d.m
% Main_Decay_2d
[t,y]=Data_Decay;
plot(t,y,'x');
xlabel('time');
ylabel('intensity');
Figure 4-4. Exponential decay of radiation intensity. And now we repeat what we have done earlier with the straight line fit, i.e. calculating and plotting the sum of squares, ssq, as a function of a range of parameters.
MatlabFile 4-6. Main_Decay_ssq.m
% Main_Decay_ssq
[t,y]=Data_Decay;
I_0=0:10:200;
k=0:.01:.2;
for i=1:length(I_0);
  for j=1:length(k);
    SSQ(i,j)=sum((y-(I_0(i)*exp(-k(j).*t))).^2);
  end
end
mesh(k,I_0,SSQ+1e6);
colormap([0 0 0]);
hold on;
contour(k,I_0,SSQ,100);
xlabel('k');
ylabel('I_0');
zlabel('ssq+1e6');
hold off;
view(50,20);
Figure 4-5. ssq vs. initial intensity I0 and k. Comparing Figure 4-5 with the corresponding plot from the straight line fit in Figure 4-3, an important difference is that the landscape is no longer parabolic. There is a flat region and a very steep increase at the back corner. Nevertheless, the contour lines clearly indicate that there is a minimum near the correct position.
More careful examination of this shape reveals two important facts. (a) Plots of ssq as a function of k at fixed I0 are not parabolas, while plots of ssq vs. I0 at fixed k are parabolas. This indicates that I0 is a linear parameter and k is not. (b) Close to the minimum, the landscape becomes almost parabolic, see Figure 4-6. We will see later in Chapter 4.3, Non-Linear Regression, that the fitting of non-linear parameters involves linearisation. The almost parabolic landscape close to the minimum indicates that the linearisation is a good approximation.
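The claim that I0 is a linear parameter can be made concrete: at any fixed k, setting ∂ssq/∂I0 = 0 yields an explicit solution, $I_0 = \sum_i y_i e^{-k t_i} / \sum_i e^{-2k t_i}$. A minimal Python sketch (the helper name best_I0 is ours; for clarity the data are noise-free, using the values from Data_Decay.m):

```python
import math

def best_I0(t, y, k):
    """Optimal I0 at fixed k; the linear parameter has an explicit solution
    obtained by setting the derivative of ssq with respect to I0 to zero."""
    num = sum(yi * math.exp(-k * ti) for ti, yi in zip(t, y))
    den = sum(math.exp(-2 * k * ti) for ti in t)
    return num / den

# noise-free decay with I0 = 100, k = 0.05 (the values used in Data_Decay.m)
t = list(range(51))
y = [100 * math.exp(-0.05 * ti) for ti in t]
print(best_I0(t, y, 0.05))   # recovers I0 = 100
```

No such closed form exists for k, which is why the non-linear parameter must be found iteratively.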
Figure 4-6. Close-up of ssq vs. I0 and k near the minimum. The surface is approximately parabolic.

The exact location of the minimum cannot be computed explicitly for non-linear parameters. Starting from a set of initial guesses for the parameters, the location of the minimum has to be approached iteratively. Some methods are robust, i.e. they converge reliably, even if started far from the minimum. These methods, however, tend to be slow in localising the exact position of the minimum (e.g. in Chapter 4.4.2, The Simplex Algorithm). Several alternative algorithms require the computation of the first and sometimes the second derivatives, either of the sum of squares or of the residuals with respect to the parameters (e.g. The Newton-Gauss Algorithm, see Chapters 4.3.1 and 4.4.1). If the initial guesses for the parameters are poor, convergence is often not reliable; generally, these methods are not as robust as the simplex algorithm when started far from the minimum. However, if the initial guesses for the parameters are reasonable, the progress of the iterative process is very fast.
4.2 Linear Regression

This chapter on linear regression is central to the whole book; in fact, linear regression is central to almost all numerical computations. This might sound surprising, as linear regression is significantly simpler than non-linear regression or many other algorithms. The justification for the statement lies in the fact that most non-linear problems are linearised and solved iteratively, and each iterative linearisation step is solved by a linear regression calculation. Generally, the linearisation is based on a Taylor series expansion, truncated after the second term. We have already encountered the Taylor expansion in the Chapter The Newton-Raphson Algorithm (p.48). We meet it again in Chapter 4.3.1, The Newton-Gauss-Levenberg/Marquardt Algorithm. We can conclude that linear regression calculations are very, very common. They are continuously performed deep inside the non-linear problem solving routines. As it turns out, linear regression is, with a few exceptions, the most complex computation undertaken in any program. For this reason we specifically discuss numerical problems that may occur in certain situations. Matlab recognises the importance of linear regression calculations and introduced a very elegant and useful notation: the forward slash / and backslash \ operators, see p.117-118.

Note that the term 'Linear Regression' is somewhat misleading. It is not restricted to the task of just fitting a straight line to some data. While this task is an example of linear regression, the expression covers much more. However, to start with, we return to the task of the straight line fit.
4.2.1 Straight Line Fit - Classical Derivation

It makes sense to start with the well known task of finding the best straight line through a set of (x,y)-data pairs. We can refer back to Figure 4-3 which displays the sum of squares, ssq, as a function of the two parameters defining a straight line, the slope and the intercept. The task is to find the position of the minimum, the values for slope and intercept that result in the least sum of squares. Earlier, we promised an explicit solution for the determination of linear parameters. We first change the original notation introduced in equation (4.3):

$$y_{calc,i} = intercept + slope \cdot x_i = a_1 + a_2 x_i \qquad (4.5)$$
It is more efficient to use a1 and a2 as parameters rather than intercept and slope. More importantly, in 4.2.3, Generalised Matrix Notation we will be able to extend the vector containing the a-values to any higher dimension.
The sum of squares, ssq, can be written as

$$ssq = \sum_{i=1}^{m} r_i^2 = \sum_{i=1}^{m} (y_i - y_{calc,i})^2 = \sum_{i=1}^{m} (y_i - (a_1 + a_2 x_i))^2 \qquad (4.6)$$
where m denotes the number of elements in the data vector y. At the minimum, the derivatives of ssq with respect to a1 and to a2 are both zero.

$$\frac{\partial ssq}{\partial a_1} = \frac{\partial ssq}{\partial a_2} = 0 \qquad (4.7)$$
It is a matter of substituting (4.6) into (4.7) and a bit of straight algebra to arrive at:

$$\frac{\partial ssq}{\partial a_1} = \frac{\partial \sum_{i=1}^{m}(y_i - (a_1 + a_2 x_i))^2}{\partial a_1} = \sum_{i=1}^{m} -2(y_i - a_1 - a_2 x_i) = -2\sum_{i=1}^{m} y_i + 2m a_1 + 2a_2\sum_{i=1}^{m} x_i = 0 \qquad (4.8)$$

and

$$\frac{\partial ssq}{\partial a_2} = \frac{\partial \sum_{i=1}^{m}(y_i - (a_1 + a_2 x_i))^2}{\partial a_2} = \sum_{i=1}^{m} -2x_i(y_i - a_1 - a_2 x_i) = -2\sum_{i=1}^{m} x_i y_i + 2a_1\sum_{i=1}^{m} x_i + 2a_2\sum_{i=1}^{m} x_i^2 = 0$$
This represents a system of 2 equations with 2 unknowns, a1 and a2. After division by 2 and introducing a short notation for the sums (e.g. $\sum_{i=1}^{m} x_i y_i = \Sigma xy$), we can write:

$$\begin{aligned}
a_1 m + a_2 \Sigma x &= \Sigma y \\
a_1 \Sigma x + a_2 \Sigma x^2 &= \Sigma xy
\end{aligned} \qquad (4.9)$$

and this in turn is written as a matrix equation (see Chapter 2.2)

$$\begin{bmatrix} m & \Sigma x \\ \Sigma x & \Sigma x^2 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \Sigma y \\ \Sigma xy \end{bmatrix} \qquad (4.10)$$

with the solution
$$\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} m & \Sigma x \\ \Sigma x & \Sigma x^2 \end{bmatrix}^{-1} \begin{bmatrix} \Sigma y \\ \Sigma xy \end{bmatrix} \qquad (4.11)$$

The inverse of the 2-by-2 matrix can be calculated as

$$\begin{bmatrix} m & \Sigma x \\ \Sigma x & \Sigma x^2 \end{bmatrix}^{-1} = \frac{1}{m\Sigma x^2 - (\Sigma x)^2} \begin{bmatrix} \Sigma x^2 & -\Sigma x \\ -\Sigma x & m \end{bmatrix} \qquad (4.12)$$

Inserting (4.12) into the matrix product of (4.11) results in a set of two explicit equations for the two parameters:

$$a_1 = \frac{\Sigma x^2\, \Sigma y - \Sigma x\, \Sigma xy}{m\Sigma x^2 - (\Sigma x)^2} \qquad a_2 = \frac{-\Sigma x\, \Sigma y + m\, \Sigma xy}{m\Sigma x^2 - (\Sigma x)^2} \qquad (4.13)$$
Rather than writing a short program in Matlab for this result, we demonstrate how to perform the task of a straight line fit in Excel. Excel actually provides several ways of performing the job of fitting the best line through a set of data pairs. The most convenient is probably the Add Trendline … tool which delivers the result in a few clicks. Right-click on one of the points of the data series and a context menu appears as shown in Figure 4-7. ExcelSheet 4-1. Chapter3.xls-trendline
Figure 4-7. Using Trendline for a straight line fit. Select Add Trendline … to get the graphical input selection menu for the trendline as shown in Figure 4-8.
Figure 4-8. The Add Trendline menu.

Keep the default Linear Regression under the Type tab and on the Options tab select Display equation on chart. This results in the fitted equation y = 1.2667x + 1.6667 displayed on the chart.
Figure 4-9. Fitted trendline with equation. One difficulty with the Trendline is that the equation only appears graphically. The values for slope and intercept have to be copied manually into the spreadsheet if they are to be used in later calculations. In Chapter 4.2.6, Excel Linest, we discuss the LINEST function of Excel which is much more versatile while still covering the best line fit. LINEST delivers the results into the spreadsheet where they can be used for further
calculations. Additionally, LINEST supplies a statistical analysis.
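Equation (4.13) also translates directly into a few lines of code. As a cross-check outside Excel, a minimal Python sketch (the helper name line_fit is our own; for clarity we reuse the exact values of Data_mxb.m without the added noise):

```python
def line_fit(x, y):
    """Intercept a1 and slope a2 from the explicit solution (4.13)."""
    m = len(x)
    Sx  = sum(x)
    Sy  = sum(y)
    Sxx = sum(xi * xi for xi in x)
    Sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = m * Sxx - Sx**2          # m*Sum(x^2) - (Sum x)^2
    a1 = (Sxx * Sy - Sx * Sxy) / det   # intercept
    a2 = (m * Sxy - Sx * Sy) / det     # slope
    return a1, a2

# exact data on the line y = 20 + 6x (the values used in Data_mxb.m)
x = list(range(1, 11))
y = [20 + 6 * xi for xi in x]
print(line_fit(x, y))   # recovers intercept 20 and slope 6
```

On noise-free data the explicit solution recovers the generating parameters exactly; on the noisy data of Data_mxb.m it would return the least-squares estimates instead.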
4.2.2 Matrix Notation

A useful first step towards the fitting of more complex linear functions is to translate the equations into a matrix oriented notation. Equation (4.5) is actually a system of m equations, where m is the number of (x,y)-data pairs.

$$\begin{aligned}
y_{calc,1} &= a_1 + a_2 x_1 \\
y_{calc,2} &= a_1 + a_2 x_2 \\
&\;\;\vdots \\
y_{calc,i} &= a_1 + a_2 x_i \\
&\;\;\vdots \\
y_{calc,m} &= a_1 + a_2 x_m
\end{aligned} \qquad (4.14)$$

This system of m equations can be written as one matrix equation.

$$\begin{bmatrix} y_{calc,1} \\ y_{calc,2} \\ \vdots \\ y_{calc,i} \\ \vdots \\ y_{calc,m} \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_i \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \qquad (4.15)$$
or

$$\mathbf{y}_{calc} = \mathbf{F}(x)\,\mathbf{a} \qquad (4.16)$$
ycalc is a column vector containing the m individual elements ycalci, F(x), or shorter just F, is an m by 2 matrix; the first column is formed by ones and the second column is composed of the elements xi. The vector a contains the parameters a1 and a2.
Similarly, the vector of residuals r, as introduced in equation (4.1), can be defined in a matrix equation:

$$\mathbf{y} = \mathbf{y}_{calc} + \mathbf{r}$$
$$\mathbf{r} = \mathbf{y} - \mathbf{y}_{calc} = \mathbf{y} - \mathbf{F}\,\mathbf{a} \qquad (4.17)$$
And the sum of squares can be written as

$$ssq = \sum_{i=1}^{m} r_i^2 = \mathbf{r}^t\mathbf{r} \qquad (4.18)$$
The task is to find that vector a for which ssq is minimal. Now we are in a better position to generalise to more complex linear functions. The prototype of linear least-squares fitting is the fitting of a
polynomial of a higher degree. Remember, a straight line is a polynomial of degree one and hence is only a special case.
4.2.3 Generalised Matrix Notation

Equations (4.15) or (4.16) represent the fit of a straight line to a set of data pairs. These equations can be written in an expanded form:

$$\mathbf{y}_{calc} = \mathbf{F}\,\mathbf{a} = \mathbf{f}_{:,1}\, a_1 + \mathbf{f}_{:,2}\, a_2 \qquad (4.19)$$
Recall the colon (:) notation as introduced in Chapter 2.1, Matrices, Vectors, Scalars. The first column of F, f:,1, contains m ones, while the second column, f:,2, contains the values x1,…,xm. The j-th column of F is multiplied by its corresponding j-th element of the parameter vector a and the products are summed. Equation (4.19) and its predecessors describe the special case for a polynomial of degree one. It is straightforward to generalise by adding any number of terms or columns in F and elements in a.
$$\mathbf{y}_{calc} = \mathbf{F}\,\mathbf{a} = \mathbf{f}_{:,1}\, a_1 + \mathbf{f}_{:,2}\, a_2 + \dots + \mathbf{f}_{:,j}\, a_j + \dots + \mathbf{f}_{:,np}\, a_{np} = \sum_{j=1}^{np} \mathbf{f}_{:,j}\, a_j \qquad (4.20)$$
The prototype application is the fitting of the np linear parameters a1,…,anp defining a higher order polynomial of degree np-1. The generalisation of equation (4.5) reads as:

$$y_{calc,i} = a_1 + a_2 x_i + a_3 x_i^2 + \dots + a_j x_i^{j-1} + \dots + a_{np} x_i^{np-1} = \sum_{j=1}^{np} a_j x_i^{j-1} \qquad (4.21)$$
In matrix notation, the equivalent of equation (4.15) can be written in the following way:

$$\mathbf{y}_{calc} = \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{j-1} & \cdots & x_1^{np-1} \\
1 & x_2 & x_2^2 & \cdots & x_2^{j-1} & \cdots & x_2^{np-1} \\
\vdots & & & & & & \vdots \\
1 & x_i & x_i^2 & \cdots & x_i^{j-1} & \cdots & x_i^{np-1} \\
\vdots & & & & & & \vdots \\
1 & x_m & x_m^2 & \cdots & x_m^{j-1} & \cdots & x_m^{np-1}
\end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_j \\ \vdots \\ a_{np} \end{bmatrix} \qquad (4.22)$$
where the j-th column f:,j contains the (j-1)-th power of the elements of the vector x. It is most important to realise that the columns of F can comprise any function of x, not just the different powers. Examples include sin(x), exp(-3x), 1./x, tan(ln(x+1)), … there is no end to the possibilities. The matrix F is often called the design matrix. We repeat, the task of linear regression is to determine those values of the vector a for which the product vector ycalc=Fa is as close as possible to the actual measurements y. Closeness of course is defined by the sum of the squared differences between y and ycalc. There are several ways to derive the equations for the computation of the optimal vector a. One option would be to generalise the procedure we used for the straight line fit in equations (4.5) - (4.13), which would be rather cumbersome. In the following we use a different approach.
4.2.4 The Normal Equations

We already stated that this chapter on linear regression is central to the whole book and to numerical methods in general. Thus, it is worthwhile putting extra effort into trying to fully understand the process, its limits, its dangers, etc. It is possible to represent the principles geometrically. However, due to the restriction of the human mind to comprehend three dimensions only, these geometrical representations necessarily are restricted as well. We can only deal with two columns in the design matrix F.

Figure 4-10. The best residual vector r is orthogonal to the plane spanned by the vectors f:,1 and f:,2.
The two vectors f:,1 and f:,2 span a plane that is represented by the grey rectangle. Note that the two vectors are not orthogonal; they do not form a normal system of axes. Any point on this plane can be written as a linear combination of the two base vectors f:,1 and f:,2 or as Fa. Thus, any point on that plane is defined by a pair of numbers and this pair forms the vector a. The pair could be called coordinates but this might be misleading, as coordinates usually are based on an orthogonal system of axes. In this graph, the linear regression problem can be understood as finding the point on the plane that is closest to the measurement vector y. The vector of measured data y usually does not lie in the plane spanned by the columns of F. It would lie there if there were no measurement errors or no noise. Be reminded: the column vectors in F, y and r are m-dimensional vectors, thus there is a high dimensional space in which y can be outside the plane F.
116
Chapter 4
Unfortunately, it is not possible for us to 'see' the grey plane embedded in an m-dimensional space; 3 dimensions have to suffice. Some people have a good 3-dimensional imagination. They immediately 'see' that the closest point on the plane is just vertically underneath the tip of the vector y. Using more appropriate expressions: the minimal residual vector r, which is the shortest difference between y and Fa, is orthogonal, or normal, to the plane defined by F. Now the expression 'Normal Equations' starts to make sense. The residual vector r is normal to the grey plane and thus normal to both vectors f:,1 and f:,2. As outlined earlier, in Chapter Orthogonal and Orthonormal Matrices (p.25), for orthogonal (normal) vectors the scalar product is zero. Thus, the scalar product between each column of F and the vector r is zero. The system of equations corresponding to this statement is:

f:,1t r = 0
f:,2t r = 0    (4.23)
This set of equations can be further simplified and written as one matrix equation:
Ft r = 0
(4.24)
where 0 is now a column vector with two 0's. All that is needed now are a few matrix-algebraic manipulations to arrive at the equation for the calculation of the best a.

Ft r = Ft (y − Fa) = 0    (4.25)

thus

Ft y = FtF a

resulting in

a = (FtF)-1 Ft y    (4.26)
This last equation is crucial and we will spend considerable time investigating it further. Equations (4.11) and (4.26) have to be identical, as a simple verification based on the rules for matrix multiplication shows:
117
Model-Based Analyses
$$F^tF=\begin{bmatrix}1&1&\cdots&1\\x_1&x_2&\cdots&x_m\end{bmatrix}\begin{bmatrix}1&x_1\\1&x_2\\\vdots&\vdots\\1&x_m\end{bmatrix}=\begin{bmatrix}m&\Sigma x\\\Sigma x&\Sigma x^2\end{bmatrix}$$

$$F^ty=\begin{bmatrix}1&1&\cdots&1\\x_1&x_2&\cdots&x_m\end{bmatrix}\begin{bmatrix}y_1\\y_2\\\vdots\\y_m\end{bmatrix}=\begin{bmatrix}\Sigma y\\\Sigma xy\end{bmatrix}\qquad(4.27)$$

thus

$$a=\begin{bmatrix}a_1\\a_2\end{bmatrix}=\begin{bmatrix}m&\Sigma x\\\Sigma x&\Sigma x^2\end{bmatrix}^{-1}\begin{bmatrix}\Sigma y\\\Sigma xy\end{bmatrix}$$
The important point is that equation (4.26) is very general and does not require any calculations of derivatives of ssq with respect to the parameters, etc.
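As a small illustration (not from the book; the data values are invented), the 2x2 normal equations of the straight-line fit can be coded directly, with the inverse written out explicitly:

```python
# Straight-line fit via the normal equations (4.27):
# a = [m, Sx; Sx, Sxx]^-1 [Sy; Sxy]
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]          # roughly y = 1 + 2x

m = len(x)
Sx = sum(x)
Sxx = sum(xi * xi for xi in x)
Sy = sum(y)
Sxy = sum(xi * yi for xi, yi in zip(x, y))

# explicit inverse of the 2x2 matrix [m, Sx; Sx, Sxx]
det = m * Sxx - Sx * Sx
a1 = (Sxx * Sy - Sx * Sxy) / det       # intercept
a2 = (-Sx * Sy + m * Sxy) / det        # slope
```

For these invented data the fit returns an intercept close to 1 and a slope close to 2, as expected.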
The Pseudo-Inverse
Equation (4.17), y = Fa + r, can be written in a casual way as

y ≈ Fa    (4.28)

where the ≈ represents the least-squares solution. Don't forget that it is not a proper equation, as y ≠ Fa. As the solution to equation (4.28) one might be tempted to write

" a = F-1 y "    (4.29)

which, of course, is mathematically incorrect, as F is not square and thus cannot be inverted. The matrix (FtF)-1Ft in equation (4.26) replaces "F-1" in equation (4.29). It is known as the Pseudo-Inverse of F, for which the notation F+ is often used. Thus we can write

a = F+y    with    F+ = (FtF)-1Ft    (4.30)
Matlab is, of course, aware of the fundamental importance of the pseudoinverse and created its own notation for it. In Matlab we could write a=inv(F'*F)*F'*y but it is numerically much more efficient to use the appropriate Matlab back-slash \ command as in a=F\y. It is to be read from the right to the left as 'y divided by F', implying, of course, the multiplication of the left pseudo-inverse of F with y as given in equation (4.30).
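The same distinction exists outside Matlab. As an illustrative sketch in Python/NumPy (not the book's code, data invented), the back-slash solve corresponds to a dedicated least-squares solver rather than an explicit normal-equation inverse:

```python
import numpy as np

# invented example data: straight-line design matrix
x = np.arange(6, dtype=float)
y = 1.0 + 2.0 * x
F = np.column_stack([np.ones_like(x), x])

# explicit normal equations, a = (FtF)^-1 Ft y, equation (4.26)
a_normal = np.linalg.inv(F.T @ F) @ F.T @ y

# preferred least-squares solver, the analogue of Matlab's a=F\y
a_lstsq, *_ = np.linalg.lstsq(F, y, rcond=None)
```

Both routes return the same parameters here; the solver route avoids forming FtF at all, which matters for the numerical reasons discussed below.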
The equation y≈Fa, written more casually as y=Fa, can be represented by a general scheme:

Figure 4-11. Schematic representation of the matrix equation y=Fa. y and a are column vectors while F is a matrix of the appropriate dimensions.

It is possible to transpose the equation represented in Figure 4-11 and one could write yt=atFt. Renaming yt, at and Ft to y1, a1 and F1 and using other dimensions than in Figure 4-11, the next figure represents a generalised 'transposed' situation.
Figure 4-12. Schematic representation of the matrix equation y1=a1F1.

The least-squares solution for a1 in the above equation is

a1 = y1F1t (F1F1t)-1    or    a1 = y1F1+    (4.31)
In Matlab we could write a=y*F'*(inv(F*F')); but again it is numerically much more efficient to use the appropriate Matlab forward slash / operator, as in a=y/F, which again states 'y divided by F', but now reading from left to right and meaning the multiplication of y with the right pseudo-inverse of F, as stated in equation (4.31). Matrix multiplication is not commutative (see Chapter 2.1.1, Elementary Matrix Operations) and thus the order of the factors is important; in a similar way the order is important with 'division'. There are left and right pseudo-inverses, and the / and \ slashes in Matlab represent them in a very elegant way. Among Matlab beginners, it is a very common mistake to use the wrong slash. In the best
case, an error occurs as the dimensions do not match. In the worst case the error goes unnoticed. It is very helpful to write down schematic matrix equations in order to verify the dimensions and correctness of the corresponding calculations. In Figure 4-13 and Figure 4-14 this was done to visualise equations (4.30) and (4.31) using the matrix/vector dimensions shown in Figure 4-11 and Figure 4-12.
Figure 4-13. Schematic representation of the matrix equations involving multiplication of y by the left pseudo-inverse F+=(FtF)-1Ft.
Figure 4-14. Schematic representation of the matrix equations involving multiplication of y1 by the right pseudo-inverse F1+=F1t(F1F1t)-1.
Linear Dependence, Rank of a Matrix For the computation of the pseudo-inverse, it is crucial that the vectors f:,j are not parallel, or more correctly, that they are linearly independent. Otherwise, the matrix FtF is singular and cannot be inverted. Matlab issues a warning. We can gain a certain level of understanding by adapting Figure 4-10:
Figure 4-15. Linear dependence: the three vectors f:,1, f:,2 and f:,3 lie in one plane.

Assume there are three columns in F that all lie in one plane, as indicated in Figure 4-15. In such a case, there is no unique set of coordinates defining the point on the plane that is nearest to the measurement y. The coordinates can be given by any two of the three vectors. They cannot be calculated uniquely or, in other words, the three parameters of the vector a are not defined. In this case, when trying to perform a=F\y, Matlab would usually respond with a message such as Warning: Matrix is singular to working precision. The number of linearly independent columns (or rows) in a matrix is called the rank of that matrix. The rank can be seen as the dimension of the space that is spanned by the columns (rows). In the example of Figure 4-15, there are three vectors but they only span a 2-dimensional plane and thus the rank is only 2. The rank of a matrix is a very important property and we will study rank analysis and its interpretation in chemical terms in great detail in Chapter 5, Model-Free Analyses.
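A quick numerical sketch of the same situation (Python/NumPy, data invented): three column vectors that lie in one plane give a design matrix of rank 2, and its FtF is singular.

```python
import numpy as np

# three m-dimensional columns; the third is a linear combination
# of the first two, so all three lie in one plane
f1 = np.array([1.0, 2.0, 0.5, 3.0, 1.5])
f2 = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
f3 = 2.0 * f1 - 0.5 * f2               # linearly dependent column
F = np.column_stack([f1, f2, f3])

rank = np.linalg.matrix_rank(F)         # 2, not 3
det = np.linalg.det(F.T @ F)            # ~0: FtF cannot be inverted
```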
Numerical Difficulties
Figure 4-10 can be adapted further to represent another important aspect. In Figure 4-16 we see that the two vectors f:,1 and f:,2 are almost parallel; not exactly parallel, as then the rank would only be one.

Figure 4-16. Near linear dependence if the base vectors are almost parallel.
Because the two base vectors are almost parallel, the plane they lie in is not well defined. Figure 4-16 attempts to represent the problem: the plane can be turned about the two vectors like the pages of a book about the spine. Consequently the projection of y and the residuals r are poorly defined as well. The figure also indicates that the problem is less serious if y is close to the vectors f:,1 and f:,2 than if it is almost orthogonal to them. As linear regression is a very fundamental operation, several methods have been developed in order to improve the numerical stability of the calculation. It is beyond the objective of this book to discuss these issues in any detail. We do feel, however, that the reader has to be aware of the potential problems and should be able to avoid them as much as possible. The Matlab computations invoked by the back-slash \ and forward-slash / operators do not perform the calculation as given in equations (4.26) and (4.31). Here is a short extract from the Matlab HELP:

If F is not square and is full, then Householder reflections are used to compute an orthogonal-triangular factorization. F*P = Q*R where P is a permutation, Q is orthogonal and R is upper triangular (see qr). The least squares solution X for the equation B=AX is computed with X = P*(R\(Q'*B))
Without attempting to fully understand this, the essence is important:

a=F\y    or    a1=y1/F1

are numerically much better than

a=inv(F'*F)*F'*y    or    a1=y1*F1'*inv(F1*F1')
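The idea behind that HELP extract can be sketched in Python/NumPy (not the book's code; column pivoting is omitted for simplicity, and the data are invented): the least-squares problem is solved through a QR factorisation, so the matrix FtF is never formed.

```python
import numpy as np

# invented data: fit a parabola y = 1 - x + 0.5 x^2
x = np.linspace(0.0, 2.0, 9)
y = 1.0 - x + 0.5 * x**2
F = np.column_stack([np.ones_like(x), x, x**2])

# orthogonal-triangular factorisation F = Q R ('economy' size)
Q, R = np.linalg.qr(F)

# least-squares solution a = R \ (Q' y); R is triangular,
# so only a simple back-substitution is needed
a = np.linalg.solve(R, Q.T @ y)
```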
4.2.5 Errors in the Fitted Parameters
As there is a difference between the measurements and the values of the calculated function, we can safely assume that the fitted parameters are not perfect. They are our best estimates for the true parameters and an obvious question is: how reliable are these fitted parameters? Are they tightly or loosely defined? As long as the assumption of random white noise applies, there are formulas that allow the computation of the standard deviation of the fitted parameters. While these answers should always be taken with a grain of salt, they do give an indication of how well defined the parameters are.
We are not going to derive the formulas that allow the calculation of the standard deviations of the parameters; the reader is invited to refer to more specialised texts on statistics. We just give the formulas and ways of calculating the required information. Equation (4.32) gives the standard deviation of the parameter aj:

σaj = σr √dj,j    (4.32)

where σr is the standard deviation of the residuals

σr = √( ssq / (m − np) )    (4.33)
Here m is the number of points in y and np the number of fitted parameters. The difference m-np is the number of degrees of freedom, df. The elements dj,j in equation (4.32) are the diagonal elements of the inverse of the so-called curvature matrix, Curv, that contains the second derivatives of the sum of squares with respect to the parameters. The definition of the element Curvj,k is
Curvj,k = (1/2) ∂²ssq / (∂aj ∂ak)    (4.34)
This looks horrendous but, as will be shown in a moment, it is not; in fact it is 'trivial' to compute. We start with the first derivative:

$$\frac{\partial ssq}{\partial a_j}=\frac{\partial}{\partial a_j}\sum_{i=1}^{m}r_i^2=\sum_{i=1}^{m}\frac{\partial}{\partial a_j}\Big(y_i-\sum_{j=1}^{np}f_{i,j}a_j\Big)^2=-2\sum_{i=1}^{m}\Big(y_i-\sum_{j=1}^{np}f_{i,j}a_j\Big)f_{i,j}=-2\,r^t f_{:,j}\qquad(4.35)$$

and then the second derivative:
$$\frac{\partial^2 ssq}{\partial a_k\,\partial a_j}=\frac{\partial}{\partial a_k}\frac{\partial ssq}{\partial a_j}=\frac{\partial}{\partial a_k}\left(-2\sum_{i=1}^{m}\Big(y_i-\sum_{j=1}^{np}f_{i,j}a_j\Big)f_{i,j}\right)$$

$$=-2\sum_{i=1}^{m}\frac{\partial}{\partial a_k}\left(y_if_{i,j}-\Big(\sum_{j=1}^{np}f_{i,j}a_j\Big)f_{i,j}\right)=2\sum_{i=1}^{m}\frac{\partial}{\partial a_k}\Big(\sum_{j=1}^{np}f_{i,j}a_j\Big)f_{i,j}\qquad(4.36)$$

$$=2\sum_{i=1}^{m}f_{i,k}f_{i,j}=2\,f_{:,k}^t f_{:,j}$$

The complete set of first and second derivatives can be written elegantly in matrix notation:

∂ssq/∂a = −2 rtF    and    ∂²ssq/(∂a∂a) = 2 FtF    (4.37)

The elements dj,j, as required in equation (4.32), are the diagonal elements of the inverse of FtF:

D = Curv−1 = (FtF)−1    (4.38)
It is possible to represent the situation graphically.

Figure 4-17. The parameter ak is well defined, while the parameter aj is much more loosely defined.
If ssq increases sharply with a small movement of the parameter away from the minimum, then the parameter is well defined; otherwise the parameter is only loosely defined. Referring to Figure 4-17, the parameter ak is better defined than the parameter aj. The example below shows a short Matlab program that fits the function y=tan(x) with a polynomial of degree 3, defined by 4 linear parameters, i.e. the elements of a. The statistical analysis is problematic for this example as the residuals are obviously not normally distributed. Nonetheless, the high errors in the parameters and the large standard deviation of the residuals indicate a bad fit.

MatlabFile 4-7. tan_poly.m
% tan_poly
% fitting y=tan(x) with a polynomial of degree 3
x=(0.1:0.1:1.5)';
y=tan(x);
nd=3;                                 % degree of polynomial
np=nd+1;                              % number of parameters
F(:,1)=ones(size(x));                 % design matrix
for j=1:nd
  F(:,j+1)=F(:,j).*x;
end
a=F\y;                                % linear parameters
r=y-F*a;                              % residuals
df=length(x)-np;                      % degrees of freedom
ssq=r'*r;                             % sum of squares
sigma_r=sqrt(ssq/df);                 % sigma_r
D=inv(F'*F);                          % inverted curvature matrix
sigma_a=sigma_r*sqrt(diag(D));        % sigma_parameters
fprintf(1,'sigma_r: %g\n',sigma_r);   % print sigma_r
for i=1:np                            % print sigma_a
  fprintf(1,'a(%i): %g +- %g\n',i,a(i),sigma_a(i));
end
plot(x,y,'.',x,F*a);
xlabel('x');
ylabel('y');

sigma_r: 1.21374
a(1): -2.46752 +- 1.6462
a(2): 20.9388 +- 8.62303
a(3): -37.9646 +- 12.3167
a(4): 20.2432 +- 5.0712
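The same error analysis can be sketched in Python/NumPy (an illustration, not the book's code):

```python
import numpy as np

x = 0.1 * np.arange(1, 16)              # same grid as tan_poly.m
y = np.tan(x)
nd = 3                                  # degree of polynomial
npar = nd + 1                           # number of parameters

F = np.vander(x, npar, increasing=True) # design matrix [1, x, x^2, x^3]
a, *_ = np.linalg.lstsq(F, y, rcond=None)

r = y - F @ a                           # residuals
df = len(x) - npar                      # degrees of freedom
ssq = r @ r                             # sum of squares
sigma_r = np.sqrt(ssq / df)             # equation (4.33)
D = np.linalg.inv(F.T @ F)              # inverted curvature matrix
sigma_a = sigma_r * np.sqrt(np.diag(D)) # equation (4.32)
```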
Figure 4-18. Fitting the function y=tan(x) with a polynomial of degree 3.
4.2.6 Excel Linest
As indicated in Chapter 4.2.1, Straight Line Fit, the Excel function LINEST is more general than TRENDLINE. In addition to allowing the fitting of any linear function, it also delivers a statistical analysis. In order to demonstrate its use, we fit the same polynomial to the same tan-function as in the preceding section, where we used Matlab and tan_poly.m. Columns A and B of the spreadsheet shown in Figure 4-19 contain the (x,y)-data pairs; the array (F4:I18) is the equivalent of the matrix F. LINEST is an array function. The parameters have to be entered as demonstrated in Figure 4-20. The parameter Const has to be set to FALSE at this stage; we explain its meaning in a moment. Stats needs to be set to TRUE for the statistical analysis. As LINEST is an array function, the output area (F21:I25) needs to be selected (the area is shaded) and the command is executed by pressing Ctrl-Shift-Enter (all three keys together). The statistical analysis results are listed as explained by the text underneath the output. Note that, for unknown reasons, Excel reverses the order of the parameters, i.e. the parameter belonging to the first column is listed last. It is encouraging to see that the results of Matlab (see p.124) and Excel are identical. Excel returns a few additional statistical numbers, r2, F and ssreg, in cells F23:F25. They are not vital for our present purpose and we refer to the Excel Help for details.
ExcelSheet 4-2. Chapter3.xls-linest
=LINEST(B4:B18,F4:I18,FALSE,TRUE)
Figure 4-19. Excel spreadsheet demonstrating the use of the LINEST function.
Figure 4-20. The window for the LINEST parameter input.
Excel has an alternative way of doing the same. The column of 1's can be omitted, selecting only the three other columns and fitting with the LINEST function with the Const option set to TRUE. Excel internally adds the column of ones and delivers exactly the same results; that one special parameter is the y-intercept. Similar options exist in the trendline function.
4.2.7 Applications of Linear Least-Squares Fitting
In the following we present several linear least-squares analyses.
Linearisation of Non-Linear Problems
In many instances non-linear functions can be linearised and, in this way, a non-linear, iterative fitting procedure can be reduced to an explicit linear fit. A typical example is the exponential decay of the intensity of the emission of a radioactive sample. We use the data already used for Figure 4-4, produced by the function Data_Decay.m. The equation describing the data is:

y = I0 × e−kt    (4.39)

This equation can be rewritten in logarithmic form:

ln(y) = ln(I0) − kt    (4.40)
Plotting ln(y) versus time t results in a straight line with slope −k and intercept ln(I0). These two parameters can be computed non-iteratively in a linear regression. The program could look like this:

MatlabFile 4-8. Main_Decay.m
% Main_Decay
[t,y]=Data_Decay;
yb=y(y>0);                % we have to get rid of negative values
tb=t(y>0);
F=ones(length(tb),2);
F(:,2)=tb;
a=F\log(yb);
I_0=exp(a(1))
k=-a(2)
plot(tb,log(yb),'o',tb,F*a);
xlabel('time');
ylabel('ln(y)');

I_0 = 100.4029
k = 0.0502
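The linearisation itself is easy to sketch in Python/NumPy (invented, noise-free data, so the parameters are recovered essentially exactly):

```python
import numpy as np

# noise-free decay data, I0 = 100, k = 0.05 (invented for illustration)
t = np.arange(0.0, 51.0)
y = 100.0 * np.exp(-0.05 * t)

# linearisation: ln(y) = ln(I0) - k t is a straight line in t
F = np.column_stack([np.ones_like(t), t])
a, *_ = np.linalg.lstsq(F, np.log(y), rcond=None)

I0 = np.exp(a[0])
k = -a[1]
```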
Figure 4-21. The logarithmic transform of the exponential data set used in Figure 4-4. The fitted exponential curve appears as a straight line.

Several observations have to be made: (a) Negative y-values have to be deleted as their logarithms are not defined. In real emission experiments this is usually not a problem: due to the non-uniform error structure of emission data, emission intensity readings are most likely not negative. We will further investigate this aspect in Non-White Noise, χ2-Fitting (p.189). In other applications, e.g. spectrophotometric measurements, negative readings are to be expected if the absorbance reading is close enough to zero. (b) As is obvious from the figure, the distribution of the noise is not uniform and thus the later part of the measurement carries more weight than the earlier part in defining the parameters. (c) Thus, the fitted values are different from the best values resulting from a non-linear least-squares fit of the original (non-linearised) data. In this example the difference is minimal and not relevant. (d) In this example, the reading of emission intensity is expected to reach zero after enough time. If the infinity reading is not zero, the formula cannot be applied directly. If the measurement is described by an equation of the form
y = I0 × e−kt + const    (4.41)

the logarithm of y minus the constant (which is the value at time infinity) has to be plotted versus time:
ln(y − const) = ln(I0) − kt    (4.42)
This task can be problematic, as the correct value for const is not necessarily accurately known. Subtracting a 'wrong' value obviously results in a flawed analysis, and this is not always easily detected. The calculations below demonstrate the problem. There is a constant offset added to the y-data, which is not subtracted according to equation (4.42). The plot in Figure 4-22 does not show any 'obvious' misfit; however, the calculated parameters are significantly wrong.

MatlabFile 4-9. Data_Decay_Offset.m
function [t,y]=Data_Decay_Offset
t=(0:50)';
k=0.05;
I_0=100;
randn('seed',0);
y=50+I_0*exp(-k*t);
y=y+10*randn(size(y));

MatlabFile 4-10. Main_Decay_Offset.m
% Main_Decay_Offset
[t,y]=Data_Decay_Offset;
yb=y(y>0);
tb=t(y>0);
F=ones(length(tb),2);
F(:,2)=tb;
a=F\log(yb);
I_0=exp(a(1))
k=-a(2)
plot(tb,log(yb),'o',tb,F*a);
xlabel('time');
ylabel('ln(y)');

I_0 = 137.7157
k = 0.0204
It is worthwhile noting that, if a smaller amount of noise is added to the y-data, the subtraction of the wrong constant offset manifests itself in a visible curvature of the logarithmic plot. This in turn could be misinterpreted as non-exponential behaviour.
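The same pitfall can be sketched in Python/NumPy without any noise at all (values invented): ignoring the offset biases k badly, while subtracting it restores the true value.

```python
import numpy as np

t = np.arange(0.0, 51.0)
y = 50.0 + 100.0 * np.exp(-0.05 * t)    # decay with offset, no noise

F = np.column_stack([np.ones_like(t), t])

# wrong: take logarithms without subtracting the offset
a_wrong, *_ = np.linalg.lstsq(F, np.log(y), rcond=None)
k_wrong = -a_wrong[1]                    # strongly biased towards 0

# correct: subtract the infinity reading first, equation (4.42)
a_right, *_ = np.linalg.lstsq(F, np.log(y - 50.0), rcond=None)
k_right = -a_right[1]                    # recovers 0.05
```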
Figure 4-22. Logarithmic plot with incorrect offset, showing no obvious deviation from linearity.

Iterative processes are always time consuming and, if possible, should be avoided. Thus, it seems attractive to apply a linearisation procedure in order to avoid a direct iterative non-linear least-squares fit of the exponential. The linearisation approach might be perfectly acceptable in some instances (see the first example), but as a general recipe it is not recommended (refer to the second example). Another drawback is the non-uniform error distribution introduced by the linearisation of the exponential data; this should in fact be counteracted by appropriate weighting of the residuals, as will be introduced in Non-White Noise, χ2-Fitting (p.189). Further, linearisations are only possible in a few selected cases and therefore are not of great general value. In summary: there are not many convincing reasons to use them.

Polynomials, the Savitzky-Golay Digital Filter
Polynomials do not play an important role in real chemical applications; very few chemical data behave like polynomials. However, as a general data treatment tool, they are invaluable. Polynomials are used for empirical approximations of complex relationships, smoothing, differentiation and interpolation of data. Most of these applications have been introduced into chemistry by Savitzky and Golay and are known as Savitzky-Golay filters. Polynomial fitting is a linear, fast and explicit calculation, which, of course, explains its popularity.
Smoothing of Noisy Data
Measured data are always corrupted by a certain level of noise. For graphical purposes, it is sometimes desirable to display, for example, a spectrum as a smooth line instead of a band of noisy data points. It is important to stress from the very beginning that data smoothing should only be a graphical aid. Data smoothing prior to any parametric or non-parametric analysis hardly ever improves the results: it distorts the original values in ways that are difficult to control, and the distortion of the results of the fitting is virtually impossible to correct. Nothing is won and a lot can be lost. The basic idea of the Savitzky-Golay digital filter is fairly straightforward. The y-value of a particular (x,y)-data pair is replaced by the value of a polynomial of a certain degree, which has been fitted to a number of neighbouring data points. The computation is repeated for all data. In Figure 4-23 the procedure is illustrated graphically. There are 100 data points in the graph. One particular point is marked by ×; it is the one for which we want to compute the smoothed equivalent. The 41 neighbouring data points, marked in black, include 20 points to the right and 20 to the left plus the point × itself. A parabola has been fitted through these 41 points and its graph in the range of the fit is shown in the figure. The point 'o' on the parabola is the smoothed representation of the original '×'.

Figure 4-23. Savitzky-Golay filtering. A polynomial is fitted to a range of data points and the original point (×) is replaced by the value on the polynomial (o).
Two parameters define the Savitzky-Golay filter: the number of points, n, to the right and left of the centre, which are used for the fit, and the degree, nd, of the polynomial to be fitted. It is crucial to choose those two parameters carefully; they have to be appropriate for the curves to be smoothed. Many data points and a low-degree polynomial result in excellent smoothing, but narrow features are broadened. The extreme in this direction is to fit a polynomial of degree 0 to many data points; this is also called moving window averaging. The opposite choice, few data points and a high-order polynomial, results in poor smoothing with much less distortion of narrow features. The following short program creates a series of three noisy Gaussians of decreasing width. The density of data is constant along the x-axis and thus there is a decreasing number of points defining each Gaussian. The function SavGol.m performs a Savitzky-Golay smoothing. The parameters are the x- and y-vectors, the number (n) of neighbouring left or right data points that are used for one polynomial fit (i.e. if n=5, 2n+1=11 data points are fitted) and the degree (nd) of the polynomial to be fitted. Figure 4-24 displays the data and the results of the smoothing. The top panel contains the original true curve as well as the noisy data. The second panel displays the result of the Savitzky-Golay filter using 11 data points (2×5+1) to fit a fourth-order polynomial. All features of the curve are reasonably well preserved, but the smoothing is much less efficient than in the lower part of the figure, where a parabola (polynomial of degree 2) was fitted to 21 (2×10+1) data points. However, in this very smooth curve, the features of the narrow Gaussians are completely lost. The higher the degree of the polynomial and the smaller the number of data points fitted, the better it follows narrow features, but the smoothing effect is diminished. The user has to find the appropriate compromise.

MatlabFile 4-11. Main_SavGol.m
% Main_SavGol
x=(1:150)';
y=gauss(x,50,25)+gauss(x,100,10)+gauss(x,125,2);
randn('state',0)
yn=y+0.1*randn(size(y));
y1=SavGol(x,yn,5,4);
y2=SavGol(x,yn,10,2);
subplot(3,1,1);plot(x,y,x,yn,'.k');
axis([0 150 -0.1 1.1]);
ylabel('y');
subplot(3,1,2);plot(x,y,x,y1,'.k');
axis([0 150 -0.1 1.1]);
ylabel('y');
subplot(3,1,3);plot(x,y,x,y2,'.k');
axis([0 150 -0.1 1.1]);
xlabel('x');
ylabel('y');
Figure 4-24. In all panels the true data are represented by the line. The top panel displays the noisy (•) data; the middle panel shows the result of a 4th degree polynomial fitted through 11 noisy data points (•); and the bottom panel, the result of a 2nd degree smoothing through 21 noisy data points (•).

It is worthwhile discussing the function SavGol.m in some detail. There are some interesting aspects that can illustrate a few issues of numerically reliable programming. It is tempting to write a routine such as SavGol_bad.m to perform the Savitzky-Golay filtering, but we will show its numerical weakness. F is built up from the appropriate range of x-values and used to calculate the polynomial coefficients as a=F\y(i-n:i+n), see e.g. equation (4.30).

MatlabFile 4-12. SavGol_bad.m
function y1=SavGol_bad(x,y,n,nd)
% Savitzky-Golay
% n: number of points to the right or left
% nd: degree of polynomial
y1=zeros(size(x));
for i=1+n:length(x)-n        % the first and last n points are lost
  x1=x(i-n:i+n);
  F(:,1)=ones(size(x1));
  for j=1:nd
    F(:,j+1)=F(:,j).*x1;
  end
  a=F\y(i-n:i+n);
  y1(i)=F(1+n,:)*a;
end
Using y1=SavGol_bad(x,yn,5,4); rather than y1=SavGol(x,yn,5,4); within Main_SavGol.m, the result is a large number of messages of the type

Warning: Rank deficient, rank = 4  tol = 5.2917e-006.
These messages indicate that the rank of the matrix F is not 5, as expected for a fourth-order polynomial, but only 4. Why is this the case? At the beginning of the loop F is

$$F=\begin{bmatrix}1&1&1&1&1\\1&2&2^2&2^3&2^4\\\vdots&&&&\vdots\\1&11&11^2&11^3&11^4\end{bmatrix}\qquad(4.43)$$

This is perfectly ok. However, towards the end of the loop, the matrix F looks like

$$F=\begin{bmatrix}1&140&140^2&140^3&140^4\\1&141&141^2&141^3&141^4\\\vdots&&&&\vdots\\1&150&150^2&150^3&150^4\end{bmatrix}\qquad(4.44)$$

In a strictly mathematical sense this matrix is not singular, but numerically it is rank deficient and has effectively a rank of only 4. Calculation of its pseudo-inverse consequently is impossible, or at least numerically unsafe. What can we do about that? It is important to realise that the fitting of a polynomial to a series of (x,y) data pairs is really independent of the actual values in the x-vector. In other words, in the above example, it does not matter whether the x-values are between 140 and 150, between 1 and 11 or between -5 and +5. What matters is the relationship between the x-values and their y-values. As the data are equidistant, any equidistant vector with the right number of values can be used to generate F. In the improved SavGol function we choose the values from -n to +n. The second important observation is that, consequently, we do not have to recalculate F each time and, more importantly, we do not have to recalculate its pseudo-inverse F+, which is computed outside the loop for all points. The result is a routine which is much faster and numerically much sounder:

MatlabFile 4-13. SavGol.m
function y1=SavGol(x,y,n,nd)
% Savitzky-Golay
y1=zeros(size(x));
x1=(-n:n)';
F(:,1)=ones(size(x1));
for j=1:nd
  F(:,j+1)=F(:,j).*x1;
end
F_plus=inv(F'*F)*F';
for i=1+n:length(x)-n
  a=F_plus*y(i-n:i+n);
  y1(i)=F(1+n,:)*a;
end
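The effect of centring the x-values can be sketched numerically (Python/NumPy, an illustration only): the design matrix built from x=140…150 is vastly worse conditioned than the centred one built from −5…+5.

```python
import numpy as np

nd = 4                                   # degree of polynomial
x_raw = np.arange(140.0, 151.0)          # x-values as in SavGol_bad
x_cen = np.arange(-5.0, 6.0)             # centred values as in SavGol

# design matrices with columns 1, x, x^2, x^3, x^4
F_raw = np.vander(x_raw, nd + 1, increasing=True)
F_cen = np.vander(x_cen, nd + 1, increasing=True)

cond_raw = np.linalg.cond(F_raw)         # huge: effectively rank deficient
cond_cen = np.linalg.cond(F_cen)         # modest: safe to invert FtF
```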
The routine SavGol.m is very basic. It does not include the beginning and the end of the curve, and it does not allow asymmetric selection of data points for the polynomial fitting. Both these features could easily be implemented; we leave it to the reader to improve the function accordingly. In its present form it cannot be used for non-equidistant x-values: F would have to be recalculated within the loop, but the vector x1 used to generate F would still have to be centred around zero.

Calculation of the Derivative of a Curve
The computation of the derivative would be straightforward for perfect, noise-free data. The first derivative could be well approximated by

dyi/dxi ≈ (yi+1 − yi) / (xi+1 − xi)    (4.45)
and to calculate higher derivatives the process is repeated. For noisy data, this approach results in disaster: the noise component is amplified with each differentiation and soon there is nothing left but amplified noise. A possible remedy is to use the Savitzky-Golay filter. An initial idea might be tempting: first smooth the data as just demonstrated, and subsequently differentiate the treated data using equation (4.45). There is a better way: the fitted polynomial can be differentiated analytically; this explicit computation is both faster and numerically safer. The data set is the same as the one used for smoothing, with the same amount of noise added. The 'true' derivatives are computed using the Savitzky-Golay algorithm on the noise-free data; this is not quite correct but suffices here. The three panels of Figure 4-25 display the results of three different computations of the derivative. The first plot in the figure shows the derivative calculated as simple differences between noisy data; there is only noise left. The next two panels are the derivatives calculated as 2nd and 4th order polynomials through 11 points. Again, a compromise has to be sought between noise reduction and the loss of narrow features. As with smoothing, a large number of data points defining the polynomials and a low-order polynomial result in smooth curves that might suffer from loss of narrow features.
MatlabFile 4-14. Main_SavGol_Deriv.m
% Main_SavGol_Deriv
x=(1:150)';
y=gauss(x,50,25)+gauss(x,100,10)+gauss(x,125,2);
randn('state',0);
yn=y+0.1*randn(size(y));
yd=SavGol_deriv(x,y,2,2);        % the 'true' derivative
for i=2:length(y)
  ys(i)=yn(i)-yn(i-1);
end
y1=SavGol_deriv(x,yn,5,2);
y2=SavGol_deriv(x,yn,5,4);
subplot(3,1,1);plot(x,ys,'.',x,yd,'k');
axis([0 150 -0.6 0.6]);
subplot(3,1,2);plot(x,y1,'.',x,yd,'k');
axis([0 150 -0.6 0.6]);
subplot(3,1,3);plot(x,y2,'.',x,yd,'k');
axis([0 150 -0.6 0.6]);
xlabel('x');
Figure 4-25. The top panel displays the true derivatives and those computed as the quotient of differences; the middle and bottom panels show the result of a 2nd and 4th degree polynomial fitted through 11 data points.
The calculation of the derivative of a general polynomial is straightforward:

$$y_i=a_1+a_2x_i+a_3x_i^2+a_4x_i^3+\dots+a_{nd+1}x_i^{nd}=\sum_{j=1}^{nd+1}a_jx_i^{j-1}$$

$$\Big(\frac{dy}{dx}\Big)_i=a_2+2a_3x_i+3a_4x_i^2+\dots+nd\,a_{nd+1}x_i^{nd-1}=\sum_{j=1}^{nd}j\,a_{j+1}x_i^{j-1}\qquad(4.46)$$
Defying Matlab elegance, one could write equation (4.46) as a loop, but it is certainly faster to vectorise the equation. The vectorised Matlab code (note that the polynomial degree equals the number of parameters minus one, nd=np-1)

dydx(i)=F(1+n,1:nd)*([1:nd]'.*a(2:nd+1));

is a bit less transparent. Here is the 'explanation':
$$\sum_{j=1}^{nd}j\,a_{j+1}x_i^{j-1}=\begin{bmatrix}x_i^0&x_i^1&x_i^2&\cdots&x_i^{nd-1}\end{bmatrix}*\left(\begin{bmatrix}1\\2\\3\\\vdots\\nd\end{bmatrix}.*\begin{bmatrix}a_2\\a_3\\a_4\\\vdots\\a_{nd+1}\end{bmatrix}\right)$$

$$=\begin{bmatrix}F(n{+}1,1)&F(n{+}1,2)&F(n{+}1,3)&\cdots&F(n{+}1,nd)\end{bmatrix}*\left(\begin{bmatrix}1\\2\\3\\\vdots\\nd\end{bmatrix}.*\begin{bmatrix}a(2)\\a(3)\\a(4)\\\vdots\\a(nd{+}1)\end{bmatrix}\right)\qquad(4.47)$$

MatlabFile 4-15. SavGol_deriv.m
function dydx=SavGol_deriv(x,y,n,nd)
% Savitzky-Golay derivative
% polynomial interpolation, degree nd, through 2n+1 data points
dydx=zeros(size(x));
x1=(-n:n)';
F(:,1)=ones(size(x1));
for j=1:nd
  F(:,j+1)=F(:,j).*x1;
end
F_plus=inv(F'*F)*F';
for i=1+n:length(x)-n
  a=F_plus*y(i-n:i+n);
  dydx(i)=F(1+n,1:nd)*([1:nd]'.*a(2:nd+1));
end
Polynomial Interpolation
The Savitzky-Golay algorithm can readily be adapted for polynomial interpolation. The computations are virtually identical to smoothing. In smoothing, a polynomial is fitted to a range of (x,y)-data pairs arranged around the x-value that needs to be smoothed. For polynomial interpolation, a polynomial is fitted through a set number of data points around the desired x-value, and the polynomial's computed y-value at that x is the interpolated value. Polynomial fitting is a very important tool and, as expected, Matlab provides a set of functions for the task. Instead of adapting the Savitzky-Golay routine used previously, we demonstrate the handling of the Matlab routines polyfit.m and polyval.m. The developed function is a very general polynomial interpolation routine that deals with almost anything imaginable. An obvious name would be polypol; we couldn't resist the temptation and rearranged two consonants to turn it into lolipop.

MatlabFile 4-16. Main_lolipop.m
% Main_lolipop
x=(1:.5:6)';                 % x/y data pairs
y=gauss(x,4,3);
randn('seed',0);
yn=y+0.02*randn(size(y));
x1=1:0.1:5;                  % x1-values for which y1 is interpolated
y1=lolipop(x,yn,x1,3,6);     % interpolation
plot(x,yn,'+',x1,y1,'.k');
xlabel('x');ylabel('y');
Figure 4-26. The result of polynomial interpolation.
In Main_lolipop.m a Gaussian curve is calculated at 0.5 intervals. A small amount of noise is added; this demonstrates the certain level of smoothing that results from polynomial fitting. The function lolipop.m interpolates the value y1 at position x1, using a polynomial of degree nd through the npoints (x,y)-data points whose x are closest to x1.

MatlabFile 4-17. lolipop.m
function y1=lolipop(x,y,x1,nd,npoints)
% General Polynomial Inter/Extrapolation, degree nd, using npoints
% x,y,x1,y1 vectors - x,y do not have to be the same length as x1,y1
% nd: degree of polynomials
% npoints: number of total points to define each polynomial
for i=1:length(x1)
    N=sortrows([x y abs(x-x1(i))],3);    % sort x,y by abs(x-x1(i))
    x_npoints=N(1:npoints,1);            % npoints nearest nodes
    y_npoints=N(1:npoints,2);
    a=polyfit(x_npoints-mean(x_npoints),y_npoints,nd);   % polyn. par.
    y1(i)=polyval(a,x1(i)-mean(x_npoints));              % interpolate
end
The Matlab functions polyfit.m and polyval.m need to be explained. a=polyfit(x,y,nd) fits a polynomial of degree nd to the x/y data pairs; a is the vector of coefficients defining the polynomial. Note that the elements of a are arranged in the opposite order to 'our' a as defined in (4.26). The command y1(i)=polyval(a,x1(i)) evaluates the polynomial defined by a at x1(i). Using lolipop.m, the routine Main_lolipop.m fits a polynomial of degree nd=3 through the 6 nodes closest to x1(i). These 6 nodes are determined by the first 3 lines of the loop, taking advantage of the sortrows function. Note that lolipop.m also allows for extrapolation, but choosing x1-values outside the range of x is not recommended.
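For comparison, here is a Python/NumPy sketch of the same nearest-nodes idea, using numpy.polyfit and numpy.polyval in place of the Matlab routines. It is an illustrative translation, not the book's code; like NumPy's polyfit, it returns the coefficients highest power first.

```python
import numpy as np

def lolipop(x, y, x1, nd, npoints):
    """Polynomial inter/extrapolation of degree nd through the npoints
    (x, y) pairs whose x-values lie closest to each target value in x1."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    y1 = np.empty(len(x1))
    for i, xi in enumerate(x1):
        idx = np.argsort(np.abs(x - xi))[:npoints]   # npoints nearest nodes
        xc, yc = x[idx], y[idx]
        shift = xc.mean()                            # centring improves conditioning
        c = np.polyfit(xc - shift, yc, nd)           # coefficients, highest power first
        y1[i] = np.polyval(c, xi - shift)            # interpolate at the target
    return y1

# a cubic is reproduced exactly by a degree-3 fit through 6 exact points
x = np.arange(0.0, 10.0)
y = x ** 3
y1 = lolipop(x, y, [2.5], 3, 6)
```

The centring around the mean of the selected nodes mirrors the x_npoints-mean(x_npoints) trick in lolipop.m.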
4.2.8 Linear Regression with Multivariate Data

In this chapter we expand the linear regression calculation into higher dimensions, i.e. instead of a vector y of measurements and a vector a of fitted linear parameters, we deal with matrices Y of data and A of parameters. We derive the new concept by using a chemical example based on absorption data. First, consider a consecutive reaction A→B→C, with rate constants k1 and k2, where the absorption at one particular wavelength was recorded as a function of time. Let's say our task is to determine the molar absorptivities of species A, B and C at this wavelength, knowing all individual concentrations at all reaction times. Previously we used the notation F for the matrix of the 'known' function. In many chemical applications involving spectroscopic absorption
measurements, an equivalent matrix is made up of the molar concentrations of several chemical species. In these circumstances, we call the matrix C, referring to Chapter 3.1, Beer-Lambert's Law. The above example of recording the kinetics of the reaction A→B→C at one wavelength is then best described by the matrix equation

y = C a + r    (4.48)
The (ns×1) column vector y contains the absorption data at ns reaction times; the concentration profiles of the three species A, B and C form the columns of an (ns×3) matrix C and their molar absorptivities form a (3×1) column vector a. The vector r contains the residuals between y and C×a and has the same dimensions as y. Having the measurements y and supposedly knowing C, it is, as earlier in equation (4.30), a linear least-squares calculation that computes the best a:

a = C⁺ y    (4.49)
The next step is to imagine having measured whole absorption spectra as a function of time, e.g. by using a diode array spectrophotometer. The kinetic traces at nl different wavelengths are arranged as columns of a matrix Y and, similarly, the molar absorptivities as columns of a matrix A, thus (4.48) transforms into
Y = C A + R    (4.50)
It is most helpful to recall the 'rectangle' notation for this equation introduced in Chapter 3.1, Beer-Lambert's Law:
Figure 4-27. The structure of Beer-Lambert's law in matrix notation: the (ns×nl) matrix Y is the product of the (ns×nc) matrix C and the (nc×nl) matrix A, plus the (ns×nl) matrix R of residuals.

It is important to realise that each column a:,j can independently be calculated from the appropriate column y:,j, irrespective of all the other wavelengths, using equation (4.49). The pseudo-inverse C⁺ is the same for all. The equivalent of (4.49) for all wavelengths can be written as

A = C⁺ Y    (in Matlab A = C \ Y)    (4.51)
Equation (4.51) minimises the sum of squares, ssq, of the residuals in R, defined by the multivariate equivalent of equation (4.2):

ssq = Σ_{i=1}^{ns} Σ_{j=1}^{nl} r_{i,j}^2    (4.52)
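The one-step computation of A, and, via the transposed problem, of C, is easy to verify numerically. The following Python/NumPy sketch is an illustration with random stand-in matrices, not data from the book; numpy.linalg.lstsq plays the role of Matlab's \ and / operators.

```python
import numpy as np

rng = np.random.default_rng(0)
ns, nc, nl = 50, 3, 21
C = rng.random((ns, nc))        # stand-in concentration profiles
A_true = rng.random((nc, nl))   # stand-in component spectra
Y = C @ A_true                  # noise-free Beer-Lambert data, Y = C*A

# A = C+ Y, all wavelengths in one step (Matlab: A = C \ Y)
A_fit = np.linalg.lstsq(C, Y, rcond=None)[0]

# C = Y A+ (Matlab: C = Y / A), solved via the transposed problem Yt = At*Ct
C_fit = np.linalg.lstsq(A_true.T, Y.T, rcond=None)[0].T
```

With noise-free data and full-rank factors both least-squares solutions reproduce the generating matrices exactly, up to rounding.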
The complete matrix A, containing the absorption spectra of the components A, B and C, is computed in one step! Presently, we are able to compute A knowing Y and C. Computing Y knowing C and A is trivial. What about calculating the concentration matrix C, knowing Y and A? We could transpose equation (4.50):

Y^t = A^t C^t    (4.53)

Applying the equivalent of (4.51), we can write

C^t = (A^t)⁺ Y^t    (4.54)

and transposing back

C = Y A⁺    (in Matlab C = Y / A)    (4.55)
This is, of course, the matrix equivalent of equation (4.31). There is another, more casual, way of deriving equations (4.51) and (4.55). We start with

Y = C A    (4.56)

multiply the equation with A^t from the right

Y A^t = C A A^t    (4.57)

and then with (A A^t)^-1, again from the right,

Y A^t (A A^t)^-1 = C A A^t (A A^t)^-1    (4.58)

to result in

C = Y A^t (A A^t)^-1 = Y A⁺    (4.59)

noting that

A⁺ = A^t (A A^t)^-1    (4.60)
The same sequence for the calculation of A, given Y and C:

Y = C A
C^t Y = C^t C A
(C^t C)^-1 C^t Y = (C^t C)^-1 C^t C A
A = (C^t C)^-1 C^t Y = C⁺ Y    (4.61)
Of course, all these derivations and equations hold for any matrix product of the kind Y=CA, irrespective of the physical meaning of the matrices. In addition, vectors are just special matrices and the equations also hold for vectors. The derivation of equation (4.59) and its equivalent (4.61) is not mathematically proper but, more importantly, the results are correct in the least-squares sense. They are identical to the ones derived via the normal equations, e.g. equation (4.26). Figure 4-28 represents the shapes of the matrices.

Figure 4-28. Schematic representations of the dimensions of the matrices in equations (4.50), (4.51) and (4.55).

Referring back to Matlab, it is very important to use the correct slash operator \ or / for the left and right pseudo-inverse. Applying the wrong one will invariably result in an error message or, worse, in a potentially undetected error.
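The explicit formulas for the left and right pseudo-inverses, (C^t C)^-1 C^t and A^t (A A^t)^-1, can be checked against a general-purpose pseudo-inverse routine. Below is a small Python/NumPy sketch; the random full-rank matrices are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((10, 3))   # tall matrix, full column rank
A = rng.standard_normal((3, 8))    # wide matrix, full row rank

C_plus = np.linalg.inv(C.T @ C) @ C.T    # left pseudo-inverse, as in (4.61)
A_plus = A.T @ np.linalg.inv(A @ A.T)    # right pseudo-inverse, equation (4.60)
```

Both expressions agree with numpy.linalg.pinv, and C_plus @ C and A @ A_plus reduce to the identity, which is exactly why they undo C and A in the derivations above.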
Applications
First we construct a kinetic measurement which we then analyse in both of the above ways. The reaction is the set of two consecutive first order reactions with rate constants k1 and k2:

A --k1--> B --k2--> C    (4.62)

Integration of the appropriate differential equations for the reaction scheme is straightforward; the resulting equations for the concentrations of A, B and C as a function of time (see Chapter 3.4.2, Rate Laws with Explicit Solutions) are:

[A] = [A]_0 e^(-k1 t)
[B] = [A]_0 (k1/(k2-k1)) (e^(-k1 t) - e^(-k2 t))
[C] = [A]_0 - [A] - [B]    (4.63)
The absorption spectra are modelled by Gaussians (see 3.2 Chromatography / Gaussian Curves).

MatlabFile 4-18. Data_ABC.m
function [t,lam,Y,C,A]=Data_ABC
% absorbance data generation for A -> B -> C
t  =[0:25:4000]';    % reaction times
lam=400:10:600;      % wavelengths
k  =[.003; .0015];   % rate constants
A_0=1e-3;            % initial concentration of A
C(:,1)=A_0*exp(-k(1)*t);                                 % concentrations of A
C(:,2)=A_0*k(1)/(k(2)-k(1))*(exp(-k(1)*t)-exp(-k(2)*t)); % conc. of B
C(:,3)=A_0-C(:,1)-C(:,2);                                % concentrations of C
A(1,:)=1e3*gauss(lam,450,50);   % molar spectrum of A
A(2,:)=4e2*gauss(lam,500,50);   % molar spectrum of B
A(3,:)=5e2*gauss(lam,550,50);   % molar spectrum of C
Y=C*A;                          % applying Beer's law to generate Y
randn('seed',0);                % fixed start for random number generator
Y=Y+0.01*randn(size(Y));        % standard deviation 0.01

The short routine Main_ABC_3D.m reads in the absorbance data modelled by Data_ABC.m and plots them in Figure 4-29 against wavelength and reaction time.

MatlabFile 4-19. Main_ABC_3D.m
% Main_ABC_3D
[t,lam,Y]=Data_ABC;
mesh(lam,t,Y);
xlabel('wavelength')
ylabel('time')
zlabel('absorbance')
Figure 4-29. Mesh-plot of the absorption matrix for a consecutive reaction A→B→C, measured at several wavelengths.

We now have a set of data that we can analyse in the two ways.
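Equation (4.63) is easy to reproduce outside Matlab. The following Python/NumPy sketch mirrors the concentration part of Data_ABC.m, with the same rate constants and time grid, and confirms the mass balance [A]+[B]+[C]=[A]0 at all reaction times.

```python
import numpy as np

t = np.arange(0.0, 4001.0, 25.0)     # reaction times, as in Data_ABC.m
k1, k2, A0 = 3e-3, 1.5e-3, 1e-3      # rate constants, initial concentration of A

cA = A0 * np.exp(-k1 * t)                                   # [A], eq. (4.63)
cB = A0 * k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))  # [B]
cC = A0 - cA - cB                                           # [C], closure
C = np.column_stack([cA, cB, cC])    # the (ns x 3) concentration matrix
```

The closure condition for [C] guarantees the mass balance by construction; the test below also checks the initial conditions [A](0)=[A]0 and [B](0)=0.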
Computation of Component Spectra, Known Concentrations
Assume we know the two rate constants k1 and k2, which allow the computation of C. Assume further that we have only measured spectra between time = 200 and 1200 (a fast reaction with significant dead time of the instrument). The task is to determine the three absorption spectra of the pure compounds A, B and C. None of the three is directly accessible in the range of available spectra because of severe overlap.

MatlabFile 4-20. Main_ABC_Lin1.m
% Main_ABC_Lin1
[t,lam,Y,C,A]=Data_ABC;
C_p=C(9:49,:);
Y_p=Y(9:49,:);
A_calc=C_p\Y_p;   % component spectra via multivariate linear regression
plot(lam,A,'k:',lam,A_calc,'k-');
xlabel('wavelength');ylabel('mol. absorptivity');
Figure 4-30. Calculated and true absorption spectra of species A, B and C (from left to right).
Computation of Component Concentrations, Known Spectra
The molar absorptivity spectra of species A, B and C have been determined independently. Main_ABC_Lin2.m calculates the corresponding concentration profiles of the 3 components using the complete data set from Data_ABC.m. They are shown in Figure 4-31.

MatlabFile 4-21. Main_ABC_Lin2.m
% Main_ABC_Lin2
[t,lam,Y,C,A]=Data_ABC;
C_calc=Y/A;   % conc. profiles via multivariate linear regression
plot(t,C,'k:',t,C_calc,'k-');
xlabel('time'); ylabel('concentration');
Figure 4-31. Calculated and true concentration profiles of species A, B and C.

It is probably more realistic to assume that we know neither the rate constants nor the absorption spectra for the above example. All we have is the measurement Y, and the task is to determine the best set of parameters, which includes the rate constants k1 and k2 and the molar absorptivities, the whole matrix A. This looks like a formidable task as there are many parameters to be fitted, the two rate constants as well as all elements of A. In Multivariate Data, Separation of the Linear and Non-Linear Parameters (p.162), we start tackling this problem.
The Pseudo-Inverse in Excel
We have encountered Excel's LINEST as a tool for linear regression. Unfortunately, LINEST cannot be generalised from vectors to matrices. To deal with matrices, we have no option but to use equations (4.59) and (4.61). It is possible to do so, but not as convenient as in Matlab. In order to keep the spreadsheet reasonably small, the dimensions are much smaller than those in the Matlab examples. It is still a consecutive reaction scheme; the spectra were recorded at 11 times and at 6 wavelengths.
ExcelSheet 4-3. Chapter3.xls-pseudoinverse
=TRANSPOSE(C5:E15)
=MMULT(G5:Q7,C5:E15)
=MINVERSE(G10:I12)
=MMULT(G5:Q7,A18:F28)
=MMULT(K10:M12,H15:M17)
=MMULT(MINVERSE(MMULT(TRANSPOSE(C5:E15),C5:E15)),MMULT(TRANSPOSE(C5:E15),A18:F28))

Figure 4-32. Excel spreadsheet applying the equation A=(Ct C)-1 Ct Y in two ways, stepwise and in one big formula.

There are two paths to reach the result. The first path is a stepwise construction of a series of intermediate matrices; they are framed in Figure 4-32:
(a) Ct
(b) Ct C
(c) (Ct C)-1
(d) Ct Y
and finally
(e) (Ct C)-1 Ct Y
This approach uses a lot of space on the spreadsheet; in particular, the transpose of a long column is a very wide row. However, it is reasonably easy to detect potential errors in the formulas. The other path does the whole calculation in one step; the result is the shaded matrix Aone_eq. This is a very neat approach on the spreadsheet, but a very difficult equation to be entered in matrix mode:
=MMULT(MINVERSE(MMULT(TRANSPOSE(C5:E15),C5:E15)),MMULT(TRANSPOSE(C5:E15),A18:F28))
Remember to preselect the rectangle of correct dimensions and use Ctrl-Shift-Enter for matrix equations in Excel!
4.3 Non-Linear Regression

Non-linear regression calculations are extensively used in most sciences. The goals are very similar to the ones discussed in the previous chapter on Linear Regression. Now, however, the function describing the measured data is non-linear and as a consequence, instead of an explicit equation for the computation of the best parameters, we have to develop iterative procedures. Starting from initial guesses for the parameters, these are iteratively improved or 'fitted', i.e. those parameters are determined that result in the optimal 'fit', or, in other words, that result in the minimal sum of squares of the residuals. There are a multitude of methods for this task. Those that are conceptually simple usually are computationally intensive and slow, while the fast algorithms have a more complex mathematical background. We start this chapter with the Newton-Gauss-Levenberg/Marquardt algorithm, not because it is the simplest but because it is the most powerful and fastest method. We can't think of many instances where it is advantageous to use an alternative algorithm. Because of its relative complexity and tremendous usefulness, we develop the Newton-Gauss-Levenberg/Marquardt algorithm in several small steps and thus examine it in more detail than many of the other algorithms introduced in this book.
4.3.1 The Newton-Gauss-Levenberg/Marquardt Algorithm

Later, in Chapter 4.4, General Optimisation, we discuss non-linear least-squares methods where the sum of squares is minimised directly. What is meant by that statement is that ssq is calculated for different sets of parameters p, and the changes of ssq as a function of the changes in p are used to direct the parameter vector towards the minimum. In this section we demonstrate that it is possible to use the complete vector or matrix of residuals to drive the iterative refinement towards the minimum. As expected in an iterative algorithm, we start from an initial guess for the parameters. This parameter vector is subsequently improved by the addition of an appropriate parameter shift vector δp, resulting in a better, but probably still not perfect, fit. From this new parameter vector the process is repeated until the optimum is reached. As with almost any other non-linear problem that has to be solved iteratively, linearisation via a Taylor expansion, with truncation after very few elements, is the solution. In Chapter 4.1, Background to Least-Squares Methods, e.g. in Figure 4-3 and Figure 4-5, we have seen that for univariate data the vector r of residuals, and thus the sum of squares ssq, is a function of the measurement y and the parameters p of the model of choice.
r = f(y, p)    (4.64)

The basic principle of the algorithm is to add a shift vector δp to the parameter vector p. The shift vector is computed with the aim of producing a new parameter vector for which ssq is minimal, or at least smaller. The residuals r(p+δp) after the application of the shift vector are approximated by a Taylor series expansion. With sufficient terms, any precision for the approximation can be achieved.

r(p + δp) = r(p) + (1/1!) (∂r(p)/∂p) δp + (1/2!) (∂²r(p)/∂p²) δp² + ...    (4.65)
As done previously, in The Newton-Raphson Algorithm (p.48), we neglect all but the first two terms in the expansion. This leaves us with an approximation that is not very accurate but, since it is a linear equation, is easy to deal with. Algorithms that include additional higher terms in the Taylor expansion often result in fewer iterations but require longer computation times due to the calculation of higher order derivatives.

r(p + δp) ≅ r(p) + (∂r(p)/∂p) δp = r(p) + J δp    (4.66)
The derivative ∂r(p)/∂p is known as the Jacobian J. The task is to compute the 'best' parameter shift vector δp that minimises the new residuals r(p+δp) in the least-squares sense. This is a linear regression equation with the explicit solution

δp = −J⁺ r(p)    (4.67)
Note that equation (4.66) ( r(p+δp) ≅ r(p) + Jδp ) has the same structure as equation (4.17) ( r = y − Fa ), where the calculation of a was a = F⁺y. The Taylor series expansion is always only an approximation and therefore the shift vector δp will not lead to the minimum directly. However, the new parameter vector p+δp will usually be better than the preceding p. Thus, an iterative process should move towards the optimal parameters.
A First, Minimal Algorithm We are now in a position to devise a first, very crude program that should, starting from a set of initial guesses, move towards the best fit. Below, a flow diagram is given that represents the basic principle of the Newton-Gauss algorithm:
guess parameters, p = pstart
→ calculate residuals, r(p), and the sum of squares, ssq
→ calculate Jacobian J
→ calculate shift vector δp, and p = p + δp
→ back to the calculation of the residuals
Figure 4-33. First version of the Newton-Gauss algorithm.

The crucial part of this algorithm is the computation of J, the derivatives of the residuals with respect to the parameters. It might be best to demonstrate this with an example. An exponential curve, including some noise, is generated by the function Data_exp.m. The curve is defined by three parameters: the rate, p1, the amplitude, p2, and the value at infinite time, p3.

y = p3 + p2 e^(−p1 t)    (4.68)
MatlabFile 4-22. Data_exp.m
function [t,y]=Data_exp
p=[2e-2;-4;10];
t=(1:2:100)';
y=p(3)+p(2)*exp(-p(1)*t);
randn('seed',0);
y=y+5e-2*randn(size(y));

MatlabFile 4-23. Main_exp_2d.m
% Main_exp_2d
[t,y]=Data_exp;
plot(t,y,'.');
xlabel('time'); ylabel('y');

The main routine Main_exp_2d.m reads in the data and plots them in Figure 4-34.
Figure 4-34. An exponential function.

The derivatives of the vector r of residuals with respect to the parameter vector p are given by the following equations. Note that the first column of the Jacobian matrix contains the derivative of the residuals with respect to the first parameter, the second with respect to the second, etc. In this example the derivatives can be computed explicitly; later we will introduce the computation of numerical derivatives.

r = y − ycalc = y − (p3 + p2 e^(−p1 t))

j:,1 = ∂r/∂p1 = p2 t e^(−p1 t)
j:,2 = ∂r/∂p2 = −e^(−p1 t)
j:,3 = ∂r/∂p3 = −1    (4.69)

With that, we are in a position to write a short program, Main_NG1.m, that iterates towards the optimal set of parameters and fits the curve in Figure 4-34, provided the initial guesses are reasonable.

MatlabFile 4-24. Main_NG1.m
% Main_NG1
[t,y]=Data_exp;
p=[.01; -3 ; 15 ];    % initial guesses [rate const, amp, inf]
for i=1:10
    y_calc=p(3)+p(2)*exp(-p(1)*t);
    r=y-y_calc;
    ssq(i)=sum(r.*r);
    J(:,1)=p(2)*t.*exp(-p(1)*t);
    J(:,2)=-exp(-p(1)*t);
    J(:,3)=-1;
    delta_p=-J\r;    % calculate parameter shifts
    p=p+delta_p;     % add parameter shifts
end
p
subplot(1,2,1); plot(t,y,'.',t,y_calc);xlabel('time');ylabel('y');
subplot(1,2,2); plot(log(ssq),'+');xlabel('iteration');ylabel('log(ssq)');

p =
    0.0195
   -3.9882
   10.0252
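The same iteration is easily reproduced outside Matlab. The Python/NumPy sketch below mirrors the Newton-Gauss loop of Main_NG1.m with the analytical Jacobian of equation (4.69); the function name and the noise-free test curve are ours, chosen so that the fit converges to the exact parameters.

```python
import numpy as np

def gauss_newton_exp(t, y, p, n_iter=10):
    """Minimal Newton-Gauss iteration for the model y = p3 + p2*exp(-p1*t)."""
    p = np.array(p, float)
    for _ in range(n_iter):
        y_calc = p[2] + p[1] * np.exp(-p[0] * t)
        r = y - y_calc
        J = np.column_stack([p[1] * t * np.exp(-p[0] * t),   # dr/dp1
                             -np.exp(-p[0] * t),             # dr/dp2
                             -np.ones_like(t)])              # dr/dp3
        dp = -np.linalg.lstsq(J, r, rcond=None)[0]           # shift vector (4.67)
        p = p + dp                                           # apply the shift
    return p

t = np.arange(1.0, 100.0, 2.0)
p_true = np.array([2e-2, -4.0, 10.0])
y = p_true[2] + p_true[1] * np.exp(-p_true[0] * t)   # noise-free test curve
p_fit = gauss_newton_exp(t, y, [0.01, -3.0, 15.0])   # same initial guesses
```

With noise-free data the residuals vanish at the minimum, so the recovered parameters match the generating ones to machine precision.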
Figure 4-35. Fitted exponential and iterative decrease of ssq.

The left panel of Figure 4-35 shows the result of the fit. The right panel displays the sum of squares for each iteration, featuring a continuous decrease. The program at this stage is very crude and needs several stages of improvement. For the next version we implement two new measures: a proper termination criterion and numerical calculation of the derivatives for the Jacobian.
Termination Criterion, Numerical Derivatives
We start with the termination criterion. The right panel of Figure 4-35 immediately tells us that iterations 6 to 10 are wasted. The minimal ssq has been reached at the fifth iteration and there is no further improvement. There are different ways of testing whether there is continuing improvement of the fit or whether the progress is finished and, hopefully, the best, minimal ssq has been reached. In the progress of the iterations, the shifts δp, as well as the sum of squares, usually decrease continuously. Thus both could be inspected for constancy. The most common and intuitively correct test is the constancy of the sum of squares, as indicated in Figure 4-35. If a generally applicable routine is envisaged, it is not possible to test the absolute difference between old and new sum of squares. Depending on the data, ssq can be very small or very large. Therefore, a convergence criterion analysing the relative change in ssq has to be applied. The iterations are stopped once this relative change is less than a preset value μ, typically μ = 10⁻⁴.

abs( (ssq_old − ssq) / ssq_old ) ≤ μ    (4.70)
guess parameters, p = pstart
→ calculate residuals r(p) and the sum of squares, ssq
→ ssq constant? yes: end, display results
→ no: calculate Jacobian J, calculate shift vector δp, and p = p + δp
→ back to the calculation of the residuals

Figure 4-36. Improved Newton-Gauss algorithm, including a termination criterion.
And now we introduce numerical derivatives. In the example above, we used explicit formulas for the derivatives of the residuals with respect to the parameters. Often it is not easy, or even impossible, to work out the correct equations. Numerical computation of the derivatives is always possible; usually it is slower and also numerically less accurate. The general formula is:

∂r/∂pi ≅ ( r(p + Δpi) − r(p) ) / Δpi    (4.71)
This is a rather casual notation and we need to clarify what is meant. p+Δpi is a new parameter vector with only the i-th parameter pi shifted by the small amount Δpi. In Main_NG2.m, Δpi is calculated as 1×10⁻⁴ pi. The factor 1×10⁻⁴ is somewhat arbitrary and experimentation is usually the best way of determining the optimal value. Note that ∂r/∂p = ∂(y − ycalc)/∂p = −∂ycalc/∂p, since ∂y/∂p = 0 due to the invariance of the measured data y; thus we only need to compute ycalc, not r, which is slightly quicker. In equation (4.72) this is worked out for the example:

j:,1 = −∂ycalc/∂p1 = −[ (p3 + p2 e^(−1.0001 p1 t)) − (p3 + p2 e^(−p1 t)) ] / (10⁻⁴ p1)
j:,2 = −∂ycalc/∂p2 = −[ (p3 + 1.0001 p2 e^(−p1 t)) − (p3 + p2 e^(−p1 t)) ] / (10⁻⁴ p2)
j:,3 = −∂ycalc/∂p3 = −[ (1.0001 p3 + p2 e^(−p1 t)) − (p3 + p2 e^(−p1 t)) ] / (10⁻⁴ p3)    (4.72)
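A generic forward-difference Jacobian in the spirit of equations (4.71)/(4.72) can be written once and reused for any model. The Python/NumPy sketch below is our illustration, using the relative step 1×10⁻⁴ from the text, and is checked against the analytical Jacobian of equation (4.69).

```python
import numpy as np

def num_jacobian(y_calc_fun, p, rel=1e-4):
    """Forward-difference Jacobian of the residuals; column i comes from a
    relative shift of parameter i (Delta_p_i = rel * p_i). Since
    dr/dp = -dy_calc/dp, only the model function itself is needed."""
    p = np.asarray(p, float)
    y0 = y_calc_fun(p)
    J = np.empty((y0.size, p.size))
    for i in range(p.size):
        p_shift = p.copy()
        p_shift[i] += rel * p[i]                       # shift the i-th parameter
        J[:, i] = -(y_calc_fun(p_shift) - y0) / (rel * p[i])
    return J

t = np.arange(1.0, 100.0, 2.0)
p = np.array([2e-2, -4.0, 10.0])
model = lambda q: q[2] + q[1] * np.exp(-q[0] * t)      # y = p3 + p2*exp(-p1*t)
J_num = num_jacobian(model, p)

# analytical Jacobian, equation (4.69), for comparison
J_ana = np.column_stack([p[1] * t * np.exp(-p[0] * t),
                         -np.exp(-p[0] * t),
                         -np.ones_like(t)])
```

The agreement is good but not exact; the forward difference carries a truncation error of the order of the step size, which is the price paid for not deriving the formulas by hand.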
The Matlab program Main_NG2.m implements the additions of a termination criterion and numerical derivatives. Refer to the Matlab Help Desk for information on the while ... end loop and also the break command.

MatlabFile 4-25. Main_NG2.m
% Main_NG2
[t,y]=Data_exp;
p=[.01; -5 ; 15 ];    % initial guesses [rate const, amp, inf]
ssq_old=1e50;
while 1
    y_calc=p(3)+p(2)*exp(-p(1)*t);
    r=y-y_calc;
    ssq=sum(r.*r);
    if abs((ssq_old-ssq)/ssq_old)<1e-4   % termination criterion, eq. (4.70)
        break
    end
% if known spectra are used
C_k =C_glob(:,s.known==1);    % conc with known spectra
C_uk=C_glob(:,s.known==0);    % conc with unknown spectra
Y_k =C_k*s.A_k;               % known part of Y
Y_uk=Y_glob-Y_k;              % unknown part of Y
A_uk=C_uk\Y_uk;               % unknown absorptivities
R=Y_uk-C_uk*A_uk;             % residuals
s.A(s.known==1,:)=s.A_k;      % build A
s.A(s.known==0,:)=A_uk;
else
    s.A=C_glob\Y_glob;        % unknown absorptivities
    R=Y_glob-C_glob*s.A;      % residuals
end
r=R(:);
s.ssq=sum(r.*r);
if nargout==2
    for i=1:s.nm
        subplot(3,2,i);
        plot(s.t{i},s.C{i});axis tight;
        xlabel('time');ylabel('conc.');
        subplot(3,2,i+2);
        plot(s.t{i},s.C{i}*s.A(:,[3,6,9]), ...
             s.t{i},s.Y{i}(:,[3,6,9]),'.');axis tight;
        legend(cellstr(int2str([s.lam([3 6 9])]')))
        xlabel('time');ylabel('absorbance');
    end
    subplot(3,2,5)
    plot(s.lam,s.A);axis tight;
    xlabel('wavelength');ylabel('absorptivity');
    drawnow;
end

MatlabFile 4-48. Main_Glob.m
% Main_Glob
% A+B<->C
clear; s=Data_Glob;              % get kinetic data into structure s
s.fname='Rcalc_Glob';
s.par_str={'s.k(1)' 's.k(2)'};   % variables to be fitted
s.k=[5;1e-2];                    % rate const initial estimates
s.par=get_par(s);                % collects variable parameters into s.par
s.known=[0 0 0];                 % known spectra
s.A_k=[];                        % define known spectra
s=nglm3(s);                      % call ngl/m
s.sig_r=sqrt(s.ssq/(sum([s.ns{:}])*s.nl-length(s.par) ...
        -sum(s.known==0)*s.nl)); % sigma_r
s.sig_par=s.sig_r*sqrt(diag(inv(s.Curv)));   % sigma_par
for i=1:length(s.par)
    fprintf(1,'%s: %g +- %g\n',s.par_str{i}(3:end),s.par(i), ...
            s.sig_par(i));
end
fprintf(1,'sig_r: %g\n',s.sig_r);

rate(1): 0.965868 +- 0.0402616
rate(2): 0.00518603 +- 0.000131341
sig_r: 0.00100028
Clearly, both kinetic parameters are well defined, with standard errors of less than 5%. Also, all component spectra are clearly resolved, see Figure 4-49. The calculated standard deviation of the residuals matches the noise level of the generated data sets. To investigate the advantage of global analysis, we also analysed the two data sets individually. The quality of the fits, as represented by σr, is essentially the same. The main difference is that the standard deviations of the parameters are substantially larger, with errors between 25% and 50%, and the calculated parameters can be fairly far off.
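The error estimates printed by Main_Glob.m come from two generic formulas: σr from ssq and the degrees of freedom, and the parameter standard errors from the diagonal of the inverted curvature matrix J^t J. The Python sketch below illustrates both on a deliberately simple one-parameter linear fit; all names and numbers are ours, for illustration only.

```python
import numpy as np

def param_errors(J, r, n_fitted):
    """sigma_r and parameter standard errors from the Jacobian, in the spirit of
    s.sig_par = s.sig_r*sqrt(diag(inv(s.Curv))) in Main_Glob.m."""
    dof = r.size - n_fitted                       # degrees of freedom
    sig_r = np.sqrt(r @ r / dof)                  # std dev of the residuals
    curv = J.T @ J                                # curvature matrix
    sig_par = sig_r * np.sqrt(np.diag(np.linalg.inv(curv)))
    return sig_r, sig_par

# toy example: y = a*x with true a = 2 and noise of standard deviation 0.1
rng = np.random.default_rng(0)
x = np.arange(1.0, 101.0)
y = 2.0 * x + 0.1 * rng.standard_normal(x.size)
a = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]
r = y - a * x
sig_r, sig_par = param_errors(-x[:, None], r, n_fitted=1)   # J = dr/da = -x
```

Here sig_r recovers the noise level of the data, and the true parameter lies within a few sig_par of the fitted one, which is exactly how the rate-constant errors above should be read.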
Figure 4-49. Global analysis of two data sets. Top panels: concentration profiles; middle panels: representative fits; bottom panel: absorption spectra.
First data set only:
rate(1): 2.42965 +- 0.962225
rate(2): 0.00262281 +- 0.00137306
sig_r: 0.000997811

Second data set only:
rate(1): 1.13265 +- 0.362335
rate(2): 0.00460243 +- 0.00116169
sig_r: 0.000982706
Note that for the individual analyses, one of the two component spectra of species A or B needs to be set as colourless, i.e. non-absorbing (see Known Spectra, Uncoloured Species, p.175), as the matrices C1 and C2, containing the concentration profiles, are rank deficient for this kinetic model and hence their pseudo-inverse C⁺ is not defined.
A few changes in the programs were required to allow the analysis of one individual data set. These lines are highlighted within the main routine Main_Glob.m,

s.known=[0 1 0];              % known spectra
s.A_k=[0*s.A_sim(2,:)];       % define known spectra

and in the function Data_Glob.m.

s.c_0={[3e-3 1.6e-3 0]};      % initial conc A,B,C, 2nd data set
s.t={[0:10:300]'};            % times, 2nd data set
Again, for the computation of the standard deviation σr, the degrees of freedom, df, need to be adapted according to the number of data sets, i.e. individual measurements, nm, comprising Yglob. Referring to equations (4.76), (4.83) and (4.89), the number of degrees of freedom, df, is now given by

df = Σ_{i=1}^{nm} ns_i × nl − (np + nu × nl)    (4.96)

where ns_i denotes the number of spectra in the i-th data matrix Yi.
4.3.2 Non-White Noise, χ2-Fitting

The actual noise distribution of the data is often not known. The most common response is to ignore this fact and assume a normal, white distribution of the noise. Even if the assumption of white noise is incorrect, it is still useful to perform the least-squares fit. There is no real alternative and the results are generally not too wrong. White noise signifies that the noise is normally distributed with the same experimental standard deviation, σy, for all individual measurements yi,j. The traditional least-squares fit delivers the most likely parameters only under the condition of white noise. If, however, the standard deviations σyi,j for all elements of the matrix Y are known or can be estimated reliably, it does make sense to use this information in the data analysis. Then, instead of the sum of squares, it is the sum of all appropriately weighted and squared residuals that has to be minimised. This is known as 'chi-square' or χ2-fitting. If the data matrix Y has the dimensions ns×nl, χ2 is defined by

χ² = Σ_{i=1}^{ns} Σ_{j=1}^{nl} ( r_{i,j} / σy_{i,j} )²    (4.97)
If all σyi,j are the same (white noise), the calculated parameters of the χ2 -fit are the same as for least-squares fitting. If the σyi,j are not constant across the data set, the least-squares fit over-emphasises those parts of the data with high noise.
Linear χ2-Fitting
For the linear least-squares analysis of a monovariate data set, equation (4.97) reduces to

χ² = Σ_{i=1}^{ns} ( r_i / σy_i )²    (4.98)
Recall equation (4.19), ycalc=Fa. To achieve minimal χ2 in a linear regression calculation, all we need to do is to divide each element of y and of the column vectors f:,j by its corresponding σyi to result in the weighted vectors yw and fw:,j:

yw_i = y_i / σy_i
fw_{i,j} = f_{i,j} / σy_i    (4.99)

Or in Matlab

y_w=y./sig_y
F_w(:,j)=F(:,j)./sig_y
r
f:,1
ssq
ycalc=F a f:,2
yw
χ2 rw
fw
:,1
ycalc,w=Fw a fw
:,2
Figure 4-50. Linear ssq and
χ2
fitting.
Note that all vectors changed direction and length. Each element of the vectors is multiplied by its individual σyi, the vectors are not just multiplied by a constant factor. However, the orthogonality relationship between the
Model-Based Analyses
191
weighted residuals and the weighted base vectors is maintained and the equivalent equation to (4.26) performs the linear regression a = (Fwt Fw )−1 Fwt y w
or
(4.100)
a=
Fw+
yw
or in Matlab a=F_w\y_w
Instead of continuing with a monovariate example of this kind, we immediately proceed to multivariate data and revert to our standard equation Y=CA. Light emission spectroscopy provides a good example for data with nonuniformly distributed, but well defined, noise. This is in contrast to absorption measurements, where the noise is usually adequately described by white noise. In light emission, the noise is directly related to the intensity of the light. If there is no light there is zero signal with zero noise; at high emission the noise is high as well. Particularly in photon counting applications, there is a simple relationship between the number of counted photons, count, and its standard deviation σcount = count
(4.101)
As an example for linear χ2-fitting, we analyse time resolved emission spectra of a mixture of two components with overlapping emission spectra and similar lifetimes. In the 'experiment', the molecules in solution are excited with a very short flash and the emission intensity is measured at many wavelengths, as a function of time. Below is the function Data_Emission.m to generate the data. Note that experimental emission decays are exponentials and that the lifetime τ is used instead of the more customary rate constant in kinetics. Also, we use the notation C representing the concentration of the exited states, not the 'normal' concentration. MatlabFile 4-49. Data_Emission.m function s=Data_Emission % 2-component emission data s.t s.ns s.tau
=[0:1:100]'; =length(s.t); =[10; 30];
% reaction times % number of spectra % life times
s.C_sim(:,1)=exp(-s.t/s.tau(1)); s.C_sim(:,2)=exp(-s.t/s.tau(2)); s.lam=400:10:800;
% concentrations of A % conc. of B % wavelengths
192
Chapter 4
s.A_sim(1,:)=500*gauss(s.lam,500,100); s.A_sim(2,:)=800*gauss(s.lam,600,100); s.nl=length(s.lam); Y0=s.C_sim*s.A_sim;
% Emission spectrum of A % Emission spectrum of B % number of wavelengths
% noise-free data
randn('seed',0); % fixed start for random number generator s.Sig_y=0.5*Y0; % noise proportional to signal %s.Sig_y=sqrt(Y0); % noise proportional to root of signal s.Y=Y0+s.Sig_y.*randn(size(Y0)); plot(s.t,Y0(:,20),s.t,s.Y(:,20),'.'); xlabel('time');ylabel('emission');
Figure 4-51. Noise free (−) and highly noisy (•) emission decay at one particular wavelength.
The noise level used to generate the data shown in Figure 4-51 is proportional to the intensity; it is far too high to be realistic. We use such a level to emphasise the difference between ssq- and χ2-fitting. For a linear fitting exercise, e.g. the calculation of the emission spectra A, we assume that the lifetimes τ, and hence the matrix Csim used for the generation of the measurement, are known. The linear regression has to be performed individually at each wavelength, because at each wavelength λj the appropriate vector σy:,j is different and each weighted matrix Cw, and its pseudo-inverse, needs to be computed independently. There is no equivalent of the elegant A=C\Y notation.

MatlabFile 4-50. Main_Emission_lin.m
% Main_Emission_lin
s=Data_Emission;
% chi square (weighted)
for j=1:length(s.lam)
   C_w=s.C_sim./(repmat(s.Sig_y(:,j),1,2));
   y_w=s.Y(:,j)./s.Sig_y(:,j);
   A_w(:,j)=C_w\y_w;
   R_w(:,j)=y_w-C_w*A_w(:,j);
end
chi_2=sum(sum(R_w.^2))
% least squares (non-weighted)
A=s.C_sim\s.Y;
subplot(2,1,1)
plot(s.lam,s.A_sim,s.lam,A_w);ylabel('emission');
subplot(2,1,2)
plot(s.lam,s.A_sim,s.lam,A);xlabel('wavelength');ylabel('emission');

chi_2 = 4.0551e+003
Figure 4-52. Original (--) and calculated (−) emission spectra as the result of linear regression of very noisy data. Top panel: χ2 – fitting; bottom panel: traditional least-squares fitting.
The figure displays the clearly better defined emission spectra for the χ2-fitting in the top panel. However, considering the high noise level (refer to Figure 4-51), we have to recognise that even the standard least-squares fit delivers useful results.
MatlabFile 4-51. Main_Emission_lin.m …continued
% Main_Emission_lin ...continued
subplot(2,1,1)
plot(s.t,(s.Y-s.C_sim*A_w)./s.Sig_y,'.k');ylabel('r');
subplot(2,1,2)
plot(s.t,s.Y-s.C_sim*A,'.k');xlabel('time');ylabel('r');
Figure 4-53. Top panel: weighted residuals with constant standard deviation of one; bottom panel: uneven distribution of residuals.
The standard deviation of the weighted residuals is equal to one, see Figure 4-53, and χ2 is approximately equal to the number of elements in Y minus the number of fitted parameters, here the number of elements in A, i.e. equal to the degrees of freedom:

χ2 ≈ ns × nl − nc × nl    (4.102)
In our example the numbers are ns=101, nl=41, nc=2 and nc×nl=82. Thus we expect χ2 to be 4059, which is very close to the result of the fit (4055). This is a powerful test for the adequacy of the fit. Knowing the standard deviation of the residuals, we know what value for χ2 to expect. If χ2 is too high, the fit is not good enough. In practice, however, it is difficult to estimate the standard deviations of the errors in the measurement accurately, and a χ2 that is too large could also indicate an underestimation of the standard deviations. Naturally, the argument also works the other way: a χ2 that is too small necessarily indicates an overestimation of the standard deviations of the data.
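This degrees-of-freedom test is easy to verify numerically. The sketch below (Python/NumPy with made-up spectra, not the book's data; the dimensions match the example, ns=101, nl=41, nc=2) simulates data with noise proportional to the signal, performs the weighted regression wavelength by wavelength, and compares χ2 with ns×nl − nc×nl:

```python
import numpy as np

rng = np.random.default_rng(1)
ns, nl, nc = 101, 41, 2                       # spectra, wavelengths, components
t = np.arange(ns)
C = np.column_stack([np.exp(-t / 10.0), np.exp(-t / 30.0)])  # known lifetimes
A = rng.uniform(100, 800, size=(nc, nl))      # made-up positive emission spectra
Y0 = C @ A                                    # noise-free data
Sig = 0.5 * Y0                                # noise proportional to signal
Y = Y0 + Sig * rng.standard_normal(Y0.shape)

chi2 = 0.0
for j in range(nl):                           # weighted fit, one wavelength at a time
    Cw = C / Sig[:, j][:, None]
    yw = Y[:, j] / Sig[:, j]
    a = np.linalg.lstsq(Cw, yw, rcond=None)[0]
    chi2 += np.sum((yw - Cw @ a) ** 2)

dof = ns * nl - nc * nl                       # 4141 - 82 = 4059
print(chi2, dof)                              # chi2 should be close to dof
```

For correctly estimated standard deviations the computed χ2 lands within a few percent of the 4059 degrees of freedom, just as in the Matlab run above.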
Non-Linear χ2-Fitting
We use the same measurement as in the previous section, but this time with a more realistic noise distribution. We replace the line

s.Sig_y=0.5*Y0;

in Data_Emission.m with the line

s.Sig_y=sqrt(Y0);     % noise proportional to root of signal
corresponding to equation (4.101), and now fit the spectra as well as the emission decay lifetimes. Compared with a standard least-squares fit, we need to rewrite the linear regression as just explained; additionally, we have to weight the residuals and change the statistics output at the very end. Note that, compared to equation (4.102), the degrees of freedom are further reduced by the number of non-linear parameters np=2.

MatlabFile 4-52. Main_Emission_weighted.m
% Main_Emission_weighted
s=Data_Emission;                          % get emission data
s.fname ='Rcalc_Emission_weighted';       % file to calc weighted residuals
%s.fname ='Rcalc_Emission';               % file to calc non-weighted residuals
s.tau=[10;35];                            % start parameter vector
s.par_str={'s.tau(1)'; 's.tau(2)'};       % variable parameters
s.par=get_par(s);                         % collects variable parameters into s.par
s=nglm3(s);                               % call nglm3
% sigma_r
sig_r=sqrt(s.ssq/(prod(size(s.Y))-length(s.tau)...
      -(prod(size(s.A_sim)))));
sig_tau=sig_r*sqrt(diag(inv(s.Curv)));    % sigma_par
for i=1:length(s.par)
   fprintf(1,'tau(%i): %g +- %g\n',i,s.tau(i),sig_tau(i));
end
fprintf(1,'sig_r: %g\n',sig_r);

tau(1): 10.0011 +- 0.0805663
tau(2): 29.9019 +- 0.147129
sig_r: 0.9992766
The following routine calculates the weighted residuals:

MatlabFile 4-53. Rcalc_Emission_weighted.m
function [r_w,s]=Rcalc_Emission_weighted(s)
s.C(:,1)=exp(-s.t/s.tau(1));              % concentrations of A
s.C(:,2)=exp(-s.t/s.tau(2));              % conc. of B
for j=1:length(s.lam)
   C_w=s.C./(repmat(s.Sig_y(:,j),1,2));
   y_w=s.Y(:,j)./s.Sig_y(:,j);
   s.A(:,j)=C_w\y_w;
end
R=s.Y-s.C*s.A;
R_w=R./s.Sig_y;
r_w=R_w(:);
s.ssq=sum(r_w.*r_w);                      % sum of squared weighted residuals
The results of a standard non-weighted least-squares fit, i.e. setting all Sig_y(i,j)=1 (in Rcalc_Emission_weighted.m), are similar; the main difference in the results is the standard deviations of the parameters:

tau(1): 10.0822 +- 0.111536
tau(2): 30.1539 +- 0.135565
sig_r: 8.6782
One could argue that the differences in the standard deviations are insignificant and that there is no real advantage in χ2-fitting. In order to shed some additional light on the situation, we performed 1000 χ2 and 1000 least-squares fits using the same data generated with different seeds for the random number generator. Figure 4-54 displays the distributions of the fitted parameters. The white bars represent the χ2-fitting results, the black ones the least-squares results. The means of all four distributions are essentially correct; the standard deviations of these distributions are slightly narrower for the χ2-fitting (χ2-fits: τ1=10.01 ± 0.08, τ2=30.00 ± 0.15; ssq-fits: τ1=9.98 ± 0.18, τ2=29.98 ± 0.22). The difference is hardly breathtaking.
Figure 4-54. Distributions of the fitted parameters for 1000 experiments. Top panel τ1 with a true value of 10 and bottom panel τ2 with a true value of 30. The white bars represent the χ2 fits, the black bars the ssq-fits.
Figure 4-55 displays the distributions of the computed standard deviations resulting from each fit. We expect these standard deviations to be similar to the standard deviations of the distributions given above. The agreement is perfect for the χ2 fitting, as indicated by white bars and arrows, while the
standard least-squares fitting seriously underestimates the standard deviations of the parameters.
Figure 4-55. Distributions of the calculated standard deviations of the fitted parameters for 1000 experiments. Top panel for τ1, bottom panel for τ2. The white bars represent the χ2 fits, the black bars the ssq-fits; the arrows represent the standard deviations of the distributions of Figure 4-54.
4.3.3 Finding the Correct Model

We have mentioned earlier that finding the correct model to fit the data is much more difficult than fitting the data to a given model. Whether the model is right or wrong is not relevant from the point of view of the fitting algorithm. Usually, chemical intuition will guide the choice of models, but there is not always one unique model that can be used. Statistical tests are then the only way to distinguish between the different options. Ockham's razor will always be the guiding principle. It states that the simplest model adequately fitting the data is the 'best' one, or more accurately, the one to accept.

As a general rule, the more parameters are fitted, the smaller ssq will be. In the ideal case, the decrease is significant until the correct complexity of the model is reached and only marginal thereafter. In real life there is often no well-defined change in the decrease of ssq when the correct model is reached. Statistical analysis is well established for the ideal case of pure white noise. Under such conditions the development of the correct model can be guided
by pure statistical analyses. Real data suffer from systematic errors that are much harder to manage.

The Newton-Gauss algorithm delivers standard deviations for the fitted parameters. Ideally, repeating the experiment should result in approximately the same standard deviation for the collection of fitted parameters. This is never the case; at least, we have never experienced such a situation in our laboratories. Calculated standard deviations are reasonably useful for the comparison of the parameters of one data set, but they are not accurate estimates for the standard deviation of the experimental distribution of the parameters.

On a very different note, in Chapter 5, Model-Free Analyses, we introduce methods that attempt a model-free analysis of the data. Typically, a matrix Y is automatically decomposed into the product of the matrices C and A. These analyses are usually not as robust as the fitting discussed in this chapter; however, the results can guide the researcher in finding the correct model.
4.4 General Optimisation
4.4.1 The Newton-Gauss Algorithm

In all the different versions of the Newton-Gauss algorithm we have used so far, we have not directly minimised the sum of squares! The iterative process was driven by computation of the shift vector for the parameters by the equation δp=−J+r(p). The sum of squares was only used to monitor the progress of the fitting and to formulate the termination criterion. Methods that work directly on ssq are often called direct methods. We will see later that for non-linear least-squares fitting, the Newton-Gauss algorithms developed so far are superior. However, there are many optimisation tasks that are of a different nature. In chemistry, data fitting is probably the most important application, but by no means the only one. This chapter provides additional insight into fitting algorithms and also allows expansion of the programs for more general optimisation tasks. It is worth noting that optimisation includes minimisation and maximisation. They are fundamentally identical, and one of the two can always be formulated as the negative of the other.

We start with a simple example. Consider the function y = cos(x)/(log(x) + π), as introduced in equation (3.69) and Figure 3-21. In Chapter 3.3.4, Solving Non-Linear Equations, we solved the equation y = cos(x)/(log(x) + π) = 0. Now the task is to find the value of x at a minimum of the function (near x=10). This is clearly another non-linear problem, and the first thought might be: develop the function into a Taylor series, truncate after the first two elements, etc. This would be equivalent to drawing a tangent at a point on the curve, and it does not result in anything useful. The tangent does not have a minimum that could be used as an improved guess.
For the present task, we need to keep an extra term in the Taylor expansion

f(x + δx) = f(x) + f′(x)·δx + ½ f″(x)·δx²    (4.103)
which is the equation for a parabola. The idea is to approximate the function at the initial point with a parabola, compute the minimum of the parabola and use it as an improved guess for the position of the minimum of the function. An iterative process imposes itself. Refer to Figure 4-56, where the parabola that approximates the curve at x=5 is drawn.
Figure 4-56. A parabola is fitted to the function y = cos(x)/(log(x) + π) at x=5.
Obviously, we have to make sure that the initial position for the parabola is sensible. In any iterative process the choice of initial guesses is important. Fitting a parabola at x=30 does not result in an improvement; also recall Figure 3-21. Instead of developing a program that performs the task as just explained, we move to the 2-parameter case. Subsequently, we generalise to the np parameter case and then we analyse the relationship with the Newton-Gauss algorithm for least-squares fitting.
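The iteration just described, approximate with a parabola, jump to its minimum, repeat, can be sketched in a few lines of Python. The derivatives are taken numerically by central differences; the starting point x=9 (inside the 'bowl' of the minimum near x=10) and the step size h are our choices:

```python
import math

def f(x):
    return math.cos(x) / (math.log(x) + math.pi)

def newton_min(x, h=1e-5, n_iter=20):
    """Fit a parabola (truncated Taylor series) at x and jump to its minimum, repeatedly."""
    for _ in range(n_iter):
        d1 = (f(x + h) - f(x - h)) / (2 * h)           # numerical f'(x)
        d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h**2   # numerical f''(x)
        x = x - d1 / d2                                 # minimum of the local parabola
    return x

x_min = newton_min(9.0)     # initial guess must lie in the bowl of the minimum
print(x_min)                # close to 3*pi, slightly below
```

Starting from a point on the wrong side of an inflection (e.g. x=5, where f″ is negative, or x=30) the same iteration diverges, which is exactly the sensitivity to initial guesses discussed above.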
For the 2-parameter case, consider a function of the kind represented in Figure 4-5, where we plotted the sum of squares as a function of two parameters. In analogy to Figure 4-56, we start with an initial guess for the parameters p1 and p2 and at this point compute the first and second derivatives

∂z/∂p1, ∂z/∂p2, ∂²z/∂p1², ∂²z/∂p2² and ∂²z/∂p1∂p2    (4.104)
Note the generalisation of the nomenclature, using z instead of ssq. The parabolic surface approximates z at the point p1/p2. The quality of the approximation decreases with increasing distance from p1/p2. Having determined the first and second derivatives, either explicitly or numerically, the minimum of the parabolic surface has to be localised. The general equation for a parabolic surface is

z = a1 + a2p1 + a3p2 + a4p1² + a5p2² + a6p1p2    (4.105)

or, in matrix notation,

z = a1 + [a2 a3][p1; p2] + [p1 p2][a4 a6/2; a6/2 a5][p1; p2]
The first derivatives are

∂z/∂p1 = a2 + 2a4p1 + a6p2
∂z/∂p2 = a3 + 2a5p2 + a6p1    (4.106)

or, collected in a vector,

∂z/∂p = [∂z/∂p1; ∂z/∂p2] = [a2; a3] + 2[a4 a6/2; a6/2 a5][p1; p2]

and the second derivatives:
∂²z/∂p1² = 2a4
∂²z/∂p2² = 2a5
∂²z/∂p1∂p2 = ∂²z/∂p2∂p1 = a6    (4.107)

or, collected in a matrix,

∂²z/∂p² = 2[a4 a6/2; a6/2 a5]
At the minimum, we know the two first derivatives to be zero:

[∂z/∂p1; ∂z/∂p2] = [a2; a3] + 2[a4 a6/2; a6/2 a5][p1; p2]min = 0    (4.108)
Thus

[p1; p2]min = −½ [a4 a6/2; a6/2 a5]−1 [a2; a3]    (4.109)
In order to compute the minimum we need to know the coefficients a2 to a6. The polynomial coefficients a2 and a3 are defined by the first derivatives, equation (4.106); the coefficients a4, a5 and a6 are defined by the second derivatives, equation (4.107). It is of course possible to generalise the above equations for any number of parameters. Having a column parameter vector p of length np, we can write:

z(p) = a1 + a2p + pt A3 p
∂z/∂p = a2 + 2A3 p
∂²z/∂p² = 2A3
pmin = −½ A3−1 a2    (4.110)
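Equation (4.110) is easy to check numerically. In the NumPy sketch below the coefficients are arbitrarily chosen (A3 must be symmetric and positive definite for a minimum to exist):

```python
import numpy as np

# arbitrary quadratic z(p) = a1 + a2 p + p^t A3 p
a1 = 3.0
a2 = np.array([1.0, -2.0])           # the row vector a2
A3 = np.array([[2.0, 0.5],
               [0.5, 1.0]])          # symmetric, positive definite

def z(p):
    return a1 + a2 @ p + p @ A3 @ p

p_min = -0.5 * np.linalg.solve(A3, a2)   # p_min = -1/2 A3^-1 a2, equation (4.110)

# the gradient a2 + 2 A3 p must vanish at the minimum
grad = a2 + 2 * A3 @ p_min
print(p_min, grad)
```

The computed gradient is zero to machine precision, and perturbing p_min in any direction increases z, confirming that the stationary point is indeed the minimum.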
where the polynomial coefficients are collected in a scalar a1, a row vector a2 of length np and a matrix A3 of size np×np. Compare equations (4.110) with equations (4.105)-(4.109) for the 2-parameter case.

The next question: How does this compare with what we discussed in Chapter 4.3, Non-Linear Regression? Replacing z in the preceding equations with the sum of squared residuals, ssq, the first derivatives

∂ssq/∂pj = Σi=1..m ∂ri(p)²/∂pj = 2 Σi=1..m (∂ri(p)/∂pj) ri(p) = 2 j:,jt r    (4.111)

thus

∂ssq/∂p = 2Jt r

turn out to be the product 2Jt r, where j:,j is the j-th column of the Jacobian J. The second derivatives, or the Hessian:
∂²ssq/∂pj∂pk = ∂²ssq/∂pk∂pj = ∂/∂pk (∂ssq/∂pj) = ∂/∂pk ( 2 Σi=1..m (∂ri(p)/∂pj) ri(p) )
  = 2 Σi=1..m ( (∂ri(p)/∂pj)(∂ri(p)/∂pk) + ri(p) ∂²ri(p)/∂pj∂pk )
  = 2 ( j:,jt j:,k + Σi=1..m ri(p) ∂²ri(p)/∂pj∂pk )    (4.112)

thus

∂²ssq/∂p² ≈ 2Jt J

turn out to be almost 2JtJ. JtJ is an approximation for the curvature matrix; it is approximately 0.5 times the Hessian matrix of second derivatives of ssq with respect to the parameters. What can we say about the term Σi=1..m ri(p) ∂²ri(p)/∂pj∂pk, which is the
difference between JtJ and the Hessian matrix? It is the sum of the products of the residuals times the second derivatives. Close to the minimum, the elements of r are approximately randomly distributed around zero, with similar numbers of positive and negative elements. Thus the sum of the products approximately cancels and the term is small.

What is the effect on the iterative refinement of the parameters? The minimum is defined by Jtr=0. The curvature matrix is only required to guide the iterative process towards the minimum, and thus the approximation JtJ for the curvature matrix does not compromise the exact location of the minimum. The approximation only results in a slightly different path taken by the algorithm towards the minimum. Ignoring the terms Σi=1..m ri(p) ∂²ri(p)/∂pj∂pk generally does not affect the iterative process in a seriously negative way.

It turns out that the Newton-Gauss algorithm as introduced in Chapter 4.3.1, The Newton-Gauss-Levenberg/Marquardt Algorithm, for the minimisation of the sum of squares is a very elegant and fast way of approximating the curvature matrix, as it only requires the computation of the first derivatives. Minimising the sum of squares directly, as defined by equation (4.110) and its predecessors, is much more computationally intensive, as it requires the calculation of the second derivatives.

Within chemistry, there are not many applications of function minimisation that are not sum-of-squares minimisations. This is why we supply neither a Matlab program that optimises general functions based on the Newton-Gauss algorithm, nor one for the numerical calculation of second derivatives. Additionally, Matlab provides fminunc for unconstrained function optimisation in the Optimisation Toolbox. fminunc has many options that allow optimal usage for a wide range of optimisation problems. Note also that the Newton-Gauss algorithm for function optimisation is the standard option in Excel's Solver.
The following few equations recapitulate the relationship between our 'original' least-squares formulas, introduced in Chapter 4.3, Non-Linear Regression, and those developed here. The first derivatives are:

∂ssq/∂p = a2 + 2A3 p = 2Jt r    (4.113)

and the second derivatives:

∂²ssq/∂p² = 2A3 ≈ 2Jt J    (4.114)

and now the calculation of the shift vector δp:

δp = −(Jt J)−1 Jt r ≈ −½ A3−1 (a2 + 2A3 p) = −½ A3−1 a2 − p

thus

pmin = p + δp = −½ A3−1 a2    (4.115)

which is the same as equation (4.110).
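The practical point, that only first derivatives are needed, can be illustrated with a one-parameter Gauss-Newton fit of a single exponential decay (a Python/NumPy sketch, not one of the book's Matlab routines; the data are noise-free, so the fit converges to the exact lifetime):

```python
import numpy as np

t = np.linspace(0, 100, 101)
tau_true = 10.0
y = np.exp(-t / tau_true)          # noise-free decay, one parameter tau

tau = 15.0                         # deliberately poor initial guess
for _ in range(20):                # Gauss-Newton: dp = -(J^t J)^-1 J^t r
    y_calc = np.exp(-t / tau)
    r = y_calc - y                 # residuals
    J = y_calc * t / tau**2        # dr/dtau: first derivative only
    dtau = -(J @ r) / (J @ J)      # one-parameter shift vector
    tau += dtau

print(tau)
```

No second derivative of the model ever appears; the product JtJ alone steers the parameter to the minimum, where Jtr = 0.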
4.4.2 The Simplex Algorithm

The simplex algorithm is conceptually a very simple method. It is reasonably fast for small numbers of parameters, robust and reliable. For high-dimensional tasks with many parameters, however, it quickly becomes painfully slow. Also, the simplex algorithm does not deliver any statistical information about the parameters; e.g. it is possible to 'fit' parameters that are completely independent of the data, and the algorithm delivers a value without indicating its uselessness.

A simplex is a multidimensional geometrical object with n+1 vertices in an n-dimensional space. In 2 dimensions the simplex is a triangle, in 3 dimensions it is a tetrahedron, etc. The simplex algorithm can be used for function minimisation as well as maximisation. We formulate the process for minimisation.

At the beginning of the process, the functional values at all corners of the simplex have to be determined. Next, the corner with the highest function value is determined. This vertex is deleted and a new simplex is constructed by reflecting the old simplex at the face opposite the deleted corner. Importantly, only one new value has to be determined on the new simplex. The new simplex is treated in the same way: the highest vertex is determined and the simplex reflected, etc. The process is illustrated in Figure 4-57. In the initial simplex the highest value is 14 (we are searching for the minimum) and the simplex has to be reflected at the opposite face, marked in grey. A new functional value of 7 is determined in the new simplex. The next move would be deletion of corner 11 and reflection at the face (8,9,7).
Figure 4-57. The original simplex is reflected at the grey face opposite the corner with the highest value (14).

In the simplex algorithm, the size of the simplex plays an important role. If the simplex is too large, fine details are not covered; if it is too small, progress towards the optimum is painfully slow. In a well-designed algorithm, the simplex should be fairly large at the beginning, as we want to proceed in big steps towards the minimum, but small near the minimum, as we want an accurate resolution of the minimum. The simplex algorithm implemented by Matlab has, in addition to the reflection steps just introduced, additional expansion and contraction steps. The simplex moves fast, in growing steps, towards the minimum in those parts of the function that are unstructured, simple slopes. It shrinks in narrow valleys and close to the minimum.

We do not design our own algorithm here but use the fminsearch.m function supplied by Matlab, which is based on the original Nelder-Mead simplex algorithm. As an example, we re-analyse our exponential decay data Data_Decay.m (see p.106), this time fitting both parameters, the rate constant and the amplitude. Compare the results with those from the linearisation of the exponential curve, followed by a linear least-squares fit, as performed in Linearisation of Non-Linear Problems (p.127). The arguments passed into fminsearch are the name of the function that delivers the function value for the parameters, initial guesses for the parameters to be fitted, an empty matrix (here specific minimisation arguments could be included; refer to the manual for more details), and the actual data t and y. fminsearch returns the optimal parameters.

MatlabFile 4-54. Main_Decay_Simplex.m
%Main_Decay_Simplex
[t,y]=Data_Decay;
par=fminsearch('SsqCalc_Decay',[10 0.02],[],t,y)
ssq=SsqCalc_Decay(par,t,y)

par = 106.8371 0.0540
ssq = 4.3885e+003
MatlabFile 4-55. SsqCalc_Decay.m
function ssq=SsqCalc_Decay(par,t,y)
I_0=par(1);
k=par(2);
y_calc=I_0*exp(-k*t);
r=y-y_calc;
ssq=sum(r.*r);
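A toy version of the simplex idea, reflection of the worst vertex plus a shrink step when the reflection fails, can be written in a few lines (a Python sketch; Matlab's fminsearch implements the full Nelder-Mead scheme with expansion and contraction steps as well):

```python
import numpy as np

def simplex_min(fun, p0, step=1.0, n_iter=300):
    """Toy simplex: reflect the worst vertex; if that fails, shrink towards the best."""
    n = len(p0)
    # initial simplex: the start point plus one displaced vertex per dimension
    S = [np.array(p0, float)] + [np.array(p0, float) + step * np.eye(n)[i] for i in range(n)]
    for _ in range(n_iter):
        S.sort(key=fun)                      # best vertex first, worst last
        centroid = np.mean(S[:-1], axis=0)   # centre of the face opposite the worst
        reflected = 2 * centroid - S[-1]     # mirror the worst vertex through that face
        if fun(reflected) < fun(S[-1]):
            S[-1] = reflected                # accept: only one new function value needed
        else:                                # reflection failed: shrink towards the best
            S = [S[0]] + [0.5 * (v + S[0]) for v in S[1:]]
    return min(S, key=fun)

# quadratic bowl with minimum at (1, 2)
p = simplex_min(lambda p: (p[0] - 1) ** 2 + (p[1] - 2) ** 2, [10.0, -5.0])
print(p)
```

Watching the vertices shows exactly the behaviour described above: the simplex marches across the slope at constant size and then shrinks repeatedly once it straddles the minimum.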
In Figure 4-58 two simplex paths used by fminsearch are represented, one starting from [200 0.04] and the other from [10 0.02]. The moves of the simplex are clearly visible: they grow in size at the beginning and shrink towards the end, close to the minimum.
Figure 4-58. Two paths of the simplex for the fitting of the decay data. Starting parameters are [200 0.04] and [10 0.02].

As the second example, we re-analyse the consecutive reaction A→B→C, Data_ABC.m, where data were 'acquired' at many wavelengths. As outlined in Multivariate Data, Separation of the Linear and Non-Linear Parameters (p.162), it is crucial to eliminate the linear parameters by calculating the matrix A of molar absorptivities as a function of C and thus of the rate constants. In fact, the function SsqCalc_ABC is almost identical to Rcalc_ABC (p.167). The only difference concerns the sum of squares, ssq, which is now returned instead of the residuals.
MatlabFile 4-56. Main_ABC_simplex.m
% Main_ABC_Simplex
[time,lam,Y]=Data_ABC;       % get absorbance data
A_0=1e-3;
k0=[0.01;0.001];             % start parameter vector
[k,ssq] = fminsearch('SsqCalc_ABC',k0,[],A_0,time,Y);
sig_r=sqrt(ssq/(prod(size(Y))-length(k)));  % sigma_r
for i=1:length(k)
   fprintf(1,'k(%i): %g\n',i,k(i));
end
fprintf(1,'sig_r: %g\n',sig_r);

k(1): 0.00301105
k(2): 0.00149269
sig_r: 0.0100068

MatlabFile 4-57. SsqCalc_ABC.m
function ssq=SsqCalc_ABC(k,A_0,t,Y)
C(:,1)=A_0*exp(-k(1)*t);                                 % concentrations of A
C(:,2)=A_0*k(1)/(k(2)-k(1))*(exp(-k(1)*t)-exp(-k(2)*t)); % conc. of B
C(:,3)=A_0-C(:,1)-C(:,2);                                % concentrations of C
A=C\Y;                                                   % elimination of linear parameters
R=Y-C*A;                                                 % residuals
ssq=sum(sum(R.*R));
The results of the simplex optimisation are essentially the same as those produced by the Newton-Gauss fit on p.166. The main differences are the lack of standard deviations for the parameters and the longer computation times.
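The essential trick in SsqCalc_ABC is the line A=C\Y: for every trial set of non-linear parameters, the linear parameters are recomputed exactly, so the optimiser only searches the non-linear space. A Python sketch of the same idea for a single exponential with an unknown amplitude (all names and values here are illustrative):

```python
import numpy as np

t = np.linspace(0, 100, 101)
a_true, k_true = 3.0, 0.05
y = a_true * np.exp(-k_true * t)    # noise-free 'measurement'

def ssq(k):
    """Sum of squares as a function of the non-linear parameter only."""
    c = np.exp(-k * t)              # 'concentration' column for this trial k
    a = (c @ y) / (c @ c)           # linear parameter eliminated: a = c\y
    r = y - a * c
    return r @ r

# brute-force scan over k; the amplitude never enters the search space
ks = np.linspace(0.01, 0.1, 901)
k_best = ks[np.argmin([ssq(k) for k in ks])]
print(k_best)
```

Whatever minimiser is wrapped around ssq (simplex, Gauss-Newton, or the crude scan above), it only sees one parameter instead of two; the search space shrinks accordingly.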
4.4.3 Optimisation in Excel, the Solver

The Excel Solver Add-In is a very powerful tool. We have already used it to solve systems of non-linear equations, see Chapters 3.3.3, Solving Complex Equilibria, and 3.3.4, Solving Non-Linear Equations. The Solver includes optimisation as one of its options. Its main application, within this chapter on data analysis, is data fitting based on the minimisation of sums of squares. There is little information available in the Excel documentation about the algorithms and techniques used by the Solver, but this is irrelevant for most users. In a few representative examples we demonstrate the ability and power of the Solver for non-linear data fitting tasks. Several examples are based on the fitting tasks already solved by the Newton-Gauss-Levenberg/Marquardt method in the earlier parts of this chapter. In the first example we re-analyse the double Gaussian from Data_Chrom.m (p.158).
In the Excel spreadsheet of Figure 4-59, the columns A and B contain the data, the time vector and the vector y of measurements. Columns C and D contain the individual Gaussians (see Chapter 3.2, Chromatography / Gaussian Curves) as defined by the parameters in the cells I3:J5. Column E is the sum of the two Gaussians, i.e. the model for the double Gaussian. The squares of the differences between this model and the data are collected in column F. The sum of all these squared residuals makes up the cell I8. With the present initial guesses for the parameters, the two model Gaussians are visible as the dashed lines, the sum of the two as the full line.

ExcelSheet 4-4. Chapter3.xls-chrom
=I$3*EXP(-(LN(2)*($A3-I$4)^2)/(I$5^2/4))
=SUM(C3+D3)
=(B3-E3)^2
=SUM(F3:F52)
Figure 4-59. The 'measured' double Gaussian and the sum of two Gaussians as defined by the parameters in the spreadsheet, prior to the fit.

The solver window in Figure 4-60 indicates the set-up: the sum of squares in I8 is minimised as a function of the parameters I3:J5. Make sure the Min radio button is selected before hitting the Solve button.
Figure 4-60. The solver window set for the task of fitting the parameters in Figure 4-59.
Figure 4-61. The fitted double Gaussian and its parameters.

The parameters determined by the Solver are virtually identical with those determined by the Newton-Gauss-Levenberg/Marquardt fit; see Main_Chrom.m (p.158). In the next example, we re-analyse the consecutive reaction, Data_ABC.m (p.143) and (p.165). This time, however, we use fewer data in order to keep the Excel spreadsheet reasonably compact. The important concept of treating linear and non-linear parameters separately can be implemented in Excel as well.
ExcelSheet 4-5. Chapter3.xls-kinetics
=SUMXMY2(E3:O13,E21:O31)
=MMULT(MINVERSE(MMULT(TRANSPOSE(A21:C31),A21:C31)),MMULT(TRANSPOSE(A21:C31),E3:O13))
=MMULT(A21:C31,E16:O18)
=Q$4-A21-B21
=Q$4*R$4/(S$4-R$4)*(EXP(-R$4*A3)-EXP(-S$4*A3))
=Q$4*EXP(-R$4*A3)
Figure 4-62. Spreadsheet for the fitting of the reaction scheme A→B→C to multivariate data.

The spreadsheet in Figure 4-62 is heavily matrix based (see Chapter 2 for an introduction to basic matrix functions in Excel); it is the only way to keep the structure reasonably simple. The matrix C in cells A21:C31 is computed in the usual way, see equation (4.63); the parameters required to compute the concentration matrix are in cells Q4:S4, and they include the initial concentration for species A and the two rate constants that are to be fitted. In cells E16:O18 the computation of the best absorptivity matrix A for any given concentration matrix C is done as a matrix equation, as demonstrated in The Pseudo-Inverse in Excel (p.146). Similarly, the matrix Ycalc in cells E21:O31 is written as the matrix product CA. Even the calculation of the sum of squares of the residuals in cell R7 is written in a compact way, using the Excel function SUMXMY2, which is especially designed for this purpose; we refer to the Excel Help for more information on this and similar functions.

The small plots below the data in Figure 4-62 show, from left to right, the calculated concentration profiles, the absorption spectra and the fits at three selected wavelengths. The spreadsheet in Figure 4-62 is shown before the fitting. Application of the Solver results in the rate constants 0.0031 and 0.00161. Due to the smaller number of data, they are not as well defined as the results of the analysis of the complete data set (p.165).
Figure 4-63. The solver window for the fitting of the rate constants in Figure 4-62. To prevent an interchange of the two rate constants, the constraint k1≥k2 has been added (see Constraint: Positive Component Spectra, p.168).
χ2-Fitting in Excel

As a last example, we demonstrate the versatility of the Solver by performing a χ2-fitting of the emission data taken from Data_Emission.m (p.191). In order to keep the Excel spreadsheet reasonably concise, we select the data at one wavelength only (500 nm). At this wavelength, the correct amplitudes for the two species are 500 and 50, with lifetimes of 10 and 30 time units. Data are available from times 0 to 100 time units.

Figure 4-64 displays the results of the χ2-fitting on the left and of normal least-squares fitting on the right. The distribution of the weighted residuals is uniform with a mean of 0 and a standard deviation of 1; the distribution of the residuals from ssq-fitting is non-uniform. More importantly, the fitted parameters are significantly closer to the correct values for the χ2-fitting.

While this analysis seems to be straightforward, there are a few issues that deserve closer attention. How do we define the standard deviation of the error in the measurement? For the χ2-fitting in Non-White Noise, χ2-Fitting (p.189), we assumed that the standard deviations of the errors were known from some independent source and used these values for the weighting of the residuals. As given in equation (4.101), for photon counting experiments the standard deviation of the error of a reading equals the square root of the reading. Thus the root of the measurement is an estimate of the error, and the entry in e.g. cell D11 of the spreadsheet shown in Figure 4-64 should be SQRT(B11). For low intensities, the reading reaches 0, and so too will its estimated error, and the weighting of this reading results in a division by 0
error. In the spreadsheet, we assign an error of 1 for a reading of 0, i.e. 0±1. This issue could be avoided if the error were computed as the square root of the calculated value (e.g. using cell C11 instead of B11). For the correct analysis this would be reasonable, but with wrong models and a poor fit, the estimated errors will be wrong.

ExcelSheet 4-6. Chapter3.xls-emission
=SUM(F11:F110)
=$B$2*EXP(-A11/$B$1)+$B$4*EXP(-A11/$B$3)
=$I$2*EXP(-A11/$I$1)+$I$4*EXP(-A11/$I$3)
=IF(B11>0,SQRT(B11),1)
=(B11-C11)/D11
=E11^2
=(B11-H11)
=I11^2
=I11^2
Figure 4-64. The result of the χ2-fit (left) and sum-of-squares fit (right). Note the uniformly distributed weighted residuals for the χ2-fit.

An additional observation for photon counting data: there are no fractions of photons, and thus the count can only be an integer. The 'measurements' in column B are therefore rounded down to the nearest integer. It seems reasonable to do the same with the calculated values in column C. However, a test in Excel reveals that such an attempt does not work. The reason is that the Solver's Newton-Gauss algorithm requires the computation of the derivatives of the objective (χ2 or ssq) with respect to the parameters; rounding would destroy the continuity of the function and effectively wipe out the derivatives.
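The zero-count guard of the spreadsheet formula =IF(B11>0,SQRT(B11),1) translates directly into code (a sketch; the replacement error of 1 count for a zero reading is the book's choice):

```python
import numpy as np

counts = np.array([0, 1, 4, 100, 2500])
# sigma = sqrt(count), but a reading of 0 is treated as 0 +- 1
sigma = np.where(counts > 0, np.sqrt(counts), 1.0)
print(sigma)    # sigma = [1, 1, 2, 10, 50]
```

The same vector of standard deviations then serves as the divisor for the weighted residuals, exactly as column D does in the spreadsheet.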
5 Model-Free Analyses

In the preceding Chapter 4, Model-Based Analyses, we investigated how a given measurement is analysed based on a predetermined model. The model can be a simple mathematical function, such as a polynomial that is fitted to a mono-variate data set; it can also be a complex chemical system, such as an oscillating chemical reaction. In Chapter 4, we provided a range of Matlab routines for this task. In Chapter 4.3.3, we touched on the subject of determining the right model to be fitted, say the degree of the polynomial, the exact chemical reaction mechanism in kinetics, or the correct equilibrium model in a titration experiment. This task, unfortunately, is a very difficult one. There are no generally applicable tools available to guide the researcher towards finding the model that correctly describes the chemical process under investigation. Model-fitting is much easier than model-finding.

There is a good collection of methods available for performing so-called model-free or soft-modelling analyses. What exactly does this mean? Why should anyone be bothered with trying to find the correct physical/chemical model for successful data fitting, if there are model-free analyses that deliver satisfactory results? One drawback of these model-free methods is that they do not deliver crisp and directly useful results such as a set of rate constants in a kinetic investigation or equilibrium constants in a complexation study. Typically, these methods deliver the shapes of the concentration profiles of all reacting components as well as the shapes of their absorption spectra. Such information can be very useful in supplying preliminary information about the system under investigation and ultimately could guide the researcher towards the correct model. In many instances, however, there is no model or mathematical function at all that could be used to quantitatively describe the process under investigation.
Then, the concentration profiles, in conjunction with the pure component spectra, are all there is to be extracted from the data. There is nothing that could be added to the results of these model-free analyses. An example is chromatography. There are no generally applicable functions that could form the basis for a model-fitting approach. Model-free analyses are essentially all there is. Secondary analysis, such as a library search of the computed component spectra, is possible and can be useful. Most, but not all model-free methods are based on Factor Analysis and we start this chapter with a fairly detailed and comprehensive discussion of this topic.
5.1 Factor Analysis, FA

The term 'Factor Analysis', FA, has a very wide range of interpretations; there is no general agreement on its exact meaning. From an abstract
mathematical point of view, Factor Analysis is easily defined: it is the decomposition of a matrix into a product of two, three or more matrices. Such an interpretation of the term is too general and not very useful, even though any such decomposition could, of course, be called factor analysis. In 'proper' Factor Analysis, the resulting factor matrices have very particular properties: they are orthogonal matrices − sometimes they are even orthonormal (see Orthogonal and Orthonormal Matrices, p.25). There is still a long list of different interpretations for the expression Factor Analysis. All the meanings of the term can be explained on the basis of the Singular Value Decomposition.
5.1.1 The Singular Value Decomposition, SVD

The Singular Value Decomposition, SVD, has superseded earlier algorithms that perform Factor Analysis, e.g. the NIPALS or vector iteration algorithms. SVD is one of the most stable, robust and powerful algorithms in numerical computing. It is clearly the only algorithm that should be used for any calculation in the realm of Factor Analysis. According to the SVD, any matrix Y can be decomposed into the product of three matrices
Y = USV   (5.1)
Please note that we continue with our preference of not explicitly indicating column and row matrices. It is common to write equation (5.1) as Y=USVt, indicating that V is a column matrix that needs to be transposed for the Singular Value Decomposition. Matlab, too, uses this transposed notation. It is not possible to consistently, logically and generally distinguish between row and column matrices. In many matrices both rows and columns have a particular meaning, e.g. in kinetics, the matrix Y has spectra as rows and kinetic wavelength traces as columns; Y is neither a row nor a column matrix. We decided to drop the transposition of V and write the SVD as in (5.1). The price to pay is that we need to remember that Matlab's SVD routine returns Vt.
Figure 5-1. Graphical representation of the Singular Value Decomposition.
The dimensions are as follows: Y is an m×n matrix where m≥n, U is an m×n matrix as well, while S and V are n×n matrices. Matlab delivers the above 'economy sized' dimensions only if the following command is used: [U,S,Vt]=svd(Y,0);
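For readers working outside Matlab, the economy-sized versus full decomposition can be illustrated with NumPy; this is a translation for illustration, with the caveat that NumPy returns the singular values as a vector rather than as a diagonal matrix S.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((10, 4))     # an m-by-n matrix with m >= n

# economy-sized SVD -- the analogue of Matlab's svd(Y,0);
# U is m-by-n, s holds the n singular values, Vt is n-by-n
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

# full SVD -- the analogue of svd(Y); U becomes m-by-m,
# larger but without additional useful information
Uf, sf, Vtf = np.linalg.svd(Y, full_matrices=True)
```

Either way the singular values come out identical and in descending order, and the economy-sized factors reproduce Y exactly.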
If the 0 is not included in the list of parameters passed into svd ([U,S,Vt]=svd(Y)), the resulting dimensions are U (m×m), S (m×n) and Vt (n×n). The matrices are larger but do not contain additional useful information. The important special properties of the three product matrices U, S and V are the following: S is a diagonal matrix, containing the so-called singular values in descending order. Note that the singular values of real matrices are always positive and real. U and V are orthonormal matrices, which means they are comprised of orthonormal vectors. In matrix notation:

UtU = VVt = I   (5.2)

where I is the identity matrix of dimensions (n×n). A more traditional notation for Factor Analysis is
Y = TL   (5.3)

T is often called the score matrix and L the loadings matrix. The relationship between decompositions (5.1) and (5.3) is

T = US and L = V   (5.4)
L contains normalised rows, while T is weighted by the matrix S. This, however, is somewhat ambiguous, as the decomposition of the transposed matrix Yt is equally possible, and then the score and loading matrices are simply exchanged. For this reason, we do not use the expressions 'scores' and 'loadings'. The Singular Value Decomposition maintains some kind of symmetry between the decompositions of Y and Yt. The matrices U and V contain, as columns and rows respectively, the eigenvectors of the square matrices YYt and YtY. This can easily be shown using the Singular Value Decomposition
YYtU = USVVtSUtU = US² = UΛ   (5.5)
Remember that the eigenvectors of a matrix are those vectors that, when multiplied by the matrix, become multiples of themselves. As Λ=S² is a diagonal matrix, each column of the product UΛ is a multiple of the corresponding column of U, and thus the columns of U are eigenvectors of YYt. The diagonal elements of Λ=S² are the eigenvalues for the corresponding columns of U. In a similar way we can prove the relationship for V:
YtYVt = VtSUtUSVVt = VtS² = VtΛ   (5.6)
Let us confirm all of this using a few Matlab lines:

MatlabFile 5-1. Main_SVD1.m
% Main_SVD1
rand('state',0)       % initialise rand. number generator
Y=rand(4,3)           % random numbers
[U,S,Vt]=svd(Y,0)     % economy sized svd
UtU=U'*U
VVt=Vt'*Vt

Y =
    0.9501    0.8913    0.8214
    0.2311    0.7621    0.4447
    0.6068    0.4565    0.6154
    0.4860    0.0185    0.7919
U =
   -0.7164   -0.1651   -0.5108
   -0.3821   -0.5813    0.7183
   -0.4557    0.1157   -0.1563
   -0.3649    0.7883    0.4457
S =
    2.1373         0         0
         0    0.6248         0
         0         0    0.2538
Vt =
   -0.5721    0.2594   -0.7781
   -0.5355   -0.8367    0.1148
   -0.6212    0.4823    0.6176
UtU =
    1.0000   -0.0000    0.0000
   -0.0000    1.0000    0.0000
    0.0000    0.0000    1.0000
VVt =
    1.0000   -0.0000   -0.0000
   -0.0000    1.0000   -0.0000
   -0.0000   -0.0000    1.0000
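The eigenvector relations (5.5) and (5.6) can be cross-checked numerically outside Matlab as well; a minimal Python/NumPy sketch (a translation for illustration, not the book's code):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.random((4, 3))
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

# equation (5.5): the columns of U are eigenvectors of Y*Y'
# with eigenvalues s^2 (the diagonal of Lambda)
lhs_u = (Y @ Y.T) @ U
rhs_u = U * s**2                 # each column of U times its eigenvalue

# equation (5.6): the rows of Vt (V in the book's notation)
# are eigenvectors of Y'*Y with the same eigenvalues
lhs_v = (Y.T @ Y) @ Vt.T
rhs_v = Vt.T * s**2
```

Both identities hold to machine precision: multiplying the eigenvector matrices by YYt or YtY only scales each column by the corresponding eigenvalue.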
It is hard to imagine at this stage how powerful this fairly simple and straightforward Singular Value Decomposition is. An astonishing wealth of information can be extracted from the decomposition; the next few chapters deal with that. As a reminder: there is no model or any chemical knowledge required for the SVD. It is completely automatic; no user input of any kind is required. SVD is the core of almost all model-free methods.
5.1.2 The Rank of a Matrix

The rank of a matrix Y is the number of linearly independent rows or columns in this matrix. The columns of Y are linearly dependent if one of the column vectors y:,j can be written as a linear combination of the other columns. The same holds for rows.
y:,j = Σ(i≠j) y:,i ai        yj,: = Σ(i≠j) yi,: bi   (5.7)
ai and bi represent any corresponding coefficients. There is an immediate relationship between rank and the chemical process under investigation. Consider a matrix Y of perfect, noise-free spectra, measured during a chromatographic experiment. Assume there are 3 components with different spectra and all 3 components are at least partially resolved. In this situation, every row of Y is a linear combination of the three component spectra. Thus the maximum number of linearly independent rows is 3 and therefore the rank of the matrix is 3. The same holds for the columns of Y: each chromatogram is a linear combination of the 3 component concentration profiles. The row and column ranks of any matrix Y are always identical. In a casual way one could state that the rank of the matrix equals the number of different species that exist in the mixture. However, such a statement is not generally true and needs to be qualified in several ways:
(a) only species that absorb in the wavelength range contribute to the rank, e.g. solvents and electrolytes often do not absorb and thus do not contribute to the rank;
(b) the active species need to have distinguishable spectra or, more precisely, they need to have linearly independent spectra;
(c) the concentration profiles need to be linearly independent, i.e. two exactly co-eluting species cannot be distinguished and only one contributes to the total rank;
(d) the species need to take part in the process, i.e. they have to change concentration. Spectator concentration profiles are just a constant and any number of such components will increase the rank by a total of one;
(e) the statements above only apply directly to noise-free data.
A first question in model-free analysis is: how many components are there in a system? Or, in other words, what is the rank of the matrix Y? In particular, what is the influence of noise? Providing an answer to these questions is a first, extremely powerful result of SVD.
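These points can be illustrated numerically. A small Python/NumPy sketch (using hypothetical random profiles and spectra, not the book's data) shows that a noise-free 3-component matrix has rank 3, while even a little noise pushes the purely mathematical rank up to the smaller matrix dimension:

```python
import numpy as np

rng = np.random.default_rng(0)

# three linearly independent concentration profiles and spectra
C = rng.random((50, 3))
A = rng.random((3, 21))
Y = C @ A                          # noise-free data matrix

rank_clean = np.linalg.matrix_rank(Y)   # 3 = number of components

# with noise, the purely mathematical rank jumps to min(m, n)
Y_noisy = Y + 1e-3 * rng.standard_normal(Y.shape)
rank_noisy = np.linalg.matrix_rank(Y_noisy)
```

The interesting question, pursued below, is therefore not the mathematical rank of the noisy matrix but the number of singular values that stand clear of the noise.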
Equation (5.1) can be written in a different way
Y = Σ(i=1..n) u:,i si,i vi,:   (5.8)

Recall that the singular values si,i are ordered in decreasing magnitude, s1,1 ≥ s2,2 ≥ s3,3 ≥ … ≥ sn,n. This means the eigenvectors u:,i in U and vi,: in V continuously lose importance, and once the singular values si,i are small enough, their contribution can be ignored altogether. In the ideal noise-free case, all 'small' singular values are zero; with real, noisy data they are merely 'small'. So, instead of summing over all n terms in equation (5.8), we only sum over the ne significant terms, often referred to as principal components.

Y ≈ Σ(i=1..ne) u:,i si,i vi,:   (5.9)

There are many advantages in selecting only the significant ne eigenvectors and singular values for the representation of Y. In fact, from now on we only use this selection and introduce an appropriate nomenclature.

Ȳ = Σ(i=1..ne) u:,i si,i vi,: = Ū S̄ V̄   (5.10)
Where Ū, S̄ and V̄ contain the significant parts of the total matrices U, S and V. The graphical representation is instructive:

Figure 5-2. The shapes of the matrices after selecting the significant parts of U, S and V.

The matrices Ū and V̄ are much thinner and S̄ is smaller. Depending on the dimensions of Y and the number of significant eigenvalues, the reduction in the sizes of U, S and V can be dramatic. Real data are never noise-free and in purely mathematical terms, the rank of a noisy data matrix is always the smaller of the number of rows or columns. So, the question obviously is: where do we stop? What is the correct number of independent species or the correct rank of the matrix Y? How many singular values are statistically relevant? Most importantly for the chemist: what is the practical or the chemical rank; how many components are there in the system?
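The truncation of equation (5.10) is easy to sketch in Python/NumPy (an illustrative translation; the 3-component data here are random stand-ins, not the book's example):

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.random((100, 3))
A = rng.random((3, 21))
Y = C @ A + 1e-3 * rng.standard_normal((100, 21))

U, s, Vt = np.linalg.svd(Y, full_matrices=False)

ne = 3                           # number of significant components
U_bar = U[:, :ne]                # 100 x 3 instead of 100 x 21
S_bar = np.diag(s[:ne])          # 3 x 3
V_bar = Vt[:ne, :]               # 3 x 21
Y_bar = U_bar @ S_bar @ V_bar    # rank-ne representation of Y

resid = Y - Y_bar                # what the truncation discards
```

The truncated product stores 100×3 + 3 + 3×21 numbers instead of 100×21, and the discarded residual is on the order of the added noise.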
There is extensive literature on this question. We do not examine this subject from the statistical point of view in any detail. Instead, we start with three graphical ways of answering the question and include a crude statistical analysis.

Magnitude of the Singular Values

It is easiest to examine an example. We generate a set of three overlapping peaks in a chromatogram, add two different levels of noise and analyse the two data sets.

MatlabFile 5-2. Data_Chrom2.m
function [t,lam,Y,C,A]=Data_Chrom2
lam=400:10:600;
A(1,:)=1000*gauss(lam,450,120);  % molar component spectra A
A(2,:)=2000*gauss(lam,350,120);
A(3,:)=1000*gauss(lam,500,50);
t=(1:1:100)';
C(:,1)=1e-3*gauss(t,35,15);      % elution profiles C
C(:,2)=9e-4*gauss(t,50,16);
C(:,3)=2e-3*gauss(t,70,17);
Y=C*A;                           % absorbance data Y
randn('seed',0);
Y=Y+1e-3*randn(size(Y));

Figure 5-3 displays two data matrices, used to demonstrate different ways of estimating the rank of a matrix. The top matrix has a noise level of 10⁻³ and the lower one of 1.01×10⁻¹. The mean of all elements of Y is about 0.2 and the maximum is 2. Thus, the noise levels amount to some 0.5% and 50% of the mean and 0.05% and 5% of the maximal value of Y.

MatlabFile 5-3. Main_SVD2.m
% Main_SVD2
[t,lam,Y,C,A]=Data_Chrom2;       % generating data
[U,S,Vt]=svd(Y,0);               % do svd
Y1=Y+1e-1*randn(size(Y));        % add additional noise
[U1,S1,Vt1]=svd(Y1,0);           % do another svd
subplot(2,1,1); mesh(lam,t,Y); axis tight
xlabel('wavelength');ylabel('time'); zlabel('abs');
subplot(2,1,2); mesh(lam,t,Y1); axis tight
xlabel('wavelength');ylabel('time'); zlabel('abs');
Figure 5-3. Two matrices Y with noise levels of 10⁻³ and 1.01×10⁻¹.

Logarithmic plots of the magnitude of the singular values are often instructive and allow a simple analysis.

MatlabFile 5-4. Main_SVD2.m …continued
% Main_SVD2, ...continued
subplot(2,1,1);
plot(1:length(lam),log10(diag(S)),'+');
ylabel('log(S)'); axis([0 25 -3 2]);
subplot(2,1,2);
plot(1:length(lam),log10(diag(S1)),'x');
ylabel('log(S1)'); axis([0 25 -3 2]);
Plots of the kind shown in Figure 5-4 are often crisp and clear, as is the case in the top panel; sometimes less so, as can be seen in the lower panel. The rank is the number of singular values above the noise level, which is represented by the series of much smaller and usually similar singular values. In both panels the rank can clearly be identified as three. Naturally, the difference between significant and noise singular values is much easier to discern for the measurement with a small noise level than it is for the increased noise level. If significantly more noise is present in the data, the third singular value 'disappears' in the noise and there will only be two significant ones remaining. Eventually, with high enough noise, the second singular value also disappears. It is interesting to observe that the significant singular values are hardly affected by increasing noise, while the noise singular values move up together.
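The singular-value gap can be reproduced in Python/NumPy. The sketch below mirrors Data_Chrom2.m under the assumption that the book's gauss.m is a unit-height Gaussian whose third argument acts as the peak width; that convention is an assumption of this translation, not something stated in the listing.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss(x, centre, width):
    # stand-in for the book's gauss.m: unit-height Gaussian (assumed shape)
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

lam = np.arange(400.0, 601.0, 10.0)
t = np.arange(1.0, 101.0)
A = np.vstack([1000 * gauss(lam, 450, 120),   # molar component spectra
               2000 * gauss(lam, 350, 120),
               1000 * gauss(lam, 500, 50)])
C = np.column_stack([1e-3 * gauss(t, 35, 15),  # elution profiles
                     9e-4 * gauss(t, 50, 16),
                     2e-3 * gauss(t, 70, 17)])
Y = C @ A + 1e-3 * rng.standard_normal((t.size, lam.size))

s = np.linalg.svd(Y, compute_uv=False)   # singular values, descending
```

Three singular values stand well clear of the noise floor formed by the remaining, much smaller and mutually similar ones, so the rank is read off as three.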
Figure 5-4. Log of the singular values for a 3 component chromatogram. Upper panel (+) for the data matrix Y with a noise level of 1×10⁻³ and lower panel (×) for Y1 with additional noise of 1×10⁻¹.
The Structure of the Eigenvectors

In a similar, slightly more complex, eye-based analysis, one can investigate the noisiness of the eigenvectors. The real eigenvectors or principal components are smooth: they have broad structures, while noise eigenvectors oscillate wildly and show no underlying structure. Of course, as before, the difference can be more or less pronounced. We analyse the same data as before:

MatlabFile 5-5. Main_SVD2.m …continued
% Main_SVD2, ...continued
subplot(2,1,1)
plot(t,U(:,1:4),'-'); hold on;
plot(t,U(:,3),'-','LineWidth',3); hold off;
xlabel('time');ylabel('U');
subplot(2,1,2)
plot(t,U1(:,1:4),'-'); hold on
plot(t,U1(:,3),'-','LineWidth',3); hold off
xlabel('time');ylabel('U1');
Figure 5-5. Different noisiness of eigenvectors resulting from SVD of the data in Figure 5-3. The 3rd eigenvector is highlighted.

The upper panel of Figure 5-5 shows a clear distinction between the first three real eigenvectors, while the 4th eigenvector represents pure noise. Note that the third eigenvector is highlighted in both panels. In the lower panel all eigenvectors are noisier; in particular, the 3rd eigenvector only just shows some broad structure that is almost completely hidden by the relatively large amount of noise. Another interesting observation can be made: the signs of the eigenvectors are not defined − they result arbitrarily from the Singular Value Decomposition. Apart from the amount of noise, the matrices Y and Y1 are identical, but the resulting eigenvectors have opposite signs.

The Structure of the Residuals
Ȳ in equation (5.9) is a good representation of the original matrix Y, but not identical to it. There is a residual matrix of decreasing significance the more eigenvectors are used to compute Ȳ.

Y = Ȳ + R   (5.11)
Similar to the structure of U and V that reveals the significance of the eigenvectors, the structure of R allows the identification of the correct rank. MatlabFile 5-6. Main_SVD2.m …continued % Main_SVD2, ...continued
for i=0:3
  R=Y-U(:,1:i)*S(1:i,1:i)*Vt(:,1:i)';
  subplot(2,2,i+1)
  mesh(lam,t,R); axis tight
  xlabel('wavelength');ylabel('time');
  zlabel(['R(' num2str(i) ')']);
end
Figure 5-6. The structure of the residuals after the subtraction of the contribution of the first 0, 1, 2 and 3 eigenvectors, see equations (5.9) and (5.11).

In this example, we are analysing the data set with the low noise level and, accordingly, the distinction between structured and noise residuals is crisp and unambiguous. Only noise is left after the subtraction of the contributions of three eigenvectors, equations (5.9) and (5.11).

The Standard Deviation of the Residuals

As an alternative to observing the structure of the residuals, statistical information about their magnitude is also readily available. In essence, after removing the correct number of eigenvalues, the standard deviation of the residuals should be the same as the noise level of the instrument, or in our case, the level of noise added.
MatlabFile 5-7. Main_SVD2.m …continued
% Main_SVD2, ...continued
for i=0:5
  R=Y-U(:,1:i)*S(1:i,1:i)*Vt(:,1:i)';
  sig_R(i+1)=std(R(:));
end
sig_R

sig_R =
    0.3492    0.2085    0.0591    0.0009    0.0009    0.0008
Recall, the standard deviation of the added noise in Y was 1×10⁻³. It is reached approximately after the removal of 3 sets of eigenvectors (the fourth entry of sig_R). Note that, from a strictly statistical point of view, it is not quite appropriate to use Matlab's std function for the determination of the residual standard deviation, since it does not properly take into account the gradual reduction in the degrees of freedom in the calculation of R. But it is not our intention to go into the depths of statistics here. For more rigorous statistical procedures to determine the number of significant factors, we refer to the relevant chemometrics literature on this topic.
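The sig_R computation translates directly to Python/NumPy; in this sketch the chromatographic data are rebuilt under the assumption that the book's gauss.m is a unit-height Gaussian with the third argument as its width (an assumption of this translation):

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss(x, centre, width):
    # stand-in for the book's gauss.m: unit-height Gaussian (assumed shape)
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

lam = np.arange(400.0, 601.0, 10.0)
t = np.arange(1.0, 101.0)
A = np.vstack([1000 * gauss(lam, 450, 120),
               2000 * gauss(lam, 350, 120),
               1000 * gauss(lam, 500, 50)])
C = np.column_stack([1e-3 * gauss(t, 35, 15),
                     9e-4 * gauss(t, 50, 16),
                     2e-3 * gauss(t, 70, 17)])
Y = C @ A + 1e-3 * rng.standard_normal((t.size, lam.size))

U, s, Vt = np.linalg.svd(Y, full_matrices=False)

# std of the residuals after removing 0..5 sets of eigenvectors
sig_R = []
for i in range(6):
    R = Y - U[:, :i] @ np.diag(s[:i]) @ Vt[:i, :]
    sig_R.append(R.std())
```

As in the Matlab run, the residual standard deviation drops steeply until three sets of eigenvectors are removed and then levels off near the added noise level of 10⁻³.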
5.1.3 Geometrical Interpretations
Two Components

So far in this chapter, all our elaborations were completely abstract; there has been no attempt at an interpretation or understanding of the results of Factor Analysis in chemical terms. Abstract Factor Analysis is the core of most applications of Factor Analysis within chemistry but, nevertheless, much more insight can be gained than the results of the rank analysis we have seen so far. How can we relate the factors U and V to something chemically meaningful? Very sensibly, these factors are called abstract factors, in contrast to real factors such as the matrices C and A containing the concentration profiles and pure component spectra. Is there a useful relationship between U, V, C and A? Let us start with an example: the Matlab function Data_AB.m models the absorption spectra of a reacting solution as a function of time. They are stored as rows of the matrix Y. The reaction is a simple first order reaction A → B as introduced in Chapter 3.4.2, Rate Laws with Explicit Solutions. Recall Beer-Lambert's law (Chapter 3.1):
Y = CA   (5.12)

MatlabFile 5-8. Data_AB.m
function [t,lam,Y,C,A]=Data_AB
A_0=1e-3;                        % init. conc. A
k=2e-2;
lam=400:10:600;
A(1,:)=1000*gauss(lam,450,120);  % component spectra
A(2,:)=2000*gauss(lam,350,120);
t=(1:2:100)';
C(:,1)=A_0*exp(-k*t);            % [A]
C(:,2)=A_0-C(:,1);               % [B] (closure)
Y=C*A;
randn('seed',0);
Y=Y+1e-3*randn(size(Y));

MatlabFile 5-9. Main_plot_AB.m
% Main_Plot_AB
[t,lam,Y,C,A]=Data_AB;
plot(lam,Y,'-k');
xlabel('wavelength');
ylabel('absorbance');
Figure 5-7. Observed spectra during a reaction A → B.

Each spectrum yi,: (the i-th row of Y) in Figure 5-7 and equation (5.12) is an nl-dimensional vector (in the example, the number of wavelengths is nl=21). We cannot represent, nor can we comprehend, such a high dimensional vector, but we can do a reduced representation in three dimensions, without losing many important aspects. The equivalent would be to measure the spectra at three wavelengths only. Figure 5-8, left, shows a spectrum recorded at three wavelengths; right, its vector representation, where the absorbances at the three wavelengths form the coordinates in a 3-dimensional space.
Figure 5-8. The spectrum vector yi,: measured at three wavelengths and its representation in a 3-dimensional space.

The original spectra, measured at 21 wavelengths, are of course represented as 21-dimensional spectral vectors in a 21-dimensional space. An important question arises: where in the 21-dimensional space (or 3-dimensional space) can all the measured vectors be found? Is it possible to restrict the potential locations of the spectral vectors to a subspace? A first restriction is obvious: as absorbances can only be positive, only those parts of the space with positive coordinates are available to the spectral vectors. Is there anything more specific? Figure 5-9 represents this question in a 3-dimensional space.
Figure 5-9. Possible (?) path for the intermediate spectra yi,: as a function of the reaction time.
The vector y1,: represents the initial spectrum of the reaction solution containing only the component A with concentration [A]0. The i-th spectrum is represented by the i-th row vector yi,:. The vector ym,: is the last measured spectrum, the m-th row of Y. If the reaction were finished, the final solution would only contain B with the concentration [A]0. What is the path taken by the series of measured spectra? (It is easy to guess that it is not the wild loop shown in Figure 5-9.) First we recognise that each spectrum yi,: is a linear combination of the two spectra of the components A and B. The row vectors a1,: and a2,:, containing the molar absorption spectra, are multiplied by the concentrations [A] and [B].
yi,: = [A] a1,: + [B] a2,: = ci,1 a1,: + ci,2 a2,: = ci,: A   (5.13)

All intermediate spectra yi,: are linear combinations of the molar component spectra aj,: and therefore they all lie in a plane defined by these component spectra. This is a first, very important result! Can the spectra be localised more precisely? We know that the sum of the two component concentrations is constant, [A]+[B]=[A]0 or, equivalently, ci,1+ci,2=ctot. A bit of algebra demonstrates that any spectrum yi,: is the sum of a fixed vector ctot a1,: plus the difference vector (a2,:−a1,:) multiplied by the concentration [B].

yi,: = [A] a1,: + [B] a2,: = ([A]0 − [B]) a1,: + [B] a2,: = [A]0 a1,: + [B](a2,: − a1,:) = y1,: + [B](a2,: − a1,:) = ctot a1,: + ci,2 (a2,: − a1,:)   (5.14)
All spectra yi,: lie on a straight line between the initial and the final spectrum. See Figure 5-10 for a graphical representation. This figure also represents the plane in which all the action occurs. The visible part is shown as the grey triangle. It is limited by the positive part of the space, i.e. where all 3 coordinates are positive. Note also that [A]0 is about 0.7M: y1,:≈0.7a1,:. Further, the final spectrum ym,: has not reached the vector a2,:. The measurements were stopped before the reaction reached completion. The reason for the spectra lying on a straight line is the result of the fact that the sum of the concentrations is constant. We call such a system a closed system.
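The straight-line consequence of closure is easy to verify numerically; a Python/NumPy sketch with hypothetical random component spectra (not the book's Data_AB.m) shows that all difference vectors yi,: − y1,: are multiples of (a2,: − a1,:), i.e. the differences span a rank-1 space:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical molar spectra of A and B at 21 wavelengths
a1 = rng.random(21)
a2 = rng.random(21)

# closed system: the concentrations always add up to the initial value A0
A0 = 1e-3
t = np.arange(1.0, 100.0, 2.0)
cA = A0 * np.exp(-2e-2 * t)
cB = A0 - cA                              # closure: cA + cB = A0
Y = np.outer(cA, a1) + np.outer(cB, a2)   # noise-free spectra as rows of Y

# differences to the first spectrum all point along (a2 - a1)
D = Y - Y[0]
rank_D = np.linalg.matrix_rank(D)         # 1: the spectra lie on one line
rank_Y = np.linalg.matrix_rank(Y)         # 2: two absorbing components
```

Without the closure relation cA + cB = A0 the differences would span a two-dimensional space; with it, the spectra are confined to a line inside the two-component plane.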
Figure 5-10. Graphical representation of equations (5.13) and (5.14). The grey triangle represents the plane of action.
Reduction in the Number of Dimensions

Back to Factor Analysis and eigenvectors. The fact that in a two component system all intermediate spectra lie on a plane allows us to define the position of the spectra on that plane with only two coordinates. Or specifically, if spectra were acquired at 1024 wavelengths (with a diode array instrument), the original spectra vectors are defined by 1024 coordinates, but they could be defined by only 2 coordinates. A tremendous reduction! Recall that Figure 5-9 and Figure 5-10 are simplifications for human consumption; they are misleading, since usually spectra are measured at many more than three wavelengths and need to be represented by vectors in a much higher dimensional space. To be able to represent the spectral vectors in the plane, we need a system of axes, preferably an orthonormal system. As it turns out, the two eigenvectors V̄ form an orthonormal system of axes in that plane. This is represented in Figure 5-11.
Figure 5-11. The eigenvectors V̄ form a system of orthonormal axes in the plane spanned by the spectra.

The dark grey part of the plane is the same as in Figure 5-10. The first eigenvector v1,: is approximately parallel to the average of all measured spectra. The second eigenvector v2,: is orthogonal to v1,: and thus has negative elements. To indicate that fact, the grey plane has been expanded into the region of negative values. We are now in a position to grab that plane, turn it around and put it onto the plane of the paper. Figure 5-12 represents the new situation. The next question arises immediately: how do we determine the coordinates b of the spectral vectors yi,: in this new system of axes V̄?

yi,: = b V̄   (5.15)

Due to the orthonormality of V̄, this is a particularly simple linear regression calculation. The vector b is computed as:

b = yi,: V̄⁺ = yi,: V̄t (V̄V̄t)⁻¹ = yi,: V̄t   (5.16)
Figure 5-12. Everything on a plane defined by the system of axes v1,: and v2,:.

Equation (5.15) holds for one specific vector yi,:. Naturally, it can be expanded into a matrix equation for all yi,:'s in Y.
Y = B V̄  and  B = Y V̄t   (5.17)

Recall equation (5.10), Y = Ū S̄ V̄, and thus

B = Ū S̄ V̄ V̄t = Ū S̄   (5.18)

The rows of the matrix ŪS̄ are the coordinates of the spectral vectors Y in the coordinate system V̄. (Note, we use the notation (ūs̄) for the product Ū×S̄.) This is represented in Figure 5-13.
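That the coordinate matrix B equals ŪS̄ can be checked in a few lines; a Python/NumPy sketch with hypothetical two-component data (random stand-ins, not the book's example):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical two-component data Y = C A with a little noise
C = rng.random((50, 2))
A = rng.random((2, 21))
Y = C @ A + 1e-4 * rng.standard_normal((50, 21))

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
V_bar = Vt[:2, :]                 # orthonormal axes of the spectral plane

B = Y @ V_bar.T                   # coordinates: projection onto the axes
US_bar = U[:, :2] * s[:2]         # U-bar S-bar, the same coordinates

Y_red = B @ V_bar                 # back-projection onto the plane
```

Each 21-dimensional spectrum is thus replaced by just two coordinates, and projecting back onto the plane reproduces Y up to the noise that falls outside it.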
Figure 5-13. The coordinates of the vector yi,: in the system of axes V̄.

All this is not completely new. In Reduced Eigenvector Space (p.180), we did just that: the matrix ŪS̄ was used to represent the complete matrix Y. The matrix ŪS̄ we called Yred. The component spectra A can also be represented in the eigenvector axes: Ared = A V̄t. As mentioned then, the reduction in the size of the matrices Y and A can be substantial.

Lawton-Sylvestre

The insight we have gained so far forms the basis of what is arguably the first 'chemometrics' method. Chemometrics is not easily defined. A Google search offers: "Chemometrics is the science of relating measurements made on a chemical system or process to the state of the system via application of mathematical or statistical methods." More casually one can state: "Chemometrics is the art of extracting useful information from chemical data." Both definitions would include the calculation of the average of a few numbers, which most chemometricians would not accept as a chemometrics method; chemometrics is more exciting. There is no doubt that the method of Lawton and Sylvestre is proper chemometrics. Consider the data in Figure 5-7, spectra that were collected during the progress of the reaction A → B. For the present application, not the whole reaction was covered. The first spectrum is taken a while after the reaction began and the last spectrum before the reaction reached completion. Thus, the data include neither the pure spectrum of the starting material A, nor the spectrum of the product B. The spectrum of pure A is 'somewhere' before the first measured spectrum and the spectrum of pure B 'somewhere' past the last measured spectrum. But where exactly? Of course it is not possible to define the spectra perfectly, but it is possible to be more precise than the above statement. As a reminder, fitting the data with a physical/chemical model will produce the 'perfect' result but we are now in the chapter on
model-free analysis and the present task is to extract useful information without a model. The foundation of the method of Lawton and Sylvestre is the recognition that all spectra lie on a line, and that the pure spectra must be on that line as well − the spectrum of A somewhere before the first measured spectrum and the spectrum of B somewhere past the last spectrum. We have already narrowed down the region of 'past' and 'before' to be a line. We still hope to be able to localise the regions precisely. Referring to Figure 5-10, one must recognise that the line must poke somewhere through one of the planes spanned by the λ-axes. On this side of the planes, all absorptivities are positive; on the other side of one of the planes, at least one absorptivity is negative. These planes are the limits on the line where the spectra can be found.
Figure 5-14. Schematic of the Lawton-Sylvestre method. The bold dotted lines represent the feasible regions.

In Figure 5-14 the bold dotted lines outside the first, y1,:, and last, ym,:, measured spectra represent the range in which the pure component spectra a1,: and a2,: can be found. The sections of these lines are limited by the locations where they poke through the λ1/λ2 and the λ2/λ3 planes. These intersections could be computed in explicit equations since everything is linear. We take a more intuitive approach: starting from the first spectrum, we 'walk backwards' on the line until we 'hit the wall'. This happens when the first molar absorptivity becomes negative. We repeat the
same from the last measured spectrum, 'walking' in the opposite direction. Doing the 'walking' is easy, just refer to Figure 5-14 and Figure 5-15. Fit a straight line through the (ūs̄):,1/(ūs̄):,2 data pairs and continue on the line. The points thus calculated need to be translated back into real spectra before the test for negativity can be made. We only use the significant parts Ū, S̄ and V̄, i.e. keep the first two factors. Referring to Figure 5-13, one can see that the i-th spectrum is defined by the i-th row of ŪS̄.

yi,: = (ūs̄)i,: V̄   (5.19)

where (ūs̄)i,: contains the coordinates of yi,: in eigenvector space V̄.
Figure 5-15. Determination of the limits for the feasible solutions from the Lawton-Sylvestre method. Vector dir denotes the directional vector pointing in the spectral direction.

The program LawtonSylvestre.m performs the analysis using Data_AB.m (p.224) shown in Figure 5-7, after removing the first and last 10 spectra (rows of Y). First the original spectra are represented in the eigenvector space V̄, where the coordinates are ŪS̄. Next, the equation for the best straight line through the data in the eigenvector space needs to be computed; we use the Matlab function polyfit. The vector dir (see Figure 5-15) points along the line of all spectra. Increasing contributions of dir are added to the last spectrum. The resulting points in the eigenvector space are translated back into the real space, see equation (5.20), and all coordinates are tested for negative values.

yextrap = (us)extrap V̄     (5.20)
The process is repeated in the other direction.

MatlabFile 5-10. LawtonSylvestre.m
% LawtonSylvestre
[t,lam,Y,C,A]=Data_AB;
Y(41:50,:)=[];                       % remove end and beginning of Y
Y(1:10,:)=[];
m=length(Y(:,1));                    % number of spectra left
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:2);S_bar=S(1:2,1:2);V_bar=Vt(:,1:2)';
US_bar=U_bar*S_bar;                  % coordinates in V_bar-space
subplot(1,2,1)                       % plot in the eigenvector space
plot(US_bar(:,1),US_bar(:,2),'k.');
xlabel('(us)_{:,1}');ylabel('(us)_{:,2}');
hold on
b=polyfit(US_bar(:,1),US_bar(:,2),1);   % straight line fit
dir=[1 b(1)];                        % vector in direction of spectra

inc=0;                               % init. step size for move
y_extrap=Y(m,:);
while all(y_extrap>0)                % check if all absorbances>0
  us_extrap=US_bar(m,:)+inc*dir;     % move
  plot(us_extrap(1),us_extrap(2),'k+');
  y_extrap=us_extrap*V_bar;          % spect. in real absorbances
  inc=inc+.05;
end
y_last=y_extrap;                     % first impossible spectrum

inc=0;                               % init. step size for move
y_extrap=Y(1,:);
while all(y_extrap>0)                % check if all absorbances>0
  us_extrap=US_bar(1,:)-inc*dir;     % move
  plot(us_extrap(1),us_extrap(2),'k+');
  y_extrap=us_extrap*V_bar;          % spect. in real absorbances
  inc=inc+.05;
end
y_first=y_extrap;                    % first impossible spectrum

subplot(1,2,2)
plot(lam,Y,'-',lam,y_first,'--',lam,y_last,'--');
xlabel('wavelength');ylabel('absorbance');
Figure 5-16 displays the results of the analysis. On the left, the • markers represent the measured spectra in the eigenvector space V̄, the + markers the extrapolated values. The full lines in the right panel are the series of spectra that were used for the analysis and the dashed lines are the extrapolated boundary spectra. The spectrum of B is fairly well defined while the spectrum of A is not; a long extrapolation is required until its spectrum turns negative at 400nm. This exemplifies the limitation of model-free methods: they rely on very simple constraints and in certain cases the range of feasible answers can be very wide, sometimes too wide to be useful. This will be discussed later in Chapter 5.4.3, Rotational Ambiguity.
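The 'walking' procedure translates directly into a few lines of code. The following Python/NumPy sketch is not from the book: it rebuilds a noise-free two-component data set with made-up spectra and rate constant (stand-ins for Data_AB.m) and walks from the last measured spectrum along the line in the eigenvector plane until an absorptivity turns negative.

```python
import numpy as np

# made-up two-component kinetic data, A -> B, first order (stand-in for Data_AB.m)
t = np.arange(0, 100, 2.0)
lam = np.arange(400, 601, 10.0)
cA = np.exp(-0.05 * t)                       # [A] decays
C = np.column_stack([cA, 1.0 - cA])          # closure: [A] + [B] = 1
A = np.vstack([np.exp(-0.5 * ((lam - 450) / 60.0) ** 2),
               np.exp(-0.5 * ((lam - 530) / 40.0) ** 2)])
Y = C @ A                                    # measured spectra, one per row

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
US = U[:, :2] * s[:2]                        # coordinates in the eigenvector plane
V_bar = Vt[:2, :]

b = np.polyfit(US[:, 0], US[:, 1], 1)        # straight line through the points
d = np.array([1.0, b[0]])                    # direction vector along the line

inc, y_extrap = 0.0, Y[-1].copy()            # start at the last measured spectrum
while np.all(y_extrap > 0):                  # walk until an absorptivity < 0
    inc += 0.05
    y_extrap = (US[-1] + inc * d) @ V_bar    # back into real absorbances
```

Walking from the first spectrum in the direction −d locates the other boundary; with noisy data the negativity test should allow a small tolerance.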
While the Lawton-Sylvestre method is very elegant and simple, it is virtually impossible to extend the principle to 3 and more components.
Figure 5-16. The Lawton-Sylvestre analysis in action. The double arrows cover the feasible regions of positive absorbances.
Three and More Components

So far we restricted our deliberations to 2-component systems. It is possible to increase this number to 3 and still comprehend the action in a 3-dimensional space. We can even project the 3-dimensional space onto the plane of the paper or computer screen and 'see' what is going on. As usual, we demonstrate the procedures based on a chemical process. Instead of another kinetics example, we use a spectrophotometric titration. The experiment follows the deprotonation of a diprotic acid by measuring the absorption spectra of the solution as a function of pH. The equilibria are quantitatively described by equation (5.21).

A + H ⇌ AH        (log K1)
AH + H ⇌ AH2      (log K2)     (5.21)
The concentrations of the differently protonated species, as a function of pH, are calculated with the explicit function we developed in Special Case: Explicit Calculation for Polyprotic Acids (p.64). A data matrix Y is constructed as before. Data_eqAH2a.m generates the data; it is called by Main_eqAH2a.m.

MatlabFile 5-11. Data_eqAH2a.m
function [pH,lam,Y,C,A]=Data_eqAH2a
pH=[2:.1:12]';                       % pH range
H=10.^(-pH);
logK=[8 6];                          % protonation constants
K=10.^logK;
C_tot=1e-3;                          % [AH2]+[AH]+[A]
n=length(logK);                      % number of protons

denom=zeros(size(H));
for i=0:n
  num(:,i+1)=H.^i*prod(K(1:i));      % numerator
  denom=denom+num(:,i+1);            % denominator
end
alpha=diag(1./denom)*num;            % degree of dissociation
C=C_tot*alpha;                       % concentration profiles

lam=400:10:600;                      % wavelength range
A(1,:)=1000*gauss(lam,450,120);      % component spectra
A(2,:)=2000*gauss(lam,350,120);
A(3,:)=1000*gauss(lam,500,50);
Y=C*A;                               % absorbance data
randn('seed',0);
Y=Y+1e-3*randn(size(Y));             % noise level 0.001
MatlabFile 5-12. Main_eqAH2a.m
% Main_eqAH2a
[pH,lam,Y,C,A]=Data_eqAH2a;
subplot(2,1,1);
plot(lam,A);
xlabel('wavelength');ylabel('absorptivity');
subplot(2,1,2);
plot(pH,C);
xlabel('pH');ylabel('concentration');
Each spectrum, measured during the titration, forms a row of the data matrix Y, and is a vector in an nl-dimensional space (in the example nl=21). As this is a three-component system, all the vectors lie in a 3-dimensional sub-space. Each measured spectrum is a linear combination of the 3 component spectra shown in Figure 5-17.
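That the spectra span exactly three dimensions can be checked numerically. The following Python/NumPy sketch mirrors the structure of Data_eqAH2a.m (with my own gauss stand-in, since the book's gauss.m is not reproduced here) and inspects the singular values of the noise-free data matrix:

```python
import numpy as np

def gauss(x, mu, width):                # hypothetical stand-in for gauss.m
    return np.exp(-0.5 * ((x - mu) / width) ** 2)

pH = np.arange(2, 12.01, 0.1)
H = 10.0 ** (-pH)
K1, K2 = 1e8, 1e6                       # protonation constants, logK = 8 and 6
num = np.column_stack([np.ones_like(H), H * K1, H**2 * K1 * K2])  # A, AH, AH2
alpha = num / num.sum(axis=1, keepdims=True)   # degrees of dissociation
C = 1e-3 * alpha                        # concentration profiles, closed system

lam = np.arange(400, 601, 10.0)
A = np.vstack([1000 * gauss(lam, 450, 120),
               2000 * gauss(lam, 350, 120),
               1000 * gauss(lam, 500, 50)])
Y = C @ A                               # noise-free data matrix

s = np.linalg.svd(Y, compute_uv=False)  # three significant singular values,
                                        # the rest are numerically zero
```

With noise added, the fourth and later singular values rise to the noise level instead of machine precision, but stay clearly separated from the first three.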
Figure 5-17. Molar absorption spectra and concentration profiles for the titration of a diprotic acid with logK values of 8 and 6.

Thus, the rank of Y is 3: there are 3 significant eigenvectors in V̄ and Ū. The row vectors of V̄ form a set of three basis vectors in the spectral space. The coordinates of each vector yi,:, in this new system of axes, are given by the i-th row of the matrix ŪS̄, see equations (5.17) and (5.18), or in a different notation:

Y V̄ᵗ = Ū S̄ V̄ V̄ᵗ = Ū S̄     (5.22)
MatlabFile 5-13. Main_EV_space.m
% Main_EV_space
[pH,lam,Y,C,A]=Data_eqAH2a;
[U,S,Vt]=svd(Y,0);
US=U*S;
plot3(US(:,1),US(:,2),US(:,3),'.',US(:,1),US(:,2),0*US(:,3)-.5)
grid on
xlabel('us_{:,1}');ylabel('us_{:,2}');zlabel('us_{:,3}');
To support comprehension of the 3-dimensional character of Figure 5-18, we added the projection of the curve onto the bottom of the plot at z=-0.5. The titration starts at the right hand end of the trace. As can be seen from Figure 5-17, the first spectrum at pH 2 is essentially pure AH2. There is not much change in the concentrations up to pH 4 and the spectrum vectors are very similar. Then, as the pH approaches the first logK-value, the measured spectrum starts to move towards the spectrum of the intermediate AH.
Figure 5-18. Representation of the measured spectra yi,: in V̄ space for an AH2 titration.

At pH 7 there is a maximum in the concentration of AH and the measured spectrum is close to that of pure AH. With further increase in pH the spectrum veers towards the spectrum of fully deprotonated A which is represented at the end of the series.

A small side issue deserves mentioning: as discussed in connection with Figure 5-13 and equation (5.22), the system of axes is formed by the eigenvectors V̄; the coordinates of the spectra in Y are the rows of the matrix ŪS̄. It is not automatically clear how the axes in a plot like Figure 5-18 should be labelled; should it be V̄ or ŪS̄? The equivalent question can be posed for Figure 5-17; should the abscissa in the top panel be labelled 'wavelength' or 'nm'? Both are correct.

The matrix Y can be regarded as a row or as a column matrix and consequently we can also concentrate on the columns of Y rather than the rows. The columns are linear combinations of the concentration profiles of the species and they all lie in a 3-dimensional space as well. The columns of the matrix Ū form a basis in this space. And the coordinates of each column vector of Y are contained in the columns of the matrix S̄V̄.

MatlabFile 5-14. Main_EV_space.m …continued
% Main_EV_space, ...continued
SV=S*Vt';
plot3(SV(1,:),SV(2,:),SV(3,:),'.',SV(1,:),SV(2,:),0*SV(3,:)-2)
grid on
xlabel('sv_{1,:}');ylabel('sv_{2,:}');zlabel('sv_{3,:}');
Figure 5-19. Representation of the measured absorption profiles in the Ū space.

The lowest wavelength at 400nm is represented in Figure 5-19 at the end of the trace on the top left. For increasing wavelengths, the profiles move to the right. As expected, the trace in Figure 5-19 is less ordered than the equivalent in Figure 5-18. Concentration profiles are governed by the law of mass action and closure and thus the trace, following the rows of ŪS̄, is structured accordingly. No such law governs the relative shape of the absorption spectra and thus the trace following the columns of S̄V̄.

Mean Centring, Closure

In Figure 5-10, we have seen that the law of conservation of mass dictates that in the 2-component case all measured spectra lie on a straight line. In the present context this property is called closure. In general terms it means that the sum of all species concentrations is constant during an experiment. In the 2-component case the spectral action occurs in a 2-dimensional subspace; if the system is closed, the action is concentrated in a 1-dimensional space. Similarly, in a closed 3-component case, the action is concentrated in a 2-dimensional subspace. Back to the data set for the titration of the diprotic acid, Data_eqAH2a.m. Due to closure of the chemical system, the sum of all concentrations [A]+[AH]+[AH2] is constant, and as a result, the curve in Figure 5-18 lies in a plane.
The fact that the spectral vectors in a closed system lie in a further reduced sub-space (in a 2-component system they lie on a straight line, in a 3-component system in a plane, etc.) suggests that we could move the origin of the system of axes into that sub-space; in this way the number of relevant dimensions is reduced by one. We subtract the mean spectrum from each measured spectrum yi,: and as a result, the origin of the system of axes is moved into the mean, in the above example into the plane of all spectral vectors. This is called mean-centring. Mean-centring is numerically superior to subtraction of one particular spectrum, e.g. the first one.

The Matlab program Main_MeanCenter.m performs mean-centring on the titration data and displays the resulting curve in such a way that we see the zero us:,3 component, i.e. the fact that the origin (+) lies in the (us:,1,us:,2)-plane.

MatlabFile 5-15. Main_MeanCenter.m
% Main_MeanCenter
[pH,lam,Y,C,A]=Data_eqAH2a;
Y_mc=Y-repmat(mean(Y,1),length(pH),1);   % subtract mean spectrum
[U,S,Vt]=svd(Y_mc,0);
US=U*S;

plot3(US(:,1),US(:,2),US(:,3),'.',0,0,0,'+')
grid on
axis([-1.5 1.5 -1.5 1.5 -1.5 1.5]);
view(-70, 2);
xlabel('us_{:,1}');ylabel('us_{:,2}');zlabel('us_{:,3}');
Figure 5-20. Mean-centring moved the origin of the system of axes into the centre of the action. This reduces the dimension of the subspace by one.
The argument can be turned around: if mean-centring reduces the rank of the matrix by one, the data set is closed. We have to be careful, though. The symmetry between columns and rows of the matrix Y is not complete. Closure is a property of the concentration profiles only and thus applies only in one dimension. The command mean(Y,1) computes the mean of each column of Y and the resulting mean spectrum is subtracted from each individual spectrum. There is no equivalent in the other direction. Subtracting the mean column from the columns does not reduce the rank; however, it moves the origin to the centre of the action. While not reducing the rank, it does reduce the absolute values of the numbers and improves the numerical accuracy of the computations. The improvement is usually marginal and we generally refrained from performing mean-centring, with the exception of PCR and PLS in Chapter 5.6.

HELP Plots

Plots of the kind represented in Figure 5-18 and in Figure 5-19 are more than just graphically appealing. A considerable amount of useful information can be extracted from these plots of the spectra or concentration profiles in their respective eigenvector spaces. Consider the multivariate chromatographic data, Data_Chrom2.m (p.219), of a 3-component system as shown in Figure 5-3.

MatlabFile 5-16. Main_HELPP.m
% Main_HELPP
[t,lam,Y,C,A]=Data_Chrom2;
[U,S,Vt]=svd(Y,0);
US=U*S;
plot3(US(:,1),US(:,2),US(:,3),'.',US(:,1),US(:,2),0*US(:,3)-1);
hold on;plot3(0,0,0,'o','MarkerSize',10);hold off;
grid on
xlabel('us_{:,1}');ylabel('us_{:,2}');zlabel('us_{:,3}');
From Figure 5-21, there are a few observations we can make: This is a three-component system, but as it is not closed, the action occupies all three dimensions; it does not occur in a plane. The path starts and ends at the origin (marked by ○). Figure 5-22 reveals that there are no components eluting at the beginning and end of the chromatogram, and therefore the respective spectral vectors contain just noise.
Figure 5-21. The spectra yi,: of a 3-component chromatogram in V̄ space.

The path takes off from the origin in an almost straight line and returns to the origin in an almost straight line. This is exploited in HELP plots (Heuristic Evolving Latent Projections). If a section of the path is on a straight line and its extension goes through the origin, this is an indication that there exists only one component in that section of the measurement. In the example, this is the case at the beginning and end of the overlapped concentration profiles. Figure 5-22 reveals that during times 15-25 only the first component is present and during times 70-95 only the third.

MatlabFile 5-17. Main_HELPP.m …continued
% Main_HELPP, ...continued
plot(t,C)
xlabel('time');ylabel('concentration');
The useful aspect of this is the following: we can determine the regions in the series of spectra in which there is only one component. The spectral vectors there are all parallel and the average over all spectra in the region is a good estimate for the pure component spectrum. The main difficulty with this approach is to decide when exactly the deviation from a straight line starts and thus, which selection of spectra we need to average.
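A straight section through the origin is equivalent to a local rank of one: the sub-matrix of spectra taken from a one-component time window has only one significant singular value. The following Python/NumPy sketch (made-up chromatographic data in the spirit of the book's examples, with my own gauss stand-in and deliberately narrow, well-separated peaks) compares a selective window with an overlapped one:

```python
import numpy as np

def gauss(x, mu, width):                # stand-in for the book's gauss.m
    return np.exp(-0.5 * ((x - mu) / width) ** 2)

t = np.arange(1, 101.0)
lam = np.arange(400, 601, 10.0)
C = np.column_stack([1e-3 * gauss(t, 35, 10),    # narrow elution profiles
                     9e-4 * gauss(t, 50, 10),
                     2e-3 * gauss(t, 70, 10)])
A = np.vstack([1000 * gauss(lam, 450, 120),
               2000 * gauss(lam, 350, 120),
               1000 * gauss(lam, 500, 50)])
Y = C @ A

def local_rank_ratio(rows):
    """s2/s1 of a time window of Y; close to zero => one component."""
    s = np.linalg.svd(Y[rows], compute_uv=False)
    return s[1] / s[0]

selective = local_rank_ratio(slice(8, 18))   # early window: essentially only comp. 1
overlap = local_rank_ratio(slice(39, 59))    # middle window: several components
```

In the selective window the ratio is close to zero; in the overlapped region it is orders of magnitude larger. Scanning such a ratio along the time axis is one practical way of locating the windows whose spectra can safely be averaged.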
Figure 5-22. The concentration profiles from Data_Chrom2.m.
Noise Reduction

Retaining only the significant singular values S̄ and their respective eigenvectors Ū and V̄, as indicated in Figure 5-2 and equation (5.10), results in a substantial reduction of the size of the matrices needed to represent the original matrix Y as Ȳ. There is an additional valuable benefit: Ȳ not only represents all relevant information contained in the original Y, it is also somewhat better, as it contains much less noise than Y. This is demonstrated in Main_NoiseRed1.m, using the kinetic data set Data_AB.m (p.224).

MatlabFile 5-18. Main_NoiseRed1.m
% Main_NoiseRed1
[t,lam,Y0,C,A]=Data_AB;
Y=Y0+.05*randn(size(Y0));
ne=2;
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne);Vt_bar=Vt(:,1:ne);S_bar=S(1:ne,1:ne);
Y_bar=U_bar*S_bar*Vt_bar';
subplot(3,1,1);
plot(lam,Y0,'-');axis tight;ylabel('Y0');
subplot(3,1,2);
plot(lam,Y,'-');axis tight;ylabel('Y');
subplot(3,1,3);
plot(lam,Y_bar,'-');axis tight;
xlabel('wavelength');ylabel('Y_{bar}');
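The improvement can also be quantified instead of just plotted: since the noise-free matrix is known in a simulation, we can compare the error of the noisy data and of the reconstruction directly. A Python/NumPy sketch with a made-up two-component kinetic data set (stand-in for Data_AB.m; spectra and constants are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1, 101, 2.0)
lam = np.arange(400, 601, 10.0)
cA = 1e-3 * np.exp(-2e-2 * t)               # A -> B, first order
C = np.column_stack([cA, 1e-3 - cA])
A = np.vstack([1000 * np.exp(-0.5 * ((lam - 450) / 120) ** 2),
               1000 * np.exp(-0.5 * ((lam - 530) / 120) ** 2)])
Y0 = C @ A                                  # noise-free data
Y = Y0 + 0.05 * rng.standard_normal(Y0.shape)

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
ne = 2
Y_bar = (U[:, :ne] * s[:ne]) @ Vt[:ne, :]   # keep 2 significant factors

err_raw = np.linalg.norm(Y - Y0)            # error of the noisy data
err_bar = np.linalg.norm(Y_bar - Y0)        # error of the reconstruction
```

Only the noise falling inside the 2-dimensional factor space is retained, so the reconstruction error is a substantial fraction smaller than the raw error, without ever using the true Y0 in the reconstruction itself.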
Figure 5-23. Absorbance spectra with noise level 10⁻³ (top panel) and an increased noise level of 5×10⁻² (second panel). The third panel represents Ȳ retaining 2 significant factors.

The graphs in Figure 5-23 are convincing. The top panel displays the original data for a simple first order reaction A→B. The next panel shows the same data after the addition of a substantial amount of noise. The third panel features the reconstructed matrix Ȳ=ŪS̄V̄ with 2 eigenvectors. Clearly a substantial amount, but not all, of the noise was removed.

It is worthwhile investigating this particular aspect of Factor Analysis more deeply. Data_AB2.m generates data for a first order reaction where only the first component A absorbs. The rank of Y is then only one.

MatlabFile 5-19. Data_AB2.m
function [t,lam,Y,C,A]=Data_AB2
A_0=1e-3;                            % initial concentration
k=2e-2;                              % rate constant

lam=400:10:600;
A(1,:)=1000*gauss(lam,450,120);      % spectrum of component A only
t=(1:2:100)';
C(:,1)=A_0*exp(-k*t);
Y=C*A;
randn('seed',0);
Y=Y+5e-2*randn(size(Y));
The program Main_NoiseRed2.m plots three columns of Y and Ȳ against each other.

MatlabFile 5-20. Main_NoiseRed2.m
% Main_NoiseRed2
[t,lam,Y,C,A]=Data_AB2;
ne=1;
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne);Vt_bar=Vt(:,1:ne);S_bar=S(1:ne,1:ne);
Y_bar=U_bar*S_bar*Vt_bar';
plot3(Y(:,5),Y(:,10),Y(:,15),'+',Y_bar(:,5),Y_bar(:,10),Y_bar(:,15),'.')
xlabel('Y_{:,5}');ylabel('Y_{:,10}');zlabel('Y_{:,15}');grid on
In Figure 5-24, the +'s are the noisy original (yi,5,yi,10,yi,15)-data points. The •'s represent the corresponding factor analytically reproduced data points. The noise reduction is obvious; note, however, that the distribution of the •'s along the line is still noisy, which manifests itself in the irregular spacing of the markers.
Figure 5-24. 3-dimensional plot of the 5th column of Y (+) or Ȳ (•) versus the 10th versus the 15th.
Figure 5-25 attempts to provide a geometrical representation of the situation.
Figure 5-25. The relationship between the original measurement vector yi,:, its projection into the eigenvector space and the 'true' vector ytrue,i,:.

The eigenvectors v1,: and v2,: span the grey plane; yi,: is the i-th spectrum and its projection onto the V̄-plane is ȳi,:. The coordinates of ȳi,: in V̄-space are (us)i,:. The hypothetical true, noise-free spectrum ytrue,i,: (which is not known) usually lies close to, but generally not exactly on, the plane. Figure 5-26 concentrates on the triangle defined by the tips of the vectors yi,:, ȳi,: and ytrue,i,:, represented as small circles in that figure. The difference vector ȳi,:−yi,: is orthogonal to the plane spanned by V̄.
Figure 5-26. Detailed view of Figure 5-25.

The projection of yi,: into V̄ is ȳi,:, which usually is much closer to the true spectrum ytrue,i,: than yi,: itself. A substantial amount of the noise is removed in the projection, but not all.
5.2 Target Factor Analyses, TFA

We continue considering multivariate data sets, e.g. a series of spectra measured as a function of time, reagent addition etc. In short, a matrix of
data that can be decomposed in the usual way: Y=CA. The spectra are measured at nl wavelengths and thus they are nl-dimensional vectors. The whole series of spectra follows a particular path in an nl-dimensional space. We have recognised in the preceding Chapter 5.1, Factor Analysis, that this path is concentrated in a much lower dimensional sub-space. Usually, for an nc component system, the sub-space has nc dimensions; e.g. for a two-component system, all spectra lie in a plane. Recall that, if the system is closed, the dimension of the sub-space can be further reduced by mean-centring.

To start with, we do not know the spectra A of the components in the system under investigation. Factor Analysis delivers an orthonormal system of axes V̄ that defines the sub-space of Y and A in an optimal way. Importantly, this is done automatically, and there is no input from the chemist regarding the components in the system or their spectra.

The basic idea of Target Factor Analysis is very simple. In order to test whether a certain compound is taking part in the process, whether its spectrum exists in the measurement, we test whether that spectrum lies in V̄. If such a test spectrum is outside V̄, there is no doubt that the component does not take part in the process under investigation. If it is in the sub-space, we cannot positively conclude that the species is there; the test spectrum could be a linear combination of the existing spectra.

A typical application can be found in chromatography. A group of components elute in a strongly overlapping peak cluster. We suspect that a particular chemical, for which we know the spectrum, might be in the unknown mixture, but due to overlap, its spectrum does not appear pure in the matrix Y. Due to inevitable experimental noise, the test spectrum vector will never be exactly in the subspace V̄ and consequently the question is whether the test vector is close to V̄.

The initial idea might be to compute the distance r of the test row vector t from V̄. As indicated in Figure 5-27, r is the difference between t and its projection tproj into V̄.
Figure 5-27. The distance of two test vectors to V̄. While r1 is shorter, t2 is a better test vector.
Figure 5-27 shows the principle for two test vectors t1 and t2. The fact that the distance r1 to the sub-space V̄ is shorter than the distance r2 does not mean that t1 is a better candidate. The test vectors need to be normalised in order to be able to compare these distances. One could also use the angle between a test vector t and its projection tproj as a measure.
Figure 5-28. The angle α is a good measure for the closeness of the test vector t to the space V̄.

The angle α is defined by the following equations:

sin α = ‖r‖/‖t‖ = ‖rn‖/‖tn‖ = ‖rn‖     (5.23)
The projection tn,proj is computed as given in equation (5.26). The Matlab file Main_TFA.m generates a three-component overlapping chromatogram, generated by Data_Chrom2.m (p.219). Two test spectra t1 and t2 are generated: t1 is the original spectrum of one of the components, t2 is slightly shifted, see Figure 5-29. Both are normalised. The output includes the length of the residuals and the angles between the test spectra and the plane V̄.

MatlabFile 5-21. Main_TFA.m
% Main_TFA
[t,lam,Y,C,A]=Data_Chrom2;
ne=size(C,2);
[U,S,Vt]=svd(Y,0);V_bar=Vt(:,1:ne)';
t1=1000*gauss(lam,450,120);          % component spectrum, max at 450nm
t2=1000*gauss(lam,460,120);          % slightly shifted spectrum
plot(lam,t1,lam,t2);
xlabel('wavelength');ylabel('absorptivity');
t1n=t1/norm(t1); t2n=t2/norm(t2);    % normalisation of t1 and t2
t1n_proj=t1n*V_bar'*V_bar;           % projections
t2n_proj=t2n*V_bar'*V_bar;
r1n=t1n-t1n_proj;                    % residuals
r2n=t2n-t2n_proj;
distance(1)=norm(r1n);
distance(2)=norm(r2n)
angles=asin(distance)/pi*180         % angles in degrees

distance =
    0.0003    0.0424
angles =
    0.0160    2.4287
While both angles (given in degrees) and distances are small, those for the correct spectrum are significantly smaller. The principle of Target Factor Analysis is not restricted to the testing of spectra or, more generally, of row vectors. Exactly the same principles apply, of course, to column vectors or concentration profiles. In mathematical terms, there is a complete symmetry between the two; in chemical terms, however, the two dimensions are different. Along the concentration profiles, we usually have a function that quantitatively describes the action, while there is nothing of that kind along the spectral dimension. In Chapter 5.2.3, Target Transform Search/Fit, we take advantage of the functional definition that is available in the column space.
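The distance/angle test itself is only a few lines. The following Python/NumPy sketch reproduces the idea with made-up data (my own gauss stand-in and peak parameters, not the book's Data_Chrom2.m, so the numerical angles differ from the output above):

```python
import numpy as np

def gauss(x, mu, width):                    # stand-in for gauss.m
    return np.exp(-0.5 * ((x - mu) / width) ** 2)

t_ax = np.arange(1, 101.0)
lam = np.arange(400, 601, 10.0)
C = np.column_stack([gauss(t_ax, 35, 10), gauss(t_ax, 50, 10), gauss(t_ax, 70, 10)])
A = np.vstack([1000 * gauss(lam, 450, 120),
               2000 * gauss(lam, 350, 120),
               1000 * gauss(lam, 500, 50)])
Y = C @ A
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
V_bar = Vt[:3, :]                           # significant row-space basis

def target_angle(t_spec):
    tn = t_spec / np.linalg.norm(t_spec)    # normalise the test vector
    r = tn - (tn @ V_bar.T) @ V_bar         # residual after projection
    return np.degrees(np.arcsin(np.linalg.norm(r)))

a_correct = target_angle(1000 * gauss(lam, 450, 120))   # true component spectrum
a_shifted = target_angle(1000 * gauss(lam, 460, 120))   # shifted by 10 nm
```

As in the Matlab run, the angle for the correct spectrum is essentially zero (the data here are noise-free), while the shifted spectrum leaves a clearly larger residual angle.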
Figure 5-29. Correct (—) and slightly shifted (...) species spectrum, both used as target spectra.
5.2.1 Projection Matrices

The projection of a vector into the subspace defined by eigenvectors, and the subsequent calculation of the residual vector between the original and its projection, is a very common task. Refer back to equations (5.15) and (5.16). It is worthwhile investigating the computations in some detail. The determination of the projections can be regarded as a linear least-squares fit; only now we have an orthogonal set of vectors V̄, as in Figure 5-28, rather than a general set of non-orthogonal vectors in F in the equivalent Figure 4-12. The projected test vector tproj is a linear combination of the vectors V̄.

tproj = b V̄     (5.24)

The computation of the linear parameters b is easy, as the pseudo-inverse of an orthonormal matrix is equal to its transposed

b = t V̄⁺ = t V̄ᵗ     (5.25)

and thus the projected vector

tproj = t V̄ᵗ V̄     (5.26)

and the residuals

r = t − tproj = t − t V̄ᵗ V̄ = t (I − V̄ᵗ V̄)     (5.27)

The equivalent operations are valid for columns:

tproj = Ū b = Ū Ūᵗ t     (5.28)

with

r = t − tproj = t − Ū Ūᵗ t = (I − Ū Ūᵗ) t     (5.29)

The above equations are valid for orthonormal sets of basis vectors. They can be written in very similar ways for general non-orthogonal bases (e.g. F in Figure 4-12). The only difference is the computation of the pseudo-inverse, which can be numerically demanding, but is trivial for orthonormal bases.
tproj = b F
r = t (I − F⁺ F)     (5.30)

Similarly, for a column vector tproj, we can write in accordance with Figure 4-11

tproj = F b
r = (I − F F⁺) t     (5.31)

While the notations r=t(I−V̄ᵗV̄), r=t(I−F⁺F), r=(I−ŪŪᵗ)t and r=(I−FF⁺)t are elegant, they are inefficient ways of performing the calculations. The matrices V̄ᵗV̄ and ŪŪᵗ are often very large square matrices which take time to compute, store and also to multiply with the vectors t. It is faster to calculate r=t−Ū(Ūᵗt) rather than r=(I−ŪŪᵗ)t. The same, of course, is valid for the equations (5.29)-(5.31).
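The efficiency remark is easy to confirm: the bracketed form never builds the large square projector, yet yields exactly the same residual. A small Python/NumPy sketch with a random orthonormal basis (my own made-up dimensions):

```python
import numpy as np

rng = np.random.default_rng(1)
nl = 500
Q, _ = np.linalg.qr(rng.standard_normal((nl, 3)))
V_bar = Q.T                              # 3 orthonormal rows, like V_bar
t_vec = rng.standard_normal(nl)          # a row test vector

# fast form: two small matrix-vector products, no nl x nl matrix
r_fast = t_vec - (t_vec @ V_bar.T) @ V_bar

# elegant but wasteful form: builds the full nl x nl projector
P = np.eye(nl) - V_bar.T @ V_bar
r_slow = t_vec @ P
```

Both residuals agree to machine precision, and the residual is orthogonal to the basis, which is the defining property of the projection. The fast form costs O(nl·ne) operations instead of O(nl²).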
5.2.2 Iterative Target Transform Factor Analysis, ITTFA

As the name Iterative Target Transform Factor Analysis indicates, this is an iterative extension of Target Factor Analysis. This time, we apply Target Factor Analysis to column vectors or concentration profiles. The basic idea is straightforward. First, we somehow guess a concentration profile, preferably close to a true one; call it ctest. In the 3-component chromatographic example Data_Chrom2a.m, we use a delta function (often called a needle in the context of ITTFA) with a maximum at t=52, close to the true maximum of the second concentration profile at t=50.

Such a test vector ctest normally is not in the sub-space Ū. We iteratively improve it in the following way: project ctest into Ū, applying equation (5.28). This projected vector, while lying in Ū, is not correct, e.g. it contains negative elements. A correction is applied that makes the profile physically possible but removes it from Ū. Nevertheless, this new vector is a better estimate than the original one. As the projection invariably results in a shortening of the vector ctest, we re-normalise it to a maximum of one in each iteration. Ideally, the iterations are continued until things are perfect. Unfortunately this is easier said than done, as convergence is notoriously slow. This is illustrated in Figure 5-30; the correct profile will essentially 'never' be reached.

MatlabFile 5-22. Data_Chrom2a.m
function [t,lam,Y,C,A]=Data_Chrom2a
lam=400:10:600;
A(1,:)=1000*gauss(lam,450,120);      % component spectra
A(2,:)=2000*gauss(lam,350,120);
A(3,:)=1000*gauss(lam,500,50);

t=(1:1:100)';
C(:,1)=1e-3*gauss(t,35,30);          % elution profiles
C(:,2)=9e-4*gauss(t,50,31);
C(:,3)=2e-3*gauss(t,70,32);

Y=C*A;
randn('seed',0);
Y=Y+1e-3*randn(size(Y));

MatlabFile 5-23. Main_ITTFA.m
% Main_ITTFA
[t,lam,Y,C_sim,A_sim]=Data_Chrom2a;
ne=size(C_sim,2);
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne);
c_sim_n=C_sim(:,2)/max(C_sim(:,2));  % true conc profile, normalised
c_test=zeros(size(t));
c_test(52)=1;                        % init guess, delta function at t=52
for i=1:20
  C_test(:,i)=c_test;                % improved test vectors in C_test
  c_new=U_bar*(U_bar'*c_test);       % projection into U_bar
  c_test=c_new.*(c_new>=0);          % negative values=0
  c_test=c_test/max(c_test);         % normalisation to max=1
end

plot(t,C_test,'-',t,c_sim_n,':');
xlabel('time');ylabel('norm. conc.');
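The project/clip/normalise loop is compact in any language. The following Python/NumPy sketch replays the iteration on made-up, noise-free data (my own gauss stand-in and peak parameters, not the exact Data_Chrom2a.m values) and measures how far the needle guess moves towards the true profile:

```python
import numpy as np

def gauss(x, mu, width):                    # stand-in for gauss.m
    return np.exp(-0.5 * ((x - mu) / width) ** 2)

t_ax = np.arange(1, 101.0)
lam = np.arange(400, 601, 10.0)
C = np.column_stack([1e-3 * gauss(t_ax, 35, 15),
                     9e-4 * gauss(t_ax, 50, 15),
                     2e-3 * gauss(t_ax, 70, 15)])
A = np.vstack([1000 * gauss(lam, 450, 120),
               2000 * gauss(lam, 350, 120),
               1000 * gauss(lam, 500, 50)])
Y = C @ A
U, _, _ = np.linalg.svd(Y, full_matrices=False)
U_bar = U[:, :3]                            # significant column-space basis

c_true = C[:, 1] / C[:, 1].max()            # normalised true profile
c_test = np.zeros_like(t_ax)
c_test[51] = 1.0                            # needle at t = 52
err0 = np.linalg.norm(c_test - c_true)
for _ in range(20):
    c_new = U_bar @ (U_bar.T @ c_test)      # projection into U_bar
    c_test = np.where(c_new > 0, c_new, 0.0)  # negative values -> 0
    c_test /= c_test.max()                  # normalise to max = 1
err20 = np.linalg.norm(c_test - c_true)
```

Twenty iterations shrink the distance to the true profile considerably, but as the text warns, further iterations improve things ever more slowly.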
Figure 5-30. Progress of the ITTFA algorithm for one particular concentration profile. The dashed line represents the correct concentration profile.
We are using the function Data_Chrom2a for data generation. It produces slightly wider concentration profiles than Data_Chrom2 (p.219). Starting from the initial needle, we observe a very quick improvement towards the correct profile; however, the iterative process slows down very quickly and essentially never reaches the correct profile. This is a typical result as convergence in algorithms of this kind tends to be fast in the beginning and subsequently slows down dramatically. Defining a reliable termination criterion for the iterative process that copes with such behaviour is very difficult. This is the exact opposite of what we have experienced in the Newton-Gauss type algorithms of Chapter 4.3.1, where convergence accelerates towards the minimum.
5.2.3 Target Transform Search/Fit

Traditional Target Factor Analysis just determines whether a particular test vector is close to the sub-space spanned by the significant eigenvectors, either V̄ or Ū. As an introduction to the method of Target Transform Searching, consider the example discussed in Figure 5-29. One of the spectra lies in V̄, the other does not. The almost obvious idea is to move the spectrum along the wavelength axis, to continuously check the distance to V̄ and determine the minimum. Moving spectra around, in this or similar ways, does not have many applications for the chemist. The reason is that there is no functional relationship that usefully defines absorption spectra.

The general idea of target testing has more potential in the column direction. There, we deal with concentration profiles that often are defined by mathematical functions, based on chemical/physical laws. In the following, we develop the principle vaguely defined above and see how concentration profiles and their parameters can be determined by analysing and minimising distances.

We develop the idea using a kinetic example. Any reaction scheme that consists exclusively of first order reactions results in concentration profiles that are linear combinations of exponentials. There is no limit to the number of reacting components nc. The set of differential equations describing such a scheme of exclusively first order reactions can always be written in the following way:

⎡[ċ1] ⎤   ⎡k1,1   k1,2    ⋯   k1,nc ⎤ ⎡[c1] ⎤
⎢[ċ2] ⎥ = ⎢k2,1                  ⋮  ⎥ ⎢[c2] ⎥     (5.32)
⎢  ⋮  ⎥   ⎢  ⋮            ⋱        ⎥ ⎢  ⋮  ⎥
⎣[ċnc]⎦   ⎣knc,1    ⋯       knc,nc ⎦ ⎣[cnc]⎦

or

ċ = K c     (5.33)
d[ X ] , the derivative of the concentration of X with dt respect to time. The vector c contains all the derivatives, the vector c all concentrations and the matrix K is formed by the rate constants describing the reaction mechanism. Usually, most entries in K are zero. It is best to use an example:
Recall the notation[ X ] =
k2 k1 For the reaction A ⎯⎯⎯ → B ⎯⎯⎯ →C , equation (5.32) reduces to
$$\begin{bmatrix} [\dot A] \\ [\dot B] \\ [\dot C] \end{bmatrix} =
\begin{bmatrix} -k_1 & 0 & 0 \\ k_1 & -k_2 & 0 \\ 0 & k_2 & 0 \end{bmatrix}
\begin{bmatrix} [A] \\ [B] \\ [C] \end{bmatrix} \qquad (5.34)$$
Equation (5.34) is the equivalent of equation (3.75)(d) in matrix notation. For the same set of components but including the reversible reactions A ⇌ B ⇌ C, with forward rate constants k1, k2 and backward rate constants k3, k4, equation (5.32) becomes:

$$\begin{bmatrix} [\dot A] \\ [\dot B] \\ [\dot C] \end{bmatrix} =
\begin{bmatrix} -k_1 & k_3 & 0 \\ k_1 & -k_2-k_3 & k_4 \\ 0 & k_2 & -k_4 \end{bmatrix}
\begin{bmatrix} [A] \\ [B] \\ [C] \end{bmatrix} \qquad (5.35)$$
Such systems of differential equations are called homogeneous. They have as solutions linear combinations of exponential functions, where the eigenvalues λi of the matrix K appear in the exponents. In the first, irreversible example, equation (5.34), the eigenvalues of K are λ1=−k1, λ2=−k2 and λ3=0. Thus, the concentration profiles are linear combinations of the vectors e^(λi·t), where t is the vector of times. In matrix notation we can write
$$C = E\,T_E \qquad (5.36)$$
where C contains the concentration profiles in the usual way, E contains the column vectors e:,i = e^(λi·t) and TE is a transformation matrix that establishes the relationship between C and E. The elements of TE are defined by the reaction scheme and the initial concentrations of the reacting species.

In the above example, equation (5.34), it is relatively straightforward to determine the eigenvalues of K. In the example (5.35) it is much more difficult to develop the equations. The Symbolic Toolbox of Matlab can be employed for the task.

MatlabFile 5-24. Main_Sym_ABC.m
% Main_Sym_ABC
syms k1 k2                  % symbolic variables
K=[-k1   0   0;             % A->B->C
    k1 -k2   0;
     0  k2   0];
lambdas=eig(K)              % eigenvalues of K

lambdas =
 -k1
 -k2
  0
and for the more interesting case of example (5.35):

MatlabFile 5-25. Main_Sym_ABC_rev.m
% Main_Sym_ABC_rev
syms k1 k2 k3 k4            % symbolic variables
K=[-k1      k3     0;       % ABC
    k1 -k2-k3    k4;
     0      k2   -k4];
lambdas=eig(K)              % eigenvalues of K

lambdas =
 0
 -1/2*k1-1/2*k2-1/2*k3-1/2*k4+1/2*(k1^2-2*k1*k2+2*k1*k3-2*k1*k4+k2^2+2*k2*k3+2*k4*k2+k3^2-2*k3*k4+k4^2)^(1/2)
 -1/2*k1-1/2*k2-1/2*k3-1/2*k4-1/2*(k1^2-2*k1*k2+2*k1*k3-2*k1*k4+k2^2+2*k2*k3+2*k4*k2+k3^2-2*k3*k4+k4^2)^(1/2)
or in a more civilised form:

$$\begin{aligned}
\lambda_1 &= 0 \\
\lambda_2 &= -\tfrac{1}{2}(k_1+k_2+k_3+k_4) + \tfrac{1}{2}\sqrt{k_1^2+k_2^2+k_3^2+k_4^2-2k_1k_2+2k_1k_3-2k_1k_4+2k_2k_3+2k_2k_4-2k_3k_4} \\
\lambda_3 &= -\tfrac{1}{2}(k_1+k_2+k_3+k_4) - \tfrac{1}{2}\sqrt{k_1^2+k_2^2+k_3^2+k_4^2-2k_1k_2+2k_1k_3-2k_1k_4+2k_2k_3+2k_2k_4-2k_3k_4}
\end{aligned} \qquad (5.37)$$

Interestingly, analysis of measured data only delivers the 2 (or 3, if zero is included) λ-values. There is not enough information to resolve two equations into 4 rate constants. Or, in chemical terms, without independent additional information, it is impossible to determine all 4 rate constants.

The Symbolic Toolbox can even cope with initial concentrations and thus delivers the equations for the concentration profiles.

MatlabFile 5-26. Main_Sym_ABC_rev.m ...continued
% Main_Sym_ABC_rev, ...continued
C=dsolve('Da=-k1*a+k3*b','Db=k1*a-(k2+k3)*b+k4*c', ...
         'Dc=k2*b-k4*c','a(0)=A0','b(0)=0','c(0)=0');
C.a
C.b
C.c
The output of this short program is 9500 characters, too much to be included here. We leave it to the readers to perform the task on their computer.
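The symbolic results above can also be spot-checked numerically. The following Python/NumPy sketch is an illustrative translation of ours (it is not part of the book's Matlab code): it verifies that eig(K) delivers −k1, −k2 and 0 for the irreversible scheme, that the resulting exponentials rebuild the closed-form profiles of A→B→C, and that equation (5.37) matches eig(K) for the reversible scheme, for one arbitrary set of rate constants.

```python
import numpy as np

# Irreversible scheme A -> B -> C, equation (5.34); rate constants arbitrary
k1, k2 = 0.1, 0.03
K_irr = np.array([[-k1,  0.0, 0.0],
                  [ k1, -k2, 0.0],
                  [0.0,  k2, 0.0]])
lam, Q = np.linalg.eig(K_irr)            # eigenvalues -k1, -k2, 0
lam_irr = np.sort(lam.real)

# c(t) = Q exp(lam*t) inv(Q) c0: linear combinations of exponentials
c0 = np.array([1e-3, 0.0, 0.0])          # only A present initially
t = np.linspace(0.0, 100.0, 101)
coef = np.linalg.solve(Q, c0)            # inv(Q) @ c0
C = np.real((Q * coef) @ np.exp(np.outer(lam, t))).T

# Closed-form profiles of the consecutive reaction for comparison
cA = 1e-3 * np.exp(-k1 * t)
cB = 1e-3 * k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))

# Reversible scheme, equation (5.35): eigenvalues against equation (5.37)
k3, k4 = 0.02, 0.05
K_rev = np.array([[-k1,       k3,  0.0],
                  [ k1, -k2 - k3,  k4],
                  [0.0,       k2, -k4]])
lam_rev = np.sort(np.linalg.eigvals(K_rev).real)
s = k1 + k2 + k3 + k4
root = np.sqrt(k1**2 + k2**2 + k3**2 + k4**2 - 2*k1*k2 + 2*k1*k3
               - 2*k1*k4 + 2*k2*k3 + 2*k2*k4 - 2*k3*k4)
lam_frm = np.sort([0.0, -0.5 * s + 0.5 * root, -0.5 * s - 0.5 * root])
print(lam_irr, np.allclose(lam_rev, lam_frm))
```

The eigen-decomposition K = Q·diag(λ)·Q⁻¹ is exactly the mechanism by which the profiles become linear combinations of e^(λi·t).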
Back to Target Factor Analysis. C is both a linear combination of E (equation (5.36)) and also of Ū (see later in equation (5.49)). Combining the two equations
$$C = E\,T_E \quad \text{and} \quad C = \bar{U}\,T_U$$

$$E = C\,T_E^{-1} = \bar{U}\,T_U\,T_E^{-1} = \bar{U}\,T \qquad (5.38)$$
demonstrates that the columns e:,i of E are linear combinations of Ū. Thus, e:,i lies in or close to Ū if the correct eigenvalue λi is used; otherwise it is at a distance. This is nothing but target testing a test vector e_test = e^(−λ_test·t). The application Main_TTF.m uses data generated in Data_ABC2.m for a consecutive reaction A→B→C.

MatlabFile 5-27. Data_ABC2.m
function [t,lam,Y,C,A]=Data_ABC2
% A -> B -> C
t   = [0:100]';                 % reaction times
lam = 400:5:600;                % wavelengths
k   = [0.1 0.03];               % rate constants
A_0 = 1e-3;                     % initial concentration of A

C(:,1)=A_0*exp(-k(1)*t);        % concentrations of species A
C(:,2)=A_0*(k(1)/(k(2)-k(1))*(exp(-k(1)*t)-exp(-k(2)*t)));  % conc. of B
C(:,3)=A_0-C(:,1)-C(:,2);       % concentrations of C

A(1,:)=1e3*gauss(lam,450,120);                        % molar spectrum of species A
A(2,:)=2e3*gauss(lam,350,120)+1e3*gauss(lam,500,50);  % mol. spect. of B
A(3,:)=1e3*gauss(lam,500,50);                         % molar spectrum of C

Y=C*A;                          % Beer's law
randn('seed',0);                % fixed start for random number generator
Y=Y+0.001*randn(size(Y));       % standard deviation 0.001

MatlabFile 5-28. Main_TTF.m
% Main_TTF
[t,lam,Y,C,A]=Data_ABC2;
ne=size(C,2);
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne);
lambda_test=-.05:.001:.2;
for i=1:length(lambda_test)
  e_test=exp(-lambda_test(i)*t);       % test exponential vector
  e_test=e_test/norm(e_test);          % normalise
  r=e_test-U_bar*(U_bar'*e_test);      % residual vector
  distance(i)=norm(r);                 % distance
end
plot(lambda_test,log10(distance));
xlabel('lambda_{test}');ylabel('log(\midr\mid)');
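For readers who prefer to experiment outside Matlab, the same search can be sketched in Python/NumPy. This is an illustrative translation of Data_ABC2.m and Main_TTF.m; the gauss helper below is our stand-in for the book's gauss.m, and the noise level is chosen small.

```python
import numpy as np

def gauss(x, mu, w):
    # simple Gaussian band, stand-in for the book's gauss.m
    return np.exp(-((x - mu) / w) ** 2)

# Data as in Data_ABC2: A -> B -> C with k = [0.1, 0.03]
t = np.arange(101.0)
wl = np.arange(400.0, 601.0, 5.0)
k1, k2, A0 = 0.1, 0.03, 1e-3
C = np.empty((t.size, 3))
C[:, 0] = A0 * np.exp(-k1 * t)
C[:, 1] = A0 * k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
C[:, 2] = A0 - C[:, 0] - C[:, 1]
A = np.vstack([1e3 * gauss(wl, 450, 120),
               2e3 * gauss(wl, 350, 120) + 1e3 * gauss(wl, 500, 50),
               1e3 * gauss(wl, 500, 50)])
Y = C @ A + 1e-4 * np.random.default_rng(0).standard_normal((t.size, wl.size))

# Distance of each normalised test vector exp(-lambda_test*t) to span(U_bar)
U = np.linalg.svd(Y, full_matrices=False)[0]
U_bar = U[:, :3]
lam_test = np.arange(-0.05, 0.2001, 0.001)
dist = np.empty(lam_test.size)
for i, L in enumerate(lam_test):
    e = np.exp(-L * t)
    e /= np.linalg.norm(e)
    r = e - U_bar @ (U_bar.T @ e)        # residual outside U_bar
    dist[i] = np.linalg.norm(r)

# Local minima should sit near lambda_test = 0, 0.03 and 0.1
local_min = [lam_test[i] for i in range(1, lam_test.size - 1)
             if dist[i] < dist[i - 1] and dist[i] < dist[i + 1]]
print(local_min)
```

Each rate constant produces its own dip in the distance curve, independently of the others, which is the essential message of Figure 5-31.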
Figure 5-31. Distance of normalised exponential functions, exp(−λ_test·t), to the subspace Ū.

Figure 5-31 clearly features three minima at the correct positions: λ_test=0, and the two rate constants used to generate the data, λ_test=0.03 and λ_test=0.1. A very interesting feature of the whole method is that the rate constants are completely independent. Each minimum, or rate constant, is defined on its own, completely independently of all the others. This is in clear contrast to normal, hard-modelling data fitting, where the residuals are a function of all parameters together.

There are several extensions and comments worth making.
1. It is possible to use an iterative algorithm to determine the exact positions of the minima. Again, in such a program the rate constants can be fitted individually, irrespective of the others.
2. Under certain circumstances it is possible to represent concentration profiles encountered in titrations as linear combinations of the typical S-shaped profiles known from equilibrium studies.
3. It is feasible to target test not only column vectors of C or E but also the complete matrices C or E.

Parameter Fitting via Target Testing
It is worthwhile examining point 3 above in some additional detail. Equation (5.38), C = Ū TU, is, of course, not completely correct. It is only an approximation; it should be written as
$$C = \bar{U}\,T_U + R_U \qquad (5.39)$$
The matrix C is defined by the non-linear parameters (rate constants). It is possible to minimise RU, i.e. the corresponding ssq, as a function of these parameters in a 'normal' Newton-Gauss algorithm. The chain of equations goes as follows:

$$\begin{aligned}
T_U &= \bar{U}^t C \\
R_U &= C - \bar{U}\,T_U = C - \bar{U}\bar{U}^t C = (I - \bar{U}\bar{U}^t)\,C
\end{aligned} \qquad (5.40)$$
The advantage is that there is no pseudo-inverse to be calculated in this way. The computation of TU, which comprises the linear parameters, is easier than 'usual', as Ū is an orthonormal matrix and thus Ū⁺ = Ūᵗ. As mentioned before, in equation (5.29), it is advantageous to compute the residuals as RU = C − Ū(Ūᵗ C); it is considerably faster. The 'standard' chain of equations using Beer-Lambert's law is:
$$\begin{aligned}
Y &= C\,A + R \\
A &= C^+ Y \\
R &= Y - C\,C^+ Y = (I - C\,C^+)\,Y
\end{aligned} \qquad (5.41)$$

The main difference to equations (5.40) is the computation of the pseudo-inverse C⁺. For the sake of completeness, we also include the relevant equations if data reduction, according to Reduced Eigenvector Space (p.180), is applied:

$$\begin{aligned}
Y_{red} = \bar{U}\bar{S} &= C\,A_{red} + R_{red} \\
A_{red} &= C^+\,\bar{U}\bar{S} \\
R_{red} &= \bar{U}\bar{S} - C\,C^+\,\bar{U}\bar{S} = (I - C\,C^+)\,\bar{U}\bar{S}
\end{aligned} \qquad (5.42)$$
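These identities are easy to verify numerically. The sketch below (Python/NumPy, with illustrative data and names of our own choosing) checks that Ū⁺ = Ūᵗ for orthonormal Ū, that the projector-free form of RU matches (I−ŪŪᵗ)C, and that fitting the reduced data ŪS̄ instead of Y gives the same spectra (up to V̄) and the same ssq.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 40)
C_true = np.column_stack([np.exp(-2 * t), t * np.exp(-2 * t)])
A_true = np.abs(rng.standard_normal((2, 15)))
Y = C_true @ A_true                       # noise-free data, rank 2

U, s, Vt = np.linalg.svd(Y, full_matrices=False)
U_bar = U[:, :2]                          # orthonormal columns
US = U_bar * s[:2]                        # reduced data U_bar*S_bar
V_bar = Vt[:2]

# Equation (5.40): no ns x ns projector and no pseudo-inverse needed
R1 = C_true - U_bar @ (U_bar.T @ C_true)
R2 = (np.eye(40) - U_bar @ U_bar.T) @ C_true
ok_pinv = np.allclose(np.linalg.pinv(U_bar), U_bar.T)

# Equations (5.41)/(5.42): a (deliberately wrong) trial C gives the same
# spectra up to V_bar and exactly the same ssq in the reduced space
C_trial = np.column_stack([np.exp(-1.5 * t), t * np.exp(-1.5 * t)])
Cp = np.linalg.pinv(C_trial)
A_full, A_red = Cp @ Y, Cp @ US
R_full = Y - C_trial @ A_full
R_red = US - C_trial @ A_red
print(ok_pinv, np.allclose(R1, R2),
      np.allclose(A_full, A_red @ V_bar),
      np.allclose((R_full**2).sum(), (R_red**2).sum()))
```

The ssq equality holds because the rows of V̄ are orthonormal, so multiplying the residuals by V̄ does not change their Frobenius norm.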
The dimensions of the corresponding matrices in equation (5.42) are all the same as in equation (5.40); the main difference is still the computation of C⁺ instead of Ūᵗ.

The two approaches, equation (5.40) or (5.41)/(5.42), are not equivalent. Figure 5-32 attempts to represent the situation graphically. Due to the limitation of our mind to 3 dimensions, this endeavour is not easy, or rather it is impossible, as we are running out of dimensions. The 'vectors' ŪS̄ or Y represent the subspace ŪS̄ or the complete space Y; the curved line C=f(k) represents the space defined by the whole matrix C; the 'vector' C(kC) represents the minimum for equation (5.41) or (5.42) and C(kU) the minimum for equation (5.40); the 'vectors' R (Rred) and RU are the corresponding residual matrices.
Figure 5-32. Graphical representation of equations (5.40) to (5.42).

The 'normal' residuals R are orthogonal to the space C, defined by the projection of the column vectors in ŪS̄ or Y into C. This is a straightforward linear least-squares calculation, equivalent to Figure 4-10. C(kC) is the closest the space C gets to the 'vectors' ŪS̄ or Y.

The residuals RU are defined by the projections of the 'vectors' C into the space Ū; they are orthogonal to Ū. This projection is simpler due to the orthogonal base vectors. RU is the closest the 'vectors' C get to Ū.

The figure, however incomplete, demonstrates that the two minima are not the same. Often they are very similar. Probably a more important difference is that the vectors ŪS̄ have a length defined by the measurement; thus there is a weight given to the vectors, which is relevant. Such information is completely lost in Target Transform Fitting employing equation (5.40); RU is minimised without any reference to the length of ŪS̄. In fact, the shorter C, the shorter is RU; thus some normalising of C, as a function of the parameters k, is required. Refer also to Figure 5-27 for a similar situation.
5.3 Evolving Factor Analyses, EFA

The Singular Value Decomposition of a matrix Y into the product ŪS̄V̄ is full of rich and powerful information. The model-free analyses we discussed so far are based on the examination of the matrices of eigenvectors Ū and V̄. Evolving Factor Analysis, EFA, is primarily based on the analysis of the matrix S̄ of singular values.

Previously, we have seen in Magnitude of the Singular Values (p.219) that the number of significant singular values in S̄ equals the number of linearly
independent rows or columns of the matrix Y of measurement, which ideally equals the number of changing chemical components in the process investigated. So far, the complete data matrix Y has been analysed and thus the result reflects the total measurement, the number of components existing anywhere in the measurement. Evolving Factor Analyses investigate the evolving character of the singular values − how they change as a function of the progress of the measurement. Information about the evolution of the rank and thus the appearance of new components is revealed. Naturally, this only makes sense if there is an inherent order in the data, usually an order in the acquisition of the spectra that make up the matrix Y. Factor analytical methods have been developed by social scientists; their samples are individuals for whom they have a 'spectrum' of properties. In this collection of samples there is no inherent order and thus, methods that rely on an inherent order of the samples, such as EFA, are of no use. As a typical example of ordered data we will investigate chromatography, where spectra are measured as a function of elution time.
5.3.1 Evolving Factor Analysis, Classical EFA

The basic principle of EFA is very simple. Instead of subjecting the complete matrix Y to the Singular Value Decomposition, specific sub-matrices of Y are analysed. In the original EFA, these sub-matrices are formed by the first i spectra of Y, where i increases from 1 to the total number of spectra, ns. The appearance of a new compound during the acquisition of the data is indicated by the emergence of a new significant singular value. The procedure is best explained graphically in Figure 5-33. The sub-matrix, indicated in grey, is subjected to the SVD and the resulting ne significant singular values are stored as a row vector in a matrix EFA with the same number of rows as Y.
Figure 5-33. Schematic of Forward EFA
The example used for the introduction of EFA is based on the three-component chromatogram, Data_Chrom2.m (p.219), we have used several times earlier. While most of the Matlab listing in Main_EFA1.m is close to self-explanatory, a few statements might need clarification. The singular values are stored in the matrix EFA_f which has ns rows and ne columns. It is advantageous to plot the logarithms of the singular values; their values span several orders of magnitude and cannot be represented in a normal plot. The rank is the number of significant singular values. The significance level can be estimated as the first non-significant singular value of the total matrix Y. The three sub-plots in Figure 5-34 clearly indicate the relationship between the concentration profiles, the evolving singular values and the evolving rank.

MatlabFile 5-29. Main_EFA1.m
% Main_EFA1
[t,lam,Y,C,A]=Data_Chrom2;
[ns,nc]=size(C);
ne=nc+1;                                    % one extra singular value

EFA_f=NaN(ns,ne);                           % NaNs to prevent log(0)
for i=1:ns
  s_f=svd(Y(1:i,:));                        % svd of the first i rows of Y
  EFA_f(i,1:min(i,ne))=s_f(1:min(i,ne))';   % relevant SV are stored
end
sig_level=EFA_f(ns,ne);                     % first non-significant SV of Y
Rank_f=sum((EFA_f>sig_level)');             % number of significant SV

subplot(3,1,1); plot(t,C); ylabel('conc.');
subplot(3,1,2); plot(t,log10(EFA_f)); ylabel('log(s_f)');
subplot(3,1,3); plot(t,Rank_f,'x');
axis([0 t(ns) 0 ne]);
xlabel('time');ylabel('rank');
Figure 5-34. Forward EFA. Top panel: concentration profiles; second panel: evolving singular values; third panel: evolving rank.

Evolving factor analysis can, and should, be performed in both forward and backward directions. The forward plot, calculated above and shown in Figure 5-34, indicates the appearance of new components. The backward plots of Figure 5-36 are calculated similarly, by determination of the singular values of the sets of the last 1, 2, 3, ... spectra in Y, as seen in the schematic of Figure 5-35. These plots indicate the disappearance of the components.
Figure 5-35. Schematic of Backward EFA
MatlabFile 5-30. Main_EFA1.m ...continued
% Main_EFA1 ...continued
EFA_b=NaN(ns,ne);                           % NaNs to prevent log(0)
for i=1:ns
  s_b=svd(Y(ns-i+1:ns,:));                  % svd of the last i rows of Y
  EFA_b(ns-i+1,1:min(i,ne))=s_b(1:min(i,ne))';  % relevant SV are stored
end
Rank_b=sum((EFA_b>sig_level)');             % number of species

subplot(3,1,1); plot(t,C); ylabel('conc.');
subplot(3,1,2); plot(t,log10(EFA_b)); ylabel('log(s_b)');
subplot(3,1,3); plot(t,Rank_b,'x');
axis([0 t(ns) 0 ne]);
xlabel('time');ylabel('rank');
Figure 5-36. Backward EFA. Top panel: concentration profiles; second panel: evolving singular values; bottom panel: evolving rank.
The combined interpretation of the plots of the singular values in forward and backward direction is fairly straightforward. The increase of the rank by one indicates the appearance of a species during the process monitored. In well-behaved chromatograms, i.e. non-overloaded columns, the width of the elution profiles increases continuously with increasing elution time. In such instances, it is possible to connect the appearance and the disappearance of the individual components: the first compound to appear is also the first to disappear, etc. Concentration windows can be established for all compounds. These are the regions along the time axis during which a component exists; outside these windows the concentration is known to be zero. The connection between the forward and backward singular values can be made in a one-line Matlab command.

MatlabFile 5-31. Main_EFA1.m ...continued
% Main_EFA1 ...continued
for i=1:3                    % windows of existence for the components
  C_window(:,i)=EFA_f(:,i)>sig_level & EFA_b(:,ne-i)>sig_level;
end
subplot(4,1,1); plot(t,C); ylabel('conc.');
subplot(4,1,2); plot(t,log10(EFA_f)); ylabel('log(s_f)');
subplot(4,1,3); plot(t,log10(EFA_b)); ylabel('log(s_b)');
subplot(4,1,4);
plot(t,C_window(:,1),t,C_window(:,2)+0.3,t,C_window(:,3)+0.6);
xlabel('time');ylabel('conc. window');
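The window construction can be sketched compactly in Python/NumPy (illustrative code; efa_both is our equivalent of EFA.m and the chromatographic data are synthetic). Component i's window is where the i-th forward and the (nc−i+1)-th backward singular values are both significant:

```python
import numpy as np

def efa_both(Y, ne):
    """Forward and backward EFA curves (our equivalent of EFA.m)."""
    ns = Y.shape[0]
    f = np.full((ns, ne), np.nan)
    b = np.full((ns, ne), np.nan)
    for i in range(1, ns + 1):
        m = min(i, ne)
        f[i - 1, :m] = np.linalg.svd(Y[:i], compute_uv=False)[:m]
        b[ns - i, :m] = np.linalg.svd(Y[ns - i:], compute_uv=False)[:m]
    return f, b

# Two Gaussian 'elution' profiles with distinct spectra
t = np.linspace(0.0, 1.0, 60)[:, None]
C = np.hstack([np.exp(-((t - 0.3) / 0.05) ** 2),
               np.exp(-((t - 0.7) / 0.05) ** 2)])
x = np.linspace(0.0, 1.0, 30)
A = np.vstack([np.exp(-((x - 0.4) / 0.2) ** 2),
               np.exp(-((x - 0.6) / 0.2) ** 2)])
Y = C @ A + 1e-6 * np.random.default_rng(0).standard_normal((60, 30))

f, b = efa_both(Y, 3)
sig = 1e-3                               # significance level, chosen by eye
nc = 2
win = np.zeros((60, nc), dtype=bool)
for i in range(nc):                      # first in - first out
    win[:, i] = (f[:, i] > sig) & (b[:, nc - 1 - i] > sig)
print(win[:, 0].sum(), win[:, 1].sum())  # lengths of the two windows
```

The 'first in - first out' pairing in the loop is exactly the assumption made for well-behaved chromatograms.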
EFA plots can be used to estimate the rank of a matrix. EFA plots have similarities with the singular value plots shown in Figure 5-4, but they clearly contain more information and thus are more instructive. In order to demonstrate this enhanced capability of EFA plots for the determination of the number of components, we generate a series of spectrophotometric titrations of a diprotic acid with different noise levels, employing Data_eqAH2a.m (p.236), and analyse them with Main_EFA2.m. Forward EFA plots, next to the original data, are presented in Figure 5-38. A few observations can be made: the significant singular values are not much affected by the noise level; only the non-significant ones move up continuously with increasing noise. This behaviour is similar to the one observed in Figure 5-4. The lowest panels of Figure 5-38 demonstrate that even at very high noise levels, EFA facilitates the determination of the correct number of components.
Figure 5-37. Complete EFA. Top panel: concentration profiles; middle panels: forward and backward evolving singular values; bottom panel: concentration windows.

The human eye is very good at detecting patterns − in this case the appearance of a new significant singular value. The appearance of a new component, as indicated by the point where a new significant singular value rises above the noise level, is delayed by increasing noise.

MatlabFile 5-32. Main_EFA2.m
% Main_EFA2
[pH,lam,Y,C,A]=Data_eqAH2a;
[ns,nc]=size(C);
ne=nc+1;                             % one extra singular value
noise=[.05 .1 .2];

for j=1:3
  Yn=Y+noise(j)*randn(size(Y));      % add different noise levels
  EFA_f=EFA(Yn,ne);
  subplot(3,2,2*j-1);
  plot(pH,Yn,'-');
  axis([2 12 -1 2]);
  if j==3,xlabel('pH');end;ylabel('abs.');
  subplot(3,2,2*j);
  plot(pH,log10(EFA_f));
  axis([2 12 -.5 1.5]);
  if j==3,xlabel('pH');end;ylabel('log(s_f)');
end
Figure 5-38. EFA forward plots for a data set with increasing noise levels.

EFA.m is a short Matlab function that computes forward and backward EFA matrices for a given number, ne, of singular values. Its structure is essentially identical to the one discussed for Main_EFA2.m.

MatlabFile 5-33. EFA.m
function [EFA_f,EFA_b]=EFA(Y,ne)
[ns,nl]=size(Y);
EFA_f=NaN(ns,ne);
EFA_b=NaN(ns,ne);
for i=1:ns
  s_f=svd(Y(1:i,:));                            % forward SV
  s_b=svd(Y(ns-i+1:ns,:));                      % backward SV
  EFA_f(i,1:min(i,ne))=s_f(1:min(i,ne))';
  EFA_b(ns-i+1,1:min(i,ne))=s_b(1:min(i,ne))';
end
Interestingly, EFA was originally developed for the analysis of spectrophotometric titration data. Concentration profiles in chromatography and equilibrium studies can be surprisingly similar. The main difference is that in chromatography, the data set generally starts and ends without any component present (Figure 5-37), while in titrations, there is usually one particular species at the beginning and another one at the end (Figure 5-39). While the algorithm is not affected, the concentration windows are different.

MatlabFile 5-34. Main_EFA3.m
% Main_EFA3
[pH,lam,Y,C,A]=Data_eqAH2a;
[ns,nc]=size(C);
ne=nc+1;
[EFA_f,EFA_b]=EFA(Y,ne);
sig_level=EFA_f(ns,ne);
for i=1:nc                   % windows of existence for the components
  C_window(:,i)=EFA_f(:,i)>sig_level & EFA_b(:,ne-i)>sig_level;
end
subplot(3,1,1); plot(pH,C); ylabel('conc.');
subplot(3,1,2);
plot(pH,log10(EFA_f),'-',pH,log10(EFA_b),':');
ylabel('log(s_f),log(s_b)');
subplot(3,1,3);
plot(pH,C_window(:,1),pH,C_window(:,2)+0.3,pH,C_window(:,3)+0.6);
xlabel('pH');ylabel('conc. window');
Figure 5-39. Concentration profiles for the titration of a di-protic acid; EFA plots and concentration windows.
5.3.2 Fixed-Size Window EFA, FSW-EFA

There are different strategies for the selection of sub-matrices for evolving-type factor analyses. The classical, original mode has been presented so far. The most important alternative procedure is based on a moving window of fixed size. In other words, a window of a pre-defined number of consecutive spectra (rows) is moved along the matrix Y. Each window is subjected to SVD, the singular values are stored and their logarithms are plotted.
Figure 5-40. Schematic of FSW-EFA

Fixed-size window EFA plots reveal the number of different species that co-exist in the particular window. More precisely, it is the number of species with linearly independent concentration profiles. Here is the appropriate Matlab program, Main_FSW_EFA.m. The data, generated by Data_eqAH4a.m, mimic a spectrophotometric titration of a tetra-protic acid AH4 with log(K) values of 8, 7, 6 and 2. The equilibria are quantitatively described by equation (5.43).

$$\begin{aligned}
A + H &\rightleftharpoons AH && \log K_1 \\
AH + H &\rightleftharpoons AH_2 && \log K_2 \\
AH_2 + H &\rightleftharpoons AH_3 && \log K_3 \\
AH_3 + H &\rightleftharpoons AH_4 && \log K_4
\end{aligned} \qquad (5.43)$$
The concentrations of the differently protonated species as a function of pH are calculated with the explicit function we developed in Special Case: Explicit Calculation for Polyprotic Acids, p.64.

MatlabFile 5-35. Data_eqAH4a.m
function [pH,lam,Y,C,A]=Data_eqAH4a
pH=[0:.1:12]';                     % pH range
H=10.^(-pH);
logK=[8 7 6 2];                    % protonation constants
K=10.^logK;
n=length(logK);                    % number of protons
denom=zeros(size(H));
for i=0:n
  num(:,i+1)=H.^i*prod(K(1:i));    % numerator
  denom=denom+num(:,i+1);          % denominator
end
alpha=diag(1./denom)*num;          % degree of dissociation
C=1e-3*alpha;                      % concentration profiles
lam=400:10:600;                    % wavelength range
A(1,:)=1000*gauss(lam,450,120);    % component spectra
A(2,:)=2000*gauss(lam,350,120);
A(3,:)=1000*gauss(lam,500,50);
A(4,:)=1000*gauss(lam,550,50);
A(5,:)=1000*gauss(lam,580,50);
Y=C*A;                             % absorbance data
randn('seed',0);
Y=Y+1e-3*randn(size(Y));           % noise level 0.001
MatlabFile 5-36. Main_FSW_EFA.m
% Main_FSW_EFA
[pH,lam,Y,C,A]=Data_eqAH4a;        % eq. data 4-protic acid
[ns,nc]=size(C);

size_w=3;                          % small windows
EFA_w=zeros(ns-size_w+1,size_w);
for i=1:ns-size_w+1
  s_w=svd(Y(i:i+size_w-1,:));
  EFA_w(i,:)=s_w';
end
subplot(3,1,1); plot(pH,C,'k'); ylabel('conc');
subplot(3,1,2);
plot(pH(0.5*(size_w+1):ns-0.5*(size_w-1)),log10(EFA_w),'k');
ylabel('log(s_w)');

size_w=9;                          % large windows
EFA_w=zeros(ns-size_w+1,size_w);
for i=1:ns-size_w+1
  s_w=svd(Y(i:i+size_w-1,:));
  EFA_w(i,:)=s_w';
end
subplot(3,1,3);
plot(pH(0.5*(size_w+1):ns-0.5*(size_w-1)),log10(EFA_w),'-k');
xlabel('pH'); ylabel('log(s_w)');
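A fixed-size window scan is equally short in Python/NumPy. This illustrative sketch (synthetic two-component data, function name ours) shows the local-rank idea: a window placed where the two profiles overlap has two clearly non-zero singular values, while a window placed before either peak has essentially none.

```python
import numpy as np

def fsw_efa(Y, w):
    """Singular values of a moving window of w consecutive spectra."""
    ns = Y.shape[0]
    out = np.empty((ns - w + 1, w))
    for i in range(ns - w + 1):
        out[i] = np.linalg.svd(Y[i:i + w], compute_uv=False)
    return out

t = np.linspace(0.0, 1.0, 60)[:, None]
C = np.hstack([np.exp(-((t - 0.45) / 0.1) ** 2),     # two overlapping
               np.exp(-((t - 0.55) / 0.1) ** 2)])    # concentration profiles
x = np.linspace(0.0, 1.0, 30)
A = np.vstack([np.exp(-((x - 0.4) / 0.2) ** 2),
               np.exp(-((x - 0.6) / 0.2) ** 2)])
Y = C @ A                                            # noise-free

efa_w = fsw_efa(Y, 5)
print(efa_w[27, :2], efa_w[0, :2])   # overlap window vs. empty window
```

A window of w rows can, of course, never reveal more than w co-existing species − the trade-off discussed for Figure 5-41.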
Figure 5-41. Concentration profiles for a titration of a 4-protic acid. Second panel: FSW-EFA plot for a window size of 3; third panel: for a window size of 9.

Figure 5-41 displays the results of two FSW-EFA analyses with different window sizes, 3 and 9, in the second and third panels. The small window of size 3 naturally cannot detect more than 3 different components within the window. The relatively high noise level resulting from a window of only 3 spectra, and the high overlap of the concentration profiles, make the third singular value in the middle plot, expected between pH 6-8, hardly discernible. With windows of size 9, up to 4 singular values are clearly identifiable. The price to pay for large window sizes is the spreading out of the information. Around pH 4, there are only 2 components co-existing, but within the large window there is a total of 3 components. Due to that broadening effect, the beginnings of concentration profiles are not easily detected. However, the 4 components coexisting at pH 7 are well distinguished in the FSW-EFA plot.

There are advantages and disadvantages in this approach compared to classical EFA.
a) In big systems with many species and spectra, the detection of new species deteriorates with increasing window size. This effect is clearly noise dependent and can qualitatively be observed in Figure 5-38; it is the effect of a continuous increase of the noise singular values. In FSW-EFA, the fixed-size window maintains the magnitude of the singular values related to noise.
b) The classical EFA plots are easier to interpret − compare Figure 5-37 with Figure 5-41.
c) Consider the following, rather unlikely example: components 1 and 5 in a co-eluting peak system in chromatography have the same spectrum. Under such circumstances, component 5 in the cluster is not detected at all in classical EFA. Window EFA does not suffer from this shortcoming as long as there is no overlap between the 2 components with identical spectra.
d) A decision has to be made about the size of the window in FSW-EFA. As outlined above, this decision is important.
5.3.3 Secondary Analyses Based on Window Information

The location of the concentration windows is the distinctive result of the classical EFA plots, as in Figure 5-37. Apart from information on peak purity in chromatography, there is not much that is directly useful in the information about concentration windows. In this section, we develop methods that, based on these concentration windows, result in complete concentration profiles C and subsequently the corresponding species spectra A.

Iterative Refinement of the Concentration Profiles

This algorithm has many aspects similar to Iterative Target Transform Factor Analysis, ITTFA, as discussed in Chapter 5.2.2, and Alternating Least-Squares, ALS, as introduced later in Chapter 5.4. The main difference is the inclusion of the window information as provided by the EFA plots. A brief description of the algorithm (as usual, everything is based on Y=CA):

(a) Initial guess for the matrix C of concentrations; often this is not crucial. Possible choices are the combined EFA plots (see Figure 5-44); the window matrix of 0's and 1's is adequate as well (Figure 5-37).
(b) Calculate A as A=C\Y.
(c) Corrections on A, e.g. set negative values to 0.
(d) Calculate C as C=Y/A.
(e) Corrections on C: set negative values and values outside the concentration windows to 0. (This is the main difference to ITTFA.)
(f) If the fit is not 'perfect', return to (b).

Instead of a proper termination criterion, which is difficult to develop, we just iterate 100 times. We employ the same chromatographic data, Data_Chrom2a.m (p.251), as in ITTFA. Figure 5-42 displays the results.

MatlabFile 5-37. Main_It_EFA.m
% Main_It_EFA
[t,lam,Y,C_sim,A_sim]=Data_Chrom2a;
[ns,nc]=size(C_sim);
ne=nc+1;                                      % one extra singular value
[EFA_f,EFA_b]=EFA(Y,ne);                      % perform EFA
EFA_f(isnan(EFA_f)==1)=0;                     % replace NaN's by zeros
EFA_b(isnan(EFA_b)==1)=0;                     % replace NaN's by zeros
C=min(EFA_f(:,1:nc),fliplr(EFA_b(:,1:nc)));   % combined SV curves
sig_level=0.01;                               % define cut off level
C_window=C>sig_level;                         % build window matrix of 0's and 1's

for it=1:100
  C=C/diag(max(C));                           % normalisation to max of 1
  A=C\Y;                                      % spectra
  A=A.*(A>0);                                 % positive
  C=Y/A;                                      % conc profiles
  C=C.*(C>0);                                 % positive
  C=C.*C_window;                              % apply windows
  R=Y-C*A;                                    % residuals
  ssq(it)=sum(sum(R.*R));
end

[C_n,A_n]=norm_max(C,A);                      % norm. C and C_sim to unit height
[C_sim_n,A_sim_n]=norm_max(C_sim,A_sim);      % and recalc. A, A_sim
subplot(3,1,1);
plot(lam,A_n,'-',lam,A_sim_n,'.');
xlabel('wavelength');ylabel('absorptivity');
subplot(3,1,2);
plot(t,C_n,'-',t,C_sim_n,'.');
xlabel('time');ylabel('concentration');
subplot(3,1,3);
plot(log10(ssq));
xlabel('iteration');ylabel('log(ssq)');
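The same refinement loop can be sketched in Python/NumPy (an illustrative sketch with synthetic data; for brevity the windows are taken directly from the simulated profiles, where in practice they would come from the EFA plots):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)[:, None]
C_true = np.hstack([np.exp(-((t - 0.35) / 0.08) ** 2),
                    np.exp(-((t - 0.65) / 0.08) ** 2)])
x = np.linspace(0.0, 1.0, 25)
A_true = np.vstack([np.exp(-((x - 0.4) / 0.2) ** 2),
                    np.exp(-((x - 0.6) / 0.2) ** 2)])
Y = C_true @ A_true + 1e-4 * rng.standard_normal((50, 25))

win = C_true > 1e-3          # concentration windows (in practice: from EFA)
C = win.astype(float)        # 0/1 window matrix as initial guess
ssq = []
for _ in range(100):
    C = C / (C.max(axis=0) + 1e-12)            # normalise to max 1
    A = np.linalg.lstsq(C, Y, rcond=None)[0]   # (b) A = C \ Y
    A[A < 0] = 0                               # (c) non-negative spectra
    C = Y @ np.linalg.pinv(A)                  # (d) C = Y / A
    C[C < 0] = 0                               # (e) non-negative ...
    C[~win] = 0                                # ... and zero outside windows
    R = Y - C @ A
    ssq.append(np.sum(R * R))                  # progress of the fit

Cn = C / C.max(axis=0)
print(ssq[0], ssq[-1])
```

The comments (b)-(e) map the loop body onto the algorithm steps listed above.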
Figure 5-42. Iterative EFA. Top panel: component spectra; middle panel: concentration profiles; bottom panel: progress of the quality of the fit. Markers (•) represent the true values and the lines the iterative EFA estimates after 100 iterations.

The width of the concentration windows is defined by the chosen level of significance of the singular values. Figure 5-43 represents the effect of choosing two different levels: the dotted horizontal line is defined by the first non-significant singular value (fine dotted line) of the complete matrix Y. The intersection of this horizontal line with the EFA trace of the second singular value marks the beginning of the second concentration window. The lower significance level of 0.01, represented by the full line, results in an intersection at an earlier time and thus a wider concentration window. In all future iterative analyses of these chromatography data, Data_Chrom2a.m, the level of significance is set to 0.01.
Figure 5-43. Different significance levels defining the concentration windows for Data_Chrom2a.

The initial guess for the concentration profiles is computed as the combination of the forward and backward EFA graphs; the smaller of each forward/backward pair is used. We display them in Figure 5-44, as we use these initial guesses in most other upcoming model-free analyses.

C=min(EFA_f(:,1:nc),fliplr(EFA_b(:,1:nc))); % combined SV curves
plot(t,C);
xlabel('time');ylabel('concentration');
Figure 5-44. Initial guesses for the concentration profiles, computed as the combination of the singular value traces for forward and backward EFA.
Several additional comments are due. As observed in Chapter 5.2.2 for Iterative Target Transform Factor Analysis, ITTFA, iterative progress is relatively fast at the beginning and slows down continuously with the number of iterations. The third panel of Figure 5-42 demonstrates that the minimum has not been reached at all after 100 iterations. While the concentration profiles are reasonably well reproduced, there are some problems with the absorption spectra; one spectrum has a substantial contribution from another. Nevertheless, considering the simplicity of the algorithm, the results are astoundingly accurate.

Model-free methods supply absolute information neither about the concentrations nor about the spectra; essentially they only deliver the shapes of the profiles. In this and future examples, we normalise the concentration profiles in C to a maximum of 1 and adjust the species spectra of A in such a way that the product CA is correct. This is done in the function norm_max.m.

MatlabFile 5-38. norm_max.m
function [Cn,An]=norm_max(C,A)
coef=1./max(C);                 % normalisation coefficients
Cn=C*diag(coef);                % apply to C
if nargin==2
  An=diag(1./coef)*A;           % apply inverse to A
end
It is worthwhile to compare this iterative refinement of concentration profiles, as given on p.271, with ITTFA, the other iterative process we introduced in Chapter 5.2.2.

Instead of computing A=C⁺Y, as in (b) on p.271, we replace Y with ŪS̄V̄, see equation (5.10):

$$Y = C\,A = \bar{U}\bar{S}\bar{V} \qquad (5.44)$$

The component spectra A can then be determined by the following rearrangements. Multiplication with $(\bar{U}^t C)^{-1}\bar{U}^t$ from the left results in

$$(\bar{U}^t C)^{-1}\,\bar{U}^t C\,A = (\bar{U}^t C)^{-1}\,\bar{U}^t\,\bar{U}\bar{S}\bar{V}$$

or

$$A = (\bar{U}^t C)^{-1}\,\bar{S}\bar{V} \qquad (5.45)$$

The next step is C=YA⁺, as in (d) on p.271. Applying equation (5.45) to compute the pseudo-inverse of A:

$$C = Y\,A^+ = \bar{U}\bar{S}\bar{V}\,\bar{V}^t\,\bar{S}^{-1}\,\bar{U}^t C = \bar{U}\,\bar{U}^t C \qquad (5.46)$$

which is exactly the formula used in ITTFA. If no changes are applied to C and A during both iterative processes, there is no difference between the two methods. The advantage of the refinement, as outlined in steps (a)-(f) on p.271, lies in the possibility of incorporating extra information on A, such as the non-negativity constraint.

Explicit Computation of the Concentration Profiles
As we have seen with the previous iterative refinement and ITTFA, convergence generally is very sluggish. Even with moderately complex systems, it is often too slow to be useful. There are alternative, non-iterative methods that compare favourably with the above iterative algorithms. ¯¯¯ which We start the derivation with the standard equations Y=CA and Y=USV t t −1 we combine, see equation (5.44). Post-multiplication with V (A V ) results in
C A V̄ᵗ (A V̄ᵗ)⁻¹ = C = Ū S̄ (A V̄ᵗ)⁻¹    (5.47)
To compute C, the only unknown is A. It is advantageous to regard the product S̄(A V̄ᵗ)⁻¹ as the unknown. Dimensional analysis shows that it is an nc×nc square matrix (nc = number of components); we call it a transformation matrix T:
T = S̄ (A V̄ᵗ)⁻¹    (5.48)

And now

C = Ū T    (5.49)
This is a very interesting and useful equation and we will return to it several times in later parts of this chapter. Equation (5.49) relates the concentration matrix C to the matrix Ū of eigenvectors. It is worthwhile representing the equation graphically.
Figure 5-45. Graphical representation of C = Ū T: the ns×nc matrix C is the product of the ns×nc matrix Ū and the nc×nc matrix T.
For an nc=2 component system the square transformation matrix T has only 4 elements, for a 3 component system there are 9 elements, etc. This relatively small number of unknown elements can be calculated explicitly! The crucial information is contained in the concentration windows, as determined by EFA. We know that the elements of the matrix C within the concentration windows are positive, while outside these windows the elements of C are zero. We also know the complete matrix Ū. The known elements of C and Ū are represented as the shaded areas in Figure 5-46. The white parts of the matrices have to be calculated.
Figure 5-46. The shaded parts of C and Ū are known.

The idea is to compute the elements of T in such a way that the product ŪT results in zeros for the shaded part of C. In order to achieve this, we can separate the columns of C and treat them individually. The i-th column c:,i of C is the product Ū×t:,i, where t:,i is the i-th column of T.

c:,i = Ū × t:,i    (5.50)
Graphically:

Figure 5-47. The i-th column c:,i of C is the product Ū × t:,i.
This equation can be split up again. We can remove the unknown, white part of c:,i, i.e. the window of existence of the i-th component, and also the corresponding part (rows, in white) of Ū. What is left, in grey, is the product
c⁰:,i = Ū⁰ × t:,i = 0    (5.51)
where the superscript '0' represents the zero parts of c:,i and the corresponding parts of Ū.
Figure 5-48. The homogeneous system of equations Ū⁰ × t:,i = 0.

This represents a homogeneous system of equations. There is, as always with homogeneous equations, a trivial solution: t:,i=0. There are, however, and fortunately, non-trivial solutions as well. This is due to the fact that Ū⁰ does not have full rank. We removed all the information on the i-th component and consequently the rank of Ū⁰ is one less than the rank of the complete Ū. In such a system of equations, a solution t:,i is not completely defined, as it can be multiplied by any factor (≠0). Thus, we can freely choose one element, e.g. the first element in t:,i, as one, t1,i=1. Equation (5.51) can now be written as
0 = Ū⁰ × t:,i = u⁰:,1 × 1 + Ū⁰:,2:nc × t2:nc,i    (5.52)
Figure 5-49. Graphical representation of equation (5.52).
This allows the computation of the other elements t2:nc,i by linear regression as

t2:nc,i = −(Ū⁰:,2:nc)⁺ u⁰:,1    (5.53)
This process is repeated for all individual columns of C to result in the complete matrix T (containing 1's in the first row). Finally, the product ŪT is the complete matrix C. As always with model-free analyses, only the shapes of the concentration profiles are determined. They have to be normalised in some way. We have seen in The Structure of the Eigenvectors (p.221) that the sign of the eigenvectors is not defined. In the present application, a concentration profile can be positive or, if negative, needs to be multiplied by -1. The two lines

neg=-min(C)>max(C);      % sign of conc profiles
C=C*diag((-1).^neg);     % reverse if negative

in Main_Non_It_EFA.m check for negative concentrations and, if necessary, correct them. The subsequent computation of the absorption spectra A from C and Y is a simple linear regression. This is followed by the normalisation of the concentration profiles to a maximum of one, as has been outlined already in the preceding chapter Iterative Refinement of the Concentration Profiles. The normalisation is done using the routine norm_max.m (p.275).
MatlabFile 5-39. Main_Non_It_EFA.m
% Main_Non_It_EFA
[t,lam,Y,C_sim,A_sim]=Data_Chrom2a;
[ns,nc]=size(C_sim);
ne=nc;
[EFA_f,EFA_b]=EFA(Y,ne+1);               % perform EFA
sig_level=0.01;                          % define cut-off level
% build window matrix
C_window=EFA_f(:,1:nc)>sig_level & fliplr(EFA_b(:,1:nc))>sig_level;
[U,S,Vt]=svd(Y,0);
U_bar=U(:,1:ne);
T=ones(nc,nc);
for i=1:nc
    U_i_0=U_bar(~C_window(:,i),:);
    T(2:nc,i)=-U_i_0(:,2:nc)\U_i_0(:,1);
end
C=U_bar*T;
neg=-min(C)>max(C);                      % sign of conc profiles
C=C*diag((-1).^neg);                     % reverse if negative
A=C\Y;
[C_n,A_n]=norm_max(C,A);                 % normalisation of calc. C and A
[C_sim_n,A_sim_n]=norm_max(C_sim,A_sim); % norm. of true C and A
subplot(2,1,1)
plot(lam,A_n,'-',lam,A_sim_n,'.');
xlabel('wavelength');ylabel('absorptivity');
subplot(2,1,2);
plot(t,C_n,'-',t,C_sim_n,'.');
xlabel('time');ylabel('concentration');
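The whole construction — known zero windows, the homogeneous system (5.51)-(5.53), and C = ŪT — can be checked numerically in a few lines of NumPy. The Gaussian profiles and random spectra below are illustrative assumptions, not the book's data set.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(50.0)
# two concentration profiles with limited windows of existence
C_true = np.stack([np.exp(-0.5 * ((t - 15) / 4) ** 2),
                   np.exp(-0.5 * ((t - 30) / 4) ** 2)], axis=1)
C_true[C_true < 1e-6] = 0.0            # exact zeros outside the windows
A_true = rng.random((2, 20))           # random component spectra
Y = C_true @ A_true                    # noise-free data matrix, rank 2

U, S, Vt = np.linalg.svd(Y, full_matrices=False)
U_bar = U[:, :2]
T = np.ones((2, 2))                    # first row fixed to 1, i.e. t(1,i)=1
for i in range(2):
    U0 = U_bar[C_true[:, i] == 0, :]   # rows where component i is absent
    # eq (5.53): regress the remaining columns onto -u0_(:,1)
    T[1:, i] = -np.linalg.lstsq(U0[:, 1:], U0[:, 0], rcond=None)[0]
C = U_bar @ T                          # eq (5.49): shapes of the profiles
```

Each column of the computed C agrees with the corresponding column of C_true up to one scaling factor (and possibly a sign), exactly as stated in the text.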
Figure 5-50. Result of non-iterative EFA.

It is instructive to compare Figure 5-50 with Figure 5-42. The explicit computation is not only much faster, it also produces better results. This clearly is a consequence of the poor convergence of the iterative version of EFA.
5.4 Alternating Least-Squares, ALS

The method of Alternating Least-Squares, ALS, is very simple, and exactly for that reason it can be very powerful. ALS has found widespread application and is an important method in the collection of model-free analyses. In contrast to most other model-free analyses, ALS is not based on Factor Analysis. ALS should more correctly be called Alternating Linear Least-Squares, as every step in the iterative cycle is a linear least-squares calculation followed by some correction of the results. The main advantage and strength of ALS is the ease with which any conceivable constraint can be implemented; its main weakness is the inherent poor convergence. This is a property ALS shares with the very similar methods of Iterative Target Transform Factor Analysis, ITTFA, and Iterative Refinement of the Concentration Profiles, discussed in Chapters 5.2.2 and 5.3.3.
We start with the flow diagram in Figure 5-51 demonstrating the basic ideas. Of course, the data matrix is still Y and the goal is to decompose it into the product of the concentration matrix C and matrix A of molar absorptivities according to Chapter 3.1, Beer-Lambert's Law.
Initial guess for C
→ A = C⁺Y; corrections to A → Ã
→ C = YÃ⁺; corrections to C → C
→ R = Y − CA; compare ssq with ssqold:
   if > ssqold: 'do something'; if < ssqold: next iteration; if = (converged): end

Figure 5-51. Flow diagram for the ALS algorithm.

The diagram starts with initial guesses for the concentration profiles C. It is, of course, equally possible to start with initial guesses for the component spectra A, swapping the order of the linear regression/correction steps: calculating first C and then A, while the structure of the rest is the same.
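As a complement to the Matlab programs that follow, the whole flow diagram fits into a few lines of Python/NumPy. The clipping 'corrections' and the two-component test data below are illustrative assumptions; any other constraint function could be substituted for the clipping step.

```python
import numpy as np

def als(Y, C, n_iter=100):
    """Basic ALS loop of Figure 5-51 with non-negativity enforced by clipping."""
    ssq = []
    for _ in range(n_iter):
        A = np.linalg.lstsq(C, Y, rcond=None)[0]         # A = C+ Y
        A = np.clip(A, 0.0, None)                        # corrections to A
        C = np.linalg.lstsq(A.T, Y.T, rcond=None)[0].T   # C = Y A+
        C = np.clip(C, 0.0, None)                        # corrections to C
        R = Y - C @ A                                    # residuals
        ssq.append(float((R * R).sum()))                 # sum of squares
    return C, A, ssq

# two-component synthetic 'chromatogram'
t = np.linspace(0.0, 1.0, 40)
C_true = np.stack([np.exp(-((t - 0.3) / 0.1) ** 2),
                   np.exp(-((t - 0.6) / 0.1) ** 2)], axis=1)
A_true = np.array([[1.0, 0.2, 0.0],
                   [0.0, 0.5, 1.0]])
Y = C_true @ A_true
C_fit, A_fit, ssq = als(Y, C_true + 0.05)                # crude initial guess
```

The per-iteration sum of squares is collected in ssq, so the convergence behaviour discussed in the text can be inspected directly.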
5.4.1 Initial Guesses for Concentrations or Spectra

Experience shows that often the quality of the initial guesses made for the concentration matrix C (or the matrix A of component spectra) is not crucial. As demonstrated in Figure 5-42, progress in this kind of algorithm typically is fast initially and slows down dramatically towards the minimum. Nevertheless, it cannot harm to have good initial starting matrices. Commonly implemented options include:

• combined eigenvalue curves, such as in Figure 5-44
• the non-iterative EFA result, Figure 5-50
• concentration windows (matrices formed by 1's and 0's), such as the bottom panel of Figure 5-39
5.4.2 Alternating Least-Squares and Constraints

By far the most important aspect of the ALS algorithm is the ease of implementing restrictions. In the following we demonstrate this using a number of examples. The program Main_ALS.m forms the backbone of the ALS algorithm. It reads in the data set Data_Chrom2a (p.251), which simulates an overlapping chromatogram of three components. It is the data set we used previously in Chapter 5.3.3 to demonstrate the concepts of iterative and explicit computation of the concentration profiles, based on the window information from EFA. In order to facilitate the comparison of the results and the progress of the iterative process, we start all iterative attempts with the same concentration profiles. They are the combined eigenvalue traces of EFA, as shown in Figure 5-44.

The very basic ALS program does not include a termination criterion for the iterative cycle; just 100 iterations are performed. As convergence invariably slows down towards the minimum, it is not trivial to introduce a generally reliable termination criterion. The algorithm also does not incorporate steps that are required if there is divergence in an iteration. This is indicated by 'do something' in Figure 5-51. Again, it is not easy to develop generally applicable measures that force the iterations in a good direction. There is no equivalent to the Levenberg/Marquardt method that deals with divergence in the Newton-Gauss algorithm.
MatlabFile 5-40. Main_ALS.m
% Main_ALS
[t,lam,Y,C_sim,A_sim]=Data_Chrom2a;
[ns,nc]=size(C_sim);
nl=length(lam);
ne=nc+1;                                 % one extra singular value
[EFA_f,EFA_b]=EFA(Y,ne);                 % perform EFA
EFA_f(isnan(EFA_f)==1)=0;                % replace NaN's by zeros
EFA_b(isnan(EFA_b)==1)=0;                % replace NaN's by zeros
% combined singular value curves
C=min(EFA_f(:,1:nc),fliplr(EFA_b(:,1:nc)));
for it=1:100
    C=norm_max(C);                       % normalization
    [C,A]=constraints_positiveCA(Y,C);   % constraints
    R=Y-C*A;                             % residuals
    ssq(it)=sum(sum(R.*R));
end
[C_n,A_n]=norm_max(C,A);                 % norm. C, C_sim to max. 1
[C_sim_n,A_sim_n]=norm_max(C_sim,A_sim); % and recalc. A, A_sim
subplot(3,1,1);
plot(lam,A_n,'-',lam,A_sim_n,'.');
xlabel('wavelength');ylabel('absorptivity');
subplot(3,1,2);
plot(t,C_n,'-',t,C_sim_n,'.');
xlabel('time');ylabel('concentration');
subplot(3,1,3);
plot(log10(ssq));
xlabel('iteration');ylabel('log(ssq)');
axis([0 100 -3 0]);
Figure 5-52. 100 iterations of ALS using the simplest constraint of setting negative values of A and C to zero. The markers represent the true spectra and concentration profiles, the lines the ALS result. The bottom panel shows the progress of the sum of squares.

There are several types of constraints that can be used and generally, the more constraints applied, the better the convergence and the better defined the results.
The most important and almost universally applicable constraint is the non-negativity of all elements of C and A. Obviously, neither concentrations nor molar absorptivities can be negative. In many ALS algorithms, this constraint is enforced by simply setting all negative entries in C and A to zero:
MatlabFile 5-41. constraints_positiveCA.m
function [C,A]=constraints_positiveCA(Y,C)
A=C\Y;         % spectra
A=A.*(A>0);    % positive
C=Y/A;         % conc profiles
C=C.*(C>0);    % positive
There are exceptions to the universality of this non-negativity constraint: e.g. CD or ESR spectra can be negative. Apart from that, both spectroscopies produce a signal that is a linear function of concentration and thus the equivalent of Beer-Lambert's law holds. In other words, the equation Y=CA applies, and thus also the ALS algorithm.

The alternating computation of the matrices A and C in linear least-squares fits, each followed by setting negative values to zero, is simple but very crude. This fact is reflected in the slow progress of the sum of squares minimisation. Matlab supplies the function lsqnonneg that performs a non-negative least-squares fit of the kind y=Ca+r, where y and a are column vectors. The function computes the best vector a with only non-negative entries. This equation corresponds to data acquired at only one wavelength. In our application, the columns of A have to be computed individually in a loop over all wavelengths, in each instance using the appropriate column of Y. C is the complete matrix of concentrations; it is, of course, the same for all wavelengths. The following function constraints_lsqnonneg.m replaces the function constraints_positiveCA.m. (Naturally, the call in the main program, Main_ALS.m, needs to be adapted.) All columns a:,j of A are computed sequentially in a loop. In the alternate computation of the best C from A, the same function can be used. It computes the rows of C in an analogous loop, using the appropriate row of Y and the complete matrix A. The computation of non-negative rows of C using lsqnonneg requires the appropriate transpositions of the rows of C and Y and the matrix A.
MatlabFile 5-42. constraints_lsqnonneg.m
function [C,A]=constraints_lsqnonneg(Y,C)
[ns,nl]=size(Y);
for j=1:nl                       % pos spectra (MATLAB)
    A(:,j)=lsqnonneg(C,Y(:,j));
end
for j=1:ns                       % pos conc. (MATLAB)
    C(j,:)=lsqnonneg(A',Y(j,:)')';
end
Figure 5-53. ALS using the Matlab function lsqnonneg.m for non-negative linear least-squares fitting.

Compared to Figure 5-52, the resultant concentration profiles and spectra appear to be very similar. The plot of the development of the quality of the fit indicates that a smaller sum of squares is achieved in fewer iterations. However, the calculation is much slower and there is no obvious and significant benefit. In Constraint: Positive Component Spectra (p.168), we introduced an improved, much faster matrix-based function nonneg.m (provided by C. Andersson) that is more efficient than the Matlab function lsqnonneg.m. The result of implementing the function constraints_nonneg.m is identical but achieved much faster.
MatlabFile 5-43. constraints_nonneg.m
function [C,A]=constraints_nonneg(Y,C)
A=nonneg(Y',C')';   % pos spectra (Andersson)
C=nonneg(Y,A);      % pos conc. (Andersson)
The secondary hump in the third concentration profile in Figure 5-52 and
Figure 5-53 obviously is not correct. In fact, often it is independently known
that the concentration profiles are unimodal, i.e. they have only one maximum and continuously decrease on both sides of it. This is certainly the case for chromatographic concentration profiles. The function constraints_nonneg_unimod.m implements this additional constraint by levelling off secondary maxima. It also uses nonneg.m for non-negative linear least-squares fits. The effect, as demonstrated in Figure 5-54, is clear; not only has the secondary maximum been suppressed, but as a consequence the absorption spectra also are closer to the true ones.
MatlabFile 5-44. constraints_nonneg_unimod.m
function [C,A]=constraints_nonneg_unimod(Y,C)
[ns,nc]=size(C);
A=nonneg(Y',C')';                    % pos spectra (Andersson)
C=nonneg(Y,A);                       % pos conc. (Andersson)
for j=1:nc                           % unimodal conc. profiles
    [m,p]=max(C(:,j));
    for i=p:ns-1
        if C(i+1,j)>C(i,j); C(i+1,j)=C(i,j); end
    end
    for i=p:-1:2
        if C(i-1,j)>C(i,j); C(i-1,j)=C(i,j); end
    end
end
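The levelling-off step translates directly into Python; the helper below (the name unimodal is ours) applies the same walk away from the global maximum of each column.

```python
import numpy as np

def unimodal(C):
    """Force each column of C to be unimodal by levelling off secondary maxima."""
    C = C.copy()
    ns, nc = C.shape
    for j in range(nc):
        p = int(np.argmax(C[:, j]))     # position of the global maximum
        for i in range(p, ns - 1):      # walking right, values may never rise
            if C[i + 1, j] > C[i, j]:
                C[i + 1, j] = C[i, j]
        for i in range(p, 0, -1):       # walking left, values may never rise
            if C[i - 1, j] > C[i, j]:
                C[i - 1, j] = C[i, j]
    return C

profile = np.array([[0.0, 1.0, 3.0, 2.0, 2.5, 1.0, 0.0]]).T
print(unimodal(profile).ravel())        # secondary hump of 2.5 levelled to 2.0
```

Any value that rises again after the maximum has been passed is simply flattened to the last accepted value, which is exactly what the Matlab loops above do.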
Figure 5-54. ALS using the function constraints_nonneg_unimod.m, performing non-negative linear least-squares and removing secondary maxima in the concentration profiles.
The unimodality constraint distorts the least-squares improvements and this is evident from the slower convergence. However, the constraint forces the concentration profiles into physically possible shapes.

Another, most powerful constraint is based on the concentration windows provided by EFA.
MatlabFile 5-45. constraints_nonneg_window.m
function [C,A]=constraints_nonneg_window(Y,C,C_window)
[ns,nc]=size(C);
A=nonneg(Y',C')';   % pos spectra (Andersson)
C=nonneg(Y,A);      % pos conc. (Andersson)
C=C.*C_window;      % apply windows from EFA
Figure 5-55. ALS applying the function constraints_nonneg_window.m, using non-negative linear least-squares and the window matrix applied to C.

The matrix C_window contains the window information. It is composed of 1's and 0's indicating whether the particular value in the corresponding entry in
the matrix C is known to be positive or zero, see Figure 5-37. This matrix C_window is computed directly before the ALS loop:

sig_level=0.01;         % define cut-off level
C_window=C>sig_level;   % build window matrix
Of course the matrix C_window has to be passed as an argument into constraints_nonneg_window.m. Refer to Figure 5-43 for a discussion of the level of significance used above.
5.4.3 Rotational Ambiguity

The one major problem with all model-free methods is the fact that often there is no unique solution to the task of decomposing the matrix Y of measurements into the product of two positive matrices C and A. In many instances, there is a whole range of possible solutions. Recall the original model-free method by Lawton-Sylvestre (p.231) that clearly results in bands of feasible solutions, see Figure 5-16. In the literature on model-free analyses, the expression 'rotational ambiguity' has been coined for such situations. In instances where there is rotational ambiguity, algorithms like ALS converge to one particular point within the range of possibilities. Importantly, the algorithm does not detect such situations and thus does not warn the user of the potential non-uniqueness of the result. It is difficult to generalise, but such ambiguous situations often occur when the concentration windows overlap in specific ways. Kinetic investigations are typical examples of rotational ambiguity as a result of very wide concentration windows. Using Data_ABC2.m (p.256), which produces data mimicking a reaction A→B→C, instead of the chromatography data, and applying constraints_nonneg.m, results in Figure 5-56. The main program Main_ALS2.m is not listed here; it is virtually identical to Main_ALS.m.

While the resulting concentration profiles, and in particular the computed spectra, seem to be reasonably close to the true ones, there are significant discrepancies, typical for model-free analyses. (a) The computed concentration profile for the intermediate component reaches zero at the end of the measurement. (b) The initial part of the concentration profile for the final product is wrong; it does not start with zero concentration. Both discrepancies are the result of rotational ambiguity. The minimal ssq, reached after relatively few iterations, reflects the noise of the data and not a misfit between CA and Y.
ssq does not improve if the correct matrices C and A are used.
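A tiny NumPy construction makes the ambiguity tangible: for a hand-picked transformation matrix T (the numbers below are illustrative, not derived from the kinetic example), two different non-negative pairs reproduce exactly the same Y.

```python
import numpy as np

C = np.array([[1.0, 0.0],
              [0.6, 0.4],
              [0.1, 1.0],
              [0.0, 0.7]])              # feasible concentration profiles
A = np.array([[1.0, 1.0, 1.0, 0.5],
              [0.0, 1.0, 2.0, 1.0]])    # feasible component spectra
Y = C @ A

T = np.array([[1.0, 0.2],
              [0.0, 0.8]])              # hand-picked invertible transformation
C2 = C @ T                              # alternative profiles, still >= 0
A2 = np.linalg.inv(T) @ A               # alternative spectra, still >= 0

assert np.allclose(C2 @ A2, Y)          # identical data matrix Y
assert (C2 >= 0).all() and (A2 >= -1e-12).all()
```

A least-squares fit cannot distinguish between (C, A) and (C2, A2); which of the feasible pairs an ALS run returns depends on the starting guess and on the constraints applied.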
Figure 5-56. ALS analysis of kinetic data Data_ABC2.m analysed using constraints_nonneg.m.

Implementation of additional constraints can help remove, or at least reduce, rotational ambiguity. One possibility to narrow down the range is to utilise a known component spectrum. The simplest way of implementing known component spectra is to replace the appropriate spectrum within the iterations with the 'correct', known one; see constraints_nonneg_known_spec.m. We repeat: a powerful property of the ALS algorithm is the ease with which additional known information can be implemented.
MatlabFile 5-46. constraints_nonneg_known_spec.m
function [C,A]=constraints_nonneg_known_spec(Y,C,A_sim)
A=nonneg(Y',C')';    % pos spectra (Andersson)
A(2,:)=A_sim(2,:);   % known spectrum
C=nonneg(Y,A);       % pos conc. (Andersson)
Figure 5-57. ALS with known intermediate spectrum.

The improvement resulting from the incorporation of the correct spectrum for the intermediate is subtle but significant. All the resulting spectra are improved, with the intermediate spectrum, of course, exactly correct. The new concentration profiles for the starting material A and the product C are now correct, while the profile for the intermediate B is essentially untouched. Compared to Figure 5-56, the minimal ssq after the incorporation of the correct spectrum is not improved, which is a clear indication of rotational ambiguity. The remaining discrepancy in the concentration profile of the intermediate B, together with small errors in the spectra, shows that the solution is still not unique: the introduction of the one correct spectrum has reduced the range of rotational ambiguity but not totally removed it.
5.5 Resolving Factor Analysis, RFA

Resolving Factor Analysis, RFA, is an attempt to introduce the strengths of the Newton-Gauss algorithm into the model-free analysis methodology. As we have seen in many instances, the iterative progress in model-free analyses can be very slow. The Newton-Gauss algorithm, in contrast, accelerates as it converges towards the optimum. Combining the two methodologies is a promising idea that should result in much faster computations.
Equation (5.49), C = Ū T, is the core of RFA; see also its graphical representation in Figure 5-45. Ū is known from the SVD of the measurement Y; C, and also A, see equation (5.58), can be calculated as a function of a transformation matrix T. The residuals and the sum of squares are defined as

R(T) = Y − C(T) × A(T),   ssq = ΣᵢΣⱼ R²ᵢ,ⱼ    (5.54)

and are minimised by the Newton-Gauss method. For a three-component system, the matrix T has nine elements and thus it appears that C, and eventually the sum of squares, are a function of nine parameters. As we will see in a moment, there are actually fewer, only six, parameters to be fitted. The idea of RFA is to use the Newton-Gauss algorithm to fit this rather small number of parameters in T.

To start the iterative cycle, we need a set of initial guesses, Tguess, for the parameters T. In order to compare the properties of RFA with the previous iterative methods, we use the same data set as generated by Data_chrom2a.m (p.251). Since the latest Newton-Gauss algorithm, nglm3.m (p.173), requires Matlab structures, the equivalent data are generated by the appropriate function Data_RFA_chrom2a.m. The Newton-Gauss algorithm requires initial estimates for the parameters in T. These can be computed from the same estimated concentration profiles Cguess as before (Figure 5-44). Tguess is determined by
Tguess = Ūᵗ Cguess    (5.55)
An important issue needs to be discussed next. Multiplying a column of C by any number and the corresponding row of A by the inverse of that number does not affect the product CA, and thus this factor is not determined at all; it can be freely chosen. Due to this multiplicative ambiguity, only the shapes of the concentration profiles (and component spectra) can be determined by any model-free method, and only additional quantitative information allows the absolute determination of C and A. Multiplying a concentration profile, or column of C, by a factor is equivalent to multiplying the corresponding column of T by the same factor. Any one element of each column vector of T can be chosen freely, while the other elements in that column define the shape of the concentration profile. In order to avoid numerical problems with very small or very large numbers in each column of T, we choose the largest absolute element of each column of the matrix of initial guesses Tguess and keep it
fixed during the iterative refinement of the others. This reduces the number of parameters that need fitting to nc(nc-1), or in our example of a three component system from 9 to 6.

The Newton-Gauss algorithm (nglm3.m) is called from Main_RFA.m and requires a Matlab function that computes the residuals as a function of the parameters T, as defined in equation (5.54). This calculation is performed in the Matlab function Rcalc_RFA.m. First C is computed as C = Ū T, see Figure 5-45. Elements of C that are outside the concentration window, and negative elements, are set to zero. A is computed as T⁻¹S̄V̄. This somewhat surprising equation can be derived in the following way:
C A = Ū S̄ V̄    (5.56)

Introducing the identity matrix TT⁻¹ in the appropriate position:

C A = Ū T T⁻¹ S̄ V̄    (5.57)

As C = Ū T, A must be:

A = T⁻¹ S̄ V̄    (5.58)
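Equations (5.56)-(5.58) are easy to verify numerically, with a random rank-two matrix standing in for the measurement (an assumption for illustration only): for any invertible T, the pair C = ŪT, A = T⁻¹S̄V̄ reproduces Y exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.random((30, 2)) @ rng.random((2, 12))   # exact rank-2 'measurement'
U, S, Vt = np.linalg.svd(Y, full_matrices=False)
U_bar = U[:, :2]                                # truncated SVD factors
S_bar = np.diag(S[:2])
V_bar = Vt[:2, :]
T = rng.random((2, 2)) + np.eye(2)              # safely invertible matrix
C = U_bar @ T                                   # eq (5.49)
A = np.linalg.inv(T) @ S_bar @ V_bar            # eq (5.58)
assert np.allclose(C @ A, Y)                    # C A = U T T^-1 S V = Y
```

Whatever invertible T is chosen, T and T⁻¹ cancel in the product, which is precisely why the fit can concentrate on the few elements of T alone.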
Next, negative elements of A are set to zero, and the residuals and the sum of squares are computed as indicated in Figure 5-51. The derivatives of the residuals with respect to the parameters are computed numerically by the Newton-Gauss algorithm.
MatlabFile 5-47. Rcalc_RFA.m
function [r,s]=Rcalc_RFA(s)
s.C=s.U_bar*s.T;
s.C=s.C.*s.C_window;            % apply window from EFA
s.C(s.C<0)=0;                   % C > 0
s.A=inv(s.T)*s.S_bar*s.V_bar;
s.A(s.A<0)=0;                   % A > 0
r=s.Y-s.C*s.A;                  % residuals
There is one detail that necessitates a few additional comments: nglm3.m requires a vector (s.par_str) of strings that contains the names of all variables that are fitted. In the RFA application these are the nc(nc-1) elements of T, s.par_str=[s.T(1,1), s.T(1,2), …]. The function build_par_str.m does the job of finding the maximum element of each column of T and including the other elements in par_str.
MatlabFile 5-48. build_par_str.m
function par_str=build_par_str(T)
[maxT,index]=max(abs(T));
k=0;
for j=1:3
    for i=1:3
        if i~=index(j)
            k=k+1;
            par_str{k}=['s.T(' int2str(i) ',' int2str(j) ')'];
        end
    end
end
MatlabFile 5-49. Main_RFA.m
% Main_RFA
s=Data_RFA_Chrom2a;                  % get data into structure s
s.fname='Rcalc_RFA';                 % file to calc residuals
ne=s.nc;
[EFA_f,EFA_b]=EFA(s.Y,ne);           % perform EFA on ne sing. values
EFA_f(isnan(EFA_f)==1)=0;            % replace NaN's by zeros
EFA_b(isnan(EFA_b)==1)=0;            % replace NaN's by zeros
% combined singular value curves
C_guess=min(EFA_f(:,1:s.nc),fliplr(EFA_b(:,1:s.nc)));
sig_level=0.01;                      % cut off level
s.C_window=C_guess>sig_level;        % build window matrix
C_guess=norm_max(C_guess);           % normalise
[U,S,Vt]=svd(s.Y,0);
s.U_bar=U(:,1:ne);
s.S_bar=S(1:ne,1:ne);
s.V_bar=Vt(:,1:ne)';
s.T=s.U_bar'*C_guess;                % initial guesses for s.T from C_guess
s.par_str=build_par_str(s.T);        % cell array of 's.T(i,j)' strings
s.ssq_all=[];
s.par=get_par(s);                    % collects variable parameters into s.par
s=nglm3(s);                          % call nglm3, Newton-Gauss
fprintf(1,'s.T = \n');disp(s.T);fprintf(1,'\n');  % display T
s.sig_r=sqrt(s.ssq/(s.ns*s.nl-length(s.par)));    % sigma_r
s.sig_par=s.sig_r*sqrt(diag(inv(s.Curv)));        % sigma_par
for i=1:length(s.par)
    fprintf(1,'%s: %g +- %g\n',s.par_str{i}(3:end), ...
        s.par(i),s.sig_par(i));
end
fprintf(1,'sig_r: %g\n',s.sig_r);
[C_sim_n,A_sim_n]=norm_max(s.C_sim,s.A_sim);
[C_n,A_n]=norm_max(s.C,s.A);
subplot(3,1,1);
plot(s.lam,A_n,'-',s.lam,A_sim_n,'.');
xlabel('wavelength');ylabel('absorptivity');
subplot(3,1,2);
plot(s.t,C_n,'-',s.t,C_sim_n,'.');
xlabel('time');ylabel('concentration');
subplot(3,1,3);
plot(log10(s.ssq_all),'.');
xlabel('iteration');ylabel('log(ssq)');

it=0, ssq=7.9338, mp=0, conv_crit=1
it=1, ssq=0.941161, mp=0, conv_crit=0.881373
it=2, ssq=0.176043, mp=0, conv_crit=0.812951
it=3, ssq=0.0444385, mp=0, conv_crit=0.74757
it=4, ssq=0.0145205, mp=0, conv_crit=0.673245
it=5, ssq=0.00511355, mp=0, conv_crit=0.647839
it=6, ssq=0.0029535, mp=0, conv_crit=0.422417
it=7, ssq=0.00241231, mp=0, conv_crit=0.183239
it=8, ssq=0.00237849, mp=0, conv_crit=0.0140165
it=9, ssq=0.00237848, mp=0, conv_crit=6.54114e-006
s.T =
   -5.7280   -2.4717   -3.9865
    5.2605    0.7339   -3.1774
    2.4481   -1.2217   -0.3969

T(1,1): -5.72798 +- 0.0237521
T(2,1): 5.26049 +- 0.0167212
T(2,2): 0.733879 +- 0.0025855
T(3,2): -1.2217 +- 0.00220923
T(2,3): -3.17735 +- 0.0110746
T(3,3): -0.396851 +- 0.00511213
sig_r: 0.00106576
Figure 5-58. RFA analysis of chromatography data.
It is illustrative to compare Figure 5-58 with the equivalent result of the ALS analysis in Figure 5-55. The most striking difference is the number of iterations required to reach the outcome. RFA arrives at the optimal resolution, within the constraints, in 10 iterations. ALS, using equivalent constraints, results in acceptable matrices C and A, but even after 100 iterations the optimum clearly has not been reached.
5.6 Principal Component Regression and Partial Least Squares, PCR and PLS

Principal Component Regression, PCR, and Partial Least Squares, PLS, are the most widely known and applied chemometrics methods. This is particularly the case for PLS, for which there is a tremendous number of applications and a never-ending stream of proposed improvements. The details of these latest modifications are not within the scope of this book and we concentrate on the essential, classical aspects.

One could argue whether PCR and PLS should be part of the chapter Model-Based Analyses or Model-Free Analyses. Both PCR and PLS are clearly not hard-model fitting methods in the way presented in Chapter 4, nor are they pure model-free analyses. They are somewhere in between, maybe closer to model-free analyses, and that is the reason for discussing them here. PCR and PLS establish a mathematical relationship (calibration) between the matrix that is formed by the spectra taken of a collection of samples and the vector of properties or qualities for these same samples. Additionally, both methods allow the prediction of the quality for new samples, based solely on their spectra. In contrast to most methods discussed so far, PCR and PLS do not require any order in the data set.

In this chapter, we deviate from our well-established principle of generating 'measurements' and analysing them subsequently with the methods developed for the purpose. Such a procedure does not make much sense for PCR/PLS; at least, it would be rather difficult to generate realistic data sets that are amenable to analysis by PCR or PLS. We decided to use a publicly available data set; the file corn.mat can be downloaded from http://software.eigenvector.com/Data/Corn/index.html. This data set contains near infrared (NIR) spectra of a collection of 80 corn samples measured on three different instruments, together with the qualities 'Moisture', 'Oil', 'Protein' and 'Starch' for each sample.
We use the example of 'Protein' measured on instrument 'mp6' to demonstrate the principles of the PCR/PLS analyses. In order to chemically analyse a sample of corn for its protein content, a rather complex analytical procedure (e.g. Kjeldahl analysis) is required; it is a slow and expensive process. In our example, the PCR/PLS group of methods replaces this procedure with a much faster spectroscopic analysis. First, a mathematical relationship is established from a calibration set, comprising a matrix of NIR spectra of the collection of samples and the vector of
corresponding qualities. This calibration can subsequently be used to predict the particular quality for a new sample from its NIR spectrum alone, thus avoiding an expensive experimental analysis. A more traditional spectroscopy-based approach to corn analysis would be to investigate whether there is a peak in the NIR spectrum that correlates well with the protein content, or whether there is a ratio of peaks that correlates well, or … whatever else the scientist can think of and is prepared to try. Evidently, there is a tremendous number of potential combinations and permutations one could try. PCR/PLS do this job in a much more elegant and efficient way.
5.6.1 Principal Component Regression, PCR

In the example we deal with a collection of ns=80 corn samples for which we have the NIR spectra, measured at nl=700 wavelengths, and 80 corresponding qualities (protein contents). The Matlab script Main_PCR.m first reads in the complete corn data; then it executes stepwise all the tasks that are described in the following. In order to test PCR and later PLS, we remove a random selection of 10 test samples from the total data set; the 10 test spectra are collected row-wise in the matrix Ys and the corresponding 'known' qualities in a column vector qs,known. The remaining spectra are organised in the same way in the matrix Y of dimensions 70×700. For each one of these samples we also know the protein content; we collect these qualities in the vector q with 70 entries. In the following, Y and q serve as the calibration set that is used later to predict the unknown qualities qs for the test set Ys. The predicted qs can then be compared with the 'known' qualities qs,known.
MatlabFile 5-50. Main_PCR.m
% Main_PCR
load corn.mat mp6spec propvals;          % load corn data set
Y_data=mp6spec.data;                     % NIR spectra
q_data=propvals.data(:,3);               % protein qualities
ns=length(q_data);
lam=[1100:2:2498];
plot(lam,Y_data);
xlabel('wavelength')
rand('seed',1);                          % initialise random number generator
s=ceil(rand(10,1)*ns);                   % random selection of 10 samples
Y=Y_data; Y(s,:)=[];                     % calibration set excluding 10 samples
q=q_data; q(s)=[];
Y_s=Y_data(s,:);                         % 'unknown' test samples
q_s_k=q_data(s);                         % their 'known' qualities
Model-Free Analyses
Figure 5-59. Collection of NIR spectra of 80 corn samples.

There is a fair amount of structure in the NIR spectra, but the differences between the spectra are rather subtle. Obviously, the protein content cannot easily be read from any of the peaks.

Mean-Centring, Normalisation
There are numerous publications proposing a glut of data treatment methods prior to PCR/PLS. Well established, tested and essentially universally applied are mean-centring and normalisation of the data. We have seen in Mean Centring, Closure (p.239) that mean centring reduces the dimensionality by one, which of course cannot harm. In PCR/PLS it is also common to normalise to the standard deviation of the signals. Both are implemented in Main_PCR.m.

MatlabFile 5-51. Main_PCR.m … continued
% Main_PCR ... continued (data pre-treatment)
meanY=mean(Y);                    % mean centring Y and q
meanq=mean(q);
Y_mc=Y-repmat(meanY,size(Y,1),1);
q_mc=q-meanq;
norm_coef=1./std(Y_mc);           % normalisation of Y_mc
Y_mc_n=Y_mc*diag(norm_coef);
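For readers working outside Matlab, the same pre-treatment can be sketched in Python/NumPy. The random matrix below is merely a stand-in for the 70×700 calibration spectra, which are not reproduced here; the variable names mirror the Matlab script:

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.random((70, 700))       # stand-in for the 70x700 calibration spectra
q = rng.random(70)              # stand-in for the 70 protein qualities

meanY = Y.mean(axis=0)          # column means, one per wavelength
meanq = q.mean()
Y_mc = Y - meanY                # mean-centre the spectra
q_mc = q - meanq                # mean-centre the qualities
norm_coef = 1.0 / Y_mc.std(axis=0, ddof=1)   # ddof=1 matches Matlab's std
Y_mc_n = Y_mc * norm_coef       # scale every wavelength to unit standard deviation

# after pre-treatment each column has mean 0 and standard deviation 1
print(np.allclose(Y_mc_n.mean(axis=0), 0.0))         # True
print(np.allclose(Y_mc_n.std(axis=0, ddof=1), 1.0))  # True
```

Exactly these means and scaling coefficients must be reused, unchanged, in the later prediction step.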
Mean-centring and normalisation are optional. The PCR (and PLS) algorithms are essentially independent of the nature of the pre-treatment of the data; only the centring has to be reversed in the prediction step. In the programs we
indicate the levels of pre-treatment, i.e. Y→Ymc→Ymc,n and q→qmc, while in the equations in the text we do not.

PCR Calibration
It is possible to develop the ideas behind PCR, and to a lesser extent behind PLS, based on chemical ideas and intuition. Naturally, this is not the only way, and both PCR and PLS have also been developed along more theoretically oriented pathways. While we do not know the components of the corn samples, nor their component spectra, nor even how many components there are, we can still assume that Beer-Lambert's law holds and we can write Y=CA (see Chapter 3.1). There is nothing new here. For a chemist, it intuitively makes sense to assume that the quality 'protein content' is related to the concentrations of the components in the mixture. The simplest assumption is that the protein content is a weighted average of the concentrations of the relevant individual components. Some components have a high protein content; others might even have a negative influence. This is best expressed in a matrix equation:

q = Cb' + r    (5.59)

where q is the ns×1 vector of qualities, C the ns×nc matrix of concentrations, b' the nc×1 vector of weights, and r the ns×1 vector of residuals.
All we know at present is the vector of qualities q and that q might be approximated by the product Cb'. We do not have an idea about the number of components, nc, nor about C or b'. Now we remember equation (5.49), C=ŪT: C is the product of Ū and a transformation matrix T. Introduction of this equation into equation (5.59) results in
q = ŪTb' + r = Ūb + r    (5.60)
where the product Tb' is replaced with the column vector b. The quality vector q is now approximated by a linear combination of the columns of the eigenvector matrix Ū. This might be surprising, but if (5.59) makes sense, (5.60) does as well. The important aspect is that Ū is known from the SVD of Y. Note, we still do not know how many components there are, or how many of the eigenvectors we should retain in Ū. We come back to that question shortly. The computation of the best b, the one for which the residual vector r is minimal, is a linear least-squares calculation. Due to the orthonormality of Ū it is particularly easy:
b = Ūᵗq    (5.61)

This allows us to compute the PCR approximation qPCR for the quality vector q:

qPCR = Ūb = ŪŪᵗq    (5.62)
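The shortcut of equation (5.61) is simply the general least-squares solution specialised to a matrix with orthonormal columns. A small NumPy check, using a random orthonormal basis from a QR decomposition rather than the corn data, illustrates both (5.61) and the projection property of (5.62):

```python
import numpy as np

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((50, 5)))  # 50x5, orthonormal columns
q = rng.standard_normal(50)

b_lstsq = np.linalg.lstsq(U, q, rcond=None)[0]     # general least squares
b_fast = U.T @ q                                   # equation (5.61)
print(np.allclose(b_lstsq, b_fast))                # True

q_pcr = U @ b_fast          # equation (5.62): projection of q onto span(U)
r = q - q_pcr               # residual is orthogonal to every column of U
print(np.allclose(U.T @ r, 0.0))                   # True
```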
This should remind the reader of e.g. equation (5.28). The vector qPCR is nothing but the projection of the quality vector q into the space Ū. The PCR calibration is good if the vector q is close to the space Ū; it is bad otherwise. The Matlab function PCR_calibration.m performs the PCR calibration according to equation (5.62). Note that we use ne=12 eigenvectors in the above calculations. This is the optimal number for prediction, as we show in Cross Validation (p.303). The reader is invited to play with this number and observe the effect. The routine PCR_calibration.m also returns a 'prognostic vector' vprog. It is used for prediction and its function is explained in the next section, PCR Prediction.

MatlabFile 5-52. Main_PCR.m … continued
% Main_PCR ... continued (calibration)
ne=12;                            % no of factors for calibration
[q_PCR,v_prog]=PCR_calibration(Y_mc_n,q_mc,ne);   % calibration
q_PCR=q_PCR+meanq;                % undo mean centering in q_PCR
plot(q,q_PCR,'.')
xlabel('q');ylabel('q_P_C_R')
Figure 5-60. PCR calibration of the corn data using 12 factors: qPCR versus the actual qualities q.
MatlabFile 5-53. PCR_calibration.m
function [q_PCR,v_prog]=PCR_calibration(Y_mc_n,q_mc,ne)

[U,S,Vt]=svd(Y_mc_n,0);
U_bar=U(:,1:ne);
S_bar=S(1:ne,1:ne);
V_bar=Vt(:,1:ne)';
q_PCR=U_bar*U_bar'*q_mc;
v_prog=V_bar'/S_bar*U_bar'*q_mc;  % prognostic vector
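A line-for-line Python/NumPy counterpart of PCR_calibration.m might look as follows; numpy.linalg.svd with full_matrices=False plays the role of Matlab's economy-size svd(...,0), and the synthetic Y and q are illustrative stand-ins for the corn data:

```python
import numpy as np

def pcr_calibration(Y_mc_n, q_mc, ne):
    """PCR fit of q and the prognostic vector, cf. equations (5.62) and (5.67)."""
    U, s, Vt = np.linalg.svd(Y_mc_n, full_matrices=False)
    U_bar = U[:, :ne]                 # first ne left eigenvectors
    s_bar = s[:ne]                    # first ne singular values
    Vt_bar = Vt[:ne, :]               # first ne right eigenvectors (as rows)
    q_pcr = U_bar @ (U_bar.T @ q_mc)  # projection of q onto span(U_bar)
    v_prog = Vt_bar.T @ ((U_bar.T @ q_mc) / s_bar)  # V' S^-1 U' q
    return q_pcr, v_prog

rng = np.random.default_rng(1)
Y = rng.standard_normal((70, 100))    # stand-in for pre-treated spectra
q = rng.standard_normal(70)           # stand-in for mean-centred qualities
q_pcr, v_prog = pcr_calibration(Y, q, ne=12)
# applying the prognostic vector to the calibration spectra reproduces the fit
print(np.allclose(Y @ v_prog, q_pcr))              # True
```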
PCR Prediction
So far, the program Main_PCR.m covers the calibration part of PCR. As Figure 5-60 demonstrates, there is a very reasonable mathematical relationship between the quality 'protein' and the NIR-spectra of the collection of corn samples; the correlation between measured and PCR-modelled protein contents is convincing. How can we use these results to predict the quality qs of a new sample, just based on its NIR-spectrum? The relevant equation is (5.60), q=Ūb. Each individual property qi is approximated by the product of the corresponding row ūi,: of Ū and b. The calculation of the quality qs for a new sample is done in an analogous way:

qs = ūs b    (5.63)
However, we need to determine the row vector ūs corresponding to the new sample. Figure 5-61 attempts to represent the relationship between the spectrum of a new sample, ys, and the Singular Value Decomposition of Y itself.
Figure 5-61. Relationship between a new sample spectrum ys and its representation ūs in the eigenvector space. The new spectrum, ys, is shown as the grey row underneath Y and the corresponding ūs as the grey row underneath Ū.
ys = ūs S̄V̄    (5.64)
The spectrum of the new sample is the product of ūs and the matrix S̄V̄. ūs contains the coordinates of the spectrum ys in the eigenvector space spanned by S̄V̄. Rearranging equation (5.64), ūs is computed as:
ūs = ys V̄ᵗ S̄⁻¹    (5.65)
Inserting this equation into equation (5.63) and substituting b by equation (5.61) leads to:

qs = ūs b = ys V̄ᵗ S̄⁻¹ Ūᵗ q    (5.66)
The matrices Ū, S̄, V̄ and the vector b are determined by the calibration set Y and q; thus the product V̄ᵗS̄⁻¹Ūᵗq is also completely determined by the calibration. It is a column vector of dimension nl×1 (nl is the number of wavelengths at which the spectra are taken); we call it the prognostic vector, vprog. The quality qs for any new sample can be predicted by the product of its spectrum ys and the prognostic vector vprog:

qs = ys vprog    (5.67)
vprog = V̄ᵗ S̄⁻¹ Ūᵗ q
This prognostic vector can reveal interesting insight into the relationship between the qualities q and the spectra of the calibration set Y. Note that the prognostic vector vprog has already been computed in the function PCR_calibration.m.

MatlabFile 5-54. Main_PCR.m … continued
% Main_PCR ... continued (calibration)
subplot(2,1,1);plot(lam,meanY);axis tight;
subplot(2,1,2);plot(lam,v_prog);axis tight;
xlabel('wavelength')
Figure 5-62. The mean spectrum and the prognostic vector.
Figure 5-62 displays the mean spectrum and the prognostic vector vprog. The vector product ys×vprog is the sum over all the products of pairs of elements in ys and vprog. Thus, if at a certain wavelength the prognostic vector vprog has a positive value, a high value in ys at this wavelength adds to the quality; a negative value in vprog subtracts from it. As an example, consider the wavelength 2100 nm highlighted by the dotted line in Figure 5-62: the prognostic vector is negative, indicating that the peak in the spectrum at this wavelength is 'bad' for the protein content. Samples with peaks shifted towards longer wavelengths would have an increased quality qs, as the prognostic vector has a strong positive contribution at around 2200 nm. For the sake of completeness, we introduce an alternative at this stage and present a more theoretical but quicker path for the development of the PCR calibration and prediction equations. The starting concept is to assume there is a linear relationship between q and the matrix Y:
q = Yb''    (5.68)
Figure 5-63. Graphic representation of equation (5.68).

The optimal vector b'' cannot be computed as b''=Y⁺q since the pseudo-inverse Y⁺ is not defined; its calculation would include the inversion of a rank-deficient matrix. The way out is to replace Y with ŪS̄V̄; b'' can then be computed as
b'' = V̄ᵗ S̄⁻¹ Ūᵗ q    (5.69)
and the prediction of qs for a new sample is simply

qs = ys b'' = ys V̄ᵗ S̄⁻¹ Ūᵗ q = ys vprog    (5.70)
The equations, of course, are the same as developed in the derivations given previously. And it turns out that b'' is the prognostic vector, b''=vprog. Now we use the information gathered so far for the prediction of the 10 test samples Ys removed from the complete data set at the very beginning. The function PCR_PLS_pred.m does the work according to equation (5.70). Importantly, the mean-centring and normalisation have to be performed in exactly the same way as in the calibration.

MatlabFile 5-55. PCR_PLS_pred.m
function q_s_pred=PCR_PLS_pred(Y_s,meanY,meanq,norm_coef,v_prog)

Y_s_mc   = Y_s-repmat(meanY,size(Y_s,1),1);  % mean centre
Y_s_mc_n = Y_s_mc*diag(norm_coef);           % normalise
q_s_mc   = Y_s_mc_n*v_prog;                  % predict
q_s_pred = q_s_mc+meanq;                     % undo mean centring
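A hedged Python/NumPy version of the prediction step (again with synthetic stand-in data) shows the complete chain of equation (5.70), including the reversal of the pre-treatment. In the toy check below, q depends exactly linearly on Y, so the full-rank prediction must reproduce q:

```python
import numpy as np

def pcr_pls_pred(Y_s, meanY, meanq, norm_coef, v_prog):
    """Predict qualities for new spectra Y_s, mirroring PCR_PLS_pred.m."""
    Y_s_mc = Y_s - meanY           # mean-centre with the calibration means
    Y_s_mc_n = Y_s_mc * norm_coef  # normalise with the calibration coefficients
    q_s_mc = Y_s_mc_n @ v_prog     # predict, equation (5.70)
    return q_s_mc + meanq          # undo the mean-centring of q

rng = np.random.default_rng(2)
Y = rng.standard_normal((70, 40))
q = Y @ rng.standard_normal(40)               # exactly linear quality
meanY, meanq = Y.mean(axis=0), q.mean()
norm_coef = 1.0 / (Y - meanY).std(axis=0, ddof=1)
Y_mc_n = (Y - meanY) * norm_coef
U, s, Vt = np.linalg.svd(Y_mc_n, full_matrices=False)
v_prog = Vt.T @ ((U.T @ (q - meanq)) / s)     # all factors: no truncation error
print(np.allclose(pcr_pls_pred(Y, meanY, meanq, norm_coef, v_prog), q))  # True
```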
MatlabFile 5-56. Main_PCR.m … continued
% Main_PCR ... continued (prediction)
q_s_pred=PCR_PLS_pred(Y_s,meanY,meanq,norm_coef,v_prog);
plot(q_s_k,q_s_pred,'.');
xlabel('q_{s,known}');ylabel('q_{s,pred}');
axis([7.5 10 7.5 10])
Figure 5-64. PCR prediction of 10 new corn samples.

The prediction result for the 10 samples that were removed from the total data set, shown in Figure 5-64, is convincing. Remember, for the calculations so far we used ne=12 eigenvectors in ŪS̄V̄. Now we need to return to the question of how this number is determined. The main goal of PCR/PLS is the prediction of the qualities of new samples based on prior calibration using a suitable known calibration set. The best number of eigenvectors is the number that results in the best prediction. It's as easy as that.

Cross Validation
The optimal number of eigenvectors in Ū, S̄ and V̄ is determined prior to the predictions of new samples; it can be seen as part of the calibration. The
process is only discussed here because it contains both calibration and prediction steps. The most common and intuitive method for the determination of this number of eigenvectors is called Cross Validation. The idea is to remove one (or several) samples from the calibration set, use what is left for the computation of a new calibration, and use it to predict the quality of the removed sample(s). Each prediction is compared with the actual quality, which is known since the removed sample really is part of the total calibration set. In a loop, all samples are removed either one by one or in groups, and after recalibration with the reduced calibration set their qualities are predicted and compared with the true values. In order to determine the best number of eigenvectors, this procedure is repeated in a big loop systematically trying all numbers of eigenvectors. This complete procedure is called Cross Validation. The continuation of Main_PCR.m calls the function PCR_cross.m. This function performs the systematic cross validation for up to ne_max=40 eigenvectors. The computations result in a plot of the accuracy of the prediction as a function of the number of eigenvectors. Hopefully, the graph has a clear minimum!

MatlabFile 5-57. PCR_cross.m
function [q_s_cross,PRESS]=PCR_cross(Y,q,ne_max)
ns=length(q);
for i=1:ns
  i
  Y_cal=Y(i~=1:ns,:);             % elim. i-th row of Y for new calib. set
  q_cal=q(i~=1:ns,:);             % elim. i-th element of q, new qual. set
  y_s=Y(i,:);                     % extract i-th row of Y as new sample

  meanY_cal=mean(Y_cal);          % mean centring Y and q
  meanq_cal=mean(q_cal);
  Y_cal_mc=Y_cal-repmat(meanY_cal,ns-1,1);
  q_cal_mc=q_cal-meanq_cal;
  y_s_mc=y_s-meanY_cal;
  norm_coef=1./std(Y_cal_mc);     % normalising Y
  Y_cal_mc_n=Y_cal_mc*diag(norm_coef);
  y_s_mc=y_s_mc.*norm_coef;

  [U,S,Vt]=svd(Y_cal_mc_n,0);     % PCR calibration
  for k=1:ne_max
    U_bar=U(:,1:k);
    S_bar=S(1:k,1:k);
    V_bar=Vt(:,1:k)';
    V_prog(:,k)=V_bar'/S_bar*U_bar'*q_cal_mc;
  end

  q_s_cross(i,:)=y_s_mc*V_prog;             % prediction
  q_s_cross(i,:)=q_s_cross(i,:)+meanq_cal;  % undo mean centring
end
PRESS=sum((q_s_cross-repmat(q,1,ne_max)).^2);
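The same leave-one-out logic can be sketched in Python/NumPy. This illustrative version, run here on synthetic data because the corn set is not included, returns the cross-validation predictions and PRESS for every number of eigenvectors up to ne_max, just as PCR_cross.m does:

```python
import numpy as np

def pcr_cross(Y, q, ne_max):
    """Leave-one-out cross validation of PCR; returns predictions and PRESS."""
    ns = len(q)
    q_s_cross = np.zeros((ns, ne_max))
    for i in range(ns):
        keep = np.arange(ns) != i            # drop the i-th sample
        Y_cal, q_cal, y_s = Y[keep], q[keep], Y[i]
        meanY, meanq = Y_cal.mean(axis=0), q_cal.mean()
        norm_coef = 1.0 / (Y_cal - meanY).std(axis=0, ddof=1)
        Y_mc_n = (Y_cal - meanY) * norm_coef
        y_s_mc = (y_s - meanY) * norm_coef
        U, s, Vt = np.linalg.svd(Y_mc_n, full_matrices=False)
        coef = U.T @ (q_cal - meanq)         # U'q, reused for every k
        for k in range(1, ne_max + 1):
            v_prog = Vt[:k].T @ (coef[:k] / s[:k])
            q_s_cross[i, k - 1] = y_s_mc @ v_prog + meanq
    PRESS = ((q_s_cross - q[:, None]) ** 2).sum(axis=0)
    return q_s_cross, PRESS

rng = np.random.default_rng(3)
Y = rng.standard_normal((20, 30))            # synthetic stand-in data
q = Y[:, :3] @ np.array([1.0, -0.5, 2.0]) + 0.01 * rng.standard_normal(20)
q_s_cross, PRESS = pcr_cross(Y, q, ne_max=10)
print(PRESS.shape)   # (10,)
```

The position of the minimum of PRESS then suggests the number of eigenvectors to use for prediction.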
MatlabFile 5-58. Main_PCR.m … continued
% Main_PCR ... continued (cross validation)
ne_max=40;
[q_s_cross,PRESS]=PCR_cross(Y,q,ne_max);
plot(1:ne_max,PRESS);ylabel('PRESS');xlabel('factors')
Figure 5-65. PRESS-PCR for the corn data set.

PRESS, the prediction sum of squares, is the measure for the accuracy of the prediction. It is the sum over all squared differences between cross-validation predicted and true known qualities:

PRESS = ∑ i=1…ns (qs,cross,i − qi)²    (5.71)
In Main_PCR.m the PRESS values for all numbers of eigenvectors from 1 to ne_max used in Ū, S̄ and V̄ to compute the predicted qualities qs,cross are stored in the vector PRESS that is displayed in Figure 5-65. The figure does not show a clear minimum. In Figure 5-66 we show the results of the cross-validation for ne=12; this number has already been used for the calibration in Figure 5-60.

MatlabFile 5-59. Main_PCR.m … continued
% Main_PCR ... continued (cross validation)
plot(q,q_s_cross(:,ne),'.');
xlabel('q');ylabel('q_{s,cross}');
axis([7.5 10 7.5 10])
Figure 5-66. PCR cross-validation for the corn data set.

Figure 5-66 shows the relationship between the true and cross-validation predicted qualities. Note the small but significant drop in the correlation compared with pure calibration, as shown in Figure 5-60. Calibration invariably produces a better correlation than prediction.
5.6.2 Partial Least Squares, PLS

Partial Least Squares is the chemometrics method 'par excellence'. There is a tremendous number of published applications and also a large number of minor improvements to the original PLS algorithm. In order to understand the difference between the PCR and the PLS methods we first return to PCR. The two central equations of PCR are:

Y = ŪS̄V̄ and q = Ūb    (5.72)
The first equation is the well-known Singular Value Decomposition. In the context of PCR the eigenvectors Ū form the basis for the column vectors of Y. The second equation in (5.72) attempts to also represent the column vector q of qualities in the same space Ū. If both representations are good then PCR works well, resulting in accurate predictions. A potential drawback of PCR is the fact that Ū is defined solely by Y. Even if there is good reasoning for a relationship between q and Ū, as indicated in the derivation of equation (5.60), it is somehow 'accidental'. The basic idea of PLS is to find a better set of basis vectors that represent adequately both Y and q. In the PLS literature, this basis T is often called
'scores'. Note, matrix T must not be confused with the transformation matrices T we have used several times earlier in this chapter. As required for a decent set of basis vectors, the columns of T have to be orthogonal. The ideal basis for q would be q itself, but it might not be a good basis for Y. Ideally, T is a compromise that serves as a good basis for both q and Y. Obviously, there is not just one compromise, and this to some extent explains the large number of modified PLS algorithms. Below, we give the complete program Main_PLS.m that runs the PLS computations. It is identical to Main_PCR.m with the exception of the PLS functions PLS_calibration.m and PLS_cross.m that are called instead of the corresponding PCR routines; additional minor differences include the axis labelling etc. Mean-centring and normalisation are implemented as in PCR. Here, the complete listing is included, while the equivalent Main_PCR.m has only been given in many little fragments.

MatlabFile 5-60. Main_PLS.m
% Main_PLS
load corn.mat mp6spec propvals;   % load corn data set
Y_data=mp6spec.data;              % NIR spectra
q_data=propvals.data(:,3);        % protein qualities
ns=length(q_data);
lam=[1100:2:2498];
plot(lam,Y_data);
xlabel('wavelength')

rand('seed',1);                   % initialise random number generator
s=ceil(rand(10,1)*ns);            % random selection of 10 samples
Y=Y_data; Y(s,:)=[];              % calibration set excluding 10 samples
q=q_data; q(s)=[];
Y_s=Y_data(s,:);                  % 'unknown' test samples
q_s_k=q_data(s);                  % their 'known' qualities

% Main_PLS ... continued (data pre-treatment)
meanY=mean(Y);                    % mean centering Y and q
meanq=mean(q);
Y_mc=Y-repmat(meanY,size(Y,1),1);
q_mc=q-meanq;
norm_coef=1./std(Y_mc);           % normalisation of Y_mc
Y_mc_n=Y_mc*diag(norm_coef);

% Main_PLS ... continued (calibration)
ne=10;                            % no of factors for calibration
[q_PLS,v_prog]=PLS_calibration(Y_mc_n,q_mc,ne);   % calibration
q_PLS=q_PLS+meanq;                % undo mean centering in q_PLS
plot(q,q_PLS,'.')
xlabel('q');ylabel('q_P_L_S')
subplot(2,1,1);plot(lam,meanY);axis tight;
subplot(2,1,2);plot(lam,v_prog);axis tight;
xlabel('wavelength')
% Main_PLS ... continued (prediction)
q_s_pred=PCR_PLS_pred(Y_s,meanY,meanq,norm_coef,v_prog);
plot(q_s_k,q_s_pred,'.');
xlabel('q_{s,known}');ylabel('q_{s,pred}');
axis([7.5 10 7.5 10])

% Main_PLS ... continued (cross validation)
ne_max=40;
[q_s_cross,PRESS]=PLS_cross(Y,q,ne_max);
plot(1:ne_max,PRESS);xlabel('factors');ylabel('PRESS');
plot(q,q_s_cross(:,ne),'.');
xlabel('q');ylabel('q_{s,cross}');
axis([7.5 10 7.5 10])
PLS Calibration
The PLS equations can be written in analogy to equation (5.72):

Y = TP and q = Tb    (5.73)
In the original PLS, the matrices T (scores), P (loadings) and the vector b are computed sequentially in the following way:

(a) take q as a first estimate for the first basis vector t:,1.

(b) q is assumed to be a basis for Y. Hence, we can approximate Y as Y=qw1,: and calculate a best w1,: (loading weights) in a linear least-squares fit:

w1,: = q \ Y    (5.74)

In the standard PLS algorithm w1,: is normalised to unity length and subsequently t:,1 is calculated as

t:,1 = Y / w1,:    (5.75)

This can be interpreted as one ALS iteration. If the iterations are continued, this process converges to the first eigenvector u:,1. The PLS compromise is to stop at one iteration.

(c) this t:,1 is the first basis vector; it is the PLS analogue to the first eigenvector u:,1 in PCR.

(d) Both Y and q are projected onto t:,1 and the residuals calculated. Importantly, the residuals are orthogonal to t:,1:

rq = q − t:,1 b1
Ry = Y − t:,1 p1,:    (5.76)

where b1 and p1,: are computed in linear least-squares fits:

b1 = t:,1\q
p1,: = t:,1\Y    (5.77)
(e) The remaining basis vectors t:,2:ne are computed in an adaptation of the NIPALS algorithm. q is replaced by the residual vector rq and is used as a new estimate for the next basis vector, t:,2; Ry replaces Y and the computation is continued at (a). The cycle (a)-(d) is repeated ne times. The optimal number ne is determined by cross-validation. The vectors t:,k, wk,:, pk,: and the scalars bk (k=1…ne) are collected in the matrices T, W, P and the vector b.

The function PLS_calibration.m is the PLS equivalent of PCR_calibration.m. The iterative loop implements equations (5.73)-(5.77). The prognostic vector vprog is introduced in the next section.

MatlabFile 5-61. PLS_calibration.m
function [q_PLS,v_prog]=PLS_calibration(Y_mc_n,q_mc,ne)

rq=q_mc;
Ry=Y_mc_n;
for k=1:ne
  W(k,:)=rq\Y_mc_n;
  W(k,:)=W(k,:)/norm(W(k,:));
  T(:,k)=Ry/W(k,:);
  P(k,:)=T(:,k)\Ry;
  b(k,1)=T(:,k)\rq;
  Ry=Ry-T(:,k)*P(k,:);
  rq=rq-T(:,k)*b(k,1);
end
q_PLS=T*b;
v_prog=W'*((P*W')\b);             % prognostic vector
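For readers without Matlab, the one-iteration NIPALS cycle of PLS_calibration.m can be sketched in Python/NumPy. Matlab's \ and / operators become explicit inner products here, which is valid because rq and t are single vectors; the toy check uses an exactly linear q, which PLS must reproduce once ne equals the rank of Y:

```python
import numpy as np

def pls_calibration(Y_mc_n, q_mc, ne):
    """One-iteration NIPALS PLS, mirroring PLS_calibration.m."""
    ns, nl = Y_mc_n.shape
    W = np.zeros((ne, nl)); P = np.zeros((ne, nl))
    T = np.zeros((ns, ne)); b = np.zeros(ne)
    rq, Ry = q_mc.astype(float), Y_mc_n.astype(float)
    for k in range(ne):
        w = rq @ Y_mc_n                # rq\Y_mc_n up to a positive scalar
        w = w / np.linalg.norm(w)      # normalise to unit length
        t = Ry @ w                     # Ry/w (w has unit norm)
        p = (t @ Ry) / (t @ t)         # t\Ry
        bk = (t @ rq) / (t @ t)        # t\rq
        Ry = Ry - np.outer(t, p)       # deflate Y: residual orthogonal to t
        rq = rq - t * bk               # deflate q
        W[k], P[k], T[:, k], b[k] = w, p, t, bk
    q_PLS = T @ b
    v_prog = W.T @ np.linalg.solve(P @ W.T, b)   # equation (5.78)
    return q_PLS, v_prog

rng = np.random.default_rng(4)
Y = rng.standard_normal((70, 5))
q = Y @ np.array([1.0, 2.0, -1.0, 0.5, 3.0])     # exactly linear quality
q_PLS, v_prog = pls_calibration(Y, q, ne=5)
print(np.allclose(q_PLS, q))           # True
print(np.allclose(Y @ v_prog, q_PLS))  # True
```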
PLS Prediction / Cross Validation
PLS prediction can be performed in analogy to PCR, by the function PCR_PLS_pred.m (p.302) introduced earlier. For this, we need to determine the prognostic vector, vprog. Using the results of the PLS calibration, it can be computed as:

vprog = Wᵗ(PWᵗ)⁻¹b    (5.78)
During cross validation, it is most convenient to compute this prognostic vector as a function of the number of factors. This results in a collection of prognostic vectors, conveniently stored in a matrix V_prog. After determination of the optimal number of factors, the appropriate column can be selected as prognostic vector. The predicted quality qs for an unknown sample with the spectrum ys is then calculated as:

qs = ys vprog    (5.79)
For PCR and PLS, the principle of cross validation is identical; we refer to Cross Validation (p.303). The following function PLS_cross.m differs from PCR_cross.m only in the few lines performing the calibration and calculating the prognostic vector.

MatlabFile 5-62. PLS_cross.m
function [q_s_cross,PRESS]=PLS_cross(Y,q,ne_max)
ns=length(q);
for i=1:ns
  i
  Y_cal=Y(i~=1:ns,:);             % elim. i-th row of Y for new calib. set
  q_cal=q(i~=1:ns,:);             % elim. i-th elem. of q for new qual. set
  y_s=Y(i,:);                     % extract i-th row of Y as new sample

  meanY_cal=mean(Y_cal);          % mean centring Y and q
  meanq_cal=mean(q_cal);
  Y_cal_mc=Y_cal-repmat(meanY_cal,ns-1,1);
  q_cal_mc=q_cal-meanq_cal;
  y_s_mc=y_s-meanY_cal;
  norm_coef=1./std(Y_cal_mc);     % normalising Y
  Y_cal_mc_n=Y_cal_mc*diag(norm_coef);
  y_s_mc=y_s_mc.*norm_coef;

  rq=q_cal_mc;                    % PLS calibration
  Ry=Y_cal_mc_n;
  for k=1:ne_max
    W(k,:)=rq\Y_cal_mc_n;
    W(k,:)=W(k,:)/norm(W(k,:));
    T(:,k)=Ry/W(k,:);
    P(k,:)=T(:,k)\Ry;
    b(k,1)=T(:,k)\rq;
    Ry=Ry-T(:,k)*P(k,:);
    rq=rq-T(:,k)*b(k,1);
    V_prog(:,k)=W(1:k,:)'*inv(P(1:k,:)*W(1:k,:)')*b(1:k,1);
  end

  q_s_cross(i,:)=y_s_mc*V_prog;             % prediction
  q_s_cross(i,:)=q_s_cross(i,:)+meanq_cal;  % undo mean centring
end
PRESS=sum((q_s_cross-repmat(q,1,ne_max)).^2);
5.6.3 Comparing PCR and PLS

Figure 5-67 displays the results of the cross validation computations of the corn data with PCR and PLS. The graph is fairly typical: PLS is consistently better at small numbers of factors, and predictions are very similar at the optimal number of factors ne, which is 10 for PLS and 12 for PCR. Experience has shown that it is 'dangerous' to use an excessive number of factors (over-fitting) for the prediction of new unknown samples. This is why we selected ne=12 rather than 23 for PCR.
Figure 5-67. Comparison of the cross validation results for PCR and PLS.

In view of the similarity of the PRESS results for PCR and PLS, it is not surprising that the predicted qualities are very similar for the two methods if the optimal number of factors is used. Figure 5-68 summarises the comparison.
Figure 5-68. Comparison of predictions and prognostic vectors for PCR and PLS.
The top panel compares the cross-validation predictions with the optimal number of factors of 12 for PCR and 10 for PLS. The middle panel shows the similarity of the two prognostic vectors. The bottom panel compares the prediction of the 10 test samples that were removed from the total data set prior to cross-validation. If a conclusion can be drawn, it is that PCR and PLS are virtually indistinguishable in their outcome. Yet, PLS appears to reach optimal prediction with fewer factors than PCR.
Further Reading

Kinetics
John Ross, Igor Schreiber, Marcel O. Vlad, and Adam Arkin. Determination of Complex Reaction Mechanisms: Analysis of Chemical, Biological, and Genetic Networks. Oxford University Press 2005
Robert W. Hay. Reaction Mechanisms of Metal Complexes. Albion/Harwood Pub 2000
James H. Espenson. Chemical Kinetics and Reaction Mechanisms (2nd edition). McGraw-Hill 1995
S.K. Scott. Oscillations, Waves, and Chaos in Chemical Kinetics. Oxford Chemistry Press 1994
Ralph G. Wilkins. Kinetics and Mechanisms of Reactions of Transition Metal Complexes. VCH 1991
N.M. Rodiguin, E.N. Rodiguina. Consecutive Chemical Reactions: Mathematical Analysis and Development. D. van Nostrand, New York 1964
S.W. Benson. The Foundations of Chemical Kinetics. McGraw-Hill, New York 1960
Equilibria
Arthur Martell, Robert Hancock. Metal Complexes in Aqueous Solutions (Modern Inorganic Chemistry). Springer 1996
Arthur Martell, Ramunas J. Motekaitis. The Determination and Use of Stability Constants (2nd edition). Wiley 1992
Juergen Polster, Heinrich Lachmann. Spectrometric Titrations: Analysis of Chemical Equilibria. VCH 1989
Kenneth A. Connors. Binding Constants: The Measurement of Molecular Complex Stability. Wiley 1987
M.T. Beck, I. Nagypal. The Chemistry of Complex Equilibria. Van Nostrand-Reinhold, London 1970
Chemometrics/Statistics/Data Fitting
Richard G. Brereton. Applied Chemometrics for Scientists. Wiley 2007
Paul Gemperline (editor). Practical Guide to Chemometrics (2nd edition). CRC Press 2006
Bruns, Scarmino, de Barros Neto. Statistical Design - Chemometrics, Volume 25 (Data Handling in Science and Technology). Elsevier 2006
James Miller, Jane Miller. Statistics and Chemometrics for Analytical Chemistry (5th edition). Prentice Hall 2005
D. Brynn Hibbert, J. Justin Gooding. Data Analysis for Chemistry. Oxford University Press 2005
M.J. Adams. Chemometrics in Analytical Spectroscopy (2nd edition). The Royal Society of Chemistry 2004
Richard G. Brereton. Chemometrics: Data Analysis for the Laboratory and Chemical Plant. Wiley 2003
George A. F. Seber, C. J. Wild. Nonlinear Regression (Wiley Series in Probability and Statistics). Wiley 2003
Edmund R. Malinowski. Factor Analysis in Chemistry (3rd edition). Wiley 2002
Philip R. Bevington, D. Keith Robinson. Data Reduction and Error Analysis (3rd edition). McGraw-Hill, New York 2002
Peter C. Meier, Richard E. Zünd. Statistical Methods in Analytical Chemistry (2nd edition). Wiley 2000
Matthias Otto. Chemometrics: Statistics and Computer Application in Analytical Chemistry. Wiley 1999
Sigmund Brandt, Glen Gowan. Data Analysis: Statistical and Computational Methods for Scientists and Engineers (3rd edition). Springer 1998
Richard Kramer. Chemometrics Techniques for Quantitative Analysis. Marcel Dekker 1998
Kenneth R. Beebe, Randy J. Pell, Mary Beth Seasholtz. Chemometrics: A Practical Guide. Wiley 1998
E.J. Karjalainen, U.P. Karjalainen. Data Analysis for Hyphenated Techniques (Data Handling in Science and Technology, Vol. II). Elsevier 1996
Meloun Milan, Jiri Militky, and Michele Forina. Chemometrics for Analytical Chemistry (Vol I+II). Ellis Horwood 1994
Harald Martens, Tormod Naes. Multivariate Calibration. Wiley 1993
John Kalivas. Mathematical Analysis of Spectral Orthogonality. Marcel Dekker 1993
Richard C. Graham. Data Analysis for the Chemical Sciences: A Guide to Statistical Techniques. VCH 1993
Stephen Haswell (editor). Practical Guide to Chemometrics (1st edition). Marcel Dekker 1992
Peter Gans. Data Fitting in the Chemical Sciences by the Method of Least Squares. Wiley 1992
Ed Morgan. Chemometrics: Experimental Design. Wiley 1991
D.L. Massart, B.G.M. Vandeginste, S.N. Deming, and Y. Michotte. Chemometrics: A Textbook. Elsevier 1988
B. G. M. Vandeginste, L. M. C. Buydens, S. De Jong, and P. J. Lewi. Handbook of Chemometrics and Qualimetrics A/B. Elsevier 1988
Numerical Methods
Michael B. Cutlip, Mordechai Shacham. Problem Solving in Chemical Engineering with Numerical Methods. Prentice Hall 2007
Bruce A. Finlayson. Introduction to Chemical Engineering Computing. Wiley 2006
Daniel Dubin. Numerical and Analytical Methods for Scientists and Engineers using Mathematica. Wiley 2003
James B. Riggs. Introduction to Numerical Methods for Chemical Engineers (2nd edition). Texas Tech University Press 1999
Alejandro Garcia. Numerical Methods for Physics (2nd edition). Prentice Hall 1999
William H. Press, Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing (2nd edition). Cambridge 1996
Peter Pelikan, Michal Ceppan, and Marek Liska. Applications of Numerical Methods in Molecular Spectroscopy. CRC Press 1994
R. Bulirsch, J. Stoer. Introduction to Numerical Analysis. Springer, New York 1993
Louis Lyons. A Practical Guide to Data Analysis for Physical Science Students. Cambridge University Press 1991
Matlab and Excel
Kenneth Beers. Numerical Methods for Chemical Engineering: Applications in MATLAB. Cambridge University Press 2006
Rudra Pratap. Getting Started with MATLAB 7: A Quick Introduction for Scientists and Engineers (The Oxford Series in Electrical and Computer Engineering). Oxford University Press 2005
Gerard Verschuuren. Excel for Scientists and Engineers (Excel for Professionals series). Holy Macro! Books 2005
Steve Chapra. Applied Numerical Methods with MATLAB for Engineers and Scientists. McGraw-Hill 2004
Robert de Levie. Advanced Excel for Scientific Data Analysis. Oxford University Press 2004
S.C. Bloch. Excel for Engineers and Scientists (2nd edition). Wiley 2003
Bernard Liengme. A Guide to MS Excel 2002 for Scientists and Engineers. Butterworth-Heinemann 2002
E. Joseph Billo. Excel for Chemists: A Comprehensive Guide (2nd edition). Wiley-VCH 2001
Robert de Levie. How to Use Excel in Analytical Chemistry and in General Scientific Data Analysis. Cambridge University Press 2001
Alkis Constantinides, Navid Mostoufi. Numerical Methods for Chemical Engineers with Matlab Applications. Prentice Hall 1999
Michael B. Cutlip, Mordechai Shacham. Problem Solving in Chemical and Biochemical Engineering with POLYMATH, Excel, and MATLAB (2nd edition). Prentice Hall 1998
Dermot Diamond, Venita C. A. Hanratty. Spreadsheet Applications in Chemistry using MS Excel. Wiley 1997
William J. Orvis. Excel for Scientist and Engineers (2nd edition). SYBEX 1995
List of Matlab Files MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE MATLABFILE
3-1. GAS_LAWS.M, 29
3-2. GAS_LAWS.M …CONTINUED, 30
3-3. BEER_LAMBERT.M, 34
3-4. GAUSS.M, 37
3-5. GAUSS_CURVE.M, 37
3-6. GAUSS_CURVE2.M, 38
3-7. GAUSS_SK.M, 39
3-8. GAUSS_SKEWED.M, 39
3-9. NEWTONRAPHSON.M, 53
3-10. EQ1.M, 57
3-11. EQ2.M, 59
3-12. EDTA.M, 66
3-13. EGG_CARTON.M, 72
3-14. EGG_CARTON.M …CONTINUED, 72
3-15. EGG_CARTON.M …CONTINUED, 73
3-16. TWO_EQUATIONS.M, 74
3-17. NONLINEQ.M, 75
3-18. MAIN_NONLINEQ.M, 75
3-19. ATOB.M, 79
3-20. ODE_AUTOCAT.M, 87
3-21. AUTOCAT.M, 88
3-22. ODE_ZERO_ORDER.M, 89
3-23. ZERO_ORDER.M, 90
3-24. ZERO_ORDER.M …CONTINUED, 91
3-25. ODE_LOTKA_VOLTERRA.M, 93
3-26. LOTKA_VOLTERRA.M, 93
3-27. LOTKA_VOLTERRA.M …CONTINUED, 94
3-28. ODE_BZ.M, 96
3-29. BZ.M, 96
3-30. ODE_LORENZ.M, 98
3-31. LORENZ.M, 98
3-32. LORENZ.M …CONTINUED, 99
4-1. DATA_MXB.M, 104
4-2. MAIN_MXB.M, 104
4-3. MAIN_MXB.M …CONTINUED, 104
4-4. DATA_DECAY.M, 106
4-5. MAIN_DECAY_2D.M, 106
4-6. MAIN_DECAY_SSQ.M, 107
4-7. TAN_POLY.M, 124
4-8. MAIN_DECAY.M, 127
4-9. DATA_DECAY_OFFSET.M, 129
4-10. MAIN_DECAY_OFFSET.M, 129
4-11. MAIN_SAVGOL.M, 132
4-12. SAVGOL_BAD.M, 133
4-13. SAVGOL.M, 134
4-14. MAIN_SAVGOL_DERIV.M, 136
4-15. SAVGOL_DERIV.M, 137
4-16. MAIN_LOLIPOP.M, 138
4-17. LOLIPOP.M, 139
4-18. DATA_ABC.M, 143
4-19. MAIN_ABC_3D, 143
4-20. MAIN_ABC_LIN1.M, 144
4-21. MAIN_ABC_LIN2.M, 145
4-22. DATA_EXP.M, 150
4-23. MAIN_EXP_2D.M, 150
4-24. MAIN_NG1.M, 151
4-25. MAIN_NG2.M, 154
4-26. DATA_CHROM.M, 158
4-27. MAIN_CHROM.M, 158
4-28. NGLM.M, 159
4-29. RCALC_CHROM.M, 160
4-30. MAIN_CHROM2.M, 161
4-31. MAIN_ABC.M, 165
4-32. NGLM2.M, 166
4-33. RCALC_ABC.M, 167
4-34. RCALC_ABC2.M, 168
4-35. DATA_EQAH2.M, 170
4-36. MAIN_EQAH2.M, 172
4-37. GET_PAR.M, 173
4-38. PUT_PAR.M, 173
4-39. NGLM3.M, 173
4-40. RCALC_EQAH2.M, 174
4-41. DATA_EQFIX.M, 178
4-42. RCALC_EQFIX.M, 178
4-43. MAIN_EQFIX.M, 179
4-44. MAIN_ABC_RED.M, 182
4-45. ODEAPB_C_REV.M, 185
4-46. DATA_GLOB.M, 185
4-47. RCALC_GLOB.M, 186
4-48. MAIN_GLOB.M, 187
4-49. DATA_EMISSION.M, 191
4-50. MAIN_EMISSION_LIN.M, 192
4-51. MAIN_EMISSION_LIN.M …CONTINUED, 194
4-52. MAIN_EMISSION_WEIGHTED.M, 195
4-53. RCALC_EMISSION_WEIGHTED.M, 195
4-54. MAIN_DECAY_SIMPLEX.M, 205
4-55. SSQCALC_DECAY.M, 206
4-56. MAIN_ABC_SIMPLEX.M, 207
4-57. SSQCALC_ABC.M, 207
5-1. MAIN_SVD1.M, 216
5-2. DATA_CHROM2.M, 219
5-3. MAIN_SVD2.M, 219
5-4. MAIN_SVD2.M …CONTINUED, 220
5-5. MAIN_SVD2.M …CONTINUED, 221
5-6. MAIN_SVD2.M …CONTINUED, 222
5-7. MAIN_SVD2.M …CONTINUED, 224
5-8. DATA_AB.M, 224
5-9. MAIN_PLOT_AB.M, 225
5-10. LAWTONSYLVESTRE.M, 234
5-11. DATA_EQAH2A.M, 236
5-12. MAIN_EQAH2A.M, 236
5-13. MAIN_EV_SPACE.M, 237
5-14. MAIN_EV_SPACE.M …CONTINUED, 238
5-15. MAIN_MEANCENTER.M, 240
5-16. MAIN_HELPP.M, 241
5-17. MAIN_HELPP.M …CONTINUED, 242
5-18. MAIN_NOISERED1.M, 243
5-19. DATA_AB2.M, 244
5-20. MAIN_NOISERED2.M, 245
5-21. MAIN_TFA.M, 248
5-22. DATA_CHROM2A.M, 251
5-23. MAIN_ITTFA.M, 252
5-24. MAIN_SYM_ABC.M, 254
5-25. MAIN_SYM_ABC_REV.M, 255
5-26. MAIN_SYM_ABC_REV.M …CONTINUED, 255
5-27. DATA_ABC2.M, 256
5-28. MAIN_TTF.M, 256
5-29. MAIN_EFA1.M, 261
5-30. MAIN_EFA1.M …CONTINUED, 263
5-31. MAIN_EFA1.M …CONTINUED, 264
5-32. MAIN_EFA2.M, 265
5-33. EFA.M, 266
5-34. MAIN_EFA3.M, 267
5-35. DATA_EQAH4A.M, 268
5-36. MAIN_FSW_EFA.M, 269
5-37. MAIN_IT_EFA.M, 272
5-38. NORM_MAX.M, 275
5-39. MAIN_NON_IT_EFA.M, 279
5-40. MAIN_ALS.M, 282
5-41. CONSTRAINTS_POSITIVECA.M, 284
5-42. CONSTRAINTS_LSQNONNEG.M, 284
5-43. CONSTRAINTS_NONNEG.M, 285
5-44. CONSTRAINTS_NONNEG_UNIMOD.M, 286
5-45. CONSTRAINTS_NONNEG_WINDOW.M, 287
5-46. CONSTRAINTS_NONNEG_KNOWN_SPEC.M, 289
5-47. RCALC_RFA.M, 292
5-48. BUILD_PAR_STR.M, 292
5-49. MAIN_RFA.M, 293
5-50. MAIN_PCR.M, 296
5-51. MAIN_PCR.M …CONTINUED, 297
5-52. PCR_CALIBRATION.M, 299
5-53. MAIN_PCR.M …CONTINUED, 300
5-54. MAIN_PCR.M …CONTINUED, 301
5-55. PCR_PLS_PRED.M, 302
5-56. MAIN_PCR.M …CONTINUED, 303
5-57. PCR_CROSS.M, 304
5-58. MAIN_PCR.M …CONTINUED, 305
5-59. MAIN_PCR.M …CONTINUED, 305
5-60. MAIN_PLS.M, 307
5-61. PLS_CALIBRATION.M, 309
5-62. PLS_CROSS.M, 310
List of Excel Sheets

3-1. CHAPTER2.XLS-FESCN, 42
3-2. CHAPTER2.XLS-EQML2, 61
3-3. CHAPTER2.XLS-CASO4, 63
3-4. CHAPTER2.XLS-H3PO4, 67
3-5. CHAPTER2.XLS-EQSYS, 75
3-6. CHAPTER2.XLS-RUNGEKUTTA, 83
4-1. CHAPTER3.XLS-TRENDLINE, 111
4-2. CHAPTER3.XLS-LINEST, 126
4-3. CHAPTER3.XLS-PSEUDOINVERSE, 147
4-4. CHAPTER3.XLS-CHROM, 208
4-5. CHAPTER3.XLS-KINETICS, 210
4-6. CHAPTER3.XLS-EMISSION, 212
Index

#
χ2-fitting, 189, 211
  linear, 190
  non-linear, 195
  standard deviation of the residuals, 194

A
activity coefficients, 44, 62
  Debye-Hückel, 63
Alternating Least-Squares (ALS), 280
  constraints, 282
    concentration windows, 287
    known component spectrum, 289
    non-negativity, 284
    unimodality, 286
  initial guesses, 281

B
Beer-Lambert's law, 33
  absorption, 33
  calculation of component spectra using known concentrations, 144
  calculation of concentrations using known component spectra, 145
  component spectrum, 34
  concentration profiles, 34
  molar absorptivity, 33
  path length, 33

C
chemical equilibrium, 40
chemometrics, 231
chromatography, 36
  elution profiles, 36
  overlapping peaks, 36
closure, 227, 239
constraints
  positive component spectra, 168
coordination chemistry, 45
curvature matrix, 122, 161, 202

D
data generation, 34
  chromatograms, 36
  spectra, 36
degrees of freedom, 122, 161, 165, 180, 189, 194, 195
design matrix, 115

E
eigenvalues, 215
eigenvectors, 181, 215
  noise, 221
  significant, 218
  structure, 221
equilibria, 32, 40
  beta-values, 43
  complex, 48, 62
  components, 40, 44, 49
  concentration profiles, 42, 56
  degree of dissociation, 65
  deprotonation of coordinated water, 47
  equilibrium concentrations, 47
  equilibrium constant, 44
  explicit equations, 41, 64
  formation constant, 44
  general case, 43
  general solution, 48
  hydroxide ion concentration, 47
  ionic product of water, 58
  model, 45
  notation, 43
  numerical solution. See Newton-Raphson algorithm
  simple case Fe/SCN, 40
  species, 40, 44, 49
  stoichiometry, 47, 53
  total concentrations, 41, 43, 47
Evolving Factor Analysis (EFA), 259
  classical, 260
  analysis of titration data, 267
  backward EFA, 262
  concentration windows, 264, 271
  evolving singular values, 260
  forward EFA, 261
  rank analysis, 261
  significance level, 261, 273
  spectrophotometric titration of a 2-protic acid, 236, 264
  spectrophotometric titration of a 4-protic acid, 268
  Fixed-Size Window EFA, 268
    secondary analyses, 271
  explicit computation of concentration profiles, 276
  initial guesses of concentration profiles, 274
  iterative concentration refinement, 271
examples
  0th order reaction, 89
  autocatalysis, 87
  Beer-Lambert's law and multivariate linear regression, 144, 145, 146
  Belousov-Zhabotinsky (BZ), 95
  chaos, 97
  citric acid titration, 68
  Cu/en/H+, 45
  distorted Gaussian, 38
  EDTA species distribution, 66
  exponential curve fitting, 150, 154
  exponential decay, 105, 205
  Fe/SCN titration, 40
  gas law, 29
    explicit equations, 29
  Gaussian curve, 37
  general 3-component titration, 56
  H3PO4 titration, 67
  linearisation of exponential decay, 127
  Lorenz attractor, 97
  Lotka-Volterra, 92
  metal/ligand titration, 60
  multivariate chromatogram, 219, 241, 253, 261, 272, 282
  overlapping Gaussians, 158, 161, 207
  PCR and PLS of corn data, 295
  polynomial fitting, 124
  predator-prey, 92
  reaction 2A→B, 79, 80
  reaction A→B, 78, 224, 233, 244
  reaction A→B→C, 143, 162, 165, 182, 206, 209, 254, 255, 256, 288
  reaction A+B↔C, 185
  solubility product, 31, 62
  spectrophotometric metal/ligand titration, 177
  steady state, 91
  time resolved single photon counting, 191, 211
  titration curve for a 2-protic acid, 170
  titration of acetic acid, 58
Excel
  χ2-fitting, 211
  $ operator, 14
  element-wise operations, 15
  equilibria, 60
  introduction, 7
  linear regression, 111, 125
  linest, 125
  matrix operations, 11
  multivariate linear regression, 146
  polynomial fitting, 125
  pseudo-inverse, 146
  solver, 61, 74, 207
  solver constraints, 61
  straight line fit, 111
  trendline, 111

F
Factor Analysis, 213
  geometrical interpretations, 224
    closure, 239
    HELP plots, 241
    Lawton-Sylvestre, 231
    mean centring, 239
    noise reduction, 243
    reduction in the number of dimensions, 228
    three and more components, 235
    two components, 224
  number of significant factors, 224
feasible regions. See model-free analyses

G
Gaussian curves, 36
  distorted, 38
  linear combination of, 36
global analysis, 183
  augmented data matrices, 184

H
hard-modelling. See model-based analyses
Hessian matrix, 202
Heuristic Evolving Latent Projections (HELP), 241

I
ionic strength, 44
Iterative Target Transform Factor Analysis
  target testing, 251

K
kinetics, 32, 76. Also see examples
  boundary conditions, 77
  chemical model, 77
  complex mechanisms, 80
  concentration profiles, 78
  Euler method, 80
  explicit solutions, 77
  initial concentrations, 77
  mechanism, 76
  numerical integration. See numerical integration
  ordinary differential equations (ODEs), 77
  oscillating reaction, 95
  rate constants, 77
  rate law, 76
  Runge-Kutta, 82

L
law of mass action, 31, 40, 44
Lawton-Sylvestre, 231
least-squares methods, 102
ligand, 45
linear dependence, 119
  concentration profiles, 217
  in concentration profiles, 175
linear least-squares. See linear regression
linear regression, 109, 163
  applications, 127
  design matrix, 115
  errors in parameters, 121
  generalised matrix notation, 114
  linearisation of non-linear problems, 127
  matrix notation, 113
  multivariate, 139
  numerical difficulties, 120
  polynomials, 114, 124
  pseudo-inverse, 117, 118, 122, 140
  standard deviation of the residuals, 122
  straight line fit, 109
  using Excel, 111
  using Matlab, 121
linearisation of non-linear problems
  non-uniform error distribution, 130
loadings, 215

M
Matlab
  / \ operators, 48, 109, 117, 118, 121, 156, 165
  cell arrays, 169
  element-wise operations, 15, 19
  introduction, 7
  linear regression, 117, 121
  multivariate linear regression, 144, 145
  optimisation toolbox, 203
  polynomial fitting, 124
  pseudo-inverse, 142
  Singular Value Decomposition, 215
  structures, 169
  symbolic toolbox, 79
matrix
  addition and subtraction, 12
  diagonal, 22
  dimension, 8
  identity, 23
  inverse, 24
  multiplication, 16
  notation, 8
  operator, 11
  orthogonal, 25
  orthonormal, 25
  pseudo-inverse, 49
  rank, 119
  size, 8
  square, 21
  submatrix, 9
  symmetric, 22
  transposition, 10
mean centring, 239
metal ion, 45
model-based analyses, 101
  best model, 101, 197
  multivariate, 162
  Newton-Gauss algorithm. See Newton-Gauss algorithm
  standard deviation of the residuals, 161, 165, 180, 189
model-free analyses, 213
  feasible regions, 234, 288
  limitations, 234
  multiplicative ambiguity, 291
  rotational ambiguity, 234, 288
multiplicative ambiguity. See model-free analyses
multivariate data, 34, 139, 162

N
Newton-Gauss algorithm, 148, 290
  calculation of the residuals, 163
  convergence criterion, 153
  curvature matrix, 161, 202
  explicit derivatives, 151
  fitting in reduced eigenvector space, 180
  fixing parameters, 169
  flow diagram, 157
  Hessian matrix, 202
  initial guesses, 148
  Jacobian, 149, 163
  known spectra, 175, 177
  Levenberg/Marquardt extension, 155
  Marquardt parameter, 156, 161
  minimal algorithm, 149
  numerical derivatives, 153, 173
  separation of linear and non-linear parameters, 163
  termination criterion, 153
  uncolored species, 175, 177
Newton-Raphson algorithm, 48, 69
  equilibrium model, 53
  flow diagram, 52
  initial guesses, 49, 70
  input arguments, 53
  Jacobian, 50, 74
  numerical accuracy, 56
  output arguments, 54
  shift vector, 50
  termination criterion, 56
noise reduction, 243
non-linear least-squares. See non-linear regression
non-linear regression, 148
  errors in parameters, 161
  initial guesses, 148
  Jacobian, 149
non-white noise, 189
normal equations, 115
normalisation
  unity height of concentration profiles, 275
number of species, 217
numerical integration
  accuracy, 97
  Euler, 80
  Runge-Kutta, 82, 88, 93
  step size, 86
  stiff problems, 86, 90, 97

O
optimisation
  Newton-Gauss algorithm. See Newton-Gauss algorithm
  simplex, 204. See Simplex optimisation
  solver. See Excel

P
parameters
  linear, 105
Partial Least Squares (PLS), 295, 306
  calibration, 295, 308
  comparing with PCR, 310
  mean-centring and normalisation, 307
  prediction, 309
  prognostic vector, 309
physical/chemical models, 29
polynomial fitting, 114, 124
  Savitzky-Golay filter, 130
  using Excel, 125
  using Matlab, 124
polynomial interpolation, 138
  lolipop, 138
Principal Component Regression (PCR), 295, 296
  calibration, 295
  cross validation, 303
  mean-centring and normalisation, 297
  prediction, 300, 302
  PRESS, 305
  prognostic vector, 300
principal components, 218, 221
pseudo-inverse, 117, 140, 142
  using Excel, 146

R
rank deficiency
  of concentration profiles, 175, 184
rank of a matrix, 119, 120, 217
residuals, 34, 103, 140
  standard deviation, 223
  structure, 222
Resolving Factor Analysis (RFA), 290
rotational ambiguity. See model-free analyses

S
Savitzky-Golay filter, 130
  derivative of a curve using, 135
  smoothing using, 131, 132
scalar, 8
scores, 215
simplex optimisation, 204
Singular Value Decomposition (SVD), 181, 214, 260, 268
singular values, 181, 215
  magnitude, 219
  significant, 218
soft-modelling. See model-free analyses
straight line fit
  explicit equations, 111
  matrix notation, 113
sum of squares, 103, 109, 140
systems of linear equations, 26
systems of non-linear equations, 69

T
Target Factor Analysis, 246
  projection matrices, 250
  target testing, 247, 250, 251
    parameter fitting, 257
Target Transform Search/Fit, 253
  parameter fitting via target testing, 257
  system of homogeneous differential equations, 254
  transformation matrix, 254
Taylor series, 48, 80, 149, 199
titration, 40. Also see examples
  acetic acid, 58
  acid-base, 40
  complexometric, 40
  equilibrium constants, 40
  Fe/SCN, 40
  general 3-component titration, 56
  metal/ligand titration, 60
  pH, 40
  polyprotic acids, 64

V
van der Waals coefficients, 30
vector, 8
  dimension, 8
  scalar product, 17