OPTIMAL CONTROL and FORECASTING of COMPLEX DYNAMICAL SYSTEMS
Ilya Grigorenko World Scientific
Ilya Grigorenko, University of Southern California, USA
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
OPTIMAL CONTROL AND FORECASTING OF COMPLEX DYNAMICAL SYSTEMS Copyright © 2006 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-256-660-0
Printed in Singapore by World Scientific Printers (S) Pte Ltd
To my beautiful wife Elena
Preface
Chance, however, is the governess of life.
Palladas, 5th century A.D., Anthologia Palatina 10.65

This book has appeared by choice but also by some chance. I gave a talk in the summer of 2003 at the Max Planck Institute for the Physics of Complex Systems in Dresden, Germany, where I was kindly invited by Prof. Dr. Jan-Michael Rost. It happened that among the people who attended my talk was a representative of World Scientific. One month later I received an invitation to write this book. The purpose of this text is to summarize and share with the reader the author's experience in the field of complex systems. The title of this book was chosen to cover the different topics of the author's recent research as fully as is possible in a few words. The main goal of this book is to show the variety of problems in modern physics which can be formulated in terms of optimization and optimal control. This idea is not new. Since the 18th century it has been known that almost any physical (or mechanical) problem can be formulated as an extremum problem. Such an approach is called the Lagrangian formalism, after the great French mathematician Lagrange. The text is written in such a way that all the chapters are logically coupled to each other, so the reader should be ready to be referred to different parts of the book. In this book the author has tried to adopt a naive division of the complexity hierarchy. The simplest case is control and forecasting of systems which one can describe with the help of linear differential equations, where the control fields enter these equations as additive inhomogeneous terms. The situation becomes more complicated when the control appears not additively but multiplicatively. This leads to a nonlinear problem for the search of control fields. A typical example is control of a quantum system, where a control field enters the Schrödinger equation as a product with the system's wavefunction. The next level of complexity appears when the nonlinearity of the controlled system is taken into account. Such problems are still tractable. As an example one can consider control of a Bose-Einstein condensate (BEC), whose dynamics is described by the Gross-Pitaevskii equation. Note that it is assumed here that we still know the explicit form of the mathematical equations governing the system's dynamics. However, the dynamics of the controlled system could be very complicated (chaotic). Additional complexity arises if the dynamics of the system becomes non-deterministic, with the addition of some stochastic component. And the most difficult situation occurs when we need to control a black box system (like biological, financial or social systems), for which ab initio evolution equations are unknown.

Chapter 1 provides an introduction to the long mathematical history of the calculus of variations, starting from Fermat's variational principle, the famous Bernoulli brachistochrone problem and the beginning of the calculus of variations. Despite the limited applicability of analytical methods, the calculus of variations remains a vital instrument for solving various variational problems. The author could go deeper into ancient times and start his story with princess Dido's problem, but he feels that the brachistochrone problem belongs to scientific history, while Dido's problem is just a beautiful ancient legend based on Virgil's Aeneid, without clear proof of Dido's priority.

In chapter 2 we discuss different aspects of numerical optimization, including the effectiveness of optimization algorithms and multiobjective optimization.
We make a brief review of some popular numerical methods which could be useful for the solution of various problems in optimization, control and forecasting. We give a broader review of the so-called "Quantum Genetic Algorithm", which operates with smooth differentiable functions with a limited absolute value of their gradient. As an example, we demonstrate its ability to solve few-body quantum statistical problems in 1D and 2D as a problem of minimization of the ground state energy, or maximization of the partition function. We also study different scenarios of the formation and melting of a "Wigner molecule" in a quantum dot.
Chapter 3 outlines some elements of chaos theory and the deep connection between nonlinearity and complexity in different systems. In this chapter we give a generalization of the Lorenz system using fractional derivatives. We show how the "effective dimension" of the system controls its dynamical behavior, including a transition from chaos to regular motion. In chapter 4 we discuss the problem of optimal control in application to nanoscale quantum systems. We introduce a novel approach which permits us to obtain new analytical solutions of different optimal control problems. We also solve a problem of optimal control of the induced photo-current between two quantum dots using a genetic algorithm. We analyze how decoherence processes, which result in non-unitary evolution of a quantum system, change the optimal control fields. This question is very significant for the future design of nanoscale devices, since decoherence, in general, significantly limits optimal control. In chapter 5 we continue to consider control of quantum systems, with particular application to quantum computing. We show that an optimal design of artificial quantum bits can decrease the number of errors due to quantum decoherence processes by an order of magnitude, and leads to faster performance of basic quantum logical operations. In chapter 6 we briefly discuss different aspects of forecasting and its connection with the optimization and chaos theory discussed in the previous chapters.

I would like to conclude this introduction by acknowledging my teachers and colleagues, who helped and guided my research over the last years. Most of the results presented or mentioned in this book were obtained in close collaboration with these nice people. I would like to thank my scientific supervisors and colleagues: Prof. Dr. B. G. Matisov and Dr. I. E. Mazets, Prof. Dr. K. H. Bennemann, Prof. Dr. M. E. Garcia, Prof. Dr. D. V. Khveshchenko, Prof. Dr. S. Haas and Prof. Dr. A. F. J. Levi.

Ilya A. Grigorenko
Contents

Preface  vii

1. Analytical methods in control and optimization  1
   1.1 Calculus of variations  1
       1.1.1 The beginning: Fermat's variational principle  2
       1.1.2 The "beautiful" Brachistochrone Problem  4
       1.1.3 Euler-Lagrange equation  6
       1.1.4 A word about distance between two functions  11
       1.1.5 The Brachistochrone problem revisited  12
       1.1.6 Generalizations of the Euler-Lagrange equation  14
       1.1.7 Transversality conditions  16
       1.1.8 Conditional extremum: Lagrange multipliers method  16
       1.1.9 Mixed Optimal problem  19
       1.1.10 Approximate methods of solution: Ritz's method  20
   1.2 Optimal control theory  21
       1.2.1 Sensitivity analysis  25
       1.2.2 Null controllability  26
       1.2.3 Problems with constrained control  26
   1.3 Summary  27

2. Numerical optimization  29
   2.1 The halting problem and No Free Lunch Theorem  29
   2.2 Global Optimization: searching for the deepest hole on a golf field in the darkness using a cheap laser pointer  30
       2.2.1 Sensitivity to numerical errors  33
   2.3 Multiobjective optimization  34
       2.3.1 Pareto front  35
       2.3.2 The weighted-sum method  37
   2.4 Simplex method  38
   2.5 Simulated annealing: "crystallizing" solutions  42
   2.6 Introduction to genetic algorithms  44
   2.7 GA for a class of smooth (differentiable) functions  49
   2.8 Application of the GA to the eigenproblem  57
       2.8.1 The ground state problem in one and two dimensions  58
       2.8.2 Extension of the QGA to quantum statistical problems  66
       2.8.3 Formation of a "Wigner molecule" and its "melting"  69
   2.9 Evolutionary gradient search and Lamarckianism  74
   2.10 Summary  76

3. Chaos in complex systems  77
   3.1 Lorenz attractor  80
   3.2 Control of chaotic dynamics of the fractional Lorenz system  83
   3.3 Summary  91

4. Optimal control of quantum systems  93
   4.1 Density matrix formalism  95
   4.2 Liouville equation for the reduced density matrix  96
   4.3 Modern variational approach to optimal control of quantum systems  99
       4.3.1 An alternative analytical theory  100
   4.4 An approximate analytical solution for the case of a two level system  105
   4.5 Optimal control of a time averaged occupation of the excited level in a two-level system  109
       4.5.1 Analytical solution for optimal control field  114
       4.5.2 Optimal control at a given time  117
       4.5.3 Estimation of the absolute bound for the control due to decoherence  119
   4.6 Optimal control of nanostructures: double quantum dot  121
       4.6.1 The optimal field for the control of the photon assisted tunnelling between quantum dots  124
   4.7 Analytical theory for control of multi-photon transitions  138
   4.8 Summary  145

5. Optimal control and quantum computing  147
   5.1 Robust two-qubit quantum registers  147
   5.2 Optimal design of universal two-qubit gates  157
   5.3 Entanglement of a pair of qubits  166
   5.4 Summary  168

6. Forecasting of complex dynamical systems  171
   6.1 Forecasting of financial markets  171
   6.2 Autoregressive models  172
   6.3 Chaos theory embedding dimensions  174
   6.4 Modelling of economic "agents" and El Farol bar problem  175
   6.5 Forecasting of the solar activity  176
   6.6 Noise reduction and Wavelets  177
   6.7 Finance and random matrix theory  179
   6.8 Neural Networks  180
   6.9 Summary  181

Bibliography  183

Index  197
Chapter 1
Analytical methods in control and optimization
1.1 Calculus of variations
Physics, chemistry, engineering and finance often pose problems which have in common that one has to choose the best (in a certain sense) solution among a huge set of possible ones. Such problems involve optimization, optimal control, optimal design, optimal decisions, long-term and short-term forecasting, etc. All these optimization problems, in particular those which involve nonlinearity and multi-dimensionality, can rarely be solved analytically, and some numerical methods must be used. However, one usually cannot learn much from a particular numerical solution of a complex optimization problem: it can be too complicated for analysis. Fortunately, one can get some insight from studying relatively simple problems that can be solved analytically. That is why this book begins with an introduction to analytical techniques, which were developed by many outstanding mathematicians during the last 300 years. One of these techniques, the calculus of variations, provides us with a method, based on the Euler-Lagrange theory, to obtain analytical solutions of optimization or optimal control problems. This method can be considered as a generalization of the condition for a local extremum of a function of a real variable, f'(x) = 0, to problems of functional analysis. A functional is a correspondence which assigns a definite real number to each function belonging to some class. Thus, one might say that a functional is a kind of function where the independent variable is itself a function. In this chapter we are going to discuss methods of analytical solution of optimal control problems based on the variational approach. Although useful only in some simple cases, this approach is very transparent and demonstrates the complexity and richness of optimal control theory.
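To make the notion of a functional concrete, here is a minimal numerical sketch (the quadratic functional and all names are illustrative, not taken from the book): the rule J(x) = ∫₀¹ x(t)² dt assigns one real number to each function x.

```python
import math

def J(x, n=100000):
    """Evaluate the functional J(x) = integral of x(t)^2 over [0, 1]
    by the midpoint rule: a single number is assigned to a whole function."""
    h = 1.0 / n
    return sum(x((k + 0.5) * h) ** 2 for k in range(n)) * h

# The "independent variable" of J is itself a function:
print(J(lambda t: 1.0))   # ≈ 1
print(J(math.sin))        # ≈ 0.5 - sin(2)/4 ≈ 0.2727
```

Feeding J different functions yields different numbers, exactly as feeding a function different numbers yields different values.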
In the following section we give a short historical survey of the most significant discoveries in the theory of the variational calculus, which gave rise to many branches of functional analysis and optimal control theory.
1.1.1 The beginning: Fermat's variational principle
The name of the great French mathematician Pierre Fermat (1601–1665) is connected to many famous and even intriguing pages of mathematics. Perhaps everyone has heard about the great "Fermat's Last Theorem", which challenged mathematicians for more than 300 years. Many mathematicians questioned Fermat's claim that he knew a proof of this theorem. The theorem was finally resolved by Andrew Wiles, a professor of mathematics at Princeton, in 1993. It is less known that Fermat made the first formulation of a variational principle for a physical problem. It is named Fermat's variational principle in geometrical optics. In a letter to a colleague, Marin Cureau de la Chambre, dated 1 January 1662, Fermat attached two papers, "The analysis of refractions" and "The synthesis of refractions", where he considered the problem of light propagation in an optical medium. Fermat's basic idea was that a ray of light goes along a trajectory (among all possible trajectories joining the same points) such that the time of travel is minimal. In a homogeneous medium (for example, in air), where the speed of light is constant at all points and in all directions, the time of travel along a trajectory is proportional to its length. Therefore, the minimum time trajectory connecting two points A and B is simply the straight line. By means of his principle Fermat was able to give a completely clear derivation of Snell's law of light refraction, which was originally established experimentally by the Dutch mathematician Willebrord van Roijen Snell (1580–1626). Here we would also like to mention that Snell's law was probably first discovered by the 10th-century Islamic scholar Abu Said al-Ala Ibn Sahl in his work "On the Burning Instruments", and then rediscovered independently by several European scientists, including another great French mathematician and philosopher René Descartes (1596–1650).
Let us give a brief description of Snell's law, since we are going to use it in the next subsection. The law can be formulated as follows. Consider a parallel beam of rays of light incident on a horizontal plane
Fig. 1.1 Snell's law.
interface S between two homogeneous optical media (see Fig. 1.1). Snell's law says:

    sin(α)/v₁ = sin(β)/v₂,    (1.1)

where α and β are the angle of incidence and the angle of refraction (reckoned from the normal N to S), and v₁ and v₂ are the speeds of light above and below S. Quite remarkably, in one of his above-mentioned letters Fermat cited the great Italian scientist Galileo Galilei (1564–1642) and his work published in 1638. In this work, titled "Two New Sciences", Galileo apparently first considered the problem of finding the path of fastest descent under the action of gravity. For Fermat it was very significant that Galileo's solution (which was incorrect) was not just a straight line representing the shortest way. After Fermat's discovery of the first variational principle, many
others were proposed by different scientists. Variational formulations became known in mechanics, electrodynamics, quantum mechanics, quantum field theory, etc. Nowadays it is common knowledge in science that any natural law can be formulated as an extremal problem. Here we can quote the great Swiss mathematician Leonhard Euler (1707–1783): "In everything that occurs in the world the meaning of a maximum or a minimum can be seen". The variational principle inspired metaphysically inclined thinkers, who understood it as a concrete mathematical expression of the idea of the great German philosopher and scientist Gottfried Wilhelm Leibniz (1646–1716) that the actual world is the best of all possible worlds.

1.1.2 The "beautiful" Brachistochrone Problem
After Fermat's papers of 1662 there was not much progress on the subject until June 1696, when an article by the great Swiss mathematician Johann Bernoulli (1667–1748) was published with an intriguing title: "A New Problem to Whose Solution Mathematicians are Invited". He stated there the following problem: "Suppose that we are given two points A and B in a vertical plane. Determine the path ACB along which a body C which starts moving from the point A under the action of its own gravity reaches the point B in the shortest time". The statement of the problem was followed by a paragraph in which
Fig. 1.2 The line of the shortest descent, the brachistochrone.
Johann Bernoulli explained to his readers that the problem is very useful in
mechanics, that the solution is not the straight line AB, and that the curve is very well known to geometers (Fig. 1.2). One of the most influential mathematicians of the time, G. W. Leibniz, recognized this problem as "so beautiful and until recently unknown". Apparently, neither Leibniz nor Johann Bernoulli was aware of Galileo's work of 1638, which we have mentioned above. Some time after the publication of his article Johann Bernoulli gave the solution of the problem himself. Other solutions were provided independently by Leibniz, Jacob Bernoulli (brother of Johann Bernoulli), and an anonymous author. Experts immediately guessed "Ex Ungue Leonem..." ("to judge a lion by its claw"), and now we know that it was the great British mathematician and physicist Sir Isaac Newton (1642–1727). The curve of shortest descent, or the brachistochrone ("brachistochrone" means "shortest time" in Greek), turned out to be a cycloid. Leibniz's solution was based on the approximation of curves with broken lines. A very interesting solution was given by Jacob Bernoulli; it was based on Huygens' principle and the concept of a "wave front". However, the most frequently mentioned solution belongs to Johann Bernoulli himself.

Bernoulli's solution is the following. First of all let us set a coordinate system (x, y) in the vertical plane so that the x axis is horizontal and the y axis is directed downward (see Fig. 1.2). The velocity of the body C does not depend on the shape of the curve y(x), because the body moves without friction. Its velocity depends only on the current height y(x): according to Galileo's law, v = √(2gy(x)), where g is the acceleration of gravity. The total time of descent along the path y(x) from the point A to the point B is equal to:

    T = ∫_AB ds/v = ∫_AB ds/√(2gy(x)),    (1.2)

where ds is the differential of the arc length. The problem is to find the optimal path y(x) in order to minimize the total time T. Bernoulli's original idea was to apply Fermat's principle to the brachistochrone problem. He noted that one can formulate the original problem as the problem of light propagation in a nonhomogeneous medium, where the speed of light at a point (x, y(x)) is assumed to be equal to √(2gy(x)). In order to obtain the solution in analytical form, following Johann Bernoulli, we can split the medium into many thin parallel layers with local
speed of light vᵢ, i = 1, 2, ..., N. Applying Snell's law, we obtain

    sin(α₁)/v₁ = sin(α₂)/v₂ = ... = sin(α_N)/v_N = const,    (1.3)
where αᵢ are the angles of incidence of the ray. Going to the limit of infinitely thin optical layers, we conclude that

    sin(α(x))/v(x) = const,    (1.4)
where v(x) = √(2gy(x)) and α(x) is the angle between the tangent to the curve y(x) at the point (x, y(x)) and the y axis:

    sin(α(x)) = 1/√(1 + (dy(x)/dx)²).    (1.5)
Thus, the equation of the brachistochrone y(x) can be rewritten in the following form:

    √(1 + (dy(x)/dx)²) √(y(x)) = const.    (1.6)
This equation can be easily integrated, and we obtain the solution as the equation of a cycloid:

    x = x₀ + y₀(t − sin(t))/2,  y = y₀(1 − cos(t))/2.    (1.7)
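The claim that the cycloid beats the straight chord can be checked numerically. The sketch below (the endpoint B = (π, 2), the value of g and all names are illustrative choices, not from the book) integrates the descent time of Eq. (1.2), T = ∫ ds/√(2gy), along a cycloid and along the straight line joining the same endpoints:

```python
import math

g = 9.81  # acceleration of gravity, m/s^2 (illustrative choice of units)

def descent_time(path, t0, t1, n=200000):
    """Midpoint-rule approximation of T = integral of ds / v with
    v = sqrt(2 g y), for a parametric path t -> (x(t), y(t))."""
    h = (t1 - t0) / n
    eps = 1e-6
    T = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        xm, ym = path(t - eps)
        xp, yp = path(t + eps)
        dx, dy = (xp - xm) / (2 * eps), (yp - ym) / (2 * eps)
        _, y = path(t)
        T += math.hypot(dx, dy) / math.sqrt(2 * g * y) * h
    return T

# cycloid through A = (0, 0) and B = (pi, 2); the y axis points downward
cycloid = lambda th: (th - math.sin(th), 1.0 - math.cos(th))
# straight chord between the same endpoints
chord = lambda s: (math.pi * s, 2.0 * s)

T_cycloid = descent_time(cycloid, 0.0, math.pi)
T_chord = descent_time(chord, 0.0, 1.0)
print(T_cycloid, T_chord)  # the cycloid wins
```

Along the cycloid the integrand ds/v is constant (this is exactly the Snell-law property of Eq. (1.4)), so the quadrature is essentially exact there; the chord takes visibly longer.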
1.1.3 Euler-Lagrange equation
It was already clear at that time that the elegant method proposed by Bernoulli cannot be applied to every problem, and some general method for solving problems of this type was very much desired. In 1732, when Euler was only 25, he published his first work on the variational calculus. And in 1744 Euler published a manuscript under the title "A method of finding curves possessing the properties of a maximum or a minimum, or the solution of the isoperimetric problem taken in the broadest sense". In this systematic study Euler treated 100(!) special problems and not only solved them but also set up the beginnings of a real general theory. Nowadays we can say that in this work he established the theoretical foundations of a new branch of mathematical analysis.
In his work, Euler considered problems similar to the brachistochrone problem, which can be formulated as the problem of finding an unknown function x(t) such that the functional J(x) achieves its extremum:

    J(x) = ∫_{t₀}^{t₁} F(t, x(t), ẋ(t)) dt → extr.    (1.8)

The function F is problem-specific and is usually called the functional density. Euler's method was based on the approximation of curves with broken lines. With this method Euler derived a second-order differential equation for the extremals, which Lagrange later called the Euler equation. Eleven years later, in 1755, the 19-year-old Joseph Louis Lagrange (1736–1813) sent a letter to Euler where he described his "method of variations". In 1759 Lagrange published the work in which he elaborated his new methods of the calculus of variations. His main idea was to consider a "variation" of the curve which is assumed to be an extremal. Later the Lagrange method became generally accepted by other mathematicians. This is the method that we shall use to derive the famous Euler equation. But before we start to derive the Euler-Lagrange equation, let us first prove the so-called Fundamental Lemma, which plays an essential role in the variational calculus.

The Fundamental Lemma. Let a continuous function a(t) have the property:

    ∫₀^T a(t)x(t) dt = 0    (1.9)

for any continuously differentiable function x(t) satisfying the condition x(0) = x(T) = 0. Then a(t) ≡ 0.

Proof. Let us suppose that a(t) ≠ 0 at a point τ ∈ [0, T]. Then the continuity of a(t) implies that there is a closed interval Δ = [τ₁, τ₂] on which a(t) does not vanish. For definiteness, let a(t) ≥ a₀ > 0 for t ∈ Δ (see Fig. 1.3). Let us construct the function x(t) = (t − τ₁)²(t − τ₂)² for t ∈ Δ, and x(t) = 0 otherwise. It can easily be verified that x(t) is a continuously differentiable function with x(0) = x(T) = 0; thus x(t) satisfies the conditions of the Fundamental Lemma. On the other hand, by the Mean Value Theorem one can show that there exists η ∈ Δ such that ∫₀^T a(t)x(t) dt = a(η) ∫₀^T x(t) dt > 0, and we arrive at a contradiction, which proves the Lemma. □
Fig. 1.3 The Fundamental Lemma, function a(t) (see text).
Now let us start the derivation of the Euler-Lagrange equation. We suppose x* = x*(t) is an optimal solution of the variational problem given by Eq. (1.8), and let μ(t) be a function which satisfies the boundary conditions μ(t₀) = μ(t₁) = 0. Let us modestly restrict ourselves to the class of functions that satisfy the boundary conditions and have continuous first and second derivatives on t ∈ (t₀, t₁). These functions we will call admissible functions. For each real number α, let us define a new function x(t) by

    x(t) = x*(t) + αμ(t),    (1.10)

see Fig. 1.4. Note that if α is small, the function x(t) is "near" the function x*(t) in some sense. The more precise meaning of "nearness" of functions we shall consider later. Clearly, for an optimal x*: J(x*) ≥ J(x* + αμ) for all α, if we assume x* to be a global maximum. If we keep the function μ(t) fixed, the integral J(x* + αμ) becomes a function of α alone. Putting I(α) = J(x* + αμ), we have

    I(α) = ∫_{t₀}^{t₁} F(t, x* + αμ, ẋ* + αμ̇) dt.    (1.11)

Fig. 1.4 "Weak" variation of the optimal solution x*.

Fig. 1.5 "Strong" variation of the optimal solution x*.

Here I(0) = J(x*), so I(α) ≤ I(0) for all α. This condition can be formulated as

    dI/dα|_{α=0} = 0.    (1.12)

This is the condition we use to derive the Euler equation. Now, looking
at Eq. (1.11), we see that to calculate I'(0) we must differentiate the integral with respect to the parameter α appearing in the integrand. The result is as follows:

    dI/dα = ∫_{t₀}^{t₁} [ (∂F/∂x) μ(t) + (∂F/∂ẋ) μ̇(t) ] dt.    (1.13)

Let us now integrate the second term in Eq. (1.13) by parts:

    ∫_{t₀}^{t₁} (∂F/∂ẋ) μ̇(t) dt = [ (∂F/∂ẋ) μ(t) ]_{t₀}^{t₁} − ∫_{t₀}^{t₁} (d/dt)(∂F/∂ẋ) μ(t) dt.    (1.14)

Using Eq. (1.11) and rearranging, we get:

    dI/dα|_{α=0} = ∫_{t₀}^{t₁} [ ∂F/∂x − (d/dt)(∂F/∂ẋ) ] μ(t) dt + (∂F/∂ẋ)|_{t₁} μ(t₁) − (∂F/∂ẋ)|_{t₀} μ(t₀).    (1.15)

Now recall that μ(t₀) = μ(t₁) = 0, so the condition Eq. (1.12) reduces to

    ∫_{t₀}^{t₁} [ ∂F/∂x − (d/dt)(∂F/∂ẋ) ] μ(t) dt = 0.    (1.16)

In the argument leading to this result, μ(t) was a fixed function. However, Eq. (1.16) holds for all functions μ(t) which are 0 at t₀ and t₁. According to the Fundamental Lemma it follows that the term in the brackets must vanish for all t ∈ [t₀, t₁]:

    ∂F/∂x − (d/dt)(∂F/∂ẋ) = 0.    (1.17)
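The logic of Eqs. (1.10)-(1.12) can be replayed numerically. In the sketch below (the choice F = ẋ²/2 − x²/2, the endpoints and all names are illustrative, not from the book) the extremal of Eq. (1.17) is x*(t) = sin t, since the Euler-Lagrange equation for this F reads ẍ = −x; we check that dI/dα vanishes at α = 0 along the extremal but not along a non-extremal curve:

```python
import math

t0, t1 = 0.0, math.pi / 2

def action(x, n=20000):
    """J(x) = integral of F(t, x, x') dt with F = x'^2/2 - x^2/2,
    via the midpoint rule, with central finite differences for x'."""
    h = (t1 - t0) / n
    eps = 1e-5
    J = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * h
        xdot = (x(t + eps) - x(t - eps)) / (2 * eps)
        J += (xdot ** 2 / 2 - x(t) ** 2 / 2) * h
    return J

mu = lambda t: math.sin(2 * t)   # admissible variation: mu(t0) = mu(t1) = 0

def dI(x_star, da=1e-4):
    """dI/d(alpha) at alpha = 0 for the family x* + alpha*mu, cf. Eq. (1.12)."""
    Ip = action(lambda t: x_star(t) + da * mu(t))
    Im = action(lambda t: x_star(t) - da * mu(t))
    return (Ip - Im) / (2 * da)

print(dI(math.sin))                   # extremal: ~ 0
print(dI(lambda t: 2 * t / math.pi))  # straight line: clearly nonzero
```

The straight line joins the same endpoints x(0) = 0, x(π/2) = 1 but is not a solution of Eq. (1.17) for this F, so its first variation does not vanish.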
It is worth noting here that the Euler-Lagrange equation is a first-order condition for a "local" optimum, analogous to the condition that the partial derivatives vanish at a local extreme point in "static" optimization. Let us mention two special (simple) forms of the Euler-Lagrange equation: 1) If the Lagrangian F does not depend on x explicitly, the Euler equation becomes:

    ∂F/∂ẋ = p(t) = const.    (1.18)
It is called the momentum integral. 2) If the Lagrangian F does not depend on t explicitly, the Euler equation possesses the first integral:

    ẋ (∂F/∂ẋ) − F = E(t) = const.    (1.19)
It is called the energy integral (both names originate from classical mechanics). Legendre's necessary condition: now let us suppose that the condition of Eq. (1.17) is satisfied. A necessary condition for J, given by Eq. (1.8), to have a maximum at x* = x*(t) was first obtained by the great French mathematician Adrien-Marie Legendre (1752–1833):

    F_{ẋẋ}(t, x*(t), ẋ*(t)) ≤ 0.

[...]

A value f(x*) > −∞ is called a global minimum if for any x ∈ A: f(x*) ≤ f(x). Then x* is a global minimum point, f is called the objective function, and the set A is called the feasible region. The problem of determining a global minimum point is called the global optimization problem. In the same way one can formulate the problem of searching for a global maximum. The definition of a local minimum of an objective function f(x): the value f* is called a local minimum if there exists an ε-neighborhood U(x₀): |x₀ − x| < ε such that f* is the smallest feasible objective function value within this neighborhood. In practice, the number of local minima can be quite large, see Fig. 2.1. If one can prove that a given objective function has the property of convexity, then the optimization problem becomes an easier task. Convexity is a geometrical property of a curve. A real-valued function f defined on an interval [a, b] is convex if for any two points x and y lying in [a, b] and any α ∈ [0, 1] the following inequality holds: f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y). It can be proved that any local minimum of a convex function is also a global minimum. A strictly convex function will have at most one global minimum. Taking this fact into account, we will call an objective function f convex if it has exactly one local minimum (which is also the global minimum); otherwise we will call it non-convex. Most real-life optimization problems are non-convex. Of course, one is usually interested in finding a global
Fig. 2.1 Example of a 1D objective function; note that it has many local minima and one global minimum.
optimum of a given optimization problem instead of only a local one. Unfortunately, in global optimization no general criterion exists for identification
of the global minimum. Another piece of bad news is that almost all practically interesting optimization problems contain a huge number of local minima. As an example of a frequent optimization problem we can mention the global optimization problem posed by the least square estimation of model parameters. Let us consider an observed data sequence (xᵢ, yᵢ), i = 1, ..., N. Here yᵢ are dependent variables ("effect") and xᵢ are independent variables ("cause"). Assuming the existence of a model, or at least a hypothesis y(a, xᵢ), which describes the dependence between "effect" yᵢ and "cause" xᵢ, the model parameter vector a must be estimated in order to minimize the sum of squared differences between measured reality and model predictions:

    Σ_{i=1}^{N} (yᵢ − y(a, xᵢ))² → min.    (2.1)
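For the simplest linear hypothesis y(a, x) = a₀ + a₁x, the minimization of the sum of squares in Eq. (2.1) can even be done in closed form via the normal equations. The sketch below uses synthetic data; the "true" parameters, the noise level and all names are illustrative, not from the book:

```python
import random

random.seed(0)
a_true = (1.0, 2.0)                   # "true" parameters of the toy model
xs = [i / 10 for i in range(50)]
ys = [a_true[0] + a_true[1] * x + random.gauss(0.0, 0.1) for x in xs]

def S(a):
    """Sum of squared residuals of Eq. (2.1) for the linear hypothesis."""
    return sum((y - (a[0] + a[1] * x)) ** 2 for x, y in zip(xs, ys))

# normal equations of the linear least-squares problem
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a0 = (sy - a1 * sx) / n

print((a0, a1))   # close to a_true
```

The estimated pair (a₀, a₁) minimizes S globally here because the linear problem is convex; the nonlinear hypotheses discussed in the text lose exactly this property.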
Note that the "effect" yᵢ could, in principle, depend on all xᵢ, which makes the problem strongly non-local. Let us imagine an optimization problem where a solution can be represented as a bit (1 or 0) string of length N. One can easily calculate that there are 2^N possible variants of the unknown solution. Another example is from multidimensional optimization; let f(x₁, ..., xₙ) be defined as

    f(x₁, ..., xₙ) = 10 sin(πx₁)² + (xₙ − 1)² + Σ_{i=1}^{n−1} (xᵢ − 1)² (1 + 10 sin(πx_{i+1})²).    (2.2)

This function has an incredible number of local minima. For instance, it has 10¹⁰ local minima when n = 10, but only a single global minimum. These two examples illustrate how fast the complexity of global optimization grows with the dimension of the problem. Clearly, to find the global optimum by full (exhaustive) search in these examples is unrealistic. As a general rule, the search space grows according to the laws of combinatorics, which often can be nicely approximated by some exponential function of N. In a very natural way the discussion of global optimization turns from continuous variables to discrete ones by introducing the idea of a grid search technique. Generally, optimization problems with discrete object variables are called combinatorial optimization problems. One example of such a problem is the travelling salesman problem:
Find a path through a weighted graph which starts and ends at the same vertex, includes every other vertex exactly once, and minimizes the total cost of its edges. Given the exponential growth of the search space mentioned above, it is not surprising that even the decision problem of checking whether a given feasible solution of a smooth, non-convex nonlinear optimization problem is not a local minimum can be extremely hard.
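For a tiny instance the travelling salesman problem can still be solved by exhaustive search, which makes the combinatorial explosion tangible: N cities already require checking (N−1)! closed tours. A sketch (the five city coordinates are arbitrary illustrations, not from the book):

```python
import itertools
import math

cities = [(0, 0), (3, 0), (3, 4), (0, 4), (1, 2)]  # hypothetical coordinates

def tour_length(order):
    """Length of the closed tour 0 -> order... -> 0 (Euclidean edge costs)."""
    path = [0, *order, 0]
    return sum(math.dist(cities[a], cities[b]) for a, b in zip(path, path[1:]))

# exhaustive search over all (N-1)! = 24 closed tours starting at city 0
tours = list(itertools.permutations(range(1, len(cities))))
best = min(tours, key=tour_length)
print(len(tours), best, round(tour_length(best), 3))
```

With 5 cities there are only 24 tours; with 20 cities the same loop would need 19! ≈ 1.2·10¹⁷ evaluations, which is exactly the exponential wall described above.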
2.2.1 Sensitivity to numerical errors
Today's computers and those of the near future can manipulate and store only a finite amount of information. Since the solution of a nonlinear problem may be a real number that cannot be represented in finite space or displayed on a screen in finite time, the best we can hope for in general is a point close to a solution (preferably with some guarantee on its proximity to the solution) or an interval enclosing a solution. Computer methods for solving nonlinear problems typically use floating-point numbers to approximate real numbers. Since there are only finitely many floating-point numbers, these methods are bound to make numerical errors. These errors, although probably small considered in isolation, may have fundamental implications for the results. Consider, for instance, Wilkinson's problem, which consists in finding all solutions of the equation
∏_{i=1}^{20} (x + i) + p x^19 = 0,   (2.3)
in the interval [−20.4, −9.4]. When p = 0, the equation obviously has 11 solutions. When p = 2^−23 ≈ 10^−7, it has no solution! Wilkinson's problem clearly indicates that a small numerical error (e.g., assume that p is the output of some numerical computation) can have fundamental implications for the results of an application. These numerical issues require users of numerical software to exercise great care when interpreting their results.
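The effect is easy to reproduce numerically. The sketch below (Python; scanning for sign changes on a dense grid is just one illustrative way to count real roots in the interval) evaluates Eq.(2.3) directly:

```python
import numpy as np

def q(x, p):
    # Eq. (2.3): prod_{i=1}^{20} (x + i) + p * x^19
    prod = np.ones_like(x)
    for i in range(1, 21):
        prod = prod * (x + i)
    return prod + p * x ** 19

xs = np.linspace(-20.4, -9.4, 200001)
for p in (0.0, 2.0 ** -23):
    signs = np.sign(q(xs, p))
    signs = signs[signs != 0]        # ignore exact zeros on the grid
    crossings = int(np.count_nonzero(signs[1:] != signs[:-1]))
    print(f"p = {p:.2e}: {crossings} sign changes in [-20.4, -9.4]")
```

With p = 0 the scan brackets the 11 integer roots −10, −11, ..., −20; with p = 2^−23 the tiny perturbation is multiplied by x^19 (of magnitude above 10^12 on this interval), swamps the product term, and no sign change survives.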
2.3 Multiobjective optimization
Most real-world engineering and scientific optimization problems (including optimal control, forecasting, least square estimation of model parameters, etc.) are multi-objective, since they usually have several objectives that may conflict with each other and at the same time must all be satisfied. For example, if we refer to the optimal control of a complex chemical reaction, where the multiple products of this reaction can be controlled, we will usually want first to maximize some final products, but at the same time, as a second task, we would like to minimize the yield of some other products, which may be in contradiction with the first goal. Another example: we want to find the best fit to the data using the simplest possible hypothesis y(a, x_i) in Eq.(2.1). Obviously, these two objectives (best fit and simplicity of the hypothesis) conflict with each other. For illustration let us consider a classical example, Schaffer's function [Schaffer (2001)]:

F(x) = (f_1(x), f_2(x)) = (−x², −(x − 1)²).   (2.4)
Obviously, there is no single point that maximizes all the components of F, see Fig.2.2. Multi-objective optimization (also called multi-criteria optimization) can be defined as the problem of finding an n-dimensional vector of variables, called optimization parameters, x = (x_1, ..., x_n), which satisfies all imposed constraints and optimizes an m-dimensional vector objective function f(x) = (f_1, ..., f_m). Each element of f(x) corresponds to a particular performance criterion. It is no surprise that they are usually in conflict with each other. To solve the multi-objective optimization problem means to find an optimal solution x* which would minimize the values of all the components of the objective function.

Fig. 2.2 Example of Schaffer's function.

As we have mentioned in the previous chapter, in many optimization problems there are usually some restrictions imposed by the particular characteristics of the environment or the resources available. For example, in the optimal control of molecules using laser pulses, the energy of the laser pulse or the minimum possible pulse duration is bounded. And as we will see, the optimal solution will depend on this bounding value. Such restrictions must be satisfied in order to consider a certain solution acceptable. These constraints can be expressed in the form of mathematical inequalities, G_i(x) ≤ 0, i = 1, ..., m. If the number of constraints exceeds the number of free parameters, m > n, the problem is said to be over-constrained, since there are no degrees of freedom left for optimization.
2.3.1 Pareto front

As we have seen, a multi-objective optimization problem usually has no unique, perfect (or "Utopian") solution. However, one can introduce a set of nondominated, alternative solutions, known as the Pareto-optimal set, named after the brilliant Italian economist, sociologist and philosopher Vilfredo Pareto (1848-1923). Pareto-optimal solutions are also called efficient, non-dominated, and non-inferior solutions. We say that x* is Pareto optimal if there exists no feasible vector x which would decrease some criterion without causing a simultaneous increase in at least one other criterion. The Pareto optimum almost always gives multiple solutions, called noninferior or non-dominated solutions. In other words, for problems having more than one objective function (for example, F_j, j = 1, 2, ..., M and M > 1), any two solutions x_1 and x_2 (having P decision variables each) can realize one of two possibilities: one dominates the other, or neither dominates the other. A solution x_1 is said to dominate the other solution x_2 if both of the following conditions are true [Deb (1999)]:

1. The solution x_1 is no worse than x_2 in all objectives.
2. The solution x_1 is strictly better than x_2 in at least one objective.

The set of all such solutions which are non-dominated constitutes the Pareto front. These solutions lie on the boundary of the design region, or on the locus of the tangent points of the objective functions. In general, it is not easy to find an analytical expression for the line or surface that contains these points, and the normal procedure is to compute the points belonging to the Pareto-optimal front and their corresponding function values for each of the objectives. When we have a sufficient number of these, we may proceed to take the final decision. More formally, one can introduce the following definition of Pareto optimality. For an arbitrary minimization problem, dominance is defined as follows:

Pareto dominance: A vector u = (u_1, ..., u_n) is said to dominate v = (v_1, ..., v_n) if and only if u is partially less than v, i.e., u_i ≤ v_i for all i ∈ {1, ..., n}, and there exists i ∈ {1, ..., n} such that u_i < v_i.
Pareto optimality: A solution x_u ∈ U is said to be Pareto-optimal if and only if there is no x_v ∈ U for which v = f(x_v) = (v_1, ..., v_n) dominates u = f(x_u) = (u_1, ..., u_n). Pareto optimality can be illustrated graphically (see Fig.2.3) by considering the set of all feasible objective values, i.e., the set of all points in the objective space that correspond to feasible values of the degrees of freedom.

Fig. 2.3 Example of the Pareto front.
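The dominance test translates directly into code. The sketch below (Python; the brute-force filter is only meant for small point sets) extracts the nondominated points from samples of Schaffer's problem, written here as minimization of (f_1, f_2) = (x², (x − 1)²):

```python
import numpy as np

def dominates(u, v):
    # u dominates v (minimization): no worse in every objective
    # and strictly better in at least one
    return (all(ui <= vi for ui, vi in zip(u, v))
            and any(ui < vi for ui, vi in zip(u, v)))

xs = np.linspace(-1.0, 2.0, 301)
objs = [(x ** 2, (x - 1.0) ** 2) for x in xs]

# brute-force Pareto filter: keep points dominated by no other point
pareto = [x for x, u in zip(xs, objs)
          if not any(dominates(v, u) for v in objs if v != u)]
print(min(pareto), max(pareto))   # the Pareto set is the segment [0, 1]
```

Every x between 0 and 1 trades one squared distance against the other, so none of these points dominates another; everything outside [0, 1] is dominated.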
2.3.2 The weighted-sum method
In order to characterize the "quality" of a certain solution, we need to have some criteria to evaluate it. These criteria are expressed as functions of the decision variables, called objective functions. In our case, some of them will be in conflict with others, and some will have to be minimized while others are maximized. These objective functions may be commensurable (measured in the same units) or noncommensurable (measured in different units). In general, the objective functions with which we deal in engineering optimization are noncommensurable. We will designate the objective functions as f_1(x), f_2(x), ..., f_n(x). Therefore, our objective functions will form a vector function f(x) defined by:
f(x) = (f_1(x), f_2(x), ..., f_n(x)).   (2.7)
The easiest and perhaps most widely used method to handle a multiobjective optimization problem is the weighted-sum approach. Weighting coefficients r_i are real values which express the relative "importance" of the objectives and control their involvement in the cost functional. In this approach the cost function is formulated as a weighted sum of the objectives:

F(x) = Σ_{i=1}^{N} r_i f_i(x).   (2.8)

However, this method has its disadvantages; for example, the weighted-sum approach can be particularly sensitive to the setting of the weights, depending on the problem. An alternative way to determine the Pareto front and solve a multiobjective optimization problem is to use a multiobjective genetic algorithm [Zitzler (1999)]. In conclusion, multi-objective optimization can be identified as one of the most challenging optimization problems.
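For the two-objective Schaffer problem above, the weighted-sum scalarization can be sketched as follows (Python; the grid minimizer is purely illustrative). Minimizing F(x) = r_1 x² + r_2 (x − 1)² with r_1 + r_2 = 1 has the closed-form minimizer x* = r_2 = 1 − r_1, so sweeping the weights traces out the Pareto set [0, 1]:

```python
import numpy as np

xs = np.linspace(-1.0, 2.0, 3001)

def weighted_min(r1, r2):
    # minimize F(x) = r1*f1(x) + r2*f2(x), Eq. (2.8) with two objectives
    F = r1 * xs ** 2 + r2 * (xs - 1.0) ** 2
    return xs[np.argmin(F)]

for r1 in (0.0, 0.25, 0.5, 0.75, 1.0):
    x_star = weighted_min(r1, 1.0 - r1)
    print(f"r1 = {r1:.2f}  ->  x* = {x_star:.3f}")   # x* = 1 - r1
```

This also shows the sensitivity mentioned above: the location of the returned solution depends entirely on the chosen weights.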
2.4 Simplex method
In 1965 Nelder and Mead [Nelder (1965)] developed a simplex method for function minimization, which rapidly became one of the most popular methods of optimization. The simplex method was based on a numerical optimization algorithm developed earlier by Spendley, Hext, and Himsworth [Spendley (1962)]. A simplex is a convex hull of N + 1 points in the N-dimensional search space. It is assumed that the points satisfy the non-degeneracy condition (i.e. the volume of the hull is not zero). In every iteration of the algorithm, the vertices x_1, x_2, ..., x_{N+1} of the current simplex are arranged in ascending order according to their objective function values (we consider a minimization problem):

f(x_1) ≤ f(x_2) ≤ ... ≤ f(x_{N+1}).   (2.9)
Fig. 2.4 Reflection scheme for the Simplex method.
We refer to x_1 as the best point and x_{N+1} as the worst point. Let us introduce the mean of all points except the worst one:

⟨x⟩ = (1/N) Σ_{i=1}^{N} x_i.   (2.10)

The simplex method attempts to replace the current worst vertex by a new one that is generated by the following three operations: 1) "reflection", 2) "expansion", 3) "contraction". Only in the case these fail, a 4) "shrink" step is carried out. Here we are going to describe the operations in more detail. Let us first try to construct a reflection point as follows:

x_reflect = ⟨x⟩ + α(⟨x⟩ − x_{N+1}).   (2.11)

If f(x_1) ≤ f(x_reflect) < f(x_{N+1}), i.e. if the reflected point improves on the worst point but is not better than the best point so far, then we replace x_{N+1} by x_reflect and the iteration is terminated. If f(x_reflect) < f(x_1), i.e. if the reflected point is better than the best point so far, then we create an expansion point

x_expand = ⟨x⟩ + γ(x_reflect − ⟨x⟩).   (2.12)

Then we replace the worst point x_{N+1} by the better of x_reflect and x_expand, and the iteration is terminated.

Fig. 2.5 Expansion scheme for the Simplex method.

In the case f(x_reflect) ≥ f(x_N), i.e. if the reflected point would still be the worst if it replaced the worst point so far, then a contraction is attempted. Depending on whether the reflected point is better or worse than the worst point so far, two types of contraction are possible:

(1) If f(x_reflect) < f(x_{N+1}), then an outside contraction point

x_contract = ⟨x⟩ + β(x_reflect − ⟨x⟩).   (2.13)

(2) If f(x_reflect) ≥ f(x_{N+1}), then an inside contraction point

x_contract = ⟨x⟩ + β(x_{N+1} − ⟨x⟩).   (2.14)
Fig. 2.6 Contraction scheme for the Simplex method.

Fig. 2.7 Shrink scheme for the Simplex method.
In either case, if f(x_contract) < min(f(x_{N+1}), f(x_reflect)), then the worst point x_{N+1} is replaced by x_contract and the iteration is terminated. If all of the above have failed to generate a point that is better than the second worst, then all the vertices x_i but the best are replaced by new points:

x'_i = x_1 + δ(x_i − x_1),  i = 2, ..., N + 1.   (2.15)

According to Nelder and Mead, the purpose of these operations is that "the simplex adapts itself to the local landscape, elongating down inclined planes, changing direction on encountering a valley at an angle, and contracting in the neighborhood of a minimum". Clearly, depending on the quality of the new points that are generated, the method requires either 1, 2, or N + 2 objective function evaluations per time step. The almost universally recommended values for the parameters are α = 1, β = δ = 0.5, and γ = 2. We illustrate the "reflection", "expansion", "contraction" and "shrink" steps in the case of two dimensions (see Figs.2.4-2.7). While still being a rather popular optimization method, the simplex search is known to be frequently ineffective; in particular, when searching over multi-modal landscapes this method can easily be trapped in a local minimum. There are other methods, which we are going to discuss in the next sections, that are in general more suitable for global nonlinear optimization. They are also more robust than the simplex method. The methods which we are going to use further in this book are simulated annealing and the genetic algorithm.
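The scheme above can be sketched compactly as follows (Python; the axis-aligned initial simplex and the fixed iteration budget are simplifying assumptions, not part of the original formulation):

```python
import numpy as np

def nelder_mead(f, x0, alpha=1.0, beta=0.5, gamma=2.0, delta=0.5,
                max_iter=500):
    n = len(x0)
    # initial non-degenerate simplex: x0 plus a unit step along each axis
    simplex = [np.asarray(x0, dtype=float)]
    for i in range(n):
        v = np.asarray(x0, dtype=float)
        v[i] += 1.0
        simplex.append(v)
    for _ in range(max_iter):
        simplex.sort(key=f)                       # ascending: best first
        best, worst = simplex[0], simplex[-1]
        centroid = np.mean(simplex[:-1], axis=0)  # mean of all but worst
        xr = centroid + alpha * (centroid - worst)          # reflection
        if f(best) <= f(xr) < f(simplex[-2]):
            simplex[-1] = xr
        elif f(xr) < f(best):                               # expansion
            xe = centroid + gamma * (xr - centroid)
            simplex[-1] = xe if f(xe) < f(xr) else xr
        else:                                               # contraction
            if f(xr) < f(worst):
                xc = centroid + beta * (xr - centroid)      # outside
            else:
                xc = centroid + beta * (worst - centroid)   # inside
            if f(xc) < min(f(worst), f(xr)):
                simplex[-1] = xc
            else:                                           # shrink
                simplex = [best] + [best + delta * (v - best)
                                    for v in simplex[1:]]
    simplex.sort(key=f)
    return simplex[0]

sphere = lambda x: float(np.sum((x - 1.0) ** 2))
xmin = nelder_mead(sphere, [3.0, -2.0])
print(xmin)   # close to [1, 1]
```

On a smooth unimodal function like this quadratic the method converges quickly; on a multi-modal landscape the same code would simply settle into whichever local minimum the simplex first surrounds.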
2.5 Simulated annealing: "crystallizing" solutions
Stochastic optimization techniques are optimization algorithms that utilize some random elements. Unlike deterministic search algorithms, which locate optima by systematically searching the solution space using gradient information (like Newton's gradient method), stochastic methods apply a degree of randomness to the decision-making process. The random nature of stochastic algorithms has discouraged their use in some applications, especially in the area of trajectory optimization, like the Travelling Salesman Problem (TSP), where deterministic algorithms work better. However, the probabilistic elements in stochastic algorithms provide them with a capability not possessed by deterministic methods: the ability to escape from a local minimum, and stability in the presence of noise. Real engineering and scientific optimization problems are often characterized by a highly multimodal search space containing numerous local optima. Under these conditions, deterministic rule-based algorithms have difficulty while stochastic methods persevere. In the next pages we are going to discuss some popular stochastic methods. These methods, like simulated annealing or the genetic algorithm, are inspired by natural systems in physics and biology. In simulated annealing, the process by which atoms in molten metal cool and crystallize into their solid state is modelled. Genetic algorithms implement the Darwinian concept of natural selection to evolve a randomly generated set of mediocre initial guesses into a population of optimal solutions.

Simulated annealing is a technique first developed by Scott Kirkpatrick et al. [Kirkpatrick (1983)]. Simulated annealing can be seen as a generalization of a Monte Carlo method for examining the equations of state and frozen states of n-body systems. At the heart of the simulation lies the Metropolis algorithm [Metropolis (1953)], which models the behavior of large systems of particles in equilibrium at a given temperature. This component is fully responsible for the stochastic nature of the algorithm and gives the algorithm its capability to avoid local minima. The simulated annealing algorithm combines the Metropolis algorithm with a temperature schedule in an attempt to simulate the physical behavior of atoms as they cool from their liquid state into a solid state with minimum energy. The simulated annealing algorithm derives its name because it emulates the metallurgical process, in which the metal reaches its most stable, minimum energy crystalline structure. The objective of the annealing process therefore is to cool the metal so slowly that all of its atoms align in the same direction, thus achieving a perfectly ordered crystal and the lowest possible energy state. The crucial link between this simulation and the optimization process is the analogy drawn between the energy of a given configuration and the value of the objective function. The simulation of annealing is conducted by extending the Metropolis algorithm, which essentially consists of random perturbations of the system's parameters to minimize the total energy of the system; if the change in the total energy dE (or objective function) is positive, the perturbation is accepted with a probability given by the Boltzmann factor exp(−dE/T). The atomic structure, i.e. the set of problem parameters, is first initialized to a randomly determined state.
The Metropolis algorithm is then run at the initial user-supplied simulated temperature for a number of iterations large enough to consider the system near thermal equilibrium. The system temperature is then decreased following a user-defined temperature schedule, and the Metropolis simulation is conducted at the new temperature. These temperature reductions continue until the simulated temperature of the system has reached absolute zero. Nowadays simulated annealing is one of the most highly regarded, well-understood and widely applied stochastic algorithms. Simulated annealing is able to search for a global optimum and, under certain constraints, converges to a globally optimal solution with probability 1. However, this requires that an exponential number of trials be performed, even to approximate an optimal solution arbitrarily closely, and for some problems (e.g. the TSP) it requires more computation than a complete enumeration of the search space! As can be seen from the previous discussion, the conditions which must be in place for the simulated annealing algorithm to ensure an optimal solution greatly constrain the number of problems for which convergence can be guaranteed. While convergence of the algorithm to an optimal solution can statistically be guaranteed in the theoretical realm, application of the algorithm to real-world problems brings to light the impracticality of many of the algorithm's criteria for convergence. Although claims of guaranteed convergence might be somewhat inflated, simulated annealing remains a powerful optimization algorithm for multi-dimensional and multi-modal problems due to the probabilistic hill-climbing capability it possesses. Simulated annealing is thus not a panacea, but rather a powerful alternative in the family of stochastic optimization algorithms.
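A minimal version of the procedure just described can be sketched as follows (Python; the one-dimensional test energy, the geometric cooling schedule, and all numerical settings are illustrative choices):

```python
import math
import random

def f(x):
    # multimodal test "energy": local minima near every integer x,
    # global minimum f = 0 at x = 0
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))

random.seed(0)
x = 5.0                       # initial "atomic configuration"
T = 10.0                      # initial temperature
best_x, best_e = x, f(x)

for step in range(5000):
    x_new = x + random.uniform(-1.0, 1.0)     # random perturbation
    dE = f(x_new) - f(x)
    # Metropolis rule: always accept downhill moves; accept uphill
    # moves with Boltzmann probability exp(-dE/T)
    if dE <= 0.0 or random.random() < math.exp(-dE / T):
        x = x_new
    if f(x) < best_e:
        best_x, best_e = x, f(x)
    T *= 0.999                # geometric cooling schedule

print(best_x, best_e)
```

At high temperature the walk crosses the barriers between the local minima almost freely; as T shrinks the acceptance of uphill moves dies out and the state "freezes" into a low-lying basin, which is exactly the hill-climbing capability discussed above.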
2.6 Introduction to genetic algorithms
Now let us discuss the genetic algorithm, which we are going to use as a tool in the following chapters. Genetic algorithms became widely known after being formally introduced in the 1970s by John Holland at the University of Michigan [Holland (1975); Holland (1978)]. Genetic algorithms in fact belong to an interdisciplinary research field with strong connections to biology, artificial intelligence, and decision support in almost any engineering discipline. The genetic algorithm is based on a model of natural, biological evolution, which was formulated for the first time by the great British naturalist Charles Darwin (1809-1882) in 1859 in his book "The Origin of Species". The Darwinian theory of evolution explains the adaptive change of species by the principle of natural selection, which favors for survival and further evolution those species that are best adapted to their environmental conditions. The genetic algorithm is a subset of the class of stochastic search methods (in the previous section we discussed another stochastic search method, simulated annealing). Whereas most stochastic search methods operate on a single solution to the problem at hand, the genetic algorithm operates on a population of solutions. In addition, the genetic algorithm exhibits a large degree of parallelism, making it possible to effectively exploit the computing power made available through parallel processing. To use genetic algorithms, one must represent a solution to the problem as a genome (or chromosome). The genetic algorithm then creates an initial population of solutions and applies genetic operators such as mutation and crossover [Sutton (1994)] to evolve the solutions in order to find the best one(s). Let us outline some of the basics of genetic algorithms.

Fig. 2.8 A general scheme for genetic algorithm.

The three most important aspects of using genetic algorithms are:
1. definition of the objective (fitness) function,
2. definition and implementation of the genetic representation,
3. definition and implementation of the genetic operators.
Once these three steps have been performed, the genetic algorithm should work well. Beyond that, one can try many different variations to improve performance, find multiple optima, or parallelize the algorithm.
Fig. 2.9 Mutation operation on a bit string.

Fig. 2.10 Crossover operation between two bit strings.
The genetic algorithm is very simple, yet it performs well on many different types of problems because of its flexibility. There are many ways to modify the basic algorithm, and many parameters that can be adjusted. Basically, if one gets the objective function right, the representation right and the operators right, then variations on the genetic algorithm and its parameters will result in only minor improvements. Since the algorithm is separated from the representation of the problem, searches of mixed continuous/discrete variables are just as easy as searches of entirely discrete or entirely continuous variables. One can use different representations for the individual genomes in the genetic algorithm. Holland worked primarily with strings of bits [Holland (1975)], but one can use arrays, trees, lists, or any other objects. However, one must define genetic operators (initialization, mutation, crossover, copy (reproduction)) for any representation that one decides to use. One has to remember that each individual must represent a complete solution to the problem one is trying to optimize. Let us now discuss the main genetic operators: copy, crossover and mutation. After the seminal work of Holland, the most common representation for the individual genomes in the genetic algorithm is a string of bits. The reason is that the definition of the genetic operators in this case is very simple. That is why we are going to explain the definition of the genetic operators using the bit representation first. The disadvantage of the bit string representation is that some of the bits have exponentially bigger weights than others. The result is that a random flip of a high order bit could change the solution dramatically, and place the "offspring" far away from its "parent", with a poor probability of improving the fitness value. Another possible alternative is to use a bit string, but employ a Gray code, an ordering of 2^n binary numbers such that only one bit changes from one entry to the next. In this case a small perturbation (mutation) of the higher order bits will not change the initial number dramatically. However, Gray codes for 4 or more bits are not unique, even allowing for permutation or inversion of bits, and one also needs an algorithm for coding and decoding.
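The standard binary-reflected Gray code has a particularly compact pair of coding and decoding rules, sketched here (Python):

```python
def to_gray(n):
    # binary-reflected Gray code of a non-negative integer n
    return n ^ (n >> 1)

def from_gray(g):
    # decoding is a prefix XOR over the shifted code
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

codes = [to_gray(n) for n in range(16)]
# adjacent codes differ in exactly one bit
diffs = [bin(a ^ b).count("1") for a, b in zip(codes, codes[1:])]
print(diffs)    # [1, 1, ..., 1]
assert all(from_gray(c) == n for n, c in enumerate(codes))
```

Because consecutive integers differ in only one Gray bit, a single-bit mutation of a Gray-coded genome usually produces a nearby number, softening the high-order-bit problem described above.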
However, the Gray codes for 4 or more bits are not unique, even allowing for permutation or inversion of bits, and also need some algorithm for coding and decoding. The copy or reproduction operator merely transfers the information of the "parent" to an individual of the next generation without any changes. The mutation operator introduces a certain amount of randomness to the search. It can help the search find solutions that crossover alone might not encounter. Usually the mutation represents the application of the logical "NOT" operation to a single bit of a "gene" at a random position, see Fig.2.9. Typically the crossover operation is defined so that two individuals (the
48
Optimal control and forecasting
of complex dynamical
systems
parents) combine to produce two more individuals ("the children"). The primary purpose of the crossover operator is to get "genetic material" from the previous generation to the subsequent generation. In a simple crossover, a random position is chosen at which each partner in a particular pair is divided into two pieces. Each "parent" then exchanges a subsection of itself with its partner (see Fig.2.10). Note, that application of the crossover operation between identical "parents" leads to the same "children". There are different implementations of the schedule of the genetic operations. Two of the most common genetic algorithm implementations are "simple" and "steady state". The simple genetic algorithm is described by Goldberg [Goldberg (1989)]. It is a generational algorithm in which the entire population is replaced each generation. In the steady state genetic algorithm only a few individuals are replaced each "generation". This type of replacement is often referred to as "overlapping populations". Often the objective scores must be transformed in order to help the genetic algorithm maintain diversity or differentiate between very similar individuals. The transformation from raw objective scores to scaled fitness scores is called scaling. There are many different scaling algorithms. Some of the most common are linear (fitness proportionate) scaling, sigma truncation scaling, and sharing. Linear scaling transforms the objective score based on a linear relationship using the maximum and minimum scores in the population as the transformation metric. Sigma truncation scaling uses the population's standard deviation to do a similar transformation, but it truncates to zero the poor performers. Sharing derates the score of individuals that are similar to other individuals in the population. For a complete description of each of these methods, see Goldberg's book [Goldberg (1989)]. The selection method determines how individuals are chosen for mating. 
If one uses a selection method that picks only the best individual, then the population will quickly converge to that individual. So the selector should be biased toward better individuals, but should also pick some that are not quite as good (but hopefully contain some good "genetic material"). Some of the most common methods include "roulette wheel selection" (the likelihood of picking an individual is proportional to the individual's score), "tournament selection" (a number of individuals are picked using roulette wheel selection, then the best of these is chosen for mating), and "rank selection" (pick the best individuals every time). Sometimes the crossover operator and selection method lead to a fast convergence to a population of individuals that are almost exactly the same. When the population consists of similar individuals, the likelihood of finding new solutions typically decreases. On the one hand, it is desired that the genetic algorithm finds "good" individuals, but on the other hand it needs to maintain diversity. In general, the genetic algorithm is much more robust than other search methods in the case of a noisy environment and/or if the search space has many local optima. The genetic search method has recently been applied to a great variety of optimization problems in science, for example, to optimize the atomic structures of small clusters [Judson (1992); Deaven (1995); Michaelian (1998); Garzon (1998)]. In these works the global minimum of the cohesive energy was obtained for different cluster species using Lennard-Jones potentials [Judson (1992)], ionic potentials [Michaelian (1998)], or interaction potentials derived from the tight-binding Hamiltonian [Deaven (1995); Garzon (1998)]. Especially successful applications of the genetic algorithm were performed in optimal control theory [Vajda (2001)]. As we have mentioned, the genetic algorithm can be used for both discrete and continuous optimization. However, if one is going to apply a "classical" version of the genetic algorithm to search for an optimal continuous and differentiable function, one has to take additional care about the smoothness and differentiability of the obtained solutions. The direct application of the mutation and crossover rules leads to the generation of "children" with discontinuities at the positions of the crossover or mutation operations, and therefore such solutions do not belong to the class of our interest. In the next section we present an extension of the genetic algorithm to search for optimal solutions in the class of smooth functions. As applications of this new technique we will consider the search for a ground state wavefunction for a given system's Hamiltonian, and an optimal shape of the electric field to control a nanoscale device.
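Before moving on, the elementary bit-string operators of Figs.2.9 and 2.10 can be sketched in a few lines (Python; bit strings are represented as lists of 0/1, and the per-bit mutation rate is an illustrative choice):

```python
import random

def mutate(bits, rate=0.05):
    # logical "NOT" applied to each bit independently with probability `rate`
    return [b ^ 1 if random.random() < rate else b for b in bits]

def crossover(p1, p2):
    # one-point crossover: the parents exchange tails after a random cut
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

random.seed(3)
p1 = [1, 1, 0, 1, 0, 0, 1, 0]
p2 = [0, 0, 0, 0, 0, 1, 0, 1]
c1, c2 = crossover(p1, p2)
print(c1, c2)

# crossover between identical parents reproduces the parents unchanged
assert crossover(p1, p1) == (p1, p1)
```

Note that at every bit position the two children together carry exactly the bits the two parents had there: crossover only recombines existing "genetic material", while mutation is the sole source of new bits.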
2.7 GA for a class of smooth (differentiable) functions
The following extension of the GA was originally developed to demonstrate how to obtain ground state wavefunctions of a quantum system confined in an external potential. Although this type of optimization problem belongs to the class of quadratic optimization, and the GA is definitely not the fastest way to solve such problems, it can easily be applied to optimal control problems to obtain realistic solutions (continuous and with a limited absolute value of the local gradient, which corresponds to a finite time resolution of the control field). In the case of a few-body interacting quantum system, when it is treated under the mean field approximation (using Hartree-Fock or Density Functional theory), the corresponding minimization problem becomes nonlinear and the GA can serve better, because it usually avoids local minima more easily than gradient-based methods. Since this method was originally applied to quantum systems, we called it the Quantum Genetic Algorithm (QGA). Let us start with the description of a quantum system. Let H be the Hermitian Hamiltonian operator of an N-body quantum mechanical system:

H = H_kin + H_pot + H_int,   (2.16)

where (throughout the section we use atomic units ħ = m = e = 1)

H_kin = −(1/2) Σ_{i=1}^{N} ∇_i²,
H_pot = Σ_{i=1}^{N} U(x_i),   (2.17)
H_int = Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} V(x_i − x_j).

The operators H_kin, H_pot, H_int refer to the kinetic, potential and interaction energy. Let us first consider the quantum mechanical ground state problem for a system described by the Hamiltonian Eq.(2.16). Let Ψ(x_1, x_2, ..., x_N) be an arbitrary N-body wavefunction. We assume that Ψ is normalized, ⟨Ψ|Ψ⟩ = 1. One can write an inequality for the ground state energy E_0 in this case:

E_0 ≤ ⟨Ψ|H|Ψ⟩.   (2.18)
Starting with a population of trial wavefunctions one can run the evolutionary procedure (GA) until the global minimum of the energy functional given by Eq.(2.18) is attained. For simplicity let us first consider the ground state problem for one particle in one dimension. In our approach a wavefunction Ψ(x) is discretized on the mesh {x_i} in real space, i = 1, ..., L, where L is the number of discretization points, and is represented by the "genetic code" vector Ψ_i = Ψ(x_i) (see Fig.2.11).
Fig. 2.11 Representation of the wavefunction Ψ(x), discretized in real space, as a genetic code vector.
As we have mentioned before, there are different ways to describe the evolution of the population and the creation of the offspring. The genetic algorithm we propose to obtain the ground state of a quantum system can be described as follows:

(i) We create a random initial population {Ψ_j^(0)(x)} consisting of N_pop trial wave functions.
(ii) The fitness E[Ψ_j^(0)] of all individuals is determined.
(iii) A new population {Ψ_j^(1)(x)} is created through application of the genetic operators.
(iv) The fitness of the new generation is evaluated, and the new generation replaces the old one.
(v) Steps (iii) and (iv) are repeated for the successive generations {Ψ_j^(s)(x)} until convergence is achieved and the ground-state wave function is found.
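The fitness in step (ii) is the discretized energy functional of Eq.(2.18). A sketch (Python; finite differences on a box [0, 1] with Ψ(0) = Ψ(1) = 0, and the infinite square well test case is an illustrative choice):

```python
import numpy as np

L = 2001                        # number of mesh points
x = np.linspace(0.0, 1.0, L)
h = x[1] - x[0]

def energy(psi, U):
    # E[psi] = <psi|H|psi>/<psi|psi> with H = -1/2 d^2/dx^2 + U(x)
    # (atomic units), using a forward-difference gradient
    dpsi = np.diff(psi) / h
    kinetic = 0.5 * np.sum(dpsi ** 2) * h
    potential = np.sum(U * psi ** 2) * h
    norm = np.sum(psi ** 2) * h
    return (kinetic + potential) / norm

# infinite square well (U = 0 inside the box): ground state is sin(pi x)
psi = np.sin(np.pi * x)         # satisfies psi(0) = psi(1) = 0
E = energy(psi, np.zeros(L))
print(E, np.pi ** 2 / 2)        # exact ground state energy is pi^2/2
```

Any other trial function respecting the boundary conditions, e.g. x(1 − x), yields a strictly higher value of this functional, in agreement with the variational inequality Eq.(2.18); the GA simply drives the population toward the minimizer.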
Fig. 2.12 Two randomly chosen wavefunctions for the crossover operation. The vertical dashed line shows the position of the crossover.

Fig. 2.13 Result of the direct application of the "classical" crossover operation. Note the discontinuity of the function at the position of the crossover operation, which leads to an extremely high kinetic energy of the "offspring".
Usually, the real space calculations deal with boundary conditions on a
box. Therefore, in order to describe a wave function within a given interval a < x < b in one dimension, we have to choose boundary conditions for Ψ(a) and Ψ(b). For simplicity we set Ψ(a) = Ψ(b) = 0.
Fig. 2.15 Result of the application of the "smooth" crossover operation. The vertical dashed line shows the position of the crossover operation.
As we mentioned above, we should define three kinds of operations on the individuals: the copy, the mutation of a wavefunction, and the crossover between two wavefunctions (see Fig.2.12). While the copy operation has the same meaning as in previous applications of the GA, the crossover and the mutation operations have to be redefined for the quantum mechanical case. The reason is that after a straightforward application of the crossover operation between two "parents" one unavoidably obtains "children" with a discontinuity at the position of the crossover. It means that the "offsprings" have infinitely (in practice, very) large kinetic energy and, therefore, cannot be considered as good candidates for the ground state wavefunction (see Fig.2.13). To avoid this problem we suggested a new modification of the genetic operations applicable to smooth and differentiable wavefunctions. The smooth or "uncertain" crossover is defined as follows. Let us take two randomly chosen "parent" functions Ψ_1^(s)(x) and Ψ_2^(s)(x) (see Fig.2.12). We can construct two new functions Ψ_1^(s+1)(x), Ψ_2^(s+1)(x) as

Ψ_1^(s+1)(x) = Ψ_1^(s)(x) St(x) + Ψ_2^(s)(x) (1 − St(x)),
Ψ_2^(s+1)(x) = Ψ_2^(s)(x) St(x) + Ψ_1^(s)(x) (1 − St(x)),    (2.20)

where St(x) = (1 + tanh((x − x_0)/k_c))/2 is a smooth step function centered at a randomly chosen crossover position x_0, and k_c is a parameter which controls the sharpness of the crossover. In the limit k_c → 0 one obtains the usual Heaviside step function St(x) = θ(x − x_0), and the transformation Eq.(2.20) becomes the "classical" crossover operation. Note that the crossover operation does not violate the boundary conditions, and application of the crossover between identical wavefunctions generates the same wavefunctions. The mutation operation in the quantum case must also take care of the smoothness of the generated "offsprings". In the "classical" GA it is not possible to randomly change the value of the wave function at a given point without producing dramatic changes in the kinetic energy of the state. To avoid this problem we define a new mutation operation as

Ψ^(s+1)(x) = Ψ^(s)(x) + Ψ_r(x),    (2.21)
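The smooth crossover and mutation can be sketched on a real-space mesh as follows. Here the smooth step is taken as St(x) = (1 + tanh((x − x_0)/k_c))/2, the 1D analogue of the 2D step defined later in the text; the particular values of x_0, k_c and the mutation amplitude are illustrative assumptions.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 201)            # mesh on the interval [a, b] = [0, 1]

def smooth_step(x, x0, kc):
    """St(x); in the limit kc -> 0 it tends to the Heaviside function."""
    return 0.5 * (1.0 + np.tanh((x - x0) / kc))

def smooth_crossover(p1, p2, x0=0.5, kc=0.05):
    """Smooth ("uncertain") crossover of two parent wavefunctions."""
    st = smooth_step(x, x0, kc)
    return p1 * st + p2 * (1.0 - st), p2 * st + p1 * (1.0 - st)

def smooth_mutation(psi, rng):
    """Additive mutation; the x(1-x) factor preserves psi(a) = psi(b) = 0."""
    xr = rng.uniform(0.0, 1.0)            # random center of the bump
    km = rng.uniform(0.01, 1.0)           # random width
    B = rng.normal(0.0, 0.1)              # small amplitude of either sign
    return psi + B * np.exp(-(xr - x)**2 / km) * x * (1.0 - x)

rng = np.random.default_rng(1)
p1, p2 = np.sin(np.pi * x), np.sin(2.0 * np.pi * x)
c1, c2 = smooth_crossover(p1, p2)
m = smooth_mutation(p1, rng)
```

Note that crossover between identical parents returns the parents unchanged, and both operators preserve the boundary values, exactly as required in the text.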
where Ψ_r(x) is a random mutation function. For simplicity we choose Ψ_r(x) as a Gaussian-like function Ψ_r(x) = B exp(−(x_r − x)²/k_m) (x − a)(b − x) with a random center x_r ∈ (a, b), a random width k_m ∈ (0, b − a), and a small amplitude B that can be either positive or negative. Note that the mutation so defined also does not violate the boundary conditions. In order to find the ground state, at each step of the QGA iteration we randomly perform copy, crossover and mutation operations. After each application of a genetic operation (except copying) the newly created functions are normalized. It is straightforward to extend the quantum genetic algorithm to treat quantum systems of a few interacting particles in two dimensions. We perform calculations on a finite region Ω = {(x,y): 0 ≤ x ≤ d, 0 ≤ y ≤ d}, where we discretize the real space. We assume again that the external potential outside Ω is infinitely high. For a simplified study let us consider a two-particle system described by a wave function Ψ_HF(r_1, r_2), with r = (x,y), which is the Slater determinant built from orthogonal and normalized one-particle wave functions ψ_ν(r), ν = 1,2. This means that the optimized Ψ_HF will represent the exact ground-state wave function for the case of noninteracting particles, whereas for the interacting case Ψ_HF will correspond to the Hartree-Fock
approximation to the ground-state wave function. As in the one-dimensional case, an initial population of trial two-body wave functions {Ψ_i}, i = 1,..,N_pop, is chosen randomly. For this purpose we construct each Ψ_i using Gaussian-like one-particle wave functions of the form

ψ_ν(x,y) = A_ν exp(−(x − x_ν)²/σ²_{X,ν} − (y − y_ν)²/σ²_{Y,ν}) x(d − x) y(d − y),    (2.22)
with ν = 1,2 and random values for x_ν, y_ν, σ_{X,ν}, and σ_{Y,ν}. The fitness of an individual Ψ_i is given by the expectation value of the energy,

E_i = ∫_Ω Ψ_i*(r_1, r_2) H Ψ_i(r_1, r_2) dr_1 dr_2,    (2.24)

where H is the Hamiltonian of the corresponding problem. This means that the expectation value of the energy for a given individual is a measure of its fitness, and we apply the QGA to minimize the energy. By virtue of the variational principle, when the QGA finds the global minimum, it corresponds to the ground state of H. Now we define the smooth crossover in two dimensions. Given two randomly chosen single-particle "parent" functions ψ_{i,μ}^(old)(x,y) and ψ_{l,ν}^(old)(x,y) (i,l = 1,..,N_pop; μ,ν = 1,2), one can construct two new functions ψ_{i,μ}^(new)(x,y) and ψ_{l,ν}^(new)(x,y) as
ψ_{i,μ}^(new)(x,y) = ψ_{i,μ}^(old)(x,y) St(x,y) + ψ_{l,ν}^(old)(x,y) (1 − St(x,y)),
ψ_{l,ν}^(new)(x,y) = ψ_{l,ν}^(old)(x,y) St(x,y) + ψ_{i,μ}^(old)(x,y) (1 − St(x,y)),    (2.25)
where St(x,y) is a 2D smooth step function which produces the crossover operation. We define St(x,y) = (1 + tanh((ax + by + c)/k_c))/2, where a, b, c are chosen randomly so that the line ax + by + c = 0 cuts Ω into two pieces. k_c is a parameter which allows one to control the sharpness of the crossover operation.
In the same manner we modify the mutation operation for a random "parent" ψ^(old)(x,y) as

ψ^(new)(x,y) = ψ^(old)(x,y) + ψ_r(x,y),    (2.26)
where ψ_r(x,y) is a random mutation function. We choose ψ_r(x,y) as a Gaussian-like function ψ_r(x,y) = A_r exp(−(x_r − x)²/R_x² − (y_r − y)²/R_y²) x(d − x) y(d − y) with random values for x_r, y_r, R_x, R_y, and A_r. Note that application of the mutation operation defined above does not violate the boundary conditions. As discussed before, at each iteration of the QGA procedure we randomly perform copy, crossover and mutation operations. After each application of a genetic operation the newly created functions should be normalized and orthogonalized. Then the fitness of the individuals is evaluated and the fittest individuals are selected. The procedure is repeated until convergence of the fitness function (the energy of the system) to a minimal value is reached. Inside the box Ω one can simulate different kinds of external potentials. If the size of the box is large enough, boundary effects become negligible. Concerning our choice of the GA parameters, for the following examples we have used P_m = 0.015 for the probability of a mutation and P_c = 0.485 for the probability of a crossover operation. In the remaining cases we perform the copy operation. During our calculations we set different sizes of the population, up to N_pop = 1000. However, a population size of only 200 "parents" usually guarantees good convergence of the algorithm.
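The normalization and orthogonalization step mentioned above can be performed with a standard Gram-Schmidt procedure on the mesh. The sketch below is generic; the two trial orbitals and the mesh parameters are made up for the example:

```python
import numpy as np

def orthonormalize(orbitals, dA):
    """Gram-Schmidt orthonormalization of discretized 2D orbitals.
    `orbitals` is a list of arrays psi_nu(x, y); dA is the mesh cell area."""
    out = []
    for psi in orbitals:
        for phi in out:                       # project out earlier orbitals
            psi = psi - np.sum(phi * psi) * dA * phi
        norm = np.sqrt(np.sum(psi**2) * dA)   # <psi|psi> on the mesh
        out.append(psi / norm)
    return out

# two overlapping Gaussian-like trial orbitals on a d x d box (d = 1)
n, d = 64, 1.0
xs, ys = np.meshgrid(np.linspace(0, d, n), np.linspace(0, d, n), indexing="ij")
dA = (d / (n - 1))**2
g1 = np.exp(-((xs - 0.4)**2 + (ys - 0.5)**2) / 0.02) * xs*(d - xs)*ys*(d - ys)
g2 = np.exp(-((xs - 0.6)**2 + (ys - 0.5)**2) / 0.02) * xs*(d - xs)*ys*(d - ys)
o1, o2 = orthonormalize([g1, g2], dA)
```

After the call, the orbitals are orthonormal with respect to the Riemann-sum inner product used throughout the sketch.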
2.8
Application of the GA to the eigenproblem
In this section we present results of ground and excited state calculations for interacting particles in a confined quantum system (such as a quantum dot) using the Quantum Genetic Algorithm. First we perform calculations of the ground state for different simple one- and two-dimensional systems and compare the obtained results with known analytical solutions. This serves as a good test for the method developed in this work. Then we compute the partition function and the excitation spectra of strongly interacting few-body systems. With the help of the QGA we investigate the formation of the "Wigner molecule" in systems of a few confined electrons. We also investigate two different mechanisms for the so-called "melting of the Wigner molecule", namely melting due to thermal and due to quantum fluctuations.
2.8.1 The ground state problem in one and two dimensions
With the purpose of testing the QGA, we first apply it to calculate the ground state wave function Ψ(x) for different external potentials in one and two dimensions. For each iteration of the QGA we evaluate the fitness function for the different individuals of the population, E_j = E[Ψ_j] = ⟨Ψ_j|H|Ψ_j⟩, and then follow the steps described above. This process is repeated until the values of the fitness function converge to the minimal value of the energy. In the figures presented below we show the results for the probability density of the ground state and the behavior of the fitness function during the iterative GA procedure. Let us start from the ground state problem for one particle captured in the region [0, L], i.e. in the infinite square well. The analytical solution gives the lowest energy state with energy E = π²/(2L²), corresponding to the ground state wavefunction

Ψ(x) = √(2/L) sin(πx/L).    (2.27)

In Fig.2.16 we show the calculated ground state particle density |Ψ(x)|² for a potential well with infinite walls at x = 0 and x = 1 (throughout this section we use atomic units ħ = e = m = 1). In the inset of Fig.2.16 we show the evolution of the mean energy of the population, defined using the calculated energies of all population members. It is clear that the QGA converges rapidly to the ground state. The ground-state energy calculated using our method is very close to the exact value E = π²/2 = 4.9348..., up to an error of 10⁻⁵ %, already after 20 iterations. We also performed calculations for other analytically solvable problems, namely the harmonic potential U(x) = ½ω²(x − 0.5)². In this case the ground state energy is E = ω/2, and the ground state wavefunction is given by Ψ(x) = (ω/π)^{1/4} exp(−ω(x − 0.5)²/2).
|ψ_1(x)|² (solid line) and |ψ_2(x)|² (dotted line) of two orbitals which build the first triplet-state wave function for two noninteracting electrons in a 1D harmonic potential (dashed line). The convergence behavior of the fitness function is shown in the inset.
In our first example we perform the evaluation of the ground state for an electron confined in the infinite well. The ground state is given by Ψ(x,y) = 2 sin(πx) sin(πy) (see Fig.2.21). We found an excellent agreement with the analytical solution.

Fig. 2.21 Ground state spatial density distribution of an electron, |Ψ(x,y)|², in an infinite well calculated using the QGA.

In our second example we perform the evaluation of the ground state for two noninteracting particles (in a triplet state) confined in the infinite well. The ground state of this system is degenerate, and the wave functions corresponding to the different degenerate states are antisymmetric. One possible solution is given by

Ψ(x_1,y_1,x_2,y_2) = 4[sin(πx_1) sin(πy_1) sin(2πx_2) sin(πy_2) − sin(πx_2) sin(πy_2) sin(2πx_1) sin(πy_1)].    (2.32)

The QGA procedure converges rapidly to a solution having the same symmetry as the function given by Eq.(2.32). In Fig.2.22 we show the ground state spatial density ρ_QGA(r) = ∫ |Ψ_QGA(r, r′)|² dr′ obtained from the QGA. The overall shape of the solution and its symmetry are in good agreement with the exact result. The calculated value of the ground energy (E_QGA = 34.543619) is also in very good agreement with the analytical result (E = 7π²/2 = 34.543615), the relative error being less than 10⁻⁵ %. In the next example we determine the ground state of two noninteracting particles (in a triplet state) in a 2D harmonic potential described by

U(x,y) = ½ω²((x − 0.5)² + (y − 0.5)²),    (2.33)
Fig. 2.22 Density distribution ρ_QGA(x,y) for the ground state of two noninteracting fermionic particles (triplet state) in a square infinite well.
using ω = 10². The analytical solution for one of the degenerate triplet states reads

Ψ(x_1,y_1,x_2,y_2) = (ω^{3/2}/π) exp(−(ω/2) Σ_{i=1,2} ((x_i − 0.5)² + (y_i − 0.5)²)) (x_2 − x_1).    (2.34)

In Fig.2.23 we present the ground state density ρ_QGA(x,y) for this problem. In this case there is also a good agreement between the result obtained from the QGA and the exact analytical solution. The calculated value of the ground-state energy is E_QGA = 300.0024, which compares well with the exact one (E = 3ω = 300).
Fig. 2.23 Density distribution ρ_QGA(x,y) for the ground state of two noninteracting fermionic particles (triplet state) in an external harmonic potential.
2.8.2 Extension of the QGA to quantum statistical problems
Now we discuss a possible generalization of the QGA to quantum statistical problems. In order to compute not only the ground state, but also excited states of a few-body quantum system, one needs only a small modification of the QGA. For this purpose we use a variational formulation for the partition function Z of a many-body quantum system. Let {Ψ_k} be an arbitrary orthonormal set of N-body wave functions (Ψ_k = Ψ_k(x_1,..,x_N)). Note that we count the eigenstates in such a way that Ψ_1 corresponds to the ground state. It can be shown that the partition function Z of the quantum system satisfies the following inequality [Peierls (1936)]:

Z ≥ Z′ = Σ_k exp(−β ⟨Ψ_k|H|Ψ_k⟩),    (2.35)
where the Hamiltonian H is defined by Eq.(2.16) and the dimensionless parameter β is proportional to the inverse temperature: β = 1/(k_B T), where k_B is the Boltzmann constant. The equality holds only in the case when {Ψ_k} is the complete set of eigenfunctions of the Hamiltonian H. In practice, for finite temperature calculations using Eq.(2.35), one can take into account the lowest M levels of the system, neglecting the occupation of the levels with higher energies. The number of considered levels M can be chosen in such a way that the occupation of the neglected levels does not exceed a certain value at a given temperature T. In the limit when the temperature goes to zero (T → 0, or β → +∞) Eq.(2.35) becomes equivalent to the variational principle for the ground state energy E_0:

E_0 ≤ ⟨Ψ_1|H|Ψ_1⟩.
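The zero-temperature limit of Eq.(2.35) is easy to verify numerically: the free-energy-like quantity −ln Z′/β, built from a few variational level energies, approaches the lowest of them as β grows. A toy sketch with made-up level energies:

```python
import numpy as np

def z_prime(energies, beta):
    """Z' = sum_k exp(-beta <Psi_k|H|Psi_k>): the lower bound of Eq.(2.35)."""
    return np.sum(np.exp(-beta * np.asarray(energies)))

levels = [4.93, 19.74, 44.41]             # made-up variational level energies
for beta in (0.1, 1.0, 10.0):
    f = -np.log(z_prime(levels, beta)) / beta
    print(beta, f)                        # f tends to the lowest level as beta grows
```

At small β all levels contribute to Z′, while at large β the sum is dominated by the lowest energy, recovering the ground-state variational principle.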
Fig. 2.28 Rescaled particle density ρ(x) of four interacting particles in an infinite well at zero temperature, for well widths L = 500 (dotted line), L = 100 (dashed line), and L = 1 (solid line).
the particle density ρ(x) for different sizes of the well, L = 500, 100, 1, at zero temperature. In order to visualize these results we rescale the calculated ρ(x) onto the interval [0,1]. From Fig.2.28 it becomes clear that with decreasing size of the system the electron density tends to be more delocalized, i.e. a kind of "Wigner molecule melting" takes place in the system. Note that it looks quite different from the "melting" due to thermal fluctuations.

2.9
Evolutionary gradient search and Lamarckianism
Now let us talk about a possible combination of global GA optimization with local search techniques. The evolutionary gradient search algorithm of Salomon [Salomon (1998)] is a search method that employs yet another, "evolutionary inspired" estimation of the local gradient of the objective function. The state of the evolutionary gradient search algorithm at time
t is described by a base point x and a step length h. In every iteration, λ offspring candidate solutions x + h z_i, i = 1,...,λ, are generated, where the z_i are random vectors with independent components distributed according to the Gaussian normal distribution. Rather than performing selection the way an evolution strategy does, the evolutionary gradient search method computes

d_h(x) = (1/λ) Σ_{i=1}^{λ} [f(x + h z_i) − f(x)] z_i    (2.43)
as an estimation of the local gradient of the function. The motivation for this step is the wish not to discard the information carried by the offspring that are rejected, but to interpret an offspring candidate solution with a negative fitness advantage over the best point as evidence that a step should be taken in the opposite direction. The method then proceeds by taking two test steps from the base point in the negative direction of d_h(x). One test step has length h√N k, the other one has length h√N/k, where usually k = 1.8. The base point is then updated by performing the test step with the higher (measured) fitness advantage, and the step length h is multiplied by k if the longer of the two test steps was more successful and divided by k if the shorter of the two test steps prevailed. Clearly, an iteration of the evolutionary gradient search procedure requires λ + 2 evaluations of the objective function. It is interesting to compare the direction given by the negative of Eq.(2.43), in which the evolutionary gradient search method proceeds, with that of other strategies. Salomon has shown that for small h and sufficiently large λ, the direction given by Eq.(2.43) agrees closely with the gradient direction at the point x. The evolutionary gradient search method assigns weights that are proportional to the fitness advantages of all offspring, with offspring with a negative fitness advantage receiving negative weights. While the evolutionary gradient search method has not yet been studied in great detail, it seems conceivable that the "genetic repair" effect may be present in evolutionary gradient searches. Another way to improve the search of a genetic algorithm is to move from Darwin's to Lamarck's formulation of evolution. Jean-Baptiste Pierre Antoine de Monet, Chevalier de Lamarck, was a famous French naturalist (1744-1829).
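Returning to the evolutionary gradient search for a moment, one iteration (the gradient estimate of Eq.(2.43) plus the two test steps) can be sketched on a simple quadratic test function. The values λ = 20 and k = 1.8, and the test function itself, are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def egs_step(f, x, h, lam=20, k=1.8):
    """One iteration: gradient estimate of Eq.(2.43), then two test steps."""
    n = len(x)
    z = rng.normal(size=(lam, n))                       # lam random offspring
    df = np.array([f(x + h * zi) for zi in z]) - f(x)
    d = np.mean(df[:, None] * z, axis=0)                # Eq.(2.43)
    e = d / np.linalg.norm(d)                           # unit ascent direction
    long_x = x - h * np.sqrt(n) * k * e                 # longer test step
    short_x = x - h * np.sqrt(n) / k * e                # shorter test step
    if f(long_x) < f(short_x):
        return long_x, h * k                            # longer step won
    return short_x, h / k                               # shorter step won

sphere = lambda v: np.dot(v, v)                         # toy objective function
x, h = np.full(5, 3.0), 0.1
for _ in range(100):
    x, h = egs_step(sphere, x, h)
print(sphere(x))                                        # far below the start, 45.0
```

Note how the step length h self-adapts: it grows while the longer test step keeps winning and shrinks once the shorter one does.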
The implementation of "Lamarckianism" uses a hybrid strategy, in which an individual's fitness and genotype are returned after execution of a local gradient search for some small percentage of the time. This has the effect of slowly moving the search to beneficial areas of the search
space. Such a gradual movement may provide faster algorithm execution, since an increasing number of solutions in the genetic population will represent good initial guesses, limiting the amount of inefficient search by the localized search technique in poor areas of the search space. In particular, local optimization can help in the later stages of the whole optimization procedure, when the area of the global optimum has already been localized by the genetic algorithm. In this case it could be a good idea to introduce an effective rate, which characterizes the ratio between the Darwinian and Lamarckian methodologies and which decreases during the search.

2.10
Summary
In this chapter we have discussed some problems which arise in the context of global optimization. In particular, we learned a bit about the NFT and its main consequences, and we also discussed multi-objective optimization. We made a brief review of some popular numerical methods of optimization, including the Simplex method and Simulated Annealing. We gave a more detailed introduction to genetic algorithms, and we gave some examples of how to use the GA to solve stationary few-body quantum mechanical eigenproblems, including quantum statistical calculations for strongly interacting systems. As an example, we considered the formation and melting of a "Wigner molecule" in a system of a few electrons confined in a quantum dot.
Chapter 3
Chaos in complex systems
The Universe presents us with an infinite variety of complex dynamical systems. Their spatial and temporal scales vary from intergalactic distances and millions of years to atomic and electron motion on femto- and even attosecond scales. All these physical systems can be described within the framework of well-established and experimentally proven theories, like celestial mechanics or quantum theory. A much higher level of complexity is associated with the description of dynamics in human society or financial networks (the stock market), since there is no rigorous mathematical description of the rational (or irrational) behavior of a human being. However, one can try to develop interdisciplinary ideas which are common to different systems and may provide some insight into complex phenomena in general. One such idea is the concept of chaos. Perhaps the ultimate test for chaos is the accuracy of short-term predictions. With truly random data, prediction is impossible. Conversely, with chaotic data there is absolute determinism, and prediction is possible in principle, at least on short time scales. Predictability thus precludes complete randomness and signifies determinism, although randomness and determinism often coexist. Chaotic behavior can be observed in most fields of science: subatomic and molecular physics, fluid dynamics, plasma physics, chemistry, molecular biology, and most environmental, social and economic phenomena. Thus we can hope that the knowledge obtained from the studies of chaos in the natural sciences can then be applied to more complex examples, like social systems. In general, all chaotic systems are nonlinear; however, not every nonlinear system is chaotic. The story of multiple attempts to understand chaos
begins 300 years ago, when Sir Isaac Newton derived Kepler's elliptic orbits analytically and set up the equations of motion of the two-body problem, which can be integrated exactly. However, if one starts to consider the motion of the Moon in more detail, one immediately realizes that it cannot be well approximated by a two-body problem, because at least the gravitation of the Sun has to be taken into account. So one should solve the much more complicated three-body problem: the motion of the Moon, the Earth, and the Sun. Although a few analytical solutions for some particular cases of the problem were given in the XIX century, the failure to solve the three-body problem analytically motivated the most brilliant mathematicians of that time to search for a general solution. The surprising result was obtained by the great French mathematician Henri Poincare (1854-1912), who rigorously proved the impossibility of integrating the three-body problem in the general case. The impossibility of integrating the equations of motion of a physical system is tightly connected with the unpredictability of its time evolution. In a deterministic time evolution, if we know the initial state x(0) of the system at time t = 0 exactly, we can predict its state x(t) at time t = T with complete precision. But if there is an imprecision Δx(0) in the initial definition of x(0), this (even extremely small) error Δx(0) may grow exponentially with t, at least for a while, which will virtually destroy our predictions! Technically, even if one were able to obtain a partial analytical solution under special conditions (for example, for planar motion), the series would converge extremely slowly. The way to characterize this exponential growth is to calculate the Lyapunov exponents. In practice, to obtain the Lyapunov spectrum, imagine an infinitesimally small sphere (in the case of three dimensions) with radius dr sitting on the initial state of a trajectory. The flow will deform this sphere into an ellipsoid.
That is, after a finite time t all orbits which have started in that sphere will be in the ellipsoid. The i-th Lyapunov exponent is defined by
λ_i = lim_{t→∞} (1/t) ln(dl_i(t)/dr),    (3.1)
where dl_i(t) is the radius of the ellipsoid along its i-th principal axis. Sensitivity to initial conditions means that nearby points in phase space typically "repel" each other. Note that we are not interested in the case of
unbounded dynamics, when the distance between two initially nearby points diverges exponentially for all times, because they would eventually move apart to the opposite ends of the Universe. Actually, we are looking at the opposite case, when the motion in phase space is bounded and any two points will eventually reach a maximum separation and then begin to approach each other again. As a result, in some cases the volume of the accessible phase space can monotonously decrease until it becomes a set of measure zero, and the trajectory of the system will be captured in a so-called strange attractor. In fact, this strange attractor will have a fractional dimension! Such sensitive dependence on initial conditions often appears in our life. Imagine, for example, someone playing dice and attempting to reproduce a certain combination, who fails precisely because of the above-mentioned reason. This situation is usually referred to as chaotic behavior. Obviously, the evolution in time of a chaotic system has limited predictability, which is of the order of t ∝ 1/λ, where λ is the largest positive Lyapunov exponent, and the system undergoes irregular oscillations (if they were regular they would be predictable). How to do prediction for a chaotic system one can learn from mathematics, using the so-called Takens delay embedding theorem, which is a result of Floris Takens on the embedding dimension of nonlinear (chaotic) systems. The theorem states that a dynamical system can be reconstructed from a sequence of observations of its state. We will talk about this in more detail in chapter 6. The mathematical possibility of chaos was well understood 100 years ago by Hadamard and Poincare, but theory and applications were developed only six decades later.
The reason is that only in 1950-1960 did the growing power of computers make the numerical integration of systems of nonlinear equations, and thus numerical experiments, possible. Now, in the era of digital computers, most of what we know about the properties of nonlinear differential equations is based on numerical integration techniques. Any numerical method is based on approximations, and chaotic systems are characterized by a high sensitivity to approximations. This problem is known as the problem of efficient shadowing of the system. Shadowing is a branch of chaotic systems theory that tries to show that, even in the face of the exponential magnification of small errors, numerical solutions have some validity. It does this by trying to show that, for any particular computed solution (the "noisy" solution), there exists a
true solution with slightly different initial conditions that stays uniformly close to the computed solution. If such a solution exists, it is called a true shadow of the computed solution. An approximation to true shadowing is numerical shadowing, whereby an iterative refinement algorithm is applied to a noisy solution to produce a nearby solution with less noise. If this iterative process converges to a solution with noise close to the machine precision, the resulting solution is called a numerical shadow; for more details see, for example, [Maddox (1989)]. There are some important consequences of the idea of chaos. One of them was pointed out by Poincare, who believed that the uncertainty of our predictions justifies a probabilistic description of the world. After the discovery of the chaos phenomenon, it became widely accepted that chaos is a generic property of all multidimensional nonlinear systems. By projecting the concept of chaos onto the natural world of earthquakes or solar activity, biological evolution, and even human society, many scientists started to believe that the concept of chaos would explode the naive determinism of Newton's mechanics. There is another issue worth mentioning. One has to distinguish between chaos and randomness. Chaos usually assumes that there is always a finite horizon for a reliable forecast of the system's dynamics, while randomness usually means the dynamics is completely unpredictable. Up to a certain precision one can simulate randomness with some deterministic pseudorandom procedure, or with a strongly chaotic process whose large positive Lyapunov exponent makes any (even short-term) forecast very difficult. However, it seems that true randomness exists only in quantum mechanics, where there is a breakdown of determinism, of cause and effect, given initial (and final) states and trajectories. If we accept the postulates of quantum mechanics, this randomness is true, pure and unavoidable.
It is not the poor pseudorandomness of dice and other gambling devices. One should notice that the Schrodinger equation, which describes the quantum evolution of a very specific dynamical variable, the wavefunction, is deterministic, just like the common heat transfer equation.
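The meaning of a positive Lyapunov exponent in Eq.(3.1) is easy to illustrate for a one-dimensional map, where it reduces to the trajectory average of ln|f′(x)|. For the fully chaotic logistic map x_{n+1} = 4x_n(1 − x_n), a standard example not taken from the text above, the exact value is ln 2:

```python
import math

def lyapunov_logistic(x0, n=100000, burn=1000):
    """Largest Lyapunov exponent of x -> 4x(1-x): time average of ln|f'(x)|."""
    x = x0
    for _ in range(burn):                 # discard the transient
        x = 4.0 * x * (1.0 - x)
    s = 0.0
    for _ in range(n):
        s += math.log(abs(4.0 * (1.0 - 2.0 * x)))
        x = 4.0 * x * (1.0 - x)
    return s / n

lam = lyapunov_logistic(0.123456789)
print(lam)                                # close to ln 2 = 0.6931...
```

A positive value of λ means that the forecast horizon is of order 1/λ iterations, in line with the t ∝ 1/λ estimate discussed above.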
3.1
Lorenz attractor
The Lorenz attractor dates from 1963, when the meteorologist Edward Lorenz working at MIT published an analysis of a simple system of three
differential equations. He developed the equations as a simple model for the so-called Rayleigh-Benard convection problem, which governs the formation of convection rolls between two parallel surfaces at different temperatures. His interest was in predicting atmospheric dynamics, with the ultimate goal of developing long-term weather prediction tools. What he found through numerical solutions of these seemingly simple equations had a deep impact on our understanding of the possible behaviors of nonlinear equations. In general, convection is described using the Navier-Stokes partial differential equations, which makes the analysis quite complicated. Fortunately, partial differential equations may often be thought of as infinite systems of ordinary differential equations. It is frequently possible to expand the dependent variables of a partial differential equation in a Fourier series, obtaining an infinite discrete set of coupled ordinary differential equations for the time dependence of the Fourier coefficients. The Lorenz model was derived from the Navier-Stokes equations by making such simplifications, and it results in a system of three coupled nonlinear first-order ordinary differential equations:
dx/dt = σ(y − x),
dy/dt = ρx − y − xz,
dz/dt = xy − bz,    (3.2)
where σ is the Prandtl number, i.e. the ratio of the kinematic viscosity to the thermal diffusivity, b is related to the shape of the convection rolls, and ρ is related to the Rayleigh number (or the temperature difference between the surfaces). The dependent variables x(t), y(t) and z(t) essentially contain the time dependence of the stream-function and temperature distributions expressed as Fourier expansions. Note that the equations are nonlinear due to the xz and xy terms in the second and third equations, respectively. For the derivation of the Lorenz equations see [Schuster (1988)]. Lorenz pointed out that Eq.(3.2) possesses some surprising features. In particular, the equations are "sensitive to initial conditions", meaning that tiny differences at the start become amplified exponentially as time passes. This type of unpredictability is a characteristic feature of chaos. Conversely, there is also some magic "order" in the system: numerical solutions of the equations, plotted in three dimensions, consist of curves which approach a curious two-sheeted surface, later named the Lorenz attractor.

Fig. 3.1 The famous Lorenz attractor, which symbolizes order within chaos.

The geometry of the attractor is closely related to the "flow" of the equations, the curves corresponding to solutions of the differential equations. There is an unstable equilibrium, a saddle point, at the origin. The trajectory repeatedly passes this point and is pushed away to the left or right, only to circle round and pass back by the saddle. As they loop back, adjacent curves are pulled apart (this is how the unpredictability is created) and can end up on either side of the saddle. The result is an apparently random sequence of loops to the left and right. More than 40 years ago the Lorenz attractor became one of the well-recognized symbols of modern nonlinear dynamics and chaos theory. One can find it in practically every textbook on the theory of nonlinear dynamical systems, where it is associated with the appearance of order within chaos (see Fig.3.1). At first its existence was established only on the basis of numerical integration of the Lorenz equations on a computer. However, mathematicians lacked a rigorous proof that exact solutions of the Lorenz equations resemble the shape generated on a computer by numerical approximations, and they
Chaos
83
even also could not prove that its dynamics are genuinely chaotic. Only in 1998 Warwick Tucker has proved that Lorenz system indeed define a robust chaotic attractor [Tucker (1999)]. This result is of great importance, because it builds a bridge between empirical laws from numerical experiment and mathematically rigorous proof. It is easily accepted by our intuition that if a system consists of many interacting subsystems, having many degrees of freedom and hence governed by a set of complex multidimensional differential equations, then its behavior might be in practice impossible to predict. What is really hard to imagine that some times even very simple systems, described by simple nonlinear equations, can have very complicated chaotic solutions. Another striking observation, is that increasing of the dimensionality of some type of dynamical systems can lead to decreasing of the probability, that this system can be chaotic! [Sprott (2003)].
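The sensitivity to initial conditions described above is easy to reproduce numerically. The sketch below (our own illustration, not code from the book) integrates the Lorenz equations with a standard fourth-order Runge-Kutta method, using the usual parameters σ = 10, ρ = 28, b = 8/3, and tracks two trajectories whose initial points differ by only 10⁻⁸:

```python
import numpy as np

def lorenz_rhs(s, sigma=10.0, rho=28.0, b=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), rho * x - y - x * z, x * y - b * z])

def rk4(s, dt, n):
    """Fourth-order Runge-Kutta integration of the Lorenz equations."""
    out = np.empty((n + 1, 3))
    out[0] = s
    for k in range(n):
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        s = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        out[k + 1] = s
    return out

dt, n = 0.005, 4000                         # integrate up to t = 20
traj_a = rk4(np.array([10.0, 0.0, 10.0]), dt, n)
traj_b = rk4(np.array([10.0, 0.0, 10.0 + 1e-8]), dt, n)
sep = np.linalg.norm(traj_a - traj_b, axis=1)
print(sep[0], sep[-1])   # the tiny initial difference is amplified enormously
```

The separation grows roughly like exp(λt) until it saturates at the size of the attractor, which is precisely the "sensitivity to initial conditions" discussed in the text.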
3.2 Control of chaotic dynamics of the fractional Lorenz system
Complex systems can consist of an enormous number of simple units whose interactions can lead to unexpected collective behavior. The dream of complexity theory is to discover the laws governing such complex systems, whether they be ecosystems, the weather or corporations in the marketplace. The "power law" is a distinctive experimental signature seen in a wide variety of complex systems. In econometrics it goes under the name of distributions with "fat tails", in physics it is referred to as "critical fluctuations" or "universality", in computer science and biology it is "the edge of chaos", and in demographics and linguistics it is called "Zipf's law". As was shown recently [Hilfer (2000)], many of these general "power law" dependencies arise naturally as solutions of differential equations with fractional derivatives. Another example is the success in modelling financial series using Autoregressive Fractionally Integrated Moving Average (ARFIMA) models [Doornik (1994)]. Thus, fractional derivatives could be a significant ingredient in the modelling of complex dynamical systems. In this section we introduce a generalization of the Lorenz dynamical system using fractional derivatives. The system can then have an effective non-integer dimension Σ, defined as the sum of the orders of all involved derivatives. We find that the system with Σ < 3 can exhibit chaotic behavior. An interesting finding is that there is a critical value of the effective dimension, Σ_cr, below which the system undergoes a transition from chaotic to regular dynamics.

Although fractional derivatives have a long mathematical history, for many years they were not used in physics. One possible explanation of this unpopularity could be that there are multiple nonequivalent definitions of fractional derivatives [Hilfer (2000)]. Another difficulty is that fractional derivatives have no evident geometrical interpretation because of their nonlocal character [Podlubny (2001)]. However, during the last ten years fractional calculus has started to attract much more attention from physicists and engineers. It was found that various, especially interdisciplinary, applications can be described elegantly with the help of fractional derivatives. As examples, one can mention studies on viscoelastic bodies, polymer physics, phase transitions of fractional order, anomalous diffusion and the description of the fractional kinetics of chaotic systems (for a review see [Hilfer (2000)]). The usefulness of fractional derivatives in quantitative finance [Scalas (2000)] and in the quantum evolution of complex systems [Kusnezov (1999)] was demonstrated recently. One should also mention recent attempts to introduce a local formulation of fractional derivatives [Kolwankar (1998)] and to give them some geometrical interpretation [Podlubny (2001)]. However, most of the studies mentioned above were performed on the basis of linear differential equations containing fractional derivatives. The main consequence of this limitation is that the dynamics of such systems cannot be chaotic. According to the Poincaré-Bendixson theorem (see, for example, [Hirsch (1965)]), chaos cannot occur in two-dimensional systems of autonomous ordinary differential equations.
One has to stress that this theorem applies to continuous-time dynamical systems and not to discrete maps, which can be represented as X_{n+1} = f(X_n), where X_n is a discrete state variable. Discrete-time dynamical systems can exhibit chaotic behavior even in one dimension (see, for example, the famous logistic map [Ott (1994)]). As we mentioned before, the most famous example of a continuous-time three-dimensional system which exhibits chaos is the Lorenz model [Lorenz (1963)]. The dimension Σ of such a system can be defined as the sum of the orders of all involved derivatives; one should remember, however, that this definition is not rigorous. Therefore, by using fractional derivatives of orders 0 < α, β, γ ≤ 1 it is possible to obtain a system with an effective non-integer dimension Σ < 3. A natural question arises whether such a system can exhibit chaotic behavior. In this connection one should also mention the work of Hartley et al., where the authors studied chaotic motion of the Chua-Hartley system of fractional order [Hartley (1995)]. In this section we investigate the dynamics of the fractional Lorenz system and find that it can be chaotic with Σ < 3. We estimate the largest Lyapunov exponent in this case. Moreover, we determine a critical value Σ_cr such that for Σ < Σ_cr the dynamics of the considered system becomes regular.

There are several definitions of fractional derivatives [Hilfer (2000)]. Probably the best known is the Riemann-Liouville formulation. The Riemann-Liouville derivative of order α with lower limit a is defined as

    ₐD_t^α f(t) = (1/Γ(n−α)) (dⁿ/dtⁿ) ∫ₐᵗ f(τ)/(t−τ)^(α−n+1) dτ,      (3.3)

where Γ is the gamma function and n is an integer chosen in such a way that n − 1 < α < n. An alternative definition of fractional derivatives was introduced by Caputo [Caputo (1967)]. The Caputo derivative of order α is a sort of regularization of the Riemann-Liouville derivative and is defined through

    ᶜD_t^α f(t) = (1/Γ(n−α)) ∫₀ᵗ f⁽ⁿ⁾(τ)/(t−τ)^(α−n+1) dτ.      (3.4)
The main advantage of the definition Eq.(3.4) is that the Caputo derivative of a constant is equal to zero, which is not the case for the Riemann-Liouville derivative. Essentially, the Caputo fractional derivative is a formal generalization of the integer-order derivative under the Laplace transformation [Scalas (2000)]. Now let us introduce a fractional generalization of the Lorenz system:

    d^α x/dt^α = σ(y − x),
    d^β y/dt^β = ρx − y − xz,      (3.5)
    d^γ z/dt^γ = xy − bz.

Here we assume 0 < α, β, γ ≤ 1, r ≥ 1, and the time derivatives are understood in the Caputo sense. The effective dimension Σ of the system Eq.(3.5) is defined as the sum of the orders of the derivatives, α + β + γ = Σ. In our calculations we use the parameter values σ = 10, ρ = 28, b = 8/3, so that in the case α = β = γ = r = 1 the system Eq.(3.5) reduces to the common Lorenz dynamical system exhibiting chaotic behavior. A generalization of dynamical equations using fractional derivatives could be useful in the phenomenological description of viscoelastic liquids such as, for example, human blood [Hilfer (2000); Thurston (1972)]. The system Eq.(3.5) is in fact a system of coupled nonlinear integro-differential equations with a weakly singular kernel. This is a computationally expensive problem, since its numerical integration requires O(n²) operations, where n is the number of sampling points [Diethelm (1997)]. Let us start from the analytical solution of a linear fractional differential equation:

    d^α x/dt^α = Ax + f(t),   x(0) = x₀.      (3.6)
With the help of the Laplace transformation [Podlubny (1997)] one can easily obtain the solution of Eq.(3.6) in the form:

    x(t) = x₀ E_α(At^α) + ∫₀ᵗ (t−τ)^(α−1) E_{α,α}(A(t−τ)^α) f(τ) dτ,      (3.7)

where E_α is the one-parameter Mittag-Leffler function [Erdelyi (1955)], defined by

    E_α(z) = Σ_{k=0}^∞ z^k / Γ(αk + 1),   α > 0,      (3.8)

and E_{α,β} is the two-parameter Mittag-Leffler function [Erdelyi (1955)], defined by

    E_{α,β}(z) = Σ_{k=0}^∞ z^k / Γ(αk + β),   α, β > 0.      (3.9)

For α = 1, E_α and E_{α,α} both reduce to the usual exponential function. The numerical scheme we implemented in our calculations is based on the linearization of the system Eq.(3.5) at each step of integration and the iterative application of Eq.(3.7). We checked our numerical scheme by comparing the results for Eq.(3.5) obtained using Eq.(3.7) with those of the standard fourth-order Runge-Kutta method for the case α = β = γ = r = 1. We integrated Eq.(3.5) for different values of the parameters α, β, γ, r and different initial conditions. The first finding is that the fractional Lorenz system can exhibit chaotic behavior with an effective dimension Σ < 3. In Fig.3.2 we show the dynamical portrait of the system
{x(t), y(t), z(t)} using the parameters r = 1, α = β = γ = 0.99. The effective dimension of the system is thus Σ = 2.97 < 3. We set the initial conditions at t = 0 as {x₀, y₀, z₀} = {10, 0, 10}. Note that the system exhibits chaotic dynamics similar to that of the common Lorenz system. Moreover, one can probably also define the set of points which could be characterized as a strange attractor. However, this set is slightly deformed compared to the "classical" Lorenz attractor. We have to stress that it is rather time consuming to compute Lyapunov exponents for a nonlocal system like Eq.(3.5). In order to resolve this difficulty we determine the largest positive Lyapunov exponent λ using an implicit procedure developed for time-series data. With the help of the freeware package TISEAN [Hegger (1998)] we estimated the largest Lyapunov exponent λ for the case shown in Fig.3.2. We found λ ≈ 0.85, which corresponds to the chaotic regime. Note that for the common Lorenz system λ ≈ 0.906 [Sparrow (1982)]. We conclude that decreasing the effective dimension Σ induces an effective damping in the system. By decreasing the parameters α, β, γ one obtains a further decrease of the largest Lyapunov exponent.
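The Mittag-Leffler functions of Eqs.(3.8),(3.9) are easy to evaluate directly from their Taylor series for moderate arguments. The following sketch (our own; the truncation length is chosen ad hoc and asymptotic expansions would be needed for large arguments) checks the reduction E₁(z) = e^z, the closed form E_{1/2}(−x) = e^{x²} erfc(x), and the slow power-law decay of E_α(−t^α) for α < 1 discussed later in this section:

```python
from math import gamma, erfc, exp

def mittag_leffler(z, a, b=1.0, nmax=150):
    """Two-parameter Mittag-Leffler function E_{a,b}(z) from its Taylor
    series; adequate for moderate |z| (nmax kept below the overflow
    threshold of math.gamma)."""
    return sum(z**k / gamma(a * k + b) for k in range(nmax))

# for a = 1 the series reduces to the ordinary exponential, E_1(z) = exp(z)
for z in (-1.0, 0.5, 2.0):
    print(z, mittag_leffler(z, 1.0), exp(z))

# known closed form: E_{1/2}(-x) = exp(x^2) * erfc(x)
x = 1.0
print(mittag_leffler(-x, 0.5), exp(x**2) * erfc(x))

# slow power-law decay of E_a(-t^a) for a < 1 vs. exponential decay for a = 1
t = 10.0
print(mittag_leffler(-t**0.5, 0.5), exp(-t))
```

The last comparison illustrates the point made below: for α < 1, E_α(At^α) with A < 0 decays only algebraically, so nearby trajectories converge much more slowly than in the integer-order case.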
Fig. 3.2 Dynamical portrait of the fractional Lorenz system using parameters α = β = γ = 0.99, with effective dimension Σ = 2.97. Note the formation of an attractor similar to the Lorenz strange attractor.
At a certain critical dimension Σ_cr the dynamics of the system undergoes a qualitative change and becomes regular for any initial condition. This is a new and interesting result which, to our knowledge, has not been described before.
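The O(n²) cost of integrating Eq.(3.5), and the damping induced by lowering the derivative orders, can be illustrated with a very simple scheme. The sketch below is not the linearization scheme used in the book; it is a basic rectangle-rule ("fractional Euler") discretization of the equivalent Volterra integral form of a Caputo system, in which the whole history enters every step:

```python
import numpy as np
from math import gamma

def caputo_solve(f, alphas, s0, h, n):
    """Explicit rectangle-rule scheme for the Caputo system
    D^{a_i} s_i = f_i(s), s(0) = s0, written in its Volterra integral form.
    The full history enters every step: O(n^2) work, reflecting the
    non-local memory kernel."""
    a = np.asarray(alphas, dtype=float)
    ga = np.array([gamma(x + 1.0) for x in a])      # Gamma(a_i + 1)
    s = np.empty((n + 1, len(s0)))
    s[0] = s0
    hist = np.empty((n, len(s0)))                   # stored right-hand sides
    for k in range(1, n + 1):
        hist[k - 1] = f(s[k - 1])
        m = np.arange(k, 0, -1.0)[:, None]          # k - j for j = 0..k-1
        w = m**a - (m - 1.0)**a                     # exact kernel integrals
        s[k] = s0 + (h**a / ga) * np.sum(w * hist[:k], axis=0)
    return s

def lorenz(s, sigma=10.0, rho=28.0, b=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), rho * x - y - x * z, x * y - b * z])

traj = caputo_solve(lorenz, [0.99, 0.99, 0.99],
                    np.array([10.0, 0.0, 10.0]), 0.005, 2000)
print(traj[-1])        # the orbit remains bounded in the attractor region
```

For α = β = γ = 1 the weights all equal one and the scheme reduces exactly to the ordinary Euler method, which provides a convenient consistency check.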
Fig. 3.3 Dynamical portrait of the fractional Lorenz system using parameters α = β = γ = 0.97, with effective dimension Σ = 2.91. Note that the strange attractor does not exist and the system is attracted by one of the two foci: (3√8, 3√8, 27) and (−3√8, −3√8, 27).
We obtain the lowest value of the system's effective dimension for which a chaotic regime is still possible, Σ_cr ≈ 2.91. This corresponds to the case α ≈ 0.91, β = γ = 1. The obtained critical values of the parameters α, β, γ reflect the fact that the first, linear differential equation in the system Eq.(3.5) seems to be "less sensitive" to the damping introduced by the fractional derivative than the other, nonlinear equations. If we restrict ourselves to the case of equal derivative orders α = β = γ, the effective critical dimension for this symmetric case is even higher: Σ_cr^sym ≈ 2.94. In Fig.3.3 we show the dynamical portrait of the system for the parameters r = 1, α = β = γ = 0.97, with the corresponding effective dimension Σ = 2.91 < Σ_cr^sym. We use the same initial conditions as in the previous examples. Note that in this case the system exhibits a strong damping of the oscillations. Depending on the initial conditions, the trajectory of the system is attracted by one of the two centers given by
    (x, y, z) = (±√(b(ρ−1)), ±√(b(ρ−1)), ρ−1).      (3.10)

These points can easily be found from the stationarity condition:

    d^α x/dt^α = d^β y/dt^β = d^γ z/dt^γ = 0.      (3.11)
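The centers of Eq.(3.10) can be verified directly: substituting them into the right-hand side of Eq.(3.5) must give zero, in accordance with the stationarity condition Eq.(3.11). A quick numerical check for σ = 10, ρ = 28, b = 8/3:

```python
import numpy as np

sigma, rho, b = 10.0, 28.0, 8.0 / 3.0
xy = np.sqrt(b * (rho - 1.0))                 # = 3*sqrt(8) for these parameters
residuals = []
for sgn in (+1.0, -1.0):
    x = y = sgn * xy
    z = rho - 1.0
    residuals.append([sigma * (y - x),        # right-hand side of Eq. (3.5)
                      rho * x - y - x * z,
                      x * y - b * z])
residuals = np.array(residuals)
print(residuals)                              # every component vanishes
print(xy, 3.0 * np.sqrt(8.0))                 # both equal 8.485...
```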
Note that the stationarity condition Eq.(3.11) has the usual form because we use Caputo fractional derivatives; it is not applicable if one uses the Riemann-Liouville formulation Eq.(3.3). However, the obtained critical value Σ_cr ≈ 2.91 is not a "universal threshold" for arbitrary continuous-time chaotic systems of fractional dimension. We found that it is a value that characterizes a particular dynamical system. In order to illustrate this we repeat the simulations shown in Fig.3.3 with the same initial conditions, but with the parameter changed to r = 3. In Fig.3.4 we show the dynamics of the variable z(t). The system Eq.(3.5) in this case exhibits a "stronger" nonlinearity, which possibly compensates the damping effect described above, and one again obtains chaotic behavior.
Fig. 3.4 Evolution of the variable z(t) of the fractional Lorenz system, using parameters α = β = γ = 0.97, r = 3; the effective dimension is Σ = 2.91. Note that, unlike in Fig.3.3, the dynamics of the system is chaotic.
We also found that under certain conditions the system Eq.(3.5) can exhibit quasi-periodic oscillations with stable periodic orbits. In Fig.3.5 we show an example of such quasi-periodic dynamics of the variable z(t), using the parameter values r = 1, α = β = 1 and γ = 0.98, which corresponds to the effective dimension Σ = 2.98. We used the same initial conditions as in the previous Figs.3.2-3.4. Note that after some time of transient behavior
Fig. 3.5 A quasi-periodic evolution of the variable z(t) of the fractional Lorenz system, using parameters α = β = 1, γ = 0.98, r = 1.
the system evolves quasi-periodically. For different initial conditions the dynamics of the system shows different limit cycles having the same two symmetric centers (fixed points) given by Eq.(3.10). One can understand the results shown in Figs.3.2-3.5 in the following way. Any chaotic system is characterized by a strong sensitivity to its initial conditions, and the "memory" time of the system can be estimated as t_mem ≈ λ⁻¹, where λ is the largest positive Lyapunov exponent. On the other hand, the introduction of fractional derivatives leads to a non-locality in the time domain (see the definition Eq.(3.4)), which can be interpreted as the presence of a long "memory". The competition between these two tendencies was the subject of the presented investigations. Now we discuss the question whether the introduction of fractional derivatives always leads to stabilization and damping of chaos in a dynamical system. Let us consider Eqs.(3.6),(3.7) in more detail. In the case A < 0 and in the limit t → +∞, E_α(At^α) ∝ t^(−α) [Scalas (2000)]. Therefore, one obtains only a power-law convergence of two close trajectories instead of the exponential convergence found for α = 1. Thus, even small changes of
the orders of the derivatives can lead to dramatic changes of the Lyapunov spectrum and of the whole dynamics. One can imagine a situation where a small decrease of the order of the derivative α leads to a stronger sensitivity of the whole nonlinear system to the initial conditions. If A > 0, in the limit t → +∞, E_α(At^α) ∝ exp(A^(1/α) t), and one obtains, as in the case α = 1, an exponential divergence of two close trajectories.
3.3 Summary
In this section we discussed an example of chaotic phenomena in Nature. We introduced a fractional generalization of the Lorenz model, which could be useful in the phenomenological description of viscoelastic liquids and other dynamical systems with a long "memory". We studied how the dynamics of the system depends on the effective dimension Σ, which can now take non-integer values. We found that the fractional Lorenz system exhibits rich dynamical properties and can be chaotic with effective dimension Σ less than 3. To discriminate between chaotic and ordered orbits, we also estimated the largest Lyapunov exponent in particular cases. We demonstrated that the dynamics of the system is strongly sensitive to the values of the orders of the involved derivatives α, β, γ and, as a result, to the effective dimension Σ. In general, decreasing the parameters α, β, γ leads to a damping in the system. We discovered that below a certain critical dimension, Σ_cr ≈ 2.91, chaotic motion of the system is not possible. Some interesting questions are still open. Does a lowest universal bound Σ_univ exist, below which no nonlinear system can be chaotic? And how far could it be from the value suggested by the Poincaré-Bendixson theorem (Σ_univ = 2)? One should be aware, though, that fractional dynamics does not form a semigroup, f(t + s) = f(t) ∗ f(s), in the time domain, so one cannot apply the Poincaré-Bendixson theorem rigorously. Another interesting question is whether the introduction of fractional derivatives of distributed order [Chechkin (2002)] in nonlinear systems could help in the description of the "edge of chaos", which is characterized by a power-law divergence of close trajectories.
Chapter 4
Optimal control of quantum systems
In the last decade the development of laser systems has opened the way for the creation of ultrashort femtosecond pulses with controlled shape, spectrum and polarization (see, for example, [Brixner (2001)]). Ultrashort laser pulses can be used as an ideal tool to manipulate quantum objects. For instance, with the help of an optimal control field one can induce chemical reactions which are otherwise impossible or very difficult to carry out [Judson (1992)]. After the seminal work of Judson and Rabitz [Judson (1992)], where the authors suggested a variational formulation of optimal control in quantum systems and a procedure to solve this problem, many theoretical and experimental investigations were devoted to the problem of how "to teach" lasers to drive molecular reactions in real time [Bardeen (1997); de Vivie-Riedle (2000); Apalategui (2001); Vajda (2001); Hornung (2000)]. The idea consists in using pulse-shaping techniques to design pulses or sequences of pulses having a given optimal shape (and phase) so that the desired atomic wave-packet dynamics is induced. Optimal control of the internal motions of a molecule is achieved by exploiting a variety of interference effects associated with the quantum mechanics of molecular or electron motion. Thus, the population of a certain vibrational state, which may be responsible for the yield of a chemical reaction, can be controlled. Using a variational formulation of the control problem it was shown how to construct optimal external fields (e.g. laser pulses) that drive a certain physical quantity, like the population of a given state, to a desired value at a given time [Ohtsuki (1999); Peirce (1988); de Araujo (1998)]. However, even for the simplest control problems the obtained fields usually have a rather complex structure and cannot be easily interpreted [Zhu (1998)]. Furthermore, since the optimal field arises in the formalism of [Judson (1992)] as a solution of a system of coupled nonlinear differential equations, which are treated numerically by iterative methods, there is also a problem related to the multiplicity of the obtained optimal solutions, which are local extrema of the control problem. Therefore, it is desirable to develop a new theory which permits the derivation of analytical solutions for the optimal fields and guarantees their uniqueness, at least for simple problems. In this chapter we present a new alternative approach which makes it possible to obtain analytical solutions for some simple problems.

There is another point worth noting. Although the maximization of a given objective at a certain moment (as considered in the earlier works mentioned above) is relevant for many purposes, a more detailed manipulation of real systems may require the control of physical quantities during a finite time interval. An interesting example of such optimal control was recently performed on a system of shallow donors in semiconductors [Cole (2001)]. Using pulses of various shapes and durations one can control the photo-current in the system, while the total transferred charge is proportional to a time integral over the occupation of a certain excited state. The search for optimal fields able to perform such control of quantum systems is a vital problem for which no analytical description has been given so far. Using our theory we are going to consider and solve this kind of optimal control problem.

In many situations the controlled system cannot be treated as isolated; therefore dissipative and relaxation processes, due to coupling to the environment (thermal bath) or due to contact with measuring devices, can play a significant role. In this case some limits on the optimal control of the system should exist [Schirmer (2000)]. The question is how to estimate these limits quantitatively.
An interesting problem which is usually not mentioned is the following: if the optimal control field for a quantum system without relaxation is known, how should it be modified in the presence of relaxation? In the following chapter we develop a new formulation of the optimal control problem in quantum systems. Using a new analytical approach we derive a differential equation for the optimal control field, which we solve analytically in some limiting cases. This approach also permits us to investigate optimal control of simple quantum systems with relaxation. First we give a short introduction to the density matrix formalism, which is useful for the description of realistic quantum mechanical systems in contact with an environment. Then we give a brief overview of the modern variational formulation of the optimal control problem, as developed, for example, in [Ohtsuki (1999); Peirce (1988)]. After that, we present an alternative approach that allows us to describe optimal control of a quantum system over a finite time interval [0, T]. Optimal control of the system at a given time T is only a special case of our general theory. We also introduce a new type of constraint on the control field which limits the minimal width of the envelope of the resulting field. This constraint arises naturally if one tries to find optimal pulses with experimentally achievable modulation of the control fields.
4.1 Density matrix formalism
Here we would like to outline the main ideas behind the density matrix formalism, which permits a quantum mechanical description of a system embedded in some environment, usually described as a thermal bath. Let us consider a mixture of independently prepared quantum states |ψᵢ⟩, (i = 1, ..., n), with statistical weights wᵢ. These states are not necessarily orthonormal to each other. The statistical operator ρ, or density matrix operator, is defined as:

    ρ = Σᵢ₌₁ⁿ wᵢ |ψᵢ⟩⟨ψᵢ|.      (4.1)

Let us introduce an orthonormal basis |φ₁⟩, ..., |φₙ⟩, which is connected to the |ψᵢ⟩ through the relationship

    |ψᵢ⟩ = Σⱼ₌₁ⁿ aᵢⱼ |φⱼ⟩,      (4.2)

or, in an equivalent way,

    ⟨ψᵢ| = Σₖ₌₁ⁿ aᵢₖ* ⟨φₖ|.      (4.3)

In this basis the matrix elements of ρ read

    ⟨φᵢ|ρ|φₖ⟩ = Σⱼ₌₁ⁿ wⱼ aⱼᵢ aⱼₖ*.      (4.4)

The time evolution of the density matrix follows from the evolution of the states,

    ρ(t) = Σᵢ₌₁ⁿ wᵢ |ψᵢ(t)⟩⟨ψᵢ(t)|.      (4.5)

Using the time-dependent Schrödinger equation and its Hermitian conjugate,

    ∂/∂t |ψᵢ(t)⟩ = −(i/ℏ) H |ψᵢ(t)⟩,   ∂/∂t ⟨ψᵢ(t)| = (i/ℏ) ⟨ψᵢ(t)| H,      (4.6)

one easily obtains the quantum Liouville equation, which is very useful for describing the evolution of quantum systems interacting with a decohering environment:

    iℏ ∂ρ/∂t = Hρ − ρH = [H, ρ].      (4.7)

Note that the Hamiltonian of the system H can explicitly contain an external control field.
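A minimal numerical illustration of Eqs.(4.1)-(4.7) for a single qubit (the Hamiltonian below is a toy example of ours, with ℏ = 1): the density matrix of a mixture of non-orthogonal states has unit trace and is Hermitian, and the unitary evolution generated by Eq.(4.7) conserves both the trace and the purity Tr ρ²:

```python
import numpy as np

def density_matrix(states, weights):
    """rho = sum_i w_i |psi_i><psi_i|, Eq. (4.1), for normalized but not
    necessarily orthogonal states."""
    return sum(w * np.outer(p, p.conj()) for p, w in zip(states, weights))

# a 50/50 mixture of two non-orthogonal qubit states
psi1 = np.array([1.0, 0.0], dtype=complex)
psi2 = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2.0)
rho0 = density_matrix([psi1, psi2], [0.5, 0.5])
print(np.trace(rho0).real)                    # 1.0: probabilities sum to one

# unitary evolution implied by the Liouville equation (4.7):
# rho(t) = U rho(0) U^dag  with  U = exp(-i H t), built by eigendecomposition
H = np.array([[1.0, 0.3], [0.3, -1.0]])       # Hermitian toy Hamiltonian
evals, V = np.linalg.eigh(H)
t = 2.0
U = V @ np.diag(np.exp(-1j * evals * t)) @ V.conj().T
rho_t = U @ rho0 @ U.conj().T
print(np.trace(rho_t).real)                                      # trace conserved
print(np.trace(rho0 @ rho0).real, np.trace(rho_t @ rho_t).real)  # purity conserved
```

For a closed system the purity is an invariant of Eq.(4.7); it is precisely the coupling to an environment, introduced in the next section, that can change it.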
4.2 Liouville equation for the reduced density matrix
In this section we derive, using the projector technique, the Liouville equation for the reduced density matrix, which describes a quantum system in contact with an environment. Let us consider a quantum system A with Hamiltonian H_A in contact with another system B (a heat bath) with Hamiltonian H_B. One can think of A and B as two subsystems interacting with each other. We assume a weak interaction between the subsystems in order to justify the use of perturbation theory. As we have shown in the previous section, the density matrix of the whole system ρ obeys the quantum Liouville equation:

    iℏ ∂ρ/∂t = [H_A + H_B + H_AB, ρ],      (4.8)

where H_AB is the interaction between the systems A and B. In order to make further progress, we assume that the density matrix of the system B is at thermal equilibrium at temperature T, so that it is given by

    ρ_B = exp(−βH_B) / Tr_B[exp(−βH_B)].      (4.9)

The trace operation Tr_B means the diagonal sum in the Hilbert space of the system B [Toda (1983)]. Let us introduce the reduced density matrix σ that describes the system A and is given by

    σ(t) = Tr_B[ρ(t)].      (4.10)

Our task is then to derive from Eq.(4.8) the corresponding equation of motion for the reduced density matrix σ(t). Let us introduce a projection operator:

    Pρ(t) = ρ_B Tr_B[ρ(t)] = ρ_B σ(t).      (4.11)

One can immediately check that from the definition Eq.(4.11) it follows that

    P² = P,      (4.12)

thus P is indeed a projector. Let us also introduce, for brevity of notation, the corresponding Liouville operators:

    L = L_A + L_B + L_AB.      (4.13)

For a Liouville operator L and an operator F we can always write:

    exp(iLt) F = exp(−iHt/ℏ) F exp(iHt/ℏ).      (4.14)
We start from the separation of the Liouville equation into two equations:

    iℏ ∂(Pρ)/∂t = P L Pρ + P L (1 − P)ρ,
    iℏ ∂((1 − P)ρ)/∂t = (1 − P) L Pρ + (1 − P) L (1 − P)ρ.      (4.15)

By summing both equations one easily recovers the Liouville equation Eq.(4.8). Taking the partial trace Tr_B of Eq.(4.8), and using the fact that Tr_B[L_B ρ] = 0, we can write

    iℏ ∂σ/∂t = L_A σ + Tr_B[L_AB ρ(t)].
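The projector property Eq.(4.12) is easy to check numerically for the definition Eq.(4.11). The sketch below (a two-level system coupled to a two-level bath, with arbitrary parameters of our choosing) builds ρ_B as a thermal state, applies Pρ = ρ_B Tr_B[ρ] to a random density matrix, and confirms that applying P twice changes nothing:

```python
import numpy as np

dA, dB = 2, 2
rng = np.random.default_rng(0)

# random density matrix on the composite space A (x) B
M = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
rho = M @ M.conj().T
rho /= np.trace(rho)

# thermal state of the bath, rho_B ~ exp(-beta H_B), cf. Eq. (4.9)
HB = np.diag([0.0, 1.0])
beta = 0.7
rhoB = np.diag(np.exp(-beta * np.diag(HB)))
rhoB /= np.trace(rhoB)

def partial_trace_B(r):
    """Tr_B of an operator on A (x) B (kron ordering: index = a*dB + b)."""
    return np.trace(r.reshape(dA, dB, dA, dB), axis1=1, axis2=3)

def P(r):
    """Projection super-operator P rho = rho_B Tr_B[rho], Eq. (4.11)."""
    return np.kron(partial_trace_B(r), rhoB)

once, twice = P(rho), P(P(rho))
print(np.max(np.abs(once - twice)))    # ~0: P^2 = P, Eq. (4.12)
```

The identity holds because Tr_B[σ ⊗ ρ_B] = σ Tr ρ_B = σ, so the second application of P reproduces its own output exactly.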
4.6 Optimal control of nanostructures: double quantum dot
Within the same period in which the design and control of femtosecond laser fields achieved its great successes (the 1980s and 1990s), mesoscopic science
Fig. 4.12 Dynamics of the occupation ρ₂₂(t) for the system without decoherence (thick solid line) and with decoherence (dash-dotted line: Eq.(4.58); thin solid line: numerical solution of the Liouville equation Eq.(4.46)).
advanced as a new era in physics and technology. The fabrication of mesoscopic systems and nanostructures (such as clusters, nanowires, quantum dots, etc.) and further investigation of their properties promise a technological breakthrough in many areas. For example, further development of quantum dots (QDs), which are often described as artificial solid-state atoms, provides the possibility of developing a new generation of semiconductor lasers. In these studies the semiconductor dots and rings are made from indium arsenide embedded in gallium arsenide [Groom (2002)]. They were grown using techniques developed within the past decade that allow much smaller nanostructures to be created. Such studies provide new perspectives on the internal quantum-mechanical workings of quantum dots. The ultimate goal of mesoscopic science is to create useful electronic and optical nano-materials that have been quantum-mechanically engineered by tailoring the shape, size, composition and position of various quantum dots and other nanostructures. Excited electronic states in quantum dots usually persist for a relatively long time because they interact in a very restricted way with their environment. Normally, such interactions lead to decoherence or destruction
Fig. 4.13 Dynamics of the occupation ρ₂₂(t) for 40 randomly generated control pulses using γ₁T = 2γ₂T = 1 (thin solid lines). The thick solid line represents a bound for the possible values, ρ₂₂^max(t) = (1 + exp(−(γ₁ + γ₂)t/2))/2.
of the quantum state. As a result, quantum dots may provide an excellent solid-state system for exploring advanced technologies based on quantum coherence. For example, it may be possible to create and control superimposed or even entangled quantum states using highly coherent laser stimulation [Oliver (2002)]. External control of the full quantum wavefunction in a semiconductor nanostructure may even lead to revolutionary new applications, such as quantum computing, making computation orders of magnitude faster than is possible today. Because the underlying principles of optimal control of quantum dynamics are broadly applicable, it is a very attractive idea to combine femtosecond dynamics and mesoscopic physics and to perform coherent control on mesoscopic objects. For such systems the electronic degrees of freedom might offer the possibility of control by pulse shaping. An important requirement for optimal control of quantum dynamics is the existence of phase coherence over a time range comparable to the duration of the pulse sequence; in terms of the decoherence rates and the control interval T this means γ₁,₂ T ≲ 1. The limit of strong inter-dot Coulomb repulsion, U → ∞, requires p₀ = 1 − ρ₁₁ − ρ₂₂, which projects out double occupancies [Stoof (1996)]. The initial conditions are set as ρ₁₁ = 1, ρ₂₂ = 0, as can be inferred from Fig.4.14. We are going to consider photon-assisted tunnelling when the one-photon resonance condition

    ℏω = √(Δε² + 4d²)      (4.83)

is satisfied. For simplicity we assume symmetric coupling to the reservoirs: Γ₂ = Γ₁. Equations (4.82) can be solved analytically in some limiting cases. For instance, considering only an isolated system of two quantum dots in an electric field periodic in time and with a constant amplitude V(t) = V₀, an electron placed on one of the dots will oscillate back and forth between the dots with the Rabi frequency (Γ₁ = Γ₂ = 0)

    Ω_R = (2d/ℏ) J_N(eV₀/ℏω),      (4.84)

where J_N is the Bessel function of order N. Here N refers to the number of photons absorbed by the system in order to fulfill the resonance condition

    Nℏω = √(Δε² + 4d²).      (4.85)
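The Bessel-function factor in Eq.(4.84) implies that tunnelling is quenched near the zeros of J_N, the phenomenon known as coherent destruction of tunnelling. The sketch below (our own illustration) evaluates J_n from its integral representation with a midpoint rule, so that no special-function library is needed:

```python
import numpy as np

def bessel_j(n, x, m=4000):
    """J_n(x) from the integral representation
    J_n(x) = (1/pi) * Int_0^pi cos(n*theta - x*sin(theta)) d(theta),
    evaluated with a midpoint rule."""
    theta = (np.arange(m) + 0.5) * np.pi / m
    return np.mean(np.cos(n * theta - x * np.sin(theta)))

# the one-photon coupling scales as J_1(e V0 / (hbar w)); it passes through
# zero near e V0 / (hbar w) = 3.8317 (first zero of J_1), where the Rabi
# oscillation between the dots is suppressed
for arg in (0.5, 1.8, 3.8317):
    print(arg, bessel_j(1, arg))
```

Sweeping the drive amplitude V₀ through such a zero therefore switches the inter-dot oscillation off even though the field is strong, which is one of the handles a shaped pulse can exploit.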
From the numerical integration of Eqs.(4.82) one can obtain the charge Q_T transferred from the left to the right reservoir due to the action of the external field over a finite time interval [0, T]. For that purpose we write the current operator J = (id/ℏ)(c₁†c₂ − c₂†c₁), which leads, in combination with Eqs.(4.82), to the time-dependent average current

    ⟨I(t)⟩ = e Tr{ρJ} = e (dρ₂₂(t)/dt + (Γ₂/ℏ) ρ₂₂(t)),      (4.86)

where e is the electron charge and Tr is the trace operation. The net transferred charge from the left to the right quantum dot, Q_T, is obtained as

    Q_T = ∫₀ᵀ dt ⟨I(t)⟩ = (eΓ₂/ℏ) ∫₀ᵀ dt ρ₂₂(t) + e ρ₂₂(T).      (4.87)
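Equation (4.87) is straightforward to evaluate once ρ₂₂(t) is known. In the sketch below ρ₂₂(t) is an assumed damped-Rabi form, not a solution of Eqs.(4.82); the point is only the bookkeeping of the two contributions and the Γ₂ = 0 limit, in which Q_T reduces to eρ₂₂(T):

```python
import numpy as np

def transferred_charge(p22, t, Gamma2, e=1.0, hbar=1.0):
    """Q_T = (e*Gamma2/hbar) * Int_0^T p22 dt + e*p22(T), Eq. (4.87)."""
    dt = t[1] - t[0]
    integral = dt * (np.sum(p22) - 0.5 * (p22[0] + p22[-1]))   # trapezoid rule
    return e * Gamma2 / hbar * integral + e * p22[-1]

T, npts = 100.0, 2001
t = np.linspace(0.0, T, npts)
Omega, g = 0.5, 0.01
p22 = 0.5 * (1.0 - np.exp(-g * t) * np.cos(Omega * t))  # assumed damped Rabi form
q0 = transferred_charge(p22, t, Gamma2=0.0)
q1 = transferred_charge(p22, t, Gamma2=0.05)
print(q0, q1)     # with Gamma_2 = 0, Q_T is just e * p22(T)
```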
Obviously, QT only represents the transferred charge to the right reservoir when T2 ^ 0. Otherwise it represents the charge in the right quantum dot. In the Eq.(4.87), the second term indicates that, after the field is switched off (t > T, the charge remaining in the second quantum dot ep22(T) will be completely transferred to the right reservoir. It is important to point out that QT = QT[V(t)] is a nonlinear functional of the field amplitude V(t), and can exhibit different types of behavior, depending on the form of V(t). For instance, if the external field has a Gaussian shape V(t) — Voexp(—t 2 /2r 2 ) of duration r, then QT shows Stiickelberg-like oscillations as a function of r [Speer (2000)]. However, the Gaussian shape of V(t) does not necessarily maximize the transferred charge. Our goal is to find the optimal pulse shape Vopt(t) which maximizes QT, i.e., which satisfies Q1pax = QT\Yopt{t)\The problem of finding Vopt(t) is complicated because of its nonlinearity contained in p22{t). Therefore, we are going to determine the shape of the optimal control fields numerically using the genetic algorithm as a global search method. In our present approach a vector representing the "genetic code" is just the pulse shape V(t) discretized on a time interval t € [0,T]. The fitness function, i.e. the functional to be maximized by the successive generations is the transferred charge Qr[V(4)] (see Eq.(4.87)). Note, that the fitness function in fact is a combination of two parts, one is the value of a function at a given time, the second one is an integral (time average). Thus we have a hybrid control problem. We assume that the control field is active (non-zero) within the time interval [0, T] and it satisfies the boundary conditions f (0) = V(T) = 0. As initial population of field amplitudes satisfying the boundary conditions
we choose Gaussian-like functions of the form

V_j^(0)(t) = I_j^0 exp(−(t − t_j)²/τ_j²) t(t − T),  (4.88)
with random values for the position of the maximum t_j ∈ [0, T] and the duration τ_j ∈ (0, T]. The peak amplitude I_j^0 for each pulse is calculated from the condition that all pulses must carry the same energy
E = ∫₀ᵀ E²(t) dt ≈ (1/2) ∫₀ᵀ V²(t) dt.  (4.89)
Equation (4.89) represents an integral isoperimetric constraint on a possible solution. The value of the pulse energy E can also be considered as a parameter to be optimized. Although the above formulated control problem (Eqs.(4.82), (4.87), (4.89)) is applied to a rather simple quantum system, as we shall see, it is a rich problem, and the optimal control fields have a nontrivial shape and induce complicated dynamics of the electron occupations in the system. The parameters used in the calculations are given in terms of the tunnelling matrix element d. The energy difference Δε = ε₂ − ε₁ must be much larger than d to ensure that the ground state of the double quantum dot is localized on the left side and the excited state is localized on the right side. This also leads to a sharper resonance behavior. Therefore we set Δε = 24d. In the calculations we use symmetric coupling of the quantum dots to the reservoirs, Γ₁ = Γ₂ = Γ, and compute the optimal field shape for different values of the coupling constant, Γ = 0, 0.01d, 0.05d. It is important to point out that Γ must be smaller than d, so that the Rabi oscillations do not become over-damped. If Γ is large, the system saturates very rapidly to ρ₁₁ = ρ₂₂ = 1/2, and no interesting transient dynamics can be observed. Finally, we choose the control interval T = 100ℏ/d, which is large enough to allow back and forth motion of the electrons between the quantum dots and is of the order of ℏ/Γ. In these calculations we also set an implicit constraint on the minimal width of the pulse, in order to describe pulses which can be achieved experimentally. In our calculations this minimal width is naturally determined by the discretization of the time interval and by the smoothness parameters k_c and k_m of the crossover and mutation operations (see Eqs.(2.20), (2.21)). Let us first discuss the results for U = 0, i.e., neglecting the inter-dot Coulomb repulsion.
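As an illustration of how such an initial population can be generated, the sketch below (not the book's code; the grid size, population size, and energy value are arbitrary choices) builds Gaussian-like seeds of the form of Eq.(4.88) and rescales each one to carry the same energy, per the isoperimetric constraint of Eq.(4.89):

```python
import math
import random

random.seed(0)
T, N, E = 100.0, 256, 1.0        # control interval, grid points, target energy
dt = T / (N - 1)
ts = [k * dt for k in range(N)]

def gaussian_seed():
    """One member of the initial population, Eq. (4.88): a Gaussian bump times
    t*(t - T), which enforces V(0) = V(T) = 0, rescaled so that the pulse
    carries exactly the energy E of Eq. (4.89), E = (1/2) * int V^2 dt."""
    tj = random.uniform(0.0, T)              # random position of the maximum
    tau = random.uniform(T / 20.0, T)        # random duration
    v = [math.exp(-((t - tj) / tau) ** 2) * t * (t - T) for t in ts]
    norm = 0.5 * sum(x * x for x in v) * dt  # energy of the raw, unscaled pulse
    return [x * math.sqrt(E / norm) for x in v]

population = [gaussian_seed() for _ in range(20)]
energies = [0.5 * sum(x * x for x in v) * dt for v in population]
print(min(energies), max(energies))          # all equal to E by construction
```

The equal-energy normalization keeps the genetic search focused on the pulse shape rather than on its overall intensity.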
From the elementary analysis of Eq.(4.82) it is clear that the optimal pulse should first transfer an electron from the
Fig. 4.15 Optimal control field for the isolated double quantum dot (Γ = 0). (a) Solid line: reference square pulse of duration τ = π/Ω_max, intensity V₀ yielding the first maximum of J₁(V₀/ℏω) (see text), and energy E₀. Dashed line: optimal pulse shape for the maximization of the charge transferred from the left to the right quantum dot. The pulse energy is E₀. (b) Corresponding time-dependence of the occupation ρ22(t) on the second dot for the pulses shown in (a).
left to the right quantum dot (inversion of the occupations) and then keep this situation as long as possible. However, there are many different pulse shapes able to achieve this, and it is a priori not clear which one maximizes Q_T. There are three basic time scales involved in the problem: the control time interval T, the period of the Rabi oscillations t_Ω = Ω_R⁻¹, and the decoherence time t_Γ = ℏ/Γ. For Γ → 0, for example, Eqs.(4.82) can be solved analytically in some limiting cases. If, for example, the external field is periodic in time, V(t) = V₀ cos ωt, with a constant amplitude V₀, an electron placed on one of the dots will oscillate back and forth between the
Fig. 4.16 Optimal pulse shape for the maximization of the charge transferred from the left to the right quantum dot. Pulse energy is E = 0.57E₀, Γ = 10⁻⁶d. Inset: the corresponding time-dependence of the occupation ρ22(t) on the second dot.
dots with the Rabi frequency [Zeldovich (1967); Tien (1963)]:

Ω = (2d/ℏ) J₁(V₀/ℏω),  (4.90)
where J₁ is the Bessel function of order 1, if the system absorbs one photon. Note that ω must fulfill the resonance condition ℏω = √(Δε² + 4d²). The description of the tunnelling dynamics for pulses of varying intensity is more complicated, because the Rabi frequency changes in time. Thus, for pulses of constant amplitude there is an upper limit Ω_max for the Rabi frequency, which is obtained when the ratio x = V₀/ℏω is the first maximum of the function J₁(x). Using this property we construct a reference pulse of square shape (V(t) = V₀ for 0 < t < τ and V(t) = 0 otherwise) with the intensity V₀ defined above and the duration τ = π/Ω_max. In the following we will use the energy E₀ of this reference pulse as the unit of pulse energies. In principle one would expect that the reference pulse defined above achieves an inversion of the occupation in the double quantum dot within the shortest time (assuming only one-photon absorption). However, as we show below, such a pulse shape is not absolutely the best one, and one can do slightly better beyond the adiabatic approximation, which was assumed valid in deriving Eq.(4.90). In Fig.4.15 we compare the
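The construction of the reference pulse is easy to reproduce numerically. The sketch below (an illustration, not the book's code; ℏ = d = 1 is an arbitrary choice of units) evaluates J₁ from its integral representation using only the standard library, locates its first maximum, and computes the corresponding duration τ = π/Ω_max:

```python
import math

# J1(x) = (1/pi) * integral_0^pi cos(tau - x*sin(tau)) dtau, via the midpoint rule
def bessel_j1(x, n=400):
    h = math.pi / n
    return sum(math.cos((k + 0.5) * h - x * math.sin((k + 0.5) * h))
               for k in range(n)) * h / math.pi

# first maximum of J1, known to lie near x = 1.8412 where J1 = 0.5819
xs = [0.01 * k for k in range(1, 400)]
x_max = max(xs, key=bessel_j1)
print(round(x_max, 2), round(bessel_j1(x_max), 4))

# reference square pulse (hbar = d = 1): intensity V0 = x_max * hbar * omega,
# Omega_max = (2 d / hbar) * J1(x_max)  [Eq. (4.90)], duration tau = pi / Omega_max
Omega_max = 2.0 * bessel_j1(x_max)
tau = math.pi / Omega_max
print(tau)
```

Since max|J₁| ≈ 0.582 < 1, the shortest square pulse achieving inversion is necessarily longer than half a period of the bare Rabi oscillation at Ω = 2d/ℏ.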
Fig. 4.17 (a) Optimal pulse shape which induces maximal current for Γ = 0.01d. Pulse energy is E = 4.26E₀. (b) Corresponding behavior of ρ22(t).
effect induced on the isolated double quantum dot (Γ = 0) by the reference square pulse with that induced by the optimal one calculated using the genetic algorithm and having the same energy E₀. As one can see in Fig.4.15(b), the genetic algorithm finds a pulse shape which induces a slightly faster transfer of the charge. This result inspires us to perform calculations for more complicated problems. In the following examples we also search for the minimal pulse energy E. Note that, if no constraints are imposed on the width of the pulses, a pulse of infinitely small width (a delta-like pulse) would yield the maximal Q_T. Such a pulse would produce ρ22(t) = 1 over the whole control time interval, leading to the maximum possible value Q_T^max = eΓT/ℏ + e. However, the energy of such a pulse would diverge. Pulses with zero width and infinite energy cannot describe a realistic situation. Moreover, for such pulses the whole model would break down, since a very narrow pulse has a very large
Fig. 4.18 (a) Optimal pulse shape for Γ = 0.05d. Pulse energy is E = 121E₀. (b) The corresponding behavior of ρ22(t).
spectral width and would excite many levels in each quantum dot. In Fig.4.16 we show the optimized pulse shape for the maximization of the charge transfer in the almost isolated double quantum dot (Γ = 10⁻⁶d). The optimal pulse excites the system at the beginning of the control time interval, inducing the inversion of the occupations. ρ22(t) reaches the value 1 when the pulse goes to zero. Since ΓT/ℏ is very small, this occupation remains constant in time. As a consequence Q_T is maximized. From the comparison between Fig.4.15 and Fig.4.16 we see that the limitation of the minimal pulse width that we employed for these calculations leads to more symmetric and smooth optimal solutions. The corresponding evolution of the occupation of the second quantum dot is shown in the inset of Fig.4.16. In Fig.4.17 we show the optimal field envelope and the induced occupation ρ22(t) dynamics in the case of weak coupling to the reservoirs, with coupling constant Γ = 0.01d. Note that the optimal field is structured as a sequence
Fig. 4.19 Illustration of the optimization process using genetic algorithms. Evolution of the "fittest" pulse shape for maximization of the current for Γ = 0.05d.
of two pulses (see Fig.4.17(a)). The first one acts at the beginning and has the proper shape to bring the occupation of the second quantum dot to a value close to 1. However, since here ΓT/ℏ = 1, according to Eqs.(4.82) ρ22(t) starts to decrease as soon as the first pulse goes to zero. Shortly before the end of the control time interval the second pulse brings the occupation ρ22(t) again to a high value (Fig.4.17(b)). The structure of the optimal pulse can be easily interpreted with the help of the expression of Q_T as a functional of ρ22(t) (see Eq.(4.87)). The first pulse tends to keep the term (eΓ₂/ℏ) ∫₀ᵀ dt ρ22(t) as large as possible, whereas the second pulse acts to increase eρ22(T). As a consequence, Q_T is maximized. Figure 4.18 shows the results for the same system, but with a larger coupling constant, namely Γ = 0.05d. As can be seen in Fig.4.18(a), in this case the optimal solution also exhibits pulses at the beginning and at the end of the control interval, but also a complicated sequence of pulses between them.
Pulse shape          Q_T
optimal pulse        1.29
rectangular pulse    0.85
Gaussian pulse       0.74
constant pulse       0.77
From this example one can learn that such a structure is a general property of the control fields for systems with decoherence, and one finds this property for various functionals to be optimized. In order to investigate the influence of the inter-dot Coulomb repulsion U we perform calculations similar to those described above, but for the case U → ∞, using the same set of coupling parameters Γ. We found that the repulsion between the quantum dots leads to a relatively smaller net transferred charge (see Fig.4.22). This is due to the fact that U → ∞ prevents double occupancies in the system. Therefore an electron from the left reservoir can jump into the double quantum dot only when the previous electron has already left the system and been transferred to the right reservoir.
Finally, to emphasize our results and to show that pulse shaping can indeed lead to a remarkable enhancement of the photon-assisted current through double quantum dots, we indicate in the Table the values of the transferred charge Q_T for the coupling constant Γ = 0.01d and pulses having different shapes V(t) but carrying the same energy E. As expected, the optimal pulse found by the genetic algorithm (already shown in Fig.4.17) induces clearly more transferred charge than pulses of other shapes. It is important to point out that the rectangular and Gaussian pulses mentioned in the Table are the fittest ones among rectangular and Gaussian pulses, respectively. Thus, the optimal pulse induces 1.74 times more charge than the best Gaussian pulse, and 1.5 times more charge than the best rectangular pulse.
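The dominance of the two-pulse structure can be made plausible with a deliberately crude toy model (this is not the book's Eq.(4.82); here each pulse is treated as an instantaneous inversion, the occupation relaxes exponentially toward the saturation value 1/2, and γ stands in for Γ/ℏ):

```python
gamma, T, dt = 0.01, 100.0, 0.001    # relaxation rate, control interval, step

def charge(pulse_times):
    """Q_T of Eq. (4.87) in units of e: gamma * int p22 dt + p22(T),
    for instantaneous inversion pulses applied at the given times."""
    pulses = sorted(pulse_times)
    p, t, integral = 0.0, 0.0, 0.0
    while t < T:
        if pulses and t >= pulses[0]:
            pulses.pop(0)
            p = 1.0                        # pulse: invert the occupation
        p += -gamma * (p - 0.5) * dt       # relax toward p22 = 1/2
        integral += p * dt
        t += dt
    return gamma * integral + p

q_one = charge([0.0])                 # single pulse at the beginning
q_two = charge([0.0, 99.0])           # extra pulse just before the end
print(q_one, q_two)                   # the second pulse boosts the p22(T) term
```

In this toy limit a single initial pulse gives exactly Q = 1 + γT/2, while the extra late pulse raises the final-occupation term eρ22(T), mirroring the structure of the optimal fields in Fig.4.17.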
4.7 Analytical theory for control of multi-photon transitions
Using the Floquet formalism and the adiabatic approximation, we develop a theory for the optimal control of a quantum system interacting with an external electric field under multi-photon resonance. Optimal solutions for the case of control over a time interval are presented. We investigate how the order of the photon resonance affects the shape of the optimal control field. In this chapter we have already demonstrated that some analytical solutions of optimal control problems can be obtained using the adiabatic approximation together with the variational formalism. As was shown in the previous sections, the effect of weak decoherence on the shape of optimal control fields can also be taken into account. All these analytical and semi-analytical results were obtained under the assumption of a weak control field and one-photon resonance. However, the weak response limit is not restricted to single-photon processes. For example, one can consider optical control of an excited state which is not accessible via a direct electric dipole transition from the initial ground state. In this case a two- or even multi-photon process will be responsible for the population transfer. Since multi-photon interaction usually implies significant nonlinearity of the corresponding system's dynamics, the integration is usually done numerically, because the overall mechanism is quite complex due to the multi-photon nature of the laser-driven dynamics. For example, in [Speer (2000)], using direct numerical integration of the equations of motion for a double quantum dot system, it was shown that under two-photon
resonance the population transfer can be more effective than in the one-photon case, even if one restricts oneself to simple Gaussian control pulses. That is why it is highly desirable to develop an analytical theory which helps to gain more insight into the optimal control of quantum systems under multi-photon resonance. Our approach to the optimal control problem is based on the derivation of analytical solutions for the occupation of the quantum level using the adiabatic approximation and the subsequent application of the variational principle. The corresponding Euler-Lagrange equations determining the optimal control field are derived in closed and integrable form, so we can determine qualitatively and quantitatively how the order of the multi-photon resonance changes the shape of the optimal control field. We consider an N+1 level quantum system, described by the Hamiltonian H₀ with eigenstates |i⟩ (i = 1, …, N+1), which interacts with a classical laser pulse. The total Hamiltonian of the system is then of the form:

H(t) = H₀ + μV(t) cos(ωt),  (4.92)

where μ is the dipole matrix and V(t) describes the envelope of the control field. Let us first focus on the case of a three-level system and two-photon resonance. Let E₁, E₂, E₃ be the corresponding energy levels, described by the amplitudes a₁, a₂, a₃, interacting with an external control field E(t) = V(t) cos(ωt). We assume the so-called "cascade" scheme of levels: we take into account the transitions between levels |1⟩ and |2⟩ and between levels |2⟩ and |3⟩, with the corresponding dipole matrix elements μ21 and μ23, and neglect the direct transition between |1⟩ and |3⟩ (μ13 ≈ 0). The dynamics of the three-level quantum system interacting with the external field can then be described as:

iℏ ȧ₁ = E₁a₁ + μ21 V(t) a₂ cos(ωt),
iℏ ȧ₂ = E₂a₂ + μ21 V(t) a₁ cos(ωt) + μ23 V(t) a₃ cos(ωt),  (4.93)
iℏ ȧ₃ = E₃a₃ + μ32 V(t) a₂ cos(ωt).

We choose the initial condition a₁(0) = 1, a₂(0) = a₃(0) = 0. In order to obtain analytical solutions in closed form it is useful to assume that the field envelope V(t) changes adiabatically slowly with time. Thus we restrict ourselves to optimal control fields whose modulation of the
field's envelope does not affect the carrier frequency ω. We also assume the case of two-photon resonance:

2ω = ω₁ + ω₂.  (4.94)
Here we use the notation ℏω₁ = E₂ − E₁, ℏω₂ = E₃ − E₂ for the transition frequencies. Because the Hamiltonian of the system is almost periodic in time (for a field envelope V(t) changing slowly on the time scale of ω⁻¹), we can use the Floquet approach (see, for example, the nice review [Grifoni (1998)]) in order to get analytical solutions for the amplitudes a₁, a₂, a₃. We represent each amplitude aᵢ (i = 1, 2, 3) as an infinite series of quasi-energy functions a_{i,l}:

aᵢ(t) = Σ_{l=−∞}^{∞} a_{i,l}(t) exp(ilωt).  (4.95)
By substituting this expression into Eq.(4.93) and equating the terms with the same frequencies, one obtains an infinite system of coupled equations. Neglecting rapidly oscillating quantities and keeping only the equations containing the resonant terms a_{1,0}, a_{2,−1}, a_{3,−2}, we obtain:

iℏ ȧ_{1,0} = E₁ a_{1,0} + μ21 V(t) a_{2,−1}/2,
iℏ ȧ_{2,−1} = (E₂ − ℏω) a_{2,−1} + μ21 V(t) a_{1,0}/2 + μ23 V(t) a_{3,−2}/2,  (4.96)
iℏ ȧ_{3,−2} = (E₃ − 2ℏω) a_{3,−2} + μ32 V(t) a_{2,−1}/2.
For a slowly changing field amplitude V(t) and under the condition of exact two-photon resonance, 2ℏω = E₃ − E₁, one can apply the adiabatic approximation and obtain an analytical expression for the occupation of the third level, ρ33(t):

ρ33(t) = |a₃(t)|² ≈ |a_{3,−2}(t)|² = sin²( (μ21 μ23 / 4ℏ²Δ₂) ∫₀ᵗ V²(t′) dt′ ),  (4.97)

where Δ₂ = ω₁ − ω is the (non-zero) detuning of the field with respect to the intermediate level. Note that the occupation of the system begins to oscillate between levels 1 and 3 (similarly to Rabi oscillations) with a frequency proportional to the square of the field amplitude, which is characteristic of the two-photon process. Within our approximations the occupation of the intermediate state ρ22 ≈ |a_{2,−1}|² remains approximately zero.
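Formula (4.97) can be checked directly by integrating the resonant system (4.96) numerically. The sketch below is an illustration only (ℏ = 1, equal dipoles μ21 = μ23 = μ so that the AC Stark shifts of levels 1 and 3 cancel in the relative phase, exact resonance E₁ = 0, E₃ = 2ω, and an arbitrary slow sin² envelope); it compares the numerically obtained ρ33 with the adiabatic sin² expression:

```python
import math

mu, Delta2, T, V0 = 0.2, 1.0, 200.0, 1.0   # hbar = 1; Delta2 = E2 - omega

def V(t):                                  # slow sin^2 envelope of duration T
    return V0 * math.sin(math.pi * t / T) ** 2

def deriv(t, a):
    """Right-hand side of Eq. (4.96) in the frame with E1 = 0, E3 = 2*omega."""
    a1, a2, a3 = a
    v = mu * V(t) / 2.0
    return (-1j * v * a2,
            -1j * (Delta2 * a2 + v * (a1 + a3)),
            -1j * v * a2)

a = (1.0 + 0j, 0j, 0j)                     # a_{1,0}(0) = 1
t, dt, energy = 0.0, 0.01, 0.0
while t < T:                               # 4th-order Runge-Kutta
    k1 = deriv(t, a)
    k2 = deriv(t + dt / 2, tuple(x + dt / 2 * k for x, k in zip(a, k1)))
    k3 = deriv(t + dt / 2, tuple(x + dt / 2 * k for x, k in zip(a, k2)))
    k4 = deriv(t + dt, tuple(x + dt * k for x, k in zip(a, k3)))
    a = tuple(x + dt / 6 * (p + 2 * q + 2 * r + s)
              for x, p, q, r, s in zip(a, k1, k2, k3, k4))
    energy += V(t) ** 2 * dt               # running pulse energy, int V^2 dt
    t += dt

p33_numeric = abs(a[2]) ** 2
p33_adiabatic = math.sin(mu * mu / (4 * Delta2) * energy) ** 2   # Eq. (4.97)
print(p33_numeric, p33_adiabatic)
```

For these parameters the intermediate-level occupation stays of order (μV₀/2Δ₂)² = 0.01, so the two results agree to a few percent.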
Let us now analyze the obtained result. Using the expression for ρ33(t), Eq.(4.97), the first observation is that the final value of the occupation ρ33 depends only on the pulse energy E ∝ ∫₀ᵀ V²(t′)dt′ and does not depend on the pulse area θ ∝ ∫₀ᵀ V(t′)dt′ of the control field, in contrast to the case of one-photon resonance. Fixing the final state of the system (for example, ρ33(T) = 1) restricts the pulse energy E to certain values. But at the same time one can find an infinite set of field envelopes V(t) with the same energy E but with different final pulse areas θ_T ∝ ∫₀ᵀ V(t′)dt′. This result differs dramatically from the case of one-photon resonance, which we considered earlier [Grigorenko (2002)], where fixing the final state and the pulse energy uniquely determines the shape of the driving field V(t). From Eq.(4.97) it is easy to see that the population of the third level becomes maximal if the energy of the pulse is
E_tot = (2πℏ²Δ₂ / μ21μ23)(1 + 2n),   n = 0, 1, …  (4.98)
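A quick self-consistency check (illustrative parameter values; Eq.(4.98) taken in the form E_tot = (2πℏ²Δ₂/μ21μ23)(1+2n)) confirms that these pulse energies drive the two-photon phase in Eq.(4.97) to π/2 + πn, i.e. to full population of the third level:

```python
import math

hbar, mu21, mu23, Delta2 = 1.0, 0.3, 0.5, 2.0   # arbitrary illustrative values
for n in range(4):
    E_tot = 2 * math.pi * hbar ** 2 * Delta2 / (mu21 * mu23) * (1 + 2 * n)
    phase = mu21 * mu23 / (4 * hbar ** 2 * Delta2) * E_tot  # argument of Eq. (4.97)
    p33 = math.sin(phase) ** 2
    print(n, phase / math.pi, p33)   # phase = (1/2 + n) * pi, so p33 = 1
```

Note that the dependence on μ21, μ23 and Δ₂ cancels identically in the phase, as it must.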
It is possible to generalize the solution given by Eq.(4.97) to the case of an N+1 level system with the cascade scheme of transitions (assuming transitions only between the neighboring states |l⟩ and |l+1⟩, l = 1, …, N) and an N-photon resonance: ℏNω = E_{N+1} − E₁. By using the same quasi-energy method in the resonant approximation together with the adiabatic approximation, we can obtain the solution for the upper-level occupation ρ_{N+1,N+1}(t) in closed form:

ρ_{N+1,N+1}(t) = sin²( ∫₀ᵗ Ω_N(t′) dt′ ),  (4.99)

where Ω_N(t) is the generalized N-photon Rabi frequency, proportional to the N-th power of the field envelope, V^N(t).
Fig. 5.4 Purity decay rate |dP/dt| as a function of the parameter w (see Fig.5.3) for T = 10⁻⁷Δ (solid line) and T = Δ (dashed line).
chiral spin liquids [Kitaev (2003)]. Thus, albeit enjoying an exceptionally high degree of coherence, the (topo)logical qubits require an enormous overhead in encoding, since creating only a handful of such qubits takes a macroscopically large number of interacting physical ones. Besides, the nearly perfect isolation of (topo)logical qubits from the environment can also make initialization of and read-out from such qubits rather challenging. Nonetheless, the results we have presented in this section suggest that the much more modest idea of augmenting the other decoherence-suppression techniques with properly tailored permanent inter-qubit couplings might still result in a substantial improvement of the quantum register's performance. Moreover, the optimization method employed in this section can be rather straightforwardly extended to the case of time-dependent gates and other quality quantifiers (quantum degree, entanglement capability, etc.), as well as beyond the Bloch-Redfield approximation and the assumption of Ohmic dissipative environments. The rapid pace of technological progress in solid-state quantum computing (particularly the phase [Mooij (1999)], charge [Nakamura (2003)] and charge-phase [Vion (2002)] superconducting qubit architectures) provides one with a strong hope that the specific prescriptions towards building robust qubits and their assemblies discussed in this section could be implemented before long.
5.2 Optimal design of universal two-qubit gates
Now let us construct optimized implementations of some universal two-qubit gates that, unlike most of the previously proposed protocols, are carried out in a single step. The new protocols require tunable inter-qubit couplings but, in return, show significant improvements in the quality of the gate operations. Our optimization procedure can be further extended to combinations of elementary two-qubit as well as irreducible many-qubit gates [Grigorenko (2005)]. According to one of the central results of quantum information theory, an arbitrarily complex quantum protocol can be decomposed into a sequence of single-qubit rotations and two-qubit gates [DiVincenzo (1995)]. However, although it provides a convenient means of designing logical circuits, in practice such a decomposition may not achieve the shortest possible times of operation and, concomitantly, the lowest possible decoherence rates. In practical implementations the latter are crucially important, since any realistic qubit system always suffers the detrimental effect of its dissipative environment. Recently, there have been various attempts to improve the performance of universal quantum gates by searching for optimal implementations among the entire class of two-qubit Hamiltonians with the most general time-dependent coefficients. However, the outcome of a typical tour-de-force variational search such as that of [Niskanen (2003)] appears to be a complicated sequence of highly irregular pulses whose physical content might remain largely obscure. In search of a more sophisticated analytical approach, a number of authors applied optimal control theory with the goal of implementing an arbitrary unitary transformation independently of the initial state [Ramakrishna (2002)]. The resulting complex system of nonlinear integro-differential equations can be solved numerically with the help of Krotov's or similar iterative algorithms [Krotov (1996)]. Conceivably, a significantly simpler alternative to the above approaches
would be a straightforward implementation of a desired unitary transformation in the smallest possible number of steps, during which the parameters of the qubit Hamiltonian remain constant. One such example is given by the two-qubit SWAP gate, which can be readily implemented (up to a global phase) with the use of a Heisenberg-type inter-qubit coupling that remains constant for the duration of the gate operation (see, e.g., [Zhang (2003)]). We are going to construct one-step implementations of some universal gates (e.g. the CNOT gate). In contrast with previous works, where a constant decoherence rate was assumed and, therefore, the overall loss of coherence accumulated during a gate operation was evaluated on the basis of its total time, we quantify the adverse effect of the environment by actually solving the corresponding master equation for the density matrix of the coupled qubits. In this way, we account for the fact that the decoherence rates depend on (and vary with) the changing (in a step-wise manner) parameters of the time-dependent two-qubit Hamiltonian. On the basis of our general conclusions, we also make specific predictions for such a viable candidate for the role of a robust two-qubit gate as a pair of charge-phase Josephson junction qubits (dubbed "quantronium" in [Vion (2002)]) which are tuned to their optimal points and coupled (both inductively and capacitively) to each other. The problem of implementing a given unitary transformation in the course of the quantum mechanical evolution of a generic two-qubit system can be formulated as the condition of the minimum deviation

‖X − T exp(−(i/ℏ) ∫₀^{t₀} H(xᵢ(t′)) dt′)‖ → min  (5.15)

between the time-ordered evolution operator governed by the Hamiltonian H and the target unitary transformation X. Here |xᵢ(t)| ≤ aᵢ represent the tunable control parameters, whose physically attainable values are generally bounded, and the Frobenius trace norm is defined as ‖Y‖ = √(Tr[Y†Y]).
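For a Hamiltonian that is constant during the gate, the time-ordered exponential in Eq.(5.15) collapses to an ordinary matrix exponential, which makes the deviation trivial to evaluate. A minimal numpy sketch (the single-qubit example and all numbers are illustrative choices, not from the book):

```python
import numpy as np

def deviation(X, H, t0, hbar=1.0):
    """Eq. (5.15) for a quasi-stationary Hamiltonian:
    ||X - exp(-i H t0 / hbar)||, with the Frobenius norm ||Y|| = sqrt(Tr[Y^dag Y]).
    H must be Hermitian; the exponential is built from its eigendecomposition."""
    w, U = np.linalg.eigh(H)
    evol = (U * np.exp(-1j * w * t0 / hbar)) @ U.conj().T
    return float(np.linalg.norm(X - evol))      # Frobenius norm by default

# single-qubit sanity check: H = (pi / (2 t0)) (sigma_x - 1) implements the
# X gate exactly, with the global phase included
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
t0 = 1.0
H = (np.pi / (2.0 * t0)) * (sx - np.eye(2))
print(deviation(sx, H, t0))                 # essentially zero
print(deviation(sx, np.zeros((2, 2)), t0))  # doing nothing misses by ||sx - 1||
```

Minimizing this quantity over the bounded control parameters is exactly the variational problem posed in the text.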
In the standard basis where σ_z^{1,2} are diagonal, the noiseless part of Eq.(5.1) takes the form

H₀ = ( Jz+ε₁+ε₂    Δ₂           Δ₁           Jx−Jy
       Δ₂           ε₁−ε₂−Jz     Jx+Jy        Δ₁
       Δ₁           Jx+Jy        ε₂−ε₁−Jz     Δ₂
       Jx−Jy        Δ₁           Δ₂           −ε₁−ε₂+Jz ).  (5.16)
In order to facilitate our analysis of the decohering effect of the environment, in what follows we again, as in the previous section, assume the Ohmic nature of the dissipative reservoirs ⟨hᵢ(t)hᵢ(t′)⟩, described by the spectral function S(ω) = αω coth(ω/2T) θ(ω_c − ω) with bandwidth ω_c, where θ(ω) is the step function. This assumption allows one to treat the environment in the standard Bloch-Redfield (i.e., weak-coupling and Markovian) approximation. Below, we restrict our analysis to Hamiltonians that remain constant for the entire duration of the gate operation (H(t) = H₀θ(t)θ(t_end − t)) and, therefore, commute at different times ([H(t), H(t′)] = 0). We then demonstrate that within such a class of quasi-stationary Hamiltonians, the problem of constructing an optimized (coherence-wise) implementation of a given universal gate allows for a rather simple and physically transparent solution. To that end, we invoke a direct relationship between the possibility of decoherence suppression and a spectral degeneracy, as revealed by the analysis, done in the previous section, of the problem of preserving an initial state ("quantum memory") of an idling pair of coupled qubits (see also the later works [You (2004)] for similar results). Namely, as we have just shown above, the contribution to the overall decoherence factor stemming from the relaxation processes can be significantly reduced by tuning the Hamiltonian parameters towards the point where a pair of the lowest eigenvalues of the quasi-stationary Hamiltonian Eq.(5.29) becomes degenerate. The underlying (energy exchange-based) mechanism of the suppression of relaxation can be explained by the fact that near a degeneracy point and at low temperatures the partial relaxation rates (5.7) vary with the transition frequencies ω_ij as a sum Σ_{i,j=1}^{4} c_ij |ω_ij|, where the coefficients c_ij are essentially independent of ω_ij.
Therefore, the relaxation rates attain their minimum values at those points in the parameter space where the largest possible number of the transition frequencies vanish as a result of the onset of degeneracy. Notably, the contribution due to the other, pure dephasing, processes is generally unavoidable and can only be suppressed by lowering the temperature of the reservoirs. In order to further illustrate the above point, in Fig.5.5 we plot the decoherence rate |dP/dt| (note that in the Markovian approximation and in the short-time limit the purity decays as a linear function of time) as a function of Jy and Jz, and in Fig.5.6 we plot the function which quantifies the degree of the degeneracy, Q = |E₁ − E₄| + |E₂ − E₃|. Obviously, its minima correspond to the double degeneracy, which helps us to identify the above described effect. Again, we kept the length of the interaction vector J constant, since in the case of realistic Josephson qubits an unlimited increase of J would result in an unwanted leakage from the designated two-qubit Hilbert subspace. Besides, for the sake of physical clarity, in obtaining Figs.5.5 and 5.6 we put the parameters of both (chosen to be identical, Δ₁ = Δ₂ = Δ) qubits into the coherence-friendly "quantronium regime" of Ref.[Vion (2002)]: ε₁ = ε₂ = 0. Figure 5.5 demonstrates that |dP/dt| has its absolute minimum at the point characterized by the incidence of the double degeneracy (E₁ = E₄, E₂ = E₃) between the eigenvalues given by the expressions

E₁,₂ = Jx ∓ √((Δ₁ + Δ₂)² + (Jy − Jz)²),
E₃,₄ = −Jx ± √((Δ₁ − Δ₂)² + (Jy + Jz)²).  (5.17)
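A quick numpy check (illustrative parameter values, ℏ = 1) confirms that the spectrum (5.17), the matrix (5.16) at the quantronium point ε₁ = ε₂ = 0, and the degeneracy condition Jx = 0, Jy·Jz = Δ² that follows from (5.17) are mutually consistent:

```python
import math
import numpy as np

def H0(D1, D2, Jx, Jy, Jz):
    """Hamiltonian (5.16) at the quantronium point eps1 = eps2 = 0."""
    return np.array([[Jz,      D2,      D1,      Jx - Jy],
                     [D2,     -Jz,      Jx + Jy, D1     ],
                     [D1,      Jx + Jy, -Jz,     D2     ],
                     [Jx - Jy, D1,      D2,      Jz     ]])

def spectrum(D1, D2, Jx, Jy, Jz):
    """The four eigenvalues of Eq. (5.17), returned in ascending order."""
    R1 = math.hypot(D1 + D2, Jy - Jz)
    R2 = math.hypot(D1 - D2, Jy + Jz)
    return sorted([Jx - R1, Jx + R1, -Jx - R2, -Jx + R2])

args = (1.0, 1.0, 0.3, 0.7, 1.9)                 # Delta1, Delta2, Jx, Jy, Jz
print(np.allclose(np.linalg.eigvalsh(H0(*args)), spectrum(*args)))   # True

# double degeneracy at Jx = 0 and Jy*Jz = Delta^2 (here Delta = 1, Jy = 2, Jz = 1/2)
vals = np.linalg.eigvalsh(H0(1.0, 1.0, 0.0, 2.0, 0.5))
print(np.round(vals, 6))   # two doubly degenerate pairs
```

At the degeneracy point both square roots in (5.17) coincide, collapsing the spectrum to two doubly degenerate levels.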
We have already shown in the previous section that the latter occurs for the inter-qubit coupling J^opt satisfying the conditions

J_x^opt = 0,   J_y^opt J_z^opt = Δ².  (5.18)
Note that in Fig.5.6 there are two points corresponding to the double degeneracy. However, the second one, where Jy > Jz, is not coherence-friendly: since we assume the coupling to the reservoirs along the z axis, the symmetry between these two points is broken. The point with Jz > Jy is coherence-friendly because the resulting Hamiltonian approximately commutes with the coupling operator, which leads to additional stability of the two-qubit system. Furthermore, albeit being less effective than the onset of double degeneracy, the emergence of even a single degenerate pair of the two lowest eigenvalues appears to provide a relative improvement as compared to the non-degenerate case. In this case, the two-dimensional degenerate subspace is (partially) protected by the energy gap separating it from the rest of the spectrum, thereby giving rise to the exponential suppression of some relaxation rates at low temperatures. This more relaxed constraint on the Hamiltonian parameters may provide extra freedom in improved implementations of the logic gates. Having identified the conditions providing a suppression of decoherence, we can now try to satisfy them in the case of various universal gates. For a start, we impose the less stringent condition of a single degeneracy between the lowest pair of energy levels, while attempting to find a one-step implementation of the standard CNOT gate
X_CNOT = ( 1 0 0 0
           0 1 0 0
           0 0 0 1
           0 0 1 0 ).  (5.19)
Fig. 5.5 The purity decay rate |dP/dt| as a function of the interaction coefficients Jx and Jy, with Jz = √(J² − Jx² − Jy²), J = 2Δ, T = 0.001Δ.
In the noiseless case, this goal can be accomplished by computing a matrix logarithm of X_CNOT directly. However, the latter appears to feature a substantial degree of ambiguity:
H_CNOT = (iℏ/t₀) log(X_CNOT) = Λ(C + B)Λ†,  (5.20)
Fig. 5.6 The measure of the double degeneracy |E₁ − E₄| + |E₂ − E₃| as a function of the interaction coefficients Jx and Jy, with Jz = √(J² − Jx² − Jy²), J = 2Δ (compare with the previous figure).
where t₀ is the protocol duration; we also use the notation

C = ( 0  0  0     0
      0  0  0     0
      0  0  π/2  −π/2
      0  0  −π/2  π/2 ),  (5.21)

B = 2πn₁ ( 0 0 1 1
           0 0 1 1
           1 1 0 0
           1 1 0 0 )
  + 2πn₂ ( 1 1 0 0
           1 1 0 0
           0 0 1 1
           0 0 1 1 )
  + 2πn₃ E,  (5.22)

where E is the unit matrix, and Λ is a block-diagonal matrix of phase factors,

Λ = diag( e^{iφ₃}, e^{iφ}, … ),  (5.23)
parameterized by the arbitrary integers nᵢ (observe that [B, C] = 0) and the continuous phase variables φ and φ₃. The high dimension of the invariant subspace of the CNOT's equivalence class (see below) is a rather unique property of this particular gate. A straightforward analysis reveals that one does reproduce the CNOT gate up to a global phase.

For a pure state |ψ⟩⟨ψ| of the whole system, the entropy of the subsystem can be used as a measure of entanglement. Note that in this case ρ = ρ². The entropy of any pure state is zero, which is unsurprising since there is no uncertainty about the state of the system. It can also be shown that unitary operators acting on a state of the whole system (such as the time-evolution operator obtained from the Schrödinger equation) leave the entropy unchanged. This associates the reversibility of a process with its resulting entropy change, which is a deep result linking quantum mechanics to information theory and thermodynamics. In the case when the observed subsystem is initially in a pure state, the rate of increase of the subsystem's entropy reflects the leakage of the
information due to coupling to the unobservable subsystem (the second qubit in our language). Explicitly, the marginal density matrix for the first qubit can be written as:

\rho_1 = \begin{pmatrix} \rho_{11} + \rho_{33} & \rho_{34} + \rho_{12} \\ \rho_{43} + \rho_{21} & \rho_{22} + \rho_{44} \end{pmatrix},    (5.28)

where \rho_{ij}, i, j = 1, ..., 4 are the density matrix elements of the total system. As we already did before, let us consider a quantum system of two interacting qubits with the Hamiltonian H:

H = \begin{pmatrix} J_z + \epsilon_1 + \epsilon_2 & \Delta_2 & \Delta_1 & J_x - J_y \\ \Delta_2 & \epsilon_1 - \epsilon_2 - J_z & J_x + J_y & \Delta_1 \\ \Delta_1 & J_x + J_y & \epsilon_2 - \epsilon_1 - J_z & \Delta_2 \\ J_x - J_y & \Delta_1 & \Delta_2 & -\epsilon_1 - \epsilon_2 + J_z \end{pmatrix}.    (5.29)

The system's density matrix \rho(t) satisfies the quantum Liouville equation:

i\hbar \frac{\partial \rho}{\partial t} = L\rho = [H, \rho].    (5.30)
By parameterizing the interaction vector (J_x, J_y, J_z) as J_x = J\cos(\phi)\sin(\theta), J_y = J\sin(\phi)\sin(\theta), J_z = J\cos(\theta), we can plot the entropy of a subsystem (which also characterizes the entanglement in the whole system), S_1 = S_1(t, \phi, \theta). In Fig. 5.8 we plot the entropy S_1 as a function of the parameters \phi, \theta at time t = 30.1\pi, for a particular choice of the Hamiltonian's parameters: \epsilon_1 = \epsilon_2 = 0, \Delta_1 = \Delta_2 = \Delta = 0.0001, and interaction strength J = 1. We set the initial pure state as \rho(0) = diag(1, 0, 0, 0). We have restricted \phi \in [0, \pi] and \theta \in [0, \pi/2] because of the symmetry of the map. In Fig. 5.8 we can see that the entropy map S_1(\phi, \theta) becomes rather complicated; for even larger times it looks more fractal-like. Note that the lighter areas in the plot correspond to lower values of entropy, and the darker areas to higher values. This pattern reflects the sensitivity of the system to small perturbations of the coupling parameters on long time scales.
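The calculation behind such an entropy map can be sketched numerically; the function names below and the choice of units with hbar = 1 are ours, not from the book. Since the initial state is pure and the evolution is unitary, one may propagate the state vector with the matrix exponential of H instead of integrating the Liouville equation (5.30), then trace out the second qubit and evaluate the von Neumann entropy:

```python
import numpy as np
from scipy.linalg import expm

def hamiltonian(J, phi, theta, eps1=0.0, eps2=0.0, d1=1e-4, d2=1e-4):
    """Two-qubit Hamiltonian (5.29) in the basis |uu>, |ud>, |du>, |dd>."""
    Jx = J * np.cos(phi) * np.sin(theta)
    Jy = J * np.sin(phi) * np.sin(theta)
    Jz = J * np.cos(theta)
    return np.array([
        [Jz + eps1 + eps2, d2,               d1,               Jx - Jy],
        [d2,               eps1 - eps2 - Jz, Jx + Jy,          d1],
        [d1,               Jx + Jy,          eps2 - eps1 - Jz, d2],
        [Jx - Jy,          d1,               d2,               -eps1 - eps2 + Jz]])

def subsystem_entropy(J, phi, theta, t):
    """Evolve the pure state |uu> and return the von Neumann entropy of one qubit."""
    H = hamiltonian(J, phi, theta)
    psi0 = np.array([1.0, 0.0, 0.0, 0.0], dtype=complex)
    psi = expm(-1j * H * t) @ psi0          # unitary evolution keeps the total state pure
    rho = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)
    rho1 = np.trace(rho, axis1=1, axis2=3)  # partial trace over the second qubit
    lam = np.linalg.eigvalsh(rho1)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log(lam)).sum())

S = subsystem_entropy(J=1.0, phi=0.3, theta=1.0, t=30.1 * np.pi)
print(S)
```

Scanning phi and theta over a grid with this function reproduces the kind of entropy map discussed above; the entropy is bounded by log 2 for a single qubit.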
5.4
Summary
In this chapter we considered an application of the optimal control and optimal design approach to the problem of quantum computing. It was shown that one can use tunable parameters of the basic elements of a quantum computer in order to optimize its performance and stability against decoherence effects.
We have found that even in the case of a simple two-qubit system one can improve the performance and stability of the system by approximately a factor of 10. Since for larger systems a "plain" design has even less chance of being the optimal realization, the significance of optimal control becomes obvious. One could also expect that the expected difference in performance between optimal and non-optimal realizations should increase with the object's size. But one has to take into account that for larger systems the sensitivity to the optimal design implementation could also increase, because the spectrum of interacting three- and multi-qubit systems comes closer and closer to the continuum spectrum of a macroscopic system. We have also calculated the entropy of a pair of interacting qubits, and have shown its complex dependence on the interaction parameters.
Chapter 6
Forecasting of complex dynamical systems
There is time for fairy tales,
For the magic tree,
For the golden-haired girl,
For the honey-eyed dream.
I.G.

As we have discussed in the previous chapters, complex systems are usually associated with situations in which it is extremely difficult to deduce the system's behavior from first principles. For example, at the current level of our knowledge it is simply impossible to predict the behavior of a human being purely from a priori principles. Unlike in quantum mechanics, we cannot formulate "behavioral" variational principles (the universal principle that we mentioned in the first chapter). This means that without such a "behavioral Lagrangian" one cannot derive the corresponding Euler-Lagrange equations. In addition, complex systems are characterized by multiple temporal and spatial scales, which makes forecasting a very nontrivial problem. The approximation and forecasting of dynamical systems seems to be a field in which many open problems must be studied in the near future. In particular, more needs to be said on the actual relationship between the required accuracy and the computational effort.
6.1
Forecasting of financial markets
Financial markets can be regarded as model complex systems. In fact, they are systems composed of many agents which interact with each other in a highly complicated way. Financial markets are continuously monitored: the data exist down to the scale of each single communication of bid and ask of a financial asset (quotes) and at the level of each transaction (trade). The availability of this enormous amount of data allows a detailed statistical description of several aspects of the dynamics of asset prices in a financial market. The results of these studies show the existence of several levels of complexity in the price dynamics of financial assets. Financial markets provide us with high-frequency time series, which are usually multivariate (multi-dimensional), highly correlated, highly volatile, have interesting scaling properties and no simple statistics (like Gaussian random numbers) and, finally, are non-stationary, i.e. their statistical properties change in time! One cannot treat such systems as simply deterministic; the stochastic component is an essential part of them. It was shown (see, for example, [Hegger (1998)]) that an implicit assumption of noise-free input for the time series can lead to systematically wrong results of the model estimation. As a consequence, one obtains a biased, incorrect estimation of the model parameters. The higher the actual noise level, the greater this effect. Thus, one has to use an approach that tackles noisy time series. All this makes the forecasting of financial markets an extremely complicated and challenging problem. Besides that, there is, of course, a philosophical question: is it possible at all to predict the future, and if the answer is positive, how can one quantify predictability? Let me leave this question open. Let us mention some of the most common and frequently used forecasting methods, like multi-agent forecasting programs and packages, Neural Networks, and different autoregressive models, including fractionally integrated ones (we gave a short introduction to fractional derivatives in chapter 3): AR, ARMA, ARIMA, GARCH, VAR, ARFIMA, FGARCH [Hendry (2001)].
There are also forecasting techniques based on Chaos Theory and on econometric models. In the following paragraphs we are going to talk about some of these methods.
6.2
Autoregressive models
A category of models that is commonly used for forecasting is the class of autoregressive models. The main idea is to predict future values of the time series with the help of past (lagged) values. One can write an autoregressive model as:

Y_t = \sum_{i=1}^{N} a_i Y_{t-i} + Z_t,    (6.1)

where Z_t is some stochastic process, for example a white noise sequence. Intuitively, one can understand white noise as a completely random sequence without systematic structure. In terms of autocorrelation it is "delta"-correlated in time,

\langle Z_t Z_{t'} \rangle = \sigma^2 \delta(t - t'),    (6.2)

with variance \sigma^2. If the lead time h > 0, we are doing ex ante forecasts: after fitting a model, we estimate a future value Y_{n+h} at time n based on the fitted model, while the actual value of Y_{n+h} is unknown. The forecast is h steps ahead; h is known as the lead time or horizon.
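As an illustration, the coefficients a_i of model (6.1) can be estimated by ordinary least squares on the lagged values, and the fitted model can then be iterated to produce an h-step-ahead forecast. The helper names below are ours; a real application would also select the order N and account for observational noise, as discussed above:

```python
import numpy as np

def fit_ar(y, order):
    """Least-squares fit of an AR(order) model Y_t = sum_i a_i Y_{t-i} + Z_t."""
    # Each row of X holds (y_{t-1}, ..., y_{t-order}) for one target value y_t.
    X = np.column_stack([y[order - i - 1: len(y) - i - 1] for i in range(order)])
    target = y[order:]
    coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coeffs

def forecast(y, coeffs, horizon):
    """Iterated h-step-ahead (ex ante) forecast from the fitted coefficients."""
    hist = list(y)
    for _ in range(horizon):
        lags = hist[::-1][:len(coeffs)]        # most recent value first
        hist.append(float(np.dot(coeffs, lags)))
    return hist[len(y):]

# Example: recover the coefficient of a noiseless AR(1) series y_t = 0.9 y_{t-1}.
y = 0.9 ** np.arange(60)
a = fit_ar(y, order=1)
print(a, forecast(y, a, 3))
```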
6.3
Chaos theory embedding dimensions
If we construct a forecasting model, we have to take care that this model is not overcomplicated. For example, we would like to identify the order of the autoregressive model. One can do it with the help of chaos theory and the so-called embedding theorem. Suppose we have a scalar time series x_1, x_2, ..., x_N. We make a time-delay reconstruction of the phase space with the reconstructed vectors:

V_n = (x_n, x_{n-t}, ..., x_{n-(d-1)t}),    (6.4)

where t is the time-delay, d is the so-called embedding dimension, and n = (d-1)t + 1, ..., N. Here d represents the dimension of the state space of the underlying system, and the time-delay (time lag) t represents the time interval between sampled observations used in constructing the d-dimensional embedding vectors. If the time series is generated by a deterministic system, then by the embedding theorems [Takens (1981)] there generically exists a function F such that

V_{n+1} = F(V_n),    (6.5)

if the observation function of the time series is smooth and d is sufficiently large. This mapping has the same dynamic behavior as that of the original unknown system in the sense of topological equivalence. The remaining problem is then how to choose t and d, i.e. the time-delay and the embedding dimension, such that the above mapping exists. From the Takens theorem, it does not matter what time delay is selected in a "generic" sense. But in practice, because we have only a finite number of data points available with finite measurement precision, a good choice of t is deemed to be important in phase space reconstructions. In addition, determining a good embedding dimension d depends on the actual choice of t. Another interesting issue is the choice of the embedding dimension from a time series. Generally there are three basic methods used in the published literature: computing some invariant (e.g., correlation dimension, Lyapunov exponents) on the attractor (e.g., [Grassberger (1983)]), singular value decomposition [Broomhead (1986)], and the method of false neighborhoods [Kennel (1992)]. For more discussion of this topic see, e.g., [Ott (1984)].
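The time-delay reconstruction (6.4) itself is only a few lines of index arithmetic; the function name below is ours:

```python
import numpy as np

def delay_embed(x, d, t):
    """Time-delay reconstruction: rows are V_n = (x_n, x_{n-t}, ..., x_{n-(d-1)t})."""
    n0 = (d - 1) * t                     # first index with a full history available
    return np.array([[x[n - k * t] for k in range(d)] for n in range(n0, len(x))])

# Example: embed a short series with dimension d = 3 and lag t = 2.
emb = delay_embed(np.arange(10.0), 3, 2)
print(emb.shape)
```

The resulting matrix of vectors V_n is the input to the order-selection diagnostics mentioned above (correlation dimension, SVD, false nearest neighbors).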
6.4
Modelling of economic "agents" and El Farol bar problem
As we have mentioned, economic modelling is one of the most challenging problems. Some interesting insights can be gained from the modelling of economic "agents" with bounded computational skills and/or resources. One can use artificial agents to simulate individual investors. The ultimate goal of each agent is to maximize its individual wealth. To accomplish this goal, each agent must choose between investing in either a risky or a risk-free asset by developing a forecast of the price of the risky security and then determining the position in that security or in the risk-free asset to be taken. One of the most efficient ways to generate a forecast is to use a genetic algorithm to attempt to learn about the behavior of the market. As we have described in the previous chapters, a genetic algorithm is a heuristic search mechanism based on the notion of biological evolution. A pool of potential solutions (in our case, market forecasts) is evaluated against some objective function, and those solutions that produce the best results are kept. Potential solutions that produce inferior results are discarded from the pool. Genetic operators are then applied to the remaining potential solutions to replenish the pool. One of the first, very simple, but rich multi-agent models was developed to solve the El Farol bar problem. The El Farol bar problem is a problem in game theory that was created in 1994 by Brian Arthur [Arthur (1992)]. It got its name from a bar in Santa Fe, New Mexico, which offers Irish music on Thursday nights. The problem is as follows: there is a finite population of people (let's say 100) in a small town. On Thursday night, all of these people want to go to the popular El Farol Bar to hear lovely Irish music. However, the El Farol is quite small (a limited resource), and it's no fun to go there if it's too crowded (or fully occupied).
So, we have a "game" with the following rules. If less than 60% of the population (in our case 60 people) decide to go to the bar this evening, they'll all have a better time than if they stayed at home. If more than 60 people make this decision, they'll all have a worse time than if they stayed at home. The rules of the game require that everyone has to decide at the same time whether they will go to the bar or not; they cannot wait and see how many others go before deciding to go themselves. In some variants of the problem, persons are allowed to communicate with each other before deciding to go to the bar. However, they are not required
to tell the truth. Now you can appreciate the significance and non-triviality of this problem. The problem is constructed in such a way that, no matter what strategy each person uses to decide whether to go to the bar, if everyone uses the same strategy it is guaranteed to fail: if that strategy suggests that the bar will not be crowded, everyone will go, and thus it will be crowded; likewise, if that strategy suggests that the bar will be crowded, nobody will go, and thus it will not be crowded. The extension of the theoretical results to economic multi-agent models seems promising and certainly deserves further work. These applications can be particularly interesting because this kind of language allows us to simulate effects of collective behavior, cooperation, clustering etc. With these models one can capture the workings of the processes stage by stage as they are observed and reproduce the known outcomes. However, one should be aware of recent works which stress that investors are not fully rational, or have at most bounded rationality, and that behavioral and psychological mechanisms, such as herding, may be important in the shaping of market prices.
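A toy simulation illustrates the point. The strategy pool below (each agent predicts attendance from a fixed, randomly weighted average of recent weeks) is a deliberate simplification of Arthur's original heterogeneous predictors, and all names and parameter values are ours:

```python
import random

def simulate(n_agents=100, threshold=60, weeks=200, memory=5, seed=0):
    """El Farol toy model: an agent goes if its predicted attendance < threshold."""
    rng = random.Random(seed)
    # Every agent gets a fixed, randomly weighted predictor over the last `memory` weeks.
    strategies = [[rng.random() for _ in range(memory)] for _ in range(n_agents)]
    history = [rng.randrange(n_agents + 1) for _ in range(memory)]
    attendance = []
    for _ in range(weeks):
        recent = history[-memory:]
        going = 0
        for w in strategies:
            prediction = sum(wi * hi for wi, hi in zip(w, recent)) / sum(w)
            if prediction < threshold:
                going += 1
        history.append(going)
        attendance.append(going)
    return attendance

att = simulate()
print(min(att), max(att), sum(att) / len(att))
```

Because every agent here uses the same kind of predictor, attendance never settles: a week in which the predictors agree that the bar is empty sends everybody there, and vice versa, which is exactly the self-defeating mechanism described above.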
6.5
Forecasting of the solar activity
Solar activity forecasting is another important topic for various scientific and technological areas, like space activities related to operations of low-Earth orbiting satellites, electric power transmission lines, geophysical applications, and high frequency radio communications. The particles and electromagnetic radiation flowing from solar activity outbursts are also important to long term climate variations, and thus it is very important to know in advance the phase and amplitude of the next solar and geomagnetic cycles. Nevertheless, the solar cycle is very difficult to predict on the basis of time series of various proposed indicators, due to high frequency content, noise contamination, high dispersion level, and high variability in phase and amplitude. This topic is also complicated by the lack of a quantitative theoretical model of the Sun's magnetic cycle. Many attempts to predict the future behavior of solar activity are well documented in the literature. Numerous forecasting techniques have been developed to predict accurately the phase and amplitude of future solar cycles, but with limited success.
6.6
Noise reduction and Wavelets
As we already mentioned, real-life forecasting is usually done on the basis of multi-frequency, noisy data, for example some currency exchange rate during the next month. In this case, before building any forecasting model, it can be useful to filter the high-frequency components from the original time series. There are many ways to do it, and one of the most effective is based on wavelet analysis. Wavelet analysis is a relatively new development in the area of applied mathematics that is just now receiving the attention of many scientists. In particular, one can find many useful applications of wavelets in noise reduction. By design, the usefulness of wavelets rests in their ability to localize a process in time-frequency space, unlike the Fourier transform, which is only localized in frequency, never giving any information about where in space or time the frequency happens. At high frequency levels the wavelet is tight in shape (small time interval) and is able to focus in on short-lived phenomena like singularity points, while at low frequencies the wavelet is stretched out in shape, making it well suited to identifying long periodic processes. By moving from high to low levels of frequency the wavelet is able to zoom in on a process's behavior at a point in time and identify singularities, or alternatively to zoom out and reveal the long and smooth features of a signal. Wavelets can be thought of as the derivative at any order k of a smoothing kernel, under the assumption that the smoothing kernel has at least k-ordered derivatives. Like any smoothing kernel, the kernel from which wavelets are formed is well localized in time space. But unlike normal unimodular smoothing kernels, the smoothing kernel used in deriving a wavelet can take on negative values. This feature of the smoothing kernel enables the wavelet to be well localized in frequency space, improving the decorrelation between the wavelet coefficients, and enabling the wavelet's bandwidth to be increased (decreased) to capture the long and smooth (short and discontinuous) characteristics of a time series. A continuous wavelet is determined as follows:

\psi_{a,b}(t) = |a|^{-1/2} \psi\left(\frac{t-b}{a}\right),    (6.6)

where a > 0 and b is any real number. The function \psi_{a,b} is simply the dilation (by a) and translation (by b) of the function \psi. If a > 1, \psi flattens out horizontally, while 0 < a < 1 tightens \psi. For this reason a is referred
to as the scaling parameter. The |a|^{-1/2} term is a normalizing constant that insures that \psi_{a,b} has an inner product equal to one. When a > 1 the |a|^{-1/2} term causes the vertical height of \psi to be scaled down, while when 0 < a < 1 the vertical height is increased. If \psi is well localized around zero, changing the value of b shifts the time argument, allowing \psi_{a,b} to be well localized around the translation point b; b is referred to as the translation parameter. The function \psi(t) is commonly referred to as the "mother" wavelet. In order for a function to qualify as a "mother" wavelet it must satisfy the admissibility condition, \int \psi(t) dt = 0. This is a necessary condition insuring smoothness and localization in frequency and time space. The admissibility condition can also be interpreted as requiring \psi(t) to be non-unimodular; hence the name wavelets. The wavelet coefficients are given by

\langle x(t), \psi_{a,b} \rangle = |a|^{-1/2} \int x(t) \psi\left(\frac{t-b}{a}\right) dt.    (6.7)

It can be shown that the wavelet coefficients \langle x(t), \psi_{a,b} \rangle represent the details of the signal x(t) at the scale a. They satisfy

\frac{1}{C_\psi} \int\int a^{-2} |\langle x(t), \psi_{a,b} \rangle|^2 \, da \, db = \int |x(t)|^2 dt,    (6.8)

and could be used to reconstruct x by

x(t) = \frac{1}{C_\psi} \int\int a^{-2} \langle x(t), \psi_{a,b} \rangle \, \psi_{a,b}(t) \, da \, db,    (6.9)

where C_\psi = 2\pi \int |\hat{\psi}(\xi)|^2 / |\xi| \, d\xi < \infty. Note that the admissibility condition, \int \psi(t) dt = 0, is implied by C_\psi < \infty if \psi(t) has sufficient decay. The wavelet transform represents an efficient technique for processing signals with time-varying spectra. It can be viewed as a decomposition of a signal in the time-scale plane [Daubechies (1992)], [Chui (1992)], [Mallat (1989)]. There are many application areas of the wavelet transform, such as subband coding, data compression and noise reduction. For reducing the noise of a measured signal many techniques are available, such as filtering, adaptive methods and the wavelet transform. Major interests of the recent papers on noise reduction using the wavelet transform are the determination of the wavelet transform and the choice of thresholding parameters. For noise reduction, the wavelet transform that employs thresholding in the wavelet domain has been proposed by Donoho as a powerful method [Donoho (1994)]. It has been proved that the method
for noise reduction works well for a wide class of one-dimensional and two-dimensional signals. Thresholding in the wavelet domain smooths or removes some coefficients of the wavelet transform of the measured signal. Through the thresholding operation, the noise content of the signal is reduced effectively even in a non-stationary environment. The two well-known thresholding techniques are soft thresholding and hard thresholding [Donoho (1994)].
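A minimal sketch of wavelet-domain soft thresholding, using a single-level Haar transform as a simplified stand-in for the wavelet families used in practice (the function names are ours, and the signal length is assumed even):

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar transform: approximation and detail coefficients."""
    x = np.asarray(x, float)
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def haar_idwt(approx, detail):
    """Exact inverse of haar_dwt."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

def denoise(x, threshold):
    """Soft-threshold the detail coefficients and reconstruct the signal."""
    a, d = haar_dwt(x)
    d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)  # soft thresholding
    return haar_idwt(a, d)
```

With threshold zero the reconstruction is exact; increasing the threshold shrinks the small detail coefficients, which is where most of the high-frequency noise lives.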
6.7
Finance and random matrix theory
It is often necessary to describe the behavior of several (as many as 1000) firms in different sectors of the economy. Common sense suggests that their time evolution can be correlated (or anti-correlated). A possible measure of these correlations can be derived from the correlations of their stock prices. As we discussed before, there are fundamental difficulties in quantifying any kind of correlation between two stocks. In economic systems, unlike physical systems, there is no formalism or theory to calculate the "interaction" between two companies i, j, and it is not clear in which units this "interaction" could be measured. The problem is that although every pair of companies should interact either directly or indirectly, the precise nature of the interaction is unknown or extremely complicated. Of course, in physics there are examples of "indirect" interactions, like "superexchange" or the RKKY interaction. The RKKY interaction is the dominant exchange interaction in metals where there is little or no direct overlap between neighboring magnetic electrons; it acts through an intermediary, which in metals are the conduction electrons (itinerant electrons) [Ruderman (1954)]. However, unlike in physical systems, correlations need not be just pairwise but may rather involve clusters of stocks. The correlations C_{ij} between any two stocks i, j change with time (they are non-stationary). The correlation matrix C has elements

C_{ij} = \frac{\langle Y_i Y_j \rangle - \langle Y_i \rangle \langle Y_j \rangle}{\sigma_i \sigma_j},    (6.10)

where \sigma_i = \sqrt{\langle Y_i^2 \rangle - \langle Y_i \rangle^2} is the standard deviation of the price changes Y_i of company i, and \langle .. \rangle denotes a time average over the period studied. The matrix C can be studied using Random Matrix Theory (RMT) [Wigner (1951)], which predicts the eigenspectrum of perfectly uncorrelated Gaussian random matrices. The deviations of the eigenspectrum of the matrix C from the universal predictions of RMT identify correlations and non-random properties of the specific system. There are generalizations of Random Matrix Theory for non-Gaussian (Levy statistics) random variables, which exhibit so-called "heavy tails" and are more suitable for the description of econometric variables. For further details, see [Plerou (2000)].
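A small numerical experiment makes the RMT comparison concrete. For purely random, uncorrelated return series the eigenvalues of the correlation matrix (6.10) should fall, up to finite-size effects, within the standard Marchenko-Pastur interval for Q = T/N; empirical eigenvalues outside this interval signal genuine correlations. The helper name is ours:

```python
import numpy as np

def correlation_matrix(Y):
    """C_ij of eq. (6.10) for the rows of Y (each row = one stock's price changes)."""
    Y = (Y - Y.mean(axis=1, keepdims=True)) / Y.std(axis=1, keepdims=True)
    return Y @ Y.T / Y.shape[1]

rng = np.random.default_rng(1)
N, T = 50, 2000                      # 50 uncorrelated "stocks", 2000 observations
C = correlation_matrix(rng.standard_normal((N, T)))
eig = np.linalg.eigvalsh(C)

# Marchenko-Pastur edges for purely random correlations with Q = T/N:
Q = T / N
lam_min = (1 - np.sqrt(1 / Q)) ** 2
lam_max = (1 + np.sqrt(1 / Q)) ** 2
print(eig.min(), eig.max(), (lam_min, lam_max))
```

Replacing the synthetic Gaussian data with real return series typically produces a few large eigenvalues well above lam_max, reflecting the market mode and sector clusters discussed above.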
6.8
Neural Networks
Artificial neural networks (ANNs), originally developed to mimic basic biological neural systems, particularly the human brain, are composed of a number of interconnected simple processing elements called neurons or nodes. Each node receives an input signal, which is the total information from other nodes or external stimuli, processes it locally through an activation or transfer function, and produces a transformed output signal to other nodes or external outputs. Many different ANN models have been proposed since the 1980s. Perhaps the most influential models are the multi-layer perceptrons (MLP), Hopfield networks, and Kohonen's self-organizing networks. One can also mention other types of ANNs such as radial-basis function networks, ridge polynomial networks, and wavelet networks. For more examples and possible applications of neural networks in identification, modelling and control of dynamic systems, one can read the nice book [Nørgaard (2000)]. For an explanatory or causal forecasting problem, the inputs to an ANN are usually the past (lagged) variables:

Y_{n+1} = f(Y_n, Y_{n-1}, ..., Y_{n-d}).    (6.11)

Thus the ANN is equivalent to a nonlinear autoregressive model for time series forecasting problems. Before an ANN can be used to perform any desired task, it must be trained to do so. Basically, training is the process of determining the arc weights, which are the key elements of an ANN. The training algorithm is used to find the weights that minimize some overall error measure such as the sum of squared errors (SSE) or mean squared errors (MSE), and we again arrive at a multidimensional optimization problem!
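A minimal one-hidden-layer network trained by full-batch gradient descent on the SSE illustrates the nonlinear autoregression (6.11). All names and hyperparameters here are illustrative choices for a sketch, not a production design:

```python
import numpy as np

def train_mlp_forecaster(y, lags=3, hidden=8, epochs=4000, lr=0.05, seed=0):
    """One-hidden-layer MLP for Y_{n+1} = f(Y_n, ..., Y_{n-lags+1}),
    trained by gradient descent on the sum of squared errors (SSE)."""
    rng = np.random.default_rng(seed)
    X = np.array([y[i:i + lags] for i in range(len(y) - lags)])
    t = y[lags:]
    W1 = rng.normal(0, 0.5, (lags, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, hidden);         b2 = 0.0
    n = len(t)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)             # hidden activations
        pred = h @ W2 + b2
        err = pred - t
        # Backpropagate the SSE gradient through the two layers.
        gW2 = h.T @ err; gb2 = err.sum()
        gh = np.outer(err, W2) * (1 - h ** 2)
        gW1 = X.T @ gh;  gb1 = gh.sum(axis=0)
        W2 -= lr * gW2 / n; b2 -= lr * gb2 / n
        W1 -= lr * gW1 / n; b1 -= lr * gb1 / n
    def predict(window):
        """One-step-ahead forecast from the last `lags` observed values."""
        h = np.tanh(np.asarray(window) @ W1 + b1)
        return float(h @ W2 + b2)
    return predict

# Example: learn to forecast a sampled sine wave one step ahead.
y = np.sin(0.3 * np.arange(200))
predict = train_mlp_forecaster(y)
print(predict(y[-3:]))
```

This is exactly the multidimensional optimization problem mentioned above, solved here by the crudest possible method; practical packages use better optimizers, regularization and validation-based order selection.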
6.9
Summary
In this chapter we made a brief review of the general formulation of different forecasting problems and of different types of approaches to obtaining a satisfactory solution. We outlined the main ideas behind the wavelet transform, which can be useful for filtering noisy data before making a forecast. We described how chaos theory can help us in the effective reduction of the model's dimension. We also mentioned some philosophical aspects of forecasting.
Bibliography
M. Abramowitz, I.A. Stegun, "Handbook of Mathematical Functions", Dover Publications, Inc., New York, (1972).
N. Akman and M. Tomak, "The Wigner molecule in a 2D quantum dot", Physica E 4, 277 (1999).
V. M. Akulin, N. V. Karlov, "Intense Resonant Interactions in Quantum Electronics", Springer-Verlag, (1992).
L. Allen, J. H. Eberly, "Optical resonance and two-level atoms", A Wiley-Interscience Publication, (1975).
B. L. Altshuler, P. A. Lee and R. Webb, Eds., "Mesoscopic Phenomena in Solids", Elsevier, Amsterdam, (1991).
L. E. E. de Araujo, I. A. Walmsley, and C. R. Stroud, Jr., "Analytic Solution for Strong-Field Quantum Control of Atomic Wave Packets", Phys. Rev. Lett. 81, 955 (1998).
W. B. Arthur, "On Learning and Adaptation in the Economy", Santa Fe Institute Paper 92-07-038, (1992).
R. Ashoori, "Electrons in artificial atoms", Nature 379, 413 (1996).
D. Bacon, K. R. Brown, and K. B. Whaley, "Coherence-Preserving Quantum Bits", Phys. Rev. Lett. 87, 247902 (2001).
C. J. Bardeen, V. V. Yakovlev, K. R. Wilson, S. D. Carpenter, P. M. Weber and W. S. Warren, "Feedback quantum control of molecular electronic population transfer", Chem. Phys. Lett. 280, 151 (1997).
R. Bartels, S. Backus, E. Zeek, L. Misoguti, G. Vdovin, I. P. Christov, M. M. Murnane, H. C. Kapteyn, "Shaped-pulse optimization of coherent emission of high-harmonic soft X-rays", Nature 406, 164 (2000).
W. G. Bickley, "Formulae for numerical differentiation", Math. Gaz. 25, 19 (1941).
V. Blanchet, M. A. Bouchene and B. Girard, "Temporal coherent control in the photoionization of Cs2: Theory and experiment", J. Chem. Phys. 108, 4862 (1998).
R. H. Blick, R. J. Haug, J. Weis, D. Pfannkuche, K. v. Klitzing and K. Eberl, "Single-electron tunneling through a double quantum dot: The artificial molecule", Phys. Rev. B 53, 7899 (1996).
K. Blum, "Density matrix theory and applications", Plenum, New York, (1981).
D. S. Broomhead, G. P. King, "Extracting qualitative dynamics from experimental data", Physica D 20, 217 (1986).
K. Burrage and P. M. Burrage, "High Strong Order Methods for Non-commutative Stochastic Ordinary Differential Equation Systems and the Magnus Formula", Conference on Uncertainty, Physica D, (1998).
M. M. Bogdan, A. M. Kosevich, G. A. Maugin, "Soliton complex dynamics in strongly dispersive medium", Wave Motion 34, 1 (2001).
N. H. Bonadeo, J. Erland, D. Gammon, D. Park, D. S. Katzer and D. G. Steel, "Coherent Optical Control of the Quantum State of a Single Quantum Dot", Science 282, 1473 (1998).
C. Brif, H. Rabitz, S. Wallentowitz, I.A. Walmsley, "Decoherence of molecular vibrational wave packets: Observable manifestations and control criteria", Phys. Rev. A 63, 063404 (2001).
T. Brixner and G. Gerber, "Femtosecond polarization pulse shaping", Opt. Lett. 26, 557 (2001).
N. A. Bruce and P. A. Maksym, "Quantum states of interacting electrons in a real quantum dot", Phys. Rev. B 61, 4718 (2000).
A. Bulatov, B. E. Vugmeister, and H. Rabitz, "Nonadiabatic control of Bose-Einstein condensation in optical traps", Phys. Rev. A 60, 4875 (1999).
M. Caputo, "Linear models of dissipation whose Q is almost frequency independent", Geophys. J. Roy. Astron. Soc. 13, 529 (1967).
D. M. Ceperley, "Quantum Monte Carlo simulations, Microscopic simulations in physics", Rev. Mod. Phys. 71, 438 (1999).
A. V. Chechkin, R. Gorenflo, and I. M. Sokolov, "Retarding Sub- and Accelerating Super-Diffusion Governed by Distributed Order Fractional Diffusion Equations", Phys. Rev. E 66, 046129 (2002).
C. K. Chui, "An Introduction to Wavelets", Academic Press Inc., (1992).
B. E. Cole, J. B. Williams, B. T. King, M. S. Sherwin and C. R. Stanley, "Coherent Manipulation of Semiconductor Quantum Bits with Terahertz Radiation", Nature 410, 60 (2001).
C. E. Creffield, W. Hausler, J. H. Jefferson and S. Sarkar, "Interacting electrons in polygonal quantum dots", Phys. Rev. B 59, 10719 (1999).
I. Daubechies, "Ten Lectures on Wavelets", SIAM, (1992).
K. Deb, "Multi-objective genetic algorithms: Problem difficulties and construction of test problems", Evolutionary Computation Journal 7, 205 (1999).
D. M. Deaven and K. M. Ho, "Molecular geometry optimization with a genetic algorithm", Phys. Rev. Lett. 75, 288 (1995).
D. P. DiVincenzo, "Two-bit gates are universal for quantum computation", Phys. Rev. A 51, 1015 (1995).
D. P. DiVincenzo and P. W. Shor, "Fault-Tolerant Error Correction with Efficient Quantum Codes", Phys. Rev. Lett. 77, 3260 (1996); R. Laflamme, C. Miquel, J. P. Paz, and W. H. Zurek, "Perfect Quantum Error Correcting Code", Phys. Rev. Lett. 77, 198 (1996).
J. A. Doornik, M. Ooms, "Inference and Forecasting for ARFIMA Models With an Application to US and UK Inflation", Studies in Nonlinear Dynamics and Econometrics 8, art. 14, (2004).
D. L. Donoho, "De-Noising by Soft Thresholding", IEEE Trans. Inform. Theory, (1994).
E. Dupont, P. B. Corkum, H. C. Liu, M. Buchanan and Z. R. Wasilewski, "Phase-controlled currents in semiconductors", Phys. Rev. Lett. 74, 3596 (1995).
K. Diethelm, "An Algorithm for the Numerical Solution of Differential Equations of Fractional Order", Elect. Trans. Num. Anal. 5, 1 (1997).
A. Erdélyi, "Higher Transcendental Functions", McGraw-Hill, New York, (1955).
J. Fassbender and M. Bauer, "Numerical investigations on the switching behavior of magnetic tunnel junctions in the quasi-static and dynamic regime", Europhys. Lett. 55, 119 (2001).
M. D. Feit, J. A. Fleck, Jr., and S. Steiger, "Solution of the Schrodinger equation by a spectral method", J. Comput. Phys. 47, 412 (1982).
B. Fornberg and D. Sloan, in "Acta Numerica 1994", edited by A. Iserles, Cambridge University Press, Cambridge, 203 (1994).
A.V. Filinov, M. Bonitz, and Yu. E. Lozovik, "Wigner crystallization in mesoscopic 2D electron systems", Phys. Rev. Lett. 86, 3851 (2001).
A. Fuhrer, S. Luscher, T. Ihn, T. Heinzel, K. Ensslin, W. Wegscheider and M. Bichler, "Energy spectra of quantum rings", Nature 413, 822 (2001).
I. L. Garzon, K. Michaelian, M. R. Beltran, A. Posada-Amarillas, P. Ordejon, E. Artacho, D. Sanchez-Portal and J. M. Soler, "Lowest Energy Structures of Gold Nanoclusters", Phys. Rev. Lett. 81, 1600 (1998).
I.M. Gelfand and S.V. Fomin, "Calculus of variations", Dover Publications, (2000).
M. Governale, M. Grifoni, and G. Schon, "Decoherence and dephasing in coupled Josephson-junction qubits", Chem. Phys. 268, 273 (2001).
I. Grigorenko and M. E. Garcia, "An evolutionary algorithm to calculate the ground state of a quantum system", Physica A 284, 131 (2000).
I. Grigorenko, M. E. Garcia and K. H. Bennemann, "Theory for the Optimal Control of Time-Averaged Quantities in Quantum Systems", Phys. Rev. Lett. 89, 233003 (2002).
I. Grigorenko and M. E. Garcia, "Two-Particle Systems Determined Using Quantum Genetic Algorithms", Physica A 291, 439 (2001).
I. Grigorenko, O. Speer, and M. E. Garcia, "Coherent control of photon-assisted tunneling between quantum dots: A theoretical approach using genetic algorithms", Phys. Rev. B 65, 235309 (2002).
I. A. Grigorenko, D. V. Khveshchenko, "Robust Two-Qubit Quantum Registers", Phys. Rev. Lett. 94, 040506 (2005).
I. A. Grigorenko and D. V. Khveshchenko, "Single-Step Implementation of Universal Quantum Gates", Phys. Rev. Lett. 95, 110501 (2005).
M. Grifoni, M. Winterstetter and U. Weiss, "Coherences and populations in the driven damped two-state system", Phys. Rev. E 56, 334 (1997).
M. Grifoni, P. Hanggi, "Driven quantum tunneling", Phys. Rep. 304, 232 (1998).
M. E. Garcia, H. O. Jeschke, I. Grigorenko and K. H. Bennemann, "Theory for the ultrafast dynamics of excited clusters: interplay between elementary excitations and atomic structure", Appl. Phys. B 71, 361 (2000).
D. E. Goldberg, "Genetic Algorithms in search, optimization, and machine learning", Addison-Wesley, (1989).
P. Grassberger, I. Procaccia, "Measuring the strangeness of strange attractors", Physica D 9, 189 (1983).
K. Groom, A. I. Tartakovskii, D. J. Mowbray, M. S. Skolnick, P. M. Smowton, M. Hopkinson, and G. Hill, "Comparative study of InGaAs quantum dot lasers with different degrees of dot layer confinement", Appl. Phys. Lett. 81, 1 (2002).
J. R. Guest, T. H. Stievater, Gang Chen, E. A. Tabak, B. G. Orr, D. G. Steel, D. Gammon, and D. S. Katzer, "Near Field Coherent Spectroscopy and Microscopy of a Quantum Dot system", Science 293, 2224 (2001).
T. T. Hartley, C. F. Lorenzo and H. K. Qammer, IEEE Trans. on Circuits and Systems-I: Fundamental Theory and Applications 42, 485 (1995).
A. K. Hartmann, "Calculation of ground states of four-dimensional ± J Ising spin glasses", Phys. Rev. E 60, 5135 (1999).
A. P. Heberle, J. J. Baumberg and K. Kohler, "Ultrafast coherent control and destruction of excitons in quantum wells", Phys. Rev. Lett. 75, 2598 (1995); A. P. Heberle, J. J. Baumberg, T. Kuhn and K. Kohler, "Femtosecond pulse shaping for coherent carrier control", Physica B 272, 360 (1999).
R. Hegger, H. Kantz, T. Schreiber, "Practical implementation of nonlinear time series methods: The TISEAN package", e-print arXiv:chao-dyn/9810005.
D.F. Hendry and J.A. Doornik, "Empirical Econometric Modelling Using PcGive", Timberlake Consultants Press (London), (2001).
T. Hertel, E. Knoesel, M. Wolf, and G. Ertl, "Ultrafast electron dynamics at Cu(111): Response of an electron gas to optical excitation", Phys. Rev. Lett. 76, 535 (1996).
Optimal control and forecasting of complex dynamical systems
R. Hilfer, ed., "Applications of Fractional Calculus in Physics", World Scientific, River Edge, New Jersey (2000).
S. Hirata, S. Ivanov, I. Grabowski, R. J. Bartlett, "Time-dependent density functional theory employing optimized effective potentials", J. Chem. Phys. 116, 6468 (2002).
M. W. Hirsch, S. Smale, "Differential Equations, Dynamical Systems and Linear Algebra", Academic Press, New York (1965).
J. H. Holland, "Adaptation in Natural and Artificial Systems", University of Michigan Press, Ann Arbor, MI (1975).
J. H. Holland and J. S. Reitman, in "Pattern-Directed Inference Systems", edited by D. A. Waterman and F. Hayes-Roth, Academic Press, NY (1978).
T. Hornung, R. Meier, D. Zeidler, K. L. Kompa, D. Proch, M. Motzkus, "Optimal control of one- and two-photon transitions with shaped femtosecond pulses and feedback", Appl. Phys. B 71, 277 (2000).
W. Y. Hwang, H. Lee, D. D. Ahn, and S. W. Hwang, "Efficient schemes for reducing imperfect collective decoherences", Phys. Rev. A 62, 062305 (2000).
L. Ingber, "Very Fast Simulated Reannealing", Mathematical and Computer Modelling 12, 967 (1989); L. Ingber, B. Rosen, "Genetic algorithms and very fast simulated re-annealing: A comparison", Mathematical and Computer Modelling 16, 87 (1992).
K. Jauregui, W. Häusler and B. Kramer, "Wigner Molecules in Nanostructures", Europhys. Lett. 24, 581 (1993).
T. F. Jiang, X.-M. Tong, and S.-I Chu, "Self-interaction-free density-functional theoretical study of the electronic structure of spherical and vertical quantum dots", Phys. Rev. B 63, 045317 (2001); M. A. Omary, M. A. Rawashdeh-Omary, C. C. Chusuei, J. P. Fackler, P. S. Bagus, "Electronic structure studies of six-atom gold clusters", J. Chem. Phys. 114, 10695 (2001).
R. S. Judson, M. E. Colvin, J. C. Meza, A. Huffer and D. Gutierrez, "Do intelligent configuration search techniques outperform random search for large molecules?", Int. J. Quantum Chem. 44, 277 (1992).
R. S. Judson and H. Rabitz, "Teaching lasers to control molecules", Phys. Rev. Lett. 68, 1500 (1992).
J. Kainz, S. A. Mikhailov, A. Wensauer, and U. Rössler, "Quantum dots in high magnetic fields: Calculation of ground-state properties", Phys. Rev. B 65, 115305 (2002).
S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi, "Optimization by Simulated Annealing", Science 220, 671 (1983).
M. Kennel, R. Brown, and H. Abarbanel, "Determining embedding dimension for phase-space reconstruction using a geometrical construction", Phys. Rev. A 45, 3403 (1992).
A. Y. Kitaev, "Fault-tolerant quantum computation by anyons", Ann. Phys. 303, 2 (2003).
K. M. Kolwankar and A. D. Gangal, "Local Fractional Fokker-Planck Equation", Phys. Rev. Lett. 80, 214 (1998).
I. P. Kornfeld, S. V. Fomin, Ya. G. Sinai, "Ergodic Theory", Springer (1982).
M. Koskinen, M. Manninen, B. Mottelson, and S. M. Reimann, "Rotational and vibrational spectra of quantum rings", Phys. Rev. B 63, 205323 (2001).
D. Kouri and D. Hoffmann, "Time-dependent integral equation approach to quantum dynamics of systems with time-dependent potentials", Chem. Phys. Lett. 186, 91 (1991).
L. P. Kouwenhoven, T. H. Oosterkamp, M. W. S. Danoesastro, M. Eto, D. G. Austing, T. Honda, and S. Tarucha, "Excitation spectra of circular few-electron quantum dots", Science 278, 1788 (1997).
J. R. Koza, "Genetic Programming", A Bradford Book, The MIT Press (1992).
V. F. Krotov, "Global Methods in Optimal Control Theory", Marcel Dekker, Inc., New York (1996).
M. Toda, R. Kubo, and N. Saito, "Statistical Physics I: Equilibrium Statistical Mechanics", Springer-Verlag (1983).
D. Kusnezov, A. Bulgac, and G. D. Dang, Phys. Rev. Lett. 82, 1136 (1999).
D. A. Lidar, D. Bacon, J. Kempe, and K. B. Whaley, "Decoherence-free subspaces for multiple-qubit errors", Phys. Rev. A 61, 052307 (2000); A. Barenco, D. Deutsch, and A. Ekert, "Conditional Quantum Dynamics and Logic Gates", Phys. Rev. Lett. 74, 4083 (1995).
E. N. Lorenz, "Deterministic nonperiodic flow", J. Atmos. Sci. 20, 130 (1963).
D. Loss and D. P. DiVincenzo, "Quantum computation with quantum dots", Phys. Rev. A 57, 120 (1998).
F. B. Luczak, J. T. Devreese, L. F. Lemmens, "Many Body Diffusion and Interacting Electrons in a Harmonic Confinement", e-print arXiv: cond-mat/0002343 (2000).
J. Maddox, "How to shadow noisy chaos", Nature 347, 613 (1989).
Y. Makhlin, G. Schön, and A. Shnirman, "Quantum-state engineering with Josephson-junction devices", Rev. Mod. Phys. 73, 357 (2001).
C. H. Mak, R. Egger, and H. Weber-Gottschick, "Multilevel Blocking Approach to the Fermion Sign Problem in Path-Integral Monte Carlo Simulations", Phys. Rev. Lett. 81, 4533 (1998).
S. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation", IEEE Trans. Pattern Anal. Mach. Intell. 11, 674 (1989).
N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, "Equation of State Calculations by Fast Computing Machines", J. Chem. Phys. 21, 1087 (1953).
K. Michaelian, "Evolving few-ion clusters of Na and Cl", Am. J. Phys. 66, 231 (1998); K. Michaelian, Chem. Phys. Lett. 293, 202 (1998).
B. Militzer and E. L. Pollock, "Variational Density Matrix Method for Warm Condensed Matter and Application to Dense Hydrogen", Phys. Rev. E 61, 3470 (2000).
J. E. Mooij, T. P. Orlando, L. Levitov, L. Tian, C. H. van der Wal, S. Lloyd, "Josephson persistent-current qubit", Science 285, 1036 (1999).
Y. Nakamura, Yu. A. Pashkin and J. S. Tsai, "Coherent control of macroscopic quantum states in a single-Cooper-pair box", Nature 398, 786 (1999).
Yu. A. Pashkin, T. Yamamoto, O. Astafiev, Y. Nakamura, D. V. Averin and J. S. Tsai, "Quantum oscillations in two coupled charge qubits", Nature 421, 823 (2003).
A. O. Niskanen, M. Nakahara, and M. M. Salomaa, "Realization of arbitrary gates in holonomic quantum computation", Phys. Rev. A 67, 012319 (2003); A. O. Niskanen, J. J. Vartiainen, and M. M. Salomaa, "Optimal multiqubit operations for Josephson charge qubits", Phys. Rev. Lett. 90, 197901 (2003); J. J. Vartiainen, A. O. Niskanen, M. Nakahara and M. M. Salomaa, "Acceleration of quantum algorithms using three-qubit gates", Int. J. Quant. Inf. 2, 1 (2004).
J. A. Nelder and R. Mead, "A Simplex Method for Function Minimization", Computer Journal 7, 308 (1965).
M. Norgaard, O. Ravn, N. K. Poulsen and L. K. Hansen, "Neural Networks for Modelling and Control of Dynamic Systems", Springer (2000).
Y. Ohtsuki, W. Zhu and H. Rabitz, "Monotonically convergent algorithm for quantum optimal control with dissipation", J. Chem. Phys. 110, 9825 (1999).
Y. Ohtsuki, K. Nakagami, Y. Fujimura, W. Zhu, and H. Rabitz, "Quantum optimal control of multiple targets: Development of a monotonically convergent algorithm and application to intramolecular vibrational energy redistribution control", J. Chem. Phys. 114, 8867 (2001).
W. D. Oliver, F. Yamaguchi, and Y. Yamamoto, "Electron Entanglement via a Quantum Dot", Phys. Rev. Lett. 88, 037901 (2002).
T. H. Oosterkamp, T. Fujisawa, W. G. van der Wiel, K. Ishibashi, R. V. Hijman, S. Tarucha and L. P. Kouwenhoven, "Microwave spectroscopy of a quantum-dot molecule", Nature 395, 874 (1998).
E. Ott, W. Withers, and J. Yorke, "Is the dimension of chaotic attractors invariant under coordinate changes?", J. Stat. Phys. 36, 687 (1984).
E. Ott, "Chaos in Dynamical Systems", Cambridge University Press, Cambridge (1994).
G. M. Palma, K. A. Suominen, and A. K. Ekert, "Quantum Computation and Dissipation", Proc. R. Soc. London A 452, 567 (1996); L. M. Duan and G. C. Guo, "Preserving Coherence in Quantum Computation by Pairing Quantum Bits", Phys. Rev. Lett. 79, 1953 (1997).
D. Deutsch, A. Barenco, and A. Ekert, "Universality in quantum computation", Proc. R. Soc. Lond. A 449, 669 (1995).
F. Pederiva, C. J. Umrigar, and E. Lipparini, "Diffusion Monte Carlo study of circular quantum dots", Phys. Rev. B 62, 8120 (2000).
R. E. Peierls, "On a Minimum Property of the Free Energy", Phys. Rev. 54, 918 (1938).
A. P. Peirce, M. A. Dahleh, and H. Rabitz, "Optimal control of quantum-mechanical systems: Existence, numerical approximation, and applications", Phys. Rev. A 37, 4950 (1988).
I. Podlubny, "The Laplace Transform Method for Linear Differential Equations of the Fractional Order", Slovak Academy of Sciences (1997); I. Podlubny, "Fractional Calculus and Its Applications", Academic Press, San Diego (1999).
I. Podlubny, "Geometric and physical interpretation of fractional integration and fractional differentiation", e-print arXiv: math/0110241.
S. Pötting, M. Cramer, C. H. Schwalb, H. Pu, and P. Meystre, "Coherent acceleration of Bose-Einstein condensates", Phys. Rev. A 64, 023604 (2001).
J. F. Poyatos, J. I. Cirac, P. Zoller, "Complete Characterization of a Quantum Process: The Two-Bit Quantum Gate", Phys. Rev. Lett. 78, 390 (1997).
V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral, H. E. Stanley, "A random matrix theory approach to financial cross-correlations", Physica A 287, 374 (2000).
A. Puente and L. Serra, "Oscillation Modes of Two-Dimensional Nanostructures within the Time-Dependent Local-Spin-Density Approximation", Phys. Rev. Lett. 83, 3266 (1999).
V. Ramakrishna, R. Ober, X. Sun, O. Steuernagel, J. Botina, and H. Rabitz, "Explicit generation of unitary transformations in a single atom or molecule", Phys. Rev. A 61, 032106 (2000).
V. Ramakrishna, R. J. Ober, K. L. Flores, H. Rabitz, "Control of a coupled two-spin system without hard pulses", Phys. Rev. A 65, 063405 (2002); J. P. Palao and R. Kosloff, "Optimal control theory for unitary transformations", Phys. Rev. A 68, 062308 (2003).
L. H. Haroutyunyan and G. Nienhuis, "Coherent control of atom dynamics in an optical lattice", Phys. Rev. A 64, 033424 (2001).
K. M. Romero, S. Kohler, and P. Hänggi, "Coherence stabilization of a two-qubit gate by AC fields", e-print arXiv: cond-mat/0409774.
M. A. Ruderman and C. Kittel, "Indirect Exchange Coupling of Nuclear Magnetic Moments by Conduction Electrons", Phys. Rev. 96, 99 (1954).
R. Salomon, "The Evolutionary-Gradient-Search Procedure", in J. R. Koza et al. (Eds.), Proceedings of the Third Annual Genetic Programming Conference GP-98, Morgan Kaufmann, San Francisco, CA, 852 (1998).
E. Scalas, R. Gorenflo, F. Mainardi, "Fractional calculus and continuous-time finance", Physica A 284, 376 (2000); F. Mainardi, M. Raberto, R. Gorenflo, E. Scalas, "Waiting-times and returns in high-frequency financial data: an empirical study", Physica A 287, 468 (2000); N. Laskin, "Fractional Market Dynamics", Physica A 287, 482 (2000).
C. Search and P. R. Berman, "Suppression of Magnetic State Decoherence Using Ultrafast Optical Pulses", Phys. Rev. Lett. 85, 2272 (2000).
L. Serra, A. Puente and E. Lipparini, "Orbital current mode in elliptical quantum dots", Phys. Rev. B 60, R13966 (1999).
S. G. Schirmer, M. D. Girardeau, J. V. Leahy, "Efficient algorithm for optimal control of mixed-state quantum systems", Phys. Rev. A 61, 012101 (2000).
J. D. Schaffer, "Multiple Objective Optimization with Vector Evaluated Genetic Algorithms", in Genetic Algorithms and their Applications: Proceedings of the First International Conference on Genetic Algorithms, Lawrence Erlbaum, 93 (1985).
C. Schumacher, M. D. Vose, and L. D. Whitley, "The No Free Lunch and problem description length", in L. Spector et al. (Eds.), Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2001), 565, Morgan Kaufmann, San Francisco (2001).
H. W. Schumacher, C. Chappert, P. Crozat, R. C. Sousa, P. P. Freitas, J. Miltat, J. Fassbender, and B. Hillebrands, "Phase Coherent Precessional Magnetization Reversal in Microscopic Spin Valve Elements", Phys. Rev. Lett. 90, 017201 (2003).
H. G. Schuster, "Deterministic Chaos: An Introduction", Wiley-VCH, Weinheim, Germany (1988).
B. W. Shore, "The Theory of Coherent Atomic Excitation", John Wiley and Sons (1989).
C. Sparrow, "The Lorenz Equations: Bifurcations, Chaos, and Strange Attractors", Springer-Verlag, New York (1982).
W. Spendley, G. R. Hext, F. R. Himsworth, "Sequential application of simplex designs in optimisation and evolutionary operation", Technometrics 4, 441 (1962).
O. Speer, M. E. Garcia and K. H. Bennemann, "Photon-assisted Stückelberg-like oscillations in a double quantum dot", Phys. Rev. B 62, 2630 (2000).
J. C. Sprott, "Chaos and Time-Series Analysis", Oxford University Press, Oxford (2003).
C. A. Stafford and N. S. Wingreen, "Resonant photon-assisted tunneling through a double quantum dot: An electron pump from spatial Rabi oscillations", Phys. Rev. Lett. 76, 1916 (1996).
V. N. Staroverov and G. E. Scuseria, "Assessment of simple exchange-correlation energy functionals of the one-particle density matrix", J. Chem. Phys. 117, 2489 (2002).
T. H. Stoof and Y. V. Nazarov, "Time-dependent resonant tunneling via two discrete states", Phys. Rev. B 53, 1050 (1996).
P. Sutton and S. Boyden, "Genetic algorithms: A general search procedure", Am. J. Phys. 62, 549 (1994).
F. Takens, "Detecting strange attractors in fluid turbulence", in D. A. Rand and L. S. Young (Eds.), "Dynamical Systems and Turbulence", Lecture Notes in Mathematics 898, Springer-Verlag, Berlin (1981).
M. Taut, "Special analytical solutions for two and three electrons in a magnetic field", J. Phys.: Condens. Matter 12, 3689 (2000).
M. Thorwart, P. Hänggi, "Decoherence and dissipation during a quantum XOR gate operation", Phys. Rev. A 65, 012309 (2001).
G. B. Thurston, "Viscoelasticity of Human Blood", Biophysical Journal 12, 1205 (1972).
P. K. Tien and J. P. Gordon, "Multiphoton Process Observed in the Interaction of Microwave Fields with the Tunneling between Superconductor Films", Phys. Rev. 129, 647 (1963).
W. Tucker, "The Lorenz attractor exists", C. R. Acad. Sci. 328, 1197 (1999).
S. Vajda, A. Bartelt, E. V. Kaposta, T. Leisner, C. Lupulescu, S. Minemoto, P. Rosendo-Francisco, and L. Wöste, "Feedback optimization of shaped femtosecond laser pulses for controlling the wavepacket dynamics and reactivity of mixed alkaline clusters", Chem. Phys. 267, 231 (2001).
R. V. V. Vidal, ed., "Applied Simulated Annealing", Lecture Notes in Economics and Mathematical Systems 396, Springer-Verlag, Berlin, Heidelberg, New York (1993).
L. Viola and S. Lloyd, "Dynamical suppression of decoherence in two-state quantum systems", Phys. Rev. A 58, 2733 (1998); L. Viola, E. Knill, and S. Lloyd, "Universal Control of Decoupled Quantum Systems", Phys. Rev. Lett. 82, 2417 (1999).
D. Vion, A. Aassime, A. Cottet, P. Joyez, H. Pothier, C. Urbina, D. Esteve, M. H. Devoret, "Manipulating the Quantum State of an Electrical Circuit", Science 296, 886 (2002).
R. de Vivie-Riedle, K. Sundermann, "Design and interpretation of laser pulses for the control of quantum systems", Appl. Phys. B 71, 285 (2000); A. Apalategui, A. Saenz, and P. Lambropoulos, "Ab Initio Investigation of the Phase Lag in Coherent Control of H2", Phys. Rev. Lett. 86, 5454 (2001).
U. Weiss, "Quantum Dissipative Systems", World Scientific, Singapore (1999).
E. Wigner, "On the Interaction of Electrons in Metals", Phys. Rev. 46, 1002 (1934).
E. P. Wigner, "On a class of analytic functions from the quantum theory of collisions", Ann. Math. 53, 36 (1951).
D. Wolpert and W. Macready, "No Free Lunch Theorems for Optimization", IEEE Transactions on Evolutionary Computation 1, 67 (1997).
K. Yabana and G. Bertsch, "Application of the time-dependent local density approximation to optical activity", Phys. Rev. A 60, 1271 (1999).
C. Yannouleas and U. Landman, "Spontaneous Symmetry Breaking in Single and Molecular Quantum Dots", Phys. Rev. Lett. 82, 5325 (1999).
J. Q. You, X. Hu, and F. Nori, "Correlation-induced suppression of decoherence in capacitively coupled Cooper-pair boxes", e-print arXiv: cond-mat/0407423.
G. M. Zaslavsky, M. Edelman, "Pseudochaos", e-print arXiv: nlin/0112033.
Y. B. Zeldovich, Soviet Physics JETP 24, 1006 (1967).
E. Zitzler and L. Thiele, "Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach", IEEE Trans. Evol. Comput. 3, 257 (1999).
J. Zhang, J. Vala, S. Sastry, K. B. Whaley, "Optimal quantum circuit synthesis from Controlled-U gates", e-print arXiv: quant-ph/0308167; J. Zhang, J. Vala, K. B. Whaley, S. Sastry, "A geometric theory of non-local two-qubit operations", Phys. Rev. A 67, 042313 (2003); J. Zhang, J. Vala, S. Sastry, K. B. Whaley, "Exact two-qubit universal quantum circuit", Phys. Rev. Lett. 91, 027903 (2003); J. Zhang, J. Vala, S. Sastry, K. B. Whaley, "Minimum construction of two-qubit quantum operations", e-print arXiv: quant-ph/0312193.
Z.-W. Zhou, B. Yu, X. Zhou, M. J. Feldman, "Scalable Fault-Tolerant Quantum Computation in Decoherence-Free Subspaces", Phys. Rev. Lett. 93, 010501 (2004).
W. Zhu, J. Botina, and H. Rabitz, "Rapidly convergent iteration methods for quantum optimal control of population", J. Chem. Phys. 108, 1953 (1998).
Index
adiabatic approximation, 104
algorithm, 29
artificial agents, 169
autoregressive models, 166
B-gate, 162
Bloch-Redfield equations, 148
brachistochrone problem, 5
Caputo derivative, 85
chaos theory, 77
chaotic behavior, 79
CNOT gate, 156
complex systems, 165
convexity, 31
crossover operation, 47
decoherence problem, 145
decoherence-free subspaces, 146
degenerate functionals, 15
density matrix, 95
Dirac delta function, 117
distance between functions, 11
double quantum dot, 124
economical modelling, 169
El Farol bar problem, 169
electron pump, 125
Euler-Lagrange equation, 7
Euler-Ostrogradskii equation, 15
evolutionary gradient search, 74
Fermat's variational principle, 2
fidelity, 149
financial markets, 165
Floquet formalism, 138
forecast, 165
fractional derivative, 83
Frobenius trace norm, 156
Fundamental Lemma, 7
genetic algorithm, 44
genetic operators, 47
global optimization, 30
Gray code, 47
ground state problem, 58
halting problem, 29
Heisenberg coupling, 149
high-frequency time series, 166
inter-qubit coupling, 150
Ising coupling, 149
isoperimetric problem, 6, 18
Jacobian elliptic function, 115
Josephson qubits, 147
Lagrange multiplier, 18
Lagrangian, 10
least square estimation, 32
Lorenz attractor, 82
Lyapunov exponents, 78
Magnus series, 107
Makhlin's invariants, 163
Markovian approximation, 105
mathematical pendulum, 114
Mittag-Leffler function, 86
multi-agent models, 176
multi-photon resonance, 138
multiobjective optimization, 34
mutation operation, 47
nanostructures, 122
neural networks, 180
No Free Lunch Theorem, 29
noise reduction, 177
null controllability, 26
objective function, 30
optimal control, 94
optimal control theory, 21
Pareto front, 35
Pareto optimality, 37
partition function, 66
photon-assisted tunnelling, 124
Poincaré-Bendixson theorem, 84
Pontryagin Maximum Principle, 22
purity, 150
quantronium, 158
quantum fluctuations, 74
quantum gate, 147
Quantum Genetic Algorithm, 50
quantum Liouville equation, 96
qubit, 147
Rabi oscillations, 125
random matrix theory, 179
realistic constraints, 26
relaxation, 94, 110
Riemann-Liouville derivative, 85
Ritz's method, 20
rotating wave approximation, 107
sensitivity analysis, 25
short-term predictions, 77
simplex method, 38
simulated annealing, 43
sine-Gordon equation, 117
Snell's law, 2
solar activity, 176
soliton, 116
Stückelberg oscillations, 128
stochastic optimization, 42
strange attractor, 79
supercoherent, 152
SWAP gate, 164
Takens's embedding theorem, 174
tautochronism, 13
thermal fluctuations, 74
three-body problem, 78
time ordering operator, 103
time-delay reconstruction, 174
topological qubits, 156
transversality condition, 16
travelling salesman problem, 32
Tucker's theorem, 83
two level system, 105
two-qubit register, 148
universal gates, 157
variational calculus, 1
wavelet analysis, 177
weighted-sum method, 38
Wigner molecule, 71
Wilkinson's problem, 33
OPTIMAL CONTROL and FORECASTING of COMPLEX DYNAMICAL SYSTEMS

This important book reviews applications of optimization and optimal control theory to modern problems in physics, nano-science and finance. The theory presented here can be efficiently applied to various problems, such as the determination of the optimal shape of a laser pulse to induce certain excitations in quantum systems, the optimal design of nanostructured materials and devices, or the control of chaotic systems and minimization of the forecast error for a given forecasting model (for example, artificial neural networks). Starting from a brief review of the history of variational calculus, the book discusses optimal control theory and global optimization using modern numerical techniques. Key elements of chaos theory and basics of fractional derivatives, which are useful in control and forecast of complex dynamical systems, are presented. The coverage includes several interdisciplinary problems to demonstrate the efficiency of the presented algorithms, and different methods of forecasting complex dynamics are discussed.
ISBN 981-256-660-0
World Scientific
www.worldscientific.com