PROBABILITY AND SCHRODINGER'S MECHANICS
David B. Cook
World Scientific
David B. Cook
Department of Chemistry, University of Sheffield
World Scientific
New Jersey • London • Singapore • Hong Kong
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: Suite 202, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
PROBABILITY AND SCHRODINGER'S MECHANICS
Copyright © 2002 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-238-191-0
This book is printed on acid-free paper. Printed in Singapore by Mainland Press
"But the chemists and the whole class of mechanics and empirics, should they have the temerity to attempt contemplation and philosophy, being accustomed to meticulous subtlety in a few things, they twist by extraordinary means all the rest into conformity with them and promote opinions more odious and unnatural than those advanced by the very rationalists." — Francis Bacon, Preface to Natural History, 1609 (my emphasis)
Contents
Preface xiii
Organisation xvii
Part 1 Preliminaries 1
Chapter 1 Orientation and Outlook 3
  1.1. General Orientation 3
  1.2. Materialism 4
  1.3. Materialism and Realism 8
  1.4. Logic 9
  1.5. Mathematics 12
  1.6. Reversing Abstraction 13
  1.7. Definitions, Laws of Nature and Causality 15
  1.8. Foundations 18
  1.9. Axioms 20
  1.10. An Interpreted Theory 21
Part 2 Probabilities 23
Chapter 2 Simple Probabilities 25
  2.1. Colloquial and Mathematical Terminology 26
  2.2. Probabilities for Finite Systems 27
    2.2.1. An Example: The Faces of a Cube 29
    2.2.2. Dice: Statistical Methods of Measure 31
    2.2.3. Loaded Dice: Statistical Methods of Measure 34
    2.2.4. Standard Dice and Conservation Laws 35
  2.3. Probability and Statistics 39
    2.3.1. An Extreme Example 40
  2.4. Probabilities in Deterministic Systems 41
  2.5. The Referent of Probabilities and Measurement 45
    2.5.1. Single System or Ensemble? 47
    2.5.2. The Collapse of the Distribution 49
    2.5.3. Hidden Variables 50
  2.6. Preliminary Summary 51
Chapter 3 A More Careful Look at Probabilities 53
  3.1. Abstract Objects 53
  3.2. States and Probability Distributions 55
    3.2.1. The Propensity Interpretation 56
  3.3. The Formal Definition of Probability 58
    3.3.1. A Premonition 62
  3.4. Time-Dependent Probabilities 63
  3.5. Random Tests 66
  3.6. Particle-Distribution Probabilities 67
Part 3 Classical Mechanics 69
Chapter 4 The Hamilton-Jacobi Equation 71
  4.1. Historical Connections 71
  4.2. The H-J Equation 73
  4.3. Solutions of the H-J Equation 76
    4.3.1. Cartesian Coordinates 78
    4.3.2. Spherical Polar Coordinates 79
    4.3.3. Comparisons 81
    4.3.4. Cylindrical Coordinates 83
  4.4. Distribution of Trajectories 84
  4.5. Summary 86
Appendix 4.A Transformation Theory 89
Chapter 5 Angular Momentum 99
  5.1. Coordinates and Momenta 99
  5.2. The Angular Momentum "Vector" 101
  5.3. The Poisson Brackets and Angular Momentum 106
  5.4. Components of the Angular Momentum "Vector" 107
  5.5. Conclusions for Angular Momentum 109
Part 4 Schrodinger's Mechanics 111
Chapter 6 Prelude: Particle Diffraction 113
  6.1. History 113
    6.1.1. The Experiment 114
    6.1.2. The Explanations 114
  6.2. The Wave Theory 115
  6.3. The Particle Theory 116
  6.4. A Simple Case 118
  6.5. Experimental Verification 120
  6.6. The Answer to a Rhetorical Question 120
  6.7. Conclusion 121
Chapter 7 The Genesis of Schrodinger's Mechanics 123
  7.1. Lagrangians, Hamiltonians, Variation Principles 123
    7.1.1. Equations and Identities 125
  7.2. Replacing the Hamilton-Jacobi Equation 126
  7.3. Generalising the Action S 128
    7.3.1. Changing the Notation for Action 129
    7.3.2. Interpreting the Change 131
  7.4. Schrodinger's Dynamical Law 134
    7.4.1. Position Probability and Energy Distributions 135
    7.4.2. The Schrodinger Condition 136
  7.5. Probability Distributions? 139
  7.6. Summary of Basic Principles 142
Chapter 8 The Schrodinger Equation 147
  8.1. The Variational Derivation 147
  8.2. Some Interpretation 152
  8.3. The Boundary Conditions 156
  8.4. The Time-Independent Schrodinger Equation 158
Appendix 8.A Schrodinger's First Paper of 1926 161
Chapter 9 Identities: Momenta and Dynamical Variables 179
  9.1. Momentum Definitions and Distributions 179
  9.2. Abstract Particles of Constant Momentum 180
  9.3. Action and Momenta in Schrodinger's Mechanics 182
  9.4. Momenta and Kinetic Energy 186
  9.5. Boundary Conditions 189
    9.5.1. Constant Momenta and Kinetic Energy 190
    9.5.2. Solution of the Schrodinger Equation 191
  9.6. The "Particle in a Box" and Cyclic Boundary Conditions 192
Chapter 10 Abstracting the Structure 195
  10.1. The Idea of Mathematical Structure 195
    10.1.1. A Pitfall of Abstraction: The Momentum Operator 198
  10.2. States and Hilbert Space 201
  10.3. The Real Use of Abstract Structures 204
Part 5 Interpretation from Applications 207
Chapter 11 The Quantum Kepler Problem 209
  11.1. Two Interacting Particles 210
  11.2. Quantum Kepler Problem in a Plane 211
  11.3. Abstract and Concrete Hydrogen Atoms 212
  11.4. The Kepler Problem in Three Dimensions 214
  11.5. The Separation of the Schrodinger Equation 216
  11.6. Commuting Operators and Conservation 218
  11.7. The Less Familiar Separations 221
    11.7.1. The Everyday Solutions 223
  11.8. Conservation in Concrete and Abstract Systems 223
  11.9. Conclusions from the Kepler Problem 227
    11.9.1. Concrete Objects and Symmetries 231
Appendix 11.A Hamiltonians by Substitution? 233
Chapter 12 The Harmonic Oscillator and Fields 237
  12.1. The Schrodinger Equation for SHM 237
  12.2. SHM Details 239
  12.3. Factorisation Method 241
  12.4. Interpreting the SHM Solutions 242
  12.5. Vibrations of Fields and "Particles" 244
    12.5.1. Phonons and Photons 248
  12.6. Second Quantisation 249
Chapter 13 Perturbation Theory and Epicycles 251
  13.1. Perturbation Theories in General 251
  13.2. Perturbed Schrodinger Equations 252
  13.3. Polarisation of Electron Distribution 255
  13.4. Interpretation of Perturbation Theory 256
  13.5. Quantum Theory and Epicycles 258
  13.6. Approximations to Non-existent Functions 259
  13.7. Summary for Perturbation Theory 261
Chapter 14 Formalisms and "Hidden" Variables 263
  14.1. The Semi-empirical Method 263
  14.2. The Chemical Bond 264
  14.3. Dirac's Spin "Hamiltonian" 267
  14.4. Interpretation of the Spin Hamiltonian 268
Part 6 Disputes and Paradoxes 271
Chapter 15 Measurement at the Microscopic Level 273
  15.1. Recollection: Concrete and Abstract Objects 273
  15.2. Statistical Estimates of Probabilities 275
    15.2.1. von Neumann's Theory of Measurement 278
  15.3. Measurement as "State Preparation" 281
  15.4. Heisenberg's Uncertainty Principle 284
    15.4.1. Measurement and Decoherence 286
  15.5. Measurement Generalities 287
Appendix 15.A Standard Deviations of Conjugate Variables 289
Chapter 16 Paradoxes 291
  16.1. The Classical Limit 291
    16.1.1. The Ehrenfest Relations 293
  16.2. The Einstein-Podolsky-Rosen (EPR) Paradox 294
    16.2.1. The EPR Original 295
    16.2.2. Bohm's Modification 297
    16.2.3. Bell's Inequality and Theorem 298
  16.3. Bell's Assumptions 300
    16.3.1. Lessons from EPR 303
    16.3.2. Density of Spin and EPR 304
  16.4. Zero-Point Energy 307
Chapter 17 Beyond Schrodinger's Mechanics? 311
  17.1. An Interregnum? 311
  17.2. The Avant-Garde 313
  17.3. The Break with the Past 314
  17.4. Classical and Quantum Mechanics 315
Index 319
Preface
The presentation and interpretation of (non-relativistic) quantum mechanics is a very well-worked area of study; there have to be very good reasons for adding to the literature on this subject. My reasons are (obviously) that I am far from satisfied with much of the published work and find difficulties with some points, in particular:
• Any abstract formalism is much less rich than the structure from which it has been abstracted, a fact that even Wittgenstein had to come to terms with in the latter part of his adult life. Language is richer than (cannot be reduced to) a representation of logic; Schrodinger's mechanics is richer than (cannot be reduced to) a representation of Hilbert space. Just as language contains more structures than logic, so Schrodinger's mechanics contains more structures than those of Hilbert space.
• The use of probability in quantum theory is arbitrary, eccentric and out of step with modern probability theory, and it is the source of the majority of "paradoxes" in the interpretation of quantum theory. While these paradoxes are the bread and butter of some of the more popular expositions of quantum theory, I cannot say that I am fond of paradoxes in physical theories.
• Although positivism is discredited as a philosophy of science, it has left a huge clutter of verbal and conceptual debris strewn across the field of quantum theory. Positivism in its most aggressive form (instrumentalism) makes the mistake of confusing the meaning of a concept with the way in which numerical values of the variables involved in that concept may be determined. This attitude has, of course, added to the confusion about probabilities, "defining" them in terms of the ways in which they might be measured and thus reinforcing both the view that probabilities are frequency ratios and the everyday opinion that probabilities are applicable to individual events.
• More mundanely, the prescriptions for generating Schrodinger's mechanics from classical mechanics in the vast majority of texts are wrong; they simply do not work. We are saved from total chaos by the fact that working scientists know the form of the Schrodinger equation and use it independently of these formal prescriptions. I am at a loss to explain why this central point is ignored in text after text, both on the applications of quantum theory and on its interpretation.
Most modern studies of the interpretation of quantum mechanics are "modern" in another sense, the one used in literary criticism: they study the theory of quantum mechanics rather than the quantum mechanics of the energetics and distribution of sub-atomic particles. Many of these works concentrate on the alleged consequences of imaginary experiments involving spin angular momentum, uncritically using the colloquial, "everyday" interpretation of probability rather than the modern mathematical theory. Indeed, scarcely any text says what is meant by "probability", leaving the reader to assume that the everyday interpretation is correct. If a Hamiltonian is used at all in these expositions, it is an empirical one using "coupling constants" and making no reference to the laws of interaction in physical systems.
In contrast, I wish to present a study of what I have called Schrodinger's mechanics, the richest, most concrete and most thoroughly interpretable of a variety of more-or-less abstract structures falling under the umbrella of "quantum mechanics". Quantum theory is a mathematically articulated theory embedded in a historical stream of scientific thought; it cannot be understood, nor its interpretation appreciated, without addressing these obvious facts.
I shall not shrink from joining scientists in cosmology, geology, biology, chemistry, history, archeology, sociology, paleontology (even physics) in using, where possible, a mechanistic mode of explanation occasionally involving unobserved, even unobservable, objects and processes. What is more, I will occasionally look at actual solutions of the (spatial differential) equations to examine how they may be interpreted and how this interpretation bears on the referent of Schrodinger's mechanics.
In a previous work I chose¹ to express myself using the "ensemble" interpretation of probabilistic statements, since this interpretation has been one of the major strands in quantum theory (notably championed by Einstein) and because it is closest to the "abstract object" interpretation which I use in this work. My main aim was twofold: to stay in contact with one of the main streams of thought in quantum theory and, more important, to emphasise the incorrectness of the colloquial interpretation of probability, which assumes that probability statements apply to single concrete systems. I now regret this choice, since even the ensemble interpretation, although vastly superior to the colloquial interpretation, cannot be made precise enough for my purposes here.
¹ With considerable reservations, as I explained in an Appendix to that work.
Two of the main lessons of the past century in the investigation of the structure of mathematics and, in general, of mathematically articulated theories have been the failures of ordinary logic and intuition for:
• Systems with an infinite number of members, or infinite combinations of statements;
• Constructions which are self-referential, like the ones which generate Russell's paradox, or systems whose logic may be "internalised", as in Godel's theorem.
So, generally speaking, I shall not be concerned with "the wave function of the Universe", "the wave function of the measuring apparatus" or even "the wave function of the observer", since the first is both recursively self-referential and has an infinite referent, and the others arguably are too. I shall, therefore, try to avoid writing down symbols for non-existent mathematical objects, the typographical equivalent of hot air. The science of the very small and very light is quite complex enough for me; I do not have the time (literally or figuratively), or indeed the expertise, to express opinions about the bearing of quantum theory on God, Mind or Consciousness. My aim is entirely bourgeois in the sense used by Wittgenstein and by Marx.
I try to solve the problems of Schrodinger's mechanics "from within"; I do not wish to sweep away quantum theory but to bring this most successful physical theory of the twentieth century back into the mainstream of the great tradition of the mechanics of Newton, Lagrange and Hamilton, and to base its interpretation on the work of a pillar of twentieth-century mathematics: A. N. Kolmogorov. The degeneration of the interpretation of modern science into solipsism and triviality requires an entirely different kind of investigation, for which I have neither the qualifications nor the stomach.
Organisation
As this work is presented in a rather unusual way, this short chapter gives both a description of the organisation of material and an attempted justification for this choice. Analysis and interpretation of Schrodinger's mechanics is a very "mature" subject in the sense that an enormous amount has been written about the matter without any consensus emerging. Or, rather, such consensus as there is about the interpretation of quantum theory is, in my view, based on an erroneous (pre-Kolmogorov) view of probability. So, in what is written here, almost every page (indeed, in places, almost every sentence) will contain material that is in dispute. Using Schrodinger's starting point and Kolmogorov's probability theory, I shall say (and I shall prove), among other things, that:
• Probability statements do not refer to individual objects
• Probability distributions do not "collapse"
• The Hamiltonian operator cannot be obtained by an operator substitution in the classical Hamiltonian function
• Dynamical variables may or may not all be represented by linear operators, but there is no general way to find the form of these operators
• Conserved quantities are not always represented by operators which mutually commute
• It is nonsensical to give a physical interpretation to the terms in a perturbation expansion
To take the time and space to discuss the opposing views on every such point is tiresome and, more importantly, makes the presentation diffuse and lacking in direction. I shall therefore simply give my presentation of probability and Schrodinger's mechanics without detailed references to other points of view or, indeed, in most cases without even giving an opposing view a "fair hearing". This device will enable the presentation of the theory and interpretation of Schrodinger's mechanics "as if" one were developing an interpreted theory ab initio in the cleanest and most logical manner while conveniently ignoring other viewpoints. My defence for this outrageous choice is twofold: defending my position would simply take too much space, and the alternative and opposing views are too well known to need rehearsing here.
Throughout the work I use the term "state function" rather than the familiar "wave function"; this is a deliberate attempt to avoid, even in terminology, the celebrated wave/particle duality. In fact the solutions of the Schrodinger equation are only waves in the extremely atypical case of a particle in the absence of a field of force (a "free" particle). And even these waves are not real physical waves but particular forms of a function which generates a probability distribution. I have also tended (as I say on page 148 of Chapter 8) to work in a system of units, "atomic units", in which Planck's constant (strictly, ħ = h/2π) has the value unity, unless the historical context explicitly demands the appearance of ħ.
PART 1
Preliminaries
In this introductory material I give a rather casual survey of the general approach I shall be using and of some of the attitudes I shall take to philosophy, logic and, particularly, mathematics. In mathematics I shall combine extreme fussiness in some topics with a rather casual lack of rigour in general. In internal, "technological" matters, mathematics is in good hands and needs no gloss from me. But in key areas of interpretation and abstraction, the relationships between abstract structures and physical theories are typically ignored by mathematicians or deemed to be "arbitrary". In the nineteenth century Feuerbach showed Western Christendom that the God it worshipped was not "out there" but was nothing more than a fantastic image of itself. Modern mathematicians are not yet ready to accept the same conclusion about the logic they rely on: mathematics and logic are not "out there"; they are abstracted from the real world of objects, processes, language and conceptual thought.
Chapter 1
Orientation and Outlook
Writings about quantum mechanics are hedged about with claims that it changes our whole conception of the applicability of logic and mathematics and revolutionises philosophy. While much of this discussion is patently foolish, it is worth attempting to say a little about some of the assumptions about philosophical, logical and mathematical matters which I make (mostly silently) later in this work. The ideas simply sketched here will be developed elsewhere. What is said here is trifling as philosophy; it is merely a general orientation to the study of mathematically-articulated science.
Contents
  1.1. General Orientation 3
  1.2. Materialism 4
  1.3. Materialism and Realism 8
  1.4. Logic 9
  1.5. Mathematics 12
  1.6. Reversing Abstraction 13
  1.7. Definitions, Laws of Nature and Causality 15
  1.8. Foundations 18
  1.9. Axioms 20
  1.10. An Interpreted Theory 21
1.1. General Orientation
The theory of Schrodinger's mechanics, more than most other parts of theoretical science, seems to be inextricably involved with philosophical ideas. This is not, in my view, an essential part of quantum theory but rather reflects what often happens when a new theory brings into sharp focus a collection of ideas which were only imperfectly absorbed in, or not involved in, the previous theories. It is astonishing how confusions, which are basically misunderstandings about the meaning and referent of probability theory, have generated a re-examination of the applicability of logic and mathematics to the world and even raised doubts about the objective existence of that world. Ostensible paradoxes involving quantum mechanics continue to be marvelled at even though these paradoxes, while brought to the attention of physicists by quantum theory, do not, in essence, involve quantum theory at all. In this introductory chapter I try to give the briefest of summaries of my attitude to some of these points of ontology, logic and mathematics, in that order. They are not meant to be anything other than starting points for a more comprehensive philosophy of science which has been given by others.
1.2. Materialism
I assume that the world exists independently of my (or anyone else's) perceiving, measuring or thinking about it. Without this assumption the study of science is ridiculous. Post-classical materialism is a product of the British Isles, perfected, alas, by others. After John Duns Scotus had wondered, around 1300, whether "matter could think", and after Francis Bacon's penetrating founding of modern materialism, we find Thomas Hobbes, some three and a half centuries after Duns Scotus, writing:
The Universe is corporeal; all that is real is material and what is not material is not real. (The Leviathan, 1651)
After another 300 years, this vigorous statement has been refined to a rather more cautious
Materialism: the doctrine that all items in the world are composed of matter. Because not all physical entities are material, the related doctrine of physicalism, claiming that all items in the world are physical entities, has tended to replace materialism. (The Blackwell Companion to Philosophy, N. Bunnin and E. P. Tsui-James)
These definitions, although suggestive, do not say what "corporeal", "material", "real", "physical" and, above all, "matter" are, nor is "item" defined. Surely, in this important context, one is entitled to ask, at the very least, for a careful statement of the meaning of "material" and "matter"?¹ But I am being unfair to Hobbes and his commentators; he was bound, as we all are, by the cultural, ideological and scientific milieu in which he lived. The sense of Hobbes' statement is clear enough and, in particular, the strength of his polemical formulation is understandable, surrounded as he was by established religion. Most modern statements of the meaning of materialism take the general line of the second quotation and, again, the general sense of it is clear, particularly if one has no familiarity with physical science. However, the elucidation of the nature and dynamics of matter is, arguably, a definition of the aim of all physical science. With every major revolution in physical science our conceptions of the nature of matter and of the transformations it may undergo are overthrown. We have seen
• Newton's "massy impenetrable particles" replaced by
• The Rutherford "planetary" atom composed mostly of empty space, which in turn has had to give way to
• The idea of matter composed of particles which are nothing more than "quanta" associated with various fields (arguably massless, extended substances)
• which, we can confidently expect, will be replaced in due course.
There is every reason to believe that the scientific investigation of matter will never exhaust its properties and that our ideas of what matter is will continue to evolve.² Also, in view of past experience, we can be completely confident that, at any particular time, our best ideas about what matter actually is will be wrong.
¹ Of course, definitions have to stop somewhere, with primitives which can, at best, be described, not defined. But I think that an "ism" cannot be a primitive; it must be defined in terms of the "X" of which it is the "X-ism".
² If a series of complete revolutions can be called evolution.
So, in the absence of a definition or description from the philosophers, these changes in the conception of matter have apparently placed materialism, as defined above, in the curious position of being dependent on the pronouncements of a current specialised part of science (quantum field theories, at the moment), because ideas about the nature of "matter" are constantly changing. But materialism (like idealism) is an ontological or metaphysical statement about what exists; it cannot be dependent on the pronouncements of particular, special sciences. Decisions about ontology cannot depend on the results of physical science, however suggestive these might appear. If we decide, for example, that matter exists independently of our minds, we can scarcely complain if we are disappointed or repelled by what detailed investigations reveal about the actual nature of that matter.³ The details of any particular current theory of the structure of matter have no bearing whatsoever on one's ontological position, whether it is a branch of materialism or some form of idealism, scepticism, empiricism or whatever. Scientific investigations (practical and theoretical) continue to reveal the richness and depth of the structure of matter and its autonomous transformations; to base a general ontological position on any particular ideas about the structure of matter is to give a hostage to fortune of the worst kind. What materialism says is not what matter is but that matter, whatever it is, exists. That is, the characteristic property of matter for the purposes of philosophical materialism is not what its constitution is thought to be at any particular moment in time but the fact that it exists independently of our perceptions, thoughts, theories and intuitions about its true nature. The "materiality" of matter consists in precisely this independent existence, and I cannot improve on the formulation of the most influential⁴ materialist of the twentieth century (and, arguably, of any century), Vladimir Ilyich Ulyanov, who says:
For the sole "property" of matter with whose recognition philosophical materialism is bound up is the property of being an objective reality, of existing outside our mind. (V. I. Lenin, Materialism and Empirio-Criticism, 1908 (emphasis in original))
³ My own opinion, for example, that wasps are an evolutionary mistake, being far too heavily armed for their size, does not affect my general acceptance of biological evolutionary theories.
⁴ And profoundly unfashionable.
Some modern physicists, disorientated by the huge changes in the conception of the structure of matter, have declared that, since some theories make fields the ultimate constituents of matter, "matter" has disappeared from physics. In some cases the change in ideas about the structure of "Newtonian matter" is blurred together with the idea of a mechanistic universe and even with the "information revolution":
Thus the rigid determinism of Newton's clockwork universe evaporates, to be replaced by a world in which the future is open, in which matter escapes its lumpen [sic] limitations and acquires an element of creativity. (P. Davies and J. Gribbin, Chapter 1 (The Death of Materialism) in The Matter Myth (Viking, 1991))
It is just as true that "Newtonian matter" has disappeared as it is that "the flat earth" has disappeared:
• The idea of "the flat earth" is an eminently sensible one at the everyday level of operations and is perfectly acceptable as a description of the surface of the planet for one's local environment and journeys of a hundred miles or so. Conversations and plans for car and rail journeys may be safely entered into in the certain knowledge that the assumption of a "flat earth" will be understood in context and will not be seen to conflict with the understanding that, in a wider context, the earth is roughly spherical.
• The idea of "matter" as understood in a Newtonian sense is an eminently sensible one at the everyday level of operations and is perfectly acceptable as a description of everyday objects in our normal environment. One may converse about buying apples or building with bricks in the certain knowledge that everyone will understand that these objects are made of hard, substantial "Newtonian matter". No layman will be in any doubt that it is this property of substance which prevents our hands from passing through each other when we clap them together. This understanding will not be seen to contradict any more fundamental understanding of the structure of matter.
What has happened in both these cases is that an everyday concept has had to be replaced by a more precisely-defined idea when used in a scientific context.
The fact that theories of the underlying structure of matter have changed from the naive assumptions current when the word and concept "matter" were formed in no way changes the basic materialist view that matter, whatever its structure, exists objectively whether or not we perceive it.
1.3. Materialism and Realism
Personally, I have no doubt whatsoever that natural gas (methane, CH₄) exists independently of my or any other mind. What is more, the formula CH₄ is interpretable as saying (among other things) that a molecule of methane contains 5 atomic nuclei. I believe that these 5 nuclei exist independently of my mind. Also, I know from both quantum mechanical calculations and a wealth of experimental data that the methane molecule has other properties which are not material. In particular, I believe that the four hydrogen nuclei are disposed about a central carbon atom in a tetrahedral manner; the lower the temperature of the methane (the smaller the amplitude of its vibrations), the more nearly does it assume a regular tetrahedral shape.
• The number 5 is not a material object, although a penta-atomic molecule is.
• The tetrahedral shape of a methane molecule is not a material object, although a tetrahedral molecule clearly is.
So I must dissent from Hobbes' vigorous opinion at the start of Section 1.2 on page 4; integers and shapes (arrangements) are real but not material, in any sensible use of the term "real". Further considerations along these lines will reveal a whole host of other things which are real but not material. All of them are conceptual structures and exist in the minds of people; all nouns (except proper names of individual concrete objects) denote things which are abstractions. The desk on which my computer sits does not even have a name; it is just "my desk", but "desk"⁵ is a concept and, as such, exists only in minds. Naturally, as far as I am concerned, minds are properties of (or processes in) some material objects, but they and their "contents" are not themselves material objects. Thus, although a materialist, I cannot be what one might term an exclusive materialist; there are obviously, in the world, things existing independently of my mind which are not material.
One is naturally drawn to the use of the word "realist" to denote such a metaphysical position, since what I am saying is that both material objects and abstract objects are real. Unfortunately, the word realist in the philosophy of science seems to have come to mean "the existence, independent of minds, of the object of study", whether it is material or something else (there are realist mathematicians who believe in the objective, mind-independent existence of, for example, Laplace transforms). However, without attempting (or indeed wishing) to work out a philosophical system, I take the position that the things that exist are of (at least) two kinds⁶:
1. Material objects which exist independently of minds.
2. Concepts, abstractions, ideas, etc. which exist in minds and are formed culturally and collectively by the interaction of the culture and the individual with material objects and other minds.
Some of these material objects have the property of having and maintaining a mind. The existence (once formed)⁷ of these minds is independent of other minds. In particular, there are no minds independent of bodies, and minds may interact with the material world through bodies containing minds, either their own or others'. These assumptions seem to me to be both self-evident and philosophically innocuous, and they are certainly sufficient for the purposes of this work.
⁵ I choose not to distinguish between the word desk and the concept denoted by that word in order to avoid the disintegration of any discussion into a theory of the proper use of quotation marks, of which there are quite enough in this chapter already.
1.4. Logic
Most mathematicians and many philosophers regard logic as the sui generis a priori "thing". Indeed, like the fish that is unaware of the water in which it swims, one's first impression is always "how could things be any other way?"; logic is not empirical, it is part of the structure of our minds, it is not capable of derivation or falsification. However difficult it is to conceive, and whatever the difficulties in finding opportunities for experimental research, there are always some indefatigable workers who will try to find things out. After the social revolution in the Tsarist Russian empire in 1917, the newly-formed Soviet Union was a collection of extremely heterogeneous countries, peoples and cultures, from the urbane metropolitan Muscovites and thoroughly European citizens of St. Petersburg to the nomadic tribes of central Asia. Amongst these peoples were some whose culture was materially quite advanced but who were illiterate. That is, the culture was illiterate: no one could read or write because there was no written expression of their language. This situation was and is extremely rare. It is not unusual for members (even a majority) of a given culture to be illiterate (this was the case in Russia itself) or for very primitive cultures to have no writing, but for a materially advanced culture to be entirely without writing was rare, and such a culture may no longer be capable of being found. Two pioneering and innovative neuropsychologists, A. R. Luria and L. S. Vygotskii, were working on the relationship between the development of language and of mind, doing work on child development and on individuals with brain lesions. They had the unique opportunity to study such cultures in the 1920s. Their work in this and other areas is now classical in their field and has influenced the development of neuropsychology and linguistics ever since. One of the most interesting and novel of their findings in interviewing and talking to these people was that they had not developed the syllogism; they did not have the general idea that, for example, from:
1. All polar bears are white
2. Boris is a polar bear
it necessarily follows (we say) that Boris is white. They would typically demand more information or express exasperation and say that, as they had never been to the Arctic, how were they to know the colour of Boris? There was no question that it could have been the paucity of language which was at fault; it seems that their (spoken) language was concrete and they had not actually developed some of our abstract ideas at all.
⁶ Notice that this rather elementary taxonomy of things is not a Cartesian dualism, any more than the recognition of the existence of two types of living things (plants and animals, say) makes one a dualist in biology.
⁷ It is an empirical point whether or not an isolated individual would develop the property of having a mind. This could only be determined experimentally; reminiscent, perhaps, of the seventeenth-century experiment to discover man's natural language by having a pauper child brought up in isolation by a dumb nurse.
They had not developed the structures which we are in the habit of regarding as innate and a priori. With hindsight it is not too difficult to reconstruct a possible scenario. It is, perhaps, too often forgotten by theorists that language originates as spoken language. As we all know, even the most literate of us, when in conversation, will express meaning by facial gesture, by pointing and by mime when necessary. If the language is not written, it may not be necessary to evolve actual sounds (articulated words) for some of the most familiar objects, properties and processes around us. However, when we need to convey information in writing to some person who is not with us, and so cannot use these useful aids, we must create words and forms of grammar which will enable us to get the message across. Once this level of abstraction has been reached, the written language itself becomes an object of study and refinement, and certain structures can be seen to have become explicit in the written language which were only implicitly present in the spoken form. In short, it is not at all difficult to see that Logic is a structure which is abstracted from language; it is not inherent in the human mind.⁸ In the case of Vygotskii and Luria, this and a mass of other research (often done in spite of the official ideology) convinced them that language, logic and mind are social developmental products; they evolve in the individual and in the culture through interaction with the material world and with other minds. They have also convinced me.
One of the main manifestations of logic, the concept of proof, is nicely illustrated by Wittgenstein's opinion that it is sufficient to "prove" the commutative property of the multiplication of integers simply by exhibiting a rectangular array of objects of sides N and M and noting that the products N×M and M×N are simply two ways of viewing the same array. Wittgenstein also saw clearly that attempting to validate mathematical proofs by some form of metamathematics is just a regress; just another calculus, as he insisted. In the end, although the processes of a calculus may be objectified (even automated), the validation of the process can only come from humans or from comparison with the real world. We might, therefore, characterise a valid logic as a calculus for which there is "consensus amongst experienced practitioners" rather than as having objective (mind-independent) existence. As Scruton⁹ says, "[mathematics is] a projection into logical space of our own propensities to coherent thought."
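Wittgenstein's picture of commutativity can be mimicked in a few lines of Python. The sketch below is only an illustration (the function names are hypothetical, not from the text): it builds one rectangular array of objects with N rows and M columns and counts it twice, once as N rows of M objects and once as M columns of N objects. In keeping with the point just made, running it for particular N and M exhibits the same array viewed two ways; it is not a general proof.

```python
def make_array(n_rows: int, n_cols: int) -> list[list[str]]:
    """A rectangular array of identical objects: n_rows rows, n_cols columns."""
    return [["o"] * n_cols for _ in range(n_rows)]


def count_by_rows(array: list[list[str]]) -> int:
    """View the array as rows: N groups of M objects each."""
    return sum(len(row) for row in array)


def count_by_columns(array: list[list[str]]) -> int:
    """View the same array as columns (transposed with zip): M groups of N objects each."""
    return sum(len(column) for column in zip(*array))


if __name__ == "__main__":
    n, m = 3, 5
    array = make_array(n, m)
    for row in array:
        print(" ".join(row))
    print(count_by_rows(array), count_by_columns(array))  # 15 15
```

Nothing is computed twice here except the viewing: both counts walk over the very same list of objects.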
I shall, in this work, silently assume the validity of "ordinary" logic, not least because I shall be expressing myself in "ordinary" language and because logic is inherent in all of "ordinary" mathematics (set theory, algebra and analysis) and, without language and mathematics, I would be unable to proceed. This takes us naturally on to a consideration of the use of mathematics in science.
⁸ Perhaps I should be more careful, in view of the work of Frege, Russell and Wittgenstein, and say "abstracted from a study of language".
⁹ R. Scruton, From Descartes to Wittgenstein (Routledge, 1981).
1.5. Mathematics
The question of the "miracle" of the applicability of mathematics ("free creations of the human mind") to the real world in scientific theories is discussed in many works of the philosophy of science; there is a summary and extension of the ideas involved in a recent book¹⁰ on the subject, so this is still a topical issue. A visit to a really well-stocked tool shop by a novice in hand-work might generate the same feelings: how is it that all these supremely useful objects ("free creations of the tool-maker's mind") can be so convenient and, above all, so applicable? Mathematics is a system of formal rules for the manipulation of formal objects which has been developed by abstraction from millennia of cultural interaction with the material world. Language is a system of formal rules for the manipulation of formal objects which has been developed by abstraction from millennia of cultural interaction with the material world. And yet, one would be hard pressed to find, in scientific philosophy, a discussion (in language) of the problem of the miracle of the applicability of language to the real world. It is the same as the apparent problem with logic, in that we are so immersed in language that we cannot conceive that it would let us down. With mathematics we see it for what it is: a tool which has been created to help in our investigation and description of the world. Each of us has to be taught mathematics and to learn the rules explicitly; we do not pick it up at our mother's knee in childhood. So we are able to stand back and question its structure and applicability in much the same way as we are inclined to do when we learn a foreign language through explicit instruction rather than by the process of "osmosis" which occurs when we are immersed in the cultural usage of our native language.
Now, while it is entirely possible that some particular parts of mathematics may be wrong (like the view that every continuous function must have a derivative), this is due to logical errors on our part or failures of intuition during the formal processes of generalisation and systematisation. The idea that mathematics might not be applicable to the real world forgets that mathematics has been historically developed from a study of the real world and, as it were by natural selection, the resulting concepts and techniques are necessarily applicable.
¹⁰ M. Steiner, The Applicability of Mathematics as a Philosophical Problem (Harvard University Press, 1998).
Of course, since the end of the nineteenth century, mathematics and mathematicians have sought to develop autonomously of science; but the very laws of logic and abstraction which enable this apparent autonomy have, themselves, been generated by abstractions from language and interaction with the world. Without further ado, I shall assume the applicability and utility of ordinary mathematics, not least because it contains logic, which I have already accepted.
1.6. Reversing Abstraction
It is increasingly the case in modern mathematically-articulated theoretical science that material is presented first as an abstract mathematical skeleton and the interpreted scientific structure is then given as a "representation" of that abstract structure. Some mathematicians call this the "lapidary" method; the gems are presented cut and polished, removed from the gross material in which they were found and mounted for display in such an environment that there can be no clue of their original earthly origins. In the context of considering the validity of abstraction and interpretation, it is perhaps not out of place to be prepared for the rather eccentric use to which mathematicians are inclined to put the word "representation". In ordinary English, if A is a "representation" of B then B is richer and more complex than A; it contains more structure than A. For example, a wiring diagram is a "representation" of the electrical system in a car or building; a flow chart is a "representation" of an industrial process; a portrait is a "representation" of a child. In all these cases the representer is an abstraction of some properties of the represented and is used to show only one aspect of a more complex entity under study. It might therefore be reasonably expected that, in mathematics, the statement "A is a representation of B" might be a paraphrase of something like "A is abstracted from B", as it is in both everyday and scientific usage. In fact the mathematical usage is exactly the opposite; the abstraction is taken to be the more basic or fundamental entity (the represented) and the concrete entity as the representer. Thus, for a mathematician:
• Line elements in real space are a "representation" of an abstract vector space.
• Rotations of solid bodies in real space are a "representation" of group theory.
• Looking ahead, the solutions of the Schrodinger equation are a "representation" of Hilbert space.
One assumes and hopes that this inverted way of expressing an idea is to make allowance for the undoubted fact that the same mathematical structure may be abstracted from many (perhaps unrelated) real structures and processes, and not in the (Platonic) belief that the mathematical structures are more fundamental. I have said in Section 1.3 that I assume that mathematical structures are real, but with the very strict proviso that these structures, like many others, do not exist outside of people's minds; they are real but not material. Mathematics, like language, is generated by abstraction from the cultural interaction of material people with the material world and by interaction among minds; it has no existence independent of those minds in interaction and would certainly not have been generated without the historical interaction of those minds with the material world.
This upside-down terminology will not cause any difficulty in actual applications of mathematics to scientific theories, but it can and does give an unfortunate philosophical slant to the interpretation of those theories. If a mathematical or logical structure which is contained in (can be abstracted from) a scientific theory is regarded as more "real" than that scientific theory, then the physical interpretation of the quantities in the theory becomes difficult and may come to be regarded as arbitrary. This view can be reinforced by the generation of intuitive paradoxes in the abstract theory which were not present in the original, physically interpreted, theory. The use of the extremely powerful methods of mathematics has more than just a manipulational value in physical science; these methods are a guide to intuition and an aid to concept formation. But mathematics is neither the source nor the destination of scientific theories.
In spite of eminent opinion to the contrary, therefore, it is not possible for there to be any "Mathematical Foundations of Quantum Mechanics". If the unfortunate architectural analogy is to be used at all for the role of mathematics in the sciences it is more akin to the wiring, plumbing and heating — all those services which make a building convenient and pleasant to live in — than to the foundations. A more realistic metaphor is between mathematics and the tools with which the edifice is constructed; the building's ultimate shape is determined just as much by the techniques available as it is by the use for which it is intended.
From the outlines of the use of language, logic and mathematics which I have given in this chapter, it would appear that their development, in mutual interaction and abstraction, has meant that they are so locked together with each other and with the material world that we can never "step outside" of these structures to think about the world objectively. That we can, in fact, do so is proved, not by ratiocination, but by the very fact that we have been able to do so in both theory and practice: we can use logic, our mathematics does work and is applicable, we can understand and explain the world and, a fortiori, we have used all these things to generate working logical and material technologies. In fact, it is easy to see that it is entirely possible to solve problems which are strongly "coupled" and "non-linear" in the sense of complete mutual dependence. The analogy which springs most readily to my mind (due to some of my day-to-day practical work) is that of the solution of the Hartree-Fock or Kohn-Sham equations for the electronic structure and energetics of many-electron molecules and solids. The distribution and energy of each electron in these systems is dependent on the distribution and energy of all the others, presenting a problem which has all the characteristics of a "deadly embrace"¹¹: we need the distribution of all but one of the electrons before we can compute the distribution of a particular one of them, but those distributions can only be computed when we know the distribution of that particular one. But, in computing laboratories all over the world, these equations are solved thousands of times a day as a completely routine task; they are solved iteratively, by guessing a solution and progressively refining it until it satisfies a self-consistency criterion. In theoretical science, perhaps the last thing we want is mere self-consistency, because that would mean stagnation and self-satisfaction, but the analogy is useful.
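The iterative escape from the "deadly embrace" can be illustrated with a deliberately tiny toy model. The sketch below is not Hartree-Fock or Kohn-Sham; it is a hypothetical set of mutually dependent quantities, x[i] = a[i] / (1 + c·Σ_{j≠i} x[j]), chosen only because each value depends on all the others. The function names, the coupling c, the mixing fraction and the tolerance are assumptions for illustration. The loop guesses a solution, recomputes every component from the current guess, blends old and new values (a simple damping step, similar in spirit to the mixing many SCF programs use) and stops when input and output agree.

```python
def self_consistent_solve(a, c=0.1, mix=0.5, tol=1e-10, max_iter=500):
    """Toy self-consistent loop for a hypothetical model (not Hartree-Fock).

    Each x[i] depends on all the *other* components:
        x[i] = a[i] / (1 + c * sum_{j != i} x[j])
    Returns (x, iterations) once input and output agree within tol.
    """
    x = [1.0] * len(a)  # initial guess
    for iteration in range(1, max_iter + 1):
        total = sum(x)
        # "Field" felt by component i: the sum over all the other components.
        new = [ai / (1.0 + c * (total - xi)) for ai, xi in zip(a, x)]
        change = max(abs(n - xi) for n, xi in zip(new, x))
        # Linear mixing (damping) of the old and new guesses stabilises the loop.
        x = [(1.0 - mix) * xi + mix * n for xi, n in zip(x, new)]
        if change < tol:
            return x, iteration
    raise RuntimeError("no self-consistency within max_iter iterations")


def residual(x, a, c=0.1):
    """Largest mismatch between x and the values it implies: the consistency test."""
    total = sum(x)
    return max(abs(ai / (1.0 + c * (total - xi)) - xi) for ai, xi in zip(a, x))


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    x, n_iter = self_consistent_solve(a)
    print(f"converged after {n_iter} iterations: {x}")
    print(f"self-consistency residual: {residual(x, a):.2e}")
```

No component is ever computed "first"; every pass uses the whole current guess, which is the point of the analogy.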
1.7. Definitions, Laws of Nature and Causality
There is a persistent tendency in science to confuse scientific definitions with laws of nature. This tendency ranges from simple confusion between the two to explicit positivistic programs to reduce one to the other; for example:
• Mach tried to reduce Newton's laws to definitions by defining force in terms of mass and acceleration using F = ma.
• The famous experiment to determine the mechanical equivalent of heat cannot now be done, since the answer will always be unity as both heat and work are measured in the same units.
• Measurement of the velocity of electromagnetic radiation in vacuo is now impossible, as this velocity is a standard definition.
The first of these is simply a mistake, which I will elaborate below. The others are more interesting, since they fuse together a definition and a scientific law; the fusion can be clarified by other, more extreme, examples:
• Anchor chain and dress fabric are both measured in metres (which is a definition) but they are not interconvertible (which is a law of nature).
• Graphite and diamonds are both measured in grams (which is a definition) and they are interconvertible (which is a law of nature).
• Work and heat are both measured in joules (which is a definition) but they are only partly interconvertible (which is a law of nature).
The fact that two quantities may be measured in the same units has no consequences at all for any relationship that they might have in natural processes. In fact, much of the science of thermodynamics is concerned with the fact that work and heat are not completely interconvertible. Heat and work are physically and conceptually different things; they are not equivalent, merely partially interconvertible. We shall meet this confusion between definitions and laws of nature later in a more quantitative form: the confusion (in science, not mathematics) between definitions and equations. It is equations which are the quantitative carriers of laws of nature; relationships which have the same form as equations may be simply quantitative definitions. In Chapter 7, and particularly in Section 7.1.1, this point will be taken up in detail in the context of Schrodinger's mechanics.
¹¹ Or a Catch-22, perhaps.
In the simpler examples of F = ma or pV = RT, the question of whether these expressions are equations or identities cannot be resolved by a minute examination of the symbols involved; it is a question of interpretation and meaning. If the symbols in an expression are all independently defined physical quantities, then the expression is an equation; it expresses a (possible) necessary relationship between those quantities in the real world. If all but one of them represent such physical quantities, then the expression is merely a definition of the remaining one and does not imply anything about the structure or transformations of the material world. Thus, for example, in pV = RT, pressure (p), volume (V) and temperature (T) are all defined independently of each other and (knowing the value of the constant R) the expression is an equation which contains an (approximate) law of nature.
Among the considerations involved in discussing laws of nature, causality plays a major part, and it is worth discussing, however briefly, how causality can be viewed from the point of view of the philosophy and interpretation of science. The classic empiricist/positivist view of causality is contained in Hume's account, quoted with approval by Kant¹²:
Necessary connection, then, cannot be observed, nor can its existence be logically derived from what is observed. (my emphasis)
Truly spoken from the depth of the philosopher's armchair by someone who has never traced a fault in a complex piece of machinery or searched for a bug in an iterative computer program. Necessary connection (causality) is inaccessible to philosophy because it is not a question of precision or clarity of expression but a question of experimental interaction with the real world. No amount of passive observation or logical deduction can establish a causal connection between events. The mere fact that the 12.15 express to Glasgow always precedes the 14.25 train to Bristol says nothing about whether or not a trip to Glasgow causes a journey to Bristol; one must see what happens when one prevents the Glasgow train from leaving. I shall remark from time to time on the overwhelmingly passive view of much philosophy; without an engagement with the real world, philosophy is impotent.¹³
¹² Kant famously said that "reading Hume woke me from my dogmatic slumbers"; one wonders, in view of this quote, about the depth of Kant's dogmatic slumbers. But this is unkind; Kant must have been the first to see that philosophy must ultimately split into two: a linguistic cul de sac and natural science.
¹³ A point on which Marx and Wittgenstein were in complete agreement, although their expressions of this view and their opinions about it are characteristically different. Marx enthusiastically advocates engagement with the material world in the Theses on Feuerbach, while Wittgenstein makes the wistful statement that "philosophy leaves the world unchanged".
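The distinction drawn above, between pV = RT read as an equation and read as a definition, can be made concrete with a small sketch. Assuming one mole of gas and R ≈ 8.314 J mol⁻¹ K⁻¹, the hypothetical helper below measures the relative mismatch between pV and RT. When p, V and T are supplied independently, the mismatch is a fact to be checked and can come out non-zero; when T is instead defined as pV/R, the mismatch vanishes (to rounding) for any p and V whatever, so the expression no longer says anything about the gas. The numerical inputs are arbitrary illustrative values, not measurements.

```python
R = 8.314  # molar gas constant in J mol^-1 K^-1 (rounded); one mole of gas assumed


def relative_mismatch(p: float, V: float, T: float) -> float:
    """Relative difference (pV - RT) / RT for one mole of gas."""
    return (p * V - R * T) / (R * T)


def temperature_by_definition(p: float, V: float) -> float:
    """Read pV = RT as a *definition* of T: T is whatever makes it true."""
    return p * V / R


if __name__ == "__main__":
    # Case 1: p, V and T supplied independently (arbitrary illustrative numbers).
    # The mismatch is an empirical matter and need not be zero.
    print(relative_mismatch(p=1.0e5, V=0.025, T=300.0))

    # Case 2: T defined from p and V. The "law" now holds for every input,
    # which is exactly why, read this way, it tells us nothing about the gas.
    for p, V in [(1.0e5, 0.025), (2.0e5, 0.001), (5.0e3, 3.0)]:
        T = temperature_by_definition(p, V)
        print(relative_mismatch(p, V, T))
```

The symbols are identical in both cases; only the status of T (independently defined or defined by the expression) changes.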
1.8. Foundations
From very early in the history of Schrodinger's mechanics there have been works which aim to provide the mathematical or philosophical foundations of quantum theory. In view of what I am to say later in this work, it is useful to think, however briefly, about the sort of views expressed in these works. I have already commented on the unfortunate nature of the "foundations" metaphor, but it is widely used in the philosophy of science community to mean the "essential underpinning" in the sense of a building's foundations, and it is in this sense that I want to comment on "foundations" here. Like many mature parts of physical science, Schrodinger's mechanics is expressed in mathematical terms and uses many of the standard conceptual structures of classical analysis, in addition to all the usual structures implied by classical logic (algebra, inference, etc.). These structures are, of course, used in Schrodinger's mechanics in exactly the same way as they are used in any science which is mathematically formulated, whatever its area of applicability and its level of treatment (however fundamental or approximate it claims to be); they are part of the (current) mode of articulation of the theory.¹⁴ The foundations of a distinct part of physical science, whether mathematically expressed or not, cannot be mathematical, since:
• as just noted, the same mathematical structures may be used in a wide variety of unrelated branches of science;
• the basis of any science is not the rules which are used to manipulate structures representing objects or processes in the real world but the more fundamental assumptions which are made about the nature of the represented objects and of the autonomous processes they undergo.
Not all sciences are mathematically articulated, at least when first formulated, and they may be quite properly and precisely expressed in ordinary language.
Perhaps the most cogent and wide-ranging of such theories is Darwin's original formulation of the Theory of Natural Selection. No-one would claim¹⁵ that theories like this have "verbal foundations" in the same sense that Schrodinger's mechanics is sometimes claimed to have mathematical foundations. If we press the "foundations" metaphor a little, its claims begin to look rather thin. Mathematical and philosophical commentators, in claiming to discover or expose the foundations of a part of science, would, I am sure, be viewed askance by the creators of those theories, in much the same way that, say, Christopher Wren would be surprised by an announcement by an architectural critic that he had discovered the foundations of St. Paul's Cathedral. Wren was never in any doubt about what and where these foundations were; indeed it was he and his predecessors who developed the methods for establishing the foundations of large and imposing edifices. Likewise, Schrodinger¹⁶ would, I think, have been amused by claims that, until the mathematical and philosophical critics had done their work, his theory lacked foundations. Schrodinger knew very well that the foundations for his theory were not the tools that he had used in constructing it or the language in which he chose to articulate its concepts. The foundations were (and are) accumulated experimental observations of the real world and the scientific culture epitomised in the seminal works of Newton, Lagrange, Hamilton and Jacobi. A dozen pages of the theory of operators on linear spaces hardly qualify as the foundations of the most successful theory of matter yet developed.¹⁷ Historically, branches of science do not have mathematical and philosophical foundations but are themselves the driving force for (or the foundation of) the development and application of techniques of mathematics and the formation and extension of philosophical ideas. This is the crucial point for the distinction between a materialist ("realist") view of the world and an idealist point of view.
¹⁴ There is a beautiful example in V. I. Arnold's excellent text Mathematical Methods of Classical Mechanics (Springer-Verlag, 1978), where the author says (p. 163): "Hamiltonian mechanics cannot be understood without differential forms". One is bound to wonder, therefore, how Hamilton himself managed in the nineteenth century without this twentieth-century mathematical tool.
¹⁵ Except, perhaps, in the gormless prattle of the post-moderns.
One is bound to ask, however naive it may seem, whether an edifice of any kind, physical or logical, can be constructed without a knowledge of the location and nature of its foundations. Perhaps, in the spirit of William of Ockham, one should inquire what would happen to Schrodinger's mechanics if the alleged foundations were summarily removed.
¹⁶ Who was notoriously skeptical of taking the work of scientific philosophers seriously.
¹⁷ See Foundations of Physics by M. Bunge (Springer Tracts in Natural Philosophy, Vol. 10, 1967) for an attempt to present the foundations of physical theories in a balanced way; Bunge's first chapter (on logic, mathematics and philosophy) is called, significantly, "Toolbox" not "Foundations".
1.9. Axioms
The most effective way to study an existing body of knowledge, particularly if that body is mathematically articulated, is to find a set of axioms from which the whole corpus may be derived by the rules of logic and mathematics. In this way, inconsistencies, redundancies and straightforward errors of reasoning may be isolated and eradicated. This kind of technical enquiry has a very important place in the study of the theories of physics in particular. However, I shall not use this method here, for reasons which might seem, at first sight, a little perverse. I prefer to stress the basic physical law which generates Schrodinger's mechanics, rather than to axiomatise a structure which is capable of being abstracted from the ramifications of that basic law. Extant axiom systems for quantum theory are of varying degrees of formality, from, for example, Ludwig's two-volume system¹⁸ to informal systems (where the axioms are usually called postulates) found in many graduate-level texts. Most of these systems contain wrong prescriptions¹⁹ for the generation of the most important differential operators in Schrodinger's mechanics and many of them, in an attempt at generality, contain what I consider to be²⁰ false equivalences amongst the properties of some operators in Schrodinger's mechanics. However, these imperfections are not the main reason for my reluctance to search for an axiom system for quantum theory; the main reason is the feeling that axiom systems in physical science are a tool of taxonomy rather than of science. Let me try to explain by means of an analogy. In biology there is always a tension between the desire (by taxonomists, mainly) for a watertight classification of (say) mammals and the need (by evolutionists) to show that such a scheme specifically excludes the overarching property of mammals; that is, species evolve.
I do not share the desire of taxonomists, and of many mathematicians, to expose the Platonic forms lying behind the imperfect corporeal representations which we experience; I am, however, only too pleased to share in the invention of abstract structures which have some of the properties of real objects and processes. In a word, axiom systems are abstracted from developing theories just as the classification of species presents a provisional episode in the history of mammalian development. The main difficulty with the axiomatic approach from the point of view of this work is that the interpretation of the symbolism must also be axiomatised; there have to be axioms of interpretation rather than the development of interpretation from the history and applications of the theory. It brings to mind the famous book on analysis by Dieudonne (an arch-axiomatiser and Bourbakist), in which he says in his 1960 preface:
"This has also as a consequence the necessity of a strict adherence to axiomatic methods, . . . a necessity which we have emphasised by deliberately abstaining from introducing any diagram in the book." (Foundations of Modern Analysis, J. Dieudonne (Academic, 1960))
This may be a useful logical exercise, but is it a way to understand analysis? The cat is let out of the bag in the preface to the enlarged and corrected printing of 1969, where we find:
"The only things assumed at the outset are the rules of logic and the useful properties of the natural numbers.... Nevertheless, this treatise... is not suitable for students who have not yet covered the first two years of an undergraduate honours in mathematics."
Just so.
¹⁸ G. Ludwig, An Axiomatic Basis for Quantum Mechanics, Vols. I & II (Springer-Verlag, 1985).
¹⁹ See, for example, G. Ludwig, Foundations of Quantum Mechanics I & II (Springer-Verlag, 1985), Vol. II p. 50, where, in addition to giving a wrong prescription for generating the Hamiltonian, it is generously asserted that Schrodinger "guessed" the correct form "with remarkable intuition". I leave the reader to judge, by reading Appendix 8.A, whether or not Schrodinger guessed the correct form.
²⁰ See Appendix 11.A.
1.10. An Interpreted Theory
In this work I shall attempt to present a completely interpreted theory of Schrodinger's mechanics, in the sense that I shall try to give a physical (materialist) interpretation to every major symbol which occurs: functions, operators and the like. The theory and the interpretation will be based on a general dynamical law (due, of course, to Schrodinger) and a theory of probability (due to Kolmogorov), and articulated with ordinary language, logic and mathematics. The value and validity of the theory should not be judged simply by the agreement of a few numbers with experimental results; this is far too modest a requirement. I hope that the theory will be judged by its coherence and its interpretation of its area of applicability: the sub-atomic domain. I simply assume that this sub-atomic world exists independently, without my permission. I shall not be making any comments on the relevance of Schrodinger's mechanics to observers, to consciousness, to minds or to God.
PART 2
Probabilities
The interpretation of the modern (measure-theoretical) theory of probability is at odds with the everyday meaning(s) of the word probability. The material presented stresses (a) that Kolmogorov's theory of probability is the only one which can be used in the context of quantitative theories and (b) that probabilities are the relative measures of sets. Statistical methods are related to probabilities insofar as they are ways of experimentally determining these measures.
Chapter 2
Simple Probabilities
After some elementary considerations about the relationship between the "colloquial" and "mathematical" use of some common mathematical terminology, the idea of probability is introduced, not axiomatically at this stage, but descriptively. The relationship between probability and statistics is clarified by use of the simplest and most familiar example: dice. The generation of probability distributions for systems which are entirely deterministic is discussed and their indispensability is emphasised. Some of the difficulties which will occur in the interpretation of Schrodinger's mechanics are briefly visited.
Contents
2.1. Colloquial and Mathematical Terminology
2.2. Probabilities for Finite Systems
    2.2.1. An Example: The Faces of a Cube
    2.2.2. Dice: Statistical Methods of Measure
    2.2.3. Loaded Dice: Statistical Methods of Measure
    2.2.4. Standard Dice and Conservation Laws
2.3. Probability and Statistics
    2.3.1. An Extreme Example
2.4. Probabilities in Deterministic Systems
2.5. The Referent of Probabilities and Measurement
    2.5.1. Single System or Ensemble?
    2.5.2. The Collapse of the Distribution
    2.5.3. Hidden Variables
2.6. Preliminary Summary
2.1. Colloquial and Mathematical Terminology
We are all familiar with the fact that a considerable number of words which have common, everyday meanings are used in more specialised, in particular more precise, ways in science and mathematics. Anyone familiar with elementary chemistry and mathematics can think of four or five different specialised uses to which the word "normal" can be put.[1] There are, however, some terms for which the mathematical or scientific usage is quite close to the everyday, colloquial usage, and yet the specialised usage seems perverse: although it is close to the colloquial usage, it gives a completely different "feel" from the ordinary, conversational use. The common use of the phrase "going off at a tangent" implies that there are several possible tangents to a curve; the whole sense of the familiar phrase is that one could take several possible (inappropriate?) lines of thought away from the one under consideration. Yet a mathematician will insist that there is only one tangent to a curve at any point on the curve. Similarly, if one says in ordinary usage that x is "derivative" of y, this tends to mean that x can be obtained in some way from y; thus a musical composition may be derivative of Debussy, a poem derivative of Larkin, etc. So one might naively expect that mathematicians would consider, for example, $47x^4$ or $3x^6 - 4x^2$ to be derivatives of $x^2$ since, if $x^2$ is known, then the other expressions can be evaluated. But, as we know, in the differential calculus the derivative of $x^2$ is $2x$. The mathematical definition flies in the face of the colloquial usage; there is only one derivative, and its value patently cannot be obtained uniquely from $x^2$ (knowing that $x^2$ is 9 gives two possible values for $2x$: $\pm 6$).

Probability is the worst possible case of this confusion between "colloquial" and "mathematical" usage, because it is only in the last 70 years that mathematicians have been able to give the term an unambiguous meaning, together with a method of evaluating and manipulating probabilities whose meaning and interpretation are precise enough for scientific use.[2]

[1] A standard concentration of solutions, standard pressure and temperature of a gas, a perpendicular to a curve or surface, a linear operator with special properties, etc.
[2] Significantly enough, for the purposes of this book, Kolmogorov's axiomatic development of Probability Theory (1930) came after Born's probability interpretation of Schrodinger's mechanics (1926-7).
We are all familiar with the various ways in which the term probability is used in everyday life:
• A given football team will probably win the World Cup.
• It will probably rain tomorrow or even, in the weather forecast, there is a 20% probability of a heavy shower at Wimbledon on Saturday.
• The probability of "heads" appearing on the toss of a coin is 1/2.
• The probability of throwing two "fair dice" and obtaining two sixes is 1/36.
and so on. All of these statements are completely comprehensible and make sense (convey real information), and their use is just as acceptable as the phrase "going off at a tangent" or "The Rolling Stones' music is derivative of the work of McKinley Morganfield".[3] But none of them uses "probability" in the sense in which it is used in the mathematical theory of probability. That this is so is more obvious for the first two statements than for the last two. In ordinary usage, where the meaning of probability is clear from the usage and context, it may imply "relative frequency", "reasonable expectation", "past experience", "a hunch", etc., but none of these familiar and frequently contradictory usages[4] can be made precise and quantitative enough for scientific use. As usual in this type of situation, we must extract what is quantitative and essential from common usage and, no matter how far the resulting definition seems to be from that common usage, show how the quantitative use includes all the idiomatic uses, a posteriori if necessary.

[3] Muddy Waters.
[4] Suppose that, since records began, it has never rained on June 17th in Sheffield and yet this year it has rained in north-west Scotland on the 15th and in Manchester on the 16th, and the cold front advances inexorably south-eastward; what is the probability that it will rain in Sheffield on June 17th this year?

2.2. Probabilities for Finite Systems

In the modern mathematical theory, probabilities simply involve a comparison of the numbers of members contained in various subsets of a given set. In those cases where the sets contain a finite number of members, the number of members may be obtained by ordinary counting. When the sets contain an infinite number of members, simple counting has to be replaced by a suitable measuring process, but the principle is exactly the same. It is conventional to divide the number of members in each subset by the total number of members in the whole set, so that the probabilities of non-overlapping subsets which together make up the whole set sum to unity. Before attempting to make this idea precise by mathematical definition, it is worth noting:
• The idea of "chance" plays no part in the definition of probability; probabilities are ratios of the measures of subsets of a given set. If we know how to measure (count the members of) these sets, we can calculate the probabilities uniquely and exactly.
• This choice shuts out many of the familiar ideas which fall within the legitimate colloquial use of the word "probability". It specifically excludes applying the idea of probability to any quantity which cannot be counted or measured. Thus we will not be allowed to use an expression like "this statement is probably true" or "it will probably rain tomorrow" and the like. Just as we continue to use the words "normal" and "derivative" with their ordinary, conversational meanings, we can continue to use "probability" in its colloquial sense, but not in the context of the mathematical theory of probability.
There is, in fact, no dispute that this is the mathematical theory of probability and, to mathematicians perhaps, probability is nothing more than a part of, or an application of, measure (integration) theory. Scientists concerned with the interpretation of probabilities seem also to regard the mathematical theory of probability in this way: as just an algorithm or "black box" which satisfies the requirements of mathematical rigour, and that is all. This gives them free rein to impose their own interpretation on the formal calculus, whether or not this interpretation involves the measures of sets.
Generally speaking, the measure[5] theory of probability plays no role in the interpretation of probabilities used in physical theories. Scientists seem dissatisfied with a definition of probability which does not involve the idea of randomness or chance in some way. This unfortunate dichotomy in the use of probability in science is compounded by a legacy from positivism and instrumentalism: a tendency to define physical quantities in terms of the experimental procedures used to measure them. This leads, as we shall see, to a confusion between probability and statistics and to an increasingly subjective interpretation of probability, particularly in its applications in quantum theory. In this work I take the view that Kolmogorov is right: probabilities are indeed relative measures of sets, and statistical measurements (or verifications) of probabilities are nothing more or less than approximations to these measures obtained by experimental means. This view is established first by some very simple and familiar examples.

[5] Throughout my discussion of probabilities and their experimental verification, I shall be in danger of tripping myself up over the use of the word "measure". I shall want to speak of the experimental "measurement" of probabilities (using "measurement" in its everyday, laboratory sense) and of probabilities as the "measure" of sets (meaning mathematical measure: counting or integration). I hope the reader will bear with me on this if I fail to make the proper distinction.

2.2.1. An Example: The Faces of a Cube
Consider a perfect cube whose faces are numbered so that we can distinguish amongst them, and take the whole set of six faces as our basic set and the subsets as the various possible collections of numbered faces. The subsets of special interest are those containing a single face; there are six of them. All these sets have a finite number of members, so we can measure them simply by counting their members; in particular:
• The probability that a face of the cube be numbered 5 is
$$\frac{\text{number of cube faces numbered 5}}{\text{total number of faces of the cube}} = \frac{1}{6}$$
• The probability that the number on a face of the cube is even is
$$\frac{\text{number of cube faces numbered even (2, 4, 6)}}{\text{total number of faces of the cube}} = \frac{3}{6}$$
and so on in the familiar elementary example. However, it does not take much thought to realise that these conclusions would also be true if the object were not a cube but any hexahedron, regular or not. This does not affect our calculation of probabilities but casts doubt on the possibility of the experimental verification of these probabilities if the object is incompletely specified. This is easily rectified either by simply insisting that the object be a cube or, perhaps better, by using the measure "area of a side" rather than simple counting. If now the measure of each side is to be the same (A, say), the object must be a cube and we can replace the above calculations by:
• The probability that a face of the cube be numbered 5 is
$$\frac{\text{area of cube faces numbered 5}}{\text{total area of faces of the cube}} = \frac{1A}{6A} = \frac{1}{6}$$
• The probability that the number on a face of the cube is even is
$$\frac{\text{area of cube faces numbered even (2, 4, 6)}}{\text{total area of faces of the cube}} = \frac{3A}{6A} = \frac{3}{6}$$
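The counting and area calculations above can be written out as a short Python sketch. The face list, the function names and the illustrative face area A = 7/2 are assumptions made here for illustration only; exact `Fraction` arithmetic keeps the ratios exact, so 3/6 is reported in lowest terms as 1/2. No random numbers appear anywhere, in keeping with the point that these probabilities are fixed ratios of measures.

```python
from fractions import Fraction

# The basic set: the six numbered faces of an abstract cube.
FACES = [1, 2, 3, 4, 5, 6]


def probability_by_counting(event):
    """Number of faces in the subset selected by `event`, divided by the total number of faces."""
    subset = [face for face in FACES if event(face)]
    return Fraction(len(subset), len(FACES))


def probability_by_area(event, area):
    """Same ratio, using "area of a side" as the measure; every face of a cube has the same area."""
    subset_area = sum((area for face in FACES if event(face)), Fraction(0))
    total_area = sum((area for face in FACES), Fraction(0))
    return subset_area / total_area


is_five = lambda n: n == 5
is_even = lambda n: n % 2 == 0
print(probability_by_counting(is_five))                   # 1/6
print(probability_by_counting(is_even))                   # 1/2 (that is, 3/6)
print(probability_by_area(is_five, area=Fraction(7, 2)))  # 1/6, whatever the value of A
```

Because the common area A cancels in the ratio, any positive value gives the same answer as counting, which is why the two measures agree for a cube.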
These probabilities may, of course, be verified simply by constructing a cube, carrying out the counting or area measurements and comparing the results. No ideas or experiments involving chance are involved in either the theory or this experimental verification of the theoretical numbers. Probabilities are perfectly definite numbers whose values do not involve chance and may, under some circumstances, be measured experimentally and directly, with no use of chance. But how do these considerations apply to the tossing of dice, the results of which, as we quickly discover, are not reproducible? How are we to relate our theoretical probabilities to the frequency ratios of "face up" results of throws of material cubes with numbered faces? These throws do involve chance and, in certain special cases, their frequency ratios approach the probabilities with increasing accuracy as the number of experiments increases. If we wished to calculate the probability of (for example) a material cube falling onto a horizontal surface with the face numbered 5 on top, we would, according to the above definition of probability, have to find a way of defining a measure for this throw and a way of measuring all the other possibilities. But this could not be simple counting or an area calculation; it is a problem in Newtonian mechanics of considerable complexity, depending on, at least, the force of the throw, the height of the throw and the mass density of the material cube. Equally important, if the results of these throws could be calculated, they could be made reproducible, and so one would obtain the same side face up every time. What is needed is an understanding of why the frequency ratios of "face up" results of throws of a die approach the number which we have calculated for the probability that the face of a cube be a 5 (say). In general, why are the ratios of measures of certain sets related to mechanical experiments with concrete realisations (dice) of the abstract object (regular hexahedron, cube) used in the mathematical calculation? Or, as Bridgman[6] says:

"How can individual events, admittedly independent from one another, combine into regular aggregates unless there is a factor of control over their combination? But what kind of control can there be over independent events?"

[6] "The Logic of Modern Physics".

2.2.2. Dice: Statistical Methods of Measure

Having looked at what is meant by the term probability and noted that probabilities are purely theoretical quantities referring to abstract objects, it is time to see how probabilities relate to the real world of experiments on concrete systems.[7] Since probabilities are defined and calculated in terms of measure (counting, integral, quadrature), we must expect that any experimental measurement or verification of probabilities will necessarily involve implicit or explicit approximate integration over a set. It is to be expected, therefore, that verifications of probabilities will involve repeated measurements of properties of members of a set of physical objects. Sets may be measured by two general classes of method:
• Finite sets may be counted, and infinite point sets may be measured by analytical integration (quadrature) methods yielding lengths, areas, volumes and their higher-dimensional analogues.
• Infinite point sets may be measured approximately by numerical quadrature methods, all of which involve evaluating a function defined on the set to be measured at various points within the set and forming a (possibly weighted) sum of these values.

[7] As predicted, I am hoist by my own petard here; "measure" is being used in two different ways: the mathematical measure meaning "integral" and the everyday measure meaning "obtain a numerical value of"!
If the mathematical form of the measure function is known, its value can be calculated at chosen points, and numerical methods can be very accurate using only a small number of values of the function. If, however, the analytical form of the function is unknown (it may be tabulated from experimental results, for example), the measure is more difficult to obtain accurately. The worst possible case is when the functional form of the integrand is unknown and the domain and range are also not precisely known. It would seem at first sight that such quadratures would be impossible to obtain, but there are methods of obtaining approximate numerical quadratures of such functions. The very simplest of the numerical methods of measuring a point set is the so-called Monte Carlo method: one simply takes whatever values of the function are available and forms the average of these values multiplied by the length of the interval over which the integral is taken:
$$\int_a^b f(x)\,dx \approx \frac{b-a}{N}\sum_{i=1}^{N} f(r_i) = \sum_{i=1}^{N} \frac{b-a}{N}\, f(r_i) \qquad (2.2.1)$$
where $r_i$ are the points at which the function is available. This procedure may be visualised by:
• Replacing the area under the curve by a rectangle whose height is the average of the known function values and whose width is the length of the interval;
• Replacing the area by a set of N vertical strips of equal width $(b-a)/N$ and height $f(r_i)$.
Clearly, this method can be ludicrously inaccurate, since the points $r_i$ may be entirely unrepresentative of the whole interval $[a, b]$. However, suppose that the points $r_i$ occur at random throughout the interval $[a, b]$, that is, they are equally likely to be anywhere in the domain of the function $f$. In this particular case there may be a chance that, if enough random points are used, the value of the approximate measure converges to an acceptable value.[8] The interval over which the quadrature is being estimated is taken to be given by the extremities of the set of random points, again emphasising the assumption that a (large) set of random points will be representative enough of the domain and range of the measured function.

[8] This is the source of the name Monte Carlo.
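Equation (2.2.1) and the warning about unrepresentative points can be tried out in a minimal Python sketch. The integrand $x^2$ on $[0, 1]$ (exact integral 1/3), the seed and the sample size are arbitrary choices for illustration, not taken from the text; the sketch only demonstrates that random points spread over the whole interval give a good estimate, while points crowded into one corner do not.

```python
import random


def monte_carlo_integral(f, points, a, b):
    """Equation (2.2.1): (b - a) times the average of f over the sample points r_i."""
    values = [f(r) for r in points]
    return (b - a) * sum(values) / len(values)


def integrand(x):
    # Illustrative integrand; its exact integral over [0, 1] is 1/3.
    return x * x


rng = random.Random(2002)  # fixed seed so the run is repeatable
a, b, n = 0.0, 1.0, 100_000

# Points spread at random over the whole interval [a, b].
random_points = [rng.uniform(a, b) for _ in range(n)]
# "Unrepresentative" points: all crowded into [0, 0.1].
clustered_points = [rng.uniform(0.0, 0.1) for _ in range(n)]

good = monte_carlo_integral(integrand, random_points, a, b)
bad = monte_carlo_integral(integrand, clustered_points, a, b)
print(f"random points:    {good:.4f}   (exact value 0.3333)")
print(f"clustered points: {bad:.4f}")
```

The clustered estimate stays near 0.0033 however many points are used, which is the "ludicrously inaccurate" case: more samples do not help unless they represent the whole interval.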
Now we can see a way of obtaining experimental verification of our calculated probability that, for example, a side of a cube be numbered 5. We can take actual, existing (concrete) cubes and perform some random experiments on them which will experimentally evaluate the function "what is the number of a side of a cube?" or "is the number of a side of a cube 5?" and, most importantly, generate a random set of values of this function. These numbers can then be inserted into the above Monte Carlo formula to obtain approximate measures whose ratios should approximate the relevant probabilities. If $M[\,]$ is the measure functional and $N$ is the number of experiments, the measure of the set {The number of a side is 5} is
$$M[5] \approx \sum_{i=1}^{N} \delta(n_i - 5)$$
where $n_i$ is the number recorded in the $i$-th experiment and $\delta(n_i - 5)$ is 1 if $n_i = 5$ and 0 otherwise. The measure of the set {The total number of experiments} is
$$M[\text{Total}] \approx \sum_{i=1}^{N} 1 = N$$
so that a numerical approximation to the probability is
$$P(5) \approx \frac{M[5]}{M[\text{Total}]}$$
which should approach 1/6 for large $N$ if the Monte Carlo method of approximate quadrature is good enough. This is nothing more than a theoretical justification of the familiar method of using frequency ratios to get experimental estimates of probabilities. Let us look at two possible practical methods:
1. Dice-throwing: construct homogeneous material cubes, number their faces[9] and arrange to have them thrown and spun in a gravitational field, from a height of not less than ten times their dimension, onto a solid horizontal surface, and note which numbered face is uppermost. The method of tossing is assumed to guarantee the required randomness.
2. An electronic method: fix a homogeneous material cube with numbered sides, arrange for its sides to be randomly illuminated and for an image of the illuminated side to be projected onto a remote screen, and note the number on the projected image. Here, the randomness is generated by a suitable algorithm, logical or physical.

[9] For the moment, I ignore the fact that real dice have their faces numbered not at random but in a particular arrangement; this point does not affect the argument here and will be taken up later.

A long run of either type of measurement generates frequency ratios
$$\frac{\text{number of experiments recording a 5}}{\text{total number of experiments}} \approx \frac{1}{6}$$
Thus these ratios constitute experimental verifications (by the Monte Carlo approximate quadrature method) of the theoretical result that the probability (ratio of measures of sets) that a side of a numbered cube be numbered 5 is 1/6.

2.2.3. Loaded Dice: Statistical Methods of Measure
Now suppose that we remove one of the specifications of the concrete cubes used in the above experiments and repeat the whole test. In place of the homogeneous cube with numbered faces we use a concrete cube which is not of homogeneous mass density: a "loaded die". Our definition of the probability that a face be numbered 5 is, of course, unchanged, because the abstract cube used in the probability theory does not have any mass density, homogeneous or otherwise. But the experimental measurements of the probability will give acceptable results only in the second of the two experiments, since the concrete loaded die, when thrown, will not (except by coincidence) give a set of frequency ratios which approximate the probabilities of the abstract cube. The values of the functional $M[5]$ will be inaccurate in this case because the values of the function will not be equally likely to occur over the whole domain of the function. The second experiment, using random illumination of the cube faces, is unaffected by any changes in the density of the cube and will generate frequency ratios which, in long runs, approximate the theoretical probabilities exactly as before. What makes a "material cube with numbered faces" into a die is the following:
1. It is thrown and spun in a gravitational field.
2. It falls onto a horizontal surface.
3. It must be allowed to fall a certain minimum distance compared to its own dimensions.
4. It must be of homogeneous mass density.
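The contrast between the two experiments can be mimicked in a small Python sketch, with the frequency-ratio estimate $M[5]/M[\text{Total}]$ computed as in the sums of Section 2.2.2. It is a simulation under loudly stated assumptions: a throw is not modelled mechanically but by a weighted random draw, and the loading (face 5 three times as likely as each other face, so that thrown results favour 5 with frequency 3/8) is invented purely for illustration.

```python
import random

FACES = [1, 2, 3, 4, 5, 6]

# Hypothetical loading: face 5 lands uppermost three times as often as any other face.
# A real loaded die would need the full Newtonian calculation the text calls far too
# complex to attempt; a weighted random draw merely stands in for it here.
FAIR_WEIGHTS = [1, 1, 1, 1, 1, 1]
LOADED_WEIGHTS = [1, 1, 1, 1, 3, 1]


def throw(n, weights, rng):
    """Experiment 1: record the uppermost face of n throws; the outcome depends on the loading."""
    return rng.choices(FACES, weights=weights, k=n)


def illuminate(n, rng):
    """Experiment 2: illuminate a face chosen uniformly at random; mass density plays no part."""
    return [rng.choice(FACES) for _ in range(n)]


def estimate_p5(results):
    """Frequency-ratio estimate of P(5) = M[5] / M[Total]."""
    m_5 = sum(1 for n_i in results if n_i == 5)  # sum of delta(n_i - 5)
    m_total = len(results)                       # sum of 1 over all N experiments
    return m_5 / m_total


rng = random.Random(17)  # fixed seed so the run is repeatable
N = 120_000
fair_thrown = estimate_p5(throw(N, FAIR_WEIGHTS, rng))
loaded_thrown = estimate_p5(throw(N, LOADED_WEIGHTS, rng))
loaded_illuminated = estimate_p5(illuminate(N, rng))

print(f"fair die, thrown:        {fair_thrown:.3f}")
print(f"loaded die, thrown:      {loaded_thrown:.3f}")
print(f"loaded die, illuminated: {loaded_illuminated:.3f}")
print(f"theoretical P(5):        {1 / 6:.3f}")
```

The thrown loaded die settles near 0.375 rather than 1/6, while illuminating the same loaded die still gives about 1/6: the probability belongs to the abstract cube, and only the experiment which samples its faces uniformly verifies it.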
Without the four properties listed above, thrown "material cubes with numbered faces" are unsuitable concrete objects with which to measure (or verify) any calculation of the probability that the face of a cube be 5 (or any other such probability). It must be stressed that the calculated probabilities do not refer to throws of dice; they refer to the numbered sides of a cube. Any calculation of the probability that a throw of a die results in a 5 "face up" is far too complex even to be attempted. On the contrary, the throws are experiments used in a Monte Carlo measure to verify probabilities involving the sides of a cube. The fact that the probability of a side of a cube being numbered 5 is 1/6 has no consequences whatsoever for a single throw of a die, loaded or fair. Only the relative frequencies of large numbers of throws of fair dice can serve as an experimental verification of the probabilities. The relationship between individual throws of a die and probability is put into sharp focus by considering the "measure" definition of the probabilities and the nature of the individual experiments. The probabilities tell us the relative sizes of integrals of a certain function over intervals of the variable on which that function depends; that is all. It is quite impossible in general for the values of a few integrals of a function to tell us anything about the value of that function at any particular point. In the case of the cube, the emphasis and style of approach may be changed by using the idea of the "state" of an abstract object. We may choose the abstract object to be "a face of a cube", which we may think of as having 6 "states". In this case we can set up state functions associated with the six possible eigenvalues of the state operator, projection operators associated with each of the eigenstates, and the whole machinery used in quantum mechanics. Since the problem involves a finite number of states, the state functions are δ functions and the procedure is of rather formal interest.
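For readers who want to see this formal machinery in miniature, here is a minimal Python sketch. The diagonal 6×6 matrix with eigenvalues 1 to 6 standing in for the state operator, and the trace ratio $\mathrm{Tr}(P_k)/\mathrm{Tr}(I)$ used as the relative measure (which simply counts states with equal weight), are illustrative assumptions, not constructions given in the text. The sketch only confirms that, for a finite set of states, the apparatus of δ-function state functions and projectors reduces to counting.

```python
from fractions import Fraction

N_STATES = 6  # the six "states" of the abstract object "a face of a cube"


def state_function(k):
    """Eigenstate for face k as a function of the face index j: psi_k(j) = delta(j, k)."""
    return [1 if j == k else 0 for j in range(1, N_STATES + 1)]


def projector(k):
    """Projection operator onto face k: the outer product of psi_k with itself (a 6x6 matrix)."""
    psi = state_function(k)
    return [[psi[i] * psi[j] for j in range(N_STATES)] for i in range(N_STATES)]


def mat_vec(matrix, vector):
    """Matrix-vector product."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


def trace(matrix):
    return sum(matrix[i][i] for i in range(len(matrix)))


# Hypothetical state operator: diagonal, with the face numbers 1..6 as its eigenvalues.
STATE_OPERATOR = [[(i + 1) if i == j else 0 for j in range(N_STATES)] for i in range(N_STATES)]
IDENTITY = [[1 if i == j else 0 for j in range(N_STATES)] for i in range(N_STATES)]


def probability(k):
    """Relative measure of the subset {face k}: Tr(P_k) / Tr(I), i.e. counting states."""
    return Fraction(trace(projector(k)), trace(IDENTITY))


# Each state function is an eigenvector of the state operator, with its face number as eigenvalue.
assert mat_vec(STATE_OPERATOR, state_function(5)) == [5 * x for x in state_function(5)]
print(probability(5))  # 1/6, the same ratio as counting faces directly
```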
The state-function approach will be discussed qualitatively when some of the more formal aspects of probabilities are reviewed in Chapter 3, since it enables concepts thought to be unique to quantum theory to be brought closer to those involved in other areas of probability theory.

2.2.4. Standard Dice and Conservation Laws
In the account given above of the relationship between the abstract object "a cube with numbered faces" and actual, existing, concrete cubes with numbered faces, what I have called dice are not, in fact, sufficiently specified to be recognisable as the familiar white objects with spots on their faces. This is deliberate, since all that is required for the above considerations to be true is that the material cubes have their faces numbered differently. As I noted at the time, the faces do not have to be numbered in such a way that the sum of the numbers on opposite faces is 7, any more than the cubes have to be made from any particular material in order to serve as experimental apparatus to verify probabilities concerning the abstract object "cube with numbered faces". As a matter of fact, it does not matter whether we count "face up" or "face down" as a result or, since the throws are assumed to be random, use a mixture of the two.

There are some interesting consequences which arise if we wish to restrict our experimental setup to recording the results of tosses of concrete cubes with numbered faces such that the sum of the numbers on opposite faces is actually 7; that is, we use "standard dice" in place of cubes with arbitrarily numbered faces. The abstract object corresponding to these concrete dice, i.e. abstracted from size, colour, composition, density, etc. (so long as that density is uniform), is now not simply a "cube with numbered sides" but such a cube with the additional constraint that the numbers on opposite faces are completely correlated by the requirement that they sum to 7. A knowledge of the number on one of the sides determines which of the other five possible numbers is on the opposite side.[10] In this case there is, as physicists would say, a conservation theorem associated with the system.[11]

Now let us set up an experiment which will reveal and illustrate the difference between what we may now call the "standard abstract die" (the image of all standard concrete dice) and the abstract "cube with numbered faces" with which we are by now, perhaps, overfamiliar. The experimental setup may seem a trifle eccentric. Dice are thrown onto a horizontal glass table in a laboratory in Sheffield. Suitable video and transmission equipment is used so that an image of the "face up" side can be transmitted to one set of waiting quantum physicists in Mauritius, while the "face down" image is simultaneously transmitted to enthusiasts in Hawaii. The aim of the experiment is for the physicists in Mauritius to predict, from their own readings, the results which are transmitted to Hawaii.

[10] In point of fact, using standard dice, this knowledge determines the numbers on all the remaining sides if one admits a "handedness" into the experiment.
[11] Some quantum physicists might even use eccentric terminology and say that this abstract cube is in a "singlet spin state", but we will take this up later.

The first set of experiments uses our "die" of the first type: a cube with its sides numbered 1 through 6 in any old random arrangement. The experimenters at both sites look at the images on their screens, verify that a large number of runs does indeed generate the numbers 1 through 6 with approximately equal frequencies, and all are satisfied with the experimental setup. However, all attempts by the Mauritian visitors fail to do better than an estimate of 1/5 for the relative frequency of each possible Hawaiian result. This is exactly what one would expect, of course: if the "die" faces are numbered in a random arrangement, then the opposite face will be one of the other five numbers at random. The next set uses a "standard die" and, after the same initial satisfaction that both the "face up" numbers and the "face down" numbers are found with approximately equal frequencies in a long run of tosses, the observers in Mauritius are able to predict the results transmitted to Hawaii with 100% success; they simply have to use the conservation law that the sum of the two numbers must be 7 to predict, from their observation of n, that the flower-clad Hawaiian visitors must see (7 − n). Finally, sets of experiments are performed in which several dice are used, and the "face up" image of an arbitrary die is sent to Mauritius while the "face down" image of an arbitrarily chosen die is sent to Hawaii. In this case the result is qualitatively identical to the first case whichever type of die is used; no prediction of the other's result can be made which is better than that expected on purely statistical grounds. In fact it makes no difference whether two dice of each type are used or one of each. These results are trivially obvious, of course, and the whole exercise is nothing more than an excuse to combine business with pleasure on the part of the experimental physicists.
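The three sets of experiments can be mimicked in a short Python simulation. Its assumptions are stated loudly here and in the comments: a fair throw is a uniform random choice of the face up; a cube "numbered in any old random arrangement" is re-numbered at random on every trial, standing in for the observers' ignorance of how it is numbered; and the Mauritian strategy is always to guess 7 − n. With a standard die the guess succeeds every time, with a randomly numbered die about 1/5 of the time, and whenever two different dice are involved about 1/6 of the time, which is no better than chance.

```python
import random


def make_cube(kind, rng):
    """Return a dict mapping each face number to the number on the opposite face."""
    if kind == "standard":
        # Standard die: opposite faces sum to 7 (the conservation law).
        return {n: 7 - n for n in range(1, 7)}
    # Assumption: "any old random arrangement" = pair off a random permutation of 1..6.
    faces = rng.sample(range(1, 7), 6)
    opposite = {}
    for a, b in zip(faces[0::2], faces[1::2]):
        opposite[a], opposite[b] = b, a
    return opposite


def success_rate(kind, same_die, trials=40_000, seed=1):
    """Fraction of trials in which Mauritius, guessing 7 - n, correctly predicts Hawaii's number."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        die_a = make_cube(kind, rng)           # its "face up" image goes to Mauritius
        up = rng.randint(1, 6)                 # assumption: a fair throw
        if same_die:
            hawaii = die_a[up]                 # "face down" of the very same die
        else:
            die_b = make_cube(kind, rng)       # an arbitrarily chosen other die
            hawaii = die_b[rng.randint(1, 6)]  # face down of die_b: opposite of its own face up
        hits += (7 - up) == hawaii
    return hits / trials


for kind in ("random", "standard"):
    for same_die in (True, False):
        label = "same die" if same_die else "different dice"
        print(f"{kind:8s} {label:15s} {success_rate(kind, same_die):.3f}")
```

Only the combination "standard die, same die" yields certainty, and it does so because the conservation law is applied to one concrete object, not because of anything in the probabilities.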
The only point which emerges is the general rule: in the very special circumstances obtaining when one can be sure that two (or more) experiments can be performed on a single concrete system for which the quantities being measured are related by a conservation law, the result of one measurement may be predicted from the result of the other. This result is, of course, true whether or not one is working with deterministic or probabilistic systems. But it has no consequences for probabilities, since probabilities do not refer to concrete systems. No-one will be in the least surprised by this result, since the whole thing hinges on the measurement of two quantities which are known to be quantitatively related. In general, however, if we are dealing with statistical measurements which we wish to compare with computed probabilities, we will be in the position that the sun-bathing physicists were in during the third set of experiments; we will not (and should not) know whether or not two measurements of physical quantities which turn up at our apparatus at random are due to the same concrete system. Indeed, one of our most basic general assumptions was that probabilities can be experimentally verified equally well by many identical experiments on one system or by many identical experiments on several systems (or a mixture of both). Further, I noted earlier that the luxury of a choice between the two possibilities is not usually available to us in experiments at the atomic and sub-atomic level; normally, we have to be satisfied with results which simply turn up and are recorded. So the physicists in Mauritius, on reading n (say) from the "face up" image of a concrete standard die, know that the face down side of that particular concrete die is (7 − n); but by transmitting this result to their colleagues in Hawaii they cannot predict the Hawaiian image of the "face down" side of a random throw of one of the other concrete dice unless they can guarantee that the image is of the same die. But use of this knowledge invalidates the experiment's qualification as a random test, which is absolutely crucial to the use of tests on concrete objects to verify probabilities. Such a guarantee might be simple to arrange at the level of macroscopic ivory cubes, but to verify this rather simple result at the sub-atomic level requires an enormous amount of equipment, skill and expertise.
The ability to use a conservation law to predict the result of an experiment on a single concrete system from another measurement on that same system may be useful but it has nothing to do with probabilities; probabilities are relative measures of sets and experimental measurements of probabilities are approximations to these measures obtained by quadratures based on many random measurements on concrete systems. Above all, it is profoundly un-mysterious that one can predict the value of someone else's experiment on a single concrete system from a known
conservation law independently of the distance between those experiments or the time interval between them.
2.3.
Probability and Statistics
The term "statistics" has been used without saying what is meant by it or how statistics relates to probability; it is time to clear this point up. We have already seen that probabilities are theoretical quantities; they are the ratios of measures of sets. We have also seen that it is possible to use approximate methods of quadrature to obtain approximations to probabilities. In particular, if the source of the values of the function to be measured in these approximation schemes is experiment (using concrete realisations of the abstract objects used to define and compute probabilities), then a connection is made between theory and experiment and we have a physical theory of the behaviour of (sets of) concrete objects. In designing experimental procedures for the measurement of probabilities (which must necessarily involve explicit or implicit quadratures) there are two general points to consider:

• The concrete objects on which the experiments are performed must have the essential properties of the abstract object in the theory in the context of the particular experiment. In the example above, mass-density homogeneity is essential if dice are to be thrown, but not if they are to be illuminated.
• If the quadratures are to be performed numerically, then adequate precautions must be taken to ensure that the experiments generate values of the integrand which sample the full interval over which the quadrature is to be performed. In the most common case, where the quadrature is to be Monte Carlo, the randomness of the values must be guaranteed; a point which will be discussed later. In general, there have to be methods of treating the "raw" data to ensure that the implied quadratures are as meaningful and accurate as possible.

The first of these is common to experimental verifications of all kinds of physical theory and is a question of good experimental design.
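The Monte Carlo case in the second point can be illustrated with a minimal sketch (Python, standard library only; the unit-square example and the function names are illustrative assumptions, not taken from the text). The abstract object is a point in the unit square, the measure is area, and the probability that the point lies in the quarter disc x² + y² ≤ 1 is the exact ratio of measures π/4, with no reference to chance. A Monte Carlo quadrature of the indicator function of that set is literally a frequency ratio; restricting the random points to part of the square shows what goes wrong when the sample does not cover the full domain of the quadrature.

```python
import math
import random


def in_quarter_disc(x, y):
    """Indicator function of the set whose measure (area) we want: x^2 + y^2 <= 1."""
    return 1 if x * x + y * y <= 1.0 else 0


def monte_carlo_probability(n, seed=0, extent=1.0):
    """Monte Carlo quadrature of the indicator over n random points.

    Points are drawn uniformly from the square [0, extent] x [0, extent].
    With extent=1.0 they sample the whole unit square, and the mean of the
    indicator (a frequency ratio) approximates area(set) / area(square).
    A smaller extent samples only part of the domain, so the estimate is biased.
    """
    rng = random.Random(seed)
    hits = sum(in_quarter_disc(rng.uniform(0.0, extent), rng.uniform(0.0, extent))
               for _ in range(n))
    return hits / n


exact = math.pi / 4                                    # exact ratio of measures
estimate = monte_carlo_probability(100_000)            # random points over the full square
biased = monte_carlo_probability(100_000, extent=0.5)  # points cover only part of the square
print(f"exact {exact:.4f}  full sample {estimate:.4f}  partial sample {biased:.4f}")
```

With `extent=0.5` every sampled point lies inside the quarter disc, so the "measured" value is 1 however many points are taken: more data cannot repair a sample that misses part of the domain.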
The second is the domain of Statistics; statistics provides the mathematical techniques required in the design and analysis of methods for evaluating (among other things) probabilities experimentally using random tests. Although it is important to realise that statistical measurements are approximate (numerical) quadratures, it is obvious (from the explicit
considerations of dice above) that such quadratures almost always turn out to be frequency ratios. Since frequency ratios are what is measured, there is no harm in referring to experimental measurements of probabilities as "frequency ratios" rather than the less familiar and much clumsier "ratios of approximate quadratures", provided that the full context is kept in mind and probabilities are not, under any circumstances, defined by or identified with frequency ratios. Statistical methods are not the only way of getting experimental values of probabilities; as we have seen, in particularly simple cases one can explicitly measure the sets by counting. Usually, however, the statistical method of quadratures using random tests is the only feasible way of testing probability statements experimentally. An extreme example of a case in which the statistical method might be inappropriate is given below. Although it is rather bad form to give an answer to what Bridgman on page 31 clearly intended to be a rhetorical question, we can now see that there can be no question of "control" of independent events leading to measurements of random events generating good approximate probabilities. The reason why many independent events lead to frequency ratios which can be good experimental measurements of probabilities is simply that, if the events are random and independent (and enough of them are taken), they form a representative sample with which to perform the numerical quadrature that approximates the measures defining the probabilities.[12]

2.3.1. An Extreme Example
The population of mammals in an English meadow provides an elementary example of probabilities which are perfectly well defined but rather tricky to measure by statistical methods. Suppose that there are 8 cows, a horse, a pair of foxes, 9 rabbits, 48 fieldmice and a family (4, say) of weasels in a meadow; a total of 72 mammals. Let us use simple counting as the measure for the calculation of probabilities. So, the probability that a mammal in this meadow be a cow is 8/72 = 1/9 and the probability that a mammal be a weasel is just half of this: 4/72 = 1/18. Now, as anyone who has done natural history research in the field will tell you, the probability of finding that a (randomly selected) mammal is a cow in these circumstances would be very much larger than double the probability of finding that a mammal is a weasel. Weasels are very resourceful and secretive animals and I doubt whether any way of experimentally selecting mammals "at random" in the meadow would turn up any weasels at all. This unfortunate fact presents rather acute problems for zoologists and statisticians in their design of experimental procedures, but the relevant probabilities, which are ratios of measures of sets of the different types of mammal, are not affected at all by these practical considerations. Probabilities are ratios of measures of sets whether or not experimental techniques can be devised to verify them by statistical methods. Finally, let it be said that these probabilities are not impossible to measure, merely extremely difficult to measure by statistical methods. Since the measures concerned simply involve the counting of finite sets, we must resort to more drastic and inhumane measures; we must burn or flood the meadow and count the bodies and the survivors!

[12] There may be "improvements" to the random choice of points, such as the ones used in atomic and molecular simulation calculations. Here the random points are generated by a numerical algorithm and are weighted by a Boltzmann factor in the quadrature, and so it is possible to reject some of the random points on the grounds of the size of this weighting. But in measurements the points simply turn up at random and we may have no grounds on which to distinguish amongst them.
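A minimal sketch (Python, standard library only) separates the probability from its statistical measurement in the meadow example. The counts are those given above, and `exact_probability` computes the counting-measure ratio exactly. `sampled_frequency` imitates a survey that catches animals at random. The `detectability` weights are a hypothetical stand-in for the weasels' secretiveness and are not from the text; `random.choices` draws with replacement in proportion to them.

```python
import random
from fractions import Fraction

# Counts taken from the meadow example in the text (72 mammals in all).
MEADOW = {"cow": 8, "horse": 1, "fox": 2, "rabbit": 9, "fieldmouse": 48, "weasel": 4}


def exact_probability(kind):
    """Ratio of measures: counting measure of one kind over the whole set."""
    return Fraction(MEADOW[kind], sum(MEADOW.values()))


def sampled_frequency(kind, n_draws, detectability=None, seed=0):
    """Frequency ratio from n_draws random catches of individual animals.

    detectability (hypothetical) maps each kind to a relative chance of an
    animal actually being caught; None means every animal is equally catchable.
    """
    rng = random.Random(seed)
    animals = [k for k, n in MEADOW.items() for _ in range(n)]
    weights = None if detectability is None else [detectability[k] for k in animals]
    draws = rng.choices(animals, weights=weights, k=n_draws)
    return draws.count(kind) / n_draws


hidden_weasels = {kind: 1.0 for kind in MEADOW}
hidden_weasels["weasel"] = 0.01  # hypothetical: weasels 100 times harder to catch
print(exact_probability("cow"), exact_probability("weasel"))  # prints 1/9 1/18
print(sampled_frequency("cow", 200_000))                      # close to 1/9
print(sampled_frequency("weasel", 200_000, hidden_weasels))   # far below 1/18
```

The exact values never change; only the survey's frequency ratios depend on how the animals happen to turn up.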
2.4.
Probabilities in Deterministic Systems
It has been repeatedly stressed that the idea of randomness or chance does not come into the definition or calculation of probabilities, and it is useful to illustrate this point by an example which is completely deterministic and which will have a bearing on the interpretation of Schrodinger's mechanics. The motion of the undamped Simple Harmonic Oscillator, as exemplified by an ideal pendulum, is completely soluble in both classical and quantum mechanics; it is a paradigm of a deterministic mechanical system. For a pendulum of length l the angular displacement of the pendulum from the vertical (θ) is given, as a function of time (t), by:

$$\theta(t) = \Theta\sin(\omega t + \gamma) \qquad (2.4.2)$$
where Θ (a constant) is the maximum value of the displacement and γ is a phase constant which fixes the initial (t = 0) displacement of the pendulum, θ(0) = Θ sin γ. The angular velocity of the pendulum is simply the time derivative of this expression:

$$\dot{\theta}(t) = \Theta\omega\cos(\omega t + \gamma) \qquad (2.4.3)$$

(ω is given by √(g/l), where g is the acceleration due to gravity). The motion is, of course, cyclic and so it is simple to evaluate the probability that the angular displacement have any value from −Θ to +Θ; we can therefore calculate the probability distribution function for the angular displacement, which should be independent of time precisely because the motion is cyclic.[13] The definition of this distribution function (P(θ), say) is that the integral

$$M(\theta_1,\theta_2) = \int_{\theta_1}^{\theta_2} P(\theta)\,d\theta \qquad (2.4.4)$$

is the probability that the angular deflection of the pendulum lies in the region from θ1 to θ2. A simple calculation (the fraction of each cycle spent in dθ is 2 dθ/(T|θ̇|), with T = 2π/ω the cycle time and |θ̇| = ω√(Θ² − θ²)) gives:

$$P(\theta) = \frac{1}{\pi\Theta\sqrt{1 - (\theta/\Theta)^2}} \qquad (2.4.5)$$

and a graph of this function is given below (Fig. 2.1); its form simply reflects the angular velocity of the pendulum. The faster the pendulum moves in a region, the less likely it is to be in that region. The distribution therefore rises to its greatest values at the two extreme turning points, where the pendulum is momentarily stationary (the density grows without limit there, although its integral stays finite), and falls to a minimum as the pendulum passes through the vertical, where the velocity is a maximum.[14]

[Fig. 2.1. Probability Distribution for a Simple Pendulum: P(θ) plotted against θ, with the axis marked at −Θ, −Θ/2, 0, Θ/2 and Θ.]

[13] The existence, at least in classical mechanics, of Poincare's recurrence theorem puts us all on shaky ground here in the sense that, if all motion is cyclic, all probability distributions are time-independent on some time scale.

One might ask if this is not merely a mathematical exercise, since the angular deflection and the velocity are precisely known for all times by reference to equations (2.4.2) and (2.4.3). While it is certainly true that one can calculate the precise position of the pendulum for any given t, what is equally obvious is that this information is of no use at all if one wishes to compare these calculations with experiment and is forced by circumstances (or by choice) to make measurements of the position of the pendulum at random, unpredictable times in order to verify the calculations. If, for whatever reason, one only has access to experimental measurements of the position of the pendulum at random times and wishes to compare these results with the theory of the physical system, then the only way is via the probability distribution function P(θ) given by equation (2.4.5). If a random measurement is made on any system, it does not make any difference whether the system is completely determinate (i.e. we know the laws of its structure or evolution) or completely indeterminate (we have no idea of the relevant laws, if any); the result of this random measurement cannot be predicted. In such cases, at best, the relative results of many such random measurements may be predicted. It is worth considering which factors actually make for a suitable set of random times in the case in hand. The problem is simplified by the fact that the probability distribution function is independent of time, depending only on the angle θ.
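The route via P(θ) can be checked numerically with a minimal sketch (Python, standard library only). Integrating equation (2.4.5) gives the closed form M(θ1, θ2) = [arcsin(θ2/Θ) − arcsin(θ1/Θ)]/π. The sketch compares this with the frequency ratio of readings of θ(t) from equation (2.4.2), taken at uniformly random times spanning many cycles. The function names and numerical values (a 1 m pendulum, Θ = 0.2 rad, γ = 0.3) are illustrative assumptions, not taken from the text.

```python
import math
import random


def theta(t, amplitude, omega, gamma):
    """Deterministic law (2.4.2): theta(t) = Theta * sin(omega * t + gamma)."""
    return amplitude * math.sin(omega * t + gamma)


def exact_M(theta1, theta2, amplitude):
    """Integral of P(theta) from (2.4.5) between theta1 and theta2 (|theta_i| <= Theta)."""
    return (math.asin(theta2 / amplitude) - math.asin(theta1 / amplitude)) / math.pi


def frequency_ratio(theta1, theta2, amplitude, omega, gamma, n, t_max, seed=0):
    """Fraction of n readings, taken at uniformly random times in [0, t_max],
    whose deflection falls in [theta1, theta2]."""
    rng = random.Random(seed)
    hits = sum(theta1 <= theta(rng.uniform(0.0, t_max), amplitude, omega, gamma) <= theta2
               for _ in range(n))
    return hits / n


# Hypothetical concrete values: a 1 m pendulum (g = 9.81) with amplitude 0.2 rad.
THETA, OMEGA, GAMMA = 0.2, math.sqrt(9.81 / 1.0), 0.3
PERIOD = 2.0 * math.pi / OMEGA
exact = exact_M(-THETA / 2, THETA / 2, THETA)  # approximately 1/3
measured = frequency_ratio(-THETA / 2, THETA / 2, THETA, OMEGA, GAMMA,
                           n=100_000, t_max=200 * PERIOD)
print(f"M = {exact:.4f}, frequency ratio = {measured:.4f}")
```

Both numbers should be close to 1/3. The central half of the swing, [−Θ/2, Θ/2], carries the same probability as the outer quarter [Θ/2, Θ]; this is the shape of Fig. 2.1 expressed numerically.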
Although the distribution function is independent of time, the pendulum itself has a characteristic "cycle time" (the time taken for a complete swing of the pendulum) and, although in principle random times could mean anything from nanoseconds to millennia, the experiment itself will normally suggest limitations on the choice of random times. There are some obvious limitations:

• If we make measurements of θ at random times spread over an interval very much shorter than the cycle time, we may well conclude that the pendulum is stationary unless a huge number of measurements are made. Looking at the moon every few milliseconds during one second might lead to a similar conclusion.
• If measurements are made at random time intervals greater than the cycle time, the results should be more useful.
• Of course, the worst possible case would be measurements taken at random integral multiples of the cycle time; this "stroboscopic" case would definitely lead one to conclude that the pendulum is stationary.

[14] There are, of course, no zeroes in this distribution since they would imply infinite velocity.

These considerations have some relevance to atomic measurements, of course. The results of these experiments are interpreted in exactly the same way as the experimental investigations into the numbered faces of a cube:

• The computed probabilities
$$M(\theta_1,\theta_2) = \int_{\theta_1}^{\theta_2} P(\theta)\,d\theta$$
refer to the abstract pendulum; they are the probabilities that the deflection of the pendulum be in the region [θ1, θ2] (for −Θ ≤ θi ≤ Θ).
• The frequency ratios (approximate quadrature ratios)
$$f(\theta_1,\theta_2) = \frac{N(\theta_1,\theta_2)}{N}$$
should approximate to the relevant M(θ1, θ2) for large enough N(θ1, θ2), where N(a, b) is the number of times that a real pendulum was found experimentally to have an angular deflection in the interval [a, b] and N is the total number of measurements.

This example and the example of the cube have been discussed in some detail since both typify the relationship between the calculation of probabilities and their experimental verification (or not). We have seen that, when we define a suitable measure functional, it is extremely simple to calculate the probability that a side of a cube be numbered 5 or that the angle of deflection of a simple pendulum be within a given range. We shall also find that a more complicated calculation will yield the probability that the electron in a hydrogen atom be in a particular region of space. But the problems involved in the calculation of:

• The face-up side of a die being found experimentally to be 5
• The angle of deflection of a simple pendulum being found experimentally to be in a given range
• The electron of a hydrogen atom being found experimentally to be in a particular region of space
are of unimaginable complexity, depending, in the last two cases, on the nature of the experimental apparatus, the theory of operation of this apparatus, its accuracy and reliability, the competence of the operators, etc., etc. Probabilities are the ratios of measures of sets; in these latter cases one has only an incomplete knowledge of what the sets actually are, let alone whether and how a suitable measure functional may be introduced so that they can be measured.[15] In the case of the pendulum I have chosen to illustrate the role of the probability distribution in the context of measurements at random times. In reality, of course, we often do not have the choice between using the deterministic equation of motion and the probability distribution; the measurements we make are not chosen to be random, rather the situation is the opposite: measurements are necessarily random and we have to be satisfied with the information which simply turns up at random. In cases like this, comparison with experiment must be via a model which generates a probability distribution.
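The limitations on the choice of random times listed above for the pendulum can be illustrated with a minimal sketch (Python, standard library only). The three sampling schemes, their names and the numerical values (a 1 m pendulum, Θ = 0.2 rad, γ = 0.3) are hypothetical. For each scheme the sketch reports the observed range of deflections as a fraction of the full swing 2Θ.

```python
import math
import random


def readings(times, amplitude=0.2, omega=math.sqrt(9.81), gamma=0.3):
    """Angular deflections theta(t) = Theta * sin(omega * t + gamma) at the given times."""
    return [amplitude * math.sin(omega * t + gamma) for t in times]


def sample_times(scheme, period, n, seed=0):
    """Random measurement times under three hypothetical sampling schemes."""
    rng = random.Random(seed)
    if scheme == "short_window":   # all readings within 1% of one cycle
        return [rng.uniform(0.0, 0.01 * period) for _ in range(n)]
    if scheme == "many_cycles":    # readings spread over 500 cycles
        return [rng.uniform(0.0, 500.0 * period) for _ in range(n)]
    if scheme == "stroboscopic":   # random integer multiples of the cycle time
        return [rng.randint(1, 500) * period for _ in range(n)]
    raise ValueError(f"unknown scheme: {scheme}")


def relative_spread(values, amplitude=0.2):
    """Observed range of deflections as a fraction of the full swing 2 * Theta."""
    return (max(values) - min(values)) / (2.0 * amplitude)


PERIOD = 2.0 * math.pi / math.sqrt(9.81)
for scheme in ("short_window", "many_cycles", "stroboscopic"):
    spread = relative_spread(readings(sample_times(scheme, PERIOD, 1000)))
    print(f"{scheme:>13}: observed spread = {spread:.3g} of the full swing")
```

The short-window readings cover only a few per cent of the swing and the stroboscopic readings coincide apart from rounding error, so both suggest a stationary pendulum; only readings spread over many cycles explore the full range from −Θ to Θ.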
2.5.
The Referent of Probabilities and Measurement
I have taken considerable pains to stress that probabilities are theoretical quantities which, once the sets and the measure functional on those sets are chosen ("the model"), are capable of being calculated exactly and are perfectly definite (real) numbers which contain no reference to chance. In this respect they are analogous to any other theoretical quantities which are calculated using some model assumptions about the structure or behaviour of part of reality. One computes the orbit of a planet, say, initially as a two-body Kepler problem. The referents of probability calculations and of classical calculations of the Kepler problem are abstract objects:

• In the cases we have discussed above, the referents were the abstract (massless, colourless, immaterial) cube and the abstract (inextensible, undamped) pendulum.
• In the case of the Kepler problem, the referent is the relative motion of a set of two (point-mass, unperturbed, undamped) massive particles.

Any measurements which we might wish to make to confirm (or not) the predictions of these theoretical models will have to be made on actual concrete objects which have the properties of the abstract object plus many other incidental properties and disturbances. It is the task of experimental design to minimise or attempt to neutralise these, as we say, inessential effects in order that any experimental results may be realistically compared with the numbers obtained from the theoretical model. If the referent of the theoretical calculation were the concrete object, the life of the experimental scientist would be much simpler. In the case of the pendulum there are two possibilities:

1. If some initial conditions are known for the motion of a concrete pendulum and one is able to take measurements of the subsequent position of the pendulum at known elapsed times, then the experimental measurements may be compared directly with the numbers obtained from equation (2.4.2 on page 41). "Directly" here means that the theoretical (real) number should be comparable to the experimental result which, typically, will be an element of the standard topology of the real numbers[16] or, as typically reported, a result and a standard error.
2. If the initial conditions are not known and one is only able to note the position of the pendulum at random times, then equation (2.4.2) is of no use and one must use the whole sequence of random results to construct (rational) frequency ratios which are approximations to numerical measures of the theoretical probabilities obtained by suitable integrations of the probability distribution given by equation (2.4.5 on page 42).

Both of these methods are capable of providing experimental measurements which confirm the theory of the abstract pendulum. Both are subject to errors from the same causes: damping by material resistance, etc. The fact that experimental measurements, however painstaking, will deviate from the theoretical predictions emphasises the process of abstraction involved in forming a model of reality.

[15] The solution of the Schrodinger equation gives the energy levels of the abstract hydrogen atom, and the probability distributions for its abstract electron, not the design of a UV spectrometer or an X-ray diffractometer.
The referent of neither of the two equations, (2.4.2) (the deterministic model) and (2.4.5) (the probabilistic model), is an actual existing (concrete)[17] pendulum. In both cases the referent is the abstract (idealised) pendulum whose only properties are its length and the field of force in which it swings. These remarks are not peculiar to pendulums;[18] in particular, the referent of the probability distribution for any abstract system is that abstract object and most emphatically not the actually existing concrete objects used in attempting to verify the probability distribution experimentally. Thus, the trivial probability distribution for the abstract object "numbered sides of a cube" (1/6 for each side) does not refer to throws of concrete dice; as we have noted elsewhere, the probability distribution for the abstract object "a throw of a die" has never been calculated and we have to be satisfied with the results of throws of concrete dice.

There is a constant thread running through the quantum theory literature that one of the main properties of any measurement is that it shall be reproducible: if the same measurement is repeated it should yield the same numerical result. I am baffled by this opinion[19] since it is self-evidently false for two of the main classes of physical phenomena: time-dependent quantities and probabilistic phenomena. A measurement performed on a concrete object (a throw of a die) to verify a probabilistic theory (the numbered side of a cube) is required to be non-reproducible by the very conditions of statistical verification of the theoretical result. Probabilities are verified (or not) experimentally by measurements on randomly selected concrete objects, and these measurements will differ from each other except in the case that the abstract object has the property measured as one of its fixed values. How would we verify the probability distribution for the simple pendulum if any measurements of the angular deflection were required to be reproducible? The essence of the statistical verification of probabilities is that the numerical values resulting from measurements are randomly obtained and are not reproducible.

[16] An interval in the real number system with rational end points.
[17] Talking about "concrete" dice, pendulums and, later, electrons, etc. brings to the mind's eye unfortunate mental images and associated attacks of the giggles; these must be sternly suppressed.
[18] I use "pendulums" rather than "pendula" mainly because the latter sounds rather sinister.
[19] Unless it is due to an infatuation with idempotent projection operators on the part of mathematicians.

2.5.1. Single System or Ensemble?

It is a matter of common experience that one may verify the probability distribution for the abstract object "numbered side of a cube" by performing many throws of a single die or many throws of many dice or any combination of throws of concrete dice, so long as the dice satisfy our criteria in Section 2.2.3; they may be any size, any material, any colour, etc. All that is necessary is that they reflect two things:

• They must have, amongst their properties, the properties of the abstract object "numbered side of a cube".
• They must satisfy the criteria for being a die just mentioned.

The same remarks apply to the verification of the probabilities referring to any abstract object; in particular, we may use any number of pendulums of length l to verify the probabilities obtained by integrals of equation (2.4.5 on page 42) for the abstract pendulum. If we set a whole host of such pendulums in motion with arbitrary initial conditions (values of γ in equation (2.4.2 on page 41)), make measurements of the angular deflection of any or all of them at random times and collate the results, the resulting frequency ratios, which are rational numbers, should converge towards the theoretical probabilities, which are, of course, real numbers. This is the source of the attractive idea that the referent of probability distributions and probability statements in general is the set, the ensemble, of all concrete realisations of the abstract object. This interpretation has the obvious advantage that it does not attempt to make the referent of such statements a concrete object and concentrates attention on the collective nature of experimental aspects of probabilities, but it suffers from the same defect as Russell's definition of, for example, the number 2 as the set of all pairs of objects. Lurking behind each of these ideas is the abstract object used to form the ensemble or set; how is one to decide what a pair is without the use of the number 2?
Much more important in the case of the ensemble interpretation of probability distributions is the fact that every concrete pendulum has properties or environmental factors, in addition to those of the abstract pendulum, which will sooner or later mean that any statistical measurements deviate from the theoretical probabilities. All concrete pendulums are damped, and so the long-term statistical prediction for an ensemble of concrete pendulums is that every angular deflection is zero.[20] One might object to this and say "I mean a virtual ensemble of idealised pendulums"; quite so, but how does this differ from an abstract object?
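A minimal sketch (Python, standard library only) shows both halves of the argument. An ensemble of pendulums with arbitrary phases γ, each read once at a random time, reproduces the abstract pendulum's probability of 1/3 for the central half of the swing. Giving the same pendulums a damped swing and reading them late drives that frequency to 1, i.e. every deflection is essentially zero. The exponential envelope Θe^(−βt)sin(ωt + γ), the damping rate and the other numerical values are hypothetical assumptions, not taken from the text.

```python
import math
import random


def ensemble_frequency(theta1, theta2, n_pendulums, t_min, t_max,
                       amplitude=0.2, omega=math.sqrt(9.81), damping=0.0, seed=0):
    """Fraction of an ensemble of pendulums found with deflection in [theta1, theta2].

    Each pendulum gets an arbitrary phase gamma and is read once at a random
    time in [t_min, t_max]. damping (hypothetical, per second) multiplies the
    swing by exp(-damping * t); damping=0.0 is the abstract, undamped pendulum.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pendulums):
        gamma = rng.uniform(0.0, 2.0 * math.pi)   # arbitrary initial condition
        t = rng.uniform(t_min, t_max)             # one reading at a random time
        angle = amplitude * math.exp(-damping * t) * math.sin(omega * t + gamma)
        if theta1 <= angle <= theta2:
            hits += 1
    return hits / n_pendulums


undamped = ensemble_frequency(-0.1, 0.1, 100_000, 0.0, 1000.0)
damped_late = ensemble_frequency(-0.1, 0.1, 100_000, 200.0, 400.0, damping=0.05)
print(f"undamped ensemble: {undamped:.4f} (abstract pendulum gives 1/3)")
print(f"damped ensemble at late times: {damped_late:.4f}")
```

The undamped ensemble agrees with a single abstract pendulum sampled at random times; the damped ensemble does not, which is the point made above about concrete pendulums.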
Abstraction is the essence of all conceptual thinking and this is a fortiori the case in scientific thinking. In the theory of probability, as in other physical theories, we deal with abstract models of reality.

2.5.2. The Collapse of the Distribution
We have already seen in Section 2.2.2 that statistical measurements of probabilities are approximations to integrals of the measure function and that a knowledge of either exact or measured probabilities is the knowledge of certain integrals of this function over intervals of its domain. These numbers are not at all sufficient to obtain any information about the value of the function anywhere in its domain; knowing that

$$\int_0^{a} f(\theta)\,d\theta = 2; \qquad \int_0^{b} f(\theta)\,d\theta = 1$$
for example, does not help us to predict f(0.87) nor to predict the value of f(θ) for a random value of θ. A knowledge of the probabilities (relative measures of sets) gives us no knowledge whatsoever about the outcome of any one random test of a concrete object used to verify these probabilities. The random tests give the value of a measure integrand at points in its domain, while the probabilities refer to values of the measure functional. If, however, we defer to colloquial usage and imagine that the probability distribution function (measure integrand) refers to each concrete system used in the statistical measurement process, we are trapped in an acute paradox: the outcome of every random event using a concrete object is a perfectly definite rational number, so what is the role of the probability distribution here? It has been assumed by some writers that probability distributions depend, in addition to the value of the distributed variable, on some mysterious parameters which enable the distribution to change ("collapse" is the fashionable terminology) from being a distribution referring to the abstract object to the unique experimental result when the experiment is actually performed[21] on a random representative concrete object. This is simply a mistake, and an easy position to ridicule; I am not the one to refrain from such ridicule. But the important point here is that this simple mistake about the referent of probability statements has been, and is, the source of an enormous literature on the "collapse" of probability distributions and on "measurement and the role of the observer" in Schrodinger's mechanics, where probability distributions are central to the theory.

[21] Or, when the result is read by an observer, in some interpretations.

2.5.3. Hidden Variables
There is a tendency among scientists to think that probabilities arise from incomplete knowledge of the real nature of the degrees of freedom of a physical system. This opinion is encouraged by the most spectacular successes of probabilistic theories: the explanation of the laws of Thermodynamics by Statistical Mechanical methods and the Kinetic Theory of the Ideal Gas Law. The large-scale (macroscopic) properties of materials are explained by the behaviour of certain averages of the motions of the underlying microscopic (molecular) components of those materials. These examples are misleading because there are no probabilities in Thermodynamics or in the Ideal Gas Law to be explained by hidden variables. Certainly there are variables hidden at the macroscopic level of observation, and the relationship between averages of these properties and macroscopic variables is illuminating. But the "high level" theory does not contain any probabilities which require explanation by hidden variables at a lower level. Although it is a paradigm for the successful reduction of a high-level theory to a lower-level one, the statistical mechanical explanation of thermodynamic laws is not an example of the explanation of probabilities by hidden variables. Certainly, averages of microscopic variables are used in this reduction (for example, the temperature of a body is explained in terms of the mean kinetic energy of its constituents), but temperature is not defined as a probabilistic mean in the macroscopic thermodynamic theory. What we are to be concerned with later in Schrodinger's mechanics is a theory which, unlike either Thermodynamics or the Ideal Gas Law, generates probabilistic results at its own level. It is these probabilities which, it is claimed, should be explicable in terms of (averages, presumably, of) hidden variables. The example of the pendulum is helpful in this context. Suppose that we had generated the probability distribution for
the abstract pendulum (equation (2.4.5 on page 42)) directly, without the intervening deterministic equation (2.4.2 on page 41), and had confirmed its predictions by statistical measurements of the angle of deflection of concrete pendulums. That is, we have a probabilistic theory with its experimental confirmation before us, are dissatisfied with the fact that probabilities are involved, and seek a more "fundamental" explanation of the phenomenon. The variables used in this probabilistic theory of abstract pendulums are: the length l of the pendulum, time t, the angular displacement θ and the acceleration due to gravity g. Where do we look for hidden variables to generate a deterministic theory? In fact, of course, in this case we know what the deterministic theory is and, what is more, we also know that there are no hidden variables; the above set is completely sufficient to describe the phenomenon in both the deterministic and probabilistic cases. What is missing in this case is not hidden variables but "hidden" physical laws which give a deterministic connection between the explicitly-known variables common to both descriptions. That is not to say that all probabilistic sciences are of this type, but it is clear that we must distinguish between at least the two possibilities:

• Probabilities in physical theories are necessary because the phenomena which our theories treat cannot be completely described in terms of the dynamical variables we are using.
• Probabilities occur in physical theories because of our ignorance of some of the laws which connect the dynamical variables which we are currently using.
2.6.
Preliminary Summary
This has been an informal and elementary preview of ideas of probability with no axioms and no formal derivations. My main point is to establish that Kolmogorov's theory of probability is not just a mathematical scheme but may be equipped with a physical interpretation by relating the theoretical measures of sets to the experimental (statistical) approximate numerical measures which we obtain to verify probabilities. The most important aim in this chapter is to establish that the referent of probability statements, in particular probability distributions, is the abstract object with which the theory of probability deals. Probability
statements do not refer to individual objects which have the properties (amongst others) of this abstract object. Statistics deals with experimental measurements of the properties of actually existing (concrete) objects which have (at least) the properties of the abstract object. These measurements are used to verify the theoretical probabilities. In this chapter, I have concentrated on a descriptive introduction to the mathematical theory of probability and its interpretation; in the next chapter we can look at a more formal theory and give some of the terminology a more careful definition. Notice that from the very outset, here and in Chapter 3, we introduce (and define) probabilities as measures of sets, so that only those things which are both

1. sets (collections of members) and
2. sets into which a measure may be introduced (members of the set may be counted, or the concept of area, volume, etc. is given an exact meaning in the set)

can have probabilities. Thus:

• "The truth of a proposition" (for example) is not a set and so cannot be measured.
• "A throw of a die" is also not a set and similarly cannot be measured,

and therefore it is just as meaningless to speak of the value of the probability of either of these objects as it is to speak of their length, area or volume. This is why the four statements made in Section 2.1 cannot be probabilities in the mathematical theory, however familiar they are in colloquial use.
Chapter 3
A More Careful Look at Probabilities
Some of the ideas introduced in the last chapter are placed on a firmer, more formal, footing and a problem in ontology is skirted. The problems associated with time-dependent probability distributions are rehearsed with particular reference to a familiar example. The abstract and concrete objects which are likely to be met in interpreting Schrodinger's mechanics are examined.
Contents
3.1. Abstract Objects 53
3.2. States and Probability Distributions 55
3.2.1. The Propensity Interpretation 56
3.3. The Formal Definition of Probability 58
3.3.1. A Premonition 62
3.4. Time-Dependent Probabilities 63
3.5. Random Tests 66
3.6. Particle-Distribution Probabilities 67

3.1. Abstract Objects
Considerable stress has been placed on the idea that the referents of physical theories, in particular the theory of probability applied to physical processes, are abstract objects. Although indications have been given of what these objects are in Chapter 2, we need a more careful definition if the idea is to be used in less familiar circumstances. Also, I have to attempt to justify the idea that physical theories describe what are, in the everyday sense of the term, non-existent entities.
In fact, all language, logic, mathematics and theoretical science deal with entities which are abstracted from, or are idealised versions of, actually existing concrete objects. We are perfectly familiar with the use of a term like "mammal" and find its use completely unobjectionable. And yet there are no mammals; there are only cats, dogs, gnus, etc. A moment's thought makes us realise that there are no dogs or cats either, but only Fido, Rover, Pussums and Tiddles, etc. If we exclude abstractions from reality we can scarcely use language at all, and we fall into the same trap as the medieval Nominalists in thinking that only concrete objects exist.¹ Roughly speaking, abstract objects are concepts. In ordinary language one is forced to abstract from the "incidental" properties of concrete objects in order to be able to express general ideas; one needs to be able to say what a mammal is, and to distinguish a mammal from (say) a bird (another abstract object), without having to explain all the ways in which a gnu differs from a wren, including (for example) the fact that a wren does not suckle its young. An abstract object is a mental construct which has only the properties which one explicitly assigns to it and no others. Thus:
• An abstract object is not a "typical" member of a set of concrete objects. All concrete objects have "incidental" properties; all concrete cubes, for example, have mass.
• Usually there is a corresponding concept (word or name) for each abstract object, but not always, since in science we must form abstract objects.
• Every actually existing concrete object has a fixed set of values of all its properties,² but abstract objects may well not have fixed values of some or all of their explicitly-specified properties.
But where are these abstract objects to be found? Do they exist? Are they real? They are real and exist in the minds of people.
They do not, of course, have any material existence independently of such minds, but is the place where an object exists a criterion for that existence?³ Here, perhaps, one has to insist on the distinction between "realism" and "materialism" in philosophy, which I have done. Bananas⁴ are both material and real, as are electric fields; they exist independently of minds. The number 2 (or π) is real but not material. By "material" I simply mean "existing outside of and independently of our minds". I cannot define "real" in a few words, but it includes everything which is material plus those things which, through an active agent (usually human), may have an effect on material objects; perhaps "material plus minds and their contents" will do duty for a definition of "real". This usage differs from that of realist mathematicians who hold, if I have understood their position, that (for example) numbers exist outside minds but are, presumably, not material. The upshot of these rather cavalier considerations is that the referents of physical theories are abstract objects.⁵ The predictions (numerical or otherwise) of physical theories may be checked against experiments on actually existing concrete objects which have at least the properties of the abstract object. The task of the experimenter is to minimise the effects of the incidental properties of the concrete objects on his measurements, and that of the statistician is to ensure that the mathematical techniques for analysing the data from the experiments are sound. Since properties may be predicated of concrete objects and an abstract object is a set of properties, some writers identify "concepts" with "predicates", so that their "predicates" are my "abstract objects". However, one can scarcely say that the referent of a probabilistic theory is a predicate.
¹ Or the position satirised by Swift, where the learned professors of the Academy of Lagado carried around actual objects to communicate with one another.
² "Time of specification" may be one of these, of course.
³ One would scarcely withhold the attribute of existence from goldfish because they exist in aquaria rather than in the wild.
⁴ I am in danger of tripping myself up here: by "bananas" I mean in this context the set of all concrete bananas, not the abstract banana!
⁵ "No science ever interprets reality in an exhaustive way. It constructs its object by a choice which preserves the essential and eliminates the non-essential." (Lucien Goldmann in "The Human Sciences & Philosophy", Cape, 1969).
3.2.
States and Probability Distributions
A set of the possible values which some or all of the properties of an abstract object can take may be considered to be the (mutually exclusive) "states" of that abstract object. In the very simple cases considered so far there is only one such property, and the idea of "state" may sometimes seem a little artificial, but the formal similarity amongst even these simple cases is worth emphasising:
• The "numbered face of a cube" may take any one of the six possible values, and we may say it is in a state "5" or "2", etc.
• The "mammal in the meadow" may take one of the six values "cow", "horse", "rabbit", "fox", "fieldmouse" and "weasel", and we may say that it is in a state "rabbit" or whatever. Here the method of abstraction gives us the strange idea of a "mammal" being in the state "rabbit", rather than the colloquial "rabbit as an example of a mammal".
• In the case of the pendulum we would normally want to distinguish between the deterministic model and the probabilistic model, defining the abstract object according to the type of measurements to which we subject the concrete pendulums.
- In the deterministic model the abstract object may simply be a "pendulum of length $\ell$ and angular deflection $\theta$", whose state is defined by the numerical values of $\ell$ and $\theta$. The "angular deflection of a pendulum of length $\ell$" may take on a non-denumerable infinity of values between $-\Theta$ and $\Theta$, and we may say, therefore, that the abstract object is in a state $\alpha$ or some such value.
- Using the probabilistic model, on the other hand, the abstract object may be chosen to be a "pendulum of length $\ell$ and amplitude $\Theta$". In this case the numerical values of $\ell$ and $\Theta$ (which fix its energy and frequency) define the state of the abstract object, and the angular deflection $\theta$ is not fixed by the state of the abstract object but is only given by a probability distribution.
Notice for future reference the difference between the last pair of these examples and the other two: the pendulum can actually, autonomously, change from one state to another, but neither the "numbered face of a cube" nor, a fortiori, the "mammal" can actually physically change the value of its state-defining property.
3.2.1.
The Propensity Interpretation
This last point is worth some elaboration since it bears on the so-called propensity interpretation of probability, due to Popper, which has been adopted by many scientists who seek an objective view of probability. The principal idea in this interpretation is that probabilities are objective properties of (individual concrete) systems which measure the propensity that the object has to have a particular value of a property of interest. Thus:
• In the case of our pendulum, the probability that the angular deflection of the pendulum lies in a given region is simply a measure of its propensity to be in that region
• In the case of a hydrogen atom, the probability that the electron be in a particular volume of space is again a measure of its propensity to be in that volume.
These propensities are, of course, determined by the potentials which constrain the motion of the relevant mechanical system. One weakness of this position is obvious: without a satisfactory definition of propensity independent of probability, the explanation is circular. Its strength is its objectivity; nowhere is it implied that probabilities involve the acts or thoughts of the "observer". But, like other erroneous interpretations of probability, it refers to the properties of individual concrete objects rather than to the true referent of probabilities, abstract objects. This point is made much clearer by looking at the propensity interpretation of probabilities which involve states of systems which cannot autonomously change into each other. While it may seem to make perfect sense to say that, because of the laws of dynamics, a pendulum of a given length has a larger propensity to be at the extremities of its motion than in a vertical position, a "mammal in a meadow" cannot be said to have a propensity to be a rabbit, for example, not least because, as we have seen, there are no concrete mammals. Any example of a "mammal in a meadow" (randomly chosen or not) is always a particular concrete animal; it cannot change from being a particular concrete rabbit to being a particular concrete cow by the act of measurement or any other process. But there is a perfectly definite probability that the abstract object "mammal in a meadow" be a rabbit, because probabilities are the relative measures of sets and do not depend on any particular property of that abstract object except that the relevant subsets may be measured. Similarly, any experimental verification of the probabilities using sets of concrete objects depends only on the fact that they can be measured (counted, in this case).
To press the point to the edge of fatigue, the two statements:
• 83% of all the animals on the planet are insects⁶
• The probability that an animal on the planet is an insect is 0.83
are identical, and neither of them has anything to say about the properties of any concrete ant or concrete elephant except that each has the (objective) property of being capable of being counted. An individual concrete animal is always a particular wasp, a particular gnu, ..., a particular spider, etc., and cannot be said to have any "phylum or genus propensity" to be anything other than what it is. The probability may be statistically verified, as always, by counting sets of animals which are adequate⁷ to give Monte Carlo quadrature approximations to that probability. The situation is completely analogous in the case of the "numbered sides of a cube"; the numbers on the sides of the abstract cube or of any concrete cube cannot change, and the probability that the "numbered side of a cube" be 5 cannot have a propensity interpretation even if that probability is mistakenly taken to refer to throws of concrete dice. The outcome of any particular throw of any concrete die is that a single numbered side is "face up"; that "face up" side cannot be said to have a propensity to be anything other than what it actually is. To say that each of the sides of a concrete die has an equal propensity to fall "face up" is to say nothing about the concrete dice throws; it simply paraphrases the probabilities in the case of the abstract cube. In order for the concept of probability to have a uniform interpretation for all kinds of sets which may be sub-divided into disjoint measurable subsets, it is necessary to discard the propensity interpretation, even though its proponents are allies in other areas of philosophy: they are supporters of the objective existence of the properties of the material world. The propensity interpretation arises from the entirely laudable effort to give an objective meaning to the idea of probability in a particular area of science: the theory of those systems for which concrete objects may autonomously change from one part of the probability distribution to another. But this interpretation leads to obvious absurdities in more general probability applications. As we have seen, if probability statements are imputed to individual concrete objects (for whatever reason), one must fall into paradox and confusion.
⁶ I have plucked the figure of 0.83 out of the air, of course, simply to make a point.
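The claim that a probability is verified statistically by counting adequately large sets of concrete objects can be illustrated with a minimal Monte Carlo sketch in Python. It assumes a hypothetical population of 100 animals, 83 of them insects (the illustrative figure above, not data); the helper names `make_population` and `sample_frequency` are invented for the illustration. The relative frequency of insects in random draws should approach the relative measure 0.83 of the subset as the number of draws grows.

```python
import random

def make_population(n_insects: int, n_others: int) -> list:
    """Hypothetical finite population of concrete animals, each labelled by kind."""
    return ["insect"] * n_insects + ["other"] * n_others

def sample_frequency(population: list, n_draws: int, seed: int = 0) -> float:
    """Relative frequency of "insect" in n_draws random draws with replacement."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if rng.choice(population) == "insect")
    return hits / n_draws

# The measure of the subset "insects" relative to the whole set is 83/100 = 0.83.
population = make_population(83, 17)
for n in (100, 10_000, 100_000):
    print(n, sample_frequency(population, n))
```

Nothing in the sketch attributes a propensity to any individual animal: each draw simply returns a concrete member of the set, and only counting enters the estimate.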
3.3.
The Formal Definition of Probability
The Kolmogorov axioms apply to measures of subsets of a given set and simply state in formal terms the conditions we became familiar with in Chapter 2: probabilities are relative measures of subsets of a given set. That is, for a set $\Omega$ with subsets $W_i$, a measure function
$$P : \{W_i\} \to \mathbb{R}$$
is defined from the subsets $W_i \subset \Omega$ to the real numbers such that:
• The probability of the larger of two sets is not less than that of the smaller:
$$P(W_1) \ge P(W_2) \quad \text{if } W_2 \subseteq W_1 \tag{3.3.1}$$
• The probability of the union of two disjoint sets is the sum of their individual probabilities:
$$P(W_1) + P(W_2) = P(W_1 \cup W_2) \quad \text{if } W_1 \cap W_2 = \emptyset \tag{3.3.2}$$
This result may be extended by induction to any finite number of disjoint subsets of $\Omega$; Kolmogorov's axioms require it to hold for any denumerable collection of disjoint subsets.
• The probability of the enclosing set is unity:
$$P(\Omega) = 1 \tag{3.3.3}$$
⁷ Not a trivial task in experimental design!
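For a finite set these conditions can be checked exhaustively. The following minimal Python sketch assumes the six numbered faces of the abstract cube as $\Omega$ and defines $P$ as the relative count of a subset, using exact fractions from the standard `fractions` module; `OMEGA`, `P`, `all_subsets` and `check_axioms` are hypothetical names. It tests (3.3.1), (3.3.2) and (3.3.3) over every pair of the 64 subsets.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(1, 7))   # the six numbered faces of the abstract cube

def P(subset: frozenset) -> Fraction:
    """Relative measure of a subset of OMEGA: its member count over that of OMEGA."""
    return Fraction(len(subset), len(OMEGA))

def all_subsets(s: frozenset) -> list:
    """Every subset of s, from the empty set to s itself."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def check_axioms() -> bool:
    subsets = all_subsets(OMEGA)
    for w1 in subsets:
        for w2 in subsets:
            if w2 <= w1 and P(w1) < P(w2):                      # (3.3.1) monotonicity
                return False
            if not (w1 & w2) and P(w1) + P(w2) != P(w1 | w2):   # (3.3.2) additivity
                return False
    return P(OMEGA) == 1                                        # (3.3.3) normalisation

print(check_axioms())          # True
print(P(frozenset({5})))       # 1/6
```

Exact fractions are used so that the additivity test is an equality rather than a floating-point tolerance.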
In the case of probabilities generated from a probability density, which is a function $p$ from the set $X$ (members $x \in X$, subsets $X_i \subset X$) to the real numbers, the corresponding results are:
1. $$\int_{X_1} p(x)\,dx \ \ge\ \int_{X_2} p(x)\,dx \quad \text{if } X_2 \subseteq X_1$$
2. $$\int_{X_1} p(x)\,dx + \int_{X_2} p(x)\,dx = \int_{X_1 \cup X_2} p(x)\,dx \quad \text{if } X_1 \cap X_2 = \emptyset$$
3. $$\int_{X} p(x)\,dx = 1$$
which may always be arranged by multiplication by a numerical factor $1/N$ if
$$\int_{X} p(x)\,dx = N < \infty.$$
Now, to qualify mathematically as a distribution function, a function $p$ must be:
1. single valued,
2. non-negative,
3. integrable to a finite value.
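The normalisation by $1/N$ and conditions 1 to 3 can be illustrated numerically. This Python sketch assumes a hypothetical unnormalised density $q(x) = 1 - x^2$ on $[-1, 1]$ (so that $N = 4/3$) and uses a simple midpoint rule for the integrals; the function names are invented for the example.

```python
def midpoint_integral(f, a: float, b: float, n: int = 10_000) -> float:
    """Approximate the integral of f over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def q(x: float) -> float:
    """Hypothetical unnormalised density: single valued, non-negative, integrable."""
    return 1.0 - x * x if -1.0 <= x <= 1.0 else 0.0

N = midpoint_integral(q, -1.0, 1.0)        # close to the exact value 4/3

def p(x: float) -> float:
    """Normalised density: q multiplied by the numerical factor 1/N."""
    return q(x) / N

total = midpoint_integral(p, -1.0, 1.0)                                    # condition 3
halves = midpoint_integral(p, -1.0, 0.0) + midpoint_integral(p, 0.0, 1.0)  # condition 2
inner_part = midpoint_integral(p, -0.5, 0.5)                               # condition 1
print(N, total, halves, inner_part)
```

Here the disjoint pieces $[-1, 0)$ and $[0, 1]$ add up to the whole interval, and the contained interval $[-0.5, 0.5]$ receives no more probability than the whole.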
Continuity is not necessary, and the function may have "corners", "jumps" and "spikes" provided any infinities occur on sets of zero measure (for example, finite numbers of points in infinite sets). All of these conditions are more than met by insisting that $p$ be the square of the modulus of a continuous, normalisable, possibly complex function:
$$p(x) = |\psi(x)|^2$$
Thus the squared modulus of any single-valued, square-integrable function has the correct mathematical properties to be a probability distribution function.⁸ In fact the requirement can be generalised to include any "vector" of such functions:⁹
$$\boldsymbol{\psi}(x) = \begin{pmatrix} \psi_1(x) \\ \psi_2(x) \\ \psi_3(x) \\ \vdots \\ \psi_n(x) \end{pmatrix}, \qquad p(x) = \boldsymbol{\psi}(x)^\dagger\,\boldsymbol{\psi}(x)$$
In addition to generating a probability density, each $\psi$ generates a projection operator:¹⁰
$$P_\psi = \int dx'\,\psi(x)\,\psi^*(x')\,(\,\cdot\,)$$
such that
$$P_\psi f_i(x) = \int dx'\,\psi(x)\,\psi^*(x')\,f_i(x') = \psi(x)\int dx'\,\psi^*(x')\,f_i(x') = \psi(x)\,p_i \quad \text{(say)}$$
⁸ It is worth stressing the obvious fact that a probability distribution function for the quantity $x$ (say) is a function of $x$. Thus, a distribution function of energy is a function of values of the energy, not a function of space. Later in this work we shall be concerned with distributions of quantities like $x$ in space; these are not probability distributions since they do not satisfy Kolmogorov's conditions.
⁹ With the obvious implications for "spin".
¹⁰ An unfortunate near-collision of notation: $P(\,)$ for the probability measure, $P$ for the projection operator.
where $p_i$ is a number. Thus the implied eigenvalue equation
$$P_\psi f_i(x) = p_i\,f_i(x)$$
has the solutions
$$p_i = 1 \ \text{if } f_i = \psi, \qquad p_i = 0 \ \text{if } f_i \text{ is orthogonal to } \psi,$$
provided $\psi$ is normalised to unity. That is, the function $\psi$ which is capable of generating a probability distribution $p = |\psi|^2$ is an eigenfunction of the (Hermitian) operator $P_\psi$ with eigenvalue unity. The other eigenfunctions comprise all those functions which are orthogonal to $\psi$, and they are all degenerate with eigenvalue zero. It is trivial to show that these alleged projection operators, associated with orthonormal functions $\psi_i$, satisfy the requirements of idempotency ($P_{\psi_i}^2 = P_{\psi_i}$) and, when the $\psi_i$ form a complete set, of completeness ($\sum_i P_{\psi_i} = 1$). The connection between the distribution function $p$ and the associated function $\psi$ (the "state function", say) may be made more explicit by changing the definition of $p$ slightly so that it depends on two sets of variables $x$ and $x'$:
$$p(x;x') = \psi(x)\,\psi^*(x')$$
with the original probability distribution function being $p(x;x)$, of course. This extended definition makes $p(x;x')$ the kernel of the projection operator $P_\psi$ in the usual sense of integral operators:
$$P_\psi f(x) = \int dx'\,p(x;x')\,f(x')$$
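On a grid the projection operator becomes a finite sum, and its properties can be checked directly. This Python sketch assumes a midpoint grid on $[0, 1]$, the hypothetical normalised function $\psi(x) = \sqrt{2}\sin(\pi x)$ and an orthogonal partner $\sqrt{2}\sin(2\pi x)$; `inner`, `project` and `sine_state` are illustrative names. `project` applies the kernel $p(x;x') = \psi(x)\psi^*(x')$ by summing over $x'$, so the checks show eigenvalue 1 for $\psi$, eigenvalue 0 for an orthogonal function, and idempotency.

```python
import math

N_GRID = 200
DX = 1.0 / N_GRID
XS = [(k + 0.5) * DX for k in range(N_GRID)]        # midpoint grid on [0, 1]

def inner(f, g):
    """Discretised <f|g>: the integral of f*(x) g(x) dx by the midpoint rule."""
    return sum(a.conjugate() * b for a, b in zip(f, g)) * DX

def project(psi, f):
    """(P_psi f)(x): sum over x' of psi(x) psi*(x') f(x') dx', i.e. psi(x) <psi|f>."""
    c = inner(psi, f)
    return [c * a for a in psi]

def sine_state(n):
    """Hypothetical normalised function sqrt(2) sin(n pi x), stored as complex values."""
    return [complex(math.sqrt(2.0) * math.sin(n * math.pi * x)) for x in XS]

def close(f, g, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(f, g))

psi = sine_state(1)
f_orth = sine_state(2)
f_mixed = [a + 0.3j * b for a, b in zip(psi, f_orth)]
pf = project(psi, f_mixed)

print(close(project(psi, psi), psi))               # psi is an eigenfunction, eigenvalue 1
print(close(project(psi, f_orth), [0j] * N_GRID))  # orthogonal function, eigenvalue 0
print(close(project(psi, pf), pf))                 # idempotency: P P f = P f
```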
Looking ahead, we shall, in Schrodinger's mechanics, be dealing with probability distributions which are formed in this way as squared magnitudes $|\psi(x)|^2$, where the functions $\psi$ are solutions of (partial) differential equations and are typically continuous functions, and therefore generate continuous probability distributions. A moment's thought shows that, if a probability distribution is differentiable, its derivative must be zero where the function itself is zero:
$$p(x_0) = 0 \;\Rightarrow\; \left.\frac{dp}{dx}\right|_{x_0} = 0$$
since, if this were not true, $p$ would be negative in the neighbourhood of $x_0$, which is impossible. If, however, $p$ is given by
$$p(x) = \psi^*(x)\,\psi(x)$$
for any differentiable, square-integrable $\psi$, then
$$p(x_0) = 0 \;\Rightarrow\; \psi(x_0) = \psi^*(x_0) = 0 \;\Rightarrow\; \left.\frac{dp}{dx}\right|_{x_0} = \psi^{*\prime}(x_0)\,\psi(x_0) + \psi^*(x_0)\,\psi'(x_0) = 0$$
automatically, suggesting that the use of probability distributions which are the square of the modulus of a normalisable function is fundamental to probability theory, since this method always generates probability distributions which satisfy Kolmogorov's requirements. Since the $\psi$ may take positive and negative values while $p$ must always be non-negative, different $\psi$'s may be orthogonal, which is impossible for two different $p$'s (unless they are non-zero only on disjoint regions), again suggesting a more fundamental role for $\psi$: different $\psi$'s may be solutions of the same (self-adjoint) differential equation.
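The point about zeros can be checked numerically. This Python sketch assumes the hypothetical real function $\psi(x) = x\,e^{-x^2/2}$, which changes sign at $x = 0$, and estimates slopes by central differences: $\psi$ should cross zero with a slope close to 1, while $p = |\psi|^2$ should have zero slope there, as the argument requires.

```python
import math

def psi(x: float) -> float:
    """Hypothetical real function with a node at x = 0; it takes both signs."""
    return x * math.exp(-x * x / 2)

def p(x: float) -> float:
    """Distribution generated as the squared modulus of psi."""
    return abs(psi(x)) ** 2

def central_diff(f, x: float, h: float = 1e-5) -> float:
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

slope_psi = central_diff(psi, 0.0)   # about 1: psi crosses zero with non-zero slope
slope_p = central_diff(p, 0.0)       # about 0: p touches zero without crossing it
print(slope_psi, slope_p)
```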
3.3.1.
A Premonition
Suppose we wish to consider the states of a single abstract particle, and further suppose that to each possible energy¹¹ $E_i$ of this abstract particle there corresponds a position probability distribution $p_i(\mathbf r)$ such that
$$p_i(\mathbf r) = \psi_i^*(\mathbf r)\,\psi_i(\mathbf r)$$
in an obvious special case of the above example for general probability distributions ($dV$ is the relevant volume element). Then, with the projection operator $P_i$ given by
$$P_i = \int dV'\,\psi_i(\mathbf r)\,\psi_i^*(\mathbf r')\,(\,\cdot\,)$$
each of the separate probability distributions is the squared modulus of an eigenfunction of the corresponding (Hermitian) projection operator:
$$P_i\,\psi_i = 1 \times \psi_i.$$
Furthermore, the total operator
$$P = \sum_i P_i$$
has all the $\psi_i$ as eigenfunctions with eigenvalue 1 if the $\psi_i$ are orthonormal, i.e. if
$$\int dV\,\psi_i^*(\mathbf r)\,\psi_j(\mathbf r) = \delta_{ij}.$$
If these conditions are met then we may form the Hermitian operator $H$, given by
$$H = \sum_i E_i\,P_i$$
for which
$$H\psi_i = E_i\,\psi_i$$
for any set of position probability distributions, provided that the orthonormality condition on their component $\psi_i$ is satisfied. This latter condition is not, in fact, met by the probability distributions of abstract particles whose motion is governed by classical mechanics, as one can quickly verify from the angular deflection probability distributions of Section 2.4 on page 41, but the exercise is an interesting one. The problem lies with the continuous nature of the possible $E_i$. It is capable of solution by using (infinite) sets of $\delta$-function distributions and associated projection operators, one for each member of the continuous set of $E_i$.
¹¹ Possibly discrete, possibly continuous; it could be the pendulum bob in our earlier example.
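The construction $H = \sum_i E_i P_i$ can be sketched on a grid for a finite, discrete set of states. Everything here is an assumption made for illustration: the orthonormal functions $\psi_n(x) = \sqrt{2}\sin(n\pi x)$ on $[0, 1]$, only four states, and the energies $E_n = n^2$ in arbitrary units. The continuous-energy case discussed above is not covered by this sketch.

```python
import math

N_GRID = 400
DX = 1.0 / N_GRID
XS = [(k + 0.5) * DX for k in range(N_GRID)]       # midpoint grid on [0, 1]

def state(n):
    """Hypothetical orthonormal functions psi_n(x) = sqrt(2) sin(n pi x) on [0, 1]."""
    return [math.sqrt(2.0) * math.sin(n * math.pi * x) for x in XS]

def inner(f, g):
    """Discretised integral of f(x) g(x) dx (real functions)."""
    return sum(a * b for a, b in zip(f, g)) * DX

STATES = {n: state(n) for n in range(1, 5)}
ENERGIES = {n: float(n * n) for n in STATES}        # hypothetical values of E_i

def apply_weighted_sum(weights, f):
    """Apply sum_i w_i P_i to f, where P_i f = psi_i <psi_i | f>."""
    out = [0.0] * N_GRID
    for n, psi in STATES.items():
        c = weights[n] * inner(psi, f)
        out = [o + c * a for o, a in zip(out, psi)]
    return out

def total_projector(f):                  # P = sum_i P_i
    return apply_weighted_sum({n: 1.0 for n in STATES}, f)

def hamiltonian(f):                      # H = sum_i E_i P_i
    return apply_weighted_sum(ENERGIES, f)

def max_diff(f, g):
    return max(abs(a - b) for a, b in zip(f, g))

for n, psi in STATES.items():
    print(n, max_diff(hamiltonian(psi), [ENERGIES[n] * a for a in psi]),
          max_diff(total_projector(psi), psi))
```

Both printed differences should be at the level of rounding error, because the discretised $\psi_n$ remain orthonormal on this grid.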
3.4.
Time-Dependent Probabilities
In first looking at the probabilistic model of the simple pendulum, the essence of the choice of this model was the assumption that only random measurements on concrete pendulums were available, making the deterministic equation which fixes the pendulum's motion as a function of time unusable. In thinking about random measurements of this type, one would naturally assume that values of the pendulum's deflection occurred at random (unknown) times (the values simply turn up and are recorded).¹² But what happens in the case of a damped pendulum, when the amplitude of the swing decreases with time? If the abstract object is the one chosen in the last section, "a pendulum of length $\ell$ and amplitude $\Theta$", then $\Theta$ changes with time, so values of $\theta$ for random times will refer to different abstract objects of the former time-independent type, and any statistical measurements on a single concrete damped pendulum will be worthless in attempting to verify the probability distribution calculated for the abstract damped pendulum.
¹² By "unknown" here I mean that the relationship of the random events to the motion of the pendulum is unknown, not that all clocks are banished from the lab.
First of all, let's say what we mean by a damped pendulum. In most practical cases a pendulum (or other oscillator) is damped by fluid (air) friction, and it is a good approximation to make this damping proportional to the velocity of the pendulum, so the equation of motion for such a system (for small deflections) is
$$\ddot\theta + \gamma\,\dot\theta + \omega^2\,\theta = 0$$
where $\gamma$ is a measure of the size of the damping effect and $\omega$ is the angular frequency of the corresponding undamped pendulum,
$$\omega = \sqrt{g/\ell}.$$
Without going into the details of the solution of this equation (which obviously involves the relative magnitudes of the driving force of the oscillations, gravity $g$, and the damping effect $\gamma$), the simplest regime is called light damping ($\gamma \ll \omega$), in which the frequency of the motion is essentially unchanged from the undamped case (strictly, it becomes $\sqrt{\omega^2 - \gamma^2/4}$) and only the amplitude decays with time:
$$\theta(t) = \Theta\,e^{-\gamma t/2}\sin(\omega t + \phi) = \Theta(t)\sin(\omega t + \phi) \quad \text{(say)} \tag{3.4.5}$$
The time dependence of the amplitude has simply been absorbed into the amplitude $\Theta$ without change of notation, since the details of this dependence are not important here; the amplitude simply decays in the familiar exponential way with time. In this light-damping regime the probability distribution is of exactly the same form as that for the simple pendulum, equation (2.4.5) on page 42 of Chapter 2:
$$P(\theta;t) = \frac{1}{\pi\,\Theta(t)\sqrt{1 - \left(\theta/\Theta(t)\right)^2}}$$
where the time dependence is entirely contained in the time dependence of the amplitude $\Theta(t)$.
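The time-dependent distribution can be illustrated for an ensemble of abstract pendulums, all examined at the same time $t$ but with uniformly distributed phases $\phi$. This Python sketch assumes the damping convention written above (amplitude $\Theta_0 e^{-\gamma t/2}$) and hypothetical values $\Theta_0 = 0.2$ rad and $\gamma = 0.1$ per unit time; the function names are invented. The fraction of the ensemble with $|\theta| < \Theta(t)/2$ should stay at $1/3$, the value obtained by integrating $P(\theta;t)$, at every $t$, even though $\Theta(t)$ itself shrinks.

```python
import math
import random

def amplitude(t: float, theta0: float, gamma: float) -> float:
    """Light-damping amplitude Theta(t) = Theta_0 exp(-gamma t / 2) (assumed convention)."""
    return theta0 * math.exp(-gamma * t / 2)

def density(theta: float, t: float, theta0: float, gamma: float) -> float:
    """P(theta; t) = 1 / (pi sqrt(Theta(t)^2 - theta^2)) inside the swing, 0 outside."""
    big = amplitude(t, theta0, gamma)
    if abs(theta) >= big:
        return 0.0
    return 1.0 / (math.pi * math.sqrt(big * big - theta * theta))

def fraction_near_vertical(t, theta0, gamma, n=200_000, seed=1):
    """Fraction of an ensemble (uniform random phases) with |theta| < Theta(t)/2 at time t."""
    rng = random.Random(seed)
    big = amplitude(t, theta0, gamma)
    hits = sum(1 for _ in range(n)
               if abs(big * math.sin(rng.uniform(0.0, 2 * math.pi))) < big / 2)
    return hits / n

# Integrating the density over |theta| < Theta(t)/2 gives (2/pi) asin(1/2) = 1/3 at every t.
for t in (0.0, 5.0, 20.0):
    print(t, amplitude(t, 0.2, 0.1), fraction_near_vertical(t, 0.2, 0.1))
```

The ensemble is taken at a single time on purpose: sampling one concrete damped pendulum at random times would mix abstract objects with different amplitudes, which is exactly the difficulty raised above.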
It now seems to be the case that the abstract object must involve time: "a pendulum of length $\ell$ and amplitude $\Theta(t)$ at time N +1 constants or initial conditions. One of those will be additive (corresponding to the time derivative)
leaving just $3N$ required, which can always be chosen to be the initial values of the coordinates $q^i$: the $q^i_0$. The relationship between the function $S$ and the initial values $q^i_0$ is just a special case of the general relationship between momenta and $S$:
$$\frac{\partial S}{\partial q^i_0} = -p_{i0} \tag{4.2.3}$$
and so we may generate the initial momenta ($p_{i0}$), thus obtaining all $6N$ initial conditions (conditions at a chosen value of $t$) of a particular trajectory. In obtaining the H-J equation, use was made of the dynamical law of classical mechanics, and so the final equation can only apply for trajectories which solve Lagrange's equations, i.e. for trajectories which obey Newton's Law. Thus the H-J equation is just that: an equation, not an identity; it is an equation equivalent to Hamilton's canonical equations, or to Lagrange's equations, or indeed to $F = ma$. But it is a partial differential equation for $S$ in which the $q^i$ are the independent variables along with $t$; there is no question of the $q^i$ being functions of $t$ as they are in the solutions of the Lagrange or Hamiltonian equations, which are the explicit expressions for the trajectories as functions of time. Indeed such a thing is impossible by the very nature of the equation: it is $S$ which is to be determined by the Hamilton-Jacobi equation, not the $q^i$ and $t$. Hamilton's canonical equations¹ have the appearance of partial differential equations but, in fact, they are the generators of ordinary differential equations. That is, unlike the canonical equations
$$\frac{\partial H}{\partial q^i} = -\frac{dp_i}{dt}, \qquad \frac{\partial H}{\partial p_i} = \frac{dq^i}{dt}$$
which involve a known function $H$ (the partial derivatives merely picking out ordinary differential equations), the Hamilton-Jacobi equation involves an unknown function $S$ to be determined by this equation. Once found, this $S$ has values for all $q^i$ (and $t$). That is, $S$, although it contains the dynamical law, does not determine trajectories as functions of $t$ directly since, for every choice of values of the $q^i$ and $t$, $S(q^i, t)$ has a value. This raises two questions, one mathematical and one scientific:
¹ Note, once more, that only the first of the canonical "equations" is, in fact, an equation (it contains the dynamical law), while the second is merely the definition of velocity in Hamilton's dynamics.
1. Precisely how does a knowledge of $S$ determine the allowed particle trajectories?
2. What is the referent of $S$, and hence of the Hamilton-Jacobi equation which determines $S$? To what do the solutions refer; how does one interpret the Hamilton-Jacobi equation?
In the past, attention has been concentrated almost exclusively on the first of these points and, fortunately, the answer to (1) helps with the consideration of (2). In the case of the solutions of Lagrange's equations for the motion of a single particle, the quantities $q^i$ appearing as solutions of these equations (or of the canonical equations) are functions of time ($t \in \mathbb{R}^1$) and represent a path through 3D space:
$$q^i : \mathbb{R}^1 \to C, \qquad C \subset \mathbb{R}^3 \tag{4.2.4}$$
where $\mathbb{R}^1$ is the real number system (modelling "time") and $C$ a subset of $\mathbb{R}^3$ (modelling ordinary space $E^3$) which would normally be capable of being parametrised by $\mathbb{R}^1$: in short, $C$ models a curve in ordinary space which is the trajectory of the particle. Now, from what we have said above, it is clear that this is not what $q^i$ is in the Hamilton-Jacobi theory. In this equation $q^i$ is independent of $t$; it is a coordinate variable which maps points in ordinary space ($E^3$) into the real number system ($\mathbb{R}^1$): given a point in space, $q^i$ is a function which gives the numerical value of a single co-ordinate variable in some frame of reference:
$$q^i : E^3 \to \mathbb{R}^1 \tag{4.2.5}$$
So that, in place of
Lagrange, Hamilton: $q^i : \mathbb{R}^1$ (models "time") $\to C$ (models a curve) $\subset \mathbb{R}^3$ (models "space")
we have
Hamilton-Jacobi: $q^i : E^3$ (space) $\to \mathbb{R}^1$ (a coordinate)
showing the effect that the change from the Hamiltonian canonical equations to the Hamilton-Jacobi equation has on the interpretation of the $q^i$ and on the referent of the function $S$.
The meaning of $q^i$ has changed from "one co-ordinate of a point on a particular allowed trajectory satisfying the dynamical law" to "one coordinate of a point in space" since, as the introduction of $S$ emphasises, all points in space² lie on allowed trajectories; what distinguishes amongst these allowed trajectories is not the solution of the mechanical equations but only the initial conditions. Just as the referent of the $q^i$ has changed, so has the referent of the whole mechanical theory with the introduction of the Hamilton-Jacobi approach. The referent of $S$ is the set (ensemble) of all possible trajectories for the given field of force and inter-particle interactions. Or, if we are to use the more precise concepts introduced in Chapter 3, the referent of $S$ is the abstract object "a particle trajectory consistent with the given environment". This provides an answer to question (2), posed earlier, about the referent of the Hamilton-Jacobi equation and its solution $S$. The idea of an abstract object "a particle trajectory consistent with a given force field", which may be visualised as an ensemble of one each of all trajectories consistent with a given field of force and differing in initial conditions, is a key one in the development of Schrodinger's quantum theory and is, at least incipiently, present in the high point of classical mechanics. Of course, a series of purely mathematical manipulations with equations cannot induce the equations to change their referent and the meaning of the symbols involved; but the arguments and conclusions presented here can be made more acceptable using the method of "characteristic strips" in the theory of the equivalence of some partial differential equations to sets of ordinary differential equations.
² With some obvious exceptions, like sources of potential.
4.3.
Solutions of the H-J Equation
There are two points of view which may be taken about the solutions of the Hamilton-Jacobi (H-J) equation:
• That of Jacobi, which concentrates attention on the use of $S$ as a kind of aid to generate the trajectories of the particle(s) ($q^i(t)$)
• Hamilton's development of the analogy between classical particle mechanics and optics; the possible particle trajectories ("rays") are the normals to the surfaces of constant $S$, which are compared to wave-fronts in the optical model.
From the perspective of the development of quantum mechanics, Hamilton's position is the more interesting. In this section some solutions of the Hamilton-Jacobi equation are presented for a very simple system: the free particle in three-dimensional space. Of course, we know the solutions of this problem from the solutions of Newton's equation; the motion is in a straight line with uniform velocity. The solutions of the H-J equation must reflect these known solutions, but what they also show is how certain families of solutions emerge by separating the H-J equation in various coordinate systems. These families of solutions have direct connections with the solutions of the dynamical equations of Schrodinger's mechanics. There is a straightforward "recipe" for setting up and solving the H-J equation:
• From a knowledge of the Lagrangian, write down the Hamiltonian for the system.
• Replace the momentum components by gradients of the action function $S$.
• This generates a partial differential equation (in 3 spatial dimensions plus the time variable for a single particle).
• Choose coordinate system(s) in which the equation will separate into 3 ordinary spatial differential equations plus one time equation, and solve these equations.
• Each of the 3 spatial equations will involve one arbitrary constant, which is an initial momentum component, together with the constant from the time equation, which is the initial energy (a constant throughout the motion in many cases of interest).
• Combine the separate solutions into a total solution for $S$.
• Form the gradients of $S$ with respect to these arbitrary constants and use the resulting expressions to fix the initial coordinates.
For the simplest possible case, a single free particle, the basis of this simple recipe is a knowledge of the expression for the kinetic energy ($T$) in terms of the generalised coordinate velocities ($\dot q^i$):
$$T = \frac{1}{2}\sum_{i,k=1}^{3} m_{ik}\,\dot q^i\,\dot q^k \tag{4.3.1}$$
where the $m_{ik}$ are products of the mass of the particle and metric coefficients depending on the particular coordinates³ $q^i$. In the more common orthogonal co-ordinate systems the matrix of $m_{ik}$ is diagonal and we have:
$$T = \frac{1}{2}\sum_{i=1}^{3} m_{ii}\,(\dot q^i)^2$$
whence the momentum components required to form the Hamiltonian are:
$$p_i = \frac{\partial L}{\partial \dot q^i} = \frac{\partial T}{\partial \dot q^i} = m_{ii}\,\dot q^i$$
4.3.1.
Cartesian Coordinates
In Cartesian coordinates $m_{xx} = m_{yy} = m_{zz} = m$ (say), the mass of the particle, and the H-J equation for a free particle is
$$\frac{1}{2m}\left[\left(\frac{\partial S}{\partial x}\right)^2 + \left(\frac{\partial S}{\partial y}\right)^2 + \left(\frac{\partial S}{\partial z}\right)^2\right] + \frac{\partial S}{\partial t} = 0 \tag{4.3.2}$$
where $-\partial S/\partial t = E$ is the energy of the particle. Clearly, in Cartesians this equation separates into a sum of three ordinary differential equations of identical form for $x$, $y$ and $z$ and a simpler, first-order ordinary equation in $t$. Since the particle experiences no potential energy, the energy $E$ is just its kinetic energy:
$$E = \frac{1}{2}m\left(\dot x^2 + \dot y^2 + \dot z^2\right).$$
The resulting solution, the sum of the solutions of the separate equations, is:
$$S(x,y,z,t) = p_x x + p_y y + p_z z - Et = p_x x + p_y y + p_z z - \frac{1}{2m}\left(p_x^2 + p_y^2 + p_z^2\right)t$$
where the $p_\alpha$ are constants clearly identified as the components of the momentum of the particle since
$$\frac{\partial S}{\partial \alpha} = p_\alpha \qquad (\alpha = x, y, z)$$
as required. At any fixed time the surfaces $S = \text{constant}$ are sets of parallel planes normal to the momentum vector $(p_x, p_y, p_z)$, and the trajectories are sets of parallel lines normal to these planes: straight lines, as Newton's equation $F = ma$ requires. The equations of motion for the system, if they are desired, may be had from the condition
$$\frac{\partial S}{\partial p_x} = x - \frac{p_x}{m}\,t = \text{constant} = x_0 \quad \text{(say)}$$
(where $x_0$ is the value of $x$ at $t = 0$) or, using $p_x = m v_x$, $x = v_x t + x_0$. The H-J equation provides a complete solution to the mechanical problem, providing both the (Jacobi) equations of motion and the (Hamilton) "wave fronts" which generate families of trajectories. Insisting that the energy
$$\frac{1}{2m}\left(p_x^2 + p_y^2 + p_z^2\right)$$
be a fixed constant generates a family of trajectories, all of which have the same energy; a situation which we shall meet later on.
³ The coefficients in the Jacobian of the transformation between Cartesian coordinates and the $q^i$, assuming that this transformation does not depend on time.
Spherical
Polar
Coordinates
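In spherical polar coordinates:

$0 \le r < \infty$: the radial distance from the origin
$0 \le \theta \le \pi$: the angle between $r$ and the $z$-axis of Cartesians
$0 \le \phi < 2\pi$: the angle about the $z$-axis from the $x$-axis of Cartesians

with $m_{rr} = m$, $m_{\theta\theta} = mr^2$ and $m_{\phi\phi} = m(r\sin\theta)^2$, and the H-J equation becomes:

$$\frac{1}{2m}\left[\left(\frac{\partial S}{\partial r}\right)^2 + \frac{1}{r^2}\left(\frac{\partial S}{\partial \theta}\right)^2 + \frac{1}{r^2\sin^2\theta}\left(\frac{\partial S}{\partial \phi}\right)^2\right] + \frac{\partial S}{\partial t} = 0 \qquad (4.3.3)$$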
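Writing $S(r, \theta, \phi; t) = S_r(r) + S_\theta(\theta) + S_\phi(\phi) - Et$ separates this equation into a sum of three ordinary differential equations which differ in form:

$$\left(\frac{dS_\phi}{d\phi}\right)^2 = a^2 \quad (\text{because } \phi \text{ does not occur in the Hamiltonian})$$
$$\left(\frac{dS_\theta}{d\theta}\right)^2 = b^2 - \frac{a^2}{\sin^2\theta}$$
$$\left(\frac{dS_r}{dr}\right)^2 = 2mE - \frac{b^2}{r^2}$$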
We know that the trajectories must be straight lines so, by setting $a = b = 0$, we obtain those trajectories with zero angular momentum about the chosen origin; that is, trajectories all of which pass through this origin. In this case we obtain the very simple solution for $S$:

$$S(r, \theta, \phi; t) = \sqrt{2mE}\,r - Et.$$
The "wave fronts" are spheres in this case and the trajectories corresponding to this simple solution are all those of a given energy $E$ which pass through a given point: the (arbitrarily chosen) origin of spherical polar coordinates. By dropping the requirement that $a = 0$ or $b = 0$ (or both) we may obtain other families of trajectories with given energies and given angular momenta. Combining the ideas that the particle trajectories must be straight lines and that the angular momentum of each particle on such a trajectory is conserved, it is easy to see without an explicit calculation that a family of trajectories with a given constant magnitude of angular momentum must consist of all those whose perpendicular distance $d$ from the origin is a constant: the energy fixes the magnitude of the linear momentum, and $|\ell| = |p|\,d$. Any non-zero values of the constants $a$ and $b$ determine a subset of this total collection. In the case of a free particle there is obviously no preferred point in space for the origin of either Cartesian or spherical polar coordinates. In the case of spherical polars, therefore, we have the unusual situation of constant angular momentum about any point. The value of the constant for a particular trajectory will, of course, depend on the choice of origin.
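The two free-particle solutions found so far can be checked numerically. The sketch below is an illustration added here, not the author's method: it substitutes the Cartesian solution $S = p_x x + p_y y + p_z z - Et$ and the zero-angular-momentum spherical solution $S = \sqrt{2mE}\,r - Et$ into the free-particle H-J equation written in Cartesian form, and evaluates the residual $\frac{1}{2m}|\nabla S|^2 + \partial S/\partial t$ by central finite differences. The mass, momentum components, test point and step size are arbitrary assumed values; the radial solution is not differentiable at the origin, so the test point is kept away from it.

```python
import math

# Illustrative constants (assumptions, not values from the text).
M = 2.0                                # particle mass
P = (0.3, -1.2, 0.7)                   # constant momentum components p_x, p_y, p_z
E = sum(c * c for c in P) / (2 * M)    # free particle: E = p^2 / 2m


def s_plane(x, y, z, t):
    """Cartesian separated solution S = p_x x + p_y y + p_z z - E t."""
    return P[0] * x + P[1] * y + P[2] * z - E * t


def s_radial(x, y, z, t):
    """Spherical-polar solution with a = b = 0: S = sqrt(2mE) r - E t."""
    r = math.sqrt(x * x + y * y + z * z)
    return math.sqrt(2 * M * E) * r - E * t


def hj_residual(S, x, y, z, t, h=1e-5):
    """(1/2m)|grad S|^2 + dS/dt, using central finite differences."""
    dSdx = (S(x + h, y, z, t) - S(x - h, y, z, t)) / (2 * h)
    dSdy = (S(x, y + h, z, t) - S(x, y - h, z, t)) / (2 * h)
    dSdz = (S(x, y, z + h, t) - S(x, y, z - h, t)) / (2 * h)
    dSdt = (S(x, y, z, t + h) - S(x, y, z, t - h)) / (2 * h)
    return (dSdx ** 2 + dSdy ** 2 + dSdz ** 2) / (2 * M) + dSdt


point = (1.1, -0.4, 2.3, 0.5)   # (x, y, z, t), kept away from the origin
print(hj_residual(s_plane, *point))    # close to zero
print(hj_residual(s_radial, *point))   # close to zero
```

Both residuals should be at the level of finite-difference round-off: the two functions describe different families of trajectories sharing the same energy $E$.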
It is obvious that this separation technique in spherical polars applies to a particle moving in a central field of force, with potential energy $V(r)$, if the source of that field is at the (now unique) origin of coordinates, since only the equation for $S_r$ is changed:
$$\left(\frac{dS_r}{dr}\right)^2 = 2m\left(E - V(r)\right) - \frac{b^2}{r^2}.$$

The solution of this equation is more difficult than that for the free particle but some qualitative features are the same:

• Trajectories with $a = b = 0$ (zero angular momentum) are straight lines (in this case with variable velocity) and pass through the origin. In this case the particle will impact on the source of potential.
• Other trajectories have non-zero angular momentum, with magnitude and spatial orientation determined by the values of $a$ and $b$; for an inverse-square force these are the familiar circular, elliptical, parabolic or hyperbolic orbits.

4.3.3. Comparisons
Thus, we can see that in the case of the free particle of constant energy $E$, for example:

• All the trajectories are indeed straight lines with constant linear momentum (and constant angular momentum about any point).
• The various possible separations of the H-J equation in different choices of the familiar 11 orthogonal coordinate systems generate families (abstract objects, ensembles) of trajectories with some particular properties in common.
• These ensembles share a given spatial symmetry type or, what amounts to the same thing, they have common values of certain dynamical variables. In the ensembles resulting from a separation in spherical polars, for example, for a given choice of $E$, $b$ and $a$, all particles having one of these trajectories have the same energy and angular momenta.

One point which I shall have reason to mention in Chapter 11 concerns the conservation laws which are characteristic of particular families of trajectories. For a free particle every concrete trajectory is a straight line with constant momentum ($p$) and constant angular momentum ($\ell$).⁴ What is more, any family of trajectories (chosen, for example, by a given separation of coordinates) will consist of a family of straight-line trajectories with particular common properties:

• All members of any one family obtained by separation in Cartesians have the same constant linear momentum when the "initial conditions" are fixed.
• Similarly, all members of any one family obtained by separation in spherical polars have the same constant angular momentum when the "initial conditions" are fixed.

However, and this is the point which is crucial to Chapter 11, the trajectories from one family with constant angular momentum cannot all have the same value of linear momentum, and vice versa, even though each individual concrete trajectory of each family has constant values of both dynamical variables. Specifically, it is obvious that the linear momentum of any individual free-particle trajectory in the family "trajectories of zero angular momentum passing through the origin" is constant, but the (vector) value of that constant varies from trajectory to trajectory within that family: their directions are all different. It cannot be overemphasised here that, for our simple example of a free particle of constant energy:

• Every individual concrete trajectory has constant values of both (vector) linear momentum and (vector) angular momentum. Indeed, if they did not have perfectly definite values of each of their properties they would not be concrete trajectories.
• Families of trajectories may be generated by separating and solving the H-J equation in various co-ordinate systems, and each member of such a family has some shared constant linear or angular momenta; but within a family of constant linear momentum there are members with different angular momentum, and vice versa.
• There are, in fact, at least 11 such families corresponding to the 11 orthogonal co-ordinate systems, each of which has its own characteristic set of constant momenta, which are not all different.

⁴ For a particular choice of origin.

There is, then, no mystery here:
• It is intuitively completely obvious that one cannot choose a set of all possible different trajectories passing through a given point in 3D space (i.e. of constant angular momentum zero) without all of them having different directions. That is, all of these trajectories have different linear momentum vectors.
• This result has nothing to do with the simultaneous measurability of these sets of dynamical variables; it is merely a result of choosing families of trajectories in certain systematic ways. It is perfectly possible to measure both the linear and the angular momentum⁵ of a concrete trajectory; indeed, in this simple case the angular momentum is completely determined by the linear momentum and the position of the origin.

These results are quite general and independent of the choice of particular example; in Chapter 11 I will explore the quantum analogue of this method of choosing families of trajectories and attempt to clear up the confusion surrounding this method. At the risk of labouring this important point, we may look at an intermediate case: the free particle in cylindrical coordinates.

4.3.4. Cylindrical Coordinates
Circular cylindrical coordinates provide an interesting intermediate case, in that they involve both linear and angular momenta. The three independent coordinates are:

$0 \le \rho < \infty$: the radial distance from the origin in the $xy$ plane of Cartesians
$-\infty < z < \infty$: identical to $z$ of Cartesians
$0 \le \phi < 2\pi$: the angle about the $z$-axis from the $x$-axis of Cartesians

with $m_{\rho\rho} = m_{zz} = m$ and $m_{\phi\phi} = m\rho^2$, giving an H-J equation of the form

$$\frac{1}{2m}\left[\left(\frac{\partial S}{\partial \rho}\right)^2 + \left(\frac{\partial S}{\partial z}\right)^2 + \frac{1}{\rho^2}\left(\frac{\partial S}{\partial \phi}\right)^2\right] + \frac{\partial S}{\partial t} = 0 \qquad (4.3.4)$$

⁵ I have purposely avoided any discussion of the simultaneous existence of more than one component of the angular momentum "vector" in classical mechanics, even though the answer to this question strengthens the similarity between the classical and quantum cases. In fact, as I shall establish in Chapter 5, only the absolute magnitude and one component of the angular momentum may be measured, even in classical mechanics. In this chapter, I am concerned with concentrating on the interpretation of the H-J equation and not muddying the waters with details of the separate problem of the interpretation of angular momentum.
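which separates in the usual way into:

$$\frac{dS_z}{dz} = a, \qquad \frac{dS_\phi}{d\phi} = b \quad (\text{because neither } z \text{ nor } \phi \text{ occurs in the Hamiltonian})$$
$$\left(\frac{dS_\rho}{d\rho}\right)^2 = 2mE - \left(a^2 + \frac{b^2}{\rho^2}\right).$$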
The solutions of these equations, as usual, generate families of trajectories with common values of radial momentum (in the $\rho$ direction, perpendicular to the $z$-axis), linear momentum (in the $z$ direction) and angular momentum (about the $z$-axis), depending on the choice of the separation constants $a$ and $b$. As in the other two cases, all the concrete trajectories describe motion in a straight line with constant velocity and constant angular momentum but, this time, the families generated by the separation of the H-J equation in cylindrical coordinates have a set of common properties which differ from those of the earlier (Cartesian and spherical polar) families; in particular, each family has common values of two linear momenta and one angular momentum.
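The contrast drawn in this section between concrete trajectories and the families they are collected into can be illustrated with a short sketch (an assumed example with arbitrary numbers, not taken from the text). Each free-particle trajectory is taken to be a straight line $\vec r(t) = \vec r_0 + (\vec p/m)t$, and its angular momentum $\vec\ell = \vec r \times \vec p$ about the origin is computed at two different times. Every individual line has constant $\vec p$ and constant $\vec\ell$; the family through the origin shares $\vec\ell = 0$ but not $\vec p$, while a family of parallel lines shares $\vec p$ but not $\vec\ell$.

```python
import math

M = 1.0  # particle mass (illustrative assumption)


def cross(a, b):
    """Vector cross product a x b for 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def position(r0, p, t):
    """Free-particle straight line r(t) = r0 + (p/m) t."""
    return tuple(r0[i] + p[i] * t / M for i in range(3))


def angular_momentum(r0, p, t):
    """l = r(t) x p about the chosen origin."""
    return cross(position(r0, p, t), p)


# Family 1: zero angular momentum, all lines through the origin, |p| = 1.
through_origin = [((0.0, 0.0, 0.0), (math.cos(a), math.sin(a), 0.0))
                  for a in (0.0, 1.0, 2.0)]

# Family 2: one common momentum, lines offset in the y direction.
parallel = [((0.0, d, 0.0), (1.0, 0.0, 0.0)) for d in (0.0, 0.5, 1.5)]

for name, family in (("through origin", through_origin), ("parallel", parallel)):
    for r0, p in family:
        l0 = angular_momentum(r0, p, 0.0)
        l5 = angular_momentum(r0, p, 5.0)  # same trajectory, later time
        print(name, "p =", p, "l(0) =", l0, "l(5) =", l5)
```

For the parallel family the magnitude of $\vec\ell$ equals the offset $d$ times $|\vec p|$, matching the "perpendicular distance" argument given for spherical polars.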
4.4. Distribution of Trajectories
The H-J equation generates the trajectories (or "rays") of the particles whose motion is described by the Hamiltonian $H$ and, for a given problem, one might reasonably expect that the number of trajectories is constant: trajectories are neither created nor destroyed by the evolution of the motion. If we use the more colloquial interpretation of the solution of the H-J equation as
referring to an ensemble⁶ of particle trajectories consistent with $H$, rather than to the abstract object "particle trajectory", then this result might heuristically be interpreted as equivalent to the conservation of ensemble particles. This fact ought to be represented by a conservation equation, and indeed it is. If $\rho(q^i; t)$ is the density of trajectories in the $q^i$ space (the density of particles in the ensemble interpretation) then, for the function $S$ which solves the H-J equation, there is a conservation equation which connects $S$ and $\rho$:
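$$\frac{\partial \rho}{\partial t} + \sum_i \frac{\partial}{\partial q^i}\left(\rho\,\frac{\partial H}{\partial(\partial S/\partial q^i)}\right) = 0 \qquad (4.4.10)$$

which, because of the definition of velocity ($\dot q^i$) in Hamiltonian dynamics,

$$\dot q^i = \frac{\partial H}{\partial p_i}, \qquad p_i = \frac{\partial S}{\partial q^i},$$

becomes

$$\frac{\partial \rho}{\partial t} + \sum_i \frac{\partial}{\partial q^i}\left(\rho\,\dot q^i\right) = 0$$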
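and the quantity $\rho(q^i; t)\,\dot{\mathbf q} = \mathbf J$, say, may be interpreted in the way familiar from electrodynamics as a "current" of trajectories (or particles), to give:

$$\frac{\partial \rho}{\partial t} + \nabla\cdot\mathbf J = 0.$$

The differential equations for the two functions $S$ and $\rho$ may be generated from an all-embracing variational principle

$$\delta\int dV\int dt\,\mathcal L = \delta\int dV\int dt\,\rho\left[H\left(\frac{\partial S}{\partial q^i}, q^i; t\right) + \frac{\partial S}{\partial t}\right] = 0 \qquad (4.4.11)$$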
when variations are allowed in the forms of the functions $S$ and $\rho$. Carrying out the formal mathematical treatment of $\mathcal L$ as a "field Lagrangian density" shows that the functions $S$ and $-\rho$ are a conjugate pair of "coordinate and momentum"!⁷ Here we are so close to Schrödinger's theory⁸ that we can almost touch it. What is now required are two things:

• A different interpretation of the formalism.
• Some connection between the two functions $\rho$ and $S$, to reduce the number of unknown functions from two to one, and the physical consequences of this connection.

The first problem will be addressed in due course. That the two functions should be connected is not difficult to see, at least qualitatively. Gradients of the action function $S$ are the momentum components which (apart from masses and metric tensor components) are the velocities of the particles. Now, the faster a particle is moving at a particular point, the less time it spends near that point and so the less likely it is to be found in the vicinity of that point; as we have seen earlier, the probability distribution function for a particle on a trajectory is inversely proportional to its speed. Thus, gradients of $S$ should influence the values of $\rho$.

⁶ We are then at one with Einstein; this is the ensemble of all possible trajectories with different initial conditions for a particle in the environment described by $H$.
⁷ Where the partial differential is replaced by a so-called functional differential $\delta\mathcal L/\delta S$.
⁸ And to field theories and "second quantisation".
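The statement that the probability distribution for a particle on a trajectory is inversely proportional to its speed can be checked numerically. As an assumed example (a one-dimensional harmonic oscillator $x(t) = A\sin\omega t$, chosen only for illustration), the sketch below samples equally spaced instants over one period, counts the fraction of time spent in each of a set of position bins, and compares the result with the integral over each bin of $\rho(x) = 1/\left(\pi\sqrt{A^2 - x^2}\right)$, which is proportional to $1/|v(x)|$. Amplitude, frequency, sample count and bin count are arbitrary choices.

```python
import math

A = 1.0            # amplitude (illustrative)
OMEGA = 2.0        # angular frequency (illustrative)
N_SAMPLES = 200_000
N_BINS = 10


def sampled_fractions():
    """Fraction of one period the particle spends in each position bin."""
    period = 2 * math.pi / OMEGA
    counts = [0] * N_BINS
    for k in range(N_SAMPLES):
        t = period * (k + 0.5) / N_SAMPLES        # equally spaced instants
        x = A * math.sin(OMEGA * t)
        b = min(int((x + A) / (2 * A) * N_BINS), N_BINS - 1)
        counts[b] += 1
    return [c / N_SAMPLES for c in counts]


def predicted_fractions():
    """Integral over each bin of rho(x) = 1 / (pi sqrt(A^2 - x^2)),
    i.e. a density proportional to 1/|v(x)|."""
    edges = [-A + 2 * A * i / N_BINS for i in range(N_BINS + 1)]
    return [(math.asin(edges[i + 1] / A) - math.asin(edges[i] / A)) / math.pi
            for i in range(N_BINS)]


for i, (s, r) in enumerate(zip(sampled_fractions(), predicted_fractions())):
    print(f"bin {i}: sampled {s:.4f}  predicted {r:.4f}")
```

The outermost bins, next to the turning points where the particle moves slowly, receive the largest fractions of time.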
4.5. Summary
I have already said that there can be no question of "deriving" quantum mechanics from classical mechanics since, as we shall see, their referents are different and their descriptions of the energetics and distributions of particles are very different. So it is worth stressing a few points:

• At bottom, the result of solving the Hamilton-Jacobi equation is a set of trajectories for particle(s) in a given environment (potential energy function and mutual interactions). These results could have been obtained by solving Hamilton's, Lagrange's or, indeed, Newton's equations of motion.
• What has been achieved by deriving the H-J equation is a transfer of viewpoint of the problems of classical particle mechanics from "finding a trajectory" to the realisation that, roughly speaking,⁹ every point in the available space lies on some trajectory. These trajectories differ, not in the particles' environment, but in the particular "initial conditions" which the particles have in that environment. Thus, the solutions of the H-J equation enable all the trajectories to be found.
• By solving the H-J equation by the technique of separation of variables, one can obtain "families" of trajectories with particular properties in common, as we have seen; all the trajectories of a free particle are straight lines with constant momentum, but these trajectories may be collected together in various ways: all trajectories parallel to a given line, or all trajectories passing through a given point, etc.
• Although the analogy is tempting, distributions of particle trajectories are not distributions of particles; in a one-particle system described by classical mechanics (Newton's equation or H-J) the equations of motion and the initial conditions fix the trajectory. Certainly the position probability distribution of the particle along that trajectory may be computed from a knowledge of its velocity,¹⁰ but this position probability distribution is only non-zero along that trajectory and there is no three-dimensional probability distribution function for the particle.
• Precisely because the H-J equation was Schrödinger's starting point for the development of his quantum mechanics, there is a considerable literature on the relationship between the H-J equation and the equations of Schrödinger's mechanics. But Schrödinger's mechanics cannot be obtained from classical mechanics by mere mathematical manipulation.

Notice that some care has been taken to distinguish between a concrete particle trajectory, an ensemble of possible trajectories and an abstract particle trajectory; distinctions which will become crucial in what follows.

⁹ Obvious exceptions are point sources of potential.
¹⁰ As we did for the pendulums in Chapter 2.
Appendix 4.A
Transformation Theory
Here is the very briefest and most elementary "derivation" of the Hamilton-Jacobi equation and the Poisson bracket formalism. It is for illustration purposes only and will not stand close scrutiny. The solution of Hamilton's canonical equations:

$$\frac{\partial H}{\partial q^i} = -\dot p_i \qquad (4.A.1)$$
$$\frac{\partial H}{\partial p_i} = \dot q^i \qquad (4.A.2)$$
is obviously enormously simplified if the Hamiltonian function is independent of some of the coordinates $q^i$, since in this case the left-hand side of (4.A.1) is zero and so $\dot p_i = 0$ ($p_i$ = constant): integration is immediate. Similar comments hold for the identities (4.A.2). Clearly, if a co-ordinate system could be found for which the Hamiltonian was independent of all coordinates, then the solution of the equations of motion would be trivial: all momenta would be constants. Before spending too much time in searching for such a co-ordinate system (which would obviously depend on the potential in which the particles move), consider the physical interpretation of such a system. If all momenta are constants then, because of Newton's law, there are no forces acting on the particles (or bodies in general) and so, apparently, this co-ordinate system can only be found for the trivial case of non-interacting particles in free motion (or free rotation of extended bodies). The only way in which one could hope to generate a co-ordinate system with the sort of properties we require for non-trivial mechanical problems is to allow the origins of the coordinates to move and "follow" the particles, so that the
momenta in the moving co-ordinate system can be constant in the presence of a potential and inter-particle interaction. Now, in the Hamiltonian formulation of mechanics we have an ideal vehicle for the construction of such moving co-ordinate systems: the coordinates and momenta are independent variables and the canonical equations (4.A.1) and (4.A.2) are formally almost identical. So, if we admit transformations of coordinates and momenta which "mix" the original coordinates and momenta in the definition of the new sets, we should be able to generate a set of canonical equations in terms of the new variables which do have the desirable properties. That is, we seek a transformation to new variables $Q^j$, $P_j$ such that

$$Q^j = Q^j(q^i, p_i, t) \qquad (4.A.3)$$
$$P_j = P_j(q^i, p_i, t) \qquad (4.A.4)$$

for which

$$\frac{\partial h}{\partial Q^j} = -\dot P_j = 0 \quad (P_j = \text{constant}) \qquad (4.A.5)$$
$$\frac{\partial h}{\partial P_j} = \dot Q^j \qquad (4.A.6)$$
where $h$ is the Hamiltonian expressed in terms of the variables $Q^j$, $P_j$. Now, there are several points to consider when setting out on such a venture:

1. The transformations (4.A.3) and (4.A.4) should be a complete and non-redundant set if the original $q^i$ and $p_i$ were; this condition can be expressed in the usual way as the non-vanishing of a Jacobian.
2. The transformation must be such that the transformed equations of motion stay within the canonical Hamiltonian formalism, i.e. are indeed of the type (4.A.5) and (4.A.6).
3. Since the transformation "mixes" co-ordinates and momenta of the original "intuitive" type, the new, transformed, equations (4.A.5) and (4.A.6) cannot be as easily distinguished as the original equations (4.A.1) and (4.A.2). That is, while (4.A.1) is an equation of motion and (4.A.2) is an identity in the original formulation, in the transformed equations the "equations of motion" may be inextricably intermingled with the "identities", and so we are forced to treat equations (4.A.5) and (4.A.6) and, by implication, (4.A.1) and (4.A.2) on the same footing. If
we do this then we may as well be hanged for a sheep as for a lamb, and seek transformations which make both the $P_i$ and the $Q^i$ constants. It must be said at this point that the equations describing the motions of a system of particles cannot be solved by sleight of hand. In attempting to find a particularly trivial form of the equations of motion by use of a transformation we are simply pushing the difficulties "out of sight" into the generation of the transformation. None of these manipulations is being carried through as an aid to the practical solution of problems in mechanics; our aim is to obtain the most general form of the mechanical principles in order to throw light on the formalism and physical interpretation of the quantum mechanics of systems of particles. There are no intuitive guides to be used in seeking the transformations (4.A.3), and so we must fall back on general principles. The most general formulation of Hamiltonian equations is the co-ordinate-free variational formulation:

$$\delta\int_{t_1}^{t_2}\left[\sum_{i=1}^{3N} p_i \dot q^i - H\right]dt = 0 \qquad (4.A.7)$$
from which the canonical equations (4.A.1) and (4.A.2) follow. If we wish, therefore, to use only transformations (4.A.3) which remain in the Hamiltonian formalism we require

$$\delta\int_{t_1}^{t_2}\left[\sum_{i=1}^{3N} P_i \dot Q^i - h\right]dt = 0 \qquad (4.A.8)$$

in addition to (4.A.7). Now the variational problem

$$\delta\int_{t_1}^{t_2} f\,dt = 0 \qquad (4.A.9)$$

is always satisfied when $f$ is the total time derivative of some function $F$, i.e. if

$$f = \frac{dF}{dt}$$

then

$$\delta\int_{t_1}^{t_2} f\,dt = \delta\left(F(t_2) - F(t_1)\right) = 0$$

identically (the end points being held fixed in the variation). Or, what amounts to the same thing, the variational principle only determines the optimising integrand to within an additive total time
derivative. This means, of course, that when $f$ is a known function of some arguments other than just $t$, like $q^i$, $\dot q^i$, $p_i$ in

$$\sum_{i=1}^{3N} p_i \dot q^i - H,$$

then, when the variational problem is solved, the integral which solves the problem is a function of $t$ only. That is, of course, that (in our case) $q^i$ and $p_i$ are then fixed as known functions of $t$ which, if we choose to do it, we may substitute in the integrand for $q^i$ and $p_i$ to obtain

$$dG(t) = \left[\sum_{i=1}^{3N} p_i \dot q^i - H(q^j, p_j)\right]dt$$

explicitly. We can do this for both the original variables in (4.A.7) and the transformed variables in (4.A.8); both results are functions of $t$ only, so we may combine (4.A.7) and (4.A.8) to give

$$\delta\int_{t_1}^{t_2}\left[\left(\sum_{i=1}^{3N} p_i \dot q^i - H\right) - \left(\sum_{i=1}^{3N} P_i \dot Q^i - h\right)\right]dt = 0.$$
It is clear that the integrand can be set equal to the total time derivative of an arbitrary function $F$ (say):

$$\left(\sum_{i=1}^{3N} p_i \dot q^i - H\right) - \left(\sum_{i=1}^{3N} P_i \dot Q^i - h\right) = \frac{dF}{dt} \qquad (4.A.10)$$
In obtaining this equation we have put no requirements on the transformation (4.A.3); we have not yet sought a transformation which will generate constant $Q^i$ and $P_i$. However, it is clear that $F$ contains a characterisation of the transformation since it is a function of the $q^i$, $p_i$, $Q^i$ and $P_i$; the question is "how do we extract (4.A.3) from (4.A.10)?" First of all, although $F$ is a function of the $q^i$, $p_i$, $Q^i$, $P_i$ (and $t$), only $6N$ of these $12N$ variables can be independent because of (4.A.3). We may choose which $6N$ at our pleasure. To illustrate the procedure we choose $q^i$ and $Q^i$:

$$F = F(q^i, Q^i, t)$$
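so that

$$\frac{dF}{dt} = \sum_{i=1}^{3N}\frac{\partial F}{\partial q^i}\dot q^i + \sum_{i=1}^{3N}\frac{\partial F}{\partial Q^i}\dot Q^i + \frac{\partial F}{\partial t} \qquad (4.A.11)$$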
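Now both (4.A.10) and (4.A.11) are identities and may be combined to give

$$\sum_{i=1}^{3N}\left(p_i - \frac{\partial F}{\partial q^i}\right)\dot q^i - \sum_{i=1}^{3N}\left(P_i + \frac{\partial F}{\partial Q^i}\right)\dot Q^i + \left(h - H - \frac{\partial F}{\partial t}\right) = 0 \qquad (4.A.12)$$

which itself is an identity, so that the $6N + 1$ individual coefficients (of the $6N$ velocities $\dot q^i$, $\dot Q^i$ and of the remaining time-independent term) must be separately zero:

$$p_i = \frac{\partial F(q^j, Q^j, t)}{\partial q^i} \qquad (4.A.13)$$
$$P_i = -\frac{\partial F(q^j, Q^j, t)}{\partial Q^i} \qquad (4.A.14)$$
$$h - H = \frac{\partial F(q^j, Q^j, t)}{\partial t} \qquad (4.A.15)$$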
Equation (4.A.13) fixes the $Q^i$ in terms of the $q^i$ and $p_i$ and may be solved to generate their explicit form. Once the $Q^i$ are found from (4.A.13) they may be substituted in (4.A.14) to generate the $P_i$, and in (4.A.15) to obtain the new Hamiltonian $h$. These manipulations show that functions like $F$ may be used to generate transformations of the variables in the canonical equations; what is not yet clear is how to choose a particular $F$ which simplifies the canonical equations in order to generate (4.A.5) and (4.A.6). The key lies in the back-substitution of (4.A.13), (4.A.14) and (4.A.15) into the canonical equations, which generates an equation for $F$. We now suppose that the transformation of coordinates and momenta in which the new coordinates and momenta $Q^i$, $P_i$ are all constants can be found, and express it in terms of an $F$ of the above type. In deference to long-established practice, we call this particular $F$ "$S$"; it is a function of the independent "variables" $q^i$, $Q^i$ and $t$, although the $Q^i$ are ultimately to be constants, albeit "independent constants":

$$S = S(q^i, Q^i, t).$$
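The recipe (4.A.13) to (4.A.15) can be exercised on a standard textbook case, assumed here for illustration and not part of the original argument. For a one-dimensional harmonic oscillator with $H = p^2/2m + m\omega^2 q^2/2$, the function $F_1(q, Q) = \tfrac12 m\omega q^2\cot Q$ has no explicit time dependence, so (4.A.15) gives $h = H$. The sketch computes $p = \partial F_1/\partial q$ and $P = -\partial F_1/\partial Q$ by central differences and checks that the new Hamiltonian is $h = \omega P$, independent of $Q$. This makes $P$ constant but not $Q$ (which grows as $\omega t$), so it is only a partial version of the transformation to constants sought in the text; the mass, frequency and sample points are arbitrary.

```python
import math

M, OMEGA = 1.5, 0.8  # mass and angular frequency (illustrative assumptions)


def F1(q, Q):
    """Assumed generating function F1(q, Q) = (m omega q^2 / 2) cot Q."""
    return 0.5 * M * OMEGA * q * q / math.tan(Q)


def d_dq(f, q, Q, h=1e-6):
    """Central-difference partial derivative with respect to q."""
    return (f(q + h, Q) - f(q - h, Q)) / (2 * h)


def d_dQ(f, q, Q, h=1e-6):
    """Central-difference partial derivative with respect to Q."""
    return (f(q, Q + h) - f(q, Q - h)) / (2 * h)


def transform(q, Q):
    """Apply (4.A.13) and (4.A.14): p = dF1/dq and P = -dF1/dQ."""
    p = d_dq(F1, q, Q)
    P = -d_dQ(F1, q, Q)
    return p, P


def H(q, p):
    """Original oscillator Hamiltonian."""
    return p * p / (2 * M) + 0.5 * M * OMEGA ** 2 * q * q


q, Q = 0.7, 1.1
p, P = transform(q, Q)
print(H(q, p), OMEGA * P)  # the two values agree: h = omega * P
```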
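We require $S$ such that the transformed Hamiltonian $h$ satisfies

$$\frac{\partial h}{\partial Q^i} = 0, \qquad \frac{\partial h}{\partial P_i} = 0 \qquad (4.A.16)$$

If the transformed Hamiltonian $h$ contains no explicit time dependence we have

$$\frac{\partial h}{\partial t} = 0 \qquad (4.A.17)$$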
and so the Hamiltonian is a constant, having no dependence on $Q^i$, $P_i$ or $t$. Now the original momenta $p_i$ may be generated from $S$ by the use of (4.A.13), so we may write $H(q^i, p_i)$ as

$$H\left(q^i, \frac{\partial S}{\partial q^i}\right)$$

and, substituting into (4.A.15), we have

$$H\left(q^i, \frac{\partial S}{\partial q^i}\right) + \frac{\partial S}{\partial t} = h \qquad (4.A.18)$$

where $h$ (the transformed Hamiltonian) is a constant, as we have seen above. This is a partial differential equation for $S$, since we are now going to regard $S$ as a function of the $q^i$ and $t$ only (the $Q^i$ being constants). In fact a trivial re-definition of $S$ enables us to absorb the constant $h$: replacing $S$ by $S - Et$ (with $E = h$, the constant energy) enables the equation to be written in the compact form

$$H\left(q^i, \frac{\partial S}{\partial q^i}\right) + \frac{\partial S}{\partial t} = 0 \qquad (4.A.19)$$
where now $S = S(q^i, t)$. This equation, the Hamilton-Jacobi equation, can be set up whenever the Hamiltonian can be formed in terms of the original $q^i$ and $p_i$. If it can be solved, the mechanical problem is solved. Notice that, however appealing the above "derivation" might be, the central assertion (the existence of the function $S$) has not been proved, merely made to seem reasonable. In the case where the transformed Hamiltonian function is a constant, which, as we have seen, may be chosen to be zero by a slight re-definition of $S$, we can go back to the original equation (4.A.10) which defined the original general transformation:
H=i
M
E^ i=l
h
dt
If the $Q^i$ are constants then $\dot Q^i = 0$ and, if $h$ is chosen to be zero, we have for the special case $F = S$:

$$\frac{dS}{dt} = \sum_{i=1}^{3N} p_i \dot q^i - H = L$$
That is, $S = \int L\,dt$, showing the special relationship between this particular transformation function and the original variational formulation of Lagrange's equations. The variational principle is now $\delta S = 0$. It should be noted that this derivation of the transformation equations in general, and of the Hamilton-Jacobi equation in particular, has used the dynamical law by assuming that $F$ (or $S$) is a function of time only: the $q^i$ are fixed functions of $t$. Thus, these are indeed equations and not identities. We have seen that if the canonical equations are to have the same form in two different co-ordinate systems then there exists a function of the $12N + 1$ variables $q^i$, $p_i$, $Q^j$, $P_j$ and (possibly) $t$ from which the details of the transformation may be obtained. Crucial to this development is the fact that, of the $12N$ variables, only $6N$ may be independent: the other $6N$ are to be generated from these independent ones by the very transformation provided by this function. Now the choice of which $6N$ are taken as independent variables is (formally if not practically) arbitrary, and earlier the most obvious choice was taken: writing the function which generates the transformation from $q^i$ to $Q^i$ as an explicit function of the $q^i$ and $Q^i$. But there are three other obvious possibilities. If our original choice is $F_1(q^i, Q^j, t)$ then, using the initial and final coordinates and momenta as units, we may choose the independent variables in three other ways to define the functions $F_2(q^i, P_j, t)$, $F_3(p_i, Q^j, t)$ and $F_4(p_i, P_j, t)$,
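and all are related. Of course there is no reason why one should not choose some $q^i$ and some $p_i$, etc., but such choices have little theoretical or practical interest. We saw earlier that, in addition to the sets $(q^i, p_i)$ and $(Q^j, P_j)$ satisfying the canonical equations, as they were required to do, the $p_i$ and $P_j$ were related to $F_1$ by

$$p_i = \frac{\partial F_1}{\partial q^i}, \qquad P_j = -\frac{\partial F_1}{\partial Q^j}.$$

That is,

$$\frac{\partial p_i}{\partial Q^j} = \frac{\partial^2 F_1}{\partial Q^j\,\partial q^i} = -\frac{\partial P_j}{\partial q^i}.$$

It is straightforward to show that similar relationships hold for the other choices:

$$\frac{\partial q^i}{\partial Q^j} = \frac{\partial P_j}{\partial p_i}, \qquad \frac{\partial q^i}{\partial P_j} = -\frac{\partial Q^j}{\partial p_i}, \qquad \frac{\partial p_i}{\partial Q^j} = -\frac{\partial P_j}{\partial q^i}, \qquad \frac{\partial p_i}{\partial P_j} = \frac{\partial Q^j}{\partial q^i} \qquad (4.A.20)$$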
These four relationships fix the nature of the allowed transformations; that is, since the initial coordinates and momenta are arbitrary, they fix the possible co-ordinate systems in which Hamilton's canonical equations may be expressed, together with the requirement that the Jacobian of the transformation to a known complete and non-redundant set of coordinates be non-zero. It is usual to express these relationships in a form which displays the structure of the results rather than their expression in terms of two co-ordinate systems although, of course, our co-ordinate systems are arbitrary. The Poisson bracket of $X$ and $Y$ (two quantities depending on coordinates, momenta and possibly time) is defined by:

$$[X, Y]_{q,p} = \sum_{i=1}^{3N}\left(\frac{\partial X}{\partial q^i}\frac{\partial Y}{\partial p_i} - \frac{\partial X}{\partial p_i}\frac{\partial Y}{\partial q^i}\right)$$

and, in particular,

$$[Q^i, P_j]_{q,p} = \sum_{k=1}^{3N}\left(\frac{\partial Q^i}{\partial q^k}\frac{\partial P_j}{\partial p_k} - \frac{\partial Q^i}{\partial p_k}\frac{\partial P_j}{\partial q^k}\right)$$

which, when the relationships (4.A.20) are used, becomes

$$[Q^i, P_j]_{q,p} = \sum_{k=1}^{3N}\left(\frac{\partial Q^i}{\partial q^k}\frac{\partial q^k}{\partial Q^j} + \frac{\partial Q^i}{\partial p_k}\frac{\partial p_k}{\partial Q^j}\right) = \frac{\partial Q^i}{\partial Q^j} = \delta_{ij} = [Q^i, P_j]_{Q,P}$$
and similarly,

$$[Q^i, Q^j]_{q,p} = [Q^i, Q^j]_{Q,P} = 0, \qquad [P_i, P_j]_{q,p} = [P_i, P_j]_{Q,P} = 0.$$

That is, the Poisson brackets of the coordinates and momentum components are invariant with respect to the co-ordinate system in which they are evaluated, so that these relationships, which are derived using the relationships (4.A.20), may serve as diagnostic tests of allowed coordinates and conjugate momenta in the canonical equations.
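The invariance of the fundamental brackets can be checked numerically for a specific transformation. The sketch below is an assumed example, not from the original: it uses the one-dimensional oscillator variables $Q = \arctan(m\omega q/p)$ (computed with `math.atan2` so the quadrant is correct) and $P = (p^2 + m^2\omega^2 q^2)/2m\omega$, which satisfy $q = \sqrt{2P/m\omega}\,\sin Q$ and $p = \sqrt{2m\omega P}\,\cos Q$, and evaluates $[Q, P]_{q,p}$ by central differences at a few arbitrary phase-space points. If the transformation is canonical, each value should equal $[Q, P]_{Q,P} = 1$.

```python
import math

M, OMEGA = 1.5, 0.8  # illustrative mass and frequency (assumptions)


def Q_of(q, p):
    """New coordinate (an angle) of the assumed oscillator transformation."""
    return math.atan2(M * OMEGA * q, p)


def P_of(q, p):
    """New momentum P = (p^2 + m^2 omega^2 q^2) / (2 m omega)."""
    return (p * p + (M * OMEGA * q) ** 2) / (2 * M * OMEGA)


def poisson_bracket(X, Y, q, p, h=1e-6):
    """[X, Y]_{q,p} = dX/dq dY/dp - dX/dp dY/dq for one degree of freedom,
    with central-difference partial derivatives."""
    dXdq = (X(q + h, p) - X(q - h, p)) / (2 * h)
    dXdp = (X(q, p + h) - X(q, p - h)) / (2 * h)
    dYdq = (Y(q + h, p) - Y(q - h, p)) / (2 * h)
    dYdp = (Y(q, p + h) - Y(q, p - h)) / (2 * h)
    return dXdq * dYdp - dXdp * dYdq


for q, p in [(0.7, 0.3), (-1.2, 2.0), (0.1, -0.9)]:
    print(poisson_bracket(Q_of, P_of, q, p))  # each value is close to 1
```

The sample points avoid $q = 0$ with $p < 0$, where `atan2` jumps by $2\pi$ and a finite difference across the jump would be meaningless.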
Chapter 5
Angular Momentum
While many of the more elementary paradoxes in the interpretation of quantum theory are generated by misunderstandings about probability, some more subtle ones are due to confusions about angular momentum. Rotational motion, having some seductive and misleading analogies with rectilinear motion, has proved a stumbling block in both classical and quantum mechanics. The most fertile source of "mysteries" in quantum theory (the Einstein-Podolsky-Rosen paradox and the related results due to Bell) is exacerbated by this confusion.
Contents

5.1. Coordinates and Momenta
5.2. The Angular Momentum "Vector"
5.3. The Poisson Brackets and Angular Momentum
5.4. Components of the Angular Momentum "Vector"
5.5. Conclusions for Angular Momentum

5.1. Coordinates and Momenta
In everything which has been discussed so far the terms "coordinates" and "momenta" have been used rather informally, insofar as we have relied on intuition and ordinary practice to supply a picture of what is meant by a coordinate, and the definition

$$p_i = \frac{\partial L}{\partial \dot q^i} \qquad (5.1.1)$$

to provide a conjugate momentum component. The simplest examples show that this usage seems justified; in particular the familiar results in Cartesian
coordinates are all consistent with this general theory. In normal practice "coordinate" at its most complicated usually means a member of one of the familiar 11 orthogonal coordinate systems in three-dimensional space. If one of these coordinate systems is used, then (5.1.1) provides the conjugate momentum components which are, however, sometimes intuitively less accessible. Quite independently of the Lagrangian and Hamiltonian formalisms there are existing historical (Newtonian) definitions of various types of momenta and so it is natural to inquire about the relationship between these, what one might call "naturally occurring", momenta with their associated components and the momentum components conjugate to sets of coordinates generated by Lagrange's definition (5.1.1). That is, under what conditions is a "coordinate" suitable for use in the canonical formalism and under what conditions can a "pre-existing" momentum component associated with such a coordinate be made conjugate to that coordinate and so be brought into the canonical formalism? Naturally this investigation is limited to "coordinates" in the original sense of the term: i.e. specifically excluding transformations which "mix" coordinates and momenta. Elementary considerations are enough to show that there is a whole class of momentum components which cannot be brought into the canonical formalism in spite of their utility and familiarity. The problem of the interpretation of angular momentum and its components has three aspects which we shall look at separately although all are related: • The relationship between the so-called angular momentum vector and momenta conjugate to angular coordinates. • The canonical formalism and Poisson brackets. • The components of the so-called vector and their interpretation. To set the scene for this investigation it is worthwhile making the distinction between the pairs (linear momentum, Cartesian coordinates) and (angular momentum, angle coordinates). 
The position of a point in ordinary three-dimensional space can be specified by the values of the three Cartesian coordinates $(x, y, z)$ (with respect to some origin), and the linear momentum of a particle in that space can be uniquely specified by its three Cartesian components (linear momentum in each of the three mutually perpendicular directions). Three angles $(\phi, \theta, \chi)$ (rotations about the three Cartesian axes, say) do not, of course, include a member with the dimensions of length, so they can, at best, specify a direction in space or, intuitively more accessible, they
5.2.
The Angular Momentum
"Vector"
101
can specify the position of a point on a sphere about the origin. But the analogy is incomplete; while the position of a point in space is uniquely specified by the values (x, y, z) and only (x, y, z), the position of a point on a sphere cannot be specified by just the values of three angles. In order to get a unique point on the surface of a sphere, one must specify the order in which the rotations must be performed. It is enough to experiment with rotations of an unsymmetrical rectangular prism to be convinced of this. This asymmetry between lengths and angles destroys the value of the otherwise attractive analogy between the linear and angular momentum vectors.
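The prism experiment can be imitated numerically. The following is a minimal sketch, assuming rotations are represented by the usual right-handed 3×3 rotation matrices; the helper names and the corner coordinates are illustrative choices, not taken from the text. Two quarter-turns, one about $x$ and one about $z$, are applied to the same corner of a prism in the two possible orders:

```python
import math


def rot_x(angle):
    """Matrix for a right-handed rotation by `angle` radians about the x axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(angle):
    """Matrix for a right-handed rotation by `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices; matmul(a, b) applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(m, v):
    """Image of the column vector v under the matrix m."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


# One corner of an unsymmetrical prism (illustrative coordinates).
corner = [2.0, 1.0, 0.5]
quarter_turn = math.pi / 2

x_then_z = apply(matmul(rot_z(quarter_turn), rot_x(quarter_turn)), corner)
z_then_x = apply(matmul(rot_x(quarter_turn), rot_z(quarter_turn)), corner)

print([round(c, 9) for c in x_then_z])  # prints [0.5, 2.0, 1.0]
print([round(c, 9) for c in z_then_x])  # prints [-1.0, -0.5, 2.0]
```

The two final positions differ, so the pair of angles alone does not fix where the corner ends up; the order of the rotations must be given as well.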
5.2.
The Angular Momentum "Vector"
In elementary (vectorial) mechanics one defines the angular momentum vector as
$$\boldsymbol{\ell} = \mathbf{r} \times \mathbf{p} \qquad (5.2.2)$$
and this definition implies that the angular momentum vector can be resolved into components in any coordinate system, whether or not any member of the coordinate system is an angle. That is, (5.2.2) is not concerned with the idea of momentum components conjugate to coordinates with the dimensions of angle (i.e. no dimensions); it is simply called the angular momentum for intuitively justifiable reasons. In fact, the most usual resolution of $\boldsymbol{\ell}$ is
$$\boldsymbol{\ell} = \ell_x\mathbf{i} + \ell_y\mathbf{j} + \ell_z\mathbf{k} \qquad (5.2.3)$$
in Cartesian components and, of course, $\ell_x$ is not conjugate to $x$, etc. There are a couple of points to be made about this definition:
• The usual description of the angular momentum defined in this way is that $\boldsymbol{\ell}$ is "the angular momentum about a point", the origin of $\mathbf{r}$. But rotation does not occur about a point; it occurs about an axis. This axis is, in fact, provided by the definition: it is the direction of the vector $\boldsymbol{\ell}$.
• As it stands, equation (5.2.2) is ambiguous. It depends on the origin of coordinates; different origins give different values for $\boldsymbol{\ell}$. These differences are not trivial constants: for a particle rotating about a given axis, the angular momentum defined by (5.2.2) is only constant if the origin is chosen to be at a point on the axis at the centre of the rotation (the "centre of mass"). The component of the angular momentum along the axis is the same for all origins on the axis, but for any origin not at the centre of mass, the direction of the angular momentum vector rotates.
I shall silently assume in what follows that the origin is at the centre of mass of any system. What is very clear is that $\boldsymbol{\ell}$ cannot be expressed in such a way that each of three linearly independent components (in some coordinate system) has a conjugate angular coordinate. This is trivially true simply because the position of a point in space cannot be specified by three angles: at least one length is required. One might think of this simple example as suggesting a kind of "inverse problem in canonical coordinates": given a set of momentum components, under what conditions can conjugate coordinates be found which
1. are a complete and non-redundant set for the problem in hand, and
2. regenerate the given momentum components via (5.1.1)?
Now, among the familiar orthogonal coordinate systems there are some which have two angles and one length as their members. But the problem with the resolution of the angular momentum vector is more acute than we have suggested; as we shall see shortly, it is possible to make only one of the three components of $\boldsymbol{\ell}$ into a canonically conjugate momentum. Perhaps the most direct explanation of why this is so is by way of an explicit example: the transformation between Cartesian and spherical polar coordinates. Taking a single particle of mass $m$ with velocity $\mathbf{v} = \dot{\mathbf{r}}$, so that $\mathbf{p} = m\mathbf{v} = m\dot{\mathbf{r}}$, we have $\boldsymbol{\ell} = \mathbf{r} \times \mathbf{p} = m(\mathbf{r} \times \dot{\mathbf{r}})$, and the Cartesian components of $\boldsymbol{\ell}$ are:
$$\ell_x = m(y\dot{z} - z\dot{y}), \qquad \ell_y = m(z\dot{x} - x\dot{z}), \qquad \ell_z = m(x\dot{y} - y\dot{x})$$
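Using the relationship between Cartesians and spherical polars,
$$x = r\sin\theta\cos\phi, \qquad y = r\sin\theta\sin\phi, \qquad z = r\cos\theta,$$
gives
$$\ell_x = -I(\dot\theta\sin\phi + \dot\phi\cos\theta\sin\theta\cos\phi)$$
$$\ell_y = I(\dot\theta\cos\phi - \dot\phi\cos\theta\sin\theta\sin\phi)$$
$$\ell_z = I_\phi\dot\phi = (I\sin^2\theta)\,\dot\phi$$
where $I = mr^2$ and $I_\phi = I\sin^2\theta$ is the "moment of inertia" of the particle about the $z$ axis. Now, for a Lagrangian of the simple form
$$L = \tfrac{1}{2}mv^2 - V = \tfrac{1}{2}m\left(\dot r^2 + r^2\dot\theta^2 + r^2\sin^2\theta\,\dot\phi^2\right) - V.$$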
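The spherical-polar expressions for $\ell_x$, $\ell_y$, $\ell_z$, and the identity $\partial L/\partial\dot\phi = \ell_z$ derived next, can be checked numerically. A minimal sketch, assuming an arbitrary illustrative state and a constant potential (so that $V$ drops out of the derivative): it compares $m(\mathbf{r}\times\dot{\mathbf{r}})$ built from Cartesian positions and velocities with the spherical-polar formulas, and estimates $\partial L/\partial\dot\phi$ by a central difference. All function names are illustrative.

```python
import math


def cross(a, b):
    """Vector product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def l_from_cartesian(m, r, th, ph, rdot, thdot, phdot):
    """l = m (r x rdot), with position and velocity built from spherical polars."""
    st, ct, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    pos = (r * st * cp, r * st * sp, r * ct)
    vel = (rdot * st * cp + r * ct * cp * thdot - r * st * sp * phdot,
           rdot * st * sp + r * ct * sp * thdot + r * st * cp * phdot,
           rdot * ct - r * st * thdot)
    return tuple(m * c for c in cross(pos, vel))


def l_from_spherical(m, r, th, ph, thdot, phdot):
    """The spherical-polar expressions for l_x, l_y, l_z, with I = m r^2."""
    inertia = m * r * r
    lx = -inertia * (thdot * math.sin(ph) + phdot * math.cos(th) * math.sin(th) * math.cos(ph))
    ly = inertia * (thdot * math.cos(ph) - phdot * math.cos(th) * math.sin(th) * math.sin(ph))
    lz = inertia * math.sin(th) ** 2 * phdot
    return (lx, ly, lz)


def lagrangian(m, r, th, rdot, thdot, phdot, potential=0.0):
    """L = (1/2) m (rdot^2 + r^2 thdot^2 + r^2 sin^2(th) phdot^2) - V, with constant V."""
    return 0.5 * m * (rdot ** 2 + (r * thdot) ** 2 + (r * math.sin(th) * phdot) ** 2) - potential


def p_phi(m, r, th, rdot, thdot, phdot, h=1e-6):
    """Central-difference estimate of dL/d(phidot), the momentum conjugate to phi."""
    up = lagrangian(m, r, th, rdot, thdot, phdot + h)
    down = lagrangian(m, r, th, rdot, thdot, phdot - h)
    return (up - down) / (2 * h)


# Arbitrary illustrative state: mass, r, theta, phi and their rates.
state = dict(m=2.0, r=1.5, th=0.7, ph=1.9, rdot=0.3, thdot=-0.4, phdot=1.1)
lc = l_from_cartesian(**state)
ls = l_from_spherical(state["m"], state["r"], state["th"], state["ph"],
                      state["thdot"], state["phdot"])
pp = p_phi(state["m"], state["r"], state["th"], state["rdot"],
           state["thdot"], state["phdot"])
print(lc)
print(ls)
print(pp, ls[2])
```

The two sets of components should agree to rounding error, and the estimated $p_\phi$ should equal $\ell_z$.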
Lagrange's definition of the momentum conjugate to $\phi$ is
$$p_\phi = \frac{\partial L}{\partial\dot\phi} = mr^2\sin^2\theta\,\dot\phi,$$
which is indeed $I_\phi\dot\phi$, and so the angular variable $\phi$ and the angular momentum component $\ell_z$ are indeed conjugate. But $\ell_z$ is the $z$-component of the original angular momentum, not the $\phi$-component! Of course, $\phi$ is "tied" to the choice of $z$-direction (it is an angle of $0$ to $2\pi$ around the $z$-axis), and we could have defined $\phi$ in an analogous way around the $x$-axis and so obtained $\ell_x = I_\phi\dot\phi$ (with $\phi$ and $I_\phi$ now referred to the $x$-axis). But we cannot do both, if only for the simple reason that angles of $0$ to $2\pi$ around two mutually perpendicular axes cover the sphere twice, making such a putative coordinate system redundant and, incidentally, showing that there is no possibility of a non-redundant set of coordinates containing two, let alone three, canonical angular momentum components. In fact, once one component of the angular momentum "vector" has been chosen as conjugate to an angular variable, this choice excludes other angular momentum components from being conjugate to any other angular coordinate in any coordinate system. Some further insight into the problem may be obtained by considering ignorable coordinates, the starting point of the transformation theory. In Cartesians, if
$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} = \frac{\partial L}{\partial z} = 0,$$
this usually implies that the potential is a constant (say zero), and the vanishing of the above derivatives implies (via the Lagrange equations) the constancy of the three conjugate momentum components. We interpret this constancy by saying that, in the absence of a potential function, three-dimensional space is homogeneous with respect to linear displacement. Now, in spherical polars a free-particle Lagrangian is
$$L = \tfrac{1}{2}m\left(\dot r^2 + r^2\dot\theta^2 + r^2\sin^2\theta\,\dot\phi^2\right),$$
for which $\partial L/\partial\phi = 0$ but $\partial L/\partial\theta = mr^2\sin\theta\cos\theta\,\dot\phi^2 \neq 0$, indicating a major asymmetry between the two angles in the coordinate system: $\phi$ has a privileged position. Once a given axis is chosen, the isotropy of space for rotations is destroyed by that choice. The range of the angle $\theta$ is only from $0$ to $\pi$; that is, it is not cyclic and is clearly not suitable for the description of angular momentum, and the derivatives $\partial L/\partial\theta$ are not defined at the end-points. Thus, caution is required if there is a tendency to make too much of certain apparent equivalences between translations and rotations, that is, between angular and linear coordinates. We have seen above that $p_\theta$ is not constant for a free particle, but $p_\theta$ is nonetheless conjugate to the angular coordinate $\theta$, and the canonical equations can be expressed in spherical polar coordinates. It should be clear now what the sources of the confusions about angular momentum actually are: they are verbal rather than essential. We have been discussing two quite distinct concepts and confusing them because of similar terminology and certain intuitive expectations. These confusions are compounded by a contingent connection between the two which occurs in a familiar coordinate system. The clues to the resolution of the confusion lie in the fact that it is the $z$-component of $\boldsymbol{\ell}$ which is conjugate to $\phi$ (not $z$), and in the fact that $p_\theta$ is a valid momentum component conjugate to $\theta$. The problem is simply the simultaneous existence of two quite different quantities: the angular momentum "vector" and the momentum components conjugate to angular coordinates. These two quantities are defined independently of each other and, in general, there will be no simple connection between them. As we have seen, there may be a contingent connection between them, in the sense that (in spherical polar coordinates, at least) one of the Cartesian components of the angular momentum vector is identical to a momentum component conjugate to an angular variable. This is nothing more or less than a coincidence which has, unfortunately, served to muddy the distinction between the two separate quantities. If one considers the 11 orthogonal coordinate systems in three-dimensional space, many of which have dimensionless (angular) members, the role of $\theta$ in spherical polars is more typical. The momentum component $\partial L/\partial\dot\theta$ is a proper conjugate momentum component (conjugate, that is, to an angular variable), but it is not a component of the angular momentum "vector"; in particular, it is not a Cartesian component of the angular momentum "vector". Even if the position of a point cannot be specified by three angles, it might be thought that the three Euler angles (for example), which are used to specify the orientation of a rotating body, might pass muster as "rotational canonical" coordinates.
As we noted in the last section, these three angles are not sufficient to define the orientation of a rigid body with respect to a fixed "global" coordinate frame, because one must also specify the order in which the rotations are performed in fixing the body's orientation. Such a set of coordinates cannot be brought into the canonical formalism. In the last few paragraphs, the word vector has been put in quotes when used in conjunction with angular momentum. This is deliberate, and is another attempt to draw attention to the difference between the angular momentum "vector" and momentum components conjugate to angular coordinates or, indeed, any canonical momentum component. Angular momentum, defined as it is in terms of the vector product, is in fact a bivector or antisymmetric second-rank tensor (with $n(n-1)/2$ independent components in $n$ dimensions), and it is only the coincidence that, in three-dimensional space, $3 = 3(3-1)/2$ which enables the components of this bivector to be put into one-to-one correspondence with the components of a vector.
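The component count can be made concrete. A minimal sketch (the function names are illustrative) builds the independent components $\ell_{ij} = x_i p_j - x_j p_i$, $i < j$, of the antisymmetric tensor in $n$ dimensions, compares their number with $n$, and, for $n = 3$ only, relabels them as the components of $\mathbf{r} \times \mathbf{p}$:

```python
from itertools import combinations


def bivector(x, p):
    """Independent components l_ij = x_i p_j - x_j p_i (i < j) of the antisymmetric tensor."""
    return {(i, j): x[i] * p[j] - x[j] * p[i] for i, j in combinations(range(len(x)), 2)}


def cross(a, b):
    """Three-dimensional vector product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


# Number of independent components, n(n-1)/2, against the n components of a vector.
for n in range(2, 6):
    comps = bivector([1.0] * n, [float(k) for k in range(n)])
    print(n, len(comps), n * (n - 1) // 2)

# In three dimensions only, the three components can be relabelled as a vector:
# (l_x, l_y, l_z) = (l_yz, l_zx, l_xy) = (l_12, -l_02, l_01), indices 0, 1, 2 for x, y, z.
r, p = (1.0, 2.0, 3.0), (0.5, -1.0, 2.0)
b = bivector(r, p)
as_vector = (b[(1, 2)], -b[(0, 2)], b[(0, 1)])
print(as_vector, cross(r, p))  # both (7.0, -0.5, -2.0)
```

Only for $n = 3$ do the two counts agree; in four dimensions, for example, there are six independent components and no corresponding vector.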
5.3.
The Poisson Brackets and Angular Momentum
The above preliminary discussion of some of the properties and peculiarities of angular momentum was prompted by the more general question "under what conditions is a coordinate (or momentum component) suitable to be used in the canonical equations?". Although angular variables and angular momentum provide special confusions and are the most celebrated case of "non-canonical" momenta, it is worth looking at the general case. It is also worth remarking that these confusions between angular momenta and canonical momenta conjugate to angular coordinates, when taken over into quantum theory, have some ramifications in discussions of Bohm's version of the Einstein-Podolsky-Rosen (EPR) paradox. The techniques of the Transformation theory are especially suited for this investigation, since the only limitations placed on the allowed transformations are:
1. The coordinates should be a complete and non-redundant set: the non-vanishing of the Jacobian of the transformation.
2. Hamilton's canonical equations should have the same form in all allowed coordinate systems.
That is, the condition that a set of coordinates and momenta are canonical is "built into" the Transformation theory. In Appendix 4.A it was shown that the transformation theory leads to the definition of sets of invariant quantities, the Poisson brackets, which can be used as diagnostic tests for candidates aspiring to be canonical coordinates and momenta. For canonical coordinates $q_i$ and momenta $p_i$ these tests are $[q_i, q_j] = [p_i, p_j] = 0$ and $[q_i, p_j] = \delta_{ij}$. In the case of angular momenta, we may use the invariance of the Poisson brackets to evaluate them conveniently for the Cartesian components of angular momentum in Cartesian coordinates: obviously
$$[x, x] = [x, y] = \cdots = 0.$$
But, for example,
$$[\ell_x, \ell_y] = \ell_z$$
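and
$$[\ell_x, y] = z,$$
confirming that the angular momentum components are not conjugate to a set of canonical coordinates: canonical momenta must have vanishing brackets with one another and constant brackets ($0$ or $\pm 1$) with the coordinates, whereas these brackets are the non-constant functions $\ell_z$ and $z$. With $\ell^2 = \ell_x^2 + \ell_y^2 + \ell_z^2$, it is easy to show that
$$[\ell_x, \ell^2] = [\ell_y, \ell^2] = [\ell_z, \ell^2] = 0.$$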
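These brackets can be checked without any symbolic algebra. A minimal sketch, assuming the sign convention $[f, g] = \sum_i (\partial f/\partial q_i\,\partial g/\partial p_i - \partial f/\partial p_i\,\partial g/\partial q_i)$ (the one for which $[\ell_x, \ell_y] = \ell_z$) and estimating each partial derivative by a central difference at an arbitrary phase-space point; all names are illustrative:

```python
def partial(f, s, i, h=1e-5):
    """Central-difference estimate of the derivative of f with respect to s[i]."""
    up, down = list(s), list(s)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2 * h)


def poisson(f, g, s):
    """[f, g] = sum_i (df/dq_i dg/dp_i - df/dp_i dg/dq_i), with s = (x, y, z, px, py, pz)."""
    return sum(partial(f, s, i) * partial(g, s, i + 3) - partial(f, s, i + 3) * partial(g, s, i)
               for i in range(3))


def x(s): return s[0]
def y(s): return s[1]
def z(s): return s[2]
def px(s): return s[3]
def py(s): return s[4]
def lx(s): return s[1] * s[5] - s[2] * s[4]   # y pz - z py
def ly(s): return s[2] * s[3] - s[0] * s[5]   # z px - x pz
def lz(s): return s[0] * s[4] - s[1] * s[3]   # x py - y px
def l2(s): return lx(s) ** 2 + ly(s) ** 2 + lz(s) ** 2


state = (0.3, -1.2, 0.7, 0.5, 0.9, -0.4)   # arbitrary (x, y, z, px, py, pz)
print(poisson(x, y, state))                # zero: coordinates have vanishing brackets
print(poisson(x, px, state))               # one: x and px are conjugate
print(poisson(px, py, state))              # zero: canonical momenta have vanishing brackets
print(poisson(lx, ly, state), lz(state))   # equal: [lx, ly] = lz, which is not zero
print(poisson(lx, y, state), z(state))     # equal: [lx, y] = z
print([poisson(c, l2, state) for c in (lx, ly, lz)])  # all approximately zero
```

The canonical pairs give the constant brackets $0$ and $1$, whereas the angular momentum components give the position-dependent values $\ell_z$ and $z$.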
But this does not admit $\ell^2$ into the canonical scheme, because there is no coordinate to which $\ell^2$ is conjugate, and so the other two relationships are trivially not satisfied. To emphasise the point once more, the components of the angular momentum "vector" and the scalar "square of the angular momentum" cannot be brought into the canonical formalism.
5.4.
Components of the Angular Momentum "Vector"
The elementary vectorial definition of angular momentum in equation (5.2.2), $\boldsymbol{\ell} = \mathbf{r} \times \mathbf{p}$, satisfies all the usual requirements of vector algebra, and only the fastidious would object to the mathematics of this definition. The vector so defined is the dual of a bivector and, as a vector, can be resolved into components along any axes whatsoever. But the abstraction of the vectorial properties from the real motion does not carry the most important property of angular momentum with it. As I noted at the outset, it is instructive to contrast linear and angular momentum. If a linear momentum vector is expressed, for example, as the sum of its three Cartesian components,
$$\mathbf{p} = \mathbf{p}_x + \mathbf{p}_y + \mathbf{p}_z = p_x\mathbf{i} + p_y\mathbf{j} + p_z\mathbf{k}$$
(say), then each of these three components has a physical interpretation which is identical to that of the total linear momentum: each one of $\mathbf{p}$, $\mathbf{p}_x$, $\mathbf{p}_y$ and $\mathbf{p}_z$ represents linear momentum in a particular direction; the difference is only in the magnitude and direction of the linear momentum. The key point is that a particle may, simultaneously, have linear momentum in any number of directions.¹
¹ Only three of them will be linearly independent.
However, in the case of angular momentum the situation is very different. If an angular momentum vector is similarly expressed as the sum of its three Cartesian components,
$$\boldsymbol{\ell} = \boldsymbol{\ell}_x + \boldsymbol{\ell}_y + \boldsymbol{\ell}_z = \ell_x\mathbf{i} + \ell_y\mathbf{j} + \ell_z\mathbf{k}$$
(say), then, notwithstanding the identical resolution of the two vectors, the implied consequences for the interpretation of the physical reality are quite different. Angular momentum about a given axis cannot be seen as composed of several independent angular momenta about other axes. Rotational motion is such that the rotation of a body or system of particles may, at any one time, only take place about a single axis. The axis about which the rotation occurs may of course change with time, as happens in precession but, for example, if the angular momentum is constant, the axis about which rotation occurs is always the one given by the direction² of the "vector" $\boldsymbol{\ell} = \mathbf{r} \times \mathbf{p}$. Only this vector has the physical interpretation of rotation of a body or system of particles about the direction of the vector. When we resolve the vector into, for example, Cartesian components $\ell_x, \ell_y, \ell_z$, the resulting components do not have a physical interpretation as angular momenta in the same sort of way. That is, the vectors $\boldsymbol{\ell}_x, \boldsymbol{\ell}_y, \boldsymbol{\ell}_z$ are not (all) interpretable as rotational motion about the respective Cartesian axes. Such a conclusion is made more obvious and intuitively acceptable if (again!) we contrast linear and angular motion. If we have a particle moving in the $xy$ plane with momentum components $\mathbf{p}_x, \mathbf{p}_y$ and give it an impulse in the $z$-direction so that it acquires a linear momentum $\mathbf{p}_z$ in that direction, its resultant momentum in the $xy$ plane is unchanged and its total momentum is simply
$$\mathbf{p} = \mathbf{p}_x + \mathbf{p}_y + \mathbf{p}_z.$$
However, as we all know from our childhood experiences with spinning bicycle wheels, if we have a system rotating about a given axis and try to give it motion in a different direction (by applying a quick torque impulse), we do not get a smooth transition to simple rotational motion about some intermediate axis; precessional motion occurs.
² With due regard for the "sign" of this direction, of course.
5.5.
Conclusions for Angular Momentum
Everything discussed in this chapter has been in terms of classical particle mechanics, but it has some very far-reaching ramifications for the de-mystification of the interpretation of quantum mechanics. The main qualitative conclusions are:
• Angular momentum is motion about an axis, not about a point, and only one direction in space may be chosen for a canonical angular momentum.
• Whatever the mathematical convenience of the description of angular momenta by vectors, this description provides profoundly misleading analogies with genuine vector quantities. In particular, while it is entirely possible to resolve the angular momentum "vector" $\boldsymbol{\ell} = \mathbf{r} \times \mathbf{p}$ into components along any axes whatsoever, the resulting vector components are not (in general) capable of being interpreted as angular momenta about these directions.
Colloquially, one might say that the vector $\boldsymbol{\ell}$ has components in any direction but the physical quantity angular momentum does not. We shall wish to revisit these conclusions in later discussions.
PART 4
Schrödinger's Mechanics
After a discussion of the most famous "thought experiment" (based on crystal particle diffraction), which purports to show that electrons are both particle and wave, the transition is made from the Hamilton-Jacobi equation of classical mechanics to the underlying dynamical law of Schrödinger's mechanics. The Schrödinger equation and its associated boundary conditions (which generate quantisation) are derived from this law. An attempt is made to distinguish, in Schrödinger's mechanics, between equations (which are expressions of the dynamical law) and identities (which are not); historically, the failure to make this distinction has been the cause of some confusion in quantum mechanics.
Chapter 6
Prelude: Particle Diffraction
There are a number of experiments which are presented as supporting the essentially dual nature of what are seen in classical terms as particles. It is suggested both that these experiments make it mandatory to consider individual electrons (for example) as simultaneously having wave and particle properties, and that they demonstrate the existence in nature of mysterious cooperative properties independent of any physical interactions which the particles might have. In this chapter I take a closer look at the most familiar and ostensibly convincing of these experiments: particle diffraction.
Contents
6.1. History
6.1.1. The Experiment
6.1.2. The Explanations
6.2. The Wave Theory
6.3. The Particle Theory
6.4. A Simple Case
6.5. Experimental Verification
6.6. The Answer to a Rhetorical Question
6.7. Conclusion
6.1.
History
Historically, the two-slit diffraction experiment has played a significant role in attempts to illustrate and elucidate the difficulties of interpretation associated with quantum theory. In particular, discussions of particle diffraction by periodic structures ("screens with slits") have been used to support the idea of wave/particle duality and to illustrate the concept of the non-localisability of particles in the quantum domain.
6.1.1.
The Experiment
There is enough experimental evidence to establish that a beam of particles (electrons, say) of velocity $v$ is diffracted in a way which has an identical mathematical form to the diffraction of a beam of monochromatic (constant-wavelength) waves. The formal analogy is complete if the wavelength of the waves ($\lambda$, say) is related to the momentum of the electrons by
$$\lambda = \frac{h}{p} = \frac{\text{Planck's constant}}{\text{momentum}}, \qquad \text{momentum} = \text{electron mass} \times \text{velocity}.$$
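For a sense of scale, here is a minimal sketch of $\lambda = h/p$ for electrons, assuming non-relativistic speeds and using SI/CODATA values for $h$, the electron mass and the elementary charge (the constants and example inputs are added here, not taken from the text). The second function assumes the electron is accelerated from rest through a potential difference $V$, so that $p = \sqrt{2m_e eV}$:

```python
import math

H = 6.62607015e-34          # Planck constant, J s (exact in the 2019 SI)
M_E = 9.1093837015e-31      # electron mass, kg (CODATA 2018 value)
E_CHARGE = 1.602176634e-19  # elementary charge, C (exact in the 2019 SI)


def de_broglie_from_velocity(velocity, mass=M_E):
    """lambda = h / p with p = m v (non-relativistic)."""
    return H / (mass * velocity)


def de_broglie_from_voltage(volts, mass=M_E, charge=E_CHARGE):
    """lambda for a particle accelerated from rest through `volts`, using p = sqrt(2 m q V)."""
    return H / math.sqrt(2.0 * mass * charge * volts)


print(de_broglie_from_velocity(1.0e6))  # about 7.3e-10 m
print(de_broglie_from_voltage(100.0))   # about 1.2e-10 m
```

Both results are below a nanometre, the scale of interatomic spacings in crystals.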
Experiments are done with crystals or other naturally-occurring forms of matter since the wavelengths $\lambda$ are quite short for accessible electron velocities. For the purposes of this discussion we can replace the actual experiments with an idealised case of a screen with slits.
6.1.2.
The Explanations
There are two explanations of diffraction, a classical wave theory and a quantum particle theory, both of which lead to the same relationship for the distribution of diffracted particle probability density:
• The classical Bragg wave theory involves only the idea of wave interference (each slit acting as a line source), with no involvement from the quantisation of dynamical variables.
• The Duane particle diffraction theory involves the quantisation of the momentum components of any particles confined to a finite region of space: the linear momentum components of particles confined to a region of length $L$ are restricted to integral multiples of $h/L$. In one dimension, the linear momentum components of particles in the screen between the slits, perpendicular to the direction of those slits, must be integral multiples of $h/L$, where $L$ is the inter-slit distance.
Both models describe the diffraction of particles (electrons, say); the first, which also attributes wave properties to them, is known as the (wave/particle) dual theory, while the second is the unitary particle theory. The identical predictions of both theories were used, in the general positivistic atmosphere of the period before the Second World War, to suggest that:
• attempts to distinguish between the two theories were futile, and
• theoretical distinctions which had no observable consequences were meaningless,
and this position has remained basically unchanged. But a homomorphism of the mathematical structure of different theories most decidedly does not imply physical identity of the referents of those theories. One only has to think of, for example,
1. the familiar analogue modelling of many systems by electronic components, or
2. the many applications of Poisson's equation,
to be convinced that the set of physical systems described by mathematical structures is much larger than the set of mathematical structures used in those descriptions.
• The 24 symmetry operations of a regular tetrahedron are isomorphic to the 24 permutations of 4 objects, because that is what the operations are: the permutations of the equivalent corners of a tetrahedron.
• However, the fact that the formula $F = C - P + 2$ gives the number of degrees of freedom ($F$) of a system in terms of the number of components ($C$) and the number of phases ($P$) in one interpretation of the symbols, and the relationship between the faces ($P$), edges ($C$) and vertices ($F$) of a convex polyhedron in another, is just a coincidence.
• Similarly, the appearance in a theory of a partial differential equation containing spatial derivatives of second order and a time derivative of first order does not necessarily mean that there are real (physical) waves involved.
In this chapter it is shown that it is possible to distinguish, both conceptually and experimentally, between the dual and unitary descriptions of particle diffraction. A proposal is made for such an experiment.
6.2.
The Wave Theory
The wave theory of diffraction from single or multiple slits has been known for many years and is basically a "macroscopic" theory; it applies equally well to the diffraction of sea waves passing through a narrow channel and to light waves passing through a set of closely-spaced lines engraved on a glass plate. The theory is independent of the structure of the material surrounding the "slit", depending on only one basic assumption: when a set of monochromatic wave fronts impinges on a "screen" containing a single "slit" whose width is comparable to the wavelength of the waves, the slit acts as a "line source" and emits wave fronts of (semi-)cylindrical symmetry on the other side of the slit. When the screen contains many (equally-spaced, for simplicity) slits, each slit acts in the same way, emitting its own set of cylindrical wave fronts. The characteristic pattern which these sets of waves generate is due to the interference between the amplitudes of the sets emitted from each slit; the waves may reinforce or partially cancel each other's amplitude in a characteristically regular fashion. Any plane parallel to the diffracting screen will record a pattern of maxima and minima in which the distance between two adjacent maxima depends on the slit spacing. It is elementary to show that, for monochromatic radiation of wavelength $\lambda$, inter-slit spacing $L$ and distance $D$ from the slits to a detecting screen, the spacing $\Delta$ between adjacent maxima in the pattern on the detecting screen is, for small deflection angles,
$$\Delta = \lambda \times \frac{D}{L}. \qquad (6.2.1)$$
6.3.
The Particle Theory
If we simply consider an idealised beam of particles impinging on a screen containing slits, then, if the beam is exactly perpendicular to the screen, there is ostensibly no reason for the beam to spread at all; the particles will either hit the screen and be stopped or "hit" the slit and simply pass through. The particles will only suffer any deflection if they hit, or are hit by, the internal edges of the slit.¹ So, in contrast to the wave theory of slit diffraction, the particle theory actually depends on the detailed structure of the material of the screen and, in particular, on the nature of the motion of screen material in the region between the slits. The essence of the particle theory is that any particle which is constrained to remain in a one-dimensional region of length $L$ (say) has a momentum component in this dimension which may only be an integral multiple of the basic quantum $h/L$. Unfortunately for the logical development of a particle theory of diffraction, we are in a deadly embrace here, since this quantisation of linear momentum is the simplest application of the Schrödinger equation, which we have not yet introduced; it is nothing more than the familiar "particle in a box" model. For the moment, therefore, this quantisation of linear momentum is simply an assertion which will be justified (or not) a posteriori. However, it was clear to the pioneers of the "old" (pre-Schrödinger) quantum theory that, when a particular coordinate has a periodic structure, the conjugate momentum is quantised. This was summarised in the so-called Sommerfeld quantisation condition
$$\oint p_i\,dq^i = nh,$$
where $q^i$ is understood to include the spatial coordinates and time, so that:
• If an angle is cyclic, the corresponding angular momentum is quantised ($2\pi p_\phi = mh$, for integer $m$).
• If a length is cyclic, the corresponding linear momentum is quantised ($Lp_x = \ell h$, for integer $\ell$).
• If time is cyclic, energy is quantised ($(1/\nu)E = nh$, i.e. $E = nh\nu$, for integer $n$).
Thus, when a beam of particles impinges on a screen with such a slit structure, the particles passing through the slits may, if they are close to the edges of those slits, exchange momentum ("collide"²) with particles in the screen material.
¹ A maddening phenomenon familiar to all snooker (or pool) players when a ball oscillates rapidly across the mouth of the pocket before falling in.
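To see why the quantum $h/L$ matters only for microscopic $L$, here is a minimal sketch. It assumes (as in the text) transverse kicks of integral multiples of $h/L$ and assumes, in addition, that the forward momentum $p$ of an electron at an illustrative $10^6$ m/s is unchanged by the kick; the two lengths are hypothetical:

```python
import math

H = 6.62607015e-34       # Planck constant, J s
M_E = 9.1093837015e-31   # electron mass, kg


def momentum_quantum(length):
    """Basic quantum h/L for a cyclic length L (from L p_x = l h)."""
    return H / length


def deflection_angle(length, forward_momentum, order=1):
    """Angle (rad) after a transverse kick of order * h / L, assuming unchanged forward momentum."""
    return math.atan(order * momentum_quantum(length) / forward_momentum)


p = M_E * 1.0e6                             # electron at 1e6 m/s (illustrative)
along_slits = deflection_angle(1.0e-2, p)   # L = 1 cm, a "laboratory" length (hypothetical)
across_slits = deflection_angle(1.0e-7, p)  # L = 100 nm, a microscopic gap (hypothetical)
print(along_slits, across_slits)
```

The deflection produced by one quantum across the 100 nm gap is about $10^5$ times that produced along the 1 cm slit.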
If we neglect any momentum exchanges in the initial direction of the beam, perpendicular to the plane of the screen (which simply speed up or slow down the particles without deflecting them), then we can consider momentum exchanges which may take place in the two perpendicular directions within the screen, which will cause changes in particle direction:
• The quantity $h/L$ is very small for directions along the slits, since this length $L$ is essentially of "laboratory" dimensions, and so momentum exchange here will be basically continuous, leading to a simple spreading of the beam in that dimension ("vertically", let's say).
• However, if the inter-slit gap is small (this $L$ being microscopic), the quantity $h/L$ in the perpendicular direction ("horizontally") is large, and impacts between the beam and screen-material particles having a horizontal momentum component of integral multiples of $h/L$ will lead to discrete deviations in the direction of the beam particles in the horizontal plane.
² Since the particles are charged, such "collisions" will be mutual short-range repulsions, of course.
Duane³ showed how the quantitative treatment of this phenomenon for a periodic lattice generates the familiar Bragg law for the diffraction pattern. If we use an experimental setup identical to the one used above for electromagnetic waves incident on a screen with slits, where everything is the same except that the waves are replaced by a beam of particles of momentum $p$, then it is easy to show that the distance between adjacent impacts on the detecting screen is given by
$$\Delta = \frac{h}{p} \times \frac{D}{L}, \qquad (6.3.2)$$
since a particle which keeps its forward momentum $p$ and acquires a transverse momentum $nh/L$ is deflected through an angle $\theta_n$ with $\tan\theta_n = nh/(Lp)$, and so lands at $D\tan\theta_n = nhD/(Lp)$. This is of exactly the same form as (6.2.1), with the wavelength of the electromagnetic radiation replaced by the so-called de Broglie wavelength of the particle, "$\lambda = h/p$". This theory would predict a series of sharp lines on a detecting screen, but any real screen made of real material would have a temperature-dependent spread of impacts centred around the theoretical quantity, leading to a broadening of the lines in a manner familiar to spectroscopists.
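The agreement between (6.2.1) and (6.3.2) can be checked beyond the small-angle limit. A minimal sketch, assuming the grating condition $L\sin\theta_n = n\lambda$ for the wave maxima, the kick picture above (unchanged forward momentum $p$, transverse momentum $nh/L$) for the particle impacts, and a flat detecting screen at distance $D$; all numerical values are hypothetical:

```python
import math

H = 6.62607015e-34       # Planck constant, J s
M_E = 9.1093837015e-31   # electron mass, kg


def wave_maxima(wavelength, slit_spacing, distance, orders):
    """Screen positions of wave maxima from the grating condition L sin(theta_n) = n lambda."""
    return [distance * math.tan(math.asin(n * wavelength / slit_spacing)) for n in orders]


def particle_maxima(momentum, slit_spacing, distance, orders):
    """Screen positions if a particle keeps forward momentum p and gains transverse n h / L."""
    return [distance * (n * H / slit_spacing) / momentum for n in orders]


p = M_E * 1.0e6   # electron at 1e6 m/s (illustrative)
lam = H / p       # de Broglie wavelength
L = 1.0e-7        # hypothetical inter-slit spacing, m
D = 1.0           # hypothetical slit-to-detector distance, m
orders = list(range(4))

wave = wave_maxima(lam, L, D, orders)
particle = particle_maxima(p, L, D, orders)
for n, yw, yp in zip(orders, wave, particle):
    print(f"n={n}: wave {yw:.6e} m, particle {yp:.6e} m")
print("small-angle spacing (6.2.1):", lam * D / L)
```

The two sets of positions coincide to first order in the deflection angle and differ only by a relative amount of about $\theta_n^2/2$, which is below $3\times10^{-4}$ for the orders printed.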
6.4.
A Simple Case
A special simple case of interest is the "three-slit" experiment. Suppose a planar screen has three equally-spaced slits in it, with an inter-slit spacing of a size appropriate to the momentum ("wavelength") of some incident particles, so that a diffraction pattern is generated. Further suppose that the technology is available to collimate a beam of particles sufficiently accurately that they only impinge on the centre slit of the three. Let us imagine what would be emitted by the slit in the two possible cases, wave diffraction and particle diffraction.
³ W. Duane, "The transfer in quanta of radiation momentum to matter", Proc. Nat. Acad. Sci. 9, 158 (1923).
• In the case of the classical wave theory, the presence of the two slits on which no "particle intensity" falls is irrelevant to the experiment. We simply expect that the single centre slit will act as a line source, resulting in a cylindrically-symmetrical pattern with maximum intensity in line with the incident beam, falling away smoothly and symmetrically to zero for large enough deflection.
• In the case of the quantised-inter-slit-momentum theory the situation is quite different. Particles passing through the central slit can interact with the material of the screen (the two narrow inter-slit parts) and exchange momentum only in integral multiples of the basic quantum $h/L$, independently of whether or not any particles pass through the other two slits. Thus, in this case, we would see a diffraction pattern centred about the direction of the incident beam, with characteristic maxima and minima satisfying the now-familiar law of equations (6.2.1) and (6.3.2).
The characteristic difference here is that, in the Duane theory of quantised particle diffraction, the pattern is generated by momentum exchange with the periodically-quantised material of the screen, independently for each set of particles passing through each slit. In the wave theory it is the combined effect of particles passing through the set of slits which generates the diffraction pattern. Thus, according to the wave-theory model, the emergent beam will generate a diffraction pattern only if more than one slit is "open". But, according to the particle model, the pattern will be generated by the component of the beam emerging from each slit. Of course, in both cases the actual amplitude of the pattern will be enhanced by the passage of the beam through many slits.
Making an obvious extension of this argument, if an experiment could be arranged so that the beam only fell on a (regularly-spaced) subset of the slits, then the predictions of the two models would be different:
• The wave model would predict a diffraction pattern obeying the Bragg law for the periodicity of the slits through which particles had actually passed (here the active slit spacing would be an integral multiple of the actual inter-slit gap).
• The particle model would predict a set of deviations determined by the actual underlying basic slit pattern, from each of the subset of slits through which particles actually passed. This would lead to sets of maxima and minima combined to form an overall pattern which would show the Bragg law of the basic slit pattern. Naturally, this pattern would be attenuated by an overall envelope with the periodicity of the subset of gaps through which the beam had passed.
It is important to note that, to see the full contrast between the predictions of the two models of particle diffraction, at least three slits must be used, not the more familiar two. If only two slits are used and the beam is collimated to pass through just one of them, the prediction of the particle model is very different, since impacts made by the particles passing through the slit would produce different effects according to the edge at which the impact occurred. Only at one of the edges would the momentum of the screen material be quantised; the other edge would not be the limit of a small one-dimensional region, since its extent would be macroscopic, from the slit edge to the end of the screen. So, in this case, a beam of particles passing through one of a pair of slits would experience "horizontal" impacts which would be quantised at one edge but continuous at the other, spreading into an unsymmetrical pattern.
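The difference between the two predictions can be stated as a pair of small-angle spacings. A minimal sketch, assuming the beam falls on every $k$-th slit only, that the wave model applies (6.2.1) with the active period $kL$, and that the particle model applies (6.3.2) with the underlying period $L$ (the envelope is ignored); the function name and inputs are illustrative:

```python
def predicted_spacings(wavelength, slit_spacing, distance, every_kth):
    """Small-angle fringe spacings when the beam falls on every k-th slit only.

    wave model: Bragg law for the active period k * L.
    particle model: Bragg law for the underlying period L (envelope ignored).
    """
    wave = wavelength * distance / (every_kth * slit_spacing)
    particle = wavelength * distance / slit_spacing
    return wave, particle


for k in (1, 2, 3):
    print(k, predicted_spacings(7.3e-10, 1.0e-7, 1.0, k))
```

For $k = 1$ the predictions coincide; for $k = 2$ the wave model's fringe spacing is half the particle model's, which is the kind of difference the proposed experiment would look for.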
6.5. Experimental Verification
Needless to say, verifying this conjecture about the distinction between the two models of particle diffraction would be extremely demanding experimentally; the central problem is, of course, collimating the beam so that one could be sure that only one (or only a few relatively widely-separated) slits were traversed by the incident beam. More important, to make a genuine distinction the beam should consist of material about which there is universal agreement that it is composed of particles. This excludes the use of light beams, since there is no such universal agreement; the dominant school of thought assumes electromagnetic radiation to be composed of corpuscular photons, while a significant minority regard the term "photon" as a convenient shorthand for "set of quantum numbers for the states of the electromagnetic field". Obviously, if a beam is used which actually is composed of a wave train, then there is no distinction to be made.
6.6. The Answer to a Rhetorical Question
It is common in semi-popular expositions of the quantum theory of particle slit diffraction to find statements like:
"How is it possible that opening a slit through which the particle does not pass can affect its motion?", where the question is clearly intended to be rhetorical and to startle the reader into thinking about "paradoxes" in the interpretation of quantum theory. But, using the particle diffraction theory, there actually is a simple answer to this question: "Because opening another (nearby) slit changes the possible motions of the screen material from continuous linear momentum in a direction in the screen to quantised motion in that direction, and it is transfers of this linear momentum which cause the pattern which you observe." Or, even more simply and less pedantically: the pattern is generated by the interaction between two systems; changing the properties of either one of them will change the pattern.
6.7. Conclusion
The purpose of the material in this chapter is not particularly to champion the cause of Duane's particle theory of diffraction, although it must be obvious where my prejudices lie, but to draw attention to two points which I believe have been lost sight of in attempts to understand quantum theory:
• Mathematics is not physics; even though physical theories are often articulated in mathematical form, an identity or similarity of mathematical structure may or may not indicate a similarity of structure in the material world.
• Even when we have a satisfactory mathematical scheme which appears to rationalise a physical structure or process, we must never shirk the attempt to investigate any physical mechanisms which purport to underlie that rationalisation. Mathematical description cannot replace physical mechanism in science; the ghost of Ptolemy is always present to warn us.
In the discussions of Schrodinger's mechanics which follow later in this work, nothing depends on the final outcome of an experimental determination of which of the two models of microscopic diffraction turns out to be true.
Finally, I must note, parenthetically, that attempts have been made to solve the Schrodinger equation for model systems of this kind, and such attempts are not at all trivial, for at least two reasons:
• The Hamiltonian for a system of particles and a model screen with slits is time independent; solution of the Schrodinger equation will only generate a static probability distribution for the electrons.
• Any model potential for such a system will not have the convenient smoothness of the more familiar potentials; there are sharp "corners" with discontinuous derivatives in the potential as it goes from zero to infinity.
But, much more important, the particle theory is dependent on the interactions between the particles and the inter-slit material, which are not included in these models.
Chapter 7
The Genesis of Schrodinger's Mechanics
Using material developed in Part 3 a brief outline of the relationship between quantum and classical (particle) mechanics is given, stressing the nature and meaning of the variational principle which is the basis of Schrodinger's mechanics. Much of this material is descriptive and heuristic but the final section ("Summary") contains the principles which certainly will enable all of Schrodinger's mechanics to be generated as well as the formal structures which may be abstracted from it.
Contents
7.1. Lagrangians, Hamiltonians, Variation Principles
7.1.1. Equations and Identities
7.2. Replacing the Hamilton-Jacobi Equation
7.3. Generalising the Action S
7.3.1. Changing the Notation for Action
7.3.2. Interpreting the Change
7.4. Schrodinger's Dynamical Law
7.4.1. Position Probability and Energy Distributions
7.4.2. The Schrodinger Condition
7.5. Probability Distributions?
7.6. Summary of Basic Principles
7.1. Lagrangians, Hamiltonians, Variation Principles
The principles of classical particle mechanics are most cogently expressed in terms of variation principles involving the Lagrangian or Hamiltonian functions. In view of the rather strict interpretation I am going to place on the corresponding variational principle in Schrodinger's mechanics, it is appropriate here to be absolutely clear about the meaning of these functions and what the effect of the variation principle is on their extremum values. Taking the Hamiltonian as an example, it is a function in general of $6N + 1$ variables: the $3N$ coordinates (collectively $q^i$), the $3N$ conjugate momenta (collectively $p_i$) and time ($t$); it has the dimensions of energy ($ML^2T^{-2}$). That is, for every set of values of $q^i$, $p_i$ and $t$ there is a value of $H(q^i,p_i;t)$; here the special role of the variable $t$ has been emphasised by the notation. When the variational principle is satisfied (when the equations of motion are solved) then, of course, the coordinates $q^i$ and momenta $p_i$ become known functions of the time $t$:
$$q^i \to q^i(t); \qquad p_i \to p_i(t)$$
where I have deferred to the usual convention in physics and not changed the notation for $q^i$ and $p_i$ when they become dependent on $t$, using the same symbol for the interpretation of, for example, a coordinate as for its functional dependence on $t$; a choice which, in this context, can be confusing. That is, the Hamiltonian function becomes a function ($F$, say¹) of time only:
$$H(q^i,p_i;t) \to F(t).$$
Also, when the equations of motion are solved, the value of the Hamiltonian function is, at every time $t$, numerically equal to the energy of the system:
$$H(q^i,p_i;t) = F(t) = E(t)$$
where it is important to stress that this latter expression is an equation, not an identity; it is only true for those functions $q^i$, $p_i$ which satisfy the variation principle (solve the equations of motion). The detailed functional forms of the Hamiltonian ($H$) and the energy ($E$) are different, but their numerical values are equal for those particular functions $q^i(t)$ and $p_i(t)$ which solve the equations of motion for the system; that is, "$H = E$" is only true for actual, real motions of the system.
¹ Purposely using new notation for the function of $t$ only.
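The distinction between the function $H$ and its value on a solution can be seen numerically. The sketch below assumes a one-dimensional harmonic oscillator with illustrative mass, spring constant and amplitude (none of these come from the text). Evaluated along a solution of the equations of motion, $H(q(t),p(t)) = F(t)$ equals the energy $E$ at every sampled time; evaluated at an arbitrary pair $(q, p)$, the same function returns some other value with the dimensions of energy.

```python
import math

m, k_spring = 2.0, 8.0           # assumed mass and spring constant
omega = math.sqrt(k_spring / m)  # angular frequency (2.0 here)


def hamiltonian(q, p):
    """H(q, p): defined for ANY pair of values, not just those on a motion."""
    return p * p / (2 * m) + 0.5 * k_spring * q * q


A = 0.5                          # assumed amplitude
E = 0.5 * k_spring * A * A       # energy of the motion below (1.0 here)


def trajectory(t):
    """A solution of the equations of motion: q(0) = A, p(0) = 0."""
    return A * math.cos(omega * t), -m * A * omega * math.sin(omega * t)


# On the solution, H(q(t), p(t)) = F(t) is numerically equal to E at every t ...
on_motion = [hamiltonian(*trajectory(0.1 * j)) for j in range(50)]
# ... but at an arbitrary (q, p) the same function has some other value.
arbitrary = hamiltonian(0.3, 1.7)
print(min(on_motion), max(on_motion), arbitrary)
```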
In general, since the independent variables $q^i$, $p_i$ and $t$ may take any values whatsoever, the Hamiltonian function may take an infinite range of values different from the energy of the system, albeit all having the dimensions of energy. Analogous considerations apply to the Lagrangian function $L(q^i,\dot{q}^i;t)$; it is a function of $6N+1$ independent variables in general and, on solution of the equations of motion, becomes dependent on $t$ only, since the independent variables $q^i$ and $\dot{q}^i$ become known functions of $t$.
$$H(q^i,p_i;t) \to F(t) \ \text{(say)} \qquad \text{Hamilton}$$
$$H(q^i,p_i;t) \to G(q^i;t) \ \text{(say)} \qquad \text{Hamilton-Jacobi}$$
In both cases the functions formed from the Hamiltonian only refer to actual motions of the systems; the equations of motion are solved in generating $F(t)$ or $G(q^i;t)$.
7.1.1. Equations and Identities
In setting up the Hamiltonian function, the momentum components $p_i$ which are associated with each coordinate $q^i$ are defined by
$$p_i \overset{\text{def}}{=} \frac{\partial L(q^i,\dot{q}^i;t)}{\partial \dot{q}^i}.$$
This is not an equation; it is an identity.
The familiar Hamiltonian "equations" of motion are often presented in the following symmetrical form:
$$\dot{p}_i = -\frac{\partial H}{\partial q^i}; \qquad \dot{q}^i \overset{\text{def}}{=} \frac{\partial H}{\partial p_i}$$
The first of these is an equation (it contains Newton's $F = ma$ in a generalised form), while the second is an identity (it defines "velocity" in Hamilton's theory). In practical applications of these equations, the second is simply used to eliminate one set of variables in order to be able to integrate the first. We shall meet situations very similar to this in investigating Schrodinger's mechanics; there are superficial (formal) similarities between expressions which are either equations or identities which, if we are to interpret the formalism, we must be able to distinguish.²
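The division of labour between the two Hamiltonian expressions shows up directly in a numerical integration: the identity $\dot{q}^i = \partial H/\partial p_i$ only converts momentum into velocity, while the equation $\dot{p}_i = -\partial H/\partial q^i$ carries the dynamics. The sketch below assumes a unit-mass, unit-spring oscillator and uses the leapfrog (velocity-Verlet) scheme, a standard integrator chosen here purely for illustration; the text does not prescribe any particular method.

```python
import math

m, k_spring = 1.0, 1.0  # assumed: unit mass and spring constant, H = p^2/2m + k q^2/2


def dH_dq(q):
    """Enters the genuine equation of motion p_dot = -dH/dq (generalised F = ma)."""
    return k_spring * q


def dH_dp(p):
    """The identity q_dot := dH/dp; here it just says velocity = p/m."""
    return p / m


def leapfrog(q, p, dt, steps):
    """Velocity-Verlet steps: half kick, drift, half kick."""
    for _ in range(steps):
        p -= 0.5 * dt * dH_dq(q)
        q += dt * dH_dp(p)
        p -= 0.5 * dt * dH_dq(q)
    return q, p


q_T, p_T = leapfrog(q=1.0, p=0.0, dt=1e-3, steps=1000)  # integrate to T = 1
print(q_T, math.cos(1.0))  # exact solution q(t) = cos t
```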
7.2. Replacing the Hamilton-Jacobi Equation
We have already seen, particularly in Section 4.3 on page 76 of Chapter 4, that in order to obtain a full solution to a dynamical problem from the relevant Hamilton-Jacobi (H-J) equation, in the sense of obtaining the actual particle trajectories as explicit functions of time, we must be able to obtain expressions for the "initial conditions" of the motion. These expressions are given by the derivatives of the function $S$ with respect to the "constants" introduced by solving the partial differential H-J equation. By fixing the numerical values of these quantities, we get explicit expressions for the trajectories. Now, in the dynamics of sub-atomic particles this information is simply not available; there is no hope of being able, for example, to obtain the initial positions of the 6 electrons in a carbon atom in order to calculate the trajectories of these electrons and compare the resulting electron distribution and energies with experimental results. We must fall back on some way of calculating the (probability) distribution of the electrons.
² A distinction which is not possible in the algebraic, Hilbert space formalism.
The H-J equation contains the best hope of being able to do this since:
• It contains everything about the dynamics of all the possible trajectories of the system.
• It is a function of space and time only, the momentum components being (literally) derived from the function $S(q^i;t)$.
• There is an associated "trajectory density" equation derived from the condition that the H-J equation should conserve trajectories.
Thus, speaking colloquially, we might say that "$S$ fills out configuration space with allowed trajectories", so that this function, unlike $H$ or $L$, has an objectively real referent for each and every value of its arguments $q^i$ and $t$. What we are saying here is that, by a shift of emphasis, we may regard this referent as an abstract object: a particle whose motion satisfies the H-J equation, specifically excluding any mention of (i.e. abstracting from) the initial conditions. This shift of emphasis is made because the next development in the mechanics of systems of particles came in attempts to describe the motions of particles for which there was (and is) no hope of being able to specify the initial conditions: atomic and sub-atomic particles. For most of these microscopic systems, experimental measurements are carried out on billions upon billions of concrete objects whose initial conditions (if that concept has meaning for sub-atomic particles) may well all be different. For all practical purposes, experiments can neither determine nor infer the "initial" conditions of, for example, the motion of the electrons in one carbon atom, so that any mechanics which requires the specification of such data is doomed to impotence in the sub-atomic domain.
Schrodinger took the Hamilton-Jacobi equation as his starting point for the creation of a new system of particle mechanics valid in the region of the very small and the very light. His contribution in his epoch-making first paper may be summarised in two steps: one apparently trivial and one boldly new.
We will examine them in sequence, beginning with the smaller of the two steps. In looking at Schrodinger's work we must, of course, guard against the idea that his mechanics can be "derived" from the Hamilton-Jacobi equation; it cannot. Schrodinger's mechanics is a new creation; it contains new intuition about reality which mathematical manipulation can never supply. We can only hope to make the transition a little smoother, to bring, as I have tried to do in discussing the H-J equation, classical particle mechanics and quantum mechanics closer together before making a jump. The difference between pedagogy and creation is that, when we jump, we know that there is something there to jump to because Schrodinger has been there before us. The considerations of the next few sections are therefore tentative and heuristic, and the reader may safely deplore the whole project of attempting to join up classical particle mechanics with Schrodinger's mechanics. Nevertheless, once the Schrodinger Condition has been set up and the interpretation of the quantities it contains made clear, the whole of Schrodinger's mechanics may be derived and is, of course, independent of the heuristics used here. As we noted in Section 1.6 on page 13, it is the habit of mathematicians and mathematically-inclined scientists to present their work using the "lapidary method": to show a selection of specimens so well-chosen and perfectly polished that there is no clue to their origin and original appearance. I prefer a more historical and human approach, exhibiting the rough-hewn specimens before cosmetic treatment.
7.3. Generalising the Action S
Recall the point which we reached at the end of Chapter 4 in the attempt to find the most general expression of the laws of classical particle mechanics: an equation which generated the possible trajectories of particles whose motions are determined by a particular Hamiltonian, and an equation for the density of particle trajectories in ordinary space:
$$H\!\left(\frac{\partial S}{\partial q^i}, q^i; t\right) + \frac{\partial S}{\partial t} = 0$$
$$\frac{\partial \rho}{\partial t} + \sum_i \frac{\partial}{\partial q^i}\left(\rho\,\frac{\partial H}{\partial(\partial S/\partial q^i)}\right) = 0$$
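These two partial differential equations may be derived from a general variation principle:
$$\delta\int dV\int dt\,\mathcal{L} = \delta\int dV\int dt\,\rho\left(H(\partial S/\partial q^i, q^i; t) + \frac{\partial S}{\partial t}\right) = 0. \qquad (7.3.1)$$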
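A stationary one-dimensional case makes the two equations concrete. Assuming one particle of unit mass in the potential $V(q) = q^2/2$ with energy $E = 1$ (illustrative choices, not from the text), $S = W(q) - Et$ with $\partial W/\partial q = p(q) = \sqrt{2m(E - V)}$ solves the Hamilton-Jacobi equation, and the stationary continuity equation $\partial(\rho\,p/m)/\partial q = 0$ is solved by $\rho \propto 1/p$. The sketch checks both residuals numerically and also compares $\rho$ with the fraction of time the classical motion $q(t) = \sqrt{2}\sin t$ spends in $[-1, 1]$, which is one way to read $\rho$ as a density of trajectories.

```python
import math

m, E = 1.0, 1.0                      # assumed mass and energy


def V(q):
    return 0.5 * q * q               # assumed potential: unit harmonic oscillator


def p_of_q(q):
    """dS/dq = dW/dq on the branch p > 0, from H(dS/dq, q) = E."""
    return math.sqrt(2 * m * (E - V(q)))


def hj_residual(q):
    """H(dS/dq, q) + dS/dt for S = W(q) - E t; zero if the H-J equation holds."""
    return p_of_q(q) ** 2 / (2 * m) + V(q) - E


def rho(q):
    """Stationary trajectory density, normalised over the turning points |q| < sqrt(2)."""
    return m / (math.pi * p_of_q(q))


def flux_derivative(q, h=1e-5):
    """d/dq [rho * dH/dp] by central differences; zero if continuity holds."""
    flux = lambda x: rho(x) * p_of_q(x) / m
    return (flux(q + h) - flux(q - h)) / (2 * h)


qs = [-1.2 + 0.1 * j for j in range(25)]  # points inside the classically allowed region

# rho as "time spent": sample q(t) = sqrt(2) sin t uniformly over one period and
# compare the fraction of samples in W = [-1, 1] with the integral of rho over W.
N = 100000
in_W = sum(1 for j in range(N) if abs(math.sqrt(2) * math.sin(2 * math.pi * j / N)) <= 1)
time_fraction = in_W / N
rho_fraction = (2 / math.pi) * math.asin(1 / math.sqrt(2))  # closed-form integral of rho on W
print(time_fraction, rho_fraction)
```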
What Schrodinger did³ was to find a variation principle which:
• Replaced the two real functions $S$ and $\rho$ by a single complex function $\psi$ and generated an equation for $\psi$.
• Enabled the calculation of the allowed energies of systems of microscopic particles in the absence of any initial conditions on the particles' motion.
The function $\psi$ proved to be capable of interpretation as generating a probability distribution for the positions of the particles when the abstract object to which the function $\psi$ refers is suitably defined.
7.3.1. Changing the Notation for Action
The solution of the Hamilton-Jacobi equation, $S$, has the dimensions of energy × time ("action") and its time derivatives have, as we have seen, the dimensions of energy and, in particular, $-\partial S/\partial t$ is the total energy of the system as a function of the $q^i$ (and possibly $t$). Schrodinger, in his investigations of the Hamilton-Jacobi equation, made an apparently trivial change of notation, writing:
$$S \overset{\text{def}}{=} K\ln\psi; \qquad \psi \overset{\text{def}}{=} \exp(S/K) \qquad (7.3.2)$$
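where the constant factor $K$ was added since $\ln\psi$ has no dimensions and $S$ should have the dimensions of action, so that $K$ must have the dimensions of action; obviously the numerical value of $K$ depends on the system of units being employed for actual calculations: a "natural" choice of units would be one which gave $K$ the numerical value of unity. If this substitution is made in the Hamilton-Jacobi equation we need the relationship
$$\frac{\partial S}{\partial q^i} = \frac{K}{\psi}\frac{\partial \psi}{\partial q^i}$$
i.e.
$$\frac{\partial \psi}{\partial q^i} = \frac{\psi}{K}\frac{\partial S}{\partial q^i}$$
for the momenta, and then the equation becomes (for a single particle of mass $m$ moving in one dimension with potential $V(q)$ and energy $E$)
$$\frac{K^2}{2m}\left(\frac{\partial\psi}{\partial q}\right)^2 + V(q)\,\psi^2 - E\,\psi^2 = 0. \qquad (7.3.6)$$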
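Now if we transform just the derivative part of the equation back into the original notation involving $S$ using (7.3.3) we have:
$$\left(\frac{\partial\psi}{\partial q}\right)^2 = \frac{\psi^2}{K^2}\left(\frac{\partial S}{\partial q}\right)^2$$
i.e.
$$\frac{\psi^2}{2m}\left(\frac{\partial S}{\partial q}\right)^2 + \psi^2 V(q) - \psi^2 E = 0 \qquad (7.3.7)$$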
the right-hand side of which is identical to the integrand ("Lagrangian⁴ density") of equation (7.3.1 on page 128), from which classical particle mechanics may be derived, if we identify the classical $\rho$ with $\psi^2$. If we allow that $\psi$ may be complex, going through the whole procedure again generates
$$\frac{|\psi|^2}{2m}\left(\frac{\partial S}{\partial q}\right)^2 + |\psi|^2 V(q) - |\psi|^2 E = 0 \qquad (7.3.8)$$
and insisting that (7.3.8) be identical to the Hamilton-Jacobi equation yields $K = \pm ik$ (say)⁵:
$$S = -ik\ln\psi; \qquad \psi = \exp(iS/k)$$
where $k$ and $S$ are real and the minus sign has been chosen for conventional reasons. Note that, although equation (7.3.8) is written in terms of the two functions $S$ and $\psi$, this is only to emphasise its similarity to equation (7.3.1); the essence of the heuristic arguments here is to show that the whole equation may, using (7.3.3), be written in terms of a single complex function $\psi$:
$$\frac{k^2}{2m}\left|\frac{\partial\psi}{\partial q}\right|^2 + V(q)|\psi|^2 - E|\psi|^2 = 0.$$
7.3.2. Interpreting the Change
Now we can attempt to interpret the terms in the right-hand side of (7.3.8). Let us do this by temporarily ignoring the fact that $|\psi|^2$ is constrained to be unity and look at the form of (7.3.8) when translated back into coordinate and momentum variables, i.e.
$$\frac{|\psi|^2}{2m}p^2 + |\psi|^2 V(q) - |\psi|^2 E = 0. \qquad (7.3.9)$$
⁴ There is an unfortunate conflict of nomenclature here. Historically, the Lagrangian function was the one which we have used earlier, the difference between the kinetic and potential energies of a classical mechanical system. Its appearance as the integrand in the derivation of Hamilton's equations made it the archetype for the variational method so that, in a mathematical context, "Lagrangian" and "Lagrangian density" have lost their original interpretation and simply become colloquial names for the integrand in a variational principle. This is particularly unfortunate in mechanics, since there one typically deals both with "real" Lagrangians and with variational principles whose integrands are also simply called Lagrangians.
⁵ This is where we depart slightly from Schrodinger's own derivation; Schrodinger had $K$ real.
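A quick numerical check of the single-function form: with $\psi = \exp(iS/k)$ the modulus $|\psi|$ is 1 everywhere (the constraint mentioned above), and $(k^2/2m)|\partial\psi/\partial q|^2$ reproduces the first term of (7.3.8), $(|\psi|^2/2m)(\partial S/\partial q)^2$. The value of $k$, the mass and the particular real function $S(q)$ are arbitrary choices made only for this sketch; the derivative of $\psi$ is taken numerically so that the identity is tested rather than assumed.

```python
import cmath

m, k = 1.0, 0.7        # assumed mass and value of the real constant k


def S(q):              # arbitrary smooth real action, chosen for illustration
    return 0.3 * q ** 3 - q


def dS(q):             # its exact derivative
    return 0.9 * q ** 2 - 1


def psi(q):            # psi = exp(iS/k), i.e. S = -ik ln(psi)
    return cmath.exp(1j * S(q) / k)


def dpsi(q, h=1e-6):   # central-difference derivative of the complex function
    return (psi(q + h) - psi(q - h)) / (2 * h)


rows = []
for q in [-1.0, -0.2, 0.5, 1.3]:
    single = k ** 2 / (2 * m) * abs(dpsi(q)) ** 2      # single-function form
    split = abs(psi(q)) ** 2 / (2 * m) * dS(q) ** 2    # first term of (7.3.8)
    rows.append((q, abs(psi(q)), single, split))

for row in rows:
    print(row)
```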
Taking the first term of (7.3.9), it has the form (positive function) × (kinetic energy); that is, it has the form of a distribution of kinetic energy. Similarly, the other two terms in (7.3.9) have the form of a distribution of potential energy and a distribution of total energy, respectively. Further, the function $|\psi|^2$ has some of the properties of a distribution function: it is always positive and it is bounded. Recall again that it is possible to interpret $S$ as referring to an ensemble of systems and that $\rho$ (replaced now by $|\psi|^2$) was the density of trajectories or particle density, and we have the beginnings of a new approach. The essential point, however, is to make sure that the precise meaning of "a distribution of trajectories or particles" is made clear.
With these hopeful ideas in mind we now go back to equation (7.3.2) and allow $S$ to be complex while retaining the same relationship between $S$ and $\psi$. This is now not a trivial change of notation generating mere tautologies; it is a new assumption in mechanics. Classical mechanics has no use for a two-component $S$ function; all its dynamics comes out of a real $S$. The whole object of this change is to introduce a "new degree of freedom" into the development which will separate $|\psi|^2$ from $S$ and the momentum, in order that $|\psi|^2$ can play the role of a genuine distribution function not constrained to be a constant. Thus, by writing
$$S - iR = -ik\ln\psi; \qquad \psi = \exp[(R + iS)/k] \qquad (7.3.10)$$
we have
$$|\psi|^2 = \exp(2R/k) = \rho(q) \quad \text{(say)}$$
and equation (7.3.9) becomes
$$\rho\,\frac{p^2}{2m} + \rho\,V(q) - \rho\,E = 0 \qquad (7.3.11)$$
where $\rho$ is a function of space, and a discussion of the meaning of $p$ (the momentum) in the new circumstances of complex "action" has been deliberately deferred. We can now use the same interpretation of the terms in (7.3.11) as before; each term is a distribution function multiplying an energy function. But what is $\rho(q)$ a distribution of, and how is it to be determined? We have introduced a new function $R$ and no equation to determine it; we can still formally cancel $\rho$ from (7.3.11), leaving the Hamilton-Jacobi equation for $S$.
In the classical continuity equation, which was one of our starting points, the strict interpretation of the function $\rho$ was a density or distribution of trajectories since, for example, in the case of a single particle, there can be no such thing as a particle density; the particle is either at a given point or it is not. Similar remarks apply to many-particle systems. If we visualise the motion of $N$ particles in three-dimensional space as the motion of a single particle in a $3N$-dimensional configuration space, the same conclusion holds; a distribution of trajectories is a mathematical object but a distribution of particle(s) is a physical quantity. The only coherent interpretation of the function $\rho = |\psi|^2$ is as a probability distribution for the position of the particle(s). The transition between classical mechanics and Schrodinger's mechanics involves making explicit what is only implicit in the Hamilton-Jacobi equation and its associated continuity equation; the "distribution of trajectories" function is replaced by a particle probability distribution function. This decision takes us away from the general ansatz of classical mechanics, dealing as we now shall be with probabilities. The most important consequence of dealing with probabilities is the change in the referent involved. If we continue for the moment to think of a single-particle system, the referent of any theory which interprets $\rho$ as a particle position probability density must be the abstract particle in the environment generated by the particular constraints of potential energy function, etc. Naturally, we cannot simply call a function a probability distribution function; it must satisfy the mathematical conditions imposed on any such function.
That $\rho$ satisfies such conditions is not obvious; indeed, it is impossible to say until we have a (differential) equation which will generate it via the function $\psi$. Investigation of these conditions must be deferred until we have some more information. Like us, Schrodinger was acutely aware of the need to develop a new mechanics of sub-atomic particles and, no doubt, even more acutely aware that a new mechanics cannot be got by changes in notation and re-interpretation of that notation, however suggestive those changes might be. What was needed to enable the theoretical understanding of the dynamics of sub-atomic particles was a variational condition which would "contain" or "go over into" the Hamilton-Jacobi condition for large masses and allow for our ignorance of "initial conditions" in the sub-atomic world. Schrodinger was able to present a single variational condition which generated both $R$ and $S$ because they had both been absorbed into a single complex function $\psi$, and so could obtain both the particle's probability distribution and the momenta. Historically, Schrodinger's reasoning (if creative thinking can be called reasoning) was different from the development given here and was partially based on an analogy between optics and mechanics originally due to Hamilton. This very analogy led to some confusion about the interpretation of quantum mechanics which we are trying to side-step here; as we remarked above, pedagogy is not creativity.
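How two real functions are packed into one complex function, and recovered from it, can be seen in a few lines. The sketch assumes units with $k = 1$ and arbitrary sample values of $R$ and $S$. Following (7.3.10), $\psi = \exp[(R + iS)/k]$, so $|\psi|^2 = \exp(2R/k)$ is fixed by $R$ alone, while $-ik\ln\psi = S - iR$ returns both functions; because `cmath.log` takes the principal branch, $S$ comes back only modulo $2\pi k$.

```python
import cmath
import math

k = 1.0  # assumed: units in which k = 1


def make_psi(R, S):
    """(7.3.10): psi = exp[(R + iS)/k] for real R and S."""
    return cmath.exp((R + 1j * S) / k)


def split_psi(psi):
    """Invert S - iR = -ik ln(psi).  cmath.log takes the principal branch,
    so S is recovered only modulo 2*pi*k."""
    w = -1j * k * cmath.log(psi)
    return w.real, -w.imag  # (S, R)


R, S = 0.3, 1.2                    # arbitrary sample values
psi = make_psi(R, S)
rho = abs(psi) ** 2                # should equal exp(2R/k): set by R alone
S_back, R_back = split_psi(psi)
print(rho, math.exp(2 * R / k), S_back, R_back)
```

Setting $R = 0$ gives back the earlier case $\psi = \exp(iS/k)$, with $|\psi| = 1$ whatever $S$ is.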
7.4. Schrodinger's Dynamical Law
I have tried to present a plausible way in which the Hamilton-Jacobi equation and its continuity equation could be extended and generalised. The elements are:
1. The concentration, in classical mechanics, of information about trajectories into a single scalar function $S$ and a trajectory density function $\rho$ generated by the variation principle equation (7.3.1).
2. The functions $S$ and $\rho$ being regarded as referring to the trajectories of an ensemble of systems differing in their initial conditions.
3. The possibility of relating $S$ to the distribution function $\rho$ by the (formally) slight generalisation of admitting complex $S$ via the complex function $\psi$.
4. Changing the referent of, for example, the mechanics of a single particle in a given environment from "an ensemble of trajectories of a particle in the given environment with their initial conditions" to the abstract object "a particle in the given environment".
Now it is time to look for a variational principle which will replace equation (7.3.1) in this new situation. It turns out that all that is required is to use equation (7.3.1) in the new context described in the last section, with $\rho$ and $S$ generated from the function $\psi$. This is the mathematics of the derivation; the interpretation will be quite different.
7.4.1. Position Probability and Energy Distributions
The function $\rho(q;t) = |\psi(q;t)|^2$ is (for the abstract single particle in the given environment) a probability of position distribution, i.e.
$$P(W) = \int_{W \subset \mathbb{R}^3} |\psi|^2\, dV$$
is the probability that the abstract particle be in the region of space $W$ (where $\mathbb{R}^3$ models ordinary three-dimensional space $E^3$). If this is the case, we must assume that $\psi$ may be normalised to unity by a suitable numerical multiplier:
$$P(\mathbb{R}^3) = \int_{\mathbb{R}^3} |\psi|^2\, dV = 1.$$
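The normalisation and the region probability $P(W)$ can be illustrated in one dimension, used here as a stand-in for $\mathbb{R}^3$. The sketch assumes an unnormalised Gaussian $\psi(x) = e^{-x^2/2}$ chosen purely for illustration: it finds the normalising multiplier by quadrature, computes $P(W)$ for $W = [-1, 1]$, and then reads $P(W)$ as a relative frequency by counting how many of many sampled positions ("concrete particles") fall in $W$, which is how the text expects measurements to approximate it.

```python
import math
import random


def psi(x):
    """Assumed, unnormalised one-dimensional state function (a Gaussian)."""
    return math.exp(-x * x / 2)


def integrate(f, a, b, n=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + j * h) for j in range(1, n)))


norm = integrate(lambda x: psi(x) ** 2, -10.0, 10.0)           # integral of |psi|^2, here sqrt(pi)
c = 1 / math.sqrt(norm)                                         # the normalising multiplier
P_total = integrate(lambda x: (c * psi(x)) ** 2, -10.0, 10.0)   # P(whole line) = 1
P_W = integrate(lambda x: (c * psi(x)) ** 2, -1.0, 1.0)         # P(W) for W = [-1, 1]

# Frequency reading: count "concrete particles" whose sampled positions fall in W.
# |c psi|^2 is a normal density with mean 0 and variance 1/2.
random.seed(1)
samples = [random.gauss(0.0, 1 / math.sqrt(2)) for _ in range(100000)]
frequency = sum(1 for x in samples if -1.0 <= x <= 1.0) / len(samples)
print(P_W, frequency)
```

For this particular $\psi$ the exact value of $P(W)$ is $\operatorname{erf}(1) \approx 0.843$; the sampled frequency approaches it as the number of samples grows.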
The interpretation is that random measurements of the positions of concrete particles in the given environment will give an approximation to $P(W)$ by counting the relative number of such concrete particles found in the region of $E^3$ modelled by the region $W$ of $\mathbb{R}^3$.
The "distribution" of the dynamical variables for this abstract object takes the form
(Probability distribution of the particle at a point) × (Value of the dynamical variable at that point)
which we may write as a distribution of the dynamical variable $A$, given by
$$\rho_A(q^i;t) = \rho(q^i;t)\,A(q^i,p_i;t)$$
where the dynamical variable $A(q^i,p_i;t)$ will, in general, depend on position ($q^i$), momenta ($p_i$) and time ($t$). Notice that this terminology carries the possibility of misinterpretation; in particular:
• Quantities like $\rho_A(q^i;t)$ are not probability distribution functions; they are not always positive and may well not satisfy any of Kolmogorov's axioms.
• The variable $A$ is not spread out in space with density $\rho_A$. $A$ is a property of the particle, and so there is only "any $A$" in a given region when there is a particle in that region, and the probability that the abstract particle be in a given region is given by the position probability distribution function $\rho(q^i;t)$.
So, it is safer to think of quantities like $\rho_A(q^i;t)$ as
(Probability distribution of the abstract particle at a point) × (Value of the dynamical variable $A$ that a particle would have if it were at that point).
7.4.2. The Schrodinger Condition
In Schrodinger's notation the important Hamiltonian and energy densities become
and
What is now required is an "equation of motion", a new dynamical law, which replaces the Hamilton-Jacobi equation in these new circumstances. The H-J equation which determines the particle trajectories is determined by the variation principle (7.3.1):
$$\delta\int dV\int dt\,\rho\left\{H(\partial S/\partial q^i, q^i; t) + \frac{\partial S}{\partial t}\right\} = 0$$
and can be given quite a simple verbal formulation: Of all the possible trajectories $q^i(t)$ and momenta $p_i(t)$ of the particles described by $H(q^i,p_i;t)$, the ones which occur in nature are those for which the value of the function $H$ is numerically equal to the energy of the system.
That is, for real motions of the particles in the system, only those $q^i$ and $p_i$ are allowed in $H$ which make
$$H = -\frac{\partial S}{\partial t} = E.$$
The equation which determines the density of trajectories $\rho$ is also obtained from this variation principle, in which the integrand is minimised with respect to two functions $\rho$ and $S$, generating two differential equations: one for $\rho$ and one for $S$. Schrodinger's theory uses the same variation principle with the functions $\rho$ and $S$ replaced by their forms in terms of the new function $\psi$ ($\rho = |\psi|^2$ and $S$ as in (7.3.10)):
$$\delta\int dV\int dt\,|\psi|^2\left\{H(\partial S/\partial q^i, q^i; t) + \frac{\partial S}{\partial t}\right\} = 0.$$
That is:
$$\delta\int dV\int dt\,\{\rho H - \rho E\} = 0. \qquad (7.4.13)$$
This replaces the classical requirement by a new quantum law which is just as easy to state verbally:
The Schrodinger Condition: Of all possible trajectories $q^i(t)$ and momenta $p_i(t)$ of the particles described by $H$, the ones which occur in nature are the ones which, on the average over space and time, make $H$ equal to the energy of the system.
That is, so to speak, Schrodinger's modification of the Hamilton-Jacobi principle is that the Hamilton-Jacobi equation does not have to be obeyed point by point in a configuration space of $3N$ dimensions but only in the mean over all space. The requirement that both $\rho$ and $S$ be expressed in terms of the single function $\psi$ means that the Schrodinger Condition will generate a single differential equation. Although the above verbal expression refers to the trajectories of the particles, it is important to note that, in Schrodinger's mechanics, the trajectories are not determined. What is determined by the Schrodinger Condition are the functions $\psi$ which determine (among other things) only the particle probability distributions. We shall see in the next chapter how to obtain the functions $\psi$ for particular systems; from now on the term "state function" will be used for a function which satisfies the Schrodinger Condition.⁶
⁶ "Wave function" is the more widely used term here, but only very rarely do these functions have the form of waves.
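As a hedged preview only (the proper treatment comes in the next chapter): if the Schrodinger Condition is imposed with $\psi$ held normalised, it takes the familiar Rayleigh-Ritz form in which the ratio $\int\rho_H\,dV / \int\rho\,dV$ is made stationary. The sketch below assumes, by analogy with (7.3.8), the Hamiltonian density $\rho_H = (k^2/2m)|d\psi/dq|^2 + V|\psi|^2$, a one-dimensional oscillator $V = q^2/2$ with $k = m = 1$, and a hypothetical family of trial functions $\psi_a = \exp(-aq^2/2)$. Scanning $a$, the smallest ratio occurs at $a = 1$ with value $1/2$; analytically the ratio is $a/4 + 1/(4a)$.

```python
import math


def integrate(f, a, b, n=4000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + j * h) for j in range(1, n)))


def energy_ratio(a, k=1.0, m=1.0):
    """Integral of rho_H divided by integral of rho for the trial function
    psi_a(q) = exp(-a q^2 / 2), with the assumed Hamiltonian density
    rho_H = (k^2/2m)|psi'|^2 + V|psi|^2 and V(q) = q^2 / 2."""
    psi = lambda q: math.exp(-a * q * q / 2)
    dpsi = lambda q: -a * q * psi(q)
    rho_H = lambda q: k * k / (2 * m) * dpsi(q) ** 2 + 0.5 * q * q * psi(q) ** 2
    rho = lambda q: psi(q) ** 2
    return integrate(rho_H, -12.0, 12.0) / integrate(rho, -12.0, 12.0)


trial_a = [0.5 + 0.05 * j for j in range(31)]   # hypothetical family, a in [0.5, 2.0]
best_a = min(trial_a, key=energy_ratio)
print(best_a, energy_ratio(best_a))             # analytically a/4 + 1/(4a)
```

The optimum here fixes $\psi$ (and hence $|\psi|^2$) but says nothing about individual trajectories, in keeping with the verbal statement of the condition.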
In more compact notation this new dynamical law becomes:
$$\delta\left(\int \rho_H\, dV\, dt - \int \rho_E\, dV\, dt\right) = 0 \qquad (7.4.14)$$
where the Hamiltonian and energy densities $\rho_H$ and $\rho_E$ are defined above. As it stands this variational principle is not in a usable form. What we shall do shortly is carry through the variational calculus to obtain a differential equation which will enable us to actually compute the functions $\psi$ for a variety of mechanical systems. Application of standard variational methods to (7.4.13) generates the Schrodinger equation for $\psi$ and some boundary conditions which will need elucidation. The function $|\psi|^2$, once determined, can be expected to contain reference to all possible trajectories with equal weights. For example, for an isolated single particle moving in a potential $V$ the distribution is over all possible trajectories with constant energy obeying (7.4.13); the trajectories only differ in "initial" conditions. Now, in studying a dynamical law which only determines the properties of averages (quadratures) of dynamical variables ($H$ and $E$) over trajectories, we should not be surprised if we cannot recover the individual trajectories over which the averages have been taken. Schrodinger's law generates an equation for the probability distribution of the abstract particle in space. Certainly this may be visualised as a distribution averaged over all possible initial conditions for the given abstract particle's environment, i.e. over all possible concrete trajectories. These individual concrete trajectories are, however, not required by Schrodinger's mechanics to solve a Hamilton-Jacobi equation (or, indeed, any equation); only the averages (properties of the abstract object) are fixed by (7.4.13). That is not necessarily to say that the particles in each concrete object having the environment of the abstract object but (say) differing in initial conditions do not have perfectly definite trajectories along which some (as yet unknown) laws are obeyed; it is simply that condition (7.4.13) does not tell us what these trajectories are.
For example, in the classical mechanics of an isolated system we have at every point in space
$$H - E = 0.$$
But all that (7.4.13) requires is that if
$$\int_{W_1} (\rho_H - \rho_E) \, dV = \delta$$
for some region $W_1$ of space, then there are enough "allowed trajectories" so that, for some other region $W_2$ of space,
$$\int_{W_2} (\rho_H - \rho_E) \, dV = -\delta$$
so that, on average over all space,
$$\int_{\mathbb{R}^3} (\rho_H - \rho_E) \, dV = 0.$$
As we shall see later, the connections amongst the averages can be cast into a form reminiscent of Newton's law (the Ehrenfest relationships), but again only the averages⁷ obey these relationships, not the individual trajectories, which are completely invisible to Schrodinger's mechanics. The difference between the mechanics generated by the Schrodinger Condition and classical statistical mechanics, which deals with ensembles,⁸ distributions and ensemble averages, will help to make the above points clearer. In Schrodinger's mechanics the motion of the abstract object is required to solve the quantum "Hamilton-Jacobi" equation; for all we know, individual concrete objects may well not obey any Hamilton-Jacobi-like equation. In classical statistical mechanics the motion of each member of the relevant ensemble is required to solve the Hamilton-Jacobi equation exactly, and averaging is done with these exact solutions. That is, the condition for the satisfaction of the relevant dynamical law is:

Classical Particle Mechanics: $H - E = 0$
Classical Statistical Mechanics: $\rho = \delta(H - E)$
Schrodinger's Mechanics: $\langle H - E \rangle = 0$

This shows the clear "mean value" nature of the quantum case.
7.5. Probability Distributions?
In this section a simplified example is used to attempt to clarify the nature of the alleged distribution function $|\psi|^2$ since, so far, it has simply been
7 We shall see in Section 16.1.1 that even these averages are misleading.
8 Here, for obvious reasons, I mean the older, literal, Boltzmann type of ensemble, not the Gibbs ensemble.
The Genesis of Schrodinger's Mechanics
asserted that this function is a probability distribution, without any check on the properties which a probability distribution function must have. Attention is again restricted to a single abstract particle in ordinary three-dimensional space, for simplicity and intuitive appeal, since the intention is to try to picture the various qualities appearing in Schrodinger's theory. The function $\psi$ fixed by the variational requirement (7.4.14) is, in general, complex, and its primary physical interpretation is via $|\psi|^2$, as indicated in the build-up to (7.4.14). For an isolated (constant-energy) system with a time-independent Hamiltonian function, $-\partial S/\partial t = \text{constant} = E$ (say), and so the function $\psi$ depends on time only through an exponential factor of modulus unity, $\exp(-iEt/k)$, which does not appear in $|\psi|^2$; we therefore neglect it for the moment. The interpretation of $|\psi|^2$ as a particle probability distribution is clearly that
$$\int_{W \subset \mathbb{R}^3} |\psi|^2 \, dV \qquad (7.5.15)$$
is the (relative) probability that the abstract particle be in the region of space modelled by $W$. Clearly, numbers like (7.5.15) must be judged by reference to the size of
$$\int_{\mathbb{R}^3} |\psi|^2 \, dV \qquad (7.5.16)$$
if $\mathbb{R}^3$ models⁹ all three-dimensional space. If (7.5.16) is finite (and this is not always the case) then $\psi$ can be rescaled by a constant factor so that (7.5.16) has some convenient value; unity or the number of particles in the abstract object are obvious choices. In fact, it is convenient to normalise $\psi$ to unity,¹⁰ i.e. to insist, by use of a numerical factor, that
$$\int |\psi|^2 \, dV = 1 \qquad (7.5.17)$$
so that the relative numbers
$$\frac{\int_W |\psi|^2 \, dV}{\int |\psi|^2 \, dV} \qquad (7.5.18)$$
are the same as the numbers (7.5.15).
9 Here, as elsewhere, I distinguish between three-dimensional space and the product of three copies of the real number system, $\mathbb{R}^3$, plus three linearly independent directions which model that space. This does lead to some unfortunate circumlocutions at times.
10 This is obviously convenient for the probability interpretation of $\psi$, but it also means that the theory need not specify the number of particles in $\psi$.
Now, if (7.5.17) is imposed, the measures (7.5.15) satisfy Kolmogorov's axioms for an uninterpreted probability system, for if
$$P(W) = \int_W |\psi|^2 \, dV$$
then
• $P(W) \ge P(W')$ if $W \supseteq W'$
• $P(W_1) + P(W_2) = P(W_1 \cup W_2)$ if $W_1 \cap W_2 = \emptyset$
• $P(\mathbb{R}^3) = 1$
Further, we expect that solution of the variational problem will generate a differential equation for $\psi$ which will generally ensure that $|\psi|^2$ is a distribution function, i.e. that
• $|\psi|^2$ be single valued
• $|\psi|^2$ be continuous
• and, of course, $|\psi|^2 \ge 0$.
We may therefore use any or all of the techniques and concepts of probability theory. In particular, we may refer to the numbers (7.5.18) (normalised measures) as "the probability that the abstract particle be in region $W$" (recall that, for the moment, we are concentrating on single-particle systems for simplicity). The experimental verification of these numbers is, naturally, via sets of random measurements on large numbers of concrete one-particle systems.¹¹ The relative numbers of times a concrete particle in such random experiments is found in those regions should give numerical measures of these probabilities if enough concrete systems are used and they satisfy the statistical criteria of randomness. The extension of these ideas to many-particle systems is straightforward, and we may, with caution, use the probability concept and say that
$$\int_{W_1, W_2, \ldots} |\psi(q^1, q^2, \ldots, q^{3N})|^2 \, dV_1 \, dV_2 \cdots dV_N$$
is the probability that particle 1 is in region $W_1$, particle 2 in region $W_2$, etc., always provided that the function $\psi$ has been normalised to unity in 3N-dimensional space.
11 Or, less realistically, large sets of random measurements on a single concrete particle.
That these mathematical conditions on $\psi$ are satisfied must be verified in each specific case studied.
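As an illustration of such a check, here is a minimal numerical sketch. It assumes a one-dimensional example that is not taken from the text, the harmonic-oscillator ground state $\psi_0(x) = \pi^{-1/4} e^{-x^2/2}$ in units where $k$, the mass and the angular frequency are all 1, and it tests normalisation, additivity over disjoint intervals and monotonicity for the measure $P(W) = \int_W |\psi|^2 \, dV$, with intervals standing in for the regions $W$. The names `psi0` and `prob` are illustrative only.

```python
import math


def psi0(x: float) -> float:
    """Hypothetical example: harmonic-oscillator ground state (k = m = omega = 1).
    It is already normalised, so P(W) below is an absolute probability."""
    return math.pi ** -0.25 * math.exp(-0.5 * x * x)


def prob(a: float, b: float, n: int = 2000) -> float:
    """P(W) for the interval W = [a, b]: integral of |psi|^2 by composite Simpson's rule."""
    h = (b - a) / n                      # n must be even for Simpson's rule
    total = psi0(a) ** 2 + psi0(b) ** 2
    for i in range(1, n):
        total += (4 if i % 2 else 2) * psi0(a + i * h) ** 2
    return total * h / 3.0


whole = prob(-12.0, 12.0)                # stands in for P(R); tails beyond |x| = 12 are negligible
left, right = prob(-1.0, 0.0), prob(0.0, 1.0)   # two disjoint regions W1, W2
both = prob(-1.0, 1.0)                   # their union

print(f"P(whole line)             = {whole:.10f}")
print(f"P(W1) + P(W2)             = {left + right:.10f}")
print(f"P(W1 u W2)                = {both:.10f}")
print(f"monotone (W contains W'): {prob(-2.0, 2.0) >= both}")
```

For this particular state $P([-1, 1])$ is $\operatorname{erf}(1)$, which gives an independent check on the quadrature.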
7.6. Summary of Basic Principles
Whatever the reader might make of the attempts in this chapter to make the transition from classical particle mechanics to Schrodinger's mechanics comprehensible, this final section collects the principles from which all of Schrodinger's mechanics may be generated; they are based on, but not dependent on, the previous heuristic considerations. It must be stressed yet again that it is not possible to derive Schrodinger's mechanics from classical particle mechanics, however suggestive some of the analogies might be. Schrodinger's mechanics is a physical theory with methods, concepts and interpretations quite different from those of the classical theory. But, of course, since classical mechanics, classical statistical mechanics and Schrodinger's mechanics have contiguous, even overlapping, regions of applicability, it is not surprising that some mathematical structures and some concepts are common to all three. In this summary I collect together the basic ideas which have been developed¹² and which do enable the generation of Schrodinger's mechanics and all the associated abstract structures and concepts. These principles are just that, principles; they are not axioms and are not presented as such. They presuppose most of standard classical mathematics and the logic which is assumed by mathematics. In physical theories axioms are only ever generated post hoc and are used to clarify and systematise an existing body of theory; like poetry, they may be thought of as emotion recollected in tranquillity. We start from the form of the classical variation principle (7.3.1), from which both the Hamilton-Jacobi equation and its associated conservation equation may be derived:
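$$\delta \int dV \int dt \, \rho \left[ H(\partial S/\partial q^i, q^i; t) - E \right] = \delta \int dV \int dt \, \left\{ \rho_H(\partial S/\partial q^i, q^i; t) - \rho_E \right\} = 0$$
12 With hindsight of the actual material of quantum theory, of course.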
and make the following assumptions to generate Schrodinger's mechanics (for concreteness, for single-particle systems, although the generalisation is straightforward).
1. The referent of the theory is the abstract particle in the environment described by any potential function in the Hamiltonian $H$.
2. Both of the functions $S(q^i;t)$ and $\rho(q^i;t)$ may be expressed in terms of a single (complex, in general) new function $\psi(q^i;t)$:
$$S = -ik \ln\psi, \qquad \rho \stackrel{\text{def}}{=} \psi^*\psi = |\psi|^2$$
The function $\rho(q^i;t)$ is a position probability density for the abstract particle in the usual Kolmogorov sense:
$$\int_{W \subset \mathbb{R}^3} |\psi|^2 \, dV = \int_W \rho \, dV$$
is the probability that the abstract particle be in a region of $E^3$ (ordinary three-dimensional space) modelled by $W$, if $\rho$ is normalised over all space to unity.
3. The momenta are, as usual, given by
$$p_i \stackrel{\text{def}}{=} \left(\frac{\partial S}{\partial q^i}\right) = -\frac{ik}{\psi}\,\frac{\partial\psi}{\partial q^i}$$
This quantity is the momentum that the abstract particle has if it is at the point $q^i$ at time $t$.
4. The distribution of momentum in three-dimensional space ($\rho_{p_i}$, say) is this momentum multiplied by the position probability density for the abstract particle:
$$\rho_{p_i}(q^j;t) \stackrel{\text{def}}{=} \rho(q^j;t) \times p_i$$
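Principle 3 can be illustrated in one coordinate with a short sketch. It assumes $k = 1$ and two hypothetical test functions: for the complex plane wave $\exp(iP_0 q/k)$ the quantity $-ik\,\psi'/\psi$ comes out as the real constant $P_0$, while for a real $\psi$ such as $\sin q$ it is pure imaginary, $-ik\cot q$. Derivatives are central differences, so the numbers are approximate; `momentum`, `plane_wave` and `standing_wave` are illustrative names.

```python
import cmath
import math

K_UNIT = 1.0  # the constant k in S = -ik ln(psi); set to 1 here (an assumption)
P0 = 2.5      # hypothetical constant momentum for the plane-wave example


def momentum(psi, q: float, dq: float = 1e-5) -> complex:
    """p = dS/dq = -ik (dpsi/dq) / psi at the point q (one coordinate, central difference)."""
    dpsi = (psi(q + dq) - psi(q - dq)) / (2.0 * dq)
    return -1j * K_UNIT * dpsi / psi(q)


def plane_wave(q: float) -> complex:
    """Complex psi = exp(i P0 q / k)."""
    return cmath.exp(1j * P0 * q / K_UNIT)


def standing_wave(q: float) -> float:
    """A real psi."""
    return math.sin(q)


for q in (0.3, 1.1, 2.0):
    print(f"q = {q}: plane wave p = {momentum(plane_wave, q):.6f}, "
          f"real psi p = {momentum(standing_wave, q):.6f}")
```

The real case already hints at the behaviour of the momentum distribution for real solutions: it is pure imaginary at every point.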
$$= \int \Lambda(\psi, \psi^*, \partial_i\psi, \partial_i\psi^*, \partial_t\psi, \partial_t\psi^*) \, dV \, dt$$
where $\psi$ is a function of the $q^i$ and $t$, and
$$\partial_i = \frac{\partial}{\partial q^i}, \qquad \partial_t = \frac{\partial}{\partial t}, \qquad dV = \sqrt{g}\, dQ = \sqrt{g} \prod_{i=1}^{3N} dq^i$$
where it is assumed that there are $N$ particles in the system and, as usual, $g$ is the metric determinant of the co-ordinates $q^i$; $\sqrt{g}$ may be evaluated as the Jacobian of the transformation between Cartesians and the $q^i$, which we assume to be non-zero. For convenience, we may visualise the single particle in ordinary three-dimensional space, i.e. the case in which the 3N-dimensional configuration space is ordinary, real, space. In orthogonal co-ordinates, for example, $\sqrt{g} = h_1 h_2 h_3$, where the $h_i$ are the scale functions associated with the tangent-space basis in the usual way. Thus $\mathcal{A}$ is a functional of $\psi$ and $\psi^*$, etc., and it is desired to find a solution of the variational problem by choice of optimum $\psi$; this is the standard problem in variational calculus. I use elementary methods. Let $\delta\psi$ and $\delta\psi^*$ be linearly independent variations in the linearly independent functions $\psi$
1 This decision amounts to taking Planck's constant $h$ divided by $2\pi$ ($\hbar = h/2\pi$) as the unit of action; it is a very small unit. Where it is desirable to stress the fact that Planck's constant is involved, I shall revert to standard units and write in $\hbar$ explicitly.
8.1. The Variational Derivation
and $\psi^*$, so that we may investigate the variation in the functional $\mathcal{A}$ in the neighbourhood of its value at $\psi$, $\psi^*$ by writing
$$\psi \to \psi + \delta\psi = \psi + \epsilon\eta, \qquad \psi^* \to \psi^* + \delta\psi^* = \psi^* + \epsilon\eta^*$$
where $\epsilon$ is a "small" real parameter and $\eta$, $\eta^*$ are linearly independent functions, arbitrary apart possibly from some boundary conditions. It is assumed that the integrand $\Lambda$ is a sufficiently smooth function of $\psi$ and $\psi^*$ that it may be expanded as a Taylor series about $\psi$, $\psi^*$:
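plus additional quadratic and higher terms in $\delta\psi$, $\delta\psi^*$. Now the variation "operator" $\delta$ and partial differentiation commute, so that
$$\delta(\partial_i\psi) = \partial_i(\delta\psi) = \epsilon\,\partial_i\eta, \qquad \delta(\partial_t\psi) = \partial_t(\delta\psi) = \epsilon\,\partial_t\eta.$$
Thus, to first order in $\epsilon$,
$$\delta\Lambda = \epsilon\left(\frac{\partial\Lambda}{\partial\psi}\,\eta + \sum_i \frac{\partial\Lambda}{\partial(\partial_i\psi)}\,\partial_i\eta + \frac{\partial\Lambda}{\partial(\partial_t\psi)}\,\partial_t\eta\right) + (\text{terms of identical form in } \eta^*)$$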
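and so to be discussed later. This satisfies the requirement that $\eta$ appear as a factor in the integrand. A typical member of the terms involving spatial derivatives $\partial_i\psi$ is:
$$\int \frac{\partial\Lambda}{\partial(\partial_i\psi)}\,\partial_i\eta \, dV \, dt = \int \frac{\partial\Lambda}{\partial(\partial_i\psi)}\,\partial_i\eta \, \sqrt{g} \, dQ \, dt.$$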
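Again, integrating by parts and noting the additional complication of the presence of $\sqrt{g}$, we obtain
$$\int \frac{\partial\Lambda}{\partial(\partial_i\psi)}\,\partial_i\eta \, \sqrt{g} \, dQ \, dt = \left[\sqrt{g}\,\frac{\partial\Lambda}{\partial(\partial_i\psi)}\,\eta\right] - \int \partial_i\!\left(\sqrt{g}\,\frac{\partial\Lambda}{\partial(\partial_i\psi)}\right)\eta \, dQ \, dt.$$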
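The integration by parts with the factor $\sqrt{g}$ can be checked numerically in a single radial coordinate. This sketch assumes spherical polar co-ordinates, so that the radial part of $\sqrt{g}$ is $r^2$, and uses hypothetical smooth functions: $f(r) = e^{-r}$ in place of $\partial\Lambda/\partial(\partial_i\psi)$ and $\eta(r) = r e^{-r}$ as the variation. Because $\sqrt{g}$ vanishes at $r = 0$ and everything decays at large $r$, the boundary term is negligible and both sides should agree; analytically each equals $-1/8$.

```python
import math


def simpson(g, a, b, n=20000):
    """Composite Simpson's rule for the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3.0


# Hypothetical stand-ins (not from the text):
#   f(r) plays the role of dLambda/d(d_i psi), eta(r) is the variation,
#   and sqrt_g(r) = r**2 is the radial Jacobian factor.
def f(r):
    return math.exp(-r)


def df(r):
    return -math.exp(-r)


def eta(r):
    return r * math.exp(-r)


def deta(r):
    return (1.0 - r) * math.exp(-r)


def sqrt_g(r):
    return r * r


def dsqrt_g(r):
    return 2.0 * r


R = 40.0  # upper limit; every integrand is negligible beyond this
lhs = simpson(lambda r: f(r) * deta(r) * sqrt_g(r), 0.0, R)
boundary = sqrt_g(R) * f(R) * eta(R) - sqrt_g(0.0) * f(0.0) * eta(0.0)
rhs = boundary - simpson(lambda r: (dsqrt_g(r) * f(r) + sqrt_g(r) * df(r)) * eta(r), 0.0, R)
print(lhs, rhs)
```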
Again the first term contributes to the boundaries. Using these typical terms, the expression for $\delta\mathcal{A}$ to first order in $\epsilon$ is
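(plus an expression of identical form in $\eta^*$ and $\psi^*$) for arbitrary variations $\eta$, $\eta^*$ in $\psi$, $\psi^*$. The condition $\delta\mathcal{A} = 0$ and the fact that $\eta$, $\eta^*$ are arbitrary can only be jointly satisfied if the factors multiplying $\eta$ and $\eta^*$ in the integrand are identically zero, i.e. vanish for all values of the $q^i$ and $t$. This gives two equations, one for the multiplier of $\eta$ and one for the multiplier of $\eta^*$. Since $\psi$ and $\psi^*$ appear symmetrically in $\Lambda$, both equations are of the same form:
$$\frac{\partial\Lambda}{\partial\psi} - \frac{1}{\sqrt{g}}\sum_i \partial_i\!\left(\sqrt{g}\,\frac{\partial\Lambda}{\partial(\partial_i\psi)}\right) - \partial_t\!\left(\frac{\partial\Lambda}{\partial(\partial_t\psi)}\right) = 0.$$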
These equations are the Euler-Lagrange equations which, together with the boundary terms, fix $\psi$ and $\psi^*$. The equations above fix $\psi$ and $\psi^*$ in the region of space and time over which the integration defining $\mathcal{A}$ in terms of $\Lambda$ is carried out. In the case with which we are concerned, the integrand $\Lambda$ is the difference between the Hamiltonian density and the energy density. Classically, in
general co-ordinates² this is
$$\Lambda = H - E = T + V - E = \frac{1}{2m}\sum_{k,l=1}^{3N} g^{kl} p_k p_l + V(\ldots$$
vanish on the boundary of the region in which the particle moves, or that "incoming" and "outgoing" components of $\psi$ and $\nabla\psi$ cancel; that is, if the boundary terms arising from the Schrodinger Condition are zero. Under these conditions the Divergence Theorem guarantees that the remaining term, involving the Laplacian of $|\psi|^2$, is zero. That is, for bound states or for systems in which there is "no net creation of particle density", the equality
$$\int |\nabla\psi|^2 \, dV = \ldots$$
Therefore the well-known Bohr energy levels, corresponding to the Balmer terms, are obtained if, to the constant $K$, introduced into (8.A.2 on page 163) for reasons of dimensions, we give the value
$$K = \frac{h}{2\pi} \qquad (8.A.29)$$
from which comes
$$-E_n = \frac{2\pi^2 m e^4}{h^2 n^2} \qquad (8.A.30)$$
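The numbers in (8.A.29) and (8.A.30) can be evaluated directly. The sketch below assumes Gaussian (CGS) units with CODATA 2018 constants, which are not given in the text; it prints the level energies in eV and also checks that the length $K/\sqrt{-2mE_n}$ equals $a_n/n$ when $a_n$ is taken from the classical relation $E_n = -e^2/(2a_n)$. The helper names are illustrative.

```python
import math

# Gaussian (CGS) units, CODATA 2018 values (assumed inputs, not from the text)
m_e = 9.1093837015e-28          # electron mass, g
e = 4.803204712570263e-10       # elementary charge, statcoulomb
h = 6.62607015e-27              # Planck constant, erg s
ERG_PER_EV = 1.602176634e-12

K = h / (2.0 * math.pi)         # the choice (8.A.29)


def bohr_energy(n: int) -> float:
    """E_n from (8.A.30): -E_n = 2 pi^2 m e^4 / (h^2 n^2); returned in erg."""
    return -2.0 * math.pi ** 2 * m_e * e ** 4 / (h ** 2 * n ** 2)


def orbit_length(n: int) -> float:
    """The length K / sqrt(-2 m E_n), in cm."""
    return K / math.sqrt(-2.0 * m_e * bohr_energy(n))


def semi_axis(n: int) -> float:
    """a_n from the classical relation E_n = -e^2 / (2 a_n), in cm."""
    return -e ** 2 / (2.0 * bohr_energy(n))


for n in range(1, 5):
    print(f"n={n}: E_n = {bohr_energy(n) / ERG_PER_EV:9.5f} eV, "
          f"K/sqrt(-2mE_n) = {orbit_length(n):.5e} cm, a_n/n = {semi_axis(n) / n:.5e} cm")
```

For $n = 1$ the semi-axis is the Bohr radius, about $0.529 \times 10^{-8}$ cm, and the ground-state energy is about $-13.6$ eV.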
Our $n$ is the principal quantum number. $\ell + 1$ is analogous to the azimuthal quantum number. The splitting up of this number through a closer definition of the surface harmonic can be compared with the resolution of the azimuthal quantum into an "equatorial" and a "polar" quantum. These numbers here define the system of node-lines on the sphere. Also the "radial quantum number" $n - \ell - 1$ gives exactly the number of the "node-spheres", for it is easily established that the function $f(x)$ in (8.A.27 on page 171) has exactly $n - \ell - 1$ positive real roots. The positive $E$-values correspond to the continuum of the hyperbolic orbits, to which one may ascribe, in a certain sense, the radial quantum number $\infty$. The fact corresponding to this is the proceeding to infinity, under continual oscillations, of the functions in question. It is interesting to note that the range inside which the functions of (8.A.27 on page 171) differ sensibly from zero, and outside which their oscillations die away, is of the general order of magnitude of the major axis of the ellipse in each case. The factor by which the radius vector is multiplied when it enters as the argument of the constant-free function $f$ is, naturally, the reciprocal of a length, and this length is
$$\frac{K}{\sqrt{-2mE}} = \frac{K^2 n}{me^2} = \frac{a_n}{n} \qquad (8.A.31)$$
Quantisation as a Problem of Proper Values (Part 1)
where $a_n$ = the semi-axis of the $n$th elliptic orbit. (The equations follow from (8.A.28 on the preceding page) plus the known relation $E_n = -e^2/(2a_n)$.) The quantity (8.A.31) gives the order of magnitude of the range of the roots when $n$ and $\ell$ are small; for then it may be assumed that the roots of $f(x)$ are of the order of unity. That is naturally no longer the case if the coefficients of the polynomial are large numbers. At present I will not enter into a more exact evaluation of the roots, though I believe it would confirm the above assertion pretty thoroughly.
§3. It is, of course, strongly suggested that we should try to connect the function $\psi$ with some vibration process in the atom, which would more nearly approach reality than the electronic orbits, the real existence of which is being very much questioned today. I originally intended to found the new quantum conditions in this more intuitive manner, but finally gave them the above neutral mathematical form, because it brings more clearly to light what is really essential. The essential thing seems to me to be that the postulation of "whole numbers" no longer enters into the quantum rules mysteriously, but that we have traced the matter a step further back, and found the "integralness" to have its origin in the finiteness and single-valuedness of a certain space function. I do not wish to discuss the possible representations of the vibration process further before more complicated cases have been calculated successfully from the new standpoint. It is not decided that the results will merely re-echo those of the usual quantum theory. For example, if the relativistic Kepler problem be worked out, it is found to lead in a remarkable manner to half-integral partial quanta (radial and azimuthal). Still, a few remarks on the representation of the vibration may be permitted. Above all, I wish to mention that I was led to these deliberations in the first place by the suggestive papers of M.
Louis de Broglie,⁹ and by reflecting over the space distribution of those "phase waves", of which he has shown that there is always a whole number, measured along the path, present on each period or quasi-period of the electron. The main difference is that de Broglie thinks of progressive waves, while we are led to stationary proper vibrations if we interpret our formulae as representing vibrations. I have lately shown¹⁰ that the Einstein gas theory can be based on the consideration of such stationary proper vibrations, to which the dispersion law of de Broglie's phase waves has been applied. The above reflections on
9 L. de Broglie, Ann. de Physique (10) 3, p. 22, 1925 (Theses, Paris, 1924).
10 Physik. Ztschr. 27, p. 95, 1926.
the atom could have been represented as a generalisation from those on the gas model. If we take the separate functions (8.A.27 on page 171), multiplied by a surface harmonic of order $\ell$, as the description of proper vibration processes, then the quantity $E$ must have something to do with the related frequency. Now in vibration problems we are accustomed to the "parameter" (usually called $\lambda$) being proportional to the square of the frequency. However, in the first place, such a statement in our case would lead to imaginary frequencies for the negative $E$-values, and, secondly, instinct leads us to believe that the energy must be proportional to the frequency itself and not to its square. The contradiction is explained thus. There has been no natural zero level laid down for the "parameter" $E$ of the variation equation (8.A.7 on page 165), especially as the unknown function $\psi$ appears multiplied by a function of $r$, which can be changed by a constant to meet a corresponding change in the zero level of $E$. Consequently, we have to correct our anticipations, in that not $E$ itself, but $E$ increased by a certain constant, is expected to be proportional to the square of the frequency. Let this constant now be very great compared with all the admissible negative $E$-values (which are already limited by (8.A.22 on page 168)). Then, firstly, the frequencies will become real, and secondly, since our $E$-values correspond to only relatively small frequency differences, they will actually be very approximately proportional to those frequency differences. This, again, is all that our "quantum-instinct" can require, as long as the zero level of energy is not fixed. The view that the frequency of the vibration process is given by
$$\nu = C'\sqrt{C + E} \approx C'\sqrt{C} + \frac{C'}{2\sqrt{C}}\,E \qquad (8.A.32)$$
where $C$ is a constant very great compared with all the $E$'s, has still another very appreciable advantage. It permits an understanding of the Bohr frequency condition.
According to the latter, the emission frequencies are proportional to the $E$-differences, and therefore, from (8.A.32 on the facing page), also to the differences of the proper frequencies $\nu$ of those hypothetical vibration processes. But these proper frequencies are all very great compared with the emission frequencies, and they agree very closely among themselves. The emission frequencies appear therefore as deep "difference tones" of the proper vibrations themselves. It is quite conceivable that on the transition of energy from one to another of the
normal vibrations, something (I mean the light wave) with a frequency allied to each frequency difference should make its appearance. One only needs to imagine that the light wave is causally related to the beats which necessarily arise at each point of space during the transition, and that the frequency of the light is defined by the number of times per second the intensity maximum of the beat process repeats itself. It may be objected that these conclusions are based on the relation (8.A.32 on the preceding page) in its approximate form (after expansion of the square root), from which the Bohr frequency condition itself seems to obtain the nature of an approximation. This, however, is merely apparently so, and it is wholly avoided when the relativistic theory is developed and makes a profounder insight possible. The large constant $C$ is naturally very intimately connected with the rest-energy of the electron ($mc^2$). Also the seemingly new and independent introduction of the constant $h$ (already brought in by (8.A.29 on page 172)) into the frequency condition is cleared up, or rather avoided, by the relativistic theory. But unfortunately the correct establishment of the latter meets right away with certain difficulties, which have already been alluded to. It is hardly necessary to emphasise how much more congenial it would be to imagine that at a quantum transition the energy changes over from one vibration to another, than to think of a jumping electron. The changing of the vibration form can take place continuously in space and time, and it can readily last as long as the emission process lasts empirically (experiments on canal rays by W. Wien); nevertheless, if during this transition the atom is placed for a comparatively short time in an electric field which alters the proper frequencies, then the beat frequencies are immediately changed sympathetically, and for just as long as the field operates.
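The linearisation behind this argument can be illustrated numerically. In this sketch $C$ and $C'$ are arbitrary illustrative numbers (the text gives no values), $C'$ is set to 1, and two hydrogen levels in hartree serve as sample $E$-values. The ratio of the exact beat frequency $\nu_a - \nu_b$ to its first-order value $C'(E_a - E_b)/(2\sqrt{C})$ should approach 1 as $C$ grows, roughly as $1 - (E_a + E_b)/(4C)$.

```python
import math


def proper_frequency(E: float, C: float, C_prime: float = 1.0) -> float:
    """nu = C' * sqrt(C + E), the first form in (8.A.32)."""
    return C_prime * math.sqrt(C + E)


def beat_ratio(E_a: float, E_b: float, C: float) -> float:
    """Exact beat frequency nu_a - nu_b divided by its linearised value C'(E_a - E_b)/(2 sqrt C)."""
    exact = proper_frequency(E_a, C) - proper_frequency(E_b, C)
    linear = (E_a - E_b) / (2.0 * math.sqrt(C))   # C' = 1 in both expressions
    return exact / linear


# Sample E-values (hydrogen n = 2 and n = 1, in hartree); C is purely illustrative.
E_a, E_b = -0.125, -0.5
for C in (10.0, 1e3, 1e5):
    print(f"C = {C:>8g}: beat / linearised beat = {beat_ratio(E_a, E_b, C):.8f}")
```

Only energy differences enter the beat, so the unknown zero level of $E$ drops out, which is the point of the construction.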
It is known that this experimentally established fact has hitherto presented the greatest difficulties. See the well-known attempt at a solution by Bohr, Kramers, and Slater. Let us not forget, however, in our gratification over our progress in these matters, that the idea of only one proper vibration being excited whenever the atom does not radiate — if we hold fast to this idea — is very far removed from the natural picture of a vibrating system. We know that a macroscopic system does not behave like that, but yields in general a pot-pourri of its proper vibrations. But we should not make up our minds too quickly on this point. A pot-pourri of proper vibrations would also be permissible for a single atom, since thereby no beat frequencies could arise
other than those which, according to experience, the atom is capable of emitting occasionally. The actual sending of many of these spectral lines simultaneously by the same atom does not contradict experience. It is thus conceivable that only in the normal state (and approximately in certain "meta-stable" states) the atom vibrates with one proper frequency and just for this reason does not radiate, namely, because no beats arise. The stimulation may consist of a simultaneous excitation of one or of several other proper frequencies, whereby beats originate and evoke the emission of light. Under all circumstances, I believe, the proper functions which belong to the same frequency are in general all simultaneously stimulated. Multipleness of the proper values corresponds, namely, in the language of the previous theory, to degeneration. To the reduction of the quantisation of degenerate systems probably corresponds the arbitrary partition of the energy among the functions belonging to one proper value. Addition at the proof correction on 28.2.1926. In the case of conservative systems in classical mechanics, the variation problem can be formulated in a neater way than was previously shown, and without express reference to the Hamilton-Jacobi differential equation. Thus, let $T(p, q)$ be the kinetic energy, expressed as a function of the coordinates and momenta, $V$ the potential energy, and $d\tau$ the volume element of the space, "measured rationally", i.e. it is not simply the product $dq_1 \, dq_2 \, dq_3 \cdots dq_n$, but this divided by the square root of the discriminant of the quadratic form $T(p, q)$. (Cf. Gibbs, Statistical Mechanics.) Then let $\psi$ be such as to make the "Hamilton integral"
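$$\int d\tau \left\{ K^2 \, T\!\left(q, \frac{\partial\psi}{\partial q}\right) + \psi^2 V \right\} \qquad (8.A.33)$$
stationary, while fulfilling the normalising, accessory condition
$$\int d\tau \, \psi^2 = 1. \qquad (8.A.34)$$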
The proper values of this variation problem are then the stationary values of integral (8.A.33) and yield, according to our thesis, the quantum levels of the energy.
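This formulation can be tried out numerically: discretise the "Hamilton integral" on a grid, impose the normalising condition, and the stationary values become the eigenvalues of a symmetric tridiagonal matrix. The sketch assumes hydrogen s-states in atomic units ($K = m = e = 1$), the reduced radial function $u = r\psi$ with $u = 0$ at both ends of a finite box, and a second-order finite-difference grid; the eigenvalues are found by Sturm-sequence bisection, a standard numerical technique chosen here for self-containment, not anything in the text. Box radius and grid size are arbitrary choices. The lowest values should come out close to the Bohr values $-1/(2n^2)$ hartree.

```python
def sturm_count(diag, off, x):
    """Number of eigenvalues smaller than x of the symmetric tridiagonal matrix
    with diagonal `diag` and off-diagonal `off` (Sturm-sequence count)."""
    count = 0
    q = 1.0
    for i, d in enumerate(diag):
        b2 = off[i - 1] ** 2 if i > 0 else 0.0
        q = d - x - b2 / q
        if q == 0.0:          # avoid division by zero on the next step
            q = 1e-300
        if q < 0.0:
            count += 1
    return count


def kth_eigenvalue(diag, off, k, tol=1e-9):
    """k-th smallest eigenvalue (k = 0, 1, ...) by bisection on the Sturm count."""
    spread = 2.0 * max(abs(b) for b in off)          # Gershgorin radius bound
    lo, hi = min(diag) - spread, max(diag) + spread
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sturm_count(diag, off, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def hydrogen_levels(n_levels=3, ell=0, r_max=60.0, n_points=3000):
    """Stationary values of the discretised Hamilton integral for u = r * psi,
    atomic units (K = m = e = 1), with u(0) = u(r_max) = 0."""
    h = r_max / (n_points + 1)
    r = [(i + 1) * h for i in range(n_points)]
    # (1/2) u'^2 gives 1/h^2 on the diagonal and -1/(2 h^2) off it; V is added on the diagonal.
    diag = [1.0 / h ** 2 - 1.0 / ri + ell * (ell + 1) / (2.0 * ri ** 2) for ri in r]
    off = [-0.5 / h ** 2] * (n_points - 1)
    return [kth_eigenvalue(diag, off, k) for k in range(n_levels)]


for n, E in enumerate(hydrogen_levels(), start=1):
    print(f"n = {n}: stationary value {E:.5f}, Bohr value {-1.0 / (2 * n * n):.5f}")
```

The discretised functional is quadratic in the grid values of $u$, so requiring it to be stationary under the normalisation constraint is exactly a matrix eigenvalue problem; the eigenvector belonging to each value is the discretised proper function.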
It is to be remarked that the quantity $a(q^j;t)$ will not, in fact, depend on $t$ since, by assumption, $A$ is constant.
9.2. Abstract Particles of Constant Momentum
similarity is not matched by any similarity in physical interpretation: expression (9.2.5 on the preceding page) is an identity, not an equation; the solution of this differential equation is simply a function $\psi$ for which the momentum $p_j$ of the associated abstract object is constant. This $\psi$ does not necessarily solve any dynamical law; there may or may not be abstract objects for which this $\psi$ satisfies the Schrodinger Condition, i.e. $\psi$ may or may not satisfy any Schrodinger equation. Naturally, this process can be extended to any dynamical variable $A(q^j, p_j; t)$, the analogue of equation (9.2.5 on the facing page) which constrains the value of this dynamical variable to be constant over all space being
$$A\!\left(q^j, -\frac{ik}{\psi}\frac{\partial\psi}{\partial q^j}; t\right) = a \quad \text{(say)} \qquad (9.2.6)$$
which may or may not turn out to be a convenient differential equation mathematically similar to the Schrodinger equation. But the simple requirement that a particular dynamical quantity of an abstract object be constant is not enough to ensure that the abstract object underlying this assumption corresponds to any concrete objects in the real world. Only those abstract objects whose functions $\psi(q^j;t)$ satisfy the Schrodinger Condition have associated concrete objects in the real world. It may well be the case that, for a suitable choice of potential function in the Hamiltonian function, an abstract object may be found which does, indeed, have the required constancy for the particular dynamical quantity in question² as well as satisfying the Schrodinger Condition but, equally, it may not. It is quite easy to find examples for which no abstract or concrete object exists, particularly for momenta conjugate to angular coordinates. This point is crucial to the relationship between the abstract, algebraic structure of Schrodinger's mechanics and its physical interpretation; we shall have to return to this matter in Chapter 10. To emphasise the difference between these identities and the equations of Schrodinger's mechanics, we may use the Hamiltonian function ($H$) as
2 The most familiar example will be looked at in Chapter 11.
Identities: Momenta and Dynamical Variables
a special case of the dynamical variable A in equation (9.2.6) to obtain the condition that the Hamiltonian function is constant over all space:
*(«'. ; r& is complex with imaginary terms vanishing by boundary conditions while also containing real terms in the gradients of R.
9.3. Action and Momenta in Schrodinger's Mechanics
so that $S(q) = kq$, $R(q) = \ln N$, and the system's distributions are entirely classical. The momentum of the abstract particle (and therefore of any concrete particles, should these exist) is simply $dS/dq$, and any concrete particle is equally likely to be found anywhere in the allowed range of $q$. The mean value of the momentum is, of course, $k$, since it has been assumed that the abstract particle has constant momentum $k$. Although this example has a reassuringly familiar connection with classical particle mechanics, it provides no help in interpreting the imaginary momentum. The only concrete objects which exist in the real (micro-) world are those for which the Schrodinger equation for a corresponding abstract object exists. The time-independent Schrodinger equation is a real equation, so its solutions must be real; or, if complex solutions exist, they must occur in degenerate complex-conjugate pairs from which real solutions may be formed. This means that the mean values of all momenta are zero for all abstract objects for which there is a time-independent Schrodinger equation; the only contributions to the momentum distributions are pure imaginary and so (boundary conditions permitting) integrate to zero. Of course, the distribution integrates to zero by cancellation of equal positive and negative contributions, not because the distribution is identically zero. When we consider the nature of the abstract objects which the Schrodinger equation describes, this result is less surprising than it would first appear. For a single particle, the abstract object might typically be
An abstract particle in a given conservative field of force.
with, of course, no specification of position, momenta, etc.
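The cancellation described here can be seen in a one-dimensional sketch. Assuming $k = 1$, a box of length $L = 1$ and the real box state $\sqrt{2/L}\,\sin(2\pi x/L)$, the momentum distribution $\rho p = |\psi|^2(-ik\,\psi'/\psi) = -ik\,\psi^*\psi'$ is pure imaginary and not identically zero, yet it integrates to zero. For contrast, a complex travelling wave with periodic boundary conditions (a different set-up from the box, assumed only for comparison) gives the non-zero mean momentum $2\pi k/L$. All function names are illustrative.

```python
import cmath
import math

K_UNIT = 1.0   # the constant k in S = -ik ln(psi); set to 1 (an assumption)
L = 1.0        # hypothetical interval length


def trapezoid(values, dx):
    """Composite trapezoidal rule for equally spaced samples."""
    return dx * (sum(values) - 0.5 * (values[0] + values[-1]))


def momentum_distribution(psi, dpsi, n=20000):
    """Samples of rho * p = |psi|^2 (-ik psi'/psi) = -ik conj(psi) psi' on [0, L].
    Writing it as conj(psi) psi' avoids dividing by psi at its nodes."""
    dx = L / n
    xs = [i * dx for i in range(n + 1)]
    return [-1j * K_UNIT * complex(psi(x)).conjugate() * dpsi(x) for x in xs], dx


def box(x):    # real solution: box state with two half-waves, normalised on [0, L]
    return math.sqrt(2.0 / L) * math.sin(2.0 * math.pi * x / L)


def dbox(x):
    return math.sqrt(2.0 / L) * (2.0 * math.pi / L) * math.cos(2.0 * math.pi * x / L)


def wave(x):   # complex solution under periodic boundary conditions
    return cmath.exp(2j * math.pi * x / L) / math.sqrt(L)


def dwave(x):
    return (2j * math.pi / L) * wave(x)


dist_real, dx = momentum_distribution(box, dbox)
dist_complex, _ = momentum_distribution(wave, dwave)
mean_real = trapezoid(dist_real, dx)
mean_complex = trapezoid(dist_complex, dx)
print("real psi:    mean p =", mean_real, " largest |rho p| =", max(abs(v) for v in dist_real))
print("complex psi: mean p =", mean_complex)
```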
Thus, there is no specification of, for example, the direction of momenta (linear, angular or whatever), and the abstract object's probability distributions will therefore describe the properties of any concrete objects with all possible magnitudes and directions of all momenta consistent with the given field of force and the particular solution ($E_i$, $\psi_i$, say) of the Schrodinger equation. So, the mean
values of all the momenta will indeed be zero; for any given allowed magnitude and direction of momentum, its equal and opposite will also be allowed. Now, one may wish to impose additional constraints on the solutions of a particular Schrodinger equation in order that the resulting constrained abstract object have some particularly desirable properties for a particular application. So, for example, if a solution of a Schrodinger equation is degenerate, it may be possible to form complex linear combinations of these degenerate solutions in order that the abstract object thus defined have non-zero mean momentum (or momenta) in addition to constant energy. This is a constraint on the solutions of the Schrodinger equation, and the abstract object concerned becomes less abstract, or more concrete, as more of its properties are prescribed. In fact, as we shall see in Chapter 11, it is possible, either by an examination of the properties of the variational "Lagrangian" distribution or by separating the Schrodinger equation, to find solutions to the Schrodinger equation with constant momenta in addition to constant energy, and so to specify the correspondingly restricted abstract objects. There is then a well-defined problem of finding the least abstract (most closely specified) object for a given conservative field of force, and this problem will be addressed in Chapter 11. The real solutions of the Schrodinger equation correspond, therefore, to the most abstract objects consistent with the given conservative force field, and any complex solutions must correspond to less abstract objects. If the means of the momenta are all zero in the most general case, the pure imaginary part of the distribution of momenta (corresponding to the real part of $\psi$) must carry a description of the deviations from these mean momenta: the distributions which cancel in generating the zero mean.
9.4. Momenta and Kinetic Energy
In looking at the solutions (9.3.13 on the facing page) of equation (9.2.5 on page 180), we have not checked that the abstract objects which they describe do, in fact, correspond to solutions of the Schrodinger equation. That is, are there concrete objects in the real world having the properties of these abstract objects? The functions $\psi$ of equation (9.3.13 on page 184) are complex and, what is more, it is not yet obvious that they solve any Schrodinger equation.
The simplest and most useful abstract object to look at here is a single free particle, since it is so familiar that almost any of its properties are immediately intuitively interpretable. Such an object has only kinetic energy, and there are three possible ways of approaching its properties, using Cartesian coordinates in the same spirit of intuitive accessibility. In the first instance we can look at this example purely from the point of view of solving the differential equations which arise in the three possible cases, without regard for the effect of any boundary conditions which may be relevant. The three possibilities are:
1. The momentum is a (vector) constant:
$$-i\,\frac{\nabla\psi_1(x,y,z)}{\psi_1(x,y,z)} = \mathbf{k} = k_x\hat{\mathbf{i}} + k_y\hat{\mathbf{j}} + k_z\hat{\mathbf{k}} \quad \text{(say)}, \qquad \text{i.e.} \quad \nabla\psi_1(x,y,z) = i\mathbf{k}\,\psi_1(x,y,z).$$
Clearly, this equation may be solved by writing Mx,y,z)
=
X1(x)Y1(y)Z1{z)
leading to separate equations for the three factors of the form: dX\
..
—— =
IKXX\
ax font with solutions X\(x) = Ax exp(ikxx) (9.4.14) 2 where the probability distribution function is IV'il and fex, etc., may take any (real) value. 2. The kinetic energy is a (positive, real) constant:
2\Mw)\2
r |VV>2(z,y,z)|
2 d
=T(say)
|VV^(x,l/,«)| 2 = 2T|^ 2 (x, 2 /, Z )| 2 again, writing i/,2 in the product form Mx,y,z)
=
X2(x)Y2(y)Z2(z)
leads to three equations of the form dX2 dx
2
2TX\X2\2
188
Identities:
Momenta
with solutions -X^rr) =
and Dynamical
Variables
Bxexp(i\J2Txx)
where T = yJT* + T2 + T 2
(9.4.15)
this time the probability distribution function is \ip212- The number T, etc., is, of course, non-negative in this case. 3. The system solves the time-independent Schrodinger equation and the separation proceeds in the same way: = Eip3(x,y,z)
(say)
ld2X3 = EXX3 2 dx2 / X3(x) = Cx sin( V 2^a;) + Dx
where E = Ex + Ey + Ez
cosfJ^E^x) (9.4.16)
with the same proviso on ip3 and Ex is positive. Notice that, in this case, the Schrodinger equation is a real equation so the real solutions are given. In each case, the equation separates in Cartesian coordinates and the xcomponent of each has been given explicitly. There are obvious mathematical relationships amongst these quantities: • Xi provides a special solution to the problem in 2. on the preceding page; for kx = y/2Tx. This is not surprising; an abstract object with a given value of linear momentum in a given direction must have a related fixed value of kinetic energy in that direction. But the converse is equally obviously false, a given abstract free particle with constant kinetic energy does not necessarily have constant linear momentum components. In one dimension any mixture of momentum components of |fcx| = V2TX will have the same kinetic energy and in three dimensions. In fact, the situation is even more free, all that is required is that hi + k2y + k2 = IT any mixture of linear momenta in any direction has constant kinetic energy, provided that this condition is satisfied. The solutions (9.4.14) are non-degenerate; to every triple (kx,ky, kz) there is a unique solution.
• The solutions $X_2(x)$ may be chosen to have the same form as the solutions of equation (9.4.14), as we have seen above. In contrast to the solutions $\psi_1$, the solutions $\psi_2$ of (9.4.15) are infinitely degenerate⁶; for every value of $T$ there are infinitely many ways of combining the triple $(T_x,T_y,T_z)$ to generate the total sum.
• The (real) solutions of equation (9.4.16) are obviously linear combinations of the complex solutions of equation (9.4.14), and $X_1$ satisfies (9.4.16), again as a special case: the specific choice $C_x/D_x = i$, with $k_x = \sqrt{2E_x}$.

Again, the solutions $\psi_3$ of (9.4.16) are infinitely degenerate; for every value of $E$ there are infinitely many ways of combining the triple $(E_x,E_y,E_z)$ to generate the total sum. Thus it looks as if the abstract objects "free particle with constant momentum" and "free particle with constant kinetic energy" do have solutions in common with a Schrodinger equation, and so there are concrete objects in the real world which have these properties. But for every abstract object "free particle with constant momentum" there are infinitely many others with the same energy.
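The relationships listed above can be checked numerically for a single Cartesian factor. The following is a minimal sketch, not the author's method, assuming atomic units ($\hbar = m = 1$, consistent with the forms in (9.4.14) to (9.4.16)), a hypothetical wavenumber $k_x = 1.5$ chosen only for illustration, and central-difference derivatives; all helper names are illustrative. It checks that $X_1$ has constant local momentum, kinetic energy and energy when $k_x = \sqrt{2T_x} = \sqrt{2E_x}$, that the choice $C_x/D_x = i$ turns $X_3$ into $X_1$, and that the real cosine solution of (9.4.16) has constant energy but a local momentum $-iX_3'/X_3$ that changes with $x$.

```python
import cmath
import math

# Atomic units (hbar = m = 1) are assumed, matching -i d/dx for momentum
# and -(1/2) d^2/dx^2 for kinetic energy. Derivatives are approximated
# numerically by central differences.

def d1(f, x, h=1e-5):
    """Central-difference first derivative (f may be complex-valued)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(f, x, h=1e-4):
    """Central-difference second derivative."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def local_momentum(f, x):
    """-i X'(x) / X(x): a real constant only for a case-1 solution."""
    return -1j * d1(f, x) / f(x)

def local_kinetic(f, x):
    """|X'(x)|^2 / (2 |X(x)|^2): the case-2 kinetic-energy ratio."""
    return abs(d1(f, x)) ** 2 / (2.0 * abs(f(x)) ** 2)

def local_energy(f, x):
    """-(1/2) X''(x) / X(x): constant for a case-3 (Schrodinger) solution."""
    return -0.5 * d2(f, x) / f(x)

kx = 1.5                # hypothetical wavenumber chosen for illustration
Tx = Ex = 0.5 * kx**2   # so that kx = sqrt(2 Tx) = sqrt(2 Ex)

def x1(x):
    """Case 1, eq. (9.4.14) with A_x = 1."""
    return cmath.exp(1j * kx * x)

def x3(x, C, D):
    """Case 3, eq. (9.4.16), for given coefficients C_x and D_x."""
    k = math.sqrt(2.0 * Ex)
    return C * math.sin(k * x) + D * math.cos(k * x)

def complex_x3(x):
    return x3(x, 1j, 1.0)    # C_x / D_x = i reproduces x1

def real_x3(x):
    return x3(x, 0.0, 1.0)   # the real cosine solution

if __name__ == "__main__":
    for x in (0.1, 0.7, 1.3):
        print(f"x = {x}: "
              f"p(x1) = {local_momentum(x1, x).real:.6f}, "      # approximately kx
              f"T(x1) = {local_kinetic(x1, x):.6f}, "            # approximately Tx
              f"E(x1) = {local_energy(x1, x).real:.6f}, "        # approximately Ex
              f"|x3 - x1| = {abs(complex_x3(x) - x1(x)):.1e}, "  # approximately 0
              f"E(cos) = {local_energy(real_x3, x):.6f}, "       # approximately Ex
              f"p(cos) = {local_momentum(real_x3, x):.4f}")      # varies with x
```

The finite-difference approach was chosen so that the same three "local" ratios can be applied unchanged to any trial function; an analytic check would serve equally well.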
9.5. Boundary Conditions
The boundary conditions which might be imposed on the solutions of the Schrodinger equation, or on the solutions of the simpler constant-momentum equations, have been shabbily treated so far. In fact they are important for two related reasons:
• It is the boundary conditions which ultimately decide whether or not an acceptable solution of the Schrodinger equation exists, and these conditions are the source of the typical quantisation of dynamical quantities.
• The boundary conditions fix the nature of the functions on which the operator $-\tfrac{1}{2}\nabla^2 + V$ may work, and it is this domain of the operator which fixes its important formal properties, as we shall see. In the case of the Schrodinger equation the boundary conditions are generated by the variational solution of the Schrodinger Condition. It is these boundary conditions which fix the range of Hermiticity of the Hamiltonian operator.
• In the case of identities like (9.2.6), there is no "natural" source of the boundary conditions, precisely because these identities are not per se part of Schrodinger's mechanics. We are left to impose such boundary conditions as we think suitable, because it is we who have generated the original identity, not Schrodinger's mechanics. The particular case of an abstract free particle proves particularly atypical and problematic, basically because the motion is better described by classical mechanics.

⁶ In the absence of boundary conditions.
9.5.1. Constant Momenta and Kinetic Energy
The functions $\psi_1$ for the abstract particle of constant momentum of the previous section do not have finite normalisation integrals; for example,
$$\int_a^b |X_1|^2\,dx$$
diverges if the full range of the Cartesian coordinate is used: $(a,b) = (-\infty,\infty)$. If the region over which the abstract particle is allowed to move is finite, the functions $X_1$ (and the functions $\psi_1$ of which they are factors) may be normalised, and the interpretation of the $\psi_1$ as probability distribution functions is thereby secured. (One could, of course, argue that the relative measures of the functions $\psi_1$ over finite regions could still be interpreted as relative probabilities in those regions, even in the absence of this last constraint.) There is a point of principle to be made here: since the complex functions $X_1(x) = A_x\exp(ik_xx)$ have constant modulus, one has to choose between allowing the function simply to stop at $a$ and $b$ with the same amplitude as over the rest of the range, or forcing a linear combination of the complex conjugates to obtain a real trigonometric function which can be made to vanish at the end-points. But in doing this we have formed linear combinations of functions belonging to different eigenvalues of the original eigenvalue equation, and so destroyed the property of solving this constant-momentum equation. So, in this case, these new real solutions do not, in fact, solve the original constant-momentum equation, but they do solve the relevant Schrodinger equation, (9.4.16). They do not solve the original constant-momentum equation because, in order to generate real solutions, one must take linear combinations of two complex solutions with different values of $k_x$; namely, the two with the same $|k_x|$. There is no guidance to be had from the original definition of an abstract particle of constant momentum, precisely because it is a definition; it is at our discretion to enhance our definition. Whether we make the function simply stop at $a$ and $b$ or force it to be zero there, it is defined over a closed interval $[a,b]$; this implies that its derivatives are not defined at the end-points, and it is these very derivatives which generate the said momentum. The definition of the momentum can be extended to cover this case (and others where the coordinate ranges are closed intervals of $\mathbb{R}$):
Similar remarks apply to the case of constant kinetic energy, with the important exception that, this time, we may form the real solutions by combining the complex solutions of the same $|k_x|$ (say), because these solutions do have the same (constant) kinetic energy even though they have different momenta.

9.5.2. Solution of the Schrodinger Equation
In the case of $\psi_3$ and its factors $X_3$ (solutions of the Schrodinger equation), which are entirely real-valued functions, the situation is much more clear-cut, since the Schrodinger Condition generates a fixed set of boundary conditions which must be met. The original boundary conditions which must be satisfied by the solutions of a Schrodinger equation were obtained variationally from the Schrodinger Condition in Chapter 8 and are given in general by (8.3.3), which for the particular case here become
$$\Big[-i\psi_3 + \nabla\psi_3\Big]_a^b = 0 \qquad (9.5.17)$$
where "$a$" and "$b$" denote the lower and upper limits of the three-dimensional region where $\psi_3$ is defined: a lower and an upper limit for each of $x$, $y$ and $z$ in Cartesians. Now $\psi_3$ is a real-valued function, so when we write equation (9.5.17) out in full,
$$\big(-i\psi_3(b) + \nabla\psi_3(b)\big) - \big(-i\psi_3(a) + \nabla\psi_3(a)\big) = 0,$$
it is obvious that, since the real and imaginary parts must be separately zero, we have two simple possibilities:
1. The values of $\psi_3$ and $\nabla\psi_3$ are each zero at both limits:
$$\psi_3(b) = \psi_3(a) = \nabla\psi_3(b) = \nabla\psi_3(a) = 0.$$
2. The value of $\psi_3$ is the same at both limits, as is the value of $\nabla\psi_3$:
$$\psi_3(b) = \psi_3(a), \qquad \nabla\psi_3(b) = \nabla\psi_3(a). \qquad (9.5.18)$$
Using the real trigonometric solutions, it is impossible to satisfy the first condition. Indeed, for both the sine and cosine functions, the magnitude of the gradient ("momentum") of $\psi_3$ is at a maximum exactly where the value of $\psi_3$ is zero. We are then left with the very simple condition that, suitably displaced, the function $\psi_3$ would join continuously and smoothly at the two limits: its values must be the same and its slopes must be identical. It is a familiar fact of elementary quantum theory that the imposition of these boundary conditions leaves acceptable only a discrete subset of the continuum of solutions $\psi_3$ of equation (9.4.16), and so makes the values of the energy discrete (quantised). This is always the case: the solutions of the Schrodinger equation (a differential equation) typically involve a continuum of possible energies; it is the imposition of the boundary conditions on these solutions (that is, ensuring that we are dealing with the domain of Hermiticity of the associated Hamiltonian operator) which generates quantisation of energy. And, of course, to make the point yet again, both the Schrodinger equation and the boundary conditions are given by the fundamental law of Schrodinger's mechanics, the Schrodinger Condition.
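The quantisation produced by the second (periodic) condition can be made explicit in one dimension. This is a minimal sketch under stated assumptions: atomic units ($\hbar = m = 1$, as in the operator $-\tfrac{1}{2}\nabla^2 + V$), a single Cartesian factor $X_3(x) = C\sin(kx) + D\cos(kx)$ with $k = \sqrt{2E}$, and hypothetical helper names. Requiring $X_3(b) = X_3(a)$ and $X_3'(b) = X_3'(a)$ gives two homogeneous linear equations in $(C, D)$, which have a non-trivial solution only when their determinant, $-2k\,[1 - \cos(k(b-a))]$, vanishes; that is, when $k(b-a) = 2\pi n$, so that $E_n = 2\pi^2 n^2/(b-a)^2$.

```python
import math

def bc_matrix(E, a, b):
    """Matrix M with M @ (C, D) = (X(b) - X(a), X'(b) - X'(a)) for
    X(x) = C sin(kx) + D cos(kx), k = sqrt(2E) (atomic units assumed)."""
    k = math.sqrt(2.0 * E)
    ds = math.sin(k * b) - math.sin(k * a)
    dc = math.cos(k * b) - math.cos(k * a)
    # X(b) - X(a)   = C*ds + D*dc
    # X'(b) - X'(a) = C*k*dc - D*k*ds
    return ((ds, dc), (k * dc, -k * ds))

def bc_determinant(E, a, b):
    """Zero exactly when some non-trivial (C, D) meets both periodic conditions."""
    (m11, m12), (m21, m22) = bc_matrix(E, a, b)
    return m11 * m22 - m12 * m21

def allowed_energies(a, b, n_max):
    """Positive energies E_n = 2 pi^2 n^2 / (b - a)^2 for n = 1..n_max."""
    L = b - a
    return [2.0 * (math.pi * n / L) ** 2 for n in range(1, n_max + 1)]

if __name__ == "__main__":
    a, b = 0.0, 1.0
    for E in allowed_energies(a, b, 3):
        print(f"E = {E:8.4f}  det = {bc_determinant(E, a, b):.1e}")   # det close to 0
    print(f"E = {10.0:8.4f}  det = {bc_determinant(10.0, a, b):.1e}")  # clearly non-zero
```

At each allowed $E_n$ the whole matrix vanishes, so both the sine and the cosine meet the conditions; each positive level of this one-dimensional factor is therefore doubly degenerate.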
9.6. The "Particle in a Box" and Cyclic Boundary Conditions

The case of a single particle confined to a finite region of space (a particle in a box) is often the first example of the explicit solution of the Schrodinger equation encountered in elementary texts, and it uses one of two subterfuges to obtain the correct energies, rather than using the correct boundary conditions supplied by the Schrodinger Condition. The one-dimensional case is mathematically identical to $X_3(x)$ of equation (9.4.16), and the boundaries are identical to the ones discussed in the last section; the one-dimensional space in which the particle can move is the interval $(a,b)$ (or, perhaps, $[a,b]$). One can think of two physical models of this situation: 1. For $x < a$ and $x > b$ there is an infinite repulsive potential which prevents the particle from entering those regions. 2. The region is repeated exactly for each contiguous length $(b-a)$. In the first case, one says that the probability of the particle being outside the region $(a,b)$ is zero, and so, since the function $X_3$ must be continuous everywhere, it must vanish at $x = a$ and $x = b$. This generates the correct quantisation conditions from the fact that the trigonometric functions must then be periodic in sub-multiples of $(b-a)$, and only the sine function is retained. Unfortunately, it is equally plausible to say that the momentum of the particle is zero for $x < a$ and $x > b$, so that it must be zero at $a$ and $b$; this again generates the correct discrete energies, but this time only the cosine function is retained. These two plausible assumptions cannot be made compatible, and one simply has to make a choice. In the second case one obtains the correct boundary conditions, those of equation (9.5.18), but for an entirely artificial physical model. There is, as we have seen above, an entirely straightforward way of obtaining the solutions of the global Schrodinger Condition for a particular application by using its two characteristic local consequences: the Schrodinger equation and the boundary conditions. The boundary conditions, however physically obvious they might or might not be, are always generated by the Schrodinger Condition.
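The incompatibility of the two readings of the first ("infinite wall") model can be seen with a short check. This is only an illustrative sketch, assuming atomic units and an interval $(a,b)$ of length $L$; the helper names are hypothetical. Both readings select the same wavenumbers $k = n\pi/L$, hence the same energies $E_n = \tfrac{1}{2}(n\pi/L)^2$, but the "vanishing function" reading keeps only $\sin(k(x-a))$ while the "vanishing momentum" reading keeps only $\cos(k(x-a))$, and neither function meets the other's condition.

```python
import math

def box_energies(a, b, n_max):
    """E_n = (n*pi/L)**2 / 2 for n = 1..n_max (atomic units assumed)."""
    L = b - a
    return [0.5 * (n * math.pi / L) ** 2 for n in range(1, n_max + 1)]

def sine_mode(E, a):
    """Value and slope of sin(k(x - a)), k = sqrt(2E): the 'function vanishes' reading."""
    k = math.sqrt(2.0 * E)
    return (lambda x: math.sin(k * (x - a)),
            lambda x: k * math.cos(k * (x - a)))

def cosine_mode(E, a):
    """Value and slope of cos(k(x - a)): the 'momentum vanishes' reading."""
    k = math.sqrt(2.0 * E)
    return (lambda x: math.cos(k * (x - a)),
            lambda x: -k * math.sin(k * (x - a)))

def zero_at_walls(f, a, b, tol=1e-9):
    """True if f vanishes (to within tol) at both ends of the interval."""
    return abs(f(a)) < tol and abs(f(b)) < tol

if __name__ == "__main__":
    a, b = 0.0, 2.0
    for E in box_energies(a, b, 3):
        s, ds = sine_mode(E, a)
        c, dc = cosine_mode(E, a)
        print(f"E = {E:.4f}: sine vanishes {zero_at_walls(s, a, b)}, "
              f"sine slope vanishes {zero_at_walls(ds, a, b)}, "
              f"cosine vanishes {zero_at_walls(c, a, b)}, "
              f"cosine slope vanishes {zero_at_walls(dc, a, b)}")
```

For each listed energy the sine passes only the first test and the cosine only the second, which is the choice between the two readings that the text says one is forced to make.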
Chapter 10
Abstracting the Structure
The most widely used and most familiar applications of Schrodinger's mechanics are in those cases where the potential energy function is independent of time and is a conservative field, so that, both classically and in quantum theory, the energy of the system is conserved. The Schrodinger equation for these important cases takes a characteristic form: that of an eigenvalue problem. In this case it is possible to abstract the algebraic structure of a Hilbert space from the solutions of the Schrodinger equation. This algebraic structure transforms much of the mathematics from problems in analysis (which are hard) to problems in algebra (which are easier), while doing some violence to the interpretation of the theory. Quantum theory is not deepened or developed by the use of powerful abstract methods, merely made more sophisticated. It is via the interpretation of the symbolism that a theory is deepened, not by an improvement of the tools of articulation.
Contents
10.1. The Idea of Mathematical Structure
10.1.1. A Pitfall of Abstraction: The Momentum Operator
10.2. States and Hilbert Space
10.3. The Real Use of Abstract Structures

10.1. The Idea of Mathematical Structure
In Chapter 8 the transition was made from classical to quantum mechanics by the requirement that, in Schrodinger's dynamics, the Hamilton-Jacobi equation must be satisfied only on the average in space and time. We have also embarked on an interpretation of the various quantities appearing in Schrodinger's theory, albeit on a fairly informal and intuitive basis. There are now several possible ways forward: 1. An investigation of the formal properties of the equations and identities of the Schrodinger theory with a view to extracting the "structure" of the theory for a more rigorous and, perhaps, more general formulation. 2. A continued investigation of the interpretation of the theory, a more precise definition of its referents and a more careful statement of the results of the last chapter. 3. Application of the Schrodinger theory to particular systems. In fact, we shall be concerned with (3) only for particular cases, in Chapters 11 and 12, where it enables some far-reaching conclusions to be drawn for (1) and (2). The possible applications of the Schrodinger equation are very numerous and extremely successful, and it was these very successes which guaranteed the acceptance of Schrodinger's theory long before an adequate interpretation of the formalism was available. I shall not go into these applications except where they bear on the interpretation of Schrodinger's mechanics. This chapter will be concerned with (1) and (2), and in particular with the connection between (1) and (2): how the interpretation of the theory bears on the validity and utility of any formal structures which may be abstracted from that theory. What are we to make, for example, of the fact that the Schrodinger equation has, no doubt, infinitely many solutions? Is there a relationship between the families of solutions of the Hamilton-Jacobi equation obtained by separation methods and the "corresponding" solutions of Schrodinger's equation? How do we interpret those amongst the solutions which cannot be normalised to unity? Can we distinguish between equations and identities in the theory as we did in classical mechanics?
What concept replaces the familiar classical-mechanical concept of a particle's (as yet unmentioned) velocity? These and many other pressing scientific problems are not touched by the ability of the Schrodinger equation to yield numerical results in essentially perfect agreement with experiment. Superposed on these problems, which are particular to Schrodinger's theory, is the all-pervasive problem which arises whenever an attempt is made to express a scientific theory in terms of a known mathematical structure: does the formal "abstraction" reveal the real structure of the theory without introducing new sources of scientific difficulty?
In our reflections on classical mechanics and on the transition to Schrodinger's mechanics it has not proved necessary to pause to consider the nature of the mathematical structures used or their possible impact on the science involved since, along with Newton, Lagrange, Hamilton, Jacobi and Schrodinger, we have simply assumed the validity and utility of the differential and integral calculus. In a word, it has been silently assumed that the methods and results of classical analysis model the continuity and structure of real space and time. Moreover, it has been assumed that this modelling is done in the usual sense of abstraction; even if the use of the real numbers (for example) to model each dimension of real space excludes some of the properties of real space, the opposite is not true: the use of the reals to model a line does not surreptitiously include any properties which are not possessed by physical space. This may be incorrect, of course, but it is the classical viewpoint. The possibility of unintentionally including extraneous material in a scientific theory is much more real when the more powerful techniques of mathematics are used. The equations of classical mechanics specify local conditions among the mechanical quantities: they are differential equations. If it proves possible to generate these equations from a global condition (a variation principle, for example), then there is always the possibility, even likelihood, that the global condition will generate more differential equations and boundary conditions than the one which was its ostensible "source". This latter trap is only a danger in classical mechanics, where we start with $F = ma$ and generalise the theory by the admission of more and more classes of "co-ordinate".
Perhaps the most obvious place where we depend on a global condition is in the development of the transformation theory, where it is required that the generating function be capable of being developed as a function of time only, which is the direct result of the variational formulation of mechanics. In Schrodinger's mechanics, by contrast, the starting point is a global condition: the equality of the mean values of the Hamiltonian density and the energy density. The Schrodinger equation is the (local) Euler-Lagrange equation of this global condition (together with some boundary conditions) and is, therefore, at a lower level than the fundamental Schrodinger Condition. But this condition on the mean values of the Hamiltonian and energy densities is based on the preceding classical mechanics. If Schrodinger's theory is considered to depend on classical mechanics, then any artifacts in classical mechanics may be carried forward. If, however, Schrodinger's mechanics is viewed as independent of classical mechanics, depending only on historically established classical mechanical concepts (not physical laws), then it is an independent global theory from the outset.

10.1.1. A Pitfall of Abstraction: The Momentum Operator
A very familiar example of the confusion which may arise in the cycle:

Specific mathematical case → Abstraction of structure → Back to specific case