Texts in Computer Science
Editors: David Gries, Fred B. Schneider
For other titles published in this series, go to www.springer.com/series/3191
David Salomon
The Computer Graphics Manual
Prof. David Salomon (emeritus)
Department of Computer Science
California State University
Northridge, CA 91330-8281, USA
[email protected]

Series Editors:

David Gries
Department of Computer Science
Upson Hall, Cornell University
Ithaca, NY 14853-7501, USA

Fred B. Schneider
Department of Computer Science
Upson Hall, Cornell University
Ithaca, NY 14853-7501, USA
ISSN 1868-0941  e-ISSN 1868-095X
ISBN 978-0-85729-885-0  e-ISBN 978-0-85729-886-7
DOI 10.1007/978-0-85729-886-7
Springer London Dordrecht Heidelberg New York

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2011937970

© Springer-Verlag London Limited 2011

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)
To users of computer graphics everywhere
There’s no better sensation than image. It’s so in-your-face!
—Jimmy Lai
Preface

The field of quantum mechanics is the cornerstone of modern physics. This field was developed rapidly in the 1920s and 1930s by a small group of (mostly young) researchers. They generally agreed that we cannot (and indeed should not even try to) visualize atoms, photons, and elementary particles. These objects, their attributes, and their behavior are best described in terms of mathematical abstractions, not pictures. Indeed, one of the first textbooks in this area, The Principles of Quantum Mechanics by P. A. M. Dirac, does not include a single diagram in its 314 pages. This style of writing reflects one extreme approach to graphics, namely considering it irrelevant or even distracting as a teaching tool, and ignoring it. Today, of course, this approach is unthinkable. Graphics, especially computer graphics, is commonly used in texts, advertisements, and videos to illustrate concepts, to emphasize points being discussed, and to entertain. Our approach to graphics has been completely reversed since the 1930s, and it seems that much of this change is due to the wide use of computers.

Computer graphics today is a mature, successful, and growing field. It is employed by many people for many purposes and it is enjoyed by even more people. One criterion for the maturity of a field of study is its size. When a discipline becomes so big that no one person can keep all of it in their head, we say that the discipline has matured (or has come of age). This is what happened to computer graphics in the last decade or so. It is now a large field consisting of many subfields, such as curve and surface design, rendering methods, and computer animation. Even a person who has written a book covering the entire field cannot claim to keep all that material in their head all the time, which is precisely the reason why textbooks are written.
In its 357 pages, The Principles of Quantum Mechanics featured neither a single diagram, nor an index, nor a list of references, nor suggestions for further reading. —Graham Farmelo, The Strangest Man, 2009.
vii
Overview and Goals

Today (in 2011), the power of computer-generated images is everywhere. Computer graphics has pervaded our lives to such an extent that sometimes we don't even realize that an image we are watching is artificial. The average person comes into contact with computer graphics mostly in three areas: computers, television, and electronic devices.

Current computers and operating systems are based on a GUI (graphical user interface). Computer programs often display results graphically. Television programs and commercials employ sophisticated, computer-generated graphics that are often hard to distinguish from the real thing. Many television programs (mostly documentaries) and recent movies mix real actors and artificial imagery to such an extent that the viewer may find it difficult to distinguish a real object or scene from a computer-generated image. (A real actor trying to outrun a computer-generated dinosaur is a common example.) More and more digital cameras, electronic devices, and instruments have small screens that display messages, options, controls, and results in color, and these screens are often touch sensitive, enabling the user to enter commands by finger gestures instead of through a traditional keyboard. Many cell telephones even have two screens, and some new digital cameras also feature two LCD displays.

With this in mind, the goal of this manual is to present the reader with a wide picture of computer graphics: its history and its pioneers, the hardware tools it employs, and, most important, the techniques, approaches, and algorithms that are at the core of this field. Thus, this textbook/reference tries to describe as many concepts and algorithms as possible, paying special attention to the important ones.
It would have been nice to include everything in this book and title it, like other texts by the same author, Computer Graphics: The Complete Reference, but computer graphics has grown to a point where I cannot hope to be an authority on the entire field, which is why some readers may not find every topic, term, concept, and algorithm they may be looking for.

New material for Volume 4 will first appear in beta-test form as fascicles of approximately 128 pages each, issued approximately twice per year. These fascicles will represent my best attempt to write a comprehensive account; but computer science has grown to the point where I cannot hope to be an authority on all the material covered in these books. Therefore I'll need feedback from readers in order to prepare the official volumes later.
—Donald E. Knuth

On the other hand, those same readers may find in this manual/textbook topics they did not know existed, which might serve as compensation. The many examples and exercises sprinkled throughout the book enhance its usefulness. By paying attention to the examples and working out the exercises, readers will gain a deeper understanding of the material.
Organization and Features

This manual is large and is organized in seven parts as follows:

Part I covers the history, basic concepts, and techniques used in computer graphics. The concepts of pixel, vector scan, and raster scan are discussed. It is shown how an
image given as a bitmap of pixels can be scaled (zoomed) and rotated. Many important scan-conversion methods are explained and illustrated.

Part II is devoted to transformations and projections. It starts with the important two- and three-dimensional transformations, including translation, rotation, reflection, and shearing. This is followed by the main types of projections, namely parallel, perspective, and nonlinear.

Part III is by far the largest. It includes many methods, algorithms, and techniques employed to construct curves and surfaces, which are the building blocks of solid objects. Six important interpolation and approximation methods for curves and surfaces are included, as well as sweep surfaces and subdivision methods for surface design.

Part IV goes into advanced techniques such as rendering an object, eliminating those parts that are hidden from view, and bringing objects to life (animating them) by interpolation. Several chapters included in this part discuss the nature and properties of light and color, graphics standards and graphics file formats, and fractals.

Part V describes the principles of image compression. It concentrates on two important approaches to this topic, namely orthogonal and subband (wavelet) transforms. The important JPEG image compression standard is described in detail.

Part VI is devoted to many of the important input/output graphics devices. Chapter 26 describes them and explains their operation.

Part VII consists of appendixes, most of which discuss certain mathematical concepts.

The following features enhance the usefulness and appearance of this textbook:

The powerful Mathematica and Matlab software systems are used throughout the book to implement the various concepts discussed. When a figure is computed in one of these programs, the code is listed with the figure. This code, which is available on the book's website, is meant to be readable rather than fast, and is therefore easy to follow and to modify even by inexperienced Mathematica and Matlab users.

The book has many examples. Experience shows that examples are important for a thorough understanding of the kind of material discussed in this manual. The conscientious reader should follow each example carefully and try to work out variations of it. Many examples also include Mathematica code.

The many exercises sprinkled throughout the text are not a cosmetic feature. They deal with important topics and should be worked out. Answers are provided, but they should be consulted only as a last resort.

A quotation is a phrase that reflects its author's profound thoughts. Quotations and epigrams enliven a book, which is why they have been used generously throughout this manual. I hope that they add to the book and make it more interesting.

The ability to quote is a serviceable substitute for wit.
—W. Somerset Maugham

This manual/textbook aims to be practical, not theoretical. After reading and understanding a topic, the reader should be able to design and implement the concepts
discussed there. The few mathematical arguments found in the book are simple, and there is no attempt to present an overall theory encompassing the entire field of computer graphics.

An important feature of this text is the attention paid to orphans. These are topics that most texts on computer graphics either mention briefly or disregard completely. Examples are perspective projections, nonlinear projections, nonlinear bitmap transformations, curves, surfaces, I/O devices, and image compression. The reader will find that this manual discusses orphans in great detail, including numerous examples and exercises.

Most of the necessary mathematical background (such as vectors and matrices) is covered in the appendixes. However, some math concepts that are used only once (such as the mediation operator and points vs. vectors) are discussed right where they are introduced.
The Two Volumes

This textbook/reference is big because the discipline of computer graphics is big. There are simply many topics, techniques, and algorithms to discuss, explain, and illustrate by examples. Because of the large number of pages, the book has been produced in two volumes. However, this division of the book into volumes is only technical, and the book should be considered a single unit. It is wrong to consider volume I as introductory and volume II as advanced, or to assume that volume I is important while volume II is not. The volumes are simply two halves of a single, large entity.
The Color Plates

This extensive manual features more than 100 color plates, placed at the very beginning, at the end, and between individual parts. They serve to liven up the book and to illustrate many of the topics discussed. Information about these plates will be placed on the book's website, for the benefit of readers who want to recreate or extend them. The plates were prepared over several months, using a variety of graphics software. Appendix F has more information about the plates, their content, and the graphics applications used to generate them.
A Word on Notation

It is common to represent nonscalar quantities such as points, vectors, and matrices with boldface. Here are examples of the notation used in this manual:

x, y, z, t, u, v            Italics are used for scalar quantities such as coordinates and parameters.
P, Q_i, v, M                Boldface is used for points, vectors, and matrices.
CP (with overhead arrow)    An alternative notation for vectors, used when the two endpoints of the vector are known.
P(t), P(u,v)                Boldface with arguments is used for nonscalar functions such as curves and surfaces.
(a11 a12; a21 a22)          Parentheses (and sometimes square brackets) are used for matrices.
|a11 a12; a21 a22|          Vertical bars are used for determinants.
|v|                         The absolute value (length) of vector v.
A^T                         The transpose of matrix A.
x*, P*                      The transformed values of scalars and points.
f^u(u), P^t(t), P^tt(t)     The derivatives (first, second, ...) of scalar and vector functions.
df(u)/du, dP(t)/dt          Alternative notation for derivatives.
d^2f(u)/du^2, d^2P(t)/dt^2  Alternative notation for higher-order derivatives.
∂f(u,v)/∂u, ∂P(u,v)/∂v      Partial derivatives.
f(x)|_{x0} or f(x0)         The value of function f(x) at point x0.
Σ_{i=1}^{n} x_i             The sum x1 + x2 + ··· + xn.
Π_{i=1}^{n} x_i             The product x1 x2 ··· xn.
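Readers who typeset their own notes may find it convenient to see a few of the notations above in LaTeX form. The following fragment is a sketch of ours, not code from the book:

```latex
% A few notation examples from this section, typeset in LaTeX.
\documentclass{article}
\begin{document}
% Derivatives of scalar and vector functions
\[ f^{u}(u), \quad \frac{df(u)}{du}, \quad \frac{d^{2}\mathbf{P}(t)}{dt^{2}},
   \quad \frac{\partial \mathbf{P}(u,v)}{\partial u} \]
% Sum and product notation
\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n, \qquad
   \prod_{i=1}^{n} x_i = x_1 x_2 \cdots x_n \]
\end{document}
```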
Exercise 1: What is the meaning of (P1 , P2 , P3 , P4 )?
Target Audiences

The material presented here has been developed over many years of teaching computer graphics. It has been revised, updated, and distilled many times, with many figures, examples, and exercises added. The text emphasizes the simplicity of the mathematics behind computer graphics and tries to show how graphics software works and how current computer graphics can generate and display realistic-looking curves, surfaces, and solid objects. The key ideas are introduced slowly, are examined, when possible, from several points of view, and are illustrated and illuminated by figures, examples, and (solved) exercises. The discussion must employ mathematics, but it is
mostly non-rigorous (no theorems or proofs) and therefore easy to grasp. The mathematical background required includes polynomials, matrices, vector operations, and elementary calculus. This manual/textbook was written with three groups of readers in mind.

Textbook/student. The book can serve as the primary text for a two-semester class on computer graphics for graduate and advanced undergraduate students. The many fragments of Mathematica code found here may serve as a core around which students can build larger programs. The exercises are especially valuable for students, and the lack of rigorous theorems and proofs should encourage those who consider computer science distinct from mathematics.

Reference/professional use. Professionals in engineering, computing, and other scientific disciplines generate, watch, and process digital images all the time. They often would like to know more about how such images are generated. Artists, photographers, and publishing professionals use computer graphics routinely and may be interested in a solid background in this field. Those readers can benefit from two features of this book, namely the detailed index and the thorough and precise exposition of the principles, methods, and techniques used in computer graphics. This textbook/reference tries to cover all the important topics of the graphics field, some in more detail than others. The index is exceptionally detailed and constitutes 2.5% of the book.

Individuals/handy resource. All of us, not just certain professionals, are constantly exposed to digital images, digital effects, and computer animation. Intelligent persons who try to widen their horizons may wonder how digital images are created, edited, and distributed. It is my belief that this manual can serve the needs of this group of individuals as well. The book may prove useful to them because of the simple, straightforward descriptions of graphics devices and processes.
The book may also be a handy resource because of the detailed index, which makes it easy to locate any topic.
Supplementary Resources

Because of the importance of computer graphics, a vast number of supplementary resources are available. Section 1.5 has a long list of seven classes of resources, and even this only scratches the surface of what is available. The Internet has many thousands of resources in the form of websites dedicated to graphics, class notes, scientific publications, and software. Please bear in mind that graphics, like any discipline associated with computers and computations, develops continually, so readers should search the Internet for new, useful, and exciting sources of information. Search terms may be as general as "computer graphics," "computer animation," "image processing," "computer vision," and "computer-aided design (CAD)," or as specific as "how to twirl in Photoshop," "what is quaternion rotation," and "Java code for dithering."

Shaw's plays are the price we pay for Shaw's prefaces.
—James Agate
Acknowledgements

A book of this magnitude is generally written with the help, dedicated work, and encouragement of many people, and this large textbook/reference is no exception. First and foremost should be mentioned my editor, Wayne Wheeler, and the copyeditor, Francesca White. They made many useful comments and suggestions, and pointed out many typos, errors, and stylistic blemishes. In addition, I would like to thank H. L. Scott for permission to use Figure 2.82, CH Products for permission to use Figure 26.24b, Andreas Petersik for Figure 6.61, Shinji Araya for Figure 7.27, Dick Termes for many figures and paintings, the authors of Hardy Calculus for the limerick at the end of Chapter 13, Bill Wilburn for many Mathematica notebooks, and Ari Salomon for photos and panoramas in several plates. I welcome any comments, suggestions, and corrections. They should be emailed to
[email protected]. An errata list and other information will be maintained on the book's website http://www.davidsalomon.name/CGadvertis.

And now, to the matter at hand. This preface made me so impatient, being conscious of my own merits and innocence, that I was going to interrupt; when he entreated me to be silent, and thus proceeded.
— Jonathan Swift, Gulliver’s Travels
Contents

Preface  vii
Introduction  1

Part I  Basic Techniques  7

1  Historical Notes  9
   1.1  Historical Survey  9
   1.2  History of Curves and Surfaces  13
   1.3  History of Video Games  14
   1.4  Pioneers of Computer Graphics  17
   1.5  Resources For Computer Graphics  19

2  Raster Graphics  29
   2.1  Pixels  30
   2.2  Graphics Output  32
   2.3  The Source-Target Paradigm  40
   2.4  Interpolation  41
   2.5  Bitmap Scaling  45
   2.6  Bitmap Stretching  46
   2.7  Replicating Pixels  46
   2.8  Scaling Bitmaps with Bresenham  48
   2.9  Pixel Art Scaling  54
   2.10 Pixel Interpolation  56
   2.11 Bilinear Interpolation  57
   2.12 Interpolating Polynomials  58
   2.13 Adaptive Scaling by 2  65
   2.14 The Keystone Problem  73
   2.15 Bitmap Rotation  76
   2.16 Nonlinear Bitmap Transformations  79
   2.17 Circle Inversion  85
   2.18 Polygons (2D)  88
   2.19 Clipping  91
   2.20 Cohen–Sutherland Line Clipping  92
   2.21 Nicholl–Lee–Nicholl Line Clipping  93
   2.22 Cyrus–Beck Line Clipping  95
   2.23 Sutherland–Hodgman Polygon Clipping  97
   2.24 Weiler–Atherton Polygon Clipping  99
   2.25 A Practical Drawing Program  100
   2.26 GUI by Inversion Points  104
   2.27 Halftoning  107
   2.28 Dithering  109
   2.29 Stippling  118
   2.30 Random Numbers  122
   2.31 Image Processing  126
   2.32 The Hough Transform  131

3  Scan Conversion  135
   3.1  Scan-Converting Lines  135
   3.2  Midpoint Subdivision  136
   3.3  DDA Methods  137
   3.4  Bresenham's Line Method  142
   3.5  Double-Step DDA  147
   3.6  Best-Fit DDA  151
   3.7  Scan-Converting in Parallel  153
   3.8  Scan-Converting Circles  156
   3.9  Filling Polygons  167
   3.10 Pattern Filling  175
   3.11 Thick Curves  177
   3.12 General Considerations  179
   3.13 Antialiasing  180
   3.14 Convolution  192

Part II  Transformations and Projections  193

4  Transformations  199
   4.1  Introduction  201
   4.2  Two-Dimensional Transformations  204
   4.3  Three-Dimensional Coordinate Systems  232
   4.4  Three-Dimensional Transformations  233
   4.5  Transforming the Coordinate System  250

5  Parallel Projections  251
   5.1  Orthographic Projections  252
   5.2  Axonometric Projections  255
   5.3  Oblique Projections  262

6  Perspective Projection  267
   6.1  One Two Three . . . Infinity  269
   6.2  History of Perspective  275
   6.3  Perspective in Curved Objects, I  282
   6.4  Perspective in Curved Objects, II  283
   6.5  The Mathematics of Perspective  294
   6.6  General Perspective  305
   6.7  Transforming the Object  310
   6.8  Viewer at an Arbitrary Location  314
   6.9  A Coordinate-Free Approach: I  322
   6.10 A Coordinate-Free Approach: II  326
   6.11 The Viewing Volume  329
   6.12 Stereoscopic Images  332
   6.13 Creating a Stereoscopic Image  336
   6.14 Viewing a Stereoscopic Image  340
   6.15 Autostereoscopic Displays  351

7  Nonlinear Projections  355
   7.1  False Perspective  355
   7.2  Fisheye Projection  357
   7.3  Poor Man's Fisheye  369
   7.4  Fisheye Menus  369
   7.5  Panoramic Projections  372
   7.6  Cylindrical Panoramic Projection  373
   7.7  Spherical Panoramic Projection  380
   7.8  Cubic Panoramic Projection  386
   7.9  Six-Point Perspective  389
   7.10 Other Panoramic Projections  391
   7.11 Panoramic Cameras  396
   7.12 Telescopic Projection  402
   7.13 Microscopic Projection  404
   7.14 Anamorphosis  405
   7.15 Map Projections  408

Part III  Curves and Surfaces  429

8  Basic Theory  433
   8.1  Points and Vectors  433
   8.2  Length of Curves  441
   8.3  Example: Area of Planar Polygons  442
   8.4  Example: Volume of Polyhedra  443
   8.5  Parametric Blending  444
   8.6  Parametric Curves  445
   8.7  Properties of Parametric Curves  447
   8.8  PC Curves  453
   8.9  Curvature and Torsion  461
   8.10 Special and Degenerate Curves  469
   8.11 Basic Concepts of Surfaces  470
   8.12 The Cartesian Product  473
   8.13 Connecting Surface Patches  475
   8.14 Fast Computation of a Bicubic Patch  476
   8.15 Subdividing a Surface Patch  478
   8.16 Surface Normals  481

9  Linear Interpolation  483
   9.1  Straight Segments  483
   9.2  Polygonal Surfaces  487
   9.3  Bilinear Surfaces  493
   9.4  Lofted Surfaces  499

10 Polynomial Interpolation  505
   10.1 Four Points  506
   10.2 The Lagrange Polynomial  510
   10.3 The Newton Polynomial  519
   10.4 Polynomial Surfaces  521
   10.5 The Biquadratic Surface Patch  521
   10.6 The Bicubic Surface Patch  522
   10.7 Coons Surfaces  527
   10.8 Gordon Surfaces  542

11 Hermite Interpolation  545
   11.1 Interactive Control  546
   11.2 The Hermite Curve Segment  547
   11.3 Degree-5 Hermite Interpolation  557
   11.4 Controlling the Hermite Segment  558
   11.5 Truncating and Segmenting  562
   11.6 Hermite Straight Segments  564
   11.7 A Variant Hermite Segment  566
   11.8 Ferguson Surfaces  568
   11.9 Bicubic Hermite Patch  571
   11.10 Biquadratic Hermite Patch  573

12 Spline Interpolation  577
   12.1 The Cubic Spline Curve  578
   12.2 The Akima Spline  599
   12.3 The Quadratic Spline  602
   12.4 The Quintic Spline  604
   12.5 Cardinal Splines  606
   12.6 Parabolic Blending: Catmull–Rom Curves  610
   12.7 Catmull–Rom Surfaces  615
   12.8 Kochanek–Bartels Splines  617
   12.9 Fitting a PC to Experimental Points  624

13 Bézier Approximation  629
   13.1 The Bézier Curve  630
   13.2 The Bernstein Form of the Bézier Curve  632
   13.3 Fast Calculation of the Curve  639
   13.4 Properties of the Curve  644
   13.5 Connecting Bézier Curves  647
   13.6 The Bézier Curve as a Linear Interpolation  648
   13.7 Blossoming  653
   13.8 Subdividing the Bézier Curve  658
   13.9 Degree Elevation  660
   13.10 Reparametrizing the Curve  663
   13.11 Cubic Bézier Segments with Tension  672
   13.12 An Interpolating Bézier Curve: I  673
   13.13 An Interpolating Bézier Curve: II  676
   13.14 Nonparametric Bézier Curves  684
   13.15 Rational Bézier Curves  684
   13.16 Circles and Bézier Curves  690
   13.17 Rectangular Bézier Surfaces  693
   13.18 Subdividing Rectangular Patches  698
   13.19 Degree Elevation  699
   13.20 Nonparametric Rectangular Patches  701
   13.21 Joining Rectangular Bézier Patches  702
   13.22 An Interpolating Bézier Surface Patch  704
   13.23 A Bézier Sphere  707
   13.24 Rational Bézier Surfaces  707
   13.25 Triangular Bézier Surfaces  709
   13.26 Joining Triangular Bézier Patches  719
   13.27 Reparametrizing the Bézier Surface  723
   13.28 The Gregory Patch  725

Volume II  729

14 B-Spline Approximation  731
   14.1 The Quadratic Uniform B-Spline  732
   14.2 The Cubic Uniform B-Spline  736
   14.3 Multiple Control Points  743
   14.4 Cubic B-Splines with Tension  745
   14.5 Cubic B-Spline and Bézier Curves  748
   14.6 Higher-Degree Uniform B-Splines  748
   14.7 Interpolating B-Splines  750
   14.8 A Knot Vector-Based Approach  751
   14.9 Recursive Definitions of the B-Spline  760
   14.10 Open Uniform B-Splines  761
   14.11 Nonuniform B-Splines  766
   14.12 Matrix Form of the Nonuniform B-Spline  775
   14.13 Subdividing the B-Spline Curve  779
   14.14 Nonuniform Rational B-Splines (NURBS)  782
   14.15 The Cubic B-Spline as a Circle  788
   14.16 Uniform B-Spline Surfaces  792
   14.17 Relation to Other Surfaces  796
   14.18 An Interpolating Bicubic Patch  798
   14.19 The Quadratic-Cubic B-Spline Surface  801

15 Subdivision Methods  803
   15.1 Introduction  803
   15.2 Chaikin's Refinement Method  804
   15.3 Quadratic Uniform B-Spline by Subdivision  811
   15.4 Cubic Uniform B-Spline by Subdivision  812
   15.5 Biquadratic B-Spline Surface by Subdivision  816
   15.6 Bicubic B-Spline Surface by Subdivision  821
   15.7 Polygonal Surfaces by Subdivision  826
   15.8 Doo–Sabin Surfaces  826
   15.9 Catmull–Clark Surfaces  828
   15.10 Loop Surfaces  829

16 Sweep Surfaces  833
   16.1 Sweep Surfaces  834
   16.2 Surfaces of Revolution  839
   16.3 An Alternative Approach  842
   16.4 Skinned Surfaces  846

Part IV  Advanced Techniques  849

17 Rendering  851
   17.1 Introduction  852
   17.2 A Simple Shading Model  853
   17.3 Gouraud and Phong Shading  864
   17.4 Palette Optimization  866
   17.5 Ray Tracing  867
   17.6 Photon Mapping  876
   17.7 Texturing  877
   17.8 Bump Mapping  879
   17.9 Particle Systems  881
   17.10 Mosaics  883

18 Visible Surface Determination  891
   18.1 Ray Casting  893
   18.2 Z-Buffer Method  893
   18.3 Explicit Surfaces  895
   18.4 Depth-Sort Method  898
   18.5 Scan-Line Approach  901
   18.6 Warnock's Algorithm  905
   18.7 Octree Methods  907
   18.8 Approaches to Curved Surfaces  910

19 Computer Animation  911
   19.1 Background  911
   19.2 Interpolating Positions  914
   19.3 Constant Speed: I  915
   19.4 Constant Speed: II  916
   19.5 Interpolating Orientations: I  919
   19.6 SLERP  923
   19.7 Summary  924
   19.8 Interpolating Orientations: II  930
   19.9 Nonuniform Interpolation  937
   19.10 Morphing  943
   19.11 Free-Form Deformations  944

20 Graphics Standards  947
   20.1 GKS  947
   20.2 IGES  949
   20.3 PHIGS  951
   20.4 OpenGL  954
   20.5 PostScript  956
   20.6 Graphics File Formats  960
   20.7 GIF  961
   20.8 TIFF  963
   20.9 PNG  967
   20.10 CRC  972

21 Color  975
   21.1 Light  975
   21.2 Color and the Eye  976
   21.3 Color and Human Vision  978
   21.4 The HLS Color Model  982
   21.5 The HSV Color Model  984
   21.6 The RGB Color Space  984
   21.7 Additive and Subtractive Colors  986
   21.8 Complementary Colors  991
   21.9 The Color Wheel  992
   21.10 Spectral Density  994
   21.11 The CIE Standard  997
   21.12 Luminance  1000
   21.13 Converting Color to Grayscale  1001

22 Fractals  1005
   22.1 Introduction  1006
   22.2 Fractal Landscapes  1009
   22.3 Branching Rules  1013
   22.4 Iterated Function Systems (IFS)  1013
   22.5 Attractors  1017
   22.6 Gaussian Distribution  1021

Part V  Image Compression  1025

23 Compression Techniques  1027
   23.1 Redundancy in Data  1027
   23.2 Image Types  1031
   23.3 Redundancy in Images  1032
   23.4 Approaches to Image Compression  1038
   23.5 Intuitive Methods  1051
   23.6 Variable-Length Codes  1052
   23.7 Codes, Fixed- and Variable-Length  1052
   23.8 Prefix Codes  1055
   23.9 VLCs for Integers  1056
   23.10 Start-Step-Stop Codes  1058
   23.11 Start/Stop Codes  1060
   23.12 Elias Codes  1062
   23.13 Huffman Coding  1069

24 Transforms and JPEG  1079
   24.1 Image Transforms  1079
   24.2 Orthogonal Transforms  1084
   24.3 The Discrete Cosine Transform  1092
   24.4 Test Images  1128
   24.5 JPEG  1132

25 The Wavelet Transform  1147
   25.1 The Haar Transform  1148
   25.2 Filter Banks  1166
   25.3 The DWT  1176
   25.4 SPIHT  1188
   25.5 QTCQ  1199

Part VI  Graphics Devices  1201

26 Graphics Devices  1203
   26.1 Displays  1203
   26.2 The CRT  1208
   26.3 LCDs  1213
   26.4 The Digital Camera  1217
   26.5 The Computer Mouse  1236
   26.6 The Trackball  1240
   26.7 The Joystick  1242
   26.8 The Graphics Tablet  1244
   26.9 Scanners  1252
   26.10 Inkjet Printers  1262
   26.11 Solid-Ink Printers  1271
   26.12 Laser Printers  1274
   26.13 Plotters  1279
   26.14 Interactive Devices  1282

Part VII  Appendixes  1287

A  Vector Products  1289
B  Quaternions  1295
C  Conic Sections  1299
D  Mathematica Notes  1305
E  The Resolution of Film  1311
F  The Color Plates  1315

References  1321
Answers to Exercises  1339
Index  1469
To me style is just the outside of content, and content the inside of style, like the outside and the inside of the human body both go together, they can’t be separated.
— Jean-Luc Godard
Plate A.1. Water Splash in a Green Garden (Modo).
Plate A.2. A Cluttered Room, Day and Night (Live Interior).
Plate A.3. A Jigsaw Puzzle of Chateau Chambord (AVbrothers).
Plate A.4. The Mandelbrot Set and Two Details (Fractal Domains).
Plate A.5. Three Shiny Dice (Modo).
Plate A.6. POVRAY Textures (MegaPOV).
Plate A.7. Rational Fractals (Fractal Domains).
Plate A.8. Steps to Visually Improve a Simple Scene (MegaPOV). Panel labels: start with a few rods; add a light source; add a wooden floor; add texture; make the floor shiny; vary the lighting.
Introduction

Computer graphics is a vast field with applications to presentations (slides and video), computer art, cartography, medicine, entertainment, training, visualization (of large quantities of data), image processing, design (computer-aided design or CAD), and many other areas. The adage "a picture is worth a thousand words" explains why this field is so important, and it also explains why this book is so big. A video (or even a single image) tells us much more than text, but it also requires more resources and longer preparation. There is simply much to learn about computer graphics: the special input/output devices it requires, the specific approaches, techniques, and algorithms it employs, and the specialized language, concepts, and terms that are commonly used in this field. This introduction covers the chief terms used in computer graphics, the concept of the graphics processing unit (GPU), and the fundamental equation of computer graphics.
Terms and Concepts

Here are a few informal definitions of the graphics field and its "relatives" (as with any informal definitions, certain experts and users may disagree).

Image processing (more precisely, digital image processing) is the field that deals with methods, techniques, and algorithms for image manipulation, enhancement, and interpolation. Researchers in this field test, publish, and implement their algorithms to make them available to users who often may not be technically savvy. (A technically savvy person is one who is proficient enough to read a user manual, run software and equipment, and notice when things go wrong, but who is not an expert, does not develop algorithms, and does not write code.)

Image editing (or simply imaging) is concerned with using software (implemented by image processing professionals) to manipulate images.

Computer graphics is the discipline concerned with generating images. An image is generated (or synthesized) from geometrical descriptions. In addition to just generating images, this field is also concerned with rendering them accurately (so they look real) and fast (so that generating a long video of computer animation does not take forever).
Computer vision is that branch of engineering concerned with constructing devices that comprehend and correctly interpret real objects the way our brains do.

Pattern recognition is the mathematical field that deals with finding patterns in data and signals. The data can be text, audio, still images, or video. A special case of pattern recognition is data mining, a branch of computer science concerned with finding trends in large databases.

Image analysis includes anything that has to do with the extraction of meaningful information from images. Image analysis is a wide field that often overlaps other image-related areas. Reading a bar code, for example, can be considered image analysis, but is also pattern recognition. Identifying a face in a digital image is image analysis, but may use techniques from artificial intelligence. Detecting an edge in a digital image is image analysis, but it uses algorithms from image processing.

Here is a short list of the most important concepts and terms used in computer graphics:

Graphics (from the Greek γραφικός, graphikos) are visual presentations on some surface, such as a canvas, screen, or paper. Graphics are used to inform, illustrate, or entertain. Examples of graphics presentations are paintings, photographs, drawings, digital images, graphs, diagrams, geometric designs, and maps. Graphics may be in black-and-white, grayscale, or color, and may contain text.

Computer graphics is the field concerned with the generation, manipulation, and storage of digital graphical data. This includes still images (two- and three-dimensional), animated graphics, and interactive images (virtual reality). In fact, most digital data that is not text, software, or audio is graphics data.

A pixel (from picture element) is the smallest unit of a digital image. In the computer, a pixel is represented by its color code, which is either a grayscale value or the three components of a color.
We tend to think of a pixel as a small dot, circular or square, but Section 2.1 shows that in principle, a pixel is a mathematical, dimensionless point. Figure Intro.1 shows a small, 128 × 128-pixel image and some of its constituent pixels.
Figure Intro.1: An Image and Pixels.
Digital image. Such an image is a rectangular array of pixels. Early images had only black-and-white or grayscale pixels and featured low resolutions (only a few dozen pixels per centimeter or per inch). Current images are mostly color, with the colors selected from huge palettes of millions of colors. Today’s images can have resolutions of hundreds of pixels per centimeter, although resolution is more often measured in dpi (dots per inch).

3D (three-dimensional) images. Such an image contains much more information than a two-dimensional image. In order to view a three-dimensional image fully, it has to be rotated or transformed in some way. True viewing devices for such images are rare and expensive (holograms come to mind), so most three-dimensional images are projected onto a two-dimensional display and have to be viewed from different directions.

Computer animation. Much as a film is a sequence of still images, a digital animation is a sequence of digital images. An animation is organized in scenes, where consecutive images in a scene differ slightly in order to achieve the illusion of smooth movement.

Vector graphics. Older graphics displays were of this type. In a vector graphics computer system, the program running in the computer creates a picture out of graphics components such as points, lines, and circles. A set of such components is stored in memory and becomes the description of an image. Special hardware translates each graphics component in memory into a visible part on the output device (normally a display). The total of all the parts on the display constitutes the image.

Raster graphics. Most current graphics displays are of this type. The screen displays an image that consists of small dots (pixels). The program running in the computer creates an image by assigning color codes to the pixels in memory. The graphics hardware scans the color codes in memory and actually paints the pixels on the display.

Scan conversion. 
This is the process of converting a smooth geometric figure into pixels such that the result looks as smooth as possible. Scan conversion is a very common operation, which is why scan conversion algorithms should be fast and should use only logical operations, shifts, and simple arithmetic.

Transformations. Once an image has been generated, the user often transforms it in order to view different parts of it or to modify its shape in regular ways. Transformations are a must when a three-dimensional image is viewed on a two-dimensional display.

Projections. A three-dimensional image has to be projected in order to print it or view it on a two-dimensional display. The most common projection is perspective, but parallel projections and nonlinear projections are suitable for special applications.

Rendering is the process of generating a digital image from a mathematical model by means of algorithms implemented in computer programs. The mathematical model describes the objects that constitute the image in terms of points and curves. Inputs to the rendering algorithm include the orientation of the objects in space, the surface textures of the objects (including shading information), and the lighting configuration (the positions, intensities, and colors of the light sources). The term rendering has its origins in a painter’s rendering of a scene.
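The perspective projection mentioned above amounts to a single divide per coordinate. Here is a minimal sketch (the function name and the choice of projection plane z = d are illustrative assumptions; the full machinery of projections is developed in Part II):

```python
def project_perspective(point, d):
    """Perspective-project a 3D point onto the plane z = d.

    The center of projection is at the origin; a point (x, y, z)
    maps to (d*x/z, d*y/z), the standard perspective divide.
    """
    x, y, z = point
    if z == 0:
        raise ValueError("cannot project a point with z = 0")
    return (d * x / z, d * y / z)

# A point twice as far away as the plane projects at half its offsets.
print(project_perspective((2.0, 4.0, 4.0), 2.0))  # (1.0, 2.0)
```

The divide by z is what makes distant objects appear smaller; a parallel projection simply drops the z coordinate instead.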
Modeling. A mathematical model of an object is a collection of points and curves. If the object is three-dimensional, the model also contains surface information. The first step in rendering an object is to display its mathematical model as a wireframe (Section 8.11.2).

Color. An understanding of color is essential for dealing with color images and animation. Chapter 21 discusses the physical meaning and psychological implications of color, as well as the various color spaces and human vision.

Image compression. Images tend to be large compared to texts, which is why image compression is such an important field of research. Images can be compressed because their original (raw) representations are inefficient and feature much redundancy. The various image compression methods remove this redundancy in different ways; Chapters 23 through 25 describe several approaches to image compression, especially orthogonal and subband (i.e., wavelet) transforms.
The Graphics Processing Unit (GPU)

Graphics is a computationally intensive application. The task of computing and displaying a color, high-resolution image on a large screen involves the following steps: (1) The surfaces that constitute the image have to be computed (as discussed in Part III of the book), either as polygonal surfaces or as smooth, curved surfaces that are then converted to triangles. (2) Each small triangle has to be rendered by simulating the light reflected from it (Chapter 17). The software has to know the positions, orientations, and colors of the light sources, and has to perform intensive computations for each triangle. (3) A texture is sometimes embedded in a surface. The texture is a small bitmap that is wrapped around the surface to enhance its look. (4) Once rendered, a triangle has to be projected to two dimensions (see Part II of the book). The projection can be parallel or perspective. (5) When the projection of a triangle has been determined (i.e., the two-dimensional coordinates of its three vertices are known), its visibility must be determined (Chapter 18). A triangle (a small part of a larger surface patch) may be completely or partially obscured by other surface parts that are closer to the camera (or the observer). Only those parts of the triangle that are visible to the camera should participate in the next step. (6) The visible parts need to be scan converted (Chapter 3). A special algorithm is needed to determine the best pixels for each visible part of the triangle.

It was therefore recognized early in the history of computer graphics that special, dedicated hardware was needed to perform most of these operations, which is why several companies started developing such hardware as early as the 1970s. The first units were special-purpose processors for creating bitmaps of text and simple graphics shapes and sending them to the video output. The 1980s saw graphics controllers for the IBM PC and the Commodore Amiga. 
Those devices tried to accelerate graphics
operations and implement various two-dimensional graphics primitives in hardware. In the case of the Amiga computer, the graphics unit also handled scan conversion of lines and filling of regions, and performed block image transfers. As hardware manufacturing capabilities improved during the 1990s, more graphics cards, graphics accelerators, and support devices for three-dimensional graphics operations were introduced for both personal computers and game consoles. In some cases, three-dimensional graphics operations became possible only if an accelerator was installed in the computer. The graphics programming language OpenGL appeared in the early 1990s and became so popular that users started demanding graphics processors that could be programmed in OpenGL. It was only in the late 1990s that the first GPUs, made by Nvidia (which also coined the term GPU), were introduced.

These devices (in combination with high-resolution display monitors and drawing and illustration software) have turned our personal computers into fast, high-resolution graphics generators, and have made it possible for anyone to create high-quality original graphics such as maps, plans, drawings, animations, games, and illustrations on a personal computer.

A GPU is a processor mounted on a special graphics card that also includes support devices, memory, and a bus. The graphics card can be plugged into the computer and later replaced with a more powerful version, but in most current personal computers the card is simply part of the main circuit board. The main functions of the GPU and its support devices are fast floating-point operations and vector operations. These operations are used all the time in surface computations, rendering, and visible-surface determination. Today (2011), GPUs feature parallel computational units that can perform their operations simultaneously. 
A parallel processor is notoriously difficult to program, which is why writing software for parallel GPUs is now considered an important field in its own right, dubbed “general-purpose computing on the GPU,” or GPGPU.

All problems in computer graphics can be solved with a matrix inversion.
—James F. Blinn, What’s the Deal with the DCT (1993).

The Fundamental Equation of Computer Graphics

Mathematicians like fundamental theorems. The fundamental theorem of a field of mathematics is the theorem considered central to that field. There are fundamental theorems of calculus, algebra, arithmetic, vector analysis, Galois theory, Riemannian geometry, and more. Theorems and proofs are rare in computer graphics, which is not as rigorous as mathematics. However, there is one equation that appears many times in this book and can be considered fundamental in computer graphics. This fundamental equation is the expression for basic linear interpolation, Equation (9.1)

P(t) = (1 − t)P0 + tP1,

where P0 and P1 can be numbers, points, curves, and even images (as, for example, in Sections 2.31.1 and 8.5). It is easy to see that P(0) = P0 and P(1) = P1. In general,
P(t) is a mixture (or a combination) of a fraction (1 − t) of P0 and a fraction t of P1. Notice that these fractions add up to 1. When the parameter t is varied from 0 to 1, P(t) varies linearly from P0 to P1. (See the epigram at the end of Chapter 13.)

You know what? I can’t stand introductions. Weird coming from an author, right? It’s like some committee got together and said that you’ve got to have an introduction in your book. Oh, and please make it long. Really long! In fact, make it so long that it will ensure no one reads introductions.
—Matt Kloskowski, The Complete Guide to Photoshop Layers, 2011.
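The fundamental equation P(t) = (1 − t)P0 + tP1 is trivial to implement. Here is a minimal sketch (the helper name lerp is an illustrative assumption; interpolation is developed fully in Chapter 9):

```python
def lerp(p0, p1, t):
    """Return P(t) = (1 - t)*P0 + t*P1.

    Works for plain numbers and, component-wise, for points
    given as tuples of coordinates.
    """
    if isinstance(p0, (int, float)):
        return (1 - t) * p0 + t * p1
    return tuple((1 - t) * a + t * b for a, b in zip(p0, p1))

print(lerp(10, 20, 0.0))           # 10.0 -> P(0) = P0
print(lerp(10, 20, 1.0))           # 20.0 -> P(1) = P1
print(lerp((0, 0), (4, 2), 0.25))  # (1.0, 0.5), a quarter of the way
```

The same two-line idea blends colors, curves, and entire images once the multiplications and additions are applied component by component.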
Part I
Basic Techniques

The first part of this book deals with the basic concepts and techniques employed in computer graphics.

Chapter 1 is a short survey of the history of this field. It also lists many pioneers and their achievements, and follows with a detailed list of graphics resources.

Chapter 2 introduces the basic concepts of vector scan and raster scan. It also discusses many topics and operations that relate to bitmaps, including windows, the important BitBlt operation, bitmap scaling, bitmap rotation, inversion points, and many others.

The important operation of scan conversion is the topic of Chapter 3. Ideal geometric figures such as lines, circles, and ellipses are straight or smoothly curved, but digital images consist of pixels and are therefore rough, blocky, and pixelated. Scan conversion is the task of determining the best pixels for a given geometric figure. Several algorithms for scan converting lines and circles are described in this chapter, and it is shown that scan converting these figures is a simple problem in the sense that it can be solved by efficient DDA methods using just integers and simple arithmetic operations (additions, subtractions, and shifts). Many other scan conversion algorithms have been developed for generating polygons, parabolas, hyperbolas, and other common graphics figures. The chapter also discusses antialiasing, an operation related to scan conversion in that it employs grayscales to reduce the annoying effect of jagged edges.
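To give a taste of such integer-only scan conversion, here is a minimal Bresenham-style line sketch (a standard textbook formulation, not the exact algorithms of Chapter 3); note that the loop uses only additions, subtractions, doublings, and comparisons of integers:

```python
def bresenham_line(x0, y0, x1, y1):
    """Scan convert the line segment (x0, y0)-(x1, y1) into pixels.

    Integer-only error-tracking variant: at each step the error term
    decides whether to advance in x, in y, or in both.
    """
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:          # step in x
            err += dy
            x0 += sx
        if e2 <= dx:          # step in y
            err += dx
            y0 += sy
    return pixels

print(bresenham_line(0, 0, 3, 1))  # [(0, 0), (1, 0), (2, 1), (3, 1)]
```

The returned pixel list is the blocky approximation of the ideal line; antialiasing would soften it by shading neighboring pixels with intermediate grays.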
1 Historical Notes

We start our long voyage through computer graphics with a short survey of the history of this field and the names of some of its pioneers. This is followed by a detailed list of several types of available resources.
1.1 Historical Survey

The term “computer graphics” was coined in 1960 by William Fetter to describe what he was doing at Boeing at the time, but the history of computer graphics started in the early 1950s. (In 1950, Ben Laposky, a mathematician and artist from Iowa, created the first graphic images generated by an electronic machine. These were Lissajous figures that he dubbed oscillons and displayed on an oscilloscope, which is an analog device.) This is very early, considering that the history of the modern digital electronic computer itself began in the late 1940s. However, because of high hardware prices, the field was originally the domain of a few lucky individuals, and it was only in the 1970s that it started growing fast and eventually became the vast discipline that we know today. Here is a short chronology.

A curious note. For many years, the CRT (Cathode Ray Tube) was the main graphics output device. History tells us that this device was invented in 1885 and became practical in 1897, when Ferdinand Braun in Germany developed a CRT with a fluorescent screen. The screen would emit visible light when struck by a beam of electrons, and the device became known as the cathode ray oscilloscope.

By 1951, the Whirlwind computer installed at MIT had two 16-inch graphics displays (actually, modified oscilloscopes). Surprisingly, there were no immediate users. (This computer was part of the Whirlwind project, an electronic controller for a United States Navy flight simulator to train bomber crews. The project started in 1945.) Plotters (Section 26.13) came into use as graphics output devices in 1953.

D. Salomon, The Computer Graphics Manual, Texts in Computer Science, DOI 10.1007/978-0-85729-886-7_1, © Springer-Verlag London Limited 2011
In 1955, the SAGE (Semi-Automatic Ground Environment) air defense system started its operations. It used vector-scanned monitors as its main output and light pens as its input devices.

Digital Equipment Corporation (DEC) was founded in 1957. It started making minicomputers that were later used in the early development of computer graphics.

Light pens (Section 26.2.3) came into wide use in 1958, the same year as the first microfilm recorder.

In 1959, a partnership of General Motors and IBM produced the first piece of drawing software, the DAC-1 (Design Augmented by Computers). Users could input the three-dimensional description of a car, view the car in perspective, and rotate it.

It was in the 1960s that the field got its first big push. In 1961, Ivan Sutherland developed Sketchpad, a drawing program, as his Ph.D. thesis at MIT. Sketchpad used a light pen as its main input device and an oscilloscope (modified to do vector scan) as its output device. The first version handled two-dimensional figures only; it was later extended to draw, transform, and project three-dimensional objects, and also to perform engineering calculations such as stress analysis. One important feature of Sketchpad was its ability to recognize constraints. The user could draw, for example, a rough square, then instruct the software to convert it to an exact square. Another feature was the ability to deal with objects, not just individual curve segments. The user could build an object out of segments, then ask the software to scale it. Because of his pioneering work, Sutherland is often acknowledged as the grandfather of interactive computer graphics and graphical user interfaces. There are many Internet resources with information about and images of Sketchpad; see, for example, [sketchpad.wiki 10].

I just need to figure out how things work.
—Ivan Sutherland.

At about the same time, Steven Russell, another MIT student, developed the first video game, Spacewar! 
This program was written for the PDP-1 and was later used by DEC salespersons to demonstrate that minicomputer.

In 1963, the first computer-generated film, titled Simulation of a two-giro gravity attitude control system, was created by Edward E. Zajac at Bell Laboratories. Others at Bell, Boeing, and Lawrence Radiation Laboratory soon followed with more films.

The first digitizer (Section 26.8), the RAND tablet, appeared in 1964. Also in 1964, the first commercially available graphics computer, the IBM 2250 Graphics Display Unit, was announced as part of the historically important System/360. Like many old displays, the 2250 employed vector graphics on a 1024×1024 CRT that was refreshed up to 40 times per second. Characters were constructed of short line segments, and any characters and symbols could be displayed. As with any vector-scan graphics device, the refresh time became longer as more and more symbols were displayed, and the display eventually started to flicker. The 2250 used a light pen as an interactive input device.

In the mid-1960s, interest in computer graphics was picking up. More and more companies, such as TRW, Lockheed-Georgia, General Electric, and Sperry Rand, became active in the graphics field. At about the same time, David Evans and Ivan Sutherland founded their company, which made, among other things, vector-scan displays. Those displays are historically important since they gave a tremendous boost to computer graphics throughout the 1960s.

In 1966, Sutherland developed the first three-dimensional head-mounted display (HMD, Section 26.14.2). It displayed a stereoscopic pair of wire-frame images. This device was rediscovered in the 1980s and is commonly used today in virtual-reality applications.

In the late 1960s, both Sutherland and Evans were invited to develop a program in computer science at the University of Utah in Salt Lake City. Computer graphics quickly became the specialty of their department, which for years maintained its position as the primary world center for this field. Many important methods and techniques were developed at the UU computer graphics lab, among them illumination models, hidden-surface algorithms, and basic rendering techniques for polygonal surfaces. Names of UU students such as Phong Bui-Tuong, Henri Gouraud, James Blinn, and Ed Catmull are associated with many basic algorithms still in use today. Several accounts of computer graphics persons and projects at UU can be found on the Internet at http://www.cs.utah.edu/school/history/.

Computer graphics in the 1960s was out of the reach of most computer users because the special graphics hardware was expensive. There were no personal computers or workstations. Users had to pay for mainframe time by the second or buy expensive minicomputers. Display monitors used vector scan and were black and white. The result was that few computer professionals could develop computer graphics techniques and algorithms, and the software was noninteractive and non-portable.

The advent of the microprocessor, in the mid-1970s, was another factor in the rapid progress of computer graphics. Personal computers appeared on the market and suddenly anyone could afford to own a computer. This encouraged the formation of small companies that developed computer animation, mostly to be used in television commercials. 
Names such as Abel and Associates, Information International Inc., Digital Effects, and Systems Simulation Ltd. became well known and produced short pieces that demonstrated dazzling effects.

SIGGRAPH, the Special Interest Group on Computer Graphics (part of the ACM), was formed in 1969 and has grown in size and importance as the field of computer graphics expanded. The first of the many famous SIGGRAPH conferences was held in 1973. It attracted 1,200 attendees, and later conferences boasted as many as 30,000 participants and hundreds of exhibitors.

The famous Utah teapot (see Page 704 and Plate Z.6) was constructed in 1975. This is perhaps the best-known three-dimensional model in computer graphics. The original teapot this model is based on is displayed at the Computer Museum in Boston.

During the 1970s, activity in basic computer graphics research started moving from UU, first to NYIT, the New York Institute of Technology, and then to Lucasfilm. Computer animation and computer painting were two topics seriously developed at those places.

The technique of (and hardware for) raster scan was developed in the 1970s by Richard Shoup at Xerox Palo Alto Research Center (PARC). Workers in the field soon realized the advantages of raster scan, and the word “pixel” entered the field of computer graphics.

Like any other mature discipline, computer graphics eventually got its first periodical. Computer Graphics World started carrying news and reviews in late 1977.
Fractals, developed by Benoît Mandelbrot in the 1960s and 1970s, were applied to computer graphics in the late 1970s by Loren Carpenter and others.

It was in the 1980s that personal computers, most notably the Macintosh and Amiga, employed graphical user interfaces (GUIs) to interact with the user and to graphically display results with symbols, icons, and pictures, rather than text. The term “multimedia,” which originated around 1985, refers to the integrated use of text, images, animation, and audio in computers, which is why computer graphics is one of the main components of multimedia.

Ray tracing, a sophisticated rendering method (Section 17.5), was developed by Turner Whitted of Bell Labs and published in 1980.

Silicon Graphics Inc. (SGI) was founded in 1982 and has been building high-performance graphics computers ever since.

The technique of particle systems (Section 17.9) was developed in the early eighties at Lucasfilm. Morphing (Section 19.10) was developed at the same time at NYIT. The data glove (Section 26.14.1), currently very popular for virtual-reality applications, was developed at Atari in 1983.

Radiosity came out of Cornell University in 1984. This is a sophisticated rendering method that simulates light reflection between surfaces by determining the exchange of energy between them.

Graphical user interfaces appeared commercially in 1984 with the release of the first Macintosh computer. This personal computer immediately became, and still remains, a highly popular tool for computer graphics and is currently used by graphics amateurs and professionals, as well as by graphics-oriented businesses. The Amiga computer, made by Commodore, also had much success in the graphics field.

In 1985 came the first ISO standard, the High Sierra, for CD-ROMs. The Commodore Amiga personal computer was also introduced in the same year. It immediately became popular for what today is called multimedia applications. 
PostScript, the all-important page-description language (Section 20.5), came out of Adobe at about the same time.

The 1980s saw the emergence of raster-scan display monitors as the main graphics output device. This technology has benefited from experience gained with television and has resulted in the affordable, reliable color monitors of today. The late 1980s and early 1990s also saw the development of graphics standards such as GKS and PHIGS.

In the late 1980s, graphics computers made by SGI (Silicon Graphics Inc.) were used to create some of the first fully computer-generated short films at Pixar.

The Microsoft Windows 3.0 operating system was first shipped in 1990 and, of course, gave a tremendous boost to the concept of the GUI. More and more applications were developed to run under MS Windows.

OpenGL (Open Graphics Library, Section 20.4) was originated by SGI in 1992. The 1990s also saw rapid developments in three-dimensional graphics, especially in gaming, multimedia, and animation. Quake, one of the first fully 3D games, was released in 1996.

Released in 1995, Toy Story is the first full-length (79 minutes, which translates to about 114,000 animation frames at 24 frames per second) feature film that is completely computer-animated. It represents a milestone in computer graphics and
it marks the beginning of an era when computer graphics rendering techniques have become so sophisticated that viewers may find it impossible to tell whether an image is real or a clever rendering of a mathematical model. The explosion of CPU speeds and memory capacities since the late 1990s has resulted in more detailed and realistic digital images and animation, partly also due to powerful 3D-modeling software.

Reference [graphics.timeline 09] is a detailed timeline of important developments in computer graphics. Check also [hocg 06] and [masson 11]. Reference [morrison 10] is Michael Morrison’s history of computer animation. A short visual history of this area is [animation-tube 10]. Search YouTube.com for others.
1.2 History of Curves and Surfaces

Section 9.4 discusses lofted surfaces but does not explain the reason for this unusual name. Historically, shipbuilders were among the first to mechanize their operation by developing mathematical models of surfaces. Ships tend to be big, and the only dry place in a shipyard large enough to store full-size drawings of ship parts was the sail lofts above the shipyard’s dry dock. Certain parts of a ship’s surface are flat in one direction and curved in the other direction, so such surfaces became known as lofted.

In the 1960s, both car and aircraft manufacturers became interested in applying computers to automate the design of their vehicles. Traditionally, artists and designers had to make clay models of each part of the surface of a car or airplane, and these models were later used by the production people to produce stamp molds. With the help of the computer it became possible to design a section of surface on the computer in an interactive process, and then have the computer drive a milling machine to actually make the part. The box on Page 629 mentions the work of Pierre Bézier at Renault and Paul de Casteljau at Citroën, the contributions of Steven Coons to Ford Motors and of William Gordon and Richard F. Riesenfeld to General Motors, and the efforts of James Ferguson in constructing airplane surfaces.

As a result of these developments in the 1960s and 1970s, the area of computer graphics that deals with curves and surfaces became known, in 1974, as computer-aided geometric design (CAGD). Several sophisticated CAGD software systems were developed in the 1980s for general use in manufacturing and in other fields such as chemistry (to model molecules), geoscience (for specialized maps), and architecture (for three-dimensional models of buildings). 
Hardware developments in the 1980s made it possible to use CAGD techniques in the 1990s to produce computer-generated special effects for movies (an example is Jurassic Park), followed by full-length movies, such as Toy Story, Finding Nemo, and Shrek, that were entirely generated by computer. A detailed survey of the history of this field can be found in [Farin 04]. Several first-person historical accounts by pioneers in this field are collected in [Rogers 01].
1.3 History of Video Games

When personal computers started appearing, in the mid-1970s, many, among them computer professionals, questioned their usefulness. A common question was: Why would anyone want to have a computer at home? Typical answers were: To balance your checking account and To store your recipe collection. Few realized the correct answer, which is: To entertain and communicate. Today, those who have computers (and who hasn’t?) use them to communicate (by email, Internet telephone, and video chat), to be entertained (by watching movies, listening to music, and playing games), or to do both. This is why video games are so important. Because games are based on graphics, the following short history is included here.

1958. The experimental game Tennis for Two is implemented by William Higinbotham at Brookhaven National Laboratory. Even though it was interactive, today it is considered more an experiment than a game.

1966. A short paper by Ralph Baer describes ways to use a television receiver as an output device for interactive games.

1968. Spacewar! is finally finished and is demonstrated at MIT. This early game inspired the 1971 Computer Space by Nolan Bushnell.

1971. Computer Space, by Nolan Bushnell, is introduced and becomes the first coin-operated video game.

1972. Ralph Baer releases the Magnavox Odyssey, the first home video game. The very successful Pong game, also by Nolan Bushnell, is introduced.

1973. Several companies, among them Chicago Coin, Midway, Ramtek, Taito, Allied Leisure, and Kee Games, enter the video game market and give this young field a big boost.

1974. Tank, by Kee Games, becomes the first game to employ ROM (Read-Only Memory) for storing game data. TV Basketball, by Midway, features human figures as players.

1975. Gun Fight, by Midway, is introduced and becomes the first game to be based on a microprocessor. Atari releases Steeplechase, the first six-player arcade video game. Kee Games announces Indy 800, the first eight-player game. 
Lots of innovations in one year.

1976. The first video game chip, the AY-3-8500, is built by General Instruments. The first cartridge-based home game system, the Fairchild/Zircon Channel F, is introduced. Night Driver, by Atari, becomes the first game to simulate a first-person perspective. Atari also releases Breakout.

1977. After two active, successful years, the game market becomes saturated and very competitive. Several companies give up and quit the video game field as a result. The winners continue, and Atari introduces its VCS home console system (later renamed the 2600). Super Bug, by Kee Games, becomes the first game to offer four-directional scrolling, and Nintendo, a Japanese newcomer, releases its first home video game, Color TV Game 6.

1978. The very successful and familiar Space Invaders game, by Taito, appears this year. This game became the inspiration for
the many vertical shooting games that were introduced later. The technique of two-directional scrolling is introduced by Football, from Atari.

1979. Warrior, the first one-on-one fighting game, is made public by Vectorbeam. Two vector graphics games, Asteroids and Lunar Lander, are released by Atari. Namco introduces Galaxian, the first game with 100% RGB colors, and Puck-Man (later renamed Pac-Man), another success story and an inspiration to many.

1980. Pac-Man, Battlezone (the first arcade game to feature a true 3D environment), and Defender are released. All are influential and a source of ideas for competitors. Ultima becomes the first home computer game with four-directional scrolling. Star Fire becomes both the first cockpit game and the first video game to feature a high-score table using players’ initials.

1981. Donkey Kong, from Nintendo, and Tempest, by Atari, are released. The video game industry is growing fast, and experts are predicting a backlash.

1982. Q*bert, by Gottlieb, appears. Zaxxon, by Sega, is introduced and is advertised on television (probably the first game with this distinction). The predicted economic crash starts this year (could it be a self-fulfilling prophecy?).

1983. After several years of continued growth, the video game industry suffers from another recession. Nevertheless, new games appear. I, Robot, by Atari, is the first raster video game with filled-polygon three-dimensional graphics. Atari also comes up with Star Wars, and the Famicom system, from Nintendo, is released in Japan.

1984. Even though the general economic recession that started in 1981 is now over, the video game industry is still suffering economically. Halcyon, by RDI, becomes the first laserdisc-based home video game system.

1985. A new, improved version of the Famicom, renamed the Nintendo Entertainment System (NES), is released in America. It becomes so popular that it single-handedly reverses the crash and revives the video game industry. Super Mario Bros. 
is introduced by Nintendo and immediately becomes a best seller (it seems to be the best-selling game ever). In the Soviet Union, Alexey Pajitnov designs Tetris, another influential game.

1986. Nintendo’s Famicom game system benefits from the first of the many Zelda games. Taito’s Arkanoid and Bubble Bobble games appear in video game arcades. Sega releases its successful Sega Master System (SMS).

1987. The Manhole, by Cyan, becomes the first computer game released on CD-ROM. Maniac Mansion, by LucasArts, is the first adventure game with a point-and-click interface. Driller, from Incentive Software, is a personal computer game with stunning 3D graphics. Double Dragon is released by Taito.

1988. The notable game production for this year includes Assault, by Namco, NARC, by Williams (the first game to run on a 32-bit processor), and Super Mario Bros. 2, by Nintendo.

1989. The list for this year includes Atari’s Hard Drivin’ and S.T.U.N. Runner, Gottlieb’s Exterminator (the first game to have all digitized imagery for its backgrounds), Nintendo’s Game Boy and Atari’s Lynx (two handheld video game consoles), and the Sega Genesis (a home console, not a game).

1990. This is the year SimCity is released, by Maxis. Designed by Will Wright, SimCity is the first in a long line of Sim games. Among other new games, Nintendo releases Super Mario Bros. 3, Sega’s Game Gear is released in Japan, and Squaresoft’s Final Fantasy series is sold in North America.
16
1.3 History of Video Games
1991. New releases continue with the Super Nintendo Entertainment System (SNES), Street Fighter II, by Capcom, and Sonic the Hedgehog, by Sega. 1992. Mortal Kombat is developed by Midway (psychologists recommend replacing “C” by “K” as a means of attracting customers’ attention). The best seller The 7th Guest is released by Virgin Games. Virtua Racing, a 3D racing game, is a new game by Sega. Another familiar title is Wolfenstein 3D, by id Software. Dactyl Nightmare, by Virtuality, employs a virtual reality headset and gun interface. 1993. Another best seller, Myst, from Cyan, appears this year, together with Doom, by id Software, and Virtua Fighter, by Sega. This is also the year when the World Wide Web expands rapidly. 1994. New, important games include Donkey Kong Country (Nintendo), Warcraft (Blizzard), Daytona USA (Sega), and Warcraft II (Blizzard). Two new game systems, the Sega Saturn and the Sony PlayStation, are released in Japan, and SNK’s NeoGeo home console system is introduced. 1995. The Sega Saturn and the Sony PlayStation are now available in America. Donkey Kong Country 2: Diddy’s Kong Quest, by Nintendo, is among the new game releases this year. 1996. Virtual Boy, a portable, stereo game system with a separate screen for each eye, is released by Nintendo. At last, it is possible to obtain a degree in video game development (from DigiPen Institute of Technology). 1997. The first GameWorks arcade opens in Seattle by DreamWorks, Sega, and Universal. Ultima Online, the first of the MMORPG Ultima games, appears. Designed by Richard Garriott, these are multiplayer games where the action occurs in an RPG (Role-Playing Game) world. This genre has influenced many of the most successful and popular titles that followed. Other notables are Riven, the sequel to Myst, by Cyan, Top Skater, with a skateboard interface, by Sega, and Mario Kart 64, by Nintendo. 1998. 
Dance Dance Revolution and the first games in the Beatmania and GuitarFreaks series, all by Konami, appear. Game Boy Color (Nintendo), Half-Life (Sierra Studios), and Grand Theft Auto (Rockstar Games) are among the notables this year. SNK releases the NeoGeo Pocket handheld video game system. 1999. The list includes Dreamcast (Sega), Donkey Kong 64 (Nintendo), and Tony Hawk’s Pro Skater (Activision). The first Independent Games Festival is held at the Game Developers Conference. 2000. An important announcement: Nintendo has just sold its 100 millionth Game Boy console. Sony introduces its PlayStation 2. Another milestone: The United States Post Office issues a stamp depicting video games. Have video games finally arrived? 2001. This is the year of the Xbox, by Microsoft, but the Nintendo GameCube is also introduced in 2001. Two surprises: Sega leaves the home video game console market and Midway Games quits the arcade video game industry. 2002. The Sim series of games becomes a best seller and the MMORPG The Sims Online starts. Microsoft announces its Xbox Live online gaming service. Sega releases Rez for the PlayStation 2. 2003. This year marks the start of the MMORPG Star Wars Galaxies. Enter the Matrix, by Atari, is released. Nokia releases the N-Gage handheld video game system. 2004. The PlayStation Portable (Sony) is released in Asia. The Nintendo DS (dual screen) handheld video game system is also introduced.
1 Historical Notes
2005. The Xbox 360 (Microsoft) and the Gizmondo (Tiger Telematics) are released. The Sims series appears on postage stamps in France. We thus conclude that: Yes, video games have arrived. 2006. Wii (Nintendo) and PlayStation 3 (Sony) are released. 2007. World of Warcraft, an MMORPG, is estimated to have more than nine million players worldwide. 2008. Apple enters the field of mobile gaming hardware with the release of the iPhone and iPod Touch in the summer. Software for these platforms is sold only online. Nintendo announces its Wii MotionPlus module. 2009. Sony releases its PSP Go. This device is a newer, slimmer version of the PSP. Microsoft and Sony present their new motion controllers: Project Natal (later renamed Kinect) and PlayStation Move, respectively. A few cloud computing services are announced, targeted at video games. 2010. The Nintendo 3DS, the successor to the Nintendo DS, is announced (it was released in early 2011). The new Xbox 360 console (referred to as the Xbox 360 S or Slim) is revealed by Microsoft. See also [Wolf 08]. Life is a video game. No matter how good you get, you are always zapped in the end. —Anonymous.
1.4 Pioneers of Computer Graphics
The following is an alphabetical list of many pioneers, researchers, and notable figures of this important field. (A writer’s apology. Any such list is necessarily incomplete. I apologize in advance for any omissions. They are unintentional and, when brought to my attention, will be included in the errata list of the book.) Bill Atkinson, developed graphics algorithms and implemented the revolutionary MacPaint software for the Macintosh, as well as HyperCard, QuickDraw, and the Macintosh menu bar. Michael Barnsley, worked on fractals. Brian A. Barsky, developed beta splines (with tension). Richard H. Bartels, codeveloper of the Kochanek-Bartels splines. Pierre Étienne Bézier, created efficient algorithms for curves and surfaces. Jim F. Blinn, artist-mathematician-programmer, the originator of many graphics algorithms, implementations, and video productions. Jack Elton Bresenham, came up with extremely efficient methods for scan converting straight lines and circles. Tom Brigham, worked on morphing in 1982.
Nolan Kay Bushnell, created early video games.
Loren Carpenter, programmer and researcher, implemented fractal methods to draw realistic terrain and mountains. Paul de Faget de Casteljau, developed Bézier curves and surfaces using an approach radically different from that of Bézier. Ed Catmull, cofounded Pixar, a maker of digital films and videos. George Merrill Chaikin, developed subdivision methods for curves. Jim Clark, entrepreneur and scientist. Founder, in 1982, of Silicon Graphics. Steven Anson Coons, an early researcher in the field of surface design and implementation. Charles Csuri, an early pioneer in computer animation and digital fine art. Carl Wilhelm Reinhold de Boor, a pioneer in the application of splines to curves. Tom A. DeFanti is a distinguished computer graphics researcher and pioneer. Tony DeRose is a distinguished computer graphics researcher. Daniel Doo, developed Doo–Sabin subdivision surfaces. Gerald Farin is a distinguished computer graphics researcher. Charles Geschke developed PostScript and cofounded Adobe. Henri Gouraud, originated the shading algorithm named after him. Donald Peter Greenberg, a leading educator and innovator in computer graphics and the chief developer of radiosity. Charles Hermite, a 19th-century mathematician who originated the interpolation method named after him. Alan Kay, originated the notion of a graphical user interface (GUI). He said “the best way to predict the future is to invent it.” Kenneth C. Knowlton is a computer graphics pioneer, artist, mosaicist, and portraitist. Doris H. U. Kochanek, codeveloper of the Kochanek-Bartels splines. Joseph-Louis Lagrange is the father of the Lagrange polynomial. Charles T. Loop developed Loop subdivision surfaces. Benoît B. Mandelbrot, a major figure in the field of fractals. Martin Newell, developed the Newell algorithm for hidden surface removal and created the Utah teapot, which was made famous by Frank Crow. A. Michael Noll is a pioneer of computer art. 
Phong Bui Tuong, originated the shading algorithm named after him.
Richard F. Riesenfeld, developer of B-splines. Ton Roosendaal is the original creator of Blender. Steve Russell is a programmer and scientist known for creating Spacewar!, one of the earliest video games, in 1961. Malcolm A. Sabin, codeveloper of the Doo–Sabin subdivision surfaces. Daniel J. Sandin is a video and computer graphics artist and researcher. He was part of the team that developed the first data glove. Alvy Ray Smith, entrepreneur and researcher. Founder of, or active in, several computer graphics enterprises. Ivan Edward Sutherland, designed and implemented Sketchpad, an early 2D and 3D graphics system. Cofounded Evans and Sutherland. Other achievements too numerous to list here. Dick A. Termes, an internationally acclaimed artist painting in n-point perspective on spheres. John Warnock developed PostScript and cofounded Adobe. Turner Whitted, a pioneer of ray tracing. Edward E. Zajac created the first computer-generated film.
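The Bresenham entry above refers to scan converting straight lines with remarkable efficiency. The point of the method is that it uses only integer additions, subtractions, and comparisons (no multiplications or floating-point arithmetic). The following is a minimal sketch of the standard textbook formulation, not code from this book:

```python
def bresenham_line(x0, y0, x1, y1):
    """Return the list of pixels on a line from (x0,y0) to (x1,y1).

    Works in all octants; uses integer arithmetic only, which is
    the source of the algorithm's efficiency.
    """
    points = []
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx = 1 if x0 < x1 else -1      # step direction along x
    sy = 1 if y0 < y1 else -1      # step direction along y
    err = dx - dy                  # running error term
    while True:
        points.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 > -dy:               # error says: step in x
            err -= dy
            x0 += sx
        if e2 < dx:                # error says: step in y
            err += dx
            y0 += sy
    return points
```

For example, bresenham_line(0, 0, 5, 2) visits six pixels, stepping in x every iteration and in y only when the accumulated error demands it.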
1.5 Resources For Computer Graphics
The following types of resources are listed here: (1) organizations and societies, (2) research institutions, (3) universities, (4) journals, (5) books, (6) other graphics-related websites, and (7) software. There are hundreds of academic websites, and the ones listed here were selected (somewhat arbitrarily) from hundreds of easily located similar URLs. Similarly, there are hundreds of textbooks on computer graphics, and the few listed here have been selected because they offer complete coverage and are also familiar to me. The reader should bear in mind that the field and its resources develop and change constantly, so readers should search the Internet for new, useful, and exciting sources of information. Search terms may be selected from “history of computer graphics,” “computer graphics,” “computer animation,” “image processing,” “computer vision,” “computer-aided design (CAD),” and many other phrases. A few resources are also listed on Pages 196 and 431. 1. Societies. SIGGRAPH, the ACM special interest group on graphics, is perhaps the single most important source of information about computer graphics. It is located at http://www.siggraph.org/ and offers a wealth of information on art and design, computer graphics events, a computer graphics bibliography database, jobs and careers in the field, an industry directory, and many related links worldwide. SIGGRAPH is also
known for its publications (Computer Graphics Quarterly, SIGGRAPH Video Review, and ACM Transactions on Graphics) and annual conferences. Eurographics is the European Association for Computer Graphics. Find it at http://www.eg.org/. CGsociety, at http://forums.cgsociety.org/, is the society of digital artists. Its mission is to cater to 3D animation and visual effects developers. http://www.vrs.org.uk/ is The Virtual Reality Society (VRS). This is an international society dedicated to the discussion and advancement of virtual reality and synthetic environments. 2. Research Organizations. http://www.wavelet.org/. This site offers several services intended to foster the exchange of knowledge and viewpoints related to the theory and applications of wavelets. It is relevant to computer graphics because of the applications of wavelets to image compression. http://www.igd.fhg.de/ is the address of the Fraunhofer Institute for Computer Graphics Research (IGD), which concentrates on the development of product prototypes (hardware and software) and the realization of concepts, models, and solutions for computer graphics. Look at http://www.ccg.pt/ for information on the Computer Graphics Center, devoted to research and development in virtual reality, multimedia systems, electronic commerce, and other graphics-related topics. 3. Academic Websites. Free lecture notes, an image gallery, and students’ assignments, projects, and examinations are available from MIT at http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/. http://www.cs.brown.edu/exploratories/freeSoftware/home.html is located at Brown University. It offers a collection of Java applets for learning about computer graphics, including color theory, imaging (convolution and filters), viewing techniques, coordinate systems, splines, and texture mapping. Site http://www.eecs.berkeley.edu/Research/Projects/Areas/GR.html lists graphics projects at the University of California, Berkeley. 
Graphics-related research at the California Institute of Technology (Caltech) can be found at http://www.gg.caltech.edu/. It focuses primarily on the mathematical foundations of computer graphics. Cornell University, at http://www.graphics.cornell.edu/, has an advanced program in computer graphics. http://www.cg.inf.ethz.ch/ will tell you all about the Computer Graphics Laboratory at the Eidgenössische Technische Hochschule (ETH) in Zürich.
Georgia Tech (GaTech), at http://gvu.cc.gatech.edu/, has a strong graphics program. Check out http://groups.csail.mit.edu/graphics/ for the graphics program at MIT. Computer Graphics, Visualization and Animation at the University of Utah is located at http://www.cs.utah.edu/research/areas/graphics/. http://mambo.ucsc.edu/psl/cg.html is a jumping-off point to many sites that deal with computer graphics. A similar site is http://www.cs.rit.edu/~ncs/graphics.html, which also has many links to computer graphics sites. A very extensive site of computer-graphics-related pointers is http://ls7-www.informatik.uni-dortmund.de/html/englisch/servers.html. See http://graphics.idav.ucdavis.edu/education/GraphicsNotes/ for computer graphics course notes at the Computer Science Department, University of California, Davis. Site http://ls7-www.cs.uni-dortmund.de/cgotn/ lists many links to computer graphics resources. 4. Journals, Magazines, and Conference Proceedings. See http://www.siggraph.org/publications/newsletter for The ACM SIGGRAPH Computer Graphics Quarterly. This is the official publication of the ACM SIGGRAPH organization. It is published in February, May, August, and November of each year. See http://www.siggraph.org/publications/video-review for the SIGGRAPH Video Review, an important video-based publication. It illustrates the latest concepts in computer graphics and interactive techniques. ACM Transactions on Graphics is the premier peer-reviewed journal of graphics research. See more information at http://www.siggraph.org/publications/tog. The proceedings (including visual material, such as the Computer Animation Festival) of the annual ACM SIGGRAPH conferences are published extensively. See http://www.siggraph.org/publications/acm-siggraph-conference-documentation. A satire on the 1992 conference, by Steve Connelly and Tim Hall, can be found at http://steve.hollasch.net/cgindex/misc/sgsatire.html. 
Computer Graphics World is the premier authority on innovative graphics, technology, and applications. It is located at http://www.cgw.com/. http://crossings.tcd.ie/ is a multidisciplinary online journal that explores the areas where technology and art intersect. http://computer.org/cga is the home of IEEE Computer Graphics and Applications, a bimonthly magazine that covers a variety of topics catering to both computer
graphics practitioners and researchers. This popular publication bridges the theory and practice of computer graphics, from specific algorithms to full system implementations. See http://jgt.akpeters.com for the journal of graphics, gpu, and game tools, a quarterly journal whose primary mission is to provide the computer-graphics research, development, and production communities with practical ideas and techniques that solve real problems. Ray Tracing News Guide is a website maintained by Eric Haines, who also compiles its content. It is located at http://tog.acm.org/resources/RTNews/html/. This is not a formal journal. It periodically features news of ray tracing. Animation Magazine is a monthly publication covering the entire animation field, computer and otherwise. Check it at http://www.bcdonline.com/animag/. Digital Imaging is a bimonthly reporting on the digital imaging industry. 5. Books. James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes, Computer Graphics: Principles and Practice in C, Addison-Wesley Professional, 2nd ed., 1995. This textbook is a classic. It covers all the important topics and areas of the field. It is not easy reading, though, and some may claim that it is showing its age. Peter Shirley and Steve Marschner, Fundamentals of Computer Graphics, A K Peters, 3rd revised ed., 2009. Covers the basic topics (except computer animation). Francis S. Hill Jr. and Stephen M. Kelley, Computer Graphics Using OpenGL, Prentice Hall, 3rd ed., 2006. Very clear and easy to read. The treatment of ray tracing is especially comprehensive. There are many practice exercises. Donald D. Hearn and M. Pauline Baker, Computer Graphics with OpenGL, Prentice Hall, 3rd ed., 2003. In addition to its full coverage of the CG field, this book offers a nice OpenGL user’s manual. Graphics Gems is a series of books, started in 1990 by Andrew S. Glassner and written by him and others. See http://www.graphicsgems.org/. 
There are hundreds of other books on this subject, mostly concentrating on specific topics, such as ray tracing, curves and surfaces, programming in OpenGL, and computer animation. The colonists did not have a library at their disposal, but the engineer was a book that was always ready, always open to the page needed, a book which answered all their questions and which they often leafed through. —Jules Verne, The Mysterious Island, 1874. 6. Other Websites. The following UseNet groups are dedicated to various aspects of computer graphics: comp.graphics.animation, comp.graphics.digest, comp.graphics.opengl, comp.graphics.raytracing, and comp.graphics.visualization.
3D ARK (http://www.3dark.com/) is a free online archive of 3D-related content and resources for 3D enthusiasts. http://www.colormatters.com/ is a resource for all things color. http://www.computergraphica.com/ is all about human-pixel interaction. The Graphics File Formats page is at http://www.martinreddy.net/gfx/. http://www.opengl.org/ is the chief site for OpenGL, the open graphics library. Site http://steve.hollasch.net/cgindex/index.html is Steve’s Computer Graphics Index. This is a collection of topics related to computer graphics. It is maintained by Steve Hollasch. Nan’s Computer Graphics Page (http://www.cs.rit.edu/~ncs/graphics.html) is a list of links to places that offer help and information related to computer graphics. The site is owned by Nan Schaller. http://www.refdesk.com/compgrah.html is an encyclopedia of facts and topics of computer graphics. The Persistence of Vision Raytracer is a high-quality, totally free tool for creating stunning three-dimensional graphics. It is available in official versions for several popular platforms. See http://www.povray.org/. GRAFICA Obscura, at http://www.graficaobscura.com/?/, is an evolving computer graphics notebook. This is a compilation of technical notes, pictures, and essays that Paul Haeberli has accumulated over the years. It seems that this site is no longer being maintained. Website http://i33www.ira.uka.de/applets/mocca/html/noplugin/ has applets for computer-aided geometric design (CAGD). http://www.faqs.org/faqs/graphics/faq by John Grieggs contains answers to frequently asked questions on graphics. 7. Graphics Software. Most graphics applications can be classified as two-dimensional or three-dimensional (abbreviated here as 2D and 3D, respectively). In the former class we find painting, drawing, illustration, drafting, and CAD programs. The latter class consists of modelers and renderers. A painting program includes tools such as brush, spray paint, pencil, and eraser. 
A picture can be painted by moving these tools with the mouse, pad, or other pointing device. Editing a painting is very difficult, because the individual graphics elements are not saved by the program. Once a stroke has been painted, it may be impossible to erase. The (now obsolete) MacPaint program of 1984–1988 was perhaps the first successful painting program. A drawing program offers tools such as a rectangle, ellipse, line, arc, and curve. Each element drawn is saved by the program as a set of data (or control) points. Thus, it is possible to select any element, edit it, move it, or delete it. An illustration program
includes colors, patterns, and textures. A drafting program allows the accurate drawing of graphics elements with precise dimensions. A CAD program may output the drawing in a special format to another program that drives a cutting tool to actually manufacture an item. A 3D modeler allows the user to construct any three-dimensional object in the computer and save it in a special 3D format, whereas a renderer may render such an object accurately, often by tracing every ray of light and computing its reflections from several surfaces. For many years, graphics software remained primitive because computers that were fast enough to process images and compute accurate renderings tended to be expensive. It was only in the late 1980s that several 2D programs were introduced and were slowly developed over the years. The rapid development of fast personal computers and large color display monitors in the 1990s paved the way for sophisticated 2D and 3D graphics software, and today, in 2011, we witness an explosion of such programs. The following survey is necessarily incomplete, but it lists the most important graphics applications available today. I apologize in advance for any omissions; they are unintentional. Mathematica is general-purpose mathematical software that can perform numeric calculations and symbolic manipulations and display its results graphically. Both 2D and 3D graphics are supported, with many options allowing the user to precisely specify what will be displayed and how. This software was conceived by Stephen Wolfram and was first introduced in 1988. It is currently in version 8. Even though Mathematica is not a graphics application, it is listed here because of its powerful graphics capabilities. See http://en.wikipedia.org/wiki/Mathematica. Matlab, by The Mathworks (http://www.mathworks.com/) is a software tool designed mostly for numerical computations. 
Its capabilities can be extended by individual packages that allow detailed computations and study of topics such as wavelets, symbolic manipulations, and neural networks. Matlab can plot the curves and surfaces it computes, which makes it an important graphics program. The name Matlab is short for Matrix Laboratory, because all variables in Matlab, even scalars, are matrices (there is also an upazila, a subdistrict, called Matlab in Bangladesh).
2D Applications
Adobe Photoshop is a graphics editing program developed by Adobe Systems. The software can edit and process bitmap images (i.e., images where only the pixels are given, without any geometric information). The main features of the program include layers with masks, color spaces, ICC profiles, transparency, text, alpha channels and spot colors, clipping paths, and duotone settings. See http://www.adobe.com. Adobe Illustrator is a vector-based image editor, now at version 15, designed for illustrations. First released in 1988, the software immediately became popular due to its chief innovation, a simple, natural way to draw curves, based on cubic Bézier curves (Section 13.2). Over the years, many features have been added, including some 3D capabilities to extrude and revolve shapes. Among the most recent sophisticated features and tools of Illustrator are Live Trace, Live Paint, a control palette and custom workspaces, aligning individual points, multiple
crop areas, the Color Guide panel, the Live Color feature, the ability to create multiple artboards, a blob brush, a gradient tool, a perspective grid tool, and a bristle brush. In spite of the vast literature (books, videos, and other training materials) that exists for this program, it takes years to master all its capabilities and power. For more information see http://en.wikipedia.org/wiki/Adobe_Illustrator. GIMP is a free image manipulation program for tasks such as photo retouching, image composition, and image authoring. It is available from [GIMP 05] for many operating systems, in many languages. Inkscape is an editor for images in vector format. It runs on several computer platforms, is free, and is distributed under the GNU license. Its developers intend Inkscape to become a powerful graphics tool while being fully compliant with the XML, SVG, and CSS standards. At the time of writing, Inkscape is under active development, with new features being added regularly. See http://www.inkscape.org/. CorelDRAW is a vector-based graphics suite, offering more than just a vector graphics editor. It has been developed by http://www.corel.com/. It supports many powerful, useful features such as layout, tracing, photo editing, Web graphics, and animation. Having powerful competitors, CorelDRAW tries to improve on them in several ways, the most important of which is its being a suite of programs. It consists of the following applications:
* CorelDRAW: vector graphics editing software
* Corel PHOTO-PAINT: raster image creation and editing software
* Corel CONNECT: content organizer
* Corel CAPTURE: enables several methods of image capture
* Corel PowerTRACE: converts raster images to vector graphics (available inside the CorelDRAW suite)
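The contrast between Corel PHOTO-PAINT (raster) and CorelDRAW (vector), like the Photoshop/Illustrator contrast above, comes down to what is stored. The following sketch (hypothetical minimal representations in Python, not tied to any of these products) shows why a vector element stays editable while a rasterized one does not:

```python
# Vector form: the element is stored as geometry (its data points),
# so editing means simply changing the stored data.
segment = {"type": "line", "p0": (0, 0), "p1": (3, 3)}
segment["p1"] = (3, 2)  # move an endpoint: still a perfect line

def rasterize_segment(x0, y0, x1, y1, width, height):
    """Sample a line segment onto a width x height grid of pixels
    (a naive point-sampling rasterizer, for illustration only)."""
    grid = [[0] * width for _ in range(height)]
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(steps + 1):
        t = i / steps
        x = round(x0 + t * (x1 - x0))
        y = round(y0 + t * (y1 - y0))
        if 0 <= x < width and 0 <= y < height:
            grid[y][x] = 1
    return grid

# Raster form: after rasterization only pixels remain; the endpoints,
# and the fact that this was ever a "line," are lost. This is why
# editing a painted stroke is so difficult.
pixels = rasterize_segment(*segment["p0"], *segment["p1"], 4, 4)
```

A program such as PowerTRACE works in the opposite, much harder direction: it must infer geometry from pixels alone.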
Graphing Calculator is a tool for quickly visualizing mathematical objects and results. The user types an equation and the software computes and displays it without complicated dialogs or commands. See http://www.nucalc.com/. K3DSurf is software for plotting surfaces or higher-dimensional manifolds. See http://k3dsurf.sourceforge.net/. It can plot equations of three variables and parametric expressions for higher-dimensional surfaces, and can also morph two images based on a variable. 3D-XplorMath, at http://3d-xplormath.org/, is software for the visualization of geometric objects and processes. Surfaces, both 2D and 3D curves, complex-valued functions, and differential equations can all be plotted and animated. This software is particularly useful for those working in differential geometry, differential equations, or minimal surfaces. Virtual Math Labs. A group at the Technical University of Berlin has developed a number of programs for exploring curves and surfaces. See http://www.math.tu-berlin.de/geometrie/lab/curvesnsurfaces.shtml.
Gnuplot, at http://www.gnuplot.info, is a portable command-line-driven graphing utility for Linux, OS/2, MS Windows, OS X, VMS, and many other platforms. It was originally created to allow scientists and students to visualize mathematical functions and data interactively, but it has grown to support many non-interactive uses such as web scripting.
Windows Programs for Plotting Curves and Surfaces
MathGV: http://www.mathgv.com/. 3D Grapher: http://www.romanlab.com/. Graphmatica: http://www8.pair.com/ksoft/. Graphis: http://www.kylebank.com/. DPGraph: http://www.dpgraph.com/. Advanced Grapher: http://www.serpik.com/. MathGrapher: http://www.mathgrapher.com/.
3D Graphics Applications
POV-Ray, at http://www.povray.org/, is mostly a 3D ray-tracing renderer, but it has many commands and options that make it possible to build quite complex models. The acronym POV stands for persistence of vision. Several modelers, such as Bishop3D and LionSnake, output POV-Ray code that can be rendered directly. The program SU2POV converts the output of SketchUp to POV-Ray code. SketchUp, at http://sketchup.google.com/, is a 3D modeler designed for architects, civil engineers, filmmakers, game developers, and related professions. It was acquired by Google in 2006, so it also includes features to facilitate the placement of models in Google Earth. It was originally developed in 1999–2000, and it immediately became clear that its developers had hit on the right way to manipulate 3D objects on the 2D monitor screen. The basic SketchUp is free, while the pro version is commercial. The chief innovation of SketchUp is its Push/Pull technology, described in its patent application as follows: “System and method for three-dimensional modeling: A three-dimensional design and modeling environment allows users to draw the outlines, or perimeters, of objects in a two-dimensional manner, similar to pencil and paper, already familiar to them. 
The two-dimensional, planar faces created by a user can then be pushed and pulled by editing tools within the environment to easily and intuitively model three-dimensional volumes and geometries.” There are quite a few books on this useful software. Many training videos are available at http://sketchup.google.com/training/videos.html. A feature of SketchUp is the 3D Warehouse that lets SketchUp users search for models made by others and contribute models. Maya, by Autodesk, is a high-level application for 3D animation, 3D modeling, simulation, visual effects, rendering, matchmoving, and compositing. Maya is used for animation in film and television, in commercials, video games, and architectural visualization and design. See http://en.wikipedia.org/wiki/Maya_(software). The name Maya has nothing to do with the Maya civilization or with Maya Angelou. It is simply the Sanskrit term for “illusion.”
Modo is a powerful polygon and subdivision surface modeling and rendering tool. It also supports morphing, sculpting, 3D painting, and animation. Modo is developed by Luxology (http://www.luxology.com/modo/) and is currently at version 501. Modo is heavily used both by commercial entities (television, film, and video games) and by private individuals. The program incorporates features such as n-gons, 3D painting, and edge weighting, and runs on Mac OS X and Microsoft Windows platforms. Because of its large user base, huge libraries of textures, scenes, studio lighting, and special effects (such as hair, splashes, and water/fog) are available for Modo. There is also much training material. See http://en.wikipedia.org/wiki/Modo_(software). Carrara, from http://www.daz3d.com, is a 3D modeler and renderer. Judging from the examples displayed in its gallery, its users tend to develop models of humans, animals, and aliens. Bryce, also from http://www.daz3d.com, is a 3D modeling and animation package that purports to be the first name in 3D landscapes. It combines powerful features with a smart and simple user interface to create realistic digital landscapes. Blender, from http://www.blender.org, is free software with all the features of commercial programs. It has many active users, and its price makes it the default choice for many people. However, its user interface is non-standard, which creates a feeling of user-unfriendliness. Some users complain about incomplete documentation. ZBrush, from http://www.pixologic.com, is a 3D painting and sculpting tool. An object is created by selecting a brush and using it to paint, chisel, and mold. 3ds Max, from http://www.autodesk.com/3dsmax, is considered by many the equivalent of Maya because of its similar set of features. Softimage, from http://www.softimage.com, is among the most advanced 3D animation and character-creation packages for video games and movies. 
It uses several non-proprietary languages for its scripting, but its current user base seems small, perhaps because of its price. Many other modelers and renderers are currently available. A detailed list can be found at http://en.wikipedia.org/wiki/3D_computer_graphics_software. The following is an alphabetical list with just the names of the most important ones: 3ds Max (Autodesk), AC3D (Inivis), Aladdin4D (DiscreetFX), Cinema 4D (MAXON), CityEngine (Procedural Inc.), Cobalt, Electric Image Animation System (EI Technology Group), form’Z (AutoDesSys, Inc.), Houdini (Side Effects Software), Inventor (Autodesk), LightWave 3D (NewTek), MASSIVE, NX (Siemens PLM Software), Silo (Nevercenter), solidThinking (solidThinking), Solid Edge (Siemens PLM Software), SolidWorks (SolidWorks Corporation), Swift 3D (Electric Rain), trueSpace (Caligari Corporation), and Vue (Eon Software). Yesterday is history. Tomorrow is a mystery. And today? Today is a gift. That’s why we call it the present.
—Babatunde Olatunji
2 Raster Graphics “An image is worth a thousand words” is a well-known saying. It reflects the truth because the human eye–brain system is a high-capacity, sophisticated data processor. We can “digest” a huge amount of information if we receive it visually, as an image, rather than as a list of numbers. This is the reason for the success of computer graphics. However, if one image conveys a lot of information, many images are even more informative. This is why computer animation is so popular. It can bring to life impossible or non-existent objects and can teach and entertain. So it is not just science—reasoning about the physical world—that involves virtual reality. All reasoning, all thinking, and all external experience are forms of virtual reality. —David Deutsch, The Fabric Of Reality. Computer graphics has progressed over the years in three stages. The first stage was to develop hardware and algorithms and software to compute, construct, and display a single image consisting of smooth, curved, realistic-looking surfaces. The second stage was to extend the basic algorithms in order to create and display an entire animation made of many frames, where each frame is an image. The third stage is virtual reality, where computer graphics goes one step beyond passively watching animation. The main features of virtual reality are the following: 1. Interaction. Once a virtual three-dimensional world is built, a user can walk through it at will, often also “grabbing” objects and manipulating them. 2. Realistic views. With the use of a special helmet (or a head-mounted display, Section 26.14.2) where each eye watches its own display, the user can look in one direction while moving in another. We say that this adds degrees of freedom to the display. The helmet also allows a stereo pair of images to be displayed, adding to the visual realism. Exercise 2.1: Use your crystal ball (or, in its absence, the answer provided here) to predict the next stage beyond virtual reality. 
D. Salomon, The Computer Graphics Manual, Texts in Computer Science, DOI 10.1007/978-0-85729-886-7_2, © Springer-Verlag London Limited 2011
2.1 Pixels A digital image is a rectangular array of dots, or picture elements, arranged in m rows and n columns. The expression m × n is called the resolution of the image, and the dots are called pixels (except in the cases of fax images and video compression, where they are referred to as pels). The term resolution is sometimes also used to indicate the number of pixels per unit length of the image. Thus, dpi stands for dots per inch. Images are all around us. We see them in color and in high resolution. Many objects (especially artificial objects) seem perfectly smooth and continuous, with no jagged edges and no graininess. Computer graphics, on the other hand, deals with images that consist of small dots, pixels. The term pixel stands for “picture element” (see [Lyon 09] for a lively survey of the history of this important term). When we first hear of this feature of computer graphics, we tend to dismiss the entire field as trivial. It seems intuitively obvious that an image that consists of dots would always look discrete, grainy, rough, and inferior to what we see with our eyes. Yet state-of-the-art computer-generated images are often difficult or impossible to distinguish from their real counterparts, even though they are discrete, made of pixels, and not continuous. (See also Section 21.3 for a discussion of human vision and the resolution of the eye.) A similar dichotomy between discrete and continuous exists in art. Many painters try to mimic nature and employ wide, continuous brush strokes to paint smooth and continuous pictures, while others choose to be pointillists. They create a painting by placing many small dots on their canvas (see Section 2.29 for a similar approach). The most important pointillist was the 19th century French impressionist Georges Seurat. 
Seurat was a leader in the late 19th century neo-impressionism movement, a school of painting that uses tiny brushstrokes of contrasting colors to achieve a delicate play of light and create subtle changes in form. Seurat used this technique, which became known as pointillism or divisionism, to create large paintings made entirely of small dots of pure color. The dots are too small to be distinguished when looking at the work in its entirety, but they make his paintings shimmer with brilliance. His most well-known works are Une Baignade (1883–1884) and Un Dimanche Après-midi à l'Île de la Grande Jatte (1884–1886). The art critic Arsène Alexandre had this to say about the latter painting: "Everything was so new in this immense painting—the conception was bold and the technique one that nobody had ever seen or heard before. This was the famous pointillism." Most engineers, programmers, and users think of pixels as small squares, and this is generally true for pixels on computer monitors. Pixels in other digital output devices (displays or printers) may be rectangular or circular. However, in principle, a pixel should be considered a mathematical, dimensionless, point (see [Smith 09]). It seems impossible to reconstruct a continuous image from an array of discrete pixels, but this is precisely what the surprising Nyquist-Shannon sampling theorem [Nyquist 03] tells
Figure 2.1: Pointillism.
us (in fact, what it guarantees). This theorem is often applied to digitized audio (a one-dimensional signal), but here we apply it to two-dimensional images. Audio is a good starting point to understand the sampling theorem. Sound fed into a microphone is converted to an electrical voltage that varies with time; it becomes a wave. A wave has a frequency, and a wave that varies all the time consists of many frequencies. We denote the maximum frequency contained in a wave by B (cycles per second, or Hertz). The sampling theorem says that it is possible to reconstruct the original wave if it is sampled at a rate greater than 2B samples per second. An image is a rectangular array of point samples (pixels). The sampling theorem guarantees that we'll be able to reconstruct the image (i.e., to compute the color of every mathematical point in the image) if we sample the image at a rate greater than 2B pixels per unit length, where B is the maximum pixel frequency in the image. Section 24.2 explains the meaning of the term "pixel frequencies in an image," but in practice, pixels, their values, and their frequencies depend on the accuracy of the capturing device. An ideal device should measure the color of an image at certain points, but image sensors (CCDs and CMOS) used in real devices (cameras and scanners) are often far from ideal. Because of physical limitations, manufacturing defects, and the need to capture enough light, an image sensor often measures the average color (or intensity) of a small area of the image, instead of the color at a point. Assuming that we have enough pixels for a given digital image, we compute the color of a given point in the image by interpolation. Section 2.11 discusses bilinear interpolation and Part III of this book discusses this and other interpolation methods and illustrates them with examples. In this chapter, we are interested in interpolation of pixels (Sections 2.4, 2.5, and 2.10).
The discussion assumes a grayscale image, where each pixel is a number indicating a shade of gray (an intensity), but interpolation can easily be extended to color images, where a pixel is a triplet of primary colors.
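The one-dimensional case is easy to experiment with. The following Python sketch (not from the book; the sampling rate and function names are chosen purely for illustration) samples a 3 Hz sine well above its Nyquist rate of 2B = 6 samples per second, then reconstructs the value at a point that was never sampled using a truncated sum of sinc pulses:

```python
import math

def reconstruct(samples, dt, t):
    """Whittaker-Shannon reconstruction: sum of sample-weighted sinc pulses."""
    total = 0.0
    for n, s in enumerate(samples):
        x = (t - n * dt) / dt
        total += s * (1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x))
    return total

# A 3 Hz sine has B = 3, so any rate above 2B = 6 samples/second suffices.
B, rate = 3.0, 40.0            # well above the Nyquist rate
dt = 1.0 / rate
samples = [math.sin(2 * math.pi * B * n * dt) for n in range(400)]

# Reconstruct the signal at a point midway between two samples.
t = 0.5 * (100 * dt + 101 * dt)
approx = reconstruct(samples, dt, t)
exact = math.sin(2 * math.pi * B * t)
```

Because the sum is truncated to 400 samples (an ideal reconstruction needs infinitely many), the result agrees with the true sine only to within a small truncation error.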
2.2 Graphics Output The two traditional graphics output devices are the display monitor (CRT or LCD, Chapter 26) and the printer (mostly inkjet and laser, because hard copy is often needed). These output devices are two-dimensional, which is why a three-dimensional image has to be projected before it can be output (see Part II for projections). Many graphics output devices are currently available, but the discussion in this section employs the CRT to illustrate the important concepts of vector and raster scans. For such an advanced civilization as ours to be without images that are adequate to it is as serious a defect as being without memory. —Werner Herzog.
2.2.1 Vector Scan Current computer graphics hardware is virtually always implemented as raster scan. Very few display monitors are of the vector scan type (also known as random scan). In vector scan, where the display unit itself is normally a CRT, the program prepares graphics commands in a buffer in memory. Each command starts with a code (for example, 0 for point, 1 for line, 2 for circle, and so on), followed by command-specific data. A few examples are listed below. The program then starts the graphics controller, a hardware unit that controls all the graphics operations in the computer, and gives it the start address of the buffer. The controller fetches the commands from the buffer in memory and executes each to display a graphics object on the screen. Each command is executed by moving on the screen to where the object should start, turning the display on, and moving a beam to draw the object. At the end of the buffer, the program should place a branch command that sends the graphics controller to the start of the buffer, to refresh the display. Examples of common commands are the following:

Point:  0 x1 y1
Line:   1 x1 y1 x2 y2
Circle: 2 x1 y1 r
Goto:   3 address

The main advantage of vector scan is a smooth display. Slanted lines, arcs, circles, and other objects come out smooth and perfect. However, the hardware is complex (it has to be able, for example, to sweep the beam in a perfect circle around the screen); also, a complex image implies many commands. A good-quality, static display requires at least 20 refreshes per second. If the graphics controller takes more than about one 20th of a second to execute all the commands in the buffer, the refresh rate becomes too low, which causes the display to flicker. Vector scan is suitable for applications such as drafting. A technical drawing consists of a few types of graphics objects such as lines, arcs, rectangles, text, and arrowheads. Each becomes a command in the buffer. General graphics applications, however, often require the display of a smooth, curved surface whose color may vary continuously from point to point. In a vector scan graphics system, such a surface can only be represented by drawing a large number of points in different colors, which causes slow refresh and results in flickering.
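The command buffer and its interpretation can be sketched in software. The following Python fragment is a hypothetical illustration (the function and buffer names are invented); a real graphics controller is hardware that drives the beam, but the fetch-decode-execute loop is the same idea:

```python
# A toy vector-scan display list. Command codes follow the text:
# 0 = point x y, 1 = line x1 y1 x2 y2, 2 = circle x y r, 3 = goto address.

def refresh_once(buffer):
    """Execute one pass over the display list; return the objects drawn."""
    drawn, pc = [], 0
    while pc < len(buffer):
        code = buffer[pc]
        if code == 0:                                  # point
            drawn.append(('point', buffer[pc+1], buffer[pc+2])); pc += 3
        elif code == 1:                                # line
            drawn.append(('line', tuple(buffer[pc+1:pc+5]))); pc += 5
        elif code == 2:                                # circle
            drawn.append(('circle', tuple(buffer[pc+1:pc+4]))); pc += 4
        elif code == 3:                                # goto: one refresh done
            break
    return drawn

buf = [0, 10, 20,            # point (10,20)
       1, 0, 0, 100, 100,    # line from (0,0) to (100,100)
       2, 50, 50, 25,        # circle, center (50,50), radius 25
       3, 0]                 # branch back to the start of the buffer
objects = refresh_once(buf)
```

In a real system the goto command would send the controller back to the buffer start, executing this loop 20 or more times per second.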
2.2.2 Raster Scan Virtually all current graphics displays are raster scan devices. The principle of raster scan is to prepare the entire image in a buffer (called the bitmap) in memory in terms of dots. The final image consists of dots, and a number is stored in the bitmap for each dot, specifying the color of the dot. The graphics controller scans the bitmap number by number and draws a dot, called a pixel, on the screen for each number. If the display is bilevel (or monochromatic, displaying just two colors, black and white or foreground and background), then the color of a pixel is specified by one bit in the bitmap, normally 0 for background and 1 for foreground. If the display supports 2^n shades of gray, then the bitmap should contain an n-bit number for each pixel. This is sometimes referred to as a bitmap with n bitplanes. For color displays, each pixel must be represented by three numbers in the bitmap. Figure 2.2 illustrates how a large, complex color image consists of a large number of pixels.
Figure 2.2: Pixels in a Raster Scan Display.
Mouse droppings: [MS-DOS] n. Pixels (usually single) that are not properly restored when the mouse pointer moves away from a particular location on the screen, producing the appearance that the mouse pointer has left droppings behind. —Eric Raymond, The Hacker’s Dictionary. It is useful to imagine the screen as a rectangle with r rows and c columns. A pixel has screen coordinates (x, y), where x is in the range [0, c − 1] and y is in the range [0, r − 1]. However, the bitmap is stored in memory as a one-dimensional array with index values from 0 to r × c − 1. If the bitmap is stored row by row in array bitmap, then the index of pixel (x, y) is yc+x, and the pixel is accessed in the array by bitmap[yc+x]:=.... Unfortunately, the common operation of accessing a pixel requires
a multiplication. To simplify it, the computer designer should choose a value for c that's a power of 2. The multiplication can then be replaced by a shift. Exercise 2.2: On some screens, the rows are numbered from the bottom up (the top screen line is row r − 1 instead of row 0). What would be the pixel's index in array bitmap in such a case? A typical bitmap in current computers represents a pixel by three bytes, each corresponding to one of the three primary colors. Each primary color can therefore have 256 intensities and a pixel can have one of 2^24 ≈ 16.8 million colors. For a 1K×1K screen resolution, this requires a bitmap of size 3×2^10×2^10 = 3×2^20 = 3 Mbytes. Doubling the resolution to 2K×2K requires a bitmap four times as big. Graphics applications are one reason why today's computers have such large memories.
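This indexing scheme can be sketched in a few lines of Python (the helper names are invented, and the bottom-up variant is only one possible answer to Exercise 2.2, assuming the same row-by-row layout):

```python
# Pixel (x, y) of an r-row, c-column bitmap stored row by row lives at
# index y*c + x. If c is a power of two, the multiply becomes a shift.
c_bits = 10                      # c = 2**10 = 1024 columns
c = 1 << c_bits

def index_of(x, y):
    return (y << c_bits) | x     # same as y*c + x when 0 <= x < c

def index_bottom_up(x, y, r):
    """Exercise 2.2 (one possible answer): rows numbered from the bottom up."""
    return ((r - 1 - y) << c_bits) | x

bitmap = bytearray(c * c)        # one byte per pixel (grayscale)
bitmap[index_of(3, 2)] = 255     # set pixel (3, 2) to maximum intensity
```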
Figure 2.3: A Bitmap and a Color Lookup Table.
To save memory it is possible to use a color lookup table (Figure 2.3). The bitmap may consist of just one byte per pixel, containing a number between 0 and 255. That number is used by the graphics controller as a pointer to the color lookup table, which has 256 entries, each three bytes wide, specifying the intensities of red, green, and blue. The program first decides what colors are going to be used, and stores their values in the lookup table. Each color can be selected from a palette of 2^24 ≈ 16.8 million colors, but because of the size of the lookup table, a maximum of 256 colors can be displayed simultaneously. The program then scan-converts the objects to be displayed and stores pointers in the bitmap. Palette—board on which paints are laid and mixed. The graphics controller has a simple task. It scans the screen line by line and, at the same time, reads numbers from the bitmap and uses them to turn pixels on and off during each scan. The entire image is drawn as dots on scan lines (with nothing between the scan lines). In the case of a color lookup table, the controller reads a byte from the bitmap, uses it as a pointer, goes to the table, and uses the values in the three bytes to adjust the intensities of the three electron beams (in a CRT) or three pixels (in an LCD). This process is repeated for every pixel.
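A minimal sketch of such a lookup table, with illustrative entries (the variable names and the particular colors are not from the figure):

```python
# A 256-entry color lookup table (CLUT): the bitmap stores one byte per
# pixel, and that byte indexes a table of (R, G, B) triplets.
clut = [(0, 0, 0)] * 256
clut[0] = (255, 255, 255)    # entry 0: white background
clut[1] = (255, 0, 0)        # entry 1: pure red
clut[2] = (0, 0, 255)        # entry 2: pure blue

width, height = 4, 3
bitmap = bytearray(width * height)   # every pixel points to entry 0 (white)
bitmap[1 * width + 2] = 1            # pixel (2, 1) now points to red

def pixel_color(x, y):
    """What the controller does per pixel: fetch the index, look up RGB."""
    return clut[bitmap[y * width + x]]
```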
Some graphics controllers perform an interlaced scan. Such a controller first scans all the odd-numbered lines, then all the even-numbered ones, and then starts the refresh. This is supposed to create a smooth display even for low refresh rates, because of the way our eyes work. The advantages of raster scan are simple hardware, fast scan, and no flickering (the time it takes for a complete scan does not depend on the complexity of the image). The main downside is that an image appears on the screen as a large collection of pixels. Increasing the resolution of the screen allows for more complex images, but slanted lines and curved objects always have jagged edges. Vector scan, in contrast, does not involve any pixels and generates smooth lines and curves on the screen. Another disadvantage is that, in addition to the bitmap, another data structure (a geometric one) is needed, to keep track of the individual image elements. Imagine a line whose pixels are turned on in the bitmap. If the line has to be erased, there is no way to tell what bits in the bitmap belong to the line. The geometric data (code and coordinates of endpoints) of the line should be stored in the geometric data structure, then scan-converted from there. To erase the line, its geometric data should first be located in the geometric data structure. It should be scan converted and each pixel, in turn, erased. Historically, vector scan was the first method used, in the 1960s and 1970s, in computer graphics. In the early 1980s, however—with the advent of high-resolution, low-cost color monitors (and especially LCD monitors in the 1990s)—raster scan became the dominant display method. The reason for the popularity of raster scan is that in a complex image with many colors, a vector display requires many short vectors, thus increasing the scan time close to flickering. 
A raster scan takes the same time to complete a scan, regardless of the number of different pixels displayed, so the complexity of the image and the number of colors do not degrade the performance.
2.2.3 The Refresh Operation The graphics controller saves the coordinates of the currently refreshed pixel in two internal registers, x and y. The registers are used to calculate the bitmap address, yc+x, of the pixel (Figure 2.4), allowing the graphics controller to read the pixel’s value from the bitmap. Notice how the multiplication yc is achieved by simply placing y to the left of x and concatenating them. The value read is then converted to a voltage that’s used to turn the electron beam (in a CRT) or the current pixel (in an LCD) on or off. In a color display, three numbers are read from the bitmap for each pixel, and each is converted to a voltage that’s used to control the intensity of three beams. While the bitmap is read, the graphics controller increments register x, to point to the next pixel. When register x overflows, it automatically returns to zero (think of an odometer going from 9 . . . 99 to 0 . . . 00) and the overflow signal is used to clock (i.e., to increment) register y and also to move the beam to the start of the next scan line. When register y overflows, it also returns to zero and its overflow signal is used to return the beam to the start of the first scan line.
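The counter behavior described above can be simulated. The following sketch (the sizes are tiny and the names hypothetical) generates bitmap addresses exactly as the two registers would, with the concatenation of y and x standing in for the multiplication yc:

```python
# The controller's x and y registers concatenate into a bitmap address;
# x overflowing back to zero clocks y and starts a new scan line.
X_BITS, Y_BITS = 4, 3              # a tiny 16x8 "screen" for illustration
C, R = 1 << X_BITS, 1 << Y_BITS

def scan_addresses():
    """Generate one full frame of bitmap addresses, counter style."""
    x = y = 0
    for _ in range(R * C):
        yield (y << X_BITS) | x    # concatenation == y*C + x
        x = (x + 1) & (C - 1)      # x overflows back to zero...
        if x == 0:
            y = (y + 1) & (R - 1)  # ...clocking y (start of next scan line)

addrs = list(scan_addresses())
```

One full frame visits every bitmap address exactly once, in order, which is why the scan time is independent of image complexity.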
2.2.4 Windows and BitBlt In current computers, the operating system maintains the screen as a bitmap. Everything displayed on the screen—text, cursor, and graphics—is saved in the bitmap as
Figure 2.4: CRT Refresh.
pixels and is constantly refreshed on the screen by the graphics controller. A useful feature supported by current operating systems is multiple windows. At any time, the screen may display several windows (or parts thereof). They can be opened, moved, resized, and closed. The window manager, part of the operating system, is responsible for these operations. The main task of the window manager is to display the windows on the screen in a way that makes sense to the user. Imagine a window A fully displayed on the screen. If another window B is moved by the user over A, completely or partially obscuring it, the window manager has to erase all or part of A and replace it with B. When B is moved again, window A (or part of it) should be restored. If A is static, its content can be saved in memory and later used to restore A. However, if window A is displayed by a program, its content may change all the time. The program must therefore keep the pixels of A in a buffer in memory and update the buffer all the time. The window manager uses this buffer to restore A when necessary. Many graphics packages offer a set of procedures collectively referred to as BitBlt— an acronym that stands for “BIT Boundary bLock Transfer” and is pronounced “bitblit.” These procedures can save parts of the bitmap in, and also restore parts from, memory. The operating system must have routines to provide BitBlt with fresh memory areas and to reclaim unneeded areas. This is done by a dynamic memory allocation method such as first fit, best fit, or the buddy system. Because BitBlt operations are common, they should be fast; today there are VLSI circuits that do BitBlt in hardware at speeds of about 100 million pixels per second. Such a circuit can do more than just save and restore. It can also perform a logical operation while restoring the image, a feature that can be applied to create useful effects. 
We call the new image “source” and the current bitmap (in which the new image is going to be written) the “destination.” Instead of simply writing each source bit into
the bitmap, thereby erasing a destination bit, the BitBlt circuit writes a bit that's a function of the source and destination bits. The rule is dst:=dst op src, where op is any logical operation on bits. Table 2.5 shows examples of logical operations (the rows are the source and the columns are the destinations).

     0 1         0 1         0 1         0 1
  0  0 0      0  0 1      0  0 1      0  0 1
  1  0 1      1  1 0      1  1 1      1  0 0

Table 2.5: Some Logical Truth Tables.
Exercise 2.3: How many logical operations are possible in principle? Here are a few examples of logical operations performed by BitBlt (zero represents a white pixel and one represents a black pixel).
dst:=src. This is the replace operation. It draws the new source on the screen, regardless of what’s on the screen now (i.e., regardless of the destination). It amounts to overpainting the new source on the screen (a destructive write). This is common when windows are created and moved. It can also be used to erase areas by drawing in white (the background color). dst:=0. This clears the screen. dst:=not dst. This inverts the screen. A process of inverting bits, changing ones to zeros and vice versa. For example, in a graphics program, to invert a black-and-white bit-mapped image (to change black to white and vice versa), the program could simply flip the bits that compose the bitmap.
dst:=dst or src. This adds the source to the destination (a nondestructive write). Oring a gray pattern on a white screen results in the pattern. Oring the same pattern on a black background has no effect. The paint brush in many paint programs uses this mode. The and operation dst:=dst and src can be used to erase destination pixels selectively. dst:=dst xor src. A pixel would be black whenever the source and destination pixels differ. This turns out to be an important mode. A common example is erasing an
object from the bitmap. Imagine an image consisting of simple geometric objects. To erase an object, the coordinates of all its pixels have to be determined, so that they can be erased. The problem is that objects may intersect; erasing all the pixels of an object will therefore leave holes in all the objects that happen to intersect it. The solution is to draw in xor mode when the object is originally drawn and also when it is erased. The term xor stands for exclusive-or. This is a simple logical operation that is defined as follows. The xor of two bits a and b is 1 when exactly one of them is 1, and 0 when they are equal. This simple operation has many surprising and powerful applications in computer programming. When objects are drawn and erased in xor, the intersection of several objects (which is normally just one pixel or a few pixels) flips between black and white each time an object is drawn or is erased. Figures 2.6 and 2.7 illustrate an example. In Figure 2.6a, a horizontal line is drawn in xor mode by flipping pixels from white to black. In Figure 2.6b, a vertical line is drawn, also in xor mode, so the pixel at the intersection point becomes white. In Figure 2.6c, a slanted line is drawn, so the same pixel becomes black again. In Figure 2.6d, the vertical line is erased, and in Figure 2.6e, the horizontal line is erased as well. The pixel at the intersection keeps changing its color, but the picture as a whole does not look bad. Note that both the drawing and erasing are done with a source of 1. The BitBlt operation is dst:=dst xor 1.
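The draw/erase symmetry of xor mode is easy to verify in a few lines. This sketch (a simplified version of Figure 2.6, with invented names and without the slanted line) draws two crossing lines on a bilevel bitmap and then erases them by drawing them again:

```python
# Drawing and erasing in xor mode on a tiny bilevel bitmap.
# Drawing an object twice erases it; the shared intersection pixel
# simply flips back and forth without damaging either line.
W, H = 7, 7
bitmap = [[0] * W for _ in range(H)]

def xor_draw(pixels):
    for x, y in pixels:
        bitmap[y][x] ^= 1          # dst := dst xor 1

horizontal = [(x, 3) for x in range(W)]
vertical = [(3, y) for y in range(H)]

xor_draw(horizontal)               # draw the horizontal line
xor_draw(vertical)                 # the intersection pixel (3,3) flips to white
before = [row[:] for row in bitmap]
xor_draw(vertical)                 # erase the vertical line...
xor_draw(horizontal)               # ...then the horizontal one
```

After both erasures the bitmap is all zeros again, with no holes left at the intersection.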
Figure 2.6: Xor at Line Intersections.
100 xor 001 = 101,   101 xor 010 = 111,   111 xor 001 = 110,   110 xor 100 = 010

Figure 2.7: Drawing and Erasing Color Lines in Xor Mode.
Exercise 2.4: Is the erase operation common enough to justify xor drawing? Exercise 2.5: Can xor drawing be used when drawing in color (i.e., when a pixel consists of more than 1 bit)? Xor drawing can also be used with a color lookup table. If an original entry in the bitmap is 3 (=0011) and we draw a new object with a source of 5 (=0101), then
the bitmap entry is going to change to (0011 xor 0101)=0110=6. The pixel at the intersection point now points to entry 6 in the color lookup table. If we now erase the original pixels of 3, then (3 xor 6) will result in 5. Example: The concept of transparency is important when adding images to a bitmap. A common example is a web page. Such a page normally has a certain background and new images added to the page obscure that background. When adding a new image to our page, we may want certain parts of the image to be transparent. Such parts should show the background instead of any image pixels. Transparency is achieved by declaring one of the colors used in the image a transparent color (a notable example is the GIF89 graphics file format). The browser displaying the page finds this declaration in the image file. This example shows how several BitBlt operations can be used to implement transparency. We assume that a bitmap and an image are given (parts (a) and (b), respectively, of Figure 2.8) and that the image is to be added to a certain region of the bitmap (the bottom-left 4 × 4 part). We also assume that the image file contains information about the transparent parts. The operation proceeds in the following four steps: 1. Create a bitmap the size of the image, set all the transparent pixels in it to 1 and all the opaque pixels to 0. This bitmap is called a mask (Figure 2.8c). 2. Perform an exclusive-or (xor) of the image with the proper bitmap region (in our example, the bottom-left 4 × 4 part). Store the result in the bitmap (Figure 2.8d). 3. Perform a logical AND of the bitmap and the mask and store it in the bitmap (Figure 2.8e). This operation leaves all the transparent areas unchanged and sets the opaque areas to zeros. 4. Xor the proper bitmap region and the image. This operation sets all the transparent areas back to their original values (because two xor operations cancel each other out). 
The opaque areas now contain the desired image bits (Figure 2.8f) because an xor with a zero leaves a bit unchanged.

Figure 2.8: Transparency in a Bitmap.
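The four steps above can be condensed into a short sketch on single-bit pixels (the values below are illustrative, not those of Figure 2.8):

```python
# The xor/and/xor transparency composite. mask bit 1 = transparent
# (the background shows through), mask bit 0 = opaque (take the image).
def composite(dst, src, mask):
    step2 = [d ^ s for d, s in zip(dst, src)]     # xor the image into the bitmap
    step3 = [t & m for t, m in zip(step2, mask)]  # and with the mask
    return [t ^ s for t, s in zip(step3, src)]    # xor the image again

dst  = [0, 1, 1, 0]    # background pixels
src  = [1, 1, 0, 0]    # image pixels
mask = [1, 0, 1, 0]    # pixels 0 and 2 are transparent, 1 and 3 opaque
out = composite(dst, src, mask)
```

Transparent positions keep their background bits (the two xors cancel), while opaque positions receive the image bits (the and zeroed them, and an xor with zero copies the source).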
2.3 The Source-Target Paradigm The source-target paradigm is a simple, useful concept that has to do with bitmap operations. An image taken with a digital camera or produced by a scanner exists only as a bitmap, but users may want to perform many operations on such an image. A bitmap operation starts with a source bitmap S and uses its pixels (x, y) to produce a target bitmap G with pixels (x*, y*). We can denote such an operation by G = T·S or (x*, y*) = T(x, y), where T is the operation or transformation from S to G. The most common examples of bitmap operations are bitmap scaling, stretching, and rotation, image sharpening, blurring, and pixelating, color enhancement, and converting to grayscale. The source-target paradigm has to do with bitmap operations that change the size of the bitmap (the target bitmap may be larger or smaller than the source bitmap). When a bitmap is enlarged, the target bitmap has more pixels than the source bitmap. When we write a program to perform such scaling, we tend to implement it as a loop that goes over the pixels of the source bitmap. For each pixel (x, y), the program computes new coordinates x* and y* and a new value. It then stores the value in location (x*, y*) of the target bitmap G. We can refer to such a program as a source-to-target algorithm, and it is clear that this type of algorithm involves a problem. If the target is larger than the source, some target locations will be left empty. If the target is smaller than the source (bitmap shrinking), some target locations may be "covered" with multiple new values from several source pixels. Thus, implementing a bitmap operation that changes the size of the bitmap should be done with a target-to-source algorithm. The program loops over the pixels of the target bitmap G and computes coordinates (x, y) for every target pixel (x*, y*). This can be denoted by (x, y) = T⁻¹(x*, y*), where T⁻¹ is the inverse of the original transformation T.
If the inverse transformation T⁻¹ cannot be figured out, the program has to use a source-to-target algorithm, but must include an extra step that loops over the target pixels, identifies each empty pixel, and computes a value for it. Figure 2.9 illustrates these types of algorithms.
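A target-to-source scaling loop can be sketched as follows (the function names are invented, and nearest-neighbor rounding stands in for the better interpolation methods of Section 2.4):

```python
# Target-to-source scaling: iterate over the TARGET pixels, apply the
# inverse transform, and round to the nearest source pixel. Every target
# pixel receives a value, so enlarging leaves no holes.
def scale(src, factor):
    h, w = len(src), len(src[0])
    H, W = int(h * factor), int(w * factor)
    tgt = [[0] * W for _ in range(H)]
    for ys in range(H):
        for xs in range(W):
            # inverse transform T^-1(x*, y*) = (x*/factor, y*/factor)
            x = min(int(xs / factor + 0.5), w - 1)   # round(x) = floor(x+0.5)
            y = min(int(ys / factor + 0.5), h - 1)
            tgt[ys][xs] = src[y][x]
    return tgt

src = [[0, 50], [100, 200]]
big = scale(src, 2.0)          # a 2x2 source becomes a 4x4 target
```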
Figure 2.9: The Source-Target Paradigm.
Figure 2.9 illustrates another important problem in bitmap operations. The coordinates of pixels are integers, but the coordinates of transformed pixels may be nonintegers, because T and its inverse may transform many pairs of integers to pairs of
real numbers. If an inverse transformation T⁻¹(x*, y*) results in a pair (x, y) of real numbers, the bitmap-operation algorithm must decide how to convert them to integers. This is an interpolation problem (more specifically, two-dimensional interpolation) and it can be solved in various ways as illustrated in the next section.
2.4 Interpolation Interpolation (in the context used here) is the process of determining intermediate values from a set of discrete values (but see Section 2.10 for a different definition). The word comes from the Latin inter (between) and polare (to polish) and it means to compute new values that lie between (or that are an average of) certain given values. This book is about synthesizing images, which involves pixels, points, and vectors. Therefore, most of the interpolations considered in the book are two-dimensional. Here are some examples of two-dimensional interpolation. Rounding. Certainly the simplest solution is to round any real coordinates to the nearest integer. This is denoted by round(x) and is computed as ⌊x + 0.5⌋. Bilinear. The next obvious interpolation technique assigns a value to pixel (x*, y*) that is a bilinear interpolation of the four near neighbors of the pair (x, y) of real numbers. This type of interpolation is the topic of Section 9.3 (but see also Section 2.11) and is summarized in Equation (9.6), duplicated here

$$\begin{aligned}
\mathbf{P}(u,w) &= \mathbf{P}_{00}(1-u)(1-w) + \mathbf{P}_{01}(1-u)w + \mathbf{P}_{10}u(1-w) + \mathbf{P}_{11}uw \\
&= \sum_{i=0}^{1}\sum_{j=0}^{1} B_{1i}(u)\,\mathbf{P}_{ij}\,B_{1j}(w) \\
&= \bigl[B_{10}(u), B_{11}(u)\bigr]
\begin{pmatrix} \mathbf{P}_{00} & \mathbf{P}_{01} \\ \mathbf{P}_{10} & \mathbf{P}_{11} \end{pmatrix}
\begin{pmatrix} B_{10}(w) \\ B_{11}(w) \end{pmatrix},
\end{aligned} \qquad (9.6)$$
where Figure 2.10a illustrates the meanings of u and w in this case (notice that these parameters are in the interval [0, 1]).
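The weighted sum of Equation (9.6) translates directly into code. A minimal sketch (the function name is invented):

```python
# Bilinear interpolation: a weighted sum of the four nearest pixels,
# with weights (1-u)(1-w), (1-u)w, u(1-w), and uw, for u, w in [0, 1].
def bilinear(p00, p01, p10, p11, u, w):
    return (p00 * (1 - u) * (1 - w) + p01 * (1 - u) * w
            + p10 * u * (1 - w) + p11 * u * w)

# At the four corners the result equals the corresponding pixel, and at
# the center (u = w = 0.5) it is the average of all four.
center = bilinear(0, 4, 8, 12, 0.5, 0.5)
```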
Figure 2.10: Bilinear, Biquadratic, and Bicubic Interpolations.
Biquadratic. A better approach for certain applications, such as free-form deformations (Section 19.11), may be biquadratic interpolation. The new value stored in target pixel (x*, y*) is a weighted sum of a group of 3 × 3 source pixels centered on the pixel nearest (x, y). This is illustrated in Figure 2.10b and the weighted sum is given by Equation (10.21), duplicated here

$$\mathbf{P}(u,w) = (u^2, u, 1)
\begin{pmatrix} 2 & -4 & 2 \\ -3 & 4 & -1 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} \mathbf{P}_{22} & \mathbf{P}_{21} & \mathbf{P}_{20} \\ \mathbf{P}_{12} & \mathbf{P}_{11} & \mathbf{P}_{10} \\ \mathbf{P}_{02} & \mathbf{P}_{01} & \mathbf{P}_{00} \end{pmatrix}
\begin{pmatrix} 2 & -4 & 2 \\ -3 & 4 & -1 \\ 1 & 0 & 0 \end{pmatrix}^{T}
\begin{pmatrix} w^2 \\ w \\ 1 \end{pmatrix}. \qquad (10.21)$$
Notice that the distances from the top-left corner of the group to the (red) pixel at (x, y) are denoted by 2u and 2w. These quantities vary from zero (on the left or the top) to 2, so the parameters u and w of Equation (10.21) vary from 0 to 1. If the red pixel happens to be at the center of the group, both u and w will equal 0.5. Since this book is about computer graphics, it is best to explain biquadratic interpolation in graphical terms. Imagine that the 3×3 pixels of the group are three-dimensional points with x and y coordinates that vary from 0 to 2 and z coordinates that are the colors of the pixels. The biquadratic interpolation computes a smooth surface that spans the nine points. Once the location of the red pixel within the surface is determined, the color of the target pixel (x∗, y∗) is set to the height of the surface at that location.

Bicubic. Extending the principles of bilinear and biquadratic interpolations results in the popular bicubic interpolation (Sections 2.12.3 and 10.6). The new value stored in target pixel (x∗, y∗) in this type of interpolation is a weighted sum of a group of 4 × 4 source pixels selected such that the red pixel at (x, y) is located somewhere in the center square of the group. Figure 2.10c illustrates this type of interpolation, while Equation (10.25) lists the 16 weights of the sum.

Catmull–Rom. The Catmull–Rom surface patch of Section 12.7 can be used to interpolate a pixel as a weighted sum of a group of 4 × 4 pixels, similar to bicubic interpolation. The details of this weighted sum are listed in Equation (12.55). The interpolating Bézier surface patch of Section 13.22 and the interpolating bicubic B-spline patch of Section 14.18 may also be used for two-dimensional interpolation.
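In the same spirit, Equation (10.21) amounts to quadratic Lagrange interpolation along each axis, with samples at parameter values 0, 1/2, and 1. A minimal Python sketch (the function name and the row-major grid layout are my choices, not the book's):

```python
def biquadratic(grid, u, w):
    """Biquadratic interpolation over a 3x3 grid of values.

    The basis polynomials below are exactly the components of
    (t^2, t, 1) multiplied by the coefficient matrix
    [[2,-4,2], [-3,4,-1], [1,0,0]] of Equation (10.21); they are the
    quadratic Lagrange polynomials for the nodes t = 0, 1/2, 1."""
    def basis(t):
        return (2*t*t - 3*t + 1, -4*t*t + 4*t, 2*t*t - t)
    bu, bw = basis(u), basis(w)
    return sum(bu[i] * bw[j] * grid[i][j]
               for i in range(3) for j in range(3))
```

At the nine node points the interpolating surface passes exactly through the given values, which is the interpolation property described in the text.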
2.4.1 The Sinc Function

Section 2.1 discusses pixels and shows that a pixel should be considered a mathematical, dimensionless point. It seems impossible to reconstruct a continuous image from an array of discrete pixels, but it has long been known (from the so-called Nyquist–Shannon sampling theorem [Nyquist 03]) that such a feat is possible, provided that the pixels are dense enough. The sampling theorem guarantees that a set of discrete points can perfectly reconstruct a continuous function if the points are dense enough (if there are enough points per unit length). To get a better understanding of this theorem and its consequences, we examine the one-dimensional case. We know that a function g(t) can be plotted as a curve. Such a curve may go up and down and behave like a wave. In any given interval, this wave has
a frequency. We select the highest frequency of the function and denote it by f. The sampling theorem guarantees (subject to certain conditions) that g(t) can be perfectly reconstructed if it is sampled at a rate greater than 2f. To sample a function means to evaluate it at many points. Thus, if f = 1200, we should sample the function at, say, 2500 points for each unit length. For example, from g(3) to g(4) we need to evaluate the function 2500 times and create 2500 equally-spaced samples. Such a set of samples can reconstruct the original, continuous function perfectly. Given the samples, we can compute g(t) at any point t to full precision.

Thus, the sampling theorem provides an ideal interpolation method, but like any other ideal concept, solution, or construction, it cannot be fully applied in practice, because (1) the set of samples is often infinite and (2) even if it is finite, too many samples are required for ideal interpolation. In order to be practical, interpolation must include only a relatively small number of samples.

Perfect reconstruction is done with the well-known sinc function, which is defined as sinc(x) = 1 for x = 0, and sinc(x) = sin(πx)/(πx) for |x| > 0. The name sinc is a contraction of sinus cardinalis (Latin for cardinal sine). Figure 2.11 shows a finite portion of this function, which is a sine wave, scaled vertically to become higher around zero.

Figure 2.11: The Sinc Function.
Reconstruction is done by the infinite sum

g(t) = \sum_{u=-\infty}^{\infty} \operatorname{sinc}(t - u)\, s(u) = \operatorname{sinc} * s,
where the “*” symbol stands for convolution. Figure 2.12 illustrates this process graphically (it is obvious that there are not enough samples in the figure). The most striking feature of this figure is that the sinc function is often negative, and many samples are
multiplied by negative weights as a result. It makes sense to multiply nearby samples by large weights and decrease the weights for samples that are farther away, but what is the meaning of negative weights? The explanation is found in the answers to Exercises 10.13 and 2.12.
Figure 2.12: Ideal Interpolation with Sinc.
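The one-dimensional reconstruction sum is easy to experiment with. The Python sketch below truncates the theoretically infinite sum to a finite list of unit-spaced samples, so it is only an approximation near the ends of the list:

```python
import math

def sinc(x):
    """The normalized sinc function: 1 at x = 0, sin(pi x)/(pi x) elsewhere."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t):
    """Truncated ideal interpolation g(t) = sum over u of sinc(t - u) s(u)."""
    return sum(sinc(t - u) * s for u, s in enumerate(samples))
```

Because sinc vanishes at all nonzero integers, evaluating the sum at an integer t returns the sample s(t) itself; between samples, every sample contributes, with weights that decay (and oscillate in sign) with distance, which is the source of the negative weights discussed above.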
For the two-dimensional case, we start with a set s(u, w) of samples taken from a function G(x, y) and we use this set to compute any value of G with the double sum

G(x, y) = \sum_{u=-\infty}^{\infty} \sum_{w=-\infty}^{\infty} \operatorname{sinc}(x - u)\,\operatorname{sinc}(y - w)\, s(u, w) = \operatorname{SINC} * s,

where SINC(x, y) = sinc(x) sinc(y).
2.5 Bitmap Scaling

An image is made of pixels, so its size is measured in pixels, not in inches or centimeters. When an image has to be displayed or printed, it often has to be scaled to fit the available space. Scaling can be up or down. Scaling up (or zooming in) enlarges the bitmap, while scaling down (or zooming out) reduces it. Both types of scaling are important; they represent different problems and are often implemented differently.

The amount of scaling is the first feature to be discussed, and we start with one-dimensional scaling. When a straight segment of length L is left as is, we say that its length has increased by 0%. When it is stretched to 2L, we say that its size has increased by 100%. Because the length has doubled (and because the word "double" is associated with the integer 2), we also say that the segment has been stretched or scaled by a factor of 2. Similarly, when the segment is stretched to 3L, it has been scaled by a factor of 3 (because 3L/L = 3) or by 200%. This is a little confusing, and the confusion increases when we consider noninteger scaling, when we take into account shrinking, and when we discuss images, which are two dimensional.

When a segment is stretched to three times its length, it has been scaled by 200%, because the difference in lengths 3L − L equals 2L. This answers the question of what happens when a segment is stretched from L to 2.35L. We simply subtract 2.35L − L = 1.35L and say that the length has increased by 135% or by a factor of 2.35. When a segment is shrunk to 0.4L, it makes sense to claim that its length has increased by 0.4L − L = −0.6L or −60% (or by a factor of 0.4L/L = 0.4). We find it more convenient, however, to talk about 60% shrinking or scaling down by 0.4.

When an image of size m × n is left alone, it makes sense to say that it has been scaled by 0%. When both its dimensions have doubled, we say that it has been scaled by a factor of 2 or by 100%.
The point is that the area of the image is now four times what it used to be, but the terms "factor" and "percent" refer to the increase in the linear dimensions, not the area. In principle, the height and width of an image may be scaled by different amounts, and in such a case we have to use two numbers (factors or percentages) to express the amount of scaling. Thus, increasing the height of an image from H to 2.35H and decreasing its width from W to 0.4W is considered scaling by the pair (135%, −60%) or by (2.35, 0.4).

This long section discusses several approaches to bitmap scaling, as follows:

Replicate rows and columns as needed (or remove some of them for shrinking).

Generate new pixels by assigning them the values of original pixels. This is better than simply replicating entire rows or columns and is also fast, because pixels are simply copied and no computations are needed.

Use interpolation to compute values for new pixels. Each new pixel becomes a weighted average of several of its near neighbors, original pixels. This approach involves many calculations and is slow, but normally produces better results than simply replicating rows or individual pixels.

Use one of the approaches above, but also examine the original pixels for any edges in the image and try to preserve them. Pixels on or close to an edge should be treated
differently from pixels located away from a sharp edge. Such techniques are more time consuming, but may produce better results, especially for large scale factors.
2.6 Bitmap Stretching

One way to scale a bitmap is to stretch it. This is done by replicating rows and columns several times. To stretch a bitmap to three times its width and twice its height, replicate each column three times and replicate each row twice, creating six identical target pixels for each source pixel. This is simple, but the result is a coarse image, visibly made of large rectangular blocks. To shrink a bitmap (by negative stretching) to a third of its width and half its height, delete one-third of its columns and every other row. Leave the rest of the original bitmap untouched. This is also fast, but the resulting image may be missing important details and may be noticeably different from the original image.
2.7 Replicating Pixels

Instead of stretching a bitmap by replicating entire columns and rows, the approach described here generally achieves better results by replicating individual pixels. When a small bitmap U is scaled up to become a larger bitmap V, each of its pixels is replicated several times to become a small, uniform, rectangular region in V. Notice that these regions cannot all be the same size. If we start from a bitmap with P pixels and replicate each pixel into a small region of size Q, then the resulting scaled bitmap will consist of P · Q pixels, but the user may want a different target size. Any algorithm that scales by replicating pixels is given the dimensions of the original and final bitmaps, and it has to generate regions of different dimensions (but not very different) that together will have the required dimensions of the final bitmap.

This approach is illustrated here by an algorithm due to [Andres et al. 96], which has two important properties: (1) it can scale a bitmap up by any rational factors (and they can be different for the rows and columns) and (2) it is reversible. The main advantage of such an algorithm is its simplicity, which translates to speed. Replicating a pixel several times is much faster than interpolating values. When a bitmap is scaled by clicking and dragging, the program normally displays an empty outline that follows the mouse. The scaled bitmap is computed and displayed only when the dragging is complete. When this method is used, the software may be able to compute and display the scaled bitmap several times per second, while also following the dragged mouse.

We assume that a small bitmap U_{K,L} is to be scaled to a bigger bitmap V_{M,N}. The pixels of U are denoted by (x, y) and the pixels of V are denoted by (x′, y′). Figure 2.13 illustrates the basic parameters involved in this process, and it shows that a pixel in U with row and column coordinates (x, y) should be moved to location (x′, y′) = (Mx/K, Ny/L) in V. However, coordinates of pixels must be integers, so we modify the above to (x′, y′) = (⌊Mx/K⌋, ⌊Ny/L⌋). Because of the use of the floor
Figure 2.13: Relation of x to x′.
(integer truncation) operation, it is obvious that several values of x may yield the same x′, and this fact is the basis for the reverse transformation. The reverse transformation, from V to U, is extremely simple. Given a pixel (x′, y′) in V, it is moved to become pixel

(x, y) = (⌊Kx′/M⌋, ⌊Ly′/N⌋)      (2.1)

in U. This moves all the (identical) pixels in a rectangular region of V to the same location in U. Mathematical experience, combined with a little thinking, produces the forward transformation. Given a pixel (x, y) in U, we want to determine the region in V where it should be replicated, i.e., the region of pixels (x′, y′) that will be transformed to this particular (x, y) by Equation (2.1). A little tinkering yields the inequalities
⌊(Mx + K − 1)/K⌋ ≤ x′ ≤ ⌊(Mx + M − 1)/K⌋,  or  C0(x) ≤ x′ ≤ C1(x),

and

⌊(Ny + L − 1)/L⌋ ≤ y′ ≤ ⌊(Ny + N − 1)/L⌋,  or  R0(y) ≤ y′ ≤ R1(y).

The scaling algorithm is now obvious. It loops over the pixels of U and for each pixel (x, y) it computes C0(x), C1(x), R0(y), and R1(y) from the equations above. It then fills that region of V bounded by rows R0(y) and R1(y) and by columns C0(x) and C1(x) with copies of pixel (x, y). The rectangular regions in V can have the four sizes

[⌊M/K⌋ × ⌊N/L⌋],  [⌊M/K⌋ × (⌊N/L⌋ + 1)],  [(⌊M/K⌋ + 1) × ⌊N/L⌋],  [(⌊M/K⌋ + 1) × (⌊N/L⌋ + 1)].
When the reverse transformation is applied to V , all the pixels of a region are moved to the same location in U . This scaling algorithm is perfectly reversible. Example: Given the small bitmap U4,3 , we scale it to become the large V15,13 . The scale factors are 15/4 (vertical) and 13/3 (horizontal). The bounds of the regions in V are
C0(x) = ⌊(15x + 3)/4⌋,  C1(x) = ⌊(15x + 14)/4⌋,  R0(y) = ⌊(13y + 2)/3⌋,  R1(y) = ⌊(13y + 12)/3⌋.
Thus, for example, pixel (3, 2) of U is transformed as follows:
(3, 2) → [⌊48/4⌋, ⌊59/4⌋] × [⌊28/3⌋, ⌊38/3⌋] = [12, 14] × [9, 12].
This pixel is therefore replicated 12 times into the 3 × 4 region of V that is bounded by rows 12 through 14 and by columns 9 through 12. Figure 2.14 shows the 12 regions of V and the pixel of U that is replicated in each region. This figure also illustrates the main downside of this algorithm. Scaling factors of more than 2–3 result in a pixelated bitmap.
Figure 2.14: Bitmap Scaling by Pixel Replication.
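The region bounds above translate directly into code. The following Python sketch is my own arrangement (the text's x runs over the K rows of U and y over its L columns, as in the worked example); it fills each region of V with the corresponding pixel of U:

```python
def replicate_scale(U, M, N):
    """Scale bitmap U (K rows of L pixels) up to M x N by pixel replication.

    Each source pixel (x, y) fills the target region bounded by
    floor((Mx+K-1)/K) .. floor((Mx+M-1)/K) along one axis and
    floor((Ny+L-1)/L) .. floor((Ny+N-1)/L) along the other,
    following the Andres region bounds given in the text."""
    K, L = len(U), len(U[0])
    V = [[None] * N for _ in range(M)]
    for x in range(K):
        r0, r1 = (M * x + K - 1) // K, (M * x + M - 1) // K
        for y in range(L):
            c0, c1 = (N * y + L - 1) // L, (N * y + N - 1) // L
            for r in range(r0, r1 + 1):
                for c in range(c0, c1 + 1):
                    V[r][c] = U[x][y]
    return V
```

For the 4×3 → 15×13 example, pixel (3, 2) of U indeed fills rows 12–14 and columns 9–12 of V, and the reverse mapping (x, y) = (⌊Kx′/M⌋, ⌊Ly′/N⌋) recovers the source pixel of every target pixel, illustrating reversibility.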
2.8 Scaling Bitmaps with Bresenham

In this approach, originally proposed by [Kientzle 95] and later improved and extended by [Riemersma 02], we scale a source bitmap S to a destination bitmap D by replicating pixels. No interpolation or other computations are used. Pixels are selected in S and are moved to D (however, the smooth algorithm that is based on this approach computes weighted averages). We first discuss our method for a single row of pixels.

Recall that the goal of bitmap scaling is to determine the destination bitmap D, which means that whatever algorithm we develop, it will have to loop over the pixels of D one by one, and for each pixel Di decide which source pixel Sj to move to Di. When scaling a bitmap up (to zoom in or enlarge it), each iteration advances to the next pixel Di+1 in the destination and a source pixel S is moved there, but this source pixel can only be the one used for Di (let's denote it by Sj) or its immediate successor Sj+1. This brings to mind Bresenham's algorithm for scan converting straight lines (Section 3.4), where x is incremented by 1 in each iteration, and the only decision is whether or not to increment y. This decision depends on the slope of the line, but the slope is a real number, which is why Bresenham's algorithm and other DDA methods for lines use instead the two integers Δx and Δy. The simple bitmap scaling algorithm of Figure 2.15 is based on the same principle. In each iteration, the index destPos of the destination bitmap is incremented by 1, but the index srcPos of the source bitmap is
incremented only from time to time, when needed, based on the relative sizes srcWidth and destWidth of the source and destination bitmaps, respectively. When scaling a bitmap down (shrinking), each iteration still advances to the next destination pixel Di+1, but selecting the source pixel is more complex, because that pixel may be a distant successor Sj+k of Sj. This is why the algorithm of Figure 2.15 has an inner while loop where index srcPos may be incremented several times as needed, depending on the relative sizes of the source and destination bitmaps.

var srcWidth, destWidth: integer;
var srcPos=0, destPos=0, numerator=0: integer;
while(destPos < destWidth)
  dest[destPos]:=src[srcPos];
  destPos:=destPos+1;
  numerator:=numerator+srcWidth;
  while(numerator > destWidth)
    numerator:=numerator-destWidth;
    srcPos:=srcPos+1
  endwhile;
endwhile;

Figure 2.15: Bitmap Scaling Following Bresenham.
Example: Given a source bitmap of five pixels and a destination bitmap of 13 pixels, Table 2.16 lists the 13 iterations needed to scale the source to the destination. Before the main loop starts, five variables are initialized as follows: srcWidth=5, destWidth=13, srcPos=destPos=0, and Num=0. Figure 2.18a illustrates this process graphically.
destPos  Num    srcPos  Pixel moved
1        5              dest[0]←src[0]
2        10             dest[1]←src[0]
3        15,2   1       dest[2]←src[0]
4        7              dest[3]←src[1]
5        12             dest[4]←src[1]
6        17,4   2       dest[5]←src[1]
7        9              dest[6]←src[2]
8        14,1   3       dest[7]←src[2]
9        6              dest[8]←src[3]
10       11             dest[9]←src[3]
11       16,3   4       dest[10]←src[3]
12       8              dest[11]←src[4]
13       13             dest[12]←src[4]

Table 2.16: Bitmap Scaling, 5 to 13 Example.
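The pseudocode of Figure 2.15 can be transcribed into Python almost line for line; reproducing the 5-to-13 example gives exactly the assignments of Table 2.16:

```python
def bresenham_scale(src, dest_width):
    """Scale a row of pixels by replication (up) or skipping (down).

    An integer error term, as in Bresenham's line algorithm, decides
    when to advance src_pos; no real arithmetic is used."""
    src_width = len(src)
    dest, src_pos, numerator = [], 0, 0
    while len(dest) < dest_width:
        dest.append(src[src_pos])
        numerator += src_width
        while numerator > dest_width:
            numerator -= dest_width
            src_pos += 1
    return dest
```

Scaling a five-pixel row up to 13 pixels replicates src[0], src[1], and src[3] three times each and the other two pixels twice; scaling 13 pixels down to 5 keeps only src[0], src[2], src[5], src[7], and src[10], which is the pixel-skipping artifact discussed in the text.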
For the opposite scaling, from a 13-pixel row to a 5-pixel row, the initialization is similar and Table 2.17 lists the individual steps (see also Figure 2.18b). This scaling algorithm is simple and fast, but has serious drawbacks. In particular, enlarging a bitmap may result in an irregular, blocky image, while shrinking skips source pixels and thus generates artifacts that are often very noticeable. It is possible to extend the basic algorithm of Figure 2.15 so that instead of skipping pixels it computes their
destPos  Num         srcPos   Pixel moved
1        13,8,3      1,2      dest[0]←src[0]
2        16,11,6,1   3,4,5    dest[1]←src[2]
3        14,9,4      6,7      dest[2]←src[5]
4        17,12,7,2   8,9,10   dest[3]←src[7]
5        15,10,5     11,12    dest[4]←src[10]
6                             dest[5]←src[12]

Table 2.17: Bitmap Scaling, 13 to 5 Example.
Figure 2.18: Scaling Example.
(weighted) average and stores it in the current destination pixel. Such an algorithm is listed in Figure 2.19, and it results in smooth shrinking of the source bitmap (albeit at the price of more operations and slower execution).

var srcWidth, destWidth, pixelFrac, num: integer;
var srcPos=0, destPos=0: integer;
pixelFrac:=destWidth;
while(destPos < destWidth)
  p:=0; num:=0;
  /* Handle whole pixels first */
  while(num+pixelFrac ≤ srcWidth)
    num:=num+pixelFrac;
    p:=p+pixelFrac × src[srcPos];
    srcPos:=srcPos+1;
    pixelFrac:=destWidth;
  endwhile
  if(num < srcWidth) /* Partial pixel? */
    p:=p+(srcWidth-num) × src[srcPos];
    pixelFrac:=pixelFrac-(srcWidth-num);
  endif
  dest[destPos]:=p/srcWidth;
  destPos:=destPos+1;
endwhile;

Figure 2.19: Smooth Bitmap Scaling.
The algorithm of Figure 2.19 consists of a main while loop with three parts (1) an inner while for collecting a number of whole source pixels, (2) an if statement to include a partial source pixel, if needed, and (3) an assignment statement to store a weighted average of the collected pixels in a destination pixel. This is best illustrated by an example. Example. We manually execute the algorithm of Figure 2.19 to shrink a row of 13 pixels to a row of five pixels. Table 2.22 lists the five iterations and Figure 2.20 illustrates how each of the five destination pixels is assigned a weighted average of approximately 13/5 = 2.6 source pixels.
Figure 2.20: Scaling Example.
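The smooth algorithm of Figure 2.19 can be sketched in Python as follows (float division replaces the listing's integer division p/srcWidth so the weighted averages are exact):

```python
def smooth_shrink(src, dest_width):
    """Shrink a row of pixels; each destination pixel becomes a
    weighted average of about srcWidth/destWidth source pixels."""
    src_width = len(src)
    dest, src_pos = [], 0
    pixel_frac = dest_width
    while len(dest) < dest_width:
        p, num = 0.0, 0
        while num + pixel_frac <= src_width:   # collect whole pixels first
            num += pixel_frac
            p += pixel_frac * src[src_pos]
            src_pos += 1
            pixel_frac = dest_width
        if num < src_width:                    # include a partial pixel
            p += (src_width - num) * src[src_pos]
            pixel_frac -= src_width - num
        dest.append(p / src_width)             # weighted average
    return dest
```

Shrinking the row 0, 1, ..., 12 to five pixels gives dest[0] = (5·0 + 5·1 + 3·2)/13, the weighted average worked out in the manual trace in the text, and a constant row shrinks to the same constant.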
The methods of this section operate on a single row of pixels. To apply this approach to a complete, rectangular bitmap A, we can first scale every row of A to create B, then scale every column of B to obtain C. Figure 2.21 illustrates how this works and also the downside of this simple technique. Scaling the rows of the original bitmap A creates the wider bitmap B with lower visual quality. Scaling B vertically creates the final bitmap C but degrades the image quality even more in the process. Thus, this algorithm performs best if only the rows or the columns, but not both, need be scaled.
Figure 2.21: Scaling by Rows and by Columns.
2.8.1 A Slight Variation

The variation on Bresenham described in this section is due to Tomas Möller (page 4 of [Kirk 92]), and it is fast because it employs just integers. In fact, it is similar to the DDA methods of Section 3.3. Any DDA method for lines can be used, and the developer selected Bresenham's algorithm of Section 3.4. The method is illustrated in Figure 2.23, which is a modified version of the original Bresenham algorithm, Figure 3.8. The original algorithm loops from x1 to x2 and draws several pixels at each y value from y1 to y2. The modified algorithm assumes a source array where each array location from y1 to y2 is copied into several locations, from x1 to x2, of the destination array.
Initialization: srcWidth=13; destWidth=5; pixelFrac=5; srcPos=destPos=0;
p:=0; num:=0;
/* Iteration 1 */
while(num+pixelFrac ≤ 13)
  num:=0+5=5; p:=p+5×src[0]; srcPos:=1; pixelFrac:=5;
  num:=5+5=10; p:=p+5×src[1]; srcPos:=2; pixelFrac:=5;
endwhile
if(10 < 13)
  p:=p+(13-10)×src[2]; pixelFrac:=5-3=2;
endif
dest[0]:=p/13; destPos:=1;

If r > 1, then P is outside the unit circle and P∗ is inside it (because 1/r < 1). Thus, this projection inverts points with respect to the unit circle centered on the origin. It is easy to see that points on the circumference of the circle are projected to themselves and that circle inversion is undefined for the origin, where r = 0. (Although we can say that the origin is projected to the point at infinity, this claim is not very useful and may cause confusion with parallel lines, which are also sometimes said to meet at infinity.) Since P is moved to P∗ along the line that connects P to the origin, we can think of this projection as scaling.
2.17 Circle Inversion
From P∗ = (1/r, θ), we obtain x∗² + y∗² = 1/r², and this implies

P∗ = (x∗, y∗) = (x, y)/(x² + y²) = P/(x² + y²) = sP,

because this relation means that

x∗² + y∗² = x²/(x² + y²)² + y²/(x² + y²)² = 1/(x² + y²) = 1/r².
Notice that the scale factor s depends on P, indicating that this type of projection is nonlinear. There are several applets on the Internet that make it easy to explore the properties of circle inversion. This projection has a number of interesting features, the most important of which are the following:

1. Any circle that intersects the unit circle at right angles is projected to itself.
2. The angle between two projected lines is preserved. Thus, circle inversion is a conformal projection.
3. Circles that do not pass through the origin are projected into circles (that do not pass through the origin and generally have a different radius).
4. Similarly, lines that do not pass through the origin are projected into circles that do pass through the origin (Figure 2.56).
5. A circle centered on the origin is projected to another circle similarly centered.
6. Lines through the origin are projected to themselves (except that the projection of the origin is undefined).
7. The inverse of an inverse is the original point. Thus, (P∗)∗ = P. (This is trivial.)
Figure 2.56: Four Circles and Lines.
Curves that are their own inverse are called anallagmatic. Exercise 2.19: Search the mathematical literature or the Internet (or just think) to find another anallagmatic curve.
Figure 2.57: Circular Inversion of a Line.
Here is an explanation of feature 4. Figure 2.57 shows a line L that does not pass through the origin. Consequently, there must be a perpendicular to L from the origin. The point where this perpendicular meets L is denoted P and its projection is denoted P∗. We now select another arbitrary point Q on L and denote its projection Q∗. It is obvious that OP · OP∗ = 1 and OQ · OQ∗ = 1, so we conclude that OP/OQ∗ = OQ/OP∗. This shows that triangles OPQ and OP∗Q∗ are similar (notice that they have a common angle), which, in turn, implies that angles OPQ and OQ∗P∗ are equal. Since the former is a right angle, the latter must be also. However, point Q is an arbitrary point on L, so angle OQ∗P∗ equals 90° for any point Q on L, showing that the projection Q∗ lies on a circle that passes through the origin O and has a diameter OP∗. The projection of P is P∗, and the projection of the origin is the point (or points) at infinity. Line L of Figure 2.57 passes inside the unit circle. For lines outside this circle, the diagram looks different but the proof is identical.

Exercise 2.20: Use similar arguments to explain feature 3.

Exercise 2.21: The discussion so far has assumed inversion with respect to the unit circle. Given a circle C of radius R about the origin, show how to project a point P with respect to it.

Figure 2.58 shows a simple geometric construction of the inverse of a point P. In part (a) of the figure, P is inside the circle. Line L1 is constructed from the center through P and continues outside the circle. Line L2 is then constructed perpendicular to L1. Point A is the intersection of L2 with the circle. A tangent L3 to the circle is
Figure 2.58: Construction of Circle Inversion.
constructed at A, and P∗ is placed at the intersection of the tangent and L1. Part (b) shows the similar construction when P is outside the circle.

Figure 2.58b illustrates another feature of circle inversion. Up to now, we assumed that the inversion is about a unit circle centered on the origin. Given a circle of radius R, the two triangles PP0A and P∗P0A are similar, implying that P0P/R = R/P0P∗, or R² = P0P × P0P∗. The quantity R² is termed the circle power. The inverse P∗ of a point P with respect to an inversion circle of radius R centered at P0 is given by

P∗ = P0 + R² (P − P0)/|P − P0|².
As is common with nonlinear projections, it is possible to come up with many variants of circle inversions. For example, project point (r, θ) to (1/r, 180◦ + θ). An obvious (but perhaps not very useful) extension of circle inversion is sphere inversion, where the spaces inside and outside a sphere are swapped. Reference [Coxeter 69] presents the complete theory of circle inversions. A more general treatment of inversive geometry can be found in [Stothers 05]. Figure 2.59 (after [Gardner 84]) shows the circle inversion of a chessboard.
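The inversion formula above is all that is needed to code circle inversion. A minimal Python sketch (the names are mine):

```python
def invert(p, center=(0.0, 0.0), radius=1.0):
    """Circle inversion: P* = P0 + R^2 (P - P0) / |P - P0|^2."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    d2 = dx * dx + dy * dy
    if d2 == 0:
        raise ValueError("inversion is undefined at the center")
    s = radius * radius / d2   # the point-dependent scale factor
    return (center[0] + s * dx, center[1] + s * dy)
```

Points on the inversion circle map to themselves, points at distance r map to distance R²/r, and applying the function twice returns (up to rounding) the original point, illustrating feature 7.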
2.18 Polygons (2D)

This section discusses two-dimensional polygons, their definition, parts, and basic properties. Section 3.9 discusses methods for filling a polygon with a given color or texture. Section 9.2 discusses three-dimensional polygons and their applications to polygonal surfaces.

A polygon is a closed, two-dimensional figure that consists of three or more straight line segments. Each segment (edge) is joined to two other segments at a point (vertex). The word is derived from the Greek πολύς (many) and γωνία (knee or angle).

A convex polygon satisfies the following: when a straight line is drawn through it, the line crosses at most two sides. The exterior angles of a convex polygon add to 360°, and each interior angle is less than 180°.
Figure 2.59: The Circle Inversion of a Chessboard.
If it is possible to draw a line that crosses more than two edges of a polygon, then the polygon is concave and at least one of its interior angles is greater than 180◦ . A polygon can be regular (all the edges are the same length), irregular, simple (its boundary does not cross itself), or self-intersecting (also known as complex). All convex polygons are simple. Figure 2.60 shows elementary examples of polygons and lines crossing them.
Figure 2.60: Polygons and Intersecting Lines.
Exercise 2.22: What if the angle between two edges is exactly 180°?

The area of a polygon with vertices (xi, yi) is (1/2) Σi (xi yi+1 − xi+1 yi) (see also Section 8.3).
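This area formula is the classical shoelace sum, and a short Python sketch makes the cyclic indexing explicit:

```python
def polygon_area(vertices):
    """Signed polygon area, (1/2) * sum(x_i*y_{i+1} - x_{i+1}*y_i),
    with indices taken cyclically. Positive for counterclockwise order."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return s / 2.0
```

The sign of the result indicates the orientation of the vertex list: counterclockwise polygons get a positive area, clockwise ones a negative area.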
Polygon naming convention. We are familiar with the terms triangle, pentagon, octagon, and a few others, but what about the 60-sided polygon? The standard polygon naming convention is based on the number of sides and it combines a numerical prefix (derived from Greek) with the suffix -gon (there are some exceptions and also disagreements with this standard). The table below lists the names of many polygons.
Name                              n    Name                              n
Triangle                          3    Heptadecagon, Heptakaidecagon    17
Quadrilateral                     4    Octadecagon, Octakaidecagon      18
Pentagon                          5    Enneadecagon, Enneakaidecagon    19
Hexagon                           6    Icosagon                         20
Heptagon                          7    Triacontagon                     30
Octagon                           8    Tetracontagon                    40
Nonagon, Enneagon                 9    Pentacontagon                    50
Decagon                          10    Hexacontagon                     60
Undecagon, Hendecagon            11    Heptacontagon                    70
Dodecagon                        12    Octacontagon                     80
Tridecagon, Triskaidecagon       13    Enneacontagon                    90
Tetradecagon, Tetrakaidecagon    14    Hectogon, Hecatontagon          100
Pentadecagon, Pentakaidecagon    15    Chiliagon                     1,000
Hexadecagon, Hexakaidecagon      16    Myriagon                     10,000
Even more, given a polygon with a "weird" number of sides such as 63 or 99, the following table proposes eight prefixes and nine suffixes that should satisfy most cases. Thus, a 63-sided polygon is named Hexacontakaitrigon, and a 99-sided polygon has the right to be addressed as Enneacontakaienneagon.

Sides   Prefix           Sides   Suffix
20      Icosikai         +1      henagon
30      Triacontakai     +2      digon
40      Tetracontakai    +3      trigon
50      Pentacontakai    +4      tetragon
60      Hexacontakai     +5      pentagon
70      Heptacontakai    +6      hexagon
80      Octacontakai     +7      heptagon
90      Enneacontakai    +8      octagon
                         +9      enneagon
In computer graphics, we often need the vector N that is normal to a given edge. Given an edge between vertices P1 = (x1 , y1 ) and P2 = (x2 , y2 ), its slope is m = (y2 − y1 )/(x2 − x1 ). The slope of a vector perpendicular to the edge is therefore −1/m, so such a vector is described by the pair (1, −1/m). For each polygon edge, there are two such vectors, N and −N, pointing in opposite directions. In a convex polygon, it is meaningful to talk about the inside and outside of the polygon, which is why in certain applications it may be important to determine the outer or the inner normal for a given edge. To identify the inner and outer normals, we first construct the vector M from vertex P1 to a vertex Pk , where k is neither 1 nor 2. If the polygon is convex, vector M will point inside the polygon, so the sign of the dot product N • M tells whether N is inner (positive sign, indicating that the angle between
Figure 2.61: Inner and Outer Normals.
these vectors is less than 90°, as in Figure 2.61) or outer (negative sign). To reverse the direction of N, just reverse its sign.

Example: Given three vertices P1 = (1, 1), P2 = (5, 2), and Pk = (2.5, 3), the slope of edge P1P2 is m = (2 − 1)/(5 − 1) = 1/4, so N = (1, −4) and M • N = (1.5, 2) • (1, −4) = −6.5. The dot product is negative, and Figure 2.61 shows that vector (−1, 4) points inside the polygon, which is why vector (1, −4) is an outer normal.

Data Structure. It is easy to store a polygon in memory. Simply construct a list of consecutive vertices. To actually display the polygon, a program needs only a pointer to this list. Scanning the list, the software creates an edge from each vertex to its successor in the list. The last edge is from the last vertex to the first one.
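The inner/outer test described above is easy to code. The Python sketch below builds the perpendicular as (−Δy, Δx) rather than (1, −1/m), which avoids a division and also works for vertical edges; it reproduces the book's example:

```python
def outer_normal(p1, p2, pk):
    """Outward normal of edge p1->p2 of a convex polygon.

    pk is any polygon vertex other than p1 and p2; the vector
    M = pk - p1 points into a convex polygon, so N is flipped
    whenever N . M > 0 (i.e., N was the inner normal)."""
    ex, ey = p2[0] - p1[0], p2[1] - p1[1]
    nx, ny = -ey, ex                       # perpendicular to the edge
    mx, my = pk[0] - p1[0], pk[1] - p1[1]
    if nx * mx + ny * my > 0:              # N points inside: reverse it
        nx, ny = -nx, -ny
    return (nx, ny)
```

For P1 = (1, 1), P2 = (5, 2), and Pk = (2.5, 3), the function returns (1, −4), the outer normal found in the text.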
2.19 Clipping

Often, a large scene is constructed in memory, but only part of it should be displayed; the rest must be identified and clipped. A typical example is zoom. When zooming in on a scene, only part of it can be displayed, so it is unnecessary and time consuming to compute the other parts of the scene; they should be identified and clipped. Clipping is especially important with three-dimensional scenes, where the software has to transform, project, and render (i.e., shade and texture) the individual objects. In such cases, clipping can considerably speed up the program.

If the image exists only as a bitmap, clipping is trivial, but if the image exists in vector form (points, lines, curves, polygons, and filled areas), then special algorithms are needed to identify those parts of the image that should be kept and remove or disregard the rest. Any method that identifies those parts of an image that are either inside or outside a given region is referred to as a clipping algorithm. The region against which a graphics object is to be clipped is called the clipping window.

The discussion here includes two types of clipping algorithms: those that clip lines against rectangles or convex polygons and those that clip polygons against other polygons. To clip a line segment against a given polygon means to find their intersection points and retain only that part of the line located inside the polygon. This is relatively simple with convex polygons but complex (and perhaps rarely needed) with other types of polygons. Similarly, to clip a polygon A (the subject) against another polygon B (the
clip) means to locate their intersection points and retain only those parts of A that are inside B. The most important line clipping algorithms are Cohen–Sutherland, Cyrus–Beck, Liang–Barsky, and Nicholl–Lee–Nicholl. Among the common polygon clipping algorithms we find Sutherland–Hodgman, Weiler, Liang–Barsky, Maillot, Vatti, and Greiner– Hormann.
2.20 Cohen–Sutherland Line Clipping
The Cohen–Sutherland method is one of the oldest clipping algorithms. The main idea is to classify lines according to the relation between their endpoints and the boundaries of the clip rectangle. It is easy to see that if both endpoints of a segment are above the rectangle, the entire segment is outside the rectangle and no clipping is needed; the segment should be disregarded. The same is true if the two endpoints are below, to the left of, or to the right of the rectangle. If both endpoints are inside the rectangle, the entire segment should be displayed (no clipping). The algorithm proceeds as follows:
1. Each of the two endpoints of the segment is compared to the boundaries of the rectangle and is assigned a 4-bit number according to the result (Figure 2.62). Thus, if an endpoint is above and to the left of the rectangle, it is assigned the number 1001.
2. The two numbers assigned to the endpoints are logically “anded.” If the result is nonzero, the two endpoints are on the same side of the rectangle, meaning the entire segment is outside the rectangle and should be ignored (segment 1 in Figure 2.62). If the result is zero, there are three subcases as follows:
2.1. Both numbers are “0000,” meaning that both endpoints are inside the clip rectangle. There is no need to clip and the entire segment should be displayed (segment 2 in Figure 2.62).
2.2. Only one number is “0000.” One endpoint is inside the rectangle and the other outside it. One clipping point should be calculated (point a in Figure 2.62).
2.3. Neither of the two numbers is “0000.” Both endpoints are, in this case, outside the rectangle and two clipping points (b and c in Figure 2.62) should be computed.

Bit 3: set if the point is above the window.
Bit 2: set if the point is below the window.
Bit 1: set if the point is to the right of the window.
Bit 0: set if the point is to the left of the window.

The nine regions and their 4-bit codes (the clip rectangle is the central region, 0000):

1001  1000  1010
0001  0000  0010
0101  0100  0110

Figure 2.62: The Cohen–Sutherland Clipping Algorithm.
It is obvious that cases 2.2 and 2.3 need further treatment. In these cases the line segment has to be clipped and cannot be trivially accepted or discarded. Thus, the algorithm proceeds by splitting the segment at a point where it crosses one of the rectangle’s (infinite) edges. The two resulting sections are examined and the one located on the outside of the edge (it always exists) is discarded. Section 2.18 explains how to distinguish the inside and outside directions of an edge of a convex polygon. A case in point is segment 4 of Figure 2.62. The algorithm determines its intersection with the left edge of the clip rectangle (point b). The section on the left of b lies outside the rectangle and is discarded, but the remaining section still cannot be accepted or discarded, so the algorithm must split it at the point where it crosses the bottom edge of the rectangle (point c). Of the two new sections, the one on the right is outside the rectangle and is discarded, while the one on the left (section bc) is immediately accepted. How does the algorithm know to test segment 4 first against the left edge and then against the bottom edge? It does not. It simply tests such a segment methodically against all four edges to locate an intersection point. The order of the tests is irrelevant, but should always be the same. Exercise 2.23: List the steps taken by the algorithm for segment 5 (dh) of Figure 2.62.
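The steps above can be sketched in code. The following Python sketch is not from the book; the coordinate and window parameter names are my own, and the segment is tested methodically against all four edges, as described, until it can be trivially accepted or rejected.

```python
ABOVE, BELOW, RIGHT, LEFT = 8, 4, 2, 1   # bits 3, 2, 1, 0 of the region code

def outcode(x, y, xmin, ymin, xmax, ymax):
    """4-bit code of point (x, y) relative to the clip rectangle."""
    code = 0
    if y > ymax: code |= ABOVE
    if y < ymin: code |= BELOW
    if x > xmax: code |= RIGHT
    if x < xmin: code |= LEFT
    return code

def cohen_sutherland(x0, y0, x1, y1, xmin, ymin, xmax, ymax):
    """Return the clipped segment, or None if it lies entirely outside."""
    while True:
        c0 = outcode(x0, y0, xmin, ymin, xmax, ymax)
        c1 = outcode(x1, y1, xmin, ymin, xmax, ymax)
        if c0 & c1:                 # same side of the rectangle: reject
            return None
        if (c0 | c1) == 0:          # both endpoints inside: accept
            return (x0, y0, x1, y1)
        c = c0 or c1                # pick an endpoint that is outside
        # Split the segment where it crosses the (infinite) edge.
        if c & ABOVE:
            x, y = x0 + (x1 - x0)*(ymax - y0)/(y1 - y0), ymax
        elif c & BELOW:
            x, y = x0 + (x1 - x0)*(ymin - y0)/(y1 - y0), ymin
        elif c & RIGHT:
            x, y = xmax, y0 + (y1 - y0)*(xmax - x0)/(x1 - x0)
        else:                       # c & LEFT
            x, y = xmin, y0 + (y1 - y0)*(xmin - x0)/(x1 - x0)
        if c == c0:                 # discard the outside section
            x0, y0 = x, y
        else:
            x1, y1 = x, y
```

For example, clipping the segment from (−5, 5) to (15, 5) against the window with corners (0, 0) and (10, 10) yields the segment from (0, 5) to (10, 5) after two iterations, one per crossed edge.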
2.21 Nicholl–Lee–Nicholl Line Clipping
The Cohen–Sutherland clipping method is simple but may require several iterations for clipping a segment, where the intersection of two lines is computed in each iteration. The Cyrus–Beck line clipping algorithm of Section 2.22 often has to test certain segments methodically against all four edges of the clip rectangle to locate an intersection point. Computing an intersection point is slow, which is why the Nicholl–Lee–Nicholl (NLN) algorithm described here, due to [Nicholl et al. 87], is interesting. It reduces the chances of multiple clipping of a single line segment and therefore requires fewer computations but more tests than other line clipping methods. Its developers have included a detailed analysis that shows that this algorithm requires fewer operations than its predecessors and can also save the results of certain computations for later use. Given a rectangular clip window and a line segment PQ, the NLN method divides the entire xy plane into nine regions, as illustrated in green in Figure 2.63. The algorithm distinguishes three cases: point P is inside the rectangle, in a corner region, or in an edge region (parts (a), (b), and (c) of the figure, respectively). If P is located in any other region, the treatment is symmetric. For each of the three cases, the algorithm draws the four rays from P through the corners of the clip rectangle (the central region). These are shown in the figure in black. The rays partition the plane into regions, and the algorithm performs tests to determine which of these regions contains Q, because that region determines which edges of the rectangle segment PQ can intersect. For example, if P is in an edge region and Q is in a corner region (as in part (c) of the figure), the algorithm identifies the regions labeled LB in the figure. Regardless of where Q is located in these regions, the algorithm will have to determine the intersections of segment PQ with the left and bottom edges of the clip rectangle.
There is no need to try any other intersections.
Figure 2.63: Regions in NLN Clipping.
Abbreviations T, L, B, and R in the figure indicate intersections with the top, left, bottom, and right edges of the rectangle, respectively. Abbreviations LT, LR, LB, TR, and TB indicate intersections of segment PQ with the top and left, left and right, left and bottom, top and right, and top and bottom boundaries of the rectangle, respectively. Thus, the essence of the NLN algorithm is to perform simple tests that indicate whether segment PQ lies to the left or to the right of each of the four rays, in order to identify the region that contains Q. Once this region has been identified, the algorithm knows which edge or edges are intersected by the segment, and it computes the intersection points. It is clear that this algorithm has to perform many tests, but it has to compute only one or two intersection points. Overall, the analysis done by the developers claims that the NLN algorithm performs the fewest divisions (one for each intersection point that is computed) and the fewest comparisons among the three line clipping algorithms described here. Another interesting property of this method is that it uses certain results several times. The tests performed by the algorithm require the perpendiculars to the four rays, and these can be computed once and stored for future use. The following paragraphs explain how to determine whether a given point lies to the left or to the right of a given ray. The negate and exchange rule (Page 207) states that rotating a point (x, y) 90° counterclockwise transforms it to point (−y, x). Thus, given the two points P = (Px , Py ) and Q = (Qx , Qy ), their difference is the vector (Qx − Px , Qy − Py ), which is transformed by a 90° rotation to its perpendicular (Py − Qy , Qx − Px ). A ray from P to Q divides the xy plane into two halfplanes. Given another point R, it is now easy to determine whether it lies to the left or to the right of the ray.
We construct the vector from P to R and compute its dot product with the perpendicular (Py − Qy , Qx − Px ). The result is (Rx − Px )(Py − Qy ) + (Ry − Py )(Qx − Px ), and it is easy to see that R lies to the left of PQ if this dot product is positive.
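This side test is a one-liner in code. The sketch below is mine (the book gives only the formula), with the function name chosen for illustration:

```python
def left_of(p, q, r):
    """True if point r lies strictly to the left of the ray from p
    through q. Computes the dot product of (r - p) with the
    perpendicular (py - qy, qx - px) of the ray direction."""
    (px, py), (qx, qy), (rx, ry) = p, q, r
    return (rx - px)*(py - qy) + (ry - py)*(qx - px) > 0
```

For a ray along the positive x axis, points above it are to its left, points below it are to its right, and points on the ray itself give a zero dot product.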
2.22 Cyrus–Beck Line Clipping
The Cyrus–Beck algorithm [Cyrus and Beck 78] clips a line against a convex polygon. Figure 2.64a shows that a straight segment (shown in red) from P0 to P1 may intersect the (infinite) edges of a polygon (in green) at up to four points, only two of which may be relevant for clipping purposes. The algorithm computes the intersection points and then performs a series of comparisons to decide which points are relevant for the clipping.
Figure 2.64: Cyrus–Beck Line Clipping.
t1 for this line. The Cyrus–Beck algorithm becomes much simplified when the clip polygon is a rectangle with edges that are parallel to the coordinate axes. This version of the algorithm is known as the Liang–Barsky line clipping algorithm. Given a rectangle with its bottom-left corner at point (xmin , ymin ) and its top-right corner at (xmax , ymax ), Table 2.65 summarizes the quantities and operations required to compute up to four t values for the intersection points of a segment from P0 = (x0 , y0 ) to P1 = (x1 , y1 ).

Clip edge           N        Pk           P0 − Pk                t
left (x = xmin)     (−1, 0)  (xmin , y)   (x0 − xmin , y0 − y)   −(x0 − xmin)/(x1 − x0)
right (x = xmax)    (1, 0)   (xmax , y)   (x0 − xmax , y0 − y)   (x0 − xmax)/(−(x1 − x0))
bottom (y = ymin)   (0, −1)  (x, ymin )   (x0 − x, y0 − ymin )   −(y0 − ymin)/(y1 − y0)
top (y = ymax)      (0, 1)   (x, ymax )   (x0 − x, y0 − ymax )   (y0 − ymax)/(−(y1 − y0))

Table 2.65: Intersections of a Segment and a Rectangle.
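The t values of the table lead directly to a Liang–Barsky sketch. The code below is mine, not the book's; each (p, q) pair corresponds to one row of the table, with t = q/p, and the parameter interval [0, 1] is shrunk by entering and leaving intersections.

```python
def liang_barsky(x0, y0, x1, y1, xmin, ymin, xmax, ymax):
    """Clip the segment P0P1 against an axis-aligned rectangle.
    Returns the clipped endpoints, or None if the segment is outside."""
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    # One (p, q) pair per clip edge: left, right, bottom, top.
    # p < 0: segment enters through this edge; p > 0: it leaves;
    # p == 0: segment is parallel to the edge.
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0),
                 (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return None      # parallel to the edge and outside it
        else:
            t = q / p
            if p < 0:
                t0 = max(t0, t)  # entering intersection
            else:
                t1 = min(t1, t)  # leaving intersection
    if t0 > t1:
        return None              # the segment misses the rectangle
    return (x0 + t0*dx, y0 + t0*dy, x0 + t1*dx, y0 + t1*dy)
```

Unlike Cohen–Sutherland, no segment is ever split more than once; only the two parameter values t0 and t1 are updated.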
2.23 Sutherland–Hodgman Polygon Clipping
The Sutherland–Hodgman algorithm clips a subject polygon S (which may be convex or concave) against a convex clip polygon C. The method is iterative, clipping S against one edge of C in each iteration. An edge E of C is selected and extended to become a halfplane. The inside and outside normals of E are also computed (these normals are well defined because polygon C is convex). Polygon S is traversed, edge by edge, and each edge D is compared to E and is clipped. Clipping D against E may result in zero, one, or two output vertices. These vertices are appended to an output list of vertices, and the algorithm passes to the next edge of S. Once the algorithm has processed every edge of S, it starts the next iteration, where it selects another edge of C and uses the output list from the previous iteration as its input list of vertices. When all the edges of C have been selected in this way, the final output list constitutes the vertices of the clipped version of S. Figure 2.66 shows an example of a concave polygon (blue) clipped against four edges (red) of a square.
Figure 2.66: Clipping Against a Convex Polygon.
In a typical step, the algorithm examines an edge D of polygon S. Assume that D goes from vertex A to vertex B; the software has to determine the locations of these vertices with respect to the clip edge E, specifically whether they lie inside or outside polygon C. Figure 2.67 is an example where three vertices of S are inside polygon C (in blue) and two vertices (plus more that are not shown) are outside. In part (a) of the figure, vertices A and B are both inside C and the algorithm outputs B (in red). In part (b), point A is inside and point B is outside. The algorithm computes the intersection point P and outputs it. In part (c), both points are outside C, and the algorithm outputs nothing. In part (d), point A is outside, while point B is inside, so the algorithm outputs the intersection point P , followed by B.
Figure 2.67: Sutherland–Hodgman Clipping Rules.
Thus, clipping these four edges against edge E appends points B(a), P (b), P (d), and B(d) to the output list. Point A(a) was added to this list earlier, when the algorithm examined the short edge that leads to A(a). The result of clipping S against edge E is shown in part (e) of the figure. Next, the algorithm examines the remaining edges of polygon S, but all their vertices are outside E, so nothing is added to the output list. Once all the edges of S have been clipped against edge E, the algorithm replaces the original list of vertices with the newly-generated output list and clips this list against another edge of C. Exercise 2.24: Figure 2.67d shows two vertices marked ×. How were they added to the output list? To summarize, when the algorithm examines an edge AB against a clip edge, it follows these simple rules: If both vertices A and B are inside polygon C, output B. If A is inside and B is outside, output the intersection point of AB and the clip edge. If A is outside and B is inside, output the intersection point, followed by B. If both vertices are outside, output nothing. The algorithm involves many tests of points for being inside or outside a polygon, but determining intersection points, a more complex computation, is done infrequently.
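The four rules translate into the following Python sketch. It is mine, not the book's: it assumes both polygons are given as counterclockwise vertex lists, and it replaces the explicit inside/outside normals with an equivalent cross-product sign test against each clip edge.

```python
def clip_polygon(subject, clip):
    """Sutherland-Hodgman: clip a subject polygon against a convex clip
    polygon. Both are lists of (x, y) vertices in counterclockwise order."""
    def inside(p, a, b):
        # p is inside the halfplane of edge a->b (on its left, since
        # the clip polygon is counterclockwise)
        return (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0]) >= 0

    def intersect(p, q, a, b):
        # intersection of segment pq with the infinite line through a, b
        x1, y1 = p; x2, y2 = q; x3, y3 = a; x4, y4 = b
        den = (x1-x2)*(y3-y4) - (y1-y2)*(x3-x4)
        t = ((x1-x3)*(y3-y4) - (y1-y3)*(x3-x4)) / den
        return (x1 + t*(x2-x1), y1 + t*(y2-y1))

    output = list(subject)
    for i in range(len(clip)):           # one iteration per clip edge
        a, b = clip[i], clip[(i+1) % len(clip)]
        input_list, output = output, []
        for j in range(len(input_list)):
            A, B = input_list[j-1], input_list[j]   # subject edge A->B
            if inside(B, a, b):
                if not inside(A, a, b):             # A out, B in
                    output.append(intersect(A, B, a, b))
                output.append(B)                    # B in: output B
            elif inside(A, a, b):                   # A in, B out
                output.append(intersect(A, B, a, b))
            # both outside: output nothing
    return output
```

Clipping a large square against a smaller one, for instance, returns the smaller square's vertices, as the whole overlap region survives every iteration.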
Figure 2.68: Weiler–Atherton Clipping Example.
2.24 Weiler–Atherton Polygon Clipping
In 1977, Kevin Weiler and Peter Atherton were working on an algorithm for visible-surface determination and serendipitously came up with the clipping algorithm that is briefly described here. In 1980, Weiler improved this method with concepts from graph theory. The first step is to number the vertices of the subject (S) and clip (C) polygons such that the inside of each polygon is always on the right as we traverse it in order of the vertices. In Figure 2.68, polygon S is shown in green, with vertices si , and polygon C is in blue, with vertices ci . Next, all the intersection points of S and C are determined (they are red in the figure and are denoted by pi ). Note that these intersections come in pairs. If S enters C at point pi , it leaves C at point pi+1 . The algorithm starts at a vertex of S, say s1 , and traverses S until it reaches an entering intersection point (p1 in the figure). It then continues traversing S until it reaches a leaving intersection (p2 in the figure). At that point the algorithm “turns to the right” and starts traversing C until an intersection is found, at which point it again turns to the right and starts traversing S. This process is repeated until the algorithm reaches a vertex it has already visited (p1 in our example). This loop results in the polygon p1 s2 p2 c2 . Next, the algorithm repeats this process starting at another vertex of S, and iterates in this way until all the intersection points have been visited. Exercise 2.25: Assume that the algorithm starts at vertex s2 ; what is the clipped polygon resulting from that iteration? Note. Here is how to compute the intersection of two edges. Given the edge from P0 to P1 , its parametric equation is P0 (t) = P0 + (P1 − P0 )t (see also Equation (9.1)). Similarly, given the edge from P2 to P3 , its parametric equation is P2 (s) = P2 + (P3 − P2 )s. If an intersection point exists, it satisfies P0 + (P1 − P0 )t = P2 + (P3 − P2 )s, two linear equations (one for each coordinate) in the two unknowns t and s, which are easily solved.
If the equations are contradictory, the lines containing the edges are parallel and the edges do not intersect. If the equations are dependent, the lines are collinear. (End of note.) Figure 2.69 is another example of this algorithm. The first iteration starts at s1 and produces the clipped polygon p1 p2 p3 p4 , and the second iteration starts at s3 and results in p5 s4 s5 p6 . At this point, all the intersection points have been visited, so the algorithm terminates.
Figure 2.69: Weiler–Atherton Clipping Example.
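The edge-intersection computation described in the note above amounts to solving two linear equations in t and s. A Python sketch (the function name and the restriction of both parameters to [0, 1] are mine):

```python
def edge_intersection(p0, p1, p2, p3):
    """Solve P0 + (P1 - P0)t = P2 + (P3 - P2)s for t and s by Cramer's
    rule. Returns the intersection point if both parameters lie in
    [0, 1] (i.e., the edges themselves cross), else None."""
    d1x, d1y = p1[0] - p0[0], p1[1] - p0[1]   # direction of first edge
    d2x, d2y = p3[0] - p2[0], p3[1] - p2[1]   # direction of second edge
    den = d1x*d2y - d1y*d2x
    if den == 0:
        return None       # contradictory or dependent: parallel/collinear
    ex, ey = p2[0] - p0[0], p2[1] - p0[1]
    t = (ex*d2y - ey*d2x) / den
    s = (ex*d1y - ey*d1x) / den
    if 0 <= t <= 1 and 0 <= s <= 1:
        return (p0[0] + t*d1x, p0[1] + t*d1y)
    return None
```

The two diagonals of a square, for example, intersect at its center with t = s = 1/2.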
2.25 A Practical Drawing Program
Programs for artistic and technical drawing and illustration have been popular since the mid-1980s, and this section describes their main data structures and operations. There are also painting programs, but they are not discussed here. The only background necessary for this section is an understanding of the xor operation (Section 2.2.4) and of the bitmap in general. A typical drawing/illustration program starts with a blank screen and a menu of graphics objects that it can draw and manipulate. These normally include line, circle/oval, box (square/rectangle, optionally with rounded corners), polygon, cubic Bézier curves (Chapter 13), and text. It may also be possible to paste in a small bitmap prepared outside the program (perhaps a scanned painting or a photograph). The user selects a menu option, supplies the necessary data, and the program displays the graphics object on the screen. In this way, the screen fills up with different items, each a simple graphics object, that may intersect, obscure each other, and have different colors.
The user then starts editing the image by selecting various objects and deleting, moving, or modifying them. The discussion here concentrates on selecting one of possibly many graphics objects on the screen. Figure 2.70a shows a simple drawing consisting of a line, a circle, and a rectangle. We initially assume that those objects do not intersect. The user points the cursor (the arrow in the figure) anywhere on the line (or within a few pixels of the line) and clicks to select the line. The program has to identify the object selected and highlight it, thereby confirming the selection to the user. The problem is that the program knows only the position of the cursor on the screen (its screen coordinates). These are the coordinates of a pixel on the selected object, and the immediate task of the program is to quickly find (1) all the other pixels of this object, (2) its type (line, circle, etc.), and (3) its other data (in the case of a line, the coordinates of the two endpoints; in the case of a circle, its center and radius).
Figure 2.70: Data Structures for a Drawing Program.
Two data structures are used by the program to perform this task. The first is the codemap, an array the size of the bitmap where each location contains the serial number of a graphics object on the screen. The second is a geometric data structure containing the type of each graphics object drawn so far and its color and specific data. Figure 2.70b shows how the serial numbers of the three objects drawn so far are stored in the codemap (the rest of the codemap is assumed to have been initialized to zeros).
Figure 2.70c shows the geometric data structure. Notice that the serial numbers are implied in that structure and don’t actually have to be stored there. Once the program detects a click, it inputs the cursor coordinates and checks the corresponding location of the codemap. In our example, it finds serial number 1. (Both the bitmap and the codemap are shown as two-dimensional arrays, but, in practice, they are normally one-dimensional.) The program examines location 1 of the geometric data structure, finds that the graphics object with serial number 1 is a line, and finds the pointer to its specific data. That data consists of the coordinates of the two endpoints. The program then calculates all the pixels of the line (using the same scan-converting method that was originally used to draw the line) and highlights them. The two endpoints may be highlighted with a different color, making them more visible to the user. This takes about the same amount of time as drawing the line in the first place. The program starts waiting for the next user response/command. Exercise 2.26: Rewrite the preceding paragraph for the case in which the user selects the circle. Text presents a special problem. The drawing program gets the characters of text from the different fonts and knows nothing about their shapes. In order to select a string of text, the user normally has to click on its reference point (which is either the bottom-left corner of the text or the left point of the baseline, Figure 2.70g). Any clicks within the text string are ignored by the program (although a sophisticated program may obtain the width of the character from the font file, and this enables it to identify clicks anywhere on the baseline). After selecting a graphics object, typical user responses may be to delete the object, move it, or reshape it. The user may also want to copy the object, group it with some other objects, or perform other operations, but we discuss only the first three operations. 
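The codemap lookup just described can be sketched as follows. This is a toy illustration of mine, not code from the book: the bitmap is 8×8, the geometric data structure is a dictionary of records, and scan conversion is reduced to horizontal lines for brevity.

```python
codemap = [[0]*8 for _ in range(8)]   # parallels the bitmap; 0 = background
objects = {}                           # serial number -> object record

def draw_line(serial, x0, y0, x1, y1, color):
    """Record a horizontal line (simplified scan conversion) and store
    its serial number in the codemap at every pixel it covers."""
    objects[serial] = {"type": "line", "color": color,
                       "data": ((x0, y0), (x1, y1))}
    for x in range(x0, x1 + 1):
        codemap[y0][x] = serial

def select(x, y):
    """Handle a click at pixel (x, y): look up the codemap, then fetch
    the object's record from the geometric data structure."""
    serial = codemap[y][x]
    return objects.get(serial)         # None if the click hit background

draw_line(1, 2, 3, 6, 3, "R")          # object 1: a red line
```

A click at (4, 3) retrieves the line's record, from which the program can re-scan-convert and highlight it; a click on background pixel (0, 0) retrieves nothing.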
Deleting an object is done by using the same scan-converting method. Every pixel calculated by the method is erased (or is xored with the color of the object being erased) and the program also clears the serial number of the pixel in the codemap. When done, the program uses the serial number to mark the corresponding entry of the geometric data structure “available” and deletes the storage used for the specific data. To move a graphics object, the user grabs it at a certain pixel or anchor point and drags it to its new position. The program creates an outline of the object and moves it with the cursor. When the dragging stops, the program calculates the x and y displacements (the final cursor position minus its initial position) and moves the object to its new location pixel by pixel. Each pixel at position (x, y) is erased and drawn at position (x + dx , y + dy ). An xor may be used instead of a straight erase and draw. After moving a pixel, the program moves its serial number from position (x, y) to position (x + dx , y + dy ) in the codemap. There is no need to change the geometric data structure, but the specific object information may have to be updated. (If the object is simple, such as a line or a circle, the outline of the object is the object itself. If the object is more complex, such as text or a bitmap, the outline of the object may be its bounding rectangle.) Reshaping an object can be done by rotating, scaling, or shearing it. These transformations are discussed in Chapter 4, where it is shown that they can all be expressed by a 3 × 3 transformation matrix where only six of the nine elements vary. Thus, reshaping is similar to moving. The user may enter a command or make their wish known by dragging. Once the program knows what transformation is required, it prepares a specific transformation matrix T. Each pixel P is then moved from its original position to position P·T, a step that is followed by moving its serial number in the codemap in the same way. The geometric data structure may have to be updated, since reshaping may change the type of an object. A circle may become an oval; a square may turn into a rectangle or a parallelogram. Notice that scaling or rotating a bitmap causes special problems. These operations are discussed in Sections 2.5 and 2.15. We now turn to the case where graphics objects may intersect. Figure 2.70d shows three intersecting objects. We assume that the line was drawn first, followed by the circle, and then by the rectangle. This behavior is best visualized by thinking of each object as a layer that is drawn on top of all the previous objects. A glance at Figure 2.70e shows that the codes at the intersection points are those of the most-recently drawn object that passes through the point. If the user clicks on an intersection point, the latest object through it will be selected. This does not seem bad, until we consider the case where that object is moved. If the rectangle of Figure 2.70d is moved, its two intersection points would be set to zero. After many such moves, the codemap may become meaningless. We outline two solutions to this problem: 1. When an object with serial number n is drawn, the program checks each codemap location before it sets it to n. If a location is nonzero, its value is saved in a linked list before the location is set to n. We now have to have a linked list associated with each codemap location. We visualize the codemap as an array of structures, where each structure consists of two fields, a code and a pointer to a list. Initially, all the codes are set to zero and all the pointers are set to null.
When the first code is stored in a location, the pointer remains null. When another code is stored in the location, the original code is saved in a list node and the pointer is set to point to that node. When a code is removed from a location, the pointer is checked. If it is not null, it is followed and the most recent code is restored. Modern computers can perform these operations fast enough to make them appear instantaneous. When this method is used, clicking on an intersection point will select the most recent object that passes through it. 2. When an object with serial number n is drawn, the program xor’s n with the codemap locations. The value stored in a codemap location is therefore the xor of the serial numbers of all the objects that pass through it. When an object with serial number m is deleted or moved, the program performs an xor of a codemap location with m. No lists are necessary, so this method is simple and fast. The following examples show the values stored in a hypothetical codemap location when objects 1, 5, and 7 are added and objects 1 and 5 are then deleted (we assume 4-bit serial numbers): 0000 xor 0001 = 0001, 0001 xor 0101 = 0100, 0100 xor 0111 = 0011, 0011 xor 0001 = 0010, 0010 xor 0101 = 0111. The downside is that a codemap location corresponding to an intersection point does not contain any of the serial numbers of the intersecting objects. In our example, the contents of the hypothetical location vary from 1 to 4 to 3 to 2, ending with 7. Thus, the program should flag all codemap locations where objects intersect, and should disregard any user clicks at those points.
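Solution 2 is easy to verify in code; the short demonstration below reproduces the xor sequence of the example (the serial numbers 1, 5, and 7 come from the text):

```python
# One codemap location holding the xor of the serial numbers of all
# objects that pass through it.
loc = 0
for n in (1, 5, 7):        # draw objects 1, 5, and 7
    loc ^= n
assert loc == 0b0011       # 0001 xor 0101 xor 0111 = 0011

for n in (1, 5):           # delete (or move away) objects 1 and 5
    loc ^= n
assert loc == 7            # only object 7 passes through this pixel now
```

Because xor is its own inverse, deleting an object in any order removes exactly its contribution, with no per-location list needed.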
Exercise 2.27: What is a good way to implement such flags?
2.26 GUI by Inversion Points
Computer users are familiar with the various flavors of the Microsoft Windows operating system and the many versions of the Macintosh operating system. These operating systems are based on a graphical user interface (GUI), which makes it natural for the user to view the status of the computer at a glance and to perform many operations on files. In a typical application, a GUI-based operating system displays several disks, files, and folders (or subdirectories) on the screen, and the user can select any of them, open it, move it, change its name and content, or delete it. New folders may also be added, named, and dragged to new locations. Any folder may partly obscure others, thereby creating the illusion of a three-dimensional display. At any time, only one file or folder (in this section we’ll refer to both files and folders as folders) can be active, and its icon is graphically enhanced to distinguish it from the inactive folders. The user can easily select another folder by pointing at any location inside it and clicking. The selected folder, which may have been partly covered by others, is immediately brought to the front and may now obscure other folders. This section describes an approach to implementing such a display. The method described here was developed by Bill Atkinson of Apple Computer [Atkinson 86], who used it in his revolutionary MacPaint software. The first method shown here is an extension of BitBlt and is based on the concept of inversion points. It is described here for a monochromatic display but can also be adapted to color displays. Any shape (such as folders, icons, and characters of text) that the operating system has to display is prestored in a special area in memory and is referred to as a source image.
When the operating system is instructed by the user to display a certain source image in a certain region of the bitmap it “holds” the source image over the bitmap region and performs an xor of each bitmap bit with the source image bit that “covers” it. This embeds the source image in the region in such a way that (1) any other images in the region can still be recognized and (2) those images are automatically restored when the source image is later deleted. Nothing outside the region is affected. As described so far, this operation is a direct application of BitBlt. There are, however, two extensions that make this method more general and useful than BitBlt. (1) The source images don’t have to be rectangular. They can have any shape and can even consist of disjoint parts. (2) A source image is stored in memory in terms of inversion points, which effectively creates a compressed representation of the source image. A large, complex source image that consists of thousands of pixels may be represented by a short, sorted list of tens or perhaps hundreds of inversion points. An inversion point in the source image is defined as a point such that all the pixels to its right and below it should be inverted (flipped). Figure 2.71a (1) illustrates the effect of a single inversion point and (2) shows how two inversion points on a horizontal line create a vertical strip in the bitmap. Such a strip is infinite but can be turned into a rectangle by adding two more points as shown in part 3 of the figure. An L-shaped source image can be represented by six inversion points as depicted in the same figure.
A source image with a hole and another image with two disjoint parts are shown in parts 3 and 4 of Figure 2.71b, respectively, while part 5 of the same figure illustrates a source image with a slanted line. Such an image requires many more inversion points, but the use of inversion points makes it possible to represent any shape, with straight or curved boundaries, with holes or with separate parts.
Figure 2.71: Inversion Points.
It is also clear, from Figure 2.71, that the inversion points are simply the endpoints of the horizontal boundary segments of the source image. This fact makes it easy for both the user and the software to determine the inversion points of a given image. The user does it by looking at the image, while the software has to scan the image row by row. Figure 2.71c shows how a source image in the shape of an L is embedded in a bitmap that contains a circle and a triangle. The six steps of this process are listed in parts (d) through (i) of the figure and they illustrate the order in which the inversion points are utilized. They are taken from top to bottom and points with identical y
coordinates are employed from left to right. This is why the inversion points of a source image should be stored in memory already sorted in this order. The result of this process is that the new source image is xored with the existing bitmap. The circle of Figure 2.71 is unaffected because it is located outside the region and the triangle is still recognizable. The advantage of the xor is that deleting an object restores any other objects intersecting it. Thus, deleting the L-shaped source image restores the triangle and deleting the triangle restores the new source image. In principle, this method is slow. Each new inversion point may require the inversion of those bits located to its right and below it in the bitmap, and there may be many thousands of such bits. Many may have to be inverted several times, as illustrated in Figure 2.71. It is easy to come up with an efficient version once we realize that embedding a source image in a bitmap region affects only the bits within the region. A faster version of this method starts by computing a bounding box around the source image and identifying the matching box in the bitmap. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
(a)
0000000000
0000000000
0000000000
0000000000
0000000000
0000000000
0000000001
0000000011
0000000111
0000001111
0000011111
0000001111
0000000111
0000000011
0000000001
0000000000
0000000000

(b)
1111110000
1111110000
1111110000
1111110000
1111110000
1111110000
1111110001
1111110011
1111111000
1111110000
1111100000
1111110000
1111111000
1111111100
1111111110
1111111111
1111111111

(c) Two folder windows, FOLDER A and FOLDER B, with A clipped to the boundary of B.
Figure 2.72: A Bitmap Region Before and After Bit Inversions.
Figure 2.72a,b illustrates this version. The bounding box is 10 × 17 bits (the six inversion points are shown in red). In the top eight rows, the 6 bits between the two consecutive inversion points are inverted. Starting with row 9, the range of inversion is widened to include all 10 bits of a row. The two inversion points on row 17 stop the inversion. Figure 2.72c shows how this method can be combined with clipping. When a new folder B is added to the bitmap, folder A is clipped to the boundaries of B, with the result that B seems to float above A. A GUI-based operating system allows the user to select folders on the screen by moving a cursor and clicking. When the user moves the cursor to an arbitrary point P inside an image and clicks, the operating system receives only the screen coordinates of P. It turns out that the use of inversion points makes it easy to decide whether P is located inside a given folder. The operating system starts by checking all the folders, finding those that contain P. It then performs an operation that depends on P. If P is not inside any folder, nothing should be done. If P is inside the top (active) folder, that folder is selected and highlighted. If P is outside the top folder and inside several inactive folders, the topmost of those folders is selected and becomes the active folder.
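The effect of a set of inversion points can be computed in a single pass with a two-dimensional prefix XOR, because a bit ends up inverted exactly when an odd number of inversion points lie above it and to its left. The following sketch illustrates the idea (the function name, coordinate convention, and sample points are hypothetical, not from the book):

```python
def render_inversion_points(points, rows, cols):
    """Return a rows x cols bitmap in which bit (i, j) is 1 iff an odd
    number of inversion points (r, c) satisfy r <= i and c <= j."""
    grid = [[0] * cols for _ in range(rows)]
    for r, c in points:                # mark each inversion point
        grid[r][c] ^= 1
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):          # 2D prefix XOR = parity of prefix count
            out[i][j] = (grid[i][j]
                         ^ (out[i - 1][j] if i else 0)
                         ^ (out[i][j - 1] if j else 0)
                         ^ (out[i - 1][j - 1] if i and j else 0))
    return out

# Four inversion points describing a solid 2x2 square (rows 1-2, columns 1-2)
bitmap = render_inversion_points([(1, 1), (1, 3), (3, 1), (3, 3)], 5, 5)
```

To xor a source image into an existing bitmap, the result would be combined with the old bits bitwise, which is what makes deletion by re-drawing possible.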
In order to determine whether point P is located inside a given folder, P is compared with all the inversion points of the folder as follows: A Boolean variable v is set to false. For each inversion point located above and to the left of P , v is toggled. If v is true at the end (i.e., v has been toggled an odd number of times), then P lies within the folder.
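The toggle test can be sketched directly; counting the qualifying inversion points and taking the parity is equivalent to toggling a Boolean (the function name and the sample square are hypothetical):

```python
def inside(p, inversion_points):
    """A point p = (row, col) lies inside the shape iff an odd number of
    inversion points lie above it and to its left (the toggle test)."""
    r, c = p
    toggles = sum(1 for (ir, ic) in inversion_points if ir <= r and ic <= c)
    return toggles % 2 == 1

# Inversion points of a solid square occupying rows 2-5 and columns 2-5
square = [(2, 2), (2, 6), (6, 2), (6, 6)]
```

A point inside the square sees only the top-left inversion point (one toggle, odd), while a point outside sees zero or two of them (even).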
2.27 Halftoning

For many decades, newspapers were printed in black-and-white. This is fine for text, but images require at least grayscale and preferably color. For years, CRT monitors used with computers were also monochromatic, which was fine for text but too restrictive for images. Even when color CRT displays became popular, the really high-resolution CRTs were still monochromatic. Color printers became low cost and popular only in the early 1990s, with the development of reliable, high-resolution inkjet printers. Thus, for many years, both text and images had to be printed on black-and-white printers.

The method used to generate grayscales on a bilevel device is known as halftoning. Halftoning makes it possible to display images with shades of gray on a black-and-white (i.e., bilevel) output device; the trade-off is loss of resolution. Instead of small, individual pixels, halftoning uses groups of pixels where only some of the pixels in a group are black. Halftoning is important because it makes it possible to print images in grayscale on a black-and-white printer. It is commonly used in newspapers and books. A classic reference is [Ulichney 87].

The human eye can resolve details as small as 1 minute of arc under normal light conditions. This is called the visual acuity. If we view a very small area from a normal viewing distance, our eyes cannot see the details in the area and end up integrating them, so that we only see an average intensity coming from the area. This property is called spatial integration and is nicely demonstrated by Figure 2.73. The figure consists of black circles and dots on a white background, but spatial integration creates the effect of a gray background.
Figure 2.73: Gray Backgrounds Obtained by Spatial Integration.
The principle of halftoning is to use groups of n×n pixels (with n usually in the range 2–4) and to set some of the pixels in a group to black. Depending on the black-to-white ratio of pixels in a group, the group appears to have a certain shade of gray. An n×n group contains n² pixels and can therefore provide n² + 1 levels of gray. The only practical problem is to find the best pattern for each of those levels. The n² + 1 pixel patterns selected must satisfy the following conditions:
1. Areas covered by copies of the same pattern should not show any textures.
2.27 Halftoning
108
2. Any pixel set to black for pattern k must also be black in all patterns of intensity levels greater than k. This is considered a good growth sequence, and it minimizes the differences between patterns of consecutive intensities.
3. The patterns should grow from the center of the n×n area, to create the effect of a growing dot.
4. All the black pixels of a pattern must be adjacent to each other. This property is called clustered-dot halftoning and is important if the output is intended for a printer (laser printers cannot always fully reproduce small isolated dots). If the output is intended for a monochromatic display only, then dispersed-dot halftoning can be used, where the black pixels of a pattern are not adjacent.

As a simple example of condition 1, a level-3 pattern whose black pixels form a horizontal row should be avoided, since large areas covered with such groups would produce long horizontal lines. Other patterns may result in similar, annoying textures. With a 2×2 group, such effects may be impossible to avoid, and the best that can be done is to choose the least objectionable patterns.

A 3×3 group provides for more possibilities. The ten (= 3² + 1) patterns produced by the matrix

    7 9 5
    2 1 4
    6 3 8

are the best ones possible (reflections and rotations of these patterns are considered identical) and usually avoid the problem of annoying textures. The rule for generating them is: to create the pattern for intensity n, set to black only those cells whose values in the above matrix are less than or equal to n.
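The matrix rule can be sketched in a few lines (the function name is hypothetical; the matrix is the one given in the text):

```python
M = [[7, 9, 5],
     [2, 1, 4],
     [6, 3, 8]]

def halftone_pattern(n):
    """Return the 3x3 pattern for intensity n (0..9): a cell is black (1)
    iff its value in the matrix M is <= n."""
    return [[1 if M[r][c] <= n else 0 for c in range(3)] for r in range(3)]
```

Because the matrix holds each value 1 through 9 exactly once, pattern n contains exactly n black pixels, and every black pixel of pattern k stays black in all higher-intensity patterns (condition 2); the first black pixel appears at the center (condition 3).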
The halftone method is not limited to a monochromatic display. Imagine a display with four levels of gray per pixel (2-bit pixels). Each pixel is either black or can be in one of three other levels. A 2×2 group consists of four pixels, each of which can be in one of three levels of gray or in black. The total number of levels is therefore 4×3 + 1 = 13. One possible set of the 13 patterns is shown here (each pattern is a 2×2 group, written as its top row over its bottom row):

    00  10  10  11  11  21  21  22  22  32  32  33  33
    00  00  01  01  11  11  12  12  22  22  23  23  33
2.28 Dithering

The downside of halftoning is loss of resolution. It is also possible to display continuous-tone images (i.e., images with different shades of gray) on a bilevel device without losing resolution. Such methods are sometimes called dithering, and their trade-off is loss of image detail. If the device resolution is high enough and if the image is watched from a suitable distance, then our eyes perceive an image in grayscale, but with fewer details than in the original.

It should be noted that dithering is the opposite of antialiasing. Antialiasing (Section 3.13) adds gray pixels to a black-and-white image in order to improve its appearance on a grayscale output device, while dithering converts a grayscale image to a black-and-white image, in order to get better output on a monochromatic output device.

The dithering problem can be phrased as follows: given an m×n array A of pixels in grayscale, compute an array B of the same size with zeros and ones (corresponding to white and black pixels, respectively) such that for every pixel B[i, j] the average value of the pixel and a group of its near neighbors will approximately equal the normalized value of A[i, j]. (Assume that pixel A[i, j] has an integer value I in the interval [0, a]; then its normalized value is the fraction I/a, which lies in the interval [0, 1].) (Dithering can also be exploited to create new colors by mixing several existing colors, as illustrated by Figure 26.30, which shows how purple is obtained by dithering red and blue.)

The simplest dithering method uses a threshold and a single test: set B[i, j] to white (0) if A[i, j] is bright enough (i.e., less than the value of the threshold); otherwise, set B[i, j] to black (1). This method is fast and simple but generates very poor results, as the next example shows, so it is never used in practice. As an example, imagine a human head.
The hair is generally darker than the face below it, so the simple threshold method may quantize the entire hair area to black and the entire face area to white, a very poor, unrecognizable, and unacceptable result. (It should be noted, however, that some images are instantly recognizable even in just black and white, as Figure 2.74 aptly demonstrates.) This method can be improved a little by using a different, random threshold for each pixel, but even this produces low-quality results. Four approaches to dithering—ordered dither, constrained average dithering, diffusion dither, and dot diffusion—are presented in this section. Another approach, called ARIES, is discussed in [Roetling 76] and [Roetling 77].
2.28.1 Ordered Dither

The principle of this method is to paint a pixel B[i, j] black or leave it white, depending on the intensity of pixel A[i, j] and on its position in the picture [i.e., on its coordinates (i, j)]. If A[i, j] is a dark shade of gray, then B[i, j] should ideally be dark, so it is painted black most of the time, but sometimes it is left white. The decision whether to paint it black or white depends on its coordinates i and j. The opposite is true for a bright pixel. This method is described in [Jarvis et al. 76].

The method starts with an m×n dithering matrix Dmn which is used to determine the color (black = 1 or white = 0) of all the B pixels. In the example below we assume that the A pixels have 16 gray levels, with 0 as white and 15 as black. The dithering matrices for m = n = 2 and m = n = 4 are shown below. The idea in these matrices is
Figure 2.74: A Familiar Black and White Image.
The giant panda resembles a bear, although anatomically it is more like a raccoon. It lives in the high bamboo forests of central China. Its body is mostly white, with black limbs, ears, and eye patches. Adults weigh 200 to 300 lb (90 to 140 kg). Low birth rates and human encroachment on its habitat have seriously endangered this species.
to minimize the amount of texture in areas with a uniform gray level.
    D22 = | 0 2 |
          | 3 1 | ,

    D44 = |  0  8  2 10 |
          | 12  4 14  6 |
          |  3 11  1  9 |
          | 15  7 13  5 | .
The rule is: Given a pixel A[x, y] calculate i = x mod m, j = y mod n, then select black (i.e., set B[x, y] to 1) if A[x, y] ≥ Dmn [i, j] and select white otherwise. To see how the dithering matrix is used, imagine a large, uniform area in the image A where all the pixels have a gray level of 4. Since 4 is the fifth of 16 levels, we would like to end up with 5/16 of the pixels in the area set to black (ideally they should be randomly distributed in this area). When a row of pixels is scanned in this area, y is incremented, but x does not change. Since i depends on x, and j depends on y, the pixels scanned are compared to one of the rows of matrix D44 . If this happens to be the first row, then we end up with the sequence 10101010 . . . in bitmap B. When the next line of pixels is scanned, x and, as a result, i have been incremented, so we look at the next row of D44 , that produces the pattern 01000100 . . . in B. The final result is an area in B that looks like
    10101010...
    01000100...
    10101010...
    00000000...

Ten out of the 32 pixels are black, and indeed 10/32 = 5/16. The black pixels are not randomly distributed in the area, but their distribution does not create annoying patterns either.

Exercise 2.28: Assume that image A has three large uniform areas with gray levels 0, 1, and 15 and calculate the pixels that go into bitmap B for these areas.

Ordered dither is easy to understand if we visualize copies of the dither matrix laid next to each other on top of the bitmap. Figure 2.75 shows a 6×12 bitmap with six copies of a 4×4 dither matrix laid on top of it. The threshold for dithering a pixel A[i, j] is that element of the dither matrix that happens to lie on top of A[i, j].
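The ordered-dither rule can be sketched as follows (the function name is hypothetical; the input is the uniform level-4 area discussed above):

```python
D44 = [[0, 8, 2, 10],
       [12, 4, 14, 6],
       [3, 11, 1, 9],
       [15, 7, 13, 5]]

def ordered_dither(A, D):
    """B[x][y] = 1 (black) iff A[x][y] >= D[x mod m][y mod n]."""
    m, n = len(D), len(D[0])
    return [[1 if A[x][y] >= D[x % m][y % n] else 0
             for y in range(len(A[0]))] for x in range(len(A))]

# A uniform 4x8 area of gray level 4 (out of 16 levels)
B = ordered_dither([[4] * 8 for _ in range(4)], D44)
```

Running this reproduces the four row patterns above, with 10 black pixels out of 32.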
Figure 2.75: Ordered Dither.
Matrix D44 above was created from D22 by the recursive rule

    Dnn = | 4D(n/2,n/2)                 4D(n/2,n/2) + 2U(n/2,n/2) |
          | 4D(n/2,n/2) + 3U(n/2,n/2)   4D(n/2,n/2) + U(n/2,n/2)  | ,     (2.9)
where Unn is an n×n matrix with all ones. Other matrices are easy to generate with this rule. Exercise 2.29: Use the rule of Equation (2.9) to construct D88 . The basic rule of ordered dither can be generalized as follows: Given a pixel A[x, y], calculate i = x mod m and j = y mod n, then select black (i.e., assign B[x, y] ← 1) if Ave[x, y] ≥ Dmn [i, j], where Ave[x, y] is the average of the 3×3 group of pixels centered on A[x, y]. This is computationally more intensive but tends to produce better results in most cases since it considers the average brightness of a group of pixels. Reference [Wolfram-dither 10a] illustrates ordered dither.
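Equation (2.9) can be expressed as a small function (the function name is hypothetical):

```python
def double_dither(D):
    """Apply the recursive rule of Equation (2.9): the four quadrants are
    4D and 4D+2U (top), 4D+3U and 4D+U (bottom), with U the all-ones matrix."""
    top = [[4 * d for d in row] + [4 * d + 2 for d in row] for row in D]
    bot = [[4 * d + 3 for d in row] + [4 * d + 1 for d in row] for row in D]
    return top + bot

D22 = [[0, 2], [3, 1]]
D44 = double_dither(D22)
D88 = double_dither(D44)
```

A quick sanity check: the computed D44 matches the matrix given above, and D88 is a permutation of 0 through 63, as a dithering matrix for 64 gray levels should be.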
Ordered dither is a simple, fast method, but it tends to create images that have been described by various people as “computerized,” “cold,” or “artificial.” The reason for that is probably the recursive nature of the dithering matrix.
2.28.2 Constrained Average Dithering

In cases where high speed is not critical, constrained average dithering [Jarvis and Roberts 76] produces good results, although it requires more computations than ordered dither. The idea is to compute, for each pixel A[i, j], the average Ā[i, j] of the pixel and its eight near neighbors. The pixel is then compared to a threshold of the form

    γ + (1 − 2γ/M) Ā[i, j],

where γ is a user-selected parameter and M is the maximum value of A[i, j]. Notice that the threshold can have values in the range [γ, M − γ]. The final step is to compare A[i, j] to the threshold and set B[i, j] to 1 if A[i, j] ≥ threshold and to 0 otherwise. The main advantage of this method is edge enhancement.

An example is Figure 2.76, which shows a 6×8 bitmap A, where pixels have 4-bit values. Most pixels have a value of 0, but the bitmap also contains a thick slanted line indicated in the figure. It separates the 0 (white) pixels in the upper-left and bottom-right corners from the darker pixels in the middle.
Figure 2.76: Constrained Average Dithering.
Figure 2.76a shows how the pixels around the line have values approaching the maximum (which is 15). Figure 2.76b shows (for some pixels) the average of the pixel and its eight near neighbors (the averages are shown as integers, so an average of 54 really indicates 54/9). The result of comparing these averages to the threshold (which in this example is 75) is shown in Figure 2.76c. It is easy to see how the thick line is sharply defined in the bilevel image. So I’m a ditherer? Well, I’m jolly well going to dither, then! —Roland Young (as Cosmo Topper) in Topper (1937).
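The method can be sketched as follows, assuming a 3×3 neighborhood that is clamped at the image borders (the function name and parameter values are hypothetical):

```python
def constrained_average_dither(A, gamma, M):
    """B[i][j] = 1 iff A[i][j] >= gamma + (1 - 2*gamma/M) * Abar[i][j],
    where Abar[i][j] averages the pixel and its (up to 8) near neighbors."""
    rows, cols = len(A), len(A[0])
    B = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            nbrs = [A[u][v]
                    for u in range(max(0, i - 1), min(rows, i + 2))
                    for v in range(max(0, j - 1), min(cols, j + 2))]
            abar = sum(nbrs) / len(nbrs)
            threshold = gamma + (1 - 2 * gamma / M) * abar
            B[i][j] = 1 if A[i][j] >= threshold else 0
    return B
```

For a uniform white image the threshold is γ (so B is all white), and for a uniform black image it is M − γ (so B is all black), matching the stated range [γ, M − γ].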
2.28.3 Diffusion Dither

Imagine a photograph, rich in color, being digitized by a scanner that can distinguish millions of colors. The result may be an image file where each pixel A[i, j] is represented
by, say, 24 bits. The pixel may have one of 2²⁴ colors. Now, imagine that we want to display this file on a computer that can only display 256 colors simultaneously on the screen. A good example is a computer using a color lookup table whose size is 3×256 bytes (Page 34 and Section 20.7). We begin by loading a palette of 256 colors into the lookup table (see Section 17.4 for a good method to select such a palette). Each pixel A[i, j] of the original image will now have to be displayed on the screen as a pixel B[i, j] in one of the 256 palette colors.

Suppose that a pixel A[i, j] has the original color (255, 52, 80). If we decide to assign pixel B[i, j] the palette color (207, 62, 86), then we are left with a difference of

    A[i, j] − B[i, j] = (255, 52, 80) − (207, 62, 86) = (48, −10, −6).

This difference is called the color error of the pixel. Large color errors degrade the quality of the displayed image, so an algorithm is needed to minimize the total color error of the image. Diffusion dither does this by distributing the color errors among all the pixels such that the total color error for the entire image is zero (or very close to zero).

The algorithm is very simple. Pixels A[i, j] are scanned line by line from top to bottom. In each line, they are scanned from left to right. For each pixel, the algorithm performs the following:
1. Pick the palette color that is nearest the original pixel's color. This palette color is stored in the destination bitmap B[i, j].
2. Calculate the color error A[i, j] − B[i, j] for the pixel.
3. Distribute this error to four of A[i, j]'s nearest neighbors that haven't been scanned yet (the one on the right and the three centered below) according to the Floyd–Steinberg filter [Floyd and Steinberg 75], where X represents the current pixel:

            X    7/16
    3/16  5/16   1/16

Consider the example of Figure 2.77a.
The current pixel is (255, 52, 80) and we (arbitrarily) assume that the nearest palette color is (207, 62, 86). The color error is (48, −10, −6) and is distributed as shown in Figure 2.77c. The four near neighbors are assigned new colors as shown in Figure 2.77b. The algorithm is shown in Figure 2.78a, where the weights p1, p2, p3, and p4 can be assigned either the Floyd–Steinberg values 7/16, 3/16, 5/16, and 1/16 or any other values. The total color error may not be exactly zero because the method does not work well for the leftmost column and for the bottom row of pixels. However, the results can be quite good if the palette colors are carefully selected.

This method can easily be applied to any bilevel (monochromatic) output device, as shown by the pseudocode of Figure 2.78b, where p1, p2, p3, and p4 are the four error-diffusion parameters. They can be the ones already given (i.e., 7/16, 3/16, 5/16, and 1/16) or different ones, but their sum should be 1.
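The bilevel version of the algorithm can be sketched in runnable form (the function name is hypothetical; the input is assumed to be rows of reals in [0, 1], and errors that would fall outside the image are simply dropped):

```python
def floyd_steinberg(A):
    """Bilevel error diffusion with the Floyd-Steinberg weights."""
    a = [row[:] for row in A]          # work on a copy of the input
    m, n = len(a), len(a[0])
    B = [[0] * n for _ in range(m)]
    # (row offset, column offset, weight) for the four unscanned neighbors
    weights = ((0, 1, 7 / 16), (1, -1, 3 / 16), (1, 0, 5 / 16), (1, 1, 1 / 16))
    for i in range(m):
        for j in range(n):
            B[i][j] = 0 if a[i][j] < 0.5 else 1
            err = a[i][j] - B[i][j]
            for di, dj, w in weights:
                u, v = i + di, j + dj
                if 0 <= u < m and 0 <= v < n:
                    a[u][v] += err * w
    return B
```

Dithering a single row of 0.5-valued pixels with this sketch yields the alternating pattern 1, 0, 1, 0, ..., which also answers the spirit of Exercise 2.31.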
The error (48, −10, −6) is distributed to the four near neighbors with the Floyd–Steinberg weights:

    7/16 × (48, −10, −6) = (21, −4, −3),
    3/16 × (48, −10, −6) = (9, −2, −1),
    5/16 × (48, −10, −6) = (15, −3, −2),
    1/16 × (48, −10, −6) = (3, −1, 0).

Figure 2.77: Diffusion Dither.
for i := 1 to m do
  for j := 1 to n do
  begin
    B[i, j] := SearchPalette(A[i, j]);
    err := A[i, j] − B[i, j];
    A[i, j + 1] := A[i, j + 1] + err ∗ p1;
    A[i + 1, j − 1] := A[i + 1, j − 1] + err ∗ p2;
    A[i + 1, j] := A[i + 1, j] + err ∗ p3;
    A[i + 1, j + 1] := A[i + 1, j + 1] + err ∗ p4;
  end.
(a)

for i := 1 to m do
  for j := 1 to n do
  begin
    if A[i, j] < 0.5 then B[i, j] := 0 else B[i, j] := 1;
    err := A[i, j] − B[i, j];
    A[i, j + 1] := A[i, j + 1] + err ∗ p1;
    A[i + 1, j − 1] := A[i + 1, j − 1] + err ∗ p2;
    A[i + 1, j] := A[i + 1, j] + err ∗ p3;
    A[i + 1, j + 1] := A[i + 1, j + 1] + err ∗ p4;
  end.
(b)

Figure 2.78: Diffusion Dither Algorithm. (a) For Color. (b) For Bilevel.
Exercise 2.30: Consider an all-gray image, where A[i, j] is a real number in the range [0, 1] and it equals 0.5 for all pixels. What image B would be generated by diffusion dither in this case?

Exercise 2.31: Imagine a grayscale image consisting of a single row of pixels where pixels have real values in the range [0, 1]. The value of each pixel p is compared to the threshold value of 0.5 and the error is propagated to the neighbor on the right. Show the result of dithering a row of pixels all with values 0.5.

Error diffusion can also be used for color printing. A typical low-end inkjet color printer has four ink cartridges for cyan, magenta, yellow, and black ink. The printer places dots of ink on the page such that each dot has one of the four colors. If a certain area on the page should have color L, where L isn't any of CMYK, then L can be simulated by dithering (see also Page 1256 and color Plate U.2). This is done by printing adjacent dots in the area with CMYK colors such that the eye (which integrates colors in a small area) will perceive color L. This can be done with error diffusion where the palette consists of the four colors cyan (255, 0, 0), magenta (0, 255, 0), yellow (0, 0, 255), and black (255, 255, 255) and the error for a pixel is the difference between the pixel color and the nearest palette color.

A slightly simpler version of error diffusion is the minimized average error method. The errors are not propagated but rather calculated and saved in a separate table. When a pixel A[x, y] is examined, the error table is used to look up the errors E[x + i, y + j] already computed for some previously seen neighbors of the pixel. Pixel B[x, y] is assigned a value of 0 or 1 depending on the corrected intensity

    A[x, y] + (1/Σij αij) Σij αij E[x + i, y + j].
The new error, E[x, y] = A[x, y] − B[x, y], is then added to the error table to be used for future pixels. The quantities αij are weights assigned to the near neighbors of A[x, y]. They can be assigned in many different ways, but they should assign more weight to nearby neighbors, so the following is a typical example:

    α = | 1 3 5 3 1 |
        | 3 5 7 5 3 |
        | 5 7 x − − | ,

where x is the current pixel A[x, y] and the weights are defined for some previously seen neighbors above and to the left of A[x, y]. If the weights add up to 1, then the corrected intensity above is simplified and becomes

    A[x, y] + Σij αij E[x + i, y + j].
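The minimized average error method can be sketched for the bilevel case as follows. Everything below is an assumed reading of the description in the text: the weights are placed at the previously seen offsets (two rows above through the two pixels to the left), and each correction is normalized by the sum of the weights that actually fall inside the image.

```python
# Weights for previously seen neighbors, keyed by (row offset, col offset);
# negative row offsets are rows above, negative col offsets are to the left.
ALPHA = {(-2, -2): 1, (-2, -1): 3, (-2, 0): 5, (-2, 1): 3, (-2, 2): 1,
         (-1, -2): 3, (-1, -1): 5, (-1, 0): 7, (-1, 1): 5, (-1, 2): 3,
         (0, -2): 5, (0, -1): 7}

def minimized_average_error(A):
    """A: rows of reals in [0, 1]. Errors are saved in a table E and used
    to correct later pixels; nothing is propagated into A itself."""
    m, n = len(A), len(A[0])
    B = [[0] * n for _ in range(m)]
    E = [[0.0] * n for _ in range(m)]
    for x in range(m):
        for y in range(n):
            num = den = 0.0
            for (di, dj), a in ALPHA.items():
                u, v = x + di, y + dj
                if 0 <= u < m and 0 <= v < n:
                    num += a * E[u][v]
                    den += a
            corrected = A[x][y] + (num / den if den else 0.0)
            B[x][y] = 1 if corrected >= 0.5 else 0
            E[x][y] = A[x][y] - B[x][y]  # save the new error for later pixels
    return B
```

On a pure black or pure white image the error table stays at zero, so the output reproduces the input exactly.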
Floyd–Steinberg error diffusion generally produces better results than ordered dither but has two drawbacks, namely it is serial in nature and it sometimes produces annoying
“ghosts.” Diffusion dither is serial since the near neighbors of B[i, j] cannot be calculated until the calculation of B[i, j] is complete and the error A[i, j] − B[i, j] has been distributed to the four near neighbors of A[i, j]. To understand why ghosts are created, imagine a dark area positioned above a bright area (for example, a dark sky above a bright sea). When the algorithm works on the last dark A pixels, a lot of error is distributed below, to the first bright A pixels (and also to the right). When the algorithm gets to the first bright pixels, they have collected so much error from above that they may no longer be bright, creating perhaps several rows of dark B pixels. It has been found experimentally that ghosts can be “exorcised” by scaling the A pixels before the algorithm starts. For example, each A[i, j] pixel can be replaced by 0.1+0.8A[i, j], which “softens” the differences in brightness (the contrast) between the dark and bright pixels, thereby reducing the ghosts. This solution also changes all the pixel intensities, but the eye is less sensitive to absolute intensities than to changes in contrast, so changing intensities may be acceptable in many practical situations. Reference [Wolfram-dither 10b] illustrates this type of dither.
2.28.4 Dot Diffusion

This section is based on [Knuth 87], a very detailed article that includes thorough analysis and actual images dithered using several different methods. The dot diffusion algorithm is somewhat similar to diffusion dither; it also produces good-quality, sharp bilevel images, but it is not serial in nature and may be easier to implement on a parallel computer.

We start with the 8×8 class matrix of Figure 2.79a. The way this matrix was constructed will be discussed later. For now, we simply consider it a permutation of the integers (0, 1, . . . , 63), which we call classes. The class number k of a pixel A[i, j] is found at position (i, j) of the class matrix. The main algorithm is shown in Figure 2.80. The algorithm computes all the pixels of class 0 first, then those of class 1, and so on. Procedure Distribute is called for every class k and diffuses the error err to those near neighbors of A[i, j] whose class numbers exceed k.

The algorithm distinguishes between the four orthogonal neighbors and the four diagonal neighbors of A[i, j]. If a neighbor is A[u, v], then the former type satisfies (u − i)² + (v − j)² = 1, while the latter type is identified by (u − i)² + (v − j)² = 2. It is reasonable to distribute more of the error to the orthogonal neighbors than to the diagonal ones, so a possible weight function is weight(x, y) = 3 − x² − y². For an orthogonal neighbor, either (u − i) or (v − j) equals 1, so weight(u − i, v − j) = 2, while for a diagonal neighbor, both (u − i) and (v − j) equal 1, so weight(u − i, v − j) = 1. Procedure Distribute is listed in pseudocode in Figure 2.80.

Once the coordinates (i, j) of a pixel A[i, j] are known, the class matrix gives the pixel's class number k, which is independent of the color (or brightness) of the pixel. The class matrix also gives the classes of the eight near neighbors of A[i, j], so those neighbors whose classes exceed k can be selected and linked in a list.
It is a good idea to construct those lists once and for all, since this speeds up the algorithm considerably. It remains to show how the class matrix, Figure 2.79a, was constructed. The main consideration is the relative positioning of small and large classes. Imagine a large class surrounded, in the class matrix, by smaller classes. An example is class 63, which is surrounded by the “lower classes” 43, 59, 57, 51, 60, 39, 47, and 55. A little thinking
34 48 40 32 29 15 23 31
42 58 56 53 21  5  7 10
50 62 61 45 13  1  2 18
38 46 54 37 25 17  9 26
28 14 22 30 35 49 41 33
20  4  6 11 43 59 57 52
12  0  3 19 51 63 60 44
24 16  8 27 39 47 55 36

(a)

Parts (b), (c), and (d) show the same matrix after its 10, 21, and 32 lowest classes, respectively, have been blackened.

Figure 2.79: 8×8 Matrices for Dot Diffusion.
for k := 0 to 63 do
  for all (i, j) of class k do
  begin
    if A[i, j] < 0.5 then B[i, j] := 0 else B[i, j] := 1;
    err := A[i, j] − B[i, j];
    Distribute(err, i, j, k);
  end.

procedure Distribute(err, i, j, k);
  w := 0;
  for all neighbors A[u, v] of A[i, j] do
    if class(u, v) > k then w := w + weight(u − i, v − j);
  if w > 0 then
    for all neighbors A[u, v] of A[i, j] do
      if class(u, v) > k then
        A[u, v] := A[u, v] + err × weight(u − i, v − j)/w;
end;

Figure 2.80: The Dot Diffusion Algorithm.
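A runnable sketch of the algorithm, using the class matrix of Figure 2.79a tiled over the image (the function name is hypothetical; the input is assumed to be rows of reals in [0, 1]):

```python
C = [[34, 48, 40, 32, 29, 15, 23, 31],
     [42, 58, 56, 53, 21,  5,  7, 10],
     [50, 62, 61, 45, 13,  1,  2, 18],
     [38, 46, 54, 37, 25, 17,  9, 26],
     [28, 14, 22, 30, 35, 49, 41, 33],
     [20,  4,  6, 11, 43, 59, 57, 52],
     [12,  0,  3, 19, 51, 63, 60, 44],
     [24, 16,  8, 27, 39, 47, 55, 36]]

def dot_diffusion(A):
    a = [row[:] for row in A]
    m, n = len(a), len(a[0])
    B = [[0] * n for _ in range(m)]
    cls = lambda i, j: C[i % 8][j % 8]   # class of pixel (i, j)
    # group pixel coordinates by class number, processed in class order
    by_class = [[] for _ in range(64)]
    for i in range(m):
        for j in range(n):
            by_class[cls(i, j)].append((i, j))
    for k in range(64):
        for i, j in by_class[k]:
            B[i][j] = 0 if a[i][j] < 0.5 else 1
            err = a[i][j] - B[i][j]
            nbrs = [(u, v) for u in (i - 1, i, i + 1) for v in (j - 1, j, j + 1)
                    if (u, v) != (i, j) and 0 <= u < m and 0 <= v < n
                    and cls(u, v) > k]
            # weight(u-i, v-j) = 3 - (u-i)^2 - (v-j)^2: 2 orthogonal, 1 diagonal
            w = sum(3 - (u - i) ** 2 - (v - j) ** 2 for u, v in nbrs)
            if w > 0:
                for u, v in nbrs:
                    a[u][v] += err * (3 - (u - i) ** 2 - (v - j) ** 2) / w
    return B
```

The tests below also verify two properties stated in the text: the matrix is a permutation of 0 through 63, and entries at (i, j) and (i, j + 4) always add up to 63.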
shows that as the algorithm iterates toward 63, more and more error is absorbed into pixels that belong to this class, regardless of their brightness. A large class surrounded by lower classes is therefore undesirable and may be called a “baron.” The class matrix of Figure 2.79a has just two barons. Similarly, “near-baron” positions, which have only one higher-class neighbor, are undesirable and should be avoided. Our class matrix has just two of them. Exercise 2.32: What are the barons and near-barons of our class matrix? Exercise 2.33: Consider an all-gray image where A[i, j] = 0.5 for all pixels. What image B would be generated by dot diffusion in this case? Another important consideration is the positions of consecutive classes in the class matrix. Figure 2.79b,c,d shows the class matrix after 10, 21, and 32 of its lowest classes have been blackened. It is easy to see how the black areas form 45◦ grids that grow and eventually form a 2×2 checkerboard. This helps create diagonal, rather than rectilinear dot patterns in the bilevel array B, and we know from experience that such patterns are less noticeable to the eye. Figure 2.81a shows a class matrix with just one baron and
25 21 13 39 47 57 53 45
48 32 29 43 55 63 61 56
40 30 35 51 59 62 60 52
36 14 22 26 46 54 58 44
16  6 10 18 38 42 50 24
 8  0  2  7 15 31 34 20
 4  1  3 11 23 33 28 12
17  9  5 19 27 49 41 37

(a)

14 13  1  2
 4  6 11  9
 0  3 15 12
10  8  5  7

(b)

Figure 2.81: Two Class Matrices for Dot Diffusion.
one near-baron, but it is easy to see how the lower classes are mostly concentrated at the bottom-left corner of the matrix. A close examination of the class matrix shows that the class numbers in positions (i, j) and (i, j + 4) always add up to 63. This means that the grid pattern of 63 − k white pixels after k steps is identical to the grid pattern of 63 − k black pixels after 63 − k steps, shifted right four positions. This relation between the dot pattern and the diffusion pattern is the reason for the name dot diffusion. Exercise 2.34: Figure 2.81b shows a 4×4 class matrix. Identify the barons, near-barons, and grid patterns. Experiments with the four methods described in this section seem to indicate that the dot diffusion method produces best results for printing because it tends to generate contiguous areas of black pixels, rather than “checkerboard” areas of alternating black and white. Modern laser and ink-jet printers have resolutions of 600 dpi or more, but they generally cannot produce a high-quality checkerboard of 300 black and 300 white alternating pixels per inch.
2.29 Stippling

Stippling is the process of generating a random pattern of black dots that we perceive as a grayscale image. Stippled images are found in nature and are sometimes painted by artists manually with a pen or brush. The pen and ink drawing of locomotive 1356 (Figure 2.82, courtesy of H. L. Scott) is a beautiful example. Stippling as an art form may include strokes in addition to dots. Manual stippling is an attempt to direct the viewer's attention to important features and details, to simplify features, and to expose features that are otherwise hidden. When done properly, stippling can selectively include and enhance certain details that make the stippled image more expressive than a photograph.

In computer graphics and image processing, stippling may be used instead of halftoning and dithering, because it is fast and because it avoids the annoying artifacts that often result from the regular halftone patterns. Pointillism (Section 2.1) also works by spreading small dots, and some experts claim that pointillism and stippling are the same. Others maintain that stippling employs black
Figure 2.82: A Hand-Drawn Stippled Image (Courtesy H. L. Scott).
dots on a white background to create a grayscale image, while pointillism is based on colored dots placed close together, so that their colors are combined and averaged in the eye.

Other meanings of stippling: A random pattern of small depressions made in a surface to increase friction and provide better grip; this is also referred to as knurling or checkering. The damage caused by spider mites, which leave tiny but growing white spots on the leaves such that eventually the entire leaf appears silvery. In medicine, the circular pattern of dots that appears around a gunshot wound when the bullet was fired from close range.

The stippling method described here is simple and intuitive. Given a grayscale image, where each pixel is a number between 0 and 255, the idea is to select certain pixels of the original, grayscale image at random and change them to black, while changing all the other pixels to white. More pixels should be selected in dark regions of the image, which suggests the following approach. Scan the image pixel by pixel. For each pixel P(i, j), draw a random number R in the interval [0, T ] (where T is a user-controlled threshold parameter) and compare it to the gray intensity of the pixel. If P(i, j) > R, paint the pixel black; otherwise, change it to white. (Notice that in the Mathematica code of Figure 2.83, the condition is “<” because in the test files, 255 was the value of white and 0 was the value of black. Also, the rows of the bitmap were in reverse order, which is why the code has stp[[row+1-i,j]]=1 instead of stp[[i,j]]=1.)

This method makes sense because (1) the black pixels are randomly distributed and (2) in dark regions of the image, where many gray pixels have large values, more pixels are selected. Figure 2.83 shows an original grayscale image and two stippled versions: a dark version where 70% of the pixels were chosen, and a brighter version, with a threshold of 150, where only 57% of the pixels were chosen.
Notice that reversing the relational operator in btmp[[i,j]]≤RandomInteger[150] results in a stippled negative of the original image. This simple stippling method gives good results, but can be improved, at least for
2.29 Stippling
(* Stippling an image *)
ar=Import["A1965.jpg"]; (* Input grayscale image *)
d=ImageData[ar];
btmp=255 d[[All,All,1]]; (* Grayscale values in [0,255] *)
{row,col}=Dimensions[btmp]
stp=Table[0,{i,1,row},{j,1,col}]; (* Init stp to zeros *)
Do[If[btmp[[i,j]]<=RandomInteger[150], stp[[row+1-i,j]]=1],
  {i,1,row},{j,1,col}]

(* Now give r a ramp distribution *)
r=Table[Max[Random[Real,{0,R}], Random[Real,{0,R}]], {n}];
P=Point[Table[{r[[i]] Cos[theta[[i]]], r[[i]] Sin[theta[[i]]]}, {i,1,n}]];
(* points are uniformly distributed *)
Show[Graphics[{Green,P}], Graphics[{Blue,Circle[{0,0},10]}],
  Graphics[{Red,Point[{0,0}]}], AspectRatio->1]
Figure 2.87: Two Distributions of Random Points in a Circle.
The left part of Figure 2.87 shows the results of the latter method. Surprisingly, the 300 points are not distributed uniformly but are denser toward the center. The right-hand part of the figure shows points that are uniformly distributed. An examination of the code verifies that the only difference between the two parts is the way the random r numbers are generated. The statement Max[Random[Real,{0,R}], Random[Real,{0,R}]] results in a random sequence of numbers with a ramp distribution, which causes the points to be distributed uniformly. Here is why. Assume that both θ and r are uniformly distributed. Assume further that the 300 random numbers that we draw for r can take only 10 values, r1 through r10, between 0 and R, and similarly the 300 numbers for θ are limited to 10 values θj between 0 and 2π. For each ri there are now about 30 points with various θj, and they are arranged in a circle of radius ri. This explains why inner circles (closer to the center)
are denser than outer circles. It is obvious that the 300 random numbers for r should have more large values than small values, i.e., r should be generated with a ramp distribution. If among the 300 numbers there are m random numbers with r = 0.15, then there should be 2m numbers with r = 0.30. It remains to understand how the Max function converts the distribution of r from uniform to ramp. The code above shows that we draw a pair of random numbers a and b distributed uniformly in the interval [0, R], and select the larger. Once the first number, a, is drawn, it partitions the interval [0, R] into two subintervals [0, a] and [a, R]. In order for a to be larger than the next number b, the latter has to be located in [0, a] and the probability of that is the ratio a/R of intervals. Thus, the probability of selecting a is a/R, proportional to a. With such probability, larger values are selected more often than smaller values. See also Section 22.6 for the Gaussian distribution of random numbers. One might have thought that from traditional mathematics and statistics there would long ago have emerged some standard definition of randomness. But despite occasional claims for particular definitions, the concept of randomness has in fact remained quite obscure. And indeed I believe that it is only with the discoveries in this book that one is finally now in a position to develop a real understanding of what randomness is. —Stephen Wolfram, A New Kind of Science (2002).
2.31 Image Processing The rapid development of computer graphics (and related hardware such as scanners and digital cameras) in the last four decades has created a flood of images. As a result, the field of image processing has grown side by side with computer graphics. Image processing is a vast set of techniques that input an image and output either another image or a set of characteristics or parameters related to the input image. Many of those techniques modify a given image in ways that make it more useful or more interesting. Many times, an image taken by a satellite needs to be sharpened or painted with false colors. On the other hand, an artist may want to take a sharply focused photograph and intentionally blur it, or make it look as if it was originally painted by watercolors (Plates E.3, E.4, H.2, and H.4), or make it resemble an image embossed on paper—in order to achieve interesting, beautiful, or useful effects. The input image to be processed is a bitmap, with 1 or more bits per pixel. The processing software must be given the three dimensions of the bitmap (number of rows, number of columns, and number of bits per pixel) and it normally creates the new image in another bitmap, pixel by pixel. A typical image processing algorithm consists of a loop that iterates over all the pixels of the image, processing each in the same way. The original pixel stays in the original bitmap (because its value may be needed to process neighboring pixels) and the newly computed pixel is stored in the new bitmap.
2 Raster Graphics
The techniques described here produce very different results, but most are based on the same principle. The principle is to define a small matrix of values, called a convolution kernel, to place it centered on the current pixel, to multiply the value of the pixel and the values of its neighbors by the values in the kernel, to sum the results, and to store the sum in the new bitmap, as the value of the newly computed pixel. If the color of a pixel is specified by means of three numbers (normally the red, green, and blue components), then this process is applied separately to each of the three components of the current pixel.

Blurring: This is achieved by the convolution kernel of Figure 2.88b. The values of the current pixel and eight of its nearest neighbors are multiplied by the weights shown and the products are added. The result is a color value that still has 20% of the old pixel value but has contributions of 8% and 12% from neighboring pixels. Note that the weights add up to 1. As an example, suppose that the current pixel is the center pixel of Figure 2.88a. The calculation is 5×0.08 + 5×0.12 + 8×0.08 + 19×0.12 + 5×0.20 + 8×0.12 + 11×0.08 + 1×0.12 + 8×0.08 = 7.52, yielding a value of 8 for the new pixel. Note that more blurring can be achieved by having a 5×5 convolution kernel (the weights should add up to 1) which will spread the color of a pixel to 24 of its neighbors. (See also Exercise 25.9.)
 5  5  8
19  5  8
11  1  8
(a)

.08 .12 .08
.12 .20 .12
.08 .12 .08
(b)

 0 −1  0
−1  5 −1
 0 −1  0
(c)

−1  0  0
 0  0  0
 0  0  1
(d)

0 0 0 10 10
0 4 1  3  1
4 4 8  7  7
4 4 8  8  9
4 6 6  6  5
(e)

Figure 2.88: Image Processing Techniques.
Sharpening: Sharpening, which often seems a miracle, is obtained by the convolution kernel of Figure 2.88c. The weights again add up to 1, but the negative weights magnify any contrasts between the original pixels. More sharpening may be achieved by repeating the process on the new bitmap. Embossing: This is achieved by the convolution kernel of Figure 2.88d. Note that the weights here add up to 0. To understand how this works, we should think of pixels along an edge, as opposed to pixels away from an edge. Pixels located away from an edge tend to be similar and we can call them “background” pixels. The convolution kernel of Figure 2.88d sets such pixels to 0 or close to 0. In contrast, if the current pixel is part of an edge that goes from bottom-left to top-right (if it is a nonbackground pixel), then its two diagonal neighbors should have different colors and our kernel will create a new nonzero pixel.
Our kernel creates a white background since it sets all background pixels to 0. Visually, it is better to have a medium gray background, and this is easily achieved by adding 128 to each pixel generated. Background pixels will now have a value of 128 (or close to 128) and nonbackground ones will have arbitrary values. Pixels with values greater than 255 can be truncated to 8 bits (i.e., reduced modulo 256). Note that the embossing kernel can be written in a number of ways. All that is needed are the numbers 1 and −1 in opposite corners. Different kernels create the effect of the light hitting the embossed picture from different directions.

Watercoloring: An image can be modified to look as if it has been painted in watercolor by examining a group of neighbors centered on the current pixel and replacing the original value of the pixel by the median of the group. Assuming that the current pixel is the center one in Figure 2.88e, we sort the values in the group of 5×5 neighbor pixels to obtain 0, 0, 0, 0, 1, 1, 3, 4, 4, 4, 4, 4, 4, 5, 6, 6, 6, 7, 7, 8, 8, 8, 9, 10, 10. The median value is 4 (since there are 12 smaller values and 12 greater ones). Our center pixel of 8 is therefore replaced, in the new bitmap, by 4. If the result is too soft, it can later be sharpened.

Gaussian Blur: A linear blur assigns smaller and smaller weights to pixels that are farther away from the center. A blurred pixel may be assigned a% of its original value and contributions from its neighbors that go down linearly to (a − b)%, (a − 2b)%, (a − 3b)%, and so on, for farther away neighbors. Thus, if a = 24% and b = 2%, then a blurred pixel will retain 24% of its value and will be assigned 22%, 20%, 18%, and 16% of the values of its neighbors at distances of 1, 2, 3, and 4 units. Notice that the percentages add up to 100. Gaussian blur (also known as Gaussian smoothing) is nonlinear.
It may blur a pixel by assigning it 22% of its original value, but only 11% of each of its nearest neighbors, 1.3% of each neighbor at distance 2, 0.03% of each neighbor at distance 3, and so on, down to 0.000067% for each neighbor at distance 6. The percentages have to add up to 100, so the only user-controlled parameter is the maximum distance. Figure 2.89 shows four examples of Gaussian blur (with maximum distances of 10, 20, 30, and 40 pixels) and two linear blurs, one of them radial. We know from experience that linear blur results in an image that seems unfocused (this is also called the bokeh effect). Gaussian blur, on the other hand, reduces the higher frequencies of the image. It has the effect of a low-pass filter and it results in an image that seems to be viewed through translucent glass.

Bokeh (or boke) is a term referring to the quality of the out-of-focus (blurred) parts of an image. The amount of bokeh in an image is not well-defined. The bokeh of a photo is determined by certain characteristics of the lens, such as its aperture, the circles of confusion (Section 26.4.7), and how far out-of-focus the lens is.

The mathematics of Gaussian blur is based on the well-known Gaussian distribution (or bell curve, Section 22.6), but in two dimensions. We start with the two-dimensional Gaussian distribution, select a value for the standard deviation σ, and compute the
Figure 2.89: Gaussian and Linear Blurrings.
expression

$$G(x, y) = \frac{1}{2\pi\sigma^2}\,\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right),$$
for several neighbor pixels distributed symmetrically around the current pixel. For example, x and y may each vary seven steps from −3 to 3, producing a 7 × 7 array of Gaussian coefficients. This array is normalized such that the sum of all 49 coefficients is 1, as shown in Figure 2.90, and the current pixel is replaced by the weighted sum of itself and its 48 near neighbors, each multiplied by the corresponding Gaussian coefficient (i.e., it is convolved with a Gaussian distribution). This is done for every pixel in the image (or in the region to be blurred) and the new pixels then replace the original ones. Notice how the elements of the kernel are normalized by dividing them by their sum, so that their new sum is 1. Exercise 2.36: Why is the normalization necessary?
2.31.1 An Alternative Approach It is possible to process images by means of the fundamental relation (1 − t)P0 + tP1 (this is the all-important Equation (9.1)). The idea is to blend two images (Section 8.5) by interpolating (i.e., using 0 ≤ t ≤ 1) or by extrapolating (using t < 0 or t > 1) them. Values t > 1 subtract part of P0 while scaling P1 . Negative values of t do the reverse. The examples below show how a general image P0 can be blended with a special image P1 (a mask) to obtain the following useful results: Brightness: We select a bitmap of all black as the mask P1 . Interpolation darkens the image while extrapolation brightens it. The original image is obtained for t = 0. Contrast: We compute the average intensity I of all the pixels in the original image. We build the mask as a gray bitmap where every pixel has value I. Interpolation (0 ≤ t ≤ 1) reduces contrast, while extrapolation increases it. Negative values of t generate inverted images. The average intensity of the final image is always I.
0.00000067 0.00002292 0.00019117 0.00038771 0.00019117 0.00002292 0.00000067
0.00002292 0.00078633 0.00655965 0.01330373 0.00655965 0.00078633 0.00002292
0.00019117 0.00655965 0.05472157 0.11098164 0.05472157 0.00655965 0.00019117
0.00038771 0.01330373 0.11098164 0.22508352 0.11098164 0.01330373 0.00038771
0.00019117 0.00655965 0.05472157 0.11098164 0.05472157 0.00655965 0.00019117
0.00002292 0.00078633 0.00655965 0.01330373 0.00655965 0.00078633 0.00002292
0.00000067 0.00002292 0.00019117 0.00038771 0.00019117 0.00002292 0.00000067
(* Gaussian Kernel (normalized) *)
sigma=0.84089642;
sigmat=2.*sigma^2;
cc=1/(sigmat*Pi);
gausskernel[x_,y_]:=cc E^(-(x^2+y^2)/sigmat);
GC=Table[gausskernel[x,y], {x,-3,3}, {y,-3,3}];
GC=GC/Total[Flatten[GC]] (* Normalize *)
Plot3D[gausskernel[x,y], {x,-3,3}, {y,-3,3}, PlotRange->All]
Figure 2.90: Gaussian Kernel.
Saturation: We first compute the luminance of every pixel in P0 and set the corresponding pixel in the mask P1 to a shade of gray with that luminance. The mask is then used to change the luminance of every pixel in the original image P0 . Interpolation decreases saturation, while extrapolation increases it. Negative t also inverts the hue of image P0 . Sharpening: Section 3.14 discusses convolution. Sharpening and blurring are examples of convolutions. If the mask is a blurred version of the original, then interpolation blurs the original and extrapolation sharpens it. For more information, see [Haeberli and Voorhies 94].
2.32 The Hough Transform A bubble chamber is a container filled with liquid hydrogen (or any superheated transparent liquid). For many years, bubble chambers were used as the main instrument for detecting electrically-charged elementary particles moving through the liquid. It was invented in 1952 by Donald A. Glaser. A nuclear reaction would be initiated by accelerating particles and letting them collide, and the results were photographed through a glass window. A bubble chamber photograph may be very complex (Figure 2.91) and may include hundreds of curves and spirals. Certain elementary particles are elusive and are created only rarely. Thus, very often thousands of photographs had to be taken and analyzed in order to discover such a particle, which is why the analysis had to be automated.
Figure 2.91: A Bubble Chamber Photograph.
In 1962, Paul Hough introduced a method, based on a simple transform, that became the basis for analyzing bubble chamber photographs and can be used to detect lines and curves in digital images. Reference [Hough 62] is the original patent application, reference [Hough 10] is a general introduction to this transform, and [Hart 09] is the history of the Hough transform. In modern computer graphics, the Hough transform is used to detect patterns in images. We first describe how this method is employed to detect straight lines, and then show how to extend it to detect arbitrary parametric curves. To understand the problem, let’s consider the two straight segments y = x and y = x/3 + 4 displayed on an 11 × 11 grid, where 0 ≤ x, y ≤ 10. Looking at the grid, it is easy for a person to realize that it features two lines. Identifying the two lines by software, however, is much more difficult, because all that the software can “see” is the 121-bit bitmap 00000000001 00000000010 00000000100 00000001111 00000111000 00111100000 11001000000 00010000000 00100000000 01000000000 10000000000 (without the spaces), which corresponds to the pixels shown in Figure 2.92. How can a program find the equations of the lines, or even discover the fact that there are two lines, from the pixels in the bitmap? In practical cases, there may be many lines in the bitmap,
with missing pixels and with some pixels at slightly wrong positions, complicating the problem even further.
Figure 2.92: An 11×11 Bitmap.
The explicit equation of a straight line is y = ax+b, where a is the slope and b is the y intercept. For a given line, the parameters a and b are fixed and the coordinates x and y vary. The principle of the Hough transform is to reverse the roles of the parameters and the coordinates. Given a pair of coordinates (x0 , y0 ), the transform calculates all the possible values of (a, b) that satisfy y0 = a x0 + b. In other words, given a point (x0 , y0 ), the transform computes all the pairs (a, b) of straight lines that pass through the point. When these pairs are plotted in the ab plane, they define a line, because the values of each pair are linearly related. The ab plane is called the parameter space, to distinguish it from the image space (the xy plane) where the pixels actually exist. Example: Given the point (x0 , y0 ) = (10, 10), we calculate the 11 pairs (a, b), where b = 0, 1, . . . , 9, 10. From 10 = 10a + b we obtain a = (10 − b)/10. The 11 pairs are (1, 0), (0.9, 1), (0.8, 2), (0.7, 3),. . . ,(0, 10). Since there can be infinitely many straight lines passing through any given point, we have to limit the calculation to a subset of these lines (i.e., the values of a and b have to be “quantized”). We may decide to calculate only (a, b) values that are integers in a certain range or that have just one digit to the right of the decimal point. For example, we may limit the calculation to the set of quantized values a = 0, 0.1, . . . , 0.9, 1 and b = 0, 1, . . . , 9, 10 (lines with 11 slopes between 0◦ and 45◦ and 11 intercept values between 0 and 10). Each parameter can take on 11 values. The pairs (a, b) being calculated are accumulated in a two-dimensional array ab of integers in memory, whose rows and columns correspond to values of a and b. The array should be large enough for all the possible values of pairs (a, b). In the above example, we need an array of 11×11 integers. All array elements are initially cleared. 
Each time a pair (a, b) is calculated, the corresponding array element is incremented by 1. The algorithm works by scanning the bitmap, looking for bits of 1. For each 1-bit found, the following three steps are performed: 1. Determine the coordinates (x, y) of the pixel. 2. Compute all the quantized pairs (a, b) for point (x, y). 3. For each pair (a, b), increment array location ab[a,b] by 1.
Exercise 2.37: Assuming that the 121 bits of the bitmap are indexed 0 through 120 and given the row and column numbering of the pixels in Figure 2.92, figure out the row and column numbers of the pixel at bitmap index i. The Hough algorithm is a transform because it transforms points (x, y) in the image space (i.e., pixels) to points (a, b) in the parameter space (i.e., lines). It has the following properties: 1. A single point in image space is transformed to many points in the parameter space. Those points are on a line, so each image point is transformed to a line in the parameter space. 2. Any pair (a, b) defines the unique line y = ax + b, so any point in parameter space corresponds to a line in image space. 3. Imagine two points (x1 , y1 ) and (x2 , y2 ) on the line y = mx + n in image space. Many lines go through each point, so each will cause many elements of array ab to be incremented. Specifically, line y = mx + n goes through both points, so each point will cause array element ab[m,n] to be incremented. This element will therefore be incremented twice. In our example, there are 11 pixels on the line y = x (for which a = 1, b = 0), so array element ab[1,0] will be incremented 11 times. Similarly, line y = x/3 + 4 will cause element ab[1/3,4] to be incremented 11 times. Any other elements of ab will be incremented fewer than 11 times. 4. Imagine two points (a1 , b1 ) and (a2 , b2 ) in parameter space. Each of them defines a line in image space. Imagine that these two lines intersect at point (x0 , y0 ). All points (a, b) between (a1 , b1 ) and (a2 , b2 ) also define lines that intersect at (x0 , y0 ). Property 3 above is the important one. It means that the two array elements ab[1,0] and ab[1/3,4] will have the largest counts. Identifying the two lines is therefore reduced to scanning array ab and finding the elements with the largest counts. Each such element ab[m,n] tells us that the line y = mx + n exists in the image. 
The first 1-bit is found in the bitmap at index 10. It therefore corresponds to the pixel at location (10, 10). The parameter pairs for this point satisfy 10 = 10a + b or a = (10 − b)/10. The 11 pairs are therefore (1, 0), (0.9, 1), (0.8, 2), (0.7, 3),. . . , (0, 10). Exercise 2.38: The next 1-bit is found in the bitmap at index 20, so it corresponds to the pixel at location (9, 9). Calculate the parameter pairs for this point. Using the slope and y-intercept as parameters has the disadvantage that both can grow without a limit. A better set of parameters, the normal parameters, is shown in Figure 2.93. Parameter α is the angle between the x axis and the normal to the line (0◦ ≤ α < 180◦ ) and parameter β is the distance of the line from the origin (0 ≤ β ≤ ∞). The straight line itself has the equation x cos α + y sin α = β, but if we consider α and β to be the variables, instead of x and y, we end up with the function F (α, β) = x cos α + y sin α − β, a sinusoidal. Each point (x, y) in the image space is therefore transformed into a sinusoidal in the parameter space. However, when several points on a straight line are transformed into sinusoidals, all the sinusoidals intersect at one point. The four properties above also hold, but have to be rephrased as follows: 1. A point (x, y) in the image space is transformed to a sinusoidal curve in the parameter space.
Figure 2.93: The Normal Parameters for a Line.
2. A point (α, β) in the parameter space corresponds to a straight line in the image space. 3. Points lying on a straight line in the image space correspond to sinusoidals that intersect at one point in the parameter space. 4. Points lying on a sinusoidal curve in the parameter space correspond to lines intersecting at one point in the image space. We next discuss the extension of the Hough transform to arbitrary parametric curves. Suppose that P(t1, t2, ..., tn) is a parametric curve defined by n parameters. For each n-tuple (t1, t2, ..., tn), the curve passes through a point (x, y). The Hough transform maps each point (x, y) in the image space to many points (t1, t2, ..., tn) in the n-dimensional parameter space. If n = 2, each point is transformed into a two-dimensional curve. For n = 3, each point is transformed into a three-dimensional surface. For larger values of n, each point is transformed into an n-dimensional hypersurface. The advent of the computer, not as a computer but as a drawing machine, was for me a major event in my life.
—Benoît Mandelbrot.
3 Scan Conversion In a raster-scan graphics system, two steps are necessary in order to display a geometric figure: (1) a scan-converting algorithm should be executed to select the best pixels (the ones closest to the ideal figure) and (2) the selected pixels should be turned on. Step 2 is simple. It only requires setting bits in the bitmap (perhaps with xor). Most current compilers have a built-in function to do this. All that the program has to say is putpixel(row,col); or plot(r,c,color); or something similar. Step 1, however, is more complex. The scan-converting algorithm has to be fast and it must depend on the shape of the figure. This chapter discusses scan-converting algorithms for straight lines and for circles.
3.1 Scan-Converting Lines After the point, the straight line is the simplest geometric figure. Its explicit equation is y = ax + b, where a is the slope and b is the y-intercept (see Section 9.1 for other ways to represent lines). In practice, the coordinates of the two endpoints (x1, y1) and (x2, y2) are given, instead of a and b, but it is easy to express a and b in terms of the endpoints. The slope a is simply (y2 − y1)/(x2 − x1) or Δy/Δx. The value of b is obtained from y1 = ax1 + b, which implies

$$b = y_1 - a x_1 = y_1 - \frac{y_2 - y_1}{x_2 - x_1}\,x_1 = \frac{y_1(x_2 - x_1) - x_1(y_2 - y_1)}{x_2 - x_1} = \frac{y_1 x_2 - x_1 y_2}{x_2 - x_1}.$$
Our first algorithm uses (x1 , y1 ) and (x2 , y2 ) to compute a and b, and then executes the loop of Figure 3.1. However, this loop is very slow because it uses multiplications and also because it works with real quantities that eventually have to be rounded to integers. A better algorithm should use just additions/subtractions and just integers.
D. Salomon, The Computer Graphics Manual, Texts in Computer Science, DOI 10.1007/978-0-85729-886-7_3, © Springer-Verlag London Limited 2011
var a, b, x, y, x1, x2, y1, y2: real;
a:=(y2-y1)/(x2-x1); b:=y1-a*x1;
x:=x1;
repeat
  y:=a*x+b;
  point(round(x),round(y));
  x:=x+1;
until x>x2;
Figure 3.1: Scan Convert y = ax + b.
Exercise 3.1: The loop of Figure 3.1 has another drawback; what is it?
3.2 Midpoint Subdivision

Midpoint subdivision is a completely different approach to the problem of scan-converting lines. It employs a procedure Midpoint that divides the line segment in two, calculates the midpoint of the segment, plots this pixel, and calls itself recursively twice, to do the same for the two halves of the segment. The following is a listing in the C language:

void Midpoint(int a1, int b1, int a2, int b2)
{ int midx, midy;
  midx=(a1+a2)/2; midy=(b1+b2)/2;
  putpixel(midx,midy,Color); /* Turbo C */
  if(abs(a1-midx)>1 || abs(b1-midy)>1) Midpoint(a1,b1,midx,midy);
  if(abs(midx-a2)>1 || abs(midy-b2)>1) Midpoint(midx,midy,a2,b2);
}

The main program needs only input the two endpoints and then invoke Midpoint(x1,y1,x2,y2);. In practice, the procedure may be written as a nonrecursive one, manipulating the recursion stack explicitly. An attractive feature of this method is the simple arithmetic. Only two divisions are required and even they can be replaced by a shift.

The proverbial German phenomenon of the verb-at-the-end about which droll tales of absentminded professors who would begin a sentence, ramble on for an entire lecture, and then finish up by rattling off a string of verbs by which their audience, for whom the stack had long since lost its coherence, would be totally nonplussed, are told, is an excellent example of linguistic recursion.
—Douglas Hofstadter.
3.3 DDA Methods Better scan-conversion algorithms should be arithmetically simple and should use just integers. There is a large class of such methods, called DDA (for Digital Differential Analyzer) methods. Section 2.5 discusses one such method, for bitmap scaling. Several DDA methods for scan-converting lines are described here. The original Differential Analyzer was an analog computer built, in 1927, by the American scientist and engineer Vannevar Bush (1890–1974). It was based on the use of mechanical integrators that could be interconnected in any desired manner.
3.3.1 Simple DDA

This algorithm uses the relationship a(x + 1) + b = (ax + b) + a = y + a, which implies that if we increment x by 1 and would like to stay on the line, we should increment y by a. The following code assumes that the two endpoints (x1, y1) and (x2, y2) of the line segment are given. This algorithm does not work for vertical lines, where a = ∞.

var x, x1, y1, x2, y2: integer; a, y: real;
a:=(y2-y1)/(x2-x1);
x:=x1; y:=y1;
repeat
  point(x,round(y));
  x:=x+1; y:=y+a;
until x>x2;

This is still inefficient because y is still a real quantity, but it has another, more important drawback. Imagine a very steep line, where a ≫ 1. Since our loop increments x by 1, successive pixels will have y coordinates that differ greatly and the result may be a fragmented line, made of a few disconnected pixels. To correct this problem, we have to check the slope, and if it is greater than 45°, increment y, not x, in steps of 1. From y = ax + b, we get y + 1 = ax + b + 1 = ax + 1 + b = a(x + 1/a) + b. Therefore, when y is incremented by 1, x should be incremented by 1/a. The algorithm becomes

var Δx,Δy,L: integer; x,y,a,G,H: real;
Δx:=x2-x1; Δy:=y2-y1; a:=Δy/Δx;
x:=x1; y:=y1;
if Δx>Δy then G:=1; H:=a
else G:=1/a; H:=1 endif;
for L:=1 to max(Δx,Δy)+1 do
  point(round(x),round(y));
  x:=x+G; y:=y+H;
endfor;

Ideally, the number of pixels generated when a line is scan-converted should equal the length of the line. This is because the length is measured in screen units, which are pixels. A line of length L displayed by fewer than L pixels would be fragmented. If the same line is generated by more than L pixels, it would look brighter than other lines. The total number of pixels drawn by simple DDA is max(Δx, Δy). For lines that are close to horizontal or close to vertical, max(Δx, Δy) ≈ length, which is the ideal
case. For a 45° line, Δx = Δy, so Δx pixels are drawn. The length of such a line equals √(Δx² + Δy²) = √(2Δx²) = √2 Δx ≈ 1.41Δx, which implies that Δx ≈ 0.71·length. For such a line, this algorithm generates only about 71% of the ideal number of pixels. Consequently, such lines look dim.
Exercise 3.2: Given the two endpoints (x1, y1) = (1, 2) and (x2, y2) = (4, 6), execute the simple DDA algorithm manually and show the pixels generated. Also, calculate the length of the line.
3.3.2 A Variation

The following pseudo-code is a variation of the simple DDA. The quantity length is not the length of the line but is related to it. Because of the way it is defined, one of the quantities x_incr, y_incr equals ±1, which simplifies the algorithm.

procedure SimpleDDA(x1,y1,x2,y2: integer);
begin
  Δx,Δy,length,i: integer; x,y,x_incr,y_incr: real;
  Δx:=x2-x1; Δy:=y2-y1;
  length:=max(abs(Δx), abs(Δy));
  x_incr:=Δx/length; y_incr:=Δy/length;
  x:=x1; y:=y1;
  for i:=1 to length+1 do
    point(round(x),round(y));
    x:=x+x_incr; y:=y+y_incr;
  endfor;
end;

Notice that the other quantity (x_incr or y_incr) is still real.
3.3.3 Symmetrical DDA

The simpler DDA methods are based on a loop, where during each iteration, the (x, y) coordinates of the previous pixel are incremented by εΔx and εΔy, respectively, for some quantity ε. A typical procedure is

procedure SymmDDA(x1,y1,x2,y2: integer);
  calculate eps;
  xIncr:=eps*Δx; yIncr:=eps*Δy;
  x:=x1+.5; y:=y1+.5;
  repeat
    Plot(trunc(x),trunc(y));
    x:=x+xIncr; y:=y+yIncr;
  until x=x2 or y=y2;
end;

(See the end of this section for an explanation of x:=x1+.5; y:=y1+.5; and for the use of trunc instead of round.) In the symmetrical DDA method, we set ε = 2⁻ⁿ, where n is defined by 2ⁿ⁻¹ ≤ max(|Δx|, |Δy|) < 2ⁿ.
This sets ε to 1 over the (approximate) length of the line. It also sets the x and y increments to values less than 1. As an example, consider the line from (0, 0) to (7, 5). Its equation is y = (5/7)x. For this line, we have Δx = 7 and Δy = 5, leading to 2² = 4 ≤ max(7, 5) < 8 = 2³. Therefore, n is set to 3, implying ε = 2⁻³ = 1/8. The (x, y) coordinates of the previous pixel are incremented by εΔx = 7/8 and εΔy = 5/8, respectively. The nine steps of the loop are summarized in Table 3.2. The last column of the table (+.−) compares the ideal y coordinate of the point (which equals 5/7 times the x coordinate) to the y coordinate actually displayed. A “+” is shown if the point displayed is above the ideal point, a “−” is shown in the opposite case, and a period is shown in the ideal case.

Point No.  Start (x, y)                         Truncated to  Ideal y  +.−
1          (.5, .5)             = (.5, .5)      (0, 0)        0        .
2          (.5+7/8, .5+5/8)     = (11/8, 9/8)   (1, 1)        5/7      +
3          (11/8+7/8, 9/8+5/8)  = (18/8, 14/8)  (2, 1)        10/7     −
4          (18/8+7/8, 14/8+5/8) = (25/8, 19/8)  (3, 2)        15/7     −
5          (25/8+7/8, 19/8+5/8) = (32/8, 24/8)  (4, 3)        20/7     +
6          (32/8+7/8, 24/8+5/8) = (39/8, 29/8)  (4, 3)        20/7     +
7          (39/8+7/8, 29/8+5/8) = (46/8, 34/8)  (5, 4)        25/7     +
8          (46/8+7/8, 34/8+5/8) = (53/8, 39/8)  (6, 4)        30/7     −
9          (53/8+7/8, 39/8+5/8) = (60/8, 44/8)  (7, 5)        35/7     .

Table 3.2: Symmetrical DDA Example.
Note that point 6 is identical to point 5 and should not be displayed. If we ignore point 6, we end up with a line where the first and last points are smack on the ideal line, points 3, 4, and 8 are below the line, and points 2, 5, and 7 are above it. This is the reason for the name Symmetrical DDA. How can we avoid plotting point 6? The ideal solution is to use special hardware where both x and y are stored in registers that have two parts. The left part of each register holds the integer value of the variable, and the right part, the fractional value. The x and y increments, which are less than 1, are added to the fractional parts. Whenever a fractional part overflows, the overflow signal increments the corresponding integer part. If neither fractional register overflows, the point is not plotted. This method creates the truncated value in the integer parts, rather than the rounded values, since this is faster. Since we still need the rounded, not truncated, values, we initialize both fractional parts to 0.5 rather than to zero. Notice that trunc(x+0.5) equals round(x).
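The register scheme just described can be imitated in software with fixed-point arithmetic. The following C sketch (16.16 fixed-point format, first-quadrant lines only; the names and array collection are mine, not the book's) initializes the fractional parts to 0.5 and emits a pixel only when an integer part changes, which suppresses duplicates such as point 6 of Table 3.2.

```c
#include <stdlib.h>

/* Fixed-point sketch of the symmetrical DDA. Each coordinate is kept
   in 16.16 format: integer part in the top 16 bits (the trunc value),
   fraction in the bottom 16. Works for lines in the first quadrant. */
int symm_dda(int x1, int y1, int x2, int y2, int *px, int *py)
{
    int dx = x2 - x1, dy = y2 - y1;
    int m = abs(dx) > abs(dy) ? abs(dx) : abs(dy);
    int n = 0;
    while ((1 << n) <= m) n++;          /* 2^(n-1) <= m < 2^n      */
    long xinc = ((long)dx << 16) >> n;  /* eps*dx with eps = 2^-n  */
    long yinc = ((long)dy << 16) >> n;
    long x = ((long)x1 << 16) + 0x8000; /* start at x1+0.5, y1+0.5 */
    long y = ((long)y1 << 16) + 0x8000;
    int cnt = 0;
    for (int i = 0; i <= (1 << n); i++) {
        int ix = (int)(x >> 16), iy = (int)(y >> 16); /* trunc */
        if (cnt == 0 || ix != px[cnt-1] || iy != py[cnt-1]) {
            px[cnt] = ix; py[cnt] = iy; cnt++;
        }
        x += xinc; y += yinc;
    }
    return cnt;
}
```

For the line (0, 0) to (7, 5) of Table 3.2 this yields the eight distinct pixels of the table, the duplicate point 6 being skipped.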
3.3.4 Quadrantal DDA

Quadrantal DDA is a more sophisticated DDA method. Its principle is to increment, in each step, either x or y but not both. From step to step, we move along the line either horizontally or vertically, but not diagonally (Figure 3.3).
Figure 3.3: Quadrantal DDA.
The key to the algorithm is to realize that the ratio

  (number of y increments)/(number of x increments)

should equal the slope Δy/Δx. The implementation uses an auxiliary variable, Err, that is initially set to zero. In each step, if Err>0, we increment x by 1 and decrement Err by Δy. If Err≤0, we increment y by 1 and increment Err by Δx. If Δx > Δy, we end up with more x than y increments, which is appropriate since Δx > Δy implies a direction of less than 45°. If Δx < Δy, there will be more steps in the y direction. The quantity Err, thus, oscillates around zero all the time.

The method is called quadrantal because the details of the steps depend on the direction of the line. If the direction is in the range 0°–90°, we increment both x and y by 1. If the direction is in the range 90°–180°, we increment x by −1 and y by 1; similarly, for the other two ranges. Hence, the algorithm must start by determining the range of the direction, and this is done by comparing Δx and Δy and looking at their signs. The pseudo-code below lists the loop for the first quadrant. It assumes that the two endpoints (x1, y1) and (x2, y2) of the line segment are given.

var x1,y1,x2,y2,Δx,Δy,Err: integer;
Δx:=x2-x1; Δy:=y2-y1; Err:=0;
repeat
  plot(x1,y1);
  if Err>0 then x1:=x1+1; Err:=Err-Δy
  else y1:=y1+1; Err:=Err+Δx
  endif;
until x1>=x2 and y1>=y2;

The quadrantal DDA method features the following: No multiplications, divisions, or real numbers are used. There are no diagonal moves, just horizontal and vertical ones. The line is therefore made of segments with a one-pixel overlap.
If Err happens to be zero in some loop iteration, it means that the current pixel is located right on the ideal line. Our program is written such that for Err=0, it executes the else part and therefore increments y1. When the current pixel is positioned on the ideal line, we normally don’t care whether the next pixel is drawn in the x or in the y direction. There is one exception, however—the second pixel! The first pixel is drawn at point (x1, y1), and since initially Err is set to zero, the second pixel will always be drawn at (x1, y1 + 1). This may be annoying if the line is close to horizontal, because the single step up in y then comes at the very start of the line instead of near its middle. The solution is to initialize Err to

  Err = +Δx/2, if Δx > Δy,
  Err = −Δy/2, otherwise,

instead of to zero. This produces better looking lines, but notice that Err is no longer an integer.

The number of pixels in a quadrantal line segment is easily computed. For a line close to horizontal or close to vertical, the number equals the length of the line (one pixel per x or per y value, as in the simple DDA method). For a line slanted at 45°, quadrantal DDA produces two pixels per x value (except the two extreme values), so the line looks as in Figure 3.3. The length of the line is √2 Δx, so the ratio (number of pixels/line length) equals 2Δx/(√2 Δx) = √2 ≈ 1.41. Thus, there is an excess of 41%. Such lines are still too bright and also slow to compute.

Exercise 3.3: Given the two endpoints (1, 1) and (5, 5), calculate the pixels for the straight line between them obtained by simple DDA and by quadrantal DDA.
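The first-quadrant loop above can be sketched in C as follows. The array collection and the emission of the final endpoint after the loop are my additions (the loop itself stops as soon as both coordinates have reached their targets); Err is kept at its simple initial value of zero.

```c
/* C sketch of quadrantal DDA, first quadrant (dx, dy >= 0).
   Pixels go into px,py; returns their number. The endpoint
   (x2,y2) is emitted after the loop so the segment is closed. */
int quadrantal_dda(int x1, int y1, int x2, int y2, int *px, int *py)
{
    int dx = x2 - x1, dy = y2 - y1;
    int err = 0, n = 0;
    while (!(x1 >= x2 && y1 >= y2)) {
        px[n] = x1; py[n] = y1; n++;
        if (err > 0) { x1++; err -= dy; }   /* horizontal move */
        else         { y1++; err += dx; }   /* vertical move   */
    }
    px[n] = x2; py[n] = y2; n++;
    return n;
}
```

For the 45° segment from (1, 1) to (5, 5) this emits nine pixels, two per x value except at the two extremes, with the second pixel stepping in y as discussed above.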
3.3.5 Octantal DDA

The main idea behind octantal DDA is to move along the line either horizontally or diagonally, but not vertically (the precise rule depends on the slope). The main feature of such lines is that they are made of segments that do not overlap (Figure 3.4), so they look smoother than similar quadrantal lines.
Figure 3.4: Octantal DDA.
Eight sets of rules are necessary. If the direction of the line is in the range 0°–45° (the first octant, where 0 ≤ slope ≤ 1), only horizontal and diagonal moves are allowed. If the direction is in the second octant (45°–90°), only vertical and diagonal moves are allowed, not horizontal. In the third octant, the moves are the same as in the second one, but with negative x increments. The precise rules for four of the eight octants are summarized as follows:

In octant 1:
1. If Err≤0, move horizontally (+x) and update Err:=Err+Δy.
2. If Err>0, move diagonally (+x, +y) and update Err:=Err+Δy−Δx.
In octant 2:
1. If Err<0, move diagonally (+x, +y) and update Err:=Err+Δy−Δx.
2. If Err≥0, move vertically (+y) and update Err:=Err−Δx.

In octant 7:
1. If Err<0, move diagonally (+x, −y) and update Err:=Err−Δy−Δx.
2. If Err≥0, move vertically (−y) and update Err:=Err−Δx.

In octant 8:
1. If Err≤0, move horizontally (+x) and update Err:=Err−Δy.
2. If Err>0, move diagonally (+x, −y) and update Err:=Err−Δy−Δx.

The rules for octants 3–6 are similar. Note that the program can be made more compact (although a bit slower) by using the symmetry between certain octants. The only difference between octants 2 and 7, for example, is in the y coordinate. If the endpoints are (1, 1) and (2, −7) (which corresponds to octant 7), then the program can calculate the line for points (1, −1) and (2, 7) (octant 2), and transform the line to octant 7 by reversing the y coordinates of all the pixels.

Exercise 3.4: Do it.

The main advantage of the octantal over the quadrantal DDA is the number of pixels per line, which is closer to the length of the line. Lines drawn using this method look finer and more precise. For lines close to horizontal or vertical, the number of pixels is the same as in the quadrantal DDA method. For a 45° line, however, there is an improvement. The line looks as in Figure 3.4, one pixel per x value. Hence, the number of pixels is Δx and the ratio (# of pixels/line length) equals Δx/(√2 Δx) = 1/√2 ≈ 0.71 = 1 − 0.29. There is therefore a pixel shortage of 29% (compared to an excess of 41% in quadrantal DDA).

Exercise 3.5: Intuitively it seems that the 45° line shown in Figure 3.4 is the best representation of such a line. It looks straight and precise. Any additional pixels would make it too thick. It seems that the octantal DDA method produces ideal 45° lines, so how can we say that they have a shortage of pixels?
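A C sketch of the octant-1 rules follows. The pixel-collecting arrays are illustrative, and Err is assumed to start at zero, which the text does not fix explicitly.

```c
/* C sketch of octantal DDA for octant 1 (0 <= dy <= dx).
   A horizontal move is taken while Err <= 0, a diagonal move
   while Err > 0; Err is assumed to start at zero. */
int octantal_dda(int x1, int y1, int x2, int y2, int *px, int *py)
{
    int dx = x2 - x1, dy = y2 - y1;
    int err = 0, n = 0;
    px[n] = x1; py[n] = y1; n++;
    while (x1 < x2) {
        if (err <= 0) { x1++; err += dy; }            /* horizontal */
        else          { x1++; y1++; err += dy - dx; } /* diagonal   */
        px[n] = x1; py[n] = y1; n++;
    }
    return n;
}
```

For the line from (0, 0) to (7, 5) this produces (0,0), (1,0), (2,1), (3,2), (4,3), (5,3), (6,4), (7,5): five diagonal moves, one per unit of Δy, and two horizontal ones, with no vertical move and no overlap between segments.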
3.4 Bresenham’s Line Method

Currently, high-resolution color displays are common, but they were rare and very expensive in the 1960s, the period that saw the emergence of computer graphics. This is why the main graphics output device in the early days of this discipline was the pen plotter (Section 26.13), and this is also why the first DDA method was originally developed, by Jack Bresenham [Bresenham 65], for use on a pen plotter. Two versions of this method are described here, together with ideas for improving it and speeding it up.

The first version of this algorithm is derived here for the first octant (i.e., for line segments with slopes between 0° and 45°), but its extension to other octants is not difficult. Since the line is close to horizontal, the algorithm should increment x in each iteration and the only decision that needs to be made is whether to also increment y. If the current pixel was drawn at (x, y), then the next pixel should be plotted either at
(x + 1, y) or at (x + 1, y + 1). In principle, this decision can be made by computing the y value of the actual line at x + 1 (i.e., h = a(x + 1) + b) and determining the smaller of the two distances (y + 1) − h and h − y. These computations, however, involve the two real quantities a and b. Instead, the Bresenham algorithm starts by computing the difference d between the height of the true line at x and the height of a general pixel (x, y). That difference is d = (ax + b) − y = (Δy/Δx)x + b − y. In the first octant, Δx is nonzero, so d is well defined. We now multiply both sides by Δx to obtain the decision variable D = dΔx = xΔy − yΔx + bΔx (note that D is an integer because bΔx is an integer). Each time x is incremented inside the loop, if the algorithm decides to increment y, then D is modified to

  Dnew ← (x + 1)Δy − (y + 1)Δx + bΔx = Dold + (Δy − Δx),

so D should be incremented by (Δy − Δx), a negative quantity. If, on the other hand, the algorithm decides to keep the same y, D is modified to

  Dnew ← (x + 1)Δy − yΔx + bΔx = Dold + Δy,

so D should be incremented by Δy, which is positive. Based on this, the decision whether to increment y is made by trying to keep D close to zero. The principle of Bresenham’s algorithm for the first octant is as follows: Set D to zero and start a loop. In each iteration, if D is zero or negative, keep y the same and increase D by Δy. If D is positive, increment y by 1 and increase D by the negative quantity (Δy − Δx). The resulting code, listed in Figure 3.5, is both compact and fast.

var dx, dy, dxdy, D: integer;
dx:=x2-x1; dy:=y2-y1;
dxdy:=dy-dx; D:=0;
x:=x1; y:=y1;
repeat
  pixel(x,y);
  if D>0 then y:=y+1; D:=D+dxdy
  else D:=D+dy
  endif;
  x:=x+1;
until x>x2;

Figure 3.5: Bresenham’s Method.
 x   y    D   Inc y?
10  10    0     n
11  10    6     y
12  11   −2     n
13  11    4     y
14  12   −4     n
15  12    2     y
16  13   −6     n
17  13    0     n
18  13    6     y
19  14   −2     n
20  14    4     y
21  15   −4     n
22  15    2     y
23  16   −6     n
24  16    0     –

Table 3.6: An Example.
As an example, Table 3.6 lists the pixels selected by this algorithm for the line from (10, 10) to (24, 16). For this line, Δy = 6, Δx = 14, and Δy − Δx = −8. Table 3.7 can be used to determine the octant of the slope. Given a line segment from (x1, y1) to (x2, y2), first reorder the points, if necessary, such that x1 ≤ x2, then use the table. The top row of the table reads: If Δy ≥ 0 and Δx ≥ Δy, then the slope is positive and is less than or equal to 1. The octant is either 1 or, if the points had to be swapped, it is 5.
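The loop of Figure 3.5 translates directly into C. In the sketch below, arrays replace the pixel() call (my change), and the function handles the first octant only.

```c
/* C sketch of the first version of Bresenham's method (Figure 3.5),
   first octant only (0 <= dy <= dx). Returns the number of pixels. */
int bresenham1(int x1, int y1, int x2, int y2, int *px, int *py)
{
    int dx = x2 - x1, dy = y2 - y1;
    int dxdy = dy - dx;      /* negative in the first octant */
    int D = 0, n = 0;
    int x = x1, y = y1;
    while (x <= x2) {
        px[n] = x; py[n] = y; n++;
        if (D > 0) { y++; D += dxdy; }  /* increment y       */
        else       { D += dy; }         /* keep y unchanged  */
        x++;
    }
    return n;
}
```

For the line from (10, 10) to (24, 16) it generates the 15 pixels of Table 3.6, ending at (24, 16).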
 Δy    Δx ? Δy        slope          octant
 ≥0    Δx ≥ Δy     0 ≤ slope ≤ 1     1 (5)
 ≥0    Δx < Δy       slope > 1       2 (6)
 neg   Δx < −Δy      slope < −1      7 (3)
 neg   Δx ≥ −Δy    −1 ≤ slope < 0    8 (4)

Table 3.7: Determining the Octant.
For the four octants where the slope is in the range [−1, +1], the algorithm loops over x as shown in Figure 3.5. For the other four octants, the roles of x and y should be swapped.

The above derivation of Bresenham’s method is simple, but most references in the literature derive it from a different approach, and end up with a slightly different algorithm. This alternative approach is described here. We again assume a line segment from (x1, y1) to (x2, y2) with a slope in the first octant. The loop iterates on x values from x1 to x2, and the only decision to be made in iteration k is whether to increment the y coordinate or not. Assume that iteration k has plotted a pixel at position (xk, yk). The next iteration should plot a pixel at (xk+1, yk+1), where xk+1 = xk + 1 and yk+1 is either yk or yk + 1. The decision is based on the distances of the true line from the two candidate pixels P0 = (xk + 1, yk) and P1 = (xk + 1, yk + 1). We denote by y = a xk+1 + b the true height of the line between points P0 and P1. The equation of a straight line is y = ax + b. The slope a equals (y2 − y1)/(x2 − x1) = Δy/Δx and the y-intercept b is y1 − a x1. The distance of the line from P1 is m = (yk + 1) − y = yk + 1 − a(xk + 1) − b and its distance from P0 is n = y − yk = a(xk + 1) + b − yk. We subtract these distances to obtain the difference

  n − m = a(xk + 1) + b − yk − yk − 1 + a(xk + 1) + b = 2a(xk + 1) − 2yk + 2b − 1.   (3.1)

The sign of this difference determines the point to be selected in iteration k + 1. If the difference is positive (m is smaller), then P1 should be selected (i.e., y should be incremented); otherwise, P0 is the point closer to the line. True to the DDA spirit, however, we don’t want to perform so many computations in each iteration, so the algorithm has to be improved.
We multiply both sides of Equation (3.1) by Δx to obtain

  dk = Δx(n − m) = 2xkΔy + 2Δy − 2ykΔx + Δx(2b − 1) = 2xkΔy − 2ykΔx + C,   (3.2)

where C = 2Δy + Δx(2b − 1) is a constant that does not vary from iteration to iteration. Equation (3.2) can be used to compute the initial value d1

  d1 = 2x1Δy − 2y1Δx + 2Δy + Δx(2b − 1) = 2Δy − Δx,   (3.3)
but it is clear that computing dk is as bad as computing the difference n − m. The main idea of Bresenham’s algorithm is to compute dk in each iteration indirectly, from
its predecessor. This turns out to be simple, since the difference dk+1 − dk is the simple expression

  dk+1 − dk = 2Δy(xk+1 − xk) − 2Δx(yk+1 − yk) = 2Δy − 2Δx(yk+1 − yk).

If iteration k + 1 decides (based on the sign of dk) to increment y, then yk+1 = yk + 1, so the updated value dk+1 becomes dk + 2(Δy − Δx). If the decision is not to increment y, then the new value becomes dk + 2Δy. Thus, the algorithm starts by computing the initial value of d from Equation (3.3), then iterates from x1 to x2. In each iteration it selects either P0 or P1 depending on the sign of d, then updates d. Figure 3.8a shows a pseudo-code version of this algorithm, with a C version shown in Figure 3.8b.

var x,y,Δx,Δy,dy,dxy,d: integer;
y:=y1; Δx:=x2-x1; Δy:=y2-y1;
dy:=2Δy; dxy:=2(Δy−Δx);
d:=2Δy−Δx;
for x:=x1 to x2 do
  pixel(x,y);
  if d<0 then d:=d+dy
  else d:=d+dxy; y:=y+1
  endif
endfor
(a)
bresenham(int x1, int y1, int x2, int y2)
{ int y=y1, dx=x2-x1, dy=y2-y1;
  int d=2*dy-dx;
  for (int x=x1; x<=x2; x++) {
    pixel(x,y);
    if (d<0) d += 2*dy;
    else { d += 2*(dy-dx); y++; }
  }
}
(b)

Figure 3.8: (a) Pseudo-Code. (b) C Version.

3.5 Double-Step DDA

If the slope is greater than 0.5, there should be a loop where the sign of b − c − 1 is checked in each iteration. If it is negative, patterns 2 or 3 should be selected; otherwise, pattern 4. What is needed now is a way to compute the sign of both b − c + 1 and b − c − 1 by using just integer quantities and simple operations.

We first develop a simple test for the sign of b − c + 1. We observe that our line, which goes from point (x1, y1) to (x2, y2), can be translated such that it starts at the origin, and this will not change the pattern of pixels. We therefore assume that our line has the equation y = (Δy/Δx)x (the y-intercept is zero), without loss of generality. We next observe that the height of the line at x = xi−1 + 2 is (Δy/Δx)(xi−1 + 2). The same height can also be written as the sum yi−1 + b, so we get b = (Δy/Δx)(xi−1 + 2) − yi−1. To find a similar expression for c, we subtract the height of the line at x = xi−1 + 2 from the height of pixel J (which is yi−1 + 2). The result is c = yi−1 + 2 − (Δy/Δx)(xi−1 + 2). The quantity b − c + 1 can now be written

  b − c + 1 = (Δy/Δx)(xi−1 + 2) − yi−1 − [yi−1 + 2 − (Δy/Δx)(xi−1 + 2)] + 1
            = 2(Δy/Δx)(xi−1 + 2) − 2yi−1 − 1.
We now define a new quantity Di (the discriminator for the sign of b − c + 1) by means of Δx and Δy as
  Di = Δx(b − c + 1) = 2Δy(xi−1 + 2) − 2Δx yi−1 − Δx.

Note that since the line is in the first quadrant, Δx is never negative (this assumption should be changed when extending the algorithm to other quadrants). This means that the sign of Di is the same as that of b − c + 1 and we can use Di in our loop to determine which pattern to use.

Before starting the loop, we should set D1 = 4Δy − Δx. In each iteration, a new value Di+1 should be calculated from the previous value Di. From Di+1 = 2Δy(xi + 2) − 2Δx yi − Δx, we get Di+1 − Di = 2Δy(xi − xi−1) − 2Δx(yi − yi−1). Since xi − xi−1 = 2 and since yi − yi−1 equals either 1 (for patterns 2 and 3) or 0 (for pattern 1), we can write

  Di+1 = Di + 4Δy,        if Di < 0 (pattern 1),
  Di+1 = Di + 4Δy − 2Δx,  otherwise (patterns 2 and 3).

The discriminator for the sign of b − c − 1 is derived in a similar way, and the result is D1 = 4Δy − 3Δx and

  Di+1 = Di + 4Δy − 2Δx,   if Di < 0 (patterns 2 and 3),
  Di+1 = Di + 4(Δy − Δx),  otherwise (pattern 4).

This completes the algorithms for approach 1, where patterns 1, 4, and 5 are used. In order to implement approach 2, where patterns 1–4 are used, we need another discriminator Dt. This is calculated by a process similar to the one used to obtain Di, except that point M in Figure 3.11 is used instead of pixel P as the base for the calculations. Its distance c′ from the line is shown in Figure 3.11b. By the definition of a discriminator, we get

  Dt = Δx(b′ − c′ + 1),  if slope < 0.5,
  Dt = Δx(b′ − c′ − 1),  if slope ≥ 0.5,     (3.4)

where Figure 3.11 provides the relations b′ = b − Δy/Δx and c′ = c + Δy/Δx, which imply Dt = Di − 2Δy. Also, from Figure 3.11 we see that when b′ − c′ + 1 < 0, pattern 2 should be used, otherwise pattern 3. Combining this knowledge with Dt = Di − 2Δy and with Equation (3.4), we find that the relation b′ − c′ + 1 < 0 is equivalent to

  Di < 2Δy,         if slope < 0.5,
  Di < 2(Δy − Δx),  if slope ≥ 0.5.

The last point to discuss is the termination of the loop.
Since each iteration increments x by 2, we cannot simply check to see whether x is greater than x2 , as this may result in one extra pixel. The solution is to realize that the extra pixel will result only if the line is made of an odd number of pixels. Since our lines are in the first quadrant, the number of pixels equals Δx. We therefore change the loop if Δx is odd, and terminate it when x = x2 − 1. The last pixel is then drawn separately, outside the loop, at (x2 , y2 ).
3.6 Best-Fit DDA

Best-fit is an interesting, fast method based on Euclid’s algorithm. It is discussed here for the first octant only (not including slopes of 0° and 45°, but drawing lines for these slopes is trivial), but it is easy to extend to other octants since it produces a string of bits indicating the moves from pixel to pixel. For the first octant, a 0 bit means a horizontal move and a bit of 1 means a diagonal move. For the second octant, a 0 bit means a vertical move and a 1 bit means a diagonal move. Section 3.3.5 discusses the symmetry between certain octants. The only difference between octants 2 and 7, for example, is in the y coordinate.

Since the algorithm is based on strings, we use the string notation rev(str) for the reverse of the string in variable str, we use str1+str2 to denote the concatenation of the two strings, we use substring(str,a,b) for the substring from position a to position b of str, and we use len(str) for the length of string str. (String operations are discussed in any introductory text on computer programming.) Figure 3.12 is a pseudo-code listing of the algorithm.

The algorithm calls for several string reversals. This operation is slow when done by software. On a graphics computer, however, a special machine instruction may be added that reverses a register, making it possible to reverse a string in just one clock cycle.

Example: A line from (1, 2) to (18, 12). The slope is (12 − 2)/(18 − 1) = 10/17 (i.e., in the first octant). Table 3.13 summarizes the steps. The final result is the string 10101011010110101. Note that it consists of 10 ones and seven zeros. It indicates 17 steps—10 of them diagonal and seven horizontal—following the first pixel at (1, 2).

Exercise 3.7: Apply this algorithm manually to the lines from (4, 4) to (12, 6), and from (4, 4) to (14, 7).
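The best-fit idea can be sketched in C with ordinary character strings. Two details of Figure 3.12 are filled in here as assumptions: the x<y branch mirrors the x>y branch, and when the loop ends with x = y the output is str1+str2 repeated x times. With these assumptions the sketch reproduces the example above.

```c
#include <stdlib.h>
#include <string.h>

/* Reverse of a, freshly allocated. */
static char *rev(const char *a)
{
    size_t n = strlen(a);
    char *r = malloc(n + 1);
    for (size_t i = 0; i < n; i++) r[i] = a[n - 1 - i];
    r[n] = '\0';
    return r;
}

/* Concatenation a+b, freshly allocated. */
static char *cat(const char *a, const char *b)
{
    char *r = malloc(strlen(a) + strlen(b) + 1);
    strcpy(r, a);
    strcat(r, b);
    return r;
}

/* Best-fit DDA, first octant (0 < dy < dx). Returns the move string:
   '0' = horizontal move, '1' = diagonal move. Caller frees it. */
char *bestfit(int x1, int y1, int x2, int y2)
{
    char *str1 = cat("0", ""), *str2 = cat("1", "");
    int y = y2 - y1, x = (x2 - x1) - y;
    while (x != y) {
        if (x > y) {                 /* str2 := rev(str2)+str1 */
            char *r = rev(str2), *t = cat(r, str1);
            free(r); free(str2); str2 = t; x -= y;
        } else {                     /* str1 := rev(str1)+str2 */
            char *r = rev(str1), *t = cat(r, str2);
            free(r); free(str1); str1 = t; y -= x;
        }
    }
    /* assumed final step: emit str1+str2 repeated x (= y) times */
    char *unit = cat(str1, str2);
    char *out = malloc((size_t)x * strlen(unit) + 1);
    out[0] = '\0';
    for (int i = 0; i < x; i++) strcat(out, unit);
    free(str1); free(str2); free(unit);
    return out;
}
```

With these assumptions, bestfit(1, 2, 18, 12) returns the 17-bit string 10101011010110101 of the example.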
DDA methods generate different types of lines, but they all obey the following rule: In a line that is close to horizontal, steps in the y direction (or diagonal steps) are isolated (never two consecutive steps) and steps in the x direction occur in runs of the same size (except, perhaps, the first and last runs). In a line that’s close to vertical, the situation is the reverse.
3.6.1 Scanning a Bitmap in an Arbitrary Direction

It is easy to scan a bitmap by rows or by columns, and this short section shows how a scan-converting algorithm for lines can be employed to scan a bitmap in an arbitrary direction. Figure 3.14a shows a bitmap of R rows and C columns, and a line drawn at an angle α, specifying the direction of the scan. Notice that α = 0 is in the direction of the columns while α = 90° implies scanning by rows. Let’s call this line the main diagonal. It goes from (0, 0) to a point (x, y) and our first task is to determine the coordinates of this point.
procedure bestfit(x1,y1,x2,y2: integer);
var str1,str2: string; x,y,i: integer; done: boolean;
begin
done:=false; str1:=‘0’; str2:=‘1’;
y:=y2-y1; x:=(x2-x1)-y;
repeat
  case
  x>y: str2:=rev(str2)+str1; x:=x-y;
  x=y: done:=true;
  x<y: str1:=rev(str1)+str2; y:=y-x;
  endcase;
until done;
...
end;

Figure 3.12: The Best-Fit Algorithm.

If tan α > C/R, we have to substitute x = C for y = R. Once the coordinates (x, y) are known, the next task is to determine the coordinates of the pixels closest to the ideal, mathematical main diagonal from (0, 0) to (x, y). This can be done with an algorithm such as quadrantal DDA (Section 3.3.4) or octantal DDA (Section 3.3.5). The algorithm determines the individual steps of moving from a pixel to its successor and saves these steps. For example, move one pixel to the right and draw three pixels vertically down. The last task is to repeat this process for all the diagonals (some are shown as green lines in Figure 3.14b). This is easy, because we already have the steps for the main diagonal. We simply move one pixel to the right and then proceed from pixel to pixel in the steps saved earlier by the DDA algorithm.
3.7 Scan-Converting in Parallel

Parallel computers are becoming more and more common, and researchers are constantly looking for parallel versions of important algorithms. It turns out that scan-conversion algorithms can be modified to run on parallel computers. This section shows how to generalize scan-conversion methods for lines so that they can run on an MIMD (Multiple Instruction, Multiple Data) computer. Such a computer consists of several processors, all running simultaneously, communicating with each other either through shared memory or by means of message passing.
The main principle in developing a parallel algorithm is to divide (or partition) the problem into subproblems that are identical. In this way, only one program has to be written, and the individual processors execute identical copies. The problem of scan-converting lines can be divided into subproblems in several ways as follows:

1. Assuming that the MIMD computer has p processors, divide the greater of the intervals Δx and Δy into p equal subintervals, and assign each subinterval to a processor. Each processor calculates the best pixels in its subinterval, and the total time is reduced in this way by a factor of p. The main problem is the error variable. Its start value in each subinterval should equal its last value in the preceding subinterval, which is initially unknown. Assigning Err:=0 in every subinterval leads to less than ideal results.

2. Consider the bounding rectangle of the line segment (Figure 3.15a). It contains Δx × Δy pixels. We assume that the MIMD computer has at least Δx × Δy processors arranged in a two-dimensional grid, where the processor in position (i, j) becomes “responsible” for the pixel at relative position (i, j) in the rectangle. The processor calculates the distance d of its pixel from the line. If d is less than a preset value, the processor turns the pixel on (or, if XOR is used, flips it). This method is fast, but requires many processors.
Figure 3.15: (a) A Bounding Rectangle. (b) Divided Among 16 Processors.
Figure 3.15 shows the details of the calculation. The line segment L goes from (x1 , y1 ) to (x2 , y2 ). We draw a perpendicular d from point (xi , yj ) to the line. The point where it intersects the line is denoted (x0 , y0 ). The two triangles (L, Δy, Δx) and (d, yj − y0 , x0 − xi ) are similar. We can, thus, multiply corresponding sides to get d × L = Δy(x0 − xi ) + Δx(yj − y0 ),
yielding

  d = −(Δy/L)xi + (Δx/L)yj + (x0Δy − y0Δx)/L = A xi + B yj + C.
The three constants A, B, and C are the same for all pixels (see Exercise 3.8). They are calculated—normally by processor 0—and sent to all the other processors. When a processor receives these values, it calculates the distance d of “its” pixel from the line and decides whether or not to turn it on.

Exercise 3.8: Show why C = (x0Δy − y0Δx)/L is a constant that does not depend on x0 or y0.

3. If the MIMD computer does not have enough processors, each processor can be assigned a row or column in the bounding rectangle. The processor then calculates the intersection of the line with “its” row or column. If the line is close to horizontal, there should be one pixel per column, so each processor should be assigned a column xi. The processor then solves the equation
  y = a xi + b = ((y2 − y1)/(x2 − x1)) xi + (y1x2 − x1y2)/(x2 − x1)
for y and sets pixel (xi, y). If the line is close to vertical, there should be one pixel per row and each processor should be assigned a row yj. The processor then solves the equation

  yj = a x + b = ((y2 − y1)/(x2 − x1)) x + (y1x2 − x1y2)/(x2 − x1)
for x and sets pixel (x, yj ). 4. If the number of processors is very limited, each may be assigned more than just a row or a column of pixels. As an example, let’s assume that we have an MIMD computer with 16 processors, organized in a two-dimensional grid and numbered from processor (1, 1) to processor (4, 4). If the bounding box contains n×n pixels, we can divide them evenly among the 16 processors and assign each processor a square area of m × m pixels, where m = n/4. Processor number (i, j) would, in this case, be “responsible” for the m×m square of pixels whose bottom-left corner is at position (i × Δx/m, j × Δy/m) (Figure 3.15b).
3.8 Scan-Converting Circles

Because of the high symmetry of the circle, it is possible to scan-convert it in many ways. Only a few such methods are discussed here, but many more can be found in the charming article [Blinn 87].
3.8.1 Obvious Methods

Obvious methods for scan-converting a circle are easy to derive and to implement but are slow and inefficient.

The first obvious method is based on the Cartesian equation of a circle, x² + y² = R², which yields y = √(R² − x²). This expression is used in the loop below to determine one-quarter of the circle, which is then duplicated to complete the circle.

for x:=0 to R step eps do
  y:=sqrt(R*R-x*x);
  plot(x,y); plot(-x,y); plot(x,-y); plot(-x,-y);
end;

The method is slow, but a more important drawback is that the pixels are not uniformly distributed over the quarter circle. This is a result of the equal x increments of the loop (Figure 3.16).

Figure 3.16: Equal Increments of x.
The next obvious method solves this problem by employing the parametric equation x = R cos θ, y = R sin θ, which expresses the circle in terms of polar coordinates.

for theta:=0 to pi/2 step eps do
  x:=R*cos(theta); y:=R*sin(theta);
  plot(x,y); plot(-x,y); plot(x,-y); plot(-x,-y);
end;
This method is still very inefficient because of the use of trigonometric functions and also because some pixels may be set multiple times.
3.8.2 A Circle in Polar Coordinates

Expressing the circle in polar coordinates allows for a fast algorithm, even though it uses real numbers. This algorithm is presented here for a complete circle (the loop goes from 0° to 360°), but it can easily be modified to compute just one-quarter or one octant and use symmetry to complete the circle.

The circle equation in polar coordinates is x = R cos θ, y = R sin θ. A computer program computes the circle pixel by pixel, by iterating and varying a variable k from 0 to n − 1 (where n, the number of steps, is specified by the user). As a result, it is convenient to write the above equations as xk = R cos(θk), yk = R sin(θk), where θk = 2πk/n.

The main point of this method is to compute certain trigonometric functions just once. The user has to input a small value Δθ, which the program uses as an angle increment. In each step, it computes θk+1 = θk + Δθ. We use the trigonometric identities for the sum of angles:

  cos(α + β) = cos α cos β − sin α sin β,
sin(α + β) = sin α cos β + cos α sin β;
from this we get

  xk+1 = R cos(θk+1) = R cos(θk + Δθ) = R[cos(θk) cos(Δθ) − sin(θk) sin(Δθ)]
       = xk cos(Δθ) − yk sin(Δθ).

Similarly, yk+1 = xk sin(Δθ) + yk cos(Δθ). The program needs to compute sin(Δθ) and cos(Δθ) only once, then loop n times. A pseudo-code algorithm is listed here. (Note the quantities a and b. They are added to every pixel, which creates a circle centered at point (a, b) rather than at the origin.)

input(n,delta,a,b,R);
xk:=R; yk:=0;
dcos:=Cos(delta); dsin:=Sin(delta);
for k:=0 to n-1 do
  xn:=xk*dcos-yk*dsin;
  yn:=xk*dsin+yk*dcos;
  xk:=xn; yk:=yn;
  pixel(round(xn)+a,round(yn)+b);
end;

Exercise 3.9: Select Δθ = 5°, a = b = 0 and use this method to calculate the 18 equally-spaced points of the first quadrant of a circle of radius 1.
3.8.3 Bresenham–Michener Circle Method

The Bresenham–Michener circle method is a DDA algorithm [Bresenham 77] that is based on a loop that starts at point (0, R) and ends at point (R/√2, R/√2) to create one octant of the circle. Each pixel calculated is used to determine seven more pixels, in the remaining seven octants, to create the complete circle (see also Exercise 4.18). To move from (0, R) to (R/√2, R/√2), we need to increment x and decrement y. The loop of Figure 3.17 is set such that the x coordinate is incremented in every step, but the y coordinate is only decremented in certain steps (i.e., conditionally). The results are good (i.e., the pixels are fairly uniformly distributed) because in this octant the circle is close to horizontal. In each step (except the first), the algorithm examines two points, S and T, that differ only in their y coordinates, and it selects the one that’s closer to the true circle.

Exercise 3.10: Which point should be selected in case of a tie (i.e., when the circle passes exactly between Si and Ti)?

The algorithm maintains a variable di (calculated using just additions, subtractions, and shifts) that is updated in every step. The sign of di is used as an indicator, telling the program whether or not to decrement y in the step. The general form of the loop is as in Figure 3.17.
x:=0; y:=R;
while x<y do
  ...
  if d>0 then
    ...
    y:=y-1;
  else
    ...
  endif;
  x:=x+1;
endwhile;
Figure 3.17: Main Loop of Bresenham’s Circle Algorithm.
If a point Pi−1 = (xi−1 , yi−1 ) has been selected in a certain step, then the next step should increment x from xi−1 to xi−1 + 1 and either set yi = yi−1 or decrement yi = yi−1 − 1. The next step should therefore select either point Si = (xi−1 + 1, yi−1 ) or Ti = (xi−1 + 1, yi−1 − 1). The two quantities DS and DT are defined based on the
distances from the points to the circle (Figure 3.17):

  DS = (xi−1 + 1)² + yi−1² − R²,
  DT = R² − [(xi−1 + 1)² + (yi−1 − 1)²],
  di = DS − DT = 2(xi−1 + 1)² + yi−1² + (yi−1 − 1)² − 2R².
Note that these quantities are based on the distances of points S and T from the true circle. They are not the distances themselves, since that would require a square root calculation. If the circle passes closer to point S, then DS < DT and di is negative. If the circle passes closer to T, then di is positive. Hence, the sign of di indicates which point to select in iteration i.

The only remaining detail is the recalculation of di in each iteration. It turns out that calculating di+1 is very simple if it is done in terms of di. This, in fact, is the main advantage of the method. We start with

di+1 = 2(xi + 1)² + (yi)² + (yi − 1)² − 2R².

We already know that xi always equals xi−1 + 1, but the value of yi depends on the choice of point. If di > 0 (point T selected), then yi = yi−1 − 1 and

di+1 = 2(xi−1 + 2)² + (yi−1 − 1)² + (yi−1 − 2)² − 2R²
     = 2[(xi−1 + 1)² + 2xi−1 + 3] + [(yi−1)² − 2yi−1 + 1] + [(yi−1 − 1)² − 2yi−1 + 3] − 2R²
     = di + 4xi−1 + 6 − 2yi−1 + 1 − 2yi−1 + 3
     = di + 4(xi−1 − yi−1) + 10.

If, however, di < 0 (point S selected), then yi = yi−1 and

di+1 = 2(xi−1 + 2)² + (yi−1)² + (yi−1 − 1)² − 2R²
     = [2(xi−1 + 1)² + 4xi−1 + 6] + (yi−1)² + (yi−1 − 1)² − 2R²
     = di + 4xi−1 + 6.
Hence, updating the value of di in either case is simple and does not require any arithmetic operations beyond addition, subtraction, and shifting. Notice also that di+1 depends on (xi−1, yi−1) and not on (xi, yi). The program should therefore update the value of d before updating the values of x and y. The initial value of di, namely d1, is easily found by substituting (xi−1, yi−1) = (0, R). This gives

d1 = 2(0 + 1)² + R² + (R − 1)² − 2R² = 2 + R² + R² − 2R + 1 − 2R² = 3 − 2R.

The final program is shown in Figure 3.18.
procedure Bresenham(R);
x:=0; y:=R; d:=3-2*R;
while x<y do
  Plot8(x,y);
  if d>0 then
    d:=d+4*(x-y)+10; y:=y-1;
  else
    d:=d+4*x+6;
  endif;
  x:=x+1;
endwhile;
if x=y then Plot8(x,y);
end; {Bresenham}
procedure Plot8(x,y); Plot(x,y); Plot(-x,-y); Plot(-x,y); Plot(x,-y); Plot(y,x); Plot(-y,-x); Plot(-y,x); Plot(y,-x); end; {Plot8}
Figure 3.18: Bresenham’s Circle Algorithm.
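The pseudocode of Figure 3.18 translates almost directly into C. The following sketch is ours, not the book's listing: the function name is invented, and instead of calling Plot8 it stores the octant pixels in two arrays so they can be inspected.

```c
/* One octant of the Bresenham-Michener circle, following Figure 3.18.
   Instead of calling Plot8, the pixels are stored in xs[] and ys[];
   returns the number of pixels generated. */
int bresenham_octant(int R, int xs[], int ys[])
{
    int x = 0, y = R, d = 3 - 2*R, n = 0;
    while (x < y) {
        xs[n] = x; ys[n] = y; n++;
        if (d > 0) { d += 4*(x - y) + 10; y--; }  /* point T: step diagonally */
        else       { d += 4*x + 6; }              /* point S: stay on this row */
        x++;
    }
    if (x == y) { xs[n] = x; ys[n] = y; n++; }    /* pixel on the 45-degree line */
    return n;
}
```

For R = 8, this produces the pixels (0,8), (1,8), (2,8), (3,7), (4,7), (5,6), each of which is the best integer approximation of the true circle in its column.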
3.8.4 The DCS Circle Method

The circle and the square are different mathematical entities. Squaring the circle (constructing a square with the same area as a given circle by using only a finite number of steps with compass and straightedge) was one of the three great problems of classical geometry (the other two were the trisection of the angle and the duplication of the cube). In 1882 it was finally shown (as a direct consequence of the proof that π is transcendental) that squaring the circle is impossible, which is why today circle squarers are considered crackpots. Nevertheless, there have been connections and associations between squares and circles (or between quantities related to those geometric figures). One such relation is the sum of the infinite series of inverse squares

∑ (k = 1 to ∞) 1/k²,
which in 1735 was shown by Leonhard Euler to equal π²/6, a quantity related to circles because of the definition of π. The DCS circle algorithm of this section is another unexpected connection between circles and perfect squares. This simple and elegant algorithm (DCS stands for digital circle squares) determines the best pixels for a circle with radius r in terms of the distribution of square numbers in discrete intervals. Square numbers (or perfect squares) are denoted by Si and are integers of the form i², where i is a nonnegative integer. They have a property that speeds up this algorithm. Given the square Si, its successor is easily computed by Si+1 = (i + 1)² = Si + 2i + 1. (Notice that no multiplication is needed, because 2i can be computed with a left shift of i.) This property is illustrated by the gnomon of Figure 3.19. (The term gnomon refers to the part that should be added to a figure to produce a larger figure of the same shape.) The main reference is [Bhowmick and Bhattacharya 08], while [Jha 10] is a free Mathematica package that employs animation to demonstrate the algorithm.
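The gnomon recurrence is trivial to code. The one-line function below (our own, for illustration) produces Si+1 from Si with a shift and an addition:

```c
/* Next square number by the gnomon: S(i+1) = S(i) + 2i + 1.
   The product 2i is computed with a left shift, so no multiplication is used. */
int next_square(int Si, int i)
{
    return Si + (i << 1) + 1;
}
```

Starting from S0 = 0, repeated applications yield 1, 4, 9, and so on.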
Figure 3.19: Square Numbers Computed by a Gnomon.
As in the Bresenham–Michener circle method (Section 3.8.3), the idea is to scan the first octant of the circle from x = 0 to x = y, start with y = r, plot a horizontal run of pixels, and decrement y by 1. While other scan conversion methods compute a y coordinate for each x coordinate, the DCS method determines the length of each horizontal run of pixels with the same y. Thus, DCS can be referred to as an interval searching algorithm. Figure 3.20 illustrates the meaning of the term “run” in DCS. The figure shows the first octant of a circle with r = 41. It is easy to see that the pixels selected by DCS for this circle form horizontal runs of lengths 7, 4, 4, 2, 2, 2, 2, 1, 1, 2, 1, 1, and 1. The run lengths generally decrease, but they may sometimes slightly increase.
Figure 3.20: First Octant Runs of Pixels in DCS.
The run lengths in the figure were determined as follows. The first run-length (7) is the number of perfect squares in the interval I0 = [0, r − 1] = [0, 40] (these are the squares 0, 1, 4, up to 36). The second run-length (4) is the number of perfect squares in the interval I1 = [r, 3r − 3] = [41, 120] (these are the squares 49, 64, 81, and 100). The third length (also 4) is the number of perfect squares in the interval I2 = [3r − 2, 5r − 7] = [121, 198] (these are the squares 121, 144, 169, and 196). In general, run length k (where k starts at 0) is the number of perfect squares in the interval Ik = [(2k − 1)r − k(k − 1), (2k + 1)r − k(k + 1) − 1]. The principle of DCS is surprisingly simple and is illustrated in Figure 3.21. At an arbitrary step in the algorithm, we are at point P = (i, j). The true circle lies on point Q = (i, j − δ), located between P and M = (i, j − 1/2) (but Q could also be between P
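These intervals are easy to check numerically. The short sketch below (the helper names are ours) counts the perfect squares in each interval Ik and reproduces the run lengths listed above for r = 41:

```c
/* Number of perfect squares in the closed interval [lo, hi], lo >= 0. */
int squares_in(int lo, int hi)
{
    int i = 0, count = 0;
    while (i * i < lo) i++;
    while (i * i <= hi) { count++; i++; }
    return count;
}

/* Length of run k of a DCS circle of radius r: the number of perfect
   squares in I_k = [(2k-1)r - k(k-1), (2k+1)r - k(k+1) - 1]. */
int dcs_run(int r, int k)
{
    int lo = (2*k - 1)*r - k*(k - 1);
    int hi = (2*k + 1)*r - k*(k + 1) - 1;
    if (lo < 0) lo = 0;               /* I_0 is really [0, r-1] */
    return squares_in(lo, hi);
}
```

For r = 41, runs 0 through 12 come out as 7, 4, 4, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, exactly the run lengths of Figure 3.20.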
and N = (i, j + 1/2) and have coordinates (i, j + δ)). We can therefore write

r² = i² + (j − δ)²,
0 = δ² − 2jδ + (i² + j² − r²),
δ = (1/2)[2j ± √(4j² − 4(i² + j² − r²))],
δ = j ± √(r² − i²).
Figure 3.21: Principle of DCS.
The answer to Exercise 3.10 tells us that Q cannot be M or N (i.e., a tie is impossible), which is why |δ| < 1/2, or −1/2 < δ < 1/2. This implies that we have to select the minus sign in the last equation and write it as

δ = j − √(r² − i²).    (3.5)
Extending Equation (3.5) while keeping in mind that −1/2 < δ < 1/2, we obtain

−1/2 < j − √(r² − i²) < 1/2,
−1/2 − j < −√(r² − i²) < 1/2 − j,
1/2 + j > √(r² − i²) > j − 1/2,
(j − 1/2)² < r² − i² < (j + 1/2)²,
j² − j + 1/4 < r² − i² < j² + j + 1/4.
The terms j² − j, r² − i², and j² + j on the last line are integers, which is why the inequality j² − j + 1/4 < r² − i² implies (j² − j) − (r² − i²) < −1/4 < 0, or j² − j < r² − i².
Similarly, the quantity (r² − i²) − (j² + j) is an integer and is less than 1/4, so it must be nonpositive. We can therefore write

j² − j < r² − i² ≤ j² + j,
−j² − j ≤ i² − r² < −j² + j,
r² − j² − j ≤ i² < r² − j² + j.    (3.6)
At first, Equation (3.6) does not seem special, but it is in fact the heart of the DCS algorithm. The DCS loop starts with j = r, and the first step has to determine the run length of pixels for this j. Substituting j = r in Equation (3.6) yields 0 ≤ i² < r² − r² + r = r, or 0 ≤ i² ≤ r − 1. This means that the x coordinates i of the pixels in the topmost run are those whose squares i² are the perfect squares in the interval I0 = [0, r − 1]. This result is an unexpected connection between number theory and digital circles. The DCS algorithm computes the number of these squares and draws the topmost horizontal run of pixels. Variable j is now decremented by 1, and we substitute r − 1 for j in Equation (3.6). The result is the next interval I1 = [r² − (r − 1)² − (r − 1), r² − (r − 1)² + (r − 1) − 1] = [r, 3r − 3]. It indicates that the horizontal run-length of pixels for j = r − 1 equals the number of perfect squares in the interval [r, 3r − 3]. In the kth step of DCS, we substitute j = r − k in Equation (3.6) to obtain the kth interval where we'll have to look for perfect squares
Ik = [r² − (r − k)² − (r − k), r² − (r − k)² + (r − k) − 1]
   = [(2k − 1)r − k(k − 1), (2k + 1)r − k(k + 1) − 1].

The lengths of these intervals are easily computed. The length of a general closed interval [a, b] is b − a + 1, so l1 = (3r − 3) − r + 1 = 2r − 2 = l0 + (r − 2), and the length of Ik for k ≥ 1 is

lk = [(2k + 1)r − k(k + 1) − 1] − [(2k − 1)r − k(k − 1)] + 1 = 2r − 2k,

which implies that the length of the next interval is lk+1 = 2r − 2(k + 1) = lk − 2 (again, this is for k ≥ 1). Notice that the lengths shrink by 2, but these are the lengths of the intervals containing perfect squares. They are not the lengths of the horizontal runs of pixels. The length of each horizontal run of pixels is computed by counting the number of perfect squares included in the corresponding interval.

The authors of this algorithm show how to speed up the determination of the intervals Ik. If we denote Ik = [uk, vk = uk + lk − 1], then uk and lk can be computed from their predecessors as follows:

uk = 0 for k = 0, and uk = uk−1 + lk−1 for k ≥ 1;
lk = r for k = 0, l1 = 2r − 2, and lk = lk−1 − 2 for k ≥ 2.

The algorithm is listed here in the C programming language (where Plot8 is a procedure that draws the pixel at location (i, j) and the seven pixels that correspond to it in the other seven octants).
DCS(int r)
{ int i = 0, j = r, s = 0, w = r - 1;
  int l = w << 1;               /* l = 2r - 2, the length of interval I_1 */
  while (i <= j) {
    Plot8(i, j);
    s += (i << 1) + 1; i++;     /* gnomon: s is now the square i^2 */
    if (s > w) {                /* i^2 is past the end of the current interval */
      j--;                      /* start the next run */
      w += l; l -= 2;           /* v_k = v_(k-1) + l_k and l_(k+1) = l_k - 2 */
    }
  }
}

When n > 2 (superness > 0.7071), the superellipse becomes a pinched diamond. When n → ∞ (superness → 1), it approaches the shape of a plus sign.
Figure 3.23: Five Superellipses.
Check also [Lamé 98] for an interactive Java applet. The superellipse was popularized by the multi-artist Piet Hein, who designed one of the Stockholm city squares as a superellipse with n = 2.5 (a superness of 2^−0.4 ≈ 0.7578). Regardless of its shape, the superellipse passes through the same points when θ = 0, π/2, π, 3π/2. When a = b, the superellipse reduces to the supercircle (a cos^n θ, a sin^n θ). It can be generalized to the three-dimensional superellipsoid

(a cos^n θ cos^m φ, b cos^n θ sin^m φ, c sin^n θ),
where −π/2 ≤ θ ≤ π/2 and −π ≤ φ ≤ π.
This solid can take many shapes, but all are bounded in the box whose dimensions are 2a × 2b × 2c.
A circle no doubt has a certain appealing simplicity at first glance, but one look at an ellipse should have convinced even the most mystical of astronomers that the perfect simplicity of the circle is akin to the vacant smile of complete idiocy. Compared to what an ellipse can tell us, a circle has little to say. Possibly our own search for cosmic simplicities in the physical universe is of this circular kind—a projection of our uncomplicated mentality on an infinitely intricate external world. —Eric Temple Bell, Mathematics: Queen and Servant of Science.
3.9 Filling Polygons

Polygons and their applications to polygonal surfaces are discussed in Sections 2.18 and 9.2. Here, we discuss two types of algorithms, seed fill and scan-line fill, for filling any polygon with a given color. The problem of polygon fill is to fill the interior of a given polygon with pixels of a certain color. A related problem is boundary fill, where we want to fill a given area, bounded by pixels of a certain color, with pixels of another color.
3.9.1 Seed Fill

The first approach to the polygon fill problem is a recursive seed fill algorithm. The user has to specify a start point by clicking inside the polygon, and the software starts at that point, paints it and its neighbors the required color, examines the neighbors of the neighbors, and so on. The basic seed fill algorithm works as follows:

Push the seed pixel onto the stack
While the stack is not empty
  Pop a pixel P from the stack
  Set P to color f
  For each of the four (or eight) nearest neighbors of P,
    if it is a boundary pixel or is already painted f, then disregard it,
    else push it onto the stack.
This algorithm works even for polygons with holes and is simple to implement in a programming language that supports recursion. However, the recursion can get very deep (which results in slow execution), and the recursion stack may have to be very large. We therefore need a sophisticated, nonrecursive boundary seed fill algorithm, and such a method is discussed here in detail. We denote the boundary pixels by • and the fill pixels by ×. The area can be of any shape, but it must be completely bounded. If the boundary is not complete, the fill algorithm may not know where to stop, and it may spill out of the area. To specify the area to the fill program, the user has to point to and select one of the interior pixels to become the seed. A straightforward algorithm is the following:

1. Set the seed pixel to the fill color ×.
2. Push the four nearest neighbors of the seed onto a stack, unless any of them is a boundary pixel • or has already been colored.
3. Pop the stack. Set that pixel to the fill color, and push its four nearest neighbors, as indicated in step 2, onto the stack.
4. Repeat step 3 until the stack is empty.

This method is slow because of the excessive use of the stack. Also, it may easily push thousands of pixels onto the stack and overflow it. A better algorithm is outlined below. It still uses a stack, but it works with rows of pixels rather than individual ones. The stack is used to indicate future rows to be filled, instead of future pixels, so its use is not excessive. Consider the area in Figure 3.24a, where the seed pixel is marked. The algorithm is the following:

1. Fill the row containing the seed pixel with the fill color ×, scanning left and right of the seed as far as necessary, until a boundary pixel • is reached (Figure 3.24b).
2. Examine the two rows immediately above and below the current row (if there are none, go to step 3). Scan each of the two rows from right to left, looking for pixels that lie immediately to the left of a boundary pixel and that haven't been colored already. Those pixels are pushed onto the stack. Figure 3.24c shows three such pixels, labeled in the order in which they were found.
Figure 3.24: Boundary Fill.
3. The pixel at the top of the stack is popped out (if the stack is empty, the algorithm terminates) and steps 1 and 2 are repeated on that pixel. The result is shown in Figure 3.24d, where two new pixels, numbered 3 and 4, have been pushed into the stack. Repeating steps 2 and 3 results in Figure 3.24e. After a few more repetitions, the situation (which is now much advanced) is as shown in Figure 3.24f. It should be an easy exercise to apply this algorithm to the rest of the example and fill up the entire figure. Note that the region to be filled may include holes and may also be concave, but it should be connected. For the purposes of boundary fill, a region such as in Figure 3.24g is considered two separate, disconnected areas, consisting of five pixels each.
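The straightforward stack-based seed fill described above can be sketched in C as follows. The grid representation and all names here are our own; the logic is the four-neighbor fill just described, with an explicit stack instead of recursion.

```c
#define W 8
#define H 6

/* Four-connected seed fill with an explicit stack. Grid cells:
   '.' unfilled interior, '#' boundary, 'x' filled. */
void seed_fill(char g[H][W + 1], int sx, int sy)
{
    int stack[4 * W * H + 4][2], top = 0;
    stack[top][0] = sx; stack[top][1] = sy; top++;
    while (top > 0) {
        top--;
        int x = stack[top][0], y = stack[top][1];
        if (x < 0 || x >= W || y < 0 || y >= H) continue;
        if (g[y][x] != '.') continue;        /* boundary or already filled */
        g[y][x] = 'x';
        stack[top][0] = x + 1; stack[top][1] = y;     top++;
        stack[top][0] = x - 1; stack[top][1] = y;     top++;
        stack[top][0] = x;     stack[top][1] = y + 1; top++;
        stack[top][0] = x;     stack[top][1] = y - 1; top++;
    }
}
```

Even on this tiny grid, a pixel may be pushed several times (once by each neighbor), which illustrates why the pixel-at-a-time approach is wasteful compared to the row-based one.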
3.9.2 Scan-Line Fill

A better approach is to develop a scan-line fill algorithm. This type of algorithm scans the polygon in rows of pixels (which tells us that this type of algorithm is image-precision), determining the locations of interior pixels in each row and painting them with the fill color. The determination is done by observing the points where a scan line intersects the edges of the polygon, as illustrated in Figure 3.25. The scan line at y = 5 places three pixels in one span and four pixels in another span of the polygon. The principle is to scan from left to right, to start painting pixels with the fill color as soon as the first edge is located, to stop filling when the next edge is found, and to alternate in this way until the last intersection point is found.
Figure 3.25: Pixels on a Scanline.
This simple operation has to solve the following problems:

How to compute the intersection of each scan line with the polygon's edges. Scan line 5 in Figure 3.25 intersects the polygon at four points, two of which (indicated by small triangles) have noninteger x coordinates. We certainly do not want complex computations, involving floating-point numbers, just to determine the intersection points.

How to identify the last intersection point between the scan line and an edge. If the algorithm cannot do this, it may have to continue scanning until the right edge of the entire display monitor is reached.
The scan line at y = 7 intersects the polygon at three points, one of which corresponds to a vertex of the polygon. An odd number of intersections confuses the basic algorithm. Thus, an intersection of a scan line with a vertex should count as either zero or two intersections.

Edge ab of the polygon is horizontal and may result in many intersection points. A correct algorithm has to pay special attention to this case.

Perhaps the simplest solution to these problems is to triangulate the given polygon. Any polygon can be decomposed into a mesh of triangles, and triangles are simple geometrical figures that don't suffer from most of the problems listed above. As we scan a triangle row by row, each row intersects the triangle at either one point (a vertex), two points (edges), or a horizontal edge. Thus, each triangle is easy to fill, which seems to solve the problem of polygon fill. Unfortunately, concave polygons (and also polygons that intersect themselves) are not easy to triangulate, so this approach can be used to fill only convex polygons.

Exercise 3.13: What other geometric figure is easy to scan and fill?

The following idea solves most of the problems above. Start with a list of the edges of the polygon. Each node in this list is a pair of vertices (actually, it is a pointer to a pair of vertices). Scan-convert each edge with any scan conversion method for straight lines (Section 3.1). The result of scan-converting all the edges of a polygon is shown in Figure 3.26a. The points determined by the scan conversion process are placed in a list T, where they are sorted by their y coordinates and, within each y, by their x coordinates. Unfortunately, list T is not what we expected. We were hoping for a list with an even number of points for each scan line (each y coordinate), but the list resulting from the pixels of Figure 3.26a is different.
Figure 3.26: Boundary Pixels of a Polygon.
A closer look at the figure shows the source of the problem. For edges whose slope is greater than 1, the scan conversion produces one pixel per y value, but scanning a shallow edge produces one pixel for each x value (the gray pixels in Figure 3.26a). Thus, the scan conversion algorithm has to be modified to produce one pixel per y value for any slope, as illustrated in Figure 3.26b. Notice that this modification results in
holes and gaps in the boundary pixels, but each scan line now has an even number of pixels (the black/white pixels are counted twice, because each is created twice, from two edges).

There is still one problem. The white pixel (labeled 1) appears in list T twice, because it is a vertex (the common point of two edges), but it is obvious that we want it to appear in T only once; otherwise, the third scan line from the bottom would have three points. We therefore examine pixel 1 carefully. It is a vertex of the polygon, but there are four other vertices, labeled 2 through 5, and each of them should appear in T twice. What distinguishes pixel 1 from the other four vertices? The answer, when it finally becomes clear, is obvious. The other four vertices are extreme in some sense, but vertex 1 isn't (we will call such a vertex moderate). Vertex 2, for example, is located at the highest point (maximal y coordinate) of its two member edges (edge 1–2 and edge 2–3). Vertex 5 is located at the lowest point of its two edges. These vertices are extreme, but vertex 1 is located at the ymin of one edge and the ymax of the other edge, which makes it moderate.

Thus, we have to add another rule or test to our fill algorithm. Scan list T after it is sorted, looking for pairs of adjacent identical points. The two points of such a pair are endpoints of edges that meet at a vertex. If the vertex is moderate (located at the top of one edge and the bottom of the other edge), then one of the two identical points should be deleted from T. In principle, it does not matter which of the two is deleted, but further experience with this algorithm (with polygons that have horizontal edges) suggests that we should be consistent. We therefore arbitrarily decide to delete the point that is located at the bottom of its edge. In the polygon of Figure 3.26b, we delete the point at the bottom of edge 1–2 and retain the point at the top of edge 5–1.
Both points correspond to pixel 1, but now this pixel appears in list T only once. Once list T is complete, the algorithm scans it. The list is sorted by scan lines (y values) and has an even number of points for each scan line. Each of these points causes the algorithm to switch its behavior. The first point starts a fill, the second point terminates the fill, the third point starts another fill, and so on. The third scan line from the top serves as an example. The first boundary pixel starts a string of four fill pixels (shown as triangles). Pixel 3 stops the fill, but the second occurrence of the same pixel starts another fill that includes this pixel and six more. Finally, the rightmost boundary pixel on this scan line stops the fill, and the next point encountered by the algorithm in T corresponds to the next scan line. Our algorithm has to be checked and modified, if needed, so that it deals properly with horizontal edges. It turns out that only the following, simple rule is needed: Delete the two endpoints of every horizontal edge from list T (rather, if the two endpoints of an edge have the same y coordinate, don’t even place them in T ). Figure 3.27 illustrates how this rule works. The figure shows a simple polygon with horizontal and vertical edges. At vertex 1 (moderate), the endpoint of edge 1–2 has been deleted, but the other endpoint (at vertex 2) remains in T . Edge 2–3 is horizontal, so it does not contribute any points to T . Thus, there is only one point at vertex 2 and it starts a fill that is terminated at vertex 3, because this vertex is also moderate (it includes only the endpoint of edge 3–4). The situation at vertex 4 is more complex. Edge 4–5 is horizontal and therefore does not contribute endpoints at vertices 4 and 5. Edge 3–4 has its bottom endpoint at
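Once list T has been cleaned up as described, painting a scan line is a simple parity toggle over its sorted x values. A minimal sketch (the array and function names are ours):

```c
/* Fill one scan line from its sorted, even-count list of intersection
   x values: the pixels between the pairs (xs[0], xs[1]), (xs[2], xs[3]),
   ... are painted. Returns the number of pixels painted. */
int fill_scanline(const int xs[], int n, char row[], int w)
{
    int painted = 0;
    for (int k = 0; k + 1 < n; k += 2)       /* each pair starts and stops a fill */
        for (int x = xs[k]; x <= xs[k + 1]; x++)
            if (x >= 0 && x < w) { row[x] = 'x'; painted++; }
    return painted;
}
```

This is exactly the start fill/stop fill alternation described above; the cleanup rules for moderate vertices and horizontal edges exist only to guarantee that n is even.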
Figure 3.27: Horizontal Edges in a Polygon.
vertex 4, so this endpoint is deleted from T. Thus, there are no points in T for vertex 4, and the scan line at a starts a fill that continues uninterrupted to vertex 5. The horizontal edge 9–10 does not contribute points to T, and the bottom endpoint of edge 1–10 is deleted. Thus, no points in T correspond to vertex 10, and edge 9–10 is not drawn, but the top endpoint of edge 8–9 starts a fill at vertex 9 (which terminates at edge 5–6).

Exercise 3.14: What about edge 7–8?

Our algorithm works (it works also for polygons with holes), but our arbitrary decision to delete the points located at the bottom of their edges has resulted in horizontal edges missing (i.e., not filled) at the bottom of the polygon. As with many other algorithms, there is more than one way to do things.

Earlier we said "scan convert each edge with any scan conversion method for straight lines. . . ." We now describe an efficient method for determining the intersections of edges with scan lines. This method is based on the following observations: (1) If certain edges intersect a scan line, chances are that the same edges will intersect the next scan line (this is referred to as edge coherence and is similar to the concept of pixel correlation, discussed on Page 1035). (2) Once we have determined the intersection (the x value) of an edge with a scan line y, it is easy to compute the intersection (the new x value) of the same edge with the next scan line y + 1. Figure 3.28 illustrates the method. It shows a polygon with six vertices A = (0, 2), B = (10, 6), C = (13, 2), D = (14, 11), E = (6, 9), and F = (2, 10). We assume that the polygon is stored in memory as a list of edges, each node of which contains two pointers to the end vertices of the edge (see Section 9.2 for other ways to represent polygons).
Thus, the polygon of Figure 3.28 is represented as the list [A, B] → [B, C] → [C, D] → [D, E] → [E, F ] → [F, A] where B is a pointer to the node for vertex B, containing (10, 6), and similarly for the other vertices. (This is not the best polygon representation, but it works well to explain our method). Two simple data structures are employed by the method, an edge table (ET) and an active-edge list (AEL). The ET is a list of nodes (often called buckets) for all the scan lines, i.e., all the y values spanned by the polygon (in our example, the polygon spans y values from 2 to 11). The bucket corresponding to k is the start of a list of edges whose ymin = k.
Figure 3.28: Illustrating the Coherence-Based Method.
Thus, many buckets may be empty, as illustrated by the figure. Each bucket of the ET list is either empty or points to a list of edges, where each node contains the ymax of the edge, the x value of the bottom endpoint of the edge (the list is kept sorted by these values), and a term of the form 1/a, where a = (y2 − y1)/(x2 − x1) is the slope of the edge. The example that follows explains how the value 1/a is used to determine the intersections of the edge with consecutive scan lines. Notice that 1/a is 0 for vertical edges and is undefined for horizontal edges, but such edges are ignored by our algorithm anyway.

In our example, only buckets 2 and 9 of the ET (which spans y values from 2 to 11) are nonempty:

9: EF(ymax = 10, x = 6, 1/a = −4) → DE(11, 6, 4)
2: AF(10, 0, 1/4) → AB(6, 0, 5/2) → BC(6, 13, −3/4) → CD(11, 13, 1/9)

The AEL is a list of active edges (edges that intersect the current scan line). The algorithm loops over the scan lines in order of increasing y, and the AEL is updated in each iteration. Updating the AEL involves the following steps:

Edges in the AEL for which ymax = y are deleted. These edges will not be needed in future iterations.

Intersections (i.e., x values) are computed for all the edges in the AEL, based on the intersections computed for the previous iteration. This step uses the 1/a term and is illustrated in the example below.

Edges for which ymin = y + 1 are added to the AEL. They will be needed in future iterations.

Once the functions of the ET and AEL are clear, the method is easy to understand. The main steps are as follows:

1. Set up the ET. Initialize the AEL to an empty list.
2. Set y to the smallest nonempty entry of the ET (in our case, y ← 2).
3. Repeat the following steps until both the ET and the AEL are empty.
4. Bucket y in the ET consists of edge nodes for edges whose ymin = y. Those are the new edges to be added to the AEL. Move them to the AEL and sort the AEL on the x field of the nodes. (Sorting may not be needed because the ET is already sorted.)
5. For each pair (xi, xi+1), (xi+2, xi+3), . . . of successive x values in the AEL, compute the endpoints of a span as shown below, convert them to integers by rounding up (in a left edge) or down (in a right edge), and fill the pixels of the span.
6. Scan the AEL for nodes (edges) where ymax = y and remove those nodes. Those edges will not intersect future scan lines.
7. Increment y by 1.
8. Scan the AEL for nonvertical edges and update their x fields for the new y (this is explained below).

In our example, we start at y = 2. We move the four nodes of ET bucket 2 to the AEL and compute two spans, for edges AF and AB (at x = 0) and for edges BC and CD (at x = 13). Each span is a single pixel. No edge is deleted. We increment y to 3. The ET bucket for y = 3 is empty, so nothing new is moved to the AEL in step 4. Step 5 computes the new endpoints for the span AF–AB as x = 0 + 1/4 and x = 0 + 5/2. The first value is closer to 0, but since AF is a left edge, the algorithm rounds 1/4 up to 1. The second value is rounded down to 2. The two endpoints are therefore (1, 3) and (2, 3). Two more endpoints are computed for the span BC–CD. Their x values are 13 − 3/4 → 13 (rounded up, because BC is a left edge) and 13 + 1/9 → 13. No edges are deleted from the AEL in step 6.

Next, y is incremented to 4. No new edges are moved to the AEL (in fact, nothing new will be moved until y reaches 9). The new x values for the span AF–AB are 0 + 2/4 → 1 and 0 + 10/2 → 5. The span from (1, 4) through (5, 4) is filled. The new x values for the span BC–CD are 13 − 6/4 → 12 and 13 + 2/9 → 13.
The span from (12, 4) to (13, 4) is also filled. No edges are deleted.

Exercise 3.15: When will the first edges be deleted?

At y = 5, the x values computed for the two spans are 0 + 3/4 → 1 through 0 + 15/2 → 7, and 13 − 9/4 → 11 through 13 + 3/9 → 13. The rest of the example is easy to complete.

The last point that needs to be discussed is computing the extreme x values for the spans (step 5 above). The explicit equation of a straight line (Section 9.1) is y = ax + b, where a is the slope (yi+1 − yi)/(xi+1 − xi) of the line. In the special case where yi+1 = yi + 1, the slope becomes a = 1/(xi+1 − xi), or xi+1 = xi + 1/a. Thus, given the x value of an endpoint for scan line y, the x value for the next scan line y + 1 is x + 1/a, rounded up or down to the nearest integer depending on the location (left or right) of the edge in the polygon. As y is incremented, successive values of x have the form x + k/a. If 1/a is a fraction 1/m, the software has to save only the integer part of x and the values of k and m; when k becomes equal to m, the integer part is incremented by 1 and k is reset to 0.

Stroke and fill. The concept of a stroke was introduced by PostScript (Section 20.5). PostScript is based on entities called paths, where a path can be a polyline, a polygon, or a curved region (open or closed). A path is a two-dimensional object with
two attributes, stroke and fill. The term stroke refers to the edge of the path. It can be thick or thin, it can be in any shade of gray or in any color, and it can be a pattern. Figure 20.3 shows examples of strokes and fills. The point is that the pixels shown in Figure 3.26a constitute the stroke of the polygon. They are not sufficient to fill the polygon, but they are nevertheless useful.

Note. Because of the discrete nature of pixels, certain polygons may look bad when filled, regardless of the filling algorithm used. This is especially true for very narrow polygons (and for narrow regions in general), such as the pathological one depicted in Figure 3.29.
Figure 3.29: A Very Narrow (Pathological) Polygon.
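The edge-coherence update of Section 3.9.2, in which each edge's next intersection is obtained as x + 1/a with pure integer arithmetic, can be sketched as follows (the struct and field names are ours, not from the book):

```c
/* An active edge: its intersection x is kept as an integer part xi plus
   a fraction acc/den, where num/den is the edge's 1/a (the change in x
   per scan line). */
typedef struct { int xi; int num, den, acc; } EdgeX;

/* Advance the edge's intersection to the next scan line. */
void step_y(EdgeX *e)
{
    e->acc += e->num;                              /* x += num/den */
    while (e->acc >= e->den)  { e->acc -= e->den; e->xi++; }
    while (e->acc <= -e->den) { e->acc += e->den; e->xi--; }
}
```

For edge AF of Figure 3.28 (1/a = 1/4, starting at x = 0 on scan line 2), four steps carry the intersection to y = 6 and increment the integer part to 1; for CD (1/a = 1/9, starting at x = 13), nine steps reach x = 14 at y = 11, the apex D.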
3.10 Pattern Filling

Once we know how to fill a polygon with a given color, the next step is to find out how a polygon can be filled with copies of a pattern. The pattern is a small M × N rectangle of pixels, and the problem is to fill the interior pixels of the polygon with bits from the pattern such that the pattern repeats. The main concern here is the placement of the pattern within the polygon. Figure 3.30 shows three variations of pattern fill that illustrate this issue.

In part (a) of the figure, the first pixel selected by the fill algorithm (normally the bottom endpoint of the lowest-left edge) is considered the anchor and is filled with the bottom-left pixel of the pattern. If the polygon is moved, the pattern inside the polygon does not change.

In part (b) of the figure, the user selects an anchor in the polygon (the thick, short arrow), and the fill algorithm maps it to the bottom-left pixel of the pattern. This is handy in cases where the user wants to try various placements of the pattern in the polygon.

In part (c), we assume that the pattern covers the entire screen permanently but is invisible. When a polygon is placed on the screen and the pattern-fill algorithm is executed, the part of the pattern behind the polygon becomes visible. When the polygon is moved, the pattern in it seems to move in the opposite direction. The technique of part (c) is referred to as absolute anchoring, and it has the advantage that polygons may overlap and butt against each other seamlessly.

A programmer implementing a pattern-fill algorithm may include various options, such as selecting the anchor at random, permitting the user to specify gaps between copies of the pattern, or allowing for triangular patterns that are placed like this . . . .
Figure 3.30: Polygons with Patterns.
Exercise 3.16: Suggest another option that makes sense for a pattern-fill algorithm. The basic mathematics of pattern fill is easy to derive. It employs the modulo operation p mod q, which computes the remainder r of the division p/q. This operation can also be written in terms of the floor operator as

r = p − q⌊p/q⌋,
where the remainder has the sign of the divisor q. Figure 3.31 illustrates the application of the modulo to our problem. Once the anchor point (a, b) has been determined, by any method, it is painted the color of the bottom-left pixel (pixel (0, 0)) of the pattern. Thus, (a, b) ↔ (0, 0). Obviously, point (a + 1, b) in the polygon corresponds to point (1, 0) in the pattern, and so on for M points. Beyond that, the pattern repeats, so (a + M, b) ↔ (0, 0).
Figure 3.31: The Modulo Operation.
In general, to determine the pattern point that corresponds to polygon point (a + k, b), we take the distance x − a between the points, and subtract from it the largest multiple of M that fits into this distance. If the remainder is r, then (a + k, b) ↔ (r, 0). The same computation is easily applied to the y coordinate. The figure shows that polygon point (x, y) should be painted the color of pattern point

( x − a − M⌊(x − a)/M⌋, y − b − N⌊(y − b)/N⌋ ).

What about points (x, y) to the left of or below the anchor (a, b)? Figure 3.31 suggests the following. First, count the largest integer multiple of M between x and a,
then compute the remaining distance r, and finally subtract r from M to obtain the correct pattern point. Thus, given a polygon point (x, y) to the left of (a, b), the x coordinate of the pattern point that corresponds to it is

M − r = M − ( (a − x) − M⌊(a − x)/M⌋ ),

and similarly for the y coordinate. In practice, the modulo computations shown here are needed only once in a while. Recall that polygon filling is often done in scan lines, filling one or several spans in each line. If the leftmost point of a span is (x, y), then the modulo computations need be done only once, to determine the pattern point (p, q) corresponding to (x, y). Filling the rest of the span is done by selecting pattern point (p + k, q) and incrementing k from 0 to M − 1 (i.e., modulo M). Skipping to the start of the next span (a distance d) is done by incrementing k by d modulo M.
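The modulo arithmetic above is easy to express in code. The following Python sketch (function name hypothetical) maps a polygon pixel to its pattern pixel. Python's % operator already returns a remainder with the sign of the divisor, so points to the left of or below the anchor need no special case:

```python
def pattern_point(x, y, a, b, M, N):
    """Map polygon pixel (x, y) to a pixel of an M x N pattern,
    given the anchor (a, b), which corresponds to pattern pixel (0, 0).
    Works for points on any side of the anchor because Python's %
    yields a remainder with the sign of the divisor."""
    return ((x - a) % M, (y - b) % N)
```

For span filling, this computation is needed only for the leftmost pixel of a span; the remaining pattern x coordinates are obtained by incrementing the result modulo M.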
3.11 Thick Curves So far, we have implicitly assumed that our curves are one-pixel wide. How can we scan-convert thick lines and other geometric primitives? Two simple methods are described here: 1. Replicating pixels. The user specifies the width w of the line. Any desired scan-converting method is used to determine the best pixels. If the line is close to horizontal, each pixel selected is replicated by turning on several pixels above it and below it (Figure 3.32a), to obtain a column that is w pixels high. If the line is close to vertical, several pixels to its left and right are turned on instead. This works better when w is odd; an even value of w results in k pixels replicated on one side of the line and k + 1 pixels replicated on the other side. This is a simple, fast method, but it produces lines shaped like parallelograms instead of rectangles (Figure 3.32b). Such lines do not connect very well and do not always have the right width (if the slope is 45°, the width is w/√2; see Figure 3.32b). When drawing a curve with this method, the program should constantly check the slope to determine whether horizontal or vertical replication is needed. For lines that are not very thick, however, this method may be satisfactory. 2. Using a drawing pen. The user specifies the shape of a pen, which can be a square, a rectangle, a circle, or anything else. The program employs any scan-converting method to determine the best pixels, and draws the shape of the pen (its footprint) centered on every pixel selected (Figure 3.32c). This is slow, since successive copies of the pen may overlap considerably, but the results are often visually acceptable. Neither method is ideal. Working with thick lines and curves raises new questions. If the pen is not circular, does it turn while drawing a thick curve? What shape should the endpoints be? How should thick straight segments connect? How can we "fatten" a thin line if its thickness should be an even number of pixels?
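Method 1 (pixel replication) can be sketched as follows in Python. This is a toy model, not production code: it uses simple DDA to pick the line's pixels, then replicates each one vertically or horizontally; the function name is hypothetical:

```python
def thick_line_pixels(x1, y1, x2, y2, w):
    """Scan-convert a line with simple DDA and replicate each selected
    pixel vertically (near-horizontal lines) or horizontally
    (near-vertical lines) to obtain width w. For even w, the extra
    pixel lands on one side, as noted in the text."""
    pixels = set()
    dx, dy = x2 - x1, y2 - y1
    steps = max(abs(dx), abs(dy), 1)
    near_horizontal = abs(dx) >= abs(dy)
    half = w // 2
    for i in range(steps + 1):
        x = round(x1 + i * dx / steps)
        y = round(y1 + i * dy / steps)
        for k in range(-half, w - half):
            if near_horizontal:
                pixels.add((x, y + k))   # column of w pixels
            else:
                pixels.add((x + k, y))   # row of w pixels
    return pixels
```

Note how the parallelogram shape arises: the replicated column is always vertical, regardless of the line's slope.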
Figure 3.33 illustrates several variations that may be ideal in some applications (but may look bad in others). On the left, we see three types of caps (or line ends), butt,
Figure 3.32: Thick Lines.
round, and projecting square. These are followed by four types of joins. Line segments with butt ends do not join well, so drawing and illustration software often offer joining options such as round caps, mitered join, and trimmed (or beveled) miter.
Figure 3.33: Thick Lines Caps and Joins.
How can we draw thick straight segments with uniform thickness regardless of their slopes? Given a segment between (x1, y1) and (x2, y2), its slope a is (y2 − y1)/(x2 − x1). The negate and exchange rule (Page 207) tells us that rotating the segment by 90° changes its slope to b = −1/a (Figure 3.34a). Thus, when the original segment is scan converted, we need to replicate each pixel in the direction given by b. Let's concentrate on one pixel (p, q) selected by whatever scan-conversion algorithm we use. If the line segment is close to horizontal (i.e., if the slope satisfies −1 < a ≤ 1), then it will consist of short horizontal strings of pixels (the black pixels in Figure 3.34b), so thickening the line by extending pixels should be done in the general vertical direction (the white pixels in the figure).
Figure 3.34: Fattening a Thin Line Segment.
The pixels added around (p, q) should have y coordinates of q + 1, q − 1, q + 2, q − 2, and so on. The first of those will therefore have coordinates (x, q + 1), where x has to be determined. The slope from (p, q) to (x, q + 1) is [(q + 1) − q]/(x − p), which implies x = p − a. The difference p − a is rounded off and the pixel is drawn. Proceeding in the same way, it is easy to verify that the x coordinate of the next pixel (with y = q + 2) is p − 2a. Similarly, the x coordinates of the pixels with y coordinates q − 1, q − 2,. . . are the rounded values of p + a, p + 2a, and so on. The point is that once these coordinates have been determined for one pixel (p, q), there is no need to repeat the computations for other pixels of the segment. Once the software has determined the positions of the white pixels around (p, q), it can paint new pixels (the white empty pixels in the figure) in the same relative positions around any of the black pixels of Figure 3.34b.
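The offsets derived above can be precomputed once per segment and then reused for every pixel. A small Python sketch (helper name hypothetical):

```python
def widen_offsets(a, w):
    """For a near-horizontal segment of slope a, return the (dx, dy)
    offsets of the extra pixels that thicken one selected pixel to
    width w. As derived in the text, the pixel k rows above sits at
    x offset round(-k*a), and the pixel k rows below at round(+k*a).
    Computed once, the same offsets apply to every pixel of the
    segment."""
    offsets = []
    for k in range(1, w // 2 + 1):
        offsets.append((round(-k * a), +k))   # pixels above
        offsets.append((round(+k * a), -k))   # pixels below
    return offsets
```

To fatten a segment, add each offset to every black pixel produced by the scan-conversion algorithm.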
3.12 General Considerations When a new scan-converting algorithm is developed or when an existing algorithm is selected for practical use, the following three points should be considered: 1. Is the algorithm symmetric? A symmetric scan-conversion algorithm results in the same set of pixels when run from P1 to P2 as when run from P2 to P1 . Most scan-conversion algorithms are not symmetric, which may result in issues when paths are being drawn and erased. In such a case, it is important not only to use the same algorithm to draw and to erase a path but also to remember which endpoint is P1 and which is P2 . Exercise 3.17: Is the quadrantal DDA method symmetric? 2. Does the algorithm compute any pixels more than once? This may not be important in current computers, where the bitmap is located in memory and the output device is a display monitor. It was, however, important in some old types of displays, such as the (long obsolete) Tektronix storage tubes, where setting a pixel multiple times made it brighter. Exercise 3.18: Does the symmetrical DDA method satisfy point 2? 3. Are points generated in order of minimal adjacent distance? This is important when the output device is a pen plotter. In such a device, it is essential to minimize pen travel in order to save time and to end up with high-quality output. An algorithm that sends the pen all over the place would not only be very slow on a plotter, but may result in poor plotting quality. This feature of the algorithm may even be significant when the bitmap is stored in RAM, since parts of the RAM may be loaded into a cache memory, and setting pixels all over the bitmap may cause unnecessary loading of memory pages into cache. Exercise 3.19: The Bresenham–Michener circle method (Section 3.8.3) calculates pixels in the second octant and uses each pixel to calculate and plot seven more pixels. Suggest a way to modify it to satisfy point 3.
3.13 Antialiasing All scan-converting methods result in jagged lines. It turns out that the human eye is very sensitive to jagged edges. Reducing the pixel size improves the appearance of jagged edges, but the problem still persists. Better-looking lines and curves can be generated with an antialiasing method. All the antialiasing methods are based on the observation that a pixel, rather than being a mathematical, dimensionless dot, occupies a small area on the screen. Similarly, a line displayed on the screen or printed on paper has some finite width and is not infinitely thin. For a given straight line, even when the best pixels are selected, only some will be centered on the ideal line, and the rest will be off to some extent. The amount by which a pixel is off the line is the distance between its center and the ideal line. This suggests an intuitive method for antialiasing: paint each pixel a shade of gray (or some other color) inversely proportional to its distance from the line. Several examples are shown in Figure 3.35. Especially interesting is example (a) at the bottom-left corner of the figure, where the shades of gray are listed inside each pixel. It is immediately clear that the result (shown in real size in the inset) is very unsatisfactory. It consists of dark and bright areas and looks worse than an aliased line. It is obvious that a good antialiasing method should generate lines where each region has the same average brightness. Figure 3.35b (the three lines at the bottom-right corner) shows how this can be achieved. The top line consists of several nonoverlapping horizontal segments, similar to those generated by octantal DDA (Section 3.3.5), and is aliased; its smaller version is jagged. The middle line illustrates an attempt to blend the individual segments. Each segment receives additional pixels preceding and following it. The pixels preceding the segment become darker, while those following the segment become progressively brighter.
The resulting line has two pixels for each x position. The small version looks good, since the eye is directed gradually from one segment to the next. The bottom line is similar but looks even better because the original black segments have been changed to gray. The principle behind such a line is that the two pixels at each x position should have intensities that add up to 100% and are also inversely proportional to their distances from the ideal line. Jaggies: /jag’eez/ n. The “stairstep” effect observable when an edge (especially a linear edge of very shallow or steep slope) is rendered on a pixel device (as opposed to a vector display). —Eric Raymond, The Hacker’s Dictionary.
It should be noted that antialiasing is the opposite of dithering (Section 2.28). Antialiasing adds gray pixels to a black and white image in order to improve its appearance on a grayscale output device, while dithering converts a grayscale image to a black and white image, in order to obtain better output on a monochromatic output device. This section uses the term “grayscale” but the methods described here apply to shades of any color, not just gray. It is possible to antialias a red and white image by using shades of red instead of just maximum red. Reference [Chang 10] employs animation and shades of green to illustrate antialiasing in color.
Figure 3.35: Antialiasing and Gray Pixels.
Another way of looking at antialiasing is as a way to increase the effective resolution of the output device. There are two main approaches to antialiasing. One approach is to extend an existing scan-conversion algorithm so it can compute the correct grayscales for the pixels while the image is being generated. This approach is illustrated in Sections 3.13.1 through 3.13.3. The other approach uses unmodified algorithms to create an entire
aliased image, then antialiases it by scanning the bitmap and performing certain operations on the values stored there. This approach is illustrated in Sections 3.13.4 and 3.13.5.
3.13.1 The Wu Line Antialiasing Method We start with simple DDA, a method described in Section 3.3.1, and extend it to generate an antialiased line. The discussion here is limited to lines that are close to horizontal (with slopes between zero and 45°), but the generalization to other slopes is straightforward. Given the two points (x1, y1) and (x2, y2), simple DDA starts by calculating the slope a of the line segment between them. If a is in the interval [0, 1], the method sets x to x1 and y to y1, then goes into a loop where in each iteration it plots a pixel at (x, y), increments x by 1 and increments y by a to stay on the line. Since y is a noninteger, its rounded value should be used when plotting a pixel. This process is summarized by the pseudo-code of Figure 3.36a and it results in an aliased line. (Notice that the code in the figure uses truncation instead of rounding. This is done to accommodate the simple DDA to Wu's method.)

var x,x1,x2: integer; a,y: real;
a:=(y2-y1)/(x2-x1);
x:=x1; y:=y1;
repeat
  pixel(x,trunc(y));
  x:=x+1; y:=y+a;
until x>x2;
(a)

var x,x1,x2,numgray: integer; a,b,y: real;
calculate a and b;
x:=x1; y:=y1;
repeat
  d:=y-trunc(y);
  gray1:=(1-d)*numgray; gray2:=d*numgray;
  pixel(x,trunc(y),gray1);
  pixel(x,trunc(y)+1,gray2);
  x:=x+1; y:=y+a;
until x>x2;
(b)

Figure 3.36: Simple DDA (a) Aliased, (b) Antialiased.
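For concreteness, here is a direct Python transcription of the antialiased variant of Figure 3.36b. The pixel() procedure of the pseudo-code becomes a dictionary store, and numgray is assumed to be the largest grayscale value (an assumption of this sketch):

```python
import math

def dda_antialiased(x1, y1, x2, y2, numgray=255):
    """Simple-DDA line with two-pixel antialiasing. Returns a dict
    mapping (x, y) to a grayscale in 0..numgray. Assumes a slope
    in [0, 1), i.e., a line close to horizontal."""
    a = (y2 - y1) / (x2 - x1)
    img = {}
    x, y = x1, float(y1)
    while x <= x2:
        d = y - math.floor(y)                  # distance from the pixel row
        img[(x, math.floor(y))] = round((1 - d) * numgray)    # main pixel
        img[(x, math.floor(y) + 1)] = round(d * numgray)      # companion
        x += 1
        y += a
    return img
```

Note that the two intensities at each x position are complementary (up to rounding), which is exactly the property that makes the pair look like a single pixel on the ideal line.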
Figure 3.37: Distances Between Line and Pixels.
Figure 3.37 shows how antialiasing is added to this simple method (the numbers inside the pixels are grayscales in percentages). In each iteration, the distance d between the center of the selected pixel P = (x, trunc(y)) and the line y = ax+b is measured and a grayscale inversely proportional to d is selected for the pixel. (Since d is in the range [0, 1], that grayscale should be proportional to 1 − d.) In addition, the pixel immediately
above P (or, if the slope is negative, the pixel below P) is also plotted, with a grayscale inversely proportional to its distance c from the line (or, similarly, proportional to 1 − c, but 1 − c = d, since the distance between the centers of the two pixels is one unit). This pixel is the companion of P. Because of the nature of human vision, the effect of plotting two pixels with complementary intensities (i.e., intensities that add up to 1) at each x position creates the impression of a single, full-intensity pixel, positioned precisely on the line. A simple analogy is two masses positioned at different points. Their combined effect is identical to that of a single mass, positioned at the center of gravity of the two original masses. We assume 2ⁿ grayscales, ranging from white (0) to black (2ⁿ − 1, numgray in Figure 3.36b). Under this assumption, the grayscale for the main pixel should be (1 − d)(2ⁿ − 1) and the intensity of its companion pixel should be d·(2ⁿ − 1). Figure 3.36b is pseudo-code for this extended method. Even though this method is easy to understand and produces lines that are pleasing to the eye, it is inefficient. The value of y is real and has to be rounded off before a pixel can be plotted. Also, the calculation of the grayscale value for each of the two pixels involves a multiplication followed by rounding the product to the nearest integer (because d is real). The rest of this section shows how this simple algorithm can be redesigned and made efficient. The ideas presented here are due to Xiaolin Wu [Wu 91] and are also discussed (with C code included) in [Abrash 92]. An alternative C code is available at [Elias 01]. The slope of a line is a real number and real numbers are normally represented in the computer in floating-point. Floating-point operations are slow and should be avoided, but the slope is definitely not an integer.
It turns out that in addition to integer and floating-point there is another way to represent numbers in the computer, namely fixed-point. When an integer is stored in a register or in memory, we implicitly assume that there is a binary point to the right of the number. In a fixed-point representation, the binary point can be located at any position. The binary point itself is not stored in the computer. A fixed-point number consists of the magnitude bits (and perhaps a sign bit), and the routines that operate on these numbers are told where the point is assumed to be. In our application, the slope is a fraction in the range (0, 1), so it can be stored as a fixed-point number where the binary point is assumed to be to the left of the number. We can call such a representation a 0.16 fixed-point scheme (a 16-bit number where the point is at position 0). Before discussing any details, two observations that justify the use of this scheme should be mentioned. (1) Horizontal, vertical, and diagonal (45°) lines do not require antialiasing because they pass through the center of all their pixels, so they are better dealt with separately. This is why we can assume that the slope is in the open interval (0, 1). (2) For lines close to vertical, the slope a is greater than 1, but Section 3.3.1 shows that for these lines we need the quantity 1/a, and this number is again in the interval (0, 1). When the 16-bit unsigned number 110...0 is interpreted as an integer, it is a large number; it equals 2¹⁴ + 2¹⁵ = 49,152. When the same bit pattern is interpreted as a 0.16 fixed-point number, it equals 0.11₂ = 2⁻¹ + 2⁻² = 0.75. Similarly, the 0.16 fixed-point number 1010...0 equals 0.101₂ = 2⁻¹ + 2⁻³ = 0.625. To avoid the use of floating-point numbers, the slope is computed as a 0.16 fixed-point number in a variable v, and another
variable V (which starts from zero) is incremented by v in each iteration of the loop. We use the line segment of Figure 3.37a as an example. This segment goes up one unit in the y direction for every eight units in the x direction, so its slope is 1/8 = 0.001₂. When stored as a 0.16 fixed-point number in v, the slope becomes 0010...0. When v is repeatedly added to V, the latter variable goes through the values

00000...0, 00100...0, 01000...0, 01100...0, 10000...0, 10100...0, 11000...0, 11100...0,   (3.8)
and then wraps back to 00000...0. This is an indication to the algorithm to increment y. In pseudo-code, this part is

VT:=V; V:=V+v;
if V < VT then y:=y+1;

The main point is that incrementing V by v is done by integer addition, as if the two variables were integers. This is how floating-point numbers and operations are avoided. The next point is the computation of the grayscale values for the two pixels in each iteration. This process is also greatly simplified because of the way variable V is maintained. This variable indicates how far the current pixel is from the ideal line. Since V is represented in fixed-point, its significant bits are on the left, so the leftmost n bits of V can serve as an indication of the grayscale needed for the current pixel. A glance at Equation (3.8) shows a simple relation between the successive values of V and the grayscales of Figure 3.37a. It turns out that the n most significant bits of V are the correct grayscale for the companion pixel. Since the grayscales for the two pixels are complementary (add up to 1), the grayscale for the main pixel is given by the complement of those n bits (one's complement, since these bits represent an unsigned number). As an example, consider the case of 32 = 2⁵ grayscales. In this case, n = 5 and the five leftmost bits of V get the values

00000 = 0, 00100 = 4, 01000 = 8, 01100 = 12, 10000 = 16, 10100 = 20, 11000 = 24, and 11100 = 28,   (3.9)
before they reset to 00000. When computed as percentages of 31 (the largest 5-bit integer), these numbers become 0, 13, 26, 39, 52, 65, 77, and 90 %, respectively. These are the values shown in the figure for the grayscales of the companion pixels. The complements of the values in Equation (3.9) are 11111 = 31, 11011 = 27, 10111 = 23, 10011 = 19, 01111 = 15, 01011 = 11, 00111 = 7, and 00011 = 3. When computed as percentages of 31, these numbers become 100, 87, 74, 61, 48, 35, 23, and 10 %. These are the grayscales shown in the figure for the main pixels.
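These fixed-point manipulations are easy to experiment with. The Python sketch below (function names hypothetical) first shows the 0.16 interpretation of a bit pattern, then reproduces the shift-and-xor grayscale computation for the slope-1/8 line, yielding the same percentages as above:

```python
def as_016_fraction(bits16):
    """Interpret a 16-bit unsigned integer as a 0.16 fixed-point
    fraction: the binary point is assumed to sit left of the MSB."""
    return bits16 / 65536.0          # i.e., bits16 * 2**-16

def wu_grays(slope_num, slope_den, n=5, steps=8):
    """Accumulate the 0.16 fixed-point slope in V and read off the
    n-bit grayscales via shift and xor. Returns one (main, companion)
    pair per step, as percentages of the largest n-bit value."""
    v = (slope_num << 16) // slope_den    # slope as 0.16 fixed point
    maxg = (1 << n) - 1                   # 31 for n = 5
    shft = 16 - n
    V, out = 0, []
    for _ in range(steps):
        companion = V >> shft             # top n bits of V
        main = companion ^ maxg           # one's complement
        out.append((round(100 * main / maxg), round(100 * companion / maxg)))
        V = (V + v) & 0xFFFF              # 16-bit integer addition
    return out
```

The 110...0 pattern of the text evaluates to 49,152 as an integer but 0.75 as a 0.16 fraction, and the loop reproduces the grayscale sequences 100, 87, 74, ... and 0, 13, 26, ... of Figure 3.37a.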
The pseudo-code below summarizes the main steps in computing the two grayscales. The quantity shft is set to 16 (the size of V) minus 5 (the value of n). Thus, V is shifted to the right 11 positions, which brings its five most significant bits to the right end. The constant 31 is simply five bits of 1. When used as the second operand of an xor, this constant complements the bits of the first operand. Thus, calculating the two grayscales involves a shift followed by an xor, two simple (i.e., fast) operations.

shft:=16-5;
companion:=V>>shft;
main:=companion xor 31;
pixel(x,y,main); pixel(x,y+1,companion);

Because of the two features above (the use of fixed-point arithmetic and the way pixel intensities are computed) this algorithm is extremely fast, in addition to producing excellent results. Another way to speed it up is to use the natural symmetry of a line segment (point 1 on Page 145). The line can be drawn from both ends toward its middle, plotting four pixels, two main and two companions, in each iteration. The main steps are listed in Figure 3.38, where it is assumed that x1 < x2 and y1 < y2. After plotting four pixels, two at each end of the line, the algorithm brings x1 and x2 closer. If a decision is made to increment the y coordinate, then y1 is incremented while y2 is decremented, thereby creating a line segment symmetric about its center.

var x1,y1,x2,y2,maxgray,shft,V,v,VT,main,companion: integer;
shft:=16-5; V:=0;
v:=the slope as a 0.16 fixed-point number;
repeat
  companion:=V>>shft;
  main:=companion xor 31;
  pixel(x1,y1,main); pixel(x1,y1+1,companion);
  pixel(x2,y2,main); pixel(x2,y2+1,companion);
  VT:=V; V:=V+v;
  if V < VT then begin y1:=y1+1; y2:=y2-1 end;
  x1:=x1+1; x2:=x2-1;
until x1>x2;

Figure 3.38: Symmetric Wu Algorithm.
3.13.2 The Wu Circle Antialiasing Method The Wu line antialiasing method of Section 3.13.1 can be modified to generate antialiased circles. The resulting algorithm, described in this section, is simple to implement, is fast, and produces circles that look pleasing to the eye. However, since a circle is geometrically
more complex than a straight line segment, the Wu circle method [Wu 91] is not as elegant as his line method and uses precomputed data from a built-in table. The algorithm determines the best pixels for the first octant of the circle (Figure 3.39d) by looping from point (r, 0) to the end of the octant, where x = y. The other seven octants are done by symmetry. Since the first octant is closer to vertical than to horizontal, the y coordinate is incremented in each iteration, and two pixels are plotted side by side. Instead of naming them main and companion, it is more natural to denote them by c and f (for ceiling and floor, respectively). When the octant-1 pixels are copied to the second octant, the two pixels of each iteration are positioned vertically, as shown in Figure 3.39a.

Figure 3.39: Wu Circle Antialiasing.
We denote the (x, y) coordinates of the current pixel by i (initialized to r) and j (initialized to 0), respectively. The y coordinate j is incremented in each iteration and the x coordinate should be decremented from time to time until i ≤ j. The equation of a circle of radius r, centered on the origin is x2 + y 2 = r2 . For the first octant, this
implies x = √(r² − j²), and since i must be an integer, it is obvious that the two best pixels in each iteration have i coordinates

i_f = ⌊√(r² − j²)⌋ and i_c = ⌈√(r² − j²)⌉ = i_f + 1

(these are the reasons for the terms "floor" and "ceiling"). The first task of the algorithm is to decide when to decrement i. Figure 3.39b shows that i should be decremented for those j values that satisfy the condition

⌈√(r² − (j − 1)²)⌉ = ⌈√(r² − j²)⌉ + 1.   (3.10)

(In the figure, when j = 6, the difference √(r² − 5²) − √(r² − 6²) is much less than 1, but when j = 52, the difference is 1.) It can be shown that the condition of Equation (3.10) is equivalent to

⌈√(r² − (j − 1)²)⌉ − √(r² − (j − 1)²) > ⌈√(r² − j²)⌉ − √(r² − j²),

and this condition, in turn, is equivalent to D(r, j − 1) > D(r, j), where D(r, j) is defined in Equation (3.11). The quantities D(r, j) are precomputed for many r values and are stored in a table. They are used by the algorithm to determine when to decrement i. Thus, the Wu circle algorithm is table driven, but has the advantage that the same table is also used by the algorithm for its second task, namely determining the grayscales of the two pixels in each iteration. This task is discussed now. The grayscales of the two pixels c and f should depend on their distances from the ideal circle. Figure 3.39c illustrates how the grayscales are computed. It shows that the horizontal distance d of the pixel at (i_c, j) from the true circle is

d = i_c − x = ⌈√(r² − j²)⌉ − √(r² − j²).

The grayscale value of pixel c should be inversely proportional to d or, equivalently, proportional to 1 − d. Assuming 2ⁿ grayscales as in Section 3.13.1, the correct intensity for pixel c is therefore (1 − d)(2ⁿ − 1) and that of pixel f is the complementary value d·(2ⁿ − 1). These two values should be rounded to the nearest integer. The rounded values of d·(2ⁿ − 1) are calculated by adding 0.5 and truncating. Thus,

D(r, j) = ⌊(2ⁿ − 1)(⌈√(r² − j²)⌉ − √(r² − j²)) + 0.5⌋.   (3.11)
These values are precomputed and are stored in a table D. Since they depend on the radius r, the table should contain intensity data for many values of r. (For other values of r, the algorithm has to calculate the D values, which slows it down considerably.) However, the number of values needed for a given r is approximately r sin 45° = (√2/2)r ≈ 0.7r (this is the y coordinate of the last point of the first octant). The number of values stored in the table for r values from 1 to R is therefore

(√2/2)(1 + 2 + ··· + R) = (√2/2)(R + 1)R/2 ≈ 0.35R²,
and each is an n-bit number. For R = 430 and n = 8 (256 grayscales), the table size is 64,715 bytes. This is less than 64K bytes, but the table has variable-size rows and therefore cannot be implemented as a two-dimensional array. A simple implementation of this table is illustrated in Figure 3.41a. Table D is a one-dimensional array where the data for each r is stored in a segment. The segments get longer with r, but they contain just the data. An auxiliary array P contains pointers to the start of each segment in D. When the algorithm starts, it knows the radius r and retrieves a pointer p from P(r). Inside the loop, p is incremented in each iteration and the data is loaded from D(p). The algorithm is illustrated by the flowchart of Figure 3.41b. It is obvious that this method is simpler than the Bresenham–Michener circle method of Section 3.8.3. Notice also that the first pixel is plotted outside the loop. This pixel has full intensity, and so has no companion.
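Building the table D and the pointer array P can be sketched in Python as follows. This follows Equation (3.11) with i_c = i_f + 1, so a grid point lying exactly on the circle gets d = 1 (an edge case this sketch does not treat specially):

```python
import math

def build_D_table(R, n=8):
    """Precompute the Wu circle intensity table (Equation (3.11)) for
    radii 1..R as one flat list D plus a pointer array P; the data for
    radius r occupies D[P[r]] onward, one entry per first-octant
    row j (about 0.7r entries per radius)."""
    D, P = [], [0] * (R + 1)
    for r in range(1, R + 1):
        P[r] = len(D)
        j = 0
        while j <= r * math.sqrt(0.5):    # last octant-1 row: j = r sin 45 deg
            s = math.sqrt(r * r - j * j)  # exact x coordinate
            d = (math.floor(s) + 1) - s   # distance of the ceiling pixel
            D.append(int(d * (2**n - 1) + 0.5))
            j += 1
    return D, P
```

In a real implementation, D and P would be compiled in as constant data; the drawing loop simply reads D[P[r] + j] for its intensities and for the decrement-i decision.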
3.13.3 Pitteway–Watkinson Algorithm The Pitteway–Watkinson method is based on the first approach. It has been developed to antialias the edges of a filled polygon. It provides a simple way of computing the area of a pixel that’s inside the polygon. Figure 3.40a shows a polygon edge with a slope a = 3/10. A careful look shows that as we move from pixel 1 to 2 to 3, the areas under the edge grow (the small areas marked “shade” should be considered parts of pixels 3, 6, and 9). When moving diagonally, from 3 to 4, or from 6 to 7, the area shrinks. (We assume that each pixel covers an area of size 1×1 and that the first pixel is half covered.) The rule at the heart of this algorithm is: When moving horizontally, the pixel area occupied by the polygon (i.e., that part of the pixel that’s inside the polygon) grows by the slope a. When moving diagonally, the area shrinks by 1 − a. Figure 3.40b shows how the area added to a pixel can be divided into two triangles, each with a base of size 1 and a height of size a.
Figure 3.40: The Pitteway–Watkinson Algorithm.
Assuming that the hardware supports intensity levels 0 through I, we define a quantity i as the integer nearest a × I and either increment the intensity by i, or decrement it by 1 − i, as we move from pixel to pixel along the edge of the filled polygon.
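The area-tracking rule can be sketched as follows. This Python fragment is a toy model: it takes a diagonal step whenever the accumulated coverage would exceed 1, whereas a real implementation would be driven by the moves of the line-drawing algorithm itself:

```python
def edge_coverages(a, steps):
    """Track the covered pixel area along a polygon edge of slope a
    (first octant) with the Pitteway-Watkinson rule: start at 1/2,
    add a on a horizontal move, and on a diagonal move grow by a
    but shrink by 1, i.e., subtract 1 - a overall."""
    area = 0.5                  # the first pixel is half covered
    out = [area]
    for _ in range(steps):
        if area + a > 1.0:
            area += a - 1.0     # diagonal move: net change is -(1 - a)
        else:
            area += a           # horizontal move: grow by the slope
        out.append(area)
    return out
```

Scaling each coverage by the hardware intensity range I and rounding gives the per-pixel intensity increments described above.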
Figure 3.41: Wu Circle Method (a) Table, (b) Flowchart.
The algorithm as described above works only for polygon edges with slopes in the first octant (0 ≤ a ≤ 1, i.e., angles between 0° and 45°), but can easily be extended to slopes in other octants. The second approach to antialiasing proceeds as follows: (1) scan-conversion algorithms are used to create all the objects in the image (lines, circles, filled polygons, etc.) in a buffer in memory and (2) an algorithm scans all the pixels in the buffer and creates an antialiased image in the bitmap by changing intensities or adding/deleting pixels. The two similar methods of supersampling and filtering use this approach and are described in Sections 3.13.4 and 3.13.5, respectively.
3.13.4 Supersampling This method is based on the observation that aliasing (i.e., the problem of jagged edges)
decreases with higher resolution. When supersampling is used, the scan-converting algorithm is executed for a resolution higher than that of the hardware. If the hardware resolution is, say, 256×256 with 16 colors (i.e., four bitplanes), then the original (black and white) algorithm is executed in a buffer that's four times wider and taller (i.e., of size 1K×1K). Each pixel in the bitmap now corresponds to 16 black and white pixels in the buffer and its (antialiased) value is set to be proportional to the number of black pixels in this group of 16. The algorithm scans the buffer group by group. For each group, the number of black pixels is computed. It can be between 0 and 16, but the pixel in the bitmap can only take the values 0 through 15. We therefore have to set the bitmap pixel to a value slightly lower than the sum, for example,

round( (15/16) × (number of black pixels) ).

Hence, a group with 15 black pixels (the dashed box in Figure 3.43a) yields intensity 14, but in a group with 4 black pixels, the rounding results in intensity 4. If the hardware supports many colors and high resolution, the algorithm has to use a large buffer and it becomes computationally intensive. With a bitmap resolution of 1K×1K and 8-bit pixels, the buffer must have a size of 8K×8K = 64M bits and it is divided into 1M (about a million) groups of 64 bits each. This is an example of a graphics algorithm that can benefit from parallel execution.

The Image is more than an idea. It is a vortex or cluster of fused ideas and is endowed with energy. —Ezra Pound.
Figure 3.42: Supersampling.
Figure 3.42 is an example of a thick line antialiased by this method. The result (shown in the inset) is unsatisfactory. It is easy to see that supersampling may do a better job with the jagged edges of filled polygons (as in Figure 3.43a), but is not well suited to thin lines. Figure 3.43b shows an example of such a line. Each group sums to 4, so each results in a pixel of intensity 4. The final line will therefore still be aliased and will also be dim.
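The group reduction at the heart of supersampling can be sketched in Python (a toy downsampler; the buffer is assumed to be a list of rows of 0/1 ints whose dimensions are multiples of the factor):

```python
def downsample(buffer, factor=4, levels=16):
    """Reduce a black-and-white supersampled buffer to grayscale:
    each factor x factor group of bits becomes one bitmap pixel,
    scaled slightly down so the result fits in 0..levels-1."""
    H, W = len(buffer), len(buffer[0])
    out = []
    for by in range(0, H, factor):
        row = []
        for bx in range(0, W, factor):
            black = sum(buffer[by + i][bx + j]
                        for i in range(factor) for j in range(factor))
            # e.g., 15 of 16 black -> round(15/16 * 15) = 14
            row.append(round((levels - 1) / factor**2 * black))
        out.append(row)
    return out
```

Running this on a thin one-pixel diagonal confirms the text's observation: every group sums to the same small count, so the result stays jagged and dim.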
3 Scan Conversion
Figure 3.43: Supersampling.
3.13.5 Filtering The principle of filtering is averaging. The original scan-conversion algorithms are applied to generate all the objects in the hardware resolution, in either black and white or shades of gray. The pixels are stored in a buffer. The algorithm then scans the buffer in small overlapping groups, each centered on a pixel (Figure 3.44a). The pixel values in each group are averaged and a value is assigned to the pixel in the bitmap, depending on the average. Different weights are assigned to the pixels in the group, depending on their distances from the center. Figure 3.44b shows typical weights for a 3 × 3 group. Note that the weights add up to 1.
1/36  1/9  1/36
1/9   4/9  1/9
1/36  1/9  1/36

Figure 3.44: Filtering. (a) Overlapping groups centered on pixels. (b) Typical weights for a 3 × 3 group.
A pixel at position (i, j) in the bitmap is now assigned the weighted average of itself and its eight nearest neighbors as follows:

P(i, j) = (1/36)I(i−1, j+1) + (1/9)I(i, j+1) + (1/36)I(i+1, j+1)
        + (1/9)I(i−1, j)    + (4/9)I(i, j)   + (1/9)I(i+1, j)
        + (1/36)I(i−1, j−1) + (1/9)I(i, j−1) + (1/36)I(i+1, j−1).
Note that special treatment should be given to the pixels at the boundary of the bitmap, since they have fewer than eight neighbors. Alternatively, two more rows (one at the
top and one at the bottom) can be added to the bitmap, together with two additional columns. They can be set to zero (or to other values, if special effects are needed), so that every pixel will have eight neighbors. In the case of black and white pixels, this weighted sum has a value in the range [0, 1] and has to be scaled to whatever range [0, n] is supported by the hardware. In the case of multilevel pixels, a different scaling is necessary. For high-resolution hardware, larger filter groups, such as 5 × 5 or 7 × 7, produce better results. It is interesting to observe that supersampling is a special case of filtering. The 4 × 4 supersampling example described earlier corresponds to a 4 × 4 filtering matrix where all the weights equal 1/16.
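The filtering step, including the zero border suggested above, can be sketched as follows. This is an illustrative Python sketch (the function and variable names are assumptions); the weights are those of Figure 3.44b.

```python
# 3x3 weighted-average filtering with a zero border, so that every
# pixel, including boundary pixels, has eight neighbors.

W = [[1/36, 1/9, 1/36],
     [1/9,  4/9, 1/9 ],
     [1/36, 1/9, 1/36]]   # the weights add up to 1

def filter3x3(image):
    h, w = len(image), len(image[0])
    # pad the image with zero rows and columns
    padded = [[0.0] * (w + 2)]
    for row in image:
        padded.append([0.0] + list(row) + [0.0])
    padded.append([0.0] * (w + 2))
    # each output pixel is the weighted sum over its 3x3 neighborhood
    return [[sum(W[a][b] * padded[i + a][j + b]
                 for a in range(3) for b in range(3))
             for j in range(w)] for i in range(h)]

# Because the weights sum to 1, a uniform interior is unchanged.
flat = [[1.0] * 5 for _ in range(5)]
print(filter3x3(flat)[2][2])   # close to 1.0
```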
3.14 Convolution

Both supersampling and filtering are special cases of the general technique of convolution. Many practical image processing problems start with two functions that have to be combined. Normally, one function represents some kind of a signal (for example, a voltage that represents color, or discrete numbers representing pixel intensities) and the other one represents a weight. The signal has to be combined with the weight over a certain range. For continuous functions f(x) and g(x), the convolution is defined as

C(x) = ∫_{−∞}^{+∞} f(t) g(x − t) dt.
For the discrete functions used in computer graphics, the convolution is defined as

C(i, j) = Σ_{x=0}^{xmax−1} Σ_{y=0}^{ymax−1} I(x, y) f(i − x, j − y).
In our case, I(x, y) is the intensity of the pixel at position (x, y), and f(u, w) is the filter. The 3 × 3 filter of Figure 3.44b can be written as a convolution if it is defined as

f(u, w) = 4/9,   if u = w = 0,
          1/9,   if u = 0 and |w| = 1,
          1/9,   if w = 0 and |u| = 1,
          1/36,  if |u| = |w| = 1,
          0,     if |u| > 1 or |w| > 1.
The last line in the definition takes care of cases where either i − x or j − y is negative.

Passing through to Fifth Avenue Rodney began to scan the numbers on the nearest houses.
—Horatio Alger, Jr., Cast Upon the Breakers.
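The discrete convolution and the piecewise filter f(u, w) defined in this section can be sketched directly. This is an illustrative Python sketch; the names are assumptions, and image borders are treated as zero.

```python
# The 3x3 filter of Figure 3.44b written as a discrete convolution,
# following the piecewise definition of f(u, w).

def f(u, w):
    if u == 0 and w == 0:
        return 4/9
    if (u == 0 and abs(w) == 1) or (w == 0 and abs(u) == 1):
        return 1/9
    if abs(u) == 1 and abs(w) == 1:
        return 1/36
    return 0.0      # |u| > 1 or |w| > 1, incl. negative i-x or j-y

def convolve(I, i, j):
    """C(i, j) = sum over all pixels of I(x, y) * f(i - x, j - y)."""
    xmax, ymax = len(I), len(I[0])
    return sum(I[x][y] * f(i - x, j - y)
               for x in range(xmax) for y in range(ymax))

image = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # single bright pixel
print(convolve(image, 1, 1))                # the center keeps weight 4/9
```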
Plate B.1. Free-Form Deformation (ArtText).
Plate B.2. Particle Systems: (a) small number, (b) large number, (c) large particles, static, (d) short life, (e) long life, (f) large particles, moving emitter (ParticleIllusion).
Plate B.3. A Transparent Hemisphere (MegaPOV).
Plate B.4. A Surface of Revolution (SurfaceExplorer).
Plate B.5. Fractals Generated by Mobius Transformations (Cinderella).
Plate C.1. Ray Tracing and Fractal Terrain (Terragen).
Plate C.2. Magritte or Maigret? (ImageFramer).
Plate C.3. Complex Knots (KnotMaker).
Plate D.1. An Anaglyph Painted on a Vancouver Street. In September 2010, the Canadian organization www.preventable.ca prepared an anaglyph of a little girl playing. The image was painted on a street in West Vancouver in an attempt to slow down drivers. To an approaching driver, the painting first appears as a shapeless blob, then as a real, three-dimensional figure of a girl playing in the street, and finally as the optical illusion it actually is. This original approach to preventing accidents immediately created a controversy; some people claim that such an attempt may actually increase the number of accidents, because the image confuses drivers, who become disoriented and may slow down or even stop unnecessarily.
Plastic, Rock, Old Stone, Bronze, Brushed Aluminum, Gold.
Plate D.2. A Head in Various Materials, Textures, and Illuminations (Modo).
Part II
Transformations and Projections

The three main topics of Part II of this book are transformations, projections, and perspective, and it is interesting to note that these terms are ambiguous. Here is what the dictionary has to say about them.

Transformation
(a) The act or an instance of transforming. (b) The state of being transformed. A marked change, as in appearance or character, usually for the better.
Mathematical transformation. (a) Replacing a variable in an expression by its value. (b) Mapping a mathematical space onto another or onto itself.
In geometry. Moving, rotating, reflecting, or otherwise systematically deforming a geometric figure (discussed in this book).
In linguistics. (a) A rule to convert a syntactic form into another. (b) A sentence or sentential form derived by such a rule; a transform.
In genetics. (a) The change undergone by a cell upon infection by a cancer-causing virus. (b) The alteration of a bacterial cell caused by the transfer of DNA from another bacterial cell, especially a pathogen.

Projection
The act of projecting or the condition of being projected.
(a) An object or part thereof that extends outward. (b) Spiky projections on top of a fence. (c) A projection of land along the coast.
A prediction or an estimate of a future situation, based on current data or trends.
(a) The process of projecting a recorded image onto a viewing surface. (b) An image so projected.
In mathematics. The image of an n-dimensional geometric figure reproduced in n − 1 or fewer dimensions. The most common case is n = 3 (which is discussed in this book).
In psychology. The attribution of one's own beliefs or suppositions to others (such as when a scientist projects his beliefs into the subjects of his research or into theories he develops).

Perspective
(a) A view or scene. (b) A mental view or outlook.
The appearance of objects in depth as perceived by normal binocular vision.
(a) The relationship of aspects of a subject to each other and to a whole: "let's put this into perspective." (b) Subjective evaluation of relative significance: "in my perspective as an electrician, this wire is defective." (c) The ability to perceive things in their actual interrelations or comparative importance: "in perspective, this flood is minor."
The technique of representing three-dimensional objects and depth relationships on a two-dimensional surface (discussed in this book).
(Adjective) Of, relating to, seen, or represented in perspective.

This is why writing is such a liberating thing. You get to know what you didn't know you knew.
—Richard Lederer.

The term "transformation" as discussed in this book refers to a geometric operation applied to all the points of an object. An object may be moved, rotated, or scaled (shrunk or stretched). It may be reflected about a plane (as in a mirror) or deformed in some way, as illustrated by Figures 4.1 and 4.3. Several transformations may be combined and may completely modify the position, orientation, and shape of the object. Many graphics operations are greatly simplified with the help of transformations. A forest can be created from a single tree by duplicating the tree several times and moving and transforming each copy differently.
An object can be animated by moving it along a path in small steps while also rotating and scaling it slightly at each step. Transformations, both two-dimensional and three-dimensional, are discussed in Chapter 4. Currently, virtually all our graphics output devices (Chapter 26) are two-dimensional, but many graphics projects and objects are three-dimensional. Converting a three-dimensional graphics object or scene into two dimensions is a mathematical operation termed projection. In general, a projection transforms an object from n dimensions to n − 1 or fewer dimensions, but in computer graphics n is always 3. Because of the loss of one dimension, an object loses some visual information when projected. It is therefore important to study the various types of projections and always use the right one. Chapters 5 through 7 discuss, explain, and illustrate the three main classes of projections: parallel, perspective, and nonlinear.
Figure 5.1. An Impossible Fork (Duplicate).
This insert discusses the impossible fork of Figure 5.1 (duplicated here). This fork is an impossible three-dimensional object (one of many such objects). Such objects cannot be created in three dimensions but can be drawn in two dimensions because any projection causes loss of image detail. There are many examples of impossible objects, and this one, which is known as Schuster’s conundrum or the Devil’s fork, is especially simple. Notice that this impossible object cannot be colored. The following original, succinct description of impossible objects is given by [Penrose and Penrose 58]. “Each individual part is acceptable as a representation of an object normally situated in three-dimensional space; and yet, owing to false [connections] of the parts, acceptance of the whole figure on this basis leads to the illusory effect of an impossible structure.” Perspective (or more accurately, linear perspective) is the general name of several techniques that create the illusion of depth in a two-dimensional drawing. The rules of perspective determine where and how to place objects in a painting or drawing so that they appear to have depth and seem to be at the correct distance from the observer. A picture in perspective creates in the viewer’s brain the same sensation as the original three-dimensional scene. The main tool employed by linear perspective is vanishing points. Perspective, including its history, its use in art, its applications to computer graphics, and its mathematical representation, is the topic of Chapter 6. Here is a short description of the structure of this part of the book. Chapter 4 introduces two-dimensional and three-dimensional geometric transformations. It is shown that the latter are more plentiful and more complex than the former and are also more difficult to specify and visualize. Rotation is a good example of the difference between two-dimensional and three-dimensional transformations. 
In two dimensions there are only two directions of rotation, clockwise and counterclockwise, and rotations are performed about a point. In three dimensions, rotations are about an axis and the terms clockwise and counterclockwise are ambiguous. Fortunately, all the important two-dimensional transformations can be specified by a 3×3 transformation matrix, and this matrix is easy to extend to the three-dimensional case, where all the important transformations can be specified by means of a 4×4 matrix. Thus, the use of a transformation matrix is elegant and leads to a deeper understanding of transformations. Other topics discussed in this chapter are (1) homogeneous coordinates, (2) combinations of transformations, such as a rotation followed by a reflection, and (3) transforming the coordinate system instead of the object.
The remainder of Part II is devoted to projections, and Chapter 5 introduces parallel projections. These are used mostly in engineering drafting but can also be found in Eastern art. There are three classes of parallel projections: orthographic, axonometric, and oblique (although it is shown at the end of the chapter that the last two types are similar). An orthographic projection displays one side or one face of the object. The downside of this type is that three projections are needed in order to see the entire object. On the other hand, it is easy to compute dimensions of object details from measurements made on the projection. Axonometric projections normally show three sides of the object. Thus, a single projection shows more of the object, but it is more difficult to compute dimensions of parts of the object because each face of the object may be shrunk by a different factor when drawn in the projection. Oblique projections are similar to axonometric projections and employ certain projection angles in order to simplify the process of measuring and computing dimensions. Perspective projections are the topic of Chapter 6. The chapter starts with an intuitive explanation of the important concept of vanishing points. It follows with a short history of perspective, its origins, and its applications to art. Sections 6.3 and 6.4 are devoted to perspective projection in curved objects, a topic that is neglected by most texts on perspective. The bulk of the chapter develops the mathematics of perspective in a systematic way, approaching this topic from several points of view and illustrating it with examples. The chapter ends with a detailed presentation of stereoscopic images, an important application of perspective. Chapter 7 treats the important (and alas, neglected) topic of nonlinear projections. 
The most important nonlinear projections are the fisheye projection (Section 7.2), the panoramic projection (Section 7.5), and the many sphere projections (Section 7.15). In addition, this chapter includes material and examples on six-point perspective (Section 7.9), panoramic cameras (Section 7.11), telescopic and microscopic projections (Sections 7.12 and 7.13), and anamorphosis (Section 7.14).

The heart of mathematics consists of concrete examples and concrete problems.
—Paul Halmos, How to Write Mathematics (1973).

Books and Internet resources for transformations and projections.
Gödel, Escher, Bach: An Eternal Golden Braid, by Douglas Hofstadter. Basic Books, 20th Anniversary edition, 1999. This classical volume discusses symmetries in art, literature, and science.
Transformation Geometry: An Introduction to Symmetry, by George E. Martin. Springer-Verlag, 1982. An excellent mathematical reference.
Symmetry Discovered: Concepts and Applications in Nature and Science, by Joe Rosen. Dover Press, 1975. An accessible introduction to the ideas of symmetry.
The New Ambidextrous Universe, by Martin Gardner. W. H. Freeman and Company, 1990. A beautifully written exploration of symmetry.
Symmetry, by Hermann Weyl. Princeton University Press, 1952. A classic illustrated introduction to symmetry.
The Renaissance and Rediscovery of Linear Perspective, by Samuel Y. Edgerton. Harper and Row, 1976 (especially chapters 9 and 18).
Secret Knowledge: Rediscovering the Lost Techniques of the Old Masters, by David Hockney. Viking, 2001.
The Science of Art, Optical Themes in Western Art From Brunelleschi to Seurat, by Martin Kemp. Yale University Press, 1990.
The Life of Brunelleschi, by Antonio Tuccio Manetti, edited by Howard Saalman. Penn State University, 1970.
Geometry: An Investigative Approach, and Laboratory Investigations in Geometry, by Phares G. O'Daffer and Stanley R. Clemens. Addison-Wesley, 1976.
Reference [Wolfram 06] has information, examples of, and code to create many panoramic projections and map projections. Reference [handprint 06] has a detailed discussion titled "Elements of Perspective."

History is the transformation of tumultuous conquerors into silent footnotes.
Paul Eldridge, Maxims for a Modern Man (1965)
4 Transformations

The 1960s were the golden age of computer graphics. This was the time when many of its basic methods, algorithms, and techniques were developed, tested, and improved. Two of the most important concepts that were identified and studied in those years were transformations and projections. Workers in the graphics field immediately recognized the importance of transformations. Once a graphics object is constructed, the use of transformations enables the designer to create copies of the object and modify them in important ways. The necessity of projections was also realized early. Sophisticated graphics requires three-dimensional objects, but graphics output devices are two-dimensional. A three-dimensional object has to be projected on the flat output device in a way that will preserve its depth information. Thus, early researchers in computer graphics developed the mathematics of parallel and perspective projections and implemented these techniques. Nonlinear projections deform the projected image in various ways and are mostly used for artistic and ornamental purposes. These projections were also studied and implemented over the years by many people.

Exercise 4.1: Most nonlinear projections are valued for their artistic and ornamental effects, but there is at least one type of nonlinear projection that has important practical applications. What is it?

The English term sea-change (or seachange) was coined by Shakespeare in his play The Tempest. The term means a gradual transformation in which the form is retained but the substance is replaced. Thus, sea-change is a real-life transformation. In computer graphics (and in other fields of science) the term transformation refers to a process that varies the location and orientation (i.e., the form) of an object while normally retaining its shape (i.e., substance) or at least its topology. Today, transformations and projections are important components of computer graphics and computer-aided design (CAD).
Transformations save the designer work and time, while projections are necessary because three-dimensional output devices are still rare (but see Section 6.15 for autostereoscopic displays, a revolutionary technique for
three-dimensional displays), hence this part of the book. Figure 4.1 shows the power of even the simplest two-dimensional transformations. It illustrates, from left to right, the following transformations: rotation, reflection, deformation (shearing), and scaling (see also Figure 4.3). It is not difficult to imagine the power of combining these transformations, but it is more difficult to imagine and visualize the power and flexibility of three-dimensional transformations.
Figure 4.1: Elementary Two-Dimensional Transformations.
The basic two-dimensional transformations are translation, rotation, reflection, scaling, and shearing. They are simple, but it is their combinations that make them powerful. It comes as a surprise to realize that these transformations can be specified by means of a single 3×3 matrix where only six of the nine elements are used. The same five basic transformations also exist in three dimensions, but have more degrees of freedom and therefore require more parameters to fully specify them. The general transformation matrix in three dimensions is 4×4, where 13 of the 16 elements control the transformations and the remaining three are used to specify the orientation of the projection plane in the case of perspective projections. Exercise 4.2: What transformations are possible in one dimension? In contrast with the five basic transformations, there are more than five types of projections. As Figure 4.2 illustrates, we distinguish between linear and nonlinear projections. The former class consists of parallel and perspective projections, while the latter class includes many different types. Each type of projection has variants. Thus, parallel projections are classified into orthographic, axonometric, and oblique, while perspective projections include one-, two-, and three-point projections. Nonlinear projections are all different and employ different approaches and ideas. Linear projections, on the other hand, are all based on the following simple rule of projection. Rule. A three-dimensional object is projected on a two-dimensional plane called the projection plane. The object must be fully located on one side of the plane, and we imagine a viewer or an observer located on the other side. On that side, we select a point termed the center of projection, and it is the location of this point that determines the class, parallel or perspective, of the linear projection. 
A three-dimensional point P on the object is projected to a two-dimensional point P∗ on the projection plane by connecting P to the center of projection with a straight segment. Point P∗ is placed at the intersection of this segment with the projection plane. When the center of projection
Projections
  Linear
    Parallel: Orthographic, Axonometric, Oblique
    Perspective: One-point, Two-point, Three-point
  Nonlinear
    Fisheye, Panorama, Telescopic, Microscopic, Map, others...

Figure 4.2: Classification of Projections.
is at infinity, the result is a parallel projection. If the center of projection is at the observer, the projection is perspective. When working on computer graphics projects, we discover very quickly that transformations are an important part of the process of building an image. If an image has two identical (or even similar) parts, such as wheels, only one part need be constructed from scratch. The other parts can be obtained by copying the first and then moving, reflecting, and rotating it to bring it to the right shape, size, position, and orientation. Often, we want to zoom in on a small part of an image so that more detail can be seen. Sometimes it is useful to zoom out, so a large image can be seen in its entirety on the screen, even though no details can then be discerned. Operations such as moving, rotating, reflecting, or scaling an image are called geometric transformations and are discussed in this chapter for two and three dimensions.
4.1 Introduction

Mathematically, a geometric transformation is a function f whose domain and range are points. We denote by P a general point before any transformation and by P∗ the same point after a transformation. The notation P∗ = f(P) implies that the transformed point P∗ is obtained by applying f to P. We call our transformations geometric because they have geometric interpretations. Thus, only certain functions f can be used. Years of study and practical experience have shown that in order for it to be meaningful as a geometric transformation, a function must satisfy two conditions: it has to be onto and one-to-one. A general function f maps its domain D into its range R. If every point in R has a corresponding point in D, then the function maps its domain onto its range. An example is the floor function f(x) = ⌊x⌋, which maps the real numbers onto the integers. Every integer has a real number (in fact, infinitely many real numbers) that maps to it. Another example is g(x) = 1/x, a mapping from the real numbers into the real numbers. This mapping is
not onto because no real number maps to zero. Requiring a transformation to be onto makes sense since it guarantees that there will not be any special points P∗ that cannot be reached by the transformation. An arbitrary function may map two distinct points x and y into the same point. Function f(x) above maps the two distinct numbers 9.2 and 9.9 into the integer 9. A one-to-one function satisfies x ≠ y → f(x) ≠ f(y). Function g(x) above is one-to-one. The requirement that a transformation be one-to-one makes sense because it implies that a given point P∗ is the transformed image of one point only, thereby making it possible to reconstruct the inverse transformation.

Definition. A geometric transformation is a function that is both onto and one-to-one, and whose range and domain are points.

Exercise 4.3: Do either of the two real functions f1(x, y) = (x², y) and f2(x, y) = (x³, y) satisfy the definition above?

There are two ways to look at geometric transformations. We can interpret them as either moving the points to new locations or as moving the entire coordinate system while leaving the points alone. The latter interpretation is discussed in Section 4.5, but the reader should realize that whatever interpretation is used, the movement caused by a geometric transformation is instantaneous. We should not think of a point as moving along a path from its original location to a new location, but rather as being grabbed and immediately planted in its new location.

The description of right lines and circles, upon which geometry is founded, belongs to mechanics. Geometry does not teach us to draw these lines, but requires them to be drawn.
—Isaac Newton (1687).

Combining transformations is an important operation that is discussed in detail in Section 4.2.2. This paragraph intends to make it clear that such a combination (sometimes called a product) amounts to a composition of functions.
If functions f and g represent two transformations, then the composition g ◦f represents the product of the two transformations. Such a composition is often written as P∗ = g(f (P)). It can be shown that combining transformations is associative (i.e., g ◦ (f ◦ h) = (g ◦ f ) ◦ h). This fact, together with a few other basic properties of transformations, makes it possible to identify groups of transformations. A discussion of mathematical groups is beyond the scope of this book but can be found in many texts on linear algebra. A set of transformations constitutes a group if it includes the identity transformation, if it is closed, and if every transformation in the set has an inverse that is also included in the set. An example of a group of transformations is the set of two-dimensional rotations about the origin through angles of 0◦ and 180◦ . This two-element set is a group because a 0◦ rotation is an identity transformation and because a 180◦ rotation is the inverse of itself. Exercise 4.4: Is the operation of combining transformations commutative?
Another important example of a group of transformations is the set of linear transformations that map a point P = (x, y, z) to a point P∗ = (x∗, y∗, z∗), where

x∗ = a11x + a12y + a13z + a14,
y∗ = a21x + a22y + a23z + a24,    (4.1)
z∗ = a31x + a32y + a33z + a34.

Each new coordinate depends on all three original coordinates, and the dependence is linear. Such transformations are called affine and are defined more rigorously on Page 218. A little thinking shows that the coefficients ai4 of Equation (4.1) represent quantities that are added to the transformed coordinates (x∗, y∗, z∗) regardless of the original coordinates, thereby simply translating P∗ in space. This is why we start the detailed discussion here by temporarily ignoring these coefficients, which leads to the simple system of equations

x∗ = a11x + a12y + a13z,
y∗ = a21x + a22y + a23z,    (4.2)
z∗ = a31x + a32y + a33z.

If the 3×3 coefficient matrix of this system of equations is nonsingular or, equivalently, if the determinant of the coefficient matrix is nonzero (see any text on linear algebra for a refresher on matrices and determinants), then the system is easy to invert and can be expressed in the form

x = b11x∗ + b12y∗ + b13z∗,
y = b21x∗ + b22y∗ + b23z∗,    (4.3)
z = b31x∗ + b32y∗ + b33z∗,

where the bij's are expressed in terms of the aij's. It is now easy to see that, for example, the two-dimensional line Ax + By + C = 0 is transformed by Equation (4.3) to the two-dimensional line (Ab11 + Bb21)x∗ + (Ab12 + Bb22)y∗ + C = 0.

Exercise 4.5: Show that Equation (4.3) maps the general second-degree curve Ax² + Bxy + Cy² + Dx + Ey + F = 0 to another second-degree curve. In general, an affine transformation maps any curve of degree n to another curve of the same degree.
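The degree-preserving behavior of a nonsingular linear map can be checked numerically: collinear points remain collinear after the transformation. The following is a minimal Python sketch of Equation (4.2); the matrix entries and function names are illustrative, not from the book.

```python
# A nonsingular 3x3 linear map (Equation (4.2)) applied to 3D points,
# with a check that collinear points map to collinear points.

A = [[2, 1, 0],          # illustrative nonsingular coefficient matrix
     [0, 1, 0],
     [1, 0, 3]]

def apply(M, p):
    """p* = M p: each new coordinate is a linear combination
    of the original coordinates."""
    return tuple(sum(M[r][c] * p[c] for c in range(3)) for r in range(3))

def collinear(p, q, r):
    """Three points are collinear iff (q-p) x (r-p) is the zero vector."""
    u = [q[k] - p[k] for k in range(3)]
    v = [r[k] - p[k] for k in range(3)]
    cross = [u[1]*v[2] - u[2]*v[1],
             u[2]*v[0] - u[0]*v[2],
             u[0]*v[1] - u[1]*v[0]]
    return all(abs(c) < 1e-9 for c in cross)

pts = [(1, 2, 3), (2, 3, 4), (4, 5, 6)]      # three points on one line
images = [apply(A, p) for p in pts]
print(collinear(*images))                    # True
```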
4.2 Two-Dimensional Transformations

In practice, a complete two-dimensional image is constructed on the screen object by object and it may be edited before it is deemed satisfactory. One aspect of editing is to transform objects. Typical transformations (Figures 4.1 and 4.3 and color Plate Q.1) are moving or sliding (translation), reflecting or flipping (mirror image), zooming (scaling), rotating, and shearing (distorting). Notice how the orientation of Bach's nose in Figure 4.3 is different for reflection and rotation.
Figure 4.3: Two-Dimensional Transformations.
The transformation can be applied to every pixel of the object. Alternatively, it can be applied only to some key points that fully define the object (such as the four corners of a rectangle), following which the transformed object is constructed from the transformed key points. As soon as we use words like “image,” we are already thinking of how one shape corresponds to the other—of how you might move one shape to bring it into coincidence with the other. Bilateral symmetry means that if you reflect the left half in a mirror, then you obtain the right half. Reflection is a mathematical concept, but it is not a shape, a number, or a formula. It is a transformation—that is, a rule for moving things around. —Ian Stewart, Nature’s Numbers (1995). The same principle applies to a three-dimensional image. Such an image consists of one or more three-dimensional objects that can be transformed individually, following which the entire image should be projected on the two-dimensional screen (or other output device). We first take a look at the mathematics of two-dimensional transformations.
We use the notation P = (x, y) for a point and P∗ = (x∗, y∗) for the transformed point. We are looking for a simple, fast transformation rule, so it is natural to try a linear transformation (i.e., a mathematical rule that does not use operations more complex than multiplications and shifts). The simplest linear transformation is x∗ = ax + cy and y∗ = bx + dy, in which each of the new coordinates is a linear combination of the two old coordinates. This transformation can be written P∗ = PT, where T is the 2×2 matrix

T = ( a  b
      c  d ).

Thus, the transformation depends on just four parameters, which makes it easy to analyze and fully understand it. To understand the effect of each of the four matrix elements, we start by setting b = c = 0. The transformation becomes x∗ = ax and y∗ = dy, i.e., scaling. If applied to all the points of an object, all the x dimensions are scaled by a factor of a and all the y dimensions are scaled by a factor of d. Note that a and d can also be less than 1, which results in shrinking the object. If a or d (or both) equal −1, the transformation is a reflection. Any other negative values result in both scaling and reflection. Note that scaling an object by factors of a and d changes its area by a factor of a × d and that this factor is also the value of the determinant of the scaling matrix ( a 0; 0 d ).

Here are examples of scaling and reflection. In A, the y coordinates are scaled by a factor of 2. In B, the x coordinates are reflected. In C, the x dimensions are shrunk to 0.001 of their original values. In D, the figure is shrunk to a vertical line:

A = ( 1 0       B = ( −1 0      C = ( 0.001 0      D = ( 0 0
      0 2 ),          0  1 ),         0     1 ),         0 1 ).
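The row-vector convention P∗ = PT and the four example matrices can be tried directly. An illustrative Python sketch (names are assumptions):

```python
# Applying a 2x2 transformation matrix with the row-vector convention
# P* = P T, i.e., x* = ax + cy and y* = bx + dy for T = [[a, b], [c, d]].

def transform(T, p):
    x, y = p
    return (x * T[0][0] + y * T[1][0], x * T[0][1] + y * T[1][1])

A = [[1, 0], [0, 2]]        # scale y by 2
B = [[-1, 0], [0, 1]]       # reflect the x coordinates
C = [[0.001, 0], [0, 1]]    # shrink x to 0.001 of its value
D = [[0, 0], [0, 1]]        # collapse to the vertical line x = 0

p = (3.0, 4.0)
print(transform(A, p))   # (3.0, 8.0)
print(transform(B, p))   # (-3.0, 4.0)
print(transform(D, p))   # (0.0, 4.0)
```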
Exercise 4.6: In the novel The Oxford Murders, the author mentions the sequence of symbols 1122 8. Guess the meanings of the symbols and the next symbol in this sequence. (Hint. Ignore the obvious meanings of the M and the 8. This has to do with symmetry, specifically, with reflection.)

Exercise 4.7: What scaling transformation changes a circle to an ellipse?

The next step is to set a = 1 and d = 1 (no scaling or reflection) and explore the effect of matrix elements b and c. The transformation becomes x∗ = x + cy, y∗ = bx + y. We first set b = 1 and c = 0 and easily find out that matrix ( 1 1; 0 1 ) transforms the four points (1, 0), (3, 0), (1, 1), and (3, 1) to (1, 1), (3, 3), (1, 2), and (3, 4), respectively. When we plot the original points and the transformed points (Figure 4.4a), it becomes obvious that the original rectangle has been sheared vertically and was transformed into a parallelogram. A similar shearing effect results from matrix ( 1 0; 1 1 ). The quantities b and c are therefore responsible for shearing. Figure 4.4b shows the connection between shearing and the operation of scissors, which is the reason for the term shearing.

Exercise 4.8: Apply the shearing transformation ( 1 0; −1 1 ) to the four points (1, 0), (3, 0), (1, 1), and (3, 1). What are the transformed points? What geometrical figure do they represent?

The next important transformation is rotation. Figure 4.5 shows a point P rotated clockwise about the origin through an angle θ to become P∗. Simple trigonometry yields
4.2 Two-Dimensional Transformations

Figure 4.4: Scissors and Shearing.
x = R cos α and y = R sin α. From this, we get the expressions for x∗ and y∗:

x∗ = R cos(α − θ) = R cos α cos θ + R sin α sin θ = x cos θ + y sin θ,
y∗ = R sin(α − θ) = −R cos α sin θ + R sin α cos θ = −x sin θ + y cos θ.

Hence, the clockwise rotation matrix in two dimensions is

[cos θ  −sin θ;  sin θ  cos θ],

which also equals the product

[cos θ  0;  0  cos θ] [1  −tan θ;  tan θ  1].        (4.4)

This shows that any rotation in two dimensions is a combination of scaling (and, perhaps, reflection) by a factor of cos θ and shearing, an unexpected result (that's true for all angles where tan θ is finite).
Figure 4.5: Clockwise Rotation.
Exercise 4.9: Show how a 45◦ rotation can be achieved by scaling followed by shearing.
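Equation (4.4) is easy to confirm numerically. The following sketch (plain Python; variable names are ours) multiplies the scaling and shearing matrices for a sample angle and compares the product with the clockwise rotation matrix:

```python
import math

def matmul2(P, Q):
    # 2x2 matrix product.
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

theta = math.radians(30)
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]   # clockwise rotation matrix
scaling = [[math.cos(theta), 0],
           [0, math.cos(theta)]]
shearing = [[1, -math.tan(theta)],
            [math.tan(theta), 1]]

product = matmul2(scaling, shearing)
ok = all(abs(product[i][j] - rotation[i][j]) < 1e-12
         for i in range(2) for j in range(2))
print(ok)  # True
```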
4 Transformations
Exercise 4.10: Discuss rotation in two dimensions using the polar coordinates (r, θ) of points instead of the Cartesian coordinates (x, y).

A rotation matrix has the following property: When any row is multiplied by itself, the result is 1, and when a row is multiplied by another row, the result is 0. The same is true for columns. Such a matrix is called orthonormal. Matrix T1 below rotates counterclockwise. Matrix T2 reflects about the line y = x, and matrix T3 reflects about the line y = −x. Note the determinants of these matrices. In general, a determinant of +1 indicates pure rotation, whereas a determinant of −1 indicates pure reflection. (As a reminder, det [a b; c d] = ad − bc.)

T1 = [cos θ  sin θ;  −sin θ  cos θ];   T2 = [0 1; 1 0];   T3 = [0 −1; −1 0].        (4.5)
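A quick numeric check of these determinant and orthonormality claims (a Python sketch with our own helper names):

```python
import math

def det2(T):
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]

theta = math.radians(25)
T1 = [[math.cos(theta), math.sin(theta)],
      [-math.sin(theta), math.cos(theta)]]   # counterclockwise rotation
T2 = [[0, 1], [1, 0]]                        # reflection about y = x
T3 = [[0, -1], [-1, 0]]                      # reflection about y = -x

assert abs(det2(T1) - 1) < 1e-12   # +1: pure rotation
assert det2(T2) == -1              # -1: pure reflection
assert det2(T3) == -1

# Orthonormality of T1: a row times itself is 1, a row times the other row is 0.
r0, r1 = T1
assert abs(r0[0] ** 2 + r0[1] ** 2 - 1) < 1e-12
assert abs(r0[0] * r1[0] + r0[1] * r1[1]) < 1e-12
```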
Exercise 4.11: Show that a y-reflection (i.e., reflection about the x axis) followed by a reflection through the line y = −x produces pure rotation.

Exercise 4.12: Show that the transformation matrix

[(1 − t²)/(1 + t²)   2t/(1 + t²);   −2t/(1 + t²)   (1 − t²)/(1 + t²)]

produces pure rotation.

Exercise 4.13: For what values of A does the following matrix represent pure rotation and for what values does it represent pure reflection?

[a/A  b/A;  −b/A  a/A].

A 90° Rotation: For a 90° clockwise rotation, the rotation matrix is

[cos 90°  −sin 90°;  sin 90°  cos 90°] = [0  −1;  1  0].        (4.6)
A point P = (x, y) is therefore transformed to point (y, −x). For a counterclockwise 90° rotation, (x, y) is transformed to (−y, x). This is called the negate and exchange rule.

  Representations rotated not always by one hundred and eighty degrees, but sometimes by ninety or forty-five, completely subvert habitual perceptions of space; the outline of Europe, for instance, a shape familiar to anyone who has been even only to junior school, when swung around ninety degrees to the right, with the west at the top, begins to look like Denmark.
  —Georges Perec, Life, A User's Manual (1976).
The Golden Ratio

Start with a straight segment of length l and divide it into two parts a and b such that a + b = l and l/a = a/b. The ratio a/b is a constant called the Golden Ratio and is denoted φ. It is one of the important mathematical constants, like π and e, and was already known to the ancient Greeks. There seems to be a general belief that geometric figures can be made more pleasing to the eye if they obey this ratio. One example is the golden rectangle, whose sides are x and xφ long (Plate P.3). Many classical buildings and paintings seem to include this ratio. [Huntley 70] is a lively introduction to the Golden Ratio. It illustrates properties such as

φ = √(1 + √(1 + √(1 + √(1 + · · ·))))   and   φ = 1 + 1/(1 + 1/(1 + 1/(1 + · · ·))).

The value of φ is easy to determine. The basic ratio l/a = a/b = φ implies (a + b)/a = a/b = φ, which, in turn, means 1 + b/a = φ or 1 + 1/φ = φ, an equation that can be written φ² − φ − 1 = 0. This equation is easy to solve, yielding φ = (1 + √5)/2 ≈ 1.618 . . ..
Figure 4.6: The Golden Ratio.
The equation φ = 1 + 1/φ illustrates another unusual property of φ. Imagine the golden rectangle with sides 1 and φ (Figure 4.6a). Such a rectangle can be divided into a 1 × 1 square and a smaller golden rectangle of dimensions 1 × 1/φ. The smaller rectangle can now be divided into a 1/φ × 1/φ square and an even smaller golden rectangle (Figure 4.6b). When this process continues, the rectangles converge to a point. Figure 4.6c shows how a logarithmic spiral can be drawn through corresponding corners of the rectangles.
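Both the nested-radical and continued-fraction forms above converge to φ, which a few lines of Python can confirm (the iteration counts below are our choice):

```python
# phi satisfies phi = 1 + 1/phi and phi = sqrt(1 + phi), so iterating either
# form converges to the Golden Ratio from any positive starting value.
phi = (1 + 5 ** 0.5) / 2          # closed form, about 1.618033988749895

x = 1.0
for _ in range(40):               # continued fraction: x <- 1 + 1/x
    x = 1 + 1 / x
assert abs(x - phi) < 1e-12

y = 1.0
for _ in range(60):               # nested radical: y <- sqrt(1 + y)
    y = (1 + y) ** 0.5
assert abs(y - phi) < 1e-12
print(x, y)
```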
4.2.1 Homogeneous Coordinates

Unfortunately, our simple 2×2 transformation matrix cannot generate all the basic transformations that are needed in practice! In particular, it cannot generate translation. This is easy to see by arguing that any object containing the origin will, after any of the transformations above, still contain the origin [i.e., the result of (0, 0)T is (0, 0) for any matrix T]. Translations can be expressed by x∗ = x + m, y∗ = y + n, and one way to implement them is to generalize our transformations to P∗ = PT + (m, n), where T is the familiar 2×2 transformation matrix. A more elegant approach, however, is to stay with the compact notation P∗ = PT and to extend T to the 3×3 matrix

T = [a b 0;  c d 0;  m n 1].        (4.7)
This approach is called homogeneous coordinates and is commonly used in projective geometry. It makes it possible to unify all the two-dimensional transformations within one 3 × 3 matrix with six parameters. The problem is that a two-dimensional point (a pair) cannot be multiplied by a 3×3 matrix. This is solved by representing our points in homogeneous coordinates, which is done by extending the pair (x, y) to the triplet (x, y, 1). The rules for using homogeneous coordinates are the following: 1. To transform a point (x, y) to homogeneous coordinates, simply add a third component of 1. Hence, (x, y) ⇒ (x, y, 1). 2. To transform the triplet (a, b, c) from homogeneous coordinates back into a pair (x, y), divide by the third component. Hence, (a, b, c) ⇒ (a/c, b/c). This means that a point (x, y) has an infinite number of representations in homogeneous coordinates. Any triplet (ax, ay, a) where a is nonzero is a valid representation of the point. This suggests a way to intuitively understand homogeneous coordinates. We can consider the triplet (ax, ay, a) a point in three-dimensional space. When a varies from 0 to ∞, the point travels along a straight ray from the origin to infinity. The direction of the ray is determined by x and y but not by a. Therefore, each two-dimensional point (x, y) corresponds to a ray in three-dimensional space. To find the “real” location of the point, we look at the z = 1 plane. All points on this plane have coordinates (x, y, 1), so we only have to strip off the “1” in order to see where the point is located. Section 4.4 shows that homogeneous coordinates can also be applied to three-dimensional points. Exercise 4.14: Write the transformation matrix that performs (1) a y-reflection, (2) a translation by −1 in the x and y directions, and (3) a 180◦ counterclockwise rotation about the origin. Apply this compound transformation to the four corners (1, 1), (1, −1), (−1, 1), and (−1, −1) of a square centered on the origin. 
What are the transformed corners? Matrix (4.7) is the general transformation matrix in two dimensions. It produces the most general linear transformation, x∗ = ax + cy + m, y ∗ = bx + dy + n, and it shows that this transformation is fully specified by just six numbers.
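The two rules can be turned into a few lines of Python (a sketch; `transform_h` is our own helper name) that translate a point with the 3×3 matrix of Equation (4.7):

```python
def transform_h(p, T):
    # Rule 1: extend (x, y) to (x, y, 1). Multiply by the 3x3 matrix,
    # then rule 2: divide by the third component.
    v = [p[0], p[1], 1]
    r = [sum(v[k] * T[k][j] for k in range(3)) for j in range(3)]
    return (r[0] / r[2], r[1] / r[2])

m, n = 5, -2                 # translation amounts
T = [[1, 0, 0],
     [0, 1, 0],
     [m, n, 1]]
print(transform_h((3, 4), T))   # (8.0, 2.0)
```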
We can gain a deeper understanding of homogeneous coordinates when we include two more parameters in matrix (4.7), writing it as

[a b p;  c d q;  m n 1].        (4.8)
A general point (x, y) is now transformed to

(x, y, 1) [a b p;  c d q;  m n 1] = (ax + cy + m, bx + dy + n, px + qy + 1).

Applying rule 2 shows that the transformed point (x∗, y∗) is given by

x∗ = (ax + cy + m)/(px + qy + 1),   y∗ = (bx + dy + n)/(px + qy + 1).
To understand what this means, we apply this result to the four points (2, 1), (6, 1), (2, 5), and (6, 5) that constitute the four corners of a square (Figure 4.7a). Using the simple transformation

[1 0 1;  0 1 1;  0 0 1]

(i.e., no scaling, rotation, shearing, or translation and p = q = 1), the points are transformed to

P1 = (2, 1) → (2, 1, 4) → (1/2, 1/4),
P2 = (6, 1) → (6, 1, 8) → (3/4, 1/8),
P3 = (2, 5) → (2, 5, 8) → (1/4, 5/8),
P4 = (6, 5) → (6, 5, 12) → (1/2, 5/12).

The transformed points (Figure 4.7b) also seem to form a square, but one that's viewed from a different direction and seen in perspective. This suggests that our transformation (using just p and q, without scaling, reflection, rotation, or shearing) has moved the square from its original position in the xy plane to another plane. Such transformations are called projections and are useful when dealing with objects in three-dimensional space.
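The four corner mappings can be reproduced exactly with rational arithmetic (a Python sketch; the helper name `project` is ours):

```python
from fractions import Fraction as F

def project(pt, T):
    # Homogeneous transform with row vectors: (x, y, 1) T, then divide
    # by the third component.
    v = [F(pt[0]), F(pt[1]), F(1)]
    r = [sum(v[k] * T[k][j] for k in range(3)) for j in range(3)]
    return (r[0] / r[2], r[1] / r[2])

T = [[1, 0, 1],    # the identity except for p = q = 1
     [0, 1, 1],
     [0, 0, 1]]

assert project((2, 1), T) == (F(1, 2), F(1, 4))
assert project((6, 1), T) == (F(3, 4), F(1, 8))
assert project((2, 5), T) == (F(1, 4), F(5, 8))
assert project((6, 5), T) == (F(1, 2), F(5, 12))
```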
4.2.2 Combining Transformations

Matrix notation is useful when working with transformations, because it makes it easy to combine transformations. To combine transformations A, B, and C, we write the three transformation matrices and multiply them. An example is an x-reflection, followed by a y-scaling, followed by a 45° rotation:

[−1 0; 0 1] [1 0; 0 2] [0.707 −0.707; 0.707 0.707] = [−0.707 0.707; 1.414 1.414].
Figure 4.7: A Two-Dimensional Projection of a Square.
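The matrix product of Section 4.2.2 and the claim that transformations do not commute can both be checked with a short sketch (plain Python; helper names are ours):

```python
import math

def matmul2(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

reflect_x = [[-1, 0], [0, 1]]
scale_y = [[1, 0], [0, 2]]
c = math.cos(math.radians(45))        # 0.7071...
rotate45 = [[c, -c], [c, c]]

combined = matmul2(matmul2(reflect_x, scale_y), rotate45)
print(combined)   # approximately [[-0.707, 0.707], [1.414, 1.414]]

# Order matters: rotating first yields a different matrix.
other = matmul2(rotate45, matmul2(reflect_x, scale_y))
assert combined != other
```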
In general, matrix multiplication is noncommutative, reflecting the fact that geometric transformations are also noncommutative. It is easy to convince yourself that, for example, a rotation about the origin followed by a translation is not the same as a translation followed by a rotation about the origin. Note that all the transformations discussed earlier are performed about the origin. Figure 4.8a shows an object rotated 40◦ clockwise. It is easy to see that the center of rotation is the origin. If, for example, we want to rotate an object about a point P, we have to translate both the object and the point such that P goes to the origin (Figure 4.8b), then rotate the object, and finally translate back (Figure 4.8c). Similarly, to reflect an object through an arbitrary line, we have to (1) translate the line (and the object) until it passes through the origin, (2) rotate the line (and the object) until it coincides with one of the coordinate axes, (3) reflect through that axis, (4) rotate back, and (5) translate back.
Figure 4.8: Rotation About a Point.
(Transformations are often done about the origin. See Exercise 6.11 for an example of how this affects scaling in three dimensions.)

Exercise 4.15: Derive the rotation matrix for a two-dimensional rotation about a point (x0, y0) using just trigonometry (i.e., without using translation).

Example: Reflection about the line y = x + 1. This line has a slope of 1 (i.e., it makes an angle of 45° with the x axis) and it intercepts the y axis at y = 1. We first translate down one unit, then rotate clockwise by 45°, then reflect through the x axis, rotate back, and translate back. The result is (α stands for both sin 45° and cos 45°)

T = [1 0 0; 0 1 0; 0 −1 1] [α −α 0; α α 0; 0 0 1] [1 0 0; 0 −1 0; 0 0 1] [α α 0; −α α 0; 0 0 1] [1 0 0; 0 1 0; 0 1 1]
  = [0 2α² 0;  2α² 0 0;  −2α² 1 1] = [0 1 0;  1 0 0;  −1 1 1]
(because 2α² = sin² 45° + cos² 45° = 1). Note that det T = −1, i.e., pure reflection.

Exercise 4.16: Demonstrate that the result in the example is correct.

Example: Reflection about an arbitrary line. Given the line y = ax + b, it is possible to reflect a point about this line by transforming the line to the x axis, reflecting about that axis, and transforming the line back. Since a is the slope (i.e., the tangent of the angle α between the line and the x axis) and b is the y intercept, the individual transformations needed are (1) a translation of −b units in the y direction, (2) a clockwise rotation of α degrees about the origin, (3) a reflection about the x axis, (4) a counterclockwise rotation, and (5) a reverse translation. The combined transformation matrix is therefore

Treflect = [1 0 0; 0 1 0; 0 −b 1] [cos α −sin α 0; sin α cos α 0; 0 0 1] [1 0 0; 0 −1 0; 0 0 1] [cos α sin α 0; −sin α cos α 0; 0 0 1] [1 0 0; 0 1 0; 0 b 1]
         = [cos 2α  sin 2α  0;  sin 2α  −cos 2α  0;  −b sin 2α  2b cos²α  1].        (4.9)
The determinant of this transformation matrix equals −1, as should be for pure reflection. For the two special cases α = b = 0 and α = 45° and b = 0, Equation (4.9) reduces to

[1 0 0;  0 −1 0;  0 0 1]   and   [0 1 0;  1 0 0;  0 0 1],

respectively.
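Equation (4.9) is easy to test numerically. The sketch below (plain Python; function names are ours) builds the matrix for the line y = x + 1 of the earlier example and checks two points:

```python
import math

def reflect_matrix(alpha, b):
    # Equation (4.9): reflection about the line y = (tan alpha) x + b,
    # in homogeneous coordinates with row vectors.
    s2, c2 = math.sin(2 * alpha), math.cos(2 * alpha)
    return [[c2, s2, 0],
            [s2, -c2, 0],
            [-b * s2, 2 * b * math.cos(alpha) ** 2, 1]]

def apply(pt, T):
    v = [pt[0], pt[1], 1]
    r = [sum(v[k] * T[k][j] for k in range(3)) for j in range(3)]
    return (r[0] / r[2], r[1] / r[2])

alpha, b = math.radians(45), 1.0     # the line y = x + 1
T = reflect_matrix(alpha, b)

x, y = apply((1.0, 0.0), T)          # geometrically, (x, y) -> (y - 1, x + 1)
assert abs(x + 1.0) < 1e-12 and abs(y - 2.0) < 1e-12

x, y = apply((2.0, 3.0), T)          # a point on the line is its own mirror image
assert abs(x - 2.0) < 1e-12 and abs(y - 3.0) < 1e-12
```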
One feature that makes Equation (4.9) less than general is the way the sine and cosine are obtained from the tangent of a known angle. Given that the slope a equals tan α, we can calculate

a = tan α = sin α/cos α = sin α/√(1 − sin²α),

which yields sin²α = a²/(1 + a²), or

sin α = ±a/√(1 + a²)   and   cos α = ±1/√(1 + a²).
The signs depend on the angle (or rather the quadrant in which the angle happens to be) and cannot be determined in a general way.

Exercise 4.17: Compute the numerical value of matrix Treflect for the case α = 30° and b = 1.

Exercise 4.18: Digital images displayed on a screen or printed on paper consist of pixels. Even smooth curves are made of pixels. Thus, there is a need for efficient algorithms to compute the best pixels for a given curve or geometric figure. The circle has a high degree of symmetry, which is why it is possible to determine the best pixels for a given circle by computing the pixels for one octant and duplicating and transforming each pixel seven times to complete the remaining seven octants. The question is, is it possible to improve such an algorithm even more by doing half an octant and duplicating each pixel 15 times?

Another feature that makes Equation (4.9) less than general is the use of the explicit representation y = ax + b. This representation is limited because it cannot express vertical lines (for which a would be infinite). When reflecting a point about an arbitrary line, it is better to use the more general implicit representation of a straight line, ax + by + c = 0, where a or b but not both can be zero. The slope of this line is −a/b, and substituting b = 0 yields a vertical line. Given such a line, we start with a point P = (x, y) and its reflection P∗ = (x∗, y∗) about the line. It is clear that the segment PP∗ must be perpendicular to the line, so its equation must be bx − ay + d = 0. Since both P and P∗ are on such a line, they satisfy bx − ay + d = 0 and bx∗ − ay∗ + d = 0. Subtracting these two expressions yields

b(x − x∗) = a(y − y∗).        (4.10)

We assume that P∗ is the reflection of P about the line ax + by + c = 0, so the midpoint of segment PP∗, which is the point ((x + x∗)/2, (y + y∗)/2), must be on this line and must therefore satisfy

a(x + x∗)/2 + b(y + y∗)/2 + c = 0.        (4.11)

Equations (4.10) and (4.11) can easily be solved for x∗ and y∗. The solutions are

P∗ = (x∗, y∗) = (x − 2a(ax + by + c)/(a² + b²),  y − 2b(ax + by + c)/(a² + b²))
             = (((b² − a²)x − 2aby − 2ac)/(a² + b²),  (−2abx + (a² − b²)y − 2bc)/(a² + b²)).        (4.12)
Equation (4.12) is easy to verify intuitively for vertical and for horizontal lines. When b is zero, the line becomes the vertical line x = −c/a and Equation (4.12) reduces to

P∗ = (x∗, y∗) = (x − 2a(ax + c)/a², y) = (−x − 2c/a, y).

When a = 0, the line is the horizontal y = −c/b, and the same equation reduces to

P∗ = (x∗, y∗) = (x, y − 2b(by + c)/b²) = (x, −y − 2c/b).
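Equation (4.12) translates directly into code. This Python sketch (our own function name) checks the two special cases just discussed, plus the line y = x:

```python
def reflect(pt, a, b, c):
    # Equation (4.12): reflection of (x, y) about the line ax + by + c = 0.
    x, y = pt
    d = a * x + b * y + c
    s = a * a + b * b
    return (x - 2 * a * d / s, y - 2 * b * d / s)

# Vertical line x = 2 (a=1, b=0, c=-2): x* = -x - 2c/a, y unchanged.
assert reflect((5.0, 3.0), 1, 0, -2) == (-1.0, 3.0)
# Horizontal line y = 1 (a=0, b=1, c=-1): y* = -y - 2c/b, x unchanged.
assert reflect((5.0, 3.0), 0, 1, -1) == (5.0, -1.0)
# The line y = x (written x - y = 0) exchanges the coordinates.
assert reflect((5.0, 3.0), 1, -1, 0) == (3.0, 5.0)
```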
The transformation matrix for reflection about an arbitrary line ax + by + c = 0 is directly obtained from Equation (4.12):

T = [b² − a²   −2ab   0;   −2ab   a² − b²   0;   −2ac   −2bc   1/(a² + b²)].        (4.13)

Its determinant is

det T = [(b² − a²)(a² − b²) − 4a²b²]/(a² + b²) = −(a⁴ + 2a²b² + b⁴)/(a² + b²) = −(a² + b²),
which equals −1 (pure reflection) for lines expressed in the standard form (defined as the case where a² + b² = 1).

Exercise 4.19: Use Equation (4.12) to obtain the transformation rule for reflection about a line that passes through the origin.

We turn now to the product of two reflections about the two arbitrary lines L1: ax + by + c = 0 and L2: dx + ey + f = 0 (Figure 4.9a). This product can be calculated from Equation (4.13) as the matrix product

[b² − a²  −2ab  0;  −2ab  a² − b²  0;  −2ac  −2bc  1/(a² + b²)] [e² − d²  −2de  0;  −2de  d² − e²  0;  −2df  −2ef  1/(d² + e²)],
but this product is complex and difficult to interpret geometrically. In order to simplify it, we assume, without loss of generality, that both lines pass through the origin and that the first is also horizontal (Figure 4.9b). The first assumption means that the lines intersect at the origin and that c = f = 0. The second assumption means that the first line is identical to the x axis, so a = 0 and b = 1. Also, f = 0 implies dx + ey = 0 or y = −(d/e)x. The quantity −d/e is the slope of the second line. Writing this slope as −tan θ = −sin θ/cos θ, we can choose d = sin θ and e = cos θ, implying d² + e² = 1.
Figure 4.9: Reflections About Two Intersecting Lines.
Under these assumptions, the matrix product above becomes

[1 0 0;  0 −1 0;  0 0 1] [e² − d²  −2de  0;  −2de  d² − e²  0;  0 0 1]
  = [e² − d²  −2de  0;  2de  e² − d²  0;  0 0 1]
  = [cos 2θ  −sin 2θ  0;  sin 2θ  cos 2θ  0;  0 0 1],        (4.14)
leading to the important conclusion that the product of two reflections about arbitrary lines is a rotation through an angle 2θ about the intersection point of the lines, where θ is the angle between the lines. It can be shown that the opposite is also true; any rotation is the product of two reflections about two intersecting lines.

The discussion above assumes that both lines pass through the origin. In the special case where θ = 0, such lines would be identical, so reflecting a point P about them would move it back to itself. However, for θ = 0, matrix (4.14) reduces to the identity matrix, so it is valid even for identical lines. In the special case where the lines are parallel, their intersection point is at infinity and a rotation about a center at infinity is a translation.

Exercise 4.20: Given the two parallel lines y = 0 and y = c, calculate the double reflection of a point (x, y) about them.

Exercise 4.21: Consider the shearing transformation Ta of Equation (4.15), followed by the 90° rotation Tb. What is the combined transformation, and what kind of transformation is it?

Ta = [0 1 0;  2 0 0;  0 0 1],   Tb = [cos 90°  −sin 90°  0;  sin 90°  cos 90°  0;  0 0 1].        (4.15)
Exercise 4.22: Given the two rotations

T1 = [cos θ1  −sin θ1  0;  sin θ1  cos θ1  0;  0 0 1]   and   T2 = [cos θ2  −sin θ2  0;  sin θ2  cos θ2  0;  0 0 1],

calculate the combined transformation T1T2. Is it identical to a rotation through (θ1 + θ2)?

Exercise 4.23: Given the two shearing transformations

T1 = [1 b 0;  0 1 0;  0 0 1]   and   T2 = [1 0 0;  c 1 0;  0 0 1],
calculate the combined transformation T1T2. Is it identical to a shearing by factors b and c?

Exercise 4.24: Prove that three successive shearings about the x, y, and x axes is equivalent to a rotation about the origin.

Exercise 4.25: Matrix [a 0; 0 d] scales an object by factors a and d along the x and y axes, respectively. If we want to scale the object by the same factors, but in the i and j directions (see Figure 4.10, where i and j are perpendicular and form an angle θ with the x and y axes, respectively), we need to (1) rotate the object θ degrees clockwise, (2) scale along the x and y axes using matrix [a 0; 0 d], and (3) rotate back. Write the three transformation matrices and their product. Discuss the case a = d (uniform scaling).

Exercise 4.26: We can perform an exercise with shearing, similar to Exercise 4.25. Matrix [1 b; c 1] shears an object by factors c and b along the x and y axes, respectively. Calculate the matrix that shears the object by the same factors, but in the i and j directions (Figure 4.10).
Figure 4.10: Scaling Along Rotated Axes.
Exercise 4.27: Discuss scaling relative to a point (x0, y0), and show that the result is identical to the product of a translation followed by scaling, followed by a reverse translation.

Using Equation (Ans.2) in the Answers to Exercises, it is easy to explore the effect of two consecutive scaling transformations, with scaling factors of k1 and k2 and about points P1 = (x1, y1) and P2 = (x2, y2), respectively. We simply multiply the two transformation matrices

[k1 0 0;  0 k1 0;  x1(1 − k1)  y1(1 − k1)  1] [k2 0 0;  0 k2 0;  x2(1 − k2)  y2(1 − k2)  1]
  = [k1k2  0  0;   0  k1k2  0;   k2(1 − k1)x1 + (1 − k2)x2   k2(1 − k1)y1 + (1 − k2)y2   1].        (4.16)
The result is similar to Equation (Ans.2) except for the bottom row. It seems that the product of two scalings is a third scaling with a factor k1k2, but about what point? To write Equation (4.16) in the form of Equation (Ans.2), we write

k2(1 − k1)x1 + (1 − k2)x2 = xc(1 − k1k2),
k2(1 − k1)y1 + (1 − k2)y2 = yc(1 − k1k2),

and solve for (xc, yc), obtaining

xc = [k2(1 − k1)x1 + (1 − k2)x2]/(1 − k1k2),
yc = [k2(1 − k1)y1 + (1 − k2)y2]/(1 − k1k2).

The center of the double scaling is therefore the point

Pc = [k2(1 − k1)/(1 − k1k2)] P1 + [(1 − k2)/(1 − k1k2)] P2 = aP1 + bP2.

Notice that a + b = 1, which is why Pc is a point on the straight segment connecting P1 and P2 (see also Equation (Ans.42)). In the special case P1 = P2, it is easy to see that the center of the double scaling is Pc = P1 = P2.

Exercise 4.28: What is the result of two consecutive scalings with the same scaling factors but about two different points?
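The formula for Pc can be checked with a quick sketch (plain Python; `scale_about` and the sample numbers are ours):

```python
def scale_about(pt, k, center):
    # Uniform scaling by k about an arbitrary center point; this is the
    # bottom-row translation form used in Equation (4.16).
    x, y = pt
    x0, y0 = center
    return (k * x + x0 * (1 - k), k * y + y0 * (1 - k))

k1, k2 = 2.0, 3.0
P1, P2 = (1.0, 1.0), (5.0, 3.0)
p = (2.0, 7.0)

# Two consecutive scalings...
q = scale_about(scale_about(p, k1, P1), k2, P2)

# ...equal one scaling by k1*k2 about the center Pc derived above.
a = k2 * (1 - k1) / (1 - k1 * k2)
b = (1 - k2) / (1 - k1 * k2)
Pc = (a * P1[0] + b * P2[0], a * P1[1] + b * P2[1])
r = scale_about(p, k1 * k2, Pc)

assert abs(q[0] - r[0]) < 1e-9 and abs(q[1] - r[1]) < 1e-9
assert abs(a + b - 1) < 1e-12     # Pc lies on the segment P1P2
```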
Figure 4.11: A Unit Circle.
Exercise 4.29: Show that all the points with coordinates (t², t), where 0 ≤ t ≤ 1, after being transformed by

[−1 0 1;  0 2 0;  1 0 1],

lie on the perimeter of the unit circle x² + y² = 1. (Hint: See Figure 4.11.)

It is easy to see that the transformations discussed here can change lengths and angles. Scaling changes the lengths of objects. Rotation and shearing change angles. One feature that's preserved, though, is parallel lines. A pair of parallel lines will remain parallel after any scaling, reflection, rotation, shearing, and translation. A transformation that preserves parallelism (and also maps finite points to finite points) is called affine.
4.2.3 Fast Rotations

Rotation requires the calculation of the transcendental sine and cosine functions, which is time consuming. If many rotations are needed, it is preferable to precompute the trigonometric functions for many angles and store them in a table. This section shows how to do this using integers only, a method that results in much faster rotations than using floating-point numbers.

The method is illustrated for the first quadrant (rotation angles of 0° to 90°) in increments of 1°. Notice that rotations in other quadrants can be achieved by a first-quadrant rotation followed by a reflection. The following Mathematica code generates 91 sine values, from sin 0° = 0 to sin 90° = 1, multiplies each by 2^14 = 16,384, rounds them, and stores them in a table as 16-bit integers ranging from 0 to 16,384:

d2r=Pi/180;
Table[Round[N[16384*Sin[i*d2r]]], {i,0,90}]

The 91 values are listed in Table 4.12, but notice that they are only approximations of the true sine values. (Even floating-point sine values are, in general, just approximations, but normally better than our integers.) This means that the use of this table for many successive rotations of a point may place it farther and farther away from its true position. When we perform many successive rotations of an object that consists
 θ  sin θ    θ  sin θ    θ  sin θ    θ  sin θ    θ  sin θ
 0      0    1    286    2    572    3    857    4   1143
 5   1428    6   1713    7   1997    8   2280    9   2563
10   2845   11   3126   12   3406   13   3686   14   3964
15   4240   16   4516   17   4790   18   5063   19   5334
20   5604   21   5872   22   6138   23   6402   24   6664
25   6924   26   7182   27   7438   28   7692   29   7943
30   8192   31   8438   32   8682   33   8923   34   9162
35   9397   36   9630   37   9860   38  10087   39  10311
40  10531   41  10749   42  10963   43  11174   44  11381
45  11585   46  11786   47  11982   48  12176   49  12365
50  12551   51  12733   52  12911   53  13085   54  13255
55  13421   56  13583   57  13741   58  13894   59  14044
60  14189   61  14330   62  14466   63  14598   64  14726
65  14849   66  14968   67  15082   68  15191   69  15296
70  15396   71  15491   72  15582   73  15668   74  15749
75  15826   76  15897   77  15964   78  16026   79  16083
80  16135   81  16182   82  16225   83  16262   84  16294
85  16322   86  16344   87  16362   88  16374   89  16382
90  16384

Table 4.12: Sine Values as 16-Bit Integers.
of many points, placing points away from where they should be generally results in a deformation of the object. We assume that the points are represented by coordinates that are 16-bit integers. Calculating the rotated coordinates (x∗ , y ∗ ) of a point (x, y) can now be done, for example, by x∗ = rshift(x × Table(90 − θ), 14) − rshift(y × Table(θ), 14), y ∗ = rshift(x × Table(θ), 14) + rshift(y × Table(90 − θ), 14). Notice how the required cosine values are obtained from the end of the table. This method works because the table has 91 entries. Multiplying a 16-bit integer coordinate by a 16-bit integer sine value yields a 32-bit product. The right shift effectively divides the product by 214 = 16,384, a necessary operation because our integer sine values have been premultiplied by this scale factor. Exercise 4.30: Use this method to calculate the results of rotating point (1, 2) by 60◦ and by 80◦ . In each case, compare the results with those obtained when built-in sine and cosine functions are used.
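The table lookup and shifts translate directly into code. A Python sketch (the function name is ours; the table is built with floating point once, as the Mathematica code above does):

```python
import math

# The 91-entry table of Section 4.2.3: sin(0..90 degrees) times 2^14, rounded.
TABLE = [round(16384 * math.sin(math.radians(i))) for i in range(91)]

def rotate_fixed(x, y, theta):
    # Integer rotation of (x, y) through theta degrees (0 <= theta <= 90),
    # using the formulas of the text: cosine values come from the far end
    # of the table, and each product is right-shifted by 14.
    s, c = TABLE[theta], TABLE[90 - theta]
    return ((x * c >> 14) - (y * s >> 14),
            (x * s >> 14) + (y * c >> 14))

print(rotate_fixed(1000, 2000, 60))   # (-1232, 1866)
t = math.radians(60)
print(1000 * math.cos(t) - 2000 * math.sin(t),
      1000 * math.sin(t) + 2000 * math.cos(t))   # about -1232.05 and 1866.03
```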
4.2.4 CORDIC Rotations

We routinely use calculators to compute values of common functions, but have you ever wondered how a calculator determines the value of, say, tan 72.81° so fast? Many calculators use CORDIC (COordinate Rotation DIgital Computer), a general method for
computing many elementary functions. CORDIC was originally proposed by [Volder 59] and was extended by [Walther 71]. The original references are hard to find but are included in [Swartzlander 90]. Here, we show how CORDIC can be used to implement fast rotations.

It is sufficient to consider a rotation about the origin where the rotation angle θ is in the interval [0°, 90°) (the first quadrant). The special case θ = 90° can be implemented by the negate and exchange rule, Equation (4.6). Rotations in other quadrants can be achieved by a first-quadrant rotation, followed by a reflection. The rotation is expressed by [see Equation (4.4)]

(x∗, y∗) = (x, y) [cos θ  −sin θ;  sin θ  cos θ].        (4.17)

Because θ is less than 90°, we know that cos θ is nonzero, so we can factor out cos θ, yielding

(x∗, y∗) = cos θ (x, y) [1  −tan θ;  tan θ  1].

We now express θ as the sum Σ_{i=0}^{m} θi, where the angles θi are defined by the relation tan θi = 2^−i, or θi = arctan(2^−i). The first 16 θi, for i = 0, 1, . . . , 15, are listed in Table 4.13.

  i   θi (degrees)   θi (radians)     Ki
  0   45.            0.785398        0.70710678118654746
  1   26.5651        0.463648        0.63245553203367577
  2   14.0362        0.244979        0.61357199107789628
  3   7.12502        0.124355        0.60883391251775243
  4   3.57633        0.0624188       0.60764825625616825
  5   1.78991        0.0312398       0.60735177014129604
  6   0.895174       0.0156237       0.60727764409352614
  7   0.447614       0.00781234      0.60725911229889284
  8   0.223811       0.00390623      0.60725447933256249
  9   0.111906       0.00195312      0.60725332108987529
 10   0.0559529      0.000976562     0.60725303152913446
 11   0.0279765      0.000488281     0.60725295913894495
 12   0.0139882      0.000244141     0.60725294104139727
 13   0.00699411     0.00012207      0.60725293651701029
 14   0.00349706     0.0000610352    0.60725293538591352
 15   0.00174853     0.0000305176    0.60725293510313938

Table 4.13: The First 16 θi's and Scale Factors.
In order to express any angle θ as the sum of these particular θi , some θi will have to be subtracted. Consider, for example, θ = 58◦ . We start with θ0 = 45◦ . Since θ0 < θ, we add θ1 . The sum θ0 + θ1 = 45 + 26.5651 = 71.5651 is greater than θ, so we subtract θ2 . The new sum, 57.5289, is less than θ, so we add θ3 , and so on.
Exercise 4.31: We want to be able to express any angle θ in the range [0°, 90°) by adding and subtracting a number of consecutive θi, from θ0 to some θm, without skipping any θi in between. Is that possible?

It is easy to write a program that decides which of the θi's should be added and which should be subtracted. Thus, we end up with

θ = Σ_{i=0}^{m} di θi = Σ_{i=0}^{m} di arctan(2^−i),   where di = ±1.
Once the number m of necessary di's and their values have been determined, we rotate (x, y) to (x∗, y∗) in a loop where each iteration rotates a point (xi, yi) through an angle di θi to a point (xi+1, yi+1). A general iteration can be expressed in the form

(xi+1, yi+1) = cos(di θi) (xi, yi) [1  −di tan θi;  di tan θi  1]
            = cos(di θi) (xi, yi) [1  −di 2^−i;  di 2^−i  1]
            = cos(di θi) (xi + yi di 2^−i, yi − xi di 2^−i).        (4.18)
We interpret the result (xi+1, yi+1) of an iteration as the vector from the origin to point (xi+1, yi+1). Equation (4.18) shows that this vector is the product of two terms. The second term, (xi + yi di 2^−i, yi − xi di 2^−i), determines the direction of the vector, while the first term, cos(di θi), affects only the magnitude of the vector. The second term is easy to calculate since it just involves shifts. We know that di is just a sign and that a product of the form xi 2^−i can be computed by shifting xi i positions to the right. The problem is to calculate the first term, cos(di θi), and to multiply the two terms. This is why CORDIC proceeds by first performing all the iterations

(xi+1, yi+1) ← (xi + yi di 2^−i, yi − xi di 2^−i)

using just right shifts and additions/subtractions; the cosine terms are ignored. The result is a vector that points in the right direction but is too long (Figure 4.14). To bring this vector to its correct size, it should be multiplied by the scale factor

Km = Π_{i=0}^{m} cos θi.
(Notice that cos(di θi ) = cos θi since cosine is an even function.) This is discouraging because it suggests that m multiplications are needed just to calculate the scale factor Km . However, the first 16 scale factors are listed in Table 4.13 and even a quick glance shows that they converge to the number 0.60725. . . . Reference [Vachss 87] shows that Km can be obtained simply by using the m most significant bits of this number and ignoring the rest.
Using the identity sin²θ + cos²θ = 1 and the definition tan θi = 2^−i, we get

cos θi = 1/√(1 + tan²θi) = 1/√(1 + 2^−2i),

which is why the scale factors of Table 4.13 were so easily calculated to a high precision by the code

N[Table[Product[(2^(-2i)+1)^(-1/2),{i,0,n}],{n,0,16}],17]//TableForm
Figure 4.14: CORDIC Rotation.
Exercise 4.32: Suggest another way to calculate Km . Any practical CORDIC implementation (see [Jarvis 90] for a C program) should have the following two features. 1. CORDIC employs only shifts and additions/subtractions, so any implementation should use fixed-point, instead of floating-point, arithmetic. This is fast since shifting and adding fixed-point numbers can be done with integer operations. Notice that all the numbers involved in the computations are less than unity, except perhaps the original coordinates (x, y). A software package for graphics employing this method should therefore use normalized coordinates (fixed-point numbers in the interval [0, 1]) throughout and perform all the calculations on these small numbers. Each iteration results in a pair (xi+1 , yi+1 ) that’s slightly larger than its predecessor, but the last iteration results in a pair that can be larger than (x, y) by a factor of at most 1/0.60725 . . . = 1.64676 . . .. This pair is then scaled down when multiplied by Km . The final step is to scale the final coordinates up. All this suggests a 32-bit fixed-point format where the leftmost bit is reserved, as usual, for the sign, the next two bits are the integer part, and the remaining 29 bits are
the fractional part (29 bits being equivalent to 9 decimal digits). The largest number that can be represented by this format is 11.11…1₂ = 3.999… and the smallest one is 100…0₂ = −4. It's a good idea to reserve two bits for the integer part because (1) even though all the numbers involved are 1 or smaller, some intermediate results may be greater than 1 and (2) this convention makes it possible to represent the important constants π, e, and φ (the Golden Ratio).

2. Earlier, we said, "It is easy to write a program that decides which of the θi's should be added and which should be subtracted." The practical way to do this is to initialize a variable z to θ and try to drive z to zero during the iterations. In iteration i the program should calculate both z + θi and z − θi, select the value that's closer to zero, use it to decide whether to add or subtract θi, and then update z. If z − θi is closer to zero, then θi should be added; otherwise, θi should be subtracted. An example is θ = 58°. We initialize z to 58. In iteration 0, it is clear that 58 − 45 = 13 is closer to zero than 58 + 45. The program therefore adds θ0 and updates z to 13. In iteration 1, the program finds that 13 − 26.5651 = −13.5651 is closer to zero than 13 + 26.5651, so it adds θ1 and updates z to −13.5651. In iteration 2, the program discovers that −13.5651 + 14.0362 = 0.4711 is closer to zero than −13.5651 − 14.0362, so it subtracts θ2 and updates z to 0.4711.

Finally, we realize that there is really no need to compare z + θi and z − θi in iteration i. We simply start by selecting d0 = +1 and update z by subtracting z ← z − θ0, z ← z − θ1, etc., until we get a negative value in z. We then change di to −1 (the new sign of z) and update z by z ← z − diθi (which now amounts to adding θi to z). This is summarized by the Mathematica code of Figure 4.15. (But note that the Sign function of Mathematica returns +1, 0, or −1, while we need a result of +1 or −1.
The code as shown is simple but not completely general.)

t=Table[ArcTan[2.^(-i)], {i,0,15}]; (* arctans in radians *)
d=1; x=2.1; y=0.34; z=46. Degree;
Do[{Print[i,", ",x,", ",y,", ",z,", ",d],
    xn=x+y d 2^(-i), yn=y-x d 2^(-i),
    zn=z-d t[[i+1]], d=Sign[zn],
    x=xn, y=yn, z=zn}, {i,0,14}]
Print[0.60725x,", ",0.60725y]

Figure 4.15: Mathematica Code for CORDIC Rotations.
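For readers without Mathematica, the idea is easy to prototype in floating-point Python. The sketch below (mine, not the book's; a production version would use the fixed-point format discussed above) rotates a point counterclockwise through an angle by driving the residual angle z to zero:

```python
import math

def cordic_rotate(x, y, angle, n=40):
    """Rotate (x, y) counterclockwise by `angle` (radians) using only
    additions and multiplications by 2^-i (shifts), plus one final
    multiplication by the accumulated scale factor."""
    z = angle
    k = 1.0                         # accumulated scale factor K_n
    for i in range(n):
        d = 1 if z >= 0 else -1     # sign of the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return x * k, y * k

x, y = cordic_rotate(1.0, 0.0, math.radians(60))
```

Only the final scaling by Kn involves a general multiplication; everything inside the loop is a shift or an addition.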
Compared to other approaches, CORDIC is a clear winner when a hardware multiplier is unavailable (e.g., in a microcontroller) or when you want to save the gates required to implement one (e.g., in an FPGA). On the other hand, when a hardware multiplier is available (e.g., in a DSP microprocessor), table-lookup methods and good old-fashioned power series are generally faster than CORDIC.
—Grant R. Griffin, www.dspguru.com/info/faqs/cordic.htm

Exercise 4.33: Instead of using the complex CORDIC method, wouldn't it be simpler to perform a rotation by a direct use of Equation (4.17)? After all, this only requires the calculation of one sine and one cosine value.
4.2.5 Similarities
A similarity is a transformation that scales all distances by a fixed factor. It is easy to show that a similarity is produced by the special transformation matrix

$$\begin{pmatrix} a & c & 0 \\ -c & a & 0 \\ m & n & 1 \end{pmatrix}.$$

To show this, we observe that translations preserve distances, so we can ignore the translation part of the matrix above and restrict ourselves to the matrix $\left(\begin{smallmatrix} a & c \\ -c & a \end{smallmatrix}\right)$. It transforms a point P = (x, y) to the point P∗ = (x∗, y∗) = (ax − cy, cx + ay). Given the two transformations P1 → P∗1 and P2 → P∗2, it is straightforward to show that

$$\begin{aligned}
\operatorname{distance}^2(\mathbf{P}^*_1\mathbf{P}^*_2) &= (\Delta x^*)^2 + (\Delta y^*)^2\\
&= [(ax_2 - cy_2) - (ax_1 - cy_1)]^2 + [(cx_2 + ay_2) - (cx_1 + ay_1)]^2\\
&= (a\,\Delta x - c\,\Delta y)^2 + (c\,\Delta x + a\,\Delta y)^2\\
&= a^2\Delta x^2 - 2ac\,\Delta x\Delta y + c^2\Delta y^2 + c^2\Delta x^2 + 2ac\,\Delta x\Delta y + a^2\Delta y^2\\
&= (a^2 + c^2)(\Delta x^2 + \Delta y^2) = (a^2 + c^2)\operatorname{distance}^2(\mathbf{P}_1\mathbf{P}_2),
\end{aligned}$$

implying that distance(P∗1 P∗2) = √(a² + c²) distance(P1 P2). Thus, all distances are scaled by a factor of √(a² + c²).

In general, a similarity is a transformation of the form P∗ = (x∗, y∗) = (ax − cy + m, ±(cx + ay) + n), where the ratio of expansion (or shrinking) is k = ±√(a² + c²). If k is positive, the similarity is called direct; if k is negative, the similarity is opposite.

Exercise 4.34: Discuss the case k = 0.

Using the ratio k, we can write a similarity (ignoring the translation part) as the product

$$\begin{pmatrix} a & c & 0 \\ -c & a & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} a/k & c/k & 0 \\ -c/k & a/k & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

which shows that a similarity is a combination of a scaling/reflection (by a factor k) and a rotation. (The definition of k implies that (a/k)² + (c/k)² = 1, so we can consider c/k and a/k the sine and cosine of the rotation angle, respectively.)
4.2.6 A 180° Rotation
Another interesting example of combining transformations is a 180° rotation about a fixed point P = (Px, Py). This combination is called a halfturn. It is performed, as usual, by translating P to the origin, rotating 180° about the origin, and translating back.
The transformation matrix is (notice that cos 180° = −1)

$$\mathbf{T} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -P_x & -P_y & 1 \end{pmatrix}\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ P_x & P_y & 1 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 2P_x & 2P_y & 1 \end{pmatrix}.$$

A general point (x, y) is therefore transformed by a halfturn to

$$(x, y, 1)\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 2P_x & 2P_y & 1 \end{pmatrix} = (-x + 2P_x, -y + 2P_y, 1) \tag{4.19}$$
(Figure 4.16a), but it's more interesting to explore the effect of two consecutive halfturns, about points P and Q. The second halfturn transforms point (−x + 2Px, −y + 2Py, 1) to

$$(-x + 2P_x, -y + 2P_y, 1)\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 2Q_x & 2Q_y & 1 \end{pmatrix} = (x - 2P_x + 2Q_x, y - 2P_y + 2Q_y, 1). \tag{4.20}$$

If P = Q, then the result of the second halfturn is (x, y), showing how two identical 180° rotations return a point to its original location. If P and Q are different, the result is a translation of the original point (x, y) by factors −2Px + 2Qx and −2Py + 2Qy (Figure 4.16b).
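The composition rule is easy to check numerically. This Python sketch (illustrative values, not from the book) applies Equation (4.19) twice and confirms that two halfturns about P and then Q amount to a translation by 2(Q − P):

```python
def halfturn(point, center):
    """180-degree rotation of `point` about `center`, Equation (4.19)."""
    x, y = point
    cx, cy = center
    return (-x + 2 * cx, -y + 2 * cy)

P, Q = (1.0, 2.0), (4.0, -1.0)
once = halfturn((5.0, 7.0), P)    # (-5 + 2, -7 + 4) = (-3, -3)
twice = halfturn(once, Q)         # = (5, 7) translated by 2(Q - P)
```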
Figure 4.16: Halfturns.
Exercise 4.35: What is the result of three consecutive halfturns about the distinct points P, Q, and R?

Things turn out best for the people who make the best out of the way things turn out.
—Art Linkletter.
4.2.7 Glide Reflections
This transformation is a special combination of three reflections. Imagine the two vertical parallel lines x = L and x = M and the horizontal line y = N (Figure 4.17a). Reflecting a point P = (x, y) about the line x = L is done by translating the line to the y axis, reflecting about that axis, and translating back. The transformation matrix is

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -L & 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ L & 0 & 1 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 2L & 0 & 1 \end{pmatrix},$$

and the transformed point is

$$(x, y, 1)\begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 2L & 0 & 1 \end{pmatrix} = (-x + 2L, y, 1).$$

Reflecting this point about the line x = M results in

$$(-x + 2L, y, 1)\begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 2M & 0 & 1 \end{pmatrix} = (x - 2L + 2M, y, 1)$$

(a translation), and reflecting this about the horizontal line y = N yields

$$(x - 2L + 2M, y, 1)\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 2N & 1 \end{pmatrix} = (x - 2L + 2M, -y + 2N, 1).$$
This particular glide reflection is therefore a translation in x and a reflection in y. A general glide reflection is the product of three reflections, the first two about parallel lines L and M and the third about a line N perpendicular to them (Figure 4.17b).
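The net effect, (x, y) → (x − 2L + 2M, −y + 2N), can be verified by composing the three individual reflections. A small Python sketch (illustrative values and names, not from the book):

```python
def reflect_about_vertical(point, a):
    """Reflect about the line x = a."""
    x, y = point
    return (-x + 2 * a, y)

def reflect_about_horizontal(point, b):
    """Reflect about the line y = b."""
    x, y = point
    return (x, -y + 2 * b)

def glide(point, L, M, N):
    p = reflect_about_vertical(point, L)
    p = reflect_about_vertical(p, M)
    return reflect_about_horizontal(p, N)

L, M, N = 1.0, 3.0, 2.0
g = glide((0.5, 0.5), L, M, N)    # expect (0.5 - 2L + 2M, -0.5 + 2N)
```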
Figure 4.17: Glide Reflection.
4.2.8 Improper Rotations
A rotation followed by a reflection about one of the coordinate axes is called an improper rotation. The transformation matrices for the two possible improper rotations in two dimensions (Figure 4.18) are

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix},$$
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} -\cos\theta & -\sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix},$$

and the transformation rules therefore are

$$x^* = x\cos\theta + y\sin\theta, \quad y^* = x\sin\theta - y\cos\theta,$$
$$x^* = -x\cos\theta - y\sin\theta, \quad y^* = -x\sin\theta + y\cos\theta.$$

Notice that the determinant of an improper rotation matrix equals −1, like that of a pure reflection.

Figure 4.18: Improper Rotations.
An improper rotation differs from a rotation in one important aspect. When we rotate an object through a small angle and repeat this transformation, the object seems to move smoothly along a circle. Each time we repeat an improper rotation, however, the object “jumps” from one side of the coordinate plane to the other. The total effect is very different from that of a smooth circular movement.
4.2.9 Decomposing Transformations
Sometimes, a certain transformation A may be equivalent to the combined effects of several different transformations B, C, and D. We say that A can be decomposed into B, C, and D. Mathematically, this is equivalent to saying that the original transformation matrix TA equals the product TB TC TD. We have already seen that a rotation in two dimensions can be decomposed into a scaling followed by a shearing; here are other examples.

It may come as a surprise that the general two-dimensional transformation matrix, Equation (4.7), can be written as a product of shearing, scaling, rotation, and translation
as follows:

$$\begin{pmatrix} a & b & 0 \\ c & d & 0 \\ m & n & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ (ac+bd)/A^2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} A & 0 & 0 \\ 0 & (ad-bc)/A & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} a/A & b/A & 0 \\ -b/A & a/A & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ m & n & 1 \end{pmatrix}, \tag{4.21}$$

where A = √(a² + b²). The third matrix produces rotation since (a/A)² + (b/A)² = 1. Even something as simple as shearing in one direction can be written as the product of a unit shearing and two scalings:

$$\begin{pmatrix} 1 & 0 & 0 \\ c & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1/c & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} c & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
Even the simple transformation of a unit shearing can be decomposed into a product that involves a scaling and two rotations. Note that the Golden Ratio φ is involved:

$$\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \varphi & 0 & 0 \\ 0 & 1/\varphi & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\beta & \sin\beta & 0 \\ -\sin\beta & \cos\beta & 0 \\ 0 & 0 & 1 \end{pmatrix},$$

where α = tan⁻¹ φ ≈ 58.28° and β = tan⁻¹(1/φ) ≈ 31.72°. (This is indeed a surprising result. It means that a clockwise rotation of 58.28°, followed by a scaling of φ in the x direction and 1/φ in the y direction, followed by a counterclockwise rotation of 31.72°, is equivalent to a unit shear in the x direction. This is illustrated by Figure 4.19.)

Geometry has two great treasures: one the Theorem of Pythagoras; the other, the division of a line into extreme and mean ratio. The first we may compare to a measure of gold; the second we may name a precious jewel.
—Johannes Kepler.
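The decomposition can be verified numerically. The Python sketch below (an independent check, not from the book) multiplies the three 2×2 factors in the row-vector convention and recovers the unit shear:

```python
import math

phi = (1 + math.sqrt(5)) / 2        # the Golden Ratio
alpha = math.atan(phi)              # about 58.28 degrees
beta = math.atan(1 / phi)           # about 31.72 degrees

def matmul(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

R1 = [[math.cos(alpha), -math.sin(alpha)],
      [math.sin(alpha),  math.cos(alpha)]]   # clockwise rotation (row vectors)
S  = [[phi, 0], [0, 1 / phi]]                # scale by phi and 1/phi
R2 = [[ math.cos(beta), math.sin(beta)],
      [-math.sin(beta), math.cos(beta)]]     # counterclockwise rotation

shear = matmul(matmul(R1, S), R2)            # should equal [[1, 0], [1, 1]]
```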
Exercise 4.36: Given the transformation

x∗ = 3x − 2y + 1,  y∗ = 4x + 5y − 6,

calculate the transformation matrix and decompose it into a product of four matrices as shown in Equation (4.21).
Figure 4.19: Shearing Decomposed into Rotation and Scaling. ((a)→(b): rotate 58° clockwise; (b)→(c): scale by 1.618 and 0.618; (c)→(d): rotate 32° counterclockwise.)
4.2.10 Reconstructing Transformations
Given a sequence of two-dimensional transformations, we normally write the 3×3 matrix for each and then multiply the matrices. The result is another 3×3 matrix which is used to transform all the points of an object. An interesting question is: Given the points of an object before and after a transformation, can we reconstruct the transformation matrix from them? The answer is yes! The general two-dimensional transformation matrix depends on six numbers, so all we need are six equations involving transformed points. Since each point consists of two numbers, three points are enough to reconstruct the transformation matrix. Given three points both before (P1, P2, P3) and after (P∗1, P∗2, P∗3) a transformation, we can write the three equations P∗1 = P1T, P∗2 = P2T, and P∗3 = P3T and solve for the six elements of T.

Example: The three points (1, 1), (1, 0), and (0, 1) are transformed to (3, 4), (2, −1), and (0, 2), respectively. We write the general transformation (x∗, y∗) = (ax + cy + m, bx + dy + n) for the three sets

(3, 4) = (a + c + m, b + d + n),
(2, −1) = (a + m, b + n),
(0, 2) = (c + m, d + n),

and this is easily solved to yield a = 3, b = 2, c = 1, d = 5, m = −1, and n = −3. The
transformation matrix is therefore

$$\mathbf{T} = \begin{pmatrix} 3 & 2 & 0 \\ 1 & 5 & 0 \\ -1 & -3 & 1 \end{pmatrix}.$$

Exercise 4.37: Inverse transformations. From P∗ = PT, we get P∗T⁻¹ = PTT⁻¹ or P = P∗T⁻¹. We can therefore reconstruct an original point P from the transformed one, P∗, if we know the inverse of the transformation matrix T. In general, the inverse of the 3×3 matrix

$$\mathbf{T} = \begin{pmatrix} a & b & 0 \\ c & d & 0 \\ m & n & 1 \end{pmatrix}$$

is

$$\mathbf{T}^{-1} = \frac{1}{ad-bc}\begin{pmatrix} d & -b & 0 \\ -c & a & 0 \\ cn-dm & bm-an & 1 \end{pmatrix}. \tag{4.22}$$
Calculate the inverses of the transformation matrices for scaling, shearing, rotation, and translation, and discuss their properties.

Exercise 4.38: Given that the four points

P1 = (0, 0), P2 = (0, 1), P3 = (1, 1), and P4 = (1, 0)

are transformed to

P∗1 = (0, 0), P∗2 = (2, 3), P∗3 = (8, 4), and P∗4 = (6, 1),

reconstruct the transformation matrix.
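Reconstructing T from three point pairs is a small linear solve. The Python sketch below (illustrative; the helper names are mine) recovers (a, c, m) and (b, d, n) with a minimal Gauss-Jordan solver, using the example worked out above as a check:

```python
def solve3(M, v):
    """Solve a 3x3 linear system M x = v by Gauss-Jordan elimination."""
    A = [row[:] + [val] for row, val in zip(M, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]

def reconstruct(points, images):
    """Recover a, b, c, d, m, n of x* = ax + cy + m, y* = bx + dy + n
    from three point pairs (one 3x3 system per coordinate)."""
    M = [[x, y, 1] for x, y in points]
    a, c, m = solve3(M, [xs for xs, ys in images])
    b, d, n = solve3(M, [ys for xs, ys in images])
    return a, b, c, d, m, n

coeffs = reconstruct([(1, 1), (1, 0), (0, 1)], [(3, 4), (2, -1), (0, 2)])
```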
4.2.11 A Note
All the expressions derived so far for transformations are based on the basic relation P∗ = PT. Some authors prefer the equivalent relation P∗ = TP, which changes the mathematics somewhat. If we want the coordinates of the transformed point to be the same as before (i.e., x∗ = ax + cy + m, y∗ = bx + dy + n), we have to write the relation P∗ = TP in the form

$$\begin{pmatrix} x^* \\ y^* \\ 1 \end{pmatrix} = \begin{pmatrix} a & c & m \\ b & d & n \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}.$$

The first difference is that both P and P∗ are columns instead of rows. This is because of the rules of matrix multiplication. The second difference is that the new transformation matrix T is the transpose of the original one. Hence, rotation, for example, is achieved by the matrices

$$\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

for a clockwise rotation, and

$$\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

for a counterclockwise rotation. Similarly, translation is done by

$$\begin{pmatrix} 1 & 0 & m \\ 0 & 1 & n \\ 0 & 0 & 1 \end{pmatrix} \quad\text{instead of}\quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ m & n & 1 \end{pmatrix}.$$
4.2.12 Summary
The general two-dimensional affine transformation is given by

x∗ = ax + cy + m,  y∗ = bx + dy + n.

This section lists the values or constraints that should be assigned to the four coefficients a, b, c, and d in order to obtain certain types of transformations (we ignore translations).

A general affine transformation is obtained when ad − bc ≠ 0. The case ad − bc = 0 corresponds to a singular transformation.

The identity transformation is obtained for a = d = 1 and b = c = 0.

An isometry is obtained by a² + b² = c² + d² = 1 and ac + bd = 0. An isometry is a transformation that preserves distances. If P and Q are two points on an object, then the distance between them is preserved, meaning that the distance between P∗ and Q∗ is the same. Rotations, reflections, and translations are isometries. For an isometry with ad − bc = +1, the transformation is a rotation, and for ad − bc = −1, it is a reflection.

A similarity is obtained for a² + b² = c² + d² and ac + bd = 0. A similarity is a transformation that preserves the ratios of lengths. A typical similarity is scaling, but it may be combined with rotation, reflection, and translation.

An equiareal transformation (preserving areas) is obtained when |ad − bc| = 1.

A shearing in the x direction is caused by a = d = 1, b = 0, and c ≠ 0. Similarly, a shearing in the y direction corresponds to a = d = 1, c = 0, and b ≠ 0.

A uniform scaling is a = d > 0 and b = c = 0. (The identity is a special case of scaling.) A uniform reflection is a = d < 0 and b = c = 0. A rotation is the result of a = d = cos θ and b = −c = sin θ.
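These conditions translate directly into small predicates on the coefficients. A Python sketch (the function names are mine, not the book's):

```python
import math

def is_isometry(a, b, c, d, eps=1e-12):
    return (abs(a * a + b * b - 1) < eps and abs(c * c + d * d - 1) < eps
            and abs(a * c + b * d) < eps)

def is_similarity(a, b, c, d, eps=1e-12):
    return (abs((a * a + b * b) - (c * c + d * d)) < eps
            and abs(a * c + b * d) < eps)

def is_equiareal(a, b, c, d, eps=1e-12):
    return abs(abs(a * d - b * c) - 1) < eps

# A rotation (a = d = cos t, b = -c = sin t) passes all three tests;
# a uniform scaling by 2 is a similarity but not an isometry.
t = math.radians(30)
rot = (math.cos(t), math.sin(t), -math.sin(t), math.cos(t))
```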
4.3 Three-Dimensional Coordinate Systems
We now turn to transformations in three dimensions. In most cases, the mathematics of linear transformations is easy to extend from two dimensions to three dimensions, but the discussion in this section demonstrates that certain transformations, most notably rotations, are more complex in three dimensions because there are more directions about which to rotate and because the simple terms clockwise and counterclockwise no longer apply in three dimensions. We start with a short discussion of coordinate systems in three dimensions.
Figure 4.20: Three-Dimensional Coordinate Systems. ((a) Left-handed and right-handed axes. (b) A left-handed system with observer and object. (c) A right-handed system.)
In two dimensions, there is only one Cartesian coordinate system, with two perpendicular axes labeled x and y (actually, the axes don’t have to be perpendicular, but this is irrelevant for our discussion of transformations). A coordinate system in three dimensions consists similarly of three perpendicular axes labeled x, y, and z, but there are two such systems, a left-handed and a right-handed (Figure 4.20a), and they are different. A right-handed coordinate system is constructed by the following rule. Align your right thumb with the positive x axis and your right index finger with the positive y axis. Your right middle finger will then point in the direction of positive z. The rule for a left-handed system uses the left hand in a similar manner. It is also possible to define a left-handed coordinate system as the mirror image (reflection) of a right-handed system. Notice that one coordinate system cannot be transformed into the other by translating or rotating it.
The difference between left-handed and right-handed coordinate systems becomes important when a three-dimensional object is projected on a two-dimensional screen (Chapter 6). We assume that the screen is positioned at the xy plane with its origin (i.e., its bottom-left corner) at the origin of the three-dimensional system. We also assume that the object to be projected is located on the positive side of the z axis and the viewer is located on the negative side, looking at the projection of the image on the screen. Figure 4.20b shows that in a left-handed three-dimensional coordinate system, the directions of the positive x and y axes on the screen coincide with those of the three-dimensional x and y axes. However, in a right-handed system (Figure 4.20c) the two-dimensional x axis (on the screen) and the three-dimensional x axis point in opposite directions. Principle: Express co-ordinate ideas in similar form. This principle, that of parallel construction, requires that expressions of similar content and function should be outwardly similar. The likeness of form enables the reader to recognize more readily the likeness of content and function. Familiar instances from the Bible are the Ten Commandments, the Beatitudes, and the petitions of the Lord’s Prayer. —W. Strunk Jr. and E. B. White, The Elements of Style.
4.4 Three-Dimensional Transformations
We derive three-dimensional transformations by extending the methods used in two-dimensional transformations, especially the concept of homogeneous coordinates. A three-dimensional point P = (x, y, z, 1) is transformed to a point P∗ = (x∗, y∗, z∗, 1) by multiplying it by a 4×4 matrix

$$\mathbf{T} = \begin{pmatrix} a & b & c & p \\ d & e & f & q \\ h & i & j & r \\ l & m & n & s \end{pmatrix}. \tag{4.23}$$

The last column of T is not necessarily (0, 0, 0, 1)ᵀ and is used for projections. (See the discussion of n-point perspective on Page 319.) As a result, the product PT is the 4-tuple (X, Y, Z, H), where H equals xp + yq + zr + s and is generally not 1. The three coordinates (x∗, y∗, z∗) of P∗ are obtained by dividing (X, Y, Z) by H. Hence, (x∗, y∗, z∗) = (X/H, Y/H, Z/H).

The top-left 3×3 part of T is responsible for scaling and reflection (a, e, and j), shearing (b, c, f and d, h, i), and rotation (all nine elements). The three quantities l, m, and n are responsible for translation, and the only new parameters are those in the last column (p, q, r, s).

To understand the meaning of s, we examine the matrix T = diag(1, 1, 1, s). Multiplying P by T transforms (x, y, z, 1) into (x, y, z, s), so the new point has coordinates
(x/s, y/s, z/s). The parameter s is therefore responsible for global scaling (by a factor of 1/s). Its effect is identical to transforming by diag(1/s, 1/s, 1/s, 1).

Translation in three dimensions is a direct extension of the two-dimensional case. A point can be translated in the direction of any of the coordinate axes.

Scaling in three dimensions is simple. An object can be scaled about the origin along any of the three coordinate axes. To scale about another point P0, a sequence of three transformations is needed. The point should be translated to the origin, the scaling performed, and the point translated back. Notice that scaling an object is done by scaling all its points. Scaling a point does not change its dimensions (a point has no dimensions) but simply moves it to another location.

Shearing in three dimensions is difficult to visualize. It is controlled by the six off-diagonal matrix elements b, c, f, d, h, and i, which is why many variations are possible. Perhaps the best way to become familiar with three-dimensional shearing is to experiment with the effect of varying each of the six parameters. Figure 4.21 shows a few possible shearings of a rectangular box.
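The division by H is the only step beyond plain matrix multiplication. The Python sketch below (hypothetical helper names, not the book's code) applies a 4×4 matrix in the row-vector convention P∗ = PT and performs the division, using T = diag(1, 1, 1, s) to confirm the global scaling by 1/s described above:

```python
def transform(point, T):
    """Apply P* = P T in homogeneous coordinates and divide by H."""
    p = list(point) + [1.0]
    X, Y, Z, H = (sum(p[k] * T[k][j] for k in range(4)) for j in range(4))
    return (X / H, Y / H, Z / H)

def diag(a, b, c, d):
    return [[a, 0, 0, 0], [0, b, 0, 0], [0, 0, c, 0], [0, 0, 0, d]]

s = 2.0
scaled = transform((1.0, 2.0, 3.0), diag(1, 1, 1, s))   # global scale by 1/s
```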
Figure 4.21: Shearing in Three Dimensions.
Shearing: A transformation in which all points along a given line L remain fixed while other points are shifted parallel to L by a distance proportional to their perpendicular distance from L. Shearing a plane figure does not change its area. This can also be generalized to three dimensions, where planes are translated instead of lines. —Eric W. Weisstein, http://mathworld.wolfram.com/Shear.html
4.4.1 Reflection
It is easy to reflect a point (x, y, z) about any of the three coordinate planes xy, xz, or yz. All that is needed is to change the sign of one of the point's coordinates. In this section, we discuss the general case where an arbitrary plane and a point are given and we want to reflect the point about the plane. We proceed in three steps as follows: (1) we discuss planes and their equations (there is a similar discussion in Section 9.2.2), (2) we show how to determine the distance of a point from a given plane, and (3) we explain how to compute the reflection of a point about a plane.

The (implicit) equation of a straight line is Ax + By + C = 0, where A and B cannot both be zero. The equation of a flat plane is the direct extension Ax + By + Cz + D = 0, where A, B, and C cannot all be zero. Four equations are needed to calculate the four unknown coefficients A, B, C, and D. On the other hand, we know that any three independent (i.e., noncollinear) points Pi = (xi, yi, zi), i = 1, 2, 3, define a plane. Thus, we can write a set of four equations, three of which are based on three given points and
the fourth one expressing the condition that a general point (x, y, z) lies on the plane:

$$0 = \begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix}.$$
We cannot solve this system of equations because x, y, and z can have any values, but we don't need to solve it! We just have to guarantee that this system has a solution. In general, a homogeneous system of linear algebraic equations has a nontrivial solution if and only if its determinant is zero. The expression below assumes this and also expands the determinant by its top row:

$$0 = \begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix}
= x\begin{vmatrix} y_1 & z_1 & 1 \\ y_2 & z_2 & 1 \\ y_3 & z_3 & 1 \end{vmatrix}
- y\begin{vmatrix} x_1 & z_1 & 1 \\ x_2 & z_2 & 1 \\ x_3 & z_3 & 1 \end{vmatrix}
+ z\begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}
- \begin{vmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{vmatrix}.$$
This is of the form Ax + By + Cz + D = 0, so we conclude that

$$A = \begin{vmatrix} y_1 & z_1 & 1 \\ y_2 & z_2 & 1 \\ y_3 & z_3 & 1 \end{vmatrix},\quad
B = -\begin{vmatrix} x_1 & z_1 & 1 \\ x_2 & z_2 & 1 \\ x_3 & z_3 & 1 \end{vmatrix},\quad
C = \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix},\quad
D = -\begin{vmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{vmatrix}. \tag{4.24}$$
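Equation (4.24) translates directly into code. A Python sketch (the helper names are mine), checked against the three points of Exercise 4.42, where the expected plane is x + y + z = 3:

```python
def det3(M):
    """Determinant of a 3x3 matrix, expanded by the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def plane_from_points(P1, P2, P3):
    """Coefficients A, B, C, D of Equation (4.24)."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = P1, P2, P3
    A =  det3([[y1, z1, 1], [y2, z2, 1], [y3, z3, 1]])
    B = -det3([[x1, z1, 1], [x2, z2, 1], [x3, z3, 1]])
    C =  det3([[x1, y1, 1], [x2, y2, 1], [x3, y3, 1]])
    D = -det3([[x1, y1, z1], [x2, y2, z2], [x3, y3, z3]])
    return A, B, C, D

coeffs = plane_from_points((3, 0, 0), (0, 3, 0), (0, 0, 3))
```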
Exercise 4.39: Derive the expression of the plane containing the z axis and passing through the point (1, 1, 0).

Exercise 4.40: In the plane equation Ax + By + Cz + D = 0, if D = 0, then the plane passes through the origin. Assuming D ≠ 0, we can write the same equation as x/a + y/b + z/c = 1, where a = −D/A, b = −D/B, and c = −D/C. What is the geometrical interpretation of a, b, and c?

We operate with nothing but things which do not exist, with lines, planes, bodies, atoms, divisible time, divisible space—how should explanation even be possible when we first make everything into an image, into our own image!
—Friedrich Nietzsche.

In some practical applications, the normal to the plane and one point on the plane are known. It is easy to derive the plane equation in such a case. We assume that N is the (known) normal vector to the plane, P1 is a known point on the plane, and P is an arbitrary point in the plane. The vector P − P1 is perpendicular
to N, so their dot product N•(P − P1) equals zero. Since the dot product is distributive over vector addition and subtraction, we can write N • P = N • P1. The dot product N • P1 is just a number, to be denoted by s, so we get

N • P = s  or  Nx x + Ny y + Nz z − s = 0.  (4.25)

Equation (4.25) can now be written as Ax + By + Cz + D = 0, where A = Nx, B = Ny, C = Nz, and D = −s = −N • P1. The three unknowns A, B, and C are the components of the normal vector, and D can be calculated from any known point P1 on the plane. The expression N • P = s is a useful equation of the plane and is used in many applications.

Exercise 4.41: Given N = (1, 1, 1) and P1 = (1, 1, 1), calculate the plane equation.
r
P1
ws y
(a)
(3,0,0)
s
x
ur
P2
P
P3 (b)
Figure 4.22: (a). A Plane. (b) Three Points on a Plane.
Note that the direction in which the normal is pointing is irrelevant for the plane equation. Substituting (−A, −B, −C) for (A, B, C) would also change the sign of D, resulting in the same equation. However, the direction of the normal is important when a surface is to be shaded. We want the normal, in such a case, to point outside the surface. Often, this has to be done manually since the computer has no concept of the shape of the object in question and the meaning of the terms “inside” and “outside.” However, in cases where a plane is defined by three points, the direction of the normal can be specified by arranging the three points (in the data structure in memory) in a certain order. It is also easy to derive the equation of a plane when three points on the plane, P1 , P2 , and P3 , are known. In order for the points to define a plane, they should not be collinear. We consider the vectors r = P2 − P1 and s = P3 − P1 a local coordinate system on the plane. Any point P on the plane can be expressed as a linear combination P = ur + ws, where u and w are real numbers. Since r and s are local coordinates on the plane, the position of point P relative to the origin is expressed as (Figure 4.22b) P(u, w) = P1 + ur + ws,
−∞ < u, w < ∞.  (4.26)
Figure 4.23: Distance of a Point from a Plane.
Exercise 4.42: Given the three points P1 = (3, 0, 0), P2 = (0, 3, 0), and P3 = (0, 0, 3), write the equation of the plane defined by them.

The next step is to determine the distance between a point and a plane. Given the point P = (x, y, z) and the plane Ax + By + Cz + D = 0, we select an arbitrary point Q = (x0, y0, z0) on the plane. Since Q is on the plane, it satisfies Ax0 + By0 + Cz0 + D = 0 or −Ax0 − By0 − Cz0 = D. We construct the vector v from Q to P as the difference v = P − Q = (x − x0, y − y0, z − z0). Figure 4.23 shows that the required distance (the size of the vector from the plane to P that's perpendicular to the plane) is the component vN of v in the direction of the normal N = (A, B, C). This component is given by

$$v_N = \frac{|\mathbf{v}\cdot\mathbf{N}|}{|\mathbf{N}|}
= \frac{|A(x-x_0) + B(y-y_0) + C(z-z_0)|}{\sqrt{A^2+B^2+C^2}}
= \frac{|Ax+By+Cz - Ax_0 - By_0 - Cz_0|}{\sqrt{A^2+B^2+C^2}}
= \frac{|Ax+By+Cz+D|}{\sqrt{A^2+B^2+C^2}}. \tag{4.27}$$
If we omit the absolute value, then the distance becomes a signed quantity. We can think of the plane as if it divides all of space into two parts, one in the direction of N and the other on the other side of the plane. The distance is positive if P is located in that part of space pointed to by the normal (which is the case in Figure 4.23), and it is negative in the opposite case.

Exercise 4.43: What's the distance of a plane from the origin?

Now that we can figure out the distance between a point and a plane, the last step is to reflect a point about a given plane. We start with a point P = (x, y, z) and a plane Ax + By + Cz + D = 0. We denote the unit normal vector by N = (A, B, C)/√(A² + B² + C²) and the (signed) distance between P and the plane by d. To get from P to the plane, we have to travel a distance d in the direction of N. To arrive at the reflection point P∗, we should travel another d units in the same direction. Thus, the reflection P∗ of P is given by

$$\mathbf{P}^* = \mathbf{P} - 2d\mathbf{N} = \mathbf{P} - \frac{2(Ax+By+Cz+D)}{A^2+B^2+C^2}(A, B, C). \tag{4.28}$$
Exercise 4.44: Why P − 2dN and not P + 2dN?

Most neurotics have been mindful of their five W's since grammar school: why, why, why, why, why.
—Terri Guillemets.
Figure 4.24: Reflection in Three Dimensions: Examples.
Examples: We select (Figure 4.24a) the plane x + y = 0 and the point P = (1, 1, 1). Equation (4.28) becomes

$$\mathbf{P}^* = (1,1,1) - \frac{2(1+1)}{1+1+0}(1,1,0) = (-1,-1,1).$$

Similarly, point P = (0, 1, 2) is reflected to

$$\mathbf{P}^* = (0,1,2) - \frac{2(0+1)}{1+1+0}(1,1,0) = (-1,0,2).$$

We now select (Figure 4.24b) the plane x + y + z − 1 = 0 and the point P = (1, 1, 1). Equation (4.28) becomes

$$\mathbf{P}^* = (1,1,1) - \frac{2(1+1+1-1)}{1+1+1}(1,1,1) = -\frac{1}{3}(1,1,1).$$

Similarly, point P = (0, 0, 0) is reflected to

$$\mathbf{P}^* = (0,0,0) - \frac{2(0+0+0-1)}{1+1+1}(1,1,1) = \frac{2}{3}(1,1,1).$$

The special case of a reflection about one of the coordinate planes is also obtained from Equation (4.28). The equation of the xy plane, for example, is z = 0, where Equation (4.28) yields

$$\mathbf{P}^* = (x,y,z) - \frac{2(0+0+z+0)}{0^2+0^2+1^2}(0,0,1) = (x,y,-z).$$
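Equations (4.27) and (4.28) are short to implement, and the examples above make good test cases. A Python sketch (mine, not the book's code):

```python
import math

def signed_distance(P, plane):
    """Signed distance of P from the plane Ax + By + Cz + D = 0, Eq. (4.27)."""
    (x, y, z), (A, B, C, D) = P, plane
    return (A * x + B * y + C * z + D) / math.sqrt(A * A + B * B + C * C)

def reflect(P, plane):
    """Reflection of P about the plane, Equation (4.28)."""
    (x, y, z), (A, B, C, D) = P, plane
    t = 2 * (A * x + B * y + C * z + D) / (A * A + B * B + C * C)
    return (x - t * A, y - t * B, z - t * C)

p1 = reflect((1, 1, 1), (1, 1, 0, 0))     # plane x + y = 0
p2 = reflect((0, 0, 0), (1, 1, 1, -1))    # plane x + y + z - 1 = 0
```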
4.4.2 Rotation
Rotation in three dimensions is difficult to visualize and is often confusing. One approach to rotations is to write three rotation matrices that rotate about the three coordinate axes:

$$\begin{pmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},\quad
\begin{pmatrix} \cos\theta & 0 & -\sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ \sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},\quad
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{4.29}$$

Let's look at the first of these matrices. Its third row and third column are (0, 0, 1, 0), which is why multiplying a point (x, y, z, 1) by this matrix leaves its z coordinate unchanged. The sines and cosines in the first two rows and two columns mix up the x and y coordinates in a way similar to a two-dimensional rotation, Equation (4.4). Thus, this transformation matrix causes a rotation about the z axis. The two other matrices rotate about the y and x axes.

Figure 4.25: Rotating About the Coordinate Axes.
Okay, so I assume going into this tutorial that you know how to perform matrix multiplication. I don't care to explain it, and it's available all over the Internet. However, once you know how to perform that operation, you should be good to go for this tutorial. (Found on the Internet).

It is therefore easy to identify the axis of rotation for each of the three rotation matrices of Equation (4.29), but what about their direction of rotation? To figure out the directions, we select θ = 90° and substitute sin θ = 1 and cos θ = 0. Simple tests in a right-handed coordinate system show that the first matrix of Equation (4.29) (rotation about the z axis) rotates point (1, 0, 0) to (0, −1, 0) and point (0, 1, 0) to (1, 0, 0). Thus, when we observe this 90° rotation looking in the direction of positive z, the rotation is counterclockwise (Figure 4.25a). The second matrix, however, behaves differently. It rotates point (1, 0, 0) to (0, 0, −1) and point (0, 0, 1) to (1, 0, 0). When we observe this 90° rotation about the y axis looking in the direction of positive y, the rotation is clockwise (Figure 4.25b). The third matrix (rotation about the x axis) rotates point (0, 1, 0) to (0, 0, −1) and point (0, 0, 1) to (0, 1, 0). When we observe this 90° rotation looking in the direction of positive x, the rotation is counterclockwise (Figure 4.25c). We therefore decide (somewhat arbitrarily) to switch the signs (positive and negative) of the sine functions in the matrices that rotate about the z and x axes. The
result,
\[
\begin{pmatrix}
\cos\theta & \sin\theta & 0 & 0\\
-\sin\theta & \cos\theta & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{pmatrix},\quad
\begin{pmatrix}
\cos\theta & 0 & -\sin\theta & 0\\
0 & 1 & 0 & 0\\
\sin\theta & 0 & \cos\theta & 0\\
0 & 0 & 0 & 1
\end{pmatrix},\quad
\begin{pmatrix}
1 & 0 & 0 & 0\\
0 & \cos\theta & \sin\theta & 0\\
0 & -\sin\theta & \cos\theta & 0\\
0 & 0 & 0 & 1
\end{pmatrix},
\tag{4.30}
\]
is a set of three rotation matrices that rotate a point about the three coordinate axes in such a way that if we look in the positive direction of that axis, the rotation is clockwise. (Surprisingly, it turns out that there is an elegant way to specify the direction of rotation that’s generated by the rotation matrices of Equation (4.29), and this is described below.) The rotation matrices of Equations (4.29) and (4.30) are simple but not very useful because in practice we rarely know how to break a general rotation into three rotations about the coordinate axes. There are some cases, however, where rotations about the coordinate axes are common. One such case is discussed in Section 5.2; two more are presented here. Case 1: Rotations about the coordinate axes are common in the motion of a submarine or an airplane. These vehicles have three degrees of freedom and have three natural, mutually perpendicular axes of rotation that are called roll, pitch, and yaw (Figure 4.26). Roll is a rotation about the direction of motion of the vehicle. An airplane rolls when it banks by dipping one wing and lifting the other. Pitch is an up or down rotation about an axis that goes through the wings. An airplane uses its elevators for this. Yaw is a left–right rotation about a vertical axis, accomplished by the rudder. These terms originated with sailors because a ship can yaw and also has limited roll and pitch capabilities.
Figure 4.26: Roll, Pitch, and Yaw.
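The 90° direction tests above, and the matrices of Equations (4.29) and (4.30), are easy to reproduce numerically. The following Python sketch is an illustration, not the book's own code; it assumes numpy and the row-vector convention P* = PT used in this chapter, and the roll/pitch/yaw axis assignment in the last line is a hypothetical convention chosen for the example:

```python
import numpy as np

def rot_x(t):
    # x-axis matrix of Equation (4.29); points are row vectors, P* = P R
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1.0]])

def rot_y(t):
    # y-axis matrix of Equation (4.29)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, -s, 0], [0, 1, 0, 0], [s, 0, c, 0], [0, 0, 0, 1.0]])

def rot_z(t):
    # z-axis matrix of Equation (4.29)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1.0]])

t = np.pi / 2   # the 90-degree substitution from the text
print(np.array([1.0, 0, 0, 1]) @ rot_z(t))  # ~ (0, -1, 0, 1), as in the text
print(np.array([1.0, 0, 0, 1]) @ rot_y(t))  # ~ (0, 0, -1, 1)
print(np.array([0, 1.0, 0, 1]) @ rot_x(t))  # ~ (0, 0, -1, 1)

# A roll-pitch-yaw composition (hypothetical axes: x = roll, y = pitch, z = yaw)
rpy = rot_x(0.1) @ rot_y(0.2) @ rot_z(0.3)
```

The composed matrix rpy is still orthogonal, since it is a product of rotations.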
Case 2: Another example of an application where rotations about the three coordinate axes are common is L-systems. This is a system of formal notation developed by the biologist Aristid Lindenmayer (hence the “L”) in 1968 as a tool to describe the morphology of plants [Lindenmayer 68]. In the 1970s, this notation was adopted by computer scientists and used to define formal languages. Since 1984, it has also been used to describe and draw many types of fractals. Today, L-systems are used to generate tilings, geometric art, and even music. The main idea of L-systems is to specify a complex object by (1) defining an initial simple object, called the axiom, and (2) writing rules that show how to replace parts
of the axiom. The rules are written in terms of turtle moves, a concept originally introduced in the LOGO programming language [Abelson and diSessa 82]. L-systems, however, specify the structure of three-dimensional objects, so the turtle must move in three dimensions and can rotate about its three main axes. For more information on L-systems, see [Prusinkiewicz 89]. It has already been mentioned that rotation in three dimensions is more complex than in two dimensions. One reason for this is that rotation in two dimensions is about a point, whereas rotation in three dimensions is about an axis (any axis, not just one of the three coordinate axes). Another reason is that the direction of rotation in two dimensions can be only clockwise or counterclockwise, but the direction of rotation in three dimensions is more complex to specify. The rotation is about an axis, but its direction, clockwise or counterclockwise, about this axis depends on how we look at the axis. Thus, a general rule is needed to specify the direction of a three-dimensional rotation unambiguously. We state such a rule for the rotation matrices of Equation (4.29). The direction of a three-dimensional rotation generated by the matrices of (4.29) in a right-handed coordinate system is determined by the following rule: Write down the sequence “x, y, z” and erase the symbol that corresponds to the axis of rotation. The two remaining symbols are denoted by l and r. Draw the coordinate axes such that the positive direction of l will be up and the positive direction of r will be to the right. (This is not a necessary requirement, but it conforms to Figure 4.27.) The rotation will then be from positive r to positive l to negative r to negative l (Figure 4.27 and see also Exercise 6.13).
Figure 4.27: Direction of Three-Dimensional Rotations.
Example: A rotation about the z axis produced by the leftmost matrix of (4.29). After erasing z, the two symbols left are x and y. We draw the coordinate axes such that positive x is up and positive y is to the right. The matrix produces counterclockwise rotation. To achieve clockwise rotation, either use a negative angle or the inverse of the rotation matrix. Inverting our rotation matrices is especially easy and requires only that we change the signs of the sine functions. Example: Consider the following compound transformation: (1) a translation by l, m, and n units along the three coordinate axes, (2) a rotation of θ degrees about the x axis, (3) a rotation of φ degrees about the y axis, and (4) the reverse translation. The
four transformation matrices are
\[
\mathbf{T_r} = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ l&m&n&1 \end{pmatrix},\quad
\mathbf{T_{rr}} = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ -l&-m&-n&1 \end{pmatrix},
\]
\[
\mathbf{R_x} = \begin{pmatrix} 1&0&0&0\\ 0&\cos\theta&\sin\theta&0\\ 0&-\sin\theta&\cos\theta&0\\ 0&0&0&1 \end{pmatrix},\quad
\mathbf{R_y} = \begin{pmatrix} \cos\phi&0&-\sin\phi&0\\ 0&1&0&0\\ \sin\phi&0&\cos\phi&0\\ 0&0&0&1 \end{pmatrix}.
\]
Their product equals the 4×4 matrix
\[
\mathbf{T} = \mathbf{T_r}\mathbf{R_x}\mathbf{R_y}\mathbf{T_{rr}} =
\begin{pmatrix}
\cos\phi & 0 & -\sin\phi & 0\\
\sin\theta\sin\phi & \cos\theta & \sin\theta\cos\phi & 0\\
\cos\theta\sin\phi & -\sin\theta & \cos\theta\cos\phi & 0\\
t_x & t_y & t_z & 1
\end{pmatrix},
\]
where
\[
\begin{aligned}
t_x &= -l + l\cos\phi + m\sin\theta\sin\phi + n\cos\theta\sin\phi,\\
t_y &= -m + m\cos\theta - n\sin\theta,\\
t_z &= -n - l\sin\phi + m\sin\theta\cos\phi + n\cos\theta\cos\phi.
\end{aligned}
\]
Substituting the values θ = 30°, φ = 45°, and l = m = n = −1, we get the 4×4 matrix
\[
\mathbf{T} = \begin{pmatrix}
0.7071 & 0 & -0.7071 & 0\\
0.3536 & 0.866 & 0.3536 & 0\\
0.6124 & -0.50 & 0.6124 & 0\\
-0.673 & 0.634 & 0.741 & 1
\end{pmatrix}.
\]
A point at (1, 2, 3), for example, is transformed by T to the point (1, 2, 3, 1)·T = (2.578, 0.866, 2.578, 1).
Exercise 4.45: Do the same operations for the compound transformation Tr Rx Trr.
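This compound transformation is easy to verify numerically. The numpy sketch below is an illustration, not the book's code; it uses the row-vector convention (translation in the bottom row) and reproduces the numeric example with θ = 30°, φ = 45°, l = m = n = −1:

```python
import numpy as np

# Row-vector convention: P* = P @ M, translation in the bottom row.
def translate(l, m, n):
    T = np.eye(4)
    T[3, :3] = [l, m, n]
    return T

def rot_x4(t):   # x-axis matrix of Equation (4.30)
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0, 0], [0, c, s, 0], [0, -s, c, 0], [0, 0, 0, 1.0]])

def rot_y4(t):   # y-axis matrix of Equations (4.29)/(4.30)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, -s, 0], [0, 1, 0, 0], [s, 0, c, 0], [0, 0, 0, 1.0]])

th, ph = np.radians(30), np.radians(45)
l = m = n = -1.0
# translate, rotate about x, rotate about y, reverse translate
T = translate(l, m, n) @ rot_x4(th) @ rot_y4(ph) @ translate(-l, -m, -n)
print(np.round(T, 4))
print(np.array([1, 2, 3, 1.0]) @ T)   # ~ (2.578, 0.866, 2.578, 1)
```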
4.4.3 General Rotations In practice, we generally don’t know how to express an arbitrary rotation as a product of rotations about the coordinate axes, so we have to derive the important transformation of general rotation explicitly. The problem is easy to state. A point P is to be rotated through an angle θ about a specified axis. It is important to realize that there is a difference between an axis and a vector. A vector is fully specified by three numbers. It has direction and magnitude, but no specific location in space. An axis has both direction and location (it starts at a certain point), but its magnitude is normally irrelevant. A full specification of an axis requires a start point and a vector, a total of six numbers.
(However, because the magnitude of the vector is irrelevant, it can be represented by two numbers only.) In order to simplify our derivation, we assume that our axis of rotation starts at the origin. If it starts at point P0, we have to precede the rotation by a translation of P0 to the origin and follow the rotation by the inverse translation (see also Section 24.3.10 for a discussion of rotations in connection with the discrete cosine transform (DCT)). We therefore denote by u a unit vector located on an axis that starts at the origin. We can now fully specify a general rotation in three dimensions by four numbers: the rotation angle θ and the three components of u. The rotated point P ends up at P*. We connect P to the origin and call the resulting vector r. Rotating point P to P* is identical to rotating vector r to r*. Figure 4.28a shows that the component OC of r along u is left unchanged, but the component CP is rotated to CP*. The distance OC is seen from the diagram to be (r • u), so the vector OC can be written (r • u)u. From r = OC + CP, we get CP = r − (r • u)u or, in terms of magnitudes, |CP| = |r − (r • u)u|. It can also be seen from the diagram that |CP| = |r| sin φ. Since u is a unit vector, we can write |u × r| = |r| sin φ. We thus obtain |CP| = |r − (r • u)u| = |u × r|.
Figure 4.28b shows the situation when looking from the origin in the positive u direction. (The diagram shows the tail of u.) Note that the vector CQ is perpendicular to both u and r, so it is in the direction of u × r.
Figure 4.28: A General Rotation.
The next step is to resolve CP* into its components. From Figure 4.28b, we get
\[
\widehat{CP}^{*} = \cos\theta\,\widehat{CP} + \sin\theta\,\widehat{CQ}
= \cos\theta\,[\mathbf{r} - (\mathbf{r}\bullet\mathbf{u})\mathbf{u}] + \sin\theta\,(\mathbf{u}\times\mathbf{r}),
\]
which can be used to express r*:
\[
\mathbf{r}^{*} = \widehat{OC} + \widehat{CP}^{*}
= (\mathbf{r}\bullet\mathbf{u})\mathbf{u} + \cos\theta\,[\mathbf{r} - (\mathbf{r}\bullet\mathbf{u})\mathbf{u}] + \sin\theta\,(\mathbf{u}\times\mathbf{r}).
\tag{4.31}
\]
Using Equations (A.3) and (A.5) (Page 1290), we can rewrite this as
\[
\mathbf{r}^{*} = (\mathbf{u}\mathbf{u}^{T})\mathbf{r} + \cos\theta\,\mathbf{r} - \cos\theta\,(\mathbf{u}\mathbf{u}^{T})\mathbf{r} + \sin\theta\,\mathbf{U}\mathbf{r},
\quad\text{where}\quad
\mathbf{U} = \begin{pmatrix} 0 & -u_z & u_y\\ u_z & 0 & -u_x\\ -u_y & u_x & 0 \end{pmatrix}.
\]
The result can now be summarized as r* = Mr, where
\[
\mathbf{M} = \mathbf{u}\mathbf{u}^{T} + \cos\theta\,(\mathbf{I} - \mathbf{u}\mathbf{u}^{T}) + \sin\theta\,\mathbf{U}
= \begin{pmatrix}
u_x^2 + \cos\theta(1-u_x^2) & u_xu_y(1-\cos\theta) - u_z\sin\theta & u_xu_z(1-\cos\theta) + u_y\sin\theta\\
u_xu_y(1-\cos\theta) + u_z\sin\theta & u_y^2 + \cos\theta(1-u_y^2) & u_yu_z(1-\cos\theta) - u_x\sin\theta\\
u_xu_z(1-\cos\theta) - u_y\sin\theta & u_yu_z(1-\cos\theta) + u_x\sin\theta & u_z^2 + \cos\theta(1-u_z^2)
\end{pmatrix}.
\tag{4.32}
\]
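Equation (4.32) translates directly into code. The following numpy sketch is illustrative (the function name is ours, not the book's); it builds M from u and θ and reproduces the familiar z-axis rotation matrix:

```python
import numpy as np

def rotation_matrix(u, theta):
    """3x3 matrix of Equation (4.32): rotation by theta about the unit
    vector u, applied to column vectors as r* = M r."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)            # normalize, in case u is not unit
    U = np.array([[0, -u[2], u[1]],      # the cross-product matrix U
                  [u[2], 0, -u[0]],
                  [-u[1], u[0], 0]])
    uuT = np.outer(u, u)
    return uuT + np.cos(theta) * (np.eye(3) - uuT) + np.sin(theta) * U

# u = (0,0,1) reproduces the familiar z-rotation matrix
M = rotation_matrix([0, 0, 1], np.pi / 2)
print(np.round(M, 3))   # ~ [[0,-1,0],[1,0,0],[0,0,1]]
```

For any u and θ, the resulting matrix is orthogonal and leaves u itself fixed, two quick checks that the formula is implemented correctly.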
Direction cosines. If v = (vx, vy, vz) is a three-dimensional vector, its direction cosines are defined as
\[
N_1 = \frac{v_x}{|\mathbf{v}|},\quad N_2 = \frac{v_y}{|\mathbf{v}|},\quad N_3 = \frac{v_z}{|\mathbf{v}|}.
\]
These are the cosines of the angles between the direction of v and the three coordinate axes. It is easy to verify that N1² + N2² + N3² = 1. If u = (ux, uy, uz) is a unit vector, then |u| = 1 and ux, uy, and uz are the direction cosines of u.
It can be shown that a rotation through an angle −θ is performed by the transpose M^T. Consider the two successive and opposite rotations r* = Mr and r′ = M^T r*. On the one hand, they can be expressed as the product r′ = M^T r* = M^T M r. On the other hand, they rotate in opposite directions, so they return all points to their original positions; therefore r′ must be equal to r. We end up with r = M^T M r or M^T M = I, where I is the identity matrix. The transpose M^T therefore equals the inverse, M⁻¹, of M, which shows that a rotation matrix M is orthogonal.
Example: Consider a rotation about the z axis. The rotation axis is u = (0, 0, 1), resulting in
\[
\mathbf{u}\mathbf{u}^{T} = \begin{pmatrix} 0&0&0\\ 0&0&0\\ 0&0&1 \end{pmatrix}
\quad\text{and}\quad
\mathbf{U} = \begin{pmatrix} 0&-1&0\\ 1&0&0\\ 0&0&0 \end{pmatrix},
\quad\text{and hence}\quad
\mathbf{M} = \begin{pmatrix} \cos\theta&-\sin\theta&0\\ \sin\theta&\cos\theta&0\\ 0&0&1 \end{pmatrix},
\]
which is the familiar rotation matrix about the z axis. It is identical to the z-rotation matrix of Equation (4.29), so we conclude that it rotates counterclockwise when viewed from the direction of positive z.
The general rotation matrix of Equation (4.32) can also be constructed as the product of five simple rotations about various coordinate axes. Given a unit vector u = (ux, uy, uz), consider the following rotations.
1. Rotate u about the z axis into the xz plane, so its y coordinate becomes zero. This is done by a rotation matrix of the form
\[
\mathbf{A} = \begin{pmatrix} \cos\psi&-\sin\psi&0\\ \sin\psi&\cos\psi&0\\ 0&0&1 \end{pmatrix},
\]
and the angle ψ of rotation can be computed from the requirement that the y component of vector v = uA be zero. This component is −ux sin ψ + uy cos ψ, which implies
\[
\cos\psi = u_x\big/\sqrt{u_x^2+u_y^2} \quad\text{and}\quad \sin\psi = u_y\big/\sqrt{u_x^2+u_y^2}.
\]
Notice that rotating u does not affect its magnitude, so v is also a unit vector. In addition, since the rotation is about the z axis, the z component of u does not change, so vz = uz.
2. Rotate vector v about the y axis until it coincides with the z axis. This is accomplished by the matrix
\[
\mathbf{B} = \begin{pmatrix} \cos\phi&0&\sin\phi\\ 0&1&0\\ -\sin\phi&0&\cos\phi \end{pmatrix}.
\]
The angle φ of rotation is computed from the dot product cos φ = v · (0, 0, 1) = vz = uz, implying that sin φ = √(1 − uz²). Since v is a unit vector, it is rotated by B to vector (0, 0, 1).
3. Rotate (0, 0, 1) about the z axis through an angle θ. This is done by matrix
\[
\mathbf{C} = \begin{pmatrix} \cos\theta&-\sin\theta&0\\ \sin\theta&\cos\theta&0\\ 0&0&1 \end{pmatrix}.
\]
This is a trivial rotation that does not change (0, 0, 1).
4. Rotate the result of step 3 by B⁻¹ (which equals B^T).
5. Rotate the result of step 4 by A⁻¹ (which equals A^T).
When these five steps are performed on a point (x, y, z), the effect is to rotate the point through an angle θ about u. In practice, the five steps are combined by multiplying the five matrices above, as shown in the listing of Figure 4.29. The result is identical to Equation (4.32).

tm=Sqrt[x^2+y^2];
a={{x/tm,-y/tm,0},{y/tm,x/tm,0},{0,0,1}};
b={{z,0,Sqrt[1-z^2]},{0,1,0},{-Sqrt[1-z^2],0,z}};
c={{Cos[t],-Sin[t],0},{Sin[t],Cos[t],0},{0,0,1}};
FullSimplify[a.b.c.Transpose[b].Transpose[a] /. x^2+y^2->1-z^2]

Figure 4.29: Mathematica Code for a General Rotation.
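The same five-matrix product can also be checked numerically against Equation (4.32). This numpy sketch mirrors the Mathematica listing for one sample axis and angle (the particular u and θ are arbitrary test values chosen for the illustration):

```python
import numpy as np

# Numeric check of the five-step construction A B C B^T A^T for a
# sample unit vector u and angle theta.
u = np.array([2.0, 1.0, 2.0]) / 3.0          # a unit vector
ux, uy, uz = u
theta = 0.6
tm = np.hypot(ux, uy)                         # sqrt(ux^2 + uy^2)
A = np.array([[ux/tm, -uy/tm, 0], [uy/tm, ux/tm, 0], [0, 0, 1]])
B = np.array([[uz, 0, np.sqrt(1 - uz**2)],
              [0, 1, 0],
              [-np.sqrt(1 - uz**2), 0, uz]])
c, s = np.cos(theta), np.sin(theta)
C = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
M5 = A @ B @ C @ B.T @ A.T                    # the five-step product

# Direct construction from Equation (4.32)
U = np.array([[0, -uz, uy], [uz, 0, -ux], [-uy, ux, 0]])
M = np.outer(u, u) + c * (np.eye(3) - np.outer(u, u)) + s * U
print(np.allclose(M5, M))   # True
```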
4.4.4 Givens Rotations

The general rotation matrix, Equation (4.32), can be constructed for any general rotation in three dimensions. Given such a matrix A, it is possible to reduce it to a product of rotation matrices that cause the same rotation by performing a sequence of rotations about the coordinate axes. This process, first described in [Givens 58], is based on the QR decomposition of matrices, a subject discussed in any text on matrices (and also in Section 24.3.8), and it results in a set of Givens rotations. Each Givens rotation matrix
Ti,j is identified by two indexes, i and j, where i > j. The matrix is an identity matrix except for the two diagonal elements (i, i) and (j, j), which are cosines of some angle, and the two off-diagonal elements (i, j) and (j, i), which are the ± sin of the same angle. Specifically, Ti,j[i, i] = Ti,j[j, j] = c and Ti,j[j, i] = −Ti,j[i, j] = s, where c = A[j, j]/D, s = A[i, j]/D, and D = √(A[j, j]² + A[i, j]²). The special construction of Ti,j implies that the matrix product Ti,j A transforms A to a matrix whose (i, j)th element is zero. Once a general rotation matrix A is given, its Givens rotations can be found by preparing the Givens rotation matrices Ti,j that zero those elements of A located below the main diagonal, column by column, from the bottom up. Figure 4.30 is a listing of Matlab code that does that for the rotation matrix that rotates point (1, 1, 1) to the x axis.

n=3;
A=[.5774,-.5774,-.5774; .5774,.7886,-.2115; .5774,-.2115,.7886]
% Rotation from 1,1,1 to x-axis
Q=eye(n);
for j=1:n-1,
  for i=n:-1:j+1,
    T=eye(n);
    D=sqrt(A(j,j)^2+A(i,j)^2);
    cos=A(j,j)/D; sin=A(i,j)/D;
    T(j,j)=cos; T(j,i)=sin;
    T(i,j)=-sin; T(i,i)=cos;
    T
    A=T*A; Q=Q*T';
  end;
end;
Q
A

Figure 4.30: Computing Three Givens Matrices.
The three rotation matrices produced by this computation are listed in Figure 4.31, where they are used to rotate point (1, 1, 1) to the x axis. Matrix T1 rotates (1, 1, 1) 45° about the y axis to (1.4142, 1, 0), which is rotated by T2 35.26° about the z axis to (1.7321, 0, 0), which is trivially rotated by T3 15° about the x axis to itself.

T1=[0.7071,0,0.7071; 0,1,0; -0.7071,0,0.7071];
T2=[0.8165,0.5774,0; -0.5774,0.8165,0; 0,0,1];
T3=[1,0,0; 0,0.9660,0.2587; 0,-0.2587,0.9660];
p=[1;1;1];
a=T1*p
b=T2*a
c=T3*b

Figure 4.31: Rotating Point (1,1,1) to the x Axis.
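The Matlab listing translates directly to other languages. Here is a numpy sketch of the same bottom-up sweep (an illustration; the function name is ours):

```python
import numpy as np

def givens_factors(A):
    """Zero the subdiagonal of A column by column, from the bottom up,
    returning the list of Givens matrices and the resulting upper
    triangular matrix (a numpy sketch of the Matlab listing)."""
    A = A.astype(float)
    n = A.shape[0]
    Ts = []
    for j in range(n - 1):
        for i in range(n - 1, j, -1):
            D = np.hypot(A[j, j], A[i, j])
            c, s = A[j, j] / D, A[i, j] / D
            T = np.eye(n)
            T[j, j] = c; T[i, i] = c     # diagonal elements are cosines
            T[j, i] = s; T[i, j] = -s    # off-diagonal elements are +/- sine
            Ts.append(T)
            A = T @ A                    # zeroes element (i, j)
    return Ts, A

# The rotation that takes (1,1,1) to the x axis (from Figure 4.30)
A = np.array([[.5774, -.5774, -.5774],
              [.5774, .7886, -.2115],
              [.5774, -.2115, .7886]])
Ts, R = givens_factors(A)
```

Each returned T is orthogonal, and their product applied to A leaves an upper triangular matrix, as the QR decomposition promises.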
J. Wallace Givens, Jr. (1910–1993) pioneered the use of plane rotations in the early days of automatic matrix computations. Givens graduated from Lynchburg College in 1928, and he completed his Ph.D. at Princeton University in 1936. After spending three years at the Institute for Advanced Study in Princeton as an assistant of Oswald Veblen, Givens accepted an appointment at Cornell University, but later moved to Northwestern University. In addition to his academic career, Givens was the director of the Applied Mathematics Division at Argonne National Lab and, like his counterpart Alston Householder at Oak Ridge National Laboratory, Givens served as an early president of SIAM. He published his work on the rotations in 1958. —Carl D. Meyer.
4.4.5 Quaternions

Appendix B is a general introduction to quaternions and should be reviewed before reading ahead. Quaternions can elegantly express arbitrary rotations in three dimensions. Those familiar with complex numbers may have noticed that a rotation in two dimensions is similar to multiplying two complex numbers because the product
\[
(a, b)\begin{pmatrix} c & d\\ -d & c \end{pmatrix} = (ac - bd,\ ad + bc)
\]
is identical to the product (a + ib)(c + id). Quaternions extend this similarity to three dimensions as follows. To rotate a point P by an angle θ about a direction v, we first prepare the quaternion q = [cos(θ/2), sin(θ/2)u], where u = v/|v| is a unit vector in the direction of v. The rotation can then be expressed as the triple product q · [0, P] · q⁻¹. Note that our q is a unit quaternion since sin²(θ/2) + cos²(θ/2) = 1. This interesting connection between quaternions and rotations is developed in detail in [Hanson 06] (see especially page 50 of this reference).
Exercise 4.46: Prove that the triple product q · [0, P] · q⁻¹ really performs a rotation of P about v (or u). (Hint: Perform the multiplications and show that they produce Equation (4.31).)
As an example of quaternion rotation, consider a 90° rotation of point P = (0, 1, 1) about the y axis. The quaternion required is q = [cos 45°, sin 45°(0, 1, 0)]. It is a unit quaternion, so its inverse is q⁻¹ = [cos 45°, −sin 45°(0, 1, 0)]. The rotated point is thus
\[
\begin{aligned}
q[0, P]q^{-1} &= [\cos 45^\circ, \sin 45^\circ(0, 1, 0)]\,[0, (0, 1, 1)]\,[\cos 45^\circ, -\sin 45^\circ(0, 1, 0)]\\
&= [-\sin 45^\circ, (\sin 45^\circ, \cos 45^\circ, \cos 45^\circ)]\,[\cos 45^\circ, -\sin 45^\circ(0, 1, 0)] = [0, (1, 1, 0)].
\end{aligned}
\]
The quaternion resulting from the triple product always has a zero scalar. We ignore the scalar and find that the point has been moved, by the rotation, from the x = 0 plane to the z = 0 plane. Figure 4.32 illustrates this particular rotation about the y axis and also makes it easy to understand the rule for the direction of the quaternion rotation q[0, P]q⁻¹. The rule is: Let q = [s, v] be a rotation quaternion in a right-handed three-dimensional
coordinate system. To an observer looking in the direction of v, the triple product q[0, P]q−1 rotates point P clockwise. For a negative rotation angle, the rotation is counterclockwise. In a left-handed coordinate system (Figure 4.32b), the direction of rotation is the opposite.
Figure 4.32: Rotation in a Right-Handed (a) and in a Left-Handed (b) Coordinate System.
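The quaternion machinery is compact in code. This numpy sketch (an illustration; the helper names are ours) implements the product [s1, v1][s2, v2] = [s1 s2 − v1 • v2, s1 v2 + s2 v1 + v1 × v2] and repeats the (0, 1, 1) example from above:

```python
import numpy as np

def qmul(q1, q2):
    # quaternion product in the [s, v] representation, stored as (s, x, y, z)
    s1, v1 = q1[0], np.asarray(q1[1:], float)
    s2, v2 = q2[0], np.asarray(q2[1:], float)
    s = s1 * s2 - v1 @ v2
    v = s1 * v2 + s2 * v1 + np.cross(v1, v2)
    return np.concatenate(([s], v))

def qrotate(p, axis, theta):
    # rotate point p about the direction `axis` via the triple product
    # q [0, P] q^-1
    u = np.asarray(axis, float) / np.linalg.norm(axis)
    q = np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * u))
    qinv = q * np.array([1, -1, -1, -1])   # inverse of a unit quaternion
    return qmul(qmul(q, np.concatenate(([0.0], p))), qinv)[1:]

print(qrotate([0, 1, 1], [0, 1, 0], np.pi / 2))   # ~ (1, 1, 0), as in the text
```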
4.4.6 Concatenating Rotations

Sometimes we have to perform two consecutive rotations on an object. This turns out to be easy and numerically stable with a quaternion representation. If q1 and q2 are unit quaternions representing the two rotations, then associativity of quaternion multiplication implies that the combined rotation of q1 followed by q2 is represented by the quaternion q2 · q1. The proof is
\[
q_2\cdot(q_1\cdot P\cdot q_1^{-1})\cdot q_2^{-1} = (q_2\cdot q_1)\cdot P\cdot(q_1^{-1}\cdot q_2^{-1}) = (q_2\cdot q_1)\cdot P\cdot(q_2\cdot q_1)^{-1}.
\]
Quaternion multiplication involves fewer operations than matrix multiplication, so combining rotations by means of quaternions is faster. Performing fewer multiplications also implies better numerical accuracy. In general, we use 4×4 transformation matrices to express three-dimensional transformations, so we would like to be able to express the rotation P* = q[0, P]q⁻¹ as P* = PM, where M is a 4×4 matrix. Given the two quaternions q1 = w1 + x1 i + y1 j + z1 k = (w1, x1, y1, z1) and q2 = w2 + x2 i + y2 j + z2 k = (w2, x2, y2, z2), their product is
\[
\begin{aligned}
q_1\cdot q_2 = {} & (w_1w_2 - x_1x_2 - y_1y_2 - z_1z_2) + (w_1x_2 + x_1w_2 + y_1z_2 - z_1y_2)\,i\\
& + (w_1y_2 - x_1z_2 + y_1w_2 + z_1x_2)\,j + (w_1z_2 + x_1y_2 - y_1x_2 + z_1w_2)\,k.
\end{aligned}
\]
The first step is to realize that each term in this product depends linearly on the coefficients of q1. This product can therefore be expressed as
\[
q_1\cdot q_2 = q_2\cdot L(q_1) = (x_2, y_2, z_2, w_2)
\begin{pmatrix}
w_1 & z_1 & -y_1 & -x_1\\
-z_1 & w_1 & x_1 & -y_1\\
y_1 & -x_1 & w_1 & -z_1\\
x_1 & y_1 & z_1 & w_1
\end{pmatrix}.
\]
When L(q1) multiplies the row vector q2, the result is a row vector representation for q1 · q2. Each term also depends linearly on the coefficients of q2, so the same product can also be expressed as
\[
q_1\cdot q_2 = q_1\cdot R(q_2) = (x_1, y_1, z_1, w_1)
\begin{pmatrix}
w_2 & -z_2 & y_2 & -x_2\\
z_2 & w_2 & -x_2 & -y_2\\
-y_2 & x_2 & w_2 & -z_2\\
x_2 & y_2 & z_2 & w_2
\end{pmatrix}.
\]
When R(q2) multiplies the row vector q1, the result is also a row vector representation for q1 · q2. We can now write the triple product q · [0, P] · q⁻¹ in terms of the matrices L(q) and R(q):
\[
q[0,P]q^{-1} = q([0,P]\cdot q^{-1}) = q([0,P]R(q^{-1})) = ([0,P]R(q^{-1}))L(q) = [0,P](R(q^{-1})L(q)) = [0,P]\mathbf{M},
\]
where matrix M is
\[
\mathbf{M} = R(q^{-1})\cdot L(q) =
\begin{pmatrix}
w & z & -y & x\\
-z & w & x & y\\
y & -x & w & z\\
-x & -y & -z & w
\end{pmatrix}
\begin{pmatrix}
w & z & -y & -x\\
-z & w & x & -y\\
y & -x & w & -z\\
x & y & z & w
\end{pmatrix}
=
\begin{pmatrix}
w^2{+}x^2{-}y^2{-}z^2 & 2xy+2wz & 2xz-2wy & 0\\
2xy-2wz & w^2{-}x^2{+}y^2{-}z^2 & 2yz+2wx & 0\\
2xz+2wy & 2yz-2wx & w^2{-}x^2{-}y^2{+}z^2 & 0\\
0 & 0 & 0 & w^2{+}x^2{+}y^2{+}z^2
\end{pmatrix}.
\]
Since we have unit quaternions, they satisfy w² + x² + y² + z² = 1, so we can write the final result
\[
\mathbf{M} = \begin{pmatrix}
1-2y^2-2z^2 & 2xy+2wz & 2xz-2wy & 0\\
2xy-2wz & 1-2x^2-2z^2 & 2yz+2wx & 0\\
2xz+2wy & 2yz-2wx & 1-2x^2-2y^2 & 0\\
0 & 0 & 0 & 1
\end{pmatrix}.
\tag{4.33}
\]
In a left-handed coordinate system, the same rotation is expressed by the triple product q−1 [0, P]q or, equivalently, by P∗ = P · MT , where MT is the transpose of M.
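Equation (4.33) can be exercised directly. The numpy sketch below is illustrative (the function name is ours); it builds M from a unit quaternion, with the signs checked against the worked 90° rotation about y from Section 4.4.5, and uses the row-vector convention P* = PM:

```python
import numpy as np

def quat_to_matrix(w, x, y, z):
    """4x4 matrix of Equation (4.33) for a unit quaternion (w, x, y, z);
    row-vector convention, P* = P M."""
    return np.array([
        [1 - 2*y*y - 2*z*z, 2*x*y + 2*w*z,     2*x*z - 2*w*y,     0],
        [2*x*y - 2*w*z,     1 - 2*x*x - 2*z*z, 2*y*z + 2*w*x,     0],
        [2*x*z + 2*w*y,     2*y*z - 2*w*x,     1 - 2*x*x - 2*y*y, 0],
        [0, 0, 0, 1.0]])

# the 90-degree rotation about y from the text: q = [cos 45, sin 45 (0,1,0)]
s = np.sin(np.pi / 4)
M = quat_to_matrix(s, 0.0, s, 0.0)
print(np.array([0, 1, 1, 1.0]) @ M)   # ~ (1, 1, 0, 1), as in Section 4.4.5
```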
4.5 Transforming the Coordinate System

Our discussion so far has assumed that points are transformed in a static coordinate system. It is also possible (and sometimes useful) to transform the coordinate system instead of the points. To understand the main idea, let's consider the simple example of translation. Suppose that a two-dimensional point P is transformed to a point P* by translating it m and n units in the x and y directions, respectively. How can the transformation be reversed? We consider two ways.
1. Suppose that the original transformation was P* = PT, where
\[
\mathbf{T} = \begin{pmatrix} 1&0&0\\ 0&1&0\\ m&n&1 \end{pmatrix}.
\]
It is clear that the transformation matrix
\[
\mathbf{S} = \begin{pmatrix} 1&0&0\\ 0&1&0\\ -m&-n&1 \end{pmatrix}
\]
will transform P* back to P. Indeed, it is trivial to show, by using Equation (4.22), that S is the inverse of T.
2. The transformation can be reversed by translating the coordinate system in the reverse directions (i.e., by −m and −n units) by using an (unknown) transformation matrix M. Since the two methods produce the same result, we conclude that M = S = T⁻¹. Transforming the coordinate axes is therefore done by a matrix that is the inverse of the matrix that transforms a point. This is true for any affine transformation, not just translation.

Simple kindness to one's self and all that lives is the most powerful transformational force of all.
—David R. Hawkins.
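The conclusion M = S = T⁻¹ is easy to check numerically. A small numpy sketch (an illustration, not the book's code), using the row-vector convention of this chapter:

```python
import numpy as np

# Translating points by (m, n) is represented by T; translating the
# coordinate system by (m, n) instead is represented by T^{-1}.
m, n = 3.0, 5.0
T = np.array([[1, 0, 0], [0, 1, 0], [m, n, 1]])   # row-vector convention
S = np.linalg.inv(T)                               # equals T with -m, -n
P = np.array([2.0, 7.0, 1.0])
print(P @ T)         # point moved by (m, n): (5, 12, 1)
print(P @ T @ S)     # undone by the inverse:  (2, 7, 1)
```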
5 Parallel Projections

The projections discussed in this book transform a scene from three dimensions to two dimensions. Projections are needed because computer graphics is about designing and constructing three-dimensional scenes, but graphics output devices are two-dimensional. Figure 5.1 illustrates what can happen when a dimension is added to space. The figure shows an impossible object, an object that cannot exist in three dimensions, yet it can be drawn in two dimensions.
Figure 5.1: An Impossible Fork.
This figure and others like it show how careful we must be when projecting an object. There are several variants of parallel projections, but they are all based on the following principle: Select a direction v and construct a ray that starts at a general point P on the object and goes in the direction v. The point P* where this ray intercepts the projection plane becomes the projection of P. The process is repeated for all the points on the object, creating a set of parallel rays, which is why this class of projections is called parallel. Figure 5.2 illustrates the principle of parallel projections. In Figure 5.2a the rays are perpendicular to the projection plane and in Figure 5.2b they strike at a different angle. This is why the latter method is called oblique projection (Section 5.3). Figure 5.2c shows a different interpretation of parallel projections. Because the rays are parallel, we can imagine that they originate at a center of projection located at infinity. This interpretation unifies parallel and perspective projections and is in
accordance with the general rule of projections (Page 200) which distinguishes between parallel and perspective projections by the location of the center of projection. The three types of parallel projections are orthographic, axonometric, and oblique.
Figure 5.2: Parallel Projections.
I will sette as I doe often in woorke use, a paire of paralleles, or [twin] lines of one lengthe, thus =, bicause noe 2. thynges, can be moare equalle. —Robert Recorde, 1557.
5.1 Orthographic Projections

The term orthographic (or orthography) is derived from the Greek ορθο (correct) and γραφος (that writes). This term is used in several areas, such as orthographic projection of a sphere (Page 415) and the orthography of a language. The latter is the set of rules that specify correct writing in a language. An example of an orthographic rule in English is that i comes before e (as in "view") except after a c (as in "ceiling").
The family of orthographic projections is the simplest of the three types of parallel projections. The principle is to imagine a box around the object to be projected and to project the object "flat" on each of the six sides of the box (Figure 5.3a). If the object is simple and familiar, three projections, on three orthogonal sides, may be enough (Figure 5.3b). If the object is complex or is unfamiliar, a perspective projection may be needed in addition to the three or six parallel projections. For even more complex objects, sectional views may be necessary. Such a view is obtained by passing an imaginary plane through the object and drawing a projection of the plane.
If one side of the box is the xy plane, then a point P = (x, y, z) is projected on this side by removing its z coordinate to become P* = (x, y). This operation can be carried out formally by multiplying P by matrix Tz of Equation (5.1). Similarly, matrices Tx and Ty project points orthographically on the yz and the xz planes, respectively.
\[
\mathbf{T}_x = \begin{pmatrix} 0&0&0\\ 0&1&0\\ 0&0&1 \end{pmatrix},\quad
\mathbf{T}_y = \begin{pmatrix} 1&0&0\\ 0&0&0\\ 0&0&1 \end{pmatrix},\quad
\mathbf{T}_z = \begin{pmatrix} 1&0&0\\ 0&1&0\\ 0&0&0 \end{pmatrix}.
\tag{5.1}
\]
The object of Figure 5.3 has two properties that make it especially easy to project. It is similar to a cube, and its edges are aligned with the coordinate axes. In general, if
Figure 5.3: Six and Three Orthographic Projections.
the main edges of the object are not aligned with the coordinate axes, its orthographic projections along the axes may look unfamiliar and confusing, and it is preferable to rotate the object, if possible, and align it before it is projected. If the object is not cubical, the best option is to select on the object three axes that are judged the “main” ones and align them with the coordinate axes. The object is then surrounded by a bounding box (Figure 5.4) and the box is projected. Once this is done, the object is transferred into the projected bounding box in a process similar to that described in Section 6.3. If the object is so complex that it is impossible to find three such axes, then the designer should consider projecting several sectional views of the object or using a nonorthographic projection.
Figure 5.4: Orthographic Projection of a Curved Object.
Exercise 5.1: Try to interpret the three orthographic projections of Figure 5.5.
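The projection matrices of Equation (5.1) are simple enough to verify in a few lines; a brief numpy sketch (illustrative, using the row-vector convention):

```python
import numpy as np

# The three orthographic projection matrices of Equation (5.1)
Tx = np.diag([0.0, 1, 1])   # project on the yz plane (clear x)
Ty = np.diag([1.0, 0, 1])   # project on the xz plane (clear y)
Tz = np.diag([1.0, 1, 0])   # project on the xy plane (clear z)

P = np.array([3.0, 4.0, 5.0])
print(P @ Tz)   # z coordinate cleared: (3, 4, 0)
```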
Figure 5.5: Three Orthographic Projections for Exercise 5.1.
The main advantage of orthographic projections is the ease of measuring dimensions. The projection of a segment of length l on the object is a segment of length l (or of a length related to l in a simple way) on the projection plane. This helps in manufacturing an object directly from a drawing and is the main reason orthographic projections are used in technical drawing. Figure 5.6 shows a side view and the top view of a thin hexagon. It is easy to see that a segment of length l on side a becomes a segment of the same length on the projection, while a segment of length l on side b becomes a segment of length l cos β on the projection (where β = 270° − α).
Figure 5.6: Segments on the Sides of a Hexagon.
I feel like I am diagonally parked in a parallel universe. —Unknown.
5.2 Axonometric Projections

The term axonometric is derived from the Greek αξων or αξονας (axon, axis) and μετρον (metron, a measure). We approach this type of parallel projection from two points of view.
Approach 1: Linear perspective, the topic of Chapter 6, was developed in the West during the Renaissance and is based on geometric optics. The observer is considered a point that receives straight rays of light and senses only the color, the intensity, and the direction of a ray but not the distance it has traveled. Oriental art, in contrast, has developed in a different direction and has adopted a different system of perspective, one that is suitable for scroll paintings. A Chinese scroll painting is normally executed on a horizontal rectangle about 40 cm high and several meters long. The painting is viewed slowly from right to left while unrolling the scroll, and it tells a story in time. As the eye moves to the left, we see later occurrences of the same scene, not new views. We can call this approach to art "narrative," in contrast to Western art, which is situational. Figure 5.7 is an example of this type of art. It is a 33-foot-long scroll titled A City of Cathay that was painted by artists of the Qing court (1662–1795).
Figure 5.7: A City of Cathay.
Because of the temporal approach to scroll art, Chinese (and other Oriental artists) had to develop a system of perspective with no vanishing points, no explicit light sources, and no shadows. The result was a special type of parallel perspective, known today as “Chinese perspective” or axonometric projection. If we imagine the scroll to be the xy plane and we view it along the z axis, then lines that are parallel to the z axis are drawn parallel on the scroll instead of converging to a vanishing point. Approach 2: An orthographic projection of an object shows the details of only one of its main faces, which is why three or even six projections are needed. Each
projection may be detailed and it may show the true shape of that face with the correct dimensions, but it shows little or nothing of the rest of the object. Thus, interpreting and understanding orthographic projections requires experience. Viewing an object from above, from below, and from four sides tends to confuse an inexperienced person. Engineers, architects, and designers may be familiar with orthographic projections, but they have to draw plans that will be viewed and comprehended by their superiors and customers, and this suggests a projection method that will include some perspective, will show more than one face of the object, and will also make it easy to compute dimensions from the drawing. Linear perspective is easy to visualize and understand, but for engineers and designers it has at least three disadvantages: (1) it is complex to compute and draw, (2) the relation between dimensions on the diagram and real dimensions of the object is complex, and (3) distant objects look small. A common compromise is a drawing in one of the three varieties of axonometric projections. Axonometric projections show more of the object in each projection but at the price of having wrong dimensions and angles. An axonometric projection typically shows three or more faces of the object, but it shrinks some of the dimensions. When a dimension is measured on the drawing, some computations are needed to convert it to a true dimension on the object. This is an easy, albeit nontrivial, procedure. An axonometric projection shows the true shape of a face of the object (with true dimensions) only if the face happens to be parallel to the projection plane. Otherwise, the shape of the face is distorted and its dimensions are shrunk. Before we get to the details, here is a summary of the properties of axonometric projections:
• Axonometric projections are parallel, so a group of parallel lines on the object will appear parallel in the projection. There are no vanishing points.
Thus, a wide image can be scrolled slowly while different parts of it are observed. At every point, the viewer will see the same perspective. Distant objects retain their size regardless of their distance from the observer. If the parameters of the projection are known, then the dimensions of any object, far or nearby, can be computed from measurements taken on the projection. There are standards for axonometric projections. A standard may specify the orientation of the object relative to the observer, which makes it easy for the observer to compute distances directly from the projection.

To construct an axonometric projection, the object may first have to be rotated to bring the desired faces toward the projection plane. It is then projected on that plane in parallel. We assume that the projection plane is the xy plane, so the projection is done by clearing the z coordinates of all the points or, equivalently, by multiplying each point, after rotating it, by matrix Tz of Equation (5.1). Assuming that we first rotate the object φ degrees about the y axis and then θ degrees about the x axis, the combined rotation/projection matrix is [see Equation (4.30)]

    ⎛cos φ  0  −sin φ⎞ ⎛1    0      0  ⎞ ⎛1  0  0⎞   ⎛cos φ   sin φ sin θ  0⎞
T = ⎜  0    1    0   ⎟ ⎜0  cos θ  sin θ⎟ ⎜0  1  0⎟ = ⎜  0        cos θ     0⎟ .     (5.2)
    ⎝sin φ  0   cos φ⎠ ⎝0 −sin θ  cos θ⎠ ⎝0  0  0⎠   ⎝sin φ  −cos φ sin θ  0⎠
To find how various dimensions are affected by these transformations, we start with the vector (1, 0, 0), a unit vector in the direction of the x axis. Multiplying it by T gives another vector, which we denote by (x₁, x₂, 0). Its magnitude is s_x = √(x₁² + x₂²), and since the original vector had magnitude 1, the quantity s_x expresses the ratio of magnitudes, i.e., the factor by which all dimensions in the x direction have shrunk after the transformation/projection T. Similarly, selecting the unit vectors (0, 1, 0) and (0, 0, 1) in the y and z directions and multiplying them by T produces vectors (y₁, y₂, 0) and (z₁, z₂, 0) and shrink factors s_y = √(y₁² + y₂²) and s_z = √(z₁² + z₂²) in the y and z directions, respectively. Figure 5.8a shows a unit cube rotated such that its three sides, which used to be parallel to the coordinate axes, seem to have different lengths. Such an axonometric projection is called trimetric.
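The shrink-factor computation is easy to verify numerically. The following sketch (not from the book; the helper names `axonometric_matrix` and `shrink_factors` are invented for illustration) builds matrix T of Equation (5.2) for a given pair of rotation angles and measures how the three unit basis vectors shrink:

```python
import math

def axonometric_matrix(phi, theta):
    """Matrix T of Equation (5.2): rotate phi about y, then theta about x,
    then project onto the xy plane (row-vector convention, P* = P T)."""
    return [
        [math.cos(phi),  math.sin(phi) * math.sin(theta), 0.0],
        [0.0,            math.cos(theta),                 0.0],
        [math.sin(phi), -math.cos(phi) * math.sin(theta), 0.0],
    ]

def shrink_factors(phi, theta):
    """s_x, s_y, s_z: the projected lengths of the three unit basis
    vectors, which are exactly the magnitudes of the rows of T."""
    T = axonometric_matrix(phi, theta)
    return tuple(math.hypot(row[0], row[1]) for row in T)

# Generic angles give three different factors: a trimetric projection.
sx, sy, sz = shrink_factors(math.radians(30), math.radians(20))
```

For most angle pairs the three factors differ (trimetric); the special angle choices derived in the rest of this section make two or all three of them coincide.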
(a)
(b)
(c)
Figure 5.8: The Three Types of Axonometric Projections.
Figure 5.8b shows the same unit cube rotated such that two of its three sides seem to have the same length, while the third side looks shorter. Such an axonometric projection is called dimetric. Similarly, Figure 5.8c shows the same unit cube rotated such that all its sides seem to have the same length. This type of axonometric projection is called isometric.

Matrix T of Equation (5.2) can be used to calculate the special rotations that produce a dimetric projection. Consider the product of a unit vector in the x direction and T:

          ⎛cos φ   sin φ sin θ  0⎞
(1, 0, 0) ⎜  0        cos θ     0⎟ = (cos φ, sin φ sin θ, 0).     (5.3)
          ⎝sin φ  −cos φ sin θ  0⎠

This product shows that any vector in the x direction shrinks, after being rotated by matrix T, by a factor s_x given by Equation (5.4). The same equation also produces the shrink factors s_y and s_z of any vector in the y and z directions:

s_x = √(cos²φ + sin²φ sin²θ),   s_y = √(cos²θ) = cos θ,   s_z = √(sin²φ + cos²φ sin²θ).     (5.4)
If we want a dimetric projection where equal-size segments in the x and y directions will have equal sizes after the projection, we set s_x = s_y or, equivalently, cos²φ + sin²φ sin²θ = cos²θ, which produces the relation

sin²φ = sin²θ / (1 − sin²θ).     (5.5)
Equation (5.5) together with the expression for s_z² yields

s_z² = sin²φ + cos²φ sin²θ = sin²φ + (1 − sin²φ) sin²θ = sin²φ(1 − sin²θ) + sin²θ
     = [sin²θ / (1 − sin²θ)](1 − sin²θ) + sin²θ,

or 2 sin⁴θ − (2 + s_z²) sin²θ + s_z² = 0, a quadratic equation in sin²θ whose solutions are sin²θ = s_z²/2 and sin²θ = 1. The second solution cannot be used in Equation (5.5) and has to be discarded. The first solution produces

θ = sin⁻¹(±s_z/√2)   and   φ = sin⁻¹(±s_z/√(2 − s_z²)).     (5.6)

Since the sine function has values in the range [−1, 1], the argument of sin⁻¹ must be in this range. The expression s_z/√2 is in this range when −√2 ≤ s_z ≤ +√2, and the expression s_z/√(2 − s_z²) is in this range when −1 ≤ s_z ≤ +1. Since s_z is a shrink factor, it is nonnegative, which implies that it must be in the interval [0, 1]. Also, since Equation (5.6) contains a ±, any value of s_z produces four solutions.

Example: Given s_z = 1/2, we calculate θ and φ:
θ = sin⁻¹(±0.5/√2) = sin⁻¹(±0.35355) = ±20.7°,
φ = sin⁻¹(±0.5/√(2 − 0.5²)) = sin⁻¹(±0.378) = ±22.2°.

The two rotations are illustrated in Figure 5.9.

Exercise 5.2: Repeat the example for s_z = 0.625.
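Equation (5.6) is easy to check numerically. A small sketch (not from the book; the helper `dimetric_angles` is an invented name) recovers the angles of the example from a prescribed s_z:

```python
import math

def dimetric_angles(sz):
    """Angles (theta, phi) of Equation (5.6), positive branch, that make
    s_x = s_y for a prescribed shrink factor sz in the z direction."""
    theta = math.asin(sz / math.sqrt(2))
    phi = math.asin(sz / math.sqrt(2 - sz * sz))
    return theta, phi

theta, phi = dimetric_angles(0.5)
# math.degrees(theta) ≈ 20.7 and math.degrees(phi) ≈ 22.2, as in the example.
```

Substituting these angles back into Equation (5.4) confirms that s_x = s_y and that the z shrink factor is indeed 0.5.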
Figure 5.9: Rotations for Dimetric Projection.
Exercise 5.3: Calculate θ and φ for s_x = s_z (equal shrink factors in the x and z directions).

The condition for an isometric projection (Figure 5.8c) is s_x = s_y = s_z. We already know that s_x = s_y results in Equation (5.5). Similarly, it is easy to see that s_y = s_z results in cos²θ = sin²φ + cos²φ sin²θ, which can be written

sin²φ = (1 − 2 sin²θ) / (1 − sin²θ).     (5.7)

Equations (5.5) and (5.7) result in sin²θ = 1 − 2 sin²θ or sin²θ = 1/3, yielding θ = ±35.26°. The rotation angle φ can now be calculated from Equation (5.5):

sin²φ = (1/3)/(1 − 1/3) = 1/2,   yielding φ = ±45°.
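As a quick numerical check of these angles (this sketch is mine, not the book's), substituting θ = sin⁻¹√(1/3) and φ = 45° into Equation (5.4) gives three equal shrink factors:

```python
import math

theta = math.asin(math.sqrt(1 / 3))  # about 35.26 degrees, rotation about x
phi = math.radians(45)               # rotation about y

# Shrink factors from Equation (5.4):
sx = math.sqrt(math.cos(phi)**2 + math.sin(phi)**2 * math.sin(theta)**2)
sy = math.cos(theta)
sz = math.sqrt(math.sin(phi)**2 + math.cos(phi)**2 * math.sin(theta)**2)
# All three equal sqrt(2/3), about 0.8165: an isometric projection.
```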
The shrink factors can be calculated from, for example, s_y = √(cos²θ) = √(2/3) ≈ 0.8165.

We conclude that the isometric projection is the most useful but also the most restrictive of the three axonometric projections. Given a diagram with the isometric projection of an object, we can measure distances on the diagram and divide them by 0.8165 to obtain actual dimensions on the object. However, the diagram must show the object (whose main edges are assumed to be originally aligned with the coordinate axes) after being rotated by ±45° about the y axis and by ±35.26° about the x axis. If these rotations result in obscuring important object features, a less restrictive projection, such as dimetric or trimetric, must be used. Figure 5.10 shows isometric and perspective projections of a simple stair-like object, and it is clear that the former looks distorted and unnatural (the side away from the viewer seems too large and bent), while the latter looks real.

Standards for Axonometric Projections

Several common standards for axonometric projections exist and are described here. We start with a simple 30° standard for isometric projections, whose principle is illustrated in Figure 5.11. Part (a) of the figure shows a cube projected in this standard after it
Figure 5.10: Isometric and Perspective Projections.
has been rotated φ = 45° about the y axis and θ = 35° about the x axis. Part (b) shows the same cube with dimensions and angles. It is not difficult to see that α satisfies tan α = h/w, which is why α = arctan(h/w). The standard specifies the ratio h/w = 1/√3, which results in α = 30°. The 30° angle is convenient because sin 30° = 1/2. This part of the figure also shows that θ = arcsin(h/w), a quantity that happens to be close to 35°. This projection is attributed by [Krikke 00] to William Farish, who developed it in 1822.

Figure 5.11: The 30° Standard for Isometric Projections.
A 30° angle is convenient for drafters because sin 30° = 1/2. However, in our age of computers and computer-aided design, virtually all graphics output devices (monitors, plotters, and printers) use a raster scan and are based on pixels. A line is drawn as a set of individual pixels, and even a little experience with such lines shows that a line at 30° to the horizontal looks bad. Much better results are obtained when drawing a line at about 27° (more precisely, arctan 0.5 ≈ 26.57°), because the tangent of this angle is 0.5, resulting in a line made of identical sets of pixels (Figure 5.12). As a result, the 27° standard for axonometric projections (Figure 5.13) makes more sense. This standard is sometimes also called the 1:2 isometric projection because it is based on the ratio h/w = 1/2.
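The pixel argument can be made concrete with a short sketch (mine, not the book's): a slope of exactly 1/2 corresponds to arctan 0.5 ≈ 26.57°, and rasterizing such a line yields identical two-pixel runs:

```python
import math

angle = math.degrees(math.atan(0.5))  # about 26.57 degrees

def line_pixels(width):
    """Pixels of a line of slope 1/2 through the origin: one step up in y
    for every two steps in x, so the horizontal runs are all equal."""
    return [(x, x // 2) for x in range(width)]

# line_pixels(6) gives [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]:
# three identical horizontal runs of two pixels each.
```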
Figure 5.12: Pixels for 30° and 27° Lines.

Figure 5.13: The 27° Isometric Projection.
A similar standard is based on the ratio h/w = 1, which leads to α = 45°. This case is also known as the military isometric projection. This projection is suitable for applications where the horizontal faces of the projected object are important. Figure 5.14 shows that the xz plane becomes a regular rhombus in this projection, which makes it easy to read details and measure distances on this plane.

Figure 5.14: The 45° Isometric Projection.
A Dutch standard for dimetric projections is based on the ratio h/w = 0.33. It is known as the 42◦ /7◦ standard because it results in angles α and β of these sizes (Figure 5.15). The z axis (the one that’s drawn at 42◦ ) is scaled by a factor of 1/2.
Figure 5.15: The 42°/7° Dimetric Projection.
5.3 Oblique Projections

An oblique projection is a special case of a parallel projection (i.e., with a center of projection at infinity) where the projecting rays are not perpendicular to the projection plane. We have already seen that axonometric projections show more object details than orthographic projections but make it more cumbersome to compute object dimensions from the flat projection. Similarly, oblique projections generally show more object details than axonometric projections but distort angles and dimensions even more. In an oblique projection, only those faces of the object that are parallel to the projection plane are projected with their true dimensions. Other faces are distorted such that measuring dimensions on them requires calculations.

The diagram can be drawn quite quickly because the designer used a style of drawing called oblique projection. So long as basic rules are followed, oblique projection is quite easy to master and it may be a suitable style for you to use in a design project. The basic rules are outlined below.
—http://www.technologystudent.com/designpro/oblique1.htm

Figure 5.16 illustrates the principle of oblique projections. A three-dimensional point P = (x, y, z) is projected obliquely onto a point P* on the xy plane. We denote the point (x, y, 0) by Q and examine the angle θ between the two segments PP* and P*Q. A cavalier projection is obtained when θ = 45° and a cabinet projection is the result of θ = 63.43°. Because of the special 45° angle, the three shrink factors of a cavalier projection are equal, as will be shown later. In a cabinet projection, the shrink factor in the z direction (assuming that the object is projected on the xy plane) equals 1/2.

Figure 5.17a illustrates the geometry of oblique projections and can be used to derive their transformation matrix. We assume that the projection plane is z = 0 (the xy plane) and that all the projecting rays hit this plane at an angle θ.
Two projecting rays are shown, one projecting the special point P = (0, 0, 1) to a point (a, b, 0) and the other projecting Q = (0, 0, z), a general point on the z axis, to a point (A, B, 0). The origin (0, 0, 0) is projected onto itself, so the projection of the unit segment from the origin to P is the segment of size s from the origin to (a, b, 0). The value s is therefore the shrink factor of the oblique projection. The three quantities a, b, and s are related
Figure 5.16: Oblique Projections.
by a = s cos φ and b = s sin φ, where φ is measured on the projection plane. The shrink factor s is also related to the projection angle θ by tan θ = 1/s or s = cot θ.

Figure 5.17: Oblique Projections.
Oblique (Adj.) Neither parallel nor at a right angle to a specified or implied line; slanting.

We now consider the projecting ray from Q to (A, B, 0). Since Q is at a distance z from the origin, the distance on the projection plane between the origin and point (A, B, 0) is sz. From this we obtain the relations A = sz cos φ and B = sz sin φ. The next step is to consider the projection of a general point (x, y, z). All the projecting rays are parallel, so a little thinking shows that moving a point from (0, 0, z) to (x, 0, z) moves its projection from (A, B, 0) to (x + A, B, 0). Similarly, moving a point from (0, 0, z) to (0, y, z) moves its projection from (A, B, 0) to (A, y + B, 0). A general point located at (x, y, z) is therefore projected to a point at (x + A, y + B, 0). Thus, the rule of oblique projections is

(x, y, z) −→ (x + sz cos φ, y + sz sin φ, 0),     (5.8)

which can be written in terms of a transformation matrix:

                     ⎛   1       0     0⎞
P* = P T = (x, y, z) ⎜   0       1     0⎟ .     (5.9)
                     ⎝s cos φ  s sin φ 0⎠
With the help of this matrix we examine the following special cases.
1. A cavalier projection. It is defined as the case where the projection angle is 45°, which implies s = cot(45°) = 1. Thus, all edges and segments have shrink factors of 1.
2. A projection angle of 90°. A value θ = 90° implies a shrink factor s = cot(90°) = 0. Matrix T of Equation (5.9) reduces to matrix Tz of Equation (5.1), showing how the oblique projection reduces in this case to an orthographic projection.
3. A cabinet projection. It is defined as the case where the projection angle is 63.43°, which implies s = cot(63.43°) = 1/2. All edges and segments perpendicular to the projection plane have shrink factors of 1/2.

Figure 5.17b shows how φ and θ are independent. For a given projection angle θ, it is possible to assign φ any value by rotating the triangle in the figure. In practice, this means that an object can be projected several times, with different values of φ but with the same projection angle θ. Such projections may give all the necessary visual information about the object while having the same shrink factors.
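The three cases can be exercised with a small sketch of Equation (5.8) (the helper `oblique_project` is an invented name):

```python
import math

def oblique_project(point, s, phi):
    """Oblique projection onto the xy plane, Equation (5.8):
    (x, y, z) -> (x + s*z*cos(phi), y + s*z*sin(phi))."""
    x, y, z = point
    return (x + s * z * math.cos(phi), y + s * z * math.sin(phi))

phi = math.radians(30)  # direction of the receding axis; any value works

# s = cot(theta): cavalier (theta = 45 deg) gives s = 1,
# cabinet (theta = 63.43 deg) gives s = 1/2,
# and theta = 90 deg gives s = 0, i.e., an orthographic projection.
for s in (1.0, 0.5, 0.0):
    px, py = oblique_project((0.0, 0.0, 1.0), s, phi)
    # The unit segment along z projects to a segment of length s.
```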
Figure 5.18: Comparing Parallel Projections (Orthographic, Axonometric, Oblique).
Axonometric and oblique projections are generally considered different, but Figure 5.18 shows that the difference between them is a matter of taste and terminology. If
we rotate the object and light rays of the oblique projection 45° counterclockwise, the result on the projection plane is identical to the axonometric projection.

She could afterward calmly discuss with him such blameless technicalities as hidden line algorithms and buffer refresh times, cabinet versus cavalier projections and Hermite versus Bézier parametric cubic curve forms.
—John Updike, Roger’s Version (1986)
6 Perspective Projection

The term perspective refers to several techniques that create the illusion of depth (three dimensions) on a two-dimensional surface. Linear perspective is one of these methods. It can be defined as a method for correctly placing objects in a painting or a drawing so they appear closer to the observer or farther away from him. The keyword in this definition is correctly. It implies that a flat picture in linear perspective creates in the viewer's brain the same sensation as the original three-dimensional scene. The main tool employed by linear perspective is vanishing points. This chapter starts by explaining vanishing points. This is followed, in Section 6.2, by a short history of perspective in art. The remainder of the chapter develops simple mathematical tools to compute the two-dimensional perspective projection of any given three-dimensional point.
Figure 6.1: Ancient Art.
D. Salomon, The Computer Graphics Manual, Texts in Computer Science, DOI 10.1007/978-0-85729-886-7_6, © Springer-Verlag London Limited 2011

The Bible is eternal and is always the same, but most other objects and processes around us change and develop continually. Hot air balloons, cheese, and bicycles are familiar examples of items that constantly develop and improve. Art is another example. Ancient art tends to be flat, as illustrated by Figure 6.1. The Lascaux cave drawings,
Navajo rock drawings, and ancient Egyptian art shown in the figure are two-dimensional. They are flat and do not attempt to create a sensation of depth. Flatness is also a common feature of modern art. The abstract art and cartoons of Figure 6.2 look flat and use the painter’s algorithm to create the barest hints of depth. (The painter’s algorithm is simply the way painters work. The first objects painted may be partly or fully covered and obscured by objects painted later.)
Figure 6.2: Modern Art.
Art, especially painting and drawing, went through a revolution during the Italian Renaissance in the late Middle Ages. An important part of this revolution was the technique of perspective. Almost overnight, it became possible to create the illusion of a three-dimensional scene in a flat, two-dimensional picture. Section 6.2 surveys the historical developments that led to an understanding of perspective, but Figure 6.3 illustrates the basic idea. Part (a) of the figure shows a small, flat plane defined by two sets of parallel lines. In part (b), some lines are made to converge to a vanishing point, thereby creating the sensation of depth. Part (c) maintains this feeling even though the vanishing point itself has been removed. Finally, part (d) illustrates how four copies of this plane can be connected to form an object that we perceive as a cube, a box, or a room, even though we know that it is only a collection of lines on a flat surface.
Figure 6.3: Converging Lines.
Figure 6.4 is another illustration of the same concept. It is easy to see that the railway tracks of part (a) are wrong, while part (b) looks realistic.
Figure 6.4: (a) Wrong and (b) Correct Perspective.
Exercise 6.1: Search the works of art (modern or otherwise) for examples of wrong or reversed perspective. Simply stated, sound perspective means that something seen happening in the foreground of the shot must make a louder noise than something seen to be further away. Most failures to respect the rule are instinctively heard as “bad sound,” as imperfect or amateur use of recording technology. —David Bellos, Jacques Tati (1999).
6.1 One Two Three ... Infinity
The first step toward understanding perspective is an understanding of converging lines and vanishing points. Imagine a simple house shaped like a cube. If we stand in front of it, we see only its front wall, a square, much like the one depicted in Figure 6.5a. If, however, we imagine the house to be transparent, it would look like part (b) of the figure. Its back wall is farther away from us, so it looks smaller than its front wall, which is why the four parallel lines connecting the front and back walls do not look parallel; they seem to converge to an imaginary point called a vanishing point. The vanishing point exists only in our imagination, and we can imagine it only if we extend the four lines in question. Thus, the vanishing point is a result of the way the brain interprets what the eyes see. We now walk around our transparent, cubic house and turn to the left, such that our line of sight is aimed at one of the corners, as shown in Figure 6.5c. The house is the same: it hasn’t moved or changed shape. We, the viewers, are also the same, only our position and orientation have changed. Yet, when we look at the house, we see two groups of lines converging at two vanishing points (Figure 6.5d). Figure 6.6 shows examples of perspective with three vanishing points. Imagine a person standing in front of a corner of a skyscraper, craning his neck in an attempt to
Figure 6.5: Vanishing Points.
see all the way to the top of the building. Because of the height of the building, its top seems smaller than its bottom, so the straight, parallel lines connecting top to bottom also seem to converge to a vanishing point. Even a small object, such as a cube, can feature three vanishing points if it is hoisted up and we are positioned under it. Even a small, one-story house can feature three vanishing points if it has a traditional pitched roof.
Figure 6.6: Three Vanishing Points.
We live in a three-dimensional world, which is why we can visualize objects with one, two, or three dimensions, but not more. A line or a curve has one dimension. A flat plane or a curved surface is two-dimensional. A solid object has three dimensions, so the question is, can an object feature more than three vanishing points? The answer, which may come as a surprise to some, is yes, as illustrated by Figures 6.7 and 6.9. Everyday objects, such as a chest of drawers (if you have messed yours up in order to prove my point, please take the time to put it back in order) and a circular staircase, can feature any number of vanishing points. Exercise 6.2: (By enigmatologist Will Shortz) If the question (crossword puzzle hint) is “it might turn into a different story,” what is the answer?
Figure 6.7: Many Vanishing Points.
Exercise 6.3: Come up with other common objects or scenes that feature many vanishing points. We therefore conclude that an object seen in perspective can have any number of vanishing points, even zero. In addition, the number and positions of those points vary when the object is moved or changes its orientation and when the viewer moves, turns, bends, tilts his head, or cranes his neck. The rule governing the number and position of the vanishing points is simple and can be considered the main principle of perspective. Before this rule is stated, let’s take another look at Figure 6.5d. It features two vanishing points, each created by a group of parallel lines. The point is that originally (i.e., in Figure 6.5b) these lines seem parallel, but when the viewer moves to a different location, looking at the same object from a different direction, these same lines no longer look parallel and seem to converge.
Figure 6.8: Effects of a Small Rotation.
Figure 6.8 serves to further illustrate the behavior of the vanishing points. Part (a) of the figure shows the cube of Figure 6.5b with the four parallel edges marked with an asterisk. In part (b), the cube is rotated through a small angle, which slightly changes the
Figure 6.9: Many Vanishing Points.
orientation of these four edges relative to the viewer. They no longer appear parallel, and seem to converge to a distant vanishing point on the far left of the figure. In addition, the four lines that originally converged at the center of the cube now converge to a vanishing point slightly to the right of center.
This small rotation has resulted in the same cube featuring two vanishing points. It can be interpreted by saying that the rotation has moved the original vanishing point slightly to the right and the new vanishing point isn't really new. It was originally located at infinity and has moved by the rotation to a finite (albeit distant) location.

These observations should help the reader to understand and agree with the following statement: In order for an object to feature vanishing points, it must have groups of straight parallel lines. The lines may be generated by the intersection of two planes on the object, as in the case of a cube, or they may be painted or scribed on the surface of the object. They may even be located inside the object, if it is transparent. Any such group of lines results in a vanishing point, except if the lines are perpendicular to the line of sight of the viewer. Rotating the cube of Figure 6.8 has changed the orientation of a group of four parallel lines that were originally perpendicular to the line of sight but are no longer so. The new orientation has therefore added a vanishing point.

The conclusion is that an object may have any number of vanishing points depending on its shape and orientation, on groups of parallel lines that happen to be on it, and on the direction from which it is viewed. The statement above is the rule governing vanishing points. It should be stressed that the vanishing points are not real. They exist only in our imagination, and we imagine them because of the particular way our brain interprets the signals sent from our eyes.
Figure 6.10: The Rule of Reflection.
An interesting example of vanishing points is a reflection in a mirror. A ray of light that strikes a mirror is reflected in a direction determined by the normal to the mirror. The rule of reflection (Figure 6.10) is that the angle of incidence equals the angle of reflection. Points “A” and “B” in the figure are seen by the viewer as if they are deep in the mirror, and any group of parallel lines on a reflected object seems to converge in the mirror to a new vanishing point. Figure 6.11 shows a cube (in two-point perspective) reflected in a mirror. The two real vanishing points are vp1 and vp2 . The cube seen in the mirror also has two virtual
Figure 6.11: Real and Virtual Vanishing Points.
vanishing points, vp₁* and vp₂*, and it is easy to see the symmetric relation between the real and virtual points.

Note. This section discusses straight lines and their convergence, which is why the examples here employ cubes and other objects with large, flat surfaces and straight lines. However, curved objects with no straight, parallel lines can also be seen (and drawn) in perspective, and techniques for achieving this are described in Section 6.3.

Vanishing points and converging lines are important in perspective, but perspective has another important aspect. When an object is moved away from the viewer, it appears smaller, but it also features less perspective. The amount of perspective seen depends on the relation between the size of the viewed object and its distance from the viewer. To see why this is so, we go back to the cube of Figure 6.5b, duplicated in Figure 6.12a. Assuming that this cube is 10 cm on a side and that it is viewed from a distance of 10 cm, its back face is 20 cm from the viewer, twice the distance of the front face. The back face therefore seems to the viewer much smaller than the front face, and the object is seen with a lot of perspective. If this cube is moved 90 cm away from the viewer, its front face ends up at 100 cm and its back face is at 110 cm from the viewer. The difference between front and back is now much smaller compared with the distance from the viewer, causing the back face to appear only a shade smaller than the front, with the result that the object appears to have much less perspective (Figure 6.12b).
Figure 6.12: (a) More and (b) Less Perspective.
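The arithmetic of this example can be captured in a couple of lines (an illustrative sketch, not from the book): apparent size falls off as 1/distance, so the back-to-front apparent-size ratio of a box of depth d whose front face is at viewing distance v is v/(v + d):

```python
def back_to_front_ratio(viewing_distance, depth):
    """Ratio of the apparent size of the back face to that of the front
    face, for a box of the given depth seen from viewing_distance."""
    return viewing_distance / (viewing_distance + depth)

near = back_to_front_ratio(10, 10)   # 0.5: back face looks half size
far = back_to_front_ratio(100, 10)   # about 0.91: much less perspective
```

The closer the ratio is to 1, the weaker the sensation of perspective, which is exactly what the 10 cm versus 100 cm comparison above illustrates.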
Exercise 6.4: In addition to featuring less perspective, a distant object also looks small. Can we bring such an object closer without increasing its perspective?
6.2 History of Perspective

In art, the term "perspective" refers to a technique for depicting a three-dimensional scene on a two-dimensional flat surface. The result is similar, but not identical, to the way we perceive three-dimensional objects and scenes in space. Our eyes are separated by a few centimeters, and as a result they observe slightly different views of the same scene. The brain combines these views in a complex way to generate the sensation of depth. When we move, turn, or raise or lower our head, the image we see varies continuously. A painting or drawing in perspective, on the other hand, is based on a fixed viewpoint and is equivalent to looking at the scene through a peephole with one eye.

The principles of perspective were known to the ancients. Many Greek vase paintings indicate a grasp of the principles of perspective. Roman wall paintings show lines converging to vanishing points, and the Roman architect Vitruvius describes perspective in his writings [Vitruvius 06]. In the Middle Ages, especially in the 13th and 14th centuries, several artists in Italy, France, and Holland (and perhaps also in the East) independently discovered (or rediscovered) some of the principles of perspective, especially the concept of lines converging to a vanishing point. However, none came up with a complete and consistent theory of perspective. Such a theory had to wait until the second decade of the 15th century, when it was developed first experimentally by Filippo Brunelleschi and later in more detail by Leon Battista Alberti. (Some experts also credit the painter Paolo Uccello with major contributions to the understanding of perspective.)
[Uccello] would remain the long night in his study to work out the vanishing points of his perspective, and when summoned to his bed by his wife replied in the celebrated words: “How fair a thing is this perspective.” Being endowed by nature with a sophisticated and subtle disposition, he took pleasure in nothing save in investigating difficult and impossible questions of perspective. . . . When engaged in these matters, Paolo would remain alone in his house almost like a hermit, with hardly any intercourse, for weeks and months, not allowing himself to be seen. . . . By using up his time on these fancies he remained more poor than famous during his lifetime. —Giorgio Vasari, The Lives of the Artists (1567). The remainder of this section discusses the contributions made by three Renaissance figures, Brunelleschi, Masaccio, and Alberti, to the understanding of perspective. Brunelleschi Filippo Brunelleschi, known to his contemporaries as “Pippo,” was born in Florence in 1377. His father, Ser Brunellesco di Lippo Lapi, was a prosperous notary, but young Filippo showed an interest in machines and in solving mechanical problems. (The term “ser” was a title of respect, while “di Lippo Lapi” indicates that Brunelleschi’s father was named Lippo and was from the Lapi family.) Filippo was therefore apprenticed, at age 15, to a local goldsmith. For the next six years he learned to cast metals, work with enamel, engrave and emboss silver, and use precious metals to decorate manuscripts with gold leaf and to make jewels and religious artifacts.
After completing his apprenticeship in 1398 at age 21, Brunelleschi was sworn as a master goldsmith and became a well-known goldsmith in Florence and other cities. From 1401 to 1416 or 1417, he seems to have spent most of his time in Rome (although this is uncertain), working as a goldsmith, making clocks, and surveying the many ruins of the eternal city.

Returning to Florence after 13 years of absence, Brunelleschi, then 40, became involved in the competition for the great dome of the Santa Maria del Fiore Cathedral. This was to be both the largest dome ever attempted, with a diameter of more than 143 feet, and the tallest one, starting at a height of about 170 feet off the ground and reaching about 280 feet. (The lantern on top of it adds more than 70 feet to that.) Even though known as a goldsmith, not an architect, Brunelleschi won the 1418 competition because of his original approach to the problem. The novel aspect of his plan for the dome was to build it without any scaffolding. (The term “centering” was then used.) This idea, and the 1:12 model of the dome that he built in brick to demonstrate his method, helped convince the committee of judges to give him the commission. He then spent the years from 1420 to 1436 supervising the construction while also designing and building ingenious machines to haul heavy loads to the top.

Brunelleschi, a true Renaissance man both because of his interests and achievements and because of his time period, died in 1446. Like Donatello, Masaccio, da Vinci, and Michelangelo, he never married. For more information on Brunelleschi, his work, and his times, see [King 00] and [Walker 02]. A biography of Brunelleschi [Manetti 88] was written in the 1480s, four decades after the death of its subject, by his pupil Antonio Manetti, which brings us to Brunelleschi's contribution to perspective.
In this biography, Manetti describes Brunelleschi’s panel drawing, a trompe l’oeil that was then used by Brunelleschi in an experiment that fuses nature and art, similar to an optical trick. This historically-important painting has since been lost, but it (and the experiment) are described in detail by Manetti.

Trompe l’oeil (French for “deceiving the eye,” pronounced “tromp loy”). 1. A style of painting that gives an illusion of photographic reality. 2. A painting or effect created in this style.

The peepshow experiment. Brunelleschi placed himself at a point three braccia (about six feet) inside the doorway of the not yet completed cathedral of Santa Maria del Fiore. His idea was to specify a precise viewing point at which a viewer could compare a real scene with a perspective painting of the same scene. Looking outside across the Piazza del Duomo, he clearly saw, about 115 feet away, the Baptistery of San Giovanni, one of Florence’s most familiar landmarks. This structure was a good choice for the study of perspective because it is shaped like an octagon, so someone standing in front of it sees its three front walls in two-point perspective. (It also features left–right symmetry, so reflecting it horizontally does not change its shape.)

Brunelleschi then painted what he saw through the doorframe—the Baptistery and some of the surrounding streets—in perspective on a small panel about 12 inches wide. Finally, he drilled a small hole in the panel at the center of the Baptistery’s door (Figure 6.13a) because this point of the Baptistery would be directly opposite the eye of a viewer standing at the specified viewing point.
6 Perspective Projection
Figure 6.13: Brunelleschi’s Experiment in Perspective.
The world having so long been without artists of lofty soul or inspired talent, heaven ordained that it should receive from the hand of Filippo the greatest, the tallest, and the finest edifice of ancient and modern times, demonstrating that Tuscan genius, although moribund, was not yet dead.
—Giorgio Vasari, The Lives of the Artists (1567).

Brunelleschi then rotated the panel 180° and looked through the hole at the Baptistery. He then inserted a mirror and held it at arm’s length as shown in Figure 6.13b,c and looked at his painting reflected in the mirror. This became Brunelleschi’s celebrated peepshow experiment, which demonstrated the lifelike qualities of perspective. In his biography, Manetti claims to have held this painting in his hands and to have repeated the experiment. He was unable to tell the difference between the image reflected in the mirror and the real scene (without the mirror). (However, modern travelers to Florence recommend the use of a pair of heavy-duty tripods to hold the image and the mirror at their precise locations.)
Figure 6.14: Plan of the Piazza del Duomo, Florence (After [Sgrilli 33]).
[Brunelleschi] had made a hole in the panel on which there was this painting; . . . which hole was as small as a lentil on the painting side of the panel, and on the back it opened pyramidally, like a woman’s straw hat, to the size of a ducat or a little more. And he wished the eye to be placed at the back, where it was large, by whoever had it to see, with the one hand bringing it close to the eye, and with the other holding a mirror opposite, so that there the painting came to be reflected back . . . which on being seen, . . . it seemed as if the real thing was seen: I have had the painting in my hand and have seen it many times in these days, so I can give testimony.
—Antonio Manetti, The Life of Brunelleschi (1480s).

Manetti mentions another interesting fact. The painting was about 12 inches wide and Brunelleschi recommended watching it from a distance of 6 inches, so the reflection seen in the mirror appears to be at a distance of 12 inches from the viewer. We know that tan 26.6° = 0.5, which implies that the apex angle of an isosceles triangle whose height equals its base is 2 × 26.6 ≈ 53° (see also Exercise 6.30). This trigonometric fact suggests that, as seen from the viewing point specified by Brunelleschi, the Baptistery spans a viewing angle of about 53°, and this is verified by Figure 6.14, which follows the site plan given by [Sgrilli 33].

Finally, Manetti mentions that the diameter of the hole on the painted side of the panel was about the thickness of a bean (6–7 mm). Figure 6.13d illustrates how the same angle of 53° is obtained if the eye of the viewer is glued to the back of the panel (where according to Manetti the hole was bigger, about the size of a ducat, 20 mm) and the thickness of the panel is the same 6–7 mm.

Masaccio

Perhaps the first great Renaissance painter to use the ideas of Brunelleschi in a serious work of art was Tommaso di ser Giovanni di Mone (or Tommaso di ser Giovanni cassai),
known to us as Masaccio, a nickname that can be translated as Big Thomas, Rough Thomas, Clumsy Thomas, Sloppy Thomas, Bad Thomas, or even Messy Thomas. He died in 1428, at age 27, and in his last two years he painted a fresco, today titled Trinity (or Holy Trinity), in the church of Santa Maria Novella in Florence. The accurate execution of one-point perspective in this picture creates the illusion of a sculpture placed in a cavity in the wall, although the picture is flat. This large picture (approximately 6.7 × 3.2 m, or 21 ft 10½ in by 10 ft 5 in) has a sad history of incompetent restoration and a 19th century attempt to cut it off the wall and move it to another wall in the same church. Figure 6.16 is a small replica showing how the single vanishing point was placed by the artist at the viewer’s position.

The architectural setting of this fresco [the Trinity] is so accurate in its perspective and so Brunelleschian in style that some scholars have suggested Brunelleschi drew the sinopia, or cartoon, on the wall for Masaccio to paint. This is certainly possible, but it is also quite possible that Masaccio—a master draftsman as well as an inspired painter—could have done the whole work himself. Perhaps it doesn’t matter. The important fact for the future of Western art is that Masaccio met Brunelleschi and gained such a deep knowledge of perspective that he set a standard for every painter to follow.
—Paul Robert Walker, The Feud that Sparked the Renaissance (2002).

Alberti

In 1435–1436, Leon Battista Alberti wrote and published (in Latin and Italian) Il Trattato della Pittura e I Cinque Ordini Architettonici (“On Painting”), where he describes a simple geometric method for constructing a correct one-point perspective of a horizontal grid on a vertical picture plane. This method was later simplified by Piero della Francesca in his 1478 mathematical treatise De prospectiva pingendi and is illustrated in Figure 6.15.
Figure 6.15: Alberti’s Method of Transversals in One-Point Perspective.
The left part of the figure shows a side view where the picture plane is intercepted by a family of visual rays that emanate from the viewer’s eye. Each ray connects the
Figure 6.16: Masaccio’s Holy Trinity.
eye to one of the transversals (or divisions) of the grid on the ground. The point where the ray intercepts the picture plane is then transferred to the front view (on the right part of the figure) to indicate where to place the particular transversal in the picture. It is easy to see how the transversals, which are equally spaced on the ground, become closer and closer in the picture. The last step is to draw a diagonal line in the front view to check for the accuracy of this geometric construction.

The canvas is an open window through which I see what I want to paint.
—Leon Battista Alberti.

In his book, Alberti also shows how such a floor, accurately drawn in perspective, can serve to determine the correct dimensions (both horizontal and vertical) of objects positioned on the floor and elsewhere in the picture. Figure 6.17 illustrates how a grid on a floor is used to determine the height of a large, box-like object placed on the floor. Alberti used the braccio (plural braccia), a length unit that equals approximately 58 cm (or 23 in, roughly the length of a man’s arm), and a length of four braccia, measured on the floor, is employed to determine the heights of the box at its front and back.
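The side view reduces to similar triangles, so the crowding of the transversals can be checked with a short computation. The following Python sketch is illustrative only (the function name and the particular viewing distances are assumptions, not Alberti's values): with the eye at height h, a distance v in front of the picture plane, the visual ray to a ground transversal at distance d behind the plane crosses the plane at height hd/(v + d).

```python
def transversal_height(h, v, d):
    """Height on the picture plane of the image of a ground transversal.

    h: eye height above the ground,
    v: distance from the eye to the picture plane,
    d: distance from the picture plane to the transversal.
    The ray from the eye at (-v, h) to the ground point (d, 0)
    crosses the plane x = 0 at height h*d/(v + d) (similar triangles).
    """
    return h * d / (v + d)

# Illustrative values: eye 4 braccia high, 3 braccia from the plane.
h, v = 4.0, 3.0
heights = [transversal_height(h, v, d) for d in range(1, 7)]
gaps = [b - a for a, b in zip(heights, heights[1:])]
print(heights)  # images of equally spaced transversals climb the plane
print(gaps)     # successive gaps shrink, as in Figure 6.15
```

As d grows, the image height approaches the eye height h, so distant transversals crowd together just below the horizon; this is exactly the diminishing spacing described above.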
Figure 6.17: Determining Vertical Dimensions from the Floor.
It is such precisely described methods and techniques that distinguish Alberti from his predecessors and justify the title “pioneer” or “originator” of perspective.

Exercise 6.5: Given the simple two-point perspective of Figure 6.18, show how the equally-spaced red vertical lines were constructed.

Leonardo da Vinci, who certainly knew about perspective, developed his own projection, now known as aerial or atmospheric perspective. This method of adding depth to a two-dimensional painting is based on the perception that contrasts of color and shade appear greater in nearby objects than in those far away, and that warm colors (such as red, orange, and yellow) appear to advance, while cool colors (blue, violet, and green) appear to recede. Aerial perspective is also used in East Asian art, where zones of mist are sometimes used to separate near and distant parts of the scene.
Figure 6.18: Two-Point Perspective with Equally-Spaced Lines.
6.3 Perspective in Curved Objects, I

Up until now, we have discussed perspective, converging lines, and vanishing points in cubes or other objects with large flat surfaces on which it is easy to draw straight lines. Our accumulated life experience, however, teaches us that even curved objects—objects without flat parts and with no groups of straight, parallel lines—are seen in perspective. This section shows how to extend the principles of perspective discussed earlier to arbitrary surfaces.
Figure 6.19: Alberti’s Method of Perspective Drawing.
The main idea was already proposed by Alberti and is illustrated in Figure 6.19 for a circle. Start with a flat, nonperspective drawing of a curved object and place a regular rectangular grid on it [part (a)]. Redraw the grid in perspective, with a vanishing point [part (b)], and go over the two grids box by box. For each box, copy that part of the object seen in the first grid and modify it according to the shape of the box in the second grid. The final result (the circle in perspective) looks like an ellipse, but notice how the left and right extreme points of the projected circle (i.e., the ellipse’s major axis) no longer lie on the central horizontal line but have moved below it. Exercise 6.6: Explain why. A variant of this method starts by locating key points on the curved object (points that make it easy to draw the entire object), assigning them coordinates, and locating them on the perspective grid. Figure 6.20 shows an example of a large digit 5 where 5×7 = 35 key points have been located. The digit is placed in a rectangle, and grid lines are added
and labeled 1 through 5 and “A” through “G,” resulting in a nonuniform grid. This grid is then transformed in perspective (one-point or two-point) and the key points located in the new grid, which makes it easy to draw the large 5 in perspective.

Exercise 6.7: Show the geometric construction that transfers the 35 key points to a grid in one-point perspective.
Figure 6.20: A Large Digit “5.”
6.4 Perspective in Curved Objects, II

In this section, we discuss techniques for drawing curved objects in perspective. There are many books, mostly for artists, draftsmen, and architects, that discuss perspective and describe methods for drawing objects in perspective. Unfortunately, these books generally employ as examples cubes or cubical objects and therefore create the wrong impression that only such objects can be drawn in perspective. Experienced artists, illustrators, engineers, and architects who draw both curved and cubical objects know that even curved objects, even objects that lack any straight lines and flat faces, can be drawn in perspective. This section explains simple techniques and approaches to the perspective drawing of arbitrary objects. For general references on this topic and more examples, see [Hulsey 08] and [Robertson 08].

We start with a general one-point perspective grid. Figure 6.21a shows a rectangle bounded by two horizontal lines and the vertical lines a and b. We want to construct an adjacent rectangle of the same width. Two diagonals are drawn (shown in red) to locate the center of the rectangle, and a short horizontal line c is drawn to locate point B, the center of line b. Finally, a line is drawn from point A through point B, to determine point C, which becomes the bottom-right corner of the new rectangle. This simple technique is now applied to construct a perspective grid.
Figure 6.21b shows two boundary lines that converge to a vanishing point VP. Two vertical lines a and b are selected arbitrarily (it is useful to assume that the distance between them is one unit) to create a trapezoid. We draw the two diagonals (in red) and construct the line from the center of the trapezoid to the vanishing point. In part (c) of the figure we construct a line from point A through point B (the center of line b) to obtain point C, which determines the position of the next vertical, c (green), and thus the next trapezoid. Notice that the distance between b and c is shorter than the distance between a and b, because line c is farther away from the observer. This is why the distances between the consecutive verticals in part (d) of the figure are diminishing, an effect called foreshortening. Part (e) of the figure shows two one-point perspective grids with different orientations relative to a horizon line.
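The diagonal construction can be verified numerically. The sketch below is a minimal Python check under an assumed one-point mapping (x, y) → (x/(kx + 1), y/(kx + 1)), chosen because it keeps vertical lines vertical; none of these names come from the text. Under this mapping the images of the ground lines y = 0 and y = 1 are the borders v = 0 and v = 1 − ku, which meet at the vanishing point (1/k, 0), and the construction applied to the verticals at x = 0 and x = 1 must land exactly on the image of x = 2.

```python
def img(x, y, k=0.5):
    """Assumed one-point perspective: world point (x, y) -> picture point."""
    w = k * x + 1.0
    return (x / w, y / w)

def next_vertical(ua, ub, k=0.5):
    """The construction of Figure 6.21: A is the top of vertical a,
    B is the midpoint of vertical b; extend the line A->B until it
    meets the bottom border v = 0 and return that u coordinate."""
    top = lambda u: 1.0 - k * u      # image of the line y = 1
    A = (ua, top(ua))
    B = (ub, top(ub) / 2.0)          # center of vertical b (bottom is v = 0)
    t = A[1] / (A[1] - B[1])         # parameter where the line reaches v = 0
    return A[0] + t * (B[0] - A[0])

k = 0.5
ua, _ = img(0.0, 0.0, k)
ub, _ = img(1.0, 0.0, k)
uc = next_vertical(ua, ub, k)
print(uc, img(2.0, 0.0, k)[0])  # the construction reproduces the image of x = 2
print(ub - ua, uc - ub)         # foreshortening: the second gap is smaller
```

Because perspective maps straight lines to straight lines and preserves incidence, the construction that works in the flat rectangle of Figure 6.21a remains exact in the projected grid.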
Figure 6.21: A One-Point Perspective Grid.
A two-point perspective grid is constructed in a similar process, as illustrated by Figure 6.22. It is clear that each square (or rectangle) in the original coordinate system is projected to a quadrilateral.
Figure 6.22: A Quadrilateral in a Two-Point Perspective Grid.
We therefore conclude that each rectangle (or square) in the original coordinate system is mapped to a trapezoid in a one-point perspective grid and to a quadrilateral in a two-point perspective grid. Thus, given an arbitrary figure, we generate its one-point (or two-point) perspective projection by first constructing a bounding rectangle around it and then mapping every point in this rectangle to the trapezoid (or quadrilateral) given by the perspective grid. Figure 6.23 shows how the mathematical expressions of this mapping are derived. Given a point P = (x, y) in a rectangle with opposite corners (x0, y0) and (x1, y1), its projection Q in the quadrilateral
Figure 6.23: Mapping a Rectangle to a Quadrilateral.
defined by the four points Ui = (ui, vi) is computed in the following three steps:

a = (x − x0)/(x1 − x0),  b = (y − y0)/(y1 − y0),
b1 = (1 − b)U0 + bU2,  b2 = (1 − b)U1 + bU3,
Q = (1 − a)b1 + ab2.
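The three steps translate directly into code. A minimal Python sketch (the function and variable names are mine), with U0 and U1 the bottom-left and bottom-right corners of the quadrilateral and U2 and U3 the top-left and top-right corners:

```python
def map_point(x, y, x0, y0, x1, y1, U0, U1, U2, U3):
    """Map point (x, y) of the rectangle [x0,x1] x [y0,y1] into the
    quadrilateral with corners U0 (bottom-left), U1 (bottom-right),
    U2 (top-left), and U3 (top-right)."""
    a = (x - x0) / (x1 - x0)          # relative distance from the left edge
    b = (y - y0) / (y1 - y0)          # relative distance from the bottom
    lerp = lambda P, Q, t: (P[0] + t * (Q[0] - P[0]), P[1] + t * (Q[1] - P[1]))
    b1 = lerp(U0, U2, b)              # point on the left edge of the quad
    b2 = lerp(U1, U3, b)              # point on the right edge of the quad
    return lerp(b1, b2, a)            # Q, at relative distance a from b1

# A unit square mapped to a trapezoid (one-point perspective of a square):
U0, U1, U2, U3 = (0.0, 0.0), (4.0, 0.0), (1.0, 2.0), (3.0, 2.0)
print(map_point(0.5, 0.5, 0, 0, 1, 1, U0, U1, U2, U3))  # center -> (2.0, 1.0)
```

The four corners of the rectangle map to the four corners of the quadrilateral, and the center of the rectangle maps to the intersection point of the two mid-lines of the quadrilateral.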
The quantity a is the relative distance of P from the left edge of the rectangle. It is a number in the interval [0, 1]. Similarly, b is the relative distance of P from the bottom of the rectangle. Once b is known, it is used to compute points b1 and b2 (these are points, not numbers). Point b1 is located on the left edge of the quadrilateral, at relative distance b from U0, and similarly for point b2. Finally, point Q is computed on the line connecting b1 and b2, at a relative distance a from the left. Many drawing and illustration programs can perform this mapping automatically, as illustrated by the large digit 5 in the figure.

Symmetric objects. The special case of a symmetric object is important and is considered next. Many common, important objects feature some type of symmetry. Animals, kitchenware (plates, pots, cups, forks), vehicles, tools, and furniture exhibit at least left-right symmetry, while wheels and other circular objects feature higher symmetries. To draw an object with left-right symmetry, it is possible to draw one-half of the object, select several strategically-placed points on it, and employ simple geometric constructs to transfer those points to the “other side,” where they can be used to draw the other half of the object.

Figure 6.24 illustrates this technique. In part (a) we start with a simple one-point perspective grid consisting of two “horizontal” lines that converge to a vanishing point and three verticals. The result is two quadrilaterals. We draw half of a symmetric curve (blue) in one quadrilateral and use simple geometric construction to mirror it in the other quadrilateral. Green lines a and b are symmetric and line a intersects our curve at a point O. We therefore construct line l that converges to the vanishing point and locate point P, the mirror image of O, at the intersection of l and b. Similarly, lines d and e are employed to locate point Q and line m is used to locate point R.
More mirror points can be located in this way, until there are enough points to complete the missing half of the curve (Figure 6.24b). It is obvious that the two halves, which are symmetric, have very different shapes in a perspective drawing. In Figure 6.24c, we draw line n to start another one-point perspective grid and draw one-half of another symmetric curve. In part (d), two green diagonals are drawn
Figure 6.24: Mirroring Points.
to determine the center of a quadrilateral, from which point we draw line n to the vanishing point. Another diagonal locates point S, from which we draw vertical p. This completes the matching quadrilateral. In part (e) of the figure we determine two strategic points, using the same techniques as in part (a), and the complete curve is shown in part (f) of the figure. Figure 6.26 illustrates this technique in two-point perspective. Part (a) of the figure shows two groups of “horizontal” lines that converge to two vanishing points. A few “verticals” are also shown. This construction provides the background grid for the drawing. In part (b) we see a blue vertical curve that immediately makes it clear that this is going to be the drawing of a car. Part (c) shows one-half of the bottom of the car (a blue horizontal curve) with point A selected. The bottom is a symmetric curve, but it is clear from the figure that while it is easy to draw the half curve that we see (the one closer to us) it is much harder to draw its symmetric counterpart. This is true for both freehand drawing and drawing done with special graphics software (the drawings shown here were done in a recent version of Adobe Illustrator). Thus, our immediate problem is to transfer point A to the “other side” of the drawing (to mirror it). We start with the simple construction of part (d). A vertical is drawn from point A and line a is drawn to intercept vertical v at point B. Notice that lines a and b are not parallel; they meet at vanishing point VP1 (not shown). Finally, in part (e) we draw the two diagonals shown in green to determine the center of the quadrilateral, draw line c from the center toward VP1 , and draw a line from the top-right corner of the quadrilateral through point C to intercept line b at point D, which is the mirror image of point A. Note. The height of the vertical drawn from point A in part (d) was chosen such that point B is both on vertical v and on the center profile of the car. 
It is important to realize that the height of this vertical can be chosen somewhat arbitrarily and that point B does not have to lie on the center profile. In fact, it seems that the best choice is for the resulting quadrilateral to be as close to a rectangle as possible, because this makes it easier to determine an accurate center point in part (e). Figure 6.25 applies this technique to point E of the new curve w. This is a “transverse” curve describing half of the lateral profile of the car, and the construction shows how point F , the mirror image of E, is determined.
Figure 6.25: Mirroring a Point.
Figure 6.26: Mirroring a Point.
Circles

Circles are rare in nature (are crop circles natural?), but common in man-made objects (Figure 6.28). When it comes to drawing objects with circles, the guiding principle is that the perspective projection of a circle is an ellipse (see Figure 6.28 and [Bartlett 08] for illustrations and [Moore 89] for a proof). The next few paragraphs explain why the statement “the perspective projection of a circle is an ellipse” is only an approximation.

When a circle is seen head on, it looks, well, circular, but when it is tilted, we intuitively feel that it should look like an ellipse. Intuition often fails, which is why we rely on mathematics. It is easy to show that a tilted circle projected in perspective is not exactly an ellipse, but is very close to an ellipse. We start with the parametric equation of a circle of radius R centered in the z = R plane, (R cos t, R sin t, R). Notice that all the z coordinates of this circle equal R. When this circle is tilted through an angle θ by rotating it about the x axis, the result is

(R cos t, R sin t, R) ⎛ 1    0       0    ⎞
                      ⎜ 0  cos θ  −sin θ ⎟
                      ⎝ 0  sin θ   cos θ ⎠

= (R cos t, R(cos θ sin t + sin θ), R(cos θ − sin θ sin t)).   (6.1)
The z coordinates of points on the tilted circle are nonnegative, because all the original z coordinates equaled R. We now add a fourth coordinate of 1 and project the rotated circle on the xy plane:

(R cos t, R(cos θ sin t + sin θ), R(cos θ − sin θ sin t), 1) ⎛ 1 0 0 0 ⎞
                                                            ⎜ 0 1 0 0 ⎟
                                                            ⎜ 0 0 0 r ⎟
                                                            ⎝ 0 0 0 1 ⎠

= (R cos t, R(cos θ sin t + sin θ), 0, 1 + Rr(cos θ − sin θ sin t)).

After dividing by the fourth component, this expression becomes

( R cos t/(1 + Rr(cos θ − sin θ sin t)),  R(cos θ sin t + sin θ)/(1 + Rr(cos θ − sin θ sin t)),  0 ).   (6.2)
Unfortunately, Equation (6.2) has sin t in the denominator, and is therefore fundamentally different from the parametric equation (a cos t, b sin t) of an ellipse. Nevertheless, for small tilt angles θ, the two equations are very similar, because for such angles sin θ is very small and cos θ is close to 1, so the denominator is close to 1 + Rr for any value of t, and Equation (6.2) is not much different from

( R cos t/(1 + Rr),  R cos θ sin t/(1 + Rr),  0 ),

which is an ellipse.
If the curve of Equation (6.2) is not an ellipse, then what is it? Experiments with the Mathematica code

Clear[r,R,ta];
{R Cos[t],R Sin[t],R}.
  {{1,0,0},{0,Cos[ta],-Sin[ta]},{0,Sin[ta],Cos[ta]}}
{R Cos[t],R Cos[ta] Sin[t]+R Sin[ta],R Cos[ta]-R Sin[t] Sin[ta],1}.
  {{1,0,0,0},{0,1,0,0},{0,0,0,r},{0,0,0,1}}
R=1; r=.5; ta=45 Degree;
ParametricPlot[{R Cos[t]/(1+r R(Cos[ta]-Sin[t]Sin[ta])),
  R(Cos[ta]Sin[t]+Sin[ta])/(1+r R(Cos[ta]-Sin[t]Sin[ta]))},
  {t,0,2Pi}]

show that the curve looks like an ellipse whose eccentricity grows with the tilt angle θ. Thus, for practical purposes we can claim that the perspective projection of a circle looks like an ellipse. With this in mind, we now ask when does a tilted circle look exactly like an ellipse? The answer is, when we use parallel projection, instead of perspective projection. A tilted circle in three dimensions is expressed by Equation (6.1). To project this in
parallel on the xy plane, all we have to do is clear the z coordinate, which produces (R cos t, R(cos θ sin t + sin θ)); an ellipse with semi-axes R and R cos θ.

The ellipse. The term “ellipse” is derived from the Greek ἔλλειψις, meaning absence ([Heath 81] explains this). An ellipse is the locus of all the points for which the sum of the distances to two fixed points, called the foci, is constant. An ellipse centered on the origin with foci at points (−c, 0) and (c, 0) is called canonical (Figure 6.27a). Its implicit representation is (x/a)² + (y/b)² = 1, where 2a and 2b are the major and minor axes, respectively.
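The parallel-projection claim can be checked against this implicit representation: the projected points satisfy the canonical equation with a = R and b = R cos θ once the center, which the sin θ term shifts to (0, R sin θ), is subtracted. A short Python verification (the numeric values of R and θ are arbitrary):

```python
from math import sin, cos, pi, radians

R, theta = 2.0, radians(30)
a, b = R, R * cos(theta)    # semi-axes of the projected ellipse
yc = R * sin(theta)         # the projection is centered at (0, R sin(theta))

for k in range(12):
    t = 2 * pi * k / 12
    x = R * cos(t)                              # parallel projection of the
    y = R * (cos(theta) * sin(t) + sin(theta))  # tilted circle (z dropped)
    lhs = (x / a) ** 2 + ((y - yc) / b) ** 2
    assert abs(lhs - 1.0) < 1e-12               # lies exactly on the ellipse
print("parallel projection is an exact ellipse")
```

Algebraically, (x/a)² = cos² t and ((y − yc)/b)² = sin² t, so the sum is identically 1; no approximation is involved, in contrast with the perspective case of Equation (6.2).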
Figure 6.27: A Canonical Ellipse.
It’s an ellipse.
—Gordon Eklund and Gregory Benford, If the Stars Are Gods (1977).

The ellipse can be represented parametrically by means of

E(t) = (a cos(2πt), b sin(2πt)),   0 ≤ t ≤ 1,

or

E(t) = ( a(1 − t)/(1 + t), 2b√t/(1 + t) ),   0 ≤ t ≤ ∞.

When a = b, these expressions reduce to the parametric representations of a circle.
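Both forms can be verified against the implicit equation: for the second one, ((1 − t)/(1 + t))² + (2√t/(1 + t))² = ((1 − t)² + 4t)/(1 + t)² = 1. A Python check (the values of a, b, and the sample parameters are arbitrary; note that the rational form as written traces only the upper half of the ellipse):

```python
from math import cos, sin, sqrt, pi

a, b = 3.0, 2.0

def on_ellipse(x, y, eps=1e-12):
    """True when (x, y) satisfies (x/a)^2 + (y/b)^2 = 1."""
    return abs((x / a) ** 2 + (y / b) ** 2 - 1.0) < eps

# Trigonometric form, 0 <= t <= 1:
for k in range(10):
    t = k / 10
    assert on_ellipse(a * cos(2 * pi * t), b * sin(2 * pi * t))

# Rational form, 0 <= t < infinity:
for t in [0.0, 0.5, 1.0, 2.0, 10.0, 1000.0]:
    x = a * (1 - t) / (1 + t)
    y = 2 * b * sqrt(t) / (1 + t)
    assert on_ellipse(x, y)
print("both parametrizations lie on the ellipse")
```

The rational form is useful because it involves no trigonometric functions; as t runs from 0 to ∞, the x coordinate sweeps from a down to −a.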
Figure 6.28: Natural and Artificial Circles.
The eccentricity of the ellipse measures how much it deviates from a circle. It is defined as e = c/a. For a circle, e = 0. When e = 1, the ellipse reduces to a line from (−c, 0) to (c, 0). The eccentricity of the Earth’s orbit around the Sun is ≈ 1/60. However, when an ellipse is drawn as the perspective projection of a circle, it is the degree, not the eccentricity, that is used as the measure of deviation. A circle is a 90° ellipse, while a straight line is a 0° ellipse. Thus, the degree of an ellipse E equals 90° − θ, where θ is the angle through which the original circle had to be rotated to look like E.

When an area is scaled (expanded or shrunk), the determinant of the scaling matrix equals the scaling factor. This can be used to determine the area πab of the ellipse. The equations of the circle and the ellipse are x² + y² = R² and (x/a)² + (y/b)² = 1. Therefore, if point (x, y) is on the circle, it can be transformed to the ellipse by the scaling transformation

⎛ a/R   0  ⎞
⎝  0   b/R ⎠.

The determinant of this matrix equals ab/R², so the area of the ellipse equals the circle area times ab/R², or πR² × ab/R² = πab.

A circle no doubt has a certain appealing simplicity at first glance, but one look at an ellipse should have convinced even the most mystical of astronomers that the perfect simplicity of the circle is akin to the vacant smile of complete idiocy. Compared to what an ellipse can tell us, a circle has little to say. Possibly our own search for cosmic simplicities in the physical universe is of this circular kind—a projection of our uncomplicated mentality on an infinitely intricate external world.
—Eric Temple Bell, Mathematics: Queen and Servant of Science.

Drawing an ellipse with graphics software is easy. All drawing and illustration programs have a special tool for drawing circles. Often, this tool can also be used to
draw ellipses, but in the absence of such an option, an ellipse can be drawn by starting with a circle and scaling it in one dimension (Figure 6.27b). Figure 6.29 illustrates how an ellipse behaves as the one-point perspective projection of a circle. Part (a) of the figure shows an ellipse with its two main axes, and it is clear that the center point (defined as the intersection of the two diagonals of the bounding box) is located at the intersection of the two axes. Also, the two extreme points above and below the center point divide the ellipse into two equal parts. In order to fit this ellipse in a one-point perspective grid, we reshape its bounding box and change it from a rectangle to a trapezoid. The two green diagonals in part (b) of the figure determine the new (perspective) center of the ellipse, and it is obvious that this center is no longer located on the major axis. Also, the two extreme points above and below the center now divide the ellipse into two unequal parts. Notice, however, that the minor axis still divides the ellipse into two identical parts, which is one reason why the minor axis of the ellipse is most important in perspective drawing. He [Bean] had already read all the major writers and many of the minor ones and knew the important campaigns backward and forward, from both sides. —Orson Scott Card, Ender’s Shadow.
Figure 6.29: The Ellipse in One-Point Perspective.
Thus, we conclude that the perspective projection of a given circle is the ellipse that satisfies the following conditions: (1) Its minor axis points in the direction of a vanishing point and (2) it touches the trapezoid at the centers of its sides and is tangent to each side. Next, we extend the one-point perspective grid by constructing more verticals. Vertical a is drawn at an arbitrary location, and then a (green) line is drawn from point A through point B, to intercept the bottom border of the grid at point C, which specifies the next vertical b. Figure 6.30 shows how this process is repeated four more times, to end up with five trapezoids, which are the projections of five identical squares, each the bounding box of a circle. The figure also shows ellipses drawn in the first and last trapezoids. The principle of drawing these ellipses is to keep their minor axes on the horizon (i.e., facing the vanishing point).
Figure 6.30: Two Ellipses in One-Point Perspective.
We are now ready to extend this technique to two-point perspective. Figure 6.31a depicts a horizon line with two vanishing points and a one-point perspective grid that converges to VP2. We select the two extreme trapezoids and draw lines from their centers to VP1. These lines become the minor axes of the ellipses shown in part (b) of the figure. The major axes are simply the lines perpendicular to the minor axes and are also shown.

The Cutters had major as well as minor subjects for dispute. The chief of these was the question of inheritance.
—Willa Cather, My Antonia (1918).

It is also important to discuss how such nonstandard ellipses can be drawn. Notice that the ellipses of Figure 6.31b touch the centers of (and are tangent to) the four sides of the trapezoid, as indicated by the four small triangles. Just drawing a vertical ellipse and then rotating it, as illustrated by the dashed ellipse of Figure 6.31c, does not produce the correct result. In Adobe Illustrator (or a similar drawing program), it is better to start with a vertical ellipse and then drag each of its four anchor points to the center of the trapezoid’s side nearest to it (indicated by the small triangles in Figure 6.31b,c) and adjust the tangent at the anchor point to be parallel to that side, as shown in the figures. A different approach to drawing ellipses is discussed in [Bartlett 08].

The great German painter Albrecht Dürer showed how to extend Alberti’s approach to three-dimensional objects (Figure 6.32). Lay the object (a lute in the figure) on a table behind a frame and attach a string with a pulley and a weight to the wall in front of the frame. A wooden leaf is attached to the frame with hinges, and a sheet of blank paper is mounted on the leaf. Now move the free end of the string to an arbitrary point on the object and determine the point where the string intercepts the frame. (This is done by two moveable wires or threads, as shown in the upper part of the figure.)
Remove the string temporarily, close the hinged leaf, and mark the intersection point of the wires on the paper. This is repeated for many points on the object, which later permits the artist to interpolate the points and complete the drawing.

In contrast with Renaissance and classical artists, who mostly tried to create works true to nature, many impressionist and modern artists consider the use of color and technique more important than accurate perspective. Figure 6.33 is a classic example of this approach. It shows the famous yellow chair painted by Vincent van Gogh several
Figure 6.31: Ellipses in Two-Point Perspective.
times during his short stay in Arles. Even a quick glance at it creates the impression that something is wrong. However, van Gogh fans (this author not numbered among them) claim that his mastery of color, combined with his technique and style, resulted in paintings full of appeal and charm, in spite of the crude perspective (or even because of it). Another example that some may call divergent perspective is The Chair by David Hockney (1985).
6.5 The Mathematics of Perspective

The mathematics of linear perspective is easy to derive and to apply to various situations. The mathematical problem involves three entities, a (three-dimensional) object to be projected, a projection plane, and a viewer watching the projection on this plane. The object and the viewer are located on different sides of the projection plane, and the problem is to determine what the viewer will see on the plane. It is like having a transparent plane and looking through it at an object. Specifically, given an arbitrary point P = (x, y, z) on the object, we want to compute the two-dimensional coordinates (x∗, y∗) of its projection P∗ on the projection plane. Once this is done for all the points of the object, the perspective projection of the object appears on the projection plane. Thus, the problem is to find a transformation T that will transform P to P∗. We use the notation P∗ = PT from Chapter 4.
Figure 6.32: Dürer's Method of Perspective Drawing.
Figure 6.33: Crude Perspective in Van Gogh's Yellow Chair.
Often, there is no need to compute the projections of all the points of the object. If P1 and P2 are the two endpoints of a straight line on the object, then only their projections P∗1 and P∗2 need be computed, and a straight line is then drawn between them on the plane. In the case of a curve, it is enough to compute the projections of several points on the curve and either interpolate them on the projection plane or simply connect them with short, straight segments.

It is obvious that what the viewer will see on the projection plane depends on the position and orientation of the viewer. The viewer and the object have to be located on different sides of the plane, and the viewer should look at the plane. If the viewer moves, turns, or tilts his head, he will see something else on the projection plane and may not even see this plane at all. Similarly, if the object is moved or if the projection plane is moved or rotated, the projection will change. Thus, the mathematical expressions for perspective must depend on the location and orientation of the viewer and the projection plane, as well as on the location of each point P on the object.

We start with a special case—where the viewer is positioned at a special location, looking in a special direction at a specially-placed projection plane—and show how to project any three-dimensional point to a two-dimensional point on the projection plane. There is no need to consider the orientation of the object because each point P on the object is projected separately. Starting in Section 6.6, this treatment is generalized and we show how to project an object on any projection plane, with the viewer located anywhere and looking in an arbitrary direction.

The discussion of perspective and of converging lines earlier in this chapter implies that we are looking for a transformation T that satisfies the following conditions:
1. As the object is moved away from the projection plane, its projection shrinks. This corresponds to the well-known fact that distant objects appear small.
2. The projection of a distant object features less perspective, as illustrated by Figure 6.12. The reader may claim that the projection of a distant object is too small to be seen, so the loss of perspective may not matter, but the point is that we can look at a distant object through a telescope. This brings the object closer, so it looks big, but there is still loss of perspective.
3. Any group of straight parallel lines on the object seems to converge to a vanishing point, except if the lines are perpendicular to the line of sight of the viewer. This rule of vanishing points is stated and discussed in Section 6.1.

The remainder of this section derives the special case of perspective projection in four steps as follows:
1. We describe the special case and state the rule of projection.
2. The mathematical expressions are derived using only similar triangles.
3. We show that this rule satisfies the three requirements above.
4. We include this rule in the general three-dimensional transformation matrix. This produces a 4×4 matrix that can be used to transform the points of an object and also project them on a plane.

Step 1. The special case discussed in this section places the viewer at point (0, 0, −k), where k, a positive real number, is a parameter selected by the user. The viewer looks in the positive z direction, so the line of sight is the vector (0, 0, 1). Finally, the projection plane is the xy plane. In order for the projection to make sense, we state
again that the viewer and the object must be on different sides of the projection plane, which implies that all the points of the object must have nonnegative z coordinates. (The points will normally have positive z coordinates, but they may also be of the form (x, y, 0); i.e., located on the projection plane itself.) This special case is referred to as the standard position (Figure 6.34a) and is mentioned often in this book. The rule of perspective projection is a special case of the general rule of projection (Page 200) where the center of projection is at the viewer. Thus, in order to project point P, we compute the line segment that connects P to the viewer at point (0, 0, −k) and place the projected point P∗ where this segment intercepts the xy plane. (The segment always intercepts the xy plane because the object and the viewer are located on opposite sides of the plane.) Because the projection plane is the xy plane, the coordinates of the projected point are (x∗, y∗, 0), indicating that it is two-dimensional.
Figure 6.34: (a) Standard and (b) Nonstandard Positions.
It is important to realize that the viewer and the projection plane constitute a single unit and have to be moved and rotated together. This is illustrated in Figure 6.34b and especially in Figure 6.35a, which shows the viewer-plane unit moving around the object and the viewer looking at the object from different directions, examining various projections of it on the plane. It is pointless to move the viewer around the object while the projection plane stays at the same location (Figure 6.35b) because such a viewer will generally not even be looking at the projection plane. Thus, the projection plane must move with the viewer and must remain perpendicular to the line of sight of the viewer and at a distance of k units from him (although k may be varied by the user).
Figure 6.35: Moving the Viewer and the Projection Plane.
Step 2. The two similar triangles of Figure 6.36 yield the simple relations

x∗/k = x/(z + k)   and   y∗/k = y/(z + k),

from which we obtain

x∗ = x/((z/k) + 1)   and   y∗ = y/((z/k) + 1).    (6.3)
(Some authors assign the x coordinate a negative sign. This is a result of the difference between left-handed and right-handed coordinate systems as discussed in Section 4.3. See also Exercise 6.28.) The +1 in the denominator of Equation (6.3) is important. It guarantees that the denominator will never be zero. The denominator could be zero only if z/k = −1, but k is positive and z is nonnegative.
Figure 6.36: Perspective by Similar Triangles.
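Equation (6.3) is simple enough to check in a few lines of code. The following Python sketch (the function name is ours, not the book's) projects a point in the standard position and guards against the invalid inputs discussed above:

```python
def project_standard(x, y, z, k):
    """Project the point (x, y, z) onto the xy plane for a viewer at
    (0, 0, -k) looking in the positive z direction (Equation (6.3))."""
    if k <= 0:
        raise ValueError("k must be a positive real number")
    if z < 0:
        raise ValueError("object points must have nonnegative z coordinates")
    d = (z / k) + 1          # never zero, because z >= 0 and k > 0
    return (x / d, y / d)

# A point on the projection plane (z = 0) projects to itself:
print(project_standard(2, 3, 0, 1))   # (2.0, 3.0)
# Moving the same point away from the plane shrinks its projection:
print(project_standard(2, 3, 4, 1))   # (0.4, 0.6)
```

The two guards mirror the text: k is strictly positive and the object lies on the nonnegative-z side of the plane, so the denominator (z/k) + 1 can never vanish.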
Step 3. Equation (6.3) can be employed to show that the projection rule of Step 1 results in a projection that satisfies the three conditions above and can therefore be called perspective. Condition 1 says that a distant object should appear small. The object can become distant in three ways: 1. increasing the z coordinates of its points; 2. increasing the x or y coordinates; 3. increasing the value of k. For large values of z, Equation (6.3) yields small values for x∗ and y∗. Specifically,

lim_{z→∞} x∗ = 0   and   lim_{z→∞} y∗ = 0.
For large values of x or y, imagine two points, P1 = (x1, y1, z1) and P2 = (x2, y1, z1), on the object that differ only in their x coordinates. They are projected to the two points P∗1 = (x∗1, y1∗) and P∗2 = (x∗2, y1∗), which have identical y coordinates, and the ratio of their x coordinates is

x∗1/x∗2 = [x1/((z1/k) + 1)] / [x2/((z1/k) + 1)] = x1/x2.    (6.4)
Thus, when both x1 and x2 grow, the ratio x∗1 /x∗2 approaches 1, which implies that the two projected points P∗1 and P∗2 get closer. Since P1 and P2 are any points with
Figure 6.37: (a) Large x Dimensions. (b) Large Values of k.
the same y and z coordinates, this implies that all the points with the same y and z coordinates produce projections that are very close. The object seems to have shrunk in the x dimension (Figure 6.37a).

The case where k increases (i.e., the viewer moves away from the projection plane) is different. Figure 6.37b shows how the projection of the object becomes bigger and bigger in this case until, at the limit, when the viewer is at infinity, the projection reaches the actual size of the object. The perspective projection is reduced in this limit to a parallel projection. However, even though the projection itself gets bigger, the viewer sees a small projected object because the projection plane and everything on it look small to a distant viewer.

Condition 2 demands that a distant object feature less perspective. We already know that an object can become distant in three ways, each of which is treated individually here.
1. The z coordinates are increased. We select two object points P1 = (x1, y1, z1) and P2 = (x1, y1, z2) with the same x and y coordinates and different z coordinates. We denote their projected points by P∗1 = (x∗1, y1∗) and P∗2 = (x∗2, y2∗) and compute the ratio x∗1/x∗2:

x∗1/x∗2 = [x1/((z1/k) + 1)] / [x1/((z2/k) + 1)] = (z2 + k)/(z1 + k).    (6.5)

When the z coordinates are increased, this ratio approaches 1, thereby showing that the distance between the projected points is decreased, resulting in less perspective.
2. The x or y coordinates are increased. Equation (6.4) shows that the projected points get closer in this case, too.
3. The value of k is increased. In this case, Equation (6.5) shows that the projected points get closer, again implying less perspective.

Condition 3 is also easy to verify, at least in the case of lines parallel to the z axis. Figure 6.38 shows how a group of lines parallel to the z axis are projected to line segments that converge at the origin.

Step 4. The projection expressed by Equation (6.3) can be included in the general
Figure 6.38: Lines Parallel to the z Axis.
4×4 transformation matrix in three dimensions (Equation (4.23)). The result is

        ⎛ 1  0  0  0 ⎞
   Tp = ⎜ 0  1  0  0 ⎟ .    (6.6)
        ⎜ 0  0  0  r ⎟
        ⎝ 0  0  0  1 ⎠
A simple test verifies that the product (x, y, z, 1)Tp yields (x, y, 0, rz + 1) or, after dividing by the fourth coordinate, (x/(rz+1), y/(rz+1), 0, 1). This agrees with Equation (6.3) if we assume that r = 1/k. (Recall that k is strictly positive and is never zero. The viewer never presses his eyes to the projection plane.)

It is now clear that there are two more special cases that are geometrically equivalent to our standard position. These are the cases where the viewer is positioned on the negative side of the x axis (or the y axis) at a certain distance from the origin and the projection plane is the yz (or xz) plane. The object is located on the positive side of the x (or y) axis. These cases correspond to the transformation matrices

        ⎛ 0  0  0  p ⎞               ⎛ 1  0  0  0 ⎞
   Tx = ⎜ 0  1  0  0 ⎟   and   Ty = ⎜ 0  0  0  q ⎟ ,
        ⎜ 0  0  1  0 ⎟               ⎜ 0  0  1  0 ⎟
        ⎝ 0  0  0  1 ⎠               ⎝ 0  0  0  1 ⎠
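The matrix form can be verified numerically. This sketch (plain nested lists; the helper names are ours) multiplies a row 4-tuple by Tp of Equation (6.6) and performs the division by the fourth coordinate:

```python
def mat_vec(v, M):
    """Row vector times 4x4 matrix, the convention used in this chapter."""
    return tuple(sum(v[i] * M[i][j] for i in range(4)) for j in range(4))

def Tp(r):
    """The standard-position projection matrix of Equation (6.6), r = 1/k."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 0, r],
            [0, 0, 0, 1]]

X, Y, Z, H = mat_vec((2.0, 3.0, 4.0, 1.0), Tp(1.0))   # k = 1, so r = 1
print((X, Y, Z, H))       # (2.0, 3.0, 0.0, 5.0), i.e., (x, y, 0, rz + 1)
print((X / H, Y / H))     # (0.4, 0.6), agreeing with Equation (6.3)
```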
where both 1/p and 1/q are the distances of the viewer from the origin. The general case, where the viewer can be positioned anywhere and looking in any direction, is covered in Section 6.6. Before we get to this material, here are some examples of points projected in the standard position.

Linear example. We arbitrarily select the two points P1 = (2, 3, 1) and P2 = (3, −1, 2) and the distance k = 1. Notice that the z coordinates of these points are nonnegative. The points are projected to

P∗1 = (2/((1/1) + 1), 3/((1/1) + 1)) = (1, 3/2)   and   P∗2 = (3/((2/1) + 1), −1/((2/1) + 1)) = (1, −1/3).

We now select the midpoint Pm = (P1 + P2)/2 = (5/2, 1, 3/2) and project it to

P∗m = ((5/2)/((3/2)/1 + 1), 1/((3/2)/1 + 1)) = (1, 2/5).
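The arithmetic of this example can be confirmed with a short script (a sketch; the helper mirrors Equation (6.3) with k = 1):

```python
def project(p, k=1.0):
    """Standard-position projection of a 3D point, Equation (6.3)."""
    x, y, z = p
    d = z / k + 1
    return (x / d, y / d)

P1, P2 = (2.0, 3.0, 1.0), (3.0, -1.0, 2.0)
Pm = tuple((a + b) / 2 for a, b in zip(P1, P2))   # midpoint (5/2, 1, 3/2)

print(project(P1))   # (1.0, 1.5)
print(project(P2))   # (1.0, -0.333...)
print(project(Pm))   # (1.0, 0.4); on the segment P1*P2*, but not its midpoint
```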
Point Pm is located on the straight segment connecting P1 to P2 (it is the midpoint of the segment) and P∗m is on the segment connecting P∗1 to P∗2 (although it isn’t the midpoint, because it is easy to see that P∗m = 0.4P∗1 + 0.6P∗2 ). The perspective projection of a straight segment is a straight segment, which is why it is done in practice by projecting the two endpoints and connecting them on the projection plane with a straight segment. Converging lines. We now select an arbitrary point P3 = (0, 2, 3) and compute a new point P4 = (1, −2, 4) from the relation P4 − P3 = P2 − P1 . The difference of two points is a vector, so this relation guarantees that the vector from P3 to P4 equals the vector from P1 to P2 , or, equivalently, that the two line segments P1 P2 and P3 P4 are parallel. The two new points are projected to yield
P∗3 = (0, 2/((3/1) + 1)) = (0, 1/2)   and   P∗4 = (1/((4/1) + 1), −2/((4/1) + 1)) = (1/5, −2/5).

The parametric equation of the straight segment connecting P∗3 to P∗4 is (see Equation (Ans.42))

L2(w) = w(P∗4 − P∗3) + P∗3 = w(1/5, −9/10) + (0, 1/2) for 0 ≤ w ≤ 1,

and the parametric equation of the straight segment connecting P∗1 to P∗2 is

L1(u) = u(P∗2 − P∗1) + P∗1 = u(0, −11/6) + (1, 3/2) for 0 ≤ u ≤ 1.

The point is that although the original segments P1P2 and P3P4 are parallel, the two projected segments are not parallel. They meet at point L1(3) = L2(5) = (1, −4). Another way to prove that the two projected line segments converge is to show that they are not parallel by computing and comparing their directions (or slopes). It's easy to see that P∗2 − P∗1 = (0, −11/6) but P∗4 − P∗3 = (1/5, −9/10). Line segment L1 moves straight down, whereas L2 has a slope of (−9/10)/(1/5) = −4.5.

Exercise 6.8: Select two line segments that are perpendicular to the line of sight of the viewer, and show that their projections on the xy plane are parallel.

Projecting curves. We select the three points P1 = (−1, 0, 1), P2 = (0, 1, 2), and P3 = (1, 1, 3) and compute the Bézier curve P(t) (Chapter 13) defined by them

P(t) = (1 − t)²(−1, 0, 1) + 2t(1 − t)(0, 1, 2) + t²(1, 1, 3).

The midpoint of this curve is P(0.5) = (−1/4, 0, 1/4) + (0, 1/2, 1) + (1/4, 1/4, 3/4) = (0, 3/4, 2). We now project the three original points and obtain

P∗1 = (−1/((1/1) + 1), 0) = (−1/2, 0),   P∗2 = (0, 1/((2/1) + 1)) = (0, 1/3),   P∗3 = (1/((3/1) + 1), 1/((3/1) + 1)) = (1/4, 1/4).
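Projection and curve evaluation do not commute, which can be seen numerically. This sketch (the helper names are ours) compares the projection of the curve point P(0.5) with the point obtained by first projecting the three control points and then evaluating the resulting two-dimensional Bézier curve at t = 0.5:

```python
def project(p, k=1.0):
    """Standard-position projection, Equation (6.3)."""
    x, y, z = p
    d = z / k + 1
    return (x / d, y / d)

def bezier2(p0, p1, p2, t):
    """Quadratic Bezier curve defined by three control points."""
    return tuple((1 - t) ** 2 * a + 2 * t * (1 - t) * b + t ** 2 * c
                 for a, b, c in zip(p0, p1, p2))

P1, P2, P3 = (-1.0, 0.0, 1.0), (0.0, 1.0, 2.0), (1.0, 1.0, 3.0)

curve_then_project = project(bezier2(P1, P2, P3, 0.5))
project_then_curve = bezier2(project(P1), project(P2), project(P3), 0.5)
print(curve_then_project)   # (0.0, 0.25)
print(project_then_curve)   # (-0.0625, 0.2291...), a different point
```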
The Bézier curve defined by these points is

P∗(t) = (1 − t)²(−1/2, 0) + 2t(1 − t)(0, 1/3) + t²(1/4, 1/4).

The point of this example is that the projection of P(0.5), which is (0, 1/4), is not located on P∗(t). This illustrates the nonlinear nature of the Bézier curve (as well as most other curves).

Exercise 6.9: Show why point (0, 1/4) is not located on P∗(t).

Transforming and projecting. This example illustrates the advantage of the projection matrix Tp of Equation (6.6). Given an object, we might want to transform it before we project its points. In such a case, all we have to do is prepare the individual 4×4 transformation matrices, multiply them together in the order of the transformations, and multiply the result by Tp. Assume that we want to apply the following transformations to our object: (1) rotate it about the x axis by 90° from the direction of positive y to the direction of positive z (Figure 6.39a); (2) translate it by 3 units in the positive z direction; (3) scale it by a factor of 1/2 (i.e., shrink it to half its size) in the y dimension. The three transformation matrices are

        ⎡ 1  0  0  0 ⎤         ⎡ 1  0  0  0 ⎤         ⎡ 1   0   0  0 ⎤
   TR = ⎢ 0  0  1  0 ⎥ ,  TT = ⎢ 0  1  0  0 ⎥ ,  TS = ⎢ 0  1/2  0  0 ⎥
        ⎢ 0 −1  0  0 ⎥         ⎢ 0  0  1  0 ⎥         ⎢ 0   0   1  0 ⎥
        ⎣ 0  0  0  1 ⎦         ⎣ 0  0  3  1 ⎦         ⎣ 0   0   0  1 ⎦
and their product with Tp (we assume k = 1, so r = 1) produces

                  ⎡ 1  0  0  0 ⎤   ⎡ 1   0    0  0 ⎤
   T = TR TT TS ⎢ 0  1  0  0 ⎥ = ⎢ 0   0    0  1 ⎥ .    (6.7)
                  ⎢ 0  0  0  1 ⎥   ⎢ 0  −1/2  0  0 ⎥
                  ⎣ 0  0  0  1 ⎦   ⎣ 0   0    0  4 ⎦
Figure 6.39: Rotation about the x Axis.
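The composite matrix of Equation (6.7) can be reproduced by straightforward matrix multiplication. This sketch uses plain nested lists (k = 1, so r = 1) and then applies the result to a sample point:

```python
def mat_mul(A, B):
    """Product of two 4x4 matrices."""
    return [[sum(A[i][m] * B[m][j] for m in range(4)) for j in range(4)]
            for i in range(4)]

TR = [[1, 0, 0, 0], [0, 0, 1, 0], [0, -1, 0, 0], [0, 0, 0, 1]]   # rotate 90 deg about x
TT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 3, 1]]    # translate 3 in z
TS = [[1, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # scale y by 1/2
Tp = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]    # projection, r = 1

T = mat_mul(mat_mul(mat_mul(TR, TT), TS), Tp)
# The rows of T are (1,0,0,0), (0,0,0,1), (0,-1/2,0,0), (0,0,0,4),
# matching Equation (6.7).

v = [sum((0, 1, -4, 1)[i] * T[i][j] for i in range(4)) for j in range(4)]
print(v[0] / v[3], v[1] / v[3])   # the projected point (0.0, 0.4), i.e., (0, 2/5)
```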
We can now pick any point on the object, write it as a 4-tuple in homogeneous coordinates, and multiply it by T to obtain its projection after applying the three transformations to it. Notice that a point cannot be scaled, but the effect of scaling is
to move points such that the scaled object will shrink to half its size in the y dimension. As an example, multiplying point (0, 1, −4, 1) by T results in (0, 2, 0, 5), which, after dividing by the fourth coordinate, produces the two-dimensional point (0, 2/5).

Exercise 6.10: Multiply point (0, 1, −4, 1) by the product TR TT TS and explain the result.

Exercise 6.11: The previous paragraph has mentioned scaling, so let's consider another subtle effect of this simple transformation. The transformation matrix for scaling is

   ⎛ T1  0   0  0 ⎞
   ⎜ 0  T2   0  0 ⎟ .
   ⎜ 0   0  T3  0 ⎟
   ⎝ 0   0   0  1 ⎠

When combined with perspective projection, it yields

   ⎛ T1  0   0  0 ⎞ ⎛ 1  0  0  0 ⎞   ⎛ T1  0  0   0  ⎞
   ⎜ 0  T2   0  0 ⎟ ⎜ 0  1  0  0 ⎟ = ⎜ 0  T2  0   0  ⎟ .
   ⎜ 0   0  T3  0 ⎟ ⎜ 0  0  0  r ⎟   ⎜ 0   0  0  T3 r ⎟
   ⎝ 0   0   0  1 ⎠ ⎝ 0  0  0  1 ⎠   ⎝ 0   0  0   1  ⎠

Hence, a point (x, y, z, 1) is transformed to (T1 x, T2 y, 0, T3 rz + 1), which implies

x∗ = T1 x/(T3 rz + 1)   and   y∗ = T2 y/(T3 rz + 1).
In the special case of uniform scaling, T1 = T2 = T3 = T, we get x∗ = x/(rz + 1/T) and y∗ = y/(rz + 1/T). The problem is that when T gets large (large magnification), 1/T becomes small, resulting in

x∗ ≈ x/(rz) = xk/z   and   y∗ ≈ y/(rz) = yk/z.
We don’t seem to get the expected magnification. What’s the explanation? The rightmost column of matrix T of Equation (6.7) is important and will serve (on Page 319) to illuminate the properties of the general perspective projection. The three top elements of this column are 0, 1, and 0. The reader may remember that the general transformation matrix, Equation (4.23), denotes these elements by p, q, and r. Thus, element q of matrix T is nonzero. It has already been mentioned that element r of matrix Tp is nonzero because the viewer is positioned on the z axis. The reason that element q of matrix T is nonzero is the rotation about the x axis. We can interpret this rotation either as a rotation of the point or as a rotation of the coordinate system. In the latter case, this rotation has changed the projection plane from the xy plane to the xz plane and has also moved the viewer (because the viewer and the projection plane constitute one unit) from his standard position on the z axis to a new location on the y axis (Figure 6.39b). The fact that q is nonzero tells us that the y axis now intercepts the projection plane. Page 319 sheds more light on the function of matrix elements p, q, and r.
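The saturation described in Exercise 6.11 is easy to observe numerically. This sketch (it only illustrates the effect; the explanation is left to the exercise) evaluates x∗ = x/(rz + 1/T) for growing magnification factors T:

```python
def scaled_projection(x, z, T, k=1.0):
    """x* for a point uniformly scaled by T and then projected (r = 1/k)."""
    r = 1.0 / k
    return x / (r * z + 1.0 / T)

for T in (1, 10, 100, 1000):
    print(T, scaled_projection(2.0, 4.0, T))
# As T grows, x* approaches the fixed limit xk/z = 0.5 instead of
# growing with the magnification.
```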
Exercise 6.12: Compute the coordinates of the object point P that happens to be projected to the origin after the three transformations.

Negative z coordinates. It has already been mentioned several times that the viewer and the object have to be located on different sides of the projection plane. In the standard position, this means that all the object points have to have nonnegative z coordinates. This example shows what happens when object points have invalid coordinates. (See also Exercise 6.20.) Figure 6.40a shows the two points P1 = (0, 1, −1) and P2 = (0, 1, 1) and a viewer located at (0, 0, −3). When Equation (6.3) is used to project the two points, the results are

P∗1 = (0, 1/((−1/3) + 1), 0) = (0, 3/2, 0)   and   P∗2 = (0, 1/((1/3) + 1), 0) = (0, 3/4, 0).
The result seems to make sense, but Figure 6.40b shows that when P1 is moved to the left (i.e., toward larger negative z values), its projection climbs up the y axis quickly and without limit, thereby creating a distorted projection of the entire object. When P1 is located right over the viewer [when it is moved to (0, 1, −3)], its projection is undefined, and when it is moved farther to the left, its projection becomes negative. In such a case, those parts of the object that are in front of the viewer are projected right side up but distorted, and those parts that are behind the viewer are projected upside down.
Figure 6.40: Perspective Projection with Negative z Coordinates.
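The behavior shown in Figure 6.40b can be reproduced directly from Equation (6.3). In this sketch, k = 3 (the viewer of the figure is at (0, 0, −3)) and the point (0, 1, z) is projected for several values of z:

```python
def project_y(y, z, k=3.0):
    """The y* coordinate of Equation (6.3); undefined when z = -k."""
    d = z / k + 1.0
    if d == 0.0:
        raise ZeroDivisionError("the point lies exactly over the viewer")
    return y / d

for z in (1.0, -1.0, -2.0, -2.9, -3.1, -5.0):
    print(z, project_y(1.0, z))
# y* is 3/4 at z = 1 and 3/2 at z = -1 (the two points of the figure);
# it grows without bound as z approaches -3 and flips sign behind the viewer.
```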
6.6 General Perspective

The standard position is just a special case of perspective projection. It simplifies the computations of the projected points and should be used whenever possible. There are cases, however, where the viewer has to be positioned at different points and has to look in different directions. A common example is computer animation. In a typical animation sequence, there is an object or a scene and we imagine a camera moving around or above the scene, taking snapshots much like a real movie camera. While the camera is moving, the object or objects in the scene may also move along a path, rotate, shrink, or become distorted by shearing.
An animation sequence is therefore done in steps, where each step starts by moving, rotating, or otherwise transforming the object (if necessary), moving the camera (which becomes the viewer) to its appropriate position for the step, orienting it so it looks in the right direction, and finally taking a snapshot. The last operation, taking a snapshot, is done by computing the perspective projections of all the object's points and plotting the points on the projection plane. The resulting image on the projection plane then becomes the next animation frame, and the final animation is screened at a fast rate (typically 18–24 frames per second) to create the illusion of smooth animation. Because animation is such an important application of perspective projection, we often use the term “screen” instead of “projection plane.” The main difference between a screen and a plane is that the former has a finite size, whereas the latter is infinitely large. In order to derive the mathematics of general perspective, we need to know at least (1) the location B of the viewer, (2) the direction D of the viewer's line of sight, and (3) the coordinates of all the points P on the object. Figure 6.41 illustrates another complication that often arises. The figure shows viewers located at the same point and looking in the same direction, but with screens that have different orientations (although each is perpendicular to the line of sight). Thus, in order to fully specify the viewer-screen unit, we sometimes also need to specify the direction T of the top of the screen.
Figure 6.41: General Perspective with Different Screen Orientations.
We start this section with a simple example that illustrates how rotation and translation, combined with basic concepts from geometry, can be applied to the computation of perspective projection. Similar computations can be carried out in other cases, but they are normally very messy. Future sections of this chapter illustrate better approaches to the problem of general perspective. In this example, we assume that the viewer has been moved from the standard position by a translation and his line of sight has been rotated. (It is also possible to first rotate the viewer and then translate him.) We compute the new location and direction of the viewer and use this information to compute the equation of the projection plane. (Alternatively, we can determine the new equation of the projection plane by applying to it the same transformations applied to the viewer.) Once this equation is known, we compute the straight segment P(t) that connects the object point P to the viewer. The final step is to calculate the point P(t0 ) where this segment intercepts the projection plane. This point is the projection P∗ of P. In the example, we rotate the viewer θ degrees counterclockwise about the y axis from the positive z to the positive x direction (Figure 6.42a). The viewer ends up at
point

              ⎛ cos θ  0  −sin θ ⎞
(0, 0, −k) ⎜   0    1     0    ⎟ = (−k sin θ, 0, −k cos θ) = (−kα, 0, −kβ),    (6.8)
              ⎝ sin θ  0   cos θ ⎠
where α = sin θ and β = cos θ (notice that α² + β² = 1). We select a general point P = (l, m, n) on the object and compute its projection P∗ on the new projection plane. Notice that the new projection plane is still perpendicular to the line of sight of the viewer and is still at a distance of k units. It is no longer identical to the xy plane, but it still contains the origin.
Figure 6.42: Viewer Rotated About the y Axis.
Exercise 6.13: The previous paragraph talks about rotating the viewer counterclockwise, but Equation (6.8) looks like Equation (4.4), which generates clockwise rotation. What’s the explanation? The first task is to find the equation of the projection plane. Vector (−kα, 0, −kβ) is perpendicular to the plane (it is the normal to the plane), so it is perpendicular to any general vector (x, y, z) on the plane. This is why their dot product is zero. From (−kα, 0, −kβ) • (x, y, z) = 0, we obtain the plane equation αx = −βz. Exercise 6.14: Why doesn’t this equation involve y? An alternate way to derive the plane equation is to start with the equation of the original plane and transform it by means of Equation (6.8). The original plane was the xy plane, whose equation is z = 0. A general point on this plane has coordinates (a, b, 0). When multiplied by the rotation matrix of Equation (6.8), the point is transformed to (βa, b, −αa). Thus, a general point (x, y, z) on the new plane has an x coordinate that’s the product of an arbitrary number a and cos θ, a z coordinate that’s the product of the same number a and − sin θ, and an arbitrary y coordinate. The relation between the coordinates can therefore be expressed as z = −αa = −α(x/β) or αx = −βz. Next, we find the equation of the line segment from the viewer to point P. We use the parametric representation P(t) = (P2 − P1 )t + P1 (Equation (Ans.42)). When
applied to the viewer (denoted by P1) and to point P = (l, m, n) (denoted by P2), it yields

P(t) = (l + kα, m, n + kβ)t + (−kα, 0, −kβ)
     = ((l + kα)t − kα, mt, (n + kβ)t − kβ)
     = (Px(t), Py(t), Pz(t)).

Our next task is to find the intersection point of the line and the projection plane. This is obtained at the value t0 that satisfies αPx(t0) = −βPz(t0) or

α[(l + kα)t0 − kα] = −β[(n + kβ)t0 − kβ].

The solution is

t0 = k(α² + β²)/(αl + βn + k(α² + β²)) = k/(αl + βn + k).
The intersection point is P(t0). The next task is to find the three coordinates of the projected point P∗ = P(t0). The x coordinate is

x∗ = Px(t0) = (l + kα)t0 − kα = (l + kα)·k/(αl + βn + k) − kα = (lkβ² − nkαβ)/(αl + βn + k).

The y coordinate is

y∗ = Py(t0) = mt0 = mk/(αl + βn + k),

and the z coordinate is

z∗ = Pz(t0) = (n + kβ)t0 − kβ = (n + kβ)·k/(αl + βn + k) − kβ = (−lkαβ + nkα²)/(αl + βn + k).
From (x∗, y∗, z∗) = (X/H, Y/H, Z/H), we obtain

X = lkβ² − nkαβ,   Y = mk,   Z = −lkαβ + nkα²,   H = αl + βn + k.

Using the four expressions above and keeping in mind that (l, m, n) are the coordinates of point P, it is easy to figure out the transformation matrix that projects P to P∗:

                                             ⎛  kβ²   0  −kαβ  α ⎞
(l, m, n, 1)T = (X, Y, Z, H)  implies   T = ⎜   0    k    0    0 ⎟ .    (6.9)
                                             ⎜ −kαβ   0   kα²   β ⎟
                                             ⎝   0    0    0    k ⎠
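Matrix (6.9) can be checked numerically: multiply a point (l, m, n, 1) by T, divide by H, and verify that the projected point indeed lies on the plane αx = −βz. A sketch (θ and the object point are chosen arbitrarily):

```python
from math import sin, cos, radians

def rotated_viewer_matrix(theta, k):
    """The projection matrix of Equation (6.9); alpha = sin(theta),
    beta = cos(theta) for a viewer rotated theta about the y axis."""
    a, b = sin(theta), cos(theta)
    return [[k * b * b,  0, -k * a * b, a],
            [0,          k,  0,         0],
            [-k * a * b, 0,  k * a * a, b],
            [0,          0,  0,         k]]

theta, k = radians(30), 1.0
T = rotated_viewer_matrix(theta, k)
l, m, n = 2.0, 3.0, 1.0                       # an arbitrary object point P
X, Y, Z, H = (sum(p * T[i][j] for i, p in enumerate((l, m, n, 1.0)))
              for j in range(4))

a, b = sin(theta), cos(theta)
print(a * (X / H) + b * (Z / H))   # essentially 0: P* lies on the plane ax = -bz
print(Y / H, m * k / (a * l + b * n + k))   # the two expressions for y* agree
```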
The three quantities α, 0, and β that appear at the top of the rightmost column of matrix T correspond to elements p, q, and r of the general 4×4 transformation matrix. They tell us which of the three coordinate axes is intercepted by the projection plane. In our case, the first and third quantities are nonzero (except for θ = 0 and θ = 90°), which implies that the new projection plane intercepts the x and z axes. Page 319 has more to say about elements p, q, and r.

Exercise 6.15: Calculate the values of matrix (6.9) for the three special cases θ = 0°, 45°, and 90°.

Exercise 6.16: Given the point P = (βl, m, −αl), calculate its projection. Explain the result!

Exercise 6.17: Imagine rotating the viewer, who is now at (−kα, 0, −kβ), a second time, by an angle φ about the x axis (Figure 6.42b). The new position of the viewer is

                 ⎛ 1    0      0    ⎞
(−kα, 0, −kβ) ⎜ 0  cos φ  −sin φ ⎟ = (−k sin θ, −k cos θ sin φ, −k cos θ cos φ) = (−kα, −kβγ, −kβδ),
                 ⎝ 0  sin φ   cos φ ⎠

where γ = sin φ and δ = cos φ. Derive the projection matrix for this case using steps similar to the ones above.

Exercise 6.18: After two rotations, the viewer may be located at any point in space. This is still not the most general case because there is another constraint. What is it?

It is important to realize that matrix (6.9) isn't as useful as it may seem at first. It generates the coordinates of projected points, but those coordinates are on the plane αx = −βz. In practice, we want to display the projected points on the screen, which is two-dimensional, so we have to go through another step. We have to define two local axes on αx = −βz and then figure out the coordinates of the projected points relative to those axes. This is why the approaches discussed in the remainder of this chapter are preferable. They project points onto the xy plane, where they effectively have just two coordinates. Before looking at these approaches, however, here is a short summary of the method used in this section.

Summary. The method of this section proceeds in the following steps:
1. Derive the equation of the projection plane.
2. Determine the equation of the line segment connecting an arbitrary point P on the object to the viewer (see Equation (Ans.42)).
3. Locate the intersection point of the line and the plane.
4. Convert the coordinates of the intersection point to screen coordinates.

It is possible to use these steps to figure out the projection matrix for the general case where the viewer may be located at any point B, looking in an arbitrary given direction D. This approach to the computation, however, is messy because in addition to B, D,
and k, another vector is needed to define the “up” direction of the projection plane. In this section, we started with the “up” direction in the positive y direction. After the two rotations, that direction has changed, but it is fully determined by the rotations and does not need to be explicitly specified. Another drawback of this approach is that points are projected on a three-dimensional plane, so they have three dimensions. In practice, the projected image should be displayed on the computer monitor, which is two-dimensional, so we would like the computations to produce two-dimensional points. The following two sections explain how to project points on the xy plane, which effectively makes them two-dimensional. Perspective, as its inventor remarked, is a beautiful thing. What horrors of damp huts, where human beings languish, may not become picturesque through aerial distance! What hymning of cancerous vices may we not languish over as sublimest art in the safe remoteness of a strange language and artificial phrase! Yet we keep a repugnance to rheumatism and other painful effects when presented in our personal experience. —George Eliot, Daniel Deronda (1876).
6.7 Transforming the Object

The theory of special relativity teaches that movement (at a constant speed and in a straight line) is relative, which suggests the following idea. Instead of transforming the viewer to a new location, computing the new equation of the projection plane, and going through all the computations of the previous section, why not leave the viewer at the standard position and transform the object instead? After all, we are interested only in what the viewer sees on the screen. The absolute locations of the viewer and the object are irrelevant. If the viewer is left at the standard position, then any point on the (transformed) object can be projected by means of matrix Tp of Equation (6.6), which greatly simplifies the computations. This approach is ideal for cases where the viewer is located at the standard position and has to be transformed by means of translations and/or rotations (or even reflections, but no scaling or shearing) to a new location, where he can observe the object from a different direction. This approach is useful, for example, in computer animation.

Suppose that we have to transform the viewer from the standard position to a new location by means of a transformation A that consists of several translations, rotations, and/or reflections (thus, A = T1 · T2 · · · Tn). Instead of this, we leave the viewer in the standard position and apply the inverse transformation A−1 to the object. Direct multiplication proves that the inverse of our product matrix A is given by $A^{-1} = T_n^{-1}\cdots T_2^{-1}\cdot T_1^{-1}$ (where $T_i^{-1}$ is the inverse of transformation Ti). A nice feature of this approach is that the individual Ti transformations are only translations, rotations, and reflections, and these transformations have simple inverses. The point is that transforming the viewer with A or transforming the object with A−1 will bring them to the same relative position.
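The rule for inverting the product A = T1 · T2 · · · Tn (invert each factor, reverse the order) is easy to check numerically. A minimal Python sketch (illustrative, not from the book; 4×4 matrices in the row-vector convention used throughout this chapter, with hypothetical helper names):

```python
import math

def matmul(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(4)) for j in range(4)] for i in range(4)]

def translate(dx, dy, dz):   # row-vector convention: translation sits in the bottom row
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [dx, dy, dz, 1]]

def rot_y(theta):            # rotation about the y axis
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0, -s, 0], [0, 1, 0, 0], [s, 0, c, 0], [0, 0, 0, 1]]

# A = T1 T2 T3; its inverse must be T3^-1 T2^-1 T1^-1 (each factor inverted, order reversed)
T1, T2, T3 = translate(1, 2, 3), rot_y(0.7), translate(-4, 0, 5)
A = matmul(matmul(T1, T2), T3)
A_inv = matmul(matmul(translate(4, 0, -5), rot_y(-0.7)), translate(-1, -2, -3))
I = matmul(A, A_inv)         # should be the 4x4 identity
```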
Once the object has been transformed, we can use matrix Tp (Equation (6.6)) to compute the perspective projection because
6 Perspective Projection
the viewer is still located at the standard position. In practice, there is, of course, no need to actually transform the object. All that we have to do is compute matrix T = A−1 · Tp and multiply each point of the object by T.

Example: A viewer located at the standard position and an object close to the origin (Figure 6.43). Suppose that we want to translate the viewer to the origin, rotate him 45° counterclockwise, and then translate him k units in both the negative x and negative z directions (Figure 6.43a,b,c). The transformation matrices are

$$T_1 = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&k&1 \end{pmatrix},\quad
T_2 = \begin{pmatrix} \cos 45^\circ & 0 & -\sin 45^\circ & 0\\ 0&1&0&0\\ \sin 45^\circ & 0 & \cos 45^\circ & 0\\ 0&0&0&1 \end{pmatrix},\quad
T_3 = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ -k&0&-k&1 \end{pmatrix}.$$

Figure 6.43: Transforming Viewer or Object.
The reverse transformations, performed in reverse order, are (Figure 6.43d,e,f)

$$A^{-1} = T_3^{-1} T_2^{-1} T_1^{-1} =
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ k&0&k&1 \end{pmatrix}
\begin{pmatrix} \cos 45^\circ & 0 & \sin 45^\circ & 0\\ 0&1&0&0\\ -\sin 45^\circ & 0 & \cos 45^\circ & 0\\ 0&0&0&1 \end{pmatrix}
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&-k&1 \end{pmatrix}
= \begin{pmatrix} \sin 45^\circ & 0 & \sin 45^\circ & 0\\ 0&1&0&0\\ -\sin 45^\circ & 0 & \sin 45^\circ & 0\\ 0&0&-k+2k\sin 45^\circ & 1 \end{pmatrix}.$$
Any point P = (x, y, z, 1) on the object can be projected to a two-dimensional point P∗ on the screen by
$$P^* = P\,A^{-1} T_p = (x, y, z, 1)\begin{pmatrix} a & 0 & 0 & a/k\\ 0 & 1 & 0 & 0\\ -a & 0 & 0 & a/k\\ 0 & 0 & 0 & 2a \end{pmatrix} = \bigl(a(x-z),\; y,\; 0,\; a(2k+x+z)/k\bigr),$$

resulting in

$$x^* = \frac{k(x-z)}{2k+x+z}, \qquad y^* = \frac{yk}{a(2k+x+z)},$$
where a = sin 45°. A comparison of parts (c) and (f) in Figure 6.43 shows how the viewer and the object end up in the same relative positions.

If transforming the viewer involves only translations and rotations (and no reflections), it is possible to transform the viewer from the standard position to any location in space by means of (1) a translation to the origin, (2) a general rotation about the origin, and (3) another translation from the origin to the final location. The two translations are easy to express, and Section 6.8 shows how to derive the transformation matrix that will rotate the viewer so his line of sight becomes any given direction D. The following example serves to illustrate this claim. Suppose that we want to translate the viewer from the standard position (0, 0, −k) to an arbitrary location B = (a, b, c) and then rotate him about some axis that goes through the origin (or, equivalently, first rotate him and then translate him to B). A rotation about the origin requires a temporary translation from B to the origin, a rotation, and a translation back to B. Thus, we need the four transformation matrices

$$T_1 = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ a&b&c+k&1 \end{pmatrix},\quad
T_2 = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ -a&-b&-c&1 \end{pmatrix},\quad
T_3 = \begin{pmatrix} \cdot&\cdot&\cdot&0\\ \cdot&\cdot&\cdot&0\\ \cdot&\cdot&\cdot&0\\ 0&0&0&1 \end{pmatrix},\quad
T_4 = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ a&b&c&1 \end{pmatrix},$$
where the elements of the rotation matrix T3 are irrelevant and are not shown. Direct multiplication verifies that the product T1 T2 is a transformation matrix that translates from the standard position (0, 0, −k) to the origin. Thus, instead of the four matrices above, we need only three transformation matrices, a translation to the origin, a rotation about the origin, and a translation to point B. Exercise 6.19: Suppose that we first want to rotate the viewer about the origin and then translate him to point B = (a, b, c). The rotation requires three transformations, a translation T1 to the origin, a rotation T2 about the origin, and a translation T3 back to (0, 0, −k). This must be followed by a translation T4 from the standard position to B. Show that the last two translations, T3 and T4 , can be replaced by one translation.
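The projected coordinates derived in the object-transform example above (x∗ and y∗ with a = sin 45°) can be verified numerically by building A−1 and Tp and multiplying. A Python sketch (an illustration in place of the book's Mathematica; the values k = 3 and the test point are arbitrary choices):

```python
import math

def matmul(A, B):
    """Multiply two 4x4 matrices (nested lists), row-vector convention."""
    return [[sum(A[i][t] * B[t][j] for t in range(4)) for j in range(4)] for i in range(4)]

k = 3.0                      # arbitrary viewer-screen distance
s = math.sin(math.pi / 4)    # the 'a' of the example: sin 45 degrees
# A^-1 = T3^-1 T2^-1 T1^-1: translate by (k,0,k), rotate -45 deg about y, translate by (0,0,-k)
A_inv = matmul(matmul(
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [k, 0, k, 1]],
    [[s, 0, s, 0], [0, 1, 0, 0], [-s, 0, s, 0], [0, 0, 0, 1]]),
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -k, 1]])
Tp = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1 / k], [0, 0, 0, 1]]
M = matmul(A_inv, Tp)        # transforms and projects object points in one step

x, y, z = 1.0, 2.0, 3.0      # arbitrary object point
v = (x, y, z, 1.0)
X, Y, Z, H = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
xs, ys = X / H, Y / H        # the screen coordinates x*, y*
```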
When reflections are included in addition to translations and rotations, more than four transformation matrices may be needed. Figure 6.44a shows a simple example. Given a viewer at (0, 0, −2), we want to reflect it about the plane (x, 0, x − 1) and rotate it 45° about the y axis (Figure 6.44b). The viewer can be considered a point which has no dimensions and no "left" and "right" directions. Thus, the reflection moves the viewer to another location but does not "reverse" him. However, the viewer and the screen have to be treated and moved as a single unit, which is why a full treatment of perspective projection should include a "top" vector that points in the direction of the top of the screen. When the viewer-screen unit is reflected, the left and right sides of the screen are reversed and the "top" vector changes direction (Figure 6.44c).

Figure 6.44: Reflecting the Viewer.
In general, a reflection about an arbitrary plane in three dimensions requires five transformations: (1) a translation that brings one point of the plane to the origin, (2) a rotation about the origin that brings the plane to one of the three coordinate planes, (3) a reflection about that plane, (4) a reverse rotation, and (5) the reverse translation. In many special cases, such as a plane parallel to one of the coordinate planes, this process can be simplified, but in general a reflection followed by a rotation requires eight (5 + 3) transformations. In order to apply the inverse transformations to points on the object, we have to determine the inverses of all the transformation matrices involved, but fortunately the inverses of translation, rotation, and reflection about one of the coordinate planes are trivial to figure out. It should again be emphasized that the viewer and the projection plane constitute a single unit and should be transformed together. Even though the approach discussed in this section transforms the object and not the viewer, it is still important to make sure that the object remains on the other side of the projection plane from the viewer after all the transformations. Thus, after an object point is transformed and before it is projected, it is important to verify that its z coordinate is still nonnegative. It is also important to make sure that enough points are selected on the object, because otherwise it may happen that two points with nonnegative z coordinates are connected on the object with a curve, some of whose points may have negative z coordinates when projected. Figure 6.45a is an example of an object where P3 initially is not included as an object point. The transformations move the object to the left such that part of the curve between points P1 and P2 ends up to the left of the xy plane and P3 has a negative z coordinate. 
Once P3 is included as an object point, the software discovers that its projection has a negative z coordinate of, say, a units. The software then moves
Figure 6.45: An Object with Negative z Coordinates.
all the object points a units to the right (Figure 6.45b) to obtain the correct projection on the xy plane. On the other side of the screen, it all looks so easy. —Jeff Bridges (as Kevin Flynn) in Tron (1982).
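The five-step reflection recipe described earlier in this section (translate a point of the plane to the origin, rotate the plane onto a coordinate plane, reflect, rotate back, translate back) can be checked against the direct formula for reflecting a point about a plane. A Python sketch (illustrative; the plane x + z = 2 is an arbitrary choice, and the helper names are not from the book):

```python
import math

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(4)) for j in range(4)] for i in range(4)]

def apply(P, M):
    """Transform a 3D point by a 4x4 matrix, row-vector convention."""
    v = [P[0], P[1], P[2], 1.0]
    out = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    return tuple(out[j] / out[3] for j in range(3))

def translate(dx, dy, dz):
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [dx, dy, dz, 1]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, -s, 0], [0, 1, 0, 0], [s, 0, c, 0], [0, 0, 0, 1]]

reflect_x = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # about the yz plane

# Reflect about the plane x + z = 2: q = (1,0,1) lies on it, normal n = (1,0,1)/sqrt(2).
# Rotating n by 45 deg about y (row-vector convention) sends it to the x axis.
q, th = (1.0, 0.0, 1.0), math.pi / 4
M = matmul(matmul(matmul(matmul(
    translate(-q[0], -q[1], -q[2]), rot_y(th)), reflect_x), rot_y(-th)),
    translate(q[0], q[1], q[2]))

P_ref = apply((3.0, 4.0, 0.0), M)
# Direct check: P - 2((P-q).n)n with unit n gives (2, 4, -1) for P = (3, 4, 0)
```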
6.8 Viewer at an Arbitrary Location

The previous section deals with the case where the viewer is initially located at the standard position. This section looks at the more general problem where the viewer is located at an arbitrary point B = (a, b, c), looking in a given direction D = (d, e, f) (Figure 6.46a). The approach taken here is to transform the viewer to the standard position in three simple steps: (1) translate the viewer from B to the origin (the screen is also translated by the same amount, Figure 6.46b); (2) rotate the viewer-screen unit in three dimensions until D coincides with (0, 0, 1) (i.e., it points in the positive z direction, Figure 6.46c); and (3) translate the viewer and screen from the origin to point (0, 0, −k) (Figure 6.46d). These three transformations bring the viewer to the standard position and the screen to the xy plane. The same transformations are then applied to every point P of the image, thereby bringing the viewer and the image to the same relative positions they had before the transformations. One way to understand this approach is to imagine that the viewer and all the image points are transformed as one unit, such that the viewer ends up at the standard position. Another way to look at this approach is to imagine that we transform the coordinate axes (Section 4.5), while the viewer and the image are not moved.
Figure 6.46: Transforming the Viewer-Screen Unit.
Now that the viewer is located at the standard position, matrix Tp (Equation (6.6)) can be used to project image points. This approach has the advantage that all the image points are projected on the xy plane, so that the projected points are effectively two-dimensional. In practice, there is no need to actually transform the viewer and the screen. We simply use the coordinates (a, b, c) of point B and the components (d, e, f) of vector D to derive the three transformation matrices T1 (translation), T2 (rotation), and T3 (second translation) and multiply T = T1 T2 T3 Tp. Any point P on the object is then transformed and projected in a single step by the multiplication P∗ = PT. This approach is developed here for the general case but is first illustrated by two examples where the coordinates of B and the components of D are known numbers.

Example 1: The viewer is located at B = (1, 1, 1) and is looking in direction D = (1, 0, 1) (i.e., midway between the directions of positive x and positive z). Matrix T1 below translates from (1, 1, 1) to the origin. Matrix T2 rotates by 45° from the positive x to the positive z direction. Matrix T3 translates from the origin to point (0, 0, −k). The result is (we denote s = cos 45° = sin 45° = 1/√2)

$$\begin{aligned}
T &= T_1 T_2 T_3 T_p\\
&= \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ -1&-1&-1&1 \end{pmatrix}
\begin{pmatrix} s&0&s&0\\ 0&1&0&0\\ -s&0&s&0\\ 0&0&0&1 \end{pmatrix}
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&-k&1 \end{pmatrix}
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&0&r\\ 0&0&0&1 \end{pmatrix}\\
&= \begin{pmatrix} s&0&s&0\\ 0&1&0&0\\ -s&0&s&0\\ 0&-1&-2s-k&1 \end{pmatrix}
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&0&r\\ 0&0&0&1 \end{pmatrix}
= \begin{pmatrix} s&0&0&sr\\ 0&1&0&0\\ -s&0&0&sr\\ 0&-1&0&1-kr-2rs \end{pmatrix}
= \begin{pmatrix} s&0&0&sr\\ 0&1&0&0\\ -s&0&0&sr\\ 0&-1&0&-2rs \end{pmatrix}.
\end{aligned} \tag{6.10}$$
(Recall that k = 1/r.) The projection of any point P = (x, y, z) is calculated by P∗ = PT. We illustrate this for two points.

1: Point P = (1, 1, 1) is projected to P∗ = (0, 0, 0) because

$$(1, 1, 1, 1)\begin{pmatrix} s&0&0&sr\\ 0&1&0&0\\ -s&0&0&sr\\ 0&-1&0&-2rs \end{pmatrix} = (0, 0, 0, 0).$$
2: Point P = (2k, 0, 2k) is projected to P∗ = (0, −1/(√2(2 − r)), 0) because

$$(2k, 0, 2k, 1)\begin{pmatrix} s&0&0&sr\\ 0&1&0&0\\ -s&0&0&sr\\ 0&-1&0&-2rs \end{pmatrix} = (0, -1, 0, 2s(2 - r)).$$
Exercise 6.20: The product

$$(0, 0, 0, 1)\begin{pmatrix} s&0&0&sr\\ 0&1&0&0\\ -s&0&0&sr\\ 0&-1&0&-2rs \end{pmatrix}$$

equals (0, −1, 0, −2sr), which suggests that the origin (0, 0, 0) is projected on the screen at point P∗ = (0, k/√2, 0). This, however, does not make sense since point (0, 0, 0) was originally "behind" the viewer and should remain behind it after all the transformations. What's the explanation?

Mighty is geometry; joined with art, resistless.
—Euripides.

Note. Notice the rightmost column of matrix T (Equation (6.10)). The first and third elements of that column are nonzero, which indicates that the projection plane intercepts the x and z axes. This is discussed in detail on Page 319.

Example 2: The viewer is located at B = (−k sin θ, 0, −k cos θ) = (−kα, 0, −kβ) and is looking in direction D = (α, 0, β) (i.e., toward the origin). Matrices T1, T2, T3, and Tp below are similar to the ones from the previous example. The result is

$$\begin{aligned}
T &= T_1 T_2 T_3 T_p\\
&= \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ k\alpha&0&k\beta&1 \end{pmatrix}
\begin{pmatrix} \beta&0&\alpha&0\\ 0&1&0&0\\ -\alpha&0&\beta&0\\ 0&0&0&1 \end{pmatrix}
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&-k&1 \end{pmatrix}
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&0&r\\ 0&0&0&1 \end{pmatrix}\\
&= \begin{pmatrix} \beta&0&\alpha&0\\ 0&1&0&0\\ -\alpha&0&\beta&0\\ 0&0&0&1 \end{pmatrix}
\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&0&r\\ 0&0&0&1 \end{pmatrix}
= \begin{pmatrix} \beta&0&0&\alpha r\\ 0&1&0&0\\ -\alpha&0&0&\beta r\\ 0&0&0&1 \end{pmatrix}.
\end{aligned} \tag{6.11}$$
It is easy to see that for θ = 0 (where α = 0 and β = 1), matrix (6.11) reduces to matrix (6.6).
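The reduction to matrix (6.11) holds for any θ; multiplying the four factors numerically reproduces the closed form. A Python sketch (illustrative, not from the book; θ = 0.6 and k = 2.5 are arbitrary choices):

```python
import math

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(4)) for j in range(4)] for i in range(4)]

k, theta = 2.5, 0.6
r = 1 / k
al, be = math.sin(theta), math.cos(theta)               # alpha and beta
T1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [k * al, 0, k * be, 1]]  # B to origin
T2 = [[be, 0, al, 0], [0, 1, 0, 0], [-al, 0, be, 0], [0, 0, 0, 1]]       # D to +z
T3 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -k, 1]]           # origin to (0,0,-k)
Tp = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, r], [0, 0, 0, 1]]            # standard projection
T = matmul(matmul(matmul(T1, T2), T3), Tp)
T_closed = [[be, 0, 0, al * r], [0, 1, 0, 0], [-al, 0, 0, be * r], [0, 0, 0, 1]]  # matrix (6.11)
```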
Exercise 6.21: Assuming a viewer positioned as in the example above, calculate the projection of point P = (βl, m, −αl).

Exercise 6.22: Projection matrices (6.11) and (6.9) correspond to the same geometry, so one would think that they should be identical. Why are they different?

We now develop this approach for the general case where a viewer is located at an arbitrary point B = (a, b, c) looking in an arbitrary direction D = (d, e, f), where vector D is assumed to be normalized (i.e., d² + e² + f² = 1). Translating the viewer to the origin is done, as usual, by matrix T1

$$T_1 = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ -a&-b&-c&1 \end{pmatrix}. \tag{6.12}$$
The main task is to rotate vector D so it coincides with the positive z direction. The rotation should be about an axis that's perpendicular to both D and the z axis. A general vector in this direction is obtained by the cross product D × (0, 0, 1) = (d, e, f) × (0, 0, 1) = (e, −d, 0). Normalizing this vector yields

$$\mathbf{u} = \frac{(e, -d, 0)}{\sqrt{e^2 + d^2}} = \left(\frac{e}{\sqrt{1 - f^2}},\; \frac{-d}{\sqrt{1 - f^2}},\; 0\right).$$
Vector u is a unit vector in the direction of rotation. The rotation angle θ is the angle between vectors D and z = (0, 0, 1). Since both are unit vectors, we can employ the dot product to obtain cos θ = D • (0, 0, 1) = f and sin θ = √(1 − cos² θ) = √(1 − f²). Notice that sin θ is nonnegative because the angle between vector D and the z axis is measured between the direction of D and the positive z direction and is consequently always in the interval [0, π]. The rotation matrix is obtained from Equation (4.32)

$$T_2 = \begin{pmatrix}
\dfrac{e^2 + f - f^3 - e^2 f}{1 - f^2} & \dfrac{-ed}{1 + f} & d & 0\\[1ex]
\dfrac{-ed}{1 + f} & \dfrac{d^2 + f - f^3 - d^2 f}{1 - f^2} & e & 0\\[1ex]
-d & -e & f & 0\\
0 & 0 & 0 & 1
\end{pmatrix}. \tag{6.13}$$
The two other tasks are to translate the viewer from the origin to point (0, 0, −k) by means of T3 and to use matrix Tp to project from the standard position:

$$T_3 = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&-k&1 \end{pmatrix}, \qquad
T_p = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&0&r\\ 0&0&0&1 \end{pmatrix}. \tag{6.14}$$
The result is the matrix product

$$T_g = T_1 T_2 T_3 T_p = \begin{pmatrix}
\dfrac{e^2+f+f^2}{1+f} & \dfrac{-de}{1+f} & 0 & dr\\[1ex]
\dfrac{-de}{1+f} & \dfrac{d^2+f+f^2}{1+f} & 0 & er\\[1ex]
-d & -e & 0 & fr\\[1ex]
\dfrac{cd+bde-ae^2-af+cdf-af^2}{1+f} & \dfrac{-bd^2+ce+ade-bf+cef-bf^2}{1+f} & 0 & -(ad+be+cf)r
\end{pmatrix}. \tag{6.15}$$

For the special case of a viewer located at B = (−k sin θ, 0, −k cos θ) = (−kα, 0, −kβ) and looking in direction D = (α, 0, β), this reduces to matrix (6.11).
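Matrix (6.15) can be confirmed by multiplying T1, T2, T3, and Tp numerically and comparing with the closed form entry by entry. A Python sketch (an illustration in place of the book's Mathematica; the sample B and D below are arbitrary, with D normalized):

```python
import math

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(4)) for j in range(4)] for i in range(4)]

k = 3.0
r = 1 / k
a, b, c = 1.0, -2.0, 0.5                                   # viewer position B
d, e, f = (x / math.sqrt(14) for x in (1.0, 2.0, 3.0))     # normalized direction D
T1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [-a, -b, -c, 1]]
T2 = [[(e*e + f - f**3 - e*e*f) / (1 - f*f), -e*d / (1 + f), d, 0],
      [-e*d / (1 + f), (d*d + f - f**3 - d*d*f) / (1 - f*f), e, 0],
      [-d, -e, f, 0],
      [0, 0, 0, 1]]
T3 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -k, 1]]
Tp = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, r], [0, 0, 0, 1]]
Tg = matmul(matmul(matmul(T1, T2), T3), Tp)
Tg_closed = [
    [(e*e + f + f*f) / (1 + f), -d*e / (1 + f), 0, d*r],
    [-d*e / (1 + f), (d*d + f + f*f) / (1 + f), 0, e*r],
    [-d, -e, 0, f*r],
    [(c*d + b*d*e - a*e*e - a*f + c*d*f - a*f*f) / (1 + f),
     (-b*d*d + c*e + a*d*e - b*f + c*e*f - b*f*f) / (1 + f),
     0, -(a*d + b*e + c*f) * r]]
```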
Figure 6.47: Two Tests of Matrix Tg.
Matrix Tg is now tested twice. The first test (Figure 6.47a) assumes that the viewer is at the standard location (0, 0, −k) but looking in direction (−1, 1, k). (These components still have to be normalized.) We compute the projection of point (−1, 1, 0) and the figure shows that this projection should be at the origin because the viewer is looking directly at the point. The Mathematica code

k = 3.; r = 1/k;
{a, b, c} = {0, 0, -k};
{d, e, f} = Normalize[{-1, 1, k}]
T = {{(e^2 + f + f^2)/(1 + f), -d e/(1 + f), 0, d r},
     {-d e/(1 + f), (d^2 + f + f^2)/(1 + f), 0, e r},
     {-d, -e, 0, f r},
     {(c d + b d e - a e^2 - a f + c d f - a f^2)/(1 + f),
      (-b d^2 + c e + a d e - b f + c e f - b f^2)/(1 + f),
      0, -(a d + b e + c f) r}};
{-1, 1, 0, 1}.T

computes the normalized components of D as (−0.3015, 0.3015, 0.9045) and the projected point as the 4-tuple (0, 0, 0, 1.1) (i.e., the origin). The second test (Figure 6.47b) assumes that the viewer is located at B = (0, 2k, −2k) looking in (the still unnormalized) direction (2k, −2k, 2k). We compute the projection of point (2k, 0, 0), and the figure again suggests that this projection should be at the origin because the viewer is looking directly at the point. Code similar to the above yields
the normalized direction vector D as (0.577, −0.577, 0.577) and the projected point as (0, 0, 0, 3.5), again the origin.

Exercise 6.23: Perform a similar test for B = (0, 2k, −k) and unnormalized D = (0, −1, −1). Use mathematical software to compute the projection of point (0, 0, −4k). Notice that the viewer is looking at the z axis a little "past" this point.

The rightmost column of Tg is especially interesting. Its three top elements are dr, er, and fr, where r = 1/k is the inverse of the (strictly positive) distance k of the viewer from the screen and (d, e, f) are the components of vector D. If any of these components is zero, the corresponding element of Tg will also be zero, which implies that there is a simple relationship between these three matrix elements and the direction D of the viewer's line of sight. Since the screen is perpendicular to the line of sight, we end up with the following interesting result. The three matrix elements dr, er, and fr indicate which of the three coordinate axes is intercepted by the screen before the screen is transformed to the standard position.

Figure 6.48: n-Point Perspective.
For example, if e = 0 and d and f are nonzero, then D is a vector in the xz plane (and is not in the x or z directions), so the projection plane intercepts the x and z axes but is parallel to the y axis and does not intercept it. This result has already been mentioned several times in the past and is often referred to as n-point perspective, where n can be 1, 2, or 3. Figure 6.48 illustrates the justification for this term. The figure shows a cube centered on the origin and three viewers looking at it. Viewer 1 is located on the z axis and sees one vanishing point. Viewer 2 is located on the xz plane and therefore sees two vanishing points, and viewer 3 is located above the xz plane and so sees three vanishing points. However, the term “n-point perspective” refers to the number of coordinate axes, 1, 2, or 3, intercepted by the projection plane, not to the number of vanishing points actually observed by the viewer. The viewer can observe any number of vanishing points, depending on the existence of groups of straight, parallel lines on the object (Page 273).
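Because the nonzero components of D correspond exactly to the axes intercepted by the projection plane, determining the n in "n-point perspective" amounts to counting them. A Python sketch (illustrative; the function name is an assumption, not the book's):

```python
def n_point_perspective(D, eps=1e-9):
    """Number of coordinate axes intercepted by the projection plane,
    given the (normalized) viewing direction D = (d, e, f)."""
    return sum(1 for comp in D if abs(comp) > eps)

one   = n_point_perspective((0.0, 0.0, 1.0))          # viewer on the z axis
two   = n_point_perspective((0.6, 0.0, 0.8))          # direction in the xz plane
three = n_point_perspective((0.577, -0.577, 0.577))   # general direction
```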
Exercise 6.24: Compute matrix (6.15) twice, first for the case where D = (0, 0, 1) (viewer looking in the positive z direction) and then for D = (0, 0, 1) and B = (0, 0, −k) (the standard position).

Exercise 6.25: Assuming a viewer at point B = (0, 1, 0) looking in direction D = (0, 1, 1), calculate the projection of point P = (0, 1, 10).

Matrix Tg of Equation (6.15) contains the expression 1 + f in the denominators of certain elements, which may cause undefined values when f = −1. Since we assume that vector D is normalized, d² + e² + f² must be equal to 1, so the case f = −1 implies d = e = 0, which, in turn, implies D = (0, 0, −1) (i.e., a viewer looking in the negative z direction). It turns out that Tg can be used even in this case. When d = e = 0, we can write

$$T_g[1,1] = \frac{e^2 + f + f^2}{1+f} = \frac{f(1+f)}{1+f} = f = -1,$$
$$T_g[2,2] = \frac{d^2 + f + f^2}{1+f} = \frac{f(1+f)}{1+f} = f = -1,$$
$$T_g[4,1] = \frac{cd + bde - ae^2 - af + cdf - af^2}{1+f} = -af\,\frac{1+f}{1+f} = -af = a,$$
$$T_g[4,2] = \frac{-bd^2 + ce + ade - bf + cef - bf^2}{1+f} = -bf\,\frac{1+f}{1+f} = -bf = b.$$
Matrix elements Tg[1, 2] = Tg[2, 1] = −de/(1 + f) have the indefinite form 0/0, but we artificially set them to zero. Matrix Tg becomes

$$T_g = \begin{pmatrix} -1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & 0 & -r\\ a & b & 0 & cr \end{pmatrix}. \tag{6.16}$$
This is a matrix that transforms a point P = (x, y, z, 1) to point

$$P^* = \left(\frac{-x+a}{(c-z)r},\; \frac{-y+b}{(c-z)r},\; 0\right).$$
Following are two quick tests of this matrix. They were performed with the following Mathematica code:

(* code to check matrix T_g for the case 1 + f = 0 *)
r = 1/k; {a, b, c} = {0, 0, -k};
T = {{-1, 0, 0, 0}, {0, -1, 0, 0}, {0, 0, 0, -r}, {a, b, 0, c r}};
{x, y, z, 1}.T
1. When the viewer B is located at the standard location (0, 0, −k), matrix Tg of Equation (6.16) transforms an arbitrary point P = (x, y, z) to the point

$$P^* = \left(\frac{x}{(k+z)r},\; \frac{y}{(k+z)r},\; 0\right) = \left(\frac{x}{1+z/k},\; \frac{y}{1+z/k},\; 0\right),$$

which is the familiar Equation (6.3).

2. When the viewer B is located at (1, 1, 1), point (x, y, z) = (1, 1, −1) is transformed to

$$\left(\frac{1-1}{(1+1)r},\; \frac{1-1}{(1+1)r},\; 0\right) = (0, 0, 0).$$

The reader should visualize this situation with the help of a diagram to see why the result is correct.

The Top Vector. This section's approach to general perspective moves the viewer from an arbitrary location B to the standard position while rotating his line of sight from an arbitrary direction D to the positive z direction. This is done in the following three steps: (1) a translation from B to the origin, (2) a rotation, and (3) a translation to point (0, 0, −k). However, Figures 6.41 and Ans.20b illustrate why another rotation is sometimes needed after step 3 in order to correct the orientation of the screen. Figure 6.49 shows a viewer moved from a general location to the standard position and how the extra rotation serves to align the top of the screen with the y axis in a new step 4.
Figure 6.49: The Top Vector.
The software normally has no idea how the screen is oriented initially and how it should be oriented when the viewer is brought to the standard position. If this orientation is important, the user should specify the direction Q of the top of the screen, and step 4 should be added to rotate the viewer-screen unit about the z axis until Q is aligned with the x or y axis or any other desired direction. This extra step can be ignored in cases where the projection plane is rotated through a small angle or is infinitely large. In practice, however, the projection plane is the screen on which the three-dimensional scene is projected. This screen has a finite size and should normally be oriented such that its top points in the positive y direction. The rotation matrix of step 4 is easy to derive. We assume that the first three steps have brought the screen to the xy plane and have transformed the original top vector
Q to Q = (h, i, 0). (We assume that (h, i, 0) is already normalized, so h² + i² = 1 or h = ±√(1 − i²).) We further assume that the rotation of step 4 should align Q with the positive y axis (0, 1, 0). The rotation is about the z axis, and the angle φ of rotation is determined by cos φ = Q • (0, 1, 0) = i and sin φ = √(1 − i²) = h. The rotation matrix is therefore given by

$$T_4 = \begin{pmatrix} i & h & 0 & 0\\ -h & i & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Matrix T4 rotates vector (h, i, 0) to the positive y axis. The following Mathematica code verifies this for the four special normalized vectors (a, a, 0, 0), (a, −a, 0, 0), (−a, a, 0, 0), and (−a, −a, 0, 0), where a = 1/√2. In each case, once the values of h and i on line 1 are set to a or -a, the result is (0, 1, 0, 0).

1  a = 1/Sqrt[2]; h = a; i = a;
2  T = {{i, h, 0, 0},{-h, i, 0, 0},{0, 0, 1, 0},{0, 0, 0, 1}};
3  {h, i, 0, 0}.T
Exercise 6.26: Assume a viewer located at B = (0, 2k, −2k) looking in (unnormalized) direction D = (0, −1, −1), as in Exercise 6.23, and a value k = √2. Figure Ans.20a illustrates the geometry of this case. 1. Derive the equation of the projection plane. 2. Multiply the transformation matrices of Equations (6.12), (6.13), and (6.14) to obtain one transformation T123 that brings the viewer to the standard position. 3. Pick a point on the projection plane and compute its coordinates on the xy plane after transformation T123.
6.9 A Coordinate-Free Approach: I

The discussion of general perspective in the previous sections is based on points and their coordinates relative to a three-dimensional coordinate system. This section presents a coordinate-free approach to the same problem, an approach that is based on vectors and vector operations. The location of the origin and the directions of the coordinate axes are not needed, although they may serve to illuminate the particular geometry of the examples presented here. The term "point" is still used, but we refer to a point in terms of the vector connecting it with the origin instead of as a triplet (x, y, z) of coordinates.

Figure 6.50a shows a viewer at point B looking in an arbitrary direction a. The screen is, as always, perpendicular to the line of sight a, and we assume that |a| = k > 0. The center of the screen is at point C. Note that vector a gives both the direction of view of the viewer and the distance between the viewer and the screen. The derivation of the projection is surprisingly easy. We select an arbitrary point P on the other side of the screen from the viewer and connect it with the viewer. The intersection of line BP and the screen is the projected point P∗. Vector b indicates the position of the viewer. Vector c indicates the direction CP∗ on the screen. Vector d is
Figure 6.50: (a) General Perspective with Vectors. (b) Example.
the position vector of point P∗ . Vector e connects B to P. Vector p points from the origin to point P. Vector d − b connects point B to point P∗ . Vector p is the sum p = b + e, which implies e = p − b. From d = b + a + c, we get c = d−b−a. Vector d−b is in the direction of e, so we can write d−b = αe = α(p−b), where α is a real number. This implies c = α(p − b) − a or d = b + a + c = b + α(p − b).
(6.17)
Since the line of sight is perpendicular to the screen, we can write a • c = 0, which implies a • [α(p − b) − a] = 0, or αa • (p − b) = a • a, or

$$\alpha = \frac{|\mathbf{a}|^2}{\mathbf{a}\bullet(\mathbf{p}-\mathbf{b})}. \tag{6.18}$$
Before we continue with the analysis, the following cases should be discussed: 1. α is positive. This is the normal case. It means that the viewer and point P are on different sides of the screen and the projection is meaningful. 2. α is zero. This implies a vector a of magnitude zero (i.e., a viewer positioned at the screen). Either the viewer or the screen should be moved before anything can be meaningfully displayed. 3. α is negative. This implies that P and the viewer are on the same side of the screen, so P should not be projected. 4. α is undefined. This occurs when a•(p−b) = 0, implying that a is perpendicular to p − b and therefore to e. Vector e is therefore parallel to the screen, making it impossible to project P. After α is computed and checked, we can proceed in one of two ways: (1) We can use Equation (6.17) to calculate vector d, which points directly to P∗ on the screen, or (2) we can calculate the screen coordinates of vector c. In the latter case, we consider the center of the screen (point C) a local origin and we define two unit vectors u and
w to serve as local axes on the screen. The screen coordinates of c are, in this case, the projections u • c and w • c of c on these axes. In order to compute u and w, we recall that they should be on the screen (and therefore perpendicular to a) and also perpendicular to each other. We can therefore write a • u = a • w = u • w = 0. It also makes sense to require that u be in the xy plane (which will cause w to point in the z direction as much as possible). Solving these equations results in

$$\mathbf{u} = (a_y,\, -a_x,\, 0) \quad\text{and}\quad \mathbf{w} = (a_x a_z,\; a_y a_z,\; -a_x^2 - a_y^2). \tag{6.19}$$
Vectors u and w should then be normalized. Note that u and w are undefined if a points in the z direction (if a = (0, 0, az), then u = w = (0, 0, 0), an undefined direction). However, in this case the screen is parallel to the xy plane, so we can simply define the local coordinate axes as u = (1, 0, 0) = i and w = (0, 1, 0) = j. This novel approach to general perspective is illustrated by two examples.

Example 1: This is a simple example (Figure 6.50b) where all the points lie on the yz plane. We assume a viewer at B = (0, 1, 0), looking in direction (0, 1, 1) (i.e., 45° in the yz plane). Vector a must point in this direction, and we assume a = (0, 2, 2) (i.e., the center of the screen is at a distance of |a| = √(2² + 2²) = √8 units from the viewer). We further assume that the point P to be projected is at (0, 1, 10). The center of the screen (point C) is easily seen to be at b + a = (0, 1, 0) + (0, 2, 2) = (0, 3, 2). The first step is to determine α:
$$\alpha = \frac{|\mathbf{a}|^2}{\mathbf{a}\bullet(\mathbf{p}-\mathbf{b})} = \frac{8}{(0,2,2)\bullet(0-0,\,1-1,\,10-0)} = \frac{2}{5}.$$
The next step is to compute d = b + α(p − b) = (0, 1, 0) + (2/5)(0, 0, 10) = (0, 1, 4). The projected point P∗ is therefore at (0, 1, 4). (See the diagram to convince yourself that the precise value of the z coordinate of P is irrelevant in this case.) Next, we calculate the local coordinates of this point on the screen. Vector c is first obtained by c = α(p − b) − a = (2/5)(0, 0, 10) − (0, 2, 2) = (0, −2, 2). The local axes on the screen are computed next from Equation (6.19). They are u = (2, 0, 0) and w = (0, 4, −4). We normalize them by dividing each by its magnitude, obtaining u = (1, 0, 0) and w = (0, 1/√2, −1/√2). (Note that u is in the x direction and w is in the yz plane.) Thus, the screen coordinates of c are u • c = (1, 0, 0) • (0, −2, 2) = 0 and w • c = (0, 1/√2, −1/√2) • (0, −2, 2) = −√8. The projected point is therefore √8 units away from the center of the screen C. Note that this equals the absolute value of vector c. As an added bonus, we compute the plane equation of the screen. Let (x, y, z) be a general point on the screen. The vector from the center (point C) to (x, y, z) is (x − 0, y − 3, z − 2). This vector must be perpendicular to the normal to the screen (vector a), which implies 0 = a • (x, y − 3, z − 2) = (0, 2, 2) • (x, y − 3, z − 2),
or y + z = 5.
This equation relates the y and z coordinates of all the points on the screen. Any point with coordinates (x, y, 5 − y) is therefore on the screen regardless of the value of x. Note that the projected point P∗ also satisfies this relation.

Exercise 6.27: Generalize the previous example to the case of a general point P = (x, y, z).

Example 2: Again we give a simple example, illustrated in Figure 6.51. The screen is centered on the origin at a 45° angle, and the viewer is at point (−k/√2, 0, −k/√2), a distance of k units from the screen. To simplify the notation, we introduce the quantity ψ = k/√2. From Figure 6.50a it is clear that a = (ψ, 0, ψ) and b = −a = (−ψ, 0, −ψ). The center of the screen is, as always, at a + b, which is point (0, 0, 0).
Figure 6.51: Viewer Rotated About the y Axis.
The first step is to determine α:

α = |a|²/(a • (P − b)) = 2ψ²/((ψ, 0, ψ) • (x + ψ, y, z + ψ)) = 2ψ/(x + z + 2ψ).
(Try to convince yourself that α is positive in the gray area above and to the right of the screen because x + z + 2ψ is positive in this area.) The next step is to compute vector d:

d = b + α(P − b) = (−ψ, 0, −ψ) + [2ψ/(x + z + 2ψ)] (x + ψ, y, z + ψ) = [ψ/(x + z + 2ψ)] (x − z, 2y, z − x).
Notice that P = (0, 0, 0) is transformed to P∗ = (0, 0, 0). In fact, every point P = (x, 0, x) on the viewer's line of sight is transformed to P∗ = (0, 0, 0). Since the screen is centered at the origin, we have c = α(P − b) − a = α(P − b) + b = d. The next step is to calculate the local screen vectors u and w from Equation (6.19). This is straightforward and results in u = (0, −ψ, 0) and w = (ψ, 0, −ψ). After normalization, these become u = (0, −1, 0) and w = (1/√2, 0, −1/√2). Notice that u is the y axis and w is in the xz plane.
The screen equation is obtained from a • (x, y, z) = 0, which implies ψ(x + z) = 0 or x = −z. The last step is to derive the transformation matrix. From

x∗ = X/H = ψ(x − z)/(x + z + 2ψ),   y∗ = Y/H = 2ψy/(x + z + 2ψ),   z∗ = Z/H = ψ(z − x)/(x + z + 2ψ),

we get

                            ⎛  ψ   0  −ψ   1 ⎞
(X, Y, Z, H) = (x, y, z, 1) ⎜  0  2ψ   0   0 ⎟ .
                            ⎜ −ψ   0   ψ   1 ⎟
                            ⎝  0   0   0  2ψ ⎠
(Notice the two 1’s in the last column. They indicate that the projection plane intercepts the x and z axes but not the y axis. This is a two-point perspective.)
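Example 2 can also be checked numerically. The short Python sketch below is our illustration (not from the book, whose listings use Mathematica); it implements only the two formulas given above, α = |a|²/(a • (P − b)) and d = b + α(P − b), and verifies three consequences: the origin projects to itself, points on the viewer's line of sight collapse to the screen center, and points already on the screen plane x = −z are unchanged.

```python
import math

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def sub(u, v): return tuple(a - b for a, b in zip(u, v))
def add(u, v): return tuple(a + b for a, b in zip(u, v))
def scl(s, u): return tuple(s * a for a in u)

def project(P, b, a):
    """d = b + alpha (P - b), with alpha = |a|^2 / (a . (P - b))."""
    alpha = dot(a, a) / dot(a, sub(P, b))
    return add(b, scl(alpha, sub(P, b)))

k = 2.0
psi = k / math.sqrt(2)
b = (-psi, 0.0, -psi)    # viewer position
a = (psi, 0.0, psi)      # a = -b, so the screen is centered on the origin

close = lambda u, v: all(abs(p - q) < 1e-9 for p, q in zip(u, v))

assert close(project((0.0, 0.0, 0.0), b, a), (0, 0, 0))    # origin is fixed
assert close(project((3.0, 0.0, 3.0), b, a), (0, 0, 0))    # line of sight -> center
assert close(project((1.0, 5.0, -1.0), b, a), (1, 5, -1))  # screen points are fixed
```

The function and variable names are our own; only the two formulas come from the text.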
6.10 A Coordinate-Free Approach: II

This approach to the problem of perspective projection also employs vectors instead of coordinates, but we assume that the following are given (Figure 6.52):
1. The position of the viewer (vector b).
2. The direction and distance from the viewer to the projection plane (vector a).
3. An “up” vector Z, which determines the direction of the local screen vector w.
4. Two viewing half-angles h and v, an approach that is handy when we want to limit the projected image to certain viewing angles, as in Figure 6.14.
Figure 6.52: A Viewing Geometry.
We proceed in the following simple steps:

1. Calculate vector U as perpendicular to both a and Z: U = a × Z.

2. Compute vector W as perpendicular to both U and a: W = U × a. Vector W is in the Za plane and is perpendicular to a. It will serve to determine vector w on the screen in step 4.

3. Denote C = b + a. This points to the center of the screen.

4. Construct the half-screen vectors u and w. They are in the directions of U and W, respectively, but their sizes are determined by the viewing angles:

u = (U/|U|) |a| tan h,   w = (W/|W|) |a| tan v.

5. Compute α = |a|²/(a • (P − b)) and vectors d = b + α(P − b) and c = α(P − b) − a in the usual way.

6. Now that c is known, we use it to determine the two scale factors cx and cy:

cx = (|c| cos θ)/|u| = (c • u)/|u|²,   cy = (|c| cos φ)/|w| = (c • w)/|w|².
These are numbers in the range [−1, 1]. Any point P = (x, y, z) for which either cx or cy is greater than 1 or less than −1 is therefore outside the screen and should not be displayed. The range of values of cx and cy assumes that the origin of the screen is at its center. The actual screen coordinates (sx, sy) of a pixel depend on the dimensions of the screen (measured in pixels). They are given by

sx = (half the screen width) × cx,   sy = (half the screen height) × cy.

If the origin is at the bottom left corner, then

sx = (half the screen width) + (half the screen width) × cx,
sy = (half the screen height) + (half the screen height) × cy.

If it is at the top left corner,

sx = (half the screen width) + (half the screen width) × cx,
sy = (half the screen height) − (half the screen height) × cy.

Example: We apply the method above to the standard case depicted in Figure 6.36, where the screen is part of the xy plane and is centered on the origin and the viewer is located k units from the origin on the negative z axis. Assuming that the two half-angles h and v are given, we need to compute scale factors cx and cy that will make it possible to determine for any given point P whether its projection on the xy plane is inside or outside the screen.
It is clear that b = (0, 0, −k) and a = (0, 0, k) = −b. We also select the positive y direction as our “up” direction, so Z = (0, 1, 0). To express the final results in a general way, we denote m = tan h and n = tan v. The calculation is straightforward.

1. U = a × Z = (0, 0, k) × (0, 1, 0) = (−k, 0, 0).
2. W = U × a = (−k, 0, 0) × (0, 0, k) = (0, k², 0).
3. C = b + a = (0, 0, 0). The center of the screen is at the origin.
4. The local screen axes are

u = (U/|U|) |a| tan h = (−km, 0, 0),   w = (W/|W|) |a| tan v = (0, kn, 0).
5. The three quantities α, d, and c are determined next:

α = |a|²/(a • (P − b)) = k²/((0, 0, k) • (x, y, z + k)) = k/(z + k),

d = b + α(P − b) = (0, 0, −k) + [k/(z + k)](x, y, z + k) = [k/(z + k)](x, y, 0),

c = α(P − b) − a = α(P − b) + b = d.

6. The scale factors cx and cy can now be obtained:

cx = (c • u)/|u|² = [k/(z + k)](−xkm)/(k²m²) = −x/(m(z + k)),
cy = (c • w)/|w|² = [k/(z + k)](ykn)/(k²n²) = y/(n(z + k)).   (6.20)
As a simple application of these results, let’s select h = v = 45°, which implies m = n = 1. Let’s also assume screen dimensions of 100 × 100 pixels, a local origin at the center of the screen, and k = 1. For point P = (1, 2, 1), we get the scale factors

cx = −x/(m(z + k)) = −1/(1 + 1) = −0.5,   cy = y/(n(z + k)) = 2/(1 + 1) = 1.
Thus, the screen coordinates are sx = 50 × (−0.5) = −25 and sy = 1 × 50 = 50 (the top of the screen). However, any point with coordinates (1, y, 1) where y > 2 would produce a scale factor cy > 1, implying that its projection is outside the screen. Exercise 6.28: Why is Equation (6.20) asymmetric with respect to x and y (i.e., why −x and not −y)?
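The six steps can be collected into a short routine. The following Python sketch is our illustration (the function name scale_factors and its argument order are our own); it implements steps 1–6 directly and reproduces the scale factors of the worked example.

```python
import math

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def dot(u, v): return sum(a * b for a, b in zip(u, v))

def scale_factors(P, b, a, Z, h, v):
    """Steps 1-6: return (cx, cy) for point P, viewer b, screen vector a,
    up vector Z, and viewing half-angles h and v (in radians)."""
    U = cross(a, Z)                                   # step 1: U = a x Z
    W = cross(U, a)                                   # step 2: W = U x a
    la = math.sqrt(dot(a, a))
    u = tuple(la * math.tan(h) / math.sqrt(dot(U, U)) * t for t in U)  # step 4
    w = tuple(la * math.tan(v) / math.sqrt(dot(W, W)) * t for t in W)
    alpha = dot(a, a) / dot(a, tuple(p - q for p, q in zip(P, b)))     # step 5
    c = tuple(alpha * (p - q) - s for p, q, s in zip(P, b, a))
    return dot(c, u) / dot(u, u), dot(c, w) / dot(w, w)                # step 6

# The worked example: viewer at (0,0,-1), screen in the xy plane,
# up = +y, h = v = 45 degrees, P = (1, 2, 1).
cx, cy = scale_factors((1, 2, 1), (0, 0, -1), (0, 0, 1), (0, 1, 0),
                       math.radians(45), math.radians(45))
assert math.isclose(cx, -0.5) and math.isclose(cy, 1.0)

# Pixel coordinates on a 100x100 screen with the origin at its center:
sx, sy = 50 * cx, 50 * cy    # -25 and 50, as in the text
```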
6.10.1 Perspective Depth

The perspective projection converts a three-dimensional point to a two-dimensional point. It completely erases any information about the depth (the z coordinate) of the original point. However, certain algorithms for hidden surface removal need precisely that information. We therefore need to generalize our perspective projection to create a third coordinate z∗ with information about the original z coordinate of the projected point. The obvious choice is z∗ = z, but this has a serious downside: It does not preserve straight lines. Imagine two three-dimensional points, P1 = (x1, y1, z1) and P2 = (x2, y2, z2), projected to the points

P∗1 = (x1k/(k + z1), y1k/(k + z1), z1)   and   P∗2 = (x2k/(k + z2), y2k/(k + z2), z2).
Note that the two projected points are not necessarily on the projection plane. We say that they are located in the image space. The straight segment P(t) = P1 + (P2 − P1 )t (Equation (Ans.42)) connects the two original points, while the segment P∗ (u) = P∗1 + (P∗2 − P∗1 )u connects the two projected ones. It can be shown that an arbitrary point P(t0 ) on P(t) is projected to a point that’s not on P∗ (u). This is why the perspective depth projection is not chosen simply as z ∗ = z but as z ∗ = z/(k + z). This definition preserves depth information, because it has the property z1 > z2 ⇒ z1∗ > z2∗ . It also preserves straight lines. Exercise 6.29: Prove the claim above.
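The effect is easy to demonstrate numerically. The Python sketch below (ours, not from the book) projects two points and their midpoint with both choices of z∗ and checks collinearity of the three images; only z∗ = z/(k + z) keeps them on one straight line.

```python
def project(P, k, depth=True):
    """Perspective projection with either z* = z/(k+z) (depth=True) or z* = z."""
    x, y, z = P
    zs = z / (k + z) if depth else z
    return (x * k / (k + z), y * k / (k + z), zs)

def collinear(A, B, C, eps=1e-9):
    """True if C lies on the line through A and B ((B-A) x (C-A) = 0)."""
    u = [B[i] - A[i] for i in range(3)]
    v = [C[i] - A[i] for i in range(3)]
    cr = (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])
    return all(abs(t) < eps for t in cr)

k = 1.0
P1, P2 = (0.0, 0.0, 1.0), (4.0, 0.0, 3.0)
Pm = tuple((p + q) / 2 for p, q in zip(P1, P2))   # midpoint of the 3D segment

assert collinear(*(project(Q, k) for Q in (P1, P2, Pm)))              # preserved
assert not collinear(*(project(Q, k, depth=False) for Q in (P1, P2, Pm)))
```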
6.11 The Viewing Volume

In order to display realistic images, we have to limit the items that are being displayed to those that would actually be seen by a viewer located at (0, 0, −k) and looking at the image projected on the screen. There are three cases to consider:

1. The viewer and the object being projected should be located on different sides of the projection plane. Any parts of the object located on the same side as the viewer should not be projected. Such parts should be identified and disregarded. If the software does not do that, such parts would be projected in a wrong way, upside down and back to front. (See also the discussion of negative z coordinates on Page 305.) As an example, consider points P1 and P2 in Figure 6.53a. The former is on the other side of the screen from the viewer and is therefore projected correctly. The latter is on the viewer’s side of the screen and is projected on the negative side of the x axis. Including points such as P2 in the projection creates a confusing effect.

2. Those parts of the scene that are located very far away may be too small to be seen by an actual viewer, and we may choose not to project and display them on the screen. User-friendly software should therefore make it possible for the user to select a value K and clip off those parts of the scene whose z coordinates are greater than
Figure 6.53: The Viewing Volume in Three Dimensions.
K. The effect of this is to define a plane located at z = K beyond which nothing is projected.

3. The screen and the far plane now define a truncated pyramid, called the viewing volume or frustum (Latin for a piece broken off; the plural is frusta or frustums). Those parts of the image that are outside it are either irrelevant or invisible to the viewer and should not be displayed. Imagine a scene made up of points connected with straight segments. Before displaying the picture, the software should determine which points are outside the viewing volume. Those points should not be displayed but should not be ignored either. Figure 6.53b shows four points connected to form a rectangle. Notice how some of the lines connecting the points should not be displayed and others should be clipped. In general, only those parts of the image that are inside the viewing volume should be displayed.

It is easy to determine if a point P = (x, y, z) is inside the viewing volume. We assume that the screen is a square that is W units on a side. Figure 6.53c shows two of the four lines that bound the pyramid. It is easy to see that tan α = (W/2)/k = W/(2k). This is also the slope of line L1. The x-intercept of the line is W/2, so the line’s equation is x = (W/2k)z + W/2 = (W/2)(z/k + 1). The equation of L2 is, similarly, x = −(W/2)(z/k + 1). Since the diagram is symmetric with respect to x and y, we conclude that point P is located inside the pyramid if its coordinates satisfy |x|, |y| ≤ (W/2)(z/k + 1).

Exercise 6.30: Assume that the distance k of the viewer from the screen equals the size W of the screen. What will be the width of the field of view of the viewer?
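The containment test translates directly into code. This Python sketch is our illustration (the function name is hypothetical); it assumes the volume extends from the screen at z = 0 to the far plane z = K.

```python
def in_viewing_volume(P, W, k, K):
    """True if P = (x, y, z) is inside the truncated pyramid bounded by
    the W-by-W screen at z = 0, the slanted side planes, and z = K."""
    x, y, z = P
    if not 0.0 <= z <= K:
        return False
    limit = (W / 2.0) * (z / k + 1.0)   # |x|, |y| <= (W/2)(z/k + 1)
    return abs(x) <= limit and abs(y) <= limit

W, k, K = 2.0, 1.0, 10.0
assert in_viewing_volume((0.5, 0.5, 0.0), W, k, K)
assert not in_viewing_volume((1.5, 0.0, 0.0), W, k, K)  # outside at the screen...
assert in_viewing_volume((1.5, 0.0, 1.0), W, k, K)      # ...but inside farther back
assert not in_viewing_volume((0.0, 0.0, 11.0), W, k, K) # beyond the far plane
```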
Let’s assume that two points, P and Q, are part of the total image and are to be connected with a straight line. The first step is to determine, for each point, whether it is located inside or outside the viewing volume. (If a point is located on the edge of the viewing volume, it is considered to be inside.) In the second step, three cases should be distinguished:

1. Both points are inside the viewing volume. The line connecting them is completely inside the volume and should be fully displayed. This is because the viewing volume is convex. (It is a convex polyhedron.)

2. One point is inside and the other is outside the viewing volume. The line connecting them intercepts the volume at exactly one point. (This, again, is a result of the convexity of the viewing volume.) The interception point should be determined and the line should be clipped.

3. Both points are outside. The line connecting them is either completely outside (and should therefore be ignored) or it intercepts the viewing volume at two points. Both interception points should be calculated and the line segment connecting them should be displayed. (There is also the degenerate case where both interception points are identical; the line is tangent to the viewing volume. In such a case, the line can be ignored or just one pixel displayed.)
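All three cases can be handled uniformly with a parametric (Liang–Barsky style) clip of the segment against the six bounding planes of the frustum. The following Python sketch is our own illustration, not the book's algorithm; it uses the plane equations |x|, |y| ≤ (W/2)(z/k + 1) derived above, together with 0 ≤ z ≤ K.

```python
def clip_segment(P, Q, W, k, K):
    """Clip segment PQ against the viewing volume; return the clipped
    endpoints, or None if the segment lies entirely outside."""
    half, slope = W / 2.0, W / (2.0 * k)
    planes = [                           # n . X <= d  means "inside"
        (( 1.0, 0.0, -slope), half), ((-1.0, 0.0, -slope), half),
        (( 0.0, 1.0, -slope), half), (( 0.0, -1.0, -slope), half),
        (( 0.0, 0.0,  1.0),   K),   (( 0.0,  0.0, -1.0),  0.0),
    ]
    D = tuple(q - p for p, q in zip(P, Q))
    t0, t1 = 0.0, 1.0
    for n, d in planes:
        nP = sum(a * b for a, b in zip(n, P))
        nD = sum(a * b for a, b in zip(n, D))
        if abs(nD) < 1e-15:              # segment parallel to this plane
            if nP > d:
                return None              # and entirely outside it
            continue
        t = (d - nP) / nD
        if nD > 0.0:
            t1 = min(t1, t)              # parameter where the segment leaves
        else:
            t0 = max(t0, t)              # parameter where the segment enters
        if t0 > t1:
            return None                  # case 3: the segment misses the volume
    at = lambda t: tuple(p + t * dq for p, dq in zip(P, D))
    return at(t0), at(t1)                # cases 1 and 2 (and the crossing case)

# A segment starting inside and ending outside is clipped at x = (W/2)(z/k+1):
print(clip_segment((0, 0, 1), (5, 0, 1), W=2, k=1, K=10))
# prints ((0.0, 0.0, 1.0), (2.0, 0.0, 1.0))
```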
6.11.1 Application: Flight Simulation

People have been fascinated by flight since the dawn of history. It is therefore not surprising that simple, inexpensive flight simulators for personal computers appeared as soon as these computers became fast and powerful enough to perform the necessary computations in real time. A flight simulator, even a simple one, is a complex program because it has to simulate the behavior of an airplane and display both the interior (instruments) and exterior (the view from the cockpit) in real time. This section is concerned with displaying the view from the cockpit, and we show that this task is an application of the important concept of viewing volume.
Figure 6.54: (a) Two Fields of View. (b) A Viewing Volume.
Figure 6.54a shows part of a typical World War II bomber. It is obvious that the field of view of the pilots in the cockpit is restricted. They see a lot of sky and part of the airplane, but only distant parts of the ground in front and on the sides. The bombardier, however, has an almost 180◦ field of view and can see all the way from 6 o’clock (the ground below their feet) to 12 o’clock (straight up). Figure 6.54b is a schematic diagram showing the viewing volume of the pilot (ignoring the curvature of the Earth). We assume that the flight simulator has to display
the pilot’s view on a screen placed k units in front of the pilot. It is obvious that the view depends on the precise shape of the aircraft. (This determines the orientation of lines L1 and L2.) Most of the screen in the figure is a projection of the sky and only a small part shows a projection of the ground in front of the aircraft. It is also trivial to use similar triangles to obtain the basic perspective expression (Equation (6.3))

(z + k)/y = k/y∗   or   y∗ = ky/(z + k) = y/(z/k + 1).
6.12 Stereoscopic Images

We now turn to an important application of transformations and perspective projection, namely stereoscopic view. This section explains the principles and theory of stereoscopic images, how to create them, and how to view them.

Stereo (from the Greek στερεός)—solid, three-dimensional.

It is generally agreed that the concepts of stereoscopy were discovered in 1833 by Charles Wheatstone, who is mostly known for the Wheatstone bridge (an electrical circuit for the precise comparison of resistances). His 1833 lecture to the Royal Society in London on his discoveries has been published and became the first milestone in the history of this topic. In this lecture, he describes how his discovery came about as a result of his acoustical experiments. Wheatstone also developed the first stereoscopic viewer, which worked with mirrors. Initially, a pair of stereo pictures had to be drawn by hand, but with the invention of photography by Louis Daguerre and Fox Talbot in 1839, it became possible to generate precise pairs of stereo pictures and watch them as a single stereoscopic image.

The ideal graphics output device should be three-dimensional. Unfortunately, only a few such devices are currently available and they are expensive and cumbersome. This is why stereo pictures, displayed on a two-dimensional screen or printed on paper, are interesting and have important applications.

The reason we see real-life objects in three dimensions is that our eyes are separated (by about 60–70 mm) and hence look at the same object from slightly different positions (Figure 6.55a). They see slightly different images, which are “fused” by the brain to create the three-dimensional image. The principle of stereo images is therefore to create and display two slightly different images of the same object and to make sure that each eye sees just one image.
This may be achieved by displaying the two images in two different colors and watching them through special glasses that allow each color to reach just one eye. Other methods for viewing such a pair of images in stereo are discussed in Section 6.14. The simplest way to compute the two stereo images is to use translation and perspective projection. This is what makes stereoscopy a natural application of the concepts described earlier. Figure 6.55b shows each eye as a viewer. The left eye is located at (−e, 0, −k) and the right eye is at (e, 0, −k). To create the image seen by the left eye
Figure 6.55: Principle of Stereo Images.
(the projection P∗left of point P), we first have to translate the eye to the origin and then follow with a standard perspective projection. The transformations are

⎛ 1  0  0  0 ⎞ ⎛ 1  0  0  0 ⎞   ⎛ 1  0  0  0 ⎞
⎜ 0  1  0  0 ⎟ ⎜ 0  1  0  0 ⎟ = ⎜ 0  1  0  0 ⎟ = Tleft.
⎜ 0  0  1  0 ⎟ ⎜ 0  0  0  r ⎟   ⎜ 0  0  0  r ⎟
⎝ e  0  0  1 ⎠ ⎝ 0  0  0  1 ⎠   ⎝ e  0  0  1 ⎠
The transformation for the right eye is similarly

⎛  1  0  0  0 ⎞
⎜  0  1  0  0 ⎟ = Tright.
⎜  0  0  0  r ⎟
⎝ −e  0  0  1 ⎠
It projects P to P∗right . The stereo pair is created by transforming each point P on the original image twice, to the two points Pleft = P Tleft and Pright = P Tright . The value selected for e depends on how the picture is to be viewed. For the dual-color method mentioned earlier, 2e should equal the distance between the eyes (about 60–70 mm). This is a small value, so there is not much difference between Pright and Pleft . The two images highly overlap. For a general point P = (x, y, z), the projections for both eyes are
Pleft = (x, y, z, 1)Tleft = (x + e, y, 0, zr + 1) → ((x + e)/(zr + 1), y/(zr + 1)),
Pright = (x, y, z, 1)Tright = (x − e, y, 0, zr + 1) → ((x − e)/(zr + 1), y/(zr + 1)).

This means that the smaller z is (i.e., the closer the point is to the viewer), the greater the difference between what the two eyes see. A good way to visualize this is to imagine an object sliding past the viewer. The front of the object slides faster than the back, an effect known as parallax.
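The parallax claim is easy to verify with the matrices Tleft and Tright. A small Python sketch (ours; the book's own listing for Figure 6.56 is in Mathematica) applies both transformations to a point and compares the horizontal disparity of a near point with that of a farther one:

```python
def stereo_pair(P, e, r):
    """Apply Tleft and Tright (row-vector convention) to P = (x, y, z)
    and perform the perspective divide; returns the two 2D projections."""
    x, y, z = P
    h = z * r + 1.0                    # homogeneous coordinate H
    return ((x + e) / h, y / h), ((x - e) / h, y / h)

e, r = 2.0, 3.0
pl, pr = stereo_pair((5.0, 0.0, 1.0), e, r)   # the nearer point P
ql, qr = stereo_pair((5.0, 0.0, 2.0), e, r)   # the farther point Q

# The closer point P shows a larger left/right disparity than Q:
assert pl[0] - pr[0] == 1.0            # 7/4 - 3/4
assert pl[0] - pr[0] > ql[0] - qr[0]   # 1 > 4/7
```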
As an example, consider the two points P = (5, 0, 1) and Q = (5, 0, 2). They differ only in their z coordinate. Assuming that e = 2 and r = 3, their projections are

Pleft = ((5 + 2)/(3 + 1), 0) = (7/4, 0),   Pright = ((5 − 2)/(3 + 1), 0) = (3/4, 0),
Qleft = ((5 + 2)/(2·3 + 1), 0) = (7/7, 0),   Qright = ((5 − 2)/(2·3 + 1), 0) = (3/7, 0).
The difference between Pleft and Pright is 7/4 − 3/4 = 1, whereas the difference between Qleft and Qright is only 7/7 − 3/7 = 4/7.

Figure 6.56 is an example of a stereo pair of a polyline connecting the eight corners of a cube. The Mathematica code that did the computations is also listed. Figure 6.57 shows the complete cubes.

A more sophisticated approach to generating a stereo image is shown in Figure 6.58a. The two eyes are located at (e, 0, −k) and (−e, 0, −k), and they view the general point P = (x, y, z) from different directions. Point P is projected twice on the projection plane, at points PL and PR, using the general rule for perspective projections. Assuming that the distance between the eyes is 2e, Figure 6.58c,d shows how to calculate the x coordinates of points PL and PR, respectively. Using similar triangles, Figure 6.58c yields

(x − e)/(k + z) = (xL − e)/k   or   xL = (x − e)/(1 + z/k) + e = (x + ez/k)/(1 + z/k),

and, similarly, from Figure 6.58d we get

(x + e)/(k + z) = (xR + e)/k   or   xR = (x + e)/(1 + z/k) − e = (x − ez/k)/(1 + z/k).
Since both eyes are at y = 0, the y∗ coordinates of both PL and PR are given by

y∗ = y/(1 + z/k).
We thus obtain the transformation matrices TL and TR that transform P to PL and PR:

     ⎛  1   0  0   0  ⎞        ⎛  1    0  0   0  ⎞
TL = ⎜  0   1  0   0  ⎟ , TR = ⎜  0    1  0   0  ⎟ .      (6.21)
     ⎜ e/k  0  0  1/k ⎟        ⎜ −e/k  0  0  1/k ⎟
     ⎝  0   0  0   1  ⎠        ⎝  0    0  0   1  ⎠
Figure 6.58b shows how to select reasonable values for e and k. We first assume that the distance between the eyes is about 75 mm (about 3 in). Normal reading distance is about 20 in. Using the values 3 and 20, we get tan(θ/2) = 1.5/20, yielding θ/2 = 4.29° or θ = 8.58°. This is the average stereo angle between the eyes. To get a stereo pair that will look natural and will be free of distortions, we should select values for e and k that maintain this angle. A natural value for k is 4 in, since this is the focal length of the lenses used by most commercial stereoscopes. If we reduce k from 20 to 4 (a factor of 5), we should also reduce e from 3 to 3/5 = 0.6 to maintain the same stereo angle.
Figure 6.56: Example of a Stereo Image Pair.
(* display two cubes as a stereo pair *)
Clear[Trg, Tlf, pt, e, r, qt];
Tlf = {{1,0,0,0}, {0,1,0,0}, {0,0,0,r}, { e,0,0,1}};
Trg = {{1,0,0,0}, {0,1,0,0}, {0,0,0,r}, {-e,0,0,1}};
pt = {{1,1,1,1}, {-1,1,1,1}, {1,-1,1,1}, {-1,-1,1,1},
      {1,1,-1,1}, {-1,1,-1,1}, {1,-1,-1,1},
      {-1,-1,-1,1}, {1,1,1,1}};
e = .1; r = 3;
qt = Table[0, {i,9}, {j,4}];
Do[qt[[i]] = pt[[i]].Tlf, {i,1,9}]; (* use Trg for the other image *)
Do[qt[[i,1]] = qt[[i,1]]/qt[[i,4]], {i,1,9}];
Do[qt[[i,2]] = qt[[i,2]]/qt[[i,4]], {i,1,9}];
ListPlot[Table[{qt[[i,1]], qt[[i,2]]}, {i,1,9}],
  PlotJoined -> True, Axes -> False]

Code for Figure 6.56.
Figure 6.57: Stereo Pair Shown as Complete Cubes.
Figure 6.58: Perspective Projection of a Stereo Pair.
A stereo pair is therefore calculated by substituting e = 0.6 and k = 4 in Equation (6.21) and computing PL = P TL and PR = P TR for every point P of the object.

Exercise 6.31: What would be good values for e and k assuming a distance of 2.5 in between the eyes?
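As a sanity check, the matrix TL of Equation (6.21) can be compared with a projection computed directly from the eye position. The Python sketch below is our own; it assumes, as the similar-triangle derivation above implies, that the eye producing PL is the one at (e, 0, −k).

```python
def project_left_matrix(P, e, k):
    """(x, y, z, 1) . TL of Equation (6.21), followed by the perspective divide."""
    x, y, z = P
    X, Y, H = x + z * e / k, y, z / k + 1.0
    return X / H, Y / H

def project_left_ray(P, e, k):
    """Same projection derived directly: intersect the ray from the
    eye at (e, 0, -k) through P with the plane z = 0."""
    x, y, z = P
    s = k / (k + z)                    # parameter where the ray crosses z = 0
    return e + s * (x - e), s * y

e, k = 0.6, 4.0                        # the values recommended in the text
for P in [(1.0, 2.0, 3.0), (-2.0, 0.5, 1.0), (0.0, 0.0, 5.0)]:
    m = project_left_matrix(P, e, k)
    d = project_left_ray(P, e, k)
    assert all(abs(a - b) < 1e-12 for a, b in zip(m, d))
```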
6.13 Creating a Stereoscopic Image

The discussion in Section 6.12 suggests that the simplest way to obtain a left-eye, right-eye pair of stereoscopic images is to select a camera, choose a good subject, take a picture, and then shift the camera along the baseline (normally about 65 mm) to the right and take another picture. This pair of two-dimensional images can then be watched as a single three-dimensional (stereoscopic) image with the methods discussed in Section 6.14. (Actually, what will be seen in three dimensions are those parts that are common to both pictures. Any objects that appear only in one picture because they are near an edge will disappear or will confuse the brain, depending on how the pictures are watched.) Stereoscopic scanners are discussed on Page 1261.

Here we show several simple ways to photograph such a pair, and we start with the basic rules for obtaining good stereoscopic images.

The first rule is to take sharp pictures. All the objects in the photograph should be in focus. The professional term for this is a large depth of field (Section 26.4.7). Photographers sometimes take pictures where certain elements, normally in the background, are blurred, while the main subject is sharp. Such a picture may have artistic merit, but it does not translate well to three dimensions.
The second rule is to select an appropriate subject. The aim is to produce a stereoscopic, three-dimensional image, preferably also interesting and in color. Thus, the subject must be in color and must have depth. Professional photographers and artists recommend selecting a subject that has three main elements, one near the camera, one far away, and the third in between. A simple example is a nearby gray rock, a green/brown tree in the background, and a white fence running between them. A similar example is a nearby statue in the Palais de Chaillot, the Eiffel Tower in the background, and the Pont d’Iena in between. Once such an image is converted to stereoscopic, the viewer can easily see the relative positions of the three elements. In addition, the subject should have other background elements, because a picture with only three items looks empty and disappointing.

Experience shows that the best results are achieved if the distance of the nearest picture element from the camera is 30 times the baseline. For the normal baseline of 6.5 cm, this translates to a distance of 195 cm or about 6.4 ft. However, many stereo enthusiasts have discovered that the baseline does not have to be 6.5 cm as long as a ratio of 30 is maintained. Thus, if the nearest object is 300 cm from the camera, then a baseline of 10 cm will produce a realistic-looking stereoscopic image.

The third rule is to maintain precise vertical alignment of the two pictures. Every picture element must appear at the same height in the two pictures. Thus, the camera should not be tilted, raised, or lowered between the two exposures. It should only be shifted horizontally.

Rule four is to avoid having many red and blue (or red and green) objects in the picture. Section 6.14 shows that a stereoscopic image generated as a color anaglyph looks bad if it employs these colors extensively.
The photographer should also make sure that the camera is held vertically and is not tilted up or down, as this may cause unwanted converging lines and extra vanishing points, features that tend to confuse the viewer. Only static images can be photographed (images with moving elements, such as clouds, flags, or vehicles, can be photographed with a pair of cameras; see below). Finally, remember which image is for the left eye and which is for the right eye. Switching these two results in a nonworking stereo image.

We now turn to techniques for taking a pair of stereo pictures with a camera. Perhaps the simplest (and cheapest) technique is to use a small, 6 ft (2 m) ladder. Place the camera on several steps of the ladder until you find the ideal height for your subject. Take a picture, move the camera horizontally about 6.5 cm, and make the second exposure. A ruler or a straight piece of wood makes it easier to slide the camera without tilting or rotating it.

If you own a tripod, you can get better results. The simplest way to use a tripod is to take one picture, lift the tripod, move it to the left or right, and take the second picture. Before you start, draw a straight line on the ground, perpendicular to the line of sight of the camera, and position the tripod such that two of its legs are on the line. More accurate results are obtained with the simple jig illustrated in Figure 6.59.
Figure 6.59: A Jig to Photograph a Stereo Pair.
The jig consists of two pieces of wood or plywood that are attached to the tripod with a clamp. The camera is placed on the wider piece, while the smaller piece serves as a support and guide. The dimensions of the two pieces depend on the size of the camera and the length of the required baseline. Before taking any pictures, mark two points on the guide, about 6.5 cm apart, to serve as marks for the standard baseline. With a bit of experience, this primitive device produces very accurate stereo pairs.

It is easy to come up with variations on this simple design. Good-quality kitchen cabinets often have drawers mounted on special ball-bearing metal slides. Anyone planning to take many stereo pictures might want to attach a camera to such a slide and attach the slide to a heavy-duty tripod. Such an arrangement is accurate, easy to use, and lasts a long time. Detailed instructions for its construction are available at [berezin 06]. Similar devices are sold commercially by [3dstereo 11] and others.

A completely different approach to taking such pairs of pictures is to make or obtain a pair of cameras whose lenses are placed the right distance apart and are operated together with a common shutter release cable (Figure 6.60). Such a device can produce stereo pairs of scenes that change rapidly, such as flocks of birds or racing cars. The two cameras can be placed either side by side [part (b)] or one above the other, as long as the distance between the centers of their lenses is the right one. If one camera is placed on top of the other [part (a)], it is important to leave enough room between them for the shutter release cable. If the cameras are mounted bottom to bottom, care should be taken to align their lenses vertically [part (c)]. In the latter case, the user should verify that the two pictures are vertically aligned (rule 3 above). Every point should be at the same height in the two pictures.
Figure 6.60: A Pair of Cameras.
Such double cameras are available commercially and can also be homemade. Figure 6.61 shows one made in 1998 by Andreas Petersik [Petersik 05] from two Nikon cameras. Yet another solution is to construct a camera with a sliding lens. The lens is shifted to the left and a picture is taken. The lens is then shifted to the right and another picture is taken. If film is used, the two pictures are taken on two adjacent frames of the roll of film. In a digital camera of this type, the CCD sensor slides with the lens and both pictures are captured by the same sensor and stored in different memory areas.
Figure 6.61: A Homemade Double Nikon Camera (Courtesy of Andreas Petersik).
The Fujifilm FinePix Real 3D W3 Digital Camera (with two lenses). This interesting, sophisticated camera (Figure 6.62), introduced in August 2009 (model W1) and August 2010 (model W3), features two lenses (separated by 7.5 cm) and two 10-megapixel CCD image sensors. It snaps a stereo pair of images simultaneously and immediately displays them in three dimensions on its 3.5 inch autostereoscopic, lenticular LCD screen; no special glasses are required. Images can also be stored on an SD card (both as three-dimensional MPO files and two-dimensional JPEG files), sent to a compatible three-dimensional high-definition television (3DTV), and printed on special lenticular paper. At the time of writing, .MPO files can be viewed on a computer with the STOIK Imagic software [STOIK-Imagic 11] and with the free MPO Toolbox software from Fujifilm. The FinePix can also record three-dimensional video (in 720p format). As an added benefit, two-dimensional images (still and video) can also be taken.

The camera can shoot two images that capture the same scene but differ in focal length, color tones, or sensitivity. One image can be wide-angle while the other is taken in telephoto. One image can be shot with the standard color setting, while the other is snapped in black and white or in chrome. One image can be taken with high sensitivity, while the other is shot with low sensitivity. The last feature may be useful when shooting fast-moving vehicles or in low light conditions.

Actual users of this camera are enthusiastic about how the three-dimensional images look on the LCD screen of the camera. Comments found on the Internet employ superlatives such as “thrilled,” “wowed,” and “fabulous.” However, the conventional, two-dimensional pictures produced by the camera are considered mediocre (quote: “this is essentially two mediocre cameras stuffed into one body”).
The three-dimensional images can be sent to the manufacturer to be printed on special lenticular 5 × 7 inch paper, but the prints are expensive and often disappointing.
Figure 6.62: The Fujifilm FinePix Camera.
Note: When a pair of stereo pictures is taken, a flash should be avoided because it normally casts shadows. The subject being photographed is shifted in the two pictures because the camera has moved, but any shadows cast on a wall behind the subject are shifted twice, because the camera has moved and because the light from the flash comes from a different direction. Thus, shadows would be placed incorrectly in the two pictures and would interfere with the brain's correct fusion of the images.

Our range of expression is small, so that a smile in genuine pleasure photographs indistinguishably from a grimace of pain; they are the same unless we know their history and their nature.
—C. P. Snow, Strangers and Brothers (George Passant) (1948).

Exercise 6.32: The two pictures of a stereo pair differ by a horizontal shift, which suggests the following idea. Instead of taking two pictures, take just one, copy it, shift the copy horizontally, and use it as the second picture. What’s wrong with this method?
6.14 Viewing a Stereoscopic Image

A stereoscopic image consists of a pair (right-eye, left-eye) of images. To see this pair in three dimensions, we have to view it in a special way. The guiding principle is that our brain must receive from our eyes the same signals it receives when we watch a real three-dimensional image. Given a stereoscopic pair of images on paper or on a screen, the most common techniques to view it are as follows:

1. View it through a stereoscope. This is a simple device that can easily be built at home.
2. The cross-eye technique. The two images of a pair are laid side by side and the viewer has to cross his eyes in order to slide the images and see them fused into a single image.
6 Perspective Projection
3. The parallel-view technique. This is similar to the cross-eye method but is appropriate for small images. 4. The anaglyph method. The two original images are combined into a single image where they are painted different colors. Special glasses are used to make sure each eye sees only one color. 5. Page-flipped techniques, where the left and right pictures are continually flipped on the screen. 6. Line-alternate methods, where the left-eye and right-eye pictures are interleaved on the screen. These are popular with head-mounted displays. There are other, more sophisticated techniques, such as the Pulfrich effect and dot stereograms. We follow with detailed descriptions of the most common methods. Stereoscope A stereoscope (Figure 6.63) is a simple device for viewing a stereo pair. It can easily be made at home from cardboard, wood, and two lenses. In a piece of cardboard, cut two circular holes with a diameter of about 1.5 in each and with about 6.5 cm separation between their centers. Place a lens with a focal length of 4 in in each hole. Look at a stereo pair located about 4 in away through the lenses, using another piece of cardboard to make sure each eye sees only one image. More sophisticated devices are available from several sources, such as [StereoGraphics 05] and [Edmund Scientific 05].
Figure 6.63: Stereoscopes.
The Cross-Eye View Technique Note. If you wear glasses, keep them on when trying this method.
The right and left images should be displayed side by side, with the right image on the left and the left image on the right, as illustrated here:
[R image]  [L image]
 right eye   left eye
Start by staring at the center point between the two images. Slowly cross your eyes and watch the two images slide closer. With a little patience and practice, you should be able to make the two images overlap. You will then see three images, as illustrated here:
[R]  [R and L overlapping]  [L]
At this point, your right eye sees the right image and your left eye sees the left image. Try to ignore them and concentrate on the central image. When you are successful, the center image will feature depth; it will be stereoscopic. If you are one of those who find this technique difficult in practice, try the following aid.

R        L
(place a finger on the paper, under the two targets)

1. Observe the image above with the R and L targets. 2. Place a finger on the paper, right under the two targets, as indicated. 3. Stare at your fingertip and, while still looking at it, slowly move your finger away from the image pair and toward your eyes. If you relax, practice this method several times, and do it slowly, you should be able to slide the two targets and align them perfectly. You may need to tilt your head slightly left or right to align the targets vertically. 4. When the two targets fuse, move your eyes slowly from your fingertip to the fused image on the page. Don’t forget to keep your eyes crossed during this step. If this “trick” is successful, apply it to a pair of stereo images such as Figure 6.64 and Plates O.2 and P.2. Experience indicates that most people get used to this way of viewing stereoscopic images and don’t find it tiring or uncomfortable. However, if you feel discomfort or if your eyes get tired, don’t try this method again! There are other ways to enjoy stereoscopic images.

The Parallel View Technique
This technique (also referred to as relaxed viewing) is appropriate for small images where each image of a stereoscopic pair fits between the eyes. The pair is displayed with the
Figure 6.64: A Stereo Pair.
right-eye image on the right and the left-eye image on the left, as illustrated here:
[L image]  [R image]
 left eye    right eye
The following steps show how such an image can be viewed stereoscopically without any tools or instruments. Those who wear glasses may get better results trying these steps without their glasses. 1. Watch the image pair from close range so that each eye is over one of the images. This is possible if the images are small enough. 2. Stare straight ahead and try to gaze through the images to infinity. The stereo images will look blurred. 3. Slowly pull your head away from the page while maintaining the same gaze. The two images will turn into four images. Continue to move away while gazing to infinity. 4. At a certain point, the four images will merge into three. Concentrate on the central image and you will suddenly see it in three dimensions. The effect is more noticeable if the original image pair is in vivid colors. You can try this technique on the image pair of Figure 6.64, but avoid prolonged viewing and concentration, which may lead to eye fatigue. The Anaglyph Method This approach to stereoscopic viewing combines the right-eye and left-eye images (partly overlapping) in one image but in different colors, normally red and blue (or cyan) but sometimes red and green. The method requires the use of special glasses with different color filters for the two eyes, as illustrated here. (See also [kspark 05] for several well-known Escher drawings that have been converted to three dimensions, mostly as anaglyphs.)
The red filter on the left eye transmits red light and absorbs other colors. Through this filter, the red parts of the anaglyph appear as bright as the background and effectively vanish, while the blue parts appear dark; thus, the left eye sees only the blue parts of the image. Similarly, the blue filter on the right makes only the red parts of the image visible to the right eye. Reference [StereoGlasses 11] teaches how to make such glasses at home.

Warning. Some people may be sensitive to these glasses. If you feel discomfort or if you get tired very quickly, take off the glasses and take a break. In any case, try to use these glasses for short periods and only to view an anaglyph image. They are not intended for normal use!

From the Dictionary
Anaglyph. From Late Latin anaglyphus, carved in low relief. Also from Greek anagluphos (to carve).

An anaglyph image is encoded in one of three ways as follows:

Color. The left-eye image is left mostly in its original colors, but certain crucial parts, such as edges, curves, lines, and points, are painted blue (or cyan or green). The right-eye image is treated similarly with red. Thus, a color anaglyph (Figure 6.65a) preserves much of the original colors of the image, but its red/blue (or red/green) stereo information is diluted throughout the image. The result is that many images lose their depth information in this format and don’t look three-dimensional. This is especially true if the original image has vivid red or blue colors. However, if an image does look good in a color anaglyph, it looks real and vivid.

Gray. The two original images are converted to grayscale and the same crucial elements are painted red and blue. A gray anaglyph image (Figure 6.65b) is therefore seen in grayscale, but its depth information is normally easy to perceive.

Pure. The right-eye image is entirely converted to shades of red. The left-eye image is treated as in the color anaglyph method.
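As a rough illustration of the idea, here is a minimal Python sketch (the helper name and the channel assignment are assumptions, and channel conventions vary between tools). It approximates the gray encoding by sending one grayscale eye image to the red channel and the other to the green and blue channels, so red/cyan glasses route one image to each eye; production anaglyph software recolors only the crucial elements, as described above.

```python
def make_anaglyph(left, right):
    """Approximate a gray anaglyph from two grayscale eye images
    (2-D lists of 0-255 values): the left-eye image goes to the red
    channel and the right-eye image to the green and blue channels.
    Red/cyan glasses then route one image to each eye."""
    h, w = len(left), len(left[0])
    return [[(left[y][x], right[y][x], right[y][x])  # (R, G, B) pixel
             for x in range(w)]
            for y in range(h)]
```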
The combined image looks reddish on paper (Figure 6.65c) but has much depth information when seen through the glasses. Some of the original color information naturally is lost. An interesting difference between the three anaglyphs of Figure 6.65 is the person on the left-hand side (he can be seen in Figure 6.64), who completely disappears in the pure version. Experienced users recommend creating all three anaglyphs of a given image, trying the color, gray, and pure versions (in this order), and selecting the one judged best. There are many sources of software (much of it free) to generate anaglyphs. Those too lazy to search can check the list at [anabuilder 05].

Figure 6.65: Three Anaglyph Encodings. (a) Color, (b) gray, (c) pure.

Pick a good-quality anaglyph and examine it carefully. You will notice that each crucial picture element P is shown twice in the anaglyph, in red and blue. The relative positions of these two color elements are interpreted by the brain as the depth of element P. Let’s assume that the left-eye view becomes the red parts and the right-eye view becomes the blue parts. If the red and blue parts of P overlap, the brain considers P to be on the image plane (i.e., the paper or screen on which the anaglyph image is printed or displayed). Such picture elements are said to be at the stereo window and are always comfortable for the eye to watch, regardless of where they are located in the image. If the red part of a picture element P is placed on the anaglyph to the right of the corresponding blue part, the brain perceives P as being located in front of the stereo window. Figure 6.66 shows that this effect requires a large separation of the red and blue parts. If the red part of a picture element P is placed on the anaglyph to the left of the corresponding blue part, the brain visualizes P as located behind the stereo window. Figure 6.66 shows that this can be achieved with only a small separation of the red and blue parts. If P is seen in only one of the two eye views (because it is close to a border of the image), then it is translated to one color only and is not seen in stereo. It may even confuse the brain if the viewer concentrates on P. Figure 6.66 also illustrates the effect of moving the viewer closer to and away from the anaglyph. As we watch an anaglyph from close by, we see the entire image bigger but with less depth. As we move our head away from the anaglyph, the stereo image becomes smaller, but the difference between points A and B increases; the image acquires more depth.

Page-Flipped Techniques
These techniques require a special monitor screen and special shutter glasses. The screen switches rapidly between the left-eye and right-eye images. The glasses are triggered by the monitor hardware to block the right lens when the left-eye image is displayed and
Figure 6.66: Relative Positions of the Red and Blue Parts.
block the left lens when the right-eye image is displayed (Figure 6.67). Thus, the brain receives the correct image from each eye, and if the images are sent to the brain at a fast rate, the brain fuses them as usual into a single three-dimensional image. If the switching rate is low, the brain interprets the signals as a flickering image (still three-dimensional). The shutters in the glasses are electronic and are normally made from liquid crystals. The glasses themselves are connected to the monitor (actually, to the video card) through a special cable or through one of the input/output ports (serial or parallel). Newer types of shutter glasses are wireless.
Figure 6.67: Page-Flip Monitor and Shutter Glasses.
This method generates high-quality, high-resolution color stereoscopic images but requires special hardware, so it is not as common as the previous methods. Line-Alternate Techniques In the past, most monitors used with computers were CRTs. In the last decade, LCD monitors have become popular. Both types of monitors operate as raster scan displays and generate an image in the interleaved mode. The term “raster scan” means that the image is displayed on the monitor screen row by row, from top to bottom, and each row of pixels is generated from left to right. A complete scan of the screen is known as a refresh. In a CRT, this is achieved by sweeping the electron beam over the screen row
by row from the top left corner to the bottom right corner. In an LCD monitor, the individual LCDs are scanned in this order and turned on or off as needed. The term “interleaved” means that each refresh of the screen is done in two parts. The first part refreshes the display of the odd-numbered screen rows, and the second part refreshes the even-numbered rows.
Figure 6.68: Line-Alternate Techniques.
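Row interleaving of two eye images, the basis of these techniques, can be sketched in a few lines of Python (a minimal illustration; images are represented as lists of pixel rows, and the even-rows-from-the-left-eye convention is an assumption, since either assignment works):

```python
def interleave_rows(left, right):
    """Combine two equal-size images (lists of rows) line-alternately:
    even-numbered rows come from the left-eye image and odd-numbered
    rows from the right-eye image (or vice versa). Each eye image
    loses half its vertical resolution."""
    assert len(left) == len(right)
    return [left[y] if y % 2 == 0 else right[y]
            for y in range(len(left))]
```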
A line-alternate technique for stereoscopic images displays the right-eye image on the odd-numbered rows and the left-eye image on the even-numbered rows or vice versa (Figure 6.68). Special shutter glasses block the left eye from seeing the display during the first part of the refresh (i.e., when the odd-numbered rows are scanned and refreshed) and block the right eye during the second part. Such techniques are popular in small, head-mounted displays. A three-dimensional image created by the various line-alternate techniques is stable and doesn’t suffer from flicker because the screen is normally scanned at a flicker-free speed. In addition, there is no loss of color. The downside is loss of resolution, because each image is displayed on half the rows of the display.

A variation of the line-alternate technique employs a lenticular lens. Figure 6.71 shows the principles of this technique. Each of the two stereo eye images is cut into narrow strips that are then interleaved and viewed through a special lens made of many small half-cylindrical elements (placed at about 100 elements per inch). Each lens element sends one image strip to the left eye and one strip to the right eye.

The Pulfrich Effect
The Pulfrich effect (Figure 6.69), described by Carl Pulfrich in 1922, is best explained as an optical illusion (but then one might argue that any stereoscopic image is an optical illusion). Imagine an object moving in the plane perpendicular to our line of sight. If we look at the object with both eyes and dim the light reaching one eye, the object seems to move out of this plane and to either approach us or recede from us. The simplest way to observe this effect is to use one sunglass lens, but most pieces of dark glass or plastic, as well as many optical filters, work fine. It is easy to demonstrate the Pulfrich effect with a swinging pendulum. When viewed normally with both eyes, we can verify that the pendulum swings in a plane, back and forth.
When a dark lens or filter is placed in front of one eye, the pendulum suddenly seems to be swinging in an ellipse parallel to the ground. The light has to be dimmed to one eye only. Dimming the light equally to both eyes results in a dim pendulum seen swinging in a plane.
Figure 6.69: The Pulfrich Effect.
There are several Web sites with Java applets that illustrate this effect very convincingly through animation. One such site is [Newbold 05], but a Web search for pulfrich, java, and animation yields many more.

I have never been able to observe these effects myself, for I have been blind in the left eye for 16 years as a result of a traumatic (blutigen) injury of the eye suffered when I was young. —Carl Pulfrich (1922).

Dot Stereograms
Figure 6.70 illustrates the principle of the interesting method of dot stereograms (for a complete description, see [Thimbleby et al. 94]). A three-dimensional scene is projected on a screen and a point P1 is selected at random. The two eyes of the viewer see point P1 projected at points Q1 and Q2. We now select another point P2 such that its projections for the two eyes are Q2 and Q3. We have selected point Q2 in such a way that it is both the left-eye projection of P1 and the right-eye projection of P2. (P2 must be at the same height as P1, which is not obvious in our two-dimensional figure.) A little thinking shows that most points on the screen do similar “double duty.” The exceptions are points close to the edges of the screen, or points whose P1 or P2 are hidden by other parts of the scene. Since Q2 is common to P1 and P2, we face the question of what color to paint it. In fact, Q1, Q2, and Q3 have to be painted the same color.
Figure 6.70: Dot Stereograms: The Principle.
The algorithm described in [Thimbleby et al. 94] has to decide what color to paint each point (dot) on the screen and also to determine the two parents, P1 and P2 , of
Figure 6.71: Lenticular Lens Principles. (Each eye image is cut into strips, the strips are interlaced, and a lenticular lens of half-cylindrical elements is attached; each lens element directs alternate strips to the left and right eyes.)
each point on the screen. The result of this algorithm is a stereogram that consists of dots and can be watched in three dimensions by crossing the eyes, without the need for special glasses or any other device. There are three types of dot stereograms, as follows: SIRDS (Single Image Random Dot Stereograms). This is the oldest type. It goes back to the pioneering work of Béla Julesz in the 1960s. Such a stereogram consists of a random pattern of dots, each representing two pixels of the object. Figure 6.72 is an example of this type of stereogram. (This example may be hard or impossible to view because it has been shrunk from the original. It resembles the original image, but its individual pixels are different from the original pixels.)
Figure 6.72: Example of an SIRDS Dot Stereogram.
SIS (Single Image Stereograms). This is currently the most common type. The picture consists of (slightly modified) tiles. This type of dot stereogram is somewhat more complex to generate, but the basic algorithm is the same. SIRTS (Single Image Random Text Stereograms). This type is identical to SIRDS but uses ASCII characters instead of dots. The resulting stereogram has low resolution. A dot stereogram is easier to perceive in three dimensions if it is printed on paper rather than displayed on a screen. Here are two simple methods for viewing this interesting type of stereoscopic image. In the pull-back method there is no need to cross the eyes. Just hold the picture close to and in front of your face. Imagine that you are looking straight ahead, right through the picture. When your eyes relax and are no longer focused on any point, start moving the picture away from you slowly. When you reach your normal reading distance, you should perceive the three-dimensional image. It’s important not to focus on the image.
The reflection method works for stereograms that are printed on reflecting paper. Turn and tilt the paper until it reflects light into your eyes. Focus on the reflection and wait. After a few seconds, you should see the three-dimensional image.
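The dot-placement constraint described above, where each screen dot serves as the projection of two scene points and linked dots must share a color, can be sketched for a single pixel row. This is a simplified Python illustration, not the actual algorithm of [Thimbleby et al. 94]; the separation formula follows the general shape used there, and the constants e (eye separation in pixels) and mu (depth-of-field factor) are illustrative values.

```python
import random

def sirds_row(depth, e=80, mu=0.33):
    """One row of a single-image random-dot stereogram.
    depth[x] is in [0, 1] (0 = far, 1 = near). Pixels that project to
    the same scene point for the two eyes are linked and then forced
    to have the same color."""
    n = len(depth)
    same = list(range(n))         # union-find: same[x] links equal pixels

    def root(x):
        while same[x] != x:
            x = same[x]
        return x

    for x in range(n):
        # stereo separation for this depth (smaller for nearer points)
        s = round(e * (1 - mu * depth[x]) / (2 - mu * depth[x]))
        left, right = x - s // 2, x - s // 2 + s
        if left >= 0 and right < n:
            a, b = root(left), root(right)
            if a != b:
                same[max(a, b)] = min(a, b)   # parent is always smaller

    # assign random black/white dots, honoring the constraints
    row = [0] * n
    for x in range(n):
        r = root(x)
        row[x] = row[r] if r != x else random.randint(0, 1)
    return row
```

Repeating this for every row of a depth map yields a dot pattern that, viewed with the techniques above, shows the underlying surface in relief.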
6.15 Autostereoscopic Displays
Graphics displays are discussed in Chapter 26. This section concentrates on a special type of display, the autostereoscopic display. The autostereoscopic display presents a completely different approach to the problem of creating and viewing three-dimensional images. Such a display generates a three-dimensional image without the need for special glasses, any headgear, or any other auxiliary device. The price for this is a limited field of view. A correct, lifelike three-dimensional image can be seen only from certain points. A viewer positioned elsewhere sees either a confusing image or nothing at all. The original idea of the autostereoscopic display is due to Adrian Travis [Travis 90] who patented it in 1992 [Travis 92]. Practical autostereoscopic displays have been developed by DeepLight, Inc., of Westlake Village, California [deeplight 06]. Imagine two cameras L and R separated by the correct distance for stereoscopic viewing (about 6–7 cm), sending images to a computer (Figure 6.73a). The computer displays the two images alternately at high speed (we say that the images are time multiplexed). It sends image L, followed by image R, followed again by image L, and so on, to a monitor screen. A person is sitting in front of the screen, watching the images. The two time-multiplexed images are synchronized with a fast shutter device such that when image L is displayed, only the left eye of the viewer sees the display and when image R is displayed, only the right eye sees the screen.
Figure 6.73: Autostereoscopic Display With Light Bars.
The ideal way to achieve such optical synchronization is to use an LCD (Section 26.3). This type of display does not generate light and has to be illuminated from behind. Two special light sources and a Fresnel lens are now placed behind the display. Each light source is a narrow vertical rectangle (a light bar) that illuminates the display from a different direction, thereby causing the light from the display to be sent in a different direction. Figure 6.73b illustrates this configuration as seen from above. The viewer has to be positioned at the center of the eye box. (The eye box is simply a region in space, not a screen or a device.) When light from bar x reaches the lens and
the display, the image from the display is seen only by the viewer’s right eye. A little later, light bar x is turned off and light bar y is turned on, causing the image from the display to shift to the left (in the figure, it is shifted down) and be seen only by the viewer’s left eye. Figure 6.73c shows how this idea can be extended to more than two images. Imagine six cameras positioned in front of a scene. The cameras are set precisely at the same height, they are parallel, and are separated horizontally by 7 cm. Six images are sent to the computer and are time-multiplexed by it to the display. Six light bars are synchronized with the images, such that each image is directed by the display to a different area in the eye box. The viewer can now shift his head left and right from area to area within the eye box and can see the scene in three dimensions from five positions with the correct parallax. Unfortunately, this ideal arrangement is currently impractical because of the following reasons:
1. The images must be sent to the LCD at a high rate in order to create the illusion of a single, three-dimensional image. In a system with six images, if we want to send each image to the display 60 times a second, we need a refresh rate of 6×60 = 360 Hz. Unfortunately, the refresh rate of current LCDs is low. A practical autostereoscopic display must therefore use a high-speed display. 2. It is difficult to arrange six cameras at the same height while also keeping them parallel and separated by the right distance. The autostereoscopic display that has been developed by DeepLight uses two cameras and a special, proprietary algorithm to generate four additional stereo images, for a total of six images that are then timemultiplexed and sent to the display one by one.
Figure 6.74: Autostereoscopic Display With LCD Shutters.
For these reasons, autostereoscopic displays currently available use a different arrangement that is illustrated in Figure 6.74. (The figure shows the main components from above.) Light from a high-speed display is sent through a lens to form an image. An array of fast LCD shutters “looks” at the image. Each shutter is a narrow rectangle through which the entire image is sent to the Fresnel lens. At any time, only one shutter is open, allowing the image to pass through the shutter and be focused by the Fresnel lens in one area of the eye box. The shutters are switched rapidly, in synchronization with the image displayed on the monitor, so a viewer looking in two adjacent areas in the eye box sees two (time-multiplexed) stereoscopic images.
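At its core, the time multiplexing just described is a round-robin schedule: at each refresh, one image is displayed and only the matching shutter (or light bar) is enabled, steering that image to one area of the eye box. A hypothetical sketch (the function name and the six-view count are illustrative):

```python
def shutter_schedule(frame, n_views=6):
    """Round-robin time multiplexing: at refresh number `frame`,
    image (frame mod n_views) is shown and only the matching shutter
    is open, directing that image to area (frame mod n_views) of the
    eye box."""
    view = frame % n_views
    shutters = [i == view for i in range(n_views)]
    return view, shutters
```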
Applications of autostereoscopic displays are currently limited by the high cost of the hardware. There is also the fact that only one viewer can see the image at any time, and only a limited number of views is available in the eye box. Current applications include laparoscopic surgery and geologic displays employed in searching for oil and gas deposits. Once costs start coming down, future applications may include threedimensional graphics design and game playing on personal computers. Perspective is the rein and rudder of painting.
—Leonardo da Vinci
7 Nonlinear Projections

In addition to the parallel and perspective projections, other projections may be developed that are useful for special applications or that create ornamental or artistic effects. Such projections are termed nonlinear because they cannot be expressed by linear transformations such as x∗ = ax + cy + m and y∗ = bx + dy + n. It seems that the number of possible nonlinear projections is vast and is limited only by the creativity of researchers and implementors. This chapter discusses some of the more common nonlinear projections, including the false perspective, the fisheye projection, several 360◦ panoramic projections, the telescopic and microscopic projections, and sphere projections. These projections create aesthetically pleasing (and sometimes confusing) effects and are mathematically simple and easy to derive. However, because they are nonlinear, they generally cannot be represented by means of transformation matrices. (Recall that multiplying a point (x, y, z) by a matrix results in a linear expression such as ax + by + cz, but never in nonlinear constructs such as ax².) See also Section 2.16 for nonlinear bitmap transformations.

Back in the corridor of the building, posters of computer-generated fractal images depicting the “arithmetic limits of iterative nonlinear equations” line the walls. —Douglas Rushkoff, Cyberia: Life in the Trenches of Hyperspace (1994).
7.1 False Perspective Equation (6.3) is the main expression for the linear perspective projection; it is duplicated here:

x∗ = x/(1 + z/k),   y∗ = y/(1 + z/k).   (6.3)

It shows that the (two-dimensional) coordinates of the projected point P∗ are obtained by dividing by the z coordinate (the depth) of the original point P. False perspective (or
pseudoperspective) is a technique to artificially add depth and introduce perspective (or an effect similar to perspective) into a two-dimensional image, thereby making it appear three-dimensional. Points in a two-dimensional image have just x and y coordinates, which makes it natural to modify Equation (6.3) to

x∗ = x/(1 + f(x, y)),   y∗ = y/(1 + f(x, y)),   (7.1)
where f(x, y) is a function chosen by the user according to the desired effect. For example, the function

f(x, y) = −(1/2) e^(−ax²−by²),

where a and b are real constants, returns the value −0.5 for x = y = 0 (the origin) and values that approach zero for very large x or y coordinates (positive or negative). Points (x, y) near the origin are therefore projected to (2x, 2y), while points on the edges of the image are hardly affected by this projection. This has the effect of magnifying the center of the image, thereby making it appear closer. Other functions may create different effects. Figure 7.1 shows an example of a 5×5 grid of points moved in such a way. The computations were done by the following code

fp[x_,y_]:={x/(1-0.5 Exp[-a x^2-b y^2]), y/(1-0.5 Exp[-a x^2-b y^2])};
Figure 7.1: Moving Points in False Perspective.
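The same pseudoperspective mapping can be written in a conventional language. Here is a Python equivalent of the Mathematica snippet above (the function name is arbitrary, and the default constants a = b = 1 are illustrative):

```python
import math

def false_perspective(x, y, a=1.0, b=1.0):
    """Pseudoperspective, Eq. (7.1) with f(x,y) = -0.5*exp(-a x^2 - b y^2).
    1 + f ranges from 0.5 at the origin (points doubled) to 1 far away
    (points unchanged), magnifying the center of the image."""
    f = -0.5 * math.exp(-a * x * x - b * y * y)
    return x / (1 + f), y / (1 + f)
```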
Psychedelics and VR are both ways of creating a new, nonlinear reality, where self-expression is a community event. If you realize that the world is nonlinear and random, then it means that you can be completely annihilated by chaos for no particular reason at all. These things happen. There’s no cosmic justice. And that’s a disquieting thing to have to face. It’s damaging to people’s self-esteem. —Douglas Rushkoff, Cyberia: Life in the Trenches of Hyperspace (1994).
7.2 Fisheye Projection This type of projection is named after the fisheye camera lenses that many photography enthusiasts like to use. The name “fisheye” reflects the shape of such a lens, which resembles the protruding eye of a fish. Such lenses are also used in peepholes installed in doors. The basic idea in this type of projection is to take the half-sphere of space (with infinite radius) located in front of the viewer and project it into a flat circle. The half-sphere is infinite, whereas the circle is finite and may be quite small. Thus, the projected image must be distorted. Just shrinking the image uniformly will make most of its details too small to see. A better idea is to implement nonlinear shrinking that should become more pronounced as we move from the center toward the periphery of the image. Objects close to the center of the image are more visible to a viewer and should therefore be shrunk only a little. The shrinking should increase for objects located away from the center. In principle, the scale factor should vary from 1 (no shrinking) at the center to 0 (shrinking all the way to zero) for image points on the periphery (i.e., at 180◦ to the line of sight of the viewer). He sat by Chrystal’s side, red-complexioned, opulent, with protruding eyes that glanced round whenever he spoke to make sure that all were listening. —C. P. Snow, The Light and the Dark (1947). Hemispherical Fisheye Projection We start with a simple variant that can be called hemispherical fisheye. This variant is easy to understand but requires the computations of both the tangent and arctangent for each point being projected. The projection of points in this variant is derived in two steps. In the first step, illustrated in Figure 7.2a, all the points in the hemisphere where z is nonnegative are projected into an infinitely large circle on the xy plane, centered on the origin. 
In the second step, all the points on this circle are moved closer to the center and end up on the radius-k circle centered at the origin (Figure 7.2b). The first step employs parallel projection to project points onto a plane. Figure 7.2a shows how the parallel projection of a point simply amounts to clearing its z coordinate. The three-dimensional point (x, y, z) is projected to (x, y, 0) on the infinite circle on the xy plane. The second step compresses the infinite circle to a radius-k circle nonlinearly. The user selects a positive value k and each point on the xy plane is moved toward the origin by halving its angle of view θ as seen from the standard position (0, 0, −k). (See Page 298 for a definition of the standard position.) Figure 7.2b shows a point P on the xy plane where the angle between the z axis and line VP is θ. The point is moved closer to the origin along the segment P O and becomes P∗ with a view angle of θ/2. Since both P and P∗ are on the xy plane, we can consider this transformation scaling in two dimensions. The transformed point P∗ equals sP, where the scale factor s is less than one (i.e., shrinking). However, it is easy to see intuitively that points located away from the origin will be scaled more than points closer to the origin. The scale factor s is therefore variable; it depends on P, which is why this type of projection justifies the
Figure 7.2: Hemispherical Fisheye Projection.
name nonlinear. The derivation of s starts with Figure 7.2b, which shows that tan θ = |P|/k, implying θ = arctan[|P|/k]. Similarly, the transformed point satisfies tan(θ/2) = |P∗|/k, which yields the scaling factor

s = |P∗|/|P| = k tan(θ/2)/|P| = k tan(arctan[|P|/k]/2)/|P|.   (7.2)

Exercise 7.1: Use mathematical software to compute the scale factors for several |P| values from 1 to 10,000.

If the programming language or mathematical software being used cannot compute the arctan to the desired accuracy, the following expressions (where h stands for |P|) are equivalent and employ only sines and cosines. From h/k = tan θ and sh/k = tan(θ/2), we obtain

s = k tan(θ/2)/h = tan(θ/2)/tan θ = [(1 − cos θ)/sin θ] / [sin θ/cos θ] = cos θ(1 − cos θ)/sin²θ,

or equivalently

s·h = k tan(θ/2) = k(1 − cos θ)/sin θ.

Notice that points that are the farthest from the origin on the xy plane have an angle θ in Figure 7.2b close to 90◦. Thus, their projections have an angle close to 45◦. A view angle of 45◦ implies that the distance of such a projected point from the origin equals the distance k of the standard position from the origin. The result is that all the points on the (infinitely large) xy plane are moved by the hemispherical fisheye projection onto the radius-k circle located in the xy plane and centered on the origin. Figure 7.3 illustrates this process with 50 points. It is easy to see how the distance of a point from the center of the circle affects the amount by which it is moved toward
7 Nonlinear Projections
the center. (The code that generated this figure is kept simple. It generates 50 points with random coordinates in the interval [−10, 10], which is why some points are located outside the radius-10 circle. See Section 2.30.2 for random points inside a circle.)
(* hemispherical fisheye projection *)
Clear[k, n, P, Q, L]
k = 10; n = 50;
scal[q_] := (k Tan[ArcTan[q/k]/2])/q;
P = Table[{Random[Real, {-10., 10.}], Random[Real, {-10., 10.}]}, {n}];
Q = Table[Sqrt[P[[i]].P[[i]]], {i, n}];
L = Table[Line[{P[[i]], scal[Q[[i]]] P[[i]]}], {i, n}];
Show[Graphics[L], Graphics[Circle[{0, 0}, 10]],
 Graphics[Point[{0, 0}]], AspectRatio -> 1]

Figure 7.3: Moving Points in Hemispherical Fisheye Projection.
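The book's examples use Mathematica. As a hedged illustration for readers who prefer Python, the following sketch (function names and sample distances are mine) evaluates both the direct form of the scale factor and the sine/cosine form, and checks that they agree:

```python
import math

def scale_tan(h, k=10.0):
    # direct form of Equation (7.2): s = k*tan(arctan(h/k)/2)/h
    return k * math.tan(math.atan(h / k) / 2) / h

def scale_trig(h, k=10.0):
    # equivalent form using only sines and cosines:
    # s = cos(theta)*(1 - cos(theta))/sin(theta)^2, where tan(theta) = h/k
    theta = math.atan(h / k)
    return math.cos(theta) * (1 - math.cos(theta)) / math.sin(theta) ** 2

for h in (1.0, 10.0, 100.0, 10000.0):
    assert abs(scale_tan(h) - scale_trig(h)) < 1e-12
    assert scale_tan(h) * h < 10.0  # projected points stay inside the radius-k circle
```

For h = 10 and k = 10 (a view angle of 45◦) both forms give s = tan 22.5◦ ≈ 0.414, and s·h approaches k as h grows, confirming that the entire plane is squeezed into the radius-k circle.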
It is possible to extend this variant of the fisheye projection to cover more than 180◦ of space. Figure 7.4a shows how a coverage of up to about 220◦ can be achieved by bending the xy plane “backward” (i.e., toward the negative z axis) and projecting all the three-dimensional points that are located to the “right” of this bent plane. Once this is done, the points are scaled as before into the radius-k circle.

Figure 7.4b is an example of the type of distortion typical of the hemispherical fisheye projection. The figure shows the Old Executive Office Building in Washington, D.C., and it is easy to see that both the vertical lines (the tree in the foreground) and horizontal lines (the fence) are curved and that image elements in the center are more detailed than those near the periphery.

Exercise 7.2: Explain why we expect vertical and horizontal straight lines to become curved in a fisheye projection.

Well-known examples of the hemispherical fisheye projection are Hand with Reflecting Sphere and Circle Limit IV (Heaven and Hell) by M. C. Escher [Ernst 76]. See also Figure 7.31 (Parmigianino, A Self Portrait in a Convex Mirror, 1524).
Figure 7.4: (a) Extended Hemispherical Fisheye Projection. (b) Example.
Approximate Hemispherical Fisheye Projection

The downside of the hemispherical fisheye projection is the extensive computations required by the tangent and arctangent functions. The method described here employs approximations to simplify the computations. The tradeoff is loss of accuracy, but since the fisheye projection introduces distortions anyway, many viewers may not be able to distinguish accurate results from approximate ones.

Figure 7.2b illustrates the principle. Each point P on the infinitely large circle corresponds to an angle θ and is moved toward the origin such that its new angle is θ/2. Thus, we can compute the radii of several concentric circles that correspond to, say, θ = 22.5◦, 45◦, 67.5◦, and 89◦. Similarly, we can compute the radii of the corresponding circles (the circles for θ/2 values) on the radius-k circle. Figure 7.5 shows an example.
Figure 7.5: Approximate Fisheye Projection.
If a point P happens to be located on circle A, it is scaled by moving it to the corresponding circle a on the radius-k circle. Its scale factor is the ratio ra /rA of the radii of the two circles. If a point Q happens to be located 30% of the distance between
circles A and B, then it is moved 30% of the distance between circles a and b. Its scale factor is [(1 − 0.3)ra + 0.3rb]/[(1 − 0.3)rA + 0.3rB]. This simplifies the computations but introduces inaccuracies because the interpolation between circles is linear. However, the inaccuracies can be reduced as much as desired by precomputing the radii of more circles.

The circle that corresponds to 89◦ is large and the circle for 90◦ has infinite radius. Points whose θ is between 89◦ and 90◦ will be moved to the radius-k circle and placed in the narrow region between the (89/2)◦ circle and the outer edge. Such points increase the inaccuracies of this method, but this may be acceptable because this region suffers from maximum distortion anyway. Table 7.6 lists large and small radii for five angles and for k = 10. The code that performs the computations is also listed.

n    θ◦      Rn        rn
0    0       0         0
1    22.5    4.142     1.989
2    45      10        4.142
3    67.5    24.142    6.682
4    89      572.9     9.827

Table 7.6
k = 10; angl = {22.5, 45., 67.5, 89.};
k Tan[angl Degree]
k Tan[angl/2 Degree]

Code for Table 7.6
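The table lookup and linear interpolation described above can also be sketched in Python (an illustration, not the book's code; k = 10 as in Table 7.6, and the function name is my own):

```python
import math

k = 10.0
angles = [0.0, 22.5, 45.0, 67.5, 89.0]                    # degrees
R = [k * math.tan(math.radians(a)) for a in angles]       # large radii Rn
r = [k * math.tan(math.radians(a) / 2) for a in angles]   # small radii rn

def approx_scale(x, y):
    # scale factor by linear interpolation between precomputed circles
    d = math.hypot(x, y)
    if d == 0:
        return 1.0                 # a point at the origin is not moved
    for n in range(len(R) - 1):
        if d <= R[n + 1]:
            t = (d - R[n]) / (R[n + 1] - R[n])  # relative distance from circle n
            return ((1 - t) * r[n] + t * r[n + 1]) / ((1 - t) * R[n] + t * R[n + 1])
    return r[-1] / R[-1]           # beyond the 89-degree circle
```

For example, approx_scale(15, 10) returns about 0.31, matching the worked example in the text.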
A given point (x, y) is at a distance d = √(x² + y²) from the origin. This distance is compared with all the radii in the table. If d equals an Rn, then the point is multiplied by the scale factor rn/Rn. Otherwise, we find the smallest Rn such that Rn < d < Rn+1. The relative distance of the point from Rn is (d − Rn)/(Rn+1 − Rn).

As an example (recall that our table is based on k = 10), consider the point (15, 10). Its distance from the origin is √(15² + 10²) ≈ 18. Thus, it is between R2 = 10 and R3 ≈ 24. We compute (18 − 10)/(24 − 10) ≈ 0.57, which tells us that the point is located 57% of the distance from R2 to R3. The scale factor of the point is given by [(1 − 0.57)4.142 + 0.57·6.682]/[(1 − 0.57)10 + 0.57·24] = 5.59/18 = 0.31, so it has to be moved to 0.31(15, 10) = (4.66, 3.11) on the radius-10 circle, where its new distance from the origin is 5.6, or 57% of the distance from r2 = 4.142 to r3 = 6.682.

A story of particular facts is a mirror which obscures and distorts that which should be beautiful; poetry is a mirror which makes beautiful that which it distorts.
—Percy Bysshe Shelley, A Defence of Poetry.

Angular Fisheye Projection

The hemispherical fisheye projection assigns more importance to those image parts located near the line of sight of the viewer. These parts are displayed in detail, while image elements close to the periphery are displayed in compressed form near the edges of the projection. In contrast, the angular fisheye projection described here assigns the same importance to all the image parts. Each is compressed by the same amount. Perhaps
a better name for this method would be “linear fisheye,” but the term “linear” seems a misnomer because even this projection introduces distortions and is therefore nonlinear. An important feature of the angular fisheye projection is that it can easily be extended to viewing angles of more than 180◦ and can even encompass the entire 360◦ space surrounding the viewer.

Figure 7.7 illustrates the principle. The (infinite) sphere of space surrounding the viewer is divided into eight vertical slices of equal viewing angle, each of which is projected into a ring in the final circular projection. We actually see only seven of the eight slices because we are looking at the sphere from an angle. Six points a–f are shown on the sphere with their approximate projections on the circle. Notice that point “d” (shown in blue in slice 5) is supposed to be on the side of the sphere away from us, which is why it is projected on the right-hand side of ring 5.
Figure 7.7: Angular Fisheye Projection.
The mathematical analysis of this method is a bit tedious but requires only basic geometry and trigonometry. To start, notice that there is one long dashed line in Figure 7.7. A little thinking should convince the reader that all the points in space along this line are projected to the same point on the radius-k circle. Thus, generating a 360◦ angular fisheye projection is done by scanning the entire space around the viewer and, for each direction in space, selecting that point on the scene that is the closest to the viewer. This point should be projected to the surface of the sphere and the scan continued to the next direction. Once all the directions have been examined, the surface of the radius-k sphere around the viewer is full of points. The next step is to divide the sphere into slices and project each slice on the radius-k circle.

As a result, we can consider a radius-k sphere centered on the viewer and figure out how to scan it and project any point on this sphere to the radius-k circle. Figure 7.8a shows the half-circle of radius k in the xz plane. Those familiar with the parametric representations of curves and surfaces know that the parametric representation of this half-circle is k(cos u, 0, sin u) for 0 ≤ u ≤ 180◦. Those unfamiliar with parametric methods should either notice that cos²u + sin²u = 1 or should refer to Part III of this book. A complete sphere of radius k is created when this half-circle is rotated 360◦ about the x axis. The parametric equation of the sphere is therefore the
product of the half-circle with the matrix that rotates about the x axis,

                     ⎛ 1     0        0    ⎞
k(cos u, 0, sin u) ⎜ 0   cos w   −sin w ⎟ = k(cos u, sin u sin w, sin u cos w),   (7.3)
                     ⎝ 0   sin w    cos w ⎠
for 0 ≤ u ≤ 180◦ and 0 ≤ w ≤ 360◦.
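As a quick numerical check of Equation (7.3), the following Python sketch (mine, not from the book) generates points on the sphere and verifies that they lie at distance k from the origin, and that w = 0 reproduces the half-circle in the xz plane:

```python
import math

def sphere_point(u_deg, w_deg, k=10.0):
    # point on the radius-k sphere, Equation (7.3)
    u, w = math.radians(u_deg), math.radians(w_deg)
    return (k * math.cos(u),
            k * math.sin(u) * math.sin(w),
            k * math.sin(u) * math.cos(w))

for u in (0, 45, 90, 135, 180):
    for w in (0, 90, 180, 270):
        x, y, z = sphere_point(u, w)
        assert abs(math.sqrt(x * x + y * y + z * z) - 10.0) < 1e-9
        if w == 0:
            assert abs(y) < 1e-9  # w = 0 gives the half-circle in the xz plane
```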
Figure 7.8: Analysis of the Angular Fisheye Projection.
The word barycentric is derived from barycenter, meaning “center of gravity,” because such weights are used to calculate the center of gravity of an object. Barycentric weights have many uses in geometry in general and in curve and surface design in particular.

Figure 7.8b shows the half-circle in the xz plane and how it is rotated. It is clear that the angle w of a point P on the sphere is one of the parameters of the projected point P∗. This angle determines the distance r of P∗ from the center of the radius-k circle. In the figure, r equals k sin w, but the point is that for w = 0 we want r = 0, while for w = 90◦ we want r = k/2 and not r = k. This is because r values from k/2 to k correspond to w values in the “right” hemisphere (i.e., from 90◦ to 270◦). Thus, for w values in the interval [0, 90], we write r = (k/2) sin w, and Table 7.9 lists the expressions of r for the remaining three intervals of w.

w           r                  r interval   half     sin w
0 → 90      (k/2) sin w        [0, k/2]     top      0 → 1
90 → 180    (1 − (sin w)/2)k   [k/2, k]     top      1 → 0
180 → 270   (1 + (sin w)/2)k   [k, k/2]     bottom   0 → −1
270 → 360   −(k/2) sin w       [k/2, 0]     bottom   −1 → 0

Table 7.9: Four Cases of w, r, and u.
Once we have r, we still need to decide where in the radius-k circle to place P∗, and this is determined by u. This angle varies in the interval [0, 180◦], and P∗ has to be placed either in the “top” half (if 0 ≤ w ≤ 180◦) or the “bottom” half (if 180◦ ≤ w ≤ 360◦) of the circle, as indicated by Table 7.9. The complete mapping of the radius-k sphere to the radius-k circle is done in a double loop, where w varies from 0 to 360◦ in the outer loop and u varies from 0 to 180◦ in the inner loop. For each pair (u, w), the point of the three-dimensional scene nearest the viewer (who is located at the origin) is determined and is projected by computing its r value from the table and using the pair (r, u), as well as information about “up” or “down” from the table, as the polar coordinates of P∗.

Exercise 7.3: Rewrite Table 7.9 for a 180◦ angular fisheye projection.

The point directly behind the observer presents a special case. This point is reached when w = 180◦ (implying r = k), in which case any value of u will select this point. This special point is therefore mapped to every point on the circle r = k.

Exercise 7.4: Explain the special case of the point directly in front of the viewer.

Often, a three-dimensional scene occupies every direction in space. The scene may consist of several objects with patches of ground, water, and sky filling up every other point. In such cases, every direction (u, w) will correspond to at least one point of the scene. Sometimes, a scene consists of just objects, with no background. In such cases, many pairs (u, w) will not correspond to any point of the scene. For such a pair, its projection on the radius-k circle can be painted white or any other background color.

When the entire space around the viewer is projected into a circle, the angular fisheye projection becomes one of many ways to map a sphere on a plane. Sphere projections are the topic of Section 7.15.
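Table 7.9 translates directly into code. The following Python sketch (an illustration with names of my choosing) returns, for a given w, the radius r and the half of the circle in which P∗ lands:

```python
import math

def radius_and_half(w_deg, k=10.0):
    # radius r and circle half ("top" or "bottom") for angle w, per Table 7.9
    s = math.sin(math.radians(w_deg))
    if w_deg <= 90:
        return (k / 2) * s, "top"
    if w_deg <= 180:
        return (1 - s / 2) * k, "top"
    if w_deg <= 270:
        return (1 + s / 2) * k, "bottom"
    return -(k / 2) * s, "bottom"

# spot checks: r grows from 0 to k as w goes from 0 to 180, then shrinks back to 0
assert abs(radius_and_half(0)[0]) < 1e-9
assert abs(radius_and_half(90)[0] - 5.0) < 1e-9
assert abs(radius_and_half(180)[0] - 10.0) < 1e-9
assert abs(radius_and_half(270)[0] - 5.0) < 1e-9
```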
Every projection of a sphere into a plane introduces distortions, and the two main distortions of the angular fisheye projection are that (1) straight lines are mapped into curves and that (2) the hemisphere in front of the viewer is projected into the inner half of the circle and can, with some practice, be perceived and understood, but the hemisphere behind the viewer is projected into the outer half of the circle, which is a ring, and this makes it unintuitive to perceive its details.

Figure 7.10 shows two 160◦ examples of the angular fisheye projection. It is obvious that straight lines are curved in the photos, but it is also clear that the curvatures diminish in lines that are close to the center of the image. In addition, it is easy to see that both the vertical lines (trees) and horizontal lines (park benches) are curved and that image details in the center are larger than those near the periphery. An interesting point is that the photos were taken with a small digital camera (Canon Powershot SD800 IS) and the fisheye effect was obtained by looking through a peephole (Section 7.3).

Exercise 7.5: Show why most straight lines are mapped to curves under the angular fisheye projection.

Another point worth mentioning is that the sphere is larger than the circle. Even if u and w are varied in large steps, there may be more directions to scan than there are pixels in the radius-k circle. This suggests another approach to the angular fisheye projection. Instead of scanning the 360◦ sphere in many directions, scan the radius-k
Figure 7.10: Angular Fisheye Photos, Taken in Lindo Park, California, with a Peephole.
circle pixel by pixel, compute the polar coordinates (r, u) of each pixel, and use them to determine the corresponding direction (u, w) in space. If a point of the scene is found in that direction, it is projected to the pixel without any additional calculations. Here is a summary of this derivation. (Actual C code can be found in [Bourke 05].) We assume that the circle is embedded in a rectangular bitmap of height H pixels and width W pixels. We scan this rectangle row by row. If the current pixel has coordinates (a, b), we first convert them to normalized coordinates (x, y) in the interval [−k, +k] by
x = (2a/W − 1)k   and   y = (2b/H − 1)k.
The distance of the pixel from the center of the rectangular image is r = √(x² + y²). If r is greater than k, the pixel is outside the radius-k circle and is ignored. Otherwise, angle u is computed by

u = 0,                 r = 0,
u = π − arcsin(y/r),   x < 0,
u = arcsin(y/r),       x ≥ 0.

Angle w equals r/2, so it is in the interval [0, k/2], and the direction vector is k(cos u, sin u sin w, sin u cos w).

The distortion introduced by the fisheye projection can be used to convert it to a spherical panoramic projection (which is discussed in more detail in Section 7.7). Imagine a radius-k circle on which a 180◦ fisheye projection is displayed. We scan the circle pixel by pixel and translate the Cartesian coordinates (a, b) of a pixel to polar coordinates r = √(a² + b²) and u = arctan(b/a) (if a = 0, then u = 0 or u = 180◦, depending on b). Once r is known, we can use the relation r = ±k sin w to compute angle w. Once u and w have been computed, we know that pixel (a, b) is the projection of a point P located in direction (cos u, sin u sin w, sin u cos w) on the radius-k hemisphere
centered on the viewer. Thus, in principle it is possible to map each pixel in the fisheye projection to a three-dimensional point P on this hemisphere. We don’t know how far from the viewer the original point was because this information was lost when the fisheye projection was prepared, but we know that of all the three-dimensional points in direction (u, w) in the scene, point P was the nearest to the viewer, blocking all the points directly behind it. In practice, however, this technique is not that simple to implement because the number of pixels in the circle is much smaller than the number of pixels in the hemisphere.

In this month’s Hemispheres Magazine, the magazine of United Airlines, you’ll find my article about exploring the chocolate shops of Paris. I talk about many of my favorite places, why I like them . . . and what I recommend you get while you’re there!
—David Lebovitz in [davidlebovitz 05], October 2005.

Off-Axis Fisheye Projection

The discussion of both the hemispherical and angular fisheye projections assumes that the viewer is looking at a radius-k circle on which an infinite hemisphere is projected. Figures 7.2b and 7.7 further imply that the line of sight of the viewer passes through the center of the circle. We can say that the viewer is located on the axis of the circle, and we can ask what the viewer will see when he moves away from the axis, still looking in the same direction. This is not just a theoretical problem. Many planetariums use a fisheye lens to project an image on a hemispherical dome, where some (or even many) viewers sit away from the center. Those viewers see a twice-distorted image, once because it is a fisheye projection and again because they observe it off-axis.

The mathematics of an off-axis fisheye projection is illustrated in Figure 7.11. We start with four points, depicted as circles and labeled 1 through 4.
In part (a) of the figure, the viewer is assumed to be on the axis and the points are shifted toward the viewer by halving their view angles. The shifted points are depicted as small squares. In part (b), the viewer is assumed to be located off-axis, and the four points are shifted toward the viewer by halving their new view angles. The new points are depicted as triangles. It is obvious that points 1 and 2 are shifted more in part (a) than in part (b). Thus, those parts of the image are more distorted when the viewer is on-axis. In contrast, points 3 and 4 are shifted more when the viewer is off-axis, thereby distorting those parts of the image on the “right” side. Figure 7.12 illustrates the overall effect of an off-axis projection. It shows 50 points moved toward an off-axis viewer. In the three parts of the figure, from left to right, the viewer is located at (10, 0), (−5, 5), and (0, 5). This figure illustrates the effects of the viewer being off-axis and ignores the distortions (such as straight lines transformed into curves) introduced by the fisheye projection itself. Those who took the trouble to read Chapter 4 know how to compute the off-axis fisheye projection. First translate the viewer on the xy plane to the on-axis position, and then use the translation vector (a, b) to translate each point with (−a, −b), project it according to Equation (7.2), then translate the result back with (a, b). If the last translation brings the point outside the radius-k circle, the point is ignored because the off-axis viewer cannot see it.
Figure 7.11: Off-Axis Fisheye Projection.
k = 10; n = 50;
scal[q_] := (k Tan[ArcTan[q/k]/2])/q;
P = Table[{Random[Real, {-10., 10.}], Random[Real, {-10., 10.}]}, {n}];
x = -5; y = 5; (* Location of viewer *)
Pt = P - Table[{x, y}, {n}];
Q = Table[Sqrt[Pt[[i]].Pt[[i]]], {i, n}];
L = Table[Line[{P[[i]]+{x, y}, (scal[Q[[i]]] P[[i]])+{x, y}}], {i, n}];
Show[Graphics[L], Graphics[Circle[{0, 0}, k]],
 Graphics[{AbsolutePointSize[5], Point[{0, 0}]}],
 Graphics[{AbsolutePointSize[5], Point[{x, y}]}],
 AspectRatio -> Automatic, PlotRange -> All]

Figure 7.12: Off-Axis Fisheye Projection and Code.
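The translate-project-translate procedure can also be sketched in Python (an illustrative translation of the idea, not of the Mathematica code above; the hemispherical scale factor of Equation (7.2) is reused, and the function names are mine):

```python
import math

def scal(q, k=10.0):
    # hemispherical fisheye scale factor, Equation (7.2)
    return k * math.tan(math.atan(q / k) / 2) / q

def off_axis_project(p, viewer, k=10.0):
    # translate the viewer to the axis, project, then translate back;
    # returns None when the result lies outside the radius-k circle,
    # i.e., when the off-axis viewer cannot see the point
    a, b = viewer
    x, y = p[0] - a, p[1] - b
    d = math.hypot(x, y)
    s = scal(d, k) if d > 0 else 1.0
    px, py = s * x + a, s * y + b
    if math.hypot(px, py) > k:
        return None
    return (px, py)
```

With the viewer on the axis, off_axis_project((10, 0), (0, 0)) reduces to the ordinary hemispherical projection and returns about (4.14, 0).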
Rectangular Fisheye Projection The hemispherical fisheye projection projects the entire 180◦ space located in front of the viewer, an infinitely large image, into a finite-sized circle, and it does this by distorting the image, especially in areas away from its center. The rectangular fisheye projection discussed here is a compromise on this technique. It creates less distortion but can project only part of the space in front of the viewer. Those parts that are too high above the viewer or too low are not included in this type of projection. Figure 7.13a shows the principle. We imagine a rectangle of infinite width and a finite height h centered on the
xy plane. A three-dimensional point (x, y, z) is projected on the rectangle in parallel into the point (x, y, 0), but only if the y coordinate is in the interval [−h/2, +h/2]. (The figure shows a green point that’s too high.) Points above or below the rectangle are not included in the projection.
Figure 7.13: Rectangular Fisheye Projection.
Once a point has been projected on the rectangle, it is shifted in the x direction to bring it into the rectangle of width k. This is done by halving its view angle θ, as in the hemispherical fisheye projection, but only in the x direction (Figure 7.13b). The final projection is distorted only in the x direction; all the y dimensions are preserved. The final result is that point (x, y, z) is projected into (s·x, y, 0), where the scale factor s is given by (compare with Equation (7.2))
s = k tan(arctan[|x|/k]/2)/|x|.
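A hedged Python sketch of this projection (illustrative only; the strip height h is an assumed example value, since the text leaves it unspecified):

```python
import math

def rect_fisheye(x, y, z, k=10.0, h=8.0):
    # project (x, y, z) to (s*x, y, 0); return None if the point falls
    # outside the height-h strip (too high or too low)
    if abs(y) > h / 2:
        return None
    if x == 0:
        return (0.0, y, 0.0)
    s = k * math.tan(math.atan(abs(x) / k) / 2) / abs(x)
    return (s * x, y, 0.0)  # only x is compressed; y passes through unchanged
```

Because the y coordinate is preserved, vertical lines stay straight; only the x direction is distorted, as the text notes.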
This variant of the fisheye projection is a relative of the semicylindrical fisheye projection. We start with half a cylinder, on which three-dimensional points are projected in parallel. The semicylinder is then unrolled and viewed as a flat rectangle. Notice that points d and e in Figure 7.14 are close in three-dimensional space, but their projections on the cylinder are separated. This type of projection magnifies details close to the vertical edges of the final projection, which is the opposite of the other fisheye variants.
Figure 7.14: Semicylindrical Fisheye Projection.
7.3 Poor Man’s Fisheye

Fisheye lenses are expensive. At the time of writing (late 2010), high-quality fisheye lenses for Nikon, Canon, and Pentax cameras typically cost $400–800. An attractive, very low-cost alternative is a peephole, of the kind used in doors. Reference [peephole 11] is one of several found on the Internet that illustrate this alternative. The idea is to buy an inexpensive door viewer and hold it in front of the lens when the picture is taken. An ideal peephole will have an eye hole with the diameter of your lens. Slightly smaller peepholes will also work. Remove the back part of the viewer, hold it in front of your lens, zoom the lens in all the way, and take a picture. If the resulting pictures are not sharp, try the following: (1) set the camera to focus and meter the light at the center of the image and (2) use the camera’s macro mode. Depending on the quality of your camera, the diameter of the viewer, and your experience as a photographer, excellent results may be obtained in as little as a few hours.

You will notice that with the door viewer in front of the lens, the final picture is different from what you see in the viewfinder. The viewer also converts the image from rectangular to circular, which looks smaller on the small LCD screen. All this takes getting used to. Figure 7.10 shows two images taken in this way (see also Plates L.1 and L.3).
7.4 Fisheye Menus

The topic of this section is not a projection from three dimensions to two dimensions, but it is included here because it is a useful and interesting application of the fisheye principle: the technique of local magnification combined with global shrinking. Often, a computer program has to display a long, dynamic menu of items. An address book has to display the list of addresses, an Internet browser must display a list of URLs, and a commercial Web site should display a list of items described on the site or offered
for sale. The user watches such a menu—normally with other menus, text, images, and miscellaneous items—on the computer monitor, where “real estate” (i.e., space) is limited. Software designers have been aware of this problem for a long time and have come up with various solutions.

Perhaps the simplest solution is to shrink the size of individual menu items as more items are added to the menu and it gets taller than the screen. This solution can only go so far because text under a certain size (typically 5 printer’s points) is impossible to read on a screen, where the pixel resolution is typically 72 dots per inch (dpi).

A slightly better solution is to scroll the screen. Once the menu is taller than the screen (or taller than the window assigned to the menu), a scroll bar appears on the side, so the user can scroll the menu up or down. Sometimes arrows at the top and bottom are used instead of a scroll bar. This is a simple, effective, and very common solution. Its only downside is that only part of the menu is displayed at any given time, but if the menu items are sorted in some way, which they often are, this may not present a serious problem.

Another common technique is to use hierarchical “cascading” menus, where the main menu is kept small, but any item in it can have a submenu. Selecting an item, normally with a mouse, opens (after a short delay, allowing the mouse to slide to another item) its submenu and lists its items, which may have subsubmenus. This allows for very large menus, but again only a small part of the menu is displayed at any time. Another disadvantage of this type of menu is the time it takes to open a submenu, examine it, and, if it is the wrong one, slide to another submenu.

A more sophisticated solution is the fisheye menu. In such a menu, all the items are displayed simultaneously on the screen or in the window.
If there are many items, most are shrunk to small sizes or even very small sizes, where it is impossible to read or perceive an item. Sliding the cursor along such a list magnifies the items closest to the cursor, so they can be read or observed at their full size. Items slightly away from the cursor are displayed at somewhat smaller sizes, and items far from the cursor are displayed at very small sizes.

Figure 7.15 shows two examples of fisheye menus. One is a long list of text items (country names from [fisheyemenu 05]) sorted alphabetically. It is obvious that sliding the cursor along such a list is a fast and easy way to select any desired name, even though at any given time most of the list is too small to read. The other example is the Macintosh dock, a feature familiar to Macintosh users since the introduction of OS X in 2001. The dock is a graphical menu with icons of files, folders, and applications that are commonly used. A dock item is selected by sliding the cursor along the dock. The icon sizes vary from small to medium to large and back to small in real time, making it easy for the user to locate any desired item. Once an item is found, merely selecting it also launches it.

When a menu is short, all its items can be displayed in full size and the entire menu fits comfortably on the screen. When items are added to the menu, it gets taller until the time comes to shrink items. The algorithm for that must consider three features:

1. The total height of the menu must equal the height of the screen regardless of the number of items. The only exception is a menu that’s too short even when all its items are displayed at maximum size.

2. The maximum font size (or size of the graphical icon) must be specified by the user, with a reasonable default value. Some fisheye menus require a large maximum size,
Figure 7.15: Fisheye Menus.
while others can be used with a fairly small maximum size.

3. The item at the cursor location is displayed at the maximum size, and all the items within a distance of f/2 items above it and f/2 items below it must be displayed at a size that will make it possible for the user to read and identify them. The sizes of the remaining items are selected such that the entire menu will fill up the screen. This creates a dynamic bubble of f readable items around the main item, which enables the user to identify items adjacent to the main item and select any of them with ease. The parameter f is referred to as the focus length of the fisheye menu and should be specified by the user, with a reasonable default value.

Notice that large-sized items require larger spacing between them, while the spacing between the smaller items can be shrunk accordingly. A large focus length, such as 10 or 20, will cause the peripheral items to be very small, while a small focus length, such as 2 or 3, will force the user to slide the cursor slowly in order to be able to read the current two or three large items. Thus, the choice of focus length is a compromise between fast selection and ease of reading.

When a menu becomes very large, most of its items are shrunk to the size of a dot. In such a case, it helps to embed index items in the menu. These items are always kept at a readable size and are used to locate the start of any desired region in the menu. This idea is illustrated in the left part of Figure 7.15, where the index items are the single letters “A” through “Z.” A user looking for an item that starts with “Q” can quickly slide the cursor to the index “Q,” where the first few relevant items will immediately be readable.

A fast implementation of fisheye menus is a must and is based on arrays or other data structures, each of which contains relevant data at a certain size. If the menu items are text, then fonts at several sizes must be available.
If the items are icons, then each new icon added to the menu must be immediately prepared at several sizes and added to the appropriate data structures. For more information on fisheye menus, see [fisheyemenu 05].
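One possible realization of the focus bubble in Python (a hedged sketch; the linear falloff rule and the default sizes are my assumptions, not taken from [fisheyemenu 05]):

```python
def item_sizes(n_items, cursor, f=6, max_size=24, min_size=4):
    # font size for each menu item: maximum at the cursor, readable within
    # f/2 items of it, shrinking to min_size outside the focus bubble
    sizes = []
    for i in range(n_items):
        dist = abs(i - cursor)
        if dist <= f / 2:
            t = dist / (f / 2)  # 0 at the cursor, 1 at the bubble's edge
            size = max_size - t * (max_size - 2 * min_size)
        else:
            size = min_size
        sizes.append(round(size))
    return sizes
```

With f = 6, the item under the cursor is drawn at 24 points, its three neighbors on each side taper down to 8 points, and everything else drops to 4 points; a real menu would additionally rescale so the whole list exactly fills the screen.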
7.5 Panoramic Projections

Visitors to an exceptionally lovely spot sometimes wish they could see the view behind them as well as in front of them simultaneously. This kind of effect is generated by the various panoramic projections. A panorama is defined as an unbroken view of an entire surrounding area, and panoramas have always been a favorite with artists, painters, and photographers. The insert below discusses the Mesdag Panorama, one of the few surviving large panoramas painted in the 18th and 19th centuries.

When cameras came into general use in the early 20th century, inventors started developing panoramic cameras (Section 7.11). With the advent of fast, inexpensive personal computers and digital cameras in the 1980s, it became possible, even easy, to take a sequence of (partially overlapping) photographs with any camera and stitch them by software into a single picture that depicts a large area, sometimes an entire 360◦ view around a point, including parts that are very high or very low and cannot normally be included in a single picture.

The price for including so much visual information in one picture is distortion. Any method
for projecting a three-dimensional scene into a panoramic picture introduces some distortion. Straight lines become curved and familiar shapes may look funny or become completely unrecognizable.

The main types of panoramic projections described here are the cylindrical, spherical, and cubic. All three are based on the same principle, but only the first is popular because it manages to squeeze the most visual data into a flat image with the minimum of distortion. Section 7.9 presents a different approach to panoramic projections, where they are considered variants of the linear perspective projection but with several vanishing points (up to six) placed at certain strategic locations in the projection. Section 7.10 mentions other techniques for panoramic projections.

The Mesdag Panorama

The Mesdag Panorama is a painting depicting a 360◦ panoramic view of the surroundings of Scheveningen, a fishing port northwest of The Hague, as seen by the painter in 1881. The painting is huge, measuring 120×14 meters (390×45 feet) for an area of about 17,000 square feet. It is folded into a cylinder, and several observers can enter from below and stand at the center, turning, watching, and admiring.

The Mesdag panorama was painted by the 19th-century Dutch painter Hendrik Willem Mesdag, with the help of S. Mesdag-van Houten, Theophile de Bock, B.J. Blommers, G.H. Breitner, and A. Nijberck. Similar panoramas were exhibited throughout Europe and America during the 19th century (they were sometimes called cycloramas). The Mesdag panorama is one of the last panorama paintings still in existence. It can be viewed at the Museum Panorama Mesdag in The Hague, The Netherlands. See [Mesdag Documentation Society 98] for more information.
7.6 Cylindrical Panoramic Projection Imagine a rectangle made of transparent material being rolled into a cylinder and placed around an observer (Figure 7.16a). The observer is located at the origin, which is also the center of the cylinder, and is looking at the view outside through the transparent surface of the cylinder. The observer now starts turning around. We imagine that everything the observer sees is magically fused into the cylinder material. (In the absence of magic, the observer may simply use a paintbrush or a magic marker to paint what he sees through the cylinder.) As an example, point P in Figure 7.16a is projected to point P∗ by connecting P to the observer as in linear perspective. After the observer has turned a full circle, the surface of the cylinder is entirely covered with images. The cylinder is now unrolled and is hung flat on a wall, to be viewed as a rectangular picture. The image shown in such a picture is a 360◦ cylindrical panorama (or a cylindrical projection) of the view seen by the observer. Notice that certain details seen by the observer are too high or too low to be seen through the cylinder. Point Q (green in Figure 7.16a) is such an example. Thus, the unrolled cylinder does not contain the entire scene surrounding
the observer. The top and bottom parts are missing, and the sizes of the missing parts depend on the height of the cylinder. Figure 7.16a shows a cylinder centered about the origin. It is easy to see how a three-dimensional point P is projected to a point P∗ on the cylinder. Figure 7.16b shows the cylinder unrolled. Point P is located in the same place in space, but its projection has moved with the opening of the cylinder. Figure 7.16c shows the geometry of the problem. We assume that the dimensions of the original rectangle are 2Y × 2Z. When rolled into a cylinder of radius R, the perimeter of the cylinder satisfies 2πR = 2Y, so R = Y/π. Consider an arbitrary three-dimensional point P = (x, y, z) viewed by the observer. When the cylinder is eventually unrolled, P will be projected to a point P∗ = (x∗, y∗, z∗) and our problem is to determine the coordinates of P∗ as functions of x, y, z, Y, and Z.
Figure 7.16: A 360◦ Panoramic Projection.
The x∗ coordinate is trivial to determine. The figure shows that all the points on the unrolled cylinder have the same x coordinate. We can set it to R or, even simpler,
to zero. The y∗ coordinate should equal the length of the arc subtended by θ, which is Rθ. Angle θ depends on the x and y coordinates of P but not on its z coordinate. The relation is (x, y) = D(cos θ, sin θ), where D is the distance (projected on the xy plane) of P from the origin. This distance is √(x² + y²). From this we get

(x, y)/√(x² + y²) = (cos θ, sin θ),

or

θ = arcsin(y/√(x² + y²)) = arccos(x/√(x² + y²)) = arctan(y/x).

Notice that the signs of x and y determine the quadrant number. If θ is in quadrant III or IV, then y∗ should be negative. The z∗ coordinate is determined by perspective projection. Figure 7.16d shows how this is done with similar triangles:

z∗/R = z/D → z∗ = zR/D = zY/(π√(x² + y²)).

Exercise 7.6: It seems that the projected point P∗ is given by

(x∗, y∗, z∗) = (0, ±Rθ, zY/(π√(x² + y²))),
so its coordinates depend on x, y, z, and Y, but not on Z. What’s the explanation? The panoramic projection leads naturally to the concept of curved perspective (see also Section 7.9). This concept comes up when we consider the panoramic projection of a straight line. Figure 7.17a shows a cylinder and a line A in space. Several projection lines are shown going from A to the center of the cylinder. These lines are contained in a plane L, and we know from elementary geometry that the intersection of a cylinder and a plane is, in general, an ellipse (Figure 7.17b). The projection of A on the cylinder is therefore an elliptical arc. When the cylinder is unrolled, this arc turns into a sinusoidal curve (Figure 7.17c). Exercise 7.7: Prove this claim! This behavior means that the panoramic projection converts straight lines into curves, resulting in what can be termed curved perspective. Two special cases should be considered. One is when the plane is perpendicular to the cylinder (corresponding to an angle θ = 0◦ in Figure Ans.26, Page 1379), and the other occurs when it is parallel to the axis of the cylinder (corresponding to an angle θ = 90◦ in Figure Ans.26). In the former case, the intersection is a circle and the sinusoidal curve has zero amplitude (i.e., it degenerates into a straight segment). In the latter case, the intersection is an infinite ellipse and the sinusoidal curve has infinite amplitude; it degenerates into three lines. Figure 7.17d shows an observer positioned at the center of a cylinder and looking to the north. Three horizontal infinitely-long lines are shown. The projections of lines 1
Figure 7.17: Projections of Straight Segments.
You’re wasting that panorama on me, Nan. Save it for Dave Slade. —Robert McWade (as District Attorney) in Ladies They Talk About (1933)
and 3 are ellipses and become the sinusoids shown in Figure 7.17e. The projection of line 2 is a half-circle (not shown) that becomes a straight line when the cylinder is unrolled. This shows how horizontal straight lines are projected by curved perspective into either horizontal segments or curves. The three segments are projected into the cylinder in the region bounded by the W and E directions. Two segments become curves (whose curvature depends on the height of the projected segment), and the central one remains straight. Vertical lines are always projected into vertical straight segments. Figure 7.18 is an extension of Figure 7.17e. It illustrates the 360◦ cylindrical projection of horizontal straight segments in four directions. Part (a) of the figure shows four segments and their directions. Part (b) shows how each segment becomes a curve on the unrolled cylinder. Segment 1, to the north, is projected into a curve between W and E (several curves are shown, which are the projections of segments at various heights). Segment 4, to the south, is projected from E to W through S, so it is displayed in two halves. Segment 2, to the west, is projected from S through W to N, and segment 3 is projected from N through E to S. Some straight vertical segments are also shown. Such a grid corresponds to the continuous four-point perspective of Section 7.9.
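The point-projection formulas of this section can be collected into a short function. The following is a minimal sketch (the function name is ours); atan2 supplies the quadrant-correct θ, so the sign of y∗ in quadrants III and IV comes out automatically:

```python
import math

def cylindrical_project(x, y, z, Y):
    """Project the point (x, y, z) onto a cylinder rolled from a
    rectangle of width 2Y, and return its coordinates (x*, y*, z*)
    on the unrolled rectangle (x* is simply set to zero)."""
    R = Y / math.pi                  # from 2*pi*R = 2Y
    theta = math.atan2(y, x)         # quadrant-correct; negative in III and IV
    D = math.hypot(x, y)             # distance of P from the cylinder's axis
    return (0.0, R * theta, z * R / D)
```

For example, with Y = π (so R = 1), the point (1, 1, 2) lies at θ = π/4 and distance D = √2, giving y∗ = π/4 and z∗ = √2.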
Figure 7.18: (a) Four Segments. (b) Cylindrical Projections of Horizontal Segments.
Such a grid is handy when we want to compute or paint the cylindrical projection of a three-dimensional scene on a rectangular canvas. This can be done either manually or by special software. Any point in the space around the cylinder of Figure 7.18a is projected onto the surface of the cylinder by moving it to the surface along the segment that connects it to the center of the cylinder. Once a point is on the surface of the cylinder, it is easy to tell where it should go on the grid of Figure 7.18b. Art, like morality, consists in drawing the line somewhere. —G. K. Chesterton. A great artist is always before his time or behind it. —George Moore. Figure 7.19 (courtesy of Dick Termes) is an example of such a drawing. It depicts a familiar scene, so there is no need to include the original three-dimensional image or any hints. The reader should especially note how the vertical lines are straight and how horizontal lines are curved mostly around the center of the drawing, as discussed in
the answer to Exercise 7.7. This figure is also an example of the four-point continuous perspective discussed in Section 7.9.
Figure 7.19: Cylindrical Panoramic Projection (courtesy of Dick Termes).
Almost everything in Dick Termes’ world is round—the sun breaking through morning haze, the tennis ball he batted back and forth before breakfast, and the four geodesic domes in which he lives and works. For more than 36 years, Termes has eschewed traditional flat canvases to create his art on polycarbonate globes he calls “Termespheres.” He came up with the idea while completing his master’s degree at the University of Wyoming in the late 1960s, and it has been his passion ever since. Termes estimates he has painted more than 300 major spheres so far—about a third of those by commission—and his work is displayed internationally from North Pole High School in Alaska to the Sphere Museum in Tokyo, Japan. “In art, the most important thing to find is an original thing to do,” he says. “There have been lots of paintings done over thousands of years, most on flat surfaces. The sphere adds a whole new set of geometries that fits with the real world better than a flat surface. Three-dimensional space is what we live in.” —David Eisenhauer, University of Wyoming Magazine. Figure 7.20 (courtesy of Ari Salomon [helloari 05]) shows three examples of cylindrical panoramas. Each was made by taking several overlapping photographs and stitching them with appropriate software. Part (a), a bathroom in Paris, France, is vertical. It was made by taking pictures with a 20% overlap and tilting the camera to point higher and higher between images. It is obvious that the vertical lines are curved while the horizontal lines remain straight (but not completely parallel since the camera was held by hand during the shots). Part (b) is a street scene in Tel-Aviv, Israel. After viewing this image for a few seconds and trying to “digest” it, it becomes clear that we are looking at three parallel streets (even though they seem to diverge). On the right-hand side, we see cars going toward the center of the image (away from our viewpoint). On the left, cars are parked pointing toward us.
(One such car can be seen at the extreme right of the image.) These are the two directions of the same street. The center street,
Figure 7.20: (a) Vertical and (b,c) Horizontal Cylindrical Projections (courtesy Ari Salomon).
where we see a park bench, stroller, and people walking, is a paved walkway sandwiched between the two directions of the street. The implicit assumption behind this image is that viewers’ familiarity with street scenes will help them to “straighten out” the distortions in the image and thus to enjoy it. The reader should also notice that vertical lines in this image seem to tilt toward the edges of the image, and this tilting becomes more pronounced for lines close to the edges. This is probably an artifact of the particular software used to create these images. Part (c) of this figure shows a large space serving as artists’ studios in Lyon, France. Here we see the four sets of curved horizontal lines that are the hallmark of Figure 7.18b. The vertical lines are also tilted as in part (b). An intuitive way to understand and accept curved perspective is to print the curved projection of a familiar scene on paper, roll the sheet of paper into a cylinder, go inside into the center, and look around at the scene. (This may be simple if the projection incorporates less than 360◦ .) When seen this way, any curves on the paper that are the projections of straight lines should look straight. This method also provides a simple test of any software used to compute and render the projection. Commercial software for creating cylinder-shaped panoramas already exists. Popular examples are the Apple QuickTime VR Authoring Studio, PhotoVista from Live Picture Inc., PTGUI, from http://www.ptgui.com/, and PhotoStitch, which comes with every Canon digital camera. A qualitative discussion of curved perspective can be found in [Ernst 76], pp. 102–103. The well-known drawing High and Low by M. C. Escher is an example of curved perspective. Plates Q.2, R.1, and S.1 are examples of panoramas “stitched” by special software from individual overlapping photographs.
7.7 Spherical Panoramic Projection The following quotation, from [Ernst 76], suggests a way to generalize the cylindrical panoramic projection of the previous section. Perhaps it has already struck you that the cylinder perspective used by Escher, leading to curved lines in place of the straight lines prescribed by traditional perspective, could be developed even further. Why not a spherical picture around the eye of the viewer instead of a cylindrical one? A fish-eye objective produces scenes as they would appear on a spherical picture. Escher certainly did give some thought to this, but he did not put the idea into practice, and therefore we will not pursue this further. The idea raised by Ernst (but not pursued by Escher) is to imagine a transparent sphere placed around the observer, where everything seen by the observer through the sphere is fused (or painted by the observer) onto the sphere’s material. The sphere is then somehow flattened, resulting in a full 360◦ spherical perspective. The trouble with this idea is that a sphere cannot be unrolled into a flat surface without introducing further distortions (see Section 7.15). We start with what is perhaps the simplest approach to the problem of deforming and flattening a sphere. Once a three-dimensional point P has been projected onto the surface of the sphere, it becomes a point P∗ with longitude and latitude. We construct a rectangle of width 360 and height 180 units and project P∗ on the rectangle by simply
using its longitude and latitude as the x and y coordinates, respectively, on the rectangle. Figure 7.21 illustrates the Earth in this projection, and the deformation is immediately obvious. On the rectangle, the lines of latitude are the same length, so polar latitudes, which on the sphere are short, have to be stretched.
Figure 7.21: Equirectangular Projection of a Sphere.
When the entire 360◦ space around an observer is projected onto the rectangle in this way, the regions directly above and below the observer (which often are less important) are stretched and feature much detail. The regions at the height of the observer (the equator), however, lack detail, but are to scale. This projection is sometimes used in map making and is referred to as equirectangular projection, rectangular projection, plane chart, or plate carrée. The remainder of this section describes another, highly distorted version of spherical panoramic projection. This version is another manifestation of the concept of curved perspective. What you see on these screens up here is a fantasy; a computer enhanced hallucination! —John Wood (as Stephen Falken) in WarGames (1983). Imagine a transparent sphere of radius R centered on the origin, where an observer is located, looking through the sphere in the z direction. The sphere is now truncated by selecting a value θ in the range [0, π/2] and removing the parts of the sphere above and below latitude θ. The remaining part is shaped like a barrel (Figure 7.22a). The barrel is now cut behind the observer and is unrolled into a flat, two-dimensional figure resembling a Band-Aid (Figure 7.22c) that’s called a band or a capsule (see also Figure 7.56). The
Figure 7.22: Spherical Panoramic Perspective.
image seen by the observer through the barrel is displayed on this band, in contrast with the cylindrical panoramic projection, where the projected image is displayed on a rectangle. At its center, the band has a width of 2πR (the circumference of the sphere), while at the top and bottom its width equals 2πR cos θ. The height of the band is 2Rθ. Truncating the sphere into a barrel makes it possible to control the amount of distortion in the final projected image. Small values of θ result in a narrow band whose shape is close to a rectangle. Only a small part of the scene around the observer is displayed on this band, but with a minimum of distortion. When θ is set close to π/2, the band becomes taller and its shape approaches a circle. It includes more of the scene (only those parts located directly above and below the observer are omitted) but with more distortions, especially at the top and bottom.
As in the cylindrical panoramic projection, horizontal lines are projected on the band as sinusoids, but we now show that even vertical lines, which in the cylindrical projection are projected straight, now become curved. Figure 7.22b shows the barrel from above (i.e., looking in the y direction). A long vertical line (parallel to the y axis) is shown, and we assume that a general point on this line is projected to a point v on the barrel. After the barrel is unrolled, the y coordinate of point v varies in the range [−Rθ, +Rθ]. The x coordinate depends on the y coordinate and equals the radius of the barrel at height y times the angle φ. The radius of the barrel at height y is easily seen to be R cos(y/R), so point v is located on the band at position (φR cos(y/R), y), where −Rθ ≤ y ≤ +Rθ. This position varies from (φR cos(−θ), −Rθ) to (φR, 0) to (φR cos(θ), Rθ) when y varies from −Rθ to 0 to Rθ. The projection of the vertical line on the band is therefore the thick red curve shown in Figure 7.22c. It is easy to see that the closer θ is to π/2 (or 90◦), the smaller cos θ is and the more curved (distorted) the projection. Given an arbitrary point P = (x, y, z), it is relatively easy to calculate the xy coordinates of its projection on the band. Figure 7.22b shows the situation on the xz plane and makes it clear that the x coordinate of the projected point on the band is the arc Rφ. Since tan φ = x/z, we get the x coordinate as R arctan(x/z). Similarly, Figure 7.22a shows that the y coordinate of the projected point on the band is the arc Rα or R arctan(y/z). Thus, the projected point has band coordinates (Rφ, Rα) or (R arctan(x/z), R arctan(y/z)). Both φ and α can vary in the interval [−π, +π], so the projected x coordinate varies in [−πR, +πR].
The projected y coordinate varies in the same interval, but it is clear from the figure that any point P for which |α| is greater than |θ| is projected outside the barrel (i.e., on one of the sphere parts that have been removed) and should consequently be rejected. The IPIX Wizard software [IPIX 05] can create a spherical panorama from two scanned fisheye photographs. To some people, spherical panoramas may seem less interesting (and perhaps also less useful) than cylindrical panoramas, as the following 1998 quotation, from David Palermo, a virtual-reality professional, suggests: “Our market is not craving [sphere-shaped panoramas] right now. You can convey a sense of place without looking at the sky or floor.” For me it remains an open question whether [this work] pertains to the realm of mathematics or to that of art. —M. C. Escher.
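The band mapping described above, band coordinates (Rφ, Rα) together with the rejection test on α, can be sketched as a small function. This is a minimal sketch for points in front of the observer (z > 0); the function name is ours:

```python
import math

def band_project(x, y, z, R, theta):
    """Project the point (x, y, z), with z > 0, onto the band obtained
    by unrolling a sphere of radius R truncated at latitude theta.
    Returns (R*phi, R*alpha), or None if the point is projected onto
    one of the removed caps."""
    phi = math.atan2(x, z)           # tan(phi) = x/z
    alpha = math.atan2(y, z)         # tan(alpha) = y/z
    if abs(alpha) > theta:
        return None                  # outside the barrel; reject
    return (R * phi, R * alpha)
```

A point straight ahead, such as (0, 0, 1), maps to the center of the band, while a point high above the line of sight is rejected once its α exceeds θ.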
7.7.1 Curvilinear Perspective However, Figure 7.23 (courtesy of Dick Termes) suggests that it is possible to create full spherical panoramas that show everything an observer sees in front of him and behind him, while also maintaining their artistic merit in spite of the many vertical and horizontal distortions. The reader should especially note that the few vertical and horizontal lines located close to the center of the picture (noticeable in the upper half) are essentially straight. The five-point grid of Figure 7.29 is an artist’s tool that helps draw such pictures. Reference [Termes, Dick 98] has more on such tools. This section explains the principles behind the five-point grid.

Figure 7.23: Spherical Panoramic Perspective (Courtesy of Dick Termes).

The material presented here is based on the concept of curvilinear perspective, developed by Albert Flocon and André Barre [Flocon and Barre 68]. Curvilinear perspective is a two-step spherical panoramic projection whereby points in the 180◦ space in front of the observer are first projected on a hemisphere and then from the hemisphere onto a flat circle. When this is repeated for the 180◦ space behind the observer, the result is two circles that contain the entire 360◦ of space surrounding the observer. Their book beckons us to join with the fun and excitement, but it is also a revolutionary manifesto, a call to liberation from dogma. Not “Down with Traditional Perspective!” but “Down with the Tyranny of Official Rules.” Not “Learn the Only True Perspective!” but “Let a Hundred Flowers Bloom!” —Robert Hansen in [Flocon and Barre 68]. Figure 7.24a illustrates the first step. A point P in space is projected to a point P∗ on a hemisphere. The observer is located at the center of the sphere. Part (b) of the figure shows how the hemisphere is projected onto a flat circle. The center of the circle is tangent to point R on the sphere (the point right in front of the observer). Given a point Q on the sphere, we draw the great-circle arc from R to Q. Denoting the length of this arc by L, point Q is projected to the point at distance L from the center of the circle in the direction from R to Q. This particular projection of a hemisphere to a circle was proposed in the 16th century by Guillaume Postel and has the useful property that its distortions of angles and distances are minimal. Clearly, the distance between R and Q on the hemisphere is preserved on the circle, whereas the distance between points A and B on the hemisphere of Figure 7.24c suffers a minimal distortion. For a 30◦ angle, the ratio between the arc length AB and its projection is only 1.01, and for a 90◦ angle this ratio is 1.57, much smaller than distance distortions caused by other sphere projections.
Exercise 7.8: Show how to determine the distance between points A and B on the hemisphere of Figure 7.24c and on the circle of the same figure. Compute the ratio of these distances and show that it equals 1.01 for a 30◦ angle and 1.57 for a 90◦ angle. Normally, the radius of the circle is R(π/2) because this is the length of the longest radial arc on a hemisphere of radius R. However, it is possible to extend the Postel projection to project an arc of length r on the hemisphere to a segment of length sr on the circle, where s is any desired scale factor. The radius of the circle in such a case is sR(π/2). When the two steps of curvilinear perspective are performed for a vertical line, it becomes a vertical curve on the circle (Figure 7.24d). This curve is very close to a circular arc and for all practical purposes can be approximated by such an arc. Similarly, a horizontal line in space is projected to a horizontal circular arc on the final circle. Lines that are parallel to the line of sight of the observer are projected on the circle to straight segments that converge at the center. Thus, the five-point grid of Figure 7.29 serves as a useful artist’s tool to draw the curvilinear perspective projection of any scene on a circle of radius sR(π/2) in a single step.
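The Postel projection described above preserves the great-circle arc length L from the forward point R, so it is easy to sketch in code. A minimal sketch for a unit hemisphere (the function name, and the scale factor s following the extension just described, are ours):

```python
import math

def postel_project(x, y, z, s=1.0):
    """Map the direction (x, y, z), with z >= 0 (in front of the
    observer), to the flat circle: move a distance s*L from the
    circle's center in the direction of (x, y), where L is the
    great-circle arc from the forward point."""
    d = math.sqrt(x * x + y * y + z * z)
    L = math.acos(z / d)             # arc length on a unit sphere
    rho = math.hypot(x, y)
    if rho == 0.0:
        return (0.0, 0.0)            # the forward point maps to the center
    return (s * L * x / rho, s * L * y / rho)
```

A direction 90◦ away from the line of sight, such as (1, 0, 0), lands at distance π/2 from the center, which is why the circle normally has radius R(π/2).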
Figure 7.24: Principle of Curvilinear Perspective.
7.8 Cubic Panoramic Projection The principle of the cubic panoramic projection is similar to the principles of the other panoramic projections. We imagine an observer located at the center of a cube (Figure 7.25a) and looking at the three-dimensional scene outside. Everything the observer sees is etched on the sides of the cube (or is painted there by the observer), and the cube is then flattened by opening it into six squares as in Figure 7.25b,c. This creates a full 360◦ panorama in six parts. The main advantage of the cubic panoramic projection is the absence of distortion. Straight lines are projected into straight lines, and the only deviation from total linearity is discontinuous slopes at the boundaries between the six planes of the cube. This behavior is best illustrated by Figure 7.27 (courtesy of Shinji Araya) but is also demonstrated here rigorously by means of an example. Figure 7.26a shows two faces (we’ll call them panels) of a cube viewed from the positive z direction. Each face of the cube is 2k units long, and we see the two panels located at x = k and y = k. Figure 7.26b shows the two panels after they have been swung to stand side by side, and we look at their outside surfaces. To best visualize this, imagine that there are hinges between the two panels, so they look like a folding
Figure 7.25: Cubic Panoramic Projection.
closet door (notice the direction of the x axis). The figure indicates that the x = k panel is parallel to the yz plane, which is why all points on it have coordinates of the form (k, y, z), while the y = k panel is parallel to the xz plane and all its points are of the form (x, k, z).
Figure 7.26: Cubic Projection of a Straight Segment.
We arbitrarily select the two points P1 = (4k, k/2, 0) and P2 = (k/2, 2k, 1). The former is projected to the x = k panel, where points have coordinates (k, y, z), which is why it is projected to P∗1 = (k, k/8, 0). The latter is projected to the y = k panel, where its y coordinate must be k, so it is projected to P∗2 = (k/4, k, 1/2). We denote by L(t) the straight segment connecting P1 to P2 and compute it (from Equation (Ans.42)) as the weighted sum L(t) = (1 − t)P1 + t P2 = (4k − 7tk/2, k/2 + 3tk/2, t). Next, we determine the coordinates of point P0 on this segment. This point will be projected to the cube corner where x = y = k, so its x and y coordinates must be equal even before it is projected. Since P0 is on segment L(t), it must equal L(t0 ) for some t0 . Thus, we
can compute t0 from the relation 4k − 7t0 k/2 = k/2 + 3t0 k/2, which yields t0 = 7/10. The coordinates of P0 are therefore L(t0 ) = (31k/20, 31k/20, 7/10), and this is projected to P∗0 = (k, k, (7/10)(20/31)) = (k, k, 14/31).

Once the z coordinate of P∗0 is known, we can compute the slopes of the two segments that constitute the projection of L(t). On the y = k panel, the slope is

(1/2 − 14/31)/(3k/4) = 2/(31k),

whereas on the x = k panel it is

(14/31 − 0)/(7k/8) = 16/(31k).
The straight segment connecting P1 to P2 has been projected into two segments that are straight but travel with different slopes on the two panels. Because of the symmetry of a cube, there is no difference between horizontal and vertical lines and they all feature the same discontinuity of slope between panels. Exercise 7.9: In what cases will the slopes be continuous across a panel boundary? It is clear that a panorama made of six squares doesn’t create a satisfying visual sensation, and Figure 7.27 (courtesy of Shinji Araya) proves this claim. The figure shows a beautiful scene, but the projection seems fractionated and unnatural. This lack of artistic merit is why the cubic panoramic projection was not seen much in the past. Currently, however, cubic panoramas are very popular because version 7 of the popular QuickTime software for the Macintosh computer can create this type of panoramic projection and can also scroll it on the monitor screen such that the viewer can eventually examine a field of view that encompasses 180◦ vertically and a full 360◦ horizontally. The main advantage of this scrolling is that it eliminates the discontinuities of the slopes between panels. The image seems to flow smoothly on the screen without any jumps or distortions. Such a panorama cannot be included in a book, but many can be found on the Internet by searching under “cubic panorama.” MakeCubic is a simple OSX-ready app for creating cubic QTVR movies from six faces or from equirectangular (a kind of sphere-to-rectangle projection which is used in some java-based players and other places) images. —From http://developer.apple.com/quicktime/quicktimeintro/tools/
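The two projections computed above (P1 onto the x = k panel and P2 onto the y = k panel) are both instances of a single rule: scale the point until its largest coordinate, in absolute value, reaches ±k. A minimal sketch (the function name is ours):

```python
def cube_project(p, k):
    """Project the point p = (x, y, z) from the center of a cube of
    side 2k onto the face hit by the ray from the origin through p."""
    x, y, z = p
    m = max(abs(x), abs(y), abs(z))  # the dominant coordinate selects the face
    t = k / m                        # scale so that coordinate becomes +k or -k
    return (x * t, y * t, z * t)
```

With k = 2, the point P1 = (4k, k/2, 0) = (8, 1, 0) projects to (2, 0.25, 0), i.e., (k, k/8, 0), and P2 = (k/2, 2k, 1) = (1, 4, 1) projects to (0.5, 2, 0.5), i.e., (k/4, k, 1/2), matching the values computed above.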
Figure 7.27: Cubic Panoramic Perspective. (Courtesy of Professor Shinji Araya, Fukuoka Institute of Technology.)
7.9 Six-Point Perspective Chapter 6 introduces the concept of n-point perspective, where n can be 1, 2, or 3. This section extends the term “n-point” and discusses n values up to 6. The discussion is based on the work of and terms coined by Dick Termes, who also created the images, art, and grids in this section. Figure 6.15 shows Alberti’s method of traversals in one-point perspective. The important feature of this figure for our present discussion is the converging grid. Certain lines in this grid converge to a vanishing point and thereby turn the grid into an aid to the artist. Such a one-point grid becomes a tool that helps to draw any image in one-point perspective. Section 6.3 discusses perspective in curved objects and employs a similar grid (Figure Ans.18). Figure 7.28 shows grids for one, two, and three vanishing points and artistic drawings based on them. It is natural to accept these drawings. They look familiar and don’t seem distorted or unusual (although the viewpoint in some of them may be unusual). They are drawn in linear perspective. In contrast, drawings based on similar grids with more than three vanishing points are distorted. They belong in the realm of nonlinear projections. Figure 7.29 shows grids for four and five vanishing points, and it is immediately clear that they must introduce distortions in any artwork based on them. The former grid shows straight
Figure 7.28: Grids and Art for 1, 2, and 3 Vanishing Points (courtesy of Dick Termes).
lines bending and converging to four points. The vanishing points on the left and right sides are familiar. They result in the familiar two-point perspective. The extra two vanishing points, at the top and bottom of the grid, force all the vertical lines to bend and introduce distortions in this way. The result is an image (see example to the right of the grid) that becomes more distorted as the eye moves up or down away from the center of the image. This type of distortion has its own artistic value but it is not immediately clear to which of the projections discussed in this chapter it corresponds. A closer look at the four-point grid of Figure 7.29, however, shows its resemblance to Figure 7.18b, which corresponds to the cylindrical panoramic projection. Thus, a complete 360◦ cylindrical projection, such as the one depicted by Figure 7.20c, can be obtained by placing four four-point grids side by side. This type of grid is referred to by Dick Termes as a continuous four-point perspective. Initially, the five-point grid of Figure 7.29 looks unfamiliar and strange. It is not trivial to guess the type of distortion that results from bending lines in five different directions, toward the four extreme points on the periphery as well as toward the center. However, a glance at Figure 7.10 should convince the reader that the effect of five-point perspective is similar (perhaps even identical) to the angular fisheye projection (Page 361) as well as the spherical panoramic projection of Section 7.7. All the horizontal and vertical lines, except those passing through the middle of the figure, are curved. This drawing shows only half a sphere (180◦ vertically and horizontally), but it points the way toward depicting a complete sphere on a flat surface. Simply place two five-point perspective images side by side or one above the other. The result, which Dick Termes terms six-point perspective (no pun intended), is shown in Figure 7.23. Section 7.7.1
7 Nonlinear Projections
391
Figure 7.29: Grids and Art for 4 and 5 Vanishing Points (courtesy of Dick Termes).
discusses an approach to the construction of the five-point grid that is based on the Postel sphere projection. Reference [Cheeseman-Meyer 07] is an introduction to four- and five-point perspective, written especially for comic artists. Some viewers are impatient with attempts to create panoramic projections on flat surfaces. Such people may like the solution adopted by Dick Termes, namely to actually sit inside a sphere and paint a spherical panoramic projection on its surface. The result, which is naturally termed a Termesphere [termespheres 05], is a unique kind of art, but cannot be included in a book. (See Figure 7.30 and Plates H.2, I.6, and I.8 for a rough idea.) A side benefit of this technique is that the finished sphere can easily be converted to two flat disks in six-point perspective [Keith 01]. The original sphere is made of two thin polyethylene hemispheres. Once they are painted with acrylic paint, each hemisphere is heated until the polyethylene melts to become a plastic disk. The painting on the two disks is now in six-point perspective. An added advantage of this process is that such disks can be copied to make more disks that can, in turn, be blown into hemispheres by the same heating process.
7.10 Other Panoramic Projections

The cylindrical and cubic projections of Sections 7.6 and 7.8 have a common feature that makes them attractive: the cylinder and the cube can be unrolled or opened into a flat surface without additional distortions. Other geometric shapes have the same feature, and this section mentions the most important of them, namely the five Platonic solids (Figure 7.32) and the cone.
Figure 7.30: Emptiness, a 24” Termesphere (1986, courtesy of Dick Termes).
“This sphere shows rooms within rooms within rooms around you. Each room has one person which shows another type of emptiness” (Dick Termes).
Cubical Universe (Plate I.6) is a 16 in diameter sphere painted in 2010. This shows a very complex scene with rooms within rooms within rooms in all directions. My challenge in this sphere was to explore how many rooms within rooms could one see from one point in space. The people within the rooms are studying spheres and three-dimensional geometries to better understand the world they live in. Holes to the Whole (Plate I.8) is a 36 in diameter sphere painted in 2008. As it hangs and rotates it shows a room that branches off into many other rooms. The rooms are filled with people who are studying spheres that are floating around in the rooms. The spheres have images on them that look like they are moving or changing. With a closer look one realizes that the images on the balls are coming from the inside of the large sphere. The smaller spheres that are being studied are really transparent circles and the large sphere has a complete painting on the inside, a painting which is showing up through the transparent circles. If you look closely at all of the small spheres in the painting you can get an idea of what the big sphere on the inside is or if you go up close to one of the smaller spheres and look into it you can also see the whole inside picture. That is where the theme Holes to Whole comes from.
Figure 7.31: Parmigianino: Self Portrait in a Convex Mirror (1524).
Figure 7.32: The Five Platonic Solids.
A polyhedron whose faces are congruent convex regular polygons is known as a Platonic solid. These figures were known in antiquity, and Euclid proved that there are only five of them: the tetrahedron (a pyramid of four triangles), the cube, the octahedron (eight faces, each a triangle), the dodecahedron (12 faces, each a pentagon), and the icosahedron (20 triangles). Many properties and pictures of these solids can be found in [Steinhaus 83]. One of the most original works of art depicting Platonic solids is the wood engraving Stars by M. C. Escher. It takes a while to disentangle the many details in this picture and locate the intersecting octahedra, tetrahedra, cubes, and other figures. The only items that stand out immediately are the chameleons, placed by the artist inside the polyhedra to attract nonmathematically-oriented viewers and capture their attention.

The principle of projection is always the same. We imagine an observer located somewhere inside the surface, at the center or at some other preferred point, looking at the three-dimensional scene outside and painting it on the surface. The surface is then opened or unrolled to become a flat panoramic projection. In practice, only the cylinder and the cube are commonly used for panoramic projections. It is rare to find a pyramidal or a conic panoramic projection because opening and flattening such a surface results in a two-dimensional picture that looks foreign and unfamiliar and is often difficult to visualize, perceive, and enjoy, even though it does not create any distortions. Figure 7.33 (courtesy of Dick Termes) is a typical example. It shows a panorama of the interior of St. Peter’s Basilica in Rome projected on a dodecahedron.
It is immediately obvious that in spite of the high precision of the drawing and the many details that are easy to observe, it is difficult, perhaps even impossible, to assemble the 12 individual pentagons of the projection in the viewer’s mind and grasp them as a single coherent work of art. Such a projection is best viewed after it is cut out, folded, and glued together to actually form a dodecahedron (notice the matching tabs designed to help in this process). The details of this process and how such pictures are taken are described on Page 398.

Conic Panoramic Projection

Given a cone of height H and base radius R, we imagine an observer located at the center of the base of the cone. Such an observer sees the hemisphere of space above him and projects it on the cone, which is later cut and laid flat. It is also possible to place the observer at the center of the cone, where he can see the entire 360◦ of space around him, but this results in even more distortion because part of the lower hemisphere is seen by the observer through the lower sides of the cone, while the rest of this hemisphere is seen through the flat bottom. Reference [lampshade 05] shows how to apply the conic panoramic projection to
Figure 7.33: A Panoramic Projection on a Dodecahedron (Courtesy of Dick Termes).
create original lampshades.

The derivation presented here starts with a point P = (x, y, z) that is projected onto the surface of the cone. Once the surface is opened, the coordinates of the projected point P∗ are given (Figure 7.34) in terms of the angle θ it makes with the top of the cone and the distance r from the top. These are polar coordinates on the open cone.

Figure 7.34: Conic Panoramic Projection.
The slant height S of the cone, which becomes the radius of the open cone, is given by S = √(H² + R²). The vertical angle β between the xy plane and the direction of P is given by tan β = z/√(x² + y²). (Notice that β varies from 0 to 90◦.) Once β is known, the polar coordinate r is determined by r = S(1 − sin β); it varies between 0 and S. The top angle α of the open cone is determined by the requirement that the arc of the open cone equal the circumference of the cone’s base, Sα = 2πR, so α = 2πR/S. The polar coordinate θ lies between 0 and α, so it is given by θ = αγ/(2π), where γ is determined by the x and y coordinates of P by means of tan γ = |y/x| (90, 180, or 270◦ may have to be added to γ depending on the quadrant, see Figure 7.16).
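The steps of this derivation can be collected into a short function. The following Python sketch is illustrative only (the function name, argument order, and the use of atan2 instead of the quadrant-by-quadrant rule for γ are this sketch’s own choices); it maps a scene point with z ≥ 0 to polar coordinates (r, θ) on the open cone:

```python
import math

def conic_panorama(x, y, z, H, R):
    """Project point (x, y, z), z >= 0, onto the opened (flattened) cone.

    Follows the derivation in the text: S = sqrt(H^2 + R^2),
    r = S(1 - sin(beta)), alpha = 2*pi*R/S, theta = alpha*gamma/(2*pi).
    """
    S = math.sqrt(H * H + R * R)                # slant height = radius of open cone
    beta = math.atan2(z, math.hypot(x, y))      # angle above the xy plane, 0..90 deg
    r = S * (1.0 - math.sin(beta))              # distance from the cone's apex
    alpha = 2.0 * math.pi * R / S               # top angle of the open cone
    gamma = math.atan2(y, x) % (2.0 * math.pi)  # azimuth of P, already quadrant-correct
    theta = alpha * gamma / (2.0 * math.pi)     # azimuth scaled into [0, alpha]
    return r, theta
```

A point directly above the observer (β = 90◦) lands on the apex (r = 0), and a point on the xy plane (β = 0) lands on the rim (r = S), as the text requires.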
7.11 Panoramic Cameras

A typical dictionary definition of panorama is “a picture taken in three-dimensional space and presented on a continuous surface encircling the viewer.” There is a large variety of lenses available for current cameras (both digital and film-based, see Section 7.3), ranging from extreme wide angle to powerful telephoto, but even the widest wide-angle lenses cannot capture an image that spans more than 180◦. Most fisheye lenses can capture 180◦ images, but the result is highly distorted, especially along the edges. Professional as well as amateur photographers like to be able to stand at a given point and capture an image of everything visible from all directions, which explains why panoramic cameras are popular.

Inexpensive high-resolution digital cameras have become powerful and popular, and this has encouraged the development of panoramic software. Given a digital camera and a tripod, it is easy to take a series of overlapping photos, input them directly from the camera into the computer, and stitch them by software into a panorama (normally cylindrical). In spite of this, special panoramic cameras, both digital and film-based (the latter have been made since the 1840s), are still being made and used.

An important resource for information on all aspects of panoramic cameras is the International Association of Panoramic Photographers [IAPP 05], whose mission is “to educate, promote, and exchange artistic and technical ideas, and to expand public awareness regarding panoramic photography.” Two other useful resources are a list, located at [cameraInproduction 05], of panoramic cameras in production and a timeline of panoramic cameras (up to 1994), located at [cameraTimeline 05]. A fun guide for do-it-yourselfers is [funsci 05]. Information on panoramic cameras and creating panoramic images can be found on many Internet sites. See, for example, [shortcourses 05] and [philohome 05]. A recent reference book for this topic is [Woeste 09].
See also [Jacobs 05]. There are currently three types of cameras that capture panoramic images: a rotating camera, a swing-lens camera, and a camera with a parabolic panoramic lens system. The first two can produce undistorted images, while the third type produces a highly distorted image that has to be “unfolded” by special software to look like other types (normally cylindrical or cubic) of panoramas. Following is a description of all three plus a note on pinhole cameras. A rotating camera, as its name implies, works by rotating on its base, transferring the image to the film while moving the film in the opposite direction, so the film stays stationary relative to the ground. Examples of this type are the Swiss-made Roundshot [Roundshot 05], some of whose models are digital, the Globuscope (no longer being made), the Lomography Spinner 360◦ [Lomography 10], and the Hulcherama camera,
invented and made by Charles A. Hulcher [hulchercamera 05]. Following is some information on the latter type. The Hulcherama is a slit-scanning panoramic camera that works by rotating on its base. An electronically controlled motor is responsible for uniform rotation. (The rate of rotation may be varied from 1 sec to 144 sec per revolution.) During the rotation, the image passes through the lens and then through an adjustable narrow slit onto the film (Figure 7.35a). The slit masks out most of the image but lets a narrow portion pass through, which is how any optical distortion is minimized. As the camera rotates in one direction, the film moves past the slit in the opposite direction. The camera rotation and film movement are synchronized so that the film is stationary relative to the image being photographed. As the camera makes a complete revolution, 8.9 in of film pass behind the slit, creating a 360◦ panoramic image with a height of 2.25 in. The aspect ratio is therefore a pleasing 2.25: 8.9 ≈ 1: 4. It is possible to let the camera rotate more than one revolution (possibly varying the image each time), and a roll of 120-format film is long enough for three revolutions (the Hulcherama uses the old but still available 120 or 220 roll film).
Figure 7.35: Panoramic Cameras. (a) Rotating slit camera. (b) Swing-lens camera. (c) Panoramic lens system.
A swing-lens camera (Figure 7.35b) has a lens that rotates during an exposure, thereby “painting” the image on the film through a narrow, vertical, constant-width slit. In order to keep the same distance between the film and the lens, the film has to be curved. An advantage of this type of camera is that the lens only has to cover the vertical dimension of the film and the width of the slit, so it does not have to be complex. The downside of this type is the limited field of view, which is less than 180◦ . A complete 360◦ panorama is created by taking several shots and combining them using special equipment (for a film camera) or special software (for a digital camera). Examples of this type are the Widelux (now discontinued) and the Noblex. The Noblex [Noblex 05] is a family of cameras that consists of models 135, 150, and 175. Model 135 takes a 136◦ -wide image and uses standard 35 mm film. The Noblex-150 provides a 146◦ angle of view, uses 120 film, and produces six 5-inch-wide images on a roll. It can take multiple exposures on the same film. A panoramic lens system (Figure 7.35c) is somewhat similar to a reflecting telescope. Its main part is a convex parabolic mirror (in contrast to the mirrors used in telescopes,
which are concave) that captures the entire (or almost the entire) half-sphere of image above it and sends it up, where it is reflected by a small, flat mirror (red in the figure) and sent down through a hole in the main mirror to a camera. There are no moving parts, no rotating parts, no need for multiple images, and no need to stitch multiple photos together. The price for all this (aside from the price of the camera and mirror) is image distortion. This lens can, in principle, be used with any camera, digital or film. Since the mirror captures everything above it and on all sides, the only way for the photographer to stay out of the picture is to crawl under the camera. A panoramic lens system is therefore used while mounted on a tripod or a pole and operated from below. An example of this type is the Portal S1 panoramic lens system made by the BeHere company [BeHere 05]. It is 12.5 in in diameter, 13 in tall, and weighs less than 10 pounds. It has a 35-mm Nikon mount, so any Nikon-compatible camera body, digital or film, can be used with the Portal S1. The depth of field of the Portal is from one inch to infinity. (There is no need to focus the camera.) Its lateral field of view is, of course, 360◦ , but its vertical field of view is limited to the blue area in the figure and equals 100◦ (the angle between the two lines marked L). When anything outside this area is reflected in the main mirror, it cannot reach the secondary mirror. If a film camera is used, the film can later be scanned and then processed with special software provided by the manufacturer. This software flattens the donut-shaped image and can also perform other processing such as evening out the lighting, correcting brightness and contrast, and slightly sharpening the edges. The image can then be saved in one of the popular panoramic formats such as QuickTime VR. Exercise 7.10: Explain why the image produced by a panoramic lens system is shaped like a donut. 
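To make the donut geometry concrete, here is a Python sketch of the purely geometric part of the unwarping: each pixel of the donut image is assigned a column (from its direction around the center) and a row (from its ring radius) in a flat cylindrical panorama. All names and parameters here are illustrative; real unwarping software also corrects the mirror’s distortion and the uneven radial sampling.

```python
import math

def donut_to_panorama(px, py, cx, cy, r_in, r_out, width, height):
    """Map pixel (px, py) of a donut image (center (cx, cy), hole radius
    r_in, outer radius r_out) to (column, row) of a width x height panorama.
    Returns None for pixels outside the donut."""
    dx, dy = px - cx, py - cy
    r = math.hypot(dx, dy)
    if not (r_in <= r <= r_out):
        return None
    angle = math.atan2(dy, dx) % (2.0 * math.pi)   # direction -> panorama column
    col = angle / (2.0 * math.pi) * width
    row = (r - r_in) / (r_out - r_in) * height     # ring radius -> panorama row
    return col, row
```

The hole in the middle (radii below r_in) corresponds to the camera itself, seen reflected in the main mirror, which is why the captured image is donut-shaped.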
The OmniAlert panoramic video camera system from Remotereality [remotereality 05] also employs a parabolic mirror, but the mirror points down, toward the camera, which results in a circular picture with no hole. This camera has been developed for security and surveillance applications, where a wide field of view is important. The video camera is mounted on a high pole right under the parabolic mirror and uses special software to detect and track moving objects in its field of view and alert operators to any suspicious activities. The 360 One VR parabolic mirror system, from Kaidan [Kaidan 05], also uses a down-pointing mirror and can be attached to several different cameras. Special software must be used to convert the highly distorted image to a flat panorama (Flash VR, cylindrical, QuickTime VR cylindrical, spherical, cubic, or QuickTime VR cubic) that can be displayed and printed. See also [eclipsechaser 05] for astronomical applications of this type of panoramic camera. Note. The pinhole used to be the first camera of many a poor youngster. This is simply a box with a small hole in front and film or light-sensitive paper loaded in the back. The shutter can be as simple as a piece of tape that’s removed to expose the film, then reapplied manually, or it can be a purchased, cable-operated shutter assembly. If the hole is small enough, the resulting image is sharp; if the film is wide, this primitive device can produce wide-angle images. The total photograph. We now turn to a completely different approach to the
problem of creating a panorama with a camera. This approach, termed by its inventor the total photograph, was developed and patented by Dick Termes in 1980 and is described in [Termes 80]. To understand this technique, consider the cubic panoramic projections of Section 7.8. We imagine an observer located at the center of a cube (Figure 7.25a) and looking at the three-dimensional scene outside. Everything the observer sees is etched on the sides of the cube or is painted there by the observer. Given a three-dimensional scene and a camera, the problem is generating such a cube. In general, we want a method where we can project a scene on the sides of any of the five regular polyhedra, as discussed in Section 7.10. The first step is to decide what regular polyhedron we want. For example, we may want to create a panorama on the 20 triangular sides (or faces) of an icosahedron. We use suitable material, such as wood, plastic, or metal, to construct a solid icosahedron and mount it on a good-quality, stable camera tripod. (The tripod may have to be loaded with extra weight to make it extremely stable.) The icosahedron stays fixed while pictures are taken. We drill small holes (labeled 34 in Figure 7.36) in each of the 20 sides of the icosahedron to enable us to quickly attach a special bracket to any side. A camera is mounted on the bracket. (The camera has to have a wide field of view, so pictures taken from adjacent faces of the polyhedron do overlap). We then place the bracket with the camera in one of the 20 sides of the icosahedron and, while holding it stable in our hand, take a snapshot. This guarantees that the center of the camera lens is right over the center of the polygon face. It is also important to make sure that the camera’s line of sight is perpendicular to the polygon face. We repeat this for the 19 remaining sides to end up with 20 pictures, each showing what a viewer located on that face of the icosahedron would see. 
Figure 7.36 is taken from the patent application. The first five figures show the five Platonic solids, each with two holes on each face, for quick mounting of the bracket. Part 6 of the figure shows the bracket, part 7 shows a camera mounted on the bracket, and part 8 is an exploded view of an icosahedron mounted on a camera tripod and the bracket mounted on one side. The only problem is that the camera is located outside the icosahedron, not inside. Thus, the camera sees more than an inside observer would see through each face. The 20 photographs therefore partially overlap and we need to identify the overlapping parts and remove them. The result should be 20 triangular pictures, each corresponding to what an observer inside the icosahedron would see through one face. These triangles can then be pasted together to form an actual icosahedron. Figure 7.37 illustrates this process. Two partially overlapping pictures are placed such that the overlapping parts match precisely (part 9). The centers of the pictures are then identified and connected by a straight segment (56 in part 10), and another segment (58) is drawn, perpendicularly bisecting the first one, as shown in part 10. Once this is done, it is easy to construct the two segments 62 and 64 of part 11 and end up with an equilateral triangle on the picture. The picture is then trimmed as in part 12, with small tabs that are later used to paste this picture to several (up to three) other ones. Part 13 shows how the 20 triangles resulting from this process are mounted in one horizontal strip that can later be converted to an actual icosahedron (part 14). Each face of a dodecahedron is a pentagon, and each side of a cube is a square, but the details of removing overlapping parts and trimming each picture in these cases are
Figure 7.36: Details of Invention (Courtesy of Dick Termes).
Figure 7.37: Details of Invention (Courtesy of Dick Termes).
similar to the triangular case. Figure 7.33 is an example of a panorama constructed on a dodecahedron.

From around 1930 on, therefore, the standard photographic image on 35 mm film was 15.6 mm high by 20.8 mm wide, a proportion of roughly four by three. The same proportion of height to width (the aspect ratio) is obtained on the screen when such a frame is projected, and this shape of image (ratio 1:1.33) came to be called the “Academy ratio.” But substantial variations are possible even on conventional 35 mm film. Masks or caches can cut the height of each frame, and thus increase the aspect ratio of the projected image; alternatively, special lenses can be used which “squeeze” a wider image on to the film (through a procedure called anamorphosis, often used by Renaissance painters) and “unsqueeze” it again when the film is projected. A French optical scientist called Chrétien invented the anamorphic lens and its application to the cinema in the 1920s; Autant-Lara experimented with it in a film version of Jack London’s To Make A Fire, but the Hypergonar, as Chrétien called his invention, failed to catch on, and development work on it stopped.
—David Bellos, Jacques Tati (1999).
7.12 Telescopic Projection

Seen through a microscope, small objects look bigger than they are. The telescope, however, does not enlarge objects; it brings them closer. Objects close to the telescope are brought a little closer, while objects located far away are moved much closer. This short section discusses the mathematics of the telescopic projection, but it should be emphasized that this is not a projection from three dimensions to two dimensions but rather a three-dimensional transformation. (This is also true for the microscopic projection.) Nevertheless, these topics are discussed here because of their nonlinearity.

The diameter of the moon is 3,476 km (2,160 miles). When we see the moon through a telescope, its diameter seems to be only a few centimeters or a few meters, much smaller than the real diameter. This shows that the telescope does not increase the size of the object being viewed. Instead, it decreases the apparent distance of the object.

Figure 6.4 is a perspective projection of a long row of telephone poles. The poles, which are the same height and are equally spaced, seem to get smaller and closer together as they get farther from the viewer. This is a common effect of linear perspective. Looking at the same poles through a telescope brings them closer and makes them look bigger, but not by the same amount. Poles close to the telescope move just a little closer to the viewer, while poles far away move much closer and also get bigger (although they remain smaller than nearby poles).

In order to compute such a projection mathematically, we need an expression that takes a quantity z (the distance of a telephone pole) and shrinks it nonlinearly to z∗ such that z = 0 (a telephone pole at the viewer’s position) results in z∗ = 0 (no movement) and large values of z yield z∗ values in the interval [0, k], approaching k slowly. One choice for such an expression is

z∗ = kz/(z + k),    (7.4)
where k is a parameter selected by the user. This expression is similar to the thin lens equation from optics and to Equation (6.3). The Mathematica code

k=10.;
Table[k z/(z+k), {z,0,100,5}]
Table[%[[i+1]]-%[[i]], {i,1,20}]
Table[Point[{%%[[i]],0}], {i,1,21}];
Show[Graphics[%]]

selects k = 10 and 21 z values from 0 to 100 in steps of 5. It produces the 21 numbers 0, 3.33, 5, 6, 6.67, 7.14, 7.5, 7.78, 8, 8.18, 8.33, 8.46, 8.57, 8.67, 8.75, 8.82, 8.89, 8.95, 9, 9.05, and 9.09. They start at zero and approach k (Figure 7.38a). The third line computes the 20 differences between consecutive numbers. It produces 3.33, 1.67, 1, 0.67, 0.47, 0.36, 0.28, 0.22, 0.18, 0.15, 0.13, 0.11, 0.10, 0.083, 0.074, 0.065, 0.058, 0.053, 0.048, and 0.043, and it is obvious that the differences get smaller and smaller, showing that points brought in from infinity converge toward distance k from the viewer.
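For readers without Mathematica, the same experiment can be sketched in Python (the function name telescope is an assumption of this sketch, not a name from the text):

```python
def telescope(z, k=10.0):
    """Telescopic transformation z* = k z / (z + k) of Equation (7.4)."""
    return k * z / (z + k)

zs = [telescope(z) for z in range(0, 101, 5)]   # the 21 transformed distances
diffs = [b - a for a, b in zip(zs, zs[1:])]     # the 20 consecutive gaps
```

The list zs starts at 0 and climbs toward k = 10 without reaching it, and the gaps in diffs shrink monotonically, as in the Mathematica output.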
Figure 7.38: (a) Twenty Nonuniformly Spaced Points. (b) Varying Heights.
Exercise 7.11: What should be the distance z of a point in order for it to be moved to a distance z∗ = k/2 by the telescopic transformation?

In prophetic utterances, time is often telescoped.
—Anonymous.

The heights of the transformed telephone poles can be determined by a similar expression. A pole located right at the viewer’s location should maintain its height, while poles that are moved closer should become taller but should remain shorter than the nearest pole. If the nearest pole is l units tall, then the expression

l∗ = l(1 − zr/(z + l))

produces l∗ values that range from l (for z = 0) to (1 − r)l (for very large z). The Mathematica code

l=20.; r=0.1;
Table[l(1-(z r/(z+l))), {z,0,100,5}]
Table[%[[i]]-%[[i+1]], {i,1,20}]
Table[Line[{{i, 17}, {i, %%[[i]]}}], {i,1,21}]
Show[Graphics[%]]

selects l = 20 and r = 0.1 to obtain l∗ values ranging from l = 20 down toward 0.9l = 18. The results are 20, 19.6, 19.33, 19.14, 19, 18.89, 18.8, 18.72, 18.67, 18.62, 18.57, 18.53, 18.5, 18.47, 18.44, 18.42, 18.40, 18.38, 18.363, 18.35, and 18.33. Figure 7.38b shows the top parts of the poles to illustrate how the differences in height between consecutive poles diminish. The third line of the code yields the 20 differences 0.4, 0.27, 0.19, 0.14, 0.11, 0.09, 0.073, 0.061, 0.051, 0.044, 0.038, 0.033, 0.029, 0.026, 0.0234, 0.021, 0.019, 0.017, 0.016, and 0.014. Thus, the height differences between consecutive telephone poles get smaller and smaller.

After a three-dimensional scene has been telescoped point by point, we can use a perspective projection to display it in two dimensions.

Love looks through a telescope; envy, through a microscope.
—Josh Billings.
7.13 Microscopic Projection

A sample observed through a microscope is normally thin. We can therefore assume that points that go through a microscopic projection have the same (or similar) z coordinates. In contrast to a telescope, which brings points closer to the observer, a microscope “opens up” the points. Figure 7.39 shows how this is done by moving points away from the z axis. If the view angle of a point P is θ, then the microscope places its projection P∗ such that its view angle is mθ, where m is the magnification power of the microscope. Thus, the projection rule is

x/(z + k) = tan θ  and  x∗/(z + k) = tan(mθ).    (7.5)

Figure 7.39: Microscopic Projection.
Computing x∗ therefore involves the two steps θ = arctan(x/(z + k)) and x∗ = (z + k) tan(mθ). For small angles, tan θ is close to θ, so we can write as an approximation

x∗/(z + k) = m x/(z + k),  or x∗ = mx.
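The two-step computation translates directly into code. A minimal Python sketch (the function name is this sketch’s own choice):

```python
import math

def microscope_x(x, z, m, k):
    """Exact microscopic projection of the x coordinate:
    theta = arctan(x/(z+k)), then x* = (z+k) tan(m*theta)."""
    theta = math.atan(x / (z + k))
    return (z + k) * math.tan(m * theta)
```

For points near the axis the exact result is very close to mx; farther from the axis the exact value exceeds mx, because the tangent grows faster than its argument.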
The small-angle approximation x∗ = mx is a linear scaling transformation where both x and y are scaled by a factor of m, while z is left unchanged. The transformation matrix is

⎛ m 0 0 0 ⎞
⎜ 0 m 0 0 ⎟
⎜ 0 0 1 0 ⎟
⎝ 0 0 0 1 ⎠.
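The scaling matrix can be exercised with a few lines of Python (a sketch; the helper names are assumptions of this sketch):

```python
def microscope_matrix(m):
    """Homogeneous 4x4 matrix scaling x and y by m, leaving z unchanged."""
    return [[m, 0, 0, 0],
            [0, m, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

def transform(M, p):
    """Apply matrix M to point p = (x, y, z) in homogeneous coordinates."""
    v = (p[0], p[1], p[2], 1)
    x, y, z, w = (sum(row[j] * v[j] for j in range(4)) for row in M)
    return (x / w, y / w, z / w)
```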
Nature composes some of her loveliest poems for the microscope and the telescope. —Theodore Roszak, Where the Wasteland Ends (1972).
7.14 Anamorphosis

An anamorphosis is a distorted image that can be visualized and perceived only when viewed in a special way. The two most common types of anamorphosis are oblique and catoptric. The former has to be viewed from an unusual angle or from a specific location or distance. The latter has to be seen reflected in a special mirror.

Anamorphosis. A distorted or monstrous projection or representation of an image on a plane or curved surface, which, when viewed from a certain point, or as reflected from a curved mirror or through a polyhedron, appears regular and in proportion; a deformation of an image.
—From Webster’s Dictionary (1913).

Figure 7.40 illustrates oblique anamorphosis. We imagine the artist painting a subject as if seen through a window. A conventional window is perpendicular to the line of sight of the artist, whereas an anamorphosis window is tilted at a sharp angle to the line of sight.
Figure 7.40: An Anamorphosis Window.
The Hungarian artist István Orosz has produced striking examples [Orosz 05] of catoptric anamorphosis. An example of oblique anamorphosis is the well-known painting The Ambassadors by Hans Holbein the Younger [Holbein 05]. It features, in the foreground,
a small detail, the distorted image of a skull. In order to actually see the skull, it has to be viewed from a point to the right of the painting and very close to it.

A cylindrical anamorphosis is a popular variant of catoptric anamorphosis. A cylindrical mirror is placed on a flat plane and a deformed image is drawn on the plane. When viewed in the mirror, the image looks correct. Site www.anamorphosis.com [anamorphosis 05] is a lively introduction to anamorphosis, with many examples and special software, Anamorph Me! [Anamorph Me 05], that can input an image in one of several popular formats and prepare an anamorphosis (either oblique or catoptric). The four variations of Figure 7.42 were generated by this software.

Figure 7.41 shows how to create an anamorphosis manually. Start with an image, cover it with a regular grid, stretch and distort the grid, and then copy the details of the image from each original grid box to the corresponding box (which is no longer a rectangle) in the new grid. In order to obtain a cylindrical anamorphosis, the square (or rectangular) grid covering the original image has to be stretched and bent into a circular arc, as depicted in the figure.
Figure 7.41: Creating an Anamorphosis.
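The grid-bending step for a cylindrical anamorphosis can be sketched numerically. The function below maps a point of the original rectangular grid to the arc-shaped grid around the mirror; the mirror radius, the depth of the bent grid, the angular span, and the orientation of the image are all arbitrary illustration parameters, not values from the text:

```python
import math

def bend_to_arc(u, v, r_mirror=1.0, depth=1.5, span=math.pi):
    """Map grid point (u, v) in the unit square of the original image to the
    bent grid on the ground plane around a cylindrical mirror.

    u in [0, 1]: position across the image width -> angle along the arc.
    v in [0, 1]: height in the image -> radial distance from the mirror
    (v = 0, the bottom row, lands nearest the mirror -- an arbitrary choice).
    """
    angle = (u - 0.5) * span        # spread the image width over the arc
    radius = r_mirror + v * depth   # push higher rows farther from the mirror
    return (radius * math.sin(angle), radius * math.cos(angle))
```

Copying the contents of each original grid box to the quadrilateral spanned by the images of its four corners produces the deformed drawing that looks correct when reflected in the mirror.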
Figure 7.42: Four Variations of Anamorphosis.
Prepared by the author with Anamorph Me! by Phillip Kent, free software for Windows [Anamorph Me 05]. Clockwise from top left: Conical mirror, cylindrical mirror, pyramid, and conical. The original (Still Life, by the author) is at the center.
7.15 Map Projections

According to [wikipedia 05], the ancients generally believed that the Earth is flat, but by the time of Pliny the Elder (the first century A.D.) its spherical shape had already been generally acknowledged. Many scientists and cartographers strongly believed in a round Earth, which led Columbus to risk his life, in 1492, trying to reach Japan by sailing west. Today, most of us believe that the Earth is a sphere (more accurately, a spheroid, since it is slightly flattened at the poles), but there is still a persistent minority that believes otherwise (see [Flat Earth Society 05] for an interesting example). Regardless of anyone’s beliefs or convictions, our aim in this section is to describe the chief methods for projecting a sphere on a flat plane.

The equation of a sphere of radius R centered on the origin is x² + y² + z² = R². This is a special case of the ellipsoid

x²/a² + y²/b² + z²/c² = 1

and the spheroid

(x² + y²)/a² + z²/c² = 1.

The sphere may also be described in spherical coordinates as (compare with Equation (7.3))

x = R cos θ sin φ,  y = R sin θ sin φ,  z = R cos φ,

where θ is the longitude (or azimuthal coordinate), which varies from 0 to 2π, and φ is the colatitude (or polar coordinate, the latitude measured from the north pole), which varies from 0 to π.

Exercise 7.12: Look up (in a dictionary or on the Internet) the definitions of latitude, longitude, antipode, and graticule.

First, let’s convince ourselves that projecting a sphere on a plane is a practical, important problem. After all, one might claim that we have globes of the Earth, so perhaps we don’t need maps as well. A globe is a true representation of the Earth’s surface because it maintains the true scale of areas and distances and shows the correct shapes of regions and the correct angles between lines. However, its use is limited. Only one half of a globe can be viewed at a time.
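The spherical parametrization given at the start of this section is easy to verify in code. A Python sketch (the function names are this sketch’s own):

```python
import math

def sphere_point(R, theta, phi):
    """Cartesian point from longitude theta (0..2*pi) and colatitude phi
    (0..pi), per the parametrization in the text."""
    return (R * math.cos(theta) * math.sin(phi),
            R * math.sin(theta) * math.sin(phi),
            R * math.cos(phi))

def to_spherical(x, y, z):
    """Inverse: recover (R, theta, phi) from a Cartesian point
    (theta is normalized into [0, 2*pi))."""
    R = math.sqrt(x * x + y * y + z * z)
    phi = math.acos(z / R)
    theta = math.atan2(y, x) % (2.0 * math.pi)
    return R, theta, phi
```

Every point produced by sphere_point satisfies x² + y² + z² = R², and the two functions are inverses of each other away from the poles.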
Normally, the size or scale of a globe is too small to show the details of a small region, such as a town, and large globes are expensive and difficult to handle. Maps, on the other hand, are much more versatile. A flat map is portable because it can be rolled or folded. It is easy to print maps in large quantities and store them digitally in a computer where they can be edited, processed, displayed, and printed. There is vast literature on map projections, map making, and cartographic technique. Distilling it to just four items yields, in the opinion of this author, [Pearson 90] (very mathematical), [Snyder 87], [Snyder 93], and [Furuti 97]. The main problem with mapping a globe is the fact that a sphere is an undevelopable surface. Any attempt to open, unfold, or unroll a sphere to lie flat results in stretching and deforming it in some way. (This is also mentioned in Section 7.7.) Thus, every projection of a sphere onto a flat plane must introduce distortions, and the problem
of mapping a globe is to design and develop sphere projections that eliminate or minimize certain distortions (while perhaps increasing others). Thus, we can say that cartography is the art and science of designing and choosing the least inappropriate projection for a given application. A map that preserves distances may be useful in certain applications even if it corrupts angles. Similarly, a map that minimizes distortions around the equator may be ideal for certain countries, such as Ecuador, even if it deforms the shapes of regions close to the poles.

An important requirement in sphere projection is to preserve spatial relationships. If a region A lies to the north of another region B on the globe, it should also appear to the north of B on the projection (i.e., on the map resulting from the projection). Other than preserving spatial relationships, any sphere projection is a compromise, displaying some properties accurately while deforming others. Thus, when classifying sphere projections, one attribute that should be considered is the extent to which a projection preserves or distorts certain properties. Following is a list and a short discussion of the most important properties of maps. These properties are identified by answering, for a given map, the following questions: Can distances be accurately measured? How easy is it to determine the shortest path between two points? Are directions between points preserved? Are shapes of geographical features preserved? Are areas preserved to scale? Which regions suffer the most distortion, and what kind of distortion? These features are discussed here.

Scale. A map has to shrink the globe down to a convenient size that is determined by the scale. In a 1:10,000 scale map, points separated by two units on the map represent geographical locations separated by 20,000 units on the sphere. However, no map satisfies this condition perfectly.
Scale on a flat map changes with location on the map and the direction between the points. Measuring arbitrary distances on a map can at best serve as an estimate of the real distances on the sphere. Recall that the shortest distance between two points on a sphere is a great-circle arc, but such an arc is only rarely represented by a straight line on a map. However, some projections produce maps where certain lines are to scale. Distances measured along those lines are accurate. Such lines are called standard lines. In a sinusoidal projection centered on the equator, all latitudes (parallels) are standard lines. In an azimuthal equidistant map, all lines that pass through the central point are standard. In a cylindrical equidistant map, the vertical lines (longitudes) and the equator are standard lines. A small-scale map portrays a large area and a large-scale map portrays a small area of the Earth. It is intuitively clear that a small region of a sphere is not much different from a flat plane, which is why a large-scale map is not sensitive to the projection
algorithm. When mapping a small area of a sphere, practically any projection method will produce a map where distances, areas, and angles are fairly accurate. The problem of distortion arises when a large area of a sphere has to be mapped. In such a case, no projection method will produce ideal results, and the algorithm used has to be selected depending on the application at hand. One projection method may be suitable for navigation, while another may produce maps useful for surveying.

The shortest path between any two points on a sphere is a great-circle arc, also called a geodesic. Thus, a projection where great circles are displayed as straight lines is ideal for measuring shortest paths. No sphere projection can generate such a map, but the stereographic projection comes close to satisfying this requirement because it preserves circles. Any circle on the sphere is mapped by this projection to a circle. In particular, a great circle passing through the center of the projection is mapped to a circle with infinite radius, a straight line. Thus, straight lines through the center of a stereographic projection are great circles and indicate shortest paths. The downside is that this projection can show only one hemisphere, which limits its use in air navigation to short and medium distances. The gnomonic projection maps all great circles, not just those passing through the central point, into straight lines, but this projection covers even less than a hemisphere.

A map prepared especially for determining property taxes should allow for accurate measurements of areas. If the scale of the map is 1:s and the area of a certain region is A, then the area of the region as measured on the map should be A/s² (linear distances shrink by a factor of s, so areas shrink by a factor of s²). Such a map is termed equal-area and may distort the shapes of areas and display wrong distances between points.
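As an aside, the length of a great-circle arc between two points given by latitude and longitude can be computed with the haversine formula (not derived in the text; a standard identity). A small sketch:

```python
import math

def great_circle_distance(R, lat1, lon1, lat2, lon2):
    """Length of the great-circle arc (the geodesic mentioned in the
    text) between two points given in radians, via the haversine
    formula."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + \
        math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * R * math.asin(math.sqrt(h))

# A quarter of the equator on a unit sphere has length pi/2:
print(great_circle_distance(1.0, 0.0, 0.0, 0.0, math.pi / 2))
```

Comparing such geodesic lengths with distances measured on a flat map is one way to quantify the distance distortion of a projection.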
Even a quick glance at a Mercator map shows a huge Greenland, about the same size as all of Africa, obviously not to scale because the ratio of their areas is 1:13.7. In this projection, areas close to the poles appear bigger than they should. In contrast, the Mollweide projection preserves areas.

It is easy to tell when a familiar shape becomes distorted or deformed. On the other hand, it is not obvious how to measure distortion quantitatively. We are familiar with the shape of the continents on Earth, so when a landmass gets distorted by a projection, we recognize the deformation, but it took cartographers several centuries to come up with a simple measure that shows the amount and direction of the distortion. This measure was introduced by Nicolas Tissot in the 19th century and is known today as Tissot's indicatrix. The idea is simply to add a grid of small circles to the globe area being mapped. The circles are mapped with the other items in the area (land areas, oceans, rivers, etc.), and a quick look at a circle shows the amount and direction of its distortion. A circle may retain its shape and area, it may get scaled but keep its shape, or it may become deformed.

Figure 7.43a shows the Tissot indicatrix for the sinusoidal projection. It is obvious that distortion is minimal around the equator and increases toward the poles. Also, the circles are distorted, but their area is preserved. In contrast, part (b) of the figure indicates that the Mercator projection, which is conformal, does not distort shapes but increases areas as we move away from the equator. (At the poles, the Tissot circles would become infinitely large.)

A map prepared for determining the routes of new highways should be equidistant; it should preserve distances. If the distance between two points on the sphere is L, then
Figure 7.43: Tissot Indicatrix for Sinusoidal and Mercator Projections.
the distance between them on the map should be L/s. In practice, an equidistant map often shows true distances only from one point, the center of projection.

An azimuthal or zenithal projection preserves angles. Ideally, if the angle between three points is α, then the angle between the same points on the map should be the same α. In practice, azimuthal maps maintain true angles only from one central point, and even this property is achieved at the price of great distortions of areas and distances.

A map projection is conformal (also referred to as orthomorphic or equiangular) when (1) all angles at any point are preserved, (2) lines of latitude and longitude intersect at right angles, and (3) the shapes of small areas are preserved. Such a map corrupts the size of large areas. Table 7.44 lists the pairs of properties that can be combined in a single projection.

Projection    Area  Scale  Angle  Shape
Equal-area     —    no     yes    no
Equidistant    no   —      yes    no
Azimuthal      yes  yes    —      yes
Conformal      no   no     yes    —
Table 7.44: Properties that Can be Combined.
May I repeat what I told you here: treat nature by means of the cylinder, the sphere, the cone, everything brought into proper perspective so that each side of an object or a plane is directed towards a central point. —Paul Cézanne to Émile Bernard, 15 April 1904.

Developable surfaces. A developable surface is one that can be opened or unrolled to become flat without introducing any distortions or deforming it. A plane is developable, as are the cone and the cylinder. As a result, most methods for projecting
a globe start by projecting it on a cone or a cylinder (while introducing distortions) and then unfolding this projection to become flat. A developable surface is constructed by rolling or twisting a flat sheet of material without stretching or shrinking it. A ruled (or lofted) surface is linear in one direction. The parametric expression of such a surface is of the form P(u, w) and it is linear either in u or in w. Such surfaces are simple but are not always developable. Exercise 7.13: Are there any other developable surfaces in addition to the cylinder, cone, and plane?
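A surface P(u, w) that is linear in w can be built by linearly interpolating two boundary curves. The following is a sketch of this idea; the curve functions C0 and C1 are placeholders of my choosing, not notation from the text:

```python
import math

def ruled_surface(C0, C1):
    """Return a ruled (lofted) surface P(u, w) that is linear in w,
    interpolating two boundary curves C0(u) and C1(u), each returning
    a point (x, y, z)."""
    def P(u, w):
        p0, p1 = C0(u), C1(u)
        return tuple((1 - w) * a + w * b for a, b in zip(p0, p1))
    return P

# A cylinder of radius 1 and height 1, expressed as a ruled surface
# between its two boundary circles (a developable example):
circle0 = lambda u: (math.cos(2 * math.pi * u), math.sin(2 * math.pi * u), 0.0)
circle1 = lambda u: (math.cos(2 * math.pi * u), math.sin(2 * math.pi * u), 1.0)
cyl = ruled_surface(circle0, circle1)
print(cyl(0.25, 0.5))  # a point halfway up the cylinder
```

Note that while this particular ruled surface (a cylinder) is developable, a ruled surface built from two arbitrary curves generally is not.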
Figure 7.45: Principles of Projection.
Figure 7.45 illustrates the principle of employing developable surfaces for sphere projection. The cylinder, cone, or plane can either be tangent to the sphere (part (a) of the figure) or secant to it (part (b)). In the latter case, the cylinder and cone intersect the sphere in two circles and the plane intersects it in a single circle. The areas of contact between the sphere and the developable surface are called the standard parallel or the standard line. These areas are important because they correspond to the regions of least distortion in the map. The difference between the various projection algorithms is in the precise way they project points on the sphere to the developable surface.
Definitions of secant:
A line, ray, or segment that contains a chord of a circle.
A line that crosses a circle at exactly two points.
A line extending through a circle, connecting two nonadjacent points.
A straight line that intersects a curve at two or more points.
The ratio of the hypotenuse to the adjacent side in a right triangle.

Figure 7.47 shows how the orientation of the developable surface can vary relative to the poles of the globe. The surface can be polar, equatorial, or oblique. It is obvious that the orientation significantly affects the graticule and thus the appearance of the map. In the oblique projections, the poles are no longer at the top and bottom of the map but have migrated to unexpected places.

Azimuthal projections, also called planar projections, are those that project (normally only part of) a sphere directly to a plane, so there is no need to unroll and flatten a developable surface. The plane is tangent to the sphere at a point that becomes the center of projection. If the center is a pole, then lines of latitude become concentric circles on the projection and lines of longitude become straight segments that converge at the center. Figure 7.46 shows that these projections preserve directions from the center but distort distances and areas as well as directions from other points.
Figure 7.46: Azimuthal Projections.
The case where the center of projection is at the center of the sphere is called a gnomonic projection (part (a) of the figure). Each line of latitude becomes a circle, but the distance between consecutive circles shrinks for high latitudes. Thus, equatorial regions are shown in more detail, while polar regions are shrunk in this type of projection. The figure demonstrates that this projection is limited to less than half the sphere; it cannot include the equator. On the other hand, any great circle is displayed in this projection as a straight segment. (A great circle is one whose center is at the center
Figure 7.47: Various Orientations.
of the sphere.) Great circles are important for navigation because a great-circle arc is the shortest distance between two points on the surface of a sphere. This is why the gnomonic projection is commonly used in air navigation. This projection is neither conformal nor equal-area.

Part (b) of the figure shows a stereographic projection. This is the case where the center of projection is at the pole opposite the plane of projection. The circles of latitude are uniformly spaced, which results in uniform distortions throughout.

When the center of projection is at infinity on the side of the sphere opposite that of the projection plane, the lines of projection are parallel and the projection is referred to as orthographic. Part (c) of the figure illustrates this type, and it is obvious that the pole that's tangent to the plane of projection is shown in much detail and little distortion, thereby making this projection ideal for mapping polar regions. Figure 7.48 illustrates the coordinate transformation for the orthographic projection. The polar coordinates of the projected point are θ = longitude and r = R cos(latitude).
Figure 7.48: Polar Coordinates in Orthographic Projection.
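The polar orthographic transformation above amounts to two lines of code. A minimal sketch (the function name is mine):

```python
import math

def orthographic_polar(R, lat, lon):
    """Polar map coordinates (r, theta) of the polar orthographic
    projection described in the text: r = R*cos(latitude),
    theta = longitude. Angles in radians; valid for the hemisphere
    facing the projection plane."""
    return R * math.cos(lat), lon

# The pole itself (lat = pi/2) projects to the center of the map (r ~ 0),
# while the equator projects to the outer rim (r = R):
print(orthographic_polar(1.0, math.pi / 2, 0.0))
print(orthographic_polar(1.0, 0.0, 0.0))
```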
Cylindrical projections. Figure 7.49 shows the three main ways to project a sphere on a cylinder tangent to it. Part (a) of the figure illustrates a perspective projection from the center of the sphere. In part (b), points on the sphere are projected to the cylinder in parallel, while the projection principle in part (c) is to project equal arc lengths on the sphere to equal vertical segments on the cylinder.
Figure 7.49: Three Types of Cylindrical Projections.
In all three types of cylindrical projections, unrolling the cylinder results in equally-spaced longitudes on the map. However, in a perspective cylindrical projection, the spaces between consecutive latitudes on the map increase as we move toward the pole and approach infinity at the pole. Thus, it is impractical to extend this projection beyond about latitude 80°. The simple projection depicted in Figure 7.49a projects a point at latitude φ and longitude θ on the globe to Cartesian coordinates x = R(θ − θ0) (where θ0 is the longitude at the center of the map) and y = R tan φ on the map. Such a projection stretches the vertical dimensions of any regions between about latitude 30° and the poles, resulting in so much distortion that it is rarely used.

Mercator projection. A common variant of the cylindrical perspective projection is the popular Mercator projection, developed by Gerhardus Mercator in 1569. Its principle is to increase the distance between consecutive latitudes in proportion to the increased distance between meridians. This effect is illustrated in Figure 7.49d. The circumference of a globe of radius R at the equator is 2πR and at latitude φ it is 2πR cos φ. Thus, the width of a degree of longitude at latitude φ (the distance between longitude θ° and (θ + 1)°) is smaller than the width of a degree of longitude at the equator by a factor of cos φ. In a cylindrical projection, the longitudes are shown as parallel lines, which means that at latitude φ, the width of a degree of longitude in the projection has been artificially increased by a factor of 1/cos φ. This width can be considered the horizontal scale, so the principle of the Mercator projection is to also increase the vertical scale by the same factor. A small change Δφ in the latitude should therefore change y by RΔφ/cos φ. The basic equation of y as a function of φ is the differential equation

dy = R dφ / cos φ,

which integrates to yield

y(φ) = R ln tan(π/4 + φ/2).
Any integration constant is eliminated if we impose the condition that φ = 0 implies y = 0. Now imagine a small region at latitude φ. Both its width and its height have been increased by a factor of 1/ cos φ, so its area is increased by a factor of 1/ cos2 φ, but its shape hasn’t changed. A large region tends to spread beyond a single latitude, so its shape is distorted. Thus, the Mercator projection preserves the shapes of small regions and makes it relatively easy to compute their true areas. Large regions are distorted and also appear very large. Greenland, for example, appears bigger than South America, even though the latter is nine times bigger. It is also obvious that longitudes and latitudes are perpendicular to each other in the Mercator projection. This projection is therefore conformal. “What’s the good of Mercator’s north poles and equators, tropics, zones, and meridian lines?” so the Bellman would cry: and the crew would reply “They are merely conventional signs!” —Lewis Carroll, The Hunting of the Snark (1876).
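A minimal sketch of the resulting Mercator equations, assuming a sphere of radius R and angles in radians (the function name is mine):

```python
import math

def mercator(R, lat, lon, lon0=0.0):
    """Mercator map coordinates for a point at latitude lat and
    longitude lon (radians): x = R*(lon - lon0),
    y = R*ln(tan(pi/4 + lat/2))."""
    x = R * (lon - lon0)
    y = R * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

# The equator maps to y = 0; y grows without bound toward the poles,
# which is why a Mercator map is always cut off at some latitude:
print(mercator(1.0, 0.0, 0.5))  # x = 0.5, y ~ 0
print(mercator(1.0, math.radians(85), 0.0))
```

Numerically, a small step Δφ in latitude changes y by about RΔφ/cos φ, exactly the vertical stretching the derivation calls for.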
Figure 7.52 shows the standard Mercator projection of the Earth (where the cylinder is tangent to the equator), and Figure 7.53 is the oblique 45° Mercator projection introduced by Charles Peirce in 1894.

Cylindrical equal-area projection. When the cylinder is aligned with the rotation axis of the globe, any cylindrical projection results in uniformly spaced, parallel meridians and parallel latitudes. However, the latitudes don't have to be spaced uniformly, and their spacings can be adjusted to preserve areas. There is essentially only one way to design a cylindrical equal-area projection, and it was first described by Johann H. Lambert in 1772.

In a cylindrical projection, the x coordinate for longitude θ on the unrolled cylinder is the length of the arc between θ and θ0. Thus x = R(θ − θ0). We have to adjust the space between consecutive latitudes such that any area on the cylinder will equal the corresponding area on the sphere, and this is easy to achieve by comparing areas on the sphere and the cylinder. By a classical result of Archimedes, the area of the spherical zone between the equator and latitude φ is 2πRh, where h = R sin φ is the height of the zone, i.e., 2πR² sin φ. The lateral area of a cylinder of radius R below height y is 2πRy, so equating the expressions 2πR² sin φ and 2πRy results in y = R sin φ.

Table 7.50 lists y values for R = 1 and for latitudes from 0 to 90° and the stretch factor for each. This factor is the extra amount the y coordinate has to be moved relative to its "natural" position. For example, for φ = 30°, the natural position for the y coordinate is 0.3, but it has moved to 0.5, a stretch factor of 1.67. Figure 7.51 illustrates how each latitude is raised (the dashed green lines in the Northern Hemisphere) in order to preserve areas. The figure illustrates the fact that such a projection is useful in the equatorial regions but useless in the polar regions, where the small gaps between consecutive latitudes make it impossible to distinguish shapes, borders, and distances.
φ°    y      Stretch     φ°    y      Stretch
 0    0.00    0.00       50   0.77    1.53
10    0.17    1.74       60   0.87    1.44
20    0.34    1.71       70   0.94    1.34
30    0.50    1.67       80   0.98    1.23
40    0.64    1.61       90   1.00    1.11

Table 7.50: Cylindrical Equal-Area Projection.
Figure 7.51: Cylindrical Equal-Area Projection.
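The equal-area mapping y = R sin φ derived above can be sketched in a few lines of code (the function name is mine):

```python
import math

def equal_area_cyl(R, lat, lon, lon0=0.0):
    """Lambert's cylindrical equal-area projection as derived in the
    text: x = R*(lon - lon0), y = R*sin(lat). Angles in radians."""
    return R * (lon - lon0), R * math.sin(lat)

# The band from the equator up to 30 degrees covers sin(30) = 0.5,
# i.e., half of the hemisphere's height on the map, because that band
# contains half of the hemisphere's area:
print(equal_area_cyl(1.0, math.radians(30), 0.0))
```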
Lambert’s design for an equal-area projection can be varied and is used by several similar equal-area cylindrical projections. These vary the standard parallels, the general map proportions, and the ways of distorting shapes. These projections can be converted back to Lambert’s by rescaling both the width and height. Cylindrical equidistant projection. Perhaps the most familiar feature of the cylindrical projections discussed so far is the straight, parallel, and equidistant meridians. A distance measured along a meridian will have true scale because all the meridians have the same length. Given a projection with such meridians, how can we draw the latitudes so as to preserve scale along them too? There seems to be no solution to this problem because the latitudes get shorter as we approach the poles and the only way to
Gerardus Mercator was a well-known 16th century cartographer who is remembered mostly for the useful projection now named after him. He was Flemish (his birth name was Gerard de Cremer) of German ancestry, and the name “Mercator” means “merchant” or “marketer.” Although not a traveler himself, he became interested in geography, maps, and cartography as a young man. His first project, in the mid-1530s, was to construct, with two collaborators, a globe of the Earth. Later, he produced maps of the Holy Land, the world, and Flanders. After being charged with heresy and spending time in prison, he moved to the town of Duisburg, where he became a professional cartographer and also taught mathematics. In 1564 he reached the peak of his career when he became court cosmographer to Duke Wilhelm of Cleve. His famous projection was conceived a few years later as an aid to sea navigation. (Continues. . . )
Figure 7.52: Mercator Projection.
(Mercator’s life, continued)
In 1552, in Duisburg, he opened a cartographic workshop, where he completed a six-panel map of Europe (in 1554) and produced more maps. He devised his famous globe projection and first used it in 1569; it had parallel lines of longitude to aid navigation by sea, as compass courses could be marked as straight lines. Mercator was also the first to use the word atlas to refer to a book of maps. The Mercator Museum in Sint-Niklaas, Belgium, features exhibits about Mercator's life and work. A simple, detailed description of his life and projection can be found in [mercator 05].
Figure 7.53: 45° Oblique Mercator Projection.
fit shorter latitudes among the longitudes is to bend the longitudes. Thus, there is no cylindrical projection that preserves distances along both dimensions.

Pseudocylindrical projections. All the cylindrical projections discussed here (and also those not mentioned here) feature noticeable shape distortions at higher latitudes (where area is normally also greatly exaggerated). The poles are either infinitely stretched to lines or are impossible to include in the projection. Various pseudocylindrical projections have therefore been developed in attempts to correct these shortcomings. These projections feature (1) straight horizontal parallels, not necessarily equidistant, and (2) arbitrary curves for meridians, equidistant along every parallel. The horizontal parallels help to compute and predict phenomena that depend on distance from the equator, such as the lengths of day and night. The constant scale at any point of a parallel makes it easy to measure distances in the direction of a latitude. Parallels and meridians do not always cross at right angles in a pseudocylindrical projection, which is why this type is nonconformal. Most pseudocylindrical projections are known to cause severe shape distortions at polar regions.
Figure 7.54: Mollweide Projection.
The following are examples of pseudocylindrical projections: The Mollweide projection (Figure 7.54) was created in 1805 by Karl Mollweide and popularized by Jacques Babinet in 1857. This equal-area projection was designed to inscribe the world into a 2:1 ellipse, keeping the latitudes straight while still preserving areas. It was developed for educational purposes. All meridians except the central one are equally spaced semiellipses intersecting at the poles and concave toward the central meridian. Because of the aspect ratio chosen by Mollweide, the central meridian is half as long as the equator. The two meridians 90◦ east and west of the central meridian form a circle. The mathematical expression of this projection starts with a point with longitude
θ and latitude φ on the sphere. The point is mapped by this projection to the point

x = (2√2/π)(θ − θ0) cos α  and  y = √2 sin α

on the map, where θ0 is the longitude at the center of the map and α is the solution to the equation 2α + sin(2α) = π sin φ. This projection is also called homalographic, homolographic (from the Greek homo, meaning "same"), elliptical, or Babinet. There is also an interrupted version of the Mollweide projection. Mathematically, this projection is pseudocylindrical equal-area. This projection is sometimes used in thematic world maps. It preserves scale up to latitude 40° (north and south). North and south of this latitude, distortions become more and more severe.

The sinusoidal projection (Figure 7.55), also known as the Sanson-Flamsteed projection and the Mercator equal-area projection, is the simplest pseudocylindrical equal-area projection.
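The Mollweide equation 2α + sin(2α) = π sin φ has no closed-form solution, so an implementation must solve it numerically. The following sketch uses Newton's method, an implementation choice of mine rather than anything prescribed by the text:

```python
import math

def mollweide(lat, lon, lon0=0.0):
    """Mollweide projection of a unit sphere: solve
    2a + sin(2a) = pi*sin(lat) for a by Newton's method, then
    x = (2*sqrt(2)/pi)*(lon - lon0)*cos(a), y = sqrt(2)*sin(a)."""
    a = lat  # initial guess
    for _ in range(50):
        f = 2 * a + math.sin(2 * a) - math.pi * math.sin(lat)
        df = 2 + 2 * math.cos(2 * a)
        if abs(df) < 1e-12:  # derivative vanishes at the poles
            break
        step = f / df
        a -= step
        if abs(step) < 1e-12:
            break
    x = (2 * math.sqrt(2) / math.pi) * (lon - lon0) * math.cos(a)
    y = math.sqrt(2) * math.sin(a)
    return x, y

# The equator maps to y = 0, and the poles to y = +-sqrt(2),
# so the whole map is 2*sqrt(2) units tall and 4*sqrt(2) wide (2:1):
print(mollweide(0.0, 1.0))
print(mollweide(math.pi / 2, 0.0))
```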
Figure 7.55: Sinusoidal Projection.
The width of a degree of longitude is proportional to the cosine of the latitude, and the lines of latitude are straight segments equally-spaced on the map. This combination preserves areas. Specifically, a point with longitude θ and latitude φ on the sphere will be mapped by this projection to the point ((θ − θ0 ) cos φ, φ) on the map (where θ0 is the longitude at the center of the map). This projection does not preserve shapes. Landmasses away from the central meridian are sheared, making them look extremely deformed or even unrecognizable. An interrupted version of this projection reduces distortions considerably because (1) the scale on the equator is uniform, (2) the meridians cross it at right angles, and (3) the vertical scale of the projection does not vary along the equator for different longitudes.
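The sinusoidal mapping given above is simple enough to state in one line; a sketch (the function name is mine):

```python
import math

def sinusoidal(lat, lon, lon0=0.0):
    """Sinusoidal (Sanson-Flamsteed) projection as given in the text:
    a point maps to ((lon - lon0)*cos(lat), lat). Angles in radians,
    unit sphere."""
    return (lon - lon0) * math.cos(lat), lat

# Parallels keep true scale: at 60 degrees, a unit of longitude is
# drawn half as wide as at the equator, since cos(60 deg) = 0.5:
print(sinusoidal(math.radians(60), 1.0))
```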
It is worth mentioning that the sinusoidal and Mollweide projections handle polar regions in complementary ways; while the former crowds them together, the latter results in widely spaced meridians, which leads to more pronounced angular distortion. These two projections are combined in Goode's homolosine projection.

The Eckert IV equal-area world map projection (Figure 7.56) is the fourth in a set of six projections developed in the 1920s by Max Eckert as a pseudocylindrical compromise projection to obtain equal areas. The projection is in the form of a capsule, similar to an ellipse but larger, with curved lines of longitude (see also Figure 7.22). The outer meridians are semicircles, and the inner meridians are elliptical arcs. The central meridian is straight and its height is half the length of the equator.
Figure 7.56: Eckert IV Projection.
The mathematical expression of this projection starts with a point with longitude θ and latitude φ on the sphere. The point is mapped by this projection to the point

x = (2/√(π(4 + π)))(θ − θ0)(1 + cos α)  and  y = 2√(π/(4 + π)) sin α

on the map, where θ0 is the longitude at the center of the map and α is the solution to the equation α + sin α cos α + 2 sin α = (2 + π/2) sin φ.

This projection is often the one favored by climatologists to display climate data. Sometimes it is used as a small inset inside another map (probably because of its pleasing shape), and the National Geographic Society in the United States used it for printing large wall maps of the world.

Conical projections. Projections that employ a cone as the developable surface have limited applications because they result in a noticeable distortion of shapes. Figure 7.57a portrays a cone of height h and radius R. We denote half its top angle by α and notice that α varies in the interval [0, 90°). It is immediately obvious that
l² = h² + R² and sin α = R/l. Part (b) of the figure shows the cone flattened, and the problem is to compute its top angle β. The bottom part of the flattened cone is a circular arc whose length equals the circumference 2πR of the original cone bottom. Thus βl = 2πR, or β = 2πR/l = 2π sin α = 2πR/√(h² + R²). For example, when α = 45°, we get β = 2π sin 45° ≈ 4.44 radians ≈ 255°.
Figure 7.57: A Cone (a) Before and (b) After Flattening.
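The flattened-cone angle β = 2πR/√(h² + R²) can be checked numerically with a few lines of code:

```python
import math

def flattened_cone_angle(h, R):
    """Top angle beta (radians) of a cone of height h and base radius R
    after it is cut open and flattened: beta = 2*pi*R / sqrt(h^2 + R^2),
    as derived in the text."""
    return 2 * math.pi * R / math.sqrt(h * h + R * R)

# For h = R (half top angle alpha = 45 degrees), beta is about 254.6
# degrees, matching the text's estimate of roughly 255 degrees:
print(math.degrees(flattened_cone_angle(1.0, 1.0)))
```

In the limit h → 0 the cone degenerates to a flat disk and β reaches the full 2π, consistent with the formula.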
Clouds are not spheres, mountains are not cones, coastlines are not circles, and bark is not smooth, nor does lightning travel in a straight line. —Benoît Mandelbrot, The Fractal Geometry of Nature (1982).

Figure 7.58 illustrates a simple equidistant conic projection of the Earth. This projection is appropriate for small regions regardless of their shape. It is also acceptable for large regions or even continents of predominant east–west extent. It illustrates the main features of a conic projection, which are as follows:
1. Meridians are straight equidistant lines converging at the apex of the cone (normally a pole). The angular distance between meridians shrinks linearly as we move toward the apex, and the shrink factor is referred to as the cone constant.
2. Parallels are concentric circular arcs whose center is the point of convergence of the meridians. As a result, the parallels cross all the meridians at right angles and distortion is constant along each parallel.
3. In addition, the particular conical projection of Figure 7.58 is neither conformal nor equal-area, but such variations of the conical projection are possible.

Lambert's conic conformal projection. This type of projection was developed by Johann Lambert in 1772. After staying dormant for many years, it was revived during World War I and became the standard projection for intermediate and large-scale maps of regions in midlatitudes. Recall that "conformal" means shape preserving. A conformal mapping also preserves all angles between intersecting lines or curves.

The principle of this projection is illustrated in Figure 7.59a. A cone is placed at a secant to the globe, intersecting the globe in two circles that become standard parallels. The distance between those parallels, which we denote by d, becomes 4/6 of the vertical dimension of the projection. Thus, the projection covers a distance of 6d/4 in the vertical direction. The projection extends d/4 above and d/4 below the standard parallels.
Figure 7.58: A Conical Projection.

The top and bottom of the cone are trimmed, and it is unrolled and takes a shape similar to that featured in Figure 7.59b. Notice the right angles between the (straight) meridians and the (curved) parallels. The scale along the two standard parallels is exact. The scale between them is less than 1, but its smallest value is only 0.99. The scale above and below them is greater than 1 but does not exceed 1.01.

Albers's conic equal-area projection. This projection, developed by Heinrich Albers in 1805, is very similar to Lambert's conic conformal projection. The cone is at a secant to the globe and intersects it at two latitudes. The difference between these two projections is that Albers shifts the parallels on the cone in order to preserve areas in a way similar to the cylindrical equal-area projection. Given a point P on the globe, its projection P* is determined by constructing a straight segment from P that is normal to the cone.

Perspective conic. Neither conformal nor equal-area, this projection maintains true scale at one standard latitude, while increasing distortion away from it. The principle is illustrated in Figure 7.60a. A cone covers part of the globe and is tangent to one latitude φ0. Given a point P on the globe (inside the cone), we extend the straight segment from the center of the globe to P until it intersects the cone. The intersection point is the projection P* of P. A pole may be used instead of the center of the globe as the center of projection. A simple application of similar triangles shows that φ0, the latitude of tangency, is also half the apex angle of the cone. Thus, r0/R = cot φ0. The figure also shows that r = r0 − R tan(φ − φ0) = R cot φ0 − R tan(φ − φ0). It is therefore natural to indicate the position of P* on the flat projection by the polar coordinates (r, θ), where
7 Nonlinear Projections

Figure 7.59: Lambert Conic Conformal Projection.

Figure 7.60: Conic Perspective Projection.
r is the distance from the top (the projection of the pole) and θ is the longitude of P scaled by sin φ0 (the cone constant). Table 7.61 lists ten latitudes, from 0◦ to 90◦, for R = 1 and for φ0 = 45◦ (where R cot φ0 = 1). The differences between the r values of consecutive latitudes are also listed in the table, and it is clear that they increase as we move away (above or below) from φ0. The most common example of this type of projection is the stereographic projection developed by Carl Braun in 1867. It wraps the globe in a cone aligned with the rotation axis. The cone is 1.5 times taller than the globe and is tangent to it at the 30◦ north parallel. The projection center is at the south pole, not at the center of the globe, and the resulting map is a perfect semicircle.

 φ     r     diff.      φ     r     diff.
 0   1.999             50   0.913  0.175
10   1.700  0.300      60   0.732  0.180
20   1.466  0.234      70   0.534  0.198
30   1.268  0.198      80   0.300  0.234
40   1.087  0.180      90   0.001  0.300

Table 7.61: Ten Latitudes and Their Differences.

Pseudoconical projections. In this type of projection (Figure 7.62), the latitudes are still circular arcs with a common center (concentric), and the meridians still converge to this center but are no longer straight. Such projections have been known since the time of Ptolemy but are not commonly used today.
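The r column of Table 7.61 follows directly from r = R cot φ0 − R tan(φ − φ0). The following sketch (Python; the function name and the tolerance are ours, with R = 1 and φ0 = 45° as in the table) reproduces the tabulated values to within their three-decimal rounding:

```python
import math

def conic_r(phi_deg, phi0_deg=45.0, R=1.0):
    """r = R*cot(phi0) - R*tan(phi - phi0), angles in degrees."""
    phi, phi0 = math.radians(phi_deg), math.radians(phi0_deg)
    return R / math.tan(phi0) - R * math.tan(phi - phi0)

# The ten latitudes of Table 7.61.
table = {0: 1.999, 10: 1.700, 20: 1.466, 30: 1.268, 40: 1.087,
         50: 0.913, 60: 0.732, 70: 0.534, 80: 0.300, 90: 0.001}
for phi, r in table.items():
    assert abs(conic_r(phi) - r) < 0.002
```

Note that at φ = 0 the exact value is 2 and at φ = 90◦ it is 0; the table rounds these to 1.999 and 0.001.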
Figure 7.62: Pseudoconical Projections.
Figure 7.62 shows the Bonne (left) and the second Stabius-Werner (right) pseudoconical projections. The first was developed by R. Bonne and the second is one of three pseudoconical designs by Johann Stabius and Johannes Werner.

Other Sphere Projections. Certain applications are best served by sphere projections that do not preserve any of the properties above but instead are a compromise where no feature is greatly distorted. The following are examples of such special projections. Perhaps the most original among them are those that cut (or interrupt) the continuous map into slices or gores. Projections that were especially developed to portray the entire world on one map often result in much distortion, mostly in regions located at the extremes of the projection. To improve the depiction of these distorted areas, “interrupted” forms splitting the projection into gores have been developed. In this approach, many landmasses (or oceans) can have their own central meridian, resulting in true shapes or conformality in each region of the projected map.
Goode’s homolosine equal-area projection (Figure 7.63) is not a general sphere projection. It was developed in 1923 by J. Paul Goode specifically to project the entire Earth while trying to minimize the overall distortion of landmasses. Its main feature is discontinuity. It “interrupts” the map (splitting it into slices called “gores”) in the oceans, with the result that the gores distort the shapes of the oceans while showing the continents in their true shapes. Mathematically, this projection is a combination of the homolographic and sinusoidal projections, hence the name homolosine.
Figure 7.63: Goode’s Homolosine Equal-Area Projection.
Gore is also: 1. A triangular point of land often found at road merges and diverges. 2. A triangular piece of cloth or metal used in three-dimensional fabrication.

The Miller cylindrical projection (Figure 7.64) was developed by Osborn Miller in 1942 in an attempt to modify the Mercator projection to reduce the distortions around the poles and to make it possible to include the poles in the map. This projection is not equal-area, equidistant, or conformal, nor is it perspective. Along the equator, scale is true, and near the equator there is no distortion (although the distortion increases away from the equator, becoming significant at the poles). Miller started with the Mercator projection and moved the latitude lines closer to the equator. The distance L in the Mercator projection between each parallel and the equator was measured, and the parallel was moved to a distance of 0.8L from the equator. Thus, near the equator, this projection is virtually identical to Mercator. Another result of this shrinking of distances is that the height of the lines of longitude (the meridians) is 0.73 times the length of the lines of latitude. Each pole, which on the Earth is a point, is displayed in this projection as a line of latitude, thereby causing maximum distortion at the poles. The mathematical expression of this projection starts with a point with longitude θ and latitude φ on the sphere. The point is mapped by this projection to the point
x = θ − θ0,    y = (5/4) ln tan(π/4 + 2φ/5) = (5/4) sinh−1(tan(4φ/5))
Figure 7.64: Miller Cylindrical Projection.
on the map (where θ0 is the longitude at the center of the map). The Miller cylindrical projection is often selected by cartographers for atlas maps of the world instead of the more popular Mercator projection. Evidently, some mapping experts feel that this variant is somewhat more appropriate or is simply more pleasing to the eye. The expansion was nonlinear; the stars at the center hardly seemed to move, while those toward the edge accelerated more and more swiftly, until they became streaks of light just before they vanished from view.
—Arthur C. Clarke, 2001: A Space Odyssey (1997)
Plate E.1. Subdividing a Cube into a Sphere in Four Steps (Modo).
Plate E.2. Particle Systems: (a) Fire, (b) Gun Fire, (c) Ferns, (d) Gas Fire, (e) Grass, (f) Smoke (Particle Illusion).
Plate E.3. Beach Resort at Cozumel, Oil Painting (AKVIS ArtWork).
Plate E.4. Beach Resort at Cozumel, Comics Style (AKVIS ArtWork).
Plate F.1. (a) The Mandelbrot Set, (b) a Detail, (c) a Smaller Detail (Fractal Domain). (d) Steiner Chain (Geometry Sketchpad). A Steiner chain of circles is a finite sequence of circles, each tangent to two fixed, non-intersecting circles and to two other circles in the sequence.
Original Image
Pixelated
Refracting Torus
Curled
Translucent
Screened
Plate F.2. The Little Drummer (BeLight Software, Image Tricks).
Plate G.1. Shadows From Two Light Sources (Live Interior).
Plate G.2. Plate G.1 in Perspective (Live Interior). Plate G.3. The Polygons Constituting a Dye (Modo).
Plate G.4. A Crystal Cube Reflected by Two Parallel Mirrors (Modo).
The Mirror, Camera, and Cube Layout.
Part III

Curves and Surfaces

The main task of computer graphics is to generate and display three-dimensional objects. Only the outer surface of an object is important, because there is no need to actually penetrate inside. Thus, it is important to develop mathematical methods for the generation of surfaces (more precisely, for developing mathematical models of surfaces). Even though an object is three-dimensional, its surface is two-dimensional. The surface is embedded in three-dimensional space; it may have a complex shape and may be highly curved, folded, crumpled, or crinkled, but it is two-dimensional because it is possible to specify the location of any (three-dimensional) point on the surface by means of two parameters (or coordinates). Stated another way, a surface is two-dimensional because it has no depth; it is infinitely thin.

It turns out that a surface can be considered a set of curves. Imagine a family of curves such that adjacent curves differ only slightly in shape. When the curves are placed side by side, they form a surface. Thus, the key to developing mathematical models of surfaces is to be able to generate curves. Curves, in turn, consist of points. However, a model or an equation of a curve should be simple, easy to compute, and should depend on only a small number of points. Thus, this part of the book is concerned with curves and surfaces, and it starts by discussing how a smooth, continuous curve can be represented by an equation that depends on a small number of key points (and sometimes also on a few tangent vectors).

The most important term in the area of curve and surface design is interpolation. It comes from the Latin inter (between) and polare (to polish), and it means to compute new values that lie between (or that are an average of) certain given values. A typical algorithm for curves starts with a set of points and employs interpolation to compute a smooth curve that passes through the points.
Such points are termed data points and they define an interpolating curve. Some methods start with both points and vectors and compute a curve that passes through the points and at certain locations it also moves in the directions of the given vectors. Another important term in this area is approximation. Certain curve and surface
Curves and Surfaces
methods start with points and perhaps also vectors, and compute a curve or a surface that passes close to the points but not necessarily through them. Such points are known as control points, and the curve or the surface defined by them is referred to as an approximating curve or surface. Most chapters in this part of the book describe interpolation and approximation methods.

Chapter 8 presents the basic theory of curves and surfaces. It discusses the all-important parametric representation and covers basic concepts such as curvature, tangent vectors, normal vectors, curve and surface continuity, and Cartesian products.

Chapter 9 introduces the simplest curves and surfaces. Straight lines, flat planes, triangles, and bilinear and lofted surfaces are presented and illustrated with examples.

Chapter 10 discusses polynomial interpolation. Given a set of points, the problem is to compute a polynomial that passes through them. This problem is then extended to a surface patch that passes through a given two-dimensional set of points. The chapter starts with the important parametric cubic (PC) curves. It continues with the general method of Lagrange interpolation and its relative, the Newton interpolation method. Simple polynomial surfaces are presented, followed by Coons surfaces, a family of simple surface patches based on polynomials.

The mathematically-elegant Hermite interpolation technique is the topic of Chapter 11. The chapter discusses cubic and higher-order Hermite curve segments, special and degenerate Hermite segments, Hermite interpolation curves, the Ferguson surface patch, the Coons surface patch, the bicubic surface patch, and Gordon surfaces. A few other topics are also touched upon.

The important concept of splines is covered in Chapter 12. Spline methods for curves and surfaces are more practical than polynomial methods, and several spline methods are based on Hermite interpolation.
The main topics in this chapter are cubic splines (several varieties are discussed), the Akima spline, cardinal splines, Kochanek–Bartels splines, spline surface patches, and cardinal spline patches.

Chapter 13 is devoted to Bézier methods for curves and surfaces. The Bernstein form of the Bézier curve is introduced, followed by techniques for fast computation of the curve and by a list of the properties of the curve. This leads to a discussion of how to smoothly connect Bézier segments. The de Casteljau construction of the Bézier curve is described next. It is followed by the technique of blossoming and by methods for subdividing the curve, for degree elevation, and for controlling its tension. Sometimes one wants to interpolate a set of points by a Bézier curve, and this problem is also discussed. Rational Bézier curves have important advantages and are assigned a separate section. The chapter continues with material on Bézier surfaces. The topics discussed are rectangular Bézier surfaces and their smooth joining, triangular Bézier surfaces and their smooth joining, and the Gregory surface patch and its tangent vectors.

The last of the “interpolation/approximation” chapters is Chapter 14, on the all-important B-spline technique. B-spline curve topics are the quadratic uniform B-spline curve, the cubic uniform B-spline curve, multiple control points, cubic B-splines with tension, higher-degree uniform B-splines, interpolating B-splines, open uniform B-splines, nonuniform B-splines, the matrix form of the nonuniform B-spline curve, subdividing the B-spline curve, and NURBS. The B-spline surface topics are uniform B-spline surfaces, an interpolating bicubic patch, and a quadratic-cubic B-spline patch.
Subdivision methods for curves and surfaces are discussed in Chapter 15. These methods are also based on interpolation, but are different from the traditional interpolation methods discussed in the preceding chapters. The following important techniques are described in this chapter: the de Casteljau refinement process, Chaikin’s algorithm, the quadratic uniform B-spline curve, the cubic uniform B-spline curve, biquadratic B-spline patches, bicubic B-spline patches, Doo–Sabin subdivision methods, Catmull–Clark surfaces, and Loop subdivision surfaces.

Chapter 16 presents the various types of sweep surfaces. This is a completely different approach to surface design and representation. A sweep surface is generated by constructing a curve and moving it along another curve, while optionally also rotating and scaling it, to create a surface patch. A special case of sweep surfaces is surfaces of revolution. They are created when a curve is rotated about an axis.

Resources for Curves and Surfaces

As is natural to expect, the World Wide Web has many resources for curves and surfaces (a field better known as computer-aided geometric design, or CAGD). In addition to the many texts available in this field, the journals CAD and CAGD carry state-of-the-art papers and articles. See [CAD 04] and [CAGD 04]. Following is a list of some of the most important resources for computer graphics, not just CAGD, current as of late 2010.

http://www.siggraph.org/ is the official home page of SIGGRAPH, the special interest group for graphics, one of many SIGs that are part of the ACM. The Web page http://www.siggraph.org/publications has useful course notes from SIGGRAPH conferences.

The Web page http://www.faqs.org/faqs/graphics/faq/ by John Grieggs has answers to frequently-asked questions on graphics, as well as pointers to other resources. It hasn’t been updated since 1995.

See http://www.cse.ohio-state.edu/~parent/ for the latest on Richard Parent’s book on computer animation.
http://mambo.ucsc.edu/psl/cgx.html is a jumping point to many sites that deal with computer graphics. A similar site is http://www.cs.rit.edu/~ncs/graphics.html that also has many links to CG sites. IEEE Computer Graphics and Applications is a technical journal carrying research papers and news. See http://www.computer.org/portal/web/cga/home. Animation Magazine is a monthly publication covering the entire animation field, computer and otherwise. Located at http://www.bcdonline.com/animag/. Computer Graphics World is a monthly publication concentrating on news, see http://www.cgw.com/. An Internet search for CAD or CAGD returns many sites.
Software Resources
Those who want to experiment with curves and surfaces can either write their own software (most likely in OpenGL) or learn how to use one of several powerful software packages available either commercially or as freeware. Here are a few.

Mathematica, from [Wolfram Research 05], is the granddaddy of all mathematical software. It has facilities for numerical computations, symbolic manipulations, and graphics. It also has all the features of a very high-level programming language.

Matlab (matrix lab), from [Mathworks 05], is a similar powerful package that many find easier to use.

Blender is powerful software that computes and displays many types of curves and surfaces. It has powerful tools for animation and game design and is available for several platforms from [Blender 05].

DesignMentor is a free software package that computes and displays curves, surfaces, and Voronoi regions and triangulations. It is available from [DesignMentor 05].

Wings3D, from [Wings3D 05], is free software that constructs subdivision surfaces.

GIMP is a free image manipulation program for tasks such as photo retouching, image composition, and image authoring. It is available from [GIMP 05] for many operating systems, in many languages, but it does not compute curves and surfaces.

This part of the book is dedicated to the memory of Pierre Bézier (1910–1999).
The question of Beauty takes us out of surfaces, to thinking of the foundations of things.
—Ralph Waldo Emerson, The Conduct of Life (1860)
8 Basic Theory

8.1 Points and Vectors

Real-life methods for constructing curves and surfaces often start with points and vectors, which is why we start with a short discussion of the properties of these mathematical entities. The material in this section applies to both two-dimensional and three-dimensional points and vectors, but the examples are given in two dimensions.

Points and vectors are different mathematical entities. A point has no dimensions; it represents a location in space. A vector, on the other hand, has no well-defined location, and its only attributes are direction and magnitude. People tend to confuse points and vectors because it is natural to associate a point P with the vector v that points from the origin to P (Figure 8.1a). This association is useful, but the reader should bear in mind that P and v are different. Both points and vectors are represented by pairs or triplets of real numbers, but these numbers have different meanings. A point with coordinates (3, 4) is located 3 units to the right of the y axis and 4 units above the x axis. A vector with components (3, 4), however, points in direction 4/3 (it moves 3 units in the x direction for every 4 units in the y direction, so its slope is 4/3) and its magnitude is √(3² + 4²) = 5. It can be located anywhere.

In mathematics, entities are always associated with operations. An entity that cannot be operated on is generally not useful. Thus, we discuss operations on points and vectors. The first operation is to multiply a point P by a real number α. The product αP is a point on the line connecting P to the origin (Figure 8.1b). Note that this line is infinite and αP can be located anywhere on it, depending on the value of α. The next operation is subtracting points. Let P0 = (x0, y0) and P1 = (x1, y1) be two points. The difference P1 − P0 = (x1 − x0, y1 − y0) = (Δx, Δy) is well defined. It is the vector (the direction and distance) from P0 to P1 (Figure 8.1b).

D. Salomon, The Computer Graphics Manual, Texts in Computer Science, DOI 10.1007/978-0-85729-886-7_8, © Springer-Verlag London Limited 2011
Figure 8.1: Operations on Points.
Figure 8.1c shows two pairs of points, a, b and c, d. Points a and c are different, and so are b and d. The vectors b − a and d − c, however, are identical.

Example: The two points P0 = (5, 4) and P1 = (2, 6) are subtracted to produce the pair P1 − P0 = (−3, 2). The new pair is a vector, because it represents a direction and a distance. To get from P0 to P1, we need to move −3 units in the x direction and 2 units in the y direction. Similarly, P0 − P1 is the direction from P1 to P0. The distance between the points is √((−3)² + 2²). These properties do not depend on the particular coordinate axes used. If we translate the origin—or, equivalently, translate the points—m units in the x direction and n units in the y direction, the points will have new coordinates, but the difference will not change. The same property (the difference of points being independent of the coordinate axes) holds after rotation, scaling, shearing, and reflection: the so-called affine transformations (or mappings, Page 218). This is why the operation of subtracting two points is affinely invariant. (Note that the product αP is also affinely invariant.)

The sum of a point and a vector is well defined and is a point. Figure 8.2a shows the two sums P1∗ = P1 + v and P2∗ = P2 + v. It is easy to see that the relative positions of P1∗ and P2∗ are the same as those of P1 and P2. Another way to look at the sum P + v is to observe that it moves us away from P, which is a point, in a certain direction and by a certain distance, thereby bringing us to another point. Yet another way of showing the same thing is to rewrite the relation a − b = v as a = b + v, which shows that the sum of point b and vector v is a point a.

Given any two points P0 and P2, the expression P0 + α(P2 − P0) is the sum of a point and a vector, so it is a point that we can denote by P1. The vector P2 − P0 points from P0 to P2, so adding it to P0 produces a point on the line connecting P0 to P2.
Thus, we conclude that the three points P0 , P1 , and P2 are collinear. Note that the expression P1 = P0 + α(P2 − P0 ) can be written P1 = (1 − α)P0 + αP2 , showing that P1 is a linear combination of P0 and P2 . In general, any of three collinear points can be written as a linear combination of the other two. Such points are not independent. Exercise 8.1: Given the three points P0 = (1, 1), P1 = (2, 2.5), and P2 = (3, 4), are they collinear?
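The claim that P0 + α(P2 − P0) always lies on the line through P0 and P2 is easy to check numerically. A minimal sketch (the helper names are ours, not from the text), using the two points of the subtraction example above as the line endpoints:

```python
def combine(p0, p2, alpha):
    """The point (1 - alpha)*P0 + alpha*P2 on the line through P0 and P2."""
    return tuple((1 - alpha) * a + alpha * b for a, b in zip(p0, p2))

def collinear(p0, p1, p2, eps=1e-9):
    """True if three 2D points lie on one straight line (zero cross product)."""
    return abs((p1[0] - p0[0]) * (p2[1] - p0[1])
             - (p1[1] - p0[1]) * (p2[0] - p0[0])) < eps

p0, p2 = (5.0, 4.0), (2.0, 6.0)
# Any alpha, including values outside [0, 1], yields a collinear point.
for alpha in (0.0, 0.25, 0.5, 1.0, 1.7):
    assert collinear(p0, combine(p0, p2, alpha), p2)
```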
Figure 8.2: (a) Adding a Point and a Vector. (b) Adding Points.
Exercise 8.2: What can we say about four collinear points?

The next operation to consider is the sum of points. In general, this operation is not well defined. We intuitively feel that adding two points should be done like adding vectors. The lines connecting the points with the origin should be added, to produce a sum vector. In fact, as Figure 8.2b shows, this operation depends on the coordinate axes. Moving the origin (or moving the points) will move the sum of the vectors a different distance or in a different direction, thereby changing the sum of the points. This is why the sum of points is, in general, undefined.

Example: Given the two points (5, 3) and (7, −2), we add them to produce (12, 1). We now move the two points one unit to the left to become (4, 3) and (6, −2). Their new sum is (10, 1), a point located two units to the left of the original sum.

There is, however, one important special case where the sum of points is well defined, the so-called barycentric sum. If we multiply each point by a weight and if the weights add up to 1, then the sum of the weighted points is affinely invariant, i.e., it is a valid point. Here is the (simple) proof. If Σ_{i=0}^n wi = 1, then

Σ_{i=0}^n wi Pi = P0 + Σ_{i=1}^n wi Pi − (1 − w0)P0
              = P0 + w1 P1 + w2 P2 + · · · + wn Pn − (w1 + · · · + wn)P0
              = P0 + w1 (P1 − P0) + w2 (P2 − P0) + · · · + wn (Pn − P0)
              = P0 + Σ_{i=1}^n wi (Pi − P0).    (8.1)

This is the sum of the point P0 and the vector Σ_{i=1}^n wi (Pi − P0), and we already know that the sum of a point and a vector is a point. Notice that the proof above does not assume that the weights are nonnegative; barycentric weights can in fact be negative. A little experiment may serve to
convince the sceptics. Given two points (a, b) and (c, d), we construct the barycentric sum (x, y) = −0.5(a, b) + 1.5(c, d). If we now translate both points by the vector (α, β), the sum is modified to −0.5(a + α, b + β) + 1.5(c + α, d + β) = −0.5(a, b) + 1.5(c, d) + (α, β) = (x, y) + (α, β). The barycentric sum (x, y) is translated by the same vector.

Mathematically-savvy readers may be familiar with the concept of normalization. Given a set of weights wi that add up to α ≠ 1, they can be normalized by dividing each weight by the sum α. Thus, if we need a barycentric sum of certain quantities Pi and we are given nonbarycentric weights wi, we can compute

Σ_{i=1}^n [wi / Σ_{j=1}^n wj] Pi = Σ_{i=1}^n (wi/α) Pi = Σ_{i=1}^n ri Pi,
where the new, normalized weights ri are barycentric. Barycentric sums are common in curve and surface design. This book has numerous examples of curves and surfaces that are constructed as weighted sums of points, and they all must be barycentric. When a curve consists of a non-barycentric weighted sum of points, its shape depends on the particular coordinate system used. The shape changes when either the curve or the coordinate axes are moved or are affinely transformed. Such a curve is ill conditioned and cannot be used in practice.

The Isotropic Principle

Given a curve that is constructed as the sum P(t) = Σ wi Pi + Σ ui vi, where the Pi are points and the vi are vectors, the curve is independent of the particular coordinate system used if and only if the weights wi are barycentric. There is no similar requirement for the ui weights. Notice that the points can be data points, control points, or any other points. The vectors can be tangents, second derivatives, or any other vectors, but the statement above is always true. This statement is sometimes known as the isotropic principle.

A special case is the barycentric sum of two points, (1 − t)P0 + tP1. This is a point on the line from P0 to P1. In fact, the entire straight segment from P0 to P1 is obtained when t is varied from 0 to 1 (Figure 8.3a). To see this, we write P(t) = (1 − t)P0 + tP1. Clearly, P(0) = P0 and P(1) = P1. Also, since P(t) = t(P1 − P0) + P0, P(t) is a linear function of t, which implies a straight line in t. The tangent vector is the derivative dP/dt, and it is the constant P1 − P0, the direction from P0 to P1. Notice that this derivative is a vector, not a number. Selecting t = 1/2 yields P(0.5) = 0.5P1 + 0.5P0, the midpoint between P0 and P1. The concept of barycentric weights is so useful that the two numbers 1 − t and t are termed the barycentric coordinates of point P(t) with respect to P0 and P1.
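The translation experiment with weights −0.5 and 1.5 described above can be run directly. The sketch below (helper functions and example values are ours) shows that a weighted sum with barycentric weights moves with a translation of its points, while one whose weights add up to 2 does not:

```python
def weighted_sum(points, weights):
    """Componentwise weighted sum of a list of points."""
    return tuple(sum(w * p[i] for w, p in zip(weights, points))
                 for i in range(len(points[0])))

def translate(p, d):
    return tuple(a + b for a, b in zip(p, d))

pts = [(1.0, 2.0), (4.0, -1.0)]
d = (3.0, 5.0)
moved = [translate(p, d) for p in pts]

# Barycentric weights (-0.5 + 1.5 = 1): the sum is translated by d as well.
s1 = weighted_sum(pts, (-0.5, 1.5))
assert weighted_sum(moved, (-0.5, 1.5)) == translate(s1, d)

# Non-barycentric weights (0.5 + 1.5 = 2): the "sum" shifts by 2d, not d.
s2 = weighted_sum(pts, (0.5, 1.5))
assert weighted_sum(moved, (0.5, 1.5)) != translate(s2, d)
```

In general, translating the points by d shifts the weighted sum by (Σ wi) d, which equals d exactly when the weights are barycentric.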
Figure 8.3: Line and Triangle.
The word barycentric seems to have first been used in [Dupuy 48]. It is derived from barycenter, meaning center of gravity, because such weights are used to calculate the center of gravity of an object. Barycentric weights have many uses in geometry in general and in curve and surface design in particular.
Another useful example is the barycentric coordinates of a two-dimensional point with respect to the three corners of a triangle. Imagine a triangle with corners P0, P1, and P2 (Figure 8.3b). Any point P inside the triangle can be expressed as the weighted combination

P = uP0 + vP1 + wP2,    (8.2)

where u + v + w = 1. The proof is that Equation (8.2) can be written explicitly as three equations in the three unknowns u, v, and w:

Px = uP0x + vP1x + wP2x,
Py = uP0y + vP1y + wP2y,    (8.3)
1 = u + v + w.
The solutions are unique provided that the three equations are independent. Exercise 8.3: Show that Equation (8.3) consists of three independent equations if the three points P0 , P1 , and P2 are independent. Exercise 8.4: Show that the barycentric coordinates of point P0 with respect to P0 , P1 , and P2 are (1, 0, 0). Also discuss the barycentric coordinates of points outside the triangle. Example: Let P0 = (1, 1), P1 = (2, 3), P2 = (5, 1), and P = (2, 2). Equation (8.3) becomes (2, 2) = u(1, 1) + v(2, 3) + w(5, 1); u + v + w = 1,
or

2 = u + 2v + 5w,
2 = u + 3v + w,
1 = u + v + w,

which yield u = 3/8, v = 1/2, w = 1/8.
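Equation (8.3) is an ordinary 3×3 linear system, so the barycentric coordinates can also be computed mechanically. A sketch using plain Cramer's rule (the function name is ours) that reproduces the example:

```python
def barycentric(p, p0, p1, p2):
    """Solve P = u*P0 + v*P1 + w*P2 with u + v + w = 1 (Equation (8.3))
    for (u, v, w) via Cramer's rule on the 3x3 system."""
    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
              - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
              + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    a = [[p0[0], p1[0], p2[0]],      # columns are P0, P1, P2
         [p0[1], p1[1], p2[1]],
         [1.0,   1.0,   1.0]]
    b = [p[0], p[1], 1.0]
    d = det3(a)
    sol = []
    for j in range(3):               # replace column j by b
        m = [row[:] for row in a]
        for i in range(3):
            m[i][j] = b[i]
        sol.append(det3(m) / d)
    return tuple(sol)                # (u, v, w)

u, v, w = barycentric((2, 2), (1, 1), (2, 3), (5, 1))
assert abs(u - 3/8) < 1e-12 and abs(v - 1/2) < 1e-12 and abs(w - 1/8) < 1e-12
```

The solution is unique whenever det3(a) is nonzero, i.e., whenever the three corners are independent.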
Exercise 8.5: For a given triangle, calculate the (x, y, z) coordinates of the point with barycentric coordinates (1/3, 1/3, 1/3). This point is called the centroid and is one of many centers that can be defined for a triangle. (Imagine cutting the triangle out of a piece of cardboard. If you try to support it at the centroid, it will balance.) (This material is useful for the triangular Bézier surface patches of Section 13.25.)

The barycentric combination is the most fundamental operation on points; so much so that it is used to define affine transformations. The definition is: a transformation of points in space is affine if it leaves barycentric combinations invariant. Hence, if P = Σ wi Pi and Σ wi = 1, and if T is an affine transformation, then TP = Σ wi TPi. All common geometric transformations—such as scaling, shearing, rotation, and reflection—are affine.

Note. The difference of two points is a vector. We can consider such a difference a weighted sum where the weights add up to zero (they are +1 and −1). It turns out that a weighted sum of points where the weights add up to zero is a vector. To prove this, let

Q = Σ_{i=1}^n wi Pi,    where Σ wi = 0,
and let P be a point. The sum R = Q + P is barycentric (since its coefficients add up to 1) and is therefore a point. The difference R − P = Q is a difference of points and is therefore a vector.

Note. Multiplying a point by a number produces a point, so if P is a point, then −P is also a point. It is located on the line connecting P with the origin, on the other side of the origin from P. Once this is understood, we notice that the sum of points P + Q can be written as the difference of points P − (−Q). This difference is, of course, the vector from point −Q to point P (Figure 8.4), so we conclude that the sum P + Q of two points is well defined but is not very useful, since it tells us something about the relative positions of P and −Q, not of P and Q. Assuming that Figure 8.4 depicts the points Q = (−5, −1) and P = (4, 3), the sum P + Q equals (−5, −1) + (4, 3) = (−1, 2). This shows that in order to get from point −Q to point P, we need to move one negative step in the x direction for every two steps in the y direction.
Figure 8.4: Adding Two Points.
Exercise 8.6: Let P and Q be points and let v and w be vectors. What is the sum P − Q + v + w?
8.1.1 Operations on Vectors

The notation |P| indicates the magnitude (or absolute value) of vector P. Vector addition is defined by adding the individual elements of the vectors being added: P + Q = (Px, Py, Pz) + (Qx, Qy, Qz) = (Px + Qx, Py + Qy, Pz + Qz). This operation is both commutative, P + Q = Q + P, and associative, P + (Q + T) = (P + Q) + T. Subtraction of vectors (P − Q) is done similarly and results in the vector from Q to P. Vectors can be multiplied in three different ways as follows (see also Appendix A for a more detailed discussion):

1. The product of a real number α and a vector P is denoted by αP and produces the vector (αPx, αPy, αPz). It changes the magnitude of P by a factor α, but does not change its direction.

2. The dot product of two vectors is denoted by P • Q and is defined as the scalar

(Px, Py, Pz)(Qx, Qy, Qz)^T = PQ^T = Px Qx + Py Qy + Pz Qz.

This also equals |P| |Q| cos θ, where θ is the angle between the vectors. The dot product of perpendicular vectors (also called orthogonal vectors) is therefore zero. The dot product is commutative, P • Q = Q • P.

The triple product (P • Q)R is sometimes useful. It can be represented as

(P • Q)R = (Px Qx + Py Qy + Pz Qz)(Rx, Ry, Rz)
         = ((Px Qx + Py Qy + Pz Qz)Rx, (Px Qx + Py Qy + Pz Qz)Ry, (Px Qx + Py Qy + Pz Qz)Rz)
         = (Qx, Qy, Qz) [ Px Rx  Px Ry  Px Rz ]
                        [ Py Rx  Py Ry  Py Rz ]
                        [ Pz Rx  Pz Ry  Pz Rz ]
         = Q(PR),    (A.3)

where the notation (PR) stands for the 3×3 matrix of Equation (A.3), whose (i, j) element is Pi Rj.

3. The cross product of two vectors (also called the vector product) is denoted by P×Q and is defined as the vector

P×Q = (Py Qz − Pz Qy, −Px Qz + Pz Qx, Px Qy − Py Qx).    (A.4)
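The products above can be sketched in a few lines of code; in particular, the triple-product identity (P • Q)R = Q(PR) of Equation (A.3) then reduces to an elementwise check (helper names and example values are ours):

```python
def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def cross(p, q):
    """Equation (A.4): the vector product of three-dimensional P and Q."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

p, q, r = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)

# The cross product is perpendicular to both of its factors.
assert dot(cross(p, q), p) == 0.0 and dot(cross(p, q), q) == 0.0

# Triple-product identity (P.Q)R = Q(PR), with (PR)ij = Pi*Rj.
m = [[pi * rj for rj in r] for pi in p]                      # the matrix (PR)
q_m = tuple(sum(q[i] * m[i][j] for i in range(3)) for j in range(3))
assert q_m == tuple(dot(p, q) * c for c in r)
```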
8.1.2 Projecting a Vector

A common and useful operation on vectors is projecting a vector a on another vector b. The idea is to break vector a up into two perpendicular components c and d, such that c is in the direction of b. This operation is also used in Section 19.6.
Figure 8.5: Projecting a Vector.
Figure 8.5a shows that a = c + d and |c| = |a| cos α. On the other hand, a • b = |a| |b| cos α, yielding the magnitude of c:

|c| = |a| (a • b)/(|a| |b|) = (a • b)/|b|.   (8.4)

The direction of c is identical to the direction of b, so we can write vector c as

c = |c| b/|b| = ((a • b)/|b|²) b.   (8.5)
Example: Given vectors a = (2, 1) and b = (1, 0), we compute the projection of a on b:

c = ((a • b)/|b|²) b = ((2×1 + 1×0)/(1² + 0²)) (1, 0) = (2, 0),
d = a − c = (0, 1).
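The computation is mechanical and works in any dimension. A small Python sketch of Equations (8.4) and (8.5) (illustrative names):

```python
# Equations (8.4) and (8.5) in code: c = ((a.b)/|b|^2) b and d = a - c.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def project(a, b):
    """Split a into c (along b) and d (perpendicular to b)."""
    s = dot(a, b) / dot(b, b)                 # (a.b)/|b|^2
    c = tuple(s * x for x in b)
    d = tuple(x - y for x, y in zip(a, c))
    return c, d

c, d = project((2, 1), (1, 0))
assert c == (2.0, 0.0) and d == (0.0, 1.0)
assert dot(c, d) == 0                         # the two components are perpendicular
```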
Exercise 8.7: The projection method also works for three-dimensional vectors. Given vectors a = (2, 1, 3) and b = (1, 0, −1), calculate the projection of a on b.
8.1.3 An Application We apply vector projections to the calculation of the direction of reflection. Figure 8.5b shows a ray of light l reflecting from a surface at a point with a normal vector N. The ray is reflected in a direction r such that the angle of incidence equals the angle of reflection. Assuming that l and N are given and that N is a unit vector, we calculate r. The idea is to project l in the direction of N, yielding

c = ((l • N)/|N|²) N = (l • N)N,   d = l − c.   (8.6)

Vector r is then given as the difference (Figure 8.5c)

r = d − c = l − 2c = l − 2(l • N)N.

Equation (8.6) implies |r| = |l|. In practice, the intensity of the reflected light is lower than that of the incident beam and is determined by a shading model (such as in Section 17.2).
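The whole calculation fits in a few lines. A Python sketch of the reflection formula r = l − 2(l • N)N (the ray and normal below are our own test data):

```python
import math

# Reflection direction r = l - 2(l.N)N for a unit normal N (illustrative).

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def reflect(l, n):
    """Reflect the incident vector l about the unit normal n."""
    s = 2 * dot(l, n)
    return tuple(x - s * y for x, y in zip(l, n))

l = (1.0, -1.0, 0.0)                 # ray going down onto the xz plane
n = (0.0, 1.0, 0.0)                  # unit normal of the plane
r = reflect(l, n)
assert r == (1.0, 1.0, 0.0)          # angle of incidence = angle of reflection
assert math.isclose(dot(r, r), dot(l, l))   # |r| = |l|
```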
8.2 Length of Curves To compute the arc length of the curve P(t), 0 ≤ t ≤ 1 (see Section 8.6 for parametric curves), we first divide the arc into a large number of short, straight segments of length |δP|. The length of the arc then approximately equals the sum Σ|δP|. Figure 8.6a shows that δP = P(t + δt) − P(t). On the other hand, δP/δt ≈ Pᵗ(t). In the limit, when δt → 0, we can write dP(t)/dt = Pᵗ(t), or dP(t) = Pᵗ(t) dt, and obtain the exact arc length by replacing the sum with the integral

∫₀¹ |dP(t)| = ∫₀¹ |Pᵗ(t)| dt.
Figure 8.6: Arc Length and Area of a Curve.
To find the area subtended at the origin by the vectors P(0) and P(1) and the curve, we again divide the curve into many straight segments of length |δP| and create the narrow triangles shown in Figure 8.6b. The area of each triangular slice is (1/2)P(t)×δP, or (1/2)P(t)×Pᵗ(t)δt, so, in the limit, the total area is the integral

(1/2) ∫₀¹ P(t)×Pᵗ(t) dt.
Note that the above expression is a vector. Its magnitude is the area and its direction is perpendicular to the plane defined by P(0) and P(1). The length of your education is less important than its breadth, and the length of your life is less important than its depth. —Marilyn vos Savant.
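The chord-summing construction above is also a practical numerical method. A Python sketch (illustrative; not part of the book's Mathematica listings):

```python
import math

# Arc length as the limit of summed chord lengths |P(t + dt) - P(t)|.

def arc_length(curve, t0, t1, n=10000):
    """Approximate the length of P(t) for t0 <= t <= t1."""
    total, prev = 0.0, curve(t0)
    for i in range(1, n + 1):
        cur = curve(t0 + (t1 - t0) * i / n)
        total += math.dist(prev, cur)   # one short chord |dP|
        prev = cur
    return total

# quarter of the unit circle: the exact length is pi/2
quarter = lambda t: (math.cos(t), math.sin(t))
assert math.isclose(arc_length(quarter, 0, math.pi / 2), math.pi / 2, rel_tol=1e-6)
```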
8.3 Example: Area of Planar Polygons Vectors can be used to compute areas of polygons (see Section 9.2.7 for the area of a triangle and Section 2.18 for two-dimensional polygons). Given a planar polygon with consecutive vertices P0, ..., Pn, we denote by Pi the vector from the origin to point Pi. The area of the polygon is then given by one of the following expressions:

(1/2) Σ_{i=0}^{n} Pi × Pi+1,    (1/2) N • Σ_{i=0}^{n} Pi × Pi+1,

where Pn+1 = P0. The expression on the left applies to polygons that lie in the xy plane. The one on the right applies to polygons that lie in some plane perpendicular to a given unit vector N. These expressions hold even for nonconvex polygons.
The first expression can be summarized in an easy-to-remember way. Assuming that the vertices Pi = (xi, yi) are enumerated counterclockwise, the area can be expressed as

\frac{1}{2}\left\| \begin{matrix} x_1 & x_2 & x_3 & \cdots & x_n & x_1 \\ y_1 & y_2 & y_3 & \cdots & y_n & y_1 \end{matrix} \right\| \;\stackrel{\mathrm{def}}{=}\; \frac{1}{2}\bigl[(x_1y_2 + x_2y_3 + \cdots + x_ny_1) - (x_2y_1 + x_3y_2 + \cdots + x_1y_n)\bigr],

where the two-row operator ‖· · ·‖ is defined as the sum of the products of the "downward" diagonals, minus the sum of the products of the "upward" diagonals.
Pick's theorem: The coordinates of a pixel are integers, so we can think of pixels as points of a grid. A polygon made of pixels is therefore a lattice polygon. There is an elegant formula, called Pick's theorem, for the area of a lattice polygon. Given a polygon whose boundary consists of connected nonintersecting straight segments, its area is I + B/2 − 1, where I is the number of interior lattice points and B is the number of boundary lattice points. For example, the area of the simple lattice polygon of Figure 8.7 is 31 + 15/2 − 1 = 37.5.
Figure 8.7: Illustration of Pick’s Theorem.
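Both the diagonal-products expression and Pick's theorem are easy to check on small lattice polygons (the polygons below are our own examples, not the one in Figure 8.7). A Python sketch:

```python
# Area by the diagonal-products ("shoelace") rule, checked against Pick's
# theorem I + B/2 - 1 on hand-made lattice polygons.

def shoelace(pts):
    """(1/2) sum (x_i y_{i+1} - x_{i+1} y_i), with P_{n+1} = P_0."""
    s = 0.0
    for i in range(len(pts)):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % len(pts)]
        s += x0 * y1 - x1 * y0
    return s / 2.0

# a 3x2 rectangle: I = 2 interior points, B = 10 boundary points
assert shoelace([(0, 0), (3, 0), (3, 2), (0, 2)]) == 6.0
assert 2 + 10 / 2 - 1 == 6                     # Pick's theorem agrees
# a nonconvex (L-shaped) lattice polygon also works
assert shoelace([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]) == 3.0
```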
8.4 Example: Volume of Polyhedra Assume that a polyhedron with planar polygonal faces S0, ..., Sn is given. We denote by Qi any point on face Si, and by Ni the outward-pointing unit normal vector to face Si. The volume of the polyhedron is then given by

V = (1/3) Σ_{i=0}^{n} (Qi • Ni) area(Si),

where Qi is the vector from the origin to point Qi. We can apply the previous example to the areas of the faces. Denote by P0j, ..., Pmj the vertices of face Sj, enumerated counterclockwise with respect to Nj; then the area of Sj is

area(Sj) = (1/2) Nj • Σ_{k=0}^{m} Pkj × Pk+1,j.

We simplify the result above in two ways: (1) instead of a general point on face Sj, we choose vertex P0j, and (2) we express the unit normal to face Sj in terms of the three vertices P0j, P1j, and P2j:

Nj = ((P1j − P0j) × (P2j − P0j)) / |(P1j − P0j) × (P2j − P0j)|.

Combining all this results in the following expression for the volume:

V = (1/6) Σ_j (P0j • Nj) [Nj • Σ_k Pkj × Pk+1,j].
Note, again, that this expression holds even for nonconvex polyhedra.
Summary: The following operations have been discussed in this section:

point − point = vector,
scalar × vector = vector,
scalar × point = point,
vector ± vector = vector,
point + vector = point,
vector • vector = scalar,
vector × vector = vector.
The operation point + point is left undefined (since it is not useful). A barycentric sum of points is a point, and a weighted sum of points where the weights add up to zero is a vector. From the dictionary Vector: (1) A variable quantity that can be resolved into components. (2) A straight line segment whose length is magnitude and whose orientation in space is direction. (3) Any agent (person or animal or microorganism) that carries and transmits a disease.
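The volume formula of this section can be exercised on a shape whose volume is known. A Python sketch for the unit cube (the face data below is listed by hand and is our own example):

```python
# Volume from faces: V = (1/3) sum (Q_i . N_i) area(S_i).  Each face is given
# as (point on face, outward unit normal, face area).

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def volume(faces):
    return sum(dot(q, n) * a for q, n, a in faces) / 3.0

unit_cube = [
    ((1, 0, 0), ( 1, 0, 0), 1.0), ((0, 0, 0), (-1, 0, 0), 1.0),
    ((0, 1, 0), ( 0, 1, 0), 1.0), ((0, 0, 0), ( 0,-1, 0), 1.0),
    ((0, 0, 1), ( 0, 0, 1), 1.0), ((0, 0, 0), ( 0, 0,-1), 1.0),
]
assert volume(unit_cube) == 1.0   # the unit cube has volume 1
```

Only the three faces not passing through the origin contribute, since Qi • Ni = 0 for the others; this is why any point on the face may serve as Qi.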
8.5 Parametric Blending Parametric blending is a family of techniques that make it possible to vary the value of some quantity in small steps, without any discontinuities. Blending can be thought of as averaging or interpolating. The following are examples:
1. Numbers. The average of the two numbers 15 and 18 is (15 + 18)/2 = 16.5. This can also be written as 0.5×15 + 0.5×18, which can be interpreted as the blend, or the weighted sum, of the two numbers, where each is assigned a weight of 0.5. When the weights are different, such as 0.9×15 + 0.1×18, the result is a blend of 90% of 15 and 10% of 18.
2. Points. If P1 and P2 are points, then the expression αP1 + βP2 is a blend of the two points, in which α and β are the weights (or the coefficients). If α and β are nonnegative and α + β = 1, then the blend is a point on the straight segment connecting P1 and P2.
3. Rotations. A rotation in three dimensions is described by means of the rotation angle (one number) and the axis of rotation (three numbers). These four numbers can be combined into a mathematical entity called a quaternion, and two quaternions can also be blended, resulting in a smooth sequence of rotations that proceeds in small, equal steps from an initial rotation to a final one. This type of blending is useful in computer animation. Sections 4.4.5 and 19.8.2 and Appendix B have more information about these interesting mathematical objects.
4. Curve construction. Given a number of points, a curve can be created as a weighted sum of the points. It has the form Σ wi(t)Pi, where the weights wi(t) are barycentric. Such a curve is a blend of the points. For each value of t, the blend is different, but we have to make sure that the sum of the weights is always 1. It is possible to blend vectors, in addition to points, as part of the curve, and the weights of the vectors don't have to satisfy any particular requirement.
Most of the curve methods described in this book generate a curve as a blend of points, vectors, or both. A special case of curve construction is the linear blending of two points, which can be expressed as (1 − t)P1 + tP2 for 0 ≤ t ≤ 1 (this is the fundamental Equation (9.1)).
5. Surfaces. Using the same principle, points, vectors, and curves can be blended to form a surface patch.
6. Images. Various types of image processing, such as sharpening, blurring, and embossing, are performed by blending an image with a special mask image (Section 2.31 and Plate H.4).
7. It is possible to blend points in nonlinear ways (Section 19.9.1). An intuitive way to get, for example, quadratic blending is to square the two weights of the linear blend. However, the result, which is P(t) = (1 − t)²P1 + t²P2, depends on the particular coordinate axes used, since the two coefficients (1 − t)² and t² are not barycentric. It turns out that the sum (1 − t)² + 2t(1 − t) + t² equals 1. As a result, we can use quadratic blending to blend three points, but not two. Similarly, if we try a cubic blend by simply writing P(t) = (1 − t)³P1 + t³P2, we end up with the same problem. Cubic blending can be achieved by adding four terms with weights t³, 3t²(1 − t), 3t(1 − t)², and (1 − t)³. We therefore conclude that Bézier methods (Chapter 13) can be used for blending. The Bézier curve is a result of blending several points with the Bernstein polynomials,
which add up to unity. Quadratic and cubic blending are special cases of the Bézier blending (or the Bézier interpolation).
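Examples 1, 2, and 4 above all reduce to one weighted-sum routine. A Python sketch (the helper is ours, for illustration):

```python
import math

# Blending as a weighted sum, as in examples 1, 2, and 4 above.

def blend(weights, values):
    """Weighted sum  sum w_i * v_i  of numbers or of points (tuples)."""
    if isinstance(values[0], tuple):
        return tuple(sum(w * v[k] for w, v in zip(weights, values))
                     for k in range(len(values[0])))
    return sum(w * v for w, v in zip(weights, values))

# example 1: blending two numbers
assert blend((0.5, 0.5), (15, 18)) == 16.5
assert math.isclose(blend((0.9, 0.1), (15, 18)), 15.3)

# linear blending of two points, (1-t)P1 + tP2, lies on the segment P1P2
t = 0.25
assert blend((1 - t, t), ((0.0, 0.0), (4.0, 8.0))) == (1.0, 2.0)
```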
8.6 Parametric Curves As mentioned in the Preface, the main aim of computer graphics is to display an arbitrary surface so that it looks real. The first step toward this goal is an understanding of curves. Once we have an algorithm to model (calculate) and display any curve, we may try to extend it to a surface. In practice, curves (and surfaces) are specified by the user in terms of points and are constructed in an interactive process. The user starts by entering the coordinates of points, either by scanning a rough image of the desired shape and digitizing certain points on the image, or by drawing a rough shape on the screen and selecting certain points with a pointing device such as a mouse. After the curve has been drawn, the user may want to modify its shape by moving, adding, or deleting points. Such points can be employed in two different ways:
1. We may want the curve to pass through them. Such points are called data points and the curve is called an interpolating curve.
2. We may want the points to control the shape of the curve by exerting a "pull" on it. A point may pull part of the curve toward it, allowing the user to change the shape of the curve by moving the point. Generally, however, the curve does not pass through the point. Such points are called control points and the curve is called an approximating curve.
A mathematical function y = f(x) can be plotted as a curve. Such a function is the explicit representation of the curve. The explicit representation is not general, since it cannot represent vertical lines and is also single-valued. For each value of x, only a single value of y is normally computed by the function. The implicit representation of a curve has the form F(x, y) = 0. It can represent multivalued curves (more than one y value for an x value). A common example is the circle, whose implicit representation is x² + y² − R² = 0. The explicit and implicit curve representations can be used only when the function is known.
In practical applications—where complex curves such as the shape of a car or of a toaster are needed—the function is normally unknown, which is why a different approach is required. The curve representation used in practice is called the parametric representation. A two-dimensional parametric curve has the form P(t) = (f(t), g(t)) or P(t) = (x(t), y(t)). The functions f and g give the (x, y) coordinates of any point on the curve, and the points are obtained when the parameter t is varied over a certain interval [a, b], normally [0, 1]. A simple example of a two-dimensional parametric curve is P(t) = (2t − 1, t²). When t is varied from 0 to 1, the curve proceeds from the initial point P(0) = (−1, 0) to the final point P(1) = (1, 1). The x coordinate is linear in t and the y coordinate varies as t².
˙ or by (Pxt (t), Pyt (t)). This The first derivative dP(t) is denoted by Pt (t), or by P, dt derivative is the tangent vector to the curve at any point. The derivative is a vector and not a point because it is the limit of the difference (P(t + Δ) − P(t))/Δ, and the difference of points is a vector. As a vector, the tangent possesses a direction (the direction of the curve at the point) and a magnitude (which indicates the speed of the curve at the point). The tangent, however, is not the slope of the curve. The tangent is a pair of numbers, whereas the slope is a single number. The slope equals tan θ, where θ is the angle between the tangent vector and the x axis. The slope of a two-dimensional parametric curve is obtained by dy = dx
dy dt dx dt
=
Pyt (t) . Pxt (t)
Example: The curve P(t) = (x(t), y(t)) = (1 + t2 /2, t2 ). Its tangent vector is P (t) = (t, 2t) and the slope is 2t/t = 2. The slope is constant, which indicates that the curve is a straight line. This is also easy to see from the tangent vector. The direction of this vector is always the same since it can be described by saying “for every t steps in the x direction, move 2t steps in the y direction.” t
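The slope formula can also be checked numerically. A Python sketch that approximates yᵗ(t)/xᵗ(t) with central differences and confirms the constant slope 2 of the straight-line example (the helper is ours, for illustration):

```python
import math

# dy/dx = y'(t)/x'(t), approximated with central differences, checked on the
# straight-line example P(t) = (1 + t^2/2, t^2) of this section.

def slope(x, y, t, h=1e-6):
    dx = (x(t + h) - x(t - h)) / (2 * h)
    dy = (y(t + h) - y(t - h)) / (2 * h)
    return dy / dx

x = lambda t: 1 + t * t / 2
y = lambda t: t * t
for t in (0.3, 0.7, 1.0, 2.0):
    assert math.isclose(slope(x, y, t), 2.0, rel_tol=1e-6)  # constant slope 2
```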
Example: A circle. Because of its high symmetry, a circle can be represented in different ways. We list four different parametric representations of a circle of radius R centered on the origin.
1. P(t) = R(cos t, sin t), where 0 ≤ t ≤ 2π. This is identical to the polar representation.
2. Substituting t = tan(u/2) yields P(t) = R((1 − t²)/(1 + t²), 2t/(1 + t²)). When 0 ≤ t ≤ 1, this generates the first quadrant from (R, 0) to (0, R) (see also Figure 8.8a).
3. P(t) = R(t, ±√(1 − t²)). When 0 ≤ t ≤ 1, this generates the first quadrant from (0, R) to (R, 0) and, simultaneously, the fourth quadrant from (0, −R) to (R, 0).
4. P(t) = (0.441, −0.441)t³ + (−1.485, −0.162)t² + (0.044, 1.603)t + (1, 0). When 0 ≤ t ≤ 1, this generates (approximately) the first quadrant from (1, 0) to (0, 1). (See also the circle example in Section 13.15 and Equation (Ans.35).)
Exercise 8.8: Explain how representation 4 is derived.
Exercise 8.9: Figure 8.8b shows a polygon inscribed in a circle. It is clear that adding sides to the polygon brings it closer to the circle. Calculate the difference R − d as a function of n, the number of polygon sides.
The particle paradigm. Deeper insight into the behavior of parametric functions can be gained by thinking of the curve P(t) = (x(t), y(t)) as a path traced out by a hypothetical particle. The parameter t can then be interpreted as time, and the first two derivatives Pᵗ(t) and Pᵗᵗ(t) can be interpreted as the velocity and acceleration of the particle, respectively. It turns out that different parametric representations of the same curve may have different "speeds." The particle represented by (cos t, sin t), for example, "moves" along the circle at the velocity Pᵗ(t) = (−sin t, cos t), whose magnitude is constant since |Pᵗ(t)| = √(sin²t + cos²t) = 1. The particle of circle representation 2, on the other
hand, moves at the variable velocity

Pᵗ(t) = R(−4t/(1 + t²)², 2(1 − t²)/(1 + t²)²).
Figure 8.8: (a) A Parametric Representation. (b) A Polygon Inscribed in a Circle.
Exercise 8.10: Show that this velocity varies with t.
Exercise 8.11: What three-dimensional curve is described by the parametric function (cos t, sin t, t)? (Hint: see Section 9.4.1.)
See also Page 840 for the parametric representations of the sphere, the ellipsoid, and the torus as a small circle rotating around a larger circle.
Straight line—the shortest way between two points. —Euclid.
Cycloid—the fastest way between two points. —Johann Bernoulli.
Curve—the loveliest way between two points. —Mae West.
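Representations 1 to 3 can be spot-checked numerically; all satisfy x² + y² = R². A Python sketch (representation 4 is only approximate, so it is omitted; the variable names are ours):

```python
import math

# Spot check: representations 1-3 of the circle all satisfy x^2 + y^2 = R^2.

R = 2.0
rep1 = lambda t: (R * math.cos(t), R * math.sin(t))
rep2 = lambda t: (R * (1 - t*t) / (1 + t*t), R * 2*t / (1 + t*t))
rep3 = lambda t: (R * t, R * math.sqrt(1 - t*t))   # the "+" branch

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    for rep in (rep1, rep2, rep3):
        x, y = rep(t)
        assert math.isclose(x*x + y*y, R*R)

# representation 2 covers the first quadrant from (R, 0) to (0, R)
assert rep2(0) == (R, 0.0) and rep2(1) == (0.0, R)
```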
8.7 Properties of Parametric Curves Generally, it is impossible to tell much about the behavior of a parametric curve P(t) = (x(t), y(t)) by examining the two components x(t) and y(t) separately. Each of the two functions may have features that do not exist in the combination. The reverse is also true—the combined curve may have features not found in either of the two components. Here is an example of two smooth curves whose combination is a parametric plane curve with a cusp (a sharp corner). The following two curves are polynomials in t:

x(t) = −18t² + 18t + 2,   y(t) = −16t³ + 24t² − 12t + 5,   where 0 ≤ t ≤ 1.

They are smooth, since their derivatives xᵗ(t) = −36t + 18 and yᵗ(t) = −48t² + 48t − 12 are continuous in the range 0 ≤ t ≤ 1. However, the combined curve

P(t) = (0, −16)t³ + (−18, 24)t² + (18, −12)t + (2, 5)
has a sharp corner (a cusp or a kink), because its tangent vector Pᵗ(t) = 3(0, −16)t² + 2(−18, 24)t + (18, −12) satisfies Pᵗ(0.5) = (0, 0).
Exercise 8.12: Find two curves x(t) and y(t), each with a cusp, such that the combined curve P(t) = (x(t), y(t)) is smooth.
Exercise 8.13: Find three curves x(t), y(t), and z(t), each a cubic polynomial, such that the combined curve P(t) = (x(t), y(t), z(t)) is not a cubic polynomial.
The parametric curves used in computer graphics are normally based on polynomials, since polynomials are simple functions that are easy to calculate and are flexible enough to create many different shapes. However, in principle, any functions can be used to create a parametric curve. Here is an example that uses the smooth sine and cosine curves to create a nonsmooth parametric curve. It is defined by the simple expression

P(t) = (2 cos(t) + cos(2t), 2 sin(t) − sin(2t)),
where 0 ≤ t ≤ 2π. This curve has cusps (points where its tangent vector vanishes) at t = 0, t = 2π/3, and t = 4π/3. Another example of a parametric curve that's not a simple polynomial is the circular Bézier curve, Equation (13.43).
Note. A word about the notation used here. We have used the letter P to denote both points and curves. The same letter is later used to denote surfaces. In spite of using the same letter, the notation is unambiguous. It is always easy to tell what a particular P stands for by counting the number of free parameters. Something like P(u, w) denotes a surface since it depends on two variable parameters, whereas P(0, w) is a curve and P(u0, 1) (for a fixed u0) is a point.
One important feature of curves is independence of the coordinate axes. We don't want the curve to change shape when the coordinate axes (or the points defining the curve) are moved rigidly or rotated. Here is an example of how such a thing can happen. Consider the parametric curve
P(t) = (1 − t)³P0 + t³P1 = ((1 − t)³x0 + t³x1, (1 − t)³y0 + t³y1).

It is easy to see that P(0) = P0 and P(1) = P1 (the curve passes through the two points). What kind of a curve is P(t)? The tangent vector of our curve is

(dx/dt, dy/dt) = (−3(1 − t)²x0 + 3t²x1, −3(1 − t)²y0 + 3t²y1).
To calculate the slope, we have to select actual points. We start with the two points P0 = (0, 0) and P1 = (5, 6). The slope of the curve is

dy/dx = (dy/dt)/(dx/dt) = (−3(1 − t)²·0 + 3t²·6)/(−3(1 − t)²·0 + 3t²·5) = 6/5 = constant,

so the curve is a straight line. Next, we translate both points by the same amount (0, −1), so the new points are P0 = (0, −1) and P1 = (5, 5). The new slope is

(3(1 − t)² + 15t²)/(15t²) = (1/5)(1/t − 1)² + 1.

It is no longer constant and therefore the curve is no longer a straight line (Figure 8.9). The curve has changed its shape just because its endpoints have been moved!
(* non-barycentric weights example *) Clear[p0,p1,g1,g2,g3,g4]; p0 = {0, 0}; p1 = {5, 6}; g1 = ParametricPlot[(1-t)^3 p0+t^3 p1, {t,0,1}, PlotRange->All, DisplayFunction->Identity]; g3=Graphics[{Red, AbsolutePointSize[4], {Point[p0], Point[p1]}}]; p0 = {0, -1}; p1 = {5, 5}; g2=ParametricPlot[(1-t)^3 p0+t^3 p1, {t, 0, 1}, PlotRange->All, PlotStyle->AbsoluteDashing[{2, 2}], DisplayFunction->Identity]; g4=Graphics[{Red, AbsolutePointSize[6], {Point[p0], Point[p1]}}]; Show[g2, g1, g3, g4, PlotRange->All, AspectRatio->.5]
Figure 8.9: Effect of Nonbarycentric Weights.
It turns out that a curve of the form P(t) = Σ_{i=0}^{n} wi(t)Pi is independent of the particular coordinate axes used if Σ_{i=0}^{n} wi(t) = 1. This is arguably the most important property of barycentric weights. It is easy to extend the concept of parametric curves to three dimensions (space curves) with two minor differences: (1) P(t) should be of the form (x(t), y(t), z(t)) and (2) the slope of a three-dimensional curve is undefined. Such a curve has a tangent vector dP/dt, but not a slope.
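The barycentric property can be verified numerically: translating the points translates the curve only when the weights sum to 1. A Python sketch using the two weight pairs discussed above (and illustrated in Figure 8.9):

```python
# Translating the control points translates the curve only when the weights
# sum to 1 (barycentric); otherwise the curve changes shape.

def curve(weights, p0, p1, t):
    w0, w1 = weights(t)
    return (w0 * p0[0] + w1 * p1[0], w0 * p0[1] + w1 * p1[1])

bary    = lambda t: (1 - t, t)              # sums to 1 for every t
nonbary = lambda t: ((1 - t)**3, t**3)      # does not sum to 1

p0, p1, shift, t = (0.0, 0.0), (5.0, 6.0), (0.0, -1.0), 0.5
q0 = (p0[0] + shift[0], p0[1] + shift[1])   # the translated points
q1 = (p1[0] + shift[0], p1[1] + shift[1])

a, b = curve(bary, p0, p1, t), curve(bary, q0, q1, t)
assert (b[0] - a[0], b[1] - a[1]) == shift          # curve translated rigidly

a, b = curve(nonbary, p0, p1, t), curve(nonbary, q0, q1, t)
assert (b[0] - a[0], b[1] - a[1]) != shift          # curve changed shape
```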
Exercise 8.14: Show that the parametric curve

P(t) = P + 2α(Q − P)t + (1 − 2α)(Q − P)t²,   0 ≤ t ≤ 1,   (8.7)

(where α is any real number) is a straight line, even though it is a polynomial of degree 2 in t. Note that the curve goes from point P to point Q.
8.7.1 Uniform and Nonuniform Parametric Curves So far, we have assumed that the parameter t of a parametric curve P(t) = (x(t), y(t)) varies in the interval [0, 1]. It is also possible to vary t in other ranges, and such curves may be useful in special applications. This idea arises naturally when we try to fit a curve to a given set of data points. One question that should be answered in such a case is what value the parameter t should have at each point. It turns out that this is both a problem and an opportunity. A practical, interactive algorithm for curve design should make it possible to treat the values of t at the data points as parameters, and therefore to produce an entire family of curves, all of whose members pass through the given data points (but behave differently between points). This gives the designer an extra tool that can be used to construct the right curve. The two approaches to this problem are (1) increment t by 1 for each point and (2) increment t by different values. The former approach yields a uniform parametric curve, while the latter results in a nonuniform parametric curve. Uniform parametric curves are normally easy to compute and they produce good results when the points are roughly equally spaced. However, when the spacing of the points is very different, a uniform curve may look strange and unnatural, even though it passes through all the data points. This is when a nonuniform parametric curve should be used. If the spacings of the points are far from uniform, it is common to increase the value of t at point Pi by the distance |Pi − Pi−1|. Notice that this distance is the chord length from point Pi−1 to point Pi. If this convention is used, then t starts at zero and is assigned the accumulated chord length at every data point.
If the curve does not oscillate much between data points, the chord length is a good approximation to the arc length of the curve, with the result that t is assigned, in such a case, values that are close to the arc length. A curve P(s) where the parameter is the arc length s has a tangent vector Pˢ(s) of magnitude one (it is a unit vector). If we express such a curve as P(s) = (x(s), y(s)), then (xˢ(s), yˢ(s)) is a unit vector, which implies that |xˢ(s)| ≤ 1 and |yˢ(s)| ≤ 1. This, in turn, means that the slopes of both curves x(s) and y(s) are bounded between −1 and +1, so the two curves are never too steep and are generally well behaved.
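The accumulated-chord-length convention is a few lines of code. A Python sketch (illustrative helper):

```python
import math

# Assigning parameter values by accumulated chord length:
# t_0 = 0 and t_i = t_{i-1} + |P_i - P_{i-1}|.

def chord_length_params(points):
    t = [0.0]
    for a, b in zip(points, points[1:]):
        t.append(t[-1] + math.dist(a, b))
    return t

pts = [(0, 0), (1, 0), (1, 1), (4, 5)]
assert chord_length_params(pts) == [0.0, 1.0, 2.0, 7.0]  # chords 1, 1, 5
```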
8.7.2 Curve Continuity In practice, a complete curve is often constructed of individual segments, so it is important to understand how individual segments can be connected. There are two types of curve continuities: geometric and parametric. If two consecutive segments meet at a point, the total curve is said to have G⁰ geometric continuity. (It may look as in Figure 8.10a.) If, in addition, the directions of the tangent vectors of the two segments are the same at the point, the curve has G¹ geometric continuity at the point. The two segments connect smoothly (Figure 8.10b). In general, a curve has geometric continuity
Gⁿ at a join point if every pair of the first n derivatives of the two segments has the same direction at the point. If the same derivatives also have identical magnitudes at the point, then the curve is said to have Cⁿ parametric continuity at the point.
Figure 8.10: (a) G⁰ Continuity (a Sharp Corner). (b) G¹ Continuity (a Smooth Connection). (c) G² Continuity (a Tight Curve).
We can refer to C⁰, C¹, and C² as point, tangent, and curvature continuities, respectively. Figure 8.11 illustrates the geometric meanings of the three types. In part C⁰ of the figure, the curve is continuous at the interior point, but its tangent is not. The curve changes its direction abruptly at the point; it has a kink. In part C¹, both the curve and its tangent are continuous at the interior point, but the curve changes its shape at the point from a straight line (zero curvature) to a curved line (nonzero curvature). Thus, the curvature is discontinuous at the point. In part C², the curve starts curving before it reaches the interior point, in order to preserve its curvature at the point. Generally, high continuity results in a smoother curve.
Figure 8.11: Three Parametric Continuities.
A Cᵏ continuity is more restrictive than Gᵏ, so a curve that has Cᵏ continuity at a join point also has Gᵏ continuity at the point, but there is an exception. Imagine two segments connecting at a point, where both have tangent vectors of (0, 0, 0) at the point. The vectors are identical, so the curve has C¹ continuity at the point. However, Exercise 12.3 (Page 582) shows that the two segments may move in different directions at the point, in which case the curve will not have G¹ continuity. The reason for having two types of continuities has to do with parameter substitution (see box). Given a curve segment P(t), where 0 ≤ t ≤ 1, we can substitute T = t². The new segment Q(T) = Q(t²), where 0 ≤ T ≤ 1, is identical in shape to P(t). The two identical curves must, of course, have the same tangents. However, their calculated tangent vectors have different magnitudes because

dP(t)/dt = dQ(t²)/dt = 2t (dQ(T)/dT).
Parameter Substitution
Instead of naming the parameter t, we can give it a different name. Moreover, we can use a function of t as the parameter. It can be shown that if g(t) is a function that increases monotonically with t (i.e., if t2 > t1 implies g(t2) > g(t1)), then the curve P(g(t)) will have the same shape as P(t) (although g(t) will normally have to vary in a different range than t). For two-dimensional curves, the substitution does not affect the slope of the curve, since

dy(g)/dx(g) = (dy/dg · dg/dt)/(dx/dg · dg/dt) = (dy(t)/dt)/(dx(t)/dt).
This is why we separate the direction and the magnitude of the tangent vectors when considering curve continuities. If the directions of the tangent vectors are equal, they produce a smooth join and we call this case G¹ continuity (which is often all that is required in practice).
Example: Consider the two straight segments P(t) = (8t, 6t) and Q(t) = (4(t + 2), 3(t + 2)). The first goes from (0, 0) to (8, 6) and the second goes from (8, 6) to (12, 9). Their tangent vectors are Pᵗ(t) = (8, 6) and Qᵗ(t) = (4, 3). The segments connect smoothly at (8, 6) (in fact, they look like one straight segment), but their tangent vectors are different at that point! Thus, the total curve has G¹ continuity at point (8, 6), but not C¹ continuity. It is interesting to note, however, that the unit tangent vectors are equal at the joint. The magnitude of Pᵗ(t) is √(8² + 6²) = 10 and that of Qᵗ(t) is √(4² + 3²) = 5. The two unit tangent vectors are therefore equal: (8/10, 6/10) = (4/5, 3/5). Thus, the unit tangent vector provides a better measure of the direction of the curve than the tangent vector itself. Another natural vector that's associated with every point of a smooth curve is the curvature, a basic concept that's discussed in Section 8.9. A curve whose tangent vector and curvature vector (Section 8.9.6) are everywhere continuous is said to have G² (second-order geometric) continuity.
You can do anything you like with me except paint me, Hughie dear. I have to draw the line somewhere. But that's just what you can't do—draw a line, I mean. I like you in every way, as you well know, except as a painter. You would have been a good painter if you had never painted—did I invent that? —L. P. Hartley, The Hireling
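The example above can be checked directly. A Python sketch comparing the tangent vectors and the unit tangent vectors of the two segments:

```python
import math

# P(t) = (8t, 6t) and Q(t) = (4(t+2), 3(t+2)) join at (8, 6) with different
# tangent vectors but equal unit tangents: G1 continuity without C1 continuity.

def unit(v):
    m = math.hypot(*v)
    return (v[0] / m, v[1] / m)

Pt = (8, 6)      # tangent vector of P(t), constant in t
Qt = (4, 3)      # tangent vector of Q(t), constant in t

assert Pt != Qt                                # magnitudes differ: not C1
assert unit(Pt) == unit(Qt) == (0.8, 0.6)      # directions agree: G1
```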
8.8 PC Curves Definition: A polynomial of degree n in x is the function

Pn(x) = Σ_{i=0}^{n} aᵢxⁱ = a0 + a1x + a2x² + · · · + anxⁿ,

where the aᵢ are the coefficients of the polynomial (in our case, they are real numbers). Note that there are n + 1 coefficients. Parametric curves used in computer graphics are based on polynomials for the following reasons:
Polynomials are simple functions. They are easy to compute, requiring only basic arithmetic operations.
They do not feature singular points.
They are easy to differentiate and integrate.
Their coefficients appear linearly, i.e., in the form aᵢ and not as aᵢ² or √aᵢ.
Once the decision is made to employ polynomials, the immediate question is what degree to use. The following paragraph explains why degree 3 is ideal. A polynomial of degree 1 has the form P1(t) = At + B and is, therefore, a straight line, so it can only be used in limited cases. A parametric polynomial of degree 2 (quadratic) has the form P2(t) = At² + Bt + C and is always a parabola (see next paragraph and Appendix C). A polynomial of degree 3 (cubic) has the form P3(t) = At³ + Bt² + Ct + D and is the simplest curve that can have complex shapes and can also be a space curve. (The complexity of this polynomial is limited, though. It can have at most one loop and, if it does not have a loop, it can have at most two inflection points; see Section 8.9.8.) Polynomials of higher degrees are sometimes needed, but they generally wiggle too much, a feature known as Runge's phenomenon, and are difficult to control. They also have more coefficients, so they require more input data to determine all the coefficients. As a result, a complete curve is often constructed from segments, each a parametric cubic polynomial (also called a PC). The complete curve is a piecewise polynomial curve, sometimes also called a spline (see definition on Page 577). Plane curves described by degree-2 polynomials are conic sections, but this is true only for the implicit representation. A plane curve described parametrically by a degree-2 polynomial can only be a parabola. Given such a curve P(t) = at² + bt + c, we observe that it has a single value for any value of t and that it grows without limit when t becomes very large (positive or negative). Thus, when t approaches ±∞, P(t) also approaches ∞ or −∞ (depending on the sign of a), but there is only one branch that goes toward ∞ and one branch that goes toward −∞. We therefore conclude that P(t) cannot be an ellipse because ellipses are finite, and it cannot be a hyperbola because these curves approach ±∞ in two directions.
It must therefore be a parabola. A more rigorous proof, using parameter substitution, can be found in [Gallier 00], page 66. Figure 8.12 shows seven data points and two curves that fit them. The dashed curve is a polynomial of degree 6; the solid curve is a spline. It is easy to see that the
Clear[points]; points={{0,1},{1,1.1},{2,1.2},{3,3},{4,2.9},{5,2.8},{6,2.7}}; InterpolatingPolynomial[points,x]; Interpolation[points,InterpolationOrder->3]; Show[ListPlot[points,Prolog->AbsolutePointSize[5]], Plot[%%,{x,0,6},PlotStyle->Dashing[{0.05,0.05}]], Plot[%[x],{x,0,6}]] Figure 8.12: Polynomial and Spline Fit.
polynomial oscillates, whereas the spline curve is tight and is therefore more pleasing to the eye.

Exercise 8.15: Show that a quadratic polynomial must be a plane curve.

Exercise 8.16: Why does a high-degree polynomial wiggle?
Question: The word “quad” comes from Latin for “four,” so why is a degree-2 polynomial called quadratic? While we are at it, why is a degree-3 polynomial called cubic? Answer: A square of side length n has four sides (it is quadratic), but its area is n2 and this is associated with a degree-2 polynomial, which has terms up to x2 . Similarly, a cube of side length n has volume n3 , which is why the term “cubic” has become associated with a degree-3 polynomial.
A single PC segment is determined by means of points (data or control) or tangent vectors. Continuity considerations are also sometimes used to constrain the curve. Regardless of the input data, the segment always has the form P(t) = At^3 + Bt^2 + Ct + D. Thus, four unknown coefficients have to be calculated, which requires four equations.
8 Basic Theory
The equations must depend on four known quantities, points or vectors, that we denote by G_1 through G_4. The PC segment is expressed as the product

P(t) = (t^3, t^2, t, 1) \begin{pmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \\ m_{41} & m_{42} & m_{43} & m_{44} \end{pmatrix} \begin{pmatrix} G_1 \\ G_2 \\ G_3 \\ G_4 \end{pmatrix} = T(t)·M·G,

where M is the basis matrix that depends on the method used and G is the geometry vector, consisting of the four given quantities. The segment can also be written as the weighted sum

P(t) = (t^3 m_{11} + t^2 m_{21} + t m_{31} + m_{41}) G_1 + (t^3 m_{12} + t^2 m_{22} + t m_{32} + m_{42}) G_2
     + (t^3 m_{13} + t^2 m_{23} + t m_{33} + m_{43}) G_3 + (t^3 m_{14} + t^2 m_{24} + t m_{34} + m_{44}) G_4
     = B_1(t) G_1 + B_2(t) G_2 + B_3(t) G_3 + B_4(t) G_4 = B(t)·G = T(t)·M·G,

where B(t) equals the product T(t)·M and the B_i(t) are the weights. They are also called the blending functions, since they blend the four given quantities. If any of the quantities being blended are points, their weights should be barycentric. In the case where all four quantities are points, this requirement implies that the sum of the elements of matrix M should equal 1 (because the 16 elements of M are also the elements of the B_i(t)'s). A PC segment can also be written in the form

P(t) = At^3 + Bt^2 + Ct + D = (t^3, t^2, t, 1) \begin{pmatrix} A_x & A_y & A_z \\ B_x & B_y & B_z \\ C_x & C_y & C_z \\ D_x & D_y & D_z \end{pmatrix} = T(t)·C,

where A = (A_x, A_y, A_z) and similarly for B, C, and D. Its first derivative is

dP(t)/dt = (dT(t)/dt)·C = (3t^2, 2t, 1, 0)·C,

and this is the tangent vector of the curve. This vector points in the direction of the tangent to the curve, but its magnitude is also important; it describes the speed of the curve. In physics, if the function x(t) describes the position of an object at time t, then dx(t)/dt describes its velocity and d^2x(t)/dt^2 gives its acceleration. This is also true for curves, but the speed in question is not the speed of drawing the curve on the screen! Rather, it is the distance covered on the curve when t is incremented in equal steps (see the particle paradigm of Section 8.6). This concept is important in computer animation. Imagine a camera moving along the curve while t is incremented in equal steps. The speed of the camera at a point is given by the magnitude of the tangent vector at that point. If we want the camera to move at a constant speed, all tangent vectors must have the same magnitude. For this to happen, the tangent vector must be independent of t, a constant. This implies that the second derivative (the acceleration) is the zero vector, and the curve itself must be a
linear function of t, a straight line. Any other curve has a tangent vector that depends on t, implying that the curve itself moves at variable speed.
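The variable-speed behavior just described is easy to check numerically. The following sketch (not from the book; the coefficients are arbitrary) evaluates the magnitude of the tangent vector of a sample PC segment at several values of t, and then repeats the measurement for a straight line, whose speed is constant:

```python
import numpy as np

# An arbitrary sample segment P(t) = At^3 + Bt^2 + Ct + D (coefficients made up).
A = np.array([1.0, -2.0, 0.5])
B = np.array([0.0, 3.0, 1.0])
C = np.array([2.0, 0.0, 0.0])

def speed(t):
    # Magnitude of the tangent vector P^t(t) = 3At^2 + 2Bt + C.
    return np.linalg.norm(3 * A * t**2 + 2 * B * t + C)

cubic_speeds = [speed(t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]

# Zeroing A and B leaves the straight line P(t) = Ct + D, whose speed |C| is constant.
A = np.zeros(3)
B = np.zeros(3)
line_speeds = [speed(t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

A camera driven by the cubic therefore speeds up and slows down as t advances in equal steps, while the straight line is traversed at the constant speed |C|.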
8.8.1 Fast Computation of a PC

This section employs the method of forward differences, together with the Taylor series representation, to speed up the calculation of a point on a parametric curve P(t). Once this method is implemented, an entire curve can be drawn in a loop where t is incremented from 0 to 1 in small, equal steps of Δ. In iteration i+1, a point P([i+1]Δ) is computed and is connected to the previous point P(iΔ) by a short, straight segment. Section 13.3 applies this method to the Bézier curve.

The principle of forward differences is to find a quantity dP such that P(t+Δ) = P(t) + dP for any value of t. If such a dP can be found, then it is enough to calculate P(0) and use forward differences to compute

P(0+Δ) = P(0) + dP,
P(2Δ) = P(Δ) + dP = P(0) + 2dP,
...
P([i+1]Δ) = P(iΔ) + dP = P(0) + (i+1)dP.

The point is that dP should not depend on t. If dP turns out to depend on t, then as we advance t from 0 to 1, we have to use different values of dP, slowing down the calculations. The fastest way to calculate the curve is to precalculate dP before the loop starts and repeatedly add this precalculated value to P(0) inside the loop. We calculate dP from the Taylor series representation of the curve. The Taylor series of a function f(t) at a point f(t+Δ) is the infinite sum

f(t+Δ) = f(t) + f'(t)Δ + f''(t)Δ^2/2! + f'''(t)Δ^3/3! + ···.

In order to avoid dealing with an infinite sum, we limit our discussion to the popular PC curves. The mathematical treatment for any other type of curve (a different-degree polynomial or a nonpolynomial) is similar, although normally more complex. A general PC curve has the form P(t) = at^3 + bt^2 + ct + d, so only its first three derivatives are nonzero. These derivatives are

P^t(t) = 3at^2 + 2bt + c,   P^tt(t) = 6at + 2b,   P^ttt(t) = 6a,

so the Taylor series representation produces

dP = P(t+Δ) − P(t) = P^t(t)Δ + P^tt(t)Δ^2/2 + P^ttt(t)Δ^3/6
   = 3at^2Δ + 2btΔ + cΔ + 3atΔ^2 + bΔ^2 + aΔ^3.
This seems a failure since dP is a function of t (it should therefore be denoted by dP(t) instead of just dP) and is also slow to calculate. However, the original PC
curve P(t) is a degree-3 polynomial, whereas dP(t) is only a degree-2 polynomial. This suggests a way out of our difficulty. We can try to express dP(t) by means of the Taylor series, similar to what we did with the original curve P(t). This should result in a forward difference ddP(t) that's a polynomial of degree 1 in t. The quantity ddP(t) can, in turn, be represented by another Taylor series to produce a forward difference dddP that's a degree-0 polynomial in t, i.e., a constant. Once this is done, we hope to end up with an algorithm of the form

Compute P(0), dP, ddP, and dddP;
P = P(0);
for t:=0 to 1 step Δt do
  PN:=P+dP; dP:=dP+ddP; ddP:=ddP+dddP;
  line(P,PN); P:=PN;
endfor;

The quantity ddP(t) is obtained by

dP(t+Δ) = dP(t) + ddP(t) = dP(t) + dP^t(t)Δ + dP^tt(t)Δ^2/2,

yielding

ddP(t) = dP^t(t)Δ + dP^tt(t)Δ^2/2
       = (6atΔ + 2bΔ + 3aΔ^2)Δ + (6aΔ)Δ^2/2
       = 6atΔ^2 + 2bΔ^2 + 6aΔ^3.

Finally, dddP is similarly obtained by

ddP(t+Δ) = ddP(t) + dddP = ddP(t) + ddP^t(t)Δ,

yielding dddP = ddP^t(t)Δ = 6aΔ^3, a constant. The four quantities involved in the calculation of the curve are therefore

P(t) = at^3 + bt^2 + ct + d,
dP(t) = 3at^2Δ + 2btΔ + cΔ + 3atΔ^2 + bΔ^2 + aΔ^3,
ddP(t) = 6atΔ^2 + 2bΔ^2 + 6aΔ^3,
dddP = 6aΔ^3.

They have to be calculated at t = 0 before the loop starts; then each iteration computes the first three quantities from those of the previous iteration (dddP doesn't depend on t). Here are the details:

P(0) = d,
dP(0) = aΔ^3 + bΔ^2 + cΔ,
ddP(0) = 6aΔ^3 + 2bΔ^2,
dddP = 6aΔ^3.

P(Δ) = aΔ^3 + bΔ^2 + cΔ + d = P(0) + dP(0),
dP(Δ) = 3aΔ^3 + 2bΔ^2 + cΔ + 3aΔ^3 + bΔ^2 + aΔ^3 = dP(0) + ddP(0),
ddP(Δ) = 6aΔ^3 + 2bΔ^2 + 6aΔ^3 = ddP(0) + dddP,
···
P([i+1]Δ) = P(iΔ) + dP(iΔ),
dP([i+1]Δ) = dP(iΔ) + ddP(iΔ),
ddP([i+1]Δ) = ddP(iΔ) + dddP.
Thus, each iteration computes a point P([i + 1]Δ) on the curve by performing six simple operations, three additions and three assignments. No multiplications are needed.
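The start-up values and the loop above can be sketched as follows (a hypothetical scalar example; the coefficients a, b, c, d are arbitrary, and a real implementation would use points). The forward-difference results are compared with direct evaluation of the polynomial:

```python
# Scalar example of the forward-difference loop; a, b, c, d are arbitrary.
a, b, c, d = 1.0, -2.0, 0.5, 3.0
n = 100
D = 1.0 / n          # the step Delta

# Start-up values at t = 0, as derived above.
P    = d                              # P(0)
dP   = a*D**3 + b*D**2 + c*D          # dP(0)
ddP  = 6*a*D**3 + 2*b*D**2            # ddP(0)
dddP = 6*a*D**3                       # constant, independent of t

points = [P]
for _ in range(n):
    P   += dP        # three additions per point, no multiplications
    dP  += ddP
    ddP += dddP
    points.append(P)

# Direct evaluation of P(t) = at^3 + bt^2 + ct + d for comparison.
direct = [a*(i*D)**3 + b*(i*D)**2 + c*(i*D) + d for i in range(n + 1)]
max_err = max(abs(p - q) for p, q in zip(points, direct))
```

The maximum deviation from direct evaluation is only floating-point rounding noise, while the inner loop indeed performs nothing but additions.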
8.8.2 Subdividing a Parametric Curve

Parametric curves are defined by means of points (data or control) and sometimes also vectors. Editing such a curve is normally done by moving points around and by adding new points. Intuitively, it is clear that adding points allows for finer control of the shape of the curve. On the other hand, adding points results in a curve that's a high-degree polynomial, and such polynomials tend to oscillate. Also, more points implies more calculations to compute and display the curve. It therefore seems that a reasonable method to obtain the right curve is to start with a few points and, if these are not enough to obtain the desired shape of the curve, to add a point (or a few points) at a time until the desired shape is achieved.

This section discusses a different approach, whereby the correct curve is achieved by subdividing a parametric curve into two segments. Together, the two segments have the same shape as the original curve, but they are defined by more entities (points or vectors), thereby making it possible to fine-tune the curve. This approach is applied in Section 13.8 to the Bézier curve. Section 8.15 extends this approach to surface patches.

The control of large numbers is possible, and like unto that of small numbers, if we subdivide them.
—Sun Tze.

We limit our discussion to cubic curves, but the method illustrated here applies to polynomial curves of any degree. Let

P(t) = (t^3, t^2, t, 1) M (P_0, P_1, P_2, P_3)^T    (8.8)
be any cubic parametric curve defined by four nonscalar entities (points or vectors), where the parameter t varies from 0 to 1. We construct the two halves P_1(t) and P_2(t) of this curve by varying the parameter in the intervals [0, 0.5] and [0.5, 1] (Section 13.8 shows how the unequal ranges [0, α] and [α, 1] can be used instead). Each of the two new curves should have the same shape as half of the original curve. Each half should therefore be written as an expression similar to Equation (8.8) but based on a new set of entities Q_i computed from the original set P_i. To construct the first half P_1(t), we define a new parameter u = 2t. When t varies in the range [0, 0.5], u varies from 0 to 1. The first half of the curve is obtained from Equation (8.8)
by substituting t = u/2:

P_1(u) = (u^3/8, u^2/4, u/2, 1) M (P_0, P_1, P_2, P_3)^T
       = (u^3, u^2, u, 1) \begin{pmatrix} 1/8 & 0 & 0 & 0 \\ 0 & 1/4 & 0 & 0 \\ 0 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} M (P_0, P_1, P_2, P_3)^T
       = (u^3, u^2, u, 1) L M (P_0, P_1, P_2, P_3)^T
       = (u^3, u^2, u, 1) M (Q_0, Q_1, Q_2, Q_3)^T.    (8.9)

The last line of Equation (8.9) expresses P_1(u) in terms of new entities Q_i. It shows that these entities can be calculated from the equation

M (Q_0, Q_1, Q_2, Q_3)^T = L M (P_0, P_1, P_2, P_3)^T,  whose solution is  (Q_0, Q_1, Q_2, Q_3)^T = M^{-1} L M (P_0, P_1, P_2, P_3)^T.    (8.10)
Exercise 8.17: Why does P_1(t) have the same shape as the first half of P(t)?

The second half, P_2(t), is calculated similarly. We first define a new parameter u = 2t − 1. When t varies in the range [0.5, 1], u varies from 0 to 1. The second half of the curve is obtained from Equation (8.8) by substituting t = (u+1)/2:

P_2(u) = ((u+1)^3/8, (u+1)^2/4, (u+1)/2, 1) M (P_0, P_1, P_2, P_3)^T
       = (u^3, u^2, u, 1) \begin{pmatrix} 1/8 & 0 & 0 & 0 \\ 3/8 & 1/4 & 0 & 0 \\ 3/8 & 2/4 & 1/2 & 0 \\ 1/8 & 1/4 & 1/2 & 1 \end{pmatrix} M (P_0, P_1, P_2, P_3)^T
       = (u^3, u^2, u, 1) R M (P_0, P_1, P_2, P_3)^T
       = (u^3, u^2, u, 1) M (Q_4, Q_5, Q_6, Q_7)^T.    (8.11)

The new entities Q_i are calculated for this second half by

(Q_4, Q_5, Q_6, Q_7)^T = M^{-1} R M (P_0, P_1, P_2, P_3)^T.    (8.12)
Given matrix M and four entities P_i, the eight new entities Q_i can be calculated from Equations (8.10) and (8.12). The generalization of this method to higher-degree curves is straightforward.

As an example, we apply this method to the cubic Bézier curve, Equation (13.8). Matrix M and its inverse are

M = \begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 3 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix},    M^{-1} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1/3 & 1 \\ 0 & 1/3 & 2/3 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}.

The matrix products of Equations (8.10) and (8.12) now become

M^{-1} L M = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 \\ 1/4 & 2/4 & 1/4 & 0 \\ 1/8 & 3/8 & 3/8 & 1/8 \end{pmatrix},    M^{-1} R M = \begin{pmatrix} 1/8 & 3/8 & 3/8 & 1/8 \\ 0 & 1/4 & 2/4 & 1/4 \\ 0 & 0 & 1/2 & 1/2 \\ 0 & 0 & 0 & 1 \end{pmatrix}.    (8.13)
The eight new entities (which in this case are control points) are

Q_0 = P_0,
Q_1 = (1/2)P_0 + (1/2)P_1 = (1/2)(P_0 + P_1),
Q_2 = (1/4)P_0 + (2/4)P_1 + (1/4)P_2 = (1/2)[(1/2)(P_0 + P_1) + (1/2)(P_1 + P_2)],
Q_3 = (1/8)P_0 + (3/8)P_1 + (3/8)P_2 + (1/8)P_3
    = (1/2){(1/2)[(1/2)(P_0 + P_1) + (1/2)(P_1 + P_2)] + (1/2)[(1/2)(P_1 + P_2) + (1/2)(P_2 + P_3)]},
Q_4 = (1/8)P_0 + (3/8)P_1 + (3/8)P_2 + (1/8)P_3
    = (1/2){(1/2)[(1/2)(P_0 + P_1) + (1/2)(P_1 + P_2)] + (1/2)[(1/2)(P_1 + P_2) + (1/2)(P_2 + P_3)]},
Q_5 = (1/4)P_1 + (2/4)P_2 + (1/4)P_3 = (1/2)[(1/2)(P_1 + P_2) + (1/2)(P_2 + P_3)],
Q_6 = (1/2)P_1 + (1/2)P_2 = (1/2)(P_1 + P_2),
Q_7 = P_3.
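As a check, the midpoint construction of the Q_i can be coded directly. The sketch below (arbitrary control points, not from the book) computes the first-half control points Q_0 through Q_3 and verifies that the new segment, evaluated at u, reproduces the original Bézier curve at u/2:

```python
import numpy as np

# Four arbitrary control points (the P_i of the text).
P = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])

def bezier(ctrl, t):
    # Cubic Bezier curve via the Bernstein weights (Equation (13.8)).
    w = np.array([(1 - t)**3, 3*t*(1 - t)**2, 3*t**2*(1 - t), t**3])
    return w @ ctrl

def mid(u, v):
    return 0.5 * (u + v)   # the repeated-midpoint pattern of the Q_i above

Q0 = P[0]
Q1 = mid(P[0], P[1])
Q2 = mid(mid(P[0], P[1]), mid(P[1], P[2]))
Q3 = mid(Q2, mid(mid(P[1], P[2]), mid(P[2], P[3])))
first_half = np.array([Q0, Q1, Q2, Q3])

# The subdivided segment at u must equal the original curve at u/2.
errors = [np.linalg.norm(bezier(first_half, u) - bezier(P, u / 2))
          for u in np.linspace(0.0, 1.0, 11)]
```

Note that Q_3 comes out equal to the curve point P(0.5), which is where the two halves join.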
Section 13.8 describes a different approach to the problem of subdividing a curve, using the mediation operator. That approach is then applied to the Bézier curve.
8.9 Curvature and Torsion

The first derivative P^t(t) of a parametric curve P(t) is the tangent vector of the curve. In this section, we denote the unit tangent vector at point P(i) by T(i). Thus,

T(i) = P^t(i)/|P^t(i)|.
The tangent vector is an example of an intrinsic property of a curve. An intrinsic property of a geometric figure depends only on the figure and not on the particular choice of the coordinate axes. Any geometric figure may have intrinsic and extrinsic properties. A triangle has three angles and a quadrilateral has four edges, regardless of the choice of coordinates. The tangent vector of a curve, as well as its curvature, does not depend on the particular coordinate system used. In contrast, the slope of a curve depends on the particular coordinates chosen, which makes it an extrinsic property of the curve. Exercise 8.18: Give a few more intrinsic and extrinsic properties of geometric figures. This section discusses the important intrinsic properties of parametric curves. They include the principal vectors (the tangent, normal, and binormal vectors), the principal planes (the osculating, rectifying, and normal planes), and the concepts of curvature and torsion. These properties are all local and they vary from point to point on the curve. They are therefore functions of the parameter t. Notice that these properties exist for all curves, but the discussion here is limited to parametric curves. Newton was seeking better methods—more general—for finding the slope of a curve at any particular point, as well [as] another quantity, related but once removed, the degree of curvature, rate of bending, “the crookedness in lines.” He applied himself to the tangent, the straight line that grazes the curve at any point. The straight line that the curve would become at that point, if it could be seen through an infinitely powerful microscope. —James Gleick, Isaac Newton (2003).
8.9.1 Normal Plane

The normal plane to a curve P(t) at point P(i) is the plane that's perpendicular to the tangent P^t(i) and contains point P(i). If Q is an arbitrary point on the normal plane, then Figure 8.13 shows that (Q − P(i)) • P^t(i) = 0. This can be written Q • P^t(i) − P(i) • P^t(i) = 0 or

x x_i^t + y y_i^t + z z_i^t − (x_i x_i^t + y_i y_i^t + z_i z_i^t) = 0,    (8.14)
an expression that has the familiar form Ax + By + Cz + D = 0 (Section 9.2.2).
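Equation (8.14) is easy to verify numerically. The sketch below uses the hypothetical curve (t, t^2, t^3); it builds the plane coefficients from the tangent at P(i) and checks that P(i) satisfies the plane equation while a point offset along the tangent does not:

```python
import numpy as np

# A hypothetical sample curve P(t) = (t, t^2, t^3).
def P(t):
    return np.array([t, t**2, t**3])

def Pt(t):
    return np.array([1.0, 2*t, 3*t**2])   # tangent vector

i = 0.5
A, B, C = Pt(i)          # the plane normal is the tangent at P(i)
D = -(P(i) @ Pt(i))      # the constant term of Equation (8.14)

def plane(q):
    # Ax + By + Cz + D; zero iff q lies on the normal plane.
    return A*q[0] + B*q[1] + C*q[2] + D

# P(i) itself lies on the plane; a point offset along the tangent does not.
on_plane  = plane(P(i))
off_plane = plane(P(i) + Pt(i))
```

Any point of the form P(i) + v with v perpendicular to the tangent also evaluates to zero, confirming that the plane contains exactly the directions normal to the curve at P(i).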
Figure 8.13: The Normal Plane.
8.9.2 Principal Normal Vector

Another important vector associated with a curve is the principal normal vector N(t). This unit vector is normal to the curve (it is therefore contained in the normal plane and is also perpendicular to the tangent vector), but it is called the principal normal because it points in a special direction, the direction in which the curve is turning. The principal normal vector points toward a point called the center of curvature of the curve.

To express N(t) in terms of the curve and its derivatives, we select two nearby points, t and t+Δt, on the curve. The tangent vectors at the two points are a = P^t(t) and b = P^t(t+Δt), respectively. If we subtract them as in Figure 8.14a, we get c = b − a. The difference vector c can be interpreted in two ways. On one hand, we can say that it is a small change in the tangent vector P^t(t), so we can denote it ΔP^t(t). On the other hand, since the tangent vector can be interpreted as the velocity of the curve, any change in it can be interpreted as acceleration, that is, the second derivative P^tt(t). Thus, we can write c = ΔP^t(t) = P^tt(t).

The two vectors a = P^t(t) and b = P^t(t+Δt) define a plane, and the principal normal vector lies at the intersection of this plane and the normal plane. Our task is therefore to compute a vector that is perpendicular to the tangent a = P^t(t) and that is contained in the plane defined by a and b. Figure 8.14b shows vector nl, which is the projection of P^tt(t) (vector nm) onto P^t(t). Equation (8.4) tells us that the length of nl is

(P^tt(t) • P^t(t)) / |P^t(t)|.
Figure 8.14: The Principal Normal Vector.
Since nl is in the direction of P^t(t), we can write the vector nl as

nl = [(P^tt(t) • P^t(t)) / |P^t(t)|] · [P^t(t) / |P^t(t)|] = [(P^tt(t) • P^t(t)) / |P^t(t)|^2] P^t(t).

We denote the vector lm by K(t) and compute it from the relation nl + lm = nm = P^tt(t):

K(t) = P^tt(t) − nl = P^tt(t) − [(P^tt(t) • P^t(t)) / |P^t(t)|^2] P^t(t).    (8.15)

The principal normal vector N(t) is a unit vector in the direction of K(t), so it is given by

N(t) = K(t)/|K(t)|.

Exercise 8.19: What can we say about the nature of the principal normal vector of a straight line?

Exercise 8.20: Calculate the principal normal vector of the PC curve P(t) = (−1, 0)t^3 + (1, −1)t^2 + (1, 1)t. Notice that this curve is Equation (11.10), so we know that it goes from (0, 0) to (1, 0) with start and end tangents (1, 1) and (0, −1), respectively. Use this to check your results.
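A small numerical check of Equation (8.15): for a sample space cubic (arbitrary coefficients, not the curve of the exercise), the vector K(t) must come out perpendicular to the tangent:

```python
import numpy as np

# A sample space cubic P(t) = at^3 + bt^2 + ct (arbitrary coefficients).
a = np.array([1.0, 0.0, 1.0])
b = np.array([0.0, 1.0, -1.0])
c = np.array([1.0, 1.0, 0.0])

def Pt(t):
    return 3*a*t**2 + 2*b*t + c        # first derivative

def Ptt(t):
    return 6*a*t + 2*b                 # second derivative

def K(t):
    # Equation (8.15): subtract from P^tt its projection onto P^t.
    pt, ptt = Pt(t), Ptt(t)
    return ptt - ((ptt @ pt) / (pt @ pt)) * pt

t0 = 0.3
N = K(t0) / np.linalg.norm(K(t0))      # the unit principal normal
```

Removing the projection of the acceleration onto the tangent is exactly what leaves the component that lies in the normal plane.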
8.9.3 Binormal Vector The third important vector associated with a curve is the binormal vector B(t). It is defined as the vector perpendicular to both the tangent and principal normal, so its
definition is simply B(t) = T(t) × N(t). Notice that it is a unit vector. Since the binormal is perpendicular to the tangent, it is contained in the normal plane. The three vectors T(t), N(t), and B(t) therefore constitute an orthogonal coordinate system that moves along the curve as t varies, except at cusps, where they are undefined.
8.9.4 The Osculating Plane

Imagine three points h, i, and j, located close to each other on a curve. If they are not collinear, they define a plane. Now move h and j independently closer and closer to i. As these points move, the plane may change. The plane obtained in the limit is called the osculating plane at point i (Figure 8.15). It contains the tangent vector T(i) and the principal normal N(i). If Q is an arbitrary point on the osculating plane, then the plane equation is given by the determinant |(Q − P(i)) P^t(i) P^tt(i)| = 0, which can be written explicitly as

(x − x_i)(y_i^t z_i^tt − y_i^tt z_i^t) − (y − y_i)(x_i^t z_i^tt − x_i^tt z_i^t) + (z − z_i)(x_i^t y_i^tt − x_i^tt y_i^t) = 0.
Another way to obtain the plane equation is to use the fact that point P(i) and vectors T(i) and N(i) are contained in the osculating plane. Any general point Q in the osculating plane can, therefore, be expressed as Q = P(i) + αT(i) + βN(i), where α and β are real parameters. The osculating plane of a plane curve is, of course, the plane of the curve. The osculating plane of a straight line is undefined.
Figure 8.15: The Osculating Plane.
Incidentally, two curves joined at a point have C^2 continuity (Section 8.7.2) at the point if they have the same osculating planes and the same curvature vectors at the point.

Exercise 8.21: (1) Calculate the Bézier curve for the four points P_0 = (0, 0, 0), P_1 = (1, 0, 0), P_2 = (2, 1, 0), and P_3 = (3, 0, 1). (Those unfamiliar with this curve should use Equation (13.8).) Notice that this is a space curve, since the first three points are in the z = 0 plane while the fourth one is outside that plane. (2) Calculate the (unnormalized) principal normal vector of the curve and find its values for t = 0, 0.5, and 1. (3) Calculate the osculating plane of the curve and find its equations for t = 0, 0.5, and 1 as above.
8.9.5 Rectifying Plane

The plane perpendicular to the principal normal vector of a curve is called the rectifying plane of the curve. If the curve is P(t), N(t) is its principal normal, and Q is an arbitrary point on the rectifying plane, then the equation of the rectifying plane at point P(i) is [Q − P(i)] • N(i) = 0. Another equation is obtained when we realize that both the tangent and binormal vectors are contained in the rectifying plane. A general point on this plane can therefore be expressed as Q = P(i) + αT(i) + βB(i).

Figure 8.16 shows the three unit vectors and three planes associated with a particular point P(i) on a curve. They constitute intrinsic properties of the curve, and together they form the moving trihedron of the curve, which can be considered a local coordinate system for the curve. The three vectors constitute the local coordinate axes, and the three planes divide the space around point P(i) into eight octants. The curve passes through the normal plane and is tangent to both the osculating and rectifying planes.

Figure 8.16: The Moving Trihedron.
8.9.6 Curvature The curvature of a curve is a useful measure, so it deserves to be rigorously defined. Intuitively, the curvature should be a number that measures how much the curve deviates from a straight line at any point. It should be large in areas where the curve wiggles, oscillates, or makes a sudden direction change; it should be small in areas where the curve is close to a straight line. It is also useful to associate a direction with the curvature, i.e., to make it a vector. Given a parametric curve P(t) and a point P(i) on it, we calculate the first two derivatives Pt (i) and Ptt (i) of the curve at the point. We then construct a circle that has these same first and second derivatives and move it so it grazes the point. This is called the osculating circle of the curve at the point. The curvature is now defined as the vector κ(i) whose direction is from point P(i) to the center of this circle and whose magnitude is the reciprocal of the radius of the circle.
Using differential geometry, it can be shown that the vector

(P^t(t) × P^tt(t)) / |P^t(t)|^3

has the right magnitude. However, this vector is perpendicular to both P^t(t) and P^tt(t), so it is perpendicular to the osculating plane. To bring it into the plane, we need to cross-product it with P^t(t)/|P^t(t)|, so the result is

κ(t) = [(P^t(t) × P^tt(t)) × P^t(t)] / |P^t(t)|^4.    (8.16)

Figure 8.15 shows that the curvature (vector b) is in the direction of the principal normal N(t), so it can be expressed as κ(t) = (1/ρ(t))N(t), where ρ(t) is the radius of curvature at point P(t).

Given a curve P(t) with an arc length s(t), we assume that dP/ds is a unit tangent vector:

dP(t)/ds = (dP(t)/dt)·(dt/ds) = P^t(t)/s^t(t).    (8.17)

Equation (8.17) shows the following:
1. dP(t)/ds and P^t(t) point in the same direction. Therefore, since dP(t)/ds is a unit vector, we get dP(t)/ds = P^t(t)/|P^t(t)|.
2. s^t(t) = |P^t(t)|.

We now derive the expression for curvature from a different point of view. The curvature k is defined by d^2P(t)/ds^2 = kN, where N is the unit principal normal vector (Section 8.9.2). The problem is to express k in terms of the curve P(t) and its derivatives, without involving the (normally unknown) function s(t). We start with

d^2P(t)/ds^2 = d/ds [P^t(t)/|P^t(t)|] = (d/dt [P^t(t)/|P^t(t)|]) / s^t(t)
             = [P^tt(t)/|P^t(t)| − (P^t(t)/|P^t(t)|^2)·(d|P^t(t)|/dt)] / |P^t(t)|.    (8.18)

The identity A • A = |A|^2 is true for any vector A(t), and it implies

A(t) • A^t(t) = |A(t)| · d|A(t)|/dt.

When we apply this to the vector P^t(t), we get

d^2P(t)/ds^2 = P^tt(t)/(P^t(t) • P^t(t)) − [(P^t(t) • P^tt(t))/(P^t(t) • P^t(t))^2] P^t(t),    (8.19)

which can also be written

kN = d^2P(t)/ds^2 = [(P^t(t) × P^tt(t)) × P^t(t)] / (P^t(t) • P^t(t))^2.    (8.20)
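Equation (8.16) can be sanity-checked on a curve whose curvature is known in advance. The sketch below uses a circle of radius 2 in the xy plane (an arbitrary choice); the resulting vector must have magnitude 1/2 and point from the curve toward the center of the circle:

```python
import numpy as np

r = 2.0   # circle radius; the expected curvature magnitude is 1/r

def P(t):
    return np.array([r*np.cos(t), r*np.sin(t), 0.0])

def Pt(t):
    return np.array([-r*np.sin(t), r*np.cos(t), 0.0])

def Ptt(t):
    return np.array([-r*np.cos(t), -r*np.sin(t), 0.0])

def kappa(t):
    # Equation (8.16): [(P^t x P^tt) x P^t] / |P^t|^4.
    pt, ptt = Pt(t), Ptt(t)
    return np.cross(np.cross(pt, ptt), pt) / np.linalg.norm(pt)**4

k = kappa(0.7)

# P + kappa/|kappa|^2 is the center of the osculating circle: here the origin.
center = P(0.7) + k / (k @ k)
```

For a circle the osculating circle is the circle itself, so the computed center coincides with the circle's own center at every parameter value.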
8.9.7 Torsion

Torsion is a measure of how much a given curve deviates from a plane curve. The torsion τ(i) of a curve at a point P(i) is defined by means of the following two quantities:
1. Imagine a point h close to i. The curve has rectifying planes at points h and i (Figure 8.17). Denote the angle between them by θ.

Figure 8.17: Torsion.
2. Denote by s the arc length from point h to point i.

The torsion of the curve at point i is defined as the limit of the ratio θ/s when h approaches i. Figure 8.17 shows how the rectifying plane rotates about the tangent as we move on the curve from h to i. The torsion can be expressed by means of the derivatives of the curve and by means of the curvature:

τ(t) = |P^t(t) P^tt(t) P^ttt(t)| / |P^t(t) × P^tt(t)|^2 = ρ(t)^2 |P^t(t) P^tt(t) P^ttt(t)| / |P^t(t)|^6.
(The numerator is a determinant and the denominator is an absolute value. This expression is meaningful only when ρ(t) < ∞.) The torsion of a plane curve is zero. It is interesting to observe that a curve can be fully defined by specifying its curvature and torsion as functions of its arc length s. The functions κ = f (s) and τ = g(s) uniquely define the shape of a curve (although not its location in space). An alternative is the single (implicit) function F (κ, τ, s) = 0. An alternative representation can be derived for a plane curve. Assume that P(t) = (x(t), y(t)) is a curve in the xy plane. Figure 8.18 shows that its shape can be determined if its start point P(0) and its slope (or, equivalently, angle θ) are known as functions of
the arc length s. Since θ is the angle between the tangent and the x axis, the functions x(s) and y(s) must satisfy

dx/ds = cos θ,   dy/ds = sin θ.

Differentiating produces

d^2x/ds^2 = −sin θ · dθ/ds = −(dy/ds)(dθ/ds),   d^2y/ds^2 = cos θ · dθ/ds = (dx/ds)(dθ/ds).    (8.21)

Figure 8.18 also shows that dθ/ds is the magnitude of the curvature κ, so the conclusion is that, given the curvature κ(s) of a curve as a function of its arc length, the two functions x(s) and y(s) can be calculated, either analytically or point by point numerically, from the differential equations (8.21).

Figure 8.18: A Plane Curve.
Exercise 8.22: Given κ(s) = R (a constant), solve Equations (8.21) for x(s) and y(s). What kind of a curve is this?
8.9.8 Inflection Points

An inflection point is a point on a curve where the curvature is zero. On a straight line, every point is an inflection point. On a typical curve, an inflection point is created when the curve reverses its direction of turning (for example, from a clockwise to a counterclockwise direction). From the definition of curvature (Equation (8.16)), it follows that an inflection point satisfies 0 = |P^t(t) × P^tt(t)|. Therefore,

(P^t(t) × P^tt(t)) • (P^t(t) × P^tt(t)) = 0,

which is equivalent to

(P^t(t) × P^tt(t))_x^2 + (P^t(t) × P^tt(t))_y^2 + (P^t(t) × P^tt(t))_z^2 = 0,

or

(y^t z^tt − z^t y^tt)^2 + (z^t x^tt − x^t z^tt)^2 + (x^t y^tt − y^t x^tt)^2 = 0.    (8.22)
This is the sum of three nonnegative quantities, so each must be zero. Since

dy/dx = (dy/dt)/(dx/dt) = y^t/x^t,

we get

d^2y/dx^2 = d/dt (y^t/x^t) · dt/dx = (x^t y^tt − x^tt y^t)/(x^t)^3.

Therefore, saying that the three quantities above are zero is the same as saying that

d^2y/dx^2 = d^2x/dz^2 = d^2z/dy^2 = 0.

Equation (8.22) can be used to show that a two-dimensional parametric cubic can have at most two inflection points. We denote a general PC by

P(t) = at^3 + bt^2 + ct + d = (a_x, a_y)t^3 + (b_x, b_y)t^2 + (c_x, c_y)t + (d_x, d_y),

which implies x^t = 3a_x t^2 + 2b_x t + c_x and x^tt = 6a_x t + 2b_x, and similarly for y^t and y^tt. Using this notation, we write Equation (8.22) explicitly (notice that for a two-dimensional PC, only the third part is nonzero) as

0 = x^t y^tt − y^t x^tt
  = (3a_x t^2 + 2b_x t + c_x)(6a_y t + 2b_y) − (3a_y t^2 + 2b_y t + c_y)(6a_x t + 2b_x)
  = 6(a_y b_x − a_x b_y)t^2 + 6(a_y c_x − a_x c_y)t + 2(b_y c_x − b_x c_y).

This is a quadratic equation in t, so there can be at most two solutions.
8.10 Special and Degenerate Curves

Parametric curves may exhibit unusual behavior when their derivatives satisfy certain conditions. Such curves are referred to as special or degenerate. Here are four examples:
1. If the first derivative P^t(t) of a curve P(t) is zero for all values of t, then P(t) degenerates to the point P(0).
2. If P^t(t) ≠ 0 and P^t(t) × P^tt(t) = 0 (i.e., the tangent vector points in the direction of the acceleration vector), then P(t) is a straight line.
3. If P^t(t) × P^tt(t) ≠ 0 and |P^t(t) P^tt(t) P^ttt(t)| = 0, then P(t) is a plane curve. (The notation |a b c| refers to the determinant whose three columns are a, b, and c.)
4. Finally, if both P^t(t) × P^tt(t) and |P^t(t) P^tt(t) P^ttt(t)| are nonzero, the curve P(t) is nonplanar (i.e., it is a space curve).
8.11 Basic Concepts of Surfaces Section 8.6 mentions the explicit, implicit, and parametric representations of curves. Surfaces can also be represented in these three ways. The explicit representation of a surface is z = f (x, y) and the implicit representation is F (x, y, z) = 0 (Figure 8.19 and Plates M.3, P.3, and U.1). In practice, however, the parametric representation is used almost exclusively, for the same reasons that parametric curves are so important. z
Figure 8.19: An Implicit Surface.
A simple, intuitive way to grasp the concept of a parametric surface is to visualize it as a set of curves. Figure 8.20a shows a single curve and Figure 8.20b shows how it is duplicated several times to create a family of identical curves. The brain finds it natural to interpret such a family as a surface. If we denote the curve by P(u), we can denote each of its copies in the family by Pi (u), where i is an integer index. Taking this idea a step further, a solid surface is obtained by creating infinitely many copies of the curve and placing them next to each other without any gaps in between. It makes sense to replace the discrete integer index i of each curve by a real (continuous) index w. The solid version of the surface of Figure 8.20b can therefore be denoted by Pw (u), where varying u moves us along a curve and varying w moves us from curve to curve in steps that can be arbitrarily small. The next step is to obtain a general surface by varying the shape of the curves so they are not identical (Figure 8.20c). The shape of a curve should therefore depend on w, which suggests a notation such as P(u, w) for the surface. The shape of each curve depends on both u and w but in a special way. Each of the two parameters moves us along a different direction on the surface, so we can talk about the u direction and the w direction (Figure 8.20d). The general form of a parametric surface is P(u, w) = (f1 (u, w), f2 (u, w), f3 (u, w)). The surface depends on two parameters, u and w, that vary independently in some interval [a, b] (normally, but not always, limited to [0, 1]). For each pair (u, w), the expression above produces the three coordinates of a point on the surface.
Figure 8.20: A Surface as a Family of Curves.
Exercise 8.23: A curve can be either two-dimensional or three-dimensional. A surface, however, exists only in three dimensions, and each surface point has three coordinates. Why is it that the expression for the surface depends on two, and not on three, parameters? We would expect the surface to be of the form P(u, v, w), a function of three parameters. What's the explanation?

A simple example of a parametric surface is

P(u, w) = [0.5(1 − u)w + u, w, (1 − u)(1 − w)]    (8.23)

(this is also Equation (9.9)). Such a surface is called bilinear since it is linear in both parameters. We use this example to discuss the concept of a surface patch and to show how a wire-frame surface can be displayed.
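Equation (8.23) is simple enough to probe numerically. The sketch below checks the bilinearity claim: along any fixed w, the point at u = 0.5 is the average of the points at u = 0 and u = 1, and symmetrically in the w direction:

```python
import numpy as np

def surface(u, w):
    # Equation (8.23).
    return np.array([0.5*(1 - u)*w + u, w, (1 - u)*(1 - w)])

# Linearity in u: for fixed w, the point at u = 0.5 is the average of the
# points at u = 0 and u = 1 (and symmetrically for the w direction).
mid_u = surface(0.5, 0.3)
avg_u = 0.5 * (surface(0.0, 0.3) + surface(1.0, 0.3))
mid_w = surface(0.7, 0.5)
avg_w = 0.5 * (surface(0.7, 0.0) + surface(0.7, 1.0))
```

The same averaging property fails for surfaces of higher degree, which is exactly what distinguishes a bilinear patch.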
8.11.1 A Surface Patch

The expression P(u, 0.2) (where w is held fixed and u varies) depends on just one parameter and is therefore a curve on the surface. The four curves P(u, 0), P(u, 1), P(0, w), and P(1, w) are of special interest. They are the boundary curves of the surface (Figure 8.21a). Since there are four such curves, our surface is a patch that has a (roughly) rectangular shape. Of special interest are the four quantities P(0, 0), P(0, 1), P(1, 0), and P(1, 1). They are the corner points of the surface patch and are sometimes denoted by Pij.

We say that the curve P(u, 0.2) lies on the surface in the u direction. It is an isoparametric curve. Similarly, any curve P(u0, w), where u0 is fixed, lies in the w direction and is an isoparametric curve. These are the two main directions on a rectangular surface patch. Two more special curves, the surface diagonals, are P(u, 1 − u) and P(u, u). The former goes from P01 to P10 and the latter goes from P00 to P11.

A large surface is obtained by constructing a number of patches and connecting them. The method used to construct the patch should allow for smooth connection of patches.

Exercise 8.24: Compute the corner points, boundary curves, and diagonals of the bilinear surface patch of Equation (8.23).
8.11 Basic Concepts of Surfaces
Figure 8.21: (a) A Surface Patch. (b) A Wire Frame.
Exercise 8.25: Calculate the corner points and boundary curves of the surface patch

P(u, w) = ((c − a)u + a, (d − b)w + b, 0),

where a, b, c, and d are given constants and the parameters u and w vary independently in the range [0, 1]. What kind of a surface is this?
8.11.2 Displaying a Surface Patch

A surface patch can be displayed either as a wire frame (Figure 8.21b) or as a solid surface. The pseudo-code of Figure 8.22 shows how to display a surface patch as a wire frame. The code consists of two similar loops—one drawing the curves in the w direction and the other drawing the curves in the u direction. The first loop varies u from 0 to 1 in steps of 0.2, thereby drawing six curves. Each of the six is drawn by varying w in small steps (0.01 in the example). The second loop is similar and draws six curves in the u direction.

Procedure SurfacePoint receives the current values of u and w, and calculates the coordinates (x, y, z) of one surface point. Procedure PersProj uses these coordinates to calculate the screen coordinates (xs, ys) of a pixel (it projects the three-dimensional point on the two-dimensional screen using perspective projection). Finally, procedure Pixel actually displays the pixel in the desired color. Better results are obtained by eliminating those parts of the surface that are hidden by other parts, but this topic is outside the scope of this book.

To display a solid surface, the normal vector of the surface (Section 8.16) has to be calculated at every point and a shading algorithm applied to compute the amount of light reflected from the point. Most texts on computer graphics discuss shading models and algorithms.
for u:=0 to 1 step 0.2 do
begin
  for w:=0 to 1 step 0.01 do
  begin
    SurfacePoint(u,w,x,y,z);
    PersProj(x,y,z,xs,ys);
    Pixel(xs,ys,color)
  end;
end;

for w:=0 to 1 step 0.2 do
begin
  for u:=0 to 1 step 0.01 do
  begin
    SurfacePoint(u,w,x,y,z);
    PersProj(x,y,z,xs,ys);
    Pixel(xs,ys,color)
  end;
end;
Figure 8.22: Procedure for a Wire-Frame Surface.
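The pseudo-code of Figure 8.22 translates almost line for line into Python. In the sketch below (names ours) the surface is the bilinear example of Equation (8.23), PersProj is replaced by a simple hypothetical pin-hole projection (any projection routine could be substituted), and Pixel is replaced by collecting the projected points in a list:

```python
def surface_point(u, w):
    # The bilinear surface of Equation (8.23); any parametric surface works here.
    return (0.5 * (1 - u) * w + u, w, (1 - u) * (1 - w))

def pers_proj(x, y, z, k=5.0):
    # Project the 3D point onto the plane z = 0, viewer on the z axis at z = k.
    s = k / (k - z)
    return (x * s, y * s)

def wire_frame(n_curves=6, pts_per_curve=101):
    """Collect the projected points of both families of curves."""
    pixels = []
    for i in range(n_curves):                  # curves in the w direction (u fixed)
        u = i / (n_curves - 1)
        for j in range(pts_per_curve):
            w = j / (pts_per_curve - 1)
            pixels.append(pers_proj(*surface_point(u, w)))
    for i in range(n_curves):                  # curves in the u direction (w fixed)
        w = i / (n_curves - 1)
        for j in range(pts_per_curve):
            u = j / (pts_per_curve - 1)
            pixels.append(pers_proj(*surface_point(u, w)))
    return pixels
```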
8.12 The Cartesian Product

The concept of blending was introduced in Section 8.5. This is an important concept that is used in many curve and surface algorithms. This section shows how blending can be used in surface design. We start with two parametric curves Q(u) = Σ_{i=1}^{n} f_i(u)Q_i and R(w) = Σ_{i=1}^{m} g_i(w)R_i, where Q_i and R_i can be points or vectors. Now examine the function

P(u, w) = Σ_{i=1}^{n} Σ_{j=1}^{m} f_i(u)g_j(w)P_ij = Σ_{i=1}^{n} Σ_{j=1}^{m} h_ij(u, w)P_ij,     (8.24)
where h_ij(u, w) = f_i(u)g_j(w). The function P(u, w) describes a surface, since it is a function of the two independent parameters u and w. For any value of the pair (u, w), the function computes a weighted sum of the quantities P_ij. These quantities—which are normally points, but can also be vectors—are triplets, so P(u, w) returns a triplet (x, y, z) that are the three-dimensional coordinates of a point on the surface. When u and w vary over their ranges independently, P(u, w) computes all the three-dimensional points of a surface patch.

I don’t blend in at a family picnic.
—Batman in Batman Forever, 1995.

The technique of blending quantities P_ij into a surface by means of weights taken from two curves is called the Cartesian product, although the terms tensor product and cross-product are also sometimes used. The quantities P_ij can be points, tangent vectors, or second derivatives. Equation (8.24) can also be written in the compact form

P(u, w) = (f1(u), . . . , fn(u))
          [ P11 P12 . . . P1m
            ...  ...       ...
            Pn1 Pn2 . . . Pnm ] (g1(w), . . . , gm(w))^T.     (8.25)
Notice that it uses a matrix whose elements are nonscalar quantities (triplets). Even more important, Equation (8.24), combined with the isotropic principle (Section 8.1), tells us that if all P_ij are points, then the surface P(u, w) is independent of the particular coordinate axes used if Σ_{ij} h_ij(u, w) = 1. If the two original curves Q(u) and R(w) are isotropic, then it’s easy to see that the surface is also isotropic because

Σ_{ij} h_ij(u, w) = Σ_i f_i Σ_j g_j = 1 · 1 = 1.
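Equation (8.24) is straightforward to implement once the blending functions are chosen. The sketch below (names ours) builds a Cartesian-product surface from arbitrary blending functions f and g and a grid of points P; specializing it to the degree-1 Bernstein weights (1 − t, t) yields a bilinear patch over four hypothetical corner points:

```python
def cartesian_product_surface(f, g, P):
    """Return the surface P(u, w) = sum_i sum_j f_i(u) g_j(w) P_ij of Equation (8.24)."""
    def surf(u, w):
        fu, gw = f(u), g(w)
        return tuple(
            sum(fu[i] * gw[j] * P[i][j][c]
                for i in range(len(fu)) for j in range(len(gw)))
            for c in range(3)
        )
    return surf

b1 = lambda t: (1 - t, t)                # degree-1 Bernstein weights
corners = (((0, 0, 0), (0, 1, 0)),       # P00, P01 (hypothetical corner points)
           ((1, 0, 0), (1, 1, 1)))       # P10, P11
patch = cartesian_product_surface(b1, b1, corners)
```

Replacing b1 by cubic Bernstein weights (and corners by a 4×4 grid) produces a bicubic tensor-product patch from the same function.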
The following two examples illustrate the importance of the Cartesian product. The first example applies this technique to derive the equation of the bilinear surface (Section 9.3) from that of a straight segment. The parametric representation of the line segment from P0 to P1 is Equation (9.1)

P(t) = (1 − t)P0 + tP1 = P0 + (P1 − P0)t
     = [1 − t, t] (P0, P1)^T = [B10(t), B11(t)] (P0, P1)^T,     (8.26)

where B1i(t) are the Bernstein polynomials of degree 1 (Equation (13.5)). The Cartesian product of Equation (8.26) with itself is

P(u, w) = [B10(u), B11(u)]
          [ P00 P01
            P10 P11 ] [B10(w), B11(w)]^T
        = [1 − u, u]
          [ P00 P01
            P10 P11 ] [1 − w, w]^T
        = P00(1 − u)(1 − w) + P01(1 − u)w + P10 u(1 − w) + P11 uw,

and this is the parametric expression of the bilinear surface patch, Equation (9.6).

The second example starts with the parametric cubic polynomial that passes through four given points. This curve is derived from first principles in Section 10.1 and is given by Equation (10.6), duplicated here

P(t) = (t^3, t^2, t, 1)
       [ −4.5  13.5 −13.5  4.5
          9.0 −22.5  18.0 −4.5
         −5.5   9.0  −4.5  1.0
          1.0   0     0    0   ] (P1, P2, P3, P4)^T
     = (t^3, t^2, t, 1) N (P1, P2, P3, P4)^T.     (10.6)
The principle of Cartesian product is now applied to multiply this curve by itself in order to obtain a bicubic surface patch that passes through 16 given points. The result is obtained immediately

P(u, w) = (u^3, u^2, u, 1) N
          [ P33 P32 P31 P30
            P23 P22 P21 P20
            P13 P12 P11 P10
            P03 P02 P01 P00 ] N^T (w^3, w^2, w, 1)^T.     (8.27)
Note that this result is also obtained in Section 10.6.1 (Equation (10.25)), where it is derived from first principles and requires the solution of a system of 16 equations. Cartesian product is obviously a useful, simple, and elegant method to easily derive the expressions of many types of surfaces.
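Equation (8.27) can be checked numerically. The sketch below (names ours) evaluates the patch for scalar-valued grid entries; each coordinate of a real surface point would be treated the same way. By construction, the patch should pass through grid entry P[i][j] at (u, w) = (i/3, j/3):

```python
# Matrix N of Equation (10.6).
N = [[-4.5, 13.5, -13.5, 4.5],
     [9.0, -22.5, 18.0, -4.5],
     [-5.5, 9.0, -4.5, 1.0],
     [1.0, 0.0, 0.0, 0.0]]

def blend(t):
    """The four blending weights: the row vector (t^3, t^2, t, 1) times N."""
    v = (t**3, t**2, t, 1.0)
    return [sum(v[j] * N[j][i] for j in range(4)) for i in range(4)]

def bicubic_point(P, u, w):
    """The bicubic patch of Equation (8.27), for a 4x4 grid of scalars P[i][j]."""
    a, b = blend(u), blend(w)
    return sum(a[i] * P[i][j] * b[j] for i in range(4) for j in range(4))
```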
8.13 Connecting Surface Patches

Often, a complex surface is constructed of individual patches that have to be connected smoothly, which is why this short section examines the conditions required for the smooth connection of two rectangular patches. Figure 8.23 illustrates two patches P(u, w) and Q(u, w) connected along the w direction such that P(1, w) = Q(0, w) for 0 ≤ w ≤ 1. Specifically, the two corner points Q00 and P10 are identical and so are Q01 and P11. The two patches will connect smoothly if any of the following conditions are met:
1. Qu(0, w) = Pu(1, w) for 0 ≤ w ≤ 1.
2. Qu(0, w) = f(w)Pu(1, w) for 0 ≤ w ≤ 1 and a positive function f(w).
3. Qu(0, w) = f(w)Pu(1, w) + g(w)Pw(1, w) for 0 ≤ w ≤ 1 and positive functions f(w) and g(w).
These conditions involve the three tangent vectors:
1. Qu(0, w), the tangent in the u direction of patch Q at u = 0.
2. Pu(1, w), the tangent in the u direction of P at u = 1.
3. Pw(1, w), the tangent in the w direction of P at u = 1.
Condition 1 implies that tangents 1 and 2 are equal. Condition 2 implies that they point in the same direction but their sizes differ. Condition 3 means that tangent 1 does not point in the direction of tangent 2, but lies in the plane defined by tangents 2 and 3.
Figure 8.23: Tangent Vectors For Smooth Connection.
Note that condition 3 includes condition 2 (in the special case g(w) = 0) and condition 2 includes condition 1 (in the special case f (w) = 1).
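These conditions can be tested numerically with finite-difference tangents. The sketch below (names and the two flat patches are ours, chosen only for illustration) exhibits condition 2 with f(w) = 2: the tangents across the shared boundary point in the same direction but differ in size:

```python
def tangent_u(patch, u, w, h=1e-6):
    """Numerical tangent in the u direction (forward difference)."""
    p0, p1 = patch(u, w), patch(u + h, w)
    return tuple((b - a) / h for a, b in zip(p0, p1))

P = lambda u, w: (u, w, 0.0)                 # patch P(u, w)
Q = lambda u, w: (1.0 + 2.0 * u, w, 0.0)     # Q(0, w) = P(1, w); twice the u-speed

def same_direction(v1, v2, eps=1e-3):
    """True when v1 and v2 are parallel and point the same way."""
    cross = (v1[1] * v2[2] - v1[2] * v2[1],
             v1[2] * v2[0] - v1[0] * v2[2],
             v1[0] * v2[1] - v1[1] * v2[0])
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot > 0 and all(abs(c) < eps for c in cross)
```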
8.14 Fast Computation of a Bicubic Patch

A complete rectangular surface patch is displayed as a wireframe by drawing two families of curves, in the u and w directions, as pointed out in Section 8.11.2. This section shows how to apply the technique of forward differences to the problem of fast computation of these curves. The material presented here is an extension of the ideas and methods presented in Section 8.8.1. We limit this discussion to a general bicubic surface patch, whose expression is

P(u, w) = (u^3, u^2, u, 1)
          [ M00 M01 M02 M03
            M10 M11 M12 M13
            M20 M21 M22 M23
            M30 M31 M32 M33 ] (w^3, w^2, w, 1)^T.     (8.28)
(The matrix elements Mij are derived from the 16 points Pij and from the elements of matrix N. Compare with Equation (8.27).) For a fixed w, the surface P(u, w) reduces to a PC curve in the u direction Pw(u) = Au^3 + Bu^2 + Cu + D. Each of the four coefficients is a cubic polynomial in w as follows:

A(w) = M00 w^3 + M01 w^2 + M02 w + M03,
B(w) = M10 w^3 + M11 w^2 + M12 w + M13,
C(w) = M20 w^3 + M21 w^2 + M22 w + M23,
D(w) = M30 w^3 + M31 w^2 + M32 w + M33.

Applying the forward differences technique of Section 8.8.1, we can compute the n points Pw(0), Pw(Δ), Pw(2Δ), . . . , Pw([n − 1]Δ) [where (n − 1)Δ = 1] with three additions and three assignments for each point. This, however, requires that the four quantities A(w), B(w), C(w), and D(w) be computed first, which involves multiplications and exponentiations. Moreover, to display the entire surface patch we need to compute and display U curves Pw(u) for U values of w in the interval [0, 1]. The natural solution is to apply forward differences to the computations of A(w), B(w), C(w), and D(w) for each value of w. To compute A(w) = M00 w^3 + M01 w^2 + M02 w + M03 we compute the following:

A(0) = M03,
dA(0) = M00 Δ^3 + M01 Δ^2 + M02 Δ,
ddA(0) = 6M00 Δ^3 + 2M01 Δ^2,
dddA = 6M00 Δ^3,

A(Δ) = A(0) + dA(0),
dA(Δ) = dA(0) + ddA(0),
ddA(Δ) = ddA(0) + dddA,

A([j + 1]Δ) = A(jΔ) + dA(jΔ),
dA([j + 1]Δ) = dA(jΔ) + ddA(jΔ),
ddA([j + 1]Δ) = ddA(jΔ) + dddA,

and similarly for B(w), C(w), and D(w). Each requires three additions and three assignments, for a total of 12 additions and 12 assignments. Thus, a complete curve P(u, jΔ) is drawn in the u direction on the surface in the following two steps:
1. Compute A(jΔ) from A([j − 1]Δ), dA([j − 1]Δ), and ddA([j − 1]Δ), and similarly for B(jΔ), C(jΔ), and D(jΔ), in 12 additions and 12 assignments.
2. Use these four quantities to compute the n points P(0, jΔ), P(Δ, jΔ), P(2Δ, jΔ), up to P(1, jΔ), in three additions and three assignments for each point.

The total number of simple operations required for drawing curve P(u, jΔ) is therefore 12 + 12 + n(3 + 3) = 6n + 24. If U such curves are drawn in the u direction, the total number of operations is (6n + 24)U.

To complete the wireframe, another family of W curves of the form P(iΔ, w) should be computed and displayed. We assume that m points are computed for each curve, which brings the total number of operations for this family of curves to (6m + 24)W. A PC curve Pu(w) in the w direction on the surface has the form Pu(w) = Ew^3 + Fw^2 + Gw + H, where each of the four coefficients is a cubic polynomial in u as follows:

E(u) = M00 u^3 + M10 u^2 + M20 u + M30,
F(u) = M01 u^3 + M11 u^2 + M21 u + M31,
G(u) = M02 u^3 + M12 u^2 + M22 u + M32,
H(u) = M03 u^3 + M13 u^2 + M23 u + M33.

Thus, E, F, G, and H are similar to A(w), B(w), C(w), and D(w), but are computed with the transpose of matrix M. A complete curve P(iΔ, w) is drawn in the w direction on the surface in the following two steps:
1. Compute E(iΔ), F(iΔ), G(iΔ), and H(iΔ) from the corresponding quantities for [i − 1]Δ in 12 additions and 12 assignments.
2. Use these four quantities to compute the m points P(iΔ, 0), P(iΔ, Δ), P(iΔ, 2Δ), up to P(iΔ, 1), in three additions and three assignments for each point.

The total number of simple operations required to compute the m points for curve P(iΔ, w) is therefore 6m + 24. If W such curves are drawn in the w direction, the total number of operations is (6m + 24)W. Thus, it seems that the entire wireframe can be computed and drawn with (6n + 24)U + (6m + 24)W operations. For m = n and U = W this becomes 2(6n + 24)U.
Typical values of these parameters may be m = n = 100 and U = W = 15, which results in 624×30 = 18,720 operations. However, as Figure 8.24 illustrates, some of the points traversed by the curves of the two families are identical, so a sophisticated algorithm may identify them and store them in memory to eliminate double computations and thereby reduce the total number of operations. The figure shows seven curves in the w direction, with 13 points each (the white circles) and five curves in the u direction, consisting of 19 points each (the black circles). Thus, n = 19, m = 13, W = 7, and U = 5. The total number of points is 19×5 + 13×7 = 186, and of these, 7×5, or about 19%, are identical (the U×W squares).
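The forward-difference machinery above, three additions and three assignments per value after the setup, can be sketched for a single cubic coefficient such as A(w). The function below (names ours) evaluates an arbitrary cubic at equally spaced arguments and can be checked against direct evaluation:

```python
def cubic_forward_differences(a, b, c, d, delta, n):
    """Evaluate a*t^3 + b*t^2 + c*t + d at t = 0, delta, ..., (n-1)*delta.
    After the setup, each point costs three additions and three assignments."""
    A = d                                          # A(0)
    dA = a * delta**3 + b * delta**2 + c * delta   # dA(0)
    ddA = 6 * a * delta**3 + 2 * b * delta**2      # ddA(0)
    dddA = 6 * a * delta**3
    out = []
    for _ in range(n):
        out.append(A)
        A += dA
        dA += ddA
        ddA += dddA
    return out
```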
Figure 8.24: A Rectangular Wireframe with 186 Points.
8.15 Subdividing a Surface Patch

The surface subdivision method illustrated here is based on the approach employed in Section 8.8.2 to subdivide a curve. Hence, the reader is advised to read and understand Section 8.8.2 before tackling the material presented here.

Imagine a user trying to construct a surface patch with an interactive algorithm. The patch is based on quantities Pij that are normally points (some of these quantities may be tangent vectors, but we’ll refer to them as points), but the surface refuses to take the desired shape even after the points Pij have been moved about, shuffled, and manipulated endlessly. This is a common case and it indicates that more points are needed. Just adding new points is a bad approach, because the extra points will modify the shape of the surface and will therefore require the designer to start afresh. A better solution is to add points in such a way that the new surface will have the same shape as the original one. A surface subdivision method takes a surface patch defined by n points Pij and partitions it into several smaller patches such that together those patches have the same shape as the original surface, and each is defined by n points Qij, each of which is computed from the original points.

We illustrate this approach to surface subdivision using the bicubic surface patch as an example. The general expression of such a patch is Equation (8.27), duplicated here

P(u, w) = (u^3, u^2, u, 1) N
          [ P33 P32 P31 P30
            P23 P22 P21 P20
            P13 P12 P11 P10
            P03 P02 P01 P00 ] N^T (w^3, w^2, w, 1)^T = UNPN^T W^T,
where both u and w vary independently over the interval [0, 1]. We now select four numbers u1 , u2 , w1 , and w2 that satisfy 0 ≤ u1 < u2 ≤ 1 and 0 ≤ w1 < w2 ≤ 1. The expression P(u, w) where u and w vary in the intervals [u1 , u2 ] and [w1 , w2 ], respectively, is a rectangle on this surface (Figure 8.25a).
Figure 8.25: Rectangles on a Bicubic Surface Patch.
The next step is to substitute new parameters t and v for u and w, respectively, and express rectangle P(u, w) as P(t, v), where both t and v vary independently in [0, 1]. If the original rectangle is expressed as

P(u, w) = UNPN^T W^T,   u1 ≤ u ≤ u2,   w1 ≤ w ≤ w2,

then after the substitutions its shape will be the same and its form will be

P(t, v) = TNQN^T V^T,   for 0 ≤ t ≤ 1,   0 ≤ v ≤ 1.
Both rectangles have the same shape, but P(t, v) is defined by means of new points Qij, and the main task is to figure out how to compute the Qij’s from the original points Pij while preserving the shape. Once this is clear, a surface patch can be divided into several rectangles, as in Figure 8.25b, and each expressed in terms of new points. Each new rectangle has the same shape as that part of the surface from which it came, but is defined by the same number of points as the entire original surface. Each rectangle can now be reshaped because of the extra points.

The parameter substitutions from u and w to t and v are the linear relations t = (u − u1)/(u2 − u1) and v = (w − w1)/(w2 − w1). These imply

u = (u2 − u1)[t + u1/(u2 − u1)]   and   w = (w2 − w1)[v + w1/(w2 − w1)].

The rectangle is expressed by means of the new parameters in the form

P(t, v) = [(u2 − u1)^3 (t + u1/(u2 − u1))^3, (u2 − u1)^2 (t + u1/(u2 − u1))^2, (u2 − u1)(t + u1/(u2 − u1)), 1]
          × NPN^T × [(w2 − w1)^3 (v + w1/(w2 − w1))^3, (w2 − w1)^2 (v + w1/(w2 − w1))^2, (w2 − w1)(v + w1/(w2 − w1)), 1]^T

= [t^3, t^2, t, 1] L NPN^T R [v^3, v^2, v, 1]^T,     (8.29)

where

L = [ (u2 − u1)^3      0             0        0
      3u1(u2 − u1)^2   (u2 − u1)^2   0        0
      3u1^2(u2 − u1)   2u1(u2 − u1)  u2 − u1  0
      u1^3             u1^2          u1       1 ],

R = [ (w2 − w1)^3  3w1(w2 − w1)^2  3w1^2(w2 − w1)  w1^3
      0            (w2 − w1)^2     2w1(w2 − w1)    w1^2
      0            0               w2 − w1         w1
      0            0               0               1    ].

Thus,

P(t, v) = [t^3, t^2, t, 1] L NPN^T R [v^3, v^2, v, 1]^T = [t^3, t^2, t, 1] N Q N^T [v^3, v^2, v, 1]^T,

where the new points Q are related to the original points by Q = N^{−1} L NPN^T R (N^T)^{−1}.

To illustrate the application of matrices L and R of Equation (8.29), we apply them to the special case u1 = 0, u2 = 1/2, w1 = 1/2, and w2 = 1 to isolate the blue rectangle of Figure 8.26. The resulting matrices are

L = [ 1/8  0    0    0
      0    1/4  0    0
      0    0    1/2  0
      0    0    0    1 ],

R = [ 1/8  3/8  3/8  1/8
      0    1/4  1/2  1/4
      0    0    1/2  1/2
      0    0    0    1   ].
These should be compared with matrices L and R of Equations (8.9) and (8.11), respectively.
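The defining property of matrix L in Equation (8.29), namely that [t^3, t^2, t, 1]L = [u^3, u^2, u, 1] when u = u1 + (u2 − u1)t, is easy to confirm numerically (matrix R is analogous, acting from the right). A sketch, with names ours:

```python
def matrix_L(u1, u2):
    """Matrix L of Equation (8.29): [t^3,t^2,t,1] L = [u^3,u^2,u,1], u = u1 + (u2-u1)t."""
    s = u2 - u1
    return [[s**3,          0.0,        0.0, 0.0],
            [3 * u1 * s**2, s**2,       0.0, 0.0],
            [3 * u1**2 * s, 2 * u1 * s, s,   0.0],
            [u1**3,         u1**2,      u1,  1.0]]

def row_times(M, v):
    """The row vector v times the 4x4 matrix M."""
    return [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
```

For u1 = 0 and u2 = 1/2 the function reproduces the diagonal matrix L of the special case above.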
Figure 8.26: A Rectangle on a Surface Patch.
8.16 Surface Normals

The main aim of computer graphics is to display real-looking, solid surfaces. This is done by applying a shading algorithm to every pixel on the surface. Such algorithms may be very complex, but the main task of shading is to compute the amount of light reflected from every surface point. This requires the calculation of the normal to the surface at every point. The normal is the vector that’s perpendicular to the surface at the point. It can be defined in two ways:
1. We imagine a flat plane touching the surface at the point (this is called the osculating plane). The normal is the vector that’s perpendicular to this plane.
2. We calculate two tangent vectors to the surface at the point. The normal is the vector that’s perpendicular to both tangents.
The following shows how to calculate the normal vectors for various types of surfaces.

The normal to the implicit surface F(x, y, z) = 0 at point (x0, y0, z0) is the vector

(∂F(x0, y0, z0)/∂x, ∂F(x0, y0, z0)/∂y, ∂F(x0, y0, z0)/∂z).

Example: The ellipsoid x^2/a^2 + y^2/b^2 + z^2/c^2 − 1 = 0. A partial derivative would be, for example, ∂F/∂x = 2x/a^2, so the normal is

(2x/a^2, 2y/b^2, 2z/c^2),

which is in the same direction as

(x/a^2, y/b^2, z/c^2).

For example, the normal at point (0, 0, −c) is (0, 0, −c/c^2) = (0, 0, −1/c). This is a vector in the direction (0, 0, −1).

Exercise 8.26: What is the normal to the explicit surface z = f(x, y) at point (x0, y0)?

No money, no job, no rent. Hey, I’m back to normal.
—Mickey Rourke (as Henry Chinaski) in Barfly, 1987.

The normal to the parametric surface P(u, w) is calculated in two steps. In step 1, the two tangent vectors U = ∂P(u, w)/∂u and V = ∂P(u, w)/∂w are calculated. In step 2, the normal is calculated as their cross-product U × V (Equation (A.4), Page 1290).

The normal to a polygon in a polygonal surface (Section 9.2) can be calculated as shown for an implicit surface. The (implicit) plane equation is F(x, y, z) = Ax + By + Cz + D = 0, so the normal is (∂F/∂x, ∂F/∂y, ∂F/∂z), which is simply (A, B, C). Another way of calculating the normal, especially suited for triangles, is to find two vectors on the surface and calculate their cross-product. Two suitable vectors are U = P1 − P2 and V = P1 − P3, where P1, P2, and P3 are the triangle’s corners. Their cross product is

U × V = (Uy Vz − Uz Vy, Uz Vx − Ux Vz, Ux Vy − Uy Vx).
Example: A polygon with vertices (1, 1, −1), (1, 1, 1), (1, −1, 1), and (1, −1, −1). All the vertices have x = 1, so they are on the x = 1 plane, which means that the normal should be a vector in the x direction. The calculation is straightforward:

U = (1, 1, 1) − (1, 1, −1) = (0, 0, 2),
V = (1, −1, 1) − (1, 1, −1) = (0, −2, 2),
U × V = (0 − (−4), 0 − 0, 0 − 0) = (4, 0, 0).

This is a vector in the right direction.

Exercise 8.27: What will happen if we calculate U as (1, 1, −1) − (1, 1, 1)?

Exercise 8.28: Find the normal to the pyramid face of Equation (Ans.10).

Exercise 8.29: Find the normal to the cone of Equation (Ans.9).

Exercise 8.30: Construct a cylinder as a sweep surface (Chapter 16) and find its normal vector. Assume that the cylinder is swept when the line from (−a, 0, R) to (a, 0, R) is rotated 360° about the x axis.
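The triangle-normal calculation of the example above takes only a few lines. A sketch (the function name is ours):

```python
def triangle_normal(p1, p2, p3):
    """Normal of the triangle (p1, p2, p3): the cross product of two edge vectors."""
    u = tuple(b - a for a, b in zip(p1, p2))   # the vector p2 - p1
    v = tuple(b - a for a, b in zip(p1, p3))   # the vector p3 - p1
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])
```

Listing the corners in the opposite order reverses the sign of the result, which is why the order of the three points in memory can be used to distinguish the inside of a surface from its outside.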
John’s leaning against the window, probably trying to figure out what parametric equation generated the petals on that eight-foot-tall, carnivorous plant. He turns around to be introduced. “John Cantrell.” “Harvard Li. Didn’t you get my e-mail?” Harvard Li! Now Randy is starting to remember this guy. Founder of Harvard Computer Company, a medium-sized PC clone manufacturer in Taiwan.
—Neal Stephenson, Cryptonomicon (2002)
9 Linear Interpolation

In order to achieve realism, the many algorithms and techniques employed in computer graphics have to construct mathematical models of curved surfaces, models that are based on curves. At first it seems that straight line segments and flat surface patches, which are simple geometric figures, cannot play an important role in achieving realism, yet they turn out to be useful in many instances. A smooth curve can be approximated by a set of short, straight segments. A smooth, curved surface can similarly be approximated by a set of surface patches, each a small, flat polygon. Thus, this chapter discusses straight lines and flat surfaces that are defined by points. The application of these simple geometric figures to computer graphics is referred to as linear interpolation. The chapter also presents two types of surfaces, bilinear and lofted, that are curved, but are partly based on straight lines.
9.1 Straight Segments

We start with the parametric equation of a straight segment. Given any two points A and C, the expression A + α(C − A) is the sum of a point and a vector, so it is a point (see Page 434) that we can denote by B. The vector C − A points from A to C, so adding it to A results in a point on the line connecting A to C. Thus, we conclude that the three points A, B, and C are collinear (see Exercise 13.7). Note that the expression B = A + α(C − A) can be written as the fundamental equation B = (1 − α)A + αC, showing that B is a linear combination of A and C with barycentric weights. In general, any of three collinear points can be written as a linear combination of the other two. Such points are not independent. We therefore conclude that given two arbitrary points P0 and P1, the parametric representation of the line segment from P0 to P1 is

P(t) = (1 − t)P0 + tP1 = P0 + (P1 − P0)t = P0 + td,   for 0 ≤ t ≤ 1.     (9.1)

D. Salomon, The Computer Graphics Manual, Texts in Computer Science, DOI 10.1007/978-0-85729-886-7_9, © Springer-Verlag London Limited 2011
The tangent vector of this line is the constant vector dP(t)/dt = P1 − P0 = d, the direction from P0 to P1. If we think of Pi as the vector from the origin to point Pi, then the figure on the right shows how the straight line is obtained as a linear, barycentric combination of the two vectors P0 and P1, with coefficients (1 − t) and t. We can think of this combination as a vector that pivots from P0 to P1 while varying its magnitude, so its tip always stays on the line.

The expression P0 + td is also useful. It describes the line as the sum of the point P0 and the vector td, a vector pointing from P0 to P1, whose magnitude depends on t. This representation is useful in cases where the direction of the line and one point on it are known. Notice that varying t in the interval [−∞, +∞] constructs the infinite line that contains P0 and P1.
9.1.1 Distance of a Point From a Line

Given a line in parametric form L(t) = P0 + tv (where v is a vector in the direction of the line) and a point P, what is the distance between them? Assume that Q is the point on L(t) that’s the closest to P. Point Q can be expressed as Q = L(t0) = P0 + t0v for some t0. The vector from Q to P is P − Q. Since Q is the nearest point to P, this vector should be perpendicular to the line. Thus, we end up with the condition (P − Q) • v = 0 or (P − P0 − t0v) • v = 0, which is satisfied by

t0 = (P − P0) • v / (v • v).

Substituting this value of t0 in the line equation gives

Q = P0 + [(P − P0) • v / (v • v)] v.     (9.2)

The distance between Q and P is the magnitude of vector P − Q. This method always works since vector v cannot be zero (otherwise there would be no line).

In the two-dimensional case, the line can be represented explicitly as y = ax + b and the problem can be easily solved with just elementary trigonometry. Figure 9.1 shows a general point P = (Px, Py) at a distance d from a line y = ax + b. It is easy to see that the vertical distance e between the line and P is |Py − aPx − b|. We also know from trigonometry that

1 = sin^2 α + cos^2 α = tan^2 α cos^2 α + cos^2 α = cos^2 α (1 + tan^2 α),

implying cos^2 α = 1/(1 + tan^2 α). We therefore get

d = e cos α = e √(cos^2 α) = e/√(1 + tan^2 α) = |Py − aPx − b|/√(1 + a^2).     (9.3)
Figure 9.1: Distance Between P and y = ax + b.
Exercise 9.1: Many mathematics problems can be solved in more than one way and this problem is a good example. It is easy to solve by approaching it from different directions. Suggest some approaches to the solution.
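The parametric approach of Equation (9.2) is one such approach and takes only a few lines of code; it works unchanged in two or three dimensions. A sketch (names ours):

```python
def point_line_distance(p, p0, v):
    """Distance from point p to the line L(t) = p0 + t*v (Equation (9.2))."""
    w = tuple(a - b for a, b in zip(p, p0))                       # p - p0
    t0 = sum(a * b for a, b in zip(w, v)) / sum(a * a for a in v)
    q = tuple(a + t0 * b for a, b in zip(p0, v))                  # nearest point on the line
    return sum((a - b)**2 for a, b in zip(p, q)) ** 0.5
```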
A man who boasts about never changing his views is a man who’s decided always to travel in a straight line—the kind of idiot who believes in absolutes.
—Honoré de Balzac, Père Goriot, 1834.
9.1.2 Intersection of Lines

Here is a simple, fast algorithm for finding the intersection point(s) of two line segments. Assuming that the two segments P1 + α(P2 − P1) and P3 + β(P4 − P3) are given (Equation (9.1)), their intersection point satisfies

P1 + α(P2 − P1) = P3 + β(P4 − P3),

or α(P2 − P1) − β(P4 − P3) + (P1 − P3) = 0. This can also be written αA + βB + C = 0, where A = P2 − P1, B = P3 − P4, and C = P1 − P3. The solutions are

α = (By Cx − Bx Cy)/(Ay Bx − Ax By),   β = (Ax Cy − Ay Cx)/(Ay Bx − Ax By).
The calculation of A, B, and C requires six subtractions. The calculation of α and β requires three subtractions, six multiplications (since the denominators are identical), and two divisions.

Example: To calculate the intersection of the line segment from P1 = (−1, 1) to P2 = (1, −1) with the line segment from P3 = (−1, −1) to P4 = (1, 1), we first calculate

A = P2 − P1 = (2, −2),   B = P3 − P4 = (−2, −2),   C = P1 − P3 = (0, 2).

Then calculate

α = (0 + 4)/(4 + 4) = 1/2,   β = (4 − 0)/(4 + 4) = 1/2.

The lines intersect at their midpoints.

Example: The line segment from P1 = (0, 0) to P2 = (1, 0) and the line segment from P3 = (2, 0) to P4 = (2, 1) don’t intersect. However, the calculation shows the values of α and β necessary for them to intersect,

A = P2 − P1 = (1, 0),   B = P3 − P4 = (0, −1),   C = P1 − P3 = (−2, 0),

yields

α = (2 − 0)/(0 + 1) = 2,   β = (0 − 0)/(0 + 1) = 0.

The lines would intersect at α = 2 (i.e., if we extend the first segment to twice its length beyond P2) and β = 0 (i.e., point P3).

Exercise 9.2: How can we identify overlapping lines (i.e., the case of infinitely many intersection points) and parallel lines (no intersection points)? See Figure 9.2.
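The algorithm above can be sketched as follows (names ours). The function returns the pair (α, β), or None when the common denominator vanishes; the latter is exactly the parallel/overlapping situation of Exercise 9.2, which then needs a separate test:

```python
def segment_intersection(p1, p2, p3, p4):
    """Solve alpha*A + beta*B + C = 0 for the segments p1p2 and p3p4.
    The segments themselves intersect only when 0 <= alpha, beta <= 1."""
    ax, ay = p2[0] - p1[0], p2[1] - p1[1]   # A = P2 - P1
    bx, by = p3[0] - p4[0], p3[1] - p4[1]   # B = P3 - P4
    cx, cy = p1[0] - p3[0], p1[1] - p3[1]   # C = P1 - P3
    den = ay * bx - ax * by
    if den == 0:
        return None                          # parallel or overlapping lines
    return ((by * cx - bx * cy) / den, (ax * cy - ay * cx) / den)
```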
Figure 9.2: Parallel and Overlapped Lines.
The description of right lines and circles, upon which geometry is founded, belongs to mechanics. Geometry does not teach us to draw these lines, but requires them to be drawn. —Isaac Newton, 1687.
9.2 Polygonal Surfaces

A polygonal surface consists of a number of flat faces, each a polygon. A polygon in such a surface is typically a triangle, because the three points of a triangle always lie on the same plane. With higher-order polygons, the surface designer should make sure that all the corners of the polygon are on the same plane. See Section 3.9 for the important topic of filling polygons and Section 2.18 for a discussion of two-dimensional polygons. Each polygon is a collection of vertices (the points defining it) and edges (the lines connecting the points). Such a surface is easy to display, either as a wire frame or as a solid surface. In the former case, the edges of all the polygons should be displayed. In the latter case, all the points in a polygon are assigned the same color and brightness. They are all assumed to reflect the same amount of light, since the polygon is flat and has only one normal vector. As a result, a polygonal surface shaded this way appears angular and unnatural, but a simple method known as Gouraud’s algorithm (Section 17.3) smooths out the reflections from the individual polygons and renders the entire polygonal surface so it looks curved.

Three methods are described for representing such a surface in memory:

1. Explicit polygons. Each polygon is represented as a list

((x1, y1, z1), (x2, y2, z2), . . . , (xn, yn, zn))

of its vertices, and it is assumed that there is an edge from point 1 to point 2, from 2 to 3, and so on, and also an edge from point n to point 1. This representation is simple but has two disadvantages:
I. A point may be shared by several polygons, so several copies have to be stored. If the user decides to modify the point, all its copies have to be located and updated. This is a minor problem, because an edge is rarely shared by more than two polygons.
II. An edge may also be shared by several polygons. When displaying the surface, such an edge will be displayed several times, slowing down the entire process.

2. Polygon definition by pointers. There is one list V = ((x1, y1, z1), (x2, y2, z2), . . . , (xn, yn, zn)) of all the vertices of the surface. A polygon is represented as a list of pointers, each pointing to a vertex in V. Hence, P = (3, 5, 7, 10) implies that polygon P consists of vertices 3, 5, 7, and 10 in V. Problem II still exists.

3. Explicit edges. List V is as before, and there is also an edge list E = ((v1, v6, p3), (v5, v7, p1, p3, p6, p8), . . .). Each element of E represents an edge. It contains two pointers to the vertices of the edge followed by pointers to all the polygons that share the edge. Each polygon is represented by a list of pointers to E, for example, P1 = (e1, e4, e5). Problem II still exists, but it is minor.
9.2.1 Polygon Planarity

Given a polygon defined by points P1, P2, . . . , Pn, we use the scalar triple product (Equation (A.7)) to test for polygon planarity (i.e., to check whether all the polygon’s vertices Pi are on the same plane). Such a test is necessary only if n > 3. We select P1 as the “pivot” point and calculate the n − 1 pivot vectors vi = Pi − P1 for i = 2, . . . , n. Next, we calculate the n − 3 scalar triple products vi • (v2 × v3) for i = 4, . . . , n. If any of these products are nonzero, the polygon is not planar. Note that limited accuracy on some computers may cause an otherwise null triple product to come out as a small floating-point number.

Exercise 9.3: Consider the polygon defined by the four points P1 = (1, 0, 0), P2 = (0, 1, 0), P3 = (1, a, 1), and P4 = (0, −a, 0). For what values of a will it be planar?
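The planarity test can be sketched as follows (the function name is ours; the code assumes at least four vertices and that the first three are not collinear):

```python
def is_planar(points, eps=1e-9):
    """Scalar-triple-product planarity test: checks v_i . (v2 x v3) for i = 4..n."""
    p1 = points[0]                                            # the pivot point
    vs = [tuple(a - b for a, b in zip(p, p1)) for p in points[1:]]
    v2, v3 = vs[0], vs[1]
    cross = (v2[1] * v3[2] - v2[2] * v3[1],
             v2[2] * v3[0] - v2[0] * v3[2],
             v2[0] * v3[1] - v2[1] * v3[0])
    # eps absorbs the floating-point noise mentioned above
    return all(abs(sum(a * b for a, b in zip(v, cross))) < eps for v in vs[2:])
```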
9.2.2 Plane Equations

A polygonal surface consists of flat polygons (often triangles). To calculate the normal to a polygon, we first need to know the polygon's equation. The implicit equation of a flat plane is Ax + By + Cz + D = 0 (Section 4.4.1). Four equations are needed to solve for the four unknown coefficients A, B, C, and D, and these equations are easy to set up. Given three points Pi = (xi, yi, zi), i = 1, 2, 3, on the surface, we write the four equations

Ax + By + Cz + D = 0,
Ax1 + By1 + Cz1 + D = 0,
Ax2 + By2 + Cz2 + D = 0,
Ax3 + By3 + Cz3 + D = 0.

The first equation is true for any point (x, y, z) on the plane. We cannot solve this homogeneous system of four equations in four unknowns directly, but we know that it has a nontrivial solution if and only if its determinant is zero. The expression below assumes this and also expands the determinant by its top row:

\[
0 = \begin{vmatrix} x & y & z & 1 \\ x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \end{vmatrix}
= x \begin{vmatrix} y_1 & z_1 & 1 \\ y_2 & z_2 & 1 \\ y_3 & z_3 & 1 \end{vmatrix}
- y \begin{vmatrix} x_1 & z_1 & 1 \\ x_2 & z_2 & 1 \\ x_3 & z_3 & 1 \end{vmatrix}
+ z \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}
- \begin{vmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{vmatrix}.
\]

This expression is of the form Ax + By + Cz + D = 0, where

\[
A = \begin{vmatrix} y_1 & z_1 & 1 \\ y_2 & z_2 & 1 \\ y_3 & z_3 & 1 \end{vmatrix},\quad
B = -\begin{vmatrix} x_1 & z_1 & 1 \\ x_2 & z_2 & 1 \\ x_3 & z_3 & 1 \end{vmatrix},\quad
C = \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix},\quad
D = -\begin{vmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{vmatrix}.
\tag{9.4}
\]
9 Linear Interpolation
Exercise 9.4: Calculate the expression of the plane containing the z axis and passing through the point (1, 1, 0). Exercise 9.5: In the plane equation Ax + By + Cz + D = 0, if D = 0, then the plane passes through the origin. Assuming D = 0, we can write the same equation as x/a + y/b + z/c = 1, where a = −D/A, b = −D/B, and c = −D/C. What is the geometrical interpretation of a, b, and c?
We operate with nothing but things which do not exist, with lines, planes, bodies, atoms, divisible time, divisible space—how should explanation even be possible when we first make everything into an image, into our own image! —Friedrich Nietzsche.

In some practical situations, the normal to the plane, as well as one point on the plane, are known. It is easy to derive the plane equation in such a case. We assume that N is the (known) normal vector to the plane, P1 is a known point, and P is any point in the plane. The vector P − P1 is perpendicular to N, so their dot product N • (P − P1) equals zero. Since the dot product is distributive with respect to vector addition, we can write N • P = N • P1. The dot product N • P1 is just a number, to be denoted by s, so we obtain N • P = s or Nx x + Ny y + Nz z − s = 0. This is identical to Equation (4.25), which can now be written as Ax + By + Cz + D = 0, where A = Nx, B = Ny, C = Nz, and D = −s = −N • P1. The three unknowns A, B, and C are therefore the components of the normal vector, and D can be calculated from any known point P1 on the plane. The expression N • P = s is a useful equation of the plane and is used elsewhere in this book.

Exercise 9.6: Given N = (1, 1, 1) and P1 = (1, 1, 1), calculate the plane equation.

Note that the direction of the normal in this case is unimportant. Substituting (−A, −B, −C) for (A, B, C) would also change the sign of D, resulting in the same equation. However, the direction of the normal is important when the surface is to be shaded. To be used for the calculation of reflection, the normal has to point outside the surface. This has to be verified by the user, since the computer has no idea of the shape of the surface and the meaning of "inside" and "outside." In the case where a plane is defined by three points, the direction of the normal can be specified by arranging the three points (in the data structure in memory) in a certain order.
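The derivation above translates directly into code. A small Python sketch (illustrative names; the sample plane z = 5 is ours, not an example from the text):

```python
# Plane from a known normal N and a known point P1:
# (A, B, C) = N and D = -s = -N.P1, so that Ax + By + Cz + D = 0.

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def plane_from_normal_point(N, P1):
    """Return the coefficients (A, B, C, D) of Ax + By + Cz + D = 0."""
    return (N[0], N[1], N[2], -dot(N, P1))

# The plane z = 5: normal (0, 0, 1) through the point (0, 0, 5).
plane = plane_from_normal_point((0, 0, 1), (0, 0, 5))
```

Any point of the plane, such as (2, 3, 5), then satisfies A·x + B·y + C·z + D = 0.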
It is also easy to derive the equation of a plane when three points on the plane, P1 , P2 , and P3 , are known. In order for the points to define a plane, they should not be collinear. We consider the vectors r = P2 − P1 and s = P3 − P1 a local coordinate system on the plane. Any point P on the plane can be expressed as a linear combination P = ur + ws, where u and w are real numbers. Since r and s are local coordinates on the plane, the position of point P relative to the origin is expressed as (Figure 9.3) P(u, w) = P1 + ur + ws,
−∞ < u, w < ∞
Figure 9.3: Three Points on a Plane.
(This is Equation (4.26)). Exercise 9.7: Given the three points P1 = (3, 0, 0), P2 = (0, 3, 0), and P3 = (0, 0, 3), write the equation of the plane defined by them.
9.2.3 Space Division An infinite plane divides the entire three-dimensional space into two parts. We can call them “outside” and “inside” (or “above” and “below”), and define the outside direction as the direction pointed to by the normal. Using the plane equation, N • P = s, it is possible to tell if a given point Pi lies inside, outside, or on the plane. All that’s necessary is to examine the sign of the dot product N • (Pi − P), where P is any point on the plane, different from Pi . This dot product can also be written |N| |Pi −P| cos θ, where θ is the angle between the normal N and the vector Pi − P. The sign of the dot product equals the sign of cos θ, and Figure 9.4a shows that for −90◦ < θ < 90◦ , point Pi lies outside the plane, for θ = 90◦ , point Pi lies on the plane, and for θ > 90◦ , Pi lies inside the plane. The regular division of the plane into congruent figures evoking an association in the observer with a familiar natural object is one of these hobbies or problems. . . . I have embarked on this geometric problem again and again over the years, trying to throw light on different aspects each time. I cannot imagine what my life would be like if this problem had never occurred to me; one might say that I am head over heels in love with it, and I still don’t know why. —M. C. Escher.
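A hedged Python sketch of this classification (names are ours; eps absorbs floating-point noise in the dot product):

```python
# Inside/outside test: the sign of N.(Pi - P) classifies point Pi
# against the plane through P with outward normal N.

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def classify(N, P, Pi, eps=1e-9):
    """Return 'outside', 'inside', or 'on' for point Pi."""
    d = dot(N, tuple(pi - p for pi, p in zip(Pi, P)))
    if d > eps:
        return 'outside'
    if d < -eps:
        return 'inside'
    return 'on'
```

With the xy plane (N = (0, 0, 1), P the origin), points of positive z are classified outside and points of negative z inside, matching the cos θ argument above.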
9.2.4 Turning Around on a Polygon When moving along the edges of a polygon from vertex to vertex, we make a turn at each vertex. Sometimes, the “sense” of the turn (left or right) is important. However, the terms “left” and “right” are relative, depending on the location of the observer, and are therefore ambiguous. Consider Figure 9.4b. It shows two edges, a and b, of a “thick” polygon, with two arrows pointing from a to b. Imagine each arrow to be a bug crawling on the polygon. The bug on the top considers the turn from a to b a left turn, while the bug crawling on the bottom considers the same turn to be a “right” turn.
Figure 9.4: (a) Space Division. (b) Turning on a Polygon.
It is therefore preferable to define terms such as “positive turn” and “negative turn,” that depend on the polygon and on the coordinate axes, but not on the position of any observer. To define these terms, consider the plane defined by the vectors a and b (if they are parallel, they don’t define any plane, but then there is no sense talking about turning from a to b). The cross product a × b is a vector perpendicular to the plane. It can point in the direction of the normal N to the plane, or in the opposite direction. In the former case, we say that the turn from a to b is positive; in the latter case, the turn is said to be negative. To calculate the sense of the turn, simply check the sign of the triple scalar product N • (a × b). A positive sign implies a positive turn. Exercise 9.8: Why?
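The turn-sense test amounts to one cross product and one dot product. An illustrative Python sketch (function names are ours):

```python
# Sense of the turn from edge vector a to edge vector b, relative to
# the plane normal N: the sign of N.(a x b).

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def turn_sign(N, a, b):
    """+1 for a positive turn, -1 for a negative turn, 0 if a and b are parallel."""
    t = dot(N, cross(a, b))
    return (t > 0) - (t < 0)
```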
9.2.5 Convex Polygons Given a polygon, we select two arbitrary points on its edges and connect them with a straight line. If for any two such points the line is fully contained in the polygon, then the polygon is called convex (Section 2.18). Another way to define a convex polygon is to say that a line can intersect such a polygon at only two points (unless the line is identical to one of the edges or it grazes the polygon at one point). The sense of a turn (positive or negative) can also serve to define a convex polygon. When traveling from vertex to vertex in such a polygon all turns should have the same sense. They should all be positive or all negative. In contrast, when traveling along a concave polygon, both positive and negative turns must be made (Figure 9.5).
Figure 9.5: Convex and Concave Polygons.
We can think of a polygon as a set of points in two dimensions. The concept of a set of points, however, exists in any number of dimensions. A set of points is convex if it satisfies the definition regardless of the number of dimensions. One important concept associated with a set of points is the convex hull of the set. This is the set of “extreme” points that satisfies the following: the set obtained by connecting the points of the convex hull contains all the points of the set. (A simple, two-dimensional analogy is to consider the points nails driven into a board. A rubber band placed around all the nails and stretched will identify the points that constitute the convex hull.)
9.2.6 Line and Plane Intersection Given a plane N • P = s and a line P = P1 + td (Equation (9.1)), it is easy to calculate their intersection point. We simply substitute the value of P in the plane equation to obtain N • (P1 + td) = s. This results in t = (s − N • P1 )/(N • d). Thus, we compute the value of t and substitute it in the line equation, to get the point of intersection. Such a process is important in ray tracing, an important rendering algorithm where the intersections of light rays and polygons are computed all the time. Exercise 9.9: The intersection of a line parallel to a plane is either the entire line (if the line happens to be in the plane) or is empty. How do we distinguish these cases from the equation above?
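A Python sketch of this computation (names are ours); returning None covers both parallel cases mentioned in Exercise 9.9:

```python
# Intersect the plane N.P = s with the line P1 + t d:
# t = (s - N.P1) / (N.d); N.d == 0 flags the parallel case.

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def line_plane_intersection(N, s, P1, d, eps=1e-12):
    """Return the intersection point, or None when the line is parallel
    to the plane (either disjoint from it or lying entirely in it)."""
    denom = dot(N, d)
    if abs(denom) < eps:
        return None
    t = (s - dot(N, P1)) / denom
    return tuple(p + t*di for p, di in zip(P1, d))
```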
9.2.7 Triangles

A polygonal surface is often constructed of triangles. A triangle is flat but finite, whereas the plane equation describes an infinite plane. We therefore need to modify this equation to describe only the area inside a given triangle. Given any three noncollinear points P1, P2, and P3 in three dimensions, we first derive the equation of the (infinite) plane defined by them. Following that, we limit ourselves to just that part of the plane that's inside the triangle. We start with the two vectors (P2 − P1) and (P3 − P1). They can serve as local coordinate axes on the plane (even though they are not normally perpendicular), with point P1 as the local origin. The linear combination u(P2 − P1) + w(P3 − P1), where both u and w can take any real values, is a vector on the plane. To obtain the coordinates of an arbitrary point on the plane, we simply add point P1 to this linear combination (recall that the sum of a point and a vector is a point). The resulting plane equation is

P1 + u(P2 − P1) + w(P3 − P1) = P1(1 − u − w) + P2 u + P3 w.   (9.5)
To limit the area covered to just the triangle whose corners are P1 , P2 , and P3 , we note that Equation (9.5) yields P1 , when u = 0 and w = 0, P2 , when u = 1 and w = 0, P3 , when u = 0 and w = 1. The entire triangle can therefore be obtained by varying u and w under the conditions u ≥ 0, w ≥ 0, and u + w ≤ 1.
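Equation (9.5) and the parameter conditions can be sketched as follows (Python, names are ours):

```python
# Evaluate Equation (9.5) and test whether a parameter pair (u, w)
# selects a point inside the triangle P1 P2 P3.

def triangle_point(P1, P2, P3, u, w):
    """P1(1 - u - w) + P2 u + P3 w, a point of the triangle's plane."""
    return tuple(p1*(1 - u - w) + p2*u + p3*w
                 for p1, p2, p3 in zip(P1, P2, P3))

def inside_triangle(u, w):
    """The conditions u >= 0, w >= 0, u + w <= 1 from the text."""
    return u >= 0 and w >= 0 and u + w <= 1
```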
Exercise 9.10: Given the three points P1 = (10, −5, 4), P2 = (8, −4, 3.2), and P3 = (8, 4, 3.2), derive the equation of the triangle defined by them. Exercise 9.11: Given the three points P1 = (10, −5, 4), P2 = (8, −4, 3.2), and P3 = (12, −6, 4.8), calculate the triangle defined by them. For more information, see [Triangles 11] or [Kimberling 94].
If triangles had a God, He’d have three sides. —Yiddish proverb.
9.3 Bilinear Surfaces A flat polygon is the simplest type of surface. The bilinear surface is the simplest nonflat (curved) surface because it is fully defined by means of its four corner points. It is discussed here because its four boundary curves are straight lines and because the coordinates of any point on this surface are derived by linear interpolations. Since this patch is completely defined by its four corner points, it cannot have a very complex shape. Nevertheless it may be highly curved. If the four corners are coplanar, the bilinear patch defined by them is flat. Let the corner points be the four distinct points P00 , P01 , P10 , and P11 . The top and bottom boundary (Figure 9.6). curves are straight lines and are easy to calculate They are P(u, 0) = P10 − P00 u + P00 and P(u, 1) = P11 − P01 u + P01 . P01 P(0,w)
P(u0,1)
P(u0,w) P(u,0)
P00 P(u0,0)
P11
P(u,1)
P(1,w) P10
Figure 9.6: A Bilinear Surface.
To linearly interpolate between these boundary curves, we first calculate two corresponding points P(u0 , 0) and P(u0 , 1), one on each curve, then connect them with a straight line P(u0 , w). The two points are P(u0 , 0) = (P10 − P00 )u0 + P00
and P(u0 , 1) = (P11 − P01 )u0 + P01 ,
and the straight segment connecting them is

P(u0, w) = (P(u0, 1) − P(u0, 0))w + P(u0, 0)
         = [((P11 − P01)u0 + P01) − ((P10 − P00)u0 + P00)]w + (P10 − P00)u0 + P00.

The expression for the entire surface is obtained when we release the parameter u from its fixed value u0 and let it vary. The result is

\[
\mathbf{P}(u,w) = \mathbf{P}_{00}(1-u)(1-w) + \mathbf{P}_{01}(1-u)w + \mathbf{P}_{10}u(1-w) + \mathbf{P}_{11}uw
= \sum_{i=0}^{1}\sum_{j=0}^{1} B_{1i}(u)\,\mathbf{P}_{ij}\,B_{1j}(w)
= [B_{10}(u), B_{11}(u)]
\begin{pmatrix} \mathbf{P}_{00} & \mathbf{P}_{01} \\ \mathbf{P}_{10} & \mathbf{P}_{11} \end{pmatrix}
\begin{pmatrix} B_{10}(w) \\ B_{11}(w) \end{pmatrix},
\tag{9.6}
\]

where the functions B1i(t) are the Bernstein polynomials of degree 1, introduced in Section 13.17. This implies that the bilinear surface is a special case of the rectangular Bézier surface, introduced in the same section. (The Bernstein polynomials crop up in unexpected places.) Mathematically, the bilinear surface is a hyperbolic paraboloid (see the answer to Exercise 9.12). Its parametric expression is linear in both u and w.

The fundamental equation of computer graphics is P(t) = (1 − t)P1 + tP2. This is the straight segment from point P1 to point P2 expressed as a blend (or a barycentric sum) of the points with the two weights (1 − t) and t. Since B10(t) = 1 − t and B11(t) = t, this expression can also be written in the form

\[
[B_{10}(t), B_{11}(t)] \begin{pmatrix} \mathbf{P}_1 \\ \mathbf{P}_2 \end{pmatrix}.
\tag{9.7}
\]
The reader should notice the similarity between Equations (9.6) and (9.7). The former expression is a direct extension of the latter and is a simple example of the technique of Cartesian product, discussed in Section 8.12, which is used to extend many curves to surfaces.

Figure 9.7 shows a bilinear surface together with the Mathematica code that produced it. The coordinates of the four corner points and the final, simplified expression of the surface are also included. The figure illustrates the bilinear nature of this surface. Every line in the u or in the w directions on this surface is straight, but the surface itself is curved.

Example: We select the four points P00 = (0, 0, 1), P10 = (1, 0, 0), P01 = (1, 1, 1), and P11 = (0, 1, 0) (Figure 9.7) and apply Equation (9.6). The resulting surface patch is

P(u, w) = (0, 0, 1)(1 − u)(1 − w) + (1, 1, 1)(1 − u)w + (1, 0, 0)u(1 − w) + (0, 1, 0)uw
        = (u + w − 2uw, w, 1 − u).   (9.8)
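Equation (9.6) is a one-liner per coordinate. An illustrative Python sketch (names are ours), using the corner points of Equation (9.8):

```python
# Evaluate a bilinear patch from its four corner points (Equation (9.6)).

def bilinear(P00, P01, P10, P11, u, w):
    return tuple(p00*(1-u)*(1-w) + p01*(1-u)*w + p10*u*(1-w) + p11*u*w
                 for p00, p01, p10, p11 in zip(P00, P01, P10, P11))

# The example patch of Equation (9.8):
P00, P10, P01, P11 = (0, 0, 1), (1, 0, 0), (1, 1, 1), (0, 1, 0)
```

Substituting the parameter corners recovers the four corner points, as the text notes, and the center P(0.5, 0.5) comes out as (0.5, 0.5, 0.5), in agreement with Equation (9.8).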
(* a bilinear surface patch *)
Clear[bilinear, pnts, u, w];
pnts = {{0, 0, 1}, {1, 1, 1}, {1, 0, 0}, {0, 1, 0}};
bilinear = {u + w - 2 u w, w, 1 - u};
ParametricPlot3D[bilinear, {u, 0, 1}, {w, 0, 1},
  PlotRange -> All, ViewPoint -> {0.063, -1.734, 2.905}];

Figure 9.7: A Bilinear Surface.
It is easy to check the expression by substituting u = 0, 1 and w = 0, 1, which reduces the expression to the four corner points. The tangent vectors can easily be calculated. They are

∂P(u, w)/∂u = (1 − 2w, 0, −1),   ∂P(u, w)/∂w = (1 − 2u, 1, 0).

The first vector lies in the xz plane, and the second lies in the xy plane.

Example: The four points P00 = (0, 0, 1), P10 = (1, 0, 0), P01 = (0.5, 1, 0), and P11 = (1, 1, 0) are selected and Equation (9.6) is applied to them. The resulting surface patch is (Figure 9.8)

P(u, w) = (0, 0, 1)(1 − u)(1 − w) + (0.5, 1, 0)(1 − u)w + (1, 0, 0)u(1 − w) + (1, 1, 0)uw
        = (0.5(1 − u)w + u, w, (1 − u)(1 − w)).   (9.9)

Note that the y coordinate is simply w. This means that points with the same w value, such as P(0.1, w) and P(0.5, w), have the same y coordinate and are therefore located on the same horizontal line. Also, the z coordinate is a simple function of u and w, varying from 1 (when u = w = 0) to 0 as we move toward u = 1 or w = 1.
(* Another bilinear surface example *)
ParametricPlot3D[{0.5(1-u)w+u, w, (1-u)(1-w)},
  {u,0,1}, {w,0,1}, ViewPoint -> {-0.846, -1.464, 3.997}];
Figure 9.8: A Bilinear Surface.
The boundary curves are very easy to calculate from Equation (9.9). Here are two of them:

P(0, w) = (0.5w, w, 1 − w),   P(u, 1) = (0.5(1 − u) + u, 1, 0).

The tangent vectors can also be obtained from Equation (9.9):

∂P(u, w)/∂u = (−0.5w + 1, 0, w − 1),   ∂P(u, w)/∂w = (0.5(1 − u), 1, u − 1).   (9.10)

The first is a vector in the xz plane, while the second is a vector in the y = 1 plane. The following two tangent values are especially simple: ∂P(u, 1)/∂u = (0.5, 0, 0) and ∂P(1, w)/∂w = (0, 1, 0). The first is a vector in the x direction and the second is a vector in the y direction.

Finally, we compute the normal vector to the surface. This vector is normal to the surface at any point, so it is perpendicular to the two tangent vectors ∂P(u, w)/∂u and ∂P(u, w)/∂w and is therefore the cross-product (Equation (A.4)) of these vectors. The calculation is straightforward:

N(u, w) = ∂P/∂u × ∂P/∂w = (1 − w, 0.5(1 − u), 1 − 0.5w).   (9.11)
There are two ways of satisfying ourselves that Equation (9.11) is the correct expression for the normal: 1. It is easy to prove, by directly calculating the dot products, that the normal vector of Equation (9.11) is perpendicular to both tangents of Equation (9.10). 2. A closer look at the coordinates of our points shows that three of them have a z coordinate of zero and only P00 has z = 1. This means that the surface approaches a flat xy surface as one moves away from point P00 . It also means that the normal should
approach the z direction when u and w move away from zero, and it should move away from that direction when u and w approach zero. It is, in fact, easy to confirm the following limits:

lim_{u,w→1} N(u, w) = (0, 0, 0.5),   lim_{u,w→0} N(u, w) = (1, 0.5, 1).
Exercise 9.12: (1) Calculate the bilinear surface for the points (0, 0, 0), (1, 0, 0), (0, 1, 0), and (1, 1, 1). (2) Guess the explicit representation z = F (x, y) of this surface. (3) What curve results from the intersection of this surface with the plane z = k (parallel to the xy plane)? (4) What curve results from the intersection of this surface with a plane containing the z axis?
The scale, properly speaking, does not permit the measure of the intelligence, because intellectual qualities are not superposable, and therefore cannot be measured as linear surfaces are measured. —Alfred Binet (on his new IQ test).
Example: This is the third example of a bilinear surface. The four points P00 = (0, 0, 1), P10 = (1, 0, 0), and P01 = P11 = (0, 1, 0) create a triangular surface patch (Figure 9.9) because two of them are identical. The surface expression is

P(u, w) = (0, 0, 1)(1 − u)(1 − w) + (0, 1, 0)(1 − u)w + (1, 0, 0)u(1 − w) + (0, 1, 0)uw
        = (u(1 − w), w, (1 − u)(1 − w)).

Notice that the boundary curve P(u, 1) degenerates to the single point (0, 1, 0), i.e., it does not depend on u.
(* A triangular bilinear surface example *)
ParametricPlot3D[{u(1-w), w, (1-u)(1-w)},
  {u,0,1}, {w,0,1}, ViewPoint -> {-2.673, -3.418, 0.046}];
Figure 9.9: A Triangular Bilinear Surface.
Exercise 9.13: Calculate the tangent vectors and the normal vector of this surface. Exercise 9.14: Given the two points P00 = (−1, −1, 0) and P10 = (1, −1, 0), consider them the endpoints of a straight segment L1 . (1) Construct the endpoints of the three straight segments L2 , L3 , and L4 . Each should be translated one unit above its predecessor on the y axis and should be rotated 60◦ about the y axis, as shown in Figure 9.10. Denote the four pairs of endpoints by P00 P10 , P01 P11 , P02 P12 and P03 P13 . (2) Calculate the three bilinear surface patches P1 (u, w) =P00 (1 − u)(1 − w) + P01 (1 − u)w + P10 u(1 − w) + P11 uw, P2 (u, w) =P01 (1 − u)(1 − w) + P02 (1 − u)w + P11 u(1 − w) + P12 uw, P3 (u, w) =P02 (1 − u)(1 − w) + P03 (1 − u)w + P12 u(1 − w) + P13 uw.
Figure 9.10: Four Straight Segments for Exercise 9.14.
Trilinear interpolation. Equation (9.6) yields the coordinates of a point on the bilinear surface defined by four corner points. It is easy to extend this equation to produce the coordinates of a point located inside a cube. Imagine a cube defined by eight given corner points Pijk. Each point Puvw inside this cube can be specified by three parameters u, v, and w, and its coordinates can be determined from the corner points by means of trilinear interpolation as follows:

Puvw = P000(1 − u)(1 − v)(1 − w) + P100 u(1 − v)(1 − w) + P010(1 − u)v(1 − w)
     + P001(1 − u)(1 − v)w + P101 u(1 − v)w + P011(1 − u)vw + P110 uv(1 − w)
     + P111 uvw.
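The trilinear formula can be sketched compactly (Python; the dictionary keyed by the (i, j, k) bit pattern of Pijk is our convention, not the book's):

```python
# Trilinear interpolation over a cube's eight corners.

def trilinear(P, u, v, w):
    """P maps (i, j, k) in {0,1}^3 to a corner point (a 3-tuple)."""
    fu, fv, fw = (1 - u, u), (1 - v, v), (1 - w, w)
    return tuple(
        sum(P[(i, j, k)][c] * fu[i] * fv[j] * fw[k]
            for i in (0, 1) for j in (0, 1) for k in (0, 1))
        for c in range(3))

# Unit-cube corners: P[(i,j,k)] = (i, j, k).
P = {(i, j, k): (float(i), float(j), float(k))
     for i in (0, 1) for j in (0, 1) for k in (0, 1)}
```

For the unit cube, the interpolated point is simply (u, v, w), which makes the formula easy to sanity-check.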
9.4 Lofted Surfaces

A lofted surface patch is curved, but it belongs in this chapter because it is linear in one direction. Section 1.2 explains the reason for the name lofted. It is bounded by two arbitrary curves (that we denote by P(u, 0) and P(u, 1)) and by two straight segments P(0, w) and P(1, w) connecting them. Surface lines in the w direction are therefore straight, whereas each line in the u direction is a blend of P(u, 0) and P(u, 1). The blend of the two curves is simply (1 − w)P(u, 0) + wP(u, 1), and this blend (the familiar fundamental equation of computer graphics) constitutes the expression of the surface:

P(u, w) = (1 − w)P(u, 0) + wP(u, 1).   (9.12)
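Equation (9.12) can be sketched with the boundary curves passed as functions (Python, names ours; the two sample curves, a straight segment and a parabolic arc, are our own illustration):

```python
# A lofted patch: (1 - w) curve0(u) + w curve1(u)  (Equation (9.12)).

def lofted(curve0, curve1, u, w):
    p0, p1 = curve0(u), curve1(u)
    return tuple((1 - w)*a + w*b for a, b in zip(p0, p1))

# Illustrative boundary curves: a straight segment and a parabolic arc.
curve0 = lambda u: (u, 0.0, 0.0)
curve1 = lambda u: (u, 1.0, u*(1 - u))
```

Lines of constant u are straight segments between the two curves, which is exactly the linearity in w that the text describes.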
This expression is linear in w, implying straight lines in the w direction. Moving in the u direction, we travel on a curve whose shape depends on the value of w. For w0 ≈ 0, the curve P(u, w0) is close to the boundary curve P(u, 0). For w0 ≈ 1, it is close to the boundary curve P(u, 1). For w0 = 0.5, it is 0.5P(u, 0) + 0.5P(u, 1), an equal mixture of the two. Note that this kind of surface is fully defined by specifying the two boundary curves. The four corner points are implicit in these curves.

These surfaces are sometimes called ruled, because straight lines are an important part of their description. This is also the reason why this type of surface is sometimes defined as follows: a surface is a lofted surface if and only if through every point on it there is a straight line that lies completely on the surface. This definition implies that any cylinder is a lofted surface, but a little thinking shows that even a bilinear surface is lofted.

Example: We start with the six points P1 = (−1, 0, 0), P2 = (0, −1, 0), P3 = (1, 0, 0), P4 = (−1, 0, 1), P5 = (0, −1, 1), and P6 = (1, 0, 1). Because of the special coordinates of the points (and because of the way we will compute the boundary curves), the surface is easy to visualize (Figure 9.11). This helps to intuitively make sense of the expressions for the tangent vectors and the normal. Note especially that the left and right edges of the surface are in the xz plane, whereas we will see that all the other lines in the w direction have a small negative y component.
Figure 9.11: A Lofted Surface.
We proceed in six steps as follows:
1. As the top boundary curve, P(u, 1), we select the quadratic polynomial passing through the top three points P4, P5, and P6. There is only one such curve and it has the form P(u, 1) = A + Bu + Cu², where the coefficients A, B, and C have to be calculated. We use the fact that the curve passes through the three points to set up the three equations P(0, 1) = P4, P(0.5, 1) = P5, and P(1, 1) = P6, which are written explicitly as

A + B×0 + C×0² = (−1, 0, 1),
A + B×0.5 + C×0.5² = (0, −1, 1),
A + B×1 + C×1² = (1, 0, 1).

These are easy to solve and result in A = (−1, 0, 1), B = (2, −4, 0), and C = (0, 4, 0). The top boundary curve is therefore P(u, 1) = (2u − 1, 4u(u − 1), 1).

2. As the bottom boundary curve, we select the quadratic Bézier curve (Equation (13.6)) defined by the three points P1, P2, and P3. The curve is

P(u, 0) = Σ_{i=0}^{2} B2i(u) P_{i+1}
        = (1 − u)²(−1, 0, 0) + 2u(1 − u)(0, −1, 0) + u²(1, 0, 0)
        = (2u − 1, −2u(1 − u), 0).

3. The expression of the surface is immediately obtained:

P(u, w) = P(u, 0)(1 − w) + P(u, 1)w = (2u − 1, 2u(u − 1)(1 + w), w).

(Notice that it does not pass through P2.)

4. The two tangent vectors are also easy to compute:

∂P/∂u = (2, 2(2u − 1)(1 + w), 0),   ∂P/∂w = (0, 2u(u − 1), 1).

5. The normal, as usual, is the cross-product of the tangents and is given by N(u, w) = (2(2u − 1)(1 + w), −2, 4u(u − 1)).

6. The most important feature of this example is the ease with which the expressions of the tangents and the normal can be visualized. This is possible because of the simple shape and orientation of the surface (again, see Figure 9.11). The reader should examine the expressions and make sure the following points are clear: The two boundary curves are very similar. One difference between them is, of course, the x and z coordinates. However, the only important difference is in the y coordinate. Both curves are quadratic polynomials in u, but although P(u, 1) passes through the three top points, P(u, 0) passes only through the first and last points.

The tangent in the u direction, ∂P/∂u, features z = 0; it is a vector in the xy plane. At the bottom of the surface, where w = 0, it changes direction from (2, −2, 0) (when u = 0) to (2, 2, 0) (when u = 1), both 45° directions in the xy plane. However, at the
top, where w = 1, the tangent changes direction from (2, −4, 0) to (2, 4, 0), both 63◦ directions. This is because the top boundary curve goes deeper in the y direction. The tangent in the w direction, ∂P/∂w features x = 0; it is a vector in the yz plane. Its z coordinate is a constant 1, and its y coordinate varies from 0 (on the left, where u = 0), to −0.5 (in the middle, where u = 0.5), and back to 0 (on the right, where u = 1). On the left and right edges of the surface, this vector is therefore vertical (0, 0, 1). In the middle, it is (0, −0.5, 1), making a negative half-step in y for each step in z. The normal vector features y = −2 with a small z component. It therefore points mostly in the negative y direction, and a little in x. At the bottom (w = 0), it varies from (−2, −2, 0), to (0, −2, −1),* and ends in (2, −2, 0). At the top (w = 1), it varies from (−4, −2, 0), to (0, −2, −1), and ends in (4, −2, 0). The top boundary curve is deeper, causing the tangent to be more in the y direction and the normal to be more in the x direction, than on the bottom boundary curve. Exercise 9.15: (a) Given the two three-dimensional points P1 = (−1, −1, 0) and P2 = (1, −1, 0), calculate the straight line from P1 to P2 . This will become the bottom boundary curve of a lofted surface. (b) Given the three three-dimensional points P4 = (−1, 1, 0), P5 = (0, 1, 1), and P6 = (1, 1, 0), calculate the quadratic polynomial P(t) = At2 + Bt + C that passes through them. This will become the top boundary curve of the surface. (c) Calculate the expression of the lofted surface patch and the coordinates of its center point P(0.5, 0.5).
9.4.1 A Double Helix This example illustrates how the well-known double helix can be derived as a lofted surface. The two-dimensional parametric curve (cos t, sin t) is, of course, a circle (of radius one unit, centered on the origin). As a result, the three-dimensional curve (cos t, sin t, t) is a helix spiraling around the z axis upward from the origin. The similar curve (cos(t + π), sin(t + π), t) is another helix, at a 180◦ phase difference with the first. We consider these the two boundary curves of a lofted surface and create the entire surface as a linear interpolation of the two curves. Hence, P(u, w) = (cos u, sin u, u)(1 − w) + (cos(u + π), sin(u + π), u)w, where 0 ≤ w ≤ 1, and u can vary in any range. The two curves form a double helix, so the surface looks like a twisted ribbon. Figure 9.12 shows such a surface, together with the code that generated it. Exercise 9.16: Calculate the expression of a cone as a lofted surface. Assume that the vertex of the cone is located at the origin, and the base is a circle of radius R, centered on the z axis and located on the plane z = H. * It has a small z component, reflecting the fact that the surface is not completely vertical at u = 0.5.
(* double helix as a lofted surface *)
Clear[loftedSurf];
loftedSurf := {Cos[u],Sin[u],u}(1-w) + {Cos[u+Pi],Sin[u+Pi],u}w;
ParametricPlot3D[loftedSurf, {u,0,Pi,.1}, {w,0,1},
  Ticks -> False, ViewPoint -> {-2.640, -0.129, 0.007}]

Figure 9.12: The Double Helix as a Lofted Surface.
Exercise 9.17: Derive the expression for a square pyramid where each face is a lofted surface. Assume that the base is a square, 2a units on a side, centered about the origin on the xy plane. The top is point (0, 0, H).
9.4.2 A Cusp Given the two curves P1 (u) = (8, 4, 0)u3 −(12, 9, 0)u2 +(6, 6, 0)u+(−1, 0, 0) and P2 (u) = (2u − 1, 4u(u − 1), 1), the lofted surface defined by them is easy to calculate. Notice that the curves pass through the points P1 (0) = (−1, 0, 0), P1 (0.5) = (0, 5/4, 0), P1 (1) = (1, 1, 0), P2 (0) = (−1, 0, 1), P2 (0.5) = (0, −1, 1), and P2 (1) = (1, 0, 1), which makes it easy to visualize the surface (Figure 9.13). The tangent vectors of the two curves are Pu1 (u) = (24, 12, 0)u2 − (24, 18, 0)u + (6, 6, 0),
Pu2 (u) = (2, 8u − 4, 0).
Notice that Pu1(0.5) equals (0, 0, 0), which implies that P1(u) has a cusp at u = 0.5. The lofted surface defined by the two curves is

P(u, w) = (4u²(2u − 3)(1 − w) − 4uw + 6u − 1, u²(4u − 9)(1 − w) + 4u²w − 10uw + 6u, w).
(* Another lofted surface example *)
ParametricPlot3D[{4u^2(2u-3)(1-w) - 4u w + 6u - 1,
    u^2(4u-9)(1-w) + 4u^2 w - 10u w + 6u, w},
  {u,0,1}, {w,0,1}, ViewPoint -> {-0.139, -1.179, 1.475},
  DefaultFont -> {"cmr10", 10}, AspectRatio -> Automatic,
  Ticks -> {{0,1},{0,1},{0,1}}];
Figure 9.13: A Lofted Surface Patch.
Now, look Gwen, y’know if we’re gonna keep living together in this loft, we’re gonna have to have some rules. —Leah Remini (as Terri Reynolds) in Fired Up (1997).
Exercise 9.18: Calculate the tangent vector of this surface in the u direction, and compute its value at the cusp.
Time itself, as a phenomenon, is utterly linear and unidirectional.
—Orson Scott Card, PASTWATCH (1996)
10 Polynomial Interpolation

Given a set of points, it is possible to construct a polynomial that, when plotted, passes through the points. When fully computed and displayed, such a polynomial becomes a curve that's referred to as a polynomial interpolation of the points. The first part of this chapter discusses methods for polynomial interpolation and explains their limitations. The second part extends the discussion to a two-dimensional grid of points, and shows how to compute a two-parameter polynomial that passes through the points. When fully computed and displayed, such a polynomial becomes a surface.

The methods described here apply the algebra of polynomials to the geometry of curves and surfaces, but this application is limited, because high-degree polynomials tend to oscillate. Section 8.8, and especially Exercise 8.16, show why this is so. Still, there are cases where high-degree polynomials are useful.

Definition: A polynomial of degree n in x is the function

Pn(x) = Σ_{i=0}^{n} ai x^i = a0 + a1x + a2x² + ··· + anx^n,

where the ai are the coefficients of the polynomial (in our case, they are real numbers). Note that there are n + 1 coefficients.

Calculating a polynomial involves additions, multiplications, and exponentiations, but there are two methods that greatly simplify this calculation. They are the following:

1. Horner's rule. A degree-3 polynomial can be written in the form P(x) = ((a3x + a2)x + a1)x + a0, thereby eliminating all exponentiations.

2. Forward differences. This is one of Newton's many contributions to mathematics and it is described in some detail in Section 8.8.1. Only the first step requires multiplications. All other steps are performed with additions and assignments only.

D. Salomon, The Computer Graphics Manual, Texts in Computer Science, DOI 10.1007/978-0-85729-886-7_10, © Springer-Verlag London Limited 2011
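Horner's rule generalizes to any degree: process the coefficients from highest to lowest, multiplying by x and adding at each step. A short Python sketch (function name ours):

```python
# Horner's rule: evaluate a0 + a1 x + ... + an x^n with n multiplications
# and n additions, and no exponentiation at all.

def horner(coeffs, x):
    """coeffs = (a0, a1, ..., an), lowest degree first."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result
```

For example, horner((1, 2, 3), 2) evaluates 1 + 2x + 3x² at x = 2, giving 17.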
This chapter starts with a simple example where four points are given and a cubic polynomial that passes through them is derived from first principles. Following this, the Lagrange and Newton polynomial interpolation methods are introduced. The chapter continues with a description of several simple surface algorithms based on polynomials. It concludes with the Coons and Gordon surfaces, which also employ polynomials.
10.1 Four Points

Four points (two-dimensional or three-dimensional) P1, P2, P3, and P4 are given. We are looking for a PC curve that passes through these points and has the form

P(t) = a t^3 + b t^2 + c t + d = (t^3, t^2, t, 1)(a, b, c, d)^T = T(t) A,   for 0 ≤ t ≤ 1,   (10.1)

where each of the four coefficients a, b, c, and d is a pair (or a triplet), T(t) is the row vector (t^3, t^2, t, 1), and A is the column vector (a, b, c, d)^T. The only unknowns are a, b, c, and d. Since the four points can be located anywhere, we cannot assume anything about their positions, so we make the general assumption that P1 and P4 are the two endpoints P(0) and P(1) of the curve, and that P2 and P3 are the two interior points P(1/3) and P(2/3). (Having no information about the locations of the points, the best we can do is to use equidistant values of the parameter t.) We therefore write the four equations P(0) = P1, P(1/3) = P2, P(2/3) = P3, and P(1) = P4, or explicitly

a(0)^3 + b(0)^2 + c(0) + d = P1,
a(1/3)^3 + b(1/3)^2 + c(1/3) + d = P2,
a(2/3)^3 + b(2/3)^2 + c(2/3) + d = P3,
a(1)^3 + b(1)^2 + c(1) + d = P4.   (10.2)
The solutions of this system of equations are

a = −(9/2)P1 + (27/2)P2 − (27/2)P3 + (9/2)P4,
b = 9P1 − (45/2)P2 + 18P3 − (9/2)P4,
c = −(11/2)P1 + 9P2 − (9/2)P3 + P4,
d = P1.

Substituting these solutions into Equation (10.1) gives

P(t) = [−(9/2)P1 + (27/2)P2 − (27/2)P3 + (9/2)P4] t^3
     + [9P1 − (45/2)P2 + 18P3 − (9/2)P4] t^2
     + [−(11/2)P1 + 9P2 − (9/2)P3 + P4] t + P1.   (10.3)
After rearranging, this becomes

P(t) = (−4.5t^3 + 9t^2 − 5.5t + 1)P1 + (13.5t^3 − 22.5t^2 + 9t)P2
     + (−13.5t^3 + 18t^2 − 4.5t)P3 + (4.5t^3 − 4.5t^2 + t)P4
     = G1(t)P1 + G2(t)P2 + G3(t)P3 + G4(t)P4
     = G(t)P,   (10.4)

where the four functions Gi(t) are cubic polynomials in t:

G1(t) = −4.5t^3 + 9t^2 − 5.5t + 1,   G2(t) = 13.5t^3 − 22.5t^2 + 9t,
G3(t) = −13.5t^3 + 18t^2 − 4.5t,     G4(t) = 4.5t^3 − 4.5t^2 + t,   (10.5)
P is the column (P1, P2, P3, P4)^T and G(t) is the row (G1(t), G2(t), G3(t), G4(t)) (see also Exercise 10.8 for a different approach to this polynomial). The functions Gi(t) are called blending functions because they represent any point on the curve as a blend of the four given points. Note that they are barycentric (they should be, since they blend points, and this is shown in the next paragraph). We can also write G1(t) = (t^3, t^2, t, 1)(−4.5, 9, −5.5, 1)^T and similarly for G2(t), G3(t), and G4(t). The curve can now be expressed as

P(t) = G(t)P = (t^3, t^2, t, 1) [ −4.5   13.5  −13.5   4.5 ] [ P1 ]
                                [  9.0  −22.5   18.0  −4.5 ] [ P2 ]  = T(t) N P.   (10.6)
                                [ −5.5    9.0   −4.5   1.0 ] [ P3 ]
                                [  1.0    0.0    0.0   0.0 ] [ P4 ]

Matrix N is called the basis matrix and P is the geometry vector. Equation (10.1) tells us that P(t) = T(t) A, so we conclude that A = N P.

The four functions Gi(t) are barycentric because of the nature of Equation (10.2), not because of the special choice of the four t values. To see why this is so, we write Equation (10.2) for four different, arbitrary values t1, t2, t3, and t4 (they have to be different, otherwise two or more equations would be contradictory):

a t1^3 + b t1^2 + c t1 + d = P1,
a t2^3 + b t2^2 + c t2 + d = P2,
a t3^3 + b t3^2 + c t3 + d = P3,
a t4^3 + b t4^2 + c t4 + d = P4

(where we treat the four values Pi as numbers, not points, and as a result, a, b, c, and d are also numbers). The solutions are of the form

a = c11 P1 + c12 P2 + c13 P3 + c14 P4,
b = c21 P1 + c22 P2 + c23 P3 + c24 P4,
c = c31 P1 + c32 P2 + c33 P3 + c34 P4,
d = c41 P1 + c42 P2 + c43 P3 + c44 P4.   (10.7)
Comparing Equation (10.7) to Equations (10.3) and (10.5) shows that the four functions Gi(t) can be expressed in terms of the cij in the form

Gi(t) = c1i t^3 + c2i t^2 + c3i t + c4i.   (10.8)
The point is that the 16 coefficients cij do not depend on the four values Pi. They are the same for any choice of the Pi. As a special case, we now select P1 = P2 = P3 = P4 = 1, which reduces Equation (10.7) to

a t1^3 + b t1^2 + c t1 + d = 1,   a t2^3 + b t2^2 + c t2 + d = 1,
a t3^3 + b t3^2 + c t3 + d = 1,   a t4^3 + b t4^2 + c t4 + d = 1.

Because the four values ti are arbitrary, the four equations above can be written as the single equation a t^3 + b t^2 + c t + d = 1, which holds for any t. Its solutions must therefore be a = b = c = 0 and d = 1. Thus, we conclude that when all four values Pi are 1, a must be zero. In general, a = c11 P1 + c12 P2 + c13 P3 + c14 P4, which implies that c11 + c12 + c13 + c14 must be zero. Similar arguments show that c21 + c22 + c23 + c24 = 0, c31 + c32 + c33 + c34 = 0, and c41 + c42 + c43 + c44 = 1. These relations, combined with Equation (10.8), show that the four Gi(t) are barycentric.

To calculate the curve, we only need to calculate the four quantities a, b, c, and d (that constitute vector A) and write Equation (10.1) using their numerical values.

Example: (This example is in two dimensions; each of the four points Pi, as well as each of the four coefficients a, b, c, and d, is a pair. For three-dimensional curves the method is the same, except that triplets are used instead of pairs.) Given the four two-dimensional points P1 = (0, 0), P2 = (1, 0), P3 = (1, 1), and P4 = (0, 1), we set up the equation

[ a ]            [ −4.5   13.5  −13.5   4.5 ] [ (0, 0) ]
[ b ] = A = NP = [  9.0  −22.5   18.0  −4.5 ] [ (1, 0) ]
[ c ]            [ −5.5    9.0   −4.5   1.0 ] [ (1, 1) ]
[ d ]            [  1.0    0.0    0.0   0.0 ] [ (0, 1) ].

Its solutions are

a = −4.5(0, 0) + 13.5(1, 0) − 13.5(1, 1) + 4.5(0, 1) = (0, −9),
b = 9(0, 0) − 22.5(1, 0) + 18(1, 1) − 4.5(0, 1) = (−4.5, 13.5),
c = −5.5(0, 0) + 9(1, 0) − 4.5(1, 1) + 1(0, 1) = (4.5, −3.5),
d = 1(0, 0) + 0(1, 0) + 0(1, 1) + 0(0, 1) = (0, 0).

So the curve P(t) that passes through the given points is

P(t) = T(t) A = (0, −9)t^3 + (−4.5, 13.5)t^2 + (4.5, −3.5)t.

It is now easy to calculate and verify that P(0) = (0, 0) = P1,

P(1/3) = (0, −9)(1/27) + (−4.5, 13.5)(1/9) + (4.5, −3.5)(1/3) = (1, 0) = P2,

and P(1) = (0, −9)1^3 + (−4.5, 13.5)1^2 + (4.5, −3.5)1 = (0, 1) = P4.
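The example can also be checked mechanically. The following Python sketch (not from the book; the names are our choices) computes A = N P and evaluates the resulting cubic:

```python
# Basis matrix N of Equation (10.6) and the four points of the example.
N = [[-4.5,  13.5, -13.5,  4.5],
     [ 9.0, -22.5,  18.0, -4.5],
     [-5.5,   9.0,  -4.5,  1.0],
     [ 1.0,   0.0,   0.0,  0.0]]
P = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # P1..P4

# A = N P: each of a, b, c, d is a pair (a weighted sum of the points).
A = [tuple(sum(N[i][k] * P[k][c] for k in range(4)) for c in range(2))
     for i in range(4)]

def curve(t):
    """Evaluate P(t) = T(t) A for the interpolating cubic."""
    T = (t**3, t**2, t, 1.0)
    return tuple(sum(T[i] * A[i][c] for i in range(4)) for c in range(2))
```

Evaluating curve(0), curve(1/3), curve(2/3), and curve(1) reproduces the four given points, and A matches the coefficients (0, −9), (−4.5, 13.5), (4.5, −3.5), and (0, 0) computed above.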
Exercise 10.1: Calculate P(2/3) and verify that it equals P3.

Exercise 10.2: Imagine the circular arc of radius 1 in the first quadrant (a quarter circle). Write the coordinates of the four points that are equally spaced on this arc. Use the coordinates to calculate a PC approximating this arc. Calculate point P(1/2). How far does it deviate from the midpoint of the true quarter circle?

Exercise 10.3: Calculate the PC that passes through the four points P1 through P4 assuming that only the three relative coordinates Δ1 = P2 − P1, Δ2 = P3 − P2, and Δ3 = P4 − P3 are given. Show a numeric example.

The main advantage of this method is its simplicity. Given the four points, it is easy to construct the PC that passes through them. This, however, is also the reason for the method's downside: it produces only one PC that passes through four given points. If that PC does not have the required shape, there is nothing the user can do. This simple curve method is not interactive.

Even though this method is not very useful for curve drawing, it may be useful for interpolation. Given two points P1 and P2, we know that the point midway between them is their average, (P1 + P2)/2. A natural question is: given four points P1 through P4, what point is located midway between them? We can answer this question by calculating the average, (P1 + P2 + P3 + P4)/4, but this weighted sum assigns the same weight to each of the four points. If we want to assign more weight to the interior points P2 and P3, we can construct the PC that passes through the points and compute P(0.5) from Equation (10.6). The result is

P(0.5) = −0.0625P1 + 0.5625P2 + 0.5625P3 − 0.0625P4.

This is a weighted sum that assigns more weight to the interior points. Notice that the weights are barycentric. Exercise 10.13 provides a hint as to why the two extreme weights are negative. This method can be extended to a two-dimensional grid of points (Section 10.6.1).
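The midpoint weights are simply the product T(0.5) N, as this Python fragment (ours, not the book's) confirms:

```python
# Multiply T(0.5) = (1/8, 1/4, 1/2, 1) by the basis matrix N of
# Equation (10.6) to obtain the four midpoint weights.
N = [[-4.5,  13.5, -13.5,  4.5],
     [ 9.0, -22.5,  18.0, -4.5],
     [-5.5,   9.0,  -4.5,  1.0],
     [ 1.0,   0.0,   0.0,  0.0]]
t = 0.5
T = [t**3, t**2, t, 1.0]
weights = [sum(T[i] * N[i][j] for i in range(4)) for j in range(4)]
print(weights)  # [-0.0625, 0.5625, 0.5625, -0.0625], summing to 1
```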
Section 2.12 has more to say about interpolating polynomials.

A precisian professor had the habit of saying: “. . . quartic polynomial ax^4 + bx^3 + cx^2 + dx + e, where e need not be the base of the natural logarithms.”
—J. E. Littlewood, A Mathematician’s Miscellany.
Exercise 10.4: The preceding method makes sense if the four points are (approximately) equally spaced along the curve. If they are not, the following approach may be taken. Instead of using 1/3 and 2/3 as the intermediate values, the user may specify values α and β, both in the interval (0, 1), such that P2 = P(α) and P3 = P(β). Generalize Equation (10.6) such that it depends on α and β.
10.2 The Lagrange Polynomial

The preceding section shows how a cubic interpolating polynomial can be derived for a set of four given points. This section discusses the Lagrange polynomial, a general approach to the problem of polynomial interpolation. Given the n + 1 data points P0 = (x0, y0), P1 = (x1, y1), . . . , Pn = (xn, yn), the problem is how to find a function y = f(x) that will pass through all of them. We first try an expression of the form y = Σ_{i=0}^{n} yi L^n_i(x). This is a weighted sum of the individual yi coordinates, where the weights depend on the xi coordinates. This sum will pass through the points if

L^n_i(x) = 1 for x = xi, and 0 for x equal to any other data coordinate xj.

A good mathematician can easily guess that such functions are given by

L^n_i(x) = Π_{j≠i}(x − xj) / Π_{j≠i}(xi − xj)
         = [(x − x0)(x − x1) · · · (x − xi−1)(x − xi+1) · · · (x − xn)] / [(xi − x0) · · · (xi − xi−1)(xi − xi+1) · · · (xi − xn)].

(Note that (x − xi) is missing from the numerator and (xi − xi) is missing from the denominator.) The function y = Σ_{i=0}^{n} yi L^n_i(x) is called the Lagrange polynomial, because it was originally developed by Lagrange [Lagrange 77], and it is a polynomial of degree n. It is denoted by LP.

Horner's rule and the method of forward differences make polynomials very desirable to use. In practice, however, polynomials are used in parametric form, as illustrated in Section 8.8, since any explicit function y = f(x) is limited in the shapes of curves it can generate (note that the explicit form y = Σ_{i=0}^{n} yi L^n_i(x) of the LP cannot be calculated if two of the n + 1 given data points have the same x coordinate). The LP has two properties that make it impractical for interactive curve design: it is of a high degree and it is unique.
1. Writing Pn(x) = 0 creates an equation of degree n in x. It has n solutions (some may be complex numbers), so when plotted as a curve it intercepts the x axis up to n times. For large n, such a curve may be loose because it tends to oscillate wildly. In practice, we normally prefer tight curves.
2. It is easy to show that the LP is unique (see below). There are infinitely many curves that pass through any given set of points, and the one we are looking for may not be the LP. Any useful, practical mathematical method for curve design should make it easy for the designer to change the shape of the curve by varying the values of parameters.

It's easy to show that there is only one polynomial of degree n that passes through any given set of n + 1 points. A root of the polynomial Pn(x) is a value xr such that Pn(xr) = 0. A polynomial Pn(x) can have at most n distinct roots (unless it is the zero polynomial). Suppose that there is another polynomial Qn(x) that passes through the same n + 1 data points. At the points, we would have Pn(xi) = Qn(xi) = yi, or (Pn − Qn)(xi) = 0.
The difference (Pn − Qn ) is a polynomial whose degree must be ≤ n, so it cannot have more than n distinct roots. On the other hand, this difference is 0 at the n + 1 data points, so it has
n + 1 roots. We conclude that it must be the zero polynomial, which implies that Pn(x) and Qn(x) are identical.

This uniqueness theorem can also be employed to show that the Lagrange weights L^n_i(x) are barycentric. Given a function f(x), select n + 1 distinct values x0 through xn, and consider the n + 1 support points (x0, f(x0)) through (xn, f(xn)). The uniqueness theorem states that there is a unique polynomial p(x) of degree n or less that passes through the points, i.e., p(xk) = f(xk) for k = 0, 1, . . . , n. We say that this polynomial interpolates the points. Now consider the constant function f(x) ≡ 1. The Lagrange polynomial that interpolates its points is

LP(x) = Σ_{i=0}^{n} yi L^n_i(x) = Σ_{i=0}^{n} 1·L^n_i(x) = Σ_{i=0}^{n} L^n_i(x).

On the other hand, LP(x) must be identical to 1, because LP(xk) = f(xk) and f(xk) = 1 for any point xk. Thus, we conclude that Σ_{i=0}^{n} L^n_i(x) = 1 for any x.

Because of these two properties, we conclude that a practical curve design method should be based on polynomials of low degree and should depend on parameters that control the shape of the curve. Such methods are discussed in the chapters that follow. Still, polynomial interpolation may be useful in special situations, which is why it is discussed in the remainder of this chapter.

Exercise 10.5: Calculate the LP between the two points P0 = (x0, y0) and P1 = (x1, y1). What kind of a curve is it?

I have another method not yet communicated . . . a convenient, rapid and general solution of this problem, To draw a geometrical curve which shall pass through any number of given points . . . These things are done at once geometrically with no calculation intervening . . . Though at first glance it looks unmanageable, yet the matter turns out otherwise. For it ranks among the most beautiful of all that I could wish to solve. (Isaac Newton in a letter to Henry Oldenburg, October 24, 1676, quoted in [Turnbull 59], vol. II, p 188.)
—James Gleick, Isaac Newton (2003).

The LP can also be expressed in parametric form. Given the n + 1 data points P0, P1, . . . , Pn, we need to construct a polynomial P(t) that passes through all of them, such that P(t0) = P0, P(t1) = P1, . . . , P(tn) = Pn, where t0 = 0, tn = 1, and t1 through tn−1 are certain values between 0 and 1 (the ti are called knot values). The LP has the form P(t) = Σ_{i=0}^{n} Pi L^n_i(t). This is a weighted sum of the individual points, where the weights (or basis functions) are given by

L^n_i(t) = Π_{j≠i}(t − tj) / Π_{j≠i}(ti − tj).   (10.9)

Note that Σ_{i=0}^{n} L^n_i(t) = 1, so these weights are barycentric.
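A direct Python sketch of Equation (10.9) (our code, with arbitrarily chosen knot values) exhibits both defining properties: the cardinal behavior at the knots and the barycentric sum.

```python
def lagrange_weight(i, t, knots):
    """L^n_i(t): product over j != i of (t - t_j)/(t_i - t_j)."""
    w = 1.0
    for j, tj in enumerate(knots):
        if j != i:
            w *= (t - tj) / (knots[i] - tj)
    return w

knots = [0.0, 0.25, 0.6, 1.0]               # arbitrary distinct knot values
ws = [lagrange_weight(i, 0.4, knots) for i in range(len(knots))]
print(abs(sum(ws) - 1.0) < 1e-9)            # True: the weights are barycentric
print(lagrange_weight(2, knots[2], knots))  # 1.0: weight 2 at its own knot
print(lagrange_weight(2, knots[0], knots))  # 0.0: weight 2 at another knot
```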
Exercise 10.6: Calculate the parametric LP between the two general points P0 and P1.

Exercise 10.7: Calculate the parametric LP for the three points P0 = (0, 0), P1 = (0, 1), and P2 = (1, 1).

Exercise 10.8: Calculate the parametric LP for the four equally-spaced points P1, P2, P3, and P4 and show that it is identical to the interpolating PC given by Equation (10.4).

The parametric LP is also mentioned on Page 543, in connection with Gordon surfaces. The LP has another disadvantage. If the resulting curve is not satisfactory, the user may want to fine-tune it by adding one more point. However, all the basis functions L^n_i(t) will have to be recalculated in such a case, since adding a point raises the degree and adds a knot, which changes every basis function, not just one. This disadvantage makes the LP slow to use in practice, which is why the Newton polynomial (Section 10.3) is sometimes used instead.
10.2.1 The Quadratic Lagrange Polynomial

Equation (10.9) can easily be employed to obtain the Lagrange polynomial for three points P0, P1, and P2. The weights in this case are

L^2_0(t) = Π_{j≠0}(t − tj) / Π_{j≠0}(t0 − tj) = (t − t1)(t − t2) / [(t0 − t1)(t0 − t2)],
L^2_1(t) = Π_{j≠1}(t − tj) / Π_{j≠1}(t1 − tj) = (t − t0)(t − t2) / [(t1 − t0)(t1 − t2)],
L^2_2(t) = Π_{j≠2}(t − tj) / Π_{j≠2}(t2 − tj) = (t − t0)(t − t1) / [(t2 − t0)(t2 − t1)],   (10.10)
and the polynomial P2(t) = Σ_{i=0}^{2} Pi L^2_i(t) is easy to calculate once the values of t0, t1, and t2 have been determined.

The Uniform Quadratic Lagrange Polynomial is obtained when t0 = 0, t1 = 1, and t2 = 2. (See the discussion of uniform and nonuniform parametric curves in Section 8.7.1.) Equation (10.10) yields

P2u(t) = [(t^2 − 3t + 2)/2] P0 − (t^2 − 2t) P1 + [(t^2 − t)/2] P2

       = (t^2, t, 1) [  1/2  −1   1/2 ] [ P0 ]
                     [ −3/2   2  −1/2 ] [ P1 ].   (10.11)
                     [   1    0    0  ] [ P2 ]

The sums of the three rows of the matrix of Equation (10.11) are (from top to bottom) 0, 0, and 1, showing that the three basis functions are barycentric, as they should be.
The Nonuniform Quadratic Lagrange Polynomial is obtained when t0 = 0, t1 = t0 + Δ0 = Δ0, and t2 = t1 + Δ1 = Δ0 + Δ1 for some positive Δ0 and Δ1. Equation (10.10) gives

L^2_0(t) = (t − Δ0)(t − Δ0 − Δ1) / [(−Δ0)(−Δ0 − Δ1)],
L^2_1(t) = (t − 0)(t − Δ0 − Δ1) / [Δ0(−Δ1)],
L^2_2(t) = (t − 0)(t − Δ0) / [(Δ0 + Δ1)Δ1],

and the nonuniform polynomial is

P2nu(t) = (t^2, t, 1) [ 1/(Δ0(Δ0+Δ1))       −1/(Δ0Δ1)      1/((Δ0+Δ1)Δ1)     ] [ P0 ]
                      [ −1/(Δ0+Δ1) − 1/Δ0    1/Δ0 + 1/Δ1   −1/Δ1 + 1/(Δ0+Δ1) ] [ P1 ].   (10.12)
                      [ 1                     0              0                 ] [ P2 ]

For Δ0 = Δ1 = 1, Equation (10.12) reduces to the uniform polynomial, Equation (10.11). For Δ0 = Δ1 = 1/2, the parameter t varies in the “standard” range [0, 1] and Equation (10.12) becomes

P2std(t) = (t^2, t, 1) [  2  −4   2 ] [ P0 ]
                       [ −3   4  −1 ] [ P1 ].   (10.13)
                       [  1   0   0 ] [ P2 ]
(Notice that the three rows again sum to 0, 0, and 1, to produce three barycentric basis functions.) In most cases, Δ0 and Δ1 should be set to the chord lengths |P1 − P0| and |P2 − P1|, respectively.

Exercise 10.9: Use the Cartesian product to generalize Equation (10.13) to a surface patch that passes through nine given points.

Example: The three points P0 = (1, 0), P1 = (1.3, .5), and P2 = (4, 0) are given. The uniform LP is obtained when Δ0 = Δ1 = 1, and it equals P2u(t) = (1 − 0.9t + 1.2t^2, 0.5(2 − t)t). Many nonuniform polynomials are possible. We select the one that's obtained when the Δ values are the chord lengths between the points. In our case, they are Δ0 = |P1 − P0| ≈ 0.583 and Δ1 = |P2 − P1| ≈ 2.75. This polynomial is P2nu(t) = (1 + 0.433t + 0.14t^2, 1.04t − 0.312t^2). These uniform and nonuniform polynomials are shown in Figure 10.1. The figure illustrates how the nonuniform curve based on the chord lengths between the points is tighter (features smaller overall curvature). Such a curve is generally considered a better interpolation of the three points.

Figure 10.2 shows three examples of nonuniform Lagrange polynomials that pass through the three points P0 = (1, 1), P1 = (2, 2), and P2 = (4, 0). The value of Δ0 is
(* 3-point Lagrange polynomial (uniform and nonunif) *) Clear[T,H,B,d0,d1]; d0=1; d1=1; T={t^2,t,1}; H={{1/(d0(d0+d1)),-1/(d0 d1),1/(d1(d0+d1))}, {-1/(d0+d1)-1/d0,1/d0+1/d1,-1/d1+1/(d0+d1)},{1,0,0}}; B={{1,0},{1.3,.5},{4,0}}; Simplify[T.H.B]; C1=ParametricPlot[T.H.B,{t,0,d0+d1}, PlotStyle->AbsoluteDashing[{2,2}], DisplayFunction->Identity]; d0=.583; d1=2.75; H={{1/(d0(d0+d1)),-1/(d0 d1),1/(d1(d0+d1))}, {-1/(d0+d1)-1/d0,1/d0+1/d1,-1/d1+1/(d0+d1)},{1,0,0}}; Simplify[T.H.B]; C2=ParametricPlot[T.H.B,{t,0,d0+d1}]; Show[C1, C2, PlotRange->All]
Figure 10.1: Three-Point Lagrange Polynomials.
1.414, the chord length between P0 and P1 . The chord length between P1 and P2 is 2.83 and Δ1 is first assigned this value, then half this value, and finally twice it. The three resulting curves illustrate how the Lagrange polynomial can be reshaped by modifying the Δi parameters. The three polynomials in this case are (1 + 0.354231t + 0.249634t2 , 1 + 1.76716t − 0.749608t2 ), (1 + 0.70738t − 0.000117766t2 , 1 + 1.1783t − 0.333159t2 ), (1 + 0.777945t − 0.0500221t2 , 1 + 0.919208t − 0.149925t2 ).
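The nonuniform matrix of Equation (10.12) is easy to rebuild and test in code. The following Python sketch (a port of the idea in the book's Mathematica listing; the function names are ours) constructs the matrix from Δ0 and Δ1 and checks that the curve passes through the middle point at t = Δ0:

```python
# Nonuniform quadratic Lagrange matrix, Equation (10.12).
def quad_matrix(d0, d1):
    return [[1/(d0*(d0 + d1)),   -1/(d0*d1),   1/(d1*(d0 + d1))],
            [-1/(d0 + d1) - 1/d0, 1/d0 + 1/d1, -1/d1 + 1/(d0 + d1)],
            [1.0, 0.0, 0.0]]

def eval_quad(pts, d0, d1, t):
    """Evaluate (t^2, t, 1) M (P0, P1, P2)^T for 2D points."""
    M = quad_matrix(d0, d1)
    T = [t*t, t, 1.0]
    coeff = [sum(T[i]*M[i][j] for i in range(3)) for j in range(3)]
    return tuple(sum(coeff[j]*pts[j][c] for j in range(3)) for c in range(2))

pts = [(1.0, 0.0), (1.3, 0.5), (4.0, 0.0)]
d0, d1 = 0.583, 2.75               # chord lengths, as in the example
print(eval_quad(pts, d0, d1, d0))  # approximately (1.3, 0.5) = P1
```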
10.2.2 The Cubic Lagrange Polynomial

Equation (10.9) is now applied to the cubic Lagrange polynomial that interpolates the four points P0, P1, P2, and P3. The first two weights in this case are

L^3_0(t) = Π_{j≠0}(t − tj) / Π_{j≠0}(t0 − tj) = (t − t1)(t − t2)(t − t3) / [(t0 − t1)(t0 − t2)(t0 − t3)],
L^3_1(t) = Π_{j≠1}(t − tj) / Π_{j≠1}(t1 − tj) = (t − t0)(t − t2)(t − t3) / [(t1 − t0)(t1 − t2)(t1 − t3)],
(* 3-point Lagrange polynomial (3 examples of nonuniform) *)
Clear[T,H,B,d0,d1,C1,C2,C3];
d0=1.414; d1=1.415; (* d1=0.5|P2-P1| *)
T={t^2,t,1};
H={{1/(d0(d0+d1)),-1/(d0 d1),1/(d1(d0+d1))}, {-1/(d0+d1)-1/d0,1/d0+1/d1,-1/d1+1/(d0+d1)},{1,0,0}};
B={{1,1},{2,2},{4,0}};
Simplify[T.H.B]
C1=ParametricPlot[T.H.B,{t,0,d0+d1}];
d1=2.83; (* d1=|P2-P1| *)
H={{1/(d0(d0+d1)),-1/(d0 d1),1/(d1(d0+d1))}, {-1/(d0+d1)-1/d0,1/d0+1/d1,-1/d1+1/(d0+d1)},{1,0,0}};
Simplify[T.H.B]
C2=ParametricPlot[T.H.B,{t,0,d0+d1}];
d1=5.66; (* d1=2|P2-P1| *)
H={{1/(d0(d0+d1)),-1/(d0 d1),1/(d1(d0+d1))}, {-1/(d0+d1)-1/d0,1/d0+1/d1,-1/d1+1/(d0+d1)},{1,0,0}};
Simplify[T.H.B]
C3=ParametricPlot[T.H.B,{t,0,d0+d1}];
Show[C1,C2,C3, PlotRange->All]
(* (1/24,-1/8)t^3+(-1/3,3/4)t^2+(1,-1)t *)
Figure 10.2: Three-Point Nonuniform Lagrange Polynomials.
and the remaining two are

L^3_2(t) = Π_{j≠2}(t − tj) / Π_{j≠2}(t2 − tj) = (t − t0)(t − t1)(t − t3) / [(t2 − t0)(t2 − t1)(t2 − t3)],
L^3_3(t) = Π_{j≠3}(t − tj) / Π_{j≠3}(t3 − tj) = (t − t0)(t − t1)(t − t2) / [(t3 − t0)(t3 − t1)(t3 − t2)],   (10.14)
and the polynomial P3(t) = Σ_{i=0}^{3} Pi L^3_i(t) is easy to calculate once the values of t0, t1, t2, and t3 have been determined.

The Nonuniform Cubic Lagrange Polynomial is obtained when t0 = 0, t1 = t0 + Δ0 = Δ0, t2 = t1 + Δ1 = Δ0 + Δ1, and t3 = t2 + Δ2 = Δ0 + Δ1 + Δ2 for positive Δi. The expression for the polynomial is

P3nu(t) = (t^3, t^2, t, 1) Q (P0, P1, P2, P3)^T,   (10.15)

where column i of the 4×4 matrix Q holds the coefficients of t^3, t^2, t, and 1 in the weight L^3_i(t). With the denominators

D0 = (−Δ0)(−Δ0 − Δ1)(−Δ0 − Δ1 − Δ2),   D1 = Δ0(−Δ1)(−Δ1 − Δ2),
D2 = (Δ0 + Δ1)Δ1(−Δ2),   D3 = (Δ0 + Δ1 + Δ2)(Δ1 + Δ2)Δ2,

the columns of Q are

column 0: [1, −(3Δ0 + 2Δ1 + Δ2), Δ0(Δ0 + Δ1) + (Δ0 + Δ1)(Δ0 + Δ1 + Δ2) + (Δ0 + Δ1 + Δ2)Δ0, −Δ0(Δ0 + Δ1)(Δ0 + Δ1 + Δ2)] / D0,
column 1: [1, −(2Δ0 + 2Δ1 + Δ2), (Δ0 + Δ1)(Δ0 + Δ1 + Δ2), 0] / D1,
column 2: [1, −(2Δ0 + Δ1 + Δ2), Δ0(Δ0 + Δ1 + Δ2), 0] / D2,
column 3: [1, −(2Δ0 + Δ1), Δ0(Δ0 + Δ1), 0] / D3.
The Uniform Cubic Lagrange Polynomial. We construct the “standard” case, where t varies from 0 to 1. This implies t0 = 0, t1 = 1/3, t2 = 2/3, and t3 = 1. Equation (10.15) reduces to

P3u(t) = (t^3, t^2, t, 1) [  −9/2   27/2  −27/2   9/2 ] [ P0 ]
                          [     9  −45/2     18  −9/2 ] [ P1 ].   (10.16)
                          [ −11/2      9   −9/2     1 ] [ P2 ]
                          [     1      0      0     0 ] [ P3 ]

Figure 10.3 shows the quadratic and cubic Lagrange basis functions. It is easy to see that there are values of t (indicated by arrows) for which one of the basis functions is 1 and the others are zeros. This is how the curve (which is a weighted sum of the functions) passes through a point. The functions add up to 1, but most climb above 1 and are negative in certain regions. In the nonuniform case, the particular choice of the
various Δi reshapes the basis functions in such a way that a function still retains its basic shape, but its areas above and below the t axis may increase or decrease significantly. Those willing to experiment can copy matrix Q of Equation (10.15) into appropriate mathematical software and use code similar to that of Figure 10.3 to plot the basis functions for various values of Δi.
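Readers who prefer exact arithmetic can check Equation (10.16) with Python's fractions module; this sketch (ours, not the book's) verifies the cardinal property of the four cubic weights at the knots 0, 1/3, 2/3, 1, and recovers one of the midpoint weights of Section 10.1:

```python
from fractions import Fraction

knots = [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]

def weight(i, t):
    """Cubic Lagrange weight L^3_i(t), computed exactly."""
    w = Fraction(1)
    for j, tj in enumerate(knots):
        if j != i:
            w *= (t - tj) / (knots[i] - tj)
    return w

# Cardinal property: weight i is 1 at its own knot and 0 at the others.
for i in range(4):
    for k in range(4):
        assert weight(i, knots[k]) == (1 if i == k else 0)

# At t = 1/2 the first weight equals the -0.0625 seen in Section 10.1.
print(float(weight(0, Fraction(1, 2))))  # -0.0625
```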
(* Plot quadratic and cubic Lagrange basis functions *) lagq={t^2,t,1}.{{1/2,-1,1/2}, {-3/2,2,-1/2}, {1,0,0}}; Plot[{lagq[[1]],lagq[[2]],lagq[[3]]}, {t,0,2}, PlotRange->All, AspectRatio->1] lagc={t^3,t^2,t,1}.{{-9/2,27/2,-27/2,9/2}, {9,-45/2,18,-9/2}, {-11/2,9,-9/2,1}, {1,0,0,0}}; Plot[{lagc[[1]], lagc[[2]], lagc[[3]], lagc[[4]]}, {t,0,1}, PlotRange -> All, AspectRatio -> 1] Figure 10.3: (a) Quadratic and (b) Cubic Lagrange Basis Functions.
It should be noted that the basis functions of the Bézier curve (Section 13.2) are more intuitive and provide easier control of the shape of the curve, which is why Lagrange interpolation is not popular and is used only in special cases.
10.2.3 Barycentric Lagrange Interpolation

Given the n + 1 data points P0 = (x0, y0) through Pn = (xn, yn), the explicit (nonparametric) Lagrange polynomial that interpolates them is LP(x) = Σ_{i=0}^{n} yi L^n_i(x), where

L^n_i(x) = Π_{j≠i}(x − xj) / Π_{j≠i}(xi − xj)
         = [(x − x0)(x − x1) · · · (x − xi−1)(x − xi+1) · · · (x − xn)] / [(xi − x0) · · · (xi − xi−1)(xi − xi+1) · · · (xi − xn)].

This representation of the Lagrange polynomial has the following disadvantages:
1. The denominator of L^n_i(x) requires n subtractions and n − 1 multiplications, for a total of O(n) operations. The denominators of the n + 1 weights therefore require O(n^2)
operations. The numerators also require O(n^2) operations, and they have to be recomputed for each value of x.
2. Adding a new point Pn+1 requires the computation of a new weight L^{n+1}_{n+1}(x) and a recomputation of all the original weights L^n_i(x), because

L^{n+1}_i(x) = L^n_i(x) · (x − xn+1)/(xi − xn+1),   for i = 0, 1, . . . , n.

3. The computations are numerically unstable. A small change in any of the data points may cause a large change in LP(x).

Numerical analysts have long believed that these reasons make the Newton polynomial (Section 10.3) more attractive for practical work. However, recent research has resulted in a new, barycentric form of the LP that makes Lagrange interpolation more attractive. This section is based on [Berrut and Trefethen 04]. The barycentric form of the LP is

LP(x) = Σ_{i=0}^{n} yi L^n_i(x) = Σ_{i=0}^{n} yi [Π_{j≠i}(x − xj) / Π_{j≠i}(xi − xj)]
      = Σ_{i=0}^{n} yi wi Π_{j=0}^{n}(x − xj)/(x − xi)
      = Π_{j=0}^{n}(x − xj) · Σ_{i=0}^{n} yi wi/(x − xi)
      = L(x) Σ_{i=0}^{n} yi wi/(x − xi),   (10.17)

where L(x) = Π_{j=0}^{n}(x − xj) and

wi = 1 / Π_{j≠i}(xi − xj),   for i = 0, 1, . . . , n.
Each weight wi requires O(n) operations, for a total of O(n^2), but these weights no longer depend on x and consequently have to be computed just once! The only quantity that depends on x is L(x), and it requires only O(n) operations. Also, when a new point is added, the only operations required are (1) divide each wi by (xi − xn+1) and (2) compute wn+1. These require O(n) steps.

A better form of Equation (10.17), one that's more numerically stable, is obtained when we consider the case yi = 1. If all the data points are of the form (xi, 1), then the interpolating LP should satisfy LP(x) ≡ 1, which brings Equation (10.17) to the form

1 = L(x) Σ_{i=0}^{n} wi/(x − xi).   (10.18)
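The O(n) update when a point is added is worth spelling out. In this Python sketch (the names are ours, not from the cited paper), the stored weights are each divided by (xi − xn+1) and one new weight is computed; the result agrees with recomputing every weight from scratch:

```python
import math

def weights_from_scratch(xs):
    """w_i = 1 / prod_{j != i}(x_i - x_j): O(n^2) operations."""
    return [1.0 / math.prod(xs[i] - xj for j, xj in enumerate(xs) if j != i)
            for i in range(len(xs))]

def add_point(xs, ws, x_new):
    """Incremental O(n) update of the weights for one extra point."""
    ws = [w / (xi - x_new) for w, xi in zip(ws, xs)]
    w_new = 1.0 / math.prod(x_new - xi for xi in xs)
    return xs + [x_new], ws + [w_new]

xs = [0.0, 1.0, 3.0]
xs2, ws2 = add_point(xs, weights_from_scratch(xs), 4.0)
print(all(abs(a - b) < 1e-12
          for a, b in zip(ws2, weights_from_scratch(xs2))))  # True
```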
We can now divide Equation (10.17) by Equation (10.18) to obtain

LP(x) = [Σ_{i=0}^{n} yi wi/(x − xi)] / [Σ_{j=0}^{n} wj/(x − xj)].   (10.19)
The weights of Equation (10.19) are

[wi/(x − xi)] / [Σ_{j=0}^{n} wj/(x − xj)],   i = 0, 1, . . . , n,

and it's easy to see that they are barycentric. Also, any common factors in the weights can now be cancelled out. For example, it can be shown that in the case of data points that are uniformly distributed in the interval [−1, +1],

P0 = (−1, y0), P1 = (−1 + h, y1), P2 = (−1 + 2h, y2), . . . , Pn = (+1, yn)

(where h = 2/n), the weights become wi = (−1)^{n−i} C(n, i)/(h^n n!), where C(n, i) denotes the binomial coefficient. The common factors are those that do not depend on i. When they are cancelled out, the weights become the simple expressions

wi = (−1)^i C(n, i).

(This is also true for points that are equidistant in any interval [a, b]. Incidentally, it can be shown that the case of equidistant data points is ill conditioned, and the LP, in any form, can change its value wildly in response to even small changes in the data points.)
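Equation (10.19) translates directly into code. This Python sketch (ours) evaluates the barycentric form with the simplified equidistant weights (−1)^i C(n, i) and reproduces f(x) = x^2 exactly, as the uniqueness theorem predicts:

```python
import math

def bary_eval(xs, ys, ws, x):
    """Second (true) barycentric form, Equation (10.19)."""
    num = den = 0.0
    for xi, yi, wi in zip(xs, ys, ws):
        if x == xi:              # exactly at a data point
            return yi
        q = wi / (x - xi)
        num += q * yi
        den += q
    return num / den

n = 4
xs = [-1.0 + 2.0 * i / n for i in range(n + 1)]   # equidistant on [-1, 1]
ws = [(-1)**i * math.comb(n, i) for i in range(n + 1)]
ys = [xi**2 for xi in xs]                          # samples of f(x) = x^2
print(abs(bary_eval(xs, ys, ws, 0.3) - 0.09) < 1e-9)  # True
```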
10.3 The Newton Polynomial

The Newton polynomial offers an alternative approach to the problem of polynomial interpolation. The final interpolating polynomial is identical to the LP, but the derivation is different. It allows the user to easily add more points and thereby provides fine control over the shape of the curve. We again assume that n + 1 data points P0, P1, . . . , Pn are given and are assigned knot values t0 = 0 < t1 < · · · < tn−1 < tn = 1. We are looking for a curve expressed by the degree-n parametric polynomial

P(t) = Σ_{i=0}^{n} Ni(t) Ai,
where the basis functions Ni (t) depend only on the knot values and not on the data points. Only the (unknown) coefficients Ai depend on the points. This definition (originally proposed by Newton) is useful because each coefficient Ai depends only on points P0 through Pi . If the user decides to add a point Pn+1 , only one coefficient, An+1 , and one basis function, Nn+1 (t), need be recomputed. The definition of the basis functions is N0 (t) = 1 and Ni (t) = (t − t0 )(t − t1 ) · · · (t − ti−1 ),
for i = 1, . . . , n.
To calculate the unknown coefficients, we write the equations

P0 = P(t0) = A0,
P1 = P(t1) = A0 + A1(t1 − t0),
P2 = P(t2) = A0 + A1(t2 − t0) + A2(t2 − t0)(t2 − t1),
. . .
Pn = P(tn) = A0 + · · · .

These equations don't have to be solved simultaneously. Each can easily be solved after all its predecessors have been solved. The solutions are

A0 = P0,

A1 = (P1 − P0)/(t1 − t0),

A2 = [P2 − P0 − (P1 − P0)(t2 − t0)/(t1 − t0)] / [(t2 − t0)(t2 − t1)]
   = [(P2 − P1)/(t2 − t1) − (P1 − P0)/(t1 − t0)] / (t2 − t0).
This obviously gets very complicated quickly, so we use the method of divided differences to express all the solutions in compact notation. The divided difference of the knots ti and tk is denoted by [ti tk] and is defined as

[ti tk] = (Pi − Pk)/(ti − tk).

The solutions can now be expressed as

A0 = P0,
A1 = (P1 − P0)/(t1 − t0) = [t1 t0],
A2 = [t2 t1 t0] = ([t2 t1] − [t1 t0])/(t2 − t0),
A3 = [t3 t2 t1 t0] = ([t3 t2 t1] − [t2 t1 t0])/(t3 − t0),
. . .
An = [tn . . . t1 t0] = ([tn . . . t1] − [tn−1 . . . t0])/(tn − t0).

Exercise 10.10: Given the same points and knot values as in Exercise 10.7, calculate the Newton polynomial that passes through the points.
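The divided-difference table is short to code. The following Python sketch (our layout; the book gives no code here) computes the coefficients Ai in place and evaluates the Newton form with a nested, Horner-like loop:

```python
def newton_coeffs(ts, ps):
    """Divided differences: a[i] ends up holding A_i = [t_i ... t_0]."""
    a = list(ps)
    n = len(ts)
    for level in range(1, n):
        for i in range(n - 1, level - 1, -1):
            a[i] = (a[i] - a[i - 1]) / (ts[i] - ts[i - level])
    return a

def newton_eval(ts, a, t):
    """P(t) = (...(A_n (t - t_{n-1}) + A_{n-1})...)(t - t_0) + A_0."""
    p = a[-1]
    for i in range(len(a) - 2, -1, -1):
        p = p * (t - ts[i]) + a[i]
    return p

ts = [0.0, 0.5, 1.0]
ps = [0.0, 0.25, 1.0]            # samples of t^2
a = newton_coeffs(ts, ps)        # [0.0, 0.5, 1.0]
print(abs(newton_eval(ts, a, 0.7) - 0.49) < 1e-9)  # True: data is quadratic
```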
Exercise 10.11: The tangent vector to a curve P(t) is the derivative dP(t)/dt, which we denote by P^t(t). Calculate the tangent vectors to the curve of Exercises 10.7 and 10.10 at the three points. Also calculate the slopes of the curve at the points.
10.4 Polynomial Surfaces

The polynomial y = Σ ai x^i is the explicit representation of a curve. Similarly, the parametric polynomials P(t) = Σ t^i Pi and P(t) = Σ ai(t) Pi (where ai(t) is a polynomial in t) are parametric representations of curves. These expressions can be extended to polynomials in two variables, which represent surfaces. Thus, the double polynomial z = Σ_i Σ_j aij x^i y^j is the explicit representation of a surface patch, because it yields a z value for any pair of coordinates (x, y). Similarly, the double parametric polynomial P(u, w) = Σ_i Σ_j u^i w^j Pij is the parametric representation of a surface patch. For the cubic case (polynomials of degree 3), such a double polynomial can be expressed compactly in matrix notation as Equation (8.27), duplicated here:

P(u, w) = [u^3, u^2, u, 1] N [ P33 P32 P31 P30 ] N^T [ w^3 ]
                             [ P23 P22 P21 P20 ]     [ w^2 ].   (8.27)
                             [ P13 P12 P11 P10 ]     [ w   ]
                             [ P03 P02 P01 P00 ]     [ 1   ]

The corresponding surface patch is accordingly referred to as bicubic.
10.5 The Biquadratic Surface Patch

This section introduces the biquadratic surface patch and constructs this simple surface as a Cartesian product. Given the two quadratic (degree 2) polynomials

Q(u) = Σ_{i=0}^{2} fi(u) Qi   and   R(w) = Σ_{j=0}^{2} gj(w) Rj,

the biquadratic surface immediately follows from the principle of Cartesian product:

P(u, w) = Σ_{i=0}^{2} Σ_{j=0}^{2} fi(u) gj(w) Pij.   (10.20)

Different constructions are possible depending on the geometric meaning of the nine quantities Pij. The following section presents such a construction, and Section 11.10 discusses another approach, based on points, tangent vectors, and twist vectors.
10.5.1 Nine Points

Equation (10.13), duplicated below, gives the quadratic standard Lagrange polynomial that interpolates three given points:

P2std(t) = (t^2, t, 1)
  [  2 −4  2
    −3  4 −1
     1  0  0 ] (P0, P1, P2)^T.      (10.13)

Cartesian product yields the corresponding biquadratic surface

P(u, w) = (u^2, u, 1)
  [  2 −4  2
    −3  4 −1
     1  0  0 ]
  [ P22 P21 P20
    P12 P11 P10
    P02 P01 P00 ]
  [  2 −4  2
    −3  4 −1
     1  0  0 ]^T (w^2, w, 1)^T,      (10.21)
where the nine quantities Pij are points defining this surface patch. They should be roughly equally spaced over the surface. Example: Given the nine points of Figure 10.4a, we compute and draw the biquadratic surface patch defined by them. The surface is shown in Figure 10.4b. The code is also listed. It is also possible to construct similar biquadratic surfaces from the expressions for the uniform and nonuniform quadratic Lagrange polynomials, Equations (10.11) and (10.12). Exercise 10.12: The geometry vector of Equation (10.13) has point P0 at the top, but the geometry matrix of Equation (10.21) has point P00 at its bottom-right instead of its top-left corner. Why is that?
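The construction is easy to check numerically. The following Python sketch (an illustration, not part of the book) mirrors the Mathematica code of Figure 10.4: the array pnt plays the role of the geometry matrix, and the patch is verified to interpolate its nine data points at the parameter values 0, 1/2, and 1.

```python
# A quick numerical check (illustration only) of the nine-point biquadratic
# patch, Eq. (10.21).  The array pnt mirrors the Figure 10.4 Mathematica code.
M = [[2, -4, 2], [-3, 4, -1], [1, 0, 0]]     # standard quadratic Lagrange basis
pnt = [[(0, 0, 0), (1, 0, 0), (2, 0, 0)],
       [(0, 1, 0), (1, 1, 1), (2, 1, -.5)],
       [(0, 2, 0), (1, 2, 0), (2, 2, 0)]]

def blend(t):
    # the row vector (t^2, t, 1) times M: three Lagrange weights for t = 0, 1/2, 1
    row = (t * t, t, 1)
    return [sum(row[r] * M[r][c] for r in range(3)) for c in range(3)]

def patch(u, w):
    fu, gw = blend(u), blend(w)
    return tuple(sum(fu[i] * gw[j] * pnt[i][j][k]
                     for i in range(3) for j in range(3)) for k in range(3))

# the patch interpolates the nine data points at the node parameters
assert all(abs(a - b) < 1e-12 for a, b in zip(patch(0, 0), pnt[0][0]))
assert all(abs(a - b) < 1e-12 for a, b in zip(patch(0.5, 0.5), pnt[1][1]))
assert all(abs(a - b) < 1e-12 for a, b in zip(patch(1, 0.5), pnt[2][1]))
```

As in the Mathematica listing, the orientation of the point array relative to the P_ij labels of Equation (10.21) is a matter of convention; Exercise 10.12 touches on exactly this point.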
10.6 The Bicubic Surface Patch

The parametric cubic (PC) curve, Equation (10.1), is useful, since it can be used when either four points, or two points and two tangent vectors, are known. The latter approach is the topic of Chapter 11. The PC curve can easily be extended to a bicubic surface patch by means of the Cartesian product. A PC curve has the form P(t) = Σ_{i=0}^{3} a_i t^i. Two such curves, P(u) and P(w), can be combined to form the Cartesian product surface patch

P(u, w) = Σ_{i=0}^{3} Σ_{j=0}^{3} a_ij u^i w^j
        = a33 u^3 w^3 + a32 u^3 w^2 + a31 u^3 w + a30 u^3
        + a23 u^2 w^3 + a22 u^2 w^2 + a21 u^2 w + a20 u^2
        + a13 u w^3   + a12 u w^2   + a11 u w   + a10 u
        + a03 w^3     + a02 w^2     + a01 w     + a00      (10.22)
(* Biquadratic patch for 9 points *)
Clear[T,pnt,M,g1,g2];
T[t_]:={t^2,t,1};
pnt={{{0,0,0},{1,0,0},{2,0,0}},
     {{0,1,0},{1,1,1},{2,1,-.5}},
     {{0,2,0},{1,2,0},{2,2,0}}};
M={{2,-4,2},{-3,4,-1},{1,0,0}};
g2=Graphics3D[{Red,AbsolutePointSize[6],
   Table[Point[pnt[[i,j]]],{i,1,3},{j,1,3}]}];
comb[i_]:=(T[u].M.pnt)[[i]](Transpose[M].T[w])[[i]];
g1=ParametricPlot3D[comb[1]+comb[2]+comb[3],{u,0,1},{w,0,1}];
Show[g1,g2, ViewPoint->{1.391,-2.776,0.304}, PlotRange->All]

Figure 10.4: A Biquadratic Surface Patch Example.
= (u^3, u^2, u, 1)
  [ a33 a32 a31 a30
    a23 a22 a21 a20
    a13 a12 a11 a10
    a03 a02 a01 a00 ] (w^3, w^2, w, 1)^T,   where 0 ≤ u, w ≤ 1.      (10.23)
This is a double cubic polynomial (hence the name bicubic) with 16 terms, where each of the 16 coefficients aij is a triplet (compare with Equation (8.27)). When w is set to a fixed value w0 , Equation (10.23) becomes P(u, w0 ), which is a PC curve. The same is true for P(u0 , w). The conclusion is that curves that lie on this surface in the u or in the w directions are parametric cubics. The four boundary curves are consequently also PC curves. Notice that the shape and location of the surface depend on all 16 coefficients. Any change in any of them produces a different surface patch. Equation (10.23) is the algebraic representation of the bicubic patch. In order to use it in practice, the 16 unknown coefficients have to be expressed in terms of known geometrical quantities, such as points, tangent vectors, or second derivatives. Two types of bicubic surfaces are discussed in the next two sections. The first is based on 16 data points and the second is constructed from four known curves. A third type—defined by four data points, eight tangent vectors, and four twist vectors—is the topic of Section 11.9.
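The equivalence of the double sum (10.22) and the matrix product (10.23) is easy to confirm numerically. The Python sketch below (an illustration, not part of the book) fills the 16 coefficients with arbitrary scalar values (in the text each a_ij is a triplet) and compares the two evaluations.

```python
# Compare the double-sum form (10.22) of the bicubic polynomial with its
# matrix form (10.23).  Scalar coefficients are used for brevity.
a = [[(3 * i + 5 * j + 1) * 0.25 for j in range(4)] for i in range(4)]

def double_sum(u, w):
    return sum(a[i][j] * u**i * w**j for i in range(4) for j in range(4))

def matrix_form(u, w):
    # (u^3,u^2,u,1) A (w^3,w^2,w,1)^T, where A[r][c] = a_{3-r, 3-c}
    U = (u**3, u**2, u, 1)
    W = (w**3, w**2, w, 1)
    return sum(U[r] * a[3 - r][3 - c] * W[c] for r in range(4) for c in range(4))

for (u, w) in [(0, 0), (0.5, 0.25), (1, 1), (0.3, 0.9)]:
    assert abs(double_sum(u, w) - matrix_form(u, w)) < 1e-12
```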
Milo . . . glanced curiously at the strange circular room, where sixteen tiny arched windows corresponded exactly to the sixteen points of the compass. Around the entire circumference were numbers from zero to three hundred and sixty, marking the degrees of the circle, and on the floor, walls, tables, chairs, desks, cabinets, and ceiling were labels showing their heights, widths, depths, and distances to and from each other. —Norton Juster, The Phantom Tollbooth.
10.6.1 Sixteen Points

We start with the sixteen given points

P03 P13 P23 P33
P02 P12 P22 P32
P01 P11 P21 P31
P00 P10 P20 P30 .
Figure 10.5: (a) Sixteen Points. (b) Four Curves.
We assume that the points are (roughly) equally spaced on the rectangular surface patch as shown in Figure 10.5a. We know that the bicubic surface has the form

P(u, w) = Σ_{i=0}^{3} Σ_{j=0}^{3} a_ij u^i w^j,      (10.24)

where each of the 16 coefficients a_ij is a triplet. To calculate the 16 unknown coefficients, we write 16 equations, each based on one of the given points:

P(0, 0) = P00,   P(0, 1/3) = P01,   P(0, 2/3) = P02,   P(0, 1) = P03,
P(1/3, 0) = P10, P(1/3, 1/3) = P11, P(1/3, 2/3) = P12, P(1/3, 1) = P13,
P(2/3, 0) = P20, P(2/3, 1/3) = P21, P(2/3, 2/3) = P22, P(2/3, 1) = P23,
P(1, 0) = P30,   P(1, 1/3) = P31,   P(1, 2/3) = P32,   P(1, 1) = P33.
After solving, the final expression for the surface patch becomes

P(u, w) = (u^3, u^2, u, 1) N
  [ P00 P01 P02 P03
    P10 P11 P12 P13
    P20 P21 P22 P23
    P30 P31 P32 P33 ] N^T (w^3, w^2, w, 1)^T,      (10.25)

where

N = [ −4.5  13.5 −13.5  4.5
       9.0 −22.5  18.0 −4.5
      −5.5   9.0  −4.5  1.0
       1.0   0     0    0   ]
is the basis matrix used to blend four points in a PC (Equation (10.6)). As mentioned, this type of surface patch has only limited use because it cannot have a very complex shape. A larger surface, made up of a number of such patches, can be constructed, but it is difficult to connect the individual patches smoothly. (This type of surface is also derived in Section 8.12 as a Cartesian product.)

Example: Given the 16 points listed in Figure 10.6, we compute and plot the bicubic surface patch defined by them. The figure shows two views of this surface.
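The matrix N can be checked numerically: at the node parameters 0, 1/3, 2/3, 1, the row vector (t^3, t^2, t, 1)N must reduce to a unit vector that selects the corresponding grid point, which is exactly why Equation (10.25) interpolates. The Python sketch below is an illustration, not part of the book.

```python
# Verify that N blends correctly: at the node parameters 0, 1/3, 2/3, 1,
# the vector (t^3, t^2, t, 1) . N is a unit vector selecting one node.
N = [[-4.5, 13.5, -13.5, 4.5],
     [9.0, -22.5, 18.0, -4.5],
     [-5.5, 9.0, -4.5, 1.0],
     [1.0, 0.0, 0.0, 0.0]]

def weights(t):
    row = (t**3, t**2, t, 1.0)
    return [sum(row[r] * N[r][c] for r in range(4)) for c in range(4)]

nodes = [0.0, 1/3, 2/3, 1.0]
for k, t in enumerate(nodes):
    wts = weights(t)
    for c in range(4):
        expect = 1.0 if c == k else 0.0
        assert abs(wts[c] - expect) < 1e-9
# Consequently P(i/3, j/3) in Eq. (10.25) reproduces the grid point P_ij.
```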
(* BiCubic patch for 16 points *)
Clear[T,pnt,M,g1,g2];
T[t_]:={t^3,t^2,t,1};
pnt={{{0,0,0},{1,0,0},{2,0,0},{3,0,0}},
     {{0,1,0},{1,1,1},{2,1,-.5},{3,1,0}},
     {{0,2,-.5},{1,2,0},{2,2,.5},{3,2,0}},
     {{0,3,0},{1,3,0},{2,3,0},{3,3,0}}};
M={{-4.5,13.5,-13.5,4.5},{9,-22.5,18,-4.5},{-5.5,9,-4.5,1},{1,0,0,0}};
g2=Graphics3D[{Red, AbsolutePointSize[6],
   Table[Point[pnt[[i,j]]],{i,1,4},{j,1,4}]}];
comb[i_]:=(T[u].M.pnt)[[i]] (Transpose[M].T[w])[[i]];
g1=ParametricPlot3D[comb[1]+comb[2]+comb[3]+comb[4],{u,0,1},{w,0,1}];
Show[g1, g2, PlotRange->All]

Figure 10.6: A Bicubic Surface Patch Example.
Even though this type of surface has limited use in graphics, it can be used for two-dimensional bicubic polynomial interpolation of points and numbers. Given a set of three-dimensional points arranged in a two-dimensional grid, the problem is to compute a weighted sum of the points and employ it to predict the value of a new point at the center of the grid. It makes sense to assign more weight to points that are closer to the center, and a natural way to achieve this is to calculate the surface patch P(u, w) that passes through all the points in the grid and use the value P(0.5, 0.5) as the interpolated value at the center of the grid. The MLP image compression method [Salomon 09] is an example of the use of this approach. The problem is to interpolate the values of a group of 4×4 pixels in an image in order to predict the value of a pixel at the center of this group. The simple solution is to calculate the surface patch defined by the 16 pixels and to use the surface point P(0.5, 0.5) as the interpolated value of the pixel at the center of the group. Substituting u = 0.5 and w = 0.5 in Equation (10.25) produces

P(0.5, 0.5) = 0.00390625 P00 − 0.0351563 P01 − 0.0351563 P02 + 0.00390625 P03
            − 0.0351563 P10 + 0.316406 P11 + 0.316406 P12 − 0.0351563 P13
            − 0.0351563 P20 + 0.316406 P21 + 0.316406 P22 − 0.0351563 P23
            + 0.00390625 P30 − 0.0351563 P31 − 0.0351563 P32 + 0.00390625 P33.

The 16 coefficients are the ones used by MLP.

Exercise 10.13: The center point of the surface is calculated as a weighted sum of the 16 equally-spaced data points (this technique is known as bicubic interpolation). It makes sense to assign small weights to points located away from the center, but our result assigns negative weights to eight of the 16 points. Explain the meaning of negative weights and show what role they play in interpolating the center of the surface (see also Section 2.4.1).
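The 16 center weights can be reproduced with a few lines of Python (an illustration, not part of the book): the weight of P_ij at the center is the product of the ith and jth components of the blending vector (0.5^3, 0.5^2, 0.5, 1)N.

```python
# Reproduce the bicubic center weights used by MLP.
N = [[-4.5, 13.5, -13.5, 4.5],
     [9.0, -22.5, 18.0, -4.5],
     [-5.5, 9.0, -4.5, 1.0],
     [1.0, 0.0, 0.0, 0.0]]
row = (0.125, 0.25, 0.5, 1.0)                  # (0.5^3, 0.5^2, 0.5, 1)
f = [sum(row[r] * N[r][c] for r in range(4)) for c in range(4)]
weight = [[f[i] * f[j] for j in range(4)] for i in range(4)]

print(weight[0][0])          # corner weight:  0.00390625
print(weight[1][1])          # inner weight:   0.31640625
print(sum(map(sum, weight)))  # the 16 weights sum to 1
```

Note that the eight negative weights of Exercise 10.13 are the products of one negative and one positive component of f.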
Readers who find it tedious to follow the details above should compare the way two-dimensional bicubic polynomial interpolation is presented here to the way it is discussed by [Press and Flannery 88]; the following quotation is from their page 125: “. . . the formulas that obtain the c’s from the function and derivative values are just a complicated linear transformation, with coefficients which, having been determined once, in the mists of numerical history, can be tabulated and forgotten.”
Seated at his disorderly desk, caressed by a counterpane of drifting tobacco haze, he would pore over the manuscript, crossing out, interpolating, re-arguing, and then referring to volumes on his shelves. —Christopher Morley, The Haunted Bookshop (1919).
10.6.2 Four Curves

A variant of the previous method starts with four curves (any curves, not just PCs), P0(u), P1(u), P2(u), and P3(u), roughly parallel, all going in the u direction (Figure 10.5b). It is possible to select four points Pi(0), Pi(1/3), Pi(2/3), and Pi(1) on each curve Pi(u), for a total of 16 points. The surface patch can then easily be constructed from Equation (10.25).

Example: The surface of Figure 10.7 is defined by the following four curves (shown in the diagram in an inset). All go along the x axis, at different y values, and are sine curves (with different phases) along the z axis:

P0(u) = (u, 0, sin(πu)),
P1(u) = (u, 1 + u/10, sin(π(u + 0.1))),
P2(u) = (u, 2, sin(π(u + 0.2))),
P3(u) = (u, 3 + u/10, sin(π(u + 0.3))).

The Mathematica code of Figure 10.7 shows how the matrix basis is created from the 16 points

[ P0(0) P0(.33) P0(.67) P0(1)
  P1(0) P1(.33) P1(.67) P1(1)
  P2(0) P2(.33) P2(.67) P2(1)
  P3(0) P3(.33) P3(.67) P3(1) ].
10.7 Coons Surfaces The Coons surface is based on the pioneering work of Steven Anson Coons at MIT in the 1960s. His efforts are summarized in [Coons 64] and [Coons 67]. We start with the linear Coons surface, which is a generalization of lofted surfaces. This type of surface patch is defined by its four boundary curves. All four boundary curves are given, and none has to be a straight line. Naturally, the boundary curves have to meet at the corner points, so these points are implicitly known. Coons decided to search for an expression P(u, w) of the surface that satisfies (1) it is symmetric in u and w and (2) it is an interpolation of P(u, 0) and P(u, 1) in one direction and of P(0, w) and P(1, w) in the other direction. He found a surprisingly simple, two-step solution. The first step is to construct two lofted surfaces from the two sets of opposite boundary curves. They are Pa (u, w) = P(0, w)(1 − u) + P(1, w)u and Pb (u, w) = P(u, 0)(1 − w) + P(u, 1)w. The second step is to tentatively attempt to create the final surface P(u, w) as the sum Pa (u, w) + Pb (u, w). It is clear that this is not the expression we are looking for because it does not converge to the right curves at the boundaries. For u = 0, for example, we want P(u, w) to converge to boundary curve P(0, w). The sum above, however, converges to P(0, w)+P(0, 0)(1−w)+P(0, 1)w. We therefore have to subtract P(0, 0)(1−w)+P(0, 1)w. Similarly, for u = 1, the sum converges to P(1, w)+P(1, 0)(1− w) + P(1, 1)w, so we have to subtract P(1, 0)(1 − w) + P(1, 1)w. For w = 0, we have to subtract P(0, 0)(1 − u) + P(1, 0)u, and for w = 1, we should subtract P(0, 1)(1 − u) + P(1, 1)u.
Clear[p0,p1,p2,p3,basis,fourP,g0,g1,g2,g3,g4,g5];
p0[u_]:={u,0,Sin[Pi u]}; p1[u_]:={u,1+u/10,Sin[Pi (u+.1)]};
p2[u_]:={u,2,Sin[Pi (u+.2)]}; p3[u_]:={u,3+u/10,Sin[Pi (u+.3)]};
(*matrix ‘basis’ has dimensions 4x4x3*)
basis:={{p0[0],p0[.33],p0[.67],p0[1]},{p1[0],p1[.33],p1[.67],p1[1]},
        {p2[0],p2[.33],p2[.67],p2[1]},{p3[0],p3[.33],p3[.67],p3[1]}};
fourP:=(*basis matrix for a 4-point curve*){{-4.5,13.5,-13.5,4.5},
  {9,-22.5,18,-4.5},{-5.5,9,-4.5,1},{1,0,0,0}};
prt[i_]:=(*extracts component i from the 3rd dimen of ‘basis‘*)
  basis[[Range[1,4],Range[1,4],i]];
coord[i_]:=(*calc. the 3 parametric components of the surface*)
  {u^3,u^2,u,1}.fourP.prt[i].Transpose[fourP].{w^3,w^2,w,1};
g0=ParametricPlot3D[p0[u],{u,0,1}];
g1=ParametricPlot3D[p1[u],{u,0,1}];
g2=ParametricPlot3D[p2[u],{u,0,1}];
g3=ParametricPlot3D[p3[u],{u,0,1}];
g4=Graphics3D[{Red, AbsolutePointSize[6],Table[Point[basis[[i,j]]],
  {i,1,4},{j,1,4}]}];
g5=ParametricPlot3D[{coord[1],coord[2],coord[3]},{u,0,1},{w,0,1}];
Show[g0,g1,g2,g3,ViewPoint->{-2.576,-1.365,1.718}, Ticks->False,
  PlotRange->All]
Show[g4,g5,ViewPoint->{-2.576,-1.365,1.718}]

Figure 10.7: A Four-Curve Surface.
Note that the expressions P(0, 0), P(0, 1), P(1, 0), and P(1, 1) are simply the four corner points. A better notation for them may be P00, P01, P10, and P11. Today, this type of surface is known as the linear Coons surface. Its expression is P(u, w) = Pa(u, w) + Pb(u, w) − Pab(u, w), where

Pab(u, w) = P00(1 − u)(1 − w) + P01(1 − u)w + P10 u(1 − w) + P11 u w.

Note that Pa and Pb are lofted surfaces, whereas Pab is a bilinear surface. The final expression is

P(u, w) = Pa(u, w) + Pb(u, w) − Pab(u, w)

        = (1 − u, u) (P(0, w), P(1, w))^T + (1 − w, w) (P(u, 0), P(u, 1))^T
          − (1 − u, u) [ P00 P01
                         P10 P11 ] (1 − w, w)^T      (10.26)

        = (1 − u, u, 1) [ −P00     −P01     P(0, w)
                          −P10     −P11     P(1, w)
                          P(u, 0)  P(u, 1)  (0, 0, 0) ] (1 − w, w, 1)^T.      (10.27)
Equation (10.26) is more useful than Equation (10.27) since it shows how the surface is defined in terms of the two barycentric pairs (1 − u, u) and (1 − w, w). They are the blending functions of the linear Coons surface. It turns out that many pairs of barycentric functions f1(u), f2(u) and g1(w), g2(w) can serve as blending functions, out of which more general Coons surfaces can be constructed. All that the blending functions have to satisfy is

f1(0) = 1, f1(1) = 0, f2(0) = 0, f2(1) = 1, f1(u) + f2(u) = 1,      (10.28)
g1(0) = 1, g1(1) = 0, g2(0) = 0, g2(1) = 1, g1(w) + g2(w) = 1.
Example: We select the four (nonpolynomial) boundary curves Pu0 = (u, 0, sin(πu)), Pu1 = (u, 1, sin(πu)), P0w = (0, w, sin(πw)), P1w = (1, w, sin(πw)). Each is one-half of a sine wave. The first two proceed along the x axis, and the other two go along the y axis. They meet at the four corner points P00 = (0, 0, 0), P01 = (0, 1, 0), P10 = (1, 0, 0), and P11 = (1, 1, 0). The surface and the Mathematica code that produced it are shown in Figure 10.8. Note the Simplify command, which displays the final, simplified expression of the surface {u, w, Sin[Pi u] + Sin[Pi w]}. Example: Given the four corner points P00 = (−1, −1, 0), P01 = (−1, 1, 0), P10 = (1, −1, 0), and P11 = (1, 1, 0) (notice that they lie on the xy plane), we calculate the four boundary curves of a linear Coons surface patch as follows: 1. We select boundary curve P(0, w) as the straight line from P00 to P01 : P(0, w) = P00 (1 − w) + P01 w = (−1, 2w − 1, 0).
Figure 10.8: A Coons Surface.
2. We place the two points (1, −0.5, 0.5) and (1, 0.5, −0.5) between P10 and P11 and calculate boundary curve P(1, w) as the cubic Lagrange polynomial (Equation (10.16)) determined by these four points:

P(1, w) = (w^3, w^2, w, 1) (1/2)
  [  −9  27 −27  9
     18 −45  36 −9
    −11  18  −9  2
      2   0   0  0 ]
  [ (1, −1, 0)
    (1, −0.5, 0.5)
    (1, 0.5, −0.5)
    (1, 1, 0) ]
  = (1, (−4 − w + 27w^2 − 18w^3)/4, 27(w − 3w^2 + 2w^3)/4).

3. The single point (0, −1, −0.5) is placed between points P00 and P10 and boundary curve P(u, 0) is calculated as the quadratic Lagrange polynomial (Equation (10.13)) determined by these three points:

P(u, 0) = (u^2, u, 1)
  [  2 −4  2
    −3  4 −1
     1  0  0 ]
  [ (−1, −1, 0)
    (0, −1, −.5)
    (1, −1, 0) ]
  = (2u − 1, −1, 2u^2 − 2u).
4. Similarly, a new point (0, 1, .5) is placed between points P01 and P11, and boundary curve P(u, 1) is calculated as the quadratic Lagrange polynomial determined by these three points:

P(u, 1) = (u^2, u, 1)
  [  2 −4  2
    −3  4 −1
     1  0  0 ]
  [ (−1, 1, 0)
    (0, 1, .5)
    (1, 1, 0) ]
  = (2u − 1, 1, −2u^2 + 2u).

The four boundary curves and the four corner points now become the linear Coons surface patch given by Equation (10.26):

P(u, w) = (1 − u, u, 1)
  [ −(−1, −1, 0)              −(−1, 1, 0)              (−1, 2w − 1, 0)
    −(1, −1, 0)               −(1, 1, 0)               (1, (−4 − w + 27w^2 − 18w^3)/4, 27(w − 3w^2 + 2w^3)/4)
    (2u − 1, −1, 2u^2 − 2u)   (2u − 1, 1, −2u^2 + 2u)  (0, 0, 0) ] (1 − w, w, 1)^T.

This is simplified with the help of appropriate software and becomes

P(u, w) = ( −1 + 2u + (1 − u)(1 − w) − u(1 − w) + (−1 + 2u)(1 − w) + (1 − u)w − uw + (−1 + 2u)w,
  −1 + (1 − u)(1 − w) + u(1 − w) + 2w − (1 − u)w − uw + (1 − u)(−1 + 2w) + u(−4 − w + 27w^2 − 18w^3)/4,
  (−2u + 2u^2)(1 − w) + (2u − 2u^2)w + 27u(w − 3w^2 + 2w^3)/4 ).

The surface patch and the eight points involved are shown in Figure 10.10.
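The example is easy to check numerically. The Python sketch below (an illustration, not part of the book) implements the linear Coons combination with the four boundary curves just derived and verifies that the patch reproduces them, and hence the four corner points.

```python
from math import isclose

# The four boundary curves of the example and the linear Coons combination.
P00, P01, P10, P11 = (-1, -1, 0), (-1, 1, 0), (1, -1, 0), (1, 1, 0)
p0w = lambda w: (-1.0, 2*w - 1, 0.0)
p1w = lambda w: (1.0, (-4 - w + 27*w**2 - 18*w**3)/4, 27*(w - 3*w**2 + 2*w**3)/4)
pu0 = lambda u: (2*u - 1, -1.0, 2*u**2 - 2*u)
pu1 = lambda u: (2*u - 1, 1.0, -2*u**2 + 2*u)

def coons(u, w):
    # P = Pa + Pb - Pab, Eq. (10.26)
    return tuple((1-u)*p0w(w)[k] + u*p1w(w)[k] + (1-w)*pu0(u)[k] + w*pu1(u)[k]
                 - P00[k]*(1-u)*(1-w) - P01[k]*(1-u)*w
                 - P10[k]*u*(1-w) - P11[k]*u*w for k in range(3))

# the patch reproduces its boundary curves and corner points
for t in (0.0, 0.3, 0.7, 1.0):
    assert all(isclose(a, b, abs_tol=1e-12) for a, b in zip(coons(0, t), p0w(t)))
    assert all(isclose(a, b, abs_tol=1e-12) for a, b in zip(coons(t, 0), pu0(t)))
assert all(isclose(a, b, abs_tol=1e-12) for a, b in zip(coons(1, 1), P11))
```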
10.7.1 Translational Surfaces

Given two curves P(u, 0) and P(0, w) that intersect at a point

P(u, 0)|u=0 = P(0, w)|w=0 ≡ P00   (by definition),

it is easy to construct the surface patch created by sliding one of the curves, say, P(u, 0), along the other one (Figure 10.9).
Figure 10.9: A Translational Surface.
p00={-1,-1,0}; p01={-1,1,0}; p10={1,-1,0}; p11={1,1,0};
pnts={p00,p01,p10,p11,{1,-1/2,1/2},{1,1/2,-1/2},
      {0,-1,-1/2},{0,1,1/2}};
p0w[w_]:={-1,2w-1,0};
p1w[w_]:={1,(-4-w+27w^2-18w^3)/4,27(w-3w^2+2w^3)/4};
pu0[u_]:={2u-1,-1,2u^2-2u}; pu1[u_]:={2u-1,1,-2u^2+2u};
p[u_,w_]:=(1-u)p0w[w]+u p1w[w]+(1-w)pu0[u]+w pu1[u]-
  p00(1-u)(1-w)-p01 (1-u)w-p10 u (1-w)-p11 u w;
g1=Graphics3D[{Red, AbsolutePointSize[6],
   Table[Point[pnts[[i]]],{i,1,8}]}];
g2=ParametricPlot3D[p[u,w],{u,0,1},{w,0,1},
   Ticks->{{-1,1},{-1,1},{-1,1}}];
Show[g1,g2]

Figure 10.10: A Coons Surface Patch and Code.
We fix w at a certain value w0 and compute the vector from the intersection point P00 to point P(0, w0) (marked with an x in the figure). This vector is the difference P(0, w0) − P00, implying that any point on the curve P(u, w0) can be obtained by adding this vector to the corresponding point on curve P(u, 0). The entire curve P(u, w0) is therefore constructed as the sum P(u, 0) + [P(0, w0) − P00] for 0 ≤ u ≤ 1. The resulting translational surface P(u, w) is obtained when w is released and is varied in the interval [0, 1]

P(u, w) = P(u, 0) + P(0, w) − P00.

There is an interesting relation between the linear Coons surface and translational surfaces. The Coons patch is constructed from four intersecting curves. Consider a pair of such curves that intersect at a corner Pij of the Coons patch. We can employ this pair and the corner to construct a translational surface Pij(u, w). Once we construct the four translational surfaces for the four corners of the Coons patch, they can be used to express the entire linear Coons surface patch by a special version of Equation (10.27)

(1 − u, u) [ P00(u, w) P01(u, w)
             P10(u, w) P11(u, w) ] (1 − w, w)^T.

This version expresses the Coons surface patch as a weighted combination of four translational surfaces.
10.7.2 Higher-Degree Coons Surfaces

One possible pair of blending functions is the cubic Hermite polynomials, functions F1(t) and F2(t) of Equation (11.6)

H3,0(t) = B3,0(t) + B3,1(t) = (1 − t)^3 + 3t(1 − t)^2 = 1 + 2t^3 − 3t^2,      (10.29)
H3,3(t) = B3,2(t) + B3,3(t) = 3t^2(1 − t) + t^3 = 3t^2 − 2t^3,

where Bn,i(t) are the Bernstein polynomials, Equation (13.5). The sum H3,0(t) + H3,3(t) is identically 1 (because the Bernstein polynomials are barycentric), so these functions can be used to construct the bicubic Coons surface. Its expression is

P(u, w) = (H3,0(u), H3,3(u), 1)
  [ −P00     −P01     P(0, w)
    −P10     −P11     P(1, w)
    P(u, 0)  P(u, 1)  (0, 0, 0) ]
  (H3,0(w), H3,3(w), 1)^T      (10.30)

= (1 + 2u^3 − 3u^2, 3u^2 − 2u^3, 1)
  [ −P00     −P01     P(0, w)
    −P10     −P11     P(1, w)
    P(u, 0)  P(u, 1)  (0, 0, 0) ]
  (1 + 2w^3 − 3w^2, 3w^2 − 2w^3, 1)^T.

One advantage of the bicubic Coons surface patch is that it is especially easy to connect smoothly to other patches of the same type. This is because its blending functions satisfy

dH3,0(t)/dt |t=0 = 0,  dH3,0(t)/dt |t=1 = 0,  dH3,3(t)/dt |t=0 = 0,  dH3,3(t)/dt |t=1 = 0.      (10.31)
Figure 10.11 shows two bicubic Coons surface patches, P(u, w) and Q(u, w), connected along their boundary curves P(u, 1) and Q(u, 0), respectively. The condition for patch connection is, of course, P(u, 1) = Q(u, 0). The condition for smooth connection is

∂P(u, w)/∂w |w=1 = ∂Q(u, w)/∂w |w=0      (10.32)

(but see Section 8.13 for other, less restrictive conditions).
Figure 10.11: Smooth Connection of Bicubic Coons Surface Patches.
The partial derivatives of P(u, w) are easy to calculate from Equation (10.30). They are

∂P(u, w)/∂w |w=1 = H3,0(u) dP(0, w)/dw |w=1 + H3,3(u) dP(1, w)/dw |w=1,      (10.33)
∂Q(u, w)/∂w |w=0 = H3,0(u) dQ(0, w)/dw |w=0 + H3,3(u) dQ(1, w)/dw |w=0.

(All other terms vanish because the blending functions satisfy Equation (10.31).) The condition for smooth connection, Equation (10.32), is therefore satisfied if

dP(0, w)/dw |w=1 = dQ(0, w)/dw |w=0   and   dP(1, w)/dw |w=1 = dQ(1, w)/dw |w=0,

or, expressed in words, if the two boundary curves P(0, w) and Q(0, w) on the u = 0 side of the patch connect smoothly, and the same for the two boundary curves P(1, w) and Q(1, w) on the u = 1 side of the patch. The reader should now find it easy to appreciate the advantage of the degree-5 Hermite blending functions (functions F1(t) and F2(t) of Equation (11.17))

H5,0(t) = B5,0(t) + B5,1(t) + B5,2(t) = 1 − 10t^3 + 15t^4 − 6t^5,      (10.34)
H5,5(t) = B5,3(t) + B5,4(t) + B5,5(t) = 10t^3 − 15t^4 + 6t^5.
10 Polynomial Interpolation
535
They are based on the Bernstein polynomials B5,i(t), hence they satisfy the conditions of Equation (10.28). They have the additional property that their first and second derivatives are zero for t = 0 and for t = 1. The degree-5 Coons surface constructed by them is

P5(u, w) = (H5,0(u), H5,5(u), 1)
  [ −P00     −P01     P(0, w)
    −P10     −P11     P(1, w)
    P(u, 0)  P(u, 1)  (0, 0, 0) ]
  (H5,0(w), H5,5(w), 1)^T.      (10.35)

Adjacent patches of this type of surface are easy to connect with G2 continuity. All that's necessary is to have two pairs of boundary curves P(0, w), Q(0, w) and P(1, w), Q(1, w), where the two curves of each pair connect with G2 continuity.
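The properties claimed for the cubic and degree-5 blends are easy to confirm with their closed-form derivatives. The Python sketch below is an illustration, not part of the book; the derivative expressions are obtained by differentiating Equations (10.29) and (10.34) directly.

```python
# Hermite-style blending functions and their derivatives (closed forms).
H30  = lambda t: 1 + 2*t**3 - 3*t**2
H33  = lambda t: 3*t**2 - 2*t**3
dH30 = lambda t: 6*t**2 - 6*t
dH33 = lambda t: 6*t - 6*t**2

H50   = lambda t: 1 - 10*t**3 + 15*t**4 - 6*t**5
H55   = lambda t: 10*t**3 - 15*t**4 + 6*t**5
dH50  = lambda t: -30*t**2 + 60*t**3 - 30*t**4
d2H50 = lambda t: -60*t + 180*t**2 - 120*t**3

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert abs(H30(t) + H33(t) - 1) < 1e-12     # barycentric, Eq. (10.28)
    assert abs(H50(t) + H55(t) - 1) < 1e-12

for t in (0.0, 1.0):                            # endpoint conditions, Eq. (10.31)
    assert dH30(t) == 0 and dH33(t) == 0
    assert dH50(t) == 0 and d2H50(t) == 0       # extra smoothness of degree 5
```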
10.7.3 The Tangent Matching Coons Surface The original aim of Coons was to construct a surface patch where all four boundary curves are specified by the user. Such patches are easy to compute and the conditions for connecting them smoothly are simple. It is possible to extend the original ideas of Coons to a surface patch where the user specifies the four boundary curves and also four functions that describe how (in what direction) this surface approaches its boundaries. Figure 10.12 illustrates the meaning of this statement. It shows a rectangular surface patch with some curves of the form P(u, wi ). Each of these curves goes from boundary curve P(0, w) to the opposite boundary curve P(1, w) by varying its parameter u from 0 to 1. Each has a different value of wi . When such a curve reaches its end, it is moving in a certain, well-defined direction shown in the diagram. The end tangent vectors of these curves are different and we can imagine a function that yields these tangents as we move along the boundary curve P(1, w), varying w from 0 to 1. A good name for such a function is Pu (1, w), where the subscript u indicates that this tangent of the surface is in the u direction, the index 1 indicates the tangent at the end (u = 1), and the w indicates that this tangent vector is a function of w.
Figure 10.12: Tangent Matching in a Coons Surface.
There are four such functions, namely Pu(0, w), Pu(1, w), Pw(u, 0), and Pw(u, 1). Assuming that the user provides these functions, as well as the four boundary curves, our task is to obtain an expression P(u, w) for the surface that will satisfy the following:

1. When we substitute 0 or 1 for u and w in P(u, w), we get the four given corner points and the four given boundary curves. This condition can be expressed as the eight constraints

P(0, 0) = P00, P(0, 1) = P01, P(1, 0) = P10, P(1, 1) = P11,

and P(0, w), P(1, w), P(u, 0), and P(u, 1) are the given boundary curves.

2. When we substitute 0 or 1 for u and w in the partial first derivatives of P(u, w), we get the four given tangent functions and their values at the four corner points. This condition can be expressed as the 12 constraints

∂P(u, w)/∂u |u=0 = Pu(0, w),         ∂P(u, w)/∂u |u=1 = Pu(1, w),
∂P(u, w)/∂w |w=0 = Pw(u, 0),         ∂P(u, w)/∂w |w=1 = Pw(u, 1),
∂P(u, w)/∂u |u=0,w=0 = Pu(0, 0),     ∂P(u, w)/∂u |u=0,w=1 = Pu(0, 1),
∂P(u, w)/∂u |u=1,w=0 = Pu(1, 0),     ∂P(u, w)/∂u |u=1,w=1 = Pu(1, 1),
∂P(u, w)/∂w |u=0,w=0 = Pw(0, 0),     ∂P(u, w)/∂w |u=0,w=1 = Pw(0, 1),
∂P(u, w)/∂w |u=1,w=0 = Pw(1, 0),     ∂P(u, w)/∂w |u=1,w=1 = Pw(1, 1).
3. When we substitute 0 or 1 for u and w in the mixed second partial derivative of P(u, w), we get the first derivatives of the given tangent functions at the four corner points. This condition can be expressed as the four constraints

∂²P(u, w)/∂u∂w |u=0,w=0 = dPu(0, w)/dw |w=0 = dPw(u, 0)/du |u=0 ≝ Puw(0, 0),
∂²P(u, w)/∂u∂w |u=0,w=1 = dPu(0, w)/dw |w=1 = dPw(u, 1)/du |u=0 ≝ Puw(0, 1),
∂²P(u, w)/∂u∂w |u=1,w=0 = dPu(1, w)/dw |w=0 = dPw(u, 0)/du |u=1 ≝ Puw(1, 0),
∂²P(u, w)/∂u∂w |u=1,w=1 = dPu(1, w)/dw |w=1 = dPw(u, 1)/du |u=1 ≝ Puw(1, 1).
This is a total of 24 constraints. A derivation of this type of surface can be found
in [Beach 91]. Here, we only quote the final result

P(u, w) = (B0(u), B1(u), C0(u), C1(u), 1) M (B0(w), B1(w), C0(w), C1(w), 1)^T,      (10.36)

where M is the 5×5 matrix

M = [ −P00       −P01       −Pw(0, 0)   −Pw(0, 1)   P(0, w)
      −P10       −P11       −Pw(1, 0)   −Pw(1, 1)   P(1, w)
      −Pu(0, 0)  −Pu(0, 1)  −Puw(0, 0)  −Puw(0, 1)  Pu(0, w)
      −Pu(1, 0)  −Pu(1, 1)  −Puw(1, 0)  −Puw(1, 1)  Pu(1, w)
      P(u, 0)    P(u, 1)    Pw(u, 0)    Pw(u, 1)    (0, 0, 0) ].      (10.37)
The two blending functions B0(t) and B1(t) can be any functions satisfying conditions (10.28) and (10.31). Examples are the pairs H3,0(t), H3,3(t) and H5,0(t), H5,5(t) of Equations (10.29) and (10.34). The two blending functions C0(t) and C1(t) should satisfy

C0(0) = 0,  C0(1) = 0,  C0′(0) = 1,  C0′(1) = 0,
C1(0) = 0,  C1(1) = 0,  C1′(0) = 0,  C1′(1) = 1.
One choice is the pair C0 (t) = t − 2t2 + t3 and C1 (t) = −t2 + t3 . Such a surface patch is difficult to specify. The user has to input the four boundary curves and four tangent functions, a total of eight functions. The user then has to calculate the coordinates of the four corner points and the other 12 quantities required by the matrix of Equation (10.37). The advantage of this type of surface is that once fully specified, such a surface patch is easy to connect smoothly to other patches of the same type since the tangents along the boundaries are fully specified by the user.
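The suggested pair indeed satisfies all eight conditions. Here is a quick Python check (an illustration, not part of the book), using the closed-form derivatives C0′(t) = 1 − 4t + 3t² and C1′(t) = −2t + 3t²:

```python
# The tangent-blending functions C0, C1 and their derivatives.
C0  = lambda t: t - 2*t**2 + t**3
C1  = lambda t: -t**2 + t**3
dC0 = lambda t: 1 - 4*t + 3*t**2
dC1 = lambda t: -2*t + 3*t**2

# the eight endpoint conditions (exact in integer arithmetic)
assert C0(0) == 0 and C0(1) == 0 and dC0(0) == 1 and dC0(1) == 0
assert C1(0) == 0 and C1(1) == 0 and dC1(0) == 0 and dC1(1) == 1
```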
10.7.4 The Triangular Coons Surface

A triangular surface patch is bounded by three boundary curves and has three corner points. Such surface patches are handy in situations like the one depicted in Figure 10.15, where a triangular Coons patch is used to smoothly connect two perpendicular lofted surface patches. Section 13.25 discusses the triangular Bézier surface patch, which is commonly used in practice.

Our approach to constructing the triangular Coons surface is to merge two of the four corner points and explore the behavior of the resulting surface patch. We arbitrarily decide to set P01 = P11, which reduces the boundary curve P(u, 1) to a single point (Figure 10.13). The expression of this triangular surface patch is

P(u, w) = (B0(u), B1(u), 1)
  [ −P00     −P11  P(0, w)
    −P10     −P11  P(1, w)
    P(u, 0)  P11   (0, 0, 0) ]
  (B0(w), B1(w), 1)^T,      (10.38)
Figure 10.13: A Triangular Coons Surface Patch.
where the blending functions B0(t), B1(t) can be the pair H3,0 and H3,3, or the pair H5,0 and H5,5, or any other pair of blending functions satisfying Equations (10.28) and (10.31). The tangent vector of the surface along the degenerate boundary curve P(u, 1) is given by Equation (10.33):

∂P(u, w)/∂w |w=1 = B0(u) dP(0, w)/dw |w=1 + B1(u) dP(1, w)/dw |w=1.      (10.39)

Thus, this tangent vector is a linear combination of the two tangents

T0 ≝ dP(0, w)/dw |w=1   and   T1 ≝ dP(1, w)/dw |w=1,
and therefore lies in the plane defined by them. As u varies from 0 to 1, this tangent vector swings from T0 to T1 while the curve P(u, 1) stays at the common point P01 = P11 . Once this behavior is grasped, the reader should be able to accept the following statement: The triangular patch will be well behaved in the vicinity of the common point if this tangent vector does not reverse its movement while swinging from T0 to T1 . If it starts moving toward T1 , then reverses and goes back toward T0 , then reverses again, the surface may have a fold close to the common point. To guarantee this smooth behavior of the tangent vector, the blending functions B0 (t) and B1 (t) must satisfy one more condition, namely B0 (t) should be monotonically decreasing in t and B1 (t) should be monotonically increasing in t. The two sets of blending functions H3,0 , H3,3 and H5,0 , H5,5 satisfy this condition and can therefore be used to construct triangular Coons surface patches. Example: Given the three corners P00 = (0, 0, 0), P10 = (2, 0, 0), and P01 = P11 = (1, 1, 0), we compute and plot the triangular Coons surface patch defined by them. The first step is to compute the three boundary curves. We assume that the “bottom” boundary curve P(u, 0) goes from P00 through (1, 0, −1) to P10 . We similarly require that the “left” boundary curve P(0, w) goes from P00 through (0.5, 0.5, 1) to P01 and
the “right” boundary curve P(1, w) goes from P10 through (1.5, 0.5, 1) to P11. All three curves are computed as standard quadratic Lagrange polynomials from Equation (10.13). They become

P(u, 0) = (2u, 0, 4u(u − 1)),
P(0, w) = (w, w, 4w(1 − w)),
P(1, w) = (2 − w, w, 4w(w − 1)).

Figure 10.14 shows two views of this surface and illustrates the downside of this type of surface. The technique of drawing a surface patch as a wireframe with two families of curves works well for rectangular surface patches but is unsuitable for triangular patches. The figure shows how one family of curves converges to the double corner point, thereby making the wireframe look unusually dense in the vicinity of the point. Section 13.25 presents a better approach to the display of a triangular surface patch as a wireframe.
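The degenerate boundary can be verified numerically. The Python sketch below (an illustration, not part of the book) evaluates Equation (10.38) with the H3 blending functions and the three boundary curves of this example, and confirms that the entire w = 1 edge collapses to the common point P01 = P11.

```python
from math import isclose

# Triangular Coons patch, Eq. (10.38), with H3 blending functions.
B0 = lambda t: 1 + 2*t**3 - 3*t**2
B1 = lambda t: 3*t**2 - 2*t**3
P00, P10, P11 = (0, 0, 0), (2, 0, 0), (1, 1, 0)
pu0 = lambda u: (2*u, 0.0, 4*u*(u - 1))          # P(u, 0)
p0w = lambda w: (w, w, 4*w*(1 - w))              # P(0, w)
p1w = lambda w: (2 - w, w, 4*w*(w - 1))          # P(1, w)

def tri(u, w):
    # expand (B0(u), B1(u), 1) . M . (B0(w), B1(w), 1)^T componentwise
    return tuple(
        B0(u) * (-P00[k]*B0(w) - P11[k]*B1(w) + p0w(w)[k])
        + B1(u) * (-P10[k]*B0(w) - P11[k]*B1(w) + p1w(w)[k])
        + pu0(u)[k]*B0(w) + P11[k]*B1(w)
        for k in range(3))

# the whole w = 1 edge collapses to the common corner P01 = P11
for u in (0.0, 0.25, 0.6, 1.0):
    assert all(isclose(a, b, abs_tol=1e-12) for a, b in zip(tri(u, 1.0), P11))
assert all(isclose(a, b, abs_tol=1e-12) for a, b in zip(tri(0.0, 0.0), P00))
```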
(*Triangular Coons patch*)
Clear[T,M,g1,g2];
T[t_]:={1+2t^3-3t^2,3t^2-2t^3,1};
p00={0,0,0}; p10={2,0,0}; p11={1,1,0};
M={{-p00,-p11,{w,w,4w (1-w)}},{-p10,-p11,{2-w,w,4w (1-w)}},
   {{2u,0,4u (u-1)},p11,{0,0,0}}};
g2=Graphics3D[{Red, AbsolutePointSize[6],
   Point[p00],Point[p10],Point[p11]}];
comb[i_]:=(T[u].M)[[i]] T[w][[i]];
g1=ParametricPlot3D[comb[1]+comb[2]+comb[3],{u,0,1},{w,0,1}];
Show[g1,g2]

Figure 10.14: A Triangular Coons Surface Patch Example.
Exercise 10.14: What happens if the blending functions of the triangular Coons surface patch do not satisfy the condition of Equation (10.31)?
“Now, don’t worry, my pet,” Mrs. Whatsit said cheerfully. “We took care of that before we left. Your mother has had enough to worry her with you and Charles to cope with, and not knowing about your father, without our adding to her anxieties. We took a time wrinkle as well as a space wrinkle. It’s very easy to do if you just know how.” —Madeleine L’Engle, A Wrinkle in Time (1962). Exercise 10.15: Given the four points P00 = (0, 0, 1), P10 = (1, 0, 0), P01 = (0.5, 1, 0), and P11 = (1, 1, 0), calculate the Coons surface defined by them, assuming straight lines as boundary curves. What type of a surface is this?
10.7.5 Summarizing Example

The surface shown in Figure 10.15 consists of four (intentionally separated) patches: a flat bilinear patch B at the top, two lofted patches L and F on both sides, and a triangular Coons patch C filling up the corner. The bilinear patch is especially simple since it is defined by its four corner points. Its expression is

B(u, w) = (0, 1/2, 1)(1 − u)(1 − w) + (1, 1/2, 1)(1 − u)w + (0, 3/2, 1)(1 − w)u + (1, 3/2, 1)uw = (w, 1/2 + u, 1).

The calculation of lofted patch L starts with the two boundary curves L(u, 0) and L(u, 1). Each is calculated using Hermite interpolation (Chapter 11) since its extreme tangents, as well as its endpoints, are easy to figure out from the diagram. The boundary curves are

L(u, 0) = (u^3, u^2, u, 1) H ((0, 0, 0), (0, 1/2, 1), (0, 0, 1), (0, 1, 0))^T,
L(u, 1) = (u^3, u^2, u, 1) H ((1, 0, 0), (1, 1/2, 1), (0, 0, 1), (0, 1, 0))^T,

where H is the Hermite basis matrix, Equation (11.7). Surface patch L is thus

L(u, w) = L(u, 0)(1 − w) + L(u, 1)w = (w, u^2/2, u + u^2 − u^3).

Lofted patch F is calculated similarly. Its boundary curves are

F(u, 0) = (u^3, u^2, u, 1) H ((3/2, 1/2, 0), (1, 1/2, 1), (0, 0, 1), (−1, 0, 0))^T,
F(u, 1) = (u^3, u^2, u, 1) H ((3/2, 3/2, 0), (1, 3/2, 1), (0, 0, 1), (−1, 0, 0))^T,

and the patch itself is

F(u, w) = F(u, 0)(1 − w) + F(u, 1)w = ((3 − u^2)/2, 1/2 + w, u + u^2 − u^3).

The triangular Coons surface C has corner points C00 = (1, 0, 0), C10 = (3/2, 1/2, 0), and C01 = C11 = (1, 1/2, 1). Its bottom boundary curve is

C(u, 0) = (u^3, u^2, u, 1) H ((1, 0, 0), (3/2, 1/2, 0), (1, 0, 0), (0, 1, 0))^T,
b[u_,w_]:={0,1/2,1}(1-u)(1-w)+{1,1/2,1}(1-u)w
  +{0,3/2,1}(1-w)u+{1,3/2,1}u w;
H={{2,-2,1,1},{-3,3,-2,-1},{0,0,1,0},{1,0,0,0}};
lu0={u^3,u^2,u,1}.H.{{0,0,0},{0,1/2,1},{0,0,1},{0,1,0}};
lu1={u^3,u^2,u,1}.H.{{1,0,0},{1,1/2,1},{0,0,1},{0,1,0}};
l[u_,w_]:=lu0(1-w)+lu1 w;
fu0={u^3,u^2,u,1}.H.{{3/2,1/2,0},{1,1/2,1},{0,0,1},{-1,0,0}};
fu1={u^3,u^2,u,1}.H.{{3/2,3/2,0},{1,3/2,1},{0,0,1},{-1,0,0}};
f[u_,w_]:=fu0(1-w)+fu1 w;
cu0={u^3,u^2,u,1}.H.{{1,0,0},{3/2,1/2,0},{1,0,0},{0,1,0}};
cu1={1,1/2,1};
c0w={w^3,w^2,w,1}.H.{{1,0,0},{1,1/2,1},{0,0,1},{0,1,0}};
c1w={w^3,w^2,w,1}.H.{{3/2,1/2,0},{1,1/2,1},{0,0,1},{-1,0,0}};
c[u_,w_]:=(1-u)c0w+u c1w+(1-w)cu0+w cu1
  -(1-u)(1-w){1,0,0}-u(1-w){3/2,1/2,0}-w(1-u)cu1-u w cu1;
g1=ParametricPlot3D[b[u,w],{u,0,1},{w,0,1}]
g2=ParametricPlot3D[l[u,w],{u,0,1},{w,0,1}]
g3=ParametricPlot3D[f[u,w],{u,0,1},{w,0,1}]
g4=ParametricPlot3D[c[u,w],{u,0,1},{w,0,1}]
Show[g1,g2,g3,g4, PlotRange -> All]

Figure 10.15: Bilinear, Lofted, and Coons Surface Patches.
and its top boundary curve C(u, 1) is the multiple point C01 = C11. The two boundary curves in the w direction are

C(0, w) = (w^3, w^2, w, 1) H ((1, 0, 0), (1, 1/2, 1), (0, 0, 1), (0, 1, 0))^T,
C(1, w) = (w^3, w^2, w, 1) H ((3/2, 1/2, 0), (1, 1/2, 1), (0, 0, 1), (−1, 0, 0))^T,

and the surface patch itself equals

C(u, w) = (1 − u)C(0, w) + uC(1, w) + (1 − w)C(u, 0) + wC(u, 1)
          − (1 − u)(1 − w)(1, 0, 0) − u(1 − w)(3/2, 1/2, 0) − w(1 − u)C11 − uwC11
        = ((2 + u^2(−1 + w) − u(−2 + w + w^2))/2, (−u^2(−1 + w) − u(−1 + w)w + w^2)/2, w + w^2 − w^3).
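The closed forms derived above are easy to check by machine. The book's own listings use Mathematica; the following pure-Python sketch (an illustration, not code from the book) evaluates the lofted patch L through its two Hermite boundary curves and compares the result with the closed form L(u, w) = (w, u^2/2, u + u^2 − u^3):

```python
# Cubic Hermite evaluation using the basis matrix H of Equation (11.7).
# Pure Python; points are 3-tuples. Illustrative sketch only.

H = [[2, -2, 1, 1],
     [-3, 3, -2, -1],
     [0, 0, 1, 0],
     [1, 0, 0, 0]]

def hermite(t, geom):
    """Evaluate T(t) H B for geometry rows geom = [P1, P2, T1, T2]."""
    T = [t**3, t**2, t, 1.0]
    # F = T.H, the four Hermite blending functions at t
    F = [sum(T[i] * H[i][j] for i in range(4)) for j in range(4)]
    # point = F.B, a blend of the four geometric quantities
    return tuple(sum(F[k] * geom[k][c] for k in range(4)) for c in range(3))

def lofted_L(u, w):
    """Lofted patch L: linear blend of its two Hermite boundary curves."""
    Lu0 = hermite(u, [(0, 0, 0), (0, 0.5, 1), (0, 0, 1), (0, 1, 0)])
    Lu1 = hermite(u, [(1, 0, 0), (1, 0.5, 1), (0, 0, 1), (0, 1, 0)])
    return tuple((1 - w) * a + w * b for a, b in zip(Lu0, Lu1))

# Compare with the closed form L(u, w) = (w, u^2/2, u + u^2 - u^3).
for u in (0.0, 0.25, 0.5, 0.75, 1.0):
    for w in (0.0, 0.5, 1.0):
        closed = (w, u * u / 2, u + u * u - u**3)
        assert all(abs(a - b) < 1e-9 for a, b in zip(lofted_L(u, w), closed))
print("lofted patch L matches its closed form")
```

The same two functions can be pointed at the boundary data of F or C to verify the remaining closed forms.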
10.8 Gordon Surfaces

The Gordon surface is a generalization of Coons surfaces. A linear Coons surface is fully defined by means of four boundary curves, so its shape cannot be too complex. A Gordon surface (Figure 10.16) is defined by means of two families of curves, one in each of the u and w directions. It can have very complex shapes and is a good candidate for use in applications where realism is important.
Figure 10.16: A Gordon Surface.
We denote the curves by P(ui , w), where i = 0, . . . , m, and P(u, wj ), j = 0, . . . , n. The main idea is to find an expression for a surface Pa (u, w) that interpolates the first family of curves, add it to a similar expression for a surface Pb (u, w) that interpolates the second family of curves, and subtract a surface Pab (u, w) that represents multiple contributions from Pa and Pb . The first surface, Pa (u, w), should interpolate the family of m + 1 curves P(ui , w). When moving on this surface in the u direction (fixed w), we want to intersect all
m + 1 curves. For a given, fixed w, we therefore need to find a curve that will pass through the m + 1 points P(ui, w). A natural (albeit not the only) candidate for such a curve is our old acquaintance the Lagrange polynomial (Section 10.2). We write it as

Pa(u, w) = Σ_{i=0}^{m} P(ui, w) L_i^m(u),

and it is valid for any value of w. Similarly, we can write the second surface as the Lagrange polynomial

Pb(u, w) = Σ_{j=0}^{n} P(u, wj) L_j^n(w).

The surface representing multiple contributions is similar to the bilinear part of Equation (10.26). It is

Pab(u, w) = Σ_{i=0}^{m} Σ_{j=0}^{n} P(ui, wj) L_i^m(u) L_j^n(w),
and the final expression of the Gordon surface is P(u, w) = Pa (u, w) + Pb (u, w) − Pab (u, w). Note that the (m + 1) × (n + 1) points P(ui , wj ) should be located on both curves. For such a surface to make sense, the curves have to intersect.
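To make the construction concrete, here is a small Python sketch (the book's own listings are in Mathematica; the 3 × 3 curve network below is an invented illustration, sampled from a simple function S, not an example from the text). It builds P = Pa + Pb − Pab with Lagrange weights and confirms that the resulting surface interpolates every curve of both families:

```python
# Gordon surface from two families of curves, using Lagrange weights
# L_i^m(u) and L_j^n(w) as in the text. Sketch under invented data.

def lagrange_weight(i, nodes, t):
    """L_i(t) for the given interpolation nodes (Section 10.2)."""
    w = 1.0
    for k, nk in enumerate(nodes):
        if k != i:
            w *= (t - nk) / (nodes[i] - nk)
    return w

# Underlying function used only to generate a consistent curve network.
def S(u, w):
    return (u, w, u * w * (1.0 - u))

un = [0.0, 0.5, 1.0]                                  # m + 1 = 3 nodes in u
wn = [0.0, 0.5, 1.0]                                  # n + 1 = 3 nodes in w
cu = [lambda w, i=i: S(un[i], w) for i in range(3)]   # curves P(u_i, w)
cw = [lambda u, j=j: S(u, wn[j]) for j in range(3)]   # curves P(u, w_j)

def gordon(u, w):
    """P = Pa + Pb - Pab."""
    p = [0.0, 0.0, 0.0]
    for i in range(3):
        Li = lagrange_weight(i, un, u)
        for c in range(3):
            p[c] += cu[i](w)[c] * Li                  # Pa
    for j in range(3):
        Lj = lagrange_weight(j, wn, w)
        for c in range(3):
            p[c] += cw[j](u)[c] * Lj                  # Pb
    for i in range(3):
        Li = lagrange_weight(i, un, u)
        for j in range(3):
            Lj = lagrange_weight(j, wn, w)
            for c in range(3):
                p[c] -= S(un[i], wn[j])[c] * Li * Lj  # Pab
    return tuple(p)

# The surface must interpolate every curve of both families.
for i in range(3):
    for w in (0.1, 0.37, 0.8):
        assert all(abs(a - b) < 1e-9
                   for a, b in zip(gordon(un[i], w), cu[i](w)))
print("Gordon surface interpolates the curve network")
```

Note how the cancellation works: at u = ui, the weight L_i^m(ui) is 1 and all others vanish, so Pb and Pab cancel exactly and the surface reduces to the curve P(ui, w).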
A friend comes to you and asks if a particular polynomial p(x) of degree 25 in F2 [x] is irreducible. The friend explains that she has tried dividing p(x) by every polynomial in F2 [x] of degree from 1 to 18 and has found that p(x) is not divisible by any of them. She is getting tired of doing all these divisions and wonders if there’s an easier way to check whether or not p(x) is irreducible. You surprise your friend with the statement that she need not do any more work: p(x) is indeed irreducible!
—John Palmieri, Introduction to Modern Algebra for Teachers
11 Hermite Interpolation

The curve and surface methods of the preceding chapters are based on points. Using polynomials, it is easy to construct a parametric curve segment (or surface patch) that passes through a given one-dimensional array or two-dimensional grid of points. The downside of these methods is that they are not interactive. If the resulting curve or surface is not the one the designer had in mind, the only way to modify it is to add points. Moving the points is not an option because the curve has to pass through the original data points. Adding points provides some control over the shape of the curve, but slows down the computations. A practical, useful curve/surface design algorithm should be interactive. It should provide user-controlled parameters that modify the shape of the curve in a predictable, intuitive way. The Hermite interpolation approach, the topic of this chapter, is such a method. Hermite interpolation is based on two points P1 and P2 and two tangent vectors Pt1 and Pt2. It computes a curve segment that starts at P1 going in direction Pt1, and ends at P2 moving in direction Pt2. Before delving into the details, the reader may find it useful to peruse Figure 11.1 where several such curves are shown, with their endpoints and extreme tangent vectors. The method is called Hermite interpolation after the French mathematician Charles Hermite (1822–1901) who developed it and derived its blending functions in the 1870s, as part of his work on approximation and interpolation. He was not concerned with the computation of curves and surfaces (and was actually known to hate geometry), and developed his method as a way to interpolate any mathematical quantity from an initial value to a final value given the rates of change of the quantity at the start and at the end (this application of Hermite interpolation is used in computer animation, see Section 19.9.1).
Figure 11.1: Various Hermite Curve Segments.
[Hermite] had a kind of positive hatred of geometry and once curiously reproached me with having made a geometrical memoir. —Jacques Hadamard.

Figure 11.1 makes it obvious that a single Hermite segment can take on many different shapes. It can even have a cusp and can develop a loop. A complete curve, however, normally requires several segments connected with C^0, C^1, or C^2 continuities, as illustrated in Section 8.7.2. Spline methods for constructing such a curve are discussed in Chapter 12.
11.1 Interactive Control

Hermite interpolation has an important advantage: it is interactive. If a Hermite curve segment has a wrong shape, the user can edit it by modifying the tangent vectors.

Exercise 11.1: In the case of a four-point PC, we can change the shape of the curve by moving the points. Why then is the four-point method considered noninteractive?

Figure 11.1 illustrates how the shape of the curve depends on the directions of the tangent vectors. Figure 11.2 shows how the curve can be edited by modifying the magnitudes of those vectors. The figure shows three curves that start in a 45◦ direction and end up going vertically down. The effect illustrated here is simple. As the magnitude of the start tangent increases, the curve continues longer in the original direction. This behavior implies that short tangents produce a curve that changes its direction early and starts moving straight toward the final point. Such a curve is close to a straight segment, so we conclude that a long tangent results in a loose curve and a short tangent produces a tight curve (see also Exercise 11.7). The reason the magnitudes, and not just the directions, of the tangents affect the shape of the curve is that the three-dimensional Hermite segment is a PC and calculating
Figure 11.2: Effects of Varying the Tangent’s Magnitude.
a PC involves four coefficients, each a triplet, for a total of 12 unknown numbers. The two endpoints supply six known quantities and the two tangents should supply the remaining six. However, if we consider only the direction of a vector and not its magnitude, then the vectors (1, 0.5, 0.3), (2, 1, 0.6), and (4, 2, 1.2) are identical. In such a case, only two of the three vector components are independent and two vectors supply only four independent quantities. Exercise 11.2: Discuss this claim in detail. A sketch tells as much in a glance as a dozen pages of print. —Ivan Turgenev, Fathers and Sons (1862).
11.2 The Hermite Curve Segment

The Hermite curve segment is easy to derive. It is a PC curve (a degree-3 polynomial in t) with four coefficients that depend on the two points and two tangents. The basic equation of a PC curve is Equation (10.1), duplicated here:

P(t) = at^3 + bt^2 + ct + d = (t^3, t^2, t, 1)(a, b, c, d)^T = T(t)A.
(10.1)
This is the algebraic representation of the curve, in which the four coefficients are still unknown. Once these coefficients are expressed in terms of the known quantities, which are geometric, the curve will be expressed geometrically. The tangent vector to a curve P(t) is the derivative dP(t)/dt, which we denote by Pt(t). The tangent vector of a PC curve is therefore

Pt(t) = 3at^2 + 2bt + c.
(11.1)
We denote the two given points by P1 and P2 and the two given tangents by Pt1 and Pt2 . The four quantities are now used to calculate the geometric representation of the PC by writing equations that relate the four unknown coefficients a, b, c, and d to the four known ones, P1 , P2 , Pt1 , and Pt2 . The equations are P(0) = P1 , P(1) = P2 ,
Pt(0) = Pt1, and Pt(1) = Pt2 (compare with Equations (10.2)). Their explicit forms are

a·0^3 + b·0^2 + c·0 + d = P1,
a·1^3 + b·1^2 + c·1 + d = P2,
3a·0^2 + 2b·0 + c = Pt1,                               (11.2)
3a·1^2 + 2b·1 + c = Pt2.

They are easy to solve and the solutions are

a = 2P1 − 2P2 + Pt1 + Pt2,
b = −3P1 + 3P2 − 2Pt1 − Pt2,                           (11.3)
c = Pt1,
d = P1.
Substituting these solutions into Equation (10.1) gives

P(t) = (2P1 − 2P2 + Pt1 + Pt2)t^3 + (−3P1 + 3P2 − 2Pt1 − Pt2)t^2 + Pt1 t + P1,   (11.4)

which, after rearranging, becomes

P(t) = (2t^3 − 3t^2 + 1)P1 + (−2t^3 + 3t^2)P2 + (t^3 − 2t^2 + t)Pt1 + (t^3 − t^2)Pt2
     = F1(t)P1 + F2(t)P2 + F3(t)Pt1 + F4(t)Pt2
     = (F1(t), F2(t), F3(t), F4(t))(P1, P2, Pt1, Pt2)^T = F(t)B,   (11.5)

where

F1(t) = 2t^3 − 3t^2 + 1,  F2(t) = −2t^3 + 3t^2 = 1 − F1(t),
F3(t) = t^3 − 2t^2 + t,   F4(t) = t^3 − t^2,           (11.6)
B is the column (P1, P2, Pt1, Pt2)^T, and F(t) is the row (F1(t), F2(t), F3(t), F4(t)). Equations (11.4) and (11.5) are the geometric representation of the Hermite PC segment. Functions Fi(t) are the Hermite blending functions. They create any point on the curve as a blend of the four given quantities. They are shown in Figure 11.3. Note that F1(t) + F2(t) ≡ 1. These two functions blend points, not tangent vectors, and should therefore be barycentric. We can also write F1(t) = (t^3, t^2, t, 1)(2, −3, 0, 1)^T and similarly for F2(t), F3(t), and F4(t). In matrix notation this becomes

F(t) = (t^3, t^2, t, 1) ⎡ 2 −2  1  1⎤
                        ⎢−3  3 −2 −1⎥ = T(t) H.
                        ⎢ 0  0  1  0⎥
                        ⎣ 1  0  0  0⎦

The curve can now be written

P(t) = F(t)B = T(t) H B = (t^3, t^2, t, 1) ⎡ 2 −2  1  1⎤ ⎡P1 ⎤
                                           ⎢−3  3 −2 −1⎥ ⎢P2 ⎥ .   (11.7)
                                           ⎢ 0  0  1  0⎥ ⎢Pt1⎥
                                           ⎣ 1  0  0  0⎦ ⎣Pt2⎦
Equation (10.1) tells us that P(t) = T(t) A, which implies A = H B. Matrix H is called the Hermite basis matrix. The following is Mathematica code to display a single Hermite curve segment.

Clear[T,H,B]; (* Hermite Interpolation *)
T={t^3,t^2,t,1};
H={{2,-2,1,1},{-3,3,-2,-1},{0,0,1,0},{1,0,0,0}};
B={{0,0},{2,1},{1,1},{1,0}};
ParametricPlot[T.H.B,{t,0,1},PlotRange->All]

Exercise 11.3: Express the midpoint P(0.5) of a Hermite segment in terms of the two endpoints and two tangent vectors. Draw a diagram to illustrate the geometric interpretation of the result.
11.2.1 Hermite Blending Functions

The four Hermite blending functions of Equation (11.6) are illustrated graphically in Figure 11.3. An analysis of these functions is essential for a thorough understanding of the Hermite interpolation method.
Figure 11.3: Hermite Weight Functions.
Function F1(t) is the weight assigned to the start point P1. It goes down from its maximum F1(0) = 1 to F1(1) = 0. This shows why for small values of t the curve is close to P1 and why P1 has little or no influence on the curve for large values of t. The opposite is true for F2(t), the weight of the endpoint P2. Function F3(t) is a bit trickier. It starts at zero, has a maximum at t = 1/3, then drops slowly back to zero. This behavior is interpreted as follows:
1. For small values of t, function F3(t) has almost no effect. The curve stays close to P1 regardless of the extreme tangents or anything else.
2. For t values around 1/3, weight F3(t) exerts some influence on the curve. For these t values, weight F4(t) is small, and the curve is (approximately) the sum of (1) point F1(t)P1 (large contribution), (2) point F2(t)P2 (small contribution), and (3) vector F3(t)Pt1. The sum of a point P = (x, y) and a vector v = (vx, vy) is a point located at
(x + vx, y + vy), which is how weight F3(t) “pulls” the curve in the direction of tangent vector Pt1.
3. For large t values, function F3(t) again has almost no effect. The curve moves closer to P2 because weight F2(t) becomes dominant.
Function F4(t) is interpreted in a similar way. It has almost no effect for small and for large values of t. Its maximum (actually, minimum, because it is negative) occurs at t = 2/3, so it affects the curve only in this region. For t values close to 2/3, the curve is the sum of point F2(t)P2 (large contribution), point F1(t)P1 (small contribution), and vector −|F4(t)|Pt2. Because F4(t) is negative, this sum is equivalent to (x − vx, y − vy), which is why the curve approaches endpoint P2 while moving in direction Pt2.
Another important feature of the Hermite weight functions is that F1(t) and F2(t) are barycentric. They have to be, since they blend two points, and a detailed look at the four Equations (11.2) explains why they are. The first of these equations is simply d = P1, which reduces the second one to a + b + c + d = P2 or a + b + c = P2 − P1. The third equation yields c, and the fourth equation, combined with the second equation, is finally used to compute a and b. All this implies that a and b have the form a = α(P2 − P1) + · · ·, b = β(P2 − P1) + · · ·. The final PC therefore has the form

P(t) = at^3 + bt^2 + ct + d = (αP2 − αP1 + · · ·)t^3 + (βP2 − βP1 + · · ·)t^2 + (· · ·)t + P1,

where the ellipses represent parts that depend only on the tangent vectors, not on the endpoints. When this is rearranged, the result is

P(t) = (−αt^3 − βt^2 + 1)P1 + (αt^3 + βt^2)P2 + (· · ·)Pt1 + (· · ·)Pt2,

which is why the coefficients of P1 and P2 add up to unity.
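The properties just described are easy to confirm numerically. This short Python check (an illustrative sketch, not code from the book) samples the four blending functions of Equation (11.6) and verifies that F1 and F2 are barycentric, that F3 and F4 vanish at both ends, and that their extremes fall at t = 1/3 and t = 2/3:

```python
# The four Hermite blending functions of Equation (11.6).
def F1(t): return 2*t**3 - 3*t**2 + 1
def F2(t): return -2*t**3 + 3*t**2
def F3(t): return t**3 - 2*t**2 + t
def F4(t): return t**3 - t**2

ts = [k / 1000 for k in range(1001)]
assert all(abs(F1(t) + F2(t) - 1.0) < 1e-12 for t in ts)  # barycentric pair
assert (F1(0), F1(1), F2(0), F2(1)) == (1, 0, 0, 1)       # endpoint weights
assert F3(0) == F3(1) == 0 and F4(0) == F4(1) == 0        # no tangent pull at ends
assert abs(max(ts, key=F3) - 1/3) < 1e-3                  # F3 peaks near t = 1/3
assert abs(min(ts, key=F4) - 2/3) < 1e-3                  # F4 extreme near t = 2/3
print("Hermite blending functions behave as described")
```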
11.2.2 Hermite Derivatives

The concept of blending can be applied to the calculation of the derivatives of a curve, not just to the curve itself. One way to calculate Pt(t) is to differentiate T(t) = (t^3, t^2, t, 1). The result is Pt(t) = Tt(t)HB = (3t^2, 2t, 1, 0)HB. A more general method is to use the relation P(t) = F(t)B, which implies Pt(t) = Ft(t)B = (F1t(t), F2t(t), F3t(t), F4t(t))B. The individual derivatives Fit(t) can be obtained from Equation (11.6). The results can be expressed as

Pt(t) = (t^3, t^2, t, 1) ⎡ 0  0  0  0⎤ ⎡P1 ⎤
                         ⎢ 6 −6  3  3⎥ ⎢P2 ⎥ = T(t)Ht B.   (11.8)
                         ⎢−6  6 −4 −2⎥ ⎢Pt1⎥
                         ⎣ 0  0  1  0⎦ ⎣Pt2⎦
Similarly, the second derivatives of the Hermite segment can be expressed as

Ptt(t) = (t^3, t^2, t, 1) ⎡  0   0  0  0⎤ ⎡P1 ⎤
                          ⎢  0   0  0  0⎥ ⎢P2 ⎥ = T(t)Htt B.   (11.9)
                          ⎢ 12 −12  6  6⎥ ⎢Pt1⎥
                          ⎣ −6   6 −4 −2⎦ ⎣Pt2⎦
These expressions make it easy to calculate the first and second derivatives at any point on a Hermite segment. Similar expressions can be derived for any other curves that are based on the blending of geometrical quantities.

Exercise 11.4: What is Httt?

Example: The two two-dimensional points P1 = (0, 0) and P2 = (1, 0) and the two tangents Pt1 = (1, 1) and Pt2 = (0, −1) are given. The segment should therefore start at the origin, going in a 45◦ direction, and end at point (1, 0), going straight down. The calculation of P(t) is straightforward:

P(t) = T(t) A = T(t) H B

     = (t^3, t^2, t, 1) ⎡ 2 −2  1  1⎤ ⎡(0, 0) ⎤
                        ⎢−3  3 −2 −1⎥ ⎢(1, 0) ⎥
                        ⎢ 0  0  1  0⎥ ⎢(1, 1) ⎥
                        ⎣ 1  0  0  0⎦ ⎣(0, −1)⎦

     = (t^3, t^2, t, 1) ⎡ 2(0, 0) − 2(1, 0) + 1(1, 1) + 1(0, −1)⎤
                        ⎢−3(0, 0) + 3(1, 0) − 2(1, 1) − 1(0, −1)⎥
                        ⎢ 0(0, 0) + 0(1, 0) + 1(1, 1) + 0(0, −1)⎥
                        ⎣ 1(0, 0) + 0(1, 0) + 0(1, 1) + 0(0, −1)⎦

     = (t^3, t^2, t, 1) ⎡(−1, 0)⎤
                        ⎢(1, −1)⎥
                        ⎢(1, 1) ⎥
                        ⎣(0, 0) ⎦

     = (−1, 0)t^3 + (1, −1)t^2 + (1, 1)t.   (11.10)
Exercise 11.5: Use Equation (11.10) to show that the segment really passes through points (0, 0) and (1, 0). Calculate the tangent vectors and use them to show that the segment really starts and ends in the right directions.

Exercise 11.6: Repeat the example above with Pt1 = (2, 2). The new curve segment should go through the same points, in the same directions. However, it should continue longer in the original 45◦ direction, since the size of the new tangent is √(2^2 + 2^2) = 2√2, twice as long as the previous one, which is √(1^2 + 1^2) = √2.

Exercise 11.7: Calculate the Hermite curve for two given points P1 and P2 assuming that the tangent vectors at the two points are zero (indeterminate). What kind of a curve is this?
Exercise 11.8: Use the Hermite method to calculate PC segments for the cases where the known quantities are as follows:
1. The three tangent vectors at the start, middle, and end of the segment.
2. The two interior points P(1/3) and P(2/3), and the two extreme tangent vectors Pt(0) and Pt(1).
3. The two extreme points P(0) and P(1), and the two interior tangent vectors Pt(1/3) and Pt(2/3) (this is similar to case 2, so it's easy).

Example: Given the two three-dimensional points P1 = (0, 0, 0) and P2 = (1, 1, 1) and the two tangent vectors Pt1 = (1, 0, 0) and Pt2 = (0, 1, 0), the curve segment is the simple cubic polynomial shown in Figure 11.4:

P(t) = (t^3, t^2, t, 1) ⎡ 2 −2  1  1⎤ ⎡(0, 0, 0)⎤
                        ⎢−3  3 −2 −1⎥ ⎢(1, 1, 1)⎥
                        ⎢ 0  0  1  0⎥ ⎢(1, 0, 0)⎥
                        ⎣ 1  0  0  0⎦ ⎣(0, 1, 0)⎦

     = (−t^3 + t^2 + t, −t^3 + 2t^2, −2t^3 + 3t^2).   (11.11)
(* Hermite 3D example *)
Clear[T,H,B];
T={t^3,t^2,t,1};
H={{2,-2,1,1},{-3,3,-2,-1},{0,0,1,0},{1,0,0,0}};
B={{0,0,0},{1,1,1},{1,0,0},{0,1,0}};
ParametricPlot3D[T.H.B,{t,0,1},
  ViewPoint->{-0.846,-1.464,3.997}];
(* ViewPoint->{3.119,-0.019,0.054} alt view *)

Figure 11.4: A Hermite Curve Segment in Space.
I’m retired—goodbye tension, hello pension! —Anonymous.
11.2.3 Hermite Segments With Tension

This section shows how to create a Hermite curve segment under tension by employing a nonuniform Hermite segment. Such a segment is obtained when the parameter t varies in the interval [0, Δ], where Δ can be any real positive number. The derivation of this case is similar to the uniform case. Equation (11.2) becomes

a·0^3 + b·0^2 + c·0 + d = P1,
aΔ^3 + bΔ^2 + cΔ + d = P2,
3a·0^2 + 2b·0 + c = Pt1,
3aΔ^2 + 2bΔ + c = Pt2,

with solutions

a = 2(P1 − P2)/Δ^3 + (Pt1 + Pt2)/Δ^2,
b = 3(P2 − P1)/Δ^2 − 2Pt1/Δ − Pt2/Δ,
c = Pt1,
d = P1.

The curve segment can now be expressed, similar to Equation (11.7), in the form

Pnu(t) = (t^3, t^2, t, 1) ⎡ 2/Δ^3  −2/Δ^3   1/Δ^2   1/Δ^2⎤ ⎡P1 ⎤
                          ⎢−3/Δ^2   3/Δ^2  −2/Δ    −1/Δ ⎥ ⎢P2 ⎥ = T(t)Hnu B.   (11.12)
                          ⎢   0       0       1       0 ⎥ ⎢Pt1⎥
                          ⎣   1       0       0       0 ⎦ ⎣Pt2⎦
It is easy to verify that matrix Hnu reduces to H for Δ = 1. Figure 11.5 shows a typical nonuniform Hermite segment drawn three times, for Δ = 0.5, 1, and 2. Careful examination of the three curves shows that increasing the value of Δ causes the curve segment to continue longer in its initial and final directions; it has the same effect as increasing the magnitudes of the tangent vectors of the uniform Hermite segment. Once this is grasped, the reader should not be surprised to learn that the nonuniform curve of Equation (11.12) can also be expressed as

Pnu(t) = (t^3, t^2, t, 1) ⎡ 2 −2  1  1⎤ ⎡ P1 ⎤
                          ⎢−3  3 −2 −1⎥ ⎢ P2 ⎥ .   (11.13)
                          ⎢ 0  0  1  0⎥ ⎢ΔPt1⎥
                          ⎣ 1  0  0  0⎦ ⎣ΔPt2⎦
This shows that the nonuniform Hermite curve segment is a special case of the uniform curve. Any nonuniform Hermite curve can also be obtained as a uniform Hermite curve by adjusting the magnitudes of the tangent vectors. However, varying the magnitudes of both tangent vectors has an important geometric interpretation: it changes the tension of the curve segment. Imagine that the two endpoints are nails driven into the page and the curve segment is a rubber string. When the string is pulled at both
Clear[T,H,B]; (* Nonuniform Hermite segments *)
T={t^3,t^2,t,1};
H={{2,-2,1,1},{-3,3,-2,-1},{0,0,1,0},{1,0,0,0}};
B[delta_]:={{0,0},{2,0},delta{2,1},delta{2,-1}};
g1=ParametricPlot[T.H.B[0.5],{t,0,1}];
g2=ParametricPlot[T.H.B[1],{t,0,1}];
g3=ParametricPlot[T.H.B[1.5],{t,0,1}];
Show[g1,g2,g3, PlotRange->All]

Figure 11.5: Three Nonuniform Hermite Segments.
sides, its shape approaches a straight line. Figure 11.5 shows how decreasing Δ results in a curve with higher tension, so instead of working with nonuniform Hermite segments, we can consider Δ a tension parameter. Practical curve methods that create a spline curve out of individual Hermite segments can add a tension parameter to the spline, thereby making the method more interactive. An example is the cardinal splines method (Section 12.5).
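The equivalence claimed by Equations (11.12) and (11.13) can be verified numerically: tracing the nonuniform segment over [0, Δ] visits exactly the points of the uniform segment whose tangents are scaled by Δ. The following Python sketch (an illustration, not the book's code; the sample geometry is invented) performs that check:

```python
# Nonuniform Hermite (Equation 11.12) vs. uniform Hermite with
# Delta-scaled tangents (Equation 11.13). Sketch with invented data.

def nonuniform(t, d, p1, p2, t1, t2):
    """Coefficients a, b, c, d of Equation (11.12); t in [0, d]."""
    a = [2*(x1 - x2)/d**3 + (v1 + v2)/d**2
         for x1, x2, v1, v2 in zip(p1, p2, t1, t2)]
    b = [3*(x2 - x1)/d**2 - 2*v1/d - v2/d
         for x1, x2, v1, v2 in zip(p1, p2, t1, t2)]
    return tuple(ai*t**3 + bi*t**2 + v1*t + x1
                 for ai, bi, v1, x1 in zip(a, b, t1, p1))

def uniform(s, p1, p2, t1, t2):
    """Standard Hermite blend, Equation (11.5); s in [0, 1]."""
    f1 = 2*s**3 - 3*s**2 + 1
    f2 = -2*s**3 + 3*s**2
    f3 = s**3 - 2*s**2 + s
    f4 = s**3 - s**2
    return tuple(f1*x1 + f2*x2 + f3*v1 + f4*v2
                 for x1, x2, v1, v2 in zip(p1, p2, t1, t2))

p1, p2, t1, t2 = (0, 0), (2, 0), (2, 1), (2, -1)
for delta in (0.5, 1.0, 2.0):
    dt1 = tuple(delta*v for v in t1)
    dt2 = tuple(delta*v for v in t2)
    for k in range(11):
        s = k / 10
        a = nonuniform(delta*s, delta, p1, p2, t1, t2)
        b = uniform(s, p1, p2, dt1, dt2)
        assert all(abs(x - y) < 1e-9 for x, y in zip(a, b))
print("nonuniform segment = uniform segment with Delta-scaled tangents")
```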
11.2.4 PC Conic Approximations

Hermite interpolation can be applied to compute (approximate) conic sections (see Appendix C for more on conics). Given three points P0, P1, and P2 and a scalar α, we construct the 4-tuple

(P0, P2, 4α(P1 − P0), 4α(P2 − P1)),  where 0 ≤ α ≤ 1,   (11.14)
to become our two points and two extreme tangent vectors and compute a segment that approximates a conic section. We obtain an ellipse when 0 ≤ α < 0.5, a parabola when α = 0.5, and a hyperbola when 0.5 < α ≤ 1 (see below for a circle). The tangent vectors at the two ends are Pt (0) = 4α(P1 − P0 ) and Pt (1) = 4α(P2 − P1 ) (note their directions). The tangent vector halfway is Pt (0.5) = (1.5 − α)(P2 − P0 ). It is parallel to the vector P2 − P0 . The case of the parabola is especially useful and is explicitly shown here. Substituting α = 0.5 in Equation (11.14) and applying Equation (11.7) yields the Hermite
segment

P(t) = (t^3, t^2, t, 1) ⎡ 2 −2  1  1⎤ ⎡    P0    ⎤
                        ⎢−3  3 −2 −1⎥ ⎢    P2    ⎥
                        ⎢ 0  0  1  0⎥ ⎢2(P1 − P0)⎥
                        ⎣ 1  0  0  0⎦ ⎣2(P2 − P1)⎦

     = (1 − t)^2 P0 + 2t(1 − t)P1 + t^2 P2.

This is the parabola produced in Exercises 11.9 and 13.2.

Exercise 11.9: We know that any three points P0, P1, and P2 define a unique parabola (i.e., a triangle defines a parabola). Use Hermite interpolation to calculate the parabola from P0 to P2 whose start and end tangents go in the directions from P0 to P1 and from P1 to P2, respectively.

Hermite interpolation provides a simple way to construct approximate circles and circular arcs. Figure 11.6a shows how this method is employed to construct a circular arc of unit radius about the origin. We assume that an arc spanning an angle 2θ is needed and we place its two endpoints P1 and P2 at locations (cos θ, − sin θ) and (cos θ, sin θ), respectively. This arc is symmetric about the x axis, but we later show how to rotate it to obtain an arbitrary arc. Since a circle is always perpendicular to its radius, we select as our start and end tangents two vectors that are perpendicular to P1 and P2. They are Pt1 = a(sin θ, cos θ) and Pt2 = a(− sin θ, cos θ), where a is a parameter to be determined. The Hermite curve segment defined by these points and vectors is, as usual,

P(t) = (t^3, t^2, t, 1) ⎡ 2 −2  1  1⎤ ⎡(cos θ, − sin θ) ⎤
                        ⎢−3  3 −2 −1⎥ ⎢(cos θ, sin θ)   ⎥
                        ⎢ 0  0  1  0⎥ ⎢a(sin θ, cos θ)  ⎥
                        ⎣ 1  0  0  0⎦ ⎣a(− sin θ, cos θ)⎦
(11.15)
= (2t^3 − 3t^2 + 1)(cos θ, − sin θ) + (−2t^3 + 3t^2)(cos θ, sin θ) + (t^3 − 2t^2 + t)a(sin θ, cos θ) + (t^3 − t^2)a(− sin θ, cos θ).

We need an equation in order to determine a and we obtain it by requiring that the curve segment passes through the circular arc at its center, i.e., P(0.5) = (1, 0). This produces the equation

(1, 0) = P(0.5) = (2/8 − 3/4 + 1)(cos θ, − sin θ) + (−2/8 + 3/4)(cos θ, sin θ)
                + (1/8 − 2/4 + 1/2)a(sin θ, cos θ) + (1/8 − 1/4)a(− sin θ, cos θ)
                = (1/8)(8 cos θ + 2a sin θ, 0),

whose solution is

a = 4(1 − cos θ)/sin θ.
The curve can now be written in the form

P(t) = (t^3, t^2, t, 1) ⎡ 2 −2  1  1⎤ ⎡(cos θ, − sin θ)                   ⎤
                        ⎢−3  3 −2 −1⎥ ⎢(cos θ, sin θ)                     ⎥
                        ⎢ 0  0  1  0⎥ ⎢(4(1 − cos θ), 4(1 − cos θ)/tan θ) ⎥ .
                        ⎣ 1  0  0  0⎦ ⎣(−4(1 − cos θ), 4(1 − cos θ)/tan θ)⎦
This curve provides an excellent approximation to a circular arc, even for angles θ as large as 90◦.
Figure 11.6: Hermite Segment and a Circular Arc.
Exercise 11.10: Write Equation (11.15) for θ = 90◦; calculate P(0.25) and the deviation of the curve from a true circle at this point.

In general, an arc with a unit radius is not symmetric about the x axis but may look as in Figure 11.6b, where P1 and P2 are any points at a distance of one unit from the origin. All that's necessary to calculate the arc from Equation (11.15) is the value of θ (where 2θ is the angle between P1 and P2), and this can be calculated numerically from the two points using the relations

θ = (θ1 − θ2)/2,
cos θ1 = P1 • (1, 0),  cos θ2 = P2 • (1, 0),
cos(2θ) = cos(θ1 − θ2) = cos θ1 cos θ2 + sin θ1 sin θ2,
cos θ = ±√[(1 + cos(2θ))/2],  sin θ = √(1 − cos^2 θ).
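The quality of this approximation is easy to measure by machine. The Python sketch below (an illustration, not the book's code) evaluates the arc segment with a = 4(1 − cos θ)/sin θ for the extreme case θ = 90◦ (a half circle) and finds the largest deviation of the curve's radius from 1; it comes out below two percent:

```python
import math

def arc_approx(t, theta):
    """Hermite approximation of a unit arc spanning 2*theta, symmetric
    about the x axis, with a = 4(1 - cos theta)/sin theta as derived."""
    a = 4.0 * (1.0 - math.cos(theta)) / math.sin(theta)
    p1 = (math.cos(theta), -math.sin(theta))
    p2 = (math.cos(theta),  math.sin(theta))
    v1 = ( a * math.sin(theta), a * math.cos(theta))
    v2 = (-a * math.sin(theta), a * math.cos(theta))
    f1 = 2*t**3 - 3*t**2 + 1
    f2 = -2*t**3 + 3*t**2
    f3 = t**3 - 2*t**2 + t
    f4 = t**3 - t**2
    return tuple(f1*x1 + f2*x2 + f3*w1 + f4*w2
                 for x1, x2, w1, w2 in zip(p1, p2, v1, v2))

theta = math.pi / 2   # 2*theta = 180 degrees, the extreme case in the text
worst = max(abs(math.hypot(*arc_approx(k / 1000, theta)) - 1.0)
            for k in range(1001))
print(f"max radius error for theta = 90 degrees: {worst:.4f}")
assert worst < 0.02                                       # under 2 percent
assert abs(math.hypot(*arc_approx(0.5, theta)) - 1.0) < 1e-9  # P(0.5) = (1, 0)
```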
11.3 Degree-5 Hermite Interpolation

It is possible to extend the basic idea of Hermite interpolation to polynomials of higher degree. Naturally, more data is needed in order to compute such a polynomial, and this data is provided by the user, normally in the form of higher-order derivatives of the curve. If the user specifies the two endpoints, the two extreme tangent vectors, and the two extreme second derivatives, the software can use these six data items to calculate the six coefficients of a degree-5 polynomial that interpolates the two points. In general, if the two endpoints and the first k pairs of derivatives at the extreme points are known (a total of 2k + 2 items), they can be used to calculate an interpolating polynomial of degree 2k + 1. These higher-degree polynomials are not as useful as the cubic, but the degree-5 polynomial is shown here as a demonstration of the power of Hermite interpolation (see also Section 12.4).

Given two endpoints P1 and P2, the values of two tangent vectors Pt1 and Pt2, and of two second derivatives Ptt1 and Ptt2, we can calculate the polynomial

P(t) = at^5 + bt^4 + ct^3 + dt^2 + et + f   (11.16)

by writing the six equations

P(0) = at^5 + bt^4 + ct^3 + dt^2 + et + f |_0 = f = P1,
P(1) = at^5 + bt^4 + ct^3 + dt^2 + et + f |_1 = a + b + c + d + e + f = P2,
Pt(0) = 5at^4 + 4bt^3 + 3ct^2 + 2dt + e |_0 = e = Pt1,
Pt(1) = 5at^4 + 4bt^3 + 3ct^2 + 2dt + e |_1 = 5a + 4b + 3c + 2d + e = Pt2,
Ptt(0) = 20at^3 + 12bt^2 + 6ct + 2d |_0 = 2d = Ptt1,
Ptt(1) = 20at^3 + 12bt^2 + 6ct + 2d |_1 = 20a + 12b + 6c + 2d = Ptt2.

Solving for the six unknown coefficients yields the degree-5 Hermite interpolating polynomial

P(t) = F1(t)P1 + F2(t)P2 + F3(t)Pt1 + F4(t)Pt2 + F5(t)Ptt1 + F6(t)Ptt2
     = (−6t^5 + 15t^4 − 10t^3 + 1)P1 + (6t^5 − 15t^4 + 10t^3)P2
     + (−3t^5 + 8t^4 − 6t^3 + t)Pt1 + (−3t^5 + 7t^4 − 4t^3)Pt2
     + (−(1/2)t^5 + (3/2)t^4 − (3/2)t^3 + (1/2)t^2)Ptt1 + ((1/2)t^5 − t^4 + (1/2)t^3)Ptt2

     = (t^5, t^4, t^3, t^2, t, 1) ⎡ −6    6  −3  −3  −1/2  1/2⎤ ⎡P1  ⎤
                                  ⎢ 15  −15   8   7   3/2   −1⎥ ⎢P2  ⎥
                                  ⎢−10   10  −6  −4  −3/2  1/2⎥ ⎢Pt1 ⎥   (11.17)
                                  ⎢  0    0   0   0   1/2    0⎥ ⎢Pt2 ⎥
                                  ⎢  0    0   1   0     0    0⎥ ⎢Ptt1⎥
                                  ⎣  1    0   0   0     0    0⎦ ⎣Ptt2⎦
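Each of the six blending functions of Equation (11.17) must equal 1 on exactly one of the six end conditions and 0 on the other five; otherwise the polynomial could not reproduce the prescribed endpoints, tangents, and second derivatives. The following Python sketch (an illustrative check, not code from the book) differentiates the six quintics exactly and confirms all thirty-six conditions:

```python
# Coefficient rows (t^5 ... t^0) of F1..F6, read off Equation (11.17).
FS = [
    [-6, 15, -10, 0, 0, 1],        # F1
    [6, -15, 10, 0, 0, 0],         # F2
    [-3, 8, -6, 0, 1, 0],          # F3
    [-3, 7, -4, 0, 0, 0],          # F4
    [-0.5, 1.5, -1.5, 0.5, 0, 0],  # F5
    [0.5, -1, 0.5, 0, 0, 0],       # F6
]

def poly_eval(c, t):
    """Evaluate a degree-5 polynomial with coefficients [t^5, ..., t^0]."""
    return sum(ci * t**(5 - i) for i, ci in enumerate(c))

def derive(c):
    """Coefficients of the derivative, padded back to six entries."""
    return [0.0] + [ci * (5 - i) for i, ci in enumerate(c[:-1])]

# Condition order: F(0), F(1), F'(0), F'(1), F''(0), F''(1).
# Blending function k must give 1 on condition k and 0 on the rest.
for k, c in enumerate(FS):
    c1 = derive(c)
    c2 = derive(c1)
    vals = [poly_eval(c, 0), poly_eval(c, 1),
            poly_eval(c1, 0), poly_eval(c1, 1),
            poly_eval(c2, 0), poly_eval(c2, 1)]
    expect = [1.0 if j == k else 0.0 for j in range(6)]
    assert all(abs(v - e) < 1e-12 for v, e in zip(vals, expect))
print("degree-5 Hermite blending functions pass all six end conditions")
```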
11.4 Controlling the Hermite Segment

The Hermite method is interactive. In general, the points cannot be moved, but the tangent vectors can be varied. Even if their directions cannot be changed, their magnitudes normally are not fixed by the user and can be modified to edit the shape of the curve segment. The simple experiment of this section illustrates the amount of editing and controlling that can be achieved just by varying the magnitudes of the tangents.

We start with the Hermite segment defined by the two endpoints P1 = (0, 0) and P2 = (2, 1) and by the two tangent vectors Pt(0) = (1, 1) and Pt(1) = (1, 0). The curve starts in the 45◦ direction and ends in a horizontal direction. The curve is easy to calculate. Its expression is

P(t) = (t^3, t^2, t, 1) ⎡ 2 −2  1  1⎤ ⎡(0, 0)⎤
                        ⎢−3  3 −2 −1⎥ ⎢(2, 1)⎥ = −(2, 1)t^3 + (3, 1)t^2 + (1, 1)t.   (11.18)
                        ⎢ 0  0  1  0⎥ ⎢(1, 1)⎥
                        ⎣ 1  0  0  0⎦ ⎣(1, 0)⎦

Suppose that the user wants to raise the curve a bit, but also keep the same start and end directions and endpoints. The only way to edit the curve is to change the magnitudes of the tangents. To keep the same directions, the new tangent vectors should have the form (a, a) and (b, 0), where a and b are two new parameters that have to be computed. To raise the curve, we go through the following steps:
1. Calculate the midpoint of the curve. This is P(0.5) = (1, 5/8).
2. Decide by how much to raise it. Let's say we decide to raise the midpoint to (1, 1).
3. Construct a new curve Q(t), based on the tangents (a, a) and (b, 0).
4. Require that the new curve pass through (1, 1) as its midpoint and determine a and b from this requirement.

The general form of the new curve is
Q(t) = (t^3, t^2, t, 1) ⎡ 2 −2  1  1⎤ ⎡(0, 0)⎤
                        ⎢−3  3 −2 −1⎥ ⎢(2, 1)⎥
                        ⎢ 0  0  1  0⎥ ⎢(a, a)⎥
                        ⎣ 1  0  0  0⎦ ⎣(b, 0)⎦

     = (a + b − 4, a − 2)t^3 + (−2a − b + 6, 3 − 2a)t^2 + (a, a)t.   (11.19)
The requirement Q(0.5) = (1, 1) can now be written (a + b − 4, a − 2)/8 + (−2a − b + 6, 3 − 2a)/4 + (a, a)/2 = (1, 1), which yields the two equations a + b − 4 + 2(−2a − b + 6) + 4a = 8 and a − 2 + 2(3 − 2a) + 4a = 8. The solutions are a = b = 4, so the new curve has the form

Q(t) = (4, 2)t^3 − (6, 5)t^2 + (4, 4)t.
(11.20)
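This computation is easy to confirm in code; a minimal sketch (the helper `hermite` is our name, built directly from the standard Hermite form used above):

```python
# Hermite segment: P(t) = (2P1 - 2P2 + T1 + T2) t^3
#                       + (-3P1 + 3P2 - 2T1 - T2) t^2 + T1 t + P1
def hermite(P1, P2, T1, T2):
    a = tuple(2*p1 - 2*p2 + t1 + t2 for p1, p2, t1, t2 in zip(P1, P2, T1, T2))
    b = tuple(-3*p1 + 3*p2 - 2*t1 - t2 for p1, p2, t1, t2 in zip(P1, P2, T1, T2))
    return lambda t: tuple(ai*t**3 + bi*t**2 + ci*t + di
                           for ai, bi, ci, di in zip(a, b, T1, P1))

# The original segment of Equation (11.18)
P = hermite((0, 0), (2, 1), (1, 1), (1, 0))
print(P(0.5))     # (1.0, 0.625): the midpoint (1, 5/8)

# Raising the midpoint to (1, 1) gave a = b = 4 (Equation (11.20))
Q = hermite((0, 0), (2, 1), (4, 4), (4, 0))
print(Q(0.5))     # (1.0, 1.0)
```

The endpoints and start/end directions of Q(t) are unchanged; only the tangent magnitudes differ.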
11 Hermite Interpolation
A simple check verifies that this curve really starts at (0, 0), ends at (2, 1), has the extreme tangents (4, 4) and (4, 0), and passes midway through (1, 1). Raising the midpoint from (1, 5/8) to (1, 1) has completely changed the curve (Equations (11.18) and (11.20) are different). The new curve starts going in the same 45° direction, then starts going up, reaches point (1, 1), starts going down, and still has "time" to arrive at point (2, 1) moving horizontally.

An interesting question is: How much can we raise the midpoint? If we raise it from (1, 5/8) to, say, (1, 100), would the curve be able to change directions, climb up, pass through the new midpoint, dive down, and still approach (2, 1) moving horizontally? To check this, let's assume that we raise the midpoint from (1, 5/8) to (1, 5/8 + α), where α is a real number. The curve is constrained by Q(0.5) = (1, 5/8 + α), which yields the equation (a + b − 4, a − 2)/8 + (−2a − b + 6, 3 − 2a)/4 + (a, a)/2 = (1, 5/8 + α). The solutions are a = b = 1 + 8α. This means that α can vary without limit. When α is positive, the curve is pulled up. Negative values of α push the curve down.

The value α = −1/8 is special. It implies a = b = 0 and results in the curve Q(t) = (6t² − 4t³, 3t² − 2t³). The parameter substitution u = 3t² − 2t³ yields Q(u) = (2u, u). This curve is the straight line from (0, 0) to (2, 1). Its midpoint is (1, 1/2).

Exercise 11.11: Values α < −1/8 result in negative a and b. Can they still be used in Equation (11.19)?

Exercise 11.12: How can we coerce the curve of Equation (11.19) to have point (1, 0) as its midpoint?

Note. Raising the curve is done by increasing the magnitudes of the tangent vectors. This forces the curve to continue longer in the initial and final directions, which is also why too much raising causes undesirable effects. Figure 11.7 shows the original curve (α = 0) and the effects of increasing α. For α = 0.4, the curve is raised and still has a reasonable shape.
However, for larger values of α, the curve gets tight, develops a cusp (a kink), and then starts looping on itself. It is easy to see that when α = 5/8, the tangent vector becomes indefinite at the midpoint (t = 0.5). To show this, we differentiate the curve of Equation (11.19) to obtain the tangent Qt(t) = 3(a + b − 4, a − 2)t² + 2(−2a − b + 6, 3 − 2a)t + (a, a). From a = b = 1 + 8α, we get Qt(t) = (48α − 6, 24α − 3)t² + (6 − 48α, 2 − 32α)t + (1 + 8α, 1 + 8α). For α = 5/8, this reduces to Qt(t) = (24, 12)t² − (24, 18)t + (6, 6), so Qt(0.5) = (0, 0).

Exercise 11.13: Given the two endpoints P1 = (0, 0) and P2 = (1, 0) and the two tangent vectors Pt1 = α(cos θ, sin θ) and Pt2 = α(cos θ, − sin θ) (Figure 11.8), calculate the value of α for which the Hermite segment from P1 to P2 has a cusp.
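The vanishing tangent at α = 5/8 can be confirmed numerically (a small sketch; the function name is ours):

```python
# Tangent of Q(t) for a = b = 1 + 8*alpha, as derived above:
# Qt(t) = (48a-6, 24a-3) t^2 + (6-48a, 2-32a) t + (1+8a, 1+8a)
def tangent(alpha, t):
    x = (48*alpha - 6)*t**2 + (6 - 48*alpha)*t + (1 + 8*alpha)
    y = (24*alpha - 3)*t**2 + (2 - 32*alpha)*t + (1 + 8*alpha)
    return (x, y)

print(tangent(5/8, 0.5))   # (0.0, 0.0): the tangent vanishes, a cusp
print(tangent(0.4, 0.5))   # nonzero: no cusp at the midpoint
```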
Figure 11.7: Effects of Changing α.
Figure 11.8: Tangents for Exercise 11.13.
The following problem may sometimes occur in practice. Given two endpoints P1 and P2 , two unit tangent vectors T1 and T2, and a third point P3 , find scale factors α and β such that the Hermite segment P(t) defined by points P1 and P2 and tangents αT1 and βT2, respectively, will pass through P3 . Also find the value t0 for which P(t0 ) = P3 . We start with Equation (11.5), which in our case becomes P3 = F1 (t0 )P1 + F2 (t0 )P2 + F3 (t0 )αT1 + F4 (t0 )βT2, where the Fi (t) are given by Equation (11.6). Since F1 (t) + F2 (t) ≡ 1 we can write P3 − P1 = F2 (t0 )(P2 − P1 ) + αF3 (t0 )T1 + βF4 (t0 )T2.
This can now be written as the three scalar equations

x3 − x1 = F2(t0)(x2 − x1) + αF3(t0)T1x + βF4(t0)T2x,
y3 − y1 = F2(t0)(y2 − y1) + αF3(t0)T1y + βF4(t0)T2y,      (11.21)
z3 − z1 = F2(t0)(z2 − z1) + αF3(t0)T1z + βF4(t0)T2z.
This is a system of three equations in the three unknowns α, β, and t0. In principle, it should have a unique solution, but solving it is awkward since t0 is included in the Fi(t0) functions, which are degree-3 polynomials in t0. The first step is to isolate the two products αF3(t0) and βF4(t0) in the first two equations. This yields

\[
\begin{pmatrix} \alpha F_3(t_0) \\ \beta F_4(t_0) \end{pmatrix}
=
\begin{pmatrix} T_{1x} & T_{2x} \\ T_{1y} & T_{2y} \end{pmatrix}^{-1}
\left[
\begin{pmatrix} x_3 - x_1 \\ y_3 - y_1 \end{pmatrix}
-
\begin{pmatrix} x_2 - x_1 \\ y_2 - y_1 \end{pmatrix} F_2(t_0)
\right].
\]
This result is used in step two to eliminate αF3(t0) and βF4(t0) from the third equation:

\[
\begin{aligned}
z_3 - z_1 &= F_2(t_0)(z_2 - z_1) + (T_{1z}, T_{2z})
\begin{pmatrix} \alpha F_3(t_0) \\ \beta F_4(t_0) \end{pmatrix} \\
&= F_2(t_0)(z_2 - z_1) + (T_{1z}, T_{2z})
\begin{pmatrix} T_{1x} & T_{2x} \\ T_{1y} & T_{2y} \end{pmatrix}^{-1}
\left[
\begin{pmatrix} x_3 - x_1 \\ y_3 - y_1 \end{pmatrix}
-
\begin{pmatrix} x_2 - x_1 \\ y_2 - y_1 \end{pmatrix} F_2(t_0)
\right].
\end{aligned}
\]
We now have an equation with the single unknown t0. Step three is to simplify the result above by using the value F2(t0) = −2t0³ + 3t0²:

\[
\begin{vmatrix}
x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\
T_{1x} & T_{1y} & T_{1z} \\
T_{2x} & T_{2y} & T_{2z}
\end{vmatrix}
\left(-2t_0^3 + 3t_0^2\right)
=
\begin{vmatrix}
x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \\
T_{1x} & T_{1y} & T_{1z} \\
T_{2x} & T_{2y} & T_{2z}
\end{vmatrix}.
\tag{11.22}
\]
Step four is to solve Equation (11.22) for t0 . Once t0 is known, α and β can be computed from the other equations. Equation (11.22), however, is cubic in t0 , so it may have to be solved numerically and it may have between zero and three real solutions t0 . Any acceptable solution t0 must be a real number in the range [0, 1] and must result in positive α and β. This, of course, is a slow, tedious approach and should only be used as a last resort, when nothing else works.
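These four steps can be turned into a small numerical procedure. The sketch below (function names are ours) uses the fact that F2(t) = −2t³ + 3t² increases monotonically from 0 to 1 on [0, 1], so Equation (11.22) has at most one root there and plain bisection suffices. For brevity the sketch solves for α and β from the x and y equations only, and assumes an interior root, so that F3(t0) and F4(t0) are nonzero:

```python
def F2(t): return -2*t**3 + 3*t**2          # Hermite blending functions
def F3(t): return t**3 - 2*t**2 + t
def F4(t): return t**3 - t**2

def det3(r0, r1, r2):                        # 3x3 determinant, by rows
    return (r0[0]*(r1[1]*r2[2] - r1[2]*r2[1])
          - r0[1]*(r1[0]*r2[2] - r1[2]*r2[0])
          + r0[2]*(r1[0]*r2[1] - r1[1]*r2[0]))

def fit_through(P1, P2, P3, T1, T2):
    d = [b - a for a, b in zip(P1, P2)]      # P2 - P1
    e = [b - a for a, b in zip(P1, P3)]      # P3 - P1
    c = det3(e, T1, T2) / det3(d, T1, T2)    # Equation (11.22): F2(t0) = c
    if not 0 <= c <= 1:
        return None                          # no acceptable root t0 in [0, 1]
    lo, hi = 0.0, 1.0                        # bisection; F2 is increasing here
    for _ in range(80):
        mid = (lo + hi) / 2
        if F2(mid) < c: lo = mid
        else:           hi = mid
    t0 = (lo + hi) / 2
    # Isolate alpha*F3(t0) and beta*F4(t0) from the x and y equations
    det2 = T1[0]*T2[1] - T2[0]*T1[1]
    rx = e[0] - d[0]*F2(t0)
    ry = e[1] - d[1]*F2(t0)
    aF3 = ( T2[1]*rx - T2[0]*ry) / det2
    bF4 = (-T1[1]*rx + T1[0]*ry) / det2
    return t0, aF3 / F3(t0), bF4 / F4(t0)

# Sanity check: this P3 was generated with t0 = 0.4, alpha = 1.5, beta = 2
print(fit_through((0, 0, 0), (2, 1, 1), (0.92, 0.16, 0.352),
                  (1, 0, 0), (0, 1, 0)))    # approximately (0.4, 1.5, 2.0)
```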
Figure 11.9: Truncating and Segmenting. (Part (a) shows P(t) truncated to Q(T) between T = 0 and T = 1; part (b) shows P(t) split into segments Q1(T) through Q4(T) at 0 = t0, t1, . . . , t4 = 1.)
11.5 Truncating and Segmenting

Surfaces and solid objects are constructed of curves. When surfaces are joined, clipped, or intersected, it is sometimes necessary to truncate curves. In general, the problem of truncating a curve starts with a parametric curve P(t) and two values ti and tj. A new curve Q(T) has to be determined that is identical to the segment P(ti) → P(tj) (Figure 11.9a) when T varies from 0 to 1. The discussion in this section is limited to Hermite segments.

The endpoints of the new curve are Q(0) = P(ti) and Q(1) = P(tj). To understand how the two extreme tangent vectors of Q(T) are calculated, we first need to discuss reparametrization of parametric curves. Reparametrization is the substitution of a new parameter T(t) for the original parameter t. Notice that T(t) is a function of t. One example of reparametrization is reversing the direction of a curve. It is easy to see that when t varies from 0 to 1, the simple function T = 1 − t varies from 1 to 0. The two curves P(t) and P(1 − t) have the same shape and location but move in opposite directions. Another example of reparametrization is a curve P(t) with a parameter 0 ≤ t ≤ 1 being transformed to a curve Q(T) with a parameter a ≤ T ≤ b (Section 12.1.6 has an example).

The simplest relation between T and t is linear, i.e., T = at + b. We can make two observations about this relation:
1. At two different points i and j along the curve, the parameters are related by Ti = ati + b and Tj = atj + b, respectively. Subtracting yields Tj − Ti = a(tj − ti), so a = (Tj − Ti)/(tj − ti).
2. T = at + b gives dT = a dt.
These two observations can be combined to produce the expression

dt/dT = 1/a = (tj − ti)/(Tj − Ti).      (11.23)
Equation (11.23) is used to calculate the extreme tangent vectors of our new curve Q(T ). Since it goes from point P(ti ) (where T = 0) to point P(tj ) (where T = 1), we have
Tj − Ti = 1. The tangent vectors of Q(T) are therefore

QT(T) = dQ(T)/dT = (dP(t)/dt)(dt/dT) = Pt(t)·(tj − ti).
The two extreme tangents are QT(0) = (tj − ti)Pt(ti) and QT(1) = (tj − ti)Pt(tj). The new curve can now be calculated by

\[
Q(T) = (T^3, T^2, T, 1)\,\mathbf{H}
\begin{pmatrix}
P(t_i) \\ P(t_j) \\ (t_j - t_i)P^t(t_i) \\ (t_j - t_i)P^t(t_j)
\end{pmatrix},
\tag{11.24}
\]
where H is the Hermite matrix, Equation (11.7).

Exercise 11.14: Compute the PC segment Q(T) that results from truncating P(t) = (−1, 0)t³ + (1, −1)t² + (1, 1)t (Equation (11.10)) from ti = 0.25 to tj = 0.75.

Segmenting a curve is the problem of calculating several truncations. Assume that we are given values 0 = t0 < t1 < t2 < · · · < tn = 1 and we want to break a given curve P(t) into n segments such that segment i goes from point P(ti−1) to point P(ti) (Figure 11.9b). Equation (11.24) gives segment i as

\[
Q_i(T) = (T^3, T^2, T, 1)\,\mathbf{H}
\begin{pmatrix}
P(t_{i-1}) \\ P(t_i) \\ (t_i - t_{i-1})P^t(t_{i-1}) \\ (t_i - t_{i-1})P^t(t_i)
\end{pmatrix}.
\]
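Truncation is easy to exercise in code. The sketch below (helper names are ours) truncates the curve of Exercise 11.14 and checks, at a few parameter values, that Q(T) reproduces P(ti + T(tj − ti)), exactly as the linear reparametrization implies:

```python
H = [[ 2, -2,  1,  1],
     [-3,  3, -2, -1],
     [ 0,  0,  1,  0],
     [ 1,  0,  0,  0]]                    # Hermite matrix, Equation (11.7)

def P(t):  return (-t**3 + t**2 + t, -t**2 + t)      # Equation (11.10)
def Pt(t): return (-3*t**2 + 2*t + 1, -2*t + 1)      # its tangent

def truncate(ti, tj):
    dt = tj - ti
    B = [P(ti), P(tj),
         tuple(dt*v for v in Pt(ti)),
         tuple(dt*v for v in Pt(tj))]    # geometry vector of Eq. (11.24)
    def Q(T):
        U = (T**3, T**2, T, 1)
        UH = [sum(U[r]*H[r][k] for r in range(4)) for k in range(4)]
        return tuple(sum(UH[k]*B[k][i] for k in range(4)) for i in range(2))
    return Q

Q = truncate(0.25, 0.75)
print(Q(0.0), P(0.25))    # both (0.296875, 0.1875)
print(Q(1.0), P(0.75))    # both equal P(0.75)
```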
11.5.1 Special and Degenerate Hermite Segments

The following special cases result in Hermite curve segments that are either especially simple (degenerate) or especially interesting.

The case P1 = P2 and Pt1 = Pt2 = (0, 0). Equation (11.4) yields P(t) = P1; the curve degenerates to a point.

The case Pt1 = Pt2 = P2 − P1. The two tangents point in the same direction, from P1 to P2. Equation (11.4) yields

P(t) = [2P1 − 2P2 + 2(P2 − P1)]t³ + [−3P1 + 3P2 − 3(P2 − P1)]t² + (P2 − P1)t + P1 = (P2 − P1)t + P1.      (11.25)

The curve reduces to a straight segment.

The case P1 = P2. Equation (11.4) yields P(t) = (Pt1 + Pt2)t³ + (−2Pt1 − Pt2)t² + Pt1 t + P1. It is easy to see that this curve satisfies P(0) = P(1). It is closed (but is not a circle).
The case Pt1 = Pt2 = (x2 − x1, y2 − y1, 0). Equation (11.4) yields

P(t) = [2P1 − 2P2 + 2(x2 − x1, y2 − y1, 0)]t³ + [−3P1 + 3P2 − 3(x2 − x1, y2 − y1, 0)]t² + (x2 − x1, y2 − y1, 0)t + (x1, y1, z1)
     = (x1 + (x2 − x1)t, y1 + (y2 − y1)t, z1 + (z2 − z1)(3t² − 2t³)).

The x and y coordinates of this curve are linear functions of t, so its tangent vector has the form (α, β, γ(t)), where α and β are constants. Since the x and y components of the tangent never change, the tangent always points in the same plane; thus, the curve is planar.
11.5.2 Special and Degenerate Curves

Parametric curves in general, not just Hermite segments, exhibit special behavior when their derivatives satisfy certain conditions. Here are four examples:
1. If the first derivative Pt(t) of a curve P(t) is zero for all values of t, then P(t) degenerates to the point P(0).
2. If Pt(t) ≠ 0 and Pt(t)×Ptt(t) = 0 (i.e., the tangent vector points in the direction of the acceleration vector), then P(t) is a straight line.
3. If Pt(t)×Ptt(t) ≠ 0 and |Pt(t) Ptt(t) Pttt(t)| = 0, then P(t) is a plane curve. (The notation |a b c| refers to the determinant whose three columns are a, b, and c.)
4. Finally, if both Pt(t)×Ptt(t) and |Pt(t) Ptt(t) Pttt(t)| are nonzero, the curve P(t) is nonplanar (i.e., it is a space curve).
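These conditions can be checked numerically for concrete curves. In the sketch below, the derivative values are written out by hand for three sample curves of our own choosing:

```python
def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def triple(u, v, w):            # the determinant |u v w|
    c = cross(v, w)
    return u[0]*c[0] + u[1]*c[1] + u[2]*c[2]

t = 0.3   # any sample parameter

# Case 2: the line P(t) = (t, 2t, 3t); P' x P'' vanishes everywhere
print(cross((1, 2, 3), (0, 0, 0)))                      # (0, 0, 0)

# Case 3: the plane curve P(t) = (t, t^2, 0); the cross product is
# nonzero, but the determinant |P' P'' P'''| is zero
print(cross((1, 2*t, 0), (0, 2, 0)),
      triple((1, 2*t, 0), (0, 2, 0), (0, 0, 0)))        # nonzero cross, 0

# Case 4: the space curve P(t) = (t, t^2, t^3); both are nonzero
print(triple((1, 2*t, 3*t*t), (0, 2, 6*t), (0, 0, 6)))  # 12.0
```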
11.6 Hermite Straight Segments

Equation (11.25) shows that the Hermite segment can sometimes degenerate into a straight segment. This section describes variations on Hermite straight segments. Specifically, we look in detail at the case where the two extreme tangent vectors point in the same direction, from P1 to P2, but have different magnitudes. We denote them by Pt1 = α(P2 − P1) and Pt2 = β(P2 − P1), where α and β can be any real numbers. Equation (11.25) is obtained in the special case α = β = 1.

The Hermite segment is expressed as P(t) = F(t)B, where the four Fi(t) functions are given by Equation (11.6) and B is the geometry vector, which, in our case, has the form B = (P1, P2, α(P2 − P1), β(P2 − P1))ᵀ. Since F1(t) + F2(t) ≡ 1, this can be written in the form

P(t) = F1(t)P1 + F2(t)P2 + F3(t)α(P2 − P1) + F4(t)β(P2 − P1)
     = P1 + (F2(t) + αF3(t) + βF4(t))(P2 − P1)
     = P1 + [(−2t³ + 3t²) + α(t³ − 2t² + t) + β(t³ − t²)](P2 − P1)
     = P1 + [(α + β − 2)t³ − (2α + β − 3)t² + αt](P2 − P1).      (11.26)
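Equation (11.26) can be verified directly; the sketch below evaluates the bracketed polynomial G(t) (our helper name) and compares it against the special cases of this section:

```python
def G(alpha, beta, t):   # Equation (11.26): P(t) = P1 + G(t)(P2 - P1)
    return (alpha + beta - 2)*t**3 - (2*alpha + beta - 3)*t**2 + alpha*t

# alpha = beta = 1 reduces G(t) to t, the uniform straight segment (11.25)
print([G(1, 1, t) for t in (0.0, 0.25, 0.5, 1.0)])   # [0.0, 0.25, 0.5, 1.0]

# alpha = 2, beta = 4 reproduces the coefficients of Equation (11.29)
print(G(2, 4, 0.5), 4*0.5**3 - 5*0.5**2 + 2*0.5)     # both 0.25
```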
This has the form P(t) = P1 + G(t)(P2 − P1), which shows that all the points of P(t) lie on the straight line that passes through P1 and has the tangent vector (P2 − P1). The precise form of P(t) depends on the values and signs of α and β. The remainder of this section analyzes several cases in detail. The remaining cases can be analyzed similarly. See also Exercise 13.7.

Case 1 is when α = β = 1, which leads to Equation (11.25), a straight segment from P1 to P2.

Case 2 is when α = β = 0. Equation (11.26) reduces in this case to

P(t) = P1 + (−2t³ + 3t²)(P2 − P1),      (11.27)
or P(T) = P1 + T(P2 − P1), where T = −2t³ + 3t². This also is a straight segment from P1 to P2, but one traversed at a variable speed. It accelerates up to point P(0.5), then decelerates.

Exercise 11.15: Explain why this is so.

Case 3 is when α = β = −1. Equation (11.26) becomes in this case

P(t) = P1 + (−4t³ + 6t² − t)(P2 − P1),      (11.28)
which is the curve shown in Figure 11.10a. It consists of three straight segments, but we can also think of it as a straight line that goes from P1 backward to a certain point P(i), then reverses direction, passes points P1 and P2, stops at point P(j), reverses direction again, and ends at P2. We can calculate i and j by computing the tangent of Equation (11.28) and equating it to zero. The tangent vector is Pt(t) = (−12t² + 12t − 1)(P2 − P1), and the roots of the quadratic equation −12t² + 12t − 1 = 0 are (approximately) 0.092 and 0.908.

Figure 11.10: Straight Hermite Segments. (Panels (a)–(c) correspond to Cases 3–5.)
Case 4 is when α > 0, β > 0. As an example, we try the values α = 2 and β = 4. Equation (11.26) becomes in this case

P(t) = P1 + (4t³ − 5t² + 2t)(P2 − P1).      (11.29)
This curve also consists of three straight segments (Figure 11.10b), but it behaves differently. It goes forward from P1 to a certain point P(i), then reverses direction, goes to point P(j), reverses direction again, and continues to P2 . We can calculate i and j by calculating the tangent of Equation (11.29) and equating it to zero. The tangent
vector is Pt(t) = (12t² − 10t + 2)(P2 − P1), and the roots of the quadratic equation 12t² − 10t + 2 = 0 are 1/3 and 1/2.

Case 5 is when α < 0, β < 0. As an example, we try the values α = −2 and β = −4. Equation (11.26) becomes in this case

P(t) = P1 + (−8t³ + 11t² − 2t)(P2 − P1).      (11.30)

This curve again consists of three straight segments as in case 3, but points i and j are different (Figure 11.10c). The tangent of Equation (11.30) is Pt(t) = (−24t² + 22t − 2)(P2 − P1), and the roots of the quadratic equation −24t² + 22t − 2 = 0 are (approximately) 0.1 and 0.8.

Table 11.11 summarizes the nine possible cases of Equation (11.26); the rows for the five cases analyzed above are

Case:  1    2    3    4    5
α:     1    0   −1   >0   <0
β:     1    0   −1   >0   <0

Figure 11.15: Two Ferguson Surface Patches.
11.9 Bicubic Hermite Patch

The spline methods covered in Chapter 12 are based on Hermite curve segments, which suggests that Hermite interpolation is useful. The Ferguson surface patch of Section 11.8 is an attempt to extend the technique of Hermite interpolation to surface patches. This section describes a more general extension.

A single Hermite segment is a cubic polynomial, so we expect the Hermite surface patch, which is an extension of the Hermite curve segment, to be a bicubic surface. Its expression should be given by Equation (10.25), where matrix H (Equation (11.7)) should be substituted for N, and the 16 quantities should be points and tangent vectors. The basic idea is to ask the user to specify the four boundary curves as Hermite segments. Thus, the user should specify two points and two tangent vectors for each curve, for a total of eight points and eight tangents. For the four curves to form a surface, they have to meet at the four corners, so the eight points reduce to four distinct points. Four points and eight tangents provide 12 of the 16 quantities needed to construct the surface. Four more quantities are needed in order to calculate the 16 unknowns of Equation (10.24), and they are selected as the mixed second derivatives of the surface at the corner points. They are called twist vectors.

To calculate the surface, 16 equations are written, expressing the way we require the surface to behave. For example, we want P(u, w) to approach the corner point P01 when u → 0 and w → 1. We also want P(0, w) to equal the PC between points P00 and P01.
The equations are obtained from the 16 terms of Equation (10.22):

\[
\begin{aligned}
P_{00} &= a_{00},\\
P_{10} &= a_{30} + a_{20} + a_{10} + a_{00},\\
P_{01} &= a_{03} + a_{02} + a_{01} + a_{00},\\
P_{11} &= a_{33} + a_{32} + a_{31} + a_{30} + a_{23} + a_{22} + a_{21} + a_{20} + a_{13} + a_{12} + a_{11} + a_{10} + a_{03} + a_{02} + a_{01} + a_{00},\\
P^u_{00} &= a_{10}, \qquad P^w_{00} = a_{01},\\
P^u_{10} &= 3a_{30} + 2a_{20} + a_{10}, \qquad P^w_{10} = a_{31} + a_{21} + a_{11} + a_{01},\\
P^u_{01} &= a_{13} + a_{12} + a_{11} + a_{10}, \qquad P^w_{01} = 3a_{03} + 2a_{02} + a_{01},\\
P^u_{11} &= 3a_{33} + 3a_{32} + 3a_{31} + 3a_{30} + 2a_{23} + 2a_{22} + 2a_{21} + 2a_{20} + a_{13} + a_{12} + a_{11} + a_{10},\\
P^w_{11} &= 3a_{33} + 2a_{32} + a_{31} + 3a_{23} + 2a_{22} + a_{21} + 3a_{13} + 2a_{12} + a_{11} + 3a_{03} + 2a_{02} + a_{01},\\
P^{uw}_{00} &= a_{11},\\
P^{uw}_{10} &= 3a_{31} + 2a_{21} + a_{11},\\
P^{uw}_{01} &= 3a_{13} + 2a_{12} + a_{11},\\
P^{uw}_{11} &= 9a_{33} + 6a_{32} + 3a_{31} + 6a_{23} + 4a_{22} + 2a_{21} + 3a_{13} + 2a_{12} + a_{11}.
\end{aligned}
\]

The solutions express the 16 coefficients aij in terms of the four corner points, eight tangent vectors, and four twist vectors:

\[
\begin{aligned}
a_{00} &= P_{00},\\
a_{01} &= P^w_{00},\\
a_{02} &= -2P^w_{00} - P^w_{01} - 3P_{00} + 3P_{01},\\
a_{03} &= P^w_{00} + P^w_{01} + 2P_{00} - 2P_{01},\\
a_{10} &= P^u_{00},\\
a_{11} &= P^{uw}_{00},\\
a_{12} &= -2P^{uw}_{00} - P^{uw}_{01} - 3P^u_{00} + 3P^u_{01},\\
a_{13} &= P^{uw}_{00} + P^{uw}_{01} + 2P^u_{00} - 2P^u_{01},\\
a_{20} &= -2P^u_{00} - P^u_{10} - 3P_{00} + 3P_{10},\\
a_{21} &= -2P^{uw}_{00} - P^{uw}_{10} - 3P^w_{00} + 3P^w_{10},\\
a_{22} &= 4P^{uw}_{00} + 2P^{uw}_{01} + 2P^{uw}_{10} + P^{uw}_{11} + 6P^u_{00} - 6P^u_{01} + 3P^u_{10} - 3P^u_{11} + 6P^w_{00} + 3P^w_{01} - 6P^w_{10} - 3P^w_{11} + 9P_{00} - 9P_{01} - 9P_{10} + 9P_{11},\\
a_{23} &= -2P^{uw}_{00} - 2P^{uw}_{01} - P^{uw}_{10} - P^{uw}_{11} - 4P^u_{00} + 4P^u_{01} - 2P^u_{10} + 2P^u_{11} - 3P^w_{00} - 3P^w_{01} + 3P^w_{10} + 3P^w_{11} - 6P_{00} + 6P_{01} + 6P_{10} - 6P_{11},\\
a_{30} &= P^u_{00} + P^u_{10} + 2P_{00} - 2P_{10},\\
a_{31} &= P^{uw}_{00} + P^{uw}_{10} + 2P^w_{00} - 2P^w_{10},\\
a_{32} &= -2P^{uw}_{00} - P^{uw}_{01} - 2P^{uw}_{10} - P^{uw}_{11} - 3P^u_{00} + 3P^u_{01} - 3P^u_{10} + 3P^u_{11} - 4P^w_{00} - 2P^w_{01} + 4P^w_{10} + 2P^w_{11} - 6P_{00} + 6P_{01} + 6P_{10} - 6P_{11},\\
a_{33} &= P^{uw}_{00} + P^{uw}_{01} + P^{uw}_{10} + P^{uw}_{11} + 2P^u_{00} - 2P^u_{01} + 2P^u_{10} - 2P^u_{11} + 2P^w_{00} + 2P^w_{01} - 2P^w_{10} - 2P^w_{11} + 4P_{00} - 4P_{01} - 4P_{10} + 4P_{11}.
\end{aligned}
\]
When Equation (10.24) is written in terms of these values, it becomes the compact expression

\[
P(u,w) = (u^3, u^2, u, 1)\,\mathbf{H}
\begin{pmatrix}
P_{00} & P_{01} & P^w_{00} & P^w_{01} \\
P_{10} & P_{11} & P^w_{10} & P^w_{11} \\
P^u_{00} & P^u_{01} & P^{uw}_{00} & P^{uw}_{01} \\
P^u_{10} & P^u_{11} & P^{uw}_{10} & P^{uw}_{11}
\end{pmatrix}
\mathbf{H}^T
\begin{pmatrix} w^3 \\ w^2 \\ w \\ 1 \end{pmatrix}
= \mathbf{U}\mathbf{H}\mathbf{B}\mathbf{H}^T\mathbf{W}^T,
\tag{11.35}
\]
where H is the Hermite matrix, Equation (11.7). The quantities P^{uw}_{ij} are the twist vectors. They are usually not known in advance, but the next section describes a way to estimate them.
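Equation (11.35) is just four small matrix products per coordinate and is easy to evaluate directly. The sketch below (helper names are ours) builds the geometry matrix for the z coordinate of the flat patch z = w, with zero twist vectors, and checks that the patch interpolates its data:

```python
H = [[ 2, -2,  1,  1],
     [-3,  3, -2, -1],
     [ 0,  0,  1,  0],
     [ 1,  0,  0,  0]]          # Hermite matrix, Equation (11.7)

def patch(B):
    """Evaluate one coordinate of Equation (11.35): U H B H^T W^T."""
    def point(u, w):
        U = (u**3, u**2, u, 1)
        W = (w**3, w**2, w, 1)
        UH  = [sum(U[r]*H[r][c] for r in range(4)) for c in range(4)]
        HtW = [sum(H[r][c]*W[r] for r in range(4)) for c in range(4)]
        return sum(UH[i]*B[i][j]*HtW[j] for i in range(4) for j in range(4))
    return point

# Geometry matrix for the z coordinate of the plane z = w (our example):
# corner values 0, 1, 0, 1; all w-tangents 1; u-tangents and twists 0.
Bz = [[0, 1, 1, 1],
      [0, 1, 1, 1],
      [0, 0, 0, 0],
      [0, 0, 0, 0]]
z = patch(Bz)
print(z(0, 0), z(0, 1), z(0.3, 0.7))   # 0, 1, and approximately 0.7
```

Since z = w is itself bicubic and supplies exactly these 16 data items, the patch reproduces it everywhere, not just at the corners.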
11.10 Biquadratic Hermite Patch

Section 11.7 discusses a variation on the Hermite segment where two points P1 and P2 and just one tangent vector Pt1 are known. The curve segment is given by Equation (11.32), duplicated here:

\[
P(t) = (P_2 - P_1 - P^t_1)t^2 + P^t_1 t + P_1
= (-t^2 + 1)P_1 + t^2 P_2 + (-t^2 + t)P^t_1
= (t^2, t, 1)
\begin{pmatrix} -1 & 1 & -1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} P_1 \\ P_2 \\ P^t_1 \end{pmatrix}.
\tag{11.32}
\]
If we denote the curve segment by P(t) = at² + bt + c, then its tangent vector has the form Pt(t) = 2at + b = 2(P2 − P1 − Pt1)t + Pt1, which implies that the end tangent is Pt(1) = 2(P2 − P1) − Pt1. The biquadratic surface constructed as the Cartesian product of two such curves is given by

\[
P(u,w) = (u^2, u, 1)
\begin{pmatrix} -1 & 1 & -1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} Q_{22} & Q_{21} & Q_{20} \\ Q_{12} & Q_{11} & Q_{10} \\ Q_{02} & Q_{01} & Q_{00} \end{pmatrix}
\begin{pmatrix} -1 & 0 & 1 \\ 1 & 0 & 0 \\ -1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} w^2 \\ w \\ 1 \end{pmatrix},
\tag{11.36}
\]

where the nine quantities Qij still have to be assigned geometric meaning. This is done by computing P(u, w) and its partial derivatives for certain values of the parameters. Simple experimentation yields P(0, 0) = Q22, P(0, 1) = Q21, P(1, 0) = Q12, P(1, 1) = Q11, Pu(0, 0) = Q02, Pu(0, 1) = Q01, Pw(0, 0) = Q20, Pw(1, 0) = Q10, and Puw(0, 0) = Q00. This shows that the surface can be expressed as

\[
P(u,w) = (u^2, u, 1)
\begin{pmatrix} -1 & 1 & -1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix}
P(0,0) & P(0,1) & P^w(0,0) \\
P(1,0) & P(1,1) & P^w(1,0) \\
P^u(0,0) & P^u(0,1) & P^{uw}(0,0)
\end{pmatrix}
\begin{pmatrix} -1 & 0 & 1 \\ 1 & 0 & 0 \\ -1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} w^2 \\ w \\ 1 \end{pmatrix}
\tag{11.37}
\]
\[
= (u^2, u, 1)
\begin{pmatrix} -1 & 1 & -1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix}
P_{00} & P_{01} & P^w_{00} \\
P_{10} & P_{11} & P^w_{10} \\
P^u_{00} & P^u_{01} & P^{uw}_{00}
\end{pmatrix}
\begin{pmatrix} -1 & 0 & 1 \\ 1 & 0 & 0 \\ -1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} w^2 \\ w \\ 1 \end{pmatrix}.
\]

Thus, this type of surface is defined by the following nine quantities:
The four corner points P00, P01, P10, and P11.
The two tangents in the u direction at points P00 and P01.
The two tangents in the w direction at points P00 and P10.
The second derivative at point P00.

The first eight quantities have simple geometric meaning, but the second derivative, which is a twist vector, has no simple geometrical interpretation. It can simply be set to zero or it can be estimated. Several methods exist to estimate the twist vectors of biquadratic and bicubic surface patches. The simple method described here is useful when a larger surface is constructed out of several such patches.

We start by looking at the twist vector of a bilinear surface. Differentiating Equation (9.6) twice, with respect to u and w, produces the simple, constant expression

Puw(u, w) = P00 − P01 − P10 + P11 = (P00 − P01) + (P11 − P10),      (11.38)

a constant vector, independent of both parameters. This expression is now employed to estimate the twist vectors of all the patches that constitute a biquadratic or a bicubic surface. Figure 11.16a is an idealized diagram of such a surface, showing some individual patches. The first step is to apply Equation (11.38) to calculate a vector Ti for patch i from the four corner points of the patch. Vectors Ti are then averaged to provide estimates for the four twist vectors of each patch. The principle is as follows:

A corner point Pi with one index i belongs to just one patch (patch i) and is one of the four corner points of the entire surface (P1, P4, P9, and Pc of Figure 11.16a). The twist vector estimated for such a point is Ti, the vector previously calculated for patch i.

A point Pij with two indexes ij is common to two patches i and j and is located on the boundary of the entire surface (examples are P15 and P59). The twist vector estimated for such a point is the average (Ti + Tj)/2.

A point Pijkl with four indexes is common to four patches. The twist vector estimated for such a point is the average (Ti + Tj + Tk + Tl)/4.

This method works well as a first estimate. After the surface is drawn, the twist vectors determined by this method may have to be modified to bring the surface closer to its required shape.
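Equation (11.37) can be checked numerically. The sketch below (helper names are ours) evaluates one coordinate of a biquadratic patch; the surface chosen, the graph of z = u², is our example:

```python
N = [[-1, 1, -1],
     [ 0, 0,  1],
     [ 1, 0,  0]]               # the 3x3 matrix of Equation (11.32)

def biquadratic(Q):
    """Evaluate one coordinate of Equation (11.37): U N Q N^T W^T."""
    def point(u, w):
        U = (u*u, u, 1)
        W = (w*w, w, 1)
        UN  = [sum(U[r]*N[r][c] for r in range(3)) for c in range(3)]
        NtW = [sum(N[r][c]*W[r] for r in range(3)) for c in range(3)]
        return sum(UN[i]*Q[i][j]*NtW[j] for i in range(3) for j in range(3))
    return point

# Data for the graph of z = u^2 (our example): P(1,0) = P(1,1) = 1;
# every listed tangent and the twist vector are zero.
Qz = [[0, 0, 0],
      [1, 1, 0],
      [0, 0, 0]]
z = biquadratic(Qz)
print(z(1, 1), z(0.5, 0.3))    # 1 and approximately 0.25
```

Because z = u² is itself biquadratic and the nine data items determine the nine coefficients uniquely, the patch reproduces u² exactly.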
Figure 11.16: Estimating Twist Vectors. (In part (b), the four patches are numbered 1–4 and the grid points are P1 = (0, 0, 0), P12 = (1, 0, 0), P2 = (2, 0, 2), P13 = (1, 1, 1), P1234 = (2, 1, −1), P24 = (3, 1, 1), P3 = (−1, 2, 0), P34 = (1, 2, 1), and P4 = (1, 2, 0).)
Example: Compute twist vectors for the four patches shown in Figure 11.16b.
The first step is to compute a second derivative vector Puw_i from Equation (11.38) for each patch i:

Puw1 = [(0, 0, 0) − (1, 1, 1)] + [(2, 1, −1) − (1, 0, 0)] = (0, 0, −2),
Puw2 = [(1, 0, 0) − (2, 1, −1)] + [(3, 1, 1) − (2, 0, 2)] = (0, 0, 0),
Puw3 = [(1, 1, 1) − (−1, 2, 0)] + [(1, 2, 1) − (2, 1, −1)] = (1, 0, 3),
Puw4 = [(2, 1, −1) − (1, 2, 1)] + [(1, 2, 0) − (3, 1, 1)] = (−1, 0, −3).
The second step is to compute a twist vector Ti for each of the nine points = Puw 1 = (0, 0, −2), uw = [Puw 1 + P3 ]/2 = [(0, 0, −2) + (1, 0, 3)]/2 = (.5, 0, .5), = Puw 3 = (1, 0, 3), uw = [Puw 1 + P2 ]/2 = [(0, 0, 0) + (1, 0, 3)]/2 = (.5, 0, 1.5), uw uw uw = [P1 + Puw 2 + P3 + P4 ]/4 = [(0, 0, −2) + (0, 0, 0) + (1, 0, 3) + (−1, 0, −3)]/4 = (0, 0, −.5), uw T34 = [Puw 3 + P4 ]/2 = [(1, 0, 3) + (−1, 0, −3)]/2 = (0, 0, 0), uw T2 = P2 = (0, 0, 0), uw T24 = [Puw 2 + P4 ]/2 = [(0, 0, 0) + (−1, 0, −3)]/2 = (−.5, 0, −1.5), uw T4 = P4 = (−1, 0, −3).
T1 T13 T3 T12 T1234
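The first two steps of the example can be sketched in code (the point coordinates are those of Figure 11.16b; helper names are ours):

```python
def bilinear_twist(P00, P01, P10, P11):   # Equation (11.38)
    return tuple((a - b) + (d - c)
                 for a, b, c, d in zip(P00, P01, P10, P11))

P1,  P12, P2  = (0, 0, 0), (1, 0, 0), (2, 0, 2)
P13, Pc,  P24 = (1, 1, 1), (2, 1, -1), (3, 1, 1)   # Pc is the point P1234
P3,  P34, P4  = (-1, 2, 0), (1, 2, 1), (1, 2, 0)

# One vector per patch, from its four corner points
T = {1: bilinear_twist(P1,  P13, P12, Pc),
     2: bilinear_twist(P12, Pc,  P2,  P24),
     3: bilinear_twist(P13, P3,  Pc,  P34),
     4: bilinear_twist(Pc,  P34, P24, P4)}
print(T)   # {1: (0, 0, -2), 2: (0, 0, 0), 3: (1, 0, 3), 4: (-1, 0, -3)}

# The point shared by all four patches gets the average of the four
T1234 = tuple(sum(v)/4 for v in zip(*T.values()))
print(T1234)   # (0.0, 0.0, -0.5)
```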
The last step is to compute one twist vector for each patch by averaging the four twist vectors of the four corners of the patch. For patch 1, the result is [T1 + T12 + T13 + T1234]/4 = [(0, 0, −2) + (0, 0, −1) + (.5, 0, .5) + (0, 0, −.5)]/4 = (.125, 0, −.75), and similarly for the other three surface patches.

She could afterward calmly discuss with him such blameless technicalities as hidden line algorithms and buffer refresh times, cabinet versus cavalier projections and Hermite versus Bézier parametric cubic curve forms.

—John Updike, Roger's Version (1986)
12 Spline Interpolation

Given a set of points, it is easy to compute a polynomial that passes through the points. The Lagrange polynomial (LP) of Section 10.2 is an example of such a polynomial. However, as the discussion in Section 8.8 (especially Exercise 8.16) illustrates, a curve based on a high-degree polynomial may wiggle wildly, and its shape may be far from what the user has in mind. In practical work we are normally interested in a smooth, tight curve that proceeds from point to point such that each segment between two points is a smooth arc. The spline approach to curve design, discussed in this chapter, constructs such a curve from individual segments, each a simple curve, generally a parametric cubic (PC).

This chapter illustrates spline interpolation with four examples: cubic splines (Section 12.1), the Akima spline (Section 12.2), cardinal splines (Section 12.5), and Kochanek–Bartels splines (Section 12.8). Another important type, the B-spline, is the topic of Chapter 14. Other types of splines are known and are discussed in the scientific literature. A short history of splines can be found in [Schumaker 81] and [Farin 04]. For those looking for other texts on splines, the bibliography lists several books by Gerald Farin, and I would also like to recommend [Späth 95a,b] and [Dierckx 95].

Definition: A spline is a set of polynomials of degree k that are smoothly connected at certain data points. At each data point, two polynomials connect, and their first derivatives (tangent vectors) have the same values. The definition also requires that all their derivatives up to the (k − 1)st be the same at the point.
D. Salomon, The Computer Graphics Manual, Texts in Computer Science, DOI 10.1007/978-0-85729-886-7_12, © Springer-Verlag London Limited 2011
12.1 The Cubic Spline Curve

The cubic spline was originally introduced by James Ferguson in [Ferguson 64]. Given n data points that are numbered P1 through Pn, there are infinitely many curves that pass through all the points in order of their numbers (Figure 12.1a), but the eye often tends to trace one imaginary smooth curve through the points, especially if the points are arranged in a familiar pattern. It is therefore useful to have an algorithm that does the same. Since the computer does not recognize familiar patterns the way humans do, such a method should be interactive, thereby allowing the user to create the desired curve.

The cubic spline method is such an algorithm. Given n data points, it makes it possible to construct a smooth curve that passes through the points (see the definition of data points in Section 8.6). The curve consists of n − 1 individual Hermite segments that are smoothly connected at the n − 2 interior points and that are easy to compute and display. For the segments to meet at the interior points, their tangent vectors (first derivatives) must be the same at each interior point. An added feature of cubic splines is that their second derivatives are also the same at the interior points.

The cubic spline method is interactive. The user can control the shape of the curve by varying the two extreme tangent vectors at the beginning and the end of the curve.

Given the n data points P1, P2, through Pn, we look for n − 1 parametric cubics P1(t), P2(t), . . . , Pn−1(t) such that Pk(t) is the polynomial segment from point Pk to point Pk+1 (Figure 12.1b). The PCs will have to be smoothly connected at the n − 2 interior points P2, P3, . . . , Pn−1, which means that their first derivatives will have to match at every interior point. The definition of a spline requires that their second derivatives match too.
This requirement (the boundary condition of the cubic spline) is important because it provides the necessary equations and also results in a tight curve in the sense that once the curve is drawn, the eye can no longer detect the positions of the original data points.
Figure 12.1: (a) Three Different Curves. (b) Two Segments.
The principle of cubic splines is to divide the set of n points into n − 1 overlapping pairs of two points each and to fit a Hermite segment (Equations (11.4) and (11.5)) to each pair. The pairs are (P1 , P2 ), (P2 , P3 ), and so on, up to (Pn−1 , Pn ). Recall that a Hermite curve segment is specified by two points and two tangents. In our case, all the points are given, so the only unknowns are the tangent vectors. In order for segments Pk (t) and Pk+1 (t) to connect smoothly at point Pk+1 , the end tangent of Pk (t) has to equal the start tangent of Pk+1 (t). Thus, there is only one tangent vector per point, for a total of n unknowns.
The unknown tangent vectors are computed as the solutions of a system of n equations. The equations are derived from the requirement that the second derivatives of the individual segments match at every interior point. However, there are only n − 2 interior points, so we can only have n − 2 equations, enough to solve for only n − 2 unknowns. The key to resolving this shortage of equations is to ask the user to provide the software with the values of two tangent vectors (normally the first and last ones). Once this is done, the equations can easily be solved, yielding the remaining n − 2 tangents. This seems a strange way to solve equations, but it has the advantage of being interactive. If the resulting curve looks wrong, the user can repeat the calculation with two new tangent vectors.

Before delving into the details, here is a summary of the steps involved:
1. The n data points are input into the program.
2. The user provides values (guesses or estimates) for two tangent vectors.
3. The program sets up n − 2 equations, with the remaining n − 2 tangent vectors as the unknowns, and solves them.
4. The program loops n − 1 times. In each iteration, it selects two adjacent points and their tangent vectors to compute one Hermite segment.

We start with three adjacent points, Pk, Pk+1, and Pk+2, of which Pk+1 must be an interior point and the other two can be either interior or endpoints. Thus, k varies from 1 to n − 2. The Hermite segment from Pk to Pk+1 is denoted by Pk(t), which implies that Pk(0) = Pk and Pk(1) = Pk+1. The tangent vectors of Pk(t) at the endpoints are still unknown and are denoted by Ptk and Ptk+1.

The first step is to express segment Pk(t) geometrically, in terms of the two endpoints and the two tangents. Applying Equation (11.4) to our segment results in

Pk(t) = Pk + Ptk t + [3(Pk+1 − Pk) − 2Ptk − Ptk+1]t² + [2(Pk − Pk+1) + Ptk + Ptk+1]t³.      (12.1)
When the same equation is applied to the next segment Pk+1(t) (from Pk+1 to Pk+2), it becomes

Pk+1(t) = Pk+1 + Ptk+1 t + [3(Pk+2 − Pk+1) − 2Ptk+1 − Ptk+2]t² + [2(Pk+1 − Pk+2) + Ptk+1 + Ptk+2]t³.      (12.2)

Exercise 12.1: Where do we use the assumption that the first derivatives of segments Pk(t) and Pk+1(t) are equal at the interior point Pk+1?

Next, we use the requirement that the second derivatives of the two segments be equal at the interior points. The second derivative Ptt(t) of a Hermite segment P(t) is obtained by differentiating Equation (11.1):

Ptt(t) = 6at + 2b.      (12.3)
Equality of the second derivatives at the interior point Pk+1 implies

Pttk(1) = Pttk+1(0),  or  6ak·1 + 2bk = 6ak+1·0 + 2bk+1.      (12.4)
12.1 The Cubic Spline Curve
Using the values of a and b from Equations (12.1) and (12.2), we get 6 2(Pk − Pk+1 ) + Ptk + Ptk+1 + 2 3(Pk+1 − Pk ) − 2Ptk − Ptk+1 = 2 3(Pk+2 − Pk+1 ) − 2Ptk+1 − Ptk+2 ,
(12.5)
which, after simple algebraic manipulations, becomes Ptk + 4Ptk+1 + Ptk+2 = 3(Pk+2 − Pk ).
(12.6)
The three quantities on the left side of Equation (12.6) are unknown. The two quantities on the right side are known. Equation (12.6) can be written n − 2 times for all the interior points Pk+1 = P2 , P3 , . . . , Pn−1 to obtain a system of n − 2 linear algebraic equations expressed in matrix form as ⎧⎛ ⎞⎛ Pt ⎞ ⎛ 3(P3 − P1 ) ⎞ 1 4 1 0 ··· 0 1 ⎪ ⎪ ⎨ ⎜0 1 t ⎜ ⎟ 4 1 · · · 0 ⎟⎜ P2 ⎟ ⎜ 3(P4 − P2 ) ⎟ ⎜ ⎟. . ⎟⎜ n−2 (12.7) .. .. ⎟=⎜ .. ⎝ ⎠ ⎪ . . .. ⎠⎝ ... ⎠ ⎝ ⎪ . ⎩ t 0 ··· ··· 1 4 1 3(Pn − Pn−2 ) Pn n
Equation (12.7) is a system of n − 2 equations in the n unknowns Pt1 , Pt2 , . . . , Ptn . A practical approach to the solution is to let the user specify the values of the two extreme tangents Pt1 and Ptn . Once these values have been substituted in Equation (12.7), it’s easy to solve it and obtain values for the remaining n − 2 tangents, Pt2 through Ptn−1 . The n tangent vectors are now used to calculate the original coefficients a, b, c, and d of each segment by means of Equations (11.3), (11.4), or (11.7), which should be written and solved n − 1 times, once for each segment of the spline. The reader should notice that the matrix of coefficients of Equation (12.7) is tridiagonal and therefore diagonally dominant and thus nonsingular. This means that the system of equations can always be solved and that it has a unique solution. (Matrices and their properties are discussed in texts on linear algebra.) This approach to solving Equation (12.7) is called the clamped end condition. Its advantage is that the user can vary the shape of the curve by entering new values for Pt1 and Ptn and recalculating. This allows for interactive design, where each step brings the curve closer to the desired shape. Figure 12.1a is an example of three cubic splines that pass through the same points and differ only in Pt1 and Ptn . It illustrates how the shape of the entire curve can be radically changed by modifying the two extreme tangents. It is possible to let the user specify any two tangent vectors, not just the two extreme ones. However, varying the two extreme tangents is a natural way to edit and reshape the curve in practical applications. Tension control. Section 11.2.3 shows how to control the tension of a Hermite segment by varying the magnitudes of the tangent vectors. Since a cubic spline is based on Hermite segments, its tension can also be controlled in the same way. The user may input a tension parameter s and the software simply multiplies every tangent vector by
12 Spline Interpolation
s. Small values of s correspond to high tension, so a user-friendly algorithm inputs a parameter T in the interval [0, 1] and multiplies each tangent vector by s = α(1 − T) for some predetermined α. Large values of T (close to 1) correspond to small s and therefore to high tension, while small values of T correspond to s close to α. This makes T a natural tension parameter. Section 12.5 has the similar relation T = 1 − 2s, which makes more sense for cardinal splines.

The downsides of the cubic spline are the following:
1. There is no local control. Modifying the extreme tangent vectors changes Equation (12.7) and results in a different set of n tangent vectors. The entire curve is modified!
2. Equation (12.7) is a system of n equations that, for large values of n, may be too slow to solve.

Picnic Blues (anagram of Cubic Spline).
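Because the coefficient matrix of Equation (12.7) is tridiagonal, the clamped system can be solved in linear time once the two extreme tangents are moved to the right-hand side. The following is a minimal sketch (the function name, the use of the Thomas algorithm, and the exact rational arithmetic are implementation choices of this sketch, not prescriptions of the text):

```python
from fractions import Fraction as F

def clamped_tangents(points, p1t, pnt):
    """Solve Equation (12.7) for the n-2 interior tangent vectors of a
    clamped cubic spline.  `points` is a list of n two-dimensional points;
    `p1t` and `pnt` are the user-supplied extreme tangents.  Moving the two
    known tangents to the right-hand side leaves a tridiagonal system with
    rows (1, 4, 1), solved here by the Thomas algorithm."""
    n = len(points)
    m = n - 2                                   # number of unknowns
    # Right-hand sides 3(P_{k+2} - P_k), adjusted in the first and last rows.
    rhs = [tuple(3 * (F(a) - F(b)) for a, b in zip(points[k + 2], points[k]))
           for k in range(m)]
    rhs[0] = tuple(r - t for r, t in zip(rhs[0], p1t))
    rhs[-1] = tuple(r - t for r, t in zip(rhs[-1], pnt))
    # Forward sweep (eliminate the subdiagonal of ones).
    cp = [F(1, 4)]                              # modified superdiagonal
    dp = [tuple(r / 4 for r in rhs[0])]         # modified right-hand side
    for i in range(1, m):
        den = 4 - cp[i - 1]
        cp.append(F(1) / den)
        dp.append(tuple((r - d) / den for r, d in zip(rhs[i], dp[i - 1])))
    # Back substitution.
    tans = [None] * m
    tans[-1] = dp[-1]
    for i in range(m - 2, -1, -1):
        tans[i] = tuple(d - cp[i] * v for d, v in zip(dp[i], tans[i + 1]))
    return tans                                 # [Pt2, ..., Ptn-1]
```

For the four points and extreme tangents of the example in Section 12.1.1, this returns Pt2 = (2/3, 4/5) and Pt3 = (−2/3, 4/5).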
12.1.1 Example

Given the four points P1 = (0, 0), P2 = (1, 0), P3 = (1, 1), and P4 = (0, 1), we are looking for three Hermite segments P1(t), P2(t), and P3(t) that will connect smoothly at the two interior points P2 and P3 and will constitute the spline. We further select an initial direction Pt1 = (1, −1) and a final direction Pt4 = (−1, −1). Figure 12.2 shows the points, the two extreme tangent vectors, and the resulting curve.

Figure 12.2: A Cubic Spline Example.
We first write Equation (12.7) for our special case (n = 4):

    [ 1  4  1  0 ] [ (1, −1)  ]   [ 3[(1, 1) − (0, 0)] ]   [ (3, 3)  ]
    [ 0  1  4  1 ] [   Pt2    ] = [ 3[(0, 1) − (1, 0)] ] = [ (−3, 3) ],
                   [   Pt3    ]
                   [ (−1, −1) ]

or

(1, −1) + 4Pt2 + Pt3 = (3, 3),
Pt2 + 4Pt3 + (−1, −1) = (−3, 3).

This is a system of two equations in two unknowns. It is easy to solve, and the solutions are Pt2 = (2/3, 4/5) and Pt3 = (−2/3, 4/5).
We now write Equation (11.7) three times, for the three spline segments. For the first segment, Equation (11.7) becomes

                             [  2  −2   1   1 ] [   (0, 0)   ]
    P1(t) = (t^3, t^2, t, 1) [ −3   3  −2  −1 ] [   (1, 0)   ]
                             [  0   0   1   0 ] [  (1, −1)   ]
                             [  1   0   0   0 ] [ (2/3, 4/5) ]

          = (−1/3, −1/5)t^3 + (1/3, 6/5)t^2 + (1, −1)t.

The second segment is calculated in a similar way:

                             [  2  −2   1   1 ] [    (1, 0)   ]
    P2(t) = (t^3, t^2, t, 1) [ −3   3  −2  −1 ] [    (1, 1)   ]
                             [  0   0   1   0 ] [  (2/3, 4/5) ]
                             [  1   0   0   0 ] [ (−2/3, 4/5) ]

          = (0, −2/5)t^3 + (−2/3, 3/5)t^2 + (2/3, 4/5)t + (1, 0).

Finally, we write, for the third segment,

                             [  2  −2   1   1 ] [    (1, 1)   ]
    P3(t) = (t^3, t^2, t, 1) [ −3   3  −2  −1 ] [    (0, 1)   ]
                             [  0   0   1   0 ] [ (−2/3, 4/5) ]
                             [  1   0   0   0 ] [  (−1, −1)   ]

          = (1/3, −1/5)t^3 − (2/3, 3/5)t^2 + (−2/3, 4/5)t + (1, 1),

which completes the example.

Exercise 12.2: Check to make sure that the three polynomial segments really connect at the two interior points. What are the tangent vectors at the points?

Exercise 12.3: Redo the example of this section with an indefinite initial direction Pt1 = (0, 0). What does it mean for a curve to start going in an indefinite direction?
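The matrix products above amount to Equation (11.3); a small helper (the name hermite_coeffs is an invention of this sketch) that expands them and can be used to check each segment:

```python
from fractions import Fraction as F

def hermite_coeffs(p1, p2, t1, t2):
    """Coefficients (a, b, c, d) of the Hermite segment
    P(t) = a t^3 + b t^2 + c t + d with endpoints p1, p2 and end tangents
    t1, t2 -- the product with the 4x4 Hermite matrix of Equation (11.7)."""
    a = tuple(2 * (F(x) - F(y)) + F(u) + F(v)
              for x, y, u, v in zip(p1, p2, t1, t2))
    b = tuple(3 * (F(y) - F(x)) - 2 * F(u) - F(v)
              for x, y, u, v in zip(p1, p2, t1, t2))
    c = tuple(F(u) for u in t1)
    d = tuple(F(x) for x in p1)
    return a, b, c, d

# First segment of the example: expect a = (-1/3, -1/5), b = (1/3, 6/5).
a, b, c, d = hermite_coeffs((0, 0), (1, 0), (1, -1), (F(2, 3), F(4, 5)))
```

The same call with the other two point/tangent pairs reproduces the coefficients of P2(t) and P3(t).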
12.1.2 Relaxed Cubic Splines

The original approach to the cubic spline curve is for the user to specify the two extreme tangent vectors. This approach is known as the clamped end condition. It is possible to have different end conditions, and the one described in this section is based on the simple idea of setting the two extreme second derivatives of the curve, Ptt1(0) and Pttn−1(1), to zero. If we think of the second derivative as the acceleration of the curve (see the particle paradigm of Section 8.6), then this end condition implies constant speeds and therefore small curvatures at both ends of the curve. This is why this end condition is called relaxed.

It is easy to calculate the relaxed cubic spline. The second derivative of the parametric cubic P(t) is Ptt(t) = 6at + 2b (Equation (12.3)). The end condition Ptt1(0) = 0 implies 2b1 = 0 or, from Equation (11.3),

−3P1 + 3P2 − 2Pt1 − Pt2 = 0,

which yields

Pt1 = (3/2)(P2 − P1) − (1/2)Pt2.   (12.8)
The other end condition, Pttn−1(1) = 0, implies 6an−1 + 2bn−1 = 0 or, from Equation (11.3),

6[2Pn−1 − 2Pn + Ptn−1 + Ptn] + 2[−3Pn−1 + 3Pn − 2Ptn−1 − Ptn] = 0,

or

Ptn = (3/2)(Pn − Pn−1) − (1/2)Ptn−1.   (12.9)
Substituting Equations (12.8) and (12.9) in Equation (12.7) results in

    [ 1  4  1  0  ...  0 ] [ (3/2)(P2 − P1) − (1/2)Pt2     ]   [ 3(P3 − P1)     ]
    [ 0  1  4  1  ...  0 ] [            Pt2                ]   [ 3(P4 − P2)     ]
    [        ...         ] [            ...                ] = [      ...       ]   (12.10)
    [ 0  ...  1  4  1  0 ] [           Ptn−1               ]   [ 3(Pn−1 − Pn−3) ]
    [ 0  ...     1  4  1 ] [ (3/2)(Pn − Pn−1) − (1/2)Ptn−1 ]   [ 3(Pn − Pn−2)   ]
This is a system of n − 2 equations in the n − 2 unknowns Pt2, Pt3, ..., Ptn−1. Calculating the relaxed cubic spline is done in the following steps:
1. Set up Equation (12.10) and solve it to obtain the n − 2 interior tangent vectors.
2. Use Pt2 to calculate Pt1 from Equation (12.8). Similarly, use Ptn−1 to calculate Ptn from Equation (12.9).
3. Now that the values of all n tangent vectors are known, write and solve Equation (11.4) or (11.7) n − 1 times, each time calculating one spline segment.

The clamped cubic spline is interactive. The curve can be modified by varying the two extreme tangent vectors. The relaxed cubic spline, on the other hand, is not interactive. The only way to edit or modify it is to move the points or add points. The points, however, are data points that may be dictated by the problem at hand or that may be given by a user or a client, so it may not always be possible to move them.

Example: We use the same four points P1 = (0, 0), P2 = (1, 0), P3 = (1, 1), and P4 = (0, 1) of Section 12.1.1. The first step is to set up Equation (12.10) and solve it to obtain the two interior tangent vectors Pt2 and Pt3:
    [ 1  4  1  0 ] [ (3/2, 0) − (1/2)Pt2  ]   [ (3, 3)  ]
    [ 0  1  4  1 ] [         Pt2          ] = [ (−3, 3) ].
                   [         Pt3          ]
                   [ (−3/2, 0) − (1/2)Pt3 ]

The solutions are Pt2 = (3/5, 2/3) and Pt3 = (−3/5, 2/3).
The second step is to calculate Pt1 and Pt4:

Pt1 = (3/2)(P2 − P1) − (1/2)Pt2 = (3/2, 0) − (1/2)(3/5, 2/3) = (6/5, −1/3),
Pt4 = (3/2)(P4 − P3) − (1/2)Pt3 = (−3/2, 0) − (1/2)(−3/5, 2/3) = (−6/5, −1/3).
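The whole relaxed computation can be sketched in a few lines (the function name is an invention of this sketch; it assumes at least four points, so that the modified first and last rows of Equation (12.10) are distinct):

```python
from fractions import Fraction as F

def relaxed_tangents(points):
    """All n tangent vectors of the relaxed cubic spline: solve
    Equation (12.10) for the interior tangents (the first and last
    diagonal entries become 7/2 after substituting Equations (12.8)
    and (12.9)), then recover Pt1 and Ptn from those two equations."""
    pts = [tuple(F(c) for c in p) for p in points]
    n, m = len(pts), len(pts) - 2
    sub = lambda u, v: tuple(a - b for a, b in zip(u, v))
    mul = lambda s, u: tuple(s * a for a in u)
    rhs = [mul(3, sub(pts[k + 2], pts[k])) for k in range(m)]
    rhs[0] = sub(rhs[0], mul(F(3, 2), sub(pts[1], pts[0])))
    rhs[-1] = sub(rhs[-1], mul(F(3, 2), sub(pts[-1], pts[-2])))
    diag = [F(7, 2)] + [F(4)] * (m - 2) + [F(7, 2)]
    # Thomas algorithm (sub- and superdiagonal entries are all 1).
    cp, dp = [1 / diag[0]], [mul(1 / diag[0], rhs[0])]
    for i in range(1, m):
        den = diag[i] - cp[i - 1]
        cp.append(1 / den)
        dp.append(mul(1 / den, sub(rhs[i], dp[i - 1])))
    x = [None] * m
    x[-1] = dp[-1]
    for i in range(m - 2, -1, -1):
        x[i] = sub(dp[i], mul(cp[i], x[i + 1]))
    p1t = sub(mul(F(3, 2), sub(pts[1], pts[0])), mul(F(1, 2), x[0]))
    pnt = sub(mul(F(3, 2), sub(pts[-1], pts[-2])), mul(F(1, 2), x[-1]))
    return [p1t] + x + [pnt]
```

For the four points of this example it reproduces Pt2 = (3/5, 2/3), Pt3 = (−3/5, 2/3), Pt1 = (6/5, −1/3), and Pt4 = (−6/5, −1/3).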
Now that the values of all four tangent vectors are known, the last step is to write and solve Equation (11.4) or (11.7) three times to calculate each of the three segments of our example curve. For the first segment, Equation (11.7) becomes

                             [  2  −2   1   1 ] [    (0, 0)   ]
    P1(t) = (t^3, t^2, t, 1) [ −3   3  −2  −1 ] [    (1, 0)   ]
                             [  0   0   1   0 ] [ (6/5, −1/3) ]
                             [  1   0   0   0 ] [ (3/5, 2/3)  ]

          = (−1/5, 1/3)t^3 + (6/5, −1/3)t.

For the second segment, Equation (11.7) becomes

                             [  2  −2   1   1 ] [    (1, 0)   ]
    P2(t) = (t^3, t^2, t, 1) [ −3   3  −2  −1 ] [    (1, 1)   ]
                             [  0   0   1   0 ] [ (3/5, 2/3)  ]
                             [  1   0   0   0 ] [ (−3/5, 2/3) ]

          = (0, −2/3)t^3 + (−3/5, 1)t^2 + (3/5, 2/3)t + (1, 0).
Exercise 12.4: Compute the third Hermite segment.
12.1.3 Cyclic Cubic Splines

The cyclic end condition is ideal for a closed cubic spline (Section 12.1.5) and also for a periodic cubic spline (Section 12.1.4). The condition is that the tangent vectors be equal at the two extremes of the curve (i.e., Pt1 = Ptn), and the same for the second derivatives, Ptt1 = Pttn. Notice that the curve doesn't have to be closed, i.e., a segment from Pn to P1 is not required.

Applying Equation (11.1) to the first condition yields Pt1(0) = Ptn−1(1), or

3a1 t^2 + 2b1 t + c1 |t=0 = 3an−1 t^2 + 2bn−1 t + cn−1 |t=1,

or

c1 = 3an−1 + 2bn−1 + cn−1.   (12.11)
Applying Equation (12.3) to the second condition yields Ptt1(0) = Pttn−1(1), or

6a1 t + 2b1 |t=0 = 6an−1 t + 2bn−1 |t=1,

or

2b1 = 6an−1 + 2bn−1.   (12.12)
Subtracting Equations (12.11) and (12.12) yields c1 − 2b1 = −3an−1 + cn−1 or, from Equation (11.3),

Pt1 − 2[−3P1 + 3P2 − 2Pt1 − Pt2] = −3[2Pn−1 − 2Pn + Ptn−1 + Ptn] + Ptn−1.

This can be written

5Pt1 + 3Ptn = 6(P2 − P1 + Pn − Pn−1) − 2(Pt2 + Ptn−1).

Using the end condition Pt1 = Ptn, we get

Pt1 = Ptn = (3/4)(P2 − P1 + Pn − Pn−1) − (1/4)(Pt2 + Ptn−1).   (12.13)
Substituting Equation (12.13) in Equation (12.7) results in

    [ 1  4  1  0  ...  0 ] [ (3/4)(P2 − P1 + Pn − Pn−1) − (1/4)(Pt2 + Ptn−1) ]   [ 3(P3 − P1)     ]
    [ 0  1  4  1  ...  0 ] [                      Pt2                        ]   [ 3(P4 − P2)     ]
    [        ...         ] [                      ...                        ] = [      ...       ]   (12.14)
    [ 0  ...  1  4  1  0 ] [                     Ptn−1                       ]   [ 3(Pn−1 − Pn−3) ]
    [ 0  ...     1  4  1 ] [ (3/4)(P2 − P1 + Pn − Pn−1) − (1/4)(Pt2 + Ptn−1) ]   [ 3(Pn − Pn−2)   ]

which is a system of n − 2 equations in the n − 2 unknowns Pt2, Pt3, ..., Ptn−1. Notice that in the case of a closed curve, these equations are somewhat simplified because the two extreme points P1 and Pn are identical.

Calculating the cyclic cubic spline is done in the following steps:
1. Set up Equation (12.14) and solve it to obtain the n − 2 interior tangent vectors.
2. Use Pt2 and Ptn−1 to calculate Pt1 and Ptn from Equation (12.13).
3. Now that the values of all n tangent vectors are known, write and solve Equation (11.4) or (11.7) n − 1 times, each time calculating one spline segment.
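Equation (12.13) itself is a direct formula; a one-line sketch (the function name is an invention of this sketch, and the numbers used to exercise it are those of the symmetric five-point example that follows):

```python
from fractions import Fraction as F

def cyclic_end_tangent(p1, p2, pn_1, pn, t2, tn_1):
    """Equation (12.13): the common extreme tangent Pt1 = Ptn of a cyclic
    cubic spline, from the extreme point pairs (p1, p2) and (pn_1, pn)
    and the two adjacent tangents t2 = Pt2, tn_1 = Ptn-1."""
    return tuple(F(3, 4) * (F(b) - F(a) + F(q) - F(p)) - F(1, 4) * (F(u) + F(v))
                 for a, b, p, q, u, v in zip(p1, p2, pn_1, pn, t2, tn_1))

# Symmetric closed example: P1 = P5 = (0, -1), P2 = (1, 0), P4 = (-1, 0),
# with Pt2 = (0, 3/2) and Pt4 = (0, -3/2).
t1 = cyclic_end_tangent((0, -1), (1, 0), (-1, 0), (0, -1),
                        (0, F(3, 2)), (0, F(-3, 2)))
```

This evaluates to t1 = (3/2, 0), the value obtained in the example below.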
Example: We select the five points P1 = P5 = (0, −1), P2 = (1, 0), P3 = (0, 1), and P4 = (−1, 0) and calculate the cubic spline with the cyclic end condition for these points. Notice that the curve is closed since P1 = P5. Also, since the points are symmetric about the origin, we can expect the resulting four PC segments to be similar. We start with Equation (12.14):

    [ 1  4  1  0  0 ] [ (3/4)(P2 − P1 + P5 − P4) − (1/4)(Pt2 + Pt4) ]   [ 3(P3 − P1) ]
    [ 0  1  4  1  0 ] [                    Pt2                      ]   [ 3(P4 − P2) ]
    [ 0  0  1  4  1 ] [                    Pt3                      ] = [ 3(P5 − P3) ],
                      [                    Pt4                      ]
                      [ (3/4)(P2 − P1 + P5 − P4) − (1/4)(Pt2 + Pt4) ]

which is solved to yield Pt2 = (0, 3/2), Pt3 = (−3/2, 0), and Pt4 = (0, −3/2). These values are used to solve Equation (12.13),

Pt1 = Pt5 = (3/4)(P2 − P1 + P5 − P4) − (1/4)(Pt2 + Pt4),

which gives Pt1 = Pt5 = (3/2, 0). The four segments can now be calculated in the usual way. For the first segment, Equation (11.7) becomes

                             [  2  −2   1   1 ] [ (0, −1)  ]
    P1(t) = (t^3, t^2, t, 1) [ −3   3  −2  −1 ] [  (1, 0)  ]
                             [  0   0   1   0 ] [ (3/2, 0) ]
                             [  1   0   0   0 ] [ (0, 3/2) ]

          = −(1/2, 1/2)t^3 + (0, 3/2)t^2 + (3/2, 0)t + (0, −1).

For the second segment, Equation (11.7) becomes

                             [  2  −2   1   1 ] [  (1, 0)   ]
    P2(t) = (t^3, t^2, t, 1) [ −3   3  −2  −1 ] [  (0, 1)   ]
                             [  0   0   1   0 ] [ (0, 3/2)  ]
                             [  1   0   0   0 ] [ (−3/2, 0) ]

          = (1/2, −1/2)t^3 + (−3/2, 0)t^2 + (0, 3/2)t + (1, 0).
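The tangents of this example can be reproduced by substituting Equation (12.13) into the first and last rows of the system and solving the resulting 3×3 system; the expanded rows below (diagonal entries 15/4 and right-hand-side adjustments of −(3/4)(P2 − P1 + P5 − P4)) are this sketch's own expansion, not printed in the text:

```python
from fractions import Fraction as F

def solve(A, b):
    """Gauss-Jordan elimination over exact rationals."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [v - M[r][c] * w for v, w in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

# Rows of Equation (12.14) for the five symmetric points, after the
# substitution of Equation (12.13); the unknowns are Pt2, Pt3, Pt4.
A = [[F(15, 4), F(1), F(-1, 4)],
     [F(1),     F(4), F(1)],
     [F(-1, 4), F(1), F(15, 4)]]
bx = [F(-3, 2), F(-6), F(-3, 2)]   # x components of the adjusted right side
by = [F(6),     F(0),  F(-6)]      # y components
tangents = list(zip(solve(A, bx), solve(A, by)))
```

The result matches the values quoted above: Pt2 = (0, 3/2), Pt3 = (−3/2, 0), Pt4 = (0, −3/2).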
Exercise 12.5: Compute the third and fourth Hermite segments. Notice how the symmetry of the problem causes the coefficients of P1(t) and P3(t) to have opposite signs, and the same for the coefficients of P2(t) and P4(t).

It is also possible to have an anticyclic end condition for the cubic spline. It requires that the two extreme tangent vectors have the same magnitudes but opposite directions, Pt1 = −Ptn, and the same condition for the second derivatives, Ptt1 = −Pttn. Such an end condition makes sense for curves such as the cross section of a vase or any other surface of revolution. Following steps similar to the ones for the cyclic case, we get for the anticyclic end condition

Pt1 = −Ptn = (3/4)(P2 − P1 − Pn + Pn−1) − (1/4)(Pt2 − Ptn−1).   (12.15)
Exercise 12.6: Given the three points P1 = (−1, 0), P2 = (0, 1), and P3 = (1, 0), calculate the anticyclic cubic spline for them and compare it to the clamped cubic spline for the same points.
12.1.4 Periodic Cubic Splines

A periodic function f(x) is one that repeats itself. If p is the period of the function, then f(x + p) = f(x) for any x. A two-dimensional cubic spline is periodic if it has the same extreme tangent vectors (i.e., if it starts and ends going in the same direction) and if its two extreme points P(0) and P(1) have the same y coordinate. If the curve satisfies these conditions, then we can place consecutive copies of it side by side and the result would look like a single periodic curve.

The case of a three-dimensional periodic cubic spline is less clear. It seems that the two extreme points can be any points (they don't have to have the same y or z coordinates or any other relationship), so the condition for periodicity is that the curve will have the same start and end tangents, i.e., it will be cyclic.

Example: Exercise 8.11 shows that the parametric expression (cos t, sin t, t) describes a helix (see also Section 9.4.1 for a double helix). Modifying this expression to P(t) = (0.05t + cos t, sin t, 0.1t) creates a helix that moves in the x direction as it climbs up in the z direction. Figure 12.3 shows its behavior. This curve starts at P(0) = (1, 0, 0) and ends at P(10π) = (0.5π + 1, 0, π). There is no special relation between the start and end points, but the curve is periodic since both its start and end tangents equal Pt(0) = Pt(10π) = (0.05, 1, 0.1). We can construct another period of this curve by copying it, moving the copy parallel to itself, and placing it such that the start point of the copy is at the end point of the original curve.

Notice that it is possible to make the start and end points even more unrelated by, for example, tilting the helix also in the y direction as it climbs up in the z direction. This kind of effect is achieved by an expression such as P(t) = (0.05t + cos t, −0.05t^2 + sin t, 0.1t).
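The periodicity claim for the tilted helix is easy to verify numerically (a small sketch; the function names are conveniences of this sketch):

```python
from math import cos, sin, pi, isclose

def P(t):
    """The tilted helix P(t) = (0.05t + cos t, sin t, 0.1t)."""
    return (0.05 * t + cos(t), sin(t), 0.1 * t)

def Pt(t):
    """Its tangent vector P'(t) = (0.05 - sin t, cos t, 0.1)."""
    return (0.05 - sin(t), cos(t), 0.1)

# The endpoints are unrelated, but the start and end tangents are both
# (0.05, 1, 0.1), so a translated copy continues the curve smoothly.
start, end = Pt(0.0), Pt(10 * pi)
```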
12.1.5 Closed Cubic Splines

A closed cubic spline has an extra curve segment from Pn to P1 that closes the curve. In such a curve, every point is interior, so Equation (12.7) becomes a system of n equations in the same n unknowns. No user input is needed, which implies that the only way to control or modify such a curve is to move, add, or delete points. It is convenient to define the two additional points Pn+1 = P1 and Pn+2 = P2. Equation (12.7) then becomes
(* tilted helix as a periodic curve *) ParametricPlot3D[{.05t+Cos[t],Sin[t],.1t},{t,0,10Pi}, Ticks->{{-1,0,1,2},{-1,0,1},{0,1,2,3}}, PlotPoints->100,PlotStyle->Red]
Figure 12.3: A Tilted Helix as a Periodic Curve.
    [ 1  4  1  0  ...  0  0 ] [ Pt1   ]   [ 3(P3 − P1)     ]
    [ 0  1  4  1  ...  0  0 ] [ Pt2   ]   [ 3(P4 − P2)     ]
    [          ...          ] [ ...   ] = [      ...       ]   (12.16)
    [ 1  0  ...     0  1  4 ] [ Ptn−1 ]   [ 3(Pn+1 − Pn−1) ]
    [ 4  1  0  ...     0  1 ] [ Ptn   ]   [ 3(Pn+2 − Pn)   ]

(an n×n system; the last two rows wrap around because of the extra segment from Pn to P1).

Example: Given the four points of Section 12.1.1, P1 = (0, 0), P2 = (1, 0), P3 = (1, 1), and P4 = (0, 1), we are looking for four Hermite segments P1(t), P2(t), P3(t), and P4(t) that would connect smoothly at the four points. Equation (12.16) becomes

    [ 1  4  1  0 ] [ Pt1 ]   [ 3(P3 − P1) ]
    [ 0  1  4  1 ] [ Pt2 ]   [ 3(P4 − P2) ]
    [ 1  0  1  4 ] [ Pt3 ] = [ 3(P1 − P3) ].   (12.17)
    [ 4  1  0  1 ] [ Pt4 ]   [ 3(P2 − P4) ]
Its solutions are Pt1 = (3/4, −3/4), Pt2 = (3/4, 3/4), Pt3 = (−3/4, 3/4), and Pt4 = (−3/4, −3/4), and the four spline segments are P1 (t) = (−1/2, 0)t3 + (3/4, 3/4)t2 + (3/4, −3/4)t,
P2 (t) = (0, −1/2)t3 + (−3/4, 3/4)t2 + (3/4, 3/4)t + (1, 0), P3 (t) = (1/2, 0)t3 + (−3/4, −3/4)t2 + (−3/4, 3/4)t + (1, 1), P4 (t) = (0, 1/2)t3 + (3/4, −3/4)t2 + (−3/4, −3/4)t + (0, 1).
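A quick check that the tangents quoted above satisfy Equation (12.17), using exact rational arithmetic (a sketch; variable names are conveniences of this sketch):

```python
from fractions import Fraction as F

# Coefficient matrix of Equation (12.17) (the last two rows wrap around),
# plus the points and the tangents quoted in the text.
A = [[1, 4, 1, 0],
     [0, 1, 4, 1],
     [1, 0, 1, 4],
     [4, 1, 0, 1]]
T = [(F(3, 4), F(-3, 4)), (F(3, 4), F(3, 4)),
     (F(-3, 4), F(3, 4)), (F(-3, 4), F(-3, 4))]
P = [(0, 0), (1, 0), (1, 1), (0, 1)]
# Right-hand sides 3(P_{k+2} - P_k), indices taken modulo 4.
rhs = [tuple(3 * (F(a) - F(b)) for a, b in zip(P[(i + 2) % 4], P[i]))
       for i in range(4)]
# Left-hand sides: each row of A applied to the list of tangents.
lhs = [tuple(sum(A[i][j] * T[j][c] for j in range(4)) for c in range(2))
       for i in range(4)]
```

Here `lhs == rhs` holds, confirming that the four quoted tangents solve the closed system.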
12.1.6 Nonuniform Cubic Splines

All the different types of cubic splines discussed so far assume that the parameter t varies in the interval [0, 1] in each segment. These types of cubic spline are therefore uniform or normalized. The nonuniform cubic spline is obtained by adding another parameter tk to each spline segment and letting t vary in the interval [0, tk]. Since there are n − 1 spline segments connecting the n data points, this adds n − 1 parameters to the curve, which makes it easier to fine-tune the shape of the curve.

The nonuniform cubic splines are especially useful in cases where the data points are nonuniformly spaced. In regions where the points are closely spaced, the normalized cubic spline tends to develop loops and overshoots. In regions where the points are widely spaced, it tends to "cut corners," i.e., to be too tight. Careful selection of the tk parameters can overcome these tendencies.

The calculation of the nonuniform cubic spline is based on that of the uniform version. We simply rewrite some of the basic equations, substituting tk for 1 as the final value of t. We start with Equation (11.2), which becomes, for the first spline segment,

a·0^3 + b·0^2 + c·0 + d = P1,
a t1^3 + b t1^2 + c t1 + d = P2,
3a·0^2 + 2b·0 + c = Pt1,
3a t1^2 + 2b t1 + c = Pt2,

with solutions

a = 2(P1 − P2)/t1^3 + Pt1/t1^2 + Pt2/t1^2,
b = 3(P2 − P1)/t1^2 − 2Pt1/t1 − Pt2/t1,   (12.18)
c = Pt1,
d = P1.

Equation (11.4) now becomes

P(t) = [2(P1 − P2)/t1^3 + Pt1/t1^2 + Pt2/t1^2] t^3 + [3(P2 − P1)/t1^2 − 2Pt1/t1 − Pt2/t1] t^2 + Pt1 t + P1.   (12.19)

Equation (12.4) becomes Pttk(tk) = Pttk+1(0), or

6ak×tk + 2bk = 6ak+1×0 + 2bk+1,   (12.20)
and Equation (12.5) is now

6tk [2(Pk − Pk+1)/tk^3 + Ptk/tk^2 + Ptk+1/tk^2] + 2[3(Pk+1 − Pk)/tk^2 − 2Ptk/tk − Ptk+1/tk]
    = 2[3(Pk+2 − Pk+1)/tk+1^2 − 2Ptk+1/tk+1 − Ptk+2/tk+1].   (12.21)

Equation (12.6) now becomes

tk+1 Ptk + 2(tk + tk+1)Ptk+1 + tk Ptk+2 = (3/(tk tk+1)) [tk^2 (Pk+2 − Pk+1) + tk+1^2 (Pk+1 − Pk)].   (12.22)
This produces the new version of Equation (12.7):

    [ t2  2(t1 + t2)  t1          0    ...  0    ] [ Pt1 ]
    [ 0   t3          2(t2 + t3)  t2   ...  0    ] [ Pt2 ]
    [                   ...                      ] [ ... ]
    [ 0   ...  tn−1  2(tn−2 + tn−1)  tn−2        ] [ Ptn ]

        [ (3/(t1 t2)) [t1^2 (P3 − P2) + t2^2 (P2 − P1)]               ]
      = [ (3/(t2 t3)) [t2^2 (P4 − P3) + t3^2 (P3 − P2)]               ]   (12.23)
        [                        ...                                  ]
        [ (3/(tn−2 tn−1)) [tn−2^2 (Pn − Pn−1) + tn−1^2 (Pn−1 − Pn−2)] ]
This is again a system of n − 2 equations in the n unknowns Pt1, Pt2, ..., Ptn. After the user inputs the guessed or estimated values for the two extreme tangent vectors Pt1 and Ptn, this system can be solved, yielding the values of the remaining n − 2 tangent vectors. Each of the n − 1 spline segments can now be calculated by means of Equation (12.18), which is written here for the first segment in compact form:

    [ a ]   [  2/t1^3  −2/t1^3   1/t1^2   1/t1^2 ] [ P1  ]
    [ b ] = [ −3/t1^2   3/t1^2  −2/t1    −1/t1   ] [ P2  ]   (12.24)
    [ c ]   [  0        0        1        0      ] [ Pt1 ]
    [ d ]   [  1        0        0        0      ] [ Pt2 ]
Notice how each of Equations (12.18) through (12.24) reduces to the corresponding original equation when all the ti are set to 1. The nonuniform cubic spline can now be calculated in the following steps:
1. The user inputs the values of the two extreme tangent vectors and the values of the n − 1 parameters tk. The software sets up and solves Equation (12.23) to calculate the remaining tangent vectors.
2. The software sets up and solves Equation (12.24) n − 1 times, once for each of the spline segments.
3. Each segment Pk(t) is plotted by varying t from 0 to tk.

Before looking at an example, it is useful to try to understand the advantage of having the extra parameters tk. Equation (12.18) shows that a large value of tk for spline segment Pk(t) means small a and b coefficients (since tk appears in the denominators), and hence a small second derivative Pttk(t) = 6ak t + 2bk for that segment. Since the second derivative can be interpreted as the acceleration of the curve, we can predict that a large tk will result in small overall acceleration for segment k. Thus, most of the segment will be close to a straight line. This is also easy to see when we substitute small a and b in Pk(t) = at^3 + bt^2 + ct + d. The dominant part of the segment becomes ct + d, which brings it close to linear. If the start and end directions of the segment are very different, the entire segment cannot be a straight line, so, in order to minimize its overall second derivative, the segment will end up consisting of two or three parts, each close to a straight line, with short, highly-curved corners connecting them (Figure 12.4). Such a geometry has a small overall second derivative. This knowledge is useful when designing curves, which is why the nonuniform cubic spline should not be dismissed as impractical. It may be the best method for certain curves.
Figure 12.4: Curves with Small Overall Second Derivative.
Example: The four points of Section 12.1.1 are used in this example. They are P1 = (0, 0), P2 = (1, 0), P3 = (1, 1), and P4 = (0, 1). We also select the same initial and final directions Pt1 = (1, −1) and Pt4 = (−1, −1). We decide to use tk = 2 for each of the three spline segments to illustrate how large tk values create a curve very different from the one of Section 12.1.1. Equation (12.23) becomes

    [ t2  2(t1 + t2)  t1          0  ] [ (1, −1)  ]   [ (3/(t1 t2)) [t1^2 (P3 − P2) + t2^2 (P2 − P1)] ]
    [ 0   t3          2(t2 + t3)  t2 ] [   Pt2    ] = [ (3/(t2 t3)) [t2^2 (P4 − P3) + t3^2 (P3 − P2)] ].
                                       [   Pt3    ]
                                       [ (−1, −1) ]

For t1 = t2 = t3 = 2, this yields Pt2 = (1/6, 1/2) and Pt3 = (−1/6, 1/2). Equation (12.24) is now written and solved three times:
Segment 1:

    [ a ]   [  2/t1^3  −2/t1^3   1/t1^2   1/t1^2 ] [   (0, 0)   ]
    [ b ] = [ −3/t1^2   3/t1^2  −2/t1    −1/t1   ] [   (1, 0)   ]
    [ c ]   [  0        0        1        0      ] [  (1, −1)   ]
    [ d ]   [  1        0        0        0      ] [ (1/6, 1/2) ]

Segment 2:

    [ a ]   [  2/t2^3  −2/t2^3   1/t2^2   1/t2^2 ] [    (1, 0)   ]
    [ b ] = [ −3/t2^2   3/t2^2  −2/t2    −1/t2   ] [    (1, 1)   ]
    [ c ]   [  0        0        1        0      ] [  (1/6, 1/2) ]
    [ d ]   [  1        0        0        0      ] [ (−1/6, 1/2) ]

Segment 3:

    [ a ]   [  2/t3^3  −2/t3^3   1/t3^2   1/t3^2 ] [    (1, 1)   ]
    [ b ] = [ −3/t3^2   3/t3^2  −2/t3    −1/t3   ] [    (0, 1)   ]
    [ c ]   [  0        0        1        0      ] [ (−1/6, 1/2) ]
    [ d ]   [  1        0        0        0      ] [  (−1, −1)   ]
This yields the coefficients for the three spline segments: P1 (t) = (1/24, −1/8)t3 + (−1/3, 3/4)t2 + (1, −1)t, P2 (t) = (0, 0)t3 + (−1/12, 0)t2 + (1/6, 1/2)t + (1, 0), P3 (t) = −(1/24, 1/8)t3 + (−1/12, 0)t2 + (−1/6, 1/2)t + (1, 1). The result is shown in Figure 12.5. It should be compared with the uniform curve of Figure 12.2 that’s based on the same four points. (Recall that t varies from 0 to 2 in each of the segments above.)
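Equation (12.24) can be checked numerically for the first segment (the helper name is an invention of this sketch):

```python
from fractions import Fraction as F

def nonuniform_coeffs(p1, p2, t1v, t2v, tk):
    """Equation (12.24): coefficients (a, b, c, d) of a nonuniform Hermite
    segment whose parameter runs from 0 to tk, with endpoints p1, p2 and
    end tangents t1v, t2v."""
    tk = F(tk)
    a = tuple(2 * (F(x) - F(y)) / tk**3 + (F(u) + F(v)) / tk**2
              for x, y, u, v in zip(p1, p2, t1v, t2v))
    b = tuple(3 * (F(y) - F(x)) / tk**2 - (2 * F(u) + F(v)) / tk
              for x, y, u, v in zip(p1, p2, t1v, t2v))
    return a, b, tuple(F(u) for u in t1v), tuple(F(x) for x in p1)

# Segment 1 of the example: t1 = 2, tangents (1, -1) and (1/6, 1/2).
a, b, c, d = nonuniform_coeffs((0, 0), (1, 0), (1, -1), (F(1, 6), F(1, 2)), 2)
```

This reproduces the quoted coefficients a = (1/24, −1/8) and b = (−1/3, 3/4); setting tk = 1 instead recovers the uniform coefficients of Section 12.1.1.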
(* Nonuniform cubic spline example *) C1:=ParametricPlot[{1/24,-1/8}t^3+{-1/3,3/4}t^2+{1,-1}t, {t,0,2}]; C2:=ParametricPlot[{-1/12,0}t^2+{1/6,1/2}t+{1,0}, {t,0,2}]; C3:=ParametricPlot[-{1/24,1/8}t^3+{-1/12,0}t^2+{-1/6,1/2}t+{1,1}, {t,0,2}]; Show[C1, C2, C3, PlotRange->All, AspectRatio->Automatic]
Figure 12.5: A Nonuniform Cubic Spline Example.
12.1.7 Fair Cubic Splines

The term "fair curve" refers to a spline curve where each segment is close to a circular arc. Such a curve does not have very flat or very curved segments (e.g., segments with loops) and is generally considered pleasing to the eye. It is useful in artistic applications and in font design, where the aim is to get beautiful curves rather than a precise fit to a given set of points.

The approach presented here is based on [Manning 74] and is illustrated by Figure 12.6. In Figure 12.6a we see four Hermite segments connecting the same two endpoints, all with 45° tangent vectors. The difference between them is the magnitude of the vectors. If we denote the distance between the points by l, then Figure 12.6a(i)–(iii) shows curves whose tangent vectors have magnitudes smaller than, equal to, and greater than l, respectively. In Figure 12.6a(iv), the left tangent has magnitude > l and the right one < l, resulting in a curve that's "pulled" to the right. Of these four, curve (ii) can be considered "fair," since it is close to a special circular arc, namely the arc that passes through the endpoints of the curve at the same angle as the tangent vectors.

Figure 12.6: Varying Tangent Vector Magnitudes.
Exercise 12.7: Two given points P1 and P2 define a straight segment P1 → P2. Your task is to construct a "fair" Hermite segment with P1 and P2 as its endpoints. Here is what you need to show. Imagine the circular arc that passes through P1 and P2 and makes angles θ with the line P1 → P2. Show that if the center point P(0.5) of the Hermite segment is located on this circular arc, then the magnitudes of the tangent vectors satisfy

|Pt1| = |Pt2| = 2|P2 − P1| / (1 + cos θ).   (12.25)

Figure 12.6b shows a Hermite segment with tangent vectors whose magnitudes satisfy Equation (12.25). The curve is therefore close to the special circular arc mentioned earlier. In Figure 12.6c, the left tangent vector P1 has been rotated (but has the same magnitude), thereby increasing angle θ1 and pulling the curve to the left. Figure 12.6d shows how the curve can (approximately) be returned to its former shape by shortening P1.

This suggests a way to build a complete cubic spline where every segment is fair. At every internal point Pk+1, the "incoming" and "outgoing" tangent vectors should go in the same direction but should have different magnitudes. Figure 12.6e shows the terminology used. The incoming tangent vector at interior point Pk+1 is Ptk(1) = lk+1 Tk+1, where all Ti are unit vectors and lk+1 is a scalar (the magnitude of the tangent). Similarly, the outgoing tangent vector at Pk+1 is Ptk+1(0) = rk+1 Tk+1, the same unit vector, but with a different magnitude rk+1.

Equation (12.25) can be considered the definition of a fair Hermite segment in the special case θ1 = θ2. The previous paragraph suggests a way to extend it to cases where θ1 ≠ θ2. We need to express the magnitudes of the tangents Pt1 and Pt2 in terms of the endpoints and the angles θ1 and θ2 such that θ1 > θ2 will result in |Pt1| < |Pt2|. One way to achieve this effect is to define

r1 = |Pt1| = 2|P2 − P1| / (1 + α cos θ2 + (1 − α) cos θ1),   (12.26)
l2 = |Pt2| = 2|P2 − P1| / (1 + α cos θ1 + (1 − α) cos θ2),   (12.27)
where α is a parameter in the range [0, 1] to be determined by the user. This, of course, is not the only way to define a fair curve, but this definition has two useful properties:
1. For θ1 = θ2, Equations (12.26) and (12.27) reduce to Equation (12.25).
2. If θ1 > θ2, then cos θ1 < cos θ2 (for fair curves, we can assume angles between 0° and 90°). In order to achieve |Pt1| < |Pt2|, we need a situation where (1 − α) cos θ1 < α cos θ2. This is satisfied when 1 − α ≤ α, i.e., when 0.5 ≤ α ≤ 1.

[Manning 74] suggests α = 2/3, but it seems that α should be left as a user-defined parameter, especially for closed fair curves, which are discussed later, where there is no other parameter for the user to adjust.

The condition for slope continuity at the n − 2 interior points is thus written hk+1 Ptk(1) = Ptk+1(0), where hk+1 is the ratio of the tangent vectors' magnitudes. We denote by Tk a unit tangent vector, so we can write hk+1 lk+1 Tk+1 = rk+1 Tk+1, or hk+1 = rk+1/lk+1 for k = 1, 2, ..., n − 2. The two tangents Ptk(1) and Ptk+1(0) go in the same direction, so hk+1 must be positive.

The quantities l2, l3, ..., ln and r1, r2, ..., rn−1 make a total of 2n − 2 unknowns that have to be calculated. The equations to calculate them are obtained by the requirement that the curvatures of the spline segments be equal at the n − 2 interior points. When
Equation (8.19) is used for the curvature, this requirement becomes

d^2 Pk(1)/ds^2 = d^2 Pk+1(0)/ds^2,

or

Pttk(1)/(Ptk(1) • Ptk(1)) − [(Ptk(1) • Pttk(1))/(Ptk(1) • Ptk(1))^2] Ptk(1)
    = Pttk+1(0)/(Ptk+1(0) • Ptk+1(0)) − [(Ptk+1(0) • Pttk+1(0))/(Ptk+1(0) • Ptk+1(0))^2] Ptk+1(0).

This is simplified by substituting hk+1 Ptk(1) = Ptk+1(0) and multiplying by (Ptk(1) • Ptk(1)), yielding

Pttk(1) − [(Ptk(1) • Pttk(1))/(Ptk(1) • Ptk(1))] Ptk(1)
    = Pttk+1(0)/hk+1^2 − [(Ptk(1) • Pttk+1(0))/(hk+1^2 Ptk(1) • Ptk(1))] Ptk(1),

or

hk+1^2 Pttk(1) − Pttk+1(0) = [(hk+1^2 Ptk(1) • Pttk(1) − Ptk(1) • Pttk+1(0))/(Ptk(1) • Ptk(1))] Ptk(1).   (12.28)
Equation (12.28) can be written

hk+1^2 Pttk(1) − Pttk+1(0) = Mk+1 Ptk(1),   (12.29)

where the quantity Mk+1, defined by

Mk+1 = [hk+1^2 Ptk(1) • Pttk(1) − Ptk(1) • Pttk+1(0)] / (Ptk(1) • Ptk(1)),

is a scalar combining all the scalar quantities from the right-hand side of Equation (12.28).

Fair: To draw and adjust (the lines of a ship's hull being designed) to produce regular surfaces of the correct form. — A dictionary definition.

The next step is to replace the two second derivatives on the left side of Equation (12.29) with expressions containing the unknowns lk, rk, and Tk (and, perhaps, some known quantities, such as the points Pk). This will provide equations whose solutions will yield the tangent vectors Ptk at all the points. We start with the second derivative of a PC, Ptt(t) = 6at + 2b (Equation (12.3)), where a and b are given by Equation (11.3). The two second derivatives used in Equation (12.29) are (see also Figure 12.6e) as follows:
1. From segment Pk(t),

Pttk(1) = 6ak×1 + 2bk
        = 6[2(Pk − Pk+1) + Ptk + Ptk+1] + 2[3(Pk+1 − Pk) − 2Ptk − Ptk+1]
        = −6(Pk+1 − Pk) + 2Ptk + 4Ptk+1
        = −6(Pk+1 − Pk) + 2rk Tk + 4lk+1 Tk+1.   (12.30)

2. From segment Pk+1(t),

Pttk+1(0) = 6ak+1×0 + 2bk+1
          = 2[3(Pk+2 − Pk+1) − 2Ptk+1 − Ptk+2]
          = 6(Pk+2 − Pk+1) − 4rk+1 Tk+1 − 2lk+2 Tk+2.   (12.31)
Substituting Equations (12.30) and (12.31) into Equation (12.29) and taking into account that hk = rk/lk for k = 1, 2, ..., n − 2, we get

(rk+1^2/lk+1^2)[−6(Pk+1 − Pk) + 2rk Tk + 4lk+1 Tk+1] − [6(Pk+2 − Pk+1) − 4rk+1 Tk+1 − 2lk+2 Tk+2] = Mk+1 Ptk(1).

Multiplying by lk+1^2 and simplifying yields

rk+1^2 [−6(Pk+1 − Pk) + 2rk Tk + 4lk+1 Tk+1] − lk+1^2 [6(Pk+2 − Pk+1) − 4rk+1 Tk+1 − 2lk+2 Tk+2]
    = lk+1^2 Mk+1 Ptk(1) = lk+1^3 Mk+1 Tk+1,

or

−6rk+1^2 (Pk+1 − Pk) + 2rk+1^2 rk Tk + 4rk+1^2 lk+1 Tk+1 − 6lk+1^2 (Pk+2 − Pk+1) + 4lk+1^2 rk+1 Tk+1 + 2lk+1^2 lk+2 Tk+2 = lk+1^3 Mk+1 Tk+1.

Dividing by 2 and changing signs results in

3rk+1^2 (Pk+1 − Pk) + 3lk+1^2 (Pk+2 − Pk+1) − rk+1^2 rk Tk − lk+1^2 lk+2 Tk+2
    = [−(1/2) lk+1^3 Mk+1 + 2rk+1^2 lk+1 + 2lk+1^2 rk+1] Tk+1 = Lk+1 Tk+1,   (12.32)

where Lk+1 is defined as everything that multiplies Tk+1 on the right-hand side of Equation (12.32).

Equation (12.32) is a vector equation that can be written n − 2 times, for k = 1, 2, ..., n − 2. It therefore yields 2(n − 2) or 3(n − 2) equations, depending on whether
the original data points Pk are two- or three-dimensional. However, more equations are needed. To derive them, we turn our attention to Figure 12.7, which shows the relation between a unit tangent vector Tk and the angle θk between it and the line connecting the two endpoints, Pk+1 − Pk: cos θk = Tk • (Pk+1 − Pk)/|Pk+1 − Pk|.

Figure 12.7: Relation Between θ and T.
For segment Pk(t), we get from Equation (12.27)

lk+1 = |Ptk+1| = 2|Pk+1 − Pk| / (1 + αTk • (Pk+1 − Pk)/|Pk+1 − Pk| + (1 − α)Tk+1 • (Pk+1 − Pk)/|Pk+1 − Pk|)
     = 2|Pk+1 − Pk| / (1 + [αTk + (1 − α)Tk+1] • (Pk+1 − Pk)/|Pk+1 − Pk|).   (12.33)
From Equation (12.26),

rk = |Ptk| = 2|Pk+1 − Pk| / (1 + αTk+1 • (Pk+1 − Pk)/|Pk+1 − Pk| + (1 − α)Tk • (Pk+1 − Pk)/|Pk+1 − Pk|)
   = 2|Pk+1 − Pk| / (1 + [αTk+1 + (1 − α)Tk] • (Pk+1 − Pk)/|Pk+1 − Pk|).   (12.34)
Equations (12.33) and (12.34) are scalar. They can be written for each of the n − 1 spline segments (i.e., for k = 1, 2, . . . , n − 1), providing 2n − 2 additional equations. The total number of equations is thus (2n − 4) + (2n − 2) = 4n − 6 for the two-dimensional case and (3n − 6) + (2n − 2) = 5n − 8 for the three-dimensional case. The unknowns are l2 , l3 , . . . , ln , r1 , r2 , . . . , rn−1 , L2 , L3 , . . . , Ln−1 and T1 , T2 , . . . , Tn , a total of (n − 1) + (n − 1) + (n − 2) + n = 4n − 4 unknowns. It is important to realize that in the two-dimensional case, each unit vector Tk is equivalent to only one unknown (since it can be written (cos θ, sin θ) for some angle θ). In the three-dimensional case, each Tk is equivalent to two unknowns, so the number of unknowns in that case is 5n−4. We thus end up with 4n − 6 equations and 4n − 4 unknowns (in the two-dimensional case) or 5n − 8 equations and 5n − 4 unknowns (in the three-dimensional case). We seem to be two or four equations short, but we already know from past experience with cubic
splines that this apparent problem can be turned to our advantage. The solution, of course, is to ask the user to supply the values of the two extreme unit tangents T1 and Tn. This provides the two or four necessary values to bring the number of unknowns down to the number of equations. This, together with the free parameter α, turns fair cubic splines into an interactive method.

The discussion so far has assumed an open curve, but closed curves are also useful in practical problems. In fact, the original fair cubic splines were developed by [Manning 74] to help design insoles for shoes; a useful, practical example of a closed curve. A closed curve does not have endpoints; every point can be considered interior. Equation (12.32) can thus be written n times, providing 2n or 3n equations. A closed curve also requires n segments, instead of n − 1. Each of Equations (12.33) and (12.34) can thus be written n times, once for each segment. The total number of equations is therefore 4n or 5n. The unknowns are l1, l2, . . . , ln, r1, r2, . . . , rn, L1, L2, . . . , Ln and T1, T2, . . . , Tn, a total of 4n or 5n unknowns. A closed curve can thus be computed based on the n data points, without any additional user input. If it is unsatisfactory, it can be edited by moving the points or changing the value of the parameter α.

One drawback of this method is that the equations are not linear. This makes them complicated to solve, and it also means that there may be either no solution or several different solutions. A simple, iterative algorithm for solving the equations is the following:
1. Guess reasonable initial values for the unit tangents Tk. A good idea is to set Tk in the direction Pk+1 − Pk.
2. Using these values, solve Equations (12.33) and (12.34) to calculate initial values for all the lk and rk unknowns.
3. Substitute the current values of Tk, lk, and rk in the left-hand side of Equation (12.32) to obtain better values for the products LkTk. Once such a product is known, Tk can be calculated since its magnitude is 1.
4. Repeat steps 2 and 3 until the process converges (i.e., until none of lk, rk, and Tk changes between consecutive iterations by more than a preset threshold).

As has been mentioned earlier, this process is not guaranteed to converge, or it may converge to one of several possible solutions. Another difficulty has to do with the constants Lk. We never need their magnitudes, but their signs are important, since they give the sense of direction at the interior points. One way to handle the Lk is to keep the angle between the two vectors (Pk+2 − Pk+1) and (Pk+1 − Pk) small for every interior point Pk+1 (see Figure 12.6e). This will produce individual spline segments, none of which turns too much, resulting in tangents Tk, Tk+1, and Tk+2 that don’t differ much in direction. Such a situation corresponds to Lk > 0 for every k. An alternative is to keep the sign of each of the new products LkTk obtained in step 3 identical to the sign of Tk used earlier in that step. This guarantees that none of the new Tk will change much in direction during an iteration.
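Step 2 of this iteration is easy to express in code. The following Python sketch computes the tangent magnitudes rk and lk+1 of one segment from Equations (12.34) and (12.33); the function and variable names are ours, the points are assumed two-dimensional, and this is only one building block of the full (nonlinear) iteration (the book's own code examples use Mathematica):

```python
import math

def dot(u, v):
    return u[0]*v[0] + u[1]*v[1]

def chord_magnitudes(Pk, Pk1, Tk, Tk1, alpha):
    """Compute rk and lk+1 for the segment from Pk to Pk1, given the
    current unit tangents Tk and Tk1 (Equations (12.34) and (12.33))."""
    chord = (Pk1[0] - Pk[0], Pk1[1] - Pk[1])
    c = math.hypot(chord[0], chord[1])
    e = (chord[0]/c, chord[1]/c)          # unit vector along the chord
    rk  = 2*c / (1 + alpha*dot(Tk1, e) + (1 - alpha)*dot(Tk, e))
    lk1 = 2*c / (1 + alpha*dot(Tk, e) + (1 - alpha)*dot(Tk1, e))
    return rk, lk1
```

When both tangents point along the chord (the step-1 guess), the dot products equal 1 and both magnitudes reduce to the chord length, as expected.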
12.2 The Akima Spline

In the late 1960s, Hiroshi Akima was looking for a cubic spline curve that would look natural and smooth, like a curve drawn intuitively by hand. The result turned out to be successful and the Akima spline has become the algorithm of choice for several illustration and drawing applications. The Akima algorithm has been published in [Akima 70], [Akima 78], and [Akima 91], with Matlab code available at [Akima-matlab 10]. The curves produced by the Akima spline are often very similar to those obtained by cubic splines, but are less prone to wiggling. This feature is especially noticeable when one data point is an outlier, as illustrated by Figure 12.8. The figure shows several data points and it is obvious that one of them lies at an unexpected location, far from the other points. A cubic spline has been computed for the points and it is clear that it wiggles between points. The Akima spline, on the other hand, is smooth overall and resembles a curve we expect a person to draw for the given points.

Definition of outlier:
A value that “lies outside” (is much smaller or larger than) most of the other values in a set of data.
A person who lives away from his place of work.
An observation that lies outside the overall pattern of a distribution. The presence of an outlier normally indicates some sort of problem.
A point which a data set is better off without (this is an embarrassing secret of the statistical trade).
Figure 12.8: Two Splines at an Outlier.
The original Akima spline was developed for explicit curves (i.e., curves expressed as y = f(x), also referred to as single-valued functions). Each spline segment from point xi to point xi+1 is a cubic polynomial, and the main idea is to compute the slope si = f′(xi) of the segment at point xi explicitly, by means of a simple expression that depends on the point, its two immediate predecessors and its two immediate successors. The method is therefore local, and moving a point affects at most five spline segments. (This explains its usefulness when some data points are outliers. It also implies that the
number of data points must be at least five.) Another notable feature is the absence of the large system of equations, which is the cornerstone of the traditional cubic splines. A possible shortcoming is the fact that the second derivatives of the spline segments are not continuous at the data points.

Figure 12.9 shows two examples of the construction that is used to compute the slope. Five data points, numbered 1 through 5, are shown. The points are connected with straight segments (secants) whose slopes are denoted by m1 through m4. Thus, for example, m1 = (y2 − y1)/(x2 − x1). The desired slope at point 3 is also shown, as a short straight segment.

Figure 12.9: Construction for Computing the Slope.
The chief innovation of the Akima algorithm is the expression for computing the slope at point 3:

s3 = (|m4 − m3| m2 + |m2 − m1| m3) / (|m4 − m3| + |m2 − m1|).   (12.35)

In reference [Akima 70], the developer of this method, Hiroshi Akima, shows why this expression makes sense. Figure 12.10a again shows five points. The straight segments 12, 23, 43, and 54 are extended to determine the locations of auxiliary points A and B. Segment CD is the desired slope at point 3, and it is determined by satisfying the relation

2C/CA = 4D/DB.

About a dozen steps (not listed here) are needed to arrive from this relation to Equation (12.35).

Figure 12.10: Construction for Determining the Slope.
Notice that in Figure 12.10b the two segments 4D and DB seem to go in “opposite” directions, which is why the relation above uses absolute values. Equation (12.35) implies that the slope at point 3 depends only on the slopes of the four secants. It is independent of the lengths of the secants and is invariant under a linear transformation of the coordinate system. It is also obvious that the equation makes sense in the following special cases: If m1 = m2 and m3 ≠ m4, then s = m2. If m3 = m4 and m1 ≠ m2, then s = m3. If m2 = m3, then s = m2 = m3. However, in the special case m1 = m2 = m3 = m4, Equation (12.35) is indeterminate and the slope is simply defined as the average (m2 + m3)/2. Once we accept Equation (12.35), it is applied to points (xi, yi) and (xi+1, yi+1) to compute slopes si and si+1. These six numbers (four coordinates and two slopes) are then used to compute the cubic polynomial segment from xi to xi+1. The polynomial itself has the form

y = p0 + p1(x − xi) + p2(x − xi)² + p3(x − xi)³,   where xi ≤ x ≤ xi+1,   (12.36)
and it is easy to show that the four parameters pk are expressed in terms of the six numbers as

p0 = yi,   (12.37)
p1 = si,   (12.38)
p2 = [3(yi+1 − yi)/(xi+1 − xi) − 2si − si+1]/(xi+1 − xi),   (12.39)
p3 = [si + si+1 − 2(yi+1 − yi)/(xi+1 − xi)]/(xi+1 − xi)².   (12.40)
Exercise 12.8: Only subscripts i and i+1 appear in Equations (12.37) through (12.40), but the slope at each data point i depends on the point and its four near neighbors. Where are the subscripts of the other neighbors? Computing and drawing an Akima spline for a set of n data points is done in n − 1 iterations. For each pair of consecutive points, the four polynomial coefficients are computed and the curve segment, Equation (12.36), is drawn. While experimenting with the curve, it is a good idea to save the four polynomial coefficients of each segment. If the user decides to move a data point, the software has to recompute the coefficients of at most five polynomial segments. The next point to consider is the computation of the first two and last two spline segments. The developer of this method, Hiroshi Akima, adopts the following idea. To compute the first segment, from point 1 to point 2, assume two imaginary points, 0 and −1, to the left of point 1, and compute slopes m−1 and m0 . To compute the last segment, assume two more points n + 1 and n + 2, and compute slopes mn and mn+1 . The computations are done as follows. For a non-periodic curve, m−1 = 3m1 − 2m2 , m0 = 2m1 − m2 , mn = 2mn−1 − mn−2 , and mn+1 = 3mn−1 − 2mn−2 . For a periodic curve, m−1 = mn−2 , m0 = mn−1 , mn = m1 , and mn+1 = m2 .
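The slope computation is compact enough to sketch in Python (the book's listings are Mathematica; the function names here are ours). The sketch handles the indeterminate case m1 = m2 = m3 = m4 by averaging and uses the non-periodic end conditions just described:

```python
def akima_slopes(x, y):
    """Slope at every data point via Equation (12.35), with the two
    extrapolated phantom secant slopes added at each end (non-periodic)."""
    n = len(x)
    m = [(y[i+1] - y[i]) / (x[i+1] - x[i]) for i in range(n - 1)]  # secants
    ext = ([3*m[0] - 2*m[1], 2*m[0] - m[1]]          # phantom slopes on the left
           + m
           + [2*m[-1] - m[-2], 3*m[-1] - 2*m[-2]])   # phantom slopes on the right
    s = []
    for i in range(n):
        m1, m2, m3, m4 = ext[i], ext[i+1], ext[i+2], ext[i+3]
        w1, w2 = abs(m4 - m3), abs(m2 - m1)
        s.append((m2 + m3)/2 if w1 + w2 == 0 else (w1*m2 + w2*m3)/(w1 + w2))
    return s

def akima_coeffs(x0, x1, y0, y1, s0, s1):
    """Cubic coefficients p0..p3 of Equations (12.37)-(12.40)."""
    h = x1 - x0
    return (y0, s0,
            (3*(y1 - y0)/h - 2*s0 - s1)/h,
            (s0 + s1 - 2*(y1 - y0)/h)/h**2)
```

For collinear data, every slope equals the common secant slope and the quadratic and cubic coefficients vanish, so the spline reproduces the straight line, as it should.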
12.3 The Quadratic Spline

The cubic spline curve is useful in certain practical applications, which raises the question of splines of different degrees based on the same concepts. It turns out that splines of degrees higher than 3 are useful only for special applications because they are more computationally intensive and tend to have many undesirable inflection points (i.e., they tend to wiggle excessively). Splines of degree 1 are, of course, just straight segments connected to form a polyline, but quadratic (degree-2) splines can be useful in certain applications. Such a spline is easy to derive and to compute. Each spline segment is a quadratic polynomial, i.e., a parabolic arc, so it results in fewer oscillations in the curve. On the other hand, quadratic spline segments connect with at most C¹ continuity because their second derivative is a constant. Thus, a quadratic spline curve may not be as tight as a cubic spline that passes through the same points.

The quadratic spline curve is derived in this section based on the variant Hermite segment of Section 11.7. Each segment Pi(t) is therefore a quadratic polynomial defined by its two endpoints Pi and Pi+1 and by its start tangent vector Pti. Equation (11.33) shows that the end tangent of such a segment is Pti(1) = 2(Pi+1 − Pi) − Pti. The first two spline segments are

P1(t) = (P2 − P1 − Pt1)t² + Pt1 t + P1,
P2(t) = (P3 − P2 − Pt2)t² + Pt2 t + P2.

At their joint point P2 they have the tangent vectors Pt1(1) = 2(P2 − P1) − Pt1 and Pt2(0) = Pt2. In order to achieve C¹ continuity we have to have the boundary condition Pt1(1) = Pt2(0), or 2(P2 − P1) − Pt1 = Pt2. This equation can be written Pt1 + Pt2 = 2(P2 − P1), and when duplicated n − 1 times, for the points P1 through Pn−1, the result is

⎡ 1 1 0 0 · · · 0 0 ⎤ ⎡ Pt1 ⎤       ⎡ P2 − P1   ⎤
⎢ 0 1 1 0 · · · 0 0 ⎥ ⎢ Pt2 ⎥   = 2 ⎢ P3 − P2   ⎥   (12.41)
⎢       · · ·       ⎥ ⎢  ..  ⎥       ⎢    ..      ⎥
⎣ 0 0 0 0 · · · 1 1 ⎦ ⎣ Ptn ⎦       ⎣ Pn − Pn−1 ⎦

(the coefficient matrix has n − 1 rows and n columns).
As with the cubic spline, there are more unknowns than equations (n unknowns and n − 1 equations), and the standard technique is to ask the user to provide a value for one of the unknown tangent vectors, normally Pt1.

Example: We select the four points of Section 12.1.1, namely P1 = (0, 0), P2 = (1, 0), P3 = (1, 1), and P4 = (0, 1). We also select the same start tangent Pt1 = (1, −1). Equation (12.41) becomes

⎡ 1 1 0 0 ⎤ ⎡ Pt1 ⎤       ⎡ P2 − P1 ⎤   ⎡ (2, 0)  ⎤
⎢ 0 1 1 0 ⎥ ⎢ Pt2 ⎥   = 2 ⎢ P3 − P2 ⎥ = ⎢ (0, 2)  ⎥ ,
⎣ 0 0 1 1 ⎦ ⎢ Pt3 ⎥       ⎣ P4 − P3 ⎦   ⎣ (−2, 0) ⎦
            ⎣ Pt4 ⎦
with solutions Pt2 = (1, 1), Pt3 = (−1, 1), and Pt4 = (−1, −1). The three spline segments
become P1 (t) = (P2 − P1 − Pt1 )t2 + Pt1 t + P1 = (t, t2 − t), P2 (t) = (P3 − P2 − Pt2 )t2 + Pt2 t + P2 = (−t2 + t + 1, t), P3 (t) = (P4 − P3 − Pt3 )t2 + Pt3 t + P3 = (−t + 1, −t2 + t + 1). Their tangent vectors are Pt1 (t) = (1, 2t−1), Pt2 (t) = (−2t+1, 1), and Pt3 (t) = (−1, −2t+ 1). It is easy to see that Pt1 (1) = Pt2 (0) = (1, 1) and Pt2 (1) = Pt3 (0) = (−1, 1). Also, the end tangent of the entire curve is Pt3 (1) = (−1, −1), the same as for the cubic case. The complete spline curve is shown in Figure 12.11.
(*quadratic spline example*)
C1:=ParametricPlot[{t,t^2-t},{t,0,1}];
C2:=ParametricPlot[{-t^2+t+1,t},{t,0,1}];
C3:=ParametricPlot[{-t+1,-t^2+t+1},{t,0,1}];
C4=Graphics[{Red, AbsolutePointSize[6],Point[{0,0}],
  Point[{1,0}],Point[{1,1}],Point[{0,1}]}];
Show[C1,C2,C3,C4,PlotRange->All, AspectRatio->Automatic]
Figure 12.11: A Quadratic Spline Example.
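The bidiagonal system of Equation (12.41) never needs a general solver: once Pt1 is chosen, the remaining tangents follow by forward substitution. A Python sketch (the book's listings are Mathematica; the names here are ours):

```python
def quadratic_spline_tangents(points, t1):
    """Forward-substitute Pt(k+1) = 2(P(k+1) - P(k)) - Pt(k), the row-by-row
    solution of Equation (12.41), starting from the user-supplied tangent t1."""
    tangents = [t1]
    for k in range(len(points) - 1):
        (px, py), (qx, qy) = points[k], points[k + 1]
        tx, ty = tangents[-1]
        tangents.append((2*(qx - px) - tx, 2*(qy - py) - ty))
    return tangents
```

With the four points and start tangent of the example above, this reproduces the tangents Pt2 = (1, 1), Pt3 = (−1, 1), and Pt4 = (−1, −1).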
12.4 The Quintic Spline

The derivation of the cubic spline is based on the requirement (boundary condition) that the second derivatives of the individual segments be equal at the interior points. This produces n − 2 equations to compute the first derivatives, but makes it impossible to control the values of the second derivatives. In cases where the designer wants to specify the values of the second derivatives, higher-degree polynomials must be used. A degree-5 (quintic) polynomial is a natural choice. Section 11.3 discusses the similar case of the quintic Hermite segment. The approach to the quintic spline is similar to that of the cubic spline. The spline is a set of n − 1 segments, each a quintic polynomial, so we have to compute the coefficients of each segment from the boundary conditions. A general quintic spline segment from point Pk to point Pk+1 is given by Equation (11.16), duplicated here:

Pk(t) = ak t⁵ + bk t⁴ + ck t³ + dk t² + ek t + fk.   (11.16)
The six coefficients are computed from the following six boundary conditions:

Pk(0) = Pk,   Pk(1) = Pk+1,   Ptk(1) = Ptk+1(0),
Pttk(1) = Pttk+1(0),   Ptttk(1) = Ptttk+1(0),   Pttttk(1) = Pttttk+1(0).

(Notice that these conditions involve the first four derivatives. Experience indicates that better-looking splines are obtained when the boundary conditions are based on an even number of derivatives, which is why the quintic, and not the quartic, polynomial is a natural choice.) The boundary conditions can be written explicitly as follows:

fk = Pk,                                     (12.42)a
ak + bk + ck + dk + ek + fk = fk+1 = Pk+1,   (12.42)b
5ak + 4bk + 3ck + 2dk + ek = ek+1,           (12.42)c
20ak + 12bk + 6ck + 2dk = 2dk+1,             (12.42)d
60ak + 24bk + 6ck = 6ck+1,                   (12.42)e
120ak + 24bk = 24bk+1.                       (12.42)f
These equations are now used to express the six coefficients of each of the n − 1 quintic polynomials in terms of the second and fourth derivatives.

Equation (12.42)f results in 24bk+1 = Pttttk+1(0), or bk = (1/24)Pttttk(0). This also implies ak = (1/120)[Pttttk+1(0) − Pttttk(0)]. Equation (12.42)d implies 2dk+1 = Pttk+1(0), or dk = (1/2)Pttk(0). Now that we have expressed ak, bk, and dk in terms of the second and fourth derivatives, we substitute them in Equation (12.42)d to get the following expression for ck:

ck = (1/6)[Pttk+1(0) − Pttk(0)] − (1/36)[Pttttk+1(0) + 2Pttttk(0)].
The last coefficient to be expressed in terms of the (still unknown) second and fourth derivatives is ek. This is done from Pk(1) = Pk+1 and results in

ek = (Pk+1 − Pk) − (1/6)[Pttk+1(0) + 2Pttk(0)] + (1/360)[7Pttttk+1(0) + 8Pttttk(0)].
When these expressions for the six coefficients are combined with Ptk−1(1) = Ptk(0), all the terms with first and third derivatives are eliminated, and the result is a relation between the (unknown) second and fourth derivatives and the (known) data points:

(Pk−1 − Pk) + (1/6)[Pttk−1(0) + 2Pttk(0)] − (1/360)[7Pttttk−1(0) + 8Pttttk(0)]
  = (Pk+1 − Pk) − (1/6)[Pttk+1(0) + 2Pttk(0)] + (1/360)[7Pttttk+1(0) + 8Pttttk(0)].   (12.43)
When these expressions for the six coefficients are similarly combined with Ptttk−1(1) = Ptttk(0), the result is another relation between the second and fourth derivatives:

−Pttk−1(0) + 2Pttk(0) − Pttk+1(0) + (1/6)Pttttk−1(0) + (2/3)Pttttk(0) + (1/6)Pttttk+1(0) = 0.   (12.44)
Each of Equations (12.43) and (12.44) is n − 1 equations for k = 1, 2, . . . , n − 1, so we end up with 2(n − 1) equations in the 2n unknown second and fourth derivatives. As in the case of the cubic spline, we complete this system of equations by guessing values for some extreme derivatives. The simplest end condition is to require

Ptt1(0) = Pttn−1(1) = Ptttt1(0) = Pttttn−1(1) = 0,

which implies P1(0) = P1(1) − (1/6)P1(1) and Pn(0) = Pn−1(1) − (1/6)Pn−1(1), and makes it possible to eliminate P1(0) and Pn(0) from Equations (12.43) and (12.44). [Späth 83] shows that the end result is the system of equations
⎡ A  −B ⎤ ⎡ Ptt   ⎤   ⎡ D ⎤
⎣ C    A ⎦ ⎣ Ptttt ⎦ = ⎣ 0 ⎦ ,   (12.45)

where

Ptt = (Ptt1(0), . . . , Pttn−1(0))T,
Ptttt = (Ptttt1(0), . . . , Pttttn−1(0))T,
D = (6[(P2 − P1) − (P1 − P0)], . . . , 6[(Pn − Pn−1) − (Pn−1 − Pn−2)])T,

and A, B, and C are the tridiagonal matrices

⎡ 5 1       ⎤        ⎡ 26  7          ⎤        ⎡  6 −6            ⎤
⎢ 1 4 1     ⎥        ⎢  7 16  7       ⎥        ⎢ −6 12 −6         ⎥
A = ⎢   · · ·   ⎥ ,   B = ⎢     · · ·      ⎥ ,   C = ⎢       · · ·      ⎥ .
⎢     1 4 1 ⎥        ⎢      7 16  7   ⎥        ⎢      −6 12 −6    ⎥
⎣       1 5 ⎦        ⎣          7 26  ⎦        ⎣          −6  6   ⎦
Notice that matrices A, B, and C are tridiagonal and symmetric. In addition, A and B are diagonally dominant, while C is nonnegative definite. This guarantees that the block matrix of Equation (12.45) will have an inverse, which implies that the system of equations has a unique solution. Solving the system of Equations (12.45) means expressing the second and fourth derivatives of the spline segments in terms of the data points (the known quantities). Once this is done, the six coefficients of each of the n−1 spline segments can be expressed in terms of the data points, and the segments can be constructed.
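Before the full system is solved, the coefficient expressions derived in this section can be checked in isolation: for one scalar coordinate, prescribing the endpoint values together with the second and fourth derivatives at both ends of a segment must be reproduced by the resulting quintic. A Python sketch (the names are ours):

```python
def quintic_coeffs(p0, p1, s0, s1, a0, a1):
    """Coefficients a..f of one quintic segment from the endpoint values
    (p0, p1), the end second derivatives (s0, s1), and the end fourth
    derivatives (a0, a1), using the relations derived in this section."""
    a = (a1 - a0)/120
    b = a0/24
    c = (s1 - s0)/6 - (a1 + 2*a0)/36
    d = s0/2
    e = (p1 - p0) - (s1 + 2*s0)/6 + (7*a1 + 8*a0)/360
    f = p0
    return a, b, c, d, e, f
```

Evaluating the polynomial and its derivatives at t = 0 and t = 1 recovers the prescribed values: P(1) = a + b + c + d + e + f, Ptt(0) = 2d, Ptt(1) = 20a + 12b + 6c + 2d, Ptttt(0) = 24b, and Ptttt(1) = 120a + 24b.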
12.5 Cardinal Splines

The cardinal spline is another example of how Hermite interpolation is applied to construct a spline curve. The cardinal spline overcomes the main disadvantages of cubic splines, namely the lack of local control and the need to solve a system of linear equations that may be large (its size depends on the number of data points). Cardinal splines also offer a natural way to control the tension of the curve by modifying the magnitudes of the tangent vectors (Section 11.2.3). The price for all this is the loss of second-order continuity. Strictly speaking, this loss means that the cardinal spline isn’t really a spline (see the definition of splines on Page 577), but its form, its derivation, and its behavior are so similar to those of other splines that the name “cardinal spline” has stuck.

Figure 12.12a illustrates the principle of this method. The figure shows a curve that passes through seven points. The curve looks continuous but is constructed in segments, two of which are thicker than the others. The first thick segment, the one from P2 to P3, starts in the direction from P1 to P3 and ends going in the direction from P2 to P4. The second thick segment, from P5 to P6, features the same behavior. It starts in the direction from P4 to P6 and ends going in the direction from P5 to P7.
Figure 12.12: Tangent Vectors in a Cardinal Spline.
The cardinal spline for n given points is calculated and drawn in segments, each depending on four points only. Each point participates in at most four curve segments, so moving one point affects only those segments and not the entire curve. This is why the curve features local control. The individual segments connect smoothly; their first derivatives are equal at the connection points (the curve therefore has first-order continuity). However, the second derivatives of the segments are generally different at the connection points.
The first step in constructing the complete curve is to organize the points into n − 3 highly-overlapping groups of four consecutive points each. The groups are [P1 , P2 , P3 , P4 ],
[P2 , P3 , P4 , P5 ],
[P3 , P4 , P5 , P6 ], . . . , [Pn−3 , Pn−2 , Pn−1 , Pn ].
Hermite interpolation is then applied to construct a curve segment P(t) for each group. Denoting the four points of a group by P1 , P2 , P3 , and P4 , the two interior points P2 and P3 become the start and end points of the segment and the two tangent vectors become s(P3 − P1 ) and s(P4 − P2 ), where s is a real number. Thus, segment P(t) goes from P2 to P3 and its two extreme tangent vectors are proportional to the vectors P3 − P1 and P4 − P2 (Figure 12.12b). The proportionality constant s is related to the tension parameter T . Note how there are no segments from P1 to P2 and from Pn−1 to Pn . These segments can be added to the curve by adding two new extreme points P0 and Pn+1 . These points can also be employed to edit the curve, because the first segment, from P1 to P2 , starts going in the direction from P0 to P2 , and similarly for the last segment. The particular choice of the tangent vectors guarantees that the individual segments of the cardinal spline will connect smoothly. The end tangent s(P4 − P2 ) of the segment for group [P1 , P2 , P3 , P4 ] is identical to the start tangent of the next group, [P2 , P3 , P4 , P5 ]. Segment P(t) is therefore defined by P(0) = P2 , P(1) = P3 , Pt (0) = s(P3 − P1 ), Pt (1) = s(P4 − P2 )
(12.46)
and is easily calculated by applying Hermite interpolation (Equation (11.7)) to the four quantities of Equation (12.46):

                      ⎡  2 −2  1  1 ⎤ ⎡ P2          ⎤
P(t) = (t³, t², t, 1) ⎢ −3  3 −2 −1 ⎥ ⎢ P3          ⎥
                      ⎢  0  0  1  0 ⎥ ⎢ s(P3 − P1) ⎥
                      ⎣  1  0  0  0 ⎦ ⎣ s(P4 − P2) ⎦

                      ⎡ −s  2−s  s−2   s ⎤ ⎡ P1 ⎤
     = (t³, t², t, 1) ⎢ 2s  s−3  3−2s −s ⎥ ⎢ P2 ⎥ .   (12.47)
                      ⎢ −s   0    s    0 ⎥ ⎢ P3 ⎥
                      ⎣  0   1    0    0 ⎦ ⎣ P4 ⎦
Tension in the cardinal spline can now be controlled by changing the lengths of the tangent vectors by means of parameter s. A long tangent vector (obtained by a large s) causes the curve to continue longer in the direction of the tangent. A short tangent has the opposite effect; the curve moves a short distance in the direction of the tangent, then quickly changes direction and moves toward the end point. A zero-length tangent (corresponding to s = 0) produces a straight line between the endpoints (infinite tension). In principle, the parameter s can be varied from 0 to ∞. In practice, we use only values in the range [0, 1]. However, since s = 0 produces maximum tension, we cannot
intuitively think of s as the tension parameter and we need to define another parameter, T inversely related to s. The tension parameter T is defined as s = (1 − T )/2, which implies T = 1 − 2s. The value T = 0 results in s = 1/2. The curve is defined as having tension zero in this case and is called the Catmull–Rom spline [Catmull and Rom 74]. Section 12.6 includes a detailed derivation of this type of spline as a blend of two parabolas. Increasing T from 0 to 1 decreases s from 1/2 to 0, thereby reducing the magnitude of the tangent vectors down to 0. This produces curves with more tension. Exercise 11.7 tells us that when the tangent vectors have magnitude zero, the Hermite curve segment is a straight line, so the entire cardinal spline curve becomes a set of straight segments, a polyline, the curve with maximum tension. Decreasing T from 0 to −1 increases s from 1/2 to 1. The result is a curve with more slack at the data points. To illustrate this behavior mathematically, we rewrite Equation (12.47) explicitly to show its dependence on s: P(t) = s(−t3 + 2t2 − t)P1 + s(−t3 + t2 )P2 + (2t3 − 3t2 + 1)P2 + s(t3 − 2t2 + t)P3 + (−2t3 + 3t2 )P3 + s(t3 − t2 )P4 .
(12.48)
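Equations (12.47) and (12.48) are two forms of the same polynomial, which is easy to confirm numerically for one scalar coordinate. A Python sketch (the helper names are ours):

```python
def cardinal_matrix_form(P, s, t):
    """Equation (12.47): (t^3, t^2, t, 1) times the cardinal basis matrix."""
    M = [[-s, 2 - s, s - 2, s],
         [2*s, s - 3, 3 - 2*s, -s],
         [-s, 0.0, s, 0.0],
         [0.0, 1.0, 0.0, 0.0]]
    T = (t**3, t**2, t, 1.0)
    return sum(sum(T[r]*M[r][c] for r in range(4)) * P[c] for c in range(4))

def cardinal_explicit_form(P, s, t):
    """Equation (12.48): the same curve with the dependence on s explicit."""
    P1, P2, P3, P4 = P
    return (s*(-t**3 + 2*t**2 - t)*P1
            + (s*(-t**3 + t**2) + (2*t**3 - 3*t**2 + 1))*P2
            + (s*(t**3 - 2*t**2 + t) + (-2*t**3 + 3*t**2))*P3
            + s*(t**3 - t**2)*P4)
```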
For s = 0, Equation (12.48) becomes (2t³ − 3t² + 1)P2 + (−2t³ + 3t²)P3, which can be simplified to (3t² − 2t³)(P3 − P2) + P2. Substituting u = 3t² − 2t³ reduces this to u(P3 − P2) + P2, which is the straight line from P2 to P3. For large s, we use Equation (12.48) to calculate the mid-curve value P(0.5):

P(0.5) = (s/8)[(P3 − P1) + (P2 − P4)] + 0.5(P2 + P3)
       = (1/8)[Pt(0) − Pt(1)] + 0.5(P2 + P3).
This is an extension of Equation (Ans.13). The first term is the difference of the two tangent vectors, multiplied by s/8. As s grows, this term grows without limit. The second term is the midpoint of P2 and P3. Adding the two terms (a vector and a point) produces a point that may be located far away (for large s) from the midpoint, showing that the curve moves a long distance away from the start point P2 before changing direction and starting toward the end point P3. Large values of s therefore feature a loose curve (low tension). Thus, the tension of the curve can be increased by setting s close to 0 (or, equivalently, setting T close to 1); it can be decreased by increasing s (or, equivalently, decreasing T toward 0).

Exercise 12.9: What happens when T > 1?

Setting T = 0 results in s = 0.5. Equation (12.47) reduces in this case to

                      ⎡ −0.5  1.5 −1.5  0.5 ⎤ ⎡ P1 ⎤
P(t) = (t³, t², t, 1) ⎢  1   −2.5   2  −0.5 ⎥ ⎢ P2 ⎥ ,   (12.49)
                      ⎢ −0.5   0   0.5   0  ⎥ ⎢ P3 ⎥
                      ⎣  0     1    0    0  ⎦ ⎣ P4 ⎦
a curve known as the Catmull–Rom spline. Its basis matrix is termed the parabolic blending matrix.

Example: Given the four points (1, 0), (3, 1), (6, 2), and (2, 3), we apply Equation (12.47) to calculate the cardinal spline segment from (3, 1) to (6, 2):

                      ⎡ −s  2−s  s−2   s ⎤ ⎡ (1, 0) ⎤
P(t) = (t³, t², t, 1) ⎢ 2s  s−3  3−2s −s ⎥ ⎢ (3, 1) ⎥
                      ⎢ −s   0    s    0 ⎥ ⎢ (6, 2) ⎥
                      ⎣  0   1    0    0 ⎦ ⎣ (2, 3) ⎦
= t3 (4s − 6, 4s − 2) + t2 (−9s + 9, −6s + 3) + t(5s, 2s) + (3, 1). For high tension (i.e., T = 1 or s = 0), this reduces to the straight line P(t) = (−6, −2)t3 + (9, 3)t2 + (3, 1) = (3, 1)(−2t3 + 3t2 ) + (3, 1) = (3, 1)u + (3, 1). For T = 0 (or s = 1/2), this cardinal spline reduces to the Catmull–Rom curve P(t) = (−4, 0)t3 + (4.5, 0)t2 + (2.5, 1)t + (3, 1).
(12.50)
Figure 12.13 shows an example of a similar cardinal spline (the points are different) with four values, 0, 1/6, 2/6, and 3/6, of the parameter s (corresponding to tension values T = 1, 2/3, 1/3, and 0).
(* Cardinal spline example *)
T={t^3,t^2,t,1};
H[s_]:={{-s,2-s,s-2,s},{2s,s-3,3-2s,-s},{-s,0,s,0},{0,1,0,0}};
B={{1,3},{2,0},{3,2},{2,3}};
s=3/6; (* T=0 *)   g1=ParametricPlot[T.H[s].B,{t,0,1}];
s=2/6; (* T=1/3 *) g2=ParametricPlot[T.H[s].B,{t,0,1}];
s=1/6; (* T=2/3 *) g3=ParametricPlot[T.H[s].B,{t,0,1}];
s=0;   (* T=1 *)   g4=ParametricPlot[T.H[s].B,{t,0,1}];
g5=Graphics[{AbsolutePointSize[4], Table[Point[B[[i]]],{i,1,4}]}];
Show[g1,g2,g3,g4,g5, PlotRange->All]
Figure 12.13: A Cardinal Spline Example.
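The example segment can also be checked by evaluating Equation (12.47) directly. A Python sketch (the figure's own code is Mathematica; the names here are ours):

```python
def cardinal_point(pts, s, t):
    """Evaluate the cardinal segment of Equation (12.47) at parameter t;
    pts holds the group (P1, P2, P3, P4) and s = (1 - T)/2."""
    M = [[-s, 2 - s, s - 2, s],
         [2*s, s - 3, 3 - 2*s, -s],
         [-s, 0.0, s, 0.0],
         [0.0, 1.0, 0.0, 0.0]]
    T = (t**3, t**2, t, 1.0)
    w = [sum(T[r]*M[r][c] for r in range(4)) for c in range(4)]
    return (sum(w[i]*pts[i][0] for i in range(4)),
            sum(w[i]*pts[i][1] for i in range(4)))
```

With the four points of the example and s = 1/2, the segment interpolates its two interior points, P(0) = (3, 1) and P(1) = (6, 2), in agreement with Equation (12.50).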
12.6 Parabolic Blending: Catmull–Rom Curves

The Catmull–Rom curve (or the Catmull–Rom spline) is the special case of a cardinal spline with tension T = 0. In this section, we describe an approach to the Catmull–Rom spline where each spline segment is derived as the blend of two parabolas. This approach to the Catmull–Rom curve proceeds in the following steps:
1. Organize the points in overlapping groups of three consecutive points each. The groups are [P1, P2, P3],
[P2 , P3 , P4 ],
[P3 , P4 , P5 ],
···
[Pn−2 , Pn−1 , Pn ].
2. Fit two parabolas, one through the first three points, P1, P2, and P3, and the other through the overlapping group, P2, P3, and P4.
3. Calculate the first curve segment from P2 to P3 as a linear blend of the two parabolas, using the two barycentric weights 1 − t and t.
4. Fit a third parabola, through points P3, P4, and P5 and calculate the second curve segment, from P3 to P4, as a linear blend of the second and third parabolas.
5. Repeat until the last segment, from Pn−2 to Pn−1, is calculated as a linear blend of the (n − 3)rd and the (n − 2)nd parabolas.

Each parabola is defined by three points (which, of course, are on the same plane) and is therefore flat. However, the two parabolas that make up the segment are not generally on the same plane, so their blend is not necessarily flat and can twist in space. The two original parabolas are denoted by Q(u) = (u², u, 1)H123 and R(w) = (w², w, 1)H234, where H123 and H234 are column vectors, each depending on the three points involved. They will have to be calculated. The expression for the blended segment is P(t) = (1 − t)Q(u) + tR(w). Since this expression depends on t only, we have to express parameters u and w in terms of t. We try the linear expressions u = at + b, w = ct + d. To calculate a, b, c, and d, we write the end conditions for the two parabolas and for the curve segment (Figure 12.14a):

Q(0) = P1,
Q(0.5) = P2 ,
Q(1) = P3 ,
R(0) = P2 ,
R(0.5) = P3 ,
R(1) = P4 ,
P(0) = P2 ,
P(1) = P3 .
For point P2, we get (1) u = 0.5 and t = 0, implying b = 0.5, and (2) w = 0 and t = 0, implying d = 0. For point P3, we similarly get (1) u = 1 and t = 1, implying a + b = 1 ⇒ a = 0.5, and (2) w = 0.5 and t = 1, implying c = 0.5. This results in u = (1 + t)/2 and w = t/2. Therefore, for the first parabola, we get

Q(0) = P1 = (0, 0, 1)H123,
Q(0.5) = P2 = (1/4, 1/2, 1)H123,
Q(1) = P3 = (1, 1, 1)H123,

or, in matrix form,

⎡ P1 ⎤   ⎡  0    0   1 ⎤
⎢ P2 ⎥ = ⎢ 1/4  1/2  1 ⎥ H123 = MH123.
⎣ P3 ⎦   ⎣  1    1   1 ⎦
Figure 12.14: Parabolic Blending: (a) Two Parabolas.
(b) The Blend Functions.

This can be solved for H123:

H123 = M⁻¹ (P1, P2, P3)T,   where   M⁻¹ = ⎡  2 −4  2 ⎤
                                          ⎢ −3  4 −1 ⎥ .
                                          ⎣  1  0  0 ⎦

So the first parabola is

Q(u) = (u², u, 1)M⁻¹ (P1, P2, P3)T.

The second parabola is obtained similarly:

R(w) = (w², w, 1)M⁻¹ (P2, P3, P4)T.

The first curve segment is therefore

P(t) = (1 − t)Q(u) + tR(w)
     = (1 − t)(u², u, 1)M⁻¹ (P1, P2, P3)T + t(w², w, 1)M⁻¹ (P2, P3, P4)T
     = (1 − t)(2u² − 3u + 1, −4u² + 4u, 2u² − u) (P1, P2, P3)T
       + t(2w² − 3w + 1, −4w² + 4w, 2w² − w) (P2, P3, P4)T
     = (−0.5t³ + t² − 0.5t)P1 + (1.5t³ − 2.5t² + 1)P2
       + (−1.5t³ + 2t² + 0.5t)P3 + (0.5t³ − 0.5t²)P4   (12.51)

                      ⎡ −0.5  1.5 −1.5  0.5 ⎤ ⎡ P1 ⎤
     = (t³, t², t, 1) ⎢  1   −2.5   2  −0.5 ⎥ ⎢ P2 ⎥ = (t³, t², t, 1)BP,   (12.52)
                      ⎢ −0.5   0   0.5   0  ⎥ ⎢ P3 ⎥
                      ⎣  0     1    0    0  ⎦ ⎣ P4 ⎦
where B is called the parabolic blending matrix. The other segments are calculated similarly. Note that, in practice, there is no need to calculate the parabolas. The program simply executes a loop where in each iteration, it uses Equation (12.51) with the next group of points to calculate the next segment. The Catmull–Rom curve starts at point P2 and ends at Pn−1. To make it pass through all n points P1, . . . , Pn, we add two more points, P0 and Pn+1. In practice, we normally select them as P0 = P1 and Pn+1 = Pn. The first group of points is now P0, . . . , P3, and the last one is Pn−2, . . . , Pn+1. This also makes the method more interactive, since two more points can be repositioned to edit the shape of the curve. The curve can also be closed, if the first and last points are set to identical values. Equation (12.51) gives the representation of the Catmull–Rom curve in terms of the four blending functions

F1(t) = −0.5t³ + t² − 0.5t,      F2(t) = 1.5t³ − 2.5t² + 1,
F3(t) = −1.5t³ + 2t² + 0.5t,     F4(t) = 0.5t³ − 0.5t².
Note how F1 and F4 are negative (Figure 12.14b), how F2 and F3 are symmetric, and how the four functions are barycentric. Exercise 12.10: Prove the first-order continuity of the parabolic curve. Example: Given the five points (1, 0), (3, 1), (6, 2), (2, 3), and (1, 4), we calculate the Catmull–Rom curve from (1, 0) to (1, 4). The first step is to add two more points, one on each end. We simply duplicate each of the two endpoints, ending up with seven points. The first segment is (from Equation (12.51)) P1 (t) = (−0.5t3 + t2 − 0.5t)(1, 0) + (1.5t3 − 2.5t2 + 1)(1, 0) + (−1.5t3 + 2t2 + 0.5t)(3, 1) + (0.5t3 − 0.5t2 )(6, 2) = (−0.5t3 + 1.5t2 + t + 1, −0.5t3 + t2 + 0.5t). This segment goes from point (1, 0) (for t = 0) to point (3, 1) (for t = 1). The next
segment, from (3, 1) to (6, 2), is similarly
\[
\begin{aligned}
P_2(t) &= (-0.5t^3+t^2-0.5t)(1,0) + (1.5t^3-2.5t^2+1)(3,1)\\
&\qquad + (-1.5t^3+2t^2+0.5t)(6,2) + (0.5t^3-0.5t^2)(2,3)\\
&= (-4,0)t^3 + (4.5,0)t^2 + (2.5,1)t + (3,1).
\end{aligned}
\]
This is identical to Equation (12.50). Calculating the other two segments is left as an exercise.
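As noted above, a program simply evaluates Equation (12.51) for successive overlapping groups of four points. Here is a minimal sketch in Python (the book's examples use Mathematica; the function name cr_segment is ours):

```python
# Evaluate one Catmull-Rom segment with the parabolic blending matrix B.
B = [[-0.5,  1.5, -1.5,  0.5],
     [ 1.0, -2.5,  2.0, -0.5],
     [-0.5,  0.0,  0.5,  0.0],
     [ 0.0,  1.0,  0.0,  0.0]]

def cr_segment(p0, p1, p2, p3, t):
    """Point on the segment from p1 to p2, 0 <= t <= 1 (Equation (12.51))."""
    pts = (p0, p1, p2, p3)
    T = (t**3, t**2, t, 1.0)
    # blending weights: the row vector (t^3, t^2, t, 1) times B
    F = [sum(T[r] * B[r][c] for r in range(4)) for c in range(4)]
    return tuple(sum(F[i] * p[d] for i, p in enumerate(pts))
                 for d in range(len(p0)))

# second segment of the example above: the group (1,0), (3,1), (6,2), (2,3)
start = cr_segment((1, 0), (3, 1), (6, 2), (2, 3), 0.0)   # (3, 1)
end   = cr_segment((1, 0), (3, 1), (6, 2), (2, 3), 1.0)   # (6, 2)
```

The segment indeed starts at the second point of its group and ends at the third, which is why consecutive groups join.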
12.6.1 Generalized Parabolic Blending

The previous discussion assumes that the n points are roughly equally spaced, which is why we could write Q(0.5) = P2 and R(0.5) = P3. This assumption is sometimes true in practical work. In cases where it isn't, it is possible to write Q(α) = P2 and R(β) = P3 (where 0 ≤ α, β ≤ 1 and their values depend on the placement of the points) and derive the expression for the curve from there. Here is a summary of the results. The three parameters are now related by u = (1 − α)t + α and w = βt. The two parabolas are given by
\[
\begin{pmatrix}P_1\\ P_2\\ P_3\end{pmatrix}
=\begin{pmatrix}0&0&1\\ \alpha^2&\alpha&1\\ 1&1&1\end{pmatrix}H_{123},
\]
implying
\[
H_{123}=\mathbf{M}^{-1}\begin{pmatrix}P_1\\ P_2\\ P_3\end{pmatrix}
=\begin{pmatrix}
\frac{1}{\alpha} & \frac{-1}{\alpha(1-\alpha)} & \frac{1}{1-\alpha}\\[2pt]
\frac{-(1+\alpha)}{\alpha} & \frac{1}{\alpha(1-\alpha)} & \frac{-\alpha}{1-\alpha}\\[2pt]
1 & 0 & 0
\end{pmatrix}
\begin{pmatrix}P_1\\ P_2\\ P_3\end{pmatrix},
\]
and
\[
\begin{pmatrix}P_2\\ P_3\\ P_4\end{pmatrix}
=\begin{pmatrix}0&0&1\\ \beta^2&\beta&1\\ 1&1&1\end{pmatrix}H_{234},
\]
implying
\[
H_{234}=\mathbf{M}^{-1}\begin{pmatrix}P_2\\ P_3\\ P_4\end{pmatrix}
=\begin{pmatrix}
\frac{1}{\beta} & \frac{-1}{\beta(1-\beta)} & \frac{1}{1-\beta}\\[2pt]
\frac{-(1+\beta)}{\beta} & \frac{1}{\beta(1-\beta)} & \frac{-\beta}{1-\beta}\\[2pt]
1 & 0 & 0
\end{pmatrix}
\begin{pmatrix}P_2\\ P_3\\ P_4\end{pmatrix}.
\]
The final expression of the curve is
\[
P(t)=(t^3,t^2,t,1)
\begin{pmatrix}
\frac{-(1-\alpha)^2}{\alpha} & \frac{(1-\alpha)+\alpha\beta}{\alpha} & \frac{-(1-\alpha)-\alpha\beta}{1-\beta} & \frac{\beta^2}{1-\beta}\\[4pt]
\frac{2(1-\alpha)^2}{\alpha} & \frac{-2(1-\alpha)-\alpha\beta}{\alpha} & \frac{2(1-\alpha)-\beta(1-2\alpha)}{1-\beta} & \frac{-\beta^2}{1-\beta}\\[4pt]
\frac{-(1-\alpha)^2}{\alpha} & \frac{1-2\alpha}{\alpha} & \alpha & 0\\[4pt]
0 & 1 & 0 & 0
\end{pmatrix}
\begin{pmatrix}P_1\\ P_2\\ P_3\\ P_4\end{pmatrix}.
\]
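As a quick check (a Python sketch, ours, not the book's code), the general matrix above must reduce to the matrix B of Equation (12.52) when α = β = 1/2, and for any α, β its column sums must be (0, 0, 1, 0), so that P(1) = P3:

```python
def general_blend_matrix(a, b):
    """The 4x4 matrix of the generalized parabolic blend (alpha=a, beta=b)."""
    return [
        [-(1-a)**2/a,  ((1-a) + a*b)/a,    (-(1-a) - a*b)/(1-b),         b**2/(1-b)],
        [2*(1-a)**2/a, (-2*(1-a) - a*b)/a, (2*(1-a) - b*(1-2*a))/(1-b), -b**2/(1-b)],
        [-(1-a)**2/a,  (1-2*a)/a,          a,                            0.0],
        [0.0,          1.0,                0.0,                          0.0],
    ]

M_half = general_blend_matrix(0.5, 0.5)   # should equal B of Equation (12.52)
# column sums give the weights of P1..P4 at t = 1; they must be (0, 0, 1, 0)
col_sums = [sum(row[c] for row in general_blend_matrix(0.3, 0.7))
            for c in range(4)]
```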
12.6.2 Bessel's Algorithm

The cardinal spline and the Catmull–Rom curve are based on the particular way the two extreme tangent vectors of each four-point segment are defined. Equation (12.46) defines Pt(0) = s(P3 − P1) and Pt(1) = s(P4 − P2). So far, these definitions, which seem arbitrary, have been used because they make sense. They can, however, be explained (or justified) by a simple method called Bessel's algorithm. The idea is to calculate a quadratic interpolating polynomial Qs(t) for the first three points P0, P1, and P2 and define Pt(0) as the tangent vector of Qs(t) at point P1 (Figure 12.15). Similarly, another quadratic interpolating polynomial Qe(t) is calculated for the last three points P1, P2, and P3, and Pt(1) is defined as the tangent vector of Qe(t) at point P2.

Figure 12.15: Bessel's Algorithm.

Friedrich Wilhelm Bessel (1784–1846): German astronomer and mathematician. Best known for making the first accurate measurement of the distance to a star.

The uniform quadratic Lagrange polynomial (Equation (10.11)) is used as our interpolating polynomial:
\[
Q_s(t) = \frac{t^2-3t+2}{2}P_0 - (t^2-2t)P_1 + \frac{t^2-t}{2}P_2
= (t^2,t,1)\begin{pmatrix}1/2&-1&1/2\\ -3/2&2&-1/2\\ 1&0&0\end{pmatrix}
\begin{pmatrix}P_0\\ P_1\\ P_2\end{pmatrix}.
\]
The parameter t varies in the range [0, 2], so Qs(1) gives the middle point. The tangent vector of Qs(t) is
\[
Q^t_s(t) = \frac{2t-3}{2}P_0 - (2t-2)P_1 + \frac{2t-1}{2}P_2
= (2t,1,0)\begin{pmatrix}1/2&-1&1/2\\ -3/2&2&-1/2\\ 1&0&0\end{pmatrix}
\begin{pmatrix}P_0\\ P_1\\ P_2\end{pmatrix}.
\]
Thus, Qts(1) = (P2 − P0)/2. Similarly,
\[
Q_e(t) = \frac{t^2-3t+2}{2}P_1 - (t^2-2t)P_2 + \frac{t^2-t}{2}P_3
= (t^2,t,1)\begin{pmatrix}1/2&-1&1/2\\ -3/2&2&-1/2\\ 1&0&0\end{pmatrix}
\begin{pmatrix}P_1\\ P_2\\ P_3\end{pmatrix},
\]
which yields Qte(1) = (P3 − P1)/2.

It is also possible to use the nonuniform quadratic Lagrange polynomial (Equation (10.12)). If we select
\[
Q_s(t) = (t^2,t,1)
\begin{pmatrix}
\frac{1}{\Delta_0(\Delta_0+\Delta_1)} & \frac{-1}{\Delta_0\Delta_1} & \frac{1}{(\Delta_0+\Delta_1)\Delta_1}\\[4pt]
-\frac{1}{\Delta_0+\Delta_1}-\frac{1}{\Delta_0} & \frac{1}{\Delta_0}+\frac{1}{\Delta_1} & \frac{1}{\Delta_0+\Delta_1}-\frac{1}{\Delta_1}\\[4pt]
1 & 0 & 0
\end{pmatrix}
\begin{pmatrix}P_0\\ P_1\\ P_2\end{pmatrix},
\qquad (12.53)
\]
then the tangent vector at point P1 becomes
\[
Q^t_s(\Delta_0) = -\frac{\Delta_1}{\Delta_0(\Delta_0+\Delta_1)}P_0
+ \frac{\Delta_1-\Delta_0}{\Delta_0\Delta_1}P_1
+ \frac{\Delta_0}{(\Delta_0+\Delta_1)\Delta_1}P_2
= \frac{\Delta_1}{\Delta_0+\Delta_1}\,\frac{P_1-P_0}{\Delta_0}
+ \frac{\Delta_0}{\Delta_0+\Delta_1}\,\frac{P_2-P_1}{\Delta_1}.
\qquad (12.54)
\]
It is easy to see that Equation (12.54) reduces to (P2 − P0 )/2 when Δ0 = Δ1 = 1. Exercise 12.11: Use Equation (10.12) to represent Qe (t) and calculate the tangent vector Qte (Δ1 ).
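The second form of Equation (12.54) is a convex blend of the two chord slopes, which is easy to verify numerically. A Python sketch (the name bessel_tangent is ours):

```python
def bessel_tangent(p0, p1, p2, d0=1.0, d1=1.0):
    """Tangent at p1 per Equation (12.54); d0, d1 are the parameter spacings."""
    c0 = [(b - a) / d0 for a, b in zip(p0, p1)]    # (P1 - P0)/Delta0
    c1 = [(b - a) / d1 for a, b in zip(p1, p2)]    # (P2 - P1)/Delta1
    w0, w1 = d1 / (d0 + d1), d0 / (d0 + d1)        # weights sum to 1
    return tuple(w0 * x + w1 * y for x, y in zip(c0, c1))

t_uniform = bessel_tangent((0, 0), (2, 1), (4, 0))            # (P2 - P0)/2
t_nonuni  = bessel_tangent((0, 0), (2, 1), (4, 0), 2.0, 1.0)  # weighted blend
```

With the default spacings the result equals (P2 − P0)/2, the uniform Bessel tangent.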
12.7 Catmull–Rom Surfaces

The cardinal spline or the Catmull–Rom curve can easily be extended to a surface that's fully defined by a rectangular grid of data points. In analogy to the Catmull–Rom curve segment—which involves four points but only passes through the two interior points—a single Catmull–Rom surface patch is specified by 16 points; the patch is anchored at the four middle points and spans the area delimited by them. We start with a group of m × n data points roughly arranged in a rectangle. We examine all the overlapping groups that consist of 4×4 adjacent points, and we calculate a surface patch for each group. Some of the groups are shown in Figure 12.16.
Figure 12.16: Points for a Catmull–Rom Surface Patch.
The expression of the surface is obtained by applying the technique of Cartesian product (Section 8.12) to the Catmull–Rom curve. Equation (8.25) produces
\[
\mathbf{P}(u,w) = (u^3,u^2,u,1)\,\mathbf{B}\,\mathbf{P}\,\mathbf{B}^T
\begin{pmatrix}w^3\\ w^2\\ w\\ 1\end{pmatrix},
\qquad (12.55)
\]
where B is the parabolic blending matrix of Equation (12.49)
\[
\mathbf{B}=\begin{pmatrix}-0.5&1.5&-1.5&0.5\\ 1&-2.5&2&-0.5\\ -0.5&0&0.5&0\\ 0&1&0&0\end{pmatrix}
\]
and P is a matrix consisting of the 4×4 points participating in the patch
\[
\mathbf{P}=\begin{pmatrix}
P_{i+3,j}&P_{i+3,j+1}&P_{i+3,j+2}&P_{i+3,j+3}\\
P_{i+2,j}&P_{i+2,j+1}&P_{i+2,j+2}&P_{i+2,j+3}\\
P_{i+1,j}&P_{i+1,j+1}&P_{i+1,j+2}&P_{i+1,j+3}\\
P_{i,j}&P_{i,j+1}&P_{i,j+2}&P_{i,j+3}
\end{pmatrix}.
\]
Notice that the patch spans the area bounded by the four central points. In general, the entire surface spans the area bounded by the four points P11, P1,n−1, Pm−1,1, and Pm−1,n−1. If we want the surface to span the area bounded by the four corner points P00, P0n, Pm0, and Pmn, we have to create two new extreme rows and two new extreme columns of points, by analogy with the Catmull–Rom curve.

Example: Given the following coordinates for 16 points in file CRpoints,

0 0 0    1 0 0       2 0 0       3 0 0
0 1 0    .5 .5 1     2.5 .5 0    3 1 0
0 2 0    .5 2.5 0    2.5 2.5 1   3 2 0
0 3 0    1 3 0       2 3 0       3 3 0

Clear[Pt,Bm,CRpatch,g1,g2];
Pt=ReadList["CRpoints",{Number,Number,Number},RecordLists->True];
Bm:={{-.5,1.5,-1.5,.5},{1,-2.5,2,-.5},{-.5,0,.5,0},{0,1,0,0}};
CRpatch[i_]:=(*1st patch,rows 1-4*)
 {u^3,u^2,u,1}.Bm.Pt[[{1,2,3,4},{1,2,3,4},i]].
 Transpose[Bm].{w^3,w^2,w,1};
g1=Graphics3D[{Red,AbsolutePointSize[6],
 Table[Point[Pt[[i,j]]],{i,1,4},{j,1,4}]}];
g2=ParametricPlot3D[{CRpatch[1],CRpatch[2],CRpatch[3]},
 {u,0,.98},{w,0,1}];
Show[g1,g2,ViewPoint->{-4.322,0.242,0.306},PlotRange->All]
Figure 12.17: A Catmull–Rom Surface Patch.
The Mathematica code of Figure 12.17 reads the file and generates the Catmull–Rom patch. Note how the patch spans only the four center points and how the z coordinates of 0 and 1 create the particular shape of the patch.

Example (extended): We now add four more points to file CRpoints and use rows 2–5 to calculate and display another patch. Notice the five values of y compared to the four values of x. The code of Figure 12.18 reads the extended file and generates and displays both patches. Each patch spans four points, but they share the two points (0.5, 2.5, 0) and (2.5, 2.5, 1). Note how they connect smoothly.

Tension can be added to a Catmull–Rom surface patch in the same way that it is added to a Catmull–Rom curve or to a cardinal spline. Figure 12.19 illustrates how smaller values of s create a surface closer to a flat plane.
0 0 0    1 0 0       2 0 0       3 0 0
0 1 0    .5 .5 1     2.5 .5 0    3 1 0
0 2 0    .5 2.5 0    2.5 2.5 1   3 2 0
0 3 0    1 3 0       2 3 0       3 3 0
0 4 0    1 4 0       2 4 0       3 4 0

Clear[Pt,Bm,CRpatch,CRpatchM,g1,g2,g3];
Pt=ReadList["CRpoints",{Number,Number,Number},RecordLists->True];
Bm:={{-.5,1.5,-1.5,.5},{1,-2.5,2,-.5},{-.5,0,.5,0},{0,1,0,0}};
CRpatch[i_]:=(*1st patch,rows 1-4*){u^3,u^2,u,1}.Bm.
 Pt[[{1,2,3,4},{1,2,3,4},i]].Transpose[Bm].{w^3,w^2,w,1};
CRpatchM[i_]:=(*2nd patch,rows 2-5*){u^3,u^2,u,1}.Bm.
 Pt[[{2,3,4,5},{1,2,3,4},i]].Transpose[Bm].{w^3,w^2,w,1};
g1=Graphics3D[{Red,AbsolutePointSize[6],
 Table[Point[Pt[[i,j]]],{i,1,5},{j,1,4}]}];
g2=ParametricPlot3D[{CRpatch[1],CRpatch[2],CRpatch[3]},
 {u,0,.98},{w,0,1}];
g3=ParametricPlot3D[{CRpatchM[1],CRpatchM[2],CRpatchM[3]},
 {u,0,1},{w,0,1}];
Show[g1,g2,g3,PlotRange->All]

Figure 12.18: Two Catmull–Rom Surface Patches.

12.8 Kochanek–Bartels Splines

The Kochanek–Bartels spline method [Kochanek and Bartels 84] is an extension of the cardinal spline. In addition to the tension parameter T, this method introduces two new parameters, c and b, to control the continuity and bias, respectively, of individual curve segments. The curve is a spline computed from a set of n data points, and the three
shape parameters can be specified separately for each point or can be global. Thus, the user/designer has to specify either 3 or 3n parameters.

Consider an interior point Pk where two spline segments meet. When the "arriving" segment arrives at the point, it is moving in a certain direction that we call the arriving tangent vector. Similarly, the "departing" segment starts at the point while moving in a direction that we call the departing tangent vector. The three shape parameters control these two tangent vectors in various ways. The tension parameter varies the magnitudes of the arriving and departing vectors, the bias parameter rotates both tangents by the same amount from their "natural" direction, and the continuity parameter rotates each tangent separately, so they may no longer point in the same direction.

A complete Kochanek–Bartels spline passes through n given data points P1 through Pn and is computed and displayed in the following steps:

1. The designer (or user) adds two new points P0 and Pn+1. Recall that each cardinal spline segment is determined by a group of four points but goes from the second point to the third one. Adding point P0 makes it possible to have a segment from P1 to P2, and similarly for the new point Pn+1. All the original n points are now interior.

2. Two tangent vectors, arriving and departing, are computed for each of the n interior points from Equations (12.56) and (12.57). The arriving tangent at P1 and the
s=0.4    s=0.9

(* A Catmull-Rom surface with tension *)
Clear[Pt,Bm,CRpatch,g1,g2,s];
Pt={{{0,3,0},{1,3,0},{2,3,0},{3,3,0}},
 {{0,2,0},{.1,2,.9},{2.9,2,.9},{3,2,0}},
 {{0,1,0},{.1,1,.9},{2.9,1,.9},{3,1,0}},
 {{0,0,0},{1,0,0},{2,0,0},{3,0,0}}};
Bm:={{-s,2-s,s-2,s},{2s,s-3,3-2s,-s},{-s,0,s,0},{0,1,0,0}};
CRpatch[i_]:=(*rows 1-4*){u^3,u^2,u,1}.Bm.
 Pt[[{1,2,3,4},{1,2,3,4},i]].Transpose[Bm].{w^3,w^2,w,1};
g1=Graphics3D[{Red,AbsolutePointSize[6],
 Table[Point[Pt[[i,j]]],{i,1,4},{j,1,4}]}];
s=.4;
g2=ParametricPlot3D[{CRpatch[1],CRpatch[2],CRpatch[3]},
 {u,0,1},{w,0,1}];
Show[g1,g2,ViewPoint->{1.431,-4.097,0.011},PlotRange->All]
Figure 12.19: A Catmull–Rom Surface Patch with Tension.
departing tangent at Pn are not used, so the total number of tangents to compute is 2n − 2.

3. The n + 2 points are divided into n − 1 overlapping groups of four points each, and a Hermite curve segment is computed and displayed for each group. The computations are similar to those for the cardinal spline, the only difference being that the tangent vectors are computed in a special way.
Figure 12.20: Two Kochanek–Bartels Spline Segments.
Figure 12.20 shows two spline segments Pk−1(t) and Pk(t) that meet at the interior point Pk. This point is the last endpoint of segment Pk−1(t) and the first endpoint of segment Pk(t). We denote the two tangent vectors at Pk by Pak−1 (defined as Ptk−1(1)) and Pdk (defined as Ptk(0)). In a cardinal spline, the two tangents Pak−1 and Pdk are identical and are proportional to the vector Pk+1 − Pk−1 (the chord surrounding Pk). This guarantees a smooth connection of the two segments. In a Kochanek–Bartels spline, the two tangents are computed as shown here; they have the same magnitude but may point in different directions. Notice that the two endpoints of segment Pk(t) are Pk and Pk+1 and its two extreme tangent vectors are Pdk and Pak. Here is how the tangent vectors are computed.
Tension. In a cardinal spline, tension is controlled by multiplying the tangent vectors by a parameter s. Small values of s produce high tension, so the tension parameter T is defined by s = (1 − T)/2. Thus, we can express the tangents as
\[
\frac{1-T}{2}(P_{k+1}-P_{k-1})
= (1-T)\,\frac{1}{2}\bigl[(P_{k+1}-P_k)+(P_k-P_{k-1})\bigr].
\]
This can be interpreted as (1 − T) multiplied by the average of the "arriving" chord (Pk − Pk−1) and the "departing" chord (Pk+1 − Pk). In a Kochanek–Bartels spline, the tension parameter contributes the same quantity
\[
(1-T_k)\,\frac{1}{2}\bigl[(P_{k+1}-P_k)+(P_k-P_{k-1})\bigr]
\]
to the two tangents Pak−1 and Pdk at point Pk. The value Tk = 1 results in tangent vectors of zero magnitude, which corresponds to maximum tension. The value Tk = 0 (zero tension) results in a contribution of (Pk+1 − Pk−1)/2 to both tangent vectors. The value Tk = −1 results in twice that contribution and therefore in long tangents and low tension.

Continuity. Curves are important in computer animation. An object being animated is often moved along a curve, and the (virtual) camera may also move along a path. Sometimes, an animation path should not be completely smooth, but should feature jumps and jerks at certain points. This effect is achieved in a Kochanek–Bartels spline by separately rotating Pak−1 and Pdk, so that they point in different directions. The contributions of the continuity parameter to these vectors are
\[
\text{contribution to } P^a_{k-1}:\qquad
\frac{1-c_k}{2}(P_k-P_{k-1}) + \frac{1+c_k}{2}(P_{k+1}-P_k),
\]
\[
\text{contribution to } P^d_k:\qquad
\frac{1+c_k}{2}(P_k-P_{k-1}) + \frac{1-c_k}{2}(P_{k+1}-P_k),
\]
where ck is the continuity parameter at point Pk. The value ck = 0 results in Pak−1 = Pdk and therefore in a smooth curve at Pk. For ck ≠ 0, the two tangents are different and the curve has a sharp corner (a kink or a cusp) at point Pk, a corner that becomes more pronounced for large values of ck. The case ck = −1 implies Pak−1 = Pk − Pk−1 (the arriving chord) and Pdk = Pk+1 − Pk (the departing chord). The case ck = 1 produces tangent vectors in the opposite directions: Pak−1 = Pk+1 − Pk and Pdk = Pk − Pk−1. These three extreme cases are illustrated in Figure 12.21.
Figure 12.21: Effects of the Continuity Parameter (c = −1, c = 0, c = 1).
Tension and continuity may have the same effect, yet they affect the dynamics of the curve in different ways, as illustrated by Figure 12.22. Part (a) of the figure shows five points and a two-segment Kochanek–Bartels spline from P1 through P2 to P3. Both the tension and continuity parameters are set to zero at P2, so the direction of the curve at this point is the direction of the chord P3 − P1. Setting T = 1 at P2 increases the tension to maximum at that point, thereby changing the curve to two straight segments (part (b) of the figure). However, if we leave T at zero and set c = −1 at P2, the resulting curve will have the same shape (the direction of the arriving tangent Pa1 is from P1 to P2, while the direction of the departing tangent Pd2 is from P2 to P3).
Figure 12.22: Different Dynamics of Tension and Continuity.
Thus, maximum tension and minimum continuity may result in identical geometries, but not in identical curves. These parameters have different effects on the speed of the curve, as illustrated in part (c) of the figure. Specifically, infinite tension results in nonuniform speed. If the first spline segment P1(t) is plotted by incrementing t in equal steps, the resulting points are first bunched together, then feature larger gaps, and finally become dense again. When the user specifies high (or maximum) tension at a point, the tangent vectors become short (or zero) at the point, but they get longer as the curve moves away from the point. The speed of the curve is determined by the size of its tangent vector, which is why high tension results in nonuniform speed. In contrast, low tension does not affect the magnitude of the tangent vectors, which is why it does not affect the speed. When low continuity results in a straight segment, the speed will be uniform. Curved segments, however, always have variable speed regardless of the continuity parameters at the endpoints of the segment.

Exercise 12.12: Compute the tangent vector of the cardinal spline for s = 0 and show that its length is zero for t = 0 and t = 1, but is nonzero elsewhere.

Bias. In a cardinal spline with zero tension, both tangent vectors at point Pk have the value
\[
\frac{1}{2}(P_{k+1}-P_{k-1}) = \frac{1}{2}\bigl[(P_k-P_{k-1})+(P_{k+1}-P_k)\bigr],
\]
implying that the direction of the curve at point Pk is the average of the two chords connecting at Pk. The Kochanek–Bartels spline introduces an additional (sometimes misunderstood) parameter bk to control the direction of the curve at Pk by rotating Pak−1 and Pdk by the same amount. The contribution of the bias parameter to the arriving and departing
tangents is set (somewhat arbitrarily) to
\[
\frac{1+b_k}{2}(P_k-P_{k-1}) + \frac{1-b_k}{2}(P_{k+1}-P_k).
\]
Setting bk = 1 changes both tangents to Pk − Pk−1, the chord on the left of Pk. The other extreme value, bk = −1, changes them to the chord on the right of Pk. Figure 12.23 illustrates the effects of the three extreme values of bk.
Figure 12.23: Effect of the Bias Parameter b (b = 0, b = 1, b = −1).
Bias is used in computer animation to obtain the effect of overshooting a point (bk = 1) or undershooting it (bk = −1).

The three shape parameters are incorporated in the tangent vectors as follows: the tangent vector that departs point Pk is defined by
\[
P^d_k = P^t_k(0)
= \tfrac{1}{2}(1-T_k)(1+b_k)(1-c_k)(P_k-P_{k-1})
+ \tfrac{1}{2}(1-T_k)(1-b_k)(1+c_k)(P_{k+1}-P_k).
\qquad (12.56)
\]
Similarly, the tangent vector arriving at point Pk+1 is defined by
\[
P^a_k = P^t_k(1)
= \tfrac{1}{2}(1-T_{k+1})(1+b_{k+1})(1+c_{k+1})(P_{k+1}-P_k)
+ \tfrac{1}{2}(1-T_{k+1})(1-b_{k+1})(1-c_{k+1})(P_{k+2}-P_{k+1}).
\qquad (12.57)
\]
As a result, the Kochanek–Bartels curve segment Pk(t) from Pk to Pk+1 is constructed by the familiar expression
\[
P_k(t) = (t^3,t^2,t,1)\,\mathbf{H}
\begin{pmatrix}P_k\\ P_{k+1}\\ P^d_k\\ P^a_k\end{pmatrix},
\]
where H is the Hermite matrix, Equation (11.7). Notice that the segment depends on six shape parameters, three at Pk and three at Pk+1. The segment also depends on four points Pk−1, Pk, Pk+1, and Pk+2. Note also that the second derivatives of this curve are generally not continuous at the data points.
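A compact Python sketch of this construction (the book's code is Mathematica; the name kb_segment is ours, and the tangents follow Equations (12.56) and (12.57) as printed above):

```python
H = [[2, -2, 1, 1], [-3, 3, -2, -1], [0, 0, 1, 0], [1, 0, 0, 0]]

def kb_segment(pts, k, shape, t):
    """Point on the segment from pts[k] to pts[k+1]; shape[i] = (T, c, b)."""
    def chord(i, j):
        return [q - p for p, q in zip(pts[i], pts[j])]
    Tk, ck, bk = shape[k]
    Tk1, ck1, bk1 = shape[k + 1]
    # departing tangent at pts[k], Equation (12.56)
    dl, dr = chord(k - 1, k), chord(k, k + 1)
    Pd = [0.5*(1-Tk)*(1+bk)*(1-ck)*x + 0.5*(1-Tk)*(1-bk)*(1+ck)*y
          for x, y in zip(dl, dr)]
    # arriving tangent at pts[k+1], Equation (12.57)
    al, ar = chord(k, k + 1), chord(k + 1, k + 2)
    Pa = [0.5*(1-Tk1)*(1+bk1)*(1+ck1)*x + 0.5*(1-Tk1)*(1-bk1)*(1-ck1)*y
          for x, y in zip(al, ar)]
    G = (pts[k], pts[k + 1], Pd, Pa)          # Hermite geometry vector
    Tv = (t**3, t**2, t, 1.0)
    F = [sum(Tv[r] * H[r][c] for r in range(4)) for c in range(4)]
    return tuple(sum(F[i] * G[i][d] for i in range(4)) for d in range(2))

# the five points of the example that follows, all shape parameters zero
pts = [(-1, -1), (0, 0), (4, 6), (10, -1), (11, -2)]
zero = [(0, 0, 0)] * 5
p_start = kb_segment(pts, 1, zero, 0.0)
p_end   = kb_segment(pts, 1, zero, 1.0)
```

With all shape parameters zero, each segment interpolates its two interior points, as a cardinal spline does.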
Example: The three points P1 = (0, 0), P2 = (4, 6), and P3 = (10, −1) are given, together with the extra points P0 = (−1, −1) and P4 = (11, −2). Up to nine shape parameters can be specified (three parameters for each of the three interior points). Figure 12.24 shows the curve with all shape parameters set to zero, and the effects of setting T to 1 (maximum tension) and to −1 (a loose curve), setting c to 1, and setting b to 1 (overshoot) and −1 (undershoot), all in P2. The Mathematica code that computed the curves is also included.

Clear[T, H, B, pts, Pa, Pd, te, bi, co]; (*Kochanek-Bartels, 3+2 points*)
T = {t^3, t^2, t, 1};
H = {{2, -2, 1, 1}, {-3, 3, -2, -1}, {0, 0, 1, 0}, {1, 0, 0, 0}};
Pd[k_] := (1 - te[[k + 1]]) (1 + bi[[k + 1]]) (1 + co[[k + 1]]) (pts[[k + 1]] - pts[[k]])/2 +
 (1 - te[[k + 1]]) (1 - bi[[k + 1]]) (1 - co[[k + 1]]) (pts[[k + 2]] - pts[[k + 1]])/2;
Pa[k_] := (1 - te[[k + 2]]) (1 + bi[[k + 2]]) (1 - co[[k + 2]]) (pts[[k + 2]] - pts[[k + 1]])/2 +
 (1 - te[[k + 2]]) (1 - bi[[k + 2]]) (1 + co[[k + 2]]) (pts[[k + 3]] - pts[[k + 2]])/2;
pts := {{-1, -1}, {0, 0}, {4, 6}, {10, -1}, {11, -2}};
te = {0, 0, 0, 0, 0}; bi = {0, 0, 0, 0, 0}; co = {0, 0, 0, 0, 0};
B = {pts[[2]], pts[[3]], Pd[1], Pa[1]};
g1 = ParametricPlot[T.H.B, {t, 0, 1}, PlotRange -> All];
B = {pts[[3]], pts[[4]], Pd[2], Pa[2]};
g2 = ParametricPlot[T.H.B, {t, 0, 1}, PlotRange -> All];
g3 = Graphics[{Red, AbsolutePointSize[6], Table[Point[pts[[i]]], {i, 1, 5}]}];
Show[g1, g2, g3, PlotRange -> All]
Figure 12.24: Effects of the Three Parameters in the Kochanek–Bartels Spline.
12.9 Fitting a PC to Experimental Points

The spline methods discussed so far use data points. The curve methods of Chapters 13 and 14 use control points. In the case of data points, the curve has to pass through all of them. Control points exert a pull on the curve, so each of them pulls the curve toward itself (Section 8.6). The method described here, due to [Plass and Stone 83], uses experimental points ("epoints" for short). Such points are typically obtained as a result of a science experiment, but can also be input by scanning an image. Given n epoints P1, P2, . . . , Pn, the problem is calculating a PC curve that will pass close to all the points but will not necessarily pass through them. A user-specified tolerance parameter controls the closeness of the fit.

Since a PC is fully defined by means of just four coefficients, it cannot have a very complex shape, so it may not be able to follow a set of epoints that meander all over the place. In such a case, the curve will have to be calculated as a spline where each segment is a PC and the segments fit together, either smoothly or with corner joints. In this section, we show how to calculate one such PC, so we assume that the n epoints do not describe a complex curve. To distinguish between simple and complex curves quantitatively, we connect the n epoints with n − 1 straight segments, resulting in an open polygon. Experience shows that the set of points is simple and will allow a PC to follow it if (1) all the angles between consecutive segments are in the range [135°, 180°], (2) the curve has at most one loop, and (3) if it does not have a loop, it can have at most two inflection points.

We denote our single PC segment by
\[
\mathbf{P}(t) = \bigl(u(t), w(t)\bigr)
= \mathbf{a}_3 t^3 + \mathbf{a}_2 t^2 + \mathbf{a}_1 t + \mathbf{a}_0
= (a_{3x},a_{3y})t^3 + (a_{2x},a_{2y})t^2 + (a_{1x},a_{1y})t + (a_{0x},a_{0y}),
\qquad (12.58)
\]
where the four vector coefficients a3, a2, a1, and a0 have to be determined. Together, they constitute eight numbers, so we can say that a PC segment has eight degrees of freedom.

To understand the method, let's imagine that we have somehow found a PC segment P(t) that passes close to all n epoints. We can use this PC to find the n values of the parameter t where the curve passes closest to each of the n epoints. Denoting these values by t1, t2, . . . , tn, we use them to label the epoints Pt1, Pt2, . . . , Ptn. Now imagine the opposite situation where we still don't know the PC segment, but we already have the epoints somehow labeled correctly. Using the coordinates of the n epoints and the n values of t, we could, in such a case, calculate a curve using the least-squares fitting technique. The idea is to start with an initial set of estimated t values, use least squares to calculate a PC segment from this set, use this PC to calculate a better set of t values, and repeat until the curve obtained is close enough to all the epoints. Convergence is not guaranteed, but experience shows that epoints that satisfy the three conditions stated earlier normally result in a reasonably shaped curve in just a few iterations.

The initial set of estimated t values is based on the lengths of the polygon's edges. Denoting the polygon vertices (i.e., the epoints) by Pi = (xi, yi), we define a quantity
sk as the sum of the polygon's edges from P1 to Pk:
\[
s_1 = 0,\qquad
s_k = \sum_{i=1}^{k-1}|P_{i+1}-P_i|
= \sum_{i=1}^{k-1}\sqrt{(x_{i+1}-x_i)^2+(y_{i+1}-y_i)^2},
\qquad k = 2,3,\ldots,n.
\]
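These chord-length sums, and the initial parameter values tk = sk/sn derived from them, can be sketched in Python (the name chord_params is ours):

```python
from math import hypot

def chord_params(pts):
    """Initial parameter values t_k = s_k / s_n from chord lengths."""
    s = [0.0]                                  # s_1 = 0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        s.append(s[-1] + hypot(x1 - x0, y1 - y0))
    return [sk / s[-1] for sk in s]

ts = chord_params([(0, 0), (3, 4), (3, 12)])   # chord lengths 5 and 8
```

The first value is always 0 and the last is always 1, as required.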
The initial value of tk is now defined as the ratio sk/sn, resulting in t1 = 0, tn = 1, and, in general, 0 ≤ ti ≤ 1.

Exercise 12.13: Given the eight epoints P1 = (2, 5), P2 = (2, 8), P3 = (5, 11), P4 = (8, 8), P5 = (11, 4), P6 = (14, 8), P7 = (13, 8), and P8 = (11, 10), draw them in the xy plane, draw the open polygon connecting them, indicate the "bad" polygon vertices, and calculate the quantities sk and tk.

Once a set of ti values is available, a PC curve segment can be calculated by least squares. The principle is to compute values for the four coefficients ai that will minimize the expression
\[
S(\mathbf{a}_0,\mathbf{a}_1,\mathbf{a}_2,\mathbf{a}_3)
= \sum_{j=1}^{n}\bigl(\mathbf{P}(t_j)-\mathbf{P}_j\bigr)^2
= \sum_{j=1}^{n}\Bigl(\sum_{i=0}^{3}\mathbf{a}_i t_j^i - \mathbf{P}_j\Bigr)^2.
\]
We consider this expression a function S of the four coefficients ai and minimize it by (1) writing the four partial derivatives of S,
\[
\frac{\partial S(\mathbf{a}_0,\mathbf{a}_1,\mathbf{a}_2,\mathbf{a}_3)}{\partial \mathbf{a}_k}
= 2\sum_{j=1}^{n}\Bigl(\sum_{i=0}^{3}\mathbf{a}_i t_j^i - \mathbf{P}_j\Bigr)t_j^k,
\qquad 0\le k\le 3,
\]
(2) equating each to zero,
\[
\sum_{i=0}^{3}\Bigl(\sum_{j=1}^{n} t_j^i t_j^k\Bigr)\mathbf{a}_i
= \sum_{j=1}^{n}\mathbf{P}_j t_j^k,
\qquad 0\le k\le 3,
\]
which can also be written
\[
\begin{aligned}
&\mathbf{a}_0(t_1^0 t_1^k + t_2^0 t_2^k + \cdots + t_n^0 t_n^k)
+ \mathbf{a}_1(t_1^1 t_1^k + t_2^1 t_2^k + \cdots + t_n^1 t_n^k)\\
&\quad + \mathbf{a}_2(t_1^2 t_1^k + t_2^2 t_2^k + \cdots + t_n^2 t_n^k)
+ \mathbf{a}_3(t_1^3 t_1^k + t_2^3 t_2^k + \cdots + t_n^3 t_n^k)\\
&= \mathbf{P}_1 t_1^k + \mathbf{P}_2 t_2^k + \cdots + \mathbf{P}_n t_n^k,
\qquad 0\le k\le 3,
\end{aligned}
\qquad (12.59)
\]
and then (3) solving the resulting system of four linear equations in the four unknowns ai.

Having produced values for the four coefficients ai, we use the resulting PC to calculate a better set of t values. For each epoint, we find the value of t that produces the point nearest it on the PC and assign that t value to the epoint. Mathematically, this amounts to finding the minimum distance between an epoint Pj = (xj, yj) and the curve P(t) = (u(t), w(t)). Since the distance involves a square root, we use the square
of the distance (a similar method is used in the Bresenham–Michener circle method, Section 3.8.3). Our problem is, therefore, to minimize the function
\[
D(t) = |\mathbf{P}(t)-\mathbf{P}_j|^2
= \bigl(u(t)-x_j\bigr)^2 + \bigl(w(t)-y_j\bigr)^2,
\]
and we do this by differentiating it with respect to t, equating the derivative to zero, and solving for t. Thus,
\[
2\bigl(u(t)-x_j\bigr)u^t(t) + 2\bigl(w(t)-y_j\bigr)w^t(t) = 0.
\qquad (12.60)
\]
Since u(t) and w(t) are cubic polynomials in t, their derivatives are quadratic polynomials. The left side of Equation (12.60) is thus a degree-5 polynomial in t, so a numerical solution is required. We use the Newton–Raphson method, a general, fast, iterative method for finding roots of functions. Given a function f(t), the method requires an initial value of t (a guess or an estimate) and updates this value by the iteration
\[
t \leftarrow t - \frac{f(t)}{f'(t)}.
\]
If the initial value is close to a root, convergence is fast but is not guaranteed. In our case, function f(t) is given by Equation (12.60), and we always have an estimate for t. Our Newton–Raphson iteration thus becomes (the common factor 2 cancels)
\[
t \leftarrow t -
\frac{\bigl(u(t)-x_j\bigr)u^t(t) + \bigl(w(t)-y_j\bigr)w^t(t)}
{u^t(t)^2 + w^t(t)^2 + \bigl(u(t)-x_j\bigr)u^{tt}(t) + \bigl(w(t)-y_j\bigr)w^{tt}(t)}.
\]
Experience shows that one iteration is enough to produce a new t value much better than its predecessor. The new t values may be located outside the interval [0, 1], so one last step is needed, where all n new t values are linearly scaled to bring them back into the right interval, if necessary. Here is how it's done. If the new t1 is positive, it should not be scaled or changed in any way. This is the algorithm's way of telling us that a better fit would be achieved if the curve did not start at the first epoint. However, if t1 becomes negative (e.g., if t1 = −α), it should be incremented by α to bring it back to zero, and all the other ti's should be incremented by quantities that get smaller with i until they reach zero for i = n (i.e., tn should not be changed). Similarly, if the new tn is less than 1, it should not be scaled, but if it exceeds 1 (by a quantity β), it should be decremented by β and all the other ti's should be decremented by quantities that get smaller with i until they reach zero for i = 1. Once this is grasped, it is easy to guess how a general value ti should be scaled. It should be incremented by α multiplied by some weight and decremented by β multiplied by another weight, such that the weights add up to 1. If the new t1 is positive, α should be set to 0. Similarly, if the new tn is less than 1, β should be set to 0. The result is
\[
t_i \leftarrow t_i + \alpha\,\frac{n-i}{n-1} - \beta\,\frac{i-1}{n-1}.
\qquad (12.61)
\]
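A Python sketch of this rescaling (the name rescale is ours), with zero-based indices so that position i here corresponds to i + 1 in Equation (12.61):

```python
def rescale(ts):
    """Equation (12.61): pull t_1 up to 0 and t_n down to 1 if needed."""
    n = len(ts)
    alpha = -ts[0] if ts[0] < 0 else 0.0        # only correct a negative t_1
    beta = ts[-1] - 1 if ts[-1] > 1 else 0.0    # only correct a t_n above 1
    return [t + alpha * (n - 1 - i) / (n - 1) - beta * i / (n - 1)
            for i, t in enumerate(ts)]

new = rescale([-0.2, 0.5, 1.2])   # both ends out of range
```

The corrections decay linearly toward the opposite end, so the two extreme values land exactly on 0 and 1.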
Exercise 12.14: Given the eight new t values, −0.1, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, and 1.2, use Equation (12.61) to scale them. These are the steps of the iteration. The loop continues until none of the t values changes significantly or, alternatively, until the maximum distance between an epoint and the curve falls below a preset threshold. If this does not happen after a certain, fixed number of iterations, the loop stops and displays an error message (curve does not converge to epoints). Figure 12.25 is an example of a spline fitting a set of 20 epoints. It is easy to see how the fit improves even after a small number of iterations.
Figure 12.25: Spline Fit to Many Epoints (initial fit and after 1, 10, and 100 iterations).
We next discuss how to add constraints to the PC segment that’s being calculated. When the initial t values are calculated by tk = sk /sn , the first value, t1 , becomes zero, and the last value, tn , is set to 1. The PC segment thus starts at the first epoint P1 and ends at Pn . When the t values are updated in an iteration, both t1 and tn may get new values. If t1 goes below zero, it is scaled back to zero. However, if it becomes positive (e.g., t1 = 0.05), it is not changed. This means that point P(0.05) on the curve would be closest to P1 . The start of the curve (point P(0)) would, in this case, be located “before” P1 . A similar situation may happen at the end of the curve, where P(1) may move “past” the last epoint Pn . Fitting a PC segment to a set of epoints in this way generally means that the curve may be “longer” than the set. Sometimes, we want the curve to start and end at the two extreme epoints, so we have to “constrain” it. Another aspect of constraining arises when we are given a complex set of epoints, where more than one PC segment is needed to fit all the points. In such a case, we have to consider the problem of joining individual segments. A segment may therefore have to be constrained by specifying its start and/or end tangent vectors. The point to understand is that each added constraint reduces the quality of the fit. The reason is that a PC depends on four vector coefficients, which constitute eight scalar quantities (it has eight degrees of freedom). Adding a constraint means fixing one or more of those quantities, thereby reducing the number of degrees of freedom, and thus leading to a worse fit. The number of constraints should, therefore, be kept small (no more than one or two).
Adding constraints is done by generalizing the cubic polynomials u(t) and w(t). Instead of writing them in the form

u(t), w(t) = a_3 t^3 + a_2 t^2 + a_1 t + a_0,

we express them as

u(t), w(t) = a_1 F_1(t) + a_2 F_2(t) + a_3 F_3(t) + a_4 F_4(t),

where the F_i(t) are any four linearly independent cubic polynomials. Both u(t) and w(t) remain cubic polynomials, but certain choices of the F_i(t) may make it easy to constrain the endpoints or the extreme tangents of the PC segment. One such choice is the set of four Hermite blending functions of Equation (11.6), duplicated here:

F_1(t) = 2t^3 − 3t^2 + 1,   F_2(t) = −2t^3 + 3t^2,
F_3(t) = t^3 − 2t^2 + t,    F_4(t) = t^3 − t^2.      (11.6)
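The endpoint behavior that makes this particular choice convenient is easy to check numerically. A minimal Python sketch (the names F1 through F4 simply follow Equation (11.6)):

```python
# The four Hermite blending functions of Equation (11.6) and the endpoint
# properties that make them convenient for constraining a PC segment.

def F1(t): return 2*t**3 - 3*t**2 + 1
def F2(t): return -2*t**3 + 3*t**2
def F3(t): return t**3 - 2*t**2 + t
def F4(t): return t**3 - t**2

# F1 and F2 select the two endpoints: F1(0)=1, F1(1)=0, F2(0)=0, F2(1)=1.
assert F1(0) == 1 and F1(1) == 0 and F2(0) == 0 and F2(1) == 1
# F3 and F4 vanish at both ends, so they contribute only tangent terms.
assert F3(0) == F3(1) == 0 and F4(0) == F4(1) == 0
# F1 + F2 is identically 1, checked here at a sample point.
assert abs(F1(0.3) + F2(0.3) - 1) < 1e-12
```

Because F_3 and F_4 vanish at both ends, preassigning a_3 or a_4 fixes a tangent without moving the endpoints of the segment.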
We know from Equation (11.5) that if a PC segment P(t) is expressed as the weighted sum

P(t) = a_1 F_1(t) + a_2 F_2(t) + a_3 F_3(t) + a_4 F_4(t),      (12.62)

then a_1 and a_2 are the endpoints of the segment, and a_3 and a_4 are its two extreme tangents. We can now add constraints by using Equation (12.62) instead of Equation (12.58) and preassigning values to some of the four a_i coefficients. For example, if we want the initial tangent vector to be in the "up" direction (0, 1), we assign a_3 the value (0, 1) and end up with Equation (12.59) becoming a system of three equations in the three unknowns a_1, a_2, and a_4. It is now obvious that the more constraints (i.e., the more coefficients are assigned values and eliminated from Equation (12.59)), the fewer are the possibilities for fitting the PC segment to the epoints. There is, therefore, a trade-off between a good fit and more constraints.

. . . and then in midair the elasticity makes the shape rebound, so what you have is not a circle but some linked spline curves, not exactly symmetrical, because the ball flattens on one side . . .
—John Updike, Roger's Version (1986)
13 Bézier Approximation

Bézier methods for curves and surfaces are popular, are commonly used in practical work, and are described here in detail. Two approaches to the design of a Bézier curve are described, one using Bernstein polynomials and the other based on the mediation operator. Both rectangular and triangular Bézier surface patches are discussed, with examples.

Historical Notes

Pierre Étienne Bézier (pronounced "Bez-yea" or "bez-ee-ay") was an applied mathematician with the French car manufacturer Renault. In the early 1960s, encouraged by his employer, he began searching for ways to automate the process of designing cars. His methods have been the basis of the modern field of computer-aided geometric design (CAGD), a field with practical applications in many areas. It is interesting to note that Paul de Faget de Casteljau, an applied mathematician with Citroën, was the first, in 1959, to develop the various Bézier methods, but because of the secretiveness of his employer, he never published them (except for two internal technical memos that were discovered in 1975). This is why the entire field is named after the second person, Bézier, who developed it.

Bézier and de Casteljau did their work while working for car manufacturers. It is little known that Steven Anson Coons of MIT did most of his work on surfaces (around 1967) while a consultant for Ford. Another mathematician, William J. Gordon, generalized the Coons surfaces in 1969 as part of his work for the General Motors research labs. In addition, airplane designer James Ferguson also came up with the same ideas for the construction of curves and surfaces. It seems that car and airplane manufacturers have been very innovative in the CAGD field. Detailed historical surveys of CAGD can be found in [Farin 04] and [Schumaker 81].
D. Salomon, The Computer Graphics Manual, Texts in Computer Science, DOI 10.1007/978-0-85729-886-7_13, © Springer-Verlag London Limited 2011
13.1 The Bézier Curve

The Bézier curve is a parametric curve P(t) that is a polynomial function of the parameter t. The degree of the polynomial depends on the number of points used to define the curve. The method employs control points and produces an approximating curve (note the title of this chapter). The curve does not pass through the interior points but is attracted by them (however, see Exercise 13.7 for an exception). It is as if the points exert a pull on the curve. Each point influences the direction of the curve by pulling it toward itself, and that influence is strongest when the curve gets nearest the point.

Figure 13.1 shows some examples of cubic Bézier curves. Such a curve is defined by four points and is a cubic polynomial. Notice that one curve has a cusp and another curve has a loop. The fact that the curve does not pass through the points implies that the points are not "set in stone" and can be moved. This makes it easy to edit, modify, and reshape the curve, which is one reason for its importance. The curve can also be edited by adding new points or deleting existing points. These techniques are discussed in Sections 13.8 and 13.9, but they are cumbersome because the mathematical expression of the curve depends on the number of points, not just on the points themselves.
Figure 13.1: Cubic Bézier Curves with Their Control Points and Polygons.
The control polygon of the Bézier curve is the polygon obtained when the control points are connected, in their natural order, with straight segments. How does one go about deriving such a curve? We describe two approaches to the design—a weighted sum and a linear interpolation—and show that they are identical.
13.1.1 Pascal Triangle and the Binomial Theorem

The Pascal triangle and the binomial theorem are related because both employ the same numbers. The Pascal triangle is an infinite triangular matrix that's built from the edges inside:

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
...

We first fill the left and right edges with 1's, then compute each interior element as the sum of the two elements directly above it. As can be expected, it is not hard to obtain an explicit expression for the general element of the Pascal triangle. We first number the rows from 0 starting at the top, and the columns from 0 starting on the left. A general element is denoted by \binom{i}{j}. We then observe that the top two rows (corresponding to i = 0, 1) consist of 1's and that every other row can be obtained as the sum of its predecessor and a shifted version of its predecessor. For example,

    1 3 3 1
  +   1 3 3 1
  = 1 4 6 4 1

This shows that the elements of the triangle satisfy

\binom{i}{0} = \binom{i}{i} = 1,   i = 0, 1, \ldots,
\binom{i}{j} = \binom{i-1}{j-1} + \binom{i-1}{j},   i = 2, 3, \ldots,   j = 1, 2, \ldots, (i − 1).

From this it is easy to derive the explicit expression

\binom{i-1}{j} + \binom{i-1}{j-1}
  = \frac{(i-1)!}{j!\,(i-1-j)!} + \frac{(i-1)!}{(j-1)!\,(i-j)!}
  = \frac{(i-j)(i-1)!}{j!\,(i-j)!} + \frac{j\,(i-1)!}{j!\,(i-j)!}
  = \frac{i!}{j!\,(i-j)!}.

Thus, the general element of the Pascal triangle is the well-known binomial coefficient

\binom{i}{j} = \frac{i!}{j!\,(i-j)!}.
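The recurrence and the explicit formula can be compared directly. A small Python sketch, using the standard-library `math.comb` for the binomial coefficient:

```python
# The Pascal triangle: row i built by the recurrence (each interior element
# is the sum of the two elements above it), compared with the explicit
# binomial formula i!/(j!(i-j)!).
from math import comb

def pascal_row(i):
    row = [1]
    for _ in range(i):
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

assert pascal_row(5) == [1, 5, 10, 10, 5, 1]
assert all(pascal_row(8)[j] == comb(8, j) for j in range(9))
```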
The binomial coefficient is one of Newton's many contributions to mathematics. His binomial theorem states that

(a + b)^n = \sum_{i=0}^{n} \binom{n}{i} a^i b^{n-i}.      (13.1)

This equation can be written in a symmetric way by denoting j = n − i. The result is

(a + b)^n = \sum_{i+j=n,\; i,j \ge 0} \frac{(i+j)!}{i!\,j!} a^i b^j,      (13.2)

from which we can easily guess the trinomial theorem (which is used in Section 13.25)

(a + b + c)^n = \sum_{i+j+k=n,\; i,j,k \ge 0} \frac{(i+j+k)!}{i!\,j!\,k!} a^i b^j c^k.      (13.3)
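Both identities are easy to confirm numerically. A Python sketch for one arbitrary choice of a, b, c, and n (all-integer arithmetic, so the comparison is exact):

```python
# Numerical check of the binomial theorem (13.1) and the trinomial
# theorem (13.3) for one arbitrary choice of a, b, c, n.
from math import comb, factorial

a, b, c, n = 3, 5, 2, 7

lhs = (a + b)**n
rhs = sum(comb(n, i) * a**i * b**(n - i) for i in range(n + 1))
assert lhs == rhs

lhs3 = (a + b + c)**n
rhs3 = sum(factorial(n) // (factorial(i) * factorial(j) * factorial(n - i - j))
           * a**i * b**j * c**(n - i - j)
           for i in range(n + 1) for j in range(n + 1 - i))
assert lhs3 == rhs3
```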
13.2 The Bernstein Form of the Bézier Curve

The first approach to the Bézier curve expresses it as a weighted sum of the points (with, of course, barycentric weights). Each control point is multiplied by a weight and the products are added. We denote the control points by P_0, P_1, \ldots, P_n (n is therefore defined as 1 less than the number of points) and the weights by B_i. The expression of the weighted sum is

P(t) = \sum_{i=0}^{n} P_i B_i,   0 ≤ t ≤ 1.

The result, P(t), depends on the parameter t. Since the points are given by the user, they are fixed, so it is the weights that must depend on t. We therefore denote them by B_i(t). How should B_i(t) behave as a function of t?

We first examine B_0(t), the weight associated with the first point P_0. We want that point to affect the curve mostly at the beginning, i.e., when t is close to 0. Thus, as t grows toward 1 (i.e., as the curve moves away from P_0), B_0(t) should drop down to 0. When B_0(t) = 0, the first point no longer influences the shape of the curve. Next, we turn to B_1(t). This weight function should start small, should have a maximum when the curve approaches the second point P_1, and should then start dropping until it reaches zero. A natural question is: When (for what value of t) does the curve reach its closest approach to the second point? The answer is: It depends on the number of points. For three points (the case n = 2), the Bézier curve passes closest to the second point (the interior point) when t = 0.5. For four points, the curve is nearest the second point when t = 1/3. It is now clear that the weight functions must also depend on n, and we denote them by B_{n,i}(t). Hence, B_{3,1}(t) should start at 0, have a maximum at t = 1/3, and go down to 0 from there. Figure 13.2 shows the desired behavior of B_{n,i}(t)
Figure 13.2: The Bernstein Polynomials for n = 2, 3, and 4.
for n = 2, 3, and 4. The five different weights B_{4,i}(t) have their maxima at t = 0, 1/4, 1/2, 3/4, and 1.

The functions chosen by Bézier (and also by de Casteljau) were derived by the Russian mathematician Sergeĭ Natanovich Bernshteĭn in 1912, as part of his work on approximation theory (see Chapter 6 of [Davis 63]). They are known as the Bernstein polynomials and are defined by

B_{n,i}(t) = \binom{n}{i} t^i (1-t)^{n-i},   where   \binom{n}{i} = \frac{n!}{i!\,(n-i)!}      (13.4)

are the binomial coefficients. These polynomials feature the desired behavior and have a few more useful properties that are discussed here. (In calculating the curve, we assume that the quantity 0^0, which is normally undefined, equals 1.) The Bézier curve is now defined as

P(t) = \sum_{i=0}^{n} P_i B_{n,i}(t),   where   B_{n,i}(t) = \binom{n}{i} t^i (1-t)^{n-i}   and   0 ≤ t ≤ 1.      (13.5)
Each control point (a pair or a triplet of coordinates) is multiplied by its weight, which is in the range [0, 1]. The weights act as blending functions that blend the contributions of the different points. Here is Mathematica code to compute and plot the Bernstein polynomials and the Bézier curve:

(* Just the base functions bern. Note how "pwr" handles 0^0 *)
Clear[pwr, bern, n, i, t];
pwr[x_, y_] := If[x == 0 && y == 0, 1, x^y];
bern[n_, i_, t_] := Binomial[n, i] pwr[t, i] pwr[1 - t, n - i]  (* t^i (1-t)^(n-i) *)
Plot[Evaluate[Table[bern[5, i, t], {i, 0, 5}]], {t, 0, 1}]

Clear[i, t, pnts, pwr, bern, bzCurve, g1, g2];
(* Cubic Bezier curve. Either read the points from a file:
   pnts = ReadList["DataPoints", {Number, Number}];
   or enter them explicitly *)
pnts = {{0, 0}, {.7, 1}, {.3, 1}, {1, 0}};  (* 4 points for a cubic curve *)
pwr[x_, y_] := If[x == 0 && y == 0, 1, x^y];
bern[n_, i_, t_] := Binomial[n, i] pwr[t, i] pwr[1 - t, n - i]
bzCurve[t_] := Sum[pnts[[i + 1]] bern[3, i, t], {i, 0, 3}]
g1 = Graphics[{Red, AbsolutePointSize[6], Table[Point[pnts[[i]]], {i, 1, 4}]}];
g2 = ParametricPlot[bzCurve[t], {t, 0, 1}];
Show[g1, g2, PlotRange -> All]

Next is similar code for a three-dimensional Bézier curve. It was used to draw the space curve of Figure 13.1.

Clear[pnts, pwr, bern, bzCurve, g1, g2, g3, g4];
(* General 3D Bezier curve *)
pnts = {{1, 0, 0}, {0, -3, 0.5}, {-3, 0, 0.75}, {0, 3, 1},
        {3, 0, 1.5}, {0, -3, 1.75}, {-1, 0, 2}};
n = Length[pnts] - 1;
pwr[x_, y_] := If[x == 0 && y == 0, 1, x^y];
bern[n_, i_, t_] := Binomial[n, i] pwr[t, i] pwr[1 - t, n - i]  (* t^i (1-t)^(n-i) *)
bzCurve[t_] := Sum[pnts[[i + 1]] bern[n, i, t], {i, 0, n}];
g1 = ParametricPlot3D[bzCurve[t], {t, 0, 1}, DisplayFunction -> Identity];
g2 = Graphics3D[{AbsolutePointSize[2], Map[Point, pnts]}];
g3 = Graphics3D[{AbsoluteThickness[2], (* control polygon *)
     Table[Line[{pnts[[j]], pnts[[j + 1]]}], {j, 1, n}]}];
g4 = Graphics3D[{AbsoluteThickness[1.5], (* the coordinate axes *)
     Line[{{0, 0, 3}, {0, 0, 0}, {3, 0, 0}, {0, 0, 0}, {0, 3, 0}}]}];
Show[g1, g2, g3, g4, AspectRatio -> Automatic, PlotRange -> All, Boxed -> False]
Exercise 13.1: Design a heart-shaped Bézier curve based on nine control points.

When Bézier started searching for such functions in the early 1960s, he set the following requirements [Bézier 86]:
1. The functions should be such that the curve passes through the first and last control points.
2. The tangent to the curve at the start point should be P_1 − P_0, i.e., the curve should start at point P_0 moving toward P_1. A similar property should hold at the last point.
3. The same requirement is generalized for higher derivatives of the curve at the two extreme endpoints. Hence, P^{tt}(0) should depend only on the first point P_0 and its two neighbors P_1 and P_2. In general, P^{(k)}(0) should depend only on P_0 and its k neighbors P_1 through P_k. This feature provides complete control over the continuity at the joints between separate Bézier curve segments (Section 13.5).
4. The weight functions should be symmetric with respect to t and (1 − t). This means that a reversal of the sequence of control points would not affect the shape of the curve.
5. The weights should be barycentric, to guarantee that the shape of the curve is independent of the coordinate system.
6. The entire curve lies within the convex hull of the set of control points. (See property 8 of Section 13.4 for a discussion of this point.)

The definition listed in Equation (13.5), using Bernstein polynomials as the weights, satisfies all these requirements. In particular, requirement 5 is proved when Equation (13.1) is written in the form [t + (1 − t)]^n = \cdots (see Equation (13.12) if you cannot figure this out). Following are the explicit expressions of these polynomials for n = 2, 3, and 4.

Example: For n = 2 (three control points), the weights are

B_{2,0}(t) = \binom{2}{0} t^0 (1-t)^{2-0} = (1-t)^2,
B_{2,1}(t) = \binom{2}{1} t^1 (1-t)^{2-1} = 2t(1-t),
B_{2,2}(t) = \binom{2}{2} t^2 (1-t)^{2-2} = t^2,

and the curve is

P(t) = (1-t)^2 P_0 + 2t(1-t) P_1 + t^2 P_2
     = \left( (1-t)^2,\; 2t(1-t),\; t^2 \right) (P_0, P_1, P_2)^T
     = (t^2, t, 1) \begin{pmatrix} 1 & -2 & 1 \\ -2 & 2 & 0 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} P_0 \\ P_1 \\ P_2 \end{pmatrix}.      (13.6)
This is the quadratic Bézier curve.

Exercise 13.2: Given three points P_1, P_2, and P_3, calculate the parabola that goes from P_1 to P_3 and whose start and end tangent vectors point in directions P_2 − P_1 and P_3 − P_2, respectively.

In the special case n = 3, the four weight functions are

B_{3,0}(t) = \binom{3}{0} t^0 (1-t)^{3-0} = (1-t)^3,
B_{3,1}(t) = \binom{3}{1} t^1 (1-t)^{3-1} = 3t(1-t)^2,
B_{3,2}(t) = \binom{3}{2} t^2 (1-t)^{3-2} = 3t^2(1-t),
B_{3,3}(t) = \binom{3}{3} t^3 (1-t)^{3-3} = t^3,

and the curve is

P(t) = (1-t)^3 P_0 + 3t(1-t)^2 P_1 + 3t^2(1-t) P_2 + t^3 P_3      (13.7)
     = \left( (1-t)^3,\; 3t(1-t)^2,\; 3t^2(1-t),\; t^3 \right) (P_0, P_1, P_2, P_3)^T
     = \left( (1-3t+3t^2-t^3),\; (3t-6t^2+3t^3),\; (3t^2-3t^3),\; t^3 \right) (P_0, P_1, P_2, P_3)^T
     = (t^3, t^2, t, 1) \begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 3 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix}.      (13.8)
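The matrix form of Equation (13.8) and the Bernstein form of Equation (13.7) are the same polynomial. A short Python sketch confirming this at several t values (the four demo points are arbitrary, not from the book):

```python
# Check that (t^3, t^2, t, 1) * N3 * P equals the Bernstein form (13.7).
P = [(0.0, 0.0), (0.7, 1.0), (0.3, 1.0), (1.0, 0.0)]  # arbitrary demo points

N3 = [[-1,  3, -3, 1],
      [ 3, -6,  3, 0],
      [-3,  3,  0, 0],
      [ 1,  0,  0, 0]]

def bezier_matrix(t):
    T = [t**3, t**2, t, 1.0]
    # coefficient of each control point: c_j = sum_i T[i] * N3[i][j]
    c = [sum(T[i] * N3[i][j] for i in range(4)) for j in range(4)]
    return tuple(sum(c[j] * P[j][d] for j in range(4)) for d in range(2))

def bezier_bernstein(t):
    w = [(1-t)**3, 3*t*(1-t)**2, 3*t**2*(1-t), t**3]
    return tuple(sum(w[j] * P[j][d] for j in range(4)) for d in range(2))

for k in range(11):
    t = k / 10
    assert all(abs(x - y) < 1e-12
               for x, y in zip(bezier_matrix(t), bezier_bernstein(t)))
```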
It is clear that P(t) is a cubic polynomial in t. It is the cubic Bézier curve. In general, the Bézier curve for points P_0, P_1, \ldots, P_n is a polynomial of degree n.

Exercise 13.3: Given the curve P(t) = (1 + t + t^2, t^3), find its control points.

Exercise 13.4: The cubic curve of Equation (13.8) is drawn when the parameter t varies in the interval [0, 1]. Show how to substitute t with a new parameter u such that the curve will be drawn when −1 ≤ u ≤ +1.

Exercise 13.5: Calculate the Bernstein polynomials for n = 4.

It can be proved by induction that the general, (n + 1)-point Bézier curve can be represented by

P(t) = (t^n, t^{n-1}, \ldots, t, 1)\, N\, (P_0, P_1, \ldots, P_{n-1}, P_n)^T = T(t) \cdot N \cdot P,      (13.9)

where N is the (n+1)×(n+1) matrix whose element in row i and column j (both indexes counted from 0) is

N_{ij} = \binom{n}{j} \binom{n-j}{n-i-j} (-1)^{n-i-j}   for i + j ≤ n,   and   N_{ij} = 0   otherwise.      (13.10)

Matrix N is symmetric and its elements below the second diagonal are all zeros. Its determinant therefore equals (up to a sign) the product of the elements on the second diagonal, which are all nonzero. A nonzero determinant implies a nonsingular matrix. Thus, matrix N always has an inverse. N can also be written as the product AB, where A is the matrix with elements A_{ij} = \binom{n-j}{n-i-j} (-1)^{n-i-j} (zero for i + j > n) and B is the diagonal matrix

B = \mathrm{diag}\left( \binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n} \right).

Figure 13.3 shows the Bézier N matrices for n = 1, 2, \ldots, 7.
N1:
 -1  1
  1  0

N2:
  1 -2  1
 -2  2  0
  1  0  0

N3:
 -1  3 -3  1
  3 -6  3  0
 -3  3  0  0
  1  0  0  0

N4:
  1  -4   6 -4  1
 -4  12 -12  4  0
  6 -12   6  0  0
 -4   4   0  0  0
  1   0   0  0  0

N5:
 -1   5 -10  10 -5  1
  5 -20  30 -20  5  0
-10  30 -30  10  0  0
 10 -20  10   0  0  0
 -5   5   0   0  0  0
  1   0   0   0  0  0

N6:
  1  -6  15 -20  15 -6  1
 -6  30 -60  60 -30  6  0
 15 -60  90 -60  15  0  0
-20  60 -60  20   0  0  0
 15 -30  15   0   0  0  0
 -6   6   0   0   0  0  0
  1   0   0   0   0  0  0

N7:
 -1    7  -21   35  -35   21  -7  1
  7  -42  105 -140  105  -42   7  0
-21  105 -210  210 -105   21   0  0
 35 -140  210 -140   35    0   0  0
-35  105 -105   35    0    0   0  0
 21  -42   21    0    0    0   0  0
 -7    7    0    0    0    0   0  0
  1    0    0    0    0    0   0  0

Figure 13.3: The First Seven Bézier Basis Matrices.
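The pattern of these matrices can be generated programmatically. The closed form used below is a reconstruction of Equation (13.10), checked here against the printed matrices; a Python sketch:

```python
# Generate the Bézier basis matrix N for any n. The entry formula
# N[i][j] = C(n,j) * C(n-j, n-i-j) * (-1)^(n-i-j) (zero when i+j > n)
# is a reconstruction consistent with the matrices N1..N7 of Figure 13.3.
from math import comb

def bezier_basis_matrix(n):
    N = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1 - i):
            N[i][j] = comb(n, j) * comb(n - j, n - i - j) * (-1)**(n - i - j)
    return N

# bezier_basis_matrix(3) reproduces the matrix of Equation (13.8).
assert bezier_basis_matrix(3) == [[-1, 3, -3, 1],
                                  [ 3, -6, 3, 0],
                                  [-3,  3, 0, 0],
                                  [ 1,  0, 0, 0]]
```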
Exercise 13.6: Calculate the Bézier curve for the case n = 1 (two control points). What kind of a curve is it?

Exercise 13.7: Generally, the Bézier curve passes through the first and last control points, but not through the intermediate points. Consider the case of three points P_0, P_1, and P_2 on a straight line. Intuitively, it seems that the curve will be a straight line and would therefore pass through the interior point P_1. Is that so?

The Bézier curve can also be represented in a very compact and elegant way as P(t) = (1 − t + tE)^n P_0, where E is the shift operator defined by E P_i = P_{i+1} (i.e., applying E to point P_i produces point P_{i+1}). The definition of E implies E P_0 = P_1, E^2 P_0 = P_2, and E^i P_0 = P_i. The Bézier curve can now be written

P(t) = \sum_{i=0}^{n} \binom{n}{i} t^i (1-t)^{n-i} P_i
     = \sum_{i=0}^{n} \binom{n}{i} t^i (1-t)^{n-i} E^i P_0
     = \sum_{i=0}^{n} \binom{n}{i} (tE)^i (1-t)^{n-i} P_0
     = \left[ tE + (1-t) \right]^n P_0,

where the last step is an application of the binomial theorem, Equation (13.1).

Example: For n = 1, this representation amounts to P(t) = (1 − t + tE) P_0 = P_0 (1 − t) + P_1 t. For n = 2, we get

P(t) = (1 − t + tE)^2 P_0
     = (1 − t + tE − t + t^2 − t^2 E + tE − t^2 E + t^2 E^2) P_0
     = P_0 (1 − 2t + t^2) + P_1 (2t − 2t^2) + P_2 t^2
     = P_0 (1 − t)^2 + P_1\, 2t(1 − t) + P_2\, t^2.

Given n + 1 control points P_0 through P_n, we can represent the Bézier curve for the points by P_n^{(n)}(t), where the quantity P_i^{(j)}(t) is defined recursively by

P_i^{(j)}(t) = (1 − t)\, P_{i-1}^{(j-1)}(t) + t\, P_i^{(j-1)}(t),   for j > 0,
P_i^{(0)}(t) = P_i,   for j = 0.      (13.11)
The following examples show how the definition above is used to generate the quantities P_i^{(j)}(t) and why P_n^{(n)}(t) is the degree-n curve:

P_0^{(0)}(t) = P_0,   P_1^{(0)}(t) = P_1,   P_2^{(0)}(t) = P_2, \ldots,   P_n^{(0)}(t) = P_n,

P_1^{(1)}(t) = (1 − t) P_0^{(0)}(t) + t P_1^{(0)}(t) = (1 − t) P_0 + t P_1,

P_2^{(2)}(t) = (1 − t) P_1^{(1)}(t) + t P_2^{(1)}(t)
  = (1 − t)\left[ (1 − t) P_0 + t P_1 \right] + t\left[ (1 − t) P_1 + t P_2 \right]
  = (1 − t)^2 P_0 + 2t(1 − t) P_1 + t^2 P_2,

P_3^{(3)}(t) = (1 − t) P_2^{(2)}(t) + t P_3^{(2)}(t)
  = (1 − t)\left[ (1 − t) P_1^{(1)}(t) + t P_2^{(1)}(t) \right] + t\left[ (1 − t) P_2^{(1)}(t) + t P_3^{(1)}(t) \right]
  = (1 − t)^2 P_1^{(1)}(t) + 2t(1 − t) P_2^{(1)}(t) + t^2 P_3^{(1)}(t)
  = (1 − t)^2 \left[ (1 − t) P_0 + t P_1 \right] + 2t(1 − t)\left[ (1 − t) P_1 + t P_2 \right] + t^2 \left[ (1 − t) P_2 + t P_3 \right]
  = (1 − t)^3 P_0 + 3t(1 − t)^2 P_1 + 3t^2(1 − t) P_2 + t^3 P_3.
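The recursion (13.11) is also a practical evaluation algorithm (de Casteljau's construction): repeatedly blend adjacent points with weights (1 − t) and t until one point remains. A Python sketch, compared against the Bernstein sum of Equation (13.5) on four arbitrary demo points:

```python
# De Casteljau evaluation via the recursion (13.11), checked against the
# Bernstein-form definition (13.5). The demo points are arbitrary.
from math import comb

def de_casteljau(points, t):
    pts = list(points)
    while len(pts) > 1:
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

def bernstein_sum(points, t):
    n = len(points) - 1
    return tuple(sum(comb(n, i) * t**i * (1 - t)**(n - i) * p[d]
                     for i, p in enumerate(points))
                 for d in range(len(points[0])))

pts = [(0, 0), (0.7, 1), (0.3, 1), (1, 0)]
for k in range(11):
    t = k / 10
    assert all(abs(x - y) < 1e-12
               for x, y in zip(de_casteljau(pts, t), bernstein_sum(pts, t)))
```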
13.3 Fast Calculation of the Curve

Computing the Bézier curve is straightforward but slow. A little thinking, however, shows that it can be speeded up considerably, a feature that makes this curve very useful in practice. This section discusses three methods.

Method 1: We notice the following: The computation requires the binomials \binom{n}{i} for i = 0, 1, \ldots, n, which, in turn, require the factorials 0!, 1!, \ldots, n!. The factorials can be precalculated once (each one from its predecessor) and stored in a table. They can later be used to calculate all the necessary binomials, and those can also be stored in a table. The calculation involves terms of the form t^i for i = 0, 1, \ldots, n and for many t values in the interval [0, 1]. These can also be precomputed and stored in a two-dimensional table where they can be accessed later, using t and i as indexes. This has the advantage that the values of (1 − t)^{n-i} can be read from the same table (using 1 − t and n − i as row and column indexes). The calculation now reduces to a sum where each term is a product of four quantities, one control point and three numbers from tables. Instead of computing

\sum_{i=0}^{n} \binom{n}{i} t^i (1-t)^{n-i} P_i,

we need to compute the simple sum

\sum_{i=0}^{n} \mathrm{Table1}[i, n] \cdot \mathrm{Table2}[t, i] \cdot \mathrm{Table2}[1 − t, n − i] \cdot P_i.

The parameter t is a real number that varies from 0 to 1, so a practical implementation of this method should use an integer T related to t. For example, if we increment t in 100 steps, then T should be the integer 100t.
Method 2: Once n is known, each of the n + 1 Bernstein polynomials B_{n,i}(t), i = 0, 1, \ldots, n, can be precalculated for all the necessary values of t and stored in a table. The curve can now be calculated as the sum

\sum_{i=0}^{n} \mathrm{Table}[t, i]\, P_i,

indicating that each point on the computed curve requires n + 1 table lookups, n + 1 multiplications, and n additions. Again, an integer index T should be used instead of t.

Method 3: Use forward differences in combination with the Taylor series representation, to speed up the calculation significantly. The Bézier curve, which we denote by B(t), is drawn pixel by pixel in a loop where t is incremented from 0 to 1 in fixed, small steps of Δt. The principle of forward differences (Section 8.8.1) is to find a quantity dB such that B(t + Δt) = B(t) + dB for any value of t. If such a dB can be found, then it is enough to calculate B(0) (which, as we know, is simply P_0) and use forward differences to calculate

B(0 + Δt) = B(0) + dB,
B(2Δt) = B(Δt) + dB = B(0) + 2dB,

and, in general,

B(iΔt) = B\left( (i − 1)Δt \right) + dB = B(0) + i\,dB.

The point is that dB should not depend on t. If dB turns out to depend on t, then as we advance t from 0 to 1, we would have to use different values of dB, slowing down the calculations. The fastest way to calculate the curve is to precalculate dB before the loop starts and to repeatedly add this precalculated value to B(t) inside the loop. We calculate dB by using the Taylor series representation of the Bézier curve. In general, the Taylor series representation of a function f(t) at a point f(t + Δt) is the infinite sum

f(t + Δt) = f(t) + f'(t)\,Δt + \frac{f''(t)\,Δ^2t}{2!} + \frac{f'''(t)\,Δ^3t}{3!} + \cdots.

In order to avoid dealing with an infinite sum, we limit our discussion to cubic Bézier curves. These are the most common Bézier curves and are used by many popular graphics applications. They are defined by four control points and are given by Equations (13.7) and (13.8):

B(t) = (1-t)^3 P_0 + 3t(1-t)^2 P_1 + 3t^2(1-t) P_2 + t^3 P_3
     = (t^3, t^2, t, 1) \begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 3 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix}.

These curves are cubic polynomials in t, implying that only their first three derivatives are nonzero. In order to simplify the calculation of their derivatives, we need to express
these curves in the form B(t) = at^3 + bt^2 + ct + d (Equation (10.1)). This is done by

B(t) = (1-t)^3 P_0 + 3t(1-t)^2 P_1 + 3t^2(1-t) P_2 + t^3 P_3
     = \left[ 3(P_1 − P_2) − P_0 + P_3 \right] t^3 + \left[ 3(P_0 + P_2) − 6P_1 \right] t^2 + 3(P_1 − P_0)\,t + P_0
     = at^3 + bt^2 + ct + d,

so

a = 3(P_1 − P_2) − P_0 + P_3,   b = 3(P_0 + P_2) − 6P_1,   c = 3(P_1 − P_0),   d = P_0.

These relations can also be expressed in matrix notation

\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 3 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix}.

The curve is now easy to differentiate

B^t(t) = 3at^2 + 2bt + c,   B^{tt}(t) = 6at + 2b,   B^{ttt}(t) = 6a,

and the Taylor series representation yields

dB = B(t + Δt) − B(t)
   = B^t(t)\,Δt + \frac{B^{tt}(t)\,Δ^2t}{2} + \frac{B^{ttt}(t)\,Δ^3t}{6}
   = 3at^2\,Δt + 2bt\,Δt + c\,Δt + 3at\,Δ^2t + b\,Δ^2t + a\,Δ^3t.

This seems like a failure since the value obtained for dB is a function of t (it should be denoted by dB(t) instead of just dB) and is also slow to calculate. However, the original cubic curve B(t) is a degree-3 polynomial in t, whereas dB(t) is only a degree-2 polynomial. This suggests a way out of our dilemma. We can try to express dB(t) by means of the Taylor series, similar to what we did with the original curve B(t). This should result in a forward difference ddB(t) that's a polynomial of degree 1 in t. The quantity ddB(t) can, in turn, be represented by another Taylor series to produce a forward difference dddB that's a degree-0 polynomial, i.e., a constant. Once we do that, we will end up with an algorithm of the form

precalculate certain quantities;
B := P_0;
for t := 0 to 1 step Δt do
  PlotPixel(B);
  B := B + dB;  dB := dB + ddB;  ddB := ddB + dddB;
endfor;

The quantity ddB(t) is obtained by

dB(t + Δt) = dB(t) + ddB(t) = dB(t) + dB^t(t)\,Δt + \frac{dB^{tt}(t)\,Δ^2t}{2},
yielding

ddB(t) = dB^t(t)\,Δt + \frac{dB^{tt}(t)\,Δ^2t}{2}
       = (6at\,Δt + 2b\,Δt + 3a\,Δ^2t)\,Δt + \frac{6a\,Δt\,Δ^2t}{2}
       = 6at\,Δ^2t + 2b\,Δ^2t + 6a\,Δ^3t.

Finally, the constant dddB is similarly obtained by

ddB(t + Δt) = ddB(t) + dddB = ddB(t) + ddB^t(t)\,Δt,

yielding

dddB = ddB^t(t)\,Δt = 6a\,Δ^3t.

The four quantities involved in the calculation of the curve are therefore

B(t) = at^3 + bt^2 + ct + d,
dB(t) = 3at^2\,Δt + 2bt\,Δt + c\,Δt + 3at\,Δ^2t + b\,Δ^2t + a\,Δ^3t,
ddB(t) = 6at\,Δ^2t + 2b\,Δ^2t + 6a\,Δ^3t,
dddB = 6a\,Δ^3t.

They all have to be calculated at t = 0, as functions of the four control points P_i, before the loop starts:

B(0) = d = P_0,
dB(0) = c\,Δt + b\,Δ^2t + a\,Δ^3t
      = 3Δt(P_1 − P_0) + 3Δ^2t(P_0 − 2P_1 + P_2) + Δ^3t\left[ 3(P_1 − P_2) − P_0 + P_3 \right],
ddB(0) = 2b\,Δ^2t + 6a\,Δ^3t
       = 6Δ^2t(P_0 − 2P_1 + P_2) + 6Δ^3t\left[ 3(P_1 − P_2) − P_0 + P_3 \right],
dddB = 6a\,Δ^3t = 6Δ^3t\left[ 3(P_1 − P_2) − P_0 + P_3 \right].

The above relations can be expressed in matrix notation as follows:

\begin{pmatrix} dddB \\ ddB(0) \\ dB(0) \\ B(0) \end{pmatrix}
= \begin{pmatrix} 6 & 0 & 0 & 0 \\ 6 & 2 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
  \begin{pmatrix} Δ^3t & 0 & 0 & 0 \\ 0 & Δ^2t & 0 & 0 \\ 0 & 0 & Δt & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
  \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}
= \begin{pmatrix} 6 & 0 & 0 & 0 \\ 6 & 2 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
  \begin{pmatrix} Δ^3t & 0 & 0 & 0 \\ 0 & Δ^2t & 0 & 0 \\ 0 & 0 & Δt & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
  \begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 3 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}
  \begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix}
= \begin{pmatrix}
    −6Δ^3t & 18Δ^3t & −18Δ^3t & 6Δ^3t \\
    6Δ^2t − 6Δ^3t & −12Δ^2t + 18Δ^3t & 6Δ^2t − 18Δ^3t & 6Δ^3t \\
    3Δ^2t − Δ^3t − 3Δt & −6Δ^2t + 3Δ^3t + 3Δt & 3Δ^2t − 3Δ^3t & Δ^3t \\
    1 & 0 & 0 & 0
  \end{pmatrix}
  \begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix}
= Q \begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix},
where Q is a 4×4 matrix that can be calculated once Δt is known. A detailed examination of the above expressions shows that the following quantities have to be precalculated: 3Δt, 3Δ^2t, Δ^3t, 6Δ^2t, 6Δ^3t, P_0 − 2P_1 + P_2, and 3(P_1 − P_2) − P_0 + P_3. We therefore end up with the simple, fast algorithm shown in Figure 13.4. For those interested in a quick test, the corresponding Mathematica code is also included.
Q1 := 3Δt;
Q2 := Q1×Δt;   // 3Δ^2t
Q3 := Δ^3t;
Q4 := 2Q2;     // 6Δ^2t
Q5 := 6Q3;     // 6Δ^3t
Q6 := P_0 − 2P_1 + P_2;
Q7 := 3(P_1 − P_2) − P_0 + P_3;
B := P_0;
dB := (P_1 − P_0)Q1 + Q6×Q2 + Q7×Q3;
ddB := Q6×Q4 + Q7×Q5;
dddB := Q7×Q5;
for t := 0 to 1 step Δt do
  Pixel(B);
  B := B + dB;  dB := dB + ddB;  ddB := ddB + dddB;
endfor;

n = 3;
Clear[q1, q2, q3, q4, q5, Q6, Q7, B, dB, ddB, dddB, p0, p1, p2, p3, tabl];
p0 = {0, 1}; p1 = {5, .5}; p2 = {0, .5}; p3 = {0, 1}; (* Four points *)
dt = .01;
q1 = 3dt; q2 = 3dt^2; q3 = dt^3; q4 = 2q2; q5 = 6q3;
Q6 = p0 - 2p1 + p2; Q7 = 3(p1 - p2) - p0 + p3;
B = p0; dB = (p1 - p0) q1 + Q6 q2 + Q7 q3; (* space indicates  *)
ddB = Q6 q4 + Q7 q5; dddB = Q7 q5;         (* multiplication   *)
tabl = {};
Do[{tabl = Append[tabl, B], B = B + dB, dB = dB + ddB, ddB = ddB + dddB},
   {t, 0, 1, dt}];
ListPlot[tabl]

Figure 13.4: A Fast Bézier Curve Algorithm.
Each point of the curve (i.e., each pixel in the loop) is calculated by three additions and three assignments only. There are no multiplications and no table lookups. This is a very fast algorithm indeed!
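The same forward-difference loop can be checked numerically. A Python sketch (the book's test code is in Mathematica; the control points below are the same demo values), comparing the accumulated B against direct evaluation of at^3 + bt^2 + ct + d:

```python
# Forward-difference evaluation of a cubic Bézier curve (the loop of
# Figure 13.4), checked against direct evaluation of the cubic polynomial.

def add(u, v): return [x + y for x, y in zip(u, v)]
def scale(s, u): return [s * x for x in u]

P0, P1, P2, P3 = [0, 1], [5, .5], [0, .5], [0, 1]
a = add(scale(3, add(P1, scale(-1, P2))), add(scale(-1, P0), P3))  # 3(P1-P2)-P0+P3
b = add(scale(3, add(P0, P2)), scale(-6, P1))                      # 3(P0+P2)-6P1
c = scale(3, add(P1, scale(-1, P0)))                               # 3(P1-P0)
d = P0[:]

dt = 0.01
B    = d[:]                                                        # B(0) = P0
dB   = add(add(scale(dt, c), scale(dt**2, b)), scale(dt**3, a))    # dB(0)
ddB  = add(scale(2 * dt**2, b), scale(6 * dt**3, a))               # ddB(0)
dddB = scale(6 * dt**3, a)                                         # constant

max_err = 0.0
for k in range(1, 101):
    # three vector additions per pixel, exactly as in Figure 13.4
    B, dB, ddB = add(B, dB), add(dB, ddB), add(ddB, dddB)
    t = k * dt
    exact = [a[i] * t**3 + b[i] * t**2 + c[i] * t + d[i] for i in range(2)]
    max_err = max(max_err, max(abs(B[i] - exact[i]) for i in range(2)))
assert max_err < 1e-9   # only floating-point rounding accumulates
```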
13.4 Properties of the Curve

The following useful properties are discussed in this section:

1. The weights add up to 1 (they are barycentric). This is easily shown from Newton's binomial theorem (a + b)^n = \sum_{i=0}^{n} \binom{n}{i} a^i b^{n-i}:

1 = \left[ t + (1 − t) \right]^n = \sum_{i=0}^{n} \binom{n}{i} t^i (1-t)^{n-i} = \sum_{i=0}^{n} B_{n,i}(t).      (13.12)
2. The curve passes through the two endpoints P_0 and P_n. We assume that 0^0 = 1 and observe that

B_{n,0}(0) = \binom{n}{0} 0^0 (1 − 0)^{n-0} = 1 \cdot 1 \cdot 1^n = 1,

which implies

P(0) = \sum_{i=0}^{n} P_i B_{n,i}(0) = P_0 B_{n,0}(0) = P_0.

Also, the relation

B_{n,n}(1) = \binom{n}{n} 1^n (1 − 1)^{n-n} = 1 \cdot 1 \cdot 0^0 = 1

implies

P(1) = \sum_{i=0}^{n} P_i B_{n,i}(1) = P_n B_{n,n}(1) = P_n.

3. Another interesting property of the Bézier curve is its symmetry with respect to the numbering of the control points. If we number the points P_n, P_{n-1}, \ldots, P_0, we end up with the same curve, except that it proceeds from right (point P_0) to left (point P_n). The Bernstein polynomials satisfy the identity B_{n,j}(t) = B_{n,n-j}(1 − t), which can be proved directly and which can be used to prove the symmetry

\sum_{j=0}^{n} P_j B_{n,j}(t) = \sum_{j=0}^{n} P_{n-j} B_{n,j}(1 − t).
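Properties 1 through 3 are easy to spot-check numerically from the Bernstein definition alone. A Python sketch (n and t are arbitrary sample values):

```python
# Quick numerical checks of properties 1-3 of the Bézier weights.
from math import comb

def B(n, i, t):
    return comb(n, i) * t**i * (1 - t)**(n - i)   # note: 0**0 == 1 in Python

n, t = 5, 0.37
# Property 1: the weights are barycentric.
assert abs(sum(B(n, i, t) for i in range(n + 1)) - 1) < 1e-12
# Property 2: only the first weight survives at t=0, only the last at t=1.
assert B(n, 0, 0) == 1 and B(n, n, 1) == 1
# Property 3: the symmetry identity B_{n,i}(t) = B_{n,n-i}(1-t).
for i in range(n + 1):
    assert abs(B(n, i, t) - B(n, n - i, 1 - t)) < 1e-12
```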
4. The first derivative (the tangent vector) of the curve is straightforward to derive:

P^t(t) = \sum_{i=0}^{n} P_i B'_{n,i}(t)
  = \sum_{i=0}^{n} P_i \binom{n}{i} \left[ i\,t^{i-1} (1-t)^{n-i} + t^i (n-i)(1-t)^{n-i-1}(-1) \right]
  = \sum_{i=0}^{n} P_i \binom{n}{i} i\,t^{i-1} (1-t)^{n-i} − \sum_{i=0}^{n-1} P_i \binom{n}{i} t^i (n-i)(1-t)^{n-1-i}

(using the identity n\binom{n-1}{i-1} = i\binom{n}{i}, we get)

  = n \sum_{i=1}^{n} P_i \binom{n-1}{i-1} t^{i-1} (1-t)^{(n-1)-(i-1)} − n \sum_{i=0}^{n-1} P_i \binom{n-1}{i} t^i (1-t)^{n-1-i}

(but \binom{n-1}{i-1} t^{i-1} (1-t)^{(n-1)-(i-1)} = B_{n-1,i-1}(t), so)

  = n \sum_{i=0}^{n-1} P_{i+1} B_{n-1,i}(t) − n \sum_{i=0}^{n-1} P_i B_{n-1,i}(t)
  = n \sum_{i=0}^{n-1} \left[ P_{i+1} − P_i \right] B_{n-1,i}(t)
  = n \sum_{i=0}^{n-1} ΔP_i\, B_{n-1,i}(t),   where ΔP_i = P_{i+1} − P_i.      (13.13)
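Equation (13.13) can be confirmed against a numerical derivative. A Python sketch comparing the Bézier-sum tangent with a central-difference approximation (the four demo points are arbitrary):

```python
# Check Equation (13.13): P'(t) = n * sum_i ΔP_i B_{n-1,i}(t).
from math import comb

def B(n, i, t):
    return comb(n, i) * t**i * (1 - t)**(n - i)

pts = [(0, 0), (0.7, 1), (0.3, 1), (1, 0)]   # arbitrary demo points
n = len(pts) - 1

def curve(t):
    return tuple(sum(B(n, i, t) * p[d] for i, p in enumerate(pts))
                 for d in range(2))

def tangent(t):
    dP = [(q[0] - p[0], q[1] - p[1]) for p, q in zip(pts, pts[1:])]
    return tuple(n * sum(B(n - 1, i, t) * v[d] for i, v in enumerate(dP))
                 for d in range(2))

t, h = 0.4, 1e-6
numeric = tuple((c1 - c0) / (2 * h)
                for c0, c1 in zip(curve(t - h), curve(t + h)))
assert all(abs(x - y) < 1e-5 for x, y in zip(tangent(t), numeric))
```

Note that tangent(0) returns n(P_1 − P_0), the initial-tangent result of property 6 below.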
Note that the tangent vector is a Bézier weighted sum (of n terms) where each Bernstein polynomial is the weight of a "control point" ΔP_i (ΔP_i is the difference of two points, hence it is a vector, but since it is represented by a pair or a triplet, we can conveniently consider it a point). As a result, the second derivative is obviously another Bézier sum based on the n − 1 "control points" Δ^2 P_i = ΔP_{i+1} − ΔP_i = P_{i+2} − 2P_{i+1} + P_i.

5. The weight functions B_{n,i}(t) have a maximum at t = i/n. To see this, we first differentiate the weights

B'_{n,i}(t) = \binom{n}{i} \left[ i\,t^{i-1} (1-t)^{n-i} + t^i (n-i)(1-t)^{n-i-1}(-1) \right]
            = \binom{n}{i} i\,t^{i-1} (1-t)^{n-i} − \binom{n}{i} t^i (n-i)(1-t)^{n-1-i},

then equate the derivative to zero

\binom{n}{i} i\,t^{i-1} (1-t)^{n-i} − \binom{n}{i} t^i (n-i)(1-t)^{n-1-i} = 0.

Dividing by t^{i-1}(1-t)^{n-i-1} yields i(1 − t) − t(n − i) = 0, or t = i/n.

6. The two derivatives P^t(0) and P^t(1) are easy to derive from Equation (13.13) and are used to reshape the curve. They are P^t(0) = n(P_1 − P_0) and P^t(1) = n(P_n − P_{n-1}). Since n is always positive, we conclude that P^t(0), the initial tangent of the curve, points in the direction from P_0 to P_1. This initial tangent can easily be controlled by moving point P_1. The situation for the final tangent is similar.

7. The Bézier curve features global control. This means that moving one control point P_i modifies the entire curve. Most of the change, however, occurs in the vicinity of P_i. This feature stems from the fact that the weight functions B_{n,i}(t) are nonzero for all values of t except t = 0 and t = 1. Thus, any change in a control point P_i affects the contribution of the term P_i B_{n,i}(t) for all values of t. The behavior of the global control of the Bézier curve is easy to analyze. When a control point P_k is moved by a vector (α, β) to a new location P_k + (α, β), the curve P(t) is changed from the original sum
\[
\sum_{i=0}^{n} B_{n,i}(t)\mathbf{P}_i
\quad\text{to}\quad
\sum_{i=0}^{n} B_{n,i}(t)\mathbf{P}_i + B_{n,k}(t)(\alpha,\beta) = \mathbf{P}(t) + B_{n,k}(t)(\alpha,\beta).
\]
Thus, every point P(t0) on the curve is moved by the vector Bn,k(t0)(α, β). The points are all moved in the same direction, but by different amounts, depending on t0. This behavior is demonstrated by Figure 13.27b. (In principle, the figure is for a rational curve, but the particular choice of weights in the figure results in a standard curve.)
8. The concept of the convex hull of a set of points was introduced in Section 9.2.5. Here, we show a connection between the Bézier curve and the convex hull. Let P1, P2, . . . , Pn be a given set of points and let a point P be constructed as a barycentric sum of these points with nonnegative weights, i.e.,
\[
\mathbf{P} = \sum_{i=1}^{n} a_i\mathbf{P}_i,
\quad\text{where}\quad
\sum_{i=1}^{n} a_i = 1 \text{ and } a_i \ge 0.
\tag{13.14}
\]
It can be shown that the set of all points P satisfying Equation (13.14) lies in the convex hull of P1, P2, . . . , Pn. The Bézier curve, Equation (13.5), satisfies Equation (13.14) for all values of t, so all its points lie in the convex hull of the set of control points. Thus, the curve is said to have the convex hull property. The significance of this property is that it makes the Bézier curve more predictable. A designer specifying a set of control points needs only a little experience to visualize the shape of the curve, since the convex hull property guarantees that the curve will not "stray" far from the control points.
9. The control polygon of a Bézier curve intersects the curve at the first and the last points and in general may intersect the curve at a certain number, m, of points (Figure 13.1, where m is 2, 3, or 4, may help to visualize this). If we take a straight segment and maneuver it to intersect the curve as many times as possible, we find that the number of intersection points is always less than or equal to m. This property of the Bézier curve may be termed variation diminution.
10. Imagine that each control point is moved 10 units to the left. Such a transformation will move every point on the curve to the left by the same amount. Similarly, if the control points are rotated, reflected, or are subject to any other affine transformation, the entire curve will be transformed in the same way. We say that the Bézier curve is invariant under affine transformations. However, the curve is not invariant under projections. If we compute a three-dimensional Bézier curve and project every point on the curve by a perspective projection, we end up with a two-dimensional curve P(t). If we then project the three-dimensional control points and compute a two-dimensional Bézier curve Q(t) from the projected, two-dimensional points, the two curves P(t) and Q(t) will be different.
Invariance under projections can be achieved by switching from the standard Bézier curve to the rational Bézier curve (Section 13.15).
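Two of these properties are easy to verify numerically. The following sketch (in Python, not from the book; the function names `bernstein` and `bezier` are mine) evaluates the Bernstein weights and the curve directly from their definitions, then checks property 5 (each weight Bn,i peaks at t = i/n) and property 10 (translating the control points translates every curve point by the same amount):

```python
from math import comb

def bernstein(n, i, t):
    """Weight B_{n,i}(t) = C(n,i) t^i (1-t)^(n-i)."""
    return comb(n, i) * t**i * (1 - t)**(n - i)

def bezier(points, t):
    """Evaluate the Bezier curve of the given 2D control points at t."""
    n = len(points) - 1
    x = sum(bernstein(n, i, t) * p[0] for i, p in enumerate(points))
    y = sum(bernstein(n, i, t) * p[1] for i, p in enumerate(points))
    return (x, y)

# Property 5: B_{n,i}(t) attains its maximum at t = i/n.
n, i = 4, 3
samples = [k / 1000 for k in range(1001)]
t_max = max(samples, key=lambda t: bernstein(n, i, t))
assert abs(t_max - i / n) < 1e-3

# Property 10: translating the control points translates every curve point.
pts = [(0, 0), (1, 2), (3, 2), (2, 0)]
moved = [(x + 10, y) for x, y in pts]
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    x0, y0 = bezier(pts, t)
    x1, y1 = bezier(moved, t)
    assert abs(x1 - (x0 + 10)) < 1e-12 and abs(y1 - y0) < 1e-12
```

The same `bezier` routine, applied to rotated or reflected control points, demonstrates invariance under any affine map; only projections break the property, as noted above.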
13.5 Connecting Bézier Curves

The Bézier curve is a polynomial of degree n, which makes it slow to compute for large values of n. It is therefore preferable to connect several Bézier segments, each defined by a few points, typically four to six, into one smooth curve. The condition for smooth connection of two such segments is easy to derive. We assume that the control points are divided into two sets P0, P1, . . . , Pn and Q0, Q1, . . . , Qm. In order for the two segments to connect, Pn must equal Q0. We already know that the extreme tangent vectors of the Bézier curve satisfy Qᵗ(0) = m(Q1 − Q0) and Pᵗ(1) = n(Pn − Pn−1). The condition for a smooth connection is Qᵗ(0) = Pᵗ(1), or mQ1 − mQ0 = nPn − nPn−1. Substituting Q0 = Pn yields
\[
\mathbf{P}_n = \frac{m}{m+n}\mathbf{Q}_1 + \frac{n}{m+n}\mathbf{P}_{n-1}.
\tag{13.15}
\]
The three points Pn−1, Pn, and Q1 must therefore be dependent. Hence, the condition for smooth linking is that the three points Pn−1, Pn, and Q1 be collinear. In the special case where n = m, Equation (13.15) reduces to Pn = 0.5Q1 + 0.5Pn−1, implying that Pn should be the midpoint between Q1 and Pn−1.
Example: Given that P4 = Q0 = (6, −1), Q1 = (7, 0), and m = 5, we compute P3 from
\[
(6,-1) = \frac{5}{4+5}(7,0) + \frac{4}{4+5}\mathbf{P}_3,
\]
which yields P3 = (19/4, −9/4).
Exercise 13.8: A more general condition for a smooth connection of two curve segments is αQᵗ(0) = Pᵗ(1). The two tangents at the connection point are in the same direction, but have different magnitudes. Discuss this condition and what it means for the three control points Pn−1, Pn = Q0, and Q1.
Breaking large curves into short segments has the additional advantage of easy control. The Bézier curve offers only global control, but if it is constructed of separate segments, a change in the control points of one segment will not affect the other segments. Figure 13.5 is an example of two Bézier segments connected smoothly.
Figure 13.5: Connecting Bézier Segments.
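The connection condition of Equation (13.15) is easy to check in code. This is an illustrative sketch (in Python, not from the book; `join_point` is a name introduced here) that verifies the midpoint special case n = m and reproduces the worked example above, where solving for the interior point gives P3 = (19/4, −9/4):

```python
def join_point(P_prev, Q1, n, m):
    """P_n required by Equation (13.15): P_n = m/(m+n)*Q1 + n/(m+n)*P_{n-1}."""
    a, b = m / (m + n), n / (m + n)
    return tuple(a * q + b * p for p, q in zip(P_prev, Q1))

# Special case n = m: the joint must be the midpoint of P_{n-1} and Q1.
assert join_point((2.0, 3.0), (6.0, -1.0), n=3, m=3) == (4.0, 1.0)

# The example from the text (n = 4, m = 5, Q1 = (7, 0), P3 = (19/4, -9/4)):
P4 = join_point((19/4, -9/4), (7.0, 0.0), n=4, m=5)
assert all(abs(c - e) < 1e-12 for c, e in zip(P4, (6.0, -1.0)))
```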
13.5.1 Quadratic and Cubic Blending

We start with the linear blend P(t) = (1 − t)P1 + tP2 (Equation (9.1)). This means that if we select, for example, t = 0.7, then P(t) will be a blend of 30% of P1 and 70% of P2. It is possible to blend points in nonlinear ways. An intuitive way to get, for example, quadratic blending is to square the two weights of the linear blend. However, the result, which is P(t) = (1 − t)²P1 + t²P2, depends on the particular coordinate axes used, since the two coefficients (1 − t)² and t² are not barycentric. It turns out that the sum (1 − t)² + 2t(1 − t) + t² equals 1. As a result, we can use quadratic blending to blend three points, but not two. Similarly, if we try a cubic blend by simply writing P(t) = (1 − t)³P1 + t³P2, we get the same problem. Cubic blending can be achieved by adding four terms with weights t³, 3t²(1 − t), 3t(1 − t)², and (1 − t)³. We therefore conclude that Bézier methods can be used for blending. The Bézier curve is a result of blending several points with the Bernstein polynomials, which add up to unity. Quadratic and cubic blending are special cases of the Bézier blending (or the Bézier interpolation).
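The point of the discussion above is that a valid quadratic blend of three points must use the barycentric weights (1 − t)², 2t(1 − t), t². A small sketch (in Python, not from the book; the function name is mine) confirms that these weights sum to 1 and that the blend is therefore independent of the choice of coordinate origin:

```python
def quadratic_blend(P1, P2, P3, t):
    """Blend three points with the barycentric weights
    (1-t)^2, 2t(1-t), t^2, which always sum to 1."""
    w = ((1 - t)**2, 2 * t * (1 - t), t**2)
    return tuple(w[0]*a + w[1]*b + w[2]*c for a, b, c in zip(P1, P2, P3))

# The weights are barycentric for every t ...
for t in (0.0, 0.25, 0.5, 1.0):
    assert abs((1 - t)**2 + 2*t*(1 - t) + t**2 - 1.0) < 1e-12

# ... so the blend is independent of the coordinate origin: shifting the
# three input points shifts the result by exactly the same amount.
P = quadratic_blend((0, 0), (1, 2), (3, 1), 0.75)
Q = quadratic_blend((5, 5), (6, 7), (8, 6), 0.75)
assert abs(Q[0] - (P[0] + 5)) < 1e-12 and abs(Q[1] - (P[1] + 5)) < 1e-12
```

Replacing the three weights with the squared pair (1 − t)², t² makes the second assertion fail, which is exactly the coordinate-dependence problem described above.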
13.6 The Bézier Curve as a Linear Interpolation

The original form of the Bézier curve, as developed by de Casteljau in 1959, is based on an approach entirely different from that of Bézier. Specifically, it employs linear interpolation and the mediation operator. Before we start, Figure 13.6 captures the essence of the concepts discussed here. The figure shows how a set of straight segments (or, equivalently, a single segment that slides along the base lines) creates the illusion (some would say, the magic) of a curve. Such a curve is called the envelope of the set, and the linear interpolation method of this section shows how to extend this simple construction to more than three points and two segments.
Figure 13.6: A Curve as an Envelope of Straight Segments.
Figure 13.6 involves only three points, which makes it easy to derive the expression of the envelope. The equation of the straight segment from P0 to P1 is P01 (t) = (1−t)P0 +t P1 and the equation of the segment between P1 and P2 is similarly P12 (t) =
(1 − t)P1 + t P2. If we fix t at a certain value, then P01(t) and P12(t) become points on the two segments. The straight segment connecting these points has the familiar form P(t) = (1 − t)P01(t) + t P12(t) = (1 − t)²P0 + 2t(1 − t)P1 + t²P2. For a fixed t, this is a point on the Bézier curve defined by P0, P1, and P2. When t is varied, the entire curve segment is obtained. Thus, the magical envelope has become a familiar curve. We can call this envelope a multilinear curve. Linear, because it is constructed from straight segments, and multi, because several such segments are required. In order to extend this method to more than three points, we need appropriate notation. We start with a simple definition. The mediation operator t[[P0, P1]] between two points P0 and P1 is defined as the familiar linear interpolation* (the fundamental equation of computer graphics)
t[[P0, P1]] = (1 − t)P0 + tP1 = t(P1 − P0) + P0, where 0 ≤ t ≤ 1.
The general definition, for any number of points, is recursive. The mediation operator can be applied to any number of points according to
t[[P0, P1]] = (1 − t)P0 + tP1 = t(P1 − P0) + P0,
t[[P0, P1, P2]] = t[[ t[[P0, P1]], t[[P1, P2]] ]],
t[[P0, P1, P2, P3]] = t[[ t[[P0, P1, P2]], t[[P1, P2, P3]] ]],
⋮
t[[P0, . . . , Pn]] = t[[ t[[P0, . . . , Pn−1]], t[[P1, . . . , Pn]] ]],
where 0 ≤ t ≤ 1. This operator creates curves that interpolate between the points. It has the advantages of being a simple mathematical function (and therefore fast to calculate) and of producing interpolation curves whose shape can easily be predicted. We examine cases involving more and more points.
Case 1. Two points. Given the two points P0 and P1, we denote the straight segment connecting them by L01. It is easy to see that L01 = t[[P0, P1]], because the mediation operator is a linear function of t and because 0[[P0, P1]] = P0 and 1[[P0, P1]] = P1. Notice that values of t below 0 or above 1 correspond to those parts of the line that do not lie between the two points. Such values may be of interest in certain cases but not in the present context. The interpolation curve between the two points is denoted by P¹(t) and is simply selected as the line L01 connecting the points. Hence, P¹(t) = L01 = t[[P0, P1]]. Notice that a straight line is also a polynomial of degree 1.
Case 2. Three points. Given the three points P0, P1, and P2 (Figure 13.7), the mediation operator can be used to construct an interpolation curve between them in the following steps:
1. Construct the two lines L01 = t[[P0, P1]] and L12 = t[[P1, P2]].
* The term "mediation" seems to have originated in [Knuth 86].
Figure 13.7: Repeated Linear Interpolation.
2. For some 0 ≤ t0 ≤ 1, consider the two points P01 = t0[[P0, P1]] and P12 = t0[[P1, P2]]. Connect the points with a line L012. The equation of this line is, of course, t[[P01, P12]] and it equals L012 = t[[P01, P12]] = t[[ t[[P0, P1]], t[[P1, P2]] ]] = t[[P0, P1, P2]].
3. For the same t0, select point P012 = t0[[P0, P1, P2]] on L012. The point can be expressed as P012 = t0[[P0, P1, P2]] = t0[[P01, P12]] = t0[[ t0[[P0, P1]], t0[[P1, P2]] ]].
Now, release t0 and let it vary from 0 to 1. Point P012 slides along the line L012, whose endpoints will, in turn, slide along L01 and L12. The curve described by point P012 as it is sliding is the interpolation curve for P0, P1, and P2 that we are seeking. It is the equivalent of the envelope curve of Figure 13.6. We denote it by P²(t) and its expression is easy to calculate, using the definition of t[[Pi, Pj]]:
P²(t) = t[[P0, P1, P2]] = t[[ t[[P0, P1]], t[[P1, P2]] ]] = t[[tP1 + (1 − t)P0, tP2 + (1 − t)P1]] = t[tP2 + (1 − t)P1] + (1 − t)[tP1 + (1 − t)P0] = P0(1 − t)² + 2P1 t(1 − t) + P2 t².
P²(t) is therefore the Bézier curve for three points.
Case 3. Four points. Given the four points P0, P1, P2, and P3, we follow similar steps:
1. Construct the three lines L01 = t[[P0, P1]], L12 = t[[P1, P2]], and L23 = t[[P2, P3]].
2. Select three points, P01 = t0[[P0, P1]], P12 = t0[[P1, P2]], and P23 = t0[[P2, P3]], and construct lines L012 = t[[P0, P1, P2]] = t[[P01, P12]] and L123 = t[[P1, P2, P3]] = t[[P12, P23]].
3. Select two points, P012 = t0[[P01, P12]] on segment L012 and P123 = t0[[P12, P23]] on segment L123. Construct a new segment L0123 as the mediation t[[P0, P1, P2, P3]] = t[[P012, P123]].
Figure 13.8: Scaffolding for k = 3.
4. Select point P0123 = t0[[P012, P123]] on L0123.
When t0 varies from 0 to 1, point P0123 slides along L0123, whose endpoints, in turn, slide along L012 and L123, which also slide. The entire structure, which resembles a scaffolding (Figure 13.8), slides along the original three lines (see Java animation in [redpicture 11]). The interpolation curve for the four original points is denoted by P³(t) and its expression is not hard to calculate, using the expression for P²(t) = t[[P0, P1, P2]]:
P³(t) = t[[P0, P1, P2, P3]] = t[[ t[[P0, P1, P2]], t[[P1, P2, P3]] ]] = t[t²P3 + 2t(1 − t)P2 + (1 − t)²P1] + (1 − t)[t²P2 + 2t(1 − t)P1 + (1 − t)²P0] = t³P3 + 3t²(1 − t)P2 + 3t(1 − t)²P1 + (1 − t)³P0.
P³(t) is therefore the Bézier curve for four points.
Case 4. In the general case, n + 1 points P0, P1, . . . , Pn are given. The interpolation curve is, similarly, t[[P0, P1, . . . , Pn]] = t[[P01...n−1, P12...n]]. It can be proved by induction that its value is the degree-n polynomial
\[
\mathbf{P}^n(t) = \sum_{i=0}^{n}\mathbf{P}_i B_{n,i}(t),
\quad\text{where } B_{n,i}(t) = \binom{n}{i}t^i(1-t)^{n-i},
\]
that is, the Bézier curve for n + 1 points. The two approaches to curve construction, using Bernstein polynomials and using scaffolding, are therefore equivalent.
Exercise 13.9: The scaffolding algorithm illustrated in Figure 13.8 is easy to understand because of the special placement of the four control points. The resulting curve is similar to a circular arc and does not have an inflection point (Section 8.9.8). Prove your grasp of this algorithm by executing it on the curve of Figure 13.9. Try to select the intermediate points so as to end up with the inflection point.
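The equivalence of the two constructions is easy to test numerically. The sketch below (in Python, not from the book; the function names are mine) implements one scaffolding step as a mediation of consecutive point pairs, repeats it until a single point remains, and compares the result with the Bernstein-polynomial form:

```python
from math import comb

def mediate(points, t):
    """One scaffolding step: t[[Pi, Pi+1]] for every consecutive pair."""
    return [tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])]

def de_casteljau(points, t):
    """Repeated linear interpolation t[[P0, ..., Pn]]."""
    pts = [tuple(map(float, p)) for p in points]
    while len(pts) > 1:
        pts = mediate(pts, t)
    return pts[0]

def bezier_bernstein(points, t):
    """Direct evaluation with the Bernstein weights B_{n,i}(t)."""
    n = len(points) - 1
    return tuple(sum(comb(n, i) * t**i * (1 - t)**(n - i) * p[d]
                     for i, p in enumerate(points))
                 for d in range(len(points[0])))

pts = [(0, 0), (1, 2), (3, 2), (2, 0)]
for t in (0.0, 0.25, 0.5, 1.0):
    a, b = de_casteljau(pts, t), bezier_bernstein(pts, t)
    assert all(abs(x - y) < 1e-12 for x, y in zip(a, b))
```

Note that `mediate` produces exactly the rows of intermediate points that the scaffolding figures show: n points in the first step, n − 1 in the second, and so on.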
Figure 13.9: Scaffolding with an Inflection Point.
Figure 13.10 summarizes the process of scaffolding in the general case. The process takes n steps. In the first step, n new points are constructed between the original n + 1 control points. In the second step, n − 1 new points are constructed, between the n points of step 1 and so on, up to step n, where one point is constructed. The total number of points constructed during the entire process is therefore n + (n − 1) + (n − 2) + · · · + 2 + 1 = n(n + 1)/2.
Step   Points constructed                           # of points
 1     P01 P12 P23 . . . Pn−1,n                     n
 2     P012 P123 P234 . . . Pn−2,n−1,n              n − 1
 3     P0123 P1234 P2345 . . . Pn−3,n−2,n−1,n       n − 2
 ⋮       ⋮                                           ⋮
 n     P0123...n                                    1

Figure 13.10: The n Steps of Scaffolding.
13.7 Blossoming

The curves derived and discussed in the preceding chapters are based on polynomials. A typical curve is a pair or a triplet of polynomials of a certain degree n in t. Mathematicians know that a degree-n polynomial P^n(t) of a single variable can be associated with a function f(u1, u2, . . . , un) in n variables that is linear (i.e., degree-1) in each variable and is symmetric with respect to the order of its variables. Such functions were named blossoms by Lyle Ramshaw in [Ramshaw 87] to denote arrival at a promising stage. (The term pole was originally used by de Casteljau for those functions.) [Gallier 00] is a general, detailed reference for this topic. Given a Bézier curve, this section shows how to derive its blossom and how to use the blossom to label the intermediate points obtained in the scaffolding construction. Other sections show how to apply blossoms to curve algorithms, such as curve subdivision (Section 13.8) and degree elevation (Section 13.9).
Dictionary definitions
Blossom: Noun: The period of greatest prosperity or productivity. Verb: To develop or come to a promising stage (Youth blossomed into maturity).
Blossoming: The process of budding and unfolding of blossoms.
We start by developing a special notation for use with blossoms. The equation of the straight segment from point P0 to point P1 is the familiar linear interpolation P(u) = (1 − u)P0 + uP1. Its start point is P(0), its end point is P(1), and a general point on this segment is P(u) for 0 ≤ u ≤ 1. Because a straight segment has zero curvature, parameter values indicate arc lengths. Thus, the distance between P(0) and P(u) is proportional to u and the distance between P(u) and P(1) is proportional to 1 − u. We can therefore consider parameter values u in the interval [0, 1] a measure of distance (called affine distance) from the start of the segment. We introduce the symbol u to denote point P(u).
Similarly, points P(0) and P(1) are denoted by 0 and 1, respectively (Figure 13.11a).

Figure 13.11: Blossom Notation for Points (Two Segments).
A spline consists of segments connected at the interior points, so we consider two straight segments connected at a common point. The endpoints of each segment are
denoted by 0 and 1, but this creates an ambiguity. There are now two points labeled 0 (Figure 13.11b). We distinguish between them by appending a bit to the symbol of each point. The two endpoints of one segment are now denoted by 00 and 01, while the two endpoints of the other segment are denoted by 10 and 11 (Figure 13.11c). The common point can be denoted by either 01 or 10. So far, it seems that the order of the individual indexes, 01 or 10, is immaterial. The new notation is symmetric with respect to the order of point indexes. We now select a point with a parameter value u on each segment. The two new points are denoted by 0u and 1u (Figure 13.12a), but they can also be denoted by u0 and u1, respectively. The two points are now connected by a segment and a new point selected at affine distance u on that segment (Figure 13.12b). The new point deserves the label uu because the endpoints of its segment have the common index u.
Figure 13.12: Blossom Notation for Points (Two Segments).
At this point it is clear that the simple scaffolding construction of Figure 13.12b is identical to the de Casteljau algorithm of Section 13.6, which implies that point uu is located on the Bézier curve defined by the three points 00, 01, and 11 (Figure 13.12c). To illustrate this process for more points, it is applied to three line segments in Figure 13.13. Two bits are appended to each point in order to distinguish between the segments. Thus, a point is denoted by a triplet of the form 00x, 01x, or 11x. Notice that our indexes are symmetric, so 01x = 10x, which is why we use 11x instead of 10x to identify the third segment. Again, our familiarity with the Bézier curve and the de Casteljau algorithm indicates intuitively that point uuu is located on the Bézier curve defined by the four control points 000, 001, 011, and 111.
Let us be grateful to people who make us happy, they are the charming gardeners who make our souls blossom. —Marcel Proust.
An actual construction of the scaffolding for this case verifies our intuitive feeling. Given points 0uu and uu1, we can write them as 0uu and 1uu, which immediately produces point uuu (it is located at affine distance u from 0uu). Similarly, given points 00u and 0u1, we can write them as 00u and 01u, which immediately produces point 0uu. A similar step produces 00u if points 000 and 001 are given. Thus, we conclude that knowledge of the four control points can produce all the intermediate points in the scaffolding construction and lead to one point uuu
Figure 13.13: Blossom Notation for Points (Three Segments).
that is located on the B´ezier curve defined by the control points. This is an informal statement of the blossoming principle. This principle can be illustrated in a different way. We know that point 0u1 is obtained from points 001 and 011 as the linear interpolation 0u1 = (1−u)001+ u011. We can therefore start from point uuu and figure out its dependence on the four original points 000, 001, 011, and 111 as follows: uuu = (1 − u)0uu + u1uu
= (1 − u)[(1 − u) 00u + u 01u] + u[(1 − u) 10u + u 11u]
= (1 − u)² 00u + 2u(1 − u) 01u + u² 11u
= (1 − u)²[(1 − u) 000 + u 001] + 2u(1 − u)[(1 − u) 010 + u 011] + u²[(1 − u) 110 + u 111]
= (1 − u)³ 000 + 3u(1 − u)² 001 + 3u²(1 − u) 011 + u³ 111
= B₃,₀(u) 000 + B₃,₁(u) 001 + B₃,₂(u) 011 + B₃,₃(u) 111,
where B3,i are the Bernstein polynomials for n = 3. This again shows that point uuu lies on the Bézier curve that is defined by the control points 000, 001, 011, and 111.
So far, blossoming has been used to assign labels to the control points and to the intermediate points. Even this simple application illustrates some of the power and elegance of the blossoming approach. Section 13.6 employs the notation P234, while various authors denote intermediate point i of scaffolding step j by d_i^j. The blossom labels u1 u2 . . . un are much more natural and useful.
We are now ready to see the actual blossom associated with the degree-n polynomial P^n(t) as given by [Ramshaw 87]. The blossom of P^n(t) is a function f(u1, u2, . . . , un) that satisfies the following:
1. f is linear in each variable ui.
2. f is symmetric; the order of variables is irrelevant. Thus, f(u1, u2, . . . , un) = f(u2, u1, . . . , un) or any other permutation of the n variables.
3. The diagonal f(u, u, . . . , u) of f equals P^n(u).
Requirement 1 suggests the name "multilinear function," but [Ramshaw 87] explains why the term "multiaffine" is more appropriate. Given P^n(t), such a multiaffine function is easy to derive and is also unique. Here is an example for n = 3. Given the cubic polynomial P(t) = −3t³ + 6t² + 3t, we are looking for a function f(u, v, w) that is linear in each of its three parameters and is symmetric with respect to their order. The general form of such a function is f(u, v, w) = a1 uvw + a2 uv + a3 uw + a4 vw + a5 u + a6 v + a7 w + a8. If we also require that f(u, v, w) satisfy f(t, t, t) = P(t) for any t, it becomes obvious that a1 must equal the coefficient of t³. Because of the required symmetry, the sum a2 + a3 + a4 must equal the coefficient of t² and the sum a5 + a6 + a7 must equal the coefficient of t. Finally, a8 must equal the free term of P(t). Thus, we end up with the blossom f(u, v, w) = −3uvw + 2(uv + uw + vw) + (u + v + w) + 0. This blossom is unique. In general, given an n-degree polynomial, the corresponding multiaffine blossom function is easy to construct in this way. Here are some examples:
\[
\begin{aligned}
\text{Degree-0. } & P(t) = a \;\rightarrow\; f(u,v,w) = a,\\
\text{Degree-1. } & P(t) = at \;\rightarrow\; f(u,v,w) = \tfrac{a}{3}(u+v+w),\\
\text{Degree-2. } & P(t) = at^2 \;\rightarrow\; f(u,v,w) = \tfrac{a}{3}(uv+uw+vw),\\
\text{Degree-3. } & P(t) = a_3t^3 + a_2t^2 + a_1t + a_0\\
& \;\rightarrow\; f(u,v,w) = a_3\,uvw + \tfrac{a_2}{3}(uv+uw+vw) + \tfrac{a_1}{3}(u+v+w) + a_0.
\end{aligned}
\tag{13.16}
\]
The discussion above shows that the kth control point of the degree-n polynomial is associated with the blossom value f(0 . . . 0 1 . . . 1), with n − k zeros followed by k ones. Notice that there are n + 1 such values, corresponding to the n + 1 control points, and that blossom symmetry implies f(011) = f(101) = f(110). If t varies in the general interval [a, b] instead of in [0, 1], then the kth control point is associated with the blossom value f(a a . . . a b b . . . b), with n − k copies of a followed by k copies of b.
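The blossom construction described above is mechanical enough to sketch in code. The following (in Python, not from the book; `cubic_blossom` is a name introduced here) builds the unique multiaffine blossom of a cubic from its coefficients, then checks the diagonal and symmetry requirements on the example P(t) = −3t³ + 6t² + 3t:

```python
def cubic_blossom(a3, a2, a1, a0):
    """Blossom f(u,v,w) of P(t) = a3 t^3 + a2 t^2 + a1 t + a0:
    multiaffine, symmetric, and with diagonal f(t,t,t) = P(t)."""
    def f(u, v, w):
        e1 = u + v + w               # elementary symmetric sums
        e2 = u*v + u*w + v*w
        e3 = u*v*w
        return a3*e3 + (a2/3)*e2 + (a1/3)*e1 + a0
    return f

# The example from the text: P(t) = -3t^3 + 6t^2 + 3t.
f = cubic_blossom(-3, 6, 3, 0)

# Diagonal property: f(t,t,t) = P(t).
for t in (0.0, 0.5, 1.0, 2.0):
    assert abs(f(t, t, t) - (-3*t**3 + 6*t**2 + 3*t)) < 1e-12

# Symmetry: the order of the arguments is irrelevant.
assert f(0, 1, 1) == f(1, 0, 1) == f(1, 1, 0)
```

Evaluating `f` at the arguments (0,0,0), (0,0,1), (0,1,1), (1,1,1) yields the blossom values associated with the four control points, as discussed above.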
Exercise 13.10: Given the four points P0 = (0, 1, 1), P1 = (1, 1, 0), P2 = (4, 2, 0), and P3 = (6, 1, 1), compute the Bézier curve defined by them, construct the three blossoms associated with this curve, and show that the four blossom values f(0, 0, 0), f(0, 0, 1), f(0, 1, 1), and f(1, 1, 1) yield the control points.
13.7.1 Nonsmooth Bézier Curves

The Bézier curve may have cusps (kinks or sharp corners) at points where it has to loop on itself. At such points, the curve has no definite tangent vector, so if we try to calculate the tangent, we end up with the indefinite direction (0, 0).
Example: Figure 13.14 shows three cubic Bézier curves. All three are generated by control points P0 = (0, 0) and P3 = (1, 0). The other two (interior) control points are as follows:
1. P1 = (0.7, 1) and P2 = (0.3, 1). These produce the smooth green curve (dot-dashed) of Figure 13.14a.
2. P1 = (1, 1) and P2 = (0, 1). Opening up the points produces the cusp of Figure 13.14b (solid curve).
3. P1 = (1.5, 1) and P2 = (−0.5, 1) (points not shown). Opening up the points even more produces a loop (the dashed red curve of Figure 13.14c).
Figure 13.14: Three Bézier Curves.
Exercise 13.11: Calculate the curve of case 2 and show that it has a cusp at its midpoint. (See also Exercise 13.7 for another nonsmooth Bézier curve.)
13.8 Subdividing the Bézier Curve

Bézier methods are interactive. It is possible to control the shape of the curve by moving the control points and by smoothly connecting individual segments. Imagine a situation where the points are moved and maneuvered for a while, but the curve "refuses" to get the right shape. This indicates that there are not enough points. There are two ways to increase the number of points. One is to add a point to a segment while increasing its degree. This is called degree elevation and is discussed in Section 13.9. An alternative is to subdivide a Bézier curve segment into two segments such that there is no change in the shape of the curve. If the original segment is of degree n (i.e., based on n + 1 control points), this is done by adding 2n − 1 new control points and deleting n − 1 of the original points, bringing the number of points to (n + 1) + (2n − 1) − (n − 1) = 2n + 1. Each new segment is based on n + 1 points, and the two segments share one of the new points. With more points, it is now possible to manipulate the control points of the two segments in order to fine-tune the shape of the segments. The advantage of this approach is that both the original and the new curves are based on n + 1 points, so only one set of Bernstein polynomials is needed.
The new points being added consist of some of the ones constructed in the last k steps of the scaffolding process. For the case k = 2 (quadratic curve segments), the three points P01, P12, and P012 are added and the single point P1 is deleted (Figure 13.7). The two new segments consist of points P0, P01, and P012, and P012, P12, and P2. For the case k = 3 (cubic segments), the five points P01, P23, P012, P123, and P0123 are added and the two points P1 and P2 are deleted (Figure 13.8, duplicated here, where the inset shows the two segments with their control polygons). The two new segments consist of points P0, P01, P012, and P0123 and P0123, P123, P23, and P3.
Figure 13.8: Scaffolding and Subdivision for k = 3 (Duplicate).
Using the mediation operator to express the new points in the scaffolding in terms of the original control points produces, for the quadratic case,
P01 = αP0 + (1 − α)P1, P12 = αP1 + (1 − α)P2, P012 = α²P0 + 2α(1 − α)P1 + (1 − α)²P2,
where α is any value in the range [0, 1]. We can therefore write
\[
\begin{pmatrix} \mathbf{P}_0 \\ \mathbf{P}_{01} \\ \mathbf{P}_{012} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ \alpha & 1-\alpha & 0 \\ \alpha^2 & 2\alpha(1-\alpha) & (1-\alpha)^2 \end{pmatrix}
\begin{pmatrix} \mathbf{P}_0 \\ \mathbf{P}_1 \\ \mathbf{P}_2 \end{pmatrix},
\qquad
\begin{pmatrix} \mathbf{P}_{012} \\ \mathbf{P}_{12} \\ \mathbf{P}_2 \end{pmatrix}
=
\begin{pmatrix} \alpha^2 & 2\alpha(1-\alpha) & (1-\alpha)^2 \\ 0 & \alpha & 1-\alpha \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \mathbf{P}_0 \\ \mathbf{P}_1 \\ \mathbf{P}_2 \end{pmatrix},
\]
for the left and right segments, respectively.
Exercise 13.12: Use the mediation operator to calculate the scaffolding for the cubic case (four control points). Use α = 1/2 and write the results in terms of matrices, as above.
In the general case where an (n + 1)-point Bézier curve is subdivided, the n − 1 points being deleted are P1, P2, . . . , Pn−1 (the original n − 1 interior control points). The 2n − 1 points added are the first and last points constructed in each scaffolding step (except the last step, where only one point is constructed). Figure 13.10 shows that these are points P01, Pn−1,n (from step 1), P012, Pn−2,n−1,n (from step 2), P0123, Pn−3,n−2,n−1,n (from step 3), up to P0123...n from step n. The 2n − 1 points being added are therefore P01, P012, P0123, . . . , P0123...n, P123...n, P23...n, . . . , Pn−1,n. These points can be computed in two ways as follows:
1. Perform the entire scaffolding procedure and save all the points, then use only the appropriate 2n − 1 points.
2. Compute just the required points. This is done by means of the two relations
\[
\text{(a) } \mathbf{P}_{012\ldots k} = \sum_{j=0}^{k} B_{k,j}(t)\,\mathbf{P}_j,
\qquad
\text{(b) } \mathbf{P}_{n-k,n-k+1,\ldots,n} = \sum_{j=0}^{k} B_{k,j}(t)\,\mathbf{P}_{n-k+j}.
\tag{13.17}
\]
(These expressions can be proved by induction.)
The first decision that has to be made when subdividing a curve is at what point (what value of t) to break the original curve into two segments. Breaking a curve P(t) into two segments at t = 0.1 will result in a short segment followed by a long segment, each defined by n + 1 control points. Obviously, the first segment will be easier to edit. Once the value of t has been determined, the software computes the 2n − 1 new points. The original n − 1 interior control points are easy to delete, and the set of 2n + 1 points is partitioned into two sets. The procedure that computed the original curve is now invoked twice, to compute and display the two segments.
Exercise 13.13: Given the four points P0 = (0, 1, 1), P1 = (1, 1, 0), P2 = (4, 2, 0), and P3 = (6, 1, 1), apply Equation (13.17)(a,b) to subdivide the Bézier curve Σ B3,i(t)Pi at t = 1/3.
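The subdivision procedure, which collects the first and last point of each scaffolding step as the control points of the left and right segments, can be sketched as follows (in Python, not from the book; the function name is mine):

```python
def subdivide(points, t):
    """Split a Bezier segment at parameter t. The left segment's control
    points are the first point of each scaffolding step; the right
    segment's are the last point of each step (compare Equation (13.17))."""
    pts = [tuple(map(float, p)) for p in points]
    left, right = [pts[0]], [pts[-1]]
    while len(pts) > 1:
        # one scaffolding step: mediate every consecutive pair
        pts = [tuple((1 - t)*a + t*b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
        left.append(pts[0])
        right.append(pts[-1])
    return left, right[::-1]       # both lists include the split point

pts = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (2.0, 0.0)]
left, right = subdivide(pts, 0.25)
assert len(left) == len(right) == 4      # each segment has n + 1 points
assert left[-1] == right[0]              # the shared new control point
assert left[0] == pts[0] and right[-1] == pts[-1]
```

Note that the two segments together use 2n + 1 distinct control points, of which 2n − 1 are new, in agreement with the count above.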
Figure 13.15 illustrates how blossoms are applied to the problem of curve subdivision. The points on the left edge of the triangle become the control points of the first segment. In blossom notation these are the points 0 . . . 0 t . . . t, with n − k zeros followed by k t's. Similarly, the points on the right edge of the triangle become the control points of the second segment. In blossom notation these are the points 1 . . . 1 t . . . t, with n − k ones followed by k t's. There are n + 1 points on each edge, but the total is only 2n + 1 because the top of the triangle has just one point, namely t t . . . t, which is shared by both edges.

              ttt
           0tt   1tt
        00t   01t   11t
     000   001   011   111

Figure 13.15: Blossoming for Subdivision.
13.9 Degree Elevation

Degree elevation of the Bézier curve is a process that starts with a Bézier curve P^n(t) of degree n (i.e., defined by n + 1 control points) and adds a control point, thereby ending up with a curve P^{n+1}(t). The advantage of degree elevation is that the new curve is based on more control points and is therefore easier to edit by maneuvering the points. Its shape can be better fine-tuned than that of the original curve. Just adding a control point is not very useful, because the new point would change the shape of the curve globally. Degree elevation is useful only if it is done without modifying the shape of the curve. The principle of degree elevation is therefore to compute a new set of n + 2 control points Qi from the original set of n + 1 points Pi, such that the Bézier curve P^{n+1}(t) defined by the new points will have the same shape as the original curve P^n(t).
We start with the innocuous identity that is true for any Bézier curve P(t):
P(t) = [t + (1 − t)]P(t) = tP(t) + (1 − t)P(t).
The two Bézier curves on the right-hand side are polynomials of degree n, but because each is multiplied by a degree-1 factor (t or 1 − t), the polynomial on the left-hand side is of degree n + 1. Thus, we can represent a degree-(n + 1) curve as the weighted sum of two degree-n curves and write the identity in the form P^{n+1}(t) = (1 − t)P^n(t) + tP^n(t). We use the notation
\[
\mathbf{P}^n(t) = \sum_{i=0}^{n}\binom{n}{i}t^i(1-t)^{n-i}\,\mathbf{P}_i
\;\overset{\text{def}}{=}\;
\langle\!\langle \mathbf{P}_0, \mathbf{P}_1, \ldots, \mathbf{P}_n \rangle\!\rangle.
\]
(Recall that the angle bracket notation indicates blossoms. The double-angle bracket notation used here implies that each point should be multiplied by the corresponding Bernstein polynomial and the products summed.) The first step is to express tP^n(t) in the new notation:

$$t\mathbf{P}^n(t)=\sum_{i=0}^{n}\binom{n}{i}t^{i+1}(1-t)^{n-i}\mathbf{P}_i
=\sum_{k=1}^{m}\binom{m-1}{k-1}t^{k}(1-t)^{m-k}\mathbf{P}_{k-1}$$
$$=\sum_{k=0}^{m}\frac{k}{m}\binom{m}{k}t^{k}(1-t)^{m-k}\mathbf{P}_{k-1}
=\Bigl\langle\Bigl\langle\,\mathbf{0},\,\frac{\mathbf{P}_0}{n+1},\,\frac{2\mathbf{P}_1}{n+1},\ldots,\frac{n\mathbf{P}_{n-1}}{n+1},\,\mathbf{P}_n\Bigr\rangle\Bigr\rangle.$$

Here, we first use the substitutions k = i + 1 and m = n + 1, and then the identity

$$\binom{m-1}{k-1}=\frac{k}{m}\binom{m}{k}.$$

The next step is to similarly express (1 − t)P^n(t) in the new notation:

$$(1-t)\mathbf{P}^n(t)=\Bigl\langle\Bigl\langle\,\mathbf{P}_0,\,\frac{n\mathbf{P}_1}{n+1},\,\frac{(n-1)\mathbf{P}_2}{n+1},\ldots,\frac{\mathbf{P}_n}{n+1},\,\mathbf{0}\Bigr\rangle\Bigr\rangle.$$

Adding the two expressions produces

$$\mathbf{P}^{n+1}(t)=(1-t)\mathbf{P}^n(t)+t\mathbf{P}^n(t)
=\Bigl\langle\Bigl\langle\,\mathbf{P}_0,\,\frac{\mathbf{P}_0+n\mathbf{P}_1}{n+1},\,\frac{2\mathbf{P}_1+(n-1)\mathbf{P}_2}{n+1},\ldots,\frac{n\mathbf{P}_{n-1}+\mathbf{P}_n}{n+1},\,\mathbf{P}_n\Bigr\rangle\Bigr\rangle,\qquad(13.18)$$

which shows the n + 2 control points that define the new, degree-elevated Bézier curve. If the new control points are denoted by Qi, then the expression above can be summarized by the following notation:

$$\mathbf{Q}_0=\mathbf{P}_0,\qquad \mathbf{Q}_i=a_i\mathbf{P}_{i-1}+(1-a_i)\mathbf{P}_i,\qquad \mathbf{Q}_{n+1}=\mathbf{P}_n,$$

where

$$a_i=\frac{i}{n+1},\qquad i=1,2,\ldots,n.\qquad(13.19)$$
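Equation (13.19) translates directly into code. The following Python sketch (the function name is mine, not from the book) degree-elevates a list of 2D control points:

```python
def elevate_degree(points):
    """Degree-elevate a Bezier curve: n+1 control points in, n+2 out,
    describing the same curve (Equation (13.19))."""
    n = len(points) - 1
    new_pts = [points[0]]                      # Q0 = P0
    for i in range(1, n + 1):
        a = i / (n + 1)                        # a_i = i/(n+1)
        x = a * points[i - 1][0] + (1 - a) * points[i][0]
        y = a * points[i - 1][1] + (1 - a) * points[i][1]
        new_pts.append((x, y))                 # Qi = a_i P_{i-1} + (1-a_i) P_i
    new_pts.append(points[-1])                 # Q_{n+1} = Pn
    return new_pts
```

Applying the function repeatedly elevates the degree many times; each call adds one point and pulls the control polygon toward the curve.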
Exercise 13.14: Given the quadratic Bézier curve defined by the three control points P0, P1, and P2, elevate its degree twice and list the five new control points.

It is possible to elevate the degree of a curve many times. Each time the degree is elevated, the new set of control points grows by one point and also approaches the curve. In the limit, the set consists of infinitely many points that are located on the curve.
Exercise 13.15: Given the four control points P0 = (0, 0), P1 = (1, 2), P2 = (3, 2), and P3 = (2, 0), elevate the degree of the Bézier curve defined by them.

The degree elevation algorithm summarized by Equation (13.19) can also be derived as an application of blossoms. We define a three-parameter function f⋆(u1, u2, u3) as a sum of blossoms of two parameters:

$$\begin{aligned}
f_\star(u_1,u_2,u_3)&=\tfrac13\bigl[f_2(u_1,u_2)+f_2(u_1,u_3)+f_2(u_2,u_3)\bigr]\\
&=\tfrac13\Bigl[a_2u_1u_2+\tfrac{a_1}{2}(u_1+u_2)+a_0\Bigr]
 +\tfrac13\Bigl[a_2u_1u_3+\tfrac{a_1}{2}(u_1+u_3)+a_0\Bigr]
 +\tfrac13\Bigl[a_2u_2u_3+\tfrac{a_1}{2}(u_2+u_3)+a_0\Bigr]\\
&=\frac{a_2}{3}(u_1u_2+u_1u_3+u_2u_3)+\frac{a_1}{3}(u_1+u_2+u_3)+a_0.\qquad(13.20)
\end{aligned}$$

We notice that f⋆(u1, u2, u3) satisfies the following three conditions:
1. It is linear in each of its three parameters.
2. It is symmetric with respect to the order of the parameters.
3. Its diagonal, f⋆(u, u, u), yields the polynomial P2(t) = a2t² + a1t + a0.
We therefore conclude that f⋆(u1, u2, u3) is the (n + 1)-blossom of P2(t). It should be denoted by f3(u1, u2, u3). It can be shown that the extension of Equation (13.20) to any fn+1(u1, u2, ..., un+1) is

$$f_{n+1}(u_1,\ldots,u_{n+1})=\frac{1}{n+1}\sum_{i=1}^{n+1}f_n(u_1,\ldots,\underline{u_i},\ldots,u_{n+1})\qquad(13.21)$$

(where the underline indicates a missing parameter). Section 13.7 shows that control point Pk of a Bézier curve P^n(t) is given by the blossom $f(\underbrace{0\ldots0}_{n-k}\underbrace{1\ldots1}_{k})$. Equation (13.21) implies that the same control point Qk of a Bézier curve P^{n+1}(t) is given as the sum

$$\mathbf{Q}_k=\frac{n+1-k}{n+1}\mathbf{P}_k+\frac{k}{n+1}\mathbf{P}_{k-1},$$

which is identical to Equation (13.19).
13.10 Reparametrizing the Curve

The parameter t normally varies in the range [0, 1]. It is, however, easy to reparametrize the Bézier curve such that its parameter varies in an arbitrary range [a, b], where a and b are real and a < b. The new curve is denoted by Pab(t) and is simply the original curve with a different parameter:

$$\mathbf{P}_{ab}(t)=\mathbf{P}\Bigl(\frac{t-a}{b-a}\Bigr).$$

The two functions Pab(t) and P(t) produce the same curve when t varies from a to b in the former and from 0 to 1 in the latter. Notice that the new curve has tangent vector

$$\mathbf{P}^t_{ab}(t)=\frac{1}{b-a}\,\mathbf{P}^t\Bigl(\frac{t-a}{b-a}\Bigr).$$

Reparametrization can also be used to answer the question: Given a Bézier curve P(t), where 0 ≤ t ≤ 1, how can we calculate a curve Q(t) that's defined on an arbitrary part of P(t)? More specifically, if P(t) is defined by control points Pi and if we select an interval [a, b], how can we calculate control points Qi such that the curve Q(t) based on them will go from P(a) to P(b) (i.e., Q(0) = P(a) and Q(1) = P(b)) and will be identical in shape to P(t) in that interval? As an example, if [a, b] = [0, 0.5], then Q(t) will be identical to the first half of P(t). The point is that the interval [a, b] does not have to be inside [0, 1]. We may select, for example, [a, b] = [0.9, 1.5] and end up with a curve Q(t) that will go from P(0.9) to P(1.5) as t varies from 0 to 1. Even though the Bézier curve was originally designed with 0 ≤ t ≤ 1 in mind, it can still be calculated for t values outside this range. If we like its shape in the range [0.2, 1.1], we may want to calculate new control points Qi and obtain a new curve Q(t) that has this shape when its parameter varies in the standard range [0, 1].

Our approach is to define the new curve Q(t) as P([b − a]t + a) and express the control points Qi of Q(t) in terms of the control points Pi and a and b. We illustrate this technique with the cubic Bézier curve. This curve is given by Equation (13.8), and we can therefore write

$$\mathbf{Q}(t)=\mathbf{P}\bigl((b-a)t+a\bigr)
=\bigl(((b{-}a)t{+}a)^3,\,((b{-}a)t{+}a)^2,\,((b{-}a)t{+}a),\,1\bigr)
\begin{pmatrix}-1&3&-3&1\\3&-6&3&0\\-3&3&0&0\\1&0&0&0\end{pmatrix}
\begin{pmatrix}\mathbf{P}_0\\\mathbf{P}_1\\\mathbf{P}_2\\\mathbf{P}_3\end{pmatrix}$$

$$=(t^3,t^2,t,1)
\begin{pmatrix}(b{-}a)^3&0&0&0\\3a(b{-}a)^2&(b{-}a)^2&0&0\\3a^2(b{-}a)&2a(b{-}a)&b{-}a&0\\a^3&a^2&a&1\end{pmatrix}
\begin{pmatrix}-1&3&-3&1\\3&-6&3&0\\-3&3&0&0\\1&0&0&0\end{pmatrix}
\begin{pmatrix}\mathbf{P}_0\\\mathbf{P}_1\\\mathbf{P}_2\\\mathbf{P}_3\end{pmatrix}$$

$$=\mathbf{T}(t)\cdot\mathbf{A}\cdot\mathbf{M}\cdot\mathbf{P}
=\mathbf{T}(t)\cdot\mathbf{M}\cdot\mathbf{M}^{-1}\cdot\mathbf{A}\cdot\mathbf{M}\cdot\mathbf{P}
=\mathbf{T}(t)\cdot\mathbf{M}\cdot(\mathbf{M}^{-1}\cdot\mathbf{A}\cdot\mathbf{M})\cdot\mathbf{P}
=\mathbf{T}(t)\cdot\mathbf{M}\cdot\mathbf{B}\cdot\mathbf{P}
=\mathbf{T}(t)\cdot\mathbf{M}\cdot\mathbf{Q},$$

where

$$\mathbf{B}=\mathbf{M}^{-1}\cdot\mathbf{A}\cdot\mathbf{M}=
\begin{pmatrix}
(1-a)^3 & 3(a-1)^2a & 3(1-a)a^2 & a^3\\
(a-1)^2(1-b) & (a-1)(-2a-b+3ab) & a(a+2b-3ab) & a^2b\\
(1-a)(b-1)^2 & (b-1)(-a-2b+3ab) & b(2a+b-3ab) & ab^2\\
(1-b)^3 & 3(b-1)^2b & 3(1-b)b^2 & b^3
\end{pmatrix}.\qquad(13.22)$$
The four new control points Qi, i = 0, 1, 2, 3, are therefore obtained by selecting specific values for a and b, calculating matrix B, and multiplying it by the column P = (P0, P1, P2, P3)^T.

Exercise 13.16: Show that the new curve Q(t) is independent of the particular coordinate system used.

Example: We select values b = 2 and a = 1. The new curve Q(t) will be identical to the part of P(t) from P(1) to P(2) (normally, of course, we don't calculate this part, but this example assumes that we are interested in it). Matrix B becomes, in this case,

$$\mathbf{B}=\begin{pmatrix}0&0&0&1\\0&0&-1&2\\0&1&-4&4\\-1&6&-12&8\end{pmatrix}$$

(it is easy to verify that each row sums up to 1) and the new control points are

$$\begin{pmatrix}\mathbf{Q}_0\\\mathbf{Q}_1\\\mathbf{Q}_2\\\mathbf{Q}_3\end{pmatrix}
=\mathbf{B}\begin{pmatrix}\mathbf{P}_0\\\mathbf{P}_1\\\mathbf{P}_2\\\mathbf{P}_3\end{pmatrix}
=\begin{pmatrix}\mathbf{P}_3\\-\mathbf{P}_2+2\mathbf{P}_3\\\mathbf{P}_1-4\mathbf{P}_2+4\mathbf{P}_3\\-\mathbf{P}_0+6\mathbf{P}_1-12\mathbf{P}_2+8\mathbf{P}_3\end{pmatrix}.$$

To understand the geometrical meaning of these points, we define three auxiliary points Ri as follows:

$$\begin{aligned}
\mathbf{R}_1&=\mathbf{P}_1+(\mathbf{P}_1-\mathbf{P}_0),\\
\mathbf{R}_2&=\mathbf{P}_2+(\mathbf{P}_2-\mathbf{P}_1),\\
\mathbf{R}_3&=\mathbf{R}_2+(\mathbf{R}_2-\mathbf{R}_1)=\mathbf{P}_0-4\mathbf{P}_1+4\mathbf{P}_2,
\end{aligned}$$

and write the Qi's in the form

$$\begin{aligned}
\mathbf{Q}_0&=\mathbf{P}_3,\\
\mathbf{Q}_1&=\mathbf{P}_3+(\mathbf{P}_3-\mathbf{P}_2),\\
\mathbf{Q}_2&=\mathbf{Q}_1+(\mathbf{Q}_1-\mathbf{R}_2)=\mathbf{P}_1-4\mathbf{P}_2+4\mathbf{P}_3,\\
\mathbf{Q}_3&=\mathbf{Q}_2+(\mathbf{Q}_2-\mathbf{R}_3)=-\mathbf{P}_0+6\mathbf{P}_1-12\mathbf{P}_2+8\mathbf{P}_3.
\end{aligned}$$
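The closed-form entries of Equation (13.22) are easy to evaluate numerically. The following Python sketch (the function names are mine, not from the book) builds B for a given interval [a, b] and applies it to four 2D control points:

```python
def reparam_matrix(a, b):
    """Matrix B of Equation (13.22): Q = B.P gives the control points of
    the cubic Bezier segment that covers parameter interval [a, b]."""
    return [
        [(1-a)**3,       3*(a-1)**2*a,         3*(1-a)*a**2,     a**3  ],
        [(a-1)**2*(1-b), (a-1)*(-2*a-b+3*a*b), a*(a+2*b-3*a*b),  a**2*b],
        [(1-a)*(b-1)**2, (b-1)*(-a-2*b+3*a*b), b*(2*a+b-3*a*b),  a*b**2],
        [(1-b)**3,       3*(b-1)**2*b,         3*(1-b)*b**2,     b**3  ],
    ]

def reparam_points(pts, a, b):
    """New control points Qi = sum_j B[i][j] * P_j for 2D points."""
    B = reparam_matrix(a, b)
    return [tuple(sum(B[i][j] * pts[j][k] for j in range(4)) for k in range(2))
            for i in range(4)]
```

For [a, b] = [1, 2] the matrix reproduces the example above, and for [a, b] = [0, 1] it reduces to the identity, as it should.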
Figure 13.16: Control Points for the Case [a, b] = [1, 2].
Figure 13.16 illustrates how the four new points Qi are obtained from the four original points Pi.

Example: We select b = 2 and a = 0. The new curve Q(t) will be identical to P(t) from P(0) to P(2). Matrix B becomes

$$\mathbf{B}=\begin{pmatrix}1&0&0&0\\-1&2&0&0\\1&-4&4&0\\-1&6&-12&8\end{pmatrix},$$

and the new control points Vi are

$$\begin{pmatrix}\mathbf{V}_0\\\mathbf{V}_1\\\mathbf{V}_2\\\mathbf{V}_3\end{pmatrix}
=\mathbf{B}\begin{pmatrix}\mathbf{P}_0\\\mathbf{P}_1\\\mathbf{P}_2\\\mathbf{P}_3\end{pmatrix}
=\begin{pmatrix}\mathbf{P}_0\\-\mathbf{P}_0+2\mathbf{P}_1\\\mathbf{P}_0-4\mathbf{P}_1+4\mathbf{P}_2\\-\mathbf{P}_0+6\mathbf{P}_1-12\mathbf{P}_2+8\mathbf{P}_3\end{pmatrix},$$

and it is easy to see that they satisfy V0 = P0, V1 = R1, V2 = R3, and V3 = Q3.

Exercise 13.17: (1) Calculate matrix B for a = 1 and b = a + x (where x is positive); (2) calculate the four new control points Qi as functions of the Pi's and of b; and (3) recalculate them for x = 0.75.

Exercise 13.18: Calculate matrix B and the four new control points Qi for a = 0 and b = 0.5 (the first half of the curve).
13.10.1 Length of the Bézier Curve

The length L(P) of the Bézier curve P(t) can be computed by evaluating the integral

$$L(\mathbf{P})=\int_0^1\lvert\mathbf{P}^t(t)\rvert\,dt$$

(Section 8.2), but this is a tedious operation. It turns out that the length can be calculated approximately (but to any desired accuracy) from the lengths of the control polygon and the chord of the curve.
Figure 13.17: Two Subdivided Curves.
Figure 13.17 shows the control polygons of two cubic Bézier curves (themselves not shown) that have been subdivided. The original control polygon of each curve consists of the three straight segments connecting points P0, P1, P2, and P3. It is clear that the length of this polygon can be used as a (rough) approximation of the length of the curve and also that the control polygon is longer than the curve. It is also clear that the length of the control polygon after one subdivision in the middle (i.e., the length of the five segments connecting points P0, P01, P012, P123, P23, and P3) is a better approximation and that this length will always be longer than that of the curve. We denote the length of the control polygon after k midway subdivisions by L1^(k)(P).

The chord of the original curve is simply the straight segment from P0 to P3. The chord is shorter than the curve and is clearly not a good approximation of the length of the curve (especially for the curve on the right). However, after one midway subdivision, the chord length (the two dashed segments connecting points P0, P0123, and P3) becomes a better approximation, and it is easy to see intuitively that after k subdivisions, the chord length (which we denote by L0^(k)(P)) becomes a better approximation.

Exercise 13.19: What is L0^(2)(P)?

It therefore makes sense to use both L1^(k)(P) and L0^(k)(P) to get a good approximation of the curve length. The former expression should be assigned more weight than
the latter. The result discussed here is due to [Gravesen 93]. It states that the length of the Bézier curve P(t) of order n (i.e., based on n + 1 control points) is given by

$$L(\mathbf{P})=\frac{n-1}{n+1}L_1^{(k)}(\mathbf{P})+\frac{2}{n+1}L_0^{(k)}(\mathbf{P})\qquad(13.23)$$

to within 16^(−k). A segment is supposed to be divided in the middle.

Equation (13.23) is a barycentric weighted sum of L1^(k)(P) and L0^(k)(P), where the former has a large weight (the expression (n − 1)/(n + 1) approaches 1 for large n) and the latter has a small weight (the value 2/(n + 1) approaches 0 for large n). The meaning of the value 16^(−k) is that the difference between the true length and Equation (13.23) decreases by a factor of 16 after each subdivision. In practice, only about three to four subdivisions are required to obtain the length of the curve to a high accuracy, sufficient for most practical purposes.

Notice that L1^(k)(P) ≥ L(P) ≥ L0^(k)(P). An equal sign applies only if the curve is a straight line, in which case both the control polygon and the chord coincide with the curve.
13.10.2 Speed of the Bézier Curve

Speed is normally measured in units of length per unit of time. The speed discussed here, however, is measured in pixels per unit of t. The problem is that incrementing t in equal steps of size Δ moves us unequal distances on the curve. This happens commonly with curves and is not specific to the Bézier curve. It causes two problems:
1. In regions of low speed, where incrementing t by a small unit Δ moves us just a small distance along the curve, the values P(t) and P(t + Δ) may be so close that they may refer to the same pixel. Plotting the same pixel twice slows down the curve plotting algorithm.
2. In regions of high speed, the distance between P(t) and P(t + Δ) may be more than one pixel, causing the final curve to look fragmented.
This section discusses the speed of the Bézier curve and how it is affected by the relative positions of the control points. The curves in Figure 13.18 were constructed by varying t in 30 small steps. The 30 pixels are not uniformly distributed along the curve. This property is a result of the shape of the weight functions and it is easy to verify just by watching the pixels drawn on the screen. The curve of Figure 13.18c is simple. It is close to a straight line (its curvature is small) and it is based on four control points that are roughly equidistant. In this curve, the pixels initially move fast; toward the middle of the curve they slow down; close to the end they speed up again. The explanation of this behavior is simple. At the start, when t is close to zero, the shape of the curve is influenced mostly by Bn,0(t), since the other weights are close to zero. This function, however, has a large negative slope in this region, so every small change in t changes its value (and, as a result, the value of the curve) substantially. The pixels drawn for, say, t = 0.01 and t = 0.02 will be quite separated. Toward the end, when t is close to 1, a similar situation happens with Bn,n(t).
In the middle, however, the curve is influenced by weight functions that do not slope as much, so small changes
Figure 13.18: Speed of Bézier Curves.
in t produce small changes in the curve. Therefore, the pixels drawn for, say, t = 0.5 and t = 0.51 are not separated as much as P(0.01) and P(0.02).

The curve of Figure 13.18b is also close to a straight line, but its four control points are not equidistant. It is easy to see how the pixels bunch together when the curve travels in the region where the first three points are located. Once out of this region, the curve "picks up speed." Figure 13.18a is similar. The pixels are again bunched together in the vicinity of the last three points, but these points are not on a straight line, a feature that gives the curves large curvature in their area. This example shows that we can expect the curve to slow down in regions with high curvature, because the control points must be close together in order to create high curvature.

To understand why the curve slows down when control points are close together, let's imagine an extreme case where P0 = P1 = P2. The expression for the curve in such a case is

$$\mathbf{P}(t)=\mathbf{P}_0\bigl[B_{3,0}(t)+B_{3,1}(t)+B_{3,2}(t)\bigr]+\mathbf{P}_3B_{3,3}(t).$$

It is easy to see that the parameter t must get very close to 1 before point P3 has much influence on the curve (before B3,3(t) becomes larger than the sum B3,0(t) + B3,1(t) + B3,2(t)). This is why the curve spends most of its "time" in the vicinity of the triple point P0, then rushes toward P3 when t gets close to 1.
13.10.3 Constant Speed

Sometimes it is important to move along a Bézier curve at constant speed. A practical example is computer animation, where the (imaginary) camera has to be moved along a curve and stopped to take a snapshot at n + 1 equally spaced positions (Section 19.2). The method discussed here is based on approximating the curve by a polyline, then finding the values ti of the parameter t that advance equal distances on the polyline, and using them to move along the curve. To construct the polyline, the algorithm selects points
on the curve and connects them with straight segments. In regions where the curve is close to a straight line (i.e., has low curvature), these points can be well separated. In regions where the curvature is high, the points must, of course, be close together (Figure 13.19a) to guarantee good approximation. The points are selected by applying the subdivision method of Section 13.8. A subdivision divides a curve into two curves that connect at a point and this point becomes a vertex of the polyline. Our algorithm thus proceeds in the following steps: 1. The first and last points of the curve are placed in the (initially empty) list of polyline points.
Figure 13.19: Unequally Spaced Points.
2. The curve is checked to see if it deviates from a straight line sufficiently to justify being subdivided. If yes, it is subdivided, the common point of subdivision is added to the list of polyline points, and each of the resulting two curves is recursively checked to see if it should be further subdivided. This step provides adaptive subdivision of the curve, i.e., only high-curvature areas are further subdivided.
3. The polyline created by the list of points provides a close approximation to the curve. The length L of this polyline is the sum of the lengths of the individual segments, so it is easy to calculate. Our original problem was to move along the curve and stop at n + 1 equally spaced points. Now that we have a polyline of length L closely following the curve, we divide it into n chunks of size s = L/n each (Figure 13.19b).
4. The n + 1 parameter values ti that divide the polyline into chunks of size s are calculated. These values are later used to move along the curve and stop at n + 1 points. The better the polyline approximates the curve, the more equally spaced these points will be.

Subdividing the Bézier curve is time-consuming, so the minimum number of subdivisions should be used. At the same time, each subdivision improves the approximation of the polyline to the curve. The test used in step 2 is therefore crucial to the performance of the algorithm. This test is based on Equation (13.23) (Section 13.10.1), which defines the relation between the length of a Bézier curve and the lengths of its control polygon and its chord. The control polygon is normally longer than the curve; the chord
is normally shorter. The two quantities have the same length only when the curve is a straight line. The conclusion is that the closer the lengths of the control polygon and the chord, the closer the curve is to a straight line. The curve should thus be recursively subdivided if the test if(ctrl_polygon - chord>=eps) or, alternatively, if(ctrl_polygon > (1+eps)*chord) is satisfied, where eps is a small, user-controlled tolerance parameter (notice that the control polygon cannot be shorter than the chord, so the difference ctrl_polygon - chord is never negative).

Figure 13.20 is a pseudo-code for step 4. We assume that we already have a k-segment polyline based on the k + 1 points P0, P1, ..., Pk obtained by the subdivisions. The algorithm starts by measuring the total length L of the polyline and calculating s = L/n, where n is an input parameter. The main loop iterates over the segments and measures n chunks of length s. For a general segment from Pi−1 to Pi, variable st measures the distance from the end of the last chunk to the end of the previous segment. A piece of size s−st is still needed to complete the current chunk. This piece may require just part of the current segment, or the entire segment and part (or all) of the next one. Variable t is incremented from 0 to 1. Each time a chunk of length s is identified, t is set to the correct value at the end of this chunk. At the end of an iteration, it is always set to its value at point Pi (the end of the segment).
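The flatness test of step 2 can be sketched as follows; this Python routine (the function names are mine) subdivides at t = 1/2 until the control polygon is no more than (1 + eps) times the chord, and returns the vertices of the resulting polyline:

```python
import math

def _dist(p, q):
    return math.hypot(q[0]-p[0], q[1]-p[1])

def _split(pts):
    """One de Casteljau subdivision at t = 1/2; returns the two halves."""
    left, right = [], []
    layer = pts
    while layer:
        left.append(layer[0])
        right.insert(0, layer[-1])
        layer = [((a[0]+b[0])/2, (a[1]+b[1])/2)
                 for a, b in zip(layer, layer[1:])]
    return left, right

def flatten(pts, eps=1e-3):
    """Adaptive polyline: subdivide while ctrl_polygon > (1+eps)*chord
    (the test of step 2); high-curvature regions get more vertices."""
    chord = _dist(pts[0], pts[-1])
    poly = sum(_dist(a, b) for a, b in zip(pts, pts[1:]))
    if poly > (1 + eps) * chord:
        left, right = _split(pts)
        return flatten(left, eps)[:-1] + flatten(right, eps)
    return [pts[0], pts[-1]]
```

A straight control polygon comes back as a single segment; a curved one is refined adaptively, exactly as the recursive description above demands.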
13.10.4 Converting Cubic Curves

The fact that the Bézier curve has the convex hull property makes it useful to convert other types of curves to a Bézier curve. The discussion below shows how to do this for the cubic case. Let

$$\mathbf{Q}(t)=(t^3,t^2,t,1)\,\mathbf{M}\begin{pmatrix}\mathbf{Q}_0\\\mathbf{Q}_1\\\mathbf{Q}_2\\\mathbf{Q}_3\end{pmatrix}$$

be any cubic parametric curve, where the Qi's may be points, tangent vectors, or any other nonscalar quantities. The cubic Bézier curve is given by Equation (13.8),

$$\mathbf{P}(t)=(t^3,t^2,t,1)\,\mathbf{B}\begin{pmatrix}\mathbf{P}_0\\\mathbf{P}_1\\\mathbf{P}_2\\\mathbf{P}_3\end{pmatrix},$$

where B is the basis matrix

$$\mathbf{B}=\begin{pmatrix}-1&3&-3&1\\3&-6&3&0\\-3&3&0&0\\1&0&0&0\end{pmatrix}.$$
t=0; TotSegLen=0; // total length of segments visited so far
L=0; // total length of polyline
for i=1 to k do L=L+|Pi − Pi−1|; endfor;
st=0; s=L/n; // size of a chunk
AddTable(0); // add initial value
for i=1 to k do // loop over k segments
  SegLen=|Pi − Pi−1|;
  TotSegLen=TotSegLen+SegLen;
  if(s-st≤SegLen) then // a chunk ends at this segment
    t=t+(s-st)/L; AddTable(t);
    while SegLen>s do // more chunks in this segment
      t=t+s/L; AddTable(t);
      SegLen=SegLen-s;
    endwhile;
    st=SegLen;
  else // entire segment is part of chunk
    st=st+SegLen;
  endif;
  t=t+TotSegLen/L;
endfor;
AddTable(1); // add final value

Figure 13.20: Measuring n Chunks on a Polyline.
For the curves to be equal, the following must be true:

$$\mathbf{B}\begin{pmatrix}\mathbf{P}_0\\\mathbf{P}_1\\\mathbf{P}_2\\\mathbf{P}_3\end{pmatrix}
=\mathbf{M}\begin{pmatrix}\mathbf{Q}_0\\\mathbf{Q}_1\\\mathbf{Q}_2\\\mathbf{Q}_3\end{pmatrix}.$$

Thus, the solution is

$$\begin{pmatrix}\mathbf{P}_0\\\mathbf{P}_1\\\mathbf{P}_2\\\mathbf{P}_3\end{pmatrix}
=\mathbf{B}^{-1}\mathbf{M}\begin{pmatrix}\mathbf{Q}_0\\\mathbf{Q}_1\\\mathbf{Q}_2\\\mathbf{Q}_3\end{pmatrix},$$

and it always exists since we know that B is nonsingular. Similarly, it is possible to convert the Bézier curve into any other cubic form, provided M is nonsingular. The following discussion shows the relationship between the Bézier curve and the Hermite curve segment. A similar relationship between the Bézier curve and the Catmull–Rom curve is shown in Section 13.12.
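As a numerical sketch of this conversion (the function names are mine), the following Python code multiplies a given basis matrix M by a hard-coded B⁻¹ (the inverse of the Bézier basis matrix, which can be verified by multiplying it by B) and applies the product to the geometry vector:

```python
# Inverse of the cubic Bezier basis matrix B.
BEZIER_INV = [[0, 0, 0, 1],
              [0, 0, 1/3, 1],
              [0, 1/3, 2/3, 1],
              [1, 1, 1, 1]]

def matmul(A, B):
    """Product of two 4x4 matrices."""
    return [[sum(A[i][k]*B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def to_bezier(M, geometry):
    """Given a cubic curve (t^3,t^2,t,1).M.G, return the four Bezier
    control points B^{-1}.M.G that trace the identical curve."""
    C = matmul(BEZIER_INV, M)
    dim = len(geometry[0])
    return [tuple(sum(C[i][j]*geometry[j][k] for j in range(4))
                  for k in range(dim))
            for i in range(4)]
```

Feeding it the Hermite basis matrix and the 4-tuple (P0, P1, T0, T1) reproduces the control points of Equation (13.25) below, P0, P0 + T0/3, P1 − T1/3, P1.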
Any set of four given control points P0, P1, P2, and P3 determines a unique (cubic) Bézier curve. It is interesting to note that there is a Hermite curve that has an identical shape. It is determined by the 4-tuple

$$(\mathbf{P}_0,\;\mathbf{P}_3,\;3(\mathbf{P}_1-\mathbf{P}_0),\;3(\mathbf{P}_3-\mathbf{P}_2)).\qquad(13.24)$$

Exercise 13.20: Prove this claim!

The opposite is also true. Given two points P0 and P1 and two tangent vectors P0^t and P1^t, they define a Hermite segment. An identical Bézier segment is determined by the 4-tuple

$$\bigl(\mathbf{P}_0,\;\mathbf{P}_0+\tfrac13\mathbf{P}_0^t,\;\mathbf{P}_1-\tfrac13\mathbf{P}_1^t,\;\mathbf{P}_1\bigr).\qquad(13.25)$$
13.11 Cubic Bézier Segments with Tension

Adding a tension parameter to a cubic Bézier segment is done by manipulating tangent vectors, similar to how tension is added to the Cardinal spline (Section 12.5). We use Hermite interpolation (Equation (11.7)) to calculate a PC segment that starts at point P0 and ends at point P3 and whose extreme tangent vectors are s(P1 − P0) and s(P3 − P2) (see Equation (13.24)). Substituting these values in Equation (11.7), we manipulate it so that it ends up looking like a cubic Bézier segment, Equation (13.8):

$$\begin{aligned}
\mathbf{P}(t)&=(t^3,t^2,t,1)
\begin{pmatrix}2&-2&1&1\\-3&3&-2&-1\\0&0&1&0\\1&0&0&0\end{pmatrix}
\begin{pmatrix}\mathbf{P}_0\\\mathbf{P}_3\\s(\mathbf{P}_1-\mathbf{P}_0)\\s(\mathbf{P}_3-\mathbf{P}_2)\end{pmatrix}\\
&=(t^3,t^2,t,1)
\begin{pmatrix}2-s&s&-s&s-2\\2s-3&-2s&s&3-s\\-s&s&0&0\\1&0&0&0\end{pmatrix}
\begin{pmatrix}\mathbf{P}_0\\\mathbf{P}_1\\\mathbf{P}_2\\\mathbf{P}_3\end{pmatrix}.\qquad(13.26)
\end{aligned}$$
A quick check verifies that Equation (13.26) reduces to the cubic Bézier segment, Equation (13.8), for s = 3. This value is therefore considered the "neutral" or "standard" value of the tension parameter s. Since s controls the length of the tangent vectors, small values of s should produce the effects of higher tension and, in the extreme, the value s = 0 should result in indefinite tangent vectors and in the curve segment becoming a straight line. To show this, we rewrite Equation (13.26) for s = 0:
$$\mathbf{P}(t)=(t^3,t^2,t,1)
\begin{pmatrix}2&0&0&-2\\-3&0&0&3\\0&0&0&0\\1&0&0&0\end{pmatrix}
\begin{pmatrix}\mathbf{P}_0\\\mathbf{P}_1\\\mathbf{P}_2\\\mathbf{P}_3\end{pmatrix}
=(2t^3-3t^2+1)\mathbf{P}_0+(-2t^3+3t^2)\mathbf{P}_3.$$
Substituting T = 3t² − 2t³ for t changes the expression above to the form P(T) = (P3 − P0)T + P0, i.e., a straight line from P(0) = P0 to P(1) = P3.

The tangent vector of Equation (13.26) is

$$\begin{aligned}
\mathbf{P}^t(t)&=(3t^2,2t,1,0)
\begin{pmatrix}2-s&s&-s&s-2\\2s-3&-2s&s&3-s\\-s&s&0&0\\1&0&0&0\end{pmatrix}
\begin{pmatrix}\mathbf{P}_0\\\mathbf{P}_1\\\mathbf{P}_2\\\mathbf{P}_3\end{pmatrix}\\
&=\bigl[3t^2(2-s)+2t(2s-3)-s\bigr]\mathbf{P}_0+\bigl[3st^2-4st+s\bigr]\mathbf{P}_1\\
&\quad+\bigl[-3st^2+2st\bigr]\mathbf{P}_2+\bigl[3t^2(s-2)+2t(3-s)\bigr]\mathbf{P}_3.\qquad(13.27)
\end{aligned}$$

The extreme tangents are P^t(0) = s(P1 − P0) and P^t(1) = s(P3 − P2). Substituting s = 0 in Equation (13.27) yields the tangent vector for the case of infinite tension (compare with Exercise 12.12):

$$\mathbf{P}^t(t)=6(t^2-t)\mathbf{P}_0-6(t^2-t)\mathbf{P}_3=6(t-t^2)(\mathbf{P}_3-\mathbf{P}_0).\qquad(13.28)$$
Exercise 13.21: Since the spline segment is a straight line in this case, its tangent vector should always point in the same direction. Use Equation (13.28) to show that this is so. See also Section 14.4 for a discussion of cubic B-spline with tension. We interrupt this program to increase dramatic tension. —Joe Leahy (as the Announcer) in Freakazoid! (1995).
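A small Python sketch of Equation (13.26) (the function name is mine) evaluates the tension segment and can be used to confirm that s = 3 reproduces the ordinary cubic Bézier segment, while s = 0 collapses it onto the straight line from P0 to P3:

```python
def tension_bezier(pts, s, t):
    """Point on the cubic Bezier-with-tension segment of Equation (13.26).
    pts holds the four 2D control points; s = 3 is the neutral value."""
    N = [[2-s,   s,    -s, s-2],
         [2*s-3, -2*s,  s, 3-s],
         [-s,    s,     0, 0  ],
         [1,     0,     0, 0  ]]
    T = [t**3, t**2, t, 1]
    # coef[c] is the blending weight of control point c at parameter t
    coef = [sum(T[r] * N[r][c] for r in range(4)) for c in range(4)]
    return tuple(sum(coef[j] * pts[j][k] for j in range(4)) for k in range(2))
```

Values of s between 0 and 3 tighten the segment gradually; values above 3 loosen it.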
13.12 An Interpolating Bézier Curve: I

Any set of four control points P1, P2, P3, and P4 determines a unique Catmull–Rom segment that's a cubic polynomial going from point P2 to point P3. It turns out that such a segment can also be written as a four-point Bézier curve from P2 to P3. All that we have to do is find two points, X and Y, located between P2 and P3, such that the Bézier curve based on P2, X, Y, and P3 will be identical to the Catmull–Rom segment. This turns out to be an easy task. We start with the expressions for a Catmull–Rom segment defined by P1, P2, P3, and P4, and for a four-point Bézier curve defined by P2, X, Y, and P3 (Equations (12.49) and (13.8)):

$$(t^3,t^2,t,1)
\begin{pmatrix}-0.5&1.5&-1.5&0.5\\1&-2.5&2&-0.5\\-0.5&0&0.5&0\\0&1&0&0\end{pmatrix}
\begin{pmatrix}\mathbf{P}_1\\\mathbf{P}_2\\\mathbf{P}_3\\\mathbf{P}_4\end{pmatrix},\qquad
(t^3,t^2,t,1)
\begin{pmatrix}-1&3&-3&1\\3&-6&3&0\\-3&3&0&0\\1&0&0&0\end{pmatrix}
\begin{pmatrix}\mathbf{P}_2\\\mathbf{X}\\\mathbf{Y}\\\mathbf{P}_3\end{pmatrix}.$$
These have to be equal for each power of t, which yields the four equations

$$\begin{aligned}
-0.5\mathbf{P}_1+1.5\mathbf{P}_2-1.5\mathbf{P}_3+0.5\mathbf{P}_4&=-\mathbf{P}_2+3\mathbf{X}-3\mathbf{Y}+\mathbf{P}_3,\\
\mathbf{P}_1-2.5\mathbf{P}_2+2.0\mathbf{P}_3-0.5\mathbf{P}_4&=3\mathbf{P}_2-6\mathbf{X}+3\mathbf{Y},\\
-0.5\mathbf{P}_1+0.5\mathbf{P}_3&=-3\mathbf{P}_2+3\mathbf{X},\\
\mathbf{P}_2&=\mathbf{P}_2.
\end{aligned}$$

Figure 13.21: Calculating Points X and Y.
These are easily solved to produce

$$\mathbf{X}=\mathbf{P}_2+\tfrac16(\mathbf{P}_3-\mathbf{P}_1)\quad\text{and}\quad
\mathbf{Y}=\mathbf{P}_3-\tfrac16(\mathbf{P}_4-\mathbf{P}_2).\qquad(13.29)$$

The difference (P3 − P1) is the vector from P1 to P3. Thus, point X is obtained by adding 1/6 of this vector to point P2 (Figure 13.21). Similarly, Y is obtained by subtracting 1/6 of the difference (P4 − P2) from point P3.

This simple result suggests a novel approach to the problem of interactive curve design, an approach that combines the useful features of both cubic splines and Bézier curves. A cubic spline passes through the (data) points but is not highly interactive. It can be edited only by modifying the two extreme tangent vectors. A Bézier curve does not pass through the (control) points, but it is easy to manipulate and edit by moving the points. The new approach constructs an interpolating Bézier curve in the following steps:
1. The user is asked to input n points, through which the final curve will pass.
2. The program divides the points into overlapping groups of four points each and applies Equation (13.29) to compute two auxiliary points X and Y for each group.
3. A Bézier segment is then drawn from the second to the third point of each group, using points X and Y as its other two control points. Note that points Y and P3 of a group are on a straight line with point X of the next group. This guarantees that the individual segments will connect smoothly.
4. It is also possible to draw a Bézier segment from P1 to P2 (and, similarly, from Pn−1 to Pn). This segment uses the two auxiliary control points X = P1 + (1/6)(P2 − P1) and Y = P2 − (1/6)(P3 − P1).

Users find it natural to specify such a curve, because they don't have to worry about the positions of the control points. The curve consists of n − 1 segments and the two auxiliary control points of each segment are calculated automatically. Such a curve is usually pleasing to the eye and rarely needs to be edited. However, if it is not satisfactory, it can be modified by moving the auxiliary control points. There
are 2(n − 1) of them, which allows for flexible control. A good program should display the auxiliary points and should make it easy for the user to grab and move any of them.

The well-known drawing software Adobe Illustrator [Adobe 04] employs a similar approach. The user specifies points with the mouse. At each point Pi, the user presses the mouse button to fix Pi, then drags the mouse before releasing the button, which defines two symmetrical points, X (following Pi) and Y (preceding it). Releasing the button is a signal to the program to draw the segment from Pi−1 to Pi (Figure 13.22).

Figure 13.22: Construction of Xi and Yi by Click and Drag.
Example: We apply this method to the six points P0 = (1/2, 0), P1 = (1/2, 1/2), P2 = (0, 1), P3 = (1, 3/2), P4 = (3/2, 1), and P5 = (1, 1/2). The six points yield three curve segments and the main step is to calculate the two intermediate points for each of the three segments. This is trivial and it results in:

X1 = P1 + (P2 − P0)/6 = (5/12, 2/3),  Y1 = P2 − (P3 − P1)/6 = (−1/12, 5/6),
X2 = P2 + (P3 − P1)/6 = (1/12, 7/6),  Y2 = P3 − (P4 − P2)/6 = (3/4, 3/2),
X3 = P3 + (P4 − P2)/6 = (5/4, 3/2),   Y3 = P4 − (P5 − P3)/6 = (3/2, 7/6).

Once the points are available, the three segments can easily be calculated. Each is a cubic Bézier segment based on a group of four points. The groups are

[P1, X1, Y1, P2],  [P2, X2, Y2, P3],  [P3, X3, Y3, P4],

and the three curve segments are

$$\begin{aligned}
\mathbf{P}_1(t)&=(1-t)^3\mathbf{P}_1+3t(1-t)^2\mathbf{X}_1+3t^2(1-t)\mathbf{Y}_1+t^3\mathbf{P}_2
=\bigl((2-t-5t^2+4t^3)/4,\;(1+t)/2\bigr),\\
\mathbf{P}_2(t)&=(1-t)^3\mathbf{P}_2+3t(1-t)^2\mathbf{X}_2+3t^2(1-t)\mathbf{Y}_2+t^3\mathbf{P}_3
=\bigl((t+7t^2-4t^3)/4,\;(2+t+t^2-t^3)/2\bigr),\\
\mathbf{P}_3(t)&=(1-t)^3\mathbf{P}_3+3t(1-t)^2\mathbf{X}_3+3t^2(1-t)\mathbf{Y}_3+t^3\mathbf{P}_4
=\bigl((4+3t-t^3)/4,\;(3-2t^2+t^3)/2\bigr).
\end{aligned}$$
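The construction of steps 1–3 can be sketched in a few lines of Python (the function name is mine); each overlapping group of four data points yields one Bézier segment via Equation (13.29):

```python
def interp_bezier_segments(pts):
    """For each overlapping group (P1,P2,P3,P4) of four 2D data points,
    compute the inner Bezier controls of Equation (13.29):
    X = P2 + (P3-P1)/6,  Y = P3 - (P4-P2)/6.
    Returns one (P2, X, Y, P3) control 4-tuple per group; the resulting
    segments pass through the data points and join smoothly."""
    segs = []
    for i in range(len(pts) - 3):
        p1, p2, p3, p4 = pts[i:i+4]
        X = (p2[0] + (p3[0]-p1[0])/6, p2[1] + (p3[1]-p1[1])/6)
        Y = (p3[0] - (p4[0]-p2[0])/6, p3[1] - (p4[1]-p2[1])/6)
        segs.append((p2, X, Y, p3))
    return segs
```

Running it on the six points of the example reproduces the three groups [P1, X1, Y1, P2], [P2, X2, Y2, P3], and [P3, X3, Y3, P4] listed above.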
The 12 points and the three segments are shown in Figure 13.24 (where the segments have been separated intentionally), as well as the code for the entire example.
13.13 An Interpolating Bézier Curve: II

We start, as usual, with n + 1 control points P0, ..., Pn. Two auxiliary points Xi and Yi+1 are automatically calculated by the software between each pair Pi, Pi+1 of control points. After all the Xi and Yi points have been computed, the curve is drawn as a sequence of four-point Bézier segments, each based on a group of four points Pi, Xi, Yi+1, and Pi+1.

The auxiliary points are computed as follows. A new point, Qi, is defined by the relation Qi − Pi = Pi − Pi−1. It will be recalled that the difference of two points is a vector, so Qi = 2Pi − Pi−1 is at the same distance from Pi as Pi is from Pi−1 (Figure 13.23). Also, the direction from Pi to Qi is the same as that from Pi−1 to Pi. The first auxiliary point Xi is now calculated midway between Qi and Pi+1, i.e.,

$$\mathbf{X}_i=\frac{\mathbf{Q}_i+\mathbf{P}_{i+1}}{2}=\frac{2\mathbf{P}_i-\mathbf{P}_{i-1}+\mathbf{P}_{i+1}}{2}=\mathbf{P}_i+\frac12(\mathbf{P}_{i+1}-\mathbf{P}_{i-1}).\qquad(13.30)$$

The second auxiliary point Yi is now calculated by Yi − Pi = Pi − Xi, i.e., Yi is the reflection of Xi through Pi. Given the definition of Xi from Equation (13.30), we get

$$\mathbf{Y}_i=2\mathbf{P}_i-\mathbf{X}_i=\mathbf{P}_i-\frac12(\mathbf{P}_{i+1}-\mathbf{P}_{i-1}).\qquad(13.31)$$
Figure 13.23: Construction of Xi and Yi−1.
The final sequence of points is P0 , X0 , Y1 , P1 , X1 , Y2 , P2 , X2 , Y3 , P3 , . . . , Pn−1 , Xn−1 , Yn , Pn ,
(* Interpolating Bezier Curve: I *)
p0={1/2,0}; p1={1/2,1/2}; p2={0,1};
p3={1,3/2}; p4={3/2,1}; p5={1,1/2};
x1=p1+(p2-p0)/6; x2=p2+(p3-p1)/6; x3=p3+(p4-p2)/6;
y1=p2-(p3-p1)/6; y2=p3-(p4-p2)/6; y3=p4-(p5-p3)/6;
c1[t_]:=Simplify[(1-t)^3 p1+3t (1-t)^2 x1+3t^2(1-t) y1+t^3 p2]
c2[t_]:=Simplify[(1-t)^3 p2+3t (1-t)^2 x2+3t^2(1-t) y2+t^3 p3]
c3[t_]:=Simplify[(1-t)^3 p3+3t (1-t)^2 x3+3t^2(1-t) y3+t^3 p4]
g1=ListPlot[{p0,p1,p2,p3,p4,p5,x1,x2,x3,y1,y2,y3},
  PlotStyle->{Red,AbsolutePointSize[6]}, AspectRatio->Automatic];
g2=ParametricPlot[c1[t],{t,0,.9}];
g3=ParametricPlot[c2[t],{t,0.1,.9}];
g4=ParametricPlot[c3[t],{t,0.1,1}];
Show[g1,g2,g3,g4,PlotRange->All]

Figure 13.24: An Interpolating Bézier Curve.
a total of 3n+1 points. Notice that X0 and Yn cannot be calculated by Equations (13.30) and (13.31). They have to be input by the user, and they serve to establish the start and end directions of the curve. As mentioned earlier, the curve is made up of four-point segments based on the groups (P0 , X0 , Y1 , P1 ), (P1 , X1 , Y2 , P2 ), . . . , (Pn−1 , Xn−1 , Yn , Pn ). Notice that this method is similar to that of Section 13.12, the main difference being that this method requires the user to input the values of X0 and Yn , whereas in Section 13.12, the software calculates all the auxiliary points but produces a curve from P1 to Pn−1 instead of from P0 to Pn . Another difference is the factors of 1/2 and 1/6. Experience indicates that this method produces satisfactory curves in most cases. In cases where the curve is not satisfactory, a variant simply draws the B´ezier curve that’s based on all 3n + 1 points. This curve does not, of course, pass through the original control points, so it is not an interpolating curve, but it may, nevertheless, be useful in certain applications, such as computer animation (Section 19.2). Example: We apply this method to the six points P0 = (1/2, 0), P1 = (1/2, 1/2), P2 = (0, 1), P3 = (1, 3/2), P4 = (3/2, 1), and P5 = (1, 1/2). The six points yield five curve segments. The first step is to calculate the two intermediate points for each of the five segments. This is straightforward, but notice that the choice of X0 and Y5 is arbitrary: X0 = P1 − P0 = (0, 1/2), Y5 = P4 − P5 = (1/2, 1/2), X1 = P1 + (P2 − P0 )/2 = (1/4, 1), Y1 = P1 − (P2 − P0 )/2 = (3/4, 0), X2 = P2 + (P3 − P1 )/2 = (1/4, 3/2), Y2 = P2 − (P3 − P1 )/2 = (−1/4, 1/2), X3 = P3 + (P4 − P2 )/2 = (7/4, 3/2), Y3 = P3 − (P4 − P2 )/2 = (1/4, 3/2), X4 = P4 + (P5 − P3 )/2 = (3/2, 1/2), Y4 = P4 − (P5 − P3 )/2 = (3/2, 3/2), Once the points are available, the five segments can be calculated easily. Each is a cubic B´ezier segment based on a group of four points. 
The groups are [P0, X0, Y1, P1], [P1, X1, Y2, P2], [P2, X2, Y3, P3], [P3, X3, Y4, P4], [P4, X4, Y5, P5], and the five curve segments are

P1(t) = (1 − t)^3 P0 + 3t(1 − t)^2 X0 + 3t^2(1 − t) Y1 + t^3 P1
      = ((2 − 6t + 15t^2 − 9t^3)/4, t(3 − 6t + 4t^2)/2),
P2(t) = (1 − t)^3 P1 + 3t(1 − t)^2 X1 + 3t^2(1 − t) Y2 + t^3 P2
      = ((2 − 3t − 3t^2 + 4t^3)/4, (1 + 3t − 6t^2 + 4t^3)/2),
P3(t) = (1 − t)^3 P2 + 3t(1 − t)^2 X2 + 3t^2(1 − t) Y3 + t^3 P3
      = (t(3 − 3t + 4t^2)/4, (2 + 3t − 3t^2 + t^3)/2),
P4(t) = (1 − t)^3 P3 + 3t(1 − t)^2 X3 + 3t^2(1 − t) Y4 + t^3 P4
      = (1 + 9t/4 − 3t^2 + 5t^3/4, (3 − t^3)/2),
13 Bézier Approximation
P5(t) = (1 − t)^3 P4 + 3t(1 − t)^2 X4 + 3t^2(1 − t) Y5 + t^3 P5
      = ((3 − 6t^2 + 5t^3)/2, (2 − 3t + 3t^2 − t^3)/2).

The 16 points and the five segments are shown in Figure 13.25 (where the segments have been separated intentionally), as well as the code for the entire example.
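The construction above is easy to check numerically. The following Python sketch (an illustrative addition; the book's own code for this example, in Mathematica, appears in Figure 13.25) rebuilds the auxiliary points Xi, Yi and verifies that every cubic segment interpolates its two data points:

```python
# Rebuild the auxiliary points of Section 13.13 for the six data points and
# confirm that each cubic Bezier segment starts and ends at the right data points.

def add(u, v):   return (u[0] + v[0], u[1] + v[1])
def sub(u, v):   return (u[0] - v[0], u[1] - v[1])
def scale(s, u): return (s * u[0], s * u[1])

def bezier3(q0, q1, q2, q3, t):
    # standard cubic Bernstein evaluation, coordinate by coordinate
    s = 1 - t
    return tuple(s**3*a + 3*t*s**2*b + 3*t**2*s*c + t**3*d
                 for a, b, c, d in zip(q0, q1, q2, q3))

P = [(0.5, 0), (0.5, 0.5), (0, 1), (1, 1.5), (1.5, 1), (1, 0.5)]
n = len(P) - 1                           # five segments

X = {0: sub(P[1], P[0])}                 # user-chosen start direction X0
Y = {n: sub(P[n - 1], P[n])}             # user-chosen end direction Y5
for i in range(1, n):
    half = scale(0.5, sub(P[i + 1], P[i - 1]))
    X[i] = add(P[i], half)               # X_i = P_i + (P_{i+1} - P_{i-1})/2
    Y[i] = sub(P[i], half)               # Y_i = P_i - (P_{i+1} - P_{i-1})/2

for i in range(n):                       # segment (P_i, X_i, Y_{i+1}, P_{i+1})
    assert bezier3(P[i], X[i], Y[i + 1], P[i + 1], 0.0) == P[i]
    assert bezier3(P[i], X[i], Y[i + 1], P[i + 1], 1.0) == P[i + 1]

print(X[1], Y[1])   # (0.25, 1.0) (0.75, 0.0), matching X1 and Y1 in the text
```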
13.13.1 An Interpolating Bézier Curve: III

The approach outlined in this section computes an interpolating Bézier curve by solving equations. Given a set of n + 1 data points Q0, Q1, . . . , Qn, we select n + 1 values ti such that P(ti) = Qi. We require that whenever t reaches one of the values ti, the curve pass through the point Qi. The values ti don't have to be equally spaced, a feature that provides control over the "speed" of the curve. All that's needed to calculate the curve is a set of n + 1 control points Pi. They are obtained by setting up and solving the set of n + 1 linear equations P(t0) = Q0, P(t1) = Q1, . . . , P(tn) = Qn, which is expressed in matrix notation as follows:

    | B_{n,0}(t0)  B_{n,1}(t0)  . . .  B_{n,n}(t0) | | P0 |   | Q0 |
    | B_{n,0}(t1)  B_{n,1}(t1)  . . .  B_{n,n}(t1) | | P1 |   | Q1 |
    |     .            .          .        .       | | .  | = | .  |     (13.32)
    | B_{n,0}(tn)  B_{n,1}(tn)  . . .  B_{n,n}(tn) | | Pn |   | Qn |

This set can be expressed as MP = Q and is easily solved numerically by P = M^{-1}Q. If we select t0 = 0, the first row of Equation (13.32) yields P0 = Q0. Similarly, if we select tn = 1, the last row of Equation (13.32) yields Pn = Qn. This decreases the number of equations from n + 1 to n − 1. The disadvantage of this approach is that any change in the ti's requires a recalculation of M and, consequently, of M^{-1}.

If controlling the speed of the curve is not important, we can select the n + 1 equally-spaced values ti = i/n. Equation (13.32) can now be written

    | B_{n,0}(0/n)  B_{n,1}(0/n)  . . .  B_{n,n}(0/n) | | P0 |   | Q0 |
    | B_{n,0}(1/n)  B_{n,1}(1/n)  . . .  B_{n,n}(1/n) | | P1 |   | Q1 |
    |     .              .          .         .       | | .  | = | .  |     (13.33)
    | B_{n,0}(n/n)  B_{n,1}(n/n)  . . .  B_{n,n}(n/n) | | Pn |   | Qn |

Now, if the data points Qi are moved, matrix M (or, rather, M^{-1}) doesn't have to be recalculated. If we number the rows and columns of M from 0 through n, then a general element of M equals

    M_{ij} = B_{n,j}(i/n) = C(n, j)(i/n)^j (1 − i/n)^{n−j} = n!(n − i)^{n−j} i^j / (j!(n − j)! n^n),

where C(n, j) = n!/(j!(n − j)!) is the binomial coefficient. Such elements can be calculated, if desired, as exact rational numbers, instead of (approximate) floating-point numbers.

Example: We use Equation (13.33) to compute the interpolating Bézier curve that passes through the four points Q0 = (0, 0), Q1 = (1, 1), Q2 = (2, 1), and Q3 = (3, 0).
Clear[p0,p1,p2,p3,p4,p5,x0,x1,x2,x3,x4,y1,y2,y3,y4,y5,
  c1,c2,c3,c4,c5,g1,g2,g3,g4,g5,g6];
p0={1/2,0}; p1={1/2,1/2}; p2={0,1}; p3={1,3/2};
p4={3/2,1}; p5={1,1/2};
x0=p1-p0; y5=p4-p5;
x1=p1+(p2-p0)/2; x2=p2+(p3-p1)/2;
x3=p3+(p4-p2)/2; x4=p4+(p5-p3)/2;
y1=p1-(p2-p0)/2; y2=p2-(p3-p1)/2;
y3=p3-(p4-p2)/2; y4=p4-(p5-p3)/2;
c1[t_]:=Simplify[(1-t)^3 p0+3t (1-t)^2 x0+3t^2(1-t) y1+t^3 p1]
c2[t_]:=Simplify[(1-t)^3 p1+3t (1-t)^2 x1+3t^2(1-t) y2+t^3 p2]
c3[t_]:=Simplify[(1-t)^3 p2+3t (1-t)^2 x2+3t^2(1-t) y3+t^3 p3]
c4[t_]:=Simplify[(1-t)^3 p3+3t (1-t)^2 x3+3t^2(1-t) y4+t^3 p4]
c5[t_]:=Simplify[(1-t)^3 p4+3t (1-t)^2 x4+3t^2(1-t) y5+t^3 p5]
g1=ListPlot[{p0,p1,p2,p3,p4,p5,x0,x1,x2,x3,x4,y1,y2,y3,y4,y5},
  PlotStyle->{Red,AbsolutePointSize[6]},AspectRatio->Automatic];
g2=ParametricPlot[c1[t],{t,0,.95}];
g3=ParametricPlot[c2[t],{t,0.05,.95}];
g4=ParametricPlot[c3[t],{t,0.05,.95}];
g5=ParametricPlot[c4[t],{t,0.05,.95}];
g6=ParametricPlot[c5[t],{t,0.05,1}];
Show[g1,g2,g3,g4,g5,g6,PlotRange->All]

Figure 13.25: An Interpolating Bézier Curve: II.
Since the curve has to pass through the first and last points, we get P0 = Q0 = (0, 0) and P3 = Q3 = (3, 0). Since the four given points are equally spaced, it makes sense to assume that P(1/3) = Q1 and P(2/3) = Q2. We thus end up with the two equations

3(1/3)(1 − 1/3)^2 P1 + 3(1/3)^2(1 − 1/3) P2 + (1/3)^3 (3, 0) = (1, 1),
3(2/3)(1 − 2/3)^2 P1 + 3(2/3)^2(1 − 2/3) P2 + (2/3)^3 (3, 0) = (2, 1),

which are solved to yield P1 = (1, 3/2) and P2 = (2, 3/2). The curve is thus

P(t) = (1 − t)^3 (0, 0) + 3t(1 − t)^2 (1, 3/2) + 3t^2(1 − t) (2, 3/2) + t^3 (3, 0).

Exercise 13.22: Plot the curve and the eight points.
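The full system of Equation (13.33) can also be solved directly with exact rational arithmetic, as suggested above. A Python sketch (an illustrative addition, not the book's code) that solves the worked example:

```python
# Build M_ij = B_{n,j}(i/n) as exact Fractions and solve M P = Q by
# Gauss-Jordan elimination, applied to both coordinates of each data point.
from fractions import Fraction
from math import comb

def bernstein_matrix(n):
    def B(j, t):                       # Bernstein polynomial B_{n,j}(t)
        return comb(n, j) * t**j * (1 - t)**(n - j)
    return [[B(j, Fraction(i, n)) for j in range(n + 1)] for i in range(n + 1)]

def solve(M, Q):
    """Gauss-Jordan elimination; Q is a list of coordinate tuples."""
    n = len(M)
    A = [row[:] + list(q) for row, q in zip(M, Q)]     # augmented matrix
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)   # pivot row
        A[c], A[p] = A[p], A[c]
        A[c] = [x / A[c][c] for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [tuple(row[n:]) for row in A]

Q = [(0, 0), (1, 1), (2, 1), (3, 0)]                   # the four data points
Qf = [tuple(Fraction(x) for x in q) for q in Q]
P = solve(bernstein_matrix(3), Qf)
print(P)   # control points; P[1] = (1, 3/2) and P[2] = (2, 3/2), as in the text
```

Because everything is a `Fraction`, the control points come out as exact rationals, matching the remark about exact rational elements of M.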
13.13.2 An Interpolating Bézier Curve: IV

Traditionally, the word font refers to a set of characters of type that share the same size and style, such as Times Roman 12 point. Imagine the task of a font designer about to design the next character of a new font. The designer has a rough idea of the shape of the character and needs to create a curve that will fit its outline. A natural solution is to place data points strategically along the outline of the character and connect them with spline segments. The method described here uses four-point Bézier segments, where each segment goes from a data point Pk to the next point Pk+1, using two intermediate control points that the software calculates automatically. In this way, the designer does not have to know about control points; their job is to place data points strategically along the desired curve. Figure 13.26 is an example of the letter "A" in the Times font. It is clear that large parts of the letter are made of straight (or close to straight) segments, which require few data points. Only regions of high curvature need many points for their definition.
Figure 13.26: Data Points for the Letter “A”.
The method outlined here is due to John Hobby [Hobby 86] and is used in the Metafont software (see page 131 of [Knuth 86] for more details) to design the outlines of fonts of type. This method constructs an interpolating Bézier curve and does so by combining features borrowed from Hermite interpolation and cubic splines. The
advantage of this method is that the designer can help the software in three ways:

1. The designer may specify the two intermediate control points for any segment. This overrides the points calculated by the software and is usually done by the designer while looking at the first version of the curve and attempting to fine-tune its shape.

2. The designer may specify the direction of the curve at certain points. Often, it is clear to the designer that the curve should go, for example, horizontally from left to right when it passes through data point Pk, and such information can be very helpful to the software.

3. Another feature that can help the designer get the right curve is the ability to specify the tension of the curve individually for each segment. The next paragraph is copied from Section 12.5. Perhaps the best way to visualize a spline under tension is to think of the data points as nails driven into a board, and of the spline as a rubber band strung above or below the nails. To add tension, simply pull the rubber band, which tightens it, bringing each segment closer to a straight line.

Overhead the sky was half crystalline, half misty, and the night around was chill and vibrant with rich tension.
—F. Scott Fitzgerald, This Side of Paradise.

We start with a single Hermite segment connecting points P1 and P2. The two extreme tangent vectors are usually denoted Pt1 and Pt2. In this section, they are expressed as Pt1 = (f(θ, φ)/τ1)T1 and Pt2 = (g(θ, φ)/τ2)T2, where T1 and T2 are unit vectors in the directions of the tangents, and f and g are functions that determine the magnitudes of the tangents. These functions depend on θ and φ, which are the angles between the line P1 → P2 and the two tangents. The quantities τ1 and τ2 are the tension parameters. The bigger they are, the shorter the tangent vectors become and the closer the curve gets to a straight line.
The default value of the tension parameters is 1, but the user can specify values τk at any data point Pk along the curve. We select functions f and g in a manner similar to that of Section 12.1.7, Equations (12.26) and (12.27):

f(θ, φ) = |Pt1| = 2|P2 − P1| / (1 + α cos θ + (1 − α) cos φ),
g(θ, φ) = |Pt2| = 2|P2 − P1| / (1 + α cos φ + (1 − α) cos θ),     (13.34)

where α is a user-defined parameter in the range [0.5, 1]. Instead of constructing the Hermite segment out of the two points and two tangents, we construct it as a four-point Bézier curve, according to Equation (13.25) (Section 13.10.4):
P1,  P1 + (1/3)Pt1,  P2 − (1/3)Pt2,  P2
= P1,  P1 + (f(θ, φ)/(3τ1)) T1,  P2 − (g(θ, φ)/(3τ2)) T2,  P2.     (13.35)

In this way, each segment has two interior control points, making it possible for the user to explicitly specify any control points. The software has to set up equations that are easy to solve (i.e., linear). The equations are based on the requirement that the first and second derivatives be continuous at the n − 2 interior points. The unknowns are the various θ and φ angles. Each interior point has two such angles, and each of the two extreme points has one angle. The total number of unknowns is, therefore, 1 + 2(n − 2) + 1 = 2n − 2. Once all the angles are known, all the f and g functions and all the unit tangents Tk can be calculated. Using these and the tension parameters, all the interior control points

Pk + (f(θk, φk+1)/(3τk)) Tk    and    Pk+1 − (g(θk, φk+1)/(3τk+1)) Tk+1     (13.36)
can be calculated. (Equation (13.36) shows how changing the tension parameters is equivalent to sliding the two interior control points along the lines that connect them to Pk and Pk+1, respectively.) The last step is to calculate and draw all the n − 1 Bézier segments that constitute the curve. Any control points and directions supplied by the user help the calculations, since they reduce the number of unknowns. Any tension parameters supplied by the user should be included in the calculations.

The requirement that the first derivatives be equal at the interior points results in the n − 2 equations

θk+1 + φk+1 = −ψk+1,    for k = 1, 2, . . . , n − 2,     (13.37)

where ψk+1 is the angle between vectors Pk+2 − Pk+1 and Pk+1 − Pk. The requirement that the second derivatives be equal at the interior points leads to the so-called mock curvature, which is obtained in the following steps:

1. Calculate the second derivative Pttk(t) of a general segment Pk(t) (where the functions f and g are given by Equation (13.34)). This is easier to do if the Hermite form of the segment is used rather than the Bézier form. The second derivative of the PC segment Pk(t) is Pttk(t) = 6ak t + 2bk (Equation (12.3)), where ak and bk are given by Equation (11.3):

ak = 2Pk − 2Pk+1 + Ptk + Ptk+1,    bk = −3Pk + 3Pk+1 − 2Ptk − Ptk+1.

2. Calculate the Taylor expansions of Pttk(1) and Pttk+1(0) about θ = φ = 0 and retain the linear parts. This process results in the n − 2 linear equations for the unknown angles

(τk+1^2 / |Pk+1 − Pk|) [(θk + φk+1)/τk − 3φk+1] = (τk+1^2 / |Pk+2 − Pk+1|) [(θk+1 + φk+2)/τk+2 − 3θk+1],     (13.38)
where k = 1, 2, . . . , n − 2. The total number of Equations (13.37) and (13.38) is 2n − 4, again two short of the number of unknowns. The user should therefore supply the values of two unknowns, typically the angles of the two extreme tangents. For a closed curve, every data point is interior, so the number of equations is 2n, the same as the number of unknowns.
13.14 Nonparametric Bézier Curves

The explicit representation of a curve (Section 8.6) has the familiar form y = f(x). The Bézier curve is, of course, parametric, but it can be represented in a nonparametric form, similar to explicit curves. Given n + 1 real values (not points) Pi, we start with the polynomial c(t) = Σ_{i=0}^{n} Pi B_{n,i}(t) and employ the identity

Σ_{i=0}^{n} (i/n) B_{n,i}(t) = t     (13.39)

to create the curve

P(t) = (t, c(t)) = Σ_{i=0}^{n} (i/n, Pi) B_{n,i}(t).

(This identity is satisfied by the Bernstein polynomials and can be proved by induction.) It is clear that this version of the curve is defined by the control points (i/n, Pi), which are equally spaced on the x axis. This version of the Bézier curve exists only for two-dimensional curves. In the general case, where t varies in the interval [a, b], the control points are (a + i(b − a)/n, Pi).
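Identity (13.39) is easy to confirm numerically. A small Python sketch (an illustrative addition, not the book's code):

```python
# Spot-check identity (13.39): applying the Bernstein weights B_{n,i}(t)
# to the abscissas i/n reproduces the linear function t.
from math import comb

def bernstein(n, i, t):
    return comb(n, i) * t**i * (1 - t)**(n - i)

n = 7
for t in [0.0, 0.125, 0.3, 0.5, 0.9, 1.0]:
    s = sum((i / n) * bernstein(n, i, t) for i in range(n + 1))
    assert abs(s - t) < 1e-12       # the weighted sum equals t up to rounding

# consequence: the nonparametric curve (t, c(t)) has control points (i/n, P_i)
```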
13.15 Rational Bézier Curves

The rational Bézier curve is an extension of the original Bézier curve (Equation (13.5)) to the form

P(t) = [Σ_{i=0}^{n} wi Pi B_{n,i}(t)] / [Σ_{j=0}^{n} wj B_{n,j}(t)]
     = Σ_{i=0}^{n} Pi [wi B_{n,i}(t) / Σ_{j=0}^{n} wj B_{n,j}(t)]
     = Σ_{i=0}^{n} Pi R_{n,i}(t),    0 ≤ t ≤ 1.

The new weight functions R_{n,i}(t) are ratios of polynomials (which is the reason for the term rational), and they also depend on weights wi that act as additional parameters that control the shape of the curve. Note that negative weights might lead to a zero denominator, which is why nonnegative weights are normally used. A rational curve seems unnecessarily complicated (and for many applications, it is), but it has the following advantages:

1. It is invariant under projections. Section 13.4 mentions that the Bézier curve is invariant under affine transformations. If we want to rotate, reflect, scale, or shear
such a curve, we can apply the affine transformation to the control points and then use the new points to compute the transformed curve. The Bézier curve, however, is not invariant under projections. If we compute a three-dimensional Bézier curve and project every point of the curve by a perspective projection, we end up with a plane curve P(t). If we then project the three-dimensional control points and compute a plane Bézier curve Q(t) from the projected, two-dimensional points, the two curves P(t) and Q(t) will generally be different. One advantage of the rational Bézier curve is its invariance under projections.

2. The rational Bézier curve provides for accurate control of curve shape, such as precise representation of conic sections (Appendix C).

Section 14.5 shows that the Bézier curve is a special case of the B-spline curve. As a result, many current software systems use the rational B-spline (Section 14.14) when rational curves are required. Such a system can produce the rational Bézier curve as a special case.

Here is a quick example showing how the rational Bézier curve can be useful. Given the three points P0 = (1, 0), P1 = (1, 1), and P2 = (0, 1), the Bézier curve defined by the points is quadratic and is therefore a parabola, P(t) = (1 − t)^2 P0 + 2t(1 − t) P1 + t^2 P2 = (1 − t^2, t(2 − t)), but the rational Bézier curve with weights w0 = w1 = 1 and w2 = 2 results in the more complex expression

P(t) = [(1 − t)^2 P0 + 2t(1 − t) P1 + 2t^2 P2] / [(1 − t)^2 + 2t(1 − t) + 2t^2]
     = ((1 − t^2)/(1 + t^2), 2t/(1 + t^2)),

which is a circle, as illustrated by Figure 8.8a. In general, a quadratic rational Bézier curve with weights w0 = w2 = 1 is a parabola when w1 = 1, an ellipse for w1 < 1, and a hyperbola for w1 > 1. A quarter circle is obtained when w1 = cos(α/2), where α is the angle formed by the three control points P0, P1, and P2 (the control points must also be placed as the three corners of an isosceles triangle). Page 261 of [Beach 91] proves this construction for the special case α = 90°.

Appendix C shows, among other features, that the canonical ellipse is represented as the rational expression

(a(1 − t^2)/(1 + t^2), b · 2t/(1 + t^2)),    −∞ < t < ∞,     (C.7)

and the canonical hyperbola is represented as the rational expression

(a(1 + t^2)/(1 − t^2), b · 2t/(1 − t^2)),    −∞ < t < ∞.     (C.8)
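The circle property of the rational quadratic with weights (1, 1, 2) is easy to confirm numerically. A Python sketch (an illustrative addition; function names are ours, not the book's):

```python
# Evaluate the rational quadratic Bezier curve and check that every sample
# point satisfies x^2 + y^2 = 1, i.e., lies on the unit circle.

def rational_quadratic(points, weights, t):
    basis = [(1 - t)**2, 2*t*(1 - t), t**2]           # quadratic Bernstein basis
    denom = sum(w * b for w, b in zip(weights, basis))
    x = sum(w * b * p[0] for w, b, p in zip(weights, basis, points)) / denom
    y = sum(w * b * p[1] for w, b, p in zip(weights, basis, points)) / denom
    return x, y

P0, P1, P2 = (1, 0), (1, 1), (0, 1)
for k in range(11):
    t = k / 10
    x, y = rational_quadratic([P0, P1, P2], [1, 1, 2], t)
    assert abs(x*x + y*y - 1) < 1e-12                 # on the unit circle
    assert abs(x - (1 - t*t)/(1 + t*t)) < 1e-12       # matches the closed form
    assert abs(y - 2*t/(1 + t*t)) < 1e-12
```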
Accurate control of the shape of the curve is provided by either moving the control points or varying the weights, and Figure 13.27 illustrates the different responses of the curve to these changes. Part (a) of the figure shows four curves where weight w1 is increased from 1 to 4. The curve is pulled toward P1 in such a way that individual points on the curve converge at P1 . In contrast, part (b) of the figure illustrates how the curve behaves when P2 itself is moved (while all the weights remain set to 1). The
Figure 13.27: (a) Varying Weights and (b) Moving Points in a Rational Bézier Curve.
curve is again pulled toward P2 , but in such a way that every point on the curve moves in the same direction as P2 itself. Exercise 13.23: Use mathematical software to compute Figure 13.27 or a similar illustration. Section 13.24 extends the techniques presented here to rectangular B´ezier surface patches.
13.15.1 Circular Bézier Curves

The Bézier curve is a polynomial P(t) whose parameter t varies in the interval [0, 1] and whose value is a point (in two or three dimensions). We say that the domain of this polynomial is the interval [0, 1] and the range is the two- or three-dimensional Euclidean space (i.e., all the pairs or triplets of real numbers). This section, which is based on [Alfeld et al. 95], describes the circular Bézier curve, a polynomial whose domain is a circular arc, not an interval. We start with a unit circle C centered on the origin. An arc A of length less than π and with endpoints v1 and v2 is selected such that 0 < θ2 − θ1 < π (Figure 13.28). Assume that v is a point on the arc; then v can be written as the weighted combination

v = b1 v1 + b2 v2,     (13.40)
where b1 and b2 are the circular barycentric coordinates of v with respect to arc A. It is easy to see that these coordinates have the following properties:

1. The coordinates of v1 are (b1, b2) = (1, 0) and those of v2 are (b1, b2) = (0, 1).

2. Any other point v on the arc has two positive coordinates b1 and b2.

3. The sum b1 + b2 equals 1 for the two endpoints but is greater than 1 for any other point. Notice that the point v = 0.5v1 + 0.5v2 is located midway between v1 and v2,
13 B´ ezier Approximation
687
but on the straight segment connecting them, not on the arc. In order for v to be on the arc, the sum b1 + b2 should be greater than 1.

4. The coordinates of v are invariant under rotation since they depend only on the relative positions of v, v1, and v2. This justifies the name barycentric.
Figure 13.28: Barycentric Circular Coordinates.
Since v, v1, and v2 are located on the circumference of a unit circle, each can be expressed by means of one angle θ. Figure 13.28a shows that v = (cos θ, sin θ), v1 = (cos θ1, sin θ1), and v2 = (cos θ2, sin θ2). Substituting this in Equation (13.40) results in (cos θ, sin θ) = b1(cos θ1, sin θ1) + b2(cos θ2, sin θ2). This is a system of two equations whose solutions are

b1(θ) = (sin θ2 cos θ − cos θ2 sin θ)/(sin θ2 cos θ1 − cos θ2 sin θ1) = sin(θ2 − θ)/sin(θ2 − θ1),
b2(θ) = (sin θ cos θ1 − cos θ sin θ1)/(sin θ2 cos θ1 − cos θ2 sin θ1) = sin(θ − θ1)/sin(θ2 − θ1).     (13.41)

Thus, the circular barycentric coordinates are expressed as linear combinations of sin θ and cos θ, and also as ratios of arc lengths. They can also be expressed as ratios of areas of triangles:

b1 = area(0, v, v2)/area(0, v1, v2),    b2 = area(0, v1, v)/area(0, v1, v2).

The next step is to define the circular Bernstein polynomials of degree n,

B_{n,i}(θ) = C(n, i) b1(θ)^{n−i} b2(θ)^i,    i = 0, 1, . . . , n,     (13.42)

where C(n, i) is the binomial coefficient (compare with Equation (13.4)), and the circular Bézier curve,

P(θ) = Σ_{i=0}^{n} ci (cos θ, sin θ) B_{n,i}(θ),    θ1 ≤ θ ≤ θ2,     (13.43)
where the ci's are any real numbers. Notice that this curve is a trigonometric polynomial, i.e., a polynomial where each monomial consists of powers of sines and cosines instead of powers of x. Comparing this definition with the linear Bézier curve (Equation (13.5)) suggests how to define the control points of the circular curve. We divide the range [θ1, θ2] into n equal intervals by defining

φi = θ1 + i(θ2 − θ1)/n,    i = 0, 1, . . . , n,     (13.44)

and define the control points as

Ci = ci (cos φi, sin φi).

Points Ci have a simple geometric meaning. The pair (cos φi, sin φi) is a point on the circumference of the unit circle C. Multiplying it by ci yields a point Ci on the line connecting the origin to (cos φi, sin φi) (Figure 13.28b). The control points are thus equally spaced in arc length around the arc A. From Equation (13.44), we get φ0 = θ1 and φn = θ2. From this, it is easy to see that

P(θ1) = Σ_{i=0}^{n} ci (cos θ1, sin θ1) B_{n,i}(θ1) = c0 (cos φ0, sin φ0) = C0,
P(θ2) = Σ_{i=0}^{n} ci (cos θ2, sin θ2) B_{n,i}(θ2) = cn (cos φn, sin φn) = Cn.
Thus, the circular curve starts at C0 and ends at Cn, reinforcing the interpretation of the Ci's as control points. There is a difference, however, between the way the control points are used in the linear and in the circular Bézier curves. In the linear curve, the user selects the control points, which may be any points, and the curve is calculated as the weighted sum of the points. In the circular curve, the user selects the constants ci and the two values θ1 and θ2. Each point P(θ0) on the curve becomes the weighted sum of points ci(cos θ0, sin θ0), not of the control points. The control points are spread evenly along the interval [θ1, θ2] but are not used in calculating the curve.

Example: We select n = 3, θ1 = 0°, θ2 = 90° = π/2, and constants c0 = 0, c1 = 0.1, c2 = 0.2, and c3 = 2. We notice that sin(θ2 − θ1) = 1, so the circular Bernstein polynomials for this case are

B_{3,i}(θ) = C(3, i) b1(θ)^{3−i} b2(θ)^i = C(3, i) sin^{3−i}(π/2 − θ) sin^i(θ),

and the circular curve is

P(θ) = (cos θ, sin θ) [0 · sin^3(π/2 − θ) + 0.1 · 3 sin^2(π/2 − θ) sin θ + 0.2 · 3 sin(π/2 − θ) sin^2 θ + 2 sin^3 θ]
     = (cos θ, sin θ) [0.3 sin^2(π/2 − θ) sin θ + 0.6 sin(π/2 − θ) sin^2 θ + 2 sin^3 θ].
Figure 13.29: A Circular Bézier Curve.
It goes from C0 = c0(cos 0, sin 0) = (0, 0) to C3 = c3(cos(π/2), sin(π/2)) = (0, 2) (Figure 13.29).

Exercise 13.24: What are the four control points in this case?

Exercise 13.25: Calculate the four control points of the cubic circular curve defined by θ1 = 0, θ2 = 90° = π/2, c0 = 2, c1 = 1.2, c2 = 1.6, and c3 = 1.
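The worked example above can be checked numerically. The following Python sketch (an illustrative addition, not the book's code) evaluates Equation (13.43) for these ci and confirms the two endpoints C0 = (0, 0) and C3 = (0, 2):

```python
# Circular Bezier example: n = 3, theta in [0, pi/2], c = (0, 0.1, 0.2, 2).
# The curve is evaluated through the circular Bernstein polynomials (13.42).
from math import sin, cos, pi, comb, isclose

theta1, theta2 = 0.0, pi / 2
c = [0.0, 0.1, 0.2, 2.0]
n = 3

def b1(t): return sin(theta2 - t) / sin(theta2 - theta1)   # Equation (13.41)
def b2(t): return sin(t - theta1) / sin(theta2 - theta1)

def curve(t):
    s = sum(c[i] * comb(n, i) * b1(t)**(n - i) * b2(t)**i for i in range(n + 1))
    return (s * cos(t), s * sin(t))

assert curve(theta1) == (0.0, 0.0)                          # C0 = c0 (cos 0, sin 0)
x, y = curve(theta2)
assert isclose(x, 0.0, abs_tol=1e-12) and isclose(y, 2.0)   # C3 = (0, 2)
```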
13.16 Circles and Bézier Curves

Parametric curves are general and can take many shapes. In principle, such a curve can be based on any functions, but in practice polynomials are virtually always used. It is well known, however, that one common, important curve, namely the circle, cannot be precisely represented by a polynomial.

The equation of a circle is x^2 + y^2 = r^2 or y = ±√(r^2 − x^2). This is not a polynomial, and in fact Exercise 13.26 proves that a polynomial cannot represent a circle. Applying Bézier methods to circles can be done either by using rational Bézier curves (Section 13.15) or by deriving an approximation to the circle. This section discusses the latter approach.

We start with a three-point example. We select the three points P0 = (1, 0), P1 = (k, k), and P2 = (0, 1) and attempt to find the value of k such that the quadratic Bézier curve defined by the points will best approximate a quarter circle of radius 1 (Figure 13.30). As usual, the curve is given by

P(t) = (1 − t)^2 (1, 0) + 2t(1 − t)(k, k) + t^2 (0, 1)
     = (1 + 2t(k − 1) + t^2(1 − 2k), 2kt + t^2(1 − 2k)) = (Px(t), Py(t)),     (13.45)

and it is identical to the circle at its start and end points. We need a constraint that will produce an equation whose solution will yield the value of k. A reasonable constraint is to require that the curve be identical to the circle at its midpoint. This can be expressed as P(0.5) = (1/√2, 1/√2), and it produces the equation

P(0.5) = (1/4)(1, 0) + (1/2)(k, k) + (1/4)(0, 1) = (1/√2, 1/√2),

whose solution is

k = (2√2 − 1)/2 ≈ 0.914.

We also note that the tangent vector of Equation (13.45) is

Pt(t) = (2(k − 1) + 2t(1 − 2k), 2k + 2t(1 − 2k)).     (13.46)
How much does this curve deviate from a true circle of radius 1? To answer this, we first notice that the distance of a point P(t) from the origin is

D(t) = √(Px^2(t) + Py^2(t)) = √[(1 + 2t(k − 1) + t^2(1 − 2k))^2 + (2kt + t^2(1 − 2k))^2].

To find the maximum distance, we differentiate D(t):

dD(t)/dt = (2Px(t) · Pxt(t) + 2Py(t) · Pyt(t)) / (2√(Px^2(t) + Py^2(t))),
Figure 13.30: A Quadratic Bézier Curve Approximating a Quarter Circle.
and set the result equal to 0. This yields Px(t) · Pxt(t) + Py(t) · Pyt(t) = P(t) · Pt(t) = 0. Applying Equations (13.45) and (13.46), we get the equation

2(k − 1) + 2(1 + 2(k − 1)^2) t − 6(1 − 2k)^2 t^2 + 4(1 − 2k)^2 t^3 = 0,

which has two roots in the interval [0, 1], namely t1 ≈ 0.33179 and t2 ≈ 0.66821, close to the expected values of 1/3 and 2/3. Simple computation shows the maximum distance of P(t) from the origin to be D(t1) = D(t2) = 0.995685. The maximum deviation of this from a circle of radius one is therefore 0.432%, negligible for most purposes.

Exercise 13.26: Prove that the Bézier curve cannot be a circle.

Exercise 13.27: Consider the quarter circle from P0 = (1, 0) to P3 = (0, 1). Select two points P1 and P2 such that the Bézier curve defined by the four points would be the closest possible to a circle.

Exercise 13.28: Do the same for the oval (elliptic) arc from (1, 0) to (0, 1).

Exercise 13.29: Calculate the cubic Bézier curve that approximates the circular arc of Figure 13.31 spanning an angle of 2θ. The calculation should be based on the requirement that the curve and the arc have the same endpoints and the same extreme tangent vectors.

Example: We approximate a sine wave by smoothly joining eight cubic Bézier segments (Figure 13.32). The first segment requires four control points and each of the remaining seven segments requires three additional points. The total number of points is therefore 25. They are numbered P0 through P24, but because of the high symmetry of the sine wave, only the first seven points, P0 through P6, need be determined. The rest can be obtained from these by simple translations and reflections. We require that the following three points be on the sine curve, making it easy to find their coordinates:

P0 = (0, 0),   P3 = (π/4, sin(π/4)) ≈ (0.785, 0.7071),   P6 = (π/2, sin(π/2)) ≈ (1.57, 1).
Figure 13.31: A Cubic Bézier Curve Approximating an Arc.
Figure 13.32: A Sine Curve Approximated by Eight Cubic Bézier Segments.
The expression for segment i (where i = 0, 3, 6, 9, 12, 15, 18, and 21) is

Pi(t) = (1 − t)^3 Pi + 3t(1 − t)^2 Pi+1 + 3t^2(1 − t) Pi+2 + t^3 Pi+3,

and its tangent vector is

Pti(t) = −3(1 − t)^2 Pi + (3 − 9t)(1 − t) Pi+1 + 3t(2 − 3t) Pi+2 + 3t^2 Pi+3.

To determine point P1, we require that the initial tangent Pt0(0) of the first curve segment match the initial slope of the sine wave, which is 45°. We can therefore write Pt0(0) = (a, a) for any positive a, and we select a = 0.7071 since this produces a normalized tangent vector. The result is (0.7071, 0.7071) = Pt0(0) = −3P0 + 3P1, or P1 = (0.7071, 0.7071)/3 = (0.2357, 0.2357).

To determine points P2 and P4, we again require that the final tangent vector Pt0(1) of the first segment match the slope of the sine wave at x3 = π/4. That slope is 0.7071,
so we select (1, 0.7071) as the tangent vector, then normalize it to (0.816, 0.577). We end up with (0.816, 0.577) = Pt0(1) = −3P2 + 3P3, or P2 = P3 − (0.816, 0.577)/3 = (0.513, 0.5151). By symmetry, we also get P4 = P3 + (0.816, 0.577)/3 = (1.057, 0.899).

Only point P5 remains to be determined. Again, we require that the final tangent vector Pt3(1) of the second segment (segment 3) match the slope of the sine wave at P6, which is 0. Thus, the normalized tangent vector is (1, 0), which produces the equation (1, 0) = Pt3(1) = 3P6 − 3P5, or P5 = P6 − (1, 0)/3 = (1.237, 1).

Points P7 through P24 can be obtained from the first seven points by translation and reflection. Alternatively, the first four cubic segments can be calculated and each pixel can be used to calculate one more pixel by translation and reflection.
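How good is this fit? A numeric sanity check (an illustrative sketch; the tolerance is ours, not the book's) evaluates the first segment, built from P0, P1, P2, P3 as derived above, and measures how far its points stray from the true sine curve:

```python
# Build the first cubic segment of the sine-wave approximation and sample
# the vertical distance between the segment and sin x on [0, pi/4].
from math import sin, pi

P0 = (0.0, 0.0)
P1 = (0.7071/3, 0.7071/3)            # from the 45-degree start slope
P3 = (pi/4, sin(pi/4))
T  = (0.816, 0.577)                  # normalized tangent at P3 (from the text)
P2 = (P3[0] - T[0]/3, P3[1] - T[1]/3)

def seg(t):
    s = 1 - t
    return tuple(s**3*a + 3*t*s**2*b + 3*t**2*s*c + t**3*d
                 for a, b, c, d in zip(P0, P1, P2, P3))

# maximum |y - sin x| over a fine sample of the segment
err = max(abs(seg(k/200)[1] - sin(seg(k/200)[0])) for k in range(201))
assert err < 1e-3                    # the segment tracks the sine wave closely
print(err)
```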
13.17 Rectangular Bézier Surfaces

The Bézier surface patch, like its relative the Bézier curve, is popular and is commonly used in practice. We discuss the rectangular and the triangular Bézier surface methods, and this section covers the former type. We start with an (m + 1) × (n + 1) set of control points arranged in a roughly rectangular grid

Pm,0  Pm,1  . . .  Pm,n
  .     .            .
P1,0  P1,1  . . .  P1,n
P0,0  P0,1  . . .  P0,n

and construct the rectangular Bézier surface patch for the points by applying the technique of Cartesian product (Section 8.12) to the Bézier curve. Equation (8.25) produces

P(u, w) = Σ_{i=0}^{m} Σ_{j=0}^{n} B_{m,i}(u) Pi,j B_{n,j}(w)
        = (B_{m,0}(u), B_{m,1}(u), . . . , B_{m,m}(u)) P (B_{n,0}(w), B_{n,1}(w), . . . , B_{n,n}(w))^T
        = B_m(u) P B_n(w),     (13.47)

where

    | P0,0  P0,1  . . .  P0,n |
P = | P1,0  P1,1  . . .  P1,n |
    |   .     .            .  |
    | Pm,0  Pm,1  . . .  Pm,n |
The surface can also be expressed, by analogy with Equation (13.9), as P(u, w) = U N P N^T W^T,
(13.48)
where U = (u^m, u^{m−1}, . . . , u, 1), W = (w^n, w^{n−1}, . . . , w, 1), and N is defined by Equation (13.10). Notice that both P(u0, w) and P(u, w0) (for constants u0 and w0) are Bézier curves on the surface. A Bézier curve is defined by n + 1 control points; it passes through the two extreme points and employs the interior points to determine its shape. Similarly, a rectangular Bézier surface patch is defined by a rectangular grid of (m + 1) × (n + 1) control points; it is anchored at the four corner points and employs the other grid points to determine its shape. Figure 13.33 is an example of a biquadratic Bézier surface patch with the Mathematica code that generated it. Notice how the surface is anchored at the four corner points and how the other control points pull the surface toward them.

Example: Given the six three-dimensional points

P10  P11  P12
P00  P01  P02
the corresponding Bézier surface is generated in the following three steps:

1. Find the orders m and n of the surface. Since the points are numbered starting from 0, the two orders of the surface are m = 1 and n = 2.

2. Calculate the weight functions B1i(w) and B2j(u). For m = 1, we get B1i(w) = C(1, i) w^i (1 − w)^{1−i}, which yields the two functions

B10(w) = C(1, 0) w^0 (1 − w)^{1−0} = 1 − w,    B11(w) = C(1, 1) w^1 (1 − w)^{1−1} = w.

For n = 2, we get B2j(u) = C(2, j) u^j (1 − u)^{2−j}, which yields the three functions

B20(u) = (1 − u)^2,    B21(u) = 2u(1 − u),    B22(u) = u^2.
(* biquadratic bezier surface patch *)
Clear[pwr,bern,spnts,n,bzSurf,g1,g2];
n=2;
spnts={{{0,0,0},{1,0,1},{0,0,2}},{{1,1,0},{4,1,1},{1,1,2}},
  {{0,2,0},{1,2,1},{0,2,2}}};
(* Handle Indeterminate condition *)
pwr[x_,y_]:=If[x==0&&y==0,1,x^y];
bern[n_,i_,u_]:=Binomial[n,i] pwr[u,i] pwr[1-u,n-i]
bzSurf[u_,w_]:=Sum[bern[n,i,u] spnts[[i+1,j+1]] bern[n,j,w],
  {i,0,n},{j,0,n}]
g1=ParametricPlot3D[bzSurf[u,w],{u,0,1},{w,0,1},
  Ticks->{{0,1,4},{0,1,2},{0,1,2}}];
g2=Graphics3D[{Red,AbsolutePointSize[6],
  Table[Point[spnts[[i,j]]],{i,1,n+1},{j,1,n+1}]}];
Show[g1,g2,ViewPoint->{2.783,-3.090,1.243},PlotRange->All]
Figure 13.33: A Biquadratic Bézier Surface Patch.
3. Substitute the weight functions in the general expression for the surface (Equation (13.47)):

P(u, w) = Σ_{i=0}^{1} Σ_{j=0}^{2} B1i(w) Pij B2j(u)
        = B10(w) Σ_{j=0}^{2} P0j B2j(u) + B11(w) Σ_{j=0}^{2} P1j B2j(u)
        = (1 − w) [P00 B20(u) + P01 B21(u) + P02 B22(u)] + w [P10 B20(u) + P11 B21(u) + P12 B22(u)]
        = (1 − w) [P00 (1 − u)^2 + P01 2u(1 − u) + P02 u^2] + w [P10 (1 − u)^2 + P11 2u(1 − u) + P12 u^2]
        = P00 (1 − w)(1 − u)^2 + P01 (1 − w)2u(1 − u) + P02 (1 − w)u^2
        + P10 w(1 − u)^2 + P11 w2u(1 − u) + P12 wu^2.     (13.49)

The final expression is linear in w since the surface is defined by just two points in the w direction. Surface lines in this direction are straight. In the u direction, where the surface is defined by three points, each line is a polynomial of degree 2 in u. This expression can also be written in the form

(1 − w) Σ_{i} B2,i(u) P0i + w Σ_{i} B2,i(u) P1i = (1 − w)P(u, 0) + wP(u, 1),
which is a lofted surface (Equation (9.12)). A good method to check the final expression is to calculate it for the four values (u, w) = (0, 0), (0, 1), (1, 0), and (1, 1). This should yield the coordinates of the four original corner points. The entire surface can now be easily displayed, as a wire frame, by performing two loops. One draws curves in the u direction and the other draws the curves in the w direction. Notice that the expression of the patch is the same regardless of the particular points used. The user may change the points to modify the surface, and the new surface can be displayed (Figure 13.34) by calculating Equation (13.49).

Exercise 13.30: Given the 3×4 array of control points

P20 = (0, 2, 0)  P21 = (1, 2, 1)  P22 = (2, 2, 1)  P23 = (3, 2, 0)
P10 = (0, 1, 0)  P11 = (1, 1, 1)  P12 = (2, 1, 1)  P13 = (3, 1, 0)
P00 = (0, 0, 0)  P01 = (1, 0, 1)  P02 = (2, 0, 1)  P03 = (3, 0, 0),

calculate the order-2×3 Bézier surface patch defined by them. Notice that the order-2×2 Bézier surface patch defined by only four control points is a bilinear patch. Its form is given by Equation (9.6).
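The corner check suggested above is easy to automate. The following short Python sketch (illustrative only; the book's own code is Mathematica, and the coordinates below are invented for the test) evaluates Equation (13.47) directly and verifies that the four parameter corners reproduce the four corner control points:

```python
from math import comb

def bernstein(n, i, t):
    # B_{n,i}(t) = C(n,i) t^i (1 - t)^(n - i)
    return comb(n, i) * t**i * (1 - t)**(n - i)

def bezier_patch(P, u, w):
    # Equation (13.47): sum_i sum_j B_{m,i}(w) B_{n,j}(u) P_ij,
    # with i indexing the w direction and j the u direction, as in the example
    m, n = len(P) - 1, len(P[0]) - 1
    return tuple(sum(bernstein(m, i, w) * bernstein(n, j, u) * P[i][j][c]
                     for i in range(m + 1) for j in range(n + 1))
                 for c in range(3))

# a hypothetical 2x3 grid (m = 1, n = 2)
P = [[(0, 0, 0), (1, 0, 1), (2, 0, 0)],
     [(0, 1, 0), (1, 1, 1), (2, 1, 0)]]

# the check suggested in the text: the four (u,w) corners must
# reproduce the four corner control points
assert bezier_patch(P, 0, 0) == P[0][0]
assert bezier_patch(P, 1, 0) == P[0][2]
assert bezier_patch(P, 0, 1) == P[1][0]
assert bezier_patch(P, 1, 1) == P[1][2]
```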
13.17.1 Scaffolding Construction

The scaffolding construction (or de Casteljau algorithm) of Section 13.6 can be directly extended to the rectangular Bézier patch. Figure 13.35 illustrates the principle. Part (a) of the figure shows a rectangular Bézier patch defined by 3×4 control points (the red circles). The de Casteljau algorithm for curves is applied to each row of three points to compute two intermediate points (the green squares), followed by a final point (the triangle). The final point is located on the Bézier curve defined by the row of three points. The result of applying the de Casteljau algorithm to the four rows is four points (the triangles). The algorithm is now applied to those four points (Figure 13.35b) to compute one point (the hollow circle) that is located both on the curve defined by the four (red triangle) points and on the Bézier surface patch defined by the 3×4 control points. (This is one of the many curve algorithms that can be directly extended to surfaces.) Referring to Equation (13.47), we can summarize this process as follows:
(* A Bezier surface example. Given the six two-dimensional... *) Clear[pnts,b1,b2,g1,g2,vlines,hlines]; pnts={{{0,1,0},{1,1,1},{2,1,0}},{{0,0,0},{1,0,0},{2,0,0}}}; b1[w_]:={1-w,w};b2[u_]:={(1-u)^2,2u (1-u),u^2}; comb[i_]:=(b1[w].pnts)[[i]] b2[u][[i]]; g1=ParametricPlot3D[comb[1]+comb[2]+comb[3],{u,0,1},{w,0,1}, AspectRatio->Automatic,Ticks->{{0,1,2},{0,1},{0,.5}}]; g2=Graphics3D[{Red,AbsolutePointSize[6], Table[Point[pnts[[i,j]]],{i,1,2},{j,1,3}]}]; vlines=Graphics3D[{Green,AbsoluteThickness[2], Table[Line[{pnts[[1,j]],pnts[[2,j]]}],{j,1,3}]}]; hlines=Graphics3D[{Green,AbsoluteThickness[2], Table[Line[{pnts[[i,j]],pnts[[i,j+1]]}],{i,1,2},{j,1,2}]}]; Show[g1,g2,vlines,hlines,PlotRange->All]
Figure 13.34: A Lofted Bézier Surface Patch.
Figure 13.35: Scaffolding in a Rectangular Bézier Patch.
1. Construct the n + 1 curves

Pj(u) = Σ_{i=0}^{m} Bm,i(u) Pij,   j = 0, 1, . . . , n.

2. Apply the de Casteljau algorithm to each curve to end up with n + 1 points, one on each curve.

3. Apply the same algorithm to the n + 1 points to end up with one point.

Alternatively, we can first construct the m + 1 curves

Pi(w) = Σ_{j=0}^{n} Pij Bn,j(w),   i = 0, 1, . . . , m,

then apply the de Casteljau algorithm to each curve to end up with m + 1 points, and finally apply the same algorithm to the m + 1 points, and end up with one point.
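The steps above can be sketched in Python (an illustrative translation, not the book's code; `patch_point` and `patch_direct` are invented helper names, and the control points are those of Figure 13.33):

```python
from math import comb

def de_casteljau(points, t):
    # collapse a control polygon to one curve point by repeated lerps
    pts = list(points)
    while len(pts) > 1:
        pts = [tuple((1 - t) * a[c] + t * b[c] for c in range(3))
               for a, b in zip(pts, pts[1:])]
    return pts[0]

def patch_point(P, u, w):
    # steps 1-2: one point on each row curve, taken at u;
    # step 3: collapse those m+1 points at w
    return de_casteljau([de_casteljau(row, u) for row in P], w)

def patch_direct(P, u, w):
    # direct evaluation of Equation (13.47), for comparison
    m, n = len(P) - 1, len(P[0]) - 1
    B = lambda n, i, t: comb(n, i) * t**i * (1 - t)**(n - i)
    return tuple(sum(B(m, i, w) * B(n, j, u) * P[i][j][c]
                     for i in range(m + 1) for j in range(n + 1))
                 for c in range(3))

# the 3x3 control grid of Figure 13.33
P = [[(0, 0, 0), (1, 0, 1), (0, 0, 2)],
     [(1, 1, 0), (4, 1, 1), (1, 1, 2)],
     [(0, 2, 0), (1, 2, 1), (0, 2, 2)]]
a = patch_point(P, 0.3, 0.7)
b = patch_direct(P, 0.3, 0.7)
assert all(abs(x - y) < 1e-12 for x, y in zip(a, b))
```

The agreement of the two evaluations is exactly the statement that the scaffolding construction computes a point of the Bernstein-form surface.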
13.18 Subdividing Rectangular Patches

A rectangular Bézier patch is computed from a given rectangular array of m×n control points. If there are not enough points, the patch may not have the right shape. Just adding points is not a good solution, because this changes the shape of the surface, forcing the designer to start reshaping it from scratch. A better solution is to subdivide the patch into four connected surface patches, each based on m×n control points. The technique described here is similar to that presented in Section 13.8 for subdividing the Bézier curve. It employs the scaffolding construction of Section 13.6. Figure 13.35a shows a grid of 4×3 control points. The first step in subdividing the surface patch defined by this grid is for the user to select values for u and w. This determines a point on the surface, a point that will be common to the four new patches. The de Casteljau algorithm is then applied to each of the three columns of control points (the black circles of Figure 13.36a) separately. Each column of four control points P0, P1, P2, and P3 results in several points, of which the following seven are used for the subdivision (refer to Figure 13.8): P0, P01, P012, P0123, P123, P23, and P3. The result of this step is three columns of seven points each (Figure 13.36b, where the black circles indicate original control points).
Figure 13.36: Subdividing a Rectangular 3×4 Bézier Patch.
The next step is to apply the de Casteljau algorithm to each of the seven rows of three points, to obtain five points (refer to Figure 13.7). The resulting grid of 7×5 points is shown in Figure 13.36c. This grid is divided into four overlapping subgrids of 4×3 control points each, and each subgrid serves to compute a new rectangular Bézier patch.
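A minimal Python sketch of the per-column step (illustrative only; function names and the sample column are invented) shows how the full scaffolding of one column yields the seven subdivision points P0, P01, P012, P0123, P123, P23, and P3:

```python
def scaffold(points, t):
    # full de Casteljau triangle; levels[r] holds the stage-r points
    levels = [list(points)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([tuple((1 - t) * a[c] + t * b[c] for c in range(3))
                       for a, b in zip(prev, prev[1:])])
    return levels

def split_column(points, t):
    # the seven points used for subdivision: left edge of the triangle,
    # the apex, then the right edge back down
    L = scaffold(points, t)
    left = [L[r][0] for r in range(len(L))]          # P0, P01, P012, P0123
    right = [L[r][-1] for r in reversed(range(len(L)))]  # P0123, P123, P23, P3
    return left + right[1:]

col = [(0, 0, 0), (0, 1, 2), (0, 2, 2), (0, 3, 0)]   # a hypothetical column
seven = split_column(col, 0.5)
assert len(seven) == 7
assert seven[0] == col[0] and seven[-1] == col[-1]
# the apex P0123 is the curve point at t = 0.5: (P0 + 3P1 + 3P2 + P3)/8
assert all(abs(m - e) < 1e-12 for m, e in zip(seven[3], (0.0, 1.5, 1.5)))
```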
13.19 Degree Elevation

Degree elevation of the rectangular Bézier surface is similar to elevating the degree of the Bézier curve (Section 13.9). Specifically, Equation (13.19) is extended in the following way. Given a rectangular Bézier patch of degree m×n (i.e., defined by (m + 1)×(n + 1) control points), expressed as a double polynomial by Equation (13.47)

Pmn(u, w) = Σ_{i=0}^{m} Σ_{j=0}^{n} Bm,i(u) Pi,j Bn,j(w),   (13.47)

we first write the patch as a double polynomial of degree (m + 1)×n defined by intermediate control points Rij

Σ_{i=0}^{m+1} Σ_{j=0}^{n} Bm+1,i(u) Ri,j Bn,j(w).

Based on the result of Section 13.9, the intermediate points are given by

Rij = (i/(m + 1)) Pi−1,j + (1 − i/(m + 1)) Pi,j.   (13.50)

We then repeat this process to increase the degree to (m + 1)×(n + 1) and write

Pm+1,n+1(u, w) = Σ_{i=0}^{m+1} Σ_{j=0}^{n+1} Bm+1,i(u) Qi,j Bn+1,j(w),

where the new (m + 2)×(n + 2) control points Qij can be obtained either from the intermediate points Rij by an expression similar to Equation (13.50) or directly from the original control points Pij by a bilinear interpolation

Qij = (i/(m+1), 1 − i/(m+1)) [ Pi−1,j−1  Pi−1,j ; Pi,j−1  Pi,j ] (j/(n+1), 1 − j/(n+1))^T,   (13.51)

for i = 0, 1, . . . , m + 1, and j = 0, 1, . . . , n + 1.
If i = 0 or j = 0, indexes of the form i − 1 or j − 1 are negative, but (the nonexistent) points with such indexes are multiplied by zero, which is why this bilinear interpolation works well in this case. Similarly, when i = m + 1, point Pi,j does not exist, but the factor 1 − i/(m + 1) that multiplies it is zero and when j = n + 1, point Pi,j does not
exist, but the factor 1 − j/(n + 1) that multiplies it is also zero. Thus, Equation (13.51) always works.

Example: Starting with the 2×3 control points

P10 P11 P12
P00 P01 P02

(this implies that m = 1 and n = 2), we perform two steps to elevate the degree of the rectangular patch defined by them from 1×2 to 2×3. The first step is to elevate the degree of each of the three columns from 1 (two control points P0i and P1i) to 2 (three intermediate points R0i, R1i, and R2i). This step produces the nine intermediate points

R20 R21 R22
R10 R11 R12
R00 R01 R02

For the leftmost column, the two extreme points R00 and R20 equal the two original control points P00 and P10, respectively. The middle point R10 is computed from Equation (13.50) as R10 = (1/2)P00 + (1/2)P10. Similarly, the middle column yields

R01 = P01,   R21 = P11,   R11 = (1/2)P01 + (1/2)P11,

and the rightmost column results in

R02 = P02,   R22 = P12,   R12 = (1/2)P02 + (1/2)P12.

The second step is to elevate the degree of each of the three rows from 2 (three points Ri0, Ri1, and Ri2) to 3 (four new points Qi0, Qi1, Qi2, and Qi3). This step produces the 12 new control points

Q20 Q21 Q22 Q23
Q10 Q11 Q12 Q13
Q00 Q01 Q02 Q03

For the bottom row, the two extreme points Q00 and Q03 equal the two intermediate control points R00 and R02, respectively. These, together with the two interior points Q01 and Q02, are computed from Equations (13.50) and (13.51) as

Q00 = R00 = P00 = (0, 1) [ P−1,−1 P−1,0 ; P0,−1 P00 ] (0, 1)^T,
Q01 = (1/3)R00 + (2/3)R01 = (1/3)P00 + (2/3)P01 = (0, 1) [ P−1,0 P−1,1 ; P00 P01 ] (1/3, 2/3)^T,
Q02 = (2/3)R01 + (1/3)R02 = (2/3)P01 + (1/3)P02 = (0, 1) [ P−1,1 P−1,2 ; P01 P02 ] (2/3, 1/3)^T,
Q03 = R02 = P02 = (0, 1) [ P−1,2 P−1,3 ; P02 P03 ] (1, 0)^T.

The middle row yields

Q10 = R10 = (1/2)P00 + (1/2)P10 = (1/2, 1/2) [ P0,−1 P00 ; P1,−1 P10 ] (0, 1)^T,
Q11 = (1/3)R10 + (2/3)R11 = (1/3)((1/2)P00 + (1/2)P10) + (2/3)((1/2)P01 + (1/2)P11)
    = (1/2, 1/2) [ P00 P01 ; P10 P11 ] (1/3, 2/3)^T,
Q12 = (2/3)R11 + (1/3)R12 = (2/3)((1/2)P01 + (1/2)P11) + (1/3)((1/2)P02 + (1/2)P12)
    = (1/2, 1/2) [ P01 P02 ; P11 P12 ] (2/3, 1/3)^T,
Q13 = R12 = (1/2)P02 + (1/2)P12 = (1/2, 1/2) [ P02 P03 ; P12 P13 ] (1, 0)^T.

Finally, the third row of intermediate points produces the four new control points

Q20 = R20 = P10 = (1, 0) [ P1,−1 P10 ; P2,−1 P20 ] (0, 1)^T,
Q21 = (1/3)R20 + (2/3)R21 = (1/3)P10 + (2/3)P11 = (1, 0) [ P10 P11 ; P20 P21 ] (1/3, 2/3)^T,
Q22 = (2/3)R21 + (1/3)R22 = (2/3)P11 + (1/3)P12 = (1, 0) [ P11 P12 ; P21 P22 ] (2/3, 1/3)^T,
Q23 = R22 = P12 = (1, 0) [ P12 P13 ; P22 P23 ] (1, 0)^T.

Figure 13.37 lists code for elevating the degree of a rectangular Bézier patch based on 2×3 control points. In part (a) of the figure each point is a symbol, such as p00, and in part (b) each point is a triplet of coordinates. The points are stored in a 2×3 array p and are transferred to a 4×5 array r, parts of which remain undefined.
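The bilinear form of Equation (13.51) is easy to implement mechanically. The Python sketch below (illustrative only; the book's code in Figure 13.37 is Mathematica) uses exact rational arithmetic and treats the nonexistent points as zeros, which is safe because their factors vanish; it is spot-checked against the worked example:

```python
from fractions import Fraction as F

def elevate(P):
    # degree-elevate an (m+1)x(n+1) grid to (m+2)x(n+2) via Equation (13.51)
    m, n = len(P) - 1, len(P[0]) - 1
    def pt(i, j):
        # nonexistent points carry a zero factor, so return 0 for them
        if 0 <= i <= m and 0 <= j <= n:
            return P[i][j]
        return (F(0), F(0), F(0))
    Q = []
    for i in range(m + 2):
        a = F(i, m + 1)                # row weight i/(m+1)
        row = []
        for j in range(n + 2):
            b = F(j, n + 1)            # column weight j/(n+1)
            row.append(tuple(
                a * (b * pt(i-1, j-1)[c] + (1-b) * pt(i-1, j)[c]) +
                (1-a) * (b * pt(i, j-1)[c] + (1-b) * pt(i, j)[c])
                for c in range(3)))
        Q.append(row)
    return Q

# a hypothetical 2x3 grid (m = 1, n = 2)
P = [[(F(0), F(0), F(0)), (F(1), F(0), F(1)), (F(2), F(0), F(0))],
     [(F(0), F(1), F(0)), (F(1), F(1), F(1)), (F(2), F(1), F(0))]]
Q = elevate(P)
# spot-check against the worked example: Q01 = (1/3)P00 + (2/3)P01
assert Q[0][1] == tuple(F(1, 3) * P[0][0][c] + F(2, 3) * P[0][1][c] for c in range(3))
assert Q[0][0] == P[0][0] and Q[2][3] == P[1][2]
```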
13.20 Nonparametric Rectangular Patches

The explicit representation of a surface (Section 8.11) is z = f(x, y). The rectangular Bézier surface is, of course, parametric, but it can be represented in a nonparametric form, similar to explicit surfaces. The derivation in this section is similar to that of Section 13.14. Given (n + 1)×(m + 1) real values (not points) Pij, we start with the double polynomial

s(u, w) = Σ_{i=0}^{n} Σ_{j=0}^{m} Bn,i(u) Pij Bm,j(w)

and employ the identity of Equation (13.39) twice, for u and for w, to create the surface patch

P(u, w) = (u, w, s(u, w)) = Σ_{i=0}^{n} Σ_{j=0}^{m} Bn,i(u) (i/n, j/m, Pij) Bm,j(w).
(* Degree elevation of a rect Bezier surface from 2x3 to 4x5 *)
Clear[p,q,r];
m=1; n=2;
p={{p00,p01,p02},{p10,p11,p12}}; (* array of points *)
r=Array[a, {m+3,n+3}]; (* extended array, still undefined *)
Part[r,1]=Table[a, {i,-1,m+2}];
Part[r,2]=Append[Prepend[Part[p,1],a],a];
Part[r,3]=Append[Prepend[Part[p,2],a],a];
Part[r,n+2]=Table[a, {i,-1,m+2}];
MatrixForm[r] (* display extended array *)
q[i_,j_]:=({i/(m+1),1-i/(m+1)}. (* dot product *)
{{r[[i+1,j+1]],r[[i+1,j+2]]},{r[[i+2,j+1]],r[[i+2,j+2]]}}).
{j/(n+1),1-j/(n+1)}
q[2,3] (* test *)
(a)

(* Degree elevation of a rect Bezier surface from 2x3 to 4x5 *)
Clear[p,r,comb];
m=1; n=2;
(* set p to an array of 3D points *)
p={{{0,0,0},{1,0,1},{2,0,0}},{{0,1,0},{1,1,.5},{2,1,0}}};
r=Array[a, {m+3,n+3}]; (* extended array, still undefined *)
Part[r,1]=Table[{a,a,a}, {i,-1,m+2}];
Part[r,2]=Append[Prepend[Part[p,1],{a,a,a}],{a,a,a}];
Part[r,3]=Append[Prepend[Part[p,2],{a,a,a}],{a,a,a}];
Part[r,n+2]=Table[{a,a,a}, {i,-1,m+2}];
MatrixForm[r] (* display extended array *)
comb[i_,j_]:=({i/(m+1),1-i/(m+1)}.
{{r[[i+1,j+1]],r[[i+1,j+2]]},{r[[i+2,j+1]],r[[i+2,j+2]]}})[[1]]{j/(n+1),1-j/(n+1)}[[1]]+
({i/(m+1),1-i/(m+1)}.
{{r[[i+1,j+1]],r[[i+1,j+2]]},{r[[i+2,j+1]],r[[i+2,j+2]]}})[[2]]{j/(n+1),1-j/(n+1)}[[2]];
MatrixForm[Table[comb[i,j], {i,0,2},{j,0,3}]]
(b)

Figure 13.37: Code for Degree Elevation of a Rectangular Bézier Surface.
This version of the Bézier surface is defined by the control points (i/n, j/m, Pij), which form a regular grid on the xy plane.
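Linear precision of the Bernstein basis is what makes this construction work: the x- and y-sums collapse to the parameters themselves. A short Python sketch (not from the book; the height values are hypothetical) demonstrates it:

```python
from math import comb

def B(n, i, t):
    # Bernstein polynomial B_{n,i}(t)
    return comb(n, i) * t**i * (1 - t)**(n - i)

def nonparametric_patch(z, u, w):
    # control points (i/n, j/m, z[i][j]) on a regular xy grid;
    # linear precision forces x = u and y = w
    n, m = len(z) - 1, len(z[0]) - 1
    pt = [0.0, 0.0, 0.0]
    for i in range(n + 1):
        for j in range(m + 1):
            wgt = B(n, i, u) * B(m, j, w)
            cp = (i / n, j / m, z[i][j])
            for c in range(3):
                pt[c] += wgt * cp[c]
    return tuple(pt)

z = [[0, 1, 0], [1, 2, 1], [0, 1, 0]]   # hypothetical height values P_ij
x, y, s = nonparametric_patch(z, 0.3, 0.8)
assert abs(x - 0.3) < 1e-12 and abs(y - 0.8) < 1e-12
```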
13.21 Joining Rectangular Bézier Patches

It is easy, although tedious, to explore the conditions for the smooth joining of two Bézier surface patches. Figure 13.38 shows a typical example of this problem. It shows parts of two patches P and Q. It is not difficult to see that the former is based on 4×5 control points and the latter on 4×n points, where n ≥ 2. It is also easy to see that they are joined such that the eight control points along the joint satisfy Pi4 = Qi0 for i = 0, 1, 2, 3. The condition for smooth joining of the two surface patches is that the two tangent vectors at the common boundary are in the same direction, although they may have different magnitudes. This condition is expressed as

∂P(u, w)/∂w |_{w=1} = α ∂Q(u, w)/∂w |_{w=0}.
Figure 13.38: Smoothly Joining Rectangular Bézier Patches.
The two tangents are calculated from Equation (13.48) (and the B3 and B4 matrices given by Figure 13.3). For the first patch, we have

∂P(u, w)/∂w |_{w=1}
 = (u^3, u^2, u, 1) B3 [ P00 P01 P02 P03 P04 ; P10 P11 P12 P13 P14 ; P20 P21 P22 P23 P24 ; P30 P31 P32 P33 P34 ] B4^T (4w^3, 3w^2, 2w, 1, 0)^T |_{w=1}
 = 4(u^3, u^2, u, 1) B3 (P04 − P03, P14 − P13, P24 − P23, P34 − P33)^T.

Similarly, for the second patch,

∂Q(u, w)/∂w |_{w=0} = 4(u^3, u^2, u, 1) B3 (Q01 − Q00, Q11 − Q10, Q21 − Q20, Q31 − Q30)^T.

The conditions for a smooth join are therefore

(P04 − P03, P14 − P13, P24 − P23, P34 − P33)^T = α (Q01 − Q00, Q11 − Q10, Q21 − Q20, Q31 − Q30)^T,
or Pi4 − Pi3 = α(Qi1 − Qi0 ) for i = 0, 1, 2, and 3. This can also be expressed by saying
that the three points Pi3, Pi4 = Qi0, and Qi1 should be on a straight line, although not necessarily equally spaced.

Example: Each of the two patches in Figure 13.39 is based on 3×3 points (n = 2). The patches are smoothly connected along the curve defined by the common points (0, 2, 0), (0, 0, 0), and (0, −2, 0). Note that in the diagram they are slightly separated, but this was done intentionally. The smooth connection is obtained by making sure that the points (−2, 2, 0), (0, 2, 0), and (2, 2, 0) are collinear (find the other two collinear triplets). The coordinates of the points are

(−2, 2, 2)  (−2, 2, 0)  (0, 2, 0)        (0, 2, 0)  (2, 2, 0)  (2, 2, −2)
(−4, 0, 2)  (−4, 0, 0)  (0, 0, 0)        (0, 0, 0)  (4, 0, 0)  (4, 0, −2)
(−2, −2, 2) (−2, −2, 0) (0, −2, 0)       (0, −2, 0) (2, −2, 0) (2, −2, −2)
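The collinearity condition is easy to test numerically. This Python sketch (not from the book; `collinear` is an invented helper) checks the three triplets of control points that straddle the common boundary of the example with a cross product:

```python
def collinear(p, q, r, eps=1e-12):
    # p, q, r are collinear iff (q - p) x (r - q) is the zero vector
    v1 = tuple(q[c] - p[c] for c in range(3))
    v2 = tuple(r[c] - q[c] for c in range(3))
    cross = (v1[1] * v2[2] - v1[2] * v2[1],
             v1[2] * v2[0] - v1[0] * v2[2],
             v1[0] * v2[1] - v1[1] * v2[0])
    return all(abs(c) < eps for c in cross)

# the three triplets straddling the shared boundary of the example
triplets = [((-2, 2, 0), (0, 2, 0), (2, 2, 0)),
            ((-4, 0, 0), (0, 0, 0), (4, 0, 0)),
            ((-2, -2, 0), (0, -2, 0), (2, -2, 0))]
assert all(collinear(*t) for t in triplets)
```

Note that the spacing within each triplet differs (the middle row spans −4 to 4), which is allowed: only the direction of the two boundary-crossing segments must agree.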
The famous Utah teapot was designed in the 1960s at the University of Utah by digitizing a real teapot (now kept at the computer museum in Boston) and creating 32 smoothly-connected Bézier patches defined by a total of 306 control points. [Crow 87] has a detailed description. The coordinates of the points are publicly available, as is a program to display the entire surface. The program is part of a public-domain general three-dimensional graphics package called SIPP (SImple Polygon Processor). SIPP was originally written in Sweden and is distributed by the Free Software Foundation [Free 04]. It can be downloaded anonymously from several sources and for different platforms. A more recent source for this important surface is a Mathematica notebook by Jan Mangaldan, available at [MathSource 05].

She finished pouring the tea and put down the pot. "That's an old teapot," remarked Harold. "Sterling silver," said Maude wistfully. "It was my dear mother-in-law's, part of a dinner set of fifty pieces. It was sent to me, one of the few things that survived." Her voice trailed off and she absently sipped her tea.
—Colin Higgins, Harold and Maude (1971).
13.22 An Interpolating Bézier Surface Patch

An interpolating rectangular Bézier surface patch solves the following problem. Given a set of (m + 1)×(n + 1) data points Qkl, compute a set of (m + 1)×(n + 1) control points Pij such that the rectangular Bézier surface patch P(u, w) defined by the Pij's will pass through all the data points Qkl. Section 13.13 discusses the same problem for the Bézier curve, and here we apply the same approach to the rectangular Bézier surface. We select m + 1 values uk and n + 1 values wl and require that the (m + 1)×(n + 1) surface points P(uk, wl) equal the data points Qkl for k = 0, 1, . . . , m and l = 0, 1, . . . , n. This results in a set of (m + 1)×(n + 1) equations with the control points Pij as the unknowns. Such a set of equations may be
Clear[n,bern,p1,p2,g3,bzSurf,patch]; n=2;
p1={{{-2,2,2},{-2,2,0},{0,2,0}},{{-4,0,2},{-4,0,0},
{0,0,0}},{{-2,-2,2},{-2,-2,0},{0,-2,0}}};
p2={{{0,2,0},{2,2,0},{2,2,-2}},{{0,0,0},{4,0,0},{4,0,-2}},
{{0,-2,0},{2,-2,0},{2,-2,-2}}};
pwr[x_,y_]:=If[x==0&&y==0,1,x^y];
bern[n_,i_,u_]:=Binomial[n,i]pwr[u,i]pwr[1-u,n-i]
bzSurf[p_]:={Sum[p[[i+1,j+1,1]]bern[n,i,u]bern[n,j,w],
{i,0,n,1},{j,0,n,1}],
Sum[p[[i+1,j+1,2]]bern[n,i,u]bern[n,j,w],
{i,0,n,1},{j,0,n,1}],
Sum[p[[i+1,j+1,3]]bern[n,i,u]bern[n,j,w],
{i,0,n,1},{j,0,n,1}]};
patch[s_]:=ParametricPlot3D[bzSurf[s], {u,0,1},{w,0.02,.98}];
g3=Graphics3D[{Red,AbsolutePointSize[6],
Table[Point[p1[[i,j]]],{i,1,n+1},{j,1,n+1}]}];
g4=Graphics3D[{Red,AbsolutePointSize[6],
Table[Point[p2[[i,j]]],{i,1,n+1},{j,1,n+1}]}];
Show[patch[p1],patch[p2],g3,g4,PlotRange->All]

Figure 13.39: Two Bézier Surface Patches.
big, but it is easy to solve with appropriate mathematical software. A general equation in this set is

P(uk, wl) = Bm(uk) P Bn(wl) = Qkl,   for k = 0, 1, . . . , m and l = 0, 1, . . . , n.

Example: We choose m = 3 and n = 2. The system of equations becomes

((1 − uk)^3, 3uk(1 − uk)^2, 3uk^2(1 − uk), uk^3) [ P00 P01 P02 ; P10 P11 P12 ; P20 P21 P22 ; P30 P31 P32 ] ((1 − wl)^2, 2wl(1 − wl), wl^2)^T = Qkl,

for k = 0, 1, 2, 3 and l = 0, 1, 2. This is a system of 12 equations in the 12 unknowns Pij. In most cases, the uk values can be equally spaced between 0 and 1 (in our case 0, 1/3, 2/3, and 1), and the same for the wl values (in our case 0, 1/2, and 1).
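Such a system really is easy to solve with software. The Python sketch below (illustrative only; names and the scalar data values are invented) exploits the tensor-product structure Q = A P C^T, where A and C are the Bernstein collocation matrices at the uk and wl values, so two small exact linear solves recover the control values:

```python
from fractions import Fraction as F
from math import comb

def B(n, i, t):
    return comb(n, i) * t**i * (1 - t)**(n - i)

def gauss_solve(A, Y):
    # solve A X = Y exactly over rationals (A square, Y given by rows)
    n = len(A)
    M = [list(A[r]) + list(Y[r]) for r in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                M[r] = [x - M[r][c] * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

m, n = 3, 2
us = [F(k, m) for k in range(m + 1)]     # u_k = 0, 1/3, 2/3, 1
ws = [F(l, n) for l in range(n + 1)]     # w_l = 0, 1/2, 1
A = [[B(m, i, u) for i in range(m + 1)] for u in us]
C = [[B(n, j, w) for j in range(n + 1)] for w in ws]

# hypothetical scalar data values Q_kl to interpolate
Q = [[F(v) for v in row] for row in [[0, 1, 0], [1, 3, 1], [2, 2, 2], [0, 1, 0]]]

X = gauss_solve(A, Q)                               # X = A^-1 Q = P C^T
Pt = gauss_solve(C, [list(r) for r in zip(*X)])     # C P^T = X^T
P = [list(r) for r in zip(*Pt)]

# the recovered control values reproduce every data value at (u_k, w_l)
for k in range(m + 1):
    for l in range(n + 1):
        val = sum(A[k][i] * P[i][j] * C[l][j]
                  for i in range(m + 1) for j in range(n + 1))
        assert val == Q[k][l]
```

For three-dimensional data points, the same two solves are simply applied to each coordinate separately.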
13.22.1 Bicubic Bézier and Hermite Patches

The order-3×3 rectangular Bézier surface patch based on 16 control points Pij is expressed as

P(u, w) = (u^3, u^2, u, 1) N P N^T (w^3, w^2, w, 1)^T = U N P N^T W^T,

where N is given by Equation (13.8),

N = [ −1  3 −3  1 ;  3 −6  3  0 ; −3  3  0  0 ;  1  0  0  0 ].

This expression is similar to the one for the bicubic Hermite patch U H B H^T W^T, where B is defined by Equation (11.35). Comparing these expressions by setting N P N^T = H B H^T gives us the specific matrix B needed to express the 3×3 Bézier patch in bicubic form

B = [ P00                P03                3(P01 − P00)              3(P03 − P02)
      P30                P33                3(P31 − P30)              3(P33 − P32)
      3(P10 − P00)       3(P13 − P03)       9(P00 − P10 − P01 + P11)  9(P02 − P12 − P03 + P13)
      3(P30 − P20)       3(P33 − P23)       9(P20 − P30 − P21 + P31)  9(P22 − P32 − P23 + P33) ].
The eight tangent vectors and four twist vectors are expressed in terms of the 16 control points. Each tangent is the difference of two control points located on the boundary of
the control polyhedron (i.e., of the form P0j or Pi0 ), while the twists are computed by means of all 16 points.
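The identity N P N^T = H B H^T can be checked numerically. The sketch below is illustrative only: the Hermite matrix H of Equation (11.35) is not reproduced in this section, so the standard bicubic Hermite matrix is assumed here, and scalar control values stand in for 3D points (each coordinate behaves identically):

```python
import random

def matmul(A, Bm):
    # plain dense matrix product
    return [[sum(A[i][k] * Bm[k][j] for k in range(len(Bm)))
             for j in range(len(Bm[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

N = [[-1, 3, -3, 1], [3, -6, 3, 0], [-3, 3, 0, 0], [1, 0, 0, 0]]
# assumed standard bicubic Hermite matrix (stand-in for Equation (11.35))
H = [[2, -2, 1, 1], [-3, 3, -2, -1], [0, 0, 1, 0], [1, 0, 0, 0]]

random.seed(1)
P = [[random.random() for _ in range(4)] for _ in range(4)]  # scalar "points"

# Hermite data of the Bezier patch: corners, tangents, and twists
B = [
  [P[0][0], P[0][3], 3*(P[0][1]-P[0][0]), 3*(P[0][3]-P[0][2])],
  [P[3][0], P[3][3], 3*(P[3][1]-P[3][0]), 3*(P[3][3]-P[3][2])],
  [3*(P[1][0]-P[0][0]), 3*(P[1][3]-P[0][3]),
   9*(P[0][0]-P[1][0]-P[0][1]+P[1][1]), 9*(P[0][2]-P[1][2]-P[0][3]+P[1][3])],
  [3*(P[3][0]-P[2][0]), 3*(P[3][3]-P[2][3]),
   9*(P[2][0]-P[3][0]-P[2][1]+P[3][1]), 9*(P[2][2]-P[3][2]-P[2][3]+P[3][3])],
]

lhs = matmul(matmul(N, P), transpose(N))
rhs = matmul(matmul(H, B), transpose(H))
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-9
           for i in range(4) for j in range(4))
```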
13.23 A Bézier Sphere

Section 13.16 shows how the Bézier curve can approximate a circle to a high precision. Specifically, Exercise 13.27 shows how to place the four points (1, 0), (1, k), (k, 1), and (0, 1), where k ≈ 0.5523, in order to construct a curve (Equation (Ans.35)) whose maximum deviation from a true quarter circle is just 0.027%. This section shows how to construct an approximate sphere (a Bézphere) out of eight identical Bézier surface patches, one of which is shown in Figure 13.40a. The idea is to define a degenerate surface patch where one boundary curve degenerates to a single point and each of the other three boundary curves is an approximate quarter circle. Each of the eight patches is defined by 4×4 control points arranged as in Figure 13.40b. The four points P00, P10, P20, and P30 are identical and equal (0, 0, 1). The four points P03 = (1, 0, 0), P13 = (1, k, 0), P23 = (k, 1, 0), and P33 = (0, 1, 0) are located on the xy plane. The group P10, P11, P12, and P13 have the same relative positions and are located on a plane rotated 30° from positive x to positive y. The Mathematica code of Figure 13.41 calculates all the points and has produced the following expression of the surface:

P(u, w) = ( 0.211374u(7.83872 − 1.48457u − 1.62319u^2 − 3.15057w + 0.596685uw + 2.55388u^2w − 5.45694w^2 + 1.03349uw^2 − 1.93069u^2w^2 + 0.768792w^3 − 0.145601uw^3 + u^2w^3),
−0.211374u(−11.7581w + 2.22686uw + 1.6925u^2w + 3.15057w^2 − 0.596685uw^2 − 1.06931u^2w^2 + 0.768792w^3 − 0.145601uw^3 + u^2w^3),
0.3431(2.9146 − 3.9146u^2 + u^3) ).
13.24 Rational Bézier Surfaces

Section 13.15 describes the rational Bézier curve. The principle of this type of curve can be extended to surfaces, and this section discusses the rational rectangular Bézier surface patch. This type of surface is given by the expression

P(u, w) = [ Σ_{i=0}^{n} Σ_{j=0}^{m} wij Bn,i(u) Pij Bm,j(w) ] / [ Σ_{k=0}^{n} Σ_{l=0}^{m} wkl Bn,k(u) Bm,l(w) ],   0 ≤ u, w ≤ 1.   (13.52)
When all the weights wij are set to 1, Equation (13.52) reduces to the original rectangular Bézier surface patch. The weights serve as additional parameters and provide fine, accurate control of the shape of the surface. Figure 13.42 shows how the surface patch of
Figure 13.40: A Bézier Patch for a Sphere.
(*Sphere made of 8 Bezier patches*)
Clear[u,w,patch];
al3=Sin[30. Degree];be3=Cos[30. Degree];
p00=p10=p20=p30={0,0,1};p03={1,0,0};p33={0,1,0};
t3={{be3,al3,0},{-al3,be3,0},{0,0,1}};
k=0.5523; p13={1,k,0};p23={k,1,0};
p02={1,0,k};p01={k,0,1};
p32={0,1,k};p31={0,k,1};
p11=p01.t3;p12=p02.t3;
t6={{al3,be3,0},{-be3,al3,0},{0,0,1}};
p21=p01.t6;p22=p02.t6;
b30[t_]:=(1-t)^3;b31[t_]:=3t (1-t)^2;
b32[t_]:=3t^2(1-t);b33[t_]:=t^3;
patch[u_,w_]:=b30[w](b30[u]p00+b31[u]p01+b32[u]p02+
b33[u]p03)+b31[w](b30[u]p10+b31[u]p11+b32[u]p12+
b33[u]p13)+b32[w](b30[u]p20+b31[u]p21+b32[u]p22+
b33[u]p23)+b33[w](b30[u]p30+b31[u]p31+b32[u]p32+b33[u]p33);
Factor[patch[u,w]]
ParametricPlot3D[patch[u,w],{u,0,1},{w,0,1},
Prolog->AbsoluteThickness[.5],ViewPoint->{1.908,-3.886,0.306}]

Figure 13.41: Code for a Bézier Patch.
Figure 13.33 can be pulled toward the center point (point (4, 1, 1)) by assigning w22 = 5, while keeping the other weights set to 1. Note that weights of 0 and negative weights can also be used, as long as the denominator of Equation (13.52) is not zero.

Exercise 13.31: Use the code of Figure 13.42 to construct a closed rational Bézier surface patch based on a grid of 2×4 control points.
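Equation (13.52) is straightforward to evaluate. The Python sketch below (illustrative only; the book's Figure 13.42 code is Mathematica) uses the control points of Figure 13.33 and confirms that raising the center weight to 5 pulls the surface point toward the center control point (4, 1, 1):

```python
from math import comb

def B(n, i, t):
    return comb(n, i) * t**i * (1 - t)**(n - i)

def rational_patch(P, wt, u, w):
    # Equation (13.52): weighted average of the control points
    n, m = len(P) - 1, len(P[0]) - 1
    num, den = [0.0, 0.0, 0.0], 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            f = wt[i][j] * B(n, i, u) * B(m, j, w)
            den += f
            for c in range(3):
                num[c] += f * P[i][j][c]
    return tuple(x / den for x in num)

# the control grid of Figure 13.33
P = [[(0, 0, 0), (1, 0, 1), (0, 0, 2)],
     [(1, 1, 0), (4, 1, 1), (1, 1, 2)],
     [(0, 2, 0), (1, 2, 1), (0, 2, 2)]]
wt1 = [[1] * 3 for _ in range(3)]
wt5 = [[1] * 3 for _ in range(3)]
wt5[1][1] = 5                      # the w22 = 5 of the text (center weight)

p1 = rational_patch(P, wt1, 0.5, 0.5)
p5 = rational_patch(P, wt5, 0.5, 0.5)
d = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
assert d(p5, (4, 1, 1)) < d(p1, (4, 1, 1))   # larger weight pulls the surface
```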
13.25 Triangular Bézier Surfaces

The first surface to be derived with Bézier methods was the triangular patch, not the rectangular one. It was developed in 1959 by de Casteljau at Citroën. The triangular Bézier patch and its properties are the topic of this section, but it should be noted that the ideas and techniques described here can be extended to Bézier surface patches with any number of edges. [DeRose and Loop 89] discusses one approach, termed S-patch, to this problem. The triangular Bézier patch is based on control points Pijk arranged in a roughly triangular shape. Each control point is three-dimensional and is assigned three indexes ijk such that 0 ≤ i, j, k ≤ n and i + j + k = n. The value of n is selected by the user depending on how large and complex the patch should be and how many points are
(* A Rational Bezier Surface *) Clear[pwr,bern,spnts,n,m,wt,bzSurf,cpnts,patch,vlines,hlines,axes]; spnts={{{0,0,0},{1,0,1},{0,0,2}},{{1,1,0},{4,1,1},{1,1,2}}, {{0,2,0},{1,2,1},{0,2,2}}}; m=Length[spnts[[1]]]-1;n=Length[Transpose[spnts][[1]]]-1; wt=Table[1,{i,1,n+1},{j,1,m+1}]; wt[[2,2]]=5; pwr[x_,y_]:=If[x==0&&y==0,1,x^y]; bern[n_,i_,u_]:=Binomial[n,i]pwr[u,i]pwr[1-u,n-i] bzSurf[u_,w_]:=Sum[wt[[i+1,j+1]]spnts[[i+1,j+1]]bern[n,i,u]bern[m,j,w], {i,0,n},{j,0,m}]/Sum[wt[[i+1,j+1]]bern[n,i,u]bern[m,j,w], {i,0,n},{j,0,m}]; patch=ParametricPlot3D[bzSurf[u,w],{u,0,1},{w,0,1}]; cpnts=Graphics3D[{Red,AbsolutePointSize[6], (*control points*) Table[Point[spnts[[i,j]]],{i,1,n+1},{j,1,m+1}]}]; vlines=Graphics3D[{Green,AbsoluteThickness[1], (*control polygon*) Table[Line[{spnts[[i,j]],spnts[[i+1,j]]}],{i,1,n},{j,1,m+1}]}]; hlines=Graphics3D[{Green,AbsoluteThickness[1], Table[Line[{spnts[[i,j]],spnts[[i,j+1]]}], {i,1,n+1},{j,1,m}]}]; maxx=Max[Flatten[Table[Part[spnts[[i,j]],1],{i,1,n+1},{j,1,m+1}]]]; maxy=Max[Flatten[Table[Part[spnts[[i,j]],2],{i,1,n+1},{j,1,m+1}]]]; maxz=Max[Flatten[Table[Part[spnts[[i,j]],3],{i,1,n+1},{j,1,m+1}]]]; axes=Graphics3D[{AbsoluteThickness[1.5], (*the coordinate axes*) Line[{{0,0,maxz},{0,0,0},{maxx,0,0},{0,0,0},{0,maxy,0}}]}]; Show[cpnts,hlines,vlines,axes,patch,PlotRange->All, ViewPoint->{2.783,-3.090,1.243}]
Figure 13.42: A Rational Bézier Surface Patch.
given. Generally, a large n allows for a finer control of surface details but requires more computations. The following convention is used here. The first index, i, corresponds to the left side of the triangle, the second index, j, corresponds to the base, and the third index, k, corresponds to the right side. The indexing conventions for n = 1, 2, 3, and 4 are shown in Figure 13.43. There are n + 1 points on each side of the triangle and, because of the way the points are arranged, there is a total of (1/2)(n + 1)(n + 2) control points:
n = 1:
P010
P001 P100

n = 2:
P020
P011 P110
P002 P101 P200

n = 3:
P030
P021 P120
P012 P111 P210
P003 P102 P201 P300

n = 4:
P040
P031 P130
P022 P121 P220
P013 P112 P211 P310
P004 P103 P202 P301 P400
Figure 13.43: Control Points for Four Triangular Bézier Patches.
The surface patch itself is defined by the trinomial theorem (Equation (13.3)) as

P(u, v, w) = Σ_{i+j+k=n} Pijk (n!/(i! j! k!)) u^i v^j w^k = Σ_{i+j+k=n} Pijk B^n_ijk(u, v, w),   (13.53)

where u + v + w = 1. Note that even though P(u, v, w) seems to depend on three parameters, it only depends on two because their sum is constant. The quantities

B^n_ijk(u, v, w) = (n!/(i! j! k!)) u^i v^j w^k

are the Bernstein polynomials in two variables (bivariate). They are listed here for n = 1, 2, 3, and 4:
n = 1:
v
w u

n = 2:
v^2
2vw 2uv
w^2 2uw u^2

n = 3:
v^3
3v^2w 3uv^2
3vw^2 6uvw 3u^2v
w^3 3uw^2 3u^2w u^3

n = 4:
v^4
4v^3w 4uv^3
6v^2w^2 12uv^2w 6u^2v^2
4vw^3 12uvw^2 12u^2vw 4u^3v
w^4 4uw^3 6u^2w^2 4u^3w u^4
The three boundary curves are obtained from Equation (13.53) by setting each of the three parameters in turn to zero. Setting, for example, u = 0 causes all the terms of Equation (13.53) except those with i = 0 to vanish. The result is

P(0, v, w) = Σ_{j+k=n} P0jk (n!/(j! k!)) v^j w^k,   where v + w = 1.   (13.54)
Since v + w = 1, Equation (13.54) can be written

P(v) = Σ_{j+k=n} P0jk (n!/(j! k!)) v^j (1 − v)^k = Σ_{j=0}^{n} P0j,n−j (n!/(j! (n − j)!)) v^j (1 − v)^{n−j},   (13.55)

and this is a Bézier curve.
and this is a B´ezier curve. Example: We illustrate the case n = 2. There should be three control points on each side of the triangle, for a total of 12 (2 + 1)(2 + 2) = 6 points. We select simple coordinates: (1, 3, 1) (0.5, 1, 0) (1.5, 1, 0) (0, 0, 0) (1, 0, −1) (2, 0, 0) Note that four of the points have z = 0 and are therefore on the same plane. It is only the other two points, with z = ±1, that cause this surface to be nonflat. The expression of the surface is P(u, v, w) =
i+j+k=2
Pijk
n! ui v j wk i! j! k!
2! 2! 2! w2 + P101 uw + P200 u2 0! 0! 2! 1! 0! 1! 2! 0! 0! 2! 2! 2! + P011 vw + P110 uv + P020 v2 0! 1! 1! 1! 1! 0! 0! 2! 0! = (0, 0, 0)w2 + (1, 0, −1)2uw + (2, 0, 0)u2 = P002
+ (0.5, 1, 0)2vw + (1.5, 1, 0)2uv + (1, 3, 1)v2 = (2uw + 2u2 + vw + 3uv + v 2 , 2vw + 2uv + 3v 2 , −2uw + v2 ). It is now easy to verify that the following special values of u, v, and w produce the three corner points: u v w point 0 0 1
0 1 0
1 0 0
(0,0,0) (1,3,1) (2,0,0)
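The corner verification can also be done mechanically. A Python sketch of Equation (13.53) for the n = 2 example (illustrative only; `tri_patch` is an invented name):

```python
from math import factorial

def tri_patch(pts, n, u, v, w):
    # Equation (13.53): sum over i+j+k = n of P_ijk (n!/(i!j!k!)) u^i v^j w^k
    assert abs(u + v + w - 1) < 1e-12
    out = [0.0, 0.0, 0.0]
    for i in range(n + 1):
        for j in range(n + 1 - i):
            k = n - i - j
            coef = factorial(n) / (factorial(i) * factorial(j) * factorial(k))
            f = coef * u**i * v**j * w**k
            for c in range(3):
                out[c] += f * pts[(i, j, k)][c]
    return tuple(out)

# the six control points of the n = 2 example, keyed by (i, j, k)
pts = {(0, 0, 2): (0, 0, 0), (1, 0, 1): (1, 0, -1), (2, 0, 0): (2, 0, 0),
       (0, 1, 1): (0.5, 1, 0), (1, 1, 0): (1.5, 1, 0), (0, 2, 0): (1, 3, 1)}

assert tri_patch(pts, 2, 0, 0, 1) == (0.0, 0.0, 0.0)
assert tri_patch(pts, 2, 0, 1, 0) == (1.0, 3.0, 1.0)
assert tri_patch(pts, 2, 1, 0, 0) == (2.0, 0.0, 0.0)
```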
But the most important feature of this triangular surface patch is the way it is displayed as a wireframe. The principle is to display this surface as a mesh of three families of curves (compare this with the two families in the case of a rectangular surface patch). Each family consists of curves that are roughly parallel to one side of the triangle (Figure 13.44a,b). Exercise 13.32: Write pseudo-code to draw the three families of curves. A triangle of points can be stored in a one-dimensional array in computer memory. A simple way of doing this is to store the top point P0n0 at the beginning of the array, followed by a short segment consisting of the two points P0,n−1,1 and P1,n−1,0 of the
next row down, followed by a longer segment with three points, and so on, ending with a segment with the n + 1 points P00n , P1,0,n−1 , through Pn00 of the bottom row of the triangle. A direct check verifies that the points Pijk of triangle row j, where 0 ≤ j ≤ n, start at location j(j + 1)/2 + 1 of the array, so they can be indexed by j(j + 1)/2 + 1 + i. Figure 13.45 lists Mathematica code to compute one point on such a surface patch. Note that j is incremented from 0 to n (from the bottom to the top of the triangle), so the first iteration needs the points in the last segment of the array and the last iteration needs the single point at the start of the array. This is why the index to array pnts depends on j as (n − j)(n − j + 1)/2 + 1 instead of as j(j + 1)/2 + 1. (* Triangular Bezier surface patch *) pnts={{3,3,0}, {2,2,0},{4,2,1}, {1,1,0},{3,1,1},{5,1,2}, {0,0,0},{2,0,1},{4,0,2},{6,0,3}}; B[i_,j_,k_]:=(n!/(i! j! k!))u^i v^j w^k; n=3; u=1/6; v=2/6; w=3/6; Tsrpt={0,0,0}; indx:=(n-j)(n-j+1)/2+1+i; Do[{k=n-i-j, Tsrpt=Tsrpt+B[i,j,k] pnts[[indx]]}, {j,0,n}, {i,0,n-j}]; Tsrpt Figure 13.45: Code for One Point in a Triangular B´ezier Patch.
Figure 13.46 shows a triangular B´ezier surface patch for n = 3. Note how the wireframe consists of three sets of curves and how the curves remain roughly parallel and do not converge toward the three corners. (This should be compared with the triangular Coons patch of Figure 10.14 and with the lofted sweep surface of Figure 16.3. Each of these surfaces is displayed as two families of curves and has one dark corner as a result.) The control points and control polygon are also shown. The Mathematica code for this type of surface is due to Garry Helzer and it works by recursively subdividing the triangular patch into subtriangles. Figure 13.47 shows two triangular B´ezier patches for n = 2 and n = 4.
13.25.1 Scaffolding Construction The scaffolding construction (or de Casteljau algorithm) of Section 13.6 can be directly extended to triangular B´ezier patches. The bivariate Bernstein polynomials that are the
(* Triangular Bezier patch by Garry Helzer *)
rules=Solve[{u{a1,b1}+v{a2,b2}+w{a3,b3}=={x,y},u+v+w==1},{u,v,w}]
BarycentricCoordinates[Polygon[{{a1_,b1_},{a2_,b2_},{a3_,b3_}}]] \
[{x_,y_}]={u,v,w}/.rules//Flatten
Subdivide[l_]:=l/. Polygon[{p_,q_,r_}] :> Polygon /@ \
({{p+p,p+q,p+r},{p+q,q+q,q+r},{p+r,q+r,r+r},{p+q,q+r,r+p}}/2)
Transform[F_][L_]:= L /. Polygon[l_] :> Polygon[F /@ l]
P[L_][{u_,v_,w_}]:= Module[{x,y,z,n=(Sqrt[8Length[L]+1]-3)/2},
((List @@ Expand[(x+y+z)^n]) /. {x->u,y->v,z->w}).L]
Param[T_,L_][{x_,y_}]:=With[{p=BarycentricCoordinates[T][{x, y}]},P[L][p]]
Run the code below in a separate cell (* Triangular bezier patch for n=3 *) T=Polygon[{{1, 0}, {0, 1}, {0, 0}}]; L={P300,P210,P120,P030, P201,P111,P021, P102,P012, P003} \ ={{3,0,0},{2.5,1,.5},{2,2,0},{1.5,3,0}, {2,0,1},{1.5,1,2},{1,2,.5}, {1,0,1},{.5,1,.5}, {0,0,0}}; SubT=Nest[Subdivide, T, 3]; Patch=Transform[Param[T, L]][SubT]; cpts={PointSize[0.02], Point/@L}; coord={AbsoluteThickness[1], Line/@{{{0,0,0},{3.2,0,0}},{{0,0,0},{0,3.4,0}},{{0,0,0},{0,0,1.3}}}}; cpolygon={AbsoluteThickness[2], Line[{P300,P210,P120,P030,P021,P012,P003,P102,P201,P300}], Line[{P012,P102,P111,P120,P021,P111,P201,P210,P111,P012}]}; Show[Graphics3D[{cpolygon,cpts,coord,Patch}], Boxed->False, PlotRange->All, ViewPoint->{2.620, -3.176, 2.236}];
Figure 13.46: A Triangular Bézier Surface Patch for n = 3.
When an object is digitized mechanically, the result is a large set of points. Such a set can be converted to a set of triangles by the Delaunay triangulation algorithm [Delaunay 34]. This method produces a collection of edges that satisfy the following property: For each edge we can find a circle containing the edge’s endpoints but not containing any other points.
13 Bézier Approximation
Figure 13.47: Two Triangular Bézier Surfaces for n = 2 and n = 4.
basis of this type of surface are given by Equation (13.3), rewritten here

B^n_{i,j,k}(u, v, w) = ((i + j + k)!/(i! j! k!)) u^i v^j w^k = (n!/(i! j! k!)) u^i v^j w^k,  where i + j + k = n and i, j, k ≥ 0.   (13.3)
Direct checking verifies that these polynomials satisfy the recursion relation

B^n_{i,j,k}(u, v, w) = uB^{n−1}_{i−1,j,k}(u, v, w) + vB^{n−1}_{i,j−1,k}(u, v, w) + wB^{n−1}_{i,j,k−1}(u, v, w),   (13.56)
and this relation is the basis of the de Casteljau algorithm for the triangular Bézier patch. The algorithm starts with the original control points Pijk, which are labeled P^0_{ijk}. The user selects a triplet (u, v, w), where u + v + w = 1, and performs the following step n times to compute intermediate points P^r_{i,j,k} for r = 1, ..., n and i + j + k = n − r:

P^r_{i,j,k} = uP^{r−1}_{i+1,j,k} + vP^{r−1}_{i,j+1,k} + wP^{r−1}_{i,j,k+1}.
The last step produces the single point P^n_{000} that's also the point produced by the selected triplet (u, v, w) on the triangular Bézier patch. The algorithm is illustrated here for n = 3. Figure 13.43 shows the 10 control points. Assuming that the user has selected appropriate values for the parameter triplet (u, v, w), the first step of the algorithm produces the six intermediate points for n = 2 (Figure 13.48)

P^1_{200} = uP^0_{300} + vP^0_{210} + wP^0_{201},   P^1_{110} = uP^0_{210} + vP^0_{120} + wP^0_{111},
P^1_{101} = uP^0_{201} + vP^0_{111} + wP^0_{102},   P^1_{020} = uP^0_{120} + vP^0_{030} + wP^0_{021},
P^1_{011} = uP^0_{111} + vP^0_{021} + wP^0_{012},   P^1_{002} = uP^0_{102} + vP^0_{012} + wP^0_{003}.
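The scaffolding step is easy to program. The following Python sketch is our own (the dictionary-based control net is our choice, not the book's); it runs the n steps on the control points of Exercise 13.34 and reproduces the surface point computed by the code of Figure 13.45.

```python
# de Casteljau scaffolding for a triangular Bezier patch of degree n = 3.
# The control net is keyed by the barycentric index triplet (i, j, k);
# the points are those of Exercise 13.34.
net = {(0,3,0): (3,3,0),
       (0,2,1): (2,2,0), (1,2,0): (4,2,1),
       (0,1,2): (1,1,0), (1,1,1): (3,1,1), (2,1,0): (5,1,2),
       (0,0,3): (0,0,0), (1,0,2): (2,0,1), (2,0,1): (4,0,2), (3,0,0): (6,0,3)}

def scaffold(net, n, u, v, w):
    """Apply P^r_ijk = u P^(r-1)_{i+1,j,k} + v P^(r-1)_{i,j+1,k} + w P^(r-1)_{i,j,k+1}."""
    pts = {key: tuple(float(c) for c in p) for key, p in net.items()}
    for r in range(1, n + 1):
        level = {}
        m = n - r                         # indices of level r sum to n - r
        for i in range(m + 1):
            for j in range(m + 1 - i):
                k = m - i - j
                pu, pv, pw = pts[(i+1,j,k)], pts[(i,j+1,k)], pts[(i,j,k+1)]
                level[(i,j,k)] = tuple(u*a + v*b + w*c
                                       for a, b, c in zip(pu, pv, pw))
        pts = level
    return pts[(0, 0, 0)]                 # the surface point P^n_000

print(scaffold(net, 3, 1/6, 2/6, 3/6))    # close to (2, 1, 0.5)
```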
The second step produces the three intermediate points for n = 1

P^2_{100} = uP^1_{200} + vP^1_{110} + wP^1_{101},
P^2_{010} = uP^1_{110} + vP^1_{020} + wP^1_{011},
P^2_{001} = uP^1_{101} + vP^1_{011} + wP^1_{002}.

And the third step produces the single point

P^3_{000} = uP^2_{100} + vP^2_{010} + wP^2_{001}.

This is the point that corresponds to the particular triplet (u, v, w) on the triangular patch defined by the 10 original control points.
Figure 13.48: Scaffolding in a Triangular Bézier Patch.
Exercise 13.33: Illustrate this algorithm for n = 4. Start with the 15 original control points and list the four steps of the scaffolding. The final result should be the single point P^4_{000}. Assume that the user has selected appropriate values for the parameter triplet (u, v, w).

Exercise 13.34: Assuming the values u = 1/6, v = 2/6, and w = 3/6, and the 10 control points

(3, 3, 0)
(2, 2, 0) (4, 2, 1)
(1, 1, 0) (3, 1, 1) (5, 1, 2)
(0, 0, 0) (2, 0, 1) (4, 0, 2) (6, 0, 3)

apply the de Casteljau algorithm to compute point P^3_{000} and then use Equation (13.53) to compute surface point P(1/6, 2/6, 3/6) and show that the two points are identical.

It can be shown that a general intermediate point P^r_{i,j,k}(u, v, w) obtained in the scaffolding process can be computed directly from the control points without having to go through the intermediate steps of the scaffolding construction, as follows

P^r_{ijk}(u, v, w) = Σ_{a+b+c=r} B^r_{abc}(u, v, w) P_{i+a,j+b,k+c}.
Example: For n = 3 and r = 1, point P^1_{002} is computed directly from the control points as the sum

P^1_{002} = Σ_{a+b+c=1} B^1_{abc}(u, v, w) P_{0+a,0+b,2+c} = uP102 + vP012 + wP003.

For n = 3 and r = 2, point P^2_{001} is computed directly as the sum

P^2_{001} = Σ_{a+b+c=2} B^2_{abc}(u, v, w) P_{0+a,0+b,1+c}
          = u^2 P201 + v^2 P021 + w^2 P003 + 2uvP111 + 2uwP102 + 2vwP012.

Exercise 13.35: For n = 4, compute intermediate points P^3_{001} and P^1_{111} directly from the control points.
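The claim that an intermediate point can be computed directly is easy to check numerically. The Python sketch below is ours (helper names and data layout are our own); it computes P^2_{001} both ways, from the direct formula and from two explicit scaffolding steps, using the control points of Exercise 13.34.

```python
from math import factorial

# Control net of Exercise 13.34, keyed by the index triplet (i, j, k).
net = {(0,3,0): (3,3,0),
       (0,2,1): (2,2,0), (1,2,0): (4,2,1),
       (0,1,2): (1,1,0), (1,1,1): (3,1,1), (2,1,0): (5,1,2),
       (0,0,3): (0,0,0), (1,0,2): (2,0,1), (2,0,1): (4,0,2), (3,0,0): (6,0,3)}
u, v, w = 1/6, 2/6, 3/6

def B(a, b, c):
    """Bivariate Bernstein polynomial B^r_{abc}(u,v,w) with r = a+b+c."""
    m = a + b + c
    return factorial(m) // (factorial(a)*factorial(b)*factorial(c)) \
           * u**a * v**b * w**c

def direct(r, i, j, k):
    """P^r_{ijk} = sum over a+b+c=r of B^r_{abc}(u,v,w) P_{i+a,j+b,k+c}."""
    tot = [0.0, 0.0, 0.0]
    for a in range(r + 1):
        for b in range(r + 1 - a):
            c = r - a - b
            p = net[(i + a, j + b, k + c)]
            for t in range(3):
                tot[t] += B(a, b, c) * p[t]
    return tuple(tot)

def step(p1, p2, p3):
    """One scaffolding combination u p1 + v p2 + w p3."""
    return tuple(u*a + v*b + w*c for a, b, c in zip(p1, p2, p3))

# P^2_001 via two explicit scaffolding steps ...
P1_101 = step(net[(2,0,1)], net[(1,1,1)], net[(1,0,2)])
P1_011 = step(net[(1,1,1)], net[(0,2,1)], net[(0,1,2)])
P1_002 = step(net[(1,0,2)], net[(0,1,2)], net[(0,0,3)])
P2_001 = step(P1_101, P1_011, P1_002)

# ... agrees with the direct formula:
print(direct(2, 0, 0, 1), P2_001)
```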
13.25.2 Subdivision

A triangular Bézier patch can be subdivided into three triangular Bézier patches by a process similar to the one described in Section 13.8 for the Bézier curve. New control points for the three new patches are computed in two steps. First, all the intermediate points generated in the scaffolding steps are computed, then the original interior control points are deleted. We illustrate this process first for n = 3 and n = 4, then for the general case. A triangular Bézier patch for n = 3 is defined by 10 control points, of which nine are exterior. The user first selects the point inside the surface patch where the three new triangles will meet. This is done by selecting a barycentric triplet (u, v, w). The user then executes three steps of the scaffolding process to generate 6 + 3 + 1 = 10 new intermediate points. The new points are added to the nine exterior control points and the single interior point P111 is deleted. The resulting 19 points are divided into three overlapping sets of 10 points each (Figure 13.49) that define three adjacent triangular Bézier patches inside the original patch.
Figure 13.49: Subdividing the Triangular Bézier Patch for n = 3.
A triangular Bézier patch for n = 4 is defined by 15 control points, of which 12 are exterior. The user selects a barycentric triplet (u, v, w) and executes four steps of
the scaffolding process to generate 9 + 6 + 3 + 1 = 19 new intermediate points. The new points are added to the 12 exterior control points and the three interior points are deleted. The resulting 31 points are divided into three overlapping sets of 15 points each that define three adjacent triangular Bézier patches inside the original patch.

Exercise 13.36: Draw a diagram for this case, similar to Figure 13.49.

In general, a triangular Bézier patch is defined by (n + 1)(n + 2)/2 control points, of which 1 + 2(n − 1) + (n + 1) = 3n points are exterior (one point in the top row, two in each of the n − 1 middle rows, and n + 1 in the bottom row). The scaffolding construction is then performed, creating 3(n − 1) points in step 1, 3(n − 2) points in step 2, and so on, down to 3[n − (n − 1)] = 3 points in step n − 1 and one point in step n, for a total of (3n/2)(n − 1) + 1 points. For n = 3 through 7, these numbers are 10, 19, 31, 46, and 64. (Note that there are no interior points for n = 1 and n = 2.) These new points, added to the original exterior points, provide (3n/2)(n − 1) + 1 + 3n = (3n/2)(n + 1) + 1 points. For n = 3 through 7, these numbers are 19, 31, 46, 64, and 85. These numbers are enough to construct three adjacent triangular Bézier patches defined by (n + 1)(n + 2)/2 control points each. The user always starts a subdivision by selecting a surface point P(u, v, w) where the three new triangular patches will meet. A special case occurs if this point is located on an edge of the original triangular patch (i.e., if one of u, v, or w is zero). In such a case, the original triangle is subdivided into two, instead of three triangular patches. This may be useful in cases where only a few extra points are required to reshape the surface.
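The point counts above are easy to tabulate. A small Python check (ours, not the book's):

```python
# Point counts for subdividing a triangular Bezier patch of degree n.
def counts(n):
    total = (n + 1)*(n + 2)//2                     # all control points
    exterior = 3*n                                 # 1 + 2(n-1) + (n+1)
    new = sum(3*(n - r) for r in range(1, n)) + 1  # steps 1..n-1, plus 1 in step n
    return total, exterior, new, new + exterior

for n in range(3, 8):
    print(n, counts(n))
```

For n = 3 through 7 the new-point counts come out 10, 19, 31, 46, 64 and the combined counts 19, 31, 46, 64, 85, matching the closed forms (3n/2)(n − 1) + 1 and (3n/2)(n + 1) + 1.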
13.25.3 Degree Elevation

Section 13.9 describes how to elevate the degree of a Bézier curve. This section adapts the same ideas to elevate the degree of a triangular Bézier patch. Given a triangular patch of order n defined by (n + 1)(n + 2)/2 control points Pijk, it is easy to compute a new set of control points Qijk that represent the same surface as a triangular patch of order n + 1. The basic relation is

Σ_{i+j+k=n, i,j,k≥0} Pijk B^n_{i,j,k}(u, v, w) = Σ_{i+j+k=n+1} Qijk B^{n+1}_{i,j,k}(u, v, w).

It can be shown, employing methods similar to those of Section 13.9, that the new points Qijk are obtained from the original control points Pijk by

Qijk = (1/(n + 1)) [iP_{i−1,j,k} + jP_{i,j−1,k} + kP_{i,j,k−1}].
Example: We elevate the degree of a triangular Bézier patch from n = 2 to n = 3. The 10 new control points are obtained from the six original points by

Q003 = P002,                  Q012 = (1/3)(P002 + 2P011),
Q102 = (1/3)(P002 + 2P101),   Q111 = (1/3)(P011 + P101 + P110),
Q201 = (1/3)(2P101 + P200),   Q021 = (1/3)(2P011 + P020),
Q300 = P200,                  Q210 = (1/3)(2P110 + P200),
Q120 = (1/3)(P020 + 2P110),   Q030 = P020.
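Degree elevation can be verified numerically: the elevated net must describe the same surface. The following Python sketch is ours; the sample quadratic net is made-up data, not from the book.

```python
from math import factorial

def B(i, j, k, u, v, w):
    """Bivariate Bernstein polynomial B^n_{ijk} with n = i+j+k."""
    m = i + j + k
    return factorial(m) // (factorial(i)*factorial(j)*factorial(k)) \
           * u**i * v**j * w**k

def evaluate(net, u, v, w):
    """Evaluate the patch as the sum of B^n_{ijk} P_{ijk} over the net."""
    tot = [0.0, 0.0, 0.0]
    for (i, j, k), p in net.items():
        wt = B(i, j, k, u, v, w)
        for t in range(3):
            tot[t] += wt * p[t]
    return tuple(tot)

def elevate(net, n):
    """Q_{ijk} = (1/(n+1)) [i P_{i-1,j,k} + j P_{i,j-1,k} + k P_{i,j,k-1}]."""
    new = {}
    for i in range(n + 2):
        for j in range(n + 2 - i):
            k = n + 1 - i - j
            acc = [0.0, 0.0, 0.0]
            for coef, key in ((i, (i-1, j, k)), (j, (i, j-1, k)), (k, (i, j, k-1))):
                if coef:
                    p = net[key]
                    for t in range(3):
                        acc[t] += coef * p[t]
            new[(i, j, k)] = tuple(a / (n + 1) for a in acc)
    return new

# A made-up quadratic (n = 2) triangular net, elevated to n = 3.
net2 = {(0,0,2): (0,0,0), (1,0,1): (1,0,1), (2,0,0): (2,0,0),
        (0,1,1): (0,1,1), (1,1,0): (1,1,2), (0,2,0): (0,2,0)}
net3 = elevate(net2, 2)
print(evaluate(net2, 0.2, 0.3, 0.5))
print(evaluate(net3, 0.2, 0.3, 0.5))    # same surface point
```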
It is possible to elevate the degree of a patch repeatedly. Each degree elevation increases the number of control points and moves them closer to the actual surface. At the limit, the number of control points approaches infinity and the net of points approaches the surface patch.
13.26 Joining Triangular Bézier Patches

The triangular Bézier surface patch is used in cases where a large surface happens to be easier to break up into triangular patches than into rectangular patches. It is therefore important to discover the conditions for smooth joining of these surface patches. The conditions should be expressed in terms of constraints on the control points. These constraints are developed here for cubic surface patches, but the principles are the same for higher-degree patches. The idea is to calculate three vectors that are tangent to the surface at the common boundary curve. Intuitively, the condition for a smooth join is that these vectors be coplanar (although they can have different magnitudes). We proceed in three steps:

Step 1. Figure 13.50 shows two triangular Bézier cubic patches, P(u, v, w) and Q(u, v, w), joined at the common boundary curve P(0, v, w) = Q(0, v, w). We can see from Equation (13.55) how the boundary curves can be expressed as Bézier curves. Based on this equation, our common boundary curve can be written

P(v) = Σ_{j+k=3} (3!/(j! k!)) v^j (1 − v)^{3−j} P_{0jk}.
This is easy to differentiate with respect to v and the result is

dP(v)/dv = 3v^2(P030 − P021) + 6v(1 − v)(P021 − P012) + 3(1 − v)^2(P012 − P003)
         = 3v^2 B3 + 6v(1 − v)B2 + 3(1 − v)^2 B1,   (13.57)

where each of the Bi vectors is defined as the difference of two control points. They can be seen in the figure as thick arrows going from P003 to P030.

Step 2. Another vector is computed that's tangent to the patch P(u, v, w) along the common boundary. This is done by calculating the tangent vector to the surface in the u direction and substituting u = 0. We first write the expression for the surface patch without the parameter w (it can be eliminated because w = 1 − u − v):

P(u, v) = Σ_{i+j+k=3} (3!/(i! j! k!)) u^i v^j (1 − u − v)^k Pijk.
This is easy to differentiate with respect to u and it yields

∂P(u, v)/∂u |_{u=0} = 3v^2(P120 − P021) + 6v(1 − v)(P111 − P012) + 3(1 − v)^2(P102 − P003)
                    = 3v^2 A3 + 6v(1 − v)A2 + 3(1 − v)^2 A1,   (13.58)
Figure 13.50: Joining Triangular Bézier Patches Smoothly.
where each of the Ai vectors is again defined as the difference of two control points. They can be seen in the figure as thick arrows going, for example, from P003 to P102.

Step 3. The third vector is the tangent to the other surface patch Q(u, v, w) along the common boundary. It is expressed as

∂Q(u, v)/∂u |_{u=0} = 3v^2(Q120 − Q021) + 6v(1 − v)(Q111 − Q012) + 3(1 − v)^2(Q102 − Q003)
                    = 3v^2 C3 + 6v(1 − v)C2 + 3(1 − v)^2 C1,   (13.59)

where each of the Ci vectors is again defined as the difference of two control points. They can be seen in the figure as thick arrows going, for example, from Q003 to Q102. The condition for smooth joining is that the vectors defined by Equations (13.57) through (13.59) be coplanar for any value of v. This can be expressed as

3v^2 B3 + 6v(1 − v)B2 + 3(1 − v)^2 B1 = α(3v^2 A3 + 6v(1 − v)A2 + 3(1 − v)^2 A1) + β(3v^2 C3 + 6v(1 − v)C2 + 3(1 − v)^2 C1),   (13.60)

or, equivalently,

v^2(B3 − αA3 − βC3) + 2v(1 − v)(B2 − αA2 − βC2) + (1 − v)^2(B1 − αA1 − βC1) = 0.
Since this should hold for any value of v, it can be written as the set of three equations:

B1 = αA1 + βC1,   B2 = αA2 + βC2,   B3 = αA3 + βC3.   (13.61)
Each of the three sets of vectors Bi, Ai, and Ci (i = 1, 2, 3) should therefore be coplanar. This condition can be expressed for the control points by saying that each of the three quadrilaterals given by

(P003 = Q003,  P102,  P012 = Q012,  Q102),
(P012 = Q012,  P111,  P021 = Q021,  Q111),
(P021 = Q021,  P120,  P030 = Q030,  Q120)

should be planar. In the special case α = β = 1, each quadrilateral should be a parallelogram. Otherwise, each should have the same ratio of height to width. The condition for such a set of three vectors to be coplanar is simple to derive. Figure 13.51 shows a quadrilateral with four corner points A, B, C, and D. Two dashed segments are shown, connecting A to B and C to D. The condition for a flat quadrilateral (four coplanar corners) is that the two segments intersect. The first segment can be expressed parametrically as (1 − u)A + uB and the second segment can be similarly expressed as (1 − w)C + wD. If there exist u and w in the interval [0, 1] such that (1 − u)A + uB = (1 − w)C + wD, then the quadrilateral is flat.

Figure 13.51: A Quadrilateral.
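Flatness is easy to test in code. Instead of solving for u and w, an equivalent coplanarity test checks that the scalar triple product of the three edge vectors from one corner vanishes; the Python sketch below (ours, not the book's) uses that form.

```python
def sub(p, q):
    return (p[0]-q[0], p[1]-q[1], p[2]-q[2])

def cross(a, b):
    return (a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def is_flat(A, B, C, D, eps=1e-9):
    """Four corners are coplanar iff (B-A) . ((C-A) x (D-A)) = 0."""
    return abs(dot(sub(B, A), cross(sub(C, A), sub(D, A)))) < eps

print(is_flat((0,0,0), (1,1,0), (1,0,0), (0,1,0)))   # planar quad: True
print(is_flat((0,0,0), (1,1,1), (1,0,0), (0,1,0)))   # corner lifted: False
```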
13.26.1 Joining Rectangular and Triangular Bézier Patches

A smooth joining of a rectangular and a triangular surface patch, both of order n, may be useful in many practical applications. Figure 13.52a shows the numbering of the control points for the case n = 4. Points Qijk define the triangular patch and points Pij define the rectangular patch. There are four pairs (in general, n pairs) of identical points. The problem of joining surface patches of such different topologies can be greatly simplified by elevating the degree (Section 13.25.3) of the two rightmost columns of control points of the triangular patch. The column of four points Q0jk where j + k = 3 is transformed to five points R0jk where j + k = 4, and the column of three points Q1jk where 1 + j + k = 3 is transformed to four points R1jk where 1 + j + k = 4. Figure 13.52b shows the new points and how, together with the column of four points
Figure 13.52: Smooth Joining of Triangular and Rectangular Bézier Surface Patches.
P10 through P13, they create four quadrilaterals. The condition for smooth joining of the patches is that each quadrilateral be flat. In general, there are n such quadrilaterals, and each condition can be written explicitly, as an equation, in terms of some of the points P1i, Q0jk, and Q1jk. A general equation is

(1 − α)R_{0,i,n−i} + αR_{0,i+1,n−i−1} = (1 − β)R_{1,i,n−i−1} + βP_{1,i},   for i = 0, 1, ..., n − 1.

When the Rijk points are expressed in terms of the original Qijk points, this relation becomes

((1 − α)/n)[iQ_{0,i−1,n−i} + (n − i)Q_{0,i,n−i−1}] + (α/n)[(i + 1)Q_{0,i,n−i−1} + (n − i − 1)Q_{0,i+1,n−i−2}]
= ((1 − β)/n)[Q_{0,i,n−i−1} + iQ_{1,i−1,n−i−1} + (n − i − 1)Q_{1,i,n−i−2}] + βP_{1,i}.

Note that the quantities α and β in these equations should be indexed by i. In general, each quadrilateral has its own αi and βi, but the surface designer can start by guessing values for these 2n quantities, then use them as parameters and vary them (while still keeping each quadrilateral flat), until the surface is molded to the desired shape. If the rectangular patch is given and the triangular patch has to be designed and manipulated to connect smoothly to it, then the n − 1 points Q1jk (the column to the left of the common boundary) are the unknowns. Conversely, if we start from the triangular patch and want to select control points for the rectangular patch, then the unknowns are the n control points P1i (the column to the right of the common boundary). Reference [Liu and Hoschek 89] has a detailed analysis of the conditions for smooth connection of various types of Bézier surface patches.
13.27 Reparametrizing the Bézier Surface

We illustrate the method described here by applying it to the bicubic Bézier surface patch. The expression for this patch is given by Equations (13.48) and (13.47):

P(u, w) = Σ_{i=0}^{3} Σ_{j=0}^{3} B_{3,i}(u) P_{i,j} B_{3,j}(w) = (u^3, u^2, u, 1) M P M^T (w^3, w^2, w, 1)^T,

where M is the basis matrix

M = ( −1   3  −3  1
       3  −6   3  0
      −3   3   0  0
       1   0   0  0 )

and P is the 4×4 matrix of control points

P = ( P3,0  P3,1  P3,2  P3,3
      P2,0  P2,1  P2,2  P2,3
      P1,0  P1,1  P1,2  P1,3
      P0,0  P0,1  P0,2  P0,3 ).
This surface patch can be reparametrized with the method of Section 13.10. We select part of patch P(u, w), e.g., the part where u varies from a to b, and define it as a new patch Q(u, w) where both u and w vary in the range [0, 1]. The method discussed here shows how to obtain the control points Qij of patch Q(u, w) as functions of a, b, and the points Pij.

B-splines are the de facto standard that drives today's sophisticated computer graphics applications. This method is also responsible for the developments that have transformed computer-aided geometric design from the era of hand-built models and manual measurements to fast computations and three-dimensional renderings.

Suppose that we want to reparametrize the "left" part of P(u, w), i.e., the part where 0 ≤ u ≤ 0.5. Applying the methods of Section 13.10, we select a = 0, b = 0.5 and can write

P(u/2, w) = (u^3, u^2, u, 1) M B P M^T (w^3, w^2, w, 1)^T,

where B is given by Equation (13.22)

B = ( (1 − a)^3          3(a − 1)^2 a          3(1 − a)a^2       a^3
      (a − 1)^2 (1 − b)  (a − 1)(−2a − b + 3ab)  a(a + 2b − 3ab)  a^2 b
      (1 − a)(b − 1)^2   (b − 1)(−a − 2b + 3ab)  b(2a + b − 3ab)  ab^2
      (1 − b)^3          3(b − 1)^2 b          3(1 − b)b^2       b^3 ).
Exercise 13.18 shows that selecting a = 0 and b = 0.5 reduces matrix B to

B = (  1    0    0    0
      1/2  1/2   0    0
      1/4  1/2  1/4   0
      1/8  3/8  3/8  1/8 ).

The new control points for our surface patch are therefore given by (Qij) = B (Pij), i.e., for each column j = 0, 1, 2, 3,

Q3,j = P3,j,
Q2,j = (1/2)P3,j + (1/2)P2,j,
Q1,j = (1/4)P3,j + (1/2)P2,j + (1/4)P1,j,
Q0,j = (1/8)P3,j + (3/8)P2,j + (3/8)P1,j + (1/8)P0,j.

In general, suppose we want to reparametrize that portion of patch P(u, w) where a ≤ u ≤ b and c ≤ w ≤ d. We can write Q(u, w) = P([b − a]u + a, [d − c]w + c)
= (([b − a]u + a)^3, ([b − a]u + a)^2, [b − a]u + a, 1) M · P · M^T (([d − c]w + c)^3, ([d − c]w + c)^2, [d − c]w + c, 1)^T
= (u^3, u^2, u, 1) A_{ab} M · P · M^T · A_{cd}^T (w^3, w^2, w, 1)^T
= (u^3, u^2, u, 1) M (M^{−1} · A_{ab} · M) P (M^T · A_{cd}^T · (M^T)^{−1}) M^T (w^3, w^2, w, 1)^T
= (u^3, u^2, u, 1) M · B_{ab} · P · B_{cd}^T · M^T (w^3, w^2, w, 1)^T
= (u^3, u^2, u, 1) M · Q · M^T (w^3, w^2, w, 1)^T,   (13.62)

where B_{ab} = M^{−1} · A_{ab} · M, B_{cd}^T = M^T · A_{cd}^T · (M^T)^{−1}, Q = B_{ab} · P · B_{cd}^T, and

A_{ab} = ( (b − a)^3      0         0      0
           3a(b − a)^2   (b − a)^2   0      0
           3a^2(b − a)   2a(b − a)  b − a   0
           a^3            a^2        a      1 ).

The elements of Q depend on a, b, c, d, and the Pij's, and are quite complex. They can be produced by the following Mathematica code:
B={{(1 - a)^3, 3*(-1 + a)^2*a, 3*(1 - a)*a^2, a^3},
   {(-1 + a)^2*(1 - b), (-1 + a)*(-2*a - b + 3*a*b), a*(a + 2*b - 3*a*b), a^2*b},
   {(1 - a)*(-1 + b)^2, (-1 + b)*(-a - 2*b + 3*a*b), b*(2*a + b - 3*a*b), a*b^2},
   {(1 - b)^3, 3*(-1 + b)^2*b, 3*(1 - b)*b^2, b^3}};
TB={{(1 - c)^3, (-1 + c)^2*(1 - d), (1 - c)*(-1 + d)^2, (1 - d)^3},
    {3*(-1 + c)^2*c, (-1 + c)*(-2*c - d + 3*c*d), (-1 + d)*(-c - 2*d + 3*c*d), 3*(-1 + d)^2*d},
    {3*(1 - c)*c^2, c*(c + 2*d - 3*c*d), d*(2*c + d - 3*c*d), 3*(1 - d)*d^2},
    {c^3, c^2*d, c*d^2, d^3}};
P={{P30,P31,P32,P33},{P20,P21,P22,P23},
   {P10,P11,P12,P13},{P00,P01,P02,P03}};
Q=Simplify[B.P.TB]
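As a quick numerical sanity check of the a = 0, b = 0.5 case, the following Python sketch (ours, not the book's) works with the one-dimensional curve analog, scalar control values in a column ordered (P3, P2, P1, P0) as in the book's matrices, and confirms that the new curve Q(t) traces P(t/2).

```python
# Reparametrize the "left half" of a cubic Bezier curve with matrix B
# for a = 0, b = 1/2. Control column ordered (P3, P2, P1, P0).
M = [[-1,  3, -3, 1],
     [ 3, -6,  3, 0],
     [-3,  3,  0, 0],
     [ 1,  0,  0, 0]]
B = [[1,   0,   0,   0  ],
     [1/2, 1/2, 0,   0  ],
     [1/4, 1/2, 1/4, 0  ],
     [1/8, 3/8, 3/8, 1/8]]

def bezier(col, t):
    """Curve value (t^3, t^2, t, 1) M col for scalar control values."""
    T = (t**3, t**2, t, 1.0)
    coeff = [sum(T[r]*M[r][c] for r in range(4)) for c in range(4)]
    return sum(cf*p for cf, p in zip(coeff, col))

pts = [1.0, 4.0, 2.0, 5.0]                     # sample values P3..P0
qts = [sum(B[r][c]*pts[c] for c in range(4))   # new control values B * pts
       for r in range(4)]

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert abs(bezier(qts, t) - bezier(pts, t/2)) < 1e-12
print("Q(t) = P(t/2) for all tested t")
```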
13.28 The Gregory Patch

John A. Gregory developed this method to extend the Coons surface patch. The Gregory method, however, becomes very practical when it is applied to extend the bicubic Bézier patch. Recall that such a patch is based on 4×4 = 16 control points (Figure 13.53a). We can divide the 16 points into two groups: the interior points, consisting of the four points P11, P12, P21, and P22, and the boundary points, consisting of the remaining 12 points. Experience shows that there are too few interior points to fine-tune the shape of the patch. Moving point P11, for example, affects both the direction from P01 to P11 and the direction from P10 to P11.
Figure 13.53: (a) A Bicubic Bézier Patch. (b) A Gregory Patch.
The idea in the Gregory patch is to split each of the four interior points into two points. Hence, instead of point P11, for example, there should be two points P110 and P111, both in the vicinity of the original P11. Moving P110 affects the shape of the patch only in the direction from P10 to P110. The shape of the patch around point P01 is not affected (at least, not significantly). Thus, the bicubic Gregory patch is defined by 20 points (Figure 13.53b), eight interior points and 12 boundary points. Points P110 and P111 can initially be set equal to P11, then moved interactively in different directions to obtain the right shape of the surface. To calculate the surface, we first define 16 new points Qij, then use Equation (13.47) with the new points as control points and with n = m = 3. Twelve of the Q points are boundary points and are identical to the boundary P points. The remaining four Q points are interior and each is calculated from a pair of interior P points. Their definitions are the following

Q11(u, w) = (uP110 + wP111)/(u + w),           Q21(u, w) = ((1 − u)P210 + wP211)/(1 − u + w),
Q12(u, w) = (uP120 + (1 − w)P121)/(u + 1 − w),  Q22(u, w) = ((1 − u)P220 + (1 − w)P221)/(1 − u + 1 − w).

Note that Q11(u, w) is a barycentric sum of two P points, so it is well defined. Even though u and w are independent and each is varied from 0 to 1 independently of the other, the sum is always a point on the straight segment connecting P110 to P111. The same is true for the other three interior Q points. After calculating the new points, the Gregory patch is defined as the bicubic Bézier patch

P(u, w) = Σ_{i=0}^{3} Σ_{j=0}^{3} B_{3,i}(u) Q_{i,j} B_{3,j}(w).
(Note that four of the 16 points Qi,j depend on the parameters u and w.)
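When both split points of each pair coincide (P110 = P111 = P11, and so on), the Gregory patch must reduce to the ordinary bicubic Bézier patch. The Python sketch below is ours (the dictionary keys and function names are our choice); it evaluates the patch exactly as described above.

```python
from math import comb

def bern(n, i, t):
    """Univariate Bernstein polynomial B_{n,i}(t)."""
    return comb(n, i) * t**i * (1 - t)**(n - i)

def gregory_point(ctrl, inner, u, w):
    """Evaluate a bicubic Gregory patch at (u, w) with 0 < u, w < 1.
    ctrl maps (i, j) to grid points; the four interior entries are
    replaced by the parameter-dependent blends of the split points."""
    def blend(p, q, s, t):
        # barycentric combination (s*p + t*q)/(s + t)
        return tuple((s*pc + t*qc)/(s + t) for pc, qc in zip(p, q))
    Q = dict(ctrl)
    Q[(1,1)] = blend(inner['P110'], inner['P111'], u, w)
    Q[(1,2)] = blend(inner['P120'], inner['P121'], u, 1 - w)
    Q[(2,1)] = blend(inner['P210'], inner['P211'], 1 - u, w)
    Q[(2,2)] = blend(inner['P220'], inner['P221'], 1 - u, 1 - w)
    pt = (0.0, 0.0, 0.0)
    for i in range(4):
        for j in range(4):
            b = bern(3, i, u) * bern(3, j, w)
            pt = tuple(pc + b*qc for pc, qc in zip(pt, Q[(i, j)]))
    return pt

# With both split points of each pair equal, the patch is plain bicubic.
grid = {(i, j): (i, j, 0.5*i*j) for i in range(4) for j in range(4)}
inner = {name: grid[(int(name[1]), int(name[2]))]
         for name in ('P110','P111','P120','P121',
                      'P210','P211','P220','P221')}
print(gregory_point(grid, inner, 0.3, 0.6))
```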
13.28.1 The Gregory Tangent Vectors

The first derivatives of the Gregory patch are more complex than those of the bicubic Bézier patch, because four of the control points depend on the parameters u and w. The derivatives are

∂P(u, w)/∂u = Σ_{i=0}^{3} Σ_{j=0}^{3} (dB_{3,i}(u)/du) B_{3,j}(w) Q_{i,j}(u, w) + Σ_{i=0}^{3} Σ_{j=0}^{3} B_{3,i}(u) B_{3,j}(w) ∂Q_{i,j}(u, w)/∂u,

∂P(u, w)/∂w = Σ_{i=0}^{3} Σ_{j=0}^{3} B_{3,i}(u) (dB_{3,j}(w)/dw) Q_{i,j}(u, w) + Σ_{i=0}^{3} Σ_{j=0}^{3} B_{3,i}(u) B_{3,j}(w) ∂Q_{i,j}(u, w)/∂w.
Each derivative is the sum of two similar terms, each of which has the same format as a derivative of the bicubic Bézier patch. Therefore, only one procedure is needed to calculate the derivatives numerically. This procedure is called twice for each partial derivative. The second call involves the derivatives of the control points, which are shown here. The 12 boundary Q points don't depend on u or w, so their derivatives are zero. The eight derivatives of the four interior points are

∂Q11/∂u = w(P110 − P111)/(u + w)^2,              ∂Q11/∂w = u(P111 − P110)/(u + w)^2,
∂Q12/∂u = (1 − w)(P120 − P121)/(u + 1 − w)^2,     ∂Q12/∂w = u(P120 − P121)/(u + 1 − w)^2,
∂Q21/∂u = w(P211 − P210)/(1 − u + w)^2,           ∂Q21/∂w = (1 − u)(P211 − P210)/(1 − u + w)^2,
∂Q22/∂u = (1 − w)(P221 − P220)/(1 − u + 1 − w)^2,  ∂Q22/∂w = (1 − u)(P220 − P221)/(1 − u + 1 − w)^2.
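These formulas are easy to validate against a finite difference. The Python sketch below is ours, with made-up sample points; it checks the ∂Q11/∂u expression.

```python
# Check dQ11/du = w (P110 - P111)/(u + w)^2 against a central difference.
P110 = (1.0, 0.0, 2.0)    # made-up sample points
P111 = (0.5, 1.5, 1.0)

def Q11(u, w):
    return tuple((u*a + w*b)/(u + w) for a, b in zip(P110, P111))

def dQ11_du(u, w):
    return tuple(w*(a - b)/(u + w)**2 for a, b in zip(P110, P111))

u, w, h = 0.3, 0.4, 1e-6
numeric = tuple((a - b)/(2*h)
                for a, b in zip(Q11(u + h, w), Q11(u - h, w)))
for num, ana in zip(numeric, dQ11_du(u, w)):
    assert abs(num - ana) < 1e-6
print("analytic tangent matches the finite difference")
```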
After the first derivatives (the tangent vectors) have been calculated numerically at a point, they are used to numerically calculate the normal vector at the point.

It is interesting to observe that the Bernshteĭn polynomial of degree 1, i.e., the function z(t) = (1 − t) z1 + t z2, is precisely the mediation operator t[z1, z2] that we discussed in the previous chapter.
—Donald Knuth, The MetafontBook (1986)
Bezier invented some curves That he used to approximate swerves If you use them just right They’ll fit very tight And save wear and tear on your nerves
—From Hardy Calculus.
Plate H.1. A Room Scene Processed Nine Times (3D Maker).
(Panels: CylinderWrap, Flag, Embossed, CubeWrap, Edges, ConeWrap, DotStereogram, SphereWrap, SphereWrap.)
Plate H.2. Termespheres (Courtesy of Dick Termes).
Plate H.3. A Cluttered Room, Day and Night (Live Interior).
Plate H.4. Blur, Sharpen, Emboss, and Watercolor (Photoshop Effects).
Plate H.5. Woman in a Bathroom, Day (Left) and Night (Right) (Live Interior).
Plate I.1. A Bitmap Automatically (and Unsuccessfully) Converted to Vectors (Vector Magic).

Plate I.2. Images Based on Genetic Algorithms that Mimic Artificial Selection (SBArt).

Plate I.3. Parametric Surfaces (Surface Explorer).
(Panels: Brick, Floortile, Glass, Masonry, Metal, MonaLisa, Plastic, Stone, Walltile, Woodgrain, Woodshingle, Wool.)
Plate I.4. Teapot with Various Textures (Live Interior).

Plate I.5. Water Splash with Metallic Texture (Modo).
Plate I.6. Cubical Universe (Courtesy of Dick Termes).
Plate I.7. Woman in a Bathroom, Day (Left) and Night (Right) (Live Interior).
Plate I.8. Whole To The Hole (Courtesy of Dick Termes).
Plate I.9. Smoking? Not Here! (Smoke).
Volume II

This is the second volume of A Manual of Computer Graphics. This textbook/reference is big because the discipline of computer graphics is big. There are simply many topics, techniques, and algorithms to discuss, explain, and illustrate by examples. Because of the large number of pages, the book has been produced in two volumes. However, this division of the book into volumes is only technical and the book should be considered a single unit. It is wrong to consider volume I as introductory and volume II as advanced, or to assume that volume I is important while volume II is not. The volumes are simply two halves of a single, large entity and each refers to many figures, equations, and sections that appear in the other. This volume starts in the middle of Part III. In addition to Parts IV through VII, it also contains the bibliography, answers to all the exercises, and the detailed index. Each volume starts and ends with color plates, and there are also plates between individual parts of the book.
Acknowledgements

A book of this magnitude is generally written with the help, dedicated work, and encouragement of many people, and this large textbook/reference is no exception. First and foremost should be mentioned my editor, Wayne Wheeler, and the copyeditor, Francesca White. They made many useful comments and suggestions, and pointed out many mistypes, errors, and stylistic blemishes. In addition, I would like to thank H. L. Scott for permission to use Figure 2.82, CH Products for permission to use Figure 26.24b, Andreas Petersik for Figure 6.61, Shinji Araya for Figure 7.27, Dick Termes for many figures and paintings, the authors of Hardy Calculus for the limerick at the end of Chapter 13, Bill Wilburn for many Mathematica notebooks, and Ari Salomon for photos and panoramas in several plates.

The Preface is the most important part of the book. Even reviewers read a preface.
—Philip Guedalla
14 B-Spline Approximation

B-spline methods for curves and surfaces were first proposed in the 1940s but were seriously developed only in the 1970s, by several researchers, most notably R. Riesenfeld. They have been studied extensively, have been considerably extended since the 1970s, and much is currently known about them. The designation "B" stands for Basis, so the full name of this approach to curve and surface design is the basis spline. This chapter discusses the important types of B-spline curves and surfaces, including the most versatile one, the nonuniform rational B-spline (NURBS, Section 14.14). The B-spline curve overcomes the main disadvantages of the Bézier curve, which are (1) the degree of the Bézier curve depends on the number of control points, (2) it offers only global control, and (3) individual segments are easy to connect with C1 continuity, but C2 is difficult to obtain. The B-spline curve features local control and any desired degree of continuity. To obtain Cn continuity, the individual spline segments have to be polynomials of degree n. The B-spline curve is an approximating curve and is therefore defined by control points. However, in addition to the control points, the user has to specify the values of certain quantities called "knots." They are real numbers that offer additional control over the shape of the curve. The basic approach taken in the first part of this chapter disregards the knots, but they are introduced in Section 14.8 and their effect on the curve is explored. There are several types of B-splines. In the uniform (also called periodic) B-spline (Sections 14.1 and 14.2), the knot values are uniformly spaced and all the weight functions have the same shape and are shifted with respect to each other. In the nonuniform B-spline (Section 14.11), the knots are specified by the user and the weight functions are generally different.
There is also an open uniform B-spline (Section 14.10), where the knots are not uniform but are specified in a simple way. In a rational B-spline (Section 14.14), the weight functions are in the form of a ratio of two polynomials. In a nonrational B-spline, they are polynomials in t. The B-spline is an approximating curve based on control points, but there is also an interpolating version that passes through
the points (Section 14.7). Section 14.4 shows how tension can be added to the B-spline. B-splines are mathematically more sophisticated than other types of splines, so we start with a gentle introduction. We first use basic assumptions to derive the expressions for the quadratic and cubic uniform B-splines directly and without mentioning knots. We then show how to extend the derivations to uniform B-splines of any order. Following this, we discuss a different, recursive formulation of the weight functions of the uniform, open uniform, and nonuniform B-splines.
14.1 The Quadratic Uniform B-Spline

We start with the quadratic uniform B-spline. We assume that n + 1 control points, P0, P1, ..., Pn, are given and we want to construct a spline curve where each segment Pi(t) is a quadratic parametric polynomial based on three points, Pi−1, Pi, and Pi+1. We require that the segments connect with C1 continuity (only cubic and higher-degree polynomial segments can have C2 or higher continuities) and that the entire curve has local control. To achieve all this, we have to give up something, and we elect to give up the requirement that a segment pass through its first and last control points. We denote the start and end points of segment Pi(t) by Ki and Ki+1, respectively, and we call them joint points, or just joints. These points are still unknown and will have to be determined. Figure 14.1a shows two quadratic segments P1(t) and P2(t) defined by the four control points P0, P1, P2, and P3. The first segment goes from joint K1 to joint K2 and the second segment goes from joint K2 to joint K3, where the joints are drawn tentatively and will have to be determined accurately and redrawn. Note that each segment is defined by three control points, so its control polygon has two edges. The first spline segment is defined only by P0, P1, and P2, so any changes in P3 will not affect it. This is how local control is achieved in a B-spline.
P1(t) K1
K1
P2(t)
K2 P1(t)
P3
P0
(a)
P2
P2(t) K3
K3 P3
P0
P1
P2
K2
(b) Figure 14.1: The Quadratic Uniform B-Spline.
We use the usual notation for the two segments

$$P_i(t) = (t^2, t, 1)\, M \begin{pmatrix} P_{i-1} \\ P_i \\ P_{i+1} \end{pmatrix}, \qquad i = 1, 2, \tag{14.1}$$
where M is the 3×3 basis matrix whose nine elements have to be computed. We define three functions a(t), b(t), and c(t) by

$$(t^2, t, 1)\, M = (t^2, t, 1) \begin{pmatrix} a_2 & b_2 & c_2 \\ a_1 & b_1 & c_1 \\ a_0 & b_0 & c_0 \end{pmatrix} = (a_2 t^2 + a_1 t + a_0,\; b_2 t^2 + b_1 t + b_0,\; c_2 t^2 + c_1 t + c_0) = \big(a(t), b(t), c(t)\big). \tag{14.2}$$
The nine elements of M are determined from the following three requirements:

1. The two segments should meet at a common joint and their tangent vectors should be equal at that point. This is expressed as

$$P_1(1) = P_2(0) \quad\text{and}\quad P^t_1(1) = P^t_2(0), \tag{14.3}$$

and produces the explicit equations (where the dots indicate differentiation with respect to t)

$$a(1)P_0 + b(1)P_1 + c(1)P_2 = a(0)P_1 + b(0)P_2 + c(0)P_3,$$
$$\dot a(1)P_0 + \dot b(1)P_1 + \dot c(1)P_2 = \dot a(0)P_1 + \dot b(0)P_2 + \dot c(0)P_3.$$

Since the control points Pi are arbitrary and can be any points, we can rewrite these two equations in the form

$$\begin{array}{lll} a(1) = 0, & \dot a(1) = 0, & \text{for } P_0,\\ b(1) = a(0), & \dot b(1) = \dot a(0), & \text{for } P_1,\\ c(1) = b(0), & \dot c(1) = \dot b(0), & \text{for } P_2,\\ 0 = c(0), & 0 = \dot c(0), & \text{for } P_3. \end{array}$$

Using the notation of Equation (14.2), this can be written

$$\begin{array}{ll} a_2 + a_1 + a_0 = 0, & 2a_2 + a_1 = 0,\\ b_2 + b_1 + b_0 = a_0, & 2b_2 + b_1 = a_1,\\ c_2 + c_1 + c_0 = b_0, & 2c_2 + c_1 = b_1,\\ 0 = c_0, & 0 = c_1. \end{array} \tag{14.4}$$
This requirement produces eight equations for the nine unknown matrix elements.

2. The entire curve should be independent of the particular coordinate system used, which implies that the weight functions of each segment should be barycentric, i.e., a(t) + b(t) + c(t) ≡ 1. This condition can be written explicitly as

$$a_2 + b_2 + c_2 = 0, \qquad a_1 + b_1 + c_1 = 0, \qquad a_0 + b_0 + c_0 = 1, \tag{14.5}$$

and these add three more equations.
We now have 11 equations for the nine unknowns, but it is easy to show that only nine of the 11 are independent. The sum of the first two of Equations (14.5) equals the sum of the equations in the right column of Equation (14.4). Taking this into account, the equations can be solved uniquely, yielding

a_2 = 1/2, a_1 = −1, a_0 = 1/2;  b_2 = −1, b_1 = 1, b_0 = 1/2;  c_2 = 1/2, c_1 = 0, c_0 = 0.

The general quadratic B-spline segment, Equation (14.1), can now be written as

$$P_i(t) = (t^2, t, 1)\, \frac{1}{2}\begin{pmatrix} 1 & -2 & 1 \\ -2 & 2 & 0 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} P_{i-1} \\ P_i \\ P_{i+1} \end{pmatrix} \tag{14.6}$$
$$= \frac{1}{2}(t^2 - 2t + 1)P_{i-1} + \frac{1}{2}(-2t^2 + 2t + 1)P_i + \frac{t^2}{2}P_{i+1}, \qquad i = 1, 2.$$
We are now in a position to determine the start and end points, Ki and Ki+1, of segment i. They are

$$K_i = P_i(0) = \frac{1}{2}(P_{i-1} + P_i), \qquad K_{i+1} = P_i(1) = \frac{1}{2}(P_i + P_{i+1}).$$
Thus, the quadratic spline segment starts in the middle of the straight segment Pi−1 Pi and ends at the middle of the straight segment Pi Pi+1 , as shown in Figure 14.1b. The tangent vector of the general quadratic B-spline segment is easily obtained from Equation (14.6). It is
$$P^t_i(t) = (2t, 1, 0)\, \frac{1}{2}\begin{pmatrix} 1 & -2 & 1 \\ -2 & 2 & 0 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} P_{i-1} \\ P_i \\ P_{i+1} \end{pmatrix} = (t-1)P_{i-1} + (-2t+1)P_i + tP_{i+1}. \tag{14.7}$$
The tangent vectors at both ends of the segment are therefore

$$P^t(0) = P_i - P_{i-1} \quad\text{and}\quad P^t(1) = P_{i+1} - P_i,$$
i.e., each of them points in the direction of one of the edges of the control polygon of the spline segment. Since a quadratic spline segment is a polynomial of degree 2, we require continuity of the first derivative only. It is easy to show that the second derivative of our segment is Pi−1 − 2Pi + Pi+1. It is constant for a segment but is different for different segments. Equation (15.4) of Section 15.2 shows a relation between the quadratic B-spline and Bézier curves. A similar relation between the corresponding cubic curves is illustrated in Section 14.5.
Example: Given the four control points P0 = (1, 0), P1 = (1, 1), P2 = (2, 1), and P3 = (2, 0) (Figure 14.2), the first quadratic spline segment is obtained from Equation (14.6):

$$P_1(t) = (t^2, t, 1)\, \frac{1}{2}\begin{pmatrix} 1 & -2 & 1 \\ -2 & 2 & 0 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} P_0 \\ P_1 \\ P_2 \end{pmatrix}$$
$$= \frac{1}{2}(t^2 - 2t + 1)(1, 0) + \frac{1}{2}(-2t^2 + 2t + 1)(1, 1) + \frac{t^2}{2}(2, 1) = (t^2/2 + 1,\; -t^2/2 + t + 1/2).$$

It starts at joint K1 = P1(0) = (1, 1/2) and ends at joint K2 = P1(1) = (3/2, 1).

Figure 14.2: A Quadratic Uniform B-Spline Example.
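The joint values above are easy to check numerically. The following Python sketch is ours (the book's own examples use Mathematica); it simply evaluates the three weights of Equation (14.6):

```python
# Hedged sketch (ours): evaluate one quadratic uniform B-spline segment,
# Equation (14.6), and confirm that it starts and ends at the midpoints
# of the two control-polygon edges.

def quadratic_segment(p0, p1, p2, t):
    """P_i(t) for the three control points P_{i-1}, P_i, P_{i+1}."""
    a = (t * t - 2 * t + 1) / 2        # weight of P_{i-1}
    b = (-2 * t * t + 2 * t + 1) / 2   # weight of P_i
    c = t * t / 2                      # weight of P_{i+1}
    return tuple(a * u + b * v + c * w for u, v, w in zip(p0, p1, p2))

# the example's first three control points
P0, P1, P2 = (1, 0), (1, 1), (2, 1)
K1 = quadratic_segment(P0, P1, P2, 0.0)   # (1.0, 0.5), midpoint of P0 P1
K2 = quadratic_segment(P0, P1, P2, 1.0)   # (1.5, 1.0), midpoint of P1 P2
```

Evaluating at intermediate t values (e.g., t = 0.5 gives (1.125, 0.875)) traces the rest of the segment.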
The tangent vector of this segment is obtained from Equation (14.7):

$$P^t_1(t) = (2t, 1, 0)\, \frac{1}{2}\begin{pmatrix} 1 & -2 & 1 \\ -2 & 2 & 0 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} P_0 \\ P_1 \\ P_2 \end{pmatrix} = (t-1)(1,0) + (-2t+1)(1,1) + t(2,1) = (t,\; 1-t).$$

Thus, the first segment starts going in direction P^t_1(0) = (0, 1) (straight up) and ends going in direction P^t_1(1) = (1, 0) (to the right).

Exercise 14.1: Calculate the second segment, its tangent vector, and joint point K3.

Closed Quadratic B-Splines: Closed curves are sometimes needed, and a closed B-spline curve is easy to construct. Given the usual n + 1 control points, we extend them cyclically to obtain the n + 3 points Pn, P0, P1, P2, ..., Pn−1, Pn, P0 and compute the curve by applying Equation (14.6) to the n + 1 geometry vectors

$$\begin{pmatrix} P_n \\ P_0 \\ P_1 \end{pmatrix}, \begin{pmatrix} P_0 \\ P_1 \\ P_2 \end{pmatrix}, \begin{pmatrix} P_1 \\ P_2 \\ P_3 \end{pmatrix}, \ldots, \begin{pmatrix} P_{n-2} \\ P_{n-1} \\ P_n \end{pmatrix}, \begin{pmatrix} P_{n-1} \\ P_n \\ P_0 \end{pmatrix}.$$
Example: Given the four control points P0 = (1, 0), P1 = (1, 1), P2 = (2, 1), and P3 = (2, 0) of the previous example, it is easy to close the curve by calculating the two additional segments

$$P_0(t) = (t^2, t, 1)\, \frac{1}{2}\begin{pmatrix} 1 & -2 & 1 \\ -2 & 2 & 0 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} P_3 \\ P_0 \\ P_1 \end{pmatrix} = \frac{1}{2}(t^2-2t+1)(2,0) + \frac{1}{2}(-2t^2+2t+1)(1,0) + \frac{t^2}{2}(1,1) = (t^2/2 - t + 3/2,\; t^2/2),$$

$$P_3(t) = (t^2, t, 1)\, \frac{1}{2}\begin{pmatrix} 1 & -2 & 1 \\ -2 & 2 & 0 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} P_2 \\ P_3 \\ P_0 \end{pmatrix} = \frac{1}{2}(t^2-2t+1)(2,1) + \frac{1}{2}(-2t^2+2t+1)(2,0) + \frac{t^2}{2}(1,0) = (-t^2/2 + 2,\; t^2/2 - t + 1/2).$$

The four segments connect the four joint points (1, 1/2), (3/2, 1), (2, 1/2), (3/2, 0) and back to (1, 1/2).
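Since every joint of a closed quadratic B-spline is the midpoint of an edge of the cyclic control polygon, the four joints above can be generated in a few lines. This is our own Python sketch, not the book's code:

```python
# Joints of a closed quadratic uniform B-spline: joint i is the midpoint
# of the control-polygon edge P_i P_{i+1}, taken cyclically.

P = [(1, 0), (1, 1), (2, 1), (2, 0)]   # the example's control points
n = len(P)

joints = [tuple((a + b) / 2 for a, b in zip(P[i], P[(i + 1) % n]))
          for i in range(n)]
# joints == [(1.0, 0.5), (1.5, 1.0), (2.0, 0.5), (1.5, 0.0)]
```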
14.2 The Cubic Uniform B-Spline

This curve is again defined by n + 1 control points and it consists of spline segments Pi(t), each a PC defined by four control points Pi−1, Pi, Pi+1, and Pi+2. The general form of segment i is therefore

$$P_i(t) = (t^3, t^2, t, 1)\, M \begin{pmatrix} P_{i-1} \\ P_i \\ P_{i+1} \\ P_{i+2} \end{pmatrix}, \tag{14.8}$$
where M is a 4×4 matrix whose 16 elements have to be determined by translating the constraints on the curve into 16 equations and solving them. The constraints are (1) two segments should meet with C^2 continuity and (2) the entire curve should be independent of the particular coordinate system. As in the quadratic case, we give up the requirement that a segment Pi(t) starts and ends at control points, and we denote its extreme points by Ki and Ki+1. These joints can be computed as soon as the expression for the segment is derived. Figure 14.3a shows a tentative design for two cubic segments.
Figure 14.3: The Cubic Uniform B-Spline.
We start the derivation by writing

$$(t^3, t^2, t, 1)\, M = (t^3, t^2, t, 1) \begin{pmatrix} a_3 & b_3 & c_3 & d_3 \\ a_2 & b_2 & c_2 & d_2 \\ a_1 & b_1 & c_1 & d_1 \\ a_0 & b_0 & c_0 & d_0 \end{pmatrix}$$
$$= (a_3 t^3 + a_2 t^2 + a_1 t + a_0,\; b_3 t^3 + b_2 t^2 + b_1 t + b_0,\; c_3 t^3 + c_2 t^2 + c_1 t + c_0,\; d_3 t^3 + d_2 t^2 + d_1 t + d_0) = \big(a(t), b(t), c(t), d(t)\big).$$

The first three constraints are expressed by

$$P_1(1) = P_2(0), \qquad P^t_1(1) = P^t_2(0), \qquad P^{tt}_1(1) = P^{tt}_2(0),$$

or, explicitly,

$$a(1)P_0 + b(1)P_1 + c(1)P_2 + d(1)P_3 = a(0)P_1 + b(0)P_2 + c(0)P_3 + d(0)P_4,$$
$$\dot a(1)P_0 + \dot b(1)P_1 + \dot c(1)P_2 + \dot d(1)P_3 = \dot a(0)P_1 + \dot b(0)P_2 + \dot c(0)P_3 + \dot d(0)P_4,$$
$$\ddot a(1)P_0 + \ddot b(1)P_1 + \ddot c(1)P_2 + \ddot d(1)P_3 = \ddot a(0)P_1 + \ddot b(0)P_2 + \ddot c(0)P_3 + \ddot d(0)P_4.$$

Using the definitions of a(t) and its relatives, this can be written explicitly as

$$\begin{array}{lll} a_3 + a_2 + a_1 + a_0 = 0, & 3a_3 + 2a_2 + a_1 = 0, & 6a_3 + 2a_2 = 0,\\ b_3 + b_2 + b_1 + b_0 = a_0, & 3b_3 + 2b_2 + b_1 = a_1, & 6b_3 + 2b_2 = 2a_2,\\ c_3 + c_2 + c_1 + c_0 = b_0, & 3c_3 + 2c_2 + c_1 = b_1, & 6c_3 + 2c_2 = 2b_2,\\ d_3 + d_2 + d_1 + d_0 = c_0, & 3d_3 + 2d_2 + d_1 = c_1, & 6d_3 + 2d_2 = 2c_2,\\ 0 = d_0, & 0 = d_1, & 0 = 2d_2. \end{array} \tag{14.9}$$
These are 15 equations for the 16 unknowns. We already know from the quadratic case that the weight functions of each segment should be barycentric, i.e., a(t) + b(t) + c(t) + d(t) ≡ 1. This condition can be written explicitly as

$$a_3 + b_3 + c_3 + d_3 = 0, \quad a_2 + b_2 + c_2 + d_2 = 0, \quad a_1 + b_1 + c_1 + d_1 = 0, \quad a_0 + b_0 + c_0 + d_0 = 1, \tag{14.10}$$
and they add four more equations. We now have 19 equations, but only 16 of them are independent, since the first three equations of Equation (14.10) can be obtained by summing the first four equations of the left column of Equation (14.9). The system of equations can therefore be uniquely solved and the solutions are

a_3 = −1/6, a_2 = 1/2, a_1 = −1/2, a_0 = 1/6;  b_3 = 1/2, b_2 = −1, b_1 = 0, b_0 = 2/3;  c_3 = −1/2, c_2 = 1/2, c_1 = 1/2, c_0 = 1/6;  d_3 = 1/6, d_2 = 0, d_1 = 0, d_0 = 0.

The cubic B-spline segment can now be expressed as

$$P_i(t) = (t^3, t^2, t, 1)\, \frac{1}{6}\begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 0 & 3 & 0 \\ 1 & 4 & 1 & 0 \end{pmatrix} \begin{pmatrix} P_{i-1} \\ P_i \\ P_{i+1} \\ P_{i+2} \end{pmatrix} \tag{14.11}$$
$$= \frac{1}{6}(-t^3 + 3t^2 - 3t + 1)P_{i-1} + \frac{1}{6}(3t^3 - 6t^2 + 4)P_i + \frac{1}{6}(-3t^3 + 3t^2 + 3t + 1)P_{i+1} + \frac{t^3}{6}P_{i+2}.$$
The two extreme points are therefore

$$K_i = P_i(0) = \frac{1}{6}(P_{i-1} + 4P_i + P_{i+1}), \qquad K_{i+1} = P_i(1) = \frac{1}{6}(P_i + 4P_{i+1} + P_{i+2}).$$

In order to interpret them geometrically, we write them as

$$K_i = \Big[\frac{1}{6}P_{i-1} + \frac{5}{6}P_i\Big] + \frac{1}{6}(P_{i+1} - P_i), \qquad K_{i+1} = \Big[\frac{1}{6}P_i + \frac{5}{6}P_{i+1}\Big] + \frac{1}{6}(P_{i+2} - P_{i+1}). \tag{14.12}$$
Point Ki is the sum of the point (1/6 Pi−1 + 5/6 Pi) and one-sixth of the vector (Pi+1 − Pi). Point Ki+1 has a similar interpretation. Both are shown in Figure 14.3b.

Exercise 14.2: Show another way to interpret Pi(0) and Pi(1) geometrically.

Users, especially those familiar with Bézier curves, find it counterintuitive that the B-spline curve does not start and end at its terminal control points. This "inconvenient" feature can be modified—and the curve made to start and end at its extreme points—by adding two phantom endpoints, P−1 and Pn+1, at both ends of the curve, and placing those points at locations that would force the curve to start at P0 and end at Pn. The derivation of this case is simple. The first segment starts at (1/6)[P−1 + 4P0 + P1]. This value will equal P0 if we select P−1 = 2P0 − P1. Similarly, the last segment ends at (1/6)[Pn−1 + 4Pn + Pn+1] and this value equals Pn if we select Pn+1 = 2Pn − Pn−1.
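The phantom-endpoint rule is easy to verify numerically. In this Python sketch (our own illustration; the point values are arbitrary), the joint formula K = (A + 4B + C)/6 applied to the extended point sequence returns the original terminal points:

```python
# Sketch (ours): with phantom points P_{-1} = 2P0 - P1 and
# P_{n+1} = 2Pn - P_{n-1}, the cubic B-spline starts at P0 and ends at Pn.

def joint(a, b, c):
    """Start point of a cubic segment: (A + 4B + C) / 6."""
    return tuple((x + 4 * y + z) / 6 for x, y, z in zip(a, b, c))

pts = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 0)]                      # P0 .. Pn

phantom_start = tuple(2 * a - b for a, b in zip(pts[0], pts[1]))    # 2P0 - P1
phantom_end = tuple(2 * a - b for a, b in zip(pts[-1], pts[-2]))    # 2Pn - Pn-1

start = joint(phantom_start, pts[0], pts[1])    # == P0 == (0.0, 0.0)
end = joint(pts[-2], pts[-1], phantom_end)      # == Pn == (2.0, 0.0)
```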
Adding phantom points adds two segments to the curve, but this has the advantage that the tangents at the start and the end of the curve have known directions. The former is in the direction from P0 to P1 and the latter is from Pn−1 to Pn (same as the end tangents of a Bézier curve). The tangent vector at the start of the first segment is (1/2)(P1 − P−1) = P1 − P0, and similarly for the end tangent of the last segment. The tangent vector of the general cubic B-spline segment is

$$P^t_i(t) = \frac{1}{6}(-3t^2 + 6t - 3)P_{i-1} + \frac{1}{6}(9t^2 - 12t)P_i + \frac{1}{6}(-9t^2 + 6t + 3)P_{i+1} + \frac{t^2}{2}P_{i+2}.$$

As a result, the extreme tangent vectors are

$$P^t_i(0) = \frac{1}{2}(P_{i+1} - P_{i-1}), \qquad P^t_i(1) = \frac{1}{2}(P_{i+2} - P_i). \tag{14.13}$$

They have simple geometric interpretations. The second derivative of the cubic segment is

$$P^{tt}_i(t) = \frac{1}{6}(-6t + 6)P_{i-1} + \frac{1}{6}(18t - 12)P_i + \frac{1}{6}(-18t + 6)P_{i+1} + tP_{i+2},$$

and it's easy to see that P^{tt}_i(1) = P^{tt}_{i+1}(0) = Pi − 2Pi+1 + Pi+2, which proves the C^2 continuity of this curve.
Example: We select the five points P0 = (0, 0), P1 = (0, 1), P2 = (1, 1), P3 = (2, 1), and P4 = (2, 0). They have simple, integer coordinates to simplify the computations. We use these points to construct two cubic B-spline segments. The first one is given by Equation (14.11):

$$P_1(t) = \frac{1}{6}(-t^3 + 3t^2 - 3t + 1)(0,0) + \frac{1}{6}(3t^3 - 6t^2 + 4)(0,1) + \frac{1}{6}(-3t^3 + 3t^2 + 3t + 1)(1,1) + \frac{t^3}{6}(2,1)$$
$$= (-t^3/6 + t^2/2 + t/2 + 1/6,\; t^3/6 - t^2/2 + t/2 + 5/6).$$

It starts at joint K1 = P1(0) = (1/6, 5/6) and ends at joint K2 = P1(1) = (1, 1). Notice that these joint points can be verified from Equation (14.12). The tangent vector of this segment is

$$P^t_1(t) = \frac{1}{6}(-3t^2 + 6t - 3)(0,0) + \frac{1}{6}(9t^2 - 12t)(0,1) + \frac{1}{6}(-9t^2 + 6t + 3)(1,1) + \frac{t^2}{2}(2,1)$$
$$= (-t^2/2 + t + 1/2,\; t^2/2 - t + 1/2).$$

The two extreme tangents are P^t_1(0) = (1/2, 1/2) and P^t_1(1) = (1, 0). These can also be verified by Equation (14.13). Figure 14.4 shows this segment and its successor (the dashed curves).
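The joints of this example can be confirmed with a short computation. This is a Python sketch of ours; the book's own code, in Mathematica, appears in Figure 14.4:

```python
# Evaluate one cubic uniform B-spline segment, Equation (14.11), and
# check the two joints of the first segment of the example.

def cubic_segment(p, t):
    """P_i(t) from four control points p = [P_{i-1}, P_i, P_{i+1}, P_{i+2}]."""
    w = ((-t**3 + 3*t**2 - 3*t + 1) / 6,
         (3*t**3 - 6*t**2 + 4) / 6,
         (-3*t**3 + 3*t**2 + 3*t + 1) / 6,
         t**3 / 6)
    return tuple(sum(wi * pi[k] for wi, pi in zip(w, p)) for k in range(2))

pts = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 0)]
K1 = cubic_segment(pts[0:4], 0)   # ~ (1/6, 5/6)
K2 = cubic_segment(pts[0:4], 1)   # ~ (1, 1)
```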
(* B-spline example of 2 cubic segs and 3 quadr segs for 5 points *)
Clear[Pt,T,t,M3,comb,a,g1,g2,g3];
Pt={{0,0},{0,1},{1,1},{2,1},{2,0}};
(* first, 2 cubic segments (dashed) *)
T[t_]:={t^3,t^2,t,1};
M3={{-1,3,-3,1},{3,-6,3,0},{-3,0,3,0},{1,4,1,0}}/6;
comb[i_]:=(T[t].M3)[[i]] Pt[[i+a]];
g1=Graphics[{Red,PointSize[.02],Point/@Pt}];
a=0;
g2=ParametricPlot[comb[1]+comb[2]+comb[3]+comb[4],{t,0,.95},
  PlotRange->All,PlotStyle->{Green,AbsoluteDashing[{5,2}]}];
a=1;
g3=ParametricPlot[comb[1]+comb[2]+comb[3]+comb[4],{t,0.05,1},
  PlotRange->All,PlotStyle->{Green,AbsoluteDashing[{5,2}]}];
(* now the 3 quadratic segments (solid) *)
T[t_]:={t^2,t,1};
M2={{1,-2,1},{-2,2,0},{1,1,0}}/2;
comb[i_]:=(T[t].M2)[[i]] Pt[[i+a]];
a=0; g4=ParametricPlot[comb[1]+comb[2]+comb[3],{t,0,.97}];
a=1; g5=ParametricPlot[comb[1]+comb[2]+comb[3],{t,0.03,.97}];
a=2; g6=ParametricPlot[comb[1]+comb[2]+comb[3],{t,0,1}];
Show[g2,g3,g4,g5,g6,g1,PlotRange->All]

Figure 14.4: Two Cubic (Dashed) and Three Quadratic (Solid) B-Spline Segments.
Exercise 14.3: Compute the second spline segment P2(t), its tangent vector, and joint K3.

Exercise 14.4: Use the five control points of the example above to construct the three segments and determine the four joints of the quadratic uniform B-spline defined by the points.

Exercise 14.4 shows that the same n + 1 control points can be used to construct a quadratic or a cubic B-spline curve (or a B-spline curve of any order up to n + 1). This is in contrast to the Bézier curve, whose order is determined by the number of control points. This is also the reason why both n and the degree of the polynomials that make up the spline segments are needed to identify a B-spline. In practice, we use n and k (the order) to identify a B-spline. The order is simply the degree plus 1. Thus, a B-spline defined by five control points P0 through P4 can be of order 2 (linear, with four segments), order 3 (quadratic, with three segments), order 4 (cubic, with two segments), or order 5 (quartic, with one segment).

Figure 14.5a,b,c shows how a Bézier curve, a cubic B-spline, and a quadratic B-spline, respectively, are attracted to their control polygons. We already know that these three types of curves do not have the same endpoints, so this figure is only qualitative. It only shows how the various types of curves are attracted to their control points.

Collinear Points: Segment P2(t) of Exercise 14.4 depends on points P1, P2, and P3 that are located on the line y = 1. This is why this segment is horizontal (and therefore straight). We conclude that the B-spline can consist of curved and straight segments connected with any desired continuity. All that's necessary in order to have a straight segment is to have enough collinear control points. In the case of a quadratic B-spline, three collinear points will result in a straight segment that will connect to its neighbors (curved or straight) with C^1 continuity.
In the case of a cubic B-spline, four collinear points will result in a straight segment that will connect to its neighbors (curved or straight) with C^2 continuity, and similarly for higher-degree uniform B-splines.

A Closed Cubic B-Spline Curve: Closing a cubic B-spline is similar to closing a quadratic curve. Given a set of n + 1 control points, we extend them cyclically to obtain the n + 4 points Pn, P0, P1, P2, ..., Pn−1, Pn, P0, P1, and compute the curve by applying Equation (14.11) to the n + 1 geometry vectors

$$\begin{pmatrix} P_n \\ P_0 \\ P_1 \\ P_2 \end{pmatrix}, \begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix}, \begin{pmatrix} P_1 \\ P_2 \\ P_3 \\ P_4 \end{pmatrix}, \ldots, \begin{pmatrix} P_{n-2} \\ P_{n-1} \\ P_n \\ P_0 \end{pmatrix}, \begin{pmatrix} P_{n-1} \\ P_n \\ P_0 \\ P_1 \end{pmatrix}.$$
Figure 14.5: A Comparison of (a) Bézier, (b) Cubic B-Spline, and (c) Quadratic B-Spline Curves.
14.3 Multiple Control Points

It is possible (and may even be useful) to have several identical control points. A set of identical points is referred to as a multiple point. We use the uniform cubic B-spline (Equation (14.11)) as an example, but higher-degree uniform B-splines behave similarly. We start with a double control point. Consider the cubic segment P1(t) defined by the four control points P0, P1 = P2, and P3. Its expression is

$$P_1(t) = \frac{1}{6}(-t^3 + 3t^2 - 3t + 1)P_0 + \frac{1}{6}(-3t^2 + 3t + 5)P_1 + \frac{t^3}{6}P_3,$$

which implies

$$P_1(0) = \frac{1}{6}P_0 + \frac{5}{6}P_1, \qquad P_1(1) = \frac{5}{6}P_1 + \frac{1}{6}P_3.$$
This segment therefore starts and ends at the same points as the general cubic segment and also has the same extreme tangent vectors. The difference is that it is strongly attracted to the double point. Next, we consider a triple point. The five control points P0, P1 = P2 = P3, and P4 define the two cubic segments

$$P_1(t) = \frac{1}{6}(-t^3 + 3t^2 - 3t + 1)P_0 + \frac{1}{6}(t^3 - 3t^2 + 3t + 5)P_1 = (1-u)P_0 + uP_1, \quad\text{for } u = (t^3 - 3t^2 + 3t + 5)/6,$$
$$P_2(t) = \frac{1}{6}(-t^3 + 6)P_1 + \frac{t^3}{6}P_4 = (1-w)P_1 + wP_4, \quad\text{for } w = t^3/6.$$

The parameter substitutions above show that these segments are straight (Figure 14.6). The extreme points of the two segments are

$$P_1(0) = \frac{1}{6}P_0 + \frac{5}{6}P_1, \quad P_1(1) = P_1, \quad P_2(0) = P_1, \quad P_2(1) = \frac{5}{6}P_1 + \frac{1}{6}P_4,$$

showing that the segments meet at the triple control point. In general, a cubic segment is attracted to a double control point and passes through a triple control point. A degree-4 segment is attracted to double and triple control points and passes through quadruple points, and similarly for higher-degree uniform segments. The tangent vectors of the two cubic segments are

$$P^t_1(t) = \frac{1}{6}(-3t^2 + 6t - 3)P_0 + \frac{1}{6}(3t^2 - 6t + 3)P_1, \qquad P^t_2(t) = -\frac{t^2}{2}P_1 + \frac{t^2}{2}P_4,$$
yielding the extreme directions

$$P^t_1(0) = \frac{1}{2}(P_1 - P_0), \quad P^t_1(1) = (0,0), \quad P^t_2(0) = (0,0), \quad P^t_2(1) = \frac{1}{2}(P_4 - P_1).$$

Thus, the first segment starts in the direction from P0 to the triple point P1. The second segment ends going in the direction from P1 to P4. However, at the triple point, both tangents are indefinite, suggesting a cusp. It turns out that the two segments are straight lines (Figure 14.6).

Figure 14.6: A Triple Point.
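A numeric sketch (ours, with arbitrary point values) makes the straight-line claim concrete: with a triple point T, the segment before T lies on the line P0T and the segment after it lies on the line TP4:

```python
# Sketch (ours): with P1 = P2 = P3 = T, the two cubic segments defined by
# (P0, T, T, T) and (T, T, T, P4) are straight and meet exactly at T.

def cubic_segment(p, t):
    w = ((-t**3 + 3*t**2 - 3*t + 1) / 6, (3*t**3 - 6*t**2 + 4) / 6,
         (-3*t**3 + 3*t**2 + 3*t + 1) / 6, t**3 / 6)
    return tuple(sum(wi * q[k] for wi, q in zip(w, p)) for k in range(2))

P0, T, P4 = (0, 0), (3, 3), (6, 0)   # T is the triple control point

seg1 = [cubic_segment([P0, T, T, T], t / 10) for t in range(11)]
seg2 = [cubic_segment([T, T, T, P4], t / 10) for t in range(11)]

# seg1 lies on the line y = x (through P0 and T) and ends at T;
# seg2 starts at T and lies on the line x + y = 6 (through T and P4)
```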
Exercise 14.5: Given the eight control points P0, P1 = P2 = P3, P4 = P5 = P6, and P7, calculate the two cubic segments P3(t) and P4(t) and their start and end points (Figure 14.6).

Exercise 14.6: Show that a cubic B-spline segment passes through its first control point if it is a triple point. As a corollary, we deduce that a uniform cubic B-spline curve where every control point is triple is a polyline.

Example: We consider the case where both terminal points are triple and there are two other points in between. The total number of control points is eight and they satisfy P0 = P1 = P2 and P5 = P6 = P7. The five cubic spline segments are

$$P_1(t) = \frac{1}{6}(-t^3 + 6)P_0 + \frac{t^3}{6}P_3,$$
$$P_2(t) = \frac{1}{6}(2t^3 - 3t^2 - 3t + 5)P_0 + \frac{1}{6}(-3t^3 + 3t^2 + 3t + 1)P_3 + \frac{t^3}{6}P_4,$$
$$P_3(t) = \frac{1}{6}(-t^3 + 3t^2 - 3t + 1)P_0 + \frac{1}{6}(3t^3 - 6t^2 + 4)P_3 + \frac{1}{6}(-3t^3 + 3t^2 + 3t + 1)P_4 + \frac{t^3}{6}P_5, \tag{14.14}$$
$$P_4(t) = \frac{1}{6}(-t^3 + 3t^2 - 3t + 1)P_3 + \frac{1}{6}(3t^3 - 6t^2 + 4)P_4 + \frac{1}{6}(-2t^3 + 3t^2 + 3t + 1)P_5,$$
$$P_5(t) = \frac{1}{6}(-t^3 + 3t^2 - 3t + 1)P_4 + \frac{1}{6}(t^3 - 3t^2 + 3t + 5)P_5.$$
It is easy to see that they satisfy P1(0) = P0 and P5(1) = P5 and that they meet at the four points

$$\frac{5}{6}P_0 + \frac{1}{6}P_3, \qquad \frac{1}{6}P_0 + \frac{4}{6}P_3 + \frac{1}{6}P_4, \qquad \frac{1}{6}P_3 + \frac{4}{6}P_4 + \frac{1}{6}P_5, \qquad\text{and}\qquad \frac{1}{6}P_4 + \frac{5}{6}P_5.$$
If we want to keep the two extreme points as triples, we can edit this curve only by moving the two interior points P3 and P4. Moving P4 affects the last four segments, and moving P3 affects the first four segments. This type of curve is therefore similar to a Bézier curve in that it starts and ends at its extreme control points and it features only limited local control.

Exercise 14.7: Given the eight control points P0 = P1 = P2 = (1, 0), P3 = (2, 1), P4 = (4, 0), and P5 = P6 = P7 = (4, 1), use Equation (14.14) to calculate the cubic uniform B-spline curve defined by these points and compare it to the Bézier curve defined by the points.
14.4 Cubic B-Splines with Tension

Adding a tension parameter to the uniform cubic B-spline is similar to tension in the cardinal spline (Section 12.5). We use Hermite interpolation (Equation (11.7)) to compute a PC segment that starts and ends at the same points as a cubic B-spline and whose extreme tangent vectors point in the same directions as those of the cubic B-spline, but whose magnitudes are controlled by a tension parameter s. Substituting (1/6)(P0 + 4P1 + P2) and (1/6)(P1 + 4P2 + P3) for the terminal points and (s/6)(P2 − P0) and (s/6)(P3 − P1) for the extreme tangents, we write Equation (11.7) and manipulate it such that it ends up looking like a uniform cubic B-spline segment, Equation (14.11):

$$P(t) = (t^3, t^2, t, 1) \begin{pmatrix} 2 & -2 & 1 & 1 \\ -3 & 3 & -2 & -1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} \frac{1}{6}(P_0 + 4P_1 + P_2) \\ \frac{1}{6}(P_1 + 4P_2 + P_3) \\ \frac{s}{6}(P_2 - P_0) \\ \frac{s}{6}(P_3 - P_1) \end{pmatrix}$$
$$= \frac{1}{6}\Big[\big(t^3(2-s) + t^2(2s-3) - st + 1\big)P_0 + \big(t^3(6-s) + t^2(s-9) + 4\big)P_1 + \big(t^3(s-6) + t^2(9-2s) + st + 1\big)P_2 + \big(t^3(s-2) + t^2(3-s)\big)P_3\Big]$$
$$= (t^3, t^2, t, 1)\, \frac{1}{6}\begin{pmatrix} 2-s & 6-s & s-6 & s-2 \\ 2s-3 & s-9 & 9-2s & 3-s \\ -s & 0 & s & 0 \\ 1 & 4 & 1 & 0 \end{pmatrix} \begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix}. \tag{14.15}$$
A quick check verifies that Equation (14.15) reduces to the uniform cubic B-spline segment, Equation (14.11), for s = 3. This value is therefore considered the "neutral" or "standard" value of the tension parameter s. Since s controls the length of the tangent vectors, small values of s should produce the effects of higher tension and, in the extreme, the value s = 0 should result in indefinite tangent vectors and in the spline segment becoming a straight line. To show this, we rewrite Equation (14.15) for s = 0:

$$P(t) = (t^3, t^2, t, 1)\, \frac{1}{6}\begin{pmatrix} 2 & 6 & -6 & -2 \\ -3 & -9 & 9 & 3 \\ 0 & 0 & 0 & 0 \\ 1 & 4 & 1 & 0 \end{pmatrix} \begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix}$$
$$= \frac{1}{6}(2t^3 - 3t^2 + 1)P_0 + \frac{1}{6}(6t^3 - 9t^2 + 4)P_1 + \frac{1}{6}(-6t^3 + 9t^2 + 1)P_2 + \frac{1}{6}(-2t^3 + 3t^2)P_3.$$

Substituting T = 3t^2 − 2t^3 for the parameter t changes the above expression to the form

$$P(T) = \frac{1}{6}(-P_0 - 3P_1 + 3P_2 + P_3)T + \frac{1}{6}(P_0 + 4P_1 + P_2),$$
which is a straight line from P(0) = (1/6)(P0 + 4P1 + P2) to P(1) = (1/6)(P1 + 4P2 + P3). The tangent vector of Equation (14.15) is

$$P^t(t) = (3t^2, 2t, 1, 0)\, \frac{1}{6}\begin{pmatrix} 2-s & 6-s & s-6 & s-2 \\ 2s-3 & s-9 & 9-2s & 3-s \\ -s & 0 & s & 0 \\ 1 & 4 & 1 & 0 \end{pmatrix} \begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix}$$
$$= \frac{1}{6}\Big[\big(3t^2(2-s) + 2t(2s-3) - s\big)P_0 + \big(3t^2(6-s) + 2t(s-9)\big)P_1 + \big(3t^2(s-6) + 2t(9-2s) + s\big)P_2 + \big(3t^2(s-2) + 2t(3-s)\big)P_3\Big]. \tag{14.16}$$

The extreme tangents are

$$P^t(0) = \frac{s}{6}(P_2 - P_0) \quad\text{and}\quad P^t(1) = \frac{s}{6}(P_3 - P_1).$$

Substituting s = 0 in Equation (14.16) yields the tangent vector for the case of infinite tension

$$P^t(t) = \frac{1}{6}\big[6(t^2-t)P_0 + 18(t^2-t)P_1 - 18(t^2-t)P_2 - 6(t^2-t)P_3\big] = (t^2-t)(P_0 + 3P_1 - 3P_2 - P_3). \tag{14.17}$$
(* Cubic B-spline with tension *)
Clear[t,s,pnts,stnp,tensMat,bsplineTensn,g1,g2,g3,g4];
pnts={{0,0},{0,1},{1,1},{1,0}};
stnp=Transpose[pnts];
tensMat={{2-s,6-s,s-6,s-2},{2s-3,s-9,9-2s,3-s},{-s,0,s,0},{1,4,1,0}};
bsplineTensn[t_]:=Module[{tmpstruc},tmpstruc={t^3,t^2,t,1}.tensMat;
  {tmpstruc.stnp[[1]],tmpstruc.stnp[[2]]}/6];
g1=ListPlot[pnts,PlotStyle->{Red,AbsolutePointSize[6]},
  AspectRatio->Automatic];
s=0; g2=ParametricPlot[bsplineTensn[t],{t,0,1}];
s=3; g3=ParametricPlot[bsplineTensn[t],{t,0,1},
  PlotStyle->{Green,AbsoluteDashing[{2,2}]}];
s=5; g4=ParametricPlot[bsplineTensn[t],{t,0,1},
  PlotStyle->{Blue,AbsoluteDashing[{1,2,2,2}]}];
Show[g1,g2,g3,g4,PlotRange->All]

Figure 14.7: Cubic B-Spline with Tension.
Exercise 14.8: Since the spline segment is a straight line in this case, its tangent vector should always point in the same direction. Use Equation (14.17) to show that this is so.

Figure 14.7 illustrates the effect of tension on a cubic B-spline. Three curves are shown, corresponding to s values of 0, 3, and 5. See also Section 13.11 for a discussion of cubic Bézier curves with tension.
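The s = 3 reduction is a one-line check. The Python sketch below is ours (the tensMat definition mirrors the Mathematica code of Figure 14.7):

```python
# Sketch (ours): the matrix of Equation (14.15) as a function of the
# tension s. For s = 3 it equals the cubic B-spline matrix numerators
# of Equation (14.11); every entry still carries a factor 1/6.

def tension_matrix(s):
    return [[2 - s, 6 - s, s - 6, s - 2],
            [2 * s - 3, s - 9, 9 - 2 * s, 3 - s],
            [-s, 0, s, 0],
            [1, 4, 1, 0]]

BSPLINE = [[-1, 3, -3, 1], [3, -6, 3, 0], [-3, 0, 3, 0], [1, 4, 1, 0]]

assert tension_matrix(3) == BSPLINE   # the "neutral" tension value
```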
Sex alleviates tension and love causes it. —Woody Allen (as Andrew) in A Midsummer Night’s Sex Comedy (1982).
14.5 Cubic B-Spline and Bézier Curves

Given a cubic B-spline segment P(t) based on control points P0, P1, P2, and P3, it is easy to determine points Q0, Q1, Q2, and Q3 such that the Bézier curve Q(t) defined by them will have the same shape as P(t). This is done by equating the matrices of Equation (14.11) that define P(t) to those of Equation (13.8) that define Q(t):

$$\frac{1}{6}\begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 0 & 3 & 0 \\ 1 & 4 & 1 & 0 \end{pmatrix} \begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix} = \begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 3 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} Q_0 \\ Q_1 \\ Q_2 \\ Q_3 \end{pmatrix}.$$

The solutions are

$$Q_0 = \frac{1}{6}(P_0 + 4P_1 + P_2), \quad Q_1 = \frac{1}{6}(4P_1 + 2P_2), \quad Q_2 = \frac{1}{6}(2P_1 + 4P_2), \quad Q_3 = \frac{1}{6}(P_1 + 4P_2 + P_3).$$

Equation (15.4) of Section 15.2 shows a similar relation between the quadratic B-spline and Bézier curves.
14.6 Higher-Degree Uniform B-Splines

The methods of Sections 14.1 and 14.2 can be employed to construct uniform B-splines of higher degrees. It can be shown (see, for example, [Yamaguchi 88], p. 329) that the degree-n uniform B-spline segment is given by

$$P_i(t) = (t^n, \ldots, t^2, t, 1)\, M \begin{pmatrix} P_{i-1} \\ P_i \\ P_{i+1} \\ \vdots \\ P_{i+n-1} \end{pmatrix},$$

where the elements m_{ij} of the basis matrix M are

$$m_{ij} = \frac{1}{n!}\binom{n}{i} \sum_{k=j}^{n} (n-k)^i (-1)^{k-j} \binom{n+1}{k-j}.$$

Figure 14.8 shows a few examples of these matrices.
$$M_1 = \frac{1}{1!}\begin{pmatrix} -1 & 1 \\ 1 & 0 \end{pmatrix}, \qquad M_2 = \frac{1}{2!}\begin{pmatrix} 1 & -2 & 1 \\ -2 & 2 & 0 \\ 1 & 1 & 0 \end{pmatrix}, \qquad M_3 = \frac{1}{3!}\begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 0 & 3 & 0 \\ 1 & 4 & 1 & 0 \end{pmatrix},$$

$$M_4 = \frac{1}{4!}\begin{pmatrix} 1 & -4 & 6 & -4 & 1 \\ -4 & 12 & -12 & 4 & 0 \\ 6 & -6 & -6 & 6 & 0 \\ -4 & -12 & 12 & 4 & 0 \\ 1 & 11 & 11 & 1 & 0 \end{pmatrix}, \qquad M_5 = \frac{1}{5!}\begin{pmatrix} -1 & 5 & -10 & 10 & -5 & 1 \\ 5 & -20 & 30 & -20 & 5 & 0 \\ -10 & 20 & 0 & -20 & 10 & 0 \\ 10 & 20 & -60 & 20 & 10 & 0 \\ -5 & -50 & 0 & 50 & 5 & 0 \\ 1 & 26 & 66 & 26 & 1 & 0 \end{pmatrix},$$

$$M_6 = \frac{1}{6!}\begin{pmatrix} 1 & -6 & 15 & -20 & 15 & -6 & 1 \\ -6 & 30 & -60 & 60 & -30 & 6 & 0 \\ 15 & -45 & 30 & 30 & -45 & 15 & 0 \\ -20 & -20 & 160 & -160 & 20 & 20 & 0 \\ 15 & 135 & -150 & -150 & 135 & 15 & 0 \\ -6 & -150 & -240 & 240 & 150 & 6 & 0 \\ 1 & 57 & 302 & 302 & 57 & 1 & 0 \end{pmatrix}.$$

Figure 14.8: Some Basis Matrices for Uniform B-Splines.
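The m_ij formula is easy to implement and check against Figure 14.8. The Python sketch below (ours) returns the integer numerators of n!·M_n, so the matrix itself is the result divided by n!:

```python
from math import comb

# Sketch (ours) of the m_ij formula of Section 14.6. Row i multiplies
# t^(n-i) in the product (t^n, ..., t, 1) M; note that Python evaluates
# 0**0 as 1, which the k = n term of the sum relies on.

def basis_numerators(n):
    """Integer entries of n! * M for the degree-n uniform B-spline."""
    return [[comb(n, i) * sum((n - k)**i * (-1)**(k - j) * comb(n + 1, k - j)
                              for k in range(j, n + 1))
             for j in range(n + 1)]
            for i in range(n + 1)]

M3 = basis_numerators(3)
# M3 == [[-1, 3, -3, 1], [3, -6, 3, 0], [-3, 0, 3, 0], [1, 4, 1, 0]]
```

Running it for n = 2 through 6 reproduces the matrices of Figure 14.8.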
14.7 Interpolating B-Splines

The B-spline is an approximating curve. Its shape is determined by the control points Pi, but the curve itself does not pass through those points. Instead, it passes through the joints Ki. In our notation so far, we have assumed that the cubic uniform B-spline is based on n + 1 control points and passes through n − 1 joint points. The number of control points for the cubic curve is therefore always two more than the number of joints.

One person's constant is another person's variable. —Susan Gerhart.

This section deals with the opposite problem. We show how to employ B-splines to construct an interpolating cubic spline curve that passes through a set of n + 1 given data points K0, K1, ..., Kn. The curve must consist of n segments and the idea is to use the Ki points to compute a new set of points Pi, and then use the new points as the control points of a cubic uniform B-spline curve. To obtain n cubic segments, we need n + 3 points and we denote them by P−1 through Pn+1. Using Pi as our control points, Equation (14.11) shows that the general segment Pi(t) terminates at Pi(1) = (1/6)[Pi−2 + 4Pi−1 + Pi]. We require that the segment ends at point Ki−1, which produces the equation (1/6)[Pi−2 + 4Pi−1 + Pi] = Ki−1. When this equation is repeated for 0 ≤ i ≤ n, we get a system of n + 1 equations with the Pi's as the unknowns. However, there are n + 3 unknowns (P−1 through Pn+1), so we need two more equations. The required equations are obtained by considering the tangent vectors of the interpolating curve at its two ends. We denote the tangent at the start by T1. It is given by T1 = (1/2)(P1 − P−1), so it points in the direction from P−1 to P1; similarly for the end tangent Tn = (1/2)(Pn+1 − Pn−1). After these two relations are included, the resulting system of n + 3 equations is

$$\frac{1}{6}\begin{pmatrix} -3 & 0 & 3 & 0 & \cdots & 0 & 0 & 0\\ 1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0\\ 0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0\\ & & & & \ddots & & & \\ 0 & 0 & 0 & \cdots & 1 & 4 & 1 & 0\\ 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1\\ 0 & 0 & 0 & \cdots & 0 & -3 & 0 & 3 \end{pmatrix} \begin{pmatrix} P_{-1}\\ P_0\\ P_1\\ \vdots\\ P_{n-1}\\ P_n\\ P_{n+1} \end{pmatrix} = \begin{pmatrix} T_1\\ K_0\\ K_1\\ \vdots\\ K_{n-1}\\ K_n\\ T_n \end{pmatrix}. \tag{14.18}$$

The user specifies the values of the two extreme tangents T1 and Tn, the equations are solved, and the Pi points are then used in the usual way to calculate a cubic uniform B-spline that passes through the original points Ki. This process should be compared to the similar computation of the cubic spline, Section 12.1. Specifically, Equation (14.18) should be compared with Equation (12.7). Notice that the coefficient matrix of Equation (14.18) is not diagonally dominant because of the four ±3's. We can, however, modify it slightly by writing the system of
equations in the form

$$\frac{1}{6}\begin{pmatrix} -3/2 & 0 & 3/2 & 0 & \cdots & 0 & 0 & 0\\ 1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0\\ 0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0\\ & & & & \ddots & & & \\ 0 & 0 & 0 & \cdots & 1 & 4 & 1 & 0\\ 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1\\ 0 & 0 & 0 & \cdots & 0 & -3/2 & 0 & 3/2 \end{pmatrix} \begin{pmatrix} P_{-1}\\ P_0\\ P_1\\ \vdots\\ P_{n-1}\\ P_n\\ P_{n+1} \end{pmatrix} = \begin{pmatrix} T_1/2\\ K_0\\ K_1\\ \vdots\\ K_{n-1}\\ K_n\\ T_n/2 \end{pmatrix}. \tag{14.19}$$
The coefficient matrix of Equation (14.19) is columnwise diagonally dominant and is therefore nonsingular. Thus, this system of equations has a unique solution, but this system is mathematically identical to Equation (14.18), so that system of equations also has a unique solution.

Example: This is the opposite of the example on Page 739. We start with K0 = (1/6, 5/6), K1 = (1, 1), K2 = (11/6, 5/6), and the two extreme tangents T1 = (1/2, 1/2) and T2 = (1/2, −1/2), and set up the 5×5 system of equations

$$\frac{1}{6}\begin{pmatrix} -3 & 0 & 3 & 0 & 0\\ 1 & 4 & 1 & 0 & 0\\ 0 & 1 & 4 & 1 & 0\\ 0 & 0 & 1 & 4 & 1\\ 0 & 0 & -3 & 0 & 3 \end{pmatrix} \begin{pmatrix} P_{-1}\\ P_0\\ P_1\\ P_2\\ P_3 \end{pmatrix} = \begin{pmatrix} (1/2, 1/2)\\ (1/6, 5/6)\\ (1, 1)\\ (11/6, 5/6)\\ (1/2, -1/2) \end{pmatrix}.$$

This is easy to solve and the solutions are P−1 = (0, 0), P0 = (0, 1), P1 = (1, 1), P2 = (2, 1), and P3 = (2, 0), identical to the original control points of the abovementioned example.
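Solving the small system above takes only a few lines. The sketch below (ours) uses a naive Gauss-Jordan elimination to stay self-contained and recovers the control points of the earlier example:

```python
# Sketch (ours): solve Equation (14.18) for the example's 5x5 case.

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

A = [[-3, 0, 3, 0, 0],
     [1, 4, 1, 0, 0],
     [0, 1, 4, 1, 0],
     [0, 0, 1, 4, 1],
     [0, 0, -3, 0, 3]]
# the factor 1/6 is folded into the right-hand sides
rhs_x = [6 * v for v in (1/2, 1/6, 1, 11/6, 1/2)]
rhs_y = [6 * v for v in (1/2, 5/6, 1, 5/6, -1/2)]

px = solve(A, rhs_x)   # x coords of P_{-1} .. P_3: ~ [0, 0, 1, 2, 2]
py = solve(A, rhs_y)   # y coords:                  ~ [0, 1, 1, 1, 0]
```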
14.8 A Knot Vector-Based Approach

The knot vector approach to the uniform B-spline curve assumes that the curve is a weighted sum, P(t) = Σ_{i=0}^{n} Pi B_{n,i}(t), of the control points with unknown weight functions that have to be determined. The method is similar to that used in deriving the Bézier curve (Section 13.2). The cubic uniform B-spline is used here as an example, but this approach can be applied to B-splines of any order. We assume that five control points are given—so that five weight functions, B_{4,0}(t) through B_{4,4}(t), are required—and that the curve will consist of two cubic segments. In this approach we assume that each spline segment is traced when the parameter t varies over an interval of one unit, from an integer value u to the next integer u + 1. The u values are called the knots of the B-spline. Since they are the integers 0, 1, 2, ..., they are uniformly distributed, hence the name uniform B-spline. To trace out a two-segment spline curve, t should vary in the interval [0, 2]. The guiding principle is that each weight function should be a cubic polynomial, should have a maximum at the vicinity of "its" control point, and should drop to zero
when away from the point. A general weight function should therefore have the bell shape shown in Figure 14.9a. To derive such a function, we write it as the union of four parts, b0(t), b1(t), b2(t), and b3(t), each a simple cubic polynomial, and each defined over one unit of t. Figure 14.9b shows how each weight B_{4,i}(t) is defined over a range of five knots and is zero elsewhere.

Figure 14.9: Weight Functions of the Cubic Uniform B-Spline.
The following considerations are employed to set up equations to calculate the bi(t) functions:
1. They should be barycentric.
2. They should provide C² continuity at the three points where they join.
3. b0(t) and its first two derivatives should be zero at the start point b0(0).
4. b3(t) and its first two derivatives should be zero at the end point b3(1).
14 B-Spline Approximation
We adopt the notation bi (t) = Ai t3 + Bi t2 + Ci t + Di . The conditions above yield the following equations: 1. The single equation B4,0 (0) + B4,1 (0) + B4,2 (0) + B4,3 (0) = 1. This is a special case of condition 1. We see later that the bi (t) functions resulting from our equations are, in fact, barycentric. 2. Condition 2 yields the nine equations b0 (1) = b1 (0), b1 (1) = b2 (0), b2 (1) = b3 (0),
b˙ 0 (1) = b˙ 1 (0), b˙ 1 (1) = b˙ 2 (0), b˙ 2 (1) = b˙ 3 (0),
¨b0 (1) = ¨b1 (0), ¨b1 (1) = ¨b2 (0), ¨b2 (1) = ¨b3 (0).
(14.20)
The first two derivatives of bi(t) are
$$\frac{db_i(t)}{dt} = \dot b_i(t) = 3A_i t^2 + 2B_i t + C_i, \qquad \frac{d^2 b_i(t)}{dt^2} = \ddot b_i(t) = 6A_i t + 2B_i,$$
so the nine equations above can be written explicitly as A0 + B0 + C0 + D0 = D1 , 3A0 + 2B0 + C0 = C1 , A1 + B1 + C1 + D1 = D2 , 3A1 + 2B1 + C1 = C2 , A2 + B2 + C2 + D2 = D3 , 3A2 + 2B2 + C2 = C3 ,
6A0 + 2B0 = 2B1 , 6A1 + 2B1 = 2B2 , 6A2 + 2B2 = 2B3 .
3. Condition 3 yields the three equations D0 = 0, C0 = 0, 2B0 = 0.
4. Condition 4 yields the three equations A3 + B3 + C3 + D3 = 0, 3A3 + 2B3 + C3 = 0, 6A3 + 2B3 = 0.
Thus, we end up with 16 equations that are easy to solve. Their solutions are
$$b_0(t) = \tfrac{1}{6}t^3, \quad b_1(t) = \tfrac{1}{6}(1 + 3t + 3t^2 - 3t^3), \quad b_2(t) = \tfrac{1}{6}(4 - 6t^2 + 3t^3), \quad b_3(t) = \tfrac{1}{6}(1 - 3t + 3t^2 - t^3). \tag{14.21}$$
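The four pieces of Equation (14.21) can be checked numerically: they sum to 1 (barycentric) and join with C² continuity. A Python sketch using finite-difference derivatives (the tolerances are our choices):

```python
# The four polynomial pieces of Equation (14.21).
def b0(t): return t**3 / 6
def b1(t): return (1 + 3*t + 3*t**2 - 3*t**3) / 6
def b2(t): return (4 - 6*t**2 + 3*t**3) / 6
def b3(t): return (1 - 3*t + 3*t**2 - t**3) / 6

# Barycentric: the four pieces sum to 1 for every t in [0, 1].
for i in range(11):
    t = i / 10
    assert abs(b0(t) + b1(t) + b2(t) + b3(t) - 1) < 1e-12

# C2 joins: value, first, and second derivative agree where pieces meet.
h = 1e-5
def d1(f, t): return (f(t + h) - f(t - h)) / (2 * h)
def d2(f, t): return (f(t + h) - 2 * f(t) + f(t - h)) / h**2

for f, g in ((b0, b1), (b1, b2), (b2, b3)):
    assert abs(f(1) - g(0)) < 1e-9
    assert abs(d1(f, 1) - d1(g, 0)) < 1e-6
    assert abs(d2(f, 1) - d2(g, 0)) < 1e-4
print("barycentric and C2 conditions hold")
```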
The proof that the bi(t) functions are barycentric is now trivial. Figure 14.9c shows the shapes of the four weights. Now that the weight functions are known, the entire curve can be expressed as the weighted sum $P(t) = \sum_{i=0}^{n} P_i B_{4,i}(t)$, where the weights all look the same and are shifted with respect to each other by using different ranges for t. Each weight B4,i(t) is nonzero only in the (open) interval (ui−3, ui+1) (Figure 14.9b). Each curve segment Pi(t) can now be expressed as the barycentric sum of the four weighted points Pi−3 through Pi (or, alternatively, as a linear combination of the B4,i(t) functions), $P_i(t) = \sum_{j=-3}^{0} P_{i+j} B_{4,i+j}(t)$, where ui ≤ t < ui+1. The next (crucial) step
is to realize that in the range ui ≤ t < ui+1, only component b3 of B4,i−3 is nonzero, and similarly for the other three weights (see the dashed box of Figure 14.9b). The segment can therefore be written
$$
\begin{aligned}
P_i(t) &= \sum_{j=0}^{3} P_{i-j}\, b_j(t) \\
&= \tfrac{1}{6} P_{i-3}(-t^3 + 3t^2 - 3t + 1) + \tfrac{1}{6} P_{i-2}(3t^3 - 6t^2 + 4) \\
&\quad + \tfrac{1}{6} P_{i-1}(-3t^3 + 3t^2 + 3t + 1) + \tfrac{1}{6} P_i\, t^3 \\
&= (t^3, t^2, t, 1)\,\frac{1}{6}
\begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 0 & 3 & 0 \\ 1 & 4 & 1 & 0 \end{pmatrix}
\begin{pmatrix} P_{i-3} \\ P_{i-2} \\ P_{i-1} \\ P_i \end{pmatrix},
\end{aligned}
\tag{14.22}
$$
an expression identical (except for the choice of index i) to Equation (14.11). This approach to deriving the weight functions can be generalized for the nonuniform B-spline.

The dashed box of Figure 14.9b illustrates how the B4,i(t) weight functions blend the five control points in the two spline segments. The first weight, B4,0(t), goes down from 1/6 to 0 when t varies from 0 to 1. Thus, the first control point P0 starts by contributing 1/6 of its value to the curve, then decreases its contribution until it disappears at t = 1. This is why P0 does not contribute to the second segment. The second weight, B4,1(t), starts at 2/3 (when t = 0), goes down to 1/6 for t = 1, then all the way to 0 when t reaches 2. This is how the second control point P1 participates in the blend that generates the first two spline segments. Notice how the weight functions have their maxima at integer values of t, how only three weights are nonzero at these values, and how there are four nonzero weights for any other values of t.

Figure 14.10a shows the weight functions for the linear uniform B-spline. Each has the form of a hat, going from 0 to 1 and back to 0. They also have their maxima at integer values of t. The weight functions of the quadratic B-spline are shown in Figure 14.10b. Notice how each varies from 0 to 3/4, how they meet at a height of 1/2, and how their maxima are at half-integer values of t. The first weight, B3,0(t), drops from 1/2 to 0 for the first spline segment (i.e., when t varies in the interval [0, 1]) and remains zero for the second and subsequent segments. The second weight, B3,1(t), climbs from 1/2 to 3/4, then drops back to 1/2 for the first segment. For the second segment, this weight goes down from 1/2 to 0. These diagrams provide a clear understanding of how the control points are blended by the uniform B-spline. The general B-spline weight functions are normally denoted by Nik(t) and can be defined recursively.
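The matrix form of Equation (14.22) is easy to exercise against the explicit polynomial blend; this Python sketch uses the first four control points of the earlier example (the function name `segment` is ours):

```python
# The cubic uniform B-spline basis matrix of Equation (14.22) (times 1/6).
M = [[-1,  3, -3, 1],
     [ 3, -6,  3, 0],
     [-3,  0,  3, 0],
     [ 1,  4,  1, 0]]

def segment(pts, t):
    """pts = (P_{i-3}, P_{i-2}, P_{i-1}, P_i); point on the segment at t in [0, 1)."""
    T = (t**3, t**2, t, 1)
    # Row vector T times (1/6)M gives the four blending weights ...
    w = [sum(T[r] * M[r][c] for r in range(4)) / 6 for c in range(4)]
    # ... which multiply P_{i-3} through P_i in that order.
    return tuple(sum(wc * p[k] for wc, p in zip(w, pts)) for k in range(2))

pts = ((0, 0), (0, 1), (1, 1), (2, 1))
# At t = 0 the segment starts at (P_{i-3} + 4 P_{i-2} + P_{i-1})/6.
assert all(abs(a - b) < 1e-12 for a, b in zip(segment(pts, 0), (1/6, 5/6)))
# At interior t the matrix form agrees with the polynomial blend.
t = 0.4
direct = tuple((pts[0][k] * (-t**3 + 3*t**2 - 3*t + 1)
                + pts[1][k] * (3*t**3 - 6*t**2 + 4)
                + pts[2][k] * (-3*t**3 + 3*t**2 + 3*t + 1)
                + pts[3][k] * t**3) / 6 for k in range(2))
assert all(abs(a - b) < 1e-12 for a, b in zip(segment(pts, t), direct))
print("matrix form matches the polynomial blend")
```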
Before delving into this topic, however, we show how the uniform B-spline curve itself can be defined recursively, similar to the recursive definition of the Bézier curve (Equation (13.11)). Given a set of n + 1 control points P0 through Pn and a uniform knot vector (t0, t1, . . . , tn+k) (a set of equally-spaced n + k + 1 nondecreasing real numbers), the B-spline of order k is defined as
$$P(t) = P_l^{(k-1)}(t), \quad \text{where } t_l \le t < t_{l+1}, \tag{14.23}$$
[Figure 14.10: Weight Functions of the Linear and the Quadratic B-Splines. (a) The linear weights B2,i(t): hats rising from 0 to 1 and back. (b) The quadratic weights B3,i(t): each rises to 3/4 and crosses its neighbor at height 1/2.]
and where the quantities $P_i^{(j)}(t)$ are defined recursively by
$$P_i^{(j)}(t) = \begin{cases} P_i, & \text{for } j = 0, \\ (1 - T_i^j)\,P_{i-1}^{(j-1)}(t) + T_i^j\, P_i^{(j-1)}(t), & \text{for } j > 0, \end{cases} \qquad \text{and } T_i^j = \frac{t - t_i}{t_{i+k-j} - t_i}.$$

Figure 14.11 is a pyramid that illustrates how the quantities $P_l^{(k-1)}(t)$ are constructed recursively. Each $P_i^{(j)}(t)$ in the figure is constructed as a barycentric sum of the two quantities immediately to its left. Equation (14.23) is the geometric definition of the uniform B-spline.

We now turn to the algebraic (or analytical) definition of the general (uniform and nonuniform) B-spline curve. It is defined as the weighted sum
$$P(t) = \sum_{i=0}^{n} P_i N_{ik}(t),$$
where the weight functions Nik(t) are defined recursively by
$$N_{i1}(t) = \begin{cases} 1, & \text{if } t \in [t_i, t_{i+1}), \\ 0, & \text{otherwise}, \end{cases} \tag{14.24}$$
[Figure 14.11: Recursive Construction of $P_l^{(k-1)}(t)$. A pyramid whose leftmost column lists the control points $P_{l-k+1}$ through $P_l$; each $P_i^{(j)}$ is obtained from the two quantities immediately to its left, ending in the single quantity $P_l^{(k-1)}$.]
(note how the interval starts at ti but does not reach ti+1; such an interval is closed on the left and open on the right) and
$$N_{ik}(t) = \frac{t - t_i}{t_{i+k-1} - t_i}\, N_{i,k-1}(t) + \frac{t_{i+k} - t}{t_{i+k} - t_{i+1}}\, N_{i+1,k-1}(t), \quad \text{where } 0 \le i \le n. \tag{14.25}$$
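The recursion of Equations (14.24) and (14.25) translates directly into code. A Python sketch (the `ratio` helper implements the convention, stated later in the text, that terms of the form 0/0 or x/0 are taken as zero; all names are ours):

```python
def N(i, k, t, knots):
    """B-spline weight N_{ik}(t) from the recursion (14.24)-(14.25)."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    def ratio(num, den):
        return num / den if den != 0 else 0.0   # 0/0 and x/0 count as zero
    return (ratio(t - knots[i], knots[i + k - 1] - knots[i]) * N(i, k - 1, t, knots)
            + ratio(knots[i + k] - t, knots[i + k] - knots[i + 1]) * N(i + 1, k - 1, t, knots))

knots = list(range(12))            # the uniform knot vector (0, 1, ..., 11)
# k = 2 gives a "hat": N_02 rises to 1 at t = 1 and falls back to 0 at t = 2.
assert N(0, 2, 1.0, knots) == 1.0
assert N(0, 2, 0.5, knots) == 0.5
assert N(0, 2, 1.5, knots) == 0.5
# Support: N_{ik} is nonzero only inside [i, i+k).
assert N(0, 3, 3.5, knots) == 0.0
print("the recursion reproduces the hat functions")
```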
The weights Nik(t) may be tedious to calculate in the general case, where the knots ti can be arbitrary, but are easy to calculate in the special case where the knot vector is the uniform sequence (0, 1, . . . , n + k), i.e., when ti = i. Here are examples for the first few values of k.

For k = 1, the weight functions are defined by
$$N_{i1}(t) = \begin{cases} 1, & \text{if } t \in [i, i+1), \\ 0, & \text{otherwise}. \end{cases} \tag{14.26}$$
This results in the "step" functions shown in Figure 14.12. Notice how each step is closed on the left and open on the right and how Ni1(t) is nonzero only in the interval [i, i + 1) (this interval is its support). It is also clear that each of them is a shifted version of its predecessor, so we can express any of them as a shifted version of the first one and write Ni1(t) = N01(t − i).

For k = 2, the weight functions can be calculated for any i from Equation (14.25):
$$N_{02}(t) = \frac{t - t_0}{t_1 - t_0} N_{01}(t) + \frac{t_2 - t}{t_2 - t_1} N_{11}(t)$$
Figure 14.12: Uniform B-Spline Weight Functions for k = 1.
$$
\begin{aligned}
&= t\,N_{01}(t) + (2 - t)\,N_{11}(t)
= \begin{cases} t, & 0 \le t < 1, \\ 2 - t, & 1 \le t < 2, \\ 0, & \text{otherwise}, \end{cases} \\[4pt]
N_{12}(t) &= \frac{t - t_1}{t_2 - t_1} N_{11}(t) + \frac{t_3 - t}{t_3 - t_2} N_{21}(t)
= (t - 1)\,N_{11}(t) + (3 - t)\,N_{21}(t)
= \begin{cases} t - 1, & 1 \le t < 2, \\ 3 - t, & 2 \le t < 3, \\ 0, & \text{otherwise}, \end{cases} \\[4pt]
N_{22}(t) &= \frac{t - t_2}{t_3 - t_2} N_{21}(t) + \frac{t_4 - t}{t_4 - t_3} N_{31}(t)
= (t - 2)\,N_{21}(t) + (4 - t)\,N_{31}(t)
= \begin{cases} t - 2, & 2 \le t < 3, \\ 4 - t, & 3 \le t < 4, \\ 0, & \text{otherwise}. \end{cases}
\end{aligned}
$$
The hat-shaped functions are shown in Figure 14.13. Notice how Ni2(t) spans the interval [i, i + 2). It is also obvious that each of them is a shifted version of its predecessor, so we can express any of them as a shifted version of the first one and write $N_{i2}(t) =$
Figure 14.13: Uniform B-Spline Weight Functions for k = 2.
$N_{02}(t - i)$. For k = 3, the calculations are similar:
$$
\begin{aligned}
N_{03}(t) &= \frac{t - t_0}{t_2 - t_0} N_{02}(t) + \frac{t_3 - t}{t_3 - t_1} N_{12}(t)
= \frac{t}{2}\, N_{02}(t) + \frac{3 - t}{2}\, N_{12}(t) \\
&= \begin{cases} t^2/2, & 0 \le t < 1, \\ \dfrac{t}{2}(2 - t) + \dfrac{3 - t}{2}(t - 1), & 1 \le t < 2, \\ (3 - t)^2/2, & 2 \le t < 3, \\ 0, & \text{otherwise}, \end{cases}
\;=\; \begin{cases} t^2/2, & 0 \le t < 1, \\ (-2t^2 + 6t - 3)/2, & 1 \le t < 2, \\ (3 - t)^2/2, & 2 \le t < 3, \\ 0, & \text{otherwise}, \end{cases} \\[6pt]
N_{13}(t) &= \frac{t - t_1}{t_3 - t_1} N_{12}(t) + \frac{t_4 - t}{t_4 - t_2} N_{22}(t)
= \frac{t - 1}{2}\, N_{12}(t) + \frac{4 - t}{2}\, N_{22}(t) \\
&= \begin{cases} (t - 1)^2/2, & 1 \le t < 2, \\ (-2t^2 + 10t - 11)/2, & 2 \le t < 3, \\ (4 - t)^2/2, & 3 \le t < 4, \\ 0, & \text{otherwise}. \end{cases}
\end{aligned}
$$
Each of these curves (Figure 14.14) is a spline whose three segments are quadratic polynomials (i.e., parabolic arcs) joined smoothly at the knots. Notice again that the support of Ni3 (t) is the interval [i, i + 3) and that they are shifted versions of each other, allowing us to write Ni3 (t) = N03 (t − i).
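Both the closed forms just derived and the shift property can be confirmed numerically. With the uniform knots ti = i, both denominators in Equation (14.25) reduce to k − 1, which the Python sketch below exploits (all names are ours):

```python
# Quadratic weight N_03 in closed form, pieced from the derivation above ...
def N03(t):
    if 0 <= t < 1: return t**2 / 2
    if 1 <= t < 2: return (-2*t**2 + 6*t - 3) / 2
    if 2 <= t < 3: return (3 - t)**2 / 2
    return 0.0

# ... and the same function from the recursion with uniform knots t_i = i,
# where both denominators of Equation (14.25) equal k - 1.
def N(i, k, t):
    if k == 1:
        return 1.0 if i <= t < i + 1 else 0.0
    return ((t - i) * N(i, k - 1, t) + (i + k - t) * N(i + 1, k - 1, t)) / (k - 1)

for j in range(30):
    t = j / 10 + 0.05
    assert abs(N03(t) - N(0, 3, t)) < 1e-12          # closed form = recursion
    assert abs(N(1, 3, t + 1) - N(0, 3, t)) < 1e-12  # shift: N_13(t+1) = N_03(t)
print("closed forms and the shift property check out")
```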
Figure 14.14: Uniform B-Spline Weight Functions for k = 3.
Exercise 14.9: How can we show that the various Ni3 (t) are shifted versions of each other? In general, the support of Nik (t) is the interval [i, i + k) and Nik (t) = N0k (t − i). Figure 14.15 shows how a general weight function Nik (t) is constructed recursively. Each Nij (t) function in this triangle is constructed as a weighted sum of the two functions immediately to its left. The geometric and algebraic definitions of the B-spline look different but it can be shown that they are identical. The proof of this is called the Cox–DeBoor (or DeBoor– Cox) formula [DeBoor 72].
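The equality of the two definitions can at least be spot-checked numerically. The following Python sketch implements the geometric recursion of Equation (14.23) and the algebraic weighted sum built from Equations (14.24)–(14.25), and compares them for a quadratic B-spline (the control points and parameter values are arbitrary choices of ours):

```python
def N(i, k, t, knots):
    """Algebraic definition: the recursion of (14.24)-(14.25), with 0/0 taken as 0."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    r = lambda num, den: num / den if den else 0.0
    return (r(t - knots[i], knots[i + k - 1] - knots[i]) * N(i, k - 1, t, knots)
            + r(knots[i + k] - t, knots[i + k] - knots[i + 1]) * N(i + 1, k - 1, t, knots))

def P_geom(i, j, t, k, pts, knots):
    """Geometric definition: the recursive construction of Equation (14.23)."""
    if j == 0:
        return pts[i]
    T = (t - knots[i]) / (knots[i + k - j] - knots[i])
    a = P_geom(i - 1, j - 1, t, k, pts, knots)
    b = P_geom(i, j - 1, t, k, pts, knots)
    return tuple((1 - T) * ac + T * bc for ac, bc in zip(a, b))

pts = [(0, 0), (1, 2), (3, 2), (4, 0), (5, 1)]   # n + 1 = 5 control points
k = 3                                            # quadratic B-spline
knots = list(range(len(pts) + k))                # uniform knots 0, 1, ..., n+k

for t in (2.25, 2.5, 3.75, 4.5):                 # t in [t_{k-1}, t_{n+1})
    l = int(t)                                   # t_l <= t < t_{l+1}
    geom = P_geom(l, k - 1, t, k, pts, knots)
    alg = tuple(sum(N(i, k, t, knots) * p[c] for i, p in enumerate(pts)) for c in range(2))
    assert all(abs(g - a) < 1e-12 for g, a in zip(geom, alg))
print("geometric and algebraic definitions agree at the sampled points")
```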
Figure 14.15: Recursive Construction of Ni,k (t).
14.9 Recursive Definitions of the B-Spline

The order k of the B-spline curve is an integer in the interval [2, n + 1] (it is possible to have k = 1, but the curve degenerates in this case to just a plot of the control points). Each blending function Nik(t) has support over the k subintervals of [ti, ti+k) and is zero outside its support. The knot vector (t0, t1, . . . , tn+k) consists of n + k + 1 nondecreasing real numbers ti. These values define n + k subintervals [ti, ti+1). The two extreme values t0 and tn are selected based on the values of n and k. Any terms of the form 0/0 or x/0 in the calculation of the blending functions are assumed to be zero.

Editing the B-spline curve can be done by (1) adding, moving, or deleting control points without changing the order k, (2) changing the order k without modifying the control points, and (3) increasing the size of the knot vector. The knot vector contains n + k + 1 values, so increasing its size implies that either n or k should be increased. Here are a few more properties of the curve:
1. Plotting the B-spline curve is done by varying the parameter t over the range of knot values [tk−1, tn+1).
2. Each segment of the curve (between two consecutive knot values) depends on k control points. This is why the curve has local control, and it also implies that the maximum value of k is the number n + 1 of control points.
3. Any control point participates in at most k segments.
4. The curve lies inside the convex hull defined by at most k control points. This means that the curve passes close to the control points, a feature that makes it easy for a designer to place these points in order to obtain the right curve shape.
5. The blending functions Nik(t) are barycentric for any t in the interval [tk−1, tn+1). They are also nonnegative and, except for k = 1, each has one maximum.
6.
The curve and its first k − 1 derivatives are continuous over the entire range (except that nonuniform B-splines can have discontinuities, see Figure 14.19d). 7. The entire curve can be affinely transformed by transforming the control points, then redrawing the curve from the new points. One important difference between the B-spline and the B´ezier curve is the use of a knot vector. This feature (which has already been mentioned) consists of a nondecreasing sequence of real numbers called knots. The knot vector adds flexibility to the
curve and provides better control of its shape, but its use requires experience. There are three common ways to select the values in the knot vector, namely uniform, open uniform, and nonuniform. In a uniform B-spline the knot values are equally spaced. An example is (−2, −1.5, −1, −0.5, 0, 0.5, 1, 1.5), but more typical examples are a vector with normalized values between 0 and 1 (0, 0.2, 0.4, 0.6, 0.8, 1) or a vector with integer values (0, 1, 2, 3, 4, 5, 6). Figure 14.16 lists Mathematica code to compute, print, and plot the weight functions for any set of knots.
(* B-spline weight functions printed and plotted *)
Clear[bspl,knt,i,k,n,t,p]
bspl[i_,k_,t_]:=If[knt[[i+k]]==knt[[i+1]],0, (*0/0*)
(24.18)
and $C^T$ is the transpose of C. (The product of two matrices $A_{mp}$ and $B_{pn}$ is a matrix $C_{mn}$ defined by
$$C_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}.$$
For other properties of matrices, see any text on linear algebra.) Calculating one matrix element of the product CP therefore requires eight multiplications and seven (but for simplicity let's say eight) additions. Multiplying the two 8×8 matrices C and P requires 64×8 = 8³ multiplications and the same number of additions. Multiplying the product CP by $C^T$ requires the same number of operations, so the DCT of one 8×8 data unit requires 2×8³ multiplications (and the same number of additions). Assuming that the entire image consists of n×n pixels and that n = 8q, there are q×q data units, so the DCT of all the data units requires 2q²8³ multiplications (and the same number of additions). In comparison, performing one DCT for the entire image would require 2n³ = 2q³8³ = (2q²8³)q operations. By dividing the image into data units, we reduce the number of multiplications (and also of additions) by a factor of q. Unfortunately, q cannot be too large, because that would mean very small data units.

Recall that a color image consists of three components (often RGB, but sometimes YCbCr or YPbPr). In JPEG, the DCT is applied to each component separately, bringing the total number of arithmetic operations to 3×2q²8³ = 3,072q². For a 512×512-pixel image, this implies 3072×64² = 12,582,912 multiplications (and the same number of additions).

3. Another way to speed up the DCT is to perform all the arithmetic operations on fixed-point (scaled integer) rather than on floating-point numbers. On many computers, operations on fixed-point numbers require (somewhat) sophisticated programming techniques, but they are considerably faster than floating-point operations (except on supercomputers, which are optimized for floating-point arithmetic).
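The two-matrix-product form of the 2D DCT is easy to exercise directly. The sketch below assumes the standard orthonormal DCT-2 matrix (the usual form of the matrix C of Equation (24.18), which is not reproduced here); orthonormality is what makes the inverse transform a matter of transposing C:

```python
import math

n = 8
# The standard orthonormal DCT-2 matrix: row i = frequency, column j = sample.
C = [[math.sqrt((1 if i == 0 else 2) / n) * math.cos((2 * j + 1) * i * math.pi / (2 * n))
      for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

Ct = [list(col) for col in zip(*C)]          # the transpose of C
I = matmul(C, Ct)                            # orthonormality: C C^T = identity
assert all(abs(I[i][j] - (1 if i == j else 0)) < 1e-12 for i in range(n) for j in range(n))

P = [[(i * j) % 7 for j in range(n)] for i in range(n)]   # an arbitrary 8x8 block
G = matmul(matmul(C, P), Ct)                 # 2D DCT: G = C P C^T
R = matmul(matmul(Ct, G), C)                 # inverse: P = C^T G C
assert all(abs(R[i][j] - P[i][j]) < 1e-9 for i in range(n) for j in range(n))
print("the 2D DCT round-trips via two matrix multiplications each way")
```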
24.3 The Discrete Cosine Transform
The DCT algorithm with the smallest currently known number of arithmetic operations is described in [Feig and Linzer 90]. Today, there are also various VLSI chips that perform this calculation efficiently.
24.3.6 The LLM Method

This section describes the Loeffler–Ligtenberg–Moschytz (LLM) method for the DCT in one dimension [Loeffler et al. 89]. Developed in 1989 by Christoph Loeffler, Adriaan Ligtenberg, and George S. Moschytz, this algorithm computes the DCT in one dimension with a total of 29 additions and 11 multiplications.

Recall that the DCT in one dimension involves multiplying a row vector by a matrix. For n = 8, multiplying the row by one column of the matrix requires eight multiplications and seven additions, so the total number of operations required for the entire operation is 64 multiplications and 56 additions. Reducing the number of multiplications from 64 to 11 represents a savings of 83% and reducing the number of additions from 56 to 29 represents a savings of 49%: very significant!

Only the final result is listed here and the interested reader is referred to the original publication for the details. We start with the double sum of Equation (24.13) and claim that a little algebraic tinkering reduces it to the form $CPC^T$, where P is the 8×8 matrix of the pixels, C is the matrix defined by Equation (24.18), and $C^T$ is the transpose of C. In the one-dimensional case, only one matrix multiplication, namely PC, is needed. The originators of this method show that matrix C can be written (up to a factor of $\sqrt{8}$) as the product of seven simple matrices, as shown in Figure 24.32. Even though the number of matrices has been increased, the problem has been simplified, because our seven matrices are sparse and contain mostly 1's and −1's. Multiplying by 1 or by −1 does not require a multiplication, and multiplying something by 0 saves an addition. Table 24.31 summarizes the total number of arithmetic operations required to multiply a row vector by the seven matrices.

    Matrix   Additions   Multiplications
    C1           0              0
    C2           8             12
    C3           4              0
    C4           2              0
    C5           0              2
    C6           4              0
    C7           8              0
    Total       26             14

Table 24.31: Number of Arithmetic Operations.
These surprisingly small numbers can be reduced further by the following observation. We notice that matrix $C_2$ has three groups of four cosines each. One of the groups consists of (we ignore the $\sqrt{2}$) two $\cos\frac{6}{16}\pi$ and two $\cos\frac{2}{16}\pi$ (one with a negative sign). We use the trigonometric identity $\cos(\frac{\pi}{2} - \alpha) = \sin\alpha$ to replace the two $\pm\cos\frac{2}{16}\pi$ with $\pm\sin\frac{6}{16}\pi$. Multiplying any matrix by $C_2$ now results in products of the form $A\cos(\frac{6}{16}\pi) - B\sin(\frac{6}{16}\pi)$ and $B\cos(\frac{6}{16}\pi) + A\sin(\frac{6}{16}\pi)$. It seems that computing
24 Transforms and JPEG
[Figure 24.32: Product of Seven Matrices. The 8×8 DCT matrix C written (up to a factor of $\sqrt{8}$) as $C = C_1 C_2 C_3 C_4 C_5 C_6 C_7$, where the seven factors are sparse matrices whose nonzero elements are mostly ±1, plus a few entries of the form $\sqrt{2}\cos\frac{k\pi}{16}$ and $1/\sqrt{2}$.]
these two elements requires four multiplications and two additions (assuming that a subtraction takes the same time to execute as an addition). The following computation, however, yields the same result with three additions and three multiplications:
$$T = (A + B)\cos\alpha, \qquad T - B(\cos\alpha - \sin\alpha), \qquad -T + A(\cos\alpha + \sin\alpha).$$
Thus, the three groups now require nine additions and nine multiplications instead of the original six additions and 12 multiplications (two more additions are needed for the other nonzero elements of C2 ), which brings the totals of Table 24.31 down to 29 additions and 11 multiplications. There is no national science just as there is no national multiplication table; what is national is no longer science. —Anton Chekhov.
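The three-multiplication form is easy to verify: with T = (A + B) cos α, the two printed expressions reduce to A cos α + B sin α and A sin α − B cos α (the rotated pair, up to the sign conventions of the derivation). In hardware, the constants cos α ± sin α are precomputed, so forming them costs no extra operations. A Python check (the names are ours):

```python
import math

def rotate3(A, B, alpha):
    """The rotated pair computed with 3 multiplications and 3 additions."""
    T = (A + B) * math.cos(alpha)                      # mult 1, add 1
    x = T - B * (math.cos(alpha) - math.sin(alpha))    # mult 2, add 2
    y = -T + A * (math.cos(alpha) + math.sin(alpha))   # mult 3, add 3
    return x, y

A, B, alpha = 1.7, -0.6, 6 * math.pi / 16
x, y = rotate3(A, B, alpha)
# Same values as the direct form, which needs four multiplications:
assert abs(x - (A * math.cos(alpha) + B * math.sin(alpha))) < 1e-12
assert abs(y - (A * math.sin(alpha) - B * math.cos(alpha))) < 1e-12
print("three multiplications suffice")
```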
24.3.7 Hardware Implementation of the DCT

Table 24.29 lists the 64 angle values of the DCT-2 for n = 8. When the cosines of those angles are computed, we find that because of the symmetry of the cosine function, there are only six distinct nontrivial cosine values. They are summarized in Table 24.33, where $a = 1/\sqrt{2}$, $b_i = \cos(i\pi/16)$, and $c_i = \cos(i\pi/8)$. The six nontrivial values are $b_1$, $b_3$, $b_5$, $b_7$, $c_1$, and $c_3$.

    1    b1    c1    b3    a    b5    c3    b7
    1    b3    c3   −b7   −a   −b1   −c1   −b5
    1    b5   −c3   −b1   −a    b7    c1    b3
    1    b7   −c1   −b5    a    b3   −c3   −b1
    1   −b7   −c1    b5    a   −b3   −c3    b1
    1   −b5   −c3    b1   −a   −b7    c1   −b3
    1   −b3    c3    b7   −a    b1   −c1    b5
    1   −b1    c1   −b3    a   −b5    c3   −b7

Table 24.33: Six Distinct Cosine Values for the DCT-2.
This feature can be exploited in a fast software implementation of the DCT or to make a simple hardware device to compute the DCT coefficients $G_i$ for eight pixel values $p_i$. Figure 24.34 shows how such a device may be organized in two parts, each computing four of the eight coefficients $G_i$. Part I is based on a 4×4 symmetric matrix whose elements are the four distinct $b_i$'s. The eight pixels are divided into four groups of two pixels each. The two pixels of each group are subtracted, and the four differences become a row vector that's multiplied by the four columns of the matrix to produce the four DCT coefficients $G_1$, $G_3$, $G_5$, and $G_7$. Part II is based on a similar 4×4 matrix whose nontrivial elements are the two $c_i$'s. The computations are similar except that the two pixels of each group are added instead of subtracted.

$$\big(p_0 - p_7,\; p_1 - p_6,\; p_2 - p_5,\; p_3 - p_4\big)
\begin{pmatrix} b_1 & b_3 & b_5 & b_7 \\ b_3 & -b_7 & -b_1 & -b_5 \\ b_5 & -b_1 & b_7 & b_3 \\ b_7 & -b_5 & b_3 & -b_1 \end{pmatrix}
\rightarrow [G_1, G_3, G_5, G_7], \tag{I}$$

$$\big(p_0 + p_7,\; p_1 + p_6,\; p_2 + p_5,\; p_3 + p_4\big)
\begin{pmatrix} 1 & c_1 & a & c_3 \\ 1 & c_3 & -a & -c_1 \\ 1 & -c_3 & -a & c_1 \\ 1 & -c_1 & a & -c_3 \end{pmatrix}
\rightarrow [G_0, G_2, G_4, G_6]. \tag{II}$$

Figure 24.34: A Hardware Implementation of the DCT-2.
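The split into two 4×4 products can be verified against the direct (unnormalized) DCT-2 sums; a Python sketch with an arbitrary pixel row (all names are ours):

```python
import math

b = [math.cos(i * math.pi / 16) for i in range(8)]   # b_i = cos(i pi/16)
c = [math.cos(i * math.pi / 8) for i in range(8)]    # c_i = cos(i pi/8)
a = 1 / math.sqrt(2)

p = [52, 55, 61, 66, 70, 61, 64, 73]                 # an arbitrary row of 8 pixels

# Direct (unnormalized) DCT-2: G_k = sum_j p_j cos((2j+1) k pi / 16).
G = [sum(p[j] * math.cos((2*j + 1) * k * math.pi / 16) for j in range(8)) for k in range(8)]

# Part I: the four differences drive the odd coefficients.
d = [p[0] - p[7], p[1] - p[6], p[2] - p[5], p[3] - p[4]]
MI = [[b[1],  b[3],  b[5],  b[7]],
      [b[3], -b[7], -b[1], -b[5]],
      [b[5], -b[1],  b[7],  b[3]],
      [b[7], -b[5],  b[3], -b[1]]]
odd = [sum(d[r] * MI[r][col] for r in range(4)) for col in range(4)]

# Part II: the four sums drive the even coefficients.
s = [p[0] + p[7], p[1] + p[6], p[2] + p[5], p[3] + p[4]]
MII = [[1,  c[1],  a,  c[3]],
       [1,  c[3], -a, -c[1]],
       [1, -c[3], -a,  c[1]],
       [1, -c[1],  a, -c[3]]]
even = [sum(s[r] * MII[r][col] for r in range(4)) for col in range(4)]

assert all(abs(odd[i] - G[2*i + 1]) < 1e-9 for i in range(4))
assert all(abs(even[i] - G[2*i]) < 1e-9 for i in range(4))
print("the two 4x4 halves reproduce all eight DCT coefficients")
```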
Figure 24.35 (after [Chen et al. 77]) illustrates how such a device can be constructed out of simple adders, complementors, and multipliers. Notation such as c(3π/16) in the figure refers to the cosine function.
Figure 24.35: A Hardware Implementation of the DCT-2.
24.3.8 QR Matrix Decomposition This section provides background material on the technique of QR matrix decomposition. It is intended for those already familiar with matrices who want to master this method. Any matrix A can be factored into the matrix product Q × R, where Q is an orthogonal matrix and R is upper triangular. If A is also orthogonal, then R will also be orthogonal. However, an upper triangular matrix that’s also orthogonal must be diagonal. The orthogonality of R implies R−1 = RT and its being diagonal implies R−1 ×R = I. The conclusion is that if A is orthogonal, then R must satisfy RT ×R = I, which means that its diagonal elements must be +1 or −1. If A = Q×R and R has this form, then A and Q are identical, except that columns i of A and Q will have opposite signs for all values of i where Ri,i = −1. The QR decomposition of matrix A into Q and R is done by a loop where each iteration converts one element of A to zero. When all the below-diagonal elements of A have been zeroed, it becomes the upper triangular matrix R. Each element Ai,j is zeroed by multiplying A by a Givens rotation matrix Ti,j (Section 4.4.4). This is an
antisymmetric matrix where the two diagonal elements $T_{i,i}$ and $T_{j,j}$ are set to the cosine of a certain angle θ, and the two off-diagonal elements $T_{j,i}$ and $T_{i,j}$ are set to the sine and negative sine, respectively, of the same θ. The sine and cosine of θ are defined as
$$\cos\theta = \frac{A_{j,j}}{D}, \qquad \sin\theta = \frac{A_{i,j}}{D}, \qquad \text{where } D = \sqrt{A_{j,j}^2 + A_{i,j}^2}.$$
Following are some examples of Givens rotation matrices:
$$
\begin{pmatrix} c & s \\ -s & c \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 & 0 \\ 0 & c & s \\ 0 & -s & c \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & c & 0 & s \\ 0 & 0 & 1 & 0 \\ 0 & -s & 0 & c \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & c & 0 & s & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & -s & 0 & c & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}. \tag{24.19}
$$
Those familiar with rotation matrices will recognize that a Givens matrix [Givens 58] rotates a point through an angle whose sine and cosine are the s and c of Equation (24.19). In two dimensions, the rotation is done about the origin. In three dimensions, it is done about one of the coordinate axes (the x axis in Equation (24.19)). In four dimensions, the rotation is about two of the four coordinate axes (the first and third in Equation (24.19)) and cannot be visualized. In general, an n×n Givens matrix rotates a point about n − 2 coordinate axes of an n-dimensional space.

Figure 24.36 is a Matlab function for the QR decomposition of a matrix A. Notice how Q is obtained as the product of the individual Givens matrices and how the double loop zeros all the below-diagonal elements column by column.

    function [Q,R]=QRdecompose(A);
    % Computes the QR decomposition of matrix A
    % R is an upper triangular matrix and Q
    % an orthogonal matrix such that A=Q*R.
    [m,n]=size(A); % determine the dimensions of A
    Q=eye(m); % Q starts as the mxm identity matrix
    R=A;
    for p=1:n
     for q=(1+p):m
      w=sqrt(R(p,p)^2+R(q,p)^2);
      s=-R(q,p)/w; c=R(p,p)/w;
      U=eye(m); % Construct a U matrix for Givens rotation
      U(p,p)=c; U(q,p)=-s; U(p,q)=s; U(q,q)=c;
      R=U'*R; % one Givens rotation
      Q=Q*U;
     end
    end

Figure 24.36: A Matlab Function for the QR Decomposition of a Matrix.
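A Python version of the same Givens-based QR loop, for readers without Matlab (a sketch; instead of building full rotation matrices U, it applies each rotation to the two affected rows of R and accumulates its transpose into the two affected columns of Q):

```python
import math

def qr_givens(A):
    """QR decomposition by Givens rotations: A = Q R, R upper triangular."""
    m, n = len(A), len(A[0])
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    R = [row[:] for row in A]
    for p in range(n):                       # zero below-diagonal entries column by column
        for q in range(p + 1, m):
            w = math.hypot(R[p][p], R[q][p])
            if w == 0:
                continue
            c, s = R[p][p] / w, R[q][p] / w
            for j in range(n):               # rotate rows p and q of R
                Rp, Rq = R[p][j], R[q][j]
                R[p][j] = c * Rp + s * Rq
                R[q][j] = -s * Rp + c * Rq
            for i in range(m):               # accumulate the inverse rotation into Q
                Qp, Qq = Q[i][p], Q[i][q]
                Q[i][p] = c * Qp + s * Qq
                Q[i][q] = -s * Qp + c * Qq
    return Q, R

A = [[2.0, 1.0, 3.0], [4.0, -1.0, 0.0], [1.0, 5.0, 2.0]]
Q, R = qr_givens(A)
# R is upper triangular, and Q*R reconstructs A.
assert all(abs(R[i][j]) < 1e-12 for i in range(3) for j in range(3) if i > j)
prod = [[sum(Q[i][k] * R[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
assert all(abs(prod[i][j] - A[i][j]) < 1e-9 for i in range(3) for j in range(3))
print("A = Q R with upper-triangular R")
```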
“Computer!” shouted Zaphod, “rotate angle of vision through one-eighty degrees and don't talk about it!”
—Douglas Adams, The Hitchhiker's Guide to the Galaxy.
24.3.9 Vector Spaces

The discrete cosine transform can also be interpreted as a change of basis in a vector space from the standard basis to the DCT basis, so this section is a short discussion of vector spaces, their relation to data compression and to the DCT, their bases, and the important operation of change of basis.

An n-dimensional vector space is the set of all vectors of the form (v1, v2, . . . , vn). We limit the discussion to the case where the vi's are real numbers. The attribute of vector spaces that makes them important to us is the existence of bases. Any vector (a, b, c) in three dimensions can be written as the linear combination
(a, b, c) = a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1) = ai + bj + ck,
so we say that the set of three vectors i, j, and k forms a basis of the three-dimensional vector space. Notice that the three basis vectors are orthogonal; the dot product of any two of them is zero. They are also orthonormal; the dot product of each with itself is 1. It is convenient to have an orthonormal basis, but this is not a requirement. The basis does not even have to be orthogonal.

The set of three vectors i, j, and k can be extended to any number of dimensions. A basis for an n-dimensional vector space may consist of the n vectors vi for i = 1, 2, . . . , n, where element j of vector vi is the Kronecker delta function δij. This simple basis is the standard basis of the n-dimensional vector space. In addition to this basis, the n-dimensional vector space can have other bases. We illustrate two other bases for n = 8.

God made the integers, all else is the work of man. —Leopold Kronecker.

The DCT (unnormalized) basis consists of the eight vectors
(1, 1, 1, 1, 1, 1, 1, 1), (1, 1, 1, 1, −1, −1, −1, −1),
(1, 1, −1, −1, −1, −1, 1, 1), (1, −1, −1, −1, 1, 1, 1, −1),
(1, −1, −1, 1, 1, −1, −1, 1), (1, −1, 1, 1, −1, −1, 1, −1),
(1, −1, 1, −1, −1, 1, −1, 1),
(1, −1, 1, −1, 1, −1, 1, −1).
Notice how their elements correspond to higher and higher frequencies. The (unnormalized) Haar wavelet basis (Section 25.1) consists of the eight vectors (1, 1, 1, 1, 1, 1, 1, 1), (1, 1, 1, 1, −1, −1, −1, −1), (1, 1, −1, −1, 0, 0, 0, 0), (0, 0, 0, 0, 1, 1, −1, −1), (1, −1, 0, 0, 0, 0, 0, 0), (0, 0, 1, −1, 0, 0, 0, 0), (0, 0, 0, 0, 1, −1, 0, 0), (0, 0, 0, 0, 0, 0, 1, −1).
To understand why these bases are useful for data compression, recall that our data vectors are images or parts of images. The pixels of an image are normally correlated, but the standard basis takes no advantage of this. The vector of all 1’s, on the other hand, is included in the above bases because this single vector is sufficient to express any uniform image. Thus, a group of identical pixels (v, v, . . . , v) can be represented as the single coefficient v times the vector of all 1’s. (The discrete sine transform of Section 24.3.11 is unsuitable for data compression mainly because it does not include this uniform vector.) Basis vector (1, 1, 1, 1, −1, −1, −1, −1) can represent the energy of a group of pixels that’s half dark and half bright. Thus, the group (v, v, . . . , v, −v, −v, . . . , −v) of pixels is represented by the single coefficient v times this basis vector. Successive basis vectors represent higher-frequency images, up to vector (1, −1, 1, −1, 1, −1, 1, −1). This basis vector resembles a checkerboard and therefore isolates the high-frequency details of an image. Those details are normally the least important and can be heavily quantized or even zeroed to achieve better compression. The vector members of a basis don’t have to be orthogonal. In order for a set S of vectors to be a basis, it has to have the following two properties: (1) The vectors have to be linearly independent and (2) it should be possible to express any member of the vector space as a linear combination of the vectors of S. For example, the three vectors (1, 1, 1), (0, 1, 0), and (0, 0, 1) are not orthogonal but form a basis for the threedimensional vector space. (1) They are linearly independent because none of them can be expressed as a linear combination of the other two. (2) Any vector (a, b, c) can be expressed as the linear combination a(1, 1, 1) + (b − a)(0, 1, 0) + (c − a)(0, 0, 1). Once we realize that a vector space may have many bases, we start looking for good bases. 
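The non-orthogonal example above also shows how coefficients are found in practice: write c · B = P with the basis vectors as the rows of B and solve. A Python sketch with exact arithmetic (the particular vector P is our choice):

```python
from fractions import Fraction as F

# Basis matrix B whose rows are the (non-orthogonal) basis vectors.
B = [[1, 1, 1], [0, 1, 0], [0, 0, 1]]

def coeffs(P):
    """Solve c . B = P for the coefficient row vector c (tiny back-substitution)."""
    c1 = P[0]          # only the first basis vector has a nonzero first entry
    c2 = P[1] - c1     # second entries: c1*1 + c2*1 = P[1]
    c3 = P[2] - c1     # third entries:  c1*1 + c3*1 = P[2]
    return (c1, c2, c3)

P = (F(3), F(7), F(-2))
c = coeffs(P)
assert c == (3, 4, -5)             # a, b - a, c - a, as derived in the text
# Reconstruct P as the linear combination of the basis vectors:
recon = tuple(sum(c[i] * B[i][k] for i in range(3)) for k in range(3))
assert recon == P
print("coefficients under the non-orthogonal basis:", c)
```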
A good basis for data compression is one where the inverse of the basis matrix is easy to compute and where the energy of a data vector becomes concentrated in a few coefficients. The bases discussed so far are simple, being based on zeros and ones. The orthogonal bases have the added advantage that the inverse of the basis matrix is simply its transpose. Being fast is not enough, because the fastest thing we could do is to stay with the original standard basis. The reason for changing a basis is to get compression. The DCT basis has the added advantage that it concentrates the energy of a vector of correlated values in a few coefficients. Without this property, there would be no reason to change the coefficients of a vector from the standard basis to the DCT basis. After changing to the DCT basis, many coefficients can be quantized, sometimes even zeroed, with a loss of only the least-important image information. If we quantize the original pixel values in the standard basis, we also achieve compression, but we lose image information that may be important.

Once a basis has been selected, it is easy to express any given vector in terms of the basis vectors. Assuming that the basis vectors are bi and given an arbitrary vector P = (p1, p2, . . . , pn), we write P as a linear combination P = c1 b1 + c2 b2 + · · · + cn bn of the bi's with unknown coefficients ci. Using matrix notation, this is written P = c · B, where c is a row vector of the coefficients and B is the matrix whose rows are the basis vectors. The unknown coefficients can be computed by c = P · B⁻¹ and this is the reason why a good basis is one where the inverse of the basis matrix is easy to compute. A simple example is the coefficients of a vector under the standard basis. We have seen that vector (a, b, c) can be written as the linear combination a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1). Thus, when the standard basis is used, the coefficients of a vector P are simply
its original elements. If we now want to compress the vector by changing to the DCT basis, we need to compute the coefficients under the new basis. This is an example of the important operation of change of basis.

Given two bases b_i and v_i, and assuming that a given vector P can be expressed both as \sum_i c_i b_i and as \sum_i w_i v_i, the problem of change of basis is to express one set of coefficients in terms of the other. Since the vectors v_i constitute a basis, any vector can be expressed as a linear combination of them. Specifically, any b_j can be written b_j = \sum_i t_{ij} v_i for some numbers t_{ij}. We now construct a matrix T from the t_{ij} and observe that it satisfies b_i T = v_i for i = 1, 2, . . . , n. Thus, T is a linear transformation that transforms basis b_i to v_i. The numbers t_{ij} are the elements of T in basis v_i. For our vector P, we can now write (\sum_i c_i b_i)T = \sum_i c_i v_i, which implies

\sum_{j=1}^{n} w_j \mathbf{v}_j = \sum_j w_j \mathbf{b}_j T = \sum_j w_j \sum_i \mathbf{v}_i t_{ij} = \sum_i \Big( \sum_j w_j t_{ij} \Big) \mathbf{v}_i .
This shows that c_i = \sum_j t_{ij} w_j; in other words, a basis is changed by means of a linear transformation T, and the same transformation also relates the elements of a vector in the old and new bases. Once we switch to a new basis in a vector space, every vector has new coordinates and every transformation has a different matrix.

A linear transformation T operates on an n-dimensional vector and produces an m-dimensional vector. Thus, T(v) is a vector u. If m = 1, the transformation produces a scalar. If m = n − 1, the transformation is a projection. Linear transformations satisfy the two important properties T(u + v) = T(u) + T(v) and T(cv) = cT(v). In general, the linear transformation of a linear combination T(c_1 v_1 + c_2 v_2 + · · · + c_n v_n) equals the linear combination of the individual transformations c_1 T(v_1) + c_2 T(v_2) + · · · + c_n T(v_n). This implies that the zero vector is transformed to itself under any linear transformation.

Examples of linear transformations are projection, reflection, rotation, and differentiating a polynomial. The derivative of c_1 + c_2 x + c_3 x^2 is c_2 + 2c_3 x. This is a transformation from the basis (c_1, c_2, c_3) in three-dimensional space to basis (c_2, c_3) in two-dimensional space. The transformation matrix satisfies (c_1, c_2, c_3)T = (c_2, 2c_3), so it is given by

T = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 2 \end{pmatrix} .

Examples of nonlinear transformations are translation, the length of a vector, and adding a constant vector v_0. The latter is nonlinear because if T(v) = v + v_0 and we double the size of v, then T(2v) = 2v + v_0 is different from 2T(v) = 2(v + v_0). Transforming a vector v to its length ||v|| is also nonlinear because T(−v) = ||−v|| = ||v||, which differs from −T(v). Translation is nonlinear because it transforms the zero vector to a nonzero vector. In general, a linear transformation is performed by multiplying the transformed vector v by the transformation matrix T. Thus, u = v · T or T(v) = v · T. Notice that we denote by T both the transformation and its matrix.
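The differentiation example can be checked by multiplying a coefficient row vector by the 3 × 2 matrix T derived above (a minimal sketch; the helper name is hypothetical):

```python
# The derivative of c1 + c2*x + c3*x^2 is c2 + 2*c3*x.  As a linear
# transformation, (c1, c2, c3) -> (c2, 2*c3) is a row vector times the
# 3x2 matrix T from the text.

T = [
    [0, 0],
    [1, 0],
    [0, 2],
]

def apply(v, m):
    """Row vector v times matrix m."""
    return tuple(sum(v[i] * m[i][j] for i in range(len(v)))
                 for j in range(len(m[0])))

# Differentiate 4 + 3x + 5x^2, obtaining 3 + 10x.
print(apply((4, 3, 5), T))   # (3, 10)

# Linearity check: T(u + v) == T(u) + T(v).
u, v = (1, 2, 3), (4, 5, 6)
s = tuple(a + b for a, b in zip(u, v))
assert apply(s, T) == tuple(a + b for a, b in zip(apply(u, T), apply(v, T)))
```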
In order to describe a transformation uniquely, it is enough to describe what it does to the vectors of a basis. To see why this is true, we observe the following. If for a given
vector v_1 we know what T(v_1) is, then we know what T(av_1) is for any a. Similarly, if for a given v_2 we know what T(v_2) is, then we know what T(bv_2) is for any b and also what T(av_1 + bv_2) is. Thus, we know how T transforms any vector in the plane containing v_1 and v_2. This argument shows that if we know what T(v_i) is for all the vectors v_i of a basis, then we know how T transforms any vector in the vector space.

Given a basis b_i for a vector space, we consider the special transformation that affects the magnitude of each vector but not its direction. Thus, T(b_i) = λ_i b_i for some number λ_i. The basis b_i is the eigenvector basis of transformation T. Since we know T for the entire basis, we also know it for any other vector. Any vector v in the vector space can be expressed as a linear combination v = \sum_i c_i b_i. If we apply our transformation to both sides and use the linearity property, we end up with

T(v) = v \cdot T = \sum_i c_i \mathbf{b}_i \cdot T .   (24.20)

In the special case where v is the basis vector b_1, Equation (24.20) implies T(b_1) = \sum_i c_i b_i \cdot T. On the other hand, T(b_1) = λ_1 b_1. We therefore conclude that c_1 = λ_1 and, in general, that the transformation matrix T is diagonal with λ_i in position i of its diagonal. In the eigenvector basis, the transformation matrix is diagonal, so this is the perfect basis. We would love to use it in compression, but it is data dependent. It is called the Karhunen–Loève transform (KLT) and is described in Section 24.2.4.
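The conclusion that the matrix is diagonal in the eigenvector basis can be illustrated with a small numerical example (a sketch; the 2 × 2 matrix and helper names are chosen for illustration only):

```python
# The matrix A = [[2, 1], [1, 2]] has eigenvectors (1, 1) and (1, -1)
# with eigenvalues 3 and 1.  In the row-vector convention u = v*A used
# in the text, the matrix of the same transformation in the eigenvector
# basis is B * A * inv(B), where the rows of B are the eigenvectors.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

A = [[2, 1], [1, 2]]
B = [[1, 1], [1, -1]]              # rows are the eigenvectors
Binv = [[0.5, 0.5], [0.5, -0.5]]   # inverse of B (easy to verify here)

D = matmul(matmul(B, A), Binv)
print(D)   # [[3.0, 0.0], [0.0, 1.0]] -- diagonal, with the eigenvalues
```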
24.3.10 Rotations in Three Dimensions

For those exegetes who want the complete story, the following paragraphs show how a proper rotation matrix (with a determinant of +1) that rotates a point (v, v, v) to the x axis can be derived from the general rotation matrix in three dimensions (Section 4.4.3).

A general rotation in three dimensions is fully specified by (1) an axis u of rotation, (2) the angle θ of rotation, and (3) the direction (clockwise or counterclockwise as viewed from the origin) of the rotation about u. Given a unit vector u = (u_x, u_y, u_z), matrix M of Equation (24.21) performs a rotation of θ degrees about u. The rotation appears clockwise to an observer looking from the origin in the direction of u. If P = (x, y, z) is an arbitrary point, its position after the rotation is given by the product P · M.

M = \begin{pmatrix}
u_x^2 + \cos\theta(1 - u_x^2) & u_x u_y(1 - \cos\theta) - u_z\sin\theta & u_x u_z(1 - \cos\theta) + u_y\sin\theta \\
u_x u_y(1 - \cos\theta) + u_z\sin\theta & u_y^2 + \cos\theta(1 - u_y^2) & u_y u_z(1 - \cos\theta) - u_x\sin\theta \\
u_x u_z(1 - \cos\theta) - u_y\sin\theta & u_y u_z(1 - \cos\theta) + u_x\sin\theta & u_z^2 + \cos\theta(1 - u_z^2)
\end{pmatrix} .   (24.21)
The general rotation of Equation (24.21) can now be applied to our problem, which is to rotate the vector D = (1, 1, 1) to the x axis. The rotation should be done about the vector u that's perpendicular to both D and (1, 0, 0). This vector is computed by the cross-product u = D × (1, 0, 0) = (0, 1, −1). Normalizing it yields u = (0, α, −α), where α = 1/\sqrt{2}. The next step is to compute the angle θ between D and the x axis. This is done by normalizing D and computing the dot product of it and the x axis (recall that the dot
product of two unit vectors is the cosine of the angle between them). The normalized D is (β, β, β), where β = 1/\sqrt{3}, and the dot product results in cos θ = β, which also produces sin θ = −\sqrt{1 − β^2} = −\sqrt{2/3} = −β/α. The reason for the negative sign is that a rotation from (1, 1, 1) to (1, 0, 0) about u appears counterclockwise to an observer looking from the origin in the direction of positive u. The rotation matrix of Equation (24.21) was derived for the opposite direction of rotation. Also, cos θ = β implies that θ ≈ 54.74°. This angle, not 45°, is the angle made by vector D with each of the three coordinate axes. (As an aside, when the number of dimensions increases, the angle between vector (1, 1, . . . , 1) and any of the coordinate axes approaches 90°.) Substituting u, sin θ, and cos θ in Equation (24.21) and using the relations α^2 + β(1 − α^2) = (β + 1)/2 and −α^2(1 − β) = (β − 1)/2 yields the simple rotation matrix
M = \begin{pmatrix}
\beta & -\beta & -\beta \\
\beta & \alpha^2 + \beta(1-\alpha^2) & -\alpha^2(1-\beta) \\
\beta & -\alpha^2(1-\beta) & \alpha^2 + \beta(1-\alpha^2)
\end{pmatrix}
= \begin{pmatrix}
\beta & -\beta & -\beta \\
\beta & (\beta+1)/2 & (\beta-1)/2 \\
\beta & (\beta-1)/2 & (\beta+1)/2
\end{pmatrix}
\approx \begin{pmatrix}
0.5774 & -0.5774 & -0.5774 \\
0.5774 & 0.7887 & -0.2113 \\
0.5774 & -0.2113 & 0.7887
\end{pmatrix} .
It is now easy to see that a point on the line x = y = z, with coordinates (v, v, v), is rotated by M to (v, v, v)M = (3βv, 0, 0) ≈ (1.7321v, 0, 0). Notice that the determinant of M equals +1, so M is a rotation matrix, in contrast to the matrix of Equation (24.15), which generates improper rotations.
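The derivation can be verified numerically. The sketch below builds M from Equation (24.21) with u = (0, α, −α), cos θ = β, and the negative sin θ discussed above, then rotates (v, v, v) onto the x axis (a minimal sketch; the function name is hypothetical):

```python
import math

def rotation_matrix(u, cos_t, sin_t):
    """General rotation matrix of Equation (24.21) about unit vector u."""
    ux, uy, uz = u
    c, s, d = cos_t, sin_t, 1.0 - cos_t
    return [
        [ux*ux + c*(1 - ux*ux), ux*uy*d - uz*s,         ux*uz*d + uy*s],
        [ux*uy*d + uz*s,        uy*uy + c*(1 - uy*uy),  uy*uz*d - ux*s],
        [ux*uz*d - uy*s,        uy*uz*d + ux*s,         uz*uz + c*(1 - uz*uz)],
    ]

alpha = 1 / math.sqrt(2)
beta = 1 / math.sqrt(3)
u = (0.0, alpha, -alpha)   # axis perpendicular to (1,1,1) and (1,0,0)
M = rotation_matrix(u, beta, -math.sqrt(1 - beta**2))  # negative sine, see text

v = 1.0
p = [v, v, v]
# Row vector times matrix: p * M.
rotated = [sum(p[i] * M[i][j] for i in range(3)) for j in range(3)]
print(rotated)   # approximately [1.7321, 0.0, 0.0]
```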
24.3.11 Discrete Sine Transform

Readers who have made it to this point may ask why the cosine function, and not the sine, is used in the transform. Is it possible to use the sine function in a similar way to the DCT to create a discrete sine transform? Is there a DST, and if not, why? This short section discusses the differences between the sine and cosine functions and shows why these differences lead to a very ineffective discrete sine transform.

A function f(x) that satisfies f(x) = −f(−x) is called odd. Similarly, a function for which f(x) = f(−x) is called even. For an odd function, it is always true that f(0) = −f(−0) = −f(0), so f(0) must be 0. Most functions are neither odd nor even, but the trigonometric functions sine and cosine are important examples of odd and even functions, respectively. Figure 24.37 shows that even though the only difference between them is phase (i.e., the cosine is a shifted version of the sine), this difference is enough to reverse their parity. When the (odd) sine curve is shifted, it becomes the (even) cosine curve, which has the same shape.

To understand the difference between the DCT and the DST, we examine the one-dimensional case. The DCT in one dimension, Equation (24.11), employs the function cos[(2t + 1)fπ/16] for f = 0, 1, . . . , 7. For the first term, where f = 0, this function becomes cos(0), which is 1. This term is the familiar and important DC coefficient, which is proportional to the average of the eight data values being transformed. The DST is similarly based on the function sin[(2t + 1)fπ/16], resulting in a zero first term [since sin(0) = 0]. The first term contributes nothing to the transform, so the DST does not have a DC coefficient.
Figure 24.37: The Sine and Cosine as Odd and Even Functions, Respectively.
The disadvantage of this can be seen when we consider the example of eight identical data values being transformed by the DCT and by the DST. Identical values are, of course, perfectly correlated. When plotted, they become a horizontal line. Applying the DCT to these values produces just a DC coefficient: All the AC coefficients are zero. The DCT compacts all the energy of the data into the single DC coefficient whose value is identical to the values of the data items. The IDCT can reconstruct the eight values perfectly (except for minor differences resulting from limited machine precision). Applying the DST to the same eight values, on the other hand, results in seven AC coefficients whose sum is a wave function that passes through the eight data points but oscillates between the points. This behavior, illustrated by Figure 24.38, has three disadvantages, namely (1) the energy of the original data values is not compacted, (2) the seven coefficients are not decorrelated (since the data values are perfectly correlated), and (3) quantizing the seven coefficients may greatly reduce the quality of the reconstruction done by the inverse DST.
Figure 24.38: The DCT and DST of Eight Identical Data Values.
Example: Applying the DST to the eight identical values 100 results in the eight coefficients (0, 256.3, 0, 90, 0, 60.1, 0, 51). Using these coefficients, the IDST can reconstruct the original values, but it is easy to see that the AC coefficients do not behave like those of the DCT. They are not getting smaller, and there are no runs of zeros among them. Applying the DST to the eight highly correlated values 11, 22, 33, 44, 55, 66, 77, and 88 results in the even worse set of coefficients (0, 126.9, −57.5, 44.5, −31.1, 29.8, −23.8, 25.2). There is no energy compaction at all.
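The coefficients quoted in these examples can be reproduced by a direct implementation of the one-dimensional DST described above (a minimal sketch; the function name is hypothetical, and the scaling factor \sqrt{2/N} is the same one used for the DCT):

```python
import math

def dst_1d(data):
    """One-dimensional DST based on sin((2t+1)*f*pi/16), f = 0..7,
    scaled by sqrt(2/N).  The f = 0 row is identically zero, so the
    transform has no DC coefficient."""
    N = len(data)
    return [math.sqrt(2 / N) *
            sum(p * math.sin((2*t + 1) * f * math.pi / (2 * N))
                for t, p in enumerate(data))
            for f in range(N)]

flat = [100] * 8
print([round(c, 1) + 0.0 for c in dst_1d(flat)])    # (+0.0 turns -0.0 into 0.0)
# -> [0.0, 256.3, 0.0, 90.0, 0.0, 60.1, 0.0, 51.0]

ramp = [11, 22, 33, 44, 55, 66, 77, 88]
print([round(c, 1) + 0.0 for c in dst_1d(ramp)])
# -> [0.0, 126.9, -57.5, 44.5, -31.1, 29.8, -23.8, 25.2]
```

The output matches the two coefficient sets given in the example: no DC coefficient, no energy compaction, and no runs of zeros to exploit.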
N=8; m=[1:N]'*ones(1,N); n=m';
% can also use cos instead of sin
%A=sqrt(2/N)*cos(pi*(2*(n-1)+1).*(m-1)/(2*N));
A=sqrt(2/N)*sin(pi*(2*(n-1)+1).*(m-1)/(2*N));
A(1,:)=sqrt(1/N); % DCT normalization of the first row (the sine row for f=0 is all zeros)
C=A';
for row=1:N
  for col=1:N
    B=C(:,row)*C(:,col).'; % tensor product
    subplot(N,N,(row-1)*N+col)
    imagesc(B)
    drawnow
  end
end

Figure 24.39: The 64 Basis Images of the DST in Two Dimensions.
These arguments and examples, together with the fact (discussed in [Ahmed et al. 74] and [Rao and Yip 90]) that the DCT produces highly decorrelated coefficients, argue strongly for the use of the DCT as opposed to the DST in data compression. Exercise 24.9: Use mathematical software to compute and display the 64 basis images of the DST in two dimensions for n = 8. We are the wisp of straw, the plaything of the winds. We think that we are making for a goal deliberately chosen; destiny drives us towards another. Mathematics, the exaggerated preoccupation of my youth, did me hardly any service; and animals, which I avoided as much as ever I could, are the consolation of my old age. Nevertheless, I bear no grudge against the sine and the cosine, which I continue to hold in high esteem. They cost me many a pallid hour at one time, but they always afforded me some first rate entertainment: they still do so, when my head lies tossing sleeplessly on its pillow. —J. Henri Fabre, The Life of the Fly.
24.4 Test Images

New image compression methods that are developed and implemented have to be tested. Testing different methods on the same data makes it possible to compare their performance both in compression efficiency and in speed. This is why there are standard collections of test data, such as the Calgary Corpus [Calgary 11], the Canterbury Corpus [Canterbury 11], and the ITU-T set of eight training documents for fax compression [funet 11].

In addition to these sets of test data, there currently exist collections of still images commonly used by researchers and implementors in the fields of image compression and image processing. Three of the four images shown here, namely Lena, mandril, and peppers, are arguably the best known of them. They are continuous-tone images, although Lena has some features of a discrete-tone image. Each image is accompanied by a detail, showing individual pixels (see also Figure 21.13). It is easy to see why the peppers image is continuous-tone. Adjacent pixels that differ much in color are fairly rare in this image. Most neighboring pixels are very similar. In contrast, the mandril image, even though natural, is a bad example of a continuous-tone image. The detail (showing part of the right eye and the area around it) shows that many pixels differ considerably from their immediate neighbors because of the animal's facial hair in this area. This image compresses badly under any compression method. However, the nose area, with mostly blue and red, is continuous-tone. The Lena image is mostly pure continuous-tone, especially the wall and the bare skin areas. The hat is good continuous-tone, whereas the hair and the plume on the hat are bad continuous-tone. The straight lines on the wall and the curved parts of the mirror are features of a discrete-tone image. The Lena image is widely used by the image processing community, in addition to being popular in image compression.
Because of the interest in it, its origin and history have been researched and are well documented. This image is part of the Playboy
Figure 24.40: Lena and Detail.
centerfold for November, 1972. It features the Swedish playmate Lena Soderberg (née Sjööblom), and it was discovered, clipped, and scanned in the early 1970s by Alexander Sawchuk, an assistant professor at the University of Southern California, for use as a test image in his image compression research. It has since become the most important, well-known, and commonly used image in the history of imaging and electronic communications. As a result, Lena is currently considered by many the First Lady of the Internet. Playboy, which normally prosecutes unauthorized users of its images, has found out about the unusual use of one of its copyrighted images, but decided to give its blessing to this particular "application." Lena herself currently lives in Sweden. She was told of her "fame" in 1988, was surprised and amused by it, and was invited to attend the 50th Anniversary IS&T (the society for Imaging Science and Technology) conference in Boston in May 1997. At the conference she autographed her picture, posed for new pictures (available on the www), and gave a presentation (about herself, not image processing). The three images are widely available for downloading on the Internet.

Figure 24.44 shows a typical discrete-tone image, with a detail shown in Figure 24.45. Notice the straight lines and the text, where certain characters appear several times (a source of redundancy). This particular image has few colors, but in general, a discrete-tone image may have many colors.

Lena, Illinois, is a community of approximately 2,900 people. Lena is considered to be a clean and safe community located centrally to larger cities that offer other interests when needed. Lena is 2-1/2 miles from Lake Le-Aqua-Na State Park. The park offers hiking trails, fishing, swimming beach, boats, cross country skiing, horse back riding trails, as well as picnic and camping areas. It is a beautiful well-kept park that has free admission to the public.
A great place for sledding and ice skating in the winter! (From http://www.villageoflena.com/)
Figure 24.41: Mandril and Detail.
Figure 24.42: JPEG Blocking Artifacts.
Figure 24.43: Peppers and Detail.
Figure 24.44: A Discrete-Tone Image.
Figure 24.45: A Discrete-Tone Image (Detail).
24.5 JPEG

JPEG is a sophisticated lossy/lossless compression method for color or grayscale still images (not videos). It does not handle bi-level (black and white) images very well. It also performs best on continuous-tone images, where adjacent pixels have similar colors. An important feature of JPEG is its use of many parameters, allowing the user to adjust the amount of data lost (and thus also the compression ratio) over a very wide range. Often, the eye cannot see any image degradation even at compression factors of 10 or 20. There are two operating modes, lossy (also called baseline) and lossless (which typically produces compression ratios of around 0.5). Most implementations support just the lossy mode. This mode includes progressive and hierarchical coding. A few of the many references to JPEG are [Pennebaker and Mitchell 92], [Wallace 91], and [Zhang 90].

JPEG is a compression method, not a complete standard for image representation. This is why it does not specify image features such as pixel aspect ratio, color space, or interleaving of bitmap rows. JPEG has been designed as a compression method for continuous-tone images. The main goals of JPEG compression are the following:
1. High compression ratios, especially in cases where image quality is judged as very good to excellent.
2. The use of many parameters, allowing knowledgeable users to experiment and achieve the desired compression/quality trade-off.
3. Obtaining good results with any kind of continuous-tone image, regardless of image dimensions, color spaces, pixel aspect ratios, or other image features.
4. A sophisticated, but not too complex compression method, allowing software and hardware implementations on many platforms.
5.
Several modes of operation: (a) A sequential mode where each image component (color) is compressed in a single left-to-right, top-to-bottom scan; (b) a progressive mode where the image is compressed in multiple blocks (known as "scans") to be viewed from coarse to fine detail; (c) a lossless mode that is important in cases where the user decides that no pixels should be lost (the trade-off is a low compression ratio compared to the lossy modes); and (d) a hierarchical mode where the image is compressed at multiple resolutions, allowing lower-resolution blocks to be viewed without first having to decompress the following higher-resolution blocks.

The name JPEG is an acronym that stands for Joint Photographic Experts Group. This was a joint effort by the CCITT and the ISO (the International Organization for Standardization) that started in June 1987 and produced the first JPEG draft proposal in 1991. The JPEG standard has proved successful and has become widely used for image compression, especially in Web pages.

The main JPEG compression steps are outlined here, and each step is then described in detail later.

1. Color images are transformed from RGB into a luminance-chrominance color space (Section 21.12; this step is skipped for grayscale images). The eye is sensitive to small changes in luminance but not in chrominance, so the chrominance part can later lose much data, and thus be highly compressed, without visually impairing the overall image quality much. This step is optional but important because the remainder of the algorithm works on each color component separately. Without transforming the color space, none of the three color components will tolerate much loss, leading to worse compression.

2. Color images are downsampled by creating low-resolution pixels from the original ones (this step is used only when hierarchical compression is selected; it is always skipped for grayscale images). The downsampling is not done for the luminance component. Downsampling is done either at a ratio of 2:1 both horizontally and vertically (the so-called 2h2v or 4:1:1 sampling) or at ratios of 2:1 horizontally and 1:1 vertically (2h1v or 4:2:2 sampling). Since this is done on two of the three color components, 2h2v reduces the image to 1/3 + (2/3) × (1/4) = 1/2 its original size, while 2h1v reduces it to 1/3 + (2/3) × (1/2) = 2/3 its original size. Since the luminance component is not touched, there is no noticeable loss of image quality. Grayscale images don't go through this step.

3. The pixels of each color component are organized in groups of 8 × 8 pixels called data units, and each data unit is compressed separately. If the number of image rows or columns is not a multiple of 8, the bottom row and the rightmost column are duplicated as many times as necessary. In the noninterleaved mode, the encoder handles all the data units of the first image component, then the data units of the second component, and finally those of the third component. In the interleaved mode the encoder processes the three top-left data units of the three image components, then the three data units to their right, and so on. The fact that each data unit is compressed separately is one of the downsides of JPEG. If the user asks for maximum compression, the decompressed image may exhibit blocking artifacts due to differences between blocks. Figure 24.42 is an extreme example of this effect.

4.
The discrete cosine transform (DCT, Section 24.3) is then applied to each data unit to create an 8 × 8 map of frequency components (Section 24.5.1). They represent the average pixel value and successive higher-frequency changes within the group. This prepares the image data for the crucial step of losing information. Since DCT involves the transcendental function cosine, it must involve some loss of information due to the limited precision of computer arithmetic. This means that even without the main lossy step (step 5 below), there will be some loss of image quality, but it is normally small.

5. Each of the 64 frequency components in a data unit is divided by a separate number called its quantization coefficient (QC), and then rounded to an integer (Section 24.5.2). This is where information is irretrievably lost. Large QCs cause more loss, so the high-frequency components typically have larger QCs. Each of the 64 QCs is a JPEG parameter and can, in principle, be specified by the user. In practice, most JPEG implementations use the QC tables recommended by the JPEG standard for the luminance and chrominance image components (Table 24.47).

6. The 64 quantized frequency coefficients (which are now integers) of each data unit are encoded using a combination of RLE and Huffman coding (Section 24.5.3). An arithmetic coding variant known as the QM coder (see [Salomon 09]) can optionally be used instead of Huffman coding.

7. The last step adds headers and all the required JPEG parameters, and outputs the result. The compressed file may be in one of three formats: (1) the interchange format, in which the file contains the compressed image and all the tables needed by the decoder (mostly quantization tables and tables of Huffman codes); (2) the abbreviated format for compressed image data, where the file contains the compressed image and may contain
no tables (or just a few tables), and (3) the abbreviated format for table-specification data, where the file contains just tables, and no compressed image. The second format makes sense in cases where the same encoder/decoder pair is used, and they have the same tables built in. The third format is used in cases where many images have been compressed by the same encoder, using the same tables. When those images need to be decompressed, they are sent to a decoder preceded by one file with table-specification data. The JPEG decoder performs the reverse steps. (Thus, JPEG is a symmetric compression method.) The progressive mode is a JPEG option. In this mode, higher-frequency DCT coefficients are written on the compressed stream in blocks called “scans.” Each scan that is read and processed by the decoder results in a sharper image. The idea is to use the first few scans to quickly create a low-quality, blurred preview of the image, and then either input the remaining scans or stop the process and reject the image. The trade-off is that the encoder has to save all the coefficients of all the data units in a memory buffer before they are sent in scans, and also go through all the steps for each scan, slowing down the progressive mode. Figure 24.46a shows an example of an image with resolution 1024 × 512. The image is divided into 128 × 64 = 8192 data units, and each is transformed by the DCT, becoming a set of 64 8-bit numbers. Figure 24.46b is a block whose depth corresponds to the 8,192 data units, whose height corresponds to the 64 DCT coefficients (the DC coefficient is the top one, numbered 0), and whose width corresponds to the eight bits of each coefficient. After preparing all the data units in a memory buffer, the encoder writes them on the compressed stream in one of two methods, spectral selection or successive approximation (Figure 24.46c,d). The first scan in either method is the set of DC coefficients. 
If spectral selection is used, each successive scan consists of several consecutive (a band of) AC coefficients. If successive approximation is used, the second scan consists of the four most-significant bits of all AC coefficients, and each of the following four scans, numbers 3 through 6, adds one more significant bit (bits 3 through 0, respectively).

In the hierarchical mode, the encoder stores the image several times in the output stream, at several resolutions. However, each high-resolution part uses information from the low-resolution parts of the output stream, so the total amount of information is less than that required to store the different resolutions separately. Each hierarchical part may use the progressive mode. The hierarchical mode is useful in cases where a high-resolution image needs to be output in low resolution. Today, in 2011, it is difficult to come up with an example of a low-resolution output device, but there may be places where a few old, obsolete dot-matrix printers are still in use.

The lossless mode of JPEG (Section 24.5.4) calculates a "predicted" value for each pixel, generates the difference between the pixel and its predicted value, and encodes the difference using the same method (i.e., Huffman or arithmetic coding) employed by step 6 above. The predicted value is calculated using values of pixels above and to the left of the current pixel (pixels that have already been input and encoded). The following sections discuss the steps in more detail:
Figure 24.46: Scans in the JPEG Progressive Mode.
24.5.1 DCT

The general concept of a transform is discussed in Section 24.1. The discrete cosine transform is discussed in much detail in Section 24.3. Other examples of important transforms are the Fourier transform and the wavelet transform (Chapter 25). Both have applications in many areas and also have discrete versions (DFT and DWT). The JPEG committee elected to use the DCT because of its excellent performance, because it does not assume anything about the structure of the data (the DFT, for example, assumes that the data to be transformed is periodic), and because there are ways to speed it up (Section 24.3.5).

The JPEG standard calls for applying the DCT not to the entire image but to data units (blocks) of 8 × 8 pixels. The reasons for this are: (1) Applying DCT to large blocks involves many arithmetic operations and is therefore slow. Applying DCT to small data units is faster. (2) Experience shows that, in a continuous-tone image, correlations between pixels are short range. A pixel in such an image has a value (color component or shade of gray) that's close to those of its near neighbors, but has nothing to do with the values of far neighbors. The JPEG DCT is therefore executed by Equation (24.13), duplicated here for n = 8:

G_{ij} = \frac{1}{4} C_i C_j \sum_{x=0}^{7} \sum_{y=0}^{7} p_{xy} \cos\frac{(2x+1)i\pi}{16} \cos\frac{(2y+1)j\pi}{16} ,   (24.13)

where C_f = 1/\sqrt{2} for f = 0, C_f = 1 for f > 0, and 0 ≤ i, j ≤ 7.
The DCT is JPEG's key to lossy compression. The unimportant image information is reduced or removed by quantizing the 64 DCT coefficients, especially the ones located toward the lower-right. If the pixels of the image are correlated, quantization does not degrade the image quality much. For best results, each of the 64 coefficients is quantized by dividing it by a different quantization coefficient (QC). All 64 QCs are parameters that can be controlled, in principle, by the user (Section 24.5.2).

The JPEG decoder works by computing the inverse DCT (IDCT), Equation (24.14), duplicated here for n = 8:

p_{xy} = \frac{1}{4} \sum_{i=0}^{7} \sum_{j=0}^{7} C_i C_j G_{ij} \cos\frac{(2x+1)i\pi}{16} \cos\frac{(2y+1)j\pi}{16} ,   (24.14)

where C_f = 1/\sqrt{2} for f = 0 and C_f = 1 for f > 0.
It takes the 64 quantized DCT coefficients and calculates 64 pixels pxy . If the QCs are the right ones, the new 64 pixels will be very similar to the original ones. Mathematically, the DCT is a one-to-one mapping of 64-point vectors from the image domain to the frequency domain. The IDCT is the reverse mapping. If the DCT and IDCT could be calculated with infinite precision and if the DCT coefficients were not quantized, the original 64 pixels would be exactly reconstructed.
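Equations (24.13) and (24.14) translate directly into code. The following straightforward (unoptimized) sketch transforms an 8 × 8 block and reconstructs it, confirming that without quantization the mapping is one-to-one (function names are hypothetical):

```python
import math

def C(f):
    """Normalization factor of Equations (24.13) and (24.14)."""
    return 1 / math.sqrt(2) if f == 0 else 1.0

def dct_8x8(p):
    """Forward DCT of Equation (24.13) for an 8x8 block p."""
    return [[0.25 * C(i) * C(j) *
             sum(p[x][y] *
                 math.cos((2*x + 1) * i * math.pi / 16) *
                 math.cos((2*y + 1) * j * math.pi / 16)
                 for x in range(8) for y in range(8))
             for j in range(8)] for i in range(8)]

def idct_8x8(G):
    """Inverse DCT of Equation (24.14)."""
    return [[0.25 *
             sum(C(i) * C(j) * G[i][j] *
                 math.cos((2*x + 1) * i * math.pi / 16) *
                 math.cos((2*y + 1) * j * math.pi / 16)
                 for i in range(8) for j in range(8))
             for y in range(8)] for x in range(8)]

block = [[(x + y) * 8 for y in range(8)] for x in range(8)]  # correlated pixels
G = dct_8x8(block)
back = idct_8x8(G)
err = max(abs(block[x][y] - back[x][y]) for x in range(8) for y in range(8))
print(err < 1e-9)   # True: the reconstruction error is only rounding noise
```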
24.5.2 Quantization

After each 8 × 8 data unit of DCT coefficients Gij is computed, it is quantized. This is the step where information is lost (except for some unavoidable loss because of finite-precision calculations in other steps). Each number in the DCT coefficients matrix is divided by the corresponding number from the particular "quantization table" used, and the result is rounded to the nearest integer. As has already been mentioned, three such tables are needed, for the three color components. The JPEG standard allows for up to four tables, and the user can select any of the four for quantizing each color component.

The 64 numbers that constitute each quantization table are all JPEG parameters. In principle, they can all be specified and fine-tuned by the user for maximum compression. In practice, few users have the patience or expertise to experiment with so many parameters, so JPEG software normally uses the following two approaches:

1. Default quantization tables. Two such tables, for the luminance (grayscale) and the chrominance components, are the result of many experiments performed by the JPEG committee. They are included in the JPEG standard and are reproduced here as Table 24.47. It is easy to see how the QCs in the table generally grow as we move from the upper-left corner to the bottom-right corner. This is how JPEG reduces the DCT coefficients with high spatial frequencies.

2. A simple quantization table Q is computed, based on one parameter R specified by the user. A simple expression such as Qij = 1 + (i + j) × R guarantees that QCs start small at the upper-left corner and get bigger toward the lower-right corner. Table 24.48 shows an example of such a table with R = 2.

    16  11  10  16  24  40  51  61        17  18  24  47  99  99  99  99
    12  12  14  19  26  58  60  55        18  21  26  66  99  99  99  99
    14  13  16  24  40  57  69  56        24  26  56  99  99  99  99  99
    14  17  22  29  51  87  80  62        47  66  99  99  99  99  99  99
    18  22  37  56  68 109 103  77        99  99  99  99  99  99  99  99
    24  35  55  64  81 104 113  92        99  99  99  99  99  99  99  99
    49  64  78  87 103 121 120 101        99  99  99  99  99  99  99  99
    72  92  95  98 112 100 103  99        99  99  99  99  99  99  99  99
              Luminance                             Chrominance

Table 24.47: Recommended Quantization Tables.
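The quantization step itself is a single elementwise divide-and-round, and a table in the style of Table 24.48 can be generated directly from the one-parameter formula. The sketch below illustrates both; the function names and the plain nested-list representation are illustrative, not part of the standard:

```python
def quantize(G, Q):
    """Quantize an 8x8 block of DCT coefficients G: divide each coefficient
    by the corresponding entry of quantization table Q, round to nearest."""
    return [[round(G[i][j] / Q[i][j]) for j in range(8)] for i in range(8)]

def dequantize(g, Q):
    """The decoder's approximate inverse: multiply back by Q."""
    return [[g[i][j] * Q[i][j] for j in range(8)] for i in range(8)]

def simple_quant_table(R, n=8):
    """Build the n x n table Q[i][j] = 1 + (i + j) * R of approach 2."""
    return [[1 + (i + j) * R for j in range(n)] for i in range(n)]
```

High-frequency coefficients, divided by the large QCs in the lower-right region, typically round to zero; this is where both the information loss and most of the compression come from.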
If the quantization is done correctly, very few nonzero numbers will be left in the DCT coefficients matrix, and they will typically be concentrated in the upper-left region. These numbers are the output of JPEG, but they are further compressed before being written on the output stream. In the JPEG literature this compression is called “entropy coding,” and Section 24.5.3 shows in detail how it is done. Three techniques are used by entropy coding to compress the 8 × 8 matrix of integers: 1. The 64 numbers are collected by scanning the matrix in zigzags (Figure 23.9). This produces a string of 64 numbers that starts with some nonzeros and typically ends with many consecutive zeros. Only the nonzero numbers are output (after further compressing
     1  3  5  7  9 11 13 15
     3  5  7  9 11 13 15 17
     5  7  9 11 13 15 17 19
     7  9 11 13 15 17 19 21
     9 11 13 15 17 19 21 23
    11 13 15 17 19 21 23 25
    13 15 17 19 21 23 25 27
    15 17 19 21 23 25 27 29

Table 24.48: The Quantization Table 1 + (i + j) × 2.
them) and are followed by a special end-of-block (EOB) code. This way there is no need to output the trailing zeros (we can say that the EOB is the run-length encoding of all the trailing zeros). The interested reader should also consult Chapter 11 of [Salomon 09] for other methods to compress binary strings with many consecutive zeros.

Exercise 24.10: Propose a practical way to write a loop that traverses an 8 × 8 matrix in zigzag.

2. The nonzero numbers are compressed using Huffman coding (Section 24.5.3).

3. The first of those numbers (the DC coefficient, Page 1083) is treated differently from the others (the AC coefficients).

She had just succeeded in curving it down into a graceful zigzag, and was going to dive in among the leaves, which she found to be nothing but the tops of the trees under which she had been wandering, when a sharp hiss made her draw back in a hurry.
—Lewis Carroll, Alice in Wonderland (1865).
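One way to produce the zigzag order of point 1 above (in the spirit of Exercise 24.10) is to walk the antidiagonals of the matrix, alternating direction on each one. This is a sketch, not the book's own solution:

```python
def zigzag_indices(n=8):
    """Yield (row, col) pairs of an n x n matrix in JPEG zigzag order."""
    for s in range(2 * n - 1):      # s = row + col is constant on an antidiagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # even antidiagonals are traversed toward row 0, odd ones away from it
        yield from (reversed(diag) if s % 2 == 0 else diag)
```

Collecting the 64 coefficients in this order moves the low-frequency (upper-left) values to the front of the string and the high-frequency values, mostly zeros after quantization, to the back.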
24.5.3 Coding

We first discuss point 3 above. Each 8 × 8 matrix of quantized DCT coefficients contains one DC coefficient (at position (0, 0), the top left corner) and 63 AC coefficients. The DC coefficient is a measure of the average value of the 64 original pixels constituting the data unit. Experience shows that in a continuous-tone image, adjacent data units of pixels are normally correlated in the sense that the average values of the pixels in adjacent data units are close. We already know that the DC coefficient of a data unit is a multiple of the average of the 64 pixels constituting the unit. This implies that the DC coefficients of adjacent data units don't differ much. JPEG outputs the first one (encoded), followed by differences (also encoded) of the DC coefficients of consecutive data units.

Example: If the first three 8 × 8 data units of an image have quantized DC coefficients of 1118, 1114, and 1119, then the JPEG output for the first data unit is 1118 (Huffman encoded, see below) followed by the 63 (encoded) AC coefficients of that data unit. The output for the second data unit will be 1114 − 1118 = −4 (also Huffman encoded), followed by the 63 (encoded) AC coefficients of that data unit, and the output for the third data unit will be 1119 − 1114 = 5 (also Huffman encoded), again followed
by the 63 (encoded) AC coefficients of that data unit. This way of handling the DC coefficients is worth the extra trouble, because the differences are small.

Coding the DC differences is done with Table 24.49, so first here are a few words about this table. Each row has a row number (on the left), the unary code for the row (on the right), and several columns in between. Each row contains greater numbers (and also more numbers) than its predecessor, but not the numbers contained in previous rows. Row i contains the range of integers [−(2^i − 1), +(2^i − 1)] but is missing the middle range [−(2^(i−1) − 1), +(2^(i−1) − 1)]. Thus, the rows get very long, which means that a simple two-dimensional array is not a good data structure for this table. In fact, there is no need to store these integers in a data structure, since the program can figure out where in the table any given integer x is supposed to reside by analyzing the bits of x.

The first DC coefficient to be encoded in our example is 1118. It resides in row 11 column 930 of the table (column numbering starts at zero), so it is encoded as 111111111110|01110100010 (the unary code for row 11, followed by the 11-bit binary value of 930). The second DC difference is −4. It resides in row 3 column 3 of Table 24.49, so it is encoded as 1110|011 (the unary code for row 3, followed by the 3-bit binary value of 3).

Exercise 24.11: How is the third DC difference, 5, encoded?

Point 2 above has to do with the precise way the 63 AC coefficients of a data unit are compressed. It uses a combination of RLE and either Huffman or arithmetic coding. The idea is that the sequence of AC coefficients normally contains just a few nonzero numbers, with runs of zeros between them, and with a long run of trailing zeros.
For each nonzero number x, the encoder (1) finds the number Z of consecutive zeros preceding x; (2) finds x in Table 24.49 and prepares its row and column numbers (R and C); (3) uses the pair (R, Z) (that's (R, Z), not (R, C)) as row and column numbers for Table 24.52; and (4) concatenates the Huffman code found in that position in the table to C (where C is written as an R-bit number); the result is (finally) the code emitted by the JPEG encoder for the AC coefficient x and all the consecutive zeros preceding it.

     0:       0                                                           0
     1:      -1                                            1              10
     2:      -3     -2                              2      3              110
     3:      -7     -6     -5  -4            4  5   6      7              1110
     4:     -15    -14    ... -8             8 ... 14     15              11110
     5:     -31    -30    ... -16           16 ... 30     31              111110
     6:     -63    -62    ... -32           32 ... 62     63              1111110
     7:    -127   -126    ... -64           64 ... 126   127              11111110
     ...
    14:  -16383 -16382 ... -8192          8192 ... 16382 16383            111111111111110
    15:  -32767 -32766 ... -16384        16384 ... 32766 32767            1111111111111110
    16:   32768                                                           1111111111111111

Table 24.49: Coding the Differences of DC Coefficients.
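As the text notes, the row and column of a value in a table organized this way can be computed from the bits of the value; no stored table is needed. The sketch below assumes the standard magnitude-category convention (a positive x occupies column x, a negative x occupies column x + 2^row − 1), which reproduces the −4 and 5 examples; the function names are illustrative:

```python
def dc_row_col(x):
    """Row (size category) and column of x in a Table 24.49-style layout."""
    if x == 0:
        return 0, 0
    row = abs(x).bit_length()                # number of bits needed for |x|
    col = x if x > 0 else x + (1 << row) - 1 # negatives fill the low columns
    return row, col

def dc_bits(x):
    """The 'row'-bit binary string appended after the unary row code."""
    row, col = dc_row_col(x)
    return format(col, f"0{row}b") if row else ""
```

For x = −4 this gives row 3, column 3, and the bits 011, exactly as in the example above.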
The Huffman codes in Table 24.52 are not the ones recommended by the JPEG standard. The standard recommends the use of Tables 24.50 and 24.51 and says that
up to four Huffman code tables can be used by a JPEG codec, except that the baseline mode can use only two such tables. The actual codes in Table 24.52 are thus arbitrary. The reader should notice the EOB code at position (0, 0) and the ZRL code at position (0, 15). The former indicates end-of-block, and the latter is the code emitted for 15 consecutive zeros when the number of consecutive zeros exceeds 15. These codes are the ones recommended for the luminance AC coefficients of Table 24.50. The EOB and ZRL codes recommended for the chrominance AC coefficients of Table 24.51 are 00 and 1111111010, respectively. As an example, consider the sequence 1118, 2, 0, −2, 0, ..., 0, −1, 0, ..., where the run of zeros between −2 and −1 consists of 13 zeros.
The first AC coefficient 2 has no zeros preceding it, so Z = 0. It is found in Table 24.49 in row 2, column 2, so R = 2 and C = 2. The Huffman code in position (R, Z) = (2, 0) of Table 24.52 is 01, so the final code emitted for 2 is 01|10. The next nonzero coefficient, −2, has one zero preceding it, so Z = 1. It is found in Table 24.49 in row 2, column 1, so R = 2 and C = 1. The Huffman code in position (R, Z) = (2, 1) of Table 24.52 is 11011, so the final code emitted for −2 is 11011|01.

Exercise 24.12: What code is emitted for the last nonzero AC coefficient, −1?

Finally, the sequence of trailing zeros is encoded as 1010 (EOB), so the output for the above sequence of AC coefficients is 01101101110111010101010. We saw earlier that the DC coefficient is encoded as 111111111110|01110100010, so the final output for the entire 64-pixel data unit is the 46-bit number 1111111111100111010001001101101110111010101010.

These 46 bits encode one color component of the 64 pixels of a data unit. Let's assume that the other two color components are also encoded into 46-bit numbers. If each pixel originally consists of 24 bits, then this corresponds to a compression factor of 64 × 24/(46 × 3) ≈ 11.13; very impressive! (Notice that the DC coefficient of 1118 has contributed 23 of the 46 bits. Subsequent data units code differences of their DC coefficients, which may take fewer than 10 bits instead of 23. They may feature much higher compression factors as a result.)

The same tables (Tables 24.49 and 24.52) used by the encoder should, of course, be used by the decoder. The tables may be predefined and used by a JPEG codec as defaults, or they may be specifically calculated for a given image in a special pass preceding the actual compression. The JPEG standard does not specify any code tables, so any JPEG codec must use its own.
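The run-length pairing of steps (1)–(3) can be sketched as follows: the function groups a sequence of quantized AC coefficients into (Z, R, bits) triples, with EOB and ZRL markers. The names and the tuple representation are illustrative, not from the standard:

```python
def ac_tuples(ac):
    """Group AC coefficients into (Z, R, bits) triples.
    Z = number of preceding zeros, R = size category of the coefficient,
    bits = the R-bit column number C appended after the Huffman code."""
    out, run = [], 0
    for x in ac:
        if x == 0:
            run += 1
            continue
        while run > 15:
            out.append("ZRL")                # stands for a long run of zeros
            run -= 16
        size = abs(x).bit_length()
        col = x if x > 0 else x + (1 << size) - 1
        out.append((run, size, format(col, f"0{size}b")))
        run = 0
    out.append("EOB")                        # the remaining zeros are trailing
    return out
```

For the example sequence, the first triple is (0, 2, "10") and the second is (1, 2, "01"), matching the Z, R, and C values derived above for the coefficients 2 and −2.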
Readers who feel that this coding scheme is complex should take a look at the much more complex CAVLC coding method that is employed by H.264 [Salomon 09] to encode a similar sequence of 8×8 DCT transform coefficients. Some JPEG variants use a particular version of arithmetic coding, called the QM coder, that is specified in the JPEG standard. This version of arithmetic coding is adaptive, so it does not need Tables 24.49 and 24.52. It adapts its behavior to the image statistics as it goes along. Using arithmetic coding may produce 5–10% better compression than Huffman for a typical continuous-tone image. However, it is more
Table 24.50: Recommended Huffman Codes for Luminance AC Coefficients (one variable-length code per pair (Z, R), rows Z = 0–F, columns R = 1–A).
Table 24.51: Recommended Huffman Codes for Chrominance AC Coefficients (one variable-length code per pair (Z, R), rows Z = 0–F, columns R = 1–A).
    R     Z:  0              1               ...   15
    0:        1010 (EOB)                     ...   11111111001 (ZRL)
    1:        00             1100            ...   1111111111110101
    2:        01             11011           ...   1111111111110110
    3:        100            1111001         ...   1111111111110111
    4:        1011           111110110       ...   1111111111111000
    5:        11010          11111110110     ...   1111111111111001
    ...

Table 24.52: Coding AC Coefficients.
complex to implement than Huffman coding, so in practice it is rare to find a JPEG codec that uses it.
24.5.4 Lossless Mode

The lossless mode of JPEG uses differencing to reduce the values of pixels before they are compressed. This particular form of differencing is called predicting. The values of some near neighbors of a pixel are subtracted from the pixel to get a small number, which is then compressed further using Huffman or arithmetic coding. Figure 24.53a shows a pixel X and three neighbor pixels A, B, and C. Figure 24.53b shows eight possible ways (predictions) to combine the values of the three neighbors. In the lossless mode, the user can select one of these predictions, and the encoder then uses it to combine the three neighbor pixels and subtract the combination from the value of X. The result is normally a small number, which is then entropy-coded in a way very similar to that described for the DC coefficient in Section 24.5.3. Predictor 0 is used only in the hierarchical mode of JPEG. Predictors 1, 2, and 3 are called one-dimensional. Predictors 4, 5, 6, and 7 are two-dimensional.
    C B
    A X
     (a)

    Selection value    Prediction
    0                  no prediction
    1                  A
    2                  B
    3                  C
    4                  A + B − C
    5                  A + ((B − C)/2)
    6                  B + ((A − C)/2)
    7                  (A + B)/2
                 (b)

Figure 24.53: Pixel Prediction in the Lossless Mode.
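The eight predictions of Figure 24.53b are simple to compute. A sketch follows; the use of floor division (`// 2`) for the halvings is an assumption of this sketch, since the standard pins down the exact integer arithmetic:

```python
def predict(A, B, C, selection):
    """Prediction for pixel X from its neighbors A (left), B (above),
    and C (above-left), for selection values 0-7 of Figure 24.53b."""
    return (None,            # 0: no prediction (hierarchical mode only)
            A, B, C,         # 1-3: one-dimensional predictors
            A + B - C,       # 4
            A + (B - C) // 2,  # 5
            B + (A - C) // 2,  # 6
            (A + B) // 2)[selection]  # 7
```

The encoder then outputs X − predict(A, B, C, selection), which is a small number when neighboring pixels are correlated.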
It should be noted that the lossless mode of JPEG has never been very successful. It produces typical compression factors of 2, and is therefore inferior to other lossless image compression methods. Because of this, many JPEG implementations do not even
implement this mode. Even the lossy (baseline) mode of JPEG does not perform well when asked to limit the amount of loss to a minimum. As a result, some JPEG implementations do not allow parameter settings that result in minimum loss. The strength of JPEG is in its ability to generate highly compressed images that when decompressed are indistinguishable from the original. Recognizing this, the ISO has decided to come up with another standard for lossless compression of continuous-tone images. This standard is now commonly known as JPEG-LS and is described in [Salomon 09].
24.5.5 The Compressed File

A JPEG encoder outputs a compressed file that includes parameters, markers, and the compressed data units. The parameters are either four bits (these always come in pairs), one byte, or two bytes long. The markers serve to identify the various parts of the file. Each is two bytes long, where the first byte is X'FF' and the second one is neither 0 nor X'FF'. A marker may be preceded by a number of X'FF' bytes. Table 24.55 lists all the JPEG markers (the first four groups are start-of-frame markers). The compressed data units are combined into MCUs (minimal coded units), where an MCU is either a single data unit (in the noninterleaved mode) or three data units from the three image components (in the interleaved mode).

    Compressed image:  SOI  Frame  EOI
    Frame:   [Tables]  Frame header  Scan1  [DNL segment]  [Scan2]  ...  [Scanlast]
    Scan:    [Tables]  Scan header  ECS0  [RST0]  ...  ECSlast−1  [RSTlast−1]  ECSlast
    ECSi:    MCU  MCU  ...  MCU

Figure 24.54: JPEG File Format.
Figure 24.54 shows the main parts of the JPEG compressed file (parts in square brackets are optional). The file starts with the SOI marker and ends with the EOI marker. In between these markers, the compressed image is organized in frames. In the hierarchical mode there are several frames, and in all other modes there is only one frame. In each frame the image information is contained in one or more scans, but the frame also contains a header and optional tables (which, in turn, may include markers). The first scan may be followed by an optional DNL segment (define number
    Value       Name    Description

    Nondifferential, Huffman coding
    FFC0        SOF0    Baseline DCT
    FFC1        SOF1    Extended sequential DCT
    FFC2        SOF2    Progressive DCT
    FFC3        SOF3    Lossless (sequential)

    Differential, Huffman coding
    FFC5        SOF5    Differential sequential DCT
    FFC6        SOF6    Differential progressive DCT
    FFC7        SOF7    Differential lossless (sequential)

    Nondifferential, arithmetic coding
    FFC8        JPG     Reserved for extensions
    FFC9        SOF9    Extended sequential DCT
    FFCA        SOF10   Progressive DCT
    FFCB        SOF11   Lossless (sequential)

    Differential, arithmetic coding
    FFCD        SOF13   Differential sequential DCT
    FFCE        SOF14   Differential progressive DCT
    FFCF        SOF15   Differential lossless (sequential)

    Huffman table specification
    FFC4        DHT     Define Huffman table

    Arithmetic coding conditioning specification
    FFCC        DAC     Define arith coding conditioning(s)

    Restart interval termination
    FFD0–FFD7   RSTm    Restart with modulo 8 count m

    Other markers
    FFD8        SOI     Start of image
    FFD9        EOI     End of image
    FFDA        SOS     Start of scan
    FFDB        DQT     Define quantization table(s)
    FFDC        DNL     Define number of lines
    FFDD        DRI     Define restart interval
    FFDE        DHP     Define hierarchical progression
    FFDF        EXP     Expand reference component(s)
    FFE0–FFEF   APPn    Reserved for application segments
    FFF0–FFFD   JPGn    Reserved for JPEG extensions
    FFFE        COM     Comment

    Reserved markers
    FF01        TEM     For temporary private use
    FF02–FFBF   RES     Reserved

Table 24.55: JPEG Markers.
of lines), which starts with the DNL marker and contains the number of lines in the image that is represented by the frame. A scan starts with optional tables, followed by the scan header, followed by several entropy-coded segments (ECS), which are separated by (optional) restart markers (RST). Each ECS contains one or more MCUs, where an MCU is, as explained earlier, either a single data unit or three such units.

I think he be transform'd into a beast;
For I can nowhere find him like a man.
—William Shakespeare, As You Like It (1601)
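Because every marker begins with an X'FF' byte followed by a byte that is neither 0 nor X'FF', a compressed file can be scanned for its markers with a simple loop. The sketch below is deliberately naive (it does not parse segment lengths), but it illustrates the framing:

```python
def list_markers(data):
    """Return (offset, marker_byte) pairs found in a JPEG byte stream.
    Skips X'FF' fill bytes and the X'FF00' byte stuffing used inside
    entropy-coded segments."""
    out, i = [], 0
    while i < len(data) - 1:
        if data[i] == 0xFF and data[i + 1] not in (0x00, 0xFF):
            out.append((i, data[i + 1]))
            i += 2
        else:
            i += 1
    return out
```

Applied to a real file, the first pair is always (0, 0xD8), the SOI marker of Table 24.55, and the last is the EOI marker 0xD9.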
25 The Wavelet Transform

The concept of a transform was introduced in Section 24.1, and the rest of Chapter 24 discusses orthogonal transforms. The transforms dealt with in this chapter are different and are referred to as subband transforms, because they partition an image into various bands or regions that contain different features of the image.

We start with the simple concept of a signal. In mathematics, a function is normally denoted by y = f(x). For our purposes, a signal is simply a function x(t). The independent variable is denoted by t because signals of practical interest are functions of time. When a signal is displayed graphically, we see its amplitude at any time, and we can also measure its frequency at any point in time. We therefore say that the graphical representation of a signal is its time-amplitude representation or that it represents the time domain of the signal.

Many problems in science and engineering depend on how the frequency of a signal varies with time. In such a problem, the most important information of the signal is hidden in its frequency content. Such problems may be easier to solve when the signal is represented in the frequency domain. The concept of the frequency domain originated with Joseph Fourier, who developed it in the early 1800s as part of his work on heat transfer. The Fourier transform changes the representation of a function from the time domain to the frequency domain. It has many applications and has been the subject of much research and experimentation.

As an example, consider the signal

$$x(t) = \cos(20\pi t) + \cos(50\pi t) + \cos(100\pi t) + \cos(200\pi t).$$

Since cos(2πft) has frequency f, our signal contains the four frequencies 10, 25, 50, and 100 (in Hz, if t is measured in seconds). Figure 25.1 illustrates the two domains of this signal. The time domain is a complex, infinite wave, but the frequency domain consists of only four peaks, indicating that this signal consists of four frequencies.
Our signal is relatively simple. In particular, its frequencies are always the same and do not vary over time. Such a signal is termed stationary. In general, signals are not stationary; they consist of frequencies that vary with time. The Fourier transform,

D. Salomon, The Computer Graphics Manual, Texts in Computer Science, DOI 10.1007/978-0-85729-886-7_25, © Springer-Verlag London Limited 2011
Figure 25.1: Time and Frequency Domains. (Time domain: the composite waveform; frequency domain: magnitude peaks at frequencies 10, 25, 50, and 100 Hz.)
however, cannot tell the particular frequencies of a signal at any given time. It tells us what frequencies are included in the signal, but not when (at what values of t) each is present. We say that this transform has only frequency resolution but no time resolution. The wavelet transform has been developed in the last few decades specifically to overcome this deficiency of the Fourier transform. The wavelet transform can tell what frequencies make up any part of a given signal. The complete details of this transform are outside the scope of this book, and this chapter deals only with those aspects of the wavelet transform that are relevant to the compression of images.

In practice, the wavelet transform is applied to data such as digitized audio and images. Such data consists of individual numbers, and is therefore discrete, in contrast to mathematical signals, which are normally continuous. This is why the discrete, and not the continuous, wavelet transform is employed in the compression of images. The general discrete wavelet transform is described in Section 25.3.

This chapter starts with the Haar transform, the simplest wavelet transform. It then describes how this simple transform can be extended and improved by means of filter banks. The discrete wavelet transform is then introduced, and the technique of compressing images by means of the wavelet transform is illustrated by the SPIHT algorithm.
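The four peaks of the example signal can be verified numerically. The sketch below samples the signal for one second and evaluates the DFT magnitude at a few frequencies by direct summation; the sampling rate and the helper names are choices of this sketch:

```python
import math

# One second of x(t) = cos(20*pi*t) + cos(50*pi*t) + cos(100*pi*t) + cos(200*pi*t),
# sampled N times; with N samples per second, DFT bin f corresponds to f Hz.
N = 1024
x = [math.cos(20 * math.pi * t) + math.cos(50 * math.pi * t)
     + math.cos(100 * math.pi * t) + math.cos(200 * math.pi * t)
     for t in (n / N for n in range(N))]

def dft_mag(x, f):
    """Magnitude of the DFT of x at bin f, computed by direct summation."""
    re = sum(v * math.cos(2 * math.pi * f * n / len(x)) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * f * n / len(x)) for n, v in enumerate(x))
    return math.hypot(re, im)
```

dft_mag(x, f) is large (about N/2) at f = 10, 25, 50, and 100 and essentially zero elsewhere, reproducing the four peaks of Figure 25.1.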
25.1 The Haar Transform

The Haar transform is the simplest wavelet transform. It is presented here first informally, by means of an example, and then formally (in a short section that may be skipped by non-mathematically-savvy readers).
25.1.1 An Illustrating Example

This example is one-dimensional. We consider a row of pixels, i.e., a one-dimensional array of n values. For simplicity we assume that n is a power of 2. (We use this assumption throughout this chapter, but there is no loss of generality. If n has a different value, the data can be extended by appending zeros. After decompression, the extra zeros are removed.)

Consider the array of eight values (1, 2, 3, 4, 5, 6, 7, 8). We first compute the four averages (1 + 2)/2 = 3/2, (3 + 4)/2 = 7/2, (5 + 6)/2 = 11/2, and (7 + 8)/2 = 15/2. It is impossible to reconstruct the original eight values from these four averages, so we also compute the four differences (1 − 2)/2 = −1/2, (3 − 4)/2 = −1/2, (5 − 6)/2 = −1/2, and (7 − 8)/2 = −1/2. These differences are called detail coefficients, and in this section the terms "difference" and "detail" are used interchangeably. We can think of the averages as a coarse-resolution representation of the original image, and of the details as the data needed to reconstruct the original image from this coarse resolution. If the pixels of the image are correlated, the coarse representation will resemble the original pixels, while the details will be small. This explains why the Haar wavelet compression of images uses averages and details.
Prolonged, lugubrious stretches of Sunday afternoon in a university town could be mitigated by attending Sillery's tea parties, to which anyone might drop in after half-past three. Action of some law of averages always regulated numbers at these gatherings to something between four and eight persons, mostly undergraduates, though an occasional don was not unknown.
—Anthony Powell, A Question of Upbringing (1951).

It is easy to see that the array (3/2, 7/2, 11/2, 15/2, −1/2, −1/2, −1/2, −1/2), made of the four averages and four differences, can be used to reconstruct the original eight values. This array has eight values, but its last four components, the differences, tend to be small numbers, which helps in compression. Encouraged by this, we repeat the process on the four averages, the large components of our array. They are transformed into two averages and two differences, yielding (10/4, 26/4, −4/4, −4/4, −1/2, −1/2, −1/2, −1/2). The next, and last, iteration of this process transforms the first two components of the new array into one average (the average of all eight components of the original array) and one difference:

(36/8, −16/8, −4/4, −4/4, −1/2, −1/2, −1/2, −1/2).

The last array is the Haar wavelet transform of the original data items. Because of the differences, the wavelet transform tends to have numbers smaller than the original pixel values, so it is easier to compress using RLE, perhaps combined with move-to-front [Salomon 09] and Huffman coding. Lossy compression can be obtained if some of the smaller differences are quantized or even completely deleted (set to zero).

Before we continue, it is interesting (and also useful) to determine the complexity of this transform, i.e., the number of arithmetic operations as a function of the size of the data. In our example we needed 8 + 4 + 2 = 14 operations (additions and subtractions), a number that can also be expressed as 14 = 2(8 − 1). In the general case, assume that we start with N = 2^n data items.
In the first iteration we need 2^n operations, in the second one we need 2^{n−1} operations, and so on, until the last iteration, where 2^{n−(n−1)} = 2^1 operations are needed. Thus, the total number of operations is

$$\sum_{i=1}^{n} 2^i = \left(\sum_{i=0}^{n} 2^i\right) - 1 = \frac{1-2^{n+1}}{1-2} - 1 = 2^{n+1} - 2 = 2(2^n - 1) = 2(N-1).$$
The Haar wavelet transform of N data items can therefore be performed with 2(N − 1) operations, so its complexity is O(N), an excellent result.

It is useful to associate with each iteration a quantity called resolution, which is defined as the number of remaining averages at the end of the iteration. The resolutions after each of the three iterations above are 4 (= 2^2), 2 (= 2^1), and 1 (= 2^0). Section 25.1.5 shows that each component of the wavelet transform should be normalized by dividing it by the square root of the resolution. (This is the orthonormal Haar transform, also discussed in Section 24.2.3.) Thus, our example wavelet transform becomes

$$\left(\frac{36/8}{\sqrt{2^0}}, \frac{-16/8}{\sqrt{2^0}}, \frac{-4/4}{\sqrt{2^1}}, \frac{-4/4}{\sqrt{2^1}}, \frac{-1/2}{\sqrt{2^2}}, \frac{-1/2}{\sqrt{2^2}}, \frac{-1/2}{\sqrt{2^2}}, \frac{-1/2}{\sqrt{2^2}}\right).$$
If the normalized wavelet transform is used, it can be formally proved that ignoring the smallest differences is the best choice for lossy wavelet compression, since it causes the smallest loss of image information.

The two procedures of Figure 25.2 illustrate how the normalized wavelet transform of an array of n components (where n is a power of 2) can be computed. Reconstructing the original array from the normalized wavelet transform is illustrated by the pair of procedures of Figure 25.3. These procedures seem at first different from the averages and differences discussed earlier. They don't compute averages, because they divide by √2 instead of by 2; the first procedure starts by dividing the entire array by √n, and the second procedure ends by doing the reverse. The final result, however, is the same as that shown above. Starting with the array (1, 2, 3, 4, 5, 6, 7, 8), the three iterations of procedure NWTcalc result in the following arrays:

$$\left(\frac{3}{\sqrt{2^4}}, \frac{7}{\sqrt{2^4}}, \frac{11}{\sqrt{2^4}}, \frac{15}{\sqrt{2^4}}, \frac{-1}{\sqrt{2^4}}, \frac{-1}{\sqrt{2^4}}, \frac{-1}{\sqrt{2^4}}, \frac{-1}{\sqrt{2^4}}\right),$$
$$\left(\frac{10}{\sqrt{2^5}}, \frac{26}{\sqrt{2^5}}, \frac{-4}{\sqrt{2^5}}, \frac{-4}{\sqrt{2^5}}, \frac{-1}{\sqrt{2^4}}, \frac{-1}{\sqrt{2^4}}, \frac{-1}{\sqrt{2^4}}, \frac{-1}{\sqrt{2^4}}\right),$$
$$\left(\frac{36}{\sqrt{2^6}}, \frac{-16}{\sqrt{2^6}}, \frac{-4}{\sqrt{2^5}}, \frac{-4}{\sqrt{2^5}}, \frac{-1}{\sqrt{2^4}}, \frac{-1}{\sqrt{2^4}}, \frac{-1}{\sqrt{2^4}}, \frac{-1}{\sqrt{2^4}}\right) = \left(\frac{36/8}{\sqrt{2^0}}, \frac{-16/8}{\sqrt{2^0}}, \frac{-4/4}{\sqrt{2^1}}, \frac{-4/4}{\sqrt{2^1}}, \frac{-1/2}{\sqrt{2^2}}, \frac{-1/2}{\sqrt{2^2}}, \frac{-1/2}{\sqrt{2^2}}, \frac{-1/2}{\sqrt{2^2}}\right).$$
25.1.2 A Formal Description

The use of the Haar transform for image compression is described here from a practical point of view. We first show how this transform is applied to the compression of grayscale
procedure NWTcalc(a: array of real, n: int);
  comment n is the array size (a power of 2)
  a := a/√n;  comment divide entire array
  j := n;
  while j ≥ 2 do
    NWTstep(a, j);
    j := j/2;
  endwhile;
end;

procedure NWTstep(a: array of real, j: int);
  for i = 1 to j/2 do
    b[i] := (a[2i−1] + a[2i])/√2;
    b[j/2+i] := (a[2i−1] − a[2i])/√2;
  endfor;
  a := b;  comment move entire array
end;

Figure 25.2: Computing the Normalized Wavelet Transform.
procedure NWTreconst(a: array of real, n: int);
  j := 2;
  while j ≤ n do
    NWTRstep(a, j);
    j := 2j;
  endwhile;
  a := a·√n;  comment multiply entire array
end;

procedure NWTRstep(a: array of real, j: int);
  for i = 1 to j/2 do
    b[2i−1] := (a[i] + a[j/2+i])/√2;
    b[2i] := (a[i] − a[j/2+i])/√2;
  endfor;
  a := b;  comment move entire array
end;

Figure 25.3: Restoring from a Normalized Wavelet Transform.
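The two figures translate directly into a language such as Python. The following is a sketch of the same procedures (with array indices shifted to start at 0), not code from the book:

```python
import math

def nwt(a):
    """Normalized Haar wavelet transform (Figure 25.2); len(a) a power of 2."""
    n = len(a)
    a = [v / math.sqrt(n) for v in a]        # divide entire array
    j = n
    while j >= 2:
        b = a[:]                             # entries beyond j are untouched
        for i in range(j // 2):
            b[i] = (a[2*i] + a[2*i + 1]) / math.sqrt(2)
            b[j // 2 + i] = (a[2*i] - a[2*i + 1]) / math.sqrt(2)
        a = b
        j //= 2
    return a

def nwt_reconst(a):
    """Inverse transform (Figure 25.3): recover the original array."""
    n = len(a)
    j = 2
    while j <= n:
        b = a[:]
        for i in range(j // 2):
            b[2*i] = (a[i] + a[j // 2 + i]) / math.sqrt(2)
            b[2*i + 1] = (a[i] - a[j // 2 + i]) / math.sqrt(2)
        a = b
        j *= 2
    return [v * math.sqrt(n) for v in a]     # multiply entire array
```

Applied to (1, 2, ..., 8), nwt yields (4.5, −2, −1/√2, −1/√2, −1/4, −1/4, −1/4, −1/4), which is exactly the normalized transform derived earlier, and nwt_reconst inverts it.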
We spake no word,
Tho' each I ween did hear the other's soul.
Not a wavelet stirred,
And yet we heard
The loneliest music of the weariest waves
That ever roll.
—Abram J. Ryan, Poems.
images, then show how this method can be extended to color images. One reference for the Haar transform is [Stollnitz et al. 96]. The Haar transform uses a scale function φ(t) and a wavelet ψ(t), both shown in Figure 25.4a, to represent a large number of functions f(t). The representation is the infinite sum
$$f(t)=\sum_{k=-\infty}^{\infty}c_k\,\varphi(t-k)+\sum_{j=0}^{\infty}\sum_{k=-\infty}^{\infty}d_{j,k}\,\psi(2^jt-k),$$
where c_k and d_{j,k} are coefficients to be calculated. The basic scale function φ(t) is the unit pulse
$$\varphi(t)=\begin{cases}1,&0\le t<1,\\ 0,&\text{otherwise}.\end{cases}$$
The function φ(t−k) is a copy of φ(t), shifted k units to the right. Similarly, φ(2t−k) is a copy of φ(t−k) scaled to half the width of φ(t−k). The shifted copies are used to approximate f(t) at different times t. The scaled copies are used to approximate f(t) at higher resolutions. Figure 25.4b shows the functions φ(2^j t−k) for j = 0, 1, 2, and 3 and for k = 0, 1, ..., 7. The basic Haar wavelet is the step function
$$\psi(t)=\begin{cases}1,&0\le t<0.5,\\ -1,&0.5\le t<1.\end{cases}$$
From this we can see that the general Haar wavelet ψ(2^j t−k) is a copy of ψ(t) shifted k units to the right and scaled such that its total width is 1/2^j.

Exercise 25.1: Draw the four Haar wavelets ψ(2²t−k) for k = 0, 1, 2, and 3.

Both φ(2^j t−k) and ψ(2^j t−k) are nonzero in an interval of width 1/2^j. This interval is their support. Since this interval tends to be short, we say that these functions have compact support. We illustrate the basic transform on the simple step function
$$f(t)=\begin{cases}5,&0\le t<0.5,\\ 3,&0.5\le t<1.\end{cases}$$
It is easy to see that f(t) = 4φ(t) + ψ(t). We say that the original steps (5, 3) have been transformed to the (low resolution) average 4 and the (high resolution) detail 1. Using
Figure 25.4: The Haar Basis Scale and Wavelet Functions. [(a) The scale function φ(t) and the wavelet ψ(t). (b) The scaled, shifted copies φ(2^j t−k) for j = 0, 1, 2, 3 and k = 0, 1, ..., 7. (c) ψ(4t−k).]
matrix notation, this can be expressed (up to a factor of √2) as (5, 3)A2 = (4, 1), where A2 is the order-2 Haar transform matrix of Equation (24.9).

Alfréd Haar (1885–1933)
Alfréd Haar was born in Budapest and received his higher mathematical training in Göttingen, where he later became a privatdozent. In 1912, he returned to Hungary and became a professor of mathematics first in Kolozsvár and then in Szeged, where he and his colleagues created a major mathematical center. Haar is best remembered for his work on analysis on groups. In 1932 he introduced an invariant measure on locally compact groups, now called the Haar measure, which allows an analog of Lebesgue integrals to be defined on locally compact topological groups. Mathematical lore has it that John von Neumann tried to discourage Haar in this work because he felt certain that no such measure could exist. The following limerick celebrates Haar's achievement.

Said a mathematician named Haar,
“Von Neumann can’t see very far. He missed a great treasure— They call it Haar measure— Poor Johnny’s just not up to par.”
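Returning to the representation of the step function, the identity f(t) = 4φ(t) + ψ(t) is easy to check numerically (a small sketch; the Python names mirror φ and ψ of Figure 25.4):

```python
def phi(t):
    # Basic scale function: the unit pulse.
    return 1 if 0 <= t < 1 else 0

def psi(t):
    # Basic Haar wavelet.
    if 0 <= t < 0.5:
        return 1
    if 0.5 <= t < 1:
        return -1
    return 0

def f(t):
    # The step function used in the text: 5 on [0, 0.5), 3 on [0.5, 1).
    if 0 <= t < 0.5:
        return 5
    if 0.5 <= t < 1:
        return 3
    return 0

# Verify f(t) = 4*phi(t) + psi(t) on a grid of sample points in [0, 1).
samples = [i / 100 for i in range(100)]
print(all(f(t) == 4 * phi(t) + psi(t) for t in samples))  # prints True
```

On [0, 0.5) the sum is 4 + 1 = 5 and on [0.5, 1) it is 4 − 1 = 3, exactly the two original steps.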
25.1.3 Applying the Haar Transform Once the concept of a wavelet transform is grasped, it is easy to generalize it to a complete two-dimensional image. This can be done in several ways that are discussed in section 8.10 of [Salomon 09]. Here we show two such approaches, called the standard decomposition and the pyramid decomposition. The former (Figure 25.6) starts by computing the wavelet transform of every row of the image. This results in a transformed image where the first column contains averages and all the other columns contain differences. The standard algorithm then computes the wavelet transform of every column. This results in one average value at the top-left corner, with the rest of the top row containing averages of differences, and with all other pixel values transformed into differences. The latter method computes the wavelet transform of the image by alternating between rows and columns. The first step is to calculate averages and differences for all the rows (just one iteration, not the entire wavelet transform). This creates averages in the left half of the image and differences in the right half. The second step is to calculate averages and differences (just one iteration) for all the columns, which results in averages in the top-left quadrant of the image and differences elsewhere. Steps 3 and 4 operate on the rows and columns of that quadrant, resulting in averages concentrated in the top-left subquadrant. Pairs of steps are repeatedly executed on smaller and smaller subsquares, until only one average is left, at the top-left corner of the image, and all other pixel values have been reduced to differences. This process is summarized in Figure 25.7. The transforms described in Section 24.1 are orthogonal. They transform the original pixels into a few large numbers and many small numbers. In contrast, wavelet transforms, such as the Haar transform, are subband transforms. 
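The standard decomposition can be sketched as follows (a sketch following the (a±b)/2 averaging convention of the Mathematica code in Figure 25.8, not the book's listing):

```python
def haar_1d(v):
    # Full 1D transform: repeat averaging/differencing on the shrinking average part.
    v = list(v)
    n = len(v)
    while n >= 2:
        avg = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(n // 2)]
        dif = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(n // 2)]
        v[:n] = avg + dif
        n //= 2
    return v

def standard_decomposition(img):
    # Standard method: transform every row completely, then every column.
    img = [haar_1d(row) for row in img]
    cols = list(map(list, zip(*img)))          # transpose to reach the columns
    cols = [haar_1d(col) for col in cols]
    return list(map(list, zip(*cols)))         # transpose back

img = [[10.0] * 4 for _ in range(4)]           # a flat (maximally correlated) image
t = standard_decomposition(img)
# Only the top-left element is nonzero; it equals the overall average 10,
# and every difference is zero.
```

A flat image is the extreme case of correlation: the single average at the top-left corner carries all the information, and every other coefficient is zero.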
They partition the image into regions such that one region contains large numbers (averages in the case of the Haar transform) and the other regions contain small numbers (differences). However, these regions, which are called subbands, are more than just sets of large and small numbers. They reflect different geometric artifacts of the image. To illustrate this important feature, we examine a small, mostly-uniform image with one vertical line and one horizontal line. Figure 25.5a shows an 8 × 8 image with pixel values of 12, except for a vertical line with pixel values of 14 and a horizontal line with pixel values of 16. Figure 25.5b shows the results of applying the Haar transform once to the rows of the image. The right half of this figure (the differences) is mostly zeros, reflecting the uniform nature of the image. However, traces of the vertical line can easily be seen (the notation 2̄ indicates a negative difference, i.e., −2). Figure 25.5c shows the results of applying the Haar transform once to the columns of Figure 25.5b. The upper-right subband now features traces of the vertical line, whereas the lower-left subband shows traces of the horizontal line. These subbands are denoted by HL and LH, respectively (Figures 25.7 and 25.29, although there is inconsistency in the use of this notation by various authors). The lower-right subband, denoted by HH, reflects diagonal image artifacts (which our
(a)
12 12 12 12 14 12 12 12
12 12 12 12 14 12 12 12
12 12 12 12 14 12 12 12
12 12 12 12 14 12 12 12
12 12 12 12 14 12 12 12
16 16 16 16 14 16 16 16
12 12 12 12 14 12 12 12
12 12 12 12 14 12 12 12

(b)
12 12 13 12 0 0 2 0
12 12 13 12 0 0 2 0
12 12 13 12 0 0 2 0
12 12 13 12 0 0 2 0
12 12 13 12 0 0 2 0
16 16 15 16 0 0 2̄ 0
12 12 13 12 0 0 2 0
12 12 13 12 0 0 2 0

(c)
12 12 13 12 0 0 2 0
12 12 13 12 0 0 2 0
14 14 14 14 0 0 0 0
12 12 13 12 0 0 2 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
4̄ 4̄ 2̄ 4̄ 0 0 4 0
0 0 0 0 0 0 0 0

Figure 25.5: An 8×8 Image and Its Subband Decomposition.
example image lacks). Most interesting is the upper-left subband, denoted by LL, that consists entirely of averages. This subband is a one-quarter version of the entire image, containing traces of both the vertical and the horizontal lines.

Exercise 25.2: Construct a diagram similar to Figure 25.5 to show how subband HH reflects diagonal artifacts of the image. (Artifact: A feature not naturally present, introduced during preparation or investigation.)

Figure 25.29 shows four levels of subbands, where level 1 contains the detailed features of the image (also referred to as the high-frequency or fine-resolution wavelet coefficients) and the top level, level 4, contains the coarse image features (low-frequency or coarse-resolution coefficients). It is clear that the lower levels can be quantized coarsely without much loss of important image information, while the higher levels should be quantized finely. The subband structure is the basis of all the image compression methods that use the wavelet transform.

Figure 25.8 shows typical results of the pyramid wavelet transform. The original image is shown at the top-left part of the figure. In order to illustrate how the pyramid transform works, this image consists only of horizontal, vertical, and slanted lines. The two halves of the top-right part of the figure show a left subband with the averages (this is similar to the entire image) and a right subband with the vertical details of the image. The bottom part of the figure features four subbands where the horizontal and diagonal details are also clear. The Mathematica code is also listed.

Section 24.1 discusses orthogonal transforms. An orthogonal linear transform is performed by computing the inner product of the data (pixel values or audio samples) with a set of basis functions. The result is a set of transform coefficients that can later be quantized or compressed with RLE, Huffman coding, or other methods. As a reminder, the discrete inner product of two vectors f and g is defined by
$$\langle f,g\rangle=\sum_i f_i\,g_i.$$

The subband transform, on the other hand, is performed by computing a convolution of the data (Section 25.2) with a set of bandpass filters. Each resulting subband encodes a particular portion of the frequency content of the data.
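One level of the subband decomposition described above can be sketched in Python (a sketch; averages are computed as (a+b)/2 while, to match the integer entries tabulated in Figure 25.5, differences are kept as plain a−b):

```python
def rows_step(img):
    # One Haar step on every row: averages in the left half, differences in the right.
    out = []
    for row in img:
        avg = [(row[2 * i] + row[2 * i + 1]) / 2 for i in range(len(row) // 2)]
        dif = [row[2 * i] - row[2 * i + 1] for i in range(len(row) // 2)]
        out.append(avg + dif)
    return out

def cols_step(img):
    # The same step applied to every column (transpose, transform rows, transpose back).
    t = list(map(list, zip(*img)))
    return list(map(list, zip(*rows_step(t))))

# 8x8 image: value 12, a vertical line of 14s, a horizontal line of 16s.
img = [[12.0] * 8 for _ in range(8)]
for r in range(8):
    img[r][4] = 14.0          # vertical line in column 5
for c in range(8):
    img[5][c] = 16.0          # horizontal line in row 6
img[5][4] = 14.0              # the vertical line crosses the horizontal one

sub = cols_step(rows_step(img))
# The upper-right (HL) subband holds the vertical-line trace, the lower-left (LH)
# subband the horizontal-line trace, and the lower-right (HH) their crossing.
```

Printing `sub` row by row shows averages concentrated in the top-left quadrant and the line traces appearing exactly in the subbands named in the text.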
procedure StdCalc(a:array of real, n:int);
comment array size is n×n (n = power of 2)
for r=1 to n do NWTcalc(row r of a, n); endfor;
for c=n to 1 do comment loop backwards
  NWTcalc(col c of a, n);
endfor;
end;

procedure StdReconst(a:array of real, n:int);
for c=n to 1 do comment loop backwards
  NWTreconst(col c of a, n);
endfor;
for r=1 to n do NWTreconst(row r of a, n); endfor;
end;
Figure 25.6: The Standard Image Wavelet Transform and Decomposition.
procedure NStdCalc(a:array of real, n:int);
a:=a/√n; comment divide entire array
j:=n;
while j≥2 do
  for r=1 to j do NWTstep(row r of a, j); endfor;
  for c=j to 1 do comment loop backwards
    NWTstep(col c of a, j);
  endfor;
  j:=j/2;
endwhile;
end;

procedure NStdReconst(a:array of real, n:int);
j:=2;
while j≤n do
  for c=j to 1 do comment loop backwards
    NWTRstep(col c of a, j);
  endfor;
  for r=1 to j do NWTRstep(row r of a, j); endfor;
  j:=2j;
endwhile;
a:=a√n; comment multiply entire array
end;
Figure 25.7: The Pyramid Image Wavelet Transform. [The original image is first split into subbands LL, HL, LH, and HH; subband LL is then split again in the same way, and so on.]
ar=Import["Design.raw", "Bit"];
stp=Partition[ar,256];
{row,col}=Dimensions[stp];
ArrayPlot[stp]
(* step 1, loop over columns and construct array ptp *)
ptp=Table[0,{i,1,row},{j,1,col}]; (* Init ptp to zeros *)
mcol=Floor[col/2];
Do[k=1;
  Do[ptp[[i,k]]=(stp[[i,j]]+stp[[i,j+1]])/2;
    ptp[[i,mcol+k]]=(stp[[i,j]]-stp[[i,j+1]])/2;
    k=k+1, {j,1,col-1,2}],
  {i,1,row}]
ArrayPlot[ptp]
(* step 2, loop over the rows of ptp and construct array qtp *)
qtp=Table[0,{i,1,row},{j,1,col}]; (* Init qtp to zeros *)
mrow=Floor[row/2];
Do[k=1;
  Do[qtp[[k,j]]=(ptp[[i,j]]+ptp[[i+1,j]])/2;
    qtp[[mrow+k,j]]=(ptp[[i,j]]-ptp[[i+1,j]])/2;
    k=k+1, {i,1,row-1,2}],
  {j,1,col}]
ArrayPlot[qtp]

Figure 25.8: A Pyramid Wavelet Decomposition.
The discrete convolution h is defined by Equation (25.1):
$$h_i=f\star g=\sum_j f_j\,g_{i-j}.\qquad(25.1)$$
(Each element h_i of the discrete convolution h is the sum of products. It depends on i in the special way shown.)

Either method, standard or pyramid, results in a transformed, although not yet compressed, image that has one average at the top-left corner and smaller numbers, differences, or averages of differences everywhere else. These numbers can be compressed using a combination of methods, such as RLE, move-to-front, and Huffman coding. If lossy compression is acceptable, some of the smallest differences can be quantized or even set to zero, which creates run lengths of zeros, making the use of RLE even more attractive.

Whiter foam than thine, O wave,
Wavelet never wore,
Stainless wave; and now you lave
The far and stormless shore —
Ever — ever — evermore!
—Abram J. Ryan, Poems.

Color Images: So far we have assumed that each pixel is a single number (i.e., we have a single-component image, in which all pixels are shades of the same color, normally gray). Any compression method for single-component images can be extended to color (three-component) images by separating the three components, then transforming and compressing each individually. If the compression method is lossy, it makes sense to convert the three image components from their original color representation, which is normally RGB, to the YIQ color representation. The Y component of this representation is called luminance, and the I and Q components (the chrominance) are responsible for the color information (Chapter 21). The advantage of this color representation is that the human eye is most sensitive to Y and least sensitive to Q. A lossy method should therefore leave the Y component alone and delete some data from the I component, and more data from the Q component, resulting in good compression and in a loss to which the eye is not that sensitive. It is interesting to note that United States color television transmission also takes advantage of the YIQ representation.
Signals are broadcast with bandwidths of 4 MHz for Y, 1.5 MHz for I, and only 0.6 MHz for Q.
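The RGB-to-YIQ conversion mentioned above is a fixed linear transform. The following sketch uses the commonly published NTSC coefficients (see Chapter 21 for the full discussion; the exact weights here are quoted from standard references, not from this section):

```python
def rgb_to_yiq(r, g, b):
    # Commonly published NTSC luminance/chrominance weights.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return y, i, q

# A gray pixel has all its information in Y; I and Q vanish, which is
# one reason the chrominance components tolerate heavier quantization.
y, i, q = rgb_to_yiq(128, 128, 128)
```

Because the Y weights sum to 1 and the I and Q weights each sum to 0, any gray pixel maps to (Y, 0, 0), so the chrominance subbands of a mostly-gray image are nearly empty.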
25.1.4 Properties of the Haar Transform The examples in this section illustrate some properties of the Haar transform, and of the discrete wavelet transform in general. Figure 25.10 shows a highly correlated 8×8 image and its Haar wavelet transform. Both the grayscale and numeric values of the pixels and the transform coefficients are shown. Because the original image is so correlated, the wavelet coefficients are small and there are many zeros.
Exercise 25.3: A glance at Figure 25.10 suggests that the last sentence is wrong. The wavelet transform coefficients listed in the figure are very large compared with the pixel values of the original image. In fact, we know that the top-left Haar transform coefficient should be the average of all the image pixels. Since the pixels of our image have values that are (more or less) uniformly distributed in the interval [0, 255], this average should be around 128, yet the top-left transform coefficient is 1,051. Explain this!

In a discrete wavelet transform, most of the wavelet coefficients are details (or differences). The details in the lower levels represent the fine details of the image. As we move higher in the subband level, we find details that correspond to coarser image features. Figure 25.11a illustrates this concept. It shows an image that is smooth on the left and has "activity" (i.e., adjacent pixels that tend to be different) on the right. Part (b) of the figure shows the wavelet transform of the image. Low levels (corresponding to fine details) have transform coefficients on the right, since this is where the image activity is located. High levels (coarse details) look similar but also have coefficients on the left side, because the image is not completely blank on the left.

The Haar transform is the simplest wavelet transform, but even this simple method illustrates the power of the wavelet transform. It turns out that the low levels of the discrete wavelet transform contain the unimportant image features, so quantizing or discarding these coefficients can lead to lossy compression that is both efficient and of high quality. Often, the image can be reconstructed from very few transform coefficients without any noticeable loss of quality. Figure 25.12a–c shows three reconstructions of the simple 8 × 8 image of Figure 25.10. They were obtained from only 32, 13, and 5 wavelet coefficients, respectively.
Figure 25.9: Reconstructing a 128×128 Simple Image from 4% of Its Coefficients.
Figure 25.9 is a similar example. It shows a bi-level image fully reconstructed from just 4% of its transform coefficients (653 coefficients out of 128×128).
Figure 25.10: The Image that is Reconstructed in Figure 25.12 and Its Haar Transform. [The top row of the image is (255, 224, 192, 159, 127, 95, 63, 32); the top-left transform coefficient is 1,051.]
Figure 25.11: (a) A 128×128 Image with Activity on the Right. (b) Its Transform.
Figure 25.12: Three Lossy Reconstructions of an 8×8 Image. [(a) From 32 coefficients (nz = 32). (b) From 13 coefficients (nz = 13). (c) From 5 coefficients (nz = 5).]
Experimenting is the key to understanding these concepts. Proper mathematical software makes it easy to input images and experiment with various features of the discrete wavelet transform. In order to help the interested reader, Figure 25.13 lists a Matlab program that inputs an image, computes its Haar wavelet transform, discards a given percentage of the smallest transform coefficients, then computes the inverse transform to reconstruct the image. Lossy wavelet image compression involves discarding coefficients, so the concept of sparseness ratio is defined to measure the number of coefficients discarded. The sparseness ratio is the total number of wavelet coefficients divided by the number of nonzero coefficients left after some are discarded. The higher the sparseness ratio, the fewer coefficients are left. Higher sparseness ratios lead to better compression but may result in poorly reconstructed images. The sparseness ratio is distantly related to the compression factor, a compression measure defined in the Introduction. The line "filename='lena128'; dim=128;" contains the image file name and the dimension of the image. The image files used by me were in raw form and contained just the grayscale values, each as a single byte. There is no header, and not even the image resolution (number of rows and columns) is included in the file. However, Matlab can read other types of files. The image is assumed to be square, and parameter "dim" should be a power of 2. The assignment "thresh=" specifies the percentage of transform coefficients to be deleted. This provides an easy way to experiment with lossy wavelet image compression. File "harmatt.m" contains two functions that compute the Haar wavelet coefficients in a matrix form (Section 25.1.5). (A technical note: A Matlab m file may include commands or a function but not both. It may, however, contain more than one function, provided that only the top function is invoked from outside the file.
All the other functions must be called from within the file. In our case, function harmatt(dim) calls function individ(n).) Exercise 25.4: Use the code of Figure 25.13 (or similar code) to compute the Haar transform of the Lena image (Figure 24.40) and reconstruct it three times by discarding more and more detail coefficients.
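The experiment performed by the Matlab program can be sketched in pure Python on a small one-dimensional array (a sketch, not a translation of the listing; it uses the normalized transform and keeps only the largest coefficients):

```python
import math

def haar(v):
    # Normalized 1D Haar transform (length a power of 2).
    v = [x / math.sqrt(len(v)) for x in v]
    j = len(v)
    while j >= 2:
        h = j // 2
        s = [(v[2 * i] + v[2 * i + 1]) / math.sqrt(2) for i in range(h)]
        d = [(v[2 * i] - v[2 * i + 1]) / math.sqrt(2) for i in range(h)]
        v[:j] = s + d
        j = h
    return v

def ihaar(v):
    # Inverse of haar().
    v = list(v)
    n = len(v)
    j = 2
    while j <= n:
        h = j // 2
        out = []
        for i in range(h):
            out += [(v[i] + v[h + i]) / math.sqrt(2), (v[i] - v[h + i]) / math.sqrt(2)]
        v[:j] = out
        j *= 2
    return [x * math.sqrt(n) for x in v]

signal = [5, 5, 6, 6, 5, 5, 100, 5]          # mostly smooth, one spike
coeffs = haar(signal)
# Keep only the 4 largest coefficients (in absolute value), zero the rest.
cutoff = sorted(abs(c) for c in coeffs)[-4]
kept = [c if abs(c) >= cutoff else 0.0 for c in coeffs]
approx = ihaar(kept)
err = max(abs(a - b) for a, b in zip(approx, signal))
# Half the coefficients are discarded, yet the worst per-sample error is only 0.5.
```

The spike survives because its detail coefficients are large; only the small fine-detail coefficient describing the 5-versus-6 wiggle is lost, smoothing that region to 5.5.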
25.1.5 A Matrix Approach The principle of the Haar transform is to compute averages and differences. It turns out that this can be done by means of matrix multiplication ([Mulcahy 96] and [Mulcahy 97]). As an example, we look at the top row of the simple 8×8 image of Figure 25.10. Anyone with a little experience with matrices can construct a matrix that when multiplied by this vector creates a vector with four averages and four differences. Matrix A1 of Equation (25.2) does that and, when multiplied by the top row of pixels of Figure 25.10, generates (239.5, 175.5, 111.0, 47.5, 15.5, 16.5, 16.0, 15.5). Similarly, matrices A2 and A3 perform the second and third steps of the transform, respectively. The results are shown
clear; % main program
filename='lena128'; dim=128;
fid=fopen(filename,'r');
if fid==-1 disp('file not found')
else img=fread(fid,[dim,dim])'; fclose(fid);
end
thresh=0.0; % percent of transform coefficients deleted
figure(1), imagesc(img), colormap(gray), axis off, axis square
w=harmatt(dim); % compute the Haar dim x dim transform matrix
timg=w*img*w'; % forward Haar transform
tsort=sort(abs(timg(:)));
tthresh=tsort(floor(max(thresh*dim*dim,1)));
cim=timg.*(abs(timg) > tthresh);
[i,j,s]=find(cim);
dimg=sparse(i,j,s,dim,dim);
% figure(2) displays the remaining transform coefficients
%figure(2), spy(dimg), colormap(gray), axis square
figure(2), image(dimg), colormap(gray), axis square
cimg=full(w'*sparse(dimg)*w); % inverse Haar transform
density = nnz(dimg);
disp([num2str(100*thresh) '% of smallest coefficients deleted.'])
disp([num2str(density) ' coefficients remain out of ' ...
  num2str(dim) 'x' num2str(dim) '.'])
figure(3), imagesc(cimg), colormap(gray), axis off, axis square

File harmatt.m with two functions
function x = harmatt(dim)
num=log2(dim);
p = sparse(eye(dim)); q = p;
i=1;
while i …

… i > N. When expressed in terms of a matrix, the matrix is lower triangular and banded. Such filters are commonly used and are important.
From the Dictionary Tap (noun). 1. A cylindrical plug or stopper for closing an opening through which liquid is drawn, as in a cask; spigot. 2. A faucet or cock. 3. A connection made at an intermediate point on an electrical circuit or device. 4. An act or instance of wiretapping.
To illustrate the frequency response of a filter we select an input vector of the form
$$x(n)=e^{in\omega}=\cos(n\omega)+i\sin(n\omega),\qquad -\infty<n<\infty.$$
This is a complex function whose real and imaginary parts are a cosine and a sine, respectively, both with frequency ω. It is known that the Fourier transform of a pulse contains all the frequencies, but the Fourier transform of a sine wave has just one frequency. The
25.2 Filter Banks
smallest frequency is ω = 0, for which the vector becomes x = (..., 1, 1, 1, 1, 1, ...). The highest frequency is ω = π, where the same vector becomes x = (..., 1, −1, 1, −1, 1, ...). The special feature of this input is that the output vector y(n) is a multiple of the input. For the moving average, the output (filter response) is
$$y(n)=\tfrac12 x(n)+\tfrac12 x(n-1)=\tfrac12 e^{in\omega}+\tfrac12 e^{i(n-1)\omega}=\left(\tfrac12+\tfrac12 e^{-i\omega}\right)e^{in\omega}=H(\omega)\,x(n),$$
where H(ω) = (1/2 + 1/2 e^{−iω}) is the frequency response function of the filter. Since H(0) = 1/2 + 1/2 = 1, we see that the input x = (..., 1, 1, 1, 1, 1, ...) is transformed to itself. Also, H(ω) for small values of ω generates output that is very similar to the input. This filter "lets" the low frequencies through, hence the name "lowpass filter." For ω = π, the input is x = (..., 1, −1, 1, −1, 1, ...) and the output is all zeros (since the average of 1 and −1 is zero). This lowpass filter smooths out the high-frequency regions (the bumps) of the input signal. Notice that we can write
$$H(\omega)=\cos\left(\frac{\omega}{2}\right)e^{-i\omega/2}.$$
When we plot the magnitude |H(ω)| = cos(ω/2) of H(ω) (Figure 25.18a), it is easy to see that it has a maximum at ω = 0 (the lowest frequency) and two minima at ω = ±π (the highest frequencies).

The highpass filter uses differences to pick up the high frequencies in the input signal, and reduces or removes the smooth (low frequency) parts. In the case of the Haar transform, the highpass filter computes
$$y(n)=\tfrac12 x(n)-\tfrac12 x(n-1)=h\star x,$$
where the filter coefficients are h(0) = 1/2 and h(1) = −1/2, or h = (..., 0, 0, 1/2, −1/2, 0, ...). In matrix notation this can be expressed by
$$\begin{pmatrix}\vdots\\ y(-1)\\ y(0)\\ y(1)\\ \vdots\end{pmatrix}=
\begin{pmatrix}\ddots&&&&\\ \cdots&\tfrac12&&&\\ &-\tfrac12&\tfrac12&&\\ &&-\tfrac12&\tfrac12&\\ &&&&\ddots\end{pmatrix}
\begin{pmatrix}\vdots\\ x(-1)\\ x(0)\\ x(1)\\ \vdots\end{pmatrix}.$$
The main diagonal contains copies of h(0), and the diagonal below contains h(1). Using the identity and delay operators, this can also be written
$$\text{highpass filter}=\tfrac12(\text{identity})-\tfrac12(\text{delay}).$$
Again selecting input x(n) = e^{inω}, it is easy to see that the output is
$$y(n)=\tfrac12 e^{in\omega}-\tfrac12 e^{i(n-1)\omega}=\left(\tfrac12-\tfrac12 e^{-i\omega}\right)e^{in\omega}=i\sin(\omega/2)\,e^{-i\omega/2}\,e^{in\omega}.$$
This time the highpass response function is
$$H_1(\omega)=\tfrac12-\tfrac12 e^{-i\omega}=\tfrac12\left(e^{i\omega/2}-e^{-i\omega/2}\right)e^{-i\omega/2}=i\sin(\omega/2)\,e^{-i\omega/2}.$$
The magnitude is |H₁(ω)| = |sin(ω/2)|. It is shown in Figure 25.18b, and it is obvious that it has a minimum for frequency zero and two maxima for large frequencies.
Figure 25.18: Magnitudes of (a) Lowpass and (b) Highpass Filters.
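The two response magnitudes are easy to check numerically (a sketch; H0 and H1 denote the lowpass and highpass responses derived above):

```python
import cmath
import math

def H0(w):
    # Lowpass (moving average) frequency response: 1/2 + 1/2 e^{-iw}.
    return 0.5 + 0.5 * cmath.exp(-1j * w)

def H1(w):
    # Highpass (moving difference) frequency response: 1/2 - 1/2 e^{-iw}.
    return 0.5 - 0.5 * cmath.exp(-1j * w)

# |H0(w)| = cos(w/2) and |H1(w)| = |sin(w/2)| on [-pi, pi].
for w in [-math.pi, -1.0, 0.0, 1.0, math.pi]:
    assert abs(abs(H0(w)) - abs(math.cos(w / 2))) < 1e-12
    assert abs(abs(H1(w)) - abs(math.sin(w / 2))) < 1e-12
```

At ω = 0 the lowpass response is 1 and the highpass response is 0; at ω = ±π the roles are exchanged, which is exactly the shape sketched in Figure 25.18.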
An important property of filter banks is that none of the individual filters is invertible, but the bank as a whole has to be designed such that the input signal can be perfectly reconstructed from the output in spite of the data loss caused by downsampling. It is easy to see, for example, that the constant signal x = (..., 1, 1, 1, 1, 1, ...) is transformed by the highpass filter H₁ to an output vector of all zeros. Obviously, there cannot exist an inverse filter H₁⁻¹ that will be able to reconstruct the original input from a zero vector. The best that such an inverse transform can do is to use the zero vector to reconstruct another zero vector.

Exercise 25.6: Show an example of an input vector x that is transformed by the lowpass filter H₀ to a vector of all zeros.

Summary: The discussion of filter banks in this section should be compared to the discussion of image transforms in Section 24.1. Even though both sections describe transforms, they differ in their approach, since they describe different classes of transforms. Each of the transforms described in Section 24.1 is based on a set of orthogonal basis functions (or orthogonal basis images), and is computed as an inner product of the input signal with the basis functions. The result is a set of transform coefficients that are subsequently compressed either losslessly (by RLE or some entropy encoder) or lossily (by quantization followed by entropy coding). This section deals with subband transforms [Simoncelli and Adelson 90], a different type of transform that is computed by taking the convolution of the input signal with a
set of bandpass filters and decimating the results. Each decimated set of transform coefficients is a subband signal that encodes a specific range of the frequencies of the input. Reconstruction is done by upsampling, followed by computing the inverse transforms, and merging the resulting sets of outputs from the inverse filters. The main advantage of subband transforms is that they isolate the different frequencies of the input signal, thereby making it possible for the user to precisely control the loss of data in each frequency range. In practice, such a transform decomposes an image into several subbands, corresponding to different image frequencies, and each subband can be quantized differently. The main disadvantage of this type of transform is the introduction of artifacts, such as aliasing and ringing, into the reconstructed image, because of the downsampling. This is why the Haar transform is unsatisfactory, and most of the research in this field is concerned with finding better sets of filters. Figure 25.19 illustrates a typical case of a general subband filter bank with N bandpass filters and three stages. Notice how the output of the lowpass filter H0 of each stage is sent to the next stage for further decomposition, and how the combined output of the synthesis bank of a stage is sent to the top inverse filter of the synthesis bank of the preceding stage.
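The perfect-reconstruction requirement can be illustrated with the simplest case, a two-channel Haar bank (a sketch: analysis splits the signal into two downsampled subbands, and the matching synthesis step merges them back):

```python
import math

def analysis(x):
    # Split x into downsampled lowpass (s) and highpass (d) subbands.
    r2 = math.sqrt(2)
    s = [(x[2 * i] + x[2 * i + 1]) / r2 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / r2 for i in range(len(x) // 2)]
    return s, d

def synthesis(s, d):
    # Upsample and merge the two subbands; undoes the analysis step exactly.
    r2 = math.sqrt(2)
    x = []
    for si, di in zip(s, d):
        x += [(si + di) / r2, (si - di) / r2]
    return x

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
s, d = analysis(x)
# Neither subband alone determines x (each has only half the samples),
# but together they reconstruct it perfectly.
y = synthesis(s, d)
```

Each subband carries only half the samples, so neither channel is invertible on its own, yet the pair reconstructs the input exactly; this is the design goal stated above.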
Figure 25.19: A General Filter Bank. [Each stage passes its input x(n) through analysis filters H0, H1, ..., HN followed by downsampling ↓k0, ..., ↓kN, producing subband signals y0(n), ..., yN(n); the lowpass output y0(n) feeds the next stage. Reconstruction upsamples (↑k0, ..., ↑kN) and applies the synthesis filters F0, F1, ..., FN.]
25.2.1 Deriving the Filter Coefficients Once the basic operation of filter banks is understood, the natural question is, how are the filter coefficients derived? A full answer to this question is outside the scope of this book (see, for example, [Akansu and Haddad 92]), but this section provides a glimpse at the rules and methods used to figure out the values of various filter banks.
Given a set of two forward and two inverse N-tap filters H0 and H1, and F0 and F1 (where N is even), we denote their coefficients by
$$\begin{aligned}
\mathbf{h}_0&=\bigl(h_0(0),h_0(1),\ldots,h_0(N-1)\bigr), &\mathbf{f}_0&=\bigl(f_0(0),f_0(1),\ldots,f_0(N-1)\bigr),\\
\mathbf{h}_1&=\bigl(h_1(0),h_1(1),\ldots,h_1(N-1)\bigr), &\mathbf{f}_1&=\bigl(f_1(0),f_1(1),\ldots,f_1(N-1)\bigr).
\end{aligned}$$
The four vectors h0, h1, f0, and f1 are the impulse responses of the four filters. The simplest set of conditions that these quantities have to satisfy is:
1. Normalization: Vector h0 is normalized (i.e., its length is one unit).
2. Orthogonality: For any integer i that satisfies 1 ≤ i < N/2, the vector formed by the first 2i elements of h0 should be orthogonal to the vector formed by the last 2i elements of the same h0.
3. Vector f0 is the reverse of h0.
4. Vector h1 is a copy of f0 where the signs of the odd-numbered elements (the first, third, etc.) are reversed. We can express this by saying that h1 is computed by coordinate multiplication of f0 and (−1, 1, −1, 1, ..., −1, 1).
5. Vector f1 is a copy of h0 where the signs of the even-numbered elements (the second, fourth, etc.) are reversed. We can express this by saying that f1 is computed by coordinate multiplication of h0 and (1, −1, 1, −1, ..., 1, −1).

For two-tap filters, rule 1 implies
$$h_0^2(0)+h_0^2(1)=1.\qquad(25.5)$$
Rule 2 is not applicable because N = 2, so i < N/2 implies i < 1. Rules 3–5 yield
$$\mathbf{f}_0=\bigl(h_0(1),h_0(0)\bigr),\quad \mathbf{h}_1=\bigl(-h_0(1),h_0(0)\bigr),\quad \mathbf{f}_1=\bigl(h_0(0),-h_0(1)\bigr).$$
It all depends on the values of h0(0) and h0(1), but the single Equation (25.5) is not enough to determine them. However, it is not difficult to see that the choice h0(0) = h0(1) = 1/√2 satisfies Equation (25.5).

For four-tap filters, rules 1 and 2 imply
$$h_0^2(0)+h_0^2(1)+h_0^2(2)+h_0^2(3)=1,\qquad h_0(0)h_0(2)+h_0(1)h_0(3)=0,\qquad(25.6)$$
and rules 3–5 yield
$$\mathbf{f}_0=\bigl(h_0(3),h_0(2),h_0(1),h_0(0)\bigr),\quad \mathbf{h}_1=\bigl(-h_0(3),h_0(2),-h_0(1),h_0(0)\bigr),\quad \mathbf{f}_1=\bigl(h_0(0),-h_0(1),h_0(2),-h_0(3)\bigr).$$
Again, Equation (25.6) is not enough to determine four unknowns, and other considerations (plus mathematical intuition) are needed to derive the four values. They are listed in Equation (25.7) (this is the Daubechies D4 filter).
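Equation (25.7) itself lies outside this excerpt; for reference, the commonly published D4 values are h0 = ((1+√3)/(4√2), (3+√3)/(4√2), (3−√3)/(4√2), (1−√3)/(4√2)). A quick numerical check that these values satisfy Equation (25.6) (a sketch; the coefficients are quoted from standard references, not from this section):

```python
import math

r3, r2 = math.sqrt(3), math.sqrt(2)
# The widely published Daubechies D4 lowpass coefficients.
h0 = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2),
      (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]

norm = sum(c * c for c in h0)            # rule 1: should equal 1
orth = h0[0] * h0[2] + h0[1] * h0[3]     # rule 2: should equal 0
print(norm, orth)
```

Both conditions hold exactly in exact arithmetic: the squares sum to 1 and the cross products cancel, so these four values satisfy the pair of equations in (25.6).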
Exercise 25.7: Write the five conditions above for an eight-tap filter.

Determining the N filter coefficients for each of the four filters H0, H1, F0, and F1 depends on h0(0) through h0(N−1), so it requires N equations. However, in each of the cases above, rules 1 and 2 supply only N/2 equations. Other conditions have to be imposed and satisfied before the N quantities h0(0) through h0(N−1) can be determined. Here are some examples:

Lowpass H0 filter: We want H0 to be a lowpass filter, so it makes sense to require that the frequency response H0(ω) be zero for the highest frequency ω = π.

Minimum phase filter: This condition requires the zeros of the complex function H0(z) to lie on or inside the unit circle in the complex plane.

Controlled collinearity: The linearity of the phase response can be controlled by requiring that the sum
$$\sum_i\bigl[h_0(i)-h_0(N-1-i)\bigr]^2$$
be a minimum.

Other conditions are discussed in [Akansu and Haddad 92].
25.3 The DWT Information that is produced and analyzed in real-life situations is discrete. It comes in the form of numbers, rather than as a continuous function. This is why the discrete, rather than the continuous, wavelet transform is used in practice ([Daubechies 88], [DeVore et al. 92], and [Vetterli and Kovacevic 95]). Section 8.5 of [Salomon 09] discusses the continuous wavelet transform (CWT) and shows that it is the integral of the product f (t)ψ ∗ ( t−b a ), where a, the scale factor, and b, the time shift, can be any real numbers. The corresponding calculation for the discrete case (the DWT) involves a convolution, but experience shows that the quality of this type of transform depends heavily on two factors, the choice of scale factors and time shifts, and the choice of wavelet. In practice, the DWT is computed with scale factors that are negative powers of 2 and time shifts that are nonnegative powers of 2. Figure 25.20 shows the so-called dyadic lattice that illustrates this particular choice. The wavelets used are those that generate orthonormal (or biorthogonal) wavelet bases. The main thrust in wavelet research has therefore been the search for wavelet families that form orthogonal bases. Of those wavelets, the preferred ones are those that have compact support, because they allow for DWT computations with finite impulse response (FIR) filters. The simplest way to describe the discrete wavelet transform is by means of matrix multiplication, along the lines developed in Section 25.1.5. The √ Haar transform depends on two filter coefficients c0 and c1 , both with a value of 1/ 2 √ ≈ 0.7071. The smallest transform matrix that can be constructed in this case is 11 −11 / 2. It is a 2×2 matrix, and it generates two transform coefficients, an average and √ a difference. (Notice that these are not exactly an average and a difference, because 2 is used instead of 2. Better names for them are coarse detail and fine detail, respectively.) 
Figure 25.20: The Dyadic Lattice Showing the Relation Between Scale Factors and Time. (Axes: Scale, in powers of 2, versus Time.)

In general, the DWT can use any set of wavelet filters, but it is computed in the same way regardless of the particular filter used. We start with one of the most popular wavelets, the Daubechies D4. As its name implies, it is based on four filter coefficients c_0, c_1, c_2, and c_3, whose values are listed in Equation (25.7). The transform matrix W is (compare with matrix A_1, Equation (25.2))

$$W=\begin{pmatrix}
c_0 & c_1 & c_2 & c_3 & 0 & 0 & \cdots & 0 & 0\\
c_3 & -c_2 & c_1 & -c_0 & 0 & 0 & \cdots & 0 & 0\\
0 & 0 & c_0 & c_1 & c_2 & c_3 & \cdots & 0 & 0\\
0 & 0 & c_3 & -c_2 & c_1 & -c_0 & \cdots & 0 & 0\\
\vdots & & & & & & \ddots & & \vdots\\
0 & 0 & \cdots & 0 & 0 & c_0 & c_1 & c_2 & c_3\\
0 & 0 & \cdots & 0 & 0 & c_3 & -c_2 & c_1 & -c_0\\
c_2 & c_3 & 0 & \cdots & 0 & 0 & 0 & c_0 & c_1\\
c_1 & -c_0 & 0 & \cdots & 0 & 0 & 0 & c_3 & -c_2
\end{pmatrix}.$$
When this matrix is applied to a column vector of data items (x1 , x2 , . . . , xn ), its top row generates the weighted sum s1 = c0 x1 + c1 x2 + c2 x3 + c3 x4 , its third row generates the weighted sum s2 = c0 x3 + c1 x4 + c2 x5 + c3 x6 , and the other odd-numbered rows generate similar weighted sums si . Such sums are convolutions of the data vector xi with the four filter coefficients. In the language of wavelets, each of them is called a smooth coefficient, and together they are called an H smoothing filter. In a similar way, the second row of the matrix generates the quantity d1 = c3 x1 − c2 x2 + c1 x3 − c0 x4 , and the other even-numbered rows generate similar convolutions. Each di is called a detail coefficient, and together they are called a G filter. G is not a smoothing filter. In fact, the filter coefficients are chosen such that the G filter generates small values when the data items xi are correlated. Together, H and G are
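The action of the odd-numbered (H) and even-numbered (G) rows can be sketched directly, without constructing W. The Python fragment below is a sketch rather than the book's Matlab code; the name d4_step and the wraparound indexing that mimics the last two rows of W are our choices:

```python
import math

# Daubechies D4 filter coefficients, Equation (25.7)
s3 = math.sqrt(3.0)
c = [(1 + s3) / (4 * math.sqrt(2.0)), (3 + s3) / (4 * math.sqrt(2.0)),
     (3 - s3) / (4 * math.sqrt(2.0)), (1 - s3) / (4 * math.sqrt(2.0))]

def d4_step(x):
    """One application of the D4 matrix W: n/2 smooth and n/2
    detail coefficients, with wraparound as in the last two rows."""
    n = len(x)
    smooth, detail = [], []
    for i in range(0, n, 2):
        # H (smoothing) filter: the odd-numbered rows of W
        smooth.append(c[0]*x[i] + c[1]*x[(i+1) % n]
                      + c[2]*x[(i+2) % n] + c[3]*x[(i+3) % n])
        # G filter: the even-numbered rows of W
        detail.append(c[3]*x[i] - c[2]*x[(i+1) % n]
                      + c[1]*x[(i+2) % n] - c[0]*x[(i+3) % n])
    return smooth, detail

s, d = d4_step([1.0] * 8)
# Constant (perfectly correlated) data: every detail coefficient
# is zero, because c3 - c2 + c1 - c0 = 0.
```

This illustrates the claim above: when the data items are correlated, the G filter produces small values.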
called quadrature mirror filters (QMF). The discrete wavelet transform of an image can therefore be viewed as passing the original image through a QMF that consists of a pair of lowpass (H) and highpass (G) filters. If W is an n × n matrix, it generates n/2 smooth coefficients s_i and n/2 detail coefficients d_i. The transposed matrix is

$$W^T=\begin{pmatrix}
c_0 & c_3 & 0 & 0 & \cdots & 0 & 0 & c_2 & c_1\\
c_1 & -c_2 & 0 & 0 & \cdots & 0 & 0 & c_3 & -c_0\\
c_2 & c_1 & c_0 & c_3 & \cdots & 0 & 0 & 0 & 0\\
c_3 & -c_0 & c_1 & -c_2 & \cdots & 0 & 0 & 0 & 0\\
\vdots & & & & \ddots & & & & \vdots\\
0 & 0 & \cdots & 0 & 0 & c_2 & c_1 & c_0 & c_3\\
0 & 0 & \cdots & 0 & 0 & c_3 & -c_0 & c_1 & -c_2
\end{pmatrix}.$$
It can be shown that in order for W to be orthonormal, the four coefficients have to satisfy the two relations c_0² + c_1² + c_2² + c_3² = 1 and c_2c_0 + c_3c_1 = 0. The other two equations used to calculate the four filter coefficients are c_3 − c_2 + c_1 − c_0 = 0 and 0c_3 − 1c_2 + 2c_1 − 3c_0 = 0. They represent the vanishing of the first two moments of the sequence (c_3, −c_2, c_1, −c_0). The solutions are

$$c_0 = \frac{1+\sqrt{3}}{4\sqrt{2}} \approx 0.48296,\quad
c_1 = \frac{3+\sqrt{3}}{4\sqrt{2}} \approx 0.8365,$$
$$c_2 = \frac{3-\sqrt{3}}{4\sqrt{2}} \approx 0.2241,\quad
c_3 = \frac{1-\sqrt{3}}{4\sqrt{2}} \approx -0.1294. \qquad (25.7)$$
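A quick numeric check (a Python sketch, not part of the book's listings) confirms that these four values satisfy the orthonormality and moment conditions stated above:

```python
import math

# The four D4 filter coefficients of Equation (25.7)
s3, s2 = math.sqrt(3.0), math.sqrt(2.0)
c0 = (1 + s3) / (4 * s2)
c1 = (3 + s3) / (4 * s2)
c2 = (3 - s3) / (4 * s2)
c3 = (1 - s3) / (4 * s2)

# Orthonormality of W
assert abs(c0**2 + c1**2 + c2**2 + c3**2 - 1.0) < 1e-12
assert abs(c2 * c0 + c3 * c1) < 1e-12
# Vanishing of the first two moments of (c3, -c2, c1, -c0)
assert abs(c3 - c2 + c1 - c0) < 1e-12
assert abs(0*c3 - 1*c2 + 2*c1 - 3*c0) < 1e-12
```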
Using a transform matrix W is conceptually simple, but not very practical, since W should be of the same size as the image, which can be large. However, a look at W shows that it is very regular, so there is really no need to construct the full matrix. It is enough to have just the top row of W. In fact, it is enough to have just an array with the filter coefficients. Figure 25.21 lists Matlab code that performs this calculation. Function fwt1(dat,coarse,filter) takes a row vector dat of 2^n data items, and another array, filter, with filter coefficients. It then calculates the first coarse levels of the discrete wavelet transform.

Exercise 25.8: Write similar code for the inverse one-dimensional discrete wavelet transform.

Plotting Functions: Wavelets are being used in many fields and have many applications, but the simple test of Figure 25.21 suggests another application, namely, plotting functions. Any graphics program or graphics software package has to include a routine to plot functions. It works by calculating the function at certain points and connecting the points with straight segments. In regions where the function has small curvature (it resembles a straight line) few points are needed, whereas in areas where the function has large curvature (it changes direction rapidly) more points are required. An ideal plotting routine should therefore be adaptive. It should select the points depending on the curvature of the function.
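The adaptive idea can be sketched with a short recursive routine. This Python fragment is our own illustration (the function name adaptive_points and the flatness test are assumptions, not the book's code): an interval is subdivided whenever the function's midpoint deviates from the chord's midpoint by more than a tolerance, so high-curvature regions automatically receive more points.

```python
def adaptive_points(f, a, b, tol=1e-3, depth=12):
    """Adaptively sample y = f(x) on [a, b]. Subdivide an interval
    when the true midpoint deviates from the straight chord by
    more than tol; depth bounds the recursion."""
    def rec(x0, y0, x1, y1, d):
        xm = 0.5 * (x0 + x1)
        ym = f(xm)
        # deviation of the curve from the chord at the midpoint
        if d == 0 or abs(ym - 0.5 * (y0 + y1)) <= tol:
            return [(x1, y1)]
        return rec(x0, y0, xm, ym, d - 1) + rec(xm, ym, x1, y1, d - 1)
    ya = f(a)
    return [(a, ya)] + rec(a, ya, b, f(b), depth)

flat = adaptive_points(lambda x: 0.0, 0.0, 1.0)    # 2 points suffice
curvy = adaptive_points(lambda x: x * x, -1.0, 1.0)  # many more points
```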
function wc1=fwt1(dat,coarse,filter)
% The 1D Forward Wavelet Transform
% dat must be a 1D row vector of size 2^n,
% coarse is the coarsest level of the transform
% (note that coarse should be

Case 2: Hyperbola (e > 1). e > 1 implies a negative a and a negative b² (hence an imaginary b). If we use the absolute value of the imaginary b, Equation (C.4) becomes

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1,\qquad a, b > 0. \qquad (C.5)$$
This is a canonical hyperbola, where the x axis is the transverse axis and the y axis is called the semiconjugate or imaginary axis. The hyperbola consists of two distinct parts with the imaginary axis separating them. The two points (−a, 0) and (a, 0) are called the vertices of the hyperbola.

Case 3: Parabola (e = 1). The simple transformation x′ = x − k/2 yields, when substituted into Equation (C.1), the canonical parabola

$$y^2 = 4ax,\qquad \text{where } a = k/2 > 0, \qquad (C.6)$$

with focus at (a, 0) (thus, a is the focal distance) and directrix x = −a. The origin is the vertex of the canonical parabola.

All the conic sections can also be expressed (although not in their canonical forms) by

$$f(\theta) = \frac{K}{1 \pm e\cos(\theta)}.$$
For e = 0 this is a circle. For 0 < e < 1 this is an ellipse. For e = 1 this is a parabola and for e > 1 it is a hyperbola.
The parametric representations of the conics are simple. We start with the ellipse. In order to show that the expression

$$\left(a\,\frac{1-t^2}{1+t^2},\; b\,\frac{2t}{1+t^2}\right),\qquad -\infty < t < \infty, \qquad (C.7)$$

traces out an ellipse, we show that it satisfies Equation (C.4):

$$\frac{a^2\bigl(\frac{1-t^2}{1+t^2}\bigr)^2}{a^2} + \frac{b^2\bigl(\frac{2t}{1+t^2}\bigr)^2}{b^2} = \frac{1-2t^2+t^4+4t^2}{1+2t^2+t^4} = 1.$$
The first quadrant is obtained for 0 ≤ t ≤ 1. To get the second quadrant, however, t has to vary from 1 to ∞. Quadrants 4 and 3 are obtained for −∞ ≤ t ≤ 0.

The canonical hyperbola is represented parametrically by

$$\left(a\,\frac{1+t^2}{1-t^2},\; b\,\frac{2t}{1-t^2}\right),\qquad -\infty < t < \infty. \qquad (C.8)$$

The right branch is traced out when −1 ≤ t ≤ 1, and the left branch is obtained when −∞ ≤ t ≤ −1 and 1 ≤ t ≤ ∞. Thus, the two values t = ±1 represent hyperbola points at infinity. The simple expression

$$(at^2,\; 2at),\qquad -\infty < t < \infty, \qquad (C.9)$$
traces out the canonical parabola.

Equations (C.7) and (C.8) are called rational parametrics since they contain the parameter t in the denominator. Rational parametric curves are generally complex but can represent more shapes and are therefore more general than the nonrational ones. One disadvantage of the rational parametrics is variable velocity. Varying t in equal increments generally results in traveling along the curve in unequal steps.

In practice, it is sometimes necessary to have conics placed anywhere in three-dimensional space, not just on the xy plane. This is done by taking a general two-dimensional conic P(t) (one of Equations (C.7), (C.8), or (C.9)), adding a third coordinate z = 0, and transforming it with the general 4 × 4 transformation matrix T (Equation (4.23)). Normally, such a curve is translated and rotated. It may also be scaled and sheared. The result is a three-dimensional curve of the form

$$\mathbf{P}^*(t) = \left(\frac{a_0+a_1t+a_2t^2}{w_0+w_1t+w_2t^2},\;
\frac{b_0+b_1t+b_2t^2}{w_0+w_1t+w_2t^2},\;
\frac{c_0+c_1t+c_2t^2}{w_0+w_1t+w_2t^2}\right)
= \left(\frac{\sum_{i=0}^{2} a_it^i}{\sum_{i=0}^{2} w_it^i},\;
\frac{\sum_{i=0}^{2} b_it^i}{\sum_{i=0}^{2} w_it^i},\;
\frac{\sum_{i=0}^{2} c_it^i}{\sum_{i=0}^{2} w_it^i}\right).$$

Denoting x_i = a_i/w_i, y_i = b_i/w_i, z_i = c_i/w_i, and \mathbf{a}_i = (x_i, y_i, z_i), we can write this as

$$\mathbf{P}^*(t) = \frac{w_0\mathbf{a}_0 + w_1\mathbf{a}_1 t + w_2\mathbf{a}_2 t^2}{w_0+w_1t+w_2t^2}
= \frac{\sum_{i=0}^{2} w_i\mathbf{a}_i t^i}{\sum_{i=0}^{2} w_i t^i}. \qquad (C.10)$$
This is the general rational form of the conic sections. It can also be shown that any rational parametric expression of the form (C.10) represents a conic.
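The identities behind Equations (C.7)–(C.9) are easy to verify numerically. In this Python sketch (the function names are our own), each parametrization is evaluated at several values of t and substituted into the corresponding canonical equation:

```python
def ellipse(a, b, t):
    """Rational parametrization, Equation (C.7)."""
    d = 1 + t * t
    return a * (1 - t * t) / d, b * 2 * t / d

def hyperbola(a, b, t):
    """Rational parametrization, Equation (C.8); t != +-1."""
    d = 1 - t * t
    return a * (1 + t * t) / d, b * 2 * t / d

def parabola(a, t):
    """Polynomial parametrization, Equation (C.9)."""
    return a * t * t, 2 * a * t

a, b = 3.0, 2.0
for t in (-5.0, -0.5, 0.0, 0.5, 5.0):
    x, y = ellipse(a, b, t)
    assert abs(x * x / (a * a) + y * y / (b * b) - 1.0) < 1e-9   # (C.4)
    x, y = hyperbola(a, b, t)
    assert abs(x * x / (a * a) - y * y / (b * b) - 1.0) < 1e-9   # (C.5)
    x, y = parabola(a, t)
    assert abs(y * y - 4 * a * x) < 1e-9                         # (C.6)
```

The variable-velocity disadvantage mentioned above can be seen by printing successive points for equally spaced t: the spacing along the curve is far from uniform.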
She could see at once by his degenerate conic and dissipative terms that he was bent on no good. "Arcsinh," she gasped. "Ho, Ho," he said. "What a symmetric little asymptote you have. I can see your angles have a lot of secs."
—Richard Woodman, Impure Mathematics (1981)
D Mathematica Notes

One of the aims of this book is to give the reader confidence in writing Mathematica code for computer graphics topics (mostly to compute and display curves and surfaces). This Appendix lists several of the Mathematica examples in the book and explains selected lines in each. The examples are all about curves and surfaces, which is why certain commands and techniques appear in several examples. Each command, technique, and approach is explained here once.

Mathematica (now in its 8th version) is an immense software system, with many commands, parameters, and options, which is why this short appendix often refers the reader to [Wolfram 03] (or the latest version of this excellent reference) for more details, more examples, and complete lists of options, data types, and directives (this book is also included in the HELP menu of the software itself).

The examples in this book have been written for ease of readability and are not the fastest or most sophisticated. They have been run on versions 3 through 7 of Mathematica. Bear in mind, though, that Mathematica has gone through many changes and improvements, so code for version 3 may not run in newer versions because older commands have been deleted or replaced by more powerful ones; examples are the Shading and Compiled options.

The first listing is the code for Figure 8.9 (effect of nonbarycentric weights).
1  (* non-barycentric weights example *)
2  Clear[p0,p1,g1,g2,g3,g4];
3  p0={0,0}; p1={5,6};
4  g1=ParametricPlot[(1-t)^3 p0+t^3 p1,{t,0,1}, PlotRange->All,
5     Compiled->False, DisplayFunction->Identity];
6  g3=Graphics[{AbsolutePointSize[4], {Point[p0],Point[p1]} }];
7  p0={0,-1}; p1={5,5};
8  g2=ParametricPlot[(1-t)^3 p0+t^3 p1,{t,0,1},PlotRange->All,
9     Compiled->False, PlotStyle->AbsoluteDashing[{2,2}], DisplayFunction->Identity];
10 g4=Graphics[{AbsolutePointSize[4], {Point[p0],Point[p1]} }];
11 Show[g2,g1,g3,g4, DisplayFunction->$DisplayFunction, DefaultFont->{"cmr10", 10}];
Line 1 is a comment. Anyone with any experience in computer coding, in any programming language, knows the importance of comments. The Clear command of
line 2 is useful in cases where several programs are executed in different cells in one Mathematica session and should not affect each other. If a variable or a function is used by a program, and then used by another program without being redefined, it will have its original meaning. This is a useful feature where a large program can be divided into two parts ("cells" in Mathematica jargon) where the first part defines functions and the second part has the executable commands. However, if several cells are executed and there is no relation between them, a Clear command can save unnecessary errors and precious time spent on debugging.

Line 3 defines two variables of type "list." They are later used as points. Later examples show how to construct lists of control points or data points, either two-dimensional or three-dimensional.

Line 4 is the first example of the ParametricPlot command (note the uppercase letters). This command plots a two-dimensional parametric curve (there is also a ParametricPlot3D version). It expects two or more arguments. The first argument is an expression (that normally depends on a parameter t) that evaluates to a pair of numbers for any value of t. Each pair is plotted as a point. If several curves should be plotted, this argument can be a list of expressions. The second argument is the range of values of t, written as {t, tmin, tmax}. The remaining arguments are options of ParametricPlot. This command has the same options as the low-level Plot command, and they are all listed in [Wolfram 03]. The options in this example are:

PlotRange->All. Plot the entire curve. This option can be used to limit the plot to a certain rectangle.

Compiled->False. Do not compile the parametric function.

DisplayFunction->Identity. Do not display the graphics. Option DisplayFunction tells Mathematica how to display graphics. The value Identity implies no display. The curve is not plotted immediately.
Instead, it is assigned to variable g1, to be displayed later, with other graphics.

Line 6 prepares both p0 and p1 for display as points. Each is converted to an object of type Point, with an absolute size of four printer's points (there are 72 printer's points per inch). There is also a PointSize option, where the size of a point is computed relative to the size of the entire display. The list of two points is assigned, as an object of type Graphics, to variable g3. Notice that the Graphics command accepts one argument that's a two-part list. The first part specifies the point size and the second part is the list of points. The following is a common mistake:

Graphics[AbsolutePointSize[4], {Point[p0],Point[p1]} ]

which triggers the error message "Unknown Graphics option AbsolutePointSize." Mathematica doesn't recognize AbsolutePointSize, because it currently expects a single argument of type Graphics.

Line 7 assigns different coordinates to the two points, and lines 8 and 10 compute another curve and another list of two points and assign them to variables g2 and g4. Option PlotStyle receives the value AbsoluteDashing, which specifies the sizes of the dashes and spaces between them. In addition to dashing, plot styles may include graphics directives such as hue and thickness.

Finally, the Show command on line 11 displays the two curves and four points (variables g1 through g4). This command accepts any number of graphics arguments
(two dimensional or three dimensional) followed by options, and displays the graphics. The options on line 11 are:

DisplayFunction->$DisplayFunction. This tells Mathematica to convert the graphics to PostScript and send it to the standard output.

DefaultFont->{"cmr10", 10}. Any text displayed will be in font cmr10 at a size of 10 printer's points.

Exercise D.1: Experiment to find out what happens if the semicolon following Show is omitted.

The next listing is for Figure 9.7 (a bilinear surface).
(* a bilinear surface patch *)
Clear[bilinear,pnts,u,w];
  False, DisplayFunction->Identity];
Show[g1,g2, ViewPoint->{0.063, -1.734, 2.905}];
Line 3 is the Get command, abbreviated <<.

All, DefaultFont->{"cmr10",10}, DisplayFunction->$DisplayFunction, ViewPoint->{2.783, -3.090, 1.243}];
Line 6 illustrates how array dimensions can be determined automatically and used later. Line 7 creates a table of weights that are all 1's and line 8 sets the center weight to 5. Line 9 defines function pwr that computes x^y, but returns a 1 in the normally-undefined case 0^0. Line 10 is an (inefficient) computation of the Bernstein polynomials and lines 11–13 compute the rational Bézier surface as the ratio of two sums.

Lines 18 and 20 prepare the segments of the control polygon. Several pairs of adjacent control points in array spnts are selected to form Mathematica objects of type Line. Lines 22–24 determine the maximum x, y, and z coordinates of the control points. These quantities are later used to plot the three coordinate axes. The construct

Table[Part[spnts[[i,j]], 1], {i,1,n+1}, {j,1,m+1}];

in line 22 creates a list with part 1 (i.e., the x coordinate) of every control point. The largest element of this list is selected, to become the length of the x axis. Line 26 shows how the Line command can have more than one pair of points. Finally, line 27 displays the surface with the control points, control polygon, and three coordinate axes.

The next example is a partial listing of the code for Figure 13.34 (a lofted Bézier surface patch). It illustrates one way of dealing with matrices whose elements are lists.
pnts={{{0,1,0},{1,1,1},{2,1,0}},{{0,0,0},{1,0,0},{2,0,0}}};
b1[w_]:={1-w,w}; b2[u_]:={(1-u)^2,2u(1-u),u^2};
comb[i_]:=(b1[w].pnts)[[i]] b2[u][[i]];
g1=ParametricPlot3D[comb[1]+comb[2]+comb[3], {u,0,1},{w,0,1},
  Compiled->False, DefaultFont->{"cmr10", 10},
  DisplayFunction->Identity, AspectRatio->Automatic, Ticks->{{0,1,2},{0,1},{0,.5}}];
The surface is computed as the product of the row vector b1[w_], the matrix pnts, and the column b2[u_]. We first try the dot product b1[w].pnts.b2[u], but this works only if the elements of matrix pnts are numbers. The following simple test

m={{m11,m12,m13},{m21,m22,m23}}; a={a1,a2}; b={b1,b2,b3};
a.m.b

produces the correct result b1(a1 m11+a2 m21)+b2(a1 m12+a2 m22)+b3(a1 m13+a2 m23). In our case, however, the elements of pnts are triplets, so the dot product b1[w].pnts produces a row of three triplets that we may denote by ((a, b, c), (d, e, f), (g, h, i)). The dot product of this row by a column of the form (k, l, m) produces the triplet (ka + lb + mc, kd + le + mf, kg + lh + mi) instead of the triplet k(a, b, c) + l(d, e, f) + m(g, h, i). One way to obtain the correct result is to define a function comb[i_] that multiplies part i of b1[w].pnts by part i of b2[u]. The correct expression for the surface is then the sum comb[1]+comb[2]+comb[3].

Exercise D.3: When do we need the sum comb[1]+comb[2]+comb[3]+comb[4]?

Finally, the last listing is associated with Figure 13.37 (code for degree elevation of a rectangular Bézier surface). This code illustrates the extension of a smaller array p to an extended array r, some of whose elements are left undefined (they are set to the undefined symbol a and are never used). Array r is then used to compute the control points of a degree-elevated Bézier surface, and the point is that the undefined elements of r are not needed in this computation, but are appended to r (and also prepended to it) to simplify the computations.
1  (* Degree elevation of a rect Bezier surface from 2x3 to 4x5 *)
2  Clear[a,p,q,r];
3  m=1; n=2;
4  p={{p00,p01,p02},{p10,p11,p12}}; (* array of points *)
5  r=Array[a, {m+3,n+3}]; (* extended array, still undefined *)
6  Part[r,1]=Table[a, {i,-1,m+2}];
7  Part[r,2]=Append[Prepend[Part[p,1],a],a];
8  Part[r,3]=Append[Prepend[Part[p,2],a],a];
9  Part[r,n+2]=Table[a, {i,-1,m+2}];
10 MatrixForm[r] (* display extended array *)
11 q[i_,j_]:=({i/(m+1),1-i/(m+1)}. (* dot product *)
12   {{r[[i+1,j+1]],r[[i+1,j+2]]},{r[[i+2,j+1]],r[[i+2,j+2]]}}).
13   {j/(n+1),1-j/(n+1)}
14 q[2,3] (* test *)
Line 5 constructs array r two rows and two columns bigger than array p. Lines 6 and 9 fill up the first and last rows of r with the symbol a, while lines 7 and 8 move array p to the central area of r and then fill up the leftmost and rightmost columns of r with symbol a. Array r becomes the 4×5 matrix

$$\begin{pmatrix}
a & a & a & a & a\\
a & p_{00} & p_{01} & p_{02} & a\\
a & p_{10} & p_{11} & p_{12} & a\\
a & a & a & a & a
\end{pmatrix}.$$
Lines 11–13 compute the control points for the degree-elevated Bézier surface as described in Section 13.19. Each undefined symbol a corresponds to i = 0, i = m + 1, j = 0, or j = n + 1, and is consequently multiplied by zero.

Exercise D.4: Why is it important to clear the value of the undefined symbol a on line 2?
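The bilinear combination computed by q[i,j] can be imitated in Python for scalar control values. This is a sketch only (the function name elevate is ours): 0.0 stands in for the undefined symbol a, which is legitimate precisely because, as noted above, those entries are always multiplied by zero.

```python
def elevate(p, m, n):
    """Degree-elevate an (m+1)x(n+1) grid of scalar control values
    to an (m+2)x(n+2) grid, mirroring the Mathematica q[i,j]: each
    new value is a bilinear blend of four entries of the padded
    array r.  The 0.0 padding plays the role of the symbol a."""
    # Move p into the central area of a larger array r.
    r = [[0.0] * (n + 3) for _ in range(m + 3)]
    for i in range(m + 1):
        for j in range(n + 1):
            r[i + 1][j + 1] = p[i][j]
    q = [[0.0] * (n + 2) for _ in range(m + 2)]
    for i in range(m + 2):
        for j in range(n + 2):
            ai, aj = i / (m + 1), j / (n + 1)
            # {ai, 1-ai} . [[r..]] . {aj, 1-aj}, as in lines 11-13
            q[i][j] = (ai * (aj * r[i][j] + (1 - aj) * r[i][j + 1])
                       + (1 - ai) * (aj * r[i + 1][j]
                                     + (1 - aj) * r[i + 1][j + 1]))
    return q
```

A constant grid is reproduced exactly, and the four corner control values are preserved, two standard sanity checks for degree elevation.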
I have a number of notes about Mathematica, our products, and how I use them in this website and elsewhere. Please note, however, that beyond this there is no official connection whatsoever between Wolfram Research and this website. Everything on it is my personal opinion, not endorsed, controlled, or vetted in any way by Wolfram Research. I am solely and entirely responsible for any and all errors, libels, liabilities, dangerous instructions, or just plain stupidities you may happen to find here.
—Theodore Gray, http://www.theodoregray.com
E The Resolution of Film The concept of resolution is easy to define in raster-scan computer graphics hardware because such hardware is based on pixels, which are easy to count. With the advent of digital cameras featuring higher and higher resolutions, it is only natural to compare those cameras to traditional, film-based cameras and to ask what is the resolution of film? (See also discussion of the resolution of the eye in Section 21.8.) Intuitively, we feel that film is a continuous medium, but in fact the image is recorded on film in small particles (grains) of various silver compounds. The resolution of the film is thus related to the size of these grains. One approach to quantifying the resolution is to say that the size of the grains determines the resolution, but only indirectly. This approach measures the resolution of film by trying to answer the following question experimentally: At what resolution can an observer no longer distinguish between an image on film and its digitized copy? Experiments with hundreds of observers with 35-mm film seem to indicate a range of resolutions from 1,000×1,000 to 1,500×1,500, depending on the observer and on film quality. Another, more quantitative approach to this problem uses the concept of line pairs per millimeter (LPPM). A chart, such as the ones of Figure E.1, is photographed and the film, after being properly developed and printed, is observed with a magnifying glass for the densest group of lines that can still be resolved. (The chart should have the same aspect ratio as the film, and should be photographed from such a distance that its outer frame will coincide with the boundary of the film.) Suppose, for example, that a group of lines on the chart has a density of 50 lines per mm, with 50 gaps between them, for a total of 100 lines plus gaps per mm. 
If this is the highest density that can be resolved on the film (i.e., any denser groups of lines look like a gray blur), then each line and each gap can be considered a pixel, and we say that the film has an LPPM value of 100 and a resolution equivalent to 100 bit/mm. The reason for the term line pair is that the gaps are also considered (white) lines. Thus, a 35 mm-wide film that tests at 100 LPPM contains the information equivalent of 3,500 bits horizontally. 1311
The patterns of Figure E.1a consist of a half-circle of wedges. The observer has to determine the distance from the center to the point where the diverging lines can be seen individually. This distance is inversely related to the resolution (such a chart, of course, has to be calibrated before it can be used). Similar, more sophisticated charts may include, in addition, some long wedges, a series of line widths, and a range of font sizes.
Figure E.1: Line Charts for LPPM Measurements.
Element   Group 0   Group 1   Group 2   Group 3   Group 4
   1        10.0      20.0      40.0      80.0      160
   2        11.2      22.4      44.9      89.8       —
   3        12.6      25.2      50.4     101.0       —
   4        14.1      28.3      56.6     113.0       —
   5        15.9      31.7      63.5     127.0       —
   6        17.8      35.6      71.3     143.0       —

Table E.2: LPPM Values for Figure E.1b.
Figure E.1b consists of three parts, identical except for size, that are placed one inside the other. Each part contains two groups, for a total of six groups. The groups are numbered 0–5, and each consists of six elements (1–6). An element constitutes 10 lines, 5 horizontal and 5 vertical. Group 0 consists of the five elements numbered 2–6 on the right of the chart, and element 1 at the bottom left. Groups 2 and 4 are shaped like group 0 and are placed inside the chart. The six elements of group 1 are placed on the
left side, and groups 3 and 5 are small copies of group 1 and are placed inside the chart. The widths of the lines go down from element to element, from a value w in element 1 of a group to w/2 in element 1 of the following group. Given a piece of film, the chart is photographed on it and an observer should identify the smallest pair (group, element) in which all 10 lines can clearly be distinguished. The LPPM value of the film can then be found in Table E.2 (which is limited to just groups 0–3 and the first element of group 4).

It is obvious that the contrast between lines and gaps in the chart affects what can be seen on the film. Dark lines are more visible than gray ones, so the measurement is done in practice with charts that have a contrast of 1,000:1. At such contrast, most color films can resolve 100–125 LPPM, equivalent to 3,500–4,375 bits horizontal resolution for 35 mm-wide film. Slow black-and-white film may have two to three times this resolution. However, when taking real-life pictures of images with low-contrast objects, the resolution of most commercial films, as measured by LPPM, may go down to about 30 LPPM.

The next point to be mentioned has to do with image magnification. When a 35 mm-wide negative with 100 bits per mm (i.e., 3,500 pixels horizontally) is enlarged to, say, seven inches (about 178 mm, or a magnification factor of 5), the large picture will still have the same 3,500 pixels horizontally, implying a resolution of only 3,500/178 ≈ 20 bits/mm. The resolution as measured by the LPPM method has gone down by a factor of 5, but we know from experience that if the original image on the film is sharp, it can be magnified by a factor of 5 (i.e., to 7 in.) or even more without loss of image details, something that's impossible with digital images. This fact suggests that the LPPM approach to measuring film resolution is not ideal and that there is a fundamental difference between digital images and images on film.

                              LPPM
Film type        ISO    1.6:1    1000:1          Total
Tri-X pan        400      50       100       8,640,000
T-Max 400        400      50       125      13,500,000
Plus-X pan       125      50       125      13,500,000
K-64              64      50       100       8,640,000
K-25              25      63       100       8,640,000
T-Max 100        100      63       200      34,560,000
Fuji Velvia       50       *       150      19,440,000
Panatomic-X       32      80       200      34,560,000
Ektar 25          25      80       200      34,560,000
Tech-Pan          25     100       320      88,473,600
Tech-Pan 120      25     100       320     368,640,000

* Not specified by Fuji.

Table E.3: Sensitivities, LPPMs, and Total Number of Pixels for Various Commercial Films.
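The Total column of Table E.3 follows directly from the LPPM convention used here (each line or gap counts as one pixel) and a 24×36 mm frame. The following Python sketch is our own illustration (the function name is assumed, and it does not apply to the medium-format Tech-Pan 120 row, whose frame is larger):

```python
def film_pixels(lppm, frame_mm=(24, 36)):
    """Equivalent pixel count of a film frame, counting each line
    or gap as one 'pixel', as in the LPPM convention used here."""
    h, w = frame_mm
    return (h * lppm) * (w * lppm)

# For a 24x36 mm frame these reproduce the Total column of Table E.3:
# 100 LPPM -> 8,640,000 pixels (Tri-X pan, K-64, K-25)
# 200 LPPM -> 34,560,000 pixels (T-Max 100, Panatomic-X, Ektar 25)
```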
Exercise E.1: The frame size of still 35 mm film is 24×36 mm. Compute the total number of pixels per frame assuming 100 LPPM.

Table E.3 lists LPPM values (for two different contrasts) and the total number of pixels (for a 1,000:1 contrast) for various commercial films, most of which are still available (based on manufacturers' data sheets). The numbers vary widely because of the large number of different varieties of film emulsions available. The table raises another difficulty with the LPPM method, namely that even though a film like Tech-Pan can have a resolution of 320 LPPM, the actual resolution obtained may be lower because of the lens used in the camera. Most commercial lenses can't resolve 320 LPPM.

Exercise E.2: It is clear that film resolution, as measured by LPPM, depends on the film emulsion (its light sensitivity and grain size) and degree of contrast of the test chart. What other factors can affect the measured resolution?

No comparison of digital and film images is complete without discussing color quality. Older digital images had only 8 or 16 bits per pixel and suffered from a "banding" effect, where the image seemed to consist of bands of different colors because the colors couldn't vary continuously (they were quantized). However, most observers agree that even 12 bits/pixel (i.e., 2^12 or 4,096 colors) produce images with no bands, which are indistinguishable from those of film. References [kenrockwell 10] and [templeton 10] have more to say about this interesting and neglected topic.

The resolution of a film, as stated in Kodak handbooks, is determined under laboratory conditions, as for example on an optical bench, by photographing a black and white grating pattern, meaning 100% contrast modulation, onto the film. If one uses a grating whose spacing gets tighter and tighter, there is a point at which the adjacent lines smear into one another and can no longer be separated. That is the stated resolving power of the film.
—W. H. Fahrenbach, home.clara.net/rfthomas/papers/filmres.html
F The Color Plates

This extensive manual features more than 100 color plates, placed at the very beginning, at the end, and between individual parts. They serve to liven up the book and to illustrate many of the topics discussed. It is planned to place information about these plates in the book's website, for the benefit of readers who want to recreate or extend them. The plates were prepared over several months, using a variety of graphics software. The following programs were used to produce the plates:

3D-XplorMath, from http://3d-xplormath.org/ (mathematical visualization software).

3D Maker, from http://www.tabberer.com/sandyknoll/, is a set of filters to produce beautiful and interesting 3D effects, including dot stereograms.

Anaglyph Workshop, from http://www.sandyknollsw.com, generates anaglyphs from a pair of images or from a single image with depth information added by the user.

ArtText, from www.belightsoft.com, adds many effects to text.

ArtWork, from http://akvis.com/en/index.php, can manipulate an image to resemble an oil painting or the type of drawing common in comics.

Cinderella, from http://cinderella.de/tiki-index.php, is interactive geometry software.

Citra FX Photo Effects, from http://www.kiyut.com/, is a set of filters that add artistic effects to images.

Excentro, from http://www.excourse.com/excentro, is a simple but advanced tool for the creation of guilloche designs such as backgrounds, borders, and rosettes. Guilloche is an ornamental pattern or border that consists of paired ribbons or lines flowing in interlaced curves around a series of circular voids. Such patterns are used in architecture and in important documents for protection against forgery.
Fractal Domains, from http://www.fractaldomains.com/, generates certain types of fractals.

Geometer's Sketchpad, from http://www.dynamicgeometry.com/, is graphics software for visually exploring mathematics.

Golden Section, from http://powerretouche.com/, is a Photoshop plugin that generates golden rectangles, spirals, and triangles. It can also embed the rule of thirds in an image.

Image Framer, from http://www.apparentsoft.com/, places a frame around an image. The user can select from a large collection of artistic, contemporary, and other types of frames.

Image Tricks, from www.belightsoft.com, is a set of filters that can add many effects to an image.

Knot Plot, from http://www.knotplot.com/, can compute and display a vast number of complex, intriguing, three-dimensional knots.

Live Interior, from www.belightsoft.com, is software for architectural design. It includes a library of shapes, objects, and textures, and can display its results in perspective.

MegaPOV, from http://megapov.inetart.net/, is a Macintosh implementation of the well-known ray-tracing software POVray. The original POVray (povray.org) is a free, powerful program that can produce stunning ray-traced images of shiny, glossy, and textured objects, where each object reflects light coming from other objects.

Morph Age, from http://www.creaceed.com/, can deform images and morph one image into another.

MozoDojo, from http://ktd.club.fr/programmation/mozodojo.php, creates a mosaic of images. It inputs an image A and a large set B of images, and recreates A as accurately as possible from thumbnail versions of the images in B.

Particle Illusion, from http://www.wondertouch.com/, creates particle-systems effects such as explosions, smoke, fire, sparkles, motion graphics backgrounds, space effects, and creatures, as well as abstract artistic effects.

PlasticBeauty, by http://pencilsoftware.com/, is a bitmap-editing application that can easily and intuitively edit an image.

Puzzle Pro and Page Curl Pro, from http://www.avbros.com/, are Photoshop plugins that generate puzzles from images and can curl and fold an image.

SBArt 4.2.3 has been developed by Tatsuo Unemi as a design support tool to create beautiful images based on genetic algorithms that mimic artificial selection. See http://www.intlab.soka.ac.jp/~unemi/sbart/4.

Smoke, from neatberry.com, is a unique painting brush that creates smoke effects.

Surface Explorer is Java code from Charles Gunn to construct parametric surfaces and surfaces of revolution. See http://www.math.tu-berlin.de/~gunn/.
F The Color Plates
1317
Terragen, from http://www.planetside.co.uk, has sophisticated, fractal-based algorithms to simulate skies, outdoor lighting, and terrain textures, and to render extremely large and detailed terrains.

Vector Magic, from http://vectormagic.com/home, is auto-tracing software. It attempts to convert bitmap images to vector images, which can be saved in EPS format and later scaled up or down by any amount.

Modo, Photoshop, and Adobe Illustrator were also used. They are mentioned in Chapter 1, in the Graphics Software section.

The halftone image in Plate U.2 was prepared with the instructions found at http://www.photoshoproadmap.com/Photoshop-blog/2007/09/13/give-your-photos-a-retro-comic-book-effect. The canvas painting in the same plate was prepared with the instructions found at http://www.photoshopstar.com/photo-effects/canvas-texture-imitation.

Plates M.3, P.3, and U.1 depict many mathematical objects. They were made with 3D-XplorMath, mathematical visualization software from http://3d-xplormath.org/. The following text explains some of the figures found in those plates.

Cyclide. The cyclides of Dupin are a family of surfaces defined parametrically by the expression ((a + b cos(u)) cos(v), (a + b cos(u)) sin(v), c sin(u)), where a, b, and c are parameters that produce all the surfaces in this family. The cyclide can also be interpreted as a torus inverted in a sphere.

Dragon curve. This is a fractal that is the limit of a sequence of polygons Di constructed recursively. The first polygon, D1, is a short, horizontal straight segment, and each successive polygon, Di+1, is obtained from its predecessor Di in the following steps:
1. Translate Di such that its endpoint goes to the origin.
2. Scale the translated object by 1/√2 ≈ 0.7071.
3. The result, which is denoted by Ci, is rotated by −45°.
4. Copy Ci, rotate the copy by −90°, and join it to the end of Ci to obtain Di+1.
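The four steps above translate into a short program. The sketch below is an illustration, not code from the book; it represents plane points as complex numbers, so a translation is a subtraction and a scaled rotation is a single complex multiplication. One reading of step 4 is taken: the rotated copy is traversed backward so that the two halves join at the end of Ci.

```python
import cmath
import math

def dragon(iterations):
    """Vertices of the dragon-curve polygon after the given number of steps."""
    d = [0 + 0j, 1 + 0j]                               # D1: a short horizontal segment
    rot = cmath.exp(-1j * math.pi / 4) / math.sqrt(2)  # scale by 1/sqrt(2), rotate by -45 degrees
    turn = cmath.exp(-1j * math.pi / 2)                # rotate by -90 degrees
    for _ in range(iterations):
        end = d[-1]
        # Steps 1-3: translate the endpoint to the origin, then scale and rotate,
        # producing C_i, which ends at the origin.
        c = [(z - end) * rot for z in d]
        # Step 4: a -90-degree-rotated copy of C_i, traversed backward so the
        # two halves meet at the end of C_i (the duplicated joint vertex is skipped).
        d = c + [z * turn for z in reversed(c[:-1])]
    return d

# Each step doubles the number of segments: D has 2**n segments after n steps.
print(len(dragon(8)))  # 257 vertices = 2**8 + 1
```

Plotting the vertex list of, say, `dragon(14)` as a polyline produces the familiar dragon fractal; the 1/√2 rescaling in each step keeps the figure bounded.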
The Whitney umbrella surface is expressed either as the implicit surface x² − y²z = 0 or as the parametric surface (uv, u, v²).

The Bianchi–Pinkall flat is one of many surfaces that are created by embedding an object (in this case, a torus) flat inside a sphere, and then projecting the object stereographically into three dimensions. Luigi Bianchi and Ulrich Pinkall were mathematicians active in this field.

The Klein bottle is a well-known example of a non-orientable surface. It does not have separate inside and outside faces. Instead, when traveling on this surface, we find ourselves alternately inside it and outside it. This surface is well documented elsewhere.

The snail shell is a set of circles (each of which depends on a parameter u) whose diameters vary continuously. The surface is given by (r cos(v), d(1 − s) + s·b sin(u), r sin(v)), where r = s(a + b cos(u)), s = exp(−c·v), and v ← v + (v + e)²/16. Parameter e controls the size of the openings of the snail, while d controls the total length of the snail. Parameters a, b, and c control the overall appearance of the surface.
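The snail-shell formulas above can be sampled directly. The following is a hedged sketch; the function name and the default parameter values are illustrative choices, not values taken from the plates.

```python
import math

def snail_point(u, v, a=2.0, b=1.0, c=0.1, d=10.0, e=1.0):
    """Point on the snail-shell surface for circle parameter u and sweep parameter v.

    The defaults for a, b, c, d, e are illustrative only.
    """
    v = v + (v + e) ** 2 / 16.0    # the reparameterization v <- v + (v + e)^2/16
    s = math.exp(-c * v)           # shrink factor: circles get smaller as v grows
    r = s * (a + b * math.cos(u))  # radius of the current circle
    return (r * math.cos(v), d * (1.0 - s) + s * b * math.sin(u), r * math.sin(v))
```

Sampling u over [0, 2π) and v over several turns, then shading the resulting quadrilaterals, produces the shell; the factor s = exp(−c·v) is what makes successive circles shrink toward the tip.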
Torus knot. The parametric equation of the torus is P(u, v) = ((a + b cos u) cos v, (a + b cos u) sin v, c sin u). To obtain a space curve that lies on the surface of the torus, we reduce the surface of the torus to a curve by adding the relations u = d·t and v = e·t. The resulting curve is P(t) = ((a + b cos(d·t)) cos(e·t), (a + b cos(d·t)) sin(e·t), c sin(d·t)). In the plot, the parameters are a = 3.12, b = c = 1.5, d = 5, and e = 2.

Hopf. The Hopf fibration describes a 3-sphere (a hypersphere in four-dimensional space) in terms of circles and an ordinary sphere. This example of a fiber bundle was discovered by Heinz Hopf in 1931.

A spherical ellipse is the spherical analog of a plane ellipse. Given a unit sphere, the circumference of a great circle is 2π, so the distance between any two points on its surface cannot exceed π. We select any two points F1 and F2 as foci, denote their distance by 2e, and compute the set of points P the sum of whose distances from the foci equals a given parameter 2a. Thus, {P : dist(P, F1) + dist(P, F2) = 2a}. Notice that the major axis 2a is restricted by the inequality 2e < 2a < 2π − 2e.

A helicoid is the surface obtained when a plane curve is given a screw motion in the z direction, so that it sweeps out a spiral. Given the plane curve (x(u), y(u)), it becomes the helicoid P(u, v) = (x(u) cos v, x(u) sin v, y(u) + hv), where h ≠ 0 is the speed of the screw motion. The default (or standard) helicoid is obtained for (x(u), y(u)) = (u, 0). In the plot, the hue of any point on the surface is selected as the current value of u² + v².

The Scherk minimal surface is given by the parametric expression (u, v, ln(cos(a·v)/cos(a·u))/a).

The Deco cube is an implicit surface whose equation is
[(x² + y² − c²)² + (z − 1)²(z + 1)²][(y² + z² − c²)² + (x − 1)²(x + 1)²][(z² + x² − c²)² + (y − 1)²(y + 1)²] − f[1 + b(x² + y² + z²)] = 0.
In the example shown, the three parameters have the values b = c = 0.8 and f = 0.
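The torus-knot curve P(t) described above translates directly into code. The following is a minimal sketch (the function name is illustrative) that also verifies numerically that every sampled point lies on the torus: since sqrt(x² + y²) = a + b cos(u) and z = c sin(u), every point of the knot must satisfy ((sqrt(x² + y²) − a)/b)² + (z/c)² = 1.

```python
import math

def torus_knot(t, a=3.12, b=1.5, c=1.5, d=5, e=2):
    """Point P(t) on the torus knot, using the parameter values quoted for the plot."""
    u, v = d * t, e * t
    r = a + b * math.cos(u)  # distance from the z axis
    return (r * math.cos(v), r * math.sin(v), c * math.sin(u))

# Check the torus relation along the sampled knot.
for k in range(200):
    x, y, z = torus_knot(k * 0.05)
    assert abs(((math.hypot(x, y) - 3.12) / 1.5) ** 2 + (z / 1.5) ** 2 - 1.0) < 1e-9
```

With integer d and e, the curve closes after t runs over one period of both cos(d·t) and cos(e·t), giving the (d, e) torus knot of the plate.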
A loxodrome is a space curve that lies on a sphere and maintains a constant angle with all meridians (lines of longitude).

Lissajous curves are parametric (either 2D or 3D) and are described by P(t) = (a sin(2π d t), b sin(2π e t + g), a sin(2π f t + c)).
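As a quick sketch of this formula (the amplitudes a and b are assumed to be 1 here; the frequency and phase defaults follow the figure parameters given in the text):

```python
import math

def lissajous(t, a=1.0, b=1.0, c=0.04, d=4, e=3, f=23, g=math.pi / 2):
    """3D Lissajous point P(t); amplitudes a and b are assumed, not from the text."""
    return (a * math.sin(2 * math.pi * d * t),
            b * math.sin(2 * math.pi * e * t + g),
            a * math.sin(2 * math.pi * f * t + c))
```

Because the frequencies d, e, and f are integers, the curve is closed with period 1: P(t + 1) = P(t), which is why such figures trace out a stable closed loop.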
Such a curve describes the joint motion of orthogonal, uncoupled oscillators with different frequencies. In the figure, the parameters are set to d = 4, e = 3, f = 23, g = π/2, and c = 0.04. The Lissajous family of curves is named after Jules Lissajous, the second person to investigate it; the first description of these figures is due to Nathaniel Bowditch in 1815. Today, Lissajous curves are used by many screen savers and also appear in the logos of the Australian Broadcasting Corporation, the MIT Lincoln Laboratory, and the University of Electro-Communications of Chōfu (Tokyo).

The Steiner parametric surface is defined by (a² sin(2u) cos²(v), sin(u) sin(2v), cos(u) sin(2v))/2. In the plot, a = 1.732.

The Ishihara Color Test is a test for red-green color deficiencies. It consists of a number of colored plates, called Ishihara plates, each of which contains a circle of dots that appear randomized in color and size.
—Wikipedia
References

Some books leave us free and some books make us free. —Ralph Waldo Emerson

3dstereo (2011) is http://www.3dstereo.com/viewmaster/sca-bar.html. Abelson, H., and A. A. diSessa (1982) Turtle Geometry, Cambridge, MA, MIT Press. Abrash, Michael (1992) “Fast Antialiasing,” Dr. Dobb’s Journal, 17(6):139, June. Adobe (2004) http://www.adobe.com/products/illustrator/index.html. Adobe Systems Inc. (1985) PostScript Language Tutorial and Cookbook, Reading, MA, Addison-Wesley. Adobe Systems Inc. (1990) PostScript Language Reference Manual, 2nd edition, Reading, MA, Addison-Wesley. Ahmed, N., T. Natarajan, and K. R. Rao (1974) “Discrete Cosine Transform,” IEEE Transactions on Computers, C-23:90–93. Akansu, Ali, and R. Haddad (1992) Multiresolution Signal Decomposition, San Diego, CA, Academic Press. Akima, Hiroshi (1970) “A New Method of Interpolation and Smooth Curve Fitting Based on Local Procedures,” Journal of the ACM, 17(4):589–602, October. Akima, Hiroshi (1978) “A Method of Bivariate Interpolation and Smooth Surface Fitting for Irregularly Distributed Data Points,” ACM Transactions on Mathematical Software (TOMS), 4(2):148–159, June. Akima, Hiroshi (1991) “A Method of Univariate Interpolation That Has the Accuracy of a Third-Degree Polynomial,” ACM Transactions on Mathematical Software (TOMS), 17(3):341–366, September. Akima-matlab (2010) is http://www.mathworks.com/matlabcentral/fileexchange/1814. D. Salomon, The Computer Graphics Manual, Texts in Computer Science, DOI 10.1007/978-0-85729-886-7, © Springer-Verlag London Limited 2011
Bibliography
Alfeld, Peter, Marian Neamtu, and Larry L. Schumaker (1995) “Circular Bernstein–Bézier Polynomials,” in Mathematical Methods for Curves and Surfaces, M. Daehlen et al., eds., Nashville, Vanderbilt University Press, pp. 11–20. anabuilder (2005) is http://anabuilder.free.fr/Concurrents.html. AnamorphMe (2005) is http://www.anamorphosis.com/software.html. anamorphosis (2005) is http://www.anamorphosis.com/. Anderson, David P. (1982) “Hidden Line Elimination in Projected Grid Surfaces,” ACM Transactions on Graphics (TOG), 1(4):274–288. Andres, Eric, et al. (1996) “Rational Bitmap Scaling,” Pattern Recognition Letters, 17(14):1471–1475, December. animation-tube (2010) is http://www.youtube.com/watch?v=LzZwiLUVaKg. Anoto (2010) is http://www.anoto.com/about-anoto-4.aspx. ANSI (1985) Standard X3.124-1985, available from ANSI, 1430 Broadway, New York, NY 10018. ascii-wiki (2009) is http://en.wikipedia.org/wiki/ASCII. Atkinson, William (1986) “Method and Apparatus for Image Compression and Manipulation,” United States patent 4,622,545, September 30. Banister, Brian, and Thomas R. Fischer (1999) “Quadtree Classification and TCQ Image Coding,” in Storer, James A., and Martin Cohn (eds.) (1999) DCC ’99: Data Compression Conference, Los Alamitos, CA, IEEE Computer Society Press, pp. 149–157. Barnsley, M. (1993) Fractals Everywhere, 2nd edition, Boston, MA, Academic Press. Bartlett, Randall Neal (2008) The Bad Ellipse: Circles in Perspective, available at http://idsa.org/sites/default/files/xiglafiles/ed_conference02/01.pdf. Bayer (1976) is US patent 3,971,065 (20 July 1976), Bryce E. Bayer, Color Imaging Array. This is available in PDF from http://www.google.com/patents?id=Q_o7AAAAEBAJ&dq=3,971,065. Beach, Robert C. (1991) An Introduction to the Curves and Surfaces of Computer-Aided Design, New York, Van Nostrand Reinhold. BeHere (2005) is http://www.behere.com/. berezin (2006) is http://www.berezin.com/3d/slide_bar.htm. Berrut, Jean-Paul, and Lloyd N.
Trefethen (2004) “Barycentric Lagrange Interpolation,” SIAM Review, 46(3):501–517. Bézier, Pierre (1986) The Mathematical Basis of the UNISURF CAD System, Newton, MA, Butterworth-Heinemann. BFGS (2010) is http://en.wikipedia.org/wiki/BFGS_method.
Bhowmick, Partha, and Bhargab B. Bhattacharya (2008) “Number-Theoretic Interpretation and Construction of a Digital Circle,” Discrete Applied Mathematics, 156(12):2381–2399, June. Blaker, Alfred A. (1985) Applied Depth of Field, Boston and London, Focal Press (Elsevier). Blender (2005) http://www.blender.org/. Blinn, J. (1987) “How Many Ways Can You Draw a Circle?” IEEE Computer Graphics and Applications, 7(8):39–44, August. bouguetj (2010) is http://www.vision.caltech.edu/bouguetj/ICCV98. Bourke (2005) is http://paulbourke.net/miscellaneous/domefisheye/. Bresenham, J. E. (1965) “Algorithm for Computer Control of a Digital Plotter,” IBM Systems Journal, 4(1):25–30. Bresenham, J. E. (1977) “A Linear Algorithm for Incremental Digital Display of Circular Arcs,” Communications of the ACM, 20(2):100–106. Britanak, Vladimir, Patrick C. Yip, and Kamisetty Ramamohan Rao (2006) Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations, St. Louis, MO, Academic Press. CAD (2004) is http://www.sciencedirect.com/science/journal/00104485. CAGD (2004) is http://www.sciencedirect.com/science/journal/01678396. Calgary (2011) is ftp://ftp.cpsc.ucalgary.ca/pub/projects/text.compression.corpus. Canterbury (2011) is http://corpus.canterbury.ac.nz. cameraInproduction (2005) is http://www.photoshop-tutorials-plus.com/panoramic-cameras.html. cameraTimeline (2005) is http://www.panoramicphoto.com/timeline.htm. carlson (2011) is http://design.osu.edu/carlson/history/lessons.html. Carmody, Kevin (1988) “Circular and Hyperbolic Quaternions, Octonions and Sedenions,” Applied Mathematics and Computation, 28:47–72. Carmody, Kevin (1997) “Circular and Hyperbolic Quaternions, Octonions and Sedenions, Further Results,” Applied Mathematics and Computation, 84:27–47. Carrato, S., and L. Tenze (2000) “A High Quality 2× Image Interpolator,” IEEE Signal Processing Letters, 7:132–134, June. Carpentieri, B., M. J.
Weinberger, and G. Seroussi (2000) “Lossless Compression of Continuous-Tone Images,” Proceedings of the IEEE, 88(11):1797–1809, November. Catmull, E., and J. Clark (1978) “Recursively Generated B-Spline Surfaces on Arbitrary Topological Meshes,” Computer-Aided Design, 10(6):350–355, September.
Catmull, E., and R. Rom (1974) “A Class of Interpolating Splines,” in R. Barnhill and R. Riesenfeld, editors, Computer Aided Geometric Design, Academic Press, pages 317–326. Chaikin, G. (1974) “An Algorithm for High-Speed Curve Generation,” Computer Graphics and Image Processing, 3:346–349. Chang (2010) is http://demonstrations.wolfram.com/DrawingALineOnADigitalDisplay. Chen, Wen-Hsiung, C. Harrison Smith, and S. C. Fralick (1977) “A Fast Computational Algorithm for the Discrete Cosine Transform,” IEEE Transactions on Communications, 25(9):1004–1009, September. Cheeseman-Meyer, Jason (2007) Vanishing Point: Perspective for Comics from the Ground Up, Impact Publishing. Choueka, Y., Shmuel T. Klein, and Y. Perl (1985) “Efficient Variants of Huffman Codes in High Level Languages,” Proceedings of the 8th ACM-SIGIR Conference, Montreal, pp. 122–130. cklin (2003) is http://sites.google.com/site/chklin/demosaic/. Cohen, E., et al. (1980) “Discrete B-Splines and Subdivision Techniques in Computer Aided Geometric Design and Computer Graphics,” Computer Graphics and Image Processing, 14:87–111. Cohen, E., et al. (1985) “Algorithms for Degree Raising of Splines,” ACM Transactions on Graphics, 4:171–181. colorcube (2010) is http://demonstrations.wolfram.com/ColorCube. Coons, Steven A. (1964) “Surfaces for Computer-Aided Design of Space Figures,” Cambridge, MA, MIT Project MAC, report MAC-M-253, January. Coons, Steven A. (1967) “Surfaces for Computer-Aided Design of Space Forms,” Cambridge, MA, MIT Project MAC TR-41, June. Coxeter, H. S. M. (1969) “Inversion in a Circle” and “Inversion of Lines and Circles,” Sections 6.1 and 6.3 (pages 77–83) in Introduction to Geometry, 2nd ed., New York, John Wiley and Sons. Crow, Frank (1987) “The Origins of the Teapot,” IEEE Computer Graphics and Applications, 7(1):8–19, January. Cyrus, M., and J. Beck (1978) “Generalized Two- and Three-Dimensional Clipping,” Computers and Graphics, 3(1):23–28.
Daubechies, Ingrid (1988) “Orthonormal Bases of Compactly Supported Wavelets,” Communications on Pure and Applied Mathematics, 41:909–996. davidlebovitz (2005) is http://www.davidlebovitz.com/2005/10/. Davis, Philip J. (1963) Interpolation and Approximation, Waltham, MA, Blaisdell Publishing, and New York, Dover Publications, 1975.
DeBoor, Carl (1972) “On Calculating with B-Splines,” Journal of Approximation Theory, 6:50–62. deeplight (2006) is http://www.deeplight.com/. Deering, Michael F. (2005) “A Photon Accurate Model of the Human Eye,” http://michaelfrankdeering.com/Projects/EyeModel/eyeExpandedS.pdf. Delaunay, B. (1934) “Sur la Sphère Vide,” Bulletin of the Academy of Sciences of the USSR, pp. 793–800. DeRose, T., and C. Loop (1989) “The S-Patch: A New Multisided Patch Scheme,” ACM Transactions on Graphics, 8(3):204–234. DesignMentor (2005) is http://www.cs.mtu.edu/~shene/NSF-2/DM2-BETA/index.html. DeVore, R., B. Jawerth, and B. Lucier (1992) “Image Compression Through Wavelet Transform Coding,” IEEE Transactions on Information Theory, 38(2):719–746, March. Dierckx, Paul (1995) Curve and Surface Fitting with Splines, Oxford University Press. disspla (2011) is http://www.gaeinc.com/disspla.html. Doo, Donald, and M. Sabin (1978) “Behavior of Recursive Division Surfaces Near Extraordinary Points,” Computer-Aided Design, 10(6):356–360, September. double pendulum (2010) is http://en.wikipedia.org/wiki/Double_pendulum. Dupuy, M. (1948) “Le Calcul Numérique des Fonctions par l’Interpolation Barycentrique,” Comptes Rendus de l’Académie des Sciences, Paris, 158–159. eagle (2010) is http://everything2.com/index.pl?node_id=1859453. eclipsechaser (2005) is http://www.eclipsechaser.com/eclink/astrotec/allsky.htm. Edmund Scientific (2005) is http://www.edsci.com/. eff (2010) is http://www.eff.org/issues/printers. eldonoffice (2005) is http://rolodex.com/Pages/index.aspx. Elias (2001) is http://freespace.virgin.net/hugo.elias/graphics/x_wuline.htm. Elias, P. (1975) “Universal Codeword Sets and Representations of the Integers,” IEEE Transactions on Information Theory, IT-21(2):194–203, March. Enderle, Gunter, K. Kansy, and G. Pfaff (1987) Computer Graphics Programming: GKS, The Graphics Standard, Berlin, Springer-Verlag. Ernst, Bruno (1976) The Magic Mirror of M. C. Escher, New York, Random House. Fang, I.
(1966) “It Isn’t ETAOIN SHRDLU; It’s ETAONI RSHDLC,” Journalism Quarterly, 43:761–762. Farin, G. (1992) Curves and Surfaces for CAGD, 3rd edition, Boston, MA, Academic Press. Farin, G. (1998) NURBS Curves and Surfaces, Wellesley, MA, AK Peters.
Farin, Gerald (1999) NURBS: From Projective Geometry to Practical Use, 2nd edition, Wellesley, MA, AK Peters. Farin, Gerald (2001) Curves and Surfaces for CAGD (Computer Aided Graphics and Design), San Diego, Academic Press. Farin, Gerald (2002) Curves and Surfaces for CAGD: A Practical Guide, 5th ed., Los Altos, CA, Morgan Kaufmann Publications. Farin, Gerald (2002) “A History of Curves and Surfaces in CAGD,” in G. Farin, J. Hoschek, and M. S. Kim, editors, Handbook of CAGD, pages 1–22, Elsevier. (Available in PDF format from the author.) Feig, E., and E. Linzer (1990) “Discrete Cosine Transform Algorithms for Image Data Compression,” in Proceedings Electronic Imaging ’90 East, pp. 84–87, Boston, MA. Ferguson, J. (1964) “Multivariate Curve Interpolation,” Journal of the ACM, 11(2):221–228. Feynman, R. (1985) QED: The Strange Theory of Light and Matter, Princeton, NJ, Princeton University Press. Fiala, E. R., and D. H. Greene (1989) “Data Compression with Finite Windows,” Communications of the ACM, 32(4):490–505. fisheyemenu (2005) is http://www.cs.umd.edu/hcil/fisheyemenu/. Flat Earth Society (2005) is http://theflatearthsociety.org/cms/. Flocon, Albert, and André Barre (1968) La Perspective Curviligne, Flammarion. English translation by Robert Hansen, 1987, Curvilinear Perspective, Berkeley, CA, University of California Press. Floyd, R., and L. Steinberg (1975) “An Adaptive Algorithm for Spatial Gray Scale,” in Society for Information Display 1975 Symposium Digest of Technical Papers, p. 36. Foley, J., and A. Van Dam (1994) Fundamentals of Interactive Computer Graphics, 2nd edition, Reading, MA, Addison-Wesley. Fournier, Alain, Don Fussell, and Loren Carpenter (1982) “Computer Rendering of Stochastic Models,” Communications of the ACM, 25(6):371–384, June. fractals (2010) is http://paulbourke.net/fractals/. Free Software Foundation (2004), 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. http://www.fsf.org/. Freeman, G. H.
(1991) “Asymptotic Convergence of Dual-Tree Entropy Codes,” Proceedings of the Data Compression Conference (DCC ’91), pp. 208–217. Freeman, G. H. (1993) “Divergence and the Construction of Variable-to-Variable-Length Lossless Codes by Source-Word Extensions,” Proceedings of the Data Compression Conference (DCC ’93), pp. 79–88. Freeman, H. (ed.) (1980) Tutorial and Selected Readings in Interactive Computer Graphics, Silver Springs, MD, IEEE Computer Society Press.
funet (2011) is ftp://nic.funet.fi/pub/graphics/misc/test-images/. funsci (2005) is http://www.funsci.com/fun3_en/panoram2/pan2_en.htm. Furuti (1997) is http://www.progonos.com/furuti/MapProj/. Gallager, Robert G. (1978) “Variations on a Theme by Huffman,” IEEE Transactions on Information Theory, IT-24(6):668–674, November. Gallery (2005) is http://gallery.wolfram.com/. Gallier, Jean (2000) Curves and Surfaces in Geometric Modeling, San Francisco, Morgan Kaufmann. Gardner, Martin (1972) “Mathematical Games,” Scientific American, 227(2):106, August. Gardner, M. (1984) The Sixth Book of Mathematical Games from Scientific American, Chicago, IL, University of Chicago Press. Gemstar (2006) is http://www.vcrplus.com/notification.asp. Ghostscript (1998) http://www.cs.wisc.edu/~ghost/index.html. Gilbert, E. N., and E. F. Moore (1959) “Variable Length Binary Encodings,” Bell System Technical Journal, Monograph 3515, 38:933–967, July. GIMP (2005) http://www.gimp.org/. Givens, Wallace (1958) “Computation of Plane Unitary Rotations Transforming a General Matrix to Triangular Form,” Journal of the Society for Industrial and Applied Mathematics, 6(1):26–50, March. Glassner, Andrew S. (1989) An Introduction to Ray Tracing, London, Academic Press. Glassner, Andrew S. (ed.) (1990) Graphics Gems, London, Academic Press. Gonzalez, Rafael C., and Richard E. Woods (1992) Digital Image Processing, Reading, MA, Addison-Wesley. Gouraud, Henri (1971) “Continuous-Shading of Curved Surfaces,” IEEE Transactions on Computers, C-20(6):623–629, June. (Reprinted in [Freeman 80].) graphics.timeline (2009) is http://design.osu.edu/carlson/history/timeline.html. Gravesen, J. (1993) “Adaptive Subdivision and the Length of Bézier Curves,” Technical Report 472, The Danish Center for Applied Mathematics and Mechanics, Technical University of Denmark. Gray, Frank (1953) “Pulse Code Communication,” United States Patent 2,632,058, March 17. Guenter, B., and R.
Parent (1990) “Computing the Arc Lengths of Parametric Curves,” IEEE Computer Graphics and Applications 10(3):72–78, May. Haeberli, P., and D. Voorhies (1994) “Image Processing by Linear Interpolation and Extrapolation,” IRIS Universe Magazine (28), August.
handprint (2006) is http://www.handprint.com/HP/WCL/tech10.html. Hanson, Andrew J. (2006) Visualizing Quaternions, San Francisco, Elsevier Morgan Kaufmann. Hart, P. E. (2009) “How the Hough Transform Was Invented,” IEEE Signal Processing Magazine, 26(6):18–22, November. Hearn, D., and J. P. Baker (1997) Computer Graphics, 2nd edition, Englewood Cliffs, NJ, Prentice-Hall. Heath, F. G. (1972) “Origins of the Binary Code,” Scientific American, 227(2):76, August. Heath, Thomas (1981) A History of Greek Mathematics, New York, Dover Publications, vol. II, pp. 134–138. Heckbert, Paul S., ed. (1994) Graphics Gems IV, Boston, MA, AP Professional. helloari (2005) is http://www.helloari.com/gallery/. Herbert, Zbigniew (1993) Still Life with Bridle: Essays and Apocryphas, New York, Ecco Press. Hill, F. S. (2006) Computer Graphics, 3rd edition, New York, Macmillan. Hobby, John (1986) “Smooth, Easy to Compute Interpolating Splines,” Discrete and Computational Geometry, 1:123–140. hocg (2006) is http://hem.passagen.se/des/hocg/hocg_1960.htm. Holbein (2005) is http://www.nationalgallery.org.uk/server.php?show=conObject.227. Holmes (2010) is http://www.holmes3d.net/graphics/subdivision. Hopgood, F. R. A., D. A. Duce, J. R. Gallop, and D. C. Sutcliffe (1986) Introduction to the Graphical Kernel System (GKS), London, Academic Press. Hopgood, F. R. A., and D. A. Duce (1991) A Primer for PHIGS, Chichester, John Wiley. Hough, Paul V. C. (1962) Method and Means for Recognizing Complex Patterns, United States Patent 3,069,654. Hough (2010) is http://en.wikipedia.org/wiki/Hough_transform. Howard, T. L. J., W. T. Hewitt, R. J. Hubbold, and K. M. Wyrwas (1991) A Practical Introduction to PHIGS and PHIGS Plus, Wokingham, Addison-Wesley. hq3x (2010) is web.archive.org/web/20070703061942/www.hiend3d.com/hq3x.html. Huffman, David (1952) “A Method for the Construction of Minimum Redundancy Codes,” Proceedings of the IRE, 40(9):1098–1101. hulchercamera (2005) is http://www.hulchercamera.com/.
Hulsey, Kevin (2008) is http://www.khulsey.com/student.html. Huntley, H. E. (1970) The Divine Proportion: A Study in Mathematical Beauty, New York, Dover Publications. IAPP (2005) is http://www.panoramicassociation.org/. IGES (1986) Initial Graphics Exchange Specifications, version 3.0, Doc. #NBSIR 86-3359, National Bureau of Standards, Gaithersburg, MD. IGES-NIST (2010) is http://ts.nist.gov/standards/iges/. IPIX (2005) is http://www.ipix.com/products_realestate.html. Jacobs, Corinna (2005) Interactive Panoramas: Techniques for Digital Panoramic Photography, translated by J. Parrish, New York, Springer-Verlag. Jarvis, J. F., C. N. Judice, and W. H. Ninke (1976) “A Survey of Techniques for the Image Display of Continuous Tone Pictures on Bilevel Displays,” Computer Graphics and Image Processing, 5(1):13–40. Jarvis, J. F., and C. S. Roberts (1976) “A New Technique for Displaying Continuous Tone Images on a Bilevel Display,” IEEE Transactions on Communications, 24(8):891–898, August. Jarvis, P. (1990) “Implementing CORDIC Algorithms,” Dr. Dobb’s Journal, 152–158, October. Jensen, Henrik Wann (2005) Realistic Image Synthesis Using Photon Mapping, Natick, MA, AK Peters. Jha (2010) is http://demonstrations.wolfram.com/NumberTheoreticConstructionOfDigitalCircles. Joshi, R. L., V. J. Crump, and T. R. Fischer (1993) “Image Subband Coding Using Arithmetic and Trellis Coded Quantization,” IEEE Transactions on Circuits and Systems for Video Technology, 5(6):515–523, December. joystick (2011) is http://electronics.howstuffworks.com/joystick.htm. Kaidan (2005) is www.Kaidan.com. Karp, R. S. (1961) “Minimum-Redundancy Coding for the Discrete Noiseless Channel,” Transactions of the IRE, 7:27–38. Keith, Sandra (2001) “Dick Termes and His Spheres,” Math Horizons, September. kenrockwell (2010) is http://www.kenrockwell.com/tech/film-resolution.htm. keystoning (2010) is www.photoshopessentials.com/photo-editing/keystoning/. Kientzle, Tim (1995) “Scaling Bitmaps with Bresenham,” Dr.
Dobb’s Journal, October. Kientzle, Tim (1996) “Approximate Inverse Color Mapping,” Dr. Dobb’s Journal, August 1.
Kimberling, C. (1994) “Central Points and Central Lines in the Plane of a Triangle,” Mathematics Magazine, 67:163–187. King, Ross (2000) Brunelleschi’s Dome: How a Renaissance Genius Reinvented Architecture, New York, Walker and Company; London, Chatto and Windus. Kirk, David, ed. (1992) Graphics Gems III, San Diego, CA, Harcourt Brace Jovanovich. Knuth, Donald E. (1981) The Art of Computer Programming, Reading, MA, Addison-Wesley. Knuth, Donald E. (1986) The Metafont Book, Reading, MA, Addison-Wesley. Knuth, Donald E. (1987) “Digital Halftones by Dot Diffusion,” ACM Transactions on Graphics, 6(4):245–273. Kochanek, D. H. U., and R. H. Bartels (1984) “Interpolating Splines with Local Tension, Continuity, and Bias Control,” Computer Graphics, 18(3):33–41 (Proceedings SIGGRAPH ’84). Krikke, Jan (2000) “Axonometry: A Matter of Perspective,” IEEE Computer Graphics and Applications, 20(4):7–11, July/August. kspark (2005) is http://kspark.kaist.ac.kr/Escher/Escher.htm. Kuler (2011) is http://kuler.adobe.com/#create/fromacolor. Lagrange, J. L. (1877) “Leçons Élémentaires sur les Mathématiques, Données à l’École Normale en 1795,” in Œuvres, VII, Paris, Gauthier-Villars, 183–287. Lamé (1998) is the applet http://www-groups.dcs.st-andrews.ac.uk/~history/Java/Lame.html. lampshade (2005) is http://www.philohome.com/lampshade/lampshade.htm. Lane, Jeffrey M., Loren C. Carpenter, Turner Whitted, and James F. Blinn (1980) “Scan Line Methods for Displaying Parametrically Defined Surfaces,” Communications of the ACM, 23(1):23–34, January. Lee, E. (1986) “Rational Bézier Representations for Conics,” in Geometric Modeling, Farin, G., editor, Philadelphia, SIAM Publications, pp. 3–27. Lewell, John (1985) Computer Graphics: A Survey of Current Techniques and Applications, New York, Van Nostrand Reinhold. Lindenmayer, A. (1968) “Mathematical Models for Cellular Interaction in Development,” Journal of Theoretical Biology, 18:280–315. Liu, D., and J.
Hoschek (1989) “GC¹ Continuity Conditions Between Adjacent Rectangular and Triangular Bézier Surface Patches,” Computer-Aided Design, 21:194–200. Loeffler, C., A. Ligtenberg, and G. Moschytz (1989) “Practical Fast 1-D DCT Algorithms with 11 Multiplications,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’89), pp. 988–991. Lomography (2010) is usa.shop.lomography.com/cameras/panoramic-cameras.
Loop, Charles T. (1987) “Smooth Subdivision Surfaces Based on Triangles,” M.S. thesis, University of Utah, Mathematics. See also http://research.microsoft.com/~cloop. Loop, Charles T. (2002) “Smooth Ternary Subdivision of Triangle Meshes,” Curve and Surface Fitting: St. Malo 2002. Lukáš, Jan, Jessica Fridrich, and M. Goljan (2006a) “Detecting Digital Image Forgeries Using Sensor Pattern Noise,” Proceedings of SPIE Electronic Imaging, Photonics West, January. http://www.ws.binghamton.edu/fridrich/publications.html. Lukáš, Jan, Jessica Fridrich, and M. Goljan (2006b) “Digital Camera Identification from Sensor Noise,” IEEE Transactions on Information Forensics and Security, 1(2):205–214, June. http://www.ws.binghamton.edu/fridrich/publications.html. Lyon, Richard F. (2009) “A Brief History of Pixel,” available online from http://www.foveon.com/files/ABriefHistoryofPixel2.pdf. Magdassi, Shlomo (ed.) (2010) The Chemistry of Inkjet Inks, Singapore, World Scientific Publishing. Manetti, Antonio (1488) The Life of Brunelleschi, translated by Catherine Enggass, University Park, PA, Pennsylvania State University Press, 1970. Manning, J. R. (1974) “Continuity Conditions for Spline Curves,” Computer Journal, 17(4):181–186, May. martinreddy (2010) is http://www.martinreddy.net/gfx/2d/GIF89a.txt and also http://www.w3.org/Graphics/GIF/spec-gif89a.txt. masson (2011) is http://www.cs.cmu.edu/~ph/nyit/masson/history.htm. MathSource (2005) http://library.wolfram.com/infocenter/MathSource/4930/. Mathworld (2005) is http://mathworld.wolfram.com/Quaternion.html. Mathworks (2005) http://www.mathworks.com/. May, C. P. (1962) James Clerk Maxwell and Electromagnetism, New York, Franklin Watts. McAllister (2010) is (or was) http://www.particlesystems.org. mercator (2005) is http://mathsforeurope.digibel.be/mercator.htm. Mesdag Documentation Society (1998) is http://www.mesdag.com/index.html. Moore, Charles G.
(1989) “To View an Ellipse in Perspective,” The College Mathematics Journal, 20(2):134–136, March. morrison (2010) is http://www.cs.cmu.edu/~ph/nyit/morrison/index.html. mosaic history (2010) is http://www.digitalartform.com/archives/2004/12/history_of_phot.html. msnbc.camera (2009) is http://www.msnbc.msn.com/id/9261340/.
Mulcahy, Colm (1996) “Plotting and Scheming with Wavelets,” Mathematics Magazine, 69(5):323–343, December. See also http://www.spelman.edu/~colm/csam.ps. Mulcahy, Colm (1997) “Image Compression Using the Haar Wavelet Transform,” Spelman College Science and Mathematics Journal, 1(1):22–31, April. Also available at http://www.spelman.edu/~colm/wav.ps. (It has been claimed that any smart 15-year-old could follow this introduction to wavelets.) Newbold (2005) is http://dogfeathers.com/java/pulfrich.html. Newell, M. E., R. G. Newell, and T. L. Sancha (1972) “A Solution to the Hidden Surface Problem,” Proceedings of the ACM Annual Conference, ACM ’72, pp. 443–450. Nicholl, Tina M., D. T. Lee, and Robin A. Nicholl (1987) “An Efficient New Algorithm for 2-D Line Clipping: Its Development and Analysis,” SIGGRAPH ’87, 21(4):253–262, July. Noblex (2005) is http://www.kwdo.de/. Nyquist (2003) is http://www.wikipedia.org/wiki/Nyquist-Shannon_sampling_theorem. OpenGL (1999) OpenGL Reference Manual, 3rd ed., written by the OpenGL Architecture Review Board, Reading, MA, Addison-Wesley Professional, December. This is known as the blue book. OpenGL (2009) Shreiner, David, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Versions 3.0 and 3.1, 7th ed., Reading, MA, Addison-Wesley Professional, July. This is known as the red book. OpenGL (2010) Wright, Richard S., Benjamin Lipchak, Nicholas Haemel, and Graham Sellers, OpenGL SuperBible: Comprehensive Tutorial and Reference, 5th ed., Reading, MA, Addison-Wesley, July. Orosz (2005) is http://www.utisz.net/. Paeth, Alan W. (1986) “A Fast Algorithm for General Raster Rotation,” in Proceedings Graphics Interface ’86, Canadian Information Processing Society, pp. 77–81. Paeth, Alan W. (1991) “Image File Compression Made Easy,” in Graphics Gems II, James Arvo, editor, San Diego, CA, Academic Press. Paeth, Alan W. (editor) (1995) Graphics Gems V, Boston, MA, AP Professional. Pamplona, Vitor F., Leandro A. F.
Fernandes, Jo˜ ao Prauchner, Luciana P. Nedel and Manuel M. Oliveira (2008) “The Image-Based Data Glove,” Proceedings of X Symposium on Virtual Reality (SVR’2008), Jo˜ ao Pessoa, Ed., pp. 204–211. Available from http://vitorpamplona.com/deps/papers/2008_SVR_IBDG.pdf. Pantone Inc., (1991) PANTONE Color Formula Guide 1000, Pantone Inc. Pearson, F. (1990) Map Projections: Theory and Applications, Boca Raton, FL., CRC Press. peephole (2011) is instructables.com/id/%2411-Super-Wide-Angle-Digital-Camera.
Pennebaker, William B., and Joan L. Mitchell (1992) JPEG Still Image Data Compression Standard, New York, Van Nostrand Reinhold.
Penrose, L. S., and Roger Penrose (1958) "Impossible Objects: A Special Type of Visual Illusion," British Journal of Psychology, 49(1):31–33.
Peters, Jörg and Ulrich Reif (2008) Subdivision Surfaces, Berlin, Springer-Verlag.
Petersik (2005) is http://www.stereofoto.de/index.html.
Phaser (2011) is http://www.office.xerox.com/printers/enus.html.
philohome (2005) is http://www.philohome.com/panorama.htm.
photo.net (2011) is http://photo.net/digital-darkroom-forum/00GDqR.
Piegl, L., and W. Tiller (1997) The NURBS Book, Berlin, Springer-Verlag.
Pigeon, Steven (2001a) "Start/Stop Codes," Proceedings of the Data Compression Conference (DCC '01), p. 511. Also available at http://www.stevenpigeon.org/Publications/publications/ssc_full.pdf.
Pigeon, Steven (2001b) Contributions to Data Compression, PhD Thesis, University of Montreal (in French). Available from http://www.stevenpigeon.org/Publications/publications/phd.pdf. The part on taboo codes is "Taboo Codes, New Classes of Universal Codes," and is also available at www.iro.umontreal.ca/~brassard/SEMINAIRES/taboo.ps. A new version has been submitted to SIAM Journal of Computing.
pixma.ulmb (2011) is http://pixma.ulmb.com/?p=136/3000-4000-5000-8500.html.
Plass, Michael, and Maureen Stone (1983) "Curve-Fitting with Piecewise Parametric Cubics," ACM Transactions on Computer Graphics, 17(3):229–239, July.
PNG (2003) is http://www.libpng.org/pub/png/.
pogo sketch (2010) is http://tenonedesign.com/products.php?application=MacBook.
Prautzsch, H. (1984) "A Short Proof of the Oslo Algorithm," Computer Aided Geometric Design, 1:95–96.
Press, W. H., B. P. Flannery, et al. (1988) Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press. (Also available from http://www.nr.com/.)
Pritchard, D. H. (1977) "US Color Television Fundamentals—A Review," IEEE Transactions on Consumer Electronics, CE-23(4):467–478, November.
Prusinkiewicz, Przemyslaw (1989) Lindenmayer Systems, Fractals, and Plants, New York, Springer-Verlag.
Quasi-Newton (2010) is http://en.wikipedia.org/wiki/Quasi-Newton_method.
Ramshaw, Lyle (1987) "Blossoming: A Connect-the-Dots Approach to Splines," Digital Equipment Corporation, Research Report 19, June 21.
Rao, K. R., and P. Yip (1990) Discrete Cosine Transform—Algorithms, Advantages, Applications, London, Academic Press.
Rao, Kamisetty Ramamohan and Patrick C. Yip, editors (2000) The Transform and Data Compression Handbook, Boca Raton, FL, CRC Press.
redpicture (2011) is http://www.redpicture.com/bezier/bezier-01.html.
Reeves, William T. (1983) "Particle Systems, a Technique for Modeling a Class of Fuzzy Objects," ACM Transactions on Graphics, 2(2):91–108, April.
remotereality (2005) is http://www.remotereality.com/.
Riemersma, Thiadmer (2002) "Quick and Smooth Image Scaling with Bresenham," Dr. Dobb's Journal, May.
Riemersma, Thiadmer (2006) "Quick image scaling by 2" in http://www.compuphase.com/graphic/scale2.htm.
Riesenfeld, R. (1975) "On Chaikin's Algorithm," IEEE Computer Graphics and Applications, 4(3):304–310.
roberthodgin (2010) is http://roberthodgin.com/stippling.
Robertson, Scott (2008) is http://drawthrough.com/dvds.php.
Roetling, P. G. (1976) "Halftone Method with Edge Enhancement and Moiré Suppression," Journal of the Optical Society of America, 66:985–989.
Roetling, P. G. (1977) "Binary Approximation of Continuous Tone Images," Photographic Science and Engineering, 21:60–65.
Rogers, David (2001) An Introduction to NURBS: With Historical Perspective, San Francisco, Morgan Kaufmann.
Rokne, J. G., Brian Wyvill, and Xiaolin Wu (1990) "Fast Line Scan-Conversion," ACM Transactions on Graphics, 9(4):376–388, October.
roundshot (2005) is http://www.roundshot.ch/.
Rozin (2011) is http://www.smoothware.com/danny.
Said, A. and W. A. Pearlman (1996) "A New Fast and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees," IEEE Transactions on Circuits and Systems for Video Technology, 6(6):243–250, June.
Salomon, David (1999) Computer Graphics and Geometric Modeling, New York, Springer-Verlag.
Salomon, David (2005) Curves and Surfaces for Computer Graphics, New York, Springer-Verlag.
Salomon, David (2007) Variable-Length Codes for Data Compression, London, Springer-Verlag.
Salomon, David (2009) A Handbook of Data Compression, London, Springer-Verlag.
Samet, Hanan (1990a) Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS, Reading, MA, Addison-Wesley.
Samet, Hanan (1990b) The Design and Analysis of Spatial Data Structures, Reading, MA, Addison-Wesley.
Scale2 (2010) is http://scale2x.sourceforge.net/algorithm.html.
Schreiber (2010) is http://demonstrations.wolfram.com/GammaCorrection.
Schumaker, L. (1981) Spline Functions: Basic Theory, New York, John Wiley.
Sedgewick, Robert (1997) Algorithms in C: Parts 1–4: Fundamentals, Data Structures, Sorting, Searching, Reading, MA, Addison-Wesley.
sensics (2008) is http://www.sensics.com/files/documents/2008SurveyResults.pdf.
Sgrilli, Bernardo Sansone (1733) Descrizione e studi dell'insigne fabbrica di Santa Maria del Fiore metropolitana fiorentina, Florence, Bernardo Paperini.
shortcourses (2005) is http://www.shortcourses.com/.
Simoncelli, Eero P., and Edward H. Adelson (1990) "Subband Transforms," in Subband Coding, John Woods, ed., Boston, MA, Kluwer Academic Press, pp. 143–192.
sketchpad.wiki (2010) is http://en.wikipedia.org/wiki/Sketchpad.
Sloan (2010) is http://sloan.stanford.edu/MouseSite/1968Demo.html.
Smith, Alvy Ray (2009) "A Pixel Is Not A Little Square," available from ftp://ftp.alvyray.com/Acrobat/6_Pixel.pdf.
[SMPTE-170M] Society of Motion Picture and Television Engineers (1994) "Television—Composite Analog Video Signal—NTSC for Studio Applications," SMPTE-170M. This is available from http://www.smpte.org/standards.
Snyder, J. P. (1987) Map Projections: A Working Manual, US Geological Survey Professional Paper 1395, Washington, DC, US Government Printing Office.
Snyder, J. P. (1993) Flattening the Earth, Chicago, IL, The University of Chicago Press.
Sobol-wiki (2010) is http://en.wikipedia.org/wiki/Sobol_sequence.
solid-ink (2011) is www.imaging.org/ist/resources/tutorials/solid_ink.cfm.
Späth, Helmuth (1983) Spline Algorithmen zur Konstruktion glatter Kurven und Flächen, 3rd edition, Munich, Vienna, Oldenbourg Wissenschaftsverlag.
Späth, Helmuth (1995a) One-Dimensional Spline Interpolation Algorithms, AK Peters.
Späth, Helmuth (1995b) Two-Dimensional Spline Interpolation Algorithms, AK Peters.
Steinhaus, Hugo (1983) "Platonic Solids, Crystals, Bees' Heads, and Soap," Chapter 8 in Mathematical Snapshots, 3rd ed., New York, Dover, pp. 199–201 and 252–256.
StereoGlasses (2011) is http://www.wikihow.com/Make-Your-Own-3D-Glasses.
StereoGraphics (2005) is http://www.reald.com/content/professional.aspx.
STOIK-Imagic (2011) is http://www.stoik.com/products/photo/STOIK-Imagic/.
Stollnitz, E. J., T. D. DeRose, and D. H. Salesin (1996) Wavelets for Computer Graphics, San Francisco, CA, Morgan Kaufmann.
Stone, M. C., William B. Cowan, and John C. Beatty (1988) "Color Gamut Mapping and the Printing of Digital Color Images," ACM Transactions on Graphics, 7(3):249–292, October.
Stothers (2005) is http://www.maths.gla.ac.uk/~wws/cabripages/inversive/ file inversive0.html.
Strang, Gilbert (1999) "The Discrete Cosine Transform," SIAM Review, 41(1):135–147.
Strang, Gilbert, and Truong Nguyen (1996) Wavelets and Filter Banks, Wellesley, MA, Wellesley-Cambridge Press.
Strothotte, Thomas and Stefan Schlechtweg (2002) Non-Photorealistic Computer Graphics: Modeling, Rendering, and Animation, San Francisco, Morgan Kaufmann (Elsevier Science).
strydent (2011) is http://www.strydent.com.
Sturman, D. J. and D. Zeltzer (1994) "A Survey of Glove-Based Input," IEEE Computer Graphics and Applications, 14(1):30–39, January.
Sukthankar, Rahul, Robert G. Stockton, and Matthew D. Mullin (2000) "Automatic Keystone Correction for Camera-Assisted Presentation Interfaces," Advances in Multimodal Interfaces, ICMI 2000, Springer Lecture Notes in Computer Science, Volume 1948/2000, pp. 607–614.
Swartzlander, Earl E. (1990) Computer Arithmetic, Silver Spring, MD, IEEE Computer Society Press.
Tanenbaum, Andrew S. (2002) Computer Networks, Upper Saddle River, NJ, Prentice Hall.
templeton (2010) is http://pic.templetons.com/brad/photo/pixels.html.
Termes (1980) United States patent 4,214,821, available at http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=/netahtml/search-bool.html&r=10&f=G&l=50&co1=AND&d=ptxt&s1=4214821&OS=4214821&RS=4214821.
Termes, Dick (1998) New Perspective Systems, published privately, see http://www.termespheres.com/perspective.html.
termespheres (2005) is http://www.termespheres.com/perspective.html.
Thimbleby, Harold, Stuart Inglis, and Ian Witten (1994) "Displaying 3D Images: Algorithms for Single-Image Random-Dot Stereograms," IEEE Computer, 27(10):38–48.
Travis, A. R. L. (1990) "Autostereoscopic 3-D Display," Applied Optics, 29(29):4341–4342.
Travis (1992) "Three-Dimensional Display Device," United States patent 5,132,839, issued July 21, 1992 to Adrian R. L. Travis.
Triangles (2011), see http://faculty.evansville.edu/ck6/index.html for many triangle centers.
Turnbull, Herbert W. et al. (editors) (1959) The Correspondence of Isaac Newton, seven volumes, Cambridge, Cambridge University Press.
Ulichney, Robert (1987) Digital Halftoning, Cambridge, MA, MIT Press.
unicorn (2010a) is http://www.newyorker.com/archive/2005/04/11/050411fa_fact?currentPage=1.
unicorn (2010b) is http://en.wikipedia.org/wiki/The_Hunt_of_the_Unicorn.
Vachss, Raymond (1987) "The CORDIC Magnification Function," IEEE Micro, 7(5):83–84, October.
Vetterli, M., and J. Kovacevic (1995) Wavelets and Subband Coding, Englewood Cliffs, NJ, Prentice-Hall.
Vitruvius (2006) De Architectura. The Latin text and English translation is at http://penelope.uchicago.edu/Thayer/E/Roman/Texts/Vitruvius/home.html. Also see D. Rowland and T. N. Howe, Vitruvius: Ten Books on Architecture, Cambridge, Cambridge University Press, 1999.
vrealities (2010) is http://www.vrealities.com/cyber.html.
Volder, Jack E. (1959) "The CORDIC Trigonometric Computing Technique," IRE Transactions on Electronic Computers, EC-8:330–334.
wacom (2010) is http://www.wacom.com/bamboo/bamboo_pen_touch.php.
Walker, Paul Robert (2002) The Feud That Sparked the Renaissance: How Brunelleschi and Ghiberti Changed the Art World, New York, HarperCollins.
Wallace, Gregory K. (1991) "The JPEG Still Image Compression Standard," Communications of the ACM, 34(4):30–44, April.
Walther, John S. (1971) "A Unified Algorithm for Elementary Functions," Proceedings of Spring Joint Computer Conference, 38:379–385.
Warnock, John (1969) "A Hidden Surface Algorithm for Computer Generated Halftone Pictures," Technical Report TR 4-15, Computer Science Dept., University of Utah.
Warren, Joe and Henrik Weimer (2002) Subdivision Methods for Geometric Design: A Constructive Approach, San Francisco, CA, Morgan Kaufmann.
Watson, Andrew (1994) "Image Compression Using the Discrete Cosine Transform," Mathematica Journal, 4(1):81–88.
wearcam (2010) is http://wearcam.org/head-mounted-displays.html.
wikipedia (2005) is http://en.wikipedia.org/wiki/Flat_earth.
WikiQuaternion (2005) is http://en.wikipedia.org/wiki/Quaternion.
wiki-color-print (2010) is http://en.wikipedia.org/wiki/Color_printing.
Wings3D (2005) is http://www.wings3d.com/.
Woeste, Harald (2009) Mastering Digital Panoramic Photography, Santa Barbara, CA, Rocky Nook.
Wolf, Mark (editor) (2008) The Video Game Explosion: A History from PONG to PlayStation and Beyond, Westport, CT, Greenwood Press.
Wolfram (2006) is http://mathworld.wolfram.com/topics/MapProjections.html.
Wolfram-dither (2010a) is http://demonstrations.wolfram.com/OrderedDitherPatterns.
Wolfram-dither (2010b) is http://demonstrations.wolfram.com/ErrorDiffusionDitherPatterns.
Wolfram Research (2005) is http://www.wolfram.com.
Wolfram, Stephen (2003) The Mathematica Book, Fifth Edition, Champaign, IL, Wolfram Media.
wondertouch (2010) is http://www.wondertouch.com.
Woodham, R. J. (1980) "Photometric Method for Determining Surface Orientation from Multiple Images," Optical Engineering, 19(1):139–144.
Wright, T. J. (1973) "A Two-Space Solution to the Hidden-Line Problem for Plotting Functions of Two Variables," IEEE Transactions on Computers, C-22(1):28–33, January.
Wu, Xiaolin (1991) "An Efficient Antialiasing Technique," Computer Graphics, 25(4):143–152.
Wu, X., and J. G. Rokne (1987) "Double-Step Incremental Generation of Lines and Circles," Computer Vision, Graphics, and Image Processing, 37:331–344.
xahlee (2005) is http://www.xahlee.org/SpecialPlaneCurves_dir/CassinianOval_dir/cassinianOval.html.
Yamaguchi, F. (1988) Curves and Surfaces in Computer Aided Geometric Design, Berlin, Springer-Verlag.
yarin (2010) is http://alumni.media.mit.edu/~yarin/laser/laser_printing.html.
Zhang, Manyun (1990) The JPEG and Image Data Compression Algorithms (dissertation).
Zpen (2011) is http://www.danedigital.com/6-Zpen/.

What do you think? Your manuscript referenced his Louvre collection several times, his books are in your bibliography, and the guy has some serious clout for foreign sales. Saunière was a no-brainer.
—Dan Brown, The Da Vinci Code (2003)
Answers to Exercises

It is a good morning exercise for a research scientist to discard a pet hypothesis every day before breakfast. It keeps him young.
—Konrad Lorenz

1: This is a row vector whose four elements are points and are therefore themselves vectors (pairs in two dimensions and triplets in three dimensions).

2.1: At present, virtual reality renders images and sounds and allows for some user interaction with the simulated environment. Extrapolating this, we predict that the next step will be to compute a simulation that covers the entire human sensory range, including the visual, kinesthetic (tactile and emotional feelings), olfactory (smell), gustatory (taste), and auditory senses. Such a simulation would create a perfect illusion, completely overriding the normal functioning of the senses and fooling the user into believing that they really are experiencing the simulated, virtual environment. To understand how such a simulation can be done, let's consider, for example, the sense of smell. It can be simulated by preparing chemicals that produce the required smells and mixing and releasing them during a virtual-reality session. However, a better approach is to find out how the sense of smell works. Once this is understood, it may be possible to send the brain the signals that are normally sent by the nerves from the nose, and in this way directly stimulate the brain and create the sensation of any desired smell.

2.2: This is easy. The index is (r − 1 − y)c + x.

2.3: Since each truth table has four 1-bit entries, there can be 16 combinations of the entries, leading to 16 possible logical operations. Most, however, are not useful and are never used in practice. The last example in Table 2.5, for example, creates a zero if its first input is zero. If its first input is one, it creates the opposite of the second input. This is rarely, if ever, useful.

2.4: Yes!
When dragging an object or rubber banding it, the program has to draw an outline, erase it, then draw a slightly different outline, and repeat this very quickly.
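This rapid draw–erase cycle is typically done in xor mode, the subject of the next answer. A minimal Python sketch of the idea (the 4-bit pixel values are only illustrative):

```python
# Xor-mode drawing: xoring a color into a pixel draws it; xoring the
# same color again restores the original pixel, so a rubber-band
# outline can be erased without saving the background.

def xor_draw(pixel, color):
    """Return the pixel value after drawing `color` in xor mode."""
    return pixel ^ color

background = 0b1010              # existing pixel value
outline = 0b1110                 # color of the rubber-band outline

shown = xor_draw(background, outline)    # outline visible
restored = xor_draw(shown, outline)      # second xor erases it

print(bin(shown), bin(restored))
```

The second xor always recovers the background exactly, which is why no backup copy of the covered pixels is needed.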
2.5: Yes! Imagine 4-bit pixels (16 colors). If a pixel is drawn in color 1010 and we have a new source of color 1110, then the xor pixel being drawn will be 0100. If we now erase, say, the 1010 pixel, the result will be the xor of 0100 and 1010, which is 1110. It's easy to demonstrate that this method works even for three or more objects intersecting at a point (see Figure 2.7).

2.6: Such a polynomial depends on three coefficients b, c, and d that can be considered three-dimensional points, and any three points are on the same plane.

2.7: This is straightforward
P(2/3) = (0, −9)(2/3)³ + (−4.5, 13.5)(2/3)² + (4.5, −3.5)(2/3)
       = (0, −8/3) + (−2, 6) + (3, −7/3)
       = (1, 1) = P3.

2.8: We use the relations sin 30° = cos 60° = .5 and the approximation cos 30° = sin 60° ≈ .866. The four points are P1 = (1, 0), P2 = (cos 30°, sin 30°) = (.866, .5), P3 = (.5, .866), and P4 = (0, 1). The relation A = N · P becomes

\begin{pmatrix} a\\ b\\ c\\ d \end{pmatrix} = \mathbf{A} = \mathbf{N}\cdot\mathbf{P} =
\begin{pmatrix} -4.5 & 13.5 & -13.5 & 4.5\\ 9.0 & -22.5 & 18.0 & -4.5\\ -5.5 & 9.0 & -4.5 & 1.0\\ 1.0 & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} (1,0)\\ (.866,.5)\\ (.5,.866)\\ (0,1) \end{pmatrix}

and the solutions are
a = −4.5(1, 0) + 13.5(.866, .5) − 13.5(.5, .866) + 4.5(0, 1) = (.441, −.441),
b = 9(1, 0) − 22.5(.866, .5) + 18(.5, .866) − 4.5(0, 1) = (−1.485, −0.162),
c = −5.5(1, 0) + 9(.866, .5) − 4.5(.5, .866) + 1(0, 1) = (0.044, 1.603),
d = 1(1, 0) − 0(.866, .5) + 0(.5, .866) − 0(0, 1) = (1, 0).
Thus, the PC is P(t) = (.441, −.441)t³ + (−1.485, −0.162)t² + (0.044, 1.603)t + (1, 0). The midpoint is P(.5) = (.7058, .7058), only 0.2% away from the midpoint of the arc, which is at (cos 45°, sin 45°) ≈ (.7071, .7071).

2.9: The new equations are easy enough to set up. Using Mathematica, they are also easy to solve. The following code

Solve[{d==p1, a al^3+b al^2+c al+d==p2,
  a be^3+b be^2+c be+d==p3, a+b+c+d==p4},{a,b,c,d}];
ExpandAll[Simplify[%]]

(where al and be stand for α and β, respectively) produces the (messy) solutions
a = -\frac{P_1}{\alpha\beta} + \frac{P_2}{-\alpha^2+\alpha^3+\alpha\beta-\alpha^2\beta} + \frac{P_3}{\alpha\beta-\beta^2-\alpha\beta^2+\beta^3} + \frac{P_4}{1-\alpha-\beta+\alpha\beta},

b = P_1\frac{-\alpha+\alpha^3+\beta-\alpha^3\beta-\beta^3+\alpha\beta^3}{\gamma} + P_2\frac{-\beta+\beta^3}{\gamma} + P_3\frac{\alpha-\alpha^3}{\gamma} + P_4\frac{\alpha^3\beta-\alpha\beta^3}{\gamma},

c = -P_1\left(1+\frac{1}{\alpha}+\frac{1}{\beta}\right) + \frac{\beta P_2}{-\alpha^2+\alpha^3+\alpha\beta-\alpha^2\beta} + \frac{\alpha P_3}{\alpha\beta-\beta^2-\alpha\beta^2+\beta^3} + \frac{\alpha\beta P_4}{1-\alpha-\beta+\alpha\beta},

d = P_1,

where γ = (−1 + α)α(−1 + β)β(−α + β). From here, the basis matrix immediately follows

\begin{pmatrix}
-\frac{1}{\alpha\beta} & \frac{1}{-\alpha^2+\alpha^3+\alpha\beta-\alpha^2\beta} & \frac{1}{\alpha\beta-\beta^2-\alpha\beta^2+\beta^3} & \frac{1}{1-\alpha-\beta+\alpha\beta}\\
\frac{-\alpha+\alpha^3+\beta-\alpha^3\beta-\beta^3+\alpha\beta^3}{\gamma} & \frac{-\beta+\beta^3}{\gamma} & \frac{\alpha-\alpha^3}{\gamma} & \frac{\alpha^3\beta-\alpha\beta^3}{\gamma}\\
-\left(1+\frac{1}{\alpha}+\frac{1}{\beta}\right) & \frac{\beta}{-\alpha^2+\alpha^3+\alpha\beta-\alpha^2\beta} & \frac{\alpha}{\alpha\beta-\beta^2-\alpha\beta^2+\beta^3} & \frac{\alpha\beta}{1-\alpha-\beta+\alpha\beta}\\
1 & 0 & 0 & 0
\end{pmatrix}.
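A quick numeric check of these coefficient formulas (a Python sketch, not part of the book's Mathematica session): substituting α = 1/3 and β = 2/3 should reproduce the standard cubic interpolation matrix N of Equation (2.6).

```python
def basis(al, be):
    """Basis matrix for a cubic through P1..P4 at t = 0, al, be, 1."""
    g = (al - 1.0) * al * (be - 1.0) * be * (be - al)
    d1 = -al**2 + al**3 + al * be - al**2 * be
    d2 = al * be - be**2 - al * be**2 + be**3
    d3 = 1.0 - al - be + al * be
    return [
        [-1.0 / (al * be), 1.0 / d1, 1.0 / d2, 1.0 / d3],
        [(-al + al**3 + be - al**3 * be - be**3 + al * be**3) / g,
         (-be + be**3) / g, (al - al**3) / g, (al**3 * be - al * be**3) / g],
        [-(1.0 + 1.0 / al + 1.0 / be), be / d1, al / d2, al * be / d3],
        [1.0, 0.0, 0.0, 0.0],
    ]

# Matrix N of Equation (2.6)
N = [[-4.5, 13.5, -13.5, 4.5],
     [9.0, -22.5, 18.0, -4.5],
     [-5.5, 9.0, -4.5, 1.0],
     [1.0, 0.0, 0.0, 0.0]]

M = basis(1/3, 2/3)
print(max(abs(M[i][j] - N[i][j]) for i in range(4) for j in range(4)))
```

The maximum entry-wise difference is zero up to rounding, confirming the reduction claimed next.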
A direct check, again using Mathematica, for α = 1/3 and β = 2/3, reduces this matrix to matrix N of Equation (2.6).

2.10: The weights have to add up to 1 because this results in a weighted sum whose value is in the same range as the values of the pixels. If pixel values are, for example, in the range [0, 15] and the weights add up to 2, a prediction may result in values of up to 30.

2.11: The missing points will have to be estimated by interpolation or extrapolation from the known points before our method can be applied. Obviously, the fewer points are known, the worse the final interpolation. Note that 16 points are necessary, because a bicubic polynomial has 16 coefficients.

2.12: Figure Ans.1a shows a diamond-shaped grid of 16 equally-spaced points. The eight points with negative weights are shown in red. Figure Ans.1b shows a cut (labeled xx) through four points in this surface. The cut is a curve that passes through four data points. It is easy to see that when the two exterior (red) points are raised, the center of the curve (and, as a result, the center of the surface) is lowered. It is now clear that points with negative weights push the center of the surface in a direction opposite that of the points. Figure Ans.1c is a more detailed example that also shows why the four corner points should have positive weights. It shows a simple symmetric surface patch that
Figure Ans.1: An Interpolating Bicubic Surface Patch (parts (a)–(e); the plots and their axis labels are omitted here).
Clear[Nh,p,pnts,U,W];
p00={0,0,0}; p10={1,0,1}; p20={2,0,1}; p30={3,0,0};
p01={0,1,1}; p11={1,1,2}; p21={2,1,2}; p31={3,1,1};
p02={0,2,1}; p12={1,2,2}; p22={2,2,2}; p32={3,2,1};
p03={0,3,0}; p13={1,3,1}; p23={2,3,1}; p33={3,3,0};
Nh={{-4.5,13.5,-13.5,4.5},{9,-22.5,18,-4.5},
  {-5.5,9,-4.5,1},{1,0,0,0}};
pnts={{p33,p32,p31,p30},{p23,p22,p21,p20},
  {p13,p12,p11,p10},{p03,p02,p01,p00}};
U[u_]:={u^3,u^2,u,1}; W[w_]:={w^3,w^2,w,1};
(* prt[i] extracts component i from the 3rd dimen of P *)
prt[i_]:=pnts[[Range[1,4],Range[1,4],i]];
p[u_,w_]:={U[u].Nh.prt[1].Transpose[Nh].W[w],
  U[u].Nh.prt[2].Transpose[Nh].W[w],
  U[u].Nh.prt[3].Transpose[Nh].W[w]};
g1=ParametricPlot3D[p[u,w], {u,0,1},{w,0,1},
  Compiled->False, DisplayFunction->Identity];
g2=Graphics3D[{AbsolutePointSize[2],
  Table[Point[pnts[[i,j]]],{i,1,4},{j,1,4}]}];
Show[g1,g2, ViewPoint->{-2.576, -1.365, 1.718}]
Code for Figure Ans.1
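The same center-point computation can be repeated outside Mathematica. This Python sketch (an independent port, not from the book) evaluates only the z component of the patch, z(u, w) = U·Nh·Pz·Nhᵀ·W, at u = w = 0.5:

```python
# Interpolation matrix Nh and the z components of the 16 control
# points, in the {{p33,p32,p31,p30},...} order used by the code above.
Nh = [[-4.5, 13.5, -13.5, 4.5],
      [9, -22.5, 18, -4.5],
      [-5.5, 9, -4.5, 1],
      [1, 0, 0, 0]]

Pz = [[0, 1, 1, 0],
      [1, 2, 2, 1],
      [1, 2, 2, 1],
      [0, 1, 1, 0]]

def z_at(u, w):
    """Evaluate U.Nh.Pz.Transpose(Nh).W at parameters (u, w)."""
    U = [u**3, u**2, u, 1]
    W = [w**3, w**2, w, 1]
    v = [sum(U[k] * Nh[k][j] for k in range(4)) for j in range(4)]  # U.Nh
    s = [sum(W[k] * Nh[k][j] for k in range(4)) for j in range(4)]  # W.Nh
    return sum(v[i] * Pz[i][j] * s[j] for i in range(4) for j in range(4))

print(z_at(0.5, 0.5))  # → 2.25
```

The value 2.25 is the center height of the original patch quoted in answer 2.12; re-running with the raised boundary or corner points reproduces the other center heights.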
interpolates the 16 points
P00 = (0, 0, 0),  P10 = (1, 0, 1),  P20 = (2, 0, 1),  P30 = (3, 0, 0),
P01 = (0, 1, 1),  P11 = (1, 1, 2),  P21 = (2, 1, 2),  P31 = (3, 1, 1),
P02 = (0, 2, 1),  P12 = (1, 2, 2),  P22 = (2, 2, 2),  P32 = (3, 2, 1),
P03 = (0, 3, 0),  P13 = (1, 3, 1),  P23 = (2, 3, 1),  P33 = (3, 3, 0).
We first raise the eight boundary points from z = 1 to z = 1.5. Figure Ans.1d shows how the center point P(.5, .5) gets lowered from (1.5, 1.5, 2.25) to (1.5, 1.5, 2.10938). We next return those points to their original positions and instead raise the four corner points from z = 0 to z = 1. Figure Ans.1e shows how this raises the center point from (1.5, 1.5, 2.25) to (1.5, 1.5, 2.26563).

2.13: Figure Ans.2 illustrates the results of the 4/10 shrinking. The original bitmap is a 10×10 grid whose pixels are numbered 1 through 100; the shrunk 4×4 bitmap keeps the 16 pixels

 1  3  6  8
31 33 36 38
61 63 66 68
81 83 86 88

Figure Ans.2: 4/10 Bitmap Shrinking by Copying.
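Shrinking by copying can be sketched in a few lines of Python. The selection rule below (keep a source cell whenever the scaled position crosses an integer) is one possible choice; it keeps a slightly different set of indices than the rounding used in Figure Ans.2.

```python
# m/n bitmap shrinking by copying: keep dst of every src rows and
# columns, dropping the rest. The kept indices come from an integer
# scaling rule (one of several possible rounding choices).

def kept_indices(src, dst):
    """0-based source indices kept when copying src cells down to dst."""
    return [i for i in range(src) if (i * dst) // src > ((i - 1) * dst) // src]

def shrink(bitmap, dst_rows, dst_cols):
    rows = kept_indices(len(bitmap), dst_rows)
    cols = kept_indices(len(bitmap[0]), dst_cols)
    return [[bitmap[r][c] for c in cols] for r in rows]

# 10x10 grid numbered 1..100, as in Figure Ans.2
grid = [[10 * r + c + 1 for c in range(10)] for r in range(10)]
small = shrink(grid, 4, 4)
for row in small:
    print(row)
```

Any rule that spreads the kept rows and columns evenly works; the visual difference between the variants is minor.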
2.14: Because the ideal edge passes through the middle of pixel y1.

2.15: The ideal edge (slanted at 60°) cuts y1 into two pieces whose ratio is 3/1 and y2 into two pieces whose ratio is 1/3. Thus, the following values are ideal: y1 = (3a + b)/2, y2 = (a + 3b)/2, and y3 = y4 = b.

2.16: Each row will have to be stretched by adding to it wt − L(r) pixels.

2.17: Direct computation produces
(−1, −1) → (−1, 0), (−1, 0) → (−0.8, 0.8), (−1, 1) → (0, 1), (0, −1) → (−0.8, −0.8), (0, 0) → (0, 0), (0, 1) → (0.8, 0.8), (1, −1) → (0, −1), (1, 0) → (0.8, −0.8), (1, 1) → (1, 0).
If we round 0.8 to 1, we get
(−1, −1) → (−1, 0), (−1, 0) → (−1, 1), (−1, 1) → (0, 1), (0, −1) → (−1, −1), (0, 0) → (0, 0), (0, 1) → (1, 1), (1, −1) → (0, −1), (1, 0) → (1, −1), (1, 1) → (1, 0).
A direct check verifies that in this case, every destination pixel is the mapping of some source pixel and no two source pixels map to the same destination.

2.18: The short Mathematica code below produces the inverse transformation

\mathbf{T}^{-1} = \begin{pmatrix}
\frac{\cos\alpha}{\cos^2\alpha+\sin^2\alpha} & \frac{-\sin\alpha}{\cos^2\alpha+\sin^2\alpha} & 0\\
\frac{\sin\alpha}{\cos^2\alpha+\sin^2\alpha} & \frac{\cos\alpha}{\cos^2\alpha+\sin^2\alpha} & 0\\
\frac{-x_0\cos\alpha+x_0\cos^2\alpha-y_0\sin\alpha+x_0\sin^2\alpha}{\cos^2\alpha+\sin^2\alpha} & \frac{-y_0\cos\alpha+y_0\cos^2\alpha+x_0\sin\alpha+y_0\sin^2\alpha}{\cos^2\alpha+\sin^2\alpha} & 1
\end{pmatrix},
where again we have to keep in mind that α depends on the distance d. Figure 2.50 is a simple example of a twirl.

T={{1,0,0},{0,1,0},{-x0,-y0,1}}.
  {{Cos[a],Sin[a],0},{-Sin[a],Cos[a],0},{0,0,1}}.
  {{1,0,0},{0,1,0},{x0,y0,1}}
Inverse[T]

2.19: The Cassinian oval is another anallagmatic curve. Recall that an ellipse is the locus of all the points the sum of whose distances from two foci is constant. The Cassinian oval is similarly defined as the locus of the points the product of whose distances from two foci is constant. See [xahlee 05] for a detailed discussion and figures.

2.20: Figure Ans.3 shows a circle C that does not pass through the origin. (Notice that the circle of inversion itself is not shown.) We construct the line L from O through the center of C and examine the intersection points P and Q. Their projections P∗ and Q∗ must be on L. We select an arbitrary point R on C and denote its projection R∗. From OP · OP∗ = OQ · OQ∗ = OR · OR∗, we get OR∗/OP = OP∗/OR, indicating that the two triangles ORP and OR∗P∗ are similar. This implies that angles OP∗R∗ and ORP are equal and also that angles OQ∗R∗ and ORQ are equal. We subtract angles and find that OP∗R∗ − OQ∗R∗ = ORP − ORQ = 90°, which implies that angle P∗R∗Q∗ = 90°. Since this is true for a general point R, we conclude that all the points R∗ (which together constitute the projection of C) are located on the circle C∗ centered on L with diameter P∗Q∗.

2.21: The rule P∗ = 1/P is generalized to P∗ = R²/P. This projection retains all the features of the original (unit circle) projection.

2.22: The two edges are considered a single edge.
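The inverse produced for answer 2.18 can be sanity-checked numerically. This Python sketch (not from the book) builds T as the translate–rotate–translate product of the Mathematica code, multiplies it by the inverse shown above, and should obtain the identity for any angle and center:

```python
import math

def mat_mul(A, B):
    """3x3 matrix product, row-vector convention as in the book."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def forward(a, x0, y0):
    c, s = math.cos(a), math.sin(a)
    T1 = [[1, 0, 0], [0, 1, 0], [-x0, -y0, 1]]   # translate center to origin
    R  = [[c, s, 0], [-s, c, 0], [0, 0, 1]]      # rotate by a
    T2 = [[1, 0, 0], [0, 1, 0], [x0, y0, 1]]     # translate back
    return mat_mul(mat_mul(T1, R), T2)

def inverse(a, x0, y0):
    c, s = math.cos(a), math.sin(a)
    d = c * c + s * s   # equals 1; kept to match the unsimplified form above
    return [[c / d, -s / d, 0],
            [s / d,  c / d, 0],
            [(-x0 * c + x0 * c * c - y0 * s + x0 * s * s) / d,
             (-y0 * c + y0 * c * c + x0 * s + y0 * s * s) / d, 1]]

P = mat_mul(forward(0.7, 2.0, 3.0), inverse(0.7, 2.0, 3.0))
print(P)  # approximately the 3x3 identity
```

In a real twirl, α varies with the distance d from the center, so the matrices are evaluated per pixel; the identity check above holds for each fixed α.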
Figure Ans.3: Circular Inversion of a Circle.
2.23: Assume that the algorithm tests a segment against the four rectangle edges in the order top, bottom, left, and right. Segment dh is first tested against the top edge and point f is located. Section df lies outside and is discarded. Section fh is tested against the bottom edge (no intersection), the left edge (no intersection), and the right edge (point g). Section gh lies outside the rectangle and is discarded, while section fg lies completely inside the rectangle. Notice that point e is never identified.
2.24: The first point was added when the edge that ends with A(a) was examined and clipped. The second point is added when the edge that starts with B(d) is examined.
2.25: It is easy to see that the result is polygon p3 p4 p5 p6 .
2.26: Once the program detects a click, it inputs the cursor coordinates and checks the corresponding location of the codemap. In our example, it finds serial number 2. The program examines location 2 of the geometric data structure, finds that graphics object with serial number 2 is a circle, and finds the pointer to its specific data. That data consists of the radius and the coordinates of the center point. The program then calculates all the pixels of the circle (using the same scan-converting method that was originally used to draw it) and highlights them. Certain points may be highlighted with a different color or made larger (Figure 2.70f), making them more obvious to the user. These are called anchors and can later be used by the user to drag or reshape the circle.
2.27: Another array, the size of the codemap, may be declared, whose elements are boolean. Alternatively, the codemap may be constructed as an array of structures, each consisting of a code field and a flag (boolean) field.
2.28: Direct calculations using matrix D44 produce the areas shown in Figure Ans.4. The three areas have black pixel percentages of 1/16, 2/16, and 16/16, respectively.
A = 0:        A = 1:        A = 15:
10001000...   10001000...   11111111...
00000000...   00000000...   11111111...
00000000...   00100010...   11111111...
00000000...   00000000...   11111111...

Figure Ans.4: Ordered Dither: Three Uniform Areas.
2.29: A direct application of Equation (2.9) yields
D88 =
 0 32  8 40  2 34 10 42
48 16 56 24 50 18 58 26
12 44  4 36 14 46  6 38
60 28 52 20 62 30 54 22
 3 35 11 43  1 33  9 41
51 19 59 27 49 17 57 25
15 47  7 39 13 45  5 37
63 31 55 23 61 29 53 21
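This matrix can be generated mechanically. The sketch below assumes that Equation (2.9) is the usual recursive doubling rule for dither matrices: each doubling replaces Dn by the block matrix [[4Dn, 4Dn+2], [4Dn+3, 4Dn+1]], starting from D2 = [[0, 2], [3, 1]]. Two doublings then reproduce D88 exactly.

```python
# Recursive construction of ordered-dither (Bayer) matrices.

def double(d):
    """One doubling step: Dn -> D2n via the block rule above."""
    n = len(d)
    out = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            v = 4 * d[i][j]
            out[i][j] = v                 # top-left block: 4Dn
            out[i][j + n] = v + 2         # top-right block: 4Dn + 2
            out[i + n][j] = v + 3         # bottom-left block: 4Dn + 3
            out[i + n][j + n] = v + 1     # bottom-right block: 4Dn + 1
    return out

D2 = [[0, 2], [3, 1]]
D8 = double(double(D2))
for row in D8:
    print(row)
```

Each doubling uses every value 0..4n²−1 exactly once, which is why the 8×8 matrix contains each of 0..63 once.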
2.30: A checkerboard pattern. This can be seen by manually simulating the algorithm of Figure 2.78b for a few pixels.

2.31: We assume that the test is: if p ≥ 0.5, then p := 1, else p := 0; add the error 0.5 − p to the next pixel q. The first pixel is thus set to 1 and the error of 0.5 − 1 = −0.5 is added to the second pixel, changing it from 0.5 to 0. The second pixel is set to 0 and the error, which is 0 − 0 = 0, is added to the third pixel, leaving it at 0.5. The third pixel is thus set to 1 and the error of 0.5 − 1 = −0.5 is added to the fourth pixel, changing it from 0.5 to 0. The results are
.5 .5 .5 .5 .5 → 1 0 .5 .5 .5 → 1 0 1 0 .5 → 1 0 1 0 1

2.32: Direct examination shows that the barons are 62 and 63 and the near-barons are 60 and 61.

2.33: A checkerboard pattern, similar to the one produced by diffusion dither. This can be seen by manually executing the algorithm of Figure 2.80 for a few pixels.

2.34: Classes 14, 15, and 10 are barons. Classes 12 and 13 are near-barons. The class numbers in positions (i, j) and (i, j + 2) add up to 15.

2.35: A program with bugs. Each time it is executed it may crash, but it may also run and produce wrong (and different) results. A case in point is an array overflow. If an array of length 10 is allocated in a program, and the program (because of a bug) tries to access location 11, the result will be unpredictable and may be different each time the program is run.
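The one-dimensional error diffusion traced in answer 2.31 can be sketched as:

```python
# 1-D error diffusion: threshold each pixel at 0.5 and push its
# quantization error (old value minus new value) to the right neighbor.

def diffuse(row):
    row = list(row)          # work on a copy
    out = []
    for i, p in enumerate(row):
        b = 1 if p >= 0.5 else 0
        out.append(b)
        if i + 1 < len(row):
            row[i + 1] += p - b   # propagate the error
    return out

print(diffuse([0.5] * 5))  # → [1, 0, 1, 0, 1]
```

On a uniform 0.5 row the errors alternate between −0.5 and 0, producing exactly the 1 0 1 0 1 checkerboard derived above.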
2.36: If the elements of the kernel total more than 1 or less than 1, the Gaussian-blurred image would be brighter or darker than the original image.

2.37: The row number is (i mod 11) and the column number is 10 − (i ÷ 11), where ÷ denotes integer-by-integer division (the quotient is truncated to the nearest integer).

2.38: The parameter pairs for pixel (9, 9) satisfy 9 = 9a + b or a = (9 − b)/9. The 11 pairs are therefore (1, 0), (8/9 ≈ 0.9, 1), (7/9 ≈ 0.8, 2), (6/9 ≈ 0.7, 3), . . . , (0, 9), and (−1/9 ≈ −0.1, 10).

3.1: It is the small step. If the line has a small slope (i.e., it is close to horizontal), small changes in x cause only small changes in y. If x = 4.32 corresponds to y = 6.15, then x = 4.33 may correspond to, say, y = 6.27. Both values are rounded to the pixel (4, 6). A good algorithm should create a new pixel in each iteration.

3.2: The slope a of the line equals Δy/Δx = (6 − 2)/(4 − 1) = 4/3. Since Δy > Δx, we set G = 1/a = 3/4 and H = 1. The loop iterates from L = 1 to L = max(Δx, Δy) + 1 = 5. The five pixels generated are

x:        1  1.75  2.5  3.25  4
round(x): 1  2     3    3     4
y:        2  3     4    5     6

The length of the line equals √(Δx² + Δy²) = √(3² + 4²) = 5. It is identical to the number of pixels.

3.3: For simple DDA, we get a slope of (5 − 1)/(5 − 1) = 1. The x coordinate is incremented from 1 to 5 in steps of 1. The y coordinate is incremented from the initial y value of 1 to the final value in steps of the slope, which is also 1. The points are thus (1, 1), (2, 2) through (5, 5). For the quadrantal DDA, we start with Δx = 4, Δy = 4, and Err = 0. Table Ans.5 summarizes the nine steps of the loop. (The two small bitmaps showing the resulting pixels for the simple and for the quadrantal DDA are omitted here.)

Step  Plot    Err > 0?  Update  New Err
1     (1, 1)  No        y ← 2   4
2     (1, 2)  Yes       x ← 2   4 − 4 = 0
3     (2, 2)  No        y ← 3   4
4     (2, 3)  Yes       x ← 3   4 − 4 = 0
5     (3, 3)  No        y ← 4   4
6     (3, 4)  Yes       x ← 4   4 − 4 = 0
7     (4, 4)  No        y ← 5   4
8     (4, 5)  Yes       x ← 5   4 − 4 = 0
9     (5, 5)  No        y ← 6   4

Table Ans.5: A Quadrantal DDA Example.
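The loop traced in Table Ans.5 can be sketched in Python (a sketch, not the book's code):

```python
# Quadrantal DDA: in each step plot the current pixel, then step in x
# when the accumulated error is positive and in y otherwise.

def quadrantal_dda(x, y, dx, dy, steps):
    pts, err = [], 0
    for _ in range(steps):
        pts.append((x, y))
        if err > 0:
            x += 1
            err -= dy
        else:
            y += 1
            err += dx
    return pts

print(quadrantal_dda(1, 1, 4, 4, 9))
```

For the 45° line of the example the error simply alternates between 0 and 4, producing the staircase of pixels in the table.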
3.4: We select an initial value Err = (Δy − Δx)/2 = 3.5 for a better-looking line. The nine steps of the algorithm are summarized in Table Ans.6.

Step  Plot     Err < 0?  Update         New Err
1     (1, −1)  No        y ← 0          3.5 − 1 = 2.5
2     (1, 0)   No        y ← 1          2.5 − 1 = 1.5
3     (1, 1)   No        y ← 2          1.5 − 1 = 0.5
4     (1, 2)   No        y ← 3          0.5 − 1 = −0.5
5     (1, 3)   Yes       y ← 4, x ← 2   −0.5 + 8 − 1 = 6.5
6     (2, 4)   No        y ← 5          6.5 − 1 = 5.5
7     (2, 5)   No        y ← 6          5.5 − 1 = 4.5
8     (2, 6)   No        y ← 7          4.5 − 1 = 3.5
9     (2, 7)

Table Ans.6: An Octantal DDA Example.
After reversing the y coordinates, we get the seven pixels (1, −1), (1, −2), (1, −3), (2, −4), (2, −5), (2, −6), and (2, −7).

3.5: It is true that such a line has an ideal shape, but it is dimmer than a horizontal or a vertical line. Compare the following two lines: The first goes from (1,1) to (7,1). It is horizontal, it consists of seven pixels, and its length is also 7. The second line goes from (1,1) to (7,7). It is slanted at 45°, is made of seven pixels, but its length is √2 × 7 ≈ 10. The two lines will not have the same brightness! To correct this, we can add three pixels to the second line, distributing them as evenly as possible. The two lines will now have the same brightness, but the 45° line will not look as precise as before. We therefore end up with a trade-off. The octantal DDA method produces 45° lines that look great but are dim compared to other lines. Other methods may produce 45° lines that are bright but don't look that great.

3.6: The details of the calculation and the pixels are shown in Figure Ans.7. Notice that the segment is symmetric about its center.

3.7: For the first line, the steps are listed in Table Ans.8a and the result is 01000100 or 00100010. For the second line, the steps are listed in Table Ans.8b and the final result is either 0100010010 or 0100100010.
x        1    2    3    4     5    6    7    8     9    10
y        1    1    1    2     2    2    2    3     3    3
D       −5   −1    3   −11   −7   −3    1   −13   −9   −5
Inc y?   n    n    y    n     n    n    y    n     n

Figure Ans.7: A Bresenham Line Segment from (1, 1) to (10, 3).
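The decision variable D of Figure Ans.7 follows the standard integer Bresenham recurrence; a minimal sketch (the function name is illustrative, not from the book):

```python
def bresenham(x0, y0, x1, y1):
    # first-octant case (0 <= dy <= dx); D starts at 2*dy - dx
    dx, dy = x1 - x0, y1 - y0
    d = 2 * dy - dx              # -5 for the segment (1,1)-(10,3)
    y = y0
    pixels = []
    for x in range(x0, x1 + 1):
        pixels.append((x, y))
        if d >= 0:               # "Inc y?" is yes
            y += 1
            d += 2 * (dy - dx)
        else:
            d += 2 * dy
    return pixels
```

The successive values of d are −5, −1, 3, −11, −7, −3, 1, −13, −9, −5, matching the D values of Figure Ans.7.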
(a)
x   y   str1   str2
8   2   0      1
6   2   0      1
4   2   0      10
2   2   0      010

(b)
x    y   str1      str2
10   3   0         1
7    3   0         1
4    3   0         10
1    3   0         010
1    2   0010      010
1    1   0100010   010

Table Ans.8: Examples of Best-Fit DDA.
3.8: The explicit equation of a straight line is y = ax + b = (Δy/Δx)x + b. If point (x₀, y₀) lies on the line, then y₀ = (Δy/Δx)x₀ + b. This yields x₀Δy − y₀Δx = bΔx, and bΔx does not depend on x₀ or y₀.

3.9: The reason for selecting n = 18 is that 90/5 = 18. Figure Ans.9 shows the 18 points. The coordinates of the points are also shown, as well as the Mathematica code that did the calculations.

3.10: Naturally, either point can be selected, but it is interesting to note that in principle such a case cannot happen. It can occur only in algorithms that employ floating-point numbers and only because of the limited precision of machine arithmetic. Figure Ans.10 shows why a tie is impossible. Suppose that the circle crosses exactly between Si and Pi, at the point marked by the square. The coordinates of that point are x and y + 1/2, so the square of its distance from the origin is x² + (y + 1/2)². However, this point is on the circle, so its distance from the origin is also r. We therefore end up with r² = x² + y² + y + 1/4, which is a contradiction because r is an integer, while the right-hand side of this equation is a non-integer.

Figure Ans.10: A Tie.
Figure Ans.9: Circle in Polar Coordinates (Δθ = 5◦ ).
Clear[L]; n=18; delta=5 Degree; R=1;
xk=R; yk=0; dcos=Cos[delta]//N; dsin=Sin[delta]//N; L={};
Do[xn=xk dcos-yk dsin; yn=xk dsin+yk dcos;
  xk=xn; yk=yn; L=Append[L,{xn,yn}], {k,0,n-1}];
L
ListPlot[L, Prolog->AbsolutePointSize[3], AspectRatio->Automatic]

Mathematica Code for Figure Ans.9
0.996, 0.087   0.985, 0.174   0.966, 0.259   0.940, 0.342   0.906, 0.423   0.866, 0.500
0.819, 0.574   0.766, 0.643   0.707, 0.707   0.643, 0.766   0.574, 0.819   0.500, 0.866
0.423, 0.906   0.342, 0.940   0.259, 0.966   0.174, 0.985   0.087, 0.996   0.000, 1.000

Coordinates of the 18 Points of Figure Ans.9
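The Mathematica loop above can be mirrored in Python; each point is obtained by rotating the previous one through Δθ = 5° (a minimal sketch; the function name is illustrative):

```python
import math

def circle_points(n=18, delta_deg=5.0, r=1.0):
    # incremental rotation: rotate the previous point by delta each step
    dcos = math.cos(math.radians(delta_deg))
    dsin = math.sin(math.radians(delta_deg))
    xk, yk = r, 0.0
    pts = []
    for _ in range(n):
        xk, yk = xk * dcos - yk * dsin, xk * dsin + yk * dcos
        pts.append((xk, yk))
    return pts
```

The first point comes out as (0.996, 0.087) and the last as (0.000, 1.000), in agreement with the table of coordinates.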
3.11: Consider point T in Figure 3.22a. Its coordinates are (0, b), so its distance d/2 from any of the foci satisfies (d/2)² = b² + c². Now consider point L. Its distances from the two foci are a − c and a + c, so it tells us that (a − c) + (a + c) = d, or a = d/2. The result is a² = (d/2)² = b² + c².

Curves. Head down against the wind, surf pounding to my right, I notice the pattern the sand makes as it blows along the beach, filling in footprints, covering chevron streaks left by the falling tide. The sand moves like smoke from a chimney, or water weed in a smoothly-flowing stream, or the curve—I forget its name—drawn by tying a pencil to a thread unwinding from a spool. There are connections here. My mind struggles clumsily, glimpsing an elegance I long to comprehend.
—Maureen Eppstein: Poems. First published in Quantum Tao (1996).
3.12: Consider point L of Figure 3.22b. Both the maximum value of d′ and the minimum value of d occur at this point. We can thus write dmax = dmin + F. In general, the oval satisfies d + 2d′ = S. From these two relations, we get S = d + 2d′ = dmin + 2dmax = 3dmin + 2F, or dmin = (S − 2F)/3. Similarly, both the minimum value of d′ and the maximum value of d occur at point R of Figure 3.22b, which enables us to write S = d + 2d′ = dmax + 2dmin = 3dmax − 2F, or dmax = (S + 2F)/3.

3.13: A trapezoid (a four-sided figure with one pair of parallel sides). The rectangle and the square are special cases of trapezoids.

3.14: Edge 7–8 does not contribute endpoints to T. In addition, the bottom endpoints of edges 6–7 and 8–9 are deleted, so there are no points in T corresponding to vertices 7 and 8, which is why edge 7–8 remains unfilled.

3.15: Edges AB and BC will be deleted at y = 6.
3.16: Determine the corner points of the rectangle that bounds the polygon, fill the rectangle with the pattern assuming an anchor at the bottom-left corner of the rectangle, and then fill the polygon from the rectangle, as illustrated by Figure Ans.11.
Figure Ans.11: Patterning from a Bounding Box.
3.17: No. Table Ans.5 shows an example where applying quadrantal DDA from (1, 1) to (5, 5) produces nine pixels, the last two of which are (4, 5) and (5, 5). However, when applying the same algorithm from (5, 5) to (1, 1), the first two pixels are (5, 5) and (5, 4).

3.18: No. Table 3.2 shows an example where a pixel is drawn twice.

3.19: The simplest way is to first calculate all the pixels in octant 2 and store them in a table, then use them to calculate and draw the pixels of octant 1, octant 2, and so on up to octant 8. This is a good method because the table size is reasonably small even for large circles.

4.1: Map projections. Projecting a sphere on a flat surface always results in deformations, but such projections are important in cartography.

4.2: The one-dimensional space is a straight line. The only graphics elements on a line are points and line segments. They can be moved about on the line (translated), and segments can also be scaled. A rotation takes a line segment outside of the line, so rotation is not a one-dimensional transformation. A reflection is identical to moving a segment or a point, so it cannot be considered an independent transformation. Similarly, shearing a line segment either changes its length (which makes it identical to scaling) or takes it outside the line. Thus, the only basic independent transformations in one dimension are translation and scaling. The latter applies only to line segments.

4.3: Function f₁ is not onto, since point (−1, 0) is not the mapping of any real point. This function is also not one-to-one, since the two different points (a, b) and (−a, b) map to (a², b). Function f₂, however, is a valid geometric transformation.

4.4: No. It is easy to come up with examples of two transformations f and g such that f ◦ g ≠ g ◦ f. One example is a 90° counterclockwise rotation about the origin and a reflection about the x axis.
When the point (1, 0) is first rotated 90◦ about the origin and then reflected about the x axis, it is first moved to (0, 1) and then ends up at (0, −1). If the same point is first reflected and then rotated, it first moves to itself and then to (0, 1).
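The example in the answer to exercise 4.4 is easy to check numerically (a minimal sketch; the function names are not from the book):

```python
def rot90(p):
    # 90° counterclockwise rotation about the origin: (x, y) -> (-y, x)
    x, y = p
    return (-y, x)

def refl_x(p):
    # reflection about the x axis: (x, y) -> (x, -y)
    x, y = p
    return (x, -y)

# rotate-then-reflect and reflect-then-rotate give different results
print(refl_x(rot90((1, 0))), rot90(refl_x((1, 0))))   # prints (0, -1) (0, 1)
```

The two orders disagree, confirming that composition of transformations is not commutative.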
4.5: This is a direct application of Equation (4.3). The result is

A(b₁₁x* + b₁₂y*)² + B(b₁₁x* + b₁₂y*)(b₂₁x* + b₂₂y*) + C(b₂₁x* + b₂₂y*)² + D(b₁₁x* + b₁₂y*) + E(b₂₁x* + b₂₂y*) + F = 0,

which is a degree-2 curve.

4.6: Each symbol consists of one of the digits 1, 2, and 3, attached to its mirror image. Thus, the next symbol in the sequence is the digit 4 attached to its mirror image.

4.7: A point (x, y) on a circle with radius R satisfies x² + y² = R², or (x/R)² + (y/R)² = 1. The transformed point (x*, y*) on an ellipse should satisfy (x/a)² + (y/b)² = 1. It is easy to guess that the transformation rule is x* = ax/R, y* = by/R, but this can also be proved as follows. The general scaling transformation is x* = k₁x, y* = k₂y. For the transformed point to be on an ellipse, it should satisfy (k₁x/a)² + (k₂y/b)² = 1, which can be simplified to k₁²b²x² + k₂²a²y² = a²b². Substituting y² = R² − x² yields (k₁²b² − k₂²a²)x² = a²b² − k₂²a²R². This equation must hold for every value of x, which is possible only if k₁²b² − k₂²a² = 0 and a²b² − k₂²a²R² = 0. Solving these equations yields k₁ = a/R and k₂ = b/R.

4.8: The transformation can be written (x, y) → (x, −x + y), so (1, 0) → (1, −1), (3, 0) → (3, −3), (1, 1) → (1, 0), and (3, 1) → (3, −2). The original rectangle is therefore transformed into a parallelogram.

4.9: From cos 45° = 0.7071 and tan 45° = 1, we get the 45° rotation matrix as the product

\[
\begin{pmatrix} 0.7071 & 0 \\ 0 & 0.7071 \end{pmatrix}
\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}.
\]
Figure Ans.12: A 45° Rotation as Scaling and Shearing.
Figure Ans.12 shows how a 2 × 2 square centered on the origin (Figure Ans.12a) is first shrunk to about 70% of its original size (Figure Ans.12b), then sheared by the second matrix according to (x*, y*) = (x + y, −x + y), and then becomes the rotated diamond shape of Figure Ans.12c. Direct calculations show that the two original corners (−1, 1) and (1, 1) are transformed to (0, 1.4142) and (1.4142, 0), respectively.

4.10: Figure 4.5 gives the polar coordinates P = (r, α) and P* = (r, φ) = (r, α − θ). There is no 2×2 matrix for this rotation because our transformation matrices are linear, while the transformation between polar and Cartesian coordinates is nonlinear. This is true because, for example, (αx, αy) = (αr, θ). Multiplying both the x and y components by a constant multiplies r by that constant but does not change θ. We can artificially construct a matrix T = \(\begin{pmatrix} a & b \\ c & d \end{pmatrix}\) such that P* will equal the product PT. It does not take much to figure out that what we are looking for is

\[
T = \begin{pmatrix} 1 & -\theta/r \\ 0 & 1 \end{pmatrix}.
\]
However, this matrix, which should be independent of the coordinates of the rotated point, depends on r. Also, it is not orthonormal (and not even orthogonal). 4.11: A reflection about the x axis transforms a point (x, y) to a point (x, −y). A reflection about y = −x similarly transforms a point (x, y) to a point (−y, −x). (This is matrix T3 of Equation (4.5).) Thus, the combination of these two transformations transforms (x, y) to (y, −x), which is another form of the negate and exchange rule, corresponding to a 90◦ clockwise rotation about the origin. This rotation can also be expressed by the matrix (compare with Equation (4.6))
\[
\begin{pmatrix} \cos 90^\circ & -\sin 90^\circ \\ \sin 90^\circ & \cos 90^\circ \end{pmatrix}
= \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.
\]
4.12: The determinant of this matrix equals
1 − t2 1 + t2
2 −
−4t2 (1 − t2 )2 + 4t2 = = +1, (1 + t2 )2 (1 + t2 )2
which shows that it generates pure rotation. Also, if we denote this matrix by
a11 a21
a12 a22
,
it is easy to see that a11 = a22 , a12 = −a21 , a211 + a212 = 1, and a221 + a222 = 1. These properties are all satisfied by a rotation matrix.
Answers to Exercises
1355
4.13: The determinant of this matrix is
a 2 b a2 + b2 b − = − . A A A A2 √ It equals 1 for A = ± a2 + b2 but cannot equal −1 since it is the ratio of the two nonnegative numbers a2 + b2 and A2 . We consequently conclude that this matrix can represent pure rotation but never pure reflection. An example of pure rotation is a = √ b = 1, which produces A = ± 2 ≈ ±1.414. The rotation matrices for this case are √ √
1/ √2 1/√2 0.7071 0.7071 , = −0.7071 0.7071 −1/ 2 1/ 2 √ √
−1/√ 2 −1/√2 −0.7071 −0.7071 , = 0.7071 −0.7071 1/ 2 −1/ 2 and they correspond to 45◦ rotations about the origin. 4.14: The combined transformation matrix is the product ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ cos 180◦ − sin 180◦ 0 1 0 0 1 0 0 −1 0 0 ⎝ 0 −1 0 ⎠ ⎝ 0 1 0 ⎠ ⎝ sin 180◦ cos 180◦ 0 ⎠ = ⎝ 0 1 0 ⎠ . −1 −1 1 0 0 1 1 1 1 0 0 1 This matrix combines a reflection of the x coordinates with a one-unit translation in the x and y directions. Applying it to the four points yields (0, 2), (0, 0), (2, 2), and (2, 0). This is the same square but is now located in the first quadrant (Figure Ans.13).
translate x and y
reflect x
(a)
(b)
(c)
Figure Ans.13: An x Reflection and Translation.
4.15: Using angles φ and θ from Figure 4.5 but assuming that the rotation is counterclockwise about (x0 , y0 ), we get x∗ = x0 + (x − x0 ) cos θ − (y − y0 ) sin θ, y ∗ = y0 + (x − x0 ) sin θ + (y − y0 ) cos θ.
Answers to Exercises
1356
We are looking for a matrix T that satisfies ⎞ a b 0 (x∗ , y ∗ , 1) = (x, y, 1) ⎝ c d 0 ⎠ . m n 1 ⎛
The simple solution is ⎛
cos θ T=⎝ − sin θ x0 (1 − cos θ) + y0 sin θ
sin θ cos θ y0 (1 − cos θ) − x0 sin θ
⎞ 0 0⎠. 1
In a similar way, it can be shown that a clockwise rotation about (x0 , y0 ) is produced by ⎞ ⎛ cos θ − sin θ 0 ⎝ sin θ cos θ 0⎠. T= x0 (1 − cos θ) − y0 sin θ y0 (1 − cos θ) + x0 sin θ 1 4.16: If a point P = (x, y, 1) is reflected to a point P∗ = (x∗ , y ∗ , 1) = (y − 1, x + 1, 1) about the line y = x+1, then their midpoint (which is (P+P∗ )/2 = (x+y−1, y+x+1)/2) should be on the line. It’s easy to see that the midpoint is on the line because its y coordinate equals 1 more than its x coordinate. 4.17: This is easily done with the help of result is ⎛ 0.5 ⎝ 0.866 −0.866
appropriate mathematical software, and the ⎞ 0.866 0 −0.5 0 ⎠ . 1.5 1
4.18: Such a thing is possible but would not improve the algorithm. Transforming a point from octant 1 to octant 2 is done by reflecting it about the 45◦ line y = x. A point (x, y) is therefore transformed to the point (y, x). The similar transformation between half-octants amounts to reflection about the 22.5◦ line y = ax (where a = tan 22.5◦ ≈ 0.414). This transforms point (x, y) to (0.7071x+0.7071y, 0.7071x−0.7071y) (see the following proof) and would slow down the algorithm since it involves real-number arithmetic. Proof. Let’s denote α = sin 22.5◦ , β = cos 22.5◦ . To reflect about the 22.5◦ line, we rotate clockwise by 22.5◦ , reflect about the x axis, and rotate back. The combined transformation matrix is ⎞ ⎞⎛ ⎞⎛ ⎛ β α 0 1 0 0 β −α 0 ⎝ α β 0 ⎠ ⎝ 0 −1 0 ⎠ ⎝ −α β 0 ⎠ 0 0 1 0 0 1 0 0 1 ⎛ 2 ⎞ ⎞ ⎛ 2 2αβ 0 β −α .7071 .7071 0 = ⎝ 2αβ α2 − β 2 0 ⎠ ≈ ⎝ .7071 −.7071 0 ⎠ . 0 0 1 0 0 1
Answers to Exercises
1357
The last equality is true because 0.7071 ≈ sin 45◦ = sin 22.5◦ cos 22.5◦ + cos 22.5◦ sin 22.5◦ = 2αβ, 0.7071 ≈ cos 45◦ = cos 22.5◦ cos 22.5◦ − sin 22.5◦ sin 22.5◦ = β 2 − α2 . 4.19: In order for the general line ax + by + c = 0 to pass through the origin, it must satisfy c = 0. This implies y = −(a/b)x, so −a/b is the slope (i.e., tan θ) and a and b equal sin θ and cos θ, respectively, up to a sign. This also implies a2 + b2 = 1 and ab = sin θ cos θ. When this is substituted in Equation (4.12), it reduces to x∗ = x − 2a(ax + by) = x(1 − 2a2 ) − 2aby = x cos(2θ) + y sin(2θ),
(Ans.1)
y ∗ = y − 2b(ax + by) = −2abx + y(1 − 2b2 ) = x sin(2θ) − y cos(2θ).
4.20: Reflecting a point (x, y) about the line y = c moves it to (x, 2c − y). Reflecting this about line y = 0 simply reverses the y coordinate. Thus, the two reflections move (x, y) to (x, y − 2c). This is a translation of −2c units in the y direction. 4.21: Starting with sin 90◦ ⎛ 0 1 ⎝2 0 0 0
= 1, cos 90◦ = 0, we multiply the matrices to obtain ⎞⎛ ⎞ ⎛ ⎞ 0 0 −1 0 1 0 0 0 ⎠ ⎝ 1 0 0 ⎠ = ⎝ 0 −2 0 ⎠ , 1 0 0 1 0 0 1
which is a reflection and scaling in the y dimension. 4.22: Direct multiplication yields ⎛ cos θ1 cos θ2 − sin θ1 sin θ2 − cos θ1 sin θ2 − cos θ2 sin θ1 ⎝ sin θ1 cos θ2 + cos θ1 sin θ2 − sin θ1 sin θ2 + cos θ1 cos θ2 0 0 ⎛ ⎞ cos(θ1 + θ2 ) − sin(θ1 + θ2 ) 0 = ⎝ sin(θ1 + θ2 ) cos(θ1 + θ2 ) 0 ⎠ , 0 0 1
⎞ 0 0⎠ 1
thereby proving that two-dimensional rotations are additive. 4.23: Direct multiplication yields ⎛ T1 T2 = ⎝
⎞ 1 + bc b 0 c 1 0⎠. 0 0 1
This is a combination of shearing and scaling in the x direction. It is pure shearing only if bc = 0. This shows that shearing is not an additive transformation.
Answers to Exercises
1358
4.24: The product of the three shears is
1 a 0 1
1 0 b 1
1 c 0 1
=
ab + 1 a + abc + c b bc + 1
.
When we equate this to the standard rotation matrix
cos θ sin θ
− sin θ cos θ
,
we end up with a=c=
cos θ − 1 θ = − tan , sin θ 2
and b = sin θ,
which shows how to calculate a, b, and c from θ. Notice that both (cos θ − 1) and sin θ approach zero for small angles. The ratio of two small numbers is hard to calculate with any precision, which is why it is preferable to use tan(θ/2) instead. This particular combination of transformations does not save any time because we still have to calculate sin θ and cos θ in order to obtain a, b, and c. Still, it is an interesting, unexpected result that’s illustrated in Figure Ans.14 for θ = 45◦ .
(a)
(b)
(c)
(d)
Figure Ans.14: A
45◦
Rotation as Three Successive Shearings.
Answers to Exercises 4.25: The transformation matrices are ⎞⎛ ⎞⎛ ⎛ cos θ sin θ a 0 0 cos θ − sin θ 0 ⎝ sin θ cos θ 0 ⎠ ⎝ 0 d 0 ⎠ ⎝ − sin θ cos θ 0 0 0 0 1 0 0 1 ⎛ ⎞ 2 2 a cos θ + d sin θ (d − a) cos θ sin θ 0 = ⎝ (d − a) cos θ sin θ a sin2 θ + d cos2 θ 0 ⎠ . 0 0 1 When a = d, this reduces to
1359
⎞ 0 0⎠ 1
⎛
⎞ a 0 0 ⎝0 a 0⎠, 0 0 1
which does not depend on θ! This proves that uniform scaling produces identical results regardless of the particular axes used. 4.26: We simply multiply ⎞⎛ 1 cos θ − sin θ 0 ⎝ sin θ cos θ 0 ⎠ ⎝ c 0 0 0 1 ⎛ 2 cos θ − c sin θ cos θ− ⎜ −b sin θ cos θ + sin2 θ ⎜ 2 =⎜ ⎜ sin θ cos θ + c cos θ− ⎝ −b sin2 θ − sin θ cos θ 0 ⎛ 1 − (b + c) sin θ cos θ = ⎝ c cos2 θ − b sin2 θ 0 ⎛
⎞ ⎞⎛ cos θ sin θ 0 b 0 1 0 ⎠ ⎝ − sin θ cos θ 0 ⎠ 0 0 1 0 1 ⎞ sin θ cos θ − c sin2 θ+ 0⎟ +b cos2 θ − sin θ cos θ ⎟ 2 ⎟ sin θ + c sin θ cos θ+ ⎟ 0 2 ⎠ +b sin θ cos θ + cos θ 0 1 ⎞ b cos2 θ − c sin2 θ 0 1 + (b + c) sin θ cos θ 0 ⎠ . 0 1
This expression does depend on θ! When b = c = 0, the expression reduces to the identity matrix. However, when b = c = 0, this does not reduce to anything simple or elegant. 4.27: A direct scaling of point P = (x, y) relative to (x0 , y0 ) is done by x∗ = x0 + (x − x0 )sx = x · sx + x0 (1 − sx ), y ∗ = y0 + (y − y0 )sy = y · sy + y0 (1 − sy ). Using matrix notation, this is written as ⎞ 0 0 sx 0 sy 0⎠. (x∗ , y ∗ , 1) = (x, y, 1) ⎝ x0 (1 − sx ) y0 (1 − sy ) 1 ⎛
(Ans.2)
Answers to Exercises
1360
Performing the same transformation by means of translation, scaling, and reverse translation is done by the matrix product ⎛
1 ⎝ 0 −x0
0 1 −y0
⎞⎛ sx 0 0⎠⎝ 0 1 0
0 sy 0
⎞⎛ 0 1 0⎠⎝ 0 x0 1
0 1 y0
⎞ 0 0⎠, 1
which produces the same result. 4.28: Substituting k1 = k2 = k in Equation (4.16) yields ⎛
k2 ⎝ 0 k(1 − k)x1 + (1 − k)x2
0 k2 k(1 − k)y1 + (1 − k)y2
⎞ 0 0⎠. 1
This is equivalent to a single scaling by a factor k2 about point Pc =
k(1 − k) 1−k k 1 P1 + P2 = P1 + P2 . 1 − k2 1 − k2 1+k 1+k
4.29: Using homogeneous coordinates, we transform ⎛
⎞ −1 0 1 ⎝ (t , t, 1) 0 2 0 ⎠ = (1 − t2 , 2t, 1 + t2 ), 1 0 1 2
which, after dividing by the third component, becomes the point
1 − t2 2t , 1 + t2 1 + t2
.
This point satisfies the relation x2 + y 2 = 1, so it is located on the unit circle. 4.30: The Mathematica code t14=2^14; Print["(x*=",(8192-(2 14189.))/t14,",y*=",(14189.+(2 8192))/t14,")"] Print["(x*=",Cos[60 Degree]-2. Sin[60 Degree], ",y*=",Sin[60 Degree]+2. Cos[60 Degree], ")"] calculates the rotated point twice, first using integers and then using Mathematica’s built-in sine and cosine functions. The results are identical: (x∗ = −1.23206, y ∗ = 1.86603). For an 80◦ rotation, the code t14=2^14; Print["(x*=",(2845.-(2 16135.))/t14,",y*=",(16135.+(2 2845.))/t14,
Answers to Exercises
1361
")"] Print["(x*=",Cos[80 Degree]-2. Sin[80 Degree], ",y*=",Sin[80 Degree]+2. Cos[80 Degree], ")"] produces (x∗ = −1.79596, y∗ = 1.33209) and (x∗ = −1.79597, y ∗ = 1.3321) (a slightly different result). 4.31: From the definition of θi , we know that the ratio tan θi+1 / tan θi is 1/2. Small angles satisfy tan θ ≈ θ, so we conclude that the ratio θi+1 /θi equals approximately 1/2, except for the first few θi ’s. This can also be confirmed by manually checking the ratios from Table 4.13. Given an infinite sequence of numbers t, t/2, t/4,. . . , t/2i , we can express every number from 0 (which is obtained by subtracting all the numbers in the sequence from the first one) to 2t (which is obtained by adding all the numbers in the sequence). Our sequence of θi is finite and the ratio of consecutive elements isn’t always precisely 1/2, but [Walther 71] proves that every number in the range [0, 90◦ ) can be reached, up to a certain precision, by adding and subtracting a number of consecutive θi ’s. 4.32: The method proposed here is based on the fact that the magnitude of the rotated vector (x∗ , y ∗ ) should be identical to that of the original vector (x, y). This can be achieved by first normalizing (x∗ , y ∗ ) and then multiplying it by the magnitude of (x, y), x2 + y 2 x2 + y 2 (x∗ , y ∗ ) ← (x∗ , y ∗ ) = (x∗ , y ∗ ) , x∗2 + y ∗2 x∗2 + y ∗2 a calculation involving four exponentiations, one division, one multiplication, and one square root. 4.33: The traditional way of calculating a sine function is by its power series sin(θ) =
θ θ3 θ5 θ7 − + − + ···, 1! 3! 5! 7!
and similarly for cosine. These series, however, converge very slowly, requiring many multiplications and divisions. If a graphics application needs just rotations, the method of Section 4.2.3 may be simpler and faster than CORDIC. The advantage of CORDIC is that it can be adapted to the calculation of many different functions. A general software package that is concerned not just with rotations may benefit from the application of CORDIC. √ 4.34: From the definition k = a2 + c2 , it follows that k = 0 implies a = c = 0. In this case, the similarity becomes x∗ = m, y ∗ = n, and this is not a transformation because it is not one-to-one.
Answers to Exercises
1362
4.35: Transforming point (x − 2Px + 2Qx , y − 2Py + 2Qy ) through another halfturn yields ⎞ ⎛ −1 0 0 −1 0 ⎠ (x − 2Px + 2Qx , y − 2Py + 2Qy , 1) ⎝ 0 2Rx 2Ry 1 = (−x + 2Px − 2Qx + 2Rx , −y + 2Py − 2Qy + 2Ry , 1). Comparing this with Equation (4.19) shows that the result of three halfturns is a halfturn about the point S = P − Q + R. Writing this as S − P = R − Q shows that PQRS is a parallelogram (Figure 4.16c). Thus, point S completes the original three points to a parallelogram. 4.36: The first part results in ⎛
⎞ 3 4 0 5 0⎠. (x∗ , y ∗ ) = (x, y) ⎝ −2 1 −6 1 The decomposition is simple because A =
√ 9 + 16 = 5:
⎛
⎞⎛ ⎞⎛ ⎞ ⎞⎛ 1 0 0 5 0 0 3/5 4/5 0 1 0 0 ⎝ 14/25 1 0 ⎠ ⎝ 0 23/5 0 ⎠ ⎝ −4/5 3/5 0 ⎠ ⎝ 0 1 0 ⎠ . 0 0 1 0 0 1 1 −6 1 0 0 1
4.37: From Equation (4.22), we get the following. 1. For scaling, the inverse of ⎛
⎞ a 0 0 T = ⎝0 d 0⎠ 0 0 1 so (x, y)T−1 =
is T
−1
x y 1 , , a d ad
⎛ ⎞ d 0 0 1 ⎝ 0 a 0⎠, = ad 0 0 1
→ (x∗ , y ∗ ) = (dx, ay),
which is also scaling by factors d and a. 2. For shearing, the inverse of ⎛
⎞ 1 b 0 T = ⎝c 1 0⎠ 0 0 1 so (x, y, 1)T−1 =
−1
is T
x − yc −xb + y 1 , , −bc −bc −bc
⎛ ⎞ 1 −b 0 1 ⎝ = −c 1 0 ⎠ , −bc 0 0 1
which is a combination of shearing and scaling.
→ (x∗ , y ∗ ) = (x − yc, −xb + y),
Answers to Exercises
1363
3. For rotation, the inverse of ⎛
− sin θ cos θ 0
cos θ T = ⎝ sin θ 0 is T−1
⎛ cos θ 1 ⎝ − sin θ = cos2 θ + sin2 θ 0
⎞ 0 0⎠ 1
⎞ ⎛ 0 cos θ 0 ⎠ = ⎝ − sin θ 1 0
sin θ cos θ 0
sin θ cos θ 0
⎞ 0 0⎠. 1
This is a rotation in the opposite direction. 4. For translation, the inverse of ⎛
⎞ 1 0 0 ⎝ 0 1 0⎠ m n 1
⎛ is
⎞ 1 0 0 ⎝ 0 1 0⎠. −m −n 1
This is a reverse of the original translation. 4.38: We denote the transformation matrix by
Pi
a b c d
= P∗i
a b cd
and write the four equations
for 1 ≤ i ≤ 4.
These are easy to solve and yield a = 6, b = 1, c = 2, and d = 3. 4.39: The plane should pass through the three points (0, 0, 0), (0, 0, 1), and (1, 1, 0). Equation (4.24) gives 0 A = 0 1 0 C = 0 1
0 1 1 1 = −1, 0 1 0 1 0 1 = 0, 1 1
0 B = − 0 1 0 D = − 0 1
0 1 1 1 = 1, 0 1 0 0 0 1 = 0. 1 0
The expression of the plane is therefore −x + y = 0. 4.40: They are the points where the plane x/a + y/b + z/c = 1 intercepts the three coordinate axes. 4.41: s = N • P1 = (1, 1, 1) • (1, 1, 1) = 3, so the plane is given by x + y + z − 3 = 0. It intercepts the three coordinate axes at points (3, 0, 0), (0, 3, 0), and (0, 0, 3) (Figure 4.22a).
Answers to Exercises
1364 4.42: The expression is
P(u, w) = P1 + u(P2 − P1 ) + w(P3 − P1 ) = (3, 0, 0) + u(−3, 3, 0) + w(−3, 0, 3).
4.43: This is trivial. The origin is point (0, 0, 0), and Equation (4.27) shows that the distance between it and the plane Ax + By + Cz + D = 0 is D √ . A2 + B 2 + C 2
4.44: Because d is the signed distance. If the normal points from the plane in the direction of P, then d is positive, but we have to travel in the direction of −N. If the normal points in a direction opposite that of P, then we travel from P to P∗ in the direction of N but d is negative. 4.45: The product Tr Rx Trr yields ⎛
1 0 cos θ ⎜0 ⎝ 0 − sin θ 0 m(cos θ − 1) − n sin θ
0 sin θ cos θ n(cos θ − 1) + m sin θ
⎞ 0 0⎟ ⎠. 0 1
Substituting θ = 30◦ produces the matrix ⎛
1 0 0 0.5 ⎜ 0 0.866 ⎝ 0 −0.5 0.866 0 0.634 −0.366
⎞ 0 0⎟ ⎠, 0 1
which transforms point (1, 2, 3, 1) to (1, 0.866, 3.232, 1). 4.46: Using the rule for quaternion multiplication and the three trigonometric identities cos θ = cos2
θ 2
− sin2 θ2 ,
sin θ = 2 sin θ2 cos θ2 ,
and
cos θ = 1 − 2 sin2 θ2 ,
we can write q · [0, P] · q−1 = cos θ2 , u sin θ2 · [0, P] · cos θ2 , −u sin θ2 = cos θ2 , u sin θ2 · [0, P] · cos θ2 , −u sin θ2 = − sin θ2 (u • P), cos θ2 P + sin θ2 (u × P) · cos θ2 , −u sin θ2
= [− sin θ2 cos θ2 (u • P) + sin θ2 cos θ2 (P • u) − sin2 θ2 (u × P) • u, sin2 2θ (u • P)u + cos2 θ2 P + sin θ2 cos θ2 (u × P)
Answers to Exercises
1365
− sin θ2 cos θ2 (P × u) − sin2 θ2 (u × P) × u] = 0, sin2 θ2 (u • P)u + cos2 θ2 P + 2 sin θ2 cos θ2 (u × P) − sin2 θ2 (P − (u • P)u) = 0, 2 sin2 θ2 (u • P)u + (cos2 θ2 − sin2 θ2 )P + 2 sin θ2 cos θ2 (u × P) = [0, (1 − cos θ)(u • P)u + cos θP + sin θ(u × P)] = [0, (u • P)u + cos θ(P − (u • P)u) + sin θ(u × P)], that is Equation (4.31). 5.1: They could be (a) a cube, (b) the same cube seen edge on, and (c) the same cube seen rotated through 30◦ with one front edge and one back edge. 5.2: Given sz = 0.625, we calculate θ and φ
0.625 θ = sin−1 ± √ = sin−1 (±0.44194) = ±26.23◦ , 2
0.625 = sin−1 (±0.49266) = ±29.52◦ . φ = sin−1 ± √ 2 − 0.6252 5.3: Equation (5.4) shows that s2x = s2z is equivalent to cos2 φ + sin2 φ sin2 θ = sin2 φ + cos2 φ sin2 θ. This can be simplified to (sin2 φ − cos2 φ) cos2 θ = 0, with the two solutions cos2 θ = 0 → θ = ±90◦ and sin2 φ − cos2 φ = 0, which implies sin φ = ± cos φ and results in φ = 90◦ ± 45◦ and 270◦ ± 45◦ . 6.1: Such examples abound, mostly in modern art, which is one reason why many consider modern art trivial or false. Figure Ans.15, The Old Testament Trinity, c. 1410s, by the Russian painter Andrei Rublev is an example of reversed perspective. A well-known example of diverging lines is Woman in Mirror (1937) by Picasso. 6.2: Spiralstaircase. 6.3: A rolodex [eldonoffice 05] features many vanishing points because each of its index cards is oriented differently, causing its sides to seem to converge to a different point. A striped shirt may feature several vanishing points because the groups of parallel stripes on a sleeve, on the shirt itself, and on the flat parts of the collar may point in different directions. Long, meandering railway tracks may feature straight segments that go in different directions and create different vanishing points. Many scenes feature multiple vanishing points, as illustrated by the flat rectangles of Figure Ans.16. The well-known drawing High and Low by Escher [Ernst 76] features five vanishing points, four near the four corners of the figure and the fifth one at the center.
1366
Answers to Exercises
Figure Ans.15: Andrei Rublev, The Old Testament Trinity, c. 1410s.
Figure Ans.16: Many Vanishing Points.
6.4: Yes, by viewing it through a telescope. This device “telescopes” a scene and brings objects closer to the observer rather than magnifying them, but it does not affect the perspective. See Section 7.12 for the telescopic projection. 6.5: Figure Ans.17 illustrates the construction. First, the blue lines a and b are constructed, followed by the two lines labeled c. This is followed by the eight green lines, four of which are equally spaced on the left-hand side of b and the other four equally spaced on the right-hand side of b. The last step is to construct the eight red vertical line segments. We shall therefore borrow all our rules for the finishing of our proportions, from the musicians, who are the greatest masters of this sort of numbers, and from those things wherein nature shows herself most excellent and compleat. —Leon Battista Alberti. 6.6: Because the seven horizontal lines of the grid of part (b) are no longer equally spaced. Instead, they converge toward the top of the grid. 6.7: We start with a rectangle in one-point perspective and determine its single vanishing point (Figure Ans.18). We then copy the bottom line of the original rectangle (with the five numbered key points) and move it between the converging lines to form
Answers to Exercises
1367
b a
c
c Figure Ans.17: Two-Point Perspective with Equally-Spaced Lines.
line a. This makes it easy to construct the three green lines b. Next, the left-hand side of the original rectangle (with the seven points labeled A through G) is placed to the right of the perspective rectangle and point G is connected with point x. This segment is continued until it intercepts line h to determine point f . Connecting points B through F to point f determines the locations of the five red horizontal guidelines, which completes the construction of the perspective grid. It is now obvious how to move the various key points of the large digit to their new locations. f
h
a 1
2
3
4
5
x
b
b
b
AB
C
D E FG
Figure Ans.18: A Large Digit “5” in One-Point Perspective.
6.8: In the standard position, the line of sight of the viewer is the z axis. In order for a line segment to be perpendicular to this direction, all its points must have the same z coordinate (i.e., the segment must be contained in a plane parallel to the xy plane). We therefore select two endpoints with z = 1 and two other endpoints with z = 3. The first two points are selected, somewhat arbitrarily, as P1 = (2, 3, 1) and P2 = (3, −1, 1). The third point is chosen as P3 = (0, 2, 3) and the last point is
Answers to Exercises
1368
determined from P4 = P2 − P1 + P3 = (1, −2, 3). The four points are now projected to P∗1 = (1, 3/2), P∗2 = (3/2, −1/2), P∗3 = (0, 1/2), and P∗4 = (1/4, −1/2). It is easy to show that the two straight segments defined by the four projected points are parallel by computing the differences v1 = P∗2 − P∗1 = (1/2, −2) and v2 = P∗4 − P∗3 = (1/4, −1). The difference of two points is a vector, and the two vectors v1 and v2 point in the same direction. 6.9: We are looking for a t value for which P∗ (t) = (0, 1/4). This can be written as the vector equation (1 − t)2 (−1/2, 0) + 2t(1 − t)(0, 1/3) + t2 (1/4, 1/4) = (0, 1/4) or as the two separate scalar equations (1 − t)2 (−1/2) + 2t(1 − t)(0) + t2 (1/4) = 0 and (1 − t)2 (0) + 2t(1 − t)(1/3) + t2 (1/4) = (1/4). The first equation yields the solutions t ≈ 0.5858 and t ≈ 3.414, while the second equation has the solutions t = 0 and t = 1.6. The two equations are therefore contradictory. 6.10: Appropriate mathematical software produces the result (0, 2, 4, 1). The rotation transforms (0, 1, −4, 1) to (0, 4, 1, 1), the translation transforms this to (0, 4, 4, 1), and the scaling produces (0, 2, 4, 1). 6.11: When T1 or T2 gets large, the object is magnified. However, when T3 gets large, the object is scaled in the z direction relative to the origin. All the z coordinates become large, effectively moving the object away from the observer. When all three scale factors get large, the magnification in the x and y directions is canceled out by the effect of moving away in the z direction, so the object does not seem to change in size. 6.12: Equation (6.7) gives us ⎡
1 0 0 0 0 ⎢0 T=⎣ 0 −1/2 0 0 0 0
⎤ 0 1⎥ ⎦, 0 4
and we know that (0, 1, −4, 1)T = (0, 2, 0, 5). We are looking for a point P = (x, y, z) such that (x, y, z, 1)T = (0, 0, 0, w) for any w = 0. The explicit form of this set of equations is (x, −z/2, 0, y + 4) = (0, 0, 0, w), and this is satisfied by all the points of the form (0, y, 0), where y = −4. The interpretation of this result is simple. The rotation brings the points on the y axis to the z axis, where they are translated by three units and remain on the z axis. The scaling doesn’t move these points any farther. Point (0, −4, 0) is rotated to (0, 0, −4) and translated to (0, 0, −1), which is the viewer’s position. All the points on the z axis are projected to the origin except the viewer’s location. The projection of the viewer is undefined because the case z = −k results in Equation (6.3) having a zero denominator. The next example sheds more light on the perspective projection of points with negative z coordinates.
Answers to Exercises
1369
6.13: The terms clockwise and counterclockwise fully describe rotations in two dimensions. Our example, however, is in three dimensions, where rotations are more complex and can have more directions. The rotation produced by matrix (6.8) is from the positive z to the positive x direction (or, alternatively, from the negative x direction to the negative z direction).

6.14: Because of the special orientation of the projection plane. This equation says that any point (x, y, z) satisfying αx = −βz lies on the projection plane, regardless of its y coordinate.

6.15: The case θ = 0 means α = 0 and β = 1. Matrix (6.9) reduces to
$$\begin{pmatrix} k&0&0&0\\ 0&k&0&0\\ 0&0&0&1\\ 0&0&0&k \end{pmatrix} = k\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&0&r\\ 0&0&0&1 \end{pmatrix}.$$
The case θ = 45° implies α = β = 1/√2. Matrix (6.9) is reduced to
$$\begin{pmatrix} k/2&0&-k/2&1/\sqrt{2}\\ 0&k&0&0\\ -k/2&0&k/2&1/\sqrt{2}\\ 0&0&0&k \end{pmatrix}.$$
The case θ = 90° means α = 1 and β = 0. Matrix (6.9) reduces to
$$\begin{pmatrix} 0&0&0&1\\ 0&k&0&0\\ 0&0&k&0\\ 0&0&0&k \end{pmatrix} = k\begin{pmatrix} 0&0&0&r\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{pmatrix}.$$
6.16: Direct multiplication yields
$$(\beta l, m, -\alpha l, 1)\begin{pmatrix} k\beta^2&0&-k\alpha\beta&\alpha\\ 0&k&0&0\\ -k\alpha\beta&0&k\alpha^2&\beta\\ 0&0&0&k \end{pmatrix} = (kl\beta^3+kl\alpha^2\beta,\; mk,\; -kl\alpha\beta^2-kl\alpha^3,\; l\alpha\beta-l\alpha\beta+k) = (kl\beta, km, -kl\alpha, k).$$
The transformed point is P∗ = (lβ, m, −lα) = P. Point P is thus transformed to itself! This happens because P resides on the projection plane. The equation of the plane is αx = −βz, and a simple check verifies that the coordinates of point P satisfy this relation.
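The fixed-point property can be confirmed numerically for arbitrary values of α, β, l, and m. A Python sketch (the specific numbers below are only examples):

```python
import math
# A point P = (lβ, m, -lα) on the projection plane αx = -βz is mapped
# to itself by matrix (6.9); α, β (with α² + β² = 1), l, m arbitrary.
k = 2.0
alpha, beta = math.sin(0.7), math.cos(0.7)
l, m = 3.0, -1.5
M = [
    [k*beta**2,     0, -k*alpha*beta, alpha],
    [0,             k, 0,             0],
    [-k*alpha*beta, 0, k*alpha**2,    beta],
    [0,             0, 0,             k],
]
p = [beta*l, m, -alpha*l, 1.0]
q = [sum(p[i]*M[i][j] for i in range(4)) for j in range(4)]
q = [c/q[3] for c in q]          # divide by the w coordinate (= k)
print(q[:3])                     # ≈ [lβ, m, -lα], the original point
```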
6.17: The steps are similar to the ones used to derive matrix (6.9): Use the relation (−kα, −kβγ, −kβδ) • (x, y, z) = 0 to derive the equation of the projection plane. This is trivial, and the equation is −xkα − ykβγ − zkβδ = 0. Compute the straight segment from the viewer to a general point P = (l, m, n): (l + kα, m + kβγ, n + kβδ)t + (−kα, −kβγ, −kβδ). Calculate the value of t0 at the intersection point of the segment and the plane. From
$$\bigl[(l+k\alpha)t_0-k\alpha\bigr]k\alpha + \bigl[(m+k\beta\gamma)t_0-k\beta\gamma\bigr]k\beta\gamma + \bigl[(n+k\beta\delta)t_0-k\beta\delta\bigr]k\beta\delta = 0,$$
we get
$$t_0 = \frac{k(\alpha^2+\beta^2\gamma^2+\beta^2\delta^2)}{(l+k\alpha)\alpha+(m+k\beta\gamma)\beta\gamma+(n+k\beta\delta)\beta\delta} = \frac{k(\alpha^2+\beta^2\gamma^2+\beta^2\delta^2)}{l\alpha+m\beta\gamma+n\beta\delta+k(\alpha^2+\beta^2\gamma^2+\beta^2\delta^2)}.$$
The coordinates of the projected point can now be determined. The x∗ coordinate is
$$x^* = (l+k\alpha)t_0-k\alpha = (l+k\alpha)\frac{k(\alpha^2+\beta^2\gamma^2+\beta^2\delta^2)}{l\alpha+m\beta\gamma+n\beta\delta+k(\alpha^2+\beta^2\gamma^2+\beta^2\delta^2)}-k\alpha = \frac{lk\beta^2(\gamma^2+\delta^2)-mk\alpha\beta\gamma-nk\alpha\beta\delta}{l\alpha+m\beta\gamma+n\beta\delta+k(\alpha^2+\beta^2\gamma^2+\beta^2\delta^2)}.$$
The y∗ coordinate is
$$y^* = (m+k\beta\gamma)t_0-k\beta\gamma = \frac{-lk\alpha\beta\gamma+mk(\alpha^2+\beta^2\delta^2)-nk\beta^2\gamma\delta}{l\alpha+m\beta\gamma+n\beta\delta+k(\alpha^2+\beta^2\gamma^2+\beta^2\delta^2)}.$$
The z∗ coordinate is
$$z^* = (n+k\beta\delta)t_0-k\beta\delta = \frac{-lk\alpha\beta\delta-mk\beta^2\gamma\delta+nk(\alpha^2+\beta^2\gamma^2)}{l\alpha+m\beta\gamma+n\beta\delta+k(\alpha^2+\beta^2\gamma^2+\beta^2\delta^2)}.$$
The projection matrix is now easy to calculate. It is
$$\begin{pmatrix} k\beta^2(\gamma^2+\delta^2) & -k\alpha\beta\gamma & -k\alpha\beta\delta & \alpha\\ -k\alpha\beta\gamma & k(\alpha^2+\beta^2\delta^2) & -k\beta^2\gamma\delta & \beta\gamma\\ -k\alpha\beta\delta & -k\beta^2\gamma\delta & k(\alpha^2+\beta^2\gamma^2) & \beta\delta\\ 0 & 0 & 0 & k(\alpha^2+\beta^2\gamma^2+\beta^2\delta^2) \end{pmatrix}. \tag{Ans.3}$$
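As a numerical sketch (in Python; the entries of both matrices are transcribed from the text), matrix (Ans.3) reduces to matrix (6.9) when γ = 0 and δ = 1:

```python
import math
# Matrices (Ans.3) and (6.9), transcribed from the text; the check
# confirms that (Ans.3) reduces to (6.9) for gamma = 0, delta = 1
# (no rotation about the x axis).
def ans3(al, be, ga, de, k):
    return [
        [k*be**2*(ga**2 + de**2), -k*al*be*ga, -k*al*be*de, al],
        [-k*al*be*ga, k*(al**2 + be**2*de**2), -k*be**2*ga*de, be*ga],
        [-k*al*be*de, -k*be**2*ga*de, k*(al**2 + be**2*ga**2), be*de],
        [0, 0, 0, k*(al**2 + be**2*ga**2 + be**2*de**2)],
    ]

def m69(al, be, k):
    return [
        [k*be**2, 0, -k*al*be, al],
        [0, k, 0, 0],
        [-k*al*be, 0, k*al**2, be],
        [0, 0, 0, k],
    ]

al, be, k = math.sin(0.4), math.cos(0.4), 2.0
A, B = ans3(al, be, 0.0, 1.0, k), m69(al, be, k)
print(all(abs(A[i][j] - B[i][j]) < 1e-12 for i in range(4) for j in range(4)))  # True
```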
To check our result, we consider the special case of no rotation about the x axis. In this case, φ = 0, γ = 0, and δ = 1. It is easy to see that this reduces matrix (Ans.3) to matrix (6.9).

6.18: After the two rotations, the viewer may end up at any point in space, but the projection plane still passes through the origin. This is why our case is not completely general.

6.19: These two translation matrices can easily be written, and it is obvious that their product is a translation from the origin to B.
$$T_3=\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&-k&1 \end{pmatrix},\quad T_4=\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ a&b&c+k&1 \end{pmatrix},\quad T_3\cdot T_4=\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ a&b&c&1 \end{pmatrix}.$$

6.20: Recall that the basic rule of perspective projection is to connect an image point to the viewer with a line that intercepts the projection plane. The viewer and the image points should therefore be on different sides of the projection plane. In our case, point (0, 0, 0) is behind the viewer, so it is on the same side of the projection plane as the viewer and, consequently, it does not make sense to project it.

6.21: Direct multiplication yields
$$(\beta l, m, -\alpha l, 1)\begin{pmatrix} \beta&0&0&\alpha r\\ 0&1&0&0\\ -\alpha&0&0&\beta r\\ 0&0&0&1 \end{pmatrix} = (l\beta^2+l\alpha^2,\; m,\; 0,\; lr\alpha\beta-lr\alpha\beta+1) = (l, m, 0, 1),$$
so the transformed point is P∗ = (l, m, 0). Figure Ans.19 shows that point P = (βl, m, −αl) resides on the projection plane. After the transformations, it is still located on the projection plane, only now this is the xy plane.
Figure Ans.19: Transforming and Projecting.
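The product in 6.21 can be checked numerically. A Python sketch (the values of k, l, m, and the rotation angle are arbitrary):

```python
import math
# The 6.21 matrix maps a point on the projection plane onto the
# xy plane; the result should be (l, m, 0, 1).
k = 2.0; r = 1/k
alpha, beta = math.sin(0.3), math.cos(0.3)
l, m = 5.0, -2.0
M = [
    [beta,   0, 0, alpha*r],
    [0,      1, 0, 0],
    [-alpha, 0, 0, beta*r],
    [0,      0, 0, 1],
]
p = [beta*l, m, -alpha*l, 1.0]
q = [sum(p[i]*M[i][j] for i in range(4)) for j in range(4)]
print(q)   # ≈ [l, m, 0, 1] = [5.0, -2.0, 0.0, 1.0]
```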
6.22: Because they project on different projection planes. Matrix (6.9) projects on plane αx = −βz, where the z coordinate is proportional to the x coordinate, whereas matrix (6.11) projects on the xy plane, where the z coordinate is zero.

6.23: Figure Ans.20a shows the geometry of the problem. Notice that the viewer looks in the direction of negative z and also down. The Mathematica code

k = 3.; r = 1/k;
{a, b, c} = {0, 2k, -k};
{d, e, f} = Normalize[{0, -1, -1}]
T = {{(e^2 + f + f^2)/(1 + f), -d e/(1 + f), 0, d r},
     {-d e/(1 + f), (d^2 + f + f^2)/(1 + f), 0, e r},
     {-d, -e, 0, f r},
     {(c d + b d e - a e^2 - a f + c d f - a f^2)/(1 + f),
      (-b d^2 + c e + a d e - b f + c e f - b f^2)/(1 + f),
      0, -(a d + b e + c f) r}};
{0,0,-4k,1}.T
computes the normalized components of D as (0, −0.7071, −0.7071) and the projected point as the 4-tuple (0, −2.12132, 0, 3.53553) (i.e., point (0, −0.6) on the xy plane, shown in the diagram). This is the first example where the viewer is not looking in the positive z direction or anywhere near that direction, and this fact raises the issue of the top of the screen. If the viewer is looking at or near the positive z direction, the rotation that aligns the screen with the xy plane is about a small angle. In such a case, the screen does not change its orientation much, so there is no problem with the direction of the top of the screen. If we assume that the top of the screen was in the positive y direction (or close to it) before the rotation, then the rotation aligns the top of the screen with the positive y axis.
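For readers without Mathematica, the same computation can be transcribed into Python (a sketch; the matrix entries are copied from the code above):

```python
import math
# Python transcription of the Mathematica code for exercise 6.23.
k = 3.0; r = 1/k
a, b, c = 0.0, 2*k, -k
d, e, f = 0.0, -1.0, -1.0
n = math.sqrt(d*d + e*e + f*f)
d, e, f = d/n, e/n, f/n                  # Normalize[{0,-1,-1}]

T = [
    [(e**2 + f + f**2)/(1+f), -d*e/(1+f), 0, d*r],
    [-d*e/(1+f), (d**2 + f + f**2)/(1+f), 0, e*r],
    [-d, -e, 0, f*r],
    [(c*d + b*d*e - a*e**2 - a*f + c*d*f - a*f**2)/(1+f),
     (-b*d**2 + c*e + a*d*e - b*f + c*e*f - b*f**2)/(1+f),
     0, -(a*d + b*e + c*f)*r],
]
p = [0.0, 0.0, -4*k, 1.0]
q = [sum(p[i]*T[i][j] for i in range(4)) for j in range(4)]
print([round(x, 5) for x in q])   # [0.0, -2.12132, 0.0, 3.53553]
```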
Figure Ans.20: A Viewer Looking “Backward.”
In this example, however, the rotation is about an angle that is close to 180◦ , so the direction of the top of the screen becomes important. Figure Ans.20b suggests that the top of the screen is a vector in the direction (0, 1, −1) because this direction is perpendicular to the line of sight of the viewer and isn’t very different from the direction
of positive y. If this is so, then after the large rotation this top becomes the bottom of the screen in the xy plane. Figure Ans.20b shows how the top of the screen retains its orientation when the screen is translated from 1 to 2 but becomes the bottom when the screen is rotated about the x axis from 2 to 3. Thus, a complete treatment of general perspective should include an optional rotation of the screen about the z axis. (See the discussion of the top vector in Section 6.8.)

6.24: This is easily done with the help of appropriate software. The results for the two cases are
$$\begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&0&r\\ -a&-b&0&1-cr \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&0&r\\ 0&0&0&1 \end{pmatrix}. \tag{Ans.4}$$
Notice how the second matrix of (Ans.4) is the standard perspective projection matrix Tp of Equation (6.6).

6.25: We substitute (a, b, c) = (0, 1, 0) and (d, e, f) = (0, 1/√2, 1/√2) in matrix (6.15). The transformation is therefore
$$(0,1,10,1)\begin{pmatrix} 1&0&0&0\\[2pt] 0&\frac{1}{\sqrt2}&0&\frac{r}{\sqrt2}\\[2pt] 0&\frac{-1}{\sqrt2}&0&\frac{r}{\sqrt2}\\[2pt] 0&\frac{-1}{\sqrt2}&0&\frac{-r}{\sqrt2} \end{pmatrix} = \left(0,\;\frac{1-10-1}{\sqrt2},\;0,\;\frac{r+10r-r}{\sqrt2}\right),$$
so P∗ = (0, −1/r, 0) = (0, −k, 0). The following Mathematica code may be helpful for further experimentation:

{a,b,c}={0,1.,0};
{d,e,f}=Normalize[{0,1,1}]
T = {{(e^2 + f + f^2)/(1 + f), -d e/(1 + f), 0, d r},
     {-d e/(1 + f), (d^2 + f + f^2)/(1 + f), 0, e r},
     {-d, -e, 0, f r},
     {(c d + b d e - a e^2 - a f + c d f - a f^2)/(1 + f),
      (-b d^2 + c e + a d e - b f + c e f - b f^2)/(1 + f),
      0, -(a d + b e + c f) r}};
{0,1,10,1}.T
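A quick numerical check of the 6.25 result, using the explicit 4×4 matrix written out above (a Python sketch; k = 2 is an arbitrary choice):

```python
import math
# Verify P* = (0, -k, 0) for the point (0, 1, 10) of exercise 6.25.
k = 2.0; r = 1/k; s = 1/math.sqrt(2)
M = [
    [1,  0, 0, 0],
    [0,  s, 0, r*s],
    [0, -s, 0, r*s],
    [0, -s, 0, -r*s],
]
p = [0.0, 1.0, 10.0, 1.0]
q = [sum(p[i]*M[i][j] for i in range(4)) for j in range(4)]
pstar = [q[j]/q[3] for j in range(3)]
print(pstar)   # ≈ [0, -k, 0] = [0.0, -2.0, 0.0]
```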
6.26: 1. The magnitude of vector D is √(1² + 1²) = √2 = k, so the choice k = √2 implies that D has the correct length. It goes from point B to the center of the projection plane. The coordinates of the center are therefore B + D = (0, 2k − 1, −2k − 1). To find the equation of the projection plane, we consider an arbitrary point P = (x, y, z) on this plane. The vector from the center point to P is the difference P − (B + D) = (x, y − 2k + 1, z + 2k + 1). This vector is perpendicular to D, so their dot product (x, y − 2k + 1, z + 2k + 1) • (0, −1, −1) must be zero. This produces the equation y + z = −2, and Figure Ans.21 illustrates how this plane is parallel to the x axis, which
Figure Ans.21: Projection Plane y + z = −2.
is why its equation does not depend on x. Thus, a general point on the projection plane has coordinates (x, y, −y − 2).
2. Appropriate mathematical software produces the 4×4 transformation matrix
$$T_{123}=\begin{pmatrix}
\dfrac{e^2+f-e^2f-f^3}{1-f^2} & -\dfrac{de}{1+f} & 0 & dr\\[6pt]
-\dfrac{de}{1+f} & \dfrac{d^2+f-d^2f-f^3}{1-f^2} & 0 & er\\[6pt]
-d & -e & 0 & fr\\[6pt]
cd+\dfrac{bde}{1+f}-\dfrac{a(e^2+f-e^2f-f^3)}{1-f^2} & ce+\dfrac{ade}{1+f}-\dfrac{b(d^2+f-d^2f-f^3)}{1-f^2} & 0 & 1+(-ad-be-cf)r
\end{pmatrix}. \tag{Ans.5}$$
Notice the denominators in this matrix. They imply that the values f = ±1 require special treatment. In our first case, vector D = (0, −1, −1) has f = −1, so we change it to (0, −1, −0.99). We pick the point Q = (0, 2k − 3, −2k + 1) ≈ (0, −0.17, −1.83) that's on the projection plane, located "below" the center of the plane. The product Q · T123 yields the 4-tuple (0, 3.97, 0, 2.42), which is point (0, 1.64, 0), located on the xy plane but above the origin.

6.27: We first determine α:
$$\alpha=\frac{|a|^2}{a\bullet(p-b)}=\frac{8}{(0,2,2)\bullet(x-0,\,y-1,\,z-0)}=\frac{4}{y+z-1}.$$
(Note that P = (0, 1, 10), implying α = 4/(1 + 10 − 1) = 2/5.) Next, we compute vector d:
$$d=b+\alpha(p-b)=(0,1,0)+\frac{4}{y+z-1}(x,\,y-1,\,z)=\frac{4}{y+z-1}\bigl(x,\;(5y+z-5)/4,\;z\bigr).$$
(A check verifies that P = (0, 1, 10) ⇒ d = (0, 1, 4).)
Vector c can now be calculated:
$$c=\alpha(p-b)-a=\frac{4}{y+z-1}(x,\,y-1,\,z)-(0,2,2)=\frac{4}{y+z-1}\bigl(x,\;(y-z-1)/2,\;-(y-z-1)/2\bigr).$$
Thus, the screen coordinates are
$$u\bullet c=(1,0,0)\bullet\frac{4}{y+z-1}\bigl(x,\;(y-z-1)/2,\;-(y-z-1)/2\bigr)=\frac{4x}{y+z-1},$$
$$w\bullet c=(0,1/\sqrt2,-1/\sqrt2)\bullet\frac{4}{y+z-1}\bigl(x,\;(y-z-1)/2,\;-(y-z-1)/2\bigr)=\frac{4(y-z-1)}{\sqrt2\,(y+z-1)}.$$
Again, a direct check verifies that P = (0, 1, 10) results in
$$u\bullet c=0 \quad\text{and}\quad w\bullet c=\frac{4(1-10-1)}{\sqrt2\,(1+10-1)}=\frac{-4}{\sqrt2}=-\sqrt8.$$
Also, the screen coordinates of point P = (0, 5, 4) are
$$u\bullet c=0 \quad\text{and}\quad w\bullet c=\frac{4(5-4-1)}{\sqrt2\,(5+4-1)}=0,$$
as should be expected (why?).

6.28: Figure Ans.22 shows that in a right-handed coordinate system, the positive y axis is in the direction of vector w and vector u is in the direction of negative x.
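The screen-coordinate formulas of 6.27 can be packaged and checked at both sample points (a Python sketch; the vectors u, w and the factor 4/(y + z − 1) are taken from the derivation):

```python
import math
# Screen coordinates of 6.27.
def screen(x, y, z):
    u = (1.0, 0.0, 0.0)
    w = (0.0, 1/math.sqrt(2), -1/math.sqrt(2))
    s = 4.0/(y + z - 1)
    c = (s*x, s*(y - z - 1)/2, -s*(y - z - 1)/2)
    dot = lambda p, q: sum(pi*qi for pi, qi in zip(p, q))
    return dot(u, c), dot(w, c)

print(screen(0, 1, 10))   # (0.0, -2.828...), i.e., (0, -sqrt(8))
print(screen(0, 5, 4))    # (0.0, 0.0)
```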
Figure Ans.22: A Right-Handed Coordinate System.
6.29: The proof is straightforward but a little messy. We start with two three-dimensional points, P1 = (x1, y1, z1) and P2 = (x2, y2, z2). Their projections are
$$P_1^*=\left(\frac{x_1k}{k+z_1},\frac{y_1k}{k+z_1},\frac{z_1}{k+z_1}\right) \quad\text{and}\quad P_2^*=\left(\frac{x_2k}{k+z_2},\frac{y_2k}{k+z_2},\frac{z_2}{k+z_2}\right).$$
Now consider the two lines P(t) = P1 + (P2 − P1)t and P∗(u) = P∗1 + (P∗2 − P∗1)u. We need to prove that every point on P(t) is transformed to a point on P∗(u), where u depends on t, k, P1, and P2 only. The projection of a general point on P(t) has coordinates
$$\left(\frac{[x_1+(x_2-x_1)t]k}{k+z_1+(z_2-z_1)t},\;\frac{[y_1+(y_2-y_1)t]k}{k+z_1+(z_2-z_1)t},\;\frac{z_1+(z_2-z_1)t}{k+z_1+(z_2-z_1)t}\right).$$
The coordinates of a general point on P∗(u) are
$$\left(\frac{x_1k}{k+z_1}+\left[\frac{x_2k}{k+z_2}-\frac{x_1k}{k+z_1}\right]u,\;\frac{y_1k}{k+z_1}+\left[\frac{y_2k}{k+z_2}-\frac{y_1k}{k+z_1}\right]u,\;\frac{z_1}{k+z_1}+\left[\frac{z_2}{k+z_2}-\frac{z_1}{k+z_1}\right]u\right).$$
In order for the points to be equal, the following two equations have to hold:
$$\frac{[x_1+(x_2-x_1)t]k}{k+z_1+(z_2-z_1)t}=\frac{x_1k}{k+z_1}+\left[\frac{x_2k}{k+z_2}-\frac{x_1k}{k+z_1}\right]u,$$
$$\frac{z_1+(z_2-z_1)t}{k+z_1+(z_2-z_1)t}=\frac{z_1}{k+z_1}+\left[\frac{z_2}{k+z_2}-\frac{z_1}{k+z_1}\right]u.$$
(There are actually three equations, but the second one, for y, is equivalent to the first one, so it is not included here.) Because of the way the depth transformation is defined, both equations are satisfied if u is defined by
$$u=t\,\frac{k+z_2}{k+z_1+(z_2-z_1)t}.$$
Note that t = 0 ⇒ u = 0 and t = 1 ⇒ u = 1.

6.30: The tangent of half the angle is (W/2)/k = 1/2. Therefore, half the angle equals 26.5° and the entire field of view is twice that, or 53° wide. (See also the discussion of Brunelleschi's peepshow experiment on Page 278.)

6.31: Since k is scaled by the same factor of 5, we should scale e by this factor, bringing it down from 2.5 to 0.5.
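The claim of 6.29, that projection preserves collinearity with the parameter u given there, can be tested numerically (a Python sketch; the points and the value of k are arbitrary):

```python
# Perspective projection with depth, as in 6.29: check that the
# projection of P(t) equals P1* + (P2* - P1*)u for the derived u.
k = 3.0
P1, P2 = (1.0, 2.0, 0.5), (-2.0, 1.0, 4.0)   # arbitrary sample points

def project(p):
    x, y, z = p
    return (x*k/(k + z), y*k/(k + z), z/(k + z))

def lerp(p, q, t):
    return tuple(a + (b - a)*t for a, b in zip(p, q))

Q1, Q2 = project(P1), project(P2)
for t in (0.0, 0.3, 0.8, 1.0):
    u = t*(k + P2[2])/(k + P1[2] + (P2[2] - P1[2])*t)
    lhs, rhs = project(lerp(P1, P2, t)), lerp(Q1, Q2, u)
    assert all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs))
print("collinearity preserved")
```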
6.32: The difference between the two pictures of a stereo pair is a horizontal shift, but various parts of the pictures are shifted by different amounts. Parts close to the camera are shifted more than parts that are far away. Thus, the two pictures are not simply shifted versions of each other. However, a person looking at a picture can often tell the approximate distance of each picture element from the camera. This makes it possible, at least in principle, to create a copy of a picture and have the user specify the amount of shift of every picture element in the copy. In practice, such a method is slow and cumbersome, and the result depends on the depth estimates of the user, so it should be used only as a last resort, when only one picture is available and it is important to see it in three dimensions.

7.1: The Mathematica code

(* exercise for hemispherical fisheye projection *)
k=1; scal[q_]:=(k Tan[ArcTan[q/k]/2])/q;
{scal[1.],scal[10.], scal[100.], scal[1000.], scal[10000.]}

produces the values 0.414214, 0.0904988, 0.0099005, 0.000999, and 0.00009999.

7.2: Figure Ans.23 illustrates this effect. We see a few points on a vertical line. In the fisheye projection, each point is moved toward the origin, but points that are close to the origin are moved less than points that are further away. The result is a curve. Applying this argument to straight lines that pass through the center of the circle shows that they are not bent.
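The scal values quoted in 7.1 can also be reproduced without Mathematica; a direct Python transcription:

```python
import math
# The hemispherical-fisheye scaling of 7.1.
k = 1.0
def scal(q):
    return k*math.tan(math.atan(q/k)/2)/q

for q in (1.0, 10.0, 100.0, 1000.0, 10000.0):
    print(scal(q))
# compare 0.414214, 0.0904988, 0.0099005, 0.000999, 0.00009999 above
```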
Figure Ans.23: Vertical Distortion in Fisheye Projection.
7.3: The only differences are that (1) w varies in the intervals [0, 90◦ ] (for the top half of space) and [270◦ , 360◦ ] (for the bottom half) and (2) the entire radius-k circle, not just half of it, is now devoted to the hemisphere of space in front of the viewer. As a result, the new table (Ans.24) has just two rows. 7.4: This point corresponds to w = 0◦ (implying r = 0), and the pair of polar coordinates (0, u) corresponds to the center of the radius-k circle regardless of u. This special point is therefore mapped to the center of the circle.
w              sin w      r          r interval   u
0° → 90°       0 → 1      k sin w    [0, k]       top
270° → 360°    −1 → 0     −k sin w   [k, 0]       bottom

Table Ans.24: Two Cases of w, r, and u.
7.5: Imagine a straight segment parallel to the x axis (Figure Ans.25a). The angle w is the same for all the points of this segment, so a point in direction (u, w) on the segment is projected on the circle into a point with polar coordinates (k sin w, u). The result is a set of points with polar coordinates (r, u), where r is constant (i.e., a circular arc). When this straight segment is slightly perturbed, as in Figure Ans.25b, its projection does not vary much and remains a curve. On the other hand, when a straight segment passes through the viewer's line of sight, as in Figure Ans.25c, all its points have the same angle u. The projection of such a segment is a set of points (r, u), where r normally varies but u is constant, a straight segment.
x
z (a)
(b)
(c)
Figure Ans.25: Straight Segments in the Angular Fisheye Projection.
7.6: The z∗ coordinate depends on Z in the sense that point P should be projected on the cylinder only if |z∗| ≤ Z.

7.7: Figure Ans.26a shows a cylinder of radius R centered on the origin, with its axis in the z direction. We start with a circle in the xy plane. The circle's equation is (R cos(2πt), R sin(2πt), 0). The circle is now rotated θ degrees about the y axis, as shown in Figure Ans.26b. The new circle is given by
$$\bigl(R\cos(2\pi t),\,R\sin(2\pi t),\,0\bigr)\begin{pmatrix} \cos\theta&0&\sin\theta\\ 0&1&0\\ -\sin\theta&0&\cos\theta \end{pmatrix} = \bigl(R\cos(2\pi t)\cos\theta,\; R\sin(2\pi t),\; R\cos(2\pi t)\sin\theta\bigr).$$
Figure Ans.26c shows that in order to convert this tilted circle into an ellipse, its x and z coordinates should be scaled by a factor of 1/cos θ. The equation of this ellipse is thus
$$\bigl(R\cos(2\pi t),\; R\sin(2\pi t),\; R\cos(2\pi t)\tan\theta\bigr). \tag{Ans.6}$$
In order to prove that this is an ellipse, we can rotate it back to the xy plane. The result is
$$\bigl(R\cos(2\pi t),\,R\sin(2\pi t),\,R\cos(2\pi t)\tan\theta\bigr)\begin{pmatrix} \cos\theta&0&-\sin\theta\\ 0&1&0\\ \sin\theta&0&\cos\theta \end{pmatrix} = \left(\frac{R\cos(2\pi t)}{\cos\theta},\; R\sin(2\pi t),\; 0\right),$$
an expression that satisfies x²/a² + y²/b² = 1 for a = R/cos θ and b = R. Figure Ans.26d shows the unrolled cylinder, cut along the y = 0 line, with the origin at its center.
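A numerical sketch (Python; R and θ arbitrary) of this argument, checking that the rotated-back curve lies in the xy plane and satisfies the ellipse equation with a = R/cos θ and b = R:

```python
import math
# The tilted, scaled circle of Equation (Ans.6), rotated back about
# the y axis, should satisfy x²/a² + y²/b² = 1 with z = 0.
R, theta = 1.5, 0.6                       # arbitrary radius and tilt
ct, st = math.cos(theta), math.sin(theta)
a, b = R/ct, R
for i in range(8):
    p0 = R*math.cos(2*math.pi*i/8)        # x of Equation (Ans.6)
    p1 = R*math.sin(2*math.pi*i/8)        # y
    p2 = p0*math.tan(theta)               # z
    x, z = p0*ct + p2*st, -p0*st + p2*ct  # row vector times rotation
    assert abs(z) < 1e-9
    assert abs(x**2/a**2 + p1**2/b**2 - 1) < 1e-9
print("ellipse confirmed")
```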
Figure Ans.26: Ellipse and Sinusoid.
The behavior of the resulting flat curve can be figured out when we notice that the x and y coordinates of the ellipse of Equation (Ans.6) form a circle, which is a curve with constant speed. This means that when the curve is flattened, it has constant speed in the horizontal direction (i.e., incrementing t in equal steps moves us equal horizontal increments on the unrolled cylinder). The vertical behavior of the flattened curve is determined by the z coordinate of Equation (Ans.6), and this coordinate behaves like a sine curve with an amplitude R tan θ. The result is the parametric curve
$$\bigl((2t-1)\pi R,\; R\tan\theta\,\cos((2t-1)\pi)\bigr),\qquad 0\le t\le 1.$$
As t varies from zero to one, the horizontal coordinate varies from −πR to +πR and the vertical coordinate varies as a sine curve from −1 to 0 to +1, back to 0, and ends up at −1. It is also interesting to consider the curvature of this sine curve. The curvature is essentially given by the second derivative, which, in the case of sin(t), equals −sin(t). We are interested only in the absolute magnitude of the curvature, so we can disregard the minus sign. The result is that for t = 0 and t = π the curvature is zero, while for t = π/2 it is maximum. The conclusion is that when a straight line is projected by curved perspective into a sinusoid, those parts of the line that are close to the observer become highly curved, while the distant parts remain straight or close to straight. Figure 7.19 is a typical example of this behavior.

7.8: Figure Ans.27 illustrates the geometry of the problem. Parts a and b show that the distance between the two points on the hemisphere is R(sin θ)φ, and part c shows that the distance between them on the circle is Rθφ. The ratio of the distances is θ/sin θ and this number, which is undefined for θ = 0, starts at 1 for small angles, reaches about 1.05 for 30°, and becomes π/2 ≈ 1.57 for 90°.
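The ratio θ/sin θ of 7.8 is easy to evaluate at a few angles (a Python sketch):

```python
import math
# The distortion ratio θ/sinθ of 7.8.
for deg in (1, 30, 60, 90):
    th = math.radians(deg)
    print(deg, th/math.sin(th))
# the ratio grows from about 1.0000 at 1 degree to about 1.05 at
# 30 degrees and pi/2 ≈ 1.5708 at 90 degrees
```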
R R
R (a)
(b)
B
R(sin)φ
Rsin
B Rsin
R
φ
Rφ
A
φ R
(c)
Figure Ans.27: Distance Between Points in Curvilinear Perspective.
7.9: Imagine a straight horizontal line of the form (x(t), y(t), 0). All its z coordinates are zero, which makes it easy to show (and also to visualize) that the projected segments of this line (there can be up to three segments) are all horizontal and therefore have identical slopes. Figure Ans.28 illustrates an example. Given the two points P1 = (3k, −3k/2, 0) and P2 = (k, 5k/4, 0), it is trivial to determine that P∗1 = (k, −k/2, 0) and P∗2 = (4k/5, k, 0). Since both P1 and P2 have z coordinates of zero, the entire line segment connecting them has z = 0. Thus, even though we don't know the precise location of point P0, we know that its z coordinate is zero. The coordinates of its projection P∗0 are between those of P0 and the origin, implying that the z coordinate of P∗0 is also zero. Thus, the two segments P∗1P∗0 and P∗0P∗2 have z = 0 and therefore have the same slope.

On the other hand, a straight horizontal line of the form (x(t), y(t), a) for a ≠ 0 features an interpanel slope discontinuity that's proportional to a. Here is an illustrative example. Given the two points P1 = (2k, 0, a) and P2 = (2k, 100k, a), the line segment connecting them is L(t) = (2k, 100tk, a). Point P1 is projected to P∗1 = (k, 0, a/2), and point P2 is projected to P∗2 = (k/50, k, a/100). Point P0 is a point L(t0) on this segment with the property that its x and y coordinates are equal. This produces the equation
Figure Ans.28: Cubic Projection of a Horizontal Straight Segment.
2k = 100t0k, which yields t0 = 1/50. Thus, P0 is the point (2k, 2k, a) and is projected to P∗0 = (k, k, a/2). The result is two projected segments. The one on panel x = k goes from P∗1 = (k, 0, a/2) to P∗0 = (k, k, a/2), so its slope is zero. The projected segment on panel y = k goes from P∗0 = (k, k, a/2) to P∗2 = (k/50, k, a/100), so its slope is
$$\frac{\dfrac{a}{2}-\dfrac{a}{100}}{k-\dfrac{k}{50}}.$$
Assuming that the y coordinate of P2 is very large (more than 100), we obtain the approximate slope a/(2k). The slope discontinuity is proportional to the z coordinate a. It is zero for a = 0 and becomes large (positive or negative) with a.

7.10: The image is circular because the main mirror is circular. It has a hole in the middle because the main mirror has a hole in it (more accurately, because light hitting the top of the main mirror, around its hole, cannot reach the secondary mirror).

I've finally figured out what's wrong with photography. It's a one-eyed man looking through a little 'ole. Now, how much reality can there be in that?
—David Hockney.

7.11: It is easy to see from Equation (7.4) that z = k → z∗ = k/2.

7.12: These concepts are defined for the Earth or for any rotating sphere. The rotation naturally defines two special points, the poles. These, in turn, define the equator (the great circle at equal distances from the poles and perpendicular to the axis of rotation). Now imagine a point P on the surface of the sphere. Draw a vertical great circle arc from P to the equator. (The term "vertical" means part of a great circle that passes through the poles.) This arc meets the equator at a point Q. The angle POQ (where O is the center of the sphere) is the latitude of P. It varies in the interval [0, 90°] for each hemisphere. Thus, latitude is a natural coordinate on the rotating sphere. Its definition does not require any arbitrary choices.
She wanted to live in Canada, he wanted to live in Mexico, so they parted. Years later, when asked the reason she replied simply “I just didn’t like his latitude!” —Charles Schultz, Peanuts. The definition of longitude, on the other hand, is arbitrary and depends on a special direction that must be chosen by general agreement. This direction, which is referred to as longitude zero (or meridian zero), is perpendicular to the rotation axis. The longitude of a point P is the angle between its direction (the segment connecting it to the axis) and longitude zero. Thus, longitude varies in the interval [0, 360◦ ], although many maps show it in the interval [0, 180◦ ] and add the designation “east” or “west.” The antipode of point P is the point on the surface of the sphere at maximum distance from P. A graticule is a spherical grid of coordinate lines, latitudes and longitudes, over the surface of the sphere. The latitudes are circles perpendicular to the axis, which is why they are also called parallels. Each longitude is a semicircular arc (a meridian) with the axis as its chord. All the meridians meet at each pole, and every parallel crosses every meridian at a right angle. 7.13: Yes, there are infinitely many developable surfaces, one of which is shown in Figure Ans.29. Notice that at every point on a developable surface it is possible to draw a straight line that lies completely on the surface.
Figure Ans.29: A Developable Surface.
8.1: Yes, because (2, 2.5) = 0.5(1, 1) + 0.5(3, 4).

8.2: We can write P1 = P0 + α(P3 − P0) and similarly P2 = P0 + β(P3 − P0). It is obvious that n collinear points can be represented by two points and n − 2 real numbers.

8.3: Three two-dimensional points are independent if they are not collinear. The three corners of a triangle cannot, of course, be on the same line and are therefore independent. As a result, the three components of Equation (8.3), which are based on the coordinates of the corner points, are independent.

8.4: It is always true that P0 = 1·P0 + 0·P1 + 0·P2, so the barycentric coordinates of P0 are (1, 0, 0). Points outside the triangle have barycentric coordinates, some of which are negative and others are greater than 1 (Figure Ans.30).
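The behavior described in 8.4 is easy to demonstrate. The helper below is a sketch (the function name and the 2×2-system solution method are not from the book): it computes the barycentric coordinates of a point with respect to a triangle and shows that a corner yields (1, 0, 0) while an outside point yields coordinates outside [0, 1].

```python
# Barycentric coordinates (u, v, w) of p with respect to the triangle
# p0 p1 p2, obtained by solving the 2x2 linear system for v and w.
def barycentric(p, p0, p1, p2):
    (x, y), (x0, y0), (x1, y1), (x2, y2) = p, p0, p1, p2
    det = (x1 - x0)*(y2 - y0) - (x2 - x0)*(y1 - y0)
    v = ((x - x0)*(y2 - y0) - (x2 - x0)*(y - y0))/det
    w = ((x1 - x0)*(y - y0) - (x - x0)*(y1 - y0))/det
    return (1 - v - w, v, w)

tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
print(barycentric((0.0, 0.0), *tri))   # (1.0, 0.0, 0.0): a corner
print(barycentric((2.0, 2.0), *tri))   # (-3.0, 2.0, 2.0): outside
```

The coordinates always sum to 1, so for an outside point some are negative and others exceed 1.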