Springer Series in
advanced microelectronics
26
Springer Series in
advanced microelectronics Series Editors: K. Itoh T. Lee T. Sakurai W.M.C. Sansen
D. Schmitt-Landsiedel
The Springer Series in Advanced Microelectronics provides systematic information on all the topics relevant for the design, processing, and manufacturing of microelectronic devices. The books, each prepared by leading researchers or engineers in their fields, cover the basic and advanced aspects of topics such as wafer processing, materials, device design, device technologies, circuit design, VLSI implementation, and subsystem technology. The series forms a bridge between physics and engineering, and the volumes will appeal to practicing engineers as well as research scientists.

18 Microcontrollers in Practice
   By I. Susnea and M. Mitescu
19 Gettering Defects in Semiconductors
   By V.A. Perevoschikov and V.D. Skoupov
20 Low Power VCO Design in CMOS
   By M. Tiebout
21 Continuous-Time Sigma-Delta A/D Conversion
   Fundamentals, Performance Limits and Robust Implementations
   By M. Ortmanns and F. Gerfers
22 Detection and Signal Processing
   Technical Realization
   By W.J. Witteman
23 Highly Sensitive Optical Receivers
   By K. Schneider and H.K. Zimmermann
24 Bonding in Microsystem Technology
   By J.A. Dziuban
25 Power Management of Digital Circuits in Deep Sub-Micron CMOS Technologies
   By S. Henzler
26 High-Dynamic-Range (HDR) Vision
   Microelectronics, Image Processing, Computer Graphics
   Editor: B. Hoefflinger
Volumes 1–17 are listed at the end of the book.
B. Hoefflinger (Ed.)
High-Dynamic-Range (HDR) Vision Microelectronics, Image Processing, Computer Graphics
With 172 Figures
123
Professor Dr. Bernd Hoefflinger Director (retired) Institute for Microelectronics Stuttgart Leonberger Strasse 5 71063 Sindelfingen, Germany
Series Editors:
Dr. Kiyoo Itoh Hitachi Ltd., Central Research Laboratory, 1-280 Higashi-Koigakubo Kokubunji-shi, Tokyo 185-8601, Japan
Professor Thomas Lee Stanford University, Department of Electrical Engineering, 420 Via Palou Mall, CIS-205 Stanford, CA 94305-4070, USA
Professor Takayasu Sakurai Center for Collaborative Research, University of Tokyo, 7-22-1 Roppongi Minato-ku, Tokyo 106-8558, Japan
Professor Willy M. C. Sansen Katholieke Universiteit Leuven, ESAT-MICAS, Kasteelpark Arenberg 10 3001 Leuven, Belgium
Professor Doris Schmitt-Landsiedel Technische Universität München, Lehrstuhl für Technische Elektronik, Theresienstrasse 90, Gebäude N3, 80290 München, Germany
ISSN 1437-0387 ISBN-10 3-540-44432-7 Springer Berlin Heidelberg New York ISBN-13 978-3-540-44432-9 Springer Berlin Heidelberg New York Library of Congress Control Number: 2006933223 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media. springer.com © Springer Berlin Heidelberg 2007 The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting by the editor and SPi using a Springer LaTeX macro package
Cover concept by eStudio Calmar Steinen using a background picture from Photo Studio “SONO”. Courtesy of Mr. Yukio Sono, 3-18-4 Uchi-Kanda, Chiyoda-ku, Tokyo Cover design: eStudio Calmar Steinen Printed on acid-free paper
SPIN: 11574088
57/3100/SPi - 5 4 3 2 1 0
Preface
The high-fidelity vision of our world has been a continuous challenge as our intelligence and skills have evolved. The acquisition and the mapping of the rich and complex content of visual information rank high among our most demanding technical tasks. Electronic vision took big leaps with image sensors capturing in real time a dynamic range from bright to dark of more than seven orders of magnitude, thus exceeding the spontaneous ability of the human eye by more than a hundred times, and with displays rendering five orders of magnitude in brightness, again a more than 100-fold improvement of the state of the art.

This book is the account of high-dynamic-range (HDR) real-time vision, starting with high-speed HDR photometric video capture and following the video-processing chain with coding and tone mapping to the HDR display. The power of eye-like photometric video is demonstrated in machine-vision, medical and automotive applications, and it is extended to HDR subretinal implants for the vision impaired.

The book tries to convey the overall picture of HDR real-time vision, and specific knowledge of microelectronics and image processing is not required. At the same time, it provides a quantitative summary of the major issues to allow an assessment of the state of the art and a glimpse at future developments. Selected experts have been invited to share their know-how and expectations in this rapidly evolving art surrounding the single most powerful of our senses.

Stuttgart, July 2006
Bernd Hoefflinger
Acknowledgments
High-dynamic-range (HDR) video, the subject of this book, emerged as a critical subject for research and product development in 1986, when Europe launched the joint research program EUREKA. Within this framework, the automotive industry, the electronics industry and a large network of research institutes teamed up in PROMETHEUS, the Programme for European Traffic with Highest Efficiency and Unprecedented Safety. A key requirement for electronic-vision-assisted driving was a camera with an instantaneous dynamic range of over 1,000,000:1 and up to 1,000 frames per second. Several authors of this book united early on to realise this goal, and it is in this context that we would like to acknowledge the PROMETHEUS leadership of Prof. Dr. Ferdinand Panik of the Daimler-Chrysler Corporation, who was pivotal in pushing HDR vision technology.

Several other research programs followed in the 1990s to use the benefits of real-time HDR vision in industrial machine vision, in minimally invasive endoscopy and even in retinal implants to aid blind people. We thank the German Ministry for Education and Research (formerly BMFT, now BMBF), the European Commission, Brussels, and the Department of Commerce of the State of Baden-Wuerttemberg, Stuttgart, Germany, for consistently supporting vision research.

This book only became a reality through the expertise and the dedication of the contributing authors, and I take this opportunity to thank everyone for engaging in this joint effort. Several authors were my co-workers at the Institute for Microelectronics Stuttgart (IMS CHIPS). Many other colleagues at IMS CHIPS have also been involved in the HDRC technology, and I would like to express my gratitude here. Verena Schneider deserves special credit for meticulously correcting and merging all the contributions into one opus. Her scientific and managerial skills were key and are highly commendable. Joachim (Joe) Deh, the multimedia and DTP manager at IMS CHIPS, has been instrumental for many years in the presentation of HDR image and video materials, and I thank him for many ideas and their high-quality execution.
Astrid Hamala brought her language and professional skills to this publishing project. She translated several contributions into English and polished major sections with impeccable style. Our special thanks go to the team at Springer, Heidelberg, in particular to Adelheid Duhm, for the excellent preparation of HDR Vision. Our section editor at Springer, Heidelberg, Dr. Claus Ascheron, encouraged me with his unwavering confidence in the realisation of this book. I thank him for his patience and hope that the response of the readers will be the reward for his leadership in this joint effort.

Stuttgart, July 2006
Bernd Hoefflinger
Contents
1 The Eye and High-Dynamic-Range Vision
  Bernd Hoefflinger ..... 1
  References ..... 12

2 The High-Dynamic-Range Sensor
  Bernd Hoefflinger and Verena Schneider ..... 13
  2.1 General Considerations ..... 13
  2.2 The HDRC (High-Dynamic-Range CMOS) Pixel ..... 19
  2.3 The HDRC Sensor ..... 27
  2.4 Fixed-Pattern Correction of HDR Imagers ..... 32
      2.4.1 Physical Background of Logarithmic OECF ..... 32
      2.4.2 Parameter Extraction with Software ..... 33
      2.4.3 Effects of Parameter Variation on the OECF ..... 35
      2.4.4 Presentation of Three Correction Algorithms ..... 37
      2.4.5 New Parameterized Correction Algorithm ..... 38
      2.4.6 Masking Process ..... 40
      2.4.7 Algorithm Including Temperature ..... 41
      2.4.8 Correction Procedure and Runtime ..... 46
      2.4.9 Summary ..... 47
  2.5 HDRC Dynamic Performance ..... 47
  2.6 HDRC Sensor with Global Shutter ..... 53
  References ..... 56

3 HDR Image Noise
  Bernd Hoefflinger ..... 57
  References ..... 63

4 High-Dynamic-Range Contrast and Color Management
  Bernd Hoefflinger ..... 65
  References ..... 71
5 HDR Video Cameras
  Markus Strobel and Volker Gengenbach ..... 73
  5.1 Introduction ..... 73
  5.2 HDRC CamCube Miniaturized Camera Module ..... 75
      5.2.1 Features of the HDRC CamCube ..... 76
      5.2.2 Assembly Techniques ..... 76
      5.2.3 System Design ..... 77
      5.2.4 Application Example ..... 78
  5.3 HDRC Camera Front-End ..... 78
  5.4 Digital HDRC Camera LinkTM System ..... 82
      5.4.1 Features of the HDRC Camera Link Camera ..... 84
      5.4.2 Features of the "IP3 Control" Software ..... 84
      5.4.3 Application Example ..... 84
  5.5 Intelligent HDRC GEVILUX CCTV Camera ..... 85
      5.5.1 Features of the Camera ..... 85
  5.6 HDR Video-Based Aircraft Docking Guidance ..... 90
      5.6.1 Summary ..... 90
      5.6.2 Introduction ..... 91
      5.6.3 Operation ..... 92
      5.6.4 Challenges to the Sensor System ..... 93
      5.6.5 HDR Camera with Improved Sensitivity ..... 94
  5.7 Conclusion ..... 97
  References ..... 97

6 Lenses for HDR Imaging
  Hans-Joerg Schoenherr ..... 99

7 HDRC Cameras for High-Speed Machine Vision
  Bela Michael Rohrbacher, Michael Raasch and Roman Louban ..... 107
  7.1 General Requirements ..... 107
  7.2 Special Characteristics ..... 108
  7.3 Methods for Obtaining the Specific Image Information ..... 109
  7.4 Optoelectronic Transfer Function (Lookup Table, LUT) ..... 110
      7.4.1 Mode 1:1 ..... 111
      7.4.2 Mode Rec. 709 ..... 111
      7.4.3 Mode Stretched ..... 112
      7.4.4 Mode CatEye ..... 112
      7.4.5 Mode CatEye2 ..... 112
  7.5 Application Example Surface Inspection ..... 113
  7.6 Evaluation Algorithms ..... 114
  7.7 Robot Controlled Image-Processing System for Fully Automated Surface Inspection ..... 118
  References ..... 121
8 HDR Vision for Driver Assistance
  Peter M. Knoll ..... 123
  8.1 Introduction ..... 123
  8.2 Components for Predictive Driver Assistance Systems ..... 124
      8.2.1 Ultrasonic Sensors ..... 124
      8.2.2 Long Range Radar 77 GHz ..... 125
      8.2.3 Video Sensor ..... 125
  8.3 Driver Assistance Systems for Convenience and for Safety ..... 127
  8.4 Video-Based Driver Assistance Systems ..... 128
      8.4.1 Video System ..... 128
      8.4.2 Image Processing ..... 130
  8.5 Night Vision Improvement System ..... 130
  8.6 Night Vision Enhancement by Image Presentation ..... 131
  8.7 Night Vision Warning ..... 132
  8.8 Sensor Data Fusion ..... 133
      8.8.1 Lane Detection and Lane Departure Warning ..... 134
      8.8.2 Traffic Sign Recognition ..... 134
  8.9 Conclusion ..... 135
  References ..... 136

9 Miniature HDRC Cameras for Endoscopy
  Christine Harendt and Klaus-Martin Irion ..... 137
  References ..... 139

10 HDR Sub-retinal Implant for the Vision Impaired
  Heinz-Gerd Graf, Alexander Dollberg, Jan-Dirk Schulze Spüntrup and Karsten Warkentin ..... 141
  10.1 Introduction ..... 141
  10.2 Electronic HDR Photoreceptors ..... 142
  10.3 The Differential Principle ..... 143
  10.4 The Complete Amplifier Cell ..... 143
  10.5 The Retinal Implant ..... 145
  References ..... 145

11 HDR Tone Mapping
  Grzegorz Krawczyk, Karol Myszkowski, and Daniel Brosch ..... 147
  11.1 Taxonomy ..... 148
      11.1.1 Spatially Invariant Operators ..... 149
      11.1.2 Spatially Variant Operators ..... 153
  11.2 HDR Video: Specific Conditions and Requirements ..... 159
  11.3 Tone Mapping for HDR Video ..... 161
      11.3.1 Response Curve Compression ..... 161
      11.3.2 Local Details Enhancement ..... 162
      11.3.3 Temporal Luminance Adaptation ..... 163
      11.3.4 Key Value ..... 164
      11.3.5 Tone Mapping ..... 165
  11.4 Simulating Perceptual Effects ..... 166
      11.4.1 Scotopic Vision ..... 166
      11.4.2 Visual Acuity ..... 167
      11.4.3 Veiling Luminance ..... 168
      11.4.4 Tone Mapping with Perceptual Effects ..... 169
  11.5 Bilateral Tone Mapping for HDRC Video ..... 170
  11.6 Summary ..... 175
  References ..... 175

12 HDR Image and Video Compression
  Rafal Mantiuk ..... 179
  12.1 Introduction ..... 179
  12.2 Device-Referred and Scene-Referred Representation of Images ..... 180
  12.3 HDR Image and Video Compression Pipeline ..... 180
  12.4 HDR Image Formats ..... 181
      12.4.1 Radiance's HDR Format ..... 181
      12.4.2 LogLuv TIFF ..... 182
      12.4.3 OpenEXR ..... 183
      12.4.4 Subband Encoding – JPEG HDR ..... 183
  12.5 HDR Extension to MPEG Video Compression ..... 184
  12.6 Perceptual Encoding of HDR Color ..... 187
  12.7 Software for HDR Image and Video Processing ..... 191
  References ..... 191

13 HDR Applications in Computer Graphics
  Michael Goesele and Karol Myszkowski ..... 193
  13.1 Introduction ..... 193
  13.2 Capturing HDR Image Data ..... 194
      13.2.1 Multiexposure Techniques ..... 194
      13.2.2 Photometric Calibration ..... 194
  13.3 Image-Based Object Digitization ..... 196
      13.3.1 Image-Based Capture of Spatially Varying BRDFs ..... 196
      13.3.2 Acquisition of Translucent Objects ..... 197
  13.4 Image-Based Lighting in Image Synthesis ..... 199
      13.4.1 Rendering Techniques for Image-based Lighting ..... 200
      13.4.2 A CAVE System for Interactive Global Illumination Modeling in Car Interior ..... 203
      13.4.3 Interactive Lighting in Mixed Reality Applications ..... 205
  13.5 Requirements for HDR Camera Systems ..... 206
  References ..... 208
14 High-Dynamic Range Displays
  Helge Seetzen ..... 211
  14.1 HDR Display Requirements ..... 211
  14.2 HDR Display Design ..... 213
      14.2.1 LED Backlight ..... 215
      14.2.2 LCD Panel ..... 216
      14.2.3 Image Processing Algorithm ..... 216
  14.3 HDR Display Performance ..... 221
  14.4 Alternative Implementation ..... 222
  14.5 Conclusion ..... 222
  References ..... 222

15 Appendix ..... 225
  15.1 Symbols ..... 225
  15.2 Abbreviations ..... 229
  15.3 Glossary ..... 230
  15.4 Some Useful Quantities and Relations ..... 231
  15.5 Trademarks ..... 231

Index ..... 233
List of Contributors
Dipl.-Ing. (FH) Daniel Brosch IMS CHIPS Allmandring 30a 70569 Stuttgart Germany
[email protected] Dr. Alexander Dollberg University of Dortmund, AG Microelektronic Emil-Figge Str.68 44227 Dortmund
[email protected] Dr.-Ing. Michael Goesele University of Washington Department of Computer Science and Engineering Box 352350 Seattle, Washington 98195-2350 USA
[email protected] Dipl. Phys. Heinz-Gerd Graf IMS CHIPS Allmandring 30a 70569 Stuttgart Germany
[email protected] Dr. Volker Gengenbach GEVITEC Marie-Curie-Straße 9 76275 Ettlingen Germany
[email protected] Dr. Christine Harendt IMS CHIPS Allmandring 30a 70569 Stuttgart Germany
[email protected] Prof. Dr. Bernd Hoefflinger Leonberger Strasse 5 71063 Sindelfingen Germany
[email protected] Dr. Klaus-Martin Irion Karl Storz GmbH & Co. KG Mittelstr. 8 78532 Tuttlingen Germany
[email protected] Prof. Dr. Peter M. Knoll Robert Bosch GmbH Postfach 16 61 71226 Leonberg Germany
[email protected]
Dipl.-Ing. Grzegorz Krawczyk Max-Planck-Institut für Informatik Dept. 4: Computer Science Stuhlsatzenhausenweg 85 66123 Saarbrücken Germany
[email protected] Dr.-Ing. Roman Louban hema electronic GmbH Röntgenstrasse 31 73431 Aalen Germany
[email protected] Dipl.-Ing. Rafal Mantiuk Max-Planck-Institut für Informatik Dept. 4: Computer Science Stuhlsatzenhausenweg 85 66123 Saarbrücken Germany
[email protected] Dr.-Ing. habil. Karol Myszkowski Max-Planck-Institut für Informatik Dept. 4: Computer Science Stuhlsatzenhausenweg 85 66123 Saarbrücken
[email protected] Dipl.-Ing. Hans-Joerg Schoenherr KST GmbH – Kamera & System Technik Hugo-Küttner-Strasse 1a 01796 Pirna Germany
[email protected] Dr. Helge Seetzen BrightSide Technologies Inc. 1310 Kootenay Street Vancouver, BC Canada V5K 4R1
[email protected] Dipl.-Inf (FH) Verena Schneider, MSc IMS-CHIPS Allmandring 30a 70569 Stuttgart Germany
[email protected] Dipl.-Ing. Jan-Dirk Schulze Spüntrup IMS-CHIPS Allmandring 30a 70569 Stuttgart Germany
[email protected] Dipl.-Ing. Michael Raasch hema electronic GmbH Röntgenstrasse 31 73431 Aalen Germany
[email protected] Dipl.-Ing. Markus Strobel IMS-CHIPS Allmandring 30a 70569 Stuttgart Germany
[email protected] Dipl.-Ing. Bela Michael Rohrbacher hema electronic GmbH Röntgenstrasse 31 73431 Aalen Germany
[email protected] Dipl.-Ing. (FH) Karsten Warkentin IMS-CHIPS Allmandring 30a 70569 Stuttgart Germany
[email protected]

1 The Eye and High-Dynamic-Range Vision

Bernd Hoefflinger
The dream of electronic vision is to mimic the capabilities of the human eye and possibly to go beyond them in certain aspects. The eye and human vision overall are so versatile and powerful that any strategy to realise similar features with electronics and information technology has to focus on certain aspects of this powerful sense for collecting the most comprehensive information on our physical world.

The single most challenging feature of optical information is the high dynamic range of intensities hitting the same receptor from one instant to another or hitting different receptors in an observed scene at the same time. This range may exceed 8 orders of magnitude (Table 1.1). That is wider than our eyes can handle instantaneously or even with long-time adaptation. The sensing and acquisition system in our eyes handles over 5 orders of magnitude in real time (Fig. 1.1) and up to 8 orders with long-time adaptation. The curve shown here represents the instantaneous response of a normal eye. With age, the minimum detectable signal moves up and the maximum tolerable signal, limited by overflow (blinding) and damage, moves down, so that the dynamic range deteriorates.

In any event, the eye's dynamic range used to be much superior to that of any real-time electronic acquisition system until the advent of certain high-dynamic-range (HDR) video sensors and cameras. It is the cornerstone of this book that real-time HDR video acquisition with a dynamic range of more than 7 orders of magnitude has evolved into a mature technology since it was first demonstrated in 1992 [1]. Fundamentally, the pixels in these sensors mimic the logarithmic compression of the high input range performed by the photoreceptors in our eyes, making them "bionic" sensors in many respects, such as contrast sensitivity, separation of the illuminant and color constancy [2]. Cameras with such sensors have been available since 1996, and a significant amount of research and development has been performed along the vision chain, including video processing and the challenging art of displaying HDR video to our eyes.
Table 1.1. Optical intensities in photometric units in some real-world scenes

Condition          Illuminance (lx)
Clear night sky    0.001
Quarter moon       0.01
Full moon          0.1
Late twilight      1
Twilight           10
Heavy overcast     100
Overcast sky       1,000
Full daylight      10,000
Direct sunlight    100,000

Fig. 1.1. The response, in the following also called the optoelectronic conversion function (OECF), of the human eye and of an HDRC array sensor (plotted over an illuminance scale from 0.001 to 10^6 lx)
As a result, a comprehensive and exemplary treatment of HDR video can be presented, which, as we shall see, is inspired in many ways by the unique capabilities of the human eye and the human visual system (HVS). The eye-like HDR video acquisition paradigm in this book is evident in Fig. 1.1, where the OECF of a commercial High-Dynamic-Range CMOS (HDRC) sensor is overlaid on the eye's OECF. What is striking is that the electronic sensor is superior to the eye in:

1. Sensitivity, which means the minimum detectable illuminance, by at least a factor of 10
2. Range, which means the maximum illuminance measurable without distortion, white saturation or damage, by another factor of 10

so that the total dynamic range is more than 100 times higher than that of the human eye. The unique feature of this sensing technology is HDR photometric video capture in real time over 8 orders of magnitude of illuminance.
This means that over an illuminance range of 100 million to 1, all values measured consecutively on the same pixel and/or simultaneously on different pixels in an array are correct relative to each other and on an absolute scale, provided the sensor temperature is kept constant and the sensor is calibrated.

Before we treat other unique features and consequences of this paradigm of real-time photometric HDR video capture, it is useful to put our treatment of HDR vision in perspective relative to the book HDR Imaging by Reinhard, Ward, Pattanaik and Debevec [3], which we will reference as HDRI in the present book (HDRV). HDRI is a formidable scientific treatment of the "processing" of high-dynamic-range images acquired with low-dynamic-range cameras, basically through multiple exposures. This means that HDRI is focused on high-quality still images and computer graphics, while there are limitations for video, particularly high-speed video in real-world scenes. The present book, HDRV, presents the disruptive technology of HDR "acquisition" with eye-like log compression up front in the electronic photoreceptor (pixel) and its powerful consequences for robust, real-time video processing, transmission and display.

Our focus on real-time HDR video capture is demonstrated in Figs. 1.2–1.4. Fire breather, solar eclipse and welding simply could not be captured without the high-speed, real-time recording offered by the HDRC sensors. However, even with the "natural", eye-like log compression in each pixel, the camera output data for these scenes still needed "tone mapping", which will be presented in later chapters, to offer us an informative print image.
Fig. 1.2. A frame taken from the video sequence of a fire breather recorded with an HDRC video camera at 30 frames s−1
Fig. 1.3. Solar eclipse: a few frames from the video sequence of the solar eclipse on August 11, 1999, in Germany, recorded with an HDRC camera at 30 frames s−1 (first frame at 12:25 MEZ with a 210 mm telephoto lens, f:5.6; the following frames at 12:32, 12:33 and 12:34 MEZ with a 400 mm mirror telescope, f:5.6). The camera was equipped with an H6 filter. The partially clouded dark sky was genuinely captured with its shady details, thanks to the low-light contrast sensitivity of the HDRC sensor, simultaneously with the extremely bright corona
It is in areas like tone mapping that the whole HDR community comes together to benefit from the general progress in HDR imaging. However, we return here to HDR eye-like logarithmic video acquisition to identify further tremendous benefits resulting from mimicking the HVS.

Figure 1.5 shows HDR acquisition. The scale in this graph is powers of 2, also called f-stops or octaves. The upper bars show how seven exposures
Fig. 1.4. Welding scene: single frames from the video sequence of a welding scene recorded simultaneously with a CCD and an HDRC camera in 1995. The HDRC sensor had a resolution of 256 × 128 pixels and delivered 120 frames s−1
Fig. 1.5. HDR acquisition: a linear camera with seven staggered exposures acquires an image with a range of 24 f-stops, whereas the HDRC dynamic range is 28 f-stops per frame
with a conventional camera are staggered to obtain an HDR image file with a dynamic range of 23 f-stops, considered to be sufficient for a high-quality image file. Special stitching algorithms are required to fit the seven files together into one comprehensive file. This art is covered extensively in HDRI Chap. 4 [3]. By comparison, HDRC VGAx sensors have been reported to have an instantaneous range of 8 orders of magnitude or 28 f-stops [2], so that one exposure captures all the information in a photometrically correct way. The optoelectronic conversion function (OECF) of the HDRC sensor (Fig. 1.1) is not only continuous and monotonic over so many orders of magnitude, it is moreover strictly logarithmic over the largest part. With its extremely high dynamic range, it radically reduces the design and operating problems of cameras: with HDRC, there is no need for aperture and/or exposure-time (integration-time) control.
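To relate the different dynamic-range units used here, the short sketch below (Python, added for illustration) converts f-stops into decades and decibels; the 20·log10 decibel convention for intensity ratios is an assumption commonly used when quoting image-sensor dynamic range.

```python
import math

def stops_to_decades(stops):
    # One f-stop is a factor of 2 in intensity, one decade a factor of 10.
    return stops * math.log10(2.0)

def decades_to_db(decades):
    # Dynamic range in dB, using the common 20*log10(Imax/Imin) convention.
    return 20.0 * decades

for stops in (24, 28):
    dec = stops_to_decades(stops)
    print(f"{stops} f-stops = {dec:.1f} decades = {decades_to_db(dec):.0f} dB")
# 24 f-stops = 7.2 decades = 144 dB  (the stitched multi-exposure file of Fig. 1.5)
# 28 f-stops = 8.4 decades = 169 dB  (one HDRC frame)
```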
In other words: with HDRC, no pixel and no frame can be over- or under-exposed. In fact, as an accurate photometric sensor, it surpasses the human eye, which still needs pupil control covering about four f-stops to control the range of input intensities.

In the evolution of its senses, the human system has optimised its functionalities for efficiency and sensitivity, among other features. The logarithmic conversion function is a wonderful case in point. In Fig. 1.1, we have plotted the OECF of the eye and of the HDRC sensor on a comparable scale. Regarding the HDRC sensor, we see that the OECF maps an input range of 8 decades, or 28 Bit, to an output range of 10 Bit. So the first result of log conversion is information compression from 28 to 10 Bit: the log-converting HDR pixel provides natural bit-rate compression.

We may be concerned that this means a possible loss of information. As we shall see shortly, the opposite is true: we gain contrast sensitivity in darker, shaded parts of a scene. The so-called contrast sensitivity answers the most critical question: what is the minimum change of the input grey value which causes a just noticeable difference (JND) on the output? The contrast sensitivity tells us by how many percent we have to change an input value to obtain this JND on the output. The human eye can distinguish changes in the input luminance which are a little smaller than 1%, at sufficient light levels. In this region of "photopic" vision, this contrast sensitivity is independent of the input (grey) level, as shown in Fig. 1.6. This is the region of the logarithmic response:
Fig. 1.6. The contrast sensitivity function (CSF) of the human eye and of an HDRC sensor with an input range of 7 orders of magnitude and a 10-Bit output
A logarithmic OECF offers constant contrast sensitivity independent of the input grey value. It was Weber's finding in the 19th century that a natural response function has this unique feature that a constant "relative" change of the stimulus causes the just noticeable difference in the response. At lower light levels, the contrast sensitivity (CS) of the eye decreases: a relatively larger input change is needed to cause a JND on the output, so that the spontaneous sensitivity of the eye is limited. However, with long-time adaptation, the sensitivity range then extends to very low light levels, as shown by the low-light tail of the CSF.

We also find in Fig. 1.6 the CSF of the HDRC sensor. In the case of a 10-Bit output covering 7 orders of magnitude on the input, it is 1.5% and constant over 5 orders of magnitude. For the sensor with digital output, the JND is just 1 least-significant bit (LSB), often also called 1 digital number (DN) in the text. The sensitivity decreases at low light levels, similar to the eye. However, it is better than the spontaneous sensitivity of the eye.

In the present discussion, we call sensitivity, in the language of film and standards, the minimum detectable signal. The relatively high performance of the electronic sensors is due to the properties and extensive technology development of silicon (Si) sensors. We put this in perspective relative to ASA and DIN sensitivity standards. For this purpose, we use the relation that a "green" light energy of 8 mlx s produces a just noticeable exposure (or a signal-to-noise ratio of unity) on a 100 ASA film.

The electronic receptor of choice, namely the Si photodiode, is limited in its detection capability by the so-called dark current. Its dominant component in miniature photodiodes is that along the perimeter of the photodiode. To get a feel for its significance, we assume that 50 dark electrons per micron of perimeter are generated in 25 ms (a frame rate of 40 frames s−1) at a temperature of 50°C. The statistical fluctuation of this number of electrons, the so-called shot noise, is given by the square root of this number (see Chap. 3). With a green illuminance of 20 mlx we would collect typically 1 electron µm−2 in this frame time. The minimum detectable luminance is reached when the number of photoelectrons has reached the number of electrons due to the dark shot noise. This would be the absolute physical limit. Other noise sources in an actual Si sensor make the total noise bigger and the sensitivity worse than the ideal presented here in Table 1.2.

This exercise shows that Si sensor technology has the potential for very good ASA/DIN ratings and that there are serious trade-offs between photodiode/pixel size and obtainable sensitivity. Progress in HDR capture in the future depends mostly on progress in low-light sensitivity. This exercise gives us a glimpse at where to look for improvements. The extended range of high contrast sensitivity down to low light levels is essential for capturing details in the shady parts of HDR scenes. An example is shown in Fig. 1.7.
Table 1.2. The sensitivity of miniature Si photodiodes for different square sizes in terms of ASA and DIN ratings (see text for model assumptions). Minimum detectable exposure (mlx s): unity signal-to-noise ratio (shot-noise limit)

ASA     DIN   mlx s   Signal (e µm−2)   Photodiode size (µm)
6,400   39    0.12    0.25              15.0
3,200   36    0.24    0.5               9.3
1,600   33    0.5     1.0               5.9
800     30    1.0     2.0               3.7
400     27    2.0     4.0               2.3
200     24    4.0     8.0               1.8
100     21    8.0     16.0              0.9

Reference conditions: quantum efficiency 50%, temperature 50°C, dark electrons 50 e µm−1, integration time 25 ms
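The entries of Table 1.2 can be reproduced with the simple shot-noise model described above. The sketch below (Python) is an illustrative reconstruction of that model, not code from the book: a square photodiode of side d collects 50 dark electrons per µm of perimeter during the 25 ms integration time at 50°C, the dark shot noise is the square root of that count, roughly 2 photoelectrons per µm² are generated per mlx s (16 e µm−2 at 8 mlx s), and 100 ASA is taken to correspond to a minimum detectable exposure of 8 mlx s.

```python
import math

DARK_E_PER_UM = 50.0      # dark electrons per µm of perimeter in 25 ms at 50 °C
E_PER_UM2_PER_MLXS = 2.0  # photoelectrons per µm^2 per mlx*s (16 e/µm^2 at 8 mlx*s)

def min_detectable_exposure(d_um):
    """Exposure (mlx*s) at which the photocharge equals the dark shot noise (SNR = 1)."""
    dark_electrons = DARK_E_PER_UM * 4.0 * d_um        # square pixel, perimeter 4*d
    shot_noise = math.sqrt(dark_electrons)             # rms electrons
    return shot_noise / (E_PER_UM2_PER_MLXS * d_um**2)

def asa_rating(e_min_mlxs):
    """100 ASA corresponds to a minimum detectable exposure of 8 mlx*s."""
    return 100.0 * 8.0 / e_min_mlxs

def din_rating(asa):
    return 10.0 * math.log10(asa) + 1.0

for d in (15.0, 9.3, 5.9, 3.7, 2.3, 1.8, 0.9):
    e_min = min_detectable_exposure(d)
    asa = asa_rating(e_min)
    print(f"d = {d:4.1f} µm: {e_min:5.2f} mlx*s, ASA ~ {asa:5.0f}, DIN ~ {din_rating(asa):4.1f}")
```

Running this reproduces the table within rounding, for example about 0.12 mlx s and roughly 6,500 ASA for the 15 µm photodiode, and about 8.3 mlx s (close to 100 ASA) for the 0.9 µm photodiode.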
Fig. 1.7. Portrait illuminated by intense side lighting, captured by an HDRC camera 512 × 256 pixels, 1/60 s, 1996
In spite of the intense side lighting, the shaded half of the face shows the same amount of detail as the illuminated half, allowing safe contour extraction and face recognition. We use this image to introduce another powerful feature of eye-like HDR log capture:

Separation (or Disregard) of the Illuminant

We perceive real-world scenes predominantly by content (objects, contours, texture, color, motion) and less by the actual and changing illumination. This has led to the feature of "disregarding the illuminant" in descriptions of the human visual system (HVS). In this introduction, we show practical examples where we correct or remove the incidental illumination in images recorded with HDR log-response sensors in real time, to obtain image representations either pleasing for our eyes or suitable for robust machine vision.
Fig. 1.8. Macbeth chart recorded with an HDRC camera with four apertures (f:1.8, 4, 8 and 16). Left: sensor output. Right: pixel data shifted by common offsets
Figure 1.8 shows four images of the Macbeth chart recorded with an HDRC camera where the lens aperture was varied over 5 f-stops; the raw images appear on the left. On the right side, all pixels of the seemingly "over-" or "under-exposed" frames have received an offset corresponding to the log of the number of f-stops necessary to get what we perceive as the "standard" representation of the chart. A global "offset" correction, easily done in real time, restores the raw images on the left to the desired images on the right without any further processing or correction.

Leaving details to later chapters, let us consider here that the intensity incident on each pixel of our sensor is the product of the illumination and of the reflectance of the spot in our scene which is imaged onto this pixel. Because of the logarithmic response of the HDRC pixel, the pixel output is the sum of log-luminance and log-reflectance. Therefore, an offset correction by log-luminance has the desired effect. The result of this straightforward procedure is shown in Fig. 1.9 for the case of a rather spotty illumination of an object. When the profile of that illumination at the distance of that object, e.g. in front of an ATM, is known for that camera operation, the application of that profile as an offset produces an output image which comes close to perfect studio lighting and which is ideal for feature extraction and recognition.
Fig. 1.9. One frame from a video sequence of a moving model with spot illumination recorded with an HDRC camera at 30 frames s−1. Left: the raw frame. Middle: the log profile of the illumination, recorded and stored after the installation of the camera. Right: the illumination profile added as an offset in real time to the raw frame
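Because the log-responding pixel delivers the sum of log-illumination and log-reflectance, the corrections of Figs. 1.8 and 1.9 reduce to simple additions or subtractions in the log domain. The sketch below (Python with NumPy) illustrates the principle; the function and variable names are illustrative and not part of any camera API.

```python
import numpy as np

def correct_illumination(log_frame, log_illum_profile, target_log_illum=0.0):
    """Remove a known illumination pattern from a log-encoded HDR frame.

    log_frame:         HxW array, pixel output ~ log(illumination) + log(reflectance)
    log_illum_profile: HxW array, the stored log-illumination of the scene
                       (e.g. the spotlight profile recorded at installation)
    target_log_illum:  log of the uniform "studio" illumination to simulate
    """
    # Subtracting the profile leaves log(reflectance); adding a constant
    # re-illuminates the scene uniformly.
    return log_frame - log_illum_profile + target_log_illum

# Synthetic illustration: a flat grey target under a spotlight falling off to the right.
h, w = 4, 6
log_reflectance = np.full((h, w), -0.7)                          # ~20% reflectance
spotlight = np.fromfunction(lambda y, x: 3.0 - 0.3 * x, (h, w))  # bright left, dark right
raw = spotlight + log_reflectance                                # what the log pixel delivers
flat = correct_illumination(raw, spotlight, target_log_illum=2.0)
print(np.allclose(flat, log_reflectance + 2.0))                  # True: uniform lighting restored
```

A change of lens aperture, as in Fig. 1.8, acts in the same way, except that the offset is a single scalar (the number of f-stops times the log of 2) rather than a per-pixel profile.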
We have shown in Figs. 1.8 and 1.9 that with HDRC eye-like video capture we can manipulate, correct or eliminate the illuminant. We just note here that with the inverse process we can also custom-illuminate HDR log images. What has been subtly introduced with these HDR color figures is another important feature of eye-like HDR acquisition:

Color Constancy

This phenomenon is the very complex capability of the HVS to perceive a color unchanged whether intensely or weakly illuminated and even if the spectrum of the illuminant changes. The latter is beyond the capability of present HDR capture. However, the first, namely rendering the color of any local area in a scene constant irrespective of whether it is brightly lit or shaded, is evidenced by our two figures. No local or global color correction of HDR log images is needed to compensate the illumination. The offset by log luminance is the only operation performed to obtain the improved images in Figs. 1.8 and 1.9. Chapter 4 addresses HDR eye-like contrast and color in more detail.

Our paradigm of "HDR photometric video capture mimicking the eye's response" offers unique features, summarised here with references to the further treatment and important applications in this book:

1. Each pixel is a photometric log-compressing sensor with a dynamic range of 8 orders of magnitude (Chap. 2).
2. All pixels in the sensor have the same OECF (Chap. 2) due to calibration in real time.
3. Because of the very high dynamic range, there is no aperture or shutter (integration) time control (Chaps. 2, 3, 5, 6, 9 and 10).
4. The sensors have good low-light sensitivity, because the pixels operate basically at the shot-noise limit of their photodiodes (Chap. 3).
5. Constant contrast sensitivity over many decades of grey facilitates robust object detection in high-speed machine vision (Chaps. 5 and 6), in traffic (Chap. 8) and in endoscopy (Chap. 9).
6. Pixel output is the sum of log-luminance and log-reflectance (R, G, B) (Chap. 4).
7. Log-luminance is available directly for each pixel (Chaps. 4, 9, 11 and 12).
8. Color per pixel is constant irrespective of luminance level (Chaps. 4, 7, 8, 9, 11 and 12).
The disruptive technology of HDR photometric video capture summarised in this chapter impacts the design of cameras and optics, and it enables entirely new, high-performance vision systems. HDRC video cameras are presented in Chaps. 5, 7, 8 and 9. Lenses for HDR video are discussed in Chap. 6. Some of the most demanding applications of HDR vision were selected for this book. The video-based guidance of aircraft is presented in Chaps. 5 and 6. Chapter 7 is an in-depth treatment of high-speed HDR machine vision for the inspection of complex, highly reflective parts. Chapter 8 offers a perspective on automotive driver assistance and the importance of HDR vision in this context. All the unique features of our new paradigm come to bear in new miniature cameras for minimally invasive endoscopy.

It is a very special phenomenon that we return to the impaired eye what we have learned from the healthy eye, namely a subretinal implant, in which HDR log-compressing pixels generate a compatible stimulus for the retinal network as a partial replacement for deceased photoreceptors (Chap. 10).

Direct HDR photometric, log-compressing video capture with its film-like response, log luminance, natural bit-rate compression, constant contrast sensitivity and constant color has formidable implications for the video chain (Fig. 1.10), consisting of coding, transmission, tone mapping and display. Chapters 11 (tone mapping) and 12 (coding) are critical reviews of these dynamic areas with an emphasis on comparative evaluations and original contributions which have a special affinity to HDR photometric capture. The HDR display presented in Chap. 14 is not only the one with the widest dynamic range, but it is also the most illustrative inverse of the separation of log luminance and color in HDR video capture: what is displayed to us is basically the multiplication of regional luminance and high-resolution color.

To conclude this introductory chapter, and in view of the many disciplines coming together in this book, we point the reader to the glossary of terms and abbreviations as well as to the comprehensive index in the back.
Fig. 1.10. The HDR vision chain from video capture through processing to display (blocks in the diagram: acquisition with sensor and focal-plane processing; processing with coding and tone mapping; video player; display)
References

1. Seger U, Graf HG, Landgraf ME (1993) Vision assistance in scenes with extreme contrast. IEEE Micro 13: 50–56
2. Seger U, Apel U, Höfflinger B (1999) HDRC imagers for natural visual perception. In: Jähne B, Haußecker H, Geißler P (eds) Handbook of computer vision and applications, volume 1: Sensors and applications. Academic, San Diego, pp. 223–235
3. Reinhard E, Ward G, Pattanaik S, Debevec P (2006) High dynamic range imaging. Elsevier, Amsterdam
2 The High-Dynamic-Range Sensor

Bernd Hoefflinger and Verena Schneider
2.1 General Considerations

The pixels of a sensor imaging our world can be hit by light intensities spanning a dynamic range from dark to bright of over seven orders of magnitude. If we want to record such scenes electronically without loss of information, with manageable amplitudes and without the failure of over- or under-exposed pixels, we have to find a means by which the response to such intensities is an output signal proportional to the logarithm of the intensity. Such a response characteristic would be a straight line in a plot of the sensor signal versus the log of intensity (Fig. 2.1). This figure shows the response curves, also called optoelectronic conversion functions (OECF), of several image sensors on this scale.

Our eyes have an instantaneous dynamic range of typically 200,000:1. In the category of film, negative film has the widest dynamic range, of about 10,000:1. The S-shaped curve of the film shows that it has a log-type response in its central section, followed by curved tails at the low- and high-light-intensity sections. The CCD sensor response is shown in this diagram as a curved line covering a range of less than 1000:1. This type of sensor has a linear response, resulting in the curved shape in our semi-log plot. Also shown is the curve of an HDRC sensor with a very wide dynamic range exceeding 7 orders of magnitude. This type of sensor will be treated extensively in the following.

At this point, we want to focus our attention on the fundamental significance of log-response characteristics. For this purpose, we assume that the output signal from our sensor can be detected with a basic resolution of one step in a total range of 256 steps, in other words, an 8-bit output signal. For any log-response sensor (linear in our semi-log plot), the result is a constant number of output steps per decade of intensity. This means that, independent of the grey level, we have 82 DN per decade in the log sensor (see the example of Fig. 2.2b below).
Fig. 2.1. Response curves of image sensors: density versus logarithmic intensity (lux, from 0.001 to 10^6) for film(1,2), the eye, a CCD(1) and the HDRC VGAx with a dynamic range of 170 dB (1 millilux to 500 klx). (1) Exposure time 30 ms; (2) 100 ASA
For a linear sensor (a), the sensitivity to cover a dynamic range of 1000:1 strongly depends on the signal level. For the upper decade, from white to grey, we have 230 DN available, but only 2.5 DN are available for the decade at the dark end, and 22.5 DN for the range of 1–10% of maximum brightness. In fact, the slope of the optoelectronic conversion function (OECF) is a very important measure of the capability of a sensor to detect fine changes in the optical signal.

In the time domain, we ask ourselves: how big is the change in the electronic (digital) signal from a pixel if the incident light intensity changes by 10%? In the spatial domain, we ask ourselves: how big is the difference in the electronic (digital) output signals of two neighbouring pixels if their optical input differs by 10% due to, for instance, an edge of the imaged object.

In Fig. 2.2, we illustrate the spatial aspect by displaying the difference in the digital output signals (DN) from two adjacent pixels whose optical inputs differ by 10%. We assume an 8-bit digital output, that is, a maximum signal of 255 DN; and we assume that our white signal is 100 lx and that we try to handle a dynamic range of 1,000:1. The linear OECF has a slope of 2.5 DN per lux. At maximum white the two pixels see 100 and 90 lx, respectively, and we get a healthy difference of 25 DN in the output signals from the two pixels. If the illumination of our test body is reduced to 1/10, one pixel receives 10 lx and the other 9 lx. The difference in the output signals is now reduced to 2.5 DN and is thus barely detectable. Finally, the 10% difference will just be detectable at 4 lx, where we would receive 1 LSB (1 DN) as the difference in the output signals.
Fig. 2.2. OECF of (a) a linear sensor and (b) a logarithmic sensor for 8-bit output; the logarithmic sensitivity in (b) is 80 DN per decade
If our imaging system is supposed to detect a difference of at least 10%, the linear optoelectronic conversion function with an 8-bit output would provide a dynamic range of 25:1 (from 100 to 4 lx).

In the case of an imaging system with a logarithmic OECF with 8-bit output, in the example of Fig. 2.2b, we choose the output range of 255 DN to represent incident optical intensities from 0.09 to 100 lx. Our choice means a slope of 82 DN decade−1 of optical intensity, a significantly different way of thinking from the linear OECF. Specifically, the slope is not expressed in DN lx−1, which depends directly on the input level and therefore decreases rapidly for smaller inputs. We now choose a constant slope for relative changes, independent of the absolute input signal level. We span the range of 10–100 lx with 80 DN, and we get 80 DN to span the range of 1–10 lx. For the 10% input difference between our two pixels, we now get 8 DN independent of the magnitude of the input. If our object receives very little illumination, so that the two pixels see 0.09 and 0.1 lx, respectively, we still receive an output difference of 8 DN. Thus, we can safely detect a 10% difference in the dark, giving us a dynamic range of 1,000:1 that meets the 10% criterion.

For a general OECF y = f(x), where x is the optical input and y is the electronic output, we have to ask ourselves: for a given optical input x, what is the smallest difference ∆x which would result in a noticeable change ∆y of our output signal,

∆y = (df/dx) · ∆x.

The relative difference is the contrast sensitivity:

∆C = ∆x/x = ∆y/(x · df/dx).
In the case of a logarithmic OECF,

y = a · ln x,                                          (2.1)

this becomes

∆C = ∆y/a = 1/a,                                       (2.2)

if ∆y = 1 LSB in the case of a digital output. If we have a digital output of N bit, we can trade ∆C against the dynamic range of M decades of input intensity:

∆C = 1/a = M · ln 10 / (2^N − 1).                      (2.3)
From Table 2.1, we learn the power of the logarithmic OECF. The table lists, for an input dynamic range of M decades and an output signal range of N bit, the contrast sensitivity, i.e. the percentage input change which will be noticed as a change of 1 LSB (1 DN) in the output. A dynamic range of M = 6 orders of magnitude, or 1 million:1, corresponding to 20 bit of optical input, is mapped to an electronic output with a range of N = 12 bit resolving 0.35% changes of the optical input over the full input range from white to dark. For a 10-bit output, a change of 1.4% would produce 1 LSB. This example of a logarithmic OECF is therefore also said to offer a compression of a 20 bit input to a 10 bit output.

In any real-world system, as the incident optical intensity x decreases, eventually our electronic output signal y will settle to y → 0 for x → 0, as shown in Fig. 2.3. We can write this OECF as

y = a · ln(1 + x).                                     (2.4)
The extrapolation of the log characteristic with the slope a for larger inputs x ≫ 1 intersects the x-axis at x = 1. Figure 2.3b shows the contrast-sensitivity function (CSF) ∆C for this OECF.

Table 2.1. Contrast sensitivity (percent) of a logarithmic OECF with an input range of M decades and an output of N bit

Input M decades   8 bit   9 bit   10 bit   11 bit   12 bit
2 Dec             1.8     0.90    0.45     0.225    0.112
3 Dec             2.7     1.35    0.675    0.336    0.168
4 Dec             3.6     1.8     0.900    0.45     0.225
5 Dec             4.5     2.25    1.175    0.56     0.280
6 Dec             5.4     2.7     1.350    0.67     0.336
7 Dec             6.3     3.15    1.575    0.784    0.392
8 Dec             7.2     3.6     1.80     0.9      0.45
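Equation (2.3) is easy to evaluate; the short sketch below (Python, added for illustration) reproduces the entries of Table 2.1 for an M-decade input range mapped onto an N-bit logarithmic output.

```python
import math

def contrast_sensitivity_percent(decades, bits):
    """Relative input change (in percent) that produces 1 LSB, Eq. (2.3)."""
    return 100.0 * decades * math.log(10.0) / (2**bits - 1)

print("M \\ N " + "  ".join(f"{b:>2d} bit" for b in range(8, 13)))
for m in range(2, 9):
    row = "  ".join(f"{contrast_sensitivity_percent(m, b):6.3f}" for b in range(8, 13))
    print(f"{m} Dec  {row}")
# e.g. M = 6 decades with N = 12 bit gives 0.337 %, the ~0.35 % quoted in the text.
```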
Fig. 2.3. (a) OECF of a natural eye-like image sensor: digital output y (DN) versus optical intensity x for y = 100 · ln(1 + x). (b) CSF of a natural eye-like image sensor: contrast sensitivity ∆C versus optical intensity x, with ∆C = (1 + x)/(100 · x)
Obviously, for x sufficiently larger than 1, we obtain the constant value 1/a. At x = 1, ∆C increases to 2/a. At x ≈ 0.1, ∆C increases to 11/a, rising to 1 in our example at x = 1/a for a ≫ 1. At such a low optical input, we would have to double the input intensity for a noticeable change of 1 LSB in the output.

For log-response sensors we will often plot the OECF as y = f(log x) (digits = DN) and its derivative

S = dy/d(log x)  (DN decade−1) = x · dy/(log e · dx).

To convert S into the contrast sensitivity

∆C = 100/(x · dy/dx)  (%),

we observe that S = 1/(∆C · log e), with ∆C taken as a fraction. Therefore, we represent the CSF either as S (DN decade−1) or as ∆C (%). For example, a contrast sensitivity of 1 or 2% is equivalent to 230 or 115 DN decade−1, respectively.
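These relations are summarised in the sketch below (Python, for illustration; ∆C is handled as a fraction): the contrast sensitivity of the OECF y = a · ln(1 + x) is ∆C = (1 + x)/(a · x), and a slope of S DN per decade corresponds to ∆C = 1/(S · log e).

```python
import math

LOG_E = math.log10(math.e)   # ~0.434

def csf(x, a):
    """Contrast sensitivity (as a fraction) of y = a*ln(1 + x) for a 1-LSB output change."""
    return (1.0 + x) / (a * x)

def slope_dn_per_decade(delta_c):
    """S = 1/(delta_C * log e); e.g. delta_C = 1% -> ~230 DN per decade."""
    return 1.0 / (delta_c * LOG_E)

a = 100.0
for x in (100.0, 1.0, 0.1):
    print(f"x = {x:6.1f}: delta_C = {100.0 * csf(x, a):5.2f} %")
# x >> 1 gives ~1/a, x = 1 gives 2/a, x = 0.1 gives 11/a, as stated above.
print(round(slope_dn_per_decade(0.01)), round(slope_dn_per_decade(0.02)))  # 230 115
```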
Contrast sensitivity (percent)
100
10
2 1
10−3
10−2
10−1
1
10
Adapting luminance in
102 [cd/m2]
Fig. 2.4. The CSF of the human eye
103
104
2 The High-Dynamic-Range Sensor
19
What Fig. 2.4 tells us is that in human vision minimum perceivable differences are about 1% over several decades of luminance at levels higher than 10 cd m−2 and, that in this range, which is also called the photopic response, the human eye has a logarithmic OECF. Its spontaneous response is limited at about 0.3 cd m−2 (10 mlx), reaching out to mcd m−2 with long-term adaptation. We will use the eye characteristic of Fig. 2.4 as nature’s guidance to the electronic implementation of high-dynamic-range (HDR) image sensors.
2.2 The HDRC (High-Dynamic-Range CMOS) Pixel Silicon integrated-circuit technology has been the basis of electronic image sensors for over 30 years for two dominant reasons: 1. Silicon photodiodes are the best visible-light detectors. 2. Metal Oxide Silicon (MOS) technology facilitates the storage and loss-free transport of photo-generated electronic charges. The pn junction of silicon photodiodes can act efficiently as converter of visible-light photons into electronic charges. The quantum-efficiency and spectral-sensitivity diagram (Fig. 2.5) shows that for each photon absorbed in the pn junction almost one electron can be obtained. Furthermore, silicon integrated-circuit technology as the world’s dominant microelectronic manufacturing technology allows the closely spaced integration of millions of such photodiodes on a single chip. The separation and isolation of these diodes in array-type image sensors has been and is continuously optimized to reduce leakage currents, which are the primary cause for the detection limit. Leakage-current figures of merit as low as 10 pA cm−2 have been realized at room temperature. The currents have a characteristic dependence on temperature where silicon provides the best compromise among all semiconductors. A rule of thumb for this temperature dependence is that leakage currents double every 8–10◦ C in the temperature range –40 to +100◦ C. The generated photo charges emerge from the photodiode as a photocurrent, which is strictly proportional to the incident photon flux over many orders of magnitude of incident intensity. In spite of near-ideal optoelectronic conversion, the generated photocurrents are exceedingly small. Table 2.2 provides a listing of some useful relations and quantities. Based on these numbers, a photodiode with an area of 25 µm2 illuminated by moonlight of 10 mlx will offer a photocurrent of 0.13 fA. This is equivalent to just 800 e s−1 . Such small currents cannot be sensed directly as currents, and direct current amplifiers at these levels are also not feasible in mainstream microelectronics. This mainstream is MOS technology. For reasons of power consumption and noise immunity, complementary MOS (CMOS) technology
20
B. Hoefflinger and V. Schneider 100% 0.7
90% icie
0.6 Spectral Sensitivity [A / W]
y nc
m
tu an
Eff
80%
Qu
0.5
70% 60%
0.4
50% 0.3
40%
VG
Ax
0.2
30% 20%
0.1 10%
200
300
400
500
600
700
800
900
Wavelength [nm] Visible Light
UV C
B
A
Blue
Green Yellow Ora.
IR Red
Fig. 2.5. Quantum efficiency and spectral sensitivity vs. the wavelength of light. Also shown: The characteristic of an integrated silicon photodiode with an area of 40 µm2
Table 2.2. Parameters of a reference pixel Pixel area Photodiode (PD) area Capacitance Dark charge Dark shot noise Conversion efficiency Sensitivity Reset noise Quantization (1 DN)a Conditions: T = 20◦ C, T int = 25 ms a
Linear 10 bit
(µm2 ) (µm2 ) (f F) (e) (e) (e mlx−1 ) (DN mlx−1 ) (V lx−1 s−1 ) (e) (e) (mV)
7.4 × 7.4 5.0 × 5.0 2.5 20 4.5 2 0.33 5.4 20 6 0.4
2 The High-Dynamic-Range Sensor
21
has a share of over 90% of the integrated-circuit world market today. In MOS integrated circuits, we manipulate tiny electronic charges on tiny capacitors, measure the voltage changes on those capacitors and subsequently amplify and digitize these voltages. The key active component facilitating these operations is the MOS transistor. It functions not only as an ideal voltage-controlled current amplifier, it is also a voltage-controlled switch with a very high-off resistance and a very large on–off resistance ratio. The control gate of the MOS transistor can also serve as a storage capacitor. As soon as millions of such transistors per chip were realized as mass products in the eighties of the twentieth century their combination with photodiodes became a major development area in what has since been called active-pixel image sensor (APS). For HDR sensing, we focus on three observations: 1. We stated before that the signal type of modern microelectronics is a voltage signal. 2. The optical input x, namely photon flux, is converted in the silicon photodiode into a photocurrent with strict proportionality. 3. The electronic output signal y of our pixel should be a voltage and the optoelectronic conversion function (OECF) should be logarithmic and, therefore, a pixel should convert the photocurrent x into a voltage y, which would be proportional to the logarithm of the photocurrent x: y = a · ln(1 + x).
(2.4)
How can we accomplish this? Figure 2.6 shows an NMOS transistor, basically a four-terminal device. The current of interest is the drain current ID. We consider two voltages, the gate-to-source voltage VGS and the drain-to-source voltage VDS. Dependencies between these voltages and the drain current in the predominant modes of operation of the MOS transistor have either linear or parabolic relationships. However, for small control voltages VGS the drain current decreases or increases exponentially as VGS is reduced or increased. This is normally considered an unfavorable “leakage” or “parasitic” current. However, a very well-established and controlled physical mechanism is responsible for this relationship. Electrons from the source have to cross a barrier towards the channel and the collecting drain electrode. This barrier height is modulated by VGS and VDS according to the following relationship [1]: ID ≈ I0 exp[(A.VGS + B.VDS)/nVt ],
(2.5)
where A and B depend on the length of the transistor channel, the thickness of the gate oxide, the doping of the transistor channel and of the transistor well and on VBS. Representative values are A = 0.6 and B = 0.2. A typical value for n is 1.5 and Vt is the “thermal” voltage kT /q, which is 26 mV at
22
B. Hoefflinger and V. Schneider
(a)
(b)
VDS
VDS
G VGS
S
D
ID
G
ID
VGS
n+
n+
ID
ID D
S
p-Well
B
p+-substrate
B
(c) 10−4
Drain Current ID (A)
10−6
10−8
10−10
10−12
10−14
10−16 0
0.5
1
1.5
2
Gate-Source Voltage VGS (V)
Fig. 2.6. NMOS transistor: (a) cross section, (b) symbol, (c) typical ID (VGS) characteristic for VGS = VDS
2 The High-Dynamic-Range Sensor
23
T = 300K. A typical characteristic is shown in Fig. 2.6 for the conditions VGS = VDS and VBS = 3.0 V. Evidently, this transistor provides a drain-source voltage VDS, which is the precise logarithm of the drain current ID over at least 8 orders of magnitude. In this case, the transistor actually operates as a highly nonlinear resistor. The essential barrier control (2.5) is provided by the voltage difference between gate and source and the gate is actually connected to the drain: VGS = VDS. The photodiode current is injected at the source electrode of the MOS transistor T1 in the pixel diagram Fig. 2.7. In fact, the source area of the transistor can be extended to become the photosensitive cross-section of the pixel. In this fashion, contacts are saved and density is gained in the pixel cell. The output signal voltage is available at the source of T1 and this voltage becomes the gate control voltage of a second transistor T2, which provides the charge and discharge current for the read-out electronics of the sensor. A select transistor T3 operating as a switch completes the pixel cell as shown in Fig. 2.7. This figure also contains a cross-section through the pixel, VDD T1
T2
SEL OUT
Vlog
T3
VSS
Vlog SEL
VDD n+
n+ T1
n+ T2
OUT
n+ T3
p-well
p+-substrate
VSS
Fig. 2.7. HDRC pixel diagram and cross-section
24
B. Hoefflinger and V. Schneider
which shows the P- and N-type regions of the diode and the transistors as well as all the gates, contacts and interconnects needed. A HDRC sensor with 64 × 64 pixels of this type was first realized and published in February 1992 [2] and cameras were then tested on vehicles to record high-dynamic-range scenes on the road including tunnels. For the performance assessment of the HDRC pixel, we consider a reference pixel, which has the parameters given in Table 2.2. The simplicity of the HDRC pixel as shown in Fig. 2.8 allows the direct measurement of the current in the branch of the pixel consisting of the series connection of the log transistor T1 and the photodiode as a function of the illumination (Fig. 2.8). This data is obtained from measurements on a 768 × 496 pixel HDRC sensor under flat-field illumination. On the straight-line section, we find a slope of 15 fA lx−1 . This is equivalent to a spectral sensitivity of 0.3 A W−1 . The voltage signal V log is obtained basically after a string of unity-gain amplifiers (pixel follower T2, mux1, and output amplifier) at the output of the sensor, and measured results are shown in Fig. 2.9. This is the OECF of the HDRC sensor. This data spans more than 7 orders of magnitude in the semi-log diagram. It shows the perfect logarithmic behavior and an overall shape similar to the model OECF as shown in Fig. 2.3. The output is given both as a voltage (mV) and as digital numbers (DN) with a 10 bit range. We can readily extract the CSF as given in Fig. 2.9b, which is the slope of Fig. 2.9a. 10−11
1.6*105
10−12
1.6*104
1.6*103
1.6*102
1.6*101
1.6*100
Number of Electrons per 25 ms Current [A]
1.6*106
10−13
10−14 10−15
10−16
10−17 10−2
10−1
100 101 102 Illumination x [Lux]
Fig. 2.8. Current in the HDRC 10 µm pixel
103
104
2 The High-Dynamic-Range Sensor
25
(a) 800 Continuous Mode 750 700
Digital Output [DN]
650 600 550 500 450 400 350 300 10−4
10−2
100 Illumination x [Lux]
102
(b) 120 Continuous Mode
Sensitivity [DN/decade]
100
80
60
40
20
0 10−4
10−2
100 Illumination x [Lux]
102
Fig. 2.9. (a) The OECF of the HDRC 10 µm pixel (b) The CSF of the HDRC reference pixel
26
B. Hoefflinger and V. Schneider
In this setting, the output signal range covers more than 7 orders of magnitude or 140 dB, and the contrast sensitivity obtained is 1%, which corresponds to our expectations for a 10 bit output according to Table 2.1. The characteristics show that the HDRC pixel, as simple as it is, has the capability of recording an extremely wide dynamic range. As we see in Fig. 2.9, our reference pixel has a leakage current of 0.5 fA resulting from the photodiode and from the log-node contact. The photocurrent is equal to this “dark” current at the 3 dB point of the OECF, which we find has this magnitude at an illumination of 20 mlx. At this point on our OECF, the contrast sensitivity deteriorates by a factor of 2 compared with the log-response part of the OECF. At 2 mlx, the contrast sensitivity is 1/10 of the sensitivity in the log-region or, in other words, a grey level change at this low illuminance would have to be 10% to be just noticeable. The OECF in Fig. 2.10 of our reference HDRC pixel has the ideal model shape according to (2.4). The output signal y with a DN range 0–1,023 has a contrast sensitivity of 110 DN decade−1 of illumination. The HDRC pixel realizes the OECF of the human eye and, when we compare the critical 3 dB points, we find that the electronic pixel with a 3 dB point of 0.6 cd m−2 (20 mlx) performs better than the human eye where this point is at about 10 cd m−2 (340 mlx) for the spontaneous sensitivity. We should also point out here that the HDRC pixel sensitivity and minimum detectable level are basically independent of any exposure or integration time. The pixel follows the photocurrent and its changes continuously in the basic mode of HDRC operation. We can read the voltage Vlog randomly at any time by accessing the pixel and reading out the voltage VOUT. Having shown that the HDRC pixel has the ideal logarithmic eye-like conversion function, we move on to study a HDRC array sensor. 8
1
4
V
x1 2
5 MUX
6 Gain
7
A D
3
Controller
Data Out Control In
Iph 9 FPC
2. Photodiode
6. Amplifier
3. Pixel capacitances
7. Video A/D converter
4. Pixel buffer
8. Controller
5. Multiplexer
9. Fixed-pattern correction memory
Fig. 2.10. Block diagram of HDRC sensor
2 The High-Dynamic-Range Sensor
27
2.3 The HDRC Sensor The basic HDRC pixel as shown in Fig. 2.7 does not need any clock or reset signals. It monitors the optical input (the photocurrent) continuously. The dynamic range is so wide that no camera aperture or shutter time adjustments are necessary. This means that the HDRC pixel basically does not need any control or adjustment loops with critical timing requirements. In fact, the HDRC pixel never “blinks”, that is to say, it cannot be caught off guard, it cannot be over- or underexposed, it gives a continuous ready-any-time response to its optical input. This fact also makes the HDRC pixel very easy to integrate in a large-array image sensor with full random access to any pixel any time. The basic block diagram of an HDRC sensor is shown in Fig. 2.10. This sensor requires just one clock signal and a single voltage supply. It operates in the progressive scan mode with pixels being read out synchronous with the chip clock. Again, there are no control loops and restrictions on read-out. The pixel access time is constant, in the reference sensor it is 80 ns. The systematic characterization of this HDRC sensor starts with the sensor output under flat-field illumination. Figure 2.11 shows an overlay of the conversion functions of the 400,000 individual pixels in our reference sensor. The result is a medium characteristic Raw Data
600
Mean Value −3 Sigma +3 Sigma
AD-Output (DN)
500
400
300
200
100 10−2
100
102
104
Illumination in Lux
Fig. 2.11. Overlay of the OECFs of the 400,000 pixels of a 768 × 496 pixel HDRC sensor
28
B. Hoefflinger and V. Schneider x 104 5 Distribution of raw data in the dark Distribution after Single-Point Offset Correction Distribution of raw data in the bright Distribution after Single-Point Offset Correction
4.5 4
Absolute Occurance
3.5 3 2.5 2 1.5 1 0.5 0
0
100
200
300
400
500
600
700
AD-Output
Fig. 2.12. Dark and bright histogram of the sensor fixed pattern
similar to the one shown in Fig. 2.10 with a corridor around it, showing deviations of individual pixels from the mean. The analysis and corrections of this fixed pattern is important in order to obtain high-quality image acquisition. Figure 2.12 shows a dark histogram and a bright histogram for the fixed pattern. An effective fixed-pattern correction (FPC) requires an understanding of its causes. Two independent phenomena are present: 1. The distribution at low light levels is dominated by the distribution of the dark currents of the pixel photodiodes. 2. The distribution at bright levels is caused by distributions of transistor parameters, mostly the distribution of their so-called threshold voltages VT. In either region, the deviations from the mean function can be treated as voltage offsets, VOFFD in the dark section and VOFFB in the bright section. Because of the different causes of these distributions, namely, either the photodiode or on the other hand the transistor, it is important to note that these offsets are not correlated within a pixel. In fact, a pixel may have a positive VOFFD and a negative VOFFB: This means that two or more correction values have to be stored per pixel and the selection between the two should take place at a threshold near to the
2 The High-Dynamic-Range Sensor
29
5 Multipoint - 2 Measuring Points 4.5
Sigma in Digits [DN]
4 3.5 3 2.5 2 1.5 1 0.5 0
10−2
100 Illumination x [Lux]
102
104
Fig. 2.13. Standard deviation after fixed-pattern correction with VOFFD and VOFFB per pixel
3 dB-point of the conversion function. If such a correction is implemented, the overlay of the response curves is shown in Fig. 2.13. At this correction level, the standard deviation from the mean OECF is reduced to better than 3 DN in the range from 1 mlx to 5 klx. More refined fixed-pattern correction will be described in the following section: For the HDRC sensor, adding an offset per pixel enables several other powerful functions besides fixed-pattern correction, and it teaches us some unique properties of logarithmic sensor outputs. In Fig. 2.14, we show that we introduce the offset memory as a digital memory behind the analog-to-digtal converter. Digital storage and arithmetic are robust and exceedingly cost-effective. We also show the camera lens and the scene illumination as important components of our imaging system. Local optical intensity O(λ, x, y) incident on pixel (x, y) is the product of many components in the path of light rays falling on this pixel: The illuminance L(λ, x, y) illuminating that part of the imaged object The reflectance Refl (λ, x, y) of that part of our imaged object The local transmittance T (λ, x, y) of the lens and coatings effective for the pixel (x, y) The spectral sensitivity α(λ, x, y) of the photodiode (x, y).
30
B. Hoefflinger and V. Schneider
Pixel Lens
MUX
A
+ D
Optimized Video Output V
Pixel Balance Memory
Fig. 2.14. Block diagram with digital offset correction memory
As a result, the optical intensity falling on our pixel is O(λ, x, y) = L(λ, x, y) × Refl(λ, x, y) × T (λ, x, y).
(2.6)
This intensity produces a photocurrent in the pixel Iph (x, y) = α(λ, x, y) × O(λ, x, y).
(2.7)
If the signal is big enough so that we are in the logarithmic section of our conversion function, the sensor output voltage V (x, y) for pixel (x, y) will be V (x, y) = a log Iph (x, y) = a(log L + log Refl + log T + log α) + F P (x, y),
(2.8)
where FP(x, y) is the fixed-pattern offset of pixel (x, y). Equation (2.8) can be considered as the master equation of HDR logarithmic imaging. Let us first look at the effect of illuminance and aperture and let us assume that the illuminance is uniform over our observed scene. We see that, in the HDRC-sensor output, illuminance as set by the lens aperture comes in just as an offset. If the aperture is changed by one f -stop (a factor of 2) or if the illuminance changes by a factor of 2, the HDRC sensor output will change be an offset of a log 2, which is a small change if we have a dynamic range of 27 f -stops. On the other side, it means that any recording with the wrong aperture can be ajusted by just an offset to obtain the desired output. Figure 2.15 is an example of these effects. The grey chart has been recorded with varying the camera aperture from f 1.4 to f 16 that is by 7 f -stops. In Fig. 2.16a the recorded grey levels change significantly. However, the contrast
2 The High-Dynamic-Range Sensor
31
1.8
4
1.8
4
8
16
8
16
Fig. 2.15. Grey chart recordings with apertures from f 1.4 to f 16. Left: Raw digital image. Right: Images after a digital offset proportional to log aperture HDRC
(a) Original HDRC® Frame
HDRC
(b) Log Illuminance (Spotlight)
HDRC
(c) HDRC® Frame after Log Illuminance Offset
Fig. 2.16. Scene with spotlight illumination: (a) raw HDRC from output. (b) HDRC data equivalent to the log of the illumant (c) “Standard Appearance” after subtracting the illuminant
steps are well resolved even at f : 16. In Fig. 2.16b, the four images are shifted by the appropriate offsets resulting in identical grey-chart appearance. Now, consider a scene illuminated by a spotlight, that is with a highly nonuniform illumination. This is quite apparent in the face imaged in Fig. 2.16a. However, we recorded the illuminant distribution in the plane of the face with putting a white surface there and we obtained the distribution of Fig. 2.16b. If we subtract Fig. 2.16b from Fig. 2.16a, we obtain the image in Fig. 2.16c, which now has the appearance we would get from an elaborate balanced studio illumination. This illuminance equalization is quite straightforward and very effective in rendering objects with their characteristic appearance not distorted by changing lighting conditions. The recording also shows that, in spite of the spotty illumination, bright regions are free from white saturation, that rich detail is available in deep shades and that we can pull up this information with illuminance equalization performed as a simple offset operation. We also notice that wavelength – dependent sensitivity α (Red, Green, Blue) of the pixel can be easily adjusted by an offset contrary to a pixel with a linear aperture function, where we would have to apply multiplicative corrections.
32
B. Hoefflinger and V. Schneider
Fig. 2.17. Left: Scene taken with the HDRC sensor without any fixed-pattern correction. Right: Scene taken with the HDRC sensor with fixed-pattern correction
The exposure of the HDRC sensor to optical input with a spatially high dynamic range ratio raises concern about crosstalk between neighboring pixels. Measurements have shown the second and third neighbor of a brightly illuminated pixel record to be just 1% and 0.03% of that intensity. These results are satisfactory considering other factors limiting the spatial resolution of image features.
2.4 Fixed-Pattern Correction of HDR Imagers Due to the production process, each pixel has a different OECF with different parameters. This results in a speckled image when illuminating a CMOS image sensor with homogeneous light. Recording normal scenes with the HDRC sensor, fixed-pattern noise degrades the image to an extend that the scene can hardly be recognized (see Fig. 2.17). Without corrective measures the image will have low contrast. Consequently, in order to use HDR imagers, fixed-pattern noise must be corrected. 2.4.1 Physical Background of Logarithmic OECF The OECF used for the fixed-pattern correction algorithm is more complex than the one previously mentioned. In order to achieve high-quality correction results, it is important to consider the following representation of the OECF: y = a + b · ln(x + c). (2.9) Parameter a presents the offset, b the gain and c the illumination at the 3 dB-point. In order to understand the present approach, it is important to know the physical behavior of the pixel. To get a logarithmic OECF, the conversion transistor must work in weak inversion, defined by the following relationship [3]:
2 The High-Dynamic-Range Sensor
ID ≈ Imax ·
VGS − Vthresh . n · Vt
33
(2.10)
ID represents the drain current, Imax the maximum current in weak inversion, VGS the gate-to-source voltage, Vthresh the threshold voltage, n the subthreshold slope factor and Vt the thermal voltage (26 mV at 300 K). The threshold voltage defines at which point the transistor changes from weak inversion into strong inversion mode. The pixel produces a photocurrent out of the incident photons, which is then transformed into a voltage. The total current consists of the photocurrent and the dark current. (2.11) ID = IDark + IPhoto . As the total current corresponds to the drain current and the gate–source– voltage to the output, (2.10) has to be changed as follows: IPhoto + IDark VGS = n · VT · ln (2.12) + Vthresh . Imax Equation (2.12) can be further transformed VGS = aph + bph · ln(IPhoto + cph ),
(2.13)
aph = Vthresh − ln(Imax ) · n · VT , bph = n · VT cph = IDark . As (2.13) is based on currents and voltages, new parameters aph , bph and cph are defined. The extended OECF of (2.9) corresponds to them. This means that offset a of the OECF mainly depends on the threshold voltage and gain b on the subthreshold slope. The illumination at the 3 dB-point is directly connected to the dark current. 2.4.2 Parameter Extraction with Software In order to get parameters a, b and c the sensor is measured with three different illuminations, the first one in the dark, e.g. 0.001 lx. This point is especially important to calculate parameter c. The medium illumination at, e.g., 10 lx and the bright one at, e.g., 10,000 lx are needed to define parameters a and b. Following the measurements, the data for each pixel is used to get the individual parameter set by using the following equations: b=
ybright − ymedium , ln(xbright ) − ln(xmedium )
(2.14)
a = ybright − b · ln(xmedium ), c=e
ydark −a b
− xdark ≈ e
ydark −a b
.
34
B. Hoefflinger and V. Schneider
(a) 700
600
Digital Output
500
400
300
200
100
0 10−4
10−2
100
102
104
Illumination in Lux
(b) 700
600
Digital Output y [DN]
500
400
300
200
100
0
10−2
100
102
104
Illumination in Lux
Fig. 2.18. (a) Individual pixel characteristic without correction shown for 35 pixels (b) Individual pixel characteristic with correction shown for 35 pixels
2 The High-Dynamic-Range Sensor
35
In these expressions x always represents the illumination, e.g., xdark = 0.001 lx and ydark is the corresponding output value of the pixel. Figure 2.18a shows that without any correction all measured 35 pixel cells have a different characteristic curve. The aim of the correction is to get, in the optimum case, one single output characteristic shown in Fig. 2.18b. In order to get such a reference curve, the algorithm takes the mean values of the individual parameters ai , bi and ci . Finally, the correction is done with the inversion function (2.15). y −a i roh −1 bi ¯ ¯ + b · ln e − ci , f (y) = a (2.15) y−ai ysoll = a ¯ + ¯b · ln e bi − ci + c¯ . 2.4.3 Effects of Parameter Variation on the OECF The following figures show the effects of changing these parameters independently. Figure 2.19 demonstrates the importance the variation in illumination plays in the three parameters: parameter a is always important. This means that the fixed-pattern correction algorithm is always a decisive factor. If parameter b varies between the pixels, fixed-pattern noise is mainly caused in bright conditions (see Fig. 2.20). The opposite effect can be seen when changing the 3 dB-Point. This causes high noise in the dark (see Fig. 2.21). 6 5
y = In(x + 0.5) + 0.5 y = In(x + 0.5) y = In(x + 0.5) − 0.5
4
AD−Output
3 2 1 0 −1 −2 10−3
10−2
10−1
100
Illumination in Lux
Fig. 2.19. Variation of offset a
101
102
B. Hoefflinger and V. Schneider 7 6
y = 1.5*In(x + 0.5) y = In(x + 0.5) y = 0.5*In(x + 0.5)
5
AD−Output
4 3 2 1 0 −1 −2 10−3
10−2
10−1
100
101
102
101
102
Illumination in Lux
Fig. 2.20. Variation of gain b 5
4
y = In(0.5 + x) y = In(1 + x) y = In(1.5 + x)
3 Digital Output
36
2
1
0
−1 10−3
10−2
10−1
100
Illumination in Lux
Fig. 2.21. Variation of the 3dB-point c
2 The High-Dynamic-Range Sensor
37
2.4.4 Presentation of Three Correction Algorithms In the literature three correction algorithms are named which either consider only a variation of parameter a (1-parameter correction), a correction in the offset and gain (2-parameter correction) or of all three parameters (3parameter correction) [4]. The following set of equations defines the correction values for each algorithm: y1−Parameter = yi − ai + a ¯, yi − ai +a ¯, y2−Parameter = ¯b · bi (yi −ai ) y3−Parameter = ¯b · ln e bi − ci + c¯ + a ¯.
(2.16)
Figure. 2.22 compares the deviation of the 1-, 2-, and 3-parameter correction algorithms. The 1-parameter correction represents the worst case scenario, as it does not consider the gain and 3 dB-point variation. The 2-parameter correction also has a high deviation in the dark; under bright conditions, however, it delivers as good results as the 3-parameter correction. This means that the best correction is the 3-parameter correction, which considers all three parameters. 6 1− Parameter Correction 2− Parameter Correction 3 − Parameter Correction
Sigma in Digits (DN)
5
4
3
2
1
0
10−2
10−0
102
104
Illumination in Lux
Fig. 2.22. Comparison of 1-, 2- and 3-parameter correction algorithm
38
B. Hoefflinger and V. Schneider
2.4.5 New Parameterized Correction Algorithm As the correction algorithm for the 3-parameter method requires exponential and logarithmic calculus (see (2.16)), it cannot be implemented in cheap hardware for everyday use. In order to solve this problem, the newly developed algorithm approximates the logarithmic curve by three lines: Simple methods use two lines for approximation: line L1 in dark and line L3 in bright conditions. Line L1 represents a simple offset correction in the dark, as it does not consider a gain factor. Under bright conditions line L3 approximates the logarithmic curve (Fig. 2.23). It becomes apparent, however, that the error of this simplification is high in the area around the 3 dB-point. In order to solve this problem, we use a third line L2 which is a secant to the logarithmic curve and produces half the gain L3 does. Simulations showed that the secant delivers better results than the tangent through the 3 dBpoint. Line L2 is used for correction between the illumination c < x < 4 ∗ c. All three lines can be calculated with the help of parameters a, b and c: yL1,i = ai + bi · ln(ci ) = lini yL2,i = a2i + bi · ln(x) with a2i = ai + 0.5 · bi · ln(ci ) + bi yL3,i = ai + bi · ln(x).
(2.17)
Approximation of the logarithmic Curve 700 y = a + b*In(c + x)
Digital Output y [DN]
600
y = a + b*In(x) y = a + b*In(c) Tangent a2 = a + 0.5*b*In(2*c) Secant: a2 = a + 0.5*b*In(c) + b
500
400
L3
3 dB−Point: 2x c
300
L2 200 L1 100
10−2
100 Illumination in Lux
102
Fig. 2.23. Approximation of the logarithmic characteristic
104
2 The High-Dynamic-Range Sensor
39
As described before the parameters for each pixel are defined by measurements under at least three different illuminations. The succeeding set of equations describes how the raw pixel data is corrected during runtime. The presented algorithm does not need any logarithmic or exponential calculus and is, thus, easy to implement into hardware yraw, i ≤ thresh1: ycor, i = yraw, i − (ai + bi · ci ) + lin
(2.18)
thresh1≤ yraw, i ≤thresh2: ycor, i = a2 + b2 · thresh2≤ yraw, i :
yraw, i − (ai + 0.5 · bi · ci + bi ) 0.5 · bi
yraw, i − ai ycor, i = a ¯ + ¯b · bi
Figure 2.24 compares the new parameterized algorithm ((2.18)) with the logarithmic 3-point correction (see (2.16)). The analyzed image sensor without correction has a 10-bit resolution and a 22-digit deviation. Thus, the two algorithms deliver very good results and reduce the FPN by a factor 17. In this case, both methods use three measuring points in the dark for further 1.4 Logarithmic Correction New parameterised Correction c, Threshold 1 4*c, Threshold 2
1.2
Sigma in Digits (DN)
1 0.8 0.6 0.4 0.2 0
10−2
100
102
104
Illumination in Lux
Fig. 2.24. Comparison of logarithmic correction with new parameterized algorithm
40
B. Hoefflinger and V. Schneider Table 2.3. Parameter resolution Parameter a b c∗
Resolution 2 2−4 2−5
Table 2.4. Parameter resolution Parameter
Resolution
a b· c∗
2 1 2−6
¯ b bi
improvement in order to calculate parameter c. Consequently, measuring point 4 is at 2.4 lx and measuring point 5 at 2,080 lx. The new algorithm is not as good in the dark as the logarithmic. This is due to the simple offset correction used here. Thresholds 1 and 2 are also indicated in the figure. They show that at those illuminations the approach by approximation leads to minor disadvantages. Under bright conditions, however, the approximation by lines leads to even better results. This means that the new parameterized algorithm can compete with the logarithmic one. As it does not need any logarithmic or exponential calculus, this simple method can easily be implemented into hardware. Another important point for the hardware implementation is the reduction of storage. Simulations showed good results when using 8 bits for each parameter (see Table 2.3). For simplification, the logarithmized value of c, c* is saved in the memory. As this algorithm does not compensate temperature differences, these parameters are taken at room temperature. In order to further reduce the need for hardware, the product of b· c∗ and the quotient ¯b/bi are saved directly, again with a resolution of 8 bit (see Table 2.4). As a consequence of this improvement the algorithm does not need a division unit. For the implementation an evaluation board with the XILINX-FPGA XCV400E is used. The storage consists of two units with 8 and 16 bits, respectively, and 19 bit address lines. 2.4.6 Masking Process Without a masking process, an image includes lots of black spots which are not part of the natural scene. These pixels are so called bad or weak pixels. Their OECF does not follow general rules, which means that they cannot see in the dark or simply do not work at all. The reason for this phenomenon is the production process. Nevertheless, in order to get high quality images those pixels must be marked and substituted by another one.
2 The High-Dynamic-Range Sensor
41
The masking process is always a trade-off between eliminating as many bad pixels as possible, saving memory space and still having good image quality. The latter is important since the resolution of the image is reduced due to the substitution process. The used substitution mechanism plays an important part. The simplest way is to exchange a masked pixel by its predecessor. The algorithm currently masks all pixels outside the ±3σ interval for parameter c. 2.4.7 Algorithm Including Temperature Figure 2.25 demonstrates how the characteristic curves vary with temperature. In order to achieve a good correction quality across the whole dynamic range, it is necessary to consider the temperature dependence of the parameters. This leads to parameters a(T ), b(T ) and c(T ). Because of the wide application field of HDRC camera systems the ambient temperatures of the sensors also change. Fixed-pattern noise is temperature-dependent. In order to get good images, the temperature drift of the parameters has to be taken into consideration. In order to save memory space, the solution for temperature compensation should be based on one parameter set per pixel across the whole temperature range. Figure 2.26 visualizes the problem of temperature-dependent fixed-pattern noise. In both cases five different correction methods are compared: – Arithmetic logarithmic correction: it corresponds to the 3-parameter correction algorithm using exponential and logarithmic calculus. For better 700
Digital Output y [DN]
600
0⬚C 41⬚C 80⬚C
500 400 300 200 3dB-point
100 0 10−4
10−2
100 Illumination in Lux
102
Fig. 2.25. Temperature dependence of the VGAx sensor
104
42
B. Hoefflinger and V. Schneider
Comparison at 13⬚C Correction data calculated at 13⬚C
(a) 7
Logarithmic Correction (arithm) New parameterised Correction, 3 points in the dark Multipoint, 3 points
6
1−Point Offset−Correction
Sigma in Digits
5
Algorithm incl. Temperature (13⬚C, 45⬚C, 85⬚C)
4 3 2 1 0
10−2
100
102
104
Illumination in Lux
(b) 6
Logarithmic Correction (artihm.) 3−Parameter−Method, 3 points in the darkl Multipoint, 3 points 1−Point−Offset Correction Algorithm incl. Temperature (13⬚C, 45⬚C, 85⬚C)
5.5 5
Sigma in Digits
4.5 4 3.5 3 2.5 2 1.5
10−2
100
102
104
Illumination in Lux
Fig. 2.26. (a) Comparison at 13◦ C, correction data calculated at 13◦ C. (b) Comparison at 84◦ C, correction data calculated at 13◦ C
2 The High-Dynamic-Range Sensor
–
–
– –
43
results in the dark it uses the arithmetic mean value of three measuring points. New parameterized correction: method presented in the previous chapter approximating the logarithm by lines. Again, three measurements in the dark are used to calculate parameter c. Mulitpoint correction: the individual and mean offsets are measured during three differently illuminated conditions and during runtime subtracted. This represents an extension of the IMS-Offset Correction. 1-Point Offset Correction: offsets are required only at one single illumination. Algorithm incl. temperature: this algorithm will be explained later in this chapter.
The correction data for both graphs in Fig. 2.26 is calculated at 13◦ C using an additional set of data. This includes frames at different illuminations and integration across 20 frames in order to eliminate temporal noise. Later, this data is used to correct nonintegrated frames at different illuminations and temperatures. Figure 2.26a shows a correction at 13◦ C. Thus, the correction data and the scene to be corrected have the same temperature and the results are very good. It is apparent that the quality differs with the complexity of the algorithm. Therefore, the logarithmic and new parameterized algorithm delivers much better results than the 1-point offset or the multipoint correction. In Fig. 2.26b the sensor was heated up to 84◦ C and again the same set of correction data is used. This simulates a sensor which has saved only one set of correction data at one certain temperature −13◦ C. When comparing both figures it is obvious that the correction quality worsens – the deviation is reduced by 2 digits. Thus, it is obvious that the fixed-pattern noise changes with temperature and, therefore, sets of parameters taken at one temperature lead to poor results at other temperatures. The only algorithm which has the same quality in both cases is the new algorithm including temperature which will be explained later. Further analysis showed that the offset, gain and the logarithmized value of the dark current differ with temperature. Figure 2.27 shows the temperature dependence of offset a, gain b (see Fig. 2.28) and the dark current c (see Fig. 2.29). In all three cases, the measured values can be approximated by using linear regression. As seen in Fig. 2.26 this leads to really good results. Therefore, parameter ai is now changed into ai (T ). This correlation can be expressed by (2.19): ai (T◦ C ) = ai,0 + αi T◦ C
(2.19)
Parameter ai (T ) is now a combination of offset a0 at 0◦ C and slope α of the regression line. Also parameters bi (T ) and ci (T ) are linearly dependent on temperature and, therefore, can be expressed by similar expressions:
B. Hoefflinger and V. Schneider 0 −1
AD-Output
−2 −3 −4 −5
Measuring Values y = − 6.6443 + 0.0772*T
−6 −7
0
10
20
30
40
50
60
70
80
Temperature in ⬚C
Fig. 2.27. Temperature dependence of offset a 350 Measuring Values y = 344.3838 − 3.3783*T 300
250 AD–Output
44
200
150
100
50 0
10
20
30
40
50
60
Temperature in ⬚C
Fig. 2.28. Temperature dependence of gain b
70
80
2 The High-Dynamic-Range Sensor
45
29
28
AD−Output
27
26
25 Measuring Values y = 24.1938 + 0.0549*T
24
23
0
10
20
30
40
50
60
70
80
Temperature in ⬚C
Fig. 2.29. Temperature dependence of the 3 dB-point c
bi (T◦ C ) = bi,0 + βi T◦ C . ci (T◦ C ) = ci,0 + γi T◦ C .
(2.20)
Including the parameter temperature dependence leads to a doubling in the amount of parameters to be saved. However, compared to other approaches which save a set of parameters (ai,t , bi,t and ci,t ) for different temperatures t individually with only six parameters to cover the whole temperature range is very good. Figure 2.30 shows how the deviation in illumination and temperature of the new algorithm including temperature varies. First of all, it should be mentioned that for the first general analysis it is sufficient to take the same set of data to calculate the correction data as well as implement the correction itself. Furthermore, the resolution again is 10 bit and the deviation without correction is roughly 22 digits across the whole temperature range. Thus, the maximum peak at 1.8 is optimized by a factor 12. The figure also shows that the deviation increases with temperature and it proves that our new approach works correctly at different temperatures. Under bright conditions, a slight increase in the deviation becomes apparent. This is caused by an optimization to 40 bit per pixel. In this case, parameter β is not saved per pixel but rather per column. When using 48 bit the deviation is constant under bright conditions. Fixed-point calculation is needed, where all six parameters are 8 bits long. Again, the same hardware is used: XILINX-FPGA XCV400E with 8 and 16 bit storage (see Table 2.5).
46
B. Hoefflinger and V. Schneider 2
1.8
Sigma in digits x
1.6 0⬚C 21⬚C 37⬚C 52⬚C 80⬚C
1.4
1.2
1
0.8
10−4
10−2
100 Illumination in Digits
102
104
Fig. 2.30. Deviation over illumination and temperature – correction results with the algorithm including temperature [5]
Table 2.5. Parameter resolution for method including temperature compensation Parameter a0 b0 c0 α β γ
Resolution 2 2−3 2−4 2−5 2−10 2−10
Ultimately, this chapter introduced a new correction method which needs only one set of data of 40/48 bit to correct fixed-pattern noise across the entire temperature range. This method leads to better results and, for that reason, extends the application field of HDRC image sensors. 2.4.8 Correction Procedure and Runtime Finding a suitable good fixed-pattern correction algorithm is only the first step towards high quality images. After the production of HDR imagers, their specific fixed-pattern correction parameter set must be determined. For this we first measure the sensor characteristic by using a phlox-plate followed by the PC calculating the correction parameters according to the previously presented algorithm using (2.14). The exponential calculus in these equations
2 The High-Dynamic-Range Sensor
47
Calculation of Parameter Values (only once)
Embedded memory
Correction parameters
Calculation in PC
Camera System
Sensor data
Illumination
Phlox
Control of illumination
Correction of Pixel Values during Run Time
Embedded memory
Correction parameters
Camera System
Corrected Image
Calculation of Corrected Image repeats after every picture
Fig. 2.31. Process flow of fixed-pattern correction
is not problematic as it is performed by a fast computer. Finally, this data is transferred to the embedded memory of the sensor. During runtime, the digital correction unit reads out the raw data of the sensor and corrects it. The hardware first evaluates the raw value with the threshold and then decides on (2.18). Figure 2.31 visualizes this process. 2.4.9 Summary The presented method regards all physical pixel parameters as the offset, the gain and the dark current of the pixel. The results show that the algorithm improves image quality greatly compared to the 1-point-offset correction. A comparison at high temperatures shows that the image quality can be improved by using temperature compensation. The strategy and the realization of this method are based on the effectiveness and speed of digital storage and processing so that a correction within the camera head is possible in realtime with 32 frames s−1 at VGA-resolution (300,000 pixels).
2.5 HDRC Dynamic Performance As a continuous photocurrent monitor with logarithmic opto-electronic conversion function, the HDRC sensor has an inherently simple functionality. The
48
B. Hoefflinger and V. Schneider MOS ACTIVE PIXEL VDD
LINEAR Charge Integration Mode VDD
LOG Continuous Current Mode VDD V~log I
Source Follower
Row Select VSS
VSS Reset
VSS Integrate
VSS Continuous
Fig. 2.32. HDRC pixel with reset including key capacitance
output voltage of an HDRC pixel continuously tracks changes in the photocurrent. Naturally, the node voltage Vlog tracks the changes in the photocurrent depending on the charging and discharging of the log-node capacitance C. In Fig. 2.32, the HDRC pixel diagram of Fig. 2.7 is redrawn here with its significant capacitance. In our HDRC reference sensor, this capacitance is typically 2.5 fF. This capacitance is charged by the log-transistor T1 and discharged by the photodiode current. In the stationary state, the node voltage Vlog is dictated by the gate-source voltage VGS, which is consistent with the photodiode current according to (3.1). This means that for low light levels Vlog is high, typically 2.8 Volt in a 3.3 Volt sensor and it is low, about 2.4 Volt, for high illuminance values. If the illumination of a pixel changes abruptly from dark to bright, Vlog is high, VGS is low and the transistor conductance is very small. The pixel capacitance is discharged by the increased photocurrent and as Vlog decreases, VGS is increased and a competing pullup current is generated by the transistor T1. Nevertheless, in this case the transition is mostly determined by a discharge of the pixel capacitance with a constant current source. For a transition from 10 mlx to 10 lx the settling time for Vlog is much less than one frame time. For a step-function change from bright to dark, the pixel capacitor has to be charged to a higher voltage. Initially, the gate-source voltage is fairly large and, consequently, sufficient pull-up current is delivered by transistor T1. However, as Vlog rises, VGS decreases and, consequently, the pull-up current decreases exponentially resulting in an ever slower rise of Vlog. Figure 2.33 shows Vlog versus time for this case and it is evident that the pixel does not obtain its stationary dark-level voltage within one frame. Practically, a laterally moving bright light source leaves behind a tail of grey levels like a comet tail. This phenomenon has been called HDRC image lag. For this transition from bright to dark, the fact that the HDRC pixel tracks differences continuously with some memory of the previous frame causes this effect.
2 The High-Dynamic-Range Sensor
49
800 750
Digital Output [DN]
700 650 600 550 500 450
0
500
1000
1500
2000
2500
3000
3500
4000
Frame Number [1000 F/s]
Fig. 2.33. HDRC pixel voltage Vlog after a step-function change from 1 klx to 10 mlx
In order to erase this memory, Vlog has to be reset for each frame to a reference level. The log transistor T1 can serve for this purpose. For this function, T1 is operated as an active transistor with properly controlled gate and drain voltages. If the gate voltage VGG is sufficiently high compared with the drain voltage SVDD, the difference being at least the threshold voltage VT, Vlog will be forced to SVDD. This “dark” reset was proposed basically in 1997 [6]. The most effective mechanism is outlined in Fig. 2.34. At the end of a frame, the Vlog voltages are distributed from dark to bright as shown in the figure, obviously with VGG = SVDD = VDD as in the classical continous HDRC mode. Reset is initiated by raising VGG to VGGH and lowering SVDD to a level SVDDL, which is slightly higher than the dark level of Vlog. All pixels are effectively black at the end of reset. Now VGG and VDD return to the standard HDRC level SVDD. This condition effectively turns off the transistor, and all pixel nodes Vlog are now discharged by their individual photocurrent. Brightly illuminated pixels with large photocurrents will see a rapid decrease of Vlog, turning on the log-transistor T1 and quickly establishing the stationary bright signal voltage Vlog identical to the continuous HDRC mode. Dark pixels with low photocurrent will be discharged slowly with a linear response dictated by the pixel capacitance C until, for sufficiently large integration time, Vlog (t) has decreased to values where the log transistor T1 begins to provide a competing pull-up current so that Vlog settles to a value closer to the log-response value of the continuous mode of operation.
50
B. Hoefflinger and V. Schneider RESET
SVDDL
Vlog
SVDDH
SVDD SVDDL
VGGH
VGG
SVDDH 0.0
0.005
0.01
Time (s)
Fig. 2.34. Timing diagram for HDRC pixel in the dark reset mode
The OECF of the dark HDRC reset mode is shown in Fig. 2.35(a) in comparison with the OECF of the continuous HDRC mode. As we can see, the output signal range is expanded at low-luminance levels, the conversion function becomes more linear and the contrast sensitivity function (the derivative of the OECF) in Fig. 2.35(b) is improved in the low-luminance section around 1 lx. For illuminances above 10 lx, the two characteristics are identical as we expect. The HDRC sensor in this reset mode effectively is free from image lag as shown in Fig. 2.36 for a rotating light source. We can also reset the HDRC pixel to the other end of the response scale providing a “white” reset. As shown in Fig. 2.37, with this reset we set the pixel voltage Vlog to the lowest voltage on our scale by pulling the drain voltage VDD of log transistor T1 to a value SVDDL lower than that in the continuous HDRC mode for a very bright illumination. All pixel voltages Vlog are now forced to SVDDL, appearing white. At the end of this reset, again gate and drain of log transistor T1 are returned to their standard HDRC voltage SVDD. All pixels start with an initially high pull-up current provided by T1. Bright pixels will settle fast because of the small required voltage change and correspondingly
2 The High-Dynamic-Range Sensor
51
(a) 800
Digital Output [DN]
700
Continuous Mode White Reset Black Reset
600
500
400
300
200 10−4
10−2
100 Illumination x [Lux]
102
(b) 350
Sensitivity [DN/decade]
300
Continuous Mode White Reset Black Reset
250
200
150
100
50
0 10−4
10−2
100
102
Illumination x [Lux]
Fig. 2.35. (a) The OECF of the HDRC pixel in the continuous mode, the dark reset mode and the white reset mode. (b) The CSF of the HDRC pixel in the continuous mode, the dark reset mode and the white reset mode
52
B. Hoefflinger and V. Schneider
Fig. 2.36. Rotating light spot image recorded in HDRC continuous mode (left) and in the dark reset mode (right) Dark
Vlog SVDDL White
SVDDH
SVDD SVDDL
RESET
VGG = SVDD 0.0
0.005 Time (s)
0.01
Fig. 2.37. Timing diagram for the HDRC pixel in the “white” reset mode
large effective gate voltage VGS and pull-up current from T1. They attain the signal level of the continuous HDRC mode quickly. Dark pixels have to rise to larger voltages and their transistors are slow because, while pulling up Vlog, transistor T1 turns itself off more and more. However, the dark pixels have no memory of their illumination in the previous frame so that the image lag or comet tail of moving light sources is eliminated. The conversion function of our pixel in this “white”- reset mode is also shown in Fig. 2.35 for two values of reset voltage for SVDDL. In one case, the conversion function is indistinguishable from the continuous HDRC mode, and also the contrast sensitivity function in Fig. 2.35b is identical. In the second case, where SVDDL is even lower, the range of output values is compressed for lower illuminations and also the contrast sensitivity function deteriorates at low light levels.
2 The High-Dynamic-Range Sensor
53
Again, the image lag is removed in the scene shot with the rotating light source. As this chapter shows, the HDRC image lag at low light levels can be eliminated with controlling the potentials of the log transistor T1 and allowing an appropriate integration time as a fraction of the frame time. If this becomes very short in the case of high-speed subframing operations in industrial machine vision, sufficient illumination is provided so that the continuous HDCRC log-imaging mode with its robust high-speed capability can be used throughout without resorting to a reset.
2.6 HDRC Sensor with Global Shutter HDRC sensors whose pixels contain three transistors shown in Fig. 2.7 are read out one row at a time. In this rolling-shutter mode, the pixel voltages within each row apply to the same instant in time. However, the time elapsed between the first row and the last of an array sensor may have the effect that the shape of a large rapidly moving object filling a large part of the frame will be distorted. An array sensor with a global shutter would acquire the outputs of all pixels in an array absolutely at the same time. A HDRC pixel, which has this functionality, is shown in Fig. 2.38. The pixel signal at the first source follower is transferred to the hold capacitor CH when the shutter transistor is turned on. This happens for all pixels at the same time. The shutter transistor is then closed and the signal voltages on the hold capacitors are read out one row at a time providing suitable output levels and drive currents to the column electronics. Table 2.6 provides the data of such an HDRC sensor with global shutter. Although the photo diodes now are somewhat smaller because of the additional transistors in each pixel, the sensor OECF in Fig. 2.39 still shows a contrast sensitivity of 0.9% and a remarkable 3 dB point of the contrast sensitivity at 30 mlx. Even at 3 mlx the contrast sensitivity is still 9%. SVDD SHUTTER
SEL
A
A
OUT
CH VSS
Fig. 2.38. Circuit diagram of HDRC pixel with global shutter
B. Hoefflinger and V. Schneider 800 Continuous Mode 750 700 Digital Output [DN]
650 600 550 500 450 400 350 300 10−4
10−2
100
102
Illumination x [Lux] 120 Continuous Mode 100 Sensitivity [DN/decade]
54
80
60
40
20
0 10−4
10−2
100
102
Illumination x [Lux]
Fig. 2.39. OECF and CSF of an HDRC sensor with global shutter
2 The High-Dynamic-Range Sensor
55
Table 2.6. Parameters of an HDRC sensor with global shutter Product name Pixels Optical format VGA resolution Image diagonal Full frame Technology Pixel area Fill factor Dynamic range Sensitivity (S/N = 1, 20◦ C) Spectral sensitivity [A/W] 550 nm 850 nm Pixel rate VGA frames Random access (AOI) Subframes (3,000 Pixels) Shutter Digital output Power supply AVDD, VDD LVDD
Global shutter Unit 768 × 496 1/2
in.
9.1 0.25 10 × 10 > 25 120 0.004
mm µm µm2 % dB lx
0.30 0.15 12 38 Yes 4k Global 10
A W−1 A W−1 MPixel s−1 Frames s−1
3.3 2.5
Frame s−1 Bit V V
The recording of a rotating fan is a common example to show the significance of a global shutter. A frame from a recording with a rolling-shutter HDRC sensor is shown in Fig. 2.40(a) having the typical distortions of the fan blades. The frame in Fig. 2.40(b) taken from a recording with the globalshutter sensor shows that this effect is totally removed.
(a)
(b)
Fig. 2.40. One frame form an HDRC video recording of a rotating fan (3,000 rpm) (a) recorded with a rolling shutter (b) recorded with global shutter
56
B. Hoefflinger and V. Schneider
The global-shutter sensor also has all the gate- and drain-voltage controls on the log transistors T1 to use the reset options explained in Sect. 2.4 and thus offering very versatile and robust recording features under any lighting conditions with the familiar dynamic range of over 120 dB.
References 1. Grotjohn T and Hoefflinger B (1984) A parametric short-channel MOS transistor model for subthreshold and strong-inversion currents, IEEE J. Solid-State Circuits SC-19, pp. 100–113 2. Seger U, Graf HG, Landgraf ME (1993) Vision assistance in scenes with extreme contrast. IEEE Micro 13: 50–56 3. Hauschild, Ralf: Integrierte CMOS-Kamerasysteme f¨ ur die zweidimensionale Bildsensorik, Uni Duisburg, Dissertation 1999, http://www.ub.uni-duisburg.de /diss/diss9922/ 4. M. Loose, Self-Calibrating CMOS Image Sensor with Logarithmic Response, Dissertation, Universit¨ at Heidelberg, 1999 5. Schneider, V.: Fixed-Pattern Correction of HDR Image Sensors, Prime Conference 2005 6. Takebe K, Abe H (1999) Photosensor circuit having an initial setting unit, United States Patent 5.861.621
3 HDR Image Noise Bernd Hoefflinger
The nonuniformities or fluctuations in an image acquired with an array sensor under uniform illumination are called flat-field noise. The major sources in the total resulting flat-field noise are studied in this chapter for high-dynamicrange logarithmic imaging, and comparisons with linear or piecewise linear, extended-range image sensors will be made for our reference pixel (Fig. 3.1 and Table 3.1). The dominant source of nonuniformity in the HDRC sensor, fixed-pattern noise, has been treated already in Sects. 2.2 and 2.3. The fixed-pattern of pixel dark currents NDFP and of pixel output signals NSFP were described there having standard deviations of about 20–30 DN for a contrast sensitivity of 100 DN per decade of illuminance. The digital fixed-pattern correction was also described there resulting in a standard deviation of 1–3 DN. We continue to relate our assessment of image noise to this digital output scale. Of the other various noise sources, we have already encountered another one: quantization noise NQ. For a single reading of an output value, the error is 1 DN. While NDFP and NSFP are noise contributions in the space domain, major noise contributions in the time domain besides NQ come from the granularity of the exceedingly small dark and signal currents. We study a generic CMOS pixel with parameters given in Table 3.1. The dark current of our reference pixel at room temperature is 128 aA, corresponding to 800 e s−1 . We study the noise for a frame or integration time of 25 ms. This produces a dark charge of just 20 electrons, which would cause a change of 1.3 mV on our pixel capacitor of 2.5 fF. This electron count ND of 20 electrons has an RMS variance, also called shot noise, of √ NDS = ND = 4.5 electrons. The variance of the photocurrent is also significant. We have seen in Chap. 2 that the HDRC optoelectronic conversion function (OECF), Eq. (2.4) produces useful gray levels for photocurrents I P, which are 1/10 of the dark current I D . For our reference pixel, a photocurrent of 12.8 aA would be associated with an illuminance of a little over 1 mlx. This corresponds to 76 e s−1
Fig. 3.1. CMOS reference pixel (MOS active pixel: LOG continuous current mode and LINEAR charge integration mode)

Table 3.1. Parameters of a reference pixel

Pixel area                 (µm²)          7.4 × 7.4
Photodiode (PD) area       (µm²)          5.0 × 5.0
Capacitance                (fF)           2.5
Dark charge                (e)            20
Dark shot noise            (e)            4.5
Conversion efficiency      (e mlx−1)      2
Sensitivity                (DN mlx−1)     0.33
                           (V lx−1 s−1)   5.4
Reset noise                (e)            20
Quantization (1 DN) (a)    (e)            6
                           (mV)           0.4

(a) Linear, 10 bit. Conditions: T = 20 °C, Tint = 25 ms
This corresponds to 76 e s−1, or just two electrons in the frame time of 25 ms. The RMS fluctuation NSS of this signal charge NS is

NSS = √NS = 1.4 electrons,

or 70% of the signal, respectively. NDS and NSS establish the shot-noise limit NSL for our reference pixel:

NSL = (NDS² + NSS²)^1/2 = (ND + NS)^1/2    (3.1)
which is shown in Fig. 3.2. Converting electrons to output voltages requires readout electronics, from the source follower T2 to multiplexers and amplifiers, which produce a read-noise voltage or digital count. For a linear OECF, this can be translated back into equivalent read-noise electrons NR. If we operate any pixel with a reset, the reset voltage from which we then start to integrate charge has a variance Vnreset dictated by the reset-transistor on-resistance and the pixel capacitance C charged through this transistor. This variance, the reset noise, is determined by
Fig. 3.2. Flat-field shot-noise limit (40 fps)

Vnreset = √(kT/C)  [V],                      (3.2)
Qnreset = C · Vnreset = √(kTC)  [As],        (3.3)
NRESET = 4 × 10⁸ √C electrons
at room temperature. For our reference pixel with C = 2.5 fF, the reset noise is 20 electrons.
We now study the effect of these noise sources on the flat-field noise, i.e., the granularity of the output of a uniformly illuminated sensor. This study is inspired by the publications of Janesick [1]. Assuming that the identified noise sources are uncorrelated, the total flat-field noise is

N = [NDFP² + NSFP² + NDS² + NSS² + NQ² + NRESET² + NR²]^1/2.    (3.4)

The contributions to the flat-field noise of a linear charge-integrating CMOS sensor with our reference pixel are shown in Fig. 3.3 for NDFP = NSFP = NQ = 1 DN = 6 e and for an integration time of 25 ms.
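To make the noise budget of Eq. (3.4) concrete, the following Python sketch evaluates it for the linear charge-integrating mode with the reference-pixel values of Table 3.1. The function and variable names are ours, and the read noise NR is simply set to zero here; this is an illustration of the formula, not a sensor model supplied with the HDRC.

```python
import math

# Reference-pixel parameters from Table 3.1
C_PIX = 2.5e-15      # pixel capacitance [F]
Q_E = 1.602e-19      # elementary charge [C]
T_INT = 0.025        # integration time [s]
DARK_RATE = 800.0    # dark-current electron rate [e/s], i.e. 128 aA
K_B = 1.381e-23      # Boltzmann constant [J/K]
TEMP = 293.0         # room temperature [K]

def linear_flatfield_noise(signal_e, n_dfp=6.0, n_sfp=6.0, n_q=6.0, n_read=0.0):
    """Total flat-field noise of the linear reference pixel, Eq. (3.4), in electrons.
    Fixed-pattern and quantization terms are assumed corrected to 1 DN = 6 e;
    read noise is set to zero for simplicity."""
    n_dark = DARK_RATE * T_INT                       # dark charge, ~20 e
    n_shot = math.sqrt(n_dark + signal_e)            # shot-noise limit, Eq. (3.1)
    n_reset = math.sqrt(K_B * TEMP * C_PIX) / Q_E    # kTC reset noise, Eqs. (3.2)-(3.3), ~20 e
    return math.sqrt(n_dfp**2 + n_sfp**2 + n_shot**2 +
                     n_q**2 + n_reset**2 + n_read**2)

# Example: 1,000 signal electrons give a total flat-field noise of about 39 e
print(linear_flatfield_noise(1000.0))
```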
Fig. 3.3. Flat-field noise, linear APS sensor (40 fps)
Here we assume that the fixed pattern has been corrected to 1 DN. A maximum signal voltage of 400 mV would correspond to 6,000 electrons, so that 1 DN would be six electrons with a 10-bit A/D converter. In other words, the sensitivity would be 15 electrons mV−1. On this scale, the lower flat-field noise limit is given by reset and quantization noise. The resulting signal-to-noise ratio (SNR) vs. illuminance is given in Fig. 3.4. The S/N ratio reaches a value of 1 at 10 mlx, 10 at 90 mlx, and it peaks at 70. With a maximum saturation signal of 3 lx, the dynamic range of the linear reference sensor is 300 to 1 or 50 dB.
We have chosen the linear sensor first because photocurrent and output voltage are proportional, so that the noise in electrons directly provides the output voltage noise. In the high-dynamic-range logarithmic image sensor, the OECF, Eq. (2.4), has to be considered as transforming current noise into output voltage noise and vice versa. Our conversion function is

DN = f(x) = 54 ln(1 + x)

for a 10-bit output and a dynamic range of 140 dB. In our reference pixel, x = 1 corresponds to 20 electrons.
Fig. 3.4. Flat-field signal-to-noise ratio (40 fps); curves: HDRC, piecewise linear, linear
The noise contributions are related as

Δy = (df/dx) Δx = (54/(x + 1)) Δx.

Quantization noise Δy = 1 DN on the output means a noise charge

Δx = (x + 1)/54,

corresponding to

NQ = (NS + 20)/54    (3.5)
electrons. To obtain the total flat-field noise at the output in the continuous HDRC mode, we have to include the fixed-pattern and the quantization noise, whereas there is no reset-noise contribution. We obtain the digital flat-field noise output as

N = (NSL² + NQ² + NFP²)^1/2.

With fixed-pattern noise corrected to 1 DN, we obtain at the digital output

N(DN) = (NSL² + 2NQ²)^1/2.

This result is plotted in Fig. 3.5 for the HDRC sensor in the continuous mode.
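A corresponding sketch for the continuous (logarithmic) HDRC mode combines the shot-noise limit of Eq. (3.1) with the quantization term of Eq. (3.5). As in the text, the fixed pattern is assumed corrected to 1 DN, so it contributes one further NQ term; the function names are ours.

```python
import math

def hdrc_dn(signal_e, dark_e=20.0, dn_per_ln=54.0):
    """Logarithmic OECF used above: DN = 54 ln(1 + x), with x = NS / 20 e."""
    return dn_per_ln * math.log(1.0 + signal_e / dark_e)

def hdrc_flatfield_noise_e(signal_e, dark_e=20.0, dn_per_ln=54.0):
    """Flat-field noise of the HDRC pixel in continuous (log) mode, in electrons:
    shot-noise limit of Eq. (3.1) plus twice the quantization term of Eq. (3.5)
    (one for quantization itself, one for the fixed pattern corrected to 1 DN)."""
    n_sl = math.sqrt(dark_e + signal_e)          # shot-noise limit
    n_q = (signal_e + dark_e) / dn_per_ln        # 1 DN expressed in electrons, Eq. (3.5)
    return math.sqrt(n_sl**2 + 2.0 * n_q**2)

# The signal-electron decades of Fig. 3.5: 2, 20, 200, 2e3 and 2e4
for n_s in (2.0, 20.0, 200.0, 2e3, 2e4):
    print(n_s, round(hdrc_dn(n_s), 1), round(hdrc_flatfield_noise_e(n_s), 1))
```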
Fig. 3.5. Flat-field noise, HDRC sensor (40 fps)
In Fig. 3.4 we have shown the signal-to-noise ratios of our reference CMOS sensor both in the linear reset mode and in the high-dynamic-range continuous mode. The HDRC sensor has the superior flat-field signal-to-noise ratio over an extremely wide dynamic range: it starts out with a better ratio at very low light levels, reaches its maximum already at fairly low light levels, and maintains this ratio over six orders of magnitude. The linear integrating sensor is inferior at low levels because of reset and quantization noise. It is superior near its white saturation, but at the price of that saturation and of a useful dynamic range of less than 60 dB.
The signal-to-noise characteristic of the HDRC sensor shown in Figs. 3.4 and 3.5 is correct only for quasistationary illumination. For rapid changes of illumination, including low illumination levels, the HDRC sensor has to be operated with a dark or white reset as described in Sect. 2.4. In this case, we have to include the reset noise. At low light levels, the HDRC sensor with dark reset exceeds the continuous HDRC mode because of its higher contrast sensitivity in the 1 lx region. It still offers the very high dynamic range and is attractive because it eliminates image lag. The white reset mode, because of its somewhat reduced contrast sensitivity, is inferior to the continuous mode at lower light levels, but it, too, is free from image lag and offers the same very high dynamic range.
Fig. 3.6. Flat-field noise, piecewise-linear APS sensor
In the linear integrating mode of CMOS pixels, the dynamic range can be extended by switching to a smaller gain above a certain threshold. The effect of this technique on the flat-field noise is shown in Fig. 3.6, where this threshold is set at 2,000 e and where the gain drops by a factor of 30. The S/N ratio for this technique is also included in Fig. 3.4. It should be noted that the inverse of the flat-field signal-to-noise ratio S/N in Fig. 3.4 gives us the detectable contrast for 40 frames s−1 . Because we have applied one flat-field noise model and one common reference pixel, the favorable results for the HDRC sensor make it very attractive. At the same time, we have to keep in mind that these noise results are all based on the assumption that we have performed a digital fixed-pattern correction with a standard deviation of 1 DN over the full dynamic range following the methods outlined in Sects. 2.2 and 2.3.
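As a brief numeric illustration of this gain-switching scheme, the sketch below maps signal electrons to digital numbers using the example values quoted above (knee at 2,000 e, gain reduced by a factor of 30). The function name and the assumption of 1 DN = 6 e on the linear scale are ours.

```python
def piecewise_linear_dn(signal_e, knee_e=2000.0, dn_per_e=1.0 / 6.0, gain_ratio=30.0):
    """Piecewise-linear OECF: full gain below the knee (2,000 e as in Fig. 3.6),
    gain reduced by a factor of 30 above it. 1 DN = 6 e matches the 10-bit
    linear scale of the reference pixel."""
    if signal_e <= knee_e:
        return signal_e * dn_per_e
    return knee_e * dn_per_e + (signal_e - knee_e) * dn_per_e / gain_ratio

# Below the knee: 1 DN per 6 e; above it: 1 DN per 180 e, which raises the
# quantization-noise step ("Quantisation Noise B" in Fig. 3.6).
print(piecewise_linear_dn(1000.0), piecewise_linear_dn(20000.0))
```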
References 1. Janesick JR (2003) Introduction to CCD and CMOS imaging sensors and applications. SPIE Education Services SC504 San Diego, CA USA
4 High-Dynamic-Range Contrast and Color Management Bernd Hoefflinger
By now we have learned that HDRC sensors faithfully acquire scenes with the widest dynamic range and with very good sensitivity and uniformity. The sensor signals we now have available are the prerequisite for a high-performance electronic-vision system, which should accomplish two tasks:
1. Present and display the scene in a way that is informative and pleasing to our eyes.
2. Deliver optimal input signals to machine-vision systems, which build on the unique characteristics of the human visual system (HVS) combined with the benefits of digital processing power.
For both tasks, the HVS provides guidance on many counts. We concentrate here on three:
1. Contrast sensitivity: how do we detect small differences anywhere in any scene?
2. Discounting the illuminant, which can be extremely variable, spotty and blinding, obscuring the objects we want to see.
3. Providing correct and constant colors under different and changing illuminations, both for our critical perception of colors and for object recognition and classification in machine-vision systems.
We recapitulate that the quality of an imager lies in its capability to help us perceive detail and differences in the optical signals that we acquire. In sensing the optical intensity I, what is the minimum just noticeable difference (JND) ΔI? It was Weber's finding in the 19th century that, for many human sensing functions, the ratio ΔI/I is constant. This law, extended over the wide dynamic range of the eye, results in the famous logarithmic relationship for the detected signal S:

S = K0 ln I.    (4.1)

The constant K0 is the contrast sensitivity, which, as we saw in Chap. 2, is about 1% for the human eye, and it is about 1% for an HDRC sensor with 140 dB dynamic range and a 10-bit digital output. The sensitivity decreases at very low intensities, so that the full OECF is
S = K ln(I + 1).
(4.2)
Here I is the photocurrent normalized to the dark current. At I = 1 we find the 3 dB point of sensitivity. It occurs at about 1 cd m−2 (340 mlx) in spontaneous eye sensitivity and at 10 mlx in our reference HDRC sensor. The form of these natural sensor conversion functions has attracted continuing research, and a well-known alternative is the power relationship

S = K I^γ,    (4.3)
which is often called Stevens' law. For the visual system, γ = 1/3 is used, for instance in the CIELab standard color space for the display of images. There, the displayed lightness L* has the following relationship to the green stimulus Y:

L* = 116 (Y/YW)^1/3 − 16   for Y/YW > 0.008856,    (4.4)

where YW is the white reference level. The logarithmic conversion function (4.1) and the power function (4.3) with γ = 1/3 have similar characteristics for a certain range of inputs:

ln(1 + x) ≈ x − 0.5x² + 0.33x³,   for x ≤ 1,
(1 + 1.8x)^1/3 − 1 ≈ 0.6x − 0.9x² + 0.34x³,   for 0 ≤ x ≤ 1.

We continue to concentrate on the logarithmic conversion function, because it helps very effectively to explain several other perception-based features of the human visual system (HVS). The constant contrast sensitivity of the log OECF is the basis of the familiar gray charts (Fig. 4.1). Here we have taken two shots of the same gray chart with the HDRC camera, where for the lower right one the aperture was changed by 2 f-stops. Therefore, for the lower right chart, the illuminance L is 1/4 of that of the upper chart. However, we perceive the relative differences of the gray fields as identical. This chart helps us to introduce lightness: Lightness is the brightness of an area judged relative to the brightness of an area similarly illuminated that appears white.
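For readers who want to inspect the two conversion functions side by side, here is a small Python sketch of the logarithmic OECF, Eq. (4.2), and the CIELab lightness of Eq. (4.4). The linear segment below the 0.008856 threshold follows the standard CIE definition and is not taken from the text, and the absolute scales of the two curves are not matched.

```python
import math

def log_oecf(x, k=54.0):
    """Logarithmic conversion, Eq. (4.2): S = K ln(1 + x),
    with x the photocurrent normalized to the dark current."""
    return k * math.log(1.0 + x)

def cielab_lightness(y, y_white=1.0):
    """CIELab lightness, Eq. (4.4): L* = 116 (Y/Yw)^(1/3) - 16 for Y/Yw > 0.008856.
    Below that threshold the CIE definition uses a linear segment (903.3 * ratio)."""
    ratio = y / y_white
    if ratio > 0.008856:
        return 116.0 * ratio ** (1.0 / 3.0) - 16.0
    return 903.3 * ratio

# Evaluate both conversion functions over the range 0 < x <= 1 discussed above;
# the shapes are similar over this range even though the scales differ.
for x in (0.05, 0.1, 0.25, 0.5, 1.0):
    print(x, round(log_oecf(x), 1), round(cielab_lightness(x), 1))
```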
Fig. 4.1. Gray chart recorded with the HDRC camera and two apertures
For completeness and distinction, we define brightness as: "Brightness is the sensation according to which an area appears to emit more or less light." With a radiometric measurement we measure brightness, whereas with our eyes we sense lightness. Our "white" sensation of a white paper is largely independent of whether we look at it in sunlight or indoors at much lower light levels. We conclude: the eye with its log-conversion function and the HDRC sensor with its log-conversion function are essentially "lightness" sensors. For the eye and for the HDRC sensor this means further: the input side may be hit by a very wide range of brightness L, but the sensation which we perceive is a range of lightness

L* = K log L.
(4.5)
This leads us to the topic of lightness and contrast on the output or display side of an HDRC sensor. In Fig. 4.2a we show the HDRC image of a scene and its histogram, where the 10-bit output range can represent seven orders of magnitude at the input of the sensor, or an input dynamic range of 140 dB. The histogram shows that our scene dynamic range, including the bright light source, is only three orders of magnitude or 60 dB. In Fig. 4.2b we have shifted and expanded the output signal to give us the sensation of a lightness range from white to black. The differences of the original output signal on the left have been multiplied by a gain γ, and the contrast sensitivity has been multiplied accordingly by that amount, maintaining a constant value independent of
Fig. 4.2. Histogram of the reference image with light source. Left: sensor output. Right: stretched by gain γ
local lightness, maintaining exactly the logarithmic OECF. In an HDRC camera, the minimum and the maximum of the histogram as well as the resulting γ and offset are computed on-the-fly per frame, providing a simple and highly useful mapping function. More sophisticated, perception-based mapping functions will be discussed in Chap. 11.
The gray-scale mapping shown here offers an effective way to tackle HDRC color management. Sensing and distinguishing colors is one of the most powerful capabilities of the HVS. The HVS performs this with a remarkable robustness and sensitivity for differences under any lighting conditions; in fact, we see an apple as red in bright sunlight as well as indoors at low light levels (and even under a different illuminant). This sensation of "color constancy" has intrigued scientists for hundreds of years, and it is a fundamental challenge for an HDRC imaging system to display this to the human viewer or to provide this capability to a machine-vision system. The sensing, encoding, and displaying of color is a sophisticated science, and HDR color imaging is an exciting new extension, which emerges with the availability of HDR image sensors. With respect to the state of the art of digital color management, we follow three excellent books [1–3], and we present HDR color imaging and management as an effective extension of this art.
A color stimulus is completely defined when we know the tristimulus values X, Y, Z (red, green, and blue). We cover the pixels of the HDRC sensor with a mosaic of red, green, and blue dye filters with sensitivities δR, δG, and δB, respectively. The pixels now provide densitometric red, green, and blue output signals. The stimuli are photocurrents IR, IG, and IB:

IR = δR L RR,   IG = δG L RG,   IB = δB L RB.    (4.6)
Here, L is the brightness of the illuminant and R the reflectance of the object. The logarithmic conversion function of the HDRC sensor in its "photopic" section, where the photocurrents are large compared with the pixel dark currents, delivers pixel output voltages VR, VG, and VB:

VR = K log IR + bR,
VG = K log IG + bG,
VB = K log IB + bB,    (4.7)
where K is the contrast sensitivity and bR, bG, bB are the offsets in the red, green, and blue pixels, respectively. As an example, we insert the stimulus current of the red pixel to obtain its output voltage VR:

VR = K log RR + K log L + K log δR + bR.    (4.8)
In the following, we assume that the white balance has been performed
log δR + log LR + bR = log δG + log LG + bG = log δB + log LB + bB.    (4.9)

The brightness "white" LW is now mapped in our sensor to a lightness "white" L*W:

L*W = K log LW.    (4.10)
The local color output "red" appears as

CR(x, y) = K log X(x, y).    (4.11)
The color saturation appears as

SR = K log(X/L) = CR − L*,    (4.12)
where the local lightness is

L* = (CR + CG + CB)/3.    (4.13)
We notice again that lightness enters the color signal as an offset; the chroma CR is constant. This phenomenon becomes apparent in Fig. 1.8, where we have taken the Macbeth chart with four exposures in (a), which appear over- or underexposed. In (b) we have just added offsets to all pixel values to obtain charts with an identical appearance. This color constancy of HDRC sensors is very powerful because it saves color-correction operations, and objects are identified directly with correct colors. The scene in Fig. 4.3 with two exposures 5 f-stops apart again demonstrates this powerful feature. As the dynamic range becomes large, the range of output lightness L* increases by log L, so that the colors in the output (4.11) appear less saturated, resulting in a grayish appearance. A powerful step to enhance the color saturation given by (4.11) is to apply saturation enhancement factors αR, αG, αB:

S*R = αR (CR − L*),
S*G = αG (CG − L*),
S*B = αB (CB − L*),
so that we obtain enhanced colors

C*R = S*R + L* = αR (CR − L*) + L*,
C*G = S*G + L* = αG (CG − L*) + L*,
C*B = S*B + L* = αB (CB − L*) + L*.
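This enhancement can be written as a few lines of per-pixel code operating on the log-domain channels. The following is a minimal sketch with our own function name, not the camera firmware; the default α = 2.0 anticipates the typical VGAx value quoted later in this chapter.

```python
def enhance_log_color(c_r, c_g, c_b, alpha_r=2.0, alpha_g=2.0, alpha_b=2.0):
    """Saturation enhancement in the log (densitometric) domain.
    Inputs are the log-converted channels C_R, C_G, C_B in DN; the local
    lightness L* is their mean, Eq. (4.13), and the enhanced channels follow
    C*_i = alpha_i (C_i - L*) + L*."""
    lightness = (c_r + c_g + c_b) / 3.0
    return (alpha_r * (c_r - lightness) + lightness,
            alpha_g * (c_g - lightness) + lightness,
            alpha_b * (c_b - lightness) + lightness)

# Example: a slightly reddish pixel; the lightness is preserved, the chroma doubled
print(enhance_log_color(520.0, 500.0, 490.0))
```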
Fig. 4.3. HDRC recordings of masks with two apertures after log luminance offset of the pixel data. Left: 1:1.8. Right: 1:8
Fig. 4.4. The cones of chromaticity vs. lightness
Fig. 4.5. HDRC color image before and after enhanced color saturation
Table 4.1. Steps toward displaying log HDR color images

1. For the R, G, B pixels, record the output voltages VR, VG, VB of the log-converting HDRC sensor and convert to digital numbers (DN).
   (a) With a gray opaque diffuser filter, shift the centers of the recorded RGB histograms to coincide (= white balance).
   (b) Remove the filter and, for each captured frame, get the Min and Max of the frame histogram.
2. Expand and shift all DN to fully utilize the lightness range L*W.
3. Depending on the RGB mosaic topology, perform center-surround operations to obtain the tricolor CR, CG, CB for each pixel.
4. Perform color enhancement operations to obtain C*R, C*G, C*B per pixel.
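Steps 1(b) and 2 of Table 4.1 amount to a per-frame gain-and-offset ("expand and shift") operation on the histogram, the on-the-fly mapping described with Fig. 4.2. The sketch below illustrates the idea with our own function name; the percentile-based min/max is our own robustness choice, whereas the text simply uses the histogram Min and Max.

```python
import numpy as np

def stretch_to_lightness_range(frame_dn, out_max=1023, low_pct=0.1, high_pct=99.9):
    """Per-frame expand-and-shift of the log-domain output (Table 4.1, steps 1b and 2).
    frame_dn: 2-D array of digital numbers from the log OECF."""
    lo = float(np.percentile(frame_dn, low_pct))
    hi = float(np.percentile(frame_dn, high_pct))
    gain = out_max / max(hi - lo, 1.0)            # the per-frame gain (gamma) of Fig. 4.2
    stretched = (frame_dn.astype(np.float64) - lo) * gain
    return np.clip(stretched, 0, out_max).astype(np.uint16)

# Example with a synthetic frame that only spans three of the seven decades
frame = np.random.randint(300, 601, size=(480, 640))
print(stretch_to_lightness_range(frame).max())    # close to 1023
```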
For a linear OECF, it has been suggested to amplify the differences (R−L, G−L, B−L). More precisely, for a linear OECF the ratios R/L, G/L, B/L should be amplified; amplifying the differences is only justified and appropriate for a logarithmic OECF. The absolute and relative magnitudes of the color amplification factors αR, αG, and αB have to be determined for a given HDRC sensor and camera. Typical results for the VGAx sensor are αR = αG = αB = 2.0. The amplification may have to increase with the lightness dynamic range to maintain color saturation. As a result, the lightness-color cone of Fig. 4.4 is obtained for an HDRC color camera. Figure 4.5 shows a typical HDRC color image before and after saturation enhancement. To summarize, the steps for a densitometric HDRC color image are listed in Table 4.1.
In this chapter we have limited HDR contrast and color management to simple linear global operations for the optimal display of recorded HDRC images, which can be executed easily frame-by-frame in real time. More advanced and particularly perception-motivated HDR video tone mapping and encoding will be treated in Chaps. 11 and 12.
References 1. Giorgianni EJ, Madden TE (1997) Digital color management. Addison-Wesley, Reading 2. Johnson GM and Fairchild MD (2003) Visual psychophysics and color appearance. In: Digital color imaging handbook, Sharma G ed., Chap. 2 3. Wyszecki W and Stiles WS (2000) Color science: concepts and methods. Wiley, New York
5 HDR Video Cameras Markus Strobel and Volker Gengenbach
5.1 Introduction

In this chapter, we describe various cameras built with HDRC sensors. One common property of these cameras is the modular system design, which separates the functionality into a sensor board, a controller board, and an interface board for different communication standards, e.g., CameraLink™, NTSC/PAL, or proprietary data transmission. The optical properties of the cameras are determined by the HDRC sensor itself, so a brief overview of the current HDRC sensors is given in this introduction. Table 5.1 gives the technical specifications of the HDRC VGAy sensor with rolling shutter and the HDRC-G sensor with global shutter. The monochrome version of the sensors has a wide spectral sensitivity range from 400 to 950 nm; the color version has an additional RGB Bayer color mosaic filter. As the monochrome sensor is sensitive to infrared radiation, it is the sensor of choice for automotive night vision applications with active near-infrared (NIR) illumination.
The HDRC VGAy sensor with on-chip A/D converter and rolling shutter is explained in more detail below. It is well suited for high-performance cameras, monochrome or color, as well as for highly miniaturized camera modules. The block diagram is shown in Fig. 5.1. The HDRC VGAy is a high-dynamic-range CMOS imager with 10-bit digital video output, which compresses a wide range of light intensity instantaneously and within each pixel with a logarithmic response like our eyes, but exceeding their spontaneous range by orders of magnitude. The ultra-high dynamic range, which is larger than 160 dB (>8 decades of light intensity), leads to saturation-free images. This allows imaging without the need for exposure control such as aperture, shutter, or integration-time control. Pixels can be read non-destructively at high speed, which results in very high frame rates, e.g., more than 4,000 subframes per second of 3,000 pixels each. Additionally,
Table 5.1. HDRC sensor properties

Product name                       VGAy          HDRC-G        Unit
Pixels                             768 × 496     768 × 496
Optical format (VGA-resolution)    1/2           1/2           in.
Image diagonal (full frame)        9.1           9.1           mm
Technology                         0.25          0.25          µm
Pixel area                         10 × 10       10 × 10       µm²
Fill factor                        >40           >25           %
Dynamic range                      >160          >120          dB
Sensitivity (S/N = 1, 20 °C)       0.002         0.004         lx
Spectral sensitivity at 550 nm     0.30          0.30          A W−1
Spectral sensitivity at 850 nm     0.20          0.15          A W−1
Pixel rate                         12            12            Mpixel s−1
VGA frames                         38            38            frames s−1
Random access (AOI)                Yes           Yes
Subframes (3,000 pixels)           4 k           4 k           frames s−1
Shutter                            Rolling       Global
Digital output                     10            10            bit
A/D converter                      On-chip       On-chip
Power supply AVDD, VDD             3.3           3.3           V
Power supply LVDD                  2.5           2.5           V
the analog video and offset signals can be used in conjunction with an external A/D converter to increase the bit depth. The built-in address generator supports flexible readout schemes, which enable applications to randomly access subframes (areas of interest, AOI) of different sizes at different locations on a frame-by-frame basis. New subframe parameters can be set while the current subframe is being read out. The memory-mapped registers (MMRs) of the sensor are controlled by a serial interface suited for microcontroller-based vision systems. Figure 5.2 shows an HDRC sensor in a 48-pin ceramic image sensor package. For high-performance camera systems a lid glass with an antireflection coating is recommended to optimize the optical system. In the case of a color sensor, the infrared cut-off filter can also be implemented in the lid glass. Other package types, e.g., plastic image sensor packages, or different assembly techniques like chip-on-board (COB) mounting of the sensor die on the circuit board are possible.
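The AOI readout rates quoted above and in Table 5.1 follow directly from the pixel rate. The small sketch below shows this back-of-the-envelope calculation; the function is our own and readout overhead is ignored.

```python
def readout_rate(pixel_rate_hz=12e6, width=60, height=50):
    """Approximate frame/subframe rate from the sensor pixel rate, ignoring
    row and frame overhead. The default 60 x 50 AOI has 3,000 pixels and
    reproduces the roughly 4,000 subframes/s quoted for the HDRC VGAy."""
    return pixel_rate_hz / (width * height)

print(readout_rate())                         # ~4000 subframes per second
print(readout_rate(width=640, height=480))    # VGA readout: ~39 fps (38 fps with overhead)
print(readout_rate(width=768, height=496))    # full frame: ~31.5 fps
```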
Fig. 5.1. Block diagram of the HDRC VGAy sensor with on-chip A/D-converter (pixel array 768 × 496, address generation, register bank, sample/hold stage, analog path, ADC, serial interface)
Fig. 5.2. Packaged HDRC sensor (LCC48 package)
5.2 HDRC CamCube Miniaturized Camera Module

The HDRC CamCube, Fig. 5.3, is a miniaturized digital camera module with 140 dB dynamic range, low-light sensitivity, VGA resolution, and the generic Open Eye Module interface [1] for easy integration into digital image-processing systems. The Open Eye Module interface will be described in more detail in Sect. 5.4. Here, the focus lies on the assembly techniques, the system design, and the electronics of the miniaturized module.
Fig. 5.3. HDRC CamCube, miniaturized camera module
Fig. 5.4. Exploded view of the HDRC CamCube
5.2.1 Features of the HDRC CamCube

– HDRC VGA sensor with 640 × 480 pixels
– >140 dB dynamic range
– 10 bit digital output
– 30 fps at full resolution
– 10 MHz pixel rate
– User-defined subframe readout (AOI)
– MMRs of the sensor can be set by a serial interface
– Very compact module, only 25 × 25 × 25 mm³ (H, W, L)
– Standard NF-mount lens with adapters to C-mount or 12 × 0.5 mm lenses
5.2.2 Assembly Techniques Figure 5.4 shows the exploded view of the HDRC CamCube with the sensor board (to the right), followed by the board-to-board connection spacer, the
Fig. 5.5. Assembly of the HDRC CamCube (schematic): lens holder/housing, HDRC sensor (COB), connection spacer (BGA), ASIC controller (COB), flash memory (BGA)
controller board and output via a 32-pin flat connection cable, which can have a length of up to 20 cm. For the assembly of the module, several mounting techniques like chip on board (COB), ball grid array (BGA), and surface-mount technology (SMT) were used in order to shrink the dimensions of the printed circuit boards. See Fig. 5.5 with the schematic assembly and the main components HDRC sensor, ASIC controller, and flash memory.

– COB: direct chip mounting of the HDRC sensor and the ASIC controller on the circuit board. The bond wires of the chips were covered with a glob top for sealing.
– BGA: the flash memory and the board-to-board spacer are mounted by solder bumps in BGA technology.
– SMT: SMT for all other components like the oscillator, the resistors, and the capacitors.
5.2.3 System Design

Figure 5.6 gives an overview of the functional blocks and the signal chain of the HDRC CamCube: the pixel with the optoelectronic conversion of the incident light, the multiplexed readout of the pixels, the A/D conversion of the image data, and the fixed-pattern correction (FPC) in the controller with the correction data from the flash memory. The HDRC VGA sensor used here is a prior version and is similar to the HDRC VGAy sensor described in Sect. 5.1. Due to the small size of the module, the use of an on-chip A/D converter is necessary. The 10-bit output data are fed into the controller, which addresses the FPC memory, here a 512 k × 8 bit flash memory with pixel-rate access time. To reduce the size and power consumption of the camera module, the controller was realized as an ASIC instead of an FPGA.
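The FPC step can be pictured as a per-pixel table lookup and correction running at pixel rate. The sketch below is a strongly simplified software model, with our own names and a single additive offset per pixel, of what the ASIC performs in hardware; the real device uses a multi-point correction and a 512 k × 8 bit flash as its correction memory.

```python
import numpy as np

def fixed_pattern_correct(raw_dn, offset_table, out_max=1023):
    """Simplified per-pixel fixed-pattern correction: each 10-bit raw value is
    corrected with a per-pixel value (here a NumPy array standing in for the
    flash memory), and the mean level of the image is preserved."""
    corrected = raw_dn.astype(np.int32) - offset_table.astype(np.int32)
    corrected += int(offset_table.mean())      # keep the output in the 10-bit range
    return np.clip(corrected, 0, out_max).astype(np.uint16)

# Synthetic example for a 640 x 480 sensor
raw = np.random.randint(0, 1024, size=(480, 640), dtype=np.int32)
offsets = np.random.randint(0, 64, size=(480, 640), dtype=np.int32)
print(fixed_pattern_correct(raw, offsets).shape)
```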
Fig. 5.6. Block diagram of the HDRC CamCube: (1) log transistor (V ~ log Iph); (2) photodiode; (3) pixel capacitances; (4) pixel buffer; (5) multiplexer; (6) amplifier; (7) video A/D converter; (8) controller; (9) fixed-pattern correction memory
5.2.4 Application Example Figure 5.7 shows an application example of the HDRC CamCube mounted in an interior mirror of a car and Fig. 5.8 shows the interfacing to the Camera LinkTM standard.
5.3 HDRC Camera Front-End The HDRC Camera Front-End [1], Fig. 5.9, is intended to serve as a reference design for a quick integration of HDRC CMOS sensors in digital vision systems. Physical interfacing to any transmission board with standards like RS422, Camera LinkTM [2], 1394 Fire Wire, or USB2.0 is easy to accomplish by the generic digital video and synchronization signals of the interface. The
Fig. 5.7. HDRC CamCubes integrated in an interior mirror of a car looking on the road through the windshield
Fig. 5.8. HDRC CamCube with a Camera LinkTM interface board
Fig. 5.9. HDRC Camera Front-End
interface is open for actively controlling the HDRC sensor's registers via a bidirectional serial interface, e.g., accessed by a microcontroller or a digital signal processor. The HDRC Camera Front-End as well as the interface is designed for the highest demands in image capturing and transmission. The system concept allows smart solutions with minimum hardware requirements. An integrated lens mount adapter provides direct adaptation of special HDRC lenses but, at the same time, offers an interface to standard C-mount lenses. The dimensions of the HDRC Camera Front-End with the sensor and controller board are depicted in Figs. 5.10 and 5.11, respectively. The optical center is defined by the center of the sensor's active pixel array.
Fig. 5.10. Front side of the HDRC Camera Front-End with dimensions (all units in mm)
A flexible 24-bit parallel data bus with bit assignments of 1 × 8, 1 × 10, 1 × 12, 2 × 10, 2 × 12, 3 × 8, or up to 1 × 24 bit allows adaptation of future HDRC imager generations with high resolution (Megapixel or HDTV) and high-speed digital output. The pinout of the module connector is shown in Fig. 5.12, and Table 5.2 gives the pin definition of the digital signals and the power supply. The signals in Table 5.2 can be grouped according to their function into video data, frame synchronization, serial control, and system signals as well as power supply:

– Video data DIO23 to DIO0. DIO23 to DIO12 are assigned for a 12-bit, and DIO23 to DIO14 for a 10-bit single digital output, respectively.
– Frame synchronization signals. /LEN and /FEN define the horizontal and vertical frame synchronization, respectively. PCLK defines the pixel rate.
Fig. 5.11. Back side of the HDRC Camera Front-End with dimensions (all units in mm)

Fig. 5.12. Pinout of the module connector (type: ERNI SMC-B-50F, female; pins A25–A1 and B25–B1)
– Serial control interface. The signals SDIN, SCLK, SLD, and SOUT form an SPI slave interface following the common serial peripheral interface (SPI) standard.
– System signals. TRIGGER (input), /RESET, and the oscillator OSC (input or output) can be used for triggering on single events or for pixel-synchronous readout of multiple HDRC Camera Front-Ends, e.g., for stereo applications.
Table 5.2. Pin definition of the Camera Front-End interface

Signal (row A)               Pin A   Pin B   Signal (row B)
2.5 V / 1.8 V / 1.2 V (a)    A25     B25     2.5 V / 1.8 V / 1.2 V (a)
AGND                         A24     B24     AGND
A+3.3 V                      A23     B23     A+3.3 V
AGND                         A22     B22     AGND
D+3.3 V                      A21     B21     D+3.3 V
DGND                         A20     B20     DGND
SDIN (SDA)                   A19     B19     SCLK (SCL)
SLD                          A18     B18     SOUT
/RESET                       A17     B17     TRIGGER
DGND                         A16     B16     DGND
OSZ                          A15     B15     PCLK
/LEN                         A14     B14     /FEN
DIO23 (MSB)                  A13     B13     DIO22
DIO21                        A12     B12     DIO20
DIO19                        A11     B11     DIO18
DIO17                        A10     B10     DIO16
DIO15                        A9      B9      DIO14
DIO13                        A8      B8      DIO12
DGND                         A7      B7      DGND
DIO11                        A6      B6      DIO10
DIO9                         A5      B5      DIO8
DIO7                         A4      B4      DIO6
DIO5                         A3      B3      DIO4
DIO3                         A2      B2      DIO2
DIO1                         A1      B1      DIO0 (LSB)

(a) The supply voltage on pin A25/B25 depends on the HDRC module version
– Power supply. Only a few voltage levels are required for the operation of the HDRC module, such as analog and digital 3.3 V. Depending on the module version, an additional voltage such as digital 2.5 V may be necessary.
5.4 Digital HDRC Camera Link™ System

The HDRC Camera Link system is a very flexible PC-based digital video acquisition system which supports the Camera Link™ interface standard [2]. It is also intended as an evaluation platform for different HDRC sensors and applications. Due to the modular design of the camera, using the generic HDRC Camera Front-End described in Sect. 5.3, various camera modules can be adapted to the Camera Link interface board. Figure 5.13 shows the setup of the camera with the HDRC Camera Front-End, and Fig. 5.14 shows the complete system including a Camera Link frame grabber.
Fig. 5.13. HDRC Camera Front-End with Camera Link interface board (sensor board, control board, module connector, interface board; MDR-26 pin and 3-pin power connectors)

Fig. 5.14. HDRC Camera Link System
5.4.1 Features of the HDRC Camera Link Camera

– HDRC Camera Front-End, 768 × 496 pixel resolution
– >140 dB dynamic range (depending on the HDRC sensor type)
– Base Camera Link configuration, 12 bit per pixel
– 30 fps at full resolution, 38 fps at VGA resolution
– 12 MHz pixel rate
– User-defined subframe readout (AOI)
– MMRs of the sensor can be accessed by the serial camera configuration of the Camera Link standard
– Very compact camera, only 54 × 46 × 46.4 mm³ (H, W, L)
– Standard C-mount lens adapter
The HDRC Camera Link System offers great variety and flexibility to the user, from optical characterization of the sensor with raw image data, FPC techniques, and image analysis to advanced contrast- and tone-mapping algorithms, supported by user software and a development environment:

– User software "IP3 Control" for monochrome and color cameras (supports the frame grabber p3i CL)
– Software development environment: DLLs and documentation for programming user applications in C/C++ (supports the frame grabber p3i CL)

With the "IP3 Control" software the basic camera functions can be set via a graphical user interface.

5.4.2 Features of the "IP3 Control" Software

– Support of HDRC Camera Link cameras, monochrome and color (for the frame grabber p3i CL)
– User-defined subframe readout (AOI)
– 10 or 12 bit acquisition
– Several lookup tables (LUTs) for optimum displayed contrast in manual and automatic mode
– Software FPC
– Camera control for image size and trigger mode
– Image sequence recording and displaying
5.4.3 Application Example Figure 5.15 shows a xenon lamp in on and off state. With the HDRC Camera Link system a high-speed sequence of subframes, 100 × 100 pixels at a frame rate of 1,000 frames s−1 , was captured to visualize the switch-off process of the electric arc at the left electrode (see Fig. 5.16).
Fig. 5.15. Xenon lamp: (a) on state, (b) off state

Fig. 5.16. Switch-off of the xenon lamp: sequence of subframes of 100 × 100 pixels at a frame rate of 1,000 frames s−1 (t = 1 ms to t = 5 ms)
5.5 Intelligent HDRC GEVILUX CCTV Camera

The HDRC GEVILUX camera [3] is a fully automatic CCTV (NTSC/PAL) video camera with an extremely high dynamic range. Various integrated image-processing modes like contrast enhancement, different lookup tables, and edge detection make it particularly suitable for sophisticated machine-vision or process-control applications (Figs. 5.17 and 5.18).

5.5.1 Features of the Camera

– HDRC VGAy sensor
– Dynamic range >120 dB
– B/W or color camera
– Minimum detectable illuminance 0.02 lx (with lens f/1.4, B/W camera)
– Integrated real-time image processing
– Analog output NTSC/PAL
– Optoisolated trigger control
– RS232 control of output modes
– Small camera body 46 × 54 × 54 mm³ (without lens)
– Wide power supply input range 8–36 V
Fig. 5.17. HDRC GEVILUX CCTV camera with integrated real-time edge detection

Fig. 5.18. Back panel of the HDRC GEVILUX CCTV camera (7-segment display showing the active function, two micro switches for manual mode control, video output, and connectors for power, LVDS output, RS232 control, and trigger)
The electronics inside the camera consist of three boards, similar to the modular concept described in Sect. 5.4. The functional blocks implemented on the sensor board, the controller board, and the interface board are shown in Fig. 5.19 and listed in brief for each board.

Sensor Board – GLHEAD®
– HDRC VGAy sensor
– Readout of 640 × 480 pixels
– 12-bit off-chip A/D-converter
Fig. 7.6. Mode CatEye2
The detection of such blemishes can be very difficult because of the underlying ancillary conditions. Among these are, for example, the strongly fluctuating brightness on the surface to be tested and/or areas of shadow caused by the convoluted shape of the surface. On the other hand, impurities and/or surface discolorations contained in the lubricant (oil or lotion) used during processing can create confusing appearances on the surface to be tested. These can then be detected as pseudodefects. In order to carry out a surface inspection successfully under these conditions, new algorithms for defect detection are required. The physical basics of how defects are created, as well as adjustment mechanisms to deal with the fluctuations of brightness, form the basis for newly developed algorithms. The hardware of an image-processing system, particularly a PC-based one, often cannot cope with computation-intensive real-time algorithms. In addition, conventional CCD and CMOS cameras usually deliver a rather unusable image of metal surfaces: such an image typically contains extremely under- or overexposed areas that can be traced back to the brightness or to an extremely corrugated form of the metal surface. Only the combination of high-performance evaluation electronics, a camera with a highly dynamic HDRC sensor, and adaptive evaluation algorithms can guarantee the detection of all flaws on a surface with reflections and shadows.
7.6 Evaluation Algorithms

Actually, the defects sought are a "fluke of nature." They show certain characteristics that always occur with defects of the same type. The measurable values of these characteristics are, however, only similar between two
random defects but never identical. They can deviate quite significantly from predetermined reference values. Therefore, these defects cannot really be described or reliably recognized with formalistic methods such as structural analysis [1,2] or edge-profiling segmentation with predefined parameters [2,3]. This is why new trouble-shooting methods are always needed to guarantee a thorough inspection of an inhomogeneous surface, which can display either global or local fluctuations in brightness. For this, one needs to rely on the physical basics of how defects appear and on human visual behavior. Only an algorithm developed on this basis can be successful. Its implementation guarantees a definitive detection, recognition, and classification of surface defects. In order to develop such a procedure, the following basic question needs to be answered.

What is a Surface Defect?

As metallographic tests have shown repeatedly, a surface defect (crack, scratch, dent, etc.) is an object with characteristic edges [4]. These edges, however, are very similar to partial edges designed into the surface. Their similarity consists in an erroneous characteristic, such as the disproportionate light between the surface and the background (erroneous depth), that is equal to the relevant characteristics across the partial edge. This is especially relevant for the width of the mapped edge, which can be considered the transition width between the surface and the background (erroneous depth), independent of the luminous conditions across the edge. The characteristics of the edge to be analyzed, as well as the parameters of the recording system, such as its resolution, determine the width of the recorded edge. It is known [5] that a beam's intensity distribution displays a Gaussian profile. Since a surface can be considered a beam source due to its reflection, the luminance distribution from the surface across the edge to the background can also be described as a Gaussian distribution [2]. This model has also proven to be the best one for the precise calculation of the edge position with subpixel accuracy [6]. This is why one can assume a Gaussian profile (Fig. 7.7) when describing the gray-value profile established across the edge. This means that the gray-value profile has a normal distribution according to Gauss, i.e., it is an exponential function and is therefore characterized by certain specific, proportional points. Accordingly, the following adaptivity parameters can be derived from the gray-value profile:

– σ: the standard deviation, representing the semi-edge width of the detected edge
– ξ1: the inflection-point coefficient that determines the brightness Iturn on the edge
– ξ2: the turning-point coefficient that determines the background brightness Ibgrd
Fig. 7.7. Gray-value profile established along a scanning direction across an edge between the background and the surface (background level Ibgrd, edge brightness Iturn, surface level Isurf, threshold I0, semi-edge width σ)
– η0: the brightness factor that represents the minimal ratio between the brightness of the earliest possible edge position I0 and the background Ibgrd

With the help of these adaptivity parameters, all surface damage can be detected. This process is carried out in separate stages, in which a definite scan of the local surrounding conditions is performed across the edges of all detected objects. All scanned object regions are marked according to their respective analysis results (Fig. 7.8). The partially marked defects can then be segmented and evaluated using known, conventional contour-tracing methods. With this, the above-mentioned defects can be recognized and distinguished from other objects (pseudodefects), not only on a homogeneous but also on an inhomogeneous and/or structured surface (Fig. 7.9). Subsequently, all recognized defects have to be evaluated according to defined minimal parameters.
The newly developed algorithm [7] allows, for the first time, an inspection of free-shaped surfaces, because defect detection over the entire surface is guaranteed regardless of where and to what extent the evaluated surface appears in the recorded image. This makes a complex segmentation of the test areas in the image obsolete, which would otherwise make the testing program dependent on the surface form. In addition, all characteristic values necessary for the defect detection are determined dynamically from the image to be tested, in accordance with the local brightness conditions, and are used only for this particular image. All objects thus detected are then analyzed on the basis of their edge quality and, if applicable, evaluated as defects. Because of this, the layered process for defect detection can be described as adaptive edge-based object detection. Using this process, a safe and at the same time unambiguous defect detection is guaranteed, where merely the type of defect and the minimal defect size need to be entered. A simplified sketch of the underlying Gaussian edge model is given below.
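The following heavily simplified Python sketch illustrates a Gaussian edge profile and an acceptance test in the spirit of the adaptivity parameters listed above. The function names and threshold values are our own illustrations; they do not reproduce the patented algorithm [7], in which the thresholds are derived adaptively per image.

```python
import math

def edge_profile(t, i_bgrd, i_surf, sigma, t0=0.0):
    """Gaussian-type gray-value transition across an edge (cf. Fig. 7.7):
    the brightness rises from the background level I_bgrd to the surface
    level I_surf with semi-edge width sigma."""
    u = (t - t0) / (sigma * math.sqrt(2.0))
    return i_bgrd + 0.5 * (i_surf - i_bgrd) * (1.0 + math.erf(u))

def accept_edge_candidate(i_bgrd, i_turn, sigma, eta0=1.2, sigma_max=3.0):
    """Placeholder acceptance test: the edge brightness I_turn must exceed
    eta0 * I_bgrd, and the semi-edge width must stay below a resolution-
    dependent limit."""
    return i_turn >= eta0 * i_bgrd and sigma <= sigma_max

# Sample a profile across an edge and test the brightness at its inflection point
profile = [edge_profile(t, i_bgrd=40.0, i_surf=160.0, sigma=1.5) for t in range(-5, 6)]
print(accept_edge_candidate(i_bgrd=40.0, i_turn=profile[5], sigma=1.5))   # True
```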
Fig. 7.8. Sequence of the adaptive edge-based algorithm: (1) original image; (2) candidates; (3) 100% defects; (4) processing image; (5) resulting image
The new intelligent camera seelector ICAM-HD1 supports the algorithm detailed above and provides the necessary computing capacity to run it in real time. A proper inspection of free-shaped and arched metal surfaces can, however, be impeded by disturbing effects such as very strong reflections on the surface to be tested (Fig. 7.10a). It is important that these effects, which usually appear on strongly deformed metal surfaces, are compensated. This is where HDRC sensors have proven to be the last missing link for industry-capable image-processing systems. The HDRC sensor with its dynamic range of 170 dB, integrated into the seelector ICAM-HD1, is the perfect solution for this task. It provides recordings with such a high dynamic range that no over- and/or underexposed areas occur, while at the same time the requirements on the illumination are low. This opens up far more fields of application for industrial image-processing systems. Typically, the displayed image recorded with an HDRC sensor has a contrast that is too low (Fig. 7.10b). This can, for example, be successfully corrected (Fig. 7.10c) with the image-processing methods shown in Sect. 7.4. On the enhanced image, even the finest defects, such as open tears or closed constrictions, can be detected on a strongly deformed metal surface (Fig. 7.11) with the aid of the adaptive edge-based algorithm. The processing of an image recorded with an HDRC sensor can be carried out with fixed settings for all recordings or dynamically for each individual recording, depending on the real luminance conditions.
Fig. 7.9. Detection capacity of the newly developed algorithm on several different metal surfaces: (1) massive ground surface with a scratch; (2) massive raw surface with several cages; (3) unwashed surface of a preformed part with a crack; (a) original image; (b) processing image with all candidates (yellow marking) and 100% defect positions (green marking); (c) resulting image with defect (red frame marking)
This means that an explicit and safe detection of all defects on different metal surfaces is guaranteed with the aid of the adaptive edge-based algorithm (Fig. 7.12).
7.7 Robot-Controlled Image-Processing System for Fully Automated Surface Inspection

The combination of high-performance evaluation electronics, a camera with a highly dynamic HDRC image sensor, and an adaptive evaluation algorithm results in a product for the inspection of free-shaped metal surfaces – the MetalDefectDetector. The MetalDefectDetector is implemented with an intelligent
Fig. 7.10. Deformed metal surface with a constriction: (a) recording with a CCD camera (original image); (b) recording with an ICAM-HD1 (original image); (c) recording with an ICAM-HD1 (enhanced image)
Fig. 7.11. Defect detection capacity of the adaptive edge-based algorithm on a deformed metal surface (recording with an ICAM-HD1): (1) closed constriction; (2) closed constriction (tear); (a) enhanced image; (b) resulting image (with all marks)
Fig. 7.12. Defect detection capability of the adaptive edge-based algorithm on a metal surface: (1) recording with a CCD camera (original image); (2) recording with an ICAM-HD1 (enhanced image); (3) processed with an ICAM-HD1 (resulting image); (a) massive metal; (b) sheeting (preformed object)
camera described in Sect. 7.2. It can, for example, be installed and parameterized using a laptop and afterwards be operated as a fully standalone unit. Its implementation has already proven itself in industrial applications. Among other things, the MetalDefectDetector makes it possible for the first time to inspect the free-shaped metal surface of a 3D object. This inspection can be carried out fully automatically, with one or several MetalDefectDetectors moved along the surface to be tested by a robot (Fig. 7.13a). In doing so, the inspected surface is continuously recorded and evaluated (Fig. 7.13b), regardless of the location of the inspected surface and of the degree to which it is present in the recorded image. It is crucial for the robot-based implementation
Fig. 7.13. Robot controlled fully automated inspection of a free shaped metal surface: (a) the 3D object to be inspected (inspection along the yellow line); (b) resulting images
that none of the parts of the inspected surface are predefined or have to be pretrained. This enables a MetalDefectDetector based on HDRC technology to reach new horizons in fully automated surface inspection.
References
1. Burmeister A (1999) Frei konfigurierbare regionalbasierte Farbtexturanalyse zur automatischen Auswertung von Bild-Stichproben. Dissertation, Technische Fakultät der Universität Erlangen-Nürnberg, Erlangen
2. Jähne B (1997) Digitale Bildverarbeitung, 4. Aufl. Springer, Berlin Heidelberg New York
3. Demant C, Streicher-Abel B, Waszkewitz P (1998) Industrielle Bildverarbeitung. Springer, Berlin Heidelberg New York
4. Pohl M (2001) Unschärfe nutzen. QZ 5:615–617
5. Schröder G (1990) Technische Optik: Grundlagen und Anwendungen, 7. Aufl. Vogel (Kamprath-Reihe), Würzburg, S. 88
6. Naidu DK, Fisher RB (1991) A comparison of algorithms for subpixel peak detection. DAI Research Paper No. 553, Department of Artificial Intelligence, University of Edinburgh, Scotland, UK, 17 October 1991
7. Louban R (2003) Verfahren zur adaptiven Fehlererkennung auf einer inhomogenen Oberfläche. DE 10326033 C1, 10.06.2003, hema electronic GmbH
8 HDR Vision for Driver Assistance Peter M. Knoll
8.1 Introduction

According to an earlier analysis of the interrelation between collisions and advanced driver reaction performed by Enke [1], a significant number of accidents could be avoided through timely threat recognition and appropriate maneuvers for collision avoidance. This may be achieved either by a suitable warning to the driver or by automatic support of the longitudinal or lateral control of the vehicle. A precondition for the detection of the dangerous situation is the use of appropriate sensors. This leads to an environmental sensor vision system accompanied by a matched human-machine interface. Many vehicles already offer ultrasonic reversing aids as add-on systems. Furthermore, long-range radar systems for Adaptive Cruise Control (ACC) were introduced to the market a couple of years ago. New sensor technologies, in particular video, open up a plurality of novel functions, thus enhancing driving safety and convenience. With the availability of high-dynamic-range CMOS imager chips, video cameras will be introduced in vehicles. As a first step, functions for driver warning or information, e.g., lane departure warning and distance warning, are being introduced. In a second step, functions interacting with vehicle dynamics through brake, steering, and acceleration will open up new perspectives for collision mitigation and collision avoidance. Due to the possibility of its multiple usage, the potential of video sensing is considered high for various functions.
Today, the components for the realization of these systems – highly sensitive sensors and powerful microprocessors – are available and/or under development with a realistic time schedule, and the realization of the "sensitive" automobile is fast approaching. Soon sensors will scan the environment around the vehicle, derive warnings from the detected objects, and perform driving maneuvers, all in a split second and faster than the most skilled driver. In critical driving situations, only a fraction of a second may determine whether an accident occurs or not. Studies [1] indicate that about 60% of
Fig. 8.1. Surround sensing: detection fields of different sensors (ultrasonic, ultra-short range 0.2 to ≤1.5 (2.5) m; short-range radar/lidar, 0.2 to ≤20 m; video, medium range 0 to ≤80 m with object detection up to 80 m; infrared night vision, range ≤200 m; long-range radar (lidar), ultra-long range 1 m to ≤150 m)
front-end crashes and almost one-third of head-on collisions would not occur if the driver could react one-half second earlier. Every second accident at intersections could be prevented by faster reactions.
8.2 Components for Predictive Driver Assistance Systems

Electronic surround sensing is the basis for numerous predictive driver assistance systems – systems that warn or actively interact with the vehicle. Figure 8.1 shows the detection areas of different sensor types [2].

8.2.1 Ultrasonic Sensors

Reversing and parking aids today use ultra-short-range sensors in ultrasonic technology. Figure 8.2 shows an ultrasonic sensor of the fourth generation. The driving and the signal-processing circuitry are integrated in the sensor housing. The sensors have a detection range of approx. 3 m. Ultrasonic parking-aid systems have gained high acceptance with customers and are found in many vehicles. The sensors are mounted in the bumper fascia. When approaching an obstacle, the driver receives an acoustical and/or optical warning.
Fig. 8.2. Ultrasonic sensor fourth generation
8.2.2 Long Range Radar 77 GHz

The second-generation long-range sensor with a range of approx. 200 m is based on FMCW radar technology. The narrow lobe with an opening angle of ±8° detects obstacles ahead of the vehicle and measures the distance to vehicles in front. The CPU is integrated in the sensor housing. The sensor is multitarget capable and can measure distance and relative speed simultaneously. The angular resolution is derived from the signals of four radar lobes. Figure 8.3 shows the second-generation sensor, which was introduced to the market in March 2004. At that time this Sensor & Control Unit was the smallest and lightest of its kind on the market. The unit is mounted in the air-cooling slots of the vehicle front end or behind plastic bumper material.
The information of this sensor is used to realize the ACC function. The system warns the driver against following too closely or automatically keeps a safe distance to the vehicle ahead. The set cruise speed and the safety distance are controlled by activating the brake or the accelerator. The speed range is between 30 and 200 km h−1. At speeds below 30 km h−1 the system switches off and gives an appropriate warning signal to the driver.

8.2.3 Video Sensor

Figure 8.4 shows the current setup of the Robert Bosch automotive camera module. The camera head is fixed on a small PC board with the camera-relevant electronics. On the rear side of the camera board the plug for the video cable is
Fig. 8.3. Bosch 77 GHz Radar sensor with integrated CPU for adaptive cruise control (a) and exploded view (b)
Fig. 8.4. Bosch video camera module
mounted. The components are integrated into a metallic housing. The whole unit is clipped into a windshield-mounted adapter. CMOS technology with nonlinear luminance conversion covers a wide luminance dynamic range and will significantly outperform current CCD cameras. Since the brightness of the scene cannot be controlled in the automotive
Fig. 8.5. Comparison of the performance of a CCD camera (left) and a HDRC CMOS camera (right)
Fig. 8.6. Driver assistance systems on the way to the safety vehicle (parking aid, park assistant, night vision support, lane departure warning, ACC, ACC full speed range, ACC plus, vehicle guidance, lane keeping support, pedestrian/object detection, collision warning, pedestrian/object protection, parking stop, precrash sensing, predictive safety system (PSS))
environment, the dynamic range of common CCD technology is insufficient and high dynamic range imagers are needed. Figure 8.5 shows a comparison between the two technologies. On the left is an image taken with a CCD camera, on the right an image taken with a high dynamic range camera (HDRC) in nonlinear CMOS technology. It is obvious in the left image that the CCD sensor detects no gray shades within the light area of the tunnel opening – except two oncoming cars while every detail (trees, more oncoming cars, and a truck) can be detected in the right image [3].
8.3 Driver Assistance Systems for Convenience and for Safety

Figure 8.6 shows the enormous range of driver assistance systems on the way to the "Safety Vehicle". They can be subdivided into two categories:

– Convenience systems with the goal of semiautonomous driving
– Safety systems with the goal of collision mitigation and collision avoidance
Driver support systems without active vehicle interaction can be viewed as a preliminary stage of vehicle guidance. They warn the driver or suggest a driving maneuver. One example is the parking assistant of Bosch. This system gives the driver steering recommendations in order to park optimally in a parking space. In the future, a rear-view camera will be added to the system, allowing the driver to watch the scene behind the car on a graphic display. Additional information, e.g., course prediction based on steering angle and distance information, can be added to ease the parking procedure for the driver. Another example is the night vision improvement system. As more than 40% of all fatalities occur at night, this function has a high potential for saving lives. Lane departure warning systems can also contribute significantly to the reduction of accidents, as almost 40% of all accidents are due to unintended lane departure. ACC, which was introduced to the market a few years ago, belongs to the group of active convenience systems and will be developed further towards better functionality. If longitudinal guidance is augmented by lane-keeping assistance (also a video-based system, for lateral guidance), making use of complex sensor data fusion algorithms, automatic driving is possible in principle.
Passive safety systems comprise the predictive recognition of potential accidents and the functions of pedestrian protection. The highest demands regarding performance and reliability are put on active safety systems. They range from a simple parking stop, which automatically brakes a vehicle before reaching an obstacle, to Predictive Safety Systems (PSS) [4, 5].
8.4 Video-Based Driver Assistance Systems

Due to the high information content of a video image, video technology has the highest potential for future functions. These functions can be realized with the video sensor alone, or video signals can be fused with radar or ultrasonic signals.

8.4.1 Video System

Video technology will first be introduced for convenience functions whose behavior and interventions are transparent to the driver. Figure 8.7 shows the basic principle of operation of a video system. The enormous potential of video sensing is intuitively obvious from the performance of human visual sensing. Although computerized vision is still far from achieving comparable performance, a considerable amount of information and related functions can readily be obtained by video sensing:
– Night vision improvement
– Lane recognition and lane departure warning, position of the car within the lane, lane keeping support
[Fig. 8.7 block diagram: acquisition (camera) → processing (ECU) → communication/vehicle interaction (HMI/actuator) via bus; the processing maintains a model of the surrounding “world” providing position relative to lane, lane curvature, recognised traffic signs, object location, relative object speed, object identification, ...; driver information is given optically, acoustically, or haptically]
Fig. 8.7. Basic principle of a video system and functions being considered
[Fig. 8.8 diagram: six image-processing steps of increasing complexity and amount of sensor information – Seeing (night vision), Detection (lane detection: lanes), Recognition (road sign recognition: road signs), Measurement (object detection: vehicles, obstacles, curves), Classification (object classification: cars, trucks, pedestrians), Interpretation (scene interpretation: behavior of road users, prediction)]
Fig. 8.8. Steps of image processing
– Traffic sign recognition (speed, no passing, etc.) with an appropriate warning to the driver
– Obstacles in front of the car, collision warning
– Vehicle inclination for headlight adjustments and others
New methods of image processing will further improve the performance of these systems. Besides the measurement of the distance to the obstacle, the camera can assist the ACC system by performing object detection or object classification and thus open up the realization of PSS with a high collision avoidance and collision mitigation potential [6].

8.4.2 Image Processing

Figure 8.8 depicts the six different steps in image processing. The first step shows the image as it is seen by the camera. In the second step, relevant parts of the image are extracted based on a model or on features; examples are lane detection or the geometry and trajectory of a traffic sign. The traffic sign is recognized in the third step; the recognition is based on patterns which have previously been learned by the system. Step 4, which is explained in more detail below, describes the measurement of the position and the outlines of objects and markings, e.g., vehicles, obstacles, 3D lane curvature. Step 5 describes the object classification, i.e., the discrimination between, e.g., a vehicle and a pedestrian, while step 6 describes the most challenging image processing task, the interpretation of scenes, predicting the potential movement of other road users in the vicinity.
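To make the staged structure concrete, a schematic sketch is given below; the stage functions and their deliberately trivial bodies are purely illustrative assumptions and do not represent the actual Bosch image processing.

```python
"""Illustrative sketch of the six image-processing steps of Fig. 8.8.
All stage bodies are placeholders; they only show how each step builds on the previous one."""

import numpy as np

def seeing(frame):                      # step 1: the image as seen by the camera
    return {"image": frame}

def detection(state):                   # step 2: extract relevant parts (e.g., lane marks)
    img = state["image"]
    state["candidates"] = np.argwhere(img > img.mean() + 2 * img.std())
    return state

def recognition(state):                 # step 3: compare candidates with learned patterns
    state["road_signs"] = []            # placeholder for template matching
    return state

def measurement(state):                 # step 4: positions and outlines of objects, lane curvature
    state["objects"] = [{"position": tuple(p)} for p in state["candidates"][:5]]
    return state

def classification(state):              # step 5: discriminate vehicle vs. pedestrian, etc.
    for obj in state["objects"]:
        obj["label"] = "unknown"        # placeholder classifier
    return state

def interpretation(state):              # step 6: predict movement of other road users
    state["predictions"] = [obj["position"] for obj in state["objects"]]
    return state

def process(frame):
    state = seeing(frame)
    for stage in (detection, recognition, measurement, classification, interpretation):
        state = stage(state)
    return state

result = process(np.random.rand(480, 640))   # stand-in for a camera frame
```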
8.5 Night Vision Improvement System

In Germany more than 40% of all fatal accidents occur at night, while only 20% of traffic happens at night. In the United States the probability of an accident at night is a factor of 5 higher than at daytime. In 2000, more than 265,000 animals were involved in traffic accidents at night, and every year more than 3,500 pedestrians are killed at night; these are 64% of all fatal accidents involving pedestrians [7]. Accident statistics show that night vision driver assistance systems (NV-systems) have a huge potential for reducing accidents and fatalities by collision mitigation and avoidance. The evolution of night vision systems provides a roadmap forecast of the development of functions and their combinations aimed at assisting drivers at night. In 2000, a night vision enhancement system based on far infrared (FIR) was introduced on the US automotive market. A specially designed camera, sensitive in the wavelength range around 7,000 nm (7 µm), picks up an image. Warm objects appear light, while cold objects are not visible. FIR systems suffer from rather low resolution and from the fact that objects with the same temperature as the ambient are not visible to the camera. Another approach, illustrated in Fig. 8.9, uses modified halogen headlamps radiating near infrared (NIR) light with a characteristic comparable to a visible high beam. CCD or CMOS cameras are sensitive in the NIR wavelength range and can pick up the “illuminated” scene in front of the car as it is seen by the vehicle’s driver. The camera image can be shown, e.g., on a head-up
Fig. 8.9. Night vision improvement system using near infrared (NIR) radiation of the vehicle’s modified headlamps
display projected onto the windscreen of the vehicle. The viewing range of the driver is thus extended to the range of the NIR irradiation. One can foresee that NV functions will be introduced on the market in three major steps:
– Night Vision Enhancement (NV). First NV-systems will present enhanced live images of the road scene ahead in or close to the primary view of the driver.
– Night Vision Warning (NVW). NVW-systems will no longer present live images, but appropriate warnings to the driver. NVW-systems will cover, depending on their development, subfunctions such as Obstacle Warning, Lane Departure Warning and Lane Keeping Support, Lane Guidance Assistance, and Road Sign Assistance.
– Night Vision Safety (NVS). With NVS-systems, active safety functions will be introduced. NVS will extend Night Vision Warning functions by Obstacle Mitigation/Avoidance Support. At this stage, the NIR systems (illumination and camera) may have reached a level of performance at which algorithms for collision mitigation and avoidance can be used at daytime and at night without separate implementations.
8.6 Night Vision Enhancement by Image Presentation

The use cases for NV are mostly nonilluminated country roads and situations with approaching vehicles whose glare might hide pedestrians, cyclists, or other obstacles in one’s own lane.
Fig. 8.10. Images from NV-system without (left) and with (right) infrared illumination
Independent of the underlying technology (FIR or NIR), the principle of this function is in any case the presentation of an enhanced live image in real time in or close to the driver’s primary field of view. The basic idea is to present images from a range in front of the vehicle which equals at least the viewing range of the naked human eye under high-beam illumination. From this basic requirement it follows that NV-systems have to cover a range of at least 120 m, and the resolution of the FIR or NIR imager used, including the display presenting the images, has to be large enough that appearing objects can be recognized by the driver. It is expected that a resolution of 640 pixels in the horizontal direction with a field of view of approx. 20° will be sufficient for appropriate image presentation (Fig. 8.10). The images were taken with an NIR-system with a CMOS imager. The “puppets” are at distances of 60 m, 90 m, and 120 m. The highly reflective road sign behind the first puppet is at a distance of 80 m. The human machine interface (HMI) plays a major role for this function, because it directly determines the usability of the system. Two main technologies have been identified as suitable for the image presentation: the head-up display and the direct-view display. The head-up display is a projection from a conventional TFT-LCD or digital mirror device directly onto the windscreen of the vehicle or onto an additional flappable mirror, usually mounted on top of the dashboard. The direct-view display could be a conventional TFT-LCD mounted in the instrument cluster or above the center console. The latter placement is not recommended due to distraction from the task of driving. In any case, while driving with such a system at night, the driver has the additional task of continuously observing the enhanced images to check whether any obstacle appears on the display which cannot be seen by the naked eye. This task increases the eye-off-road time and, as a side effect, correspondingly decreases the attention available for driving.
8.7 Night Vision Warning

The night vision warning system (NVW-system) is the logical further development of image-presenting NV-systems, aiming to increase safety while driving
at night. The basic idea of NVW-systems is to assist the driver’s observation and checking of night vision images with a computer. The driver no longer receives live images, but an appropriate warning or information whenever potentially relevant objects are detected by the computer within the night vision images. Relevant objects identified for NVW-systems are obstacles, lanes, and road signs.
8.8 Sensor Data Fusion

Depending on the functions to be realized, information from other sensors can be used to support the measurements from the image processing. The goal is to increase the overall reliability of the measurements and therefore to maximize the prediction probability or minimize the false alarm rate. It is expected that sensor data fusion for night vision will be introduced together with NVW-systems. Especially the subfunctions Obstacle Warning and Lane Departure Warning/Lane Guidance Assistance can be improved.

Data Fusion for Obstacle Warning

Camera images are well suited to determining lateral positions of objects but quite poor at detecting longitudinal ranges. Therefore, range sensors such as RADAR or LIDAR, with good longitudinal detection ability, are well-suited complementary sensors for data fusion. Both sensor types are already used in series car models for the ACC function. In Europe RADAR sensors are mostly used, while LIDAR sensors have a large market penetration in Japan.

Data Fusion for Lane Departure Warning/Lane Guidance Assistance

The detection range of lane detection algorithms is determined by the viewing range of the optical system of the camera. With an increased “viewing range,” e.g., by prediction of the road section ahead from navigation maps, lane detection and lane guidance could be more stable and therefore more reliable.

Requirements for the Camera and Imager in Night Vision Systems

For night vision applications, a number of requirements are placed on the camera and the imager:
– A high quality lens array is required to avoid halos
– CMOS technology is a must to avoid blooming and smearing
– A high sensitivity is required, especially in the case of image presentation to the driver
– A wide dynamic range (>120 dB) is required to cover the high brightness range
– A nonlinear intensity response is required to avoid blinding of the imager by the headlights of oncoming vehicles
To achieve a naturalistic image on the graphic display, corrections and additional signal processing are required [8] (a minimal sketch of such a chain follows below):
– CMOS imagers have a fixed pattern noise (FPN), requiring an additional memory and a calibration procedure
– After grabbing the image, additional noise reduction has to be performed
– Edge enhancement is necessary
– A nonlinear intensity compression is required
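A minimal sketch of such a correction chain is given below; the filter sizes, the sharpening gain, and the logarithmic compression law are illustrative assumptions, not the actual HDRC signal path.

```python
"""Sketch of the correction chain listed above: FPN subtraction, noise
reduction, edge enhancement, nonlinear compression to an 8-bit display range."""

import numpy as np
from scipy.ndimage import median_filter, gaussian_filter

def correct_frame(raw, fpn_calibration, detail_gain=0.5):
    # 1. Fixed pattern noise: subtract a stored per-pixel calibration frame
    img = raw.astype(np.float64) - fpn_calibration

    # 2. Noise reduction after grabbing the image (here: a small median filter)
    img = median_filter(img, size=3)

    # 3. Edge enhancement (unsharp masking)
    blurred = gaussian_filter(img, sigma=1.5)
    img = img + detail_gain * (img - blurred)

    # 4. Nonlinear intensity compression of the wide dynamic range to 8 bit
    img = np.clip(img, 0, None)
    img = np.log1p(img) / np.log1p(img.max() + 1e-9)
    return (255 * img).astype(np.uint8)

# Example: a synthetic 12-bit-like frame and an all-zero calibration frame
raw = np.random.randint(0, 4096, size=(480, 640))
fpn = np.zeros_like(raw, dtype=np.float64)
display_frame = correct_frame(raw, fpn)
```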
8.8.1 Lane Detection and Lane Departure Warning

Lane departure warning (LDW) systems can be operated with a mono or with a stereo camera. As shown in Fig. 8.11, the camera searches for lines in front of the vehicle. The upper image shows the camera image with search lines (see the detail in the second image from the top); the crosses in the upper image show the lane course calculated by the image processing computer. To detect a line, the luminance signal within the search line is analyzed and, using a high-pass filtering algorithm, the edges of the line are detected. From these signals a warning can be derived if the driver is crossing the lane mark without having set the turn indicator. This warning may be acoustical or haptical, e.g., by applying a torque to the steering wheel in the opposite direction [9].

8.8.2 Traffic Sign Recognition

A traffic sign recognition system consists of the video system described above, but with a mono camera. The image of the camera is shown in Fig. 8.12. While driving, the camera permanently searches for round-shaped objects which could be traffic signs. When such an object is detected, it is tracked until the resolution of the camera chip is sufficient for reading the sign. Reading in this case means a comparison between the recognized sign and sign patterns which have previously been learned by the system. The recognized traffic signs can then be displayed on a graphic display located in the instrument cluster or in the center console. It is easy to add an acoustic warning when the driven speed is higher than the allowed speed [9].
[Fig. 8.11 panels: original image with search lines; detail with search line; luminance signal within search line; edge information by high-pass filtering]
Fig. 8.11. Principle of lane detection
[Fig. 8.12 panels: detection & tracking; classification]
Fig. 8.12. Traffic sign recognition: camera image with tracking lines and recognized signs
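The edge extraction along a single search line described in Sect. 8.8.1 can be sketched as follows; the derivative-based high-pass filter and the threshold value are illustrative assumptions, not the production algorithm.

```python
"""Sketch of lane-mark detection along one search line (cf. Fig. 8.11):
the luminance profile is high-pass filtered and strong opposite-signed
edges mark the two borders of a lane line."""

import numpy as np

def lane_edges(luminance_line, threshold=30.0):
    # Discrete derivative acts as a high-pass filter on the luminance profile
    deriv = np.gradient(luminance_line.astype(np.float64))
    rising = np.where(deriv > threshold)[0]    # dark road -> bright lane mark
    falling = np.where(deriv < -threshold)[0]  # bright lane mark -> dark road
    return rising, falling

# Example: synthetic search line with a bright lane mark around pixels 200-209
line = np.full(640, 40.0)
line[200:210] = 200.0
left_edges, right_edges = lane_edges(line)
print(left_edges, right_edges)  # edge positions feeding the lane-course estimation
```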
8.9 Conclusion

Driver assistance functions have the potential to contribute significantly to reducing accidents and fatalities while driving at day and night. From the evolution of night vision systems, active near-infrared technology combined with a high-resolution CMOS sensor is expected to be the basic technology with the most advantages for future functions. Current first-generation night vision systems presenting live images in the driver’s primary field of view will be replaced by systems based on object detection. These second-generation systems will provide warnings instead of images to the driver, e.g., for lane departure warning, lane guidance, or obstacle collision warning, which will require new ideas for human machine interfaces. From this stage on, the introduced functions will make further use of data fusion with other sensors or systems, such as navigation maps and range sensors based on RADAR or LIDAR technology, to increase the reliability in detecting
objects. Further on, object detection will be the basis for controlling actuators for lane keeping functions and for collision mitigation and collision avoidance functions. The highest demands regarding performance and reliability are put on active safety systems. They range from a simple parking stop, which automatically brakes a vehicle before reaching an obstacle, to computer-supported control of complex driving maneuvers to avoid collisions. For example, the automatic emergency braking feature intervenes if a crash is unavoidable. At its highest level of refinement, an active system intervenes in steering, braking, and engine management to avoid colliding with an obstacle. Here, the vision is the collision-avoiding vehicle, performing computer-assisted driving maneuvers for crash avoidance [5]. For all these future functions, video is a key technology.
References
1. Enke, K.: “Possibilities for Improving Safety Within the Driver Vehicle Environment Loop”, 7th Intl. Technical Conference on Experimental Safety Vehicle, Paris (1979)
2. Knoll, P.M.: “Mehr Komfort und mehr Sicherheit durch prädiktive Fahrerassistenzsysteme”, Conf. Proc. Vision Automobil, Handelsblatt, München (2005)
3. Seger, U.; Knoll, P.M.; Stiller, C.: “Sensor Vision and Collision Warning Systems”, Convergence Conference, Detroit (2000)
4. Knoll, P.M.: “Predictive Safety Systems”, Mstnews, Mitteilungsblatt der DLR zum Förderprogramm Mikrosystemtechnik (2004)
5. Schäfer, B.-J.; Knoll, P.M.: “Prädiktive Fahrerassistenzsysteme – vom Komfortsystem zur aktiven Unfallvermeidung”, Proc. VDI-Conference, Wolfsburg (2004)
6. Knoll, P.M.: “Fahrerassistenzsysteme – Realer Kundennutzen oder Ersatz für den Menschen?”, VDI, Deutscher Ingenieurtag, Münster, Germany (2003)
7. Anonymous statistics of accident data of the “Statistisches Bundesamt” (German Federal Statistical Office), Wiesbaden, Germany (2002)
8. Bischoff, S.; Haug, K.: “Automotive NIR-Night-Vision-Systems: Challenges for Series Introduction”, Proc. Vision 2004, Rouen (2004)
9. Knoll, P.M.: “Handbuch Kraftfahrzeugtechnik”, chapter “Fahrerassistenzsysteme”, Vieweg-Verlag (2005)
9 Miniature HDRC Cameras for Endoscopy Christine Harendt and Klaus-Martin Irion
Visualization of the health status of organs is a major task in medical diagnosis and therapy. Endoscopy and minimally invasive surgery (MIS) are techniques for this purpose in medical applications. Recent developments in microelectronics allow the fabrication of advanced and highly integrated image sensors and improved solutions for miniaturization and wireless data transmission [1]. Important aspects in endoscopy are the demand for smaller devices and the need to integrate high-quality but low-cost visualization techniques. Moreover, the problems and cost of sterilization have raised the wish to fabricate disposable endoscope heads. While the majority of endoscopes use optical lens systems (rigid endoscopes) or fiber bundles (flexible endoscopes) to transmit the images to a camera, video-endoscopes have the camera directly at the tip. Such an endoscope head is thus a microsystem with image sensor chip, optics, illumination, and electrical wiring. Image data are transmitted via cables, which also provide the power for the system. A major objective in a recently finished research project of the European IST (Information Society Technology) program was the development of a small image sensor for an endoscope with a very small diameter. In the project IVP (Intracorporeal Videoprobe), a consortium with partners from four European member states cooperated in the development of videoprobes [2]. One prototype, the so-called wired probe IVP1, has an endoscope head diameter of 3.5 mm (Fig. 9.1). The HDR color image sensor could be realized with outer dimensions of 1.7 × 1.3 mm² and a resolution of 36,000 pixels. A major issue in the chip development was the size reduction of both the image area and the overall chip. Thus, in addition to the general objectives (sensitivity, dynamic range, and low dark current), three issues had to be addressed:
Fig. 9.1. Videoprobe with optics and illumination
Fig. 9.2. Color image obtained with an IVP1 image sensor. The enlarged images show details revealed after edge detection
– The minimization of the pixel size to allow acceptable dimensions for this sensor.
– The reduction of the number of pad connections, because bonding pads with their area and the required pad circuitry cover a significant area of the chip. Additionally, each bond also requires space on the package.
– The testability of the device to allow complete evaluation and verification of the first devices.
The resulting sensor with a logarithmic response has a pixel pitch of 4.6 µm. The sensor shows a measured dynamic range >100 dB and has a minimal detectable illumination of 0.03 lx.
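For orientation, the dynamic range quoted in decibels translates into an illuminance ratio as follows (assuming the usual 20·log10 convention for optical dynamic range):

\[
\mathrm{DR_{dB}} = 20\,\log_{10}\frac{E_{\max}}{E_{\min}}
\quad\Longrightarrow\quad
\mathrm{DR_{dB}} > 100\ \mathrm{dB} \;\Leftrightarrow\; \frac{E_{\max}}{E_{\min}} > 10^{5}.
\]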
Figure 9.2 demonstrates the performance of the color device on an image of a human eye. Although the bright reflection in the center seems to be saturated, more details are visible in the enlarged images, which have been processed with edge-detection software. The revealed details show the shape of the spotlight used for the illumination during the shot. The assembled device has only four connections, but the chip is originally equipped with 43 pads, which are used for the complete digital test. The majority of these pads are cut off before the final mounting of the chip. For the fabrication of the IVP1 prototype, the chip is mounted on a circular (Ø 3 mm) ceramic substrate, where it is connected to the cables. Two openings in the circuit board are used to fix the optical fibers for the illumination; a metal cap with the optics completes the distal end of the endoscope. The analog image data are transferred via the cables to a PC, where they are processed and subsequently displayed.
References
1. K.M. Irion: “Endoscopes – Potential for Microsystem Technology”, MST News, Special Issue “Microsystems Made in Germany”, 2005, p. 26
2. A. Arena, M. Boulougoura, H.S. Chowdrey, P. Dario, C. Harendt, K.M. Irion, V. Kodogiannis, B. Lenaerts, A. Menciassi, R. Puers, C. Scherjon, D. Turgis: “Intracorporeal Videoprobe (IVP)”, in Medical and Care Compunetics 2, Volume 114, edited by L. Bos, S. Laxminarayan and A. Marsh, IOS Press, May 2005, pp. 167–174
10 HDR Sub-retinal Implant for the Vision Impaired Heinz-Gerd Graf, Alexander Dollberg, Jan-Dirk Schulze Spüntrup and Karsten Warkentin
10.1 Introduction

The photoreceptor cells of the human retina can degenerate as a result of different diseases, which account for about 50% of the incidence of total blindness. Retinitis Pigmentosa and age-related macular degeneration are two main examples of the progressive degeneration of the outer retina. Until now, there is no known successful medical treatment to cure the tissue and restore the visual faculty [1]. Thus, a visual prosthesis to recover sight is an important application of integrated microsystems. The different concepts apply electrical signals to the visual cortex [2], the optic nerve [3], or the retinal tissue. For retinal implants, basically two different types exist. The epi-retinal implant uses the bottom layer of the retina to supply electrical impulses for stimulation. In the healthy eye, the processed information is sent to the brain by the ganglion cells. The epi-retinal implant is mounted on the retina and supplies the electrical impulses to the ganglion cells/optic-nerve interface [4–6]. Therefore, a complex image processing and transmission unit is needed to generate an appropriate stimulus. The second method is the sub-retinal implantation, which is used by the implant introduced here (Fig. 10.1). The degenerated photoreceptors are replaced in loco by the implant. The implant cells directly stimulate the subsequent bipolar cells and thus exploit the image processing abilities of the retina and the eye movements. No image processing unit is needed in this implant. For sub-retinal implants, a high number of artificial stimulation cells is needed to replace the light-sensitive human cells in a sufficient manner. In many reports, passive photodiode arrays for electrical stimulation have been proposed [7–9]. After a thorough investigation it appeared that the generated impulses were not strong enough to successfully stimulate the retina [10]. Thus, an active implant with analogue amplifiers for processing the photodiode signals was developed [11]. Similar to the micro-photodiode arrays, the active retinal implant utilises the eye-lens image to create the electrical stimulation in the different image
Fig. 10.1. Schematic of the human eye with the insertion of a sub-retinal implant
areas. The amplifier cells convert differences between the local and the average illumination into electrical impulses. The stimulation electrodes placed on every amplifier supply an amount of charge locally into the tissue of the retina. The maximum height of the output pulses and also the DC level are strictly limited because higher values can cause electrolysis effects and decompose the tissue surrounding the electrode. This could permanently damage the retina and has to be avoided. Moreover, the injected charge has to be in a particular range to ensure successful stimulation of the retina [8, 9]. For this reason, a very important circuit requirement is the measurability and testability of any amplifier cell on the implant.
10.2 Electronic HDR Photoreceptors

The conversion of the brightness into a voltage is performed by a logarithmic cell. A photodiode is operated in the reverse direction. Corresponding to the chip illumination, a proportional reverse current is generated in the photodiode. The photo-current is supplied to a MOS transistor operating in the sub-threshold region, as described in Fig. 2.6 (Fig. 10.2). The voltage drop across a MOS transistor working in the sub-threshold region depends logarithmically on the current (the current depends exponentially on the voltage). A logarithmic change in illumination therefore causes a linear change in the output voltage of the logarithmic cell. This results in a logarithmic behaviour similar to the light sensitivity of the eyes. This circuit is well known to work over a very wide dynamic range of more than 1–10^6. Averaging the voltages of several cells by an analogue summation and attenuation circuit results in a signal representing the average illumination. This signal is very robust to the large contrast of a natural environment.
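In compact form, the cell realizes approximately the relation below (a simplified sketch; the subthreshold slope factor n, the thermal voltage U_T, and the current scale I_0 are symbols introduced here for illustration only):

\[
V_{\mathrm{out}} \approx n\,U_T\,\ln\!\frac{I_{\mathrm{ph}}}{I_0}, \qquad I_{\mathrm{ph}} \propto E,
\]

so each decade of illuminance E shifts V_out by the constant amount n·U_T·ln(10), typically on the order of 60–100 mV at room temperature.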
[Fig. 10.2 circuit labels: local photodiode (local illumination), global photodiode (global illumination), differential amplifier with pull-down signal, output to the biological load, test connection, supply rails VDD/VSS, sawing line of the implant]
Fig. 10.2. Circuit schematic of one pixel of the implant
Bright local illumination of a small part of the “average illumination” cell network results in only small changes of the average illumination signal.
10.3 The Differential Principle

For the amplifier cells, a differential configuration was chosen in order to obtain differences in lightness independent of the absolute level of brightness (Fig. 10.2). In this way, patterns can be recognized identically in daylight and twilight. The negative input of the differential amplifier is the representation of the local illumination. The voltage corresponding to the global illumination is connected to the positive input. If the local brightness rises, the negative input voltage decreases and the output voltage of the amplifier increases. The output of the amplifier is not dependent on the absolute voltage level over a very wide range of illumination; it responds only to the difference between the illuminations. The resulting signal corresponds mainly to differences in the reflectance of illuminated objects. Each pixel delivers a charge proportional to the logarithm of the illuminance, effectively 2.5 µC per decade of illuminance, and the dynamic range of each pixel is two decades of illuminance around the global illuminance. The maximum charge packet delivered at the pixel output is 30 × 10^12 electrons. The two decades of sensitivity are shifted with the global illuminance level.
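Written out, the differential principle amounts to taking the logarithm of an illuminance ratio; the symbols E_loc, E_glob, and the gain k below are our shorthand, not the chapter’s notation:

\[
V_{\mathrm{loc}} \approx V_0 - k\,\log E_{\mathrm{loc}}, \qquad
V_{\mathrm{glob}} \approx V_0 - k\,\log E_{\mathrm{glob}},
\]
\[
V_{\mathrm{out}} \propto V_{\mathrm{glob}} - V_{\mathrm{loc}} = k\,\log\frac{E_{\mathrm{loc}}}{E_{\mathrm{glob}}},
\]

so the stimulus depends only on the ratio of local to average illuminance (roughly the local reflectance), not on the absolute light level, and is limited to about two decades around E_glob.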
10.4 The Complete Amplifier Cell

The complete cell integrates the HDR photoreceptors for local and average illumination, a differential amplifier and the discharge switch. As test circuits,
there are a NAND gate and an inverter for cell selection as well as some switches needed to connect the output of the amplifier to the test circuit and to make the inputs accessible. A pulsed power supply is connected to the amplifier. With each pulse, the amplifier cell emits a charge between 0.5 and 10 nC into the highly capacitive load presented by the tissue. The maximum height of the output pulses is set to 2 V, because a higher voltage can cause electrolysis effects. For this reason, the design of the amplifier considers 2 V as the maximum output voltage. The electrode capacitance restricts the charge margin. Discharging the output node to the ground reference voltage during the time of inactivity secures charge balance over a full pulse cycle. This is done by a pull-down circuit that forces the amplifier outputs to ground during inactivity and thus discharges the electrodes. Additionally, this ensures charge balance of the stimulation impulses.
Fig. 10.3. Capsulated retinal implant. The higher magnification shows the output contacts to the tissue (photo: Retina Implant AG)
The pull-down signal is generated by an NMOS inverter. The inverter is connected to a different supply voltage to keep on working even if the pulsed supply is switched off. The dimensions of the cells are 72 × 72 µm².
10.5 The Retinal Implant

The complete implant consists of 38 × 38 cells. The space of the missing cells is needed to place the reference source, the pull-down inverter and the bonding pads to connect the final implant. The dimension of the implant chip is 3 × 3.1 mm² (Fig. 10.3). During normal operation, the implant uses a pulsed supply at about 20 Hz. The active time per period is about 500 µs. The pull-down signal is generated by means of an additional DC supply voltage.
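From these figures the duty cycle of the stimulation follows directly:

\[
\text{duty cycle} \approx \frac{500\ \mu\mathrm{s}}{1/(20\ \mathrm{Hz})} = \frac{500\ \mu\mathrm{s}}{50\ \mathrm{ms}} \approx 1\%.
\]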
Acknowledgement

The project in which this implant was developed was funded in part by the BMBF (German Federal Ministry of Education and Research). The cooperation with the NMI (Naturwissenschaftlich Medizinisches Institut, Reutlingen), with the Augenklinik of the University of Tuebingen, and with Retina Implant AG, Reutlingen, is gratefully acknowledged.
References
1. E. Zrenner: “Will retinal implants restore vision?”, Science, 295, 1022–1025, Feb. 2002
2. W.H. Dobelle: “Artificial vision for the blind by connecting a television camera to the visual cortex”, ASAIO J., 46, 3–9, Jan–Feb 2000
3. C. Veraart et al.: Brain Res., 813, 181, 1998
4. R. Eckmiller: “Learning retina implants with epi-retinal contacts”, Ophthalmic Res., 29, 281–289, 1997; A.Y. Chow, M.T. Pardue, V.Y. Chow, G.A. Peyman, C. Liang, J.I. Perlman, N.S. Peachey: “Implantation of silicon chip microphotodiode arrays into cat sub-retinal space”, IEEE J. Solid State Circuits, 35(10), 2000
5. M. Schwarz, R. Hauschild, B.J. Hosticka, J. Huppertz, et al.: “Single-chip CMOS image sensors for a retina implant system”, IEEE Trans. Circuits Systems II, 46(7), 1999
6. R.J. Greenberg, T.J. Velte, M.S. Humayun, G.N. Scarlatis, E. de Juan: “A computational model of electrical stimulation of retinal ganglion cell”, IEEE J. Solid State Circuits, 35(10), 2000
7. A.Y. Chow: “Electrical stimulation of the rabbit retina with subretinal electrodes and high density microphotodiode array implants”, Invest. Ophthalmol. Vis. Sci., 34 (suppl), 835, 1993
8. M. Stelzle, A. Stett, B. Brunner, M. Graf, W. Nisch: “Electrical properties of micro-photodiode arrays for use as artificial retina implant”, Biomed. Microdevices, 3(2), 133–142, 2001
9. B. Schlosshauer, A. Hoff, E. Guenther, E. Zrenner: “Towards a retina prosthesis model: Neurons on microphotodiode arrays in vitro”, Biomed. Microdevices, 2(1), 61–72, 1999
10. F. Geckler, H. Schwahn, A. Stett, K. Kohler, E. Zrenner: “Sub-retinal microphotodiodes to replace photoreceptor function. A review of the current state”, Vision, sensations et environment, Irvinn, Paris, 2001, pp. 77–95
11. A. Dollberg, H.G. Graf, B. Höfflinger, W. Nisch, J.D. Schulze Spuentrup, K. Schumacher, E. Zrenner: “A fully testable retinal implant”, Proceedings of IASTED Conference on Biomedical Engineering, Salzburg, 2003, pp. 255–259
11 HDR Tone Mapping Grzegorz Krawczyk, Karol Myszkowski, and Daniel Brosch
High dynamic range (HDR) imaging is a very attractive way of capturing real world appearance, since it permits the preservation of complete information on the luminance (radiance) values in the scene for each pixel. This, however, requires the development of a complete pipeline for HDR image and video processing from acquisition, through compression and quality evaluation, to display. While the prospects for proliferation of HDR capturing (refer to Chaps. 2 and 5) and multimedia (refer to Chap. 12) technologies are good, an important problem remains: how to reproduce the scene appearance using media with very limited dynamic range such as hard-copy prints, CRT/LCD displays, and projectors. While this may change in the future due to successful efforts in building HDR displays [1] (refer also to Chap. 14), at present such devices are relatively rare and expensive. To solve the problem of transforming high dynamic range images to the limited range of most existing devices, so-called tone mapping operators are being developed. Essentially, tone mapping addresses the problem of strong contrast reduction from scene values to displayable ranges while preserving the image details which are important to appreciate the scene content. The goals of tone mapping can be stated differently depending on the particular application. In some cases producing just “nice-looking” images is the main goal, while other applications might emphasize reproducing as many image details as possible, or might maximize the image contrast. The goal in realistic rendering applications might be to obtain a perceptual match between a real scene and a displayed image even though the display device is not able to reproduce the full range of luminance values. However, achieving such a match is feasible because the human visual system (HVS) has a limited capacity to detect differences in absolute luminance levels, and concentrates more on aspects of spatial patterns when comparing two images [2]. It should be noted that the choice of an appropriate tone mapping algorithm and its parameters may depend not only on a particular application, but also on the type of display device (projector, plasma, CRT, LCD) and its characteristics, such as reproduced maximum contrast, luminance range,
and gamma correction. Also, the level of surround luminance, which decides upon the HVS adaptation state and the effective contrast on the display screen, is important in this context. This means that the visual quality of displayed content can be seriously degraded when already tone mapped images and video are stored or transmitted without any prior knowledge of their actual display conditions. The design of a typical tone mapping operator is usually guided by a few rules. It must provide consistent results despite the vast diversity of natural scenes and the possible radiance value inaccuracy found in HDR photographs. Additionally, it should be adaptable and extendible to address the current capabilities of display methods and their future evolution. Tone mapping must capture the physical appearance of the scene, while avoiding the introduction of artifacts such as contrast reversal or black halos. The overall brightness of the output image must be faithful to the context. It must be “user-friendly”, i.e., automatic in most cases, with a few intuitive parameters which provide the possibility for adjustments. It must be fast for interactive and real-time applications while avoiding any trade-off between speed and quality. In this chapter we briefly overview existing tone mapping operators (Sect. 11.1). Since a majority of those operators have been designed for static images under the assumption of ideal (noiseless) sensors used for their capture, in Sect. 11.2 we discuss specific requirements imposed on tone mapping for real-world HDR video. We present in more detail a complete solution suitable for real-time HDR video tone mapping in Sect. 11.3. In Sect. 11.4 we discuss perceptual effects such as glare and rod vision that cannot be evoked by typical displays because of their limited dynamic range, but can be easily simulated in real time on top of a video stream based on the knowledge of HDR luminance and contrast levels. In Sect. 11.5 we discuss a high quality tone mapping operator based on bilateral filtering, which due to its high computational cost is suitable only for off-line HDR video processing. Finally, in Sect. 11.6 we summarize major requirements imposed on tone mapping operators in HDR video applications.
11.1 Taxonomy

In general, tone mapping can be classified as time dependent, for handling animated sequences, or time independent, designed specifically for static images. In the former case, temporal adaptation conditions of the HVS such as dark or bright adaptation are taken into account. Another classification takes into account whether the same mapping function is used for all pixels (spatially uniform or global operators) or whether the mapping function varies depending on a given pixel neighborhood, which is often achieved through the modeling of spatial adaptation (spatially varying or local operators). In the former case a compressive, monotonic tone reproduction curve (TRC) is constructed which for a given HDR pixel luminance
level always assigns the same output pixel intensity. For spatially varying operators the output pixel intensity depends also on the scene surround and can vary quite significantly even for pixels with the same luminance level. Yet another classification takes into account whether a given tone mapping algorithm tries to model some characteristics of the HVS or is a pure image processing technique which essentially transforms one image into another. In the former case, through mimicking the HVS processing, the aim is to achieve predictive tone mapping whose main goal is to obtain a good match between the percepts of the real world scene and of the displayed scene. The objective of the image processing approach is often to obtain results which are preferred by the observer, even if they depart from realism and are not plausible in terms of the real world appearance.

11.1.1 Spatially Invariant Operators

In this section we overview global tone mapping operators which are inspired by psychophysical models of brightness and contrast perception (Brightness and Contrast Perception) as well as psychophysiological models of retinal photoreceptors (Photoreceptor Model of Contrast Compression). Then we discuss more engineering-oriented solutions for contrast compression using logarithmic functions (Logarithmic Contrast Compression) and image histogram adjustment (Histogram Adjustment Methods).

Brightness and Contrast Perception

The term brightness B describes the response of the human visual system (HVS) to stimulus luminance Y. This response has the form of a compressive nonlinearity which can be approximated by a logarithmic function (Weber–Fechner law) B = k_1 ln(Y/Ȳ), where Ȳ denotes the luminance of the background and k_1 is a constant factor. This relation has been derived in psychophysical threshold experiments through examining just noticeable differences (JND) ∆Y for various Ȳ. Slightly different relations between B and Y have been obtained depending on such factors as the stimulus size, the luminance of adaptation Ȳ, and temporal presentation. For example, suprathreshold experiments resulted in the observation that equal ratios of luminance lead to equal ratios of brightness, and the HVS response should rather be modeled by a power function (Stevens law) B = k_2 Y^n, where n falls in the range of 0.3–1.0. In practice, the Weber law is a good approximation for bright illumination conditions (Ȳ > 1,000 cd m⁻²), while for lower adaptation luminance levels (0.1 < Ȳ < 1,000 cd m⁻²) power functions fit the experimental data better (refer also to Sect. 12.6). Tumblin and Rushmeier [3] have developed a perception-inspired tone mapping whose main objective is to preserve a constant relationship between the brightness of a scene perceived on a display and its real counterpart, for any lighting condition. They use Stevens’ power-law relationship to
transform the real world luminance into the real world brightness. Then, based on the brightness matching assumption, they find the corresponding display luminance values using inverse display observer and inverse display device transformations. Since the Tumblin and Rushmeier operator is derived from suprathreshold psychophysical data, it predicts the brightness magnitude well, but may be imprecise in predicting whether some low contrast details in the real world scene can be perceived by the human observer. This task is relatively easy for tone mapping operators [4, 5] based on the psychophysical measurements of the threshold vs. intensity (TVI) function, which for each luminance adaptation level Ȳ specifies the just noticeable luminance threshold ∆Y. It is assumed that the eye is adapted to unique luminance levels Ȳ_w and Ȳ_d corresponding to the real world and to the display observation conditions (Ȳ is usually computed as the average of the logarithm values of each pixel luminance in the image). Then the corresponding just noticeable threshold values ∆Y_w and ∆Y_d are found for Ȳ_w and Ȳ_d based on the TVI function. The tone mapping between the real world luminance Y_w and display luminance Y_d is then a simple scaling function:

Y_d = (∆Y_d / ∆Y_w) · Y_w .    (11.1)

Due to the linear nature of this operator, it is not suitable for images of very high dynamic range and it poorly predicts contrast magnitudes well over the perceivability threshold.

Photoreceptor Model of Contrast Compression

Neurological and psychophysiological studies of the HVS demonstrate that nonlinear luminance compression can mostly be attributed to the retinal signal processing. Direct measurements of cellular response functions for cones, rods, bipolar, and ganglion cells show that they belong to the family of sigmoid functions

R(Y) = R_max · Y^α / (Y^α + σ^α) ,    (11.2)

where R is the neural response (0 < R < R_max), Y is the luminance value, and σ is the semi-saturation constant which corresponds to the Ȳ for which R(Ȳ) = 0.5 · R_max. The response sensitivity is controlled by α, which is found to lie between 0.2 and 0.9. The response function R(Y) is close to linear for luminance close to the adaptation level Ȳ and features strong compression of luminance values significantly departing from Ȳ (both for extremely dark and bright pixels). The photoreceptor model in the form of (11.2) with α = 0.73 is the foundation of the time-dependent tone mapping operator developed by Pattanaik et al. [6]. This operator deals with the changes of threshold visibility, color appearance, visual acuity, and sensitivity over time. The operator can be decomposed into two models: the visual adaptation model and the visual appearance model. The signals that simulate the adaptation measured in the
retina are used for adaptation in each pixel of an image. To reproduce visual appearance, it is assumed that a viewer determines reference white and reference black colors. Then, the visual appearance model recalculates the visual appearance with respect to those reference points. By assembling the visual adaptation and appearance models, the scene appearance is reproduced with changes to visual adaptation depending on time. This method is also useful to predict the visibility and appearance of scene features because it deals with reference white and black points. Reinhard and Devlin [7] used in their tone mapping operator a version of the photoreceptor model as proposed by Hood and colleagues. An important contribution of their work is the control of color saturation in the spirit of a von Kries model. A sigmoid function is also used in the photographic tone reproduction, which belongs to the category of spatially variant tone mapping operators (refer to Photographic Tone Reproduction and Sect. 11.3.1).

Logarithmic Contrast Compression

Stockham [8] recommended a logarithmic relation in his early tone mapping solution

R(Y_w) = log(Y_w + 1) / log(Y_max + 1) ,    (11.3)

where for each pixel the output intensity R(Y_w) is derived from the ratio of the world luminance Y_w and the maximum luminance in the scene Y_max. This mapping ensures that, whatever the dynamic range of the scene is, the maximum value is remapped to one (white) and the other luminance values change smoothly. While the Stockham approach leads to pleasant images, Drago et al. [9] found that the luminance compression is excessive and the feeling of high contrast content is lost. Drago et al. introduced a method which is called adaptive logarithmic mapping. This method addresses the need for a fast algorithm suitable for interactive applications which automatically produces realistically looking images for a wide variation of scenes exhibiting a high dynamic range of luminance. To preserve details while providing high contrast compression, a family of logarithmic functions with increasing compressive power is used,

log_base(Y) = log(Y) / log(base) ,    (11.4)

with the base ranging from 2 to 10. The logarithm to base 10 is applied to the brightest image pixel, and for the remaining pixels the logarithm base is smoothly interpolated between the values 2 and 10 as a function of their luminance:

R(Ŷ_w) = log(Ŷ_w + 1) / log( 2 + 8 · (Ŷ_w / Ŷ_w max)^(log(b)/log(0.5)) ) ,    (11.5)
where each pixel luminance Ŷ_w and the maximum luminance of the scene Ŷ_w max are normalized (divided) by the world adaptation luminance Ȳ_w. The Perlin bias power function [10] in the denominator of (11.5) is used for the interpolation between the logarithm bases to provide better control of the steepness of the resulting tone mapping curve. The bias parameter b is essential to adjust the compression of high values and the visibility of details in dark areas. Values of b between 0.7 and 0.9 are most useful to generate perceptually good images. Psychophysical experiments with human subjects suggest that the Drago et al. operator leads to images which are highly ranked in terms of natural appearance and subject preferences [11].

Histogram Adjustment Methods

Ward Larson et al. [12] proposed the histogram adjustment method, which allocates most of the displayable dynamic range to the luminance ranges that are represented by many pixels in the scene. This results in strong contrast compression for pixels belonging to sparsely populated regions in the image histogram, which helps to overcome the problem of dynamic range shortage. On the other hand, for HDR images with a more uniform distribution of pixels across the full luminance range the compression would be weak. The method leads to a monotonic tone reproduction curve which is applied globally to all pixels in the image. The slope of the curve is constrained by considering the human contrast sensitivity to guarantee that the displayed image does not exhibit more contrast than could be perceived in the real scene. The method also incorporates other characteristics of the HVS such as color sensitivity, visual acuity, and glare.

Comparison of Global Mapping Curves

In Fig. 11.1 we show for comparison several tone mapping curves adjusted to a sample HDR image. The histogram of the image is plotted in the background for reference. Clearly, a naive approach that linearly scales the luminance values is not able to map the input dynamic range to displayable values, and the details in the high and low luminance ranges are lost. The full range of input luminance values can be successfully mapped using the adaptive logarithmic mapping [9] (refer to “Logarithmic Contrast Compression”). Equally good performance can be achieved with the photoreceptor model [7], though the image will be considerably brighter because the mapping curve maps low luminance values to higher display values (refer to “Photoreceptor Model of Contrast Compression”). The curve of the histogram adjustment method [12] not only covers the whole input range but also fine-tunes its shape to assign a wider displayable range to highly populated luminance ranges, which ultimately produces a well-optimized result (refer to “Histogram Adjustment Methods”).
[Plot of Fig. 11.1: tone mapping curves for linear, logarithmic, photoreceptor, and histogram adjustment operators; x-axis: log10 luminance from −2 to 2, y-axis: tone mapped intensities from 0 to 1]
Fig. 11.1. Various mapping curves of global tone mapping operators adjusted for a sample HDR image. The histogram in the background illustrates the luminance of the sample HDR image. Refer to “Comparison of Global Mapping Curves” for details
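Two of the global curves compared in Fig. 11.1 can be reproduced, up to display scaling, directly from (11.2) and (11.5); in the sketch below the parameter values (α, the bias b, and the log-average adaptation luminance) are illustrative choices within the ranges quoted above.

```python
"""Sketch of two global mapping curves: the photoreceptor sigmoid (11.2)
and the adaptive logarithmic mapping (11.5). Parameter values are
illustrative choices only."""

import numpy as np

def photoreceptor(Y, Y_adapt, alpha=0.73):
    sigma = Y_adapt                       # semi-saturation tied to adaptation level
    return Y ** alpha / (Y ** alpha + sigma ** alpha)

def adaptive_log(Y, Y_adapt, b=0.85):
    Yn = Y / Y_adapt                      # luminance normalized by adaptation level
    exponent = np.log(b) / np.log(0.5)    # Perlin bias controls curve steepness
    R = np.log(Yn + 1.0) / np.log(2.0 + 8.0 * (Yn / Yn.max()) ** exponent)
    return R / R.max()                    # normalized to [0, 1] for display

# Sample an HDR luminance axis spanning four orders of magnitude
Y = np.logspace(-2, 2, 512)
Y_adapt = np.exp(np.mean(np.log(Y)))      # log-average luminance as adaptation level
curve_sigmoid = photoreceptor(Y, Y_adapt)
curve_drago = adaptive_log(Y, Y_adapt)
```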
11.1.2 Spatially Variant Operators

A single pixel emitting radiance, which seems to be the smallest entity in computer graphics, is actually meaningless for the human eye. Just as the human eye cannot see a single pixel when all other pixels on the screen are of the same color, the HVS cannot retain much information from absolute luminance or radiance levels. Instead, we are perfectly equipped for the task of comparing light patches – we see contrast rather than luminance. From
this standpoint, all spatially invariant tone mapping operators, which consider each pixel separately and transform its luminance using a monotonic tone reproduction curve, are poor at mimicking those important characteristics of the HVS. It turns out that global operators are also poor at reproducing local image details. To achieve a better balance between local and global contrast reproduction, spatially variant tone mapping operators have been introduced. Lightness is a perceptual quantity measured by the HVS which describes the amount of light reflected from a surface normalized for the illumination level. Contrary to brightness, which describes a visual sensation according to which an area exhibits more or less light, the lightness of a surface is judged relative to the brightness of a similarly illuminated area that appears to be white. The basic observation relevant for spatially variant operators is that the scene illumination is responsible for the high dynamic range and must be compressed, while lightness with surface details (texture) should be preserved. Since recovering lightness from images is an ill-posed problem which can be solved precisely only for unrealistically simplified scenes (e.g., it is often assumed that illumination is a slowly changing, smooth function [13–15], which effectively means that shadows cannot be handled), approximate methods of lightness and illumination separation are used. For this purpose HDR images are decomposed into various layers in terms of spatial frequencies, which are then differently attenuated in terms of contrast. Low spatial frequency layers, which supposedly carry more illumination information, are attenuated more strongly. Pyramids of band filters such as Laplacians or Gabor filters are used at the decomposition stage. Unfortunately, combining such differently attenuated layers into the output image leads to contrast reversal or halos [16]. This was a major problem with all early spatially variant operators. In the following sections we briefly review modern local tone mapping operators which mostly avoid halo artifacts by considering sophisticated robust filters that penalize spatial neighbors (outliers) whose intensity is significantly different from the current pixel intensity (“Photographic Tone Reproduction”, “Perceptually Uniform JND Space”, and “Bilateral Filtering Method”). We also present a solution for handling difficult scenes featuring substantially different luminance levels and different chromatic adaptation conditions (“Lightness Perception”). In “Gradient Methods” we discuss in more detail an operator featuring controllable enhancement of local and global image details (contrast), which belongs to the class of gradient methods that rely on direct contrast processing (instead of luminance).

Photographic Tone Reproduction

Reinhard et al. presented a photographic tone reproduction inspired by photographic film development and the printing process [17]. The luminance of an image is initially compressed into the displayable range using a sigmoid function (refer to “Photoreceptor Model of Contrast Compression”). The photo-
graphic “dodging and burning” technique, which allows a different exposure for each part of the image, is used to enhance local contrast. To automate the process, low contrast regions are found by center-surround filters at different scales. The biggest scale that does not contain outlier pixels is chosen. Then, a tone mapping function is locally applied. The automatic dodging and burning method enhances contrast and details in an image while preserving the overall luminance characteristics. This method can operate fully automatically based on information extracted individually from each HDR image, freeing the user from any parameter setting [18]. In Sect. 11.3 we provide more details on the photographic tone reproduction, which is a good choice for real-time HDR video processing.

Perceptually Uniform JND Space

Ashikhmin presented a new, multipass approach to tone mapping [19]. The method takes into account two basic characteristics of the human visual system (HVS): signaling absolute brightness and local contrast. This method first calculates the local adaptation luminance by averaging the luminance of neighboring pixels fitting in a bound-limited contrast range (similar to Reinhard et al. [17]). Then, it applies the capacity function, which is based on the linearly approximated threshold vs. intensity function, and calculates the final pixel values. The key feature of the capacity function is that it transforms input luminance values into a uniformly scaled space in terms of just noticeable differences for various levels of adaptation luminance Ȳ. The final calculation restores the details which may be lost in the compression steps. A tone mapped pixel value is obtained by multiplication with a detail image given by the ratio of the pixel luminance to the corresponding local world adaptation.

Bilateral Filtering Method

A fast bilateral filtering method was presented by Durand and Dorsey [20]. This method considers two different spatial frequency layers: a base layer and a detail layer. The base layer preserves high contrast edges and removes high-spatial-frequency details of lower contrast. The detail layer is created as the difference of the original image and the base layer in logarithmic scale. After contrast compression in the base layer, both layers are summed up to create a tone mapped image. In Sect. 11.5 we provide more details on this operator, which is suitable for high-quality HDR video processing performed off-line.

Lightness Perception

Lightness constancy is an important characteristic of the HVS which leads to a similar appearance of the perceived objects independently of the lighting
and viewing conditions [21]. While observing images presented on display devices, it is desirable to reproduce the lightness perception corresponding to the observation conditions in the real world. This is not an easy task, because the lightness constancy achieved by the HVS is not perfect and many of its failures appear in specific illumination conditions or even due to changes in the background over which an observed object is superimposed [22]. It is known that lightness constancy increases for scene regions that are projected over wider retinal regions [23]. This effect is reinforced for objects whose perceived size is larger even if their retinal size is the same [24]. The reproduction of images on display devices introduces further constraints in terms of a narrower field of view and limitations in the luminance dynamic range. Some failures of lightness constancy still appear in such conditions (simultaneous contrast for instance), but others, such as the Gelb illusion, cannot be observed on a display device. Clearly, a lightness perception model needs to be embedded into the processing pipeline of HDR images in order to improve the fidelity of their display. The recently presented anchoring theory of lightness perception by Gilchrist et al. [25] offers a sound explanation for an unprecedented range of empirical experiments. The theory is based on a combination of global and local anchoring of lightness values and introduces two main concepts: the anchor and frameworks. In order to relate luminance values to lightness, it is necessary to define at least one mapping between a luminance value and a value on the scale of perceived gray shades – the anchor. Once such an anchor is defined, the lightness value for each luminance value can be estimated by the luminance ratio between the value and the anchor – the scaling. Gilchrist et al. [25] argue that the anchor should be calculated using the highest luminance rule. In general, this defines the mapping of the highest luminance in the visual field to a lightness value perceived as white. However, there is a tendency of the highest luminance to appear white and a tendency of the largest area to appear white. Therefore the estimation is redefined based on this experimental evidence. As long as there is no conflict, i.e., the highest luminance covers the largest area, the highest luminance becomes a stable anchor. However, when the darker area becomes larger, the highest luminance starts to be perceived as self-luminous. The anchor then becomes a weighted average of the luminance proportional to the occupied area. The anchoring, however, cannot be applied directly to complex images in an obvious way. Instead, Gilchrist et al. [25] introduce the concept of decomposition of an image into components, frameworks, in which the anchoring rule can be applied directly. In the described theory, following the gestalt theorists, frameworks are defined by regions of common illumination. For instance, all objects under the same shadow would constitute a framework. A real-world image is usually composed of multiple frameworks. The framework regions can be organized in an adjacent or a hierarchical way and their areas may overlap. The lightness of a target is computed according to the
anchoring rule in each framework. However, if a target in a complex image belongs to more than one framework, it may have different lightness values when anchored within different frameworks. According to the model, the net lightness of a given target is predicted as a weighted average of its lightness values in each of the frameworks, in proportion to the articulation of each framework. The articulation of a framework is determined by the variety of luminance values it contains, in such a way that frameworks with low variance have less influence on the net lightness. Krawczyk et al. [27] derive from this model a tone mapping algorithm for contrast reduction in HDR images. Their contrast reduction process is based solely on the luminance channel. First, the input HDR image is decomposed into overlapping frameworks. A framework is represented as a probability map over the whole image, in which a probability of belonging to this framework is assigned to each pixel. For the purpose of tone mapping for LDR displays, a further constraint is imposed on the dynamic range in the framework’s area, which cannot exceed two orders of magnitude. The frameworks are identified in the HDR image based on the pixel luminance values in the log10 space, using several iterations of the K-means algorithm until the constraints are met. Additionally, the probability map of each framework is processed spatially in order to smooth small local variations in the probability values which may appear due to textures. Next, the anchor in each framework is estimated, i.e., the luminance value perceived as white, and the local pixel lightness within each framework is computed. Finally, the net lightness of each pixel is obtained by merging the individual frameworks into one image in proportion to the pixel probability values. The result is suitable to be viewed on an LDR display device. Figure 11.2 illustrates a sample result of this method. The concept of frameworks gives a unique possibility for further processing of HDR images. For instance, since frameworks indicate which areas of the HDR image share common illumination, they make it possible to apply a local white balance. This is particularly useful for
Fig. 11.2. Conventional photography with over and under exposures (left), decomposition of an image into the areas of consistent illumination (middle), and the result of the contrast ratio optimization between the frameworks – tone mapping (right). The HDR source image courtesy of OpenEXR.
images with areas illuminated by considerably different illuminants, as for example a shot of a room with tungsten illumination (orange color cast) and a window looking out onto a shadowed outdoor area (blue color cast). The concept of frameworks also enables the application of the algorithm to video sequences, where temporal coherence between frames can be achieved via smoothing of the local anchor values.

Gradient Methods

Lightness determination algorithms [13–15] naturally achieve dynamic range compression by recovering reflectance information and are commonly used for tone mapping purposes [28]. In all those algorithms a logarithmically compressed intensity signal (an approximation of luminance processing by the retina) is differentiated and then thresholded to eliminate small derivative values attributed to illumination changes. The differentiation actually transforms images from the luminance to the contrast domain. The final integration of the thresholded derivatives leads to the reconstruction of image reflectance. In the Retinex algorithm [13] a huge number of global paths in the image plane is considered to perform such an integration. Horn [14] performs the differentiation and integration in 2-D, which can be reduced to the solution of Poisson’s equation. Those algorithms rely on many limiting assumptions [15], such as slow changes in illumination, which are not very realistic for high contrast images. Fattal et al. [29] were the first to consider contrast processing explicitly for tone mapping. They show that visually compelling tone mapped images can be obtained through a smart nonlinear attenuation of luminance gradient magnitudes. In this section we describe in more detail the more recent work of Mantiuk et al. [30], which can be seen as a generalization of [29]. Mantiuk et al. introduce a multi-resolution framework which extends contrast considerations from the local pixel neighborhood to the whole image domain. In their approach the control over the global contrast (between distant image regions) and the local contrast (between neighboring pixels) relies on a minimization problem rather than on user-tuned parameters as in [29]. This leads to images free from artifacts that are typical of local optimization procedures unable to grasp global contrast relations. Even without any picture-by-picture tuning, the framework can produce artifact-free and extremely sharp images. Two distinct characteristics of the HVS play an important role in the contrast domain framework of Mantiuk et al.: localized contrast perception and the contrast discrimination threshold. Localized contrast perception comes from the fact that the human eye can see high frequency patterns with foveal vision only within a limited area, while the rest of the image is seen with parafoveal vision. For example, the eye can easily see the difference between fine details if they are close to each other, but has difficulty distinguishing which detail is brighter or darker if they are far apart in the field of view. On the other hand, distant light patches can
be easily compared if they are large enough. Based on those facts, Mantiuk et al. introduce a representation of contrast in complex images as a low-pass pyramid of contrast values. They use that pyramid to perform image processing on contrast rather than luminance. The contrast discrimination threshold is the smallest difference in contrast the human eye can see. Such contrast discrimination data are well studied in psychophysics, but unfortunately mostly for low contrast patterns unsuitable for HDR images. The notable exception is the work of Whittle [31], in which the contrast discrimination was measured for the full range of contrast patterns. Whittle’s data clearly show that the eye is the least sensitive (the thresholds are the highest) for low and high contrast ranges, in which either neural noise or the optics of the eye limit our vision. Therefore, the contrast in those ranges can be modified the most before we notice any visible difference. In the tone mapping algorithm [30], images are first converted to the multiscale contrast pyramid. The contrast values are then modified according to the contrast discrimination thresholds – the contrast is strongly compressed for very low and very high contrasts and minimally compressed for the middle contrast range, to which the eye is the most sensitive. Another approach, which leads to even sharper but less natural looking images, takes advantage of the statistics of an image and performs histogram equalization on the contrast values. The contrast values resulting from both approaches are then converted back to luminance values by solving an optimization problem. Note that for the histogram equalization approach the basic difference with respect to [12] is that the latter operates directly on luminance values, while Mantiuk et al. consider contrast values instead. The contrast domain offers several advantages over the luminance domain. Firstly, contrast can be modified with regard to its magnitude, so taking advantage of contrast perception characteristics such as the contrast discrimination thresholds becomes simple. Secondly, extremely sharp contrast can be achieved without introducing halo artifacts, since contrast values are compressed but not reversed. Finally, the pyramid of contrast values [30], which approximates the localized contrast perception, can take into account both local (high spatial frequency) and global (low frequency) contrast relations. Without taking global contrast relations into account, the resulting images can show significantly disparate brightness levels in areas where the input luminance is the same (refer to Fig. 11.3). Figure 11.4 shows the difference in image sharpness achieved using the bilateral filtering approach [20], which is one of the best local tone mapping algorithms working in the luminance domain, and the gradient domain technique proposed by Mantiuk et al. [30].
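To make the gradient-domain idea more concrete, the following sketch (Python/NumPy, not part of the original publications) attenuates the magnitudes of log-luminance gradients and reconstructs a displayable image by iteratively solving the resulting Poisson equation. It is a hypothetical single-scale illustration in the spirit of Fattal et al. [29]; it implements neither their multiresolution attenuation nor the contrast-domain optimization of Mantiuk et al. [30], and the parameter values and the simple Jacobi solver are assumptions made for brevity.

```python
import numpy as np

def gradient_domain_tm(Y, beta=0.85, alpha_frac=0.1, iters=3000):
    """Illustrative single-scale gradient-domain contrast compression."""
    H = np.log10(np.maximum(Y, 1e-6))            # work on log luminance
    # forward differences; zero gradient assumed at the right/bottom border
    gx = np.zeros_like(H)
    gy = np.zeros_like(H)
    gx[:, :-1] = H[:, 1:] - H[:, :-1]
    gy[:-1, :] = H[1:, :] - H[:-1, :]
    # attenuate large gradient magnitudes, slightly amplify small ones
    mag = np.maximum(np.hypot(gx, gy), 1e-9)
    alpha = alpha_frac * mag.mean()
    scale = (mag / alpha) ** (beta - 1.0)
    gx *= scale
    gy *= scale
    # divergence of the attenuated gradient field (backward differences)
    div = gx.copy()
    div[:, 1:] -= gx[:, :-1]
    div += gy
    div[1:, :] -= gy[:-1, :]
    # Jacobi iterations for the Poisson equation  laplacian(I) = div
    I = H.copy()
    for _ in range(iters):                        # simple but slow solver
        P = np.pad(I, 1, mode='edge')
        I = 0.25 * (P[:-2, 1:-1] + P[2:, 1:-1] +
                    P[1:-1, :-2] + P[1:-1, 2:] - div)
    I -= I.max()                                  # brightest pixel -> log10 L = 0
    return 10.0 ** I                              # displayable luminance in (0, 1]
```

Because the attenuation is applied at a single scale only, such a sketch can exhibit exactly the global-contrast problems discussed above; the multiscale contrast pyramid of Mantiuk et al. is what removes them.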
11.2 HDR Video: Specific Conditions and Requirements

Tone mapping algorithms have mostly been developed for static images, and they are not always applicable to HDR video streams. The problems may arise, among others, from temporal incoherence or too
Fig. 11.3. The algorithm by Fattal et al. (left) renders window panes of different brightness due to the local nature of the optimization procedure. The contrast compression on the multiscale contrast pyramid proposed by Mantiuk et al. can maintain proper global contrast proportions (right). The approach based on the perceptual model of the contrast discrimination thresholds has been used to generate this image. The HDR source image courtesy of Greg Ward
Fig. 11.4. Extremely sharp contrast can be achieved without introducing halos. The result of bilateral filtering by Durand and Dorsey (left) compared with the method of Mantiuk et al. based on histogram equalization (right). The HDR source image courtesy of Greg Ward
high computational complexity. In the following sections we focus on tone mapping operators for video applications, with the sensors described in the previous chapters of this book particularly in mind. The capabilities of HDR video cameras define specific characteristics of the video streams they generate. Currently the resolution is usually low (typically VGA), which implies that the tone mapping process can be done relatively fast, or that more complex methods can be used in the given time slice between frames. The dynamic range that the cameras are capable of capturing is already very wide; therefore, aggressive compression is often necessary to fit all of the information into the displayable range. On the other hand, the level of noise is still quite high, especially during acquisition in
poor illumination conditions. In this case, algorithms that extract details very well may amplify the noise and in fact lead to the loss of useful information. Finally, the field of application for such HDR video cameras imposes further requirements on the performance of the tone mapping algorithms. Specific expectations of the resulting image quality should be taken into account when choosing the algorithm.
11.3 Tone Mapping for HDR Video

In general, global algorithms are usually the first choice for the tone mapping of HDR video, mainly because of their simplicity and, in consequence, high efficiency, but also because it is straightforward to make them temporally coherent. What is more, a type of dynamic range compression is often already present in the camera response, as is the case, for instance, with the HDRC sensor, which has a logarithmic response similar to the idea of logarithmic contrast compression (“Logarithmic Contrast Compression”). Sometimes, however, the dynamic range is so high that after global tone mapping the colors in the image look desaturated and the fine details in the scene are lost. In such cases local operators provide better results at the cost of processing speed; however, a real-time implementation of some of those algorithms is currently not achievable. As a compromise, various algorithms provide an enhancement to the global contrast compression which to some extent allows the preservation of details. In Sect. 11.3.1 we introduce the state-of-the-art solution in this category.

11.3.1 Response Curve Compression

The algorithm proposed by Reinhard et al. [17] operates solely on the luminance values, which can be extracted from RGB intensities using the standard CIE XYZ transform (D65 white point). The method is composed of a global scaling function similar to the photoreceptor model (“Photoreceptor Model of Contrast Compression”), and a local dodging-and-burning technique which allows fine details to be preserved. The results are driven by two parameters: the adapting luminance for the HDR scene and the key value. The adapting luminance ensures that the global scaling function provides the most efficient mapping of luminance to the display intensities for the given illumination conditions in the HDR scene. The key value controls whether the tone mapped image appears relatively bright or relatively dark. In this algorithm, the source luminance values Y are first mapped to the relative luminance Yr:

Yr = α · Y / Ȳ,    (11.6)
Fig. 11.5. Tone mapping of an HDR image with a low key (left) and a high key (right). The curve on the histograms illustrates how the luminance is mapped to normalized pixel intensities. Refer to Sect. 11.3 for details
where Ȳ is the logarithmic average of the luminance in the scene, which is an approximation of the adapting luminance, and α is the key value. The relative luminance values are then mapped to the displayable pixel intensities L using the following function (compare to (11.2) in “Photoreceptor Model of Contrast Compression”):

L = Yr / (1 + Yr).    (11.7)

The above formula maps all luminance values to the [0 : 1] range in such a way that the relative luminance Yr = 1 is mapped to the pixel intensity L = 0.5. This property is used to map a desired luminance level of the scene to the middle intensity on the display. Mapping a higher luminance level to middle gray results in a subjectively dark image (low key), whereas mapping a lower luminance to middle gray gives a bright result (high key) (see Fig. 11.5). Obviously, images which we perceive at night appear relatively dark compared to what we see during the day. We can simulate this impression by modulating the key value in (11.6) with respect to the adapting luminance in the scene. We explain our solution in Sect. 11.3.4.

11.3.2 Local Details Enhancement

Often, the tone mapping function in (11.7) may lead to the loss of fine details in the scene due to the extensive contrast compression. Reinhard et al. [17] propose a solution to preserve local details by employing a spatially variant local adaptation value V in (11.7):
Fig. 11.6. Plot of the Gaussian profiles used to construct the scales of the pyramid used for local dodging and burning in the tone mapping algorithm. The smallest scale is #1 and the largest is #8. Plots are normalized by maximum value for illustration
L(x, y) = Yr(x, y) / (1 + V(x, y)).    (11.8)
The local adaptation V equals the average luminance in a surround of the pixel. The problem, however, lies in estimating how large the surround of the pixel should be. The goal is to have as wide a surround as possible; however, too large an area may lead to the well-known inverse gradient artifacts, halos. The solution is to successively increase the size of the surround, checking each time that no artifacts are introduced. For this purpose a Gaussian pyramid is constructed with a successively increasing kernel

g(x, y, s) = (1 / (π s²)) · e^(−(x² + y²)/s²).    (11.9)
The Gaussian for the first scale is one pixel wide, setting the kernel size to s = (2√2)⁻¹; on each following scale s is 1.6 times larger. The Gaussian functions used to construct the scales of the pyramid are plotted in Fig. 11.6. As we later show, such a pyramid is also very useful for introducing perceptual effects into tone mapping.

11.3.3 Temporal Luminance Adaptation

While tone mapping a sequence of HDR frames, it is important to note that the luminance conditions can change significantly from frame to frame. Human vision reacts to such changes through temporal adaptation processes. The time course of adaptation differs depending on whether we adapt to light or to darkness, and whether we perceive mainly using rods (at night) or cones (during the day). Many intricate models have been introduced in computer graphics; however, it is not as important to faithfully model the process as to account for it at all [32].
In the tone mapping algorithm chosen by us, the luminance adaptation can be modeled using the adapting luminance term in (11.6). Instead of using the actual adapting luminance Ȳ, a filtered value Ȳa can be used, whose value changes according to the adaptation processes in human vision, eventually reaching the actual value if the adapting luminance is stable for some time. The process of adaptation can be modeled using an exponential decay function [33]:

Ȳa^new = Ȳa + (Ȳ − Ȳa) · (1 − e^(−T/τ)),    (11.10)

where T is the discrete time step between the display of two frames, and τ is the time constant describing the speed of the adaptation process. The time constant is different for rods and for cones:

τrods = 0.4 s,    τcones = 0.1 s.    (11.11)
Therefore, the speed of the adaptation depends on the level of illumination in the scene. The time required to reach the fully adapted state also depends on whether the observer is adapting to light or to dark conditions. The values in (11.11) describe the adaptation to light. For practical reasons the adaptation to dark is not simulated, because the full process takes up to tens of minutes. It is therefore acceptable to perform the adaptation symmetrically, neglecting the case of a longer adaptation to dark conditions. We model the temporal luminance adaptation based on (11.10). However, in our algorithm we do not perform separate computations for rods and cones, which makes it hard to properly estimate the adaptation speed with two time constants, τrod and τcone, instead of one. To account for this, and still be able to correctly reproduce the speed of the adaptation, we interpolate the actual value of the time constant based on the sensitivity of rods (11.15):

τ(Ȳ) = σ(Ȳ) · τrod + (1 − σ(Ȳ)) · τcone,    (11.12)
which we then use to process the adaptation value using (11.10).

11.3.4 Key Value

The key value, explained in Sect. 11.3.1, determines whether the tone mapped image appears relatively bright or dark, and in the original paper [17] it is left as a user choice. In his follow-up paper, Reinhard [18] proposes a method for automatic estimation of the key value, based on the relations between the minimum, maximum, and average luminance in the scene. Although the results are appealing, we feel this solution does not necessarily correspond to the impressions of everyday perception. Critical changes in the absolute luminance values may not always affect the relation between the three values. This may lead to dark night scenes appearing too bright and very bright scenes too dark. The key value, α in (11.6), takes values from the [0 : 1] range, where 0.05 is a low key, 0.18 is a typical choice for moderate illumination, and 0.8 is
Fig. 11.7. Key value related to adapting luminance in the scene
the high key. We propose to calculate the key value based on the absolute luminance. Since the key value was introduced in photography, there are no scientifically based experimental data which would provide an appropriate relation between the key value and the luminance, so the proper choice is a matter of experience. We therefore empirically specify key values for several illumination conditions and interpolate the rest using the following formula:

α(Ȳ) = 1.03 − 2 / (2 + log10(Ȳ + 1)),    (11.13)
where α is the key value and Ȳ is an approximation of the adapting luminance. The plot of this estimation is shown in Fig. 11.7.

11.3.5 Tone Mapping

We start the tone mapping process by calculating the luminance of the HDR frame and mapping it to the relative luminance according to (11.6). We calculate the logarithmic average of the luminance Ȳ in the frame and apply the temporal adaptation process (11.10). The map of relative luminance values constitutes the first scale of the Gaussian pyramid. At each scale of the Gaussian pyramid, we render the successive scale by convolving the previous scale with the appropriate Gaussian (11.9). Having the current and the previous scales, the local adaptation is computed using the measure of the difference between the previous and the current scale, as described in [17]. We iterate through all of the scales of the Gaussian pyramid, and in the final iteration we fill the missing parts of the adaptation map with data from the lowest scale. We then use the following formula to calculate the tone mapped RGB values of the original HDR input:

(RL, GL, BL) = (R, G, B) · L / Y,    (11.14)
where (RL, GL, BL) denote the tone mapped intensities, (R, G, B) are the original HDR values, Y is the luminance, and L is the tone mapped luminance calculated using (11.8) from Sect. 11.3.2.
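As an illustration, the following sketch (Python/NumPy, not taken from the original publication) assembles the global variant of the per-frame processing described above: the log-average luminance and global scaling (11.6), temporal adaptation (11.10)–(11.12), the key value estimate (11.13), the sigmoidal mapping (11.7), and the color assignment (11.14). The local adaptation map V of (11.8), i.e., the dodging-and-burning step, is omitted here for brevity, and anything not quoted in the text (function names, the state dictionary, the numerical epsilons) is an assumption.

```python
import numpy as np

TAU_ROD, TAU_CONE = 0.4, 0.1        # time constants from (11.11), in seconds

def sigma_rods(Y):
    """Rod sensitivity (11.15), defined in Sect. 11.4.1."""
    return 0.04 / (0.04 + Y)

def key_value(Y_adapt):
    """Key value as a function of the adapting luminance (11.13)."""
    return 1.03 - 2.0 / (2.0 + np.log10(Y_adapt + 1.0))

def tone_map_frame(rgb, Y, state, dt):
    """Tone map one HDR frame; `state` carries the adapted luminance between frames."""
    # logarithmic average of luminance, approximating the adapting luminance
    Y_bar = np.exp(np.mean(np.log(Y + 1e-6)))
    # temporal adaptation (11.10) with the interpolated time constant (11.12)
    tau = sigma_rods(Y_bar) * TAU_ROD + (1.0 - sigma_rods(Y_bar)) * TAU_CONE
    Ya = state.get('Ya', Y_bar)
    Ya += (Y_bar - Ya) * (1.0 - np.exp(-dt / tau))
    state['Ya'] = Ya
    # global scaling (11.6) followed by sigmoidal compression (11.7)
    Yr = key_value(Ya) * Y / Ya
    L = Yr / (1.0 + Yr)
    # assign the tone mapped luminance to the RGB channels (11.14)
    return rgb * (L / np.maximum(Y, 1e-6))[..., None]

# usage: state = {}; ldr = tone_map_frame(rgb, Y, state, dt=1.0 / 25)
```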
11.4 Simulating Perceptual Effects

The fact that a daylight scene appears very bright and colorful, while during the night everything looks dark and grayish, is obvious to an average observer. In dimly illuminated scenes the fine details are not perceivable, because the acuity of human vision is degraded; therefore, when using local tone mapping operators, well-preserved details will give an unrealistic impression in such cases. On the other hand, certain perceptual effects, like glare, cannot be evoked because the maximum luminance of typical displays is not high enough. However, we are so used to the presence of such phenomena that adding glare to an image can increase the subjective brightness of the tone mapped image [34, 35]. It therefore appears important to properly predict and simulate such perceptual effects during the tone mapping process in order to convey a realistic impression of HDR data over a wide range of luminance, when such data are displayed on typical display devices. The simulation of the perceptual effects is in principle possible, because the camera response can be calibrated to approximate physically correct values in cd·m−2 at a given aperture [26]. Knowing the exact illumination level in the scene, it is possible to estimate the performance of the HVS in these conditions. In the following sections, we present the models which describe the most significant perceptual effects and explain how to combine them with the tone mapping algorithm given in Sect. 11.3.1.

11.4.1 Scotopic Vision

Human vision operates in three distinct adaptation conditions: scotopic, mesopic, and photopic. Photopic and mesopic vision provide color vision; however, in the scotopic range, where only rods are active, color discrimination is not possible. The cones start to lose their sensitivity at 3.4 cd·m−2 and become completely insensitive at 0.03 cd·m−2, where the rods are dominant. We model the sensitivity of rods σ after [36] with the following function:

σ(Y) = 0.04 / (0.04 + Y),    (11.15)
where Y denotes the luminance. The sensitivity value σ = 1 describes perception using rods only (monochromatic vision) and σ = 0 describes perception using cones only (full color discrimination). The plot of (11.15) is shown in Fig. 11.8.
Fig. 11.8. The influence of perceptual effects on vision depending on the luminance level. For details on rods sensitivity and visual acuity refer to Sects. 11.4.1 and 11.4.2, respectively
11.4.2 Visual Acuity

Perception of spatial details in human vision is not perfect and becomes limited with decreasing illumination level. The performance of visual acuity is defined by the highest resolvable spatial frequency and has been investigated by Shaler in [37]. Ward et al. [12] offer the following function fit to the data provided by Shaler:

RF(Y) = 17.25 · arctan(1.4 · log10 Y + 0.35) + 25.72,    (11.16)
where Y denotes the luminance and RF is the highest resolvable spatial frequency in cycles per degree of visual angle. The plot of this function is shown in Fig. 11.9. To simulate the loss of visual acuity on a display device, we need to map visual degrees to pixels. Such a mapping depends on the size of the display, the resolution, and the viewing distance. For a typical observation of a 15-inch screen from half a meter at 1024 × 768 resolution we assume 45 pixels per 1 degree of visual angle. It is important to note that the highest frequency that can be visualized in such conditions is 22 cycles per visual degree. Therefore, technically we can simulate the loss of visual acuity only for luminance below 0.5 cd·m−2. The irresolvable details can be removed from an image by convolution with the Gaussian kernel from (11.9), where s is calculated as follows [12]:

sacuity(Y) = (width / fov) · 1 / (1.86 · RF(Y)),    (11.17)

where “width” denotes the width in pixels and “fov” is the horizontal field of view in visual degrees. For typical observation the “width” to “fov” ratio equals 45 pixels per degree.
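A direct transcription of (11.16) and (11.17) into code is straightforward. The short sketch below (Python/NumPy, an illustration rather than the authors' implementation) returns the width of the Gaussian blur kernel that removes irresolvable details for a given adaptation luminance, assuming the 45 pixels per visual degree quoted above.

```python
import numpy as np

PIX_PER_DEG = 45.0                    # width/fov for the typical 15-inch setup

def resolvable_freq(Y):
    """Highest resolvable spatial frequency in cycles per degree (11.16)."""
    return 17.25 * np.arctan(1.4 * np.log10(Y) + 0.35) + 25.72

def s_acuity(Y):
    """Gaussian kernel size s of (11.9) that simulates the acuity loss (11.17)."""
    # only meaningful below roughly 0.5 cd/m^2 on a typical display (see text)
    return PIX_PER_DEG / (1.86 * resolvable_freq(Y))
```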
Fig. 11.9. Plot of the highest resolvable spatial frequency for a given luminance level which illustrates the effect of loss of the visual acuity. Spatial frequency is given in cycles per degree of visual angle. The horizontal line marks the maximum displayable spatial frequency on a 15-inch LCD in typical viewing conditions
In Fig. 11.8 we show the strength of the loss of visual acuity with respect to the luminance level. Apparently the loss of visual acuity correlates with the increasing sensitivity of rods, and is therefore present in monochromatic vision.

11.4.3 Veiling Luminance

Due to the scattering of light in the optical system of the eye, sources of relatively strong light cause a decrease of contrast in their vicinity – glare. Such an effect cannot be naturally evoked while perceiving an image on a display, due to the different viewing conditions and the limited maximum luminance of such devices. It is therefore important to account for it during tone mapping. The amount of scattering for a given spatial frequency ρ under a given pupil aperture d is modeled by an ocular transfer function [38]:

OTF(ρ, d) = exp( −(ρ / (20.9 − 2.1 · d))^(1.3 − 0.07 · d) ),
d(Ȳ) = 4.9 − 3 · tanh(0.4 · log10 Ȳ + 1).    (11.18)

In a more practical manner, the scattering can be represented in the spatial domain as a point spread function. In Fig. 11.10 we show point spread functions for several adapting luminance levels, which were found numerically by applying the inverse Fourier transform to (11.18). Another model of the glare effect was introduced in computer graphics by Spencer et al. [34, 35]. They describe this phenomenon with four point spread functions linearly combined with three sets of coefficients for different adaptation conditions (scotopic, mesopic, and photopic). Since their model is complex, and it is not obvious how to apply it in continuously changing
Fig. 11.10. The point-spread function illustrating scattering of light in the optical system of the eye for several adapting luminance levels (Y = 0.0001, 0.01, 1, and 100 cd/m²)
luminance conditions, we decided to employ the model developed by Deeley et al. [38], which describes the effect with one function, continuously for all adaptation levels, and provides equally good results.

11.4.4 Tone Mapping with Perceptual Effects

The process of tone mapping with perceptual data is similar to the description in Sect. 11.3.5. However, during the iteration through the Gaussian pyramid, the visual acuity and glare maps are calculated in addition to the local adaptation map. For the acuity map, we first estimate the proper scale for the luminance of the current pixel. If it falls between the previous and current scales, we interpolate the final value and update the map. We update the glare map in the same manner, with one difference: the appropriate scale for glare depends on the adapting luminance and is uniform for the whole frame. For tone mapping with perceptual effects we use a modified form of (11.8) from Sect. 11.3.2 to account for the loss of visual acuity and the glare:

L(x, y) = (Yacuity(x, y) + Yglare(x, y)) / (1 + V(x, y)),    (11.19)
where L is the final pixel intensity value, Yacuity is the spatially processed luminance map that represents the visual acuity, Yglare is the amount of additional light scattering in the eye, and V is the local adaptation map. Because the glare map in fact contains the relative luminance from the appropriate scale of the Gaussian pyramid, we estimate the additional amount of scattering in the following way to include only the contribution of the highest luminance
Fig. 11.11. Sample perceptual effects simulated on an HDR video stream: glare (right image) and scotopic vision with loss of visual acuity (left image). The closeup in the left image inset shows the areas around the car in such a way that their brightness matches, to illustrate the loss of visual acuity. Refer to Sect. 11.4 for details
Yglare = Ygmap · (1 − 0.9 / (0.9 + Ygmap)),    (11.20)
where Ygmap denotes the glare map from the perceptual data. We account for the last perceptual effect, scotopic vision, while applying the final pixel intensity value to the RGB channels of the original HDR frame (Fig. 11.11). Using the following formula, we calculate the tone mapped RGB values as a combination of the color information and the monochromatic intensity, in proportion to the scotopic sensitivity:

(RL, GL, BL) = (R, G, B) · L · (1 − σ(Y)) / Y + (1.05, 0.97, 1.27) · L · σ(Y),    (11.21)

where (RL, GL, BL) denote the tone mapped intensities, (R, G, B) are the original HDR values, Y is the luminance, L is the tone mapped luminance, and σ is the scotopic sensitivity from (11.15). The constant coefficients in the monochromatic part account for the blue shift of the subjective hue of colors in night scenes [36]. Figure 11.11 shows examples of tone mapped images featuring the perceptual effects discussed in this section. In the following section we present an off-line solution for high-quality tone mapping applied in the context of HDR video.
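Before moving on, the composition of the perceptual effects in (11.19)–(11.21) can be summarized in a few lines of code. The sketch below (Python/NumPy, illustrative only) assumes that the local adaptation map V and the spatially processed maps Yacuity and Ygmap have already been extracted from the Gaussian pyramid as described above; the function and variable names are hypothetical.

```python
import numpy as np

BLUE_SHIFT = np.array([1.05, 0.97, 1.27])    # night-vision hue coefficients of (11.21)

def sigma_rods(Y):
    """Rod sensitivity (11.15)."""
    return 0.04 / (0.04 + Y)

def compose_perceptual(rgb, Y, V, Y_acuity, Y_gmap):
    # additional light scattering derived from the glare map (11.20)
    Y_glare = Y_gmap * (1.0 - 0.9 / (0.9 + Y_gmap))
    # final pixel intensity with acuity loss and glare (11.19)
    L = (Y_acuity + Y_glare) / (1.0 + V)
    # blend color and monochromatic intensity by the scotopic sensitivity (11.21)
    s = sigma_rods(Y)
    color = rgb * (L * (1.0 - s) / np.maximum(Y, 1e-6))[..., None]
    mono = BLUE_SHIFT * (L * s)[..., None]
    return color + mono
```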
11.5 Bilateral Tone Mapping for HDRC Video Durand and Dorsey [20] presented a spatially varying tone mapping method, which reduces the contrast of an HDR image while preserving the details. The
operator is based on bilateral filtering, which was introduced in the image processing and computer vision literature as an efficient tool to remove noise [39]. It is a non-linear filter which performs feature-preserving smoothing in a single pass. Those two seemingly contradictory goals are achieved using a Gaussian filter wS(x) = e^(−x²/2σS²) in the image space while simultaneously penalizing large variations in pixel intensities using another Gaussian filter wI(x) = e^(−x²/2σI²). σS denotes the spatial filter support, whose extent determines the set Ω(q) of pixels p which are centered around a pixel q. σI controls the influence of p on the filtered pixel intensity Î(q) as a function of the difference in intensities |I(q) − I(p)| in the input image for pixels p and q. The bilateral filter can be formulated as follows:

Î(q) = [ Σ_{p∈Ω(q)} wS(|q − p|) · wI(|I(q) − I(p)|) · I(p) ] / [ Σ_{p∈Ω(q)} wS(|q − p|) · wI(|I(q) − I(p)|) ].    (11.22)

The impact of the bilateral filter on the processed image is controlled by the values of σS and σI. The larger the σS value, the stronger the smoothing in the spatial domain. The larger the σI value, the greater the tolerance for intensity differences between pixel q and the neighboring pixels p. In the limit, for large σI the bilateral filter converges to an ordinary Gaussian filter and loses its edge-preserving properties. The tone mapping operator uses the bilateral filter to decompose the luminance of an image into two different layers, similarly to Tumblin and Turk’s LCIS method [40]. The first layer, called the base layer, is obtained by filtering the input image using the bilateral filter. The base layer contains information of low spatial frequencies as well as high contrast edges. The second layer, called the detail layer, is computed as the difference between the input image and the base layer. The detail layer contains high and middle spatial frequency details of the image. All computations are performed on luminance in the logarithmic space. The dynamic range reduction is obtained through the scaling of the base layer by a contrast factor c. The detail layer is unaffected by the dynamic range reduction and is added to the modified base layer to obtain the final tone mapped image. The processing flow for tone mapping using bilateral filtering can be summarized as follows:

BASE = bilateral(log10(Y), σS, σI),
DETAIL = log10(Y) − BASE,
L = BASE · c + DETAIL,
(RL, GL, BL) = (R, G, B) · 10^L / Y.    (11.23)
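A direct, unaccelerated implementation of (11.22) and (11.23) fits in a few lines. The Python/NumPy sketch below is meant only to illustrate the data flow: it uses a brute-force filter rather than the fast approximation of Durand and Dorsey, the default parameter values follow the HDRC settings quoted later in this section, and the final normalization step is an assumption (absolute scaling is not specified in (11.23)).

```python
import numpy as np

def bilateral(img, sigma_s=4.0, sigma_i=0.4):
    """Brute-force bilateral filter (11.22) applied to a 2D array."""
    r = int(2 * sigma_s)                       # half-size of the spatial support
    H, W = img.shape
    pad = np.pad(img, r, mode='edge')
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            w_s = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
            shifted = pad[r + dy:r + dy + H, r + dx:r + dx + W]
            w = w_s * np.exp(-(img - shifted) ** 2 / (2.0 * sigma_i ** 2))
            num += w * shifted
            den += w
    return num / den

def bilateral_tone_map(rgb, Y, c=0.4):
    """Tone mapping flow of (11.23): compress the base layer, keep the details."""
    logY = np.log10(np.maximum(Y, 1e-6))
    base = bilateral(logY)
    detail = logY - base
    L = base * c + detail
    L -= L.max()                               # assumed normalization: map the maximum to 1
    return rgb * (10.0 ** L / np.maximum(Y, 1e-6))[..., None]
```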
The advantage of the bilateral filter tone mapping operator is that it is easy to implement and works robustly for many classes of input HDR images.
The resulting tone mapped images feature a good balance between global contrast and detail visibility, which can be easily adjusted according to the user’s preferences. The method does not lead to halo artifacts or contrast reversal, which are common drawbacks of many local tone mapping operators. The bilateral tone mapping method is not intended to model the HVS and can be considered a purely image-processing method. In this sense it does not comply with the definition formulated by Tumblin and Rushmeier [3], which states that the basic goal of a tone reproduction operator is to achieve a good match between the appearance of real-world scenes and the corresponding tone mapped images. Figure 11.12 shows a comparison for an HDR image recorded with an HDRC video camera. The colors of the stretched logarithmic data on the left are washed out and details are only moderately preserved. On the right side, the image of the eye processed with the bilateral filter technique is shown. Note that the fine details of the iris, the skin, and the blood vessels are well preserved. Choudhury and Tumblin [41] introduced trilateral filtering, which removes some of the drawbacks of bilateral filtering, such as blending between disjoint regions that have similar intensity characteristics and smoothing of high frequency details in high gradient regions. However, those improvements come at the expense of significant computational costs. The HDRC sensor technology (refer to Chap. 2), with its logarithmic response curve, makes the use of the bilateral filter technique interesting and easy, because all of the image calculations are done in the logarithmic space. Durand and Dorsey tested their bilateral filter tone mapping mostly for still images of very high quality. The operator has not been applied to dynamic sequences, in particular those featuring significant temporal noise, as is the case for currently available HDR video cameras. Typically, HDR images are created by fusing a set of photographs taken with different exposures using a standard LDR camera featuring a linear sensor (for more details refer to the description of multiple-exposure techniques in Chap. 13). Image artifacts like ghosting due to object movement or lens flare can be removed or limited with available postprocessing algorithms. Such computed HDR images feature low dark noise as well as good contrast and color saturation, which are perfect initial conditions for all tone mapping algorithms to achieve good visual results. On the other hand, HDR video cameras like the HDRC camera enable the direct capturing of live HDR video streams (see Fig. 11.13). A single frame of the video stream is usually not as “ideal” as an image recorded with a multiple-exposure method, which means that it includes different kinds of noise like fixed-pattern or readout noise. Artifacts from camera components also appear, like ghosts created by reflections in the optical system or additional filters in fluctuating recording environments (e.g., car driving scenes), or blurred
Fig. 11.12. HDRC image with stretched log mapping (left) compared with bilateral filtering mapping (right) (Full dynamic range of the images is not visualized due to the print limitations)
Fig. 11.13. Images from an HDRC sensor of a fire breather movie with tone mapping using bilateral filtering. This type of sensor is becoming more and more interesting for automotive, machine vision, surveillance, and television applications
Fig. 11.14. HDRC image with stretched log mapping (left) compared with Bilateral Filtering mapping (right)
image regions caused by incomplete IR filtering (e.g., high-beam car headlights that have their lighting power peak in the NIR range, see Fig. 11.14) – conditions that cannot be changed during recording. All these artifacts cause the loss of detailed image information and also complicate the use of a tone mapping algorithm. It is therefore necessary to preprocess the image data to get a better image quality for the tone mapping functions. In the case of detail preserving operators, any image interference like noise or reflections from the optics will be preserved as image detail and amplified. In practice, the following settings for the bilateral tone mapping operator worked well for many HDRC applications: σS = 4, σI = 0.4, and a compression ratio c applied to the base layer ranging from 0.3 to 0.5. It is also recommended to reduce temporal noise first (refer to Chap. 3), e.g., using a simple 1D temporal filtering with motion compensation as proposed by Bennett and McMillan [42]. A relevant issue is the correction of the sensor’s fixed pattern noise (discussed in Sect. 2.3). The colors look unsaturated after color demosaicking due to the logarithmic principle of lightness recording, so it is important to additionally enhance the color saturation (discussed in Chap. 4). Enhancement factors of αR = αG = αB = 1.3–1.5 worked well in conjunction with bilateral tone mapping. In summary, this local tone mapping operator provides good overall brightness and contrast, especially for images with a very high dynamic range, in comparison to global operators. This means that it is possible to display nearly the whole dynamic range of an HDRC camera image on an LDR display without loss of image details, and to fully demonstrate the power of the logarithmic compression concept of HDRC technology. Figure 11.14 shows an HDR scene, recorded with an HDRC camera, of high-beam car headlights against dark background illumination with a dynamic range of more than six orders of magnitude (the illumination ranged from 0.3 lx to over 300,000 lx). The colors of the stretched logarithmically compressed image (Fig. 11.14, left) look unsaturated, and many image details of the scene, e.g., the Macbeth Color Chart or the LED lamp at the bottom left of the image, are lost. In the bilateral filter mapped image (Fig. 11.14, right), in comparison, all colors of the color checker are pleasantly saturated and fine details are visible. Spatially varying filtering, including bilateral filtering, is a very time-consuming operation. Durand and Dorsey [20] proposed an approximate bilateral filtering solution, which reduces the computation time significantly, but real-time performance, as required for on-line HDR video playback, cannot be achieved at present. For applications involving real-time HDR video playback a very good cost–performance ratio can be achieved using commodity graphics cards. It can
be expected that with the increasing power and programmability of graphics processing units (GPUs), more and more advanced image processing and filtering will be feasible in the near future.
11.6 Summary

In view of the increasing availability of HDR content, the problem of its presentation on conventional display devices is widely recognized. Different goals and approaches have led to the development of a variety of algorithms. These algorithms have different properties which correspond to specific requirements and applications. Furthermore, due to temporal incoherence, certain methods cannot be used for the tone mapping of video streams. A universal method has not been found so far; therefore, the choice of the tone mapping method should be based on the application requirements. With respect to HDR video streams, the choice of an appropriate tone mapping method is usually a trade-off between the computational intensity and the quality of the dynamic range compression. The quality here is mainly assessed by good local detail visibility. Global tone mapping methods are very fast but often lead to the loss of local details due to intensive dynamic range compression. Such methods should be used whenever high efficiency is the main requirement of the target application. The adaptation mechanisms can be used to select the range of luminance values which should obtain the best mapping. However, when the quality is insufficient, local tone mapping methods are necessary. Local detail enhancement methods provide a good improvement over global tone mapping methods while still achieving good computational performance. The best results are achieved using the bilateral filtering, lightness perception, or gradient methods; however, these techniques do not currently have real-time implementations. They can therefore be used for off-line rendering of high-quality still shots. Photometrically calibrated HDR video streams allow for the prediction of perceptual effects. Such effects are typical of the everyday perception of real-world scenes, but do not appear when observing a display showing a tone mapped HDR video. Prediction of such effects and their simulation can increase the realism of the presentation of HDR contents. On the other hand, such a prediction may also be used to identify situations in which a real-world observation of the scene would be impaired, and to hint to the tone mapping algorithm that it should focus on good detail reproduction there.
References 1. H. Seetzen, W. Heidrich, W. Stuerzlinger, G. Ward, L. Whitehead, M. Trentacoste, A. Ghosh, and A. Vorozcovs. High dynamic range display systems. ACM Transactions on Graphics, 23(3), 2004 2. B.A. Wandell. Foundations of Vision. Sinauer Associates, Sunderland, MA, 1995
3. J. Tumblin and H.E. Rushmeier. Tone reproduction for realistic images. IEEE Computer Graphics and Applications, 13(6):42–48, November 1993 4. G. Ward. A contrast-based scalefactor for luminance display. Graphics Gems IV, pages 415–421, 1994 5. J.A. Ferwerda, S. Pattanaik, P.S. Shirley, and D.P. Greenberg. A model of visual adaptation for realistic image synthesis. In Proceedings of SIGGRAPH 96, Computer Graphics Proceedings, Annual Conference Series, pages 249–258, August 1996 6. S.N. Pattanaik, J.E. Tumblin, H.Yee, and D.P. Greenberg. Time-dependent visual adaptation for realistic image display. In Proceedings of ACM SIGGRAPH 2000, Computer Graphics Proceedings, Annual Conference Series, pages 47–54, July 2000 7. E. Reinhard and K. Devlin. Dynamic range reduction inspired by photoreceptor physiology. IEEE Transactions on Visualization and Computer Graphics, 11(1):13–24, 2005 8. T.G. Stockham. Image processing in the context of a visual model. Proceedings of the IEEE, 13(6):828–842, 1960 9. F. Drago, K. Myszkowski, T. Annen, and N. Chiba. Adaptive logarithmic mapping for displaying high contrast scenes. Computer Graphics Forum, Proceedings of Eurographics 2003, 22(3):419–426, 2003 10. K. Perlin and E.M. Hoffert. Hypertexture. In Computer Graphics (Proceedings of SIGGRAPH 89), Vol. 23, pages 253–262, July 1989. 11. A. Yoshida, V. Blanz, K. Myszkowski, and H.P. Seidel. Perceptual evaluation of tone mapping operators with real-world scenes. In B.E. Rogowitz, T.N. Pappas, and S.J. Daly, editors, Human Vision and Electronic Imaging X, IS&T/SPIE’s 17th Annual Symposium on Electronic Imaging (2005), volume 5666 of SPIE Proceedings Series, pages 192–203, San Jose, USA, 2005. SPIE 12. G. Ward Larson, H. Rushmeier, and C. Piatko. A visibility matching tone reproduction operator for high dynamic range scenes. IEEE Transactions on Visualization and Computer Graphics, 3(4):291–306, 1997 13. E.H. Land and J.J. McCann. Lightness and the retinex theory. Journal of the Optical Society of America, 61(1):1–11, 1971 14. B.K.P. Horn. Determining lightness from an image. Computer Graphics and Image Processing, 3(1):277–299, 1974 15. A. Hurlbert. Formal connections between lightness algorithms. Journal of the Optical Society of America A, 3(10):1684–1693, 1986 16. J. DiCarlo and B. Wandell. Rendering high dynamic range images. In Proceedings of SPIE, Vol. 3965, pages 392–401, 2000 17. E. Reinhard, M. Stark, P. Shirley, and J. Ferwerda. Photographic tone reproduction for digital images. ACM Transactions on Graphics, 21(3):267–276, 2002 18. E. Reinhard. Parameter estimation for photographic tone reproduction. Journal of Graphics Tools, 7(1):45–52, 2002 19. M. Ashikhmin. A tone mapping algorithm for high contrast images. In Proceedings of the 13th Eurographics workshop on Rendering, pages 145–156, 2002 20. F. Durand and J. Dorsey. Fast bilateral filtering for the display of high-dynamicrange images. ACM Transactions on Graphics, 21(3):257–266, 2002 21. S.E. Palmer. Vision Science: Photons to Phenomenology, chapter 3.3 SurfaceBased Color Processing. MIT, Cambridge, MA, 1999 22. A. Gilchrist. Lightness contrast and failures of constancy: A common explanation. Perception & Psychophysics, 43:415–424, 1988
23. I. Rock. The Logic of Perception. MIT, Cambridge, MA, 1983 24. A. Gilchrist and J. Cataliotti. Anchoring of surface lightness with multpile illumination levels. Investigative Ophthamalmology and Visual Science, 35, 1994 25. A. Gilchrist, C. Kossyfidis, F. Bonato, T. Agostini, J. Cataliotti, X. Li, B. Spehar, V. Annan, and E. Economou. An anchoring theory of lightness perception. Psychological Review, 106(4):795–834, 1999 26. G. Krawczyk, M. Goesele, and H.P. Seidel. Photometric calibration of high dynamic range cameras. Research Report MPI-I-2005-4-005, Max-PlanckInstitut f¨ ur Informatik, Stuhlsatzenhausweg 85, 66123 Saarbr¨ ucken, Germany, April 2005 27. G. Krawczyk, K. Myszkowski, and H.P. Seidel. Lightness perception in tone reproduction for high dynamic range images. In The European Association for Computer Graphics 26th Annual Conference EUROGRAPHICS 2005, volume 24 of Computer Graphics Forum, pages 635–646, Dublin, Ireland, 2005. Blackwell 28. D.J. Jobson, Z. Rahman, and G.A. Woodell. Properties and performance of a center/surround retinex. IEEE Transactions on Image Processing, 6(3):451–462, 1997 29. R. Fattal, D. Lischinski, and M. Werman. Gradient domain high dynamic range compression. ACM Transactions on Graphics, 21(3):249–256, 2002 30. R. Mantiuk, K. Myszkowski, and H.P. Seidel. A Perceptual framework for contrast processing of high dynamic range images. In J.Koenderink and J. Malik, editors, Proceedings of the 2nd Symposium on Applied Perception in Graphics and Visualization (APGV 2005). ACM, New York, 2005 31. P. Whittle. Increments and decrements: luminance discrimination. Vision Research, 26(10):1677–1691, 1986 32. N. Goodnight, R. Wang, C. Woolley, and G. Humphreys. Interactive timedependent tone mapping using programmable graphics hardware. In Rendering Techniques 2003: 14th Eurographics Symposium on Rendering, pages 26–37, 2003 33. F. Durand and J. Dorsey. Interactive tone mapping. In Rendering Techniques 2000: 11th Eurographics Workshop on Rendering, pages 219–230, 2000 34. G. Spencer, P. Shirley, K. Zimmerman, and D.P. Greenberg. Physically-based glare effects for digital images. In Proceedings of ACM SIGGRAPH 95, pages 325–334, 1995 35. G. Spencer, P.S. Shirley, K. Zimmerman, and D.P. Greenberg. Physically-based glare effects for digital images. In Proceedings of ACM SIGGRAPH 95, Computer Graphics Proceedings, Annual Conference Series, pages 325–334, August 1995 36. R.W.G. Hunt. The Reproduction of Colour in Photography, Printing and Television: 5th Edition. Fountain, Tolworth 1995 37. S. Shaler. The relation between visual acuity and illumination. Journal of General Psychology, 21:165–188, 1937 38. R.J. Deeley, N. Drasdo, and W. N. Charman. A simple parametric model of the human ocular modulation transfer function. Ophthalmology and Physiological Optics, 11:91–93, 1991 39. C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In Proceedings of the 1998 IEEE International Conference on Computer Vision, Bombay, India, pages 839–846, 1998
40. J. Tumblin and G. Turk. LCIS: A boundary hierarchy for detail-preserving contrast reduction. In Proceedings of ACM SIGGRAPH 99, Computer Graphics Proceedings, Annual Conference Series, pages 83–90, 1999 41. P. Choudhury and J. Tumblin. The trilateral filter for high contrast images and meshes. In Rendering Techniques 2003: 14th Eurographics Symposium on Rendering, pages 186–196, 2003 42. E.P. Bennett and L. McMillan. Video enhancement using per-pixel virtual exposures. ACM Transactions on Graphics, 24(3): 765–776, 2005
12 HDR Image and Video Compression Rafal Mantiuk
12.1 Introduction

HDR data can consume enormous amounts of storage space or transmission bandwidth if it is represented in a naive way. HDR compression can reduce the size of visual data by several hundred times without introducing objectionable distortions. Therefore, compression has become an important part of many HDR image and video systems. Capturing HDR video and images becomes easier with the development of HDR cameras. At the other end of the pipeline, HDR data can be displayed on a new generation of HDR displays. However, in order to make those two ends of the pipeline work together, there is a need for a common data format that both would understand. This can be achieved with the scene-referred approach, described in Sect. 12.2. The recent advances in digital camera and display technologies make the traditional 8-bit per color channel representation of visual data insufficient. The usual pixel formats as well as file formats for images and video (JPEG, PNG, MPEG) fail to represent scenes with a dynamic range over 2 or 3 orders of magnitude and an extended color gamut. To address this problem, several HDR image formats have been introduced. Section 12.4 describes several such image formats, which are the most commonly used. Equally important to image compression is the compression of HDR video. Therefore, Sect. 12.5 describes an HDR extension to MPEG encoding. The major distinction between low dynamic range and HDR compression is the format in which pixels are stored. A pixel format that can encode the full range of colors the eye can see and that is adjusted to the limitations of our perception is described in Sect. 12.6. Finally, Sect. 12.7 gives a brief review of the existing software for HDR image and video processing.
12.2 Device-Referred and Scene-Referred Representation of Images

Commonly used image formats (JPEG, PNG, TIFF, etc.) contain data that is tailored to particular imaging devices: cameras, CRT or LCD monitors. The sRGB color space, a de facto standard for many low dynamic range images and devices, defines a maximum luminance of 80 cd m−2 and a restricted color gamut, which does not even cover the gamut of some color printers. Obviously, such a limited representation of color cannot faithfully represent real-world scenes, which can contain both highly saturated colors and a luminance range that cannot be captured by any low dynamic range color space. The reason why sRGB is so restrictive is that it was designed to match the capabilities of most computer monitors. Such a representation of images only vaguely relates to the actual photometric properties of the scene it depicts; rather, it is dependent on a display device. Therefore, sRGB and similar color formats can be considered device-referred (also known as output-referred), since they are tightly coupled with the capabilities and characteristics of a particular imaging device. Because both the color gamut and the dynamic range may differ significantly between devices, there is often a need to convert image colors between different device-referred formats. ICC color profiles [1] can be used to convert visual data from one device-referred format to another. Such profiles define the colorimetric properties of the device the image is intended for. Problems arise if the two devices have a different color gamut or dynamic range, in which case a conversion from one format to another usually entails the loss of some visual information. The problem is even more difficult when an image captured with an HDR camera is to be converted to the color space of a low dynamic range monitor (this is known as the tone mapping problem, discussed in Chap. 11 of this book). Obviously, the ICC profiles cannot be easily used to facilitate the interchange of data between LDR and HDR devices. A scene-referred representation of images offers a much simpler solution to this problem. It implies that an image should encode the actual photometric characteristics of the scene it depicts. Conversion from such a common representation (for example, CIE XYZ tri-stimulus values), which directly corresponds to the physical luminance or radiance values, to a format suitable for a particular device is the responsibility of that device. HDR file formats are an example of scene-referred encoding, as they usually represent either luminance or radiance, rather than gamma-corrected intensity.
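To make the distinction concrete, the following sketch (Python/NumPy; an illustration, not part of the chapter) converts device-referred 8-bit sRGB pixel values into approximate luminance by inverting the standard sRGB nonlinearity and scaling by the 80 cd m−2 peak luminance that the sRGB specification assumes; the Rec. 709 luminance weights are those used by sRGB.

```python
import numpy as np

def srgb_to_luminance(srgb8, peak_cd_m2=80.0):
    """Approximate luminance (cd/m^2) emitted by an idealized sRGB display."""
    V = srgb8.astype(np.float64) / 255.0
    # inverse sRGB transfer function (IEC 61966-2-1)
    linear = np.where(V <= 0.04045, V / 12.92, ((V + 0.055) / 1.055) ** 2.4)
    # relative luminance from linear RGB, then scale by the display peak
    Y_rel = linear @ np.array([0.2126, 0.7152, 0.0722])
    return Y_rel * peak_cd_m2
```

The result describes the light emitted by an idealized sRGB display, not the original scene – which is precisely why device-referred data cannot substitute for a scene-referred encoding.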
12.3 HDR Image and Video Compression Pipeline

Although image and video compression of low dynamic range scenes has been studied thoroughly for many years, the problem of HDR compression has been identified and addressed only recently. Fortunately, most low-dynamic
Fig. 12.1. A general pipeline of video and image compression: raw image/video → color space encoding (pixel format) → motion compensation (video only) → spatial transformation (e.g., DCT, DWT) → quantization (lossy compression only) → arithmetic encoding → bit-stream. See the text for details
range compression techniques can also be applied to HDR with some minor modifications. In this chapter we focus on the differences between high- and low-dynamic range image and video compression, without going too much into the details of general compression techniques. Figure 12.1 illustrates the basic pipeline of almost any video and image compression method. The first step is color space encoding, which aims at converting pixel values to a representation that is more suitable for compression. This is usually the step that differs significantly between low dynamic range and HDR compression. The next step, motion compensation, is present only in the case of video compression and is responsible for minimizing the differences between two consecutive frames due to motion, so that the difference between these frames can be encoded efficiently. Advanced compression methods do not store pixels, but rather transform images or video to the frequency or the wavelet domain, which gives a much more compact representation (step Spatial Transformation). To achieve high compression ratios, the data is then quantized (step Quantization), which means that the precision and, thus, the number of bits needed to encode values is reduced. The final step, Arithmetic Encoding, applies lossless compression to the data in order to further reduce its size. Since the major difference between low dynamic range and HDR compression is the way pixels are encoded, the focus of Sect. 12.4 is on pixel formats (or color spaces) used for HDR image compression. A detailed discussion of a particular HDR pixel format, which takes into account the limitations of the human eye, is given in Sect. 12.6.

12.4 HDR Image Formats

This section reviews the most popular scene-referred HDR image formats and gives some insight into how HDR data can be efficiently represented and stored.

12.4.1 Radiance's HDR Format

One of the first HDR image formats, which gained much popularity, was introduced with the Radiance rendering package.1 Therefore, it is known as the Radiance picture format [2] and can be recognized by the file extensions
1 Radiance is an open source light simulation and realistic rendering package. Home page: http://radsite.lbl.gov/radiance/
Fig. 12.2. 32-bit per pixel RGBE encoding: bits 0–7 red, 8–15 green, 16–23 blue, 24–31 common exponent
The file consists of a short text header, followed by run-length encoded pixels. The pixels are encoded using the so-called RGBE or XYZE representations, which differ only in the color space used. Since both representations are very similar, we describe only the RGBE encoding. RGBE represents each color using four bytes: the first three bytes encode the red, green and blue color channels, and the last byte is a common exponent for all channels (see Fig. 12.2). RGBE is essentially a custom floating point representation of pixel values. The encoding takes advantage of the fact that the color channels are strongly correlated in RGB color space and their values are usually of the same order of magnitude, so there is no need to store a separate exponent for each channel.

12.4.2 LogLuv TIFF

The major drawback of a floating point representation of pixel values is that floating point numbers do not compress well. This is mainly because they need additional bits to encode both mantissa and exponent. Such a representation, although flexible, is not really required for visual data. Furthermore, the precision error of floating point numbers varies significantly across the full range of possible values and differs from the “precision” of our visual system. Better compression can therefore be achieved when integer numbers are used to encode HDR pixels. The LogLuv encoding [3] uses only integer numbers to encode the full range of luminance and the color gamut visible to the human eye. It is available as an optional encoding in the TIFF library. The encoding exploits the fact that the human eye is not equally sensitive across all luminance ranges: in the dark we can detect a luminance difference of a few hundredths of a cd m−2, while in sunlight a difference of tens of cd m−2 is needed. But if the logarithm of luminance is considered instead, the detection thresholds vary much less, and a constant value can serve as a conservative approximation of the visible threshold. Therefore, if the logarithm of luminance is encoded using integer numbers, quantization errors roughly correspond to the visibility thresholds of the human visual system, which is a desirable property for a pixel encoding. The 32-bit LogLuv encoding uses two bytes to encode luminance and another two bytes to represent chrominance (see Fig. 12.3). Chrominance is encoded using the perceptually uniform u′v′ chromaticity scale (see Sect. 12.6 for details).
Fig. 12.3. 32-bit per pixel LogLuv encoding: 1 sign bit, 15-bit log luminance (logL), 8-bit u, 8-bit v
There is also a 24-bit LogLuv encoding, which needs fewer bits per pixel while keeping the precision below the visibility thresholds. However, this variant compresses rather poorly, owing to discontinuities introduced by encoding the two chrominance channels with a single lookup value.

12.4.3 OpenEXR

The OpenEXR (EXtended Range) format, recognized by the file name extension .exr, was made available with an open source C++ library in 2002 by Industrial Light and Magic (see http://www.openexr.org/ and [4]). Before that date the format was used internally by Industrial Light and Magic for special-effects production. The format is currently promoted as a special-effects industry standard and many software packages already support it. Some features of this format include:
– Support for 16-bit floating point, 32-bit floating point, and 32-bit integer pixels. The 16-bit floating point format, called “half”, is compatible with the half data type in NVIDIA’s Cg graphics language and is supported natively on their GeForce FX and Quadro FX 3D graphics solutions.
– Multiple lossless image compression algorithms. Some of the included codecs can achieve 2:1 lossless compression ratios on images with film grain.
– Extensibility. New compression codecs and image types can easily be added by extending the C++ classes included in the OpenEXR software distribution. New image attributes (strings, vectors, integers, etc.) can be added to OpenEXR image headers without affecting backward compatibility with existing OpenEXR applications.
Although the OpenEXR file format offers several data types to encode channels, color data is usually encoded with 16-bit floating point numbers, known as half-precision floats. Such a two-byte floating point number consists of one sign bit, a 5-bit exponent, and a 10-bit mantissa, as shown in Fig. 12.4 (hence the format is also known as S5E10).

12.4.4 Subband Encoding – JPEG HDR

JPEG HDR is an extension to the JPEG format for storing HDR images that is backward compatible with ordinary 8-bit JPEG [5]. A JPEG HDR file contains a tone mapped version of an HDR image and a ratio (subband) image, which contains the information needed to restore the HDR image from the tone mapped image.
Fig. 12.4. 48-bit per pixel OpenEXR half-precision floating point encoding: each of the red, green and blue channels occupies 16 bits holding a sign bit, a 5-bit exponent and a 10-bit mantissa
Fig. 12.5. Data flow of subband encoding in the JPEG HDR format: the HDR image is tone mapped and JPEG/DCT compressed; a ratio image is computed from it, sub-sampled, JPEG/DCT compressed, and stored as JPEG markers in the resulting JPEG file
The ratio image is stored in user-data JPEG markers, which are normally ignored by applications. This way a naive application will always open the tone mapped version of an image, whereas an HDR-aware application can retrieve the full HDR image. The data flow of the subband encoding is shown in Fig. 12.5. An HDR image is first tone mapped and compressed as an ordinary JPEG file. The same image is also used to compute a ratio image, which stores for each pixel the ratio between the HDR and the tone mapped image luminance. To improve encoding efficiency, the ratio image is subsampled and encoded at a lower resolution using ordinary JPEG compression. After compression, the ratio image is stored in JPEG markers together with the tone mapped image. To reduce the loss of information due to subsampling of the ratio image, two correction methods have been proposed: enhancing edges in the tone mapped image (so-called precorrection) and synthesizing high frequencies in the ratio image during up-sampling (so-called postcorrection). Further details on JPEG HDR compression can be found in [5].
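As a concrete illustration of the simplest of the formats above, the shared-exponent RGBE packing of Sect. 12.4.1 can be sketched as follows. This is a sketch of the idea only, not the Radiance library’s actual routines (which additionally handle run-length encoding and rounding details); the function names are ours and non-negative channel values are assumed.

```python
import math

def float_to_rgbe(r, g, b):
    """Pack linear, non-negative RGB into the 4-byte shared-exponent RGBE form."""
    v = max(r, g, b)
    if v < 1e-32:                        # treat (near-)black pixels as all zeros
        return (0, 0, 0, 0)
    mantissa, exponent = math.frexp(v)   # v = mantissa * 2**exponent, mantissa in [0.5, 1)
    scale = mantissa * 256.0 / v         # brings the largest channel into [128, 256)
    return (int(r * scale), int(g * scale), int(b * scale), exponent + 128)

def rgbe_to_float(re, ge, be, e):
    """Unpack RGBE back to linear RGB (inverse of the packing above)."""
    if e == 0:
        return (0.0, 0.0, 0.0)
    f = math.ldexp(1.0, e - (128 + 8))   # 2**(e - 128) / 256
    return (re * f, ge * f, be * f)

print(rgbe_to_float(*float_to_rgbe(0.9, 0.2, 0.05)))  # ~ (0.898, 0.199, 0.047)
```

The decoded values differ from the input only by the 8-bit quantization of the mantissas, which is why a single shared exponent suffices for strongly correlated color channels.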
12.5 HDR Extension to MPEG Video Compression

Mantiuk et al. [6] showed that HDR video can be encoded using standard MPEG-4 (ISO/IEC 14496-2) compression with only modest changes to the compression algorithm. Figure 12.6 shows the differences in the data flow between HDR and standard MPEG-4 video compression. The first major difference is the color space transformation used to encode HDR pixels. HDR video encoding takes advantage of the fact that the MPEG-4 standard allows luminance to be encoded with up to 12 bits.
Fig. 12.6. Data flow for the standard MPEG video encoding (solid) and the extensions (dashed and italic) for encoding HDR video: 8-bit RGB or HDR XYZ input → color space transformation (YCrCb or Lp u′v′) → inter-frame motion estimation and compensation → discrete cosine transform with hybrid luminance and frequency space coding → quantization → variable length coding of quantized DCT blocks and run-length coding of edge blocks → bitstream. Note that edge blocks are encoded together with the DCT data in the HDR flow
Fig. 12.7. Edge encoding: a decomposition of a signal into sharp edge and smooth signals
A new luminance nonlinearity (transfer function) is used to encode the full visible range of luminance using only 11–12 bits. This nonlinearity is described in detail in Sect. 12.6. Another major difference between HDR and standard compression is a new approach to encoding sharp contrast edges. To avoid noisy artifacts near sharp contrast edges resulting from the quantization of DCT coefficients, such edges are encoded separately from the smoothed DCT data, as illustrated for a one-dimensional signal in Fig. 12.7. Since sharp contrast edges occupy only a small portion of an image, they can be compressed efficiently using run-length encoding. Removing sharp contrast edges from the signal that is to be DCT encoded removes some high frequencies, which in turn improves both quality and compression ratio, because DCT encoding achieves its best performance for low-frequency content.

The paper [6] also shows several applications of HDR video. A dynamic range exploration tool, for example, allows the user to view a selected range of luminance in a rectangular window displayed on top of the video (see Figs. 12.8 and 12.9). The user can move the window interactively and choose which part of the dynamic range should be mapped linearly to the display for closer inspection. Most tone mapping operators have one or more parameters that can be tuned to the user's taste. Since accurate photometric values are available for HDR videos, the user is free to choose a tone mapping algorithm and control its parameters on the fly. LDR movie files are usually tuned for a single kind of display device and viewing condition. Since real-world luminance values are encoded in the scene-referred HDR video stream, a video player can adjust presentation parameters to any existing or future display device.
Fig. 12.8. HDR video frame with dynamic range exploration tool
Fig. 12.9. Simulated night vision based on an HDR video frame. Note the lack of colors and bluish cast, which is typical for low-light vision
An HDR video stream with real-world luminance values makes it possible to add client-side postprocessing that accurately simulates the limitations of human visual perception or of video cameras. It is possible, for instance, to add night vision postprocessing [7], as shown in Fig. 12.9.
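The edge/smooth decomposition sketched in Fig. 12.7 can be illustrated with a toy one-dimensional example. The code below only demonstrates the principle; it is not the codec’s actual edge detection, and the threshold is arbitrary.

```python
def split_sharp_edges(signal, threshold=0.5):
    """Split a 1-D signal into a step-like sharp-edge part and a smooth remainder.

    Jumps between neighbouring samples larger than `threshold` are accumulated
    in the edge signal; subtracting it leaves a smooth signal that contains
    fewer high frequencies and is therefore friendlier to DCT coding.
    """
    edges = [0.0]
    for prev, cur in zip(signal, signal[1:]):
        step = cur - prev
        edges.append(edges[-1] + (step if abs(step) > threshold else 0.0))
    smooth = [s - e for s, e in zip(signal, edges)]
    return edges, smooth

sig = [0.0, 0.1, 0.2, 3.0, 3.1, 3.2, 0.4, 0.5]
edges, smooth = split_sharp_edges(sig)
# edges keeps the two large jumps; smooth varies only gently, and
# smooth[i] + edges[i] reconstructs the original signal exactly.
```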
12.6 Perceptual Encoding of HDR Color

A floating point representation of pixel values does not lead to the best image or video compression ratios, as discussed in Sect. 12.4.2. Additionally, since existing image and video formats, such as MPEG or JPEG2000, can encode only integer numbers, HDR data must be represented as integers in order to use these formats. It is therefore highly desirable to convert floating point luminance values into integer numbers. Such an integer encoding of luminance should take into account the limitations of human perception and the fact that the eye can distinguish only a limited number of luminance levels. This section gives an overview of a color space that can efficiently represent HDR pixel values using only integer numbers and a minimal number of bits. More information on this color space can be found in [8].

Different applications may require different precision of the visual data. For example, satellite imaging may require multispectral techniques to capture information that is not even visible to the human eye. However, for a large number of applications it is sufficient if the human eye cannot notice any encoding artifacts. It is important to note that low-dynamic range formats, like JPEG or simple-profile MPEG, cannot represent the full range of colors that the eye can see: although the quantization artifacts due to 8-bit discretization in those formats are not visible, this encoding covers only a fraction of the dynamic range the eye can perceive.

Most low-dynamic range image or video formats use so-called gamma correction to convert luminance or RGB color intensity into integer numbers, which can then be encoded. Gamma correction is usually given in the form of the power function intensity = signal^γ (or signal = intensity^(1/γ) for the inverse gamma correction), where the value of γ is between 1.8 and 2.2. Gamma correction was originally intended to reduce camera noise and to control the current of the electron beam in CRT monitors (for details on gamma correction, see [9]). Incidentally, light intensity values, after being converted into signals using the inverse gamma correction formula, usually correspond well with our perception of lightness. Therefore, such values are also well suited for image encoding, since the distortions caused by image compression are distributed equally across the whole scale of signal values. In other words, altering the signal by the same amount, for small and for large signal values alike, should result in the same magnitude of visible change. Unfortunately, this holds only for a limited range of luminance values, usually within 0.01–100 cd m−2, because the response characteristic of the human visual system (HVS) to luminance2 changes considerably above 100 cd m−2. This is especially noticeable for HDR images, which can span the luminance range from 10^−4 to 10^8 cd m−2.
2 The HVS uses both types of photoreceptors, cones and rods, within the luminance range of approximately 0.01 to 10 cd m−2. Above 100 cd m−2 only the cones contribute to the visual response.
Fig. 12.10. 28-bit per pixel JND encoding: 12-bit JND luma L, 8-bit u, 8-bit v
An ordinary gamma correction is not sufficient in this case, and a more elaborate model of luminance perception is needed. This problem is solved by the JND encoding described in this section.

The JND encoding is a further improvement over the LogLuv encoding (see Sect. 12.4.2) that takes more accurate characteristics of the human eye into account. JND encoding can also be regarded as an extension of gamma correction to HDR pixel values. The name JND encoding is motivated by its design, which makes the encoded values correspond to the Just Noticeable Differences (JND) of luminance. JND encoding requires two bytes to represent color and 12 bits to encode luminance (see Fig. 12.10). Similarly to the LogLuv encoding, chroma is represented using the u′ and v′ chromaticities as recommended by the CIE 1976 Uniform Chromaticity Scale (UCS) diagram and defined by the equations

u' = \frac{4X}{X + 15Y + 3Z},    (12.1)

v' = \frac{9Y}{X + 15Y + 3Z}.    (12.2)

Luma, l, is found from absolute luminance values, y (cd m−2), using the following formula:

l(y) = \begin{cases} a\, y & \text{if } y < y_l, \\ b\, y^{c} + d & \text{if } y_l \le y < y_h, \\ e\, \log(y) + f & \text{if } y \ge y_h. \end{cases}    (12.3)

There is also a formula for the inverse conversion, from 12-bit luma back to luminance:

y(l) = \begin{cases} a'\, l & \text{if } l < l_l, \\ b'\, (l + d')^{c'} & \text{if } l_l \le l < l_h, \\ e'\, \exp(f'\, l) & \text{if } l \ge l_h. \end{cases}    (12.4)

The constants are:

for l(y), (12.3):  a = 17.554,  b = 826.81,  c = 0.10013,  d = −884.17,  e = 209.16,  f = −731.28,  y_l = 5.6046,  y_h = 10469
for y(l), (12.4):  a' = 0.056968,  b' = 7.3014e−30,  c' = 9.9872,  d' = 884.17,  e' = 32.994,  f' = 0.0047811,  l_l = 98.381,  l_h = 1204.7
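Equations (12.3) and (12.4), together with the constants above, translate directly into code. The sketch below assumes the natural logarithm, which is consistent with the listed constants (the two branch points y_h and l_h then map onto each other under the inverse).

```python
import math

# Constants of (12.3) (forward) and (12.4) (inverse) as listed above.
A, B, C, D, E, F = 17.554, 826.81, 0.10013, -884.17, 209.16, -731.28
YL, YH = 5.6046, 10469.0
AI, BI, CI, DI, EI, FI = 0.056968, 7.3014e-30, 9.9872, 884.17, 32.994, 0.0047811
LL, LH = 98.381, 1204.7

def luma(y):
    """Perceptual (JND) luma l(y) of (12.3); y is luminance in cd/m^2."""
    if y < YL:
        return A * y
    if y < YH:
        return B * y ** C + D
    return E * math.log(y) + F          # natural logarithm

def luminance(l):
    """Inverse mapping y(l) of (12.4), from 12-bit luma back to cd/m^2."""
    if l < LL:
        return AI * l
    if l < LH:
        return BI * (l + DI) ** CI
    return EI * math.exp(FI * l)

for y in (0.01, 1.0, 100.0, 1e6):
    print(y, round(luma(y), 1), luminance(luma(y)))   # round trip recovers y
```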
The above formulas have been derived from the luminance detection thresholds in such a way that the same difference of values l, regardless
whether in a bright or in a dark region, corresponds to the same visible difference.3 Neither luminance nor the logarithm of luminance has this property, since the response of the human visual system to luminance is complex and nonlinear. The values of l lie between 0 and 4,095 (a 12-bit integer) for the corresponding luminance values of 10^−5 to 10^10 cd m−2, which is the range of luminance that the human eye can effectively see (although the values above 10^6 cd m−2 are mostly useful for representing the luminance of bright light sources). If desired, the values of l can be rescaled to a lower range in order to encode luminance using 10 or 11 bits. Such lower-bit encodings should still offer quantization errors below the visibility thresholds, especially for video encoding. A useful property of the function given in (12.3) is that it is smooth (C^1) and defined for the full positive range of luminance values, including the point y = 0, at which l = 0. The function l(y) of (12.3) is plotted in Fig. 12.11 and labelled “JND encoding”. Note that both the formula and the shape of the JND encoding are very similar to the nonlinearity (gamma correction) used in the sRGB color space [10]. Both the JND encoding and the sRGB nonlinearity follow similar curves on the plot, but the JND encoding is more conservative (a steeper curve means that a luminance range is projected onto a larger number of discrete luma values, thus lowering quantization errors). The sRGB nonlinearity consists of two segments: a linear and a power function. So does the JND encoding, but it also includes a logarithmic segment for the highest luminance values (y ≥ y_h in (12.3)). Therefore, the JND encoding can be considered an extension of a low-dynamic range nonlinearity (also known as gamma correction or transfer function) to HDR luminance values. For comparison, Fig. 12.11 also shows the logarithmic luminance encoding used in the LogLuv TIFF format. The shape of the logarithmic function differs significantly from both the sRGB gamma correction and the JND encoding. Although a logarithmic function is a simple and frequently used approximation of the HVS response to the full range of luminance, such an approximation is very coarse and does not predict the loss of sensitivity under low-light conditions. The maximum quantization errors for all luminance encodings described in this chapter are shown in Fig. 12.12. All but the JND encoding have approximately uniform maximum quantization errors across all visible luminance values. The jagged shape of both the RGBE and the 16-bit half encodings is caused by rounding of the mantissa. The JND encoding varies the maximum quantization error across the range to mimic the loss of sensitivity of the HVS at low light levels.
3 The derivation of this function can be found in [8]. The formulas are derived from the threshold-versus-intensity characteristics measured for human subjects and fitted to the analytical model [11].
Fig. 12.11. Functions mapping physical luminance y to encoded luma values l. JND Encoding – perceptual encoding of luminance; sRGB – nonlinearity (gamma correction) used for the sRGB color space; logarithmic compression – logarithm of luminance, rescaled to 12-bit integer range. Note that encoding high luminance values using the sRGB nonlinearity (dashed line) would require a significantly larger number of bits than the perceptual encoding
Fig. 12.12. Comparison of the maximum quantization errors for different luminance-to-luma encodings: the JND encoding is given by (12.3); RGBE is the encoding used in the Radiance HDR format; 16-bit half is the 16-bit floating point format used in OpenEXR; 32-bit LogLuv is the logarithmic luminance encoding used in the LogLuv TIFF format
This not only makes better use of the available range of luma values but also reduces invisible noise in very dark scenes. Such noise reduction can significantly improve image or video compression.
12.7 Software for HDR Image and Video Processing

There is a growing number of applications that can read, write and modify HDR images and video. It is to be expected that most digital image processing software will support HDR data in the near future. Currently, however (as of September 2005), the choice of applications that can operate on a higher dynamic range is rather limited. We therefore list some free and commercial software that can take advantage of HDR:
– pfstools — a set of command line programs for reading, writing, manipulating and viewing HDR images and video frames. The software facilitates image file format conversion as well as resizing, cropping and rotating video frames or images. It is available as open source under the GPL. URL: http://pfstools.sourceforge.net/
– OpenEXR — a library for handling OpenEXR files and a viewer for HDR images (exrdisplay). The library is available as open source. URL: http://www.openexr.org/
– Photosphere — HDR photo album software for Mac OS X, which can also create HDR images using the multiexposure technique. Available for free. URL: http://www.anyhere.com/
– Photogenics HDR — an image editing and painting program, which can operate on 32-bit buffers. The program addresses similar applications as Adobe Photoshop. It is commercially available. URL: http://www.idruna.com/photogenicshdr.html
– HDR Shop 2 — can create HDR images from LDR photographs using the multiexposure technique. Some image processing operations are included as well. It is commercially available. URL: http://www.hdrshop.com/
References
1. ICC. Specification ICC.1:2004-10 (Profile version 4.2.0.0) Image technology colour management – Architecture, profile format, and data structure. International Color Consortium, 2004
2. G. Ward. Real pixels. Graphics Gems II, pp. 80–83, 1991
3. G. Ward Larson. LogLuv encoding for full-gamut, high-dynamic range images. Journal of Graphics Tools, 3(1):815–830, 1998
4. R. Bogart, F. Kainz, and D. Hess. OpenEXR image file format. In ACM SIGGRAPH 2003, Sketches & Applications, 2003
5. G. Ward and M. Simmons. Subband encoding of high dynamic range imagery. In APGV ’04: Proceedings of the 1st Symposium on Applied Perception in Graphics and Visualization, pp. 83–90, New York, NY, USA, 2004. ACM Press
6. R. Mantiuk, G. Krawczyk, K. Myszkowski, and H.-P. Seidel. Perception-motivated high dynamic range video encoding. ACM Transactions on Graphics, 23(3):733–741, 2004
7. W.B. Thompson, P. Shirley, and J.A. Ferwerda. A spatial post-processing algorithm for images of night scenes. Journal of Graphics Tools, 7(1):1–12, 2002
8. R. Mantiuk, K. Myszkowski, and H.-P. Seidel. Lossy compression of high dynamic range images and video. In Proc. of Human Vision and Electronic Imaging XI, volume 6057 of Proceedings of SPIE, page 60570V, San Jose, USA, February 2006. SPIE
9. C.A. Poynton. A Technical Introduction to Digital Video, Chapter 6, Gamma. John Wiley & Sons, 1996
10. M. Stokes, M. Anderson, S. Chandrasekar, and R. Motta. A standard default color space for the internet – sRGB. 1996
11. CIE. An Analytical Model for Describing the Influence of Lighting Parameters Upon Visual Performance, volume 1. Technical Foundations, CIE 19/2.1. International Organization for Standardization, 1981
13 HDR Applications in Computer Graphics Michael Goesele and Karol Myszkowski
13.1 Introduction

One of the goals of computer graphics is to realistically render real objects and scenes, taking their physical properties into account. Due to the high dynamic range (HDR) encountered in real scenes, this requires performing most rendering steps with floating point precision. This was acknowledged early on and implemented in many software rendering systems. But the limited computational resources of hardware-accelerated rendering systems often required developers to work with lower precision – typically 8 bits per color channel. With the advent of more powerful graphics cards, CPUs, and cluster rendering systems, an increasing number of systems now incorporate a full floating point rendering pipeline, so that dealing with HDR data has become a standard requirement in computer graphics. One indication of this development is the now widespread use of HDR image file formats such as the OpenEXR format [1].

Without exact scene models (including materials, objects, and lighting), realistic and physically correct rendering is, however, impossible. Graphics research is therefore also introducing methods to digitize various aspects of reality. Many of these methods use camera systems as input devices and are therefore called image-based methods. An important aspect here is the ability to cope with the often huge dynamic range of the captured scene, using either software or hardware systems. In this chapter, we first discuss common approaches in computer graphics for capturing and calibrating HDR image data. We then describe several approaches for the digitization of real-world objects (Sect. 13.3) and for image-based lighting (Sect. 13.4). The chapter is concluded by a discussion of the requirements for HDR capture systems in computer graphics applications (Sect. 13.5).
13.2 Capturing HDR Image Data

One of the earliest examples of HDR imaging was introduced by Wyckoff and Feigenbaum [2] in the 1960s, who invented an analog film with several emulsions of different sensitivity levels. This false color film had an extended dynamic range of about 10^8, which is sufficient to capture the vast majority of scenes encountered in practice without changing any exposure settings (aperture, exposure time) or adding filters to the optical path. An additional advantage is that brightness levels can easily be compared between different images, provided that the camera settings remain unchanged.

13.2.1 Multiexposure Techniques

As in the case of analog film, the dynamic range of the traditional CCD imaging sensors often used in computer graphics applications is quite limited (typically on the order of 10^3–10^4). Several authors have therefore proposed methods to extend the dynamic range of digital imaging systems by combining multiple images of the same scene that differ only in exposure time (multiexposure techniques). Madden [3] assumes a linear response of a CCD imager and selects for each pixel an intensity value from the brightest nonsaturated image. This value is scaled according to the image’s exposure and stored in the final HDR image. A large number of methods has been proposed by several authors, including Mann and Picard [4], Debevec and Malik [5], Mitsunaga and Nayar [6], and Robertson et al. [7], that estimate the Opto-Electronic Conversion Function (OECF) (mostly called the response curve in computer graphics) from a set of images with varying exposure. Given the response curve, the input images can be linearized and scaled according to their exposure settings. The final HDR image is computed as a weighted sum of these images, where the weighting function reflects the certainty with which the value of an individual pixel in any of the input images is known (e.g., pixel values close to over- or underexposure are assigned a low certainty and consequently a low weight). Although capturing multiple images of the same scene with different exposure settings has several disadvantages (e.g., long acquisition times due to the multiple exposures, difficulties with dynamic scenes), it is used in many applications. Systems using cameras with native HDR sensors, which can avoid many of these problems, are only slowly gaining momentum (see also the discussion of camera requirements for computer graphics applications in Sect. 13.5).

13.2.2 Photometric Calibration

For many applications, photometric calibration of the captured image data is crucial. Most multiexposure techniques yield images with linear floating point data per pixel in unknown units. These can be related to physical quantities by performing an absolute calibration with a test target of known luminance (e.g., determined by a light meter).
Fig. 13.1. Absolute calibration of several camera systems (luminance in cd/m^2 plotted against measurement number). Gray patches were illuminated with different intensities and their absolute luminance values were determined using a luminance meter. Absolute luminance values were then captured using a standard CCD camera (Jenoptik C14) with the multiexposure technique and using two HDR cameras (IMS-Chips HDRC, SiliconVision Lars III)
Similarly, imagery from HDR camera systems needs to be linearized and calibrated to yield absolute luminance values. To determine the accuracy of various HDR capture methods, Krawczyk et al. [8] used a series of test targets with known luminance values covering about six orders of magnitude. The luminance values were determined using a standard luminance meter. HDR images of these targets were acquired with multiexposure techniques using a standard CCD camera (Jenoptik C14). As shown in Fig. 13.1, the camera could capture all target luminance values with high accuracy apart from the two brightest samples; these were overexposed in all input images, so no valid data could be recovered. They furthermore performed an absolute calibration of two HDR camera systems (IMS-Chips HDRC, SiliconVision Lars III) using two methods: an adapted version of a standard multiexposure calibration [7] and a simple fitting procedure based on the known linear or logarithmic camera behavior as given in the manufacturers’ data sheets. Figure 13.1 also shows the results of capturing the same test targets with the calibrated HDR cameras. For valid samples within a defined luminance range, the accuracy of the HDR cameras calibrated with the latter approach is comparable to the multiexposure technique, with an average error of about 6%. But the accuracy falls off for samples taken with the HDR cameras in very bright or very dark conditions, whereas the multiexposure technique yields data with constant accuracy as long as some of the input images contain valid data for a sample location.
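A minimal sketch of the weighted multiexposure merging described in Sect. 13.2.1 is given below. It assumes the camera response curve has already been recovered, so that 8-bit pixel codes can be linearized via a 256-entry lookup table, and it uses a simple hat-shaped certainty weight; all names are illustrative.

```python
import numpy as np

def merge_exposures(images, exposure_times, inv_response):
    """Merge differently exposed 8-bit images into one HDR radiance map.

    `images` is a list of uint8 arrays of equal shape, `exposure_times` the
    corresponding exposure times in seconds, and `inv_response` a 256-entry
    array mapping a pixel code to linear sensor exposure (the inverse OECF).
    Pixels near under- or overexposure receive a low weight.
    """
    code = np.arange(256, dtype=np.float64)
    weight = 1.0 - np.abs((code - 127.5) / 127.5)      # hat-shaped certainty
    num = np.zeros(images[0].shape, dtype=np.float64)
    den = np.zeros_like(num)
    for img, t in zip(images, exposure_times):
        w = weight[img]
        num += w * inv_response[img] / t               # linearize, scale by exposure
        den += w
    return num / np.maximum(den, 1e-8)

# Example call with a purely linear response (inv_response[c] = c / 255):
# hdr = merge_exposures([img_short, img_long], [1/500, 1/30], np.arange(256) / 255.0)
```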
13.3 Image-Based Object Digitization

Real-world objects interact with light in a variety of different ways [9, 10]: purely diffuse objects such as a piece of chalk scatter light uniformly into all outgoing directions at the point of incidence. Glossy or specular surfaces scatter light in a more directional way, leading to visible highlights on the surface. Light enters transparent objects, travels through them in straight lines, and leaves them at a possibly different location, whereas light is scattered diffusely inside translucent objects. The interaction of light with arbitrary objects such as the examples given above is described by the object’s reflection properties. The bidirectional scattering-surface reflectance distribution function (BSSRDF) [11] provides a general model of the light reflection properties of an object and is defined as follows:

S(x_i, \hat{\omega}_i; x_o, \hat{\omega}_o) := \frac{\mathrm{d}L^{\rightarrow}(x_o, \hat{\omega}_o)}{\mathrm{d}\Phi^{\leftarrow}(x_i, \hat{\omega}_i)} .    (13.1)
The BSSRDF S is the ratio of the reflected radiance L→(x_o, ω̂_o) leaving the surface at a point x_o in direction ω̂_o to the incident flux Φ←(x_i, ω̂_i) arriving at a point x_i from a direction ω̂_i. Note that the BSSRDF is closely related to the reflectance field [12], which is defined for an arbitrary surface in space and does not need to coincide with an actual object surface. Capturing the full BSSRDF of a general object is, however, an ambitious task due to its high dimensionality. Practical systems therefore capture only a lower-dimensional slice of the BSSRDF adapted to the reflection properties of an object. For example, light is only reflected at the surface of opaque objects and does not enter them. Their reflection behavior can therefore be modeled by a bidirectional reflectance distribution function (BRDF) that assumes x_o = x_i.1 Many capture systems used in computer graphics are image-based acquisition methods that use one or several cameras as input devices, taking advantage of the massive parallelism of the image sensor. In the following, we describe two examples of image-based acquisition methods. A more detailed overview can be found, for example, in Lensch et al. [10].

13.3.1 Image-Based Capture of Spatially Varying BRDFs

Determining an object’s BRDF for all visible surface points is a difficult task: it requires illuminating all parts of the surface from all possible incident directions ω̂_i and observing the reflected radiance for all outgoing directions ω̂_o. There are, however, various physically based or empirical BRDF models such as the Cook–Torrance model [13], the He model [14], the Ward model [15], or the Lafortune model [16] that describe the reflectance properties of an opaque surface with only a few parameters.
1 Note that the BRDF is also defined for translucent objects, which requires integrating contributions over the whole surface [11].
Fig. 13.2. Objects rendered with spatially varying BRDFs. The models were captured using methods developed by Lensch et al. [18]
Most of these models have been proposed in the context of rendering, but they can also be used to efficiently determine an approximation of the BRDF of a real object by fitting their parameters to a small number of observations. Marschner et al. [17] introduced an image-based BRDF measurement system that determined the average BRDF of an object. The system makes use of the fact that many objects have a curved surface. A single photograph therefore shows the object under a variety of directions (relative to the local surface normal), corresponding to outgoing directions ω̂_o. To collect samples of the BRDF, the object is illuminated by a small light source and photographed from different directions. Based on these ideas, Lensch et al. [18] built a system that determines the parameters of the Lafortune BRDF model [16] per surface point, yielding a spatially varying BRDF. They capture a small number of HDR images of an object illuminated by a point light source, yielding typically 3–10 BRDF samples per surface point. Note that the “hard” illumination of the point light source and the specularity of the objects mandate the use of HDR capture techniques. The object surface is then clustered into regions with similar reflectance properties. A set of Lafortune parameters is determined for each of these regions, and the reflectance of each surface point is modeled as a linear combination of these basis BRDF models. Figure 13.2 (refer to the color plate) shows renderings of several objects captured with this approach. Highlights on the objects’ surfaces are clearly visible and add to the realism of the renderings.

13.3.2 Acquisition of Translucent Objects

Translucency is an important effect in realistic image synthesis – many everyday objects such as skin, fruits, or wax are translucent. Light is not only reflected off their surface but enters the object, is scattered inside, and leaves the object at a possibly different surface location.
Fig. 13.3. Left: Input image used to determine the diffuse reflectance R_d, with a dynamic range of about 10^5. Right: Rendered model of an alabaster horse sculpture. The model was captured using an approach developed by Goesele et al. [21] and rendered interactively using methods proposed by Lensch et al. [22]
While many recent publications deal with the rendering side of translucent objects (see, e.g., [19–21] for a comprehensive overview), only very few deal with the problem of creating digital models of real translucent objects with heterogeneous material properties. Capturing a translucent object’s complete light interaction properties by recording its BSSRDF is a complex task due to the high dimensionality of the problem. If we can, however, assume that the object consists of optically dense material, the problem can be simplified by assuming that light transport is dominated by diffuse multiple scattering. After multiple scattering events, the dependence on the incoming and outgoing angles ω̂_i and ω̂_o can be neglected. It is thus sufficient to determine the diffuse reflectance R_d(x_i, x_o) that describes the light transport between arbitrary pairs of surface locations. Based on the assumption of multiple scattering, Goesele et al. [21] introduced the DISCO system to capture the reflection properties of heterogeneous translucent objects. The object is illuminated by a narrow laser beam that is scanned over the object surface. For each point illuminated by the laser, the object’s impulse response is recorded from multiple viewpoints by an HDR camera. To achieve good surface coverage, the system captures up to one million input images for a single object. Due to the exponential fall-off of intensity inside scattering material, capturing HDR data is mandatory. The captured data is used to assemble a large transfer matrix encoding the diffuse reflectance R_d. Figure 13.3 shows on the left a captured image of an alabaster horse sculpture with a dynamic range of about 10^5, and on the right a rendering of the acquired model using the method of Lensch et al. [22].
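In this representation, relighting the captured object under new illumination reduces, within the diffuse multiple-scattering approximation, to applying the transfer matrix to the incident flux at the surface samples. The sketch below shows only this principle, not the actual renderer of [22]; the array names are hypothetical.

```python
import numpy as np

def relight(transfer_matrix, incident_flux):
    """Relight a captured translucent object.

    `transfer_matrix[o, i]` holds the diffuse reflectance R_d between surface
    sample i (illuminated) and sample o (observed); `incident_flux` is the flux
    arriving at each sample under the new lighting. The outgoing quantity at
    every sample is then a single matrix-vector product.
    """
    return transfer_matrix @ incident_flux

# Hypothetical example with 10,000 surface samples:
# Rd = np.load("horse_transfer_matrix.npy")        # shape (10000, 10000)
# outgoing = relight(Rd, new_lighting_flux)        # shape (10000,)
```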
Fig. 13.4. Overview of a rendering system using captured image-based lighting to illuminate a synthetic scene. The hemispherical Environment Map (EM) is captured using an HDR video camera with a fisheye lens (left). The resulting video environment map (center) with real-world lighting is submitted to a GPU-based renderer featuring the shadow computation (right)
13.4 Image-Based Lighting in Image Synthesis

In realistic image synthesis, apart from geometric models of the rendered objects and their surface characteristics such as color, texture, and light reflectance, information about the light sources illuminating a given scene is required. Traditionally, lighting information is approximated by a set of point light sources, possibly with their emitted energy modulated as a function of direction to better approximate real-world light sources. To account for indirect lighting, costly interreflection computations must be performed. While such an approach may work well for some lighting scenarios, especially those dealing with interiors in architectural applications, it usually leads to images with a synthetic look. Recent research demonstrates that realism in image synthesis increases significantly when captured real-world lighting is used to illuminate rendered scenes. This approach is often called image-based lighting, since images play the role of light sources illuminating the scene (refer to Fig. 13.4). Note that in this case the direct illumination and the lighting interreflected between surfaces in the real-world scene are captured simultaneously.

The human visual system is specialized to operate in real-world lighting conditions and makes many implicit assumptions about statistical regularities in such lighting [23]. Real-world lighting statistics are often needed to disambiguate information about surrounding objects: the same amount of light may reach the retina from a strongly illuminated surface that is a poor reflector and from an identically shaped surface that is a good reflector but is located in a dim environment. The human visual system can easily distinguish the two situations by discounting the illuminants, which computationally is an ill-posed lightness determination problem that requires assumptions about the scene lighting in order to be solved [24, 25]. Through psychophysical experiments with computer-generated images, Fleming et al. [23] have shown that the ability of a human observer to notice even subtle differences in material appearance (surface reflectance characteristics) is much better under real-world lighting conditions than with a small number of point light sources (still the prevailing illumination setting in many computer graphics applications).
Realistic lighting also improves the accuracy of shape difference discrimination between rendered objects [26]. Clearly, real-world lighting is desirable in many engineering applications and would improve the believability of virtual reality systems, which notoriously lack realism in rendering. Real-world lighting is indispensable in many mixed reality applications in which virtual objects should be seamlessly merged with a real-world scene [27].

Traditionally, real-world lighting is captured into environment maps (EM), which represent distant illumination incoming to a point from thousands or even millions of directions distributed over a hemisphere (or sphere). HDR technology is required for environment map acquisition in order to accommodate the high contrasts of real-world lighting. For static conditions, low dynamic range cameras and multiexposure techniques (refer to Sect. 13.2.1) can be used to acquire an HDR image of a spherical light probe. Recently, Stumpfel et al. [28] captured dynamic sky conditions, including direct sun visibility, every 40 s. In principle, real-time environment map acquisition using a low dynamic range video camera with a fisheye lens could be performed using the techniques proposed by Kang et al. [29], but only for a limited number of exposures (i.e., an effectively reduced dynamic range). This limitation can be overcome using HDR video sensors, which enable the direct capture of HDR video environment maps (VEM). It can be envisioned that, with the quickly dropping costs of such sensors, many applications relying so far on static image-based lighting will soon be upgraded to dynamic settings.

In this section, we are mostly interested in interactive computer graphics applications involving lighting captured in VEM. In Sect. 13.4.1 we briefly discuss existing rendering techniques that are suitable to achieve this goal. Then, in Sect. 13.4.2 we present an efficient global illumination solution specifically tailored to those CAVE applications which require an immediate response to dynamic light changes and allow for free motion of the observer, but involve scenes with static geometry. As an application example we choose the modeling of car interiors under free driving conditions. The car is illuminated using dynamically changing HDR VEM, and precomputed radiance transfer (PRT) techniques are used for the global illumination computation (refer to “Precomputed Radiance Transfer”). In Sect. 13.4.3 we present an interactive system for fully dynamic lighting of a scene using HDR VEM captured in real time. The key component of this system is an algorithm for efficient decomposition of HDR VEM into a set of representative directional light sources (refer to “Environment Maps Importance Sampling”), which can be used for the direct lighting computation with shadows on graphics hardware.

13.4.1 Rendering Techniques for Image-based Lighting

Environment Map Look-up

Environment maps are commonly used to render the reflection of the captured environment in mirror-like, nonplanar surfaces.
A reflected ray direction with respect to a given view direction is computed based on the surface normal and is used to access the corresponding pixel in the environment map. The pixel intensity is then modulated by the surface reflectance and displayed in the final image pixel. Such mirror reflection rendering can be performed on graphics hardware in real time. However, the technique works well only for perfect mirrors, where only one direction of incoming lighting is assumed. Other reflectance functions require the integration of incoming lighting contributions from all directions within a hemisphere aligned with the surface normal. In this case environment map prefiltering is commonly used, which, for example for Lambertian surfaces, requires summing up the contributions of all hemisphere pixels modulated by the cosine of the light incidence angle [30]. (The angle is measured between the direction to a given environment map pixel and the surface normal direction.) Such environment map prefiltering is also possible for the Phong model [31], or even for more general light reflectance functions [32]. Those techniques could be adapted for VEM processing, but the VEM resolution would have to be reduced to achieve interactive performance due to the prefiltering cost. The techniques support dynamic environments but ignore the visibility computation, which affects the realism of the resulting frames due to the lack of shadows.

Lighting from Video Textures

Assarsson and Möller have shown that realistic soft shadows can be obtained at interactive speeds for dynamic scenes illuminated by planar HDR video textures [33]. However, time-consuming and memory-intensive preprocessing of each video frame is required to achieve real-time rendering performance. This precludes the on-line use of lighting captured directly with an HDR camera. It is also not clear how to extend this technique to handle lighting emitted by nonplanar environment maps.

Precomputed Radiance Transfer

The costly prefiltering operation can be avoided using a technique proposed by Ramamoorthi and Hanrahan [34]. They assumed that the irradiance function is smooth and continuous, so that only nine spherical harmonics (SH) coefficients are enough to give an accurate approximation of the irradiance for diffusely reflecting surfaces. Those coefficients can be computed in linear time with respect to the number of pixels in the map. It is therefore possible to prefilter environment maps for diffuse objects on the fly. In [34] only the environment map lighting has been projected onto the SH basis functions. Sloan et al. [35, 36] have shown that the visibility and reflectance functions can also be efficiently represented in this basis. They introduced the concept of the PRT function, which maps incoming radiance from an environment map to outgoing radiance from the scene surfaces.
The transfer takes into account soft shadows and interreflection effects. The outgoing radiance can simply be computed as the dot product between the incident lighting and radiance transfer vectors of SH coefficients. For Lambertian surfaces, lighting and transfer vectors of 25 SH coefficients per sample point lead to good visual results for slowly changing and smooth lighting. For more general reflectance functions, for which the incoming lighting directions are important, a matrix of spherical harmonic coefficients with transfer vectors for each of those directions must be considered; in practice, matrices of 25 × 25 coefficients are commonly used [35]. Since the transfer vectors (matrices) are stored densely over the scene surfaces (usually for each mesh vertex), data compression is an important issue. It can be performed efficiently using standard tools such as principal component analysis (PCA) and clustering [36]. Recently, the limitation to low-frequency lighting has been lifted using wavelet basis functions [37]. Using this approach, proposed by Ng et al., both soft and sharp shadows can be rendered, but a very dense mesh is required to reconstruct the lighting function precisely. PRT techniques are suitable for real-time VEM-based lighting of static scenes, for which the transfer vectors are precomputed at the preprocessing stage and the projection of the environment map lighting into the SH basis is inexpensive and can be performed for each frame. Soft shadows, interreflections, and even more advanced shading effects such as subsurface scattering and translucency can be rendered efficiently using those techniques. It is easy to extend PRT techniques to dynamic scenes with prior knowledge of the animation by precomputing the radiance transfer data for each frame (or keyframe), which involves huge storage costs. However, efficient handling of fully dynamic scenes in interactive scenarios is still an open research issue.

Environment Maps Importance Sampling

For efficiency reasons many interactive applications rely on rendering based on graphics hardware. The best performance of such rendering, in terms of the direct lighting and shadow computation, can be achieved for directional light sources. It is thus desirable that captured environment maps be decomposed into a set of representative directional light samples. A number of techniques based on the concept of importance sampling have recently been proposed to perform such a decomposition for static lighting [38–40]. Unfortunately, their direct extension to VEM is usually not feasible due to at least one of the following major problems:
– Computational costs that are too high, precluding VEM capture and scene relighting at interactive speeds [38, 39]
– Lack of temporal coherence in the positioning of the selected point light sources under even moderate, often local lighting changes in the VEM [38, 40]
– Lack of flexibility in adapting the number of light sources to the rendered frame complexity, as might be required to maintain a constant frame rate [40]
The latter two problems, if not handled properly, can lead to annoying popping artifacts. Havran et al. [41] have proposed an efficient importance sampling algorithm that leads to temporally coherent sets of light sources of progressively adjustable density. In their algorithm they treat the pixel luminance values in the EM as a discrete 2D probability density function (PDF) and draw samples (light source directions) from this distribution following procedures established in the Monte Carlo literature. For this purpose they select an inverse transform method [42], which exhibits continuity and uniformity properties that are desirable for VEM applications. The method guarantees the bicontinuity property for any nonnegative PDF, which means that a small change in the input sample position over the unit square is always transformed into a small change in the resulting position of the light source over the EM hemisphere. The uniformity property is important for achieving a good stratification of the resulting light source directions. To reduce temporal flickering in the resulting frames, Havran et al. choose the same set of initial samples over the unit square for each VEM frame, which are then processed using their inverse transform method. Since their emphasis is on interactive applications, they use a progressive sequence of samples in which adding new samples does not affect the position of samples already used for shading computations, while good sample stratification properties are always preserved. This is important for an adaptive selection of the number of light sources to keep a constant frame rate and for progressive image quality refinement, where the number of light sources can be gradually increased (this requires freezing the VEM frame). Havran et al. achieve the progressiveness of the sampling sequence using quasi-Monte Carlo sequences such as the 2D Halton sequence. They perform Lloyd’s relaxation over the initial sample positions in the unit square in a preprocessing step, which yields a blue noise sample pattern on the hemisphere. Since even local changes in the EM lead to global changes of the PDF, the directions of virtually all light sources may change from frame to frame, which causes unpleasant flickering in the rendered images. Havran et al. therefore apply perception-inspired, low-pass FIR filtering to the trajectory of each light's motion over the hemisphere and also to the power of the light sources.

13.4.2 A CAVE System for Interactive Global Illumination Modeling in Car Interiors

Global illumination dramatically improves the realistic appearance of rendered scenes, but is usually neglected in virtual reality (VR) systems due to its high cost [43]. In this section we present a VR system for realistic rendering of car interiors illuminated by VEMs that have been captured for various driving conditions and are visible through the car windows (refer to Fig. 13.5). The main goal of the system is to investigate the impact of quickly changing lighting conditions on the visibility of information displayed on LCD panels (commonly mounted on the dashboard of modern cars).
Fig. 13.5. Interactive rendering of car interior for two different lighting conditions
This requires a global illumination solution that responds interactively to lighting changes, which result from different car orientations with respect to the distant lighting stored as a dynamically changing VEM. This application scenario is similar to the simulation of free driving in an environment in which buildings, trees, and other occluders change the amount of lighting penetrating the car interior. In such a scenario the response to lighting changes should be immediate for an arbitrary observer (virtual camera) position, but it can safely be assumed that the geometry of the car interior is static, which greatly simplifies the choice of global illumination solution. To improve the immersion experience, a CAVE system is used for displaying the car interior. A head tracking system monitors the current observer position in order to properly model light reflection in the LCD panel.

The choice of global illumination solution in this application is strongly constrained by the hardware configuration of the CAVE system. Each of the five walls in the test system is powered by two consumer-level dual-processor PCs needed for the stereo projection effect (i.e., 5 × 2 frames must be rendered simultaneously). Each PC is equipped with a high-end graphics card. The design goal is to select GI algorithms that fully exploit the computational power of the available CPUs and GPUs and can produce images of high (full screen) resolution at interactive rates. The choice of global illumination algorithm is also influenced by the major characteristics of the car interior: static complex geometry and a dominating Lambertian lighting interreflection component. Taking all of the discussed system constraints into account, PRT techniques (refer to “Precomputed Radiance Transfer”) are chosen: they require very costly preprocessing, but then enable real-time rendering that is particularly efficient for environments with predominantly diffuse reflectance properties.

The accuracy of light reflection modeling in the LCD panel is the most critical requirement in this VR system. Since the LCD panel reflectance also involves a glossy component, the low-frequency lighting assumption imposed by view-independent PRT techniques might not hold so well.
Fig. 13.6. The LCD panel appearance as a result of the global illumination computation for VEM lighting (Left: full global illumination, Center: display emitted light only, Right: reflected light)
To improve the spatial resolution of lighting details for a given camera view, final gathering is used, in which the results obtained using the PRT techniques are stochastically integrated for selected sample points. To reduce the variance of this integration, the reflectance function of the LCD panel is used as an important criterion for choosing sample directions. Figure 13.6 shows the appearance of an LCD panel under global illumination for dynamic VEM lighting, as displayed on an HDR monitor. Since the dynamic range of the HDR monitor is significantly higher than that of a typical LCD panel mounted in a car cockpit, the visibility of the information displayed for the driver can be tested under many external lighting conditions. Through calibration of the HDR display, real-world luminance values can be reproduced for the LCD panel, taking into account both the panel emissivity and the reflected lighting resulting from the global illumination simulation. Due to the separate treatment of the car interior and the LCD panel, a hybrid global illumination approach is implemented in which the PRT lighting computation and rendering are performed on the GPUs for all five CAVE walls in stereo. The final gathering computation also uses the results of the PRT lighting, but it is performed in parallel on all idle CPUs in the system. This way, final gathering, which is reputed to be a computationally heavy off-line technique, can be performed with interactive performance in the VR system.

13.4.3 Interactive Lighting in Mixed Reality Applications

Havran et al. [41] have presented a complete pipeline from HDR VEM acquisition to rendering at interactive speeds. The main technical problem in this work is the efficient processing of the acquired HDR video, which leads to a good quality rendering of dynamic environments using graphics hardware. For the HDR VEM acquisition, Havran et al. use a photometrically calibrated HDRC VGAx (IMS CHIPS) camera with a fish-eye lens. They decompose each captured VEM frame into a set of directional light sources, which are well suited for the shadow computation and shading on graphics hardware (refer to “Environment Maps Importance Sampling”). After temporal filtering, the light sources are used in the pixel shader of a GPU. The main computational bottleneck is the shadow map computation for hundreds of light sources.
Fig. 13.7. Left: the model with strong specular reflectance properties composed of 16,200 triangles rendered with 72 shadow maps at 5.3 Hz. At the top left is an environment map captured in real time using an HDR camera with a fish-eye lens; the light sources are marked by green points. At the bottom left the same environment map is shown in polar projection. Right: the same model illuminated by outdoor lighting
The main computational bottleneck is the shadow map computation for hundreds of light sources. They propose an algorithm that, for the given extent of normal vector directions of a shaded pixel, efficiently eliminates most of the invisible light sources. Optionally, they cluster lights with similar directions corresponding to regions of concentrated energy in the VEM, e.g., around the sun position, to reduce the shadow computation cost even further. The system presented by Havran et al. lifts common limitations of existing rendering techniques, which cannot handle, at interactive speeds, HDR image-based lighting captured in dynamic real-world conditions along with complex shadows, fully dynamic geometry, and arbitrary reflectance models evaluated on a GPU (refer to Fig. 13.7 in the color plate). The system runs on a PC with a 3 GHz Pentium 4 processor and an NVidia GeForce 6800GT graphics card. A performance of 22 fps is achieved for the VEM capture and the generation of directional lights with filtering. The asynchronous rendering performance drops to 7 fps when 72 light sources and shadows at an image resolution of 320 × 240 pixels are considered.

The system developed by Havran et al. has many potential applications in mixed reality and virtual studio systems, in which real and synthetic objects are illuminated by consistent lighting at interactive frame rates (refer to Fig. 13.8 in the color plate). The synthetic entities are expected to blend seamlessly with the real-world surroundings, objects, and persons. So far, dynamic lighting in such systems has been performed by scripting rigidly set matrices of light sources whose changes are synchronized with the virtual set rendering. Such systems are expensive and require significant human assistance, which reduces their applicability for smaller broadcasters. The presented system can support arbitrary changes of lighting in a virtual studio in a fully automatic way.
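The following sketch illustrates this style of light extraction in general terms; it is not the specific algorithm of Havran et al. [41]. It importance-samples a latitude–longitude HDR environment map by its solid-angle-weighted luminance and turns every sample into one directional light. The map layout, the Rec. 709 luminance weights and the Monte Carlo normalization are assumptions of the sketch, not details taken from the paper.

```python
import numpy as np

def envmap_to_directional_lights(env, n_lights=72, rng=None):
    """Decompose a latitude-longitude HDR environment map (H x W x 3, linear
    RGB) into n_lights directional lights by luminance importance sampling.
    Returns a list of (unit direction, rgb intensity) pairs."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = env.shape
    lum = 0.2126 * env[..., 0] + 0.7152 * env[..., 1] + 0.0722 * env[..., 2]
    theta = (np.arange(h) + 0.5) / h * np.pi           # polar angle per row
    weight = lum * np.sin(theta)[:, None]              # solid-angle weighting
    pdf = weight / weight.sum()                        # texel probabilities
    picks = rng.choice(h * w, size=n_lights, p=pdf.ravel())
    ys, xs = np.unravel_index(picks, (h, w))
    texel_omega = (np.pi / h) * (2.0 * np.pi / w) * np.sin(theta[ys])
    lights = []
    for y, x, omega in zip(ys, xs, texel_omega):
        th = (y + 0.5) / h * np.pi
        ph = (x + 0.5) / w * 2.0 * np.pi
        direction = np.array([np.sin(th) * np.cos(ph),
                              np.cos(th),               # +Y is 'up' here
                              np.sin(th) * np.sin(ph)])
        # Monte Carlo weight so that the lights together carry roughly the
        # energy of the whole map: radiance * solid angle / (n * probability).
        intensity = env[y, x] * omega / (n_lights * pdf[y, x])
        lights.append((direction, intensity))
    return lights
```

Each extracted light can then drive one shadow-map pass and one shading term in the pixel shader, and filtering the light sets over consecutive frames, as described above, keeps the result temporally stable.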
13.5 Requirements for HDR Camera Systems

The earlier sections showed some applications of HDR camera systems in computer graphics.
Fig. 13.8. Comparison of the fidelity in the shadow and lighting reconstruction for the real-world and synthetic angel statuette illuminated by dynamic lighting. Real-world lighting is captured by the HDR video camera located in the front of the round table with an angel statuette placed atop (the right image side). The captured lighting is used to illuminate the synthetic model of the angel statuette shown in the display (the left image side)
In the following we summarize some of the requirements for HDR camera systems that are used in such applications. Their importance naturally depends on the application at hand, and such a list can therefore never be complete.

High dynamic range. As shown previously, the limited dynamic range of many camera systems is a key problem in computer graphics. Having sufficient dynamic range available (a dynamic range of 10^6 and above seems to be sufficient for most current applications) is therefore mandatory.

High sampling depth. Quantization artifacts can cause problems in numerical computations or lead to visible banding. The individual pixel values should be digitized with the highest possible sampling depth. Note that the meaningful sampling depth will also be limited by factors such as noise or optical blurring due to the point spread function of the imaging system.

High image resolution. Spatial detail in captured images is often more important than overall structure (e.g., for captured textures). Graphics applications therefore often require higher image resolutions than applications from other fields.

Good image quality. The quality of the input images plays a huge role for the achievable results – especially if the results are not interpreted but displayed directly (e.g., in the form of a texture). Standard requirements such as low noise or good color calibration are therefore also important for HDR images.
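As a rough back-of-the-envelope check of the first two requirements, the snippet below estimates how many code values a purely logarithmic encoding needs so that no quantization step exceeds a chosen relative size. The 1% and 0.5% step targets are illustrative assumptions, not values taken from this chapter.

```python
import math

def codes_for_log_encoding(dynamic_range, max_relative_step=0.01):
    """Code values (and bits) needed to cover dynamic_range:1 with purely
    logarithmic steps no larger than max_relative_step per code value."""
    steps = math.log(dynamic_range) / math.log(1.0 + max_relative_step)
    return math.ceil(steps), math.ceil(math.log2(steps))

# A 10^6:1 range held to 1% steps needs about 1,389 code values (11 bits);
# tightening the step to 0.5% pushes this to about 2,771 values (12 bits).
print(codes_for_log_encoding(1e6, 0.01))
print(codes_for_log_encoding(1e6, 0.005))
```

Linear encodings are far less economical, which is one reason logarithmic and floating-point responses can cover such ranges with comparatively few bits.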
If these requirements are met, there is a large potential for the use of HDR camera systems in computer graphics. Our world contains inherently HDR information and most digitization systems would benefit from capturing the full scene information. Most rendering systems can already perform computations in HDR with floating point precision so that HDR capture can be seamlessly integrated in many existing solutions.
References

1. R. Bogart, F. Kainz, and D. Hess. OpenEXR image file format. In ACM SIGGRAPH 2003, Sketches and Applications, 2003
2. Charles W. Wyckoff and Stan A. Feigenbaum. An experimental extended exposure response film. SPIE, 1:117–125, 1963
3. Brian C. Madden. Extended Intensity Range Imaging. Technical report, University of Pennsylvania, GRASP Laboratory, 1993
4. Steve Mann and Rosalind W. Picard. On being 'undigital' with digital cameras: Extending dynamic range by combining differently exposed pictures. In IS&T's 48th Annual Conference, pp. 422–428, 1995
5. Paul E. Debevec and Jitendra Malik. Recovering high dynamic range radiance maps from photographs. In Proceedings of SIGGRAPH 97, Computer Graphics Proceedings, Annual Conference Series, pp. 369–378, 1997
6. T. Mitsunaga and S.K. Nayar. Radiometric self calibration. In Proceedings CVPR-99, pp. 374–380. IEEE, New York, 1999
7. Mark A. Robertson, Sean Borman, and Robert L. Stevenson. Estimation-theoretic approach to dynamic range enhancement using multiple exposures. Journal of Electronic Imaging, 12(2):219–228, April 2003
8. Grzegorz Krawczyk, Michael Goesele, and Hans-Peter Seidel. Photometric calibration of high dynamic range cameras. Research Report MPI-I-2005-4-005, Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany, April 2005
9. Hendrik P.A. Lensch. Efficient, Image-Based Appearance Acquisition of Real-World Objects. PhD Thesis, Universität des Saarlandes, 2003
10. Hendrik P.A. Lensch, Michael Goesele, Yung-Yu Chuang, Tim Hawkins, Steve Marschner, Wojciech Matusik, and Gero Müller. Realistic materials in computer graphics. In SIGGRAPH 2005 Course Notes, 2005
11. Fred E. Nicodemus, Joseph C. Richmond, Jack J. Hsia, I.W. Ginsberg, and T. Limperis. Geometrical Considerations and Nomenclature for Reflectance. National Bureau of Standards, 1977
12. Paul Debevec, Tim Hawkins, Chris Tchou, Haarm-Pieter Duiker, Westley Sarokin, and Mark Sagar. Acquiring the reflectance field of a human face. Proceedings of SIGGRAPH 2000, pp. 145–156, July 2000
13. Robert L. Cook and Kenneth E. Torrance. A reflectance model for computer graphics. In Computer Graphics (Proceedings of SIGGRAPH 81), pp. 307–316, August 1981
14. X. He, K. Torrance, F. Sillion, and D. Greenberg. A comprehensive physical model for light reflection. In Proceedings of SIGGRAPH 1991, pp. 175–186, July 1991
15. G. Ward. Measuring and modeling anisotropic reflection. In Proceedings of SIGGRAPH 1992, pp. 265–272, July 1992
16. E. Lafortune, S. Foo, K. Torrance, and D. Greenberg. Non-linear approximation of reflectance functions. In Proceedings of SIGGRAPH 1997, pp. 117–126, August 1997
17. S. Marschner, S. Westin, E. Lafortune, K. Torrance, and D. Greenberg. Image-based BRDF measurement including human skin. In Proceedings of 10th Eurographics Workshop on Rendering, pp. 131–144, June 1999
18. H.P.A. Lensch, J. Kautz, M. Goesele, W. Heidrich, and H.-P. Seidel. Image-based reconstruction of spatial appearance and geometric detail. ACM Transactions on Graphics, 22(2):234–257, 2003
19. Henrik Wann Jensen, Stephen R. Marschner, Marc Levoy, and Pat Hanrahan. A practical model for subsurface light transport. In SIGGRAPH 2001, pp. 511–518, 2001
20. Yanyun Chen, Xin Tong, Jiaping Wang, Stephen Lin, Baining Guo, and Heung-Yeung Shum. Shell texture functions. ACM Transactions on Graphics (SIGGRAPH 2004), 23(3):343–353, 2004
21. Michael Goesele, Hendrik P.A. Lensch, Jochen Lang, Christian Fuchs, and Hans-Peter Seidel. DISCO – Acquisition of translucent objects. ACM Transactions on Graphics (SIGGRAPH 2004), 23(3):835–844, 2004
22. Hendrik P.A. Lensch, Michael Goesele, Philippe Bekaert, Jan Kautz, Marcus A. Magnor, Jochen Lang, and Hans-Peter Seidel. Interactive rendering of translucent objects. Computer Graphics Forum, 22(2):195–206, 2003
23. R.W. Fleming, R.O. Dror, and E.H. Adelson. Real-world illumination and the perception of surface reflectance properties. Journal of Vision, 3(5):347–368, 2003
24. E.H. Land and J.J. McCann. Lightness and the retinex theory. Journal of the Optical Society of America, 61(1):1–11, 1971
25. B.K.P. Horn. Determining lightness from an image. Computer Graphics and Image Processing, 3(1):277–299, 1974
26. James A. Ferwerda, Stephen H. Westin, Randall C. Smith, and Richard Pawlicki. Effects of rendering on shape perception in automobile design. In ACM Siggraph Symposium on Applied Perception in Graphics and Visualization 2004, pp. 107–114, 2004
27. Paul Debevec. Rendering synthetic objects into real scenes: Bridging traditional and image-based graphics with global illumination and high dynamic range photography. In Proceedings of SIGGRAPH 98, Computer Graphics Proceedings, Annual Conference Series, pp. 189–198, 1998
28. Jessi Stumpfel, Chris Tchou, Andrew Jones, Tim Hawkins, Andreas Wenger, and Paul E. Debevec. Direct HDR capture of the Sun and Sky. In Afrigraph, pp. 145–149, 2004
29. Sing Bing Kang, Matthew Uyttendaele, Simon Winder, and Richard Szeliski. High dynamic range video. ACM Transactions on Graphics, 22(3):319–325, 2003
30. Ned Greene. Environment mapping and other applications of world projections. IEEE Computer Graphics and Applications, 6(11):21–29, 1986
31. Wolfgang Heidrich and Hans-Peter Seidel. Realistic, hardware-accelerated shading and lighting. In Proceedings of SIGGRAPH 99, Computer Graphics Proceedings, Annual Conference Series, pp. 171–178, 1999
32. Jan Kautz and Michael D. McCool. Approximation of glossy reflection with prefiltered environment maps. In Graphics Interface, pp. 119–126, 2000
33. Ulf Assarsson and Tomas Akenine-Möller. A geometry-based soft shadow volume algorithm using graphics hardware. ACM Transactions on Graphics, 22(3):511–520, 2003
34. Ravi Ramamoorthi and Pat Hanrahan. An efficient representation for irradiance environment maps. In Proceedings of ACM SIGGRAPH 2001, Computer Graphics Proceedings, Annual Conference Series, pp. 497–500, August 2001
35. Peter-Pike Sloan, Jan Kautz, and John Snyder. Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments. ACM Transactions on Graphics, 21(3):527–536, 2002
36. Peter-Pike Sloan, Jesse Hall, John Hart, and John Snyder. Clustered principal components for precomputed radiance transfer. ACM Transactions on Graphics, 22(3):382–391, July 2003
37. Ren Ng, Ravi Ramamoorthi, and Pat Hanrahan. Triple product wavelet integrals for all-frequency relighting. ACM Transactions on Graphics, 23(3):477–487, 2004
38. Sameer Agarwal, Ravi Ramamoorthi, Serge Belongie, and Henrik Wann Jensen. Structured importance sampling of environment maps. ACM Transactions on Graphics, 22(3):605–612, 2003
39. Thomas Kollig and Alexander Keller. Efficient illumination by high dynamic range images. In Eurographics Symposium on Rendering: 14th Eurographics Workshop on Rendering, pp. 45–51, 2003
40. Victor Ostromoukhov, Charles Donohue, and Pierre-Marc Jodoin. Fast hierarchical importance sampling with blue noise properties. ACM Transactions on Graphics, 23(3):488–495, 2004
41. Vlastimil Havran, Miloslaw Smyk, Grzegorz Krawczyk, Karol Myszkowski, and Hans-Peter Seidel. Interactive system for dynamic scene lighting using captured video environment maps. In 16th Eurographics Symposium on Rendering, pp. 31–42, 2005
42. Vlastimil Havran, Kirill Dmitriev, and Hans-Peter Seidel. Goniometric Diagram Mapping for Hemisphere. Short Presentations (Eurographics 2003), 2003
43. Kirill Dmitriev, Thomas Annen, Grzegorz Krawczyk, Karol Myszkowski, and Hans-Peter Seidel. A CAVE system for interactive modeling of global illumination in car interior. In Rynson Lau and George Baciu, editors, ACM Symposium on Virtual Reality Software and Technology (VRST 2004), pp. 137–145, Hong Kong, 2004
14 High-Dynamic Range Displays
Helge Seetzen
We see more than we can process. This fact of our visual system applies equally to the HDR imaging pipeline. Digital cameras can capture more than displays can show. Likewise, most rendering techniques produce results that overwhelm the capabilities of conventional displays. Overall, there are few output devices today that can rival the dynamic range of comparable input devices. This chapter describes techniques to overcome this limitation as well as an implementation example in the form of the BrightSide Technologies DR37 HD HDR Display (Fig. 14.1).
14.1 HDR Display Requirements

Any effective HDR Display technology needs not only to deliver high dynamic range output but also to meet key industry and infrastructure requirements. Specifically, the HDR Display technology should be:
– Able to present realistic luminance distributions
– Compatible with existing imaging infrastructure
– Reasonably cost effective
The latter two requirements are the easiest to define. Any improved display technology needs to function within today's imaging pipeline. Ideally it should also provide some benefit over existing solutions even with conventional input. An example of a novel approach that did not meet this requirement is high definition (HD) television. These displays, while offering compelling benefits with the right content, deliver little improvement when showing the much more common conventional (standard definition) footage. As a result the introduction of HD displays has been very slow, and only the stamina of very large consumer electronics companies allows HD Displays to compete in the market today.
The cost effectiveness requirement is also quite self-explanatory. While higher performance displays can command some price premium, they need to be in the same league as conventional solutions to leave the niche application space.
The first requirement is intuitively the most obvious. Clearly, HDR Displays should be able to present images with realistic luminance distributions. A matching technical description of this objective requires a good definition of the human visual system (HVS) and a translation thereof into display performance. The latter is particularly challenging because of the variable terminology used in the display industry.

Natural scenes offer a vast luminance range of over 14 orders of magnitude from starlight to sunlight. Our visual system has evolved to operate in this environment by using adaptation mechanisms. For the purpose of HDR Displays the ambient environment is usually a darkened room (e.g. a home theatre) or an office setting. Comfortable peak luminance levels for such an environment are 3,000–6,000 cd m−2 depending on the size of the feature [1]. A good HDR Display should reach this level to maximize the use of the viewer's capabilities.

With the peak luminance of the display set in this fashion, the next step is to define the black luminance and contrast of the display. The use of these terms varies greatly between psychophysics and the display industry. Among display vendors the contrast specification is often provided with qualifiers such as "on/off contrast" (the ratio between a completely black and a completely white screen), "frame sequential contrast" (the contrast achievable over multiple frames) or "ANSI contrast" (the ratio between the black and white portions of a checkerboard pattern). Each metric yields different numbers. Most display technologies have fairly low ANSI contrast due to light leakage from bright into dark regions of the checkerboard. On/off contrast is usually the highest value for devices that can increase or reduce the light output if the entire screen is black (though it is arguably also the least relevant metric for actual video content, since the presence of some white pixels in most normal video frames prevents the display from ever reaching this exaggerated black level). Rather than agonizing over the correct choice, it is simpler to conclude that there is nothing wrong with having a black level near or at zero. This sets the ideal luminance range of HDR Displays at 3,000–6,000 cd m−2 down to 0 cd m−2 for applications where visually pleasant images are desired. Obviously these numbers will change if accurate luminance reproduction of very high brightness features is desired in applications such as scientific simulation.

The final aspect of the technical requirement is the amplitude resolution of the display. Like contrast, this is a difficult characteristic that depends highly on the HVS. The HVS has a non-linear response, and as a result most displays feature at least some form of gamma response. Fortunately, psychophysics provides a good description of the contrast sensitivity threshold as a function of luminance. The amplitude resolution and response curve of a good HDR Display should then be such that the digital step size at any luminance level is lower than the perceivable threshold at that luminance.
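The sketch below illustrates this criterion under deliberately simple assumptions: a gamma-encoded display response and a crude Weber-fraction threshold standing in for measured contrast sensitivity data. Peak luminance, black level, bit depth, gamma and the threshold constants are all illustrative values, not specifications from this chapter.

```python
import numpy as np

def display_levels(peak=4000.0, black=0.005, bits=12, gamma=2.2):
    """Luminance produced by each code value of a gamma-encoded display."""
    codes = np.arange(2 ** bits) / (2 ** bits - 1)
    return black + (peak - black) * codes ** gamma

def threshold(lum, weber=0.01, dark_limit=0.01):
    """Crude stand-in for a threshold-versus-intensity curve: roughly 1%
    of the luminance, with a fixed absolute floor that dominates near black."""
    return np.maximum(weber * lum, dark_limit)

lum = display_levels()
step = np.diff(lum)                       # size of every digital step
visible = step >= threshold(lum[:-1])     # steps larger than the threshold
print(f"{visible.sum()} of {step.size} steps would be visible as banding")
```

With these particular constants a plain gamma-2.2 response still leaves a band of above-threshold steps in the darker mid-tones even at 12 bits, and an 8-bit drive fails over most of its range; how many steps survive depends strongly on the threshold model assumed near black, which is precisely why the response curve and the amplitude resolution have to be chosen together.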
14.2 HDR Display Design

There are in principle two avenues to designing HDR Displays: either each pixel of the display is modulated accurately over a very wide range, or the output of a pixel is the serial combination of two or more modulators. In the latter case the final dynamic range of the display is given by the product of the ranges of the individual sub-modulators.

Technologies in the first category include laser projection and organic light emitting diodes (OLED). These devices are at least in principle capable of modulating each pixel from zero to some high luminance value. In reality their capabilities are more restricted. In both cases the largest restriction is the limited peak luminance of the devices. Laser projection requires high power laser diodes, which are currently prohibitively expensive, especially for green and blue lasers. The materials for OLED devices are available but suffer from low efficiencies and significant degradation over time. In both cases these limitations have prevented the technology from reaching the luminance level of today's displays, much less the level required for HDR imaging. Another serious limitation for direct approaches is the availability of high amplitude resolution drive circuitry. A wide range of luminance is only useful if a reasonable number of luminance levels are distinctly addressable within this range. Useful HDR Displays require an amplitude resolution or bit depth ranging from 12 to 16 bits depending on the application. High bit depth drivers of this kind are currently expensive and challenging to implement. Nevertheless, both approaches show promise as materials and drivers continue to improve. Proponents of these concepts estimate 10–15 years for laser projection to become available for consumer applications and 3–5 years for OLED displays to reach the luminance level of today's conventional displays. An improvement beyond today's level will likely take significantly more time and might not be possible at all given the physical limit on OLED material efficiency.

The second approach offers a path to HDR image quality with existing devices. Combining two LDR devices removes the need for high bit depth drivers, though high bit depth image processing logic is still required. The following describes implementations of dual-modulation displays and projectors, the associated image processing algorithms, and the performance that can be obtained in this fashion.

Dual-modulation HDR Displays place multiple lower bit depth modulators in series; they can be built by combining a conventional transmissive display with a spatially modulated light source. Transmissive displays such as liquid crystal displays (LCD) are passive devices that allow a portion of the incident light to pass through each pixel. The transmission can be modulated over a reasonable range between 200:1 and 500:1. Standard LCD are combined with a passive light source such as cold cathode fluorescent tubes (CCFL), which provides a uniform and constant light field at the back of the LCD. A spatially modulated backlight can greatly increase the dynamic range of such a display.
Fig. 14.1. BrightSide Technologies DR 37 HD HDR Display
Fig. 14.2. HDR display image processing example. From top left: the LED backlight shows a low resolution approximation of the luminance distribution of the image; the actual light distribution from the LED backlight is blurred by the optical package; the LCD image compensates for the low resolution of the LED backlight; the final image is the product of the LED light distribution and the LCD image
Such a backlight could be a secondary LCD (or other type of transmissive modulator) in front of a high brightness light source, or an array of individually controlled light sources such as light emitting diodes (LED). High quality LCD have a very low transmission efficiency even in the white state (approximately 2–5%), and combining two transmissive modulators is therefore very inefficient. The LED approach is by far the most energy efficient because the efficiency loss of the second LCD is avoided. Moreover, in the LED approach light is only created in the bright regions of the image. The average power consumption of LED based HDR Displays is therefore lower than that of conventional LCD, where the backlight is constantly on. A normal TV signal has an average intensity level of approximately 22%, so that the LED approach is on average 3–5 times more power efficient than a conventional LCD display of the same brightness. The efficiency gain compared to the dual-modulation LCD approach is even more significant.

An optimized HDR Display design consequently has three major components: a backlight with a low resolution array of LED and appropriate optical structures, a high resolution LCD panel, and an image processing algorithm to drive both modulators (Fig. 14.2). Each component uses standard electronic and optical elements, so that implementation and design cost are low.

14.2.1 LED Backlight

The LED backlight's primary function is to provide a smooth light distribution. This can be achieved with a point spread function (PSF) of the light emitted by a single LED that overlaps significantly with the PSF of neighboring LED. This blurs the light field created by the LED backlight sufficiently to remove any high spatial frequency information which could interfere with the high resolution pixel structure of the LCD. It also ensures that the backlight produces an approximately uniform light field if many LED are driven at the same level. The HDR Display can therefore operate like a conventional LCD display with a constant (and uniform) backlight if HDR image quality is not desired for a particular application.

Because the LED light distribution has radial symmetry, it is desirable to arrange the LED in a hexagonal rather than rectangular grid. This arrangement will not interfere with the rectangular structure of the LCD pixel layout because of the vast resolution difference and the blur of the LED light field. A hexagonal LED spacing of 10–30 mm is acceptable for monitor or TV applications provided the PSF is arranged such that it overlaps substantially with that of direct neighbors and falls off towards zero when it reaches secondary neighbors. A reflective cavity between the LED and the LCD will create such an optimized PSF. Conventional LCD use optical films such as brightness enhancement film [2], reflective polarizer film [2] and various diffuser sheets to shape the light distribution leaving the display.
Fig. 14.3. Optical package of the HDR Display including LCD (1), micro-structured films such as BEF or DBEF (2), diffuser (3), cavity (4), reflective film (5), LED (6) and circuit board with appropriate thermal management solution (7)
These films have different functions but share the characteristic that they reflect some portion of the incoming light back towards the backlight. In the HDR Display a significant portion of the light coming from the LED will therefore be reflected by the film layers back towards the LED. A reflective cavity can be created by placing a layer of reflective material onto the LED array. This causes the light from the LED to bounce between the LCD (and film layers) and the back reflector. Some light leaves the LCD with each bounce, but the reflected light also moves laterally, so that the final PSF is very similar to a Gaussian or a combination of Gaussian distributions. Fig. 14.3 shows a schematic diagram of this optical package.

The LED backlight also needs to be controllable over the entire range of each LED with some reasonable amplitude resolution. Because high spatial frequency and color modulation occur in the LCD, the amplitude resolution of the LED control does not have to be very high. An 8-bit control component with 255 distinct steps is more than sufficient. Lower resolution control is also possible, but usually it is cheapest to use mass produced 8-bit devices. Some of the excess amplitude can be used, for example, for calibration of the LED over lifetime or temperature.

14.2.2 LCD Panel

The LCD panel used for the HDR Display can be a standard panel. The LCD is responsible for high spatial frequency information, local contrast and color. Therefore the benefits of a better LCD will usually directly improve the HDR Display as well.

14.2.3 Image Processing Algorithm

Image processing for the HDR Display would be easy if the LED were small enough to match the LCD pixels in size. This is not the case, and the image processing algorithm therefore needs to compensate for the low resolution of the LED backlight by adjusting the LCD image.
Fig. 14.4. HDR Display image processing algorithm
Fig. 14.5. High contrast feature showing undesirable LED halo masked by veiling luminance halo (luminance of the final image, veiling luminance, input image, LED light field, and LCD transmission plotted over position on screen in cm)
Fig. 14.6. Schematic HDR Display image processing steps: Input image, LED values, LCD image with correction, final output
This is possible because the light field created by the LED backlight is effectively multiplied by the transmission pattern on the LCD. The algorithm shown in Fig. 14.4 separates the incoming image (1) into LED (3) and LCD (6) information such that the optical multiplication of the two modulators returns the luminance levels defined by the input.
The first step in this process is to reduce the resolution of the incoming image to approximately that of the LED array. Since the LED array usually has a hexagonal layout, this will involve some asymmetric down-sampling. Different methods can be used for this process depending on the capabilities of the processor. Once the image has been reduced to the LED layout, the algorithm decides what portion of the total image will be presented by the LED backlight. The final image is the product of the LED and LCD images, so an approximately equal distribution can be achieved by taking the square root (2) of the reduced resolution image (2a). Varying the power of the root will change this relationship. Likewise, the non-linear response of the LCD and other factors such as thermal management characteristics will influence this decision. The down-sampling and distribution steps are only loosely constrained, because many choices for the LED values will yield acceptable results: the LCD image will compensate for most of the variation in the LED backlight. For the same reason, common image enhancement techniques such as sharpening filters, as well as techniques designed to optimize the energy efficiency of the display, can be applied generously to the LED image.

The next step is to calculate the anticipated light field (4) generated by the LED by summing up the contributions of each LED to the total light field. The PSF of each LED is adjusted by the drive value of the LED, the response function of the LED, and any other necessary factor such as thermal or lifetime calibration. It is important that the PSF used in this process accurately reflects the effective light distribution of one LED, including any interaction with the optics and films used in the package. The result of this summation is a simulation of the light field of the LED backlight for this input image.

Finally, the LCD image can be generated by dividing the input image by the light field simulation (5). If the LED backlight uses LED with different spectral components, then these need to be simulated as well and the division can occur at the color channel level. For a white LED backlight the normally RGB input image will be divided by a monochrome light field. In both cases the result of the division is adjusted by the inverse response function of the LCD.

At this point the image processing algorithm is finished and the HDR Display can show a complete frame. The LED will be driven to the appropriate intensity and the LED backlight will create a luminance distribution that is similar to the light field simulation. The LCD will modulate this luminance distribution such that the final output matches the input image (Fig. 14.6). In principle the algorithm can accurately reproduce any input image within the range of the product of the dynamic ranges of the LED and LCD elements. In actual devices there are limits to this approach because of the discrete step sizes of the LCD and LED, rounding errors in the simulation, and a host of other factors. Specifically, if both the LED and the LCD are 8-bit devices with 255 discrete steps each, then the total modulated range is 0–65,025, but several levels inside that range are not achievable.
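A minimal sketch of the splitting procedure just described is given below. It assumes a rectangular LED grid, a Gaussian point spread function, a monochrome backlight and a simple gamma response for the LCD; block size, sigma and gamma are illustrative values, not properties of the actual device.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def split_hdr_frame(img, led_block=16, psf_sigma=1.2, lcd_gamma=2.2,
                    led_bits=8, lcd_bits=8):
    """Split a linear-luminance frame (H x W, 0..1 relative to the display's
    peak; H and W divisible by led_block) into LED drive levels and an LCD
    image. Simplifications: rectangular LED grid, Gaussian PSF, monochrome
    backlight, gamma-type LCD response."""
    h, w = img.shape
    # (1) -> (2a): reduce the image to the LED grid resolution (block mean).
    lowres = img.reshape(h // led_block, led_block,
                         w // led_block, led_block).mean(axis=(1, 3))
    # (2): the square root splits the luminance roughly equally between the
    # two modulators, since the final picture is their optical product.
    led_drive = np.round(np.sqrt(lowres) * (2 ** led_bits - 1))
    led = led_drive / (2 ** led_bits - 1)            # quantized LED values
    # (4): simulate the light field by blurring the LED pattern with the
    # (assumed Gaussian) PSF and upsampling to LCD resolution.
    field = gaussian_filter(led, sigma=psf_sigma, mode='nearest')
    field = np.maximum(zoom(field, led_block, order=1), 1e-4)
    # (5): the LCD compensates for the coarse backlight.
    lcd_linear = np.clip(img / field, 0.0, 1.0)
    # (6): apply the inverse LCD response and quantize.
    lcd_code = np.round(lcd_linear ** (1.0 / lcd_gamma) * (2 ** lcd_bits - 1))
    return led_drive.astype(np.uint8), lcd_code.astype(np.uint8), field
```

The picture actually shown is approximately field multiplied by (lcd_code/255) raised to lcd_gamma, i.e. the optical multiplication the algorithm relies on. A production implementation would replace the Gaussian with the measured per-LED PSF, include the thermal and lifetime calibration mentioned above, and handle color.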
Fig. 14.7. Relative step size of dual-8-bit HDR display
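In the spirit of Fig. 14.7, the sketch below enumerates the output levels that a dual-8-bit combination (linear LED, gamma-2.2 LCD) can produce and reports the largest relative gap outside the near-black region; the 1%-of-peak cutoff used for "near black" is an arbitrary choice of this sketch.

```python
import numpy as np

led = np.arange(256) / 255.0                       # linear 8-bit LED
lcd = (np.arange(256) / 255.0) ** 2.2              # 8-bit LCD, gamma 2.2
levels = np.unique(np.round(np.outer(led, lcd), 12))
levels = levels[levels > 0]                        # zero is trivially reachable
rel_gap = np.diff(levels) / levels[1:]             # relative size of each gap
bright = levels[1:] > 0.01 * levels[-1]            # ignore the near-black region
print(f"{levels.size} distinct non-zero output levels")
print(f"largest relative gap outside the near-black region: "
      f"{rel_gap[bright].max():.3%}")
```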
For example, levels between 64,770 (= 255 × 254) and 65,025 (= 255 × 255) are unreachable. As a result there are luminance levels that the HDR Display cannot reproduce accurately. Fortunately, the magnitude of these gaps in the luminance range of the HDR Display is very small at the low end of the range and increases towards the high end; the relative gap size is therefore very low everywhere along the range. Figure 14.7 shows this effect for a display with an 8-bit linear LED and an 8-bit LCD with a gamma of 2.2 as its response curve. With the exception of the near-black range, the relative step size of the HDR Display is below 1%.

A further limitation of the HDR Display occurs at very high contrast boundaries. Such boundaries require the LED directly under them to be very bright, leaving only the LCD to adjust the image. The light field from the LED will be approximately constant in the region very close to the boundary, so the highest local contrast is the dynamic range of the LCD. As a result the bright side of the boundary can be displayed accurately, but the dark side will be slightly grey near the boundary. Further away from the boundary, full black is achievable because a different LED contributes to this area and it can be turned off. The HDR Display is therefore capable of modulation over the product of the LED and LCD dynamic ranges globally, but only over that of the LCD locally.

Fortunately, this limitation is not relevant in conventional imaging. While our visual system is extremely good at dealing with high contrast images, it is limited in its capability in small areas. This is not so much an issue of our visual processing as a result of optics. The optical system of our eye, including the lens and the intraocular fluid, is imperfect. Specifically, each element scatters some of the light that passes through it.
Fig. 14.8. Veiling luminance effect
Our eye therefore has a well-established PSF for incoming light hitting the retina. A common model of this PSF is given in (14.1). The PSF has two parts: a delta function term corresponding to the portion of the incoming light that is not scattered, and a second term describing the scattered portion, which is distributed as a function of the angle between the light source and the position on the retina. This effect is often called veiling luminance or veiling glare.

Veiling luminance point spread function:

P(α) = η δ(α) + c / f(α).     (14.1)
Figure 14.8 shows a schematic outline of this effect. The scene is a high contrast boundary with bright region A producing luminance La and dark region B with luminance Lb. Scatter in the eye sends some of the light from A onto spot b on the retina where region B is imaged. The actual perceived luminance at b is therefore given by the contribution of Lb and the veiling luminance portion of La for the angle between A and B. This increase in the perceived luminance at b in turn lowers the perceived contrast between points A and B in the scene. Veiling luminance therefore puts a limit on the perceivable contrast in a small region. The LED spacing suggested earlier ensures that the halo caused by the large size of the LED will always be smaller than the veiling luminance halo caused by the bright portion of the high contrast boundary. Figure 14.5 shows a cross section example of a small bright feature on black background. Notice the increased light level near the boundary resulting from the LCD’s inability to compensate for the excess light from the LED. Also shown is the veiling luminance caused by the bright feature for an average viewer at a viewing distance of 20 cm. The veiling luminance effect is significantly stronger than the display imperfection. In other words, the HDR Display has a defect at high contrast boundaries but the HVS has a similar and larger defect at those points as well which results in the display defect becoming invisible.
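The toy model below illustrates this masking effect. It follows the structure of (14.1) but has to assume a concrete scatter falloff (1/α² here) and an illustrative value for η, since the chapter leaves f(α) and the constants unspecified; it is not a calibrated glare model.

```python
import numpy as np

def perceived_luminance(scene, pixel_angle_deg=0.02, eta=0.9):
    """Toy veiling-glare model with the structure of (14.1): a fraction eta
    of the light arrives unscattered, the rest is spread with an assumed
    1/alpha^2 falloff (the constant c is absorbed by the normalization).
    scene is a 1-D luminance profile in cd/m^2, sampled every
    pixel_angle_deg degrees of visual angle."""
    n = scene.size
    alpha = np.abs(np.arange(n) - np.arange(n)[:, None]) * pixel_angle_deg
    kernel = 1.0 / np.maximum(alpha, pixel_angle_deg) ** 2  # clamp alpha = 0
    kernel /= kernel.sum(axis=1, keepdims=True)             # normalize scatter
    return eta * scene + (1.0 - eta) * kernel @ scene

# Bright 3,000 cd/m^2 patch next to true black: scatter from the bright side
# puts a non-zero luminance floor on the dark side of the boundary.
scene = np.concatenate([np.full(200, 3000.0), np.zeros(200)])
print(perceived_luminance(scene)[205])
```

For a bright patch next to true black, the scattered term puts a luminance floor on the dark side of the boundary; this is the veiling luminance that masks the residual LED halo in Fig. 14.5.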
Fig. 14.9. HDR Display layout including LED backlight, optical package and LCD layer
14.3 HDR Display Performance

The dual-modulation HDR Display approach (Fig. 14.9) offers a wide range of improvements over conventional displays. First and foremost are of course the increases in dynamic range and amplitude resolution, but other benefits can be obtained as well. For example, the use of differently colored LED (or integrated RGB LED packages) can significantly improve the color gamut of the display by providing more saturated primaries. The LED have a very fast response time, so that flashing them can increase the effective response time of the LCD (the LCD is a passive device, so the effective light emissive period is defined only by the LED flash time). There are many other options that make use of the unique architecture of the HDR Display to enhance image quality or system efficiency.

BrightSide Technologies has developed a 37″ high definition HDR Display reference design that incorporates many of these benefits (Fig. 14.1). This unit offers a peak luminance of over 3,000 cd m−2 (approximately 5–10 times higher than any other TV on the market) and a black level that is effectively zero when the LED are switched off. The sequential contrast of such a device is infinite, since the black level is zero. A more useful measure is the ANSI contrast, which is approximately 25,000:1 on a 9-point ANSI checkerboard target. The non-zero ANSI black level is the result of small light contributions from the LED below the bright regions reaching the dark regions.
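Given a calibrated luminance image of the displayed checkerboard, an ANSI-style contrast figure can be computed as sketched below. The 3 × 3 patch layout and the white top-left corner are assumptions matching the "9-point" target mentioned above; the official ANSI measurement procedure differs in detail.

```python
import numpy as np

def ansi_contrast(luminance, grid=3):
    """ANSI-style checkerboard contrast from a calibrated luminance image:
    mean luminance of the white patches divided by the mean of the black
    patches. Assumes a grid x grid board with a white patch in the top-left."""
    h, w = luminance.shape
    ys, xs = np.indices((h, w))
    patch_parity = (ys * grid // h + xs * grid // w) % 2
    white = patch_parity == 0
    return luminance[white].mean() / luminance[~white].mean()
```

Applied to a measured image of the displayed target, a ratio in the region of 25,000:1 would reproduce the figure quoted above; the residual light in the black patches is exactly the LED spill described in the last sentence.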
14.4 Alternative Implementation

As mentioned earlier the dual-modulation concept can be used to create HDR projection devices as well. The design requirements and algorithms for this are nearly identical to those described above. Miniaturization of LED is technically very challenging, and therefore the dual-modulation approach with two passive modulators and a fixed light source is currently easier to implement. As with the HDR Display there are many configurations for an HDR Projector. For example, a low resolution digital mirror device (DMD) could be used as a modulated light source for a high resolution LCD. Alternatively, two LCD in series or any such combination involving liquid crystal on silicon (LCOS) modulators could be used.
14.5 Conclusion

Overall the dual-modulation approach offers a cost effective path to HDR imaging that can be implemented without major infrastructure or manufacturing changes. The performance benefits of the approach are significant: an effectively infinite increase in sequential contrast, an over hundred-fold increase in ANSI contrast, a doubling of the bit depth, and the ability to achieve peak luminance levels that are ten times higher than anything available today. All of this is achieved with an average power consumption that is three times lower than that of an LCD with comparable peak luminance levels.
References

1. H. Seetzen, H. Li, L. Ye, L. A. Whitehead, "Observation of Luminance, Contrast and Amplitude Resolution of Displays", Proceedings of the 2006 Society for Information Display Annual Symposium, June 2006
2. Brightness Enhancement Film and Multi Layer Optical Film are products of 3M Corp. More information about the Vikuiti line of films can be found at www.3m.com/Vikuiti
3. P. Ledda, A. Chalmers, T. Troscianko, H. Seetzen, "Evaluation of Tone Mapping Operators using a High Dynamic Range Display", ACM Transactions on Graphics, special issue on Proceedings of ACM SIGGRAPH 2005, August 2005
4. H. Seetzen, W. Heidrich, W. Stuerzlinger, G. Ward, L. A. Whitehead, M. Trentacoste, A. Ghosh, A. Vorozcovs, "High Dynamic Range Display Systems", ACM Transactions on Graphics, special issue on Proceedings of ACM SIGGRAPH 2004, August 2004
5. P. Ledda, A. Chalmers, H. Seetzen, "A Psychophysical Validation of Tone Mapping Operators Using a High Dynamic Range Display", Symposium on Applied Perception in Graphics and Visualization, LA 2004
6. H. Seetzen, H. Li, L. Ye, L. A. Whitehead, "Observation of Luminance, Contrast and Amplitude Resolution of Displays", Proceedings of the 2006 Society for Information Display Annual Symposium, June 2006
7. H. Seetzen, L. A. Whitehead, G. Ward, "A High Dynamic Range Display Using Low and High Resolution Modulators", Proceedings of the 2003 Society for Information Display Annual Symposium, May 2003
8. H. Seetzen, G. Ward, L. A. Whitehead, "High Dynamic Range Imaging Pipeline", Proceedings of the 2005 Americas Display Engineering and Application Conference, November 2005
9. P. Ledda, A. Chalmers, H. Seetzen, "High Dynamic Range Displays: a Validation Against Reality", IEEE SMC 2004 International Conference on Systems, Man and Cybernetics, October 2004
10. US 6,891,672, L. A. Whitehead, H. Seetzen, W. Stuerzlinger, G. Ward, High Dynamic Range Display Devices
15 Appendix
15.1 Symbols
σ
σ(λ) κR , κ G , κ B
the standard deviation representing the semiedge width of the detected edge; the inflection point coefficient that determines the brightness Iturn on the edge; the turning point coefficient that determines the background brightness Ibgrd the brightness factor that represents the minimal proportion between the brightness of the earliest possible edge position I0 and the background Ibgrd Color Matching Functions temperature gradient of the offset temperature gradient of the gain temperature gradient of the illumination at the 3 dB-point Sensitivity R, G, B sensitivity
κR , κG , κB
R, G, B sensitivity
αR , αG , αB
Color enhancement parameters
¯ Y
Adapting luminance (background luminance in simple stimuli) offset Inverse contrast sensitivity Color vectors in CIELab
ξ1
ξ2
η0
x ¯, y¯, z¯ α β γ
a a a∗ , b ∗
DN Dec−1 DN Dec−1
(DN) percent DN
a0 aph b B b0 bph bR , bG , bB c C∗R c0 CR , CG , CB fR , fG , fB I ID IDark Imax IPhoto k k ka , k b l L l(y) L∗ (x, y) L∗W L→ (xo , ωo ) lg ln log LW n N NDFP NDS NPS NQ NR NSFP p q QD QS
Offset at 0◦ C Physical offset gain Brightness Gain at 0◦ C Physical gain R, G, B offset Illumination at the 3 dB-point Chroma Illumination at the 3 dB-point at 0◦ C Colorfulness red, green, blue output R, G, B OECF’s Intesity Drain current Dark current Maximum current in weak inversion Photo current Normalization constant Boltzmann Constant CIELab Paramters Luma Brightness Luminance to luma conversion function Lightness Lightness white Radiance leaving the surface at point xo in direction ωo Logarithm base 2 Logarithm base e Logarithm base 10 Brightness white subthreshold slope factor RMS noise Dark fixed-pattern noise Dark shot noise Photosignal shot noise Quantisation noise Reset noise Photosignal fixed-pattern noise Central pixel Electron charge Dark charge Signal charge
(DN) (V)
DN (lx) Dig. (DN) Dig.
(A) (A) (A) (A)
Dig. Dig.
V, e, DN V, e, DN As, e, DN As, e, DN V, e, DN V, e, DN As As, e, DN As, e, DN
Appendix R(Y) R, G, B Rd(xi , xo ) ¯ RF(Y) S s(λ) S(xi , ωi ; x0 , ω0 ) ∗ S R , SR T u
v
V(x, y) VGS VR Vt VT Vthresh x x x x X x, y x ¯, y¯, z¯ xbright xdark xi xL , xM , xS xmedium xo XW , YW , ZW
Neural response to luminance Y Densitometric Tristimuli Diffuse reflectance Highest resolvable spatial frequency for adapting luminance Signal Sensitivity Bidirectional scattering-surface reflectance distribution function (BSSRDF) Saturation red Temperature first component of the CIE 1976 Uniform Chromacity Scales Second component of the CIE 1976 Uniform Chromacity Scales Spatially variant local adaptation value gate-to-source voltage Output voltage red Thermal voltage (0.025 V) Threshold voltage (0.7 V) Threshold voltage Illumination Chromaticity coordinate red Array coordinate Stimulus Colorimetric tristimulus value red Image coordinates Color matching functions Illumination at bright illumination (e.g. 1,000 lx) Illumination at low illumination (e.g. 0.01 lx) Point at which light enters into the surface Colorfulness red, green, blue input Illumination at medium illumination (e.g. 10 lx) point at which light leave the surface Tristimulus values white limit
DN
V, e, DN
Dig. C, K
◦
(V) DN V (V) (lx) Dig. cm Dig.
(lx) (lx)
(lx)
y Y y y y Y y(l) ybright ydark ymedium yraw Z Z φ← (xi , ωi ) ωi ωo
Digital output Stimulus luminance Luminance Sensor output Chromaticity coordinate green Colorimetric tristimulus value green Luma to luminance conversion function Digital output at bright illumination (e.g. 1,000 lx) Digital output at low illumination (e.g. 0.01 lx) Digital output at medium illumination (e.g. 10 lx) Uncorrected, digital output Chromaticity coordinate blue Colorimetric tristimulus value blue Incident flux entering into the surface at point xi from direction ωi Incoming light direction Outgoing (reflected) light direction
(DN)
V, DN Dig. Dig. (DN)
(DN)
(DN) Dig. Dig.
List of symbols (E) (Symbol, Description, Units, Typ. range):
AVDD    Analog supply voltage       V    3.3
AVSS    Analog ground               V    0
DVDD    Digital supply voltage      V    2.5
DVSS    Digital ground              V    0
SVDD    Pixel supply voltage        V    3.3
VDD     Positive supply voltage     V    2.5
VT      Threshold voltage           V    0.5 . . . 0.7
VOFFD   Dark offset voltage         V    0.09
VOFFB   Bright offset voltage       V    0.009
Vlog    Node voltage inside pixel   V    2.4 . . . 2.8
15.2 Abbreviations

3D – 3 dimension
A/D-Converter – Analog-to-digital converter
ADC – Analog-to-digital converter
AOI – Area of interest, also named ROI (region of interest) or subframe
ASIC – Application specific integrated circuit
BGA – Ball grid array
BIOS – Basic input output system
BRDF – Bidirectional reflectance distribution function
BSSRDF – Bidirectional scattering-surface reflectance distribution function
Camera LinkTM – Interface for digital cameras
CCD – Charge-coupled device
CCTV – Closed-circuit television
CE – Communauté Européenne
CIE – Commission Internationale de l'Eclairage, International Commission on Illumination, Vienna
3 dimension Analog-to-digital converter Analog-to-digital converter Area of interest, also named ROI (region of interest) or subframe Application specific integrated circuit Ball grid array Basic input output system Bidirectional reflectance distribution function Bidirectional scattering-surface reflectance distribution function Interface for digital cameras Charge-coupled device Closed-circuit television Communaut´e Europ´eenne Commission Internationale de l’Eclairage, International Commission on Illumination, Vienna Complementary metal oxide semiconductor Mechanical lens mount standard Chip-on-board Central Processing Unit Cathode-Ray Tube Digital signal processor Environment map Digital non-volatile memory Fixed-pattern noise Gobal illumination Graphics processing unit High definition High dynamic range High dynamic range CMOS High dynamic range CMOS sensor High definition television Hema imaging library Hema image processing environment Human visual system International color consortium Just noticeable difference Joint Photographic Experts Group (also image format) Local area network Leadless chip carrier
230
Appendix
LCD LDR LED LUT MMR MPEG NF-Mount NIR NTSC OECF OLED Open Eye Module OTF PAL PC PDF PRT PSF RGB RMD ROI SH SMT SPI TVI VEM VR
Liquid crystal display Low dynamic range Light emitting diodes Look-up table Memory mapped register Moving Picture Experts Group Mechanical lens mount standard Near infrared National television systems committee, video output standard of CCTV cameras Optoelectronic conversion function Organic Light Emitting Diodes Camera module with a generic digital interface Optical transfer function Phase-alternating line, video output standard of CCTV cameras Personal computer Probability density function Precomputed radiance transfer Point Spread Function Red, green, blue color space Root-mean square Region of interest, also named AOI (area of interest) or subframe Spherical harmonics Surface mount technology Serial peripheral interface Threshold versus intensity function Video environment maps Virtual reality
15.3 Glossary
Brightness
Chroma Chromaticity Color appearance Color constancy
Attribute of visual sensation according to which an area appears to emit more or less light Colorfulness judged relative to brightness “White” Lw Scaled tristimulus value Color stimulus perceived by the HVS Capacity of the HVS to transform recorded stimuli into representations of the same reflectance that are (largely) independent of the viewing illuminant
Appendix Colorfulness
Lightness
Saturation
231
Perception of an area being more or less chromatic. It increases with luminance The brightness of an area judged relative to the brightness of a similarly illuminated area that appears “white” (L×W ) Colorfulness judged in proportion to its brightness
15.4 Some Useful Quantities and Relations – Energy of a green (555 nm) photon: 3.58 × 10−19 Ws – Charge of the electron: 1.602 × 10−19 As – Radiant flux (W) is the radiated power at a certain wavelength: ◦ 1 W = 0.45 A at 555 nm wavelength and 100% quantum efficiency. – Luminous flux is the power of visible light. – Photopic flux (lm) is the power weighted for the response of the human eye: – At 555 nm, the peak response of the human eye: ◦ 1 W = 683 lm (555 nm). – Irradiance (W m−2 ) is the density of the radiant flux: ◦ 1 W m−2 = 683 lx (555 nm). – Illuminance (lx) is the photometric flux density: ◦ 1 lx = 1 lm m−2 ◦ 1 green (555 nm) lx = 1.46 mW m−2 – A light source of 1 cd emits 1 lm steradian−1 (sr). – Point light source with isotropic emission: ◦ 1 cd = 1 lm sr−1 = 4 π lm. – Luminance is the photopic flux density: ◦ 1 cd m−2 = 1 lm m−2 sr−1 ◦ = 1 lx sr−1 ◦ = 4 π lx (isotropic source). ◦ 1 green (555 nm) lx = 4.08 × 1011 photons cm−2 s−1 ◦ 1 fA = 6,250 e s−1
15.5 Trademarks Camera Link is a trademark of National Semiconductor Corporation HDRC is a trademark of IMS CHIPS Windows is a trademark of Microsoft BrightSide is a trademark of BrightSide Technologies Seelector is a trademark of Hema DLT is a trademark of Texas Instruments.
Index
ACC, 125, 128, 130 A/D-converter, 73, 74, 77, 86 acoustic warning, 134 acoustical, 134 active implant, 141 active safety, 128, 136 active-pixel image sensors (APS), 21 adaptive cruise control, 126, 133 adaptive cruise control (ACC), 123 adaptive edge-based algorithm, 117–120 address, 40, 73 address generator, 74 age-related macular degeneration, 141 amplifier, 19, 21 amplitude, 212, 213, 216, 221 analog photography, 103 analog-to-digtal converter, 29 anchor, 156, 158 anchoring, 156 ANSI contrast, 212 antireflection coatings, 101 AOI, 74, 76, 84 aperture, 27, 30, 31 appearance, 31, 69, 147–152, 155 arithmetic, 29 arithmetic mean value, 43 artifacts halos, 163 ASIC, 77 asphere, 102–104 automotive, 125, 126, 130 average illumination, 142, 143
back light, 100 back lighting, 99 bad or weak pixels, 40 barrier, 21, 23 barrier height, 21 base layer, 171 BGA, 77 bidirectional reflectance distribution function (BRDF), 196 bidirectional scattering-surface reflectance distribution function (BSSRDF), 196 bilateral filter, 171, 172 Bilateral Filter Tone Mapping, 170 bilateral filtering, 155, 159, 160, 175 bionic, 1 bipolar cells, 141 bit, 16, 45, 46 blackening, 104 blinding, 1 blindness, 141 blinks, 27 block diagram, 26, 27, 30 blooming, 133 body, 14 bright histogram, 28 brightness, 66–69, 89, 148–150, 154, 155, 159, 160, 166 brightness range, 134 C-mount, 76, 79, 84 Camera Link, 83, 84 Camera LinkTM , 78, 79, 82 candidates, 118
capacitance, 20, 48 CCD, 13 CCTV, 85–87 channel, 21 characteristic edges, 115 charge balance, 144 chip, 19, 21 chroma, 69 chromaticity, 70 CIELab, 66 CMOS technology, 19 coatings, 29 COB, 74, 77 collision avoidance, 123, 127, 130, 136 collision mitigation, 123, 127, 130, 136 color, 8, 10, 11, 65, 66, 68, 69, 71, 73, 84 color amplification factors, 71 color constancy, 1, 10 color management, 64, 68, 71 color saturation, 69, 71 color space, 180–182, 184, 187, 189, 190 color stimulus, 68 color-correction, 69 comet tail, 48, 52 compatibility, 183 compatible, 183, 211 compression, 16 cones, 150, 163, 164, 166 continuous mode, 49, 52 contour, 97 contour extraction, 8 contrast, 10, 64, 71, 157–162, 168 contrast compression, 161 contrast discrimination, 159, 160 contrast range, 100 contrast reduction, 147, 157 contrast sensitivity, 1, 4, 6, 7, 10, 11 contrast sensitivity function (CSF), 6 control loops, 27 controller, 73, 76, 77 conversion efficiency, 20 conversion transistor, 32 correction, 35, 37 correction parameters, 46 correction procedure, 46 correlation, 43 cortex, 141 cover glass, 101 crosstalk, 32
CSF, 7 current in weak inversion, 33 current source, 48 100% defect positions, 118 3 dB-point, 26, 29, 32, 33, 35–38, 45, 53 dark, 13, 57 dark charge, 20 dark current, 26, 28, 33, 43, 47 dark HDRC reset, 50 dark histogram, 28 dark noise, 172 dark reset, 49, 50 dark reset mode, 52 dark shot noise, 20 data fusion, 135 decade, 13–16, 18, 19, 26 decompose the luminance, 171 defect detection, 114, 116, 119, 120 degenerated, 141 densitometric, 68 densitometric HDRC color, 71 detail layer, 171 differences, 65–68, 71, 94 differences in lightness, 143 diffuse, 201 diffusely, 201 digital, 14 digital color management, 68 digital memory, 29 digital numbers (DN), 24 digital output, 14, 16, 55 digital processing power, 65 digital storage, 29, 47 display, 211–217, 221 distal, 139 distance warning, 123 distorted by, 102 distortions caused, 187 distraction, 132 distribution, 28 dodging and burning, 155 doping, 21 drain, 21, 23, 49, 50, 56 drain current, 21, 23, 33 drain electrode, 21 drain-source voltage, 23
Index driver assistance, 122, 127, 128, 130, 135 Durand, 170, 172, 174 dynamic performance, 47 dynamic range, 1, 2, 5, 10 edge, 14 edge detection, 85, 87–90, 93, 96, 97, 138, 139 effect, 48 electrolysis, 142, 144 embedded memory, 47 encoded in the scene-referred, 185 encoding, 68, 71 endoscopy, 137 enhance, 69 enhanced color saturation, 70 enhanced colors, 69 environment map, 199–201 environment map lighting, 201, 202 environment maps importance sampling, 202 environmental sensor, 123 erroneous depth, 115 extended-range image sensors, 57 extracted, 92, 96 eye, 1, 3, 4, 6, 8, 10, 13, 18, 26 eye movements, 141 eye’s, 10 eye’s OECF, 2 eye-off-road time, 132 f -stops, 30, 101 false alarm, 133 far infrared (FIR), 130 field of the driver’s view, 135 field of view, 132 film, 13 filter, 71, 104 filtering, 97 FIR filtering, 203 fixed pattern, 28, 46 fixed pattern correction, 32 fixed-pattern correction, 29, 32, 35, 46, 47 fixed-pattern correction (FPC), 28 fixed-pattern noise, 32, 35, 41, 43 fixed-pattern offset, 30 flash, 77
235
flat-field, 57, 63 flat-field illumination, 24, 27 flat-field noise, 59–62 flat-field shot-noise, 59 floating point format, 183 flood light, 100 fluctuations, 57 focal length, 101–105 foveal, 158 frame, 57, 58, 63 frame sequential contrast, 212 frameworks, 156–158 free-standing lens, 101, 104 frequency, 171 fully automated inspection, 121 fully automated surface inspection, 118, 121 fusion, 133 γ, 67 gain, 32, 33, 35–38, 43, 44, 47 gamma correction, 148, 187, 189, 190 gamma correction to HRD, 188 ganglion cells, 141 gate, 21, 23, 24, 33, 49, 52, 56 gate and drain voltages, 49 gate oxide, 21 gate source, 33, 48 gauss profile, 115 ghost, 101, 102 glare, 148, 152, 166, 168–170 glass–air surfaces, 101, 104 global, 73 global illuminance, 143 global illumination, 143 global operations, 71 global operators, 148, 154 global or local fluctuations in brightness, 115 global shutter, 53, 55 glossy or specular surfaces, 196 gradient methods, 154, 158, 175 graphics processing units (GPUs), 175 gray, 89, 96, 97 gray charts, 66 gray opaque diffuse filter, 71 gray value profile, 115, 116 gray-scale plates, 100 grayish appearance, 69
grey, 89 grey chart, 30, 31 grey-chart appearance, 31 half, 183 half-precision floating point, 184 halo artifacts, 154, 159 haptical, 134 HDRC pixel, 23, 26 HDRC sensors, 53 head-up display, 132 high definition (HD) television, 211 high-quality image acquisition, 28 histogram adjustment, 149, 152 histogram adjustment methods, 152 histogram equalization, 88, 159, 160 HMI, 129 hold capacitor, 53 home theatre, 212 human machine interface, 123, 135 human machine interface (HMI), 132 human visual system, 149, 166 human visual system (HVS), 2, 147, 149 HVS, 4, 8, 10 illuminance, 26, 29, 30, 48, 50, 66 illuminance equalization, 31 illuminant, 1, 10 illumination, 14, 15, 24, 26, 29, 31–33, 35, 38, 48, 52, 53, 137–139 image, 40 image errors, 101, 102, 105 image lag, 48, 50, 52, 53 image processing, 129, 130, 133, 134 image sensor, 13, 14, 17, 19, 27, 32, 39, 46 image synthesis, 197, 199 image-based acquisition methods, 196 image-based lighting, 193, 199, 200 image-based methods, 193 imagers, 32, 46 images, 40, 46 importance sampling, 202, 203 industrial image processing, 113, 117 industrial machine vision, 53 infrared cut-off, 74 inhomogeneous and/or structured surface, 116 inspection for free shaped surfaces, 116
instantaneous response, 1 integrated-circuit, 19, 21 integration time, 26, 49, 53 intensity, 13, 14, 16, 18, 19, 32 intracorporeal videoprobe, 137 IR-filtered, 174 iris, 172 isolation, 19 JPEG HDR, 183, 184 JPEG HDR format, 184 just noticeable differences (JND), 188 kTC, 59 lambertian surfaces, 201, 202 lane departure warning, 123, 128, 131, 133–135 lane guidance, 131, 133, 135 LCC, 48, 75 LCD, 214–219, 222 LCD’s, 220 LDR display, 174 leakage, 19, 21 leakage current, 19, 26 lens, 29 lens aperture, 30 lens contrast, 99, 101 lens elements, 101 lenses, 98, 99, 101, 102, 105 LIDAR, 133, 135 light traps, 104 lightness, 66, 67, 69, 70, 154, 156–158 lightness – color cone, 71 lightness perception, 155 lightness sensors, 67 linear sensor, 14 liquid crystal display (LCD), 213 local, 142 local illumination, 143 local lightness, 68, 69 local operators, 148, 161 localized contrast perception, 158, 159 log luminance, 70 log transistor, 24, 48–50, 53, 56 log-node, 26 log-node capacitance, 48 log-response, 13, 18, 26, 49 logarithmic compression, 1
Index logarithmic contrast compression, 151 logarithmic space, 171, 172 logarithmically compressed, 158 logluv TIFF, 182 logLuv TIFF format, 189, 190 long-time adaptation, 1, 7 longitudinal or lateral control of the vehicle, 123 low-luminance, 50 luma, 188–190 luminance, 18, 19, 50 luminance adaptation, 150, 163, 164 luminance distribution, 211, 212, 214, 218 LUT, 84, 87–90 macbeth chart, 8, 9, 69 mapping, 68 masking process, 40, 41 masticed multiple-lens, 105 matching, 92 matting, 104 mean values, 35 memory, 29, 41, 48 mesopic, 168 mesopic vision, 166 MetalDefectDetector, 118, 120, 121 microelectronic manufacturing, 19 min–max gamma, 88, 89 min–max stretch, 88, 89 miniature, 137 miniaturization, 137 minimal invasive surgery (MIS), 137 minimum detectable, 26 minimum detectable signal, 1, 7 mixed reality, 205, 206 MMR, 74, 76, 84 moon, 2 MOS transistor, 21 mosaic, 68 MPEG, 179, 187 MPEG video, 185 MPEG video compression, 184 MPEG-4 video compression, 184 multiexposure techniques, 194, 195, 200 multiplexers, 58 mux1, 24
237
nanotechnology, 104 natural scene, 40, 212 near infrared (NIR), 73 near infrared (NIR) light, 130 near-infrared, 135 neutrally gray background, 100 NF-mount, 76 night vision, 128, 130, 131, 133, 135 night vision warning, 132 night vision warning (NVW), 131 NIR, 73, 130 NMOS transistor, 21, 22 noise, 57, 58, 61–63 noise reduction, 134, 191 noise sources, 57, 59 nonuniformities, 57 not correlated, 28 NTSC, 73, 85, 87 object, 14, 29, 31, 53 object recognition and classification, 65 octaves, 4 OECF, 2, 6, 7, 10, 14–19, 21, 24–27, 29, 32, 33, 35, 40, 50, 53 offset, 30, 31, 33, 37, 43 offset correction, 30, 38, 40, 43, 47 on-resistance, 58 on/off contrast, 212 opaque diffuser filter, 71 Open Eye Module, 75 openEXR file format, 183 openEXR format, 183 optical axis, 101 optical intensities, 2 optical intensity, 15, 16, 29, 30 optical signals, 65 opto electronic conversion function (OECF), 2, 5, 13, 194 output amplifier, 24 output-referred, 180 1-Point Offset Correction offsets, 43 1-parameter correction, 37 1-point offset or the multipoint correction, 43 3-parameter correction, 37 3-parameter correction algorithm, 37, 43 p-substrate, 22, 23
p-well, 22, 23
package, 138
PAL, 73, 85, 87
parameter extraction, 33
parameter variation, 35
parameterized correction algorithm, 38
parasitic current, 21
parking aids, 124
partially marked defects, 116
passive safety, 128
pedestrian, 130
pedestrian protection, 128
perception, 149, 156, 159, 164, 166, 167, 175
perception-based mapping, 68
perception-motivated HDR video tone mapping, 71
perceptual, 147, 148, 154, 160, 163, 166, 167, 169, 170, 175
perceptually, 152, 155
pfstools, 191
photocurrent monitor, 47
photodiode, 7, 8, 19, 20, 23, 24, 26, 28, 29, 48
photodiode/pixel, 7
photometric calibration, 194
photometric units, 2
photon, 19, 21
photopic, 166, 168
photopic response, 19
photoreceptor, 1, 3, 11, 142, 143, 149
piecewise linear, 57
pixel, 1, 3, 6, 8–11, 13–15, 19, 26–28, 31, 48, 50, 53, 55
pixel buffer, 26, 78
pixel capacitance, 49
pixel characteristic, 35
pixel follower, 24
pixel signal, 53
pn junction, 19
predictive driver assistance, 124
preprocess, 174
preserving the details, 170
progressive scan, 27
prosthesis, 141
pseudodefects, 114, 116
pull up, 31, 48–50
pull-down, 145
quantization, 20
quantization errors, 182, 189, 190
quantization noise, 57, 60–62
quantum efficiency, 19, 20
radar, 123, 125, 126, 128, 133, 135
Radiance HDR format, 190
radiometric measurement, 67
random access, 55
read-out electronics, 23
real time, 2, 3, 8–10
real world scene, 149, 150, 180
real-time electronic acquisition system, 1
real-time environment maps acquisition, 200
real-time HDR video acquisition, 1
real-time algorithms, 114
real-world scene, 8, 199, 200
real-world system, 16
receptor, 1, 7
red, green and blue, 68
reduces the contrast, 170
reference level, 49
reference pixel, 20, 24–26
reflectance, 29, 143
reflections, 99
refraction index at adjacent, 104
remission, 100
reset, 27, 48–50, 52, 53, 56
reset noise, 20
resistance, 21
response curve, 194
retina, 141, 142
retina implant, 144
retinal implant, 141, 145
retinitis pigmentosa, 141
RGB-Bayer color mosaic, 73
RGBE encoding, 182
robot, 118, 120, 121
rods, 150, 163, 164, 166–168
rolling-shutter, 53, 55
rotating fan, 55
rotating light spot, 52
3-sigma, 27, 41
S/N ratio, 60, 63
safety, 125, 127, 132
sample/hold stage, 73
saturation, 62
saturation enhancement, 71
saturation enhancement factors, 69
saturation signal, 60
scene-referred encoding, 180
scene-referred representation, 180
scotopic vision, 166, 170
select transistor, 23
semi-log plot, 13
sense lightness, 67
sensitivity, 2, 6–8, 10
sensor, 1–10, 41, 137
sensor data fusion, 128, 133
sensor to a lightness, 69
separation (or disregard) of the illuminant, 8
settling time, 48
shot noise, 7, 10, 58
shutter time, 27
sigma, 27, 41
signal, 13, 14, 30, 52, 57, 58
signal range, 16, 26, 50
signal-to-noise ratio, 60–62
signal-to-noise ratio S/N, 63
silicon, 19
silicon photodiode, 19–21
six-sigma, 41
slanted beams, 101, 104
smearing, 133
SMT, 77
snapshot, 53
sol–gel processes, 104
source, 21, 23, 33, 48, 50, 52, 53
source electrode, 23
source follower, 53
space domain, 57
spatial domain, 14
spectral sensitivity, 19, 20, 24, 29, 55
spherical harmonic, 202
spherical harmonics (SH), 201
SPI, 81
spontaneous sensitivity, 18, 26
spot light, 101
spotlight, 31
spotlight illumination, 31
standard deviation, 29
standard photographic objects, 100
stationary state, 48
step-function, 48
Stevens’ law, 66, 149
Stevens’ power-law, 149
stimulation, 141, 144
stimulation electrodes, 142
stimulus, 66, 68
stray light, 99, 101, 103, 105
stretch, 88
stretched, 67
studio, 9, 206
studio illumination, 31
sub-retinal implant, 140
sub-retinal implantation, 141
substrate, 22, 23
subthreshold slope, 33
subthreshold slope factor, 33
sunlight, 2
surface defects, 115
surface inspection, 113, 114
surface texture, 104
surround sensing, 124
taxonomy, 148
temperature dependence, 19, 41, 43–45
templates, 91
temporal coherence, 158
temporal incoherence, 159
testability, 138, 142
TFT, 132
thermal voltage, 21, 33
threshold visibility, 150
threshold voltage, 28, 33, 49
time domain, 14
tissue, 141, 142
to HDR, 189
tone mapping, 147–155, 157–159, 161, 162, 164, 165, 168, 169
tone mapping operator, 171, 174
tone reproduction curve, 154
tone reproduction curve (TRC), 148
tone-mapping, 84, 166
traffic sign recognition, 134, 135
transistor, 23, 24, 28, 49, 52, 53
transistor channel, 21
transistor well, 21
translucent objects, 196–198
transmittance, 29
transparent objects, 196
trilateral filtering, 172
tristimulus, 68
ultrasonic sensor, 124, 125
uniform chromaticity scale, 182, 188
uniformity, 65
unity-gain amplifiers, 24
useful dynamic range, 62
variance, 57, 58
video compression, 179, 181, 187, 191
video environment map, 199
video environment maps (VEM), 200
video streams, 172
videoprobe, 138
vision impaired, 140
visual, 152
visual acuity, 150, 167–170
visualization, 137
vlog transistor, 52
wavelength-dependent sensitivity, 31
weak inversion, 32, 33
Weber law, 149
Weber–Fechner law, 149
well, 221
white balance, 68
white reset, 50, 52
Springer Series in
advanced microelectronics
1 Cellular Neural Networks Chaos, Complexity and VLSI Processing By G. Manganaro, P. Arena, and L. Fortuna
2 Technology of Integrated Circuits By D. Widmann, H. Mader, and H. Friedrich
3 Ferroelectric Memories By J.F. Scott
4 Microwave Resonators and Filters for Wireless Communication Theory, Design and Application By M. Makimoto and S. Yamashita
5 VLSI Memory Chip Design By K. Itoh
6 Smart Power ICs Technologies and Applications Ed. by B. Murari, R. Bertotti, and G.A. Vignola
7 Noise in Semiconductor Devices Modeling and Simulation By F. Bonani and G. Ghione
8 Logic Synthesis for Asynchronous Controllers and Interfaces By J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Yakovlev
9 Low Dielectric Constant Materials for IC Applications Editors: P.S. Ho, J. Leu, W.W. Lee
10 Lock-in Thermography Basics and Use for Functional Diagnostics of Electronic Components By O. Breitenstein and M. Langenkamp
11 High-Frequency Bipolar Transistors Physics, Modelling, Applications By M. Reisch
12 Current Sense Amplifiers for Embedded SRAM in High-Performance System-on-a-Chip Designs By B. Wicht
13 Silicon Optoelectronic Integrated Circuits By H. Zimmermann
14 Integrated CMOS Circuits for Optical Communications By M. Ingels and M. Steyaert
15 Gettering Defects in Semiconductors By V.A. Perevostchikov and V.D. Skoupov
16 High Dielectric Constant Materials VLSI MOSFET Applications Editors: H.R. Huff and D.C. Gilmer
17 System-level Test and Validation of Hardware/Software Systems By M. Sonza Reorda, Z. Peng, and M. Violante