EYE AND VISION RESEARCH DEVELOPMENTS
BINOCULAR VISION: DEVELOPMENT, DEPTH PERCEPTION AND DISORDERS
EYE AND VISION RESEARCH DEVELOPMENTS Eye Cancer Research Progress Edwin B. Bospene (Editor) 2008. ISBN: 978-1-60456-045-9 Non-Age Related Macular Degeneration Enzo B. Mercier 2008. ISBN: 978-1-60456-305-4 Optic Nerve Disease Research Perspectives Benjamin D. Lewis and Charlie James Davies (Editors) 2008. ISBN: 978-1-60456-490-7 2008. ISBN: 978-1-60741-938-9 (E-book) New Topics in Eye Research Lauri Korhonen and Elias Laine (Editors) 2009. ISBN: 978-1-60456-510-2 Eye Infections, Blindness and Myopia Jeffrey Higgins and Dominique Truax (Editors) 2009. ISBN: 978-1-60692-630-7 Eye Research Developments: Glaucoma, Corneal Transplantation, and Bacterial Eye Infections Alan N. Westerhouse (Editor) 2009. ISBN: 978-1-60741-1772 Retinal Degeneration: Causes, Diagnosis and Treatment Robert B. Catlin (Editor) 2009. ISBN: 978-1-60741-007-2 2009. ISBN: 978-1-60876-442-6 (E-book) Binocular Vision: Development, Depth Perception and Disorders Jacques McCoun and Lucien Reeves (Editors) 2010. ISBN: 978-1-60876-547-8
Understanding Corneal Biomechanics through Experimental Assessment and Numerical Simulation Ahmed Elsheikh 2010. ISBN: 978-1-60876-694-9 Retinitis Pigmentosa: Causes, Diagnosis and Treatment Michaël Baert and Cédric Peeters (Editors) 2010. ISBN: 978-1-60876-884-4 Color: Ontological Status and Epistemic Role Anna Storozhuk 2010. ISBN: 978-1-61668-201-9 2010. ISBN: 978-1-61668-608-6 (E-book) Coherent Effects in Primary Visual Perception V.D. Svet and A.M. Khazen 2010. ISBN: 978-1-61668-143-2 2010. ISBN: ISBN: 978-1-61668-496-9 (E-book) Conjunctivitis: Symptoms, Treatment and Prevention Anna R. Sallinger 2010. ISBN: 978-1-61668-321-4 2010. ISBN: 978-1-61668-443-3 (E-book) Novel Drug Delivery Approaches in Dry Eye Syndrome Therapy Slavomira Doktorovová, Eliana B. Souto, Joana R. Araújo, Maria A. Egea and Marisa L. Garcia 2010. ISBN: 978-1-61668-768-7 2010. ISBN: 978-1-61728-449-6 (E-book) Pharmacological Treatment of Ocular Inflammatory Diseases Tais Gratieri, Renata F. V. Lopez, Elisabet Gonzalez-Mira, Maria A. Egea and Marisa L. Garcia 2010. ISBN: 978-1-61668-772-4 2010. ISBN: 978-1-61728-470-0 (E-book) Cataracts: Causes, Symptoms, and Surgery Camila M. Hernandez (Editor) 2010. ISBN: 978-1-61668-955-1 2010. ISBN: 978-1-61728-312-3 (E-book)
EYE AND VISION RESEARCH DEVELOPMENTS
BINOCULAR VISION: DEVELOPMENT, DEPTH PERCEPTION AND DISORDERS
JACQUES MCCOUN AND
LUCIEN REEVES EDITORS
Nova Science Publishers, Inc. New York
Copyright © 2010 by Nova Science Publishers, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. For permission to use material from this book please contact us: Telephone 631-231-7269; Fax 631-231-8175 Web Site: http://www.novapublishers.com NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS.
LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA Binocular vision : development, depth perception, and disorders / editors, Jacques McCoun and Lucien Reeves. p. ; cm. Includes bibliographical references and index. ISBN 978-1-61761-957-1 (eBook) 1. Binocular vision. 2. Binocular vision disorders. 3. Computer vision. 4. Depth perception. I. McCoun, Jacques. II. Reeves, Lucien. [DNLM: 1. Vision, Binocular--physiology. 2. Dominance, Ocular--physiology. 3. Pattern Recognition, Visual--physiology. 4. Vision Disparity--physiology. WW 400 B6145 2009] QP487.B56 2009 612.8'4--dc22 2009038663
Published by Nova Science Publishers, Inc. New York
CONTENTS

Preface

Chapter 1
New Trends in Surface Reconstruction Using Space-Time Cameras: Fusing Structure from Motion, Silhouette, and Stereo
Hossein Ebrahimnezhad and Hassan Ghassemian

Chapter 2
Ocular Dominance within Binocular Vision
Jonathan S. Pointer

Chapter 3
Three-Dimensional Vision Based on Binocular Imaging and Approximation Networks of a Laser Line
J. Apolinar Muñoz-Rodríguez

Chapter 4
Eye Movement Analysis in Congenital Nystagmus: Concise Parameters Estimation
Pasquariello Giulio, Cesarelli Mario, La Gatta Antonio, Bifulco Paolo and Fratini Antonio

Chapter 5
Evolution of Computer Vision Systems
Vladimir Grishin

Chapter 6
Binocular Vision and Depth Perception: Development and Disorders
Ken Asakawa and Hitoshi Ishikawa

Chapter 7
Repeatability of Prism Dissociation and Tangent Scale Near Heterophoria Measurements in Straightforward Gaze and in Downgaze
David A. Goss, Douglas K. Penisten, Kirby K. Pitts and Denise A. Burns

Chapter 8
Temporarily Blind in One Eye: Emotional Pictures Predominate in Binocular Rivalry
Georg W. Alpers and Antje B.M. Gerdes

Chapter 9
Stereo-Based Candidate Generation for Pedestrian Protection Systems
David Geronimo, Angel D. Sappa and Antonio M. López

Chapter 10
Development of Saccade Control
Burkhart Fischer

Short Commentary
Ocular Dominance
Jonathan S. Pointer

Index
PREFACE "Binocular vision" literally means vision with two eyes, and refers to the special attributes of vision with both eyes open, rather than one eye only. Our perception under binocular conditions represents a highly complex coordination of motor and sensory processes and is markedly different from and more sophisticated than vision with one eye alone. This book reviews our ability to use both eyes, while also providing basic information on the development of binocular vision and on the clinical disorders that interfere with our depth perception, such as strabismus and amblyopia. This book also describes the development of eye movement control, particularly those that are important for reading. In addition, the authors of this book review the phenomenon of ocular dominance (OD) in the light of the types of test used to identify it; question whether inter-test agreement of OD in an individual might be anticipated, and address some practical implications of OD as demonstrated in healthy eyes and in cases where there is compromised binocular function. Other chapters in this book disclose new methodologies in congenital nystagmus eye movements analysis and evaluate heterophoria as an important element of assessment of binocular vision disorders. Three dimensional model reconstruction from image sequences has been extensively used in recent years. The most popular method is known as structure from motion, which employs feature and dense points matching to compute the motion and depth. Chapter 1 is intended to present an overview of new trends in three dimensional model reconstruction using multiple views of object, which has been developed by the authors. Robust curve matching method in stereo cameras for extraction of unique space curves is explained. Unique space curves are constructed from plane curves in stereo images based on curvature and torsion consistency. The shortcoming of outliers in motion estimation is extremely
reduced by employing the space curves. Besides, curve matching method deals with pixel range information and does not require the sub-pixel accuracy to compute structure and motion. Furthermore, it finds the correspondence based on curve shape and does not use any photometric information. This property makes the matching process very robust against the color and intensity maladjustment of stereo rigs. The recovered space curves are employed to estimate robust motion by minimizing the curve distance in the next sequence of stereo images. An efficient structure of stereo rigs – perpendicular double stereo – is presented to increase accuracy of motion estimation. Using the robust motion information, a set of exactly calibrated virtual cameras is constructed, which the authors call space-time cameras. Then, the visual hull of object is extracted from intersection of silhouette cones of all virtual cameras. Finally, color information is mapped to the reconstructed surface by inverse projection from two dimensional image sets to three-dimensional space. All together, the authors introduce a complete automatic and practical system of three-dimensional model reconstruction from raw images of arbitrarily moving object captured by fixed calibrated perpendicular double stereo rigs to surface representation. While, the simple methods of motion estimation suffer from the statistical bias due to quantization noise, measurement error, and outliers in the input data set; the complicated system overcomes the bias problem, by fusing several constraints, even in pixellevel information. Experimental results demonstrate the privileged performance of the complicated system for a variety of object shapes and textures. Ocular dominance (OD) can be defined and identified in a variety of ways. It might be the eye used to sight or aim, or whose input is favoured when there is competing information presented to the two eyes, or the eye whose functional vision appears superior on a given task or under certain conditions. The concept, which has been the subject of much discussion and revision over the past four centuries, continues to excite controversy today. What is becoming evident is that even in its most direct and behaviourally significant manifestation – sighting preference – it must be regarded as a flexible laterality within binocular vision, influenced by the physical circumstances and viewing constraints prevailing at the point of testing. Chapter 2 will review the phenomenon of OD in the light of the types of test used to identify it; question whether inter-test agreement of OD in an individual might be anticipated; briefly consider the possibility of any relationship between OD and limb or cortical laterality; and speculate whether OD is essentially the product of forced monocular viewing conditions and habitual use of one or other eye. The chapter will conclude with remarks addressing some practical
implications of OD as demonstrated in healthy eyes and in cases where there is compromised binocular function. The authors present a review of their computer vision algorithms and binocular imaging for shape detection optical metrology. The study of Chapter 3 involves: laser metrology, binocular image processing, neural networks, and computer vision parameters. In this technique, the object shape is recovered by means of laser scanning and binocular imaging. The binocular imaging avoids occlusions, which appear due to the variation to the object surface. A Bezier approximation network computes the object surface based on the behavior of the laser line. By means of this network, the measurements of the binocular geometry are avoided. The parameters of the binocular imaging are computed based on the Bezier approximation network. Thus, the binocular images of the laser line are processed by the network to compute the object topography. By applying Bezier approximation networks, the performance of the binocular imaging and the accuracy are improved. It is because the errors of the measurement are not added to the computational procedure, which performs the shape reconstruction. This procedure represents a contribution for the stripe projection methods and the binocular imaging. To describe the accuracy a mean square error is calculated. This technique is tested with real objects and its experimental results are presented. Also, the time processing is described. Along with other diseases that can affect binocular vision, reducing the visual quality of a subject, Congenital Nystagmus (CN) is of peculiar interest. CN is an ocular-motor disorder characterized by involuntary, conjugated ocular oscillations and, while identified more than forty years ago, its pathogenesis is still under investigation. This kind of nystagmus is termed congenital (or infantile) since it could be present at birth or it can arise in the first months of life. The majority of CN patients show a considerable decrease of their visual acuity: image fixation on the retina is disturbed by nystagmus continuous oscillations, mainly horizontal. However, the image of a given target can still be stable during short periods in which eye velocity slows down while the target image is placed onto the fovea (called foveation intervals). To quantify the extent of nystagmus, eye movement recordings are routinely employed, allowing physicians to extract and analyze nystagmus main features such as waveform shape, amplitude and frequency. Use of eye movement recording, opportunely processed, allows computing “estimated visual acuity” predictors, which are analytical functions that estimate expected visual acuity using signal features such as foveation time and foveation position variability. Hence, it is fundamental to develop robust and accurate methods to measure both those parameters in order to obtain reliable values from the predictors. In this chapter the current methods to record eye movements in
subjects with congenital nystagmus will be discussed and the present techniques to accurately compute foveation time and eye position will be presented. Chapter 4 aims to disclose new methodologies in congenital nystagmus eye movements analysis, in order to identify nystagmus cycles and to evaluate foveation time, reducing the influence of repositioning saccades and data noise on the critical parameters of the estimation functions. Use of those functions extends the information acquired with typical visual acuity measurement (e.g., Landolt C test) and could be a support for treatment planning or therapy monitoring. In Chapter 5, applications of computer vision systems (CVS) in the flight control of unmanned aerial vehicles (UAV) are considered. In many projects, CVS are used for precision navigation, angular and linear UAV motion measurement, landing (in particular shipboard landing), homing guidance and others. All these tasks have been successfully solved separately in various projects. The development of perspective CVS can be divided into two stages. The first stage of perspective CVS development is the realization of all the above tasks in a single full-scale universal CVS with acceptable size, weight and power consumption. Therefore, all UAV flight control tasks can be performed in automatic mode on the base of information that is delivered by CVS. All necessary technologies exist and the degree of its maturity is high. The second stage of CVS development is integration of CVS and control systems with artificial intelligence (AI). This integration will bring two great benefits. Firstly it will allow considerable improvement of CVS performance and reliability due to accumulation of additional information about the environment. Secondly, the AI control system will obtain a high degree of awareness about the state of the environment. This allows the realization of a high degree of control effectiveness of the autonomous AI system in a fast changing and hostile environment. “Binocular vision” literally means vision with two eyes, and refers to the special attributes of vision with both eyes open, rather than one eye only. Our perception under binocular conditions represents a highly complex coordination of motor and sensory processes and is markedly different from and more sophisticated than vision with one eye alone. However, the use of a pair of eyes can be disrupted by a variety of visual disorders, e.g., incorrect coordination between the two eyes can produce strabismus with its associated sensory problems, amblyopia, suppression and diplopia. What, then, is the reason for-and the advantage of-having two eyes? From our visual information input, we can perceive the world in three dimensions even though the images falling on our two retinas are only two-dimensional. How is this accomplished? Chapter 6 is a review of our ability to use both eyes, while also providing basic information on
the development of binocular vision and on the clinical disorders that interfere with our depth perception, such as strabismus and amblyopia. The evaluation of heterophoria is an important element of assessment of binocular vision disorders. Chapter 7 examined the interexaminer repeatability of two heterophoria measurement methods in a gaze position with no vertical deviation from straightforward position and in 20 degrees downgaze. The two procedures were von Graefe prism dissociation method (VG) and the tangent scale method commonly known as the modified Thorington test (MT). Serving as subjects were 47 young adults, 22 to 35 years of age. Testing distance was 40 cm. A coefficient of repeatability was calculated by multiplying the standard deviation of the difference between the results from two examiners by 1.96. Coefficients of repeatability in prism diopter units were: VG, straightforward, 6.6; VG, downgaze, 6.2; MT, straightforward, 2.8; MT, downgaze, 3.6. The results show a better repeatability for the tangent scale procedure than for the von Graefe prism dissociation method. As explained in Chapter 8, preferential perception of emotional cues may help an individual to respond quickly and effectively to relevant events. Existing data supports this hypothesis by demonstrating that emotional cues are more quickly detected among neutral distractors. Little data is available to demonstrate that emotional stimuli are also preferentially processed during prolonged viewing. The preferential perception of visual emotional cues is apparent under conditions where different cues compete for perceptual dominance. When two incompatible pictures are presented to one eye each, this results in a perceptual alternation between the pictures, such that only one picture is visible while the other is suppressed. This so called binocular rivalry involves different stages of early visual processing and is thought to be relatively independent from intentional control. Several studies from our laboratory showed that emotional stimuli predominate over neutral stimuli in binocular rivalry. These findings can be interpreted as evidence for preferential processing of emotional cues within the visual system, which extends beyond initial attentional capture. Taken together, data from this paradigm demonstrates that emotional pictures are perceived more intensively. Chapter 9 describes a stereo-based algorithm that provides candidate image windows to a latter 2D classification stage in an on-board pedestrian detection system. The proposed algorithm, which consists of three stages, is based on the use of both stereo imaging and scene prior knowledge (i.e., pedestrians are on the ground) to reduce the candidate searching space. First, a successful road surface fitting algorithm provides estimates on the relative ground-camera pose. This stage directs the search toward the road area thus avoiding irrelevant regions like
the sky. Then, three different schemes are used to scan the estimated road surface with pedestrian-sized windows: (a) uniformly distributed through the road surface (3D); (b) uniformly distributed through the image (2D); (c) not uniformly distributed but according to a quadratic function (combined 2D- 3D). Finally, the set of candidate windows is reduced by analyzing their 3D content. Experimental results of the proposed algorithm, together with statistics of searching space reduction are provided. Chapter 10 describes the development of eye movement control. The authors will consider, however, only those aspects of eye movements that are important for reading: stability of fixation and control of saccades (fast eye movements from one object of interest to another). The saccadic reflex and the control of saccades by voluntary conscious decision and their role in the optomotor cycle will be explained on the basis of the reaction times and neurophysiological evidence. The diagnostic methods used in the next part of the book will be explained in this chapter. The age curves of the different variables show that the development of the voluntary component of saccade control lasts until adulthood. The Short Commentary discusses ocular dominance and the rationale behind this phenomenon.
In: Binocular Vision Editors: J. McCoun et al, pp. 1-62
ISBN: 978-1-60876-547-8 © 2010 Nova Science Publishers, Inc.
Chapter 1
NEW TRENDS IN SURFACE RECONSTRUCTION USING SPACE-TIME CAMERAS: FUSING STRUCTURE FROM MOTION, SILHOUETTE, AND STEREO

Hossein Ebrahimnezhad¹ and Hassan Ghassemian²
¹ Sahand University of Technology, Dept. of Electrical Engineering, Computer Vision Research Lab, Tabriz, Iran
² Tarbiat Modaress University, Dept. of Electrical and Computer Engineering, Tehran, Iran
E-mail addresses: [email protected], [email protected]. Web address: http://ee.sut.ac.ir/showcvdetail.aspx?id=5

Abstract

Three-dimensional model reconstruction from image sequences has been used extensively in recent years. The most popular method is known as structure from motion, which employs feature and dense point matching to compute the motion and depth. This chapter is intended to present an overview of new trends in three-dimensional model reconstruction using multiple views of the object, which has been developed by the authors [43]. A robust curve matching method for extracting unique space curves from stereo cameras is explained. Unique space curves are constructed from plane curves in stereo images based on curvature and torsion consistency. The shortcoming of outliers in motion estimation is greatly reduced by employing the space curves. Besides, the curve matching method deals with pixel-range information and does not require sub-pixel accuracy to compute structure and motion. Furthermore, it finds the correspondence based on curve shape and does not use any photometric information. This property makes the matching process very robust against color and intensity maladjustment of the stereo rigs. The recovered space curves are employed to estimate robust motion by minimizing the curve distance in the next frames of the stereo sequence. An efficient structure of stereo rigs, perpendicular double stereo, is presented to increase the accuracy of motion estimation. Using the robust motion information, a set of exactly calibrated virtual cameras is constructed, which we call space-time cameras. Then, the visual hull of the object is extracted from the intersection of the silhouette cones of all virtual cameras. Finally, color information is mapped onto the reconstructed surface by inverse projection from the two-dimensional image sets to three-dimensional space. Altogether, we introduce a completely automatic and practical system of three-dimensional model reconstruction, from raw images of an arbitrarily moving object captured by fixed calibrated perpendicular double stereo rigs to a surface representation. While simple motion estimation methods suffer from statistical bias due to quantization noise, measurement error, and outliers in the input data set, the proposed system overcomes the bias problem by fusing several constraints, even at the pixel level. Experimental results demonstrate the superior performance of the system for a variety of object shapes and textures.
Keywords: 3D model reconstruction; space-time cameras; perpendicular double stereo; structure from silhouette; structure from motion; space curves; unique points; visual hull.
1. Introduction

Reconstruction of a surface model for a moving rigid object, through a sequence of photo images, is a challenging problem and an active research topic in computer vision. In recent years, there has been extensive focus in the literature on recovering three-dimensional structure and motion from image sequences [1-6]. Different types of algorithms are used because of the wide range of options, e.g., the image projection model, the number of cameras and available views, the availability of camera calibration, the feature types and the model of the scene. For a fixed object with a moving camera (or a moving rigid object with a fixed camera), the shape and motion recovery problem can be formulated as finding the 6 motion parameters of the object, i.e. its position and orientation displacement, together with the accurate 3D world coordinates of each point. This problem is also known as bundle adjustment [7]. The standard method of rigid motion
recovery has been developed in the last decade based on sparse feature points [8-9]. The sparse method typically assumes that correspondences between scene features such as corners or surface creases have been established by a tracking technique. It can compute only the traveling camera positions, and is not sufficient for modeling the object, as it only reconstructs sparsely distributed 3D points. Typically, motion estimation methods suffer from instability due to quantization noise, measurement errors, and outliers in the input datasets. Outliers occur in the feature-matching process, mostly due to occlusions. Different robust estimation techniques have been proposed to handle outliers. RANdom SAmple Consensus (RANSAC) is known as a successful technique for dealing with outliers [10]. M-estimators reduce the effects of outliers by applying weighted least squares [11]. Many other similar methods are also available [12-13]. Another standard method of shape recovery from motion has been developed in the last decade based on optical flow [1, 2, 14]. The process of structure and motion recovery usually consists of the minimization of some cost function, and there are two dominant approaches to choosing this cost function. The first approach is based on epipolar geometry, leading to a decoupling of the shape and motion recovery. In the epipolar constraint approach, the cost function reflects the amount of deviation from the epipolar constraint caused by noise and other measurement errors. In this method, the motion information can be obtained as the solution to a linear problem. The presence of statistical bias in estimating the translation [15-17], as well as the sensitivity to noise and pixel quantization, is the conventional drawback, which introduces additional error into the linear solution. Even small pixel-level perturbations can make the image plane information ineffective and cause wrong motion recovery. To improve the solution, some methods minimize the cost function using nonlinear iterative methods such as the Levenberg-Marquardt algorithm [8]. Such methods are initialized with the output of the linear algorithms. The second approach directly minimizes the difference between observed and predicted feature coordinates using the Levenberg-Marquardt algorithm [18]. This method is marked by a high-dimensional search space (typically n+6 for n image correspondences) and, unlike the epipolar constraint-based approach, it does not explicitly account for the fact that a one-parameter family of solutions exists. In general, structure and motion from monocular image sequences is an inherently difficult problem and has its own restrictions, as the computations are very sensitive to noise and quantization of the image points. Moreover, the motion and structure computations are highly dependent on each other, and any ambiguity in the structure computation propagates to the motion computation and vice versa. On the other hand, calibrated stereo vision directly computes the structure of feature
points. Therefore, integrating stereo and motion can reasonably improve the structure and motion procedure. Some works in literature fuse stereo and motion for rigid scene to get better results. Young et al. [19] computed the rigid motion parameters assuming that depth information had been computed already by stereo vision. Weng et al. [20] derived a closed form approximate matrix weighted least squares solution for motion parameters from three-dimensional point correspondences in two stereo image pairs. Li et al. [21] proposed a two-step fusing procedure: first, translational motion parameters were found from optical flows in binocular images, then the stereo correspondences were estimated with the knowledge of translational motion parameters. Dornaika et al. [22] recovered the stereo correspondence using motion of a stereo rig in two consecutive steps. The first step uses metric data associated with the stereo rig while the second step employs feature correspondences only. Ho et al. [23] combined stereo and motion analyses for three-dimensional reconstruction when a mobile platform was captured with two fixed cameras. Park et. al [24] estimated the object motion directly through the calibrated stereo image sequences. Although, the combination form of motion and stereo enhances the computations [25], presence of statistical bias in estimating the motion parameters still has destructive effect in structure and motion estimation procedure. In this chapter, a constructive method is presented to moderate the bias problem using curve based stereo matching and robust motion estimation by tracking the projection of space curves in perpendicular double stereo images. We prove mathematically and demonstrate experimentally that the presented method can increase motion estimation accuracy and reduce the problem of statistical bias. Moreover, the perpendicular double stereo setup appears to be more robust against the perturbation of edge points. Any large error in depth direction of stereo rig 1 is restricted by minimizing the error in parallel direction of stereo rig 2 and vice versa. In addition, the curve-matching scheme is very robust against the color maladjustment of cameras and shading problem during object motion. In section 2, a robust edge point correspondence with match propagation along the curves is presented to extract unique space curves with extremely reduced number of outliers. In section 3, two sets of space curves, which have been extracted from two distinct stereo rigs, are used to estimate object motion in sequential frames. A space curve-tracking algorithm is presented by minimizing the geometric distance of moving curves in camera planes. The proposed curvetracking method works as well with pixel accuracy information and does not require the complicated process of position computation in sub-pixel accuracy. An efficient structure of stereo setup - the perpendicular double stereo - is presented to get as much accuracy as possible in motion estimation process. Its properties
are discussed and proven mathematically. In section 4, a set of calibrated virtual cameras is constructed from the motion information. This goal is achieved by treating the moving object as a fixed object and the fixed camera as a camera moving in the opposite direction. To utilize the benefits of multiview reconstruction, the moving object is supposed to be fixed and the camera is moved in the opposite direction, so the new virtual cameras are constructed as real calibrated cameras around the object. In section 5, the object's visual hull is recovered as finely as possible by intersecting the large number of cones established by the silhouettes of the multiple views. A hierarchical method is presented to extract the visual hull of the object as bounding edges. In section 6, experimental results with both synthetic and real objects are presented. We conclude in section 7 with a brief discussion.

The total procedure of three-dimensional model reconstruction, from raw images to a fine 3D model, is done in the following steps:

Step 1- Extraction of unique space curves on the surface of the rigid object from calibrated stereo image information, based on the curvature and torsion consistency of the established space curves during rigid motion.
Step 2- Object tracking and rigid motion estimation by curve-distance minimization in the projections of the space curves onto the image planes of the perpendicular double stereo rigs.
Step 3- Construction of as many virtual calibrated cameras as required from the fixed real camera information and the rigid motion information.
Step 4- Reconstruction of the object's visual hull by intersecting the cones originating from the silhouettes of the object in the virtual cameras across time (space-time cameras).
Step 5- Texture mapping to every point of the visual hull through the visible virtual cameras.
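The five steps above can be read as a linear processing pipeline. The following Python skeleton is only an organizational sketch of that pipeline; the function names, signatures, and data containers are hypothetical and are not taken from the authors' implementation.

```python
# Organizational sketch only; names, signatures and containers are hypothetical
# and do not reproduce the authors' implementation.
def extract_unique_space_curves(stereo_frames, calibration):
    """Step 1: curve-based stereo matching with curvature/torsion consistency checks."""
    raise NotImplementedError

def estimate_rigid_motion(space_curves, next_stereo_frames, calibration):
    """Step 2: R, T from curve-distance minimization in both perpendicular rigs."""
    raise NotImplementedError

def make_space_time_cameras(calibration, motions):
    """Step 3: fixed real cameras plus per-frame motion -> calibrated virtual cameras."""
    raise NotImplementedError

def reconstruct_visual_hull(silhouettes, virtual_cameras):
    """Step 4: intersect the silhouette cones of all space-time cameras."""
    raise NotImplementedError

def map_texture(visual_hull, images, virtual_cameras):
    """Step 5: back-project color from the visible views onto the hull surface."""
    raise NotImplementedError

def reconstruct(frames, silhouettes, calibration):
    """Chain the five steps over a captured perpendicular double-stereo sequence."""
    curves = extract_unique_space_curves(frames[0], calibration)
    motions = [estimate_rigid_motion(curves, f, calibration) for f in frames[1:]]
    cameras = make_space_time_cameras(calibration, motions)
    hull = reconstruct_visual_hull(silhouettes, cameras)
    return map_texture(hull, frames, cameras)
```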
2. Reconstruction of Space Curves on the Surface of the Object

Deciding on correspondences is the main difficulty of stereo vision. Components in the left image should be matched to those of the right one to compute disparity and thus depth. Several constraints, such as intensity correlation, epipolar geometry, ordering, depth continuity and local orientation differences, have been proposed to improve the matching process, but their success has been limited. There are many situations where it is not possible to find point-like features such as corners or wedges. Then there is a need to deal, e.g.,
with silhouette of the object instead of sparse local features. Besides, there exist objects, which cannot be represented adequately by primitive object features as points, lines, or circles. Moreover, pose estimations of global object descriptions are, statistically, more accurate and robust than those from a sparse set of local features. Whereas point features reveal little about surface topology, space curves provide such geometrical cues. Therefore, we focus on the space curves to develop an inverse problem approach. Robert and Faugeras presented an edge-based trinocular stereo algorithm using geometric matching principles [26]. They showed that given the image curvature of corresponding curve points in two views, it is possible to predict the curvature in the third one and use that as a matching criterion. Schmid and Zisserman offered an extension to Robert method by fusing the photometric information and edge information to reduce the outlier matches [27]. Both these approaches apply many heuristics. Han and Park developed a curve-matching algorithm based on geometric constraints [28]. They apply epipolar constraint between two sets of curves and compute corresponding points on the curves. From the initial epipolar constraints obtained from corner point matching, candidate curves are selected according to the epipolar geometry, curve-end constraints, and curve distance measures. Assuming that the corresponding curves in stereo images are rather similar, they apply curve distance measure as a constraint of curve matching. In general, this assumption is not true, as it will be discussed in section 2.2. Kahl and August developed an inverse method to extract space curves from multiple view images [29]. Instead of first seeking a correspondence of image structure and then computing 3D structure, they seek the space curve that is consistent with the observed image curves. By minimizing the potential associated to prior knowledge of space curves, i.e. average curvature, and potential associated to the image formation model, they look for the candidate space curves. The main deficiency of this method is that the relative motion of the cameras is assumed to be known.
2.1. Differential Geometry of Space Curves

Let P = (X, Y, Z) be a point whose position in space is given by the equations X = f(s), Y = g(s) and Z = h(s), where f, g, and h are differentiable functions of s.
As s varies continuously, P traces a curve in space. The differential geometry of curves traditionally begins with a vector R ( s) = X ( s )i +Y ( s ) j+ Z ( s )k that describes the curve parametrically as a function of s that is at least thrice differentiable.
Then the tangent vector T(s) is well-defined at every point R(s) and we may choose two additional orthogonal vectors in the plane perpendicular to T(s) to form a complete local orientation frame (see figure 1). We can choose this local coordinate system to be the Frenet frame consisting of the tangent T(s), the principal normal N(s) and the binormal B(s), which are given in terms of the curve itself:

$$\mathbf{T}(s) = \frac{\mathbf{R}'(s)}{\left\|\mathbf{R}'(s)\right\|}\,;\qquad \mathbf{B}(s) = \frac{\mathbf{R}'(s)\times\mathbf{R}''(s)}{\left\|\mathbf{R}'(s)\times\mathbf{R}''(s)\right\|}\,;\qquad \mathbf{N}(s) = \mathbf{B}(s)\times\mathbf{T}(s) \tag{1}$$

Differentiating the Frenet frame yields the classic Frenet equations:

$$\begin{bmatrix}\mathbf{T}'(s)\\ \mathbf{N}'(s)\\ \mathbf{B}'(s)\end{bmatrix} = \left\|\mathbf{R}'(s)\right\| \begin{bmatrix}0 & \kappa(s) & 0\\ -\kappa(s) & 0 & \tau(s)\\ 0 & -\tau(s) & 0\end{bmatrix} \begin{bmatrix}\mathbf{T}(s)\\ \mathbf{N}(s)\\ \mathbf{B}(s)\end{bmatrix} \tag{2}$$

Here, κ(s) and τ(s) are the curvature and torsion of the curve, respectively, which may be written in terms of the curve itself:

$$\kappa(s) = \left\|\frac{d\mathbf{T}}{ds}\right\| = \frac{\left\|\mathbf{R}'(s)\times\mathbf{R}''(s)\right\|}{\left\|\mathbf{R}'(s)\right\|^{3}} \tag{3}$$

$$\tau(s) = -\frac{d\mathbf{B}}{ds}\cdot\mathbf{N}(s) = \frac{\left(\mathbf{R}'(s)\times\mathbf{R}''(s)\right)\cdot\mathbf{R}'''(s)}{\left\|\mathbf{R}'(s)\times\mathbf{R}''(s)\right\|^{2}} \tag{4}$$

Considering the vector R(s) = X(s)i + Y(s)j + Z(s)k, Eq. 3 and Eq. 4 can be written in coordinate form as:

$$\kappa = \frac{\sqrt{(\dot{Y}\ddot{Z}-\ddot{Y}\dot{Z})^{2} + (\dot{Z}\ddot{X}-\ddot{Z}\dot{X})^{2} + (\dot{X}\ddot{Y}-\ddot{X}\dot{Y})^{2}}}{\left(\dot{X}^{2}+\dot{Y}^{2}+\dot{Z}^{2}\right)^{3/2}} \tag{5}$$

$$\tau = \frac{\dddot{X}\,(\dot{Y}\ddot{Z}-\ddot{Y}\dot{Z}) + \dddot{Y}\,(\dot{Z}\ddot{X}-\ddot{Z}\dot{X}) + \dddot{Z}\,(\dot{X}\ddot{Y}-\ddot{X}\dot{Y})}{(\dot{Y}\ddot{Z}-\ddot{Y}\dot{Z})^{2} + (\dot{Z}\ddot{X}-\ddot{Z}\dot{X})^{2} + (\dot{X}\ddot{Y}-\ddot{X}\dot{Y})^{2}} \tag{6}$$
Figure 1. Space curve geometry in Frenet frame.
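As a numerical companion to Eq. 1 through Eq. 6, the sketch below estimates curvature and torsion of a sampled space curve by finite differences using NumPy; the helix test curve and the sampling density are illustrative assumptions, not data from the chapter.

```python
import numpy as np

def curvature_torsion(points):
    """Finite-difference estimate of curvature (Eq. 5) and torsion (Eq. 6)
    along a space curve sampled as an (N, 3) array of ordered points."""
    d1 = np.gradient(points, axis=0)          # R'
    d2 = np.gradient(d1, axis=0)              # R''
    d3 = np.gradient(d2, axis=0)              # R'''
    cross = np.cross(d1, d2)                  # R' x R''
    cross_sq = np.sum(cross ** 2, axis=1)
    speed_sq = np.sum(d1 ** 2, axis=1)
    kappa = np.sqrt(cross_sq) / np.maximum(speed_sq ** 1.5, 1e-12)
    tau = np.sum(cross * d3, axis=1) / np.maximum(cross_sq, 1e-12)
    return kappa, tau

# Sanity check on a circular helix, whose curvature and torsion are constant:
a, b = 2.0, 0.5
t = np.linspace(0.0, 4.0 * np.pi, 400)
helix = np.column_stack([a * np.cos(t), a * np.sin(t), b * t])
kappa, tau = curvature_torsion(helix)
print(kappa[200], a / (a ** 2 + b ** 2))   # both ~0.47
print(tau[200], b / (a ** 2 + b ** 2))     # both ~0.12
```

Because both formulas are ratios of derivative products, the estimates are independent of the sampling parameterization, which is why simple index-based differences suffice here.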
2.2. Inverse Problem Formulation Given a sequence of edge curves of a moving object in the calibrated stereo rig, the problem is to extract the space curves on the surface of the object. It is obvious that the plane curve is established by projection of the space curve to camera plane. As it is shown in figure 2, projections of space curve in two different camera planes do not necessarily have the same shape. In the small base line stereo setup, the assumption of shape similarity for correspondent curves will be reasonable because of the small variation of viewpoint. However, in general, for any curve pair in two camera planes we can find one space curve by intersecting the projected rays from plane curves into space through the camera centers. Therefore, the inverse problem of determining the space curve from plane curves is an ill posed problem. To find a way out to this problem, we consider the fundamental theorem of space curves as: Theorem 1. If two single-valued continuous functions κ ( s) and τ ( s) are given for s > 0 , then there will exist exactly one space curve determined except for orientation and position in space, i.e. up to a Euclidean motion, where s is the arc length, κ is the curvature, and τ is the torsion [30]. The fundamental theorem illustrates that the parameters κ ( s) and τ ( s) are intrinsic characteristics of space curve that do not change when the curve moves
in the space. This cue leads us to propose a new method of space curve matching based on curvature and torsion similarity through the curve length. The space curves that fit in the surface of rigid object must be consistent in curvature and torsion during the object movement. As illustrated in figure 2, each pair of curves in stereo cameras can define one space curve. However, for any curve in the left camera there is only one true match in the right camera. Hence, only one of the established space curves fits to the surface of the object and the others are outliers (or invalid space curves). In the following, we present a new method to determine the true pair match between different curves.
Figure 2. Space Curve is established by intersection of the rays projected from image curves into the space through the camera centers. Any pair of curves in the left and right camera images can make one space curve.
Proposition 1. Among the different pairs of curves in the left and right images of a calibrated stereo rig, the true match is the only pair whose associated space curve is consistent in curvature and torsion during motion of the curve (or of the stereo rig).

Proof: For any pair of curves in the stereo cameras, we can establish one space curve by projecting them into 3D space through the related camera centers and intersecting the projected rays by triangulation (figure 2). Based on the fundamental theorem, if the intrinsic characteristics of the space curve, i.e. κ(s) and τ(s), are consistent during movement, then all points of the curve must share the same motion parameters. In other words, we can find a fixed motion matrix, i.e. R and T, which transforms all points of the space curve to their new positions. Therefore, if we show that it is impossible to find such a fixed motion matrix for all points of an invalid space curve during motion, the proof will be complete. To simplify the proof, we deal with the rectified stereo configuration (figure 3). Any other configuration can easily be converted to this configuration.
Figure 3. Reconstructed 3D points from different pairs of points in the left and right camera images in the rectified stereo configuration. The horizontal scan line is the epipolar line.
Applying the constraint of the horizontal scan line as the epipolar line, the following equations can be easily derived:

$$P^{(i)} = \left[X^{(i)},\,Y^{(i)},\,Z^{(i)}\right]^{t} = \frac{T_x}{x_L^{(i)}-x_R^{(i)}}\left[x_L^{(i)},\,y_L^{(i)},\,1\right]^{t} \tag{7}$$

$$P^{(ij)} = \left[X^{(ij)},\,Y^{(ij)},\,Z^{(ij)}\right]^{t} = \frac{T_x}{x_L^{(i)}-x_R^{(j)}}\left[x_L^{(i)},\,y_L^{(i)},\,1\right]^{t} \tag{8}$$

$$x_L^{(i)} = \frac{X^{(i)}}{Z^{(i)}} = \frac{X^{(ij)}}{Z^{(ij)}}\,;\qquad x_L^{(j)} = \frac{X^{(j)}}{Z^{(j)}} = \frac{X^{(ji)}}{Z^{(ji)}}\,;\qquad x_R^{(i)} = \frac{X^{(i)}-T_x}{Z^{(i)}} = \frac{X^{(ji)}-T_x}{Z^{(ji)}}\,;\qquad x_R^{(j)} = \frac{X^{(j)}-T_x}{Z^{(j)}} = \frac{X^{(ij)}-T_x}{Z^{(ij)}}$$
$$y_L^{(i)} = y_R^{(i)} = y_L^{(j)} = y_R^{(j)} = \frac{Y^{(i)}}{Z^{(i)}} = \frac{Y^{(ij)}}{Z^{(ij)}} = \frac{Y^{(j)}}{Z^{(j)}} = \frac{Y^{(ji)}}{Z^{(ji)}} \tag{9}$$
Suppose that P^(i) is the valid three-dimensional point, which has been reconstructed from the true match pair (x_L^(i), x_R^(i)), and P^(ij) is the invalid three-dimensional point, which has been reconstructed from the false match pair (x_L^(i), x_R^(j)). Based on the fundamental theorem of space curves, we can find fixed matrices R and T which transform any point P^(i) of the valid space curve to its new position P_m^(i) after movement:

$$\forall s \in \mathrm{SpaceCurve}^{(i)}:\qquad P_m^{(i)}(s) = R\cdot P^{(i)}(s) + T \tag{10}$$

In the next step, we should inspect whether there are other fixed matrices R' and T' which transform any point P^(ij) of the invalid space curve to its new position P_m^(ij) after movement. The following equations can be easily extracted:

$$P_m^{(ij)} = \left[X_m^{(ij)},\,Y_m^{(ij)},\,Z_m^{(ij)}\right]^{t} = \frac{T_x}{x_{Lm}^{(i)}-x_{Rm}^{(j)}}\left[x_{Lm}^{(i)},\,y_{Lm}^{(i)},\,1\right]^{t} = \frac{T_x}{\dfrac{X_m^{(i)}}{Z_m^{(i)}}-\dfrac{X_m^{(j)}-T_x}{Z_m^{(j)}}}\left[\frac{X_m^{(i)}}{Z_m^{(i)}},\,\frac{Y_m^{(i)}}{Z_m^{(i)}},\,1\right]^{t}$$
$$= \left[\frac{T_x}{1-\dfrac{Z_m^{(i)}}{X_m^{(i)}}\cdot\dfrac{X_m^{(j)}-T_x}{Z_m^{(j)}}},\;\; \frac{T_x}{\dfrac{X_m^{(i)}}{Y_m^{(i)}}-\dfrac{Z_m^{(i)}}{Y_m^{(i)}}\cdot\dfrac{X_m^{(j)}-T_x}{Z_m^{(j)}}},\;\; \frac{T_x}{\dfrac{X_m^{(i)}}{Z_m^{(i)}}-\dfrac{X_m^{(j)}-T_x}{Z_m^{(j)}}}\right]^{t} \tag{11}$$

$$R = \left[\mathbf{R}_1^{t},\,\mathbf{R}_2^{t},\,\mathbf{R}_3^{t}\right]^{t};\quad T = \left[T_1,\,T_2,\,T_3\right]^{t} \;\;\Rightarrow\;\; X_m^{(i)} = \mathbf{R}_1^{t}\cdot P^{(i)} + T_1,\quad Y_m^{(i)} = \mathbf{R}_2^{t}\cdot P^{(i)} + T_2,\quad Z_m^{(i)} = \mathbf{R}_3^{t}\cdot P^{(i)} + T_3 \tag{12}$$

Combining Eq. 11 and Eq. 12 results in:

$$P_m^{(ij)} = \left[\frac{T_x}{1-\dfrac{\mathbf{R}_3^{t}P^{(i)}+T_3}{\mathbf{R}_1^{t}P^{(i)}+T_1}\cdot\dfrac{\mathbf{R}_1^{t}P^{(j)}+T_1-T_x}{\mathbf{R}_3^{t}P^{(j)}+T_3}},\;\; \frac{T_x}{\dfrac{\mathbf{R}_1^{t}P^{(i)}+T_1}{\mathbf{R}_2^{t}P^{(i)}+T_2}-\dfrac{\mathbf{R}_3^{t}P^{(i)}+T_3}{\mathbf{R}_2^{t}P^{(i)}+T_2}\cdot\dfrac{\mathbf{R}_1^{t}P^{(j)}+T_1-T_x}{\mathbf{R}_3^{t}P^{(j)}+T_3}},\;\; \frac{T_x}{\dfrac{\mathbf{R}_1^{t}P^{(i)}+T_1}{\mathbf{R}_3^{t}P^{(i)}+T_3}-\dfrac{\mathbf{R}_1^{t}P^{(j)}+T_1-T_x}{\mathbf{R}_3^{t}P^{(j)}+T_3}}\right]^{t} \tag{13}$$
P^(i) and P^(j) can be written as functions of P^(ij) using Eq. 9:
$$P^{(i)} = \left[X^{(i)},\,Y^{(i)},\,Z^{(i)}\right]^{t} = Z^{(i)}\left[\frac{X^{(i)}}{Z^{(i)}},\,\frac{Y^{(i)}}{Z^{(i)}},\,1\right]^{t} = Z^{(i)}\left[\frac{X^{(ij)}}{Z^{(ij)}},\,\frac{Y^{(ij)}}{Z^{(ij)}},\,1\right]^{t} = \frac{Z^{(i)}}{Z^{(ij)}}\left[X^{(ij)},\,Y^{(ij)},\,Z^{(ij)}\right]^{t} = \frac{Z^{(i)}}{Z^{(ij)}}\,P^{(ij)} \tag{14}$$

$$P^{(j)} = \left[X^{(j)},\,Y^{(j)},\,Z^{(j)}\right]^{t} = \frac{Z^{(j)}}{Z^{(ij)}}\left[X^{(ij)},\,Y^{(ij)},\,Z^{(ij)}\right]^{t} + \left[T_x\!\left(1-\frac{Z^{(j)}}{Z^{(ij)}}\right),\,0,\,0\right]^{t} = \frac{Z^{(j)}}{Z^{(ij)}}\,P^{(ij)} + T_x\!\left(1-\frac{Z^{(j)}}{Z^{(ij)}}\right)\left[1,\,0,\,0\right]^{t} \tag{15}$$
Pm( ij )
⎡ ⎢ ⎢ ⎢ ⎢ ⎢ Tx ⎢ (i ) i) ( ⎛ Z (j) ⎞ Z ⎢ t ( ij ) Z t ( ij ) R P + R T ⎜⎜ 1 − ( ij ) ⎟⎟ + T1 −T x 1 11 x R P + T ij ( ) ⎢ 3 ( ij ) 3 Z ⎝ Z ⎠ ⎢ 1 − Z (i ) ⋅ (i ) ⎢ ⎛ Z (j) ⎞ Z Z R t P ( ij ) + T1 R t P ( ij ) + R 31T x ⎜⎜ 1 − ( ij ) ⎟⎟ + T 3 ⎢ ( ij ) 1 ( ij ) 3 Z Z ⎝ Z ⎠ ⎢ ⎢ Tx =⎢ ⎛ Z ( j) ⎞ ⎢ Z (i ) Z ( i ) t ( ij ) Z ( i ) t ( ij ) R P + R11T x ⎜⎜ 1 − ( ij ) ⎟⎟ + T1 −T x ⎢ ( ij ) R1t P ( ij ) + T 1 R P +T 3 ( ij ) 1 ( ij ) 3 Z ⎝ Z ⎠ ⎢Z − Z (i ) ⋅ ⎢ Z ( i ) t ( ij ) ⎛ Z (j) ⎞ Z Z ( i ) t ( ij ) ( ij ) t R 2P +T 2 ⎢ ( ij ) R 2 P + T 2 R P + R 31T x ⎜⎜1 − ( ij ) ⎟⎟ + T 3 ( ij ) ( ij ) 3 Z Z Z ⎢ ⎝ Z ⎠ ⎢ Tx ⎢ ⎢ ⎛ Z (j) ⎞ Z ( i ) t ( ij ) Z ( i ) t ( ij ) R P + R11T x ⎜⎜1 − ( ij ) ⎟⎟ + T1 −T x ⎢ R1 P +T1 ( ij ) 1 ij ) ( Z ⎢ ⎝ Z ⎠ Z − ⎢ (i ) (i ) ⎛ Z (j) ⎞ Z Z ij ) ( t ij ( ) t ⎢ R P +T 3 R3 P + R 31T x ⎜⎜1 − ( ij ) ⎟⎟ +T 3 ( ij ) 3 ij ) ( Z Z ⎝ Z ⎠ ⎣⎢
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ i ⎥ ⎛ Z() ⎥ = f ⎜ P ( ij ) , R , T, (ij ) ⎜ ⎥ Z ⎝ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦
⎞ ⎟⎟ ⎠
(16) Eq.16 clarifies that Pm(ij ) is a nonlinear function of P (ij ) and we cannot find the fixed rotation and translation matrices that transform all points of P (ij ) to Pm(ij ) . Moreover, the elements of Pm(ij ) depend on
Z (i ) Z ( ij )
which may vary for any point of
Therefore, we cannot find a fixed motion matrix for all points of an invalid space curve, and the proof is complete. In the special situation where Z^(i) = Z^(ij), Eq. 16 can be simplified as:

$$P_m^{(ij)} = \left[\frac{T_x}{1-\dfrac{\mathbf{R}_1^{t}P^{(ij)}+T_1-T_x}{\mathbf{R}_1^{t}P^{(ij)}+T_1}},\;\; \frac{T_x}{\dfrac{\mathbf{R}_1^{t}P^{(ij)}+T_1}{\mathbf{R}_2^{t}P^{(ij)}+T_2}-\dfrac{\mathbf{R}_1^{t}P^{(ij)}+T_1-T_x}{\mathbf{R}_2^{t}P^{(ij)}+T_2}},\;\; \frac{T_x}{\dfrac{\mathbf{R}_1^{t}P^{(ij)}+T_1}{\mathbf{R}_3^{t}P^{(ij)}+T_3}-\dfrac{\mathbf{R}_1^{t}P^{(ij)}+T_1-T_x}{\mathbf{R}_3^{t}P^{(ij)}+T_3}}\right]^{t} = \begin{bmatrix}\mathbf{R}_1^{t}P^{(ij)}+T_1\\ \mathbf{R}_2^{t}P^{(ij)}+T_2\\ \mathbf{R}_3^{t}P^{(ij)}+T_3\end{bmatrix} = R\,P^{(ij)} + T \tag{17}$$

Referring to figure 3, this condition can occur if and only if curve i and curve j in both stereo images are identical (i = j). In the proof, we assumed that the space curve is not occluded during movement. This assumption is achievable for a proper curve length and a small amount of motion. Figure 4 illustrates the proof graphically. Two fixed space curves are captured by a moving stereo rig in a 3D Studio Max environment. In parts (a) and (b), the stereo images are shown before and after motion of the rig. Parts (c) and (d) display the space curves established by different pairs of curves before and after motion. Part (e) illustrates that the valid space curve established by the true match is consistent in shape during the movement, but the invalid space curve established by a false match is not. The presented curve matching method, which applies shape consistency of the space curve during motion, does not rely on any shape similarity between the plane curves in the stereo images. So, it can be used effectively to extract space curves from a wide-baseline stereo setup, where the projections of the space curve in the stereo cameras do not have similar shapes. Consequently, we can obtain more precise depth values, as illustrated in figure 5. On the other hand, a wide-baseline stereo setup intensifies occlusion, which can make the curve matching inefficient. Therefore, there is a tradeoff between occlusion and depth accuracy in choosing the proper length of the baseline.
Figure 4. Stereo curve matching by curvature and torsion consistency of the established space curve during motion: (a) stereo images before motion, (b) stereo images after motion, (c) established space curves from different curve pairs before motion, (d) established space curves from different curve pairs after motion and (e) determining the true matches from the space curve that stays consistent in curvature and torsion during motion.
Figure 5. Stereo triangulation and uncertainty in depth. The true point can lie anywhere inside the shaded uncertainty region. Uncertainty in depth ΔH is reduced by increasing the length of the baseline b (in the formulation, it is assumed that the maximum matching error is one pixel in every camera image). In practice, the right configuration is employed to make efficient use of the camera plane.
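The depth and depth-uncertainty relations sketched in figure 5 follow directly from the rectified-stereo triangulation of Eq. 7. The snippet below illustrates them for one matched edge point; the focal length, baseline and pixel coordinates are invented for the example and are not data from the chapter.

```python
import numpy as np

def depth_from_disparity(x_left_px, x_right_px, focal_px, baseline_m):
    """Rectified-stereo depth (cf. Eq. 7): Z = f * b / (x_L - x_R)."""
    return focal_px * baseline_m / (x_left_px - x_right_px)

# Illustrative numbers, not data from the chapter:
f_px = 800.0            # focal length in pixels
b = 0.12                # baseline in metres
xl, xr = 413.0, 389.0   # matched edge positions on the same scan line

Z = depth_from_disparity(xl, xr, f_px, b)
# Worst-case depth change when the match is off by one pixel (shaded region of figure 5):
dZ = depth_from_disparity(xl, xr + 1.0, f_px, b) - Z
print(f"Z = {Z:.2f} m, one-pixel depth uncertainty = {dZ:.2f} m")
# Doubling the baseline b roughly halves this uncertainty at the same depth, at the
# cost of more occlusion, which is the trade-off discussed in the text.
```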
2.3. Reconstruction of Unique Space Curves Here, we describe the different steps involved in the curve matching process to extract 3D position of unique feature points by forcing the constraint of space curves as global object descriptions. The unique points are defined as the points in three-dimensional space that are matchless after forcing all the constraints, i.e. edge positioning, epipolar line, capability of curve forming with proper length by joining the neighboring points, curvature and torsion consistency of relevant space curve in two or more consecutive frames, and the uniqueness of such space curve. The unique space curves are also composed from adequate number of continuous adjacent unique points. As illustrated in figure 6, to check the uniqueness of any edge point in the left camera, all potential matches are labeled in the right camera by intersecting all existence curves with the epipolar line. Then, the associated curves to test point and each labeled point are projected to space through their
camera centers. Each intersection generates a potential space curve with different shape, which should be checked for curvature and torsion consistency during the object motion. This process is done by intersecting the moved version of curves in the next frame as illustrated in below part of figure 6. The consistent space curve in torsion and curvature is selected as valid space curve if there were only one solution. The 3D point on this curve that corresponds to the test point is labeled as unique point. Details of the presented algorithm are given at the following: Algorithm: Step1- At any instance of time, the moving object is captured by two calibrated fixed cameras. The edge curves of image are extracted and thinned by the canny method. Edges are then linked into chains, jumping up to one pixel gap. The small size edge regions are removed to reduce the effect of noise and get the more descriptive curves. The extracted image curves are never perfect in practice: there will be missing segments, erroneous additional segments, etc. Therefore, to improve robustness to these imperfections we begin with one such segment in the left image and seek confirming evidence in the right one. Step2- To extract the unique points, the resulted left curve image in step 1 is scanned and all edge points are checked one by one for uniqueness. For each examined edge point c L (s0 ) , which is considered as initial point, the corresponding epipolar line in the right curve-image is computed. Intersection of the epipolar line with edge curves are labeled as candidate match points. One or more match candidates c(Ri) ( s0′ ), i =1,2,... may be found in this step (see figure 6). Only one of the candidate points is the true match and the other points are outliers. Step3- The points c L (s0 ) and c(Ri) (s0′ ), i =1,2,... are grown n points from two sides to form the curves. The curves with smaller length than 2n and the branched curves are discarded. Step4- To distinguish the true match from other points, the next sequence of stereo images is also considered. It is assumed that the frame rate is adjusted as well to capture consecutive images with small amount of the object motion. The neighborhood of c L (s0 ) is inspected in the next sequence of left camera to find the shift of the curve c L ( s) as c Lm ( s) . The key property of the curve with small movement is its proximity and similarity to the main curve. Therefore, we define a correlation function as a combination of curve distance and curvature difference along the curves:
$$\mathrm{cor}\!\left(c_L(s),\,c_{Lm}(s)\right) = \max_j \left[\mathrm{cor}\!\left(c_L(s_0),\,c_{Lm}(s_j)\right)\right];\qquad \mathrm{cor}\!\left(c_L(s_0),\,c_{Lm}(s_j)\right) = \frac{1}{\left\|c_L(s_0)-c_{Lm}(s_j)\right\| + \alpha\displaystyle\sum_{k=-n}^{n}\left|\kappa_{c_L}(s_k)-\kappa_{c_{Lm}}(s_{k+j})\right|} \tag{18}$$

where:

$$\left\|c_L(s_0)-c_{Lm}(s_j)\right\| = \sum_{k=-n}^{n}\left(x_{c_L}(s_k)-x_{c_{Lm}}(s_{k+j})\right)^{2} + \left(y_{c_L}(s_k)-y_{c_{Lm}}(s_{k+j})\right)^{2} \tag{19}$$

$$\kappa = \frac{\dot{x}\ddot{y}-\dot{y}\ddot{x}}{\left(\dot{x}^{2}+\dot{y}^{2}\right)^{3/2}} \tag{20}$$

The shift of c_L(s) is selected as the argument c_Lm(s) that maximizes the correlation function cor(c_L(s), c_Lm(s)). The center of c_Lm(s) is also determined as the argument s_j that maximizes cor(c_L(s_0), c_Lm(s_j)).

Step 5- The epipolar line of c_Lm(s_j) is computed in the next sequence of the right image. Intersections of the curves with the epipolar line are labeled as c_Rm^(i)(s'_j), i = 1, 2, ..., according to their proximity to c_R^(i)(s'_0), i = 1, 2, ....

Step 6- The space curves SC^(i)(s_0), i = 1, 2, ..., corresponding to {c_L(s_0), c_R^(i)(s'_0)} and the space curves SC_m^(i)(s_j), i = 1, 2, ..., corresponding to {c_Lm(s_j), c_Rm^(i)(s'_j)} are established by projecting the two-dimensional curves into the space and intersecting the rays. For each space curve, the curvature and torsion are computed from Eq. 5 and Eq. 6. The correlation between the two space curves before and after motion is computed from Eq. 21 for i = 1, 2, ...:

$$\mathrm{cor}\!\left(SC^{(i)},\,SC_m^{(i)}\right) = \frac{1}{\displaystyle\sum_{k=-n}^{n}\left|\kappa_{SC^{(i)}}(s_k)-\kappa_{SC_m^{(i)}}(s_{k+j})\right| + \sum_{k=-n}^{n}\left|\tau_{SC^{(i)}}(s_k)-\tau_{SC_m^{(i)}}(s_{k+j})\right|} \tag{21}$$

The space curve i = q that maximizes the correlation function is selected as the consistent space curve, and the pair c_L(s_0) and c_R^(q)(s'_0) are selected as unique points with a determined depth value. If there is more than one solution because the values of the correlation function are too close, the third sequence is also inspected to find a more confident answer. If there is only one solution, the
point c_L(s_0) is selected as a point with known depth value. Otherwise, it is labeled as a non-unique point and rejected.
Figure 6. Curve stereo matching to find the unique points of the object.
Step 7 - Go back to step 2 and repeat the procedure for all edge points to find an adequate number of unique points. At the end, the unique space curves are composed from continuous adjacent unique points. Shape descriptiveness, branching, and occlusion are the three factors considered in choosing the proper length of the curves in the matching process. Short curves are less descriptive in shape and result in low-confidence matching, as the number of detected similar curves increases; the uniqueness-checking process then fails to find the best match. Long curves, on the contrary, are more descriptive in shape and result in high-confidence matching, as the number of detected similar curves decreases. Occlusion and branching are the factors that restrict lengthening of the curves, since the number of appropriate curves falls as the curve length increases. Our experiments show that a curve length between 20 and 40 points (for a 480×640 image) gives good results. Of course, the proposed length is only a representative value: depending on the texture of the object, the number of similar curves on its surface, and the range of depth variation across the curves, the appropriate curve length may vary.
3. Rigid Motion Estimation by Tracking the Space Curves

The movement of a rigid object in space can be expressed by six rotation and translation parameters. Now, suppose that we have extracted a set of points on the surface of an object and the goal is to estimate the 3D motion of the object across time. To estimate the motion parameters, an error function that describes the difference of the points before and after motion should be minimized. To avoid relying on photometric information, we define the error function as the distance of the unique points from the nearby curves after movement. To state the problem mathematically, suppose that W_i is the i-th unique point, N_u is the total number of unique points, P_k(R·W_i + T) is the projection of W_i into camera plane k after movement, and contour_k^(m) is curve number m in camera plane k. To estimate the motion matrix, i.e. R and T, the error component for each unique point in each camera image is defined as the minimum distance of that projected point from the nearby curves in that camera. The total error is calculated by summing the error components over all unique points and all cameras:
e_{ik} = \min_m \left\{ distance\left( P_k(R \cdot W_i + T), \; contour_k^{(m)} \right) \right\}

e = \sum_{i=1}^{N_u} \sum_{k=1}^{K} e_{ik} ; \qquad \{R, T\} = \arg\min \{ e \}    (22)
where K is the total number of cameras (K = 2 for a single stereo rig). To find the minimum distance of each point from the nearby curves in a camera image, we use a circle-based search area with increasing radius (figure 7); the minimum distance is the radius of the first circle that touches an adjacent curve. R and T are parameterized as Θ = [φ_x, φ_y, φ_z, t_x, t_y, t_z], where φ_x, φ_y, φ_z are the Euler angles of rotation and t_x, t_y, t_z are the x, y, z components of the translation vector. The total error function defined in Eq. 22 can be minimized by an iterative method similar to the Levenberg-Marquardt algorithm [8]:
Figure 7. To find the minimum distance of a point from the adjacent curves in the camera image, a circle-based search window with increasing radius is considered. The minimum distance is the radius of the first circle that touches an adjacent curve.
1. With an initial estimate Θ̂, calculate the Hessian matrix H and the difference vector d as:

H_{ik} = (\nabla_\Theta e_{ik})(\nabla_\Theta e_{ik})^T, i.e. the 6×6 matrix whose (a, b) entry is \frac{\partial e_{ik}}{\partial \Theta_a} \cdot \frac{\partial e_{ik}}{\partial \Theta_b} for Θ = [φ_x, φ_y, φ_z, t_x, t_y, t_z],

H = \sum_{i=1}^{N_u} \sum_{k=1}^{K} H_{ik}    (23)

d_{ik} = \left[ e_{ik}\frac{\partial e_{ik}}{\partial φ_x}, \; e_{ik}\frac{\partial e_{ik}}{\partial φ_y}, \; e_{ik}\frac{\partial e_{ik}}{\partial φ_z}, \; e_{ik}\frac{\partial e_{ik}}{\partial t_x}, \; e_{ik}\frac{\partial e_{ik}}{\partial t_y}, \; e_{ik}\frac{\partial e_{ik}}{\partial t_z} \right]^T , \qquad d = -2 \sum_{i=1}^{N_u} \sum_{k=1}^{K} d_{ik}    (24)
2. Update the parameter Θ̂ by an amount ΔΘ:

\hat{\Theta}^{(n+1)} = \hat{\Theta}^{(n)} + \Delta\Theta = \hat{\Theta}^{(n)} + \frac{1}{\lambda} H^{-1} \cdot d    (25)
where λ is a time-varying stabilization parameter.

3. Go back to step 1 until the estimate of Θ̂ converges.

Unless the object has periodic edge curves, the error function in Eq. 22 usually has a single minimum and convergence of the algorithm is guaranteed. Outlier points, however, have a destructive effect on convergence: the projection of an outlier point in the camera planes will not be close to the tracked curves, so minimization of the error function cannot be accomplished accurately. To state the problem mathematically, divide the unique points into two groups, inliers and outliers. The error function can then be rearranged as:

e = \sum_{i=1}^{N_{inlier}} e_i + \sum_{j=1}^{N_{outlier}} e_j , \qquad where \; N_{inlier} + N_{outlier} = N_u    (26)
Provided that N_outlier is much smaller than N_inlier, the outlier term \sum_{j=1}^{N_{outlier}} e_j has a negligible effect compared with \sum_{i=1}^{N_{inlier}} e_i, and the motion estimate proceeds in the right direction. The outlier points, however, never join the tracked curves during convergence. To make the algorithm more efficient, the minimum distance of each unique point from the nearby curves is therefore checked after an adequate number of iterations. Points whose distance is much greater than the average distance (i.e. e_i >> e / N_u) are identified as outliers. Such points are excluded from the calculation of the error function, so the remaining unique
points converge closer to the tracked curves and more precise motion parameters are achieved. Once the six motion parameters have been estimated for two consecutive frames, the motion matrix can be constructed as:
M = \begin{bmatrix} R(φ_x, φ_y, φ_z) & T \\ 0 \; 0 \; 0 & 1 \end{bmatrix}    (27)

where:

R(φ_x, φ_y, φ_z) = \begin{bmatrix} \cos φ_z & \sin φ_z & 0 \\ -\sin φ_z & \cos φ_z & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos φ_y & 0 & -\sin φ_y \\ 0 & 1 & 0 \\ \sin φ_y & 0 & \cos φ_y \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos φ_x & \sin φ_x \\ 0 & -\sin φ_x & \cos φ_x \end{bmatrix} ; \qquad T = \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}    (28)
The new position of each point in the next frame is calculated by multiplying the motion matrix by its position vector:

W^{(n+1)} = M_n \cdot W^{(n)} , \qquad where \; W^{(1)} = [X_w, Y_w, Z_w, 1]^T    (29)
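The following sketch pulls together Eqs. 22-25 and 27-29 of this section into a runnable outline of the iterative motion estimation. It is a minimal illustration under our own assumptions, not the authors' code: the nearest-contour-point distance is a brute-force stand-in for the growing-circle search, the gradients are taken numerically, and all names (euler_matrix, point_error, estimate_motion, project, contours) are hypothetical.

    import numpy as np

    def euler_matrix(phi_x, phi_y, phi_z):
        # Rotation composed as Rz * Ry * Rx, following the ordering of Eq. 28.
        cx, sx = np.cos(phi_x), np.sin(phi_x)
        cy, sy = np.cos(phi_y), np.sin(phi_y)
        cz, sz = np.cos(phi_z), np.sin(phi_z)
        Rx = np.array([[1, 0, 0], [0, cx, sx], [0, -sx, cx]])
        Ry = np.array([[cy, 0, -sy], [0, 1, 0], [sy, 0, cy]])
        Rz = np.array([[cz, sz, 0], [-sz, cz, 0], [0, 0, 1]])
        return Rz @ Ry @ Rx

    def point_error(theta, W, project, contours):
        # e_ik of Eq. 22: distance of the projected unique point from the
        # nearest contour point in each camera k.
        R, T = euler_matrix(*theta[:3]), theta[3:]
        return np.array([np.min(np.linalg.norm(cont - proj(R @ W + T), axis=1))
                         for proj, cont in zip(project, contours)])

    def estimate_motion(points, project, contours, iters=50, lam=10.0):
        theta = np.zeros(6)                     # [phi_x, phi_y, phi_z, t_x, t_y, t_z]
        for _ in range(iters):
            H, d = np.zeros((6, 6)), np.zeros(6)
            for W in points:
                e = point_error(theta, W, project, contours)
                J = np.zeros((len(e), 6))       # numerical gradients of each e_ik
                for a in range(6):
                    dt = np.zeros(6); dt[a] = 1e-5
                    J[:, a] = (point_error(theta + dt, W, project, contours) - e) / 1e-5
                H += J.T @ J                    # Eq. 23: sum of gradient outer products
                d += -2.0 * J.T @ e             # Eq. 24
            theta = theta + np.linalg.solve(H + 1e-9 * np.eye(6), d) / lam   # Eq. 25
        return theta

Outlier rejection as described above would be added by dropping, after some iterations, any point whose e_ik is far above the average before accumulating H and d.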
4. Motion Estimation Using Double Stereo Rigs

In this section, we present a double stereo configuration that achieves as much accuracy as possible in the estimation of the motion parameters. The basic idea is to find an arrangement of stereo cameras in which the sensitivity of image pose variation to space pose variation is maximized. First the single stereo setup is investigated, and then a perpendicular double stereo configuration is presented and its superiority over the single stereo setup is demonstrated.
4.1. Single Stereo Rig

As mentioned in section 2.3, the baseline of the stereo rig is adjusted to be neither too small nor too wide, as a compromise between depth uncertainty and occlusion. Moreover, to use the linear part of the camera lens and to avoid the complex computations of nonlinear distortion, the view angle is chosen as small as possible. Hence, the size of the object is usually much smaller than its distance from the camera center, i.e. 2r << t_z (see figure 8).
Figure 8. Single stereo setup with small view angle ( t z >> 2r ).
Now we would like to answer the question of how much accuracy is achievable in space motion estimation by tracking the projection of points in the camera planes. For the sake of simplicity, we assume that the optical axis of camera 1 lies along the depth direction of the world coordinate system (i.e. Z_w). The projection of any point (X_w, Y_w, Z_w) onto the image plane of camera 1 is computed as:

(x_{im1}, y_{im1}) = \left( -f_{x1} \cdot \frac{X_w}{Z_w + t_z}, \; -f_{y1} \cdot \frac{Y_w}{Z_w + t_z} \right)    (30)

By differentiating, we can write:

\Delta x_{im1} = \frac{\partial x_{im1}}{\partial X_w}\Delta X_w + \frac{\partial x_{im1}}{\partial Y_w}\Delta Y_w + \frac{\partial x_{im1}}{\partial Z_w}\Delta Z_w ; \qquad \Delta y_{im1} = \frac{\partial y_{im1}}{\partial X_w}\Delta X_w + \frac{\partial y_{im1}}{\partial Y_w}\Delta Y_w + \frac{\partial y_{im1}}{\partial Z_w}\Delta Z_w    (31)

\Delta x_{im1} = \frac{f_{x1}}{Z_w + t_z} \left( -\Delta X_w + \frac{X_w}{Z_w + t_z}\Delta Z_w \right) ; \qquad \Delta y_{im1} = \frac{f_{y1}}{Z_w + t_z} \left( -\Delta Y_w + \frac{Y_w}{Z_w + t_z}\Delta Z_w \right)    (32)

For a small view angle (i.e. t_z >> 2r), and assuming |X_w|, |Y_w|, |Z_w| ≤ r, the image displacements Δx_{im1} and Δy_{im1} of each tracked point are approximately zero after convergence of the motion estimation algorithm. Hence:

\Delta x_{im1} \approx 0 \; \rightarrow \; either \; \Delta X_w, \Delta Z_w \approx 0, \; or \; \Delta X_w \approx \frac{X_w}{Z_w + t_z}\Delta Z_w \; \rightarrow \; \Delta Z_w >> \Delta X_w    (33)

\Delta y_{im1} \approx 0 \; \rightarrow \; either \; \Delta Y_w, \Delta Z_w \approx 0, \; or \; \Delta Y_w \approx \frac{Y_w}{Z_w + t_z}\Delta Z_w \; \rightarrow \; \Delta Z_w >> \Delta Y_w    (34)
These equations reveal that the inverse problem of 3D motion estimation by tracking points in the camera plane is ill-posed and does not have a unique solution. Any small estimation error in X_w or Y_w (i.e. ΔX_w ≠ 0 or ΔY_w ≠ 0) imposes a large estimation error in Z_w (i.e. ΔZ_w >> ΔX_w or ΔZ_w >> ΔY_w). Therefore, the total 3D positional error \sqrt{\Delta X_w^2 + \Delta Y_w^2 + \Delta Z_w^2} is notably increased and inaccurate 3D motion parameters are estimated.
4.2. Double Stereo Rigs

Because a large baseline cannot be selected in a single stereo rig, both cameras of the rig have approximately the same effect in the motion estimation process. To gain the advantages of both small and wide baseline stereo cameras, we present a combined double stereo setup. This combination is composed of two single stereo rigs that make an angle θ with each other (see figure 9).
Figure 9. Structure of the double stereo setup: (a) double stereo setup with angle θ, (b) perpendicular double stereo setup.
Similar to the single stereo setup, and considering the rotation angle θ of camera 3, it can easily be shown that:

x_{im3} = -f_{x3} \cdot \frac{X_w \cos θ - Z_w \sin θ}{X_w \sin θ + Z_w \cos θ + t_z} + x_{o3} ; \qquad y_{im3} = -f_{y3} \cdot \frac{Y_w}{X_w \sin θ + Z_w \cos θ + t_z} + y_{o3}    (35)

\Delta x_{im3} = \frac{f_{x3}}{A} \left( -(Z_w + t_z \cos θ)\Delta X_w + (X_w + t_z \sin θ)\Delta Z_w \right)
\Delta y_{im3} = \frac{f_{y3}}{A} \left( Y_w \sin θ \cdot \Delta X_w - (X_w \sin θ + Z_w \cos θ + t_z) \cdot \Delta Y_w + Y_w \cos θ \cdot \Delta Z_w \right)    (36)

where: A = (X_w \sin θ + Z_w \cos θ + t_z)^2    (37)
By choosing a proper value of θ, it is possible to make x_{im3} and y_{im3} as sensitive to Z_w as possible. Therefore, we can minimize the 3D motion estimation errors ΔX_w and ΔY_w by minimizing Δx_{im1} and Δy_{im1}, and the estimation error ΔZ_w by minimizing Δx_{im3} and Δy_{im3}. It can be verified, by differentiating, that the maximum sensitivity is achieved at θ = 90°. For θ = 90°, Eq. 36 simplifies to:
\Delta x_{im3} = \frac{f_{x3}}{X_w + t_z} \left( -\frac{Z_w}{X_w + t_z}\Delta X_w + \Delta Z_w \right) ; \qquad \Delta y_{im3} = \frac{f_{y3}}{X_w + t_z} \left( \frac{Y_w}{X_w + t_z}\Delta X_w - \Delta Y_w \right)    (38)
Similar to the single stereo case, we can assume (Δx_{im1}, Δy_{im1} ≈ 0) for each tracked point in camera 1 and (Δx_{im3}, Δy_{im3} ≈ 0) for each tracked point in camera 3 after convergence of the motion estimation algorithm. Hence:

\Delta x_{im3} \approx 0 \; \rightarrow \; either \; \Delta X_w, \Delta Z_w \approx 0, \; or \; \frac{Z_w}{X_w + t_z}\Delta X_w \approx \Delta Z_w \; \rightarrow \; \Delta X_w >> \Delta Z_w

\Delta y_{im3} \approx 0 \; \rightarrow \; either \; \Delta Y_w, \Delta Z_w \approx 0, \; or \; \frac{Y_w}{X_w + t_z}\Delta X_w \approx \Delta Y_w \; \rightarrow \; \Delta X_w >> \Delta Y_w    (39)
Combining Eqs. 33-34 with Eq. 39 results in ΔX_w, ΔY_w, ΔZ_w ≈ 0. Therefore, the total 3D positional error \sqrt{\Delta X_w^2 + \Delta Y_w^2 + \Delta Z_w^2} is notably decreased in the perpendicular double stereo setup and more precise motion parameters are obtained.
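To make the sensitivity argument of Eqs. 32-39 concrete, the following numerical sketch (our own illustration; the focal length, distances and function names are arbitrary assumptions) builds the linearized image-displacement Jacobians of the frontal camera and of the camera rotated by 90 degrees, and compares what they constrain in (ΔX_w, ΔY_w, ΔZ_w).

    import numpy as np

    def jacobian_frontal(Xw, Yw, Zw, f, tz):
        # d(x_im1, y_im1)/d(Xw, Yw, Zw) for the frontal camera (Eq. 32).
        D = Zw + tz
        return (f / D) * np.array([[-1.0, 0.0, Xw / D],
                                   [0.0, -1.0, Yw / D]])

    def jacobian_side(Xw, Yw, Zw, f, tz):
        # Same for the camera rotated by theta = 90 degrees (Eq. 38).
        D = Xw + tz
        return (f / D) * np.array([[-Zw / D, 0.0, 1.0],
                                   [Yw / D, -1.0, 0.0]])

    # Object of radius ~100 mm placed about 2000 mm from the cameras.
    Xw, Yw, Zw, f, tz = 50.0, 30.0, 40.0, 1000.0, 2000.0
    J1 = jacobian_frontal(Xw, Yw, Zw, f, tz)
    J3 = jacobian_side(Xw, Yw, Zw, f, tz)

    print(np.linalg.matrix_rank(J1))     # 2: a single view cannot constrain all three errors
    _, _, vt = np.linalg.svd(J1)
    print(vt[-1])                        # unobservable direction ~ (Xw/D, Yw/D, 1): dominated by dZw
    print(np.linalg.svd(np.vstack([J1, J3]), compute_uv=False))   # three comparable singular values

The single-view Jacobian leaves one 3D error direction (essentially ΔZ_w) unconstrained, which is Eqs. 33-34; stacking the perpendicular view yields three singular values of similar size, which is the observation behind Eq. 39.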
5. Shape Reconstruction from Object Silhouettes Across Time

Three-dimensional model reconstruction by extracting the visual hull of an object has been used extensively in recent years [31-34] and has become a standard and popular method of shape estimation. The visual hull is a rough model of the object surface, which can be calculated from different views
of the object's silhouette. The silhouette of an object in an image is the curve that separates the object from the background. The visual hull cannot recover concave regions, regardless of the number of images used. In addition, it needs a large number of different views to recover fine details. To moderate the first drawback, a combination with stereo matching can be employed. To address the second drawback, more silhouettes of the object can be captured by a limited number of cameras across time. Cheung et al. presented a method to enhance the shape approximation by combining multiple silhouette images captured across time [34]. Employing a basic property of the visual hull, which states that each bounding edge must touch the object at no fewer than one point, they use multi-view stereo to extract these touching points, called Colored Surface Points (CSP), on the surface of the object. These CSPs are then used in a 3D image alignment algorithm to find the six rotation and translation parameters of the rigid motion between two visual hulls. They use the color consistency property of the object to align the CSP points. Once the rigid motion across time is known, all of the silhouette images are treated as being captured at the same time instant and the shape of the object is refined. Motion estimation by the CSP method suffers from some drawbacks. Inaccurate color adjustment between cameras is one problem that introduces error into the color-consistency test. Moreover, variation of the lighting angle while the object moves around the light source produces additional error. Our presented method of motion estimation, which uses only the edge information in the form of space curves, is very robust against camera color maladjustment and shading during the object motion. Moreover, it can be used effectively to extract the visual hull of poorly textured objects. In the remainder of this section, it is assumed that the motion parameters are known for multiple views of the object and the goal is to reconstruct the 3D shape of the object from silhouette information across time.
5.1. Space-Time or Virtual Camera Generation

Let P, defined in Eq. 40, be the projection matrix of the camera, which maps the 3D point W in world coordinates to (x_im, y_im) in the image coordinates of the camera plane:
P = \begin{bmatrix} -f_x & 0 & x_0 \\ 0 & -f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} (R_w^c)^{-1} & -(R_w^c)^{-1} \cdot T_c^w \end{bmatrix}    (40)

[x_{im}, y_{im}, 1]^T \propto P \, [X_w, Y_w, Z_w, 1]^T    (41)
where R_w^c and T_c^w are the rotation and translation of the camera coordinate system with respect to the world coordinate system, f_x and f_y are the focal lengths in the x and y directions, and x_0, y_0 are the coordinates of the principal point in the camera plane. From Eq. 29 we get:

W^{(n)} = M_{n-1} \cdot W^{(n-1)} = M_{n-1} \cdots M_2 \cdot M_1 \cdot W^{(1)}    (42)
By multiplying the projection matrix by the accumulated motion matrices, a new projection matrix is deduced for each frame. This matrix defines a calibrated virtual camera for that frame:

P^{(n)} = P \cdot ( M_{n-1} \cdots M_2 \cdot M_1 )    (43)
The matrix P^{(n)}, which produces a new silhouette of the object, projects any point in world coordinates onto the image plane of virtual camera n as:

[x_{im}^{(n)}, y_{im}^{(n)}, 1]^T \propto P^{(n)} [X_w, Y_w, Z_w, 1]^T    (44)
In fact, by generating the virtual cameras, the moving-object/fixed-camera system is replaced by a fixed object observed by a camera that moves in the opposite direction.
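A minimal sketch of Eqs. 40-44 follows. It is our own illustration under assumed conventions (in particular the camera-to-world naming of R and T and the toy numbers), not the chapter's implementation.

    import numpy as np

    def projection_matrix(fx, fy, x0, y0, R_cw, T_cw):
        # 3x4 camera matrix of Eq. 40; R_cw, T_cw map camera coordinates to world.
        K = np.array([[-fx, 0.0, x0],
                      [0.0, -fy, y0],
                      [0.0, 0.0, 1.0]])
        R_inv = np.linalg.inv(R_cw)
        return K @ np.hstack([R_inv, -R_inv @ T_cw.reshape(3, 1)])

    def virtual_camera(P, motions):
        # Eq. 43: compose the physical camera with the accumulated object motion
        # M_{n-1} ... M_1 to obtain the projection matrix of virtual camera n.
        M = np.eye(4)
        for Mi in motions:               # motions = [M_1, M_2, ..., M_{n-1}]
            M = Mi @ M
        return P @ M

    def project(P, Xw):
        # Eq. 41 / Eq. 44: homogeneous projection of a world point.
        x = P @ np.append(Xw, 1.0)
        return x[:2] / x[2]

    # Toy usage with an identity-pose camera and one small rigid motion step.
    P = projection_matrix(1000.0, 1000.0, 320.0, 240.0, np.eye(3), np.zeros(3))
    M1 = np.eye(4); M1[:3, 3] = [5.0, 0.0, 0.0]      # pure translation, for brevity
    P2 = virtual_camera(P, [M1])
    print(project(P, [0.0, 0.0, 2000.0]), project(P2, [0.0, 0.0, 2000.0]))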
5.2. Visual Hull Reconstruction from Silhouettes of Multiple Views

In recent years, shape from silhouette has been widely used to reconstruct the three-dimensional shape of an object. Each silhouette, together with the camera center, forms one cone in space. Using more cameras viewing the object from different directions, more cones are constructed. The visual hull is defined as the volume shared by these cones. There are two conventional approaches to extracting the visual hull: voxel
carving [35,36] and view ray sampling [37]. In the voxel carving method, a discrete set of voxels is constructed around the volume of interest. Each voxel is then checked against all silhouettes, and any voxel that projects outside a silhouette is removed from the volume. Voxel carving can be accelerated using an octree representation, which employs a coarse-to-fine hierarchy. In view ray sampling, a sampled representation of the visual hull is constructed in a view-dependent manner: for each viewing ray in some desired view, the intersection points with all surfaces of the visual hull are computed. Moezzi et al. [38] construct the visual hull using voxels in an off-line processing system. Cheung et al. [39, 40] show that the voxel method can achieve interactive reconstruction results. The polyhedral visual hull system developed by Matusik et al. [41] also runs at interactive rates. In this section, two efficient algorithms are presented to improve the speed of visual hull extraction. The first algorithm accelerates the voxel carving method by reducing the number of check points in the intersection test procedure. The octree division method is optimized, by minimizing the number of check points, to find the intersection between cubes and silhouette images. To accomplish this, the points are checked on the edges of the octree cubes rather than inside the volume; furthermore, the points are checked hierarchically and their number is changed according to the size of the octree cube. The second algorithm employs the ray sampling method to extract the bounding edge model of the visual hull. To find the segments of any ray that lie inside the other silhouette cones, the points of the ray are checked hierarchically.
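As a minimal illustration of the basic voxel carving idea just summarized (our own sketch; the projection callables and names are assumptions, and no octree acceleration is included):

    import numpy as np

    def carve(voxel_centers, silhouettes, projections):
        # Keep a voxel only if it projects inside every silhouette image.
        # voxel_centers: (V, 3); silhouettes: list of binary HxW arrays (1 = object);
        # projections: list of callables mapping a 3D point to pixel (row, col).
        keep = np.ones(len(voxel_centers), dtype=bool)
        for sil, proj in zip(silhouettes, projections):
            for i, X in enumerate(voxel_centers):
                if not keep[i]:
                    continue
                r, c = proj(X)
                inside = (0 <= r < sil.shape[0]) and (0 <= c < sil.shape[1]) and bool(sil[int(r), int(c)])
                if not inside:
                    keep[i] = False          # carved away by this view
        return voxel_centers[keep]

The octree and edge-based refinements described next exist precisely to avoid this exhaustive per-voxel projection.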
5.2.1. Volume Based Visual Hull

Many algorithms have been developed to construct volumetric models from a set of silhouette images [35, 36, 37, 39]. Starting from a bounding volume that is known to surround the whole scene, the volume is divided into voxels. The task is to find which voxels belong to the surface of the 3D object, corresponding to the intersection of the back-projected silhouette cones. The most important step in these algorithms is the intersection test. To make the projection and intersection test more efficient, most methods use an octree representation and test voxels in a coarse-to-fine hierarchy.

5.2.1.1. Intersection Test in Octree Cubes

The most important and time-consuming part of octree reconstruction is checking the cubes for intersection with the silhouette images. All algorithms use one common
rule to decide whether or not an intersection occurs between a cube and the object. A cube is marked as outside if all checked points inside the cube are "1", and as inside if all checked points are "0"; an intersected cube is one that contains at least two points of different color. Different methods of point checking are classified in figure 10. The number of check points may be constant for all cube sizes, or may change dynamically with the size of the cube. In all methods, the 8 corners of each cube are checked first by projecting them into all the silhouettes. If at least two corners have different colors, an intersection is inferred and the process for this cube terminates; otherwise more points in the cube must be checked. If a color difference appears during checking, the cube is marked as intersected and the process terminates. If, after checking all points, there is no color difference, the cube is identified as outside (or inside) according to whether the points are all "1" (or all "0"). To compare the complexity of the different types of intersection check in the octree cubes, the following parameters are considered: L = level of octree division; C_L = number of grey (intersected, or surface) cubes at level L; N_L = maximum number of check points needed to identify the mark of a cube at level L; S = number of silhouettes. Since each grey cube is divided into 8 sub-cubes in the octree division, the number of grey cubes at level L-1 is equal to or greater than 1/8 of the grey cubes at level L, depending on the number of grey child cubes. The total number of point projections onto the silhouette images in the worst case is therefore:

N_{tot(max)} = S \cdot ( N_L C_L + N_{L-1} C_{L-1} + N_{L-2} C_{L-2} + N_{L-3} C_{L-3} + \cdots ) \ge S \cdot C_L \left( N_L + \frac{N_{L-1}}{8} + \frac{N_{L-2}}{64} + \frac{N_{L-3}}{512} + \cdots \right)    (45)
Figure 10. Checking methods in octree cubes. a) Sequential check in volume b) Random check in volume c) Sequential check on edges d) Hierarchical check on edges.
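As a rough illustration of the corner/point intersection test just described (a sketch with hypothetical helper names, not the chapter's code; in the actual DNHE variant the extra points would be taken on the cube edges and their number would depend on the octree level):

    import numpy as np

    def classify_cube(sample_points, silhouettes, projections):
        # sample_points: (P, 3) check points of the cube (e.g. its 8 corners).
        # silhouettes: list of binary HxW images; projections: callables to (row, col).
        # Returns 'intersected' as soon as two different "colors" are seen,
        # otherwise the uniform value shared by every projected sample.
        seen = set()
        for sil, proj in zip(silhouettes, projections):
            for X in sample_points:
                r, c = proj(X)
                inside_image = 0 <= r < sil.shape[0] and 0 <= c < sil.shape[1]
                seen.add(int(sil[int(r), int(c)]) if inside_image else 0)
                if len(seen) == 2:
                    return 'intersected'      # early exit: no need to check further points
        return 'uniform value %d' % seen.pop() # all "1" or all "0": outside or inside per the rule above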
Obviously, the total number of check points will be smaller than N_tot(max), because intersected cubes are normally recognized early in the checking. This number can therefore be used as a measure to compare the different methods of intersection check; the computing time is proportional to N_tot. Edge-based checking is one approach to decreasing the number of check points needed to identify the mark of a cube without loss of accuracy: for one-piece objects, any intersection between a cube and a silhouette must occur through the cube edges. There is one exceptional case, when the object is small and lies entirely inside the cube, so that there is no intersection between object and cube through the edges. In such cases the edge-based method cannot decide whether the object is inside the cube or intersects the cube through a face, and checking some points inside the volume is then unavoidable. If the size of the bounding cube is chosen properly, comparable to that of the object, the cube is larger than the object only at the first level of division, so the ambiguity of the edge-based intersection test remains only at the first level. Since the octree division is always performed at the first level without checking for intersection, the edge-based intersection test can be applied to one-piece objects with certainty. Another approach to decreasing the number of check points is to change the number of points dynamically at each level. Large cubes may intersect only a small part of a silhouette, which requires checking more points to detect the intersection; in small cubes this situation cannot occur and there is no need to check extra points. By choosing N_L = 8 (checking only the corners of the cube at the last level) and increasing the number of checked points by a factor of k at each lower level, we can bound N_tot(max) as follows:

N_{tot(max)} \ge S \cdot C_L \left( 8 + \frac{8k}{8} + \frac{8k^2}{64} + \frac{8k^3}{512} + \cdots \right) = 8 \cdot S \cdot C_L \left( 1 + \frac{k}{8} + \frac{k^2}{64} + \frac{k^3}{512} + \cdots \right)    (46)
The final approach to increasing the speed is to check the edge points hierarchically; in this way the chance of finding two points of different color in the early checks is increased.

5.2.1.2. Synthetic Model Results

To determine the capability of the presented algorithm and to quantify its performance, we have tested it on synthetically generated models named Bunny and Horse. The simulation was run on a Pentium-III 933 MHz PC using Matlab and C++-generated Mex files. In this analysis, 18 silhouettes of Bunny captured from equally
spaced viewing angles have been used. Figure 11 shows the simulation results for the different methods. In this figure, 'CN' and 'DN' denote a Constant Number or a Dynamic Number of check points for cubes of different sizes; 'S', 'H' and 'R' denote the Sequential, Hierarchical and Random methods of checking the points, respectively; and the last letter, 'V' or 'E', indicates whether the check points are selected inside the Volume or on the Edges of the cube. To compare the efficiency of the methods, the computing time for a fixed number of recovered cubes (voxels) at the last level is evaluated for the different types of intersection check. As the figure shows, the DNHE method gives the best result and the CNRV method the worst. The computing time of the random check method is high because some check points may be chosen near each other, as is apparent in figure 10-b.
[Figure 11 plots computing time (sec, logarithmic scale) against the number of recovered voxels of the visual hull (approximately 6880 to 7000) for the CNSV, DNSV, CNSE, CNHE, DNHE and CNRV methods.]
Figure 11. Computation time for different types of intersection check.
In figure 12, the 3D shape of the synthetic object Bunny is reconstructed using the DNHE algorithm. The object has been captured from different views and 18 silhouettes have been prepared. These silhouettes are shown in figure 12-a, the different levels of octree division are illustrated in figure 12-b, and the depth-map of the reconstructed 3D model is shown in figure 12-c
from different views. Figure 13 shows the result of shape reconstruction for another synthetic object named Horse.
Figure 12. Three-dimensional shape Reconstruction of Bunny from 18 silhouettes using DNHE algorithm. a) different silhouettes of object from 18 view angles b) different levels of octree division using DNHE algorithm c) depth-map of reconstructed 3d-model in different view angles.
Figure 13. Three-dimensional shape Reconstruction of Horse from 18 silhouettes using DNHE algorithm a) different silhouettes of object from 18 view angles b) different levels of octree division using DNHE algorithm c) depth-map of reconstructed 3d-model in different view angles.
5.2.2. Edge Based Visual Hull

Cheung et al. suggested a representation of the visual hull that uses a one-dimensional entity called the bounding edge [34]. To reconstruct the bounding edges, a ray is formed by projecting each point on the boundary of a silhouette image into space through the camera center. This ray is projected onto the other silhouette images, and those segments of the ray whose projections remain inside all the other silhouettes are selected as bounding edges. By uniting the bounding edges of all rays of all silhouettes, the visual hull of the object is reconstructed. Note that a bounding edge is not necessarily a continuous line: it may consist of several segments if any of the silhouette images is not convex. This is illustrated in figure 14-a, where ray1i consists of two shared segments in S2, S3 and S4. To represent each bounding edge, it is enough to find the start and end position of each segment. We employ a hierarchical checking method to extract the start-end points very quickly; the idea is illustrated in figure 14-b. Instead of checking all the points of the ray, they are checked hierarchically. First, the middle point of the ray is projected into all the silhouette images and its status is checked. If it is inside all the silhouettes, the quarter point on the left side is checked; otherwise, the quarter points on both sides must be checked. This procedure is repeated hierarchically to find the start point of the bounding edge to the desired accuracy. Once the start point is found, hierarchical checking is restarted to find the end of the segment. It remains to explain how to find the start and end points when the ray consists of more than one segment (in concave parts). To resolve this ambiguity, the points can be checked in a coarse-to-fine manner: the points are first checked in large steps along the ray, and whenever the status changes, hierarchical checking is applied to find the exact position of the start and end points of that segment. This process is repeated until the end of the ray, so the positions of the start-end points of multi-segment bounding edges can be determined with certainty. This idea is illustrated in the right part of figure 14-b. To get a sense of the computational complexity of the hierarchical method and of its efficiency, suppose that m_j is the number of points on the boundary of silhouette j, N_ray is the number of points on each ray, and N_sil is the number of silhouettes. In the ordinary method, the points on the surface of each cone are projected to all the other silhouettes, and a decision is then made as to which points are inside all of them. The total number of projections is given by:
N_{tot(max)} = \sum_{j=1}^{N_{sil}} m_j \cdot N_{ray} \cdot ( N_{sil} - 1 )    (47)
It is possible to decrease this number by discarding points identified as outside in one silhouette, since it is then unnecessary to project such points to the remaining silhouettes. The total number of projections then becomes:

N_{tot} = \sum_{j=1}^{N_{sil}} m_j \cdot N_{ray} \cdot \left( 1 + k_{1j} + k_{1j} k_{2j} + \cdots + k_{1j} k_{2j} \cdots k_{(N_{sil}-2)j} \right)    (48)
where 0 < k_ij < 1 is the fraction of points on cone(j) that project to the inside of silhouette i; the value of k_ij depends on the shapes of cone(j) and silhouette i. For the hierarchical checking method, suppose that each ray is initially divided into n parts. To find the exact position of each start and end point, log_2(N_ray / n) points should be checked. In convex parts, bounding edges are formed of a single segment; therefore, the number of check points on each ray is:

NH_{ray} = n + 2 \cdot \log_2 ( N_{ray} / n )    (49)
Figure 14. (a) Projection of ray to the silhouette images to extract the bounding edges (b) hierarchical method to find the start and the end of segments.
The total number of projections for a convex object is then:

NH_{tot} = \sum_{j=1}^{N_{sil}} m_j \cdot NH_{ray} \cdot \left( 1 + k_{1j} + k_{1j} k_{2j} + \cdots + k_{1j} k_{2j} \cdots k_{(N_{sil}-2)j} \right)    (50)
In concave parts, the bounding edges are formed of two or more segments. For a bounding edge with q segments, the total number of check points on the ray is given by:

NH_{ray}(q) = n + 2 \cdot q \cdot \log_2 ( N_{ray} / n )    (51)
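The sketch below illustrates the hierarchical search along one ray for the start of the first bounding-edge segment: a coarse scan with n samples followed by bisection between the last outside and the first inside sample, mirroring the log_2(N_ray / n) term of Eq. 49. It is our own stand-in with hypothetical names and a toy "silhouette" test, not the chapter's implementation.

    import numpy as np

    def inside_all(point, tests):
        # tests: list of callables, each returning True if the 3D point
        # projects inside the corresponding silhouette.
        return all(t(point) for t in tests)

    def segment_start(ray, tests, n=10, tol=1e-3):
        # ray(t) gives a 3D point for t in [0, 1].
        ts = np.linspace(0.0, 1.0, n)
        flags = [inside_all(ray(t), tests) for t in ts]
        if not any(flags):
            return None                           # the ray misses the visual hull
        k = flags.index(True)
        if k == 0:
            return ts[0]
        lo, hi = ts[k - 1], ts[k]                 # outside ... inside bracket
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if inside_all(ray(mid), tests):
                hi = mid
            else:
                lo = mid
        return hi

    # Toy usage: the "silhouette" test is a sphere of radius 0.3 at the origin.
    sphere = [lambda p: float(np.dot(p, p)) < 0.09]
    ray = lambda t: np.array([-1.0 + 2.0 * t, 0.0, 0.0])
    print(segment_start(ray, sphere, n=10))       # ~0.35: the ray enters the sphere at x = -0.3

The end of the segment, and further segments on concave objects, would be found by restarting the same bracketing from coarse samples whose status changes, as described above.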
5.2.2.1. Synthetic Model Results

To show the capability of the presented algorithm and to quantify its performance, we have tested it on the synthetically generated model Bunny. In this analysis, 18 silhouettes of Bunny from equally spaced viewing angles have been captured and used. Table 1 shows the simulation results for the different methods. To compare the efficiency of the methods, the computing time for a fixed number of recovered points on the visual hull is evaluated for the different types of intersection check. Table 2 shows the result of voxel-based DNHE extraction for the same silhouettes. Clearly, the hierarchical method of bounding edge extraction gives very good results compared with the ordinary bounding edge method and the voxel-based DNHE method, especially for a high number of recovered points, which corresponds to high accuracy.

Table 1. Computing time to extract bounding edges of Bunny

Num. of points on each ray   Recovered points     Computing time (sec)
of bounding cone             on visual hull       Ordinary method   Hierarchical, n=10   Hierarchical, n=30
55                           7192                 0.25              0.1                  0.17
80                           10353                0.33              0.11                 0.18
150                          19421                0.59              0.112                0.19
250                          32351                0.95              0.113                0.191
500                          64764                1.83              0.115                0.195
1000                         129578               3.52              0.12                 0.21
Table 2. Computing time for voxel-based visual hull extraction

Num. of voxels in each edge   Recovered voxels     Computing time (sec)
of bounding cube              on visual hull       (DNHE method)
2^7 = 128                     6890                 2.2
2^8 = 256                     29792                12.4
Table 3. Computing time to extract bounding edges of Bunny from different numbers of silhouettes

Number of      Number of points     Computing time (sec), hierarchical method
silhouettes    on each ray          n=10        n=30
48             500                  0.22        0.39
18             500                  0.115       0.195
8              500                  0.05        0.09
4              500                  0.03        0.05
The computing time of 0.12 or 0.21 sec to extract the object from 18 silhouettes at a resolution of 1000 points per ray is very low, and it makes the algorithm suitable for real-time extraction. Table 3 shows the simulation results for different numbers of silhouettes of Bunny.
Figure 15. Start-end points, bounding edges and depth-maps for three models, named Bunny, Female and Dinosaur, extracted from 18 silhouettes of different views using the hierarchical algorithm.
Figure 15 shows the start-end points of the edges, the bounding edges and the depth maps of the reconstructed 3D models for the synthetic objects Bunny, Female and Dinosaur, obtained from 18 silhouettes by the hierarchical method.
Implementation and Experimental Results

To evaluate the efficiency of our approach to 3D model reconstruction by tracking the space curves using the perpendicular double stereo rigs, there are
two questions that we need to consider. First, how good is the estimation of the motion parameters? Second, how efficient is our method in practice, and how robust is it against disturbing effects such as noise? In this section, we present the results of experiments designed to address these two concerns. Experiments were conducted with both synthetic and real sequences containing objects of different textures. Two synthetic objects, named Helga and Cow, were captured by perpendicular double stereo cameras in the 3D Studio Max environment. The objects were moved with arbitrary motion parameters and 36 frames were captured by each camera. Figure 16 shows the camera setup used to capture the Helga and Cow models, along with their extracted space curves. It is clear that the number of outlier points is significantly smaller than the number of valid unique points. Figure 17 illustrates the curve tracking and motion estimation process performed by minimizing the geometric distance of the curves in the camera images. Two sets of space curves, extracted by the two distinct stereo rigs, are projected into the camera planes of the next frame, and the motion parameters are adjusted so that the projections of the space curves in the corresponding camera planes are as close as possible to the nearby curves. Figures 18 and 19 demonstrate how the projections of the unique points move closer and closer to the edge curves at each iteration, minimizing the geometric distance error. To evaluate the estimation error of the motion process, the variation of the six motion parameters across time is plotted in figures 20 and 21 for the Cow and Helga sequences. Comparing the true motion curves with the motion estimated by the single stereo and perpendicular double stereo setups reveals the superiority of the perpendicular double stereo over the single stereo. The assessment is also given numerically in tables 4 and 5. Figures 22 and 23 show the temporal sequence of Helga and the result of virtual camera alignment across time; they also illustrate how the silhouette cones of the virtual cameras are intersected to construct the object's visual hull. Figure 24 compares the quality of the reconstructed Helga model using true motion information and motion estimated by the single and perpendicular double stereo setups. Figures 25 to 27 show the corresponding results for the Cow sequence. To evaluate the robustness of the motion estimation against noise and color maladjustment, a comparison between single stereo and perpendicular double stereo is given both quantitatively and qualitatively in table 6 and figure 28. To obtain acceptable edge curves in a noisy image, it is necessary to smooth the image before edge detection; at the same time, smoothing slightly perturbs the positions of the edge points. The perpendicular double stereo setup appears to be more robust against this perturbation of edge points. Figures 29 to 34 show the results of
the implementation for real object sequences named Buda, Cactus and Head using the perpendicular double stereo setup. All objects were captured through turntable sequences. Notice the small perturbations from the circular path in the alignment of the virtual cameras for the Head sequence. These perturbations are caused by the non-rigid motion of the body in the neck region; in this experiment, the motion estimation process was based only on information from the head (not the body) region. Both the synthetic and the real experimental results demonstrate the strong performance of the presented method across a variety of motions, object shapes and textures.
Figure 16. Reconstructed space curves on the surface of the synthetic models Helga and Cow (captured in 3D Studio Max).
Figure 17. Motion estimation by tracking the projections of space curves in perpendicular double stereo images.
Figure 18. Motion estimation by minimizing the geometric distance of space curves from adjacent curves in four camera images. Convergence of algorithm is shown for different number of iterations.
Figure 19. Motion estimation by minimizing the geometric distance of space curves from adjacent curves in the projected camera images. Convergence of algorithm is shown for different number of iterations for (a) Cow and (b) Helga.
Figure 20. True and estimated motion parameters for the Cow sequence by single and perpendicular double stereo setups.
Table 4. Estimation error of motion parameters for the Cow sequence by single and perpendicular double stereo setups

                  Mean of absolute error                      Maximum of absolute error
Motion            Single         Perpendicular double         Single         Perpendicular double
parameter         stereo rig     stereo rigs                  stereo rig     stereo rigs
Δφx (deg)         0.46           0.13                         1.53           0.50
Δφy (deg)         0.72           0.17                         1.97           0.61
Δφz (deg)         0.34           0.12                         1.07           0.41
Δφtotal (deg)     0.51           0.14                         1.97           0.61
ΔTx (mm)          0.27           0.15                         1.02           0.39
ΔTy (mm)          0.13           0.10                         0.31           0.24
ΔTz (mm)          0.28           0.30                         1.08           1.02
ΔTtotal (mm)      0.23           0.18                         1.08           1.02
Figure 21. True and estimated motion parameters for Helga sequences by single and perpendicular double stereo setup.
Table 5. Estimation error of motion parameters for the Helga sequence by single and perpendicular double stereo setups

                  Mean of absolute error                      Maximum of absolute error
Motion            Single         Perpendicular double         Single         Perpendicular double
parameter         stereo rig     stereo rigs                  stereo rig     stereo rigs
Δφx (deg)         0.54           0.14                         2.92           0.53
Δφy (deg)         0.96           0.57                         2.91           1.20
Δφz (deg)         0.32           0.14                         1.17           0.44
Δφtotal (deg)     0.61           0.28                         2.92           1.20
ΔTx (mm)          0.25           0.18                         0.75           0.40
ΔTy (mm)          0.19           0.19                         0.43           0.34
ΔTz (mm)          0.28           0.23                         0.61           0.46
ΔTtotal (mm)      0.24           0.20                         0.75           0.46
Figure 22. Different views of Helga in 36 sequences of its motion.
Figure 23. Three-dimensional model reconstruction from multiviews for the Helga sequence. (top) Trajectory of the virtual cameras estimated by the perpendicular double stereo rigs, along with the intersection of two silhouette cones, (middle) extraction of the visual hull by intersecting all silhouette cones, and (bottom) color mapping from the visible cameras.
Figure 24. Reconstructed model of Helga, including the bounding-edge visual hull, depth-map and texture-mapped 3D model: (a) estimated motion with single stereo, (b) estimated motion with perpendicular double stereo, and (c) true motion.
Figure 25. Different views of Cow in 36 sequences of its motion.
Figure 26. Trajectory of estimated virtual cameras by perpendicular double stereo rigs and extraction of visual hull by silhouette cones intersection (Cow sequences).
Figure 27. Reconstructed model of Cow, including the bounding-edge visual hull, depth-map and texture-mapped 3D model: (a) estimated motion with single stereo, (b) estimated motion with perpendicular double stereo, and (c) true motion.
Table 6. Estimation error of motion parameters for noisy sequences of Helga by single and perpendicular double stereo setups (σ_n² = 0.1)

                  Mean of absolute error                      Maximum of absolute error
Motion            Single         Perpendicular double         Single         Perpendicular double
parameter         stereo rig     stereo rigs                  stereo rig     stereo rigs
Δφx (deg)         1.02           0.29                         5.80           1.41
Δφy (deg)         1.89           1.43                         10.30          3.93
Δφz (deg)         0.71           0.21                         5.12           0.61
Δφtotal (deg)     1.21           0.64                         10.30          3.93
ΔTx (mm)          0.33           0.24                         0.54           0.38
ΔTy (mm)          0.38           0.30                         0.94           0.54
ΔTz (mm)          0.31           0.21                         0.68           0.51
ΔTtotal (mm)      0.34           0.25                         0.94           0.54
Figure 28. Effect of noise and color imbalance on the 3D reconstruction: (a) noisy images from the different cameras with σ² = 0.1, (b) reconstructed model with the single stereo rig and (c) reconstructed model with the perpendicular double stereo rigs.
Figure 29. Different views of Buda in 36 sequences of its motion.
Figure 30. Reconstruction of the Buda statue by perpendicular double stereo rigs (circular motion with a turntable).
Figure 31. Different views of Cactus in 36 sequences of its motion.
Figure 32. Reconstruction of Cactus by perpendicular double stereo rigs (circular motion with a turntable).
Figure 33. Different views of Head (a picture of the author) in 36 sequences of its motion.
Figure 34. Reconstruction of Head by perpendicular double stereo rigs (non-circular motion).
Conclusions

In this chapter, an efficient method has been presented to reconstruct the three-dimensional model of a moving object by extracting space curves and tracking
them across time using perpendicular double stereo rigs. A new method of space curve extraction on the surface of the object was presented, based on checking the consistency of torsion and curvature through the motion. The nature of space curves makes the extraction very robust against poor color adjustment between cameras and changes of the lighting angle during the object motion. Projections of the space curves in the camera images were employed for tracking the curves and for robust motion estimation of the object. The concept of virtual cameras, constructed from the motion information, was introduced; constructing the virtual cameras makes it possible to use a large number of silhouette cones in the structure-from-silhouette method. Experimental results show that the presented edge-based method can be used effectively to reconstruct the visual hull of poorly textured objects. In addition, it does not require accurate color adjustment during camera setup and provides better results than other methods that rely on the color consistency property. Moreover, the presented method is not limited to circular turntable rotation. Finally, the perpendicular double stereo setup has been presented as a way to reduce the effect of statistical bias in motion estimation and to enhance the quality of the 3D reconstruction. Quantitatively, the mean absolute error in Δφtotal is reduced from 0.51 deg with the single stereo rig to 0.14 deg with the perpendicular double stereo rigs, and the mean absolute error in ΔTtotal is reduced from 0.23 mm to 0.18 mm for the Cow sequence. The respective values for the Helga sequence are Δφtotal 0.61 to 0.28 deg and ΔTtotal 0.24 to 0.20 mm. For the noisy Helga sequence these values increase to Δφtotal 1.21 to 0.64 deg and ΔTtotal 0.34 to 0.25 mm.
Acknowledgment

This research was supported in part by ITRC, the Iran Telecommunication Research Center, under grant no. TMU 85-05-33. The main part of this chapter has been published previously in [43] by the authors.
References

[1] Y. Han, Geometric algorithms for least squares estimation of 3-D information from monocular image, IEEE Trans. Circuits and Systems for Video Technology, (15) (2) (2005), pp. 269-282.
[2] T. Papadimitriou, K.I. Diamantaras, M.G. Strintzis, and M. Roumeliotis, Robust estimation of rigid-body 3-D motion parameters based on point correspondences, IEEE Trans. Circuits and Systems for Video Technology, (10) (4) (2000), pp. 541-549.
[3] M. Lhuillier, L. Quan, A quasi-dense approach to surface reconstruction from uncalibrated images, IEEE Trans. Pattern Analysis and Machine Intelligence, (27) (3) (2005), pp. 418-433.
[4] Y. Zhang and C. Kambhamettu, On 3-D scene flow and structure recovery from multiview image sequences, IEEE Trans. Systems, Man, and Cybernetics - Part B, (33) (4) (2003), pp. 592-606.
[5] P. Eisert, E. Steinbach, and B. Girod, Automatic reconstruction of stationary 3-D objects from multiple uncalibrated camera views, IEEE Trans. Circuits and Systems for Video Technology, (10) (2) (2000), pp. 261-277.
[6] Y. Furukawa, A. Sethi, J. Ponce, and D.J. Kriegman, Robust structure and motion from outlines of smooth curved surfaces, IEEE Trans. Pattern Analysis and Machine Intelligence, (28) (2) (2006), pp. 302-315.
[7] B. Triggs, P. McLauchlan, R. Hartley and A. Fitzgibbon, Bundle adjustment - a modern synthesis, in Vision Algorithms: Theory and Practice, Lecture Notes in Computer Science, Springer-Verlag, (1883) (2000), pp. 298-372.
[8] R.I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge Univ. Press, Second Edition, 2003.
[9] M. Pollefeys, R. Koch, M. Vergauwen, and L. Van Gool, Metric 3D surface reconstruction from uncalibrated image sequences, Proc. European Workshop on 3D Structure from Multiple Images of Large-Scale Environments, (1998), pp. 139-154.
[10] M.A. Fischler and R.C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Comm. ACM, (24) (6) (1981), pp. 381-395.
[11] P. Torr and D. Murray, The development and comparison of robust methods for estimating the fundamental matrix, Int'l J. Computer Vision, (24) (3) (1997), pp. 271-300.
[12] Z. Zhang, Determining the epipolar geometry and its uncertainty: a review, Int'l J. Computer Vision, (27) (2) (1998), pp. 161-195.
[13] P. Torr, A. Zisserman, MLESAC: a new robust estimator with application to estimating image geometry, Computer Vision and Image Understanding, (78) (2000), pp. 138-156.
[14] A. Calway, Recursive estimation of 3D motion and surface structure from local affine flow parameters, IEEE Trans. Pattern Analysis and Machine Intelligence, (27) (4) (2005), pp. 562-574.
[15] K. Kanatani, 3D interpretation of optical flow by renormalization, Int'l J. Computer Vision, (11) (3) (1993), pp. 267-282.
[16] A.D. Jepson and D.J. Heeger, Linear subspace methods for recovering translation direction, in Spatial Vision in Humans and Robots, Cambridge Univ. Press, (1993), pp. 39-62.
[17] A.K.R. Chowdhury and R. Chellappa, Statistical bias in 3-D reconstruction from monocular video, IEEE Trans. Image Processing, (14) (8) (2004), pp. 1057-1062.
[18] R. Szeliski and S.B. Kang, Recovering 3D shape and motion from image streams using nonlinear least squares, J. Visual Commun. Image Representation, (5) (1) (1994), pp. 10-28.
[19] G. Young and R. Chellappa, 3-D motion estimation using a sequence of noisy stereo images: models, estimation, and uniqueness results, IEEE Trans. Pattern Anal. Machine Intell., (12) (1990), pp. 735-759.
[20] J. Weng, P. Cohen, and N. Rebibo, Motion and structure estimation from stereo image sequences, IEEE Trans. Robotics and Automation, (8) (3) (1992), pp. 362-382.
[21] L. Li and J. Duncan, 3-D translational motion and structure from binocular image flows, IEEE Trans. Pattern Anal. Machine Intell., (15) (1993), pp. 657-667.
[22] F. Dornaika and R. Chung, Stereo correspondence from motion correspondence, Proc. IEEE Conf. Computer Vision and Pattern Recognition, (1999), pp. 70-75.
[23] P.K. Ho and R. Chung, Stereo-motion with stereo and motion in complement, IEEE Trans. Pattern Anal. Machine Intell., (22) (2) (2000), pp. 215-220.
[24] S.K. Park and I.S. Kweon, Robust and direct estimation of 3-D motion and scene depth from stereo image sequences, Pattern Recognition, (34) (9) (2001), pp. 1713-1728.
[25] H. Ebrahimnezhad, H. Ghassemian, 3D shape reconstruction of moving object by tracking the sparse singular points, IEEE International Workshop on Multimedia Signal Processing, (2006), pp. 192-197.
[26] L. Robert and O.D. Faugeras, Curve-based stereo: figural continuity and curvature, Proc. IEEE Conf. Computer Vision and Pattern Recognition, (1991), pp. 57-62.
[27] C. Schmid and A. Zisserman, The geometry and matching of lines and curves over multiple views, Int'l J. Computer Vision, (40) (3) (2000), pp. 199-233.
[28] J.H. Han, J.S. Park, Contour matching using epipolar geometry, IEEE Trans. Pattern Analysis and Machine Intelligence, (22) (4) (2000), pp. 358-370.
[29] F. Kahl, J. August, Multiview reconstruction of space curves, in Int. Conf. Computer Vision, (2003), pp. 181-186.
[30] A. Gray, Modern Differential Geometry of Curves and Surfaces with Mathematica, CRC Press, Second Edition, (1997), pp. 219-222.
[31] A. Bottino, A. Laurentini, Introducing a new problem: shape-from-silhouette when the relative positions of the viewpoints is unknown, IEEE Trans. Pattern Analysis and Machine Intelligence, (25) (11) (2003), pp. 1484-1492.
[32] Yang Liu, George Chen, Nelson Max, Christian Hofsetz, Peter McGuinness, Visual hull rendering with multi-view stereo, Journal of WSCG, (12) (1-3) (2004).
[33] A.Y. Mulayim, U. Yilmaz, V. Atalay, Silhouette-based 3-D model reconstruction from multiple images, IEEE Trans. on Systems, Man and Cybernetics, Part B, (33) (4) (2003), pp. 582-591.
[34] G.K.M. Cheung, S. Baker, T. Kanade, Shape-from-silhouette across time, Part I: theory and algorithms, Int'l J. Computer Vision, (62) (3) (2005), pp. 221-247.
[35] M. Potmesil, Generating octree models of 3D objects from their silhouettes in a sequence of images, CVGIP, 40, pp. 1-29, 1987.
[36] R. Szeliski, Rapid octree construction from image sequences, CVGIP: Image Understanding, 58 (1), pp. 23-32, July 1993.
[37] W. Matusik, C. Buehler, R. Raskar, S. Gortler, L. McMillan, Image-based visual hulls, SIGGRAPH 2000, pp. 369-374, July 2000.
[38] S. Moezzi, A. Katkere, D.Y. Kuramura, and R. Jain, Reality modeling and visualization from multiple video sequences, IEEE Computer Graphics and Applications, 16 (6), pp. 58-63, November 1996.
[39] G.K.M. Cheung, T. Kanade, J.Y. Bouguet, and M. Holler, A real time system for robust 3D voxel reconstruction of human motions, in Proceedings of the 2000 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '00), Vol. 2, pp. 714-720, June 2000.
[40] G.K.M. Cheung, et al., Visual hull alignment and refinement across time: a 3D reconstruction algorithm combining shape-from-silhouette with stereo, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2003, Vol. 2, pp. 375-382, June 2003.
[41] W. Matusik, et al., Polyhedral visual hulls for real-time rendering, in Proceedings of the 12th Eurographics Workshop on Rendering, pp. 115-125, June 2001.
[42] S. Lazebnik, E. Boyer, and J. Ponce, On computing exact visual hulls of solids bounded by smooth surfaces, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '01), Kauai, HI, December 2001.
[43] H. Ebrahimnezhad, H. Ghassemian, Robust motion from space curves and 3D reconstruction from multiviews using perpendicular double stereo rigs, Journal of Image and Vision Computing, Elsevier, Vol. 26, No. 10, pp. 1397-1420, Oct. 2008.
In: Binocular Vision Editors: J. McCoun et al, pp. 63-80
ISBN: 978-1-60876-547-8 © 2010 Nova Science Publishers, Inc.
Chapter 2
OCULAR DOMINANCE WITHIN BINOCULAR VISION Jonathan S. Pointer Optometric Research; 4A Market Square, Higham Ferrers, Northamptonshire NN10 8BP, UK
Abstract

Ocular dominance (OD) can be defined and identified in a variety of ways. It might be the eye used to sight or aim, or whose input is favoured when there is competing information presented to the two eyes, or the eye whose functional vision appears superior on a given task or under certain conditions. The concept, which has been the subject of much discussion and revision over the past four centuries, continues to excite controversy today. What is becoming evident is that even in its most direct and behaviourally significant manifestation – sighting preference – it must be regarded as a flexible laterality within binocular vision, influenced by the physical circumstances and viewing constraints prevailing at the point of testing. This chapter will review the phenomenon of OD in the light of the types of test used to identify it; question whether inter-test agreement of OD in an individual might be anticipated; briefly consider the possibility of any relationship between OD and limb or cortical laterality; and speculate whether OD is essentially the product of forced monocular viewing conditions and habitual use of one or other eye. The chapter will conclude with remarks addressing some practical implications of OD as demonstrated in healthy eyes and in cases where there is compromised binocular function.
Introduction: Ocular Dominance

Walls (1951: p. 394) has observed: “… dominance is a phenomenon of binocular vision: an eye does not become dominant only when the other eye is out of action”. So what is ‘ocular dominance’? And what conceivable benefit might such a preferent facility – apparently demonstrable in the majority of binocularly sighted persons (Ehrenstein et al., 2005; Miles, 1930) – bestow upon the individual? This chapter will describe and review the phenomenon of ocular dominance. Historically, discussion of eye dominance has occurred in the context of theories of binocular visual function (Wade, 1998). But while its existence has been acknowledged for over 400 years, are we any nearer to understanding this putative lateral oculo-visual preference? The human eyes, although naturally paired, not infrequently manifest functional asymmetries. The apparent behavioural performance superiority of one eye is recognised by a variety of terms: these include ocular dominance, eye preference, sighting dominance or, in terminology analogous to ‘handedness’ or ‘footedness’ (motor preference demonstrated by an upper or lower limb, respectively), eyedness (Porac & Coren, 1981). Ocular dominance (OD) is the term that will be used preferentially throughout this chapter to embrace this concept, although it should be noted that this choice is not intended to imply that any dominance is of ‘ocular’ origin or even a unitary concept (Warren & Clark, 1938). OD means different things to different people. The lay person might perhaps encounter the phenomenon when aligning objects in space during DIY tasks or when threading a needle; when participating in aiming sports activities (eg, clay pigeon shooting); or if engaged in specific occupations or pastimes that require monocular use of an optical (usually magnification) aid (eg, microscopy, astronomy). Under these circumstances one eye is unconsciously chosen (when viewing in binocular free space) or consciously selected (when using a gun or monocular instrument) to undertake the task (Miles, 1929; Porac & Coren, 1976): unconscious and conscious sighting choices as regards right or left eye use are reportedly in agreement approximately 92% of the time (Coren et al., 1979). All sighting tasks determine OD on the basis of the alignment of two objects presented at a stereo-disparity sufficiently far outside Panum’s area such that fusion is denied (Kommerell et al., 2003): the subject is forced to choose between one or the other image (ie, eye). The clinical research scientist might consider criteria other than a sighting (motor) preference as providing a more appropriate indication of ocular laterality preference under particular circumstances. These alternatives are likely to be
performance-related measures of sensory origin; two longstanding popular examples are the eye with apparently better visual acuity (van Biervlet, 1901; Miles, 1930), or that eye which shows greater facility to suppress a rivalrous retinal image under conditions of binocular viewing (Washburn et al., 1934). The first recorded description of what we now call OD is usually attributed to Giovanni Battista della Porta (ca1535-1615) [figure 1] in Book 6 of his treatise De Refractione (Porta, 1593: Wade, 1998). Porta (1593, pp. 142-143) described a pointing test to determine the preferred eye: a rod is held directly in front of the body and, with both eyes open, the viewer aligns the tip of the rod with a defined object in the mid-distance – the eye which retains an aligned view of the rod and the fixation object when the eyes are alternately closed is the preferred (sighting) eye. Translations of Porta’s text, originally published in Latin, have been provided by Durand & Gould (1910) and Wade (1998). It has also been suggested (Wade, 1987, p. 793) that the Flemish artist Peter Paul Rubens (1577-1640) might have made an engraving depicting Porta’s sighting test.
Figure 1. Giovanni Battista (sometimes Giambattista) della Porta (born probably late 1535, deceased 4 February 1615), Neapolitan scholar, polymath and playwright. Portrait in profile: frontispiece engraving to the expanded 20-volume edition of Magiae Naturalis (Naples, 1589).
It might also be noted that around 330BC Aristotle (ca384-322BC) described (without appreciating the significance of his observation) the ability of most individuals to wink or temporarily close one eye despite both eyes having similarly acute vision (Ross, 1927, p. 959). For the interested reader Wade (1998) provides a wide historical survey of studies of eye dominances: his narrative places discussion of reported oculo-visual preferences in the context of evolving theories of human binocular visual function. Over the years numerous techniques have been devised to define OD (Crider, 1944). Walls (1951) compiled a list of 25 tests (and indicated that his inventory could not be regarded as exhaustive). Gronwall & Sampson (1971) compared 18 techniques, and Coren & Kaplan (1973) analysed 13 measures. The issue of test appropriateness and comparative inter-technique agreement will be addressed below. Suffice to say here that on the grounds of the results of Coren & Kaplan (1973) if one were to choose a single technique to predict eye laterality it could reasonably – as four hundred years ago – be that of sighting alignment. Furthermore, subsequent research has suggested that the sighting eye thus determined seems to extract and process visual spatial information more efficiently than its fellow (Porac & Coren, 1977; Shneor & Hochstein, 2006 and 2008).
Demography of Ocular Dominance

The majority of individuals can record a sighting (motor) dominant eye. This laterality is apparently established in early life, possibly around four years of age (Barbeito, 1983; Dengis et al., 1996), and becomes stable by the middle of the human development period (Dengis et al., 1993). Any contribution of genetic factors to an individual’s OD has been little explored. Porac & Coren (1981) have concluded that familial traits are absent. Reiss (1997) has expressed equivocation, although admitting that in an examination of fresh family data OD failed to align with any direct recessive or dominant Mendelian model of genetic transfer. A number of studies throughout the twentieth century have explored the laterality distribution of OD in large samples of normally sighted human populations. Porac & Coren (1976: Table 1, p. 884) have surveyed much of this work, drawing on studies published between 1929 and 1974, and undertaken in North America, the UK, Japan, Australia, and Africa. The broad conclusion was that approximately 65% of persons sighted with their right eye, 32% with the left and
only 3% demonstrated no consistent preference. Chronological age and country of origin appeared to be negligible influences in this compilation. Subsequent to this summary, the same authors (Porac & Coren, 1981) published the results of a substantial questionnaire-based population study of eye (also ear and limb) laterality. The survey was completed by 5147 individuals (representing an approximately 25% response rate to the mailing) across North America. Results indicated that 71.1% of respondents were right sighting preferent. The balance (28.9%) of respondents were left sighting preferent; this apparently included a tiny proportion of persons who were unable to indicate a preference. Replicating the outcome of a previous study (Porac et al., 1980), males (at 72.9%) were revealed in this survey as being statistically significantly more right eyed than females (69.1%): this gender imbalance has recently been reported again in an independent study (Eser et al., 2008). Male subjects have also been shown (Porac & Coren, 1975) to be statistically significantly more consistent than females (81% versus 63%, respectively) in their sighting preferences (regardless of whether laterality was dextral or sinistral). Adults in the North American postal survey of Porac & Coren (1981) were possibly more dextral than children, but the trend with advancing chronological age was weak and not statistically significant. Suggestions that refractive error and OD might be associated have not been substantiated in a recent large population study of adult subjects (Eser et al., 2008). Furthermore, over a two-year longitudinal study (Yang et al., 2008) the development of childhood myopia has been shown to be free of the influence of OD.
A Taxonomy of Ocular Dominance

Over the four centuries subsequent to Porta’s (1593) description of sighting preference the bibliography of the topic covering theoretical, practical and conjectural issues has expanded to perhaps 600 or more articles (Coren & Porac, 1975; updated by Mapp et al., 2003). Unfortunately this burgeoning literature has not produced a consistent or unifying theory of OD. As others have voiced previously (Flax, 1966; Warren & Clark, 1938) we can still legitimately ask: “What is the purpose of ocular dominance?” Controversially, does it have a purpose or – given that the eye that is considered dominant in a given person might vary with the task and circumstances (see below) – might the phenomenon be considered an artefact resulting from a particular test format or approach?
A way forward might be provided by considering a taxonomy of eye dominance based on whether OD might be regarded as a unitary concept or a multi-factorial phenomenon. There have been those who claim a generalised laterality (examined by Porac & Coren, 1975), with OD matching hand/foot preferences: an early supporter of absolute unilateral superiority of sensori-motor function was the anatomist G. M. Humphrey (1861; as discussed by Woo & Pearson, 1927, pp. 167-169). Others (Berner & Berner, 1953; Cohen, 1952) considered OD to be composed of two factors, namely sighting dominance and rivalry dominance. Walls (1951) regarded OD as a composite of sensory and motor (eye movement) dominance. Jasper & Raney (1937) added acuity dominance and a dominant cerebral hemisphere to motor control. Lederer (1961) contemplated five varieties of OD (a proposal examined further by Gilchrist, 1976): monocular sighting dominance, motor dominance of one eye in binocular circumstances, orientational dominance, sensory dominance of one eye, and dominance of the lateral (right or left) hemi-field. A weakness of many of these claims is that essentially they are based on individual observation or founded on theoretical considerations associated with the extant literature. More recently Gronwall & Sampson (1971) and Coren & Kaplan (1973) have both brought some enlightenment to a reappraisal of the issue by actually undertaking fresh examinations and applying modern comparative statistical analyses to the results. The outcome of the study by Coren & Kaplan (1973) in particular provides us with an entry into our probing of the form and function of OD.
Is Ocular Dominance Test Specific?

Coren & Kaplan (1973) assessed a group of fifty-seven normally sighted subjects using a battery of thirteen OD tests that intentionally covered the broad range of sensori-motor approaches suggested by other investigators over the years. The test scoring methodology indicated the strength as well as the laterality of dominance. A factor analysis of the results identified three orthogonal determinants of OD: (i) sensory – as revealed by (form) rivalry of stereoscopically presented images; (ii) acuity – a visual functional superiority indicated directly by the comparative inter-eye level of Snellen visual acuity; and (iii) sighting – ocular preference indicated by an aiming, viewing or alignment type of task. These three alternative bases for a dominant eye will each be considered further.
I. Tests of Rivalry

Suppression of rivalrous stimuli has been a consistently recognised feature of eye dominance since the earliest writing on the phenomenon (Porta, 1593: Wade, 1998). The viewing of superficially similar but subtly different stimulus pairs, separated using different coloured optical filters or cross-polarised lenses, can be used to quantify the proportion of time that the view of one or the other eye holds sway in the binocular percept (Washburn et al., 1934). Suppression of competing stimuli is demonstrably not limited to one eye, being usually in a state of flux. But this inter-eye rivalry is only possible when the stimuli are small and discrete (Levelt, 1968), immediately questioning the ability of such a rivalry technique to provide a wider ‘global’ indication of OD.
II. Tests of Asymmetry

Historically (as summarised by Wade, 1998) Aristotle in the fourth century BC contended that, although both eyes possessed equal visual acuity, superior accuracy was achieved using one eye; this opinion prevailed for nearly two millennia. Only in the eighteenth century was the possibility of an inter-ocular acuity difference given wider credence. Nowadays, it is recognised that in normally sighted (non-amblyopic) individuals, the level of visual acuity is demonstrably similar in the two eyes (Lam et al., 1996) but not atypically one eye might perform very slightly better than its fellow. Unfortunately many investigators have been tempted to claim the better-sighted eye as the dominant one (Duke-Elder, 1952; Mallett, 1988a; Miles, 1930). The problem is that the evidence base for such a supposition is weak (Pointer, 2001 and 2007). In addition, laterality correlations with other recordable ocular asymmetries (eg, vergence and version saccades: Barbeito et al., 1986; Pickwell, 1972) are absent or unexplored, throwing into question reliance upon oculo-visual functional asymmetries as indicators of OD either in the individual or on a universal scale.
III. Sighting Tests

Probably the most direct and intuitive demonstration of OD originates with the pointing/alignment test described by Porta (1593). Test variations are many but include viewing a discrete distant target either through a circular hole cut in a
piece of card or through a short tube (Durand & Gould, 1910), or through the circular aperture created at the narrow end of a truncated cone when the wider end is held up to the face (A-B-C Test: Miles, 1929). Prima facie OD thus determined is usually clearly defined and consistent within (Miles, 1928 and 1929; Porac & Coren, 1976) and – importantly (see below) – between tests (Coren & Kaplan, 1973; Gronwall & Sampson, 1971). Sighting dominance in the study by Coren & Kaplan (1973) accounted for the greatest proportion (67%) of the variance among all of the tests analysed. Unfortunately, it has come to be realised that even sighting dominance is not entirely robust, being subject to possible corrupting influences which include: observer (subjective) expectation and test knowledge (Miles, 1929); aspects of operational detail for even this simplest of tests (Ono & Barbeito, 1982); the fact that dominance, possibly as a result of relative retinal image size changes (Banks et al., 2004), has been shown to cross between eyes at horizontal viewing angles as small as 15 degrees eccentricity (Khan & Crawford, 2001; also Henriques et al., 2002; Quartley & Firth, 2004); and not least (for sighting tests which have to be held), the potentially modulating influence of hand use (Carey, 2001). The dilemma that arises when OD appears to be test specific is simply where to place one’s reliance: has dominance switched between eyes or is the outcome an artefact of the testing technique? This rather begs the question: what is the purpose of OD?
Some Misconceptions

It remains a belief in some quarters that OD is a fixed characteristic in a given individual (a claim disputed by Mapp et al., 2003) or even, in an unwarranted leap of reasoning, displays the same laterality as limb (hand/foot) preferences (Delacato, 1959; Humphrey, 1861; Porta, 1593). Apparently one has only to apply one’s choice of test(s) as summarised in the previous section (or identify the writing hand or the ball-kicking foot) and the laterality of the dominant eye is established. But as we have just discussed, these several tests unfortunately indicate that more often than not OD appears to vary with test selection or circumstances. Also, the majority of studies investigating OD in tandem with hand (and rarely, foot) preference have failed to show a congruent relationship: selected examples include Annett (1999), Coren & Kaplan (1973), Gronwall & Sampson (1971), McManus et al. (1999), Merrell (1957), Pointer (2001), Porac & Coren (1975), and Snyder & Snyder (1928).
The content of the previous section of this chapter was based on the results of the modern analytical comparative study of OD tests undertaken by Coren & Kaplan (1973). Three statistically significant but independent features were common in analysis: viz, the eye which most dominated during binocular rivalry tests, the eye with the better visual acuity and (most significantly) the eye used for sighting. This outcome formalised the several results reported by many investigators before and since: in fact Mapp et al., (2003) have listed 21 such relevant studies published between 1925 (Mills) and 2001 (Pointer). Put succinctly, OD measured in the individual with one test format does not necessarily agree with that determined using an alternative approach. Undaunted, a neuro-anatomical explanation for this functional inconsistency has been essayed. Hemispheric cortical specialisation has frequently been claimed as the causal factor underlying all dominances of paired sensory organs or motor limbs. However as long as seventy years ago Warren & Clark (1938) disputed any relation between OD and cortical laterality, but still speculation has not been entirely silenced. Suggestions continue to be made that a greater proportion of the primary visual cortex is activated by unilateral stimulation of the dominant eye than by that of the companion eye (eg, Menon et al., 1997; Rombouts et al., 1996). However, what must be remembered in the specific case of human ocular neuroanatomy is that there is semi-decussation of the optic nerve fibres at the optic chiasma (Wolff, 1968), which results in the unique bi-cortical representation of each eye (Duke-Elder, 1952; Flax, 1966). This situation is quite unlike the straightforward contra-lateral cortical representation pertaining for the upper or lower limbs. Quite simply, ocular neuro-anatomy denies any unifying concept of laterality. With an equal longevity to misunderstandings surrounding a claimed cortical basis for OD is the suggestion that sighting laterality provides the reference frame for the spatial perception of visual direction (Khan & Crawford, 2001; Porac & Coren, 1981; Sheard, 1926; Walls, 1951). The justification for this assertion, linking a monocular task with a binocular system, is doubtful (Gilchrist, 1976). It has been convincingly argued by Mapp et al. (2003; pp. 313-314), drawing on both their own research and independent evidence in the literature, that both eyes participate in determining visual direction (Barbeito, 1981; Porac & Coren, 1986) and not the sighting dominant eye alone (eg, Khan & Crawford, 2001). This paired influence is also of course in accord with Hering’s (1868/1977) concept of binocular vision, wherein the two eyes are composite halves of a single organ.
Resolving the Paradox of Ocular Dominance

From the foregoing discussion of the phenomenon of OD and attempts to define and measure it, we have apparently arrived at a paradoxical position. On the one hand the majority of binocularly sighted persons would quite reasonably claim to have experienced a demonstration of OD; on the other hand we are unable to confirm uniformity of laterality in the individual, or identify a clearly defined oculo-visual role for the phenomenon. Of the non-human species, only primates have been considered to show characteristics consistent with having a sighting dominant eye (Cole, 1957; Smith, 1970). This has led Porac & Coren (1976) to speculate that perhaps OD is important to animals (including man) where the two eyes are located in the frontal plane of the head: this anatomical arrangement means that the left and right monocular visual fields display a substantial binocular overlap, with the functional consequence of enhanced depth perception through binocular disparity. However, given that there is fusion only for objects stimulating corresponding retinal points, perhaps suppression of the image in the non-dominant eye removes the interference arising from disparate and non-fusible left and right images that would otherwise confuse the visual percept. Thus the result when undertaking a sighting task, for example, is that the image in the dominant eye prevails. The apparent consistency of right and left OD proportions across the human population, and the almost universal demonstration of OD from an early (preschool) age, have inclined Porac & Coren (1976, p. 892) to the opinion that: “… monocular viewing via the dominance mechanism is as natural and adaptive to the organism as binocular fusion”. But how reasonable is it to suggest that the highly evolved human visual system, which normally seeks to maintain binocular single vision, will ‘naturally’ occasionally resort to monocularity? And in this regard, as we have touched on in the previous section of this chapter and also stated when discussing tests of rivalry, in the individual with normal (ie, non-amblyopic) sight suppression of competing or rivalrous stimuli is not limited to one eye and one eye alone but rather is fluid from one moment to the next. It is evident that the sighting (syn. aiming or alignment) test format is the only technique that apparently has the potential to identify OD consistently in a binocularly sighted individual (Coren & Kaplan, 1973). This form of task specifically allows the selection of only one eye to accomplish its demands (Miles, 1928); whether by ease or habit most individuals usually perform reliably and repeatedly in their choice of eye under these constrained circumstances (Porac & Coren, 1976).
It is precisely when the restricted condition of monocular sighting is denied that the consistency of eye choice deteriorates. For example, Barbeito (1981) reported that when performing the hole-in-the-card test with the hole covered (or the Porta pointing test with the view of one’s hand obscured) the imagined hole (or unseen finger or rod) is located on a notional line joining the target and a point between the eyes. Interestingly, a similar effect is observed in infants: when asked to grasp and look through a tube, they will invariably place the tube between rather than in front of one or other eye (the Cyclops effect: Barbeito, 1983). Dengis et al. (1998) have replicated this latter outcome in binocular adult subjects: an electronic shutter obscured the view of the target as a viewing tube was brought up to the face, resulting in a failure to choose a sighting eye; instead the tube was placed at a point on or either side of the bridge of the nose. Gathering these strands together, perhaps it is possible to reconcile the conflicting views of OD by considering it to be (in older children and adults) a phenomenon demonstrated under constrained viewing conditions (Mapp et al., 2003), with convenience or personal habit (Miles, 1930) forcing the (likely consistent) choice of one or the other eye. The elaboration of convoluted, parsimonious or test-specific explanations of OD in an attempt to reconcile observed phenomena with the known facts regarding binocular vision then becomes unnecessary. In summary, the functional significance of OD simply extends to identifying which of a pair of eyes will be used for monocular sighting tasks (Mapp et al., 2003).
Some Clinical Implications of Ocular Dominance

The concept of OD as a phenomenon identified under circumstances where monocular forced-choice viewing conditions prevail does not exist in a void. Consequently, this chapter will conclude with a consideration of the clinical implications of OD in patients for whom unilateral refractive, pathological or physiological (usually age-related) changes might impact on their binocular status. While the normally sighted binocular individual may not substantially depend on a dominant eye for the majority of daily activities, the identification of a preferred (sighting) eye could become functionally beneficial under specific circumstances. In an optometric context, for example, for the maximum relief of symptoms (including blurred vision and headaches) associated with uncompensated heterophoria (a tendency for the two eyes to deviate from the intended point of fixation), Mallett (1988b) advised that the greater part of any prescribed prism should be incorporated before the non-dominant eye; clinical
experience suggests that inattention to this point might fail to resolve the patient’s asthenopia or visual discomfort. A specialist clinical area that has come to the fore in recent years is that of monovision, a novel method of correcting presbyopia (the physiological age-related deterioration in near vision as a consequence of impaired accommodative ability). In this approach (Evans, 2007), one eye is optically corrected for distance viewing and its companion focused for near. Prospective candidates for this procedure include middle-aged (usually longstanding) contact lens wearers, and persons electing to undergo laser refractive surgery. Typically in monovision it is the dominant eye (usually identified by a sighting test) that is provided with the distance refractive correction; near focus tasks are allotted to the companion eye. This allocation recognises (Erickson & Schor, 1990) that performance with the dominant eye is superior for spatio-locomotor tasks (including ambulatory activities and driving a vehicle), such actions relying on an accurate sense of absolute visual direction. In addition, it is claimed that this ‘dominant/distance’ clinical approach produces better binocular summation at middle distance and reasonable stereo-acuity at near (Nitta et al., 2007). Others have disputed the necessity to adhere to such a rigid rule, not only when fitting contact lenses (Erickson & McGill, 1992) but also when undertaking refractive surgery (Jain et al., 2001). However, it might be remarked that a fundamental area of concern remains centred on the identification or the procedural choice of the ‘dominant’ eye, ie, that eye which is likely to take on the distance-seeing role. While the visual sensori-motor system shows great adaptability and, as we have seen, OD can switch between eyes depending upon circumstances, great care should be taken when practitioners choose to prescribe monovision. Given the wealth of stimuli and changeable viewing circumstances in the ‘real world’, even sighting tests may not reliably indicate which eye should have the distance correction; furthermore, such tests cannot accurately predict or guarantee the success of monovision in an individual. Medico-legal, occupational and vocational caveats surround such a specific approach to visual correction. Appropriate patient selection and screening are essential features (Jain et al., 1996). Subjects usually require a variable period of spatial adaptation due to compromised binocular function and the evident necessity for visual suppression as viewing circumstances demand. These and related features associated with this slightly controversial clinical modus operandi have been well reviewed as experience with the procedure has evolved (Erickson & Schor, 1990; Evans, 2007; McMonnies, 1974) so will not be considered further here.
A naturally occurring version of monovision might temporarily appear in patients affected by physiological age-related lens opacities (cataracts). The visual deterioration usually occurs in both eyes, but often to differing degrees and at different rates, eventually requiring first and then frequently second eye surgery with artificial lens implantation to restore reasonable visual acuity. However, it has been reported (Talbot & Perkins, 1998) that following the improvement in binocular function after second eye treatment the laterality of OD may change. It has also been recorded (Waheed & Laidlaw, 2003) that the debilitating effects of monocular injury or pathology may more markedly impair mobility or quality of life if it is the performance of the hitherto sighting dominant eye that is primarily compromised. However, again, the possibility of sighting dominance switching in patients with unilaterally acquired macular disease cannot be discounted (Akaza et al., 2007).
Conclusion

It might be that a conclusion drawn by Warren & Clark (1938: p. 302) seventy years ago remains pertinent today: “eye dominance as a single unitary factor does not exist”. Perhaps OD is no more than a “demonstrable habit” (Miles, 1930) in binocular vision, adopted when viewing circumstances demand that only one eye can conveniently be used (Porac & Coren, 1976). Since classical times the paradoxical question of why, under binocular conditions, unilateral sighting might be considered an advantage compared to the continued use of two eyes has continued to be asked (Wade, 1998). Allied to this, what could be the oculo-visual purpose or benefit to the individual of a dominant eye whose laterality can apparently be modified by test conditions (Carey, 2001), by vision training (Berner & Berner, 1953), and by attentional factors (Ooi & He, 1999)? While the functional basis of OD remains uncertain in a species with a highly evolved binocular visual system, its demonstrable existence in the majority of normally sighted individuals has been linked to a number of perceptual and clinical phenomena. Unfortunately, the years of burgeoning knowledge have perhaps tended to obscure rather than clarify many issues surrounding OD and its relation to oculo-visual performance. What can be stated is that OD must be recognised as a dynamic concept, fluid and deformable in the context of specific viewing conditions and with regard to the methods used to identify it.
References

Akaza, E., Fujita, K., Shimada, H. & Yuzawa, M. (2007). Sighting dominance in patients with macular disease. Nippon Ganka Gakkai Zasshi, 111, 322-325 [in Japanese].
Annett, M. (1999). Eye dominance in families predicted by the right shift theory. Laterality, 4, 167-172.
Banks, M. S., Ghose, T. & Hillis, J. M. (2004). Relative image size, not eye position, determines eye dominance switches. Vision Res., 44, 229-234.
Barbeito, R. (1981). Sighting dominance: an explanation based on the processing of visual direction in tests of sighting dominance. Vision Res., 21, 855-860.
Barbeito, R. (1983). Sighting from the cyclopean eye: the Cyclops effect in preschool children. Percept. Psychophys., 33, 561-564.
Barbeito, R., Tam, W. J. & Ono, H. (1986). Two factors affecting saccadic amplitude during vergence: the location of the cyclopean eye and a left-right bias. Ophthalmic Physiol. Opt., 6, 201-205.
Berner, G. E. & Berner, D. E. (1953). Relation of ocular dominance, handedness and the controlling eye in binocular vision. A. M. A. Arch. Ophthalmol., 50, 603-608.
van Biervlet, J. J. (1901). Nouvelle contribution a l’étude de l’asymetrie sensorielle. Bull. l’Acad. Roy. Sci. Belgique, 3, 679-694.
Carey, D. P. (2001). Vision research: Losing sight of eye dominance. Curr. Biol., 11, R828-R830.
Cohen, J. (1952). Eye dominance. Am. J. Psychol., 65, 634-636.
Cole, J. (1957). Laterality in the use of the hand, foot, and eye in monkeys. J. Comp. Physiol. Psychol., 50, 296-299.
Coren, S. & Kaplan, C. P. (1973). Patterns of ocular dominance. Am. J. Optom. Arch. Am. Acad. Optom., 50, 283-292.
Coren, S. & Porac, C. (1975). Ocular dominance: an annotated bibliography. JSAS Catalogue of Selected Documents in Psychology, 5, 229-230. (Ms. No. 922).
Coren, S., Porac, C. & Duncan, P. (1979). A behaviourally validated self-report inventory to assess four types of lateral preference. J. Clin. Neuropsychol., 1, 55-64.
Crider, B. (1944). A battery of tests for the dominant eye. J. Gen. Psychol., 31, 179-190.
Delacato, C. H. (1959). The Treatment and Prevention of Reading Problems (The Neuropsychological Approach). Springfield, Illinois: Charles C. Thomas.
Dengis, C. A., Simpson, T. L., Steinbach, M. J. & Ono, H. (1998). The Cyclops effect in adults: sighting without visual feedback. Vision Res., 38, 327-331.
Dengis, C. A., Steinbach, M. J., Goltz, H. C. & Stager, C. (1993). Visual alignment from the midline: a declining developmental trend in normal, strabismic, and monocularly enucleated children. J. Pediatr. Ophthalmol. Strabismus, 30, 323-326.
Dengis, C. A., Steinbach, M. J., Ono, H., Gunther, L. N., Fanfarillo, R., Steeves, J. K. & Postiglione, S. (1996). Learning to look with one eye: the use of head turn by normals and strabismics. Vision Res., 36, 3237-3242.
Duke-Elder, W. S. (1952). Textbook of Ophthalmology, Vol. 4. London: Henry Kimpton.
Durand, A. C. & Gould, G. M. (1910). A method of determining ocular dominance. J. Am. Med. Assoc., 55, 369-370.
Ehrenstein, W. H., Arnold-Schulz-Gahmen, B. E. & Jaschinski, W. (2005). Eye preference within the context of binocular functions. Graefes Arch. Clin. Exp. Ophthalmol., 243, 926-932.
Erickson, P. & McGill, E. C. (1992). Role of visual acuity, stereoacuity, and ocular dominance in monovision patient success. Optom. Vis. Sci., 69, 761-764.
Erickson, P. & Schor, C. (1990). Visual function with presbyopic contact lens correction. Optom. Vis. Sci., 67, 22-28.
Eser, I., Durrie, D. S., Schwendeman, F. & Stahl, J. E. (2008). Association between ocular dominance and refraction. J. Refract. Surg., 24, 685-689.
Evans, B. J. (2007). Monovision: a review. Ophthalmic Physiol. Opt., 27, 417-439.
Flax, N. (1966). The clinical significance of dominance. Am. J. Optom. Arch. Am. Acad. Optom., 43, 566-581.
Gilchrist, J. M. (1976). Dominance in the visual system. Br. J. Physiol. Opt., 31, 32-39.
Gronwall, D. M. & Sampson, H. (1971). Ocular dominance: a test of two hypotheses. Br. J. Psychol., 62, 175-185.
Henriques, D. Y., Medendorp, W. P., Khan, A. Z. & Crawford, J. D. (2002). Visuomotor transformation for eye-hand coordination. Prog. Brain Res., 140, 329-340.
Hering, E. (1977). The Theory of Binocular Vision (B. Bridgeman & L. Stark, Eds. and trans.). New York: Plenum. (Originally published 1868).
Humphrey, G. M. (1861). The Human Foot and the Human Hand. Cambridge: Cambridge University Press, pp. 201 et seq.
Jain, S., Arora, I. & Azar, D. T. (1996). Success of monovision in presbyopes: review of the literature and potential applications to refractive surgery. Surv. Ophthalmol., 40, 491-499.
Jain, S., Ou, R. & Azar, D. T. (2001). Monovision outcomes in presbyopic individuals after refractive surgery. Ophthalmology, 108, 1430-1433.
Jasper, H. H. & Raney, E. T. (1937). The phi test of lateral dominance. Am. J. Psychol., 49, 450-457.
Khan, A. Z. & Crawford, J. D. (2001). Ocular dominance reverses as a function of horizontal gaze angle. Vision Res., 41, 1743-1748.
Kommerell, G., Schmitt, C., Kromeier, M. & Bach, M. (2003). Ocular prevalence versus ocular dominance. Vision Res., 43, 1397-1403.
Lam, A. K. C., Chau, A. S. Y., Lam, W. Y., Leung, G. Y. O. & Man, B. S. H. (1996). Effect of naturally occurring visual acuity differences between two eyes in stereoacuity. Ophthalmic Physiol. Opt., 16, 189-195.
Lederer, J. (1961). Ocular dominance. Aust. J. Optom., 44, 531-539; 570-574.
Levelt, W. J. M. (1968). On Binocular Rivalry. Paris: Mouton.
Mallett, R. F. J. (1988a). Techniques of investigation of binocular vision anomalies. In K. Edwards & R. Llewellyn (Eds.), Optometry (p. 266). London: Butterworths.
Mallett, R. F. J. (1988b). The management of binocular vision anomalies. In K. Edwards & R. Llewellyn (Eds.), Optometry (pp. 281-282). London: Butterworths.
Mapp, A. P., Ono, H. & Barbeito, R. (2003). What does the dominant eye dominate? A brief and somewhat contentious review. Percept. Psychophys., 65, 310-317.
McManus, I. C., Porac, C., Bryden, M. P. & Boucher, R. (1999). Eye dominance, writing hand, and throwing hand. Laterality, 4, 173-192.
McMonnies, C. W. (1974). Monocular fogging in contact lens practice. Aust. J. Optom., 57, 28-32.
Menon, R. S., Ogawa, S., Strupp, J. P. & Ugurbil, K. (1997). Ocular dominance in human V1 demonstrated by functional magnetic resonance imaging. J. Neurophysiol., 77, 2780-2787.
Merrell, D. J. (1957). Dominance of eye and hand. Human Biology, 29, 314-328.
Miles, W. R. (1928). Ocular dominance: methods and results. Psychol. Bull., 25, 155-156.
Miles, W. R. (1929). Ocular dominance demonstrated by unconscious sighting. J. Exp. Psychol., 12, 113-126.
Miles, W. R. (1930). Ocular dominance in human adults. J. Gen. Psychol., 3, 412-420.
Mills, L. (1925). Eyedness and handedness. Am. J. Ophthalmol., 8 (Series 3), 933-941.
Nitta, M., Shimizu, K. & Niida, T. (2007). The influence of ocular dominance on monovision – the interaction between binocular visual functions and the state of dominant eye’s correction. Nippon Ganka Gakkai Zasshi, 111, 434-440 [in Japanese].
Ono, H. & Barbeito, R. (1982). The cyclopean eye vs. the sighting-dominant eye as the centre of visual direction. Percept. Psychophys., 32, 201-210.
Ooi, T. L. & He, Z. J. (1999). Binocular rivalry and visual awareness: the role of attention. Perception, 28, 551-574.
Pickwell, L. D. (1972). Hering’s law of equal innervation and the position of the binoculus. Vision Res., 12, 1499-1507.
Pointer, J. S. (2001). Sighting dominance, handedness, and visual acuity preference: three mutually exclusive modalities? Ophthalmic Physiol. Opt., 21, 117-126.
Pointer, J. S. (2007). The absence of lateral congruency between sighting dominance and the eye with better visual acuity. Ophthalmic Physiol. Opt., 27, 106-110.
Porac, C. & Coren, S. (1975). Is eye dominance a part of generalized laterality? Percept. Mot. Skills, 40, 763-769.
Porac, C. & Coren, S. (1976). The dominant eye. Psychol. Bull., 83, 880-897.
Porac, C. & Coren, S. (1977). The assessment of motor control in sighting dominance using an illusion decrement procedure. Percept. Psychophys., 21, 341-346.
Porac, C. & Coren, S. (1981). Lateral Preferences and Human Behaviour. New York: Springer-Verlag.
Porac, C. & Coren, S. (1986). Sighting dominance and egocentric localization. Vision Res., 26, 1709-1713.
Porac, C., Coren, S. & Duncan, P. (1980). Life-span age trends in laterality. J. Gerontol., 35, 715-721.
Porta, G. B. (1593). De Refractione. Optices Parte. Libri Novem. Naples: Salviani.
Quartley, J. & Firth, A. Y. (2004). Binocular sighting ocular dominance changes with different angles of horizontal gaze. Binocul. Vis. Strabismus Q., 19, 25-30.
Reiss, M. R. (1997). Ocular dominance: some family data. Laterality, 2, 7-16.
Rombouts, S. A., Barkhof, F., Sprenger, M., Valk, J. & Scheltens, P. (1996). The functional basis of ocular dominance: functional MRI (fMRI) findings. Neurosci. Lett., 221, 1-4.
Ross, W. D. (1927). The Works of Aristotle: Volume 3 (Ed.). Oxford: Clarendon Press.
Sheard, C. (1926). Unilateral sighting and ocular dominance. Am. J. Optom. Arch. Am. Acad. Optom., 7, 558-567.
Shneor, E. & Hochstein, S. (2006). Eye dominance effects in feature search. Vision Res., 46, 4258-4269.
Shneor, E. & Hochstein, S. (2008). Eye dominance effects in conjunction search. Vision Res., 48, 1592-1602.
Smith, L. (1970). Eye dominance in a monkey. Percept. Mot. Skills, 31, 657-658.
Snyder, A. M. & Snyder, M. A. (1928). Eye preference tendencies. J. Ed. Psychol., 19, 431-433.
Talbot, E. M. & Perkins, A. (1998). The benefit of second eye cataract surgery. Eye, 12, 983-989.
Wade, N. J. (1987). On the late invention of the stereoscope. Perception, 16, 785-818.
Wade, N. J. (1998). Early studies of eye dominances. Laterality, 3, 97-108.
Waheed, L. & Laidlaw, D. A. H. (2003). Disease laterality, eye dominance, and visual handicap in patients with unilateral full thickness macular holes. Br. J. Ophthalmol., 87, 626-628.
Walls, G. L. (1951). A theory of ocular dominance. A. M. A. Arch. Ophthalmol., 45, 387-412.
Warren, N. & Clark, B. (1938). A consideration of the use of the term ocular dominance. Psychol. Bull., 35, 298-304.
Washburn, M. F., Faison, C. & Scott, R. (1934). A comparison between the Miles A-B-C method and retinal rivalry as tests of ocular dominance. Am. J. Psychol., 46, 633-636.
Wolff, E. (1968). The Visual Pathway. In R. J. Last (Ed.), Anatomy of the Eye and Orbit (6th edition, pp. 341-344). London: H. K. Lewis.
Woo, T. L. & Pearson, K. (1927). Dextrality and sinistrality of hand and eye. Biometrika, 19, 165-199.
Yang, Z., Lan, W., Liu, W., Chen, X., Nie, H., Yu, M. & Ge, J. (2008). Association of ocular dominance and myopia development: a 2-year longitudinal study. Invest. Ophthalmol. Vis. Sci., 49, 4779-4783.
In: Binocular Vision, Editors: J. McCoun et al., pp. 81-105
ISBN: 978-1-60876-547-8 © 2010 Nova Science Publishers, Inc.
Chapter 3
THREE-DIMENSIONAL VISION BASED ON BINOCULAR IMAGING AND APPROXIMATION NETWORKS OF A LASER LINE

J. Apolinar Muñoz-Rodríguez*

Centro de Investigaciones en Optica, A. C., Leon, Gto, 37150 Mexico.
Abstract

We present a review of our computer vision algorithms and binocular imaging for shape detection in optical metrology. The study of this chapter involves: laser metrology, binocular image processing, neural networks, and computer vision parameters. In this technique, the object shape is recovered by means of laser scanning and binocular imaging. The binocular imaging avoids occlusions, which appear due to the variation of the object surface. A Bezier approximation network computes the object surface based on the behavior of the laser line. By means of this network, the measurements of the binocular geometry are avoided. The parameters of the binocular imaging are computed based on the Bezier approximation network. Thus, the binocular images of the laser line are processed by the network to compute the object topography. By applying Bezier approximation networks, the performance of the binocular imaging and the accuracy are improved. This is because the errors of the measurement are not added to the computational procedure, which performs the shape reconstruction. This procedure represents a contribution to the stripe projection methods and the binocular imaging. To describe the accuracy, a mean square error is calculated. This technique is tested with real objects and its experimental results are presented. Also, the processing time is described.

* E-mail address: [email protected]. Tel: (477) 441 42 00.
Keywords: Binocular imaging, laser line, Bezier approximation networks.
1. Introduction

In computer vision, optical systems have been applied for shape detection. The use of structured illumination makes the system more reliable and the acquired data easier to interpret. A particular technique is the laser line projection [1-3]. When a laser line is projected on an object, the line position is shifted in the image due to the surface variation and the camera position. From the line position and the geometry of the optical setup, the object contour is deduced. The main aim of the line projection is the detection of the line behavior in the image [4-6]. Also, the geometry of the setup is measured to obtain the object topography. When the surface variation produces an occlusion, the line position cannot be detected. Therefore, in the area of the line occlusion the topographic data are not retrieved [7-8]. In the technique proposed here, occluded areas are retrieved by binocular imaging. To extract the object surface, the object is moved in the x-axis and scanned by a laser line. Based on the line behavior in the image, the object shape is retrieved. In each step of the scanning, a pair of binocular images of the line is captured. When the surface produces a line occlusion, the occluded area appears in only one of the two images. Thus, the data missing in one image can be retrieved from the other. In this manner, the line occlusion is avoided by the binocular imaging. Typically, the disparity is needed to retrieve the object depth in a binocular system [9]. In the proposed technique, the disparity is determined by detecting the line position in each pair of the binocular images. In the proposed setup, the position of the line disparity is proportional to the object depth. By means of a Bezier network, a mathematical relationship between the line position and the object depth is generated. In this manner, the measurements of the focal length and the distance between the two cameras are avoided. The Bezier network is constructed using the disparity of the line, which is projected on objects with known dimensions. The disparity is detected by measuring the position of the line displacement in the image. Thus, the network calculates the depth dimension for a given position of the stripe displacement. The line position is detected by means of Bezier curves with a resolution of a fraction of a pixel. This position is processed by the network to determine the object shape. In this manner, all steps of the proposed technique are performed automatically by
computational algorithms. Thus, physical measurements on the setup are avoided. This kind of computational process improves the performance and the accuracy. Thus, a contribution is provided to the binocular imaging methods. In the proposed technique, the object is moved in the x-axis and scanned by a laser line. From the scanning, a set of binocular images of the line is captured. Each one of these images is processed by the Bezier curves to determine the position of the line disparity. This position is processed by the network to determine the object depth. The structure of the network consists of an input vector, a hidden layer and an output layer. The input vector includes: object dimensions, line position and parametric data. The hidden layer is constructed by neurons of Bezier basis functions. The output layer is formed by the summation of the neurons, which are multiplied by a weight. The information produced by a pair of binocular images corresponds to a transverse section of the object. The data of the transverse sections are stored in an array memory to obtain the complete object shape. The results obtained with this technique are achieved with very good repeatability.
2. Basic Theory

The proposed setup (figure 1) consists of a line projector, two CCD cameras, an electromechanical device and a computer. In this arrangement, the object is fixed on a platform of the electromechanical device. The platform moves the object in the x-axis. A laser stripe is projected onto the object surface by a laser diode to perform the scanning. In each step of the scanning, the laser line is digitized by two CCD cameras. In each image, the line is deformed in the x-axis according to the object surface. The profilometric method is based on the line deformation. By detecting the position of the line deformation, the object depth is determined. Therefore, the line position is the main parameter to perform the object contouring. The object contour is broken when an occlusion appears. To retrieve the complete object contour, the binocular imaging is applied. In this system, the occluded line in the first camera can be observed by the second camera. Also, the occluded line in the second camera can be observed by the first camera. Thus, the binocular system overcomes the occlusion in the line projection. In this manner, the object contour can be computed completely. The contouring is described based on the geometry shown in figure 2. On the reference plane are located the x-axis and y-axis, and the object depth is indicated by h(x, y) in the z-axis. The points o and p correspond to the line projected on the reference plane and the object surface, respectively. The focal length is indicated by F, d is the distance between the two cameras and the image center of each
camera is indicated by ac and ec, respectively. When a laser line is projected on the surface, the line position is moved from ao to ap in the image plane of the camera a. At the same time, the line position is moved from eo to ep in the camera e. The line displacement in the image plane for each camera is represented by

$$a_s(x,y) = a_o - a_p, \qquad (1)$$

$$e_s(x,y) = e_p - e_o. \qquad (2)$$
By means of the positions ep and ap, the line displacement with respect to the reference point “o” is given by as(x,y) for the camera a and es(x,y) for the camera e. Based on the pinhole camera model [10], the surface zi in a binocular system is deduced by

$$z_i = \frac{d\,F}{\delta + \varepsilon}, \qquad (3)$$
from this equation, δ + ε is the disparity. The distances of the disparity are deduced by δ = ac − ap and ε = ec − ep, respectively. To compute the surface zi by means of Eq.(3), the constants F, d, ac, ec and the camera orientation should be known. Typically, the parameters d and F are obtained by a procedure external to the contouring. Then, these parameters are given to the computer system to reconstruct the object shape. This means that Eq.(3) cannot be computed within the image processing of the laser line. For the proposed technique, the object depth h(x,y) is computed directly in the image processing of the line position. In this procedure, a Bezier network provides a function that computes the object depth based on the line displacement. Also, this network provides information on the camera orientation, the focal length, the image center coordinates, the distance between the two cameras and the disparity. Thus, the performance for object contouring is improved. In our contouring system, the object depth is computed by the network based on the line position using only one camera of the binocular system. This means that the network produces the depth h(x,y) using as(x,y) or es(x,y). The image processing provides at least one of the two displacements. When a line occlusion appears in the camera a, as(x,y) is missing. Then, the ep of the line disparity is used to compute the object depth h(x,y). In this manner, the occlusion problem of the laser line is solved by the binocular imaging. To detect the disparity, the line position is measured in every row of the binocular images. This position is computed by detecting the maximum of the line intensity in each
row of the image. The intensity projected by a laser diode is a Gaussian distribution in the x-axis [11]. The intensity in every row of the image is represented by (x0, z0), (x1, z1), (x2, z2),.......,(xn, zn), where xi is the pixel position and zi is the pixel intensity. To detect the line position, the maximum of the intensity is measured in the image. To carry it out, Bezier curves and peak detection are used. The nth-degree Bezier function is determined by n+1 pixels [12] and is described by two parametric equations:
$$x(u) = \binom{n}{0}(1-u)^{n}u^{0}x_0 + \binom{n}{1}(1-u)^{n-1}u\,x_1 + \cdots + \binom{n}{n}(1-u)^{0}u^{n}x_n, \qquad 0 \le u \le 1, \qquad (4)$$

$$z(u) = \binom{n}{0}(1-u)^{n}u^{0}z_0 + \binom{n}{1}(1-u)^{n-1}u\,z_1 + \cdots + \binom{n}{n}(1-u)^{0}u^{n}z_n, \qquad 0 \le u \le 1. \qquad (5)$$
Eq.(4) represents the pixel position and Eq.(5) represents the pixel intensity. To fit the Bezier curve shown in figure 3, x0, x1, x2,......,xn are substituted into Eq.(4) and z0, z1, z2,....,zn are substituted into Eq.(5). Then, these equations are evaluated in the interval 0 ≤ u ≤ 1. In this case, the second derivative z″(u) < 0 at the maximum. Therefore, the maximum is detected by setting the first derivative equal to zero, z′(u) = 0 [13], via the bisection method. Beginning with a pair of values ui = 0 and us = 1, because z(u) is defined for the interval 0 ≤ u ≤ 1, u* is halfway between ui and us. If z′(u) evaluated at u = u* is positive, then ui = u*. If z′(u) evaluated at u = u* is negative, then us = u*. Next, u* is taken as the midpoint of the last pair of values, and the iteration converges to the root. The value u* where z′(u) = 0 is substituted into Eq.(4) to obtain the maximum position x*. The result is x* = 34.274 and the stripe position is ap = 34.274 pixels, which is shown in figure 3. The procedure of stripe detection is applied to all rows of the image. Then, the line position is processed by the network to obtain the object contour.
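To make this step concrete, the following is a minimal Python sketch of the peak-detection procedure just described: the pixel positions and intensities of one image row are taken as the control values of the Bezier curves of Eqs.(4)-(5), and the sub-pixel stripe position is found by bisection on z′(u) = 0. The function names and the sample intensity profile are illustrative only and are not taken from the chapter; the sketch assumes a single-peaked (roughly Gaussian) row profile.

```python
from math import comb

def bezier_point(u, ctrl):
    """Evaluate the nth-degree Bezier curve of Eqs. (4)-(5) at parameter u,
    where ctrl holds the control values (pixel positions or intensities)."""
    n = len(ctrl) - 1
    return sum(comb(n, i) * (1 - u) ** (n - i) * u ** i * c
               for i, c in enumerate(ctrl))

def bezier_derivative(u, ctrl):
    """Derivative of the Bezier curve with respect to u."""
    n = len(ctrl) - 1
    diffs = [n * (ctrl[i + 1] - ctrl[i]) for i in range(n)]
    return bezier_point(u, diffs)

def line_peak_position(x, z, iterations=60):
    """Sub-pixel stripe position for one image row.

    x : pixel positions x0..xn, z : pixel intensities z0..zn.
    The maximum of z(u) is located by bisection on z'(u) = 0, and the
    root u* is substituted into x(u) to give the position in pixels."""
    ui, us = 0.0, 1.0
    for _ in range(iterations):
        um = 0.5 * (ui + us)
        if bezier_derivative(um, z) > 0:   # peak lies to the right of um
            ui = um
        else:                              # peak lies to the left of um
            us = um
    u_star = 0.5 * (ui + us)
    return bezier_point(u_star, x)

# Illustrative row: a roughly Gaussian intensity profile peaking near pixel 34
x_row = list(range(30, 40))
z_row = [8, 20, 46, 92, 150, 142, 88, 41, 17, 7]
print(round(line_peak_position(x_row, z_row), 3))
```

Applied to every row of both binocular images, this routine yields the stripe positions ap and ep from which the displacements of Eqs.(1)-(2) are obtained.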
Figure 1. Experimental setup.
Figure 2. Geometry of the experimental setup.
Figure 3. Maximum position from a set of pixels fitted to a Bezier curve.
3. Bezier Networks for Surface Contouring

From the binocular images, the line displacement is proportional to the object depth. A Bezier network is built to compute the object depth h(x,y) based on the displacement as(x,y) or es(x,y). This network is constructed based on a line projected on objects with known dimensions. The network structure consists of an input vector, a parametric input, a hidden layer and an output layer. This network is shown in figure 4. Each layer of the network is deduced as follows. The input includes: the object dimensions hi, the stripe displacements as, es and the parametric values u and v. The input data as0, as1, as2,….,asn and es0, es1, es2,….,esn are the stripe displacements obtained by the image processing described in section 2. By means of these displacements, linear combinations LCa and LCe are determined [14] to compute u and v. The relationship between the displacement and the parametric values is described by

$$u = b_0 + b_1 a_s, \qquad (6)$$

$$v = c_0 + c_1 e_s, \qquad (7)$$
where bi and ci are the unknown constants. The Bezier curves are defined in the intervals 0 ≤ u ≤ 1 and 0 ≤ v ≤ 1 [17]. For the depth h0, the displacements as0 and es0 are produced. Therefore, u = 0 for as0 and v = 0 for es0. For hn the displacements asn and esn are produced. Therefore, u = 1 for asn and v = 1 for esn. Substituting the values (as0, u=0) and (asn, u=1) in Eq.(6), two equations with two unknown constants are obtained. Solving these equations, b0 and b1 are determined and Eq.(6) is completed. Substituting the values (es0, v=0) and (esn, v=1) in Eq.(7), two equations with two unknown constants are obtained. Solving these equations, c0 and c1 are determined and Eq.(7) is completed. Thus, for the displacements as and es, the parametric values u and v are computed via Eq.(6) and Eq.(7), respectively. The inputs h0, h1, h2,...,hn are obtained from the pattern objects, whose dimensions are known. The hidden layer is constructed by the Bezier basis function [16], which is described by
$$B_i(u) = \binom{n}{i}u^{i}(1-u)^{n-i}, \qquad (8)$$

where the binomial coefficient

$$\binom{n}{i} = \frac{n!}{i!\,(n-i)!}$$
denotes the binomial distribution from statistics. The output layer is the summation of the neurons of the hidden layer, which are multiplied by a weight. The output is the depth h(x,y), which is represented by ah(u) and eh(v). These two outputs are described by the following equations:

$$a_h(u) = \sum_{i=0}^{n} w_i B_i(u)\,h_i, \qquad 0 \le u \le 1, \qquad (9)$$

$$e_h(v) = \sum_{i=0}^{n} r_i B_i(v)\,h_i, \qquad 0 \le v \le 1, \qquad (10)$$
where wi and ri are the weights, hi is the known dimension of the pattern objects and Bi is the Bezier basis function of Eq.(8). To obtain the network of Eq.(9) and Eq.(10), the suitable weights wi and ri should be determined. To obtain the weights w0, w1, w2,…..,wn, the network is forced to produce the outputs h0, h1, h2,.....,hn by means of an adjustment mechanism. To carry it out, each value hi and its corresponding u are substituted in Eq.(9). The value u is computed via Eq.(6) based on the displacement as, which corresponds to the known dimension hi.
For each input u, an output ah is produced. Thus, the next equation system is obtained
$$a_h(u_j) = h_j = \sum_{i=0}^{n} w_i \binom{n}{i}(1-u_j)^{n-i}u_j^{\,i}\,h_i, \qquad j = 0, 1, \ldots, n, \quad 0 \le u_j \le 1, \qquad (11)$$

where uj is the parametric value computed via Eq.(6) from the displacement corresponding to the known dimension hj.
This linear system of Eq.(11) can be represented as

$$h_0 = w_0\beta_{0,0} + w_1\beta_{0,1} + \cdots + w_n\beta_{0,n}$$
$$h_1 = w_0\beta_{1,0} + w_1\beta_{1,1} + \cdots + w_n\beta_{1,n}$$
$$\vdots \qquad (12)$$
$$h_n = w_0\beta_{n,0} + w_1\beta_{n,1} + \cdots + w_n\beta_{n,n}$$

This equation can be rewritten as the product between the matrix of the input data and the matrix of the corresponding output values: βW = H. The linear system is represented by the next matrix

$$\begin{bmatrix} \beta_{0,0} & \beta_{0,1} & \beta_{0,2} & \cdots & \beta_{0,n} \\ \beta_{1,0} & \beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,n} \\ \vdots & \vdots & \vdots & & \vdots \\ \beta_{n,0} & \beta_{n,1} & \beta_{n,2} & \cdots & \beta_{n,n} \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_n \end{bmatrix} \qquad (13)$$
This system, Eq.(13), is solved by the Cholesky method [17]. Thus, the weights w0, w1, w2,….,wn are calculated and Eq.(9) has been completed.
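As an illustration of how this weight system could be assembled and solved in practice, the sketch below builds the matrix of Bezier basis values of Eq.(8), maps the calibration displacements to u via the endpoint conditions of Eq.(6), and solves the resulting linear system βW = H of Eq.(13). A generic dense solver stands in for the Cholesky factorisation cited in the text; the function names, the use of NumPy, the assumption that all calibration depths hi are non-zero, and the staircase-gauge numbers in the usage example are our own and are invented for illustration.

```python
import numpy as np
from math import comb

def bernstein_matrix(us, n):
    """beta[j, i] = B_i(u_j): the Bezier basis of Eq. (8) evaluated at each u_j."""
    B = np.empty((len(us), n + 1))
    for j, u in enumerate(us):
        for i in range(n + 1):
            B[j, i] = comb(n, i) * u ** i * (1 - u) ** (n - i)
    return B

def train_bezier_network(a_s, h):
    """Weights w_i of Eq. (9) from calibration displacements a_s and known depths h.

    Eq. (6) is fixed from its endpoints: u = 0 at the first displacement and
    u = 1 at the last, so u = b0 + b1 * a_s.  All h_i are assumed non-zero so
    that the system matrix is invertible."""
    a_s = np.asarray(a_s, dtype=float)
    h = np.asarray(h, dtype=float)
    n = len(h) - 1
    b1 = 1.0 / (a_s[-1] - a_s[0])
    b0 = -b1 * a_s[0]
    u = b0 + b1 * a_s
    beta = bernstein_matrix(u, n)
    w = np.linalg.solve(beta * h, h)       # beta W = H, Eq. (13)
    return w, (b0, b1)

def network_depth(a_s_value, w, b, h):
    """Output a_h(u) of Eq. (9) for one measured displacement a_s_value."""
    b0, b1 = b
    n = len(h) - 1
    u = b0 + b1 * a_s_value
    basis = np.array([comb(n, i) * u ** i * (1 - u) ** (n - i) for i in range(n + 1)])
    return float(np.dot(w, basis * np.asarray(h, dtype=float)))

# Illustrative calibration: invented step heights (mm) and stripe displacements (pixels)
h_cal = [5.0, 10.0, 15.0, 20.0, 25.0]
s_cal = [12.4, 25.1, 37.9, 50.2, 62.8]
w, b = train_bezier_network(s_cal, h_cal)
print(network_depth(31.0, w, b, h_cal))
```

The same construction, with the displacements es and the weights ri, gives the second output eh(v) of Eq.(10).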
To determine the weights r0, r1, r2,…..,rn, the network is again forced to produce the outputs h0, h1, h2,.....,hn. To carry it out, the values v and hi are substituted in Eq.(10). The value v is calculated via Eq.(7) using the displacement es, which corresponds to the dimension hi. For each input v, an output eh is produced and the next equation system is obtained

$$e_h(v_j) = h_j = \sum_{i=0}^{n} r_i \binom{n}{i}(1-v_j)^{n-i}v_j^{\,i}\,h_i, \qquad j = 0, 1, \ldots, n, \quad 0 \le v_j \le 1. \qquad (14)$$
The linear system of Eq.(14) can be represented as the product between the input matrix and the matrix of the corresponding output: ℑR = H. The linear system is represented by the next matrix

$$\begin{bmatrix} \Im_{0,0} & \Im_{0,1} & \Im_{0,2} & \cdots & \Im_{0,n} \\ \Im_{1,0} & \Im_{1,1} & \Im_{1,2} & \cdots & \Im_{1,n} \\ \vdots & \vdots & \vdots & & \vdots \\ \Im_{n,0} & \Im_{n,1} & \Im_{n,2} & \cdots & \Im_{n,n} \end{bmatrix} \begin{bmatrix} r_0 \\ r_1 \\ \vdots \\ r_n \end{bmatrix} = \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_n \end{bmatrix} \qquad (15)$$
This linear system, Eq.(15), is solved and the weights r0, r1, r2,….,rn are determined. In this manner, Eq.(10) has been completed. Thus, the network produces the shape dimension via Eq.(9) and Eq.(10) based on the line displacements as and es, respectively. In this manner, the Bezier approximation network (BAN) provides the object depth by means of h(x,y) = ah(u) and h(x,y) = eh(v). This network is applied to the binocular images shown in figure 5(a) and figure 5(b), respectively. The binocular images correspond to a line projected on a dummy face. From these images, the stripe positions ap and ep are computed along the y-axis. Then, the displacements as(x,y) and es(x,y) are computed via Eq.(1) and Eq.(2), respectively.
The contours provided by the network are shown in figure 6(a) and figure 6(b), respectively. The contour of figure 6(a) is not complete due to the occlusion in figure 5(a). But figure 5(b) does not contain line occlusions, and the complete contour is achieved in figure 6(b).
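A minimal sketch of how a transverse section might be assembled from a binocular pair is given below: for each image row the displacement of Eq.(1) is used when camera a sees the stripe, and the displacement of Eq.(2) from camera e is used where the line is occluded in camera a. The NaN convention for occluded rows, the function names and the linear stand-in networks in the usage example are assumptions made for illustration; in the real system the depth functions would be the trained outputs ah(u) and eh(v) of Eqs.(9)-(10).

```python
import numpy as np

def contour_from_binocular_rows(ap_rows, ep_rows, ao, eo, depth_a, depth_e):
    """Depth profile of one transverse section from a binocular image pair.

    ap_rows, ep_rows : stripe positions per image row for cameras a and e
                       (np.nan where the stripe is occluded in that camera).
    ao, eo           : stripe positions on the reference plane (Eqs. 1-2).
    depth_a, depth_e : trained network outputs a_h(u) and e_h(v), each mapping
                       a stripe displacement to a depth (Eqs. 9-10).
    """
    profile = np.full(len(ap_rows), np.nan)
    for y, (ap, ep) in enumerate(zip(ap_rows, ep_rows)):
        if not np.isnan(ap):                 # camera a sees the stripe
            profile[y] = depth_a(ao - ap)    # a_s(x, y) = a_o - a_p, Eq. (1)
        elif not np.isnan(ep):               # occluded in a, visible in e
            profile[y] = depth_e(ep - eo)    # e_s(x, y) = e_p - e_o, Eq. (2)
    return profile

# Toy usage with linear stand-in networks (real ones come from the calibration)
ap = np.array([40.0, 38.5, np.nan, 35.2])   # occlusion in camera a at row 2
ep = np.array([20.0, 21.6, 23.1, 24.9])
rows = contour_from_binocular_rows(ap, ep, ao=45.0, eo=18.0,
                                   depth_a=lambda s: 0.8 * s,
                                   depth_e=lambda s: 0.8 * s)
print(rows)
```

Stacking one such profile per scanning step along the x-axis would fill the array memory of transverse sections from which the complete object shape is recovered.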
Figure 4. Structure of the proposed Bezier network.
Figure 5 (a). First line captured by the binocular system.
Figure 5 (b). Second line captured by the binocular system.
Figure 6 (a). Surface profile computed by the network from figure 5(a).
Figure 6 (b). Surface profile computed by the network from figure 5(b).
4. Parameters of the Vision System

In optical metrology, the object shape is reconstructed based on the parameters of the camera and the setup. Usually, these parameters are computed by a procedure external to the reconstruction system. The camera parameters include focal distance, image center coordinates, pixel dimension, distortion and camera orientation. In the proposed binocular system, the camera parameters are determined based on the data provided by the network and the image processing. The camera parameters are determined based on the pinhole camera model, which is shown in figure 7. In the binocular system, the optical axis of the cameras is perpendicular to the reference plane. The camera orientation in the x-axis is determined by means of the geometry of figure 8(a). In this geometry, the line
projected on the reference plane and the object is indicated by ao and ap at the image plane, respectively. The distance between the image center and the laser stripe in the x-axis is indicated by Aa. The object dimension is indicated by hi and D = zi + hi. For this geometry, the camera orientation is performed based on si and hi from the network. According to the perpendicular optical axis, the object depth hi has a projection ki in the reference plane. From figure 8(a), the displacement is defined as si = (xc - ap) - (xc - ao). Thus, the projection ki at the reference plane is computed by
$$k_i = \frac{F\,h_i}{s_i + x_c - a_o}$$
(16)
In Eq.(16), F, xc and ao are constants and hi is computed by the network based on si. In this case, ki is a linear function of si. Therefore, the derivative of ki with respect to si, dk/ds, is a constant. Another configuration is an optical axis that is not perpendicular to the reference plane. In that case, si does not produce a linear function ki, and the derivative dk/ds is not a constant. The orientation of the camera in the y-axis is determined based on the geometry of figure 8(b). In this case, the object is moved along the y-axis over the line stripe. When the object is moved, the pattern position changes from ayp to ayi in the laser stripe, and t = (ayc − ayp) − (ayc − ayi). For an optical axis perpendicular to the reference plane in the y-axis, a linear q produces a linear t at the image plane. Therefore, the derivative dt/dq is a constant. Thus, the camera orientation is performed by means of dk/ds = constant for the x-axis and dt/dq = constant for the y-axis. Based on these criteria, the optical axis is aligned perpendicular to the x-axis and the y-axis. For the orientation in the x-axis, ki is computed from the hi provided by the network. Due to the distortion, the derivative dk/ds differs slightly from a constant, but it is the closest to a constant, as shown in figure 8(c). In this figure, the dashed line is dk/ds for β less than 90° and the dotted line is dk/ds for β greater than 90°. Thus, the generated network corresponds to an optical axis aligned perpendicularly to the x-axis. For the orientation in the y-axis, qi is provided by the electromechanical device and t is obtained by image processing. In this process, the object position is detected in the line at each movement. Due to the distortion, the derivative dt/dq is not exactly a constant, but it is the closest to a constant. Thus, the optical axis is aligned perpendicular to the y-axis. In this manner, the network and image processing provide an optical axis aligned perpendicularly to the reference plane. Based on the optical axis perpendicular to the reference plane, the camera
parameters are obtained. To carry this out, the network produces the depth hi based on si for the calibration. The geometry of the setup in figure 8(a) is described by
$$\frac{z_i}{A_a} = \frac{z_i + F}{\eta\,(x_c - a_p) + A_a},$$
(17)
From this equation, η is a scale factor that converts pixels to millimeters. Using D = zi + hi and η si = η(xc − ap) − η(xc − ao), Eq.(17) is rewritten as
$$\frac{D - h_i}{A_a} = \frac{D - h_i + F}{\eta\,(s_i + x_c - a_o) + A_a}$$
(18)
where D is the distance from the lens to the reference plane. From Eq.(18), the constants D, Aa, F, η, xc and ao should be determined. To carry this out, Eq.(18) is rewritten as the equation system
$$h_i = D - \frac{F\,A_a}{\eta\,(s_i + x_c - a_o)}, \qquad i = 1, 2, \ldots, 6$$
(19)
The values h1, h2,…., h6, are computed by the network according to s1, s2,…., s6. These values are substituted in Eq.(19) and the equation system is solved. Thus,
the constants D, Aa, F, η, xc and ao are determined. The coordinate ayc is computed from the geometry figure 8(c) described by
$$t_i = (ay_c - ay_p) - \frac{F\,(D - h_i)}{\eta\,(A_b - q_{i-1})},$$
(20)
From Eq.(20), the constants D, F, η, qi, ti and hi are known, and ayc, Ab and ayp should be determined. To carry this out, Eq.(20) is rewritten as an equation system for a constant hi:
$$\begin{aligned}
t_1 &= (ay_c - ay_p) - \frac{F\,(D - h_1)}{\eta\,(g - q_0)}\\
t_2 &= (ay_c - ay_p) - \frac{F\,(D - h_1)}{\eta\,(g - q_1)}\\
t_3 &= (ay_c - ay_p) - \frac{F\,(D - h_1)}{\eta\,(g - q_2)}
\end{aligned}$$
(21)
The values t1, t2, t3 are taken from the orientation in the y-axis, q0 = 0, and the values q1, q2 are provided by the electromechanical device. These values are substituted in Eq.(21) and the equation system is solved. Thus, the constants ayc, ayp and Ab are determined. In this manner, the camera parameters are calibrated based on the network and on image processing of the laser line. The distortion is observed by means of the line position ap in the image plane, which is described by
$$a_p = \frac{F\,A_a}{D - h_i} + x_c$$
(22)
Based on Eq.(22), the behavior of ap with respect to hi is a linear function. However, due to the distortion, the real data ap are not linear. The network is constructed by means of the real data using the displacement si = (xc − ap) − (xc − ao). Therefore, the network produces nonlinear data h, as shown in figure 8(d). Thus, the distortion is included in the network, which computes the object depth in the imaging system.
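The calibration system of Eq.(19) can be solved numerically, for example by least squares; the sketch below (not the chapter's code) uses invented (s, h) pairs. Note that F, Aa, η, xc and ao enter Eq.(19) only through the combinations Q = F·Aa/η and c = xc − ao, so this sketch fits D, Q and c; recovering the individual constants requires additional knowledge of the setup, as presumably used in the chapter.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Ground-truth combined constants used only to synthesize example data.
D_true, Q_true, c_true = 250.0, 4.0e4, 150.0

s = np.array([12.0, 18.5, 25.1, 31.0, 38.4, 44.9])                 # stripe displacements (pixels)
h = D_true - Q_true / (s + c_true) + rng.normal(0, 0.05, s.size)   # depths from the network (mm)

def residuals(p, s, h):
    # D: lens-to-plane distance, Q = F*Aa/eta, c = xc - ao (identifiable combinations).
    D, Q, c = p
    return h - (D - Q / (s + c))

sol = least_squares(residuals, np.array([200.0, 1.0e4, 100.0]), args=(s, h))
print(sol.x)   # should be close to (250, 4e4, 150)
```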
Figure 7. Geometry of the pinhole camera model.
Figure 8 (a). Geometry of an optical axis perpendicular to x-axis.
Figure 8 (b). Geometry of an optical axis perpendicular to y-axis.
Figure 8 (c). Derivative dk/ds for an optical axis perpendicular to x-axis and for an optical axis not perpendicular to x-axis.
5. Experimental Results The aim of binocular imaging in this technique is to avoid line occlusions. When an occlusion appears, the object contour is not complete. However, in binocular imaging one of the two images provides the occluded line and the object contour is completed. Based on the network, the parameters of the binocular setup are computed and physical measurements are avoided. In this manner, the computational procedure provides all parameters needed to reconstruct the object shape. Figure 1 shows the experimental setup. The object is moved in the x-axis by means of the electromechanical device in steps of 1.77 mm. A laser line is projected on the target by a 15 mW laser diode to perform the scanning. The line is captured by two CCD cameras and digitized by a frame grabber with 256 gray levels. By means of image processing, the displacements as and es are computed. Then, the object depth h(x,y) is computed via Eq.(9) or Eq.(10). From each pair of binocular images, a transverse section of the object is produced. The information of all transverse sections is stored in an array memory to construct the complete object shape. The experiment is performed with three objects. The first object to be profiled is a dummy face, see figures 9(a) and 9(b). The object surface produces the stripe occlusions shown in figure 9(a). In this case, the second image, figure 9(b), provides the area of the occluded stripe. Therefore, the object reconstruction can be done by means of binocular imaging. To perform the contouring, the dummy face is scanned in the x-axis in steps of 1.27 mm and binocular images of the line are captured. From each pair of images, the first image is used to detect the stripe displacement as. If the first image contains line occlusions, the second image is used to detect the stripe displacement es. The line occlusion is detected based on the pixels of the stripe: if fewer than three high-intensity pixels appear in a row, a line occlusion is detected in the image. By image processing, the stripe displacement as or es is calculated via Eq.(1) or Eq.(2), respectively. Then, the values u and v are deduced via Eq.(6) and Eq.(7). By means of the network Eq.(9) or Eq.(10), the object depth is computed via u or v, respectively. Thus, the network produces a transverse section based on the binocular images. To assess the accuracy of the data provided by the network, the root mean squared error (rms) is calculated [18] based on a contact method. To carry this out, the object is measured by a coordinate measuring machine (CMM). The rms is described by the following equation
$$rms = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(h_{oi} - h_{ci}\right)^{2}},$$
(30)
where hoi is the data measured by the CMM, hci is the data h(x,y) calculated by the network, and n is the number of data. The rms was computed using n = 1122 data and the result is rms = 0.155 mm for the dummy face. In this case, sixty-eight lines were processed to determine the complete object shape shown in figure 9(c). The scale of this figure is in mm. The second object to be profiled is a metallic piece, figure 10(a). The contouring is performed by scanning the metallic piece in steps of 1.27 mm. In this case, fifty-eight lines were processed to determine the complete object shape shown in figure 10(b). The metallic piece was measured by the CMM to determine the accuracy provided by the network. The rms was computed using n = 480 data, which were provided by the network and by the CMM as reference. The rms calculated for this object is 0.114 mm.
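For reference, Eq.(30) amounts to the following short computation; the sketch below is only an illustration with randomly generated data of the same size as the dummy-face test, not the chapter's measurements.

```python
import numpy as np

def rms_error(h_cmm, h_network):
    """Root mean squared error of Eq.(30) between CMM reference data and network output."""
    h_cmm = np.asarray(h_cmm, dtype=float)
    h_network = np.asarray(h_network, dtype=float)
    return np.sqrt(np.mean((h_cmm - h_network) ** 2))

rng = np.random.default_rng(1)
h_o = rng.uniform(0.0, 39.0, 1122)          # hypothetical CMM measurements (mm)
h_c = h_o + rng.normal(0.0, 0.15, 1122)     # network output with about 0.15 mm error
print(rms_error(h_o, h_c))                  # close to 0.15 mm
```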
Figure 9(a). First image of the dummy face with line occlusion.
Figure 9(b). Second image of the dummy face without line occlusion.
Figure 9(c). Three-dimensional shape of the dummy face.
Figure 10(a). Metallic piece to be profiled.
Figure 10(b). Three-dimensional shape of the metallic piece.
The value of n influences the confidence level with respect to the precision of the calculated error. To determine whether the value of n matches the desired precision, the confidence level is calculated by the following relation [19]
$$n = \left( z_\alpha\,\frac{\sigma_x}{e} \right)^{2},$$
(31)
where zα is the desired confidence, e is the error expressed as a percentage, and σx is the standard deviation. Therefore, the confidence level according to the number of data n can be described by
$$z_\alpha = \frac{e}{\sigma_x}\,\sqrt{n}.$$
(32)
To verify that the chosen value of n meets the desired confidence level, Eq.(32) is applied. The desired confidence level is 95 %, which corresponds to zα = 1.96 according to the confidence table [19]. The average height of the face surface is 19.50 mm; therefore, using the rms, the error is 0.0079, which represents an error of 0.79 %. To determine the precision of this error, the confidence level is calculated for n = 1122, e = 0.79 and a standard deviation of 7.14. Substituting these values in Eq.(32), the result is zα = 3.7061. This indicates a confidence level greater than 95 %. The confidence level is also greater than 95 % for the metallic piece. The computer employed in this process is a 1 GHz PC. Each stripe image is processed in 0.011 sec. This processing time is achieved because the image data are extracted with few operations via Eq.(4) and Eq.(5). The capture rate of the camera used in this system is 34 fps. The electromechanical device is also moved at 34 steps per second. The complete shape of the dummy face is profiled in 4.18 sec, and the metallic piece is profiled in 3.22 sec. In this procedure, distances of the setup geometry are not used to obtain the object shape. Therefore, the procedure is easier than techniques that use the distances of the components of the optical setup. In this manner, the technique is performed by a computational process and measurements on the optical setup are avoided. Therefore, a good repeatability is achieved in the experiment, with a standard deviation of +/- 0.01 mm.
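A quick numerical check of the reported confidence figure, using Eq.(32) with the values quoted above (this snippet is an illustration, not part of the original chapter):

```python
import math

def confidence_z(e_percent, sigma_x, n):
    """z-alpha of Eq.(32): the confidence coefficient supported by n samples."""
    return (e_percent / sigma_x) * math.sqrt(n)

# Values reported for the dummy face: n = 1122, error = 0.79 %, sigma_x = 7.14.
z = confidence_z(0.79, 7.14, 1122)
print(round(z, 4))   # about 3.706, above the 1.96 required for 95 % confidence
```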
Conclusions A technique for shape detection performed by means of line projection, binocular imaging and approximation networks has been presented. The technique described here provides a valuable tool for industrial inspection and reverse engineering. The automatic technique avoids the physical measurements of the setup that are common in laser stripe projection methods. In this technique, the parameters of the setup are obtained automatically by a computational process using a Bezier network. This improves the accuracy of the results, because
measurement errors are not introduced into the system for the shape detection. In this technique, the ability to measure the stripe behavior with sub-pixel resolution has been achieved by Bezier curves, using few operations. By using this computational-optical setup, a good repeatability is achieved in every measurement. Therefore, the technique performs well.
References
[1] L. Zagorchev and A. Goshtasby, "A paintbrush laser range scanner", Computer Vision and Image Understanding, Vol. 10, p. 65-86 (2006).
[2] Z. Wei, G. Zhang and Y. Xu, "Calibration approach for structured-light-stripe vision sensor based on invariance of double cross-ratio", Optical Engineering, Vol. 42 No. 10, p. 2956-2966 (2003).
[3] A. M. McIvor, "Nonlinear calibration of a laser profiler", Optical Engineering, Vol. 42 No. 1, p. 205-212 (2002).
[4] W. Ch. Tai and M. Chang, "Non contact profilometric measurement of large form parts", Optical Engineering, Vol. 35 No. 9, p. 2730-2735 (1996).
[5] M. Baba, T. Konishi and N. Kobayashi, "A novel fast rangefinder with non-mechanical operation", Journal of Optics, Vol. 29 No. 3, p. 241-249 (1998).
[6] J. A. Muñoz Rodríguez and R. Rodríguez-Vera, "Evaluation of the light line displacement location for object shape detection", Journal of Modern Optics, Vol. 50 No. 1, p. 137-154 (2003).
[7] Q. Luo, J. Zhou, S. Yu and D. Xiao, "Stereo matching and occlusion detection with integrity and illusion sensitivity", Pattern Recognition Letters, Vol. 24, p. 1143-1149 (2003).
[8] H. Mitsudo, S. Nakamizo and H. Ono, "A long-distance stereoscopic detector for partially occluding surfaces", Vision Research, Vol. 46, p. 1180-1186 (2006).
[9] H. J. Andersen, L. Reng and K. Kirk, "Geometric plant properties by relaxed stereo vision using simulated annealing", Computers and Electronics in Agriculture, Vol. 49, p. 219-232 (2005).
[10] R. Klette, K. Schluns and A. Koschan, Computer Vision: Three-Dimensional Data from Images, Springer, Singapore, 1998.
[11] F. Causa and J. Sarma, "Realistic model for the output beam profile of stripe and tapered superluminescent light-emitting diodes", Applied Optics, Vol. 42 No. 21, p. 4341-4348 (2003).
[12] Peter C. Gasson, Geometry of Spatial Forms, John Wiley and Sons, U.S.A., 1989.
[13] H. Frederick and G. J. Lieberman, Introduction to Operations Research, McGraw-Hill, U.S.A., 1982.
[14] Robert J. Schalkoff, Artificial Neural Networks, McGraw-Hill, U.S.A., 1997.
[15] Y. J. Ahn, Y. S. Kim and Y. Shin, "Approximation of circular arcs and offset curves by Bezier curves of high degree", Journal of Computational and Applied Mathematics, Vol. 167, p. 405-416 (2004).
[16] G. D. Chen and G. J. Wang, "Optimal multi-degree reduction of Bézier curves with constraints of endpoints continuity", Computer Aided Geometric Design, Vol. 19, p. 365-377 (2002).
[17] W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, Numerical Recipes in C, Cambridge Press, U.S.A., 1993.
[18] T. Masters, Practical Neural Networks Recipes in C++, Academic Press, U.S.A., 1993.
[19] J. E. Freund, Modern Elementary Statistics, Prentice Hall, U.S.A., 1979.
In: Binocular Vision Editors: J. McCoun et al, pp. 107-123
ISBN: 978-1-60876-547-8 © 2010 Nova Science Publishers, Inc.
Chapter 4
EYE MOVEMENT ANALYSIS IN CONGENITAL NYSTAGMUS: CONCISE PARAMETERS ESTIMATION
Pasquariello Giulio (1), Cesarelli Mario (1), La Gatta Antonio (2), Bifulco Paolo (1) and Fratini Antonio (1)
(1) Dept. of Biomedical, Electronic and Telecommunication Engineering, University "Federico II" of Naples, Via Claudio, 21, 80125, Napoli, Italy
(2) Math4Tech Center, University of Ferrara, via Saragat, 1, 44100, Ferrara, Italy
Abstract Along with other diseases that can affect binocular vision, reducing the visual quality of a subject, Congenital Nystagmus (CN) is of peculiar interest. CN is an ocular-motor disorder characterized by involuntary, conjugated ocular oscillations and, while identified more than forty years ago, its pathogenesis is still under investigation. This kind of nystagmus is termed congenital (or infantile) since it could be present at birth or it can arise in the first months of life. The majority of CN patients show a considerable decrease of their visual acuity: image fixation on the retina is disturbed by nystagmus continuous oscillations, mainly horizontal. However, the image of a given target can still be stable during short periods in which eye velocity slows down while the target image is placed onto the fovea (called foveation intervals). To quantify the extent of nystagmus, eye movement recordings are routinely employed, allowing physicians to extract and analyze nystagmus main features such as waveform
shape, amplitude and frequency. Use of eye movement recording, opportunely processed, allows computing “estimated visual acuity” predictors, which are analytical functions that estimate expected visual acuity using signal features such as foveation time and foveation position variability. Hence, it is fundamental to develop robust and accurate methods to measure both those parameters in order to obtain reliable values from the predictors. In this chapter the current methods to record eye movements in subjects with congenital nystagmus will be discussed and the present techniques to accurately compute foveation time and eye position will be presented. This study aims to disclose new methodologies in congenital nystagmus eye movements analysis, in order to identify nystagmus cycles and to evaluate foveation time, reducing the influence of repositioning saccades and data noise on the critical parameters of the estimation functions. Use of those functions extends the information acquired with typical visual acuity measurement (e.g., Landolt C test) and could be a support for treatment planning or therapy monitoring.
Introduction Congenital nystagmus (CN) is an ocular–motor disorder that appears at birth or during the first few months of life, characterized by involuntary, conjugated, bilateral to and fro ocular oscillations. Clinical descriptions of nystagmus are usually based on the direction of the fast phase and are termed horizontal, vertical, or rotary, or any combination of these. CN is predominantly horizontal, with some torsional and, rarely, vertical motion [1]. Nystagmus oscillations can persist also closing eyes, moreover they tend to damp in absence of visual activity. In vertebrates, eye movements are controlled by the oculomotor system in a complex manner, depending on the stimuli and viewing conditions. In the human eye, the little portion of the retina which allows the maximal acuity of vision is called the fovea. An attempt to bring the image of a target onto the fovea can involve up to five oculomotor subsystems: the saccadic, smooth pursuit, vestibular, optokinetic and vergence systems. The vestibular system is driven by non-visual signals from the semicircular canals, while the other systems are mainly driven by visual signals encoding target information. Pathogenesis of the congenital nystagmus is still unknown; dysfunctions of at least one of the ocular stabilization systems have been hypothesized, but no clear evidence was reported. Nystagmus can be idiopathic or associated to alteration of the central nervous system and/or ocular system such as achromathopsia, aniridia and congenital cataract. Both nystagmus and associated ocular alterations can be genetically transmitted, with different modalities; estimates of the prevalence of infantile nystagmus range from 1 in 1000 to 1 in 6000 [23,26,34,38].
CN occurrence associated with total bilateral congenital cataract is of 50– 75%, while this percentage decreases in case of partial or monolateral congenital cataract. CN is present in most cases of albinism. The cause or causes and pathophysiological mechanisms of CN have not been clarified. Children with this condition frequently present with a head turn, which is used to maintain the eyes in the position of gaze in which the nystagmus is minimum. This happens more often when the child is concentrating on a distant object, as this form of nystagmus tends to worsen with attempted fixation. The head turn is an attempt to stabilize the image under these conditions. CN may result from a primary defect (e.g., familial X-linked) of ocular motor calibration. Some authors (e.g., Hertle, 2006) hypothesized that CN may also result from abnormal cross-talk from a defective sensory system to the developing motor system at any time during the motor system’s sensitive period; this can occur from conception due to a primary defect (e.g., retinal dystrophy), during embryogenesis due to a developmental abnormality (e.g., optic nerve hypoplasia), or after birth during infancy (e.g., congenital cataracts). This theory of the genesis of CN incorporates a pathophysiological role for the sensory system in its genesis and modification. Although the set of physiological circumstances may differ, the final common pathway is abnormal calibration of the ocular motor system during its sensitive period.
Terminology (Definitions) Efforts are being made to add precision and uniformity to nystagmus terminology. The terms congenital nystagmus (CN), infantile nystagmus and idiopathic motor nystagmus have become synonymous with the most common form of neonatal nystagmus [4,17,30,31,42]. However, the term infantile nystagmus syndrome (INS) is a broader and more inclusive term that we prefer not to use, since it refers to the broad range of neonatal nystagmus types, including those with identifiable causes. According to the bibliography, idiopathic nystagmus can be classified in different categories depending on the characteristics of the oscillations [2]; typically, in CN eye movement recordings it is possible to identify, for each nystagmus cycle, the slow phase, taking the target away from the fovea, and the fast (or slow) return phase. According to the nystagmus waveform characterization by Dell’Osso [20], if the return phase is slow then the nystagmus cycle is pendular or pseudo-cycloid; if the return phase is fast then the waveform is defined as jerk (unidirectional or bidirectional). In general, the CN waveform has an increasing-velocity exponential slow phase [2].
A schematic illustration of a unidirectional jerk nystagmus waveform (pointing to the left) is presented in figure 1.
Figure 1. A schematic illustration of a jerk nystagmus waveform (bold line) with fast phase pointing to the left; on the picture are depicted various nystagmus features, such as: fast and slow phase components, nystagmus period and amplitude; the grey box on each cycle represents the foveation window. The baseline oscillation is shown as a dashed line, and its amplitude is also shown.
In general, CN patients show a considerable decrease of the visual acuity, since image fixation on the fovea is reduced by nystagmus continuous oscillations. CN patient visual acuity reach a maximum when eyes are in the position of least ocular instability, hence, in many cases, a compensatory head malposition is commonly achieved, in order to bring the zone of best vision into the straight-ahead position. Such so-called ‘null zones’ (or null positions) correspond to a particular gaze angle, in which a smaller nystagmus amplitude and a longer foveation time can be obtained, thus reaching a better fixation of the visual target onto the retina. Abnormal head posture could be alleviated by surgery (mainly translating the null zone to straight-ahead position). Other clinical characteristics, not always present, include increased intensity with fixation and decreased intensity with sleep or inattention; variable intensity in different positions of gaze; decreased intensity (damping) with convergence; changing direction in different positions of gaze (about a so-called neutral position); strabismus and an increased incidence of significant refractive errors. In normal subjects, i.e., not affected by nystagmus, when the velocity of the image projected on the retina increases by a few degrees per second, visual acuity and contrast sensitivity decrease. In CN patients, fixation is disrupted by nystagmus rhythmical oscillations, which result in rapid movements of the target image onto the retina [6]. Ocular stabilization is achieved during foveation periods
[2] in which eye velocity slows down (less than 4 degrees/s) while the visual target crosses the foveal region (± 0.5 degree); in this short time interval called ‘foveation window’, it is said that the subject ‘foveates’. Visual acuity was found to be mainly dependent on the duration of the foveation periods [2,20,22], but the exact repeatability of eye position from cycle to cycle and the retinal image velocities also contribute to visual acuity [1, 35]. Numerous studies of CN in infants and children confirm an age-dependent evolution of waveforms during infancy from pendular to jerk [4,17,31,42]. This concept is consistent with the theory that jerk waveforms reflect modification of the nystagmus by growth and development of the visual system [28,29]. Accurate, uniform, and repeatable classification and diagnosis of nystagmus in infancy as CN is best accomplished by a combination of clinical investigations and motility analysis; in some cases, eye movement recording and analysis are indispensable for diagnosis. If a subject is diagnosed with CN, ocular motility study can also be helpful in determining visual status. Analysis of binocular or monocular differences in waveforms and foveation periods could be an important information in therapy planning or can be used to measure outcome of (surgical) treatment. Presence of pure pendular or jerk waveforms without foveation periods are indicators of poorer vision whereas waveforms of either type with extended periods of foveation are associated with good vision; moreover significant interocular differences in a patient reflect similar differences in vision between the two eyes. Ocular motility analysis in CN subjects is also the most accurate method to determine nystagmus changes with gaze (null and neutral zones).
Clinical Assessment The clinical examination of a subject affected by congenital nystagmus is a complex task; eye movement recording is often one of the necessary steps, but a general physical examination and the assessment of vision are usually performed first. During the first examination, the physicians can assess the most important features of nystagmus, such as the direction of the eyes’ beating and the presence of anomalous head positions while viewing distant or near objects. In addition, an adequate fundus examination is often carried out in order to assess possible prechiasmal visual disorders. Complete clinical evaluation of the ocular oscillation also includes identification of fast-phase direction, movement intensity, conjugacy, gaze effects,
convergence effects, and the effect of monocular cover. Changes in the character of the nystagmus with convergence or monocular viewing are often evaluated. The visual acuity of the patient is usually tested with both eyes open (binocular viewing) and with one eye covered (monocular). It is important not to forget that binocular acuity is the “person’s” acuity and monocular acuity is the “eye’s” acuity. These two are often very different in patients with nystagmus, and both have to be tested in CN subjects. Among the various available tests, the best choice to assess visual acuity in an older child or cooperative adult is the ETDRS chart, since it provides LogMAR evaluation of all acuities, especially those between 20/400 and 20/100 [29].
Examination Techniques: Motility However, it is well documented that differentiating true nystagmus from saccadic oscillations and intrusions is sometimes impossible clinically. Recent advances in eye movement recording technology have increased its application in infants and children who have disturbances of the ocular motor system [1,4]. As stated above, nystagmus is caused by disorders of the mechanisms responsible in holding gaze steady: the vestibular system, the gaze-holding mechanism, the visual stabilization system, and the smooth pursuit system. Thus, evaluation of a patient’s nystagmus requires a systematic examination of each functional class of eye movements. Measurement of the nystagmus waveform, using reliable methodology, is often helpful in securing a diagnosis. Such measurements help differentiate acquired nystagmus from congenital forms of nystagmus and from other saccadic disorders that lead to instability of gaze [36].
Ocular Motility Recordings Qualitative or quantitative analysis of eye movements was attempted since the early twentieth century, with primitive electronic technology available at that time. Nowadays more complex and less invasive methods are available, from biopotential recording up to high-speed photographic methods. Various techniques are currently in use to record eye movements: electro-oculography (EOG), infrared oculography (IROG), magneto-oculography (MOG) also known as scleral search coil system (SSCS) and video-oculography (VOG). The first technique relies on the fact that the eye has a standing electrical potential between the front and the back. Horizontal EOG is measured by placing electrodes on the nasal and temporal boundaries of the eyelids; as the eye turns a proportional
change in electrodes potential is measured. The IROG approach relies on measuring the intensity of an infrared light reflected back from the subject’s eye. Infrared emitters and detectors are located in fixed positions around the eye. The amount of light reflected back to a fixed detector varies with eye position. The VOG approach relies on recording eye position using a video camera, often an infrared device coupled with an infrared illuminator (in order to avoid disturbing the subject), and applying image processing techniques. The scleral search coil method is based on electromagnetic interaction at radio frequencies between two coils, one (embedded in a contact lens) fixed on the eye sclera and the other external. Bilateral temporal and nasal electrode placement is useful for gross separation of fast and slow phases but is limited by nonlinearity, drift, and noise. Infrared reflectance solves these problems and can be used in infants and children but it is limited by difficulty in calibration. IR video systems have become increasingly popular in research laboratories and in the clinical setting, hence the comparison between IR and the scleral search coil method has become an actual issue. Different studies analyzed this subject reporting a good performance of video oculography compared with scleral search coils. Van der Geest and Frens [43] compared the performance of a 2D video-based eye tracker (Eyelink I; SR Research Ltd., Mississauga, Ontario, Canada) with 2D scleral search coils. They found a very good correspondence between the video and the coil output, with a high correlation of fixation positions (average discrepancy, +/-1° over a tested range of 40 by 40° of visual angle) and linear fits near one (range, 0.994 to 1.096) for saccadic properties. However, Houben, Goumans, and van der Steen, [33] found that lower time resolution, possible instability of the head device of the video system, and inherent small instabilities of pupil tracking algorithms still make the coil system the best choice when measuring eye movement responses with high precision or when high-frequency head motion is involved. For less demanding and for static tests and measurements longer than a half an hour, the latest generation infrared video system is a good alternative to scleral search coils; in addition video oculography is not at all invasive, making it suitable for children younger than 10 years old. However, the quality of torsion of the infrared video system is less compared with scleral search coils and needs further technological improvement.. CN eye movement recordings are often carried out only on the horizontal axis and display the data, by convention, during continuous periods of time. Position and velocity traces are clearly marked, with up being rightward eye movements and down being leftward eye movements. Figure 2 reports, as example, a signal tracts recorded from actual CN patients; computed eye velocity is shown
underneath the eye movement signal. It is possible to identify some nystagmus characteristics, such as nystagmus amplitude, frequency and the fast and slow phases.
Figure 2. An example of eye movement recording (jerk left). The eye velocity is also depicted with a 0 °/s threshold; the figure also shows (between 26.5 s and 27.6 s) a saccade of about ten degrees corresponding to a voluntary gaze angle shift.
Semiautomatic Analysis of Eye Movement Recordings Congenital nystagmus is a rhythmic phenomenon, and researchers have tried to analyze eye movement signals using methodologies specific to frequency analysis, such as spectral and wavelet analysis (Reccia et al., 1989-1990; Clement et al., 2002; Miura et al., 2003). A few authors (e.g., Hosokawa, 2004) applied the Short Time Fourier Transform (STFT) to congenital nystagmus recordings, in order to highlight modifications in the principal component and in the harmonics over time. However, the resolution of this technique is limited by the duration of the windows into which the signal is divided [3]. Wavelet analysis seems more useful, since it is able to assess how much the signal under study differs in time from a specific template adopted as a reference, and it is able to localize a brief transient intrusion into a periodic waveform [37]. It has been used with success to separate
fast and slow phases in caloric nystagmus. However, as stated by Abel [3], the outcome of this analysis is a time-frequency plot or a coefficient sequence, which are difficult to relate to a subject visual ability. Moreover, the foveation time modification in each cycle and the variability of position between successive foveations can hardly be highlighted using STFT and wavelet analysis [3]. On the contrary, time domain analysis techniques, such as velocity thresholds, region-based foveation identification, syntactic recognition or time series analysis, have been routinely employed in the last decades to analyse nystagmus, either congenital or vestibular. Usually, visual acuity increases in people suffering from congenital nystagmus if the foveation time increases and the signal variability decreases. An analysis of the signal characteristics near the desired position (target position) can easily take place in the time domain, as demonstrated by Dell’Osso et al. who defined an analytic function to predict visual acuity (NAFX) [15] and by Cesarelli et al. who defined a similar function (NAEF) [10].
Figure 3. An example of slow phase and fast phase (bold) separation in CN eye movement recordings.
Figure 4. The local foveation windows identified for each nystagmus cycle.
Time domain analysis of congenital nystagmus is the most used technique and, in our opinion, the best option so far; it is able to estimate the visual ability of a subject at different gaze angles. However, its application with semi-automatic methods still needs improvement both in performance and in reliability. The first step of each algorithm for the time analysis of rhythmic eye movements is the identification of cycles: in congenital nystagmus, the most common waveforms are jerk and jerk with extended foveation, followed by pendular and pseudo-cycloid [1]; however, only the first waveform allows a foveation time which ensures a good visual ability. The CN jerk waveforms can be described as a combination of two different actions: the slow phase, taking the eye away from the desired target, followed by the fast, corrective phase; the foveation takes place when eye velocity is small, which happens after the fast phase. Hence, a local foveation window can be defined, located at the end of each fast phase and at the beginning of the slow phase, which allows separating the effects on visual acuity of changes in foveation time and of alterations in eye position [10]. The analysis of these two separate effects is of strong importance due to the presence of a slow ‘periodic’ component in the eye movement signal, which we called baseline oscillation (BLO) [7,39].
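As a rough illustration of this kind of time-domain processing (this is not the authors' algorithm), the sketch below marks candidate foveation windows by thresholding eye velocity and position; the 4 °/s and ±0.5° limits are those recalled in the chapter, while the sampling rate, the synthetic trace and the function name are assumptions of the example.

```python
import numpy as np

def foveation_windows(position_deg, fs=500.0, vel_limit=4.0, pos_limit=0.5, target=0.0):
    """Return (start, stop) sample indices of candidate foveation windows.

    A sample belongs to a window when eye velocity is below vel_limit (deg/s)
    and eye position lies within pos_limit (deg) of the target.
    """
    pos = np.asarray(position_deg, dtype=float)
    vel = np.gradient(pos) * fs                                   # simple velocity estimate (deg/s)
    ok = (np.abs(vel) < vel_limit) & (np.abs(pos - target) < pos_limit)
    idx = np.flatnonzero(ok)
    if idx.size == 0:
        return []
    breaks = np.flatnonzero(np.diff(idx) > 1)
    starts = np.concatenate(([idx[0]], idx[breaks + 1]))
    stops = np.concatenate((idx[breaks], [idx[-1]]))
    return list(zip(starts, stops))

# Example with a synthetic jerk-like trace (2.5 Hz, increasing-velocity slow phase).
fs = 500.0
t = np.arange(0.0, 2.0, 1.0 / fs)
phase = (t % 0.4) / 0.4
trace = 3.0 * (np.exp(2.0 * phase) - 1.0) / (np.exp(2.0) - 1.0)
windows = foveation_windows(trace, fs=fs)
print(len(windows), sum(b - a + 1 for a, b in windows) / fs)      # cycles found, total foveation time (s)
```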
Slow Eye Movements in Congenital Nystagmus The role of the standard deviation of eye position (SDp) during foveations with respect to visual acuity has been discussed in the past ten years [10,16]. A remarkable increase in some CN patients’ visual acuity, obtained with botulinum toxin treatment, which did not correspond to large extensions in foveation time, also prompted us to characterize such foveation variability in more detail. A slow sinusoidal-like oscillation of the baseline (baseline oscillation or BLO) was found superimposed on the nystagmus waveforms [9,11,12] and its relation with the SDp was estimated [7]. The presence of similar slow pendular waveforms, superimposed on the nystagmus, was also reported by Gottlob et al. [27]. In addition, in eye movement recordings presented by Dell’Osso et al. [15,16] it is possible to recognize slow oscillations superimposed on the nystagmus. Akman et al. [5], using dynamical systems analysis to quantify the dynamics of the nystagmus in the region of foveation, found that the state-space fixed point, or steady state, is not unique. Physiologically, this means that the control system does not appear to maintain a unique gaze position at the end of each fast phase. Similarly, Evans [24] reported that some of the analyzed patients (approximately 50% of patients) fail to coordinate target and fovea position. Kommerell [35] noticed that in CN patients tracking moving targets, the eye recording presented a slow eye movement superimposed on the stimulus trajectory in addition to the nystagmic cycles. Nystagmus and the slow oscillation could modify visual acuity. Currie et al. [14] evaluated acuity for optotypes in healthy subjects using moving light sources to simulate the retinal image motion that occurs in nystagmus. Their results are that acuity depends on both foveation duration and position variability, although the presence of other sensory defects (e.g., astigmatism) must be taken into account. Moreover, they found that the addition of low-frequency (1.22 Hz) waves to the light stimuli, i.e., a slow oscillation, caused a worsening of visual acuity. In order to estimate the slow sinusoidal oscillations, a common least mean square (LMS) fitting technique could be used. For each signal block, the highest peak of the power spectrum of the eye movement signal in the range 0.1–1.5 Hz can be considered as an estimator of the BLO frequency. The high frequency limit results from the lowest frequency commonly associated with nystagmus (according to Bedell and Loshin, 1991, and Abadi and Dickinson, 1986), while the low frequency limit depends on the signal length corresponding to each gaze position (in our tests, approximately 10 s).
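A minimal sketch of such a BLO estimate is given below, assuming a synthetic recording and sampling rate; it picks the largest power-spectrum peak in 0.1–1.5 Hz as the BLO frequency and then fits amplitude and phase by linear least squares at that frequency. It is an illustration of the approach, not the authors' implementation.

```python
import numpy as np

def estimate_blo(signal_deg, fs, f_lo=0.1, f_hi=1.5):
    """Estimate baseline-oscillation frequency, amplitude and phase from an eye trace."""
    x = np.asarray(signal_deg, dtype=float)
    t = np.arange(x.size) / fs
    freqs = np.fft.rfftfreq(x.size, 1.0 / fs)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    band = (freqs >= f_lo) & (freqs <= f_hi)
    f_blo = freqs[band][np.argmax(power[band])]

    # Linear LMS fit of a + b*sin(2*pi*f*t) + c*cos(2*pi*f*t) at the detected frequency.
    A = np.column_stack([np.ones_like(t),
                         np.sin(2 * np.pi * f_blo * t),
                         np.cos(2 * np.pi * f_blo * t)])
    a, b, c = np.linalg.lstsq(A, x, rcond=None)[0]
    return f_blo, np.hypot(b, c), np.arctan2(c, b)

# Synthetic 10 s recording: 0.4 Hz BLO (1.2 deg) plus a 3 Hz nystagmus-like component.
fs = 200.0
t = np.arange(0.0, 10.0, 1.0 / fs)
eye = (1.2 * np.sin(2 * np.pi * 0.4 * t)
       + 2.0 * np.sin(2 * np.pi * 3.0 * t)
       + 0.1 * np.random.default_rng(2).normal(size=t.size))
print(estimate_blo(eye, fs))   # frequency near 0.4 Hz, amplitude near 1.2 deg
```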
Figure 5a and 5b. Examples of acquired signals showing the presence of the slow eye movement added up to the nystagmus oscillations.
Conclusion Eye movement recording methodology is most commonly used as a research tool by neurologists, neurophysiologists, psychophysicists, psychologists/psychiatrists, ophthalmologists, and optometrists [18,21,25]. Eye movement recording and estimation of concise parameters, such as waveform shape, nystagmus amplitude, frequency, direction of beating, foveation periods and eye position variability, are a strong support for an accurate diagnosis, for patient follow-up and for therapy evaluation [8]. Regarding the last parameter, the slow eye movement, described as baseline oscillation, explains most of the eye position variability during foveations (SDp) [7], which in turn was found to be exponentially well related to visual acuity [10]. According to the procedure described above, baseline oscillation parameters can be estimated for any CN eye movement recording. In a case study by Pasquariello et al. carried out on 96 recordings, almost 70% of the recordings had a BLO amplitude greater than 1° (approximately the fovea angular size); in the remaining 30%, the amplitude of the BLO was smaller and did not significantly affect visual acuity. In that study, a high correlation coefficient (R2 = 0.78) was also found in the linear regression analysis of BLO and nystagmus amplitude, suggesting a strong level of interdependence between the two. The regression line slope coefficient was about 0.5, which implies that the BLO amplitude is on average one half of the corresponding nystagmus amplitude. Specifically, since the BLO amplitude resulted directly related to the nystagmus amplitude, its presence is particularly evident in the signal tracts away from the null zone (i.e., not in the position in which the nystagmus amplitude is smallest). The origin of such baseline oscillation is unknown. Some authors assert that slow movement can be recorded only in subjects with severely reduced visual experience from birth (like CN patients) [27]. However, the high value of the correlation coefficient between BLO and nystagmus amplitude found in this study suggests that the two phenomena are somewhat linked together. Therefore, the origin of the BLO could be sought within the same ocular motor subsystems considered for nystagmus. The baseline oscillation highlights the presence of a slow ‘periodic’ component in the eye movement signal. The sine function is a rather good estimator of this slow periodic component added to the nystagmus; the basic shape of the baseline is indeed a sinusoid, sometimes and randomly disrupted by phase inversions, interruptions (as short as hundreds of milliseconds, lasting up to even 1 second) and other non-linear components. To the periodic component represented
by BLO the small, additional random movements should be added, in order to assess the whole variability of eye position during fixation [7].
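The regression analysis mentioned above (slope near 0.5, R2 = 0.78) amounts to a simple linear fit between BLO amplitude and nystagmus amplitude. The sketch below reproduces the computation on invented data with similar statistics; it is only an illustration and does not use the study's recordings.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-recording amplitudes (deg): nystagmus amplitude and BLO amplitude.
nys_amp = rng.uniform(1.0, 8.0, 96)
blo_amp = 0.5 * nys_amp + rng.normal(0.0, 0.6, nys_amp.size)   # roughly half, as reported

slope, intercept = np.polyfit(nys_amp, blo_amp, 1)
pred = slope * nys_amp + intercept
r2 = 1.0 - np.sum((blo_amp - pred) ** 2) / np.sum((blo_amp - blo_amp.mean()) ** 2)
print(round(slope, 2), round(r2, 2))   # slope near 0.5, R^2 of the same order as reported
```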
Figure 6. The relationship between Baseline Oscillation and Nystagmus amplitude.
References
[1] Abadi RV; Bjerre A. Motor and sensory characteristics of infantile nystagmus. Br. J. Ophthalmol. 2002, 86, 1152-1160.
[2] Abadi RV; Dickinson CM. Waveform Characteristics in Congenital Nystagmus. Doc. Ophthalmol. 1986, 64, 153-67.
[3] Abel LA; Wang ZI; Dell'Osso LF. Wavelet analysis in infantile nystagmus syndrome: limitations and abilities. Invest. Ophthalmol. Vis. Sci. 2008 Aug, 49(8), 3413-23.
[4] Abel LA. Ocular oscillations. Congenital and acquired. Bull. Soc. Belge Ophthalmol. 1989, 237, 163-189.
[5] Akman OE; Broomhead DS; Clement RA; Abadi RV. Nonlinear time series analysis of jerk congenital nystagmus. J. Comput. Neurosci. 2006, 21(2), 153-70.
[6] Bedell HE; Loshin DS. Interrelations between Measures of Visual Acuity and Parameters of Eye Movement in Congenital Nystagmus. Invest. Ophthalmol. Vis. Sci. 1991, 32, 416-21.
[7] Bifulco P; Cesarelli M; Loffredo L; Sansone M; Bracale M. Eye Movement Baseline Oscillation and Variability of Eye Position During Foveation in Congenital Nystagmus. Doc. Ophthalmol. 2003, 107, 131-136.
[8] Cesarelli M, et al. Analysis of foveation duration and repeatability at different gaze positions in patients affected by congenital nystagmus. IFMBE Proceedings. 2007, 16(12), 426-429.
[9] Cesarelli M; Bifulco P; Loffredo L; Magli A; Sansone M; Bracale M. Eye Movement Baseline Oscillation in Congenital Nystagmus. Proceedings of the WC2003.
[10] Cesarelli M; Bifulco P; Loffredo L; Bracale M. Relationship between visual acuity and eye position variability during foveation in congenital nystagmus. Doc. Ophthalmol. 2000, 101, 59-72.
[11] Cesarelli M; Bifulco P; Loffredo L. EOG Baseline Oscillation in Congenital Nystagmus. VIII Mediterranean Conference on Medical Biological Engineering and Computing - MEDICON '98, Lemesos, Cyprus, June 14-17, 1998 - CD-ROM 19.3.
[12] Cesarelli M; Loffredo L; Bifulco P. Relationship between Visual Acuity and Oculogram Baseline Oscillations in Congenital Nystagmus. Proceedings of the 4th European Conference on Engineering and Medicine, Warsaw 1997, 301-2.
[13] Clement RA; Whittle JP; Muldoon MR; Abadi RV; Broomhead DS; Akman O. Characterisation of congenital nystagmus waveforms in terms of periodic orbits. Vision Research, 2002, 42, 2123-2130.
[14] Currie DC; Bedell HE; Song S. Visual Acuity for Optotypes with Image Motions Simulating Congenital Nystagmus. Clin. Vision Sci. 1993, 8, 73-84.
[15] Dell'Osso LF; Jacobs JB. An Expanded Nystagmus Acuity Function: Intra- and Intersubject Prediction of Best-Corrected Visual Acuity. Doc. Ophthalmol. 2002, 104, 249-276.
[16] Dell'Osso LF; Van Der Steen J; Steinman RM; Collewijn H. Foveation Dynamics in Congenital Nystagmus. I: Fixation. Doc. Ophthalmol. 1992, 79, 1-23.
[17] Dell'Osso LF. Congenital, latent and manifest latent nystagmus - similarities, differences and relation to strabismus. Jpn. J. Ophthalmol. 1985, 29(4), 351-368.
[18] Dell'Osso LF; Flynn JT. Congenital Nystagmus surgery. A quantitative evaluation of the effects. Arch. Ophthalmol. 1979, 97(3), 462-469.
[19] Dell'Osso LF; Schmidt D; Daroff RB. Latent, manifest latent, and Congenital Nystagmus. Arch. Ophthalmol. 1979, 97(10), 1877-1885.
[20] Dell'Osso LF; Daroff RB. Congenital Nystagmus waveform and foveation strategy. Doc. Ophthalmol. 1975, 39, 155-82.
[21] Dell'Osso LF; Gauthier G; Liberman G; Stark L. Eye movement recordings as a diagnostic tool in a case of Congenital Nystagmus. Am. J. Optom. Arch. Am. Acad. Optom. 1972, 49(1), 3-13.
[22] Dickinson CM; Abadi RV. The influence of the nystagmoid oscillation on contrast sensitivity in normal observers. Vision Res. 1985, 25, 1089-96.
[23] Duke-Elder S. Systems of ophthalmology. Vol III, Part 2. London: Henry Kimpton, 1973.
[24] Evans N. The significance of the nystagmus. Eye. 1989, 3, 816-832.
[25] Flynn JT; Dell'Osso LF. The effects of Congenital Nystagmus surgery. Ophthalmology. 1979, 86(8), 1414-1427.
[26] Forssman B; Ringer B. Prevalence and inheritance of congenital nystagmus in a Swedish population. Ann. Hum. Genet. 1971, 35, 139-47.
[27] Gottlob I; Wizov SS; Reinecke RD. Head and eye movements in children with low vision. Graefes Arch. Clin. Exp. Ophthalmol. 1996, 234, 369-77.
[28] Harris C; Berry D. A developmental model of infantile nystagmus. Semin. Ophthalmol. 2006 Apr-Jun, 21(2), 63-9.
[29] Hertle RW. Nystagmus and Ocular Oscillations in Childhood and Infancy. In: Wright KW, Spiegel PH, Thompson LH, editors. Handbook of pediatric neuro-ophthalmology. New York: Springer; 2006; 289-323.
[30] Hertle RW; Zhu X. Oculographic and clinical characterization of thirty-seven children with anomalous head postures, nystagmus, and strabismus: the basis of a clinical algorithm. J. Am. Assoc. Pediatr. Ophthalmol. Strabismus. 2000, 4(1), 25-32.
[31] Hertle RW; Dell'Osso LF. Clinical and ocular motor analysis of congenital nystagmus in infancy (see also comments). J. Am. Assoc. Pediatr. Ophthalmol. Strabismus. 1999, 3(2), 70-9.
[32] Hosokawa M; Hasebe S; Ohtsuki H; Tsuchida Y. Time-Frequency Analysis of Electro-nystagmogram Signals in Patients with Congenital Nystagmus. Jpn. J. Ophthalmol. 2004, 48, 262-7.
[33] Houben MMJ; Goumans J; van der Steen J. Recording Three-Dimensional Eye Movements: Scleral Search Coils versus Video Oculography. Invest. Ophthalmol. Vis. Sci. 2006, 47(1), 179-187.
[34] Hu DN. Prevalence and mode of inheritance of major genetic eye disease in China. J. Med. Genet. 1987, 24, 584-8.
[35] Kommerell G. Congenital nystagmus: control of slow tracking movements by target offset from the fovea. Graefes Arch. Clin. Exp. Ophthalmol. 1986, 224(3), 295-8.
[36] Leigh RJ. Clinical features and pathogenesis of acquired forms of nystagmus. Baillieres Clin. Neurol. 1992, 1(2), 393-416.
[37] Miura K; Hertle RW; FitzGibbon EJ; Optican LM. Effects of tenotomy surgery on congenital nystagmus waveforms in adult patients. Part I. Wavelet spectral analysis. Vision Research. 2003, 43, 2345-2356.
[38] Norn MS. Congenital idiopathic nystagmus. Incidence and occupational prognosis. Acta Ophthalmol. 1964, 42, 889-96.
[39] Pasquariello G; Cesarelli M; Bifulco P; Fratini A; La Gatta A; Romano M. Characterisation of baseline oscillation in congenital nystagmus eye movement recordings. Biomedical Signal Processing and Control, 2009 Apr, 4, 102-107.
[40] Reccia R; Roberti G; Russo P. Computer analysis of ENG spectral features from patients with congenital nystagmus. Journal of Biomedical Engineering. 1990, 12, 39-45.
[41] Reccia R; Roberti G; Russo P. Spectral analysis of pendular waveforms in congenital nystagmus. Ophthalmic Research. 1989, 21, 83-92.
[42] Reinecke RD. Idiopathic infantile nystagmus: diagnosis and treatment. J. Am. Assoc. Pediatr. Ophthalmol. Strabismus. June 1997, 1(2), 67-82.
[43] van der Geest JN; Frens MA. Recording eye movements with video-oculography and scleral search coils: a direct comparison of two methods. J. Neurosci. Methods. 2002, 114, 185-195.
In: Binocular Vision Editors: J. McCoun et al, pp. 125-137
ISBN: 978-1-60876-547-8 © 2010 Nova Science Publishers, Inc.
Chapter 5
EVOLUTION OF COMPUTER VISION SYSTEMS
Vladimir Grishin*
Space Research Institute (IKI) of the Russian Academy of Sciences, 117997, 84/32 Profsoyuznaya Str, Moscow, Russia
Abstract Applications of computer vision systems (CVS) in the flight control of unmanned aerial vehicles (UAV) are considered. In many projects, CVS are used for precision navigation, angular and linear UAV motion measurement, landing (in particular shipboard landing), homing guidance and others. All these tasks have been successfully solved separately in various projects. The development of perspective CVS can be divided into two stages. The first stage of perspective CVS development is the realization of all the above tasks in a single full-scale universal CVS with acceptable size, weight and power consumption. Therefore, all UAV flight control tasks can be performed in automatic mode on the base of information that is delivered by CVS. All necessary technologies exist and the degree of its maturity is high. The second stage of CVS development is integration of CVS and control systems with artificial intelligence (AI). This integration will bring two great benefits. Firstly it will allow considerable improvement of CVS performance and reliability due to accumulation of additional information about the environment. Secondly, the AI control system will obtain a high degree of awareness about the state of the environment. This allows the realization of a high degree of control effectiveness of the autonomous AI system in a fast changing and hostile environment.
* E-mail address: [email protected]
Introduction Computer vision systems (CVS) have shown a great evolution during the last decades. This chapter attempts to estimate the near-term perspective for their development. Further analysis will be dedicated to the usage of CVS in mobile robot control systems, mainly in unmanned aerial vehicles (UAV). This problem is one of the most challenging tasks of control theory and practice. However, the main principles of this analysis are applicable to most of the different kinds of mobile robots.
Present-Day Level of CVS Development A large number of publications is devoted to the application of CVS to different tasks of UAV flight control. The enumeration of these publications may take many pages, so here we refer to a few arbitrarily chosen papers. The key tasks of such CVS are listed below.
• High precision navigation [1–6]. This task can be solved by means of recognition (matching) of beforehand-specified objects (landmarks) whose coordinates are known [1]. Another demand imposed on these landmarks is the reliability of the detection and recognition process. Reliability has to be guaranteed in conditions of possible imitation and masking. Since these landmarks are selected in advance and their reference patterns can be carefully prepared, the process of recognition (matching) can be reliably performed. Reference patterns are prepared taking into account different distances, perspective aspect angles of observation and observation conditions. The reliability of landmark recognition can be further increased by the joint usage of landmark images and their 3D profiles. 3D profiles are widely used for navigation of missiles of different kinds (Tomahawk cruise missiles and others). The technologies for 3D profile reconstruction are well known. For instance, the complex of the Israeli firm VisionMap [2] can be mentioned. The complex allows reconstruction of a 3D profile with a precision of about 0.2–0.3 m from an altitude of 3250 m. This complex is heavy enough and has considerable size. Some weakening of the precision requirement will allow a significant decrease in weight and size. Further increase of navigation reliability can be achieved by selection of a redundant number of landmarks in the whole area of observation. Information from CVS is usually integrated
with information from an inertial navigation system. Such integration allows a serious decrease of the accumulated errors of the inertial navigation system [3]. The 3D profile of the observed surface can be calculated simultaneously [4, 5]. In the presence of an on-board high precision inertial navigation system, it is possible to make a 3D reconstruction in monocular mode (with a longitudinal stereo base) [6]. In this case it is possible to make a 3D reconstruction of very distant objects and surfaces.
• Flight stabilization and control of angular orientation [7–12]. This task is solved by the measurement of angular and linear UAV motion relative to the observed surface or objects [7, 8]. A set of features (points in the frame with good localization) is selected and traced in subsequent frames. The observed shifts of the selected set of points are used for calculation of the relative angular and linear motion. In this respect, star trackers should be mentioned. These devices are specialized CVS which are used in automatic spacecraft for measurement of angular position with high precision and reliability [9, 10]. For flight control, it is important to estimate the UAV orientation with regard to the local vertical (pitch and roll angles). These estimates are used for attitude stabilization and control. CVS sensors of the local vertical use algorithms of horizon line detection, recognition and tracing [11–12]. Joint usage of CVS and inertial navigation systems allows significant improvement of the precision and reliability of angular orientation [7, 8]. The pose estimation and 3D reconstruction are frequently realized in a single algorithm, and this process is called SLAM (simultaneous localization and mapping) [4, 5, 7].
• Near-obstacle flight [13–17]. This is a very complicated flight control task. The control system has to guarantee high precision control with a short time delay. A good example of such control is the flight of an airplane or helicopter between buildings in an urban environment at an altitude of about 10–15 m [15]. Another example is ground hugging flight or on-the-deck flight. The CVS is able to provide all necessary information for solving such control tasks. In particular, the optical flow calculation allows us to evaluate the risk of collision and to correct the direction of flight to avoid collision (obstacle avoidance). The distance and 3D profile of observed objects can be calculated from stereo pairs. The optical flow is used for estimation of flight altitude, too.
• Landing [18–26]. Landing is the most dangerous stage of flight. During this stage, the UAV can crash. Moreover, persons can be injured or property can be damaged. CVS can provide all necessary information for
the control system which should realize this maneuver. This task includes the recognition of the landing stripe or landing site and flight motion control relative to the landing stripe. CVS is also used for landing site selection in conditions of forced landing; see for example [19, 21 and 25]. In particular, 3D reconstruction of the observed surface allows selection of the most suitable landing area. In other words, CVS supports landing on unprepared (non-cooperative) sites. CVS are used for autonomous landing on the surface of Solar system planets and small bodies [22, 24]. The most complicated task is shipboard landing [26]. There are two main difficulties: the first is that the landing stripe is small; the second is that the ship deck is moving.
• Detection and tracking of selected moving targets [27–30]. A target can be a pedestrian, car, ship or other moving object. The selected target can make evolutions, attempts to hide from observation and so on. In such complicated conditions, CVS should guarantee reliable tracking of the specified target. In the case of an automatic tracking collapse, the CVS should use effective search algorithms for target tracking restoration.
• Homing guidance to selected objects [31]. The homing task is similar to the task of navigation. It includes search, detection, recognition and tracking of the aim object. A significant problem is the multiple change of distance during the homing process. During this process the observed size of the tracked object changes very significantly. In such cases, correlation matching technologies are used. These technologies are used in smart weapons (precision-attack bombs). Another example is the Tomahawk cruise missile, which is equipped with the so-called DSMAC (Digital Scene Matching Area Correlator) system. Other scene matching terminal guidance systems exist.
All these tasks have been successfully solved separately in different projects. Most of these projects are on-board real-time systems; the remaining ones will be realized in the form of on-board real-time systems in the near future. During the last decades, great attention has been paid to the pattern recognition problem. From the point of view of a flight control CVS, the pattern recognition problem can be divided into two problems.
• Recognition of preliminarily specified objects. This problem is in fact being solved within the high-precision navigation task, the tracking of selected objects and homing guidance.
• Recognition as classification of observed objects [32-36]. This problem is being successfully solved during the development of automatic systems intended for processing visual information, such as high-resolution space images. The huge volume of such information makes manual processing impossible. Such tasks are currently realized on workstations, as they require a great deal of computational resources, but further development of these systems and some simplification of the task will allow realizing them in the form of on-board real-time systems. The possibility to recognize a wide set of objects creates the necessary prerequisites for the development of artificial intelligence (AI) control systems.
Full-Scale Universal CVS
The development of prospective CVS can be divided into two stages. The first stage is the realization of all the listed tasks in a single full-scale system; in other words, the task is to combine these technologies into a whole, interrelated system. It should be mentioned that all these technologies have many methods and algorithms in common. All the necessary technologies exist, and their degree of maturity is high. Many algorithms have been suggested for the solution of each of the listed tasks, so one of the main aims is the selection of the most effective and reliable algorithms. It is then necessary to develop the appropriate hardware and software. Serious attention should be paid to the development of CVS architecture [40] and of special processors for the most computationally demanding algorithms. It is highly probable that the development and production of specialized chipsets for image processing in CVS will be required. Some attempts to move in this direction of multifunctional CVS development are already appearing [37-42]. Realization of such a complicated CVS with acceptable size, weight and power consumption will require the cooperation of many groups of researchers, engineers and industry. These parameters should be acceptable for a wide area of practical applications (robots of different kinds). Another area of application is safety systems designed to prevent pilot errors in flight control or driver errors in car control. The feasibility of this task with present-day technologies is beyond doubt. Realization of the first stage will make fully automatic vision-based control of UAVs and other mobile robots possible. Small and cheap CVS will have a very wide area of application, comparable with that of GPS navigation receivers.
Integration of CVS and AI Control System
One can say with confidence that the second stage of CVS development is the integration of CVS with AI control systems, which realize the functions of task analysis, estimation of the current situation, operation planning, and the realization and correction of plans in real time. On the one hand, such integration will allow an essential improvement of CVS performance and reliability, due to the accumulation of additional information about objects in the environment, the possibility to undertake special actions to investigate the environment, and the aggregation of visual data with the huge volume of non-visual information available to AI control systems. On the other hand, the AI control system will obtain a high degree of awareness of the current state of the environment. This makes it possible to achieve a high degree of control effectiveness in uncertain, fast-changing and hostile environments. The possibility for the AI system to gain and accumulate individual experience allows the reliability and effectiveness of the CVS to be improved autonomously during the whole period of CVS and AI system operation. It stands to reason that the CVS and the AI control system have to be tightly integrated with the conventional flight control system. It should be emphasized once more that CVS are the basis for the development of AI control systems because of their high information content and high awareness. On the other hand, the requirements of the AI system in solving its tasks will stimulate the expansion of CVS functions and capabilities. At a certain stage of AI system development these processes can occur autonomously; in other words, a synergetic effect should take place. In this way a close interaction between the AI control system and the CVS should be established. There are other highly informative sensors, such as radar sets and laser scanners. A modern radar set with an electronically scanned array can provide a huge information flow, and a laser scanner is capable of providing millions of measurements per second. But their size, weight and power consumption are much larger than the corresponding parameters of a TV camera, and the cost of a TV camera is much smaller than the cost of a radar set or laser scanner. Moreover, radar sets and laser scanners produce electromagnetic emission, which in some circumstances is highly undesirable. Thus the application area of CVS will be much wider than that of radar sets and laser scanners. Some activities in the direction of integrating some elements of AI and CVS are described in [43-49]. However, the realization of a full-scale AI control system is still too complicated a task; the effective methods for developing such combined CVS-AI systems are debatable and rather involved, and the second stage is characterized by a considerable degree of uncertainty. But during the last 20-25 years quite acceptable approaches to the
development of AI control systems and their integration with CVS have been formulated. The most popular introduction to the principles of the structure and function of artificial and natural intelligence is presented in [50]; this book is dedicated mainly to brain function. A more complicated and advanced conception is presented in [51], a book dedicated to the development of autonomous AI systems and their functioning. These AI conceptions are suitable for the development of control systems for mobile objects equipped with CVS. It should be mentioned that these authors traditionally paid great attention to the functioning of artificial and natural neural networks. It seems, however, that more effective specialized processor architectures should be used in the development of AI processors: blind imitation of biological structures on a completely different technological base is ineffective. For instance, wheels allow speeds of motion that are absolutely inaccessible to legs or their mechanical imitations, and present-day aircraft wings permit flight at speeds absolutely inaccessible to birds with flapping wings or their mechanical imitations. One can state that precedent-based thinking is the basis of natural and artificial intelligence. The development of effective methods for the permanent accumulation and processing of precedents is required. These methods should include the development of precedent databases, associative search, and methods for extracting information and knowledge from precedent databases. Very important precedent-based methods for forecasting the evolution of a situation should also be developed, and these methods should function in real time. Principles of precedent accumulation and processing are used by all living creatures that have a nervous system of any kind; memory capacity and the effectiveness of precedent processing determine the stage of a creature's evolution. We describe here an over-simplified constructive conception which is suitable for the design of embedded autonomous AI systems. CVS can be considered a basis for the gradual, wide practical implementation of AI. It will require the broad integration of different groups of researchers, engineers and industry. The second stage will require much more time, money and other resources than the first stage, but even during the development process it is possible to obtain practically useful and commercially applicable results. For instance, the vision systems of such species as the frog or the crocodile are very primitive, as is their intelligence; however, given the survival task and the admissible size, weight and power consumption, such vision and intelligence systems are quite effective and useful. This fact is confirmed by the wide distribution and numbers of frog and crocodile species. So even not very advanced and perfect CVS and AI systems can find useful applications and be commercially successful.
Returning to UAV control, we see that the overwhelming majority of UAVs operate under remote control or a preliminarily prepared mission plan based on waypoint navigation. The possibilities of remote control are very limited. Remote control requires highly reliable information exchange with the UAV, and such exchange is possible only within a limited range from the control station. Another drawback of remote control is its vulnerability to countermeasures. Remote control also requires a highly skilled operator; nevertheless, the crash rate of remotely controlled UAVs is relatively high. In a complicated environment, the preparation of a flight plan for an automatic UAV is a complicated and time-consuming task, and the initial information used for flight plan preparation can become outdated rather quickly. Any unmapped obstacle can cause the UAV to crash. Flight in a physically cluttered environment (such as the streets of a city) on a preliminarily prepared mission plan is impossible. Mention should also be made of the high vulnerability of the popular GPS navigation to countermeasures.
Conclusion
One of the most difficult and attractive aims is the development of fully autonomous unmanned aerial vehicles and other robots. These vehicles should be capable of effectively solving the required tasks, which must be carried out in complicated, unpredictable, varying and hostile environments without remote control in any form. The high awareness provided by computer vision systems is a necessary condition for the development of such advanced control systems.
References
[1] Xie, S.-R.; Luo, J.; Rao, J.-J.; Gong, Z.-B. Computer Vision-based Navigation and Predefined Track Following Control of a Small Robotic Airship. Acta Automatica Sinica. 2007, vol. 33, issue 3, pp. 286-291. [2] Pechatnikov, M.; Shor, E.; Raizman, Y. Visionmap A3 - Super Wide Angle Mapping System Basic Principles and Workflow. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. 2008, vol. XXXVII, part B4. [3] Veth, M. J.; Raquet, J. F. Two-Dimensional Stochastic Projections for Tight Integration of Optical and Inertial Sensors for Navigation. Proceedings of the 2006 National Technical Meeting of the Institute of Navigation. 2007, pp. 587–596. [4] Karlsson, R.; Schon, T. B.; Tornqvist, D.; Conte, G.; Gustafsson, F. Utilizing Model Structure for Efficient Simultaneous Localization and Mapping for a UAV Application. Proceedings of 2008 IEEE Aerospace Conference. 2008, pp. 1-10. [5] Hygounenc, E.; Jung, I.-K.; Soueres, P.; Lacroix, S. The Autonomous Blimp Project of LAAS-CNRS: Achievements in Flight Control and Terrain Mapping. The International Journal of Robotics Research. 2004, vol. 23, issue 4-5, pp. 473-511. [6] Borodovskiy, V. N.; Grishin, V. A. Estimation of Limitations in Precision for 3-D Surface Reconstruction. Transactions on IEEE International Conference on Systems, Man and Cybernetics. 1998, vol. 5, pp. 4387-4391. [7] Baker, C.; Debrunner, C.; Gooding, S.; Hoff, W.; Severson, W. Autonomous Vehicle Video Aided Navigation – Coupling INS and Video Approaches. Advances in Visual Computing: Proceedings of the Second International Symposium on Visual Computing (ISVC 2006). 2006, vol. 4292, issue 2, pp. 534-543. [8] Roumeliotis, S. I.; Johnson, A. E.; Montgomery, J. F. Augmenting Inertial Navigation with Image-Based Motion Estimation. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '02). 2002, vol. 4, pp. 4326–4333. [9] Avanesov, G. A.; Forsh, A. A.; Bessonov, R. V.; Ziman, Ya. L.; Kudelin, M. I. BOKZ-M star tracker and its evolution. Proceedings of the 14th Saint Petersburg International Conference on Integrated Navigation Systems. 2007, pp. 219-224. [10] SED 26 Star Tracker. (2006). http://www.sodern.fr/site/docs_wsw/fichiers_sodern/SPACE%20EQUIPMENT/FICHES%20DOCUMENTS/SED26.pdf [11] Dusha, D.; Boles, W.; Walker, R. Fixed-Wing Attitude Estimation Using Computer Vision Based Horizon Detection. Proceedings of the 12th Australian International Aerospace Congress. 2007, pp. 1-19. [12] Ettinger, S. M.; Nechyba, M. C.; Ifju, P. G.; Waszak, M. Vision-Guided Flight Stability and Control for Micro Air Vehicles. Advanced Robotics. 2003, vol. 17, issue 7, pp. 617-640. [13] Oh, P. Y.; Green, W. E.; Barrows, G. Neural Nets and Optic Flow for Autonomous Micro-Air-Vehicle Navigation. Proceedings of the ASME International Mechanical Engineering Congress and Exposition (IMECE 04). 2004, vol. 2, pp. 1-7.
[14] Kellogg, J.; Bovais, C.; Dahlburg, J.; Foch, R.; Gardner, J.; Gordon, D.; Hartley, R.; Kamgar-Parsi, B.; McFarlane, H.; Pipitone, F.; Ramamurti, R.; Sciambi, A.; Spears, W.; Srull, D.; Sullivan, C. The NRL MITE Air Vehicle. Proceedings of the Bristol RPV/AUV Systems Conference. 2001. [15] Muratet, L.; Doncieux, S.; Briere, Y.; Meyer, J.-A. A Contribution to Vision-Based Autonomous Helicopter Flight in Urban Environments. Robotics and Autonomous Systems. 2005, vol. 50, issue 4, pp. 195-209. [16] Ortiz, A. E.; Neogi, N. N. Object Detection and Avoidance Using Optical Techniques in Uninhabited Aerial Vehicles. Proceedings of the AIAA Guidance, Navigation, and Control Conference and Exhibit. 2007. [17] Zufferey, J.-C.; Floreano, D. Toward 30-gram Autonomous Indoor Aircraft: Vision-based Obstacle Avoidance and Altitude Control. Proceedings of the 2005 IEEE International Conference on Robotics and Automation (ICRA'2005). 2005, pp. 2594-2599. [18] Saripalli, S.; Montgomery, J. F.; Sukhatme, G. S. Visually-Guided Landing of an Unmanned Aerial Vehicle. IEEE Transactions on Robotics and Automation. 2003, vol. 19, issue 3, pp. 371-380. [19] Fitzgerald, D.; Walker, R.; Campbell, D. A Vision Based Emergency Forced Landing System for an Autonomous UAV. Proceedings of Australian International Aerospace Congress Conference. 2005. [20] Sharp, C. S.; Shakernia, O.; Sastry, S. S. A Vision System for Landing an Unmanned Aerial Vehicle. Proceedings of the IEEE International Conference on Robotics and Automation. (2001 ICRA). 2001, vol. 2, p. 1720-1727. [21] Theodore, C.; Rowley, D.; Ansar, A.; Matthies, L.; Goldberg, S.; Hubbard, D.; Whalley, M. Flight Trials of a Rotorcraft Unmanned Aerial Vehicle Landing Autonomously at Unprepared Sites. Proceedings of the American Helicopter Society 62nd Annual Forum. 2006, pp. 1250-1264. [22] Zhukov, B.; Avanesov, G.; Grishin, V.; Krasnopevtseva, E. On-Board RealTime Image Processing to Support Landing on Phobos. Proceedings of the 7th International Symposium on Reducing Costs of Spacecraft Ground Systems and Operations (RCSGSO). 2007, pp. 423-428. [23] Grishin, V. A. A Cramer–Rao Bound for the Measurement Accuracy of Motion Parameters and the Accuracy of Reconstruction of a Surface Profile Observed by a Binocular Vision System. Pattern Recognition and Image Analysis. 2008, vol. 18, issue 3, pp. 507–513. [24] Trawny, N.; Mourikis, A. I.; Roumeliotis, S. I.; Johnson, A. E.; Montgomery, J. F.; Ansar, A.; Matthies, L. H. Coupled Vision and Inertial
Navigation for Pin-Point Landing. Proceedings of the NASA Science and Technology Conference (NSTC '07). 2007, paper B2P2. [25] Fitzgerald, D. L.; Walker, R. A.; Campbell, D. A Vision Based Forced Landing Site Selection System for an Autonomous UAV. Proceedings of the 2005 International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP). 2005, pp. 397-402. [26] Saripalli, S.; Sukhatme, G. S. Landing on a Moving Target using an Autonomous Helicopter. Proceedings of the International Conference on Field and Service Robotics. 2003, vol. 24, pp. 277-286. [27] Helble, H.; Cameron, S. OATS: Oxford Aerial Tracking System. Robotics and Autonomous Systems. 2007, vol. 55, issue 9, pp. 661-666. [28] Lenhart, D.; Hinz, S. Automatic Vehicle Tracking in Low Frame Rate Aerial Image Sequences. International Archives of Photogrammetry, Remote Sensing, and Spatial Information Sciences. 2006, vol. 36-3, pp. 203-208. [29] Nguyen, T. T.; Grabner, H.; Bischof, H.; Gruber, B. On-line Boosting for Car Detection from Aerial Images. Proceedings of the IEEE International Conference on Research Information and Vision for the Future (RIVF '07). 2007, pp. 87-95. [30] Zhao, T.; Nevatia, R. Car Detection in Low Resolution Aerial Image. Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001). 2001, vol. 1, pp. 710-717. [31] Min-Shou, T. (1996). The Application of Correlation Matching Technique in Image Guidance. http://handle.dtic.mil/100.2/ADA316752 [32] Rekik, A.; Zribi, M.; Hamida, A. B.; Benjelloun, M. An Optimal Unsupervised Satellite Image Segmentation Approach Based on Pearson System and k-Means Clustering Algorithm Initialization. International Journal of Signal Processing. 2009, vol. 5, issue 1, pp. 38-45. [33] Lari, Z.; Ebadi, H. Automatic Extraction of Building Features from High Resolution Satellite Images Using Artificial Neural Networks. Proceedings of the Conference on Information Extraction from SAR and Optical Data, with Emphasis on Developing Countries. 2007. [34] Kampouraki, M.; Wood, G. A.; Brewer, T. R. The Suitability of Object-Based Image Segmentation to Replace Manual Aerial Photo Interpretation for Mapping Impermeable Land Cover. Proceedings of the 2007 Annual Conference of the Remote Sensing & Photogrammetry Society (RSPSoc2007). 2007. [35] Samadzadegan, F.; Abbaspour, R. A.; Hahn, M. Automatic Change Detection of Geospatial Databases Based on a Decision-Level Fusion
Technique. Proceedings of the XXth International Society for Photogrammetry and Remote Sensing Congress (ISPRS 2004). 2004, pp. 489-491. [36] Li, Y.; Atmosukarto, I.; Kobashi, M.; Yuen, J.; Shapiro, L. G. Object and Event Recognition for Aerial Surveillance. Proceedings of the SPIE Conference on Optics and Photonics in Global Homeland Security. 2005, vol. 5781, pp. 139-149. [37] Grishin, V. A.; Aust, S. A. Architecture of Computer Vision Systems Intended for Control of Aircraft. Proceedings of the Fourth International Conference Parallel Computations and Control Problems (PACO '2008). 2008, pp. 1071-1086. (in Russian). [38] Wagter, C. D.; Proctor, A. A.; Johnson, E. N. Vision-Only Aircraft Flight Control. Proceedings of the 22nd AIAA Digital Avionics Systems Conference (DASC '03). 2003, vol. 2, pp. 8.B.2-81-11. [39] Montgomery, J. F.; Johnson, A. E.; Roumeliotis, S. I.; Matthies, L. H. The Jet Propulsion Laboratory Autonomous Helicopter Testbed: A Platform for Planetary Exploration Technology Research and Development. Journal of Field Robotics. 2006, vol. 23, issue 3-4, pp. 245-267. [40] Wagter, C. D.; Bijnens, B.; Mulder, J. A. Vision-Only Control of a Flapping MAV on Mars. Proceedings of the AIAA Guidance, Navigation and Control Conference and Exhibit. 2007, pp. 1-7. [41] Nordberg, K.; Doherty, P.; Farneback, G.; Forssen, P.-E.; Granlund, G.; Moe, A.; Wiklund, J. Vision for a UAV helicopter. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '02), Workshop WS6 on aerial robotics. 2002, pp. 29-34. [42] Kurdila, A.; Nechyba, M.; Prazenica, R.; Dahmen, W.; Binev, P.; DeVore, R.; Sharpley, R. Vision-Based Control of Micro-Air-Vehicles: Progress and Problems in Estimation. Proceedings of the 43rd IEEE Conference on Decision and Control. 2004, vol. 2, pp. 1635–1642. [43] Kim, H. J.; Shim, D. H. A flight control system for aerial robots: algorithms and experiments. Control Engineering Practice. 2003, vol. 11, issue 12, pp. 1389-1400. [44] Smith, N. W. (2006). Artificially Intelligent Autonomous Aircraft Navigation System Using a Distance Transform on an FPGA. www.altera.com/literature/dc/2006/a1.pdf [45] Granlund, G.; Nordberg, K.; Wiklund, J.; Doherty, P.; Skarman, E.; Sandewall, E. WITAS: An Intelligent Autonomous Aircraft Using Active Vision. Proceedings of the UAV 2000 International Technical Conference and Exhibition. 2000.
[46] Dickmanns, E. D. Vehicles Capable of Dynamic Vision. Proceedings of 15th International Joint Conference on Artificial Intelligence (IJCAI-97). 1997, vol. 2, pp. 1577-1592. [47] Heintz, F.; Doherty, P. Managing Dynamic Object Structures using Hypothesis Generation and Validation. Proceedings of the AAAI Workshop on Anchoring Symbols to Sensor Data. 2004, pp. 54-62. [48] Cliff, D.; Noble, J. Knowledge-based vision and simple visual machines. Philosophical Transactions of the Royal Society. Biological Sciences. 1997, vol. 352, issue 1358, pp. 1165-1175. [49] Johnson, E. N.; Proctor, A. A.; Ha, J.; Tannenbaum, A. R. Development and Test of Highly Autonomous Unmanned Aerial Vehicles. AIAA Journal of Aerospace Computing, Information, and Communication. 2004, vol. 1, issue 12, pp. 485-501. [50] Hawkins, J.; Blakeslee, S. On Intelligence; Times Books; Henry Holt and Company: New York, NY, 2004. [51] Zhdanov, A. A. Autonomous artificial intelligence; Adaptive and Intellectual systems; Binom. Laboratory of Knowledge: Moscow, Russia, 2008. (in Russian).
In: Binocular Vision Editors: J. McCoun et al, pp. 139-153
ISBN: 978-1-60876-547-8 © 2010 Nova Science Publishers, Inc.
Chapter 6
BINOCULAR VISION AND DEPTH PERCEPTION: DEVELOPMENT AND DISORDERS Ken Asakawa* and Hitoshi Ishikawa Department of Ophthalmology and Visual Science, Kitasato University Graduate School, Doctors Program of Medical Science.
Introduction
“Binocular vision” literally means vision with two eyes, and refers to the special attributes of vision with both eyes open, rather than one eye only. Our perception under binocular conditions represents a highly complex coordination of motor and sensory processes and is markedly different from, and more sophisticated than, vision with one eye alone. However, the use of a pair of eyes can be disrupted by a variety of visual disorders; for example, incorrect coordination between the two eyes can produce strabismus with its associated sensory problems: amblyopia, suppression and diplopia. What, then, is the reason for, and the advantage of, having two eyes? From our visual information input, we can perceive the world in three dimensions even though the images falling on our two retinas are only two-dimensional. How is this accomplished? This article is a review of our ability to use both eyes, while also providing basic information on the development of binocular vision and on the clinical disorders that interfere with our depth perception, such as strabismus and amblyopia.
* E-mail address: [email protected]. Correspondence to Ken Asakawa, CO (Orthoptist), Department of Ophthalmology and Visual Science, Kitasato University Graduate School, Doctors Program of Medical Science, 1-15-1, Kitasato, Sagamihara, Kanagawa, 228-8555, Japan.
1. Advantages of Binocular Vision
“Two eyes are better than one,” it is said; and, indeed, two eyes do offer a number of advantages over just one. It is reported that some 80% of the neurons in the visual cortex receive input from both eyes, which offers anatomical support for the view that binocular vision is an attribute of considerable value and importance. Clearly, binocular vision has a number of functional advantages, the main ones being:
1) Binocular summation, in which many visual thresholds are lower than with monocular vision[16]. Binocular visual acuity, for example, is typically better than monocular visual acuity, and two eyes offer better contrast detection thresholds than one does.
2) The binocular field of view is larger than either monocular field alone. We have a horizontal field of approximately 200 degrees, in which the two visual fields overlap by about 120 degrees when both eyes are used together[29]. We can see objects whose images are formed on both foveas as if their images fell on a single point midway between the two eyes, like an imaginary single eye in the middle of our forehead, named a “cyclopean eye” [7,45].
3) If one looks at a fingertip held in front of the eyes, noticing what can be seen behind it, and then closes first one eye and then the other, the objects behind the fingertip appear to move. This positional difference results from the fact that the two eyes are arranged laterally and are separated by a certain distance, the interocular distance (60 to 65 mm). They therefore see the world from two slightly different points. The subtle differences between the images entering each eye make possible the binocular form of depth perception, which is the true advantage of binocular vision and is known as “stereopsis” [15,55,56].
Binocular vision and stereopsis have been investigated in detail in the comprehensive studies of Howard and Rogers[32], Saladin[51] and Watt[63].
2. Foundations of Binocular Vision
Images of a single object that do not stimulate corresponding retinal points in both eyes are said to be disparate[22,37]; binocular disparity is defined as the difference in position of corresponding points between images in the two eyes [48,49,50] (figure 1). Binocular disparity can be classified as crossed or
uncrossed in relation to the point at which the two eyes converge (the fixation point)[44]. Points perceived to be nearer than the fixation point (within the Vieth-Müller circle, a theoretical prediction of the objects in space that stimulate corresponding points in the two eyes) generally have lines of sight that cross in front of the fixation point; these points are said to have crossed disparity. Points farther away than the fixation point have lines of sight that meet behind the fixation point; this is called uncrossed disparity. The Vieth-Müller circle intersects the fixation point and the entrance pupils of each eye. Diplopia is the result of a large binocular disparity; however, the visual system is able to combine two images into a single percept when disparities are smaller. For the binocular disparities associated with normal binocular vision, the relationship between motor and sensory fusion is more complex[25]. Panum’s area determines the upper limit of disparities that can produce single vision[41,54]. Small differences in the perception of the two eyes give rise to stereopsis: three-dimensional depth perception. When a distant object is fixated bifoveally, nearer objects in front of it will be imaged on the temporal retina of each eye, on non-corresponding points, resulting in a double image: crossed diplopia. In contrast, when a near object is fixated and a distant object is seen double, this is called uncrossed diplopia; in this case, each image is formed on the nasal retina of the eye. These phenomena are called physiological diplopia. The double image arises from corresponding or non-corresponding retinal areas under binocular vision. Binocular retinal correspondence is defined by the set of retinal image locations that produce identical visual directions when viewing with both eyes at the same time. These object locations, imaged onto corresponding retinal points, can be imagined as a cylinder with an infinite radius of curvature. This surface of points, called the horopter, stimulates the perception of identical visual directions for the two eyes[57,58]. However, the horopter will not precisely intersect only the fixation target. Because single binocular vision only requires the retinal image to fall within Panum’s area, a small residual misalignment of the visual axes (vergence error) may occur, causing a constant retinal disparity of a fixated object without diplopia. This fixation disparity is used by the vergence eye movement system to maintain its innervational level and compensate for a heterophoria. In the United States, testing has primarily followed a motor approach, whereas a strong sensory-based analysis has been used in Germany[39].
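As a rough quantitative illustration (a standard small-angle approximation; the interocular distance, viewing distance and depth step below are chosen only for the example and are not taken from this chapter), the horizontal disparity η produced by a depth difference Δd at viewing distance d with interocular distance a is approximately η ≈ aΔd/d². For a = 6.3 cm, d = 40 cm and Δd = 1 cm, η ≈ (6.3 × 1)/1600 ≈ 0.0039 rad, or about 13.5 minutes of arc. The inverse dependence on the square of the viewing distance is one reason why disparity-based depth perception is most precise for near objects.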
Figure 1. Crossed and uncrossed disparities result when objects produce images that are formed on closely separated retinal points. Any point within Panum’s area yields a percept of a single image, while points outside Panum’s area produce diplopia.
3. Stereopsis as the Highest Level of Binocular Vision
Binocular vision was first described in detail by Worth (1921), who classified it into three grades. The first degree consists of the simultaneous perception of each eye’s image. The second degree consists of the combination of the two images into a single percept, fusion,
which includes motor and sensory fusion. The third degree, and the highest level of binocular visual function, is stereopsis: binocular, three-dimensional depth perception resulting from the neural processing of horizontal binocular disparities (figure 2). However, stereopsis is not the only way to obtain depth information; even after closing one eye, we can still determine the relative positions of objects around us and estimate our spatial relationships with them. The cues that permit the interpretation of depth with one eye alone are called monocular cues. They include pictorial cues, such as the size of the retinal image, linear perspective, texture gradients, aerial perspective, and shading, as well as non-stereoscopic cues, such as accommodation of the crystalline lens, motion parallax, and structure from motion[62].
Figure 2. The classical model of binocular visual function is composed of three hierarchical degrees.
4. Binocular Viewing Conditions on Pupil Near Responses
Here, the effect of binocular cues on the near pupil response is introduced from our preliminary research. When changing visual fixation from a distant to a close object, accommodation, convergence, and pupil constriction occur; these three feedback responses constitute the near reflex[42]. We investigated the amplitudes of vergence eye movements associated with pupil near responses in pre-presbyopic and presbyopic subjects under binocular and monocular viewing conditions, recording the dynamics of a step change in real target position from far to near (figure 3).
The findings of these experiments were that a convergence response with pupil miosis was induced in all cases under binocular viewing conditions (figure 4A,C), whereas only presbyopic subjects showed version eye movements without pupil constriction under monocular conditions (figure 4D). Our findings imply that accommodation, which is strong in younger subjects but becomes progressively restricted with age, is the most important factor in the induction of the pupil near response. However, the results for presbyopic subjects under binocular conditions suggest that binocular visual functions such as fusion of the real target, depth perception, and proximity induce pupil constriction in presbyopia, in which accommodation is no longer available[27,31]. When both eyes are oriented toward a target, a fused percept of the target is formed, and through the processing of retinal disparity, depth perception can be achieved. As object distances from the plane of fixation increase, retinal image disparities become large and an object appears to lie in two separate directions; that is, viewing a nearby target binocularly yields proximal and disparity cues[20,43]. Consequently, in young subjects accommodation is active, and thus the pupil near response with blur-driven convergence is well induced even under the monocular viewing condition. In presbyopic subjects, on the other hand, since the change in target position was performed in real space under binocular viewing conditions, proximity and disparity cues were all available and acted in conjunction with each other[47].
Figure 3. We measure and record the dynamics of pupil and convergence simultaneously with the step stimuli of a real target in real space.
Figure 4. Measured data of binocular viewing conditions. The upper trace is from a young subject (A), and the lower, from a subject with presbyopia (B). The young subject’s typical results under monocular (non-dominant eye occluded) visual conditions (C). Typical trace of a subject with presbyopia showed conjugate eye movement without pupil constriction (D).
[Figure 5 artwork: panels A, B-1, B-2, C-1 and C-2; labels in the figure define SA = subjective angle, OA = objective angle, and AA = angle of anomaly (OA - SA).]
Figure 5. Suppression and retinal correspondence in strabismus with esodeviation. (A) Normal subject; (B) a strabismic patient with normal retinal correspondence and without suppression would have diplopia (B-1) and visual confusion (B-2), a common visual direction for two separate objects. (C) Elimination of diplopia and confusion by suppression of the retinal image (C-1) and by anomalous retinal correspondence (C-2): adaptation of the visual directions of the deviating eye.
5. Development of Binocular Vision
A major question is whether binocularity and stereopsis are present at birth or whether infants must learn to see binocularly and three-dimensionally. The visual system takes approximately 6 weeks to become sensitive to visual stimulus deprivation, and binocular vision first appears at about 3 months of age. Although it never tapers off completely[52], visual experience has its greatest effects at about 6 months of age, with effects diminishing rapidly until about 6 years of age [6,11,19]. During the critical period of rapid visual change between 6 weeks and 3 months after birth, infants are at a greater risk of developing visual abnormalities than at any other life stage. Therefore, infants are extremely susceptible to severe visual disorders arising from inadequate visual experience during the critical period. Since Wheatstone (1838), stereopsis has been one of the most popular fields of vision research, and it is routinely measured in clinical practice[40]. Disorders affecting stereopsis include blur, strabismus, and amblyopia, and the clinical measurement of stereopsis is of value as a means of indirect screening. The type and extent of sensory adaptation are important factors in the re-establishment of functional binocular vision in disorders such as strabismus and amblyopia in children[8,24]. Infantile esotropia, a stable, cross-fixational large-angle esotropia with onset before 6 months of age, is the most common form of strabismus. Generally, cycloplegic refraction reveals less than 3D of hyperopia, with no refractive or accommodative component responsible for the deviation. Accommodative esotropia, in contrast, usually occurs between 6 months and 7 years of age, with an average age of onset of 3 years[17]. The amount of hyperopic refractive error in accommodative esotropia averages +4D; esodeviation is restored to orthophoria by optical correction of the underlying hyperopia[1]. In the non-refractive form, hyperopia averages +2D; esodeviation (not related to uncorrected refractive error) is caused by a high AC/A ratio. The normal sensory organization of binocular vision can be altered in infantile strabismus by suppression or anomalous retinal correspondence (figure 5). Therefore, most strabismic patients do not experience diplopia and visual confusion[2,62,64]. Single vision is achieved by suppression, which eliminates the perception of objects normally visible to the deviating eye during simultaneous binocular viewing [28,34,65]. Anomalous retinal correspondence is an adapted shift in the visual directions of the deviated eye relative to the normal visual directions of the fixating eye [4,21,35,46]. The net result is that the deviating eye acquires a common visual direction to that of the
fovea of the fixating eye during binocular viewing of a peripheral retinal area [5,14]. According to a recent study, early abnormal binocular visual input contributes to poor outcomes in both infantile and accommodative esotropia[33]. The accepted treatment of strabismus is the wearing of appropriate glasses and eye muscle surgery. These treatments may prevent the development of sensory and motor dysfunctions[60]. However, several factors, including patient age at surgical alignment and duration of misalignment, influence the outcome of treatment. Future studies should establish the critical factors for achieving stable binocular vision. Abnormal development of spatial vision causes amblyopia, decreased visual acuity that cannot be attributed to suppression scotoma, uncorrected refractive error, or visual stimulus deprivation. Clinically, amblyopia is defined as a reduction in visual function caused by abnormal visual experience during development[30,61]. Strabismic amblyopia refers to amblyopia that is associated with the presence of strabismus, typically either esotropia or exotropia. The strabismic eye also shows a pronounced suppression of the central and peripheral visual field[26,53]. In addition, there is a contrast-dependent deficit that is strongly dependent on spatial frequency and a contrast-independent deficit for the position of targets[18,38,59]. Therefore, for infantile esotropia with significant fixation preference, occlusion therapy and surgery are associated with normal acuity development and a potential for at least gross stereopsis[10,40]. Anisometropic amblyopia is caused by significant, unequal refractive errors, exceeding +2D, between the eyes. Ametropic amblyopia may have equal refractive errors that are either extremely myopic (more than -6D) or hyperopic (more than +4D). Yet another kind of amblyopia, meridional amblyopia, is caused by astigmatic refractive errors present for long periods (more than 2 years)[3]. Moreover, form vision deprivation amblyopia occurs in patients with a constant obstruction in the image formation mechanism of the eye, such as congenital ptosis, congenital or traumatic cataracts and corneal opacities that remain untreated for some time[9,23]. Pediatric cataract treatment is now undergoing rapid development, and the visual prognosis for children with cataracts is improving due to earlier surgery, increased frequency of intraocular lens (IOL) implantation, and improved amblyopia therapy. Traditional amblyopia treatment consists of full-time occlusion of the sound eye, using an adhesive patch. However, recent trends include prescribing fewer hours and using atropine as an alternative or adjunct to patching or even as a first-line treatment[36].
Conclusion
Binocular vision requires a high level of coordination between motor and sensory processes; binocular vision and stereopsis will be compromised if any component of this system fails. Visual inputs from both eyes are combined in the primary visual cortex (V1), where cells are tuned for binocular vision. The observation of this tuning in V1, together with psychophysical evidence that stereopsis arises in visual processing, suggested that V1 is the neural correlate of stereoscopic depth perception; however, more recent work has indicated that this processing occurs in higher visual areas (in particular, area MT). In the future, we would like to review the neural integration of depth perception and binocular vision. The present review provides the basic information on normal and abnormal binocular vision that forms the foundation for understanding the clinical disorders of binocular vision. We look forward to new ideas and research on binocular vision[12].
References
[1] Asakawa K, Ishikawa H, Shoji N. New methods for the assessment of accommodative convergence. J. Pediatr. Ophthalmol. Strabismus. 2009; 46:273-277. [2] Asher H. Suppression theory of binocular vision. Br. J. Ophthalmol. 1953; 37:37-49. [3] Atkinson J. Infant vision screening: Prediction and prevention of strabismus and amblyopia from refractive screening in the Cambridge photorefraction program. Oxford University Press, 1993. [4] Awaya S, von Noorden GK, Romano PE. Symposium: Sensory Adaptations in Strabismus. Anomalous retinal correspondence in different positions of gaze. Am. Orthopt. J. 1970; 20:28-35. [5] Bagolini B. Anomalous correspondence: definition and diagnostic methods. Doc. Ophthalmol. 1967; 23:346-98. [6] Banks MS, Aslin RN, Letson RD. Sensitive period for the development of human binocular vision. Science. 1975; 190:675-677. [7] Barbeito R. Sighting from the cyclopean eye: the cyclops effect in preschool children. Percept. Psychophys. 1983; 33:561-564. [8] Birch EE, Gwiazda J, Held R. Stereoacuity development for crossed and uncrossed disparities in human infants. Vision Res. 1982; 22:507-513. [9] Birch EE, Stager DR. Prevalence of good visual acuity following surgery for congenital unilateral cataract. Arch. Ophthalmol. 1988; 106:40-43.
[10] Birch EE, Stager DR, Berry P, Everett ME. Prospective assessment of acuity and stereopsis in amblyopic infantile esotropes following early surgery. Invest. Ophthalmol. Vis. Sci. 1990; 31:758-765. [11] Birch EE: Stereopsis in infants and its developmental relation to visual acuity, Oxford University Press, 1993 [12] Brodsky MC. Visuo-vestibular eye movements: infantile strabismus in 3 dimensions. Arch. Ophthalmol. 2005; 123:837-842. [13] Brown AM, Lindsey DT, Satgunam P, Miracle JA. Critical immaturities limiting infant binocular stereopsis. Invest. Ophthalmol. Vis. Sci. 2007; 48:1424-1434. [14] Burian HM. Anomalous retinal correspondence. Its essence and its significance in diagnosis and treatment. Am. J. Ophthalmol. 1951; 34:237253. [15] Burian HM. Stereopsis. Doc. Ophthalmol. 1951; 5-6:169-83. [16] Campbell FW, Green DG. Monocular versus binocular visual acuity. Nature. 1965 9; 208:191-192. [17] Coats DL, Avilla CW, Paysse EA, Sprunger D, Steinkuller, PG, Somaiya M. Early onset refractive accommodative esotropia. J. AAPOS. 1998; 2:275278. [18] Demanins R, Wang YZ, Hess RF. The neural deficit in strabismic amblyopia: sampling considerations. Vision Res. 1999; 39:3575-3585. [19] Dobson V, Teller DY. Visual acuity in human infants: a review and comparison of behavioral and electrophysiological studies. Vision Res. 1978; 18:1469-1483. [20] Erkelens CJ, Regan D. Human ocular vergence movements induced by changing size and disparity. J. Physiol. 1986; 379:145-169. [21] Flom MC. Corresponding and disparate retinal points in normal and anomalous correspondence. Am. J. Optom. Physiol. Opt. 1980; 57:656-665. [22] Foley JM, Applebaum TH, Richards WA. Stereopsis with large disparities: discrimination and depth magnitude. Vision Res. 1975; 15:417-21. [23] Forbes BJ, Guo S. Update on the surgical management of pediatric cataracts. J. Pediatr. Ophthalmol. Strabismus. 2006; 43:143-151. [24] Fox R, Aslin RN, Shea SL, Dumais ST. Stereopsis in human infants. Science. 1980; 207:323-324. [25] Fredenburg P, Harwerth RS. The relative sensitivities of sensory and motor fusion to small binocular disparities. Vision Res. 2001; 41:1969-1979. [26] Gunton KB, Nelson BA. Evidence-based medicine in congenital esotropia. J. Pediatr. Ophthalmol. Strabismus. 2003; 40:70-73.
[27] Gwiazda J, Thorn F, Bauer J, et al. Myopic children show insufficient accommodative response to blur. Invest. Ophthalmol. Vis. Sci. 1993; 34:690-694. [28] Harrad R. Psychophysics of suppression. Eye. 1996; 10:270-273. [29] Harrington DO. The visual fields, St Louis, Mosby, 1964 [30] Harwerth RS, Smith EL 3rd, Duncan GC, Crawford ML, von Noorden GK. Multiple sensitive periods in the development of the primate visual system. Science. 1986; 232:235-238. [31] Hokoda SC, Rosenfield M, Ciuffreda, KJ. Proximal vergence and age. Optom. Vis. Sci. 1991; 68:168-172. [32] Howard IP, Rogers BJ. Binocular vision and stereopsis, Oxford University Press. [33] Hutcheson KA. Childhood esotropia. Curr. Opin. Ophthalmol. 2004; 15:444-448. [34] Jampolsky A. Characteristics of suppression in strabismus. AMA Arch. Ophthalmol. 1955; 54:683-696. [35] Kerr KE. Anomalous correspondence--the cause or consequence of strabismus? Optom. Vis. Sci. 1998; 75:17-22. [36] Khazaeni L, Quinn GE, Davidson SL, Forbes BJ. Amblyopia treatment: 1998 versus 2004. J. Pediatr. Ophthalmol. Strabismus. 2009; 46:19-22. [37] Lappin JS, Craft WD. Definition and detection of binocular disparity. Vision Res. 1997; 37:2953-2974. [38] Levi DM, Harwerth RS, Smith EL. Binocular interactions in normal and anomalous binocular vision. Doc. Ophthalmol. 1980; 49:303-324. [39] London R, Crelier RS. Fixation disparity analysis: sensory and motor approaches. Optometry. 2006; 77:590-608. [40] Maeda M, Sato M, Ohmura T, Miyazaki Y, Wang AH, Awaya S. Binocular depth-from-motion in infantile and late-onset esotropia patients with poor stereopsis. Invest. Ophthalmol. Vis. Sci. 1999; 40:3031-3036. [41] Mitchell DE. A review of the concept of "Panum's fusional areas". Am. J. Optom. Arch. Am. Acad. Optom. 1966; 43:387-401. [42] Myers GA, Stark L. Topology of the near response triad. Ophthalmic Physiol. Opt. 1990; 10:175-181. [43] North RV, Henson DB, Smith TJ. Influence of proximal, accommodative and disparity stimuli upon the vergence system. Ophthalmic. Physiol. Opt. 1993; 13:239-243. [44] Ogle KN. Disparity limits of stereopsis. AMA Arch. Ophthalmol. 1952; 48:50-60.
[45] Ono H, Barbeito R. The cyclopean eye vs. the sighting-dominant eye as the center of visual direction. Percept. Psychophys. 1982; 32:201-210. [46] Pasino L, Maraini G. Area of binocular vision in anomalous retinal correspondence. Br. J. Ophthalmol. 1966; 50:646-650. [47] Phillips NJ, Winn B, Gilmartin B. Absence of pupil response to blur-driven accommodation. Vision Res. 1992; 32:1775-1779. [48] Regan D. Binocular correlates of the direction of motion in depth. Vision Res. 1993; 33:2359-2360. [49] Richards W. Stereopsis and stereoblindness. Exp. Brain Res. 1970; 10:380388. [50] Richards W. Anomalous stereoscopic depth perception. J. Opt. Soc. Am. 1971; 61:410-414. [51] Saladin JJ. Stereopsis from a performance perspective. Optom. Vis. Sci. 2005; 82:186-205. [52] Scheiman MM, Hertle RW, Kraker RT, Beck RW, Birch EE, Felius J, Holmes JM, Kundart J, Morrison DG, Repka MX, Tamkins SM. Patching vs atropine to treat amblyopia in children aged 7 to 12 years: a randomized trial. Arch. Ophthalmol. 2008; 126:1634-1642. [53] Schor CM. Visual stimuli for strabismic suppression. Perception. 1977; 6:583-593 [54] Schor CM, Tyler CW. Spatio-temporal properties of Panum's fusional area. Vision Res. 1981; 21:683-692. [55] Sekular R, Blake R: Perception, Knopf, New York, 1985 [56] Sheedy JE, Fry GA. The perceived direction of the binocular image. Vision Res. 1979; 19:201-211. [57] Shipley T, Rawlings SC. The nonius horopter. I. History and theory. Vision Res. 1970; 10:1225-1262. [58] Shipley T, Rawlings SC. The nonius horopter. II. An experimental report. Vision Res. 1970; 10:1263-1299. [59] Tychsen L, Burkhalter A. Neuroanatomic abnormalities of primary visual cortex in macaque monkeys with infantile esotropia: preliminary results. J. Pediatr. Ophthalmol. Strabismus. 1995; 32:323-328. [60] Uretmen O, Civan BB, Kose S, Yuce B, Egrilmez S. Accommodative esotropia following surgical treatment of infantile esotropia: frequency and risk factors. Acta Ophthalmol. 2008; 86:279-283. [61] von Noorden GK. Amblyopia: a multidisciplinary approach. Proctor lecture. Invest. Ophthalmol. Vis. Sci. 1985;26:1704-1716. [62] von Noorden GK: Binocular vision and ocular motility, St Louis, Mosby, 1990
[63] Watt SJ, Akeley K, Ernst MO, et al. Focus cues affect perceived depth. J. Vis. 2005; 5:834-862. [64] Wong AM, Lueder GT, Burkhalter A, Tychsen L. Anomalous retinal correspondence: neuroanatomic mechanism in strabismic monkeys and clinical findings in strabismic children. J. AAPOS. 2000; 4:168-174. [65] Wong AM, Burkhalter A, Tychsen L. Suppression of metabolic activity caused by infantile strabismus and strabismic amblyopia in striate visual cortex of macaque monkeys. J. AAPOS. 2005; 9:37-47.
In: Binocular Vision Editors: J. McCoun et al, pp. 155-160
ISBN: 978-1-60876-547-8 © 2010 Nova Science Publishers, Inc.
Chapter 7
REPEATABILITY OF PRISM DISSOCIATION AND TANGENT SCALE NEAR HETEROPHORIA MEASUREMENTS IN STRAIGHTFORWARD GAZE AND IN DOWNGAZE David A. Goss1, Douglas K. Penisten2, Kirby K. Pitts2 and Denise A. Burns2 1
2
School of Optometry, Indiana University, Bloomington, IN 47405 College of Optometry, Northeastern State University, Tahlequah, OK 74464
Abstract
The evaluation of heterophoria is an important element of the assessment of binocular vision disorders. This study examined the interexaminer repeatability of two heterophoria measurement methods in a gaze position with no vertical deviation from straightforward position and in 20 degrees downgaze. The two procedures were the von Graefe prism dissociation method (VG) and the tangent scale method commonly known as the modified Thorington test (MT). Serving as subjects were 47 young adults, 22 to 35 years of age. Testing distance was 40 cm. A coefficient of repeatability was calculated by multiplying the standard deviation of the difference between the results from two examiners by 1.96. Coefficients of repeatability in prism diopter units were: VG, straightforward, 6.6; VG, downgaze, 6.2; MT, straightforward, 2.8; MT, downgaze, 3.6. The results show better repeatability for the tangent scale procedure than for the von Graefe prism dissociation method.
Introduction
Vergence disorders can be a source of eyestrain and uncomfortable vision. Measurement of heterophoria is an important component of the clinical examination for vergence disorders. Measurement of heterophoria requires a method for the elimination of binocular fusion and a method for determining the angle between the lines of sight of the eyes and the position the lines of sight would assume if they intersected at the object of regard. Two common clinical methods of heterophoria measurement are the von Graefe prism dissociation method (VG) and the tangent scale method that is commonly known as the modified Thorington test (MT). The VG test uses prism dissociation to prevent binocular fusion. Alignment of the diplopic images by a rotary prism or a prism bar provides measurement of the heterophoria. The MT test employs a Maddox rod to prevent binocular fusion. A penlight is pointed toward the patient through a hole in the center of the test card. A line is seen by the eye covered by the Maddox rod and a tangent scale is seen by the other eye. The position of the line on the tangent scale is reported by the patient. One of the factors that can be considered in the evaluation of a clinical test is the repeatability of the results obtained with it. A metric often used in the evaluation of repeatability of clinical tests is found by multiplying the standard deviation of the differences between pairs of measurements on a series of subjects by 1.96 to get the 95% limits of agreement between the repeat measurements. This value is sometimes referred to as a coefficient of repeatability. Previous studies have reported better repeatability for MT testing than for VG testing. Morris [1] tested adult subjects, ages 22 to 31 years. Near point VG and MT tests were performed on separate days by one examiner with subjects viewing a test target through a phoropter. Coefficients of repeatability for the VG test were 3.3 prism diopters (Δ) for 20 trained observers and 2.9Δ for 20 untrained observers. On the MT, coefficients of repeatability were 2.0Δ for the trained observers and 1.6Δ for the untrained observers. Slightly lower coefficients of repeatability were obtained when the subjects had kinesthetic input from holding some part of the test target or the target support. Rainey et al. [2] reported on the repeatability of test results between two examiners for 72 subjects between the ages of 22 and 40 years. Test distance for both the VG test and the MT test was 40 cm. VG testing was done using phoropter rotary prisms, while the MT testing was performed without a phoropter. Coefficients of repeatability for the VG test were 6.7Δ using a flash presentation of the target in which subjects viewed the target intermittently between
adjustments of the prism and 8.2Δ using continuous viewing of the target. The coefficient of repeatability for the MT was 2.3Δ. Wong et al. [3] presented repeatability data based on the agreement of results from two examiners. Seventy-two students, ranging in age from 18 to 35 years, served as subjects. All testing was done without a phoropter. For the VG test, a loose prism was used for prism dissociation and a prism bar was used for the alignment measurement. Test distance for both tests was 40 cm. The coefficient of repeatability for the VG test, using continuous presentation, was 3.8Δ. Coefficients of repeatability for the MT were 2.3Δ for continuous presentation and 2.1Δ for a flashed presentation of the test card. Escalante and Rosenfield [4] examined the repeatability of gradient (lens change) accommodative convergence to accommodation (AC/A) ratios. Repeat measurements, at least 24 hours apart, were performed by one examiner on 60 subjects ranging in age from 20 to 25 years. Testing was done with subjects viewing through a phoropter, which was used for the gradient AC/A ratio lens changes. Viewing distance was 40 cm. Coefficients of repeatability of the AC/A ratios, using various lens combinations, ranged from 2.2 to 3.5 prism diopters per diopter (Δ/D) for the VG test and from 1.2 to 2.0 Δ/D for the MT test. A consistent finding of the previous studies was better repeatability on the MT than on the VG. Only one of the studies involved doing both tests outside the phoropter. The present study reports results with both tests done without a phoropter. While not explicitly stated in each of the papers, it may be presumed that test targets were placed in a position without any vertical deviation from straightforward position. The present study reports results for straightforward position and for 20 degrees of downgaze. Thus the purpose of the present study was to test for confirmation of better repeatability on the MT test than on the VG test and to examine their repeatabilities for a position of downgaze.
Methods
Forty-seven adults ranging in age from 22 to 35 years served as subjects. Inclusion criteria were no ocular disease, no history of ocular surgery, no strabismus, no amblyopia, and best corrected distance visual acuity of at least 20/25 in each eye. Subjects wore their habitual contact lens or spectacle prescriptions during testing. Testing protocols were approved by the human subjects committee at Northeastern State University, Tahlequah, Oklahoma. Testing was done at 40 cm. VG and MT tests were performed with no vertical deviation from straightforward gaze and with 20 degrees of downgaze. For 20
degrees of downgaze, test cards were moved down from their position for straightforward gaze by about 13.7 cm and they were tilted back about 20 degrees so that the subjects’ lines of sight would be approximately perpendicular to the cards. Four recording forms, each with a different test sequence, were used to counter-balance test order. The recording form used was changed with each consecutive subject number. All tests were done by two examiners. For the VG test, dissociation of the eyes was induced with an 8Δ base-down loose prism held over the subject’s right eye. Subjects viewed a vertical column of letters on a test card at 40 cm from them. They were instructed to keep the letters clear to control accommodation. A rotary prism was placed over the subject’s left eye in a clamp mounted on a table. Subjects were instructed to report when the columns of letters were aligned so that the upper column of letters was directly above the lower one, and the reading on the rotary prism was noted when the subjects reported alignment. Two values, one with the prism starting from the base-in side and one starting from the base-out side, were averaged, and the result was recorded. Exo findings (Base-in for alignment) were recorded as negative values, and eso findings (base-out for alignment) were treated as positive numbers. For downgaze measurements, subjects were instructed to turn their eyes down rather than their heads, and the prisms were tilted forward by approximately 20 degrees. A Bernell Muscle Imbalance Measure (MIM) near test card was used for the MT test. This is calibrated for a 40 cm test distance. For lateral phoria testing, there is a horizontal row of numbered dots separated by 1Δ when the card is at 40 cm and a hole in the center of the row. Light from a penlight was directed through the hole in the center of the card toward the subject. A red Maddox rod was placed over the subject’s right eye oriented so that the subject saw a vertical line. Subjects were then instructed to close their eyes. As soon as they opened their eyes, they reported the number through which the vertical red line passed and whether the line was to the left or right of the white light. The number indicated the phoria magnitude. The position of the red line relative to the white light indicated the direction of the phoria, to the left for exo (recorded as a negative number) and to the right for eso (recorded as a positive number).
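As a worked note on the tangent scale calibration just described: one prism diopter corresponds, by definition, to 1 cm of deviation at a distance of 1 m, so at the 40 cm test distance 1Δ corresponds to 0.4 cm. Adjacent numbered dots on a card calibrated for 40 cm are therefore 4 mm apart, and the dot number through which the red line passes can be read directly in prism diopters.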
Results
The mean findings on both tests were a small amount of exophoria (Table 1). A coefficient of repeatability was determined by finding the mean and standard deviation of the differences between the two examiners and then multiplying the
standard deviation of the differences by 1.96. The coefficients of repeatability for the VG test were 6.6Δ for no elevation or depression of gaze and 6.2Δ for downgaze. The respective coefficients of repeatability for the MT test were 2.8Δ and 3.6Δ (Table 1).

Table 1. Means (and standard deviations in parentheses) for both examiners and the coefficients of repeatability for each of the tests. Units are prism diopters. VG, von Graefe test; MT, modified Thorington test; S, straightforward position; D, downgaze.

Test                           VG, S        VG, D        MT, S        MT, D
Examiner 1 Mean (SD)           -1.9 (4.4)   -1.0 (4.6)   -2.2 (3.6)   -1.9 (4.0)
Examiner 2 Mean (SD)           -2.4 (5.1)   -0.9 (4.5)   -2.3 (4.1)   -2.0 (3.7)
Coefficient of Repeatability    6.6          6.2          2.8          3.6
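Written out in symbols, the coefficient of repeatability described above is simply 1.96 times the standard deviation of the inter-examiner differences (with \(d_i\) the difference between the two examiners' findings for subject \(i\) and \(n\) the number of subjects); this merely restates the procedure in the text:

\[ \mathrm{COR} = 1.96\, s_d, \qquad s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\bigl(d_i - \bar{d}\bigr)^2}. \]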
Discussion
The results of the present study agree with previous studies [1-5] in finding better repeatability with the MT test than the VG test. This better repeatability is present under a variety of test conditions, including with or without the phoropter and in downgaze as well as with no vertical gaze deviation from straightforward position. The VG test is the most commonly used subjective dissociated phoria test in optometric practice. Some clinicians have observed that the MT test is generally simpler than the VG test in terms of instrumentation and patient instructions. [2,6] The fact that the MT also offers better repeatability than the VG suggests that its more widespread adoption as a routine phoria test may be advisable. Repeatability of other less commonly used phoria tests has also been studied. One investigation reported the repeatability of the VG test to be better than that for the Maddox rod test. [1] Another study found repeatability for the near Howell card phoria to be better than that for the VG test and nearly as good as that for the MT test. [3] The means found in the present study are similar to those of previous studies comparing VG and MT tests, in which means for the VG ranged from -2.2 to 5.0Δ and means for the MT ranged from -2.1 to -3.4Δ. [2,3,5-7] In the present study, as in the previous studies, standard deviations for the VG were higher than for the MT. It has been reported that for midrange phorias, there is quite good
agreement of VG and MT findings, but for higher magnitude phorias, either exo or eso, VG tends to yield higher values. [7]
Conclusion
Based on the results of the present study as well as those of previous studies, one can conclude that the MT tangent scale test has better repeatability than the VG prism dissociation test under a variety of test conditions.
References
[1] Morris, FM. The influence of kinesthesis upon near heterophoria measurements. Am. J. Optom. Arch. Am. Acad. Optom. 1960;37:327-51.
[2] Rainey, BB; Schroeder, TL; Goss, DA; Grosvenor, TP. Inter-examiner repeatability of heterophoria tests. Optom. Vis. Sci. 1998;75:719-26.
[3] Wong, EPF; Fricke, TR; Dinardo, C. Interexaminer repeatability of a new, modified Prentice card compared with established phoria tests. Optom. Vis. Sci. 2002;79:370-5.
[4] Escalante, JB; Rosenfield, M. Effect of heterophoria measurement technique on the clinical accommodative convergence to accommodation ratio. Optom – J. Am. Optom. Assoc. 2006;77:229-34.
[5] Hirsch, MJ; Bing, LB. The effect of testing method on values obtained for phoria at forty centimeters. Am. J. Optom. Arch. Am. Acad. Optom. 1948;25:407-16.
[6] Hirsch, MJ. Clinical investigation of a method of testing phoria at forty centimeters. Am. J. Optom. Arch. Am. Acad. Optom. 1948;25:492-5.
[7] Goss, DA; Moyer, BJ; Teske, MC. A comparison of dissociated phoria test findings with von Graefe phorometry and modified Thorington testing. J. Behav. Optom. 2008;19:145-9.
Reviewed by Douglas G. Horner, O.D., Ph.D., School of Optometry, Indiana University, Bloomington, IN 47405.
In: Binocular Vision Editors: J. McCoun et al, pp. 161-188
ISBN: 978-1-60876-547-8 © 2010 Nova Science Publishers, Inc.
Chapter 8
TEMPORARILY BLIND IN ONE EYE: EMOTIONAL PICTURES PREDOMINATE IN BINOCULAR RIVALRY
Georg W. Alpers1 and Antje B.M. Gerdes
University of Würzburg and University of Bielefeld, Germany
Abstract
Preferential perception of emotional cues may help an individual to respond quickly and effectively to relevant events. Existing data supports this hypothesis by demonstrating that emotional cues are more quickly detected among neutral distractors. Little data is available to demonstrate that emotional stimuli are also preferentially processed during prolonged viewing. The preferential perception of visual emotional cues is apparent under conditions where different cues compete for perceptual dominance. When two incompatible pictures are presented to one eye each, this results in a perceptual alternation between the pictures, such that only one picture is visible while the other is suppressed. This so called binocular rivalry involves different stages of early visual processing and is thought to be relatively independent from intentional control. Several studies from our laboratory showed that emotional stimuli predominate over neutral stimuli in binocular rivalry. These findings can be interpreted as evidence for preferential processing of emotional cues within the visual system, which extends beyond initial attentional capture. Taken together, data from this paradigm demonstrates that emotional pictures are perceived more intensively.
1 E-mail address: [email protected]. Address for correspondence: PD Dr. Georg W. Alpers, Department of Psychology (Biological Psychology, Clinical Psychology, and Psychotherapy), University of Würzburg, Marcusstraße 9-11, D-97070 Würzburg, Germany. Tel.: 0049-931-31-2840; Fax: 0049-931-31-2733.
Keywords: binocular rivalry, emotional pictures, visual perception, preferential processing
1. Preferential Processing of Emotional Stimuli
We are constantly exposed to a multitude of visual stimuli. Evaluating this information and responding with adequate behavior where necessary helps us to survive. Identifying threatening stimuli seems to be especially important, as we need to repel danger or protect ourselves from it as quickly as possible. Evidence for a privileged role of aversive stimuli in perception and attention processes can be found in a number of convincing research paradigms. For example, visual search tasks show that angry faces can be detected very quickly and pop out among a variety of neutral faces (Hansen and Hansen, 1988; Öhman, Lundqvist, and Esteves, 2001). Similarly, fear relevant stimuli like spiders and snakes surrounded by neutral objects can be found more quickly (Öhman, Flykt, and Esteves, 2001). In the so called dot-probe paradigm, participants respond faster to test probes (letters, for example) that appear on the spot where a fear relevant, as opposed to a neutral, stimulus was presented beforehand (Bradley, Mogg, Millar, Bonham-Carter, Fergussoon, Jenkins et al., 1997; Mogg and Bradley, 2002). Many scientists assume that this allocation of attention is based on automatic processes which operate independently from conscious processing. Convincing evidence for this assumption comes from experiments which reveal psychophysiological reactions to emotional stimuli even when they were not consciously perceived, for example when they were presented subliminally and masked by another picture (Dimberg, Thunberg, and Elmehed, 2000; Öhman and Soares, 1994).
1.1. Two Pathways for the Processing of Emotional Stimuli
Fast and automatic perception of emotionally relevant visual stimuli calls for rapid neuronal processing. The amygdala is central for the processing of emotional cues and especially for fear relevant information (LeDoux, 1996; Morris, Friston, Buchel, Frith, Young, Calder et al., 1998). Based on a number of animal studies, LeDoux (1996) demonstrated that sensory information of external stimuli can reach the amygdala via two relatively independent pathways. On the
one hand, information from the retina projects to the sensory cortex, via the sensory thalamus, for a higher-level analysis. This pathway is rather slow but encompasses detailed information, and LeDoux therefore calls it the “high road” of emotional processing. Cortical areas process this input before it reaches the amygdala, from where emotional reactions can be elicited and modulated. In this way, a more time-consuming and more elaborate processing of emotional stimuli is ensured (Pessoa, Kastner, and Ungerleider, 2002). Even more complex reciprocal influences of emotion and attention can further modulate the processing of visual information within a network of frontal and parietal brain regions (Pessoa, et al., 2002; Windmann, Wehrmann, Calabrese, and Güntürkün, 2006).
Figure 1. Two pathways of processing of visual emotional stimuli (after LeDoux, 1996).
In addition to this cortical pathway, a subcortical pathway for immediate processing of emotional information exists, where sensory information reaches the amygdala via direct thalamic projections. This is thought to be independent from the cortical analysis mentioned above. This direct projection from the thalamus to the amygdala only provides for a crude analysis, but it is much faster than the route from the thalamus to the amygdala via the sensory cortex. Thus, this so called “low road” represents a shortcut which bypasses cortical areas. Along this pathway, information about emotional relevance can reach the amygdala much more rapidly (see Figure 1). This allows for a direct and fast response to potentially dangerous stimuli before the analysis is complete. Thus, emotional reactions that are initiated by the amygdala enable effective fight or flight
responses (e.g., defensive reflexes or an increased respiration rate and heart rate – see Alpers, Mühlberger, and Pauli, 2005).
Figure 2. Amygdala projections to the visual cortex of the macaque brain (Amaral, et al., 2003). L – lateral nucleus; Bmc – magnocellular division; Bi – intermediate division; TEO and TE – areas of the inferior temporal visual cortex.
Support for the independence of the "low road" from cortical processing comes from animal studies and studies with cortically blind patients. They demonstrate that processing of visual emotional information is indeed possible without involvement of intact cortical circuitry. For example, cortically blind patients show physiological responses to emotional stimuli even if they are not able to consciously perceive them (Anders, Birbaumer, Sadowski, Erb, Mader, Grodd et al., 2004; Hamm, Weike, Schupp, Treig, Dressel, and Kessler, 2003). The "low road" has an additional function: It can modulate cortical processing on the "high road". Once again, it was shown in animal studies that there are direct neuronal projections from the amygdala to cortical visual areas (V1 or V2, for example) (Amaral, Behniea, and Kelly, 2003; Amaral and Price, 1984) (see Figure 2). Evidence that such neuronal circuits also exist in humans has since been provided (Catani, Jones, Donato, and Ffytche, 2003). These projections make it possible that "quick and dirty" subcortical processing can influence more
elaborate processing in cortical areas which would allow for enhanced conscious perception and processing (Pessoa, Kastner, and Ungerleider, 2003). Electrophysiological (Schupp, Junghöfer, Weike, and Hamm, 2003; Stolarova, Keil, and Moratti, 2006), as well as hemodynamic evidence (Herrmann, Huter, Plichta, Ehlis, Alpers, Mühlberger et al., 2008; Lang, Bradley, Fitzsimmons, Cuthbert, Scott, Moulder et al., 1998) shows that emotionally relevant stimuli are accompanied by increased activation of visual areas in the occipital lobe. It is very plausible that this may be partly initiated by input from the “low road” in addition to top-down input from higher cortical areas such as directed attention.
1.2. Intensive Processing of Negative Valence or of Arousal?
In addition to multiple findings documenting a preferential processing of negative stimuli in the amygdala, there is growing evidence for an equally intensive processing of arousing positive stimuli. According to the Emotionality Hypothesis, all emotional stimuli are selectively processed, independent of their specific valence. And indeed, processing stimuli which are associated with reward involves similar brain circuits as the processing of cues for danger (Berridge and Winkielman, 2003; Davis and Whalen, 2001). As can be seen in functional magnetic resonance imaging (fMRI) and positron-emission-tomography (PET) studies, processing of positive as well as negative words is associated with higher amygdala activation (Hamann and Mao, 2002). Furthermore, higher amygdala activation in response to positive and negative as opposed to neutral pictures has been observed (Garavan, Pendergrass, Ross, Stein, and Risinger, 2001; Hamann, Ely, Hoffman, and Kilts, 2002). Electroencephalography (EEG) findings also support the notion that strongly activating affective pictures are processed faster and more intensely in the visual cortex (Cuthbert, Schupp, Bradley, Birbaumer, and Lang, 2000; Schupp, et al., 2003). Thus, the intensity of emotional arousal seems to be more crucial than the specific affective valence of a stimulus. Also, these findings suggest that the amygdala may be involved in the processing of positive as well as negative emotional stimuli. In conclusion, it could be expected that positive as well as negative pictures boost visual perception.
2. "Blind" in One Eye: Binocular Rivalry
Information from the environment reaches the two eyes independently. Under normal circumstances this information is combined into a meaningful spatial representation. However, conscious perception does not inevitably represent the physical environment. Instead, perception is the end product of several steps of selective processing. While many stimulus properties are still processed by the eyes’ sensory cells, only a small fraction of them reaches conscious awareness. This is especially obvious when ambivalent information is presented to the two eyes and information cannot be combined to a meaningful impression at later stages of processing in the brain. When competing pictures are presented to the two eyes and a distinct impression cannot be evoked, this results in a fascinating perceptual phenomenon called binocular rivalry. For a given period of time, one of the pictures is perceived dominantly while the other is suppressed and thus removed from conscious awareness. An unpredictable alternation between the two impressions ensues. At times rivalry can also result in percepts which are mixtures composed of parts of both pictures. During extended periods of time, input from one eye is completely suppressed from conscious awareness. Thus, perceptual changes occur while visual input remains constant. Binocular rivalry is a remarkable phenomenon and offers the possibility to further investigate visual perception. Importantly, conscious control over what is perceived during binocular rivalry is very limited (Meng and Tong, 2004; Tong, 2001). In general, binocular rivalry enables the researcher to investigate features of conscious perception, and processes underlying perception, in detail.
2.1. What Helmholtz Knew Already
Binocular rivalry has recently received growing attention in research focused on consciousness, both from the psychological and the neurological point of view, but it is not a newly discovered phenomenon. Binocular rivalry has been a well known phenomenon for a very long time (see Humphrey and Blake, 2001). As early as 1593, Porta reported a perceptual phenomenon which occurred when he held the two pages of a book right in front of his two eyes (cited by Wade and Ono, 1985). He reported being able to temporarily read one of the pages, while the other was invisible. Early systematic investigations of binocular rivalry trace back to Wheatstone (1838). For his experimental investigations he developed the mirror stereoscope –
an optic apparatus which makes it possible to present different pictures to the two eyes (see a modern version in Figure 3). Ever since binocular rivalry has been studied scientifically, the underlying neuronal processes have been discussed controversially. In spite of the multitude of interesting findings, the neuronal mechanisms have not yet been unambiguously clarified. The main disagreement that prevails is whether the competition for conscious perception takes place at very early or later stages of visual processing. One of the earliest theories was introduced by Helmholtz (1924) and many other studies are based on it. He assumed that the eyes’ visual fields are completely independent of each other and that under normal circumstances the consistent perceptual impression does not occur until higher mental processes have taken place. Consequentially, he called the phenomenon “retinal rivalry”. According to this theory, the decision about dominance or suppression would only occur after both pictures have been processed independently and higher mental processes such as attention would select among the two.
Figure 3. The mirror stereoscope used in our laboratory (sample pictures on the computer screen).
Hering (1886), on the other hand, favoured another model. He assumed that early inhibitory interactions in visual processing account for the occurrence of rivalry. This has also been labelled the “low-level” theory. Thus, according to this theory, the decision about dominance or suppression takes place before the two pictures are completely processed.
2.2. Competition between Input from the Eyes or between the Percepts?
This 19th century controversy continues to the present day. According to the “low-level” theory (advocated for example by Blake, 1989), binocular rivalry exhibits typical properties of early visual processes. Therefore, this theory is also called the “eye-rivalry” theory. Rivalry thus occurs by means of inhibitory connections between the monocular processing channels. Evidence for this theory comes from imaging studies showing that rivalry alters activity early in the visual stream, e.g., in V1 (Polonsky, Blake, Braun, and Heeger, 2000; Tong and Engel, 2001) and the lateral geniculate nucleus (Haynes, Deichmann, and Rees, 2005). Importantly, processing in these circuits is generally thought to remain largely separate for the two eyes. Binocular rivalry suppression of one channel would thus mean that input from one eye is not thoroughly processed before being suppressed. In contrast, the “high-level” theory postulates that binocular rivalry arises because of a competition between stimulus information, independent from the source of this input. Thus, this perspective is also called the “stimulus-rivalry” perspective. It assumes that rivalry is decided after primary processing in monocular channels is integrated in binocular channels; that is, the crucial processes are thought to be independent from the fact that this input originated from one eye or the other (Logothetis, Leopold, and Sheinberg, 1996). Thus, according to this theory, rivalry takes place between integrated stimulus representations, rather than information from the two eyes. Support for this perspective comes from single cell recordings in animals which show little evidence for rivalry-correlated activation in V1, inconsistent effects for visual areas V4 and MT, but clear effects in the inferior temporal cortex. These neurophysiological findings favour the theory that rivalry is the result of a competition between incompatible stimulus representations in higher areas of visual processing, i.e. after information from the two eyes has been integrated in V1 (Sheinberg and Logothetis, 1997).
Some findings from human studies also indicate that a certain amount of processing has taken place before the competition is decided. For example, suppressed stimuli can produce after-images (e.g., O'Shea and Crassini, 1981). However, the strongest argument for the “high-level” theory is the finding that fragments of a picture presented to one eye each can be reassembled to a consistent percept in conscious perception (Kovacs, Papathomas, Yang, and Feher, 1996). Rivalry then occurs between combined percepts and not between input from one eye. Interestingly, such rivalry between percepts even occurs when parts of the pictures are intermittently projected to one or to the other eye (Logothetis, et al., 1996). Because convincing support has been found for both theories, many authors conclude that binocular rivalry may involve several stages of visual processing (Nguyen, Freeman, and Alais, 2003; Wilson, 2003). From this point of view, the theories are not mutually exclusive, but rather characterize two extremes on a continuum. Rivalry considerably suppresses activation in the primary processing channels, but enough information about the suppressed picture can proceed to brain circuits which process integrated information from both monocular channels.
2.3. Possible Influences from Non-visual Neuronal Circuits
Taking into account the projections from the amygdala to primary sensory areas of the visual cortex (V1, V2) which we mentioned above (Amaral, et al., 2003), it becomes apparent that this may be an avenue for a picture's emotional significance to influence processing within visual circuitry. If the relevance of a stimulus has been detected within subcortical circuits (“low road”), this may lead to more intense visual processing at several later stages of visual perception. Indeed, it has been shown that the amygdala is more strongly activated by fearful faces, even when they are suppressed in binocular rivalry. Thus, although the emotional material was not available to conscious perception (confirmed by an absence of activation in specialized face-sensitive regions of the ventral temporal cortex), emotional circuitry was activated (Pasley, Mayes, and Schiltz, 2004; Williams, Morris, McGlone, Abbott, and Mattingley, 2004). Thus, we proposed that such preferential processing of emotional pictures may result in their predominance over neutral pictures in binocular rivalry. Because conscious control over binocular rivalry is probably not possible (Meng and Tong, 2004), we argued that demonstrating such predominance would provide particularly convincing evidence for preferential processing during prolonged perception.
3. Previous Investigations of Emotional Pictures in Binocular Rivalry
3.1. Significance and Predominance
Until recently, very few studies have examined the influence of a picture's significance on predominance and suppression in binocular rivalry. An early study by Engel (1956) demonstrated that upright pictures of faces predominate over inverted faces, indicating that the greater salience of the familiar upright faces boosts their competitive strength in binocular rivalry. Bagby (1957) investigated the possible influence of personal relevance by showing pairs of pictures containing culturally relevant material to participants with different cultural backgrounds. In North American participants the typically American scene (a baseball player) predominated over less relevant material (a matador), while the latter predominated in participants of Mexican descent. Personal relevance and familiarity thus seem to have an influence on perception in binocular rivalry. However, it has to be acknowledged that response biases (for example whether a picture is easier to specify) have not been taken into account in these studies. Additional studies further support that perception in binocular rivalry is modulated by personal significance of the stimuli. Kohn (1960) as well as Shelley and Toch (1962) showed that certain personality traits (e.g., aggressiveness) can influence the predominance of related stimuli (violent scenes). It was also reported that depressed participants predominantly perceive pictures with sad content compared to healthy control participants (Gilson, Brown, and Daves, 1982). Enthusiasm for this paradigm was hampered by other disappointing results. Blake (1988) did not find predominance of meaningful texts over meaningless strings of letters. However, it is important to note that the text elements were not shown at once but as a stream of letters; thus, holistic processing was not possible. Rivalry would have had to occur between non-salient letters of salient words. Interest in this research paradigm waned for many years after a comprehensive review by Walker (1978) highlighted the methodological problems with early studies. Walker's main critique was that studies investigating the influence of picture content did not apply stringent definitions of predominance. He concluded that response biases might have been the main cause of the observed results.
Today, interest in this paradigm has been renewed and a number of well controlled experiments have been conducted since. Results convincingly demonstrate that pictures with a good Gestalt predominate over meaningless pictures (de Weert, Snoeren, and Koning, 2005), and that a meaningful context of pictures promotes predominance (Andrews and Lotto, 2004). Taken together, these investigations, more clearly than former studies, support the hypothesis that semantic contents of pictures can affect binocular rivalry.
3.2. Emotional Discrepancy and Binocular Rivalry
Although an influence of semantic content of pictures on binocular rivalry thus seems very likely, there is little evidence that emotional salience can also promote dominance in rivalry. There are few studies in which different emotional pictures were presented stereoscopically. One study investigated the effect of different emotional facial expressions on their relative dominance (Coren and Russell, 1992). Here, pairs of pictures showing different facial expressions were presented to one eye each, and after a presentation time of 350 msec participants were asked what their predominant percept was. Facial expressions with strong positive or strong negative valence and high subjective arousal were more frequently perceived as predominant than faces with less extreme ratings. Overall, valence seemed to have had the strongest impact, while arousal mainly influenced dominance when the rivalling pictures had the same valence. It is problematic that only emotional faces were presented together and that the presentation time was very short, because binocular rivalry takes time to build up. Moreover, asking the participants about their percept after the presentation introduced sources of error such as memory effects and response biases. Another study investigated perception of different facial expressions in binocular rivalry, but its aim was to document the two-dimensional structure of emotions (on scales of valence and arousal) and predominance of emotional pictures was not the center of its design (Ogawa, Takehara, Monchi, Fukui, and Suzuki, 1999). Pairs of pictures with different emotional facial expressions were presented and the perceived impression was rated for valence and arousal. A second and very similar study by the same group measured specific perceptual impressions during the presentation in addition to the dimensional structure of the evaluations (Ogawa and Suzuki, 2000). The authors conclude that there is more rivalry between pictures which are evaluated similarly on valence and arousal, while pictures with stronger valence and higher arousal are more unambiguously perceived as dominant.
Taken together, there is some evidence that emotional meaning influences binocular rivalry, but until now only a few studies have investigated this influence systematically.
4. Binocular Rivalry Experiments at Our Lab
4.1. Predominance of Emotional Scenes
We argue that binocular rivalry provides for very interesting experiments in emotion research, but our review of the literature demonstrated that this question has not been thoroughly investigated. There is a general lack of recent and methodologically convincing studies. The first study on binocular rivalry from our lab investigated whether complex emotional pictures predominate over neutral pictures (Alpers and Pauli, 2006). Sixty-four healthy participants were stereoscopically shown ten pairs of pictures from the International Affective Picture System (IAPS, Lang, Bradley, and Cuthbert, 2005). These pictures depict different emotional scenes and are frequently used as stimuli in emotion research. Using this well-established picture material allows for a direct comparison of the results with data from other established paradigms. Because positive as well as negative pictures elicit similar patterns of activation in areas responsible for emotional processing (see above), we chose pictures of negative, neutral and positive valence for this study. If pictures with emotionally activating content predominated over neutral pictures in binocular rivalry, this would suggest that emotional input is preferentially processed at an early stage of visual processing. Thus, pairs of pictures were composed with one emotional (positive or negative) and one neutral picture each. These pairs were presented for 30 sec in a randomized order. Participants looked through a mirror stereoscope that was mounted about 30 cm in front of the computer monitor. Thus only one picture was projected to each eye. In contrast to earlier studies, participants had to continuously verbalize what their perceptual impression was throughout each trial. Participants were not explicitly instructed to categorize emotional and neutral percepts, but were simply asked to report the content they saw. A trained research assistant coded the participants' comments as emotional or neutral with button presses. A ratio of the cumulative time for emotional versus neutral percepts was used as the dominance index. We assessed the initial perceptual impression in each trial as another dependent variable because this is thought to be less strongly affected by habituation and less error-prone with regard to verbalization.
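The dominance index is not spelled out as a formula in the text; the minimal sketch below illustrates how such an index could be computed from a coded button-press record. The event structure, the function name, and the choice of a simple emotional-to-neutral time ratio are illustrative assumptions, not the authors' actual analysis code.

```python
# Minimal sketch (illustrative assumptions, not the published analysis):
# given a 30-s trial coded as a sequence of (onset_time, label) events,
# accumulate how long each percept was reported and form a dominance ratio.

def dominance_index(events, trial_end=30.0):
    """events: list of (onset_seconds, label) with label in
    {'emotional', 'neutral', 'mixed'}, sorted by onset."""
    durations = {'emotional': 0.0, 'neutral': 0.0, 'mixed': 0.0}
    for i, (onset, label) in enumerate(events):
        offset = events[i + 1][0] if i + 1 < len(events) else trial_end
        durations[label] += offset - onset
    # ratio of cumulative emotional to cumulative neutral viewing time
    return durations, durations['emotional'] / max(durations['neutral'], 1e-9)

# Example trial: emotional percept reported first, then mixed, neutral, emotional.
trial = [(0.0, 'emotional'), (9.5, 'mixed'), (12.0, 'neutral'), (21.0, 'emotional')]
cum, ratio = dominance_index(trial)
print(cum, ratio)  # emotional 18.5 s, neutral 9.0 s, mixed 2.5 s; ratio ~ 2.06
```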
First of all, our results confirmed that emotional pictures are perceived as dominant for considerably longer than neutral pictures. Also, the initial perceptual impression was significantly more often that of the emotional as opposed to the neutral picture. Differences in predominance were not found between positive and negative pictures, either in the duration of the percept or in the frequency of the initial percept. This experiment clearly shows that pictures of emotional scenes presented to one eye predominate over neutral pictures presented to the other eye. This confirms our hypothesis that emotional content can boost visual perception. As clear cut as the results of this experiment may be, a number of serious limitations have to be considered. First, verbal coding is certainly prone to response biases. The tendency to more often mention emotional picture contents could have had an influence on the results, as well as a tendency to avoid verbalizing specific picture content because some contents (erotic pictures, for example) may have been embarrassing for the participant. Verbalizing unpleasant issues (repellent and disgusting details) could also pose a problem. Furthermore, relatively large pictures were used in this study, and this often leads to the perception of mixed pictures, so called piecemeal rivalry (O'Shea, Sims, and Govan, 1997). Although it was also possible to report mixed pictures, it remains unclear whether participants applied different decision criteria for reporting mixed or unambiguous percepts. In conclusion, these results support the hypothesis that binocular rivalry is influenced by the emotional contents of two competing pictures.
4.1.1. Possible Confounds
Among the undesirable confounding factors which can potentially influence perceptual dominance, potential physical differences in the pictures' complexity and color are certainly most important. However, to a large extent such differences are closely associated with the emotional content (Lang et al., 1998). Binocular rivalry is strongly influenced by certain physical characteristics. For example, larger pictures tend to fuse with each other more strongly than smaller pictures (i.e. the rate at which rivalry occurs is diminished) (O'Shea, et al., 1997). High-contrast pictures are perceived as dominant more often than low-contrast pictures (Blake, 1989). Brighter pictures dominate more frequently than darker pictures (Kaplan and Metlay, 1964) and moving pictures dominate over stable pictures (Blake and Logothetis, 2002). The studies summarized below are aimed at controlling for physical characteristics and at controlling for possible problems with self report.
4.2. Dominance of Emotional Facial Expressions
In order to further pursue the question of preferential perception of emotional stimuli with the aid of binocular rivalry, and with stimuli which are better controlled for physical differences, we conducted an experiment with pictures of different facial expressions (Alpers and Gerdes, 2007). Presenting emotional faces is especially useful in emotion research because these stimuli are evolutionarily relevant (Dimberg, et al., 2000), largely independent from culture (Ekman, Sorenson, and Friesen, 1969) and because they provoke emotional reactions in every day life (Dimberg, 1982). Emotional facial expressions are processed very rapidly (Jeffreys, 1996) and holistically (Farah, Wilson, Drain, and Tanaka, 1998) via a specialized subcortical route (Johnson, 2005). Emotional faces attract attention, as can be seen in visual search and dot-probe paradigms (Mogg and Bradley, 2002). They elicit subcortical activation as well as peripheral physiological and behavioral reactions (Whalen, Kagan, Cook, Davis, Kim, Polis et al., 2004; Whalen, Rauch, Etcoff, McInerney, Lee, and Jenike, 1998) even when they are masked and thus not consciously perceived. For our study on binocular rivalry we chose pictures of eight actresses, each of them showing angry, frightened, happy, surprised and neutral facial expressions, from a standardized set of pictures (Karolinska Directed Emotional Faces, KDEF, Lundqvist, Flykt, and Öhman, 1998). These pictures are very well standardized regarding physical characteristics such as background and brightness. Pairwise presentation of one neutral and one emotional facial expression of the same actress made it possible for us to adequately control for inter-individual differences between the actresses. During the stereoscopic presentation the 30 participants continuously coded their perception of emotional, neutral or mixed pictures by button presses. Again, there was clear evidence for predominance of emotional stimuli compared to neutral stimuli, both for the cumulative duration for which each percept was seen and for the initial percept seen during each trial (see Figure 4). Emotional faces were consciously perceived by the participants significantly longer throughout the trial and they were significantly more often perceived as the first clear percept of a trial. Contrary to our expectation, there were no differences between positive and negative facial expressions. These findings support our earlier findings of predominance of emotional pictures over neutral pictures in binocular rivalry. In this study we were able to take into account potential limitations of the first study. Predominance of emotional faces is equally
strong as that of emotional IAPS pictures, although the latter normally induce a higher arousal than emotional faces (see Adolph, Alpers, and Pauli, 2006).
Figure 4. Results from the experiment with emotional faces (Alpers and Gerdes, 2007; with permission from APA). Left panel: Cumulative duration of dominant perception (in seconds) and standard errors of the mean, for emotional, neutral and mixed pictures. Right panel: Average frequency of initial perception and standard errors of emotional, neutral and mixed pictures.
As participants did not have to verbalize what they perceived and the categorical classification of perceived facial expressions was easy (emotional vs. neutral), the likelihood of response biases was clearly reduced in this study. Nonetheless, coding of participants' perception was still based on self-report. Thus, biases cannot be completely ruled out.
4.3. Inter-Individual Differences: Phobic Stimuli
After having demonstrated that emotional pictures are in fact dominant over neutral pictures, we addressed another early hypothesis of the binocular rivalry literature: Are inter-individual differences reflected in what people perceive in binocular rivalry? With the help of a further improved experimental design, we investigated whether fear relevant stimuli are perceived as more dominant by individuals who
are characterized by high levels of fear. We recruited patients with a specific phobia of spiders because they are especially well suited for this investigation. The presentation of phobia-related stimuli elicits strong emotional reactions (Globisch, Hamm, Esteves, and Öhman, 1999). That phobic material activates subcortical networks, which may in turn prime or boost the activity of the visual cortex, has been documented by stronger amygdala activation in response to phobic cues (e.g., Alpers, Gerdes, Lagarie, Tabbert, Vaitl, and Stark, submitted). Remarkably, possible influences of mere physical differences of pictures (emotional versus neutral) are not problematic in this endeavor because they should affect phobic and non-phobic participants in equal measure. Different degrees of dominance between patient and control participants would also support the theory that phobia-related cues are processed more intensely in phobic participants. A group of 23 patients who met diagnostic criteria for spider phobia (DSM-IV, American Psychiatric Association, 1994) and 20 non-phobic control participants were recruited for this investigation (Alpers and Gerdes, 2006). Twenty different pictures of spiders and flowers were presented stereoscopically. Different from previous studies, all of these pictures were paired with an abstract pattern. We hoped that this would minimize the problems related to response biases and to interindividual differences in decision criteria. These picture-pattern pairs were presented for 8 sec each (see Figure 5). Participants were asked to continuously code their perceptual impression by pressing one of three different buttons. There was one button for the dominant perceptual impression of a picture (spider or flower), one for the abstract pattern, and one for mixed percepts. The advantages of this approach were that the specific content of a picture did not have to be identified and that no decision between two semantic pictures was needed.
Figure 5. Stimulus material for the experiment with phobic patients (Alpers and Gerdes, 2006): examples of a spider picture, the abstract pattern, and a flower.
Figure 6. Results from the experiment with phobic patients; Left panel: mean duration of dominant perception (with standard errors) of pictures of spiders, patterns and mixed pictures (crossed bars: phobic; empty bars: control participants). Right panel: mean duration of dominant perception (with standard errors) of pictures of flowers, patterns and mixed pictures (crossed bars: phobic; empty bars: control participants).
Spider phobic participants perceived pictures of spiders as dominant for longer periods of time during the trials and they also reported that they perceived phobic pictures as the first clear percept of a trial more often than control participants. At the same time, there were no group differences for pictures of flowers versus the pattern; groups did not differ in the duration with which they perceived pictures of flowers as dominant or in how often they reported seeing flowers as the first percept in a trial (see Figure 6). This study replicates our previous findings in showing that emotional pictures modulate competition in binocular rivalry. In addition, we were able to demonstrate that personal relevance is reflected in dominance of specific phobia-related cues. We were also able to support the theoretically founded assumption that phobia-related cues are preferentially processed by individuals with spider phobia. When we compared negatively valenced and positively valenced pictures in earlier studies, no differences were apparent. With respect to interindividual differences, we have no data concerning personal relevance of positive pictures at this point.
4.4. Controlling for Physical Properties of Stimuli
Although the findings we reported above clearly support the hypothesis that emotional picture content results in more predominant perception, it cannot be completely ruled out that physical properties of pictures, as well as self-report of perception, could have influenced the results in these studies. To account for this and to validate our previous findings, we conducted four more experiments with the objective of largely eliminating the influence of physical properties of stimuli and minimizing response biases in self-report of perception. In the first experiment, we presented two physically identical geometric gratings which only differed in spatial orientation. In order to introduce differences in emotional valence we used a differential fear conditioning paradigm. Thus, the two stimuli which had neutral valence at the outset were given different emotional valence by our experimental manipulation (Alpers, Ruhleder, Walz, Mühlberger, and Pauli, 2005). Interestingly, we were able to interleave conditioning and binocular rivalry trials. This helped us to document that more aversive experience with a grating of a given orientation changed its influence on predominance across trials. As a result, the aversively conditioned pattern was perceived as the dominant percept for a longer period of time during the trials when compared to the perception before the fear conditioning, and it was reported more and more frequently as the first percept of a trial. However, these effects were rather small compared with the effects in studies using emotional IAPS pictures and emotional facial expressions. This can probably be best explained by the fact that geometric patterns are not evolutionarily relevant, even if they acquire emotional relevance after fear conditioning. Thus, underlying neuronal processes of emotional processing are probably less pronounced here than in experiments with stimuli that are evolutionarily prepared or are naturally relevant. In a second study we presented schematic emotional faces, which are more biologically relevant but differ in physical characteristics. We designed neutral, positive, and negative facial expressions by arranging nearly identical picture elements (also see Lundqvist, Esteves, and Öhman, 2004). Although those faces are rather simple, several studies have demonstrated that schematic faces can elicit similar emotional reactions as photographs of faces (Bentin and Gollanda, 2002; Eastwood, Smilek, and Merikle, 2003; Lundqvist and Öhman, 2005). In this experiment schematic emotional faces clearly predominated compared with neutral faces (Experiment 2, Alpers and Gerdes, 2007). The pattern of results was very similar to the pattern reported above for photographic emotional faces. Taken together, both control experiments demonstrate that
dominance of emotional stimuli cannot be exclusively attributed to covarying physical differences between neutral and emotional pictures.
4.5. Validation of Self-report
Two further studies were aimed at verifying the participants' self report of perception during binocular rivalry. Similar to the conditioning experiment introduced above, two geometric patterns were shown stereoscopically. In order to obtain an objective measure of the participants' perception we coded each stimulus with one of two flicker frequencies. When perceived dominantly, each frequency resulted in a distinguishable EEG signal (steady-state visually evoked potentials) (Brown and Norcia, 1997). With this approach, we were able to demonstrate that the participants' self-report of what they saw corresponded with the respective objective EEG signal of the dominant pattern over occipital regions (Alpers, et al., 2005). Another experiment was based on the finding that changes in the suppressed picture are harder to detect than changes in the dominant picture (Fox and Check, 1968; Freeman and Jolly, 1994; Nguyen, Freeman, and Wenderoth, 2001). In this experiment, we again presented emotional and neutral faces from the KDEF picture set (Alpers and Gerdes, 2007). In addition, in the course of a trial, we occasionally presented a small dot in either the emotional or the neutral picture. Participants were asked to press a button when they detected a probe. More dots were detected in the emotional pictures, which were dominant more often, and reaction times were shorter when dots were detected in emotional pictures compared to neutral ones. Taken together, these two experiments may provide support for the validity of our participants' self-report of their perception in binocular rivalry.
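The frequency-tagging logic can be illustrated with a small sketch: each stimulus flickers at its own rate, and the spectrum of an occipital EEG channel is read out at the two tag frequencies to infer which stimulus is currently dominant. The sampling rate, the tag frequencies, and the simple FFT read-out below are assumptions for illustration; the original study's exact parameters and analysis are not given here.

```python
# Illustrative sketch of steady-state VEP frequency tagging (assumed parameters,
# not the published analysis): compare spectral power at the two tag frequencies.
import numpy as np

fs = 250.0                    # assumed EEG sampling rate (Hz)
f_left, f_right = 7.5, 10.0   # assumed flicker frequencies of the two stimuli

def tag_power(eeg, fs, f_tag):
    """Power of an occipital EEG segment at the tagged flicker frequency."""
    spectrum = np.abs(np.fft.rfft(eeg * np.hanning(len(eeg)))) ** 2
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - f_tag))]

# Simulated 4-s occipital segment in which the 7.5 Hz stimulus dominates.
t = np.arange(0, 4.0, 1.0 / fs)
eeg = 2.0 * np.sin(2 * np.pi * f_left * t) + 0.5 * np.sin(2 * np.pi * f_right * t)
eeg += np.random.randn(len(t))  # background noise

if tag_power(eeg, fs, f_left) > tag_power(eeg, fs, f_right):
    print("left-eye stimulus dominant")   # expected for this simulated segment
else:
    print("right-eye stimulus dominant")
```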
4.6. Summary
The series of studies presented here documents that emotional visual stimuli clearly predominate in binocular rivalry. The findings are consistent across a variety of stimuli such as emotional scenes, emotional facial expressions or aversively conditioned stimuli. In addition, differences in perception between differentially affected groups of people were documented using phobia-related pictures. Moreover, we can largely rule out effects of physical differences or response biases on our results.
These results are consistent with other findings showing that meaningful pictures dominate over meaningless ones in binocular rivalry (Yu and Blake, 1992). However, as to the mechanisms, we have not yet shown that dominance was mediated by activation of emotional neuronal circuits (the amygdala, for example). Nonetheless, predominance of emotional stimuli in binocular rivalry is another piece of evidence for preferential processing in the visual stream. Whether this is based on automatic processes (Öhman, 2005), higher order (cortical) attentional processes (Pessoa, et al., 2003) or an interaction of both is a challenging problem for future research.
5. Conclusion: "Blind" in One Eye - But not When It Comes to Emotion
The preferential perception of emotional pictures in binocular rivalry is clearly consistent with results from other experimental paradigms, such as the faster detection of emotional stimuli in search tasks (Hansen and Hansen, 1988; Öhman, et al., 2001). Furthermore, the results are consistent with findings from psychophysiological studies which show stronger activation of the visual cortex when looking at emotional pictures (Alpers, et al., 2005; Herrmann, et al., 2008; Schupp, et al., 2003). An evolutionary advantage of faster detection of potentially meaningful stimuli may account for an easier processing of emotional material (Öhman and Mineka, 2001). However, with the paradigm we described here we were not able to verify whether activation of emotional neuronal circuits is in fact responsible for the competitive strength of emotional pictures in binocular rivalry. With regard to neuronal substrates in which binocular rivalry is processed, it can be hypothesized that feedback from subcortical circuitry such as the amygdala (Amaral, Price, Pitkanen, and Carmichael, 1992) and the anterior cingulate cortex (Posner and Raichle, 1995) may be involved in processes leading to conscious perception. As we explained in the introduction, emotional material activates the amygdala in binocular rivalry even when it is suppressed. Because different stages of processing in primary and extrastriate areas are involved in binocular rivalry (Kovacs, et al., 1996; Logothetis, et al., 1996; Sheinberg and Logothetis, 1997), it is apparent that influences from emotional processing centers could take effect. While some experiments with different paradigms found emotion-specific effects which suggest that preferential processing is specific to negative pictures (Hansen and Hansen, 1988; Öhman, Lundqvist, et al., 2001), our experiments with binocular rivalry point to a dominance of both positive and negative pictures.
Although it seems to be particularly reasonable, from an evolutionary perspective, to preferentially process negative stimuli, there are several findings which show that positive stimuli are also processed preferentially (Garavan, et al., 2001; Hamann and Mao, 2002). Effects of arousal irrespective of valence are also evident in event-related potentials of the EEG (Cuthbert, et al., 2000). Furthermore, automatic allocation of attention was found for both positive and negative pictures (Alpers, 2008; Chen, Ehlers, Clark, and Mansell, 2002). It might be left to the higher cortical circuits to precisely analyze the specific valence and to control appropriate approach or avoidance responses. In conclusion, positive and negative picture content seems to influence perception in binocular rivalry, largely independent of physical differences. This yields more evidence for a preferential processing of emotional stimuli in the visual system. This series of experiments provides some of the first evidence that emotional stimuli are also preferentially processed during prolonged viewing. The use of the binocular rivalry paradigm could make an essential contribution to further investigations of emotional influences on visual perception.
References
Adolph, D., Alpers, G. W., and Pauli, P. (2006). Physiological reactions to emotional stimuli: A comparison between scenes and faces. In H. Hecht, S. Berti, G. Meinhardt and M. Gamer (Eds.), Beiträge zur 48. Tagung experimentell arbeitender Psychologen (pp. 235). Lengerich: Pabst.
Alpers, G. W. (2008). Eye-catching: Right hemisphere attentional bias for emotional pictures. Laterality: Asymmetries of Body, Brain and Cognition, 13, 158-178.
Alpers, G. W., and Gerdes, A. (2006). Im Auge des Betrachters: Bei Spinnenphobikern dominieren Spinnen die Wahrnehmung. In G. W. Alpers, H. Krebs, A. Mühlberger, P. Weyers and P. Pauli (Eds.), Wissenschaftliche Beiträge zum 24. Symposium der Fachgruppe Klinische Psychologie und Psychotherapie (pp. 87). Lengerich: Pabst.
Alpers, G. W., and Gerdes, A. B. M. (2007). Here's looking at you: Emotional faces predominate in binocular rivalry. Emotion, 7, 495-506.
Alpers, G. W., Gerdes, A. B. M., Lagarie, B., Tabbert, K., Vaitl, D., and Stark, R. (submitted). Attention modulates amygdala activity: When spider phobic patients do not attend to spiders.
Alpers, G. W., Herrmann, M. J., Pauli, P., and Fallgatter, A. J. (2005). Emotional arousal and activation of the visual cortex: A near infrared spectroscopy analysis [abstract]. Journal of Psychophysiology, 19, 106.
Alpers, G. W., Mühlberger, A., and Pauli, P. (2005). Angst - Neuropsychologie [Anxiety - Neuropsychology]. In H. Förstl, M. Hautzinger and G. Roth (Eds.), Neurobiologie psychischer Störungen [Neurobiology of mental disorders] (pp. 523-544). Heidelberg: Springer.
Alpers, G. W., and Pauli, P. (2006). Emotional pictures predominate in binocular rivalry. Cognition and Emotion, 20, 596-607.
Alpers, G. W., Ruhleder, M., Walz, N., Mühlberger, A., and Pauli, P. (2005). Binocular rivalry between emotional and neutral stimuli: A validation using fear conditioning and EEG. International Journal of Psychophysiology, 57, 25-32.
Amaral, D. G., Behniea, H., and Kelly, J. L. (2003). Topographic organization of projections from the amygdala to the visual cortex in the macaque monkey. Neuroscience, 118, 1099-1120.
Amaral, D. G., Price, J. L., Pitkanen, A., and Carmichael, S. (1992). Anatomical organization of the primate amygdaloid complex. In J. P. Aggleton (Ed.), The Amygdala (pp. 1-66). New York: Wiley-Liss.
Anders, S., Birbaumer, N., Sadowski, B., Erb, M., Mader, I., Grodd, W., et al. (2004). Parietal somatosensory association cortex mediates affective blindsight. Nature Neuroscience, 7, 339-340.
Andrews, T. J., and Lotto, R. B. (2004). Fusion and rivalry are dependent on the perceptual meaning of visual stimuli. Current Biology, 14, 418-423.
Bagby, J. W. (1957). A cross-cultural study of perceptual predominance in binocular rivalry. Journal of Abnormal and Social Psychology, 54, 331-334.
Bentin, S., and Gollanda, Y. (2002). Meaningful processing of meaningless stimuli: The influence of perceptual experience on early visual processing of faces. Cognition, 86, B1-B14.
Berridge, K. C., and Winkielman, P. (2003). What is an unconscious emotion? (The case for unconscious "liking"). Cognition and Emotion, 17, 181-211.
Blake, R. (1988). Dichoptic reading: The role of meaning in binocular rivalry. Perception and Psychophysics, 44, 133-141.
Blake, R. (1989). A neural theory of binocular rivalry. Psychological Review, 96, 145-167.
Blake, R., and Logothetis, N. K. (2002). Visual competition. Nature Reviews Neuroscience, 3, 13-21.
Bradley, B. P., Mogg, K., Millar, N., Bonham-Carter, C., Fergussoon, E., Jenkins, J., et al. (1997). Attentional biases for emotional faces. Cognition and Emotion, 11, 25-42.
Brown, R. J., and Norcia, A. M. (1997). A method for investigating binocular rivalry in real-time with the steady-state VEP. Vision Research, 37, 2401-2408.
Catani, M., Jones, D. K., Donato, R., and Ffytche, D. H. (2003). Occipitotemporal connections in the human brain. Brain, 126, 2093-2107.
Chen, Y. P., Ehlers, A., Clark, D. M., and Mansell, W. (2002). Patients with generalized social phobia direct their attention away from faces. Behaviour Research and Therapy, 40, 677-687.
Coren, S., and Russell, J. A. (1992). The relative dominance of different facial expressions of emotion under conditions of perceptual ambiguity. Cognition and Emotion, 6, 339-356.
Cuthbert, B. N., Schupp, H. T., Bradley, M. M., Birbaumer, N., and Lang, P. J. (2000). Brain potentials in affective picture processing: Covariation with autonomic arousal and affective report. Biological Psychology, 52, 95-111.
Davis, M., and Whalen, P. J. (2001). The amygdala: Vigilance and emotion. Molecular Psychiatry, 6, 13-34.
de Weert, C. M., Snoeren, P. R., and Koning, A. (2005). Interactions between binocular rivalry and Gestalt formation. Vision Research, 45, 2571-2579.
Dimberg, U. (1982). Facial reactions to facial expressions. Psychophysiology, 19, 643-647.
Dimberg, U., Thunberg, M., and Elmehed, K. (2000). Unconscious facial reactions to emotional facial expressions. Psychological Science, 11, 86-89.
Eastwood, J. D., Smilek, D., and Merikle, P. M. (2003). Negative facial expression captures attention and disrupts performance. Perception and Psychophysics, 65, 352-358.
Ekman, P., Sorenson, E. R., and Friesen, W. V. (1969). Pan-cultural elements in facial displays of emotion. Science, 164, 86-88.
Engel, E. (1956). The role of content in binocular resolution. American Journal of Psychology, 69, 87-91.
Farah, M. J., Wilson, K. D., Drain, M., and Tanaka, J. N. (1998). What is "special" about face perception? Psychological Review, 105, 482-498.
Fox, R., and Check, R. (1968). Detection of motion during binocular rivalry suppression. Journal of Experimental Psychology, 78, 388-395.
Freeman, A. W., and Jolly, N. (1994). Visual loss during interocular suppression in normal and strabismic subjects. Vision Research, 34, 2043-2050.
Garavan, H., Pendergrass, J. C., Ross, T. J., Stein, E. A., and Risinger, R. C. (2001). Amygdala response to both positively and negatively valenced stimuli. Neuroreport, 12, 2779-2783.
Gilson, M., Brown, E. C., and Daves, W. F. (1982). Sexual orientation as measured by perceptual dominance in binocular rivalry. Personality and Social Psychology Bulletin, 8, 494-500.
Globisch, J., Hamm, A. O., Esteves, F., and Öhman, A. (1999). Fear appears fast: Temporal course of startle reflex potentiation in animal fearful subjects. Psychophysiology, 36, 66-75.
Hamann, S., and Mao, H. (2002). Positive and negative emotional verbal stimuli elicit activity in the left amygdala. Neuroreport, 13, 15-19.
Hamann, S. B., Ely, T. D., Hoffman, J. M., and Kilts, C. D. (2002). Ecstasy and agony: Activation of the human amygdala in positive and negative emotion. Psychological Science, 13, 135-141.
Hamm, A. O., Weike, A. I., Schupp, H. T., Treig, T., Dressel, A., and Kessler, C. (2003). Affective blindsight: Intact fear conditioning to a visual cue in a cortically blind patient. Brain, 126, 267-275.
Hansen, C. H., and Hansen, R. D. (1988). Finding the face in the crowd: An anger superiority effect. Journal of Personality and Social Psychology, 54, 917-924.
Haynes, J. D., Deichmann, R., and Rees, G. (2005). Eye-specific effects of binocular rivalry in the human lateral geniculate nucleus. Nature, 438, 496-499.
Helmholtz, H. v. (1924). Helmholtz's treatise on physiological optics. In J. P. C. Southall (Ed.). Rochester, N.Y.: The Optical Society of America. (Translated from the 3rd German edition, 1909)
Hering, E. (1964). Outlines of a theory of the light sense (originally published in 1886). Cambridge, Mass.: Harvard University Press.
Herrmann, M. J., Huter, T. J., Plichta, M. M., Ehlis, A.-C., Alpers, G. W., Mühlberger, A., et al. (2008). Enhancement of neural activity of the primary visual cortex for emotional stimuli measured with event-related functional near infrared spectroscopy (NIRS). Human Brain Mapping, 29, 28-35.
Humphrey, G. K., and Blake, R. (2001). Introduction [Special issue on binocular rivalry]. Brain and Mind, 2, 1-4.
Jeffreys, D. A. (1996). Evoked potential studies of face and object processing. Visual Cognition, 3, 1-38.
Johnson, M. H. (2005). Subcortical face processing. Nature Reviews Neuroscience, 6, 766-774.
Kaplan, I. T., and Metlay, W. (1964). Light intensity and binocular rivalry. Journal of Experimental Psychology, 67.
Temporarily Blind in One Eye
185
Kohn, H. (1960). Some personality variables associated with binocular rivalry. The Psychological Record, 10, 9-13. Kovacs, I., Papathomas, T. V., Yang, M., and Feher, A. (1996). When the brain changes its mind: Interocular grouping during binocular rivalry. Proceedings of the National Academy of Sciences of the United States of America, 93, 15508-15511. Lang, P. J., Bradley, M. M., and Cuthbert, B. N. (2005). International affective picture system (IAPS) :Affective ratings of pictures and instruction manual. Technical Report A-6. University of Florida, Gainesville, FL. Lang, P. J., Bradley, M. M., Fitzsimmons, J. R., Cuthbert, B. N., Scott, J. D., Moulder, B., et al. (1998). Emotional arousal and activation of the visual cortex: an fMRI analysis. Psychophysiology, 35, 199-210. Larson, C. L., Schaefer, H. S., Siegle, G. J., Jackson, C. A. B., Anderle, M. J., and Davidson, R. J. (2006). Fear is fast in phobic individuals: Amygdala activation in response to fear-relevant stimuli. Biological Psychiatry, 60, 410417. LeDoux, J. (1996). The emotional brain: The mysterious underpinnings of emotional life. New York: Simon and Schuster. Logothetis, N. K., Leopold, D. A., and Sheinberg, D. L. (1996). What is rivalling during binocular rivalry? Nature, 380, 621-624. Lundqvist, D., Esteves, F., and Öhman, A. (2004). The face of wrath: The role of features and configurations in conveying social threat. Cognition and Emotion, 18, 161-182. Lundqvist, D., Flykt, A., and Öhman, A. (1998). The Karolinska Directed Emotional Faces (KDEF). Stockholm: Karolinska Institute. Lundqvist, D., and Öhman, A. (2005). Emotion regulates attention: The relation between facial configurations, facial emotion, and visual attention. Visual Cognition, 12, 51-84. Meng, M., and Tong, F. (2004). Can attention selectively bias bistable perception? Differences between binocular rivalry and ambiguous figures. Journal of Vision, 4, 539-551. Mogg, K., and Bradley, B. P. (2002). Selective orienting of attention to masked threat faces in social anxiety. Behaviour Research and Therapy, 40, 1403– 1414. Morris, J. S., Friston, K. J., Buchel, C., Frith, C. D., Young, A. W., Calder, A. J., et al. (1998). A neuromodulatory role for the human amygdala in processing emotional facial expressions. Brain, 121, 47-57.
186
Georg W. Alpers and Antje B.M. Gerdes
Nguyen, V. A., Freeman, A. W., and Alais, D. (2003). Increasing depth of binocular rivalry suppression along two visual pathways. Vision Research, 43, 2003-2008. Nguyen, V. A., Freeman, A. W., and Wenderoth, P. (2001). The depth and selectivity of suppression in binocular rivalry. Perception and Psychophysics, 63, 348-360. O'Shea, R. P., and Crassini, B. (1981). Interocular transfer of the motion aftereffect is not reduced by binocular rivalry. Vision Research, 21, 801-804. O'Shea, R. P., Sims, A. J. H., and Govan, D. G. (1997). The effect of spatial frequency and field size on the spread of exclusive visibility in binocular rivalry. Vision Research, 37, 175-183. Ogawa, T., and Suzuki, N. (2000). Emotion space as a predictor of binocular rivalry. Perceptual and Motor Skills, 90, 291-298. Ogawa, T., Takehara, T., Monchi, R., Fukui, Y., and Suzuki, N. (1999). Emotion space under conditions of perceptual ambiguity. Perceptual and Motor Skills, 88, 1379-1383. Öhman, A. (2005). The role of the amygdala in human fear: Automatic detection of threat. Psychoneuroendocrinology, 30, 953-958. Öhman, A., Flykt, A., and Esteves, F. (2001). Emotion drives attention: detecting the snake in the grass. Journal of Experimental Psychology: General, 130, 466-478. Öhman, A., Lundqvist, D., and Esteves, F. (2001). The face in the crowd revisited: A threat advantage with schematic stimuli. Journal of Personality and Social Psychology, 80, 381-396. Öhman, A., and Mineka, S. (2001). Fears, phobias, and preparedness: Toward an evolved of fear and fear learning. Psychological Review, 108, 483-522. Öhman, A., and Soares, J. J. F. (1994). "Unconscious anxiety": Phobic responses to masked stimuli. Journal of Abnormal Psychology, 103, 231-240. Pasley, B. N., Mayes, L. C., and Schiltz, R. T. (2004). Subcortical discrimination of unperceived objects during binocular rivalry. Neuron, 42, 163-172. Pessoa, L., Kastner, S., and Ungerleider, L. G. (2002). Attentional control of the processing of neutral and emotional stimuli. Cognitive Brain Research, 15, 31-45. Pessoa, L., Kastner, S., and Ungerleider, L. G. (2003). Neuroimaging studies of attention: from modulation of sensory processing to top-down control. Journal of Neuroscience, 23, 3990-3998. Polonsky, A., Blake, R., Braun, J., and Heeger, D. J. (2000). Neuronal activity in human primary visual cortex correlates with perception during binocular rivalry. Nature Neuroscience, 3, 1153-1159.
Temporarily Blind in One Eye
187
Posner, M. I., and Raichle, M. E. (1995). Precis of Images of Mind. Behavioral and Brain Sciences, 18, 327-383. Schupp, H. T., Junghöfer, M., Weike, A. I., and Hamm, A. O. (2003). Emotional facilitation of sensory processing in the visual cortex. Psychological Science, 14, 7-13. Sheinberg, D. L., and Logothetis, N. K. (1997). The role of temporal cortical areas in perceptual organization. Proceedings of the National Academy of Sciences of the United States of America, 94, 3408–3413. Shelley, E. L. V., and Toch, H. H. (1962). The perception of violence as an indicator of adjustment in institutionalized offenders. Journal of Criminal Law, Criminology and Police Science, 53, 463-469. Stolarova, M., Keil, A., and Moratti, S. (2006). Modulation of the C1 visual event-related component by conditioned stimuli: evidence for sensory plasticity in early affective perception. Cerebral Cortex, 16, 876-887. Tong, F. (2001). Competing theories of binocular rivalry: A possible resolution. Brain and Mind, 2, 55-83. Tong, F., and Engel, S. A. (2001). Interocular rivalry revealed in the human cortical blind-spot representation. Nature, 411, 195-199. Wade, N. J., and Ono, H. (1985). The stereoscopic views of Wheatstone and Brewster. Psychological Bulletin, 47, 125-133. Walker, P. (1978). Binocular rivalry: Central or peripheral selective processes? Psychological Bulletin, 85, 376-389. Whalen, P. J., Kagan, J., Cook, R. G., Davis, F. C., Kim, H., Polis, S., et al. (2004). Human amygdala responsivity to masked fearful eye whites. Science, 306, 2061. Whalen, P. J., Rauch, S. L., Etcoff, N. L., McInerney, S. C., Lee, M. B., and Jenike, M. A. (1998). Masked presentations of emotional facial expressions modulate amygdala activity without explicit knowledge. Journal of Neuroscience, 18, 411-418. Wheatstone, C. (1838). On some remarkable, and hitherto unobserved, phenomena of binocular vision. Philosophical Transactions of the Royal Society of London, 128, 371-394. Williams, M. A., Morris, A. P., McGlone, F., Abbott, D. F., and Mattingley, J. B. (2004). Amygdala response to fearful and happy facial expressions under conditions of binocular suppression. The Journal of Neuroscience, 24, 28982904. Wilson, H. R. (2003). Computational evidence for a rivalry hierarchy in vision. Proceedings of the National Academy of Sciences of the United States of America, 100, 14499-14503.
188
Georg W. Alpers and Antje B.M. Gerdes
Windmann, S., Wehrmann, M., Calabrese, P., and Güntürkün, O. (2006). Role of the prefrontal cortex in attentional control over bistable vision. Journal of Cognitive Neuroscience, 18, 456-471. Yu, K., and Blake, R. (1992). Do recognizable figures enjoy an advantage in binocular rivalry? Journal of Experimental Psychology: Human Perception and Performance, 18, 1158-1173.
In: Binocular Vision Editors: J. McCoun et al, pp. 189-208
ISBN 978-1-60876-547-8 © 2010 Nova Science Publishers, Inc.
Chapter 9
STEREO-BASED CANDIDATE GENERATION FOR PEDESTRIAN PROTECTION SYSTEMS

David Geronimo 1,*, Angel D. Sappa 1 and Antonio M. López 1,2
1 Computer Vision Center and 2 Computer Science Department, Universitat Autònoma de Barcelona, 08193, Bellaterra, Barcelona, Spain
Abstract

This chapter describes a stereo-based algorithm that provides candidate image windows to a later 2D classification stage in an on-board pedestrian detection system. The proposed algorithm, which consists of three stages, is based on the use of both stereo imaging and scene prior knowledge (i.e., pedestrians are on the ground) to reduce the candidate searching space. First, a successful road surface fitting algorithm provides estimates of the relative ground-camera pose. This stage directs the search toward the road area, thus avoiding irrelevant regions like the sky. Then, three different schemes are used to scan the estimated road surface with pedestrian-sized windows: (a) uniformly distributed over the road surface (3D); (b) uniformly distributed over the image (2D); (c) not uniformly distributed but according to a quadratic function (combined 2D-3D). Finally, the set of candidate windows is reduced by analyzing their 3D content. Experimental results of the proposed algorithm, together with statistics of searching space reduction, are provided.
* E-mail address: [email protected]
1. Introduction
According to the World Health Organization, every year almost 1.2 million people are killed and 50 million are injured in traffic accidents worldwide [11]. These dramatic statistics highlight the importance of research in traffic safety, which involves not only motor companies but also governments and universities. Since the early days of the automobile, at the beginning of the 20th century, and along with its popularization, different mechanisms have been successfully incorporated into the vehicle with the aim of improving its safety. Some examples are turn signals, seat-belts and airbags. These mechanisms, which rely on physical devices, were focused on improving safety at the moment an accident happens. In the 1980s a sophisticated new line of research began to pursue safety in a preventive way: the so-called advanced driver assistance systems (ADAS). These systems provide information to the driver and perform active actions (e.g., automatic braking) by the use of different sensors and intelligent computation. Some ADAS examples are adaptive cruise control (ACC), which automatically maintains a constant distance to the front vehicle in the same lane, and lane departure warning (LDW), which warns when the car is inadvertently driven out of the lane. Among the most complex ADAS are pedestrian protection systems (PPSs), which aim at improving the safety of these vulnerable road users. Given the number of people involved in vehicle-to-pedestrian accidents, e.g., 150 000 injured and 7 000 killed each year in the European Union [6], it is clear that any improvement in these systems can potentially save many human lives. PPSs detect the presence of people in a specific area of interest around the host vehicle in order to warn the driver, perform braking actions and deploy external airbags in the case of an unavoidable collision. The most commonly used sensors for detecting pedestrians are cameras, in contrast to other ADAS such as ACC, in which active sensors like radar or lidar are employed. Hence, Computer Vision (CV) techniques play a key role in this research area, which is not surprising given that vision is the human sense most used when driving. People detection has been an important topic of research since the beginning of CV, and it has mainly been focused on applications like surveillance, image retrieval and human-machine interfaces. However, the problem faced by PPSs differs from these applications and is far from being solved. The main challenges of PPSs are summarized in the following points:
• Pedestrians have a high variability in pose (the human body can be viewed as a highly deformable target), clothes (which change with the weather, culture and people), distance (typically from 5 to at least 25 m), size (not only are adults and children different, but there are also many different human constitutions) and viewpoint (e.g., viewed from the front, back or side).

• The variability of the scenarios is also considerable, i.e., the detection takes place on outdoor, dynamic urban roads with cluttered backgrounds and illumination changes.

• The requirements in terms of misdetections and computational cost are hard: these systems must perform real-time actions at very low miss rates.

The first research works on PPSs were presented in the late 1990s. Papageorgiou et al. [10] proposed to extract candidate windows by exhaustively scanning the input image and to classify them with support vector machines based on Haar wavelet features. This two-step candidate generation and classification scheme has been used in a countless number of detection systems: from face [14], vehicle or generic object detection to human surveillance and image retrieval [3]. The simplest candidate generation approach is the exhaustive scan, also called sliding window: it consists of scanning the input image with pedestrian-sized windows (i.e., with a typical aspect ratio around 1/2) at all possible scales and positions. Although this candidate generation method is generic and easy to implement, it can be improved by making use of some prior knowledge from the application. Accordingly, during the last decade researchers have tried to exploit the specific aspects of PPSs to avoid this generation technique. Some cues used for generating candidates are vertical symmetry [1], infrared hot spots [4] and 3D points [7]. However, the proposed techniques that exploit them pose several problems that make the systems unreliable in real-world scenarios. For example, in the case of 2D analysis, the number of false negatives (i.e., discarded pedestrians) cannot be guaranteed to be low enough: symmetry relies on vertical edges, but in many cases the illumination conditions or background clutter make them disappear. Hot spot analysis in infrared images has a similar problem because of the environmental conditions [2]. On the other hand, although stereo stands as a more reliable cue, the aforementioned techniques also have problems. In the case of [7], the algorithm assumes a constant road slope, so problems appear when the road orientation is not constant, which is common in urban scenarios.
This chapter presents a candidate generation algorithm that reduces the number of windows to be classified while minimizing the number of wrongly discarded targets. This is achieved by combining a prior-knowledge criterion, pedestrians-on-the-ground, with the use of 3D data to filter the candidates. This procedure can be seen as a conservative but reliable approach, which in our opinion is the most convenient option for this early step of the system. The remainder of the manuscript is organized as follows. First, we introduce the proposed candidate generation algorithm with a brief description of its components and their objectives. Then, the three stages into which the algorithm is divided are presented: Sect. 3. describes the road surface estimation algorithm, Sect. 4. presents the road scanning and Sect. 5. addresses the candidate filtering. Finally, Sect. 6. provides experimental results of the algorithm output. In Sect. 7., conclusions and future work are presented.
2. Algorithm Overview
A recent survey on PPSs by Gerónimo et al. [8] proposes a general architecture that consists of six modules, into which most of the existing systems can be fitted. The modules (enumerated in the order of the pipeline process) are: 1) preprocessing, 2) foreground segmentation, 3) object classification, 4) verification and refinement, 5) tracking and 6) application. As can be seen, modules 2) and 3) correspond to the steps presented in the introduction. The algorithm presented in this chapter is a candidate generation algorithm to be used in the foreground segmentation module: it takes an input image and generates a list of candidates where a pedestrian is likely to appear, which is then sent to the next module, the classifier. There are two main objectives to be fulfilled in this module. The first is to reduce the number of candidates, which directly affects the performance of the system both in terms of speed (the fewer the candidates sent to the classifier, the lower the computation time) and detection rates (negatives can be pre-filtered by this module). The second is not to discard any pedestrian, since otherwise the later modules will not be able to correct the wrong filtering. The proposed algorithm is divided into three stages, as illustrated in Fig. 1:

1. Road surface estimation computes the relative position and orientation between the camera and the scene (Sect. 3.).
2. Road scanning places 3D windows over the estimated road surface using a given scanning method (Sect. 4.).

3. Candidate filtering filters out windows that do not contain enough stereo evidence of containing vertical objects (Sect. 5.).

The next sections describe each stage in detail.

Figure 1. Stages of the proposed algorithm: road surface estimation, road scanning and candidate filtering.
3. Road Surface Estimation
The first stage is focused on adjusting the candidate searching space to the region where the probability of finding a pedestrian is higher. In the context of PPSs, the searching space is the road, hence irrelevant regions like the sky can be directly omitted from the processing. The main targets of road surface estimation are two-fold: first, to fit a surface (a plane in the current implementation) to the road; second, to compute the relative position and orientation (pose) of the camera (also referred to as the camera extrinsic parameters) with respect to such a plane. A world coordinate system (XW, YW, ZW) is defined for every acquired stereo image, in such a way that: the XW ZW plane is contained in the current road fitted plane, just under the camera coordinate system (XC, YC, ZC); the YW axis contains the origin of the camera coordinate system; the XW YW plane contains the XC axis and the ZW YW plane contains the ZC axis. Due to that, the six extrinsic parameters (three for the position and three orientation angles) that refer the camera coordinate system to the world coordinate system reduce to just three, denoted in the following as (Π, Φ, Θ) (i.e., camera height, roll and pitch). Figure 2 illustrates the world and camera coordinate systems.
Figure 2. Camera coordinate system (XC, YC, ZC) and world coordinate system (XW, YW, ZW).

Among the (Π, Φ, Θ) parameters, in most situations the value of Φ (roll) is very close to zero. This condition is fulfilled as a result of a specific camera mounting procedure that fixes Φ at rest, and because in normal urban driving situations this value scarcely varies [9]. The proposed approach consists of two substages, detailed below (more information in [13]): i) 3D data point projection and cell selection, and ii) road plane fitting and ROIs setting.
3.1. 3D Data Point Projection and Cell Selection
Let D(r, c) be a depth map provided by the stereo pair with R rows and C columns, in which each array element (r, c) represents a scene point of coordinates (xC, yC, zC), referred to the camera coordinate system (Fig. 2). The aim of this first stage is to find a compact subset of points, ζ, containing most of the road points. To speed up the whole algorithm, most of the processing at this stage is performed over a 2D space.
Initially, 3D data points are mapped onto cells in the (YC, ZC) plane, resulting in a 2D discrete representation ψ(o, q), where o = ⌊DY(r, c) · ς⌋ and q = ⌊DZ(r, c) · ς⌋, ς representing a scale factor that controls the size of the bins according to the current depth map (Fig. 3). The scaling factor is aimed at reducing the projection dimensions with respect to the whole 3D data in order to both speed up the plane fitting algorithm and be robust to noise. It is defined as ς = ((R + C)/2) / ((∆X + ∆Y + ∆Z)/3), where (∆X, ∆Y, ∆Z) is the working range in 3D space. Every cell of ψ(o, q) keeps a reference to the original 3D data points projected onto that position, as well as a counter with the number of mapped points. From that 2D representation, one cell per column (i.e., in the Y-axis) is selected, relying on the assumption that the road surface is the predominant geometry in the given scene. Hence, the algorithm picks the cell with the largest number of points in each column of the 2D projection. Finally, every selected cell is represented by the 2D barycenter (0, (Σ_i y_Ci)/n, (Σ_i z_Ci)/n) of its n mapped points. The set of these barycenters defines a compact representation of the selected subset of points, ζ. Using both a single point per selected cell and a 2D representation, a considerable reduction in CPU time is achieved during the road plane fitting stage.
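To make the projection-and-selection step concrete, a minimal sketch is given below (our illustration, not the authors' code: the function name, the NumPy-based data layout and the choice of returning only the barycenters are assumptions).

```python
import numpy as np

def select_road_cells(y_c, z_c, rows, cols, working_range):
    """Project 3D points onto (Y, Z) cells and keep the densest cell per Z column.

    y_c, z_c      : 1-D arrays with the Y and Z camera coordinates of the points (meters).
    rows, cols    : R and C, the resolution of the depth map.
    working_range : (dX, dY, dZ), the working range of the 3D data (meters).
    Returns an array of 2D barycenters (y, z), one per selected cell (the subset zeta).
    """
    # Scale factor controlling the cell size (Sect. 3.1.).
    scale = ((rows + cols) / 2.0) / (sum(working_range) / 3.0)
    o = np.floor(y_c * scale).astype(int)          # cell index along Y
    q = np.floor(z_c * scale).astype(int)          # cell index along Z

    barycenters = []
    for qi in np.unique(q):                        # one selected cell per Z column
        col = (q == qi)
        o_col = o[col]
        counts = np.bincount(o_col - o_col.min())  # points per cell in this column
        best = np.argmax(counts) + o_col.min()     # densest cell: assumed to be road
        cell = col & (o == best)
        barycenters.append((y_c[cell].mean(), z_c[cell].mean()))
    return np.asarray(barycenters)
```

In a real system each selected cell would also keep references back to its raw 3D points, since the later plane refit (Sect. 3.2.) works on those points rather than on the barycenters.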
Figure 3. YZ Projection and road plane estimation.
3.2. Road Plane Fitting
The outcome of the previous substage is a compact subset of points, ζ, where most of them belong to the road. As stated in the previous subsection, Φ (roll) is assumed to be zero, hence the projection is expected to contain a dominant 2D line corresponding to the road, together with noise coming from the objects in the scene. The plane fitting stage consists of two steps. The first one is a 2D straight line parametrisation, which selects the dominant line corresponding to the road. It uses a RANSAC-based [5] fitting applied over the 2D barycenters, intended for removing outlier cells. The second step computes the plane parameters by means of a least squares fitting over all 3D data points contained in the inlier cells. Initially, every selected cell is associated with a value that takes into account the amount of points mapped onto that position. This value will be considered as a probability density function. The normalized probability density function is defined as pdf_i = n_i / N, where n_i represents the number of points mapped onto cell i and N represents the total amount of points contained in the selected cells. Next, a cumulative distribution function, F_j, is defined as F_j = Σ_{i=0..j} pdf_i. If the values of F are randomly sampled at n points, the application of the inverse function F⁻¹ to those points leads to a set of n points that are adaptively distributed according to pdf_i.
3.2.1. Dominant 2D Straight Line Parametrisation
In the first step, a RANSAC-based approach is applied to find the largest set of cells that fit a straight line within a user-defined band. In order to speed up the process, a predefined threshold value for inlier/outlier detection has been used (a band of ±10 cm was enough to take into account both data point accuracy and road planarity); an automatic threshold could be computed for inlier/outlier detection, following robust estimation of the standard deviation of residual errors [12]. However, it would increase the CPU time, since robust estimation of the standard deviation involves computationally expensive algorithms (e.g., sorting functions).
Repeat L times:

(a) Draw a random subsample of 2 different barycenter points (P1, P2) according to the probability density function pdf_i, using the above process;

(b) For this subsample, indexed by l (l = 1, ..., L), compute the straight line parameters (α, β)_l;

(c) For this solution, compute the number of inliers among the entire set of barycenter points contained in ζ, as mentioned above, using a ±10 cm margin.
3.2.2. Road Plane Parametrisation
(a) From the previous 2D straight line parametrisation, choose the solution that has the highest number of inliers;

(b) Compute the (a, b, c) plane parameters by using the whole set of 3D points contained in the cells considered as inliers, instead of the corresponding barycenters. To this end, the least squares fitting approach [15], which minimizes the square residual error (1 − a·xC − b·yC − c·zC)², is used;

(c) In case the number of inliers is smaller than 40% of the total amount of points contained in ζ (e.g., severe occlusion of the road by other vehicles), those plane parameters are discarded and the ones corresponding to the previous frame are used as the correct ones.
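The two-step fitting described in Sects. 3.2.1. and 3.2.2. could be sketched as follows (a simplified illustration rather than the authors' implementation; the pdf-weighted sampling, the ±10 cm band and the plane model a·x + b·y + c·z = 1 follow the text above, while the function names and defaults are assumptions).

```python
import numpy as np

def ransac_road_line(barycenters, pdf, trials=100, band=0.10, seed=0):
    """RANSAC fit of the dominant 2D line y = alpha*z + beta over the cell barycenters."""
    rng = np.random.default_rng(seed)
    y, z = barycenters[:, 0], barycenters[:, 1]
    best_inliers = np.zeros(len(y), dtype=bool)
    for _ in range(trials):
        # (a) draw two different barycenters, biased by the cells' point counts (pdf).
        i, j = rng.choice(len(y), size=2, replace=False, p=pdf)
        if z[i] == z[j]:
            continue
        # (b) line parameters (alpha, beta) for this subsample.
        alpha = (y[i] - y[j]) / (z[i] - z[j])
        beta = y[i] - alpha * z[i]
        # (c) inliers within the +/-10 cm band.
        inliers = np.abs(y - (alpha * z + beta)) < band
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

def fit_road_plane(points_3d):
    """Least-squares plane a*x + b*y + c*z = 1 over the raw points of the inlier cells."""
    abc, *_ = np.linalg.lstsq(points_3d, np.ones(len(points_3d)), rcond=None)
    return abc  # (a, b, c)
```

A complete implementation would also fall back to the previous frame's plane whenever fewer than 40% of the points in ζ end up as inliers, as stated in step (c) above.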
4. Road Scanning
Once the road is estimated, candidates are placed on the 3D surface and then projected onto the image plane to perform the 2D classification. The most intuitive scanning scheme is to distribute windows all over the estimated plane in a uniform way, i.e., in an nx × nz grid, with nx sampling points along the road's X axis and nz along the Z axis. Each sampling point on the road is used to define a set of scanning windows, to cover the different sizes of pedestrians, as will be described later.
Let us define ZC min = 5 m as the minimum ground point seen from the camera (with a camera of 6 mm focal length, oriented to the road while avoiding capturing the hood, the first road point seen is around 4 to 5 meters from the camera), ZC max = 50 m as the furthest point, and τ = 100 the number of available sampling positions along the ZC axis of the road plane (a, b, c). Given that the points are evenly placed over the 3D plane, the corresponding image rows can be computed by using the plane and projection equations. Hence, the sampled rows in the image are:

y = y0 + f/(b·z) − f·c/b ,    (1)

where z = ZC min + i·δZ, ∀i ∈ {0, .., nz − 1}; δZ = (ZC max − ZC min)/nz is the 3D sampling stride; (a, b, c) are the plane parameters; f is the camera focal length; and y0 is the y coordinate of the center point of the camera in the image. The same procedure is applied to the X axis, e.g., from XC min to XC max with the nx sampling points. We refer to this scheme as Uniform World Scanning. As can be appreciated in Fig. 4(a), this scheme has two main drawbacks: it oversamples far positions (i.e., Z close to ZC max) and undersamples near positions (i.e., the sampling is too sparse when Z is close to the camera). In order to amend these problems, it is clear that the sampling cannot rely only on the world but must be focused on the image. In fact, the sampling is aimed at extracting candidates in the 2D image. According to this, we compute the minimum and maximum image rows corresponding to the Z range:

yZC max = y0 + f/(b·ZC max) − f·c/b ,    (2)

yZC min = y0 + f/(b·ZC min) − f·c/b ,    (3)

and evenly place the sampling points between these two image rows using:

y = yZC min + i·δim, ∀i ∈ {0, .., nz − 1} ,    (4)

where δim = (yZC min − yZC max)/nz. In this case, the corresponding z in the plane (later needed to compute the window size) is

z = f / (f·c + b·(y − y0)) .    (5)
199
In the case of X axis, the same procedure as in the first scheme can be used. This scheme is called Uniform Image Scanning. In this case, it is seen in Fig. 4(b) that although the density of sampling points for the closer ZC is appropiate, the far ZC are undersampled, i.e., the space between sampling points is too big (see histogram of the same figure). Figure 5 displays the sampling functions with respect to the ZC scanning positions and the image Y axis. The Uniform to Image, in dotted-dashed-blue, draws a linear function since the windows are evenly distributed over the available rows. On the contrary, the Uniform to Road, in dashed-red, takes the form of an hyperbola as a result of the perspective projection. The aforementioned over- and under-sampling in the top and bottom regions of this curve can be also seen in this figure. Attending to the problems of these two approaches, we finally propose the use of a non-uniform scheme that provides a more sensible sampling, i.e., neither over- nor under-sampling the image or the world. The idea is to sample the image with a curve in between the two previous schemes, and adjust the row-sampling according to our needs, i.e., mostly linear in the bottom region of the image (close Z) and logarithmic-like for further regions (far Z), but avoiding over-sampling. In our case, we use a quadratic function of the form y = ax2 + bx + c, constrained to pass through the intersection points between the linear and hyperbolic curves and by a user defined point (iuser , yuser ) between the two original functions. The curve parameters can be found by solving the following system of equations:
yZC max a imax 2 imax 1 2 imin 1 b = yZC min , imin c yuser iuser 2 iuser 1
(6)
where imin = 0 and imax = nz − 1. For example, in the non-uniform curve in Fig. 5 (solid-black line), yuser = imin + (imax − imin )×κ and iuser = imin + (imax − imin )×λ, where κ = 0.6 and λ = 0.25. For the XC axis we follow the same procedure as with the other schemes. The resulting scanning, called non-uniform scanning, can be seen in Fig. 4(c). Once we have the set of 3D windows on the road, they are used to compute the corresponding 2D windows to be classified. We assume a pedestrian to be h = 1.70m high, with an standard deviation σ = 0.2m. In the case of body width, the variability is much bigger than height, so a width margin is used to adjust most of human proportions and also leave some space for the extremities. Hence, the width is defined as a ratio of the height, specifically 1/2.
200
David Geronimo, Angel D. Sappa and Antonio M. L´opez Image Rows
World
5
Over-sampling
Times Sampled
4
3
2
z
Under-sampling
x 1
0
280
300
320
340 360 380 400 Sampled Image Rows
420
440
460
(a) Uniform Road Scanning Image Rows
World
5
Times Sampled
4
3
2
z
Under-sampling
x 1
0
280
300
320
340 360 380 400 Sampled Image Rows
420
440
460
(b) Uniform Image Scanning Image Rows
World
5
Times Sampled
4
3
z
2
x 1
0
280
300
320
340 360 380 400 Sampled Image Rows
420
440
460
(c) Non-Uniform Scanning Figure 4. The three different scanning schemes. Right column shows the scanning rows using the different schemes and also a representation of the scan over the plane. In order to enhance the figure visualization just 50% of the lines are shown. The histograms of sampled image rows are shown on the left column; under- and over-sampling problems can be seen.
Figure 5. Scanning functions. A non-uniform road scanning with parameters κ = 0.6 and λ = 0.25 is between the uniform to road and to image curves, hence achieving a more sensible scan.
For example, the mean pedestrian window measures 1.70 × 0.85 m, independently of the extra margin taken by the classifier (Dalal et al. [3] demonstrate that adding some margin to the window, 33% in their case, results in a performance improvement of their classifier).
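A compact sketch of this non-uniform row sampling is given below (illustrative only; the quadratic coefficients are named A, B, C to avoid a clash with the plane parameters (a, b, c), the camera values in the usage example are made up, and placing the user point y_user at a fraction κ between the two extreme rows is our reading of the description above).

```python
import numpy as np

def nonuniform_rows(plane, f, y0, nz=100, z_min=5.0, z_max=50.0, kappa=0.6, lam=0.25):
    """Non-uniform sampling of image rows between the projections of z_max and z_min."""
    a, b, c = plane                                    # road plane a*x + b*y + c*z = 1

    def row_of(z):                                     # Eq. (1): image row of a road point at depth z
        return y0 + f / (b * z) - f * c / b

    y_far, y_near = row_of(z_max), row_of(z_min)       # Eqs. (2) and (3)
    i_min, i_max = 0, nz - 1
    i_user = i_min + (i_max - i_min) * lam
    y_user = y_far + (y_near - y_far) * kappa          # assumed placement of the user point

    # Solve the 3x3 system of Eq. (6) for the quadratic y = A*i^2 + B*i + C.
    M = np.array([[i_max**2, i_max, 1.0],
                  [i_min**2, i_min, 1.0],
                  [i_user**2, i_user, 1.0]])
    A, B, C = np.linalg.solve(M, np.array([y_far, y_near, y_user]))

    i = np.arange(nz)
    rows = A * i**2 + B * i + C                        # sampled image rows
    depths = f / (f * c + b * (rows - y0))             # Eq. (5): depth used for the window size
    return rows, depths

# Example with made-up camera and plane values (focal length in pixels, camera 1.2 m above a flat road).
rows, depths = nonuniform_rows(plane=(0.0, 1.0 / 1.2, 0.0), f=810.0, y0=240.0)
```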
5. Candidate Filtering
The final stage of the algorithm is aimed at discarding candidate windows by making use of the stereo data (Fig. 6). The method starts by aligning the camera coordinate system with the world coordinate system (see Fig. 2) with the aim of compensating the pitch angle Θ, computed in Sect. 3. Assuming that the roll is set to zero, as described in the aforementioned section, the coordinates of a given point p(x,y,z), referred to the new coordinate system, are computed as follows:
pxR = px ,
pyR = cos(Θ)·py − sin(Θ)·pz ,    (7)
pzR = sin(Θ)·py + cos(Θ)·pz .
Figure 6. Schematic illustration of the candidate filtering stage.

Then, rotated points located over the road (i.e., the set of points placed in a band from 0 to 2 m over the road plane, assuming that this is the maximum height of a pedestrian) are projected onto a uniform grid GP in the fitted plane (Sect. 3.), where each cell has a size of σ × σ. A given point p(xR, yR, zR) votes into the cell (i, j), where i = ⌊xR/σ⌋ and j = ⌊zR/σ⌋. The resulting map GP is shown in Fig. 7(b). As can be seen, cells far away from the sensor tend to have few projected points. This is caused by two factors. First, the number of projected points decreases directly with the distance, as a result of the perspective projection. Second, the uncertainty of the stereo reconstruction also increases with distance, so the points of an ideal vertical and planar object would spread wider over GP as the distance of these points increases. In order to amend this problem, the number of points projected onto each cell of GP is reweighted and redistributed. The reweighting function is

GRW(i, j) = jσ · GP(i, j) ,    (8)

where jσ corresponds to the real depth of the cell. The redistribution function
consists in propagating the value of GRW to its neighbours as follows:

G(i, j) = Σ_{s=i−η/2}^{i+η/2} Σ_{t=j−η/2}^{j+η/2} GRW(s, t) ,    (9)
where η is the stereo uncertainty at a given depth (in cells): η = uncertainty/σ. The uncertainty is computed as a function of the disparity values:

uncertainty = f · baseline · µ / disparity² ,    (10)
where baseline is the baseline of the stereo pair in meters, f is the focal length in pixels and µ is the correlation accuracy of the stereo. The resulting map G, after the reweighting and redistribution processes, is illustrated in Fig. 7(c). The filtering consists in discarding the candidate windows that lie over cells with fewer than χ points, where χ is set experimentally. In our implementation, this parameter is low in order to fulfill the conservative criterion mentioned in the introduction, i.e., in this early system module false positives are preferred over false negatives.
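The filtering stage could be sketched roughly as follows (an illustration only, not the original implementation: the grid extents, the focal length in pixels and the correlation accuracy µ are hypothetical values, and the depth-dependent redistribution of Eq. (9) is written as an explicit, unoptimized loop).

```python
import numpy as np

def vertical_object_map(x_r, z_r, sigma=0.2, f_px=810.0, baseline=0.12, mu=1.0,
                        x_range=(-10.0, 10.0), z_range=(0.0, 50.0)):
    """Build the reweighted and redistributed map G of Eqs. (8)-(10).

    x_r, z_r : 1-D arrays with the pitch-compensated coordinates (Eq. (7)) of the
               points lying in the 0-2 m band over the road plane (meters).
    """
    ni = int((x_range[1] - x_range[0]) / sigma)
    nj = int((z_range[1] - z_range[0]) / sigma)
    i = ((x_r - x_range[0]) / sigma).astype(int)
    j = ((z_r - z_range[0]) / sigma).astype(int)
    keep = (i >= 0) & (i < ni) & (j >= 0) & (j < nj)

    gp = np.zeros((ni, nj))
    np.add.at(gp, (i[keep], j[keep]), 1.0)            # raw vote map G_P
    g_rw = gp * (np.arange(nj) * sigma)               # Eq. (8): reweight by cell depth

    g = np.zeros_like(g_rw)
    for jj in range(nj):
        depth = max(jj * sigma, sigma)
        disparity = f_px * baseline / depth           # disparity (pixels) at this depth
        uncert = f_px * baseline * mu / disparity**2  # Eq. (10), in meters
        eta = max(1, int(round(uncert / sigma)))      # stereo uncertainty in cells
        lo_j, hi_j = max(0, jj - eta // 2), min(nj, jj + eta // 2 + 1)
        for ii in range(ni):
            lo_i, hi_i = max(0, ii - eta // 2), min(ni, ii + eta // 2 + 1)
            g[ii, jj] = g_rw[lo_i:hi_i, lo_j:hi_j].sum()   # Eq. (9)
    return g

def keep_candidate(g, x, z, sigma=0.2, chi=2000, x_min=-10.0):
    """A candidate window is kept only if its supporting cell reaches chi points."""
    return g[int((x - x_min) / sigma), int(z / sigma)] >= chi
```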
Figure 7. Probability map of vertical objects on the road plane. (a) Original frame. (b) Raw projection GP . (c) Reweighted and redistributed vertical projection map of the frame 3D points.
6. Experimental Results
The evaluation of the algorithm has been carried out using data taken from an on-board stereo rig (Bumblebee from Point Grey, http://www.ptgrey.com, Fig. 8). The stereo pair has a baseline of 0.12 m and each camera has a focal length of
6 mm and provides a resolution of 640 × 480 pixels (the figures in the paper show the right sensor image). The HFOV is 43° and the VFOV is 33°, which allows pedestrians to be detected at a minimum distance of 5 m, and the camera reconstruction software provides 3D information up to 50 m, which coincides with the parameters described in Sect. 3. As introduced in Sect. 1., one of the most widely used candidate generation methods is the sliding window. Although this method does not perform an explicit foreground segmentation, which is our motivation, it is useful as a reference to evaluate the benefits of our proposal. Let us say that we must detect pedestrians up to 50 m, which measure around 12 × 24 pixels (of course the size will differ slightly depending on the focal length and the size of the sensor pixels). On the other hand, the nearest pedestrian fully seen, at 5 m, is about 140 × 280 pixels. Hence, a regular exhaustive scan algorithm must place windows of all the scales between these two distances at all possible positions. If the scale variation is assumed to be 1.2 and the position stride is 4 pixels, the number of windows is over 100 000. However, smaller windows need a smaller stride between them, so the number can range from 200 000 to 400 000.
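As a rough back-of-the-envelope check of these figures (our own illustrative arithmetic, assuming a 640 × 480 image, a constant 4-pixel stride, a 1/2 aspect ratio and window heights growing from 24 to 280 pixels in steps of 1.2):

```python
# Rough count of candidate windows for an exhaustive (sliding-window) scan.
width, height = 640, 480
stride = 4
total, h = 0, 24                    # smallest pedestrian window: 12 x 24 (about 50 m away)
while h <= 280:                     # largest pedestrian window: 140 x 280 (about 5 m away)
    w = h // 2                      # pedestrian aspect ratio of roughly 1/2
    nx = (width - w) // stride + 1
    ny = (height - h) // stride + 1
    total += nx * ny
    h = int(round(h * 1.2))         # scale step of 1.2
print(total)                        # on the order of 2 * 10**5, the same order as the figures above
```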
Figure 8. Stereo pair used in our acquisition system.

We have selected 50 frames taken from urban scenarios with the aforementioned stereo camera and applied the proposed algorithm. The parameters for the road surface estimation are L = 100 and ς = 0.68. In the case of the scanning, we have used the non-uniform scheme with τ = 90 sampling points, κ = 0.5 and λ = 0.25. The scanning in the XC axis is made in XC = {−10, .., 10} m with a stride of 0.075 m. For each selected window, 10 different sizes are tested (the smallest 0.75 × 1.5 m and the biggest 0.95 × 1.8 m). The algorithm selects about 50 000 windows, which is a reduction of about 75 − 90% with respect to the sliding window, depending on the stride of the latter.
Figure 9. Experimental results. The left column shows the original real urban frames in which the proposed algorithm is applied. The middle column corresponds to the final windows after the filtering step. The right column shows the number of windows generated after the scanning (Gen) and after the filtering (Final). In order to enhance the visualization the different scales tested for each sampling point are not shown, so just one candidate per point was drawn.
Then, we apply the filtering stage with a cell size of σ = 0.2 and χ = 2000, reducing the number of candidates again by about 90%. This represents a reduction of 97 − 99% compared to the sliding window. Figure 9 illustrates the results in six of the frames used to test the algorithm. As can be seen, the pedestrians in the scenario are correctly selected as candidates, while other free-space areas are discarded from classification. In addition, judging from the results, the number of false negatives is marginal, which is a key factor for the performance of the whole system.
7. Conclusions
We have presented a three-stage candidate generation algorithm to be used in the foreground segmentation module of a PPS. The stages consist of road surface estimation, road scanning and candidate filtering. Experimental results demonstrate that the number of candidates to be sent to the classifier can be reduced by 97 − 99% compared to the typical sliding window approach, while keeping the number of false negatives at around 0%. Future work will focus on algorithms to fuse the cues used to select the candidates, which can potentially improve the proposed pipeline.
Acknowledgements

The authors would like to thank Mohammad Rouhani for his ideas on the road scanning section. This work was supported by the Spanish Ministry of Education and Science under project TRA2007-62526/AUT and research programme Consolider Ingenio 2010: MIPRCV (CSD200700018), and by the Catalan Government under project CTP 2008 ITT 00001. David Gerónimo was supported by Spanish Ministry of Education and Science and European Social Fund grant BES-2005-8864.
References

[1] M. Bertozzi, A. Broggi, R. Chapuis, F. Chausse, A. Fascioli, and A. Tibaldi. Shape-based pedestrian detection and localization. In Proc. of the IEEE International Conference on Intelligent Transportation Systems, pages 328–333, Shanghai, China, 2003.
[2] C.-Y. Chan and F. Bu. Literature review of pedestrian detection technologies and sensor survey. Technical report, Institute of Transportation Studies, Univ. of California at Berkeley, 2005.

[3] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 886–893, San Diego, CA, USA, 2005.

[4] Y. Fang, K. Yamada, Y. Ninomiya, B. Horn, and I. Masaki. A shape-independent method for pedestrian detection with far-infrared images. IEEE Trans. on Vehicular Technology, 53(6):1679–1697, 2004.

[5] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Graphics and Image Processing, 24(6):381–395, June 1981.

[6] United Nations Economic Commission for Europe. Statistics of road traffic accidents in Europe and North America, 2005.

[7] D.M. Gavrila, J. Giebel, and S. Munder. Vision-based pedestrian detection: The PROTECTOR system. In Proc. of the IEEE Intelligent Vehicles Symposium, pages 13–18, Parma, Italy, 2004.

[8] D. Gerónimo, A. López, A.D. Sappa, and T. Graf. Survey of pedestrian detection for advanced driver assistance systems. IEEE Transactions on Pattern Analysis and Machine Intelligence (in press), 2009.

[9] R. Labayrade and D. Aubert. A single framework for vehicle roll, pitch, yaw estimation and obstacles detection by stereovision. In Proc. of the IEEE Intelligent Vehicles Symposium, pages 31–36, Columbus, OH, USA, June 2003.

[10] C. Papageorgiou and T. Poggio. A trainable system for object detection. International Journal of Computer Vision, 38(1):15–33, 2000.

[11] M. Peden, R. Scurfield, D. Sleet, D. Mohan, A.A. Hyder, E. Jarawan, and C. Mathers. World report on road traffic injury prevention. World Health Organization, Geneva, Switzerland, 2004.

[12] P. Rousseeuw and A. Leroy. Robust Regression and Outlier Detection. John Wiley & Sons, New York, 1987.
[13] A.D. Sappa, F. Dornaika, D. Ponsa, D. Gerónimo, and A. López. An efficient approach to onboard stereo vision system pose estimation. IEEE Trans. on Intelligent Transportation Systems, 9(3):476–490, 2008.

[14] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 511–518, Kauai, HI, USA, 2001.

[15] C. Wang, H. Tanahashi, H. Hirayu, Y. Niwa, and K. Yamamoto. Comparison of local plane fitting methods for range data. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pages 663–669, Kauai, HI, USA, December 2001.
In: Binocular Vision Editors: J. McCoun et al, pp. 209-246
ISBN 978-1-60876-547-8 © 2010 Nova Science Publishers, Inc.
Chapter 10
DEVELOPMENT OF SACCADE CONTROL

Burkhart Fischer
Univ. of Freiburg, Optomotor Lab., Freiburg, Germany

Abstract

This chapter describes the development of eye movement control. We will consider, however, only those aspects of eye movements that are important for reading: stability of fixation and control of saccades (fast eye movements from one object of interest to another). The saccadic reflex and the control of saccades by voluntary conscious decision and their role in the optomotor cycle will be explained on the basis of the reaction times and neurophysiological evidence. The diagnostic methods used in the next part of the book will be explained in this chapter. The age curves of the different variables show that the development of the voluntary component of saccade control lasts until adulthood.
1. Introduction
Saccades are fast eye movements. We make them all the time, 3 to 5 per second. Due to these saccadic eye movements the brain receives 3 to 5 new pictures from the retina each second. Without these ongoing sequences of saccades we would not see very much, because of the functional and anatomical structure of the retina: it contains in the middle a small area, called the fovea, where the receptor cells and the other cells in the retinal layers are densely packed. It is only this small part of the retina that allows us to see sharp images. What we want to see in detail and what we want to identify as an object or any other small
visual pattern, e.g. a letter, must be inspected by foveal vision. The solution for this biological demand is sequences of rapid, relatively small eye movements (saccades). The time periods between saccades are 150 to 300 ms long. They are often called "fixations". Usually in everyday life these saccades are made automatically, i.e. we do not have to "think" about them and we do not have to generate each of them by a conscious decision. But, by our own decision, we can also stop the sequence and fixate a certain small object for longer periods of time. We can also actively and voluntarily move our eyes from one place of interest to another. The situation is quite similar to our breathing: it works by itself but we can also control it voluntarily. We are not aware of these saccades, they remain unconscious and – most importantly – we do not see the jumps of the retinal image. Somehow, the visual system cooperates with the saccade control centres in a perfect way to differentiate between self-induced movements of the retinal image and externally generated movements. Only under certain, somewhat artificial conditions can we see our saccades. An example is shown in Fig. 1. The figure shows a modification of the well known Hermann grid. As long as we look around on this figure, we see black dots appearing spontaneously at the crossings of the lines between the black squares. The picture looks like scintillations. We notice that there are no such black dots. To avoid these illusory black blinks, we just have to decide to stop making saccades. The reader may try this for her/himself: pick one of the white dots and maintain fixation on it. As long as one can prevent saccades, the black dots remain absent. Each saccade occurring spontaneously will create the illusion of dots again. One can also see illusory movements, which are related to eye movements. Figure 2 shows an example. The movements disappear when we stop making eye movements. An example of a geometric illusion also allows one to become aware of one's own saccadic eye movements. Figure 3 shows the Z-illusion. One can see that the prolongations of the short ends of the line (upper left and lower right) will not meet the corners at the lower left and upper right. If one succeeds in preventing all saccades for some seconds (up to 10, which is a long time in this context) one will see that the lines meet the corners as they do in reality. The reader may convince her/himself by using a ruler. As long as we do not take into account the fact that vision needs saccades,
Figure 1. Scintillations of the Hermann grid due to eye movements. As long as we look around across this pattern, we see black dots jumping around. Whenever we try to look at one of them, there is no black spot. When fixating the eyes at one white spot for a few seconds, black spots are no longer seen. With still longer fixation most of the white dots disappear as well.
we will not understand the visual system. Any theory of vision which makes correct predictions but does not include the fast movements of the eyes can hardly be considered a valid theory. It is interesting to note that most visual psychophysical experiments require the subject to fixate one spot of light while being examined on their visual experience with another stimulus some distance away. The reason for this methodological detail is the sharply decreasing visual acuity from the centre to the periphery of the visual field. Unfortunately, when it came to visual illusions, fixation was no longer required when testing the stability of the illusory impression. The result was that the instability of geometrical illusions, shown in the examples above, remained undiscovered until recently [Fischer et al. 2003]. In particular, when we talk about reading, eye movements are one key to the
Figure 2. The horizontal movements that one sees when looking at this figure are obviously illusory, because nothing moves in this figure. The illusion disappears, when we stop our eye movements by fixating the centre of the figure.
understanding of the reading process, which allows us to compose words from letters or from syllables. We therefore have to consider the saccade system before we consider its role in reading. The significance of saccadic eye movements in reading was also emphasized by a reader model resting on the basis of experiments in which eye movements were measured while the subjects were reading text that was manipulated in several physical and linguistic ways [Reichle et al. 2003]. In the following sections we will consider those parts of eye movement control that play an important role in reading. The other types of eye movements (vestibular ocular compensation, optokinetic nystagmus) will be neglected altogether. However, we will consider the instability of fixation due to unwanted saccades or to unwanted movements of the two eyes in different directions or with different velocities. In principle, binocular vision is not needed at all for reading, but imperfections of binocular vision, which remain undetected, may
Figure 3. The Z-illusion shows the capital letter Z. The prolongations of the short lines do not seem to hit the corners. The real geometric relations can be seen by using a ruler to draw the prolongations, or they can be seen by fixating the eyes in the middle.
disturb vision, and consequently reading may be difficult. Figure 4 shows the most important anatomical connections that begin in the retina and end at the eye muscles. There are hundreds of papers on eye movements in the literature. Saccades have always received special interest [Fischer, 1987]. Here we are interested in fixation and in saccades. Today it is easy to measure eye movements in human observers. A sufficiently precise method uses infrared light reflection from the eyes. The data presented here were all collected by using this method. For the purpose of clinical application a special instrument was developed. It is called ExpressEye and provides the infrared light source, the light-sensitive elements, the visual stimuli needed for basic tasks (with minilasers, see below), and the amplifiers. The instrument can deliver the raw eye position data trial by trial during the experiment, detect saccades and provide a statistical analysis of the saccades. The front view of the system is shown in Fig. 5. The method has been described in detail elsewhere [Hartnegg and Fischer, 2002]. The raw data can be stored on the hard disc of a computer for further analysis to obtain different variables that characterize the performance of the tasks. These methods have been described in detail [Fischer et al. 1997].
Figure 4. The figure shows a schematic diagram of the neural system of the control of visually guided saccades and their connections. LGN = Lateral Geniculate Nucleus; Assoc = Association Cortex; ITC = Infero-Temporal Cortex; FEF = Frontal Eye Field; PFC = Prefrontal Cortex; MT = Medio-Temporal Cortex; MST = Medio-Superior-Temporal Cortex; NC = Nucleus Caudatus; SN = Substantia Nigra; Tectal = Tectum = Superior Colliculus; BS = Brain Stem.
2. Fixation and Fixation Stability
It may come as a surprise that a section on eye movements starts by dealing with fixation, i.e. with periods of no eye movements. It has been a problem over many years of eye movement research that fixation was not considered at all as an important active function. The interest was in the moving and not in the resting (fixating) eye. Only direct neurophysiological experiments [Munoz and Wurtz, 1992] and a thorough investigation of the reaction times of saccades [Mayfrank et al. 1986] provided the evidence that fixation and saccade generation are controlled in a mutually antagonistic way, similar to the control of other body muscles. We will see that we can observe movements of the eyes during periods where they were not supposed to move at all. It seems that there was little doubt that almost any subject can follow the instruction "fixate" or
Figure 5. The front view of the Express Eye designed to measure eye movements. One sees the screws for the mechanical adjustment in all 3 dimensions in front of each eye. Infrared light emitting diode and the two photocells are located behind and directed to the centre of the eye ball.
"do not move the eyes". But this is not the case: stability of fixation cannot always be guaranteed by all subjects. This section deals with the results of the corresponding analysis of eye movements.
2.1. Monocular Instability
As pointed out earlier, we have to consider two different aspects of disturbances of fixation. The first aspect is unwanted (or intrusive) saccades. These are mostly small conjugate saccades that take the fovea away from the fixation point and back. This kind of disturbance is called a monocular instability, because when it occurs one sees it in both eyes at exactly the same time and with the same saccade size. The disturbance remains if one closes one eye, and therefore it disturbs monocular vision and does not disturb binocular vision. This is the reason why it is called a monocular instability. Below we will also explain the binocular instability. To measure the stability or instability of fixation due to unwanted saccades, we simply count these saccades during a short time period in which the subject is instructed to fixate a small fixation point. Such a period repeatedly occurs in both diagnostic tasks that are used for saccade analysis, which are described in
section 3.2. on page 226. The number of unwanted saccades counted during this task is used as a measure of monocular fixation instability. For each trial this number is recorded and attributed to this trial. The mean value calculated over all trials will serve as a measure. The ideal value is zero for each individual trial, and therefore the ideally fixating subject will also receive zero as a mean value. Figure 6 shows the mean values of the number of intrusive (unwanted) saccades per trial as a function of age. While children at the age of 7 produce one intrusive saccade every 2 or 3 trials, adults around 25 years of age produce one intrusive saccade every 10 trials. At higher ages the number of intrusive saccades increases again. Of course, not every intrusive saccade leads to an interruption of vision, and therefore one can live with a number of them without problems. But if the number of intrusive saccades is too high, visual problems may occur.
Figure 6. The curve shows the age development of the number of unwanted (intrusive) saccades per trial. The ideal value would be zero.
When we measure the monocular instability by detecting unwanted saccades, we should not forget that there may also be another aspect of strength or weakness of fixation, which cannot be detected by looking at the movements of the eyes during periods of fixation, but rather by looking at the reaction times of saccades that are required when the subject has to disengage from a visible fixation point.
2.2. Binocular Instability
To understand binocular stability we have to remember that the two eyes must be in register in order for the brain to "see" only one image, even though each eye delivers its own image. This kind of convergence of the lines of sight of the two eyes on one object is achieved by the oculomotor system. We call it motor fusion. However, even with ideal motor fusion, the two images of the two eyes will be different, because they look at a single three-dimensional object from slightly different angles. The process of perceiving only one object in its three dimensions (stereo vision) is called perceptual fusion, or stereopsis. When we talk about stereo vision (stereopsis) we mean fine stereopsis, i.e. single three-dimensional vision of objects. It is clear that we need both eyes for this kind of stereopsis. However, we also have three-dimensional vision with one eye only. The famous Necker cube shown in Fig. 7 is one of the best known examples. From the simple line drawing our brain constructs a three-dimensional object. Close one eye and the percept of the cube does not change at all. This type of three-dimensional spatial vision does not need both eyes. The brain constructs a three-dimensional space within which we see objects. In order to guarantee stable stereopsis, the two eyes must be brought into register and they have to stay in register for some time. This means that the eyes are not supposed to move independently of each other during a period of fixation of a small light spot. By recording the movements of both eyes simultaneously one has a chance to test the quality of the stability of the motor aspect of binocular vision. Figure 8 illustrates the method for determining an index of binocular stability. Two trials from the same child are depicted. In the upper trial the left eye shows stable fixation before and after the saccade. The right eye, however, converges after the saccade, producing a period of non-zero relative velocity. In the lower case, both eyes produce instability after the saccades. The example shows that the instability is sometimes produced by one eye only, or by both eyes simultaneously. Often it is caused in some trials by one eye, and in other trials by the other eye. Extreme dominance of one eye producing the instability was rarely seen (see below). In the example of Fig. 8 the index of binocular instability was 22%. This means that the two eyes were moving at different velocities during 22% of the analysed time frame.
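A minimal sketch of how such an instability index could be computed from simultaneously recorded positions of the two eyes is given below (our illustration, not the original ExpressEye analysis software; the sampling rate and the relative-velocity threshold are assumptions).

```python
import numpy as np

def binocular_instability_index(left_pos, right_pos, fs=500.0, vel_thresh=1.0):
    """Percentage of the analysed time frame in which the two eyes move at
    different velocities (non-zero vergence velocity).

    left_pos, right_pos : 1-D arrays of horizontal eye positions in degrees, sampled at fs Hz.
    vel_thresh          : relative velocity threshold in deg/s (assumed value).
    """
    rel_vel = np.gradient(left_pos - right_pos) * fs     # relative (vergence) velocity
    return 100.0 * np.mean(np.abs(rel_vel) > vel_thresh)

def percent_unstable_trials(trials, limit=15.0):
    """Share of trials whose index exceeds the 15% limit used in the text."""
    indices = [binocular_instability_index(l, r) for l, r in trials]
    return 100.0 * np.mean(np.asarray(indices) > limit)
```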
Figure 7. The famous Necker cube. One sees a three-dimensional object even though the lines do not really meet at the corners of the cube. The percept of a three-dimensional cube remains even when we close one eye. Our perception is stable against such disturbances, and the impression of a cube is maintained.
To characterize a subject's binocular stability as a whole, the percentage of trials in which this index exceeded 15% was used. The ideal observer would be assigned zero; the worst case would be assigned a value of 100%. Fig. 9 shows the data on binocular instability for a single subject. The upper left panel depicts the frequency of occurrence of the percentages of time during which the eyes were not in register. The upper right panel depicts the distribution of the relative velocity of the two eyes. The scatter plot in the lower left panel displays the correlation between these variables. Ideally, all data points should fall in the neighbourhood of zero. The lower right panel depicts the trial-by-trial development of this percentage of time out of register, showing the single values as they were obtained from trial 1 to trial 200. This panel allows one to see whether or not fatigue has an influence on binocular stability.
Figure 8. The figure illustrates the method for determining an index of binocular stability by analysing the relative velocity of the two eyes. Time runs horizontally. At the time of stimulus onset the subject was required to make a saccade. Up means right, down means left. Two trials from the same child are depicted. In the upper case the left eye shows stable fixation before and after the saccade. The right eye, however, converges after the saccade, producing a period of non-zero relative velocity. In the lower case, both eyes produce instability after the saccade. For details see text.
When the values of binocular stability were compared with each other, several aspects became evident: (i) Within a single subject the values assigned to the trials can be very different. Almost perfect trials may be followed by trials with long periods of instability. This means that the subject was not completely unable to maintain the line of gaze for both eyes, but that from time to time the eyes drifted against each other. (ii) There was a large interindividual scatter of the mean values even within a single age group. (iii) Even among the adult subjects large amounts of instability were observed. (iv) The test-retest reliability was reduced by effects of fatigue or general awareness of the subjects. Fig. 10 shows the age development of binocular instability using data from the prosaccade task with overlap conditions. At the beginning of school both large and small values were obtained. There was a clear tendency towards smaller values until adult age.
Figure 9. The figure shows the data of binocular instability of a single subject. For details see text.
However, the ideal value of zero is not reached at any age. This means that small, slow movements of the two eyes in different directions during short periods of time are well tolerated by the visual system. In other words: there are subjects with considerably unstable binocular fusion who do not complain about visual problems. Maybe these subjects suppress the "picture" of one eye all the time to avoid double vision, at the price of a loss of fine stereo vision; their vision is then monocular. Because this does not create too much of a problem in everyday life, such subjects do not show up in the eye doctor's practice and their binocular system is never checked. This situation could be regarded as similar to the case of red-green colour blindness, which may remain undetected throughout life because the subject has no reason to take tests of colour vision.
Figure 10. The percentage of trials in which the two eyes were moving relative to each other (in more than 15% of the analysis time) is shown as a function of age.
2.3. Eye Dominance in Binocular Instability
Since the two eyes send two different pictures to the brain, the picture of one of the two eyes must be prevented from automatically reaching consciousness. This is true for most of the visual scene we see. Only those parts that fall on corresponding points of the two retinae form one single picture in the brain. It is often forgotten that this part covers only those objects that are at about the same distance from the eyes as the object we are currently fixating with both eyes. Because of the necessity to suppress the information from one eye most of the time, it has been speculated that each subject selects one eye as the dominant eye (similar to the selection of one hand as the dominant hand). If, however, the image of one eye were permanently suppressed, fine stereopsis would not be possible. We can easily see that the images of both eyes are present in our visual system, even though we usually do not perceive both of them: fixating a near point while attending to an object further away leads to double vision of that object. The reader may try this using the thumb of one hand as a near point and the thumb of the other hand as a far point. Fig. 11 shows the distribution of the differences between the right eye values and the left eye values of the index of binocular instability. The mean value is not significantly different from zero. But one sees that in a few cases
clear dominances are obtained (8 subjects scored values far to the left, 3 far to the right).
Figure 11. The distribution of the differences between the right eye and the left eye values of binocular instability.
2.4. Independence of Mono- and Bino-Fixation Instability
In principle, the two types of instability may have the same cause: a weak fixation system allows all kinds of unwanted eye movements, including unwanted saccades and unwanted drifts of one or both eyes. In this case one should see high correlations between the two variables describing these types of instability. Because both variables depend on age, we analyse the data within restricted age groups. Fig. 12 shows the scatter plots of the binocular versus the monocular instability for two age groups. The correlation coefficients were only 0.22 for the younger subjects (left side) and 0.21 for the older group (right side). Both correlations failed to reach a significance level of 1%. This means that the properties assessed by the two measures of fixation instability are independent of each other and different in nature. When we look at the data of dyslexic children, we will have many more data that support the independence of these two aspects of fixation instability. Also, we will see later that a monocular training improves the binocular instability but not the monocular instability.
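For readers who want to repeat this kind of analysis, the following sketch computes the Pearson correlation between the two instability measures within one age group and tests it against the 1% level; scipy is assumed to be available, and the variable names are illustrative only:

from scipy import stats

def instability_correlation(mono, bino, alpha=0.01):
    """Correlate monocular and binocular instability within one age group.

    mono, bino: per-subject instability values (same length).
    Returns the correlation coefficient and whether it is significant at alpha.
    """
    r, p = stats.pearsonr(mono, bino)
    return r, p < alpha

# Usage (hypothetical data): r, significant = instability_correlation(mono_7_17, bino_7_17)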
Figure 12. Scatter plot of the binocular versus the monocular instability obtained from overlap prosaccade trials. The left panel depicts the data from subjects between 7 and 17 years of age (N = 129), the right panel shows the data from older subjects 22 to 45 years of age (N = 97). No correlations can be seen.
3. Development of Saccade Control
In the last section we considered the stability of fixation as an important condition for perfect vision. Earlier, we mentioned that saccades are necessary for vision. This might sound like a contradiction. The real requirement is that one should be able to generate sequences of saccades and fixations without the intrusion of unwanted saccades and without losing the convergence of the two eyes when they are in register for a given distance. Therefore, both components of gaze control should function normally. This section will show that saccade control has to be subdivided into subfunctions described by different variables. We first have to find out what these subfunctions are and how they can be assessed.
3.1. The Optomotor Cycle and the Components of Saccade Control
The control of saccades has been investigated for about 40 years, and still we do not understand the system completely. For a long time, the specialization of visual scientists on the one hand and oculomotor researchers on the other prevented these two closely related fields from being investigated in combined research projects. The oculomotor research groups were interested in the eye movement as a movement.
Their interest begins when the eyes begin to move and ends when the eyes stop moving. The visual groups, on the other hand, were interested in the time periods when the eyes do not move. They required their subjects to fixate a small spot while being tested for their visual functions. Only when the interest was concentrated on the time just before the movements was there a chance to learn more about the coordination of saccades and visual processes. The time before a saccade is easily defined by the reaction time: one asks a subject to maintain fixation at one spot of light straight ahead and to make a fast eye movement to another spot of light as soon as it appears. Under these conditions the reaction time is of the order of 200 ms. This is the value one finds in student handbooks. However, there were several problems. The first was: why is this time so long? Although this question was asked from the beginning, there was no answer until 1983/84, when the express saccade was discovered in monkeys [Fischer and Boch, 1983] and in human observers [Fischer and Ramsperger, 1984]. The express saccade is the reflex movement to a suddenly presented light stimulus after an extremely short reaction time (70-80 ms in monkeys and 100-120 ms in human observers). The reflex needs an intact superior colliculus [Schiller et al. 1987]. Fig. 17 shows in its lower part a distribution of saccadic reaction times. It exhibits 3 modes: one at about 100 ms, the next at about 150 ms, and the third at about 200 ms. It was evident from these observations that there is not just one reaction time with a (large and unexplained) scatter. Rather, the reaction time spectrum indicated that there must be at least 3 different presaccadic processes that determine the beginning of a saccade, each taking its own time in a serial way. Depending on how many of the presaccadic processes are already completed before the occurrence of the target stimulus, the reaction time can take one out of three values, each with a certain amount of scatter [Fischer et al. 1995]. It became clear that the shortest reaction time was 100 ms (not 200 ms), and this was much easier to explain: the nerve impulses need about 20 ms to be generated in the retina, 10 ms to travel to the cortex, 10 ms to reach the centres in the brain stem, and 15 ms to reach the eye muscles, and another 5 ms elapse before the eye begins to move. These transmission times add up to about 60 ms, so that of the 100 ms only some 40 ms remain, which were attributed to a central computation time for finding the correct size of the saccade to be programmed. One has to know at this point that saccades are pre-programmed movements: during the last 80 ms before a saccade actually starts, one cannot change anything anymore.
The next problem was: what is it that keeps the eyes from producing saccades all the time? Or, the other way around: what is it that enables us to fixate an object on purpose? The answer came from observations of cells that were active during time periods without eye movements and that were inhibited when saccades were made [Munoz and Wurtz, 1993]. What could have been found much earlier became clear only after neuroscientists began to think in very small steps: each process that we experience as one unique action must eventually be subdivided into a number of sub-processes. It became clear that breaking fixation and/or disengaging allocated attention is a necessary step before a saccade can be generated. There were quite a number of individual papers contributing to the solution of the related problems; most of them have been summarized and discussed earlier [Fischer and Weber, 1993]. Most important for the understanding of the relation between saccades and cognitive processes is the finding that there is a component in saccade control that relies on an intact frontal lobe [Guitton et al. 1985]. From all these considerations it became clear that sequences of fixations and reflexes form the basis of natural vision.
[Figure 13 diagram labels: Fixation (Stop), Reflex (Go), Cognitive Processes: Attention, Decision.]
Figure 13. The figure shows the functional principle of the cooperation of the 3 components of eye movement control.
Fig. 13 shows a scheme which summarizes and takes these different findings into account: the stop-function, provided by fixation, alternates with the reflex, the go-function. Together they build up a stop-and-go traffic of fixations and saccades. What remained open was the question of how it is possible to interrupt this automatic cycling. The answer came from observations of frontal lobe functions: patients who had lost parts of their frontal lobe on one side were unable to suppress the reflex in a simple task, called the antisaccade task [Guitton et al. 1985]. This task requires the subject to make a saccade to one side when the stimulus is presented at the opposite side. The task became very popular in recent years, but it was already used many years ago [Hallett, 1978].
3.2. Methods and Definition of Variables
The fundamental aspects of saccade control as described by the optomotor cycle have been discovered using two basic tasks, which are surprisingly similar but give insight into different aspects of the optomotor cycle. They have been used to quantitatively measure the state of the system of saccade control. We describe these methods and define the variables first. Then we will see some of the results obtained with these methods. The two tasks are called the overlap prosaccade task and the gap antisaccade task. The words pro and anti in their names refer to the instructions that the subject is given. The words overlap and gap describe the timing of the presentation of the fixation point. Fig. 14 shows the sequence of frames for gap and for overlap conditions. In both tasks a small light stimulus is shown, which the subject is asked to fixate. This stimulus is called the fixation point. In overlap trials a new stimulus is added left or right of the fixation point. The subject is asked to make a saccade to this new stimulus, the target stimulus, as soon as it appears. Both the fixation point and the target are visible throughout the rest of the trial: they overlap in time. This overlap condition and the task of looking towards ('pro') the stimulus explain the complete name of the task: overlap prosaccade task. The gap condition differs from the overlap condition in only one aspect: the fixation point is extinguished 200 ms before the target stimulus is presented. The time span from the extinction of the fixation point to the onset of the new target stimulus is called the gap. In addition to this physical difference, the instruction for the subject is also changed: the subject is required to make a saccade in the
direction opposite ('anti') to the stimulus: when the stimulus appears at the left, the subject shall look to the right, and vice versa. Therefore the complete name of this task is: gap antisaccade task. The prosaccade task with overlap condition allows one to find too slow or too fast reaction times and to measure their scatter. The presence of a fixation point should prevent the occurrence of too many reflexes, while the appearance of a new stimulus should allow the timely generation of a saccadic eye movement. The antisaccade task with gap condition challenges the fixation system to maintain fixation and tests the ability to generate a saccade against the direction of the reflex.
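The timing difference between the two conditions can be summarized in a small sketch. The function below builds a schematic event list for a single trial; the target onset time and trial duration are arbitrary illustration values, whereas the 200 ms gap is taken from the text:

def trial_events(condition, target_onset_ms=1000, trial_end_ms=2000):
    """Schematic event timeline for one trial.

    condition: "overlap" (fixation point stays on) or "gap"
               (fixation point extinguished 200 ms before target onset).
    """
    events = [(0, "fixation point on")]
    if condition == "gap":
        events.append((target_onset_ms - 200, "fixation point off"))  # the 200 ms gap
    events.append((target_onset_ms, "target on (left or right)"))
    if condition == "overlap":
        events.append((trial_end_ms, "fixation point and target off"))  # they overlap in time
    else:
        events.append((trial_end_ms, "target off"))
    return events

# Overlap prosaccade trial: instruction "look towards the target"
# Gap antisaccade trial:    instruction "look away from the target"
print(trial_events("gap"))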
[Figure 14 panels: frame sequences of the Overlap Prosaccade Task and the Gap Antisaccade Task.]
Figure 14. The figure shows the sequence of frames for overlap and for gap conditions. The horizontal arrows indicate in which direction the saccade should be made: towards the stimulus in the prosaccade task, and in the direction opposite to the stimulus in the antisaccade task.
Now we can define variables to describe the state of the saccade control system. First of all, one has to keep in mind that these variables may be different for left versus right stimulation. Because left/right directed saccades are generated by the right/left hemisphere, side differences should not be much of a surprise. However, for the general definition of the variables to be used in the diagnosis, the side differences do not need to be considered at this point. Fig. 15 illustrates the definition of the variables described below. Time runs from left to right. The stimulus is indicated by the thick black line. Because its presentation is identical in both conditions, it is drawn only once in the middle. The fixation point is shown by the thin black line. In the case of an overlap trial the fixation point remains visible; in the case of a gap trial it is extinguished 200 ms before target onset. In addition the figure shows schematic traces
of eye movements, which help to understand the definition of the variables. The upper case shows a trace from an overlap trial. Usually one saccade is made, and it contributes its reaction time, SRT. Below, one sees two examples of traces. One shows a correct antisaccade, which contributes its reaction time, Anti-SRT. The other trace depicts a trial with a direction error that was corrected a little later. It contributes the reaction time of the error, Pro-SRT, and the correction time, CRT (in case the error was corrected). While these variables can be taken from every single trial, some other variables are determined from the analysis of the complete set of 200 trials: the percentage of express saccades among all overlap trials, the percentage of errors among all gap trials, and the percentage of corrections among the errors.
[Figure 15 diagram: eye position traces for the PRO-OVERLAP and ANTI-GAP conditions, with the stimulus, fixation point, gap, SRT, Pro-SRT, CRT, Anti-SRT, % express, % errors and % corrections marked.]
Figure 15. The schematic drawing of eye movement traces illustrates the definition of the different variables describing the performance of the prosaccade task with overlap conditions and the antisaccade task with gap conditions.

List of variables. From the overlap prosaccade task the following mean values and their scatter are used:

• SRT: the saccadic reaction time in ms from the onset of the target to the beginning of the saccade
• % expr: the percentage of express saccades, i.e. reaction times between 80 and 130 ms

From the gap antisaccade task:

• Anti-SRT: the reaction time of the correct antisaccades
• Pro-SRT: the reaction time of the errors
• CRT: the correction time
• % err: the percentage of errors
• % corr: the percentage of corrections among the errors

Note that the percentage of trials in which the subject failed to reach the opposite side within the time limit of the trial (700 ms from stimulus presentation) can be calculated as %mis = %err · (100 − %corr)/100. This latter variable combines the error rate and the correction rate.
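A minimal sketch, assuming each gap antisaccade trial has already been scored as correct, corrected error, or uncorrected error, shows how these percentages combine; the example numbers are chosen for illustration and correspond to the values reported for the youngest age group later in this section:

def antisaccade_summary(trial_outcomes):
    """Summary of gap antisaccade performance.

    trial_outcomes: list with one entry per trial, each either
    "correct", "corrected_error", or "uncorrected_error".
    Returns (%err, %corr, %mis).
    """
    n = len(trial_outcomes)
    errors = [t for t in trial_outcomes if t != "correct"]
    p_err = 100.0 * len(errors) / n
    p_corr = (100.0 * sum(1 for t in errors if t == "corrected_error") / len(errors)) if errors else 0.0
    p_mis = p_err * (100.0 - p_corr) / 100.0  # uncorrected errors (misses), as defined above
    return p_err, p_corr, p_mis

# Example: an error rate of 80% with a correction rate of 40% gives
# %mis = 80 * (100 - 40) / 100 = 48%, i.e. the opposite side is reached
# in only about half of the trials.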
3.3. Prosaccades and Reflexes
The optomotor system has a number of reflexes for automatic reactions to different physical stimulations. The best known reflex is the vestibulo-ocular reflex, which compensates head or body movements to stabilize the direction of gaze on a fixated object: the eyes move smoothly in the direction opposite to the head movement in order to keep the currently fixated object in the fovea. Similarly, it is possible to stabilize the image of a moving object by the optokinetic reflex. Both reflexes have little or nothing to do with reading. The saccadic reflex is a reaction of the eyes to a suddenly appearing light stimulus. It was discovered only in 1983/84 by analysing the reaction times of saccades in a situation where the fixation point was extinguished shortly (200 ms gap) before a new target stimulus was presented. It was known at that time that under these gap conditions the reaction times were considerably shorter than those obtained under overlap conditions [Saslow, 1967]. When the gap experiment of Saslow was repeated years later, it became evident that among the well known reactions around 150 ms after target onset there was a
separate group of extremely short reactions at about 100 ms, the express saccades [Fischer and Ramsperger, 1984]. Fig. 16 shows the distribution of reaction times from a single subject. One clearly sees two peaks. The first peak consists of express saccades, the second represents the fast regular saccades.
Figure 16. The figure shows the distributions of reaction times from a single subject. One clearly sees two peaks. The first represents the express saccades, the second the fast regular saccades.
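The percentage of express saccades used throughout this chapter follows directly from the 80-130 ms window defined above. The sketch below also classifies individual reaction times into the three modes; the boundary between the fast regular and the slow regular modes (180 ms) is an assumption for illustration, based only on the approximate peak positions mentioned in the text:

def percent_express(reaction_times_ms):
    """Percentage of express saccades (reaction times between 80 and 130 ms)."""
    express = [rt for rt in reaction_times_ms if 80 <= rt <= 130]
    return 100.0 * len(express) / len(reaction_times_ms)

def classify_srt(rt):
    """Rough classification of a single saccadic reaction time (ms)."""
    if 80 <= rt <= 130:
        return "express"        # first mode, around 100 ms
    if rt <= 180:               # assumed boundary, for illustration only
        return "fast regular"   # second mode, around 150 ms
    return "slow regular"       # third mode, around 200 ms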
Fig. 17 shows the difference in the distributions of reaction times when gap and overlap trials were used. The separate peaks in the distributions indicate that saccades can be generated at distinctly different reaction times, depending on the preparatory processes between target onset and the beginning of the saccade. In the gap condition there is time during the gap to complete one or even two pre-saccadic processes; therefore the chance of generating express saccades is high. In the overlap condition it is the target stimulus that triggers the preparatory processes, and therefore the chances of express saccades are low. If one leaves the fixation point visible throughout the trial (overlap condition) the reaction times are considerably longer, even longer than in the gap=0 condition (not shown here). The consistent shortening of reaction times by introducing a temporal gap between fixation point offset and target onset was surprising, because the role of fixation and of the fixation point as a visual stimulus was unknown at the time. The effect is called the gap effect and has been investigated in numerous studies by different
Figure 17. The figure shows the difference in the distributions of reaction times when gap and overlap trials were used. Note the separate peaks in the distributions.
research groups all around the world since 1984. The effect of the gap on the reaction time is strongest if the gap lasts approximately 200 milliseconds. An overview and a list of publications can be found in a review article [Fischer and Weber, 1993]. Today it is clear that the main reason for the increase in reaction time under overlap conditions is an inhibitory action of a separate subsystem in the control of eye movements, the fixation system. It is activated by a foveal stimulus, which is being used as a fixation point, and it inhibits the subsystem which generates saccades. If this stimulus is removed early enough, the inhibition is removed by the time the target occurs, and a saccade can be generated immediately, i.e. after the shortest possible reaction time. Note that the effect of the gap is not a general reduction of reaction times,
but rather that the first peak is larger and the third peak is smaller or almost absent. As a result the mean value of the total distribution is reduced. At this point we do not have to go through the discussion of whether or not directed visual attention also inhibits or disinhibits the saccade system, depending on whether attention is engaged or disengaged. This issue has been discussed extensively, and still today the arguments are not finally settled. We only have to keep in mind that the gap condition enables reflex movements to a suddenly presented visual stimulus. It is also important to remember that there are subjects who produce express saccades under overlap conditions [Fischer et al. 1993]. We will encounter these so-called express saccade makers [Biscaldi et al. 1996] again when we consider the eye movements of dyslexic subjects.
3.4. Antisaccades: Voluntary Saccade Control
It is an everyday experience that we can stop our saccades and that we can direct our centre of gaze to a selected object or location in space by our own decision. These saccades are called voluntary saccades for obvious reasons. It will therefore not be a big surprise to learn that different neural subsystems generate the automatic saccades and the voluntary saccades. The investigation of voluntary saccades was introduced many years ago [Hallett, 1978], but the oculomotor research community did not pay much attention to it. Hallett instructed his subjects to make saccades to the side opposite to a suddenly presented stimulus. These saccades were called antisaccades. An early observation by neurologists did not receive much attention either, but turned out to be very important. It was reported that patients who had lost a considerable part of their frontal lobe on one side only were unable to generate antisaccades to the side of the lesion, while the generation of antisaccades to the opposite side remained intact [Guitton et al. 1985]. Meanwhile the antisaccade task has become a popular "instrument" for diagnosis in neurology and neuropsychology. Reviews have been published and can be consulted by the interested reader [Everling and Fischer, 1998]; [Munoz and Everling, 2004]. The effect of changing the instruction from "look to the stimulus when it appears" to "look away from the stimulus" (the anti-instruction) can be seen in Fig. 18.
Figure 18. The figure shows the distribution of saccadic reaction times under overlap (left) and under gap conditions (right). The lower panels show the data from those trials in which the subjects made errors by looking first to the stimulus. Note that with overlap conditions there are virtually no such errors, while with gap conditions a considerable number of errors were made.
The introduction of the gap leads to quite a number of errors. Interestingly, subjects often failed to judge their own performance: some claimed that they made many errors but in fact made few, while others claimed that they made few errors but made quite many. This indicates that we have little conscious knowledge of what we do with our eyes. The processes preparing the saccades and their execution remain mostly unconscious. When the variables obtained from an overlap prosaccade task and from a gap antisaccade task were analysed by a factor analysis [Gezeck et al. 1997], it turned out that there were only 2 factors. The first factor contained the variables that describe prosaccades, irrespective of whether
they were generated in the prosaccade task or as errors in the antisaccade task. The second factor contained the variables that describe the performance of the antisaccade task. But there was one exception: the error rate loaded on both factors. The explanation of this result becomes evident when we remember that the correct performance of the antisaccade task requires 2 steps: suppression of the prosaccade and generation of the antisaccade. The error rate may be high for 2 reasons: (i) the suppression is not strong enough, or (ii) the subject has difficulties in looking to the side where there is no target. The details of these observations finally resulted in the decision to use the 2 tasks described above in order to characterize the functional state of the system of saccade control. The procedure of the corresponding analysis of the raw eye movement data has been described in great detail [Fischer et al. 1997]. The tasks are illustrated in Fig. 14, the definitions of the variables are illustrated by Fig. 15. Today there is a special instrument and analysis system which allows one to measure the eye movements and to assess the variables, their mean values and their scatter. Data on the test-retest reliability of saccade measures, especially also for measures of antisaccade task performance, are available [Klein and Fischer, 2005]. The data obtained from these two tasks are shown in Fig. 19 separately for left and right stimulation. The data were combined from 8 subjects in the age range of 14 to 17 years. The distributions show most of the important aspects of the data, which are not as clear in the data of single subjects. The 3 peaks are seen in both the left and the right distributions obtained from the prosaccade task with overlap conditions (upper panels). But they are not quite identical: more express saccades are made to the right stimulus than to the left stimulus. The antisaccades (lower panels) have longer reaction times, and a structure with different modes is missing. Earlier studies of antisaccade performance, as summarized recently [Munoz and Everling, 2004], analyse the reaction times in the antisaccade task and the percentage of errors. Most studies, however, failed to analyse the reaction times of the errors. They also neglected the percentage of corrective saccades and the correction time. We will see below that these variables provide important information about the reasons why errors were made [Fischer et al. 2000]; [Fischer and Weber, 1992]. We therefore also show the distributions of the reaction times of the errors and the distributions of the correction times of the same subjects as in Fig. 19.
Figure 19. The figure shows the distributions of reaction times of 8 subjects performing the prosaccade task with overlap conditions (upper panels) and the antisaccade task with gap conditions. Panels at the left and right show the data for left and right stimulation, respectively.
Now we can further explain the data shown in Fig. 19 and in Fig. 20. The subjects as a group made quite a number of express saccades in the overlap prosaccade task (upper panels of Fig. 19), indicating that their ability to suppress saccades is limited. We can see that the errors in the gap antisaccade task (upper panels of Fig. 20) also contained more than 50% express saccades. The error rate is 35% at the left and 42% at the right side. Of these errors 87% and 92% were corrected after very short correction times of 131 ms and 129 ms, respectively (lower panels of Fig. 20). This indicates that the subjects have no problem looking to the opposite side. They do reach the destination, but they get there with a detour because they could not suppress the saccade to the target. Their errors were mostly due to a weakness of the fixation system.
Figure 20. The figure shows the reaction times of the errors (upper panels) and the distributions of the correction times (lower panels) of the same subjects as in Fig. 19.
This reminds us that we already have 2 independent factors of instability of fixation: the intrusive saccades, and the binocular instability of slow movements of the two eyes in different directions or with different velocities. Now a third aspect is added by the occurrence of express saccades, in particular when they occur as errors in the antisaccade task. Fixation may also be considered weak when it does not allow the suppression of the errors in the antisaccade task.
3.5. The Age Curves of Saccade Control
After these considerations and definitions we can look at the age development of the different variables. The data presented here contain many more subjects than an earlier study, which had already shown the development of saccade control with age increasing from 7 to 70 years [Fischer et al. 1997].
Fig. 21 begins with the age curves of the performance of prosaccades with overlap conditions. The reaction times start with about 240 ms at the age of 7 to 8 years. During the next 10 years the reaction times become shorter by about 50 or 60 ms. From the age of 40 years one sees a gradual increase of the reaction times. At about 60 years they reach the level of the 7 year old children.
Figure 21. The diagrams show the age curves of the performance of prosaccades with overlap conditions. The left side depicts the age dependence of the reaction times, the right side shows the age dependence of the percentage of express saccades in the distributions. N=425.

One might expect that the occurrence of reflex-like movements (express saccades) is also a function of age, because the reflexes come under more cortical control with increasing age. However, this general aspect of development may be seen much earlier in life, i.e. during the first year of life. Yet there is a strong tendency towards a reduction of the number of express saccades with increasing age, from a mean value just below 15% to a mean value of about 5%. There are, however, extreme cases of subjects producing quite many express saccades. The large scatter in the data is due to these subjects. It has been stated that percentages of express saccades above a limit of 30% must be regarded as an exceptional weakness of the fixation system. The corresponding subjects are called express saccade makers [Biscaldi et al. 1996]. An extreme case of an express saccade maker is shown in Fig. 22. In this subject the express saccades occur only to the right side.
Figure 22. The figure shows the distributions of saccadic reaction times from a single subject who performed the prosaccade task with overlap conditions. Saccades to the left side are depicted in the left panel, those to the right side in the right panel. Note the large peak of express saccades to the right as compared with no express saccades to the left.
Later in the book we will look at the percentage of express saccades among the prosaccades generated under overlap conditions, because we want to be prepared for the diagnosis of saccade control in the following parts of the book, when large numbers of express saccades are made by single subjects of certain ages. Fig. 23 shows the age curves for the variables that describe the performance of the antisaccade task. The reaction times of the correct antisaccades are depicted in the upper left panel. The mean value of the youngest group, at about 340 ms, is 100 ms slower than that of their prosaccades. As in the case of the prosaccades, a reduction of the reaction times occurs within the next 10 years. However, they are reduced by about 100 ms; compared with the prosaccades this reduction is twice as big. The percentage of errors (middle left panel) reaches almost 80% for the youngest group. This means that they are almost completely unable to do the task in one step. The error rate decreases to about 20%, stays at this level, and increases again after the age of about 40 years. The bottom left panel depicts the correction rate. Out of the 80% errors, the youngest group was able to correct the primary error in only 40% of cases. The correction rate increases until the age of 20 to above 80%, stays at this level, and decreases again after the age of 50 years.
Figure 23. The figure shows the age development of the performance of the antisaccade task with gap conditions. N=328. The period of the "best" values is between 20 and 40 years of age.
Combining the two measures of error production and correction results in the age curve of the percentage of uncorrected errors (misses), shown in the lower right panel of Fig. 23. The children of the youngest group reached the opposite
side in only half of the trials. During the following 10 years the rate of misses drops to almost zero. This indicates that the adult subjects in the age range between 20 and 40 years produce 20% errors, but correct almost all of them. After the age of 60 the subjects begin to have more difficulties in correcting their increasing rate of errors. Finally, we look at the reaction times of the errors, shown in the upper right panel. The age curve mirrors the curve for the reaction times of the prosaccades generated in the overlap condition. However, the error reaction times were shorter by about the same amount of 50 ms over the complete range of ages covered.
3.6. Left – Right Asymmetries
The question of hemispheric specialisation is asked for almost any aspect of brain function. In the case of saccade control it might be argued that, depending on the culture, writing goes from left to right, from right to left, or from top to bottom. Therefore we look at the possible asymmetries of the different variables describing saccade control. The differences between left and right variables did not show any systematic age dependence, presumably because the right and the left variables follow the same development with age. Therefore we look at the total distribution of the difference values for all ages. Fig. 24 shows these distributions of differences for 6 variables. The upper left panel depicts the differences in the reaction times of the prosaccades with overlap conditions. The distribution looks rather symmetrical, and in fact the deviation of the mean value is only 6 ms and not significantly different from zero. However, this does not indicate that there are no asymmetries. It shows that asymmetries occur about as often in favour of the right side as in favour of the left side. The standard deviation of 30 ms to either side indicates that in 32% of the cases the reaction times differ by 30 ms or more (assuming an approximately normal distribution, about 32% of the values lie more than one standard deviation from the mean). The tendency is that reaction times are somewhat shorter for right directed saccades as compared with left directed saccades. This small difference may be related to the fact that the German language is written from left to right (all data in this book come from native German speakers). The upper right panel depicts the differences between the percentages of express saccades made to the right and to the left. The mean value is -1.1% and not significantly different from zero. But there is a tendency towards more express saccades to the right than to the left.
Figure 24. The figure shows the distributions of the left minus right differences of 6 variables describing saccade control.
The standard deviation is 12%, indicating that in 32% of the subjects the percentage of express saccades to one side exceeds that to the other side by more than 12 percentage points. Extreme cases can be seen within this relatively large group of normal subjects. An example can be seen in Fig. 22. The distributions of saccadic reaction times obtained with overlap conditions are shown for left
and right directed prosaccades. Almost all saccades to the left occur between 130 ms and 170 ms; these are fast regular saccades. Most saccades to the right occur between 85 ms and 140 ms; these are express saccades. The figure demonstrates an extreme case of asymmetry of prosaccades. The reaction times of the correct antisaccades in the gap condition show a similar result: one encounters quite a number of subjects with heavy asymmetries (32% with differences of more than 45 ms), but the mean value of 5 ms is not statistically different from zero. The correction times exhibit even stronger asymmetries: in 32% of the subjects the differences are larger than 55 ms. The percentage of errors in the antisaccade task exhibits differences of more than 15% in 32% of the cases, and the differences in the percentage of corrective saccades are larger than 28% in 32% of the subjects. From the consideration of the asymmetries in saccade control we can conclude that large asymmetries occur in quite many cases. Because the asymmetries in favour of the right and of the left side are about the same in number as well as in size, the mean value of the distribution does not deviate significantly from zero.
3.7. Correlations and Independence
Large numbers of errors in the antisaccade task are often interpreted as a consequence of a weak fixation system. This would imply that many intrusive saccades should be observed in the overlap prosaccade task (poor mono fixation stability) along with many errors in the gap antisaccade task. We can look at the possible correlation between these two measures. Fig. 25 shows the scatter plot of the data obtained from control subjects in the age range of 7 to 13 years. While the correlation coefficient indicates a significant positive correlation, the plot shows in detail that the relation works only in one direction: high values of intrusive saccades occur along with high values of errors, but not vice versa; high values of errors may occur along with low or with high numbers of intrusive saccades. In other words: even if a subject is able to suppress intrusive saccades while fixating a small spot, he/she may not be able to suppress reflexive saccades to a suddenly presented stimulus. But a subject who is able to suppress the reflexive saccades is also able to suppress intrusive saccades. This means that the reason for many errors in the antisaccade task may be a weak fixation system, but other reasons also exist, such that high error rates may be produced even though the mono fixation stability was high.
Figure 25. Scatter plot of the error rate in the gap antisaccade task and mono fixation instability. High values of intrusive saccades occur along with high values of errors, but not vice versa: high values of errors may occur along with low or with high numbers of intrusive saccades.
The analysis of the relationship between errors, error correction, correction time, and express saccades can also be used to learn more about fixation and its role in saccade control. Those who produce many errors and many express saccades correct their errors more often and after shorter correction times than subjects who also produce many errors but relatively few express saccades; the latter correct their errors less often, and their correction times are longer. The details are described in the literature [Mokler and Fischer, 1999]. In conclusion, from this section we can state that saccade control has indeed 3 main components: fixation (being weak or strong, as indicated by express saccades), reflexive control, and voluntary control. These 3 components work together in the functional form of the optomotor cycle. The functioning of the cycle improves over the years from the age of 7 to adult age and has a strong tendency to deteriorate after the age of 40 years [Fischer et al. 1997].
References

Biscaldi, M; Fischer, B; Stuhr, V. (1996). Human express-saccade makers are impaired at suppressing visually-evoked saccades. J Neurophysiol 76: 199-214
Everling, S; Fischer, B. (1998). The antisaccade: a review of basic research and clinical studies. Neuropsychologia 36: 885-899
Fischer, B. (1987). The preparation of visually guided saccades. Rev Physiol Biochem Pharmacol 106: 1-35
Fischer, B; Biscaldi, M; Gezeck, S. (1997). On the development of voluntary and reflexive components in human saccade generation. Brain Res 754: 285-297
Fischer, B; Boch, R. (1983). Saccadic eye movements after extremely short reaction times in the monkey. Brain Res 260: 21-26
Fischer, B; Breitmeyer, B. (1987). Mechanisms of visual attention revealed by saccadic eye movements. Neuropsychologia 25: 73-83
Fischer, B; daPos, O; Strzel, F. (2003). Illusory illusions: The significance of fixation on the perception of geometrical illusions. Perception 32: 1001-1008
Fischer, B; Gezeck, S; Huber, W. (1995). The three-loop-model: A neural network for the generation of saccadic reaction times. Biol Cybern 72: 185-196
Fischer, B; Hartnegg, K; Mokler, A. (2000). Dynamic visual perception of dyslexic children. Perception 29: 523-530
Fischer, B; Ramsperger, E. (1984). Human express saccades: extremely short reaction times of goal directed eye movements. Exp Brain Res 57: 191-195
Fischer, B; Weber, H. (1992). Characteristics of "anti" saccades in man. Exp Brain Res 89: 415-424
Fischer, B; Weber, H. (1993). Express saccades and visual attention. Behavioral and Brain Sciences 16(3): 553-567
Fischer, B; Weber, H; Biscaldi, M; Aiple, F; Otto, P; Stuhr, V. (1993). Separate populations of visually guided saccades in humans: reaction times and amplitudes. Exp Brain Res 92: 528-541
Gezeck, S; Fischer, B; Timmer, J. (1997). Saccadic reaction times: a statistical analysis of multimodal distributions. Vision Res 37: 2119-2131
Guitton, D; Buchtel, HA; Douglas, RM. (1985). Frontal lobe lesions in man cause difficulties in suppressing reflexive glances and in generating goal-directed saccades. Exp Brain Res 58: 455-472
Hallett, PE. (1978). Primary and secondary saccades to goals defined by instructions. Vision Res 18: 1279-1296
Hartnegg, K; Fischer, B. (2002). A turn-key transportable eye-tracking instrument for clinical assessment. Behavior Research Methods, Instruments, & Computers 34: 625-629
Klein, C; Fischer, B. (2005). Instrumental and test-retest reliability of saccadic measures. Biological Psychology 68: 201-213
Mayfrank, L; Mobashery, M; Kimmig, H; Fischer, B. (1986). The role of fixation and visual attention in the occurrence of express saccades in man. Eur Arch Psychiatry Neurol Sci 235: 269-275
Mokler, A; Fischer, B. (1999). The recognition and correction of involuntary saccades in an antisaccade task. Exp Brain Res 125: 511-516
Munoz, DP; Wurtz, RH. (1992). Role of the rostral superior colliculus in active visual fixation and execution of express saccades. J Neurophysiol 67: 1000-1002
Munoz, DP; Wurtz, RH. (1993). Fixation cells in monkey superior colliculus. I. Characteristics of cell discharge. J Neurophysiol 70: 559-575
Munoz, DP; Everling, S. (2004). Look away: the anti-saccade task and the voluntary control of eye movement. Nature Reviews Neuroscience 5: 218-228
Reichle, ED; Rayner, K; Pollatsek, A. (2003). The E-Z Reader model of eye-movement control in reading: comparison to other models. Behavioral and Brain Sciences 26: 445-526
Saslow, MG. (1967). Latency for saccadic eye movement. J Opt Soc Am 57: 1030-1033
Schiller, PH; Sandell, JH; Maunsell, JH. (1987). The effect of frontal eye field and superior colliculus lesions on saccadic latencies in the rhesus monkey. J Neurophysiol 57: 1033-1049
In: Binocular Vision Editors: J. McCoun et al, pp. 247-248
ISBN: 978-1-60876-547-8 © 2010 Nova Science Publishers, Inc.
Short Commentary
OCULAR DOMINANCE
Jonathan S. Pointer
Optometric Research, 4A Market Square, Higham Ferrers, Northamptonshire NN10 8BP, UK
The time has come to consider seriously what has hitherto possibly been regarded as a somewhat narrow proposition; namely, that ocular dominance is best understood by the behaviourally descriptive term ‘sighting preference’. By this is meant: The eye that is consciously or unconsciously selected for monocular tasks. This definition aligns with the first description of the phenomenon recorded four hundred years ago. Sighting dominance can be regarded as the ocular laterality most analogous to decisions regarding limb choice, ie, under circumstances when only one of a pair can be selected for use (eg, writing hand, ball-kicking foot). An individual’s sighting choice appears to be substantially consistent within and between applicable tests, which latter are usually based on simple motor tasks such as pointing, aiming or alignment. Such techniques have proved to be reliable indicators of dominance across populations and within either gender; the ocular laterality thus identified is apparently stable with advancing chronological age, and appears not to show familial traits. In contrast, indications of ocular dominance from sensory tests, including the binocular viewing of rivalrous stimuli or the recording of functional oculo-visual
asymmetries (often of visual acuity), have a tendency to show intra- and inter-test disagreement. Theories of ocular dominance have developed and evolved through the twentieth century to acknowledge and accommodate these discrepancies, and to account for the not infrequent disagreement found in the individual between the laterality result obtained using a sighting compared to a sensory test format. Many of these explanations have been at best parsimonious and sometimes in contradiction of the prevailing knowledge of binocular vision, including that of visual suppression. Sighting dominance is not reliably associated with limb preference or, for reasons of ocular neuro-anatomy, predicted by cortical laterality: a generalised (uniform) laterality preference of sensory organs and motor limbs in the individual is unlikely. Despite a burgeoning research output over recent decades, the identification of a functional basis for ocular dominance continues to prove elusive: the phenomenon remains a ‘demonstrable habit’, adopted when a single eye must be chosen for a particular viewing task.
INDEX A abnormalities, 147, 152 ACC, 190 accessibility, 129 accidents, 190, 207 accommodation, 143, 144, 152, 157, 158, 160 accuracy, viii, ix, 2, 4, 13, 22, 23, 31, 35, 37, 69, 81, 83, 99, 100, 103, 196, 203 ACM, 59 activation, 165, 168, 169, 172, 174, 176, 180, 182, 184, 185 acute, 66 adaptability, 74 adaptation, 74, 146, 147 adjustment, 2, 27, 58, 59, 88, 187, 215 adult, 67, 73, 112, 123, 156, 219, 240, 243 adulthood, xii, 209 adults, xi, 73, 76, 78, 155, 191, 216 Africa, 66 age, xi, xii, 66, 67, 72, 74, 79, 144, 147, 148, 151, 155, 157, 200, 209, 216, 219, 220, 221, 222, 223, 234, 236, 237, 238, 239, 240, 242, 243, 247 aggregation, 130 aggressiveness, 170 aid, 64, 174 Aircraft, 134, 136 Airship, 132 albinism, 109 algorithm, xi, xii, 3, 4, 6, 16, 19, 21, 24, 26, 27, 29, 31, 32, 33, 34, 37, 38, 39, 43, 44, 61, 116, 122, 127, 189, 191, 192, 193, 194, 195, 201, 203, 204, 205, 206
alternative, 68, 71, 113, 148 alternatives, 64 alters, 168 ambiguity, 3, 31, 35, 183, 186 ambivalent, 166 amblyopia, vii, x, xi, 139, 140, 147, 148, 149, 150, 152, 153, 157 American Psychiatric Association, 175, 176 amplitude, ix, 76, 108, 110, 114, 119, 120 amygdala, 162, 163, 164, 165, 169, 176, 180, 181, 182, 183, 184, 185, 186, 187 anger, 184 animal studies, 162, 164 animals, 72, 168 aniridia, 108 annealing, 104 anomalous, 111, 122, 146, 147, 150, 151, 152 antagonistic, 214 anterior cingulate cortex, 180 anxiety, 182, 186 application, 59, 112, 116, 126, 129, 130, 191, 192, 196, 213 argument, 17, 169 Aristotle, 66, 69, 79 arousal, 165, 171, 175, 181, 182, 183, 185 artificial intelligence, x, 125, 129, 137 aspect ratio, 191 assessment, vii, xi, 40, 79, 111, 149, 150, 155 astigmatism, 117 astronomy, 64 asymmetry, 242 atropine, 148, 152 attentional bias, 181 Australia, 66 automatic processes, 162, 180
availability, 2 avoidance, 127, 181 awareness, x, 79, 125, 130, 132, 166, 219
B basic research, 243 battery, 68, 76 beating, 111, 119 behavior, ix, 81, 82, 96, 104, 162 benefits, x, 5, 125, 204 Bezier, ix, 81, 82, 83, 84, 85, 87, 88, 91, 103, 104, 105 bias, viii, 2, 3, 4, 58, 76, 185 binomial distribution, 88 birds, 131 birth, ix, 107, 108, 109, 119, 147 blindness, 220 bonding, 48, 50 botulinum, 117 brain, 131, 163, 164, 165, 166, 169, 185, 209, 217, 221, 224, 240 brain functioning, 131 brain functions, 240 brain stem, 224 branching, 18 breathing, 210 buildings, 127 buttons, 176
C calibration, 2, 95, 104, 109, 113 caloric nystagmus, 115 Canada, 113 candidates, 16, 74, 191, 192, 197, 198, 206 capacity, 131 case study, 119 cataract, 80, 108, 109, 148, 149 cataract surgery, 80 cataracts, 75, 109, 148, 150 cell, 168, 194, 195, 196, 202, 206, 245 cerebral hemisphere, 68 channels, 168, 169 chiasma, 71 childhood, 67
children, 67, 73, 77, 111, 112, 113, 122, 147, 148, 149, 151, 152, 153, 191, 216, 222, 237, 239, 244 China, 123, 206 classical, 75, 143 classification, xi, 111, 129, 175, 189, 191, 192, 197 clay, 64 clinical approach, 74 clinical assessment, 245 clinical disorders, vii, xi, 140 clinical examination, 111 coding, 173, 175 cognitive, 225 cognitive process, 225 coil, 112, 113 communication, 58 community, 232 compensation, 212 competition, 167, 168, 169, 177, 182 compilation, 67 complexity, 30, 35, 173 components, 19, 103, 110, 119, 192, 223, 225, 243, 244 computation, 3, 4, 35, 190, 192, 224 computational performance, 99 computing, ix, 6, 31, 32, 37, 38, 62, 108, 133 conception, 109, 131 conditioning, 178, 179, 182, 184 confidence, 18, 102, 103, 130 configuration, 9, 10, 15, 22, 94 confusion, 146, 147 congenital cataract, 109 Congress, 133, 134, 136 conscious awareness, 166 conscious knowledge, 233 conscious perception, 166, 167, 169 consciousness, 166, 221 consensus, 207 constraints, viii, 2, 5, 6, 15, 63, 105 consumption, x, 125, 129, 130, 131 continuity, 5, 60, 105 contrast sensitivity, 110, 122 control, vii, x, xi, xii, 117, 123, 125, 126, 127, 128, 129, 130, 131, 132, 136, 158, 161, 166, 169, 170, 174, 176, 177, 181, 186, 188, 190, 209, 210, 212, 214, 223, 225, 226, 227, 231, 234, 236, 237, 238, 240, 241, 242, 243, 245
convergence, 21, 24, 26, 110, 112, 143, 144, 149, 157, 160, 217, 223 convex, 35, 36, 37 coordination, vii, x, 77, 139, 149, 224 corneal opacities, 148 correlation, 5, 16, 17, 113, 119, 128, 203, 218, 222, 242 correlation coefficient, 119, 222, 242 correlation function, 16, 17 correlations, 69, 222, 223 cortex, 71, 140, 152, 153, 163, 164, 165, 168, 169, 176, 180, 182, 184, 185, 187, 224 cortical processing, 164 countermeasures, 132 covering, 67 CPU, 195, 196 CRC, 61 critical period, 147 crocodile, 131 cross-cultural, 182 cross-talk, 109 CRT, 228, 229 cruise missiles, 126 crystalline, 143 cues, xi, 6, 161, 162, 165, 176, 177, 191, 206 culture, 174, 191, 240 cumulative distribution function, 196 Cybernetics, 59, 61, 133 cycles, x, 108, 116, 117 cycling, 226 cycloplegic refraction, 147 Cyprus, 121
D damping, 110 danger, 162, 165 data set, viii, 2 decisions, 247 decoupling, 3 defects, 117 deficiency, 6 deficit, 148, 150 definition, 149, 227, 228, 234, 247 deformation, 83 demand, 74, 75, 126, 210 density, 196, 199 dependent variable, 172 depressed, 170
depression, 159 deprivation, 147, 148 depth perception, xi, 72, 140, 141, 144, 149, 152 detection, ix, xi, 40, 81, 82, 85, 103, 104, 126, 127, 128, 140, 151, 180, 186, 189, 190, 191, 192, 196, 206, 207, 208 deviation, xi, 3, 103, 147, 155, 156, 157, 158, 159, 240 diodes, 104 diplopia, x, 139, 141, 142, 146, 147 discomfort, 74 discrimination, 150, 186 diseases, ix, 107 disorder, ix, 107, 108, 149 displacement, 2, 82, 84, 87, 88, 90, 94, 96, 99, 104 dissociation, xi, 155, 156, 157, 158, 160 distribution, 66, 85, 88, 196, 218, 221, 222, 224, 230, 232, 233, 240, 241, 242 division, 29, 30, 31, 32, 33, 34, 164 doctors, 220 dominance, vii, viii, xi, xii, 22, 63, 64, 67, 68, 69, 70, 72, 75, 76, 77, 78, 79, 80, 161, 167, 168, 171, 172, 173, 176, 177, 179, 180, 183, 184, 217, 247, 248 DSM-IV, 176 duration, 111, 114, 117, 121, 148, 173, 174, 175, 177
E Education, 206 EEG, 165, 179, 181, 182 elaboration, 73, 129 electrodes, 112, 113 electromagnetic, 113, 130 electromagnetic wave, 130 embryogenesis, 109 emission, 130 emitters, 113 emotion, 163, 172, 174, 180, 182, 183, 184, 185 emotional, xi, 161, 162, 163, 164, 165, 169, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187 emotional information, 163, 164 emotional reactions, 163, 174, 176, 178
emotional stimuli, xi, 161, 162, 163, 164, 165, 174, 179, 180, 181, 184, 186 emotional valence, 178 emotions, 171 encoding, 108 environment, x, 13, 40, 125, 127, 130, 132, 166 environmental conditions, 191 EOG, 112, 121 epipolar geometry, 3, 5, 6 epipolar line, 10, 15, 16, 17 esotropia, 147, 148, 150, 151, 152 estimating, 3, 4, 59 estimation process, 4, 40 estimator, 59, 117, 119 Euro, 62 Europe, 207 European Social Fund, 206 European Union, 190 evolution, 111, 126, 131, 133 examinations, 68 execution, 233, 245 exotropia, 148 experimental design, 175 explicit knowledge, 187 extraction, vii, 1, 29, 37, 38, 48, 50, 58, 131 eye movement, vii, ix, xii, 68, 107, 108, 109, 111, 112, 113, 114, 115, 116, 117, 119, 122, 123, 141, 143, 144, 145, 150, 209, 210, 211, 212, 213, 214, 215, 222, 223, 224, 225, 227, 228, 231, 232, 234, 244, 245 eyes, vii, viii, ix, x, 63, 64, 65, 66, 69, 70, 71, 72, 73, 74, 75, 78, 108, 109, 110, 111, 112, 139, 140, 141, 144, 148, 149, 156, 158, 166, 167, 168, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 229, 233, 236
F facial expression, 171, 174, 175, 178, 179, 183, 185, 187 factor analysis, 68, 233 failure, 73 false negative, 191, 203, 206 false positive, 203 familial, 66, 109, 247 family, 3, 66, 79 fatigue, 218, 219
fear, 162, 175, 176, 178, 182, 184, 186 February, 65 feedback, 76, 143, 180 females, 67 filters, 69, 193 fixation, ix, xii, 65, 73, 107, 109, 110, 113, 120, 141, 143, 144, 148, 209, 210, 211, 212, 213, 214, 215, 216, 217, 219, 222, 223, 224, 225, 226, 227, 229, 230, 231, 235, 236, 237, 242, 243, 244, 245 flight, x, 125, 126, 127, 128, 129, 130, 132, 136, 163 flow, 3, 60, 127, 130 fluid, 72, 75 Fourier, 114 fovea, ix, 107, 108, 109, 110, 117, 119, 123, 148, 209, 215, 229 Fox, 150, 179, 183 FPGA, 136 frog, 131 frontal lobe, 225, 226, 232 functional magnetic resonance imaging, 78, 79, 165, 185 fundus, 111 fusion, 64, 72, 141, 142, 143, 144, 150, 156, 217, 220
G Gaussian, 85 gender, 67, 247 generation, 113, 191, 192, 204, 206, 214, 227, 230, 232, 234, 244 genetic factors, 66 Geneva, 207 Ger, 192, 206, 207, 208 Germany, 141, 161, 209 Gestalt, 171, 183 glasses, 148 goals, 245 government, 190 GPS, 129, 132 grades, 142 grass, 186 gratings, 178 grouping, 185 groups, 21, 129, 131, 177, 179, 222, 223, 224, 231 growth, 111
guidance, x, 125, 128
H habituation, 172 handedness, 64, 76, 78, 79 harmonics, 114 Harvard, 184 heart, 164 heart rate, 164 height, 103, 194, 199, 202 hemisphere, 181 hemodynamic, 165 Hessian matrix, 20 high resolution, 129 high-frequency, 113 high-level, 168, 169 high-speed, 112 histogram, 199, 200 holistic, 170 Homeland Security, 136 horizon, 127 horse, 39 host, 190 hostile environment, x, 125, 130, 132 hot spots, 191 human, 61, 64, 66, 71, 72, 78, 108, 149, 150, 157, 169, 183, 184, 185, 186, 187, 190, 191, 199, 207, 213, 224, 244 human brain, 183 human development, 66 human subjects, 157 humans, 164, 244 hyperbolic, 199 hyperopia, 147 hypoplasia, 109 hypothesis, xi, 161, 171, 173, 175, 178, 195
I identification, 73, 74, 111, 115, 116, 248 idiopathic, 108, 109, 123 Illinois, 76 illumination, 82, 191 illusion, 79, 104, 210, 212 illusions, 211, 244 image analysis, 207
images, vii, viii, ix, x, 1, 2, 4, 5, 6, 9, 10, 13, 14, 16, 27, 29, 30, 35, 36, 40, 42, 43, 44, 51, 58, 60, 61, 68, 72, 81, 82, 83, 84, 87, 90, 99, 104, 126, 129, 139, 140, 141, 142, 156, 191, 207, 209, 217, 221 imaging, ix, xi, 78, 81, 82, 83, 84, 96, 99, 103, 165, 168, 189 imitation, 126, 131 implementation, 40, 41, 131, 193, 203 inattention, 74, 110 incidence, 110 independence, 164, 222 Indiana, 155, 160 indication, 64, 69 indicators, 69, 111, 247 induction, 144 industrial, 103 industry, 129, 131 inertial navigation system, 127 infancy, 109, 111, 122 infants, 73, 111, 112, 113, 147, 149, 150 infinite, 141 information exchange, 132 infrared, 112, 113, 184, 191, 213 infrared light, 113, 213 infrared spectroscopy, 184 inheritance, 122, 123 inhibition, 231 inhibitory, 168, 231 injury, 75, 207 innervation, 79 INS, 109, 133 insight, 226 inspection, 103 instabilities, 113 instability, 3, 110, 112, 113, 211, 212, 215, 216, 217, 218, 219, 220, 221, 222, 223, 235, 236, 243 instruction, 185, 214, 226, 232 integration, x, 125, 127, 130, 131, 149 integrity, 104 intelligence, 131 intensity, viii, 2, 5, 84, 85, 99, 110, 111, 113, 165, 184 interaction, 79, 113, 130, 180 interactions, 151, 168 interdependence, 119 interference, 72 interpretation, 60, 143 interval, 85, 88, 111
intraocular, 148 intraocular lens (IOL), 148 intrinsic, 8 intrusions, 112 invasive, 112, 113 inversions, 119 Investigations, 170 IOL, 148 Iran, 58 Italy, 107, 207 ITC, 214 iteration, 40 ITRC, 58 ITT, 206
J Japan, 66, 139 Japanese, 76, 79 Jet Propulsion Laboratory, 136 judge, 233 Jung, 133 justification, 71
K kinesthesis, 160 kinesthetic, 156
L language, 240 laser, ix, 74, 81, 82, 83, 84, 85, 94, 96, 99, 103, 104, 130 late-onset, 151 laterality, viii, 63, 64, 66, 67, 68, 69, 70, 71, 72, 75, 79, 80, 247, 248 law, 79 lead, 112, 169 learning, 186 left hemisphere, 227 lens, 22, 74, 75, 77, 78, 84, 95, 113, 143, 148, 157 lenses, 69, 74 lesions, 245, 246 light emitting diode, 215 likelihood, 175 limitation, 24 limitations, 120, 173, 174 linear, x, 3, 22, 87, 89, 90, 94, 96, 113, 119, 125, 127, 143, 199 linear function, 94, 96, 199 linear regression, 119 linguistic, 212 localization, 79, 127, 206 location, 76, 104, 232 London, 77, 78, 80, 122, 151, 187 long period, 148, 218 long-distance, 104 longevity, 71 longitudinal study, 67, 80 low-level, 168
M machines, 137, 191 magnetic, 165 magnetic resonance, 78, 165 magnetic resonance imaging, 78, 79, 165 males, 67 management, 78, 150 manipulation, 178 mapping, 5, 48, 127 Mars, 136 masking, 126 matrix, 4, 9, 13, 19, 20, 22, 27, 28, 89, 90 measurement, viii, ix, x, xi, 2, 3, 81, 104, 108, 125, 127, 147, 155, 156, 157, 160 measures, 6, 65, 66, 222, 234, 238, 242, 245 medicine, 150 Mediterranean, 121 memory, 83, 99, 171 mental disorder, 182 mental processes, 167 metabolic, 153 metric, 4, 156 Mexican, 170 Mexico, 81 microscopy, 64 middle-aged, 74 Ministry of Education, 206 miosis, 144 mirror, 166, 167, 172 missiles, 126 mobile robot, 126, 129 mobile robots, 129
mobility, 75 modalities, 79, 108 model fitting, 207 modeling, 3, 61 models, 29, 39, 40, 41, 245 modulation, 186 modules, 192 modus operandi, 74 MOG, 112 money, 131 monkeys, 76, 152, 153, 224 monocular clues, 143 Moscow, 125, 137 motion, vii, viii, x, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 16, 17, 19, 21, 22, 23, 24, 25, 26, 27, 28, 40, 41, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 108, 113, 117, 125, 127, 128, 131, 143, 152, 183, 186 motion control, 128 motivation, 204 motor control, 68, 79 motor system, 109, 112 motor task, 247 movement, 9, 11, 13, 16, 19, 94, 111, 117, 118, 119, 122, 123, 224, 229, 234, 245 multidisciplinary, 152 muscle, 148 muscles, 213, 214, 224 myopia, 67, 80 myopic, 148
N NASA, 135 National Academy of Sciences, 185, 187 natural, 72, 131, 225 navigation system, 127 near infrared spectroscopy, 184 neck, 41 negative valence, 171 neonatal, 109 nerve, 224 nervous system, 131 network, ix, 81, 82, 83, 84, 85, 87, 88, 90, 91, 93, 94, 95, 96, 99, 100, 103, 163, 244 neural network, ix, 81 neural networks, ix, 81 neuroanatomy, 71 neuronal circuits, 164, 180
neurons, 83, 88, 140 neuropsychology, 232 neuroscientists, 225 New York, 77, 122, 137, 152, 182, 185, 207 noise, viii, x, 2, 3, 16, 40, 51, 108, 113, 195, 196 non-human, 72 nonlinear, 3, 12, 22, 60 non-uniform, 199, 201, 204 normal, 7, 72, 77, 110, 122, 141, 146, 147, 148, 149, 150, 151, 166, 167, 183, 194, 241 North America, 67, 170, 207 nucleus, 164, 168, 184 nystagmus, vii, ix, x, 107, 108, 109, 110, 111, 112, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 212
O observations, 225, 226, 234 obstruction, 148 occipital lobe, 165 occipital regions, 179 occluding, 104 occlusion, 13, 18, 22, 82, 83, 84, 91, 99, 100, 101, 104, 148, 197 occupational, 74, 123 oculomotor, 108, 217, 223, 232 offenders, 187 Oklahoma, 157 operations research, 105 operator, 132 ophthalmologists, 119 optic nerve, 71, 109 optical, ix, 3, 4, 23, 60, 64, 69, 81, 82, 93, 94, 97, 98, 103, 127, 147 optical systems, 82 optics, 184 optomotor system, 229 organ, 71 organism, 72 organization, 147, 182, 187 orientation, 2, 5, 7, 8, 84, 93, 94, 96, 127, 178, 184, 191, 192, 193 oscillation, 110, 111, 116, 117, 119, 122, 123 oscillations, ix, 107, 108, 109, 110, 112, 117, 118, 120 outliers, vii, viii, 1, 2, 3, 4, 9, 16, 21, 196
P paper, 135, 204 paradoxical, 72, 75 parameter, 21, 83, 119, 203 Parietal, 182, 214 Paris, 78 pathogenesis, ix, 107, 123 pathology, 75 pathophysiological, 109 pathophysiological mechanisms, 109 pathways, 162, 163, 186 patients, ix, 73, 75, 76, 80, 107, 110, 112, 113, 117, 119, 121, 123, 147, 148, 151, 164, 176, 177, 181, 226, 232 pattern recognition, 128 pedestrian, xi, 128, 189, 190, 192, 193, 197, 199, 201, 202, 204, 206, 207 pedestrians, xi, 189, 190, 191, 204, 206 pediatric, 122, 150 perception, vii, x, xi, 71, 72, 139, 140, 141, 142, 143, 144, 147, 149, 152, 161, 162, 165, 166, 167, 169, 170, 171, 173, 174, 175, 177, 178, 179, 180, 181, 183, 185, 186, 187, 218, 244 performance, viii, ix, x, 2, 31, 37, 41, 64, 74, 75, 81, 83, 84, 99, 113, 116, 125, 130, 152, 183, 192, 201, 206, 213, 228, 233, 234, 237, 238, 239 periodic, 21, 114, 116, 119, 121 permit, 131, 143 personal, 73, 170, 177 personal relevance, 170, 177 personality, 170, 185 personality traits, 170 perturbation, 4, 40 perturbations, 3, 41 PET, 165 PFC, 214 phobia, 176, 177 Phobos, 134 photocells, 215 photographs, 178 physical environment, 166 physical properties, 178 physicians, ix, 107, 111 physiological, 73, 74, 75, 109, 141, 164, 174, 184 picture processing, 183
pinhole, 84, 93, 97 pitch, 127, 194, 201, 207 planar, 202 planets, 128 planning, x, 108, 111, 130 plasticity, 187 play, 190, 212 pond, 90 poor, 58, 148, 151, 242 population, 67, 72, 122 posture, 110 power, x, 117, 125, 129, 130, 131 PPS, 206 praxis, 220 prediction, 141 predictors, ix, 108 preference, viii, 63, 64, 67, 68, 70, 76, 77, 79, 80, 148, 247, 248 prefrontal cortex, 188 preparedness, 186 preprocessing, 192 presbyopia, 74, 143, 144, 145 preschool, 72, 76, 149 preschool children, 76 prevention, 149, 207 preventive, 190 primary visual cortex, 149, 186 primate, 151, 182 primates, 72 prior knowledge, xi, 6, 189, 191 PRISM, 155 probability, 193, 196, 197 probability density function, 196, 197 probe, 179 production, 129, 238 prognosis, 123, 148 program, 149 projector, 83 promote, 171 propagation, 4 property, viii, 2, 16, 27, 58, 127 proposition, 247 protection, 190 protocols, 157 proximal, 144, 151 psychiatrists, 119 ptosis, 148 pupil, 113, 143, 144, 145, 152 pupils, 141
Q quality of life, 75 quantization, viii, 2, 3 questioning, 69
R radar, 130, 190 radio, 113 radius, 19, 20, 141 random, 32, 120, 197 range, viii, 2, 19, 68, 104, 108, 109, 113, 117, 132, 195, 198, 204, 208, 234, 240, 242 ras, 2 ratings, 171, 185 reaction time, xii, 179, 209, 214, 216, 224, 227, 228, 229, 230, 231, 233, 234, 235, 236, 237, 238, 240, 241, 244 reading, vii, xii, 158, 182, 209, 211, 212, 213, 229, 245 real time, 61, 130, 131 reality, 210 reasoning, 70 recognition, 104, 115, 126, 127, 128, 245 reconcile, 73 reconstruction, vii, viii, ix, 1, 2, 5, 26, 29, 33, 39, 48, 51, 58, 61, 81, 93, 99, 126, 127, 128, 202, 204 recovery, 2, 3 redistribution, 202, 203 reduction, xii, 105, 148, 189, 195, 206, 231, 237, 238 reference frame, 71 reflection, 213 reflexes, 164, 225, 227, 229, 237 regression, 119 regression analysis, 119 regression line, 119 regular, 204, 230, 242 relationship, viii, 63, 70, 82, 87, 120, 141, 242 relationships, 143 relevance, 163, 169, 170, 177, 178 reliability, x, 116, 125, 126, 127, 130, 132, 219, 234, 245 renormalization, 60 repeatability, xi, 83, 103, 104, 111, 121, 155, 156, 157, 158, 159, 160
research, 2, 58, 64, 66, 71, 76, 105, 113, 119, 143, 147, 149, 162, 166, 170, 172, 174, 180, 190, 191, 206, 214, 223, 231, 232, 243, 248 Research and Development, 136 researchers, 114, 191 residual error, 196, 197 resolution, 38, 82, 104, 113, 114, 129, 183, 187, 204 resources, 129, 131 respiration, 164 retina, ix, 107, 108, 110, 141, 163, 209, 213, 224 risk, 127, 147, 152 risk factors, 152 Robotics, 60, 133, 134, 135, 136 robustness, 16, 40 ROI, 202 Royal Society, 137 Russia, 125, 137 Russian, 125, 136, 137 Russian Academy of Sciences, 125
S saccades, x, xii, 69, 108, 209, 210, 212, 213, 214, 215, 216, 217, 219, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 240, 241, 242, 243, 244, 245 saccadic eye movement, 209, 210, 227, 244, 245 safety, 129, 190 sample, 167, 199, 207 sampling, 29, 150, 197, 198, 199, 204, 205 SAR, 135 scalar, 194 scaling, 195 scatter, 218, 219, 222, 224, 227, 228, 234, 237, 242 scatter plot, 218, 222 Schmid, 6, 61 school, 219 scientists, 162, 223 sclera, 113 scotoma, 148 search, xi, 3, 19, 20, 80, 112, 113, 123, 128, 131, 162, 174, 180, 189 searching, xi, xii, 189, 193
segmentation, 192, 204, 206 selectivity, 186 self-report, 76, 175, 178, 179 semantic, 171, 176 semantic content, 171 semicircular canals, 108 sensitivity, 3, 22, 25, 26, 104, 110, 122 sensors, 127, 130, 190 separation, 113, 115 series, 115, 121, 156, 179, 181 shape, viii, ix, 2, 3, 8, 13, 16, 18, 26, 27, 28, 32, 33, 34, 36, 60, 81, 82, 83, 84, 90, 93, 99, 100, 101, 102, 103, 104, 108, 119, 207 short period, ix, 107, 220 signals, 108, 114, 118, 179, 190 significance level, 222 similarity, 8, 9, 13, 16 simulation, 32, 37, 38 Simultaneous Localization and Mapping, 127, 133 sine, 119 Singapore, 104 sites, 128 sleep, 110 smoothing, 40 snakes, 162 social anxiety, 185 social phobia, 183 software, 129, 204 solutions, 3 somatosensory, 182 sorting, 196 space-time, 2 Spain, 189 spatial, 66, 71, 74, 104, 143, 148, 166, 178, 186, 217 spatial frequency, 148 spatial information, 66 specialisation, 71, 240 species, 72, 75, 131 spectral analysis, 123 spectrum, 117, 224 speculation, 71 speed, 29, 31, 131, 192, 194, 195, 196 sports, 64 SRT, 228 stability, xii, 209, 211, 215, 217, 218, 219, 223, 242 stabilization, 21, 108, 110, 112, 127 stabilize, 109, 229
stages, x, xi, 125, 129, 161, 166, 167, 169, 180, 189, 192, 206 standard deviation, xi, 103, 117, 159, 196, 199, 240 standard error, 175, 177 statistical analysis, 213 statistics, xii, 88, 189, 190 steady state, 117 Stimuli, 162, 175, 178, 181 stimulus, 69, 117, 147, 148, 162, 165, 166, 168, 169, 179, 211, 219, 224, 226, 227, 229, 230, 231, 232, 233, 234, 242 stimulus information, 168 stimulus pairs, 69 Stochastic, 132 strabismus, vii, x, xi, 110, 122, 139, 140, 146, 147, 148, 149, 150, 151, 153, 157 streams, 60 strength, 68, 170, 180, 216 STRUCTURE, 1 students, 157 subjective, 70, 159, 171 substrates, 180 suffering, 115 superiority, 40, 64, 68, 184 suppression, x, 72, 74, 139, 146, 147, 148, 151, 152, 167, 168, 170, 183, 186, 187, 234, 248 surgery, 74, 75, 77, 78, 110, 122, 123, 148, 149, 150, 157 surgical, 111, 148, 150, 152 surprise, 214, 227, 232 surveillance, 190, 191 survival, 131 switching, 75 Switzerland, 207 symmetry, 191 symptoms, 73 syndrome, 109, 120 syntactic, 115 synthesis, 59 systems, x, 108, 113, 117, 125, 126, 128, 129, 130, 131, 132, 137, 190, 191, 192, 194, 207
T targets, 117, 128, 148, 157, 192, 193 task performance, 234 taxonomy, 68
technology, 112 telencephalon, 164 temporal, 40, 112, 113, 141, 168, 169, 187, 230 test procedure, 29 test-retest reliability, 219, 245 thalamus, 163 theory, 67, 76, 80, 109, 111, 126, 149, 152, 167, 168, 169, 176, 182, 184, 211 therapy, x, 108, 111, 119, 148 thinking, 131 threat, 185, 186 threatening, 162 three-dimensional, viii, 2, 4, 11, 15, 143, 217, 218 three-dimensional model, viii three-dimensional reconstruction, 4 three-dimensional space, viii, 2, 15, 217 threshold, 114, 196 thresholds, 115, 140 time, viii, ix, x, 5, 16, 19, 27, 31, 32, 37, 38, 40, 58, 61, 64, 69, 82, 84, 103, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 121, 127, 130, 131, 132, 141, 148, 163, 166, 171, 172, 177, 178, 192, 195, 196, 209, 210, 215, 217, 218, 219, 220, 221, 223, 224, 225, 226, 228, 229, 230, 231, 234, 243, 247 time consuming, 132, 163 time frame, 217 time periods, 210, 225 time resolution, 113 time series, 115, 121 timing, 226 Timmer, 244 top-down, 165, 186 topographic, 82 topology, 6 toxin, 117 tracking, 3, 4, 5, 21, 22, 23, 24, 26, 39, 40, 42, 57, 58, 113, 117, 123, 128, 192 traffic, 190, 207, 226 training, 75, 222 traits, 66, 247 trajectory, 117 trans, 27, 77 transfer, 66, 186 transformation, 77 translation, 3, 12, 19, 27, 28, 60 translational, 4, 60
transverse section, 83, 99 travel, 224 trend, 67, 77 trial, 152, 172, 174, 177, 178, 179, 213, 216, 217, 218, 226, 227, 228, 229, 230 triangulation, 9, 15 triggers, 230 two-dimensional, x, 139, 171
U uncertainty, 22, 130, 202, 203 uniform, 111, 197, 201, 202, 248 United Nations, 207 United States, 141, 185, 187 universities, 190 Unmanned Aerial Vehicles, x, 126, 132, 137
V valence, 165, 171, 172, 178, 181 validation, 182 validity, 179 values, ix, 13, 17, 58, 85, 87, 88, 89, 90, 95, 96, 99, 103, 108, 158, 160, 196, 203, 216, 218, 219, 221, 222, 224, 228, 234, 239, 240, 242, 243 variability, ix, 108, 115, 117, 119, 120, 121, 191, 199 variable, 74, 110, 172, 218, 222, 229 variables, xii, 185, 209, 213, 218, 222, 223, 226, 227, 228, 233, 234, 236, 238, 240, 241 variance, 70 variation, ix, 8, 22, 27, 40, 81, 82, 204 vector, 6, 7, 19, 20, 22, 83, 87, 191 vehicles, x, 125, 126, 132, 191, 197 velocity, ix, 103, 107, 109, 110, 111, 113, 114, 115, 116, 217, 218, 219 vertebrates, 108 vestibular system, 108, 112 violence, 187 violent, 170 visible, xi, 5, 48, 147, 161, 216, 226, 227, 230 vision, vii, viii, ix, x, xi, 2, 3, 4, 5, 59, 63, 64, 66, 71, 72, 73, 74, 75, 76, 78, 81, 82, 104, 107, 108, 110, 111, 122, 125, 126, 131, 132, 137, 139, 140, 141, 142, 147, 148, 149, 151, 152, 155, 156, 187, 188, 190,
208, 210, 211, 212, 213, 215, 216, 217, 220, 221, 223, 225, 248 visual acuity, ix, x, 65, 68, 69, 71, 75, 77, 78, 79, 108, 110, 111, 112, 115, 116, 117, 119, 140, 149, 150, 157, 211, 248 visual area, 149, 164, 165, 168 visual attention, 185, 232, 244, 245 visual field, 72, 140, 148, 151, 167, 211 visual perception, 162, 163, 165, 166, 169, 173, 181, 244 visual processing, xi, 149, 161, 167, 168, 169, 172, 182 visual stimuli, 162, 179, 182, 213 visual stimulus, 147, 148, 230, 232 visual system, xi, 75, 77, 111, 141, 147, 151, 161, 181, 210, 211, 220, 221 visualization, 61, 200, 205 vocational, 74 vulnerability, 132
W Warsaw, 121
wavelet, 114, 115 wavelet analysis, 114, 115 weakness, 68, 216, 235, 237 wealth, 74 weapons, 128 windows, xi, xii, 114, 116, 189, 191, 192, 193, 197, 199, 201, 203, 204, 205 World Health Organization, 190 writing, 69, 70, 78, 240, 247
X X-linked, 109
Y Y-axis, 195 yield, 160 young adults, xi, 155