3DTV CONTENT CAPTURE, ENCODING AND TRANSMISSION BUILDING THE TRANSPORT INFRASTRUCTURE FOR COMMERCIAL SERVICES
Daniel Minoli

A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2010 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Minoli, Daniel, 1952–
3DTV content capture, encoding and transmission : building the transport infrastructure for commercial services / Daniel Minoli.
p. cm.
ISBN 978-0-470-64973-2 (cloth)
1. Stereoscopic television. I. Title.
TK6643.M56 2010
621.388—dc22
2010008432

Printed in Singapore

10 9 8 7 6 5 4 3 2 1
For Anna, Emma, Emile, Gabby, Gino, and Angela
CONTENTS

Preface xi
About the Author xiii

1 Introduction 1
1.1 Overview 1
1.2 Background 6
1.2.1 Adoption of 3DTV in the Marketplace 6
1.2.2 Opportunities and Challenges for 3DTV 16
1.3 Course of Investigation 19
References 24
Appendix A1: Some Recent Industry Events Related to 3DTV 26

2 3DV and 3DTV Principles 29
2.1 Human Visual System 29
2.1.1 Depth/Binocular Cues 33
2.1.2 Accommodation 34
2.1.3 Parallax 34
2.2 3DV/3DTV Stereoscopic Principles 35
2.3 Autostereographic Approaches 42
References 45

3 3DTV/3DV Encoding Approaches 47
3.1 3D Mastering Methods 51
3.1.1 Frame Mastering for Conventional Stereo Video (CSV) 51
3.1.2 Compression for Conventional Stereo Video (CSV) 55
3.2 More Advanced Methods 59
3.2.1 Video Plus Depth (V + D) 60
3.2.2 Multi-View Video Plus Depth (MV + D) 63
3.2.3 Layered Depth Video (LDV) 65
3.3 Short-term Approach for Signal Representation and Compression 69
3.4 Displays 69
References 69
Appendix A3: Color Encoding 73
Appendix B3: Additional Details on Video Encoding Standards 74
B3.1 Multiple-View Video Coding (MVC) 75
B3.2 Scalable Video Coding (SVC) 78
B3.3 Conclusion 79

4 3DTV/3DV Transmission Approaches and Satellite Delivery 81
4.1 Overview of Basic Transport Approaches 81
4.2 DVB 90
4.3 DVB-H 95
References 99
Appendix A4: Brief Overview of MPEG Multiplexing and DVB Support 101
A4.1 Packetized Elementary Stream (PES) Packets and Transport Stream (TS) Unit(s) 101
A4.2 DVB (Digital Video Broadcasting)-Based Transport in Packet Networks 104
A4.3 MPEG-4 and/or Other Data Support 105

5 3DTV/3DV IPTV Transmission Approaches 113
5.1 IPTV Concepts 114
5.1.1 Multicast Operation 115
5.1.2 Backbone 120
5.1.3 Access 125
5.2 IPv6 Concepts 132
References 135
Appendix A5: IPv6 Basics 138
A5.1 IPv6 Overview 138
A5.2 Advocacy for IPv6 Deployment—Example 157

6 3DTV Standardization and Related Activities 163
6.1 Moving Picture Experts Group (MPEG) 165
6.1.1 Overview 165
6.1.2 Completed Work 166
6.1.3 New Initiatives 178
6.2 MPEG Industry Forum (MPEGIF) 182
6.3 Society of Motion Picture and Television Engineers (SMPTE) 3D Home Entertainment Task Force 183
6.4 Rapporteur Group on 3DTV of ITU-R Study Group 6 184
6.5 TM-3D-SM Group of Digital Video Broadcast (DVB) 187
6.6 Consumer Electronics Association (CEA) 188
6.7 HDMI Licensing, LLC 189
6.8 Blu-ray Disc Association (BDA) 189
6.9 Other Advocacy Entities 190
6.9.1 3D@Home Consortium 190
6.9.2 3D Consortium (3DC) 190
6.9.3 European Information Society Technologies (IST) Project "Advanced Three-Dimensional Television System Technologies" (ATTEST) 191
6.9.4 3D4YOU 192
6.9.5 3DPHONE 196
References 198

Glossary 201
Index 225
PREFACE
3 Dimensions TV (3DTV) became commercially available in the United States in 2010, and service in other countries was expected to follow soon thereafter. 3DTV is a subset of a larger discipline known as 3D Video (3DV). There are now many routine vendor announcements related to 3DTV/3DV, and there are also conferences wholly dedicated to the topic. To highlight the commercial interest in this topic, note that ESPN announced in January 2010 that it planned to launch what would be the world's first 3D sports network with the 2010 World Cup soccer tournament in June 2010, followed by an estimated 85 live sports events during its first year of operation. DirecTV was planning to become the first company to offer satellite-based 3D, as announced at the 2010 International Consumer Electronics Show. Numerous manufacturers showed 3D displays at recent consumer electronics trade shows. Several standards bodies and industry consortia are now working to support commercialization of the service. An increasing inventory of content is now also becoming available in 3D.

This text offers an overview of the content capture, encoding, and transmission technologies that have emerged of late in support of 3DTV/3DV. It focuses on building the transport infrastructure for commercial services. The book is aimed at interested planners, researchers, and engineers who wish to get an overview of the topic. Stakeholders involved with the rollout of the infrastructure include video engineers, equipment manufacturers, standardization committees, broadcasters, satellite operators, Internet Service Providers, terrestrial telecommunications carriers, storage companies, content-development entities, design engineers, planners, college professors and students, and venture capitalists.

While there is a lot of academic interest in various aspects of the overall system, service providers and the consumers ultimately tend to take a system-level view. While service providers do to an extent take a constructionist bottom-up view to deploy the technological building blocks (such as encoders, encapsulators, IRDs, and set-top boxes), 3DTV stakeholders need to consider the overall architectural system-level view of what it will take to deploy an infrastructure that is able to reliably and cost-effectively deliver a commercial-grade quality bundle of multiple 3DTV content channels to paying customers with high expectations. This text, therefore, takes such a system-level view.

Fundamental visual concepts supporting stereographic perception of 3DTV are reviewed. 3DTV technology and digital video principles are discussed. Elements of an end-to-end 3DTV system are covered. Compression and transmission technologies are assessed for satellite and terrestrial (or hybrid) IPTV-based architecture. Standardization activities, critical to any sort of broad deployment, are identified.

The focus of this text is how to actually deploy the technology. There is a significant quantity of published material in the form of papers, reports, and technical specifications. This published material forms the basis for this synthesis, but the information is presented here in a self-contained, organized, tutorial fashion.
ABOUT THE AUTHOR
Mr. Minoli has done extensive work in video engineering, design, and implementation over the years. The results presented in this book are based on work done while at Bellcore/Telcordia, Stevens Institute of Technology, AT&T, and other engineering firms, starting in the early 1990s and continuing to the present. Some of his video work has been documented in the books he has authored, such as 3D Television (3DTV) Technology, Systems, and Deployment—Rolling out the Infrastructure for Next-Generation Entertainment (Taylor & Francis, 2010); IP Multicast with Applications to IPTV and Mobile DVB-H (Wiley/IEEE Press, 2008); Video Dialtone Technology: Digital Video over ADSL, HFC, FTTC, and ATM (McGraw-Hill, 1995); Distributed Multimedia Through Broadband Communication Services (co-authored) (Artech House, 1994); Digital Video (4 chapters) in The Telecommunications Handbook (K. Terplan and P. Morreale, Editors, IEEE Press, 2000); and Distance Learning: Technology and Applications (Artech House, 1996).

Mr. Minoli has many years of technical hands-on and managerial experience in planning, designing, deploying, and operating IP/IPv6, telecom, wireless, and video networks, as well as data center systems and subsystems, for global best-in-class carriers and financial companies. He has worked at financial firms such as AIG, Prudential Securities, and Capital One Financial, and at service provider firms such as Network Analysis Corporation, Bell Telephone Laboratories, ITT, Bell Communications Research (now Telcordia), AT&T, Leading Edge Networks Inc., and SES Engineering, where he is Director of Terrestrial Systems Engineering (SES is the largest satellite services company in the world). At SES, in addition to other duties, Mr. Minoli has been responsible for the development and deployment of IPTV systems, terrestrial and mobile IP-based networking services, and other global networks. He also played a founding role in the launching of two companies through the high-tech incubator Leading Edge Networks Inc., which he ran in the early 2000s: Global Wireless Services, a provider of secure broadband hotspot mobile Internet and hotspot VoIP services, and InfoPort Communications Group, an optical and Gigabit Ethernet metropolitan carrier supporting data center/SAN/channel extension and cloud computing network access services. For several years he has been Session, Tutorial, and now overall Technical Program Chair for the IEEE ENTNET (Enterprise Networking) conference; ENTNET focuses on enterprise networking requirements for large financial firms and other corporate institutions.

Mr. Minoli has also written columns for ComputerWorld, NetworkWorld, and Network Computing (1985–2006). He has taught at New York University (Information Technology Institute), Rutgers University, and Stevens Institute of Technology (1984–2006). Also, he was a Technology Analyst At-Large for Gartner/DataPro (1985–2001); based on extensive hands-on work at financial firms and carriers, he tracked technologies and wrote CTO/CIO-level technical scans in the area of telephony and data systems, including topics on security, disaster recovery, network management, LANs, WANs (ATM and MPLS), wireless (LAN and public hotspot), VoIP, network design/economics, carrier networks (such as metro Ethernet and CWDM/DWDM), and e-commerce. Over the years he has advised venture capital firms on investments of $150M in a dozen high-tech companies. He has acted as an Expert Witness in a (won) $11B lawsuit regarding a VoIP-based wireless air-to-ground communication system, and has been involved as a technical expert in a number of patent infringement lawsuits (including two on digital imaging).
CHAPTER 1
Introduction
1.1 OVERVIEW
Recently, there has been a lot of interest on the part of technology suppliers, broadcasters, and content providers to bring 3 Dimension Video (3DV) to the consumer. The year 2010 has been called the first year of 3D Television (3DTV) by some industry players. 3DTV is the delivery of 3DV on a TV screen, typically in the consumer's home. The initial step in this commercialization endeavor was to make 3D content available on Blu-ray Discs (BDs), for example with the release of Titanic, Terminator, and Avatar. However, well beyond that stand-alone home arrangement, there has been a concerted effort to develop end-to-end systems to bring 3DTV services to the consumer, supported by regular commercial programming that is delivered and made available on a routinely scheduled basis. Broadcasters such as, but not limited to, ESPN, DIRECTV, Discovery Communications, BSkyB, and British Channel 4 were planning to start 3D programming in 2010. LG, Samsung, Panasonic, Sony, JVC, Vizio, Sharp, and Mitsubishi, among others, were actively marketing high-quality TV display products at press time, with some, such as Samsung and Mitsubishi, already shipping 3D-ready flat-panel TVs as far back as 2008. Front Projection 3D systems for medium-sized audiences (5–25 people), for example for the "prosumer," have been available for longer; of course, movie theater systems have been around for years. The goal of the 3DTV industry is to replicate, to the degree possible, the experience achievable in a 3D movie theater in the home setting.

A commercial 3DTV system comprises the following functional elements: capture of 3D content, specifically moving scenes; encoding (representation) of content; content compression; content transport over satellite, cable, Internet Protocol Television (IPTV), or over-the-air channels¹; and content display. Figure 1.1 depicts a logical, functional view of an end-to-end 3DTV system.

¹ Internet-based downloading and/or streaming is also a possibility for some applications or a subset of users.

Figure 1.1 Basic 3DTV system—logical view. (Processing chain: 3D scene, capture, representation, coding, compression, transmission, signal conversion, display, scene replica.)
Figure 1.2 graphically depicts a system architecture that may see early commercial introduction; this system is known as stereoscopic Conventional Stereo Video (CSV) or Stereoscopic 3D (S3D). Figures 1.3 and 1.4 show examples of 3D camera arrangements, while Fig. 1.5 illustrates a typical 3D display (this one using active glasses, also called eyewear). Finally, Fig. 1.6 depicts what we call a pictorialization of 3D TV screens, as may be included in vendor brochures.

Figure 1.2 Basic 3DTV system—conventional stereo video. (Stages: 1, dual-camera capture system; 2, mastering of two frames; 3, 3D processor combines the two frames into single HDTV frames; 4, digital (HD) encoder; 5, transmission over satellite, IPTV, cable, over the air, or the Internet; 6, decoding; 7, displaying/viewing on a 3DTV display with 3DTV glasses.)

Figure 1.3 Illustrative 2-camera rig for 3D capture. Source: www.inition.co.uk.

This text offers an overview of the content capture, encoding, and transmission subelements, specifically the technologies, standards, and infrastructure required to support commercial real-time 3DTV/3DV services. It reviews the required standards and technologies that have emerged of late—or are just emerging—in support of such new services, with a focus on encoding and the build-out of the transport infrastructure. Stakeholders involved with the rollout of this infrastructure include consumer and system equipment manufacturers, broadcasters, satellite operators, terrestrial telecommunications carriers, Internet Service Providers (ISPs), storage companies, content-development entities, and standardization committees.

There is growing interest on the part of stakeholders to introduce 3DTV services, basically as a way to generate new revenues.
There was major emphasis on 3DTV from manufacturers at various consumer shows taking place in the recent past. One in four consumers surveyed by the Consumer Electronics Association (CEA) in a press time study indicated that they plan to buy a 3D TV set within the next three years [1]. The research firm DisplaySearch has forecasted that the 3D display market will grow to $22 billion by 2018 (this represents an annual compound growth rate of about 50%²). When it comes to entertainment, especially for a compelling type of entertainment that 3D has the opportunity of being, there may well be a reasonably high take rate, especially if the price point is right for the equipment and for the service. Classical questions that are (and/or should be) asked by stakeholders include the following:

• Which competing 3D encoding and transmission technologies should an operator adopt?
• What technological advancements are expected in 3D, say by 2012 or 2015?
• Where do the greatest market opportunities exist in the 3D market?

These and similar questions are addressed in this text.

² The company originally forecast a $1B industry for 2010, but recently lowered that forecast by about 50%.

Figure 1.4 Illustrative single 3D camcorder with dual lenses. Source: Panasonic CES 2010 Press Kit.

Figure 1.5 Illustrative 3D home display. Source: Panasonic CES 2010 Press Kit.

Figure 1.6 Pictorialization of 3D home display. Source: LG CES 2010 Press Kit.
1.2 BACKGROUND

This section provides an encapsulated assessment of the 3DTV industry landscape to give the reader a sense of what some of the issues are. It provides a press time snapshot of industry drivers that support the assertion just made: that there is a lot of activity in this arena at this time.

1.2.1 Adoption of 3DTV in the Marketplace
It should be noted that 3D film and 3DTV trials have a long history, as shown in Fig. 1.7 (based partially on Ref. 2). However, the technology has now progressed enough, for example with the deployment of digital television (DTV) and High Definition Television (HDTV), that regular commercial services can finally be introduced.

Figure 1.7 History of 3D in film and television. (Timeline, not on a linear scale: stereoscopic 3D pictures, 1838 (Wheatstone), popular by 1844 in the US and Europe; 2D photography, 1839; 2D movies, 1867 (Lincoln); 3D stereoscopic cinema, early 1900s; 2D TV, 1920 (Belin and Baird); stereoscopic 3D TV, 1920s (Baird); stereoscopic 3D cinema popular by the 1950s; first experimental analog 3DTV broadcast, 1953; first commercial analog broadcast, 1980, with a first European experimental broadcast in 1982; digital stereoscopic 3DTV broadcasts in Japan, 1998, and Korea, 2002; 3D cinema becomes popular, with routine stereoscopic broadcasts using anaglyph methods, especially for sports events; 3DTV over IP networks: video streaming experiments and demonstrations, assessment of streaming protocols (RTP over UDP, DCCP), and research into multiview video streaming; ongoing development of a plethora of displays and of standards; vendor buzz, with 2010 called "The Year of 3DTV" by some.)
We start by noting that there are two general commercial-grade display approaches for 3DTV: (i) stereoscopic TV, which requires special glasses to watch 3D movies, and (ii) autostereoscopic TV, which displays 3D images in such a manner that the user can enjoy the viewing experience without special accessories.³

Short-term commercial 3DTV deployment, and the focus of this book, is on stereoscopic 3D imaging and movie technology. The stereoscopic approach follows the cinematic model, is simpler to implement, can be deployed more quickly (including the use of relatively simpler displays), can produce the best results in the short term, and may be cheaper in the immediate future. However, the limitations are the requisite use of accessories (glasses), somewhat limited positions of view, and physiological and/or optical limitations, including possible eye strain. In summary, (i) glasses may be cumbersome and expensive (especially for a large family) and (ii) without the glasses, the 3D content is unusable.

Autostereoscopic 3DTV eliminates the use of any special accessories: it implies that the perception of 3D is in some manner automatic and does not require devices, whether filter-based or shutter-based glasses. Autostereoscopic displays use additional optical elements aligned on the surface of the screen to ensure that the observer sees different images with each eye. From a home screen hardware perspective, the autostereoscopic approach is more challenging, including the need to develop relatively more complex displays; also, more complex acquisition/coding algorithms may be needed to make optimal use of the technology. It follows that this approach is more complex to implement, will take longer to be deployed, and may be more expensive in the immediate future. However, this approach can produce the best results in the long term, including accessories-free viewing, multi-view operation allowing both movement and different perspectives at different viewing positions, and better physiological and/or optical response to 3D.

Table 1.1 depicts a larger set of possible 3DTV (display) systems than what we identified above. The expectation is that 3DTV based on stereoscopy will experience earlier deployment compared with other technological alternatives. Hence, this text focuses principally on stereoscopy. Holography and integral imaging are relatively newer technologies in the 3DTV context compared to stereoscopy; holographic and/or integral imaging 3DTV may be feasible late in the decade.

There are a number of techniques to allow each eye to view the separate pictures, as summarized in Table 1.2 (based partially on Ref. 3). All of these techniques work in some manner, but all have some shortcomings.

³ Autostereoscopic technology may also (in particular) be appropriate for mobile 3D phones, and there are several initiatives to explore these applications and this 3D phone-display technology.
TABLE 1.1 Various 3D Display Approaches and Technologies

Stereoscopic 3D (S3D): A system in which two photographs (or video streams), taken from slightly different angles, appear three-dimensional when viewed together. This technology is likely to see the earliest implementation, using specially designed equipment displays that support polarization.

Autostereoscopic: 3D displays that do not require glasses to see the stereoscopic image (using lenticular or parallax barrier technology). Whether stereoscopic or autostereoscopic, a 3D display (screen) needs to generate parallax that, in turn, creates a stereoscopic sense. Will find use in cell phone 3D displays in the near future.

Multi-viewpoint 3D system: A system that provides a sensation of depth and motion parallax based on the position and motion of the viewer; at the display side, new images are synthesized based on the actual position of the viewer.

Integral imaging (holoscopic imaging): A technique that provides autostereoscopic images with full parallax by using an array of microlenses to generate a collection of 2D elemental images; in the reconstruction/display subsystem, the set of elemental images is displayed in front of a far-end microlens array.

Holography: A technique for generating an image (hologram) that conveys a sense of depth, but is not a stereogram in the usual sense of providing fixed binocular parallax information. Holograms appear to float in space and change perspective as one walks left or right; no special viewers or glasses are necessary (note, however, that holograms are monochromatic).

Volumetric systems: Systems that use geometrical principles of holography in conjunction with other volumetric display methods. Volumetric displays form the image by projection within a volume of space without the use of a laser light reference, but have limited resolution. They are primarily targeted, at least at press time, at the Industrial, Scientific, and Medical (ISM) community.
To highlight the commercial interest in 3DTV at press time, note that ESPN announced in January 2010 that it planned to launch what would be the world's first 3D sports network with the 2010 World Cup soccer tournament in June 2010, followed by an estimated 85 live sports events during its first year of operation. DIRECTV announced that it will start 3D programming in 2010. DIRECTV's new HD 3D channels will deliver movies, sports, and entertainment content from some of the world's most renowned 3D producers. DIRECTV is currently working with AEG/AEG Digital Media, CBS, Fox Sports/FSN, Golden Boy Promotions, HDNet, MTV, NBC Universal, and Turner Broadcasting System, Inc., to develop additional 3D programming that will debut in 2010–2011. At launch, the new DIRECTV HD 3D programming platform will offer a 24/7 3D pay-per-view channel focused on movies, documentaries, and other programming; a 24/7 3D DIRECTV on Demand channel; and a free 3D sampler demo channel featuring event programming such as sports, music, and other content.
TABLE 1.2 Current Techniques to Allow Each Eye to View Distinct Picture Streams

With appliances (glasses):

Orthogonal polarization: Uses orthogonal (different) polarization planes, with matching viewer glasses, for each of the left and right eye pictures. Light from each picture is filtered such that only one plane of the light wave is available. This is easy to arrange in a cinema, but more difficult to arrange in a television display. Test television systems have been developed on the basis of this method, either using two projection devices projecting onto the same surface, or two displays orthogonally placed so that a combined image can be seen using a semisilvered mirror. In either case, these devices are "nonstandard" television receivers. Of the systems with glasses, this is considered the "best."

Colorimetric arrangements (anaglyph): One approach is to use different colorimetric arrangements (anaglyph) for each of the two pictures, coupled with glasses that filter appropriately. A second is a relatively new notch filter color separation technique that can be used in projection systems (advanced by Dolby), described later in the chapter.

Time multiplexing of the display: Sometimes also called "interlaced stereo"; content is shown with consecutive left and right signals and shuttered glasses. This technology is applicable to 3DTV. The technique is still used in movie theaters today, such as IMAX, and is sometimes used in conjunction with polarization plane separation. In a Cathode Ray Tube (CRT) environment, a major shortcoming of interlaced stereo was image flicker, since each eye would see only 25 or 30 images per second, rather than 50 or 60. To overcome this, the display rate could be doubled to 100 or 120 Hz to allow flicker-free reception.

"Virtual reality" headset: Technique using immersion headgear/glasses, often used for video games.

Without appliances:

Lenticular: This technique arranges for each eye's view to be directed toward separate picture elements by lenses. This is done by fronting the screen with a ribbed (lenticular) surface.

Barrier: This technique arranges for the screen to be fronted with barrier slots that perform a similar function. In this system, two views (left and right), or more than two (multi-camera 3D), can be used. However, since the picture elements (stripes or points) have to be laid next to each other, the number of views impacts the resolution available. There is a trade-off between resolution and ease of viewing. Arrangements can be made with this type of system to track head or eye movements, and thus change the barrier position, giving the viewer more freedom of head movement.
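The colorimetric (anaglyph) technique in Table 1.2 is simple enough to express in a few lines of code. The following sketch, an illustration added here rather than part of any vendor's system, builds a red-cyan anaglyph frame from a stereo pair, assuming the two views are equal-sized RGB arrays; the file names are hypothetical placeholders.

```python
# Minimal red-cyan anaglyph multiplexing sketch (illustrative only).
# Assumes the left/right views are equal-sized HxWx3 uint8 RGB arrays.
import numpy as np
import imageio.v3 as iio

def make_anaglyph(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Combine a stereo pair into one red-cyan anaglyph frame.

    The red channel carries the left-eye view; the green and blue
    (cyan) channels carry the right-eye view. Inexpensive red-cyan
    glasses then filter each view to the intended eye.
    """
    anaglyph = np.empty_like(left)
    anaglyph[..., 0] = left[..., 0]    # red   <- left view
    anaglyph[..., 1] = right[..., 1]   # green <- right view
    anaglyph[..., 2] = right[..., 2]   # blue  <- right view
    return anaglyph

left = iio.imread("left_view.png")     # hypothetical input frames
right = iio.imread("right_view.png")
iio.imwrite("anaglyph.png", make_anaglyph(left, right))
```

The appeal of this scheme is that the multiplexed frame travels over any ordinary 2D channel; its cost is color distortion, since each eye receives only part of the spectrum, which is exactly the quality drawback noted for the anaglyph VOD offering discussed next.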
Comcast has announced that its VOD (Video-On-Demand) service is offering a number of movies in anaglyph 3D (as well as HD) form. Comcast customers can pick up 3D anaglyph glasses at Comcast payment centers and malls "while supplies last." (Anaglyph is a basic and inexpensive method of 3D transmission that relies on inexpensive colored glasses, but its drawback is relatively low quality.) Verizon's FiOS was expected to support 3DTV programming by late 2010. Sky TV in the United Kingdom was planning to start broadcasting programs in 3D in the fall of 2010 on a dedicated channel available to anyone who has the Sky HD package; there are currently 1.6 million customers who have a Sky HD set-top box. Sky TV has not announced what programs will be broadcast in 3D, but it is expected to broadcast live the main Sunday afternoon soccer game from the Premiership in 3D from the 2011 season, along with some arts documentaries and performances of ballet [4]. Sky TV has already invested in installing special twin-lens 3D cameras at stadiums. (Appendix A1 includes a listing of events during the year prior to the publication of this text, to further document the activity in this arena.)

3DTV television displays could be purchased in the United States and United Kingdom as of the spring of 2010 for $1000–5000 initially, depending on technology and approach. Liquid Crystal Display (LCD) systems with active glasses generally tend to cost less. LG released its 3D model, a 47-in. LCD screen, expected to cost about $3000; with this system, viewers need to wear polarized dark glasses to experience broadcasts in 3D. Samsung and Sony also announced they were bringing their own versions to market by the summer of 2010, along with 3D Blu-ray players, allowing consumers to enjoy 3D movies such as Avatar and Up in their own homes [4]. Samsung's and Sony's models use LED (Light-Emitting Diode) screens, which are considered to give a crisper picture and are, therefore, expected to retail for about $5000 or possibly more. While LG is adopting the use of inexpensive polarizing dark glasses, Sony and Samsung are using active shutter technology. This requires users to buy expensive dark glasses, which usually cost more than $50 and are heavier than the $2–3 plastic polarizing ones. Active shutter glasses alternately darken over one eye, and then the other, in synchronization with the refresh rate of the screen, using shutters built into the glasses (synchronized over infrared or Bluetooth connections).

Panasonic Corporation has developed a full HD 3D home theater system consisting of a plasma full HD 3D TV, a 3D Blu-ray player, and active shutter 3D glasses. The 3D display was originally available in 50-in., 54-in., 58-in., and 65-in. class sizes. High-end systems are also being introduced; for example, Panasonic announced a 152-in. 4K × 2K (4096 × 2160 pixels)-definition full HD 3D plasma display. The display features a new Plasma Display Panel (PDP) that uses self-illuminating technology. Self-illuminating plasma panels offer excellent response to moving images with full motion picture resolution, making them suitable for rapid 3D image display (the panel's illuminating speed is about one-fourth the speed of conventional full HD panels). Each display approach has advantages and disadvantages, as shown in Table 1.3.
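Before turning to Table 1.3, it is worth making the active shutter operation just described concrete: it is, at its core, time multiplexing of frames. The short sketch below, an added illustration with toy frame labels standing in for real video frames, shows the frame-sequential interleaving and the per-eye rate arithmetic that motivates 120 Hz panels.

```python
# Minimal sketch of time-multiplexed ("frame sequential") display,
# as used with active shutter glasses. Illustrative only.
from itertools import chain

DISPLAY_HZ = 120                 # panel refresh rate
PER_EYE_HZ = DISPLAY_HZ // 2     # each eye sees half the refreshes

def frame_sequential(left_frames, right_frames):
    """Interleave left/right frames: L0, R0, L1, R1, ...

    The shutter glasses darken the opposite eye in sync with each
    refresh, so at 120 Hz each eye receives a flicker-free 60 images
    per second (versus the 25-30 per eye that caused flicker on CRTs).
    """
    return list(chain.from_iterable(zip(left_frames, right_frames)))

# Example: one second of content at 60 frames per eye
left = [f"L{i}" for i in range(PER_EYE_HZ)]
right = [f"R{i}" for i in range(PER_EYE_HZ)]
sequence = frame_sequential(left, right)
assert len(sequence) == DISPLAY_HZ   # 120 refreshes drive both eyes
```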
TABLE 1.3 Summary of Possible, Commercially Available TV Screen/System Choices for 3D

1. Projection-based FPTV (polarized display) with passive glasses. To present stereoscopic content, two images are projected superimposed onto the same screen through different polarizing filters (either linear or circular polarizing filters can be used); the viewer wears low-cost eyeglasses that contain a corresponding pair of different polarizing filters.
Advantages: big-screen 3D effect similar to the cinematic experience; excellent-to-good light intensity; choice of projectors/cost; inexpensive, lightweight passive 3D glasses.
Disadvantages: needs a silver screen to retain polarization of light; the two projectors must be aligned stacked on top of each other; not totally décor-friendly.

2. Projection-based FPTV (unpolarized display) with active glasses.
Advantages: option of using newer single DLP projectors that support a 120 Hz refresh rate (active-based system); no polarization-protecting screen needed.
Disadvantages: more expensive glasses; needs battery-powered LCD shutter glasses; some light intensity loss at the viewer (glasses level).

3. Projection-based RPTV (polarized display) with passive glasses.
Advantages: integrated unit, easier to add to room décor.
Disadvantages: some light intensity loss at the display level; not of the "flat-panel-TV type"; cannot be hung on walls.

4. LCD 3DTV (polarized display) with passive glasses. To present stereoscopic content, two images are shown superimposed on the same screen through interlacing techniques; the viewer wears low-cost eyeglasses that contain a pair of different polarizing filters.
Advantages: simple-to-use system not requiring projection setup; flat-screen TV type, elegant décor.
Disadvantages: some possible loss of resolution; some light intensity loss at the display level; relatively expensive ($3000–5000 in 2010).

5. 3D plasma/LCD TV (unpolarized display) with active glasses. Delivers two images to the same screen pixels, but alternates them such that two different images alternate on the screen.
Advantages: simple-to-use system not requiring projection setup; flat-screen TV type, elegant décor.
Disadvantages: active shutter glasses can be expensive, particularly for a larger viewing group; requires TV sets able to accept and display images at 120/240 Hz; glasses need power; some light intensity loss at the viewer (glasses level); some loss of resolution; size limited to 60–80 in. at this time, though larger systems are being brought to market. LCDs are relatively cheaper than the other alternatives (a 42-in. set based on LCD and shutter glasses was selling for about US$1000 and a 50-in. set for more than US$2000, compared with a 42-in. HD LCD TV at about US$600–700), while LED and/or plasma systems can be costly.

6. Autostereoscopic screen (lenticular or barrier).
Advantages: no glasses needed.
Disadvantages: very few suppliers in 2010; further out in time in terms of development and deployment; some key manufacturers have exited the business (for now); content production is more complex; displays have a "sweet spot" that requires viewers to be within this viewing zone.

Note: FPTV, Front Projection Television; DLP, Digital Light Processing; RPTV, Rear Projection Television.
Figure 1.8 3D Blu-ray disc logo.
It is to be expected that 3DTV for home use is likely to first see penetration via stored media delivery. As a content source, proponents make the case that BD "is the ideal platform" for the initial penetration of 3D technology in the mainstream market because of the high quality of pictures and sound it offers film producers. Many products are being introduced by manufacturers: for example, at the 2010 Consumer Electronics Show (CES) International Trade Show, vendors introduced eight home theater product bundles (one with 3D capability), 14 new players (four with 3D capability), three portable players, and a number of software titles. In 2010 the Blu-ray Disc Association (BDA) launched a new 3D Blu-ray logo to help consumers quickly discern 3D-capable Blu-ray players from 2D-only versions (Fig. 1.8) [5].

The BDA makes note of the strong adoption rate of the Blu-ray format. In 2009, the number of Blu-ray households increased by more than 75% over 2008 totals. After four years in the market, total Blu-ray playback devices (including both set-top players and PlayStation3 consoles) numbered 17.6 million units, and 16.2 million US homes had one or more Blu-ray playback devices. By comparison, DVD playback devices (set-tops and PlayStation2 consoles) reached 14.1 million units after four years, with 13.7 million US households having one or more playback devices. The strong performance of the BD format is due to a number of factors, including the rapid rate at which prices declined due to competitive pressures and the economy; the rapid adoption pace of HDTV sets, which has generated a US DTV household penetration rate exceeding 50%; and a superior picture and sound experience compared to standard definition and even other HD sources. Another factor in the successful adoption pace has been the willingness of movie studios to discount popular BD titles [5]. Blu-ray software unit sales in 2009 reached 48 million, compared with 22.5 million in 2008, up by 113.4%. A number of movie classics were available at press time through leading retailers at sale prices as low as $10.

The BDA also announced (at the end of 2009) the finalization and release of the Blu-ray 3D specification. These BD specifications for 3D allow for full HD 1080p resolution to each eye. The specifications are display agnostic, meaning they apply equally to plasma, LCD, projector, and other display formats, regardless of the 3D systems those devices use to present 3D to viewers. The specifications also allow the PlayStation3 gaming console to play back 3D content. The specifications, which represent the work of the leading Hollywood studios and consumer electronics and computer manufacturers, will enable the home entertainment industry to bring the stereoscopic 3D experience into consumers' living rooms on BD, but will require consumers to acquire new players, HDTVs, and shutter glasses. The specifications allow studios (but do not require them) to package 3D Blu-ray titles with 2D versions of the same content on the same disc. The specifications also support playback of 2D discs in forthcoming 3D players and can enable 2D playback of Blu-ray 3D discs on the installed base of BD players. The Blu-ray 3D specification encodes 3D video using the Multi-View Video Coding (MVC) codec, an extension to the ITU-T H.264 Advanced Video Coding (AVC) codec currently supported by all BD players. MPEG-4 (Moving Picture Experts Group 4) MVC compresses both left and right eye views with a typical 50% overhead compared to equivalent 2D content, according to the BDA, and can provide full 1080p resolution backward compatibility with current 2D BD players [6].

The broadcast commercial delivery of 3DTV on a large scale—whether over satellite/Direct-To-Home (DTH), over the air, over cable systems, or via IPTV—may take some number of years because of the relatively large-scale infrastructure that has to be put in place by the service providers and the limited availability of 3D-ready TV sets in the home (implying a small subscriber, and so a small revenue, base). A handful of providers were active at press time, as described earlier, but general deployment by multiple providers serving a geographic market will come at a future time. Delivery of downloadable 3DTV files over the Internet may occur at any point in the immediate future, but the provision of a broadcast-quality service over the Internet is not likely for the foreseeable future.

At the transport level, 3DTV will require more bandwidth than regular programming, perhaps even twice the bandwidth in some implementations (e.g., simulcasting, the transmission of two fully independent channels⁴); some newer schemes such as "video + depth" may require only 25% more bandwidth compared to 2D, but these schemes are not the leading candidate technologies for actual deployment in the next 2–3 years. Other, interleaving approaches use the same bandwidth as a channel now in use, but at a compromise in resolution. Therefore, in principle, if HDTV programming is broadcast at high quality, say, 12–15 Mbps using MPEG-4 encoding, 3DTV using the simplest method of two independent streams will require 24–30 Mbps.⁵ This data rate does not fit a standard over-the-air digital TV (DTV) channel of 19.2 Mbps, and will also be a challenge for non-Fiber-To-The-Home (non-FTTH) broadband Internet connections.
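These bandwidth trade-offs reduce to simple arithmetic. The snippet below uses only the illustrative rates quoted in this section (the MVC and video-plus-depth overhead factors are the typical figures cited above, not guarantees), comparing the candidate 3D representations against a 19.2 Mbps DTV channel and a 75 Mbps DVB-S2 transponder, discussed next.

```python
# Back-of-the-envelope bandwidth arithmetic for the 3DTV transport
# options discussed above. All rates are illustrative, not measured.
HD_RATE_MBPS = 15.0              # high-quality MPEG-4 HD channel (12-15 Mbps)
DTV_CHANNEL_MBPS = 19.2          # standard over-the-air DTV channel
DVBS2_TRANSPONDER_MBPS = 75.0    # DVB-S2 transponder, 8-point constellation

options = {
    "simulcast (two independent streams)": 2.00 * HD_RATE_MBPS,
    "MVC (typical 50% overhead)":          1.50 * HD_RATE_MBPS,
    "video + depth (~25% overhead)":       1.25 * HD_RATE_MBPS,
    "frame-compatible (same channel)":     1.00 * HD_RATE_MBPS,
}

for name, rate in options.items():
    fits_dtv = "yes" if rate <= DTV_CHANNEL_MBPS else "no"
    per_transponder = int(DVBS2_TRANSPONDER_MBPS // rate)
    print(f"{name}: {rate:.1f} Mbps; fits 19.2 Mbps DTV channel: {fits_dtv}; "
          f"channels per DVB-S2 transponder: {per_transponder}")
```

Running this confirms the figures in the text: a 30 Mbps simulcast neither fits a DTV channel nor allows more than 2 such programs per transponder, while the reduced-rate schemes fare progressively better.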
However, one expects to see the emergence of bandwidth reduction techniques, as alluded to above. On the other hand, DTH satellite providers, terrestrial fiberoptic providers, and some cable TV firms should have adequate bandwidth to support the service. For example, the use of Digital Video Broadcast Satellite Second Generation (DVB-S2) allows a transponder to carry 75 Mbps of content with modulation using an 8-point constellation, and about a third more with a 16-point constellation (4 bits per symbol versus 3). The trade-off would be, however (if we use the raw HD bandwidth just described as a point of reference), that a DVB-S2 transponder that would otherwise carry 25 channels of standard definition video or 6–8 channels of HD video would now carry only 2–3 3DTV channels. To be pragmatic about this issue, most 3DTV providers are not contemplating delivering full resolution as just described and/or the transmission of two fully independent channels (simulcasting), but some compromise; for example, lowering the per-eye data rate such that a 3DTV program fits into a commercial-grade HDTV channel (say 8–10 Mbps), using time interleaving or spatial compression—again, this is doable but comes with some degradation of the ultimate resolution quality.

There are a number of alternative transport architectures for 3DTV signals, depending in part on the underlying media. As noted, the service can be supported by traditional broadcast structures including the DVB architecture, wireless 3G/4G transmission such as DVB-H approaches, Internet Protocol (IP) in support of an IPTV-based service (in which case it also makes sense to consider IPv6), and the IP architecture for Internet-based delivery (both non-real-time and streaming). The specific approach used by each of these transport methods will also depend on the video-capture approach. One should note that the United States has a well-developed cable infrastructure in all Tier 1 and Tier 2 metropolitan and suburban areas; in Europe/Asia this is less so, with more DTH delivery (in the United States, DTH tends to serve more exurban and rural areas). A 3DTV rollout must take these differences into account and/or accommodate both. In reference to possible cable TV delivery, CableLabs announced at press time that it had started to provide testing capabilities for 3DTV implementation scenarios over cable; these testing capabilities cover a full range of technologies, including various frame-compatible, spatial multiplexing solutions for transmission [7].

Standards are critical to achieving interworking and are of great value to both consumers and service providers. The MPEG of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) has been working on coding formats for 3D video (and has already completed some of them). The Society of Motion Picture and Television Engineers (SMPTE) 3D Home Entertainment Task Force has been working on mastering standards. The Rapporteur Group on 3DTV of the International Telecommunications Union—Radiocommunication Sector (ITU-R) Study Group 6 and the TM-3D-SM group of DVB were working on transport standards.

⁴ In the 3DTV context, the term "simulcasting" has been used with two meanings: one use is as implied above, the coding and transmission of two channels (which is unlikely to occur in reality); the second use is in the traditional sense of transmitting, say, an HDTV signal and also a 3DTV signal by some other means or on some other channel/system.

⁵ Some HDTV content may be delivered at lower rates by some operators, say 8 Mbps; this rate, however, may not be adequate for sporting HDTV channels, and may be marginal for 3DTV at 1080p/60 Hz per eye.
1.2.2 Opportunities and Challenges for 3DTV

The previous section highlighted that many of the components needed to support an end-to-end commercial broadcast service are available or are becoming available. Hence, proponents see a significant market opportunity at this time. CEA estimates that more than 25% of sets sold in 2013 will be 3D-enabled. A handful of representative quotes from proponents of the 3D technology are as follows:

"No one can escape the buzz and excitement around 3D. We're witnessing the start of dramatic change in how we view TV—the dawn of a new dimension. And through Sky's clear commitment to 3D broadcasting, 3D in the home is set to become a reality . . ." [4]

". . . The next killer application for the home entertainment industry—3DTV . . . [It] will drive new revenue opportunities for content creators and distributors by enabling 3D feature films and other programming to be played on their home television and computer displays—regardless of delivery channels . . ." [8]

". . . The most buzzed about topics at CES: 3-D stereoscopic content creation . . . Several pivotal announcements [in] 2010, including 3D-TV releases from the major consumer electronics manufacturers and the launch of several dedicated 3D broadcast channels, are driving the rapid increase in demand for 3-D content . . ." [9]

"3D technology is now positioned 'to become a major force in future in-home entertainment.'" [10]

". . . 3DTV is one of the 'hottest' subjects today in broadcasting. The combination of the audience's 'wow' factor and the potential to launch completely new services makes it an attractive subject for both consumer and professional. There have already been broadcasts of a conventional display-compatible system, and the first HDTV channel compatible broadcasts are scheduled to start in Europe in the Spring of 2010 . . ." [11]

". . . In Europe, the EC is currently funding a large series of projects for 3DTV, including multiview, mobile 3D and 3D search . . ." [12]
Naturally, while there are proponents of the 3DTV technology, at the same time there are industry observers who take a more conservative view. These observers note that there are uncertainties about the availability of content, the technological readiness, and acceptance in the living room, especially given the requirement to use polarized or shutter glasses. A rational approach to market penetration is certainly in order; also, the powerful tool of statistically valid market research can be used to truly measure user interest and willingness to pay. Some representative quotes for a more conservative view of the 3D technology are given below:

". . . In a wide range of demos, companies . . . claim . . . in January 2010 that stereoscopic 3D is ready for the home. In fact, engineers face plenty of work hammering out the standards and silicon for 3DTV products, most of which will ship for the holiday 2010 season . . ." [13]

"It has proven somewhat difficult to create a 3D system that does not cause 'eye fatigue' after a certain time. Most current-generation higher resolution systems also need special eyeglasses, which can be inconvenient. Apart from eye fatigue, systems developed so far can also have limitations such as constrained viewing positions. Multiple-viewpoint television systems are intended to alleviate this. Stereoscopic systems also allow only limited 'production grammar' . . . One should not underestimate the difficulty, or the imagination and creativity required, to create a near 'ideal' 3DTV system that the public could enjoy in a relaxed way, and for a long period of time . . ." [14]

". . . The production process for 3D television requires a fundamental rethinking of the underlying technology. Scenes have to be recorded with multiple imaging devices that may be augmented with additional sensor technology to capture the three-dimensional nature of real scenes. In addition, the data format used in 3D television is a lot more complex. Rather than normal video streams, time-varying computational models of the recorded scenes are required that comprise descriptions of the scenes' shape, motion, and multiview appearance. The reconstruction of these models from the multiview sensor data is one of the major challenges that we face today. Finally, the captured scene descriptions have to be shown to the viewer in three dimensions, which requires completely new display technology . . ." [15]

". . . The conventional stereoscopic concept entails two views: it relies on the basic concept of an end-to-end stereoscopic video chain, that is, on the capturing, transmission and display of two separate video streams, one for the left and one for the right eye. [Advocates for the autostereoscopic approach argue that] this conventional approach is not sufficient for future 3DTV services. The objective of 3DTV is to bring 3D imaging to users at home. Thus, like conventional stereo production and 3D Cinema, 3DTV is based on the idea of providing a viewer with two individual perspective views—one for the left eye and one for the right eye. The difference in approaches, however, lies in the environment in which the 3D content is presented. While it seems to be acceptable for a user to wear special glasses in the darkened theatrical auditorium of a 3D Digital Cinema, [many, perhaps] most people would refuse to wear such devices at home in the communicative atmosphere of their living rooms. Basically, auto-stereoscopic 3D displays are better suited for these kinds of applications." [16]

". . . The two greatest industry-wide concerns [are]: (1) that poor quality stereoscopic TV will 'poison the water' for everyone—stereoscopic content that is poorly realized in grammar or technology will create a reputation of eyestrain which cannot be shaken off; this has happened before in the 30s, the 50s, and the 80s in the cinema; and (2) that fragmentation of technical standards will split and confuse the market, and prevent stereoscopic television from ever being successful . . ." [17]

". . . people may quickly tire of the novelty. I think it will be a gimmick. I suspect there will be a lot of people who say it's sort of neat, but it's not really comfortable . . ." [18]
The challenge for the stakeholder is to determine where the "true" situation lies: at one extreme, at the other extreme, or somewhere in the middle.
An abbreviated list of issues to be resolved in order to facilitate broad deployment of 3DTV services, beyond pure technology and encoding issues, includes the following [19]:

• Production grammar (3D production for television is still in its infancy);
• Compatibility of systems (also possibly different issues for pay TV and free-to-air operators);
• Assessment of quality/suitability:
  – methodologies for the quality assessment of 3DTV systems;
  – parameters that need to be measured that are specific to 3DTV;
  – sensation of reality;
  – ease of viewing.
• Understanding what the user requirements are.

In general, a technology introduction process spans three phases:

• Phase 1: The technology becomes available in basic form to support a given service;
• Phase 2: A full panoply of standards emerges to support widespread deployment of the technology;
• Phase 3: The technology becomes inexpensive enough to foster large-scale adoption by a large set of end users.

With reference to 3DTV, we find ourselves at some early point in Phase 1. However, there are several retarding factors that could hold back short-term deployment of the technology on a broad scale, including deployment and service cost (overall status of the economy), standards, content, and quality.

The previous assertion can be further elaborated as follows. ITU-R WP 6C classifies 3DTV systems into two groups. The "first generation" systems are essentially those based on "plano-stereoscopic" display of single or multiple discrete lateral left and right eye pictures; Recommendations for such systems should be possible in the near future. The "second generation" systems are those based on object-wave recording (holography) or approximations of object-wave recording; Recommendations for such systems may be possible in the years ahead. We refine these observations by defining the following generations of 3DTV technology:

• Generation 0: Anaglyph TV transmission;
• Generation 1: 3DTV that supports plano-stereoscopic displays, which are stereoscopic (that is, require active or passive glasses);
• Generation 2: 3DTV that supports plano-stereoscopic displays, which are autostereoscopic (do not require glasses);
• Generation 2.5: 3DTV that supports plano-stereoscopic displays, which are autostereoscopic (do not require glasses) and support multiple (N = 9) views;
• Generation 3: 3DTV that supports integral imaging, transmission, and displays;
• Generation 4: 3DTV that supports volumetric displays, transmission, and displays;
• Generation 5: 3DTV that supports object-wave transmission.

See Figs. 1.9 and 1.10 (partially based on Ref. 2). Whether and when we get beyond Generation 2.5 in this decade remains to be seen. This text, and the current commercial industry efforts, concentrate on Generation 1 services.

Figure 1.9 Three epochs of 3DTV commercial deployment. (Epoch 0, yesterday: Generation 0, anaglyph TV. Epoch 1, this decade: Generations 1 (circa 2010–2013) through 2.5 (circa 2013–2015 and beyond). Epoch 2, speculative, perhaps beyond 2020: Generations 3, 4, and 5. The blocks represent deployed commercial services, not just prototypes and/or trials.)

At press time, we find ourselves in Phase 1 of Generation 1. The existing commercial video infrastructure can handle 3D video in a basic, developmental form; however, providing HD 3D with motion graphics is not achievable without making enhancements to such infrastructure. Existing infrastructures, including satellite and/or terrestrial distribution networks, for example, can handle what some have termed "half-HD resolution" per eye, or frame formats of 1080i, 1080p24, and 1080i60. Existing encoders and set-top boxes can be used as long as signaling issues are addressed and 3D content is of a consistent form. The drawback of half-HD resolution is that images can be blurry, especially for sporting events and high-action movies [20]. New set-top chip sets are required for higher resolution 3DTV.
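The frame-compatible, spatially multiplexed signals behind this "half-HD resolution per eye" can be illustrated with a short sketch. Side-by-side packing, one of several possible arrangements, is shown below with assumed 1920 × 1080 arrays; this is an added illustration, not a normative description of any broadcaster's format.

```python
# Minimal sketch of side-by-side frame-compatible packing. Each
# 1920x1080 view is horizontally decimated by 2, halving per-eye
# resolution, so the stereo pair fits in one standard HD frame that
# existing encoders and set-top boxes can carry.
import numpy as np

def pack_side_by_side(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Pack two HxWx3 views into one HxWx3 frame-compatible frame."""
    half_l = left[:, ::2, :]    # keep every other column of the left view
    half_r = right[:, ::2, :]   # ... and of the right view
    return np.concatenate([half_l, half_r], axis=1)

left = np.zeros((1080, 1920, 3), dtype=np.uint8)   # placeholder views
right = np.ones((1080, 1920, 3), dtype=np.uint8)
packed = pack_side_by_side(left, right)
assert packed.shape == (1080, 1920, 3)   # still one standard HD frame
```

The decoder reverses the process, stretching each half back to full width; the lost columns are what makes fast-moving content look blurry, as noted above.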
1.3 COURSE OF INVESTIGATION
While there is a lot of academic interest in various aspects of the overall system, service providers and the consumers ultimately tend to take a system-level view.
Figure 1.10 A 30-year timeline for 3DTV services (1995–2025). (The figure maps each generation to its underlying technique. Generation 0: anaglyph TV transmission, analog or digital. Generation 1: stereoscopy, the earliest form, known for about 170 years; two simultaneous video images are delivered to the two eyes via color-based (anaglyph), polarization-based, or shutter-based filtering; other forms, such as Pulfrich-effect stereoscopy, exist. Generation 2: autostereoscopic viewing; no special eyewear, lenticular or barrier technologies, with a "sweet spot" phenomenon. Generation 2.5: multi-view autostereoscopy; many simultaneous horizontally spaced views, usually 7–9 and possibly up to about 50, giving some horizontal parallax. Generation 3: integral imaging, known since 1905; microlens arrays during capture and projection, a 2D array of many elemental images, both vertical and horizontal parallax; in the limit, a light-field renderer that replicates the 3D physical light distribution, a true 3D technique sometimes called "incoherent holography." Generation 4: volumetric 3D displays; a 3D volume is swept mechanically or electronically, with voxels formed by self-luminous pixels or moving projection screens. Generation 5: holography; basic principle 1948, first holograms 1960; based on physics, duplicating the light field (a true 3D technique); recording on photographic films or high resolution CCD arrays, with 3D reconstruction by proper illumination of the recording; digital holographic techniques; experimental holographic motion pictures in 1989; still at the basic research phase.)
Figure 1.11 A system view of a fully developed 3DTV distribution ecosystem, with m content providers and aggregators feeding hundreds of channels in each format (2D SD/HD from 2D cameras; CSV and V + meta from stereo cameras; V + D from depth cameras; MV + D from multicameras) to millions of viewers equipped with the matching decoders (2D, CSV, V + D, or MV + D with DIBR) and 2D, 3D, or multi-view 3D displays: (a) delivery by an IPTV service provider over a managed IP network (DVB transmission), with each viewer joining the desired view/channel streams via IGMP; (b) direct-to-home (DTH) satellite delivery, with all channels broadcast and each viewer selecting the desired views locally.
While service providers do, to an extent, take a constructionist bottom-up view to deploy the technological building blocks (such as encoders, encapsulators, IRDs [Integrated Receiver/Decoders], set-top boxes, and so on), 3DTV stakeholders need to consider the overall architectural system-level view of what it will take to deploy an infrastructure that is able to reliably and cost-effectively deliver a commercial-grade quality bundle of multiple 3DTV content channels to paying customers with high expectations. This text, therefore, takes such a system-level view, namely, how to actually deploy the technology. Figure 1.11 depicts the 3DTV distribution ecosystem that this text addresses. Fundamental visual concepts supporting stereographic perception of 3DTV are reviewed in Chapter 2. 3DTV technology and digital video compression principles are discussed in Chapter 3. Elements of an end-to-end 3DTV system are covered from a satellite delivery perspective in Chapter 4. Transmission technologies are assessed for terrestrial and IPTV-based architectures in Chapter 5. Finally, Chapter 6 covers standardization activities that are critical to any sort of broad deployment. This author recently published a companion text, "3D Television (3DTV) Technology, Systems, and Deployment—Rolling out the Infrastructure for Next-Generation Entertainment" (published by Taylor and Francis, 2010), which addresses the broader issues related to the technologies listed in Table 1.1, with an emphasis on post-CSV systems. At this time, as noted earlier, the CSV approach is the most likely to see early deployment in commercial 3DTV systems. Table 1.4 identifies the approaches that have been advanced by researchers for the treatment of the video images after their immediate capture by a stereo (or multi-view) set of cameras [21–23]. The most common approaches are CSV, video plus depth (V + D), multi-view video plus depth (MV + D), and layered depth video (LDV). We provide a brief discussion of these other systems in the chapters that follow, but we do not focus on them; the reader is referred to our companion text for a more detailed discussion of these systems.
TABLE 1.4 Common Video Treatment Approaches^a

Conventional stereo video (CSV): CSV is the most well-known and, in a way, the simplest type of 3D video representation. Only color pixel video data are involved, captured by at least two cameras. The resulting video signals may undergo some processing steps such as normalization, color correction, and rectification, but in contrast to other 3D video formats, no scene geometry information is involved. The video signals are meant in principle to be directly displayed using a 3D display system, though some video processing might also be involved before display.
Video plus depth (V + D): The video plus depth representation consists of a video signal and a per-pixel depth map. (This is also called 2D plus depth by some and color plus depth by others.) Per-pixel depth data are usually generated from calibrated stereo or multi-view video by depth estimation and can be regarded as a monochromatic, luminance-only video signal. The depth range is restricted to a range between two extremes, z_near and z_far, indicating the minimum and maximum distance, respectively, of the corresponding 3D point from the camera. Typically, the depth range is quantized with 8 bits, associating the closest point with the value 255 and the most distant point with the value 0. With that, the depth map is specified as a grayscale image that can be fed into the luminance channel of a video signal and then be processed by any state-of-the-art video codec. For displaying V + D at the decoder, a stereo pair can be rendered from the video and depth information by 3D warping with camera geometry information.

Multi-view video plus depth (MV + D): Advanced 3D video applications include wide-range multi-view autostereoscopic displays and free-viewpoint video, where the user can choose his or her own viewpoint. They require a 3D video format that allows rendering a continuum of output views, or a very large number of different output views, at the decoder. Multi-view video by itself does not support a continuum, and coding is increasingly inefficient for a large number of views. V + D supports only a very limited continuum around the available original view, since view synthesis artifacts increase dramatically with the distance of the virtual viewpoint. Therefore, an MV + D representation is required for advanced 3D video applications. MV + D involves a number of complex and error-prone processing steps. Depth has to be estimated for the N views at the sender. N color videos with N depth videos have to be encoded and transmitted. At the receiver, the data have to be decoded and the virtual views have to be rendered. The multi-view video coding (MVC) standard developed by MPEG supports this format and is capable of exploiting the correlation between the multiple views that are required to represent 3D video.

Layered depth video (LDV): Layered depth video is a derivative of, and alternative to, MV + D. It uses one color video with an associated depth map and a background layer with an associated depth map. The background layer includes image content that is covered by foreground objects in the main layer. LDV might be more efficient than MV + D because less data have to be transmitted. On the other hand, additional error-prone vision tasks are included that operate on partially unreliable depth data, which may increase artifacts.

^a Based on concepts from: 3DPHONE Document "All 3D Imaging Phone, 7th Framework Programme".
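The 8-bit depth quantization described in the V + D row above is easy to state in code. Below is a minimal sketch of our own (not taken from any standard's reference software) that maps metric depth between z_near and z_far into the 0–255 grayscale convention, nearest point mapped to 255 and farthest to 0, and back again at the receiver before 3D warping.

```python
import numpy as np

def depth_to_gray(z: np.ndarray, z_near: float, z_far: float) -> np.ndarray:
    """Quantize metric depth into the 8-bit V+D convention:
    z_near -> 255 (closest point), z_far -> 0 (most distant point)."""
    z_clipped = np.clip(z, z_near, z_far)
    gray = 255.0 * (z_far - z_clipped) / (z_far - z_near)
    return np.round(gray).astype(np.uint8)

def gray_to_depth(gray: np.ndarray, z_near: float, z_far: float) -> np.ndarray:
    """Invert the quantization at the decoder (up to quantization error)."""
    return z_far - (gray.astype(np.float64) / 255.0) * (z_far - z_near)

# Example: a tiny depth map for a scene spanning 1 m to 10 m.
z = np.array([[1.0, 5.5], [10.0, 3.0]])
g = depth_to_gray(z, z_near=1.0, z_far=10.0)   # [[255, 128], [0, 198]]
```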
REFERENCES
1. Steinberg S. 3DTV: Is the World Really Ready to Upgrade? Digital Trends, Online Magazine. Jan 7, 2010.
2. Onural L. The 3DTV Toolbox—The Results of the 3DTV NoE. 3DTV NoE Coordinator, Bilkent University, Workshop on 3DTV Broadcasting, Geneva. Apr 30, 2009.
3. Dosch C, Wood D. Can we create the "holodeck"? The challenge of 3D television. ITU News Magazine, Issue No. 09. Nov 2008.
4. Wallop H. CES 2010: 3D TVs on sale in UK by April. Telegraph. Jan 7, 2010. telegraph.co.uk.
5. Tarr G. BDA Welcomes 3D into Blu-ray Fold. TWICE. Jan 8, 2010.
6. Shilov A. Blu-ray Disc Association Finalizes Stereoscopic 3D Specification: Blu-ray 3D Spec Finalized: New Players Incoming. X-bit labs Online Magazine. Dec 18, 2009. http://www.xbitlabs.com.
7. 3D TV Round-Up. ITVT Online Magazine. Jan 6, 2010.
8. Aylsworth W. New SMPTE 3D Home Content Master Requirements Set Stage for New Market Growth. Las Vegas (NV): National Association of Broadcasters; 2009.
9. Otellini P. Intel Corporation President and CEO, Keynote Speech, Consumer Electronics Show, Las Vegas (NV). Jan 7, 2010.
10. 3-D Video Changes the Consumer Content Experience, CEA/ETC@USC Survey Finds. Joint Consumer Study of the Consumer Electronics Association and the Entertainment and Technology Center at the University of Southern California. Feb 20, 2009.
11. Digital Video Broadcasting Project (DVB), Online website material regarding the launch of the DVB 3D TV Kick-Off Workshop. Jan 2010.
12. Dosch C. Toward Worldwide Standards for First and Second Generation 3D TV. Workshop on Three-Dimensional Television Broadcasting, organized jointly by ITU-R, EBU, and SMPTE, Geneva. Apr 30, 2009.
13. Merritt R. Incomplete 3DTV Products in CES Spotlight; HDMI Upgrade One of Latest Pieces in Stereo 3D Puzzle. EE Times. Dec 23, 2009.
14. ITU-R Newsflash. ITU Journey to Worldwide "3D Television" System Begins, Geneva. Jun 3, 2008.
15. Rosenhahn B, editor. D26.3 Technical Report #3 on 3D: Time-varying Scene Capture Technologies. Project Number: 511568, Project Acronym: 3DTV, Initiative Title: Integrated Three-Dimensional Television—Capture, Transmission and Display. TC1 WP7 Technical Report 3. March 2008.
16. Kauff P, Müller M, et al. ICT-215075 3D4YOU, Deliverable D2.1.2: Requirements on Post-production and Formats Conversion. August 2008.
17. Wood D. Adding value to 3D TV standards. Chair, ITU-R WP 6C. Apr 29, 2009. Comments found at International Telecommunication Union website, www.itu.int.
18. Steenhuysen J. For Some, 3D Movies a Pain in the Head. Reuters. Jan 11, 2010.
19. ITU-R Activities in 3D, WP 6C Rapporteurs for 3D TV. Apr 30, 2009.
20. TVB, Television Broadcast. A 3DTV Update from the MPEG Industry Forum. Online Magazine. Jan 20, 2010. www.televisionbroadcast.com.
21. IST 6th Framework Programme, 3DTV NoE, 2004. Project Coordinator: Prof. Levent Onural, EEE Department, Bilkent University, TR-06800 Ankara, Turkey.
22. 3DPHONE. Project no. FP7-213349, Project title: All 3D Imaging Phone, 7th Framework Programme, Specific Programme "Cooperation", FP7-ICT-2007.1.5—Networked Media, D5.2: Report on first study results for 3D video solutions. Dec 31, 2008.
23. 3DPHONE. Project no. FP7-213349, Project title: All 3D Imaging Phone, 7th Framework Programme, Specific Programme "Cooperation", FP7-ICT-2007.1.5—Networked Media, D5.1: Requirements and specifications for 3D video. Aug 19, 2008.
24. TVB Magazine. TVB's 3DTV Timeline. Online Magazine. Jan 5, 2010. www.televisionbroadcast.com.
FURTHER READING
Ozaktas HM, Onural L, editors. Three-dimensional television: capture, transmission, display. New York: Springer Verlag; 2008. XVIII, 630 p., 316 illus. ISBN: 978-3-540-72531-2.
Javidi B, Okano F, editors. Three-dimensional television, video and display technology. New York: Springer Verlag; 2002.
Schreer O, Kauff P, Sikora T, editors. 3D Videocommunication: Algorithms, concepts and real-time systems in human-centered communication. New York: John Wiley & Sons; 2005. ISBN-13: 9780470022719.
Minoli D. 3D Television (3DTV) Technology, Systems, and Deployment—Rolling out the Infrastructure for Next-Generation Entertainment. Taylor and Francis; 2010.
APPENDIX A1: SOME RECENT INDUSTRY EVENTS RELATED TO 3DTV
This appendix includes a listing of events during the year prior to the publication of this text, so as to further document the activity in this arena. It is based in its entirety on Ref. 24. Despite the economic difficulties of 2009, the year marked a turning point in the adoption of 3D as a viable entertainment format. TVB presents a timeline of 3D video developments over the last year, from content to workflow initiatives to display technologies:

December 4, 2008: The San Diego Chargers and the Oakland Raiders appeared in a 3D simulcast displayed at theaters in Boston, Hollywood, and New York.
January 8, 2009: A 3D version of the Gators–Sooners match-up was simulcast in Las Vegas at the Consumer Electronics Show.
February 14, 2009: The NBA's All-Star Game was simulcast in 3D.
February 24, 2009: Toshiba announces the development of OLED Wallpaper TV, with a 3D version utilizing circularly polarized light in the works.
March 2, 2009: Avid Technology announced it was developing native support for the Sony XDCAM format, as well as adding 3D capabilities to its various editing software packages, Composer and Symphony.
March 9, 2009: BSkyB continued plowing toward 3DTV, with a goal to offer it by the end of the year.
April 6, 2009: BSkyB successfully transmitted live 3DTV across its HD infrastructure in the United Kingdom.
April 20, 2009: At the NAB show in Las Vegas, Panasonic announced work on a full 3D HD production system, encompassing everything from capture to Blu-ray distribution. The Panasonic gear list comprised authoring, a twin-lens P2 camera recorder and drives, 3D Blu-ray discs and players, and a 3D plasma display. Panasonic displayed its HD 3D Plasma Home Theater at the NAB convention.
July 30, 2009: BSkyB now plans to launch its 3D channel in 2010.
August 24, 2009: Panasonic joined James Cameron in a flack blitz for "Avatar," with a multipoint media and sales campaign and a nationwide tour with customized 18-wheelers outfitted with 103-inch Panasonic Viera plasma HDTVs and Blu-ray disc players.
September 2, 2009: Sony announced that it planned to introduce a consumer-ready 3D TV set in 2010, as well as build 3D capability into many of its consumer electronics, encompassing music, movies, and video games.
September 10, 2009: Mobile TV production specialist NEP has rolled out its first 3D truck.
September 11, 2009: BBC executives say some of the 2012 Olympic Games could be carried in 3D.
September 12, 2009: ESPN transmits an invitation-only 3D version of the University of SoCal versus Ohio State game to theaters in Los Angeles; Columbus, Ohio; Hartford, Conn.; and Hurst, Texas.
September 14, 2009: IBC features several professional 3D technologies, including Nagravision's 3D program guide and Viaccess 3D conditional access. The awards ceremony featured a 16-minute clip of James Cameron's "Avatar." Skeptics mentioned the headache factor, as well as the difficulty of doing graphics for 3D.
September 24, 2009: In-Stat finds that about 25 percent of those who are at least somewhat interested in having the ability to view 3D content at home are nonetheless unwilling to spend more money on a 3D TV.
October 1, 2009: Sony Broadcast bows a new single-lens 3D technology comprising a new optical system that captures left and right images simultaneously, with existing high frame-rate recording technology to realize 240 fps 3D filming.
October 8, 2009: 3M says it has developed 3D for mobile devices. The autostereoscopic 3D film targets cell phones, small video game consoles, and other portable digital devices, and requires no glasses.
October 21, 2009: SMPTE's fall conference focuses on 3D, with Dolby Labs, Fox Network, DTS, and RealD lending input.
October 26, 2009: Televisa broadcast the first soccer match in 3D over the weekend to parts of Mexico.
November 11, 2009: Rich Greenfield, analyst with Pali Capital, deems 3D a gimmick, at least as far as the movie industry is concerned. The movie "Scrooge" in 3D fueled his skepticism.
November 23, 2009: Sony chief Sir Howard Stringer is counting on 3D to be the company's next $10 billion business.
December 3, 2009: The International Federation of Football said it would broadcast 2010 World Cup soccer matches in 3D. FIFA said it has signed a media rights agreement with Sony, one of its official partners, to deliver 3D versions of up to 25 matches in the 2010 FIFA World Cup South Africa.
December 3, 2009: Neither Michael Jackson's videos nor the next Spiderman movie would be among Sony's upcoming 3D releases, the company's top executive said.
December 14, 2009: Two events mark the advance of 3D. First was the debut of a live 3D broadcast on the massive display screen at the Dallas Cowboys stadium in Arlington, Texas. The second: the release of "Avatar," James Cameron's interplanetary 3D epic.
December 18, 2009: "Avatar" changes Greenfield's doubts about 3D. "We are assuming 'Avatar' will reach opening weekend attendance levels of about 12 million, with
57.5 percent of attendance occurring on 3D screens yielding total opening weekend box office of over $100 million," he said.
January 4, 2010: ESPN announces the intended launch of ESPN 3D in time for the June 11 FIFA World Cup Soccer games.
January 5, 2010: Sony, Discovery, and IMAX make it official, announcing the intended launch of a 3D network in 2011. 3D technologies dominate previews of the annual Consumer Electronics Show in Las Vegas. DIRECTV announced the 2010 launch of an HD 3D channel at the show.
CHAPTER 2
3DV and 3DTV Principles
The physiological apparatus of the Human Visual System (HVS), which is responsible for the human sense of depth, has been understood for a long time, but it is not trivial. In this chapter we explore some basic key concepts of visual science that play a role in 3DTV. The concepts of stereoscopic vision, parallax, convergence, and accommodation are covered, among others.
2.1 HUMAN VISUAL SYSTEM
Stereopsis is the binocular sense of depth. Binocular as an adjective means "with two eyes, related to two eyes." Depth is perceived by the HVS by way of cues. Binocular cues are depth cues that depend on perception with two eyes. Monocular cues are depth cues that can be perceived with a single eye alone, such as relative size, linear perspective, or motion parallax. Stereoscopic fusion is the ability of the human brain to fuse the two different perspective views into a single, 3D image. Accommodation is the focusing of the eyes. Convergence is the horizontal rotation of the eyes (or cameras) that makes their optical axes intersect in a single point in 3D space. Interocular distance is the distance between an observer's eyes—about 64 mm for adults. Disparity is the distance between corresponding points on the left- and right-eye images. Retinal disparity is the disparity perceived at the retina of the human eyes. Horopter is the 3D curve that is defined as the set of points in space whose images form at corresponding points in the two retinas (i.e., the imaged points have zero disparity). Panum's fusional area is a small region around the horopter where retinal disparities can be fused by the HVS into a single, 3D image. Point of convergence is a point in 3D space where the optical axes of the eyes (or convergent cameras) intersect. The plane of convergence is the depth plane where the optical rays of the sensor centers intersect in the case of a parallel camera setup. Crossed disparity represents retinal disparities indicating that corresponding optical rays intersect in front of the horopter or the convergence plane.
Figure 2.1 Mechanism of stereoscopic fusion and retinal disparity: crossed and uncrossed disparity relative to the horopter, the convergence point, and Panum's fusional area (area of fusion).
Uncrossed disparity represents retinal (or camera) disparities where the optical rays intersect behind the horopter or the convergence plane. Interocular distance (also called interpupillary distance) is the distance between an observer's eyes, about 64 mm for adults [1] (although there is a distribution1 of distances of about ±12 mm). See Fig. 2.1 for illustrations of the concepts of disparity and Fig. 2.2 for the concept of fusion. Stereoscopy is the method used for creating a pair of planar stereo images. Plano-stereoscopic is the exact term for describing 3D displays that achieve a binocular depth effect by providing the viewer with images of slightly different perspective at one common planar screen. Depth range is the extent of depth that is perceived when a plano-stereoscopic image is reproduced by means of a stereoscopic viewing device.
1 This distance may need to be taken into account by content producers. Also, a percentage of people (around 6%) have depth impairments in their vision.
Figure 2.2 Fusion of left-eye–right-eye images: the right-eye view and left-eye view of the original object are fused to form a single perceived 3D object.
Corresponding points are the points in the left and right images that are pictures of the same point in 3D space. Parallax is the distance between corresponding points in the left- and right-eye images of a plano-stereoscopic image. Parallax angle is the angle under which the optical rays of the two eyes intersect at a particular point in the 3D space. Hence, (binocular) parallax is the apparent change in the position of an object when viewed from different points (e.g., from two eyes or from two different positions); in slightly different words, an apparent displacement or difference in the apparent position of an object viewed along two different lines of sight. Negative parallax stereoscopic presentation occurs where the optical rays intersect in front of the screen in the viewers’ space (this refers to crossed disparity). Positive parallax stereoscopic presentation occurs where the optical rays intersect behind the screen in the screen space (this refers to uncrossed disparity). Screen space is the region behind the display screen surface. Objects will be perceived in this region if they have positive parallax.
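The geometry behind these screen-space/viewer-space definitions reduces to one similar-triangles relation: a point displayed with screen parallax P (positive behind the screen, negative in front) at screen distance Zd is perceived at depth Zv = Zd · te/(te − P). The sketch below is our own illustration of that relation; the function name and values are invented for the example.

```python
def perceived_depth_m(zd_m: float, te_m: float, p_m: float) -> float:
    """Perceived depth Zv of a point shown with screen parallax p_m
    (positive = uncrossed, behind the screen; negative = crossed, in front),
    for a viewer with interocular distance te_m at screen distance zd_m.
    By similar triangles: Zv = Zd * te / (te - P)."""
    if p_m >= te_m:
        raise ValueError("parallax must stay below te (P = te puts the point at infinity)")
    return zd_m * te_m / (te_m - p_m)

TE = 0.064  # average adult interocular distance, meters

# Viewer sitting 3 m from the screen:
perceived_depth_m(3.0, TE, 0.0)     # 3.0 m -> on the screen (zero parallax)
perceived_depth_m(3.0, TE, 0.032)   # 6.0 m -> screen space (positive parallax)
perceived_depth_m(3.0, TE, -0.064)  # 1.5 m -> viewer space (negative parallax)
```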
Figure 2.3 Parallax: (a) positive parallax, (b) zero parallax, and (c) negative parallax.
Viewer space is the region between the viewer and the display screen surface. Objects will be perceived in this region if they have negative parallax (Fig. 2.3). Accommodation/convergence conflict is the deviation from the learned and habitual correlation between accommodation and convergence when viewing plano-stereoscopic images. Binocular rivalry represents perception conflicts that appear in the case of colorimetric, geometric, photometric, or other asymmetries between the two (re-created) stereo images. Crosstalk is the imperfect separation of the left- and right-eye images when viewing plano-stereoscopic 3D content. Crosstalk is a physical entity, whereas ghosting is a psychophysical entity (Fig. 2.4).
Figure 2.4 Viewing a 3D image on a screen and the related accommodation–convergence conflict (screen distance ZD, screen width WD, interocular distance te, parallax P, perceived depth Zv).
2.1.1 Depth/Binocular Cues
The terms we just defined are now further applied. The HVS2 is able to perceive depth due to the brain's ability to interpret several types of depth cues that can be separated into two broad categories: sources of information that require only one eye (e.g., relative size, linear perspective, or motion parallax) are called monocular cues, whereas those that depend on both eyes are called binocular cues. Everyday scenes usually contain more than one type of depth cue, and the importance of each cue is based on learning and experience. In addition to this, the influence of the different cues on human depth perception also depends on the relative distances between the observer and the objects in the scene [2]. Binocular cues are mainly dominant for viewing distances below 10 m and, hence, they are particularly important for 3DTV. They are based on the fact that the human eyes are horizontally separated. Each eye provides the brain with a unique perspective of the observed scene. This horizontal separation—on average approximately 64 mm for an adult—is known as interocular distance te. It leads to spatial distances between the relative projections of observed 3D points in the scene onto the left and the right retina, also known as retinal disparities. These disparities provide the HVS with information about the relative distance of objects and about the spatial structure of our 3D environment. It is the retinal disparity that allows the human brain to fuse the two different perspective views from the left and the right eye into a single 3D image. Figure 2.1 illustrates how this process of stereoscopic fusion works in detail. Basically, when looking at the 3D world, the eyes rotate until their optical axes converge (intersect at a single point) on the "object of interest" [3]. It follows that the point of convergence is projected to the corresponding image points on the respective retinas; that is, it does not produce any retinal disparity. The same holds true for all points on the horopter, which is defined by the fixation point and the nodal points of both eyes. All other points, however, will produce retinal disparities whose magnitude becomes larger the further away the 3D points are from the horopter. Disparities that are caused by points in front of the horopter are said to be crossed, while disparities that result from points behind the horopter are called uncrossed. As long as the—crossed or uncrossed—disparities do not exceed a certain magnitude, the two separate viewpoints can be merged by the human brain into a single, 3D percept. The small region around the horopter within which disparities are fused is known in the literature as Panum's fusional area. Points outside this area are not fused and double images will be seen, a phenomenon that is called diplopia.
2 Portions of the discussion that follows for the rest of Section 2.1 are based on a public report of the 3D4YOU project under the ICT (Information and Communication Technologies) Work Programme 2007–2008 [1].
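To make the preceding geometry concrete, the short sketch below (our own, using the te ≈ 64 mm figure from the text) computes the parallax angle subtended by a point at a given viewing distance, and the angular difference between a fixated point and a nearer point, which is the quantity that drives retinal disparity.

```python
import math

def parallax_angle_deg(te_m: float, distance_m: float) -> float:
    """Angle under which the optical rays of the two eyes intersect
    at a point 'distance_m' away, for interocular distance te_m."""
    return math.degrees(2.0 * math.atan(te_m / (2.0 * distance_m)))

TE = 0.064  # average adult interocular distance, ~64 mm

# Fixating at 2 m; a second point sits at 1.5 m (in front of the horopter).
alpha_fixation = parallax_angle_deg(TE, 2.0)   # ~1.83 degrees
alpha_near = parallax_angle_deg(TE, 1.5)       # ~2.44 degrees
disparity = alpha_near - alpha_fixation        # crossed disparity, ~0.61 degrees
```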
2.1.2 Accommodation
That the double images just described usually do not disturb visual perception is the result of another habitual behavior that is tightly coupled with the described convergence process. In concert with the rotation of the optical axes, the eyes also focus (accommodate, by changing the shape of the eyes' lenses) on the object of interest. This is important for two different reasons. First of all, focusing on the point of convergence allows the observer to see the object of interest clear and sharp. Secondly, the perception of disturbing double images, which in principle result from all scene parts outside Panum's fusional area, is efficiently suppressed due to an increasing optical blur [4]. Although particular realizations differ widely in the specific techniques used, most stereoscopic displays and projections are based on the same basic principle of providing the viewer with two different perspective images for the left and the right eye. Usually, these slightly different views are presented at the same planar screen. These displays are therefore called plano-stereoscopic devices. In this case, the perception of binocular depth cues results from the spatial distances between corresponding points in both planar views, that is, from the so-called parallax P, which in turn induces the retinal disparities in the viewer's eyes. Thus, the perceived 3D impression depends, among other parameters, on the viewing distance and on both the amount and type of parallax.

2.1.3 Parallax
As shown in Fig. 2.3, three different cases have to be taken into account here:

1. Positive Parallax: Corresponding image points are said to have positive or uncrossed parallax P when the point in the right-eye view lies more to the right than the corresponding point in the left-eye view. Thus, the related viewing rays converge in a 3D point behind the screen, so that the reproduced 3D scene is perceived in the so-called screen space. Furthermore, if the parallax P exactly equals the viewer's interocular distance te, the 3D point is reproduced at infinity. This also means that the allowed maximum of the positive parallax is limited to te.

2. Zero Parallax: With zero parallax, corresponding image points lie at the same position in the left- and the right-eye views. The resulting 3D point is therefore observed directly at the screen, a situation that is often referred to as the Zero Parallax Setting (ZPS).

3. Negative Parallax: Conjugate image points with negative or crossed parallax P are located such that the point in the right-eye view lies more to the left than the corresponding point in the left-eye view. The viewing rays therefore converge in a 3D point in front of the screen in the so-called viewer space.

The parallax angle is unlimited when looking at a real-world 3D scene.
In this case, the eyes simultaneously converge and accommodate on the object of interest. As explained, these jointly performed activities allow the viewer to stereoscopically fuse the object of interest and, at the same time, to suppress diplopia (double image) effects for scene parts that are outside the Panum's fusional area around the focused object. However, the situation is different in stereoreproduction. When looking at a stereoscopic 3D display, the eyes always accommodate on the screen surface, but they converge according to parallax (Fig. 2.4). This deviation from the learned and habitual correlation between accommodation and convergence is known as accommodation–convergence conflict. It represents one of the major reasons for eyestrain, confusion, and loss of stereopsis in 3D stereoreproduction [5–7]. It is therefore important to make sure that the maximal parallax angle αmax is kept within acceptable limits or, in other words, to guarantee that the 3D world is reproduced rather close to the screen surface of the 3D display. The related generation of planar stereoscopic views requires capture with a synchronized stereocamera. Because such 2-camera systems are intended to mediate the natural binocular depth cue, it is not surprising that their design shows a striking similarity to the HVS. For example, the interaxial distance tc between the focal points of the left- and right-eye camera lenses is usually chosen in relation to the interocular distance te. Furthermore, similar to the convergence capability of the HVS, it must be possible to adapt a stereocamera to a desired convergence condition or ZPS; that is, to choose the part of the 3D scene that is going to be reproduced exactly on the display screen. As shown in Fig. 2.5, this can be achieved by two different camera configurations [8, 9]:

1. "Toed-In" Setup: With the toed-in approach, depicted in Fig. 2.5(a), a point of convergence is chosen by a joint inward rotation of the left- and right-eye cameras.

2. "Parallel" Setup: With the parallel method, shown in Fig. 2.5(b), a plane of convergence is established by a small shift h of the sensor targets.

At first view, the toed-in approach intuitively seems to be the more suitable solution because it directly fits the convergence behavior of the HVS. However, it has been shown in the past that the parallel approach is nonetheless preferable, because it provides a higher stereoscopic image quality [8, 9].
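To make the parallel-setup geometry concrete, here is a small sketch of our own, under an idealized pinhole model: it computes the on-sensor disparity of a point at depth Z for a parallel stereo rig with baseline tc, focal length f, and a sensor shift h = f·tc/(2·Zc) chosen to place the convergence plane at Zc. The sign of the result distinguishes the three parallax cases listed above; the function name and example values are assumptions for illustration only.

```python
def sensor_disparity_mm(f_mm: float, tc_mm: float, zc_m: float, z_m: float) -> float:
    """Disparity on the sensor (mm) for a parallel stereo rig with baseline
    tc_mm and focal length f_mm, converged at zc_m via the sensor shift
    h = f*tc/(2*zc).  Positive -> crossed (point in front of the convergence
    plane), zero -> on it, negative -> uncrossed (behind it)."""
    return f_mm * tc_mm * (1.0 / (z_m * 1000.0) - 1.0 / (zc_m * 1000.0))

# Example: 65 mm baseline, 25 mm lens, convergence plane at 3 m.
for z in (1.5, 3.0, 10.0):
    d = sensor_disparity_mm(25.0, 65.0, 3.0, z)
    if d > 0:
        kind = "crossed -> viewer space"
    elif d < 0:
        kind = "uncrossed -> screen space"
    else:
        kind = "zero parallax -> on screen"
    print(f"Z = {z:5.1f} m -> disparity {d:+.3f} mm ({kind})")
```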
2.2 3DV/3DTV STEREOSCOPIC PRINCIPLES
We start this section with a few additional definitions. Stereo means "having depth, or being three-dimensional" and it describes an environment where two inputs combine to create one unified perception of three-dimensional space. Stereoscopic vision is the process where two eye views combine in the brain to create the visual perception of one 3D image; it is a by-product of good binocular vision.
Figure 2.5 Basic stereoscopic camera configurations: (a) "toed-in" approach, and (b) "parallel" setup (with interaxial distance tc, focal length f, and sensor shift h).
Stereoscopy can be defined as any technique that creates the illusion of depth or three-dimensionality in an image. Stereoscopic (literally: "solid looking") is the term used to describe a visual experience having visible depth as well as height and width. The term may refer to any experience or device that is associated with binocular depth perception. Stereoscopic 3D refers to two photographs taken from slightly different angles that appear three-dimensional when viewed together. Autostereoscopic describes 3D displays that do not require glasses to see the stereoscopic image. Stereogram is a general term for any arrangement of left-eye and right-eye views that produces a three-dimensional result, which may consist of (i) a side-by-side or over-and-under pair of images; (ii) superimposed images projected onto a screen; (iii) a color-coded composite (anaglyph); (iv) lenticular images; or (v) alternate projected left-eye and right-eye images that fuse by means of the persistence of vision [10]. Stereoplexing (stereoscopic multiplexing) is a mechanism to incorporate information for the left and right perspective views into a single information channel without expansion of the bandwidth. On the basis of the principles discussed above, a number of techniques for re-creating depth for the viewer of photographic or video content have been developed. A considerable amount of research has taken place during the past 30 or more years on 3D graphics and imaging; most of the research has focused on photographic techniques, computer graphics, 3D movies, and holography. (The field of imaging, including 3D imaging, relates more to the static or quasi-static capture/representation—encoding, compression, transmission, display, and storage—of content such as photographs, medical images, CAD/CAM drawings, and so on, especially for high-resolution applications; this topic is not covered here.)
Fundamentally, the technique known as "stereoscopy" has been advanced, where two pictures or scenes are shot, one for each eye, and each eye is presented with its proper picture or scene, in one fashion or another (Fig. 2.6). Stereoscopic 3D video is based on the binocular nature of human perception; to generate quality 3D content, the creator needs to control the depth and parallax of the scene, among other parameters. Depth perception is the ability to see in 3D to allow the viewer to judge the relative distances of objects; depth range is a term that applies to stereoscopic images created with cameras.
Figure 2.6 Stereoscopic capture of scene to achieve 3D when scene is seen with appropriate display system. In this figure the separation between the two images is exaggerated for pedagogical reasons (in actual stereo photos the differences are very minute).
Figure 2.7 Generation of horizontal (binocular) parallax for stereoscopic displays: a 3D object (statue) in front of a 2D image (wallpaper) at the object plane.
As noted above, parallax is the apparent change in the position of an object when viewed from different points; namely, the visual differences in a scene when viewed from different points. A 3D display (screen) needs to generate some sort of parallax, which, in turn, creates a stereoscopic sense (Fig. 2.7). Nearby objects have a larger parallax than more distant objects when observed from different positions; because of this feature, parallax can be used to determine distances. Because the eyes of a person are in different positions on the head, they present different views simultaneously. This is the basis of stereopsis, the process by which the brain exploits the parallax due to the different views from the eyes to gain depth perception and estimate distances to objects. 3D depth perception can be supported by 3D display systems that allow the viewer to receive a specific and different view for each eye; such a stereo pair of views must correspond to the human eye positions, thus enabling the brain to compute the 3D depth perception. The main means of stereoscopic display has moved over the years from anaglyph to polarization and shutter glasses. Some basic terms and concepts related to camera management for stereoscopic filming are as follows: interaxial distance is the distance between the left- and right-eye lenses in a stereoscopic camera. Camera convergence is the term used to denote the process of adjusting the ZPS in a stereoscopic camera. ZPS defines the point(s) in 3D space that have zero parallax in the plano-stereoscopic image created, for example, with a stereoscopic camera. These points will be stereoscopically reproduced on the surface of the display screen. Two simultaneous conventional 2D video streams are produced by a pair of cameras mimicking the two human eyes that see the environment from two slightly different angles. Simple planar 3D films are made by recording separate
images for the left eye and the right eye from two cameras that are spaced a certain distance apart. The spacing chosen affects the disparity between the left-eye and the right-eye pictures, and thereby the viewer's sense of depth. While this technique achieves depth perception, it often results in eye fatigue after watching such programming for a certain amount of time: within minutes after the onset of viewing, stereoscopy frequently causes eye fatigue and, in some, feelings similar to those experienced during motion sickness [11]. Nevertheless, the technique is widely used for (stereoscopic) photography and moviemaking, and it has been tested many times for television [12]. At the display level, one of these streams is shown to the left eye, and the other one to the right eye. Common means of separating the right-eye and left-eye views include glasses with colored transparencies, polarization filters, and shutter glasses. Polarization of light is the arrangement of beams of light into separate planes or vectors by means of polarizing filters; when two vectors are crossed at right angles, vision or light rays are obscured. In the filter-based approach, complementary filters are placed jointly over two overlapping projectors (when projectors are used—refer back to Table 1.3) and over the two corresponding eyes (i.e., anaglyph, linear or circular polarization, or the narrow-pass filtering of Infitec) [13]. Although the technology is relatively simple, the necessity of wearing glasses while viewing has often been considered a major obstacle to the wide acceptance of 3DTV. Also, there are some limitations to the approach, such as the need to retain a head orientation that works properly with the polarized light (e.g., do not bend the head 45 degrees side to side), and the need to be within a certain viewing angle. There are a number of other mechanisms to deliver binocular stereo, including barrier filters over LCDs (vertical bars act as a fence, channeling data in specific directions for the eyes). It should be noted, as we wrap up this brief overview of the HVS, that individuals vary along a continuum in their ability to process stereoscopic depth information. Studies have shown that a relatively large percentage of the population experience stereodeficiencies in depth discrimination/perception if the display duration is very short, and that a certain percentage of the adult population (about 6%) has persistent deficiencies. Figure 2.8 depicts the results of a study that quantifies these observations [14]. These results indicate that certain fast-cut methods in scenes may not work for all in 3D. Object motion can also create visual problems in stereoscopic 3DTV. Figure 2.9 depicts visual discomfort that has been observed in studies [14]. At the practical level, in the context of cinematography, while new digital 3D technology has made the experience more comfortable for many, for some people with eye problems, a prolonged 3D session may result in an aching head according to ophthalmologists. Some people have very minor eye problems (e.g., a minor muscle imbalance), which the brain deals with naturally under normal circumstances; but in a 3D movie, these people are confronted with an entirely new sensory experience that translates into greater mental effort, making it easier to get a headache. Some people who do not have normal depth perception cannot see in 3D at all. People with eye muscle problems, in which the eyes are not pointed at the same object, have trouble processing 3D images.
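As a simple illustration of the color-filter separation mentioned above, the sketch below (our own) builds a classic red-cyan anaglyph: the red channel is taken from the left-eye image and the green/blue channels from the right-eye image, so that red/cyan glasses route each view to the proper eye. Production anaglyph encoders apply more careful color mixing; this is only the basic idea.

```python
import numpy as np

def red_cyan_anaglyph(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Compose a red-cyan anaglyph from two H x W x 3 RGB views.
    The left eye gets the red channel; the right eye gets green and blue."""
    assert left.shape == right.shape and left.shape[2] == 3
    out = np.empty_like(left)
    out[..., 0] = left[..., 0]      # red channel   <- left view
    out[..., 1:] = right[..., 1:]   # green + blue  <- right view
    return out

# Toy example with random 720p frames:
rng = np.random.default_rng(0)
L = rng.integers(0, 256, (720, 1280, 3), dtype=np.uint8)
R = rng.integers(0, 256, (720, 1280, 3), dtype=np.uint8)
ana = red_cyan_anaglyph(L, R)
```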
Figure 2.8 Stereo deficiencies in some populations [14]: percentage of viewers (n = 100) exhibiting left/right depth-discrimination deficiencies as a function of display duration (ms), with curves for any stereodeficiency, stereo-anomalous (uncrossed), "stereo blind," and stereo-anomalous (crossed).
Figure 2.9 Visual discomfort caused by motion in a scene [14]: rated visual comfort (from very comfortable to extremely uncomfortable) as a function of object velocity (cm/s), from slow to fast.
Headaches and nausea are cited as the main reasons 3D technology never took off. However, newer digital technology addresses many of the problems that typically caused 3D moviegoers discomfort. Some of the problems were related to the fact that the projectors were not properly aligned; systems that use a single digital projector help overcome some of the old problems [15]. However, deeper-rooted issues about stereoscopic display may continue to affect a number of viewers (these problems will be solved by future autostereoscopic systems). The two video views required for 3DTV can be compressed using standard video compression techniques. MPEG-2 encoding is widely used in digital TV applications today, and H.264/MPEG-4 AVC is expected to be the leading video technology standard for digital video in the near future. Extensions have been developed recently to H.264/MPEG-4 AVC and other related standards to support 3DTV; other standardization work is underway. The compression gains and quality of 3DTV will vary depending on the video coding standard used. While inter-view prediction will likely improve the compression efficiency compared to simulcasting (transmitting the two views end-to-end, and so requiring a doubling of the channel bandwidth), new approaches, such as, but not limited to, asymmetric view coding, video-plus-depth, and layered video, are necessary to reduce bandwidth requirements for 3DTV [16]. Temporal multiplexing and spatial compression are being used in the short term, but with a compromise in resolution, as discussed in Chapter 3. There are a number of ways to create 3D content, including: (i) Computer-Generated Imagery (CGI); (ii) stereocameras; and (iii) 2D to 3D conversions. CGI techniques are currently the most technically advanced, with well-developed methodologies (and tools) to create movies, games, and other graphical applications—the majority of cinematic 3D content is comprised of animated movies created with CGI. Camera-based 3D is more challenging. A 2-camera approach is the typical approach here, at this time; another approach is to use a 2D camera in conjunction with a depth-mapping system. With the 2-camera approach, the two cameras are assembled with the same spatial separation to mimic how the eyes perceive a scene. The technical issues relate to focus/focal length, specifically keeping in mind that these have to be matched precisely to avoid differences in vertical and horizontal alignment and/or rotational differences (lens calibration and motion control must be added to the camera lenses). 2D to 3D conversion techniques include the following (a code sketch of the last technique appears after the next paragraph):
object segmentation and horizontal shifting; depth mapping (bandwidth-efficient multiple images and viewpoints); creation of depth maps using information from 2D source images; making use of human visual perception for 2D to 3D conversion; creation of surrogate depth map (e.g., gray-level intensities of a color component).
Conversion of 2D material is the least desirable but perhaps it is the approach that could generate the largest amount of content in the short term. Some note that it is “easy to create 3D content, but it is hard to create good 3D content” [17].
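One of the conversion techniques listed above, the creation of a surrogate depth map from the gray-level intensities of a color component, can be sketched in a few lines of code. The following is purely our own illustration (the function names and parameters are invented for the example, and a production 2D-to-3D converter would add object segmentation and proper occlusion filling): one channel is treated as pseudo-depth, smoothed, and used to shift pixels horizontally to synthesize the second eye's view.

```python
import numpy as np

def surrogate_depth(rgb: np.ndarray, channel: int = 0) -> np.ndarray:
    """Use the gray-level intensities of one color component (default: red)
    as a stand-in depth map, normalized to [0, 1] and box-smoothed so that
    fine texture does not act as spurious depth."""
    d = rgb[..., channel].astype(np.float32) / 255.0
    k = 9  # horizontal smoothing window
    pad = np.pad(d, ((0, 0), (k // 2, k // 2)), mode="edge")
    return np.stack([pad[:, i:i + d.shape[1]] for i in range(k)]).mean(axis=0)

def synthesize_right_view(rgb: np.ndarray, depth: np.ndarray, max_shift: int = 8) -> np.ndarray:
    """Shift pixels horizontally in proportion to pseudo-depth (here, larger
    value = nearer = larger shift), producing a crude second-eye view."""
    h, w, _ = rgb.shape
    out = np.zeros_like(rgb)
    shift = (depth * max_shift).astype(int)
    for y in range(h):
        xs = np.clip(np.arange(w) - shift[y], 0, w - 1)
        out[y] = rgb[y, xs]
    return out
```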
A practical problem relates to "insertion". At least early on, 2D content will be inserted into a 3D channel, much the way standard-definition commercials still show up in HD content. A set-top box could be programmed to automatically detect an incoming format and handle various frame packing arrangements to support 2D/3D switching for advertisements [18]. In summary, and as we transition the discussion to autostereoscopic approaches (and in preparation for that discussion), we list below the highlights of the various approaches, as provided in Ref. [19] (refer back to Table 1.1 for definitions of terms).

Stereoscopy is the Simplest and Oldest Technique:
• does not create physical duplicates of 3D light;
• quality of the resultant 3D effect is inferior;
• lacks parallax;
• focus and convergence mismatch;
• misalignment is seen;
• a "motion sickness" type of feeling (eye fatigue) is produced;
• is the main reason for the commercial failure of 3D techniques.

Multi-view video provides some horizontal parallax:
• still limited to a small angle (∼20–45 degrees);
• jumping effect observed;
• viewing discomfort similar to stereoscopy;
• requires a high-resolution display device;
• leakage of neighboring images occurs.

Integral Imaging adds vertical parallax:
• gets closer to an ideal light-field renderer as the number of lenses (elemental images) increases: true 3D;
• alignment is a problem;
• requires very high resolution devices;
• leakage of neighboring images occurs.

Holography is superior in terms of replicating the physical light distribution:
• recording holograms is difficult;
• very high resolution recordings are needed;
• display techniques are quite different;
• network transmission is anticipated to be extremely taxing.
2.3 AUTOSTEREOGRAPHIC APPROACHES
Autostereo implies that the perception of 3D is in some manner automatic, and does not require devices such as glasses—either filtered or shuttered. Autostereoscopic displays use additional optical elements aligned on the surface of the
screen, to ensure that the observer sees different images with each eye. 3D autostereoscopic displays (where no headgear is needed) are still in the research phase at this time. We describe here displays based only on lenticular or parallax barrier binocular mechanisms (and not, for example, holographic approaches). Lenticular lenses are curved optics that allow both eyes to see a different image of the same object at exactly the same time. Lenticules are tiny plastic lenses pasted in an array on a transparent sheet that is then applied onto the display surface of the LCD screen (Fig. 2.10). A typical multi-view 3D display device shows nine views simultaneously and allows a limited free-viewing angle (some prototype products support a larger number of views). When looking at the cylindrical image on the TV, the left and right eye see two different 2D images that the brain combines to form one 3D image.
Figure 2.10 Lenticular approach: a lenticular lens directs separate image columns to the left and right eyes.
The lenslet or lenticular elements are arranged to make parts of an underlying composite image visible only from certain viewing directions. Typically, a lenticular display multiplexes separate images in cycling columns beneath its elements, making them take on the color of selected pixels beneath when viewed from different directions. LCDs or projection sources can provide the pixels for such displays [13]. A drawback of the technology is that it requires a very specific "optimal sitting spot" for getting the 3D effect, and shifting a small distance to either side will make the TV's images seem distorted. A parallax barrier is a device used on the surface of non–glasses-based 3DTV systems with slits that allow the viewer to see only certain vertical columns of pixels at any one time. The parallax barrier is the more consumer-friendly technology of the two and the only one that allows for regular 2D viewing. The parallax barrier is a fine grating of liquid crystal placed in front of the screen, with slits in it that correspond to certain columns of pixels of the TFT (Thin-Film Transistor) screen (Fig. 2.11).
Figure 2.11 Parallax barrier approach: in 3D display mode the parallax barrier is on (light cannot pass through).
These positions are carved so as to transmit alternating images to each eye of the viewer, who is again sitting in an optimal "sweet spot." When a slight voltage is applied to the parallax barrier, its slits direct light from each image slightly differently to the left and right eyes, again creating an illusion of depth and thus a 3D image in the brain [20]. The parallax barrier can be switched on and off, allowing the screen to be used for 2D or 3D viewing. However, the need still exists to sit in the precise "sweet spots," limiting the usage of this technology. Autostereoscopic technology will likely not be part of early 3DTV deployments. For example, Philips reportedly folded an effort to define an autostereoscopic technology that does not require glasses because it had a narrow viewing range and a relatively high loss of resolution and brightness [21].

REFERENCES
1. Kauff P, Müller M, et al. ICT-215075 3D4YOU, Deliverable D2.1.2: Requirements on Post-production and Formats Conversion. Aug 2008. [This reference is not copyrighted.]
2. Cutting JE. How the eye measures reality and virtual reality. Behav Res Methods Instrum Comput 1997; 29(1):27–36.
3. Lipton L. Foundations of the Stereoscopic Cinema—A Study in Depth. New York: Van Nostrand Reinhold; 1982.
4. IJsselsteijn WA, Seuntiëns PJH, Meesters LMJ. State-of-the-Art in Human Factors and Quality Issues of Stereoscopic Broadcast Television. ATTEST Technical Report D1, IST-2001-34396. Aug 2002.
5. IJsselsteijn WA, de Ridder H, Freeman J, Avons SE, Bouwhuis D. Effects of stereoscopic presentation, image motion, and screen size on subjective and objective corroborative measures of presence. Presence 2001; 10(3).
6. Lipton L. Stereographics Developers' Handbook. 1997.
7. Pastoor S. Handbuch der Telekommunikation, chapter 3D-Displays: Methoden und Stand der Technik. Köln: Deutscher Wirtschaftsdienst; 2002.
8. Woods A, Docherty T, Koch R. Image distortions in stereoscopic video systems. Proceedings of SPIE Stereoscopic Displays and Applications IV; Feb 1993; San Jose (CA). pp. 36–48.
9. Yamanoue H, Okui M, Okano F. Geometrical analysis of puppet-theater and cardboard effects in stereoscopic HDTV images. IEEE Trans Circuits Syst Video Technol 2006; 16(6):744–752.
10. The 3D@Home Consortium. http://www.3dathome.org/. 2010.
11. Onural L, Ozaktas HM. Three-dimensional television: from science-fiction to reality. In: Ozaktas HM, Onural L, editors. Three-dimensional television: capture, transmission, display. New York: Springer; 2008. 630 pp. ISBN: 978-3-540-72531-2.
12. Dosch C, Wood D. Can We Create the "Holodeck"? The Challenge of 3D Television. ITU News Magazine, Issue No. 09. Nov 2008.
13. Baker H, Li Z, Papadas C. Calibrating camera and projector arrays for immersive 3D display. Hewlett-Packard Laboratories Papers; 2009; Palo Alto (CA).
14. Tam WJ. Human Visual Perception Relevant to 3D-TV. Ottawa: Communications Research Centre Canada; 2009.
15. Steenhuysen J. For Some, 3D Movies a Pain in the Head. Reuters. Jan 11, 2010.
16. Christodoulou L, Mayron LM, Kalva H, Marques O, Furht B. 3D TV using MPEG-2 and H.264 view coding and autostereoscopic displays. Proceedings of the 14th Annual ACM International Conference on Multimedia; 2006; Santa Barbara (CA). ISBN: 1-59593-447-2.
17. Chinnock C. 3D Coming Home in 2010. 3D@Home White Paper, 3D@Home Consortium. http://www.3Dathome.org.
18. TVB, Television Broadcast. A 3DTV Update from the MPEG Industry Forum. Online Magazine. Jan 20, 2010.
19. Onural L. The 3DTV Toolbox—The Results of the 3DTV NoE. 3DTV NoE Coordinator, Bilkent University, Workshop on 3DTV Broadcasting, Geneva. Apr 30, 2009.
20. Patkar M. How 3DTV Works: Part II—Without Glasses. Online Magazine. Oct 26, 2009. http://Thinkdigit.com.
21. Merritt R. Incomplete 3DTV Products in CES Spotlight; HDMI Upgrade One of Latest Pieces in Stereo 3D Puzzle. EE Times. Dec 23, 2009.
CHAPTER 3
3DTV/3DV Encoding Approaches
This chapter looks at some of the subsystem elements that comprise an overall 3DTV system. Subsystems include elements for the capture, representation/definition, compression, distribution, and display of the signals. Figure 3.1 depicts a logical end-to-end view of a 3DTV signal management system; Fig. 3.2 provides additional details. Figure 3.3 provides a more physical perspective. 3D approaches are an extension of traditional video capture and distribution approaches. We focus here on the representation/definition of the signals and compression.1 We provide only a brief discussion on the capture and display technology. The reader may refer to [1–4] for more details on capture and display methods and technologies. Distribution is covered in Chapters 4 and 5. We already mentioned in Chapter 1 that the availability of content will be critical to the successful introduction of the 3DTV service and that 3D content is more demanding in terms of production. Real-time capture of 3D content almost invariably requires a pair of cameras to be placed side-by-side in what is called a 3D rig to yield a left-eye, right-eye view of a scene. The lenses on the left and right cameras in a 3D rig must match each other precisely. The precision of the alignment of the two cameras is critical; misaligned 3D video is cumbersome to watch and will be stressful to the eyes. Two parameters of interest for 3D camera acquisition are camera separation and toe-in; we already covered these issues in the previous chapter. This operation is similar to how the human eyes work: as one focuses on an object in close proximity, the eyes toe-in; as one focuses on remote objects, the eyes are parallel. Interaxial distance (also known as interaxial separation) is the distance between the camera lenses' axes; this can also be defined as the distance between the two taking positions for a stereo photograph.
1 Some people refer to compression, in this context, as coding. We use a slightly different nomenclature. Compression is the application of a bit-reduction technique; for example, an MPEG-based Discrete Cosine Transform (DCT) and the MPEG multiplex structure. By coding we mean the encapsulation needed prior to transmission into a network. This may include IP encapsulation, DVB encapsulation, Forward Error Correction (FEC), encryption, and other bit-level management algorithms (e.g., scrambling).
[Figure 3.1 Logical view of end-to-end 3DTV system. (Based in part on the 3DTV Project, 3D Media Cluster.) Pipeline from 3D scene to scene replica: capture, representation, compression, coding, transmission, signal conversion, display. Capture: capture the 3D scene to provide input to the 3DTV system. Scene representation: abstract representation of the captured 3D scene information in digital form. Compression: data reduction algorithms. Coding: specification of the exchange format of the data. Transmission: transmission of the coded data. Signal conversion: conversion of 3DTV data to forms suitable for 3DTV displays. Display: equipment to decode and display the 3DTV signal.]
The baseline distance between visual axes (separation) for the eyes is around 2.5 in. (65 mm), although there is a distribution of values, as shown in Fig. 3.4, that may have to be taken into account by content producers. 3D cameras use the same separation as a baseline, but the separation can be made smaller or larger to accentuate the 3D effect of the displayed material. The separation will also need to be varied for different focal-length lenses and for different camera-to-subject distances [5].

[Figure 3.2 End-to-end details. Capture: single-camera techniques; multi-camera techniques; holographic capture devices; pattern projection techniques; time-of-flight techniques. Representation: pseudo-3D; dense depth; surface-based; point-based; volumetric; texture mapping; light field; object-based. Compression: stereoscopic video coding (ITU-T Rec. H.262/ISO/IEC 13818-2 MPEG-2 video, Multiview profile); multiview video coding (MVC), with H.264/AVC used for each view independently or the MVC extension of H.264/AVC (Amendment 4). Coding: DVB-S; DVB-S2; DVB-T; DVB-C; DVB-H; IP/IPTV. Transmission: satellite backbone plus local distribution; satellite DTH; terrestrial over the air; cable TV system; IP streaming/Internet; IP/IPTV (private network); 3G/4G; media (Blu-ray Disc); Internet downloads. Signal conversion: technology-dependent for polarized screens, time-interleaved screens, LCD, LED, plasma, lenticular displays, barrier displays, and/or future displays (integral imaging screens, holography, etc.). Display: stereoscopic displays based on eye-wear; autostereoscopic displays (lenticular, barrier); integral imaging displays; volumetric displays.]

[Figure 3.3 End-to-end 3DTV system: left-eye and right-eye views from a 3D camera form a 3D home master, which is 3D-format encoded and video compressed into a 3D home package, carried over 3D video distribution channels (Blu-ray Disc, cable TV, satellite TV, terrestrial TV, IPTV, Internet), then video decompressed and 3D-format decoded by media players and set-top boxes for display on a 3DTV.]

[Figure 3.4 Distribution of interpupillary distance in adults (horizontal axis: interpupillary distance, 45–75 mm; vertical axis: percentage of viewers, 0–20%).]

A number of measures can (or better yet, must) be taken to reduce eye fatigue in 3D during content development/creation. Some are considering the creation of 3D material by converting a 2D movie to a stereoscopic product with left-eye/right-eye tracks; in some instances, non-real-time conversion of 2D to 3D may lead to (marginally) satisfactory results. It remains a fact, however, that it is not straightforward to create a stereo pair from 2D content (issues relate to object depth and to reconstruction of parts of the image that are obscured in the first eye). Nonetheless, conversion from 2D may play a role in the short term.
3.1 3D MASTERING METHODS

For the purpose of this discussion we define a mastering method as the mechanism used for representing a 3D scene in the video stream that will be compressed, stored, and/or transmitted. Mastering standards are typically used in this process. As alluded to earlier, a 3D mastering standard called "3D Master" is being defined by SMPTE. The high-resolution 3D master file is used to generate other files appropriate for the various channels; for example, theater release, media (DVD, Blu-ray Disc) release, and broadcast (e.g., satellite, terrestrial broadcast, cable TV, IPTV, and/or Internet distribution). The 3D Master comprises two uncompressed files (left- and right-eye files), each of which has the same file size as a 2D video stream. Formatting and encoding procedures have been developed to be used in conjunction with already-established techniques to deliver 3D programming to the home over a number of distribution channels. In addition to normal video encoding, 3D mastering/transmission requires additional encoding/compression, particularly when attempting to use legacy delivery channels. Additional encoding schemes for CSV include the following [6]: (i) spatial compression and (ii) temporal multiplexing.

3.1.1 Frame Mastering for Conventional Stereo Video (CSV)
CSV is the best-developed and simplest 3D video representation. This approach deals only with the (color) pixels of the video frames captured by the two cameras. The video signals are intended to be directly displayed using a 3D display system. Figure 3.5 shows an example of a stereo image pair: the same scene is visible from slightly different viewpoints. The 3D display system ensures that a viewer sees only the left view with the left eye and the right view with the right eye, to create a 3D depth impression. Compared to the other 3D video formats, the algorithms associated with CSV are the least complex. A straightforward way to utilize existing video codecs (and infrastructure) for stereo video transmission is to apply one of the interleaving approaches illustrated in Fig. 3.6. A practical challenge is that there is no de facto industry standard for signaling which kind of interleaving was used by the encoder, so a downstream decoder cannot always know how to separate the views. However, there is an industry movement toward using an over/under approach (also called top/bottom spatial compression).
Figure 3.5 A stereo image pair. (Note: Difference in left-eye/right-eye views is greatly exaggerated in this and pictures that follow for pedagogical purposes.)
[Figure 3.6 Stereo interleaving formats: (a) time multiplexed frames; (b) spatial multiplexed as side-by-side; and (c) spatial multiplexed as over/under.]
3.1.1.1 Spatial Compression. When an operator seeks to deliver 3D content over a standard video distribution infrastructure, spatial compression is a common solution. Spatial compression allows the operator to deliver a stereo 3D signal (now called frame-compatible) over a 2D HD video signal, making use of the same amount of channel bandwidth. Clearly, this entails a loss of resolution (for both the left and the right eye). The approach is to pack two images into a single frame of video; the receiving device (e.g., set-top box) will, in turn, display the content in such a manner that a 3D effect is perceived (these images cannot be viewed on a standard 2D TV monitor). There are a number of ways of combining two frames; the two most common are the side-by-side combination and the over/under combination. As can be seen in Fig. 3.6, the two images are reformatted at the compression/mastering point to fit into a standard frame. The combined frame is then compressed by standard methods and delivered to a 3D-compatible TV, where it is reformatted/rendered for 3D viewing. The question is how to take two frames, a left frame and a right frame, and reformat them to fit side-by-side or over/under in a single standard HD frame. Sampling is involved, but as noted, with some loss of resolution (50%, to be exact). One approach is to take alternate columns of pixels from each image and then pack the remaining columns in the side-by-side format. Another approach is to take alternate rows of pixels from each image and then pack the remaining rows in the over/under format (Fig. 3.7). Studies have shown that the eye is less sensitive to loss of resolution along a diagonal direction in an image than in the horizontal or vertical direction. This allows the development of encoders that optimize subjective quality by sampling each image in a diagonal direction. Other encoding schemes are also being developed in an attempt to retain as much of the perceived/real resolution as possible. One approach that has been studied for 3D is quincunx filtering. A quincunx is a geometric pattern comprised of five coplanar points, four of them forming a square (or rectangle) and a fifth point at its center, like a checkerboard. Quincunx filter banks are 2D two-channel nonseparable filter banks that have been shown to be an effective tool for image coding applications. In such applications, it is desirable for the filter banks to have perfect reconstruction, linear phase, high coding gain, good frequency selectivity, and certain vanishing-moment properties [7–12]. Almost all hardware devices for digital image acquisition and output use square pixel grids. For this reason, and for ease of computation, all current image compression algorithms (with the exception of mosaic image compression for single-sensor cameras) operate on square pixel grids. It turns out that the optimal sampling scheme in the two-dimensional image space is claimed to be the hexagonal lattice; unfortunately, a hexagonal lattice is not straightforward in terms of hardware and software implementations. A compromise, therefore, is to use the quincunx lattice; this is a sublattice of the square lattice, as illustrated in Fig. 3.7. The quincunx lattice has a diamond tessellation that is closer to the optimal hexagon tessellation than the square lattice, and it can be easily generated by down-sampling conventional digital images without any hardware change.
Because of this, the quincunx lattice is widely adopted by single-sensor digital cameras to sample the green channel; quincunx partition of an image was also recently studied as a means of multiple-description coding [13]. When using quincunx filtering, the higher-quality sampled images are encoded and packaged in a standard video frame (either in the side-by-side or the over/under arrangement). The encoded and reformatted images are compressed and distributed to the home using traditional means (cable, satellite, terrestrial broadcast, and so on).

[Figure 3.7 Selection of pixels in (a) side-by-side, (b) over/under, and (c) quincunx approaches. (Note: Either the black or the white dots can comprise the lattice.)]

3.1.1.2 Temporal Multiplexing. Temporal (time) multiplexing doubles the frame rate to 120 Hz to allow the sequential presentation of the left-eye and right-eye images within the normal 60-Hz time frame. This approach retains full resolution for each eye, but requires a doubling of the bandwidth and storage capacity. In some cases spatial compression is combined with time multiplexing; however, this is more typical of an in-home format than of a transmit/broadcast format. For example, Mitsubishi's 3D DLP TV takes quincunx-sampled (spatially compressed) images clocked at 120 Hz as input.
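To make the frame-packing operations just described concrete, the following minimal Python/NumPy sketch packs a left/right pair into a single frame-compatible frame using alternate-column (side-by-side), alternate-row (over/under), and checkerboard (quincunx) sampling. It is an illustration under our own assumptions (simple decimation with no anti-alias or diagonal filtering, and function names of our choosing), not a normative packing specification.

import numpy as np

def pack_side_by_side(left, right):
    # Keep alternate columns of each HxWx3 view and place the halves side
    # by side; the result keeps the frame size but halves horizontal
    # resolution per eye.
    return np.hstack((left[:, ::2], right[:, ::2]))

def pack_over_under(left, right):
    # Keep alternate rows of each view and stack them top/bottom.
    return np.vstack((left[::2, :], right[::2, :]))

def quincunx_mask(h, w):
    # Checkerboard (quincunx) sampling mask: True where a pixel is retained.
    yy, xx = np.mgrid[0:h, 0:w]
    return ((yy + xx) % 2) == 0

def pack_quincunx_side_by_side(left, right):
    # Quincunx-sample each view, compact each row's retained pixels into a
    # half-width image, then pack the two halves side by side.
    h, w, _ = left.shape
    mask = quincunx_mask(h, w)
    l_half = left[mask].reshape(h, w // 2, 3)
    r_half = right[mask].reshape(h, w // 2, 3)
    return np.hstack((l_half, r_half))

# Example: two synthetic 1080p views packed into one frame-compatible frame.
L = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
R = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
assert pack_side_by_side(L, R).shape == (1080, 1920, 3)
assert pack_over_under(L, R).shape == (1080, 1920, 3)
assert pack_quincunx_side_by_side(L, R).shape == (1080, 1920, 3)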
3.1.2 Compression for Conventional Stereo Video (CSV)
Typically, the compression algorithms separately encode and decode the multiple video signals, as shown in Fig. 3.8a. This is also called simulcast. The drawback is that the amount of data is increased compared to 2D video; however, reduction of resolution can be used as needed to mitigate this requirement. Table 3.1 summarizes the available methods. It turns out that the MPEG-2 standard includes an MPEG-2 Multi-View Profile (MVP) that allows efficiency to be increased by combining temporal/inter-view prediction, as illustrated in Fig. 3.8b. H.264/AVC was enhanced a few years ago with a stereo Supplemental Enhancement Information (SEI) message that can also be used to implement prediction as illustrated in Fig. 3.8b. Although not designed for stereo-view video coding, the H.264 coding tools can be arranged to take advantage of the correlations between the pair of views of a stereo-view video, and to provide very reliable and efficient compression performance as well as stereo/mono-view scalability [14]. For more than two views, the approach can be extended to Multi-view Video Coding (MVC), as illustrated in Fig. 3.9 [15]; MVC uses inter-view prediction by referring to the pictures obtained from the neighboring views. MVC has been standardized by the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and ISO/IEC MPEG. MVC enables efficient encoding of sequences captured simultaneously from multiple cameras using a single video stream. MVC is currently the most efficient approach for coding stereo and multi-view video; for two views, the performance achieved by the H.264/AVC stereo SEI message and by MVC is similar [16]. MVC is also expected to become a new MPEG video coding standard for the realization of future video applications such as 3D Video (3DV) and Free Viewpoint Video (FVV). The MVC group in the JVT has chosen the H.264/AVC-based MVC method as the MVC reference model, since this method showed better coding efficiency than H.264/AVC simulcast coding and the other methods that were submitted in response to the call for proposals made by MPEG [15, 17–20].

[Figure 3.8 Stereo video coding with combined temporal/inter-view prediction: (a) traditional MPEG-2/MPEG-4 applied to 3DTV, with the left and right views encoded independently (I/B/P picture structure per view); (b) MPEG-2 multi-view profile and H.264/AVC SEI message, with the right view additionally predicted from the left view.]

[Figure 3.9 Multi-view video coding with combined temporal/inter-view prediction: pictures for views V0 through V7 at times T0 through T11 are coded as I, P, B, or b pictures; a "V" picture set is predicted from inter-view pictures on the view axis, a "T" picture set from temporal pictures on the temporal axis, and a "V/T" picture set from view-temporal pictures on both axes.]

TABLE 3.1 Compression Methods

Simulcast coding: The separate encoding (and transmission) of the two video scenes in the Conventional Stereo Video (CSV) format. Any coding scheme, such as MPEG-4, can be used. The bitrate will typically be in the range of double that of 2DTV. (Video plus depth (V + D) is more bandwidth efficient: studies show that the depth map can typically be compressed to 10–20% of the color information.)

Stereoscopic video coding:
• ITU-T Rec. H.262/ISO/IEC 13818-2 MPEG-2 Video (Multi-View Profile)
• Transport of this data is defined in a separate MPEG Systems specification, "ISO/IEC 13818-1:2003 Carriage of Auxiliary Data"

Multi-view video coding:
• H.264/AVC can be used for each view independently
• ISO/MPEG and ITU/VCEG have recently jointly published the MVC extension of H.264/AVC (Amendment 4)

H.264/AVC stereo supplemental enhancement information (SEI): H.264/AVC was enhanced with an SEI message that can also be used to implement a prediction capability that reduces the overall bandwidth requirement. Some correlations between the pair of views of a stereo-view video can be exploited.

H.264/AVC scalable video coding: Annex G supports scalable video coding, which enables the encoding of a video stream that contains one or several subset bitstreams of lower spatial or temporal resolution (that is, a lower-quality video signal), each usable separately or in combination. A subset bitstream is typically derived by dropping packets from the larger bitstream, and can itself be decoded with a complexity and reconstruction quality comparable to that achieved using existing coders (e.g., H.264/MPEG-4 AVC) with the same quantity of data as in the subset bitstream. Using the SEI message defined in the H.264 Fidelity Range Extensions (FRExt), a decoder can easily synchronize the views, and a streaming server or a decoder can easily detect the scalability of a coded stereo video bitstream.

ISO/IEC FDIS 23002-3:2007(E): Video plus depth (V + D) has been standardized in MPEG as an extension for 3D, filed under ISO/IEC FDIS 23002-3:2007(E), "ISO/IEC 23002-3 Representation of Auxiliary Video and Supplemental Information" (also known as MPEG-C Part 3). Transport of this data is defined in a separate MPEG Systems specification, "ISO/IEC 13818-1:2003 Carriage of Auxiliary Data."

MVC (ISO/IEC 14496-10:2008 Amendment 1 and ITU-T Recommendation H.264): This standard, which supports MV + D (and also V + D) encoded representation inside the MPEG-2 transport stream, has been developed by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6). MVC allows the construction of bitstreams that represent multiple views. MVC supports efficient encoding of video sequences captured simultaneously from multiple cameras using a single video stream. MVC can be used for encoding stereoscopic (two-view) and multi-view 3DTV, and for free viewpoint TV. See Appendix B3.

MPEG, new efforts: New initiatives under way, to conclude by 2012.

ITU-R BT.1198 and ITU-R BT.1438: Existing (but limited) standards:
• Rec. ITU-R BT.1198 (1995), Stereoscopic television based on R- and L-eye two-channel signals
• Rec. ITU-R BT.1438 (2000), Subjective assessment of stereoscopic television pictures

Some new approaches are also emerging and have been proposed to improve efficiency, especially for bandwidth-limited environments. One new approach uses binocular suppression theory, which exploits disparate image quality in the left- and right-eye views. Viewer tests have shown that (within reason), if one of the images of a stereo pair is degraded, the perceived overall quality of the stereo video will be dominated by the higher-quality image [16, 21, 22]. This concept is illustrated in Fig. 3.10. Applying this concept, one could code the right-eye image with less than the full resolution of the left eye; for example, downsampling it to half or quarter resolution (Fig. 3.11). Some call this asymmetrical quality.
Studies have shown that asymmetrical coding with cross-switching at scene cuts (namely, alternating the eye that gets the blurrier image) is a viable method for bandwidth savings [23]. In principle this should provide comparable overall subjective stereo video quality while reducing the bitrate: if one were to adopt this approach, the 3D video functionality could be added at an overhead of, say, 25%–30% over the 2D video by coding the right view at quarter resolution. See Appendix B3 for some additional details.

[Figure 3.10 Use of binocular suppression theory for more efficient coding.]

[Figure 3.11 Mixed resolution stereo video coding: the full-resolution view is paired with a view downsampled to half or quarter resolution.]
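The bandwidth trade-offs discussed in this section can be summarized with some back-of-the-envelope arithmetic. The sketch below uses the rough ratios quoted in the text (simulcast about double the 2D rate, asymmetric coding roughly a 25%–30% overhead, depth at 10%–20% of the color bitrate); the 8-Mbit/s base rate is an arbitrary assumption for illustration, not a measurement.

# Illustrative bitrate comparison for the stereo delivery options above.
BASE_2D_MBPS = 8.0  # assumed bitrate of a single 2D HD stream

options = {
    "frame-compatible (side-by-side/over-under)": 1.0,   # same channel, half resolution
    "video plus depth (depth ~10-20% of color)": 1.15,
    "asymmetric coding (right view at 1/4 resolution)": 1.275,
    "simulcast (two independent views)": 2.0,
}

for name, factor in options.items():
    print(f"{name:48s} ~{BASE_2D_MBPS * factor:.1f} Mbit/s")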
3.2 MORE ADVANCED METHODS
Other methods have been discussed in the industry, known generally as 2D in conjunction with metadata (2D + M). The basic concept here is to transmit 2D images and to capture the stereoscopic data of the "other eye" image in the form of an additional package, the metadata; the metadata is transmitted as part of the video stream (Fig. 3.12). This approach is consistent with MPEG multiplexing; therefore, to a degree, it is compatible with embedded systems. The requirement to transmit the metadata increases the bandwidth needed in the channel: the added bandwidth ranges from 60% to 80%, depending on quality goals and the techniques used. As implied, a set-top box employed in a traditional 2D environment would be able to use the 2D content, ignoring the metadata, and properly display the 2D image; in a 3D environment the set-top box would be able to render the 3D signal. Some variations of this scheme have already appeared. One approach is to capture a delta file that represents the difference between the left and right images.
[Figure 3.12 2D in conjunction with metadata: the left-eye view travels as an ordinary 2D stream while the right-eye information is carried as metadata in the 3D home package, over the same distribution channels (Blu-ray Disc, cable TV, satellite TV, terrestrial TV, IPTV, Internet), to media players and set-top boxes that either ignore the metadata (2D) or use it to render 3D.]
A delta file is usually smaller than the raw file because of intrinsic redundancies. The delta file is then transmitted as metadata. Companies such as Panasonic and TDVision use this approach. The approach can also be used for stored media. For example, Panasonic has advanced, and the Blu-ray Disc Association is studying, the use of metadata to achieve a full-resolution 3D Blu-ray Disc standard; a resolution of 1920 × 1080p at 24 fps per eye is achievable. This standard would make Blu-ray Disc a high-quality 3D content (storage) system. The goal was to agree on the standard by early 2010 and have 3D Blu-ray Disc players emerge by the end-of-year shopping season of 2010. Another approach entails transmitting the 2D image in conjunction with a depth map of each scene.
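A minimal sketch of the delta idea follows, assuming simple per-pixel differencing; the actual Panasonic/TDVision schemes are proprietary and involve prediction and entropy coding, so treat the function names and method here as illustrative only.

import numpy as np

def make_delta(left, right):
    # Per-pixel signed difference between the views (int16 preserves negatives).
    return right.astype(np.int16) - left.astype(np.int16)

def reconstruct_right(left, delta):
    # The receiver rebuilds the right view from the 2D (left) view plus metadata.
    return np.clip(left.astype(np.int16) + delta, 0, 255).astype(np.uint8)

L = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
R = np.clip(L.astype(np.int16) + np.random.randint(-8, 9, L.shape), 0, 255).astype(np.uint8)
delta = make_delta(L, R)
assert np.array_equal(reconstruct_right(L, delta), R)
# Because the two views are highly correlated, the delta values are mostly
# small and compress far better than a second full view, which is why the
# metadata stream is smaller.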
3.2.1 Video Plus Depth (V + D)
As noted above, many 3DTV proposals rely on the basic concept of "stereoscopic" video, that is, the capture, transmission, and display of two separate video streams (one for the left eye and one for the right eye). More recently, specific proposals have been made for a flexible joint transmission of monoscopic color video and associated per-pixel depth information [24, 25]. The V + D representation is the next notch up in complexity. From this data representation, one or more "virtual" views of the 3D scene can be generated in real time at the receiver side by means of Depth-Image-Based Rendering (DIBR) techniques [26]. A system such as this provides important features, including backwards compatibility with today's 2D digital TV, scalability in terms of receiver complexity, and easy adaptability to a wide range of different 2D and 3D displays.

DIBR is the process of synthesizing "virtual" views of a scene from still or moving color images and associated per-pixel depth information. Conceptually, this novel view generation can be understood as a two-step process: first, the original image points are re-projected into the 3D world, utilizing the respective depth data; thereafter, these 3D space points are projected into the image plane of a "virtual" camera located at the required viewing position. The concatenation of re-projection (2D to 3D) and subsequent projection (3D to 2D) is usually called 3D image warping in the Computer Graphics (CG) literature. The signal processing and data transmission chain of this kind of 3DTV concept is illustrated in Fig. 3.13; it consists of four functional building blocks: (i) 3D content creation, (ii) 3D video coding, (iii) transmission, and (iv) "virtual" view generation and 3D display.

As can be seen in Fig. 3.14, a video signal and a per-pixel depth map are captured and eventually transmitted to the viewer. The per-pixel depth data can be considered a monochromatic luminance signal with a restricted range spanning the interval [Znear, Zfar], representing, respectively, the minimum and maximum distance of the corresponding 3D point from the camera. The depth range is quantized with 8 bits, with the closest point having the value 255 and the most distant point having the value 0. Effectively, the depth map is specified as a grayscale image; these values can be supplied to the luminance channel of a video signal, with the chrominance set to a constant value. In summary, this representation uses a regular video stream enriched with so-called depth maps providing a Z-value for each pixel. Note that V + D enjoys backward compatibility because a 2D receiver will display only the V portion of the V + D signal.

[Figure 3.13 Depth-image-based rendering (DIBR) system: recorded 3D material, 2D-to-3D converted content, and 3D recordings feed 3D content generation; the 2D color video is MPEG-2 coded and the depth information MPEG-4 coded, carried with metadata over a DVB network, and decoded either by a standard DVB decoder for standard 2DTV or by a 3DTV broadcast decoder driving single-user or multiple-user 3DTV displays.]

[Figure 3.14 Video plus depth (V + D) representation for 3D video: the color view is accompanied by a grayscale depth map quantized between Znear (value 255) and Zfar (value 0).]

[Figure 3.15 Regeneration of stereo video from V + D signals.]

Studies by the European ATTEST (Advanced Three-Dimensional Television System Technologies) project indicate that depth data can be compressed very efficiently and still be of good quality; namely, it needs only around 20% of the bitrate that would otherwise be needed to encode the color video (the qualitative results were confirmed by means of subjective testing). This approach can be placed in the category of Depth-Enhanced Stereo (DES). A stereo pair can be rendered from the V + D information by 3D warping at the decoder. A general warping algorithm takes a layer and deforms it in many ways: for example, it twists the layer along an axis, bends it around itself, or adds arbitrary dimension with a displacement map. The generation of the stereo pair from a V + D signal at the decoder is illustrated in Fig. 3.15. This reconstruction
affords extended functionality compared to CSV because the stereo image can be adjusted and customized after transmission. Note that, in principle, more than two views can be generated at the decoder, thus enabling support of multi-view displays (and, within reason, head-motion-parallax viewing). V + D enjoys backwards compatibility, compression efficiency, extended functionality, and the ability to use existing coding algorithms. It is only necessary to specify high-level syntax that allows a decoder to interpret two incoming video streams correctly as color and depth. The specifications "ISO/IEC 23002-3 Representation of Auxiliary Video and Supplemental Information" and "ISO/IEC 13818-1:2003 Carriage of Auxiliary Data" enable V + D-based 3D video to be deployed in a standardized fashion by broadcasters interested in adopting this method. It should be noted, however, that the advantages of V + D over CSV entail increased complexity for both sender and receiver. At the receiver side, view synthesis has to be performed after decoding to generate the second view of the stereo pair. At the sender (capture) side, the depth data have to be generated before encoding can take place. This is usually done by depth/disparity estimation from a captured stereo pair; these algorithms are complex and still error prone. Thus, in the near future, V + D might be more suitable for applications with playback functionality, where depth estimation can be performed offline on powerful machines, for example in a production studio or a home 3D editing suite, enabling viewing of downloaded 3D video clips and 3DTV broadcasting [16].
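As a concrete illustration of the 8-bit depth representation described in this section, the sketch below quantizes metric depth into the 0–255 gray range (255 nearest, 0 farthest) and back. A linear mapping over [Znear, Zfar] is assumed purely for clarity; deployed systems may use other mappings (e.g., inverse-depth), so this is an illustration rather than a normative formula.

import numpy as np

def depth_to_gray(z, z_near, z_far):
    # Quantize metric depth to 8 bits: z_near -> 255, z_far -> 0
    # (linear mapping assumed for illustration).
    z = np.clip(z, z_near, z_far)
    return np.round(255.0 * (z_far - z) / (z_far - z_near)).astype(np.uint8)

def gray_to_depth(v, z_near, z_far):
    # Inverse mapping applied by the receiver before DIBR view synthesis.
    return z_far - (v.astype(np.float64) / 255.0) * (z_far - z_near)

# Example: a scene spanning 1 m (near) to 10 m (far).
z = np.array([[1.0, 5.5], [10.0, 2.0]])
g = depth_to_gray(z, 1.0, 10.0)      # [[255, 128], [0, 227]]
z_hat = gray_to_depth(g, 1.0, 10.0)  # approximate depths recovered at the decoder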
3.2.2 Multi-View Video Plus Depth (MV + D)
There are some advanced 3D video applications that are not properly supported by any existing standards and for which work by the ITU-R or ISO/MPEG is needed. Two such applications are given below:
• wide-range multi-view autostereoscopic displays (say, nine or more views);
• FVV (an environment where the user can choose his/her own viewpoint).
These 3D video applications require a 3D video format that allows rendering a continuum and/or a large number of output views at the decoder. There really are no available alternatives: MVC, discussed above, does not support a continuum and becomes inefficient for a large number of views; and, as we noted, V + D could in principle generate more than two views at the decoder, but in practice it supports only a limited continuum around the original view (artifacts increase significantly with the distance of the virtual viewpoint). In response, MPEG started an activity to develop a new 3D video standard that would support these requirements (Chapter 6).

The MV + D concept is illustrated in Fig. 3.16. MV + D involves a number of complex processing steps: (i) depth has to be estimated for the N views at the capture point, and then (ii) N color and N depth video streams have to be encoded and transmitted. At the receiver, the data have to be decoded and the virtual views rendered (reconstructed).

[Figure 3.16 Multi-view video plus depth (MV + D) concept: camera views of the 3D scene undergo depth estimation at the capture point; the resulting 3D representation is 3D coded, transmitted, 3D decoded, and used for scene rendering at the decoder.]

As was implied just above, MV + D can be used to support multi-view autostereoscopic displays in a relatively efficient manner. Consider a display that supports nine views (V1–V9) simultaneously (e.g., a lenticular display manufactured by Philips; Fig. 3.17). From a specific position, a viewer can see only a stereo pair of views, depending on the viewer's position.
Transmitting nine display views directly (e.g., by using MVC) would be taxing from a bandwidth perspective; in this illustrative example, only three original views (V1, V5, and V9), along with the corresponding depth maps D1, D5, and D9, are in the decoded stream; the remaining views can be synthesized from these decoded data by using DIBR techniques.

[Figure 3.17 Multi-view autostereoscopic displays based on MV + D: from the decoded MV + D data (views V1, V5, V9 with depth maps D1, D5, D9), DIBR synthesizes the intermediate views V2–V4 and V6–V8; the lenticular display presents all nine views, and a viewer at any position (Pos1, Pos2, Pos3) sees an appropriate left/right stereo pair.]
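To make the rendering step concrete, here is a heavily simplified DIBR-style sketch (our own illustration, not the algorithm of any particular display): it synthesizes an intermediate view by shifting each pixel of one decoded reference view horizontally by a disparity derived from its 8-bit depth value, assuming rectified cameras along a horizontal baseline and an assumed maximum disparity.

import numpy as np

def synthesize_view(color, depth8, baseline_frac, max_disp=16):
    # color: HxWx3 reference view (e.g., decoded V1)
    # depth8: HxW uint8 depth map (255 = near, 0 = far; e.g., D1)
    # baseline_frac: virtual camera position between the reference view (0.0)
    #                and its neighbor (1.0)
    # max_disp: assumed disparity, in pixels, of the nearest point
    h, w, _ = color.shape
    out = np.zeros_like(color)
    disp = depth8.astype(np.float64) / 255.0 * max_disp * baseline_frac
    xs = np.arange(w)
    for y in range(h):
        tx = np.clip(np.round(xs + disp[y]).astype(int), 0, w - 1)
        out[y, tx] = color[y, xs]
    # Unwritten pixels are disocclusion holes; a real renderer resolves
    # overlapping writes by depth ordering and fills holes by inpainting.
    return out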
3.2.3 Layered Depth Video (LDV)
LDV is a derivative of, and also an alternative to, MV + D. LDV is believed to be more efficient than MV + D because less information has to be transmitted; however, additional error-prone vision processing tasks are required that operate on partially unreliable depth data. These efficiency assessments remain to be fully validated as of press time. LDV uses (i) one color video with an associated depth map and (ii) a background layer with an associated depth map; the background layer includes image content that is covered by foreground objects in the main layer. This is illustrated in Figs 3.18 and 3.19. The occlusion information is constructed by warping two or more neighboring V + D views from the MV + D representation onto a defined center view. The LDV stream or substreams can then be encoded by a suitable LDV coding profile. Note that LDV can be generated from MV + D by warping the main layer image onto the other contributing input images (e.g., an additional left and right view). By subtraction, it is then determined which parts of the other contributing input images are covered in the main layer image; these are then assigned as residual images and transmitted, while the rest is omitted [16]. Figure 3.18 is based on a recent presentation at the 3D Media Workshop, Heinrich Hertz Institut (HHI), Berlin, October 15–16, 2009 [27, 28]. LDV provides a single view with depth and occlusion information. The goal is to achieve automatic acquisition of 3DTV content, especially to obtain depth and occlusion information from video and to extrapolate a new view without error. Table 3.2, composed from technical details in Ref. [29], provides a summary of the issues associated with the various representation methods.

[Figure 3.18 Layered depth video (LDV) concept: upstream/capture performs depth estimation and occlusion generation to produce a foreground layer, a center view, and an occlusion layer; after transmission, downstream/rendering generates new views for viewing from the left or right side of the image.]

[Figure 3.19 Layered depth video (LDV) example: capture of a foreground layer with depth estimation (depth quantized between znear = 255 and zfar = 0) and occlusion generation yielding the occlusion layer.]
TABLE 3.2 Summary of Formats

Short-term

Stereoscopic 3D formats
• Suboptions:
  ◦ Simulcast (2 views transmitted, double the bandwidth)
  ◦ Spatially interleaved side-by-side
  ◦ Spatially interleaved above/under
  ◦ Time interleaved (2 views transmitted, double the bandwidth)
• Standard format for 3D cinema (a plus)
• Standard format for glasses-based consumer displays (a plus)
• No support for non-glasses-based multi-view displays (a minus)
• Allows adjustment of zero parallax (a plus)
• No scaling of depth (a minus)
  ◦ No adjustment to display size
  ◦ No personal preferences, kids mode
• No occlusion information
  ◦ No motion parallax

Longer-term

Video plus depth: one video stream with associated depth map
• Successfully demonstrated by the ATTEST project (2002–2004), MPEG-C Part 3
• Not the standard format for 3D cinema (a minus)
• Depth-image-based rendering
  ◦ Support for stereoscopic glasses-based consumer displays
  ◦ Support for non-glasses-based multi-view displays (a plus)
  ◦ Allows scaling of depth (a plus): adjustment to display size; personal preferences, kids mode
  ◦ Views must be extrapolated (a minus)
• Allows adjustment of zero parallax (a plus)
• No occlusion information (a minus)
  ◦ Reduced quality of depth-image-based rendering

Layered depth video (LDV): video plus depth enhanced with an additional occlusion layer with depth information (video with per-pixel depth map and occlusion layer with depth map)
• Not the standard format for 3D cinema (a minus)
• Depth-image-based rendering
  ◦ Support for stereoscopic glasses-based consumer displays
  ◦ Support for non-glasses-based multi-view displays (a plus)
  ◦ Allows scaling of depth (a plus)
  ◦ Views must be extrapolated (a minus)
• Allows adjustment of zero parallax (a plus)
• Provides occlusion information (a plus)
  ◦ Better quality of depth-image-based rendering

Depth-enhanced stereo (DES): 2 video streams with depth maps and an additional occlusion layer with depth information (2 videos with per-pixel depth map and occlusion layer with depth map)
• Not the standard format for 3D cinema (a minus)
• Easily usable for stereoscopic glasses-based consumer displays (a plus)
• Depth-image-based rendering
  ◦ Support for non-glasses-based multi-view displays (a plus)
  ◦ Allows scaling of depth (a plus)
  ◦ Views are interpolated or extrapolated
• Allows adjustment of zero parallax (a plus)
• Provides excellent occlusion information (a big plus)

Multiple video plus depth (MVD): 2 or more video streams with depth (interpolation of intermediate virtual views from multiple video plus depth)
• Not the standard format for 3D cinema (a minus)
• Easily usable for stereoscopic glasses-based consumer displays
• Depth-image-based rendering
  ◦ Support for non-glasses-based multi-view displays (a plus)
  ◦ Allows scaling of depth (a plus)
  ◦ Views are interpolated (a plus)
• Allows adjustment of zero parallax (a plus)
• Provides good occlusion handling due to redundant information (a plus)
3.3 SHORT-TERM APPROACH FOR SIGNAL REPRESENTATION AND COMPRESSION
In summary, stereoscopic 3D will be used in the short term. Broadcasters appear to be rallying around top/bottom spatial compression; however, trials are still ongoing. Other approaches involve some form of compression, including checkerboard (quincunx filters), side-by-side, or interleaved rows or columns [30]. Spatial compression can operate on the same channel capacity as an existing TV channel, but with a compromise in resolution. Stereoscopic 3D is the de facto standard for 3D cinema; note that this approach is directly usable for glasses-based displays, but it does not allow for scaling of depth. It is also not usable for non-glasses-based displays [29]. (Preferably, a 3D representation format should be generic for all display types, stereoscopic displays and multi-view displays alike; the long-term approaches listed above will support that goal.) For compression, one of the following four may find use in the short term: (i) ITU-T Rec. H.262/ISO/IEC 13818-2 MPEG-2 Video (MVP); (ii) H.264/AVC with SEI; (iii) H.264/AVC used for each view independently; or (iv) the MVC extension of H.264/AVC (Amendment 4).

3.4 DISPLAYS
We include this topic here just to provide the end-to-end view implied in Fig. 3.1, though we have indirectly covered it earlier along the way. 3D displays include the following:
• glasses-based displays—anaglyph;
• glasses-based displays with active shutter glasses;
• glasses-based displays with passive glasses;
• non-glasses-based displays, lenticular;
• non-glasses-based displays, barrier;
• non-glasses-based displays, two views, tracked;
• non-glasses-based displays, nine views, video + depth input (internal conversion to multi-view);
• non-glasses-based displays, 2x video + depth.
Tables 1.2 and 1.3 provided a synopsis of the technology, along with a perspective on what was commercially available at press time.

REFERENCES

1. Minoli D. 3D Television (3DTV) Technology, Systems, and Deployment—Rolling Out the Infrastructure for Next-Generation Entertainment. Taylor & Francis; 2010.
2. Ozaktas HM, Onural L, editors. Three-Dimensional Television: Capture, Transmission, Display. New York: Springer; 2008. 630 p., 316 illus. ISBN: 978-3-540-72531-2.
3. Javidi B, Okano F, editors. Three-Dimensional Television, Video and Display Technology. New York: Springer; 2002.
4. Schreer O, Kauff P, Sikora T, editors. 3D Videocommunication: Algorithms, Concepts and Real-Time Systems in Human-Centred Communication. New York: John Wiley & Sons; 2005. ISBN-13: 978-0470022719.
5. Johnston C. Will New Year of 3D Drive Lens Technology? TV Technology Online Magazine. Dec 15, 2009.
6. Chinnock C. 3D Coming Home in 2010. 3D@Home White Paper, 3D@Home Consortium. http://www.3Dathome.org. 2010.
7. Tay DBH, Kingsbury NG. Flexible design of multidimensional perfect reconstruction FIR 2-band filters using transformations of variables. IEEE Trans Image Process 1993; 2(4): 466–480.
8. Sweldens W. The lifting scheme: a custom-design construction of biorthogonal wavelets. Appl Comput Harmonic Anal 1996; 3: 186–200.
9. Gouze A, Antonini M, Barlaud M. Quincunx lifting scheme for lossy image compression. Proceedings of the IEEE International Conference on Image Processing, vol. 1; Sep 2000; Vancouver, BC, Canada. pp. 665–668.
10. Kovacevic J, Sweldens W. Wavelet families of increasing order in arbitrary dimensions. IEEE Trans Image Process 2000; 9(3): 480–496.
11. Chen Y, Adams MD, Lu WS. Design of optimal quincunx filter banks for image coding. EURASIP J Adv Signal Process 2007; 2007: Article ID 83858.
12. Liu Y, Nguyen TT, Oraintara S. Embedded image coding using quincunx directional filter bank. ISCAS 2006, IEEE. p. 4943.
13. Zhang X, Wu X, Wu F. Image coding on quincunx lattice with adaptive lifting and interpolation. Data Compression Conference (DCC '07); 2007; IEEE Computer Society, Piscataway, NJ.
14. Sun S, Lei S. Stereo-view video coding using H.264 tools. Proc SPIE Int Soc Opt Eng 2005; 5685: 177–184.
15. Hur J-H, Cho S, Lee Y-L. Illumination change compensation method for H.264/AVC-based multi-view video coding. IEEE Trans Circuits Syst Video Technol 2007; 17(11).
16. 3DPHONE. Project no. FP7-213349, ALL 3D IMAGING PHONE, 7th Framework Programme, Specific Programme "Cooperation," FP7-ICT-2007.1.5—Networked Media, D5.1—Requirements and specifications for 3D video. Aug 19, 2008.
17. Smolic A, McCutchen D. 3DAV exploration of video-based rendering technology in MPEG. IEEE Trans Circuits Syst Video Technol 2004; 14(3): 348–356.
18. Sullivan G, Wiegand T, Luthra A. Draft of Version 4 of H.264/AVC (ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4 Part 10) Advanced Video Coding). ISO/IEC JTC1/SC29/WG11 and ITU-T Q6/SG16, Doc. JVT-N050d1. 2005.
19. Mueller K, Merkle P, Smolic A, et al. Multi-view Coding Using AVC. ISO/IEC JTC1/SC29/WG11, Bangkok, Thailand, Doc. M12945. 2006.
20. ISO. Subjective Test Results for the CfP on Multi-view Video Coding. ISO/IEC JTC1/SC29/WG11, Bangkok, Thailand, Doc. N7779. 2006.
21. Stelmach L, Tam WJ. Stereoscopic image coding: effect of disparate image-quality in left- and right-eye views. Signal Process Image Commun 1998; 14: 111–117.
22. Stelmach L, Tam WJ, Meegan D, et al. Stereo image quality: effects of mixed spatio-temporal resolution. IEEE Trans Circuits Syst Video Technol 2000; 10(2): 188–193.
23. Tam WJ. Human Visual Perception Relevant to 3D-TV. Ottawa: Communications Research Centre Canada; Apr 30, 2009.
24. Fehn C, Kauff P, et al. An evolutionary and optimized approach on 3DTV. Proceedings of the International Broadcast Conference '02; 2002; Amsterdam, The Netherlands. pp. 357–365.
25. Fehn C. A 3DTV approach using depth-image-based rendering (DIBR). Proceedings of Visualization, Imaging, and Image Processing '03; 2003; Benalmadena, Spain. pp. 482–487.
26. Fehn C. Depth-Image-Based Rendering (DIBR), Compression, and Transmission for a Flexible Approach on 3DTV [PhD thesis]. Berlin, Germany: Technical University Berlin; 2006.
27. Koch R. Future 3DTV Acquisition. 3D Media Workshop, HHI Berlin. Oct 15–16, 2009.
28. Frick A, Kellner F, et al. Generation of 3DTV LDV content with time-of-flight cameras. Proceedings of 3DTV-CON 2009; May 4–6, 2009; Potsdam.
29. Tanger R. 3D4YOU, Seventh Framework Theme ICT-2007.1.5 Networked Media. Position paper submitted to ITU-R, May 6, 2009. Fraunhofer HHI, Berlin, Germany.
30. Merritt R. Incomplete 3DTV products in CES spotlight; HDMI upgrade one of latest pieces in stereo 3D puzzle. EE Times. Dec 23, 2009.
31. Starks M. Spacespex anaglyph—the only way to bring 3DTV to the masses. Online article. 2009.
32. Choi Y-W, Thao NT. Implicit coding for very low bit rate image compression. 1998 IEEE International Conference on Image Processing (ICIP 98), Proceedings, Volume 2; Oct 4–7, 1998; Chicago, IL. pp. 560–564.
33. International Organization for Standardization. ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio, "Vision on 3D Video, Video and Requirements," ISO/IEC JTC1/SC29/WG11 N10357. Lausanne, Switzerland. Feb 2009.
34. Chen Y, Wang Y-K, Ugur K, Hannuksela MM, Lainema J, Gabbouj M. The emerging MVC standard for 3D video services. EURASIP J Adv Signal Process 2009; 2009: Article ID 786015. DOI 10.1155/2009/786015.
35. Smolic A, editor. Introduction to Multi-view Video Coding. ISO/IEC JTC 1/SC 29/WG 11, Coding of Moving Pictures and Audio. Antalya, Turkey. Jan 2008.
36. Yang W, Wu F, Lu Y, et al. Scalable multi-view video coding using wavelet. IEEE Int Symp Circuits Syst 2005; 6(23–26): 6078–6081. DOI 10.1109/ISCAS.2005.1466026.
37. Min D, Kim D, Yun SU, et al. 2D/3D freeview video generation for 3DTV system. Signal Process Image Commun 2009; 24(1–2): 31–48. DOI 10.1016/j.image.2008.10.009.
38. Ozbek N, Tekalp A. Scalable multi-view video coding for interactive 3DTV. 2006 IEEE International Conference on Multimedia and Expo, Proceedings; July 9–12, 2006; Toronto, Canada. pp. 213–216. ISBN: 1-4244-0366-7.
39. Tech G, Smolic A, Brust H, et al. Optimization and comparison of coding algorithms for mobile 3DTV. White Paper, Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, Image Processing Department, Berlin; 2009.
40. Hewage CTER, Worrall S. Robust 3D video communications. IEEE ComSoc MMTC E-Letter 2009; 94(3).
APPENDIX A3: COLOR ENCODING
Color encoding (anaglyph) has been the de facto method used for 3D over the years. In fact, there are many hundreds of patents on anaglyphs, in a dozen languages, going back 150 years. The left-eye and right-eye images are color encoded to derive a single merged (overlapped) frame; at the receiving end the two views are restored (separated) using colored glasses. This approach makes use of a number of encoding processing techniques to optimize the signal in order to secure better color contrast, image depth, and overall performance (Fig. A3.1). Red/blue, red/cyan, green/magenta, or blue/yellow color coding can be used, with the first two being the most common. Orange/blue anaglyph techniques are claimed by some to provide good quality, but there is a continuum of combinations [31]. Advantages of this approach include the fact that it is frame-compatible with existing systems, can be delivered over any 2D system, provides full resolution, and uses inexpensive "glasses." However, it produces the lowest-quality 3D image compared with the other systems discussed above.
[Figure A3.1 Anaglyph method: the left-eye and right-eye views from a 3D camera are color encoded into one frame at the 3D home master, compressed, carried over standard 2D distribution channels (Blu-ray Disc, cable TV, satellite TV, terrestrial TV, IPTV, Internet), then decompressed and decoded by media players and set-top boxes for display on a conventional TV, with colored glasses separating the views.]
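As an illustration of the red/cyan variant mentioned above, the following minimal sketch (our own simplification; production anaglyph encoders apply additional color-correction processing) merges a stereo pair into one frame by taking the red channel from the left view and the green/blue (cyan) channels from the right view.

import numpy as np

def red_cyan_anaglyph(left, right):
    # left/right: HxWx3 RGB frames. Red from the left eye; green and blue
    # (together, cyan) from the right eye, merged into one standard frame.
    out = right.copy()
    out[..., 0] = left[..., 0]
    return out

# Example with placeholder frames: the result has red = 0 and
# green/blue = 255 (i.e., pure cyan).
L = np.zeros((4, 4, 3), dtype=np.uint8)
R = np.full((4, 4, 3), 255, dtype=np.uint8)
frame = red_cyan_anaglyph(L, R)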
APPENDIX B3: ADDITIONAL DETAILS ON VIDEO ENCODING STANDARDS
This appendix provides some additional details on video encoding, especially in terms of future systems. Efficient video encoding is required for 3DTV/3DV and for FVT/FVV. 3DTV/3DV supports a 3D depth impression of the observed scenery, while FVT/FVV additionally allows for an interactive selection of viewpoint and direction within a certain operating range. Hence, a common feature of 3DV and FVV systems is the use of multiple views of the same scene that are transmitted to the user. Multi-view 3D video can be encoded implicitly in the V + D representation or, as is more often the case, explicitly. In implicit coding, one seeks to use (implicit) shape coding in combination with MPEG-2/MPEG-4. Implicit shape coding means that the shape can be extracted at the decoder without explicit shape information being present in the bitstream. These types of image compression schemes do not rely on the usual additive decomposition of an input image into a set of predefined spanning functions; they encode only implicit properties of the image and reconstruct an estimate of the scene at the decoding end. This has particular advantages when one seeks very low bitrate, perceptually oriented image compression [32]. The literature on this topic is relatively scanty. Chroma Key might be useful in this context: Chroma Key, or green screen, allows one to place a subject anywhere in a scene or environment using the Chroma Key as the background. One can then import the image into digital editing software, extract the Chroma Key, and replace it with another image or video. Chroma Key shape coding for implicit shape coding (for medium-quality shape extraction) has been proposed and also demonstrated in the recent past.

On the other hand, there are a number of strategies for explicit coding of multi-view video: (i) simulcast coding, (ii) scalable simulcast coding, (iii) multi-view coding, and (iv) Scalable Multi-View Coding (SMVC). Simulcast coding is the separate encoding (and transmission) of the two video scenes in the CSV format; clearly, the bitrate will typically be about double that of 2DTV. V + D is more bandwidth efficient, not only in the abstract but also in practice. At the practical level, in a V + D environment the quality of the compressed depth map is not a significant factor in the final quality of the rendered stereoscopic 3D video. This follows from the fact that the depth map is not directly viewed, but is employed to warp the 2D color image to two stereoscopic views. Studies show that the depth map can typically be compressed to 10%–20% of the color information. V + D (also called 2D plus depth, 2D + depth, or color plus depth) has been standardized in MPEG as an extension for 3D, filed under ISO/IEC FDIS 23002-3:2007(E). In 2007, MPEG specified a container format, "ISO/IEC 23002-3 Representation of Auxiliary Video and Supplemental Information" (also known as MPEG-C Part 3), that can be utilized for V + D data. 2D + depth, as specified by ISO/IEC 23002-3, supports the inclusion of depth for generation of an increased number of views. While it has the advantage of being backward compatible with legacy devices and is agnostic of coding formats, it is capable of rendering only a limited depth range, since it does not directly handle occlusions [33]. Transport of this data is defined in a separate MPEG systems specification, "ISO/IEC 13818-1:2003 Carriage of Auxiliary Data." There is also major interest in MV + D. Applicable coding schemes of interest here include the following:
• Multi-view Video Coding (MVC)
• Scalable Video Coding (SVC)
• Scalable Multi-view Video Coding (SMVC)
From a test/test-bed implementation perspective, for the first two options each view can be independently coded using the public-domain H.264 and SVC codecs, respectively. Test implementations for MVC, and preliminary implementations of an SMVC codec, have been documented recently in the literature.

B3.1 Multiple-View Video Coding (MVC)
It has been recognized that MVC is a key technology for a wide variety of future applications, including FVV/FTV, 3DTV, immersive teleconferencing, surveillance, and other applications. An MPEG standard, "Multi-View Video Coding (MVC)," to support MV + D (and also V + D) encoded representation inside the MPEG-2 transport stream, has been developed by the JVT of ISO/IEC MPEG and ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6). MVC allows the construction of bitstreams that represent multiple views [34]; MVC supports efficient encoding of video sequences captured simultaneously from multiple cameras using a single video stream. MVC can be used for encoding stereoscopic (two-view) and multi-view 3DTV, and for FVV/FVT. MVC (ISO/IEC 14496-10:2008 Amendment 1 and ITU-T Recommendation H.264) is an extension of the AVC standard that provides efficient coding of multi-view video. The encoder receives N temporally synchronized video streams and generates one bitstream; the decoder receives the bitstream, decodes it, and outputs the N video signals. Multi-view video contains a large amount of inter-view statistical dependency, since all cameras capture the same scene from different viewpoints. Therefore, combined temporal and inter-view prediction is the key to efficient MVC, and pictures from neighboring cameras can be used for efficient prediction [35]. MVC supports the direct coding of multiple views and exploits inter-camera redundancy to reduce the bitrate. Although MVC is more efficient than simulcast, the rate of MVC-encoded video is proportional to the number of views. The MVC group in the JVT has chosen the H.264/MPEG-4 AVC-based multi-view video method as its MVC reference model, since this method supports better coding efficiency than H.264/AVC simulcast coding. H.264/MPEG-4 AVC was developed jointly by ITU-T and ISO through the JVT in the early 2000s (the ITU-T H.264 standard and ISO/IEC MPEG-4 AVC, ISO/IEC 14496-10 (MPEG-4 Part 10), are jointly maintained to retain identical technical content). H.264 is used with Blu-ray Disc and videos from the iTunes Store. The standardization of H.264/AVC was completed in 2003, but additional extensions have taken place since then; for example, SVC, as specified in Annex G of H.264/AVC, was added in 2007.

Owing to the increased data volume of multi-view video, highly efficient compression is needed. In addition to the redundancy exploited in 2D video compression, the common idea for MVC is to further exploit the redundancy between adjacent views. This is because multi-view video is captured by multiple cameras at different positions, and significant correlations exist between neighboring views [36]. As hinted elsewhere, there is interest in being able to synthesize novel views from virtual cameras in multi-view camera configurations; however, the occlusion problem can significantly affect the quality of virtual view rendering [37]. Also, for FVV, depth map quality is important because the depth map is used to render virtual views that are further apart than in the stereoscopic case: when the views are further apart, distortion in the depth map has a greater effect on the final rendered quality; this implies that the data rate of the depth map has to be higher than in the CSV case.

Note: Most existing MVC techniques are based on traditional hybrid DCT-based video coding schemes. These neither fully exploit the redundancy among different views nor provide an easy path to scalability. In addition, all the existing MVC schemes mentioned above use DCT-based coding. A fundamental problem for DCT-based block coding is that it is not convenient for achieving scalability, which has become an increasingly important feature for video coding and communications. As a research topic, wavelet-based image and video coding has been shown to be a good way to achieve both good coding performance and full scalability, including spatial, temporal, and Signal-to-Noise Ratio (SNR) scalability. In the past, MVC has been included in several video coding standards, such as MPEG-2 MVP and MPEG-4 MAC (Multiple Auxiliary Component). More recently, an H.264-based MVC scheme has been developed that utilizes the multiple-reference structure in H.264. Although this method does exploit the correlations between adjacent views through inter-view prediction, it has some constraints for practical applications compared to a method that uses, say, wavelets [36].
As just noted, MPEG has developed a suite of international standards to support 3D services and devices. In 2009, MPEG initiated a new phase of standardization, to be completed by 2011. MPEG's vision is a new 3DV format that goes beyond the capabilities of existing standards to enable both advanced stereoscopic display processing and improved support for autostereoscopic N-view displays, while enabling interoperable 3D services. 3DV aims to improve the rendering capability of the 2D + depth format while reducing bitrate requirements relative to simulcast and MVC. Figure B3.1 illustrates ISO MPEG's target 3DV format, with limited camera inputs and constrained-rate transmission according to the distribution environment.

[Figure B3.1 Target of 3D video format for ongoing MPEG standardization initiatives: a data format produced from limited camera inputs (left/right) is transmitted at a constrained rate (based on distribution) and drives both stereoscopic displays (variable stereo baseline, adjustable depth perception) and autostereoscopic N-view displays (wide viewing angle, large number of output views).]

The 3DV data format aims to be capable of rendering a large number of output views for autostereoscopic N-view displays and to support advanced stereoscopic processing. Owing to limitations in the production environment, the 3DV data format is assumed to be based on limited camera inputs; stereo content is most likely, but more views might also be available. In order to support a wide range of autostereoscopic displays, it should be possible for a large number of views to be generated from this data format. Additionally, the rate required for transmitting the 3DV format should be fixed by the distribution constraints; that is, there should not be an increase in the rate simply because the display requires a higher number of views to cover a larger viewing angle. In this way, the transmission rate and the number of output views are decoupled. Advanced stereoscopic processing that requires view generation at the display would also be supported by this format [33]. Compared to the existing coding formats, the 3DV format has several advantages in terms of bitrate and 3D rendering capabilities; this is also illustrated in Fig. B3.2 [33].
• 2D + depth, as specified by ISO/IEC 23002-3, is only capable of rendering a limited depth range, since it does not directly handle occlusions. The 3DV format is expected to enhance the 3D rendering capabilities beyond this format.
• MVC is more efficient than simulcast, but the rate of MVC-encoded video is proportional to the number of views. The 3DV format is expected to significantly reduce the bitrate needed to generate the required views at the receiver.
[Figure B3.2 Illustration of 3D rendering capability versus bitrate for different formats: simulcast and MVC sit at high bitrate, while 2D and 2D + depth offer limited rendering capability; 3DV targets high rendering capability at low bitrate, and should be compatible with existing standards, mono and stereo devices, and existing or planned infrastructure.]

B3.2 Scalable Video Coding (SVC)
The concept of the SVC scheme is to encode a video stream that contains one or several subset bitstreams of lower spatial or temporal resolution, or of lower quality, usable separately or in combination, compared to the bitstream from which they are derived. A subset bitstream is typically obtained by dropping packets from the larger bitstream, and it can itself be decoded with a complexity and reconstruction quality comparable to that achieved by using an existing coder (e.g., H.264/MPEG-4 AVC) with the same quantity of data as in the subset bitstream. A standard for SVC was developed by the ISO MPEG group and completed in 2008. The SVC project was undertaken under the auspices of the JVT of the ISO/IEC MPEG and the ITU-T VCEG. In January 2005, MPEG and VCEG agreed to develop a standard for SVC as an amendment of the H.264/MPEG-4 AVC standard. It is now an extension, Annex G, of the H.264/MPEG-4 AVC video compression standard. A subset bitstream may encompass a lower temporal or spatial resolution (or possibly a lower quality video signal, say, from a camera of lower quality) as compared to the bitstream it is derived from.

• Temporal (Frame Rate) Scalability: the motion compensation dependencies are structured so that complete pictures (specifically, the packets associated with these pictures) can be dropped from the bitstream. (Temporal scalability is already available in H.264/MPEG-4 AVC; SVC provides supplemental information to improve its usage.)
• Spatial (Picture Size) Scalability: video is coded at multiple spatial resolutions. The data and decoded samples of lower resolutions can be used to predict data or samples of higher resolutions in order to reduce the bit rate required to code the higher resolutions.
• Quality Scalability: video is coded at a single spatial resolution but at different qualities. In this case the data and samples of lower qualities can be utilized to predict data or samples of higher qualities; this is done in order to reduce the bit rate required to code the higher qualities.

Products supporting the standard (e.g., for video conferencing) started to appear in 2008.

B3.2.1 Scalable Multi-View Video Coding (SMVC). Although there are many approaches published on SVC and MVC, there is no current work reported on scalable multi-view video coding (SMVC). SMVC can be used for transport of multi-view video over IP for interactive 3DTV by dynamic adaptive combination of temporal, spatial, and SNR scalability according to network conditions [38]; a sketch of such an operating-point selection appears below.
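As a toy illustration of this kind of adaptation, the following sketch picks the highest-fidelity combination of temporal, spatial, and quality layers that fits the currently measured network rate. The operating points and bit rates are illustrative assumptions, not values drawn from the SVC or SMVC specifications.

```python
# Minimal sketch of scalability-aware rate adaptation; the operating
# points and bit rates below are assumed for illustration only.

# (frames per second, resolution, quality layer, required rate in kbps)
OPERATING_POINTS = [
    (30, "1280x720", "high",   4500),
    (30, "1280x720", "medium", 3000),
    (30, "640x360",  "medium", 1200),
    (15, "640x360",  "medium",  700),
    (15, "320x180",  "low",     250),
]

def select_operating_point(available_kbps: float):
    """Return the best (fps, resolution, quality) that fits the channel."""
    for fps, res, quality, kbps in OPERATING_POINTS:  # ordered best-first
        if kbps <= available_kbps:
            return fps, res, quality
    return None  # channel too poor even for the base layer

print(select_operating_point(1500))  # -> (30, '640x360', 'medium')
```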
B3.3 Conclusion
Table B3.1, based on Ref. [39], indicates how the "better-known" compression algorithms can be applied and what some of the trade-offs in quality are (this study was done in the context of mobile delivery of 3DTV, but the concepts apply more generally). In this study, four methods for transmission and compression/coding of stereo video content were analyzed. Subjective ratings show that the mixed resolution approach and the video plus depth approach do not impair video quality at high bit rates; at low bit rates, simulcast transmission is outperformed by the other methods. Objective quality metrics, utilizing the blurred or rendered view from uncompressed data as reference, can be used for optimization of single methods (they cannot be used for comparison of methods since they have a positive or negative bias). Further research on individual methods will include combinations such as inter-view prediction for mixed resolution coding and depth representation at reduced resolution. In conclusion, the V + D format is considered by researchers to be a good candidate to represent stereoscopic video and is suitable for most of the 3D displays currently available; MV + D (and the MVC standard) can be used for holographic displays and for FVV, where the user, as noted, can interactively select his or her viewpoint and where the view is then synthesized from the closest spatially located captured views [40]. However, for the initial deployment one will likely see (in order of likelihood):

• spatial compression in conjunction with MPEG-4/AVC;
• H.264/AVC stereo SEI message;
• MVC, which is an H.264/MPEG-4 AVC extension.
TABLE B3.1 Application of Compression Algorithms

H.264/AVC simulcast: The left and right views are transmitted independently, each coded using H.264/MPEG-4 AVC. Hence, this method does not need any pre- or postprocessing before coding and after decoding, and the complexity on the sender and receiver sides is low. Redundancy between channels is not reduced, so coding efficiency is not optimized. Note: Nonhierarchical B pictures can be used with a Group of Pictures (GOP) structure of IBBP (hierarchical B pictures significantly increase coding efficiency; however, they require increased complexity in the decoder and the encoder, which limits application in mobile devices).

H.264/AVC stereo SEI message: H.264/MPEG-4 AVC enables inter-view prediction through the stereo SEI syntax. Practically speaking, it is based on interlacing the left and the right view prior to coding and exploiting interlaced coding mechanisms. It has been shown that the principle and efficiency of this approach are very similar to MVC, which is an H.264/MPEG-4 AVC extension to code two or more related video signals. Note: Nonhierarchical B pictures can be used with a GOP structure of IBBP (hierarchical B pictures significantly increase coding efficiency; however, they require increased complexity in the decoder and the encoder, which limits application in mobile devices).

Mixed resolution coding: Binocular suppression theory states that perceived image quality is dominated by the view with the higher spatial resolution. The mixed resolution approach exploits this attribute of human perception by decimating one view before transmission and up-scaling it at the receiver side. This enables a trade-off between spatial subsampling and amplitude quantization. For example, the right view can be reduced by a factor of about two in the horizontal and vertical directions; one can also alternate such reduction between the two eyes when there is a scene cut. (A sketch of the decimation/up-scaling step follows the table.)

Video plus depth: MPEG-C Part 3 defines a video plus depth representation of the stereo video content. Depth is generated at the sender side, for instance by estimation from an original left and right view. One view is transmitted simultaneously with the depth signal. At the receiver, the other view is synthesized by depth-image-based rendering. Compared to video, a depth signal can, in most cases, be coded at a fraction of the bit rate with sufficient quality for view synthesis. Errors in depth estimation and problems with disocclusions introduce artifacts to the rendered view.
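The decimation and receiver-side up-scaling used by the mixed resolution method can be pictured in a few lines of NumPy; this is only an illustrative sketch (the factor, filter choice, and array layout are assumptions, not details taken from the study in Ref. [39]).

```python
import numpy as np

def decimate(view: np.ndarray, factor: int = 2) -> np.ndarray:
    # Keep every factor-th sample in each direction before transmission,
    # cutting the pixel count (and, roughly, the rate) by factor**2.
    return view[::factor, ::factor]

def upscale(view: np.ndarray, factor: int = 2) -> np.ndarray:
    # Nearest-neighbor up-scaling at the receiver; a deployed decoder
    # would use a proper interpolation filter instead.
    return view.repeat(factor, axis=0).repeat(factor, axis=1)

right_view = np.random.randint(0, 256, (720, 1280), dtype=np.uint8)
transmitted = decimate(right_view)    # 360 x 640, sent at reduced rate
reconstructed = upscale(transmitted)  # back to 720 x 1280, but blurrier
```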
CHAPTER 4
3DTV/3DV Transmission Approaches and Satellite Delivery
This chapter addresses some key concepts related to the transmission of 3DTV video in the near term. It is not intended to be a research monograph of open technical issues, but rather discusses in some generality some of the approaches being considered to distribute content to end users. If 3DTV as a commercial service is to become a complete reality in the next 2–5 years, it will certainly use some or all of the technologies discussed in this chapter; these systems can all be deployed at this time and will be needed at the practical level to make this nascent service a reality. We start with a generic discussion of transmission approaches and then look at DVB-based satellite approaches.
4.1 OVERVIEW OF BASIC TRANSPORT APPROACHES
It is to be expected that 3DTV for home use will likely first see penetration via stored media delivery (e.g., Blu-ray Disc). The broadcast commercial delivery of 3DTV (whether over satellite/DTH, over the air, over cable, or via IPTV) may take a few years because of the relatively large-scale infrastructure that has to be put in place by the service providers and the limited availability of 3D-ready TV sets in the home (implying a small subscriber, and so small revenue, base). Delivery of downloadable 3DTV files over the Internet may occur at any point in the immediate future, but the provision of a broadcast-quality service over the Internet is not likely in the foreseeable future. There are a number of alternative transport architectures for 3DTV signals, also depending on the underlying media. The service can be supported by traditional broadcast structures including the DVB architecture, wireless 3G/4G transmission such as DVB-H approaches, Internet Protocol (IP) in support of an IPTV-based service (in which case it also makes sense to consider IPv6), and the IP architecture for Internet-based delivery (both non-real time and streaming).
The specific approach used by each of these transport methods will also depend on the video-capture approach, as depicted in Table 4.1. Initially, conventional stereo video (with temporal multiplexing or spatial compression) will be used by all commercial 3DTV service providers; later in the decade other methods may be used. Note also in this context that the United States has a well-developed cable infrastructure in all Tier 1 and Tier 2 metropolitan and suburban areas; in Europe and Asia this is less so, with more DTH delivery (in the United States DTH tends to serve more exurban and rural areas). A 3DTV rollout must take these differences into account and/or accommodate both. Note that the V + D data representation can be utilized to build 3DTV transport evolutionarily on the existing DVB infrastructure; the in-home 3D images are reconstructed at the receiver side by using DIBR. MPEG has established a standardization activity that focuses on 3DTV using the V + D representation. There are generally two potential approaches for transport of 3DTV signals: (i) connection-oriented (time/frequency division multiplexing) over the existing DVB infrastructure over traditional channels (e.g., satellite, cable, over-the-air broadcast, DVB-H/cellular), and (ii) connectionless/packet using IP (e.g., a "private/dedicated" IPTV network, Internet streaming, or Internet on-demand servers/P2P, i.e., peer-to-peer). The references listed here, among others, describe various methods for traditional video over packet/ATM (Asynchronous Transfer Mode)/IPTV/satellite/Internet [1–5]; many of these approaches and techniques can be extended/adapted for use in 3DTV. Figures 4.1–4.7 depict graphically system-level views of the possible delivery mechanisms. We use the term "complexity" in these figures to remind the reader that it will not be trivial to deploy these networks on a broad national basis. A challenge in the deployment of multi-view video services, including 3D and free-viewpoint TV, is the relatively large bandwidth requirement associated with transport of multiple video streams. Two-stream signals (CSV, V + D, and LDV) are doable: the delivery of a single stream of 3D video in the range of 20 Mbps is not outside the technical realm of most providers these days, but to deliver a large number of channels in an unswitched mode (requiring, say, 2 Gbps access to a domicile) will require FTTH capabilities. It is not possible to deliver that content over an existing copper plant of the xDSL (Digital Subscriber Line) variety unless a provider could deploy ADSL2+ (Asymmetric Digital Subscriber Line); but why upgrade a plant to a new copper technology such as this one when the provider could actually deploy fiber? (ADSL2+ may, however, be used in Multiple Dwelling Units as a riser for an FTTH plant.) A way to deal with this is to provide user-selected multicast capabilities where a user can select an appropriate content channel using IGMP (Internet Group Management Protocol); a minimal sketch of such a channel selection appears below. Even then, a household may have multiple TVs (say, three or four) switched on simultaneously (and maybe even an active Digital Video Recorder or DVR), thus requiring bandwidth in the 15–60 Mbps range. MV + D, where one wants to carry three or even more intrinsic (raw) views, becomes much more challenging and problematic for practical commercial applications. We cover ADSL2+ issues in Chapter 5.
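As a minimal sketch of that IGMP-driven channel selection (the multicast group and port below are hypothetical placeholders, not addresses from any deployed service), a set-top client joins the group carrying the requested view and starts pulling the UDP datagrams:

```python
import socket
import struct

MCAST_GRP = "239.1.1.1"   # hypothetical group for one 3DTV view stream
MCAST_PORT = 5004

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", MCAST_PORT))

# Joining the group causes the host to emit an IGMP membership report;
# the access network then forwards the selected channel to this subscriber.
mreq = struct.pack("4s4s", socket.inet_aton(MCAST_GRP),
                   socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

data, src = sock.recvfrom(2048)  # e.g., RTP/UDP datagrams carrying TS packets
```

Changing channels amounts to leaving one group (IP_DROP_MEMBERSHIP) and joining another, so only the streams actually being watched traverse the access link.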
TABLE 4.1 Video Capture and Transmission Possibilities

Format                                DTH with DVB  Terrestrial DVB  3G/4G + DVB-H  IPTV (IPv4 or IPv6)  Internet Real-Time Streaming  Internet Non-Real Time  Cable TV
Conventional stereo video (CSV)       Fine          Fine             Limited        Fine                 Fine                          Fine                    Fine
Video plus depth (V + D)              Good          Good             Fine           Good                 Good                          Good                    Good
Multi-view video plus depth (MV + D)  Good          Good             Fine           Good                 Fine                          Good                    Good
Layered depth video (LDV)             Best          Best             Fine           Best                 Fine                          Good                    Best

Fine, doable; good, better approach; best, best approach.
[Figure 4.1 (Simplicity of) initial enjoyment of 3DTV in the home: hundreds of content channels in 2D (SD, HD), CSV, V + meta, V + D, and MV + D formats flow from 2D, stereo, depth, and multicamera capture through encoding and 3D content production to 2D, 3D, and M-view 3D displays over HDMI 1.4 (legend: video, depth, metadata).]
[Figure 4.2 Complexity of a commercial-grade 3DTV delivery environment using IPTV: m content providers and aggregators feed 200–300 content channels over a service provider-managed IP network (DVB transmission); each of millions of viewers IGMP-joins the view (and depth) streams required by its 2D, CSV, V + D, or MV + D (with DIBR) decoder and display.]
Off-the-air broadcast could be accomplished with some compromise by using the entire HDTV bandwidth for a single 3DTV channel; here, multiple TVs in a household could be tuned to different programs. However, a traditional cable TV plant would find it a challenge to deliver a (large) package of 3DTV channels, but it could deliver a subset of its total selection in 3DTV (say 10 or 20 channels) by sacrificing bandwidth on the cable that could otherwise carry distinct channels. The same is true for DTH applications. For IP, a service provider-engineered network could be used. Here, the provider can control the latency, jitter, effective source–sink bandwidth, packet loss, and other service parameters.
[Figure 4.3 Complexity of a commercial-grade 3DTV delivery environment using the cable TV infrastructure: all 200–300 content channels are carried on the cable TV network(s), with each of millions of viewers selecting the view (and depth) streams required by its 2D, CSV, V + D, or MV + D (with DIBR) decoder and display.]
[Figure 4.4 Complexity of a commercial-grade 3DTV delivery environment using a satellite DTH infrastructure: all content channels are broadcast over DTH, with each of millions of receivers selecting the streams needed by its 2D, CSV, V + D, or MV + D (with DIBR) decoder and display.]
[Figure 4.5 Complexity of a commercial-grade 3DTV delivery environment using over-the-air infrastructure: all channels are broadcast, with each viewer selecting the view (and depth) streams required by the in-home decoder and display.]
[Figure 4.6 Complexity of a commercial-grade 3DTV delivery environment using the Internet: servers employ server-driven rate scaling (P2P), client-driven (selective), and client-driven (multicast) delivery across the Internet, with each viewer selecting the streams needed by the local decoder and display.]
However, if the approach is to use the Internet, performance issues will be a major consideration, at least for real-time services. A number of multi-view encoding and streaming strategies using RTP (Real-Time Transport Protocol)/UDP (User Datagram Protocol)/IP or RTP/DCCP (Datagram Congestion Control Protocol)/IP exist for this approach. Video streaming architectures can be classified as (i) server to single client unicast, (ii) server multicasting to several clients, (iii) P2P unicast distribution, where each peer forwards packets to another peer, and (iv) P2P multicasting, where each peer forwards packets to several other peers. Multicasting protocols can be supported at the network layer or the application layer [6]. Figure 4.8 provides a view of the framework and system for 3DTV streaming transport over IP; a sketch of the RTP/UDP packetization commonly used in such streaming follows the figures.

[Figure 4.7 Complexity of a commercial-grade 3DTV delivery environment using a DVB-H (or proprietary) infrastructure: content in 2D, CSV, V + meta, V + D, and MV + D formats is aggregated and delivered over DVB-H (or proprietary) wireless channels to millions of handheld viewers, each selecting the streams needed by its decoder and display.]

[Figure 4.8 Block diagram of the framework and system for 3DTV streaming transport over IP: encoders (MVC JMVM, scalable MVC, MDC) at the content providers feed servers that use server-driven rate scaling (P2P), client-driven (selective), and client-driven (multicast) delivery across the Internet infrastructure to wireline and wireless IP clients (viewers).]
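For concreteness, the following sketch builds the fixed 12-byte RTP header of RFC 3550 and wraps seven 188-byte TS packets per datagram, a common packing for TS-over-RTP (payload type 33 is the static assignment for MPEG-2 transport streams in RFC 3551); the SSRC and destination address are illustrative assumptions.

```python
import socket
import struct

RTP_VERSION = 2
PT_MP2T = 33  # static RTP payload type for MPEG-2 transport streams

def rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int) -> bytes:
    # Fixed 12-byte RTP header: V=2, P=0, X=0, CC=0, M=0.
    header = struct.pack("!BBHII",
                         RTP_VERSION << 6,      # version in the top two bits
                         PT_MP2T,               # marker bit 0, payload type 33
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc)
    return header + payload

# Seven 188-byte TS packets (1316 bytes) fit comfortably in one Ethernet frame.
ts_chunk = b"\x47" + bytes(187)   # one placeholder TS packet for illustration
payload = ts_chunk * 7

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
pkt = rtp_packet(payload, seq=0, timestamp=0, ssrc=0x3D3D3D3D)
sock.sendto(pkt, ("127.0.0.1", 5004))  # placeholder destination
```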
Yet, there is a lot of current academic research and interest in connectionless delivery of 3DTV content over shared packet networks. 3D video content needs to be protected when transmitted over unreliable communication channels. The effects of transmission errors on the perceived quality of 3D video will be no less significant than those for the equivalent 2D video applications, because the errors will influence several perceptual attributes (e.g., naturalness, presence, depth perception, eye strain, and viewing experience) associated with 3D viewing [7]. It has long been known that IP-based transport can accommodate a wide range of applications. Transport and delivery of video in various forms goes back to the early days of the Internet. However, (i) the delivery of quality (jitter- and loss-free) content, particularly HD or even 3D; (ii) the delivery of content in a secure, money-making, subscription-based manner; and (iii) the delivery of streaming real-time services for thousands of channels (worldwide) and millions of simultaneous customers remain a long shot at this juncture. Obviously, at the academic level, transmission of video over the Internet (whether 2D or 3D) is currently an active research and development area where significant results have already been achieved. Some video-on-demand services that make use of the Internet, both for news and entertainment applications, have emerged, but desiderata (i), (ii), and (iii) have not been met. Naturally, it is critical to distinguish between the use of the IP protocol (IPv4 or IPv6) and the use of the Internet (which is based on IP) as a delivery mechanism (a delivery channel). IPTV services delivered over a restricted IP infrastructure appear to be more tenable in the short term, both in terms of Quality of Service (QoS) and Quality of Experience (QoE). Advocates now advance the concept of 3D IPTV. The transport of 3DTV signals over IP packet networks appears to be a natural extension of video over IP applications, but the IPTV model (rather than the Internet model) seems more appropriate at this time. The consuming public will not be willing, we argue based on experience, to purchase new (fairly expensive) TV displays for 3D if the quality of the service is not there. The technology has to "disappear into the background" and not be smack in the foreground if the QoE is to be reasonable. To make a comparison with Voice over Internet Protocol (VoIP), it should be noted that while voice over the Internet is certainly doable end to end, specialized commercial VoIP providers tend to use the Internet mostly for access (except for international calling to secondary geographic locations); most top-line (traditional) carriers run IP over their own internally designed, internally engineered, and internally provisioned networks for core transport [8–21]. Some of the research issues associated with IP delivery in general, and IP/Internet streaming in particular, include, but are not limited to, the following [6]:

1. Determination of the best video encoding configuration for each streaming strategy: multi-view video encoding methods provide some compression efficiency gain at the expense of creating dependencies between views that hinder random access to views.
2. Determination of the best rate adaptation method: adaptation refers to adaptation of the rate of each view, as well as inter-view rate allocation, depending on the available network rate and video content, and adaptation of the number and quality of views transmitted depending on the available network rate, the user's display technology, and the desired viewpoint.

3. Packet-loss-resilient video encoding and streaming strategies, as well as better error concealment methods at the receiver; some ongoing industry research includes the following [7].
• Some research related to Robust Source Coding is of interest. In a connectionless network, packets can get lost; the network is lossy. A number of standard source coding approaches are available to provide robust source coding for 2D video to deal with this issue, and many of these can be used for 3D V + D applications (features such as slice coding, redundant pictures, Flexible Macroblock Ordering or FMO, Intra-refresh, and Multiple Description Coding or MDC are useful in this context). Loss-aware rate-distortion optimization is often used for 2D video to optimize the application of robust source coding techniques. However, the models used have not been validated for use with 3D video in general and FVV in particular.
• Some research related to Cross-Layer Error Robustness is also of interest for transport of V + D signals over a connectionless network. In recent years, attention has focused on cross-layer optimization of 2D video quality. This has resulted in algorithms that have optimized channel coding and prioritized the video data. Similar work is needed to uncover/assess appropriate methods to transport 3D video across networks.
• Other research work pertains to Error Concealment that might be needed when transporting V + D signals over a connectionless network. Most 2D error concealment algorithms can be used for 3D video. However, there is additional information that can be used in 3D video to enhance the concealed quality: for example, information such as motion vectors can be shared between the color and depth video; if color information is lost, then depth motion vectors can be used to carry out concealment. There are other opportunities with MVC, where adjacent views can be used to conceal a view that is lost.

4. Best peer-to-peer multicasting design methods are required, including topology discovery, topology maintenance, forwarding techniques, exploitation of path diversity, methods for enticing peers to send data and to stay connected, and use of dedicated nodes as relays.

Some have argued that stereo streaming allows for flexibility in congestion control methods, such as video rate adaptation to the available network rate, methods for packet loss handling, and postprocessing for error concealment, but it is unclear how a commercial service with paying customers (possibly paying a premium for the 3DTV service) would indeed be able to accept any degradation in quality. Developing laboratory test-bed servers that unicast content to multiple clients with stereoscopic displays should be easily achievable, as should be the case for other comparable arrangements. Translating those test beds into scalable, reliable (99.999% availability), cost-effective, commercial-service-supporting infrastructures is altogether another matter.

In summary, real-time delivery of 3DTV content can make use of satellite, cable, broadcast, IP, IPTV, Internet, and wireless technologies. Any unique requirements of 3DTV need to be taken into account. The requirements are very similar to those needed for delivery of entertainment-quality video (e.g., with reference to latency, jitter, and packet loss), but with the observation that a number (if not most) of the encoding techniques require more bandwidth. The incremental bandwidth is as follows: (i) from 20% to 100% more for stereoscopic viewing compared with 2D viewing (as noted, spatial compression uses the same channel bandwidth as a traditional TV signal, but at the cost of resolution); (ii) from 50% to 200% more for multi-view systems compared with 2D viewing; and (iii) a lot more bandwidth for holoscopic/holographic designs (presently not even being considered for near-term commercial 3DTV service); the short calculation below makes these increments concrete. We mentioned implicit coding earlier: that would indeed provide more efficiency, but, as noted, most video systems in use today (or anticipated to be available in the near future) use explicit coding. Synthetic video generation based on CGI techniques needs less bandwidth than actual video. There can also be content with Mixed Reality (MR)/Augmented Reality (AR) that mixes graphics with real images, such as content that uses depth information together with image data for 3D scene generation. These systems may also require less bandwidth than actual full video. Stereoscopic video (CSV) may be used as a reference point; holoscopic/holographic systems require the most. It should also be noted that while graphic techniques and/or implicit coding may require much less transmission bandwidth, the tolerance to information loss (packet loss) is typically very low. We conclude this section by noting, again, that while connectionless packet networks offer many research opportunities as related to supporting 3DTV, we believe that a commercial 3DTV service will more likely occur in a connection-oriented (e.g., DTH, cable TV) environment and/or a controlled-environment IPTV setting.
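To make the increments concrete, here is a back-of-the-envelope calculation; the 8 Mbps 2D baseline is an assumed figure for illustration, not one from this chapter.

```python
# Illustrative bandwidth arithmetic for the increments cited above;
# the 2D baseline of 8 Mbps is an assumption.
base_2d_mbps = 8.0

stereo_range    = (base_2d_mbps * 1.2, base_2d_mbps * 2.0)  # +20% to +100%
multiview_range = (base_2d_mbps * 1.5, base_2d_mbps * 3.0)  # +50% to +200%

print(f"stereoscopic: {stereo_range[0]:.1f}-{stereo_range[1]:.1f} Mbps")
print(f"multi-view:   {multiview_range[0]:.1f}-{multiview_range[1]:.1f} Mbps")

# Four simultaneous stereoscopic sets in one home at the high end:
print(f"per home, 4 sets: up to {4 * stereo_range[1]:.0f} Mbps")
```

With these assumptions a single home lands in the tens of megabits per second, consistent with the 15–60 Mbps range cited earlier in this chapter.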
4.2 DVB
DVB is a consortium of over 300 companies in the fields of broadcasting and manufacturing that work cooperatively to establish common international standards for digital broadcasting. DVB-generated standards have become the leading international standards, commonly referred to as "DVB," and the accepted choice for technologies that enable efficient, cost-effective, high-quality, and interoperable digital broadcasting. The DVB standards for digital television have been adopted in the United Kingdom, across mainland Europe, in the Middle East, South America, and in Australasia. DVB standards are used for DTH satellite transmission [22] (and also for terrestrial and cable transmission). The DVB standards are published by a Joint Technical Committee (JTC) of the European Telecommunications Standards Institute (ETSI), the European Committee for Electrotechnical Standardization (Comité Européen de Normalisation Electrotechnique, CENELEC), and the European Broadcasting Union (EBU). DVB produces specifications that are subsequently standardized in one of the European statutory standardization bodies. They cover the following DTV-related areas:

• conditional access,
• content protection and copy management,
• interactivity,
• interfacing,
• IP,
• measurement,
• middleware,
• multiplexing,
• source coding,
• subtitling,
• transmission.
Standards have emerged in the past 10 years for defining the physical layer and data link layer of a distribution system, as follows:

• satellite video distribution (DVB-S and DVB-S2),
• cable video distribution (DVB-C),
• terrestrial television video distribution (DVB-T),
• terrestrial television for handheld mobile devices (DVB-H).

Distribution systems differ mainly in the modulation schemes used (because of specific technical constraints):

• DVB-S (SHF) employs QPSK (Quadrature Phase-Shift Keying).
• DVB-S2 employs QPSK, 8PSK (Phase-Shift Keying), 16APSK (Asymmetric Phase-Shift Keying), or 32APSK; 8PSK is the most common at this time (it supports 30 megasymbols per second on a satellite transponder and provides a usable rate in the 75 Mbps range, or about 25 SD-equivalent MPEG-4 video channels).
• DVB-C (VHF/UHF) employs QAM (Quadrature Amplitude Modulation): 64-QAM or 256-QAM.
• DVB-T (VHF/UHF) employs 16-QAM or 64-QAM (or QPSK) along with COFDM (Coded Orthogonal Frequency Division Multiplexing).
• DVB-H: refer to the next section.

Because these systems have been widely deployed, especially in Europe, they may well play a role in near-term 3DTV services. IPTV also makes use of a number of these standards, particularly when making use of satellite links (an architecture that has emerged is to use satellite links to provide signals to various geographically distributed headends, which then distribute these signals terrestrially to a small region using the telco IP network; these headends act as rendezvous points in the IP Multicast infrastructure). Hence, on the reasonable assumption that IPTV will play a role in 3DTV, these specifications will also be considered for 3DTV in that context. As implied above, transmission is a key area of activity for DVB. See Table 4.2 for some of the key transmission specifications. In particular, EN 300 421 V1.1.2 (1997-08) describes the modulation and channel coding system for satellite digital multiprogram television (TV)/HDTV services to be used for primary and secondary distribution in Fixed Satellite Service (FSS) and Broadcast Satellite Service (BSS) bands. This specification is also known as DVB-S. The system is intended to provide DTH services for consumer Integrated Receiver Decoders (IRDs), as well as cable television headend stations with a likelihood of remodulation. The system is defined as the functional block of equipment performing the adaptation of the baseband TV signals, from the output of the MPEG-2 transport multiplexer (ISO/IEC DIS 13818-1) to the satellite channel characteristics. The following processes are applied to the data stream (a rough throughput calculation follows the list):

• transport multiplex adaptation and randomization for energy dispersal;
• outer coding (i.e., Reed–Solomon);
• convolutional interleaving;
• inner coding (i.e., punctured convolutional code);
• baseband shaping for modulation;
• modulation.
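As a rough illustration of how the coding stages shape the usable rate (the 27.5 Mbaud symbol rate and rate-3/4 inner code are assumed example parameters, not values mandated by the specification), the RS(204,188) outer code and the punctured convolutional inner code scale the raw modulated rate as follows:

```python
# Back-of-the-envelope DVB-S usable-rate calculation with assumed parameters.
symbol_rate = 27.5e6         # symbols per second (example transponder)
bits_per_symbol = 2          # QPSK
inner_code_rate = 3 / 4      # punctured convolutional code (example setting)
outer_code_rate = 188 / 204  # Reed-Solomon RS(204,188)

usable_bps = symbol_rate * bits_per_symbol * inner_code_rate * outer_code_rate
print(f"usable rate ~= {usable_bps / 1e6:.1f} Mbps")  # ~38.0 Mbps
```

The same style of arithmetic, with 3 bits per symbol for 8PSK and the more efficient coding of DVB-S2, is consistent with the roughly 75 Mbps figure cited in the modulation list above.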
DVB-S/DVB-S2, as well as the other transmission systems, could be used to deliver 3DTV. As seen in Fig. 4.9, MPEG information is packed into PESs (Packetized Elementary Streams), which are then mapped to TSs that are in turn handled by the DVB adaptation. The system is directly compatible with MPEG-2 coded TV signals. The modem transmission frame is synchronous with the MPEG-2 multiplex transport packets. Appropriate adaptation to the signal formats (e.g., MVC, ISO/IEC 14496-10:2008 Amendment 1 and ITU-T Recommendation H.264, the extension of AVC) will have to be made, but this kind of adaptation has recently been defined in the context of IPTV to carry MPEG-4 streams over an MPEG-2 infrastructure (Fig. 4.10). Some additional arrangements for the use of satellite transmission are depicted in Chapter 5. Also, see Appendix A4 for a brief overview of MPEG multiplexing and DVB support.
TABLE 4.2 Key DVB Transmission Specifications

EN 300 421 V1.1.2 (08/97), S: Framing structure, channel coding, and modulation for 11/12 GHz satellite services
TR 101 198 V1.1.1 (09/97): Implementation of Binary Phase Shift Keying (BPSK) modulation in DVB satellite transmission systems
EN 302 307 V1.2.1 (08/09), S2: Second-generation framing structure, channel coding, and modulation systems for broadcasting, interactive services, news gathering, and other broadband satellite applications
TR 102 376 V1.1.1 (02/05): User guidelines for the second-generation system for broadcasting, interactive services, news gathering, and other broadband satellite applications
TS 102 441 V1.1.1 (10/05): DVB-S2 adaptive coding and modulation for broadband hybrid satellite dial-up applications
EN 300 429 V1.2.1 (04/98), C: Framing structure, channel coding, and modulation for cable systems
DVB BlueBook A138 (04/09), C2: Frame structure, channel coding, and modulation for a second-generation digital transmission system for cable systems (DVB-C2)
EN 300 473 V1.1.2 (08/97), CS: DVB Satellite Master Antenna Television (SMATV) distribution systems
TS 101 964 V1.1.1 (08/01): Control channel for SMATV/MATV (Master Antenna Television) distribution systems; baseline specification
TR 102 252 V1.1.1 (10/03): Guidelines for implementation and use of the control channel for SMATV/MATV distribution systems
EN 300 744 V1.6.1 (01/09), T: Framing structure, channel coding, and modulation for digital terrestrial television
TR 101 190 V1.3.1 (07/08): Implementation guidelines for DVB terrestrial services; transmission aspects
TS 101 191 V1.4.1 (06/04): Megaframe for Single Frequency Network (SFN) synchronization
EN 302 755 V1.1.1 (09/09): Frame structure, channel coding, and modulation for a second-generation digital terrestrial television broadcasting system (DVB-T2)
DVB BlueBook A122 (12/09), T2: Frame structure, channel coding, and modulation for a second-generation digital terrestrial television broadcasting system (DVB-T2) (dEN 302 755 V1.2.1)
DVB BlueBook A133 (12/09): Implementation guidelines for a second-generation digital terrestrial television broadcasting system (DVB-T2) (draft TR 102 831 V1.1.1)
TS 102 773 V1.1.1 (09/09): Modulator Interface (T2-MI) for a second-generation digital terrestrial television broadcasting system (DVB-T2)
EN 302 304 V1.1.1 (11/04), H: Transmission system for handheld terminals
TR 102 377 V1.4.1 (06/09): Implementation guidelines for DVB handheld services
TR 102 401 V1.1.1 (05/05): DVB-H validation task force report
TS 102 585 V1.1.2 (04/08): System specifications for Satellite Services to Handheld Devices (SH) below 3 GHz
EN 302 583 V1.1.1 (03/08), SH: Framing structure, channel coding, and modulation for SH below 3 GHz
DVB BlueBook A111 (12/09): Framing structure, channel coding, and modulation for SH below 3 GHz (dEN 302 583 v.1.2.1)
TS 102 584 V1.1.1 (12/08): Guidelines for implementation for SH below 3 GHz
DVB BlueBook A131 (11/08): MPE-IFEC (draft TS 102 772 V1.1.1)
EN 300 748 V1.1.2 (08/97), MDS: Multipoint Video Distribution Systems (MVDS) at 10 GHz and above
EN 300 749 V1.1.2 (08/97): Framing structure, channel coding, and modulation for MMDS (Multichannel Multipoint Distribution Service) systems below 10 GHz
EN 301 701 V1.1.1 (08/00): OFDM (Orthogonal Frequency Division Multiplexing) modulation for microwave digital terrestrial television
EN 301 210 V1.1.1 (02/99), DSNG: Framing structure, channel coding, and modulation for Digital Satellite News Gathering (DSNG) and other contribution applications by satellite
TR 101 221 V1.1.1 (03/99): User guidelines for DSNG and other contribution applications by satellite
EN 301 222 V1.1.1 (07/99): Coordination channels associated with DSNG
For Digital Rights Management (DRM), the DVB Project developed Digital Video Broadcast Conditional Access (DVB-CA), which defines a Digital Video Broadcast Common Scrambling Algorithm (DVB-CSA) and a Digital Video Broadcast Common Interface (DVB-CI) for accessing scrambled content:

• DVB system providers develop their proprietary conditional access systems within these specifications;
• DVB transports include metadata called service information (DVB-SI, i.e., Digital Video Broadcast Service Information) that links the various Elementary Streams (ESs) into coherent programs and provides human-readable descriptions for electronic program guides.
[Figure 4.9 Functional block diagram of DVB-S: video, audio, and data coders feed the programme and transport multiplexers (MPEG-2 source coding and multiplexing, including system information and general data already in IP packets); the satellite channel adapter then applies MUX adaptation and energy dispersal, Reed–Solomon RS(204,188, T = 8) outer coding, convolutional interleaving, convolutional inner coding, baseband shaping, and QPSK modulation onto the RF satellite channel; each 188-byte MPEG-2 transport MUX packet (1 sync byte plus 187 bytes) is extended to 204 bytes by the Reed–Solomon error protection.]

[Figure 4.10 Mapping of MPEG-2/MPEG-4 to DVB/DVB-S2 systems: program streams (PS) are used on DVD, while transport streams (TS) are carried by DVB/DVB-S2.]
4.3 DVB-H
There is interest in the industry in delivering 3DTV services to mobile phones. It is perceived that simple lenticular screens can work well in this context and that the bandwidth (even though always at a premium in mobile applications) would not be too onerous overall; even assuming a model with two independent streams being delivered, it would double the bandwidth to 2 × 384 kbps or 2 × 512 kbps, and the use of spatial compression (which should not be such a "big" compromise here) would be handled at the traditional data rate, 384 kbps or 512 kbps.

DVB-H, as noted in Table 4.2, is a DVB specification that deals with approaches and technologies to deliver commercial-grade, medium-quality, real-time linear and on-demand video content to handheld, battery-powered devices such as mobile telephones and PDAs (Personal Digital Assistants). (This material is based on Ref. [23].) DVB-H addresses the requirements for reliable, high-speed, high-data-rate reception for a number of mobile applications, including real-time video to handheld devices. IP Multicast is typically employed to support DVB-H. DVB-H is generating significant interest in the broadcast and telecommunications worlds, and DVB-H services are expected to start at this time. The DVB-H standards have been standardized through ETSI. ETSI EN 302 304, "Digital Video Broadcasting (DVB); Transmission System for Handheld Terminals (DVB-H)," is an extension of the DVB-T standard. Additional features have been added to support handheld and mobile reception. Lower power consumption for mobile terminals and secured reception in mobility environments are key features of the standard. It is meant for IP-based wireless services. DVB-H can share the DVB-T MUX with MPEG-2/MPEG-4 services, so it can be part of the IPTV infrastructure described in the previous chapter, except that lower bit rates are used for transmission (typically in the 384-kbps range). DVB-H was published as an ETSI standard in 2004 as an umbrella standard defining how to combine the existing (now updated) ETSI standards to form the DVB-H system (Fig. 4.11).

[Figure 4.11 DVB-H framework: IP packets enter DVB-H IP encapsulators (IPEs) that apply MPE, MPE-FEC, and time slicing; the MPEG-2 multiplex feeds a DVB-T modulator (2K/4K/8K modes, with DVB-H TPS signaling) across the channel to a DVB-T demodulator and DVB-H decapsulator that recover the IP stream; MPE-FEC, time slicing, the 4K mode, and DVB-H TPS are new to DVB-H.]

DVB-H is based on DVB-T, a standard for digital transmission of terrestrial over-the-air TV signals. When DVB-T was first published in 1997, it was not designed to target mobile receivers. However, DVB-T mobile services have been launched in a number of countries. Indeed, with the advent of diversity antenna receivers, services that target fixed reception can now largely be received on the move as well. DVB-T is deployed in more than 50 countries. Yet, a new standard was sought, namely, DVB-H. Despite the success of mobile DVB-T reception, the major concern with any handheld device is that of battery life. The current and projected power consumption of DVB-T front-ends is too high to support handheld receivers that are expected to last from one to several days on a single charge. The other major requirements for DVB-H were an ability to receive 15 Mbps in an 8-MHz channel and in a wide-area Single Frequency Network (SFN) at high speed. These requirements were drawn up after much debate and with an eye on emerging convergence devices providing video services and other broadcast data services to 2.5G and 3G handheld devices. Furthermore, all this should be possible while maintaining maximum compatibility with existing DVB-T networks and systems. Figure 4.12 depicts a block-level view of a DVB-H network.

[Figure 4.12 Block-level view of a DVB-H network: content acquisition, aggregation, and low-bit-rate encoding feed service provisioning and network operation (using IP Multicast) toward the transmitter, in parallel with provisioning and operation of the cellular network and the content purchase/rights interaction with the consumer.]

In order to meet these requirements, the newly developed DVB-H specification includes the capabilities discussed next.
• Time-Slicing: Rather than the continuous data transmission of DVB-T, DVB-H employs a mechanism where bursts of data are received at a time (a so-called IP datacast carousel). This means that the receiver is inactive for much of the time and can thus, by means of clever control signaling, be "switched off." The result is a power saving of about 90%, and more in some cases. (A short duty-cycle calculation appears at the end of this section.)
• "4K Mode": With the addition of a 4K mode with 3409 active carriers, DVB-H benefits from a compromise between the high-speed, small-area SFN capability of 2K DVB-T and the lower speed but larger-area SFN of 8K DVB-T. In addition, with the aid of enhanced in-depth interleavers in the 2K and 4K modes, DVB-H has even better immunity to ignition interference.
• Multiprotocol Encapsulation-Forward Error Correction (MPE-FEC): The addition of an optional, multiplexer-level FEC scheme means that DVB-H transmissions can be even more robust. This is advantageous when considering the hostile environments and poor (but fashionable) antenna designs typical of handheld receivers.

Like DVB-T, DVB-H can be used in 6-, 7-, and 8-MHz channel environments. However, a 5-MHz option is also specified for use in non-broadcast environments. A key initial requirement, and a significant feature of DVB-H, is that it can coexist with DVB-T in the same multiplex. Thus, an operator can choose to have two DVB-T services and one DVB-H service in the same overall DVB-T multiplex.

Broadcasting is an efficient way of reaching many users with a single (configurable) service. DVB-H combines broadcasting with a set of measures to ensure that the target receivers can operate from a battery and on the move, and is thus an ideal companion to 3G telecommunications, offering symmetrical and asymmetrical bidirectional multimedia services. DVB-H trials have been conducted in recent years in Germany, Finland, and the United States. Such trials help frequency planning and improve understanding of the complex issue of interoperability with telecommunications networks and services. However, to date, at least in the United States, there has been limited interest (and success) in the use of DVB-H to deliver video to handheld devices. Providers have tended to use proprietary protocols.

Proponents have suggested the use of DVB-H for delivery of 3DTV to mobile devices. Some make the claim that wireless 3DTV may be introduced at an early point because of the tendency of wireless operators to feature new applications earlier than traditional carriers. While this may be true in some parts of the world (perhaps mostly driven by the regulatory environment favoring wireless in some countries, by the inertia of the wireline operators, and by the relative ease with which "towers are put up"), we remain of the opinion that the spectrum limitations and the limited QoE of a cellular 3D interaction do not make cellular 3D such a financially compelling business case for the wireless operators as to induce them to introduce the service overnight.
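Returning to time-slicing: the power saving comes from the ratio of the service's average rate to the burst rate. The following is a minimal sketch; the 10 Mbps burst rate and 384 kbps service rate are assumed example figures.

```python
# Illustrative DVB-H time-slicing duty-cycle arithmetic (assumed figures).
burst_rate_bps = 10e6      # rate at which a burst is delivered to the handset
service_rate_bps = 384e3   # average rate of one mobile TV service

duty_cycle = service_rate_bps / burst_rate_bps  # fraction of time receiving
power_saving = 1 - duty_cycle                   # front end off otherwise

print(f"receiver on ~{duty_cycle:.1%} of the time")         # ~3.8%
print(f"ideal front-end power saving ~{power_saving:.0%}")  # ~96%
```

Resynchronization and MPE-FEC reception overheads reduce this ideal figure toward the roughly 90% saving cited in the list above.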
REFERENCES
REFERENCES

1. Minoli D. IP multicast with applications to IPTV and mobile DVB-H. New York: Wiley/IEEE Press; 2008.
2. Minoli D. Video dialtone technology: digital video over ADSL, HFC, FTTC, and ATM. New York: McGraw-Hill; 1995.
3. Minoli D. Distributed multimedia through broadband communication services (co-authored). Norwood, MA: Artech House; 1994.
4. Minoli D. Digital video. In: Terplan K, Morreale P, editors. The telecommunications handbook, Chapter 4. New York: IEEE Press; 2000.
5. Minoli D. Distance learning: technology and applications. Norwood, MA: Artech House; 1996.
6. Tekalp M, editor. D32.2, Technical Report #2 on 3D Telecommunication Issues, Project Number: 511568, Project Acronym: 3DTV, Title: Integrated Three-Dimensional Television—Capture, Transmission and Display. Feb 20, 2007.
7. Hewage CTER, Worrall S. Robust 3D video communications. IEEE Comsoc MMTC E-Letter 2009;4(3).
8. Minoli D. Delivering voice over IP networks, 1st edn (co-authored). New York: Wiley; 1998.
9. Minoli D. Delivering voice over IP and the Internet, 2nd edn (co-authored). New York: Wiley; 2002.
10. Minoli D. Voice over MPLS. New York: McGraw-Hill; 2002.
11. Minoli D. Voice over IPv6—architecting the next-generation VoIP. New York: Elsevier; 2006.
12. Minoli D. Delivering voice over frame relay and ATM (co-authored). New York: Wiley; 1998.
13. Minoli D. Optimal packet length for packet voice communication. IEEE Trans Commun 1979;COMM-27:607–611.
14. Minoli D. Packetized speech networks, Part 3: Delay behavior and performance characteristics. Aust Electron Eng 1979:59–68.
15. Minoli D. Packetized speech networks, Part 2: Queuing model. Aust Electron Eng 1979:68–76.
16. Minoli D. Packetized speech networks, Part 1: Overview. Aust Electron Eng 1979:38–52.
17. Minoli D. Satellite on-board processing of packetized voice. ICC 1979 Conference Record. pp. 58.4.1–58.4.5; New York, NY, USA.
18. Minoli D. Issues in packet voice communication. Proc IEE 1979;126(8):729–740.
19. Minoli D. Analytical models for initialization of single hop packet radio networks. IEEE Trans Commun 1979;COMM-27:1959–1967, Special Issue on Digital Radio (with I. Gitman and D. Walters).
20. Minoli D. Some design parameters for PCM-based packet voice communication. International Electrical/Electronics Conference Record; 1979; Toronto, ON, Canada.
21. Minoli D. Digital voice communication over digital radio links. SIGCOMM Comput Commun Rev 1979;9(4):6–22.
22. Minoli D. Satellite systems engineering in an IPv6 environment. New York: Taylor and Francis; 2009.
23. The DVB Project Office (MacAvock P, Executive Director). DVB-H White Paper. EBU, Geneva, Switzerland. http://www.dvb.org. 2010.
24. ISO/IEC IS 13818-1. Information technology—Generic coding of moving pictures and associated audio information—Part 1: Systems. International Organization for Standardization (ISO); 2000.
25. ISO/IEC DIS 13818-2. Information technology—Generic coding of moving pictures and associated audio information: Video. International Organization for Standardization (ISO); 1995.
26. ISO/IEC 13818-3:1995. Information technology—Generic coding of moving pictures and associated audio information—Part 3: Audio. International Organization for Standardization (ISO); 1995.
27. Fairhurst G, Montpetit M-J. Address resolution for IP datagrams over MPEG-2 networks. Internet Draft draft-ietf-ipdvb-ar-00.txt, IETF ipdvb. Jun 2005.
28. Clausen HD, Collini-Nocker B, et al. Simple encapsulation for transmission of IP datagrams over MPEG-2/DVB networks. Internet Engineering Task Force draft-unisal-ipdvb-enc-00.txt. May 2003.
29. Montpetit MJ, Fairhurst G, et al. RFC 4259, A framework for transmission of IP datagrams over MPEG-2 networks. Nov 2005.
30. Faria G, Henriksson JA, Stare E, et al. DVB-H: digital broadcast services to handheld devices. Proc IEEE 2006;94(1):194.
APPENDIX A4: BRIEF OVERVIEW OF MPEG MULTIPLEXING AND DVB SUPPORT
A4.1 Packetized Elementary Stream (PES) Packets and Transport Stream (TS) Unit(s)
International Standard ISO/IEC 13818-1 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 29, "Coding of audio, picture, multimedia and hypermedia information," in collaboration with ITU-T. (The second edition, published in 2000, cancels and replaces the first edition, ISO/IEC 13818-1:1996, which has been technically revised.) The identical text is published as ITU-T Rec. H.222.0. ISO/IEC 13818 consists of the following parts, under the general title "Information technology—Generic coding of moving pictures and associated audio information" (Part 8 has been withdrawn; it addressed 10-bit video):

• Part 1: Systems
• Part 2: Video
• Part 3: Audio
• Part 4: Conformance testing
• Part 5: Software simulation
• Part 6: Extensions for DSM-CC
• Part 7: Advanced Audio Coding (AAC)
• Part 9: Extension for real time interface for systems decoders
• Part 10: Conformance extensions for Digital Storage Media Command and Control (DSM-CC)
The MPEG-2 and/or MPEG-4 standard defines three layers: systems, video, and audio [24–26]. The systems layer supports synchronization and interleaving of multiple compressed streams, buffer initialization and management, and time identification. For video and audio, the information is organized into access units, each representing a fundamental unit of encoding; for example, in video, an access unit will usually be a complete encoded video frame. The audio and the video layers define the syntax and semantics of the corresponding Elementary Streams (ESs). An ES is the output of an MPEG encoder and typically contains compressed digital video, compressed digital audio, digital data, and digital control data. The information corresponds to an access unit (a fundamental unit of encoding), such as a video frame. The compression is achieved using the Discrete Cosine Transform (DCT). Each ES is in turn an input to an MPEG-2 processor that accumulates the data into a stream of PES packets. A PES typically contains an integral number of ESs. Figure A4.1 shows both the multiplex structure and the Protocol Data Unit (PDU) format. A PES packet may be a fixed- or variable-sized block, with up to 65,536 octets per block, and includes a 6-byte protocol header.
[Figure A4.1 Combining of Packetized Elementary Streams (PES) into a TS: uncompressed video and audio streams pass through MPEG-2 elementary encoders to produce ESs, which are packetized into PESs (header with 24-bit packet start code prefix, 8-bit stream ID, 16-bit PES packet length, optional PES header, and payload) and fed, together with data, into the systems-layer multiplexer/SAR (segmentation/reassembly) to form the transport stream (TS).]
[Figure A4.2 PES and TS multiplexing: encoder output (Serial Digital Interface (SDI) video, audio, data) is packetized into PESs with distinct stream IDs and multiplexed by the MPEG-2 mux into a Single Program Transport Stream (SPTS) of 188-byte TS packets (4-byte header plus 184-byte payload) for UDP/IP encapsulation; the 32-bit TS header carries the sync byte (8 bits), transport error indicator (1), payload unit start indicator (1), transport priority (1), PID (13), transport scrambling control (2), adaptation field control (2), and continuity counter (4); the 48-bit PES header carries the packet start code prefix (24), stream ID (8), PES packet length (16), and optional PES header.]
As seen in the figure, and more directly in Fig. A4.2, PESs are then mapped to Transport Stream (TS) units. Each MPEG-2 TS packet carries 184 octets of payload data prefixed by a 4-octet (32-bit) header (the resulting 188-byte packet size was originally chosen for compatibility with Asynchronous Transfer Mode (ATM) systems). These packets are the basic unit of data in a TS. They consist of a sync byte (0x47), followed by flags and a 13-bit Packet Identifier (PID; some also call this the Program ID). This is followed by other (some optional) transport fields; the rest of the packet consists of the payload. Figure A4.3 connects the PES and TS concepts together.
[Figure A4.3 A sequence of PESs leads to a sequence of uniform TS packets: each PES packet (header plus payload) is segmented across 188-byte TS packets (4-byte header, payload, and an adaptation field used for stuffing where needed); the TS header fields include sync_byte (to sync the decoder), transport_error_indicator, payload_unit_start_indicator (PSI or PES packet), transport_priority, 13-bit PID, transport_scrambling_control, adaptation_field_control, and continuity_counter, while the adaptation field can carry discontinuity and random-access indicators, ES priority, flags, and the PCR (system time clock, sent at least every 0.1 s, to synchronize decoder and encoder time).]
The PID is a 13-bit field used to uniquely identify the stream to which the packet belongs (e.g., the PES packets corresponding to an ES) as generated by the multiplexer. Each MPEG-2 TS channel is uniquely identified by the PID value carried in the header of the fixed-length MPEG-2 TS packets. The PID allows the receiver to identify the stream to which each received packet belongs; effectively, it allows the receiver to accept or reject PES packets at a high level without extensive processing. Often one sends only one PES (or a part of a single PES) in a TS packet (in some cases, however, a given PES packet may span several TS packets, so that the majority of TS packets contain continuation data in their payloads). Each PID carries specific video, audio, or data information. Programs are groups of one or more PID streams that are related to each other. For example, a TS used in IPTV could contain five programs, representing five video channels. Assume that each channel consists of one video stream, one or two audio streams, and metadata. A receiver wishing to tune to a particular "channel" has to decode the payload of the PIDs associated with its program. It can discard the contents of all other PIDs. The number of TS logical channels is limited to 8192, some of which are reserved; unreserved TS
logical channels may be used to carry audio, video, IP datagrams, or other data. Examples of systems using MPEG-2 include the DVB and Advanced Television Systems Committee (ATSC) standards for digital television. Note 1: Ultimately an IPTV stream consists of packets of fixed size. MPEG (specifically MPEG-4) packets are aggregated into an IP packet, and the IP packet is then transmitted using IP Multicast methods. MPEG TS packets are typically encapsulated in UDP and then in IP. In turn, and (only) for interworking with existing MPEG-2 systems already deployed (e.g., satellite systems and associated ground equipment supporting DTH), this IP packet needs further encapsulation, as discussed later. Note that traditional MPEG-2 approaches make use of the PID to identify content, whereas in IPTV applications the IP Multicast address is used to identify the content; also, the latest IPTV systems make use of MPEG-4-coded PESs.
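To make the TS header layout and the PID-based acceptance just described concrete, here is a minimal Python sketch under stated assumptions: the input is a byte-aligned stream of 188-byte TS packets, the PID values are hypothetical, and adaptation-field payload offsets are ignored for brevity.

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def parse_ts_header(pkt: bytes):
    """Decode the 32-bit TS header fields described above."""
    if pkt[0] != SYNC_BYTE:
        raise ValueError("lost sync: first byte is not 0x47")
    tei  = (pkt[1] >> 7) & 0x01               # transport error indicator
    pusi = (pkt[1] >> 6) & 0x01               # payload unit start indicator
    pid  = ((pkt[1] & 0x1F) << 8) | pkt[2]    # 13-bit packet identifier
    afc  = (pkt[3] >> 4) & 0x03               # adaptation field control
    cc   =  pkt[3] & 0x0F                     # continuity counter
    return tei, pusi, pid, afc, cc

def accept(stream: bytes, wanted_pids: set):
    """Yield payloads of packets whose PID the receiver has been configured to acquire."""
    for off in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = stream[off:off + TS_PACKET_SIZE]
        tei, pusi, pid, afc, cc = parse_ts_header(pkt)
        if pid in wanted_pids and not tei:
            yield pid, pkt[4:]   # 184-byte payload (adaptation field not skipped here)

# e.g., accept only hypothetical video PID 121 and audio PID 131:
# for pid, payload in accept(ts_bytes, {121, 131}): ...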
Note 2: The MPEG-2 standard defines two ways of multiplexing different elementary stream types: (i) the Program Stream (PS) and (ii) the Transport Stream (TS).

• An MPEG-2 PS is principally intended for storage and retrieval from storage media. It supports grouping of video, audio, and data ESs that have a common time base. Each PS consists of only one content (TV) program. The PS is used in error-free environments; for example, DVDs use the MPEG-2 PS. A PS is a group of tightly coupled PES packets referenced to the same time base.

• An MPEG-2 TS combines multiple PESs (which may or may not have a common time base) into a single stream, along with information for synchronizing between them. At the same time, the TS segments the PESs into smaller, fixed-size TS packets. An entire video frame may be mapped into one PES packet. PES headers distinguish PES packets of the various streams and also contain time stamp information. PESs are generated by the packetization process; the payload consists of the data bytes taken sequentially from the original ES. A TS may correspond to a single TV program; this type of TS is normally called a Single Program Transport Stream (SPTS). In most cases, one or more SPTSs are combined to form a Multiple Program Transport Stream (MPTS). This larger aggregate also contains the control information (Program Specific Information or PSI) [27].
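The segmentation step in the second bullet can be sketched as follows. This is a simplification: a real multiplexer stuffs the final short packet via the adaptation field (as Fig. A4.3 shows) rather than the raw 0xFF padding used here, and only the minimum header flags are set.

def pes_to_ts(pes: bytes, pid: int) -> list:
    """Segment one PES packet into fixed-size 188-byte TS packets."""
    packets, cc = [], 0
    for off in range(0, len(pes), 184):
        chunk = pes[off:off + 184]
        pusi = 0x40 if off == 0 else 0x00        # payload_unit_start on first packet
        header = bytes([0x47,                    # sync byte
                        pusi | (pid >> 8),       # flags + PID high bits
                        pid & 0xFF,              # PID low bits
                        0x10 | cc])              # payload-only + continuity counter
        if len(chunk) < 184:                     # last, short chunk
            chunk += b"\xff" * (184 - len(chunk))  # simplified stuffing
        packets.append(header + chunk)
        cc = (cc + 1) & 0x0F
    return packets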
A4.2 DVB (Digital Video Broadcasting)-Based Transport in Packet Networks
As we discussed in the body of this chapter, DVB-S is set up to carry MPEG-2 TS streams encapsulated with 16 bytes of Reed–Solomon FEC to create a packet that is 204 bytes long (Fig. A4.4). DVB-S embodies the concept of "virtual channels" in a manner analogous to ATM; virtual channels are identified by PIDs (one can think of DVB packets as being similar to ATM cells, but with a different length and format). DVB packets are transmitted over an appropriate network. The receiver looks for the specific PIDs that it has been configured to acquire (directly in the headend receiver for terrestrial redistribution purposes, in the viewer's set-top box for a DTH application, or in the set-top box via an IGMP join in an IPTV environment). Specifically, to display a channel of IPTV digital television, the DVB-based application configures the driver in the receiver to pass up to it the packets with a set of specific PIDs, for example, PID 121 containing video and PID 131 containing audio (these packets are then sent to the MPEG decoder, which is either hardware- or software-based). In conclusion, a receiver or demultiplexer extracts ESs from the TS in part by looking for packets identified by the same PID.

[Figure A4.4 DVB packet: a 188-byte MPEG-2 packet (4-byte MPEG-2 header + 184-byte MPEG-2 payload) plus 16 bytes of Reed–Solomon error correction yields a 204-byte DVB packet. Packet identifiers (PIDs) range from 0x0000 to 0x1fff; PIDs 0x0000–0x0020 and 0x1fff are reserved.]

A4.3 MPEG-4 and/or Other Data Support
For satellite transmission, and to remain consistent with already deployed MPEG-2 technology (existing receivers, specifically IRDs, are based on hardware that works by de-enveloping MPEG-2 TSs; hence, the MPEG-4-encoded PESs are mapped to TSs at the source), MPEG-4 TSs (or other data) are further encapsulated using Multiprotocol Encapsulation (MPE, specified in the DVB data broadcasting standard ETSI EN 301 192) and then segmented again and placed into TS streams via a device called an IP Encapsulator (IPE; Fig. A4.5). MPE is used to transmit datagrams that exceed the length of the DVB "cell," just as ATM Adaptation Layer 5 (AAL5) is used for a similar function in an ATM context. MPE allows one to encapsulate IP packets into MPEG-2 TSs ("packets," or "cells"; Fig. A4.6). IPEs handle statistical multiplexing and facilitate coexistence. An IPE receives IP packets from an Ethernet connection, encapsulates them using MPE, and then maps these streams into an MPEG-2 TS. Once the device has encapsulated the data, the IPE forwards the data packets to a satellite link. Generic data (IP) for transmission over the MPEG-2 transport multiplex (or IP packets containing MPEG-4 video) is passed to an encapsulator that typically receives PDUs (Ethernet frames, IP datagrams, or other network layer packets); the encapsulator formats each PDU into a series of TS packets (usually after adding an encapsulation header) that are sent over a TS logical channel. The MPE packet has the format shown in Fig. A4.7. Figure A4.8 shows the encapsulation process. Note: IPEs are usually not employed if the output of the layer 2 switch is connected to a router for transmission over a terrestrial network; in this case, the headend is responsible for proper downstream enveloping and distribution of the traffic to the ultimate consumer. In other, pure IP-based video environments where DVB-S or DVB-S2 is not used (e.g., a greenfield IP network designed to handle just video), the TSs are included in IP packets that are then transmitted as needed (Fig. A4.9). Specifically, with the current generation of equipment, the encoder will typically generate IP packets; these have a source IP address and a unicast or multicast destination IP address. The advantage of having video in IP format is that it can be carried over a regular (pure) Local Area Network (LAN) or carrier Wide Area Network (WAN).

[Figure A4.5 Pictorial view of encapsulation: content is MPE-encapsulated at the uplink and de-encapsulated at the receiver.]
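A rough sketch of the MPE framing follows. The 12-byte header here is a placeholder (a real MPE section carries table_id 0x3E, the section length, the receiver MAC address, and flags per EN 301 192, none of which this sketch reproduces field-for-field), while the trailer uses the unreflected MPEG-2/DVB CRC-32.

def mpeg_crc32(data: bytes) -> int:
    """Unreflected CRC-32 (polynomial 0x04C11DB7) as used by MPEG-2/DVB sections."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc

def mpe_wrap(ip_datagram: bytes) -> bytes:
    """Wrap one IP datagram in a (simplified) MPE section: header + payload + CRC."""
    header = bytes([0x3E]) + b"\x00" * 11          # 12-byte stand-in header
    body = header + ip_datagram
    return body + mpeg_crc32(body).to_bytes(4, "big")

# The resulting section would then be segmented into 188-byte TS packets,
# e.g., with a routine like the pes_to_ts() sketch above operating on section bytes.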
[Figure A4.6 IPE protocol stack: encoder output consists of Ethernet (14 bytes)/IP multicast (20 bytes)/UDP (8 bytes) datagrams, each carrying seven TS packets with the same PID (a 4-byte MPEG-2 transport header plus 184 bytes of MPEG-4 video, audio, or data per packet). The IPE prepends a 12-byte MPE header and appends a 4-byte MPE trailer, re-segments the result into 188-byte TS packets (4-byte header, optional adaptation field used for stuffing, payload), and each 188-byte TS packet gains 16 bytes of FEC to become a 204-byte DVB packet on the DVB-S/DVB-S2-based satellite network. A layer 2 switch (L2S) can instead route the encoder output to a terrestrial network, bypassing the IPE.]
[Figure A4.7 MPE packet: MPE header, followed by 28 bytes of UDP/IP header (or 40 bytes of TCP/IP header), the IP payload (the content of the "virtual channel," a.k.a. PID), and a 4-byte CRC.]
[Figure A4.8 Encapsulation process: an IP packet with a 28-byte IP/UDP header and a 375-byte payload maps into three MPEG TS packets. The first carries the 4-byte MPEG header, a 1-byte pointer, the 12-byte MPE header, the 28-byte IP/UDP header, and 143 payload bytes; the second carries 184 payload bytes; the third carries the remaining 48 payload bytes, the 4-byte CRC, and 132 bytes of 0xFF filler.]
Consider Fig. A4.9 again. It depicts video (and audio) information being organized into PESs that are then segmented into TS packets. Examining the protocol stack of Fig. A4.9, one should note that in a traditional MPEG-2 DTV environment, whether over-the-air transmission or cable TV transmission, the TSs are handled directly by an MPEG-2-ready infrastructure formally known as an MPEG-2 Transport Multiplex (see the left-hand side of the stack). As explained, an MPEG-2 Transport Multiplex offers a number of parallel channels known as TS logical channels. Each TS logical channel is uniquely identified by the PID value carried in the header of each MPEG-2 TS packet. TS logical channels are independently numbered on each MPEG-2 TS multiplex (MUX). As just noted, the service provided by an MPEG-2 transport multiplex offers a number of parallel channels that correspond to logical links (forming the MPEG TS) [24, 28]. The MPEG-2 TS has been widely accepted not only for providing digital TV services but also as a subnetwork technology for building IP networks, say in cable TV–based Internet access.
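As a quick sanity check on the seven-TS-packets-per-datagram packing shown in Fig. A4.9 (and in Fig. A4.6), the arithmetic below uses the header sizes from the figure; 7 × 188 = 1316 bytes is the largest multiple of 188 that, together with the IP/UDP headers, still fits a standard 1500-byte Ethernet payload.

TS, ETH, IP, UDP = 188, 14, 20, 8

ts_payload = 7 * TS                       # 1316 bytes of TS data per datagram
frame = ETH + IP + UDP + ts_payload       # 1358 bytes on the wire
assert IP + UDP + ts_payload <= 1500      # fits the standard Ethernet MTU
print(ts_payload, frame, f"{ts_payload / frame:.1%}")   # 1316 1358 96.9%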
[Figure A4.9 Simplified protocol hierarchy: seven TS packets with the same PID (each a 4-byte MPEG-2 transport header plus 184 bytes of MPEG-4 video, audio, or data), together with system information, are carried in a UDP (8 bytes)/IP (20 bytes)/Ethernet (14 bytes) encapsulation over a traditional IP LAN/WAN; typical of traditional video-over-IP.]
There may be an interest in also carrying actual IP datagrams over this MPEG-2 transport multiplex infrastructure (this may be generic IP data or IP packets emerging from MPEG-4 encoders that contain MPEG-4 frames). To handle this requirement, packet data for transmission over an MPEG-2 transport multiplex is passed to an IPE. This device receives PDUs, such as Ethernet frames or IP packets, and formats each into a Subnetwork Data Unit (SNDU) by adding an encapsulation header and trailer. The SNDUs are subsequently fragmented into a series of TS packets. To receive IP packets over an MPEG-2 TS multiplex, a
receiver needs to identify the specific TS multiplex (physical link) and also the TS logical channel (the PID value of a logical link). It is common for a number of MPEG-2 TS logical channels to carry SNDUs; therefore, a receiver must filter (accept) IP packets sent with a number of PID values and must independently reassemble each SNDU [29]. Some applications require transmission of MPEG-4 streams over a preexisting MPEG-2 infrastructure, for example, in a cable TV application. This is also done via the IPE; here the IP packets generated by the MPEG-4 encoder are treated as if they were generic data, as just described (Fig. A4.10). The encapsulator receives PDUs (e.g., IP packets or Ethernet frames) and formats these into SNDUs. An encapsulation (or convergence) protocol transports each SNDU over the MPEG-2 TS service and provides the appropriate mechanisms to deliver the encapsulated PDU to the receiver IP interface. In forming an SNDU, the encapsulation protocol typically adds header fields that carry protocol control information, such as the length of the SNDU, receiver address, multiplexing information, payload type, and sequence numbers. The SNDU payload is typically followed by a trailer that carries an integrity check (e.g., Cyclic Redundancy Check, CRC). When required, an SNDU may be fragmented across a number of TS packets (Figs A4.11 and A4.12) [29].
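The fragmentation arithmetic can be made concrete with a small sketch that mirrors the numbers of Fig. A4.8: the 12-byte header, 4-byte CRC trailer, and 1-byte pointer field in the first TS packet follow that figure, and a 403-byte datagram (28-byte IP/UDP header + 375-byte payload) indeed needs three TS packets.

import math

def ts_packets_for_sndu(ip_datagram_len: int) -> int:
    """TS packets needed to carry one MPE-encapsulated IP datagram."""
    sndu = 12 + ip_datagram_len + 4     # MPE header + datagram + CRC-32 trailer
    first_capacity = 188 - 4 - 1        # minus TS header and 1-byte pointer field
    if sndu <= first_capacity:
        return 1
    return 1 + math.ceil((sndu - first_capacity) / 184)

print(ts_packets_for_sndu(28 + 375))    # -> 3, matching Fig. A4.8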
[Figure A4.10 Encapsulator function: an MPEG-4 encoder outputs digital video (usually placed in IP packets) as IP over Ethernet; the IPE receives this IP traffic on its Ethernet interface, segments the IP datagrams into smaller packets, and places them inside MPEG-2 transport stream packets. The IPE's ASI output is fed into the multiplexer, together with other sources (e.g., MPEG-2 digital video, IP over MPEG-2), to form the MPEG-2 transport multiplex toward the MPEG-2-ready network. An MPEG-4 encoder may alternatively feed an IP network directly.]
+--------+-------------------------+-----------------+
| Header |           PDU           | Integrity check |
+--------+-------------------------+-----------------+

Figure A4.11 Encapsulation of a subnetwork IPv4 or IPv6 PDU to form an MPEG-2 payload unit.
+------------+-----------------------------+
|Encap header|Subnetwork data unit (SNDU)  |
+------------+-----------------------------+
      /     /            \            \
+------+----------+  +------+----------+   +------+----------+
|MPEG-2| MPEG-2   |..|MPEG-2| MPEG-2   |...|MPEG-2| MPEG-2   |
|header| payload  |  |header| payload  |   |header| payload  |
+------+----------+  +------+----------+   +------+----------+

Figure A4.12 Encapsulation of a PDU (e.g., IP packet) into a series of MPEG-2 TS packets. Each TS packet carries a header with a common packet ID (PID) value denoting the MPEG-2 TS logical channel.
In summary, the standard DVB way of carrying IP datagrams in an MPEG-2 TS is to use MPE. With MPE, each IP datagram is encapsulated into one MPE section; a stream of MPE sections is then put into an ES, that is, a stream of MPEG-2 TS packets with a particular PID. Each MPE section has a 12-byte header, a 4-byte CRC (CRC-32) tail, and a payload whose length is identical to the length of the IP datagram carried by the section [30].
CHAPTER 5
3DTV/3DV IPTV Transmission Approaches
IPTV services enable advanced content viewing and navigation by consumers; the technology is rapidly maturing and becoming commercially available. IPTV is being championed by the telecom industry in particular, given the significant IP-based infrastructure these carriers already own. IPTV may be an ideal technology to support 3DTV because of the efficient network pruning supported by IP Multicast. Developers are encouraged to explore the use of IPv6 to support evolving 3DTV needs: 3DTV is a forward-looking service, and hence it should make use of a forward-looking IP transport technology. IP Multicast is also employed for control. While IP Multicast has been around for a number of years, it is now finding fertile commercial applications in the IPTV and DVB-H arenas. Applications such as datacasting (e.g., stock market or other financial data) tend to make use of large multihop networks; pruning is often employed, and nodal store-and-forward approaches are completely acceptable. Applications such as video are very sensitive to end-to-end delay, jitter, and (uncorrectable) packet loss; QoS considerations are critical. These networks tend to have fewer hops, and pruning may be somewhat trivially implemented by making use of a simplified network topology.

IPTV services enable traditional carriers to deliver SD (Standard Definition) and HD video to their customers in support of their triple/quadruple play strategies. With the significant erosion in revenues from traditional voice services on wireline-originated calls (both in terms of depressed pricing and a shift to VoIP over broadband Internet services delivered over cable TV infrastructure), and with the transition of many customers from wireline to wireless services, the traditional telephone carriers find themselves in need of generating new revenues by seeking to deliver video services to their customers. Traditional phone carriers are challenged in the voice arena (by VoIP and other providers); their Internet services are challenged in the broadband Internet access arena (by cable TV companies); and their video services are nascent and challenged by a lack of deployed technology.
5.1 IPTV CONCEPTS
As described in Ref. [1], IPTV deals with approaches, technologies, and protocols to deliver commercial-grade SD and HD entertainment-quality real-time linear and on-demand video content over IP-based networks, while meeting all prerequisite QoS, QoE, Conditional Access (CA) (security), blackout management (for sporting events), Emergency Alert System (EAS), closed caption, parental control, Nielsen rating collection, secondary audio channel, picture-in-picture, and guide data requirements of the content providers and/or regulatory entities. Typically, IPTV makes use of MPEG-4 encoding to deliver 200–300 SD channels and 20–30 HD channels; viewers need to be able to switch channels within 2 s or less; also, the need exists to support multiple set-top boxes/multiprogramming (say 2–4 streams) within a single domicile. IPTV is not to be confused with the simple delivery of video over an IP network (including video streaming), which has been possible for over two decades; IPTV supports all business, billing, provisioning, and content protection requirements associated with commercial video distribution. The IP-based service needs to be comparable to that received over cable TV or Direct Broadcast Satellite. In addition to TV sets, the content may also be delivered to a personal computer. MPEG-4, which operates at 2.5 Mbps for SD video and 8–11 Mbps for HD video, is critical to telco-based video delivery over a copper-based plant because of the bandwidth limitations of that plant, particularly when multiple simultaneous streams need to be delivered to a domicile; MPEG-2 would typically require a higher bitrate for the same perceived video quality. IP Multicast is typically employed to support IPTV. There has been significant deployment of commercial-grade IPTV services around the world in the recent past, as seen in Table 5.1.
TABLE 5.1 Partial List of IPTV Providers in the United States and Europe

US IPTV providers:
• AT&T: U-Verse TV offers 300+ channels. Features include DVR, VOD, and HD.
• Verizon: FiOS TV offers 200+ channels. Features include VOD, HD, and multi-room DVR (control and watch DVR programs from multiple rooms).

European IPTV providers:
• Deutsche Telekom: T-Home service includes 60+ channels; HD, VOD, and TV archives are available.
• Belgacom: Belgacom TV includes 70+ channels; VOD.
• France Telecom: Orange TV offers 200+ channels; HD and VOD.
• Telecom Italia: Alice Home TV offers 50+ channels; VOD.
• British Telecom: BT Vision service offers 40+ standard channels; DVR.
• Telefonica: Imagenio service offers 70+ channels.
• Swisscom: Bluewin TV offers 100+ channels; DVR and VOD.
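As a back-of-the-envelope illustration of why MPEG-4 matters on bandwidth-limited copper plant, the sketch below totals the per-domicile demand using the rates quoted above (2.5 Mbps SD, 8–11 Mbps HD); the stream mix and the Internet-access allowance are assumptions.

# Per-stream rates from the text: MPEG-4 at ~2.5 Mbps (SD) and 8-11 Mbps (HD).
SD_MBPS = 2.5
HD_MBPS = 11.0          # worst case of the quoted 8-11 Mbps range

def household_demand_mbps(hd_streams: int, sd_streams: int,
                          internet_mbps: float = 3.0) -> float:
    """Peak downstream demand for one domicile with multiple STBs.

    internet_mbps is an assumed allowance for broadband data alongside video.
    """
    return hd_streams * HD_MBPS + sd_streams * SD_MBPS + internet_mbps

# Three simultaneous programs (one HD, two SD) plus Internet access:
print(household_demand_mbps(1, 2))   # -> 19.0 Mbps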
One can anticipate several phases in the deployment of IPTV, as follows:

• Phase 1: IPTV introduced by the telcos for commercial delivery of entertainment-grade video over their IP/MPLS (Multiprotocol Label Switching) networks (2007–2012).
• Phase 2: IPTV introduced by the cable TV companies for commercial delivery of entertainment-grade video over their cable infrastructure (speculative, 2012+).
• Phase 3: IPTV to morph into Internet TV for commercial delivery of any video content, but of entertainment-grade quality, over the Internet/broadband Internet access connections (2012+).

5.1.1 Multicast Operation
As noted above, the backbone may consist of (i) a pure IP network or (ii) a satellite transmission link to a metropolitan headend that, in turn, uses a metropolitan (or regional) telco IP network. At the logical level, there are three types of communication between systems in an IP network:

• Unicast: one system communicates directly with another system.
• Broadcast: one system communicates to all systems.
• Multicast: one system communicates to a select group of other systems.

In traditional IP networks, a packet is typically sent by a source to a single destination (unicast); alternatively, the packet can be sent to all devices on the network (broadcast). There are business and multimedia (entertainment) applications that require a multicast transmission mechanism to enable bandwidth-efficient communication between groups of devices, where information is transmitted to a single multicast address and received by any device that wishes to obtain it. In traditional IP networks, it is not possible to generate a single transmission of data when this data is destined for a (large) group of remote devices. There are classes of applications that require distribution of information to a defined (but possibly dynamic) set of users; IP Multicast, an extension to IP, is required to properly address these communication needs. As the term implies, IP Multicast has been developed to support efficient communication between a source and multiple remote destinations. Multicast applications include, among others, datacasting—for example, for distribution of real-time financial data—entertainment digital television over an IP network (commercial-grade IPTV), Internet radio, multipoint video conferencing, distance learning, streaming media applications, and corporate
communications. Other applications include distributed interactive simulation, cloud/grid computing, and distributed video gaming (where most receivers are also senders). IP Multicast protocols and underlying technologies enable efficient distribution of data, voice, and video streams to a large population of users, ranging from hundreds to thousands to millions of users. IP Multicast technology enjoys intrinsic scalability, which is critical for these types of applications. As an example in the IPTV arena, with the current trend toward the delivery of HDTV signals, each requiring bandwidth in the 12-Mbps range, and the consumers' desire for a large number of channels (200–300 being typical), there has to be an efficient mechanism for delivering an aggregate signal of 1–2 Gbps (currently, a typical digital TV package may consist of 200–250 SD signals each operating at 3 Mbps and 30 HD signals each operating at 12 Mbps; this equates to about 1 Gbps, and as more HDTV signals are added, the bandwidth will reach the range of 2 Gbps) to a large number of remote users. If a source had to deliver 1 Gbps of signal to, say, 1 million receivers by transmitting all of this bandwidth across the core network, it would require a petabit-per-second network fabric; this is currently not possible. On the other hand, if the source could send the 1 Gbps of traffic to (say) 50 remote distribution points (for example, headends), each of which then makes use of a local distribution network to reach 20,000 subscribers, the core network only needs to support 50 Gbps, which is possible with proper design. For such reasons, IP Multicast is seen as a bandwidth-conserving technology that optimizes traffic management by simultaneously delivering a stream of information to a large population of recipients, including corporate enterprise users and residential customers. IPTV uses IP-based basic transport (where IP packets contain MPEG-4 TSs) and IP Multicast for service control and content acquisition (group membership). See Fig. 5.1 for a pictorial example. One important design principle of IP Multicast is to allow receiver-initiated attachment (joins) to information streams, thus supporting a distributed informatics model. A second important principle is the ability to support optimal pruning, such that the distribution of the content is streamlined by pushing replication as close to the receiver as possible. These principles enable bandwidth-efficient use of the underlying network infrastructure. The issue of security in multicast environments is addressed via Conditional Access Systems (CAS) that provide per-program encryption (typically, but not always, symmetric encryption; also known as inner encryption) or aggregate IP-level encryption (again typically, but not always, symmetric encryption; also known as outer encryption). Carriers have been upgrading their network infrastructure in the past few years to enhance their capability to provide QoS-managed services, such as IPTV. Specifically, legacy remote access platforms, implemented largely to support basic DSL service roll-outs—for example, supporting ATM aggregation and DSL termination—are being replaced by new broadband network gateway access technologies optimized around IP, Ethernet, and VDSL2 (Very High Bitrate Digital Subscriber Line 2).
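The fan-out arithmetic above can be checked directly; the channel counts and rates below come from the text, while treating unicast as a full per-subscriber copy across the core is the simplifying assumption.

lineup_gbps = (225 * 3 + 30 * 12) / 1000    # ~225 SD at 3 Mbps + 30 HD at 12 Mbps ≈ 1.04 Gbps
subscribers = 1_000_000
headends = 50

# Unicast: every receiver pulls its own copy end to end across the core.
unicast_core_gbps = lineup_gbps * subscribers    # ≈ 1 petabit/s -- infeasible
# Multicast: one copy per distribution point; replication happens at the edge.
multicast_core_gbps = lineup_gbps * headends     # ≈ 52 Gbps -- feasible

print(f"unicast core: {unicast_core_gbps:,.0f} Gbps")
print(f"multicast core: {multicast_core_gbps:,.0f} Gbps")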
[Figure 5.1 Bandwidth advantage of IP Multicast: with traditional unicast IP, the source S sends a separate copy of the stream toward each receiver R; with multicast IP, a single copy traverses the shared links and is replicated only where paths to the receivers diverge.]
These services and capabilities are delivered with multiservice routers on the network edge. Viewer-initiated program selection is achieved using IGMP, specifically with the Join Group Request message. (IGMP v2 messages include Create Group Request, Create Group Reply, Join Group Request, Join Group Reply, Leave Group (LG) Request, LG Reply, Confirm Group Request, and Confirm Group Reply.) Multicast communication is based on the construct of a group of receivers (hosts) that have an interest in receiving a particular stream of information, be it voice, video, or data. There are no physical or geographical constraints or boundaries to belonging to a group, as long as the hosts have (broadband) network connectivity.
The connectivity of the receivers can be heterogeneous in nature, in terms of bandwidth and connecting infrastructure (for example, receivers connected over the Internet), or homogeneous (for example, IPTV or DVB-H users). Hosts that wish to receive data intended for a particular group join the group using a group management protocol: hosts/receivers must become explicit members of the group to receive the data stream, but such membership may be ephemeral and/or dynamic. Groups of IP hosts that have joined the group and wish to receive traffic sent to this specific group are identified by multicast addresses. Multicast routing protocols belong to one of two categories: Dense-Mode (DM) protocols and Sparse-Mode (SM) protocols.

• DM protocols are designed on the assumption that the majority of routers in the network will need to distribute multicast traffic for each multicast group. DM protocols build distribution trees by initially flooding the entire network and then pruning out the (presumably small number of) paths without active receivers. DM protocols are used in LAN environments, where bandwidth considerations are less important, but can also be used in WANs in special cases (for example, where the backbone is a one-hop broadcast medium such as a satellite beam with wide geographic illumination, as in some IPTV applications).
• SM protocols are designed on the assumption that only a few routers in the network will need to distribute multicast traffic for each multicast group. SM protocols start out with an empty distribution tree and add branches only upon explicit requests from receivers to join the distribution. SM protocols are generally used in WAN environments, where bandwidth considerations are important.

For IP Multicast there are several multicast routing protocols that can be employed to acquire real-time topological and membership information for active groups. Routing protocols that may be utilized include Protocol-Independent Multicast (PIM), the Distance Vector Multicast Routing Protocol (DVMRP), Multicast Open Shortest Path First (MOSPF), and Core-Based Trees (CBT). Multicast routing protocols build distribution trees by examining a forwarding table that contains unicast reachability information. PIM and CBT use the unicast forwarding table of the router. Other protocols use their own unicast reachability tables; for example, DVMRP uses its distance vector routing protocol to determine how to create source-based distribution trees, while MOSPF utilizes its link state table to create source-based distribution trees. MOSPF, DVMRP, and PIM-DM are dense-mode routing protocols, while CBT and PIM-SM are sparse-mode routing protocols. PIM is currently the most widely used protocol. As noted, IGMP (versions 1, 2, and 3) is the protocol used by Internet Protocol version 4 (IPv4) hosts to communicate multicast group membership states to multicast routers. IGMP is used to dynamically register individual hosts/receivers on a particular local subnet (for example, a LAN) to a multicast group.
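To make the receiver-initiated join concrete, here is a minimal Python sketch of an IPTV-style receiver; the group address and port are hypothetical, and the kernel emits the actual IGMP membership report and leave messages on the application's behalf.

import socket, struct

GROUP, PORT = "239.1.1.1", 5000   # hypothetical multicast group/port for one "channel"

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))
# IP_ADD_MEMBERSHIP makes the OS issue an IGMP membership report (a "join")
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

data, src = sock.recvfrom(1500)   # e.g., seven 188-byte TS packets per datagram

# Dropping membership triggers an IGMP leave (v2 and later):
sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)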
[Figure 5.2 IGMP v2 message format: an eight-octet message comprising an 8-bit Type (0x11 membership query, sent by a multicast router; 0x12 IGMP v1 membership report, sent by a multicast host to signal participation in a specific multicast host group; 0x16 IGMP v2 membership report; 0x17 leave group, sent by a multicast host), an 8-bit maximum response time (the maximum allowed time a host can wait before sending a corresponding report; used in membership query messages), a 16-bit checksum, and a 32-bit class D group address (used in report packets).]
IGMP version 1 defined the basic mechanism. It supports a Membership Query (MQ) message and a Membership Report (MR) message. Most implementations at press time employed IGMP version 2, which adds LG (Leave Group) messages. Version 3 adds source awareness, allowing the inclusion or exclusion of sources. IGMP allows group membership lists to be dynamically maintained. The host (user) sends an IGMP "report," or join, to the router to be included in the group. Periodically, the router sends a "query" to learn which hosts (users) are still part of a group. If a host wishes to continue its group membership, it responds to the query with a "report." If the host does not send a "report," the router prunes the group list to delete this host; this eliminates unnecessary network transmissions. With IGMP v2, a host may send an LG message to alert the router that it is no longer participating in a multicast group; this allows the router to prune the group list to delete this host before the next query is scheduled, thereby minimizing the time period during which unneeded transmissions are forwarded to the network. The IGMP messages for IGMP version 2 are shown in Fig. 5.2; the message comprises an eight-octet structure. During transmission, IGMP messages are encapsulated in IP datagrams; the IP header's Protocol Type field carries the value 2 to indicate IGMP (IGMP is one of many protocols that can be specified in this field). An IGMP v2 PDU thus consists of a 20-byte IP header and 8 bytes of IGMP. Some of the areas that require consideration and technical support to develop and deploy IPTV systems include the following, among many others:

• content aggregation;
• content encoding (e.g., AVC/H.264/MPEG-4 Part 10, MPEG-2, SD, HD, Serial Digital Interface (SDI), Asynchronous Serial Interface (ASI), Layer 1 switching/routing);
• audio management;
• digital rights management/CA: encryption (DVB-CSA, AES (Advanced Encryption Standard)); key management schemes (basically, CAS); transport rights;
• encapsulation (MPEG-2 transport stream distribution);
• backbone distribution, satellite or terrestrial (DVB-S2, QPSK, 8-PSK, FEC, turbo coding for satellite; SONET/SDH/OTN (Synchronous Optical Network/Synchronous Digital Hierarchy/Optical Transport Network) for terrestrial);
• metro-level distribution;
• last-mile distribution (LAN/WAN/optics, GbE (Gigabit Ethernet), DSL/FTTH);
• multicast protocol mechanisms (IP Multicast);
• QoS, backbone distribution;
• QoS, metro-level distribution;
• QoS, last-mile distribution;
• QoS, channel surfing;
• Set-Top Box (STB)/middleware;
• QoE;
• Electronic Program Guide (EPG);
• blackouts;
• service provisioning/billing, service management;
• advanced video services (e.g., PDR and VOD);
• management and confidence monitoring;
• triple play/quadruple play.

5.1.2 Backbone
Figures 5.3–5.6 depict typical architectures for linear IPTV. Figure 5.3 shows a content aggregator preparing the content at a single source S for terrestrial distribution to multiple remote telcos. This example depicts the telcos acquiring the service from an aggregator/distributor, rather than performing that fairly complex function on their own, since it can be fairly expensive to architect, develop, set up, and maintain. The operator must sign an agreement with each content provider; hundreds of agreements are therefore required to cover the available channels and VOD content. Aggregators provide the content to the operator, typically over satellite delivery, and do much of the signal normalization and CA work. However, an additional per-channel agreement with one or more content aggregators is also needed. Note: This figure shows DSL delivery, likely ADSL2, but FTTH can also be used. Note: This figure does not show the middleware server—either distributed at the telco headends or centralized at the content aggregator.
[Figure 5.3 Typical terrestrial-based single-source IPTV system: 2D and/or 3D content sources feed encryptors at a central site, with a conditional access system and control word generator behind a firewall; dense-mode or sparse–dense multicast distribution carries the streams terrestrially to DSLAMs at the remote telcos.]
Note: This figure does not show the content acquisition; the uniform transcoding (e.g., using MPEG-4) is only hinted at by the device at the far left. Note: This figure does not show the specifics of how the ECMs (Entitlement Control Messages) and EMMs (Entitlement Management Messages) that support the CA function are distributed resiliently; this is typically done in-band for the ECMs and out-of-band (e.g., using a Virtual Private Network or VPN over the Internet) for the EMMs. Note: This figure does not show the Video On Demand (VOD) overlay, which is deployed over the same infrastructure to deliver this and other advanced services.
[Figure 5.4 Typical satellite-based single-source IPTV system: as in Fig. 5.3, the 2D and/or 3D content streams are encrypted (with conditional access system, control word generator, and firewall), then modulated, upconverted, and amplified (HPA) for the satellite uplink; receivers at each remote telco feed the DSLAMs.]
Note: This figure does not show a blackout management system, which is needed to support substitution of programming for local sports events. Note: This figure does not show how the Tribune programming data is injected into the IPTV system, which is needed for scheduling/programming support.
Figure 5.4 shows an architecture that is basically similar to that of Fig. 5.3, but the distribution to the remote telcos is done via satellite broadcast technology. Satellite delivery is typical of how cable TV operators today receive their signals from various media content producers (e.g., ABC/Disney, CNN, UPN, Discovery, and A&E). In the case of the cable TV/Multiple Systems Operators (MSOs), the operator would typically have (multiple) satellite antennas accessing multiple transponders on a satellite or on multiple satellites, and then combine these signals for distribution. See Fig. 5.5 for a pictorial example. In contrast, in the architecture of Fig. 5.6, the operator will need only one receive antenna, because the signal aggregation (CA, middleware administration, etc.) is done at the central point of content aggregation.
[Figure 5.5 Disadvantages of distributed-source IPTV: each cable TV operator site needs its own dish farm of receivers to collect 2D and/or 3D content from the various content providers, and duplicates all ancillary subsystems.]
Zooming in a bit, the technology elements (subsystems) involved in linear IPTV include the following:

• content aggregation;
• uniform transcoding;
• CA management;
• encapsulation;
• long-haul distribution;
• local distribution;
• middleware;
• STBs;
• catcher (for VOD services).
[Figure 5.6 Advantages of single-source IPTV (obviates the need for dish farms at each telco): 2D and/or 3D content from content providers passes through source receivers, video routing, H.264 encoding, IP encapsulation, and DVB modulation at a single uplink; at each telco, an IP receiver and IP router feed the DSLAM and subscriber STBs.]
Each of these technologies/subcomponents has its own specific design requirements, architectures, and considerations. Furthermore, these systems have to interoperate to yield a complete end-to-end solution that has a high QoE for the user, is easy to manage, and is reliable. In turn, each of these subsystems can be viewed as a vendor-provided platform. Different vendors have different product solutions to support these subsystems; generally, no one vendor has a true end-to-end solution. Hence, each of the following can be seen as a subsystem platform in its own right:

• content aggregation;
• uniform transcoding;
• CA management;
• encapsulation;
• long-haul distribution;
• local distribution;
• middleware;
• STBs;
• catcher (for VOD services).
5.1.3 Access
The local distribution network is typically a high-quality routed network with very tightly controlled latency, jitter, and packet loss. It generally comprises a metropolitan core tier and a consumer distribution tier (Fig. 5.7). In the metropolitan core tier, IPTV is generally transmitted using the telco's private "carrier-grade" IP network. The network engine can be pure IP-based, MPLS-based (layer "2.5"), metro Ethernet-based (layer 2), or optical SONET/OTN-based (layer 1), or a combination thereof. A (private) wireless network such as WiMAX can also be used. The backbone network supports IP Multicast, very typically PIM-DM or PIM Sparse–Dense. It is important to keep the telco-level IP network (the metropolitan core tier) streamlined, with as few routed hops as possible, plenty of bandwidth on the links, and high-power nodal routers, in order to meet the QoS requirements of IPTV.
[Figure 5.7 Distribution networks: 2D and/or 3D content sources and the conditional access system feed a backbone core; metropolitan core networks then carry the streams to consumer distribution tiers terminating on DSLAMs.]
Otherwise pixelation, tiling, waterfall effects, and even blank screens will be an issue. It is important to properly size all Layer 2 and Layer 3 devices in the network. It is also important to keep multicast traffic from "splashing back" and flooding unrelated ports; IGMP snooping and other techniques may be appropriate. The consumer distribution tier, the final leg, is generally (but not always) DSL-based at this time (e.g., VDSL or ADSL2+); other technologies such as PON (Passive Optical Network) may also be used (Table 5.2). Bandwidth in the 20–50 Mbps range is generally desirable for delivery of IPTV services.

TABLE 5.2 Consumer Distribution Tier

• "Classical": Digital Subscriber Line (DSL) delivers digital data over a copper connection, typically using the existing local loop. There are multiple DSL variants, with ADSL2 and ADSL2+ being the most prevalent. DSL is distance-sensitive and has limited bandwidth. As a result, DSL often cannot be used alone; fiber must be deployed to connect to a Digital Subscriber Line Access Multiplexer (DSLAM) located in an outside plant cabinet.
• Under deployment: Fiber to the Neighborhood (FTTN, also referred to as Fiber To The Node), where fiber is extended to each neighborhood where IPTV service is to be supported and a small DSLAM in each neighborhood supports a few dozen subscribers; Fiber-to-the-Curb (FTTC), where fiber is extended to within (typically) less than 1/10th of a mile of the subscriber site and each fiber typically supports one to three subscribers; and Fiber-to-the-Premises/Home/Subscriber/Business (FTTP, FTTH, FTTS, FTTB), where fiber reaches the subscriber site. Passive Optical Network (PON) technology can be used to deliver service using end-to-end fiber: a single fiber emanates from the Central Office, and a passive splitter in the outside plant splits the signal to support multiple subscribers. Broadband PON (BPON) supports up to 32 subscribers per port, while Gigabit PON (GPON) supports up to 128 subscribers per port.
• New/future: Fixed wireless WiMAX. Note that WiMAX supports only 17 Mbps of shared bandwidth over a 2.5-mile radius (less at greater distances), and is therefore rather limited.
• Cable operators: Hybrid Fiber Coax (HFC) is the traditional technology used by cable operators. Fiber is used for the first section, from the headend to the subscriber's neighborhood. The link is then converted to coax for the remainder of the connection, terminating at the subscriber premises.
For example, the simultaneous viewing of an HD channel along with two SD channels would require about 17 Mbps; Internet access would require additional bandwidth. Therefore, 20 Mbps is seen as a lower bound on the bandwidth. In the United States, Verizon is implementing Fiber-to-the-Premises (FTTP) technologies, delivering fiber to the subscriber's domicile; this supports high bandwidth but requires significant investments. AT&T is implementing Fiber-to-the-Curb (FTTC) in some markets, using existing copper for only the last 1/10th of a mile, and Fiber-to-the-Node (FTTN) in other markets, terminating the fiber run within a few thousand feet of the subscriber. These approaches lower the up-front cost but limit the total bandwidth. As noted, IPTV as delivered by the telephone carriers may use PON technology as an FTTH implementation technology, or perhaps VDSL2. However, if loop investments are made by these carriers, it is likely that they will be in favor of FTTH. VDSL2 may find a use in Multidwelling Units (MDUs), as we note below. The VDSL2 standard, ITU G.993.2, is an enhancement to G.993.1 (VDSL). It uses about 30 MHz of spectrum (versus 12 MHz in VDSL) and thus allows more data to be sent at higher speeds and over longer distances. VDSL2 utilizes up to 30 MHz of bandwidth to provide speeds of 100 Mbps both downstream and upstream within 1000 ft; data rates in excess of 25 Mbps are available for distances up to 4000 ft (Fig. 5.8). Figure 5.9 depicts, for illustrative purposes, test results for Zhone's VDSL2 products [2]. VDSL2 technology can handle, say, three simultaneous HDTV streams (for example, according to the firm GigaOm Research, the average US home has 3.1 televisions). Of course, there is the issue that many homes in the United States are too far from the Central Office.

[Figure 5.8 VDSL2 aggregate channel capacity (downstream + upstream, AWGN at −140 dBm/Hz) as a function of loop length (kft, 26 AWG), compared against VDSL1, ADSL2+, and ADSL2.]
[Figure 5.9 Actual VDSL2 downstream (DS) and upstream (US) line rates versus distance (ft), for profiles 8a–8d, 12a–12b, 17a, and 30a.]
The VDSL2 standard defines a set of profiles (Table 5.3) that can be used in different VDSL deployment architectures, extending the North American frequency range from 12 to 30 MHz. For example, carriers such as Verizon Communications may use VDSL2 for risers in MDUs to bring FTTH-grade service into these buildings. The carrier has been using relatively inexpensive Optical Network Terminals (ONTs, also called Optical Network Units (ONUs); ONU is IEEE terminology and ONT is ITU-T terminology) for Single Family Units (SFUs), initially using Broadband Passive Optical Network (BPON) but now also seeing Gigabit Passive Optical Network (GPON) deployment. Using this arrangement, it is not excessively expensive to bring the fiber service to the living unit of an SFU.
TABLE 5.3 VDSL2 Profiles

Profile group | Bandwidth (MHz) | Subcarrier spacing (kHz) | Maximum throughput (Mbps, downstream)
8a, 8b, 8c, 8d | 8.5 | 4.312 | 50
12a, 12b | 12 | 4.312 | 68
17a | 17.7 | 4.312 | 100
30a | 30 | 8.625 | 200

(Regional relevance spans North America, Europe, and Asia.)
However, tenants of MDUs are more expensive to service because of the cost of pulling fiber up the risers. Here is where DSL technologies still have some play: on the link between the basement and the apartment unit. For BPON, it is all VDSL1; for GPON, it is all VDSL2. The carrier will arrange locations within the MDUs so that the furthest tenant is around 500 ft away; this achieves speeds of around 35 Mbps downstream and 10 Mbps upstream on VDSL1 and BPON. On GPON/VDSL2 the carrier expects to achieve 75 Mbps downstream (Fig. 5.10). PON is the leading FTTH technology (Fig. 5.11); this section is based on material from Ref. [3]. This approach differs from most of the telecommunications networks in place today by featuring "passive" operation. Active networks such as DSL, VDSL, and cable have active components in the network backbone equipment, in the central office, in the neighborhood network infrastructure, and in the customer premises equipment. PONs employ only passive light transmission components in the neighborhood infrastructure; active components are located only in the central office and the customer premises equipment. The elimination of active components means that the access network consists of one bidirectional light source and a number of passive splitters that divide the data stream into the individual links to each customer. At the central office, the termination point is the PON's Optical Line Terminal (OLT) equipment. Between the OLT and the customer's ONTs/ONUs one finds the PON itself, comprising fiber links and passive splitters and couplers.

5.1.3.1 APON, BPON, GPON, EPON, and GE-PON. These represent various flavors of PON technology. Asynchronous Transfer Mode Passive Optical Network (APON) and BPON are the same specification, commonly referred to as BPON. BPON is the oldest PON standard, defined in the mid-1990s, and while there is an installed base of BPON, most of the new market deployment focus is now on Ethernet Passive Optical Network (EPON)/Gigabit Ethernet Passive Optical Network (GE-PON). GE-PON and EPON are different names for the same specification, defined by the IEEE 802.3ah Ethernet in the First Mile (EFM) standard ratified in 2004; this is the current standardized high-volume solution among PON technologies. GPON was being standardized as the ITU-T G.984 recommendation and is attracting interest in North America and elsewhere, but with no final standard. GPON devices have just been announced, and there is no volume deployment as yet.
[Figure 5.10 MDU use of VDSL2 (Zhone's VDSL2 products shown for illustrative purposes): a GPON feed from the IP/data networks terminates on a VDSL2 IP-DSLAM at the main distribution frame; riser copper pairs carry VDSL2 (100/100 Mbps) to per-apartment VDSL2 modems delivering voice, 10/100 Mbps data, and multiple HDTV/3DTV streams.]
5.1.3.2 Differences between BPON, GPON, and GE-PON. One important distinction between the standards is operational speed. BPON is relatively low speed, with 155 Mbps upstream/622 Mbps downstream operation. GE-PON/EPON supports 1.0 Gbps symmetrical operation. GPON supports 2.5/1.25 Gbps asymmetrical operation. Another key distinction is the protocol support for transport of data packets between access network equipment. BPON is based on ATM, GE-PON uses native Ethernet, and GPON supports ATM, Ethernet, and Wavelength Division Multiplexing (WDM) using a superset multiprotocol layer. BPON suffers from the very aggressive optical timing of ATM and the high complexity of the ATM transport layer. ATM-based FTTH solutions face a number of problems posed by (i) the provisioning process (which requires ATM-based central office equipment); (ii) the complexity (in timing requirements and protocol complexity); and (iii) the cost of components. This cost is exacerbated by the relatively small market for traditional ATM equipment used in the backbone telecommunications network. GPON is still evolving; the final specification of GPON is still being discussed by the ITU-T and Full Service Access Network (FSAN) bodies.
[Figure 5.11 PON: the Optical Line Terminal (OLT) at the central office (CO)—fed by IP networks, the telecommunications and Internet backbone, other networks, video/audio over IP, and a CATV overlay service—connects over a fiber cable through a passive optical splitter to Optical Network Units (ONUs)/ONTs and residential gateways at the customer premises equipment (CPE) in each home.]
TABLE 5.4 PON Comparison

Attribute | BPON (APON) | GE-PON (EPON) | GPON
Speed (upstream/downstream) | 155/622 Mbps | 1.0/1.0 Gbps | 1.25/2.5 Gbps
Native protocol | ATM | Ethernet | GEM
Complexity | High | Low | High
Cost | High | Low | Undetermined
Standards body | ITU-T | IEEE | ITU-T
Standard complete | Yes, 1995 | Yes, 2004 | No
Volume deployment | Yes, in 100,000s | Yes, in 1,000,000s | No
Primary deployment area | North America | Asia | Not applicable
By definition, GPON requires the complexity of supporting multiple protocols through translation to its native GEM (GPON Encapsulation Method) transport layer, which, through emulation, provides support for ATM, Ethernet, and WDM protocols. This added complexity, and the lack of standard low-cost 2.5/1.25 Gbps optical components, has delayed industry development of low-cost, high-volume GPON devices. GE-PON, or EFM, has been ratified as the IEEE 802.3ah EFM standard and is already widely deployed in Asia. It uses Ethernet as its native protocol and simplifies timing and lowers costs by using symmetrical 1 Gbps data streams with standard 1 Gbps Ethernet optical components. Similar to other Ethernet equipment found in the extended network, Ethernet-based FTTH equipment is much lower in cost relative to ATM-based equipment, and the streamlined protocol support for an extended Ethernet protocol simplifies development. Table 5.4 compares the technologies.
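As a rough comparison of what these line rates mean per subscriber, the sketch below divides each technology's downstream rate by a full split. The BPON and GPON split ratios come from Table 5.2's description, while the GE-PON 1:32 split is an assumption.

# Downstream line rate (Gbps) and an assumed full split ratio per PON port.
pon = {
    "BPON":   (0.622, 32),    # split per Table 5.2 text
    "GE-PON": (1.0,   32),    # assumed split ratio
    "GPON":   (2.5,   128),   # split per Table 5.2 text
}
for name, (gbps, split) in pon.items():
    per_sub = gbps * 1000 / split   # Mbps per subscriber at full split
    print(f"{name}: {per_sub:.0f} Mbps per subscriber at 1:{split}")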
5.2 IPv6 CONCEPTS
While it is likely that initially 3DTV will be delivered by traditional transport mechanisms, including DVB over DTH systems, recently some research efforts have focused on delivery (streaming) of 3DTV using IP. IP can be used within IPTV systems or over a shared IP infrastructure, whether a private network (shared with other applications) or the Internet (shared with a multitude of other users and applications). (Some studies have also been undertaken of late on the capabilities of DVB-H to broadcast stereo-video streams.) However, it seems that the focus so far has been on IPv4; the industry is encouraged to assess the capabilities of IPv6. While this topic is partially tangential to a core 3DTV discussion, the abundant literature on proposals for packet-based delivery of future 3DTV (including but not limited to Refs [4–13]) makes the issue relevant. IPv6, when used with header compression, is expected to be a very useful technology to support IPTV in general and 3DTV in particular in the future. For a general discussion of IPTV and DVB-H, the reader may refer to Ref. [14], among other references.
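For developers assessing IPv6 as suggested, the sketch below shows an IPv6 multicast receiver join; it parallels the IPv4/IGMP join shown earlier, with MLD (Multicast Listener Discovery) playing IGMP's role. The group address, port, and interface index 0 (the default interface) are assumptions, and IPV6_JOIN_GROUP is available on most platforms.

import socket, struct

GROUP, PORT = "ff3e::8000:1", 5000   # hypothetical IPv6 multicast group/port

sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
sock.bind(("::", PORT))
# The kernel sends the MLD listener report when the group is joined:
mreq = socket.inet_pton(socket.AF_INET6, GROUP) + struct.pack("@I", 0)
sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq)

data, src = sock.recvfrom(1500)   # receive the multicast stream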
IPv6, defined in the mid-1990s in IETF Request for Comments (RFC) 2460, "Internet Protocol, Version 6 (IPv6) Specification," and a host of more recent RFCs, is an improved, streamlined successor version of IPv4. (IPv6 was originally defined in RFC 1883, RFC 1884, and RFC 1885, December 1995; RFC 2460 obsoletes RFC 1883.) Because of market pull from the Office of Management and Budget's mandate that 24 major federal agencies in the US Government (USG) be IPv6-ready by June 30, 2008, and because of market pull from European and Asian institutions, IPv6 is expected to see gradual deployment from this point forward and in the coming decade. With IPv6 already gaining momentum globally—with major interest and activity in Europe and Asia and also some traction in the United States—the expectation is that in the next few years a (slow) transition to this new protocol will occur worldwide. An IP-based infrastructure has now become the ubiquitous underlying architecture for commercial, institutional, and USG/other (non-US) government (OG) communications and services functions. IPv6 is expected to be the next step in the industry's evolution over the past 50 years from analog, to digital, to packet, to broadband. As an example of IPv6 deployment underway, Europe has set the objective of widely implementing IPv6 by 2010; the goal is that at least 25% of users should be able to connect to the IPv6 Internet and to access their most important content and service providers without noticing a major difference compared to IPv4. IPv6 offers the potential of achieving increased scalability, reachability, end-to-end interworking, QoS, and commercial-grade robustness for data communication, mobile connectivity, and VoIP/triple-play networks. The current version of IP, IPv4, has been in use successfully for almost 30 years and poses some challenges in supporting emerging demands for address space cardinality, high-density mobility, multimedia, and strong security. This is particularly true in developing domestic and defense department applications utilizing peer-to-peer networking. IPv6 is an improved version of IP that is designed to coexist with IPv4 while providing better internetworking capabilities [14–17]. When IPv4 was conceived in the mid-1970s and defined soon thereafter (1981), it provided just over 4 billion addresses; that is not enough to provide each person on the planet with one address, without even considering the myriad of other devices and device modules needing addressability (such as, but not limited to, over 3 billion cellphones). Additionally, 74% of IPv4 addresses have been assigned to North American organizations. The goal of developers is to be able to assign IP addresses to a new class of Internet-capable devices: mobile phones, car navigation systems, home appliances, industrial equipment, and other devices (such as sensors and Body Area Network medical devices). All of these devices can then be linked together, constantly communicating, even in wireless mode. Projections show that the current generation of the Internet will "run out of space" in the near future (2010/2011) if IPv6 is not adopted around the world. IPv6 is an essential technology for ambient intelligence and will be a key driver for a multitude of new, innovative mobile/wireless applications and services [18].
IPv6 was initially developed in the early 1990s because of the anticipated need for more end system addresses based on anticipated Internet growth, encompassing mobile phone deployment, smart home appliances, and billions of new users in developing countries (e.g., in China and India). New technologies and applications such as VoIP, "always-on access" (e.g., DSL and cable), Ethernet-to-the-home, converged networks, and evolving ubiquitous computing applications will continue to drive this need even more in the next few years [19]. IPv6 features, in comparison with IPv4, include the following [20]:

• Expanded Addressing Capabilities: IPv6 increases the IP address size from 32 bits to 128 bits to support more levels in the addressing hierarchy, a much greater number of addressable nodes, and simpler autoconfiguration of addresses. The scalability of multicast routing is improved by adding a "scope" field to multicast addresses. A new type of address called an "anycast address" is also defined, used to send a packet to any one of a group of nodes.
• Header Format Simplification: Some IPv4 header fields have been dropped or made optional, to reduce the common-case processing cost of packet handling and to limit the bandwidth cost of the IPv6 header.
• Authentication and Privacy Capabilities: In IPv6, security is built in as part of the protocol suite: extensions to support authentication, data integrity (encryption), and (optional) data confidentiality are specified for IPv6. The security features of IPv6 are described in the Security Architecture for the Internet Protocol, RFC 2401 [21], along with RFC 2402 [22] and RFC 2406 [23]; Internet Protocol Security (IPsec), defined in these RFCs, is required (mandatory). IPsec is a set of protocols and related mechanisms that supports confidentiality and integrity. (IPsec was originally developed as part of the IPv6 specification, but due to the need for security in the IPv4 environment, it has also been adapted for IPv4.)
• Flow Labeling Capability: A new feature is added to enable the labeling of packets belonging to particular traffic "flows" for which the sender requests special handling, such as non-default quality of service or "real-time" service. Services such as VoIP and IP-based entertainment video delivery (IPTV) are becoming broadly deployed, and flow labeling, especially in the network core, can be very beneficial.
• Improved Support for Extensions and Options: Changes in the way IP header options are encoded allow for more efficient forwarding, less stringent limits on the length of options, and greater flexibility for introducing new options in the future.

End systems (such as PCs and servers), network elements (customer-owned and/or carrier-owned), and (perhaps) applications need to be IPv6-aware to communicate in the IPv6 environment. IPv6 has been enabled on many computing platforms. At this juncture, many operating systems come with IPv6 enabled by default;
IPv6-ready operating systems (OSs) include, but are not limited to, Mac OS X, OpenBSD, NetBSD, FreeBSD, Linux, Windows Vista, Windows XP (Service Pack 2), Windows 2003 Server, and Windows 2008 Server. Java began supporting IPv6 with J2SE 1.4 (in 2002) on Solaris and Linux; support for IPv6 on Windows was added with J2SE 1.5. Other languages, such as C and C++, also support IPv6. At this time the number of applications with native IPv6 support is significant: most important networking applications provide it. Vendors including Apple Computer, Cisco Systems, HP, Hitachi, IBM, and Microsoft support IPv6.

One should note that IPv6 was designed with security in mind, but at the current time its implementation and deployment are (much) less mature than is the case for IPv4. When IPv4 was developed in the early 1980s, security was not a consideration; a number of mechanisms have since been added to address security considerations in IP. When IPv6 was developed in the early-to-mid 1990s, security was a consideration; hence, a number of mechanisms have been built into the protocol from the outset to furnish security capabilities to IP.5

A presentation delivered during an open session at the July 2007 ICANN Public Meeting in San Juan, Puerto Rico made note of the accelerated depletion rate of IPv4 addresses and the growing difficulties the Regional Internet Registries (RIRs) are experiencing in allocating contiguous address blocks of sufficient size to service providers. Furthermore, the fragmentation of the IPv4 address space is taxing and stressing the global routing fabric, and the near-term expectation is that the RIRs will impose more restrictive IPv4 allocation policies and promote a rapid adoption of IPv6 addresses [24]. The IPv4 address space is expected to run out by 2012.6 Appendix A5 provides some detailed information on IPv6.

5 Some purists will argue (perhaps as an exercise in semantics) that since IPsec is also available for IPv4, IPv6 and IPv4 have the same level of security. We take the approach in this text that since the use of IPsec is mandated as required in IPv6 while it is optional in IPv4, at the practical, actual level "IPv6 is more secure."
6 There has been talk about reclaiming unused IPv4 space, but it would be a huge undertaking. Reclaiming some portion of the IPv4 space would not help with the goal of providing an addressable IP address to appliances, cell phones, sensors (such as Smart Dust), surveillance cameras, Body-Area-Network devices, Unmanned Aerial Vehicles, and so on.

REFERENCES

1. Minoli D. IP multicast with applications to IPTV and mobile DVB-H. New York: Wiley/IEEE Press; 2008.
2. Zhone Technologies. Zhone VDSL2 technology. Zhone Technologies, Inc., Oakland (CA). Nov 2009.
3. PMC. FTTH Fiber to the Home Overview. Whitepaper. PMC-Sierra, Santa Clara (CA). 2009.
4. Gurses E, Akar GB, Akar N. Optimal packet scheduling and rate control for video streaming. SPIE Visual Comm. and Image Processing (VCIP). Jan 2007.
5. Kurutepe E, Civanlar MR, Tekalp AM. Client-driven selective streaming of multi-view video for interactive 3DTV. Submitted to IEEE Trans. CSVT. Dec 2006.
6. Kurutepe E, Civanlar MR, Tekalp AM. Interactive transport of multi-view videos for 3DTV applications. J Zhejiang Univ - Sci A: Appl Phys Eng 2006; 7(5): 830–836.
7. Petrovic G, de With PHN. Near-future streaming framework for 3D TV applications. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME); Jul 2006; Toronto, Canada. pp. 1881–1884.
8. Tekalp AM, Kurutepe E, Civanlar MR. 3DTV over IP: End-to-end streaming of multi-view video. IEEE Signal Process Mag 2007; 24(6): 77–87.
9. Tekalp AM. 3D media delivery over IP. IEEE Multimed Comm Tech Committ E-Lett 2009; 4(3).
10. Thomos N, Argyropoulos S, Boulgouris NV, Strintzis MG. Robust transmission of H.264/AVC streams using adaptive group slicing and unequal error protection. EURASIP J Appl Signal Process 2006.
11. Argyropoulos S, Tan AS, Thomos N, Arikan E, Strintzis MG. Robust transmission of multi-view video streams using flexible macroblock ordering and systematic LT codes. Proceedings of the 3DTV conference "3DTV-CON"; May 2007; Kos Island, Greece.
12. Hu S-Y. A case for 3D streaming on peer-to-peer networks. Proceedings of ACM Web3D; April 2006; Columbia (MD). pp. 57–64.
13. Sung W-L, Hu S-Y, Jiang J-R. Selection strategies for peer-to-peer 3D streaming. Proceedings of ACM NOSSDAV; 2008; Braunschweig, Germany. pp. 15–20.
14. Amoss J, Minoli D. Handbook of IPv4 to IPv6 transition methodologies for institutional & corporate networks. Boca Raton (FL): Taylor and Francis; 2008.
15. Minoli D, Kouns J. Security in an IPv6 environment. Boca Raton (FL): Taylor and Francis; 2009.
16. Minoli D. Satellite systems engineering in an IPv6 environment. Boca Raton (FL): Taylor and Francis; 2009.
17. Minoli D. Voice over IPv6—architecting the next-generation VoIP. New York: Elsevier; 2006.
18. Directorate-Generals Information Society. IPv6: Enabling the Information Society. European Commission Information Society, Europe Information Society Portal. Feb 18 2008.
19. IPv6 Portal. http://www.ipv6tf.org/meet/faqs.php.
20. Postel J. Internet Protocol. STD 5, RFC 791. Sep 1981.
21. Kent S, Atkinson R. Security Architecture for the Internet Protocol. RFC 2401. Nov 1998.
22. Kent S, Atkinson R. IP Authentication Header. RFC 2402. Nov 1998.
23. Kent S, Atkinson R. IP Encapsulating Security Payload (ESP). RFC 2406. Nov 1998.
24. ICANN Security and Stability Advisory Committee (SSAC). Survey of IPv6 Support in Commercial Firewalls. Oct 2007; Marina del Rey, CA.
25. Lioy A. Security features of IPv6. In: Gai S, editor. Internetworking IPv6 with Cisco routers. McGraw-Hill; 1998. Chapter 8. Available at www.ip6.com/us/book/Chap8.pdf.
26. An IPv6 Security Guide for U.S. Government Agencies—Executive Summary. The IPv6 World Report Series, Volume 4. Juniper Networks, Sunnyvale (CA). Feb 2008.
27. Kaeo M, Green D, Bound J, Pouffary Y. IPv6 Security Technology Paper. North American IPv6 Task Force (NAv6TF) Technology Report. Jul 22 2006.
28. Yaiz RA, Öztürk O. Mobility in IPv6. The IPv6 Portal. 2000. www.ipv6tf.org.
29. Srisuresh P, Egevang K. Traditional IP Network Address Translator (Traditional NAT). RFC 3022. Jan 2001.
30. Microsoft Corporation. MSDN Library, Internet Protocol. 2004. http://msdn.microsoft.com.
31. Hermann-Seton P. Security Features in IPv6. SANS Institute, Information Security Reading Room. 2002.
32. Ertekin E, Christou C. IPv6 Header Compression. North American IPv6 Summit. Booz Allen Hamilton. Jun 2004.
33. Donzé F. IPv6 autoconfiguration. The Internet Protocol Journal 2004; 7(2). Available at http://www.cisco.com. San Jose, CA.
34. Desmeules R. Cisco self-study: implementing Cisco IPv6 networks (IPv6). Cisco Press; 2003.
35. 6NET. D2.2.4: Final IPv4 to IPv6 Transition Cookbook for Organizational/ISP (NREN) and Backbone Networks. Version 1.0, Project Number IST-2001-32603, CEC Deliverable Number 32603/UOS/DS/2.2.4/A1. Feb 4 2005.
36. Gilligan R, Nordmark E. Transition Mechanisms for IPv6 Hosts and Routers. RFC 2893. Aug 2000.
37. Shin M-K, Hong Y-G, Hagino J, Savola P, et al. Application aspects of IPv6 transition. RFC 4038. Mar 2005.
38. Warfield MH. Security implications of IPv6. 16th Annual FIRST Conference on Computer Security Incident Handling; Jun 13–18 2004; Budapest, Hungary. X-Force, Internet Security Systems, Inc. (ISS).
39. Commission of the European Communities. Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions—Advancing the Internet: Action Plan for the deployment of Internet Protocol version 6 (IPv6) in Europe. Brussels. May 27 2008.
APPENDIX A5: IPv6 BASICS
This appendix does not discuss 3DTV technology per se; however, the position is taken that 3DTV designers considering IP-based distribution networks should give serious consideration to utilizing IPv6, if not immediately, then as part of a transition plan.
A5.1 IPv6 Overview
While the basic function of the IP is to move information across networks, IPv6 has more capabilities built into its foundation than IPv4. A key capability is the significant increase in address space. For example, all devices could have a public IP address, so that they can be uniquely tracked.7 Today, inventory management of dispersed assets in a very large organization such as the United States Department of Defense (DoD) cannot be achieved with IP mechanisms; during the inventory cycle someone has to manually verify the location of each desktop computer. With IPv6 one can use the network to verify that such equipment is there; even non-IT equipment in the field can be tracked, by having an IP address permanently assigned to it. IPv6 also has extensive automatic configuration (autoconfiguration) mechanisms that reduce the IT burden, making configuration essentially plug-and-play (autoconfiguration implies that a Dynamic Host Configuration Protocol (DHCP) server is not needed and/or does not have to be configured). Since IPv4 manual configuration is already a challenge in itself, one can appreciate that manually manipulating IPv6 addresses, which are four times longer, would be even more problematic. Corporations and government agencies will be able to achieve a number of improvements with IPv6, such as, but not limited to, the following:

• expanded addressing capabilities;
• serverless autoconfiguration (what some call "plug-n-play") and reconfiguration;
• streamlined header format and flow identification;
• end-to-end security, with built-in, strong IP-layer encryption and authentication (embedded security support with mandatory IPsec implementation);
• easier and more standard VPN creation than in IPv4, because of the Authentication Header (AH) and Encapsulating Security Payload (ESP) Extension Headers; the performance penalty is lower for a VPN implemented in IPv6 than for one built in IPv4 [25];
• enhanced support for multicast and QoS (more refined support for flow control and QoS for the near-real-time delivery of data);
• more efficient and robust mobility mechanisms (enhanced support for Mobile IP and mobile computing devices);
7 Note that this has some potential negative security implications, as attackers could compromise a machine and then know exactly how to go back to that same machine again. Therefore, reliable security mechanisms need to be understood and put in place in IPv6 environments.
• extensibility: improved support for feature options/extensions;
• the ability for nodes to have multiple IPv6 addresses on the same network interface; this creates the opportunity for users to establish overlay or Communities of Interest (COI) networks on top of other physical IPv6 networks, where departments, groups, or other users and resources can belong to one or more COIs, each with its own specific security policy [26];
• easier merging of networks: merging two IPv4 networks with overlapping addresses (say, if two organizations merge) is complex, whereas it will be much easier to merge networks with IPv6;
• easy adaptation to an end-to-end security model, in which the end hosts have the responsibility of providing the security services necessary to protect any data traffic between them; this results in greater flexibility for creating policy-based trust domains that are based on varying parameters, including node address and application [27].

IPv6 basic capabilities include the following:
• addressing,
• anycast,
• flow labels,
• ICMPv6,
• Neighbor Discovery (ND).
Table A5.1 shows the core protocols that comprise IPv6.

TABLE A5.1 Key IPv6 Protocols

• Internet Protocol Version 6 (IPv6), RFC 2460: IPv6 is a connectionless datagram protocol used for routing packets between hosts.
• Internet Control Message Protocol for IPv6 (ICMPv6), RFC 2463: A mechanism that enables hosts and routers that use IPv6 communication to report errors and send status messages.
• Multicast Listener Discovery (MLD), RFC 2710, RFC 3590, RFC 3810: A mechanism that enables one to manage subnet multicast membership for IPv6. MLD uses a series of three ICMPv6 messages. MLD replaces the Internet Group Management Protocol (IGMP) v3 that is employed for IPv4.
• Neighbor Discovery (ND), RFC 2461: A mechanism that is used to manage node-to-node communication on a link. ND uses a series of five ICMPv6 messages. ND replaces Address Resolution Protocol (ARP), ICMPv4 Router Discovery, and the ICMPv4 Redirect message. ND is implemented using the Neighbor Discovery Protocol (NDP).
IP was designed in the 1970s for the purpose of connecting computers that were in separate geographic locations. Computers in a campus were connected
by means of local networks, but these local networks were separated into essentially stand-alone islands. "Internet," as a name to designate the protocol and, more recently, the worldwide information network, simply means "internetwork," that is, a connection between multiple networks. In the beginning, the protocol had only military use in mind, but computers from universities and enterprises were quickly added. The Internet as a worldwide information network is the result of the practical application of the IP protocol, that is, the result of the interconnection of a large set of information networks [19].

Starting in the early 1990s, developers realized that the communication needs of the twenty-first century required a protocol with some new features and capabilities, while at the same time retaining the useful features of the existing protocol. While link-level communication does not generally require a node identifier (address), since the device is intrinsically identified with the link-level address, communication over a group of links (a network) does require unique node identifiers (addresses). The IP address is an identifier that is applied to each device connected to an IP network. In this setup, the different elements taking part in the network (servers, routers, desktop computers, etc.) communicate among each other using their IP addresses as entity identifiers. In version 4 of the IP protocol, addresses consist of four octets. For ease of human use, IP addresses are represented as four decimal numbers separated by periods, for example 166.74.110.83, where each decimal number is shorthand for (and corresponds to) the binary code of the byte in question (an 8-bit number takes a value in the 0–255 range). Since the IPv4 address has 32 bits, there are nominally 2^32 different IP addresses (approximately 4 billion nodes, if all combinations are used). The Domain Name System (DNS) also helped human usability in the context of IPv4; DNS is going to be even more critical in IPv6 and will have a substantial impact on security administrators that use IP addresses to define security policies (e.g., in firewalls).

IPv4 has proven, by means of its long life, to be a flexible and powerful networking mechanism. However, IPv4 is starting to exhibit limitations, not only with respect to the need for an increase of the IP address space, driven, for example, by new populations of users in countries such as China and India, and by new technologies with "always connected" devices (DSL, cable, networked Personal Digital Assistants (PDAs), 2.5G/3G mobile telephones, etc.), but also in reference to a potential global rollout of VoIP. IPv6 creates a new IP address format, so that the number of IP addresses will not be exhausted for several decades or longer, even though an entirely new crop of devices is expected to connect to the Internet. IPv6 also adds improvements in areas such as routing and network autoconfiguration. Specifically, new devices that connect to the Internet will be "plug-and-play" devices. With IPv6 one is not required to configure dynamic unpublished local IP addresses, the gateway address, the subnetwork mask, or any other parameters. The equipment, when plugged into the network, automatically obtains all requisite configuration data [19]. The advantages of IPv6 can be summarized as follows:
• Scalability: IPv6 has 128-bit addresses versus 32-bit IPv4 addresses. With IPv4 the theoretical number of available IP addresses is 2^32 ≈ 10^10. IPv6 offers a 2^128 space; hence, the number of available unique node addresses is 2^128 ≈ 3.4 × 10^38.
• Security: IPv6 includes security features in its specifications, such as payload encryption and authentication of the source of the communication.
• Real-Time Applications: To provide better support for real-time traffic (e.g., VoIP), IPv6 includes "labeled flows" in its specifications. By means of this mechanism, routers can recognize the end-to-end flow to which transmitted packets belong. This is similar to the service offered by MPLS, but it is intrinsic to the IP mechanism rather than an add-on; moreover, it preceded this MPLS feature by a number of years.
• "Plug-and-Play": IPv6 includes a "plug-and-play" mechanism that facilitates the connection of equipment to the network. The requisite configuration is automatic.
• Mobility: IPv6 includes more efficient and enhanced mobility mechanisms, which are important for mobile networks.8
• Optimized Protocol: IPv6 embodies IPv4 best practices but removes unused or obsolete IPv4 characteristics. This results in a better-optimized Internet protocol.
• Addressing and Routing: IPv6 improves the addressing and routing hierarchy.
• Extensibility: IPv6 has been designed to be extensible and offers support for new options and extensions.

With IPv4, the 32-bit address can be represented as AdrClass|netID|hostID. The network portion can contain either a network ID or a network ID and a subnet. Every network and every host or device has a unique address, by definition. Basic NATing is a method by which IP addresses (specifically IPv4 addresses) are transparently mapped from one group to another. Specifically, private "unregistered" addresses are mapped to a small set (as small as 1) of public registered addresses; this impacts the general addressability, accessibility, and "individuality" of the device. Network Address Port Translation (NAPT), also referred to as Port Address Translation (PAT), is a method by which many network addresses and their TCP/UDP ports are translated into a single network address and its TCP/UDP ports. Together, these two methods, referred to as traditional Network Address Translation (NAT), provide a mechanism to connect a realm with private addresses to an external realm with globally unique registered addresses [29].
8 Some of the benefits of IPv6 in the context of mobility include [28]: (i) larger addresses, which allow new techniques to be used for the Mobile Node (MN) to obtain a care-of address; here MNs can always get a collocated care-of address, a fact that removes the need for a Foreign Agent (FA); (ii) a new routing header, which allows for proper use of source routing (this was not possible with IPv4); (iii) the AH, which allows for the authentication of the binding messages; and (iv) the destination options header, which allows options to be used without significant performance degradation; performance degradation could occur in IPv4 because every router along the path had to examine the options even when they were only destined for the receiver of the packet.
NAT is a short-term solution for the anticipated Internet growth requirements of this decade, and a better solution is needed for address exhaustion. There is a clear recognition that NAT techniques make the Internet, the applications, and even the devices more complex (especially when conducting business-to-business transactions), and this means a cost overhead [19]. Overlapping encryption domains have been a substantial issue for organizations to deal with when creating gateway-to-gateway VPNs. The expectation is that IPv6 can make IP devices less expensive, more powerful, and even consume less power; the power issue is not only important for environmental reasons, but also improves operability (e.g., longer battery life in portable devices, such as mobile phones).

IPv4 addresses can be from an officially assigned public range or from an internal intranet private (but not globally unique) block. Internal intranet addresses may be in the ranges 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16, as suggested in RFC 1918. In the case of an internal intranet private address, a NAT function is employed to map the internal addresses to an external public address when the private-to-public network boundary is crossed. This, however, imposes a number of limitations, particularly since the number of registered public addresses available to a company is almost invariably much smaller (as small as 1) than the number of internal devices requiring an address. As noted, IPv4 theoretically allows up to 2^32 addresses, based on a four-octet address space. Public, globally unique addresses are assigned by the Internet Assigned Numbers Authority (IANA). IP addresses are addresses of network nodes at layer 3; each device on a network (whether the Internet or an intranet) must have a unique address. In IPv4, it is a 32-bit (4-byte) binary address used to identify the device. It is represented by the nomenclature a.b.c.d, each of a, b, c, and d being from 1 to 255 (0 has a special meaning). Examples are 167.168.169.170, 232.233.229.209, and 200.100.200.100.

The problem is that during the 1980s many public, registered addresses were allocated to firms and organizations without any consistent control. As a result, some organizations have more addresses than they actually need, giving rise to the present dearth of available "registerable" Layer 3 addresses. Furthermore, not all IP addresses can be used due to the fragmentation described above. One approach to the issue would be a renumbering and a reallocation of the IPv4 addressing space. However, this is not as simple as it appears, since it requires significant worldwide coordination efforts; moreover, it would not solve the medium-term need for a much larger address space for evolving end-user/consumer applications, and it would still be limited for the human population and the quantity of devices that will be connected to the Internet in the medium-term future [19].
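As an illustrative aside (the snippet is not from the original text), the RFC 1918 private ranges just mentioned can be recognized programmatically; a minimal sketch using Python's standard-library ipaddress module:

import ipaddress

# 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16 are the RFC 1918 private blocks
for addr in ("10.1.2.3", "172.20.0.5", "192.168.1.10", "166.74.110.83"):
    ip = ipaddress.ip_address(addr)
    print(addr, "->", "private (RFC 1918, typically NATed)" if ip.is_private else "public")

The first three addresses fall inside the private blocks and would be translated at a NAT boundary; the last is a public address of the kind discussed above.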
At this juncture, and as a temporary and pragmatic approach to alleviate the dearth of addresses, NAT mechanisms are employed by organizations and even home users. This mechanism consists of using only a small set of public IPv4 addresses for an entire network to access the Internet. The myriad internal devices are assigned IP addresses from a specifically designated range of Class A or Class C addresses that are locally unique but are duplicatively used and reused within various organizations. In some cases (e.g., residential Internet access via DSL or cable), the legal IP address is only provided to a user on a time-lease basis, rather than permanently. A number of protocols cannot travel through a NAT device, and hence the use of NAT implies that many applications (e.g., VoIP) cannot be used effectively in all instances.9 As a consequence, these applications can only be used in intranets. Examples include the following [19]:

• Multimedia applications such as videoconferencing, VoIP, or VOD/IPTV do not work smoothly through NAT devices. Multimedia applications make use of RTP and the Real-Time Control Protocol (RTCP). These in turn use UDP with dynamic allocation of ports, and NAT does not directly support this environment.
• IPsec is used extensively for data authentication, integrity, and confidentiality. However, when NAT is used, IPsec operation is impacted, since NAT changes the address in the IP header.
• Multicast, although possible in theory, requires complex configuration in a NAT environment and hence, in practice, is not utilized as often as could be the case.

The need for obligatory use of NAT disappears with IPv6 (but it can still be used if so desired). The format of IPv6 addressing is described in RFC 2373. As noted, an IPv6 address consists of 128 bits, rather than 32 bits as with IPv4 addresses. The number of bits correlates to the address space, as follows:
IPv6: 128 bits, which allows for 2^128 or 340,282,366,920,938,463,463,374,607,431,768,211,456 (3.4 × 10^38) possible addresses.
IPv4: 32 bits, which allows for 2^32 or 4,294,967,296 possible addresses.

9 The reader should be aware that we are not referring here to deploying corporate VoIP for an organization of 10, 1000, or 10,000 employees and then being able to pass VoIP protocols over the firewall. That is a fairly trivial exercise. We are referring here to the overarching goal of enabling any-person-on-the-planet-to-any-other-person-on-the-planet VoIP-based communication by affording a consistent, stable, and publishable addressing scheme. The US Bell System and the telecommunications world solved that problem over half a century ago, by giving the world a telephony addressing scheme that allows every person in the world to have a unique, persistent, usable telephone number (Country Code + City (if applicable) + Local number), from Antarctica (+672) to Zimbabwe (+263), from Easter Island (+56) to Tristan da Cunha (+290), and every land and island in between.
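The two address-space figures are easy to verify directly; for example, in Python:

print(2 ** 32)    # 4294967296
print(2 ** 128)   # 340282366920938463463374607431768211456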
The relatively large size of the IPv6 address is designed to be subdivided into hierarchical routing domains that reflect the topology of the modern-day Internet. The use of 128 bits provides multiple levels of hierarchy and flexibility in designing hierarchical addressing and routing. The IPv4-based Internet currently lacks this flexibility [30].

The IPv6 address is represented as 8 groups of 16 bits each, separated by the ":" character. Each 16-bit group is represented by 4 hexadecimal digits, that is, each digit has a value between 0 and F (0, 1, 2, . . . , A, B, C, D, E, F, with A = 10, B = 11, etc., up to F = 15, in decimal). What follows is an example of a hypothetical IPv6 address:

3223:0BA0:01E0:D001:0000:0000:D0F0:0010

If one or more four-digit groups is 0000, the zeros may be omitted and replaced with two colons (::). For example, 3223:0BA0:: is the abbreviated form of the following address:

3223:0BA0:0000:0000:0000:0000:0000:0000

Similarly, only one 0 is written when removing 0's on the left side of a group. For example, the address 3223:BA0:0:0:0:0:0:1234 is the abbreviated form of the following address:

3223:0BA0:0000:0000:0000:0000:0000:1234

There is also a method to designate groups of IP addresses or subnetworks that is based on specifying the number of bits that designate the subnetwork, beginning from left to right, and using the remaining bits to designate single devices inside the network. For example, the notation 3223:0BA0:01A0::/48 indicates that the part of the IP address used to represent the subnetwork has 48 bits. Since each hexadecimal digit has 4 bits, this points out that the part used to represent the subnetwork is formed by 12 digits, that is, "3223:0BA0:01A0." The remaining digits of the IP address would be used to represent nodes inside the network.
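These notation rules (zero compression, leading-zero suppression, prefix length) are mechanical and, as an illustrative sketch not taken from the original text, Python's standard-library ipaddress module applies them automatically to the hypothetical addresses above:

import ipaddress

addr = ipaddress.IPv6Address("3223:0BA0:01E0:D001:0000:0000:D0F0:0010")
print(addr)               # 3223:ba0:1e0:d001::d0f0:10 - zeros compressed automatically
net = ipaddress.IPv6Network("3223:0BA0:01A0::/48")
print(net.prefixlen)      # 48 - the bits that designate the subnetwork
print(net.num_addresses)  # 2**80 nodes addressable inside this /48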
There are a number of special IPv6 addresses, as follows:

• Autoreturn or Loopback Virtual Address: This address is specified in IPv4 as the 127.0.0.1 address. In IPv6, this address is represented as ::1.
• Unspecified Address (::): This address is not allocated to any node, since it is used to indicate the absence of an address.
• IPv6 over IPv4 Dynamic/Automatic Tunnel Addresses: These addresses are designated as IPv4-compatible IPv6 addresses and allow the sending of IPv6 traffic over IPv4 networks in a transparent manner. For example, they are represented as ::156.55.23.5.
• IPv4 over IPv6 Address Automatic Representation: These addresses allow IPv4-only nodes to still work in IPv6 networks. They are designated as IPv4-mapped IPv6 addresses and are represented as ::FFFF: (e.g., ::FFFF:156.55.43.3).

Like IPv4, IPv6 is a connectionless, unreliable datagram protocol used primarily for addressing and routing packets between hosts. Connectionless means that a session is not established before exchanging data. Unreliable means that delivery is not guaranteed. IPv6 always makes a best-effort attempt to deliver a packet. An IPv6 packet might be lost, delivered out of sequence, duplicated, or delayed. IPv6 per se does not attempt to recover from these types of errors. The acknowledgment of packets delivered and the recovery of lost packets is done by a higher-layer protocol, such as TCP [30]. From a packet forwarding perspective, IPv6 operates just like IPv4. An IPv6 packet, also known as an IPv6 datagram, consists of an IPv6 header and an IPv6 payload, as shown in Fig. A5.1. The IPv6 header consists of two parts, the IPv6 base header and optional extension headers (Fig. A5.2). Functionally, the optional extension headers and the upper-layer protocols, for example
Figure A5.1 IPv6 packet. (The figure shows the base header fields: Version, Traffic Class, Flow Label, Payload Length, Next Header, Hop Limit, Source Address, and Destination Address, followed by the payload.)
Figure A5.2 IPv6 extension headers. IPv6 extension headers are optional headers that may follow the basic 40-octet IPv6 header. An IPv6 PDU may include zero, one, or multiple extension headers. When multiple extension headers are used, they form a chained list of headers, each identified by the "next header" field of the previous header.
TCP, are considered part of the IPv6 payload. Table A5.2 shows the fields in the IPv6 base header. IPv4 headers and IPv6 headers are not directly interoperable: hosts and/or routers must use an implementation of both IPv4 and IPv6 in order to recognize and process both header formats (Fig. A5.3). This gives rise to a number of complexities in the migration process between the IPv4 and the IPv6 environments. The IP header in IPv6 has been streamlined and defined to be of a fixed length (40 bytes). In IPv6, header fields from the IPv4 header have been removed, renamed, or moved to the new optional IPv6 Extension Headers. The Header Length field is no longer needed, since the IPv6 header is now a fixed-length entity. The IPv4 Type of Service field is equivalent to the IPv6 Traffic Class field. The Total Length field has been replaced with the Payload Length field. Since IPv6 only allows fragmentation to be performed by the IPv6 source and destination nodes, and not by individual routers, the IPv4 segment control fields (Identification, Flags, and Fragment Offset) have been moved to similar fields within the Fragment Extension Header. The functionality provided by the Time to Live (TTL)10 field has been replaced with the Hop Limit field. The Protocol field has been replaced with the Next Header Type field. The Header Checksum field was removed, which has the main advantage that each relay no longer spends time processing a checksum.
10 TTL has been used in many attacks and Intrusion Detection System (IDS) tricks in IPv4.
TABLE A5.2 IPv6 Base Header

• Version (4 bits): Identifies the version of the protocol. For IPv6, the version is 6.
• Traffic class (8 bits): Intended for originating nodes and forwarding routers to identify and distinguish between different classes or priorities of IPv6 packets.
• Flow label (20 bits): (Sometimes referred to as Flow ID.) Defines how traffic is handled and identified. A flow is a sequence of packets sent either to a unicast or a multicast destination. This field identifies packets that require special handling by the IPv6 node. If a host or router does not support flow label functions, the field is set to zero when sending a packet and ignored when receiving one.
• Payload length (16 bits): Identifies the length, in octets, of the payload. This field is a 16-bit unsigned integer. The payload includes the optional extension headers as well as the upper-layer protocols, for example, TCP.
• Next header (8 bits): Identifies the header immediately following the IPv6 header. Example values: 0 = hop-by-hop options, 1 = ICMPv4, 4 = IP in IP (encapsulation), 6 = TCP, 17 = UDP, 43 = routing, 44 = fragment, 50 = encapsulating security payload, 51 = authentication, 58 = ICMPv6.
• Hop limit (8 bits): Identifies the number of network segments, also known as links or subnets, on which the packet is allowed to travel before being discarded by a router. The hop limit is set by the sending host and is used to prevent packets from endlessly circulating on an IPv6 internetwork. When forwarding an IPv6 packet, routers must decrease the hop limit by 1 and must discard the IPv6 packet when the hop limit reaches 0.
• Source address (128 bits): Identifies the IPv6 address of the original source of the IPv6 packet.
• Destination address (128 bits): Identifies the IPv6 address of the intermediate or final destination of the IPv6 packet.
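Since the base header is a fixed 40-octet structure, parsing it is mechanical. The following minimal sketch (illustrative only; field layout per RFC 2460) unpacks the fields listed in Table A5.2:

import struct

def parse_ipv6_header(packet: bytes) -> dict:
    # First 8 octets: version (4 bits), traffic class (8), flow label (20),
    # payload length (16), next header (8), hop limit (8)
    if len(packet) < 40:
        raise ValueError("IPv6 base header is always 40 octets")
    first_word, payload_length, next_header, hop_limit = struct.unpack("!IHBB", packet[:8])
    return {
        "version": first_word >> 28,               # 6 for IPv6
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_length,          # octets, includes extension headers
        "next_header": next_header,                # e.g., 6 = TCP, 17 = UDP, 58 = ICMPv6
        "hop_limit": hop_limit,
        "source": packet[8:24],                    # 128-bit source address
        "destination": packet[24:40],              # 128-bit destination address
    }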
Figure A5.3 Comparison of IPv4 and IPv6 headers.
The Options field is no longer part of the header as it was in IPv4. Options are specified in the optional IPv6 Extension Headers. The removal of the Options field from the header enables more efficient routing; only the information that is needed by a router needs to be processed [31]. One area requiring consideration, however, is the length of the IPv6 PDU: the 40-octet header can be a problem for real-time IP applications such as VoIP and IPTV, so header compression becomes critical [32].11 Also, there will be some bandwidth inefficiency in general, which could be an issue in limited-bandwidth environments or applications (e.g., sensor networks).
11 Two compression protocols have emerged from the IETF in recent years [32]: (i) Internet Protocol Header Compression (IPHC), a scheme designed for low Bit Error Rate (BER) links (compression profiles are defined in RFC 2507 and RFC 2508); it provides compression of TCP/IP, UDP/IP, RTP/UDP/IP, and ESP/IP headers; "enhanced" compression of RTP/UDP/IP (ECRTP) headers is defined in RFC 3545. (ii) Robust Header Compression (ROHC), from the ROHC Working Group, a scheme designed for wireless links that provides greater compression than IPHC at the cost of greater implementation complexity (compression profiles are defined in RFC 3095 and RFC 3096); it is more suitable for high-BER, long Round Trip Time (RTT) links and supports compression of ESP/IP, UDP/IP, and RTP/UDP/IP headers.
"Autoconfiguration" is a new characteristic of the IPv6 protocol that facilitates network management and system setup tasks by users. This characteristic is often called "plug-and-play" or "connect-and-work." Autoconfiguration facilitates initialization of user devices: after connecting a device to an IPv6 network, one or several IPv6 globally unique addresses are automatically allocated. DHCP allows systems to obtain an IPv4 address and other required information (e.g., default router or DNS server). A similar protocol, DHCPv6, has been published for IPv6. DHCP and DHCPv6 are known as stateful protocols because they maintain tables on (specialized) servers. However, IPv6 also has a new stateless autoconfiguration protocol that has no equivalent in IPv4. The stateless autoconfiguration protocol does not require a server component because there is no state to maintain (a DHCP server may typically run in a router or firewall). Every IPv6 system (other than routers) is able to build its own unicast global address. Stateless Address Autoconfiguration (SLAAC) provides an alternative between a purely manual configuration and stateful autoconfiguration [33]. "Stateless" autoconfiguration is also described as "serverless"; the acronym SLAAC is also used for serverless address autoconfiguration. SLAAC is defined in RFC 2462. With SLAAC, the presence of configuration servers to supply profile information is not required. The host generates its own address using a combination of the information that it possesses (in its interface or network card) and the information that is periodically supplied by the routers. Routers determine the prefix that identifies networks associated with the link under discussion. The "interface identifier" identifies an interface within a subnetwork and is often, and by default, generated from the Media Access Control (MAC) address of the network card. The IPv6 address is built by combining the 64 bits of the interface identifier with the prefixes that routers determine as belonging to the subnetwork. If there is no router, the interface identifier is self-sufficient to allow the PC to generate a "link-local" address. The "link-local" address is sufficient to allow communication between several nodes connected to the same link (the same local network).

IPv6 addresses are "leased" to an interface for a fixed established time (including an infinite time). When this "lifetime" expires, the link between the interface and the address is invalidated, and the address can be reallocated to other interfaces. For the suitable management of address expiration times, an address goes through two states (stages) while it is affiliated with an interface [19]:

1. At first, an address is in a "preferred" state, so its use in any communication is not restricted.
2. After that, an address becomes "deprecated," indicating that its affiliation with the current interface will (soon) be invalidated. When it is in a "deprecated" state, the use of the address is discouraged, although it is not forbidden. However, when possible, any new communication (for example, the opening of a new TCP connection) must use a "preferred" address. A "deprecated" address should only be used by applications that have already used it before and in cases where it is difficult to change this address to another address without causing a service interruption.
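As a brief illustration of the interface-identifier construction described above (a sketch, not from the original text; the MAC address used is hypothetical), the common modified EUI-64 method flips the universal/local bit of the MAC, inserts 0xFFFE in the middle, and prepends the fe80::/64 link-local prefix:

import ipaddress

def link_local_from_mac(mac: str) -> str:
    # Modified EUI-64: flip the universal/local bit, insert FF:FE in the middle
    octets = bytearray(int(b, 16) for b in mac.split(":"))
    octets[0] ^= 0x02
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return str(ipaddress.IPv6Address(b"\xfe\x80" + b"\x00" * 6 + iid))

print(link_local_from_mac("00:1b:44:11:3a:b7"))  # fe80::21b:44ff:fe11:3ab7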
To ensure that allocated addresses (granted either by manual mechanisms or by autoconfiguration) are unique on a specific link, the duplicate address detection algorithm is used. The address to which the duplicate address detection algorithm is being applied is designated (until the end of this algorithmic session) as an "attempt" (tentative) address; while it holds this designation, it does not matter that such an address has been allocated to an interface, and received packets are discarded.

Next, we describe how an IPv6 address is formed. The lowest 64 bits of the address identify a specific interface, and these bits are designated as the "interface identifier." The highest 64 bits of the address identify the "path" or the "prefix" of the network or router in one of the links to which such interface is connected. The IPv6 address is formed by combining the prefix with the interface identifier.

Is it possible for a host or device to have IPv6 and IPv4 addresses simultaneously? Most of the systems that currently support IPv6 allow the simultaneous use of both protocols. In this way, it is possible to support communication with IPv4-only networks as well as IPv6-only networks, and to use the applications developed for both protocols [19].

It is possible to transmit IPv6 traffic over IPv4 networks via tunneling methods. This approach consists of "wrapping" the IPv6 traffic as IPv4 payload data: IPv6 traffic is sent "encapsulated" into IPv4 traffic, and at the receiving end this traffic is parsed as IPv6 traffic. Transition mechanisms are methods used for the coexistence of IPv4 and/or IPv6 devices and networks. For example, an "IPv6-in-IPv4 tunnel" is a transition mechanism that allows IPv6 devices to communicate through an IPv4 network. The mechanism consists of creating the IPv6 packets in a normal way and encapsulating them in an IPv4 packet. The reverse process is undertaken in the destination machine, which de-encapsulates the IPv6 packet.

There is a significant difference between the procedures to allocate IPv4 addresses, which focus on the parsimonious use of addresses (since addresses are a scarce resource and should be managed with caution), and the procedures to allocate IPv6 addresses, which focus on flexibility. ISPs deploying IPv6 systems follow the RIRs' policies on how to assign IPv6 addressing space among their clients. The RIRs are recommending that ISPs and operators allocate to each IPv6 client a /48 subnetwork; this allows clients to manage their own subnetworks without using NAT. (The implication is that the obligatory need for NAT disappears in IPv6.)
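The scale of such an allocation is easy to see; a brief sketch (illustrative only) using the 2001:db8::/32 documentation prefix as a hypothetical ISP block:

import ipaddress

isp_block = ipaddress.IPv6Network("2001:db8::/32")
first_customer = next(isp_block.subnets(new_prefix=48))
print(first_customer)    # 2001:db8::/48 - one customer site
print(2 ** (48 - 32))    # 65536 such /48s fit in the ISP's /32
print(2 ** (64 - 48))    # each /48 holds 65536 /64 subnets for the client to manage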
by nodes and routers is more straightforward. Also, the header’s structure aligns to 64 bits, so that new and future processors (64 bits minimum) can process it in a more efficient way. • Routers placed between a source point and a destination point (that is, the route that a specific packet has to pass through), do not need to process or understand any “following headers.” In other words, in general, interior (core) points of the network (routers) only have to process the basic header while in IPv4, all headers must be processed. This flow mechanism is similar to the operation in MPLS, yet precedes it by several years. • There is no limit to the number of options that the headers can support (the IPv6 basic header is 40 octets in length, while IPv4 one varies from 20 to 60 octets, depending on the options used). In IPv6, interior/core routers do not perform packets fragmentation, but the fragmentation is performed end-to-end. That is, source and destination nodes perform, by means of the IPv6 stack, the fragmentation of a packet and the reassembly, respectively. The fragmentation process consists of dividing the source packet into smaller packets or fragments [19]. The IPv6 specification defines a number of extension headers [31] (Table A5.3) [34]): • Routing Header: Similar to the source routing options in IPv4, the header is used to mandate a specific routing. • Authentication Header: AH is a security header that provides authentication and integrity. • Encapsulating Security Payload (ESP) Header: ESP is a security header that provides authentication and encryption. • Fragmentation Header: This is similar to the fragmentation options in IPv4. • Destination Options Header: A header that contains a set of options to be processed only by the final destination node. Mobile IPv6 is an example of an environment that uses such a header. • Hop-by-Hop Options Header: A set of options needed by routers to perform certain management or debugging functions. As noted, IPsec provides network-level security where the application data is encapsulated within the IPv6 packet. IPsec utilizes the AH and/or ESP header to provide security (the AH and ESP header may be used separately or in combination). IPsec, with ESP, offers integrity and data origin authentication, confidentiality, and optional (at the discretion of the receiver) antireplay features (using confidentiality without integrity is discouraged by the RFCs); ESP furthermore provides limited traffic flow confidentiality. Both the AH and ESP header may be employed as follows [31] (Fig. A5.4):
TABLE A5.3 IPv6 Extension Headers

• Hop-by-hop options header (protocol 0): Used for jumbogram packets and the Router Alert. An example of applying the hop-by-hop options header is the Resource Reservation Protocol (RSVP). This header is read and processed by every node and router along the delivery path.
• Destination options header (protocol 60): Carries optional information that is specifically targeted to a packet's destination address. The Mobile IPv6 protocol specification makes use of the destination options header to exchange registration messages between mobile nodes and the home agent. (Mobile IP is a protocol allowing mobile nodes to keep permanent IP addresses even if they change their point of attachment.)
• Routing header (protocol 43): Can be used by an IPv6 source node to force a packet to pass through specific routers on the way to its destination. A list of intermediary routers may be specified within the routing header when the routing type field is set to 0.
• Fragment header (protocol 44): In IPv6, the Path Maximum Transmission Unit Discovery (PMTUD) mechanism is recommended for all IPv6 nodes. When an IPv6 node does not support PMTUD and must send a packet larger than the greatest MTU (Maximum Transmission Unit) along the delivery path, the fragment header is used. When this happens, the node fragments the packet and sends each fragment using fragment headers; the destination node then reassembles the original packet by concatenating all the fragments.
• Authentication header (AH) (protocol 51): Used in IPsec to provide authentication, data integrity, and replay protection. It also ensures protection of some fields of the basic IPv6 header. This header is identical in both IPv4 and IPv6.
• Encapsulating security payload (ESP) header (protocol 50): Also used in IPsec, to provide authentication, data integrity, replay protection, and confidentiality of the IPv6 packet. Similar to the authentication header, this header is identical in both IPv4 and IPv6.
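The "next header" chaining that the table describes can be made concrete with the scapy packet library (the use of scapy here is an assumption of this illustrative sketch, not something discussed in the text); when layers are stacked, each header's next-header value is set to point at the header that follows:

from scapy.all import IPv6, IPv6ExtHdrDestOpt, IPv6ExtHdrFragment, ICMPv6EchoRequest

# Base header -> destination options -> fragment -> upper-layer ICMPv6
pkt = (IPv6(dst="2001:db8::2")
       / IPv6ExtHdrDestOpt()
       / IPv6ExtHdrFragment()
       / ICMPv6EchoRequest())
pkt.show()  # displays the chained list of headers and their next-header values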
As noted, IPsec provides network-level security, where the application data is encapsulated within the IPv6 packet. IPsec utilizes the AH and/or ESP header to provide security (the AH and ESP headers may be used separately or in combination). IPsec with ESP offers integrity and data origin authentication, confidentiality, and optional (at the discretion of the receiver) antireplay features (using confidentiality without integrity is discouraged by the RFCs); ESP furthermore provides limited traffic flow confidentiality. Both the AH and ESP headers may be employed as follows [31] (Fig. A5.4):

• Tunnel Mode: The protocol is applied to the entire IP packet. This method is needed to ensure security over the entire packet, where a new IPv6 header and an AH or ESP header are wrapped around the original IP packet.
• Transport Mode: The protocol is applied just to the transport layer (i.e., TCP, UDP, ICMP) in the form of an IPv6 header plus AH or ESP header, followed by the transport protocol data (header, data).
Figure A5.4 IPsec modes and types. (The figure contrasts AH and ESP in transport and tunnel modes, showing, for each, which parts of the packet are authenticated, which are encrypted, and which are integrity protected; in tunnel mode a new IP header is prepended, and mutable fields such as DSCP, ECN, Flow Label, and Hop Limit are excluded from AH authentication.)
Migration to IPv6 environments is expected to be fairly complex. Initially, internetworking between the two environments will be critical. Existing IPv4 endpoints and/or nodes will need to run dual-stack nodes or convert to IPv6 systems. Fortunately, the new protocol supports an IPv4-compatible IPv6 address, that is, an IPv6 address employing an embedded IPv4 address. Tunneling, which we have already described in passing, will play a major role in the beginning. There are a number of requirements that are typically applicable to an organization wishing to introduce an IPv6 service [35]:

• the existing IPv4 service should not be adversely disrupted (e.g., by the router load of encapsulating IPv6 in IPv4 for tunnels);
• the IPv6 service should perform as well as the IPv4 service (e.g., at the IPv4 line rate, and with similar network characteristics);
• the service must be manageable and be able to be monitored (thus tools should be available for IPv6 as they are for IPv4);
• the security of the network should not be compromised, whether by the additional protocol itself or by a weakness of any transition mechanism used;
• an IPv6 address allocation plan must be drawn up.
Well-known interworking mechanisms include the following [36]12:

• Dual IP Layer (or Dual Stack): A technique for providing complete support for both IPs—IPv4 and IPv6—in hosts and routers.
• Configured Tunneling of IPv6 over IPv4: Point-to-point tunnels made by encapsulating IPv6 packets within IPv4 headers to carry them over IPv4 routing infrastructures.
• Automatic Tunneling of IPv6 over IPv4: A mechanism for using IPv4-compatible addresses to automatically tunnel IPv6 packets over IPv4 networks.

Tunneling techniques include the following [36]:

• IPv6-over-IPv4 Tunneling: The technique of encapsulating IPv6 packets within IPv4 so that they can be carried across IPv4 routing infrastructures (see the sketch after this list).
• Configured Tunneling: IPv6-over-IPv4 tunneling where the IPv4 tunnel endpoint address is determined by configuration information on the encapsulating node. The tunnels can be either unidirectional or bidirectional. Bidirectional configured tunnels behave as virtual point-to-point links.
• Automatic Tunneling: IPv6-over-IPv4 tunneling where the IPv4 tunnel endpoint address is determined from the IPv4 address embedded in the IPv4-compatible destination address of the IPv6 packet being tunneled.
• IPv4 Multicast Tunneling: IPv6-over-IPv4 tunneling where the IPv4 tunnel endpoint address is determined using ND. Unlike configured tunneling, this does not require any address configuration, and unlike automatic tunneling it does not require the use of IPv4-compatible addresses. However, the mechanism assumes that the IPv4 infrastructure supports IPv4 multicast.
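To make the encapsulation concrete, the scapy packet library can build an IPv6-in-IPv4 ("6in4") packet directly; this is an illustrative sketch only (scapy and the documentation-range addresses are assumptions, not part of the original text):

from scapy.all import IP, IPv6, ICMPv6EchoRequest

inner = IPv6(src="2001:db8::1", dst="2001:db8::2") / ICMPv6EchoRequest()
# IPv4 protocol number 41 marks the payload as an encapsulated IPv6 packet
outer = IP(src="192.0.2.1", dst="198.51.100.1", proto=41) / inner
outer.show()  # actually transmitting it, e.g., with send(outer), requires raw-socket privileges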
Applications (and the lower-layer protocol stack) need to be properly equipped. There are four cases [37].

Case 1: IPv4-only applications in a dual-stack node. The IPv6 protocol is introduced in a node, but applications are not yet ported to support IPv6. The protocol stack is as follows:

+-------------------------+
|          appv4          |   (appv4 - IPv4-only applications)
+-------------------------+
|    TCP / UDP / others   |   (transport protocols - TCP, UDP, etc.)
+-------------------------+
|    IPv4    |    IPv6    |   (IP protocols supported/enabled in the OS)
+-------------------------+
12 This section is based on Ref. [36]. The reference is Copyright (C) The Internet Society (2000). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
Case 2: IPv4-only applications and IPv6-only applications in a dual-stack node. Applications are ported for IPv6 only; therefore, there are two similar applications, one for each protocol version (e.g., ping and ping6). The protocol stack is as follows:

+-------------------------+
|   appv4    |   appv6    |   (appv4 - IPv4-only applications; appv6 - IPv6-only applications)
+-------------------------+
|    TCP / UDP / others   |   (transport protocols - TCP, UDP, etc.)
+-------------------------+
|    IPv4    |    IPv6    |   (IP protocols supported/enabled in the OS)
+-------------------------+
Case 3: Applications supporting both IPv4 and IPv6 in a dual-stack node. Applications are ported for both IPv4 and IPv6 support; therefore, the existing IPv4 applications can be removed. The protocol stack is as follows:

+-------------------------+
|        appv4/v6         |   (appv4/v6 - applications supporting both IPv4 and IPv6)
+-------------------------+
|    TCP / UDP / others   |   (transport protocols - TCP, UDP, etc.)
+-------------------------+
|    IPv4    |    IPv6    |   (IP protocols supported/enabled in the OS)
+-------------------------+
Case 4: Applications supporting both IPv4 and IPv6 in an IPv4-only node. Applications are ported for both IPv4 and IPv6 support, but the same applications must also work when IPv6 is not being used (e.g., when it is disabled in the OS). The protocol stack is as follows:

+-------------------------+
|        appv4/v6         |   (appv4/v6 - applications supporting both IPv4 and IPv6)
+-------------------------+
|    TCP / UDP / others   |   (transport protocols - TCP, UDP, etc.)
+-------------------------+
|          IPv4           |   (IP protocols supported/enabled in the OS)
+-------------------------+
The first two cases are not interesting in the longer term; only a few applications are inherently IPv4- or IPv6-specific, and the rest should work with both protocols without having to care about which one is being used.

Figure A5.5 depicts some basic scenarios of carrier-based IPv6 support. Cases (a) and (b) represent traditional environments where the carrier link supports either a clear channel that is used to connect, say, two IPv4 routers, or is IP-aware. (In each case, the "cloud" on the left could also be the IPv4 Internet or the IPv6 Internet.)
Figure A5.5 Support of IPv6 in carrier networks. (The figure shows eight scenarios, (a) through (h), in which carrier (telco) networks that are PHY-only, IPv4-based, IPv6-based, or mixed IPv4/IPv6 interconnect IPv4 and/or IPv6 clouds.)
In Case (c), the carrier link is used to connect two IPv6 routers as a transparent link; the carrier link is not (and does not need to be) aware that it is transferring IPv6 PDUs. In Case (d), the carrier system is IPv4-aware, so the use of that environment to support IPv6 requires IPv6 to operate in a tunneled mode over the non-IPv6 cloud, which is a capability of IPv6. In Case (e), the carrier infrastructure needs to provide a gateway function between the IPv4 and the IPv6 worlds (this could entail repacking the IP PDUs from the v4 format to the v6 format). Case (f) is the ideal long-term scenario, where the "world has converted to IPv6" and "so did the carrier network." In Case (g), the carrier IP-aware network provides a conversion function to support both IPv4 (as a baseline) and IPv6 (as a "new technology") handoffs; possibly a dual-stack mechanism is utilized. In Case (h), the carrier IPv6-aware network provides a support function for IPv6 (as a baseline) and also a conversion function to support legacy IPv4 islands.

Even network/security administrators that operate in a pure IPv4 environment need to be aware of IPv6-related security issues. In a standard IPv4 environment where IPv6 is not explicitly supported, any form of IPv6-based tunneling traffic must be considered abnormal, malicious traffic. For example, unconstrained 6to4-based traffic should be blocked (6to4 is a transitional mechanism intended
for individual independent nodes to connect IPv6 over the greater Internet). Most commercial-grade IPv4 firewalls block IP protocol 41, the 6to4 tunnel protocol, unless it has been explicitly enabled [38].

In 2008, the Cooperative Association for Internet Data Analysis (CAIDA) and the American Registry for Internet Numbers (ARIN) surveyed over 200 respondents from USG agencies, commercial organizations (including ISPs and end users), educational institutions, associations, and other profit and nonprofit entities to determine the state of affairs in the United States with reference to IPv6 plans. Between 50% and 75% of the organizations surveyed indicated that they plan to deploy IPv6 by 2010 or sooner. According to some observers, IPv6 is still an emerging technology, maturing and growing as practical experience is gained; others take a more aggressive view, as seen in the next section.
A5.2 Advocacy for IPv6 Deployment—Example
We include below some excerpts from the European Economic and Social Committee and the Committee of the Regions [39] to emphasize the issues related to IPv6. Clearly, issues surrounding IPv6 affect not only Europe but the entire world. The European Economic and Social Committee and the Committee of the Regions has issued an "Action Plan for the deployment of IPv6 in Europe." The objective of this Action Plan is to support the widespread introduction of the next version of the IP (IPv6) for the following reasons:
• Timely implementation of IPv6 is required, as the pool of IP addresses provided by the current protocol version 4 is being depleted.
• IPv6, with its huge address space, provides a platform for innovation in IP-based services and applications.
A5.2.1 Preparing for the Growth in Internet Usage and for Future Innovation. One common element of the Internet architecture is the IP, which in essence gives any device or good connecting to the Internet a number, an address, so that it can communicate with other devices and/or goods. This address should generally be unique, to ensure global connectivity. The current version, IPv4, already provides for more than 4 billion such addresses. Even this, however, will not be enough to keep pace with the continuing growth of the Internet. Aware of this long-term problem, the Internet community developed an upgraded protocol, IPv6, which has been gradually deployed since the late 1990s. In a previous Communication on IPv6, the European Commission made the case for the early adoption of this protocol in Europe. This Communication was successful in establishing IPv6 Task Forces, enabling IPv6 on research networks, supporting standards, and setting up training actions. Following the Communication, more than 30 European R&D projects related to IPv6 were financed. Europe now has a large pool of experts with experience in IPv6 deployment. Yet, despite the progress made, adoption of the new protocol has remained slow, while the issue of future IP address scarcity is becoming more urgent.
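The numbers behind "more than 4 billion" and the "huge address space" of IPv6 are simply the widths of the two address fields, 32 and 128 bits; a quick check:

    ipv4_total = 2 ** 32     # 32-bit addresses: 4,294,967,296
    ipv6_total = 2 ** 128    # 128-bit addresses: about 3.4e38

    print(f"IPv4 addresses:  {ipv4_total:,}")
    print(f"IPv6 addresses:  {ipv6_total:.2e}")
    print(f"IPv6/IPv4 ratio: {ipv6_total // ipv4_total:.1e}")  # ~7.9e28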
A5.2.2 Increasing Scarcity of IPv4 Addresses: A Difficulty for Users, an Obstacle to Innovation. Initially all Internet addresses are effectively held by the IANA, and then large blocks of addresses are allocated to the five RIRs, which in turn allocate them in smaller blocks to those who need them, including ISPs. The allocation, from IANA to RIR to ISP, is carried out on the basis of demonstrated need: there is no preallocation. The address space of IPv4 has been used up to a considerable extent. At the end of January 2008, about 16% was left in the IANA pool, that is, approximately 700 million IPv4 addresses. There are widely quoted and regularly updated estimates that forecast the exhaustion of the unallocated IANA pool somewhere between 2010 and 2011. New end users will still be able to get addresses from their ISP for some time after these dates, but with increasing difficulty. Even when IPv4 addresses can no longer be allocated by IANA or the RIRs, the Internet will not stop working: the addresses already assigned can, and most probably will, be used for a significant time to come. Yet the growth and also the capacity for innovation in IP-based networks would be hindered without an appropriate solution. How to deal with this transition is currently the subject of discussion in the Internet community in general, and within and amongst the RIR communities in particular. All RIRs have recently issued public statements and have urged the adoption of IPv6.
A5.2.3 IPv4 is only a Short-Term Solution Leading to More Complexity. Concerns about the future scarcity of IP addresses are not a recent phenomenon. In the early days of the Internet, before the establishment of the RIRs and before the take-off of the World Wide Web, addresses were assigned rather generously. There was a danger of running out of addresses very quickly. Therefore, changes in allocation policy and in technology were introduced that allowed allocation to be more closely aligned to actual need. One key IPv4 technology has been NAT. NATs connect a private (home or corporate) network that uses private addresses to the public Internet, where public IP addresses are required. Private addresses come from a particular part of the address space reserved for that purpose. The NAT device acts as a form of gateway between the private network and the public Internet by translating the private addresses into public addresses. This method therefore reduces consumption of IPv4 addresses. However, the usage of NATs has two main drawbacks, namely:
• It hinders direct device-to-device communication: intermediate systems are required to allow devices or goods with private addresses to communicate across the public Internet. • It adds a layer of complexity in that there are effectively two distinct classes of computers: those with a public address and those with a private address. This often increases costs for the design and maintenance of networks, as well as for the development of applications.
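The private/public split that a NAT straddles is visible from the address itself: the reserved private ranges are 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16 (RFC 1918). A small illustration using Python's standard ipaddress module:

    import ipaddress

    for text in ["10.0.0.5", "172.16.0.1", "192.168.1.10", "8.8.8.8"]:
        addr = ipaddress.ip_address(text)
        side = "private (NAT side)" if addr.is_private else "public"
        print(f"{text:>14}  {side}")   # only 8.8.8.8 prints as public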
Some other measures could extend the availability of IPv4 addresses. A market to trade IPv4 addresses might emerge that would offer incentives to organizations to sell addresses they are not using. However, IP addresses are not strictly property. They need to be globally acceptable to be globally routable, which a seller cannot always guarantee. In addition, they could become a highly priced resource. So far, RIRs have been skeptical about the emergence of such a secondary market. Another option consists of trying to actively reclaim already-allocated address blocks that are underutilized. However, there is no apparent mechanism for enforcing the return of such addresses, and the possible cost of doing so has to be balanced against the additional lifetime this would bring to the IANA pool. Though such measures may provide some interim respite, sooner or later the demand for IP addresses will be too large to be satisfied by the global IPv4 space. Efforts to stay with IPv4 too long risk increasing unnecessary complexity and fragmentation of the global Internet. A timely introduction of IPv6 is thus the better strategy.
A5.2.4 IPv6: The Best Way Forward. IPv6 provides a straightforward and long-term solution to the address space problem. The number of addresses defined by the IPv6 protocol is huge. IPv6 allows every citizen, every network operator (including those moving to all-IP "Next Generation Networks"), and every organization in the world to have as many IP addresses as they need to connect every conceivable device or good directly to the global Internet. IPv6 was also designed to facilitate features that were felt to be missing in IPv4, including quality of service, autoconfiguration, security, and mobility. In the meantime, however, most of those features have been engineered in and around the original IPv4 protocol. It is the large address space that makes IPv6 attractive for future applications, as this will simplify their design when compared to IPv4. The benefits of IPv6 are, therefore, most obviously apparent whenever a large number of devices or goods need to be easily networked, and made potentially visible and directly reachable over the Internet. A study funded by the Commission demonstrated this potential for a number of market sectors such as home networks, building management, mobile communication, the defense and security sector, and the car industry. Prompt and efficient adoption of IPv6 offers Europe potential for innovation and leadership in advancing the Internet. Other regions, in particular Asia, have already taken a strong interest in IPv6. For instance, the Japanese consumer electronics industry increasingly develops IP-enabled products exclusively for IPv6. The European industry should therefore be ready to meet future demand for IPv6-based services, applications, and devices, and so secure a competitive advantage in world markets. To conclude, the key advantage of IPv6 over IPv4 is the huge, more easily managed address space. This solves the problem of address availability now and for a long time to come. It provides a basis for innovation—developing and deploying services and applications that may be too complicated or too costly
in an IPv4 environment. It also empowers users, allowing them to have their own network connected to the Internet.
A5.2.5 What Needs to be Done? IPv6 is not directly interoperable with IPv4. IPv6 and IPv4 devices can communicate with each other only by using application-specific gateways, which do not provide a general, future-proof solution for transparent interoperability. However, IPv6 can be enabled in parallel with IPv4 on the same device and on the same physical network. There will be a transition phase (expected to last 10, 20, or even more years) when IPv4 and IPv6 will coexist on the same machines (technically often referred to as "dual stack") and be transmitted over the same network links. In addition, other standards and technologies (technically referred to as "tunneling") allow IPv6 packets to be transmitted using IPv4 addressing and routing mechanisms, and ultimately vice versa. This provides the technical basis for the step-by-step introduction of IPv6. Because of the universal character of the IP, deployment of IPv6 requires the attention of many actors worldwide. The relevant stakeholders in this process are as follows:
• Internet organizations (such as ICANN, RIRs, and IETF) that need to manage common IPv6 resources and services (allocate IPv6 addresses, operate DNS servers, etc.), and continue to develop needed standards and specifications. As of May 2008, the regional distribution of allocated IPv6 addresses is concentrated in Europe (Réseaux Internet Protocol Européens or RIPE: 49%), with Asia and North America growing fast (Asia–Pacific Network Information Centre, APNIC: 24%; ARIN: 20%). Less than half of those addresses are currently being announced on the public Internet (i.e., visible in the default-free routing table). In the DNS, the root and top-level name servers are increasingly becoming IPv6 enabled. For instance, the gradual introduction of IPv6 connectivity to .eu name servers started in 2008.
• ISPs that need over time to offer IPv6 connectivity and IPv6-based services to customers: There is evidence that less than half of the ISPs offer some kind of IPv6 interconnectivity. Only a few ISPs have a standard offer for IPv6 customer access service (mainly for business users) and provide IPv6 addresses. The percentage of "Autonomous Systems" (typically ISPs and large end users) that operate IPv6 is estimated at 2.5%. Accordingly, IPv6 traffic seems to be relatively low. Typically the IPv6/v4 ratio is less than 0.1% at Internet Exchange Points (of which about one in five supports IPv6). However, this omits direct ISP-to-ISP traffic and IPv6 that is "tunneled" and so appears at first glance to still be IPv4. Recent measurements suggest that this kind of tunneled IPv6 traffic is growing.
• Infrastructure vendors (such as network equipment, operating systems, network application software) that need to integrate IPv6 capability into their products: Many equipment and software vendors have upgraded their products to include IPv6. However, there are still issues with certain functions and performance, and with vendor support equivalent to IPv4. The installed
equipment base of consumers, such as small routers and home modems to access the Internet, still by and large does not yet support IPv6.
• Content and service providers (such as websites, instant messaging, email, file sharing, voice over IP) that need to be reachable by enabling IPv6 on their servers: Worldwide there are only very few IPv6 websites. Almost none of the global top sites offer an IPv6 version. The de facto nonexistence of IPv6-reachable content and services on the Internet is a major obstacle to the take-up of the new protocol.
• Business and consumer application vendors (such as business software, smart cards, peer-to-peer software, transport systems, sensor networks) that need to ensure that their solutions are IPv6 compatible, and increasingly need to develop products and offer services that take advantage of IPv6 features: Today, there are few, if any, current applications that are exclusively built on IPv6. One expectation has been that the proliferation of IP as the dominant network protocol would drive IPv6 into new areas such as logistics and traffic management, mobile communication, and environment monitoring; that has not yet taken place to any significant degree.
• End users (consumers, companies, academia, and public administrations) that need to purchase IPv6-capable products and services and to enable IPv6 on their own networks or home Internet access: Many home end users operate IPv6-capable equipment without being aware of it and yet, as a result of missing applications, without necessarily making use of it. Companies and public administrations are cautious about making changes to a functioning network without a clear need. Therefore, not much user deployment in private networks is visible. Among the early adopters have been universities and research institutions. All EU national research and education networks also operate on IPv6. The European Géant network is IPv6 enabled, and approximately 1% of its traffic is native IPv6.
How much and which efforts are required to adopt IPv6 differ amongst actors and depend on each individual case. Therefore, it is practically impossible to reliably estimate the aggregate cost of introducing IPv6 globally. Experience and learning from projects have shown that costs can be kept under control when deployment is gradual and planned ahead. It is recommended that IPv6 be introduced step-by-step, possibly in connection with hardware and software upgrades, organizational changes, and training measures (at first glance unrelated to IPv6). This requires a general awareness within the organization in order not to miss those synergies. The costs will be significantly higher when IPv6 is introduced as a separate project and under time constraints. Introduction of IPv6 will take place alongside the existing IPv4 networks. Standards and technology allow for a steady, incremental adoption of IPv6 by the various stakeholders, which will help to keep costs under control. Users can use IPv6 applications and generate IPv6 traffic without waiting for their ISP to offer IPv6 connectivity. ISPs can increase their IPv6 capability and offer it in line with perceived demand.
CHAPTER 6
3DTV Standardization and Related Activities
This chapter provides a survey of some key standardization activities to support the deployment of 3DTV. Standards need to cover many, if not all, of the elements depicted in Fig. 4.1, including capture, mastering, distribution, and the consumer device interface. Standards for 3D transport issues are particularly important because content providers and studios seek to create one master file that can carry stereo 3D content (and 2D content by default) across all the various distribution channels, including cable TV, satellite, over-the-air, packaged media, and the Internet.
Standardization efforts have to be understood in the context of where stakeholders and proponents see the technology going. We already defined what we believe to be five generations of 3DTV commercialization in Chapter 1, which the reader will certainly recall. These generations fit in well with the following menu of research activity being sponsored by various European and global research initiatives, as described in Ref. [1]:
Short-term 3DV R&D (immediate commercialization, 2010–2013)
• Digital stereoscopic projection
  – better/perfect alignment to minimize "eye-fatigue."
• End-to-end digital production-line for stereoscopic 3D cinema
  – digital stereo cameras;
  – digital baseline correction for realistic perspective;
  – digital postprocessing.
Medium-term 3DV R&D (commercialization during the next few years, 2013–2016)
• End-to-end multi-view 3DV with autostereoscopic displays
  – cameras and automated camera calibration;
  – compression/coding for efficient delivery;
  – standardization;
  – view interpolation for free-view video;
  – better autostereoscopic displays, based on current and near-future technology (lenticular, barrier-based);
  – natural immersive environments.
Long-term 3DV R&D (10+ years, 2016–2020+)
• realistic/ultrarealistic displays;
• "natural" interaction with 3D displays;
• holographic 3D displays, including "integral imaging" variants;
• natural immersive environments;
• total decoupling of "capture" and "display";
• novel capture, representation, and display techniques.
One of the goals of the current standardization effort is to decouple the capture function from the display function. This is a very typical requirement for service providers, going back to voice and Internet services: there will be a large pool of end users, each opting to choose a distinct Customer Premises Equipment (CPE) device (e.g., phone, PC, fax machine, cell phone, router, 3DTV display); therefore, the service provider needs to utilize a network-intrinsic protocol (encoding, framing, addressing, etc.) that can then be utilized by the end device to create its own internal representation, as needed. The same applies to 3DTV.
As noted in Chapter 1, there is a lot of interest in this topic from industry and standards bodies. The MPEG of ISO/IEC is working on a coding format for 3DV. Standards are the key to cost-effective deployment of a technology; examples of video-related standards battles include the Beta–VHS (Video Home System) and the HD DVD–Blu-ray controversies. (HD DVD, High-Definition/Density DVD, was a high-density optical disc format for storing data and high-definition video advanced principally by Toshiba; in 2008, after a protracted format war with rival Blu-ray, the format was abandoned.) As we mentioned in Chapter 1, SMPTE is working on some of the key standards needed to deliver 3D to the home. As far back as 2003, a 3D Consortium with 70 partner organizations had been founded in Japan and, more recently, four new activities have been started: the 3D@Home Consortium, the SMPTE 3D Home Entertainment Task Force, the Rapporteur Group on 3DTV of ITU-R Study Group 6, and the TM-3D-SM group of DVB. It will probably be around 2012 before an interoperable standard is available in consumer systems to handle all the delivery mechanisms for 3DTV. At a broad level and in the context of 3DTV, the following major initiatives had been undertaken at press time:
• MPEG: standardizing multi-view and 3DV coding;
• DVB: standardizing digital video transmission to TVs and mobile devices;
• SMPTE: standardizing 3D delivery to the home;
• ITU-T: standardizing the user experience of multimedia content;
• VQEG (Video Quality Experts Group): standardizing objective video quality assessment.
We review some of the ongoing standardization/advocacy work in this chapter. Only a subset of the universe of entities working on 3DTV is covered here. There is a pragmatic possibility that in the short term, equipment providers may have to support a number of formats for stereo 3D content. The ideal approach for stereoscopic 3DTV is to provide sequential left and right frames at twice the chosen viewing rate. However, because broadcasters and some devices may lack transport/interface bandwidth for that approach, a number of alternatives may also be used, at least in the short term. Broadcasters appear to be focusing on top/bottom interleaving; however, trials are still ongoing to examine other approaches that involve some form of compression, including checkerboard, side-by-side, or interleaved rows or columns [2]. (In that case, the TV set will have to recognize all the various formats and transcode and convert them to its native rate. This is obviously suboptimal, but is similar to what actually transpired with the frame rates that HDTVs initially had to support for cameras and TVs.)
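To make the frame-compatible packings concrete, the sketch below (assuming 8-bit RGB frames held as NumPy arrays; it decimates naively, whereas real encoders low-pass filter before subsampling) builds side-by-side and top/bottom frames from a full-resolution left/right pair:

    import numpy as np

    def pack_side_by_side(left: np.ndarray, right: np.ndarray) -> np.ndarray:
        """Halve each view horizontally and place them side by side."""
        return np.hstack([left[:, ::2, :], right[:, ::2, :]])

    def pack_top_bottom(left: np.ndarray, right: np.ndarray) -> np.ndarray:
        """Halve each view vertically and stack them top/bottom."""
        return np.vstack([left[::2, :, :], right[::2, :, :]])

    # A 1080p stereo pair packs into a single 1080p frame either way:
    L = np.zeros((1080, 1920, 3), dtype=np.uint8)
    R = np.zeros((1080, 1920, 3), dtype=np.uint8)
    assert pack_side_by_side(L, R).shape == (1080, 1920, 3)
    assert pack_top_bottom(L, R).shape == (1080, 1920, 3)

Either packing fits the existing 2D transport chain unchanged, which is exactly why broadcasters favor these formats in the short term: only the endpoints need to know that each half-frame is a separate eye's view.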
6.1 MOVING PICTURE EXPERTS GROUP (MPEG)
6.1.1 Overview
(This entire section is based on ISO/MPEG materials.)
MPEG is a working group of ISO/IEC in charge of the development of standards for coded representation of digital audio and video and related data. Established in 1988, the group produces standards that help the industry offer end users an ever more enjoyable digital media experience. In its 21 years of activity, MPEG has developed a substantive portfolio of technologies that have created an industry worth several hundred billion USD. MPEG is currently interested in 3DV in general and 3DTV in particular. Any broad success of 3DTV/3DV will likely depend on the development and industrial acceptance of MPEG standards; MPEG is the premiere organization worldwide for video encoding. The standards it has produced in recent years are as follows:
MPEG-1: The standard on which such products as video CD and MP3 are based
MPEG-2: The standard on which such products as digital television set-top boxes and DVDs are based
MPEG-4: The standard for multimedia for the fixed and mobile web
MPEG-7: The standard for description and search of audio and visual content
MPEG-21: The multimedia framework
MPEG-A: The standard providing application-specific formats by integrating multiple MPEG technologies
MPEG-B: A collection of systems-specific standards
MPEG-C: A collection of video-specific standards
MPEG-D: A collection of audio-specific standards
MPEG-E: A standard (M3W) providing support to download and execute multimedia applications
MPEG-M: A standard (MXM) for packaging and reusability of MPEG technologies
MPEG-U: A standard for rich media user interfaces
MPEG-V: A standard for interchange with virtual worlds
Table 6.1 provides a more detailed listing of the activities of MPEG groups in the area of video.
6.1.2 Completed Work
As we have seen in other parts of this text, there are currently a number of different 3DV formats (either already available and/or under investigation), typically related to specific types of displays (e.g., classical two-view stereo video, multi-view video with more than two views, V + D, MV + D, and layered depth video). Efficient compression is crucial for 3DV applications, and a plethora of compression and coding algorithms are either already available and/or under investigation for the different 3DV formats (some of these are standardized, e.g., by MPEG; others are proprietary). A generic, flexible, and efficient 3DV format that can serve a range of different 3DV systems (including mobile phones) is currently being investigated by MPEG. As we noted earlier in this text, MPEG standards already support 3DV based on V + D. In 2007, MPEG specified a container format, "ISO/IEC 23002-3 Representation of Auxiliary Video and Supplemental Information" (also known as MPEG-C Part 3), that can be utilized for V + D data. Transport of this data is defined in a separate MPEG systems specification, "ISO/IEC 13818-1:2003 Carriage of Auxiliary Data" [3, 4]. In 2008, ISO approved a new 3DV project under ISO/IEC JTC1/SC29/WG11 (ISO/IEC JTC1/SC29/WG11, MPEG2008/N9784). The JVT of ITU-T and MPEG has devoted its recent efforts to extending the widely deployed H.264/AVC standard for MVC to support MV + D (and also V + D). MVC allows the construction of bitstreams that represent multiple views. The MPEG standard that emerged, MVC, provides good robustness and compression performance for delivering 3DV by taking into account the inter-view dependencies of the different visual channels. In addition, its backwards compatibility with H.264/AVC codecs makes it widely interoperable in environments having both 2D- and 3D-capable devices. MVC supports an MV + D (and also V + D) encoded representation inside the MPEG-2 transport stream. The MVC standard was developed by the JVT of ISO/IEC MPEG and the ITU-T Video Coding Experts Group (VCEG; ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6).
TABLE 6.1 Activities of MPEG Groups in the Area of Video
(For nearly every entry below, the original grid marks the availability of four document types: a summary, a 1-pager, a white paper, and a presentation; X = available.)
1. Media coding: standards to represent natural and synthetic media such as audio, video, and graphics in a bit-efficient way
   1.1 2D video coding (time-dependent 2D arrays of pixels): MPEG-1 video; MPEG-2 video; MPEG-4 visual (rectangular); shape coding (nonrectangular); advanced video coding (AVC); scalable video coding (SVC); MVC; high-performance video coding
   1.2 Decoder representation: reconfigurable video coding; coding tool repository
   1.3 3D video coding (time-dependent 3D arrays of pixels): auxiliary video data representation; 3D video coding
   1.4 Audio coding (speech and music): MPEG-1 audio; MPEG-2 audio; advanced audio coding (AAC); parametric audio coding; spectral band replication; lossless coding; scalable lossless coding; 1-bit lossless coding; MPEG Surround; spatial audio object coding; unified speech and audio coding
   1.5 2D graphic coding (2D synthetic information): texture coding; 2D mesh coding
   1.6 3D graphic coding (3D synthetic information): face and body animation; 3D mesh coding; AFX
   1.7 Synthetic audio coding: structured audio
   1.8 Text coding: streaming text format
   1.9 Font coding: font compression and streaming; open font format
   1.10 Music coding: symbolic music representation
   1.11 Media context and control (information designed to stimulate senses other than vision or audition; for example, olfaction, mechanoreception, equilibrioception, or thermoception): control information; sensory information; virtual object characteristics
   1.12 Media value chains: coded representation of information regarding the full media value chain
2. Composition coding (how media objects are composed in a scene): Binary Format for Scenes (BIFS); audio BIFS; BIFS for digital radio; lightweight scene representation; presentation and modification of structured information
3. Description coding: standards to describe media content for use by a machine
   3.1 Description technologies: description definition language; MPEG-7 schemas
   3.2 Video description: low- and high-level descriptions; overview; visual description tools; image and video signature
   3.3 Audio description: low- and high-level descriptions
   3.4 Multimedia description: multimedia description schemes
4. Systems support: standards to enable the use of digital media by an application
   4.1 Multiplexing and synchronization: MPEG-1; MPEG-2; MPEG-4
   4.2 Signaling: DSM-CC (Digital Storage Media Command and Control) user-to-user; DSM-CC user-to-network; DMIF
5. IPMP: management and protection of intellectual property related to digital media objects
   5.1 General MPEG technologies for DRM
   5.2 Identification technologies: MPEG-2 copyright identifier; object content information; digital item identification
   5.3 Rights expression technologies: rights expression language; rights data dictionary
   5.4 Persistent association technologies: evaluation tools for persistent association
   5.5 Access technologies: MPEG-2 IPMP; MPEG-4 IPMP; MPEG-21 IPMP; XML representation of IPMP-X messages
6. Digital item: structured digital objects, including identification, metadata, and governance information
   6.1 Digital item technologies: digital item declaration; digital item processing; C++ bindings; session mobility; event reporting; schema files; digital item presentation
   6.2 Resources in digital items: digital item adaptation; fragment identification for MPEG resources
7. Transport and file format: transport of digital media by means of files or transport protocols
   7.1 Transport of media streams: program stream; transport stream; M4Mux
   7.2 Media file formats: ISO base media file format; MPEG-4, AVC, SVC, MVC, and digital item file formats
   7.3 Transport of digital items: digital item streaming
8. User interaction: user interaction; widgets; advanced user interaction
9. Multimedia architecture: reference models and technology for the use of digital media in a device or application
   9.1 Terminal architecture: MPEG-1; MPEG-2; MPEG-4; graphics compression model; MPEG-7; M3W architecture; MXM architecture and technologies
   9.2 Application Programming Interfaces (APIs): MPEG-J; MPEG-J GFX; M3W multimedia API; MXM API
   9.3 Terminals: M3W component model; M3W resource and quality management; M3W component download; M3W fault management; M3W system integrity management; advanced IPTV terminal
10. Application formats (application-specific formats built from component MPEG technologies): music player; photo player; musical slide show; media streaming; professional archival; open access; portable video; digital multimedia broadcasting; video surveillance; stereoscopic video; interactive music
11. Generic media technologies: generic standard technologies used across MPEG standards
   11.1 XML technologies: binary MPEG format for XML
   11.2 Signal processing technologies (DSP technologies such as 8 × 8 DCT and IDCT): generic inverse DCT specification; fixed-point implementation of DCT/IDCT
   11.3 Bitstream technologies: bitstream syntax description language
12. Protocols (protocols to communicate between devices): MXM protocols
13. Reference implementations: implementations of MPEG standards in programming or hardware description languages
   13.1 Reference software: MPEG-1; MPEG-2; MPEG-4; MPEG-7; MPEG-21; MPEG-A through MPEG-V
   13.2 Reference hardware description
14. Conformance: procedures and data to test the conformance of encoders, bitstreams, or decoders
   14.1 MPEG-1: systems; video; audio
   14.2 MPEG-2: systems; video; audio; DSM-CC
   14.3 MPEG-4: systems; visual; audio; AVC
   14.4 MPEG-7: systems; visual; audio
   14.5 MPEG-21: digital item declaration; rights expression language; digital item adaptation; digital item processing
   14.6 MPEG-A: music player; photo player
   14.7 MPEG-B, MPEG-C, MPEG-D, MPEG-E, MPEG-M, MPEG-U, and MPEG-V conformance
15. Maintenance (corrigenda and new editions): MPEG-1; MPEG-2; MPEG-4; MPEG-7; MPEG-21; MPEG-A through MPEG-V
MVC was originally an addition to the H.264/MPEG-4 AVC video compression standard that enables efficient encoding of sequences captured simultaneously from multiple cameras using a single video stream. At press time, MVC was the most efficient approach for stereo and multi-view video coding; for two views, the performance achieved by the H.264/AVC Stereo SEI message and by MVC is similar. MVC is also expected to become a new MPEG video coding standard for the realization of future video applications such as 3DTV and FTV. The MVC group in the JVT chose the H.264/AVC-based MVC method as the MVC reference model, since this method showed better coding efficiency than H.264/AVC simulcast coding and the other methods submitted in response to the call for proposals made by MPEG [3, 5].
6.1.3 New Initiatives
ISO MPEG has already developed a suite of international standards to support 3D services and devices, and in 2009 it initiated a new phase of standardization to be completed by 2011 [6].
• One objective is to enable stereo devices to cope with varying display types and sizes, and different viewing preferences. This includes the ability to vary the baseline distance for stereo video to adjust the depth perception, which could help to avoid fatigue and other viewing discomforts.
• MPEG also envisions that high-quality autostereoscopic displays will enter the consumer market in the next few years. Since it is difficult to directly provide all the necessary views due to production and transmission constraints, a new format is needed to enable the generation of many high-quality views from a limited amount of input data such as stereo and depth.
ISO's vision is now a new 3DV format that goes beyond the capabilities of existing standards to enable both advanced stereoscopic display processing and improved support for autostereoscopic N-view displays, while enabling interoperable 3D services. The new 3DV standard aims to improve the rendering capability of the 2D + depth format while reducing bitrate requirements relative to existing standards, as noted earlier in this chapter. 3DV supports new types of audiovisual systems that allow users to view videos of the real 3D space from different user viewpoints. In an advanced application of 3DV, denoted as FTV, a user can set the viewpoint to an almost arbitrary location and direction, which can be static, change abruptly, or vary continuously, within the limits given by the available camera setup. Similarly, the audio listening point changes accordingly. The first phase of 3DV development is expected to support advanced 3D displays, where M dense views must be generated from a sparse set of K transmitted views (typically K ≤ 3) with associated depth data. The allowable range of view synthesis will be relatively narrow (20° view angle from leftmost to rightmost view).
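The core of the view-generation step is a depth-driven warp: under a rectified-camera model, each pixel of a transmitted view shifts horizontally by a disparity d = baseline × focal length / depth to form the virtual view. A toy sketch under those simplifying assumptions (no occlusion ordering, disocclusion holes left unfilled; practical renderers composite several reference views and inpaint):

    import numpy as np

    def dibr_warp(view: np.ndarray, depth: np.ndarray,
                  baseline: float, focal: float) -> np.ndarray:
        """Synthesize a virtual view from one view plus per-pixel depth.

        view:  H x W x 3 image; depth: H x W positive depths (same grid).
        Pixels shift horizontally by d = baseline * focal / depth;
        disoccluded pixels remain zero (visible as black holes)."""
        h, w, _ = view.shape
        out = np.zeros_like(view)
        disparity = np.round(baseline * focal / depth).astype(int)
        cols = np.arange(w)
        for y in range(h):
            x_new = cols + disparity[y]
            ok = (0 <= x_new) & (x_new < w)
            out[y, x_new[ok]] = view[y, cols[ok]]
        return out

Generating the M display views from the K transmitted views amounts to running such a warp (plus blending and hole filling) once per target viewpoint, which is why a narrow synthesis range keeps the artifacts manageable.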
[Figure 6.1 depicts an example FTV chain: 3D content production from a depth camera, a multi-camera setup, a stereo camera, or 2D/3D conversion; an N × video + depth representation with metadata; multi-view coding and DVB transmission; and, at the receiver, depth-image-based rendering driven by the display configuration and user preferences, feeding 2D displays, M-view 3D displays, and head-tracked stereo displays.]
Figure 6.1 Example of an FTV system and data format.
The MPEG initiative notes that 3DV is a standard that targets serving a variety of 3D displays. It is the first phase of FTV, a new framework that includes a coded representation for multi-view video and depth information to support the generation of high-quality intermediate views at the receiver. This enables free-viewpoint functionality and view generation for automultiscopic displays [7]. Figure 6.1 shows an example of an FTV system that transmits multi-view video with depth information. The content may be produced in a number of ways; for example, with a multi-camera setup, depth cameras, or 2D/3D conversion processes. At the receiver, DIBR could be performed to project the signal to various types of displays. The first focus (phase) of ISO/MPEG standardization for FTV is 3DV [8], that is, video for 3D displays. Such displays present N views (e.g., N = 9) simultaneously to the user (Fig. 6.2). For efficiency reasons, only a lower number K of views (K = 1, 2, 3) shall be transmitted. For those K views, additional depth data shall be provided. At the receiver side, the N views to be displayed are generated from the K transmitted views with depth by DIBR, as illustrated in Fig. 6.2. This application scenario imposes specific constraints such as narrow angle acquisition (