Figure 7.2 Text with image in both HTML and SMIL.
7.5 CAPABILITIES AND METADATA
Figure 7.11 UAProf profile for the Nokia 3650 phone from nds.nokia.com.
CONTENT ADAPTATION FOR THE MOBILE INTERNET
Figure 7.12 Infopyramid of a weather service (see Section 7.4.2.1).

7.5.2 Metadata
When adaptation is done by content selection, the author must create multiple versions of the individual pieces of content (WML decks [34], images, etc.).4 For instance, the author will create HTML pages or WML decks of different lengths, and images of different sizes and qualities, to support the different terminal resolutions and network bit rates. Tomorrow's weather, for example, might have the alternative representations shown in Figure 7.12. An advanced terminal would be able to show any of the alternatives, so the service must also have some information about the relative quality and usefulness of the alternatives. Assuming the capabilities of the terminal are known, an adaptation service could in principle then choose between the content versions based only on this information. However, it then becomes the responsibility of the adaptation service to decide which content is best for a certain context, and the content author (who presumably would know best) cannot influence the decision.

Therefore, content metadata should also include information about the context in which the content can be used, rather than only information about the content itself. Figure 7.13 depicts an example of this type of metadata. As illustrated in Figure 7.13, the annotations describe the set of minimum requirements that a terminal and network must fulfill in order to receive each specific version. A version is selected only if all of its requirements are met. For instance, for a terminal with a 28,800 bps connection we cannot select the video version, even though the terminal might support video playback (because the video would take too long to download).
4 In advanced systems, different versions of presentation content can be automatically generated from generic XML data. Automating generation of multi-version media content is more difficult.
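The all-requirements-met rule can be sketched in a few lines. This is an illustrative sketch only: the attribute names, data layout, and threshold values below are assumptions, not values taken from the figure.

```python
# Each content version carries the minimum requirements that a terminal and
# network must satisfy for that version to be selectable.
# Names and values here are illustrative, not from the text.

def meets_requirements(version, context):
    """A version is selectable only if ALL of its requirements are met."""
    return all(context.get(attr, 0) >= minimum
               for attr, minimum in version["requirements"].items())

# Alternative representations of tomorrow's weather (cf. Figures 7.12/7.13):
versions = [
    {"name": "video", "requirements": {"bit_rate": 43200, "screen_x": 176}},
    {"name": "image", "requirements": {"bit_rate": 14400, "screen_x": 128}},
    {"name": "text",  "requirements": {"bit_rate": 9600,  "screen_x": 0}},
]

# A terminal that supports video playback, but whose 28,800 bps connection
# rules the video version out:
terminal = {"bit_rate": 28800, "screen_x": 176}
selectable = [v["name"] for v in versions if meets_requirements(v, terminal)]
```

With these assumed values, only the image and text versions remain selectable; the video version fails its bit-rate requirement even though the terminal could play it.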
Figure 7.13 Multimedia content descriptor (metadata) for a weather service's InfoPyramid.

A key element of the approach is to establish the usefulness, or utility, of the content. This makes it possible to decide which version to select when the terminal and network can support several versions. The content selection mechanism then chooses the version with the highest utility among those for which the terminal and network capabilities meet all the requirements.

7.6 ADAPTATION ARCHITECTURES
This section presents different adaptation architectures. Specifically, it addresses where adaptation should be performed and how terminal capabilities are propagated.

7.6.1 Location of Adaptation
There are three possible ways to ensure both that the message can be delivered to the receiver and that it conforms to the receiver's capabilities: (1) ensuring it conforms before it is sent from the content source; (2) modifying it at some intermediary point; or (3) modifying it at the receiver so that it conforms. We will use the following definitions:

- Source: the origin of the content; a content server for browsing or the sending terminal for messaging.
- Destination: the final target location of the content; a client for browsing or the receiving terminal for messaging.
- Intermediary: an entity between the source and destination; this includes all kinds of proxies, gateways, and other servers between the source and the destination.

In this section, we first discuss the advantages and disadvantages of adapting at each of these locations. Then we present the different adaptation architecture configurations.
7.6.1.1 Adaptation at the Source In this case, the adaptation is performed at the origin of the content. In browsing, the adaptation would be performed by the Web server. It can be argued that this is the most logical place to perform the adaptation, since the source can decide on the most appropriate content to send to the recipient according to the recipient's capabilities. After all, the content owner should know best what information it wants to convey to the user, taking into consideration the capabilities of the user's terminal. Commercial sites have recognized this need and often provide different content for Netscape and Internet Explorer users. The source can perform adaptation through a combination of content selection, transcoding, and scripting (e.g., XSLT) techniques, depending on the nature of its content.

But adaptation at the content source can be difficult to achieve for several reasons: it can require a lot of processing power; it can take considerable effort to create content so that it adapts to different terminals; and the source must know the terminal capabilities in order to perform adaptation at all. In today's market, most Web service providers don't find it economical to customize content for different mobile terminals. In fact, most offer a single version that can reach the majority of PC terminals; customizing their servers to reach a small additional percentage of users would not be profitable. Only a small number of commercial sites provide an additional service for mobile phones, often a basic text-only page for early WAP-enabled phones. The problem of obtaining the terminal capabilities is not an easy one to solve either. On the mobile side, standards such as UAProf can be used to learn the terminal capabilities. On the Internet side, however, UAProf is not widely used, and maintaining capability databases based on the user-agent header is quite demanding.

As more and more mobile phones come into use, though, more Websites can be expected to support mechanisms to learn the terminal capabilities and provide customized content to mobiles. For messaging applications, adaptation at the source is even more problematic: it requires that the sender know the recipient's capabilities, understand those capabilities, and be willing and able to create content to meet them. This is a lot to expect from the sender.
7.6.1.2 Adaptation at the Destination The content can also be adapted at the destination. It can be argued that the destination is the best place to perform adaptation, since the user should decide how he or she wants the content to be rendered. For instance, the user may want to change the layout of the content or change the font color or size, so there is great benefit in leaving appearance adaptation to the destination. In fact, source and destination are both important adaptation locations and should be complementary: the source should provide the best supported content possible while giving users as much flexibility as possible to control how it is rendered. Format, size, characteristics, and encapsulation adaptation at the source or an intermediary is often required for the content to be supported by, or even to reach, the destination. Adapting at the destination may also require heavy computations that are problematic for a mobile terminal: (1) it would increase the time until the content is rendered, hurting the user experience; and (2) the additional processing would shorten battery life.

7.6.1.3 Adaptation at the Intermediary When the source doesn't support adaptation, for any of the reasons mentioned above, an intermediary can perform the adaptation to enhance the usability of a service. In the mobile world, the WAP gateway performs this task for browsing applications, while the multimedia messaging service center (MMSC), or an external transcoding server under its control, presently assumes that role for the multimedia messaging service (MMS). Today's WAP gateways and MMSCs can perform image format conversion, resolution reduction, and other functions. Unlike adaptation at the source, adapting the content between the origin and the destination may have legal implications. The results of adaptation may also be unacceptable, depending on the nature of the content.

7.6.2 Adaptation Architecture Configurations
Several elements affect how and where the adaptation process is performed. How the content is adapted depends on the type of content at hand: the origin server can perform content selection when it has access to multiple versions, and it can also transcode, whereas an intermediary can typically only transcode content on the fly between the source and the destination. The location of the adaptation is often determined by knowledge of the terminal capabilities, since only network elements having such knowledge can perform adaptation. Ideally the origin server should perform adaptation, but if it doesn't or can't, an intermediary should take over. In mobile browsing, adaptation is usually performed by an intermediary, such as a WAP gateway, because it knows the terminal capabilities and can transcode accordingly, something that very few Websites offer to wireless devices.

Figure 7.14 Adaptation architecture configurations (the cube shows the location where adaptation takes place).

Figure 7.14 shows different adaptation architecture configurations. Each configuration shows where the adaptation is performed and where the terminal capability information needs to be propagated. The diagrams don't show protocol details, as these are application-dependent. In addition, the terminal capabilities are not always part of the protocol exchange; they can instead come from an operator's user profile database, for instance.

Configuration (a) illustrates the architecture in which the source performs adaptation on the basis of terminal capability knowledge. This is the case of commercial Websites using the user-agent and/or accept headers of HTTP/WSP to select or transcode content.

Configuration (b) illustrates adaptation at an intermediary. This configuration is typical in mobile browsing, where a WAP gateway adapts Web content obtained from the source. It is also the case in MMS, where the MMSC performs adaptation between the source of the message and the destination. UAProf or the user-agent and/or accept headers of HTTP/WSP can be used to propagate terminal capability information.

Configuration (c) illustrates adaptation at the destination, typically based on the terminal's display characteristics and the user's preferences.

Note that combinations of these configurations can be used in practice to distribute the adaptation operations between source, intermediary, and destination. For instance, in browsing, some initial content selection can be performed at the source, the intermediary can then perform encapsulation adaptation for efficient wireless transport, and finally the terminal can handle presentation adaptation. In the next section, we present different application scenarios and show how they use these configurations.
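In configurations (a) and (b), the capability information typically travels in the request itself. The sketch below extracts those hints from request headers; `x-wap-profile` and `User-Agent` are the standard UAProf/HTTP header names, but the parsing logic and the profile URL shown are illustrative assumptions.

```python
def extract_capability_hints(headers):
    """Pull capability hints from a request's headers.

    Returns the UAProf profile URL (if the terminal sent one) and the
    User-Agent string; either can key a capability lookup at the
    adaptation point.
    """
    # UAProf-capable terminals reference their profile document via the
    # x-wap-profile header; the value is often quoted.
    profile_url = headers.get("x-wap-profile", "").strip('" ')
    user_agent = headers.get("user-agent", "")
    return profile_url or None, user_agent or None

# Hypothetical request headers (the profile URL is made up for illustration):
request_headers = {
    "user-agent": "Nokia3650/1.0 SymbianOS/6.1",
    "x-wap-profile": '"http://nds.nokia.com/uaprof/N3650r100.xml"',
}
profile, ua = extract_capability_hints(request_headers)
```

An adaptation point would prefer the profile URL when present (it describes the terminal precisely) and fall back to the User-Agent string otherwise.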
7.7 APPLICATION SCENARIOS
This section presents different application scenarios, applying the concepts presented in earlier sections to two major application adaptation problems: browsing and multimedia messaging. It explains in more detail how each application uses the architecture configurations and the adaptation methods.
7.7.1 Scenario for Content Selection: Browsing
We first present an example of content adaptation for browsing applications using the content selection adaptation method. Again, we use the weather forecast service for illustration. In this case, the source is the weather forecast service server, there may be an intermediary, and the destination is the requesting terminal. An intermediary can be used to convert between HTTP and WSP, but we assume that, except for possible encapsulation adaptation handled by the intermediary, the source performs the adaptation. This corresponds to configuration (a) of the adaptation architecture of Figure 7.14.

Before presenting this example, we should note that today the majority of adaptation for mobile browsing is achieved through transcoding at an intermediary (in a gateway). This raises many issues, such as content quality and legal aspects. Therefore, content selection is expected to become more common in the near future than it is today. The protocol interaction, illustrated in Figure 7.15, is as follows:

1. The client requests the content of a URL from the server and provides its capabilities (UA header and optionally UAProf).
Figure 7.15 Protocol interaction in the case of a browsing application.
2. The server resolves the UAProf capabilities and, if needed, obtains additional capabilities from a local database using the UA header or the static UAProf URL (not shown).
3. The server selects the best content according to the terminal capabilities and its content selection policies. The algorithm is described below.
4. The server may perform additional transcoding or XSLT operations (not shown).
5. The server delivers the adapted content to the client.

Let's illustrate how the content selection process works for simple image selection using a specific content selection policy. The content selection algorithm proposed here is independent of the specific content descriptors and of how the information is stored. However, the effectiveness of the actual content selection process will depend on the choice of content descriptors and on the specific values entered by the author or content provider.

7.7.1.1 Content Selection Algorithm In the algorithm presented here, each media content version carries a set of requirements that we call "multimedia content descriptors" (MCDs). These requirements must be fulfilled by the terminal, the network, and the user preferences for that version of the source content to be selectable (possibly chosen). The algorithm is thus based on a comparison between multimedia content descriptors and capability and characteristic descriptors. Among all selectable versions, the one of highest value (from the user's perspective, as assessed by the author or provider) is selected. For convenience, let's assume that the content provider can order the different versions (or representations) of the content in decreasing order of value. The algorithm then selects the first version in the list whose requirements can be satisfied by the terminal, user preferences, and network. That gives the best representation the terminal and network can support.
Note that although the example illustrates the case of images, it applies equally to other elements such as layout elements (XHTML, HTML, WML, etc.). The algorithm can be summarized in the steps presented in Table 7.1. The requirements usually take forms such as BitRate >= 28,000 and/or ScreenSize >= 320 × 240, for instance. It is important to note that the algorithm is very generic and not bound to a specific set of requirement attributes such as resolution or bit rate. In Section 7.7.1.2 we illustrate further how to apply the algorithm using proposed descriptors that are useful for adaptive multimedia in a Web browsing application.

TABLE 7.1 Example of Content Selection Algorithm

Production of multimedia content descriptors (done during the creation of the source content):
  for each multimedia element:
    Set the requirements for each version of the element (usually done under the author's supervision).
    Order the versions in decreasing order of value or quality (usually done by the author).

Content selection (performed when a request for the content is received by the content selection engine from the phone):
  for each requested element (WML deck, (X)HTML page, inline image, audio, video, etc.):
    Select the first version in the list for which all requirements are satisfied (checked against the characteristics of the terminal, the network, and the user preferences); the search for a match thus starts from the version with the highest value and proceeds toward the lowest until a match occurs.
    Return the selected version of the element to the requesting entity.

7.7.1.2 The Infopyramid and Media Capability Descriptors Consider the Infopyramid and multimedia content descriptors of Figure 7.13. The content is annotated using the following multimedia content descriptors:

Utility (Value): a positive integer setting the rank of this version with respect to other related versions (where 1 is the rank of the image having the lowest value). The value order is unique for each version of the original image. The author or content provider is expected to order the content.

MinBitRate: the minimum required bit rate in bits/s (bps). This attribute specifies the minimum transfer speed required for this object to be selected. Setting this requirement too low can increase the download time to the point where it is no longer acceptable to the user. The author or content provider is expected to specify the required bit rate.

MinImageResolution: the minimum image resolution required (X × Y pixels). For this version to be selectable, the terminal must be able to accept images larger than or equal to this resolution (in both the X and Y dimensions). This attribute can be set automatically (or manually).

MinVirtualScreenSize: the minimum virtual screen size (X × Y pixels) under which the image should be displayed. The virtual screen size could represent the size of a Web page or of a WML card in WAP. The image can be selected only if the virtual screen size of the phone is equal to or larger than the minimum virtual screen size in each dimension (X and Y). This attribute is very useful for controlling the display of decorative elements. For instance, an author may want some small decorative images displayed only if the virtual resolution of the terminal is large enough; if only MinImageResolution were used, such small images would probably be acceptable even on a small display and would overcrowd it without much value for the user. The author or content provider is expected to specify MinVirtualScreenSize; by default, it could be set to the image resolution.

MediaFormat: the media format in which the picture is stored. To be acceptable for selection, the media format must be an element of the list of media formats that the terminal accepts. For convenience, the format name follows the notation of MIME types.
7.7.1.3 The Terminal's Media Capability Descriptors Associated with these multimedia content descriptors, the terminal provides its media capability descriptors (MCDs) when making a request for content. The MCDs are:5

BitRate: the terminal's average connection bit rate.
MaxImageResolution: the terminal's maximum supported image resolution (X × Y pixels).
VirtualScreenSize: the terminal's virtual screen size (X × Y pixels).
MediaFormatSet: the media formats the terminal supports.
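The selection steps of Table 7.1, applied to these descriptors, can be sketched as follows. This is an illustrative implementation: the descriptor names follow the text, but the data layout, the function name, and the example values in `pyramid` are assumptions.

```python
def select_version(versions, terminal):
    """Return the first selectable version, scanning from highest Utility down.

    `versions` carry the multimedia content descriptors of Section 7.7.1.2;
    `terminal` carries the media capability descriptors of Section 7.7.1.3.
    """
    for v in sorted(versions, key=lambda v: v["Utility"], reverse=True):
        if (terminal["BitRate"] >= v["MinBitRate"]
                and terminal["MaxImageResolution"][0] >= v["MinImageResolution"][0]
                and terminal["MaxImageResolution"][1] >= v["MinImageResolution"][1]
                and terminal["VirtualScreenSize"][0] >= v["MinVirtualScreenSize"][0]
                and terminal["VirtualScreenSize"][1] >= v["MinVirtualScreenSize"][1]
                and v["MediaFormat"] in terminal["MediaFormatSet"]):
            return v
    return None  # no version is acceptable for this terminal

# Hypothetical InfoPyramid for one weather map (values are illustrative):
pyramid = [
    {"Utility": 3, "MinBitRate": 28000, "MinImageResolution": (320, 240),
     "MinVirtualScreenSize": (320, 240), "MediaFormat": "image/jpeg"},
    {"Utility": 2, "MinBitRate": 14400, "MinImageResolution": (160, 120),
     "MinVirtualScreenSize": (160, 120), "MediaFormat": "image/gif"},
    {"Utility": 1, "MinBitRate": 9600, "MinImageResolution": (50, 50),
     "MinVirtualScreenSize": (80, 80), "MediaFormat": "image/vnd.wap.wbmp"},
]

# A low-bit-rate, small-screen terminal: only the lowest-utility version fits.
terminal2 = {"BitRate": 15000, "MaxImageResolution": (50, 50),
             "VirtualScreenSize": (80, 80),
             "MediaFormatSet": {"image/jpeg", "image/gif", "image/vnd.wap.wbmp"}}
chosen = select_version(pyramid, terminal2)
```

A more capable terminal (higher bit rate and larger screen) would pass the checks for the utility 3 version on the first iteration and receive it instead.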
7.7.1.4 Results of the Media Content Selection Figure 7.16 shows the selected media content component for three devices.

7.7.1.5 Results of the Overall Adaptation The previous section showed which media component is selected for each terminal. We now describe the adaptation of the presentation part and show the final adaptation result visible on each terminal. The selection of the base HTML/WML pages can also be performed using content selection. Alternatively, or in addition, XSLT is applied to the layout content to meet the terminal characteristics. The layout components include URLs to images, video, and other included media content. Each such URL is an abstract link: it refers not to a specific version but to the set of versions (the original filename). When the selection engine receives a request for such a URL, it selects the content and returns the best version. Using abstract URLs is a very important technique for this purpose. Figure 7.17 shows how the weather application might look on the different devices at the end of the adaptation process.

7.7.2 Scenario for Transcoding: Multimedia Messaging Service
The multimedia messaging service (MMS) [16–19,25,60] is the next evolutionary step from the short message service (SMS). While SMS is typically used to exchange short text messages between users, MMS provides the opportunity to exchange much larger messages composed of a wide and rich variety of content types, including still images, graphics, audio, music, and video clips. MMS is expected to become an important 3G application and enabler. The MMS architecture and the overall concepts have been standardized in the Third Generation Partnership Project (3GPP) [16–18]. On the basis of the work and requirements of 3GPP, the Wireless Application Protocol (WAP)-based implementation specifications have been the responsibility of the WAP Forum.

5 These capabilities are provided for illustration purposes. Actual MCDs are defined in the UAProf specification.
Figure 7.16 Selected media content for different browsing devices.

Terminal 1: BitRate 15000; MaximumImageResolution 320×240; VirtualScreenSize 320×480; MediaFormatSet "image/jpeg", "image/gif", "image/vnd.wap.wbmp". Note: BitRate too low for receiving the utility 5 image.

Terminal 2: BitRate 15000; MaximumImageResolution 50×50; VirtualScreenSize 80×80; MediaFormatSet "image/jpeg", "image/gif", "image/vnd.wap.wbmp". Note: VirtualScreenSize too low for the utility 4 image.

Terminal 3: BitRate 50000; MaximumImageResolution 320×240; VirtualScreenSize 320×240; MediaFormatSet "image/jpeg", "image/gif", "image/vnd.wap.wbmp", "video/3gp". Note: a BitRate lower than 43200 would have resulted in the utility 5 image.
Figure 7.17 Weather application on the different devices.

Now the work is under the responsibility of the Open Mobile Alliance (OMA), where the specification activity continues [19]. The specifications now also take into account requirements from 3GPP2 [58,59]. However, this new service brings new challenges related to interoperability and user experience. MMS is evolving in a pervasive environment composed of mobile terminals with very different characteristics. For instance, some early MMS phones could send and receive messages no larger than 30 kB, while others could support up to 100 kB. To complicate this situation, the
capabilities of new mobile terminal products are evolving very rapidly. For instance, while the first MMS terminals supported images but no video, many MMS terminals today support video and will soon support vector graphics. This environment makes it very challenging to introduce new formats and services over MMS while maintaining backward interoperability with older, less capable mobile terminals.

Server-side multimedia message adaptation (MMA) is a technology that attempts to reduce MMS interoperability problems and allow smoother format and service evolution. Specifically, server-side MMA consists of adapting the content of a multimedia message in the multimedia messaging service center (MMSC), or in an external transcoding server under its control, to suit the capabilities of the receiving terminal. After a short introduction to the applications MMS enables and to the protocol flow, we discuss how content adaptation can be performed in MMS.

7.7.2.1 MMS Applications MMS introduces a generic mechanism to encapsulate and transport multimedia content without restricting the formats used.6 Therefore, MMS can be the foundation of numerous and very diverse applications, as illustrated in Figure 7.18. MMS can support the following applications:

- Mobile to mobile: sending/receiving photos, audio/video clips, voicemail, business cards, and so on
- Web applications to mobile devices: electronic postcards, greeting cards, advertising, news of the day (video/audio clips), screen savers, animations, maps
- Internet to/from mobile devices: receiving selected emails, sending emails

MMS can also be an enabler for many other applications, such as interactive games.

7.7.2.2 MMS Transactions Figure 7.19 shows the message flow for multimedia message delivery. The steps are as follows:

1. The sender's terminal initiates a WAP POST (using WSP or HTTP) request to the MMSC in order to send a message. This operation uploads the message to the MMSC.
The MMSC is then responsible for the delivery. Note that MMS is a store-and-forward messaging service.

2. After the MMSC has stored the message, it sends a notification to the recipient's terminal to inform it that a new message has arrived. The notification is typically carried using WAP PUSH (e.g., with SMS as the bearer). The notification contains a URL associated with the message, as well as information about the message such as when it expires, its size, and optionally the sender's address.

3. The notification triggers in the recipient's terminal a WAP GET (using WSP or HTTP) operation that fetches the message (using its URL) from the MMSC to the mobile device. This transaction carries information about the terminal type (UA header) and may carry information about the terminal capabilities using UAProf. Such information is crucial for message adaptation.

4. The MMSC retrieves, from its database, the message corresponding to the URL. It may then adapt the message to meet the terminal capabilities; message adaptation is not mandatory in earlier MMS specifications but becomes mandatory with MMS v1.2 [60] and future specifications.

5. The MMSC sends the resulting message to the destination terminal.

6. The terminal confirms reception of the message (not shown).

7. The MMSC may send a delivery report to the sender using WAP PUSH (not shown).

Figure 7.18 Exchange of MMS messages between mobile devices, the Internet (email, instant messaging), and Web applications.

Figure 7.19 MMS transactions and adaptation framework.

6 Actually, MMS could transport any type of content, such as Java MIDlets or binary data.
We can see in Figure 7.19 that the delivered message was adapted to meet the lower resolution and memory capabilities of the receiving terminal. For instance, the phone can support messages no larger than 30 kB, including images with a resolution not exceeding 352 × 288.

7.7.2.3 The MMS Conformance Document Since the specifications of the first MMS version did not mandate formats, many equipment manufacturers decided to join forces in writing an MMS conformance document [20]. Its purpose was to ensure some degree of terminal interoperability when MMS was initially introduced, by defining simple baseline requirements that first-generation MMS terminals should meet. It was understood that terminals would later support richer formats, such as the ones defined in 3GPP TS 26.140 [18]. The first conformance document recommends SMIL for presentation; baseline JPEG, GIF, and WBMP for images; and AMR for audio. The minimum supported image resolution should be 160 × 120, and the supported message size should be no less than 30 kB. Again, it is important to emphasize that this conformance document is not intended to limit the functionality of terminals but to set a minimum assumption in early MMS deployment, when the destination terminal capabilities are unknown. These requirements can certainly be exceeded, and most phones introduced on the market now do exceed them. Also, new conformance documents or newer MMS profiles are expected. Therefore, MMS adaptation is still required, but conformance documents limit the scope of the adaptation functionality needed.

7.7.2.4 The UAProf Descriptors for the MMS Application The standardized UAProf capability descriptors for MMS adaptation are presented in Table 7.2. The adaptation is performed taking these capabilities into account, in addition to some local ones that may be associated with the UA header or static UAProf URL.
MmsMaxMessageSize, MmsMaxImageResolution, and MmsCcppAccept are especially important for media content adaptation.
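As an illustration of how these three descriptors could drive adaptation decisions, here is a hedged sketch. The decision rules and the function name `plan_mms_adaptation` are illustrative assumptions, not part of the MMS specifications.

```python
def plan_mms_adaptation(message, caps):
    """Decide which adaptation steps a message needs for a given terminal.

    `caps` holds the three key UAProf MMS descriptors; `message` describes
    the message to deliver. Returns the list of required operations.
    """
    ops = []
    # MmsCcppAccept: the receiving terminal must accept the content type.
    if message["content_type"] not in caps["MmsCcppAccept"]:
        ops.append("transcode-format")
    # MmsMaxImageResolution: downscale images that exceed the maximum.
    w, h = message["image_resolution"]
    max_w, max_h = caps["MmsMaxImageResolution"]
    if w > max_w or h > max_h:
        ops.append("reduce-resolution")
    # MmsMaxMessageSize: shrink (e.g., fewer colors, recompression) to fit.
    if message["size_bytes"] > caps["MmsMaxMessageSize"]:
        ops.append("reduce-size")
    return ops

# Hypothetical terminal capabilities and message (values are illustrative):
caps = {"MmsMaxMessageSize": 30 * 1024,
        "MmsMaxImageResolution": (160, 120),
        "MmsCcppAccept": {"image/jpeg", "image/gif", "text/plain"}}
message = {"content_type": "image/gif",
           "image_resolution": (300, 236), "size_bytes": 53 * 1024}
ops = plan_mms_adaptation(message, caps)
```

For this message, the format is already acceptable, but both the resolution and the message size exceed the terminal's limits, so two adaptation steps are required.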
TABLE 7.2 UAProf Descriptors for MMS Application

MmsMaxMessageSize: the maximum size of a multimedia message in bytes.
MmsMaxImageResolution: the maximum size of an image in pixels (horizontal × vertical).
MmsCcppAccept: list of supported content types, conveyed as MIME types.
MmsCcppAcceptCharSet: list of character sets that the MMS client supports; each item in the list is a character set name registered with IANA.
MmsCcppAcceptLanguage: list of preferred languages; the first item in the list should be considered the user's first choice; each item is the name of a natural language as defined by IETF RFC 1766.
MmsCcppAcceptEncoding: list of transfer encodings that the MMS client supports; each item is a transfer encoding name as specified by RFC 2045 and registered with IANA.
MmsVersion: the MMS versions supported by the MMS client, conveyed as majorVersionNumber.minorVersionNumber.
MmsCcppStreamingCapable (introduced in MMS 1.1): indicates whether the MMS client is capable of invoking streaming.
Presently, not many phones support UAProf, or at most they support static UAProf (not its dynamic form). For phones not supporting UAProf, the MMSC must rely on the UA header, which is used as a key into a database containing capabilities for all phones on the market (and for each software release they may have). When static UAProf is received, the URL can also serve as a key into the same database; the MMSC can also fetch the terminal profile from the given URL and cache it for future requests. Dynamic UAProf, however, is becoming an important feature, as terminals can now download and install software that adds support for new media formats. Without dynamic UAProf, an MMSC can't tell the difference between two terminals of the same model and software release when one has installed new media format support and the other has not.

7.7.2.5 MMS Adaptation Example for a Weather Service Consider an MMS-based weather application service. Every day, the service sends each subscriber an MMS containing weather forecast information. In this case, the source is the weather forecast service server, the intermediary is the MMSC, and the destination is the service subscriber. This corresponds to configuration (b) of the adaptation architecture presented in Figure 7.14 and described in the text following the figure. This is the most common model in MMS since, typically, only the recipient's MMSC has knowledge of the terminal capabilities, and only after the message retrieval is requested.
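The capability-resolution fallback described above (dynamic UAProf when available, otherwise a database keyed by the static UAProf URL or the UA header) might be sketched as follows. The function, the database shape, and the sample entry are hypothetical.

```python
def resolve_capabilities(ua_header, uaprof_url, capability_db, fetch_profile=None):
    """Resolve terminal capabilities in the order described in the text.

    Preference: a dynamically fetched UAProf document; otherwise a prebuilt
    database lookup keyed by the static UAProf URL or by the UA header.
    """
    if uaprof_url and fetch_profile:
        # Dynamic UAProf: authoritative, reflects software the user installed.
        profile = fetch_profile(uaprof_url)
        if profile:
            return profile
    # Fall back to the database (one entry per model and software release).
    if uaprof_url and uaprof_url in capability_db:
        return capability_db[uaprof_url]
    return capability_db.get(ua_header)

# Hypothetical database keyed by UA header:
db = {"Nokia3650/1.0": {"MmsMaxMessageSize": 30 * 1024}}
caps = resolve_capabilities("Nokia3650/1.0", None, db)
```

An unknown terminal resolves to no capabilities, in which case an MMSC would typically fall back to conformance-document baseline assumptions.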
CONTENT ADAPTATION FOR THE MOBILE INTERNET
Nevertheless, the source could send either (1) multiple versions of the content, from which the MMSC could select the best alternative, or (2) a single version that the MMSC may have to transcode. Option 1 is not practical for person-to-person communications, as a user normally sends only a single message version (providing multiple versions would not only be cumbersome but would also significantly increase the overall message size). For application-originated content, however, this would be a good option, assuming that the MMSC can perform the content selection; the SMIL switch statement would permit providing multiple versions. The origin server could also perform the content selection itself if it knows the terminal capabilities, for instance, if the user provided them when subscribing to the service. Since transcoding is the approach used most in MMS today, we will illustrate that case. Consider an MMS containing weather forecast information, as illustrated in Figure 7.20a. It doesn't comply with the capabilities of the two receiving terminals presented in Figures 7.20b and 7.20c. For the first terminal, the MMSC reduced the number of colors of the GIF map to 32 to meet the size constraint. For the second terminal, the MMSC reduced the resolution of the GIF map by half to meet the resolution constraint. This was also sufficient to meet the message size constraint of 30 kB, so 256 colors were retained.

7.7.3 Concluding Remarks
It should be clear to the reader that there is no single best method for adapting content that suits all situations. Content selection gives the author more control over the adapted versions of the content but requires knowledge of the target terminal and some
Figure 7.20 Example of MMS adaptation for a weather service. The message text reads "Finland: Sunny today. Maximum 21C/70F. Minimum 13C/55F. Sweden: ..."
(a) Original message (53 kB): GIF 300×236, 256 colors, 51 kB; text + SMIL, 2 kB.
(b) Adapted message (36 kB): GIF 300×236, 32 colors, 34 kB; text + SMIL, 2 kB. Terminal capabilities: MmsMaxMessageSize = 40 kB, MmsMaxImageResolution = 320×240, MmsCcppAccept = image/jpeg, image/GIF.
(c) Adapted message (17 kB): GIF 150×118, 256 colors, 15 kB; text + SMIL, 2 kB. Terminal capabilities: MmsMaxMessageSize = 30 kB, MmsMaxImageResolution = 160×120, MmsCcppAccept = image/jpeg, image/GIF.
work to create the different versions and establish the selection rules. Transcoding works well for automatic adaptation of simple media content but often fails when the content is more sophisticated, such as a Webpage with a complex layout. Transcoding also often requires more processing resources and may raise legal issues. But it may be the only solution if the origin doesn't perform adaptation.
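The transcoding decisions illustrated in Figure 7.20 can be sketched as follows. This is a deliberately simplified model: the size-reduction factors are crude assumptions chosen only so the numbers roughly track the figure; a real transcoder would re-encode the image and measure the actual result.

```python
# Simplified sketch of the MMSC adaptation logic behind Figure 7.20.
# Size-estimation factors are illustrative assumptions, not measurements.

def adapt(image, other_kb, caps):
    """Downscale and/or reduce colors until the message fits the terminal."""
    w, h = image["resolution"]
    max_w, max_h = caps["MmsMaxImageResolution"]
    # Step 1: halve the resolution while it exceeds the terminal's maximum.
    while w > max_w or h > max_h:
        w, h = w // 2, h // 2
        image = {**image, "resolution": (w, h), "kb": image["kb"] * 0.3}
    # Step 2: if the message is still too large, reduce the color depth.
    while image["kb"] + other_kb > caps["MmsMaxMessageSize_kb"] and image["colors"] > 2:
        image = {**image, "colors": image["colors"] // 8, "kb": image["kb"] * 0.67}
    return image

gif = {"resolution": (300, 236), "colors": 256, "kb": 51}
# Terminal (b): 40 kB limit, 320x240 -- resolution fits, so only colors are reduced.
print(adapt(gif, other_kb=2, caps={"MmsMaxImageResolution": (320, 240),
                                   "MmsMaxMessageSize_kb": 40}))
# Terminal (c): 30 kB limit, 160x120 -- halving the resolution alone suffices,
# so 256 colors are retained.
print(adapt(gif, other_kb=2, caps={"MmsMaxImageResolution": (160, 120),
                                   "MmsMaxMessageSize_kb": 30}))
```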
7.8 STANDARDIZATION AND FUTURE WORK
Content adaptation is a topic of high importance and interest, and it is expected to be a key part of future mobile applications. Several standardization activities are underway to shape new adaptation technologies and services, including those of OMA, the ICAP Forum, MPEG, and W3C. This section briefly presents some of those activities.

In OMA, the MMS working group has introduced the concept of message classes and established minimum adaptation requirements between those classes to be supported by all MMSCs. For instance, MMSCs must support image resolution and size adaptation for formats such as JPEG. These requirements will be part of OMA MMS version 1.2 [60]. A new working group called standard transcoding interface (STI) was also formed in OMA to define a common transcoding interface between multimedia application servers (MMSC, browsing server, downloading server) and a transcoding server. More information can be found at the OMA Website [19].

The Internet Content Adaptation Protocol [49] is another protocol providing application servers with the possibility of making transformation requests to another server. The protocol is HTTP-based. ICAP was designed to support transformation services such as language translation, virus checking, family (PG/R/X) content filtering, local real-time ad insertion, wireless protocol translation, anonymous Web usage profiling, transcoding, and image enhancement. The work of ICAP concentrates mostly on the architecture and the transfer of request/response attributes and data between servers. Regarding transcoding, however, it appears to be assumed that the transcoding server knows what transformations need to be performed on the content. OMA's STI, on the other hand, will define more precisely in the interface what requirements the adapted content must meet (size, formats, etc.) and possibly some preferences on how adaptation should be done.
The vision of MPEG-21 [50] is to define a multimedia framework to enable transparent use of multimedia resources across a wide range of networks and devices used by different communities. MPEG-21 leverages the already existing MPEG standards, such as MPEG-1, -2, and -4 for audiovisual representation and the XML-based MPEG-7 for content description. MPEG-21 contains several parts, including digital item adaptation (DIA). In MPEG-21, the adaptation engines themselves are nonnormative tools of DIA, but the media descriptions and format-independent mechanisms that provide support for DIA are normative. Specifically, the following item descriptions are under MPEG-21's scope: user characteristics, terminal capabilities, network characteristics, natural environment characteristics, resource adaptability, and session mobility.
The World Wide Web Consortium (W3C) [46] develops interoperable technologies (specifications, guidelines, software, and tools) for the Web. It is the most important standards forum in the area of Web markup languages. Within the W3C, two working groups are especially interesting from a content adaptation point of view: the Device Independence working group and the CC/PP working group.
. The Device Independence working group [47] studies issues related to authoring, adaptation, and presentation of Web content and applications that can be delivered effectively through different access mechanisms.
. The CC/PP working group develops a framework for the management of device profile information [48]. The group is chartered to deliver a framework that allows the user to plug in different vocabularies. A vocabulary provides naming and syntax for device properties such as screen size, markup support, or browser version. UAProf [14] (specified by the WAP Forum, whose work is now continued within the Open Mobile Alliance) is probably the most relevant example of such a vocabulary. UAProf also specifies how profile information is attached to HTTP/WSP requests.

REFERENCES

1. 3GPP TS 26.071, Mandatory Speech Codec Speech Processing Functions; AMR Speech Codec; General Description.
2. ISO/IEC 14496-3:2001, Information Technology—Coding of Audio-Visual Objects—Part 3: Audio.
3. Digital Compression and Coding of Continuous-Tone Still Images, ISO/IEC IS 10918-3, ITU-T Recommendation T.84, 1990 (JPEG specification).
4. Graphics Interchange Format, Version 89a, Programming Reference, CompuServe, Inc., 1990; http://256.com/gray/docs/gifspecs.
5. T. Boutell et al., PNG (Portable Network Graphics) Specification Version 1.0, IETF RFC 2083, March 1997.
6. Multiple-image Network Graphics, http://www.libpng.org/pub/mng/.
7. ISO/IEC 15444-1 (2000), Information Technology—JPEG 2000 Image Coding System: Core Coding System, Part 1.
8. W3C Working Draft, Scalable Vector Graphics (SVG) 1.1 Specification, http://www.w3.org/TR/SVG11, Feb. 2002.
9. W3C Recommendation, Mobile SVG Profiles: SVG Tiny and SVG Basic, http://www.w3.org/TR/SVGMobile.
10. ITU-T Recommendation H.263, Video Coding for Low Bit Rate Communication.
11. ISO/IEC 14496-2:2001, Information Technology—Coding of Audio-Visual Objects—Part 2: Visual.
12. R. Mohan, J. R. Smith, and Chung-Sheng Li, Adapting Internet multimedia content for universal access, IEEE Trans. Multimedia, 1(1): 104–114 (March 1999).
13. W3C Working Draft, CC/PP Structure and Vocabularies, http://www.w3.org/Mobile/CCPP/Group/Drafts/WD-CCPP-struct-vocab-20010620/, June 2001.
14. OMA (formerly WAP Forum), WAP UAProf Specification, http://www1.wapforum.org/tech/documents/WAP-248-UAProf-20011020-a.pdf, Oct. 2001.
15. W3C Candidate Recommendation, Resource Description Framework (RDF) Schema Specification 1.0, http://www.w3.org/TR/2000/CR-rdf-schema-20000327, March 2000.
16. 3GPP TS 22.140 V6.5.0, Multimedia Messaging Service (MMS); Technical Specification Group and System Aspects, Stage 1 (Release 5), http://www.3gpp.org/ftp/Specs, March 2004.
17. 3GPP TS 23.140 V6.5.0, Multimedia Messaging Service (MMS); Functional Description, Stage 2 (Release 6), http://www.3gpp.org/ftp/Specs, March 2004.
18. 3GPP TS 26.140 V5.2.0, Multimedia Messaging Service (MMS); Media Formats and Codecs, http://www.3gpp.org/ftp/Specs, Dec. 2002.
19. Open Mobile Alliance (OMA), http://www.openmobilealliance.org.
20. CMG, Ericsson, Nokia, Sony-Ericsson, Comverse, Logica, Siemens, Motorola, MMS Conformance Document, Version 2.0.0, Feb. 2002, http://www.forum.nokia.com.
21. W3C Recommendation, Synchronized Multimedia Integration Language (SMIL 2.0), http://www.w3.org/TR/2001/REC-smil20-20010807/, Aug. 2001.
22. W3C Recommendation, XHTML Basic, http://www.w3.org/TR/2000/REC-xhtml-basic-20001219, Dec. 2000.
23. B. C. Smith and L. Rowe, Algorithms for manipulating compressed images, IEEE Comput. Graph. Appl., 34–42 (1993).
24. N. Merhav and V. Bhaskaran, A transform domain approach to spatial domain image scaling, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP-96), Vol. 4, 1996, pp. 2403–2406.
25. S. Coulombe, G. Grassel, and P. Hjort, Multimedia messaging—the evolution of SMS to MMS, in Mobile Internet Technical Architecture—Technologies and Standardization, IT Press, 2002; http://www.itpress.biz/.
26. OMA (formerly WAP Forum), WAP Wireless Application Environment, Nov. 4, 1999.
27. 3GPP TS 23.140, Release 1999, ftp://www.3gpp.org/ftp/Specs.
28. OMA (formerly WAP Forum), XHTML Mobile Profile, http://www1.wapforum.org/tech/terms.asp?doc=WAP-277-XHTMLMP-20011029-a.pdf, Oct. 2001.
29. Eizel Technologies, Inc., Amplifi Enterprise Server, http://www.eizel.com.
30. W3C Recommendation, HTML 4.01 Specification, http://www.w3.org/TR/1999/REC-html401-19991224/, Dec. 1999.
31. W3C Working Draft, XHTML 2.0, http://www.w3.org/TR/2003/WD-xhtml2-20030506/, May 2003.
32. W3C Candidate Recommendation, CSS Media Queries, http://www.w3.org/TR/css3-mediaqueries/, July 2002.
33. W3C Recommendation, Cascading Style Sheets Level 2—CSS2 Specification, http://www.w3.org/TR/1998/REC-CSS2-19980512/, May 1998.
34. OMA (formerly WAP Forum), Wireless Markup Language Specification 1.3, http://www1.wapforum.org/tech/terms.asp?doc=WAP-191-WML-20000219-a.pdf, Feb. 2000.
35. M. Hudley, A Framework for Multilingual, Device-Independent Web Sites, Sun Developer Connection, April 2001, http://wwws.sun.com/software/xml/developers/xmlldijsp/framework.html.
36. W3C, The Extensible Stylesheet Language Family (XSL), http://www.w3.org/Style/XSL/.
37. W3C Recommendation, XSL Transformations (XSLT) Version 1.0, http://www.w3.org/TR/1999/REC-xslt-19991116, Nov. 1999.
38. OMA (formerly WAP Forum), WAP 2.0 Specifications, http://www.wapforum.org/what/technical.htm.
39. Oracle Corporation, Oracle 9iAS Wireless, http://otn.oracle.com/products/iaswe/content.html.
40. Apache Software Foundation, Apache Cocoon, http://cocoon.apache.org/2.0/.
41. Internet Mail Consortium, vCard and vCalendar, http://www.imc.org/pdi/.
42. Sun Microsystems, Inc., Mobile Information Device Profile, http://java.sun.com/products/midp/.
43. Nokia Corporation, Series 60 Platform, http://www.forum.nokia.com/series60.
44. PalmSource, Inc., Palm OS, http://www.palmsource.com/.
45. Microsoft Corporation, Pocket PC, http://www.pocketpc.com/.
46. World Wide Web Consortium, http://www.w3.org/.
47. W3C Device Independence Working Group, http://www.w3.org/2001/di/Group/.
48. W3C CC/PP Working Group, http://www.w3.org/Mobile/CCPP/Group/.
49. Internet Content Adaptation Protocol (ICAP) Forum, http://www.i-cap.org/.
50. MPEG (Moving Picture Experts Group), ISO/IEC JTC1/SC29 WG11, http://mpeg.telecomitalialab.com/standards/mpeg-21/mpeg-21.htm.
51. Sun Microsystems, Inc., JavaServer Pages Technology, http://java.sun.com/products/jsp/.
52. OMA, SyncML—Data Synchronization and Device Management, http://www.openmobilealliance.org/syncml/.
53. R. Fielding, J. Gettys, J. Mogul, H. Nielsen, and T. Berners-Lee, Hypertext Transfer Protocol—HTTP/1.1, IETF RFC 2068, Jan. 1997.
54. OMA (formerly WAP Forum), WAP Wireless Session Protocol Specification, July 2001.
55. ETSI, Digital Cellular Telecommunications System (Phase 2); International Mobile Station Equipment Identities (IMEI), ETS 300 508 (GSM 02.16 version 4.7.1), Nov. 2000.
56. EU Project Consensus, IST 2001 32407, http://www.consensus-online.org; software to be released early 2004.
57. W3C Recommendation, Extensible Markup Language (XML) 1.0, 2nd ed., Oct. 2000.
58. 3GPP2 X.S0016-000-A, 3GPP2 Multimedia Messaging System, MMS Specification Overview, Revision A, http://www.3gpp2.org, May 2003.
59. 3GPP2 C.S0045-0, Multimedia Messaging Service (MMS) Media Format and Codecs for cdma2000 Spread Spectrum Systems, http://www.3gpp2.org, Dec. 2003.
60. Open Mobile Alliance, OMA Multimedia Messaging Service v1.2, http://www.openmobilealliance.org, Sept. 2003.
CHAPTER 8

CONTENT SYNCHRONIZATION

GANESH SIVARAMAN
Nokia, Helsinki, Finland

8.1 INTRODUCTION
Content synchronization has been used since the early 1990s for database replication. The Internet is a good example of where database replication is widely used. Widely used content, such as Webpages, files, and emails, is stored and frequently updated on central servers, such as Web servers, file servers, and mail servers. Having all content in one central server works well if the Internet traffic is low and if most users consuming the content are in the same location as the server. But in the Internet, that is not the case. Users consuming the content are distributed across various locations, and in such a situation accessing the content from a central server is not just slow but also unreliable, as the central server may encounter a failure and the user will not have access to the data he/she desires. Hence, it is very common to have the content distributed across numerous servers, known as "mirror servers." The mirror servers need not be connected at all times with the central server. The process used for distributing the content is replication, which is a type of synchronization. Replication allows exchange of information from a central data store that holds all content with other servers that may not be connected with the central server at all times. Replication copies content from the central server to other servers, and when changes are made to the central server, these changes are also communicated and exchanged with the other servers. Replication is simply synchronization of the content residing in the central server's database with other servers' databases, thereby ensuring that all servers have identical data. An important point to note and understand in the case of synchronization is that the content stored at a central location is copied and stored locally for access and modification.
Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.

The local storage of the data provides certain advantages:
. Load balancing—with many corporations and service providers, the number of users accessing the data is huge, and the users are usually physically
located in different geographic locations. In such cases, it is very useful and important that users be able to access the data fast.
. Fault tolerance—storing all data in one server can be very risky; one could essentially lose all the data in the event of a system failure.
. Offline access—in order to access and/or modify the data, it is much easier to do it off line with the "local copy," rather than accessing the "master copy" for every change that occurs.

As seen above, data are not in one server and one database, but rather in different servers and different databases, possibly in different geographic locations, and all servers need not be connected to the central server at all times, as shown in Figure 8.1. Since servers may be disconnected, changes and actions made or taken during the disconnected state may be unknown to either side. Hence, for such a distributed setup of databases, synchronization between all the servers and their databases is of utmost importance to ensure that all changes and actions that have taken place on the data are exchanged and that the datasets among all servers are in the same state. Synchronization enables all the distributed copies of the central data store to remain consistent by communicating the changes between each copy and the central database and by resolving conflicts that may arise when contradictory changes occur on the same data of the database. Just as synchronization is very important for computers and the wired world, it is also very important for mobile devices and the wireless world. This chapter explores content
Figure 8.1 Replication system.
synchronization in detail from a mobile device and wireless network perspective. Later, we will briefly discuss an open synchronization standard, which provides a complete, open, and interoperable synchronization solution for mobile devices by taking into account the various mobile and wireless requirements.
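The replication scheme described above can be sketched in a few lines. This is a minimal illustration under assumed names: the central server records an ordered change log, and each mirror pulls only the changes made since its last replication; real systems add versioning, ordering guarantees, and failure handling.

```python
# Minimal sketch of replication as one-way synchronization: the central
# server records changes, and each mirror applies only the deltas.
# All class and attribute names are illustrative.

class CentralServer:
    def __init__(self):
        self.data = {}
        self.log = []               # ordered change log: (key, value)

    def put(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class Mirror:
    def __init__(self):
        self.data = {}
        self.applied = 0            # position reached in the central change log

    def pull(self, central):
        """Apply only the changes made since the last replication."""
        for key, value in central.log[self.applied:]:
            self.data[key] = value
        self.applied = len(central.log)

central = CentralServer()
mirror = Mirror()
central.put("index.html", "v1")
mirror.pull(central)
central.put("index.html", "v2")   # change made while the mirror is disconnected
mirror.pull(central)              # the next replication brings the mirror up to date
print(mirror.data)                # {'index.html': 'v2'}
```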
8.2 WHY MOBILE DEVICES NEED SYNCHRONIZATION
The first generation of mobile phones, introduced for commercial use in 1978, was designed for voice traffic. In those early days, mobile phones were bulky, very expensive, and limited to a small coverage area, making roaming impossible, and they were hardly capable of supporting any of the high-end services and features used today. But with the rapid development of mobile technologies, the second generation (2G) of mobile telephony brought better quality in the form of digital cellular networks and wide coverage that allowed mobile users to roam easily and conveniently from network to network. All this allowed for a high proliferation of 2G-based phones. With more and more people using mobile phones, it became natural to go beyond voice to data-centric services, including browsing, messaging, streaming video and audio, and synchronization. Mobile phones are now viewed as more than merely a means of telephonic communication; instead, they are gaining status as mobile computing devices. Current mobile phones are feature-rich and create a whole new service industry for mobile phone users. Many large corporations employ a mobile workforce that represents the sales and support organization. For such groups, a mobile computing device comes in very handy, as it allows one to gain access in real time to the backend systems for stock status updates and other vital information needed by a mobile workforce on the move. In most cases, mobile users are not always connected to the network and its stored data. Thus mobile users retrieve data from the network and store them locally on the mobile device, where they access and manipulate the local copy of the data. From time to time, users reconnect with the network to send any local changes back to the networked data repository. During this operation, users may also have the opportunity to learn about updates made to the networked data while the device was disconnected.
At certain times, they also need to resolve conflicts among the updates made to the networked data. This reconciliation operation—where updates are exchanged and conflicts are resolved—is known as data synchronization. In other words, data synchronization is the process of making two datasets appear identical. Synchronization is very important for a mobile workforce where data are typically modified and updated locally in a disconnected or offline state. In this state, changes and updates made on the device side are unknown to the server, and changes made to the server are unknown to the client. As an example, mobile phone users are increasingly using their phones to access corporate email and calendar systems. Mobile phones are not typically connected to these systems at all times, but the users can connect to them or "go on line" when needed. For such applications, where data reside on both the client and server sides and the users are off
line for long periods, synchronization is of utmost importance. Figure 8.2 illustrates a scenario for mobile device synchronization.
8.3 FUNDAMENTAL PRINCIPLES OF SYNCHRONIZATION
To provide the reader with a greater understanding of content synchronization, it is important to explain some fundamental principles that form the basis for content synchronization. As seen earlier, synchronization allows "exchanging" (reversal or modification) of changes that have occurred between databases that store the same dataset. This section explores how synchronization takes place and provides some details on the elements that are needed for synchronization.

8.3.1 Types of Synchronization

8.3.1.1 One- versus Two-Way Synchronization
One-Way Synchronization. As the term suggests, one-way synchronization (Fig. 8.3) allows only one side to communicate and send the changes to the data stored by that side. Which side should send the changes depends on the implementation and the configuration of the system. Typically, a content replication system, such as file replication or Web content replication, could be viewed as one-way synchronization, where the central server always propagates to the other servers all the changes that have occurred since the previous synchronization.

Two-Way Synchronization. Two-way or bidirectional synchronization (Fig. 8.4) allows both sides to exchange all changes and is the most common type of content synchronization. Typically, a dataset copied from a central database and stored locally is modified, and new data may also be added to the dataset. As an example, in a corporate environment, most employees would download (essentially copy) the emails from the mail server and store them locally. Once they are available on an employee's system, the employee would modify them by reading, deleting, or even creating new emails. These actions are performed mostly "off line," unknown to the central server, in this case the mail server.
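Two-way synchronization can be sketched as an exchange of each side's changes since the last sync. The sketch below is illustrative, not a protocol implementation: datasets are plain dictionaries, each side's "change log" is simply the set of keys it modified, and keys changed on both sides are reported as conflicts rather than resolved here.

```python
# Sketch of two-way (bidirectional) synchronization: each side sends only
# the items it changed since the last sync. Conflict handling is deliberately
# left out; only the conflicting keys are reported.

def two_way_sync(client, server, client_changes, server_changes):
    """Exchange changes in both directions; return keys changed on both sides."""
    conflicts = client_changes & server_changes
    for key in client_changes - conflicts:   # client -> server
        server[key] = client[key]
    for key in server_changes - conflicts:   # server -> client
        client[key] = server[key]
    return conflicts

client = {"alice": "555-1234", "bob": "555-0000"}
server = {"alice": "555-1234", "carol": "555-9999"}
conflicts = two_way_sync(client, server,
                         client_changes={"bob"}, server_changes={"carol"})
print(client)     # both sides now hold bob and carol
print(server)
print(conflicts)  # set(): no item was changed on both sides
```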
Figure 8.2 Mobile device synchronization.
Figure 8.3 One-way synchronization.
At the same time that the employee is modifying the emails off line, the mail server may be receiving new emails. Hence, in such scenarios, modifications that occur on both sides need to be communicated and exchanged to ensure that the datasets in both databases are in the "same state."

8.3.1.2 Slow versus Fast Synchronization

One- and two-way synchronization may be either fast or slow.

Slow Sync (Full Sync). This type is seldom used. With it, all content stored in the database is synchronized. Since the entire database is synchronized, this is used only in cases where the device and the server synchronize for the first time
Figure 8.4 Two-way synchronization.
and/or the database is unable to detect changes, which may be due to internal failure, database corruption, or some other anomaly.

Fast Sync (Delta Sync). This type is more commonly used. It exchanges only the changes that have occurred since the previous synchronization; hence the term delta synchronization. This is very useful, as it enables one to detect the changes that have occurred in the stored content and send only those changes, as opposed to sending the entire database. As an example, suppose a user has just synchronized a new contact created on his device with his corporate server. After synchronization, he realizes that the phone number he entered was incorrect, so he updates the new entry. By using the fast sync type, the next time he starts a sync he will be able to send only the entry he updated, as opposed to the entire contacts database.

8.3.2 Change Detection
Change detection is very important for content synchronization. Without change detection, the synchronization would be a slow one, where the entire contents of the database, as opposed to only the changes, are synchronized. In databases, change detection is built into the system, and there are numerous means to detect the changes that have occurred. The details of how change detection can be implemented are beyond the scope of this chapter.

8.3.3 Conflict Detection and Resolution
Conflict detection and resolution are as important for content synchronization as change detection. Conflicts occur whenever identical items residing in two different databases are changed and then synchronized. As an example, a contact created on the device is synchronized with the server. After synchronizing, the user updates the contact on the device and also updates the same contact on the server. When such situations occur, conflicts are encountered during synchronization. Hence, to resolve such situations, conflict detection and resolution are important. Conflict resolution can be done in various ways and can be based on certain rules:
. The client always wins, essentially overriding changes made on the server.
. The server always wins, essentially overriding changes made on the client.
. Create duplicates.
. The latest changes win.
. Merge the changes.
Conflicts can be resolved either by the system, on the client side or the server side, or by the user, who is presented with a dialog that reports the conflicts. According to the information presented in the dialog, the user can resolve the problem by selecting one of the rules listed above.
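The resolution rules listed above can be sketched as a single dispatch function. The rule names mirror the list; the item structure, the modification timestamps, and the merge policy (client fields override the server's) are illustrative assumptions.

```python
# Sketch of rule-based conflict resolution for an item changed on both sides.
# Item fields, timestamps, and the merge policy are illustrative assumptions.

def resolve(rule, client_item, server_item):
    if rule == "client_wins":
        return client_item
    if rule == "server_wins":
        return server_item
    if rule == "latest_wins":
        return max(client_item, server_item, key=lambda i: i["modified"])
    if rule == "duplicate":
        return [client_item, server_item]      # keep both copies
    if rule == "merge":
        return {**server_item, **client_item}  # client fields override the server's
    raise ValueError(f"unknown rule: {rule}")

client_item = {"name": "Alice", "phone": "555-1234", "modified": 200}
server_item = {"name": "Alice", "email": "alice@example.com", "modified": 100}
print(resolve("latest_wins", client_item, server_item))  # the client's newer copy
print(resolve("merge", client_item, server_item))        # union of both updates
```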
8.4 ADOPTION OF SYNCHRONIZATION FOR MOBILE DEVICES
Although the basic principles of synchronization are also applicable to mobile device synchronization, some considerations are necessary when synchronizing mobile devices. It is well known that mobile devices have resource constraints: limited processing power, limited battery life, limited processing and storage memory, and a modest data rate [although higher data rates will soon be available with 3G and EDGE (enhanced data rates for GSM evolution)]. Also, the cost of airtime that mobile users have to pay is a crucial factor that needs to be considered when designing applications for mobile devices. As with any other mobile application, synchronization has to be adapted for the mobile environment by addressing the aforementioned constraints posed by mobile devices. This section explains how the basic principles of synchronization are applied to mobile devices and discusses some of the special requirements for mobile devices in synchronization applications.

8.4.1 Synchronization Scenarios for Mobile Devices
The synchronization scenario depends on which access medium, or "bearer," is used to connect the mobile devices with the servers. Typically, synchronization of mobile devices has been based on local connectivity media, such as cable or infrared, as defined by the IrMC specification. But with local synchronization the possibilities are limited, as the mobile device is able to synchronize only with a local server, which restricts the mobility of the user. Instead of such a restrictive synchronization scenario, over-the-air or remote synchronization allows the user to synchronize from anywhere, at any time, providing a true mobile synchronization solution and allowing the user to initiate sync whenever needed.

Local Synchronization. This is the most common and widely used synchronization scenario, as shown in Figure 8.5. There are many proprietary solutions available that synchronize mobile devices, such as PDAs and mobile phones, connected via serial or universal serial bus cable, infrared, or Bluetooth with applications running on desktop computers. The synchronization application running on the desktop computer acts as a local synchronization server that allows mobile devices to synchronize with applications running on that computer. The most commonly used applications are Lotus Notes and Microsoft Outlook, which store content from mobile devices locally.

Over-the-Air Sync (Remote Sync). This scenario (see Fig. 8.6) allows mobile users to initiate synchronization any time, anywhere. This gives users great flexibility to learn of changes that have been made at the server and also to communicate changes made by the user on the device at any time. For over-the-air synchronization, mobile devices, such as PDAs or mobile phones, use well-known wireless network access media such as GSM/GPRS or wireless LAN, with TCP/IP or WAP [9] protocols. There are both proprietary and open-standard-based over-the-air synchronization solutions available.

Development of the open-standard-based synchronization solution
Figure 8.5 Local synchronization based on short-range bearers.
started with the SyncML Initiative, which joined the Open Mobile Alliance in November 2002. Standardization of the open synchronization solution is still being carried out in the OMA [6] Data Synchronization Working Group. (OMA standardization is discussed in further detail in Section 8.5.1.)

8.4.2 Adhering to Mobile Device Constraints
Mobile devices are known to have limited resources, so these constraints should be taken into account when designing any application for mobile devices. Although the fundamental principles of synchronization are applicable even to mobile devices, certain exceptions must be made in order to address the constraints of mobile devices. As with most other mobile applications, synchronization can be based on a client–server architecture, where the server handles all the major functionality. This is true not only for mobile applications but for desktop applications as well. This allows for great savings in memory by implementing a simple synchronization engine for mobile devices that will satisfy the needs of simple applications such as personal information management (PIM), which includes calendar and contact synchronization. But for complex applications, such as relational database applications, a more complex synchronization engine would be required on mobile devices, which will certainly require more memory. Mobile devices may typically synchronize with more than one server. As an example, a mobile user may synchronize a single calendar database with the corporate server and with the portal server for maintaining business appointments and
Figure 8.6 Remote synchronization based on over-the-air bearers.
family appointments that he shares with his family members separately. In such cases, logging changes for each data store per server may require considerable storage memory or static memory. This change log is needed to record which items have changed in the database between synchronization sessions. During the synchronization session the change log is consulted to determine what has changed, and these changes are communicated accordingly. The actual implementation details of the change log are beyond the scope of this chapter. It is worth noting that the change log can grow significantly and could consume a considerable amount of static memory storage. One way to reduce the storage for the change log is to define a maximum limit, but this would restrict the number of changes that can be made between sessions; this is a tradeoff that one may have to accept. Because mobile devices have limited processing capabilities and limited processing and storage memory, complex operations, such as conflict detection and resolution, will inevitably overload mobile devices. Such operations must be carried out on the server side. By employing the client–server architecture, it is easy to move such operations to the server side without losing the functionality of synchronization. An important aspect of synchronization is the use of a local unique identifier and a global unique identifier. Identifiers (IDs) are necessary to address each item uniquely in a database. The database in a mobile device is typically limited in terms of the size of the IDs used for the items that it stores. It is not possible to match the ID length used by the server, which is typically much longer than that used by the client. Hence, items that are created on the server cannot be synchronized with the same ID as used on the server, since the length of the ID will not fit the size defined by the client.
For this purpose, the client creates an ID of its own when a new item created on the server is synchronized to the client. For all subsequent operations on this item, such as changes or deletes, the server must address the item with the ID assigned by the client, not with the server's own ID. To establish this, a special operation called mapping is carried out after synchronization. In mapping, the client sends map information that pairs the client's ID, the local unique identifier, with the server's temporary ID. The server must maintain this mapping information for the entire life of the item. More details on mapping can be found in the next section.
8.5 SYNCHRONIZATION STANDARD
There is a proliferation of different, proprietary, noninteroperable data synchronization protocols for mobile devices. Each of these protocols is available for only selected transports, implemented on a selected subset of devices, and able to access only a small set of networked data. The absence of a single synchronization standard poses many problems for end users, device manufacturers, application developers, and service providers. To address this problem, the SyncML Initiative was formed to develop and promote a single, common data synchronization protocol that could be used industrywide. Driving this initiative were Ericsson, IBM, Lotus, Motorola, Nokia, Palm Inc.,
Psion, and Starfish Software. The first specification version was released in December 2000 with a supporting reference implementation. In June 2002 the Open Mobile Alliance (OMA) [6] was created through the consolidation of the supporters of the open mobile architecture initiative and the WAP Forum [9]. The SyncML Initiative, the Location Interoperability Forum, MMS-IOP, and Wireless Village then joined OMA [6] in December 2002. Currently, the data synchronization standard is developed further in the OMA DS Working Group.

The synchronization standard provides an open, interoperable synchronization solution for a wide spectrum of mobile devices, such as low-end mobile phones, "smart" phones, and PDAs. All these mobile devices have resource constraints by definition, as seen above, but the level of constraint differs among them. Within this spectrum, low-end mobile phones typically have the tightest computing constraints, such as processing power and memory, whereas PDAs have the loosest. Along with resource constraints, the wireless access media used by mobile devices, such as GSM or WLAN [8], have high latency, a high error rate, and low data rates. Hence, these limitations were considered closely during the design of the standard. To further understand the technical aspects of data synchronization for mobile devices, it is necessary to comprehend the various specifications of OMA [6] data synchronization (formerly known as SyncML Data Synchronization).

8.5.1 OMA Data Synchronization Overview
For synchronization between two applications, the changes that have been made by both applications have to be communicated. It may also be necessary to reconcile conflicting changes that occur when changes are made concurrently. Hence, synchronization requires representing the changes in a format and structure understood by both sides and exchanging those changes according to rules with which both sides comply. The synchronization standard addresses this requirement with two fundamental specifications that form the basis of data synchronization: OMA representation and the OMA data synchronization protocol (Fig. 8.7). Both specifications are designed to account for all the mobile device requirements discussed in earlier sections. Additionally, as with other mobile or Internet application standards, synchronization has been designed to be impartial with respect to both bearer and content. Bearer neutrality is necessary so that the synchronization standard can be implemented on top of any data bearer, such as HTTP [3] for Internet/intranet access, WSP for wireless access, and OBEX for local connectivity over Bluetooth [7], infrared [4,5], or cable. Similarly, the standard must allow for the synchronization of any content, without arbitrary restrictions to a particular set of content types.

8.5.1.1 OMA Representation This is one of the two specifications that form the core of the synchronization standard. Representation defines a logical entity called a "package," which
Figure 8.7 OMA data synchronization framework: applications (calendar, contacts, email) use the representation and synchronization protocols over HTTP (Internet/intranet), WSP (WAP), or OBEX (IrDA, USB, Bluetooth).
encapsulates one or more synchronization messages, as shown in Figure 8.8. Each message is based on the eXtensible Markup Language (XML) [10], which defines the structure and format of the message; this provides the structural form for each message. The package is broken into a number of small messages to address the constraints of mobile devices. For example, certain wireless transports, such as WSP, do not allow segmentation of large objects and have a small protocol data unit size, so large data objects must be broken down into smaller message segments that comply with the size limitations defined by the underlying transport protocol.

Each message consists of a header and a body, which provide the information that synchronization applications need during the synchronization session. The header provides routing information, which consists of the server and client addresses, authentication credentials (username, password pair), and session information. The body encapsulates most of the important elements needed for synchronization. It defines all the synchronization commands, such as Add, Delete, Replace, and Sync, that are required to perform the various synchronization operations; alerts the database for synchronization; provides status information for all the operations; conveys mapping information from client to server; and, most importantly, conveys the data object, or payload, itself, as shown in Figure 8.9 [11].

Figure 8.8 Data synchronization packaging model.

Representation provides the syntax for synchronization applications by means of a document type definition (DTD). Applications must represent the changes that have occurred on the data as defined by the representation DTD; Figure 8.9 shows an example.

Figure 8.9 XML snippet of a synchronization message with capabilities [11, Sec. 9.11].
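The overall shape of such a message can be illustrated with Python's standard XML library. The element names (SyncML, SyncHdr, SyncBody, VerDTD, VerProto, SessionID, MsgID, Target, Source, LocURI) follow the SyncML 1.1 representation DTD, but the values are invented and the snippet is only a structural sketch, not a conforming implementation:

```python
import xml.etree.ElementTree as ET

def build_sync_message(session_id, msg_id, server_uri, client_uri):
    """Build the skeleton of a SyncML message: a header carrying routing and
    session information, and a body that will carry the sync commands."""
    root = ET.Element("SyncML")

    hdr = ET.SubElement(root, "SyncHdr")
    ET.SubElement(hdr, "VerDTD").text = "1.1"
    ET.SubElement(hdr, "VerProto").text = "SyncML/1.1"
    ET.SubElement(hdr, "SessionID").text = session_id
    ET.SubElement(hdr, "MsgID").text = msg_id
    target = ET.SubElement(hdr, "Target")        # where the message is going
    ET.SubElement(target, "LocURI").text = server_uri
    source = ET.SubElement(hdr, "Source")        # who is sending it
    ET.SubElement(source, "LocURI").text = client_uri

    ET.SubElement(root, "SyncBody")              # commands (Add, Replace, ...) go here
    return root

# Invented example addresses, in the spirit of Figure 8.9.
msg = build_sync_message("1", "1", "http://sync.example.org/sync",
                         "IMEI:004400112233445")
assert msg.find("SyncHdr/VerProto").text == "SyncML/1.1"
assert msg.find("SyncBody") is not None
```

Serializing the tree with `ET.tostring(msg)` yields the XML text that would be carried over the chosen bearer.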
8.5.1.2 OMA Data Synchronization Protocol Whereas representation defines the packaging model, the structure of the messages, and their syntax, the protocol specification sets the rules that both client and server must follow during the synchronization process. The protocol follows the fundamentals of synchronization discussed in Section 8.3 while factoring in the mobile device synchronization requirements presented in Section 8.4. It makes use of the representation DTD for creating synchronization messages but sets the rules for communication between the client and the server. The message sequence chart (MSC) shown in Figure 8.10 best explains this; it clarifies the different states maintained internally by the client and the server, both of which store state information locally. The protocol specification splits the entire synchronization process into three phases: (1) initialization; (2) exchanging changes and resolving conflicts, if any (essentially, synchronization); and (3) mapping. Each of these phases is explained in detail later in this section.

As seen in Section 8.4, mobile devices have certain constraints, which must be taken seriously whenever applications are designed for them. Although the protocol specification follows the basic principles of data synchronization, it is not feasible for low-end, or even high-end, mobile devices to support all the requirements of data synchronization as such. To provide a solution that respects all the fundamental principles and can be implemented by a wide spectrum of devices, the protocol specification separates the functionalities supported by client and server by defining distinct roles for the synchronization client and server. This approach allows for wider adoption of the protocol and certainly eases the implementation effort.
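The three phases can be condensed into a small driver with stub client and server objects. Everything here is illustrative (the class and method names are invented and no wire format is implied), but the ordering mirrors the protocol: initialization first, then the client's changes, then the server's changes, then the map:

```python
class StubClient:
    def propose_sync_type(self):
        return "fast-two-way"

    def collect_changes(self, sync_type):
        return {"c1": "replace"}                 # changes made since last session

    def apply(self, server_changes):
        # Server-added items arrive under temporary IDs; assign local LUIDs.
        return {tmp: f"luid-{i}" for i, tmp in enumerate(server_changes, start=1)}


class StubServer:
    def __init__(self):
        self.maps = {}

    def negotiate(self, proposed_type):
        return proposed_type                     # accept the client's proposal here

    def apply_and_analyze(self, client_changes):
        # A real server detects/resolves conflicts here; we answer with one add.
        return {"tmp1": "new item for the client"}

    def store_maps(self, maps):
        self.maps.update(maps)                   # kept for the life of each item


def run_sync_session(client, server):
    """Drive one synchronization session through its three phases."""
    # Phase 1: initialization (handshake; the server may NACK the sync type).
    agreed = server.negotiate(client.propose_sync_type())
    # Phase 2: synchronization (client sends first; server analyzes and replies).
    new_items = client.apply(server.apply_and_analyze(client.collect_changes(agreed)))
    # Phase 3: mapping (client reports the LUIDs it assigned to server additions).
    server.store_maps(new_items)
    return agreed


client, server = StubClient(), StubServer()
assert run_sync_session(client, server) == "fast-two-way"
assert server.maps == {"tmp1": "luid-1"}
```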
Figure 8.10 Message sequence chart between sync client and server.
The distinction between synchronization client and server allows a simple, lightweight implementation on the client and moves all the complex operations to the server. Although the client implementation is simple, it still provides a fully functional synchronization solution for low-end as well as high-end mobile devices. The protocol specification requires the client to send its modifications first and to be able to receive the server's responses to those modifications. The client must also be able to handle any changes made on the server side. The requirement that the client send its modifications first addresses the mobile device constraints by moving the synchronization analysis to the server side. Synchronization analysis is the process of comparing the changes made on the client side with those made on the server for the same item. Any conflicts that arise when the same item is modified on both client and server are detected, and possibly resolved, during the synchronization analysis. This is a complex operation, and in most cases clients, with their resource constraints, simply cannot handle it.

Different Phases in a Synchronization Session

Initialization.
This is a handshaking phase in which client and server exchange information and negotiate the conditions that will govern the rest of the synchronization session. Unless the handshake completes successfully, the other phases do not occur; a failure in the handshaking process may result in disconnection of the session. The information exchanged during initialization includes session information, authentication credentials and the type of authentication mechanism (username, password pair), alert information for synchronization, and device capability information. The alert and the device capabilities are the most important items in the initialization phase.

The alert conveys the synchronization type. As seen in Section 8.3.1, the basic types are two-way and one-way sync. Additionally, there are slow and fast variants, used with both one-way and two-way types. Typically, when a new mobile device initiates synchronization with a server for the first time, the type used is a slow, two-way sync, in which all contents of the client's database are exchanged with those of the server's database and vice versa. During initialization, if the alerted synchronization type is not supported or not acceptable for some reason, a NACK, a negative acknowledgment in the form of a status, is sent, and the session may be terminated. As an example, suppose that a client and a server have synchronized earlier and that the client has since made certain changes; the client therefore requests synchronization by alerting for a fast, two-way sync, to communicate only the changes to the server rather than the entire database. If the server is unable to accept this for some reason, it can always send a NACK and compel the client to initiate a slow, two-way synchronization.

The support levels of content types on client and server are not the same.
For instance, the client may not be able to support all the fields defined by a content type because of memory constraints. Also, in certain cases, not all the fields are
commonly used by the end user. A good example is the versit vCard [1] object, which specifies many fields that can be used when creating a contact in a phonebook; clients seldom implement all of them. Typically, the client supports a subset of what the server supports for a content type. Hence, in such situations it is imperative for the server, and in some cases also the client, to know exactly which content capabilities the other side supports. With this knowledge, the server knows up front, before sending an item, whether the client will be able to handle it.

For example, the versit vCard [1] allows contacts with or without photographs. Let's say that the server supports a photo field for contacts but the client does not. The user creates a contact on the server and adds a picture. When synchronization is initiated, the client stores the contact sent by the server without the photo, since it does not support that field. Later, the same contact is synchronized back to the server after the user modifies some details. The server now deletes the photo it stored for this contact, because the client did not send the photo as part of the contact. This behavior is not acceptable, as the client never issued an explicit delete for the photo field. One way to avoid this problem is to refrain from sending anything that is not supported, which requires knowing the device capabilities. For this purpose, OMA data synchronization specifies device information that allows the content capabilities supported by both client and server to be expressed. During the initialization phase the capabilities are exchanged by relaying this device information, as shown in Figure 8.9.

Synchronization. This phase performs all of the synchronization work.
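Returning to the photo-field scenario above: once capabilities have been exchanged, the fix amounts to filtering every outgoing item against the fields the peer announced. A minimal sketch (the field names follow vCard informally; the function is hypothetical):

```python
def filter_item_for_client(item, client_supported_fields):
    """Drop the fields the client cannot store, so the client never silently
    loses data it was sent and the server never misreads an omission as a delete."""
    return {field: value for field, value in item.items()
            if field in client_supported_fields}

contact = {"N": "Smith;John", "TEL": "+1-555-0100", "PHOTO": b"jpeg-bytes"}
# Capabilities the client announced in its device information (cf. Fig. 8.9).
client_fields = {"N", "TEL"}

sent = filter_item_for_client(contact, client_fields)
assert "PHOTO" not in sent            # never sent, so never "deleted" on sync-back
assert sent == {"N": "Smith;John", "TEL": "+1-555-0100"}
```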
The client and the server synchronize according to the outcome of the negotiations during the initialization phase; if both agreed on a slow, two-way synchronization, then this phase simply follows the agreement and sends the entire database content. Synchronizing only the changes, which is usually the fast synchronization type, requires support for change detection. The details of how change detection is implemented are beyond the scope of this chapter, but without a mechanism for it, client and server can perform only slow sync, which is not an efficient solution for mobile devices on networks with low data rates and high airtime costs, such as GSM/GPRS. It is during this phase that synchronization analysis is performed. As noted, this is a complex operation and is usually supported by the server. Synchronization analysis consists of conflict detection and resolution: whenever both client and server change the same data item, a conflict occurs, and the server resolves it by following the rules outlined in Section 8.3.3 [11].

Mapping. Mapping is required because of the constraints imposed on the length of the IDs used by mobile devices. IDs are unique identifiers used in a database to distinguish individual items. Typically, low-end devices use a unique identifier of 4 to 8 bytes. Servers, on the other hand, use much longer unique identifiers.
Figure 8.11 XML snippet of server sending synchronization message to client [11, Sec. 10.2.1].
Figure 8.12 XML snippet of client sending mapping message to server [11, Sec. 10.3.1].
Figure 8.13 Map table maintained by server (adapted from Ref. 11).
When a server adds an item, it cannot use the unique identifier from its own database; rather, it must add the item using a temporary identifier, as shown in the "Add" command in Figure 8.11. Even the temporary identifier cannot be long, as the client has to buffer it until the map operation is executed, so the server must comply with the temporary ID length acceptable to the mobile device. The mobile device accepts the item and adds it to its database. In doing so, the client's database generates a new ID for the added item: the client-side local unique identifier (LUID). This ID must be communicated to the server, since the server must use the client's ID to communicate all changes that may occur on the server side in the future. The mapping operation serves this purpose: the client sends a message pairing the server's temporary ID with the client's ID (LUID). On receiving this message, the server records the pairing in a mapping table that maps the client's ID to the server's ID (LUID mapped to GUID). The server must maintain the map for the entire life of the item. This is illustrated in Figures 8.12 and 8.13 (adapted from Ref. 11, Section 7.3).
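The flow just described can be condensed into a toy server-side map table (illustrative names only; in the protocol the client reports its LUIDs with the Map command):

```python
import itertools

class ServerMapTable:
    """Server-side table pairing its long GUIDs with the client's short LUIDs."""

    def __init__(self):
        self.guid_to_luid = {}
        self._tmp_ids = itertools.count(1)

    def add_to_client(self, guid):
        """Send a new item using a short temporary ID the client can buffer."""
        return f"tmp{next(self._tmp_ids)}"      # short, unlike the GUID itself

    def apply_map(self, temp_id, luid, guid):
        """Client reported: 'the item you sent as temp_id is stored under luid'."""
        self.guid_to_luid[guid] = luid          # kept for the life of the item

    def luid_for(self, guid):
        """All later Replace/Delete commands must address the client's LUID."""
        return self.guid_to_luid[guid]

table = ServerMapTable()
guid = "urn:uuid:0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0"   # invented GUID
temp = table.add_to_client(guid)
assert temp == "tmp1"
table.apply_map(temp, luid="12", guid=guid)              # the map operation after sync
assert table.luid_for(guid) == "12"
```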
8.6 SUMMARY
Content synchronization has been used widely for many years in the wired world, on devices with high computing power. Mobile computing devices, meanwhile, have attracted interest for various services and applications, and the mobile device has come to be regarded as indispensable, something users prefer to use for almost everything possible while on the move. Content synchronization is very important for mobile devices, especially since there are numerous
applications on a mobile device, such as email and the PIM system, that users want to keep up-to-date with backend systems. Since mobile devices are considered to be in a "disconnected state" most of the time, content synchronization is the way to ensure that the applications on mobile devices and the backend systems stay up-to-date.
REFERENCES

1. vCard, Electronic Business Card, http://www.imc.org/pdi/vcard-21.doc.
2. vCalendar, Electronic Calendaring and Scheduling Exchange Format, http://www.imc.org/pdi/vcal-10.doc.
3. Hypertext Transfer Protocol, HTTP/1.1, http://www.ietf.org/rfc/rfc2616.txt.
4. IrDA Object Exchange Protocol, http://www.irda.org/standards/specifications.asp.
5. Infrared Mobile Communications, http://www.irda.org/standards/specifications.asp.
6. Open Mobile Alliance, http://www.openmobilealliance.org.
7. Bluetooth Core Specification, http://www.bluetooth.com.
8. IEEE 802.11, Wireless Local Area Networks, http://grouper.ieee.org/groups/802/11.
9. WAP Forum, Wireless Session Protocol, http://www.wapforum.org/what/technical.htm.
10. Extensible Markup Language, http://www.w3.org/XML/.
11. SyncML Data Sync Protocol, Version 1.1.2, Open Mobile Alliance (OMA), OMA-SyncML-DataSyncProtocol-V1_1_2-20030612-A.
CHAPTER 9
MULTIMEDIA STREAMING IN MOBILE WIRELESS NETWORKS

SANJEEV VERMA, Nokia Research Center, Burlington, Massachusetts
MUHAMMAD MUKARRAM BIN TARIQ, DocoMo Communication Laboratories USA, Inc., San Jose, California
TAKESHI YOSHIMURA, Multimedia Laboratories, NTT DoCoMo, Inc., Yokosuka, Kanagawa, Japan
TAO WU, Nokia Research Center, Burlington, Massachusetts
9.1 INTRODUCTION
Multimedia services, such as streaming applications, are growing in popularity with advances in compression technology, high-bandwidth storage devices, and high-speed access networks. Streaming services are generally used in applications such as multimedia information and message retrieval, video on demand, and pay TV. Portable devices, such as notebook computers, PDAs, and mobile phones, have also grown in popularity in recent years, and emerging technologies like WLAN and 3G networks now make very high-speed access to portable devices possible. For instance, emerging 3G wireless technologies provide data rates of 144 kbps for vehicular, 384 kbps for pedestrian, and 2 Mbps for indoor environments [1,2]. Hence, it is now possible to enrich the end user's experience by combining multimedia services [3,4] with mobile-specific services such as geographic positioning, user profiling, and mobile payment. One example of such a service is "mobile cinema ticketing," which uses geographic positioning and user-defined

Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu
ISBN 0-471-46618-2 Copyright © 2004 John Wiley & Sons, Inc.
preferences to offer a mobile user a selection of movies from nearby movie theatres. The user views the corresponding movie trailers through a streaming service before selecting a movie and purchasing a ticket.

Streaming services are services in which continuous video and audio data are delivered to an end user. A multimedia streaming service consists of one or more media streams. A multimedia streaming application may have both audio and video components (e.g., news reviews, movie trailers), or it may combine audio streaming with a visual presentation comprising still images and/or graphics animations, such as a corporation's quarterly earnings Webcast. These applications are generally stored at a Web-based server and streamed to clients on request. Streaming audio/video clips are large enough that their transmission time (several minutes or longer) exceeds the acceptable playback latency; hence, downloading the entire audio/video content before playback is not an option. Instead, streaming audio/video clips are played out while parts of the clips are still being received and decoded. This is the biggest advantage of a streaming service: the user is able to see video soon after downloading begins.

Figure 9.1 illustrates a general architecture for providing streaming services [5]. The multimedia content for streaming services is created from one or more media sources (videocamera, microphone, etc.). It can also be created synthetically, without any natural media source; examples of synthetically generated multimedia content are computer-generated graphics and digitally generated music. The storage space required for raw multimedia content can be huge, so the multimedia content is digitally edited and compressed in order to provide attractive multimedia retrieval services over low-speed modem connections. The edited
Figure 9.1 A general architecture designed to provide streaming services.
and compressed multimedia clips are then stored in storage devices at the server. On receiving a request from the client, the streaming server retrieves the compressed multimedia clip from storage devices and the application layer QoS module adapts the multimedia stream based on the QoS feedback at the application layer. After adaptation at the application layer, transport protocols packetize the compressed multimedia clips and send them over the Internet. The packets may suffer losses and accumulate delay jitter while traversing the Internet. To further improve the QoS, continuous media distribution services (e.g., caching) may be deployed in the Internet. The successfully delivered media packets are decompressed and decoded at the client end. Compensation or playout buffers are deployed at the terminal end to mitigate the impact of delay jitter in the Internet and to achieve seamless QoS. Clients also use media synchronization mechanisms to achieve synchronization across different media streams, for example, between audio and video streams. There are several challenges in providing streaming services in wireless environments due to some issues that are specific to these environments (see Fig. 9.2). For example, wireless terminals typically have power constraints due to battery power. Also, they have limited buffering and processing power available due to size and power constraints. In addition, wireless environments are very harsh. The characteristics of a wireless channel have a unpredictable time-varying behavior due to factors such as interference, multipath fading, and atmospheric conditions. This results in more delay jitter, more delay, and higher error rates, compared to that in wired networks. Moreover, the mobility or the movement of a mobile user from one cell to another cell introduces additional uncertainty. The movement triggers a handoff mechanism to minimize interruption to an ongoing session. 
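The advantage of playing while downloading, noted above, is simple arithmetic; the sketch below compares startup delays for the two approaches (the clip length, bit rates, and buffer size are invented for illustration):

```python
def download_startup_delay(clip_bytes, bandwidth_bps):
    """Seconds before playback can begin if the whole clip is fetched first."""
    return clip_bytes * 8 / bandwidth_bps

def streaming_startup_delay(initial_buffer_s, bandwidth_bps, playback_rate_bps):
    """Seconds before playback can begin when playing while downloading.

    Only the initial buffer must arrive; this assumes the network can
    sustain the playback rate (bandwidth_bps >= playback_rate_bps).
    """
    assert bandwidth_bps >= playback_rate_bps
    return initial_buffer_s * playback_rate_bps / bandwidth_bps

# A 5-minute clip encoded at 384 kbps, fetched over a 384 kbps link.
clip_bytes = 300 * 384_000 // 8
assert download_startup_delay(clip_bytes, 384_000) == 300.0   # wait out the whole clip
# Streaming with a 5-second initial buffer starts playback after ~5 seconds.
assert streaming_startup_delay(5, 384_000, 384_000) == 5.0
```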
The wireless channel characteristics may be entirely different in the new cell after handoff. The access point (typically a basestation) connecting the mobile host to the wired network also changes after the handoff. This results in the establishment of an entirely new route in the wired network, and the new route in the fixed network may have different path characteristics. This problem becomes even more severe as wireless networks are
Figure 9.2 Constraints in wireless environments: mobile terminals have limited resources (power constraints, limited storage, limited processing) and face harsh wireless environments (high error rate, large and variable delay, expensive spectrum).
being implemented using smaller cell sizes (microcells) to allow higher system capacity. Microcell implementations result in rapid handoff rates, causing even wider variation in path characteristics.

These issues have implications for providing streaming services in mobile environments. A streaming architecture for wireless/mobile environments should ensure minimum processing at the mobile terminal end. For instance, the typical approach of performing QoS adaptation at the application layer may not be suitable in wireless environments: such adaptation involves a great deal of end-to-end signaling, which can eat away precious resources at the terminal end, and it is very difficult for mobile terminals with very limited processing and buffering capability to adapt at the application layer. The wireless network should instead have built-in network-wide mechanisms that minimize the resource and processing requirements placed on the mobile terminals. The overall design goal of a wireless access architecture should be to "make networks friendly to applications" rather than "make applications friendly to networks."

In the remainder of this chapter, we describe the different components and protocols that constitute the streaming architecture. First, we go over the QoS issues involved in supporting streaming services in general. We then give an overview of the various codecs and media types that form an important component of a multimedia streaming architecture. Next, we describe a general architecture for implementing streaming services in mobile environments, first reviewing the architectural components that support these services in wireless/mobile environments. In subsequent sections, we give an overview of the key protocols and languages used for streaming multimedia delivery, including their operation and example usage.
We then describe the packet-switched streaming service architecture developed by 3GPP (referred to as 3GPP-PSS), since it is the most mature standardization activity in this field; most likely, the 3GPP2 architectural solution will be along similar lines. Next, we discuss research issues and related work in providing multimedia services in mobile and wireless environments. Finally, we summarize and look into future trends in supporting multimedia services in broadband wireless access networks.
9.2 QoS ISSUES FOR STREAMING APPLICATIONS
Streaming applications are real-time, noninteractive applications involving one-way delivery of streaming data from the server to the client. Because of their real-time nature, these applications typically have bandwidth, delay jitter, and loss requirements. We first discuss the QoS parameters that are important for streaming applications and then the QoS control mechanisms at the application and lower layers.

Delay jitter [6] is particularly important for these applications. The delay jitter bound for a session is calculated as the difference between the largest and smallest delays incurred by packets belonging to the session. A client (receiver) should choose playback instants so that, when it is ready to output the information contained in a packet, the packet has already arrived. If the delay jitter over a
Figure 9.3 Client-side buffering: playout delay compensates for network-induced delay jitter.
network connection is bounded, the receiver can eliminate the delay jitter incurred by packets while traversing the network by providing a large enough playout, or compensation, buffer (see Fig. 9.3). The buffered packets are then scheduled for playout at the rate at which they were generated at the sender; packets that arrive earlier than their scheduled playout time wait in the playout buffer. Thus, the larger the delay jitter bound, the larger the playout buffer required at the receiver to maintain constant quality. For a given delay jitter bound, the required playout buffer size is the product of the delay jitter bound and the playback rate. Figure 9.4 illustrates the removal of delay jitter at a client.
Figure 9.4 Delay jitter removal at the client end.
Although the output rate of the server could vary with time, for simplicity we assume that the server generates packets at a constant rate, with equal spacing every I seconds. The receiver delays the first packet by the delay jitter bound J and then plays out packets with the same spacing with which they were generated. Suppose that the first packet arrives at the receiver d1 seconds after transmission and is further delayed in the playout buffer by an amount equal to the delay jitter bound J. The kth packet is generated after B = kI seconds, and this packet will incur a delay between dk seconds (the fixed delay, mainly propagation delay) and (dk + J) seconds. Since the client plays back the packets with the same spacing as when they were generated, the kth packet will be scheduled for playout at (d1 + J + B) seconds. Since d1 ≥ dk, the latest possible arrival of the kth packet, at (dk + J + B) seconds, is guaranteed to be before the scheduled time. Thus, by delaying packets in the playout buffer by the delay jitter bound, the receiver can eliminate jitter in the arrival stream and guarantee that a packet has already arrived by the time the client is ready to play it. Note that the playout buffer is useful only to absorb short-term delay variations. The more data initially buffered, the wider the variations that can be absorbed, but the higher the startup playback latency experienced at the client end. The maximum allowable buffering is determined by the acceptable playback latency.

Another important QoS parameter for a streaming application is the error rate. Although streaming applications can tolerate some loss, an error rate beyond a threshold can degrade the quality of the delivered streaming data significantly. To maintain reasonably good quality in the playback stream, a proper error control mechanism is needed to recover packets before their scheduled playback time. The well-known techniques to minimize error for streaming traffic are FEC, interleaving, and redundant transmissions.
In addition, the lost packets can be recovered through limited retransmissions. This necessitates buffering at the client end to allow for retransmissions. Now we look into the specific QoS control mechanisms at the application and lower layers to achieve the QoS needs of multimedia streaming applications.
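Before moving on, the playout-buffer argument above can be checked numerically. The sketch below schedules packet k at time d1 + J + kI and confirms that no packet arrives after its playout instant; all quantities are in seconds and the numbers are invented:

```python
def playout_schedule(first_arrival, J, I, num_packets):
    """Packet k (k = 0, 1, ...) is played out at first_arrival + J + k*I."""
    return [first_arrival + J + k * I for k in range(num_packets)]

# The sender emits a packet every I seconds; each packet's network delay
# lies within [d, d + J], where J is the delay jitter bound.
I, J, d = 0.02, 0.10, 0.05
send_times = [k * I for k in range(5)]
network_delays = [0.05, 0.13, 0.07, 0.15, 0.05]      # each within [d, d + J]
arrivals = [s + delay for s, delay in zip(send_times, network_delays)]

schedule = playout_schedule(arrivals[0], J, I, num_packets=5)
# Every packet has arrived by its playout instant: the jitter is absorbed.
assert all(a <= p + 1e-9 for a, p in zip(arrivals, schedule))

# Required playout buffer = delay jitter bound x playback rate (in bits here).
playback_rate_bps = 64_000
assert abs(J * playback_rate_bps - 6400) < 1e-6
```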
9.2.1 Application Layer QoS Control
The goal of application layer QoS control is to adapt at the application layer in order to provide acceptable-quality streaming service to the end user in the presence of packet loss and congestion in the network. We note here that the Internet in its current form is a best-effort network and does not provide network-wide QoS support: the available bandwidth is not known in advance and varies with time, and packets may suffer variable delay and arrive out of order at the client end. Clients need to adapt at the application layer in order to receive good-quality streaming service. Application layer QoS control techniques include end-to-end congestion and error control. These techniques are employed by the end systems and do not assume any support from the network.
9.2 QoS ISSUES FOR STREAMING APPLICATIONS
9.2.1.1 Congestion Control and Quality Adaptation  The Internet in its rudimentary form provides a transport network that delivers packets from one point to another. It provides a shared environment, and its stability depends on the end systems implementing appropriate congestion control algorithms. End-to-end congestion control helps to reduce packet loss and delay in the network. Unfortunately, streaming applications cannot readily implement TCP-style end-to-end congestion control, since stored multimedia typically has an intrinsic transmission rate. Streaming applications are rate-based: they transmit data at a near-constant rate or loosely adjust their transmission rate on long timescales, so the rapid rate fluctuations required of a well-behaved flow are not compatible with their nature. For streaming applications, congestion control takes the form of rate control, which attempts to minimize the possibility of congestion by matching the rate of the streaming media to the available network bandwidth. A vast majority of Internet applications implement TCP-based congestion control, which uses the additive increase, multiplicative decrease (AIMD) algorithm. Under this algorithm, the transmission rate is increased linearly until a packet loss signals congestion, at which point a multiplicative decrease is performed. TCP as it is, however, is not appropriate for delay-sensitive applications such as streaming. To ensure fairness and efficient utilization of network resources, rate control algorithms for streaming applications should be "TCP-friendly" [7-9]. This means that a streaming application sharing the same path with a TCP flow should obtain the same average throughput during a session. A number of model-based TCP-friendly rate control mechanisms [10] have been proposed for streaming applications. These mechanisms are based on mathematical models that relate the throughput of a typical TCP connection to the network parameters [7]:
l¼
1:22 MTU pffiffiffiffi RTT p
(9:1)
where

λ = throughput of the TCP connection
MTU = maximum transmission unit, the maximum packet size used by the connection
RTT = roundtrip time for the connection
p = packet loss rate experienced by the connection

Under the model-based approach, the streaming server uses Equation (9.1) to determine the sending rate of the streamed media so that it behaves in a TCP-friendly manner. The source basically regulates the rate of the streamed media according to feedback information from the network. This can be used in both unicast and multicast scenarios. However, a source-based rate control scheme is not suitable in heterogeneous network environments, where receivers have heterogeneous network capacity and processing power. Receiver-based rate control [11,12] has been found to be a better rate control mechanism in heterogeneous network environments. Under this mechanism, receivers
regulate the receiving rate of streaming media by adding or dropping channels, without any rate regulation from the source end. This is targeted toward scenarios where the source multicasts layered video with several layers. The basic scheme works as follows:

1. When no congestion is detected, a receiver joins (adds) a layer or channel, which increases its receiving rate. If the addition of a channel does not cause any congestion, the join experiment is deemed successful. Otherwise, the receiver drops the added layer or channel.
2. If congestion is detected, the receiver drops the lowest-priority layer or channel (an enhancement channel).

Alternatively, an architecture may use both source- and receiver-based control mechanisms [13], in which receivers regulate the receiving rate of streaming media by adding or dropping channels while the sender also adjusts the transmission rate of each channel according to feedback from the receivers. One of the main challenges in delivering streaming media to a client is adjusting to variations in network bandwidth while delivering acceptable-quality streaming media to the receiver. As discussed before, short-term variations in bandwidth can be handled by providing a playout or compensation buffer at the receiver. When the available bandwidth exceeds the playback rate, the spare data are stored in the playout buffer; when the available bandwidth is less than that required to maintain constant quality, the deficit is supplied from the spare data in the playout buffer (see Fig. 9.5). However, the bandwidth variations for a long-lived session can be large and random. This may cause the client's buffer to either underflow or overflow. Buffer underflow is particularly undesirable since it causes an interruption of service at the client's end. Rate control mechanisms
Figure 9.5 Short-term quality adaptation at the client. (The figure plots bandwidth against time: in the filling phase the transmission rate exceeds the playback rate and spare data are stored in the playout buffer; in the draining phase the deficit is supplied from the playout buffer as the available bandwidth from the network drops below the playback rate.)
discussed in the preceding paragraphs are one way to handle quality adaptation to long-term variations in network bandwidth. Alternative mechanisms are adaptive encoding and switching between multiple encoded versions. Under the adaptive encoding mechanism, the server adjusts the resolution of encoding by requantizing based on network feedback. However, this task is very CPU-intensive and does not scale to a large number of clients. Also, once the streaming data are compressed and stored, encoders cannot change the output rate over a wide range. In an alternative scheme, a server maintains several versions of each media stream, each with a different quality. As the available bandwidth in the network changes, the server dynamically switches between low- and high-quality media streams as appropriate. Hence, quality adaptation to short-term variations in bandwidth is achieved through the playout/compensation buffer at the client end, while quality adaptation to long-term, wide variations in bandwidth is achieved through appropriate rate control mechanisms at both the client and server ends.
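The rate control ideas above can be sketched in two small functions, one computing the model-based TCP-friendly rate of Equation (9.1) and one taking a single join/drop step of receiver-driven layered control (function names and parameters are ours; real implementations smooth their measurements and time their join experiments):

```python
import math

def tcp_friendly_rate(mtu_bytes, rtt_s, loss_p):
    """Model-based TCP-friendly throughput, Equation (9.1):
    lambda = 1.22 * MTU / (RTT * sqrt(p)).  Returns bytes per second
    when MTU is in bytes and RTT in seconds."""
    return 1.22 * mtu_bytes / (rtt_s * math.sqrt(loss_p))

def adjust_layers(current_layers, congested, max_layers):
    """One step of receiver-driven layered rate control: drop the top
    enhancement layer on congestion (never below the base layer),
    otherwise attempt a join experiment by adding one layer."""
    if congested:
        return max(1, current_layers - 1)
    return min(max_layers, current_layers + 1)
```

For example, with a 1500-byte MTU, a 100 ms roundtrip time, and 1% loss, Equation (9.1) allows roughly 183 kB/s; a receiver subscribed to three layers that detects congestion would fall back to two.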
9.2.1.2 Error Control  As previously mentioned, streaming media can tolerate errors as long as the error rate remains within an acceptable limit. This is particularly important in wireless environments, which have high error rates. Moreover, errors tend to happen in bursts in these environments. Well-known techniques to minimize error for streaming traffic are FEC (forward error correction), error-resilient encoding, error concealment, and retransmissions. The FEC technique adds redundant information to the original packet in order to recover the packet in the presence of errors. Error-resilient encoding is a preventive technique that enhances the robustness of streaming media in the presence of packet loss. The well-known error-resilient encoding schemes are resynchronization marking, data partitioning, and data recovery. These are particularly effective in wireless environments. Another promising error-resilient encoding scheme is multiple description coding (MDC) [14], where raw video data are encoded into a number of streams (or descriptions), each providing an acceptable quality. If a client gets only one description, it should still be able to reconstruct video with reasonably good quality; the receiver can construct better-quality video if it gets more than one description. Error concealment techniques, on the other hand, adopt a reactive approach and aim to conceal lost packets and make the presentation less displeasing to human eyes or ears. Packet retransmission techniques [15] are considered very effective in wireless environments because of the bursty nature of wireless channels. In general, packet retransmission is not deemed very suitable for real-time applications such as video because of the retransmission delay. However, retransmission may be allowed, especially for high-priority packets, if there is sufficient delay until the scheduled playback time of the packet considered.
Clients may request the retransmissions of only those high-priority packets that have sufficient retransmission delay budget. We explain this concept as follows. For simplicity, we assume that the
server is generating packets at a constant frame rate (say, every T seconds). We introduce the following notation:

Pn = playback time of the nth packet
Tn = arrival time of the nth packet
T = interframe time
RTT = estimated roundtrip time
Td = loss detection delay
Tr = retransmission delay
Tc = current time

Thus the scheduled playback time of the kth frame is (P0 + kT), where P0 is the playback time of the 0th frame. Now, if the current time is Tc, the delay budget before the scheduled playback time of the kth packet is given by

Delay budget = (P0 + kT) - Tc    (9.2)
This delay budget should be sufficient to allow retransmission of the frame from the server, taking into account the loss detection delay, the estimated roundtrip delay, and the retransmission time. The client should send the retransmission request to the server only if the following condition is satisfied:

Td + RTT + Tr ≤ delay budget    (9.3)
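The admission check of Equations (9.2) and (9.3) can be sketched as follows (names are ours):

```python
def should_request_retransmission(p0, k, T, tc, td, rtt, tr):
    """Request a retransmission of the kth packet only if the remaining
    delay budget, Equation (9.2), covers loss detection, the roundtrip
    time, and the retransmission itself, Equation (9.3)."""
    delay_budget = (p0 + k * T) - tc        # Equation (9.2)
    return td + rtt + tr <= delay_budget    # Equation (9.3)
```

For instance, a frame due 0.4 s from now can be recovered when detection, roundtrip, and retransmission together take 0.3 s, but not when only 0.2 s of budget remains.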
The objective here is to avoid unnecessary retransmissions that will not arrive in time for display.

9.2.2
Network Layer QoS Control
The preceding discussion of QoS control at the application layer for streaming services assumes no support from the network whatsoever. QoS support at the network layer and below complements the QoS mechanisms at the application layer and reduces the signaling and processing load at higher layers. Providing QoS in the Internet is inherently a difficult problem because of its connectionless nature. However, a number of proposals have been made in the IETF to provide some sort of QoS support in the Internet. Currently there are two approaches, Integrated Services (IntServ [16]) and Differentiated Services (DiffServ [17]), standardized by the IETF to provide QoS support in the Internet. The IntServ model provides per flow QoS guarantees. A flow is defined as a stream of packets between two end nodes with the same tuple of source address, destination address, source port number, and destination port number. The IntServ model consists of four functional blocks: an end-to-end signaling protocol, call admission control at the edge, a packet classifier at the edge, and a packet scheduler at every network element in the path. RSVP [18] is the proposed signaling protocol that carries the reservation requests to all the routers in the path. Underlying
IP routing protocols determine the path, and RSVP signaling is used to reserve resources along the selected path. Given the dynamic nature of IP routing protocols, a soft-state approach is used to reserve resources. Though IntServ provides an excellent QoS model, it suffers from a scalability problem: network elements need to maintain per flow state to provide per flow QoS guarantees. This is particularly burdensome in backbone networks that support tens of thousands of flows. The DiffServ QoS model is another approach; it provides a scalable solution and does not require any signaling support. Unlike the IntServ model, it does not provide per flow QoS guarantees. Under this model, routers simply implement a suite of priority-like scheduling and buffering mechanisms and apply them to IP packets based on the DS field in the packet headers. The service that an individual flow gets is determined by the traffic characteristics of the other flows (cross-traffic) sharing the same service class. The lack of network-wide control implies that, on overload in a given service class, all flows in that class suffer a degradation of service. DiffServ tries to give soft QoS guarantees to flows by using a combination of provisioning, service-level agreements, and per hop behavior implementations. For this purpose, network-wide mechanisms are deployed in the network. The bandwidth broker (BB) is one approach to resource provisioning within a DiffServ domain. The BB is the resource manager within the DiffServ domain; it keeps track of the available resources and topology information for the domain. The BB uses the COPS (common open policy service) protocol [19] to interact with routers inside the domain.
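As an illustration of how an application can ask for DiffServ treatment (this example is ours, not from the text, and is platform-dependent; it is shown for Linux-style sockets), the DS field lives in the former IP TOS byte, so a sender can mark outgoing packets with a DSCP such as Expedited Forwarding (46):

```python
import socket

# DSCP 46 is the Expedited Forwarding per hop behavior; the DSCP occupies
# the six high-order bits of the TOS byte, hence the shift by 2.
EF_DSCP = 46

def mark_expedited_forwarding(sock):
    """Set the DS field of packets sent on this socket (a sketch; whether
    routers honor the marking depends on the domain's provisioning)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF_DSCP << 2)
```

The marking is purely advisory: as the text notes, the service actually received depends on the cross-traffic in the same class and on the domain's service-level agreements.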
9.3
STREAMING MEDIA CODECS
Standardized video coding and decoding methods, such as H.263 by ITU-T and MPEG-4 by ISO, are expected to be supported by a wide range of mobile terminals and networks. For audio-only content, MPEG-4 AAC is an appealing candidate because of its superior coding efficiency, while MP3 is also likely to be supported because of its popularity on the Internet. Some mobile terminals may also support proprietary codecs and file formats, such as those developed by Apple Computer, Microsoft, and Real Networks.

9.3.1
Video Compression
Video compression in mobile networks is usually lossy compression that exploits temporal and spatial redundancy within the video streams. Specifically, motion estimation and compensation are widely used between consecutive video frames to reduce temporal redundancy. Within a frame, block-based transforms such as the DCT (discrete cosine transform) are performed to reduce spatial redundancy. In MPEG, for example, one can encode a video frame into one of the following types of encoded pictures [20]:

. I-picture (I = intraframe). I-pictures are encoded using intraframe information only, independently of other frames. In other words, I-pictures exploit spatial redundancy only.
. P-picture (P = interframe prediction). P-pictures are encoded using the most recent I-picture or P-picture as a reference.
. B-picture (B = bidirectional prediction). B-pictures are encoded using P-pictures and/or I-pictures both in the past and in the future as references.

A video stream composed of I-pictures allows for flexible random access and high editability, but its compression ratio is relatively poor. P-pictures and B-pictures substantially improve compression efficiency at the cost of increased manipulation difficulty (random access, editability, etc.) and, in the case of B-pictures, coding delay. Hence, an MPEG video stream often consists of a sequence of pictures of all three types (e.g., I B B P B B P B B I B) to strike a good balance among different aspects of performance and usability. In addition, MPEG-4 also allows encoding of arbitrarily shaped objects in order to provide content-based interactivity [21]. The mobile environment that we consider in this chapter brings some specific requirements for video compression. For example, wireless channel errors can lead to loss of synchronization because video encoders often use variable-length coding (VLC), and forward error correction (FEC) codes are not very effective in correcting burst errors. Toward this end, error resilience and concealment techniques that minimize the effect of channel errors are important in providing graceful service degradation [22]. Furthermore, many mobile terminals have limited CPU, memory, and battery power resources; thus controlling decoder complexity is important for these terminals.
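The reference rules above can be sketched with a deliberately simplified model (the function and the simplifications are ours; real MPEG reference selection and reordering are more involved): a P-picture refers to the nearest preceding I/P anchor, and a B-picture to the nearest anchors on each side.

```python
def reference_frames(gop):
    """For each picture in a GOP string such as 'IBBP' (display order),
    list the display-order indices of its reference pictures under a
    simplified MPEG model: P refers to the most recent I/P before it,
    B to the nearest I/P on each side."""
    anchors = [i for i, t in enumerate(gop) if t in "IP"]
    refs = []
    for i, t in enumerate(gop):
        if t == "I":
            refs.append([])                       # intra-coded, no references
        elif t == "P":
            refs.append([max(a for a in anchors if a < i)])
        else:                                     # B-picture
            past = [a for a in anchors if a < i]
            future = [a for a in anchors if a > i]
            refs.append(([max(past)] if past else []) + ([min(future)] if future else []))
    return refs
```

This makes the tradeoff in the text concrete: in "IBBP" only frame 0 is independently decodable, which is why all-I streams allow easy random access while P and B frames buy compression at the cost of dependency chains.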
9.3.2
Audio Compression
Besides the speech codecs used for voice services, general audio compression is needed for high-quality audio services such as music delivery. General audio coders typically generate higher bit rates than speech coders, since they cannot rely on a specific audio production model as speech coders do with the human vocal tract model. Additionally, while a speech coder's emphasis is intelligibility, an audio codec may need to provide higher signal fidelity in streaming media services. At high bit rates, an audio codec strives to preserve the original signal waveform [23]. Higher compression can be achieved by taking advantage of the human auditory model, so that signal components to which the human ear is not sensitive can be compressed away. More details on these techniques can be found in, for example, the article by Poll [23].
9.3.3
Codecs Used in 3GPP
As an example, Table 9.1 lists the required or recommended decoders in 3GPP [24]. Figure 9.6 illustrates the general client functional components for the streaming media service in 3GPP [24].

TABLE 9.1 Codec Standards Used in 3GPP

Services            Decoder Requirements or Recommendations
Speech              AMR
Audio               MPEG-4 AAC low complexity
Synthetic audio     Scalable polyphony MIDI
Video               H.263 profile 0 level 10 mandatory; MPEG-4 visual simple profile optional
Still images        JPEG
Bitmap graphics     GIF, PNG
Vector graphics     SVG Tiny profile

Figure 9.6 Functional components of a 3GPP packet-switched streaming service (PSS) client.
9.4 END-TO-END ARCHITECTURE TO PROVIDE STREAMING SERVICES IN WIRELESS ENVIRONMENTS

Streaming multimedia is characterized by an application rendering audio, video, or other media in a continuous way while part of the media is still being transmitted to the application over a data network. Streaming multimedia differs somewhat from conversational multimedia, which involves (usually bidirectional) conversation between multiple parties. Although the type of media (media encoding) used for streaming and conversational multimedia communication may be the same, conversational multimedia usually has more stringent requirements on the end-to-end delay between the parties. Also, streaming multimedia is usually a client-server application in which the media flow in only one direction (from the server to the client), whereas conversational multimedia, such as interactive videoconferencing, is usually peer-to-peer, and the media (often) flow among all peers. In previous sections we saw how streaming media applications process media data (through decoding, error correction, buffering, and scheduling) to compensate for the delay jitter and packet loss incurred over the network and to ensure smooth rendering. Here we will discuss the important logical components needed to enable a streaming service in mobile or wireless networks and the interrelationships between these logical components required to form a complete streaming multimedia delivery system. Our main focus is packet-based streaming systems. We will start with a discussion of the logical layout and components of such a system. In subsequent sections, we will shift focus to the different protocols and languages used for streaming multimedia delivery and provide an overview of their operation and example usage.
9.4.1
Logical Streaming Multimedia Architecture
A streaming multimedia architecture (Fig. 9.7) consists of the following basic components:

1. A streaming server that sends media as a continuous stream over a data network. The server is often referred to as the origin server, to distinguish it from intermediary (proxy or caching) servers.
2. A data network that transports media from the server to the client application.
3. A client application capable of receiving, processing, and rendering a continuous stream of media in a smooth manner.
4. Protocols that are understood amongst the components and allow them to talk with each other. The protocols provide various functionalities, including allowing the client to establish a streaming multimedia session with the server, facilitating delivery of media from the server to the network and from the network to the client, understanding the content of the media stream for correct processing at the client application (encoding and packaging), and allowing interaction with the servers to manipulate the media streams.
Figure 9.7 Basic streaming media architecture. (The streaming media client sends a streaming media request across the network to the streaming media server, which returns the streaming media.)
Besides the basic components and functionalities listed above, a multimedia delivery system often contains additional components, functionalities and protocols to improve various aspects of multimedia delivery. These may include the following:
1. Proxy Servers. Proxy servers provide functionality similar to that of a server from the client's perspective. Proxy servers are often transparent to the application; however, certain streaming media protocols explicitly provide for the existence of proxies [25]. Proxy servers may be present to process client requests locally or to relay the requests to some other server (after performing some optional local processing). If the target is to serve multimedia session requests locally, then a cache of streaming media content usually accompanies the proxy server. On receiving a request, the proxy server determines whether the desired content is available in the cache; if so, the content can be delivered locally; otherwise the proxy server relays the request to some other server.

2. Caching Servers. Caching servers are local repositories of content. As in the case of static Web objects (e.g., images and Webpages), it is advantageous to store local copies of content and serve user requests locally. This not only eliminates the delays incurred due to the topological distance of the origin server from the client application but also results in traffic localization and better utilization of network resources. There are several well-known methods for populating caches with content, but they can be broadly classified in two categories:

Passive Caching. Here only the content delivered by origin or upstream servers in response to regional client application requests is stored at the cache server. Local storage in this method is often a promiscuous process, and a cache server belonging to this category is often termed simply a "cache."

Proactive Caching. Here the content is proactively stored on the cache server by some external mechanism. Often the entire content on a server, or large portions of it, may be replicated onto a caching server. In
this case the term surrogate server is sometimes used for the caching server.

3. Additional Protocols. Additional protocols may include

Protocols for capability exchange between the client application and the server, so as to allow the server to transmit appropriate data.
Protocols for QoS feedback from the client application to the server, enabling the server to adapt the transmission (if possible).
Protocols and languages for (time and space) synchronization of multiple multimedia streams.
Protocols and mechanisms for routing a given client request to the best available surrogate or caching server. We will not discuss request routing any further in this chapter; an overview of a multitude of request routing methods can be found in the report by Barbir et al. [26].

4. Miscellaneous Components. A real-life deployment of a streaming multimedia delivery system will rely on more than just the abovementioned components (see example components in Fig. 9.8). Functionalities such as authentication, authorization, and accounting (AAA) often require additional architectural support. Similarly, ensuring digital rights management (DRM) may require additional functionality from the client application and also from the server and the content creation process. In certain scenarios dedicated components may be present to provide QoS adaptation and feedback.

Standards for streaming media consist of a wide array of protocols, description languages, and media coding techniques. These standards have been developed and standardized at various standardization organizations, such as the Internet Engineering Task Force (IETF), ISO, the Third-Generation Partnership Project (3GPP), and the World Wide Web Consortium (W3C).
Figure 9.8 Some components of a typical streaming media architecture.
9.5 PROTOCOLS FOR STREAMING MEDIA DELIVERY
A streaming multimedia delivery system involves a number of protocols (see Fig. 9.9) to deal with the different aspects of streaming media. The protocols provide a common dialect through which the different components in the architecture can talk with each other. These protocols can be classified in two broad categories: (1) session control protocols and (2) media transport protocols. In most contemporary multimedia streaming setups, separate logical channels are used for session control and media transport. In some cases, however, most notably HTTP and RTSP tunneling, the same logical channel is used for both session control and media transport. Consequently, certain protocols provide functionalities that span more than one aspect of multimedia streaming, and we cannot draw a hard boundary between the categories. We will discuss these as well, but let's first see what functionalities are expected of the two main categories of protocols.

9.5.1
Protocols and Languages for Streaming Media Session Control
Streaming multimedia often has a notion of a (prolonged) association between multiple components, for example, between the client application and the server; this association is called a session.
Figure 9.9 Protocols used in a typical streaming session.
292
MULTIMEDIA STREAMING IN MOBILE WIRELESS NETWORKS
Session control and establishment usually include identifying the parties (the client and server applications) involved in the session and the agreement on, or announcement of, the different session parameters. In IP-based environments the parties are often identified by their transport layer addresses (IP address and port number). Multimedia streaming sessions often have a rich set of parameters, the most important of which are the types of encoding of the media that will later flow from the sender (server application) to the recipient (client application). These parameters allow the application at the recipient to process and render the media correctly. Different session control protocols provide varying degrees of functionality, but all of them provide the minimal functionality for basic session control: session setup, teardown, and establishment of other session parameters. Examples of session control protocols include the Real-Time Streaming Protocol (RTSP) [25], the session announcement protocol (SAP) [27], the session description protocol (SDP) [28], the session initiation protocol (SIP) [29], and ITU-T's H.323 [30]. RTSP is the dominant session control protocol for client-server streaming multimedia applications and is defined in RFC 2326 [25]. In the following section we describe RTSP in some detail and briefly overview the other protocols in this realm; our description is by no means a complete specification of RTSP.
9.5.1.1 Real-Time Streaming Protocol  RTSP is an application-level client-server protocol that provides the functionality needed to establish and control a streaming session. The session may comprise one or more streams, which are described using a presentation description (expressed in, e.g., SMIL or SDP). Once a session is established, RTSP provides methods for controlling the streams, such as VCR-like forward, rewind, pause, and record methods. RTSP primarily provides functionality to retrieve data from the server and to invite a server to a conference, and it is a transaction-oriented, request-response protocol like HTTP. However, there are a number of differences:

. RTSP servers are required to maintain state between most transactions, unlike HTTP servers, which are mostly stateless.
. RTSP defines new methods and a new protocol identifier.
. In RTSP, the server side may issue some requests as well, unlike in HTTP, where the client always makes the request and the server sends back a response.
. In RTSP, the data are carried mostly out of band, on a separate data channel such as RTP. In HTTP, the data are carried in the payload of HTTP (response) messages.
. RTSP uses absolute resource identifiers (request URIs); this eliminates the problems caused by the use of relative URLs in earlier versions of HTTP.
RTSP Messages  Figure 9.10 shows the syntax of RTSP messages. There are only two basic types of RTSP messages: request and response. All RTSP messages are text-based and use the ISO 10646 character set with UTF-8 encoding. The first line of a message identifies the message type: whether it is a request or response message and specifically what kind of request or response. For requests this first line is termed the request line; for responses, the status line. Message headers follow the request line or status line and provide additional information that is critical for the correct interpretation of the message. Finally, messages may optionally contain a message body. Please refer to Section 15 of RFC 2326 [25] for the complete syntax of RTSP.
RTSP Request Messages  The request line in each request message has a method token that indicates the task to be performed on the resource specified in the Request-URI. Eleven methods are defined in RFC 2326 [25], each designed for a different task. Following is a brief description of each of the 11 RTSP methods; please refer to Section 10 of RFC 2326 [25] for an in-depth description of the methods.
RTSP Message = Request | Response

Request      = RequestLine *( generalHeader | requestHeader | entityHeader ) CRLF [ messageBody ]
RequestLine  = Method SP Request-URI SP RTSP_Ver CRLF
Method       = "DESCRIBE" | "ANNOUNCE" | "GET_PARAMETER" | "OPTIONS" | "PAUSE" | "PLAY" | "RECORD" | "REDIRECT" | "SETUP" | "SET_PARAMETER" | "TEARDOWN" | ext-method
ext-method   = token
Request-URI  = "*" | absolute_URI
RTSP_Ver     = "RTSP" "/" 1*DIGIT "." 1*DIGIT

Response     = StatusLine *( generalHeader | responseHeader | entityHeader ) CRLF [ messageBody ]
StatusLine   = RTSP_Ver SP StatusCode SP ReasonPhrase CRLF
StatusCode   = a predefined 3-digit code or a 3-digit extension code
ReasonPhrase = *<TEXT, excluding CR, LF>

Request and Response are the only two types of RTSP messages. The Method identifies the type of request message, and the Request-URI identifies the resource in question; the StatusCode identifies the type of response message. In both cases the leading headers provide additional information for interpreting the message. Eleven methods are defined in the RTSP specification.

Figure 9.10 Syntax for RTSP messages.
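As a concrete illustration of this syntax (our own sketch; the URI and header values are hypothetical), the following helpers assemble a request and pick apart a status line along the lines of the grammar above:

```python
def build_request(method, uri, cseq, headers=None, version="RTSP/1.0"):
    """Assemble an RTSP request: request line, CSeq and other headers,
    then a blank line; every line is CRLF-terminated."""
    lines = [f"{method} {uri} {version}", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"

def parse_status_line(status_line):
    """Split an RTSP status line into (version, status code, reason phrase)."""
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason
```

For example, build_request("DESCRIBE", "rtsp://example.com/stream", 1, {"Accept": "application/sdp"}) yields a DESCRIBE request whose first line is "DESCRIBE rtsp://example.com/stream RTSP/1.0", and parse_status_line("RTSP/1.0 200 OK") splits the server's status line into its three parts.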
. DESCRIBE is a recommended method that is only sent from the client side. The server typically sends a description of the resource identified in the Request-URI; this description is contained in the message body. The session description need not always be obtained using this method: other out-of-band mechanisms may be used for a variety of reasons, including cases where the server does not support the DESCRIBE method. The session may be described using SDP or other formats.

. ANNOUNCE is an optional method that may be sent from the client or the server. When sent from the client to the server, it updates the description of the presentation or media object identified by the Request-URI. When sent from the server to the client, it updates the session description in real time.

. SETUP is a mandatory method that is only sent from the client side. The client specifies the transport mechanism to be used for a media stream (identified by the Request-URI). The SETUP method may also be used to change the transport parameters of a stream that is already playing.

. PLAY is a mandatory method that is always sent from the client to the server. It tells the server to start sending the stream that was set up using a previously (successfully) completed SETUP transaction. PLAY is a versatile method, allowing very precise control by the client, such as identifying the range of the media stream to be played (both starting point and ending point may be specified). Similarly, several PLAY requests may be issued for different segments of the stream set up by the previous SETUP message. Each request may specify both the range of the stream segment and the time at which the server should start streaming the data. These requests queue at the server, and the server generates the stream corresponding to each request at the appropriate time. Obviously, the server is not obliged to fulfill all client requests. A PLAY request is also used to resume a paused stream.
PAUSE is a recommended method that is always sent from the client to the server. This method causes the server to temporarily halt the delivery of a stream (or set of streams, depending on the Request-URI). If a PAUSE request is issued, all the queued PLAY requests related to the Request-URI are discarded by the server; a new PLAY request must be sent to resume the stream(s).

. The OPTIONS method is used by the sender to query information about the communication options available on the resource identified by the Request-URI; for example, it may be used by a client to query the types of methods a server supports for a given media stream. Although either a client or a server may send this message, implementation of this method is mandatory only for servers.

. The TEARDOWN method stops the stream delivery of the resource identified in the Request-URI. All queued requests are discarded, and all resources associated with the resource are freed. As you may have rightly guessed, the TEARDOWN message is always sent from the client to the server, and this is a mandatory method.
9.5 PROTOCOLS FOR STREAMING MEDIA DELIVERY
. The REDIRECT method informs the client that it must contact another server location. If the client wants to continue to send and/or receive the media, it must issue a TEARDOWN request for the current session and issue a new SETUP request to the server location identified in the REDIRECT request. The REDIRECT message is always sent from the server to the client but, strangely, its support is optional.

. The RECORD method initiates recording of a range of media data according to the description of the resource identified in the Request-URI. This description may be made available by a previously sent ANNOUNCE request or by some out-of-band means. A RECORD request is sent from the client to the server, and its implementation is optional for both the client and the server.

. The GET_PARAMETER method retrieves the values of the parameters of a presentation. The desired parameters are specified in the body of the request message. If no parameters are specified in the message body, the message can serve as a way to check the liveness of client and server applications (a sort of RTSP application "ping"). GET_PARAMETER is an optional method that may be used in either direction, that is, from the client to the server and from the server to the client.

. The SET_PARAMETER method is used to set the value of a parameter for a presentation or stream identified in the Request-URI. Only one parameter can be specified per request, so that in the event of failure there is no ambiguity about which parameter was not set. Like GET_PARAMETER, this method can be used in both directions, and its implementation is optional for both client and server side applications.

RTSP Response Messages The status line in each response message includes a status code specifying the recipient's response to the request. Each status code is represented by a three-digit number. Response messages fall into two broad categories: provisional responses and final responses.
All status codes of the form 1xx (i.e., between 100 and 199) are considered provisional responses; they indicate that the recipient is processing the request but the final action has not yet been taken, so the transaction is still considered pending. All other status codes indicate final responses, of which there are four subcategories. Status codes of the form 2xx indicate successful completion of the transaction. Codes of the form 3xx indicate redirection (i.e., the responder "thinks" that the request must be sent elsewhere), 4xx codes indicate client error (i.e., something is wrong with the request made by the client), and 5xx codes indicate server error (i.e., the request itself was fine, syntactically and semantically, but the server cannot process it for some other reason). Although the method token and status codes are helpful in identifying the request and the response, in most cases the recipient of a message cannot determine the exact nature of the task to be performed, or the complete meaning of a response, without looking at some of the other headers included in the message; sometimes the message body must also be interpreted before the message can be fully understood by the recipient. For instance, earlier in this section, we referred
to the range of a stream while discussing the PLAY method. In RTSP, the stream range is specified using the "Range" request header; we discuss some of the RTSP message headers in the next section.
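The status-code classification described above can be captured in a few lines. The helper below is an illustrative sketch, not part of any RTSP library:

```python
def classify_rtsp_status(code: int) -> str:
    """Map a three-digit RTSP status code to its response category."""
    if not 100 <= code <= 599:
        raise ValueError("RTSP status codes are three-digit numbers")
    return {
        1: "provisional",   # request still being processed; transaction pending
        2: "success",       # transaction completed successfully
        3: "redirection",   # the request should be sent elsewhere
        4: "client error",  # something is wrong with the request itself
        5: "server error",  # valid request, but the server cannot process it
    }[code // 100]
```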
Session Setup Using RTSP Figure 9.11 shows a typical interaction between an RTSP client and an RTSP server for establishing an RTSP session and its subsequent teardown. Once the client learns about a certain RTSP resource, rtsp://resource-name.server in this case, it sends a DESCRIBE request to the server to learn more about the resource. The server sends back a description of the session corresponding to the identified resource. If the client is interested, it sends a SETUP request, asking the server to make the necessary arrangements for establishing the session. If successful, the client can issue a PLAY request at a later time to get the media stream flowing. If the session requires a special QoS arrangement, such as resource reservation, the client makes it before issuing the PLAY request. Once the PLAY request succeeds, the media start to flow, and the client can manipulate the stream using various RTSP requests, such as PAUSE or PLAY with different headers. Once the session is completed or the client is no longer interested, the client sends a TEARDOWN request to the server to terminate the session.
Figure 9.11 Session setup and teardown using RTSP.
9.5.1.2 Session Description Protocol The session description protocol (SDP) is widely used for presentation and session description. This protocol is specified in the standards-track IETF RFC 2327 [28]. SDP provides a well-defined format that conveys sufficient information about a multimedia session to allow the recipients of the session description to participate in the session. This information is commonly conveyed by the SAP protocol, which announces a multimedia session by periodically transmitting an announcement packet to a well-known multicast address and port number. Alternatively, session descriptions can be conveyed through electronic mail and the World Wide Web. SDP conveys the following information:

. Session name and purpose
. Media comprising the session: the media type (video, audio, etc.), transport protocol (RTP/UDP/IP), media format (MPEG-4 video, H.261 video, etc.), and the addresses and port numbers for the media
. Time(s) the session is active

A session description using SDP consists of a series of text-based lines (using the ISO 10646 character set in UTF-8 encoding). Each line is of the form <type>=<value>. <type> is strictly one character (drawn only from the US-ASCII subset of UTF-8). <value> is generally either a number of fields delimited by a single space character or a free-format string.
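For illustration, a short session description in this <type>=<value> form might read as follows (all names, addresses, and times are invented):

```
v=0
o=alice 2890844526 2890842807 IN IP4 192.0.2.10
s=Weather Update Session
i=A session carrying one audio and one video stream
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
```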
A typical session description using SDP has three parts:

. Session Description. This part describes the session and provides information about the session owners and the session itself. The mandatory types included in this part are the version (v), owner (o), and session name (s) fields. Other optional fields include session information (i), URI of description (u), email address (e), phone number (p), connection information (c), bandwidth (b), time-zone adjustments (z), encryption keys (k), and attribute lines with type field (a).

. Timing Information. This part has one mandatory field (t) indicating the time at which the session becomes active. The part may optionally include several repeat times (r).

. Media Description. This part describes the type and other parameters of the media stream(s). It includes a mandatory line for each media stream containing its name and transport address; this line is denoted by "m." Additional optional lines for each media stream include media title (i), connection information (c), bandwidth information (b), encryption key (k), and zero or more media attribute lines, each starting with "a." If all the media streams share a common connection address, it can be mentioned once in the media
description part. The value corresponding to most typed fields is not free-form text but has a defined format. Figure 9.12 illustrates parts of a session description with SDP, using an example taken right out of RFC 2327 [28].

9.5.1.3 Other Session Control Protocols A number of other session control protocols are available; the most notable are (1) the wireless session protocol (WSP), used with WAP, and (2) the SIP and H.323 families of protocols, which are typically used for real-time conversational media communication. Although these protocols could in principle be used for streaming multimedia with minor modifications, in practice, despite their rich functionalities, they are seldom used for that purpose. In some cases streaming media protocols may be used in conjunction with conversational media protocols; for example, RTSP may be used for interacting with a voice or video mail system, while the remaining infrastructure is based on SIP. There is, however, some preliminary discussion of using SIP for streaming media as well. This may eliminate the need for multiple protocols of similar functionality at the terminal, and could be something to look forward to in the future.

9.5.1.4 Description Languages A number of description languages are used in today's multimedia systems to describe session integration and scene description, device capabilities, context, and metadata associated with media. The main purpose of well-formed description languages is to facilitate consumption of media information by computers, such as in search engines and the semantic Web. However, this is not the only reason why description languages are used. Synchronized Multimedia Integration Language (SMIL) [31], for instance, is used to describe the space-time relationship
Figure 9.12 Parts of session description using SDP.
between a set of multimedia objects. Other examples of multimedia descriptions include ISO's Multimedia Content Description Interface (MPEG-7) and the Composite Capabilities/Preferences Profile (CC/PP) [32]. In the following sections we will learn about SMIL and CC/PP, as they have an important role to play in multimedia content delivery and presentation.

Synchronized Multimedia Integration Language (SMIL) For commercial services, media presentation is perhaps just as important as the media itself. Content providers want to present the media in a manner that is flexible for commercial services, such as integrating location-specific advertisements with the media presentation, and at the same time functional and appealing to the consumer. SMIL, an XML-based language developed by the World Wide Web Consortium, is the "glue" that combines various media elements such as video, audio, images, and formatted text to create an interactive multimedia presentation. SMIL does not control the session, but it can be used to specify how the media are rendered at the client application (user agent). SMIL allows description of the temporal behavior of a multimedia presentation, associates hyperlinks with media objects, and describes both the temporal and the spatial layout of a multimedia presentation on the user device. SMIL is an HTML-like language and, like HTML, consists of elements, attributes, and attribute values. Following is a simple SMIL presentation that demonstrates the timing, synchronization, prefetch, and layout capabilities of SMIL. The SMIL user agent completely (100%) prefetches the media objects. It then displays a video clip and a series of static images. The images appearing in "region2" change every 10 seconds, and those in "region3" change every second, giving the impression of a counter. The layout and presentation behavior is shown pictorially in Figure 9.13, using a video clip of a moving airplane.
Figure 9.13 A SMIL presentation example.
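A SMIL document with the behavior just described might look roughly as follows; the file names, region geometry, and image counts are invented for illustration:

```xml
<smil>
  <head>
    <layout>
      <root-layout width="320" height="240"/>
      <region id="region1" left="0" top="0" width="320" height="160"/>
      <region id="region2" left="0" top="160" width="240" height="80"/>
      <region id="region3" left="240" top="160" width="80" height="80"/>
    </layout>
  </head>
  <body>
    <seq>
      <!-- fetch 100% of the media before playback begins -->
      <prefetch src="airplane.mpg" mediaSize="100%"/>
      <par>
        <video src="airplane.mpg" region="region1"/>
        <seq>
          <!-- images in region2 change every 10 seconds -->
          <img src="ad1.jpg" region="region2" dur="10s"/>
          <img src="ad2.jpg" region="region2" dur="10s"/>
        </seq>
        <seq>
          <!-- one-second images in region3 give the impression of a counter -->
          <img src="digit1.gif" region="region3" dur="1s"/>
          <img src="digit2.gif" region="region3" dur="1s"/>
          <img src="digit3.gif" region="region3" dur="1s"/>
        </seq>
      </par>
    </seq>
  </body>
</smil>
```

The outer seq ensures the prefetch completes before the par group starts, and the par group plays the video and the two image sequences simultaneously, each in its own region.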
Composite Capabilities/Preferences Profile (CC/PP) RTSP does not provide a very good capability exchange mechanism. In most cases the server decides on the type of media and its other properties without first consulting the client about its capabilities. The client may have capabilities or limitations which, if communicated to the server, would allow the server to customize the presentation and media accordingly. The client device may have limited bandwidth, a constrained display, software constraints (such as support for some SMIL features but not others), or user preferences that affect the presentation of media at the user agent. CC/PP can be used to express all these scenarios and more. A CC/PP description is a statement of the capabilities and profiles of a device or a user agent. CC/PP is based on the resource description framework
Figure 9.14 An example CC/PP profile.
(RDF1) and can be expressed using an XML document or some other structured representation format. A CC/PP description is structured such that each profile has a number of components, and each component has one or more related attribute-value pairs, which are sometimes also referred to as properties. Figure 9.14 shows the CC/PP structure for a hypothetical profile. Two components, HardwarePlatform and Streaming, and some of their respective attributes are shown. The HardwarePlatform component groups together the BitsPerPixel, ColorCapable, and PixelAspectRatio properties, which are presumably properties related to the hardware of the device. As with all languages and description formats, we must have a mutually understood vocabulary and rules for its interpretation, and CC/PP is no exception. With CC/PP, any operational environment may define its own vocabulary and schema specifying the allowable attributes and values, along with their syntax and semantics; this vocabulary and schema may be understood only by the relevant applications. For instance, W3C [32] defines a core vocabulary for print and display, and the WAP Forum's user-agent profile (UAProf) specification [33] defines a vocabulary that can be used to express different capabilities and preferences related to the hardware, software, and networking available at the device. A discussion of CC/PP attribute vocabularies can be found in Ref. 34. CC/PP allows specification of default attributes and values in the schema corresponding to each component. If a user agent's capabilities and preferences related to a particular component match the defaults, it can simply say so without giving details of all the attributes and their values. If the values of some attributes differ from the defaults, a device can create a profile containing only the differing attribute-value pairs while referring to the defaults for the other attributes.
This mechanism shortens the profile descriptions and saves precious wireless bandwidth. Other methods of reducing the size of a profile description include binary encodings such as WAP binary XML.
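In RDF/XML, the hypothetical profile of Figure 9.14 might be expressed roughly as follows; the namespace URI and element spellings are simplified assumptions, not the exact UAProf schema:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:prf="http://example.org/profile-vocabulary#">
  <rdf:Description rdf:ID="Profile">
    <prf:component>
      <rdf:Description rdf:ID="HardwarePlatform">
        <prf:BitsPerPixel>8</prf:BitsPerPixel>
        <prf:ColorCapable>Yes</prf:ColorCapable>
        <prf:PixelAspectRatio>1x2</prf:PixelAspectRatio>
      </rdf:Description>
    </prf:component>
    <prf:component>
      <rdf:Description rdf:ID="Streaming">
        <prf:AudioChannels>Mono</prf:AudioChannels>
        <prf:MaxPolyphony>16</prf:MaxPolyphony>
        <prf:PssVersion>3GPP-R5</prf:PssVersion>
      </rdf:Description>
    </prf:component>
  </rdf:Description>
</rdf:RDF>
```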
1 If you are not familiar with RDF, an excellent primer can be found in [68].
9.5.1.5 UAProf Specification UAProf [33] is worth mentioning here because the capability exchange framework and vocabulary defined in this specification are used, with modifications in some cases, in many mobile content delivery systems, including 3GPP-PSS. UAProf specifies (1) an end-to-end capability exchange architecture; (2) a vocabulary and schema comprising six components, namely, HardwarePlatform, SoftwarePlatform, BrowserUA, NetworkCharacteristics, WapCharacteristics, and PushCharacteristics; (3) encoding methods for the profiles; and (4) methods for transport of profiles. UAProf also outlines usage scenarios for user-agent profiles and the behavior of the different entities involved in the capability exchange process. A brief description of the six components described in Ref. 33 follows in Table 9.2.
CC/PP Exchange HTTP is typically used as the transport protocol for the CC/PP description from client to server. However, potentially tens of components and hundreds of properties may be required to fully express the capabilities and preferences profile of a user device. A profile description can therefore be very large, and transporting such a description between the user device and the server can entail significant overhead.
TABLE 9.2 UAProf Component Description

HardwarePlatform: Comprises a set of attributes that describe the hardware characteristics of a user-agent device, such as type, model, and input/output capabilities.

SoftwarePlatform: Consists of a set of attributes related to the software environment on the device, such as the operating system, available audio/video encoding/decoding components, and user language preferences.

BrowserUA: Encompasses the properties related to the HTML browser at the user agent.

NetworkCharacteristics: The attributes in this component describe the characteristics of the network that the user device is connected to.

WapCharacteristics: Includes attributes concerning Wireless Application Protocol (WAP) capabilities.

PushCharacteristics: Covers attributes specific to the push capabilities of the device. The push model is slightly different from the traditional request/response model used for most content; instead, content can be "pushed" to the client without an explicit request from the client (see Ref. 69 for details).
We already saw that CC/PP allows referring to default attribute values, which may reduce the size of the description, but what about the properties that deviate from the defaults? The CC/PP exchange protocol [35] has been designed with precisely these constraints in mind. This protocol allows user agents to specify only the attributes that differ from the defaults or from the last capability exchange, which reduces the size of the descriptions significantly. Because of the dependency between the different descriptions sent by a client, the network must maintain state information about previous CC/PP exchanges. For this purpose a new logical entity called a CC/PP repository is introduced; this repository stores the default and predefined profiles. The CC/PP exchange protocol [35] extends HTTP by defining three new HTTP headers: two request headers, namely profile and profile-diff, and one response header, named profile-warning. The profile header contains a list of references to (predefined) profiles or to profile descriptions carried in the profile-diff header of the same message. The profile-diff header contains the actual profile description. The profile-warning header is used to convey any warning information to the requestor, such as when the server fails to fully resolve a profile description. Ref. 33 defines similar headers for use with wireless-profiled HTTP; these headers are called x-wap-profile, x-wap-profile-diff, and x-wap-profile-warning, respectively, and have meanings similar to those of the corresponding headers defined for the CC/PP exchange protocol. A simple example of the content delivery process based on CC/PP is shown in Figure 9.15. The client includes the CC/PP description in the request for the content. The server resolves the profile, selects or creates appropriate content, and sends it back to the client.
In reality this same model may include intermediaries such as proxies and gateways, which may manipulate the user request and its capability profile before forwarding the request to the server.
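As an illustration, a request carrying profile references in the UAProf-style headers might look as follows; the URL, path, and header values are invented:

```
GET /weather/today.smil HTTP/1.1
Host: content.example.com
x-wap-profile: "http://profiles.example.com/phones/model-x-1.0.rdf"
x-wap-profile-diff: "1;<?xml version=\"1.0\"?><rdf:RDF>...</rdf:RDF>"
```

Here the x-wap-profile header points at a predefined profile in a repository, and the x-wap-profile-diff header carries only the attributes that deviate from it.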
Figure 9.15 Capability exchange with CC/PP: (1) the client sends an HTTP or RTSP request for content with references to its profile; (2) the server retrieves the referenced pieces of the profile from a profile repository; (3) appropriate content is selected or created; (4) the delivered content is appropriate for the user's capability and preference profile.
Needless to say, CC/PP is a generic mechanism for expressing capabilities and profiles and can be used in a variety of situations besides the classical client-server scenario depicted in Figure 9.15. It should also be noted that although HTTP is currently the usual carrier of CC/PP descriptions, RTSP may become more widely used for this purpose in the future.

9.5.2
The Streaming Media Transport Protocols
For the application to render the media while they are still being transmitted over the data network, some care must be taken in media transport. The media transport mechanisms must provide means through which the media are transported in a sequential manner, along with all the relevant information about how and when they must be rendered (e.g., the media format types and the timestamps). Currently the hypertext transport protocol (HTTP) [36], TCP [37], UDP [38], and the real-time transport protocol (RTP) [39] [coupled with the real-time transport control protocol (RTCP)] are used for multimedia streaming over the Internet. Among these protocols, only RTP can be regarded as a true real-time transport protocol, but the presence of firewalls that do not understand the streaming protocols and block UDP-based traffic can sometimes make the use of HTTP and TCP unavoidable. In many scenarios a multimedia session consists of many different streams, each with its own requirements with respect to media transport, thus necessitating the use of more than one media transport protocol. One such scenario is the 3GPP-PSS architecture, which we describe later in this chapter.

9.5.2.1 The Real-Time Transport Protocol This protocol has emerged as the dominant streaming media transport protocol. The basic protocol is defined in IETF RFC 1889 [39]. The RFC defines two protocols that are meant to work in tandem: RTP for media transport and the accompanying real-time transport control protocol (RTCP) for transport feedback from the receivers to the senders. While RFC 1889 provides the base specification, several additional specifications have been developed for packetization and use with individual media types such as H.263 [40] and GSM-AMR [41]. In the following text we briefly give an overview of the functionality provided by RTP and RTCP and their use in a streaming media environment. Figure 9.16 shows the RTP packet format.
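The 12-byte fixed RTP header can be unpacked with a few lines of code. The sketch below assumes the RFC 1889 field layout and is for illustration only:

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 1889 layout)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,          # always 2 for current RTP
        "padding": (b0 >> 5) & 1,
        "extension": (b0 >> 4) & 1,
        "csrc_count": b0 & 0x0F,     # number of CSRC identifiers that follow
        "marker": b1 >> 7,           # M-bit, e.g. for fragment/frame boundaries
        "payload_type": b1 & 0x7F,   # identifies the codec in use
        "sequence": seq,             # detects loss and reordering
        "timestamp": timestamp,      # sampling instant, for jitter compensation
        "ssrc": ssrc,                # synchronization source identifier
    }
```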
RTP provides payload type identification, fragmentation (M-bit), sequencing, and timing information in each individual packet. The payload type field allows the application to determine the correct codec type to be used with the media. The fragmentation information allows applications to reassemble protocol data units correctly. The timing and sequence information allows applications to recognize out-of-sequence packets and compensate for delay-jitter variations incurred on the network. Together, these allow an application to render the multimedia stream correctly and smoothly. RTP also provides synchronization source (SSRC) and contributing source (CSRC) identifiers to identify the packets belonging to the same stream independently of the transport layer address. This is especially helpful in multiparty streaming
Figure 9.16 RTP packet format.
scenarios but is rarely used in contemporary streaming multimedia delivery. RTP is also capable of transporting encrypted media; however, key generation and distribution are outside the scope of RTP. RTCP specifies periodic transmission of control packets to all the participants in a session. It serves four main functions:

1. Feedback on the quality of data reception, through RTCP sender and receiver reports.

2. Carrying a persistent transport-level identifier for an RTP source, called the canonical name (CNAME). This is very helpful in scenarios where an RTP source contributes more than one stream, such as when transmitting the audio and video streams of a conversation: the common CNAME for the individual SSRCs allows the receiver to recognize these streams as associated, indicating a need for synchronization (e.g., lip-synchronization).

3. Rate control for RTCP messages. The number of RTCP messages generated can quickly get out of control in a conference with a large number of participants; this function allows the participants to control the rate of RTCP reports.

4. Session control information for loosely controlled sessions, where participants may join and leave without strict membership control. Streaming multimedia sessions, however, are often tightly controlled, and complete session control information is established via separate session control protocols such as RTSP, with RTCP allowing only loose control within the parameters established by the session control protocol.

Figure 9.17 shows the format of the RTCP sender report. Receiver reports are similar, except that the header does not contain the NTP timestamp and there is no sender information block. The payload type for receiver reports is 201.
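The RTCP packet types mentioned here carry fixed payload-type numbers; a quick reference sketch, with values as defined in RFC 1889:

```python
# RTCP payload types defined in RFC 1889
RTCP_PACKET_TYPES = {
    200: "SR",    # sender report: NTP timestamp, sender info, reception blocks
    201: "RR",    # receiver report: reception blocks only, no sender info
    202: "SDES",  # source description: CNAME, email, location, etc.
    203: "BYE",   # a source is leaving the session
    204: "APP",   # application-defined extensions
}

def rtcp_type_name(payload_type: int) -> str:
    """Return the RTCP packet type name, or 'unknown' for unassigned values."""
    return RTCP_PACKET_TYPES.get(payload_type, "unknown")
```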
Figure 9.17 RTCP sender report packet format.
In addition to sender and receiver reports, RTCP also provides for source description (SDES) packets (see Fig. 9.18). These packets include information about the synchronization and contributing sources, such as name, email, phone number, and geographic location. Although RTP is transport-independent as long as the transport protocol provides multiplexing and correct delivery, because of the stringent delay requirements of most real-time traffic and the wide acceptance of IP, UDP is primarily used as the transport
Figure 9.18 RTCP source description format.
for RTP. Although RFC 1889 states that RTP uses the checksum and multiplexing capabilities of UDP, it is worth noting that most media codecs are either not sensitive to bit errors or may be encoded with error correction codes; therefore, it is not always wise to discard an entire packet if the checksum fails. In such cases it may be better to disable the UDP checksum or use protocols such as "UDP-lite" [42,43]. RTP and RTCP are usually used in tandem and multiplexed onto the same network layer address; for instance, if UDP/IP is used, they will typically share the IP address. By convention, the RTP stream uses an even-numbered port and the corresponding RTCP channel uses the immediately following odd-numbered port. As stated earlier, individual profiles for specific media types have been defined. These profiles specify the payload type, any modifications to the semantics of the different fields in the header and payload, and any new header types if necessary. Examples of such media-specific profiles include Ref. 44 for H.263 and Ref. 41 for AMR. These profiles sometimes provide functionality for rate adaptation and other in-band signaling; for example, Sjoberg et al. [41] allow the receiver to specify one of several AMR codec rates or modes of operation. Applications using these media types must conform to the corresponding profiles to ensure compatibility.

9.5.2.2 Other Media Transport Protocols HTTP and RTSP tunneling, or plain UDP or TCP, are sometimes used for media transport. HTTP and RTSP tunneling is useful in cases where a firewall blocks RTP/UDP traffic. With HTTP and RTSP tunneling, the streaming media are sent embedded or interleaved in the body of HTTP or RTSP messages; this approach, however, can be highly inefficient in terms of the amount of bandwidth used.
However, as streaming multimedia gains wider deployment and acceptance, more firewalls understand the streaming media protocols and can therefore open the desired ports to allow streaming media, so we will likely see less use of tunneling in the future.
9.6 3GPP PACKET-SWITCHED STREAMING SERVICE
As discussed in previous sections, a basic streaming service consists of streaming control protocols, transport protocols, media codecs, and scene description protocols. 3GPP has formulated a set of 3G PSS standards to provide a mobile packet-switched streaming service (PSS); the standard specifies the protocols, codecs, and architecture for this service. The 3GPP codecs and media types were discussed in Section 3.3 of this chapter. Figure 9.19 depicts the 3GPP protocols and applications used in a PSS client. The protocols and their applications are

. RTSP and SDP for session setup and description
. SMIL for session layout description
Figure 9.19 3GPP streaming protocols and their applications.
. HTTP for capability exchange and for transporting static media such as session layout descriptions (SMIL files), text, graphics, and so on
. RTP for transporting real-time media such as audio, video, and speech

Providing end-to-end streaming service implies harmonized interworking between the protocols and mechanisms specified by the IETF and 3GPP. Both 3GPP and the IETF
Figure 9.20 3GPP packet-switched streaming service.
have their own sets of protocols and mechanisms to provide QoS and connectivity in the 3G access network and the external IP-PDN (Internet), respectively. The external IP-PDN can deploy either the IntServ or the DiffServ QoS model to provide QoS. 3GPP release 4 has support for streaming services in its QoS model. 3GPP release 5 has an upgraded packet-switched core network, adding an "Internet multimedia subsystem (IMS)" that consists of the network elements used in session initiation protocol (SIP)-based session control. Release 5 has also upgraded the GSNs (GPRS support nodes) to support delay-sensitive real-time services, and the radio access network (UTRAN) has been upgraded to support real-time handover of PS (packet-switched) traffic. The main purpose of release 5 is to enable an operator to offer new services such as multimedia, gaming, and location-based services. The Internet multimedia domain is mainly concerned with new services (their access, creation, and payment), but in a way that gives the operator full control over the content and revenue.
9.6.1 3GPP Packet-Switched Domain Architecture
Figure 9.20 depicts the network architecture of an end-to-end 3GPP packet-switched streaming service. We need at least a streaming client and a content server to implement the streaming service. Content servers may be either hosted in the UMTS architecture itself or accessed externally through an IP-PDN. A proxy server may be needed in the UMTS architecture to provide sufficient QoS if the content servers are accessed externally through an IP-PDN. The end-to-end streaming architecture has the following network elements that are specific to streaming:

. Content Servers. These can be either hosted in the UMTS architecture (added to the IMS) or accessed externally. Content servers consist of streaming servers that store streaming content and Web servers that hold SMIL pages, images, and other static content.

. Proxy Server. This may be included in the IMS (especially when the streaming server is external) to provide an enhanced-QoS streaming service. The proxy server's [45,46] main role is to smooth (eliminate delay jitter in) the incoming streaming traffic from the external IP-PDN. During transmission of the streaming content to the client, the proxy dynamically adapts the delivered QoS in accordance with the available bandwidth, using feedback from the client application, the radio network, and the IP network. The proxy server can also implement a quality adaptation scheme by switching on the fly to a lower-quality stream when the available bandwidth is not sufficient. Moreover, it can perform the additional function of transcoding. Transcoding may be needed for several reasons, such as when a user moves from a high-bandwidth wireless LAN to a GPRS or 3G network, or when the mobile node is unable to handle high-bandwidth streaming traffic.
9.6
3GPP PACKET-SWITCHED STREAMING SERVICE
311
. User and Profile Servers. These servers store user preferences and device capabilities. This information can be used to control the presentation of streamed media to a mobile user.
. Content Cache. A content cache can optionally be used to improve the overall service quality.
. Portals. Portals are servers that allow convenient access to streamed media content. For example, a portal might offer content browsing and search facilities. In the simplest case, it can be a Webpage with a list of links to streaming content.
Apart from the abovementioned network elements that are specific to the streaming service, other network elements in the 3GPP UMTS architecture play a significant role in the QoS management of the streaming service. The UMTS radio access network (UTRAN) ensures seamless handover between basestations with minimal disruption to ongoing real-time services. The radio resource control (RRC) protocol [1] (3GPP TS 25.331) is used for controlling resources on the UTRAN (universal terrestrial radio access network). The radio access network application part (RANAP) protocol [1] (TS 25.431) is used between the UTRAN and core network entities. The serving GPRS support node (SGSN) acts as the gateway for all packet-based communications between user equipments (UEs) within its serving area. The SGSN is responsible for packet routing and transfer, mobility management (attach/detach and location management), logical link management, authentication, and charging functions. The gateway GPRS support node (GGSN) acts as a gateway between the UMTS core network and the external IP-PDN. There is an active PDP context for every active packet-switched bearer or session. The PDP context is stored in the UE, SGSN, and GGSN. With an active PDP context, the UE is visible to the external IP-PDN and is able to send and receive data packets. The PDP context describes the characteristics of the session.
It contains a PDP type (e.g., IPv4), the IP address assigned to the UE, the requested QoS, and the address of the GGSN that serves as the access point to the IP-PDN. Table 9.3 shows the different QoS classes supported in the UMTS architecture [1]. PDP context activation (see Fig. 9.21) in the UMTS architecture works as follows. The UE first sends an "Activate PDP context request" message to the SGSN through the session management (SM) protocol. The SGSN contacts the home location register
TABLE 9.3    UMTS QoS Classes

Class            Requirements                            Example
Conversational   Very delay-sensitive                    Traditional voice; VoIP
Streaming        Better channel coding; retransmission   One-way real-time audio/video
Interactive      Delay-insensitive                       Telnet; interactive e-mail; WWW
Background       Delay-insensitive                       Ftp; background e-mail
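Read as a lookup table, Table 9.3 maps an application's traffic type to a UMTS QoS class. A minimal sketch of such a mapping follows; the class names come from the table, while the application keys are illustrative assumptions, not 3GPP-defined identifiers:

```python
# UMTS QoS class per Table 9.3, keyed by illustrative application types.
QOS_CLASS = {
    "voip": "conversational",     # very delay-sensitive, two-way
    "audio_stream": "streaming",  # one-way real-time audio/video
    "web": "interactive",         # request/response traffic (WWW, Telnet)
    "ftp": "background",          # delay-insensitive bulk transfer
}

def qos_class(app, default="background"):
    """Map an application type to its UMTS QoS class (background if unknown)."""
    return QOS_CLASS.get(app, default)
```
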
312
MULTIMEDIA STREAMING IN MOBILE WIRELESS NETWORKS
Figure 9.21    PDP context activation procedure.
(HLR) and performs authentication and authorization functions. The SGSN then performs local admission control and initiates the radio access bearer (RAB) assignment procedure in the RAN/GERAN through the RANAP procedure. Local call admission is based on the availability of radio resources, and the UMTS QoS attributes are mapped onto radio bearer (RB) parameters used in the physical and link layers. After the establishment of the RB, the SGSN sends a "Create PDP context request" message to the GGSN. The GGSN performs local admission control and creates a new entry in its PDP context table that enables it to route data between the SGSN and the external IP-PDN. Afterward, the GGSN returns a confirmation message, "Create PDP context response," to the SGSN that contains the PDP address. The SGSN updates its local PDP context table and sends an "Activate PDP context accept" message to the UE.
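The PDP context record and the activation flow above can be sketched as follows. This is a trace of the described message sequence, not real SM/GTP signaling; the GGSN name, UE address, and QoS value are illustrative:

```python
from dataclasses import dataclass

@dataclass
class PDPContext:
    """Session characteristics stored in the UE, SGSN, and GGSN."""
    pdp_type: str      # e.g. "IPv4"
    ue_address: str    # IP address assigned to the UE (made-up value below)
    qos_profile: str   # requested UMTS QoS class
    ggsn_address: str  # GGSN acting as access point to the external IP-PDN

def activate_pdp(ggsn, qos="interactive"):
    """Return the resulting PDP context plus a trace of the activation flow."""
    trace = [
        ("UE -> SGSN", "Activate PDP context request"),
        ("SGSN", "HLR authentication/authorization; RAB assignment via RANAP"),
        ("SGSN -> GGSN", "Create PDP context request"),
        ("GGSN -> SGSN", "Create PDP context response"),  # carries the PDP address
        ("SGSN -> UE", "Activate PDP context accept"),
    ]
    ctx = PDPContext("IPv4", "10.0.0.7", qos, ggsn)  # 10.0.0.7 is illustrative
    return ctx, trace
```
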
9.6.2
The 3GPP PSS Framework
The 3GPP PSS specifications consist of three technical specifications: 3GPP TS 22.233, 3GPP TS 26.233, and 3GPP TS 26.234. PSS provides a framework for IP-based streaming applications in 3G networks. This framework is very much in line with what we have discussed so far in this chapter. It uses CC/PP for capability exchange (see Fig. 9.22), SMIL for presentation description,
Figure 9.22    Capability negotiation mechanism applied in PSS.
RTSP for session control, and SDP for session description. However, there are minor differences here and there. Let's go over these one by one.
9.6.2.1 Streaming Media Session Setup Procedures for PSS    Figure 9.23 shows an example of a simple session establishment. The first step is to know what content to get and where to start. The client can obtain the URI of the content from an SMIL presentation document, a simple Webpage, or an email, or simply by word of mouth. Once the URI is known, the client application requests a primary PDP context, which is opened to allocate the IP address for the UE as well as the access point. The primary PDP context is used to access content servers in either the IMS domain or the external IP-PDN. Since the primary PDP context is used for RTSP signaling, it is created with the UMTS interactive QoS profile. A socket is opened for RTSP signaling and is tied to the primary PDP context. The client can now query the content server to learn more about the content using an RTSP DESCRIBE request.2 The client may include its CC/PP description in the request. The client does not need to include the profile description if it is sure that the URI it is using in the RTSP request already points to a resource that is compatible with its profile. Such would be the case if the URI were obtained from an SMIL document that was itself obtained after presenting a valid CC/PP description. If the profile is included, it is carried using the x-wap-profile and x-wap-profile-diff headers of the CC/PP exchange protocol that we discussed earlier.
2 The RTSP DESCRIBE method is mandatory in the 3GPP-PSS architecture; however, the IETF does not mandate its use.
Figure 9.23 Streaming multimedia session establishment in PSS.
If the profile description is included, the server can find or create content that is most suitable for the client's request URI and profile. Otherwise it just selects the default content corresponding to the URI. The server sends back a response with the description of the session that will be used to deliver the selected content. On receiving the description, the client can determine whether it likes the description, which is likely to be the case because the content has presumably been tailored to the client's capabilities and preferences. The client can now send a SETUP request to the server, asking it to make the necessary arrangements for the streaming session. The server acknowledges the SETUP request by sending a "200 OK" response message back to the client. The client now needs to establish a PDP context that is suitable for the anticipated multimedia streaming session. It does so by opening two sockets for RTP and RTCP traffic and tying them to two secondary PDP contexts. The secondary PDP contexts are assigned appropriate UMTS QoS profiles and reuse the same IP address and access point as the primary PDP context. Now that everything is ready, the client can send a PLAY request asking the content server to start the streaming session. The streaming media are typically transported over RTP/UDP/IP, as described in the SDP. Figure 9.23 shows the presentation and content server as a single entity, but these may in fact be logically and physically separate entities.
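The DESCRIBE/SETUP/PLAY exchange above can be sketched as plain RTSP messages. The URIs, ports, session ID, and profile URL below are illustrative assumptions; the x-wap-profile header carrying the CC/PP reference is the PSS-specific part:

```python
def rtsp_request(method, uri, cseq, headers=None):
    """Build a minimal RTSP/1.0 request line plus headers (illustrative client)."""
    lines = [f"{method} {uri} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"

# DESCRIBE carrying a reference to the client's CC/PP profile, as in PSS:
describe = rtsp_request(
    "DESCRIBE", "rtsp://example.net/clip", 1,
    {"Accept": "application/sdp",
     "x-wap-profile": '"http://example.net/profiles/phone.rdf"'},
)
# SETUP declaring the client ports later tied to the secondary PDP contexts:
setup = rtsp_request(
    "SETUP", "rtsp://example.net/clip/trackID=1", 2,
    {"Transport": "RTP/AVP;unicast;client_port=4588-4589"},
)
# PLAY starts the streaming session:
play = rtsp_request("PLAY", "rtsp://example.net/clip", 3, {"Session": "12345"})
```
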
9.7 MULTIMEDIA SERVICES IN MOBILE AND WIRELESS ENVIRONMENTS
The main factors that differentiate wireless mobile environments are
. Limited Bandwidth and Error-Prone Channel. The characteristics of a wireless channel show unpredictable time-varying behavior due to several factors such as interference, multipath fading, and atmospheric conditions. The last hop of communication is wireless, which not only offers relatively low bandwidth but also suffers from a higher bit error rate (BER). Furthermore, the retransmissions needed to recover from these errors induce variable delay across the wireless channel.
. Movement. The movement of mobile users triggers a handoff mechanism to minimize interruption to an ongoing session. The wireless channel characteristics may vary significantly from one segment of the network to another. Since handoffs almost always incur packet loss, they further aggravate the already lossy nature of the wireless medium. Finally, the relative pathlength from the server to the client may vary as the client moves across networks. This is especially true if the server is close to the edge, as in content distribution networks.
In the following text we cover some recent proposals to alleviate the problems that arise from the error-prone nature of the wireless channel and from mobility in mobile content delivery systems. We also look into the research issues of providing streaming service in heterogeneous network environments.
9.7.1 Differentiating Transmission Error Losses from Congestion Losses
In the wired, wide-area Internet, most packet loss occurs as a result of congestion. In wireless environments, however, the major source of packet loss is transmission errors over the wireless channel. Rate control is normally used to avoid congestion-induced losses by slowing down the sender, but it is not suitable for avoiding or recovering from errors on wireless channels. The techniques used for error recovery or packet loss avoidance over wireless channels build better error resiliency into the packets so that even if some packets are dropped, the data can still be recovered at the receiver. Alternatively, some senders use aggressive retransmissions, but that is bound to introduce congestion problems. A typical mobile multimedia delivery environment comprises both wired and wireless links. In such an environment an end-to-end feedback mechanism, such as RTCP feedback messages, can convey information only about the net end-to-end packet loss, and there is no way for the sender to ascertain whether a packet was lost on the wired network or the wireless network. Since counteracting the two types of packet loss requires different techniques, the sender cannot cope
with the situation effectively without being able to distinguish between the two types of packet loss. To address this problem, a novel RTP monitoring technique has been introduced [47,48]. This technique relies on the placement of an RTP monitoring agent at the edge of the wired/wireless network. The agent monitors the RTP streams and sends RTCP feedback to the sender of the stream, such as a streaming server. This feedback is in addition to the RTCP feedback generated by the recipient itself (see Fig. 9.24). The RTCP feedback from the client gives the aggregate loss over both the wireless and the wired segments of the end-to-end path, whereas the RTCP feedback from the RTP monitoring agent gives the loss over the wired segment only. This helps the sender (typically a streaming server) determine whether the loss occurred in the wireless or the wired segment of the path. It is worth mentioning that since RTCP feedback messages are not generated at the same rate as the RTP packets, the feedback captures aggregate packet loss over the RTCP period, which is typically a few seconds. Thus the server can only estimate the percentage of packet loss over the wired and wireless segments and must adapt the stream accordingly. Details of the RTP monitoring technique and its applications can be found in two papers by Yoshimura and colleagues [47,48].
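The arithmetic behind the split is simple. If the wired segment drops a fraction p_w of packets (agent's report) and the client sees an end-to-end loss fraction p_t, then the wireless hop dropped (p_t - p_w)/(1 - p_w) of the packets it actually received. A sketch, with illustrative figures:

```python
def split_loss(total_loss, wired_loss):
    """Estimate the wireless-hop loss fraction from two RTCP loss reports:
    total_loss - end-to-end fraction reported by the client
    wired_loss - fraction reported by the monitoring agent at the wired edge
    Both are fractions in [0, 1]; the result is the share of packets that
    survived the wired segment but were lost over the air.
    """
    if wired_loss >= 1.0:
        return 0.0  # nothing reached the wireless hop
    wireless = (total_loss - wired_loss) / (1.0 - wired_loss)
    return max(0.0, wireless)  # clamp noise in the estimates
```

For example, a 19% end-to-end loss with 10% wired loss implies roughly 10% loss over the air, since only 90% of the packets ever reached the wireless hop.
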
9.7.2
Counteracting Handover Packet Loss
As we pointed out earlier in the section, handovers are the cause of additional packet loss in mobile networks. Although network layer mobility protocols such as mobile
Figure 9.24
Streaming agent to differentiate wired and wireless packet loss.
IP [49] and fast mobile IP [50] attempt to provide seamless handovers during host movement, some packet loss is inevitable because of signaling propagation delay. A novel end-to-end technique for soft IP handover has been proposed [51]. Figure 9.25 shows an overview of this scheme. The scheme assumes that the receiver host is at least temporarily attached to multiple interfaces during the handoff process. The receiver host signals this situation to the sender, along with information about the interfaces, such as their IP addresses and their relative priority based on signal strength, estimated bandwidth, or packet loss rate on the individual interfaces. The IP stack on the sender host then generates redundant error correction symbols, denoted as F1, F2, D1, and so on in Figure 9.25, and dispatches them to the multiple interfaces of the receiver. Reed-Solomon codes are used to generate the redundant symbols [51]. In general, if a message is extended from k symbols to n symbols through the addition of (n - k) redundant symbols, then up to (n - k) lost symbols can be recovered at the receiver node. For example, in Figure 9.25 n = 2k, that is, there are just as many redundant symbols as there are symbols in the original message; thus the receiver should be able to recover the application data even when any k of the n symbols are lost.
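The (n, k) recovery property can be illustrated with a toy Reed-Solomon-style erasure code over a prime field. Real implementations [51] work over GF(2^8) for efficiency; this sketch trades speed for brevity and shows only the principle that any k of the n symbols suffice:

```python
# Toy Reed-Solomon-style erasure code over the prime field GF(257).
P = 257  # prime modulus; one byte value (0..255) fits in a field element

def _lagrange_eval(points, x, p=P):
    """Evaluate the unique degree<k polynomial through `points` at `x` (mod p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def rs_encode(data, n, p=P):
    """Systematic encoding: positions 0..k-1 carry the data symbols,
    positions k..n-1 carry parity (the same polynomial evaluated further)."""
    points = list(enumerate(data))
    return [(x, _lagrange_eval(points, x, p)) for x in range(n)]

def rs_decode(received, k, p=P):
    """Recover the k data symbols from any >= k surviving (position, value) pairs."""
    pts = received[:k]
    return [_lagrange_eval(pts, x, p) for x in range(k)]
```

With n = 2k, as in Figure 9.25, the sender can bicast the 2k symbols over both interfaces and the receiver reconstructs the data from whichever k symbols arrive.
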
Figure 9.25
Bicasting forward error correction codes.
9.7.3 Mobility-Aware Server Selection and Request Routing in Mobile CDN Environments
We mentioned earlier that the movement of a mobile host might result in the establishment of an entirely new path with very different path characteristics. If the servers are present very close to the edge of the network, as in a high-density content distribution network, this change of relative distance and path characteristics may result in significant QoS degradation, especially for streaming multimedia, where the sessions are typically long. This situation can, however, be alleviated by changing the content server as the host moves, as proposed in Refs. 52 and 53. The technique revolves around keeping track of host movement and assigning a new server as the host moves from the optimal content delivery region of one server to that of another (see Fig. 9.26). A number of methods may be used to keep track of host movement and then perform server handoff. Tariq and Takeshita [53] define server coverage areas as sets of IP subnets, and mobile IP binding update messages are used to track user movement. Server handoff is treated as a process of establishing a session with the new server and terminating the session with the old one, and is achieved using extended RTSP methods [53]. Yoshimura et al. [52] use SOAP messages to update the presentation file used by the host, so that the next segment is fetched from the most appropriate server. The techniques described in Ref. 54 go a step further: they analyze the host's mobility in terms of how rapidly or slowly it is moving and try to assign a server on that basis. This predictive algorithm can significantly reduce the number of server handoffs that may be necessary.
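The subnet-based coverage idea of Ref. 53 reduces to a membership test: pick the server whose coverage area (a set of IP subnets) contains the client's current address. A minimal sketch, with made-up server names and subnets:

```python
import ipaddress

# Hypothetical coverage areas: each server serves a set of IP subnets,
# following the subnet-based coverage definition of Ref. 53.
COVERAGE = {
    "server-a": [ipaddress.ip_network("10.1.0.0/16")],
    "server-b": [ipaddress.ip_network("10.2.0.0/16")],
}

def select_server(client_ip, coverage=COVERAGE, default="server-a"):
    """Pick the content server whose coverage area contains the client's address."""
    addr = ipaddress.ip_address(client_ip)
    for server, subnets in coverage.items():
        if any(addr in net for net in subnets):
            return server
    return default  # fall back when the client is outside every coverage area
```

On a mobile IP binding update carrying a new care-of address, re-running this lookup tells the system whether a server handoff is due.
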
Figure 9.26
Mobility-based server selection techniques.
9.7.4 Architectural Considerations to Provide Streaming Services in Integrated Cellular/WLAN Environments
The wireless LAN is fast emerging as a complementary technology to 3G networks. This technology provides very high-speed access (11 Mbps for 802.11b and 54 Mbps for 802.11a) but covers a very small area and allows only limited mobility. The 3G technology, on the other hand, provides access at relatively low speed (100 kbps for GPRS) to medium speed (2 Mbps for UMTS) but covers a wide area and allows high mobility. A number of interworking mechanisms [55-57] have been developed to integrate these two technologies into a single wireless data network that allows very high-speed access in hotspot areas such as airports and shopping malls. Integration of the WLAN and the cellular network falls into two categories depending on who owns and manages the WLAN. For example, operators can own and manage WLANs to augment their cellular data networks; an operator can thus gain a competitive advantage by providing enhanced data services at strategic locations such as airports and hotels. In the alternative scenario, an independent wireless Internet service provider (WISP) or enterprise owns the WLAN. In either case an end user can obtain very high quality streaming service in hotspot locations. Two methods are used to integrate cellular and WLAN networks: tight coupling and loose coupling, as illustrated in Figure 9.27. The architectural issues in providing seamless streaming service with both methods are described in the following paragraphs.
Figure 9.27 Generalized integrated UMTS/WLAN architecture.
320
MULTIMEDIA STREAMING IN MOBILE WIRELESS NETWORKS
9.7.4.1 Tight Coupling    Under this integration scheme, the WLAN is connected to the GPRS core network in the same manner as any other radio access network (RAN), such as the GPRS RAN (GERAN) and the UMTS terrestrial radio access network (UTRAN). The WLAN is treated as a new radio access technology within the cellular system, and may emulate either a radio network controller (RNC) or an SGSN. From the core network's point of view, the WLAN is like any other GPRS routing area (RA) in the system. An interworking unit (IWU) is needed to interface the WLAN to the GPRS core network. The main advantage of this solution is that the mechanisms for mobility, QoS, and security in the core network can be reused. Within this architecture a handover takes place when a mobile user either enters or leaves a hotspot area. The IP address allocated to the mobile user does not change during the handover process, since the mobile user remains under the same GGSN. The hotspot areas and cellular coverage areas normally overlap, and the handover is based on the end user's preference. For example, a mobile user receiving multimedia streaming service would hand over to the WLAN when moving into the hotspot area to improve performance. Since the bottleneck bandwidth in wireless environments lies in the air interface, the transcoding functionality (in the proxy server) may not be needed in the delivery path after the mobile user hands over from the cellular RAN to the WLAN. This scheme may also require additional QoS adaptation mechanisms to support seamless handover between the WLAN and the cellular RAN for real-time applications such as streaming.
9.7.4.2 Loose Coupling    Under this integration scheme, the WLAN interfaces directly with the IP-PDN (e.g., the Internet) and has no direct interface with the GPRS core network. In this scenario, the WLAN and the cellular network are two separate access networks.
The loose coupling scheme may deploy IETF-based protocols to handle authentication, accounting, and mobility. The WLAN appears as a visited network to the UMTS core network. A mobile user is typically allocated a new IP address while handing over from the UMTS network to the WLAN or vice versa. Seamless handover under this scheme may require advanced mechanisms such as context transfer [50] (session context, QoS context, security context, etc.) and resource reservation. Providing seamless streaming service under this integration scheme is an open research problem. Streaming in mobile and wireless environments is a subject of active research. Some of the open research issues in providing multimedia streaming services in mobile and wireless environments are
. Seamless service during interdomain and intertechnology handoffs
. Dynamic QoS adaptations and channel allocations
. Optimizations across lower and higher layers
. Efficient micromobility protocols [58] to make intradomain handovers smoother
. Secured streaming, digital rights management schemes
. Efficient implementations of multicast streaming services
Some of the more recent studies on these topics are listed at the end of this chapter [e.g., 3, 4, 59-67].
9.8
CONCLUSIONS
This chapter has addressed the architectural and design issues in providing streaming services in wireless environments. Supporting streaming services in wireless environments is a big challenge because of error-prone wireless channels and mobility-induced factors. In addition, the limited buffering and processing power available in portable mobile devices affect the design of the wireless access network architecture. A great deal of research has been done to address these issues [51,54,59-62]. The wireless access network architecture should implement appropriate mechanisms to mitigate the impact of wireless- and mobility-induced factors in order to minimize the resource and processing requirements at the mobile terminal. We have discussed some of these research issues and related work. The chapter gave a general overview of an end-to-end architecture, including the network elements and protocols needed to provide streaming services in wireless/mobile environments. We also described the packet-switched streaming service architecture developed by 3GPP (abbreviated 3GPP-PSS). There has been a widespread effort to develop adaptive modulation, equalization, and coding schemes that use real-time estimation of channel characteristics to achieve performance objectives such as error rate and delay at the physical layer. A number of smart-antenna-based technologies have been developed that use space diversity techniques to mitigate the impact of multipath fading and achieve higher capacity. There has also been much work on micromobility protocols [58] (such as FMIP) at the network layer to reduce mobility-induced disruption. There is a need to look into joint optimization across the various layers to provide good-quality seamless streaming service in wireless/mobile environments. The wireless bandwidth can be utilized effectively if the lower layers have a detailed understanding of the application requirements.
A well-defined interface between the IP layer and the lower layers would be very useful in next-generation wireless networks. Indeed, the EU IST project BRAIN has already defined an IP-to-Wireless (IP2W) interface for this purpose. There are still a number of design issues in providing streaming services in heterogeneous wireless networks that include various wireless access technologies (3G, WLAN, Bluetooth, etc.). Secured streaming is yet another area of active research. The ability to protect the intellectual property rights of content owners will be a key factor in the mobile digital content market. Multimedia streaming services are becoming very popular on the Internet, and when these services become mobile, animation, music, and news services will be available to users regardless of location and time. Next-generation mobile networks will combine the standardized streaming service with a range of unique services to offer a wide range of innovative and exciting multimedia services to the rapidly growing mobile market.
REFERENCES
1. The Third Generation Partnership Project, http://www.3gpp.org.
2. The Third Generation Partnership Project 2, http://www.3gpp2.org/.
3. I. Elson et al., Streaming technology in 3G mobile communication systems, IEEE Comput., 34(9): 46-52 (Sept. 2001).
4. H. Montes et al., Deployment of IP multimedia streaming services in third-generation mobile networks, IEEE Wireless Commun., 84-92 (Oct. 2002).
5. D. Wu et al., Streaming video over the Internet: Approaches and directions, IEEE Trans. Circuits Syst. Video Technol., 11(3) (March 2001).
6. S. Keshav, An Engineering Approach to Computer Networking: ATM Networks, the Internet, and the Telephone Network, Addison-Wesley Professional, 1997.
7. S. Floyd and K. Fall, Promoting the use of end-to-end congestion control in the Internet, IEEE/ACM Trans. Network., 7: 458-472 (Aug. 1999).
8. S. Floyd et al., Equation-based congestion control for unicast applications, Proc. ACM SIGCOMM, Stockholm, Sweden, Aug. 2000, pp. 43-56.
9. The TCP-Friendly Web Page, http://www.psc.edu/networking/tcp_friendly.html.
10. R. Rejaie, M. Handley, and D. Estrin, RAP: An end-to-end rate-based congestion control mechanism for real-time streams in the Internet, Proc. IEEE INFOCOM '99, March 1999, Vol. 3, pp. 1337-1345.
11. S. McCanne, V. Jacobson, and M. Vetterli, Receiver-driven layered multicast, Proc. ACM SIGCOMM, Palo Alto, CA, Aug. 1996, pp. 117-130.
12. R. Rejaie, M. Handley, and D. Estrin, Quality adaptation for congestion controlled playback video over the Internet, Proc. ACM SIGCOMM '99, Cambridge, Sept. 1999, pp. 1337-1345.
13. Q. Guo et al., Sender-adaptive and receiver-driven video multicasting, Proc. IEEE Int. Symp. Circuits and Systems (ISCAS 2001), Sydney, Australia, May 2001.
14. Y. Wang, M. T. Orchard, and A. R. Reibman, Multiple description image coding for noisy channels by pairing transform coefficients, Proc. IEEE Workshop on Multimedia Signal Processing, June 1997, pp. 419-424.
15. X. Li et al., Layered video multicast with retransmission (LVMR): Evaluation of error-recovery schemes, Proc. INFOCOM '98, March-April 1998, Vol. 3, pp. 1062-1072.
16. S. Shenker, C. Partridge, and R. Guerin, Specification of the Guaranteed Quality of Service, IETF RFC 2212.
17. S. Blake et al., An Architecture for Differentiated Services, IETF RFC 2475.
18. R. Braden et al., Resource Reservation Protocol (RSVP)—Version 1 Functional Specification, IETF RFC 2205.
19. D. Durham et al., The COPS (Common Open Policy Service) Protocol, IETF RFC 2748.
20. T. Sikora, MPEG digital video-coding standards, IEEE Signal Process. Mag. (Sept. 1997).
21. T. Sikora, The MPEG-4 video standard verification model, IEEE Trans. Circuits Syst. Video Technol., 7(1) (Feb. 1997).
22. R. Talluri, Error-resilient video coding in the ISO MPEG-4 standard, IEEE Commun. Mag. (June 1998).
23. P. Noll, MPEG digital audio coding, IEEE Signal Process. Mag. (Sept. 1997).
24. 3GPP, Transparent End-to-End Packet-Switched Streaming Service (PSS): Protocols and Codecs (Release 5), 3rd Generation Partnership Project TS 26.234, V5.4.0.
25. H. Schulzrinne, A. Rao, and R. Lanphier, Real Time Streaming Protocol (RTSP), IETF Standards Track RFC 2326, April 1998.
26. A. Barbir, B. Cain, R. Nair, and O. Spatscheck, Known CN Request-Routing Mechanisms, IETF Work in Progress, April 2003. (Note: The CDI working group at IETF has concluded.)
27. M. Handley, C. Perkins, and E. Whelan, Session Announcement Protocol, IETF Experimental RFC 2974, Oct. 2000.
28. M. Handley and V. Jacobson, SDP: Session Description Protocol, IETF Standards Track RFC 2327, April 1998.
29. J. Rosenberg et al., SIP: Session Initiation Protocol, IETF Standards Track RFC 3261, June 2002.
30. ITU-T Recommendation H.323, Packet-Based Multimedia Communications Systems, 2003.
31. Synchronized Multimedia Integration Language (SMIL 2.0), http://www.w3.org/TR/2001/REC-smil20-20010807/.
32. Composite Capabilities/Preference Profiles (CC/PP): Structure and Vocabularies, http://www.w3c.org/TR/CCPP-struct-vocab/.
33. WAP User Agent Profile Specification, Oct. 2001.
34. CC/PP Attribute Vocabularies, http://www.w3.org/TR/2000/WD-CCPP-vocab-20000721/.
35. Capability Exchange Using HTTP Extension Framework, http://www.w3.org/TR/NOTE-CCPPexchange.
36. R. Fielding et al., Hypertext Transfer Protocol—HTTP/1.1, IETF Standards Track RFC 2616, June 1999.
37. Transmission Control Protocol, IETF RFC 793, Sept. 1981.
38. J. Postel, User Datagram Protocol, IETF RFC 768.
39. H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RTP: A Transport Protocol for Real-Time Applications, IETF Standards Track RFC 1889, Jan. 1996.
40. C. Bormann et al., RTP Payload Format for the 1998 Version of ITU-T Recommendation H.263 Video (H.263+), IETF Standards Track RFC 2429, Oct. 1998.
41. J. Sjoberg et al., Real-Time Transport Protocol (RTP) Payload Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs, IETF RFC 3267.
42. L.-A. Larzon, M. Degermark, and S. Pink, The UDP-Lite Protocol, IETF Internet Draft, Work in Progress, Dec. 2002.
43. L.-A. Larzon, M. Degermark, and S. Pink, UDP Lite for Real Time Multimedia Applications, HPL-IRI-1999-001, April 1999.
44. J. Sjoberg et al., Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs, IETF Standards Track RFC 3267, June 2002.
45. S. Sen, J. Rexford, and D. Towsley, Proxy prefix caching for multimedia streams, Proc. INFOCOM '99, March 1999, Vol. 3, pp. 1310-1319.
46. J. Rexford, S. Sen, and A. Basso, A smoothing proxy service for variable-bit-rate streaming video, Proc. GLOBECOM '99, Vol. 3, pp. 1823-1829.
47. T. Yoshimura, T. Ohya, T. Kawahara, and M. Etoh, Rate and robustness control with RTP monitoring agent for mobile multimedia streaming, Proc. IEEE Int. Conf. Communications (ICC 2002), April 2002.
48. G. Cheung and T. Yoshimura, Streaming agent: A network proxy for media streaming in 3G wireless networks, Proc. IEEE Packet Video Workshop, April 2002.
49. C. E. Perkins, Mobile IP, IEEE Commun. Mag., 66-82 (May 2002).
50. R. Koodli and C. E. Perkins, Fast handovers and context transfers in mobile networks, paper presented at ACM SIGCOMM, 2002.
51. H. Matsuoka, T. Yoshimura, and T. Ohya, A robust method for soft IP handover, IEEE Internet Comput., 18-24 (March/April 2003).
52. T. Yoshimura, Y. Yonemoto, T. Ohya, M. Etoh, and S. Wee, Mobile streaming media CDN enabled by dynamic SMIL, Proc. WWW2002, Honolulu, May 7-11, 2002.
53. M. Tariq and A. Takeshita, Management of cacheable streaming multimedia content in networks with mobile hosts, Proc. IEEE GLOBECOM 2002, Taipei, Taiwan, Nov. 17-22, 2002.
54. M. Tariq, R. Jain, and T. Kawahara, Mobility aware server selection for mobile streaming multimedia content distribution networks, Proc. 8th Int. Workshop on Web Content Caching and Distribution, Hawthorne, NY, Sept. 29-Oct. 1, 2003.
55. A. K. Salkintzis, C. Fors, and R. Pazhyannur, WLAN-GPRS integration for next-generation mobile data networks, IEEE Wireless Commun., 112-124 (Oct. 2002).
56. V. K. Varma et al., Mobility management in integrated UMTS/WLAN networks, Proc. IEEE ICC '03, May 2003, Vol. 2, pp. 1048-1053.
57. 3GPP, Feasibility Study on 3GPP System to WLAN Interworking, Technical Report 3GPP TR 22.934, V6.1.0, Dec. 2002.
58. A. T. Campbell and J. Gomez-Castellanos, IP micro-mobility protocols, ACM SIGMOBILE Mobile Comput. Commun. Rev., 4(4): 45-54 (Oct. 2001).
59. S. Verma and R. Barnes, A QoS architecture to support streaming applications in the mobile Internet, Proc. 5th IEEE Symp. Wireless Personal Multimedia Communications (WPMC), Honolulu, Oct. 27-30, 2002.
60. S. Verma and R. Barnes, DiffServ-based QoS architecture to support streaming applications in 3G networks, Proc. 13th IEEE Symp. Personal, Indoor and Mobile Radio Communications (PIMRC), Lisbon, Sept. 15-18, 2002.
61. S. Verma and R. Barnes, A QoS architecture to support streaming applications in the mobile Internet, Proc. 12th IEEE Workshop on Local and Metropolitan Area Networks, Stockholm, Aug. 11-14, 2002.
62. F. H. P. Fitzek and M. Reisslein, A prefetching protocol for continuous media streaming in wireless environments, IEEE J. Select. Areas Commun., 19(10): 2015-2028 (Oct. 2001).
63. K. K. Leung et al., Link adaptation and power control for streaming services in EGPRS wireless networks, IEEE J. Select. Areas Commun., 19(10): 2029-2039 (Oct. 2001).
64. S. Dogan et al., Error-resilient video transcoding for robust internetwork communications using GPRS, IEEE Trans. Circuits Syst. Video Technol., 12: 453-464 (June 2002).
65. A. Boukerche, H. Sungbum, and T. Jacob, An efficient synchronization scheme of multimedia streams in wireless and mobile systems, IEEE Trans. Parallel Distrib. Syst., 13: 911-923 (Sept. 2002).
66. A. Majumdar et al., Multicast and unicast real-time video streaming over wireless LANs, IEEE Trans. Circuits Syst. Video Technol., 12: 524-534 (June 2002).
67. B. Zheng and M. Atiquzzaman, A novel scheme for streaming multimedia to personal wireless handheld devices, IEEE Trans. Consum. Electron., 49: 32-40 (Feb. 2003).
68. RDF Primer, http://www.w3c.org/TR/rdf-premier/.
69. WAP Push Architectural Overview, July 2003.
70. R. Rejaie, M. Handley, and D. Estrin, Architectural considerations for playback of quality adaptive video over the Internet, Proc. IEEE ICON 2000, Sept. 2000, pp. 204-209.
CHAPTER 10
MULTICAST CONTENT DELIVERY FOR MOBILES

ROD WALSH and ANTTI-PENTTI VAINIO
Nokia Research Center, Tampere, Finland

JANNE AALTONEN
Nokia Ventures Organization, Turku, Finland
10.1 INTRODUCTION
Multicast is the simultaneous delivery, or communication, of data between several parties. Multicast as a technology has been available in the Internet since the late 1990s, for applications ranging from a multiparty teleconference between a few friends to hundreds of thousands of people watching and listening to a webcast music concert. The delivery of content to many users simultaneously, using a shared multicast transport path and last mile, is attractive for several reasons. More efficient use of infrastructure and radio bandwidth is very important to mobile wireless network operators, especially since higher-data-rate rich-media services become feasible without increasing the total network capacity. Users may benefit from consuming shared content in both technical terms (higher data rates and faster downloads) and social terms (content demand increasingly correlates with common interests for persistent communities and dynamic ad hoc groups).
Initially driven by voice services, wireless data communications have become global, with continued growth over many years. The resulting proliferation of media- and business-oriented handsets to meet many levels of user expectation provides an excellent basis on which multicast can achieve its main benefits and so drive more exciting services and commercial endeavors. Two initiatives are currently progressing through development, standardization, and commercialization: IP datacast (IPDC) and multimedia broadcast multicast service (MBMS). Although these originate from different backgrounds (digital television broadcast and third-generation cellular telecommunications, respectively), they both hold the promise of providing true multipoint services to mobile and wireless users, with all the benefits and opportunities that this brings.

Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.

10.1.1 Chapter Overview
Several aspects and features of the IPDC and MBMS systems originate from a mutual set of needs, and, as a result, some of the system aspects are common. For this reason, this chapter first gives an overview of the salient aspects of multicast, then considers the generic IP multicast system as an abstract entity and describes some of the features of IP multicast as a common enabling technology, before presenting the individual aspects of both IP datacast (IPDC) and multicast in third-generation cellular (MBMS) in more detail. The chapter finishes by summarizing these systems and the next steps we can expect to see in the natural development of multicast content delivery systems for mobiles.

10.2 MULTICAST OVERVIEW

10.2.1 The Justification for Multicast
Multicast is efficient for group communications. The delivery of data to a group of users is preferable to multiple individual data connections between each pair of users, since no part of a multicast system need duplicate actions or data. A simplified view of how multicast can provide efficiency in the data delivery infrastructure is shown in Figure 10.1, where a single sender is delivering the same datastream to several wireless receivers. The figure also illustrates the communications efficiencies that apply to fixed and wireless links, as well as the impact on the three-mile system:

- The first mile, between the servers or senders and their respective network connections, typically the Internet with access provided by an Internet service provider (ISP) of some kind. Providing sufficient quality of service with reasonable connection costs is of paramount importance here.
- The middle mile, over which data passes from the remote source to the remote destination, analogous to the major ISP core networks of the Internet. Data traverses several links and devices, which it shares with other data streams, so high-volume data flows and well-behaved connections, for instance employing friendly congestion control, are the chief concerns.
- The last mile, between the user (or her/his network) and her/his communication access point, such as her/his local ISP with fixed dialup or a cellular radio link with a wireless operator ISP.

The costs and disadvantages of multicast are derived from the same axiom, that multicast is group communication. Thus ownership of costs, intellectual
Figure 10.1 A simple unidirectional end-to-end system showing the relative efficiencies of routing by (a) multicast and (b) unicast.
rights, security, network usage and congestion, and communications management functionality is not the same as in point-to-point communications, which commonly consist of two parties with well-understood roles (e.g., server-client). This functionality must be distributed among the group, and there are many "best" ways to do this depending on the application and its usage. For a massive webcast, it makes sense that any special relationships exist only between the sender and each of the receivers, whereas for a teleconference there is a need for more of a peer-to-peer relationship. In all cases, the provision of group functionalities is key.
Although IP multicast has been available as a basic technology in the Internet for many years, it has not been as widely deployed or used as point-to-point methods. There are several technical and commercial reasons why this has been so, leading
to a widespread acknowledgment that multicast has not yet been successful in the Internet. This perception subsequently developed into a chicken-and-egg problem: vendors manufacture what operators have committed to order, and operators buy what vendors have committed to manufacture, so multicast-optimized hardware and software have not been commercialized or made as widely available as their point-to-point equivalents. However, the technology and marketplace have continued developing during the years of multicast availability, and the main obstacles to widespread multicast deployment are now well understood and either solved or being actively worked on. In particular, well-behaved congestion control mechanisms for multicast transport will allow the coexistence of point-to-point and multipoint data traffic over the same networks and thus allow a natural migration towards fully IP-multicast-enabled ISPs. Group management and control, especially in terms of security techniques, has also undergone significant development, notably in the Internet Engineering Task Force (IETF) Multicast Security [1] and Multicast and Anycast Group Membership [2] working groups. In an increasing number of cases the advantages of multicast are becoming very attractive. Narrowband users cause less congestion in core networks than do broadband users, and increases in the number of broadband connections to homes and elsewhere mean that broadband services are feasible in the last mile. This requires that the first and middle miles scale to the increasing demand for rich media services, an area where multicast has a clear advantage over unicast. The shifting paradigm from "Internet by PC" toward a more ubiquitous mobile Internet leads to changing Internet usage patterns, and many of the exciting next-generation mobile data services, such as mobile TV, are more technically and commercially attractive using one-to-many techniques.
All of these factors work in favor of growing deployment of multicast generally in the Internet and especially in the wireless mobile multicast systems that we shall describe.
10.2.2 Three Perspectives on Multicast
Multicast generally describes multipoint, or multiparty, exchange of information; the specific definition of multicast derives from its usage. Taking a simple layered protocol analogy, we can derive three perspectives on multicast usage:

- Application Layer Multicast. Data or content is shared between many users simultaneously, with applications understanding the multiparty aspects of the service. Email with multiple recipients is an example of application layer multicast that uses unicast for delivering the data packets.
- Network Layer Multicast. The network infrastructure optimizes the routing of data by delivering packets over a link only once, even though they may be destined for many recipients. Packet duplication is thus needed only where different receivers are accessible via different links.
- Physical Layer Multicast. The basic link technology determines whether the physical layer is shared or dedicated. For instance, digital subscriber line (DSL) links deliver IP multicast packets point-to-point, whereas Ethernet segments broadcast the IP multicast packets to all devices on the link.

This list is not intended to be an exhaustive analysis, although it should provide an insight into the sorts of content distribution application environments that can benefit from multicast technologies. The term multicast is commonly used to imply network layer (i.e., IP multicast) techniques, although the total value of multicast to users and to content and network providers is derived from a combination of all of these layers.
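To make the network layer perspective concrete, the sketch below uses the ordinary Berkeley sockets API, as exposed by Python's standard library, to join an IPv4 multicast group and to send a datagram to it. The group address and port here are arbitrary illustrative values, not taken from any system discussed in this chapter:

```python
import socket
import struct

GROUP = "239.1.2.3"  # arbitrary example from the administratively scoped range
PORT = 5004          # arbitrary example port

def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Return a UDP socket that has joined an IP multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # IP_ADD_MEMBERSHIP signals group membership (via IGMP) to the network
    # layer, which then replicates packets only on links that have members.
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def send_to_group(payload: bytes, group: str = GROUP, port: int = PORT) -> None:
    """Send one datagram that reaches every joined receiver."""
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL 1 keeps the example packet on the local link.
    tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    tx.sendto(payload, (group, port))
    tx.close()
```

A single send_to_group call replaces the per-receiver copies that unicast would require, which is exactly the saving that Figure 10.1 illustrates.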
10.2.3 Multicast as a Communication Technique

Unicast describes point-to-point (one-to-one) communications, whereas multicast includes three theoretical multipoint communications subcases (as illustrated in Fig. 10.2):

- Point-to-multipoint (p-t-m). A single entity communicates with two or more others. Internet radio [3] is an example, where streamed audio is sent from one server to many clients, providing application layer multicast (simultaneously shared content) and using either unicast or multicast IP at the network layer.
- Multipoint-to-multipoint (m-t-m). Two or more entities communicate with two or more other entities. Conferencing is a well-used example of this, such as for voice over IP [4].
- Multipoint-to-point (m-t-p). Many entities start communications with a single entity. Examples of this subcase are far fewer than of the previous two, especially for user services. A network protocol example is the DHCPDISCOVER message used in DHCP [5].
Figure 10.2 The three fundamental multicast subcases: (a) point-to-multipoint; (b) multipoint-to-multipoint; (c) multipoint-to-point.
These subcases imply that the originator of communications sends the initial message to the other party [i.e., originator(s)-to-other(s)], which is the distinction between the first (p-t-m) and third (m-t-p) cases. At this level there is no distinction between the functions of these parties; for instance, peer-to-peer, server-client, and router-host communications are all included. In some scenarios a host can act as both a sender and a receiver, and in others it will assume only one of these roles. For content distribution, the point-to-multipoint case is already extensively implemented for services, and multipoint-to-multipoint is undergoing significant efforts, especially for conferencing and multiplayer gaming. Our main focus in this chapter is on the point-to-multipoint case, as it most accurately defines the services provided by the two mobile wireless systems we shall describe, IPDC and MBMS. However, the other two cases are not insignificant, and we can expect further developments in those areas.
The distinction between multicast and broadcast is not well resolved in the literature on this subject, so we will take the approach that broadcast is a subset of multicast. The characteristics of broadcast are that it is a unidirectional communication, involving no return messaging (at least none in band), and transmitted to all users on a link. Multicast also includes the cases of bidirectional signaling and of targeting selected groups, which can limit reception to only some users on a link, or across several links. What is implied by the term broadcast depends on which of the layers discussed above employ it. For instance, it is feasible to deliver IP multicast packets over broadcast radio, and we consider this to be a use of multicast at the network layer, although broadcast would also be a suitable description if no return signaling were sent on the network layer. Another useful concept is the difference between active and passive hosts.
An active host will send and receive messages, while a passive host will only receive. In the broadcast scenario, all receiving hosts are passive, as return signaling is not used. In the more general multicast scenario, all or some of the receiving hosts may be active.

10.2.4 Multicast Applications and Services
Selecting and developing multicast applications to deliver services and content should start with a simple question: "Is there a clear benefit in using multicast?" The answer must take into account the end-to-end characteristics of the service in question. Arguments in favor of multicast tend to be along the lines of "We need scalability to a large number of users," "Users must receive the content at the same time," or "This content is interesting to a large number of people." Other factors that need to be considered are the media (i.e., content source), format, and expected consumption. For instance, streaming of video is natural for real-time consumption, although file delivery may provide more optimal, and opportunistic, use of bandwidth if the video is to be consumed later than it is delivered; the selection of which delivery method to use should take into account the expected usage of the whole receiving group. On the other hand, file delivery may be the
natural choice for executable objects, such as games software. In all cases, the choice of using multicast delivery depends on the audience size, location, and consumption habits of the user group(s) in question. Figure 10.3 shows some of the alternatives for this source-delivery-consumption chain. A real-world application may well combine more than one of these alternatives, as in a live news channel augmented with cached HTML pages and video clips.
The basic metric for the selection of multicast over unicast is the concept of multicast gain, that is, the expected quantitative benefit derived from using multicast [42]. A single value for systemwide multicast gain is generally very difficult to compute, as gains in radio and fixed-line links, as well as correlation of user interests and variability of the service mix, would make such a figure a unique balance of many assumptions, and thus more academic than pragmatic. The function used to calculate multicast gain therefore need only be as complicated as required: if only a certain system element or domain is problematic, the analysis can be limited to that. A typical example is the scarce radio bandwidth of mobile wireless systems [43]. For example, if a live video transmission were expected to be delivered to 100 users on the same radio bearer, there may be a multicast gain of 100, although some interference-sensitive radio technologies could reduce this to, for example, 25. (Note: Dedicated closed-loop power-controlled radio channels in particular can be more efficient, using a small number of dedicated radio channels, than a shared open-loop power-controlled channel.)
The composition of a multicast gain metric needs expert consideration from system and applications developers. However, there are some general issues that form a good foundation for this analysis. The importance of each of the following parameters can be evaluated:

- Bandwidth Capacity and Congestion.
In each link (e.g., air interface, network infrastructure, terminal capability) the cause and effect of congestion can be analyzed. Multicast with a congestion control mechanism may perform very well if users are interested in at least some common content. It is worth noting that unicast, broadcast, and multicast (with a variety of uplink methods) pose different congestion problems, and do so differently for uplink and
Figure 10.3 Common alternatives for the source-delivery-consumption chain.
downlink. For instance, the number of users helps to establish whether broadcast is more efficient than unicast. On the downlink, multicast can be even more efficient than broadcast (as the data are not transmitted to "empty" cells), although additional route state changes and uplink signaling render (uplink-capable) multicast less scalable than broadcast only.

- Application Requirements. Each application will have its own requirements, and the service mix of a certain network or provider will determine the priority given to each. This will impact the value of multicast as a data delivery method in the relevant environment. For instance, delivery synchronization (i.e., limiting jitter, the variation in time between several receivers receiving the same content) can be particularly important for stockmarket alerts and online gaming. Another example is massive file delivery of software updates, where massive scalability and reliable transfer of executable code are both essential.
- Consumption Habits and Timing. Individual consumption habits, and especially their timing, play an important role. Where there is an existing usage paradigm (such as news at 6 P.M.) there is a strong case for a shared real-time transmission. When there is little user activity (generally true at night), common offline content, such as trailers and advertisements, may increase the total per-user data quantity available for push delivery. Also, polarized habits such as TV channel hopping¹ and passive listening² to the radio may vary the requirements on delay and sustained bandwidth from case to case. In any case, the time required to receive the content, combined with the timing of consumption, is a critical factor.
- Usage Shaping. If there is a distinct advantage in shaping user consumption, then this may also affect multicast gain. For instance, the related revenues as well as the scalability of the system may influence the balance between video on demand and live TV.
- Deployment Investment. The cost and effort of deploying and maintaining a technology may play an important role. Where a service provider has an existing point-to-point infrastructure, upgrading to multicast may require a threshold of new users and revenues to be met first. On the other hand, the scalability of multicast may work in its favor for early deployment, as more users may be provisioned with rich media services for generally lower infrastructure requirements; such a system may be the basis for incremental upgrading to increase point-to-point capacity as revenues are realized.
¹ The term channel hopping originates from the widely known user behavior of switching rapidly between television channels, usually to establish whether there is interesting content without consulting a service guide. The term also describes rapid user switching between content flows in general, and sets requirements on timely establishment of new flows to meet user expectations, as well as timely teardown of old flows to avoid network congestion.
² Passive listening is a common user consumption behavior for AM/FM radio systems, where a user "gets on with other things" while listening to music or voice broadcast in the background and does not change the radio station (i.e., the content). This contrasts with active listening, where listening to the service would be a user's main, or only, activity.
Additionally, there needs to be a technical balance between the use of broadcast (passive receivers) and return path signaling, as the latter reduces scalability and increases complexity but may be used to improve reliability, security, and usage reporting. Furthermore, the commercial environment tends to add complexity to the decision: issues such as privacy, content distribution rights/licensing, router capability, and provisioning may all play an important part in the balance between unicast and multicast. Although there is no hard limit on which applications multicast supports or which are particularly suited to multicast, a brief listing of multicast applications can help set the context. There are already many good resources that list multicast applications, and Table 10.1 [6] provides a suitable foundation.
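As a toy illustration of the multicast gain idea discussed above, the sketch below compares the per-bearer transmission cost of unicast and multicast delivery. The radio_efficiency derating factor is a hypothetical stand-in for the interference effects mentioned in the example, not a figure from any real system:

```python
def multicast_gain(receivers: int, radio_efficiency: float = 1.0) -> float:
    """Toy multicast gain for a single shared radio bearer.

    Unicast sends one copy of the stream per receiver, whereas multicast
    sends a single shared copy, so the ideal gain equals the number of
    receivers. `radio_efficiency` (0..1] is a hypothetical derating factor
    for interference-sensitive radio technologies, where a shared open-loop
    channel is less efficient than dedicated power-controlled channels.
    """
    if receivers < 1:
        raise ValueError("need at least one receiver")
    unicast_cost = receivers                  # one transmission per receiver
    multicast_cost = 1.0 / radio_efficiency   # one shared, derated transmission
    return unicast_cost / multicast_cost

# The text's example: 100 users on one bearer gives an ideal gain of 100,
# which a hypothetical derating factor of 0.25 reduces to 25.
print(multicast_gain(100))        # 100.0
print(multicast_gain(100, 0.25))  # 25.0
```

In practice such a function would only model the problematic element or domain, as the text notes, rather than attempt a systemwide figure.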
10.2.5 Mobile Wireless Multicast

Cellular communications has traditionally focused on dedicated voice call services to mobile wireless handsets, with more recent commercialization of packet-switched services, including Internet access. As the convergence of cellular and Internet communications is realized, the need for compelling rich-media content and services for next-generation mobile users will be extremely important to the success of this convergence. Multicast has the potential to offer media-rich services to multiple users at a capacity cost less than serving each user individually, while potentially providing a better quality of service to all. As such, it is a very interesting complement to the dedicated services of second- and third-generation (2G and 3G) cellular communications in terms of the network optimizations and content provisioning benefits it has to offer. The application of, especially, IP-based multicast services to the mobile wireless environment must cater for the demands of this environment. Scalability for an
TABLE 10.1 One Example List of Multicast Applications

One-to-Many Applications: scheduled audio/video distribution; push media; file distribution and caching; announcements; monitoring.
Many-to-Many Applications: multimedia conferencing; synchronized resources; concurrent processing; collaboration; distance learning; chat groups; distributed interactive simulations; multiplayer games; jam sessions.
Many-to-One Applications: resource discovery; data collection; auctions; polling; jukebox; accounting.

Source: Quinn and Almeroth [6].
increasing number of users at feasible infrastructure costs, loss of packets and coverage due to radio propagation and interference characteristics, and the rapid or gradual migration of users between access points as they perform cellular handovers are all typical considerations that multicast must address in the mobile wireless environment. Both of the real-world systems we shall describe, IPDC and MBMS, must contend with these issues and provide solutions to the problems that their developers have prioritized. Multicast (one-to-many) services have long been very successful in the broadcast world, providing well-understood television and radio services to typically stationary (or car-based) receivers. Even in these systems, mobility and access using handheld, battery-powered terminals stretch the basic technologies to, and beyond, their limits. Thus a three-way broadcast-cellular-Internet convergence scenario could see the greatest benefits to all three worlds.
10.3 THE GENERIC IP MULTICAST SYSTEM

10.3.1 Common Multicast System Aspects
Both of the real-world systems described later, IPDC and MBMS, bring terminology and assumptions that are specific to the broadcast and telecommunications worlds, respectively. However, with a common language and basic analysis, it is evident that they share much in common. In this section we provide a basic system reference model with which to compare architectural aspects of IPDC, MBMS, and future variations, and we describe some of the aspects that are conceptually equivalent in both systems even though the detailed solutions may differ. In particular, both systems utilize IP multicast from IETF standards, which leads to a number of common system aspects. Furthermore, both systems specialize IP multicast to provide a unidirectional shared radio downlink (with separate individual uplink channels in the case of MBMS multicast mode). The unidirectional link poses particular problems for media discovery and connection management.
10.3.2 A Reference System Model
Figure 10.4 gives a model of a generic end-to-end system that can be used as a baseline to compare with both IP datacast and MBMS systems [7]. The purpose of this diagram is to spell out the principal domains and interfaces that may be interesting to us (the names are rather arbitrarily chosen). Each domain may be seen as a single component with a set of functionality. In practice, each domain will consist of multiple logical and physical components, such as services and servers, and provide the sum of the functionality of those components. Each interface provides a set of services between two domains. Although the interfaces contain no functionality themselves, they determine the functional requirements of the domain components.
Figure 10.4 Reference model of a generic end-to-end multicast system. (The model comprises the Content, Service Delivery, Core, Access (unidirectional and bidirectional), Client Platform, and Client Application domains, connected by the Content, Service, ServiceNet, Core, CoreNet, AccessNet, Broadcast, Interaction, Client Control, and Client/Network interfaces.)
The framework fits the "3 Cs" business approach:

- Content: content and service delivery. Traditionally this role is fulfilled by a content provider and/or aggregator.
- Connection: core and access networks. Traditionally provided by a network operator.
- Consumption: client platform and application software. Traditionally enabled by end-user equipment and applications.

Typically, a business model would deploy each of these domains separately or in groups for a single operator (or provider) in the value chain. There may be common functions required by different operators, and so domains may overlap; this may lead to redundancy and competition. In all cases, each domain provides a distinct set of functions and components. For example, a content provider may operate a streaming server (and therefore part of a service delivery domain) while keeping his/her business model separate from that of a service provider, which may be aggregating streams. This framework implies complete scalability: unlimited numbers of each domain working with unlimited numbers of other domains. Furthermore, it does not impose business model constraints.
10.3.3 Three-Platform Services

As discussed earlier, the selection of multicast technologies is bound to the selection of services and applications a system will support. According to our reference model, content is sent from a service delivery domain. In practice, this means that we must employ some kind of service system that is the source of multicast services, and possibly the source or repository of the content, too. In other words, a service
system is a collection of content servers. As any service system is generally built for purpose according to the service mix required by the system operator(s) in question, the exact content and servers are impossible to define at a generic level. However, three general platforms are required for the provision of higher-level services that are common to IPDC and MBMS multicast systems: streaming, filecast, and media discovery.

- Streaming. This is for the streaming of data from local storage and live sources. Large audiences can be streamed to at rates from a few tens of kilobits per second to several megabits per second, supporting rich media streaming particularly suited to video and audio, for instance providing mobile television. Streaming is also characterized by timely delivery (consistent data rate and low jitter) being more important than perfectly reliable delivery.
- Filecast. This is for the simultaneous distribution of files, otherwise known as discrete media objects. For some applications, close to real time may be desirable, such as when images are to be displayed and synchronized with real-time streaming media. Many applications interested in file transport also demand reliable (i.e., error-free) delivery of files; thus reliable multicast protocols are especially important. Use cases exist for many combinations of the alternatives: massive or small-group distribution, large and small file sizes, real-time or offline delivery, scheduled or on-demand/spontaneous content, best-effort and reliable transport, one-off and repeated/carouselled files.
- Media Discovery. IP multicast systems announce their content and services in advance, and during the multicast sessions that deliver them. This enables users to select and locate end-user services for consumption, and to access them when the time comes.
Generally, media are described using a description syntax and semantics, such as the session description protocol (SDP), and delivered using one or more transport protocols, such as the session announcement protocol (SAP) or the hypertext transfer protocol (HTTP). The unidirectional nature of structured wireless multicast networks and the lack of a massively scalable uplink mean that unidirectional delivery mechanisms are often preferable (possibly exclusively) to those requiring bidirectional connections. For this reason, reliability and redundancy in delivering media descriptors is important and faces many of the same issues as filecasting.
Some multicast applications (see Table 10.1) are easily based on one of these platforms, whereas others require additional development, such as the choice between streaming and small-file filecast for a messaging or chat service. It should be noted that these three platforms represent a feasible baseline, based on implementation experience, rather than an exhaustive analysis of all the options.
Internal (e.g., walled garden³) and external (e.g., public Internet) content may be available, and the choice of servers, proxies, and digital rights technologies will

³ Walled garden services and content are those that are available from only a limited number of providers and operators based on proprietary agreements.
reflect this. In many deployments there is a need for more than one service system with different functionalities; for instance, the file and management servers of a content provider may feed the content acquisition servers of a service provider, and this service provider may subsequently stream content to a service aggregator or network operator who delivers their service mix to wireless users.
The use of IP multicast provides a toolkit of existing IP stack protocols, both standardized and works in progress in the IETF. The maturity and feature set of IP-based protocols varies for each of the platform services.

10.3.3.1 IETF Streaming Several proprietary streaming protocols, codecs, servers, and players are on the market, but the momentum behind a number of open standard options and their existing implementations means that real-time transport protocol (RTP) [8] delivery is becoming the de facto choice for streaming transport. Related protocols, such as the real-time control protocol (RTCP, part of the same standard as RTP) and the real-time streaming protocol (RTSP), provide additional features to a subset of applications, especially where a return link is available. RTP takes a very flexible approach to the support of multiple and future media codecs, as it requires a payload format to be specified for each payload type it supports. Many are already standardized, and the work on MPEG-4 payload formats promises to solidify a good selection of completely open standard video and audio streaming solutions.

10.3.3.2 IETF Filecast Many open standard and proprietary solutions have been proposed for filecasting. However, efforts to standardize reliable multicast transport have resulted in the IETF chartering a working group on RMT [9] to produce, among other things, specifications that meet the reliable unidirectional delivery requirements.
In particular, asynchronous layered coding (ALC) [10] offers a building block for producing a robust file delivery protocol with sufficient congestion control for IPDC and MBMS environments, and work is in progress on file delivery over unidirectional transport (FLUTE) [11], a protocol instantiation that fully specifies a filecast protocol based on ALC. Although this extremely important work is still in progress, with deployment results needed to complete the IETF standardization process, it is anticipated that a single open standard for IP-based filecast may be available in time for widespread IPDC and MBMS deployments.

10.3.3.3 IETF Media Discovery
There are primarily three delivery scenarios for media discovery in wireless multicast terminals: unidirectional multicast only, bidirectional unicast only, and a combination of both unicast and multicast delivery. Existing IETF (and other) protocols provide solutions to the first two options. The session announcement protocol (SAP) [12] multicasts single session description protocol (SDP) [13] descriptions of media so that user applications can understand the available services sufficiently to locate them on a multicast link. SAP has seen some global deployment but has well-understood flaws that have kept it from wider deployment and from progressing beyond an
340
MULTICAST CONTENT DELIVERY FOR MOBILES
experimental standard: the lack of reliability, the lack of announcement prioritization, and outdated authentication mechanisms. Plenty of unicast protocols exist, with HTTP/TCP almost ubiquitously deployed. The addition of a common description scheme for multicast services over both multicast and unicast transport is essential to allow the combination of these transports and operator-tailored variations. The multitude of existing description formats (MPEG-7, SDP, SDPng) makes it essential to provide a basic framing that enables the delivery of metadata independent of syntax, to describe the basic data model of how elements of metadata relate and should be used, and to specify the system-specific and application-specific metadata formats for the range of multicast services to be provisioned. Common global standards for the basic framing and baseline data model are essential to enable interoperability of IP-multicast-based systems. The IETF has chartered the MMUSIC (multiparty multimedia session control) working group to build on its prior work, including SAP and SDP, and compile a framework that reuses suitable existing IETF protocols and newly specifies the missing blocks. The work item as a whole is called Internet media guides (IMG) [14], which promises to provide a baseline toolkit for each of the described delivery requirements.

10.3.4 IP Multicast Networking Procedure
Several general steps are required to provide and receive wireless multicast services; most are generalized in Figure 10.5. The stages in broken-line boxes are optional but widely used, and are generally less well supported by openly standardized methods. Content creation, and the simple and complex services created from it, result in a ready source of content in formats compatible with the user equipment capabilities. The scheduling and agreements between network operators, intellectual property rights owners, and service aggregators have a basic impact on the technology, generally limited to formats, digital rights management, server choices, and control messaging. Service advertising may be out of band of the system (e.g., billboard advertising) or may be electronically available through Web links or in-band unsolicited service announcements. Its purpose is to arouse user interest and subsequently encourage users to register for services. This registration may involve financial information exchange, such as payment for subscriptions or authorization to bill for services accessed later. All of these steps may be iterative, such that registration for a general bundle of services may or may not imply registration to specific services or content (generally referred to as media). If service media are to be secured (encrypted, authenticated, etc.), sufficient security information to access the media must either be given at registration or made available as a related service. In other words, after registration a user has both the right to access the service and the security data needed to access it, with or without further security messaging. Media discovery occurs as described earlier. It should be noted that media discovery communications behave like any other IP service on multicast bearers, and access to media announcements requires running through each of the same
Figure 10.5 Generalized procedure to provide and receive multicast services. (The network side comprises content creation; service creation, scheduling, and agreement; service advertising; media discovery/announcement; backbone bearer (routing) configuration; radio access network bearer configuration; and multicast content delivery from the source. The user side comprises service selection, subscriber registration, service registration, media discovery, multicast group/channel joining, radio channel access, and multicast content consumption.)
steps for access to this service. Media discovery can also be out of band, such as using a point-to-point communications channel or entering details by hand into the user device. For some services, such as the media announcement itself, fully autonomous (or preconfigured) discovery may be useful, for example using well-known (or standardized) session parameters for announcements. On the network side, both the fixed IP-routed (and IP-switched) backbone network infrastructure and the radio access network infrastructure must be configured. The actual order in which this occurs, and the timeline with respect to the complementary joining and radio access functions on the user device, may vary depending on operator preferences. For instance, user joining may be required to propagate through the network before any backbone or radio bearers are set up, to ensure no bandwidth is wasted; for existing groups, and for broadcast in general, it may instead be desirable to set up the data transmission all the way to the radio link regardless of the state of (new) user devices (and possibly also to make the joining procedure a device-local feature). Preconfiguring the network in this way is suitable for larger audiences and eliminates (or reduces) the need for an uplink channel, thus increasing scalability. For maximum interoperability, protocols and technologies below the IP layer should be autonomous, such that radio access network bearer configuration and subsequent access by a user do not affect the higher-layer signaling. To ensure this, radio and link layer parameters can be signaled at the radio link in
question. This may involve paging, notification, or the sending of service information tables specific to the radio technology. Both MBMS and IPDC systems provide their own access-specific signaling to allow user devices to correctly associate radio and link channels with specific multicast services (IP addressing), so these parameters need not be delivered by a higher-layer media announcement mechanism. Several protocols are available for the routing and switching of IP multicast packets on the Internet [15], including the option of tunneling multicast streams within unicast streams over one or more links (where IP-routed multicast is not feasible). Generally, IP routing is considered superior to forced switching, as it is more autonomous and scalable and requires less administrative (i.e., human) configuration and maintenance. However, for unidirectional delivery with well-known bandwidth constraints, it can be highly desirable to provision the links in advance with well-defined quality of service parameters; thus the use of IP-switched techniques, especially tunneling, is popular in wireless multicast systems. Demand for wireless multicast services will stimulate development, and it is feasible that the need for unicast tunneling may subside in the future if the perceived benefits of routed IP multicast outweigh the upgrade cost. Figure 10.6 illustrates the difference between routed and tunneled multicast in a broadcast service scenario, making three important points: (1) tunnels add a layer of complexity and overhead, (2) tunnels may cause duplicate data on the same link (the clouds in the figure represent subnets), and (3) tunnels guarantee that data pass through certain control points on their path.

10.3.5 Additional Aspects of the Mobile Wireless Environment
Content delivery to mobiles relies on wireless radio techniques, and these place additional requirements on any mobile multicast system. The following sections describe the aspects of wireless multicast that are common to both IP datacast and MBMS systems, including issues generally applicable to any wireless multicast system.

10.3.5.1 Mobility and Movement of Users
The mobility of users creates two further needs: untethered access to the network services and consistent service while users are moving. The first need is inherently met by wireless data communications, but it also forces user devices to be battery-powered under normal use. Thus electrical power consumption (both peak and average) must be carefully considered and minimized where feasible. This affects both the physical design of the equipment and the protocol design: the ability to sleep, that is, to shut down certain software and hardware functions to reduce power consumption for short and longer periods, must be factored in. Also important is the ability to receive passively on one interface without maintaining an uplink channel. Consistent service while users are moving implies robust reception as a user moves within the coverage area of its current access point, and handover between wireless access points. Graceful degradation of service quality is also desirable if the quality of
Figure 10.6 IP multicast delivery techniques: (a) IP routed and (b) IP switched by tunnel.
service cannot be maintained. Robust reception is normally a combined function of reliable delivery protocols and the radio modulation technique, such as its ability to deal with Doppler effects and Rayleigh and fast fading. Handover, also known as handoff, is one of the essential functions supporting user mobility in a mobile communications network, as it provides a means to maintain data traffic connections. It deals with setting up new connections and releasing (or maintaining) old connections to network cells as a mobile terminal moves from the radio coverage area of one cell to another. Cells are generally the radio coverage areas of basestations (i.e., the radio access points). Handover in wireless cellular systems is normally a three-phase process: (1) measurement (measurement criteria, measurement reports), (2) decision (algorithm parameters, handover criteria), and (3) execution (handover signaling, radio resource allocation). To illustrate with an arbitrary example: measurement is nearly continuous (e.g., sampled every 100 ms), decisions are regular (e.g., assessed every 5 seconds), and handover is infrequent (depending on UE usage, e.g., on average every 20 minutes). Handover execution is typically initiated by a decision based on the measurement of
certain criteria (e.g., signal quality between the basestation and the mobile device). There are many good descriptions of this process for point-to-point wireless communications, and it is well covered for 3G networks by Kaaranen et al. [16]. Mobility for multicast services differs from point-to-point services in that the multicast transmission is likely to be delivered to users in several cells as part of the same session, individual members of multicast groups are likely to move somewhat independently of each other, and large group membership implies greater signaling overhead for any uplink. For these reasons there are two generalized types of handover (HO) for multicast services: active and passive. Active HO requires specific signaling from the user device to the network (uplink), whereas passive HO requires the network to provide sufficient information on the downlink for a user device to get the service in the new cell without uplink signaling. Obviously, passive HO requires that the new cell already be configured to deliver the service, whereas active HO also enables inactive cells to be reconfigured and then included in the transmission area. Thus passive HO is less versatile but scales better to large user groups and is particularly suited to multicast and broadcast to passive users. Passive HO can be implemented by using sufficiently similar service parameters in adjacent cells, possibly supplemented by access-specific notification signaling, so that a terminal can learn whether or not the same service is available in the new cell, either before or after the handover. This has clear implications for the user interface, so that users can understand and deal with preconfigured service coverage. On the other hand, active HO may be used to serve smaller user densities (zero to a few users per cell), but this spreads the radio resource cost between fewer users and may carry a premium monetary cost.
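To make the passive/active distinction concrete, the following Python sketch (using hypothetical data structures, not drawn from any standard) shows the device-local check a terminal might perform on entering a new cell: if the cell's broadcast service table already carries the parameters of the service being received, the terminal retunes locally (passive HO); otherwise it would need uplink signaling (active HO) or lose the service.

```python
# Hedged sketch: service tables and parameter names are illustrative only.

def passive_handover(current_service, new_cell_service_table):
    """Return the parameters to retune to in the new cell, or None if
    the service is not transmitted there (active HO would be needed)."""
    params = new_cell_service_table.get(current_service)
    if params is not None:
        return params  # same service configured: retune, no uplink needed
    return None        # service absent: fall back to active HO or lose it

# Example: service "news-tv" is configured in cell B but not in cell C.
table_b = {"news-tv": {"multicast_addr": "232.0.0.1", "channel": 21}}
table_c = {}

assert passive_handover("news-tv", table_b)["channel"] == 21
assert passive_handover("news-tv", table_c) is None
```

The key design point is that everything the terminal needs is already on the downlink, which is why passive HO scales to arbitrarily large (and purely receiving) audiences.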
For both passive and active multicast HO, the concept of area is extremely useful. An area may be used to define two useful parameters: the geographic area configured to transmit the service, and the geographic area authorized to deliver the service. In the passive and broadcast cases these are generally the same, but for active multicast the authorized area is generally larger, as only cells with sufficient users will be configured to transmit the service, to conserve radio resources. Various other intermediate area definitions may be useful. For instance, a network would generally be described by a number of areas, which could be cells, groups of cells, a subgroup of cells covering a known geographical area (e.g., a certain city), or the whole radio network. Figure 10.7 illustrates this by example. In this way a network operator can offer a certain well-known area for a service rather than a list of obscure cell names, whose radio coverage is often dynamic in practice. It also aids passive handover, as a terminal can expect the same services to be available in all the cells of a certain transmission area. This technique enables various parts of the network to abstract others; for instance, 3G core networks generally understand 3G radio access networks in terms of routing (and other) areas, and only the radio access network is assumed to have knowledge of the specific cellular topology. Another feature of multicast handover where cells geographically overlap is that the availability of services is more important than the absolute signal quality in a cell. For instance, for point-to-point communications a
Figure 10.7 An example of the relationship between cells and areas. (The figure shows incoming data streams entering a data network that feeds cells 1 to 6; the cells are grouped into areas A, B, and C within the overall network coverage.)
user would generally prefer to be situated in the cell with the best signal quality, whereas if this cell is not authorized or configured to transmit the multicast services a user desires, then a lower signal quality (with more transmission errors) is preferable if the user can still get the services he or she wants. Handover to cells based on both individual and group communications requirements is complex and is normally solved by selecting the technique suited to the majority case: best signal for MBMS, where individual communication is paramount in 3G systems, and best service selection for IPDC, where one-to-many services dominate. A common aspect of the IPDC and MBMS systems is that they provide handover at layers below IP. The trend toward all-IP in the various mobile systems, which has already had a massive impact on IPDC and MBMS, indicates that multicast mobility at the IP layer may also be considered. It would be a reasonable option for offering bearer-independent mobility suited to heterogeneous and hybrid network systems. However, whereas IP protocols for unicast mobility are fairly well developed in the IETF and can expect widespread acceptance and deployment, mobile IP for multicast is not well developed in standards and promises to open a few technical and commercial debates in the future.

10.3.5.2 Errors in Radio Transmission
All communication systems can be subject to data loss. The fixed/wired Internet generally experiences packet loss due to congestion (excessive data load) prompting routers to drop packets and thus reduce congestion. However, wireless links suffer the majority of their data loss on the radio link due to radio phenomena such as fading and interference. Data loss on radio links is more likely to occur in bursts for the duration of the interference (or other cause) rather than as pseudorandomly dropped packets, as some IP-based protocols assume, such as TCP (Transmission
Control Protocol). Thus, it becomes important to provide reliable transmission that works well in the presence of the bursty errors characteristic of radio transmission. There are essentially two methods for reliable transport in general: increased coding of the data to provide redundant information that can be used to reconstruct the original data, and retransmission of data once loss has been detected. These are not exclusive and can be combined. In practice, redundant information takes two forms: forward error correction (FEC) and unsolicited repeat transmissions. FEC usually adds additional data to the transmission, mathematically calculated from the original data to enable full reconstruction up to a limit of lost data. The resulting FEC data may be transmitted in addition to the original data or instead of it, depending on the scheme used. Hamming codes and Reed-Solomon codes are common examples of FEC and are applied at several communications layers (in fact, channel coding on most radio links reduces the source data to transmitted data ratio). Unsolicited repeat transmissions achieve exactly the same thing as FEC, providing redundant data to be used in case of errors. However, they are simpler and generally consume more bandwidth than FEC, as no complex calculation is used to optimize the ratio of data size to error tolerance. A simple incarnation of this is the object carousel, which repeats the transmission of a number of objects over and over. The additional FEC or repeat transmission data are often transmitted in band with the original data. However, they may also be provided at a later time or on an alternative channel to reduce the statistical correlation between errors in subsequent packets relating to the same data. Scrambling (data block reordering) is also used for the same purpose: to reduce the chance of a burst error removing more data from a transmission than the error correction scheme can recover.
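The redundancy-based approach can be illustrated with a toy parity scheme. This is a minimal Python sketch, not any standardized FEC code such as Reed-Solomon: one XOR parity packet per group of k data packets lets a receiver rebuild any single lost packet in the group, at a bandwidth cost of 1/k.

```python
# Toy single-loss FEC: XOR parity over a group of equal-length packets.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(packets):
    """Append one parity packet (XOR of all data packets) to the group."""
    parity = packets[0]
    for p in packets[1:]:
        parity = xor_bytes(parity, p)
    return packets + [parity]

def recover(received):
    """Rebuild the single missing packet (marked None) by XOR-ing the rest."""
    rebuilt = None
    for i, p in enumerate(received):
        if p is not None:
            rebuilt = p if rebuilt is None else xor_bytes(rebuilt, p)
    return rebuilt

group = [b"abcd", b"efgh", b"ijkl"]
sent = add_parity(group)
sent[1] = None                  # a burst error knocks out packet 1
assert recover(sent) == b"efgh" # receiver reconstructs it locally
```

Real schemes tolerate multiple losses per group and are typically combined with interleaving (the scrambling mentioned above) so that a single burst does not exceed the per-group loss limit.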
The other primary mechanism for reliable transport is the retransmission of data once loss has been detected. For unicast, TCP is ubiquitous and routinely sends acknowledgments (ACKs) based on data received, so the sender may resend any blocks of data that are not acknowledged. A variation of this is the negative acknowledgment (NACK), which puts the onus on the receiver to detect packet loss; it can scale better in the one-to-many case, as the related state information for each sender-receiver pair is stored in just one receiver, instead of all being stored in the sender. Both schemes have been considered for IP multicast in the IETF RMT working group, and a protocol for NACK-oriented reliable multicast (NORM) is under standardization. However, the required NACK signaling needs an uplink channel (i.e., not a purely unidirectional service) and mechanisms to deal with NACK implosion, where many receivers want to send a NACK at the same moment because they have experienced a common fault in transmission. Thus, the use of redundant data must be the primary reliability scheme where media transmission is primarily unidirectional and must scale to mass audiences.

10.3.5.3 Unidirectional Downlink Bearers
The unidirectional nature of the downlink ensures that larger audiences can be served, but it also puts a premium on any protocol requiring a duplex connection. An alternative radio channel, on either the same or a separate radio access technology/system, used
for uplink and individual downlink signaling is feasible for terminals with this need. However, using an additional channel may impose additional costs on the user, and the additional resource usage will limit scalability, so it is better suited to smaller groups using protocols that deal with multicast scalability. This makes an additional channel less desirable when it can be avoided, and it is a serious design consideration for any wireless multicast application. Multipoint-to-multipoint communications may not be able to avoid this issue, but signaling overhead and scalability issues must be addressed to ensure any successful service over these systems. The availability of an alternative channel, especially on an alternative radio access technology (such as GPRS in addition to IPDC), adds hybrid-network and multihoming dynamics to the system design [17]. The network, the terminal, or both may make bearer selection decisions based on more complex formulations than handover criteria, such as access cost or quality of service. This also presents application developers with the challenge of providing services that are largely independent of bearers. Applications that can deal with the different bandwidths, latencies, and costs of several candidate bearers are likely to become more widely deployed, and thus more successful, than those dedicated to only one bearer type. Radio link variations and bearer selection functionality are likely to make some connections intermittent, so applications that require an uplink channel only intermittently are much more versatile than those requiring a continuous uplink. This also permits the special case of mixed active and passive users, where a small subgroup of users sends representative data to the network, such as packet loss reports or membership reports, and the remainder of the group passively receives the service as long as they do not absolutely need to signal the network.
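The idea that only a small subgroup needs to signal the network also underlies the feedback suppression used by NACK-based schemes discussed earlier: each receiver that detects a loss waits a random back-off before sending a NACK and suppresses its own if it hears another receiver's NACK first. The simulation below is a hedged illustration (all numbers and the propagation model are invented, not taken from NORM): only timers that expire before the first NACK has propagated result in duplicate NACKs.

```python
# Toy model of NACK suppression after a loss shared by all receivers.
import random

def simulate_nacks(n_receivers, max_backoff=1.0, prop_delay=0.01, seed=1):
    """Return how many NACKs actually get sent: the earliest timer fires,
    and only timers expiring within prop_delay of it escape suppression."""
    rng = random.Random(seed)
    backoffs = sorted(rng.uniform(0, max_backoff) for _ in range(n_receivers))
    first = backoffs[0]
    return sum(1 for t in backoffs if t < first + prop_delay)

# 1000 receivers share one loss, yet far fewer than 1000 NACKs are sent.
n = simulate_nacks(1000)
assert 1 <= n < 1000
```

This keeps uplink use intermittent and representative, matching the mixed active/passive user model described above.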
10.4 IP DATACAST (IPDC)
10.4.1 The IPDC Concept
The delivery of mass media content to mobile devices is a challenging problem. Mass media include movies, television, radio, newspapers, and other published media, and are, by definition, purposed for many people. Typically, mass media content is edited and published on a certain schedule. For example, an edited newspaper is printed and then delivered to households and points of sale. Television shows are recorded and sent to an audience on a carefully planned schedule. In other words, mass media are delivered to many people at the same time. Convergence will enable delivery of all types of content via any communication network to any device. Traditionally, when talking about the delivery of content to the mobile environment, it is often assumed that a bidirectional access network such as GPRS, WLAN, or UMTS is needed. This is natural, given the bidirectional nature of interactive classes of services such as normal Web browsing. However, mass media content, which is purposed for many people and can typically be delivered to many people at the same time, can be delivered using
unidirectional networks as well. The concept of delivering content using IP delivery over broadcast radio technologies is referred to as IP datacast (IPDC). As broadcast technologies are, by definition, purposed for broadcasting, they are well suited to delivering mass media content types. IPDC is therefore a technology that provides access to popular content for large audiences simultaneously. IP datacasting is based on the IP multicasting paradigm, with some conceptual additions for one-way/unidirectional networks and/or service concepts [18, p. 201]. One possible wireless transport option for delivering IPDC services is to use digital broadcast networks, such as the digital video broadcasting terrestrial (DVB-T) network, as the physical transport. IP data can be encapsulated into a DVB-T transport stream using a method called multiprotocol encapsulation (MPE) [19]. The DVB project office has also noted the possibility of integrating DVB and UMTS systems: in 2000, the DVB organization founded an ad hoc group to define common critical enablers for both DVB and UMTS systems [20]. In addition to the work carried out by the DVB project office, another boost for IP over DVB-T for wide-area cellular communications was the foundation of the IP Datacast Forum [7,21], whose major target is to demonstrate and promote the use of digital broadcast standards for the delivery of digital content using IP connectivity. Figure 10.8 [22] shows a general high-level architecture of a convergence terminal that could be used to access personal interactive content over a cellular interface and mobile mass media content over an IPDC interface. Application-layer and connectivity-layer convergence, as shown in the figure, enable delivery of any type of content over any radio access bearer to a single device.
10.4.2 IPDC Services and Applications
IPDC technology can be used to deliver any type of digital content. It can serve as a point-to-point delivery channel, especially when accompanied by
Figure 10.8 General high-level architecture of a convergence terminal for the consumption of mobile mass media [22]. (The figure shows three layers: an application layer providing digital convergence (applications), a connectivity layer providing IP convergence (IP, IP multicast), and an access layer supporting any transport, e.g., cellular access or IP over DVB-T. Application and connectivity layers are independent of the radio transport.)
a return channel. However, as the underlying radio access technology is designed for broadcast-type delivery, it is most beneficial when used for one-to-many applications; thus this section focuses on one-to-many services. Much research on mobile media consumption offers very detailed classifications of services in the mobile domain. For example, Ref. 20 lists about 20 different service scenarios for IPDC-type services, varying from mobile-office business applications to entertainment applications such as video on demand. Ref. 23 covers multiple scenarios for delivering walled garden portals over IPDC technologies. On the other hand, when we consider how people currently consume media, we see that usage of mobile services is not yet visible. For example, during 2002 in the United States the average daily television viewing time was 4.5 hours; time spent with printed media and the Internet accounted for a total of 1.2 hours, and people listened to radio for about 2.7 hours. Figure 10.9 [24] illustrates these media consumption reports for some different market areas. From these figures it is evident that the most popular medium is television. As the television delivery paradigm is natively broadcast, it is particularly well suited to delivery to mobile devices using IPDC technology. Television service scenarios can be divided into two basic categories: (1) broadcast preprogrammed television and (2) video on demand (VoD). Broadcast preprogrammed television can be either normal television broadcast (including live footage) or carousel-type broadcast where, for example, a news clip is broadcast at repeated intervals. The VoD concept includes both video on demand and near video on demand. IPDC technology is best suited to preprogrammed
Figure 10.9 Media consumption, hours per day, in the United States, Singapore, and Finland [24]. (The bar chart breaks daily media time into television, radio, print, and Internet for each country; television dominates in all three, at 4.5 hours in the United States, 5.0 hours in Singapore, and 3.8 hours in Finland.)
television and near-video-on-demand types of services. The challenge with VoD services is their lower scalability and bandwidth efficiency/reuse, due to the need for individual communication channels in both the uplink and downlink.
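The scalability gap between unicast VoD and broadcast delivery can be shown with simple back-of-envelope arithmetic; the stream rate and audience size below are illustrative, not figures from the text.

```python
# Aggregate bandwidth: per-viewer unicast VoD versus one shared broadcast.

stream_kbps = 300        # illustrative video stream rate
audience = 10_000        # illustrative audience size

unicast_total = stream_kbps * audience   # one channel per viewer
broadcast_total = stream_kbps            # one shared channel for everyone

assert unicast_total == 3_000_000        # 3 Gbps aggregate for unicast VoD
assert broadcast_total == 300            # independent of audience size
```

Unicast cost grows linearly with the audience while broadcast cost is constant, which is why IPDC favors preprogrammed and near-VoD (carousel) services over true on-demand delivery.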
10.4.3 IPDC System Architecture
Figure 10.4 gives a reference model that can be used for an IP datacast system. An example network architecture for a unidirectional IPDC network for broadcast IP data delivery is shown in Figure 10.10. The service and delivery management system (SDMS) controls the service system (SS) and the network elements. Content is provided to the service system by the content provisioning system (CPS). The service system schedules content distribution and delivers content over a quality-of-service-enabled backbone to the IPDC radio access system. The backbone routes selected IP packets to selected multiprotocol encapsulators (MPEs). Each encapsulator encapsulates IP packets in the native transport stream (TS) frames of the IPDC bearer.
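To illustrate the encapsulation step, here is a hedged Python sketch of fitting an IP datagram into fixed-size 188-byte MPEG-2 TS packets (4-byte header plus 184-byte payload). The header fields are deliberately simplified; real MPE [19] additionally defines section framing, MAC addressing, and a CRC.

```python
# Simplified TS packetization of one IP datagram (illustrative only).

TS_SIZE, HEADER_SIZE, SYNC_BYTE = 188, 4, 0x47

def packetize(datagram, pid=0x1FF):
    """Split a datagram across 188-byte TS packets on one PID,
    padding the final packet with 0xFF stuffing bytes."""
    payload_size = TS_SIZE - HEADER_SIZE
    packets = []
    for i in range(0, len(datagram), payload_size):
        chunk = datagram[i:i + payload_size]
        chunk += b"\xff" * (payload_size - len(chunk))  # stuffing
        # Simplified 4-byte header: sync, 5 high PID bits, 8 low PID
        # bits, and a payload-only adaptation/continuity byte.
        header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
        packets.append(header + chunk)
    return packets

pkts = packetize(b"x" * 1500)            # a typical-size IP datagram
assert all(len(p) == TS_SIZE for p in pkts)
assert len(pkts) == 9                    # ceil(1500 / 184) packets
```

The fixed packet size is what lets the broadcast multiplex carry IP data alongside ordinary video and audio elementary streams.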
10.4.4 Mobile Wireless Radio Networks for IPDC
There are three major variants of digital terrestrial television standards in the world today: the European Digital Video Broadcasting Terrestrial (DVB-T) system, the Japanese Terrestrial Integrated Services Digital Broadcasting (ISDB-T) system, and the U.S. Advanced Television Systems Committee (ATSC) system. The primary service for all of these is, at the moment, providing digital television transmissions to households; the creation of mobile television or mobile data services was not a primary driver in their original scopes. The ATSC system was designed to transmit high-quality video and audio [high-definition television (HDTV)] and ancillary data over a single 6-MHz-bandwidth
Figure 10.10 One possible network architecture for IPDC. (The figure shows the content provisioning system feeding the service system, which delivers content over a QoS-enabled backbone to multiprotocol encapsulators ("e"), all under the control of the service and delivery management system.)
radio channel. The system uses trellis-coded eight-level vestigial sideband (8-VSB) radio modulation, designed for single-transmitter [multifrequency network (MFN)] implementation. This modulation does not support any kind of mobility [25]. Because of this lack of capability for mobile reception, the ATSC system is not considered a candidate radio bearer for mobile IPDC services. The ISDB-T system aims to provide stable reception for compact, light, and inexpensive mobile receivers in addition to receivers used in homes. The system uses band-segmented transmission orthogonal frequency-division multiplex (BST-OFDM) modulation, which provides good mobility [24]. In addition, BST segments allow services to be provided on a bandwidth of 1/14th of the terrestrial television channel spacing (the total radio channel bandwidth). This feature enables terminals to consume less power in their radio-frequency (RF) components, at the cost of more system complexity. The DVB-T system was developed by a European consortium of public and private sector organizations named the Digital Video Broadcasting Project. The system was designed to allow digital video and digital audio transmission as well as the transport of multimedia services. DVB-T uses coded orthogonal frequency-division multiplex (COFDM) modulation. The standard was originally developed for stationary and portable reception, but it was later shown that DVB-T supports good mobile reception with certain parameters. In order to optimize DVB-T transport for the delivery of IP data to the mobile environment, the DVB project office has introduced a technical specification known as DVB-H (DVB handheld); this work previously progressed under the titles DVB-M (DVB mobile) and DVB-X. The DVB-H system, as far as is known today, will be usable in 6-, 7-, or 8-MHz UHF channels but will primarily target mobile (not fixed) receivers.
DVB-H is an optimized radio bearer for delivering IP data to mobile/portable handheld devices, such as mobile phones. One key issue, currently the central point of the research activities in the ad hoc group DVB-H, is the power consumption of the DVB-H front end. DVB-H will be used with battery-powered communication devices, including mobile phones. Here, battery lifetime is crucial, and ongoing research is looking at the prospects of providing DVB-H receivers that can operate in battery-powered mode for several hours without the need to recharge the batteries. To accommodate more efficient power usage, a form of time slicing, or time-division multiplexing, will be employed so that services can be delivered in short bursts at higher data rates than they are acquired or consumed. This creates time intervals during which a receiver knows that no data of interest to it are being transmitted. If these idle times are significant in comparison with the active (interesting transmission) times (e.g., 90% of the total time), then a terminal can make a significant power saving by powering down its radio electronics during the idle times. The other main issue for the specification work is to optimize performance in the mobile environment, including optimizing the number of radio subcarriers and increasing the robustness (error correction abilities) of the transmitted data.

MULTICAST CONTENT DELIVERY FOR MOBILES

As the standardization work is ongoing, it would be premature to make more detailed statements at this phase. At the moment, DVB-H looks like the ideal candidate IP datacast bearer for use with mobile convergence terminals.

Figure 10.11 [26] shows the global adoption of digital television standards. ISDB-T is used in Japan. ATSC is used in the United States and Canada. Central and South America had not selected a standard as of early 2003. DVB-T has been selected, or is likely to be selected, by the remainder of the countries.

The digital audio broadcast (DAB) system provides a signal that carries a multiplex of several digital services simultaneously. The system radio channel bandwidth is about 1.5 MHz, providing a total transport bit rate capacity of just over 2.4 Mbps in a complete "ensemble." Depending on the requirements of the broadcaster (transmitter coverage, reception quality), the bit rate available to data services ranges between 1.7 and 0.6 Mbps [27]. DAB provides the feature of IP encapsulation in its transport stream, as is the case with the digital video standards [28]. DAB can thus be regarded as a potential radio bearer for IPDC service.

10.4.4.1 DVB-T/H as a Radio Access Network for IPDC
DVB-T and its forthcoming DVB-H variant are particularly suited to mobile wireless content distribution as part of an IP datacast system because of their widespread deployment and mobile-friendly characteristics. For this reason we shall consider IP over DVB-T/H in a little more detail. Video, audio, and data are carried over a transport stream (TS), as defined by part 1 of the MPEG-2 Systems standard [29], in all the abovementioned digital television networks. There are five protocol profiles defined for data broadcasting over DVB [30], each with different application areas and requirements (Fig. 10.12). The profiles that can be used to provide DVB data services are data piping, data streaming,
Figure 10.11 Global adoption of digital television standards [26].
Figure 10.12 Data broadcasting profiles of DVB.
data/object carousels, and multiprotocol encapsulation. Of the data broadcasting profiles specified by DVB, the DVB multiprotocol encapsulation (MPE) is best suited for generic Internet access, as it provides a standard encapsulation for IP-based protocols [31,32]. The DVB MPE profile is intended for sending datagrams of non-DVB protocols over DVB networks. The encapsulation provided by the DVB MPE profile is closely tailored to the ISO LAN/MAN standards. Thus, the DVB network can be considered an OSI layer 2 data link in the domain between an MPE broadcast service provider and DVB data receivers. However, there are differences between DVB MPE and more traditional OSI data link layer technologies such as Ethernet: data links over DVB MPE are unidirectional, provide logical broadcast channels identified by different packet identifier (PID) values, and often include a much larger number of receiving hosts than does a normal LAN/MAN segment. While datagrams of other protocols can be fragmented and sent over multiple sections, no fragmentation is done for IP packets in MPE. Thus, each IP packet must fit into a single datagram section, which can be up to 4097 bytes in size. This sets an upper limit on the size [maximum transmission unit (MTU)] of IP packets that can be transmitted using multiprotocol encapsulation: 4074 bytes if LLC/SNAP framing is used, or 4080 bytes without LLC/SNAP framing (Fig. 10.13).

MPEG-2 also defines program-specific information (PSI) tables, which each digital television system inherits. In addition, DVB defines some of its own service information (SI) tables. These tables are delivered in carousel fashion, at different rates depending on the quantity and urgency of their information, so that receivers may understand the logical services provided by the digital television radio channel.
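The size limits above can be captured in a short helper. This is a sketch only: the overhead values are back-calculated from the MTU figures quoted in the text (4074 bytes with LLC/SNAP, 4080 without), not taken from the DVB specification itself.

```python
# Maximum DVB datagram section size, as quoted in the text (bytes).
MAX_SECTION = 4097

def mpe_mtu(llc_snap_framing: bool) -> int:
    """Return the IP MTU for DVB multiprotocol encapsulation.

    The per-section overhead (section header plus CRC-32/checksum) and
    the LLC/SNAP overhead are derived from the text's MTU figures;
    treat them as illustrative rather than normative.
    """
    section_overhead = MAX_SECTION - 4080   # header + CRC-32 or checksum
    llc_snap_overhead = 4080 - 4074         # extra framing bytes per packet
    mtu = MAX_SECTION - section_overhead
    if llc_snap_framing:
        mtu -= llc_snap_overhead
    return mtu
```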
DVB specifies enough SI to provide an electronic service guide (ESG), which can be used to render television-program-related information in user interfaces (e.g., a TV schedule guide). To IPDC, the relevance of SI is that it is an access-specific service discovery method. It is used to announce which IP streams (e.g., IP multicast groups) are available on the logical channels of the
Figure 10.13 Encapsulation of IP packets in DVB datagram sections.
available radio channels (multiplexes) [29]. As discussed earlier, this enables the higher-layer service discovery schemes to be bearer- and access-independent. Encapsulated IP datagrams and SI sections are multiplexed together, and possibly remultiplexed several times, into multiprogram transport streams that are delivered over any of many MPEG-TS-supporting infrastructure links to DVB-T transmitter stations. These stations, which are analogous to cellular base stations, perform the necessary modulation, upconversion, and radio transmission of the TS multiplex. IP datacast is able to work with any broadcast network topology, although a wide-area cellular topology allows some frequency reuse and thus more efficient radio usage. The DVB-T definition of a cell enables one or more transmitters (and frequencies) to be used in providing the total geographic coverage area of a cell.

10.4.5 IP Infrastructure for IPDC
Figure 10.14 shows the IP infrastructure of an IPDC system used for broadcasting. The figure shows two different hypothetical services, marked by dashed and dotted flows (lines). The topmost IPDC cell (A) provides only the dashed service, the middle one (B) both the dashed and dotted services, and the lower one (C) only the dotted service. The QoS-enabled backbone is assumed to be a multicast-enabled IP network, and the following discussion applies to the various options for routing and networking. The service and delivery management system on the left of the figure is used to control the encapsulator elements (marked "e"). Because of the unidirectional nature of the delivery, the encapsulator elements act as proxy clients and are responsible for joining the multicast routing tree so as to subsequently forward the relevant data packets onto the broadcast radio link. In principle, encapsulator elements join as a result of a management system decision. The service system is the source of multicast services that provide content using the IP multicast network. In the figure, the management system has instructed the encapsulators feeding cells A and B to send a join message for the dashed service, and the encapsulators feeding cells B and C to send a join message for the dotted service. The IP multicast network routes services automatically to the encapsulators; the exact mechanism depends on the routing protocols and
Figure 10.14 IP infrastructure for IPDC broadcast system.
methods selected by the network operator. As the backbone is multicast-enabled, each data packet need be delivered only once on each link. The encapsulators encapsulate the IP packets into the transport stream as described earlier. The multiprotocol-encapsulated IP packets are delivered to cells A, B, and C. Terminals receive the traffic in the access cell as they would receive it on a normal multicast network. As the system is unidirectional, the terminals are not required to send any multicast join or leave messages toward the network.

10.4.6 The IPDC Service System

An IPDC service system is generally purpose-built according to the service mix that the datacast operators in question wish to provide. Internal, walled-garden, or external (e.g., public Internet) content may be available, and the choice of servers, proxies, and digital rights technologies will reflect this. However, three general platforms are required for the provision of higher-level services, which mirror those described in the earlier section on a generic multicast system:

1. Streaming. The mass media broadband communications provided by IPDC are particularly suited to video and audio streaming, for instance, providing mobile television. The required IPDC transport protocols and codecs ensure that at least MPEG-4 video using real-time transport protocol (RTP) delivery is supported.

2. Filecast. Forward error correction (FEC) and repetition are the best options for improving reliability over the IPDC unidirectional-only channel. Large events, such as major sports events and the launch of new software and media releases, require the kind of wide-area multicast distribution IPDC is optimized for.
3. Media Discovery. IPDC systems announce their content and services in advance of and during the multicast sessions that deliver them. The syntax of the media descriptors is open, although SDP- and XML-based descriptors should be expected, and typically the same protocol would be used for unidirectional delivery as is used for filecasting. Work is in progress on filecast and media discovery standards, and the work on Internet media guides (IMGs) in the IETF promises to fulfill the foreseeable needs of IPDC systems.

Figure 10.10 shows one possible IPDC architecture including three server-side components: the service and delivery management system (SDMS), the service system (SS), and the content provisioning system (CPS). The role of a content provisioning system is to store and provide content. Although any content creator may operate a CPS, large media houses often aggregate content that is oriented toward the mass-media consumer market. An IP datacast service provider must operate one or more service systems that acquire, aggregate, and/or source the content. Acquiring content from content provisioning systems may be entirely manual, such as physically transporting data tapes, although automated techniques, including file transfers and live streaming, are more scalable. In any case, the value of the content and the contractual obligations of CPS and SS operators require that an appropriate level of security be provided to protect against unauthorized use, distribution, and disclosure.

A service and delivery management system is required to control service systems, support electronic transactions between CPS and SS, negotiate and/or dictate transmission schedules, and control the related IP datacast network elements. In the case of scheduled services, such as television programming, distributing the scheduling functionality between SS and SDMS may be beneficial.
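The FEC-plus-repetition option mentioned for filecasting can be illustrated with the simplest possible erasure code: a single XOR parity packet that recovers any one lost packet per block. Real IPDC filecasting would use far stronger codes, so this is a toy sketch only.

```python
def xor_parity(packets):
    """Build one XOR parity packet over equal-length source packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(received, parity):
    """Rebuild the single packet marked None from the rest plus parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("exactly one missing packet can be recovered")
    rebuilt = bytearray(parity)
    for packet in received:
        if packet is not None:
            for i, byte in enumerate(packet):
                rebuilt[i] ^= byte
    return bytes(rebuilt)
```

On a unidirectional channel no retransmission request is possible, which is why the receiver must be able to repair losses from redundancy alone.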
It is through the service and delivery management system that operators would usually administer the IP datacast system as a whole.

10.4.7 E-Commerce for IPDC
In order to ensure revenues in the IPDC broadcast system, there is a need to encrypt data securely and to provide mechanisms that allow end users to request access and that then distribute decryption keys to the mobile terminals. In practice the request for access, and possibly also the delivery of the keys, requires some sort of return channel from the mobile terminal to a network server. As the IPDC transport is unidirectional, the cellular access of the convergence terminal is used. In this scenario IPDC and 2G/3G networks complement each other: IPDC provides a broadband downlink transport medium, while 2G/3G offers a natural bidirectional control channel for IPDC. The integration level can be loose, tight, or anything in between. In the extreme loosely integrated case, a multimode terminal is the only common denominator between the systems, but some shared network functions can also be considered [33]. Figure 10.15 shows a possible implementation of an IPDC system with billing functionality. The service management system exchanges keys for content decryption
with an e-commerce system (e-CS). Mobile devices use short message service (SMS) messages for ordering access to the encrypted content via an SMS center (SMSC). Decryption keys are delivered to end users in an SMS message. The mobile terminal uses these keys to decrypt IPDC content. Optionally, the key delivered may be a key-encrypting key that allows access to content-encrypting keys. The benefits of this scheme would be to decouple the grouping and partitioning of content encryption from user subscriptions, and the ability to calculate the actual content-encrypting keys at a later time to increase security. An IP datacast system utilizes ciphering only in the IP layer or above. Figure 10.15 also shows the supporting cellular network components that enable interactive communications and packet-switched protocols: the radio network controller (RNC), serving GPRS support node (SGSN), and gateway GPRS support node (GGSN).

10.4.8 IPDC in Summary

IPDC is technically based on the multicast/broadcast delivery of IP packets over digital broadcast technologies. The concept can be implemented using DVB-S, ISDB-T, DVB-T, DVB-H, ATSC, or DAB radio technologies. The radio technologies DVB-T, DVB-H, ISDB-T, and DAB enable mobile content delivery. The DVB-H technology is further optimized for delivery of IP data to mobile environments and is therefore eminently suitable as a radio bearer for mobile mass media services. The key components in IPDC systems are the service and delivery management system, the content provisioning system, the service system, the multicast-enabled IP backbone, encapsulators (MPEs), radio transmitters, the e-commerce system,
Figure 10.15 IPDC system with billing functionality.
and terminals with an IPDC receiver and software. The service and delivery management system, the content provisioning system, and the service system are used to send content and to manage content delivery and delivery schedules. The IP backbone enables delivery of content from a service system to the access network components: encapsulators and radio transmitters. The essential part of the system in a business respect is the e-commerce system, which is used for purchasing access rights to content and is the root of the IPDC security mechanisms.
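As a concrete footnote to the encapsulator role summarized above: an encapsulator acting as a proxy multicast client would typically issue a standard IGMP join through the host networking stack. A minimal sketch using Python's socket API follows; the group address is hypothetical.

```python
import socket
import struct

def join_group(sock: socket.socket, group: str, iface: str = "0.0.0.0") -> None:
    """Ask the kernel to emit an IGMP membership report for `group` (IPv4).

    Sketch of the proxy-client role the text assigns to encapsulators;
    the membership request packs the group and interface addresses into
    the 8-byte ip_mreq structure expected by IP_ADD_MEMBERSHIP.
    """
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Typical use (not executed here): create a UDP socket bound to the
# service port, then call join_group(sock, "239.192.0.1").
```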
10.5 MULTICAST IN THIRD-GENERATION CELLULAR (MBMS)

10.5.1 The MBMS Concept
Traditionally, the main emphasis in cellular networks has been on bidirectional point-to-point communication. However, the benefits that come with the point-to-multipoint (p-t-m) model have been noted, and some steps in this direction have therefore been taken in the 3GPP (Third-Generation Partnership Project). In release 99 of the 3GPP standards, two p-t-m concepts have already been defined: cell broadcast service (CBS) and IP multicast support. CBS enables transmission of low-bit-rate data services to a predefined set of cells over a p-t-m bearer on the air interface [34]. Because of bit rate limitations, CBS is not suitable for delivering multimedia types of data, but it fits well for services whose purpose is to be available to all subscribers in a certain area and that do not require high bit rates. As the term states, IP multicast support enables subscribers to receive IP multicast traffic over 3G networks [35]. In contrast to CBS, IP multicast support does not limit the transmitted data types, but enables transmission of any kind of data that can be carried over IP. The major drawback in this concept is that in reality the data are always delivered over dedicated p-t-p bearers. Thus, from a resource usage point of view, the data delivered using IP multicast support do not differ from normal packet calls, and no real savings can be found; in other words, there is no multicast gain (a gain of one).

To overcome these weaknesses in the existing p-t-m services, 3GPP launched a standardization process for a new service concept. Multimedia broadcast/multicast service (MBMS) is a new p-t-m bearer service that enables efficient unidirectional p-t-m multimedia data delivery to mobile subscribers [36]. MBMS has two modes of operation: broadcast mode and multicast mode. The phases of service provisioning in these modes are illustrated in Figure 10.16. In the multicast mode, users need to subscribe to services.
To start service data reception, the user must send an explicit request to join the service toward the network. In contrast, in the broadcast mode the service data are always sent to the predefined network area without the MBMS system having any knowledge of the presence of potential receivers (i.e., subscription and joining are not required in broadcast mode). Thus, in the multicast mode the data can be selectively sent only to cells that contain listeners. Furthermore, charging data (i.e., usage reports) can be collected for the end users in multicast mode, unlike the case in broadcast mode.
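The mode differences just described, and illustrated in Figure 10.16, can be paraphrased as two phase lists. The phase names follow the figure; the data structure itself is this sketch's own choice.

```python
# Phases common to both MBMS modes, in provisioning order (per Fig. 10.16).
COMMON_PHASES = ["service announcement", "session start", "MBMS notification",
                 "data transfer", "session stop"]

MBMS_PHASES = {
    # Broadcast mode: data are always sent; no subscription or joining.
    "broadcast": COMMON_PHASES,
    # Multicast mode adds subscription before the announcement, joining
    # after it, and leaving at the end.
    "multicast": (["subscription", COMMON_PHASES[0], "joining"]
                  + COMMON_PHASES[1:] + ["leaving"]),
}
```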
Figure 10.16 Phases of MBMS service provisioning in multicast and broadcast modes. Broadcast mode comprises service announcement, session start, MBMS notification, data transfer, and session stop; multicast mode additionally includes subscription, joining, and leaving.
From the standardization perspective, the specification of MBMS is a deliverable of 3GPP release 6 (see discussion below) and is a work item for many working groups (WGs) of 3GPP's technical specification groups (TSGs). The overall structure of 3GPP's technical bodies and their areas of responsibility are described in Figure 10.17 [37]. Particularly important to MBMS are the system aspects (SA), radio access networks (RAN), and core networks (CN) groups. The MBMS standardization effort began in summer 2001 with the definition of service requirements in SA1 [36]. Progress in other WGs became possible when the requirement specification reached a stable state in autumn 2002. SA2 took responsibility for the architecture and functionality of MBMS. The security aspects of MBMS, which provide security solutions in the IP layer or above, are defined in SA3 (noting that prior 3GPP ciphering will not be reused in MBMS). SA4 has responsibility for the MBMS work on codecs (video and audio encoding). The radio access network working groups (RAN and GERAN) specify the extensions required for p-t-m data delivery. The definition of core network aspects is the remit of the CN WGs, based on a mature architecture specification prepared by SA2.

3GPP specifications evolve continuously and are enhanced with new features to meet market requirements. To enable simultaneous development of new features and implementation of the 3GPP system, the specifications are organized into releases that include certain groups of new features. A freeze date
Figure 10.17 Technical bodies in 3GPP and their areas of responsibility [37].
is defined for each release, after which no new features can be added (only corrections). The first 3GPP system release (release 99) was frozen in March 1999. A rough timeline for the 3GPP release schedule is given in Figure 10.18. The objective in MBMS standardization was to finalize the specification in the release 6 timeframe. The progress made for release 6 naturally affects the scope of release 7.

10.5.2 MBMS Services and Applications
As explained earlier, MBMS is primarily a unidirectional p-t-m bearer service for IP packets in the packet-switched domain of 3GPP systems. In essence, MBMS does not provide any content services itself, but different kinds of applications can use its bearer capabilities to create new services. Thus, MBMS can be seen as an enabler of other services. Since an MBMS bearer can be used to deliver different types of data (e.g., video, audio, text), it supports a vast range of services. The characteristics of services carried over MBMS bearers vary depending on the mode of operation that is used for service provisioning. The following paragraphs discuss these aspects.

In the broadcast mode, service data are delivered to all users in a certain network area without knowledge of the presence of users, since neither service subscription nor joining is required. Within the scope of the MBMS system, these services are
Figure 10.18 3GPP release schedule.
free of charge to the receivers, and the service data are not encrypted. Examples of these kinds of services would be mobile advertisement services and network welcome messages to users.

The multicast mode services can be further divided into two categories: services available in a hotspot area and services available in a larger network area. In both cases the users need to subscribe to and join the services, and charging information may be collected for the joined users. In the case of a larger network area service, the data are delivered only to those cells in the service area in which joined users reside. A RAN can further optimize the data delivery over the air interface in cells using mechanisms discussed later (see Section 10.5.4). A typical example of this kind of service is a news service, in which subscribers receive news updates (e.g., video clips and text) on their mobile phones during the day.

In the hotspot case, an operator predicts that there will be many receivers in a service area during the service provisioning. Therefore the data are delivered to the cells in the service area over a p-t-m configuration without explicit knowledge of the existence of receivers. The group of receivers may be a combination of both passive and active devices, and the exact configuration of the hotspot may be service- or operator-specific. One frequently used example is the so-called football stadium scenario. In this scenario a service is provided to the spectators in the stadium area, who receive replays of highlights and information on the progress of other matches taking place at the same time.

10.5.3 MBMS System Architecture

The starting point in the MBMS architecture definition was the efficiency of resource usage: multiple receivers should share common bearer resources whenever possible.
In addition, there has been emphasis on reusing existing network components and protocols in order to minimize the changes to the infrastructure [38]. The MBMS reference architecture is illustrated in Figure 10.19 [39] (note that, consistent with the other figures in this chapter, the mobile terminals are shown on the right-hand side, although the original 3GPP specifications normally show the terminal-to-network relationship from left to right). As illustrated by the figure, MBMS is implemented in the packet-switched domain of the 3GPP system. MBMS introduces one new element: the broadcast/multicast service center (BM-SC). This new element resides between the packet core network and the content providers. It acts as an MBMS data source and performs certain control tasks, for example, initiating and terminating MBMS transmissions. The MBMS architecture enables content provisioning from data sources either external or internal to the operator's network [known as the public land mobile network (PLMN) in 3GPP terminology]. Gateway GPRS support nodes (GGSNs) and serving GPRS support nodes (SGSNs) in the core network perform packet tunneling from the BM-SC toward the correct radio access network (RAN) nodes. In addition, GGSNs maintain session information on ongoing MBMS sessions and perform control procedures, such as mobility management. On the RAN side, both UTRAN and GERAN will support MBMS. It is the responsibility of the
Figure 10.19 MBMS reference architecture [39].
individual RAN technology (i.e., UTRAN and GERAN) to select the most efficient delivery mechanism to transmit MBMS data over the air interface to end users. A possible realization of an MBMS-enabled 3G network is given in Figure 10.20, which also illustrates the hierarchy of the different network elements in a 3GPP system. The different parts of this MBMS architecture are described in further detail in the following sections.

10.5.4 MBMS Radio Access Networks
3GPP has stated that it should be possible to provide MBMS services over both WCDMA-based UTRAN and GSM/EDGE-based GERAN. Their main responsibility is to deliver MBMS data efficiently from the core network to mobile receivers. In addition, UTRAN/GERAN shall support RAN-level mobility management for MBMS receivers [40]. From the core network viewpoint, MBMS data delivery is always point-to-multipoint, but on the RAN side the situation is more complicated. The reason for this is that the RAN needs to decide whether it is more efficient to deliver the data over p-t-p or p-t-m radio bearers (RBs). WCDMA is an interference-driven radio technology, and with a small number of receivers, delivery by several p-t-p bearers might be more efficient than by a single p-t-m bearer. Thus, MBMS service data always come to a RAN via a shared bearer from an SGSN, and, based on the number of listening users in its cells, the RAN selects between delivery by p-t-p or p-t-m radio bearers. It should be noted that the RB type
Figure 10.20 Hierarchy of network elements in the 3GPP system: the core network (CN) consists of SGSNs and GGSNs, and the radio access network (RAN) consists of radio network controllers (RNCs) and base stations.
(p-t-p, p-t-m) need not be the same for each cell under a certain radio network controller (RNC) node. For example, in Figure 10.20 the MBMS data could be delivered to cells A1 and B1 over a p-t-m bearer, while in the case of A2, B2, and B3 dedicated p-t-p bearers could be used.

Another aspect making the functionality in the RAN more complex is idle-mode reception. When using dedicated point-to-point packet bearer services, the UE is in RRC_CONNECTED (radio resource control) mode. This means that the UE has an active signaling connection to the network, and therefore the network knows the location of the UE with single-cell accuracy. In the case of MBMS data reception, UEs may receive data in RRC_CONNECTED mode, but to reduce UE battery power consumption, 3GPP has defined that data reception from MBMS bearers should also be possible in RRC_IDLE mode (allowing RF components to sleep some of the time while not receiving data). In RRC_IDLE mode the UE does not have a signaling connection to the network, and therefore the network does not know the location of the UE with cell accuracy; the UE is temporarily acting as a passive-only receiver.

New mechanisms have been introduced to support the features discussed above. With a counting procedure, the RAN can find out whether there are enough receivers
in each cell under its control to justify the usage of a p-t-m bearer instead of p-t-p delivery [41]. It should be noted that before a counting procedure is initiated, some of the joined UEs might be in RRC_IDLE mode, and therefore the exact number of receivers in each cell might not be known. The selection of the bearer type is based on a threshold value that the network operator defines. When setting up an MBMS session, the RNC first checks whether the number of RRC_CONNECTED mode receivers in a cell exceeds this threshold. If the threshold is exceeded, the usage of a p-t-m bearer is justified. Otherwise the RNC sends a notification message including a counting indication to the wanted cell(s), which requests the joined UEs to establish an RRC connection (i.e., make a state transition to RRC_CONNECTED mode). To prevent overload in the RAN, however, not all joined UEs make this transition; the exact number of UEs brought to RRC_CONNECTED mode is an implementation issue. By summing the number of joined UEs in RRC_CONNECTED mode in each cell, the RNC can decide what type of radio bearer to use.

During MBMS data transmissions, notification messages are sent periodically to inform the users about the radio configuration used for the service. The message carries information about the type of radio bearer used (i.e., p-t-p or p-t-m). In the case of a p-t-m bearer, it may be used to deliver other RB parameters as well. This mechanism is used to minimize data loss when a UE receiving MBMS data makes a cell reselection. For instance, thanks to the periodic notifications, a UE receiving MBMS data in RRC_IDLE mode does not need to establish a signaling connection upon cell change to learn the type of radio bearer used in the new cell for MBMS data transmission.

10.5.5 MBMS in the Core Network
Currently the 3GPP system's packet core network supports data delivery only over point-to-point connections. To enable packet data exchange between the UE and packet data networks (e.g., the Internet), GTP (GPRS tunneling protocol) tunnels are set up between the GGSN and the SGSN, and between the SGSN and the RAN. In the core network elements (SGSN, GGSN), packet connections are described by packet data protocol (PDP) contexts, which contain all the necessary parameters to deliver packet data between two endpoints with a defined QoS. Introducing MBMS into the core network requires enhancements, as, from the perspective of the core network, MBMS data delivery is always seen as point-to-multipoint communication. 3GPP decided that MBMS data delivery is also performed through GTP tunnels in the core network (as in the case of p-t-p packet communication), so that the solutions are built on well-known concepts. However, the concept of a PDP context is not sufficient to describe p-t-m connections, and therefore new context types are introduced:

- The MBMS bearer context contains information describing an MBMS bearer, such as the addresses of downstream nodes, QoS, bearer identification (IP multicast address and access point name), and the area in which this service is
available. In addition to the core network elements, this type of context is also maintained in the RAN and BM-SC.

- The MBMS UE context contains UE-specific information related to a certain MBMS bearer that the UE has joined. This context can be used, for example, for charging purposes. The MBMS UE context is maintained in the GGSN, SGSN, and UE.

In addition to tunnel management, the core network performs other MBMS functions as well. SGSNs take care of mobility management, provide per-user network control functions, and relay MBMS data to RAN nodes. A GGSN acts as an entry point for MBMS bearers toward the BM-SC. GGSNs request MBMS bearer establishment and release from SGSNs upon notification from a BM-SC. The GGSN relays IP multicast traffic toward SGSNs as MBMS data and is also responsible for processing the joining requests that users send to request multicast mode service activation. Joining is performed using standard IETF mechanisms over p-t-p packet connections: the UE sends an IGMP (IPv4) or MLD (IPv6) join message, in which the IP multicast address(es) identifies the service requested. Furthermore, SGSNs and GGSNs collect charging data for MBMS service listeners.

10.5.6 MBMS Service Center and Data Sources

The broadcast/multicast service center (BM-SC) is a new element introduced in the MBMS architecture. A BM-SC acts as a data source for MBMS services, but also has some control responsibilities. It acts as a gateway to MBMS services for content providers, which may reside either within or outside the operator domain (as depicted in Fig. 10.18). As 3GPP is primarily interested in MBMS as a new bearer service, the interfaces and functionality between the BM-SC and content providers are beyond the scope of 3GPP standardization. However, some issues in this area are briefly discussed in the next section. Toward the core network elements, the BM-SC has several functions.
It performs high-level session scheduling, based on which it is able to initiate and terminate MBMS transport resources as necessary. In order to initiate an MBMS service session, it sends a session start notification toward the GGSN. At the same time the session parameters (e.g., QoS, service area) are also delivered, so that the core network elements can perform QoS authorization and policing. Once the resources for a session are reserved (i.e., the GTP tunnels and radio bearers for data delivery are set up), a BM-SC can begin the MBMS data delivery toward the GGSN(s). The data are transmitted as IP multicast packets. Similarly, when the stop time of the session is reached, the BM-SC sends a session stop notification to request release of the resources reserved for the session.

Furthermore, a BM-SC provides service announcements to advertise the available MBMS services to the users. With this function the UEs are able to find out the communication parameters (service identification, media descriptions, etc.) of the services. Any required security protection of MBMS service content also takes place in the BM-SC; the security procedures are performed in the IP layer or above.
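The session lifecycle just described (start notification with parameters, user-plane delivery as IP multicast, stop notification) can be sketched as a toy control flow. This is illustrative only, not 3GPP signaling: the class, method, and parameter names (BMSC, SessionParams, on_session_start, and so on) are our own inventions.

```python
from dataclasses import dataclass

@dataclass
class SessionParams:
    """Illustrative session parameters delivered with the start notification."""
    service_id: str    # identifies the MBMS bearer service
    qos_class: str     # QoS the core network must authorize and police
    service_area: list # areas/cells in which the session is available

class LoggingGGSN:
    """Toy GGSN that records the control- and user-plane events it receives."""
    def __init__(self):
        self.events = []
    def on_session_start(self, params):
        self.events.append(("start", params.service_id))
    def on_data(self, service_id, packet):
        self.events.append(("data", service_id))
    def on_session_stop(self, service_id):
        self.events.append(("stop", service_id))

class BMSC:
    """Toy broadcast/multicast service center controlling session setup/teardown."""
    def __init__(self, ggsn):
        self.ggsn = ggsn
    def start_session(self, params):
        # Session start notification: triggers tunnel / radio bearer setup
        self.ggsn.on_session_start(params)
    def deliver(self, params, packet):
        # User-plane data flows toward the GGSN(s) as IP multicast packets
        self.ggsn.on_data(params.service_id, packet)
    def stop_session(self, params):
        # Session stop notification: requests release of reserved resources
        self.ggsn.on_session_stop(params.service_id)

ggsn = LoggingGGSN()
bmsc = BMSC(ggsn)
params = SessionParams("mobile-tv-1", "streaming", ["cell-17", "cell-18"])
bmsc.start_session(params)
bmsc.deliver(params, b"video-frame")
bmsc.stop_session(params)
assert [e[0] for e in ggsn.events] == ["start", "data", "stop"]
```

The point of the sketch is the ordering constraint: data delivery is bracketed by the start and stop notifications, which is what lets the core network reserve and release resources for the session.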
MULTICAST CONTENT DELIVERY FOR MOBILES
In the multicast mode, it is desirable that only subscribed users are able to join MBMS services [36]. Thus, the functionality for per-user service authorization is located in the BM-SC. When a GGSN receives a join request for an MBMS service, it consults the corresponding BM-SC to check whether the user is authorized to receive the service; the BM-SC must therefore maintain information on service subscriptions to perform this function. One additional essential responsibility of a BM-SC is to generate charging information for the data transmitted by content providers, so that content providers may be billed should the service scenario require it.

It is envisioned that MBMS bearers could also be utilized for delivering data from sources other than a BM-SC. For example, this feature would enable efficient transmission of IP multicast sessions available in the Internet to mobile receivers via MBMS bearers. In this type of scenario the BM-SC would still be required to perform the control signaling toward the GGSN(s) to set up and tear down the bearer, but the actual user plane data would flow directly from the data source to the GGSN(s), bypassing the BM-SC. Thus, these kinds of sessions need to be configured in the BM-SC beforehand to enable service provisioning over MBMS bearers. The possibility of using data sources other than the BM-SC is illustrated in Figure 10.18 as data source boxes.

10.5.7 Commercial Interfaces
As mentioned earlier, the interfaces between the BM-SC and content providers will not be standardized in 3GPP. However, 3GPP has defined some functions that should be available in the BM-SC toward content providers. For security reasons, the BM-SC will include functionality to perform third-party content provider authentication and authorization to prevent illegal access to MBMS bearers. The BM-SC will also verify the integrity of the data received from content providers to prevent illegal data insertion. The most important function in this interface is to enable data retrieval for MBMS services from external sources. The data retrieval functionality shall support rich media types, which can be further transmitted via MBMS bearers to receivers.

10.5.8 MBMS in Summary
MBMS brings 3GPP systems one step further in supporting a broader range of service types. This efficient point-to-multipoint bearer service for multimedia data enables the introduction of new kinds of services that would not previously have been feasible to implement with the capabilities of prior 3GPP systems. The application of multicast/broadcast services in this kind of environment is relatively new, and we can expect further developments in this area of 3GPP as experience from real-world implementations is gained. For example, future applications could drive the implementation of support for multipoint-to-multipoint communication in 3GPP systems, but this remains to be seen. Additionally, MBMS in 3GPP release 6 has remained substantially independent
of the largest new component of the previous release: the IP multimedia subsystem (IMS). Since IMS brings many IP-centric solutions to messaging and charging issues, it is likely that MBMS and IMS will each play a greater role in the other's use in future developments. The market reception of MBMS will be important in this area: on one hand, MBMS enables the development of new types of services, but on the other hand, it puts pressure on service developers to come up with new service concepts that will awaken the interest of consumers.
10.6 MULTICAST CONTENT DELIVERY FOR MOBILES IN SUMMARY AND IN THE FUTURE

IPDC and MBMS will be commercially implemented and available in the near future, and promise to offer globally standardized and ubiquitous wireless multicast to mobiles. Both are primarily one-to-many content delivery systems providing data bearers and requiring significant service system and application support. Enhancing current services and business models with higher-bandwidth popular content clearly adds value to existing point-to-point wireless communication systems. Several anticipated third-generation services, such as mobile TV, can be made available with lower resource usage and cost, and thus are much more feasible and likely to experience widespread success. Just as exciting are the new opportunities that mobile multicast presents that were unfeasible before. Surviving massive demand due to spontaneous human interest is a start, but we can expect to be surprised as the newly available technology stimulates ideas that have not been anticipated. In the same way, we can expect that real-world deployment and commercial development will encourage further technical advances.

IPDC and MBMS both offer a single "one size has to fit all" bearer service for all applications. The practicalities of implementation lead us to expect incremental development from bearer-specific applications toward bearer-independent applications, as IP multicast and IP convergence already facilitate. Hence the emergence of the need for, and solution of, bearer selection between the multicast and unicast options. Also, the one-to-many network optimizations of IPDC and MBMS leave the opportunity space somewhat open for smaller, more sparsely distributed multipoint-to-multipoint groups and the services for them.
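The bearer-selection problem mentioned above reduces, in its simplest form, to the rule an RNC applies in MBMS (Section 10.5.4): compare the number of counted receivers in a cell against an operator-defined threshold. The sketch below is a minimal illustration of that rule; the function name and threshold value are ours, not standardized.

```python
def select_bearer(connected_receivers: int, threshold: int) -> str:
    """Choose a shared p-t-m (multicast) bearer when enough receivers share
    a cell; otherwise fall back to per-user p-t-p (unicast) bearers.
    The threshold is defined by the network operator."""
    return "p-t-m" if connected_receivers > threshold else "p-t-p"

# With few receivers, per-user unicast bearers are cheaper; beyond the
# threshold a single shared multicast bearer uses less radio resource.
assert select_bearer(connected_receivers=2, threshold=5) == "p-t-p"
assert select_bearer(connected_receivers=9, threshold=5) == "p-t-m"
```

In a real system the input count is itself uncertain (idle-mode receivers are invisible until a counting procedure runs), which is why the counting and threshold mechanisms exist at all.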
The availability of several technologies, and their integration into terminals, indicates a need for hybrid solutions that can further enhance the service value to users and operators. Integrated networks spanning several bearers and technologies remain a research topic for now, although the fast-moving mobile terminal market may be able to deliver integrated terminals that provide the full range of multicast and unicast services, and allow system optimizations to occur incrementally in future deployment and standardization. As such, IPDC and MBMS form the basis for structured mobile multicast content delivery systems of
the future. Several other technologies also hold promise for IP multicast services. In particular, WLAN technologies offer both structured and ad hoc network use cases for multicast services. For now, momentum for widespread commercial deployment of multicast on these technologies has not formed, but this is one space to watch for both complementary and competing multicast propositions. Meanwhile, both IPDC and MBMS progress in standardization and implementation, and several new systems and standards will appear to meet their requirements. The home of IP multicast, the IETF, is equally hard at work to make multicast a success through work on multicast-enabling protocols.

If these mobile multicast content delivery systems achieve commercial success in several application domains, they are capable of changing the usage patterns of the mobile Internet in its entirety. They would blur the lines between the Internet, telecommunications, and broadcast, and shatter traditional assumptions about segregating service categories between on-the-move, in-home, and in-office usage. Mobile IP multicast content distribution promises to become and remain a very important and exciting area into the foreseeable future.
REFERENCES

1. Multicast Security IETF Working Group, Working Group Charter, http://www.ietf.org/html.charters/msec-charter.html.
2. Multicast & Anycast Group Membership IETF Working Group, Working Group Charter, http://www.ietf.org/html.charters/magma-charter.html.
3. C. K. Miller, Multicast Networking and Applications, Addison-Wesley, 1998, Chapter 7.
4. P. Koskelainen, H. Schulzrinne, and X. Wu, A SIP-based conference control framework, paper presented at NOSSDAV'02, Miami Beach, FL, May 2002.
5. R. Droms, Dynamic Host Configuration Protocol, RFC 2131, Draft Standard, March 1997.
6. B. Quinn and K. Almeroth, IP Multicast Applications: Challenges and Solutions, RFC 3170, Informational, Sept. 2001.
7. Technical Document, IPDC Forum, Proposal for Architectural Framework, version 2.0, 2002, http://www.ipdc-forum.org.
8. H. Schulzrinne et al., RTP: A Transport Protocol for Real-Time Applications, RFC 1889, Proposed Standard, Jan. 1996.
9. Reliable Multicast Transport IETF Working Group, Working Group Charter, http://www.ietf.org/html.charters/rmt-charter.html.
10. M. Luby et al., Asynchronous Layered Coding (ALC) Protocol Instantiation, RFC 3450, Experimental RFC, Dec. 2002.
11. T. Paila et al., FLUTE: File Delivery in Unidirectional Environments, Work in Progress, Nov. 2003, http://www.ietf.org/internet-drafts/draft-ietf-flute-04.txt.
12. M. Handley et al., Session Announcement Protocol, RFC 2974, Experimental RFC, http://www.ietf.org/rfc/rfc2974.txt.
13. M. Handley and V. Jacobson, Session Description Protocol, RFC 2327, Proposed Standard, http://www.ietf.org/rfc/rfc2327.txt.
14. J. Muñoz, ed., Internet media guides, Proc. 56th Internet Engineering Task Force, San Francisco, March 2003, http://www.ietf.org/proceedings/03mar/index.html.
15. B. Williamson, Developing IP Multicast Networks, Vol. 1, Cisco Press, 2000.
16. H. Kaaranen et al., UMTS Networks: Architecture, Mobility and Services, Wiley, 2001.
17. R. Walsh, L. Xu, and T. Paila, Hybrid networks — a step beyond 3G, Wireless Personal Multimedia Communications Conf., WPMC'00, Bangkok, Thailand, Nov. 2000.
18. Mobile Internet Technical Architecture, Nokia, Addison-Wesley, 2002.
19. TR 101 202, Digital Video Broadcasting (DVB); Implementation Guidelines for Data Broadcasting, ETSI, version 1.2.1, Jan. 2003.
20. DVB Project Office, Ad hoc Group DVB-UMTS, TM2466, The Convergence of Broadcast & Telecommunications Platforms, revision 4, Feb. 6, 2002.
21. Press release, New Forum to Promote IP Datacasting Activities, http://press.nokia.com/PR/200109/834086_5.html (referenced May 30, 2003).
22. J. Aaltonen, Proc. Conf. Transition from Analogue TV to Digital Services in Europe, panel 2, June 5, 2002, http://www.eicta.org.
23. L. Tvede, P. Pircher, and J. Bodnekamp, Data Broadcasting: The Technology and Business, Wiley, Aug. 1999.
24. S. Gallup, Suomen Gallup Web Oy, http://www.gallupweb.com/, 2002.
25. Y. Wu, E. Plsizka, B. Caron, P. Bouchard, and G. Chouinard, Comparison of terrestrial DTV transmission systems: The ATSC 8-VSB, the DVB-T COFDM, and the ISDB-T BST-OFDM, IEEE Trans. Broadcast., 46(2):101-113, June 2000.
26. DVB Project, DVB Home Page, http://www.dvb.org.
27. EN 300 401, Radio Broadcasting Systems; Digital Audio Broadcasting (DAB) to Mobile, Portable and Fixed Receivers, ETSI, version 1.3.3, May 2001.
28. ES 201 735, Digital Audio Broadcasting (DAB); Internet Protocol (IP) Datagram Tunnelling, ETSI, version 1.1.1, Sept. 2000.
29. ISO/IEC 13818-1:2000, Information Technology — Generic Coding of Moving Pictures and Associated Audio Information: Systems, ISO, Dec. 2000.
30. EN 301 192, Digital Video Broadcasting (DVB); DVB Specification for Data Broadcasting, ETSI, version 1.3.1, May 2003.
31. J.-P. Luoma, Internet Access over DVB Networks, master of science thesis, Tampere Univ. Technology, Dept. of Information Technology, 2001.
32. J. Ljungquist, Transport Protocols for IP Traffic over DVB-T, master's thesis, Dept. Teleinformatics, Computer Communications Laboratory, Royal Institute of Technology, Stockholm, 1999.
33. K. Ahmavaara, P. Jolma, and Y. Raivio, Broadcast and Multicast Services in Mobile Networks, WTC, 2002.
34. 3GPP, TS 23.041, Technical Realization of Cell Broadcast Service (CBS) (Release 5), version 5.1.0, March 2002.
35. 3GPP, TS 29.061, Interworking between Public Land Mobile Network (PLMN) Supporting Packet Based Services and Packet Data Networks (PDN) (Release 5), version 5.5.0, March 2003.
36. 3GPP, TS 22.146, Multimedia Broadcast/Multicast Service; Stage 1, version 6.1.0, Sept. 2002.
37. Third Generation Partnership Project, 3GPP Home Page, http://www.3gpp.org.
38. 3GPP, TR 23.846, Multimedia Broadcast/Multicast Service; Architecture and Functional Description (Release 6), version 6.1.0, Dec. 2002.
39. 3GPP, TS 23.246, Multimedia Broadcast/Multicast Service; Architecture and Functional Description (Release 6), version 0.5.0, April 2003.
40. 3GPP, TR 25.992, Multimedia Broadcast/Multicast Service (MBMS); UTRAN/GERAN Requirements (Release 6), version 1.3.0, Jan. 2003.
41. 3GPP, TS 25.346, Introduction of the Multimedia Broadcast/Multicast Service (MBMS) in the Radio Access Network (Stage 2) (Release 6), version 1.5.0, March 2003.
42. J. Aaltonen, J. Karvo, and S. Aalto, Multicasting vs. Unicasting in Mobile Communication Systems, WoWMoM 2002, Atlanta, Georgia, Sept. 28, 2002.
43. J. Aaltonen, Content Distribution Using Wireless Broadcast and Multicast Communication Networks, doctoral thesis, Tampere University of Technology, Publications 430, 2003.
CHAPTER 11
SECURITY AND DIGITAL RIGHTS MANAGEMENT FOR MOBILE CONTENT

DEEPA KUNDUR
Department of Electrical Engineering, Texas A&M University, College Station, Texas

HEATHER YU
Panasonic Information & Networking Technologies Laboratory, Princeton, New Jersey

CHING-YUNG LIN
IBM T. J. Watson Research Center, Hawthorne, New York
The large-scale acceptance of digital media distribution rests on its ability to provide legitimate services to all parties. This requires allowing the convenient use of digital media while equitably compensating all members of the information distribution chain, such as content creators, providers, and consumers. This chapter discusses the important issue of information protection and digital rights management in the context of mobile content delivery. We provide an introduction to the problem of content security and digital rights management (DRM); demonstrate how DRM must be designed to reflect the content distribution and business models of a given enterprise; discuss state-of-the-art mobile digital rights systems, such as Nokia's Music Player and the NEC VS-7810, and their component technologies; and highlight some emerging technologies.
Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.

11.1 INTRODUCTION TO INFORMATION SECURITY AND DRM TECHNOLOGIES

Modern advancements in the wireless communications infrastructure, signal processing, and digital storage technologies are enabling pervasive mobile digital media distribution; digital distribution allows the introduction of flexible, cost-effective
business models that are advantageous for multimedia commerce transactions. The digital nature of the information also makes it easier for individuals to manipulate, duplicate, or access media information beyond the terms and conditions agreed on in a given transaction, which has made content protection and rights management an influential issue in mobile content delivery.

We begin this chapter by providing an introduction to the field of information security in order to motivate the basic techniques necessary for DRM. We then discuss key architectures, models, and requirements for DRM in the context of mobile devices. Section 11.3 addresses the state of the art and emerging challenges and solutions.

11.1.1 Information Security
Information security technologies can be classified into three main groups: computer security, network security, and content security methodologies. Each category entails the protection of information within a predefined scope.

The oldest form of security, computer security, involves the protection of information within a computer, such as a single workstation. Traditionally the scope of computer security has been the standalone computing environment. Popular mechanisms include user password authentication and intrusion detection for access control, and antivirus programs to prevent information corruption.

In contrast, network security is the protection of information during transit. In this type of protection, we are concerned with the security of the communication channel. Examples of basic technologies employed include encryption to prevent eavesdropping and digital signatures to ensure authentication of the received information. These technologies are used in conjunction with security-conscious protocols that attempt to prevent undesirable access or processing of the signal.

Content-based or media security is the newest form of information protection, having emerged since the early 1990s. In this form of protection the intellectual property itself is protected against attacks such as unlawful tampering, illegal duplication, and unauthorized access. Since protection is applied or "attached" at the content level, the tools often merge sophisticated digital signal processing with traditional security-related transformations such as encryption, and attempt to provide more semantic meaning to the data through the use of identification tags such as metadata.

There are three main aspects to information security [1]:

- Security attack—an unwanted act performed by a party in order to jeopardize the protection of given information. Examples of security attacks include eavesdropping, forgery, masquerading, tampering, and denial of service (DoS). A security attack can be conducted by an individual or group of individuals, known as the attacker(s), who may or may not be involved with the information creation, processing, communication, or storage. Security attacks are often grouped into two main categories, "passive" and "active." Passive attacks are often more readily prevented and more arduous to detect; they primarily encompass forms of eavesdropping. On the other hand, active attacks are normally easier to detect and more challenging to prevent; these include forgery, masquerading, tampering, and DoS.
- Security mechanism—a means to detect, prevent, and/or react to a security attack. The process usually takes the form of an algorithm and an associated communication protocol. Examples of security mechanisms include digital signatures, public key cryptography algorithms, and the secure socket layer (SSL) protocol. The algorithms and protocols complement one another in order to jointly provide protection against a broad class of attacks. A security mechanism is effectively designed by making use of models of the behavior of the attacker(s).

- Security service—an operation whose main objective is to counter security attacks through the effective use of one or more security mechanisms. Examples of the most common security services, which we discuss in this section, are confidentiality, authentication, integrity, nonrepudiation, access control, availability, and antipiracy protection.
Let us consider a simple information communication system involving a single sender and receiver. Computer security aims to protect information at the end nodes, and network security safeguards the communication channel. The associated mechanisms are applied through processing of the information at the bit level. In contrast, media security shields information from attacks at a semantic level and thus is often applied at a "higher stage" such as the application or presentation layers of a network.

We divide security services into the following seven groups:

1. Confidentiality, which in a broad sense protects information against passive attacks such as eavesdropping, the monitoring of information transmitted in a given medium. The mechanisms used to achieve confidentiality involve encryption. For an eavesdropping attack, the service is necessary both at the workstation and during information transmission. Eavesdropping cannot be easily detected at the time of attack. However, it can be prevented by scrambling the information such that access is impossible for any party other than the sender and the receiver. For DRM, the inaccessibility helps keep the content from being illegitimately copied, viewed, or tampered with while in storage or transit.

2. Integrity, which ensures that the information received has not undergone unwanted tampering that affects its credibility. Integrity is achieved through the use of mechanisms such as hash functions and/or digital signatures. Without it, an unlawful third party can redirect information from the sender, modify it, and transmit the misrepresented information to the receiver.

3. Authentication, which assures that the information received is in fact from the legitimate source. This service defends against masquerading. In particular, the mechanisms must ensure (1) that the initiating parties during a connection are in fact credible and (2) that the connection is not interfered with by an attacker. In DRM, authentication allows both parties to establish trust that they are indeed communicating with legitimate sources.
4. Antipiracy protection, which counters piracy: the illegal duplication and potential retransmission of information. Because digital information can be copied exactly, piracy is of serious concern for the commercial digital media industries, whose revenues are tied to the number of legitimate individuals buying digital content. The mechanisms used to prevent piracy aim to provide greater control over information and are part of an overall DRM system that includes encryption, digital watermarking, metadata and rights expression languages, and protocols such as secure electronic transaction (SET).

5. Access control, which involves the legitimate admittance of information or a party to a host computer and its resources. Such protection can be used to prevent unlawful users from accessing sensitive information, through passwords or biometrics, as well as to prevent the spread of viruses to the host through the use of antivirus programs. In DRM systems, access control can, for instance, enforce the negotiated rights in a transaction by preventing use of content until payment has been made. This is often implemented by keeping the content encrypted even after download and giving a user the decryption key only after (s)he can prove payment.

6. Nonrepudiation, which guarantees that a sender cannot dispute having transmitted a given message to the receiver. This can be established through the use of digital signatures, biometrics, and other data about the sender that would be collectively difficult to repudiate at a later time. Nonrepudiation is useful in proving that both parties have agreed to terms mutually, without fear that one party will later recant.

7. Availability, a service that attempts to recover from loss or reduction of information access. An offense against availability is the DoS attack, in which a service provider is flooded with artificial demands for information such that authentic requests cannot be sufficiently addressed. This attack is difficult to prevent ahead of time. Services such as IBM's denial-of-service alert and response exist to help detect network intrusion and attempt recovery from such an attack [2].
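Two of these services can be made concrete with standard primitives. The sketch below uses only Python's standard hashlib and hmac modules; the messages and key are illustrative. A plain hash detects modification of the data (integrity), while a keyed HMAC additionally binds the message to a shared secret, so a party without the key cannot forge a valid tag (a simple form of message authentication). Real systems embed these primitives in protocols such as SSL/TLS rather than using them bare.

```python
import hashlib
import hmac

secret = b"shared-key"            # known only to sender and receiver
message = b"pay 10 EUR to Alice"

# Integrity: any change to the message changes its digest.
digest = hashlib.sha256(message).hexdigest()
tampered = b"pay 99 EUR to Alice"
assert hashlib.sha256(tampered).hexdigest() != digest

# Authentication: without the key, a valid tag cannot be forged.
tag = hmac.new(secret, message, hashlib.sha256).digest()
forged = hmac.new(b"wrong-key", message, hashlib.sha256).digest()
assert hmac.compare_digest(tag, hmac.new(secret, message, hashlib.sha256).digest())
assert not hmac.compare_digest(tag, forged)
```

Note that `hmac.compare_digest` is used instead of `==` to avoid leaking information through comparison timing, a small example of a mechanism being designed against a model of attacker behavior.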
11.1.2 Content-Based Media Security
In the context of information creation, processing, transmission, and storage, content alludes to a higher-level representation or semantics of the data. Naturally, this implies that content may be comprised of multiple forms of media, such as audio, imagery, video, text, and graphics, in an assorted variety of digital formats. As a result, the characteristics of the information can vary greatly. For example, the required bit rate, maximum acceptable error, decompression complexity, and display requirements may deviate significantly, which creates several challenges in terms of content protection.

Traditional forms of computer and network security do not sufficiently address the needs of content security because they often process information at the bit level, which does not allow appropriate consideration of the semantics of the
information. This consideration is necessary if we are to design security mechanisms that can handle the processing of high-bandwidth information such as video that may undergo loss or format conversion, and allow the possibility of retaining some control over the intellectual property once it is transmitted. For example, encryption and authentication functions must accommodate varying data formats and lossy compression. Furthermore, computer and network security alone cannot appropriately manage the issue of piracy. Content-based or media security responds to these issues through the following attributes:

- Protection is tied to a higher level of the content, as opposed to the actual bits, to provide efficient and effective security that can be designed to be more robust to format conversion or recovery from communication error. For example, digital signatures can be based on high-level features of the information rather than the bit representations, to allow authentication even in the face of lossy compression.

- Security processing, for encryption or digital watermarking, is integrated with other signal processing tasks such as compression and decompression in order to handle fluctuations in bit rate. Combining encryption or digital watermarking with signal processing not only allows the reuse of processing blocks for greater efficiency but also provides a structured method to attach security processing to varying content forms and bandwidths.

- Semantic information such as metadata is associated with the content security mechanisms to ensure that the content is used as negotiated in an information commerce scenario.

A significant application of, and motivation for, the development and use of content-based and media security (in addition to traditional computer and network security) is DRM. For the remainder of the chapter we focus on DRM and its role in mobile content delivery.
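The first attribute above, signing high-level features rather than raw bits, can be sketched in a few lines: hashing a coarsely quantized version of a signal lets the digest survive small distortions (e.g., mild lossy recompression) that would invalidate a bit-exact hash. Here coarse quantization is only an illustrative stand-in for real feature extraction; the function name and step size are our own.

```python
import hashlib

def feature_hash(samples, step=16):
    """Hash coarse features (quantized sample values) instead of exact bits."""
    features = bytes(min(255, (s // step) * step) for s in samples)
    return hashlib.sha256(features).hexdigest()

original = [120, 70, 200, 35]
slightly_lossy = [121, 69, 201, 36]   # small distortions, e.g. recompression

# A bit-exact hash breaks under lossy processing...
assert hashlib.sha256(bytes(original)).hexdigest() != \
       hashlib.sha256(bytes(slightly_lossy)).hexdigest()

# ...but the feature-level hash still matches, so a signature over it
# would authenticate the content after mild distortion.
assert feature_hash(original) == feature_hash(slightly_lossy)
```

The tradeoff is inherent: the coarser the features, the more robust the hash is to benign processing, but also the easier it becomes for an attacker to alter content without changing the digest. Practical robust-hash designs balance these two failure modes.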
11.1.3 Digital Rights Management

11.1.3.1 Overview. Digital rights management (DRM) is the digital management of user rights to content. It entails linking specific user privileges to media in order to control viewing, duplication, access, and distribution, among other operations. Ideally, the goal of a DRM system is to balance information protection, usability, and cost to provide a beneficial environment for all parties in an information commerce transaction; this includes expanded functionality, cost-effectiveness, and new marketing opportunities. Overall, management is achieved through the effective interaction of business models, legal policy, and technology. It is the conflicting characteristics and rapidly changing environments in each of these spaces that make DRM a challenging and interesting problem.
A DRM system comprises the following basic entities: the content, the users, and the rights. We define them as follows:

- Content refers to the actual intellectual property, commonly referred to as the "work."
- Users are any parties involved in the overall content distribution chain. This can include content creators, publishers, aggregators, distributors, and consumers.
- Rights express the permissions, constraints, and obligations between the users and the content.

The relationship between these fundamental entities is conveyed in the information architecture of the DRM system [3] shown in Figure 11.1. The information architecture deals with modeling and describing these entities, as well as conveying their relationships to one another. As Figure 11.1 shows, users may create or make use of content and/or own the rights over specified content. This general model is valid over any type of business or information commerce paradigm that incorporates DRM. From this perspective, it is clear that the purpose of DRM in any technological system is to facilitate the interaction between these abstract entities.

However, to implement these DRM relationships in real systems, it is useful to consult the functional architecture [3,4] for an overview of the components and necessary modules in the overall DRM system. Figure 11.2 provides a possible functional architecture (modified from Refs. 3 and 4). Here, the overall DRM system can be seen in terms of a sequence of possible components that exist in the lifecycle of a piece of content. This sequence, called the content value chain, traces the
Figure 11.1 Information architecture of a general DRM system (Core Entities model). Users (e.g., content owner, copyright holder, aggregator, consumer) create, distribute, and use Content (e.g., text, graphics, video, audio), and own Rights (e.g., copy once, play two times, distribute freely, never tamper) over that content.
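The Core Entities model of Figure 11.1 can be rendered as a minimal data model. The class and field names below are illustrative, not taken from any DRM standard; the point is only the create/use/own relations between the three entities.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Content:
    """The work itself: text, graphics, video, audio, ..."""
    title: str

@dataclass
class Rights:
    """Permissions/constraints over content, e.g. 'play' or 'copy once'."""
    content: Content
    permissions: List[str]

@dataclass
class User:
    """Any party in the distribution chain: creator, aggregator, consumer."""
    name: str
    owned_rights: List[Rights] = field(default_factory=list)

    def may(self, action: str, content: Content) -> bool:
        # A user may perform an action only if some owned rights grant it
        # over that specific piece of content.
        return any(action in r.permissions and r.content is content
                   for r in self.owned_rights)

song = Content("Happy Birthday")
consumer = User("Alice", [Rights(song, ["play", "copy once"])])
assert consumer.may("play", song)
assert not consumer.may("distribute freely", song)
```

Rights expression languages used in real DRM systems express the same triple of entities, with far richer constraint vocabularies (counts, time windows, device bindings) than the flat permission strings sketched here.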
Figure 11.2 Functional architecture of a general DRM system. The content value chain runs: Creator (content creation/capture, content rights establishment, content rights validation); Publisher and Aggregator (content packaging, content repository); Distributor (content trading, content distribution); Retailer (content trading, content distribution, content payments); Consumer (content tracking, permission management).
content from its creation to its consumption. Not all DRM scenarios will have all the components shown in Figure 11.2, and one or more parties can represent each or many of the different blocks because of the different distribution and business models in use. However, from an engineering perspective, in which we view “the chain of hardware and software services and technologies governing the authorized use of digital content and managing any consequences of that use throughout the entire life cycle of the content” [4], we can identify common elements that are employed in a broad class of scenarios. For example, at the technological level, DRM systems incorporate security mechanisms including encryption, digital signatures, digital watermarking, metadata, and network security communication protocols at various stages in the chain. We define these processes as follows:

1. Encryption. This is the process of scrambling digital information with an encryption key such that access to the original information is not possible without applying the inverse process of decryption. By limiting access to the decryption key, access to the information is controlled. There are two main types of encryption: symmetric, which we discuss here; and asymmetric, which we discuss in relation to digital signatures. In symmetric encryption,
378
SECURITY AND DIGITAL
the same key is used to encrypt and decrypt. The underlying mathematics involves bit operations that are computationally simpler to implement than asymmetric encryption. Many DRM systems in practice use a symmetric algorithm known as the Advanced Encryption Standard (AES) [5] to encrypt content. However, if real-time encryption of high-bandwidth information such as video is required, in addition to robustness to channel loss, strategies based on selective or progressive encryption can be employed; these are especially suited to mobile applications. Here, only specific features of the content are encrypted or prioritized in order to distort the perceptual quality without requiring heavy computation. In Section 11.3 we discuss methods of encryption suitable for mobile applications in more detail.

2. Digital Signatures. These provide nonrepudiation of a transaction so that all parties are committed to the exchange and its associated terms on finalization. Digital signatures make use of a “footprint” (of the data to be signed), commonly generated by a hash function that takes the variable-length data (in this case the electronic version of the terms of the exchange and/or the content) and produces a much shorter fixed-length dataset called the hash. This result is then encrypted using asymmetric encryption. In asymmetric encryption, the key KE used to scramble the data is different from the key KD used for decryption. One of the keys of the (KE, KD) pair is publicly available and the other is private. KE and KD are related such that encryption and decryption are possible, but it is computationally infeasible to determine the private key given the public one. In the case of digital signatures, KE is the private key and KD is the public member. Once the sender encrypts the hash with KE (which is a secret), the resulting signature and associated data are transmitted to the receiving end.
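The sign-then-verify mechanics can be illustrated with textbook RSA on deliberately tiny numbers. This is a sketch only: the primes, the message, and the function names are chosen for illustration, and real systems use keys of 2048+ bits with padded, standardized signature schemes.

```python
import hashlib

# Textbook RSA with tiny primes p = 61, q = 53 (illustrative only, not secure).
n = 61 * 53            # public modulus, 3233
KD = 17                # public exponent (verification key)
KE = 2753              # private exponent (signing key): KE * KD = 1 mod 3120

def footprint(data: bytes) -> int:
    # Fixed-length "footprint" of variable-length data, reduced mod n.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes) -> int:
    # The sender encrypts the hash with the private key KE.
    return pow(footprint(data), KE, n)

def verify(data: bytes, signature: int) -> bool:
    # The receiver decrypts the signature with the public KD and compares
    # it against a freshly computed hash of the received data.
    return pow(signature, KD, n) == footprint(data)

terms = b"consumer agrees to 10 plays"
sig = sign(terms)
```

Any change to `terms` changes the footprint, so the comparison in `verify` fails and tampering is detected.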
At the receiver, anyone can verify that the signature of the data, as well as the data themselves, is valid by comparing the decrypted value of the signature (using the publicly available KD) with the hash of the data received. If they match, then it is clear that only someone who knows KE (i.e., the legitimate sender) could have generated it from the correct data. Thus, the signature guarantees that the data have not been forged or tampered with, and nonrepudiation is enforced. In a mobile environment or heterogeneous content distribution chain, the digital signature may be too sensitive to format conversion, transcoding, or unavoidable errors. In such situations, proposals have been made to generate the hash from more robust features of the content [6] or to make use of other strategies for tamper assessment [7].

3. Digital Watermarking and Fingerprinting. This embeds an often imperceptible mark, called the watermark or fingerprint, into the host media such that it travels intrinsically with the resulting content and facilitates copy control, distribution tracking, and usage monitoring. The embedding process often makes use of a secret watermark key, known only at the watermark embedding and detection stages, in order to make access, removal, or forgery of a watermark difficult. The purpose of watermarking in a DRM context is to insert a payload,
which represents the information required by a DRM system for certain tasks such as copy control and distribution tracking. The payload is inserted by making use of the watermark key and features of the host content in order to create a watermark signal, which, when added to the host, is imperceptible and robust to incidental processing. The resulting composite signal, known as the watermarked content, may undergo some minor processing such as lossy compression during distribution before being passed through a detection algorithm that uses the watermark key to extract an estimate of the payload. The detection algorithm may be housed within some part of the DRM system, such as the content player, in order to use the extracted payload to determine which usage rules apply at that stage of the distribution chain. Alternatively, if the payload is unique to each user, then the process is often called “fingerprinting.” In this situation, detection is done offline to determine the original source of pirated content when illegitimate copies are identified. Watermarking is normally applied before any other security processing, such as encryption, within a DRM system. Thus, it is regarded by some as a last line of defense against unauthorized usage of the content [8].

4. Metadata. Metadata for DRM provide machine-readable expressions that link the three entities of content, users, and rights together so that DRM devices can enable permissible operations. In general, metadata are defined as data about information [9]. DRM metadata [10], in particular, convey information about the rights of the various users in the content distribution chain to various parts of the content; they may include data about the copyright owner, allowable usage of the content, and details about the cost to use or distribute the content.
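A minimal sketch of what such content metadata might look like in machine-readable form. All field names, values, and the URL below are hypothetical, invented for illustration; they do not follow any real rights-expression schema.

```python
# Hypothetical DRM metadata record linking content, owner, and usage rights.
track_metadata = {
    "content_id": "song42",
    "copyright_owner": "Example Records",
    "usage_rules": {"play_count": 10, "copy": False, "expires": "2005-12-31"},
    "price": {"play": 0.99, "redistribute": None},   # None: not offered
    "key_server": "https://rfs.example.com/keys",    # where to fetch the key
}

def may_copy(metadata: dict) -> bool:
    # A DRM client consults the usage rules before enabling an operation.
    return bool(metadata["usage_rules"].get("copy", False))
```

A client would check `may_copy(track_metadata)` (here, False) before offering a copy operation, illustrating how metadata bridges the rights and the enforcement point.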
The main objectives of DRM metadata are to (a) provide semantic meaning to an often diverse set of content, so that educated usage decisions can be made regarding the content (e.g., cataloging, purchasing, redistributing), and (b) indicate an explicit relationship between the users’ rights and content, so that a practical bridge can be formed among the areas of commerce, intellectual property law, and technology. Since metadata need to be used by a diverse set of applications (DRM- and non-DRM-related), and interoperability of all of these applications is necessary, a great deal of standardization effort has been put into developing an associated expression language. One undertaking that is anticipated to have much success as a digital rights language is the eXtensible rights Markup Language (XrML).

5. Security Protocols. These ensure the protection of the content while it is transmitted from one party or device to another in the distribution chain. In general, a security protocol defines a set of rules for the trusted communication of two or more parties in a communication network. These rules contain specifications as to the format, the necessary data, and the processing required to ensure secure information transmission. Establishing a secure communication channel requires some initial handshaking on the part of some or all of the parties or devices that is enforced within the protocol. One of the goals of
effective protocol design, especially in the context of mobile applications, is to minimize this overhead while maintaining a reasonable level of protection. To successfully implement an overall DRM system for a given business application or model, designers must integrate the above technologies in an effective way to produce an end-to-end system that serves the needs of all users. System-level design expertise is integral to this process and is currently an ongoing effort in the research and industrial communities. We demonstrate how the various security elements can be put together in the generic DRM content distribution system shown in Figure 11.3; the Rights System [4] developed by InterTrust follows a similar structure. Through our explanation of this system we hope to demonstrate how the different components of a DRM system effectively interact with one another to achieve the overall protection of content. The raw content must first be prepared by a process called the packager. The packager makes use of a securely generated content key to convert the raw content into a specific format (perhaps via lossy compression), encrypts the result (using symmetric encryption [1]) with the content key, and produces metadata to describe the content (e.g., the name of the artist, copyright holder, and work may be included) and to identify the location where the content key for decryption may be obtained (e.g., the network address or URL of the associated site). The packager may generate the content key or may obtain it from another party. In either case, after preparation of the content, the resulting data (compressed encrypted content and metadata) are transported to a content (distribution) server (CS) and the content key is sent to a rights fulfillment server (RFS). A user who would like to retrieve the content (and who we assume has the necessary DRM client software installed on his/her machine) needs to decrypt the content
Figure 11.3 Generic DRM content distribution system (adapted from Ref. 11). [Diagram: the Packager delivers ENCRYPT{content, key} to the Content Distribution Server via the Distributor/Aggregator and sends the key to the Rights Fulfillment Server; the Consumer pays the Retailer for content and receives a token, then redeems the token with the Rights Fulfillment Server for the rights and key.]
and use/access it according to the negotiated rights. In this example, the user obtains the encrypted content and associated metadata from the CS. To gain access to the information, (s)he must purchase rights from a web retailer (WR), who provides the user with a token that can be presented to the RFS as proof of payment. On receiving this proof, the RFS provides the user with rights in the form of digitally signed (using the user ID) metadata and the content key for decryption. The metadata can specify time-outs, the number of times that a song can legally be played, or whether the media can be copied. Using this information, the DRM client can decrypt the content and allow access according to the purchased rights. The scenario presented above is appropriate for a client-server type of distribution model. There are extensions that include portable devices, rights lockers, and peer-to-peer systems, as discussed by Feigenbaum and Freedman [11]. In essence, the business model characterizes the usage rules that are enforced by the technological architecture; thus, the DRM system technology, content distribution, and business models are intrinsically tied. In the next section, we introduce some emerging content distribution models and discuss the associated DRM challenges. We do not provide an exhaustive list of models, nor an in-depth evaluation, but intend to provide a flavor of the various issues and compromises.
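The retailer/RFS token exchange just described can be sketched as follows. The actors are reduced to in-memory tables, and every name and value here (the functions, the key string, the rights fields) is an illustrative assumption; payment processing, digital signing, and the actual decryption are omitted.

```python
PAYMENTS = set()             # retailer's record of completed purchases
KEYS = {"song42": "k-9f3"}   # rights fulfillment server's content-key store

def retailer_issue_token(user: str, content_id: str):
    # Web retailer: on payment, hand the user a proof-of-payment token.
    token = (user, content_id)
    PAYMENTS.add(token)
    return token

def rfs_fulfill(token):
    # Rights fulfillment server: redeem a valid token for rights + content key.
    user, content_id = token
    if token not in PAYMENTS:
        raise PermissionError("no proof of payment")
    # In a real system these rights would be digitally signed metadata.
    rights = {"user": user, "plays": 10, "copy": False}
    return rights, KEYS[content_id]

token = retailer_issue_token("alice", "song42")
rights, content_key = rfs_fulfill(token)
```

A forged token (one the retailer never recorded) is rejected by `rfs_fulfill`, which is the structural point of the token: the RFS releases the content key only against proof of payment.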
11.1.3.2 DRM and Content Distribution Business Models To establish an effective DRM system that links users, content, and rights, one must take into account the way in which the content is distributed (i.e., the distribution model) and the allowable usage rights (which are related to the business model). Examples of several content distribution and business models are introduced in this section. The most fundamental type of content distribution is based on a client-server paradigm in which a client requests and downloads information from a given server. The server is often centralized and serves all host computers. The content, depending on its format, may be streamed in real time or completely downloaded and stored for later use. Many first-generation e-commerce and DRM systems make use of this basic download model. In contrast, in peer-to-peer file sharing systems, content is stored and distributed directly by end systems (termed “peers”) without the need for a central server. Examples of such systems include the infamous Napster and Kazaa. Multicast distribution, which has been shown to scale more naturally than client-server models [12] for a large number of consumers, involves the distribution of content from a single source to multiple recipients. For example, cellular digital packet data (CDPD) technology [13], a specification to support wireless Internet access for mobile devices such as cell phones, supports multicast distribution. Effective business models are also an integral part of the establishment of a successful DRM system. Subscription-based models allow a given user to obtain access to a library of content with a predefined scope based on the details of his/her “membership” rights. The membership is normally purchased ahead of time.
In contrast, promotion models provide restricted usage of the content to a group of users in order to entice them to pay for less limited access. Methods of restriction can include permitting only a fixed number of allowable views or plays, after which the content is inaccessible, or limiting the distributed media to a noncommercial-quality version. In these cases, users can pay additional fees in order to gain unlimited plays and high-quality versions of the content. Pay per view (also called “pay per play”) charges the consumer according to the number of uses of the content. This scenario is relevant for content-on-demand applications in which pricing depends on the number of views or plays. Microtransactions/micropayments are another form of business model in which the users deal with a small volume of content (e.g., a few bytes in size) worth a meager value (e.g., a fraction of a cent). This scenario is envisioned1 to encompass applications involving individual charging and delivery of individual stock market quotes, newspaper articles, and Webpages [14]. Because this model deals with low volumes of inexpensive data, it is presumed that consumers will engage in high numbers of such transactions within a small period of time. In floating licenses, the usage of content is not associated with specific users. Instead, a group of, say, N individuals jointly own or have access to M < N licenses. The system allows up to M members of the group to simultaneously make use of the content. In this case, the business model attaches permissions to a group and must keep track of the number of accesses at any given time. Another popular business model, called superdistribution, overlaps with some of the models described. Mori and Kawahara [15] describe superdistribution as an approach to distributing software in which software is made available freely and without restriction but is protected from modifications and modes of usage not authorized by its vendor. . .
. . Superdistribution relies neither on law nor ethics to achieve these protections; instead it is achieved through a combination of electronic devices, software, and administrative arrangements. . . By eliminating the need of vendors to protect their products against piracy through copy protection and similar measures, superdistribution promotes unrestricted distribution. . . .
The modern use of the term encompasses the distribution of any general type of content by peers to one another. DRM systems aim to provide effective content protection and management for a number of diverse content distribution and business models. A given enterprise must consider several technical and nontechnical issues before settling on a system [4]. We attempt to provide a flavor of the range of technical challenges by considering a number of different scenarios. If a multicast delivery channel is combined with a pay-per-view model (for a video-on-demand scenario, say), a DRM system will face several challenges, including content key distribution to the multiple users. [Footnote 1: To the best knowledge of the authors of this chapter, no real system has yet been implemented, in part due to the lack of an appropriate DRM system.] For efficiency of the encrypted
media distribution, it is preferred that a group key Kg be used to encrypt the content once at the transmitter end. If each user knows Kg, then s(he) can individually decrypt the content once it is received. However, this requires that the group key be securely transmitted to each of the users by a centralized server that keeps track of membership in the group. This membership is usually dynamic; that is, an individual may be able to leave or enter at specific times. Therefore, for reasons of security, Kg must be updated to prevent new members from decrypting information that was sent before they paid for privileges, and to stop members who have left the group from accessing information that is unpaid for after their departure (this latter issue is called “key revocation”). In the DRM literature, this problem is called the “state update problem” for key management, and it has been investigated by a number of researchers [16-19] seeking the best compromise in terms of encryption complexity and update message length. In the realm of microtransactions/micropayments, in which we assume that the content is downloaded using a client-server architecture (e.g., a Webpage providing information on timely stock quotes), one of the main challenges is to make sure that the overhead of DRM per download is low; it should not cost the content provider on the order of one cent to transmit information and process the payment for, say, specific search engine results for which the provider is paid 0.01 cent by the consumer. It is also envisioned that, given the granularity of the content and payment size, such a model will encourage a more diverse set of content providers (e.g., pay-per-use search engines, selective magazine article purchases) that will experience a significantly larger number of such transactions initiated by consumers. As discussed by Waller et al.
[14], DRM methodologies that make use of protocols such as the secure socket layer (SSL) to secure the communication channel have a heavy overhead that renders widespread use for microtransactions/micropayments infeasible. In fact, a better approach is to apply security to the content itself (rather than the channel, as in the case of SSL), appending metadata and digital signatures as appropriate. A third-party “transaction broker” would be used to handle payments and issue “tokens” (where appropriate) in order to obtain the appropriate content keys for decryption and access to the content. The Secure Interactive Broadcast Infotainment Services (SIBS) project [14] is one such system, with the goal of developing mechanisms for secure and scalable microtransactions. To be able to enforce certain types of business models, such as pay per view or even subscription-based pricing, in peer-to-peer networks, the use of the content needs to be tracked. One idea that has been proposed is to use a digital rights locker. A digital rights locker is a storage location that houses an individual’s or device’s rights to access content. The locker can be a central registry for the acquired rights of a given individual or device; it can also contain backups of prepurchased content and usage history. The greatest advantage of using the digital rights locker is the increased flexibility and potential for content protection even if the media leaves the peer-to-peer network and enters devices such as a PDA or a mobile phone. Thus, a rights locker gives users portability beyond the standard distribution channels. One such product is available at the time of this writing by
Digital World Services [20] and is predicted to appear in future generations of InterTrust’s Rights System.

11.1.3.3 DRM and Security For effective information management, it is clear that DRM designers must borrow toolsets from the area of information security. However, there are a number of key differences between the abstract academic notions of security and the needs of practical DRM systems, which we address in this section. The first departure is in scope and objective. The overall goal of a DRM system is to facilitate an electronic marketplace and optimize its usefulness to all parties. For example, consumers should be safeguarded against loss of privacy, and content providers should be defended against intellectual property theft. Thus, DRM applies to the overall management of the information commerce system. In contrast, the purpose of security is to unconditionally prevent a specific attack, such as piracy or eavesdropping, potentially arising in some part of the system. The broader goals of DRM not only involve issues of protection but, more importantly, also include measures of business success in which keeping a product commercially viable is the fundamental priority. Ironically, this latter objective can be at odds with that of strict security. Allowing some small level of piracy to accommodate greater consumer satisfaction and acceptance of DRM-enabled content distribution is more lucrative than restricting all unlawful duplication [4]. The content providers that implement the most user-friendly DRM systems will have a significant market advantage. From this perspective, the effectiveness of a security system for DRM does not imply unbreakability. For instance, the pay TV market has been under continuous attack, with “hacks” freely available on the Web. However, the inconvenience of building and continuously updating such systems to keep up with the scrambling countermeasures has still allowed pay TV to be a viable business.
Another difference between security and DRM involves the formulation of the “good guy/gal” and the “bad guy/gal.” In the field of traditional security, there are precise distinctions between the lawful members of a communication system and the attackers. There are clear definitions of “trust” and “use” that are more challenging to characterize in a practical DRM setting. The primary obstacle is that the nature of alliances is dynamic in reality. For example, competitors may merge, or allies may part ways. This has serious implications if one tries to naively adopt tools developed in the traditional security area and apply them to DRM. The borrowed mechanism may not guarantee protection because the original assumptions and relationships between parties cannot be enforced throughout the evolution of the system. Furthermore, security may be breached during the implementation, deployment, and system-upgrade processes (with no fault on the part of the original algorithm design).

11.1.3.4 General Requirements for Mobile Terminals Mobile DRM (MDRM) is becoming an area of increasing focus to jump-start the legitimate, fair, and secure exchange of mobile content, and thus avoid the “Napsterization” of content (as in the case of the Internet). For DRM to be successfully
developed for mobile terminals, a number of fundamental differences that set mobile terminals apart from their wired counterparts must be considered:

- The processor performance (in terms of power and complexity) of mobile terminals is limited. This has a significant impact on the design of security algorithms, which must, in part, compromise security for practicality. Software must be downsized to accommodate the reduced hardware capabilities.
- The capabilities of mobile devices may vary significantly. Standardization efforts early in MDRM development are necessary to prevent competing systems that may not easily integrate for seamless DRM across different devices.
- There is a greater probability of error and content desynchronization. Thus, DRM algorithms must be robust to variations in delivery quality, and should also be able to recover from distribution failure and loss.
- Because of the convenience of the devices (which is the motivating factor in their success), users expect ease of use and cost-effectiveness from any DRM system implemented. Risk and pain assessment is necessary to deduce the appropriate balance of security and convenience for a successful business.

In the next section, we look into DRM involving MPEG IPMP to provide the reader with an understanding of how DRM is shaping standards activities. Given the as-yet uncertain status of MDRM applications, we are not able to provide an effective mobile DRM case study. Section 11.2 is presented to elucidate the integration of DRM with well-known formats. Section 11.3, however, touches on emerging MDRM technologies.
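The limited processor performance noted above is one motivation for the selective-encryption strategies mentioned in Section 11.1.3.1: encrypt only the perceptually important fraction of the data and leave the rest in the clear. A toy sketch of the idea, with XOR standing in for a real cipher and the 25% ratio chosen arbitrarily:

```python
def toy_encrypt(block: bytes, key: int) -> bytes:
    # XOR stand-in for a real block cipher; illustrative only.
    return bytes(b ^ key for b in block)

def selective_encrypt(blocks, key: int, ratio: float = 0.25):
    # Encrypt only the leading fraction of blocks (e.g., headers or other
    # perceptually critical data), leaving the rest clear to save computation.
    n = max(1, int(len(blocks) * ratio))
    return [toy_encrypt(b, key) if i < n else b for i, b in enumerate(blocks)]

frames = [bytes([i] * 4) for i in range(8)]        # toy "video" of 8 blocks
protected = selective_encrypt(frames, key=0x5A)
restored = selective_encrypt(protected, key=0x5A)  # XOR is self-inverse
```

Only 2 of the 8 blocks are transformed, yet the stream is unusable without the key at those positions, which is the trade-off selective encryption exploits on constrained terminals.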
11.2 MPEG INTELLECTUAL PROPERTY MANAGEMENT AND PROTECTION

One of the goals of the Moving Picture Experts Group (MPEG) is to ensure interoperability, including DRM, between multimedia systems. The MPEG community is mostly concerned with interoperability from the consumer’s perspective: ensuring that content from multiple sources will play on players from different makers. In the following subsections, we briefly introduce the DRM standardization activities in the MPEG community. MPEG refers to DRM as intellectual property management and protection (IPMP) [21].

11.2.1 Copy Protection on MPEG-2 Videos The MPEG-2 standard provides some methods for the identification and protection of the intellectual property in content. For identification, a unique 32-bit copyright identifier, which identifies the type of the work (ISBN, ISSN, etc.), can be used by a registration authority to identify the work at the audio, visual, or system
level. To enable protection, several provisions can be used: (1) signal whether particular packets have been scrambled, (2) send messages to be used in (proprietary) conditional access systems, or (3) identify the conditional access system used. The MPEG-2 standard does not have a specific mechanism for identifying and protecting the video streams themselves. However, when videos are recorded on DVD disks, they are protected by the Content Scrambling System (CSS). CSS is used to protect the content of DVDs from piracy and to enforce region-based viewing restrictions. A DVD system includes three components: the DVD disk, the DVD player, and the host (computer, host board, etc.). The DVD disk contains the encrypted content, as well as a hidden area. The contents of this hidden area cannot be delivered except to an authenticated device. Presumably, any device that can authenticate has been licensed by the DVD Copy Control Association and, as a consequence, is trusted to receive the information. This hidden area contains a table of several encrypted disk keys. The player stores the player keys that are used to decrypt the disk key, the region code that identifies the region in which the player should be used, and another secret that is used for authentication with the host. Details of the interoperation of the keys and the CSS system can be found in Kesden’s report [22].
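The CSS key hierarchy just described can be sketched structurally as follows. This is an illustration only: the key values are invented, XOR stands in for the actual CSS cipher, and the real key sizes, authentication steps, and cipher differ.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy stand-in for the real CSS cipher; illustrative only.
    return bytes(d ^ k for d, k in zip(data, key))

player_key = bytes([7, 1, 9, 3, 5])   # held by a licensed player (hypothetical)
disk_key = bytes([2, 8, 6, 4, 0])     # per-disk key (hypothetical)

# Hidden area of the disk: the disk key encrypted under each licensed
# player key, so only licensed players can recover it.
hidden_area = {"player-A": xor_cipher(disk_key, player_key)}

# An authenticated player uses its player key to recover the disk key,
# which in turn unlocks the scrambled content.
recovered = xor_cipher(hidden_area["player-A"], player_key)
title = xor_cipher(xor_cipher(b"movie", disk_key), recovered)
```

The layering is the point: revoking one player key only invalidates that player's entry in the hidden-area table, without re-encrypting the content itself.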
11.2.2 MPEG-4 IPMP Hook
The MPEG-4 IPMP standard specifies two pieces of technology: identification of copyright, and an IPMP hook to enable protection. It does not prescribe when or how often to use the identification descriptors; it relies on international treaties and legislation to prohibit removal of IPMP information. An identification of an MPEG-4 AV object specifies whether it is protected by an IPMP system, its type (audiovisual, visual, still picture, etc.), and the registration authority that hands out unique numbers (e.g., ISAN, ISBN, ISRC). It can also indicate titles and supplementary information and references to separate data streams. In 1998, MPEG concluded that it was not desirable to enforce IPMP tools on all MPEG-4 content and MPEG-4 players, and that it was neither feasible nor desirable to standardize a complete DRM system [21]. No DRM system could satisfy application needs ranging from real-time low-quality communications to valuable content in set-top boxes. Thus, MPEG-4 standardizes hooks that build secure MPEG-4 delivery chains. Video bitstreams embed information that tells the terminal which IPMP system should be used. Two simple IPMP extensions of basic MPEG-4 systems are specified.

- IPMP Descriptors (IPMP-Ds). These are part of the MPEG-4 object descriptors that describe how an object can be accessed and decoded. The IPMP-Ds are used to denote the IPMP system that was used to encrypt the object. An independent registration authority (RA) is used so that any party can register its own IPMP system and identify it without collisions.
- IPMP Elementary Streams (IPMP-ES). All MPEG objects are represented by elementary streams that can reference each other. These special elementary streams can be used to convey IPMP-specific data. Their syntax and semantics are not specified in the standard.
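Conceptually, the IPMP-D mechanism works like a registry lookup: the descriptor carries a registered system identifier, and the terminal resolves it to the (proprietary) IPMP system that must process the stream. A sketch with invented IDs and names; real RA assignments and descriptor syntax differ.

```python
# Hypothetical registration-authority table mapping registered IPMP system
# IDs to handler descriptions; every ID and name here is illustrative.
IPMP_REGISTRY = {
    0x0101: "AcmeCrypt conditional-access system",
    0x0102: "ExampleCo watermark reader",
}

def resolve_ipmp(descriptor: dict) -> str:
    # A terminal reads the IPMP-D from the object descriptor and looks up
    # which registered IPMP system must process the protected object.
    return IPMP_REGISTRY.get(descriptor["ipmp_system_id"],
                             "unknown IPMP system")
```

Because the RA hands out unique IDs, two vendors' systems never collide in such a table, which is exactly what the hook approach standardizes in place of a full DRM system.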
11.2.3 MPEG-21 and MPEG IPMP Extensions The MPEG-21 IPMP standard aims to provide a uniform framework that enables all users to express their rights and interests in digital items and to have assurance that those rights, interests, and agreements will be persistently and reliably managed and protected across a wide range of networks and devices [23]. In 2002, MPEG adopted XrML 2.0, developed by Xerox and ContentGuard, as the basic rights expression language for describing contractual usage rules for “digital items” [24]. This language provides rules that are flexible and extensible. Its framework does not favor any particular human language, culture, or legal system, and it provides unambiguous semantics and predictable effects. Some expansions, such as public policies, rules, and business initiatives, have been adopted in order to satisfy the versatile needs of multimedia data. The MPEG-21 Rights Expression Language was finalized in July 2003. In addition to the Rights Expression Language, the MPEG-21 Rights Data Dictionary was finalized in late 2003. Additional IPMP extensions are being developed to address the following issues: (1) support access to and interaction with content while keeping the amount of hardware to a minimum, (2) support easy interaction with content from different sources without swapping of physical modules, (3) support conveying to end users which conditions apply to what types of interaction with the content, (4) support protection of user privacy, and (5) support service models in which the end user’s identity is not disclosed to the service/content provider and/or to other parties. The work currently concentrates on various security interfaces. Major issues are the management of trust and tamper resistance. MPEG-21 IPMP was finalized in early 2004.
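The kind of persistent usage rule such a rights language expresses (e.g., a bounded play count) ultimately has to be enforced by a client. A minimal sketch of that enforcement, with hypothetical field names rather than actual MPEG-21 REL semantics:

```python
class RightsObject:
    """Illustrative stand-in for signed rights metadata (e.g., an XrML
    grant); the attribute names here are invented, not the REL schema."""

    def __init__(self, plays_allowed: int, copy_allowed: bool):
        self.plays_left = plays_allowed
        self.copy_allowed = copy_allowed

    def authorize_play(self) -> bool:
        # Persistently enforce the play-count constraint: each authorized
        # play consumes one of the remaining plays.
        if self.plays_left <= 0:
            return False
        self.plays_left -= 1
        return True

rights = RightsObject(plays_allowed=2, copy_allowed=False)
outcomes = [rights.authorize_play() for _ in range(3)]
```

The first two requests succeed and the third is denied, which is the "persistently and reliably managed" behavior the framework demands of compliant devices.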
11.3 EMERGING TECHNOLOGIES AND APPLICATIONS
The popularity of mobile terminals such as phones and personal digital assistants (PDAs) is growing at an incredible rate, and with it we are seeing a dramatic increase in the number and variety of mobile content delivery services. This implies the need for technologies to enable the secure delivery of information to mobile terminals. In this section, we present a technical overview of the current state of mobile digital rights management (MDRM). The main aspects, such as MDRM system requirements, categorizations, and architectures, are studied. We also analyze several sample MDRM systems and further focus on several emerging security technologies to enable MDRM to meet its challenges.
SECURITY AND DIGITAL

11.3.1 State-of-the-Art MDRM Systems
In recent years, many DRM systems have been proposed and commercialized. According to the type of client-side content access device (terminal), these systems can be categorized as platform-independent, mobile DRM only, and fixed DRM only, enabling protection and rights management of live and on-demand content delivered to fixed and mobile devices across networks, to mobile devices only, or to fixed devices only. We shall look at several typical commercial MDRM systems that are either mobile DRM only or platform-independent, with the capability to facilitate rights management of various kinds of digital content delivered to mobile devices. Before proceeding, let us look at some common MDRM system classifications and terminology, organized by various metrics.

Rights enforcement-based:
- Server-side DRM enforcement: flexible and scalable server-side solutions are used to control access to and usage of content delivered to mobile devices that do not yet have client-side DRM capabilities.
- Server- and client-side DRM enforcement: used where client-side mobile devices have DRM capabilities.

System architecture-based:
- Centralized: rights are managed by a trusted third party (the central authority). Each service provider and each user has at least one account managed by the central authority, and an account manager handles the rights processing.
- Distributed: rights are stored in tamper-resistant devices such as smart cards or in self-protecting containers. Rights are managed not by a centralized rights manager but by the user or a delegated account manager. Tamper-resistant hardware is often needed.

Usage rule-based:
- Forward lock: intended for the delivery of news, sports, information, and any content that should not be passed on to others. Unencrypted content is transferred to the mobile device using any delivery method. The mobile device is allowed to play, display, or execute the content, but not forward it. No rights object is delivered; instead, the mobile terminal enforces a default set of rights and ensures that the content cannot be forwarded to any other device.
- Combined delivery: enables usage rules to be set for, and delivered together with, the content. Unencrypted content is packaged with a rights object, and the whole package is transferred to the mobile device using any delivery method. The mobile terminal enforces the usage permissions defined in the rights object and ensures that the content cannot be forwarded to any other device. It enables a preview feature.
- Separate delivery: enables superdistribution to protect content of higher value. Encrypted content is delivered to the mobile device using any delivery method. The device may forward the content but not the rights, which is achieved by delivering the content and the rights via separate channels. Recipients of superdistributed content must contact the content retailer to obtain rights to either preview or purchase the content. This kind of delivery requires a rights refresh mechanism.

DRM model-based:
- Media-player-specific: a model where the media player is responsible for rights management and rights enforcement is exercised by the media player.
- Mobile-terminal-specific: a model where DRM functionality is tightly integrated with the mobile terminal's system software/hardware.
- Network-centric: a model where copyrights and policies are enforced in the network during delivery or superdistribution.

Content delivery method-based:
- Broadcast: the MDRM system targets protection of broadcast content. Only users who have subscribed to the service can receive and use the content under the usage rules.
- Streaming media: protecting and managing the usage rules of streaming media.
- Downloaded: protection of downloaded mobile content, such as ringtones, images, and games.
- Personal storage: an MDRM system that protects and manages the copyright of one's own works.

Content variety-based:
- Content-type-specific: the MDRM system supports only certain specific types of content.
- Any content type: the MDRM system is capable of supporting any type of content.

Content value-based:
- Heavy (premium) content: content with high commercial value. Consumer use rights are explicitly licensed by the copyright holder, and the receiver has to pay for full content access.
- Light content: copyrighted content that may be free for noncommercial use (ringtones, screensavers, etc.).
- Personal storage: content that is not commercialized but is owned and stored by the content creator.

Security level-based:
- Strong MDRM: capable of providing strong protection for premium content that is robust against various kinds of attacks.
- Light MDRM: only lightweight protection is imposed. Such systems may be targeted at "keeping honest people honest," or the content may be nonpremium.
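The three usage-rule models above can be illustrated with a toy terminal model. This is a sketch only: the Package/Terminal classes, the rule names, and the rights-dictionary format are hypothetical illustrations, not an OMA DRM or other real API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Package:
    content_id: str
    payload: bytes
    encrypted: bool           # separate delivery ships encrypted content
    rights: Optional[dict]    # combined delivery bundles a rights object

def forward_lock(cid, data):
    return Package(cid, data, False, None)

def combined_delivery(cid, data, rights):
    return Package(cid, data, False, rights)

def separate_delivery(cid, enc):
    return Package(cid, enc, True, None)

class Terminal:
    # Forward lock: no rights object arrives, so a default rule set applies.
    DEFAULT_RIGHTS = {"play": True, "forward": False}

    def __init__(self):
        self._separate_rights = {}   # rights objects obtained via a separate channel

    def receive_rights(self, content_id, rights):
        self._separate_rights[content_id] = rights

    def allowed(self, pkg, action):
        if pkg.rights is not None:   # combined delivery: enforce bundled rights
            return bool(pkg.rights.get(action, False))
        if pkg.encrypted:            # separate delivery: content is unusable
            r = self._separate_rights.get(pkg.content_id)  # until rights arrive
            return bool(r and r.get(action, False))
        return self.DEFAULT_RIGHTS.get(action, False)      # forward lock
```

Note that in separate delivery the encrypted payload itself may still be passed on to others (superdistribution); only the rights object stays bound to the terminal.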
In Sections 11.3.1.1–11.3.1.3 we briefly discuss several representative, commercially available MDRM systems. Their key characteristics and differences are summarized in Table 11.1.
TABLE 11.1 Several Commercially Available Sample MDRM Systems

System | DRM Technology | Client (Mobile Terminal) System Requirement | Supported Type of Distribution | Content Type | Security Technology
Nokia Music Player | Secure-container-based | TR(a) HW required | Download | MP3, AAC | Encryption (secure container)
VS-7810 | Ticket-based (centralized) | Java-capable mobile terminal required; no TR HW requirement | Streaming, download | Any | Encryption
Helix | Multiple DRM technologies; additional DRM technologies may be added using its plugin architecture | No TR HW required for the media-player plugin | Streaming, download | MP3, AAC, RealAudio, narrowband AMR audio, MPEG-4, H.263, RealVideo | Strong encryption, secure container
Beep Science AS | Multiple DRM technologies | No additional client SW needed | Real-time, streaming, download | Any | Encryption, watermarking, fingerprinting
EncrypTix | — | Compliant TR HW required | — | Ticket, document | Encryption, TR HW, FIPS 140-1(b) compliant (HW L4, SW L3)
DMDmobile | — | — | — | Any, including game and e-book | —

(a) TR = tamper-resistant.
(b) FIPS: Federal Information Processing Standards Publications, Security Requirements for Cryptographic Modules.
11.3.1.1 Nokia Music Player—Distributed DRM

Nokia Music Player, an accessory for Nokia mobile phones, enables the user to listen to an integrated FM stereo radio and to downloadable protected MPEG audio/music files (MP3- and AAC-formatted; i.e., it is content-type-specific) secured with InterTrust digital rights management technology. The InterTrust system is designed so that a secure channel is not required except for privacy reasons. The content is protected via a secure container called DigiBox. DigiBox enables the association of rules and controls that specify the content usage rules and the consequences of usage via cryptographic means. DigiBox is manipulated by a trusted rights protection application, which makes the protected content available according to its associated access control rules. It is one of the most popular distributed DRM systems today. Such a secure-container-based MDRM system allows MDRM components to be integrated with almost any type of system, architecture, or network topology, and it can provide both strong and lightweight security levels. The drawback is that it requires tamper-resistant hardware to realize secure processing of the protected content.
11.3.1.2 NEC VS-7810—Centralized DRM

NEC VS-7810, an MDRM system designed to enable secure delivery of information to mobile terminals, offers the flexibility to be incorporated into a broad variety of systems and architectures. VS-7810 is a "ticket"-based system in which users purchase a "ticket" (decipher key) to access or make use of protected content. Figure 11.4 illustrates content delivery using the VS-7810 MDRM. Unlike secure-container-based systems, this centralized system does not require tamper-resistant hardware on the client side, although the mobile terminal has to be Java-enabled. In general, this kind of system is content-type-nonspecific, and it is feasible to incorporate it into different systems and architectures. One of the
Figure 11.4 VS-7810 MDRM content delivery diagram.
disadvantages of such a centralized system, however, is the cost of maintaining accounts on the server(s) for each and every user, together with its limited scalability.

11.3.1.3 Integrated Model

Helix is another typical example of a distributed MDRM system based on secure container technology; it is platform-independent. Readers can consult Table 11.1 for the differences between Helix and VS-7810 and between Helix and Nokia Music Player. Helix DRM is a complete, end-to-end secure digital delivery platform that consists of four plugin components: packager, license server, client, and universal server DRM. The packager uses strong encryption algorithms and secure container technology to prevent unauthorized use of content and to prepare content for distribution via streaming, download, or other delivery methods. With true superdistribution capability, the packaged media content and the associated business rules for unlocking and using that content are stored separately, so that multiple sets of business rules can be applied to a single file over time. Helix DRM supports RealAudio, RealVideo, MP3, MPEG-4 video, H.263 video, AAC audio, and narrowband AMR audio. The license server verifies content licensing requests, issues content licenses to trusted, authenticated Helix DRM end-user clients, and provides auditing information to facilitate royalty payments. The client enables download and streaming playback of secure formats in a tamper-resistant environment based on the usage rules specified by the content owners. The media-player-specific plugin enables streaming of protected media from the Helix Universal Server to a client terminal with no tamper-resistant hardware. We call this kind of DRM model, which combines different schemes, an integrated model.
Since a single MDRM scheme alone may not provide a sufficiently broad solution for all mobile business requirements, an integrated system is expected to cover a wider range of business requirements, improve user convenience, and offer better rights control flexibility along with enhanced system scalability and upgradability. A representative example in this category is the Beep Science Mobile DRM system (illustrated in Fig. 11.5 and listed in Table 11.1), which comprises player DRM, terminal DRM, and network DRM models and includes the content policy system (CPS), a server-side solution that enables the operator to act as a payment collector for its own and partners' premium content and ensures that copyright restrictions are enforced; the content control engine (CCE), a high-capacity, real-time infrastructure node that activates content protection during download; the policy enforcement server (PES), a real-time component for executing the copyright and charging policies; the license server (PCR), a backend system for managing policies and rights for premium content and services; and the rights issuer server (RIS), a system that supports content superdistribution.

11.3.1.4 MDRM Requirements

The sample cases we have discussed provide some insight into the design of MDRM systems. Essentially, MDRM systems aim to provide usage rights management to establish trust for secure content distribution to, and access through, mobile
Figure 11.5 Beep Science Mobile DRM system components.
devices. From a technological point of view, the content provider's rights, user convenience and privacy, and the device's mobility and capability must all be taken into consideration when designing an MDRM system. The basic requirements for an MDRM system therefore include:

1. Security. The MDRM system must be robust against various attacks and be able to prevent illegitimate usage and unauthorized distribution of rights-protected digital content. Furthermore, it should ensure user privacy.

2. Scalability. The MDRM system must be scalable to handle dynamic communication channels, diverse mobile device capabilities, various types of digital content, and distinct rights and usage rules issued by different issuers with different security and cost requirements.

3. Usability. This involves ease of use, mobility, cost, and compliance.
   Ease of use. The system should not sacrifice user convenience, and it should bring attractive services to end users.
   Mobility. The system should not reduce mobility or increase usage complexity because of mobility.
   Cost. This includes implementation and operation costs for both content provider and end user and is an important factor for successful MDRM systems. The system must be feasible given mobile devices' processing power, storage, and battery power.
   Compliance. The system should be compliant with the existing network infrastructure and the various standard formats.

Scalability and mobility are crucial for mobile DRM. In the next section, we consider some emerging security technologies that enable scalability and mobility in the design of practical MDRM systems.
11.3.2 State-of-the-Art MDRM Component Technologies
It is clear that MDRM technology should provide increased security to ensure the authenticity and integrity of both content and rights. The sample systems discussed in the previous section use encryption, watermarking, fingerprinting, and authentication technologies to provide secure DRM services. In this section we look at several emerging security technologies intended to provide more adequate content protection for seamless content distribution in the mobile environment. We discuss the following topics: scalable and format-compliant encryption schemes, public key watermarking systems, scalable watermarking for authentication and error recovery, and efficient key management for multicast in the mobile environment.

11.3.2.1 Scalable and Format-Compliant Encryption for Multimedia

In a traditional communication system, the encoder compresses the source media into a fixed bit rate that may be equal to or less than the channel capacity and sends it to the receiver. Under the assumption that the receiver can obtain and decode all the bits in time, it reconstructs the media using all the bits received. Similarly, in a traditional encryption system, the sender encrypts each and every bit of a message and sends it to the receiver, where it is decrypted at the same rate at which it was encrypted. This often requires correct reception of each and every bit of the encrypted message to prevent subsequent plaintext from being improperly decrypted. In such a system, at least three assumptions are made: (1) the channel capacity is known, (2) the receiver can receive all the bits correctly in time for decryption, and (3) the receiver is capable of reconstructing the media in time.
Those assumptions are often violated for multimedia data streams (MDS) over wireless networks because of the large size of MDS, the varying and possibly low wireless network bandwidth, the instability (error-proneness) of the wireless channel, the diversity of mobile receivers' processing power and storage space, and the high computational complexity of many encryption algorithms. These conditions demand scalable and flexible approaches capable of adapting to changing network conditions as well as device capabilities. Figure 11.6 illustrates streaming media over various networks to various devices. The time constraints of real-time and streaming media make it particularly crucial to offer scalability for secure wireless multimedia distribution. In the following text we discuss state-of-the-art multimedia encryption schemes designed to meet some of these challenges. As a foundation, we briefly discuss the design of a scalable system for nonsecure multimedia communication.

Scalable Access to Nonsecure Multimedia: Three Basic Schemes

Simulcast as well as scalable and fine-grained scalable (FGS) [25–27] compression algorithms have been proposed and widely adopted to provide scalable access to cleartext multimedia, with interoperability between different services and flexible support for receivers with different device capabilities. With simulcast, multiple bitstreams at multiple bit rates are generated for the same content. With scalable coding, a
Figure 11.6 Streaming media over various networks to various devices.
bitstream is partially decodable at a given bit rate. With FGS coding, the bitstream is partially decodable at any bit rate within a bit rate range, reconstructing the medium signal with optimized quality at that bit rate. In most applications FGS is more efficient than simulcast [27] and is by far the most efficient scheme offering continuous scalability.

Selective Encryption

If a media datastream is encrypted using nonscalable cryptography algorithms, decryption at an arbitrary bit rate to provide scalable services can hardly be accomplished. If a medium compressed using scalable coding needs to be protected and nonscalable cryptography algorithms are used (i.e., the bitstream is encrypted uniformly), the advantages of scalable coding may be lost. To resolve this problem, selective encryption may be used [28–34]; a number of selective encryption algorithms have been proposed since the late 1990s. The idea is to encrypt only part of the multimedia datastream, often with lightweight encryption algorithms, to achieve a level of protection suitable for the application; that is, only some parts of the entire bitstream are encrypted, while the rest is left in the clear. A general selective encryption mechanism works as follows (see also Fig. 11.7). First, an encryption algorithm Enc appropriate for the security requirements of the target application is chosen. Then "selection" is performed by partitioning the source object (the multimedia plaintext bitstream) O into two bitstreams O1 and O2. "Encryption," encrypting only the bitstream O1 of the two, is done next. Let O = O1 + O2; the target object after encryption is then O' = EncK(O1) + O2. If the multimedia data are to be compressed, the compression operation is performed either first or at the same time as the encryption operation. One interesting reason for the latter approach is to support format-compliant encryption, which we discuss later in this section.
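The partition-and-encrypt step can be sketched as follows. This is illustrative only: the hash-counter keystream stands in for whatever cipher Enc the application chooses, and the byte-level selection predicate stands in for a real bitstream parser.

```python
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    # Counter-mode keystream derived from SHA-256; a stand-in for EncK.
    out, ctr = bytearray(), 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def selective_encrypt(obj: bytes, key: bytes, select) -> bytes:
    """Partition O into O1 (positions where select(i) is true) and O2,
    encrypt only O1, and reassemble: O' = EncK(O1) + O2."""
    o1 = bytes(b for i, b in enumerate(obj) if select(i))
    enc = iter(bytes(b ^ k for b, k in zip(o1, _keystream(key, len(o1)))))
    return bytes(next(enc) if select(i) else b for i, b in enumerate(obj))

# XOR with the same keystream is its own inverse, so the same call decrypts.
```

For example, select = lambda i: i % 4 == 0 encrypts one byte in four (the "important" partition), leaving the rest of the bitstream in the clear.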
Using MPEG video as an example, the most popular approach exploits the MPEG layered structure to selectively encrypt only a portion of the bitstream. Basic algorithms include header encryption, sign bit encryption, and I-frame encryption, where only the headers, the sign bits, the I-frames, or a combination
Figure 11.7 Symmetric-key selective encryption: general architecture.
of them are encrypted and the rest are left in the clear. Most of these are lightweight solutions that provide a low level of security. Interested readers can consult Refs. 28 and 30 for details of the security analysis and possible attacks.

Format-Compliant Encryption

When a bitstream is scrambled, the original format of the bitstream may be compromised if care is not taken. This is especially serious for compressed multimedia datastreams: if scrambling destroys certain inherent structure, compression efficiency can be compromised. Consider a simple example. Assume we encrypt only the I-frames of MPEG video using intrablock DCT coefficient shuffling; that is, we shuffle the DCT coefficients within each DCT block. Notice that MPEG uses several characteristics of DCT coefficients when encoding a coefficient block. For instance, typical video frames often have many coefficients that are zero-valued, especially after requantization; the high-frequency AC coefficients are likely to be small or zero, and it is highly probable that a cluster of consecutive AC coefficients are all zero. MPEG effectively uses this property by sending the coefficients in an optimum order, describing their values with Huffman coding, and using run-length encoding for the zero-valued coefficients to achieve a significant reduction in bit rate. Now assume a low-bit-rate video transmission over a wireless network. As a result of shuffling, some clustered zero coefficients may be shuffled apart, increasing the bit rate. To guarantee full compatibility with any decoder, the bitstream should be altered only at places where doing so does not compromise compliance with the original format. This principle is referred to as format compliance.
The design of format-compliant encryption algorithms has to take into account the following factors: security requirements, the original format of the bitstream and its coding structure, computational overhead, bit rate, bitstream partition capability, error resilience capability, channel adaptation capability, and the tradeoffs between each pair. Wen et al. [35] gave a general framework for format-compliant encryption. They pointed out in particular that encrypting a variable-length coding (VLC) codeword may not result in another valid codeword, and hence designed a more suitable
algorithm that solves this problem by encrypting the indices of the codewords instead. The algorithm is as follows:

1. Create a bitstream partition (i.e., extract the bits that are important).
2. Concatenate the extracted bits.
3. Choose a public key or a private key encryption algorithm, such as DES or AES.
4. Encrypt the concatenated bits. For VLC-coded bitstreams, instead encrypt the indices of the codewords in the code table, and then map the result back to codewords in the code table.
5. Put the encrypted bits back into their original positions.

In their simulations they focus on the MPEG-4 video error-resilient mode with data partitioning and discuss which fields can be encrypted and which should not be.

Progressive Encryption

The selective encryption algorithms discussed above can often be easily modified to offer two-layer scalability. For instance, if the original bitstream is partitioned into a base layer and an enhancement layer, as in scalable coding, one can selectively encrypt the base layer and leave the enhancement layer in the clear. This often provides minimal security; if a higher level of security is needed, enhancement layer encryption has to be added. Two-layer scalability can be preserved if the base layer and enhancement layer are encrypted separately. To provide fine-grained scalability, however, many selective encryption algorithms have to be modified, since they are not specifically designed to be compatible with FGS coding; FGS-compatible encryption algorithms have to be used. Imagine a medium compressed using FGS coding but encrypted with a non-FGS-compatible encryption algorithm; in such a case the advantages of FGS coding will be lost. If an uncompressed medium is encrypted with non-FGS-compatible schemes and transmitted through a dynamic channel to various unknown receivers, reconstruction with optimized quality will be difficult to achieve. To provide FGS scalability, progressive encryption may be adopted [36].
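The portion-by-portion idea of progressive encryption can be sketched with a chained keystream cipher. The chunk size and the SHA-256 chaining below are illustrative stand-ins for the cipher block chaining or stream cipher constructions the text mentions:

```python
import hashlib

CHUNK = 16  # illustrative portion size

def _ks(key: bytes, n: int) -> bytes:
    # counter-mode keystream derived from SHA-256
    out, ctr = bytearray(), 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def progressive_encrypt(stream: bytes, key: bytes) -> bytes:
    # The key for each portion is chained from the ciphertext of the
    # previous portion, so any prefix of the ciphertext is decryptable
    # on its own (progressive/partial decoding).
    out, k = bytearray(), key
    for i in range(0, len(stream), CHUNK):
        block = stream[i:i + CHUNK]
        ct = bytes(b ^ x for b, x in zip(block, _ks(k, len(block))))
        out += ct
        k = hashlib.sha256(k + ct).digest()   # chain to the next portion
    return bytes(out)

def progressive_decrypt(prefix: bytes, key: bytes) -> bytes:
    # Works on any truncation of the ciphertext at a chunk boundary.
    out, k = bytearray(), key
    for i in range(0, len(prefix), CHUNK):
        ct = prefix[i:i + CHUNK]
        out += bytes(c ^ x for c, x in zip(ct, _ks(k, len(ct))))
        k = hashlib.sha256(k + ct).digest()
    return bytes(out)
```

On bitstream truncation, the receiver simply decrypts and decodes as much of the prefix as it received, which is exactly the behavior FGS-style delivery needs.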
Given a bitstream S, the encryption operation is performed portion by portion, most often with a later portion encrypted based on an earlier portion, to allow partial decryption of the encrypted bitstream progressively. Using cipher block chaining or a stream cipher, a multimedia bitstream may be encrypted with progressive decoding capability, allowing partial decoding with optimized quality. One variation is to encrypt the base layer independently and the enhancement layer with progressive encryption that is either dependent on or independent of the base layer. Under the assumption that the base layer will always be received without error, on bitstream truncation, partial decryption and hence decoding of the enhancement layer will provide sub- or near-optimized quality at a given bit rate.

Discussion

Scalable cryptography schemes offer a means for multilevel access control of multimedia content. A traditional security system commonly offers two
states: access authorization and access denial. With scalable cryptography algorithms, a single encrypted bitstream can support multiple levels of access authorization and hence multiple levels of rights management capability. For example, in a four-level access-controllable DRM system, we may specify access denial; general preview access authorization, which allows a preview of a sample or part of the movie in low resolution; high-quality preview access authorization, which grants club members a sneak preview of the content in high resolution; and full-content, full-quality access authorization. As an active research area, many issues need to be investigated to provide the best performance, security, scalability, and interoperability. The following are some of the issues to be taken into account when designing a scalable encryption algorithm for a DRM system:

- Minimum and maximum security levels: encryption should provide an adequate security level for a given application.
- System upgradability and renewability: the scheme should be easy to upgrade to a different encryption algorithm that provides a higher level of security for a different application, and should be easily renewable.
- Bit rate: encryption should preserve the size of the original bitstream.
- Computational complexity: encryption should add minimal decoding overhead, appropriate to the decoding device's processing power.
- Error resilience capability: encryption should not lower the error resilience capability of the original system.
- Channel adaptation capability: encryption should not compromise the original bitstream's channel adaptation capability and should preserve capabilities such as transcoding.
- Format-compliance capability: encryption should not compromise compliance with the original format.
- Tradeoff between security and computational complexity.
- Tradeoff between security and scalability.
- Tradeoff between security and bit rate or coding efficiency.
- Tradeoff between security and error resilience capability.
- Tradeoff between security and interoperability.
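The multilevel access control idea can be sketched by encrypting each quality layer under its own key, so that an access level is simply the subset of layer keys a user is licensed to hold. The layer names, the XOR keystream cipher, and the key-per-layer policy below are all illustrative assumptions:

```python
import hashlib

LAYER_ORDER = ["preview_low", "preview_high", "full"]   # hypothetical levels

def _xor(key: bytes, data: bytes) -> bytes:
    # counter-mode keystream cipher; XOR makes it self-inverse
    out, ctr = bytearray(), 0
    while len(out) < len(data):
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(b ^ k for b, k in zip(data, out))

def protect(layers: dict, keys: dict) -> dict:
    # one independent key per layer of the single encrypted bitstream
    return {name: _xor(keys[name], data) for name, data in layers.items()}

def render(protected: dict, granted: dict) -> list:
    # Decode layers in quality order, stopping at the first locked layer;
    # the user's access level is determined purely by the keys granted.
    out = []
    for name in LAYER_ORDER:
        if name not in granted:
            break
        out.append(_xor(granted[name], protected[name]))
    return out
```

Granting no keys models access denial, a subset of keys models the preview levels, and the full key set models full-quality access, all from the same encrypted bitstream.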
11.3.2.2 Public Key Watermarking Systems

Public Key versus Private Key Watermarking Schemes

In classic content protection systems using secure digital watermarking, private key (also called symmetric-key-based) schemes are used. That is, the key used for embedding, Ken, and the key used for retrieval, Kde, of the watermark w are identical: Ken = Kde. In other words, decoding is not public: it makes use of the encoding key Ken to detect the embedded
watermark. The decoder has all the critical information about the watermark needed for correct watermark decoding. This implies that the decoder has to be trusted and highly secure. For applications that cannot meet these conditions, private-key-based schemes potentially allow the embedded watermark to be destroyed or damaged via watermark subtraction, and fake watermarks to be embedded with the same key Ken. Hence, public-key-based watermarking schemes have been studied, in which the encoder key Ken and the decoder key Kde are different (Ken ≠ Kde) and the encoding key Ken cannot be calculated from the decoding key Kde. Public-key-based algorithms are also called asymmetric key algorithms. Figure 11.8 illustrates general private key (a) and public key (b) watermarking schemes.

Some Public Key Watermarking Algorithms

Hartung and Girod [37] proposed one of the first public key robust watermarking algorithms for copy protection applications. It is designed especially for spread-spectrum watermarking schemes. The pseudorandom sequence Pen used for watermark embedding is partitioned, and a part of it, together with arbitrary random values replacing the rest of Pen, is used for watermark decoding; that is, a partial encoding key is used for decoding. Let Pen = P1 + P2, and assume that P0 ≠ P2 and that the length of P0 equals that of P2, |P0| = |P2|. Then Pde = P0 + P1. In the paper they suggest that on average every nth coefficient, n > 2, is used to construct the decoding pseudorandom sequence Pde. It is easy to see that the decoding key is a function of the encoding key, and care has to be taken that Pen cannot easily be calculated from Pde. Various public key watermarking algorithms have been proposed since then. The one-way signal processing [38], Legendre sequence [39], and eigenvector [40] public key watermarking algorithms are several well-known ones.
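The partial-key detection idea in Hartung and Girod's scheme can be sketched numerically as follows. The signal length, embedding strength, and even/odd split are illustrative choices, not the parameters of Ref. 37:

```python
import random

N, ALPHA = 4000, 1.0
rng = random.Random(1)
host = [rng.gauss(0.0, 1.0) for _ in range(N)]        # host signal x
pen = [rng.choice((-1.0, 1.0)) for _ in range(N)]     # embedding key Pen
marked = [x + ALPHA * p for x, p in zip(host, pen)]   # spread-spectrum embedding

# Decoding key Pde: keep only part of Pen (here the even positions) and
# fill the rest with fresh random chips, so Pde reveals only half of Pen.
rng2 = random.Random(2)
pde = [p if i % 2 == 0 else rng2.choice((-1.0, 1.0))
       for i, p in enumerate(pen)]

def detected(signal, key, threshold=N / 4.0):
    # Correlation detector: the shared half of the chips contributes
    # roughly ALPHA * N/2, far above the noise floor.
    return sum(s * k for s, k in zip(signal, key)) > threshold
```

With sizes of this order, detection with the partial key succeeds on the marked signal but not on the unmarked host, even though pde contains only half of the embedding key.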
Unlike Hartung's algorithm, these schemes do not require partial knowledge of the embedding key Ken, which is either a function of the watermark w or equal to w. Furon and Duhamel [38] make use of the power density spectrum (PDS) function: a permutation is performed on the host signal x such that the PDS of x is flat.

Figure 11.8 General (a) private key and (b) public key watermarking schemes.

Public watermark
detection is based on the specific shape of the PDS of the received signal x′. In contrast, van Schyndel et al. [39] described an asymmetric watermarking scheme based on a length-N Legendre sequence used as the watermark w. The watermark is detected at the decoder using the correlation between the received signal x′ and its conjugate Fourier transform. Eggers et al. [40] extended the idea of van Schyndel et al. [39]: assume an N × N matrix A and a watermark vector w with Aw = λ0·w. The watermark is detected by correlating the received signal x′ with its transformed signal Ax′. Since the embedding key is not needed for watermark detection at the decoder, these schemes are considered public key watermarking schemes. Eggers et al. [41] and Craver and Katzenbeisser [42] discussed the vulnerability of existing public key watermarking schemes; it was found that none of the aforementioned schemes is sufficiently robust against malicious attacks. Will public key watermarking lead to secure public watermark detection? The challenge is to detect the presence of the watermark (1) using a (public) key that is different from the one used for embedding the watermark, (2) without revealing enough information to destroy the watermark, (3) without making it possible to forge a valid watermark for a different signal or digital object, and (4) with decoding being computationally feasible. Perhaps the development of a theoretical foundation for public key watermarking schemes is needed. Public key fragile watermarking schemes have also been proposed. Basically, an asymmetric algorithm is used to encrypt the hash value h = H(O) to generate a digital signature S = EncKen(h). A public key is used to decrypt the received digital signature S′. Letting ĥ = DecKde(S′) and h′ = H(O′), the authenticity of the received object O′ is verified if ĥ = h′.
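The signature-based verification just described can be sketched with textbook RSA over deliberately tiny primes. This is for illustration only: a real system would use a standard signature scheme, and where to embed S in the object is a separate question.

```python
import hashlib

# Toy RSA key pair (insecure sizes, illustration only).
p, q = 1009, 1013
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (Python 3.8+)

def H(obj: bytes) -> int:
    # hash of the object, reduced into the RSA modulus range
    return int.from_bytes(hashlib.sha256(obj).digest(), "big") % n

def sign(obj: bytes) -> int:
    # S = EncKen(h): the embedder signs the hash with the private key
    return pow(H(obj), d, n)

def verify(obj: bytes, sig: int) -> bool:
    # ĥ = DecKde(S′) via the public key; authentic iff ĥ == h′ = H(O′)
    return H(obj) == pow(sig, e, n)
```

Any modification of the object changes h′ and breaks the check, which is exactly the fragile (tamper-detecting) behavior wanted here.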
Studies of public key fragile watermarking schemes have focused mainly on how to generate the authentication value, where to embed the digital signature, and which public key encryption algorithm to use for a tamperproof system. Barreto et al. [43] show that using a nondeterministic signature, where each individual signature depends not only on the hashing function but also on some randomly chosen parameters, is more secure than previous approaches using deterministic signatures, such as those proposed in Refs. 44 and 45.

A Dual Watermarking–Fingerprinting System

An ideal asymmetric key watermarking scheme with a single embedding key Ken and a single decoding key Kde ≠ Ken shall limit an adversary's ability to recreate the original content from the watermarked content. This, however, may not be enough for certain applications, such as multicast applications where a server sends the object to multiple clients: if the adversary fully controls any client, it can interfere with the communication from the server to all clients. To solve this problem, Kirovski et al. [46] proposed a public key algorithm that provides multiple distinct decoding keys, all different from the encoding key, which is more suitable for multicast applications. The algorithm is likewise designed for spread-spectrum-based digital watermarking schemes. Let C = {c_ij} denote an m × N matrix, where each c_ij ∈ R is a zero-mean random variable with standard deviation σ. The ith watermark decoding key is generated as Kde_i = Ken + c_i, where the encoder key Ken is hidden in every copy of the different decoding keys Kde_i such
that knowledge of K_de^i does not deterministically imply knowledge of K_en, as long as σ is large enough. An application of this public key algorithm to a dual-watermarking/fingerprinting system has also been described [46]. The watermark detector is straightforward: the received (test) signal x′ is correlated with K_de^i, and the detector declares the signal marked if D_w = x′ · K_de^i > δ, a detection threshold. The fingerprint detector is used to detect whether the ith client is part of a collusion. Unlike the watermark detector, the fingerprint detector has access to the watermark carrier matrix C. It computes the correlation between c_i, the suspect watermark carrier, and (x′ − x): D_i = (x′ − x) · c_i, where x is the originally marked signal, to detect the compromised decoding key. Based on their study, the system achieves content protection with collusion resistance of up to 900,000 users for a 2-hour high-definition video.
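The two detectors just described can be sketched numerically. The signal model, sizes, and thresholds below are invented for illustration; real systems operate on perceptual transforms of the content, not raw Gaussian samples.

```python
import random

random.seed(1)
N, M = 10_000, 8       # signal length and number of clients (toy sizes)
SIGMA = 5.0            # carrier std dev; large sigma hides K_en inside K_de^i

k_en = [random.gauss(0, 1) for _ in range(N)]              # encoding key K_en
carriers = [[random.gauss(0, SIGMA) for _ in range(N)]     # rows c_i of C
            for _ in range(M)]
k_de = [[k + c for k, c in zip(k_en, ci)]                  # K_de^i = K_en + c_i
        for ci in carriers]

def corr(a, b):
    """Normalized correlation between two equal-length signals."""
    return sum(x * y for x, y in zip(a, b)) / N

host = [random.gauss(0, 1) for _ in range(N)]   # original content (assumed model)
marked = [x + k for x, k in zip(host, k_en)]    # spread-spectrum embedding

# Watermark detector: any client's decoding key correlates with the mark.
print(corr(marked, k_de[0]) > 0.5)    # True: marked copy detected
print(corr(host, k_de[0]) > 0.5)      # False: unmarked copy

# Suppose client 3's key is compromised and used to strip the mark.
attacked = [a - k for a, k in zip(marked, k_de[3])]

# Fingerprint detector: it knows C and the marked signal x, and correlates
# (x' - x) with each carrier c_i to find the compromised key.
diffs = [a - m for a, m in zip(attacked, marked)]
suspects = [i for i in range(M) if abs(corr(diffs, carriers[i])) > 5.0]
print(suspects)                       # [3]
```

Note the asymmetry the scheme relies on: the watermark detector needs only a single decoding key, while the fingerprint detector needs the full carrier matrix C and the originally marked signal.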
11.3.2.3 Efficient Key Management for Multicast in the Mobile Environment
Key management can be the weakest point of a digital rights management system. Mobility complicates multicast key management: in addition to supporting dynamic group membership, the system must allow members to move between networks while staying in the same session. The mobility of a group member across different time zones adds a further dimension of complexity. Additional security threats include unauthorized creation, alteration, destruction, and illegitimate use of content by a mobile member who accumulates information for each area that he or she visits. Efficient key management and distribution can address many security concerns for mobile content multicast. In general, the following criteria are commonly used to evaluate the efficiency of key management schemes:
. Scalability—the ability to handle dynamic membership (scale group size) without considerable performance deterioration
. Rekeying efficiency—affected by the frequency of rekeying, the data transmission delay due to rekeying, the number of members affected, and the total number of key update messages that need to be sent
. Storage requirement—the memory overhead for key management at mobile terminals and group managers
Hierarchical structures are widely used for scalable multicast services. In general, there are two types of scalable rekeying solutions, namely, approaches based on a logical key hierarchy (LKH) and those based on a group management hierarchy (GMH). Partitioning a multicast group into subgroups localizes the effect of membership changes and allows differentiated key management techniques. When members are mobile, hierarchical group key management works better than the centralized group key management used in LKH-based approaches. In hierarchical group management, several subgroup managers or area key distributors are used so that mobile members can get access to new group keys as long as they are "near" one of the subgroup
managers. The use of hierarchical key management adds a certain complexity to the key management system. For instance, the mobility of a subgroup manager or area key distributor may result in substantial changes in the hierarchical topology, which must be addressed to ensure end-to-end performance. One common drawback of hierarchical frameworks is the substantial resource overhead required to manage the multicast group. Several interarea key distribution algorithms for group key management in wireless and mobile environments have been introduced in the IETF [47]:
. Baseline Rekeying. This treats mobility across areas as a "leave" from the old area followed by a "join" to the new area. The disadvantage is that data transmission is unnecessarily interrupted twice during a transfer between areas, because the system does not distinguish a leaving member from a moving member. Hence the mobility of a single member affects all the members in the domain.
. Immediate Rekeying. This extends the baseline algorithm by adding explicit semantics for a handoff between areas. Each area updates the local area key, but not the data key, on receiving notification of a member's move. Data transmission is uninterrupted.
. Delayed Rekeying. This reduces repeated local rekeying by postponing local rekeying until a particular criterion is satisfied. Members moving between areas may accumulate multiple area keys and reuse them upon returning. An "extra key owner list" is maintained by each area key distributor to check against those returning members. The impact of member mobility is reduced at the cost of increased semantics.
A sample k-ary key management tree is illustrated in Figure 11.9, where the multicast group has nine members and three hierarchies.
11.3.2.4 Multimedia Content Verification and Error Concealment
Mobile content distribution is subject to error-prone communication, a characteristic of wireless networks.
The task of authentication is to verify the authenticity of the target object or bitstream. Conventional cryptographic systems provide complete verification, such that even a one-bit change gives a "negative" outcome. To provide seamless content access, transcoding, which preserves the content but not the bits, is often used. For this kind of application, it is obvious that complete verification cannot provide sufficient authentication capability. Furthermore, with error-prone communication for real-time and streaming media applications, where transmission time is critical, retransmission for error-free delivery is impossible. Content or semantic verification with error recovery capability offers a moderate level of security for such applications in mobile communication. First, features C = f(O), which preserve the important original content information, are extracted from the original multimedia bitstream. Next, C is authenticated using secure cryptographic algorithms, D = Enc_K(h) = Enc_K(H(C)), to generate a digital signature D with key K. If D is used for authentication, content verification
Figure 11.9 A k-ary key management tree for nine members with three hierarchies (root key Kroot; subgroup keys KB, Kb, and Kc; individual keys K1–K9 for members M1–M9).
instead of complete verification is performed. Assume that C′ is extracted from the received bitstream O′, and that D′ = Enc_K(h′) = Enc_K(H(C′)). If D = D′, then O′ is said to be content-authentic. Ideally, C should be invariant to any content-preserving transformations and robust against a certain amount of bit errors. At the same time, C should be sensitive to any content-varying alterations. One interesting approach is to use digital watermarking to achieve authentication and error concealment at the same time:
. Assume the source object (multimedia plaintext bitstream) O = O1 + O2 + ... + OM.
. Let Ci = f(Oi) be an invariant feature of Oi.
. Authenticate Ci to get Di = Enc_K(h) = Enc_K(H(Ci)).
. Embed Di into Oj, with i ≠ j and the ith and jth subbitstreams sufficiently far apart that the probability of both Oi and Oj being erroneous is low.
. At the decoder, the extracted Di is used to verify the authenticity of the received subbitstream O′i and to recover some significant content information of Oi in the event that Di ≠ D′i = Enc_K(h′) = Enc_K(H(C′i)). Theoretically, if the watermarking scheme is sufficiently robust, the extracted Di equals the embedded Di.
Most of the content verification schemes in the literature, with or without error concealment capability, explore low-level features extracted from a single domain, such as the frequency domain. Interested readers can refer to the literature [48–50]. What makes a feature C theoretically invariant and/or practically robust enough for this application remains an active research area. Lin and Tseng [51] studied semantic-based video authentication using high-level features. A series of advanced multimedia technologies, processes of video shot and object segmentation, label annotation, concept modeling, classification,
watermarking, and digital signatures using a public key infrastructure (PKI), are deployed to combine semantic learning and security for semantic authentication. For multimedia applications in which the content may undergo various transformations that maintain the semantic meaning of the information but modify the bit representation, it is imperative to tie authentication to the semantics of the data; otherwise, routine processing of the content may interfere with DRM protection.
11.3.3 Summary
DRM in a mobile computing environment is challenging because of (1) vulnerability to many types of attacks, such as intrusion and eavesdropping; (2) error-prone communication channels; (3) insufficient network bandwidth; (4) dynamic changes in mobile device location; and (5) the limited processing power, memory, and battery lifespan of mobile devices, which can make strong encryption schemes computationally infeasible. DRM solutions defined in many standards and currently available products should not be considered completely secure, if in fact complete security is ever needed for commercial viability. Indeed, many systems today are designed on the basis of the mantra "keep honest people honest." Although a content protection system for DRM, no matter how strong or sophisticated, will always be vulnerable to some degree of attack from hackers and pirates, it must be sufficiently robust that it can protect the commercial value of the content from widespread attacks, as well as encourage appealing business models for all parties in the distribution chain. Thus, there still exist a great many technical, legal, and business challenges in adopting DRM. However, we believe that early discussion, design, development, and deployment in the context of mobile content delivery will ultimately ensure the existence of MDRM solutions at the core of all digital content services, enabling seamless convergence of digital content access across all networks and devices and providing reliable, convenient, and secure distribution for the fair benefit of all.
REFERENCES

1. W. Stallings, Network Security Essentials: Applications and Standards, Prentice-Hall, 2000.
2. IBM Denial-of-Service—Alert and Response, http://www-1.ibm.com/services/continuity/recover1.nsf/mss/DoS.
3. R. Iannella, Digital rights management (DRM) architectures, D-Lib Mag., 7(6) (June 2001).
4. J. Duhl and S. Kevorkian, Understanding DRM Systems, IDC Whitepaper, Technical Report, Intertrust, 2001.
5. AES Home Page, http://csrc.nist.gov/CryptoToolkit/aes/.
6. M. Schneider and S.-F. Chang, A robust content based digital signature for image authentication, Proc. IEEE Int. Conf. Image Processing, 1996, Vol. 3, pp. 227–230.
7. D. Kundur and D. Hatzinakos, Digital watermarking for telltale tamper-proofing and authentication, Proc. IEEE (special issue on identification and protection of multimedia information), 1167–1180 (July 1999).
8. J. A. Bloom, I. J. Cox, T. Kalker, J.-P. Linnartz, M. L. Miller, and C. B. S. Traw, Copy protection for DVD video, Proc. IEEE (special issue on identification and protection of multimedia information), 1267–1276 (July 1999).
9. A. Steinacker, A. Ghavam, and R. Steinmetz, Metadata standards for web-based resources, IEEE Multimedia, 8(1): 70–76 (Jan.–March 2001).
10. N. Paskin, Toward unique identifiers, Proc. IEEE (special issue on identification and protection of multimedia information), 1208–1227 (July 1999).
11. J. Feigenbaum, M. J. Freedman, T. Sander, and A. Shostack, Privacy engineering for digital rights management systems, Proc. ACM CCS-8 Digital Rights Management Workshop, 2001, pp. 76–105.
12. Y. Chawathe, Scattercast: An Architecture for Internet Broadcast Distribution as an Infrastructure Service, Ph.D. thesis, Univ. California, Berkeley, Dec. 2000.
13. M. Sreetharan and R. Kumar, Cellular Digital Packet Data, Artech House, 1996.
14. A. O. Waller, G. Jones, T. Whitley, J. Edwards, D. Kaleshi, A. Munro, B. MacFarlane, and A. Wood, Securing the delivery of digital content over the Internet, IEE Electron. Commun. Eng. J., 14(5): 239–248 (Oct. 2002).
15. R. Mori and M. Kawahara, Superdistribution: The concept and the architecture, Trans. IEICE (special issue on cryptography and information security), E73(7) (July 1990).
16. D. M. Wallner, E. G. Harder, and R. C. Agee, Key Management for Multicast: Issues and Architecture, Technical Report, Internet Draft, Sep. 1998.
17. C. K. Wong and S. S. Lam, Digital signatures for flows and multicasts, IEEE/ACM Trans. Network., 7(4): 502–513 (Aug. 1999).
18. R. Canetti, J. Garay, G. Itkis, D. Micciancio, M. Naor, and B. Pinkas, Multicast security: A taxonomy and some efficient constructions, Proc. IEEE INFOCOM, March 1999, Vol. 2, pp. 708–716.
19. B. Pinkas, Efficient state updates for key management, Proc. ACM CCS-8 Digital Rights Management Workshop, 2001, pp. 40–56.
20. DWS, www.dwsco.com.
21. JTC1/SC29/WG11 N3943, Intellectual Property Management and Protection in MPEG Standards, Technical Report, ISO/IEC, Jan. 2001.
22. G. Kesden, Introduction on Content Scrambling System, Technical Report, CMU, Dec. 2000.
23. JTC1/SC29/WG11 N4269, Information Technology—Multimedia Framework (MPEG-21) Part 4: Intellectual Property Management and Protection, Technical Report, ISO/IEC, July 2001.
24. ContentGuard XrML 2.0 Technical Overview, http://www.xrml.org.
25. R. Aravind, M. R. Civanlar, and A. R. Reibman, Packet loss resilience of MPEG-2 scalable video coding algorithms, IEEE Trans. Circuits Syst. Video Technol., 6(10) (Oct. 1996).
26. H. Gharavi and M. H. Partovi, Multilevel video coding and distribution architectures for emerging broadband digital networks, IEEE Trans. Circuits Syst. Video Technol., 6(10) (Oct. 1996).
27. W. Li, Overview of fine granularity scalability in MPEG-4 video standard, IEEE Trans. Circuits Syst. Video Technol., 11(3) (March 2001).
28. I. Agi and L. Gong, An empirical study of MPEG video transmissions, Proc. Internet Society Symp. Network and Distributed System Security, San Diego, CA, Feb. 1996.
29. I. Agi and L. Gong, Security enhanced MPEG player, Proc. IEEE 1st Int. Workshop on Multimedia Software Development, March 1996.
30. L. Qiao and K. Nahrstedt, Comparison of MPEG encryption algorithms, Int. J. Comput. Graph., 22(3) (1998).
31. C. Shi, S.-Y. Wang, and B. Bhargava, MPEG video encryption in real-time using secret key cryptography, Proc. Int. Conf. Parallel and Distributed Processing Techniques and Applications, 1999.
32. C.-P. Wu and C.-C. J. Kuo, Fast encryption methods for audiovisual data confidentiality, Proc. SPIE, Multimedia Systems and Applications III, A. G. Tescher, B. Vasudev, and V. M. Bove, eds., Nov. 2000, Vol. 4209.
33. A. Servetti and J. C. De Martin, Perception-based selective encryption of G.729 speech, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, May 2002.
34. A. Pommer and A. Uhl, Selective encryption of wavelet packet subband structures for secure transmission of visual data, Proc. Multimedia and Security Workshop, ACM Multimedia, Dec. 2002.
35. J. Wen, M. Severa, W. Zeng, M. Luttrel, and W. Jin, A format-compliant configurable encryption framework for access control of video, IEEE Trans. Circuits Syst. Video Technol., 6(13): 545–557 (2002).
36. H. H. Yu and X. Yu, Progressive and scalable encryption for multimedia content access control, Proc. IEEE Int. Conf. Communications, May 2003.
37. F. Hartung and B. Girod, Fast public-key watermarking of compressed video, Proc. IEEE Int. Conf. Image Processing, 1997, Vol. 1, pp. 528–531.
38. T. Furon and P. Duhamel, An asymmetric public detection watermarking technique, Proc. Workshop on Information Hiding, Dresden, Germany, Oct. 1999.
39. R. G. van Schyndel, A. Z. Tirkel, and I. D. Svalbe, Key independent watermarking detection, Proc. IEEE Int. Conf. Multimedia Computing and Systems, Florence, Italy, June 1999.
40. J. J. Eggers, J. K. Su, and B. Girod, Public key watermarking by eigenvectors of linear transforms, Proc. Eur. Signal Processing Conf., Tampere, Finland, April 2000.
41. J. J. Eggers, J. K. Su, and B. Girod, Asymmetric watermarking schemes, Proc. Sicherheit in Mediendaten, GMD Jahrestagung, 2000.
42. S. Craver and S. Katzenbeisser, Security analysis of public-key watermarking schemes, Proc. SPIE, Mathematics of Data/Image Coding, Compression and Encryption IV, July 2001, Vol. 4475.
43. P. S. L. M. Barreto, H. Y. Kim, and V. Rijmen, Toward a secure public-key blockwise fragile authentication watermarking, Proc. IEEE Int. Conf. Image Processing, Sep. 2001.
44. P. S. L. M. Barreto, H. Y. Kim, and V. Rijmen, Image authentication and integrity verification via content-based watermarking and public key cryptosystem, Proc. IEEE Int. Conf. Image Processing, 2000, Vol. 3, pp. 694–697.
45. P. W. Wong and N. Memon, Secret and public key watermarking schemes for image authentication and ownership verification, IEEE Trans. Image Process., 10(10) (2001).
46. D. Kirovski, H. Malvar, and Y. Yacobi, Multimedia content screening using a dual watermarking and fingerprinting system, Proc. ACM Multimedia, Juan Les Pins, France, Dec. 2002.
47. L. Dondeti, B. Decleene, S. Griffin, T. Hardjono, J. Kurose, D. Towsley, C. Zhang, and S. Vasudevan, Group Key Management in Wireless and Mobile Environment, Technical Report, Internet Draft, Jan. 2002.
48. C. Y. Lin, D. Sow, and S.-F. Chang, Using self-authentication and recovery for error concealment in wireless environments, Proc. SPIE, Multimedia Systems and Applications IV, A. G. Tescher, B. Vasudev, and V. M. Bove, eds., Aug. 2001, Vol. 4518.
49. P. Yin, B. Liu, and H. Yu, Error concealment using data hiding, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Salt Lake City, UT, May 2001.
50. P. Yin and H. Yu, A semi-fragile watermarking system for MPEG video authentication, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Orlando, FL, May 2002.
51. C.-Y. Lin and B. L. Tseng, Segmentation, classification and watermarking for multimedia semantic authentication, Proc. Int. Workshop on Multimedia Signal Processing, Virgin Islands, Dec. 2002.
CHAPTER 12
CHARGING FOR MOBILE CONTENT

DAVID BANJO
Nokia Networks, Helsinki, Finland
12.1 INTRODUCTION
Other chapters of this book have focused on describing mechanisms for creating, managing, and delivering mobile content. The underlying assumption is that the consumers of such content are paying to receive or access it. In this chapter we consider the mechanisms that enable the providers of mobile content services to charge for them. Charging for mobile content requires consideration of two aspects: payment for the content (the goods) and payment for the use of the mobile domain (e.g., the wireless bearer) and the mobile context (e.g., the location and "presence" of the consumer, and the personal, trusted status of the mobile device). Charging for mobile content services must be able to distinguish and apply the true value of the content to the consumer, which might differ considerably from the value of that content to the user over a fixed (home or work) Internet connection. Therefore, the charging mechanisms that are utilized should be able to differentiate between, and separately charge, the content-related price and the access- or mobility-related fees to the end user. Charging is an area that is critical to the success of the mobile content business. If users of mobile content services are not able to understand and accept the way that content is charged, they will not use the services, and the "mobile Internet explosion" will at best be significantly delayed. Emphasis must be given to identifying the charging models that are needed to support each type of mobile content usage, and to developing the charging mechanisms and systems needed to enable those charging models. In this chapter we will explore the mechanisms that have been developed for charging within the telecom domain, as these will generally provide the starting point for operators to deploy a mobile charging infrastructure. We will go on to examine how these mechanisms can be adapted to mobile content charging, and look at some of the challenges that are posed in this area.
Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu ISBN 0-471-46618-2 Copyright # 2004 John Wiley & Sons, Inc.
12.1.1 The Scope of "Charging"
First it might be useful to define what we mean by charging in this chapter, and to make some distinctions between various related terms. Questions frequently raised in this general area are:
. What is the difference between charging and payment?
. Aren't charging and billing the same thing?
. Are we talking about charging or accounting here?
Although, to the author's knowledge, there is no established definition that separates charging and payment, for the purposes of this chapter we will apply the following interpretations of the terms. The 3GPP (Third-Generation Partnership Project) definition of charging [1] is "A function whereby information related to a chargeable operation is formatted and transferred in order to make it possible to determine usage for which the charged party may be billed." Payment has been described within the OMA (Open Mobile Alliance) as "a mechanism by which funds are moved from the customer to the merchant in exchange for goods and/or services. It may be on a per transaction basis, or may be aggregated over a number of transactions." A distinction that can be made is that payment involves the explicit interaction of the end user in a transaction, whereas this is not necessarily the case in charging. A mobile payment (m-commerce) transaction may result in charging; for example, where the user has opted to pay via his/her operator bill instead of, say, supplying credit card details. So, we will align our focus on charging with 3GPP's definition, concentrating on the generation, formatting, and transfer of data records that provide details relating to the usage of a service or the consumption of an item of content. These records can be generated from a variety of sources connected with the delivery of the service, utilizing a number of different mechanisms, which we will explore later.
The purpose of generating these data records is to facilitate the identification and apportionment of the various revenue transactions that can be related to the service; for example, the consumer of the service paying the provider of the service, or the provider of the service paying the provider of the content, who in turn pays the owner of the content. These charging records may ultimately be presented to, for example, billing systems through which invoices can be generated for payment (unless the subscriber has a prepaid charging relationship with the mobile network operator). This chapter does not explore the billing process. For instance, bill or invoice presentation, settlement, clearing, taxation, and general ledger entries are not in the scope of charging. In scope are the generation of charging information and the transformation, normalization, and correlation of usage data. Charging can also involve "prerating," the assignment of ratable attributes to the charging record that allow the appropriate values (e.g., currency) to be assigned against the appropriate accounts
[e.g., subscriber, business-to-business (B2B) partner]. The management of the impact of charging events on a prepaid account balance is generally seen as within the charging domain (however, the recharging or "topping up" of the prepaid balance is not considered within the charging scope). From the description above, charging can be understood to be closely related to accounting. Charging is often seen as the telecom(munication) world's equivalent of accounting in the Internet sphere. The scope of accounting can, however, be seen to be wider than that of charging, as it also deals with the collection of usage data for nonbilling purposes (usage analysis, audit, etc.). For the purposes of this chapter, we can assume that charging and accounting are synonymous.
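The "prerating" step mentioned above can be sketched as attaching a price and a chargeable account to a raw usage record before it reaches billing. The tariff table, field names, and rates below are invented for illustration:

```python
# Minimal prerating sketch: attach ratable attributes (price, account) to a
# raw usage record. Tariffs are hypothetical, in eurocents per unit.
TARIFF = {  # (service, peak?) -> rate per unit
    ("voice", True): 10, ("voice", False): 4,
    ("content", True): 50, ("content", False): 50,  # flat content price
}

def prerate(record: dict) -> dict:
    """Return a copy of the record with amount and account filled in."""
    peak = 8 <= record["start_hour"] < 18          # assumed peak window
    rate = TARIFF[(record["service"], peak)]
    rated = dict(record)
    rated["amount"] = rate * record["units"]
    rated["account"] = record["subscriber"]        # who gets charged
    return rated

cdr = {"subscriber": "358401234567", "service": "voice",
       "start_hour": 14, "units": 6}               # e.g., 6 charged minutes
print(prerate(cdr)["amount"])                      # 60 (eurocents)
```

A real prerating stage would also normalize and correlate records from multiple sources, as described above, before assigning values to accounts.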
12.2 FIXED-LINE TELEPHONY CHARGING
In this section we will overview some of the mechanisms that have been employed by telecom operators to charge users, initially within fixed-line networks. (We assume in this chapter that the mobile user is a subscriber of a mobile network operator. However, this does not necessarily mean that the network operator is offering the content or service used; a Web services provider might, for example, supply the content.) In a mobile content provisioning scenario the telecom operator generally holds the primary relationship with the mobile user. To enable the billing of users for a variety of services that have evolved through a sequence of technological advances, the operator has typically developed a complex charging and billing infrastructure. While such an infrastructure is often unsuited to the needs of content charging, it represents a legacy system with which integration is required. Operators have invested significantly in such systems. They cannot be replaced overnight and must be expected to evolve, somewhat slowly, toward the requirements of content charging. Person-to-person voice calling was the first service offered by PSTN (public switched telephone network) operators. Charging for voice calls was facilitated by counting the number of "pulses" generated per call within the telephony exchanges involved in the setup, routing, and control of these calls. The number of pulses generated per call varied in relation to the location of the called number. In the early days of billing, engineers took photographs of the settings on racks of pulse meters in order to identify the change (total number of pulses) in a subscriber's usage since the previous billing period! With the advent of digital exchanges (or "switches"), additional sophistication was introduced to charging through the ability to extract usage data from the switches that manage the connections.
These charging records [call detail records (CDRs)] identified numerous parameters associated with a call (e.g., the calling number, called number, start and end times). CDRs were initially extracted by writing to magnetic or optical media, which had to be physically transported to a
billing center. These days the automated collection of CDR files via the operator's own network has become the norm. It can be relevant to consider simple voice calls as content. From a charging perspective, a voice call includes most of the attributes that are pertinent to content:
. A service deliverable (i.e., voice telephony) that represents a quantifiable value to the end user
. Recognition of the roles of other players in the value chain, from a revenue sharing perspective
An example of fixed-network CDR contents would be as follows:

Calling party: 01023422315
Called party: 05564123444
Call start: 03032003:101007
Call end: 03032003:101703
Recording unit: 22
Record type: 01
Egress circuit: 244
Egress port: 37
From the example voice call CDR above, these aspects can be derived by CDR analysis within typical charging and billing systems:
. The "record type" indicates that this is a normal outgoing voice call (the service), which will be billed to the calling party (the user).
. The called number identifies the destination, which, in relation to such attributes as the calling number and the time and duration of the call, enables the calculation of the retail value of the call (i.e., the price the user will pay for the content).
. The "circuits" and "ports" used in the connection (combined with the retail attributes) support the computation of the charges due to the network operator through which the call was connected.
Additionally, some of the complexities associated with current content charging were originally presented in charging for voice telephony. Where multiple physical elements (e.g., telephony switches) are involved in the delivery of a service, there is a need to determine which element(s) to use as a source of billing information; for example, whether to use the switch to which the subscriber connected, a transit switch, or an interconnecting switch to other networks. (The "recording unit" in the earlier example might indicate that this switch is one that interconnects directly with a specific network.) In Figure 12.1, operator A's charging and billing system would use the residential switch as the source of billing information for subscribers attached to that residential
Figure 12.1 Different switches in a PSTN network (operator A's residential, transit, and interconnect switches; operator B's interconnect switch; and operator A's charging system).
switch and the interconnect switch as the source of interoperator settlement information with operator B. In addition to person-to-person voice calls, various value-added services (VASs) were gradually introduced to voice telephony. Examples of these include premium rate infotainment (number translation), friends and family, VPN, and closed user groups. In addition to providing an increased range of value to the end user, such
services often introduce the concept of a third party in the value chain. For example, a business might offer an adult entertainment voice service that would be charged to the user at a specific premium rate via the user's phone bill, with a predetermined proportion of the charge paid by the operator to the service provider. Conversely, certain services would be provided to the user free of charge, and the service provider would instead be charged by the operator on a per-call basis, derived from analysis of the called number. We shall discuss revenue sharing later in this chapter. Such VASs are often provided using intelligent network (IN) features. IN has evolved from functionality integrated within telephony switches into a separately implemented and flexible service creation environment. As such, an IN system represents an additional source of charging information.
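The premium-rate revenue-sharing arrangement described above can be sketched as a simple settlement calculation; the split percentage is purely illustrative, not taken from any real agreement:

```python
# Hypothetical premium-rate settlement: the user pays the operator, and a
# predetermined share of the charge is passed to the service provider.
def settle_premium_call(user_charge_cents: int, provider_share: float = 0.6):
    """Split a premium-rate charge between provider and operator."""
    provider = round(user_charge_cents * provider_share)
    operator = user_charge_cents - provider
    return {"user_pays": user_charge_cents,
            "to_provider": provider,
            "operator_keeps": operator}

# A 3.00-euro premium call with an assumed 60/40 split.
print(settle_premium_call(300))
# {'user_pays': 300, 'to_provider': 180, 'operator_keeps': 120}
```

The reverse case in the text, where the provider pays the operator per call, is the same computation with the roles of payer and payee swapped.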
12.3 MOBILE TELEPHONY CHARGING
This section introduces some additional factors relevant to charging that have been developed within mobile telephony. The introduction of GSM (Groupe Spécial Mobile) networks in the early 1990s laid the foundation for mobile content. One of the success stories of GSM was the emergence of the short message service (SMS) as a bearer for content-based services. Originally envisaged as a simple text-to-person communication facility, SMS has developed into a conveyor of value-added content to individuals. It is fair to say that the user adoption and commercial potential of SMS were largely unpredicted within the industry. For this reason, the charging mechanisms developed around SMS did not adequately accommodate the fact that SMS messages might be charged differently depending on the inherent value of the content carried; for example, whether a received message (SMS-MT) contained a ringtone, a picture message, or a voicemail notification. While both the GSM MSC (mobile switching center) and the GPRS SGSN (serving GPRS support node) are also able to provide charging data related to SMS usage, only the SMS center (SMSC) can provide the application-level information needed to support content-based SMS billing. This underlines an important issue for content charging, to which we shall return later in this chapter: the selection of the source of charging data. The Wireless Application Protocol (WAP), which became available in the late 1990s, offers new possibilities for transporting and presenting content securely to mobile terminals. WAP still suffers badly from the adverse reaction to the tremendous hyperbole with which it was introduced, but it still has a significant role to play in the future of mobile content.
Significantly, from the charging perspective a WAP server presents an additional source of charging information: one that is able to identify attributes relevant to the content, such as whether the content was “pushed” or “pulled,” the content provider’s identity and pricing (if present), and the bearer used [2].
At the end of the 1990s GPRS (General Packet Radio Service) was introduced, adding packet-switching capability to an enhanced GSM network. The GPRS standards identified two new charging elements:

- The SGSN (serving GPRS support node), which provides charging data relating to the data packets delivered toward and from the mobile terminal, the mobility of the user, and SMS messages sent and received
- The GGSN (gateway GPRS support node), which provides charging data relating to the data packets delivered toward and from the 'Gi' interface (i.e., the Internet, or content sources)

In theory, the mobile Internet had then been realized. In practice, this was not quite the case. There were (and still are) issues to surmount regarding content adaptation and the availability of services geared specifically to mobile terminals. Most relevant to the subject of charging, however, is the issue of what pricing models should be used with mobile content. A significant fact about mobile Internet data is that, unlike fixed Internet data, it cannot yet be offered as a virtual commodity. The investments made by mobile network operators in radio networks and radio spectrum licenses vastly outweigh the investments made, for example, by legacy fixed-line operators and access providers. Mobile operators are therefore constrained to charge more for wireless access to the Internet than is charged for fixed access. It should be noted, however, that mobile operators can also offer additional value propositions to the end user, associated with mobility and context (location, presence, and so on); services that leverage these benefits have nevertheless been slow to emerge. An obvious implication is that the revenue propositions associated with the provision of mobile content cannot be the same as those applied to fixed Internet content, even for the same content.
However, it cannot be expected in all cases that the end user should pay significantly more for mobile content than for fixed Internet content. There must be a focused rebalancing of the value chain associated with mobile content provision. Importantly, mechanisms are needed to identify and correlate the charges for access to the mobile Internet and for the content itself. Both access and content are in many cases charged according to usage data recorded within separate elements, and correlation of this usage data is far from straightforward. These aspects associated with correlation are explored further later in this chapter.

12.4 ASPECTS PERTINENT TO MOBILE CONTENT CHARGING
This section examines various aspects that must be understood and considered in any study of mobile content charging. These include identifying who should be charged (not limited to the end user), what charging models should apply, what additional factors should be reflected in the charging, and from where the charging information should be obtained.
CHARGING FOR MOBILE CONTENT
A pertinent question when considering the necessary mechanisms for charging mobile content is "What is mobile content?" From the charging perspective we might define mobile content as something that can be delivered via the mobile Internet and that has a measurable value to the end recipient, as well as to other participants in the revenue chain. From this description alone it would be difficult to define the required charging mechanisms. However, we can extract from it certain key criteria that affect how charging should be supported:

- Mobility. This does not mean that the content itself is defined by its provision to mobile terminals. However, the charging for the content must be capable of recognizing certain additional aspects associated with delivery to a mobile terminal, such as location, presence, bearer, and roaming.
- Internet. By and large, content is delivered via IP (Internet Protocol). SMS and circuit-switched data are exceptions, but these are largely considered to be of decreasing importance in the longer-term development of mobile content.
- Delivery. There must be means to determine whether delivery of the content has succeeded or failed before definitive charging actions can be performed.
- Measurable value. The content should be classifiable and quantifiable by automated systems. What exactly is measured will vary depending on the type of content and the selected charging model: some content should be charged according to its size, some on the duration of the consumption session, and some on an explicit tariff code assigned by the content provider. Additionally, factors such as location or roaming, time of day, and quality of service should be able to influence the calculated charge.
- End recipient. Content is delivered to one or more end recipients. The identity of these end recipients (or mobile devices) should be resolvable within the charging process to determine the account or entity that should pay for the content.
- Other participants. Charging must recognize that the numerous players in the revenue chain should obtain a share of the revenues. The charging paradigms and enabling mechanisms may differ for each, as explained in later sections. Charging data should support as far as possible the identification and value apportionment of the relevant parties.

12.4.1 Revenue Chain
Figure 12.2 illustrates how complex the revenue flow can be in a typical model of the mobile Internet. However, not all of these associations will be present in any single context. Figure 12.3 provides an example of an individual context in which a mobile user has directly accessed a service provided by a service provider that is partnered with the
Figure 12.2 Possible revenue flow relationships.
user's network operator. The service provider has an arrangement to charge the user via operator billing, and the operator will settle with the service provider after the event. The service provider is responsible for redistributing the revenues toward the content aggregator, who in turn settles with the content copyright owner. The service provider will also receive payment for advertisements placed within the content or during the service discovery process (this helps reduce the cost of the content to the end user). Finally, the user is roaming on a foreign network. The (single) charge by the operator to the user should reflect the fee for the content, plus mobile network access, plus a roaming surcharge. The operator will settle with the visited network operator through the standard transferred account procedure.

12.4.2 Subscription Models

The subscription model offered by the service provider determines the mechanisms needed for content charging. Typical subscription and charging models include:

- Fixed subscription: for example, a monthly charge, with "all you can eat" usage.
Figure 12.3 Revenue sharing example.
- Limited subscription: the monthly charge includes a fixed amount of content consumption, with additional consumption charged under one of the other models.
- Event- or transaction-based charging: the charge is based on individual invocations of the service or content, for example per file or per message. The charge may differ depending on, for instance, the type or size of the content.
- Session-based charging: charging is based on metering during a continuous period of usage of the service or content. Examples are streamed services, which might be charged according to the amount of time the stream is used. (Such charging might also be based on an initial setup charge followed by per-minute charging metered over the session duration.)

In addition to these, certain other charging mechanisms can be utilized:

- Sponsorship. The charging of the user or content consumer may to a variable extent be reduced as a result of third-party participation. Examples
include a company paying (sponsoring) 50% of WAP usage by its employees, or a company sponsoring all users of specific services carrying inserted advertisements.
- Loyalty schemes. The user can accrue "loyalty points" associated with the usage of a service provider's offerings. These might equate to simple currency or service-usage credits, or might involve the accrual of abstract resources such as air miles awarded by a partner airline.

The mechanisms listed above have been described in the context of charging for the content alone. In reality, the mechanism will need to apply to a differentiated charging model that also takes into account the charging for access to the mobile network (which might itself vary depending on the access type). For example, the service provider might wish access charging to be active by default for browsing usage but disabled when selected sites are visited, or content charging to be reduced where UMTS is used as the access bearer (e.g., to promote greater take-up of UMTS subscriptions).
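As a rough illustrative sketch (not from the text: the model names, prices, and the way sponsorship is applied are all invented assumptions), the subscription models above might be expressed as:

```python
# Hypothetical sketch of the subscription/charging models described above.
# Prices and the sponsorship handling are illustrative assumptions.

def compute_charge(model, *, events=0, minutes=0, event_price=0.50,
                   setup_fee=0.20, per_minute=0.10, sponsor_share=0.0):
    """Return the charge to the user for one billing interaction."""
    if model == "fixed":                  # flat-rate, "all you can eat"
        charge = 0.0                      # individual usage is not charged
    elif model == "event":                # per file / per message charging
        charge = events * event_price
    elif model == "session":              # setup charge plus metered minutes
        charge = setup_fee + minutes * per_minute
    else:
        raise ValueError(f"unknown model: {model}")
    # Sponsorship reduces the user's charge by the sponsored fraction.
    return round(charge * (1.0 - sponsor_share), 2)
```

For instance, three event-charged downloads with a 50% sponsor would halve the user's charge relative to the unsponsored case.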
12.4.3 Postpaid and Prepaid Charging

Perhaps the most significant aspect of the subscription, from the viewpoint of the required charging mechanism, is whether it is prepaid or postpaid. In Europe it is estimated that on average more than 50% of mobile users are on prepaid subscriptions, with the figure approaching 90% in certain countries. It is in the interest of mobile network operators and service providers alike to ensure that the fullest range of mobile content and services is equally available to postpaid and prepaid subscribers. However, there are considerable complexities associated with prepaid charging. In a postpaid billing relationship the subscriber is considered by the charging party (i.e., the mobile network operator) to have an acceptable level of creditworthiness. Accordingly, the operator is satisfied to request payment retrospectively for services used during a billing period (typically a calendar month). Consequently, the generation of associated charging information does not carry a high urgency, and a daily collection(1) of charging information is generally sufficient to ensure that it can be rated and included within the upcoming subscriber-billing run. In prepaid billing the subscriber has, for various reasons (not necessarily related to creditworthiness), chosen to have his or her mobile telephony service usage limited by a prepaid account balance deposited with the charging party in advance. The balance can be "topped up" by the subscriber as required, to ensure there is sufficient credit on the prepaid account to cover the intended service usage. This model places a greater overhead on the
(1) In practice the frequency of collection is more often dictated by storage capacities on the CDR-generating elements.
Figure 12.4 Postpaid content charging.
delivery server, to ensure that there is sufficient credit on a subscriber's account before a specific service or content item is delivered, and to adjust the subscriber's prepaid balance in line with the actual service usage (recognizing whether the service or content delivery was successful). Prepaid charging for session-based services requires an "interim accounting" mechanism, in which the prepaid balance is incrementally debited during the session. Figures 12.4 and 12.5 illustrate the differences between the mechanisms required to support postpaid and prepaid charging. Currently it is popular to speak of offline and online charging methods, rather than prepaid and postpaid. These terms are, however, not directly interchangeable. 3GPP offers the following definitions:
Figure 12.5 Event-based prepaid content charging (simplified).
- Offline charging: a charging mechanism where charging information does not affect, in real time, the service rendered.
- Online charging: a charging mechanism where charging information can affect, in real time, the service rendered; a direct interaction of the charging mechanism with session/service control is therefore required.

A postpaying subscriber may elect to utilize a spend-limiting feature offered by an operator, for example, to place a limit of €40 per month on MMS messaging. In this case, the operator will possibly use an online charging mechanism specifically against this subscriber's service, in order to check that the balance accumulated during the given period does not exceed the defined threshold.
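The spend-limit scenario above amounts to an online authorization check before each delivery. A minimal sketch (class and field names are invented; amounts are tracked in cents to avoid floating-point drift):

```python
# Hypothetical sketch of an online spend-limit check: each charge is
# authorized only while the period's accumulated spend stays within the
# subscriber's configured threshold. Names and units are assumptions.

class SpendLimiter:
    def __init__(self, monthly_limit_cents):
        self.limit = monthly_limit_cents
        self.accumulated = 0

    def authorize(self, charge_cents):
        """Online check: accept the charge only if it fits under the limit."""
        if self.accumulated + charge_cents > self.limit:
            return False          # service control must deny delivery
        self.accumulated += charge_cents
        return True
```

With a €40.00 limit (4000 cents), a request that would push the accumulated balance past the threshold is refused before delivery, which is exactly the interaction with service control that distinguishes online from offline charging.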
12.4.4 Business-to-Business (B2B)

In addition to enabling the billing of end users, charging is responsible for supporting the business-to-business (B2B) settlements associated with the delivery of mobile content. This includes recognition of the fees owed to or from the third parties involved in the creation or aggregation of the content, and to other network operators with whom it was necessary to interconnect in order to deliver the content. In the simplest case, third-party charging requires quantitative identification of the specific content or service aggregated, and of the service provider involved. The service provider is in turn responsible for settlements with the other parties in the content provision chain. Settlement periods for B2B charging are typically monthly or quarterly; online accounting is rarely a requirement. Wherever a user involved in the content delivery is located (permanently or temporarily) on a network other than the charging party's, it can be assumed that interoperator charges need to be levied between the parties, based primarily on the traffic (within defined categories) that is passed between the networks. Currently such settlements are made periodically through the interchange and reconciliation of usage data between the two parties. It is a general principle that each party engaging in a B2B relationship will maintain its own charging data. The difficulty of reconciling such data, usually maintained in disparate formats, has helped promote the development of interchange formats such as TAP and CIBER, and the role of "clearinghouse" companies that broker such settlements. It is worth noting that B2B charging is generally performed at an aggregate level, with no requirement to recognize the individual users associated with the transactions. This means that the charging data for user and B2B charging, respectively, can be obtained from different points in the network.
For example, data needed for user charging might most appropriately be obtained from the delivering network element, while the business charging data might be obtained from a transit switch or a border gateway. (There can be small differences between the data recorded in each of these entities, e.g., different volume counts arising from signaling overhead, but this can be regarded as of low importance in this context.)
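A minimal sketch of the aggregate-level B2B settlement described above: per-user usage records are rolled up per partner for the settlement period, discarding the individual user identities (the record layout and field names are invented for illustration):

```python
# Hypothetical sketch: B2B charging aggregates usage per partner, with no
# need to recognize individual users. Record fields are assumptions.
from collections import defaultdict

def b2b_settlement(records):
    """records: iterable of (partner_id, user_id, volume_bytes) tuples."""
    totals = defaultdict(int)
    for partner_id, _user_id, volume in records:  # user identity is dropped
        totals[partner_id] += volume
    return dict(totals)
```

The same per-user records that feed user billing can be summed here, or the aggregate can come from a different network element entirely, as the text notes.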
12.4.5 Roaming
Roaming is the situation that arises when a mobile user is actively attached to a network other than his or her home network. When services are used in this scenario, charging must be able to recognize the subscriber's roaming status, so that the amount the user is charged can vary according to the specific network to which the user is attached. Users will generally pay a premium when roaming, reflecting the charges that the home operator pays to the visited network. These charges will be lower where there is a close business partnership between the networks, and the user charging will need to reflect such network-specific variation.
12.4.6 Multiple Access
Multiaccess networks allow content and services within an operator’s offering to be made available through a number of access technologies (e.g., GPRS, WLAN, xDSL). The importance of multiaccess is that charging should be access-aware. A specific service available to a subscriber may be tariffed differently depending on the access type. For example, content accessed via GPRS might be priced more cheaply during a promotion, and might not be available when the subscriber accesses the services via WLAN. The charging system relies on either the core network or the content delivery server (where possible) to indicate the used access technology within charging requests.
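Access-aware tariffing of the kind described above can be sketched as a lookup keyed on both the service and the access type; the service names, access types, and prices below are invented for illustration:

```python
# Hypothetical sketch of access-aware tariffing: the same service is priced
# per access technology, and may be unavailable on some. All values invented.

TARIFFS = {  # price per item, keyed by (service, access_type)
    ("news", "gprs"): 0.20,   # promotional GPRS price
    ("news", "xdsl"): 0.50,
    # ("news", "wlan") intentionally absent: not offered via WLAN
}

def price(service, access_type):
    try:
        return TARIFFS[(service, access_type)]
    except KeyError:
        raise LookupError(f"{service} not available via {access_type}")
```

The access type used as the lookup key would be supplied by the core network or the delivery server within the charging request, as the text describes.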
12.4.7 Source of Charging Records
As we noted earlier, an important aspect of mobile content charging concerns the choice of source of charging information. In a typical mobile content environment a number of service delivery elements are involved with content delivery, all of which can generate charging information. Take the example of a game download. In the simplified example given in Figure 12.6 there is a mobile GPRS network, represented by the SGSN plus GGSN, an origin server (which holds the content), and a download server (which manages the content delivery). In principle, any or all of these elements could be used to charge for content delivery; in practice, however, there are challenges with any approach taken. The typical mobile data network is unaware of the precise content it is carrying, or of whether the content is successfully delivered. The network does, however, have the best awareness of the subscriber's identity, subscription status, and location, and of the type and quality of the bearer utilized. The origin server is the source that defines the precise content, but it may be completely unaware of the destination and delivery of the content. The delivery or download server has a high awareness of the content (it may often be responsible for adapting it) and of its successful delivery to the user. Being located within the mobile network operator's domain, the download server has some
Figure 12.6 Game download.
subscriber awareness (though the user's network identity may not necessarily be known, depending on the authentication mechanisms used). The content delivery server, or closest equivalent (e.g., an SMSC, MMSC, or download server), is often the source of choice for content charging information. The disadvantage of using only a delivery server for content charging is that the default basis for charging used by mobile network operators is charging for use of radio network access (usually based on the volume of data traffic). Combining this access charging with content charging will result in the user being effectively double-charged: first the cost of the content, then the charges associated with its delivery. It could be argued that a solution lies in careful tariffing of content-based services; however, this has not so far proved successful. GPRS access is fairly expensive, with prices in the range of €1 per megabyte of data being commonplace. If on top of this there is additional charging for content, including both the fees due to the content owner and the operator's margin, the end result is not at all inviting to the user from a cost perspective. It is also worth noting that the revenue sharing basis used with SMS ringtones, where the operator may retain around 50% of the charge to the user, is not likely to succeed if applied to many other types of content.
12.4.8 Multiple Servers Involved in Delivery

A factor that must be carefully considered in charging is the "big picture" of the service delivery architecture: how to ensure that the most pertinent charging information is available to bill the service or content in question, and that other charging information is suppressed or ignored. For example, WAP gateways and SMS centers are often used as sources of charging information, but the charging information they produce is not normally required to charge for MMS
services, even though SMS and WAP gateways are part of the MMS message delivery. Tailoring of charging information can be accomplished in a number of ways:

- Configure each service element to produce only charging information relevant to the specific service delivery context in which it participates.
- Configure logic in the charging system to recognize a service delivery context, and to utilize or disregard charging information accordingly.
- Introduce a "workflow engine" or "broker" to manage the required logic externally to the charging and service delivery systems.
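As a sketch of the second approach (logic in the charging system that recognizes the service delivery context), with element and service names invented for illustration:

```python
# Hypothetical sketch: the charging system keeps a map from service context
# to the one element whose records are pertinent for billing that service,
# and disregards records from other participating elements.

RELEVANT_SOURCE = {
    "mms": "mmsc",         # for MMS, bill only on MMSC records...
    "browsing": "wap_gw",  # ...even though the WAP gateway also saw the traffic
}

def billable(record):
    """record: dict with 'service' and 'source' keys."""
    return RELEVANT_SOURCE.get(record["service"]) == record["source"]
```

In the MMS example from the text, WAP gateway and SMSC records produced during MMS delivery would simply fail this check and be ignored.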
12.5 CHARGING CONCEPTS AND MECHANISMS
This section outlines various concepts, techniques, and processes generally utilized within the charging domain that have particular relevance to mobile content charging. These include the generation of charging data, the differentiation of access and content charges, and the "mediation" processes that can be applied to charging records.

12.5.1 Creation of Charging Records
Here we discuss mechanisms used for recording charging information. Although these mechanisms were developed in the context of network access and bearer-related charging, the same techniques are also used within the servers that deliver content and generate corresponding usage records. Within the fixed Internet, RADIUS (remote authentication dial-in user service) accounting [3] has typically been used to provide usage data for billing. RADIUS has had more limited use for charging in the telecom domain. Other, less standardized mechanisms have also been employed, such as the analysis of log and event information created by access servers and web servers. As noted earlier, the telecom domain has traditionally utilized CDRs as a charging mechanism. Since the early 1990s in particular, there have been drivers to increase the frequency of collection of CDRs from the switches and other recording devices that generate them. Initially these drivers were intended, for instance, to reduce revenue leakage caused by the loss of data when a switch's CDR file capacity was reached, often an issue where the frequency of switch polling (scheduled automated CDR collection) was as low as once per day. CDR creation typically proceeds as follows:

- Individual CDRs are created at configurable intervals. For example, a recording device may be programmed to generate CDRs every 15 minutes for the duration of a content session. (This creates "partial" CDRs, which can be charged individually but must be combined with the other partials created for the session in order to present a complete billing record.)
- CDRs are written to a storage area within the recording device and batched together in "CDR files" with hundreds or thousands of other CDRs generated at the same time.
- When a CDR file is ready, it is made available for collection. Collection is often a secure file transfer process initiated by a billing or mediation device, employing a mutually agreed protocol.

Increasingly, however, demands to minimize the credit risk posed to operators by prepaying subscribers have accelerated the trend toward real-time charging. CDR collection advanced toward a concept known as "warm billing," where, for instance, CDR files may be collected every 5-15 minutes. The further evolved concept of "hot billing" introduced mechanisms whereby CDRs could be individually "pushed" by the recording device, for example as part of a message stream, bypassing the need to accumulate CDRs in files prior to transfer. For prepaid charging, even hot billing has been seen to introduce too much latency. First, the subscriber has already received some service or content before the CDR is generated. Then there is a certain delay before the prepaid account balance can be impacted (mostly due to the need to buffer the CDR at one or more points within the processing stream). Finally, hot billing offers no mechanism to terminate a service when the prepaid account balance is exhausted. (Prepaid charging is discussed in more detail later in this chapter.) These limitations were initially addressed by CAMEL (customized applications for mobile network enhanced logic), a network architecture relevant to both the CS (circuit-switched) and PS (packet-switched) domains that provides mechanisms to support IN (intelligent network) applications and services.
CAP3 (CAMEL Application Part, phase 3) provides prepaid charging support for GPRS data and mobile-originated SMS services, including services accessed by users roaming outside their home network [4]. However, a major limitation of CAMEL is that it does not support the differentiation of content (at least not insofar as different content may be provided within a single GPRS APN). Differentiated charging is discussed in more detail in the following section.
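The "partial" CDR mechanism described above implies a later combination step before billing: all partials belonging to one session are merged into a single complete record. A minimal sketch (record fields are assumptions, not a real CDR format):

```python
# Hypothetical sketch of combining "partial" CDRs for one session into a
# single complete billing record. Field names are illustrative assumptions.

def combine_partials(partials):
    """partials: list of dicts sharing a 'session_id', each with a 'volume'."""
    assert partials, "at least one partial CDR expected"
    session = partials[0]["session_id"]
    assert all(p["session_id"] == session for p in partials), "mixed sessions"
    return {
        "session_id": session,
        "volume": sum(p["volume"] for p in partials),  # total across partials
        "partial_count": len(partials),
    }
```

In a real system this buffering and merging would typically happen in the mediation layer, discussed later in this section.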
12.5.2 Differentiated Charging

The target of truly effective content charging is to enable differentiated content-based charging. This means being able to identify a service or content at the relevant level of granularity within the available charging information, and to relate and manage in harmony all the charges that apply (e.g., separately for the content and for access). Some difficulties with this have been discussed earlier in this chapter; solutions to surmount the challenge are continually developing. One straightforward solution is to perform content charging using information provided by the content origin or delivery server, and to vary the access charging
from the network according to the type of content or service being delivered. This could, for instance, be made possible by reserving a specific traffic "pipe" within the access network for each type of content. An example is MMS charging, where a specific GPRS access point is dedicated to traffic toward the MMSC (via the WAP gateway). MMS charging is then based on the MMSC charging information, and the charging for access (specific to that access point) is either zero-tariffed or tariffed according to the operator's configured policy. There are, however, limitations to this approach. A separate access point is required for each different type of content service. The configuration of multiple access points is problematic (especially for the user of a mobile terminal, but also within the network), and it is regarded as good practice to minimize the number of access points that need to be created. (In the example above it is also necessary to have a dedicated WAP gateway for MMS traffic, as different access tariffing will most probably apply to WAP browsing traffic.)

12.5.3 Flow-Based Charging
Identification and charging of traffic flows within a single access point are not currently supported in the GPRS standards; however, this is the area in which solutions are being sought to address the challenge of differentiated charging. IP flow-based traffic handling can be used to:

- Classify traffic within the core network according to different services or content types
- Route traffic according to the service or content (or the indicated service or content provider), providing virtual access points within a single access point configured in a mobile terminal
- Charge for the flow according to the identified service or content type
- Provide prepaid charging support
- Enable a granular level of service control

Simplified traffic and content analysis can be performed by utilizing IP header information (layers 3-4) within the datastream. More intrusive inspection of the data (layers 5-7) can provide a greater range of information relating to the content or service used. However, analysis of higher-layer information is not straightforward, and requires the development of specific analyzers for each application protocol used. Table 12.1 provides an example of items derivable from flow analysis that could be utilized for charging purposes. In this example the flow analysis might identify that the user is browsing a download selection offered by a partner service provider, and that the user should not be charged for GPRS access in this instance. In addition to enabling differentiated access through variation of the related access charges, flow-based charging (Fig. 12.7) can enable content charging to be performed solely via charging information provided by a "flow-aware" core network.
TABLE 12.1 Flow Analysis

Layers 5-7
  URL: www.contentsshop.com/downloads (can be analyzed against a lookup list to identify the service or content being accessed)

Layers 3-4
  Source IP address: 132.225.35.4 (identifies the mobile terminal)
  Source port: 80
  Destination IP address: 129.37.22.17 (identifies the traffic destination)
  Destination port: 80 (HTTP default)
  Protocol: TCP
In the example pictured in Figure 12.7, flow analysis identifies all UDP traffic destined for IP address 129.37.22.22 and port 9201 as representing a specific WAP browsing service. The flow-aware core network provides charging data (based on volume count) for user charging; no charging data are needed from the WAP gateway. All traffic to IP address 129.37.22.15 is identified as relating to gaming services. In this instance the gaming server is used to provide the charging data, as the user charging might depend on certain application-specific events. No user charging is generated from the core network; however, CDRs might be required for B2B charging between the network operator and the business partners associated with provision of the gaming service.
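The layers 3-4 classification in this example can be sketched as a rule list matched against a flow's protocol and destination; the addresses and ports follow the worked example, while the rule representation and matching logic are illustrative assumptions:

```python
# Hypothetical sketch of flow classification: layers 3-4 rules map a flow's
# (protocol, destination IP, destination port) to a service class. The
# addresses below follow the Figure 12.7 example; the code is illustrative.

RULES = [
    # (protocol or None, dest_ip, dest_port or None, service); None = wildcard
    ("UDP", "129.37.22.22", 9201, "wap_browsing"),
    (None,  "129.37.22.15", None, "gaming"),   # all traffic to the gaming server
]

def classify(protocol, dest_ip, dest_port):
    for proto, ip, port, service in RULES:
        if proto in (None, protocol) and ip == dest_ip and port in (None, dest_port):
            return service
    return "default"   # unmatched flows fall back to default access charging
```

A flow-aware core network element would apply such rules per packet or per flow, counting volume separately for each service class.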
Figure 12.7 Flow-based charging.
3GPP is working on standardizing the concept of traffic-plane flow-based bearer charging within its Release 6 standards. This functionality has not so far been precisely defined within the GPRS core. Currently, solutions exist on the market that place this functionality either inside an evolved GGSN or independent of the GGSN.

12.5.4 Mediation
The role of mediation where charging is concerned cannot be overstated. Mediation is a process that permits the transfer of information between incompatible entities. Incompatibility arises from the disparate information structures and communication protocols used within networks, especially telecom networks. For example, within an operator's environment one network element might generate CDRs in an ASN.1 encoding format with FTAM as the transfer protocol, while another might present a proprietary fixed-field ASCII format with real-time transfer via GTP'. All these records need to be consumed by a billing system that has its own proprietary input CDR format. A mediation device is a standard item in an operator's charging and billing infrastructure, sitting between the network and the business support systems. In the context of billing, it is responsible for automating the secure, scheduled collection and reception of charging records. It manages, according to configured rules, the validation, formatting, and conversion of the charging records, including any required computations and correlation with other related usage data. Mediation also directs the results of its processing to the required output stream(s) facing the operator's business support systems.

12.5.5 Correlation
Charging correlation can be defined as a process involving the consolidation of two or more sets of charging data, generated by one or more sources, but all related to the same connectivity and/or service session. As a result of the correlation process, new or modified charging data are created. Correlation is typically the business of a charging mediation system. A basic form of correlation occurs with long sessions (e.g., voice calls). For reasons of revenue assurance, an operator might generate charging records every 15 minutes during a very long session. These records will normally need to be buffered within the charging process and then combined into a single record before they are passed to a billing system. A more complex type of correlation arises where there is a need to correlate usage information from two separate network elements. Examples are GPRS, where the correlated charging record might use the GGSN CDR as the basis of volume count information but the SGSN CDR for information relating to the user's location; or correlation as a means of enabling differentiated charging, where access network CDRs are treated differently depending on the information presented in a related CDR generated by a content delivery server. In online correlation this association must be performed in near real time during a service delivery session. Complexity in correlation arises from the technical challenge of implementing it (especially online charging correlation), together with the fact that no standards exist to support correlation in flow-based content charging; therefore, no two separate network elements are likely to support the same correlation vectors and mechanisms.

12.5.6 Charging Rules

"Charging rules" are the logic configured within network elements to define the charging behavior of the network element for a given subscriber or service. Examples include whether online or offline charging should be used and what tariff class and metering technique to apply. Dynamic charging rules allow such instructions to be pushed to, and executed within, a network element during a charging session (e.g., to change the metering technique used by the element). 3GPP release 6 standardization work is currently focusing on the concept of charging rules within flow-based bearer charging in the mobile core network.

12.5.7 Rating

Rating is also a fundamental requirement for content charging, and is needed to derive the charge to the subscriber for the service or content received. Even where content is prepriced by a content or delivery server, the final charge to the user might depend on numerous additional attributes, such as date and time, location and roaming status, QoS, accumulated credits or usage balances of the associated subscription, or the bearer used. Rating is used to compute the final price, based on the information supplied in the charging record and on rules and history relating to the subscription.

12.5.8 Advice of Charge

Advice of charge is an increasingly important requirement for content charging.
The charging system supports advice of charge by providing information, on request, as to the cost of a proposed service or content (or, for an open-ended service, the rate at which it will be charged). Rating is a prerequisite for advice of charge support. Advice of charge is not yet standardized explicitly, and the solutions currently provided are largely proprietary and limited in nature. The challenge in advice of charge is to provide information relating to the full (differentiated) cost of the content or service.

12.6 CHARGING INTERFACES
There are a variety of mechanisms used to enable service elements to provide charging information to charging systems. These generally involve the specification of the charging data definition and encoding format, and a record transfer protocol.
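As a toy illustration of the data-definition side of such an interface (and of the format conversion performed by mediation, Section 12.5.4), the sketch below parses a hypothetical fixed-field ASCII CDR into a canonical record; the field layout and names are invented for illustration, not taken from any vendor specification.

```python
# Mediation sketch: convert a fixed-field ASCII CDR into a canonical record
# that a downstream billing system could consume.
FIELDS = [              # (name, start, end) character offsets in the record
    ("user_id", 0, 11),
    ("bytes", 11, 19),
    ("service", 19, 27),
]

def normalize(ascii_cdr: str) -> dict:
    """Slice the fixed-width record and apply type conversion/validation."""
    rec = {name: ascii_cdr[a:b].strip() for name, a, b in FIELDS}
    rec["bytes"] = int(rec["bytes"])   # format conversion with validation
    return rec
```

A real mediation device would apply many such per-source parsers, all emitting the same canonical record for the billing system.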
This section introduces some commonly referenced interfaces that are relevant to content charging. There are no universal standards for offline charging (e.g., CDR-based accounting), and a range of mostly vendor-specific alternatives exist. Many proprietary vendor-specific and domain-specific specifications describe CDR file encoding formats and transfer protocols. IPDR.org proposes an open specification for offline usage reporting [5]. Online and prepaid accounting requires a high degree of coordination between the service element (client) and the charging system (server). Currently there are a number of available and developing specifications in this area, which highlight the importance of sophisticated charging mechanisms for prepaid services and content. The remote authentication dial-in user service (RADIUS) protocol [6] is commonly used in the Internet and telecom worlds to provide authentication, authorization, and accounting (AAA) of users. RADIUS accounting can be used to meet both offline and online charging requirements. RADIUS implements a client-server model, utilizing request and response messages between the client (service element) and server (charging system). Figure 12.8 illustrates this model. RADIUS has a number of limitations from a charging perspective, some of which have been addressed in the Diameter base protocol [7], the protocol proposed within the IETF (Internet Engineering Task Force) as the evolution of RADIUS.

- RADIUS is based on UDP, an unreliable transport, allowing for undetected packet loss (which can translate into lost revenue for an operator or service provider). The Diameter protocol, on the other hand, is based on TCP or SCTP, which provide transport-layer reliability and control.
- The RADIUS specification has restrictions on the size of attribute data that can be carried and on the number of pending requests that can be supported. These restrictions are not present in the Diameter protocol.
Figure 12.8 Accounting client and server.
- RADIUS and Diameter are message-based. This is advantageous for online accounting, which needs the lowest possible latency, but does not scale well for offline accounting, which is more efficiently served by batch-file CDR production. Additionally, the mediation needs of RADIUS and Diameter are higher than, for instance, those of CDR-based charging.
- The basis of RADIUS and Diameter accounting is to report usage to the accounting server "after the event," such as after 100 kB of data has already been consumed. This does not support "true" prepaid charging, in which the 100 kB should be authorized by the charging server before it is provided to the user.

Extensions to the RADIUS and Diameter protocols have been proposed in the IETF to enable them to support true prepaid operation, namely, utilizing authentication and authorization messages to trigger the checking of a prepaid balance before a service is released, and to carry a "quota grant" (e.g., seconds or kilobytes) that is authorized for use in the service before reauthenticating and obtaining further quota to continue using the service [8]. Diameter is specified as a base protocol, which provides common functionality to a set of supported applications. One such application is the Diameter credit control application [9], which enables real-time credit control for network- and content-based event- and session-centric services, and includes operations to support reservation and direct debit against a prepaid balance. The application also supports advice of charge, correlation, and error handling. Figure 12.9 illustrates an accounting or credit control dialogue that might occur between an accounting client and an accounting server.
This example is based on the charging of a streamed service to a prepaying subscriber, and utilizes many of the mechanisms described in the preceding sections: for example, rating (to determine the service cost and to translate it into a number of "units" to be metered by the service element), charging rules (to influence the charging behavior of the access network), and revenue sharing. OSA (Open Service Access) defines a set of APIs enabling external applications to interact with a network's service capabilities. One of these APIs supports content-based charging, also supporting reservation and direct debit models [10]. (OSA API specifications are aligned with, and functionally identical to, those of the Parlay Group.) Currently, specification work in this area is focusing on the definition of a charging Web services interface, supporting the Web services XML, SOAP, and HTTP paradigm. Other specifications and drafts propose the transport of charging and price information within HTTP or SOAP headers, although these can be considered to have limited application and poor support for prepaid charging.
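The reservation-based quota model described above can be sketched as a loop in which the client must obtain authorized units from the charging server before delivering them, rather than reporting usage "after the event." This is a simplification well beyond the actual Diameter credit control message set; the class and function names are illustrative.

```python
# Simplified "quota grant" loop: the service element requests a chunk of
# units, delivers only what was granted, and re-authorizes when the chunk
# is used up. Delivery stops when the prepaid balance is exhausted.
class CreditServer:
    def __init__(self, balance_kb):
        self.balance = balance_kb

    def reserve(self, requested_kb):
        """Grant up to requested_kb against the prepaid balance."""
        grant = min(requested_kb, self.balance)
        self.balance -= grant
        return grant

def deliver(server, total_kb, chunk_kb=100):
    """Deliver total_kb of content, one authorized quota chunk at a time."""
    delivered = 0
    while delivered < total_kb:
        grant = server.reserve(min(chunk_kb, total_kb - delivered))
        if grant == 0:        # balance exhausted: terminate the service
            break
        delivered += grant
    return delivered
```

With a 250-kB balance and a 400-kB download, delivery stops at 250 kB; no unauthorized usage ever reaches the user.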
12.7 CHARGING INFORMATION
What information should be presented in a charging record? The answer depends closely on the type (and vendor) of the charging element and the charging models that the charging element is intended to support.
Figure 12.9 Session-based prepaid content charging example.
Table 12.2 is not intended as a comprehensive list of the information that should appear in a charging record or request, but rather to provide a greater understanding of the charging data needed in various scenarios. The information needed invariably depends on factors such as the service used, the charging model supported, and the capabilities of the systems involved. Typically, only the most essential information is found in an online charging request, since these requests are generated within the service delivery and must introduce minimal latency and network load. The choice and naming of the informational items in Table 12.2 is representative only and does not reflect any particular implementation or specification. The list may nevertheless serve as a checklist for considering parameters that need to be supported in a charging dialog.
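As an illustration of how a subset of these items might be carried, the sketch below models a charging request and the "essential items only" reduction used for online requests. All field names are illustrative, not taken from any specification.

```python
from dataclasses import dataclass
from typing import Optional

# A minimal charging request carrying a few of the informational items
# discussed in this section (see Table 12.2).
@dataclass
class ChargingRequest:
    user_id: str                    # e.g., MSISDN or mapped pseudonym
    service_id: str
    amount_type: str                # "volume-kb", "time-s", or "currency"
    amount_value: float
    tariff_class: Optional[str] = None
    location: Optional[str] = None  # e.g., cell ID, used as a tariff attribute

def essential_only(req: ChargingRequest) -> dict:
    """Online requests carry only the essential items to minimize latency."""
    return {"user_id": req.user_id, "service_id": req.service_id,
            "amount_type": req.amount_type, "amount_value": req.amount_value}
```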
12.8 CHARGING ARCHITECTURE AND SCENARIOS
In this section a suggested charging architecture is illustrated, and some examples are discussed as to how charging might be implemented for various content types. Alternatives do exist to the examples presented, although these might involve a greater degree of complexity or a lack of accuracy.

12.8.1 Charging Architecture

Figure 12.10 presents a high-level charging architecture based on a charging system that incorporates both online charging and offline mediation capabilities. The charging system should be capable of handling charging requests from the access network (e.g., GPRS or WLAN access charges), the operator's internal service delivery platforms (e.g., MMS centers, delivery servers), and externally provided content and services (e.g., partner service providers, Web service providers). The charging system should, if required, be capable of correlating requests from all of these domains where they relate to the same service instance. The charging system should also be able to interact in real time with account balance and rating systems, and should communicate with the operator's billing and business support systems responsible for invoicing, settlements, financial accounting, and taxation.

12.8.2 Charging Scenarios

The following outline charging models provide some further examples of how the mechanisms described in preceding sections of this chapter can be applied within a charging architecture as illustrated above.

12.8.2.1 Browsing In browsing, the charging model is based on volume. A flow-aware packet core network can be used as the primary source of charging information, based on (downlink) volume.
TABLE 12.2 Charging Information Items

User ID: Identifies the user or recipient of the content or service, from which the account to be charged is derived. Typically MSISDN or IMSI are used within a mobile network, although these may not necessarily be used by a content server and might have to be mapped from, e.g., an authenticated user name or pseudonym.
Price: Can be used where the price (currency code and value) is dictated by the content or delivery server.
Service ID: Generally needed to provide audit information; specifically needed where the content or service is not prepriced and must be rated to derive the charge to the user.
Tariff class: Can be used in addition to the service ID, for instance to indicate what price band an item (e.g., SMS ringtone, MP3) should be charged at.
Destination: Can be used, e.g., to describe the target of the content.
Timestamp: Can indicate the time of the service or delivery event (start time and/or end time; duration).
Content/service provider: Can identify the content provider for revenue sharing purposes; can also indicate whether the content is provided internally or externally to the operator's partner network.
QoS: Indicates the quality of service with which the content or service was delivered; can be used as a weighting factor in deriving the charge to the user.
Bearer: Indicates the bearer (e.g., WLAN, GPRS) over which the content or service was delivered; can be used as a tariff attribute in deriving the charge to the user.
Correlation (method, key): Can indicate the correlation method and keys to be used (e.g., IP flow classifier).
Location: Indicates the location of the user; can be used to derive roaming status, and as a tariff attribute in deriving the charge to the user.
Content type: Can indicate the type of content (e.g., JPEG, MPEG, WAV sent in an MMS message).
Delivery indicator: Can indicate whether the content was successfully delivered.
Amount (type, scale, value): A collection of informational items indicating the value that should be charged or metered in the charging element (e.g., time in seconds; downlink volume in kilobytes; currency as an ISO currency code, such as US dollars). More than one type of value should be supportable in a charging request, within different contexts.
Debit/credit indicator: Indicates whether the amount field should be applied positively or negatively.
Application: Identifies the application being used on the terminal. Can be used to derive revenue share for application developers, and to affect user charging where nonauthenticated applications are used.
Revenue share (party, method, value): Can indicate whether revenue sharing should be applied, in respect of which party, and how the value is calculated (e.g., % of total charge).
Sponsored (party, method, value): Can indicate whether sponsorship should be applied, by which party, and how the value is calculated (e.g., % of total charge).
Session ID: Unique ID for a charging session.
Session message type: Can indicate, for example, whether this is an accounting start, interim, or stop; a request or response; or a "final units" quota grant from the charging server.
Method (prepaid-specific): Can indicate whether the request is for balance check only, reservation, deduction, advice of charge, etc.
Session event: Can indicate a change of charging rule during a charging session.
Record sequence: Identifies the sequence of the charging record, e.g., where multiple partial CDRs are generated during a long session.
The core network is able to recognize which browsing traffic is destined outside the operator's partner network. This enables charging scenarios where browsing traffic directed outside the operator's network is charged at rate x, while traffic within the operator's network is charged at rate y or is free of charge. In the latter case, user charging might be based on additional information from a WAP gateway.
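The rate-x/rate-y scenario can be sketched as follows; the partner-network names and rates are invented for illustration.

```python
# Differentiated browsing tariff: on-net (partner network) traffic at rate y
# (here free of charge), everything else at rate x.
PARTNER_NETWORKS = {"operator-portal", "partner-weather"}
RATE_X = 0.02   # currency units per kB, off-net
RATE_Y = 0.0    # on-net browsing free of charge

def browsing_charge(destination: str, downlink_kb: float) -> float:
    rate = RATE_Y if destination in PARTNER_NETWORKS else RATE_X
    return round(rate * downlink_kb, 4)
```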
12.8.2.2 Person-to-Person Messaging In person-to-person messaging, the charging model is based on per-message payment by the sender. The pricing depends on the message content type and message size (these can be expressed within the service ID + tariff class parameters). A flow-aware core network can zero-tariff the access charging, with the MMSC being used as the source of charging data. Charging is based on successful delivery of the MMS to the recipient's MMSC; therefore, prepaid charging should utilize a reservation method, where a balance check and reservation are made before the message is permitted to be sent, and the reservation is committed (or canceled) depending on the message sending success indicator.
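The balance check, reserve, then commit-or-cancel sequence can be sketched as a small account object; the class and method names are illustrative.

```python
# Reservation model for prepaid messaging: funds are reserved before the
# message is sent, then committed on successful delivery or released on
# failure. Reserved funds cannot be double-spent by a concurrent request.
class PrepaidAccount:
    def __init__(self, balance):
        self.balance = balance
        self.reserved = 0.0

    def reserve(self, amount):
        """Balance check + reservation; False blocks the message send."""
        if self.balance - self.reserved < amount:
            return False
        self.reserved += amount
        return True

    def commit(self, amount):
        self.reserved -= amount
        self.balance -= amount     # delivery succeeded: take the charge

    def cancel(self, amount):
        self.reserved -= amount    # delivery failed: release the funds
```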
Figure 12.10 Charging architecture.
More elaborate charging models can be accommodated where the core network is able to support a "roaming premium" for both message senders and message recipients, depending on their network-specific location.

12.8.2.3 Download In download, the charging model is based on the specific content and service involved, but mainly the recipient pays. If the content charge is to be made through the download event, then confirmation of successful download is needed, which is most reliably obtained from the delivery server. Prepaid charging would in this case support a balance reservation + commit-on-download-confirmation model. Revenue sharing (recognition of the content originator) and sponsorship (e.g., based on advertisement display or insertion) should also be supported in the charging information generated by the download server. A flow-aware core network can be used to recognize the flows relating to the download and accordingly suppress access charges, according to the operator's service offering.

12.8.2.4 Streaming Video In streaming video, the charging model is based on the duration of the streaming session, together with an initial setup fee, charged to the viewer of the content. A flow-aware core network can be used to identify the streaming traffic based on, for example, destination IP address, destination port, and protocol, with higher-layer analysis of the RTSP protocol being used to detect messages such as "play,"
“pause,” and “tear-down” as a basis for charging. The core network charging information can also include QoS indicators, which will often be important to indicate how the streamed session should be charged (e.g., if the video stream quality is poor).
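Duration-based metering driven by RTSP-level events can be sketched as follows; the setup fee and per-second rate are invented for illustration.

```python
# Streaming charge sketch: a setup fee plus time actually spent playing,
# derived from "play"/"pause"/"teardown" events detected in the RTSP flow.
SETUP_FEE = 0.50
PER_SECOND = 0.01

def streaming_charge(events):
    """events: time-ordered list of (timestamp_s, name); returns the charge."""
    total, play_start = 0.0, None
    for ts, name in events:
        if name == "play" and play_start is None:
            play_start = ts
        elif name in ("pause", "teardown") and play_start is not None:
            total += ts - play_start     # meter only while playing
            play_start = None
    return round(SETUP_FEE + PER_SECOND * total, 2)
```

Time spent paused is not charged, matching the event-based metering described above; a QoS indicator from the core could additionally scale the per-second rate.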
12.9 SUMMARY
In conclusion, this chapter has presented the evolution of charging systems and mechanisms within the telephony environment, and how these are being adapted to the needs of mobile content charging. A philosophy too often applied in the development of mobile technologies and creation of services is that charging is an issue for the charging systems to solve (i.e., an attitude that if you just "throw in a billing system at the end, it will all work"). Typically, the outcome where charging has not been well considered from the outset is a complex, expensive integration project that nevertheless results in a compromise charging solution, often with the end user completely unclear on how a given service is to be charged. Two main messages can be emphasized from this chapter:

1. Mass consumer take-up of mobile content-based services will not be achieved until understandable, predictable, and acceptable user charging can be provided (i.e., fully differentiated charging).

2. Charging logic should be designed into service applications and solutions from their inception, particularly for prepaid charging, where charging should be integral to the service authorization and should also support correlation of access and content charging data.
REFERENCES

1. 3rd Generation Partnership Project, Technical Specification Group Services and System Aspects, Service Aspects; Charging and Billing (Release 5), 3GPP TS 22.115.
2. WAP Billing Framework Version 1.0, 21 Nov. 2002, Open Mobile Alliance, OMA-WBF-v1_0-20021121-C.
3. C. Rigney, RADIUS Accounting, RFC 2139, April 1997.
4. 3GPP TS 23.078 V4.7.0 (2002-12), Technical Specification, 3rd Generation Partnership Project; Technical Specification Group Core Network; Customized Applications for Mobile Network Enhanced Logic (CAMEL) Phase 3, Stage 2.
5. IPDR.org, Network Data Management Usage Specification, version 3.1.1.
6. C. Rigney, S. Willens, A. Rubens, and W. Simpson, Remote Authentication Dial In User Service (RADIUS), RFC 2865, June 2000.
7. P. Calhoun, J. Arkko, E. Guttman, G. Zorn, and J. Loughney, Diameter Base Protocol, IETF work in progress.
8. A. Lior et al., Prepaid Extensions to Remote Authentication Dial-In User Service (RADIUS), work in progress, draft-lior-radius-prepaid-extensions00.txt, Feb. 2003.
9. H. Hakala et al., Diameter Credit Control Application, work in progress, draft-ietf-aaa-diameter-cc-00.txt, June 2003.
10. 3rd Generation Partnership Project, Technical Specification Group Core Network; Open Service Access (OSA); Application Programming Interface (API); Part 12: Charging (Release 5).
CHAPTER 13
ALGORITHMS AND INFRASTRUCTURES FOR LOCATION-BASED SERVICES

GANG WU and XIA GAO
DoCoMo USA Labs, San Jose, California

KEISUKE SUWA
Musashi Institute of Technology, Japan
13.1 INTRODUCTION
Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.

There has been explosive growth in mobile computing and a speedy emergence of new wireless technologies. The desire to be connected "anytime, anywhere, and anyway" has led to unprecedented research on mobile ubiquitous computing. One main difference between mobile ubiquitous computing and stationary computing is that mobile applications do not occur at a single location with a single context, but rather span a multitude of locations such as offices, homes, streets, highways, and mountains [1]. Therefore, a key distinguishing feature of mobile ubiquitous computing is the ability to detect, react to, and make use of changing environmental conditions (context) to provide users with a more seamless and intuitive experience. Location is considered one of the most fundamental factors of context that influence application behaviors. Context, as discussed by Schilit et al. [1], has three important aspects: where you are, whom you are with, and what resources are nearby. Depending on specific applications, other possible relevant context elements may include noise, light, temperature, speed, network traffic, and charging scheme. Among them, location is a constantly changing parameter in a mobile environment. Also, many resources (such as printers), services (such as wireless coverage), and other elements (such as noise) are location-dependent. The change of location can thus serve as a hint for mobile applications to refresh
their knowledge of the context of interest and respond accordingly. To this end, location-based services are gaining prime importance. It is predicted that worldwide location-based services revenues will grow from approximately $1 billion in 2000 to over $40 billion in 2006, representing a compound annual average growth rate of 81% [2]. Figure 13.1 shows a general framework to support location-based applications. The location estimation system usually has two components: a location sensor infrastructure and a location estimation algorithm. The location sensor infrastructure comprises both transmitters that actively or passively (on request) send out signals and receivers that receive and measure these signals. Different location estimation algorithms make use of various types of measurements of these signals, such as time of flight, angle, and signal strength. Then, after a location estimate is derived, the original location format may not be understood by applications and may have to be transformed into another presentation format by the "location format transformation" component. In the rest of the chapter, we will discuss each component of the framework in detail. We begin by defining location and location-based services. A taxonomy of location is then presented, according to the way that location information is used by mobile applications. Then, the different physical media transmitted between signal transmitters and receivers in the location sensor infrastructure are discussed. The advantages and disadvantages of each medium are covered so that the choice of a medium in a specific application scenario can be easily understood. Next, location estimation algorithms are surveyed. Since measurements received from the location sensor infrastructure are error-prone, the goal of these algorithms is to satisfy the estimation accuracy required by applications. At the same time, these algorithms try to optimize measurement cost and decrease system complexity.
Next, a number of indoor and outdoor location estimation systems are introduced. These systems utilize the different physical media and location estimation algorithms discussed in the previous sections. The work of the Open GIS (geographic information system) Consortium (OGC) is also introduced in this section. OGC, consisting of GIS software vendors, database vendors, integrators, and application providers, has put great effort into defining a standardized format for expressing geographic data, and communication protocols to expedite data exchange among different peers. The results from OGC are considered beneficial to the "location format transformation" component. Finally, we describe how to provide location services based on a cellular system.
Figure 13.1 A framework designed to support location-based applications.
13.2 TAXONOMY OF LOCATION
Location, as one of the most important aspects of context, has been widely factored into the design of mobile applications. Applications that are capable of finding the geographic location of an object and providing services based on this location information are called location-based applications. The location information can be accessed via different devices, such as a desktop, mobile phone, personal digital assistant (PDA), vehicle, or airplane. Diverse application scenarios include Enhanced 911 (E-911) emergency services, road assistance, geotargeted advertising, fleet tracking, navigation, and the smart office. Furthermore, location information can be integrated into network protocol design to provide location-aware routing, handoff, billing, and system planning services. Although all these applications require location information, the types of information are quite different. The most important types discussed here are physical location, symbolic location, absolute location, and relative location [3].

13.2.1 Physical and Symbolic Location

Physical location is expressed in the form of coordinates, which uniquely identify a point on a two-dimensional map of the earth. The most widely used coordinate system is the degrees/minutes/seconds (DMS) system. Some other common coordinate systems are degree decimal minutes and universal transverse Mercator (UTM). In the DMS system, two sets of lines, latitude and longitude, crisscross the map in two directions. Each set of lines is given a set of numbers (coordinates) so that every point on the earth can be expressed as the intersection of a latitude line and a longitude line, which in turn is expressed as the juxtaposition of the coordinates of the two lines. Latitude lines run east and west. The earth's equator is the zero line, the baseline for latitude. The coordinates increase both to the north and to the south from there to a maximum of 90°, which is a single point at each geographic pole.
The longitude lines, also known as meridian lines, run north and south and all cross each other at the poles. The baseline of the longitude lines is the line passing through Greenwich, a small town in England. From there, coordinates increase both to the east and to the west. Since there are only 360° in a circle, the maximum degree for a longitude line is 180° in both the east and west directions. Because the unit of a degree is too coarse to specify a point, each degree is broken up into 60 minutes, which in turn are broken up into 60 seconds. So, in the DMS system, the coordinates of a point have a format similar to N78°33′22″ E140°42′23″, where N78° is the number of degrees of north latitude and E140° is the number of degrees of east longitude. To uniquely identify an object in a 3D environment, the altitude can work together with the DMS coordinates. Symbolic location, on the other hand, expresses a location in a natural-language way: in the home, on the bed, on a train, or similar. This type of location is useful for many applications that do not need precise physical location information, especially for applications that have a fixed service range. For example, home security systems protect only the area that is part of a home, so that symbolic
expressions such as "in the home" or "out of home" are enough to trigger some actions. With help from the location information database, a physical location can be mapped to the corresponding symbolic information and vice versa. However, the resolution of a physical location can influence the definitiveness of the symbolic information. For example, a resolution of 10 m might not be enough to determine whether a person is in a specific room, because there might be several rooms within a 10-m range. If the resolution is improved to 1 m, the probability of successfully estimating the room occupancy increases significantly. On the other hand, because of the vagueness of symbolic location, it typically provides only very coarse-grained physical locations. For example, given a symbolic location such as "in an office," if the office has a radius of 4 m, the resolution of the physical location cannot be better than this if no additional information is collected.

13.2.2 Absolute and Relative Location
An absolute location uses a shared reference grid for all located objects [3]. For example, the DMS system provides absolute location information based on the latitude and longitude grid system. For the same location, the reports from two different location estimation systems should be the same. On the contrary, a relative location depends on its own frame of reference, and the reports from two different location estimation systems may differ for the same location. For example, a mobile host can be reported as "100 m from base station 1" or "200 m from base station 2." With knowledge of the absolute locations of the reference points, which are usually stored in a location database, an absolute location and a relative location can be transformed into each other. Relative location information is usually based on proximity to known reference points, such as access points in a wireless local area network (WLAN) or base stations in a cellular system. So, this type of information can be provided easily by existing infrastructure without specialized location tracking infrastructure. Some useful applications include geotargeted advertising and E-911 services. For example, a geotargeted advertisement could send out location-specific guides to local restaurants, theaters, or even traffic information to mobile users registered at local cells. An E-911 emergency call is able to reveal which cell the emergency call originates from so that rescue teams can be dispatched in an efficient way.
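The transformation between the DMS coordinates described above and decimal degrees, one simple instance of the "location format transformation" component, can be sketched as:

```python
# Convert degrees/minutes/seconds to decimal degrees: 60 minutes per degree,
# 60 seconds per minute. By convention N/E are positive and S/W negative.
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value
```

For example, the latitude N78°33′22″ from the text becomes roughly 78.556 decimal degrees.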
13.3 LOCATION ESTIMATION MEDIA
Location estimation algorithms are based on measurements of specific parameters of a received signal that enable the position of a device to be inferred. Depending on the targeted applications, location estimation algorithms vary enormously in terms of the type of measured signals and parameters, and the accuracy, usage, and category of the derived location information. The measured signals could be physical signals such as ultrasound, radio, infrared, and visible
light, or software signals such as IP (Internet Protocol) packets. The measured parameters could be time, distance, angle, signal attenuation, physical contact, or IP headers. The measurement accuracy could be a few meters (for geographic survey), tens of meters (for fleet tracking), a cellular cell (for paging), an office room (for business usage), or an Internet subnet (for geographic content targeting). In this section we introduce how the characteristics of different physical media (ultrasound, radio, infrared, and visible light) influence their deployment in location estimation systems. IP-based location estimation techniques and services are covered later in a separate section.

13.3.1 Radiofrequency (RF)

Radiofrequency (RF) refers to any frequency within the electromagnetic radiation spectrum normally associated with radio wave propagation. When an RF current is supplied to an antenna, an electromagnetic field, usually called an RF field or simply a "radio wave," is produced that propagates through space and is suitable for wireless communications. An RF signal travels at the speed of light in free space (3 × 10^8 m/s) and has a wavelength inversely proportional to its frequency. RF covers a significant portion of the electromagnetic radiation spectrum, ranging from 9 kilohertz (kHz), the lowest allocated wireless communications frequency, to thousands of gigahertz (GHz). The RF spectrum is further divided into several bands, and its allocation in the United States is managed by the Federal Communications Commission (FCC). Many types of wireless communication systems make use of the RF spectrum, such as cellular telephone systems, satellite communication systems, and WLAN systems (e.g., IEEE 802.11). Because of the popularity of wireless communications and the convergence of cellular networks and WLANs toward ubiquitous mobile computing, the location-based services and estimation technologies enabled by RF are the main focus here.
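The inverse relation between wavelength and frequency noted above can be checked with a one-line helper (the example frequencies are illustrative):

```python
# Sketch: RF wavelength is inversely proportional to frequency (lambda = c / f).
C = 3.0e8  # speed of light in free space, m/s

def wavelength_m(freq_hz):
    return C / freq_hz

print(wavelength_m(2.4e9))  # 2.4 GHz (802.11b/g band): 0.125 m
print(wavelength_m(9e3))    # 9 kHz, the lowest allocated frequency: about 33 km
```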
This section covers the characteristics of radio waves that influence the accuracy of location estimation.

13.3.1.1 Multipath Propagation

Multipath propagation refers to the phenomenon in which a transmitted signal arrives at a receiver from various directions over a multiplicity of paths because of obstacles and reflectors in the radio propagation channel. The direct path between the transmitter and the receiver is called line of sight (LOS), while the other paths are called non-line-of-sight (NLOS) paths. Because most location estimation algorithms depend on the existence of a LOS path between the measured object and the system reference points to correctly measure distance or angle, multipath propagation can be the dominant source of error in location estimation by introducing errors in LOS detection. Three propagation mechanisms [4], as illustrated in Figure 13.2, play a role. Reflection occurs when a radio wave encounters a surface that is large relative to the wavelength of the signal. Diffraction occurs at the edge of an impenetrable body that is large compared to the wavelength of the signal. If the size of an obstacle is on the order of the wavelength of the signal or less, scattering occurs and the incoming signal is scattered into several weaker outgoing signals.
444
ALGORITHMS AND INFRASTRUCTURES FOR LOCATION-BASED SERVICES
Figure 13.2 Three propagation mechanisms of multipath propagation: reflection, diffraction, and scattering.
One problem of multipath is amplitude and phase fluctuation, also referred to as multipath fading. Multipath fading happens when the multipath waves of a signal reach a receiver out of phase, leading to signal cancellation or reinforcement depending on the phase of each wave. Such rapid variations in signal strength and phase occur over a distance on the order of a wavelength; thus multipath fading is a small-scale effect and makes it harder to estimate the distance between a transmitter and a receiver by measuring the received signal strength (RSS). The other problem of multipath is called delay spread. When multiple radio waves of a transmitted signal reach a receiver at different times, the NLOS copies of one data bit may collide with the LOS copy of neighboring data bits and cause intersymbol interference (ISI). These delayed signals act as a form of noise on the subsequent primary signal and in many cases have similar amplitude, which makes recovery of the data more difficult. The Rayleigh and Rician radio propagation models are widely used to model rapid amplitude fluctuation due to multipath fading in outdoor environments. The Rayleigh model is applicable to the situation where there are multiple NLOS paths of equal strength between a transmitter and a receiver but no dominant path such as a LOS path. The Rician model, on the other hand, best characterizes a situation where there is a direct LOS path in addition to a number of NLOS paths; it contains the Rayleigh model as the special case in which the strong LOS path is eliminated. The radio propagation model in an indoor environment is hard to characterize because of severe multipath, the low probability of an available LOS path, and site-specific parameters such as the floor layout, moving people, and numerous reflecting surfaces.
So far there are no good models of the multipath characteristics of indoor radio channels for geolocation estimation [5], and more advanced location estimation algorithms using scene analysis or proximity (discussed in later sections) have been developed to mitigate location measurement errors.
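The Rayleigh and Rician amplitude models described above can be sampled directly from two Gaussian components, using only the standard library. This is a minimal sketch: the Rician case adds a dominant LOS component of amplitude `los` to the in-phase term, and with `los = 0` it reduces to Rayleigh, as the text notes.

```python
# Sketch: sampling received-signal amplitudes under the Rician model.
# With los = 0 this is the Rayleigh model (no dominant LOS path).
import math, random

def rician_amplitude(sigma, los=0.0, rng=random):
    """One sampled amplitude |h|: LOS offset plus NLOS scatter."""
    x = rng.gauss(los, sigma)   # in-phase component (LOS offset + scatter)
    y = rng.gauss(0.0, sigma)   # quadrature component (scatter only)
    return math.hypot(x, y)

rng = random.Random(42)
rayleigh = [rician_amplitude(1.0, 0.0, rng) for _ in range(100_000)]
# Rayleigh mean amplitude is sigma * sqrt(pi / 2), about 1.2533 for sigma = 1
print(sum(rayleigh) / len(rayleigh))
```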
13.3.1.2 Other Interference Factors

Besides the small-scale multipath fading, which fluctuates over a distance on the order of a wavelength, RF channels exhibit the medium-scale effect of shadowing and the large-scale effect of path loss. These two effects have a great impact on outdoor location estimation systems such as satellite systems and cellular systems. Shadowing describes the gradual variation in mean power over a distance on the order of a few tens of wavelengths. It is due mainly to local attenuation by obstacles, such as trees, in the vicinity of an antenna and is well described by a log-normal distribution. Large-scale path loss describes the slow variation in mean power over a large area (usually on the order of tens or hundreds of meters) and is modeled as attenuation of the signal due to the large distance of travel. Examples of large-scale path loss models are free-space loss, plane earth loss, and diffraction loss. In the free-space loss model, a signal attenuates over distance because it is spread over a larger and larger area. In the plane earth loss model, a strong LOS path is present, but ground reflections also exist and significantly influence the path loss. In the diffraction loss model, plane earth loss is modified to take into account significant diffraction losses caused by obstacles cutting into the LOS path. The attenuation generally grows as the distance raised to a power called the path loss exponent, which is usually between 3 and 5.

13.3.2 Infrared (IR)

The infrared (IR) region of the electromagnetic spectrum occupies the terahertz (THz, 10^12 Hz) range of frequencies, and its spectrum is roughly divided into four subregions: short, medium, long, and very long wavelength. These four bands, from short to very long, cover the wavelengths of 1–3, 3–8, 8–14, and 14–30 μm, respectively. One of the main challenges of IR-based systems is the interference imposed by ambient light sources.
Such light sources, including local incandescent sources, fluorescent lighting, and sunlight, have relatively high power compared with the transmitted IR signal. Although IR optical filters can be used to attenuate the visible portion of the spectrum while leaving the IR intact, these sources still cause severe interference because they produce a noise level that is in many cases 60 dB greater than the desired IR signal. Furthermore, these sources, especially sunlight, also produce a large amount of infrared energy (in-band noise) in addition to the visible component. IR systems are generally used for two purposes. The first is object detection, sensing, and tracking, based on the fact that all heated objects emit IR radiation and every type of object has a unique IR signature, or fingerprint. By deriving a digital image from the received radiation signal and matching the image against a fingerprint database, IR sensing systems can detect the existence and type of objects. The advantage of this kind of system is that no specific IR transmitters are needed. The disadvantage is that more sensitive and delicate IR receivers are required to capture sometimes very weak signals and to produce the
image. Because most IR sensing systems use a photon detector made from IR-sensitive materials such as mercury-cadmium-telluride (HgCdTe) to detect IR radiation, such systems are very expensive to produce. Because this type of system also cannot distinguish between objects of the same type and is vulnerable to environmental interference, it is not suitable for general-purpose location estimation services. The second usage of IR systems is data communications, because IR uses high operational frequencies and provides high bandwidth. Popular commercial products include remote controls for domestic appliances and data backup links for PDAs and laptops. Compared with IR sensing systems, IR data communication systems use dedicated IR transmitters that modulate IR waves in different ways to transmit digital signals. Because no image production capability is needed, the IR transmitter can be a light-emitting diode (LED) or an injection laser diode (ILD), and the IR receiver can be a photodiode. Despite their limited transmission range (10 m), these devices are cheap and can be mass-manufactured and integrated into large-scale sensor networks.

13.3.3
Ultrasound
Unlike RF and IR, which are electromagnetic waves, a sound wave is a pressure disturbance that travels through a medium by means of particle interaction. The interaction is an oscillation of the constituent particles of the medium, causing them to be positioned alternately closer to and farther apart from each other. The oscillations, and therefore the sound wave, must be produced by a source that physically oscillates back and forth, causing adjacent medium particles to oscillate with it. Hence sound cannot exist without a medium, and the properties of a given medium heavily influence the manner in which a sound wave propagates through it. The speed at which a sound wave propagates through a given medium depends only on the elasticity and density of the medium, not on the frequency of the wave. For all practical purposes of location estimation, the speed of sound in air depends mainly on the absolute temperature, which directly affects the density of the air. The temperature dependence of the speed of sound in air is approximated by

v = 331 + 0.6T

where T is the temperature of the air in degrees Celsius and the velocity v is in m/s. At normal atmospheric pressure and a temperature of 20°C, the equation yields 343 m/s. Ordinary ultrasound systems operate at frequencies between 20 and 100 kHz: sounds below 20 kHz are audible to humans, and the use of frequencies above 100 kHz is limited by the attenuation of ultrasound in air. Ultrasonic location systems are able to estimate location with a higher degree of accuracy than location systems based on RF, IR, or visible light. This is because the speed of ultrasound in air (approximately 343 m/s in an indoor environment) is much slower than
the speed of the other media (the speed of light is 3 × 10^8 m/s), so the time of flight of an ultrasonic signal between a transmitter and a receiver can be measured accurately. On the other hand, because the intensity of sound decreases quickly with transmission distance, ultrasound location systems have a limited coverage area and are suitable only for indoor use. As an ultrasound wave travels through the air, it also exhibits several behaviors similar to those of RF when it reaches the end of the medium or meets obstacles, including reflection, diffraction, and scattering. Also, because the velocity of sound depends on environmental factors such as the ambient temperature and humidity, it can exhibit both temporal and spatial variations within a building and introduce additional measurement errors.
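Putting the temperature formula to work, ultrasonic time-of-flight ranging is a direct multiplication; a minimal sketch:

```python
# Sketch: ultrasonic time-of-flight ranging using the temperature-dependent
# speed of sound v = 331 + 0.6*T (T in degrees Celsius, v in m/s).
def speed_of_sound(temp_c):
    return 331.0 + 0.6 * temp_c

def distance_m(time_of_flight_s, temp_c=20.0):
    return speed_of_sound(temp_c) * time_of_flight_s

print(speed_of_sound(20.0))  # 343.0 m/s at 20 degrees C
print(distance_m(0.01))      # 10 ms of flight is about 3.43 m
```

Neglecting the temperature correction (e.g., assuming 20°C when the room is at 30°C) biases every range by about 1.7%, which is one source of the environmental error mentioned above.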
13.4
LOCATION ESTIMATION ALGORITHMS
Having introduced the characteristics of the different media used in location estimation systems, we now describe three typical location estimation algorithms: triangulation, scene analysis, and proximity. Targeting different application environments or services, these algorithms have unique advantages and disadvantages; hence some hybrid systems use more than one type of location algorithm at the same time to get better performance.

13.4.1 Triangulation

Triangulation is the technique of using the geometric properties of triangles to compute an object's location. The location is estimated relative to some known framework, which consists of either fixed terrestrial sites (e.g., basestations in a cellular system) or space-based satellites (e.g., the Global Positioning System). The triangulation technique has two derivations: lateration and angulation. Lateration locates an object by measuring its distances from multiple reference positions, while angulation locates an object by computing angles or bearings relative to multiple reference positions. Instead of measuring distance directly, the time of arrival (TOA) or time difference of arrival (TDOA) is usually measured, and the distance is then derived by multiplying the travel time by the signal velocity (3 × 10^8 m/s for light and radio). In other words, all these methods depend on emitting and receiving light or radio signals to determine the location of an object to which a light or radio transceiver is attached.

13.4.1.1 Time of Arrival (TOA)

A TOA system depends on measurements of the time of arrival of multiple signals traveling from reference points to the object at the estimated location. As shown in Figure 13.3, to enable a two-dimensional calculation, TOA measurements must be made with respect to signals from at least three fixed reference points. Ideally, if all the time measurements are perfect and error-free, the intersection of
Figure 13.3 A time-of-arrival (TOA) system.
the three dashed circles will uniquely pinpoint the location P. On the other hand, if one or more measurements exhibit error because of unsynchronized clocks or the other factors discussed later, the three distance circles (shown in Figure 13.3 as the thick circles) will not intersect at one point. Instead, there will be three intersections, P1, P2, and P3. Normally, (P1 + P2 + P3)/3 is a good estimate of the real location P. But if it is known a priori that the clock offset at the estimated object is the main reason for the measurement discrepancy, and the clocks of the three reference points are perfectly synchronized to Coordinated Universal Time (UTC), then by looking for a single correction factor Δt that allows all the measurements to intersect at one point, the system can not only achieve accurate location estimation but also synchronize the clock of the estimated object to UTC. This is exactly the mechanism used by GPS to compensate for the clock offset between GPS receivers and satellites and to provide atomic-accuracy timing even to the lowliest GPS receivers.

13.4.1.2 Time Difference of Arrival (TDOA)

The TDOA system is also based on the lateration technique, but it uses time difference measurements rather than the absolute time measurements used by a TOA system. The time difference is converted to a distance difference by multiplying it by the signal transmission speed. Because the points with a constant distance difference to two reference points form a hyperbolic curve, the TDOA system is also referred to as
Figure 13.4 A time-difference-of-arrival (TDOA) system.
the hyperbolic system. As shown in Figure 13.4, at least three fixed reference points and two pairs of time measurements are needed for two-dimensional location estimation. One hyperbola, h13, is defined by the measurement pair (d1 − d3) = (t1 − t3) × (signal speed) = constant, and the other hyperbola, h23, is defined by the pair (d3 − d2) = (t3 − t2) × (signal speed) = constant. Because time differences are used, as long as the clocks at the reference points are perfectly synchronized, the clock offset of the estimated object cancels out during the calculation, so the accuracy of the location estimation is not affected.

13.4.1.3 Angle of Arrival (AOA)

An AOA system depends on measurements of the angles of arrival of the signals involved in location estimation. Because directional antennas or antenna arrays are usually required, an AOA system is difficult to implement on small mobile devices. As shown in Figure 13.5, at least two geographically fixed reference points and a pair of measurements are required to calculate the
Figure 13.5 Angle-of-arrival (AOA) system.
two-dimensional location. The intersection of the two directional lines of bearing, each formed by a radial from a reference point to the estimated object, defines a unique location. Although the lateration and angulation techniques are introduced separately in this section, a real implementation could combine both in a hybrid system to generate a more accurate estimate at the cost of additional complexity.

13.4.2
Scene Analysis
In the triangulation type of location estimation schemes discussed above, geometric information such as distance, time, or angle is measured directly to derive the location. Scene analysis, by contrast, refers to algorithms that first collect features (fingerprints) of a scene that are not explicitly related to geometric information and then infer the location of an object by matching real-time measurements against the closest a priori location fingerprints. Scene analysis has been widely used in many research fields, such as human face identification and terrain matching, and a few systems for location estimation have already been proposed. The "smart floor" [6] developed at the Georgia Institute of Technology installs pressure sensors in a building floor to capture footfalls and uses the data for position tracking and pedestrian recognition. EasyLiving [7], developed by Microsoft Research, uses high-performance 3D cameras to capture and analyze visual frames to provide vision-based positioning in a home environment. RF-based scene analysis systems also exist and are the main focus of this section because of the popularity of RF-based indoor systems such as IEEE 802.11 and Bluetooth.

13.4.2.1 Rationale for RF-Based Scene Analysis

One big advantage of schemes based on indoor RF over those based on IR, ultrasound, and satellites is that no special positioning infrastructure needs to be deployed. Wireless communication infrastructure is already being widely deployed in indoor areas (e.g., WLAN, Bluetooth), and RF-based scene analysis techniques can use this general-purpose infrastructure to provide location estimation as a value-added service, which other applications can in turn use to give users a more seamless experience. This contrasts with systems based on IR, ultrasound, or satellites, where special IR or ultrasound sensor networks or GPS satellites have to be deployed.
Another advantage of RF-based schemes is their wide coverage area, scalability, and easy maintenance. As discussed above, IR technology has some limitations: (1) it scales poorly because of the limited range of IR, and (2) it is very sensitive to direct sunlight. Ultrasound technology has similar limitations: (1) it behaves poorly over long distances because the signal attenuates quickly in air, and (2) the velocity of ultrasound is greatly influenced by the temperature and humidity of the air. GPS works well outdoors but poorly indoors. The two main characteristics of RF relevant to location estimation are signal strength (SS) and signal-to-noise ratio (SNR). As indicated in another study [8], SS usually is a
better index than SNR. The SS can be used to determine the distance between a transmitter and a receiver in two ways. The first approach is to use the received SS to estimate the path loss of the RF signal over the path and then calculate the distance according to an RF attenuation model such as the Rayleigh or Rician model. With knowledge of the SS from three distinct transmitters, the location of the receiver can then be determined by the trilateration technique discussed above. However, because both SS and SNR are subject to severe multipath fading and fluctuate over very short distances, the variation of SS is too wide for this approach to yield accurate location information. So the second approach, scene analysis based on SS fingerprinting [8–10], is used and gives better results than the first approach [8]. In this approach, real-time measurements of SS are not used directly to calculate distance but are matched against SS entries in a database. The entries of this database contain the SS of different locations, collected a priori. In the simplest case, the location of the entry closest to the real-time SS measurements approximates the location of the object.

13.4.2.2 General Framework of Scene Analysis

Because measurements of RF characteristics have large variation and generally cannot be mapped accurately to a location by a closed-form model, the scene analysis approach uses these measurements in a fingerprinting manner. Most such systems involve three operational steps: profiling, matching, and estimation.

Profiling This step usually takes place off line, before the system can be used for real-time location estimation. First, the wireless coverage area is divided into small portions, each including one or several observation points. Then, depending on the specific algorithm, multiple samplings of the required RF parameters are collected at each observation point. If necessary, the measurement data can be further postprocessed.
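The collect-and-average core of this profiling step can be sketched in a few lines; the sample values and data layout below are invented for illustration.

```python
# Sketch of the profiling step: average several signal-strength (SS) samples
# per observation point and keep them in a location database keyed by (x, y).
def build_location_db(raw_samples):
    """raw_samples: {(x, y): [[ss_ap1, ss_ap2, ...], ...]} -> averaged entry."""
    db = {}
    for point, samples in raw_samples.items():
        n_aps = len(samples[0])
        db[point] = [sum(s[i] for s in samples) / len(samples) for i in range(n_aps)]
    return db

raw = {
    (0.0, 0.0): [[-40, -62, -71], [-42, -60, -69]],  # two samplings, three APs
    (5.0, 0.0): [[-55, -48, -66], [-53, -50, -64]],
}
db = build_location_db(raw)
print(db[(0.0, 0.0)])  # [-41.0, -61.0, -70.0]
```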
Finally, the processed measurement data are stored in a location database to be retrieved later for online matching. To guarantee that the samples stay up to date and reflect the current layout of the building, the profiling step has to be repeated whenever the building environment changes dramatically. The samples also have to be collected multiple times at each observation point, at different times of day, to smooth out measurement variation throughout the day. How to choose the observation points is a design issue: the points can form a simple grid, or they can be distributed unevenly depending on the popularity and layout of the location. Obviously, increasing the density of observation points increases the closeness between fingerprinting points and observation points and therefore the accuracy of the location estimates. The tradeoff is that the number of database entries also increases, which requires more storage and lengthens the search time. There is also a threshold beyond which a further increase in the density of observation points has little influence on estimation accuracy [9].

Matching In this step, a real-time measurement of an object is obtained and input to the computation unit. The computation unit can be on either the client or the infrastructure
side, depending on the specific system. It compares the real-time measurements with the entries in the location database and finds the "closest" match(es). Different systems may use different indices to define the "closest" relation between two measurements, and one or several of the closest entries may be returned, depending on how the actual location is estimated from these entries. For example, a generalized weighted Lp distance between the measured vector [m1, m2, . . . , mN] and a database entry [e1, e2, . . . , eN] can be computed [10] as

L_p = \left( \frac{1}{N} \sum_{i=1}^{N} \frac{1}{v_i} \, |m_i - e_i|^p \right)^{1/p}

Lp becomes the Manhattan L1 distance when p = 1 and the Euclidean L2 distance when p = 2. In most cases all the entries have vi = 1, but possible enhancements could use the weight vi to bias the distance according to how reliable a measurement is. For this purpose, vi can be related to the total number of samples used to compute the average at each observation point (the larger the number of samples, the more accurate the average), or to the standard deviation of the samples (the smaller the standard deviation, the more reliable the measurement). Besides these deterministic algorithms, stochastic algorithms such as Bayesian networks [9] can also be used to find the match. A Bayesian network is a graphical representation of a joint probability distribution that explicitly declares the dependency relationships between the random variables in the distribution. Figure 13.6 shows an example of how a Bayesian network is used for location estimation in the Nibble system, a WLAN-based system for indoor location estimation. The Bayesian network is a rooted tree with directed arcs from the root node Q to a set of WLAN access points (APs). The root node Q is the "query" variable that describes p(Q), the a priori distribution over a set of locations Q = {q1, q2, . . . , qj}.
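The weighted Lp distance defined above translates directly into code; a minimal sketch, with the default vi = 1 case noted in the text:

```python
# Sketch: generalized weighted Lp distance between a measured SS vector
# and a database entry. Weights v_i default to 1, the common case.
def lp_distance(measured, entry, p=2, weights=None):
    n = len(measured)
    w = weights or [1.0] * n
    s = sum((1.0 / w[i]) * abs(measured[i] - entry[i]) ** p for i in range(n))
    return (s / n) ** (1.0 / p)

print(lp_distance([-40, -60], [-43, -56], p=1))  # Manhattan L1: (3 + 4)/2 = 3.5
print(lp_distance([-40, -60], [-43, -56], p=2))  # Euclidean L2: sqrt(25/2)
```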
The location set Q includes all the locations of interest that a mobile device wants to track. The default distribution of Q is uniform, but it can easily be modified to reflect the user's preferences or how frequently the user currently appears at each
Figure 13.6 An example of a Bayesian network for location estimation in the Nibble system [9]: Q = {office, library, seminar room, doorway}; E = {Low SNR, High SNR, Unknown}.
location. In the profiling step, at each location of interest, multiple samples from each AP are collected, and the marginal conditional probability that a value ei ∈ E is observed from an AP, given that the current object location is qj ∈ Q, is calculated. After all locations are sampled, the marginal conditional probability p(E|Q) of each AP is stored in a separate leaf node of the Bayesian network. Then, in the matching step, to estimate a location, a mobile object first samples and quantizes the signals from each AP. The results form a vector R = {r1, r2, . . . , rj | rj ∈ E}, which the Bayesian network uses to calculate p(Q|R), the a posteriori probability distribution over the location set Q. The conditional probability of each location is then used to define "closeness." Another issue of the matching problem lies in the domain of computational geometry: because the location database can be quite large for wide-area coverage, efficient organization of the data and a corresponding search algorithm are important for real-time performance. There is a fair amount of literature dealing with these issues; they are beyond the scope of this chapter.
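For the tree-structured network above, p(Q|R) is proportional to p(Q) multiplied by the product of the per-AP conditional probabilities. A minimal sketch of that posterior computation follows; the probability tables and location names are invented for illustration.

```python
# Sketch of a Nibble-style posterior: p(Q | R) is proportional to
# p(Q) * product over APs of p(r_i | Q), then normalized.
def posterior(prior, cond, reading):
    """prior: {loc: p}, cond: {loc: [{evidence: p} per AP]}, reading: [e per AP]."""
    unnorm = {}
    for loc, p in prior.items():
        for ap_table, e in zip(cond[loc], reading):
            p *= ap_table.get(e, 1e-9)  # small floor for unseen evidence values
        unnorm[loc] = p
    z = sum(unnorm.values())
    return {loc: p / z for loc, p in unnorm.items()}

prior = {"office": 0.5, "library": 0.5}  # uniform a priori p(Q)
cond = {  # invented p(E | Q) tables, one per AP
    "office":  [{"high": 0.8, "low": 0.2}, {"high": 0.1, "low": 0.9}],
    "library": [{"high": 0.3, "low": 0.7}, {"high": 0.6, "low": 0.4}],
}
post = posterior(prior, cond, ["high", "low"])
print(max(post, key=post.get))  # "office"
```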
Estimation This step produces the location estimate of the measured object using the k "closest" entries selected in the matching step. If k is equal to 1, the selected entry is the closest entry and is taken to be the location of the object. If k is larger than 1, the estimation step uses different algorithms to derive the estimated location from these k entries. The rationale is that often there are multiple neighbors at roughly the same distance from the measured object; given the large variation of RF measurements, there is no fundamental reason to pick only the closest neighbor and reject others that are almost as close. It has been shown [8,10] that algorithms such as k nearest-neighbor averaging and the smallest k-vertex polygon provide more accurate estimates. k nearest-neighbor averaging returns the k closest locations and then uses their average as the location estimate. The smallest k-vertex polygon finds the k locations that form the polygon with the smallest perimeter and then takes the average of these k locations as the location estimate. As shown in Figure 13.7, when k is equal to 3, three points (P3, P4, and P5) among a total of seven
Figure 13.7 Smallest k-vertex polygon location estimation algorithm (k = 3).
form a triangle with the minimum perimeter, and the location estimate L is the average of these three points.

13.4.3
Proximity
The main goal of proximity location estimation algorithms is to detect whether an object is near a known location. In other words, proximity algorithms mainly provide symbolic and relative location information. Such proximity information can be used by many applications that do not need very accurate physical location information. A proximity location system is usually implemented as a software enhancement on existing service infrastructure; hence it is often more cost-effective and has a shorter turnaround time than other systems that rely on specialized sensor infrastructure. Proximity information can be used in many different ways. It can help devices requiring physical contact to provide more intelligent services; for example, lamps (or computers, TVs, etc.) can turn on automatically once the presence of a user is detected. It can also be used to provide a mobile "Yellow Pages" that sorts information according to the distance from the information source to the user. In the following section we discuss two algorithms providing proximity location information: IP subnet detection and basestation detection. IP subnet detection occurs in the wired Internet and is supported by most current content delivery network (CDN) operators. Basestation detection is required by the FCC for E-911 emergency service and is mandatory for all cellular operators.
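IP subnet detection amounts to a longest-prefix style lookup against a table of known subnets; the sketch below uses Python's standard `ipaddress` module, with an invented subnet-to-location table (addresses are from the 192.0.2.0/24 documentation range).

```python
# Sketch of IP-subnet proximity detection: map a client address onto the
# known subnets of a (hypothetical) location table.
import ipaddress

SUBNET_LOCATIONS = {  # invented prefix -> symbolic location mapping
    ipaddress.ip_network("192.0.2.0/25"):   "Building A, floor 2",
    ipaddress.ip_network("192.0.2.128/25"): "Building A, floor 3",
}

def locate(addr):
    ip = ipaddress.ip_address(addr)
    for net, place in SUBNET_LOCATIONS.items():
        if ip in net:
            return place
    return None  # address not in any known subnet

print(locate("192.0.2.200"))  # "Building A, floor 3"
```

The result is symbolic ("which subnet"), not geometric, which is exactly the granularity proximity applications such as geographic content targeting need.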
13.5
LOCATION ESTIMATION SYSTEMS
In Sections 13.3 and 13.4 we introduced the main media and the basic estimation algorithms used in current location estimation systems. In this section we describe in detail some representative systems that combine these technologies in different ways and are therefore suited to different usage scenarios. These systems are broadly divided into two categories: indoor systems and outdoor systems. At the end of the section, the work of the Open GIS (geographic information system) Consortium (OGC) is introduced briefly. This work can be considered part of the "location format transformation" component shown in Figure 13.1, which handles the transformation between different location formats.

13.5.1
Indoor Location Estimation Systems
Indoor location estimation systems have a limited coverage area, typically high estimation accuracy requirements, and a complex environment setup. All of the systems based on RF, infrared, and ultrasound can be used in an indoor environment.

13.5.1.1 Scene-Analysis-Based Systems

We introduce two systems, RADAR and Nibble, as examples in this section.
RADAR [8,10] is an RF-based indoor location tracking system developed at Microsoft Research. The system is based on IEEE 802.11 WLAN technology and uses scene analysis for location estimation. In the original RADAR testbed, three access points (APs) of an 802.11 WLAN were able to cover a floor with over 50 rooms. In the profiling step, each AP measures both the SS and SNR of the RF signal transmitted by a mobile object from one of the observation points. Because it has been discovered [8] that signal strength at a given location varies significantly depending on the object's orientation (i.e., east, west, north, or south), multiple measurements facing each orientation are collected at each observation point. The average of the measurements is then taken at each AP. Finally, the samples from the three APs are combined into a tuple of the form (x, y, d, SS_i, SNR_i), where x and y are the coordinates of an observation point, d is the orientation, and i ∈ {1, 2, 3} corresponds to the three APs. Such a tuple is collected for each observation point and stored in a location database. In the matching step, a real-time measurement of a mobile object is collected and compared with the entries in the location database. Finally, a k nearest-neighbor averaging algorithm is used to derive the location estimate. Overall, the RADAR system is able to estimate location with a high degree of accuracy. Its median (50th percentile) error distance is 2–3 m. The k nearest-neighbor averaging scheme (k = 3) significantly outperforms the single-closest-location scheme, and it has been shown that there are thresholds of both the density of observation points and the number of real-time samples beyond which further increases do not improve the accuracy dramatically. The Nibble system [9] is also a WLAN-based indoor location system utilizing scene analysis techniques. It was developed at the UCLA Multimedia Systems Lab as part of its multiuse sensor environment (MUSE) project.
It runs on mobile objects and uses the RF signals sent by nearby WLAN APs to derive location estimates. The most significant difference between Nibble and other systems is that it relies on an evidential reasoning model, namely, a Bayesian network, to aggregate and interpret information from sensors (APs in this case) to provide location estimation services. In the profiling phase, the SNR is collected at each location of interest and then quantized into discrete values; in the current implementation, the SNR value set is E = {low, high, unknown}. The Bayesian network is then used in the matching phase, and the location with the largest conditional probability is taken as the location estimate. Furthermore, Nibble defines a "quality of information" metric to characterize the service performance of sensors. Nibble uses this metric to select the most reliable sensors from which to retrieve RF signal measurements. In this way, for the same estimation accuracy, the number of queried sensors (i.e., the cost) can be minimized. Currently, the Nibble system can discriminate locations roughly 3 m apart.
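The scene-analysis matching step — comparing a live measurement against a fingerprint database and averaging the k best matches, as in RADAR — can be sketched as follows. The fingerprint table, AP count, coordinates, and signal values are invented for illustration; they are not RADAR's data.

```python
import math

# Hypothetical fingerprint database: (x, y, orientation) -> mean signal
# strengths observed from three APs at that observation point.
fingerprints = {
    (0.0, 0.0, "north"): (-40.0, -62.0, -75.0),
    (5.0, 0.0, "north"): (-48.0, -55.0, -70.0),
    (0.0, 5.0, "north"): (-52.0, -66.0, -58.0),
    (5.0, 5.0, "north"): (-60.0, -58.0, -52.0),
}

def estimate_location(sample, k=3):
    """k nearest-neighbor averaging in signal space (RADAR's scheme)."""
    # Rank observation points by Euclidean distance in signal space.
    ranked = sorted(
        fingerprints.items(),
        key=lambda item: math.dist(item[1], sample),
    )
    nearest = ranked[:k]
    # Average the physical coordinates of the k best matches.
    x = sum(pos[0] for pos, _ in nearest) / k
    y = sum(pos[1] for pos, _ in nearest) / k
    return x, y
```

With k = 1 this degenerates to the single-closest-location scheme the text compares against; k = 3 reproduces the averaging variant that RADAR found more accurate.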
13.5.1.2 Ultrasound-Based Location Estimation Systems
There are both narrowband and broadband ultrasonic location systems. The narrowband ultrasonic transducers used in existing ultrasonic systems have piezoelectric
ALGORITHMS AND INFRASTRUCTURES FOR LOCATION-BASED SERVICES
ceramics as their active elements. A piezoelectric material has the property that a mechanical deformation of the material is proportional to a change in the electric field across it. Hence, a piezoelectric material can be used as an ultrasonic transmitter by modulating the electric field across the material; conversely, it can be used as an ultrasonic receiver/detector by measuring the electric field across the material. This kind of transducer has a usable bandwidth of less than 5 kHz but is inexpensive, small, and robust. As a result, it is widely used in location systems with a large number of transmitters [11,12]. The BAT system [12] and the Cricket system [11] share some common properties: (1) both use narrowband ultrasound as the measuring medium, (2) both use triangulation and time of arrival as the estimation algorithm, (3) both achieve accuracies of several centimeters, and (4) both place ultrasound and RF transducers on each measured object. Because narrowband ultrasound is very difficult or expensive to modulate to carry digital data, in both systems the ultrasound takes the form of a bare pulse, and RF is relied on to carry system information. Developed at AT&T Laboratories Cambridge, England, the BAT system consists of a central RF basestation, a matrix of fixed system receivers, and a collection of measured objects each equipped with an RF receiver and an ultrasonic transmitter. To find the location of an object, the central RF basestation sends out an RF signal with a unique ID identifying that object. On receiving the RF signal and verifying the embedded object ID, the probed object immediately sends out an ultrasonic pulse. Having also received the initial RF signal, nearby system receivers can measure the distance between themselves and the measured object by observing the time interval between the receipt of the RF signal and the receipt of the corresponding ultrasonic signal.
The trilateration technique is then used to calculate the location of the measured object. The Cricket system is a location support system developed at the MIT Laboratory for Computer Science. To maintain the location privacy of each mobile host, location estimation is not carried out on the system side. Instead, the Cricket system distributes independent, unconnected transmitters throughout a building. Each transmitter sends an RF signal while simultaneously sending out an ultrasonic pulse. A mobile host needs both an RF and an ultrasonic receiver to derive its location: on receiving the initial RF signal, the mobile host activates its ultrasonic receiver and measures the time difference between the arrival of the RF signal and that of the corresponding ultrasonic signal. This time-of-flight measurement is then used to derive the distance between the mobile host and the transmitter. After the distances to multiple transmitters have been measured, the location of the mobile host can be calculated using the trilateration algorithm. Location systems using narrowband ultrasound have two main limitations: (1) they are very sensitive to in-band noise interference; because in-band noise, such as the clacking of typing on a computer keyboard, occurs frequently in daily life, the accuracy of location estimation can at times deteriorate greatly; and (2) if the transmission times of ultrasonic signals from nearby transmitters overlap, the signals collide, making it difficult for the receiver to distinguish among them and derive the location information.
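The ranging-plus-trilateration pipeline shared by BAT and Cricket can be sketched in 2D as follows. The speed-of-sound constant and the beacon geometry are illustrative assumptions (real deployments work in 3D with more receivers and error handling); the RF travel time is treated as zero since it is negligible next to the ultrasound's.

```python
# Sketch of Cricket/BAT-style ranging and 2D trilateration.

V_SOUND = 343.0  # m/s at room temperature (assumed); RF delay treated as zero

def distance_from_interval(dt_seconds):
    """Range = speed of sound x (ultrasound arrival - RF arrival)."""
    return V_SOUND * dt_seconds

def trilaterate(anchors, dists):
    """Solve a 2D position from three (x, y) anchors and three ranges by
    subtracting the first circle equation from the other two, which
    leaves a 2x2 linear system in (x, y)."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = dists
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    x = (b1 * a22 - b2 * a12) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

In BAT the system side runs this computation over many fixed receivers; in Cricket the mobile host runs it privately over beacon positions it learns from the RF messages.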
To solve these two problems, a broadband ultrasonic location system [13] has been developed at the University of Cambridge. The system prototype uses piezopolymer films as transducers, which are inexpensive and more robust than piezoelectric ceramics. However, piezopolymer films have low sensitivity, so as transmitters they need a larger driving voltage and as receivers they need more sensitive amplifiers. The prototype has a wide above-noise frequency bandwidth of 75 kHz. It uses a direct-sequence spread-spectrum (DSSS) signal structure to achieve simultaneous multiple-access capability and better performance in the presence of noise. The prototype uses the same architecture as BAT: a central RF basestation sends out RF signals to poll each individual mobile host, and each mobile host is equipped with an RF receiver and a broadband ultrasonic transmitter. On receiving the RF poll from the central basestation, a mobile host immediately sends out an ultrasonic signal. The signal is then received by nearby fixed ultrasonic receivers and used to derive the location of the mobile host. Because multiple simultaneous ultrasonic transmissions are allowed, the update rate of the system is greater, and the system also has higher accuracy than BAT.
13.5.1.3 Infrared-Based Location Estimation Systems
One location estimation system using IR techniques is the active badge system [14] developed at Olivetti Research Ltd, England. The active badge system deploys a sensor network inside a building, with one IR sensor per room. Each employee wears a 55 × 55 × 7-mm badge that sends out a short IR beacon with a unique ID every 15 seconds. On receiving a beacon, the sensor forwards the related information to a centralized computer, where the employee is identified and associated with the room in which the sensor resides. Unlike RF signals, which can penetrate office partitions, IR signals will not travel through walls.
Hence, this system can provide good location estimation at the accuracy of a room size. The services supported by the system are:
Find(name), which provides the current location, or a list of the most visited locations, of a badge.
With(name), which provides information about other badges in the same area as the requested badge.
Look(location), which provides information about all the badges close to the requested location.
Notify(name), which sends an alarm to the current location of the requested badge.
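These four queries can be sketched over a simple table of badge sightings. The room names, badge IDs, and function shapes here are hypothetical, not the active badge system's actual interface.

```python
# Hypothetical sighting table: badge -> room last reported by that room's
# IR sensor, as accumulated by the centralized computer.
sightings = {"alice": "room101", "bob": "room101", "carol": "room205"}

def find(name):
    """Find(name): current location of a badge."""
    return sightings.get(name)

def with_(name):
    """With(name): other badges seen in the same area as this badge."""
    room = sightings.get(name)
    return [n for n, r in sightings.items() if r == room and n != name]

def look(location):
    """Look(location): all badges close to the given location."""
    return [n for n, r in sightings.items() if r == location]

def notify(name, message):
    """Notify(name): deliver an alarm to the badge's current location."""
    return (sightings.get(name), message)
```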
13.5.1.4 Location Proximity in CDNs
An IP address is used not only to identify a computer interface but also to route packets from sources to destinations. An IP address contains different fields expressing the network address and the local interface address. The network address is globally unique and is assigned in a hierarchical manner by a set of authorities. For a fixed wired subnet, the mapping between the IP subnet address and location could
be stable for quite a long time. Hence, it is possible to derive location proximity from an IP address. However, the resolution of the proximity location varies widely with the size of the Internet service provider (ISP). Usually each ISP has been assigned a chunk of IP addresses; depending on the size of the ISP, the chunk can be large or small (class A, B, or C in IPv4). The ISP has the authority to further assign IP addresses within its own chunk to its own subnets, and such internal assignments are seldom exposed to outsiders. So if an ISP has a very large coverage area, the proximity information might be very coarse-grained. At the same time, CDN providers, such as Akamai [15] and Speedera [16], operate highly distributed networks comprising a few data centers and thousands of edge servers residing inside ISPs' networks or at the ISPs' POPs (points of presence) through bilateral contracts. These overlay networks run on top of the current Internet, and their nodes interconnect with each other through dedicated links purchased from local ISPs. Content is distributed among the edge servers and retrieved locally from the closest edge server. The global coverage and flexibility of overlay networks make it possible to use more advanced traffic management, failure recovery, denial-of-service protection, and request routing techniques to further improve users' experience. Because edge servers in overlay networks have smaller coverage areas, are closer to end hosts than the origin servers of content providers, and have access to internal IP address information, this infrastructure can achieve better performance and support more edge processing capabilities such as geotargeting, location-aware computing, and dynamic content reassembly. Hence, further leveraging their overlay network infrastructure, CDN providers offer proximity location services as value-added services.
EdgeScape, provided by Akamai [15], is a service that provides geographic, network, and corporate identity information for IP addresses on the Internet. The information provided by EdgeScape includes geographic information (country, area, latitude and longitude, time zone, zip code, etc.), network information (connection type, network name, and actual connection), and corporate identity (company name and domain name). The information is kept in a database maintained by the Akamai network. Using customers' IP addresses as parameters and invoking the EdgeScape API, service providers send inquiries to and retrieve information from the database. By integrating the EdgeScape API into their Web servers or application servers, service providers can realize more complicated business logic, including localized pricing, targeted promotions and advertising, content customization, service regulation, and fraud detection and prevention. The geotargeting service suite provided by Speedera [16] comprises both GeoPoint services and GeoTraffic analysis services. GeoPoint services are similar to Akamai's EdgeScape in providing geographic information to a content provider's application server to facilitate personalized and localized services. Content providers use the GeoPoint APIs to contact GeoPoint servers and receive geographic information that is continuously updated. GeoTraffic analysis services provide a comprehensive look at Web traffic from a geographic perspective.
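An IP-to-geography lookup of this general kind can be sketched as a longest-prefix match over a prefix-keyed database. The prefixes, records, and function shape below are invented for illustration; they are not Akamai's or Speedera's actual APIs or data.

```python
import ipaddress

# Toy geo database: IP prefix -> geographic record. The prefixes are from
# documentation address ranges and the records are made up.
geo_db = {
    ipaddress.ip_network("192.0.2.0/24"):    {"country": "US", "tz": "EST"},
    ipaddress.ip_network("198.51.100.0/25"): {"country": "DE", "tz": "CET"},
}

def lookup(addr):
    """Return the record of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in geo_db if ip in net]
    if not matches:
        return None
    # Longest prefix wins, mirroring how overlapping address allocations
    # (ISP chunk vs. internal subnet) are resolved.
    best = max(matches, key=lambda net: net.prefixlen)
    return geo_db[best]
```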
Weekly and monthly reports detail different resolutions of geographic data along with summarized network and proxy information. The services complement other Web analytics services, adding a geographic dimension to visitor information and enabling the content provider to prioritize content personalization initiatives, improve marketing and sales planning and execution, plan Web server architecture, and indicate appropriate revisions.
13.5.2 Outdoor Location Estimation Systems
Because ultrasound and infrared have only limited coverage areas, RF is the main medium used in outdoor location estimation systems. In this section, we introduce two systems, based on satellites and cellular networks respectively.
13.5.2.1 Location Estimation with GPS-Based Systems
The global navigation satellite system (GNSS) [17] is a generic term for satellite-based radio navigation systems designed to support worldwide high-accuracy position, velocity, and time estimation. The global positioning system (GPS) developed by the U.S. Department of Defense (DoD) and the GLONASS system later developed by the Soviet Union are the two currently operational GNSS systems. In this section, we cover the basics of the GPS system because of its popularity. GPS consists of three major segments: SPACE, CONTROL, and USER. The SPACE segment consists of 24 operational satellites that orbit the earth every 12 hours. There are often more than 24 operational satellites, as new ones are launched to replace older satellites. The satellites are divided into six equally spaced (60° apart) orbital planes, with four satellites in each plane. The orbits have an altitude of 20,200 km and an inclination angle of 55° with respect to the equatorial plane.
Such a constellation allows the satellites to repeat the same track and configuration over any point approximately every 24 hours (4 minutes earlier each day) and provides the user with between five and eight satellites visible from any point on the earth. The CONTROL segment consists of five monitor stations, three ground antennas, and a master control station (MCS). The monitor stations passively track all satellites in view, accumulating ranging data. This information is processed at the MCS to determine satellite orbits and to update each satellite's navigation message; the updated information is transmitted to each satellite via the ground antennas. The USER segment consists of antennas and receiver processors that provide positioning, velocity, and precise timing to users. GPS receivers have now been miniaturized to just a few integrated circuits and are embedded in various consumer electronic devices, such as cars, boats, laptop PCs, and PDAs. Initially operational in 1993 and fully operational in 1995, GPS originally provided two levels of service: the precise positioning service (PPS) and the standard positioning service (SPS). The PPS is a highly accurate military positioning, velocity, and timing service available to authorized users equipped with specialized
receivers. The PPS has an accuracy of 22 m horizontally, 27.7 m vertically, and 200 ns temporally [17]. The SPS, on the other hand, was targeted for civil usage, with a variable error deliberately introduced by the U.S. military in the satellite transmitters to degrade the accuracy to 100 m horizontally, 156 m vertically, and 340 ns temporally. Recognizing that GPS was becoming indispensable to the global information infrastructure, the U.S. government deactivated selective availability (SA), which was used to degrade SPS accuracy, on May 1, 2000, so that PPS-level accuracy is now available for civil usage as well. GPS location services are based on the time-of-arrival (TOA) technique. The distance between a user and a satellite is measured in terms of the transit time of the GPS signal from the satellite to the user. The satellites, which broadcast their positions, are the reference points shown in Figure 13.3. To uniquely determine a point in 3D space, three satellites are theoretically needed to provide three distinct distance measurements. However, to mitigate the clock bias between the GPS satellite clocks and the receiver clock, an extra fourth satellite is required. (Figure 13.3 shows the 2D case, which requires only three satellites.) The accuracy of basic GPS is influenced by several sources of random and systematic errors, including uncompensated errors in the satellite clocks, the accuracy of the predicted satellite positions, unmodeled propagation delays in the ionosphere and the troposphere, multipath fading, and receiver noise [17]. To mitigate such errors, differential GPS (DGPS) systems have been proposed. The design of DGPS is based on the fact that the errors associated with GPS measurements are similar for users located close to each other (within a few hundred kilometers) and change slowly in time (on the order of several seconds).
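The TOA positioning with an extra unknown for receiver clock bias can be sketched in 2D by Gauss-Newton iteration: three transmitters give three pseudoranges rho_i = ||p - s_i|| + b, enough to solve for (x, y, b), mirroring how the 3D case needs a fourth satellite. The transmitter positions and values below are invented (a real GPS solve is 3D with many modeled corrections).

```python
import math
import numpy as np

# Invented 2D "satellite" positions (units arbitrary).
sats = np.array([[0.0, 100.0], [80.0, 60.0], [-80.0, 60.0]])

def pseudoranges(p, bias):
    """True range to each transmitter plus a common clock-bias term."""
    return [math.dist(p, s) + bias for s in sats]

def solve(rho, iters=50):
    est = np.zeros(3)  # (x, y, clock bias expressed as a distance)
    for _ in range(iters):
        d = np.linalg.norm(sats - est[:2], axis=1)
        residual = np.asarray(rho) - (d + est[2])
        # Jacobian of each pseudorange w.r.t. (x, y, b): the unit vector
        # from the transmitter toward the estimate, plus 1 for the bias.
        J = np.column_stack([(est[0] - sats[:, 0]) / d,
                             (est[1] - sats[:, 1]) / d,
                             np.ones(len(sats))])
        est = est + np.linalg.solve(J, residual)
    return est
```

Dropping the bias column reduces this to plain trilateration, which is why one fewer reference point would suffice if the clocks were perfectly synchronized.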
DGPS therefore uses a reference receiver with a known location to estimate the measurement errors and broadcasts these errors to other nearby GPS receivers over a radio link. Neighboring GPS receivers then use this information to mitigate the errors in their own measurements, assuming that they experience the same measurement errors as the reference receiver. DGPS can provide meter-level and even submeter-level position estimates, depending on how close the user is to the reference point and on the latency of the corrections transmitted over the radio link.
13.5.2.2 Location Estimation with Cellular-Based Systems
In 1996 the FCC ruled that cellular operators must provide E-911 emergency services comparable to those currently available to most wireline network users. The deployment of the services is divided into two phases. In phase 1, both the cell location of the caller and an appropriate callback number must be reported to the public safety answering point (PSAP). In phase 2, the estimated location of the mobile station must be within 125 meters of its actual location for at least 67 percent of all wireless E-911 calls [18]. The requirements of phase 2 are hard to satisfy and need major modifications of the existing cellular network infrastructure; the location estimation algorithms are based on the triangulation algorithms discussed in the previous section. The requirements of phase 1, however, are based on proximity location information, so they are relatively easy to satisfy and only require simple add-ons to receive and detect E-911 calls. The system does not modify
13.5
LOCATION ESTIMATION SYSTEMS
461
the existing basestations and has no interaction with them except for access to the necessary antenna signals.
Phase 1 Estimation
Not limited just to E-911 services, the location proximity information of phase 1 can be applied to other applications such as location-sensitive billing, fraud detection, cellular system design, and resource management. Location-sensitive billing allows cellular carriers to offer different rates depending on whether a mobile station is used at home, in an office, or on the road. System designers can use the load information of each location to better position and tune cells, and thus improve spectrum utilization efficiency. In cellular networks, a service coverage area is divided into smaller hexagonal areas called cells. Each cell is served by a basestation with a systemwide unique station ID. Several basestations are controlled by a radio network controller (RNC), a number of which are in turn managed by a mobile switching center (MSC). A mobile station is active if it is powered on. Since the exact cell location of a mobile station is known to the network during a call, the main issue of location proximity is the location estimation of a mobile station between two consecutive calls. Two major operations are involved in location proximity: location update and paging [19]. A location update is performed by an active mobile station to refresh its location information in the cellular network. A paging operation is carried out by the cellular network to alert a mobile station to upcoming events such as an incoming call. There is a basic tradeoff between the updating cost and the paging cost. The updating cost is mainly bandwidth usage and mobile station power consumption. The paging cost is the paging traffic load and the paging delay. Among these parameters, paging delay is the main focus of location proximity algorithms because paging delay influences the availability of location information. The higher the frequency at which the mobile station updates its location, the smaller the paging area of the cellular network.
So, the optimization problem is to find the optimal location update and paging algorithms that minimize the total cost. Many schemes are surveyed by Zhang [19]. One scheme is to define a location area covering a few basestations; a mobile station updates its location whenever it enters a new location area. In this scheme, before paging succeeds, the location proximity is the current location area. Another scheme is the movement-based location update scheme, in which each mobile station keeps a counter initialized to zero. The counter increases by one whenever the mobile station crosses a cell border; when the counter reaches a predefined number M, the mobile station updates its location. In this case, before paging succeeds, the location proximity is a circle centered at the last updated cell with a radius of M cells. Considering the zigzag behavior between two adjacent cells, the actual position may be quite close to the previously updated cell. A variation of this scheme is the distance-based location update strategy, in which a mobile station updates its location only after it is more than a distance of M cells from the previously updated cell. In this
case, the location proximity is the perimeter of the circle centered at the last updated cell with a radius of M cells.
Phase 2 Estimation
Signal strength, angle of arrival (AOA), time of arrival (TOA), and time difference of arrival (TDOA) are the most important measurement algorithms used for location estimation in current cellular networks. As discussed earlier, signal-strength- and AOA-based systems are subject to severe interference from multipath and shadowing effects and are suitable only for applications requiring low location estimation accuracy. TOA and TDOA appear more appropriate for the high-accuracy location estimation required by phase 2 E-911 services and other applications. Since code-division multiple access (CDMA) and time-division multiple access (TDMA) are two dramatically different air interfaces used in current mobile networks, the time estimation techniques deployed in these two types of systems are introduced separately below. One common assumption for both types of systems is that time synchronization must be guaranteed among the participating basestations or mobile hosts; this is achieved either through GPS or through extra network components.
CDMA-BASED SYSTEM
The TOA estimates can be derived from the pseudonoise (PN) code acquisition and tracking algorithms employed in spread-spectrum receivers. The estimation usually has two phases: the coarse acquisition phase, which determines the time delay estimate to within a chip duration, and the fine acquisition phase, which maintains fine alignment between the locally generated and incoming PN sequences by using a delay-locked loop (DLL) or tau-dither loop (TDL) [18]. The TDOA estimates can be derived by forming the cross-correlation between the signals received at a pair of basestations. Assume that the signal received by basestation A is s_A(t) and the signal received by basestation B is s_B(t). Then the cross-correlation function of s_A(t) and s_B(t) is

C_{A,B}(τ) = (1/T) ∫₀^T s_A(t) s_B(t + τ) dt

The TDOA estimate is the value of τ that maximizes C_{A,B}(τ). TDOA can also be derived directly from TOA if TOA measurements at each basestation are available. Besides multipath, shadowing, and path attenuation, CDMA systems are subject to two other main sources of error. The first is multiple-access interference, usually called the "near-far" effect in CDMA systems: strong signals sent from nearby mobile stations make it difficult to correctly receive the weak signals from remote mobile hosts. Power control schemes combat the near-far effect by attempting to ensure that each user's signal is received with equal power at the basestation. But in location estimation systems, because each mobile station has to communicate with at least three basestations at the same time, it can set its transmission power to satisfy only one basestation's power control requirements. As a result, it may cause severe multiple-access interference in the cells of the other two basestations.
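The cross-correlation TDOA estimate can be sketched numerically with a discrete signal: basestation B receives the same transmission as basestation A, delayed by some number of samples, and the lag that maximizes the correlation recovers that delay. The synthetic noise signal and the 25-sample delay are invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
s_a = rng.standard_normal(1000)   # signal as received at basestation A
true_delay = 25                   # basestation B hears it 25 samples later
s_b = np.concatenate([np.zeros(true_delay), s_a[:-true_delay]])

# Discrete analog of C_{A,B}(tau) = (1/T) * integral of s_A(t) s_B(t + tau) dt,
# evaluated for non-negative lags; the maximizing lag is the TDOA in samples
# (divide by the sampling rate to get seconds).
n = len(s_a)
corr = [np.dot(s_a[: n - tau], s_b[tau:]) / n for tau in range(100)]
tdoa_samples = int(np.argmax(corr))
```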
The other error source is dilution of precision (DOP), which refers to the effect by which the geometry of the basestations relative to the mobile station can further degrade the accuracy of location estimation. In some cases, the accuracy of the estimate can vary by an order of magnitude or more due to DOP. Because cellular networks are usually designed to optimize communication services, the basestation configuration may not be good at minimizing DOP. For example, basestations might be set up along a popular highway in order to have good coverage and handover performance for drivers on the highway, but such a linear configuration of basestations incurs large DOP errors. Figure 13.8 uses AOA as an example to show the effect of DOP: geometry (a) has better performance than geometry (b) given the same AOA measurement error. Note that DOP also occurs in the TDMA systems described below.
Figure 13.8 Effect of dilution of precision (DOP). Both panels show basestations BS A and BS B, the correct measurement, the AOA error, and the resulting location error between the actual and erroneous locations; geometry (a) yields a smaller location error than geometry (b) for the same AOA error.
TDMA-BASED SYSTEM
The global system for mobile communications (GSM) is a widely adopted TDMA-based 2G system used in cellular networks around the world. In GSM systems, time-measurement (TOA or TDOA) location estimation also has higher accuracy than signal-strength- and AOA-based solutions [20]. In addition, timing measurements are inherent in the GSM standard as a way of ensuring proper slot framing. TOA is measured by arbitrarily imposing a mobile station handover to two or more basestations. After a handover happens, two mechanisms can be used to measure the TOA between the mobile station and the basestation. In the first mechanism, according to the GSM specification, a basestation informs the mobile station how to advance its frame timing to ensure proper framing synchronization. After two forced handovers, three timing advances are known and sufficient information has been collected to estimate the location. In the second mechanism, a mobile station sends a burst of known data to the current basestation so that the basestation can record the TOA. After two other basestations record the TOA in the same way, the location can be estimated using the three TOA measurements. However, under the current GSM specification, the unit of TOA is one bit period, which equates to a location accuracy of 554 m. After taking other error sources into account, the accuracy of systems based purely on TOA can be worse than 554 m. TDOA is measured using observed time difference (OTD) methods. Each mobile station monitors the OTD between at least three basestations. This information is known in both the idle and communicating modes. If the real-time difference values between basestations are known, the TDOA among these basestations can be derived. Once again, the unit of the estimate is still one bit period and the accuracy of location estimation is at best 554 m, which is worse than the phase 2 E-911 requirements.
To achieve TDOA measurements much more accurate than the current 1-bit resolution, current mobile stations have to be modified to support more accurate pseudosynchronization so as to locate the training sequence and combat multipath effects.
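The 554 m figure can be checked with a quick back-of-envelope calculation, using the standard GSM bit period of 48/13 microseconds (about 3.69 us); since the timing advance measures a round trip, one bit of timing resolution corresponds to c * Tb / 2 of one-way distance.

```python
# Back-of-envelope check of the "554 m" GSM timing resolution.

C = 299_792_458        # speed of light, m/s
TB = 48e-6 / 13        # GSM bit period, seconds (~3.6923e-6)

one_way_resolution = C * TB / 2
# ~553.5 m with the exact c; rounding c to 3e8 m/s gives the 554 m in the text.
```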
13.5.3
Location Format Transformation
For a given location estimation system, the number of output formats of the location estimate is limited by implementation complexity. However, both the number of location-based applications and their desired presentation formats are unlimited. In Section 13.2, we briefly introduced a simple taxonomy of location information. In practice, new types of location-based services and presentation formats will continue to evolve as ubiquitous mobile computing becomes more and more popular in our daily lives. Hence, the location format transformation (LFT) component, shown in Figure 13.1, plays an important role in transforming the information formats provided by a specific location estimation system into the different presentation formats understood by applications. During the transformation, LFT might cooperate with other information resources, such as various map databases and geoinformation processing vendors, to complete the transformation. Given the fact that the vendors of
location estimation systems, location format transformation, applications, and other resources can be completely different, the seamless integration and cooperation of these systems is not a trivial issue. As a result, the OpenGIS (geographic information system) Consortium (OGC) was formed. OpenGIS is defined as transparent access to heterogeneous geodata and geoprocessing resources in a networked environment. The goal of the OpenGIS project is to provide a comprehensive suite of open interface specifications that enable developers to write interoperating components that provide these capabilities [21]. OGC, consisting of GIS software vendors, database vendors, integrators, and application providers, manages consensus processes that result in standardized formats of geographic data expression and communication protocols to expedite data exchange among diverse geoprocessing systems. Some of the possible benefits of OGC are as follows [21]:
. Geolocation information should be easy to find, without regard to its physical location.
. Once found, geolocation information should be easy to access or acquire.
. Geolocation information from different sources should be easy to integrate, combine, or use in spatial analyses, even when the sources contain dissimilar types of data (raster, vector, coverage, etc.) or data with disparate feature-name schemas.
. Geolocation information from different sources should be easy to register, superimpose, and render for display.
. Special displays and visualizations, for specific audiences and purposes, should be easy to generate, even when many sources and types of data are involved.
. It should be easy, without expensive integration efforts, to incorporate geoprocessing resources from many software and content providers into enterprise information systems.
13.6
LOCATION SERVICES BASED ON CELLULAR SYSTEMS
In this section, we focus on how to provide location services to end users via a cellular network. First, we introduce a system architecture for providing location services based on a cellular system. Then, the mobile location protocol (MLP) is described. Finally, we introduce an example of a location service platform.

13.6.1 Location Service System Architecture

An architecture for providing location service (LCS) based on a cellular system is shown in Figure 13.9. There are four entities in the architecture: the LCS client, the gateway mobile location center (GMLC), the cellular network, and the handset. An LCS
466
ALGORITHMS AND INFRASTRUCTURES FOR LOCATION-BASED SERVICES
Figure 13.9
Location service system architecture.
client, such as an application service provider (ASP), is an end user of location information. The GMLC coordinates between LCS clients and cellular networks. There is a standardized communication interface (protocol) between the LCS client and the GMLC. One example is the mobile location protocol (MLP), which is standardized in 3GPP (Third-Generation Partnership Project) for the IMT-2000 system [22]. MLP was developed in the OMA (Open Mobile Alliance) for transmitting the location information of cellular users, measured in the core network of the cellular system, to external servers [23]. It provides a simple way for corporations and/or ASPs to utilize location services: a corporation or an ASP may use the system to obtain the location information of cellular users and then provide specific services to them. Within the cellular network there are also mechanisms and protocols, for example those specified by 3GPP [22], to support location services. For instance, a privacy function lets a cellular user determine whether (s)he wants to disclose his/her current location information. In general, there are two types of location services [24]:

1. The Third-Party Search Service. This service is provided to LCS clients who want to know the location of a mobile user. For example, a company may want to know the current locations of its salespeople and/or delivery cars. First, the LCS
client sends a request to the GMLC for the location information of a mobile user. The GMLC authenticates the LCS client after receiving the request. It then checks the privacy settings of the requested mobile user. A mobile user presets the privacy policy, for example, whether to disclose his/her location and, if so, to whom. If all the conditions are met, the user's location is measured by the cellular network and then delivered to the LCS client.

2. The User Location Notification Service. A mobile user can ask his/her mobile terminal to measure the current location and send the result to an LCS client. The LCS client, in turn, may provide the user with information related to that location. For example, a content provider can offer a mobile user information about all shops, or user-specific shops, nearby. The mobile terminal sends a request to the base station asking that its location be measured. On receiving the request, the radio network controller coordinates the base stations neighboring the terminal to measure the location using OTDOA. If the terminal is equipped with a GPS receiver, it sends the location data measured locally rather than a measurement request. The location information is then transmitted to the GMLC for authentication. If there is no problem, the GMLC forwards the location data to the corresponding LCS client.

13.6.2 Mobile Location Protocol

As described above, MLP is a protocol supporting the communication interface between the LCS client and the GMLC, which is specified as the "Le interface" in 3GPP. The MLP hierarchy consists of a transport layer, an element layer, and a service layer; Figure 13.10 depicts this three-layer hierarchy. The transport layer relies on underlying protocols such as HTTP, WSP, and SOAP to carry location information. XML (eXtensible Markup Language) is used to describe the basic functions in both the element and service layers.
The element layer defines the common functions of the different location information services in the service layer. This makes it possible to reuse the existing element layer when new functions are added to the service layer. In the service layer, multiple MLP services can be defined, as shown in Figure 13.10: the basic MLP supports the basic location services defined in 3GPP, while the advanced MLP provides more flexible and convenient location services. Since they are defined separately, adding new functions to one part need not affect the other part(s). Also, the basic common elements of each MLP service, defined as a sublayer of the service layer, can easily be reused. Table 13.1 lists the location services supported and specified in MLP 3.0. SLIS and ELIS belong to the third-party search type, while SLRS and ELRS belong to the user location notification type. TLRS is a new type of service that can be triggered by time, by period, or by the mobile terminal's operation. An implementation of MLP over HTTP can be described as follows. In the third-party search case, an LCS client sends an HTTP POST for service initiation to the GMLC. The GMLC responds with an HTTP response that includes the location information of the mobile user. In the user location
Figure 13.10
Three-layer hierarchy of MLP.
notification case, on the other hand, the GMLC sends an HTTP POST, including the location information of the mobile user requesting a location service, to the LCS client. In turn, the LCS client responds with an HTTP response for service initiation.
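As a rough sketch of the MLP-over-HTTP exchange just described, the fragment below builds a simplified third-party search request and parses a canned answer. This is illustrative only: the element names, the endpoint URL, and the msid value are placeholders rather than the normative OMA MLP 3.0 schema, and a real LCS client would POST the request bytes to the GMLC's Le interface and handle authentication and error results.

```python
# Sketch of a third-party search over MLP/HTTP. Simplified, illustrative
# element names -- NOT the normative OMA MLP 3.0 document types.
import xml.etree.ElementTree as ET

def build_slir(client_id: str, password: str, msid: str) -> bytes:
    """Build a simplified standard-location-immediate request document."""
    svc_init = ET.Element("svc_init", ver="3.0.0")
    hdr = ET.SubElement(svc_init, "hdr")
    client = ET.SubElement(hdr, "client")
    ET.SubElement(client, "id").text = client_id
    ET.SubElement(client, "pwd").text = password
    slir = ET.SubElement(svc_init, "slir")
    msids = ET.SubElement(slir, "msids")
    ET.SubElement(msids, "msid").text = msid
    # An LCS client would HTTP POST these bytes to the GMLC's MLP endpoint,
    # e.g. urllib.request.urlopen("https://gmlc.example.net/mlp", data=...).
    return ET.tostring(svc_init, encoding="utf-8")

def parse_slia(body: bytes):
    """Extract (msid, x, y) from a simplified GMLC answer document."""
    root = ET.fromstring(body)
    pos = root.find(".//pos")
    return (pos.findtext("msid"),
            float(pos.findtext(".//X")),
            float(pos.findtext(".//Y")))

# Canned GMLC answer standing in for the HTTP response body.
SAMPLE_SLIA = b"""<svc_result ver="3.0.0">
  <slia>
    <pos><msid>461011334411</msid>
      <pd><shape><point><coord><X>35.685</X><Y>139.751</Y></coord></point></shape></pd>
    </pos>
  </slia>
</svc_result>"""
```

In the user location notification case the direction is reversed: the GMLC would POST an answer-style document like SAMPLE_SLIA to the LCS client, which replies with the service-initiation response.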
13.6.3
Location Service Platform
In order to provide flexible and convenient location services for both mobile users and ASPs (LCS clients), it is necessary to develop a location service platform. As indicated in Section 13.6.1, location services can be provided via the architecture shown in Figure 13.9. Although the GMLC and MLP have been standardized by 3GPP and OMA, respectively, there are heterogeneous wired and wireless networks
TABLE 13.1 MLP-Specified Services

SLIS (standard location immediate service): Provides the location information of a mobile user to an LCS client based on the LCS client's request.
ELIS (emergency location immediate service): Provides the location information of a mobile user to an LCS client based on the LCS client's request in emergency cases.
SLRS (standard location reporting service): Provides the location information of a mobile user to an LCS client based on the user's request.
ELRS (emergency location reporting service): Provides the location information of a mobile user to an LCS client when an emergency call is initiated.
TLRS (triggered location reporting service): Provides the location information of a mobile user to an LCS client based on preset events.
in the real world. In 1999, NTT DoCoMo proposed the DoCoMo location platform (DLP) to provide a common platform enabling location services via various interfaces [25]; it is a solution for providing seamless location services in this heterogeneous environment. After two years of development and deployment, including various experiments, the DLP service started in 2001. In this section we describe the DLP as an example of how a location service is provided by an operator. The DLP provides two general functions: location information provision and ASP support. The location information provision function includes the user location search, user location notification, and third-party location search functions. The ASP support function includes the group information management, zone monitoring, and push-type information delivery management functions. Figure 13.11 shows the network configuration of a DLP network. Besides the functions described above, the DLP offers a number of communication interfaces to corporations and ASPs. In the case of the Internet, security protocols such as SSL (secure sockets layer) and TLS (transport layer security) are used to encrypt location information. There are five groups of servers in a DLP center: request reception servers, location measurement servers, user management servers, ASP support servers, and status monitoring servers. Request reception servers receive requests from mobile users and LCS clients through a unified interface, LISAP (location information service access protocol). Location measurement servers are responsible for measuring the location of a terminal subscribing to the DLP service. User management servers authenticate mobile users as well as LCS clients when receiving a request, and manage the location information of mobile users.
ASP support servers support ASPs via a CGI (common gateway interface) that ASPs can use to develop location-based applications easily. Finally, status monitoring servers collect logs and monitor/manage the status of all the servers. The service sequences based on these servers are depicted in Figure 13.12. With these sequences, the DLP supports a number of location-based applications.
Figure 13.11
Network configuration of DLP.
Figure 13.12 Service sequences: (a) location measurement sequence; (b) user location notification sequence; (c) user location registration sequence; (d) location reference sequence; (e) third-party search sequence.
REFERENCES

1. B. Schilit, N. Adams, and R. Want, Context-aware computing applications, Proc. Workshop on Mobile Computing Systems and Applications, Dec. 1994, pp. 85–90.
2. Allied Business Intelligence, Inc., Location Based Services: A Strategic Analysis of Wireless Technologies, Markets, and Trends, ABI report, 2000.
3. J. Hightower and G. Borriello, Location systems for ubiquitous computing, IEEE Comput. 34(8):57–66 (Aug. 2001).
4. J. Anderson, T. Rappaport, and S. Yoshida, Propagation measurements and models for wireless communications channels, IEEE Commun. Mag. 33(1):42–49 (Jan. 1995).
5. K. Pahlavan, X. Li, and J. Makela, Indoor geolocation science and technology, IEEE Commun. Mag. 40(2):112–118 (Feb. 2002).
6. R. J. Orr and G. D. Abowd, The smart floor: A mechanism for natural user identification and tracking, Proc. 2000 Conf. Human Factors in Computing Systems (CHI 2000), April 2000, pp. 275–276.
7. Microsoft Research, http://www.research.microsoft.com/easyliving/.
8. P. Bahl and V. N. Padmanabhan, RADAR: An in-building RF-based user location and tracking system, Proc. IEEE INFOCOM 2000, March 2000, pp. 775–784.
9. P. Castro, P. Chiu, T. Kremenek, and R. Muntz, A probabilistic room location service for wireless networked environments, Proc. Ubiquitous Computing, Sept. 2001, pp. 18–34.
10. P. Prasithsangaree, P. Krishnamurthy, and P. K. Chrysanthis, On indoor position location with wireless LANs, Proc. IEEE PIMRC 2002, Lisbon, Sept. 2002.
11. N. B. Priyantha, A. Chakraborty, and H. Balakrishnan, The cricket location-support system, Proc. MOBICOM 2000, Aug. 2000, pp. 32–43.
12. A. Harter, A. Hopper, P. Steggles, A. Ward, and P. Webster, The anatomy of a context-aware application, Proc. MOBICOM 1999, Aug. 1999, pp. 59–68.
13. M. Hazas and A. Ward, A novel broadband ultrasonic location system, Proc. UbiComp 2002, Sept. 2002, pp. 264–280.
14. R. Want, A. Hopper, V. Falcao, and J. Gibbons, The active badge location system, ACM Trans. Inform. Syst. 10(1):91–102 (Jan. 1992).
15. Akamai White Paper, Turbo-charging Dynamic Web Sites with Akamai EdgeSuite, 2001.
16. Speedera White Paper, Speedera Edge Delivery Network, Sept. 2001.
17. P. Enge and P. Misra, Special issue on global positioning system, Proc. IEEE 87(1) (Jan. 1999).
18. J. J. Caffery and G. L. Stuber, Overview of radiolocation in CDMA cellular systems, IEEE Commun. Mag. 36(4):38–45 (April 1998).
19. J. Zhang, Location management in cellular system, in Handbook of Wireless Networks and Mobile Computing, Wiley, New York, 2002, Chapter 2.
20. C. Drane, M. Macnaughtan, and C. Scott, Positioning GSM telephones, IEEE Commun. Mag. 36(4):46–54 (April 1998).
21. Open GIS Consortium webpage, http://www.opengis.org, 2003.
22. Function Stage 2 Description of LCS, 3GPP TS 23.271, version 6.4.0, June 2003.
23. Mobile Location Protocol, LIF TS 101, version 3.0.0, June 2002.
24. N. Miura and M. Takahata, Trends of location information distribution methods, NTT DoCoMo Tech. J. 9(4):34–43 (April 2001).
25. M. Fujita and M. Chikamori, DLP (DoCoMo location platform) service, NTT DoCoMo Tech. J. 10(1):41–47 (Jan. 2002).
CHAPTER 14
FIXED AND MOBILE WEB SERVICES

MICHAEL MAHAN1
Nokia Research Center, Burlington, Massachusetts
14.1
WEB SERVICES INTRODUCTION
Comprehensive coverage of the topic of Web services is impossible within the constraints of one book chapter. Web services are a highly volatile technical domain, and any complete treatment would both read like a Tolstoy novel and soon be outdated. Web services are an amalgamation of many detailed protocols—some leveraged by Web services and some created specifically for them. Advocates stress that the whole value of Web services exceeds the sum of these protocols. Web services describe the interactions between network-accessible runtime components, assembled to create distributed applications. They strike a sophisticated balance between the desirable, yet conflicting, goals of system extensibility and application interoperability. Web services diverge from the traditional Web in that the interactions carry application-customized, function-oriented data rather than presentation-oriented data universally understood by any browser. From a software engineering perspective, Web services open the way toward more efficient and effective software application development through component reuse and the sharing of runtime resources.

14.1.1 Web Services Defined

There is a broad movement in the information technology (IT) industry to expand the existing World Wide Web (Web) from a server-to-user interaction model to an application-to-application interaction model. The overall goal is to extend the Web from an information-oriented system to a service-oriented,1
With SOAP Message Security contribution (Section 14.2.4.7) from Frederick Hirsch, Nokia Mobile Phones, Burlington, MA.
Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.
transactional system. To accomplish this goal, new and openly developed Web services specifications will exploit and extend the existing foundation Web specifications. Competing IT vendors, open-source projects, and freeware providers will create and distribute Web services tools and platforms that comply with these new specifications. Developers will create distributed applications using these available Web services tools and platforms and deliver them to their users. What is unique about Web services is the creation of Web-based application middleware that is openly developed and specified through standards bodies. Taken together, the existing Web specifications plus the new and emerging Web services specifications form the requirements set for this new breed of application middleware. Corporate entities, open-source or freeware organizations, and individuals can take these requirements and implement partial to full solutions, ranging from simple message parsers to full development platforms to manageable service deployment platforms. A Web services developer chooses which middleware products or tools to use depending on his/her application constraints, such as cost, time to market, and available product support. Early adopters of Web services came from different communities with often contradictory visions and goals. These early adopters include the distributed computing industry, the EDI/B2B industry, the Web community itself, and proponents of the emerging Semantic Web. Because of the divergent views of these special-interest groups, the development of Web services standards and technologies has been slow, chaotic, and undisciplined. Adding to the difficulty, the foundation technology—the Web—is itself decentralized, and its organizational guardian, the World Wide Web Consortium (W3C), has only recently (as of 2003) started to produce a normative, definitive architectural description [1].
Thus, it is understandable that evolving an ill-defined, albeit highly successful, architecture, influenced by competing visions, has proved difficult. As an example of the chaos, even the most basic of definitions in this space—that of Web services proper—does not enjoy consensus. The term "Web services" first gained traction in the marketing domain rather than the technical domain. Technicians have tried to catch up; however, a single rigorous definition has eluded wide acceptance with this audience. Perhaps it is because the term was co-opted by marketers that the technical waters have been muddied. Recently, some consensus on a definition has been building within the W3C's Web Services Architecture working group, which was converging on the following definition in 2003:

A Web service is a distributed software system designed for the exchange of messages encoded with functional rather than presentation data. Web services use URIs for identifiers, and have interfaces described using XML (typically WSDL). Agents interact with the Web service in a manner prescribed by its description, using XML-based messages typically conveyed using HTTP, SOAP and other Web-related standards.
This definition characterizes Web services more as an architectural pattern, or a collection of system capabilities, than as a particular specification, application, or solution. This pattern or collection is a work in progress and is being realized by a set of
emerging open standards specifications, software tools, and development and execution platforms. One powerful, yet subtle, pattern expressed in the definition above is the ability to dynamically enhance the infrastructure itself—to define, develop, and deploy new system features as needed. Web services describe more than just an application messaging interface to a specific service. They can also describe optional interoperable infrastructure features that applications can use as needed. Such horizontal system features include, for example, security, reliability, transactions, privacy, identity management, and service composition. Through the application of intermediaries, Web services define the ability to deploy both functional and optimizing features at runtime. This extensibility is a highly sought property in any distributed system or middleware. Thus, we arrive at the crucial benefit that Web services can deliver to the distributed computing industry, and perhaps its most distinctive characteristic: Web services strike a sophisticated balance between system extensibility and application interoperability. Traditionally, a system architect must consciously trade off between these two desired properties, just as a software designer chooses between code size and execution speed. In contrast, the latest Web services specifications, from standards organizations such as the W3C, OASIS, and WS-I, accomplish the prime objective of application-to-application interoperability while supporting the extensible deployment of system capabilities in a robust fashion. The next few years should show whether Web services can truly deliver both promises commercially.
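The XML-based, function-oriented messaging named in the W3C definition above can be made concrete with a minimal sketch. Only the envelope namespace below comes from the SOAP 1.1 specification; the GetLastTradePrice payload, its namespace, and the symbol element are hypothetical stand-ins for an application-defined service interface.

```python
# Minimal sketch of a SOAP 1.1 message carrying function-oriented data.
# The payload element and its namespace are hypothetical; only the
# envelope namespace is defined by the SOAP 1.1 specification.
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
APP_NS = "http://example.com/stockquote"  # hypothetical application namespace

def build_request(symbol: str) -> bytes:
    """Wrap an application-defined operation call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetLastTradePrice")
    ET.SubElement(call, f"{{{APP_NS}}}symbol").text = symbol
    return ET.tostring(envelope, encoding="utf-8")

def extract_symbol(message: bytes) -> str:
    """What a provider would do on receipt: pull the functional data out."""
    root = ET.fromstring(message)
    return root.findtext(f".//{{{APP_NS}}}symbol")
```

Note that nothing here is presentation-oriented: the message is meaningful only to agents that share the interface description (typically a WSDL document), which is exactly the contrast with browser-rendered HTML drawn in the text.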
14.1.2 Service-Oriented Architectures

Web services are an example of a service-oriented architecture (SOA). Service-oriented architectures are distributed systems that promote the concept of atomic, self-contained services accessible and available to a multitude of applications. Hence, implementations of a SOA reduce the coupling between service providers and consumers. Client development and deployment is less dependent on service development and deployment, enabling clients to mix, match, and group services to best meet application requirements. For instance, a client might use many different airline Web services to check availability and pricing information, another Web service for credit card validation, and yet another for creating paper (hardcopy) tickets and shipping them to the buyer—all possibly using preexisting services. Applications built on a SOA should leverage the decoupling of applications from services in terms of requirements, semantics, and platform dependencies. In the example above, the travel agent client application needs to understand only the syntax and semantics of the service interface. All other details of the service are opaque to the client, including programming language, algorithms used, internal data representations and semantics, and platform execution environment. Likewise, the service needs to know little about the client application. In fact, at development time, the service does not need to know anything about any of the many potential clients that may use the service once it is deployed.
476
FIXED AND MOBILE WEB SERVICES
This decoupling enables emergent behavior: because application semantics are not built into the service, services are free to interact with any client that recognizes the service's interface. This in turn encourages client developers to create applications that are more nuanced, more complex, and more specialized to meet a customer's requirements. A Web services client developer uses the ubiquitous infrastructure to access and leverage these available, atomic services. The client developer binds the semantics of the application to these available Web services, and the user of the client gets the application semantics desired. The client developer focuses on understanding and codifying the application requirements and application semantics. The Web service provider focuses only on the service semantics and is completely ignorant of any client application's requirements and semantics. The Web services architecture is an instantiation of a SOA. The SOA model used by Web services is depicted in Figure 14.1. The Web services architecture defines roles for a service requestor, a service provider, and a service registry. Web services requestors and providers fulfill well-defined roles originally defined by traditional distributed systems. The difference with Web services is that requestors and providers interact using messages defined by open standards bodies, and providers describe their offered services in a standardized format. Layered on these two distinctions is one additional key differentiator: Web services requestors and providers are often developmentally decoupled—implemented and deployed by separate development teams. Web services can also be used to enable requestors to discover providers of suitable services.
Figure 14.1
Web services conceptual model.
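As a toy illustration of the requestor/provider/registry roles in Figure 14.1, the sketch below models publication and discovery within a single process. The interface names and the registry API are hypothetical; a deployed Web services registry (e.g., UDDI) would hold service descriptions and network endpoints rather than in-process callables.

```python
# Toy model of the service requestor / provider / registry pattern.
# Hypothetical API -- illustrates the decoupling, not a real registry.
from typing import Callable, Dict

class ServiceRegistry:
    def __init__(self) -> None:
        self._services: Dict[str, Callable] = {}

    def publish(self, interface: str, provider: Callable) -> None:
        """A provider advertises an implementation of a named interface."""
        self._services[interface] = provider

    def find(self, interface: str) -> Callable:
        """A requestor discovers a provider by interface name alone;
        it never learns the provider's implementation details."""
        return self._services[interface]

registry = ServiceRegistry()
# Provider side: deployed independently of any particular client.
registry.publish("CreditCheck", lambda card: len(card) == 16)
# Requestor side: binds to the interface at lookup time.
credit_check = registry.find("CreditCheck")
```

The point of the exercise is that the requestor's code mentions only the interface name; the provider could be replaced behind the registry without any change on the client side, which is the decoupling the text attributes to a SOA.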
Currently, Web services discovery is primarily a design-time endeavor. However, a Web service registry provides a necessary, yet not sufficient, condition for requestors to find and interact with providers without having a priori knowledge of the provider. Automated discovery is the goal of the emerging field of semantic Web services. Although automated discovery has been promoted as a prime benefit of Web services, without explicit semantic annotation it can occur only in isolated environments where all application semantics have been standardized or are well known by the participants and are encoded in the program logic of both the requestor and the provider. Much of software production still resembles individualized craft more than mass-produced product, similar to manufacturing industries before interchangeable components and assembly lines enabled mass production of complex goods. Service-oriented architectures and Web services are an attempt to bring interoperable service components to the software industry. SOAs can be viewed as the latest attempt by the computer engineering industry to improve software production methods. Earlier efforts included the development of procedural languages, object orientation, and component software, each of which increased software modularity and reuse. SOAs in general, and Web services in particular, extend previous efforts at modularity and reuse to include runtime elements and to distribute functionality across a network instead of encapsulating it within one runtime process or a set of linked libraries executing in a single memory space. Application functionality is partitioned and farmed out to perhaps competing service providers. These are distributed runtime components—extending reuse and modularity out to network-accessible services.
The goal, from an engineering perspective, is to produce and deploy distributed software that allows greater client access, results in higher-quality applications, and is delivered predictably on time and within budget. Will runtime interchangeable parts transform information engineering in the same way that industrial engineering was transformed by manufactured interchangeable parts? Given the increasing dependency of all segments of the economy on the efficiency and robustness of the information technology sector, the demand for such a transformation is high, and vendors who deliver solutions in this space will be rewarded. Web services have some unique service-oriented architecture properties. Although the Web is a distributed, highly decoupled information system, it is built around the assumption of human interaction. This makes it inappropriate to use Web servers as a set of interchangeable parts or services to programmatically form more complex or specialized applications. Other service-oriented approaches such as CORBA (Common Object Request Broker Architecture) are not built on the open Internet foundations of IP and the Web, are seen as vendor-encumbered, and are not as pervasive. Web services have the strength of leveraging the successful Internet infrastructure while providing a framework based on open standards.

14.1.3 Motivating Technologies: Creating a Mature Foundation

Web services draw on experience and mature technologies developed in the late 1990s, when technologists explored issues in a number of technology areas,
including distributed computing, business data interchange, and information distribution and retrieval. These efforts have contributed to the new software domain called Web services. From this perspective, the creation of Web services technology and specifications is a collective attempt to address many of the problems independently encountered by each motivating technology. Additionally, for some of the motivating technologies, the trend toward Web services concepts is a byproduct of attempts to reconcile a lack of widespread adoption. Of course, each of the motivating technologies brings tangible assets along with hard-to-solve issues.

. Distributed Computing. Distributed computing relies on middleware for object exchange or function invocation across a distributed application. Distributed applications typically are intranet-oriented; applications that span multiple trust boundaries have not gained significant traction. Leading technologies are DCOM, CORBA, and Java's RMI and Jini, along with distributed security initiatives such as Kerberos, DCE, and public key infrastructure. Distributed computing often suffers from interoperability issues across vendor tools and platforms, especially since it requires clients and servers to be tightly coupled in both a business and a development sense. Thus, this technology has not come close to meeting scalability expectations, especially when compared to the Web—a loose, ad hoc coupling of components traversed by following identifier links and observing a few simple rules. Distributed computing applications are often fragile in the event of partial failure and are inflexible to redistribution of objects or application functionality. They are also inflexible at runtime when new system components such as firewalls, proxies, and gateways are deployed. They tend to have a high administrative cost.
Finally, there is a significant technical barrier preventing distributed computing middleware from directly leveraging the desired global pervasiveness of the Web: the Web is a disconnected infrastructure, whereas vital middleware services such as security and transactions are typically connection-oriented.

. Electronic Data Interchange (EDI). EDI is a type of distributed system focusing on particular requirements from the business community—typically agreed-upon requirements for buying, selling, and trading between two or more trading partners. These are often referred to as business-to-business (B2B) requirements. B2B requirements are technology-oriented solutions to traditional business challenges. Much of the B2B focus has been on specifying automated mechanisms for uniform business needs—those that concern every business venture. These are horizontal issues such as service- and business-level agreements, multiparty transactions (including rollback schemes), and reliable messaging. EDI also addresses vertical market issues: defining market-specific vocabularies for industry-mandated semantics (insurance, commodity trading, etc.) and electronic exchanges or marketplaces where consumers and producers discover and interact with one another in an ad hoc fashion. In this vertical context, EDI methods were developed to meet specific governmental requirements for an industry such as aerospace or defense. EDI
systems are typically deployed on private networks and do not scale, especially since EDI suffers from being complex and inflexible.

. The Web. The Web is a globally accessible, distributed system designed for direct human consumption of remote information. The Web has enjoyed enormous success—it and email are the standard-bearer applications of the computer age. The Web uses UI application browsers to request and subsequently render information retrieved from a provider's Web server. Hence, a general-purpose client is used to deploy applications as diverse as stock trading, news, auctions, blogs (Web logs), peer-to-peer (P2P) applications (media sharing such as Napster clones and process sharing such as SETI@home), and simple to moderately complex retail services for secure commodity purchases (e.g., Amazon, Travelocity). The Web infrastructure does not require communicating parties to develop or preconfigure software and does not require them to have preestablished application-level knowledge of each other. This is because humans supply the semantics to Web applications at runtime, interpreting the information appropriately. This enables the Web to be a simple architecture with request–response messaging, a uniform set of actions (primarily GET and POST), and independent components loosely connected through hypermedia links. This is in contrast to distributed computing and EDI efforts, which required tight coupling, preestablished understanding, and extensive upfront costs and efforts. In addition, the Web scales well and promotes interoperability; however, it was designed for simple information retrieval, not for sophisticated business-to-consumer (B2C) or business-to-business (B2B) processing or complex, interactive applications. Business-oriented applications require solutions for message reliability, transactional support, security, service composition, and choreography. Complex applications may include multiple parties, asynchronous interactions, and service discovery.
The Semantic Web. The Semantic Web is an emerging set of specifications that, like Web services, build on the Web infrastructure to create sophisticated program-to-program software applications. However, Web services address only service syntax and rely on humans to supply the meaning of services and their interfaces to the consumer application. The Semantic Web aims to move beyond this human barrier to distributed system utility and sophistication. The Semantic Web defines an XML-based language for annotating resources and services, as well as well-defined meanings of terms, or ontologies. This allows a semantic processor to comprehend a resource or service and enables automated service discovery. A semantic processor embedded in a consumer application will be able to interrogate, at runtime, a repository of service descriptions and analyze whether any registered service meets its processing requirements to fulfill an active task. This semantic capability also enables the consumer application to understand the chosen service's interface sufficiently to automatically invoke the advertised service and comprehend the results. Hence, the Semantic Web describes
FIXED AND MOBILE WEB SERVICES
Web services in machine-understandable (not just machine-processible) form to enable runtime rather than design-time application-to-application discovery and non-a priori, or emergent, behavior. Distributed computing, EDI, and the Web have established roots in the computing industry and academia. In contrast, the Semantic Web has grown up in parallel with Web services and is still largely associated with research. These precursor technologies were necessary but not sufficient to create Web services. The catalyst was the advent and widespread adoption of the eXtensible Markup Language (XML)—a flexible, readable, and openly developed syntax for encoding data. XML is based on a mature technology for marking up content, SGML, contributing to its wide adoption. Given the advantages and availability of XML, parallel ideas germinated in all precursor technology camps to apply XML in each domain to solve problems or expand their solution space. What has become evident is that this new, large area of intersection between distributed computing, EDI, and the Web requires its own focus. This focused activity is Web services.
14.1.4 Quick Glance at Foundation and Core Technologies of Web Services

The genesis of Web services was the creation and widespread adoption of XML (eXtensible Markup Language) together with the already established Web. The business and technical potential produced by combining XML and the Web generated widespread attention and an incipient marketplace. Built on these foundation technologies are the core Web services specifications of SOAP and WSDL. SOAP, through version 1.1, is an acronym standing for simple object access protocol. Interestingly, the W3C kept the name SOAP for its standardized version 1.2, yet asserts that it is no longer an acronym. This is not cause for alarm, as the 1.2 work clarifies or expands on version 1.1 rather than narrowing its applicability. WSDL stands for the Web Services Description Language through all versions. Just as the Web is built on the services of the Internet [3], Web services depend on the distributed system backbone of the Web. The Web provides three essential elements that Web services exploit:

1. Globally addressable identifiers—URIs
2. Typed (integers, strings, binary) representations associated with identifiers
3. A transfer protocol to retrieve resource representations

The fundamental concept is that the Web is composed of resources, each of which has a globally addressable identifier called a uniform resource identifier (URI). In addition, each resource has a representation of state that is represented by structured data conforming to some well-known, yet unbounded, media type. Lastly, the Web provides a transfer protocol so that Web applications can exchange
resource representations. The primary Web protocol is HTTP (Hypertext Transfer Protocol) [4]. XML is an open standard data representation language that enjoys widespread adoption across IT vendors and computing platforms. XML is popular because of its openness, simplicity, expressiveness, extensibility, and readability. XML defines a simple syntax to structure and collect data into documents. Application designers use XML to define a markup vocabulary appropriate to their application and industry. (Standards bodies such as OASIS help define uniform industry vocabularies, allowing application-level interoperability.) Application grammar and the baseline XML syntax are used to structure the data through the technique of data markup. Hence, XML documents are application-customized to be both human-readable and machine-processible. Since any XML document must conform to the syntax rules defined by the XML specification [5], developers can choose from a host of available, general-purpose XML tools, including codecs, programming libraries, database libraries, and user interface tools. SOAP is the core Web services messaging technology.2 The SOAP specification addresses XML messaging between processing nodes supporting a distributed application. SOAP details how an XML message is structured, how the XML data are encoded (data typing), how a SOAP node must behave according to a specific processing model, and how the XML message is bound to an Internet protocol, namely, the Hypertext Transfer Protocol (HTTP). SOAP defines how the message infrastructure may be extended using message headers and how intermediaries must behave when SOAP messages are passed through them. The use of headers and intermediaries gives application architects great flexibility in how SOAP messaging is used to meet application requirements. The Web Services Description Language (WSDL) is the core Web services metadata technology.
The WSDL specification addresses the need for exposing Web service information that will enable a client to access and invoke that service. Thus, WSDL is akin to an Interface Definition Language (IDL) or a Java object's public interface. It is client-centric in that it specifies only what the client needs to know; all other service information consists of opaque implementation details. The minimal metadata that a WSDL document describes are the structure of the XML content to be used in a SOAP message and the transport protocol details required for the client to connect to and access the service. WSDL information may be obtained by a Web services consumer by a variety of techniques, depending on the requirements of the application, consumer, and service provider. It is expected that the early adoption of Web services will follow one of two models. First will be enterprise-oriented, intranet, back-office integration tasks. The second will be business-to-business integration between established, trusted partners. In both cases, client developers have a close association with service developers, enabling exchange of WSDL information as part of the development process (out of band). Use of a private registry will allow such closed communities to share WSDL information within the community, using a standard such as UDDI

2. As noted previously, as of SOAP version 1.2 this is no longer an acronym.
(Universal Description, Discovery and Integration). Future services may also support public registries for wider communities. Using WSDL and SOAP, a service provider is not limited to creating standard interfaces for new services only, but can also wrap existing legacy services with a SOAP interface and a metadata description. This enables legacy services to benefit from Web services interoperability and use the Web foundation technologies. The service provider can allow authorized consumers to use the service by applying server authorization mechanisms in conjunction with consumer authentication techniques. Emerging as another foundation standard in the Web services domain is the Web Services Interoperability Organization's (WS-I) Basic Profile 1.0 (BP 1.0). BP 1.0 addresses ambiguities in the SOAP 1.1 and WSDL 1.1 specifications, along with how SOAP 1.1 uses HTTP headers, status codes, and cookies. These usage clarifications are targeted to enable one of the key stated benefits of Web services—interoperability. Prior to BP 1.0, each Web services platform or tool vendor offered its own interpretation of the core specifications. This resulted in Web services applications that were isolated and could serve only a limited community. This situation was clearly antithetical to the goals of Web services, thus motivating the primary Web services vendors to create WS-I and to produce the BP 1.0 specification. To summarize, the core Web services specifications describe a messaging infrastructure, message processing rules, a service description syntax, a discovery system, and best-practices guidelines. These are based on Web standards for resource identification and transfer as well as the XML standards family (XML Schema, namespaces, XML Infoset, XML Canonicalization, etc.). Taken together, these specifications enable interoperability, yet have extension points allowing deployments to be customized appropriately. This allows infrastructure services (security, reliability, etc.)
to be customized in a standard, interoperable manner. It is this property of flexibility that makes Web services attractive to Internet and intranet service developers. Developers can rely on a predictable distributed infrastructure while architecting the correct mixture of functional or optimizing properties for each particular application's requirements.

14.1.5 Web Services Hype
XML and Web services have been oversold as a panacea, a silver bullet for various problems. XML has been marketed as a means to eliminate the difficult problem of defining the shared understanding and processing rules needed to integrate businesses. Web services have been sold as a means to trivially interconnect software without thought or effort. Using Web services as a way to create fantastic new software, seamlessly and automatically connecting any two business processes or applications anywhere on the network as if by magic, is unrealistic. Major research projects such as the Semantic Web project are addressing the issues of shared understanding and automatic integration of unintroduced applications and software components. This work will eventually be integrated with Web services efforts, but Web services technologies alone are not adequate to achieve such goals.
Similarly, XML enables software to operate only at the syntactic level. There is a widely promoted fallacy that data that can be parsed are sufficient for software to interoperate. At best, XML makes it possible for businesses or developer groups to share data, provided they agree on the semantics of those data in advance. Hence, XML provides data interoperability where shared semantics can be assumed. XML does nothing at all to create semantic interoperability. Although XML is labeled software's "lingua franca," it isn't even a "lingua." XML is an alphabet or token set that provides the primitives for describing larger concepts, and it works by allowing an unlimited number of semantic concepts to be encoded using those primitives. A true lingua consists of not only a set of tokens but also a specific grammar and the corresponding semantics. Projects like the Semantic Web and DAML-S are working toward defining a true "lingua franca" by defining ontologies, enabling meaning to be shared.
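This syntax-versus-semantics point can be sketched in a few lines of code. In the sketch below (element names and values are hypothetical, echoing the chapter's catalog examples), the parser guarantees only well-formedness; interpreting "partNumber" as a catalog key is a prior human agreement that XML itself does not provide:

```python
# Illustrative sketch: two parties that agreed on a vocabulary in advance
# can exchange XML, but the parser supplies only syntax -- the meaning of
# each tag is a convention established outside XML.
import xml.etree.ElementTree as ET

# Producer side: serialize an order using the agreed (hypothetical) vocabulary.
order = ET.Element("order")
ET.SubElement(order, "partNumber").text = "ABC-1234"
ET.SubElement(order, "quantity").text = "12"
document = ET.tostring(order, encoding="unicode")

# Consumer side: the parser checks well-formedness only; treating
# "partNumber" as a catalog key is the parties' prior agreement.
parsed = ET.fromstring(document)
part = parsed.findtext("partNumber")
qty = int(parsed.findtext("quantity"))
print(part, qty)  # ABC-1234 12
```

A consumer that had not agreed on this vocabulary could still parse the document, but could not act on it—exactly the gap the Semantic Web aims to close.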
14.2 WEB SERVICES FOUNDATION TECHNOLOGIES
14.2.1 XML/XML Schema

XML is a family of specifications developed by the W3C. The prime feature of XML is that it defines a format and syntax in which you can write specialized, application-specific grammars to express structured data. Web services use XML for both message structure and service definitions. XML is popular because it is simple, platform-independent, and readable. XML documents must conform to the XML syntax in order to be "well formed." To be valid, an XML document must also conform to a schema. XML documents are made up of element tags and attributes. Figure 14.2 lists an example XML document. An XML Schema is a document that describes the valid format of an XML dataset. This definition includes which elements are (and are not) allowed at any point, what the attributes of any element may be, and the permitted number of occurrences of elements. Hence, XML Schemas express shared vocabularies and allow machines to carry out rules made by people. DTDs, which came from XML's parent, SGML, were first used for this function. However, XML Schema has some distinct advantages; most notably, XML Schema can deal with namespaces and can constrain values to define meaningful application types (such as a part type) as well as complex data types. This allows automatic value checking when parsing XML. DTDs also use a non-XML grammar, making them difficult to understand and requiring specialized tools. An XML Schema associated with the example XML document in Figure 14.2 is presented in Figure 14.3.

14.2.2 The Web

The Web is both a foundation technology and a motivating technology for Web services. The Web is one of the most successful distributed systems ever built. The only other system that enjoys the same broad popularity is email. The Web
Figure 14.2 Example XML document.
can be characterized as a ubiquitous, shared information space coupled to a hypermedia processing model. The Web is ubiquitous—the primary Web protocols are simple and deployed on all major user-oriented computing platforms, even handheld telephony devices. The Web is a shared information space—accessible data are structured content representing whatever a service provider sees fit to publish. Hypermedia provide the ability to address remote content that is organized in some known structure. The hypermedia processing model defines data retrieval and presentation behavior, how to perform simple queries on content servers, and how to push new or modified content back to a content server. The definitive source for a definition of the Web is the W3C TAG (Technical Architecture Group). It categorizes the Web architecture into three primary areas:

. Identification—Web processes identify resources using uniform resource identifiers (URIs)
. Representation—Web processes represent the state of an identifiable resource
. Transfer—representational state transfer (REST)

The Web is currently a human-initiated client–server system based on browsers, document access, simple manual purchases, and file downloads. The Web owes
Figure 14.3 XML Schema corresponding to the example document.
its success to its simplicity, ubiquity, and extensibility. Content suppliers and consumers both exploit the ubiquity and pervasiveness of the Web. A service provider merely structures its content and sets up a Website—voilà—it has just joined a global data and service access system. Any user with a browser can then access the content and services through a universally resolvable address. The Web flourishes because of its robust and extensible architecture. New features and functions can be deployed on the Web's distributed infrastructure as long as they conform to the Web's principles. Content handlers and plugins enable new content types to be executed on the client application. Simple protocols keep the entry barrier low. For example, the data and service interface consists of only three methods, each with clear semantics: HTTP GET, POST, and PUT. On top of that, the content language, HTML, is a simple markup language. To obtain these highly desired characteristics of simplicity, ubiquity, and extensibility, the Web architecture is purposefully constrained. These architectural constraints are described and promoted by the network-based architectural style defined as representational state transfer (REST) [2]. In REST, as instantiated by the Web architecture, representations of resources are transferred from Web servers to Web clients. The REST constraints on a distributed hypermedia system are

1. Client–server—separates the user interface from the data storage system. Gains: portability, scalability, independent component evolvability.
2. Stateless communications—each interaction must contain all information necessary for the service to process the request. Gains: visibility (debuggability), reliability (easier recovery from partial failures), looser coupling of system components, and scalability (server components can be simpler and can quickly free resources). Tradeoffs: decreased network performance, loss of the server's complete control over system behavior.
Note that the deployed Web extensively uses cookies (caching server state on the client), which breaks this constraint.
3. Caching—some representations may be cached. Intermediaries may respond on behalf of a server with cached data. Gains: efficiency, scalability, and user-perceived performance. Tradeoff: potential loss of reliability.
4. Uniform interface—consistent interfaces for resource identification, resource manipulation through representations, self-describing messages, and messages as the embodiment of application state. Gains: simplicity, visibility, independent evolvability. Tradeoff: efficiency.
5. Layered system—each component "knows" only about the components with which it is interacting. Gains: bounds system complexity, independent component evolvability, legacy encapsulation, simpler components (infrequently used functionality moves to a shared intermediary), and intermediaries that can improve system scalability by enabling load balancing. Also enables security policies to be enforced on data crossing organizational boundaries (firewalls). Tradeoffs: adds overhead and latency to the processing
of data, reducing user-perceived performance (this can be offset by shared caching at intermediaries).
6. Code on demand—the client may download and execute code (e.g., applets, scripts, ActiveX controls, XSLT). Gains: simplicity (reducing the number of preimplemented client features) and runtime extensibility. Tradeoff: reduced visibility.

Taken together, these constraints create the large-scale effect of a shared information space that scales well and behaves predictably. URI and HTTP have been specially designed for REST interactions. REST components communicate by transferring a representation of a resource, selected dynamically based on the capabilities or desires of the recipient and the nature of the resource. The Web focuses on a shared understanding of data types with metadata, but limits the scope of what is revealed to a standardized interface. User agents, gateways, proxies, and origin servers are the main roles in which a component can act. A component may act in different roles depending on the interaction [1]. Are Web services an extension of the Web as defined by REST? REST proponents think that using the RPC style of SOAP, rather than the document style, violates the uniform interface constraint. REST proponents argue that a service-oriented system is a special case of a distributed shared information space like the Web. This viewpoint asserts that services are just resources that should be exposed as URIs by service description documents, and that the HTTP methods GET, POST, PUT, and DELETE are sufficient to perform all conceivable operations without resorting to custom (Web services) methods. Following this reasoning, REST capabilities support document-workflow-based interactions, such as passing a purchase order for processing, in a simpler and more robust fashion than Web services can with its custom interfaces.
As evidence, REST proponents point toward RPC-based distributed systems like CORBA, DCOM, and DCE, which failed to deploy at Internet scale because they have nonuniform interfaces. There is a large amount of controversy surrounding this argument, which is further slowing down the Web services working groups within the W3C.
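The uniform-interface constraint at the heart of this debate can be sketched in code. The following is a minimal, non-networked sketch (the class and the URIs are hypothetical, not part of any real HTTP library): every resource is named by a URI, and all resources are manipulated through the same small set of generic methods rather than per-service custom operations:

```python
# Sketch of REST's uniform interface: one generic set of methods
# (GET/PUT/DELETE analogues) applied to any resource named by a URI.
class ResourceStore:
    def __init__(self):
        self._resources = {}  # URI -> current representation

    def get(self, uri):
        # Retrieve a representation of the resource, or None if absent.
        return self._resources.get(uri)

    def put(self, uri, representation):
        # Create or replace the resource at a client-chosen URI.
        self._resources[uri] = representation

    def delete(self, uri):
        # Remove the resource; deleting a missing resource is a no-op.
        self._resources.pop(uri, None)

store = ResourceStore()
store.put("/orders/1", {"part": "ABC-1234", "qty": 12})
print(store.get("/orders/1"))
store.delete("/orders/1")
print(store.get("/orders/1"))  # None
```

In the REST view, a "submit purchase order" service is just a PUT or POST of an order representation to an order resource; no custom `submitOrder` method is needed.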
14.2.3 Web Services Standards

Figure 14.4 organizes the Web services standards by functional area: security, messaging, discovery, security management, and description. There are elements of a protocol stack in this diagram. However, the diagram is intended primarily to graphically separate the main functional areas and map the mostly ongoing standards work to these broad categories. The core and foundation technologies are found near the base of the diagram, and the most volatile technologies are toward the top. Given this volatility, this chapter will address only the foundation technologies of SOAP, WSDL, and the WS-I specification profiles. Note that WS-I specifications are profiles of selected specifications presented below and hence are not themselves part of the diagram, with the exception of calling out the Basic
Figure 14.4 Emerging Web services standards.
Profile 1.1 (BP 1.1). This specification is special in how it describes the use of attachments. Also note that the security stack is typically applied to the messaging stack and can also be applied to the discovery stack.

14.2.4 Simple Object Access Protocol (SOAP)

SOAP defines the intersection of two technology domains: XML and the Web. SOAP details how to structure and transfer data across the Web between participating Web service nodes. The transferred data are organized into messages. Each SOAP message is represented by an XML document. These XML documents must conform to SOAP rules regarding a baseline vocabulary and structure. In this fashion, SOAP constrains the message contents to a known structure and syntax. This promotes interoperability—clients and servers can rely on a mutually understood minimal message syntax. SOAP is arguably the primary Web services protocol and, as such, this chapter focuses on this specification. SOAP and XML-RPC were both derived from the insight of Dave Winer, who realized the potential of RPC over HTTP via XML in early 1998. The SOAP branch morphed from the XML-RPC branch as Microsoft and other companies adopted the concept and collaborated on the specification. These specifications were developed while XML Schema was evolving; as a result, SOAP not only included some XML Schema conventions and namespaces but also defined its own encoding model. The first SOAP specification was published in September 1999, authored by UserLand, DevelopMentor, and Microsoft. By December 1999, the protocol had stabilized with the release of version 1.1, and by then IBM and Lotus were part of the authoring team. In May 2000, this SOAP version was submitted to the W3C. The W3C established the XML Protocol (XMLP) working group in September 2000 to handle this submission. By June 2003, the W3C released SOAP 1.2 as a W3C recommendation.
Note that as of mid-2003, the 1.1 version of the SOAP specification is the most widely deployed and enjoys the most tool and platform support. However, tools and platforms are transitioning to SOAP 1.2 compliance during this same timeframe, indicating commercial support for the evolution of the SOAP specification. SOAP originally stood for simple object access protocol, but the latest version of the specification interestingly keeps the well-known acronym and drops the full name—probably because there is nothing particularly object-oriented about the technology, nor does it define an object model. SOAP is simple in the sense that the messaging section of the specification doesn't attempt to solve many of the nasty issues of distributed computing. These are features such as message reliability, end-to-end security, routing through application intermediaries, and multiple message dependencies. What the developers of the SOAP specification opted to do instead was to provide a generic mechanism that system developers can exploit to provide these horizontal features. In the latest SOAP specification (version 1.2), this is described in the section called the SOAP messaging framework [6].
14.2.4.1 SOAP Deployment Environments
SOAP application environments can range from simple to arbitrarily complex. Complex application environments may involve network intermediaries, multiple transport protocols, service discovery, or sophisticated message patterns. In the most straightforward and typical Web services use case, SOAP is used with the HTTP protocol to perform a single request and the corresponding response between two distributed processes—one assuming a client role and the other assuming a server role. Unlike the typical Web browser use case, the message content in both the request and the response is in the form of an XML document. These XML documents are not intended for rendering. Each document contains data targeted for processing at the application layer rather than at the presentation layer. SOAP can be used as an integration technology, providing the infrastructure to feasibly couple heterogeneous systems. This follows from SOAP's protocol neutrality and SOAP's flexibility regarding SOAP headers. Although SOAP defines a concrete binding to HTTP, SOAP messaging does not assume the presence of HTTP or even TCP/IP. Thus, SOAP is more "protocol-biased" than "protocol-neutral," because SOAP requires an outside-the-specification effort to map it to another transport protocol. However, some of these bindings have already been specified or productized—for instance, the SOAP bindings to email [9], BEEP (Blocks Extensible Exchange Protocol) [10], EJB (Enterprise JavaBeans), and JMS (Java Message Service). In addition, SOAP allows application designers to determine the separation between message metadata and message payload—there is no static set of SOAP headers to which an application must comply. This flexibility to define custom metadata is another key enabler for using SOAP as an application integration technology. Leveraging these properties, SOAP can be used to migrate protocol semantics between bridged protocols.
Differences in error propagation and compensation, message correlation, and message reliability can be addressed using SOAP headers to carry the data representing the semantic impedance. Thus, SOAP can be deployed as an integration framework used to bridge different protocols that do not share each other's full semantics. This is a popular early application of SOAP within the enterprise environment—back-office integration. Larger companies often start deployments of SOAP in this manner before opening up to application exchanges outside their trust boundary. SOAP applications can be deployed in a number of system patterns to solve a particular business problem. These patterns may involve SOAP intermediaries or non-SOAP nodes such as proxies and gateways. SOAP intermediaries are strictly defined as processing nodes that operate at the SOAP layer. They receive a SOAP message, understand and potentially modify it, and then forward it to the next SOAP node in the application's message path. SOAP and non-SOAP intermediaries can be categorized as either functional or optimizing. Functional intermediaries provide some functional processing that is required for the application to work. Functional intermediaries are often used to support requester authentication and service authorization. Optimizing intermediaries enhance an application yet are not required for the application to work. Optimizing intermediaries can often be introduced
at runtime rather than at design time—a significant capability that enables a provider to "tune" the service after the initial service deployment. Optimizing intermediaries typically enhance the scalability, reliability, or perceived performance of the provided service. Common deployment patterns of Web services intermediaries include

. Gateways. These are intermediaries that reside at the trust boundary of the Web services provider. Gateways enable an enterprise to provide a single point of contact for all requesters outside its trust domain to all the Web services that it hosts. Gateways may implement some business logic such as authentication, authorization, and privacy handling. Gateways assume the role of ultimate receiver of the message and hence are not true SOAP intermediaries.
. Proxies/Adapters. These intermediaries assume the roles of requester and provider, respectively, in order to enable legacy systems to participate in a Web services application. Like gateways, these are not intermediaries in the strict sense of the SOAP processing model, as they do not forward SOAP messages. Rather, proxies are intermediaries that generate SOAP messages, and adapters are intermediaries that terminate the SOAP message path—the ultimate receiver in "SOAP speak." Proxies enable non-SOAP service requesters to initiate SOAP requests, whereas adapters enable non-SOAP services to be exposed as Web services.
. Routers. A Web service message can follow a particular "path" through an arbitrary number of SOAP intermediaries, where each Web service intermediary provides a value-added service to the message and hence to the application. Note that for a request–response message exchange pattern (MEP), the request message may traverse different intermediaries than the response message does. Routing intermediaries may belong to the trust domain of the requester, the provider, or some third party.
. Dispatchers.
These SOAP intermediaries enable a Web service request message to be routed to one of several instances of a Web service provider based on some "filter" criteria applied to the message's contents/data. The dispatcher typically belongs to the trust domain of the Web services provider. Dispatchers may perform some form of "load sharing" or "data partitioning," so that requests for a particular Web service may be filtered on the presence and/or value of certain data elements in a SOAP request and diverted to an appropriate provider that hosts the resource to which the data in question are relevant. Dispatching intermediaries may be used to filter Web service messages on the namespaces used within the SOAP message, or to support service extensibility by directing requests for different versions of an interface to the appropriate runtime implementation.
. Orchestrators/Composers. These intermediaries offer a composite service interface built from the coordination of a set of individual Web services. Requesters see the composed service, not the component services. Orchestrators/composers assume the role of ultimate receiver of the message and hence are not true SOAP intermediaries.
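The dispatcher pattern above can be sketched briefly. In this hypothetical sketch (element names, part-number scheme, and provider URLs are all invented for illustration), the dispatcher inspects one data element in the request body and diverts the message to the provider instance that hosts the relevant partition of the data:

```python
# Sketch of a dispatching intermediary: route a request to one of several
# provider instances based on a "filter" applied to a body data element.
import xml.etree.ElementTree as ET

# Hypothetical data partitioning: each part-number prefix is hosted by a
# different provider instance.
PROVIDERS = {
    "ABC": "http://provider-a.example.com/catalog",
    "XYZ": "http://provider-b.example.com/catalog",
}

def dispatch(soap_request: str) -> str:
    """Return the provider endpoint responsible for this request's data."""
    envelope = ET.fromstring(soap_request)
    part = envelope.findtext(".//partNumber") or ""
    prefix = part.split("-")[0]
    # Divert to the provider hosting this data range; default to the first.
    return PROVIDERS.get(prefix, PROVIDERS["ABC"])

request = "<Envelope><Body><partNumber>XYZ-9876</partNumber></Body></Envelope>"
print(dispatch(request))  # http://provider-b.example.com/catalog
```

A real dispatcher would forward the unmodified SOAP message to the selected endpoint; only the routing decision is shown here.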
14.2.4.2 SOAP Example
The following example comes from the W3C architecture working group's usage scenarios [7]. It illustrates some of the features that differentiate SOAP from XML-RPC—use of SOAP extensibility (headers) and support for network intermediaries. The use case is a product price request between a SOAP sender and a SOAP receiver. A data caching intermediary sits between them to provide higher system performance. Figure 14.5 shows the system deployment and message flows. Figures 14.6–14.8 show the exchanged SOAP messages. Note that these are SOAP 1.2 examples using the namespaces and attributes associated with that version. The SOAP request message is routed through a caching intermediary. The caching intermediary (SOAP application 2) checks its caching store to see if it can directly respond to the request. Message path 2 in the SOAP application diagram corresponds to the intermediary directly supporting the request. If the intermediary cannot fulfill the request, it will forward the message to the catalog Web
SOAO Node acting as initial sender SOAP Sender SOAP Application 1
SOAP Application 2
Caching Handler
SOAP Application 3 SOAP Block 1
Caching Handler
Message Path 1 Message Path 2
SOAP Layer
SOAP Processor
Underlying Protocol Layer
SOAP Message
SOAP Processor
SOAP Message
XMLP Processor
Underlying Protocol Message Path
Underlying Protocol Intermediary e.g. HTTP proxy SMTP relay Host I
Host II
Host III
Host IV
Host V
Figure 14.5 SOAP distributed application using a caching intermediary [24]. [Copyright # 2002 W3C (MIT, INRIA, Keio University), all rights reserved.]
14.2
WEB SERVICES FOUNDATION TECHNOLOGIES
493
ABC-1234 Figure 14.6 SOAP request message for a cataloged price [24]. [Copyright # 2002 W3C (MIT, INRIA, Keio University), all rights reserved.]
service (SOAP application 3). This is message path 1. The catalog SOAP processor will respond to the request and insert into the message additional data targeted to the caching intermediary (see figure B). These additional data are placed in a SOAP header. This demonstrates SOAP’s extensibility mechanism—to provide a means to attach message metadata or intermediary targeted content—coupled with a deterministic proceesing model for handling these message headers. In this scenario, the
ABC-1234 2001-03-09T08:00:00Z ABC-1234 120.37 Figure 14.7 SOAP response emitted from catalog SOAP processor [24]. [Copyright # 2002 W3C (MIT, INRIA, Keio University), all rights reserved.]
494
FIXED AND MOBILE WEB SERVICES
ABC-1234 120.37 Figure 14.8 SOAP response received by originating SOAP sender [24]. [Copyright # 2002 W3C (MIT, INRIA, Keio University), all rights reserved.]
SOAP header is inserted to control any caches that may reside in any intermediaries along the message return path to the originating sender. The CacheControl header data are consumed by the caching intermediary. This message header must be processed at this SOAP node as determined by the semantics of the mustUnderstand and role attributes (more details later on these semantics). The application semantics of caching determines the behavior at the intermediary. In this case the salient data in the message payload are copied to the cache with the indicated index and expiration time. The CacheControl header is stripped by the intermediary (see Figure 14.8) and the message follows its path back to the SOAP sending application.

14.2.4.3 SOAP 1.1 Structure and Processing Model

SOAP version 1.1 primarily defines a message structure, a processing model, an encoding scheme for message data, an HTTP binding, and an RPC programming model mapping. Arguably, the buzz about Web services is due mostly to this specification. SOAP is similar to XML-RPC, which is to be expected given their shared origin. Many of the features are the same—comparable encoding rules, an RPC convention, and HTTP binding orientation. What is different is that SOAP allows custom data types, direct support for application intermediaries, an extensibility mechanism, and a defined processing model for these last two features. The cost to SOAP, relative to XML-RPC, is simplicity. It seems that the value of this tradeoff is borne out by SOAP’s popularity.

The SOAP message structure and processing model are tightly coupled. Hence, it is illustrative to discuss these topics together. Figure 14.9 demonstrates the general structure of a SOAP 1.1 message. A SOAP message is an XML document. The outermost container is the “envelope” element—thus the envelope is essentially the SOAP message. The envelope is where global namespaces are set. The envelope tag itself must be scoped to a namespace.
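The envelope/header/body structure can be sketched with Python’s standard library; this is an illustrative fragment, and the payload element names (GetLastTradePrice, symbol) are hypothetical:

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("env", ENV)

envelope = ET.Element(f"{{{ENV}}}Envelope")
ET.SubElement(envelope, f"{{{ENV}}}Header")          # optional; must come first
body = ET.SubElement(envelope, f"{{{ENV}}}Body")     # mandatory
payload = ET.SubElement(body, "GetLastTradePrice")   # hypothetical payload element
ET.SubElement(payload, "symbol").text = "DIS"

print(ET.tostring(envelope, encoding="unicode"))
```

Note that the Envelope, Header, and Body elements are all qualified with the envelope namespace, while the payload elements live in the application’s own (here unqualified) vocabulary.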
The envelope namespace declares the version of the SOAP message. For SOAP version 1.1, the namespace URI to use is http://schemas.xmlsoap.org/soap/envelope/. If a SOAP 1.1 processor receives a SOAP message that uses a different envelope namespace, it will generate a SOAP fault with a faultcode value of VersionMismatch. The SOAP envelope also can be used to declare the encoding style used to represent the message data. If the encoding style is the one defined by the SOAP specification (called “SOAP encoding”), then it is often declared here at the envelope using the URI http://schemas.xmlsoap.org/soap/encoding/. SOAP encoding will be discussed in more detail later in this chapter.

Figure 14.9 SOAP message structure.

The SOAP envelope is divided into two sections—one for header element blocks and the other for body element blocks. The SOAP header is intended to encapsulate ancillary content relative to the service being invoked. The SOAP body is intended to encapsulate the primary content of the message exchange. From a syntax viewpoint, the header and body elements are similar. Each must be an immediate child
element of the envelope element. Header elements are optional and, when present, must precede the single mandatory body element.

SOAP Body Processing

The body element is the container for data that maps to functional requirements of a SOAP application. This is the message payload. The payload contains information intended for processing by the main application logic. It can take the form of a remote procedure call or an XML document to exchange. The two canonical examples of these options are the stock quote and the purchase order, respectively. (Syntactically, the body element is an immediate child of the envelope element. If there is no header element, then the body element is the first child; if a header element does appear in the message, then the body element immediately follows it. The payload of the message is represented as child elements of body, and is serialized according to the chosen convention and encoding. Most of this chapter deals with the contents of the body and how to build payloads.)

The processing model for messages that do not contain header elements is simple and straightforward. The client encodes a service request into the body of a SOAP message and forwards the message to a SOAP node that is acting in the role of ultimate receiver or Web service provider. That node performs the service and typically sends back a response message, depending on the application semantics. An error may occur during service processing: the service provider may not recognize the content of the SOAP message body, the body’s content may be incomplete or erroneous, or some internal service processing problem may occur. The service would then generate a SOAP fault message describing the error and forward the message back to the client.

SOAP Header Processing

The SOAP 1.1 specification states that the use of SOAP headers is to extend a message without prior knowledge between the communicating parties. This definition seems both ambiguous and somewhat misleading.
The specification then cites SOAP header use for purposes such as authentication, transaction management, and payments. These examples better illustrate current SOAP header practice—to extend a base application with features that communicating SOAP nodes recognize and process. This is the dominant use case for SOAP headers—where the semantics of the SOAP header data are orthogonal to the semantics of the SOAP body payload. Consider a sophisticated travel agency Web service that reserves and purchases hotel bookings and airline tickets. The SOAP processors implement header elements to handle transactions, security, performance, message reliability, or message correlation. These are all features orthogonal to the offered service. In contrast, another class of SOAP header use fits the case where the semantics of header data and payload processing are mutually dependent. For instance, in the same use case, SOAP messages exchanging travel itinerary information could capture the traveler’s identity (name, address, etc.) in a SOAP header, where the SOAP body would carry the flight numbers, arrival and departure times, hotel
address, and other specifics. Thus, the SOAP header is used to carry application data that applies to the whole body payload in a direct, application-oriented manner. The third class of SOAP header usage is for intermediary processing, including the deliberate processing of data at intermediaries. The earlier SOAP example illustrating cache control demonstrates this style of header usage. Another example is routing messages to intermediaries that implement part of the application logic, such as credit card authorization, before routing the message to the ultimate receiver. Note that a SOAP message is not constrained to any one style of header usage. A SOAP message can use headers that map to any combination of the header types described above—or even to all types in some extreme case. The SOAP processing model adds novel value only in the case when SOAP headers are used to control intermediary behavior.

The SOAP processing model is scoped to a single message exchange routed from an originating sender to the ultimate (last) receiver. Hence, it doesn’t describe behavior involving more than a single message exchange between originator and destination—unlike multiple message exchanges such as request–response or publish–subscribe. Along the path from sender to ultimate receiver, the SOAP message may pass through SOAP intermediaries. A SOAP intermediary is a SOAP processor that receives a SOAP message, processes one or more of the SOAP headers in the message, and then forwards the message to either another intermediary or the ultimate receiver. SOAP messaging is hop to hop, so the SOAP processing model is defined in terms of individual node processing. The SOAP 1.2 processing model describes the behavior at each SOAP node and involves three steps:

1. Identify all the SOAP blocks to be processed. This means inspecting each SOAP header in turn and identifying whether that header is to be processed by the SOAP node.
The “role” attribute is used to target a specific SOAP node in the message path to process a specific SOAP header.

2. Verify that the SOAP node is capable of processing all above-identified headers that are attributed as “mandatory.” If the SOAP node is not capable of processing any of these headers, then generate a SOAP fault message targeted to the originating message sender. If identified SOAP headers are attributed as optional, they may be ignored. The mustUnderstand attribute is used to assert a mandatory request for the SOAP node to process the SOAP header.

3. If the SOAP node is an intermediary, then remove all identified SOAP header blocks from the message and then forward the message toward the ultimate receiver.

SOAP 1.2 extended the SOAP 1.1 processing model by offering more detail and options regarding role processing. The SOAP 1.1 processing model does not stipulate the processing order for header blocks. Some SOAP nodes process in lexical order, some in placement order. This has been a source of interoperability problems. Another interoperability issue is the smuggling of platform-dependent, proprietary headers into the SOAP
message. These actions contradict the goal of SOAP to be a cross-platform application messaging protocol. Some interoperability issues have been addressed in the WS-I Basic Profile 1.0, a document that profiles best practices for SOAP, WSDL, and UDDI. Web services specifications based upon SOAP 1.1 extensibility have been released in droves. Each of these defines a set of headers—their syntax and processing semantics. There are headers specified for security, transactions, reliability, business processing, asynchronous callbacks, message routing, and other generic requirements. Some of these specifications have been submitted to one of the Web services standards bodies, and more specifications assuredly will be. Once standardized, the resulting specifications will broaden the minimal SOAP protocol to support features required for some distributed applications. A system architect will be able to map the application requirements to the proper standards. Hence, SOAP extensibility enables system design flexibility.
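The three-step per-node processing model described above can be sketched in Python; this is an illustrative fragment only, and the role URI and understood header block are hypothetical:

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
NODE_ROLE = "http://example.org/roles/cache"            # role this node acts in (assumed)
UNDERSTOOD = {"{http://example.org/ns}CacheControl"}    # header blocks this node implements

def process_headers(envelope):
    """One node's pass over the header blocks, per the three steps above."""
    header = envelope.find(f"{{{ENV}}}Header")
    if header is None:
        return "ok"
    for block in list(header):
        # Step 1: identify blocks targeted at this node (SOAP 1.1 spells the
        # targeting attribute "actor"; SOAP 1.2 renames it "role").
        if block.get(f"{{{ENV}}}actor") != NODE_ROLE:
            continue
        # Step 2: a mandatory block we cannot process means a fault to the sender.
        if block.get(f"{{{ENV}}}mustUnderstand") == "1" and block.tag not in UNDERSTOOD:
            return "fault: MustUnderstand"
        # Step 3: an intermediary removes processed blocks before forwarding.
        header.remove(block)
    return "ok"

msg = (
    f'<env:Envelope xmlns:env="{ENV}" xmlns:n="http://example.org/ns">'
    f'<env:Header><n:CacheControl env:actor="{NODE_ROLE}" env:mustUnderstand="1"/>'
    "</env:Header><env:Body/></env:Envelope>"
)
print(process_headers(ET.fromstring(msg)))
```

The same loop run against a mustUnderstand header the node does not implement would return the fault indication instead of "ok".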
SOAP Fault Message Structure and Processing

For error processing, SOAP defines a SOAP fault message structure. A SOAP fault message is structured to support a SOAP processing error response associated with a received SOAP request. SOAP allows a single SOAP fault element, and it must be the only immediate child of the body element. The simple example in Figure 14.10 demonstrates an error thrown during payload processing. SOAP faults are mapped to handle either messaging or processing exceptions. SOAP messaging errors are generated when a SOAP node detects a poorly formed SOAP message or a SOAP message containing erroneous or incomplete information. SOAP processing errors are generated when a SOAP node throws an application-level exception not directly attributed to the structure or content of the received SOAP message. A SOAP fault contains an extensible faultcode element that allows a SOAP processor to branch appropriately when it receives the SOAP fault message. For complex applications involving intermediaries, a faultactor element will identify, by URI, the SOAP node that detected the error and generated the SOAP fault message.
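A minimal SOAP 1.1 fault can be sketched as follows; this is an illustrative fragment, and the fault code, reason text, and actor URI are hypothetical:

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("env", ENV)

def make_fault(code, reason, actor=None):
    """Build a fault message; Fault is the only immediate child of Body."""
    envelope = ET.Element(f"{{{ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{ENV}}}Body")
    fault = ET.SubElement(body, f"{{{ENV}}}Fault")
    ET.SubElement(fault, "faultcode").text = code        # e.g. env:Client, env:Server
    ET.SubElement(fault, "faultstring").text = reason    # human-readable explanation
    if actor is not None:
        ET.SubElement(fault, "faultactor").text = actor  # URI of the node that failed
    return ET.tostring(envelope, encoding="unicode")

print(make_fault("env:Client", "Unrecognized payload element",
                 actor="http://example.org/roles/cache"))
```

In SOAP 1.1 the fault subelements (faultcode, faultstring, faultactor) are unqualified, while the Fault element itself is in the envelope namespace.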
Figure 14.12 Document style SOAP.
Figure 14.14 SOAP 1.2 RPC response.
Using SOAP for RPC and choosing an appropriate protocol binding are treated as orthogonal concepts in the specification. In practice, mapping the RPC semantics to a protocol binding that does not clearly define a request–response message pattern will require extra design. HTTP correlates the two messages by reusing the same connection. Thus a message is certain to be a response if delivered along the same connection where an earlier request was sent. Binding to a different protocol may require the application designer to implement a correlation identifier directly as a SOAP header.

SOAP Encoding

SOAP encoding specifies how to represent typed data in a SOAP message. Encoding rules are necessary to be able to transform data from a programming environment into a SOAP message and back to a programming environment. SOAP encoding rules are specified in the SOAP 1.1 specification and are biased toward marshaling RPC parameters into and from their programming constructs. Encoding rules operate on a chosen set of datatypes. Recall that SOAP is not tied to any particular programming language and that a heterogeneous mixture of programming languages can implement a set of interacting SOAP nodes. Hence, the
encoding rules that SOAP employs must support datatypes common to popular programming and database languages. These datatypes include integers, strings, floating-point numbers, structures, and arrays. SOAP does not dictate which encoding rules to use. It is important that each application be allowed to choose the most appropriate encoding to suit its requirements. What tempers a proliferation of encoding rules is the prime objective of Web services—interoperability. What SOAP does is support a mechanism so that communicating SOAP nodes can be clear on what encoding is used to serialize a SOAP message. This is the encodingStyle attribute, and its value is a URI that indicates the controlling schema. This URI may indicate only a verbal or documented agreement on encoding rules, rather than an actual parsable schema to use for validation. When encodingStyle is attributed to the envelope element, it applies to the whole SOAP message. Although this is the typical case, it is not mandatory. Some messages will set the encodingStyle attribute on the subcontainer with data to be encoded. SOAP does not define a default encoding. If encodingStyle is not present on any container of a SOAP message, the receiving SOAP processor has no embedded clues about the message’s data serialization. SOAP defines an encoding as part of the specification. This is called “SOAP encoding” or “Section 5 encoding,” taken from the section of the SOAP 1.1 specification that defines it. Since XML Schema and SOAP were developed in the same timeframe, SOAP could take only partial advantage of this standard schema technology. XML Schema satisfies the SOAP encoding requirements for defining common datatypes and their corresponding encoding rules. SOAP encoding takes from XML Schema the datatypes’ namespace and the type attribute. XML Schema encoding is often used in place of the native SOAP encoding rules. WSDL, which was released after XML Schema, describes message formats using only XML Schema.
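The interplay of the encodingStyle attribute and XML Schema typing can be sketched as follows; this is an illustrative fragment, and the RPC element and parameter names are hypothetical:

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ENC = "http://schemas.xmlsoap.org/soap/encoding/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("env", ENV)
ET.register_namespace("xsi", XSI)

# encodingStyle on the envelope applies to the whole message; it could equally
# be set on just the subcontainer holding the encoded data.
envelope = ET.Element(f"{{{ENV}}}Envelope", {
    f"{{{ENV}}}encodingStyle": ENC,
    "xmlns:xsd": "http://www.w3.org/2001/XMLSchema",  # for the xsd: prefix below
})
body = ET.SubElement(envelope, f"{{{ENV}}}Body")
call = ET.SubElement(body, "GetLastTradePrice")          # hypothetical RPC element
symbol = ET.SubElement(call, "symbol", {f"{{{XSI}}}type": "xsd:string"})
symbol.text = "DIS"

print(ET.tostring(envelope, encoding="unicode"))
```

The xsi:type attribute is the XML Schema typing mechanism that SOAP encoding borrows to label each accessor with its datatype.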
SOAP encoding constitutes a data model based on XML along with corresponding encoding rules. This data model consists of simple types such as strings, integers, and floats. The SOAP data model also includes complex types such as arrays and structures, which are compositions of simple types. An application creates an instance of the data model that is represented as a graph of typed data. The SOAP encoding rules dictate how this graph is serialized into a SOAP message. To indicate this encoding in a SOAP message, the encodingStyle URI should be set to the value http://schemas.xmlsoap.org/soap/encoding/. This URI points to a schema that defines the SOAP datatypes and encoding rules. “Section 5” encoding has been popular for RPC-oriented SOAP applications. However, there is momentum toward document-style SOAP, and this requires XML Schema encoding to produce the required sophisticated data representations. Toward this end, the WS-I Basic Profile has deprecated “Section 5” encoding.

14.2.4.6 SOAP with Attachments

SOAP is principally an application-to-application messaging protocol. However, SOAP is increasingly acting as a content delivery conduit for arbitrarily typed
and sized data that are attached to the message. It is up to the SOAP application’s semantics whether this data attachment is to be used as an ancillary rider to the SOAP message or whether the attachment itself is the core reason for the transmission of the SOAP message. This second scenario may be exploited by applications whose requirement is to move binary data to another network application using a common transport protocol. Given an optimistic future for the uptake of SOAP, either scenario is plausible. For transmitting binary data or a large quantity of marked-up data, an application designer can choose between embedding the data (e.g., as base64-typed data) between XML tags inside the SOAP envelope or using an attachment mechanism. Attachments can increase efficiency by avoiding the need to parse the attached data as part of a large XML envelope. However, attachments themselves add some overhead to the message, and the application processor must be more sophisticated to understand attachment syntax and semantics. Attachments can be conveyed with a SOAP message using MIME multipart technology. This technique was defined in the “SOAP with attachments” (SwA) specification [13]. SwA defines how a message is to be carried within a MIME multipart form in such a way that the SOAP processing rules are preserved. The MIME multipart mechanism supports binary attachments such as images, sound files, or Word documents. To handle attachments, the XML Protocol (XMLP) working group of the W3C first published the SOAP 1.2 Attachment Feature [17]. The SOAP 1.2 Attachment Feature refers to SwA as a concrete attachment binding. The SOAP 1.2 Attachment Feature does not define concrete encoding rules for SOAP attachments. Rather, it defines an abstract SOAP 1.2 attachment feature that will enable various SOAP bindings to be defined that support the exchange of messages with binary attachments.
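The SwA packaging idea, a SOAP envelope as the root part of a MIME Multipart/Related package, with attachments referenced by Content-ID, can be sketched with Python’s email library. This is an illustrative approximation (SwA imposes further constraints on headers and part ordering), and the payload element names and Content-IDs are hypothetical:

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# The SOAP body references the attachment by its Content-ID ("cid:" URI).
soap = (
    f'<env:Envelope xmlns:env="{ENV}"><env:Body>'
    '<AddPhoto><photo href="cid:photo-1"/></AddPhoto>'
    "</env:Body></env:Envelope>"
)

package = MIMEMultipart("related", type="text/xml", start="<soap-envelope>")
root = MIMEText(soap, "xml")                 # the SOAP message is the root part
root.add_header("Content-ID", "<soap-envelope>")
package.attach(root)

photo = MIMEImage(b"\x89PNG placeholder bytes", "png")   # binary attachment
photo.add_header("Content-ID", "<photo-1>")
package.attach(photo)

print(package.as_string()[:120])
```

The receiving processor locates the root part via the start parameter, applies the normal SOAP processing rules to it, and resolves cid: references against the other parts.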
In addition to SwA, another attachments specification, WS-Attachments [18], is also referenced by the SOAP 1.2 Attachment Feature. WS-Attachments leverages DIME rather than MIME as the document attachment mechanism. DIME has the advantage of maintaining a length attribute for each payload component. Knowing the length relieves the receiving SOAP application from possibly significant overhead in processing and memory allocation. After creating the SOAP 1.2 Attachment Feature, the XMLP working group further analyzed the attachments space and found the existing work lacking. SwA did not fully consider attachments relative to the XML Infoset and the SOAP processing model expressed in SOAP 1.2. Additionally, opportunities for optimizing the exchange of SOAP messages that contain binary data had not been explored in a standards body. To this end, the XMLP working group has created a trio of new specifications. XML-binary Optimized Packaging [25] (XOP, pronounced “zop”) is an optimized serialization of an XML Infoset that contains binary content. Content that is typically base64 encoded is transmitted instead as its native octet stream; thus it does not incur either the message bloat or the processing overhead associated with base64 encoding binary data. XOP uses MIME Multipart/Related [26] as its packaging format. Also, XOP enables binary data to be conceptualized as typical base64-encoded content so it can be processed as a typical XML document. This layering thus enables a
XOP package to be digitally signed, for instance. SOAP Message Transmission Optimization Mechanism (MTOM) [27] is another XMLP specification; it describes how to use XOP with SOAP 1.2. A new media type is described that is to be used when a SOAP message Infoset is XOP encoded, thus alerting a receiving SOAP application to optimize appropriately during processing. The last of the trio of specifications is the SOAP resource representation header [28]. This specification describes a mechanism for carrying a representation of a Web resource as a SOAP header in situations where the receiver of this message would typically want to retrieve this representation. This capability is important in cases where the resource may be inaccessible or when it is desired to reduce the network overhead and eliminate the need for an additional HTTP GET on the resource.

14.2.4.7 SOAP Message Security

Security is an essential aspect of SOAP messaging due to the various risks associated with sharing information or performing transactions. Inappropriate information disclosure can impact ongoing negotiations; customer, partner, or employee relationships; adherence to privacy regulations; and reputation. Eavesdropping of SOAP messages can allow a passive attacker to obtain and misuse information. An attacker may also perform active attacks such as sending messages pretending to be someone else, changing message content, or resending old messages to obtain services or other side effects. Such attacks can result in products being shipped to the attacker rather than the original message sender, false information being used in decisionmaking, or services being obtained without payment, to give some examples. Such attacks may occur when a capable attacker exploits vulnerabilities to achieve a goal. Such threats may be mitigated by deploying security services as countermeasures.
Since there is a cost to security, security services need to be optional and used as needed depending on the application, risks, and costs. Some well-established security services include the following (complete definitions may be found in the Internet Security Glossary, RFC 2828):

. Message integrity—message content cannot be changed without the receiver detecting the change
. Message confidentiality—an eavesdropper is unable to view protected message content
. Mutual authentication—sender and receiver are able to determine that the other party is as expected
. Authorization—information can be conveyed to enable a party to determine whether to provide service
. Timeliness—messages cannot be reused without detection

Integrity requires more than a checksum, since in the context of security threats, forgery and substitutions also need to be detected. This is accomplished by using cryptographic methods to achieve data origin integrity, the ability to associate a party with the content. In addition to these security services, care must be taken
to mitigate the risk of denial-of-service attacks that can be used to either disable or degrade a service. Denial of service is hard to prevent, but measures can be taken to increase the cost to an attacker.

Mature technologies exist to provide transport security and can be used in certain circumstances when SOAP is bound to HTTP or another protocol that uses TCP/IP. A well-known solution is the secure sockets layer (SSL) and the revised RFC standard, transport layer security (TLS). SSL was a Netscape de facto standard that was later adopted and revised by the IETF to produce TLS in January 1999. This security technology may be used to provide integrity, timeliness, and confidentiality as well as mutual authentication. Server authentication is achieved using a server X.509 certificate. Client authentication may be achieved using a client X.509 certificate or, in the case of an HTTP binding, using HTTP basic or digest authentication in conjunction with SSL/TLS confidentiality and integrity protection. Care must be taken to configure SSL/TLS correctly to use the appropriate security features, algorithms, and key lengths.

Transport layer security is only appropriate for securing a link between two SSL/TLS endpoints. This is useful when SOAP is used with HTTP without any SOAP intermediary nodes, but not when SOAP intermediary nodes are present. Since SOAP intermediary nodes must be able to examine and possibly modify the SOAP headers, SSL/TLS must terminate at the intermediary. This means that SSL/TLS protection is lost at the intermediary node, exposing the payload of the message to risks if the SOAP node is compromised.

The SOAP Message Security specification under development at the Oasis Web Services Security technical committee is designed to specifically address security concerns related to SOAP messaging. This effort was initiated on July 9, 2002 with the charter referring to the April 5, 2002 WS-Security specification submission from IBM, Microsoft, and VeriSign.
The WSS committee specifications provide an example of how SOAP messaging may be extended using SOAP headers and processing rules to add optional infrastructure capabilities to Web services. The SOAP Message Security specification defines how to use the W3C XML Digital Signature recommendation to provide (1) integrity for any combination of SOAP header or body elements and (2) verification that claims made in security tokens are from the sender. It also defines use of the W3C XML Encryption recommendation to allow encryption of any combination of SOAP body elements and content, SOAP header elements and content, and attachments (but not the soap:Envelope, soap:Header, and soap:Body elements themselves, since the SOAP enveloping structure must remain intact). The specification defines a timestamp mechanism to support establishing the freshness of SOAP Security header blocks. Finally, the core specification and security token profiles define security tokens and associated mechanisms and processing rules. Security tokens are XML elements representing one or more claims made by the token authority, used to convey identity, key, authentication, authorization, or other information. The SOAP Message Security specification supports both XML and binary tokens and defines generic mechanisms to reference tokens. The Oasis committee is also
producing security token profiles for Username, X.509, Kerberos, SAML, and XrML security tokens. The username token may be used to convey a username and optionally a password for authentication, as well as supporting key derivation based on a username. The X.509 token profile outlines a means to convey X.509 certificate (and hence key) information, and the Kerberos token profile outlines conveying Kerberos tickets (and hence symmetric key information). The SAML profile outlines how to convey SAML assertions (authentication, authorization, and attribute assertions) and how the subject confirmation method is used in a SOAP messaging context. The XrML profile defines how rights management tokens may be used. Other token profiles may be developed in the future. The SOAP Message Security specification defines a SOAP security header block enabling security for both an ultimate receiver and SOAP intermediaries through the use of the SOAP role. A SOAP message may contain one or more wsse:Security header blocks, as needed, although only one may be defined for a specific role. Each security header block may contain a combination of XML signature elements, encryption reference lists, encrypted keys, wsu:Timestamp elements, or security tokens. Thus the security header contains signatures, information necessary to process encrypted message components, information useful to give security timeliness, and security tokens. The SOAP Message Security specification requires that XML signatures used for SOAP message security be placed in a SOAP security header block associated with the intended recipient (either the ultimate receiver or an intermediary, depending on the role) and recommends that supporting cryptographic key information also be conveyed in security tokens in that header block, although other ds:KeyInfo mechanisms are possible.
In order to fit with the SOAP message processing model, senders should not send enveloping signatures, nor should receivers use an EnvelopedSignatureTransform. The SecurityTokenReference provides a mechanism to reference security tokens, either directly using a URI or indirectly using an opaque identifier value or name as defined in the token profile. An embedded token may also be contained within a SecurityTokenReference element. When keys are conveyed in security tokens in the security header block, they may be referenced as an XML signature key by placing a SecurityTokenReference as a child of the ds:KeyInfo element. Security tokens may also be signed as part of an XML signature by referencing a SecurityTokenReference element with a ds:Reference URI and specifying a “STR Dereference Transform.” This transform dereferences the SecurityTokenReference, allowing the resulting security token to be hashed and included in the XML signature ds:Reference.

(The prefixes wsse and wsu refer to the SOAP Message Security namespace and the SOAP Message Security utility namespace, respectively, and ds refers to the XML Digital Signature namespace; the prefix strings are used here for clarity and are not normative.)
SOAP Message Security addresses confidentiality by using the W3C XML Encryption recommendation. Any SOAP message content may be encrypted, either elements or element content, except for the soap:Envelope, soap:Header, and soap:Body elements themselves. As required by XML Encryption, when an XML element or element content is encrypted, it is replaced by the xenc:EncryptedData element. Attachments may also be encrypted, with the attachment replaced by cipher text that is referenced from an EncryptedData element in the security header. The SOAP Message Security specification imposes additional requirements on the use of XML Encryption by requiring that a manifest of what has been encrypted be added to the security header block. Each encryption may use a different key, and if the encryption key needs to be conveyed explicitly, it may be referenced using a ds:KeyInfo element in the xenc:EncryptedData element. The ds:KeyInfo may include a SecurityTokenReference child when the key is conveyed in a security token in the security header. If a symmetric encryption key has been encrypted, then this encrypted key may be conveyed in the security header using a xenc:EncryptedKey element. A common case is to encrypt a symmetric key with a public key, for example. If an EncryptedKey element is used in the header, then it should contain a xenc:ReferenceList containing a xenc:DataReference element for each encryption. If the encryption key is not encrypted, then what was encrypted with it should be included in an xenc:ReferenceList in the security header block. Use of security techniques for SOAP messaging is being profiled by the WS-I Basic Security Profile working group, profiling use of SSL/TLS as well as SOAP Message Security and a subset of the token profiles. Such profiling is important since security techniques may be combined, but care must be taken to avoid introducing new security vulnerabilities.
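The overall shape of a wsse:Security header block can be sketched as follows; this is an illustrative fragment only (no actual signing or encryption is performed), the namespace URIs shown are those of the 2004 WSS 1.0 schemas, and the username and timestamp values are hypothetical:

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# WSS 1.0 (2004) namespace URIs, split across two lines for readability.
WSSE = ("http://docs.oasis-open.org/wss/2004/01/"
        "oasis-200401-wss-wssecurity-secext-1.0.xsd")
WSU = ("http://docs.oasis-open.org/wss/2004/01/"
       "oasis-200401-wss-wssecurity-utility-1.0.xsd")
for prefix, uri in (("env", ENV), ("wsse", WSSE), ("wsu", WSU)):
    ET.register_namespace(prefix, uri)

envelope = ET.Element(f"{{{ENV}}}Envelope")
header = ET.SubElement(envelope, f"{{{ENV}}}Header")
security = ET.SubElement(header, f"{{{WSSE}}}Security",
                         {f"{{{ENV}}}mustUnderstand": "1"})
token = ET.SubElement(security, f"{{{WSSE}}}UsernameToken")   # a security token
ET.SubElement(token, f"{{{WSSE}}}Username").text = "alice"
stamp = ET.SubElement(security, f"{{{WSU}}}Timestamp")        # freshness of the header
ET.SubElement(stamp, f"{{{WSU}}}Created").text = "2004-01-01T00:00:00Z"
ET.SubElement(envelope, f"{{{ENV}}}Body")

print(ET.tostring(envelope, encoding="unicode"))
```

In a real message the same header block would also carry the ds:Signature elements, xenc reference lists, and any SecurityTokenReference elements discussed above.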
14.2.4.8 SOAP 1.2 Changes The SOAP 1.2 specification was a W3C-proposed recommendation (PR) in summer 2003 and is on track to be a full recommendation (REC). There are some significant changes in SOAP 1.2. One large difference is that SOAP 1.2 is built around the XML Infoset [11] rather than XML 1.0 document syntax. This means that the SOAP structure is defined abstractly, instead of pre-encoded as an XML document. This enables alternate representations of the same SOAP information to be mapped to different protocol bindings. SOAP 1.2 has some syntax changes dealing with message structure. It addresses some ambiguities and missing features related to headers and intermediary processing, along with faults. The HTTP binding has been modified. The SOAPAction header is removed and is replaced by an optional parameter of the Content-type HTTP header. A new media type, application/soap+xml, replaces text/xml as the value of the Content-type HTTP header. SOAP 1.2 maps better to the HTTP status codes. The HTTP GET method is now incorporated into a SOAP
6. xenc is a prefix referring to the XML Encryption namespace; the prefix string used here for clarity is not normative.
14.2
WEB SERVICES FOUNDATION TECHNOLOGIES
511
response message exchange pattern. This leverages the safety and idempotence of GET requests for queries, instead of misusing HTTP POST for this purpose. SOAP 1.2 also adds some features to both the RPC style and the SOAP encoding. The SOAP GET binding is a response to REST advocates' criticism that SOAP messages with query semantics should work within the Web architecture principles and use the syntax and semantics of HTTP GET rather than tunnel over HTTP POST. Use of GET requires some subtle requirements to be met, namely, that the resource to query is reachable by a URI, there are no side effects on the service platform (the operation is safe), the query performed many times has the same results as if the query were performed only once (idempotence), and caching the retrieved data creates no security or privacy concerns. 14.2.5 WSDL The acronym WSDL stands for Web Services Description Language. The WSDL specification defines service interface metadata that can be used to formulate messages targeted to and from a Web service. SOAP is the primary message format that WSDL describes. WSDL 1.1 is an IBM and Microsoft submission to the W3C in spring 2001 [12]. In 2002, the W3C started a working group to turn this into a W3C Recommendation. Web services are about interoperability. Interoperability can be difficult to achieve even when the development process is controlled within an enterprise. Interoperability outside the enterprise becomes more elusive when you factor in multiple development teams, a variety of programming languages and platforms, geographic dispersal of developers and locale differences, and no single controlling party across the developed components to ensure quality. What is minimally required is an interface definition language (IDL) that is language- and platform-neutral for Web services. Other distributed systems such as CORBA and DCOM use an IDL to specify how to bind to and invoke services. For Web services, WSDL is the IDL.
WSDL is limited in its ability to encapsulate all the information necessary for a client process to fully use the offered Web service. This is because WSDL, like other IDLs, can define only the syntax, and not the semantics, of the service in a formal manner. Therefore, what a service actually does is left to humans to interpret. The missing formal semantics should not be underestimated. Missing or misunderstood semantics introduce errors and costly development cycles to fix them. This is the bane of most middleware, and it will not improve until technologies such as the semantic Web are fully deployed. WSDL is an XML document that describes three general areas of metadata necessary to invoke a Web service—message data, operations, and bindings. Message data defines any needed structure or datatypes of the exchanged data and the mapping of custom or XSD types into messages. Operations define how the service is accessed by the defined messages, such as an input/output message. Bindings define the transfer protocol that messages use for physical delivery. WSDL defines three bindings, for HTTP, for MIME, and, most importantly, for SOAP. It is the SOAP binding that gives WSDL its high value in the Web services domain.
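The three areas map directly onto the top-level elements of a WSDL document. The skeleton below is a sketch (the service name and target namespace are hypothetical); the comments indicate which area each element serves:

```xml
<definitions name="StockQuote"
             targetNamespace="http://example.com/stockquote.wsdl"
             xmlns:tns="http://example.com/stockquote.wsdl"
             xmlns="http://schemas.xmlsoap.org/wsdl/">
  <types>...</types>         <!-- message data: custom XSD datatypes -->
  <message>...</message>     <!-- message data: named, typed parts -->
  <portType>...</portType>   <!-- operations: message exchange patterns -->
  <binding>...</binding>     <!-- bindings: mapping onto SOAP/HTTP/MIME -->
  <service>...</service>     <!-- bindings: concrete network endpoints -->
</definitions>
```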
Figure 14.15
Simple WSDL definition.
The WSDL document structure defines elements to build up these three main concepts. Let's use examples and examine the subelements. WSDL Message Data—Types, Parts, and Messages First is the message data section. Two WSDL element constructs are used to define the data format of a Web service message. The WSDL types section defines the structure of custom typed data to be used in the message. Hence, these are datatypes tied semantically to the application and are not general datatypes. General datatypes are predefined by XSD (such as xsd:int) and are the foundation of custom-defined datatypes. The datatypes are then mapped into a WSDL message element. This mapping is done by message parts. Each message part is related to a defined datatype, possibly a custom datatype container modeling a data structure. For RPC-style Web services, each part typically corresponds to a function parameter. Message parts are also used when a message has multiple logical sections, each composed of an arbitrarily complex XSD structure. Modeling messages in this fashion supports document-style Web services. Both modeling styles are supported by WSDL. A message is independent of the binding and hence is said to be "abstract." Thus, a message may be used with SOAP, HTTP GET, MIME, or any other protocol binding. For RPC-style Web services, two messages must be described—the input/request message and the output/response message. WSDL elements and XSD allow many ways to model messages. Figures 14.15 and 14.16 give two examples; the first is simple, the second more complex and more realistic. The example in Figure 14.15 demonstrates the creation of WSDL messages without the need for custom types. These WSDL messages model RPC-style Web services. Each message part corresponds to an RPC parameter. Since custom datatypes are not needed, the types section has been omitted from the WSDL document.
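In the spirit of this first example (message and part names here are hypothetical), a request/response message pair for an RPC-style service can be built entirely from XSD primitive types, so no types section is required:

```xml
<!-- request: one part per RPC parameter, typed by an XSD primitive -->
<message name="GetLastTradePriceInput">
  <part name="tickerSymbol" type="xsd:string"/>
</message>
<!-- response: the RPC return value -->
<message name="GetLastTradePriceOutput">
  <part name="price" type="xsd:float"/>
</message>
```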
The second example demonstrates WSDL messages for a document-style Web service that models a factory production system. This demonstrates a WSDL message requiring custom types and multiple message parts. Custom types are defined using XSD constructs and build on the foundation XSD primitive types. Custom datatypes are created according to the semantics of the application. Message parts are used to logically group the data. In this example, there are two such parts; a single ProductionReport
Figure 14.16
WSDL custom types and multiple message parts.
message combines these two parts. The first part is straightforward—it models a structure that contains production information collected off the factory floor. The second part is more complex—it models what the product is manufactured to fill, intended either for warehoused stock or for one or more customer orders for that product. WSDL Operations—Message Exchange Patterns and portTypes The second WSDL concept to address is operations. An operation corresponds to a logical invocation of a Web service. WSDL 1.1 operations are defined in the context of one of four basic message exchange patterns called transmission primitives. Hence, operations define the sequence of messages required to logically complete a Web service invocation. Operations define input, output, or optional fault child elements. These correspond to the message flow relative to the Web service receiver, not the invoking sender. The four operation patterns defined by WSDL are:
- One-way—the Web service receives a message and generates no reply. This is modeled by only one child input element.
- Request-response—the Web service receives a message, and either sends a reply message that is correlated to the request message or generates a fault. This is modeled by one child input element, followed by an output element, followed by an optional fault element.
- Solicit-response—the Web service endpoint sends a message and receives a correlated reply. This pattern is the converse of the request-response pattern. Note that the response message should not have an "in" parameter
Figure 14.17
Example WSDL portType and operation.
because this pattern dictates, by definition, that the Web service cannot react to input data in the reply. WSDL does not define a binding for this pattern, so it may be of limited use in practice. This is modeled by one child output element, followed by an input element, followed by an optional fault element.
- Notification—the Web service emits a message. This corresponds to a broadcast or a publish-subscribe programming model. WSDL also does not define a binding for this pattern. This is modeled by only one child output element.
To support the tight syntax of RPC-style services, yet be flexible enough to handle document-style services, the operation element can optionally use the parameterOrder attribute. This attribute explicitly specifies the part order for RPC-style messages. Parameters are space-delimited and identified by part name. Related operations may be grouped into portType elements. The portType element is a logical container of functionally related operations, similar to object methods or COM interface methods. The portType element provides WSDL a mechanism to bind related operations to a concrete protocol. A WSDL document can host a number of portType container elements. The example in Figure 14.17 demonstrates both portType and operation elements. Three operations are modeled in Figure 14.17: two request-response and one notification. All are grouped in the same portType element because, as one service, they would likely be bound to the same protocol.
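As a sketch of these ideas (operation and message names are hypothetical, not taken from Figure 14.17), a portType grouping a request-response operation and a notification operation could look like:

```xml
<portType name="StockQuotePortType">
  <!-- request-response: a child input element followed by an output element -->
  <operation name="GetLastTradePrice">
    <input message="tns:GetLastTradePriceInput"/>
    <output message="tns:GetLastTradePriceOutput"/>
  </operation>
  <!-- notification: a single output element, emitted by the service -->
  <operation name="PriceChanged">
    <output message="tns:PriceChangedMessage"/>
  </operation>
</portType>
```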
WSDL Bindings—Concrete Protocols, Ports, and Services The last WSDL concept to address is bindings. WSDL bindings map the abstract datatypes, messages, and operations to a concrete physical representation of messages. There are
Figure 14.18
WSDL example from WSDL 1.1 specification.
three WSDL components required to define this mapping: binding, port, and service. A WSDL binding element maps a portType to an actual transport protocol. Bindings are defined in the WSDL 1.1 specification for SOAP, HTTP, and MIME. However, the SOAP 1.1 binding is the primary use case for WSDL. A binding does not specify a network address, just the protocol artifacts that remain static no matter which host machine the service is deployed on. For SOAP bindings, WSDL 1.1 describes examples of SOAP over HTTP (request-response operation) and SOAP over SMTP (one-way operation). Figure 14.18 presents a SOAP-over-HTTP binding example from the WSDL 1.1 specification. WSDL defines a set of extension elements for the binding element. Extension elements transition the WSDL document from the abstract to the concrete. Extension elements declare how a binding is realized—by using SOAP or HTTP or MIME. Hence, the role of the extension elements is to specify a concrete grammar in the notation of the physical binding. In the case of SOAP, a set of SOAP extension elements is used. The extension elements defined by the WSDL 1.1 specification are for SOAP, HTTP, and MIME. Concentrating on this example and the SOAP binding, the soap:binding element specifies SOAP over HTTP as the transport protocol and the document style of messaging. A correlation between the abstract operation and this binding is made by identifying that operation by name and creating a child soap:operation element. The soap:operation element defines the SOAPAction header to use. This is discussed in the SOAP section, but it is an HTTP-visible mechanism to dispatch this SOAP message to the proper component in the Web service. The soap:body element is used to describe the encoding of the input and output messages related to the containing operation. There are a number of options governing how the message is encoded and whether the indicated encoding applies to the whole message body or is targeted to individual message parts.
This simple example identifies literal encoding. This declares that the concrete schema definitions are "literally" defined in the WSDL types section. If the value of the "use" attribute is "encoded," then the datatypes used to construct the message parts in the WSDL types section should be thought of as "abstract." Another step is required to encode the data, and that step is defined by an encodingStyle attribute. The encodingStyle attribute value is a URI pointing to the encoding rules to use. For example, a soap:body element that indicates the SOAP encoding style is presented in Figure 14.19.
Figure 14.19
Declaring SOAP encoding in a WSDL binding.
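Combining these ideas, a hedged sketch of an RPC/encoded SOAP-over-HTTP binding (binding, operation, and namespace names are hypothetical, not those of Figures 14.18 and 14.19) might look like this; each soap:body declares use="encoded" and points at the SOAP encoding rules via encodingStyle:

```xml
<binding name="StockQuoteSoapBinding" type="tns:StockQuotePortType"
         xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/">
  <!-- concrete protocol: SOAP over HTTP, RPC-style messaging -->
  <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
  <!-- correlated to the abstract operation by name -->
  <operation name="GetLastTradePrice">
    <soap:operation soapAction="http://example.com/GetLastTradePrice"/>
    <input>
      <soap:body use="encoded" namespace="http://example.com/stockquote"
                 encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
    </input>
    <output>
      <soap:body use="encoded" namespace="http://example.com/stockquote"
                 encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
    </output>
  </operation>
</binding>
```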
Figure 14.20
Two WSDL services and associated elements.
All that is left for WSDL to declare is the runtime address of the defined Web service. This is done with the WSDL service and port elements. A service is a collection of ports. A port specifies a single network address for a binding. Hence, a port can be considered a concrete instantiation of an abstract portType. The address information is supplied by a protocol extension element. As with bindings, protocol extension elements are defined for SOAP and HTTP (the MIME binding assumes SOAP as a transport protocol). If a service groups multiple ports, then the ports are instantiations of the same portType. The ports must vary by either employing a different binding (protocol) or a different network address for the same binding. The ports are semantically equivalent and can be regarded as alternatives. In Figure 14.20, two services are offered by the WSDL file. The first service deploys three alternative endpoints—two SOAP and one HTTP—to the same logical set of operations. The SOAP ports reuse the same binding information, but use separate network addresses. A WSDL document can expose multiple services. The second service instantiates a port that doesn't expose the same portType.
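A sketch of two alternative SOAP ports in one service (service, port, and address names are hypothetical): both ports reuse the same binding, varying only in network address, so a client may treat them as interchangeable:

```xml
<service name="StockQuoteService">
  <!-- two alternative endpoints for the same binding -->
  <port name="StockQuotePortEast" binding="tns:StockQuoteSoapBinding">
    <soap:address location="http://east.example.com/stockquote"/>
  </port>
  <port name="StockQuotePortWest" binding="tns:StockQuoteSoapBinding">
    <soap:address location="http://west.example.com/stockquote"/>
  </port>
</service>
```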
WSDL 1.2 The W3C formed the Web Services Description working group in 2002 and had produced significant deliverables by summer 2003. In addition to a requirements document and a usage scenarios document, the group is creating the WSDL 1.2 specification. The current form of the specification is in three parts. WSDL Part 1: Core Language defines the overall document framework in terms of an abstract component model modeled as an XML Infoset. WSDL Part 2: Message Patterns defines the supported primitive message combinations used to support a Web service operation. Message patterns are binding independent and are defined in terms of sequence, direction, cardinality (single or multicast), and potential faults. WSDL Part 3: Bindings defines WSDL protocol extensions to use with SOAP and HTTP, along with an extension to use with the MIME message format. The WSDL 1.2 specification has made some changes relative to WSDL 1.1. Operator overloading was possible in 1.1, and this feature has been removed in 1.2. The portType element has been renamed interface, and interface inheritance is supported. The port element has been renamed endpoint. WSDL 1.2 is also defining a mechanism to describe extensibility in terms of features and properties. Features are described in SOAP 1.2. Features are defined abstractly, and their role is to introduce a new capability, such as security or correlation, which extends the SOAP processing model. Properties are concrete extension elements that support a given feature. Hence, properties and features should enable WSDL 1.2 to describe extended distributed processing functionality beyond the basic SOAP features.
14.2.6 Discovery Protocols
Web service registries will support the discovery of provided Web services by interested requestors. Discovery is critical when requestors and providers are unknown to one another. However, in this nascent phase of Web services technology, it will be more typical that business partners already have an established relationship before interacting with one another. All partners will typically know each other's capabilities in terms of both offered services and the quality of that service. However, some Web services proponents state that automating the discovery process is the key to the uptake of Web services. Lost in the discovery process is the transmission of service semantics. It is assumed that humans and natural languages will supply the semantics until technologies central to the semantic Web are deployed. Universal Description, Discovery, and Integration (UDDI) UDDI [19] is a technology that defines a Web services discovery mechanism based on a centralized registry model. It defines and describes the format for storing metadata on published services along with the APIs for both publishing and querying the registry. Registries are deployed and managed by an operator. A registry can support services for a single large enterprise, or across multiple enterprises. While UDDI can support either configuration, it is geared toward supporting service discovery across enterprises—in much the same way as the Yellow Pages organize offered services and providers. Besides detailed service metadata, UDDI registries store
business metadata relating to the service providers. The UDDI discovery process mandates that the requestor know the UDDI query SOAP API along with the UDDI registry endpoint. UDDI registries can be replicated across UDDI registry operators. Web Services Inspection Language (WS-IL) In contrast to UDDI, WS-IL [20] describes a service discovery model that is decentralized and relatively simple. WS-IL describes how a service provider exposes its offered services. WS-IL does two things to accomplish this: (1) it describes the structure of an inspection document or a document set that references a provider's available service descriptions, and (2) it describes a mechanism for a client to retrieve the inspection document(s). An inspection document is an XML document and is used to aggregate references for each service that the provider is exposing. A provider can partition the links to service description documents into more than one inspection document. In such a case, the inspection documents are linked, and the root inspection document can be used as a starting point for discovery queries. The service description references contained in an inspection document typically point to WSDL files. WS-IL also defines a binding to enable the inspection document to reference a UDDI-registered service. This enables a provider to register a service in a UDDI registry, yet still offer local inspection access to the service metadata. This has the advantage of avoiding duplicate service metadata. WS-IL describes two mechanisms to place and find inspection documents. The first is to name the root inspection document inspection.wsil and make it available via URL at the common entry point to the provider domain. For instance, the provider foo.com would place its root inspection document at the root of its Web server—http://foo.com/inspection.wsil.
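A sketch of such a root inspection document (the element usage here follows the WS-IL note as best recalled; the WSDL locations and the linked document are hypothetical): each service aggregates references to its description documents, and a link can point at a further inspection document:

```xml
<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/">
  <!-- one service, described by a WSDL file -->
  <service>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://foo.com/stockquote.wsdl"/>
  </service>
  <!-- references partitioned into a second, linked inspection document -->
  <link referencedNamespace="http://schemas.xmlsoap.org/ws/2001/10/inspection/"
        location="http://foo.com/catalog/moreservices.wsil"/>
</inspection>
```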
Another mechanism is to use the HTML META tag to provide links to the provider's inspection documents. Although not a public specification produced by a standards body, WS-IL concepts form a practical alternative to centralized registries. WS-IL consolidates concepts found in earlier technologies that promoted decentralized Web services registries. 14.2.7 Web Services Interoperability Organization (WS-I) One of the most fundamental goals for the establishment of Web services is interoperability. Interoperability is required when building new distributed applications under heterogeneous conditions—developed by different individuals, with different development tools, and deployed on different computing platforms. Another common scenario occurs when a Web services client is built to interact with a previously deployed Web service. In both cases, interoperability issues are exacerbated when the distributed application crosses one or more trust boundaries—all the difficulties of developing in heterogeneous environments are present, with the added issues of security, privacy, and reduced human access to application semantics. Interoperability is also required when using Web services for integration tasks,
connecting legacy systems, typically in a back-office environment. In all these situations, interoperability is impossible unless the Web services technologies used are consistently interpreted and implemented by all parties. Web services interoperability has been elusive. Different standards bodies (W3C, OASIS, IETF) generate Web services specifications; no single organization owns or coordinates the development and cohesion of the specifications. The problem is especially acute in Web services' infancy, as the primary implemented specifications, SOAP 1.1 and WSDL 1.1, are vendor-produced and not the result of a rigorous standards process. The resulting environment is one in which each platform and tool vendor must interpret gaps between or ambiguity within the specifications, which leads to divergent, noninteroperable implementations. Solving these issues is why WS-I was formed. WS-I was formed by the principal vendors of Web services tools and platforms. They collectively realized that this new technology domain of Web services would not gain traction in the marketplace unless it could deliver cross-platform and cross-tool interoperability. Pushing this responsibility down to a standards organization would result in loss of control over the process when time to market is essential. Hence, WS-I was formed in early 2002. WS-I brands versions of Web services specifications as interoperable profiles. Interoperable profiles identify target Web services technologies and provide clarifications on their usage both individually and in conjunction. The first profile is WS-I Basic Profile 1.0 (BP 1.0) and includes the specifications HTTP 1.1, XML Schema 1.0, SOAP 1.1, WSDL 1.1, and UDDI 1.0. The BP 1.0 became a final specification in summer 2003. The Basic Profile 1.1 (BP 1.1) is currently under development and pulls "SOAP with attachments" (SwA) into the BP umbrella. The Basic Security Profile 1.0 (BSP 1.0) is currently under development.
BSP 1.0 will profile security specifications appropriate for SOAP message security, including SSL/TLS and the OASIS SOAP Message Security specifications, as well as the Username, X.509, and Kerberos token profiles. In addition to generating profiles, WS-I also performs a profile validation function for participating vendor Web services products. This activity occurs in the WS-I Sample Applications working group. The Sample Applications group validates interoperability against a given profile by designing a mock application (supply chain management) with multiple Web service endpoints. Each vendor platform implements each endpoint and hosts each service to demonstrate cross-platform interoperability. Additionally, WS-I produces test tools that sniff SOAP messages on the wire and evaluate conformance of the messages against a target profile.
14.2.8 Mobile Terminal Web Services
The mobile domain is motivated to exploit the increasing availability of information and services available on the Web. Mobile terminals can now browse the Web and are becoming connected to personal information management (PIM) services (such as email, scheduling, and contacts) along with other enterprise systems. The demand will be to incorporate new applications and functions that make particular
sense to a mobile user, such as traffic and airport data and other localized information. There will also be a demand on the mobile industry to make these facilities available on more mid- and entry-level mobile phones. Given the staggering number of mobile terminals in use and the upgrade rate, the opportunities for all parties involved—consumers, enterprises, network operators, and service providers—are great. As Web services are built on the foundation of the Web, offering mobile Web services is dependent on the physical capabilities of the mobile Web. The mobile Web differs from the fixed Web in some significant ways. Mobile terminals are subject to some well-documented limitations. Mobile handset processing power is restricted in terms of CPU capabilities, addressable memory, and permanent memory. The user interface is limited in terms of both screen and input device. The data network is constrained in both data throughput and latency. However, on all these fronts, mobile terminals are improving dramatically. Mobile operating systems and user interfaces are becoming increasingly powerful and sophisticated—witness the Symbian operating system and the Nokia Series 60 platform. These advanced terminal platforms have opened up their local resources to host third-party applications. Current and widely deployed wireless packet-switching protocols (such as GPRS) offer greater capacity, higher bit rates, and always-connected functionality. These emerging mobile enhancements are creating positive conditions for terminal-hosted Web services applications. While direct terminal participation in Web services is coming, architectural alternatives can bridge the gap. The common architectural solution is to deploy a proxy into a network host that assumes the role of a SOAP client when facing the Web and connects to the mobile terminal using a supported technology. This enables the SOAP client to be arbitrarily complex.
The terminal application that connects to the proxy could be either general-purpose, such as a browser, or a custom application such as myAirportKiosk. These are temporary solutions that await a new generation of mobile terminals capable of hosting a Web services stack and addressing the issues that will obstruct the uptake of terminal-hosted mobile Web services applications. Two main issues of mobile Web services uptake are performance and openness. The constraints described above will limit the performance of Web services on the mobile device. Web services messages are large XML documents, so bandwidth, processing, and memory limitations all will impact perceived application performance. Of particular concern is the performance impact of marshaling and unmarshaling XML-serialized data to program memory. Hence, the serialization of the SOAP message XML Infoset into a binary representation (such as ASN.1) is being actively discussed in the mobile Internet industry and already being promoted by some vendors. The performance gains from binary serialization of the SOAP message are significant, and are made greater by sending the binary message without the tag set. The argument is that the data alone is sufficient, as each node already knows the message schema. These approaches, however, reduce the "transparency" of the system, which makes debugging difficult and runs counter to XML and Web precepts.
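The size argument for tag-less binary serialization can be illustrated with a toy comparison (the message fields and envelope content are entirely hypothetical, and this uses a fixed-layout binary record rather than ASN.1): when both endpoints know the schema, only the values need to travel.

```python
import struct

# A hypothetical three-field reading, once as SOAP XML and once as a
# schema-implied ("tag-less") binary record.
xml_msg = (
    '<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">'
    '<env:Body><Reading><station>HEL</station><tempC>21.5</tempC>'
    '<samples>1000</samples></Reading></env:Body></env:Envelope>'
)

# Both endpoints know the layout, so only the values are sent:
# 3-byte string + 4-byte float + 4-byte unsigned int = 11 bytes.
binary_msg = struct.pack("!3sfI", b"HEL", 21.5, 1000)

print(len(xml_msg), len(binary_msg))
```

The trade-off noted above applies: a network trace of binary_msg is opaque without the schema, whereas the XML form is self-describing.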
Open Mobile Alliance (OMA) The Open Mobile Alliance [22] is an organization of mobile device manufacturers, operators, and IT vendors. OMA is currently addressing how Web services can be leveraged by mobile operators. OMA is defining enabling services such as location, digital rights, and presence; use cases involving mobile subscribers, mobile operators, and service providers; an architecture for the access and deployment of enabling services; and a Web services framework for using secure SOAP in this environment. These Web service interfaces are intended to enhance a service provider's data for a particular mobile subscriber. A common scenario starts with a data request from some application (perhaps a mobile browser) to a service provider. The service provider then uses Web services to interact with a subscriber's mobile operator to retrieve some relevant data about the subscriber, such as location or presence. This data can be used to enhance the service provider's response to the initial request. J2ME Web Services J2ME Web Services is a specification created by the Java Community Process (JCP) under Java Specification Request (JSR) 172 [23]. J2ME Web Services builds out from J2ME, an implementation of the Mobile Information Device Profile (MIDP) specification. All MIDP implementations support HTTP, the primary transfer protocol to use with SOAP. Given the availability of HTTP networking, J2ME Web Services can focus on defining the additional components needed to parse XML and to create and consume SOAP messages, while constrained to meet platform size and performance metrics. To meet the XML parsing and platform size requirements, J2ME Web Services 1.0 does not provide DOM (Document Object Model) level 1.0 or 2.0 support and instead supports SAX (Simple API for XML) 2.0. DOM builds a tree data structure of a parsed XML document, and this can violate the memory constraint.
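The SAX model can be sketched as follows (shown in Python rather than the J2ME Java API, and with a hypothetical document; the streaming-callback idea is the same): the parser invokes handler callbacks as elements stream past, so no document tree is ever held in memory.

```python
import xml.sax

# Extract one field from a small response document via streaming callbacks.
class PriceHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.price = ""

    def startElement(self, name, attrs):
        # Track whether we are inside the element of interest.
        self.in_price = (name == "price")

    def characters(self, content):
        if self.in_price:
            self.price += content

doc = "<quote><symbol>NOK</symbol><price>13.50</price></quote>"
handler = PriceHandler()
xml.sax.parseString(doc.encode(), handler)
print(handler.price)
```

A DOM parser would instead materialize the whole document as a tree before the application sees any of it, which is exactly the cost MIDP-class devices cannot afford.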
J2ME Web Services also describes a WSDL-to-Java mapping and a runtime API to support generated stubs. There also exist third-party alternatives to JSR 172 implementations. These are XML and SOAP APIs for the MIDP platform and have the advantage of a smaller footprint and more mature implementations.
14.2.9 Privacy and Identity Management
Privacy is often an essential requirement for Web service deployments, mandated by regulations in some environments. One aspect of privacy is to ensure that identity attributes of a user are not disclosed during a Web service invocation. Anonymity in the Web services context is a mechanism used to ensure that the identity of a user is not disclosed, and it may be implemented using a pseudonym. This may be achieved by using an identity management solution that maps identities by federating accounts via pseudonyms, as done by Liberty Alliance [21], for example. Architecturally, use of pseudonyms requires an application intermediary that is inside the user's trust boundary. The application intermediary will relay the user request to the ultimate receiver after stripping out user-sensitive information and replacing any required data with intermediary-oriented data, including the pseudonym. Web service providers often require authentication of a user, but this does not
necessarily mean that the identity of the user is known to the service, as long as authentication was performed by a trusted party. A Web service user will typically interact with many Web service providers. Hence, each Web service user will need to maintain many identifiers, regardless of whether the user employs specialized-application Web service clients, a general-application Web services client (i.e., browser-based), or some mixture of these two client types. Service providers often capture the same sensitive information, such as address, personal preferences, and financial data. This duplicated, distributed data is apt to become stale, as each user's sensitive information may be volatile. A solution to these issues is to provide Web service users a federated network identity. A federated network identity will provide a Web service user a single repository for all online identities and sensitive information, including preferences and purchasing habits. Architecturally, an identity provider fulfils the role of a federated network identity. Each Web service user can then administer his/her multiple identities. The identity provider will then securely and selectively share identity and sensitive information with Web service providers. Identity management services can be built using Web service technology components, as well as adding value to Web services deployments.
14.3 CONCLUSION
Web services are an immature industry with strong backing from the businesses best positioned to capitalize if a robust market emerges. Because of this enormous interest, it is hard to distinguish true customer demand from the marketing machinery. However, Web services are a natural extension of the successful Web, and Web services deployment is increasing in the B2B environment, typically for integration tasks within an enterprise behind a secure perimeter. Once the Web services security specifications mature and gain considerable buy-in and platform support, the necessary preconditions for Web services to reach critical mass may be fulfilled. Secure Web services will enable businesses to confidently expose external service interfaces and consumers to confidently interact with these service providers. The next level of business process issues, such as message reliability, transaction support, and message addressing, will then draw the focus of IT vendors and the standards bodies. Until basic Web services messaging, interface descriptions, and security are resolved and widely adopted, however, Web services technology will have limited application. The good news is that between the WS-I Basic Profile and the SOAP and WSDL specifications, two-thirds of the necessary and sufficient conditions have been met. Once the security standards make their way through the standards bodies and are adopted, the industry will truly witness the value that the market places on Web services.
REFERENCES

1. I. Jacobs, ed., The Architecture of the World Wide Web, W3C TAG; http://www.w3.org/2001/tag/webarch/.
2. R. Fielding and R. Taylor, Principled design of the modern Web architecture, Proc. 2000 Int. Conf. Software Engineering (ICSE 2000), Limerick, Ireland, June 2000, pp. 407–416.
3. B. Carpenter, Architectural Principles of the Internet, RFC 1958, IETF, June 1996; http://www.ietf.org/rfc/rfc1958.txt.
4. R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, Hypertext Transfer Protocol—HTTP/1.1, RFC 2616, IETF, June 1999; http://www.ietf.org/rfc/rfc2616.txt.
5. T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau, eds., Extensible Markup Language (XML) 1.0, 2nd ed., W3C Recommendation, Oct. 6, 2000; http://www.w3.org/TR/REC-xml.
6. M. Gudgin, M. Hadley, N. Mendelsohn, J.-J. Moreau, and H. F. Nielsen, SOAP Version 1.2 Part 1: Messaging Framework; http://www.w3.org/2000/xp/Group/.
7. H. Haas and D. Orchard, Web Services Architecture Usage Scenarios, W3C Working Draft; http://www.w3.org/TR/2002/WD-ws-arch-scenarios-20020730/.
8. N. Mitra, SOAP Version 1.2 Part 0: Primer; http://www.w3.org/2000/xp/Group/2/06/LC/soap12-part0.html.
9. M. Mountain et al., SOAP Version 1.2 Email Binding, W3C Note; http://www.w3.org/TR/2002/NOTE-soap12-email-20020626.
10. Using the Simple Object Access Protocol (SOAP) in Blocks Extensible Exchange Protocol (BEEP), RFC 3288, IETF; http://www.ietf.org/rfc/rfc3288.txt.
11. J. Cowan and R. Tobin, XML Information Set, W3C Recommendation, Oct. 24, 2001; http://www.w3.org/TR/xml-infoset/.
12. Web Services Description Language (WSDL) 1.1, W3C Note; http://www.w3.org/TR/wsdl.
13. D. Box et al., Simple Object Access Protocol (SOAP) 1.1, W3C Note, May 8, 2000; http://www.w3.org/TR/SOAP/.
14. K. Ballinger et al., Basic Profile 1.0, WS-I Board Approval Draft, June 21, 2003; http://www.ws-i.org/Profiles/Basic/2003-06/BasicProfile-1.0BdAD.html.
15. (a) D. Eastlake et al., XML-Signature Syntax and Processing, W3C Recommendation, Feb. 12, 2002; http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/; (b) D. Eastlake and J. Reagle, XML Encryption Syntax and Processing, W3C Recommendation, Dec. 10, 2002; http://www.w3.org/TR/2002/REC-xmlenc-core-20021210/.
16. R. Fielding, Architectural Styles and the Design of Network-Based Software Architectures, Ph.D. dissertation, Univ. of California, Irvine, 2000; http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm.
17. H. F. Nielsen and H. Ruellan, SOAP 1.2 Attachment Feature, W3C Working Draft, Sept. 24, 2002; http://www.w3.org/TR/soap12-af/.
18. H. F. Nielsen, E. Christensen, and J. Farrell, WS-Attachments, Internet Draft, June 2002.
19. UDDI Version 3 Specification; http://uddi.org/.
20. K. Ballinger et al., Web Services Inspection Language (WS-Inspection) 1.0; http://www-106.ibm.com/developerworks/webservices/library/ws-wsilspec.html.
21. Liberty Alliance; http://www.projectliberty.org/.
22. Open Mobile Alliance; http://www.openmobilealliance.org/.
23. Java Community Process, JSR 172; http://www.jcp.org/en/jsr/detail?id=172.
24. J. Ibbotson, ed., SOAP Version 1.2 Usage Scenarios, W3C Working Draft; http://www.w3.org/TR/2002/WD-xmlp-scenarios-20020626/.
25. N. Mendelsohn, M. Nottingham, and H. Ruellan, eds., XML-binary Optimized Packaging, W3C Working Draft; http://www.w3.org/TR/2004/WD-xop10-20040608/.
26. E. Levinson, The MIME Multipart/Related Content-type, RFC 2387, IETF, Aug. 1998; http://www.ietf.org/rfc/rfc2387.txt.
27. N. Mendelsohn, M. Nottingham, and H. Ruellan, eds., SOAP Message Transmission Optimization Mechanism, W3C Working Draft; http://www.w3.org/TR/2004/WD-soap12-mtom-20040608/.
28. A. Karmarkar, M. Gudgin, and Y. Lafon, eds., SOAP Resource Representation Header, W3C Working Draft; http://www.w3.org/TR/2004/WD-soap12-rep-20040608/.
INDEX
AAC audio, 392
Absolute location, 442
Access, generally: control, security services, 374; Internet service providers (ISPs), 109; offline, 144–145, 265; points, wireless LAN network, 142; workload characterization, 144–145
Acknowledgments (ACKs): negative, 95–98, 346; positive, 93, 95, 345
ACME (architecture for content delivery in the mobile environment): defined, 5–6; overview of, 183–185; performance analysis: in CDMA networks, 191, 196–197; slotted ALOHA system, 191–196; system description, 191; in radio resource management, 201; terminal power consumption in, 185–186, 198; user interest correlation: algorithm, 197–198; simulations, 198–200; traces, 198
ACME Director: characteristics of, 190–192, 197–198; development of, 185–186; effectiveness, 198–200; terminal power consumption, 198–199
Active cache, 90
Active handover, IP multicast system, 344
ActiveX, 487
Activity factor, defined, 141
Adaptation: appearance, 217–219; quality, in multimedia streaming, 281–283
Adapters, SOAP, 491
Adaptive threshold, 121
Adaptive TTL protocol, 120–122, 125
Adaptive Web caching, 123–124
Additive increase, multiplicative decrease (AIMD) algorithm, 281
Administratively scoped multicast, 98
ADSL (asymmetric digital subscriber line), 31
Advanced encryption standard (AES), 378
Advanced mobile phone system (AMPS), 9
Agent-driven negotiations, 48
AIM, 99
Akamai, 91
Alert protocol, WAP 1.0, 77
AMR (adaptive multirate voice codec), 29, 208, 248, 392
Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.
Angle of arrival (AOA), location estimation, 449–450, 462
Angulation, 447
Antenna technologies, 23
Antipiracy protection, 374
AOL Anywhere, 135
Apache Cocoon, 213
Appearance adaptation, 217–219
Application layer: defined, 3; multicast, see Application layer multicast; multimedia streaming, QoS control, 278, 280–284; overview of, 91–92; scalable content delivery, 91–92
Application layer multicast: characteristics of, generally, 103–104, 330; functions of: content distribution, 111; end-user subscription, 111; overlay setup, 110; tree organization, 110–111; rationale for: application layer routing, 107; asynchronous delivery, 106; easy deployment, 105–106; effective transport, 104–105; versatility, 107
Application-level multicast, 168, 174
Application service provider (ASP), 466, 469
Architecture, see specific types of systems: core network, 11–13; end-user performance, GSM/EDGE and WCDMA, 31–33; GSM/GPRS/EDGE, 24–27; IS-95 radio access, 27–29; operator performance, GSM/EDGE and WCDMA, 29–31; overview of, 9–10; standardization framework, 10; WCDMA radio access network: architecture, 13–14, 362–363; beyond 2 Mbps with HSDPA, 20–21; evolution of, 22–24; layer 2/3, 14–18; physical layer, 18–20
ARIB, 10
ARM (active reliable multicast), 99
ASCII, 208, 428
ASF (advanced streaming format), 209
Asymmetric encryption, 377–378
Asynchronous delivery, 106
Asynchronous multicast push (AMP), 119, 125
AT&T, 138
Atmospheric conditions, impact of, 277
Attachments, 505–506
Audio, streaming, 285, 338. See also Multicast content delivery; Multimedia streaming
Audiovisual content: characteristics of, 208; nonscalable, 208–209; scalable, 209–210
Authentication: digital rights management, 402–404; HTTP/1.1, 48; OMA data synchronization, 269; security services, 373, 375, 507; 3GPP MBMS system, 366
Authentication, authorization, and accounting (AAA), 290, 430
Automatic retransmission query (ARQ), 26
Automesh, defined, 113
Availability, security services, 374
Avantgo, 144–145
Backbone ISPs, 109
Backend architecture, 143
Band-segmented transmission-orthogonal frequency-division multiplex (BST-OFDM) modulation, 351
Bandwidth: caching and, 126; distribution trees, 116; efficiency, 185, 190; IP multicast system, 107–108, 347; multicast content delivery, 333–334; multimedia streaming, 282–283; significance of, 2; streaming media, 283; wireless LAN network, 142; wireless mobile environment, 315
Bandwidth Broker (BB), 285
Bandwidth-on-demand services, 16
Basestation controller (BSC), 26
BAT system, location estimation, 456–457
Bayesian network, location estimation, 452, 455
Beamforming antenna, 23
BEEP (block extensible exchange protocol), 499
Beep Science AS, 390; mobile DRM system, 392–393
Berners-Lee, Tim, 39
Bidirectional access networks, 347
Bit error rate (BER), 183, 315
Bit rate capability, 10
Blogs (Web logs), 479
Bloom filters, 123
Bluetooth, 206, 232–233, 261, 264
Body area network (BAN), 3
Bottlenecks, 6, 123, 136, 184–185
B-pictures, streaming video, 286
Broadcast control channel (BCCH), 15
Broadcast/multicast control (BMC) protocol, 15
Browsing charges, 433, 435
Bus cable, 261
Business-to-business (B2B), 421, 478–479
Byte-range operations, HTTP/1.1, 45, 48
Cache, see Caching: applet, 90; consistency, 118, 120–122; HTTP/1.1, 48–50; mesh, 122, 126; miss, 117; radius, 125; router proposal, 99
Caching: implications of, generally, 45, 48–50; proxy, 117–125; scalable content delivery, 91–92; servers, streaming media, 289–290, 311; web-based applications: information dissemination, 87–90; information exchange, 90; web proxy, 117–126
Callback address, 135
Call detail records (CDRs), 411–412, 424–425, 427–431
CAMEL (customized application for mobile network enhanced logic), 425
Canonical (CNAME), streaming media, 306
CAP3 (CAMEL Application Part 3), 425
Carriage return and a linefeed (CRLF), 40
Cascaded unicast, 97–98
Cascading style sheets (CSS), 81, 210–211, 228–229
CC/PP, see Composite capability/preference profiles (CC/PP)
CCTrCH, 16
CDMA (code-division multiple access): characteristics of, generally, 11; classical near-far problem, 19; fast power control, 18–19; location estimation, 462–463; networks, ACME in, 191, 196–197
cdma2000, 27, 196
Cell broadcast service (CBS), 358
Cellular digital packet data (CDPD) technology, 381
Cellular phones, 66, 144–145, 153
CGI (common gateway interface), 469
Channels, error-prone, 315
Characteristics adaptation, 217, 220, 239
Charging: advice of charge, 429; architecture, 433; correlation, 428–429; differentiated, 425–426; fixed-line telephony, 411–414; flow-based, 426–428; information, 431–435; interfaces, 429–431; mediation, 428; mobile content features: business-to-business (B2B), 421; multiple access, 422; multiple services in delivery, 423–424; overview, 415–416; postpaid charging, 419–421; prepaid charging, 419–421, 432; records, source of, 422–423; revenue chain, 416–417; roaming, 422; subscription models, 417–419; mobile telephony, 414–415; rating, 429; records: creation of, 424–425; source of, 422–423; rules, 429; scenarios: browsing, 433, 435; downloads, 423, 436; person-to-person messaging, 435–436; streaming video, 436–437; scope of, 410–411
CIBER, 421
Circuit-switched (CS) domain, 11–13
Cisco, 89, 103, 124
Classical near-far problem, 19
Client information request headers, 58–59
Client response preferences request headers, 59–62
CNN.com, 87–88
Coded orthogonal frequency-division multiplex (COFDM) modulation, 351
Common control channel (CCCH), 15
Common packet channel (CPCH), 16–17
Composite capability/preference profiles (CC/PP): content adaptation, 231–233; exchanges, 303–305, 313; repository, 304; streaming media, 299, 301–305
Conditional request headers, 57–58
Confidentiality, security services, 373, 507
Conflict resolution, data synchronization, 260, 270
Congestion: implications of, 47, 125; losses, 315–316; multicast content delivery, 333–334; multimedia streaming, 281–283
Connections: IP multicast system, 337; persistent, HTTP/1.1, 46–47
Connectivity, 3, 126
Consumption: IP multicast system, 337; IPDC, 349–350; multicast content delivery, 334
Content, generally: caching, 5; defined, 2; distribution, 111; IP multicast system, 337; metadata, 230, 236–237; negotiations, 45, 47–48; types, media: audiovisual content, 208–210, 222–223; nonaudiovisual content, 223–224; textual content, 207–208; value chain, 376–377; verification, 402–403
Content adaptation: application scenarios: content selection, browsing, 241–244; transcoding, multimedia messaging service, 244–250; architectures: configurations, 239–240; location of adaptation, 237–239; capabilities: composite capability/preference profiles (CC/PP), 231–233; defined, 230; subscriber databases, 233; UAProf, 233–235, 238–239, 248–249; user-agent information, 231; future directions for, 251–252; implications of, generally, 6; metadata, 230, 236–237; methods of: content selection, 225–228, 241–244; hybrid approaches, 230; multimedia transcoding, 221–225, 244–250; rendering at the client, 228–230; motivation for, 205–207; multimedia content types: application data, 214–215; media content, 207–210; presentation content, 210–214; procedural code, 215; standardization, 251; types of adaptation: appearance, 217–219; characteristics, 217, 239; encapsulation, 221, 239–240; format, 216–217, 239; size, 219–220, 239
Content analysis, workload characterization: content modification pattern, 138–139; content popularity, 138; content size, 138; content types, 138; defined, 137; notification, 164–168, 176–179; web browsing, 145–149, 176–179
Content control engine (CCE), 392
Content delivery: multicast, see Multicast content delivery; web proxy caching, 118–120
Content delivery networks (CDNs): characteristics of, 3, 187, 454, 457–459; end-system acceleration, 187–188; mobile, 189; network scaling, 187; optimization, content and protocol, 188–189
Content networking, generally: characteristics of, 1–2; defined, 2; in the mobile internet, 2–4
Content provisioning system (CPS): IPDC system, 356; IP multicast system, 350
Content scrambling system (CSS), 386
Content servers (CS), 143, 310, 380
Content synchronization: adoption of: constraints, adherence to, 262–263, 269–270; mobile device scenarios, 261–262; change detection, 260, 270; conflict detection/resolution, 260, 270; data storage and, 255–257; implications of, 6; need for, 257–258, 273–274; OMA standards: data synchronization protocol, 268–273; overview of, 263–264; representation, 264–267; types of: delta sync, 260; fast sync, 260
Content synchronization, types of (continued): full sync, 259–260; one-way, 258–259, 269; slow sync, 259–260; two-way, 258–259, 269
Continuous multicast push (CMP), 119, 125
Contributing source (CSRC), 305
CONTROL, GPS system, 459
Cookies, WAP 2.x, 82
COPS (common open policy service), 285
Copyright, 225
CORBA, 478, 487
Cricket system, location estimation, 456
Crying baby problem, 94, 100
Cumulative distribution function (CDF), 146, 149
Curly, 54
Customizer, 226–227
CWTS (China Wireless Telecommunication Standard Group), 10
DA delay, 97
DAML-S, 483
Database server, 144
Data gathering, web workload characterization, 3, 144
Datagroups, IP multicast, 110
Data synchronization, defined, 257. See also Content synchronization
DCE, 487
DCOM, 478, 487
DCT (discrete-cosine transform), 385, 396
Debugging, 46
DECE, 478
Decryption, 380
Dedicated channel (DCH), 16–17
Dedicated control channel (DCCH), 15
Dedicated physical control channel (DPCCH), 16–17, 19–20
Dedicated physical data channel (DPDCH), 16–17, 19
Dedicated traffic channel (CTCH), 15
Deering, S., 93
Degree/minutes/seconds (DMS) system, physical location, 441
Delay, generally: budget, 284; DA, 97; jitter, 277–280; loss detection, 284; queuing, 184; spread, 444
Delay-locked loop (DLL), 462
Delta encoding, 120
Delta synchronization, 260
Denial of service (DoS), 372, 374
Dependency graph, 118
Deployment, multicast content delivery, 334
Designated receivers (DRs), 96
Desktop access, 144–145
Desktop computers, 261
Desktop users, web workload, 147, 178
Destination, adaptation architecture, 238–240
Device Independence, 252
DHCP (dynamic host configuration protocol), 331
Differential GPS (DGPS), 460
Differentiated Services (DiffServ), 284–285, 310
Diffraction: defined, 443; loss, 445
Digest authentication, 45, 48
DigiBox, 391
Digital, 138
Digital audio broadcast (DAB) system, 352
Digital cameras, 217
Digital fingerprinting, 378–379, 400–401
Digital Fountain, 100, 102, 119
Digital item adaptation (DIA), 251
Digital rights locker, 383–384
Digital rights management (DRM): authentication, 373; content distribution business models: characteristics of, 380–381; floating licenses, 382; microtransactions/micropayments, 382; pay per view, 382–383; promotion models, 381; subscription-based, 381; superdistribution, 382; functional architecture, 376; information architecture, 376; mobile (MDRM), terminal requirements, 384–385; multicast content delivery, 338; multimedia services, 321; overview of, 375–381; security protocols, 373–374, 379–380, 384; significance of, 7; streaming media, 290
Digital signatures, 377–378
Digital video broadcasting terrestrial (DVB-T) network, IPDC system, 348, 352–354
Digital watermarking, 375, 377–379
Dilution of precision (DOP), 463
Direct-sequence spread-spectrum (DSSS) signal, 457
Direction services, 147
Director effectiveness, 198–200
Directory-based volumes, 121
Disconnection, 256
Discovery protocols, 518–519
Dispatchers, SOAP, 491
Disseminate information, 5
Distance-based location update strategy, 461–462
Distributed computing, 478
Distributed shared memory (DSM), 117
Distribution tree: application layer, 110–112; construction of: maintenance, 116; neighbor selection, 113–114; parent selection, 114–116; peer discovery, 112–113
DMDmobile, 390
DNS (domain name service), 91
DoCoMo location platform (DLP), 469
Document object model (DOM), 223–224
Document type definition (DTD), 266, 268
DOM (Document Object Model), 522
Doppler effects, 343
Downlink shared channel (DSCH), 17
Downlink signaling, unidirectional, 346–347
Downloader program, 144
Downloads: charges for, 423, 436; web services, 484
Duplicate avoidance (DA), 95–97
DVB-H (DVB handheld), 351–353, 357
DVB-M (DVB mobile), 351
DVB-S (DVB-satellite), 357
DVB-T (DVB-terrestrial), 184, 197
DVB-X, 351
DVDs, intellectual property management, 386
DVMRP (distance vector multicast routing protocol), 96, 115
DWDM, 184
Dynamic content, defined, 88–90
Dynamic objects, 88–89, 125
ECMAScript, 81
E-commerce, 91, 356–357
Edge caching, 184–185
EdgeScape, 458
Edge Side Includes (ESI), 189
EDI/B2B industry, 474
EFRC, 208
Eight-level vestigial sideband (8-VSB) modulation, 351
Electronic data interchange (EDI), 478–480
Electronic service guide (ESG), 353
Email, 258, 297, 483–484. See also Mail service
Encapsulation adaptation, 220–221, 239–240
Encoding: chunked, 47; error-resilient, 283
Encryption: characteristics of, 48, 375, 377–378; format compliant, 396–397; progressive, 397; scalable, 394–395, 397–398; selective, 395–396; SOAP Message Security, 508–510
EncrypTix, 390
End-system acceleration, 187–188
End-to-end, generally: architecture, 288–290; congestion, 281; connection, 18; headers, HTTP/1.1, 50–51
E-911 calls, 460–464
Entity headers, HTTP/1.1, 51, 54–56
Ericsson, 67, 263
Error-concealment, 283
Error control, multimedia streaming, 283–284
Estimation, in location estimation process, 453–454
Ethernet LANs, 108
ETSI, 10
EU IST, BRAIN, 321
European Digital Video Broadcasting Terrestrial (DVB-T) system, 350–353, 357
Event-based charging, 418
EVRC, 218
Exchange information, 5
Expanding-ring search (ERS), 96, 112–113
eXtensible Markup Language (XML): characteristics of, 209, 265–267, 271–272, 480–485; XML-RPC, 494; XML/XML schema, 483
eXtensible rights Markup Language (XrML), 379, 387, 509
eXtensible Style Language Transformation (XSLT), 223–224, 229, 487
Extensible Stylesheet Language (XSL) Transformations (XSLT), 212–214, 228
External functional interface (EFI), 83
False-hit rate, 123
Fast fading, 343
Fast hit, 118
Fast synchronization, 260
Fault tolerance, 256
Fcast, 100
FDMA, 196
FEC packets, 101–102. See also Forward error correction (FEC)
Federal Communications Commission (FCC), 443, 460
Feedback implosion, 94–95
Fees, 7, 358. See also Charging
Filecast: IPDC system, 355; IP multicast system, 338
File delivery over unidirectional transport (FLUTE), 339
Fine-grained scalable (FGS) compression, 394–395, 397
Firewalls, 486
First-hop router, 102
Fixed-access networks, 1
Fixed-line telephony, charges for, 411–414
Fixed subscription, 417
Flexible layer one (FLO), 26–27
Flow-based charging, 426–428
Format adaptation, 216–217, 239
Format compliant encryption, MDRM, 396–397
Forward access channel (FACH), 17
Forward error correction (FEC): ACME architecture and, 188; content caching, 100–102, 104; IPDC system, 355; IP multicast system, 346; multimedia streaming, 280, 283, 286, 317
Free-space loss, 445
Freeware organizations, 474
Front-door server, 143
FTAM, 428
Full synchronization, 259–260
Fusion tree, 96
Gateway GPRS support node (GGSN), 13, 311, 320, 357, 361, 364–366, 415, 428–429
Gateway mobile location center (GMLC), 465, 467–468
Gateways, generally: SOAP, 491; wireless access, 144–145
Gaussian minimum shift keying (GMSK), 25
General headers, HTTP/1.1, 51–54
Geographic locality, system load analysis, 162–163
Geographic positioning, 275
Geographic push caching, 120
GeoPoint services, 458
Geotargeting, 458–459
GeoTraffic analysis services, 458–459
GIF, 209, 216–218, 220, 222, 248, 250
Global Mobile Suppliers Association, 10
Global navigation satellite system (GNSS), 459
Global positioning system (GPS), 459
GLONASS system, 459
GMSC (gateway MSC), 13
Google.com, 88
GPRS (general packet radio service), see GSM/GPRS/EDGE: characteristics of, generally, 83, 206; charging for, 415, 422–423, 426; network, 183
GPRS RAN (GERAN), 320, 361–362
GPRS SGSN, 414–415
GPRS tunneling protocol (GTP), 364
GPRS/WCDMA, 32–33
Group key Kg, 383
Group management hierarchy (GMH), digital rights management, 401
G.723.1, 208
G.729, 208
GSM (global system for mobile communication), see GSM/EDGE; GSM/GPRS/EDGE: characteristics of, generally, 208; evolution of, 66; phones, 205; standardization, 13
GSM (Groupe Spécial Mobile), 414
GSM Association, 10
GSM/EDGE, 9–10
GSM/GPRS/EDGE: end-user performance, 31–33; GSM principle, 24–26; operator performance, 29–31; radio access network architecture, 26; service creation principle, 26–27
GSM/GPRS, synchronization, 261
GSM MSC, 414
GSM900, 29
Hacks, 384
Handheld Device Markup Language (HDML), 5, 67
Handheld PCs, 205
Handover (HO)/handoff, IP multicast system, 343–345
Handover packet loss, 316–317
Handshake protocol: DRM security, 379–380; WAP 1.0, 77
HARQ, 28
Harvest Web cache, 121–123
Headers, HTTP/1.1: end-to-end, 50–51; entity, 51, 54–56
Headers, HTTP/1.1 (Continued): functions of, 50–51; general, 51–54; hop-to-hop, 50–53, 61, 65; request, 51, 56–63; response, 51, 63–65
Heavy-tail distribution, 138–139
Helix, 390–391
Hierarchical caching, 117, 119, 122, 126
High-definition television (HDTV), 350
High-speed circuit-switched data (HSCSD), 24
High-speed DSCH (HS-DSCH), 21
HLR (home location register), 12
Hop-by-hop headers, HTTP/1.1, 50–53, 61, 65
Hop count, 107
Host mobility, 315, 317–318
Hot billing, 425
Hotspots, 319, 361
HSDPA (high-speed downlink packet access), 4, 20–22, 26, 29, 31–33
HTTP/0.9, 40
HTTP/1.0, 410
HTTP/1.1: authentication, 43–44, 48; byte-range operation, 48; caching, 48–50; chunked encoding, 47; content negotiations, 47–48; goals of, 44; headers: end-to-end, 50–51; entity, 54–56; functions of, 50–51; general, 52–54; hop-to-hop, 50–52; request, 56–63; response, 63–65; persistent connections, 46–47; request methods, 45–46; response methods, 45; status code with description, 43
HTTP/2.0, 53
Human factors studies, 1
Hybrid networks, 97–98, 347
Hypertext Markup Language (HTML): characteristics of, 210; macros, 188–189; multimedia transcoding, 222
Hypertext Transfer Protocol (HTTP), see HTTP/0.9; HTTP/1.0; HTTP/1.1; HTTP/2.0: basic operation, 39; defined, 5, 38; evolution: HTTP/0.9, 40; HTTP/1.0, 41–44; HTTP/1.1, see HTTP/1.1; general operation, 38–39; GET, 479, 486, 510–511; IP multicast systems, 338; OMA data synchronization, 264; popularity of, 188; POST, 467–468, 479, 486, 499, 511; scalable content delivery, 91
IANA, 56
IBM, 263, 508
ICAP Forum, 251
ICP (Internet caching protocol), 123
Identifiers, content synchronization, 263, 270, 273
Idle-mode reception, 363
IGMP (Internet group management protocol), 97, 99, 365
IMEI code, 233
IMT-2000 technologies, 10, 23, 466
Indoor location estimation system: infrared-based, 457; scene-analysis-based, 454–455; ultrasound-based, 455–457
Infopyramid: characteristics of, 225–226; content selection application, 242–243; creation process, 227
Information, generally: dissemination, 5, 87–90; exchange, 5, 89–91; security, see Security
InfoSpace, 135
Infrared, 264
Infrared (IR) signals, location estimation, 445–446
Initialization, OMA data synchronization, 269–270
Inktomi: "Media Distribution Network," 116; reverse proxy, 91
Integrated cellular/WLAN environments, 319–321
Integrated Services (IntServ), 284, 310
Integrity, security services, 373, 507
Intellectual property, 375. See also Moving Picture Experts Group (MPEG), intellectual property management
Intelligent network (IN), 414
Interference, 25, 277, 336, 445
Interleaving, 280
Intermediary, adaptation architecture, 238–240
Internet, generally: access, 264
Internet, generally (continued): infrastructure, 92; TV broadcasts, 109
Internet Engineering Task Force (IETF): filecast, 339; functions of, 18, 290; group key management, 402; IP Multicast, 368; media discovery, 339–340; Multicast Security, 330; OPES Working Group, 187; RFCs: 1889, 305, 308; 2326, 293; 2327, 297; streaming media, 284, 339; web services interoperability (WS-I), 519
Internet media guides (IMGs), 340
Internet multimedia system (IMS), 310
Internet service providers (ISPs): functions of, 109, 458; wireless, 164, 319
InterTrust, 380
Intranet, 264
Invalidation, web proxy caching, 121
IP datacast (IPDC): characteristics of, 7; concept of, 347–348, 357–358; e-commerce for (e-CS), 356–357; IP infrastructure, 354–355; mobile wireless radio networks, 350–354; services and applications, 348–350; service system, 355–356; system architecture, 350
I-pictures, streaming video, 285
IP layer, scalable content delivery, 91–92
IP multicast system, generic: common aspects, 336; networking procedure, 340–342; reference system model, 336–337; three-platform services, 337–340
IP multimedia subsystem (IMS), 11, 13, 367
IP-PDN, 310–311, 320
IPMP-Ds, 386
IPMP-ES, 387
IPSec, 83
IP unicast, 93, 345
IPv4, 83, 91, 311, 365
IPv6, 83, 91, 365
IPv6 Forum, 10
IRC/6.9, 53
IS-95: characteristics of, 9–10
IS-95 (continued): evolution of, 10; radio access, 27–28
ISO, 285, 290
IS-2000, 27
ITU-T: H.261, 223; H.263, 209, 217, 223, 285, 392
Japanese Terrestrial Integrated Service Digital Broadcasting (ISDB-T) system, 350, 352–353, 357
Java, RMI, 478
Java Community Process (JCP), 522
JavaEnabled, 232–233
Java MIDP, 215
Java Server Pages, 214
Java Specification Request (JSR), 522
Jini, 478
JMS, 499
JPEG, 209, 216–218, 220, 222–223, 248
JPEG2000, 209, 216, 220, 223
J2ME web services, 522
JVMVersion, 232–233
Kazaa, 381
Kerberos, 478, 509
k nearest-neighbor averaging, 453
Large-scale active middleware (LSAM), 124
Large-scale path loss, 445
Latency: implications of, 185, 190; reduction, 125, 185, 190
Lateration, 447
LGMP, 98
Liberty Alliance, 522
Limited-scope multicast (LSM), 98–99
Limited subscription, 418
Line of sight (LOS) path, 443–445
Link layer, 2, 183
LISAP (location information service access protocol), 469
Location-based services: based on cellular systems: location service platform, 468–470; mobile location protocol (MLP), 467–468; system architecture, 465–467; characteristics of, generally, 7–8; location estimation algorithm: defined, 440
Location-based services (Continued): location estimation algorithm: proximity, 454, 457–459; scene analysis, 450–454; triangulation, 447–450; location estimation media: infrared (IR), 445–446, 457; radiofrequency (RF), 443–445; ultrasound, 446–447, 455–457; location estimation system: defined, 440; indoor, 454–459; outdoor, 459–464; location format transformation (LFT), 464–465; location sensor infrastructure, 440; location taxonomy: absolute location, 442; physical location, 441; relative location, 442; symbolic location, 441–442
Location Interoperability Forum, 264
Location service system (LCS), 465–467
Log analyses, workload characterization, 161–163, 172–174
Logical key hierarchy (LKH), digital rights management, 401
Lookahead window, 118
Loose coupling, 319–320
Lorax, 98
Loss detection delay, 284
Loss rate, 107
Lotus/Lotus Notes, 261, 263
Low-end mobile phones, 264, 270
Loyalty schemes, 419
MAC (media access control) layer: functions of, 91; scalable content delivery, 91–92; WCDMA, 14
Macrocell environment, 31
Mail servers, 258
Mail service, 147, 165
Manual mesh, distribution tree, 113
Mapping, data synchronization, 263, 270, 273
Matching, location estimation, 451–453, 455
Maximum transmission unit (MTU), 281, 353
MBone, 89
Media discovery: IPDC system, 356; IP multicast system, 338, 340–341
Media gateway (MGW), 13
Media gateway control function (MGCF), 13
Mediation, charging issues, 428
Media transcoder, architecture of, 218, 221–222
Media transport protocols, 308
Memory, 232–233
Meridian lines, 441
Mesh overlay network, 110, 113
Message sequence chart (MSC), 268, 461
Metadata: HTTP/1.0, 41; security strategies, 375, 377, 379
Metropolitan area wireless network, workload characterization analysis, 141
MFTP, 100
Microcells, 278
Micromobility protocols, 320
Microsoft, 508
Middleware, 2, 478
MIME (multipurpose Internet mail extensions), 41, 231, 511, 512
Mirror servers, 255
MMS-IOP, 264
Mobile cinema ticketing, 275–276
Mobile clients, web workload, see Web workload
Mobile content: charging for, 409–437; delivery, for the Internet, 189; digital rights management, 371–404; security, 371–404
Mobile DRM (MDRM): state-of-the-art component technologies: efficient key management, 401–402; encryption, generally, 394–398; error concealment, 402–403; format compliant encryption, 396–397; multimedia content verification, 402–404; progressive encryption, 397; public key watermarking system, 398–401; scalable encryption, 394–395, 397–398; selective encryption, 395–396; state-of-the-art systems: components of, 388–389; Helix, 390–391; integrated model, 391; NEC VS-7810, 390–392; Nokia Music Player, 390–391; table of, 390; terminal requirements, 384–385, 392–393
Mobile Information Device Profile (MIDP), 522
Mobile Internet: architecture, overview of, 4, 9–33; characteristics of, generally, 2–4; content adaptation, 205–252; content networking, 1–8; protocols for, 35–84
Mobile location protocol (MLP), 466–468
Mobile phones, 197. See also Cellular phones
Mobile terminals, web services, 520–521
Mobile Web services, see Web services
Mobile web users, 178–179
Mobile wireless multicast: characteristics of, 335–336; mobility/movement of users, 342–345; radio transmission errors, 345–346; unidirectional downlink bearers, 346–347
Mobile wireless networks, multimedia streaming: end-to-end architecture, 288–290; media delivery protocols, 291–305; multimedia services, 315–321; QoS control: application layer, 275–284; network layer, 284–285; streaming media codecs, 285–288; streaming media transport protocols, 305–308; 3GPP packet-switched streaming service, 308–315
Mobile wireless radio networks, IPDC system: characteristics of, generally, 350–353; DVB-T/H, 352–354
Modality axis, Infopyramid, 226
Modulation, GSM, 25
Modulo caching, 124–125
Motorola, 67, 263
Moving Picture Experts Group (MPEG), intellectual property management: MPEG-4 IPMP hook, 386–387; MPEG IMP extensions, 387; MPEG-21 (Rights Data Dictionary), 387; MPEG-2 videos, copy protection, 385–386
Mozilla Windows, 145
MP, 209
MP3, 392
MPEG: characteristics of, generally, 251; IMP extensions, 387; video, 395–396
MPEG-4: AAC, 209, 285; AVC, 216; characteristics of, 107, 223, 286; IPMP hook, 386–387; video, 392
MPEG-7, 299, 340
MPEG-21 (Rights Data Dictionary), 251, 387
MPEG-2: systems standards, 352; videos, copy protection, 385–386
MSC/VLR (mobile services switching center/visitor location register), 12
MSNBC server traces, 139–140
MSN Mobile, 135 Multicast application layer, see Application layer multicast characteristics of, 5–7, 93 confinement, 96 content delivery, see Multicast content delivery datagrams, 93 distribution tree, 113–116 FIB (forwarding information base), 93 gain, 333 IP characteristics of, generally, 93–94 need for, 108–110 multigroup, 98 push, 118–120 reliable, see Reliable multicast (RM) scalable content delivery, 91–92 streaming services, 321 Multicast content delivery applications, generally, 327–328, 332–335 architecture, 188 as communication technique, 331–332 future directions for, 367–368 generic IP multicast system common aspects, 336 networking procedure, 340–342 reference system model, 336–337 three-platform services, 337–340 IP datacast (IPDC) concept of, 347–348, 357–358 e-commerce for, 356–357 IP infrastructure, 354–355 mobile wireless radio networks, 350–354 services and applications, 348–350 service system, 355–356 system architecture, 350 justification for, 328–330 mobile wireless multicast characteristics of, 335–336 mobility/movement of users, 342–345 radio transmission errors, 345–346 unidirectional downlink bearers, 346–347 multimedia broadcast multicast service (MBMS) commercial interfaces, 366 concept of, 358–360, 366–367 in core network, 364–365 data sources, 365–366 radio access networks, 362–364 service center (BM-SC), 361, 365–366
Multicast content delivery (Continued) services and applications, 360–361 standardization, 359 system architecture, 361–362 perspectives of, 330–331 Multicast content distribution, digital rights management, 381 Multicast distributed virtual cache, 124 Multicast expanding-ring search, distribution tree, 112–113 Multifrequency network (MFN), 351 Multigroup multicast, 98 Multihit sessions, 154 Multihoming, 347 Multimedia broadcast multicast service (MBMS) characteristics of, generally, 7, 24, 184, 197 commercial interfaces, 366 concept of, 358–360, 366–367 in core network, 364–365 data sources, 365–366 radio access networks, 362–364 release schedule, 360 service center (BM-SC), 361, 365–366 services and applications, 360–361 standardization, 359 system architecture, 361–362 Multimedia data streams (MDS), 394 Multimedia message adaptation (MMA), 246 Multimedia messaging service (MMS) charging, 426 content adaptation, 217, 219, 221, 239 multimedia transcoding, 222, 244–250 wireless access protocol (WAP), 81 Multimedia messaging service center (MMSC), 239–240, 244, 246, 248–250, 426 Multimedia streaming architecture integrated schemes, 319–321 logical, 288–290 overview of, 276–278, 288 benefits of, 276 characteristics of, generally, 6 challenges for, 277–278, 282 codecs audio compression, 286 in 3GPP, 286–287 video compression, 285–286 defined, 276 delivery protocols session control, 291–305 transport protocols, 305–308 future directions for, 321–322 QoS application layer, 278, 280–284
network layer, 284–285, 316–317 overview of, 278–280 seamless, 320 3GPP packet-switched streaming service characteristics of, 308–310 domain architecture, 310–312 PSS framework, 312– 314 wireless environment congestion losses, 315– 316 handover packet loss, 316–317 integrated cellular/WLAN environments, 319–321 mobility-aware server selection, 318 request routing, 318 transmission error loss, 315–316 Multimedia transcoding advantages of, 224 architecture, 221–222 audiovisual content, 222–223 drawbacks of, 224– 225 nonaudiovisual content, 223–224 procedural code, 224 Multimedia units (MMUs), content adaptation, 216, 219–221 Multiparty multimedia session control (MMUSIC), 340 Multipath fading, 277, 444 Multiple description coding (MDC), 283 Multiple-input multiple-output (MIMO), 23 Multipoint-to-multipoint (m-t-m) multicast, 331 Multipoint-to-point (m-t-p) multicast, 331–332 Multiprogram transport streams, 354 Multiprotocol encapsulation (MPE) digital video broadcasting (DVB), 353–354 IPDC system, 348, 350, 354–355, 357 Multiuse sensor environment (MUSE), 455
Napster, 107, 381 Narrowband systems, 29 Navigational service, WAP 2.x, 83 Nearest-neighbor averaging, 453 NEC VS-7810, 371, 390–392 Negative acknowledgments (NAKs) characteristics of, 95–98 implosion, 95 Neighbor selection, distribution tree, 113–114 Netscape, 231, 238 Network infrastructure, 2–3 Networking, defined, 2 Network layers multicast, 330, 332
multimedia streaming, QoS control, 284– 285, 316 –317 overview of, 91–92 Network scaling, 187 Nibble system, 452 –455 NLANR trace, 198, 200 NMT, 9 Nokia Music Player, 371, 390–391 Series 60 platform, 521 synchronization standards, 263 wireless access protocol (WAP), 67 Non-line of sight (NLOS) path, 443–444 Nonmesh overlay network, 110 Nonrepudiation, 374 Notification document, 163 Notification messages, 163–164, 364. See also Notification workload Notification server, 143 Notification workload, characterization of content analysis message popularity analysis, 167– 168 notification message, 164– 168 popular categories, 164 –167, 176 –178 log analyses, 172– 174 significance of, 163–164 system load analysis, 171–172 user behavior analysis load distribution, 168–169 spatial locality, 168 –171 web browsing correlations, 174–178 Oasis, 475, 508, 520 OBEX, 264 Observed time difference (OTD), 464 Offline, generally access, 144 –145, 265 charging, 421 users content analysis, 147 system load analysis, 160, 162 user behavior analysis, 151–152 Offloading TCP processing, 188 On-demand, see Video on demand caching, 126 request-response, 118 One-way synchronization, 258–259, 269 Online charging, 421 Open GIS (geographic information system) Consortium (OGC), 440, 465 Open Mobile Alliance (OMA) content adaptation, 245, 251 content synchronization data synchronization protocol, 268– 273
representation, 264–267 functions of, 6, 84, 410 web services, 522 Open-source organizations, 474 Open System Interconnection (OSI), 1, 92 Openwave, 66 Orchestrators, SOAP, 491 Origin server charging records, 422 streaming media, 288 OSA (Open Service Access), 431 OSPF (open shortest path first), 115 OTDOA, 467 OTERS, 98 Outdoor location estimation systems cellular-based system, 460–464 GPS-based system, 459–460 Outlook Express, 261 Overcast, 111, 114, 116 Overlay networks, 103, 108, 458 Over-the-air synchronization, 261
Packager, digital rights management, 380 Packet-switched (PS) domain, 11, 13, 24 Packet(s), generally decoding failure, 26 loss, 184, 315–317 retransmission, 283–284 Packet data convergence protocol (PDCP), 15 Packet data protocol (PDP), 364 Packet-switched streaming service (PSS), 3GPP characteristics of, 308–310 domain architecture, 310–312 framework, overview, 312–314 setup procedures, 313–314 Paging channel (PCH), 16 Paging control channel (PCCH), 15 Palm Inc., 263 Palm OS, 215 Parent selection, distribution tree, 114–116 Passive attacks, 372 Passive caching, 289 Passive handover, IP multicast system, 344 Path loss exponent, 445 Payload DRM system, 378–379 format, 339 Peak loads, 125 Peer discovery, distribution tree, 112–113 Peer-to-peer (P2P), generally applications, 479 file sharing systems, 381, 383
Personal area network (PAN), 3 Personal devices, 3 Personal digital assistants (PDAs), 66, 144, 151, 153, 160–161, 163, 165, 197, 205, 215, 228, 261, 264, 383 Personal information management (PIM), 214–215, 262–263 Person-to-person messaging, charges for, 435–436 PGM, 99, 103 Phone.com, 66 Physical common packet channel (PCPCH), 17 Physical layer functions of, 2, 10, 91 multicast, 331 1X, IS-95, 27–28 scalable content delivery, 91–92 WCDMA radio access network, 18–20 Physical location, 441 Piezoelectric ceramics, location estimation, 456 Piggyback cache invalidation (PCI), 119, 121 Piggyback cache validation (PCV), 119, 121 Piggybacking, 117 PIM-DM, 96 PING roundtrip time, 183 Plane earth loss, 445 Playback rate, multimedia streaming, 276, 282–284 Playout buffer, streaming media, 279–280, 282 PNG format, 209, 216 Point of contact, distribution tree, 112 Points of presence (POPs), 107, 110 Point-to-multipoint (p-t-m) multicast, 331–332, 358, 361–364 Point-to-point multicast, 362–363 Policy enforcement server (PES), 392 Polling every time protocol, 120, 122, 125 Portals, packet-switched streaming service, 311 Postpaid charging, 419–421 P-pictures, streaming video, 286 Precise positioning service (PPS), 459–460 Prefetch threshold, 118 Prefetching, 118, 125, 151, 190 Prepaid charging, 419–421, 432 Presentation content device-independent, 214 overview of, 210 stylesheets, 210–213 Presentation layer, scalable content delivery, 92 Primary common control physical channel (P-CCPCH), 17 Private key watermarking, 398–399 Proactive caching, 289–290
Proactive multicast, 126 Probability-based volumes, 121 Profile servers, packet-switched streaming service, 311 Profiling, location estimation, 451 Program service information (PSI), 353–354 Progressive encryption, 378, 397 Protocol data units (PDUs), 75 Provisioning service, WAP 2.x, 83 Proximity, location estimation algorithm, 454, 457–459 Proxy, generally caching basics of, 117–118 cache consistency, 120–122 cache cooperation, 117, 122–125 content delivery, 118–120 limitations of, 125–126 clients, IPDC system, 354 filters, 119, 121 server, streaming media, 289, 310 SOAP, 491 Psion, 264 PSVP signaling, 284–285 Public key infrastructure (PKI), 404, 478 Public key watermarking, 398–400 Public land mobile network (PLMN), 361 Public safety answering point (PSAP), 460–464 Public switched telephony network (PSTN), 411 Push multicast, 118–120 wireless session protocol (WSP), 74, 81, 233 Push-pull scheme, 190 QCELP, 208 QPSK modulation, 28 Quality of service (QoS) ACME, 184 importance of, 3, 6, 8, 10 multimedia streaming, 278–285 Queuing delay, 184 QuickTime, 107, 111 RADAR, 454–455 Radio access bearer (RAB), 312 Radio access network (RAN), 11, 320, 361–365 Radio bearers (RBs), 362–363 Radio frequencies, WCDMA, 23 Radiofrequency (RF) signals, location estimation generally, 443 interference factors, 445 multipath propagation, 443– 444
Radio modulation, types of, 351 Radio network controllers (RNCs), 11, 14, 357, 461 Radio propagation implications of, 336 models, 444 Radio resource control (RRC), 16, 311 Radio resource management, 201 Radio traffic engineering, 201 Radio transmission errors, IP multicast system, 345–346 RADIUS (remote authentication dial-in user service), 424, 430–431 RAMP (reliable adaptive multicast protocol), 103 RAN/GERAN, 312 Random access control channel (RACH), 16–17 Random hopping, 25 Rayleigh fading, 343 Rayleigh propagation, 444 RDF (resource definition framework), 223, 232 RealAudio, 392 Real Audio G2, 209 Real Networks, 89 RealServer 8, 116 Real-time packet-based services, 32 Real-time streaming protocol (RTSP) IP multicast system, 339 streaming media characteristics of, 291–292, 305 messages, 293 packet-switched streaming, 313 request messages, 293–295 response messages, 295–296 session setup, 296 Real-time transport control protocol (RTCP) IP multicast system, 339 streaming media, 305–306, 315–316 Real-time transport protocol (RTP) IP multicast system, 339 streaming media, 305–308, 316 RealVideo, 392 Received signal strength (RSS), 444 Receiver-driven reliability, 95 Redundant transmissions, 280 Reed-Solomon code, 101–102, 317 Relative location, 442 Reliable multicast (RM) active (ARM), 99 challenges of, 94–95 characteristics of, 93–94 distributed recovery, 95–98 FEC-based recovery, 100–102 NAK-based recovery, 95
router-assisted recovery, 98–100 state of the art software, 102–103 Reliable multicast transport (RMT), 339 Remote procedure call (RPC), SOAP, 501, 503–504 Remote synchronization, 261 Repeated unicast model, 117 Replication, 255–256, 258 Representational state transfer (REST), 484, 486–487 Requantization, 283 Request headers, HTTP/1.1, 51, 56–63 Request line, 293 Request routing, multimedia streaming, 318 Resolution adaptation, 217–219, 239 axis, Infopyramid, 226 Response headers, HTTP/1.1, 51, 63–65 Retransmission, 26, 74–75, 283 Revenue chain, 416–417 Revenue sharing, 7 Reverse proxy, 91 RFC, generally 1458, 103 1945, 40, 42–43 2068, 44 2616, 44 Rich calls, 32 Rician propagation, 444 Rights fulfillment server (RFS), 380–381 Rights issuer server (RIS), 392 RLC (radio link control) layer, WCDMA, 14 RMDP, 100, 119 Roaming charges, 422 Roundtrip times (RTTs), 31–32, 95–96, 107, 281 Routers multicast-capable, 93 SOAP, 491 Routing application layer, 107 IP multicast system, 340–342 protocols, distribution trees, 115 RTP/UDP traffic, 308 SAML, 509 Sandpiper, 91 Satellite Internet services, 190 SAX, 223–224, 522 Scalable encryption, MDRM, 394–395, 397–398 Scalable vector graphics (SVG), 209–210 Scattering, 443 SCE (single-connection emulation), 93, 95
Scene analysis, location estimation algorithm estimation, 453–454 indoor systems, 454–455 matching, 451–453, 455 profiling, 451 RF-based, rationale for, 450–451 SDES packets, 307 SDPng, 340 Second-generation (2G) cellular communications, 36, 257, 335 Secure electronic transaction (SET), 374 Secure Interactive Broadcast Infotainment Services (SBIS) project, 383 Secure socket layer (SSL), 44, 68, 384, 469, 508 Security attacks, types of, 372, 507 content-based media security, 374–375 emerging technologies state-of-the-art MDRM component technologies, 394–404 state-of-the-art MDRM systems, 387–394 importance of, 7 information, see Information security intellectual property, see Moving Picture Experts Group (MPEG), intellectual property management mechanism, 373 overview of, 371–374 services, types of, 373 SOAP messages, 507–510 Selective encryption, 378, 395–396 Semantic Web, 479–480 Server architecture accesses, types of, 144–145 components of, 143–144 content server, 143 database server, 144 data log description, 144 front-door server, 143 notification server, 143 significance of, 142–143 Server-driven negotiations, 47 Server selection, mobility-aware, 318 Server volumes, 119 Service and delivery management system (SDMS), 356 Service lookup, WAP 2.x, 83 Service-oriented architecture (SOA), 475–477 Serving GPRS support node (SGSN), 13, 311, 357, 361, 364–365 Session announcement protocol (SAP), IP multicast systems, 338–340 Session-based charging, 418
Session control, streaming media description languages, 298–302 H.323 protocol, 298 real-time streaming protocols, 292–296 session description protocol (SDP), 297–298 SIP protocol, 298 UAProf specification, 303–305 wireless session protocol (WSP), 298 Session description protocol (SDP) IP multicast systems, 338–340 streaming media, 292, 297–298 Session duration, 137 Session inactivity period, 153–154 Session initiation protocol (SIP), 310 Session layer, scalable content delivery, 92 Shadowing, 445 Short message service (SMS), 24, 221, 244, 357, 414 Short message service center (SMSC), 357, 414, 423 Siblings, 122 Signal-to-interference ratio (SIR), 23, 196–197 Simple object access protocol (SOAP) attachments and, 505–506 bindings, 490, 499–500, 516–517 characteristics of, generally, 8, 318, 481–482, 489 defined, 481, 489 deployment environments, 490–491 encoding, 500, 504–505 example of, 492–494 HTTP binding, 499–500 message security, 507–510 mobile terminals, 521 1.2, changes to, 510 processing body, 496 fault message, 497–499 header, 496–498 structure, 494– 496, 498– 499 styles document, 500–502 RPC-style, 501, 503–504 Single-hit sessions, 154 Single-source multicast (SSM), 93 Size adaptation, 219–220, 239 Sleep, 342, 363 Slotted ALOHA system, ACME performance analysis, 191–196 Slow hit, 118 Slow synchronization, 259–260 Smallest k-vertex polygon, 453 Smart phones, 205, 264 SMTP (simple mail transport protocol), 91, 499
SNMP trace, wireless LAN network, 141 Soft handover, 19 Source, adaptation architecture, 237–238, 240 SPACE, GPS system, 459 Spatial locality analysis, 140, 158–160 Spatial redundancy, 285 Spectral efficiency, 29– 30 Speech compression, 208 –209 Split-proxy architecture, 189 –190 Sponsorship, 418–419 Spreading factor (SF), 16 Squid Web cache, 121 SRM (scalable reliable multicast), 103 SSL/TLS, SOAP Message Security, 508–509 Standard positioning service (SPS), 459 Standard transcoding interface (STI), 251 Starfish Software, 264 Static content, defined, 88, 90 Status line, 293 Stock quotes, 147, 161, 166, 174 Streaming, generally IP multicast system, 338 IPDC system, 355 media, 89 video, charging for, 436–437 Stream thinning, 109 Stylesheets, 210–213, 223 Subcast ERS, 96 –98 Subcasting, 96–97 Subcast repair, 98 Subscriber databases, 233 Subscription charges, 417–419 Summary cache, 123 –124 Sun XML LDI Extensions tag library, 214 Surrogate server, 290 SVG, 216, 223 Symbolic location, 441–442 Symmetric encryption, 377– 378, 380 Synchronization, see specific types of synchronization multimedia streaming, 277 protocol, PDAs, 151, 160–161 Synchronization source (SSRC), streaming media, 305 Synchronized Multimedia Integration Language (SMIL) characteristics of, generally, 210, 222 –223 streaming media, 292, 298 –301, 308 –309, 312 –313 SyncML Initiative, 262– 264 SyncML synchronization, 215, 262 Systematic hopping, 25 System load analysis, workload characterization defined, 137, 140
notification, 171–172 web browsing, 160–161 TACS, 9 Talarian, 103 TAP, 421 Tau-dither loop (TDL), 462 TCP (transmission control protocol), see TCP/IP -based invalidation, 121–122 connections, 125, 164 defined, 32 IP multicast system, 345–346 tcpdump, 141 TCP/IP, see Transmission control protocol/Internet protocol (TCP/IP) TDMA (time division multiple access), 196 Telecommunications market, 1 Television broadcasts, preprogrammed, 349 Temporal redundancy, 285 Temporal stability analysis, 137, 139, 155–158 Third-generation (3G) cellular, generally communications, 335 multicast service, see Multimedia broadcast multicast service (MBMS) Third Generation Partnership Project (3GPP) on charging, 410 generally, 10, 13, 18, 22–24, 197, 244 packet-switched streaming service characteristics of, 308–310, 321 domain architecture, 310–312 PSS framework, 312–314 streaming media, 286–287, 308–314 Third-party search service, 466–467 13K, 208 3GPP2, 10, 13, 290 Three mile system, multicast services, 328 Tibco, 103 Tight coupling, 319–320 Time difference of arrival (TDOA), location estimation, 447–449, 462, 464 Time-division multiple access (TDMA) system, 24, 26, 462, 464 Timeliness, in security, 507 Time of arrival (TOA), location estimation, 447–448, 460, 462, 464 Time-to-live (TTL) values, 96, 112, 117–118 Tivoli NetView “Distribution Manager”/“Software Distribution,” 116 TMTP, 97–98 T1P1, 10 Tornado code, 101–102 Trace-driven simulations, 122 Traffic, content synchronization and, 255 Transaction-based charging, 418
Transaction identifier (TID), WAP 1.0, 74–75 Transcoding, 109. See also Multimedia transcoding Transmission, generally error loss, 315 –316 primitives, 514 Transmission control protocol/Internet protocol (TCP/IP), 2–3, 5, 38, 41, 83–84, 185, 261 Transmission power control (TPC), 20 Transparent caching, 117, 122, 124, 126 Transport format combination indication (TFCI), 19–20 Transport layer, scalable content delivery, 91–92 Transport layer security (TLS) characteristics of, 469, 508 wireless, 69, 76–77, 83 Transport protocols implications of, 136 streaming media HTTP tunneling, 291, 308 real-time, 305–308 RTSP tunneling, 291, 308 Triangulation, location estimation algorithm, 447–450 TTA, 10 TTC, 10 Tunneling GPRS protocol, 364 HTTP, 291, 308 IP, 89 RTSP, 291, 308 Turbo coding, 19 Two-way synchronization, 258–259, 269 UA header, 221–222, 231, 248 UAProf content adaptation, 233 –235, 238, 240, 248 – 249 streaming media, 302 –303 UDP charging for, 427 streaming media, 307 –308 Ultrasound signals, location estimation, 446– 447 UMTS (universal mobile telecommunications system) architecture, 10, 206, 310 –311 charging, 419 QoS classes, 311, 314 UMTS Forum, 10 UMTS PLMN (public land mobile network), 13 UMTS terrestrial RAN (UTRAN), 11, 16, 310, 320, 361 –362
UMTS/WLAN architecture, 319 Unicode (UTF-8/UTF-16), 208 Uniform resource identifier (URI), 38, 41, 292, 313–314, 480, 484, 487, 494–495, 509, 511 Uniform resource locator (URL), 61, 64, 74, 112, 122, 144, 146–147, 149, 248–249, 292, 519 U.S. Advanced Television System Committee (ATSC) system, 350–351, 357 Universal Description, Discovery, and Integration (UDDI), 518 Universal transverse mercator (UTM), 441 Universal Wireless Communications Consortium (UWCC), 10 Unwired Planet (UP), 66–67 UP.Browser, 145 Uplink, generally dedicated channel, 22–23 signaling, 346–347 URI RFC, 41 Usage shaping, multicast content delivery, 334 User agent profiling (UAProf), 81 User behavior analysis, workload characterization defined, 137 load distribution, 150–153 notification workload, 168–171 spatial locality, 140, 158– 160 temporal locality, 139 temporal stability, 139, 155–158 user request arrival and duration, 139 wireless user sessions, distribution of, 153–156 wireline web and mobile web compared, 170, 178–179 User, generally behavior analysis, see User behavior analysis interest correlation, 186 level granularity, 152–153 load distribution, 137 location notification service, 467 servers, packet-switched streaming service, 311 USER, GPS system, 459 User equipment (UE), 11–12, 15, 22, 311, 363–365 User interest correlation algorithm, 197–198 simulations, 198–200 traces, 198 Username, 509 UTRA (universal terrestrial radio access), standardization of, 10 UTRA FDD, 15 UTRAN FDD, 16
Value-added services (VASs), 2, 413–414 Variable-length coding (VLC), 286, 396 –397 Verizon, 508 versit vCard, 270 Video, generally splitters, 89 streaming, 285– 286, 338. See also Multicast content delivery; Multimedia streaming Videoconferencing, interactive, 288 Video on demand (VoD), 109, 349 –350 Video transmission, multicast content delivery, 332–333 Vindigo, 144 Virtual hosting support, 45 VLR (visitor location register), 12 –13 Voice over IP (VoIP), 32, 331 Volume leases, 121 –122
Walled garden, 338 WAP 1.0 bearer layer, 77, 83 components of, overview, 69– 70 wireless application environment (WAE) layer, 69, 71–73 wireless datagram protocol (WDP), 69, 73, 76– 77, 82 wireless session layer (WSP), 69, 73–74, 82 wireless transaction protocol (WTP), 69, 73– 76, 82 wireless transport layer security (WTLS), 69, 76– 77, 83 WAP Forum, 264 Watermarked content, 379 Watermarking digital, 375, 377 –379 dual watermarking-fingerprinting system, 400 –401 public key vs. private key, 398 –399 WBMP, 209, 248 WCCP (Web Cache Control Protocol), 124 WCDMA (wideband code-division multiple access) radio access network architecture, 13–14, 362–363 beyond 2Mbps with HSDPA, 20– 21 evolution of advanced antenna technologies, 23 enhanced uplink dedicated channel, 22–23 multimedia broadcast and multicast service (MBMS), 24 new frequency variants, 23 layer 2/3, 14–18 physical layer, 18–20
WCDMA (wideband code-division multiple access) technology architecture, generally, 9–10, 196 core network, 10– 11 radio access network, see WCDMA radio access network standardization, 12 Weak cache consistency, 118 Weather service, MMS adaptation, 249–250 Web-based applications classification of, 116– 117 information dissemination, 87 –90 information exchange, 89–91 Web browsing workload, characterization of content analysis content size, 146 document popularity, 148–149 overview, 145 popular content categories, 146–148 correlation with notification workload, 174–178 log analyses, 161–163 system load analysis, 160–161 user behavior analysis load distribution, 150– 153 overview, 149–150 spatial locality, 158–160 temporal stability, 155–158 wireless user sessions, distribution of, 153–155 Web retailers, digital rights management, 381 Web services core technologies, 480–482 defined, 473–475 foundation technologies discovery protocols, 518–519 simple object access protocol (SOAP), 489–511 the Web, 483–487 Web Services Description Language (WSDL), 511– 518 XML/XML schema, 483 hype, 482–483 identity management, 522–523 mobile terminal, 520–522 motivating technologies, 477–480 privacy, 522–523 service-oriented architectures (SOA), 475–477 standards, 487–489 Web Services Interoperability Organization (WS-I), 519–520 Web Services Description Language (WSDL) bindings, 515–516
Web Services Description Language (WSDL) (Continued ) characteristics of, generally, 8, 481 –482, 511–512 defined, 481 message data, 512–514 1.2, 516–518 operations, 512, 514–515 Web Services Inspection Language (WS-IL), 519 Web Services Interoperability Organization (WS-I), 475, 519–520 Web workload, see Web browsing workload characterization of, generally analysis, types of, 137 motivation for, 136 impact of, 5, 135–136 notification workload, characterization of content analysis, 164–168 log analyses, 172–174 significance of, 163–164 system load, 171–172 user behavior analysis, 168–171 server architecture accesses, types of, 144–145 components of, 143 –144 data log description, 144 significance of, 142–143 web browsing workload, characterization of content analysis, 145–149 log analyses, 161–163 system load analysis, 160 –161 user behavior analysis, 149–160 web browsing correlated with notification amount of usage, 174– 176 popular content categories, 176 –178 wireless user workload characterization, 140–142 wireline user workload characterization content analysis, 138–139 overview of, 137–138 system load analysis, 140 user behavior analysis, 139–140 wireline web compared with mobile web system load, 179 user behavior, 179 web content, 178–179 Wide area network (WAN), 3 Wireless access, 144–145 Wireless Access Protocol (WAP) charging for, 414, 423–424, 427 defined, 5, 66 evolution of architecture, 67–69 overview, 66–67
future directions for, 83–84 mobile content delivery, 189 synchronization, 261 traffic analysis at Bell Mobility’s PCS, 141 WAP 1.0 components, 69–77 WAP 2.0, architecture overview, 78–80 WAP 2.x components application framework, 80–81 bearer networks, 83 security services, 83 service discovery, 83 session services, 81–82 transfer services, 82 transport services, 82–83 Wireless environments, constraints in, 277 Wireless identity module (WIM), 83 Wireless Internet service provider (WISPs), 164, 319 Wireless LAN (WLAN) charging, 422 content adaptation, 206 radiofrequency signals, 443 relative location, 442 streaming media, 275, 319–321 synchronization, 261 workload characterization study, 141–142 Wireless Markup Language (WML), 5, 210, 214, 223, 225, 228 Wireless profiled TCP (WP-TCP), 82 Wireless public key infrastructure (WPKI), 83 Wireless session protocol (WSP), 69, 73–74, 82, 264–265 Wireless transport layer security (WTLS), 69, 76–77, 83 Wireless transport security layer (WTSL), 83 Wireless users content analysis, 147 notification workload, 178 system load analysis, 162 user behavior analysis, 151–159 web browsing workload, 177–178 Wireless Village, 264 Wireline Internet, 184 Wireline users, 178–179 WML language, 71–73 WMLScript, 71, 73, 81 World Wide Web (WWW), see Web services current status of, 37 future directions for, 37–38 historical perspectives, 35–37 impact of, generally, 5, 473 protocols for, see HyperText Transfer Protocol; Wireless Access Protocol (WAP)
World Wide Web Consortium (W3C) composite capability/preference profiles (CC/PP) working group, 231, 252 functions of, generally, 81–82, 251– 252, 290, 474 –475 Web Services Architecture working group, 474 WS-Security, 508 WTA (wireless telephony application), 71
X.509, 509 Xbone, 116 xDSL, 422 XForms, 211 XHTML 2.0, 210–211, 214, 216 XHTML mobile profile markup languages (XHTML MP), 81, 223 XML/XML schema, 483 XML-RPC, 494 XSD, 512
Yahoo.com, 88 Yahoo Mobile, 135 Yellow Pages, 147, 161