Figure 7.2 Text with image in both HTML and SMIL.
7.5 CAPABILITIES AND METADATA
Figure 7.11 UAProf profile for the Nokia 3650 phone from nds.nokia.com.
CONTENT ADAPTATION FOR THE MOBILE INTERNET
Figure 7.12 Infopyramid of a weather service (see Section 7.4.2.1).

7.5.2 Metadata
When adaptation is done by content selection, the author must create multiple versions of the individual pieces of content (WML decks [34], images, etc.).4 For instance, the author will create HTML pages or WML decks of different lengths, and images of different sizes and qualities, to support the different terminal resolutions and network bit rates. Tomorrow's weather, for example, might have the alternative representations shown in Figure 7.12. An advanced terminal would be able to show any of the alternatives, so the service must also have some information about the relative quality and usefulness of the alternatives. Assuming the capabilities of the terminal are known, an adaptation service could in principle then choose between the content versions based only on this information. However, it then becomes the responsibility of the adaptation service to decide which content is best for a certain context, and the content author (who presumably would know best) cannot influence the decision.

Therefore, content metadata should also include information about the context in which the content can be used, rather than only information about the content itself. Figure 7.13 depicts an example of this type of metadata. As illustrated in Figure 7.13, the annotations describe the set of minimum requirements that a terminal and network must fulfill in order to receive each specific version. A version is selected only if all of its requirements are met. For instance, for a terminal with a 28,800 bps connection we cannot select the video version, even though the terminal might support video playback (because the video would take too long to download).
4 In advanced systems, different versions of presentation content can be automatically generated from generic XML data. Automating generation of multi-version media content is more difficult.
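The all-requirements-met rule can be sketched in a few lines. This is an illustrative sketch only: the attribute names, data layout, and threshold values below are assumptions, not values taken from the figure.

```python
# Each content version carries the minimum requirements that a terminal and
# network must satisfy for that version to be selectable.
# Names and values here are illustrative, not from the text.

def meets_requirements(version, context):
    """A version is selectable only if ALL of its requirements are met."""
    return all(context.get(attr, 0) >= minimum
               for attr, minimum in version["requirements"].items())

# Alternative representations of tomorrow's weather (cf. Figures 7.12/7.13):
versions = [
    {"name": "video", "requirements": {"bit_rate": 43200, "screen_x": 176}},
    {"name": "image", "requirements": {"bit_rate": 14400, "screen_x": 128}},
    {"name": "text",  "requirements": {"bit_rate": 9600,  "screen_x": 0}},
]

# A terminal that supports video playback, but whose 28,800 bps connection
# rules the video version out:
terminal = {"bit_rate": 28800, "screen_x": 176}
selectable = [v["name"] for v in versions if meets_requirements(v, terminal)]
```

With these assumed values, only the image and text versions remain selectable; the video version fails its bit-rate requirement even though the terminal could play it.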
Figure 7.13 Multimedia content descriptor (metadata) for a weather service's InfoPyramid.

A key element of the approach is to establish the usefulness, or utility, of the content. This makes it possible to decide which version to select when the terminal and network can support several versions. The content selection mechanism then chooses the version with the highest utility among those for which the terminal and network capabilities meet all the requirements.

7.6 ADAPTATION ARCHITECTURES
This section presents different adaptation architectures. Specifically, it addresses where adaptation should be performed and how terminal capabilities are propagated.

7.6.1 Location of Adaptation
There are three possible ways to ensure both that the message can be delivered to the receiver and that it conforms to the receiver's capabilities: (1) ensuring it conforms before it is sent from the content source; (2) modifying it at some intermediary point; or (3) modifying it at the receiver so that it conforms. We will use the following definitions:

- Source: the origin of the content; a content server for browsing or the sending terminal for messaging.
- Destination: the final target location of the content; a client for browsing or the receiving terminal for messaging.
- Intermediary: an entity between the source and destination; this includes all kinds of proxies, gateways, and other servers between the source and the destination.

In this section, we first discuss the advantages and disadvantages of adapting at each of these locations. Then we present the different adaptation architecture configurations.
7.6.1.1 Adaptation at the Source In this case, the adaptation is performed at the origin of the content. In browsing, the adaptation would be performed by the Web server. It can be argued that this is the most logical place to perform the adaptation, since the source can decide on the most appropriate content to send to the recipient according to the recipient's capabilities. After all, the content owner should know best what information it wants to convey to the user, taking into consideration the capabilities of the user's terminal. Commercial sites have recognized this need and often provide different content for Netscape and Internet Explorer users. The source can perform adaptation through a combination of content selection, transcoding, and scripting (e.g., XSLT) techniques, depending on the nature of its content.

But adaptation at the content source can be difficult to achieve for several reasons: it can require a lot of processing power; it can take considerable effort to create content so that it adapts to different terminals; and the source must know the terminal capabilities in order to perform adaptation at all. In today's market, most Web service providers don't find it economical to customize content for different mobile terminals. In fact, most offer a single version that can reach the majority of PC terminals; customizing their servers to reach a small additional percentage of users would not be profitable. Only a small number of commercial sites provide an additional service for mobile phones, often a basic text-only page for early WAP-enabled phones. The problem of obtaining the terminal capabilities is not an easy one to solve either. On the mobile side, standards such as UAProf can be used to learn the terminal capabilities. On the Internet side, however, UAProf is not widely used, and maintaining capability databases based on the user-agent header is quite demanding.

As more and more mobile phones come into use, though, more Websites can be expected to support mechanisms to learn the terminal capabilities and provide customized content to mobiles. For messaging applications, adaptation at the source is even more problematic: it requires that the sender know the recipient's capabilities, understand those capabilities, and be willing and able to create content to meet them. This is a lot to expect from the sender.
7.6.1.2 Adaptation at the Destination The content can also be adapted at the destination. It can be argued that the destination is the best place to perform adaptation, since the user should decide how he or she wants the content to be rendered. For instance, the user may want to change the layout of the content or change the font color or size, so there is great benefit in leaving appearance adaptation to the destination. In fact, source and destination are both important adaptation locations and should be complementary: the source should provide the best supported content possible while giving users as much flexibility as possible to control how it is rendered. Format, size, characteristics, and encapsulation adaptation at the source or an intermediary is often required for the content to be supported by, or even to reach, the destination. Adapting at the destination may also require heavy computations that are problematic for a mobile terminal: (1) it would increase the time until the content is rendered, hurting the user experience; and (2) the additional processing would shorten battery life.

7.6.1.3 Adaptation at the Intermediary When the source doesn't support adaptation, for any of the reasons mentioned above, an intermediary can perform the adaptation to enhance the usability of a service. In the mobile world, the WAP gateway performs this task for browsing applications, while the multimedia messaging service center (MMSC), or an external transcoding server under its control, presently assumes that role for the multimedia messaging service (MMS). Today's WAP gateways and MMSCs can perform image format conversion, resolution reduction, and other functions. Unlike adaptation at the source, adapting the content between the origin and the destination may have legal implications. The results of adaptation may also be unacceptable, depending on the nature of the content.

7.6.2 Adaptation Architecture Configurations
Several elements affect how and where the adaptation process is performed. How the content is adapted depends on the type of content at hand: the origin server can perform content selection when it has access to multiple versions, and it can also transcode, whereas an intermediary can typically only transcode content on the fly between the source and the destination. The location of the adaptation is often determined by knowledge of the terminal capabilities, since only network elements having such knowledge can perform adaptation. Ideally the origin server should perform adaptation, but if it doesn't or can't, an intermediary should take over. In mobile browsing, adaptation is usually performed by an intermediary, such as a WAP gateway, because it knows the terminal capabilities and can transcode accordingly, something that very few Websites offer to wireless devices.

Figure 7.14 Adaptation architecture configurations (the cube shows the location where adaptation takes place).

Figure 7.14 shows different adaptation architecture configurations. Each configuration shows where the adaptation is performed and where the terminal capability information needs to be propagated. The diagrams don't show protocol details, as these are application-dependent. In addition, the terminal capabilities are not always part of the protocol exchange; they can instead come from an operator's user profile database, for instance.

Configuration (a) illustrates the architecture in which the source performs adaptation on the basis of terminal capability knowledge. This is the case of commercial Websites using the user-agent and/or accept headers of HTTP/WSP to select or transcode content.

Configuration (b) illustrates adaptation at an intermediary. This configuration is typical in mobile browsing, where a WAP gateway adapts Web content obtained from the source. It is also the case in MMS, where the MMSC performs adaptation between the source of the message and the destination. UAProf or the user-agent and/or accept headers of HTTP/WSP can be used to propagate terminal capability information.

Configuration (c) illustrates adaptation at the destination, typically based on the terminal's display characteristics and the user's preferences.

Note that combinations of these configurations can be used in practice to distribute the adaptation operations between source, intermediary, and destination. For instance, in browsing, some initial content selection can be performed at the source, the intermediary can then perform encapsulation adaptation for efficient wireless transport, and finally the terminal can handle presentation adaptation. In the next section, we present different application scenarios and show how they use these configurations.
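In configurations (a) and (b), the capability information typically travels in the request itself. The sketch below extracts those hints from request headers; `x-wap-profile` and `User-Agent` are the standard UAProf/HTTP header names, but the parsing logic and the profile URL shown are illustrative assumptions.

```python
def extract_capability_hints(headers):
    """Pull capability hints from a request's headers.

    Returns the UAProf profile URL (if the terminal sent one) and the
    User-Agent string; either can key a capability lookup at the
    adaptation point.
    """
    # UAProf-capable terminals reference their profile document via the
    # x-wap-profile header; the value is often quoted.
    profile_url = headers.get("x-wap-profile", "").strip('" ')
    user_agent = headers.get("user-agent", "")
    return profile_url or None, user_agent or None

# Hypothetical request headers (the profile URL is made up for illustration):
request_headers = {
    "user-agent": "Nokia3650/1.0 SymbianOS/6.1",
    "x-wap-profile": '"http://nds.nokia.com/uaprof/N3650r100.xml"',
}
profile, ua = extract_capability_hints(request_headers)
```

An adaptation point would prefer the profile URL when present (it describes the terminal precisely) and fall back to the User-Agent string otherwise.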
7.7 APPLICATION SCENARIOS
This section presents different application scenarios, applying the concepts presented in earlier sections to two major application adaptation problems: browsing and multimedia messaging. It explains in more detail how each application uses the architecture configurations and the adaptation methods.
7.7.1 Scenario for Content Selection: Browsing
We first present an example of content adaptation for browsing applications using the content selection adaptation method. Again, we use the weather forecast service for illustration. In this case, the source is the weather forecast service server, there may be an intermediary, and the destination is the requesting terminal. An intermediary can be used to convert between HTTP and WSP, but we assume that, except for possible encapsulation adaptation handled by the intermediary, the source performs the adaptation. This corresponds to configuration (a) of the adaptation architecture of Figure 7.14.

Before presenting this example, we should note that today the majority of adaptation for mobile browsing is achieved through transcoding at an intermediary (in a gateway). This raises many issues, such as content quality and legal aspects. Therefore, content selection is expected to become more common in the near future than it is today. The protocol interaction, illustrated in Figure 7.15, is as follows:

1. The client requests the content of a URL from the server and provides its capabilities (UA header and optionally UAProf).
Figure 7.15 Protocol interaction in the case of a browsing application.
2. The server resolves the UAProf capabilities and, if needed, obtains additional capabilities from a local database using the UA header or the static UAProf URL (not shown).
3. The server selects the best content according to the terminal capabilities and its content selection policies. The algorithm is described below.
4. The server may perform additional transcoding or XSLT operations (not shown).
5. The server delivers the adapted content to the client.

Let's illustrate how the content selection process works for simple image selection using a specific content selection policy. The content selection algorithm proposed here is independent of the specific content descriptors and of how the information is stored. However, the effectiveness of the actual content selection process will depend on the choice of content descriptors and on the specific values entered by the author or content provider.

7.7.1.1 Content Selection Algorithm In the algorithm presented here, each media content version carries a set of requirements that we call "multimedia content descriptors" (MCDs). These requirements must be fulfilled by the terminal, the network, and the user preferences for that version of the source content to be selectable (possibly chosen). The algorithm is thus based on a comparison between multimedia content descriptors and capability and characteristic descriptors. Among all selectable versions, the one of highest value (from the user's perspective, as assessed by the author or provider) is selected. For convenience, let's assume that the content provider can order the different versions (or representations) of the content in decreasing order of value. The algorithm then selects the first version in the list whose requirements can be satisfied by the terminal, user preferences, and network. That gives the best representation the terminal and network can support.
Note that although the example illustrates the case of images, it applies equally to other elements such as layout elements (XHTML, HTML, WML, etc.). The algorithm can be summarized in the steps presented in Table 7.1. The requirements usually take forms such as BitRate >= 28,000 and/or ScreenSize >= 320 × 240, for instance. It is important to note that the algorithm is very generic and not bound to a specific set of requirement attributes such as resolution or bit rate. In Section 7.7.1.2 we illustrate further how to apply the algorithm using proposed descriptors that are useful for adaptive multimedia in a Web browsing application.

TABLE 7.1 Example of Content Selection Algorithm

Production of multimedia content descriptors (done during the creation of the source content):
  for each multimedia element:
    Set the requirements for each version of the element (usually done under the author's supervision).
    Order the versions in decreasing order of value or quality (usually done by the author).

Content selection (performed when a request for the content is received by the content selection engine from the phone):
  for each requested element (WML deck, (X)HTML page, inline image, audio, video, etc.):
    Select the first version in the list for which all requirements are satisfied (checked against the characteristics of the terminal, the network, and the user preferences); the search for a match thus starts from the version with the highest value and proceeds toward the lowest until a match occurs.
    Return the selected version of the element to the requesting entity.

7.7.1.2 The Infopyramid and Media Capability Descriptors Consider the Infopyramid and multimedia content descriptors of Figure 7.13. The content is annotated using the following multimedia content descriptors:

Utility (Value): a positive integer setting the rank of this version with respect to other related versions (where 1 is the rank of the image having the lowest value). The value order is unique for each version of the original image. The author or content provider is expected to order the content.

MinBitRate: the minimum required bit rate in bits/s (bps). This attribute specifies the minimum transfer speed required for this object to be selected. Setting this requirement too low can increase the download time to the point where it is no longer acceptable to the user. The author or content provider is expected to specify the required bit rate.

MinImageResolution: the minimum image resolution required (X × Y pixels). For this version to be selectable, the terminal must be able to accept images larger than or equal to this resolution (in both the X and Y dimensions). This attribute can be set automatically (or manually).

MinVirtualScreenSize: the minimum virtual screen size (X × Y pixels) under which the image should be displayed. The virtual screen size could represent the size of a Web page or of a WML card in WAP. The image can be selected only if the virtual screen size of the phone is equal to or larger than the minimum virtual screen size in each dimension (X and Y). This attribute is very useful for controlling the display of decorative elements. For instance, an author may want some small decorative images displayed only if the virtual resolution of the terminal is large enough; if only MinImageResolution were used, such small images would probably be acceptable even on a small display and would overcrowd it without much value for the user. The author or content provider is expected to specify MinVirtualScreenSize; by default, it could be set to the image resolution.

MediaFormat: the media format in which the picture is stored. To be acceptable for selection, the media format must be an element of the list of media formats that the terminal accepts. For convenience, the format name follows the notation of MIME types.
7.7.1.3 The Terminal's Media Capability Descriptors Associated with these multimedia content descriptors, the terminal provides its media capability descriptors (MCDs) when making a request for content. The MCDs are:5

BitRate: the terminal's average connection bit rate.
MaxImageResolution: the terminal's maximum supported image resolution (X × Y pixels).
VirtualScreenSize: the terminal's virtual screen size (X × Y pixels).
MediaFormatSet: the media formats the terminal supports.
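The selection steps of Table 7.1, applied to these descriptors, can be sketched as follows. This is an illustrative implementation: the descriptor names follow the text, but the data layout, the function name, and the example values in `pyramid` are assumptions.

```python
def select_version(versions, terminal):
    """Return the first selectable version, scanning from highest Utility down.

    `versions` carry the multimedia content descriptors of Section 7.7.1.2;
    `terminal` carries the media capability descriptors of Section 7.7.1.3.
    """
    for v in sorted(versions, key=lambda v: v["Utility"], reverse=True):
        if (terminal["BitRate"] >= v["MinBitRate"]
                and terminal["MaxImageResolution"][0] >= v["MinImageResolution"][0]
                and terminal["MaxImageResolution"][1] >= v["MinImageResolution"][1]
                and terminal["VirtualScreenSize"][0] >= v["MinVirtualScreenSize"][0]
                and terminal["VirtualScreenSize"][1] >= v["MinVirtualScreenSize"][1]
                and v["MediaFormat"] in terminal["MediaFormatSet"]):
            return v
    return None  # no version is acceptable for this terminal

# Hypothetical InfoPyramid for one weather map (values are illustrative):
pyramid = [
    {"Utility": 3, "MinBitRate": 28000, "MinImageResolution": (320, 240),
     "MinVirtualScreenSize": (320, 240), "MediaFormat": "image/jpeg"},
    {"Utility": 2, "MinBitRate": 14400, "MinImageResolution": (160, 120),
     "MinVirtualScreenSize": (160, 120), "MediaFormat": "image/gif"},
    {"Utility": 1, "MinBitRate": 9600, "MinImageResolution": (50, 50),
     "MinVirtualScreenSize": (80, 80), "MediaFormat": "image/vnd.wap.wbmp"},
]

# A low-bit-rate, small-screen terminal: only the lowest-utility version fits.
terminal2 = {"BitRate": 15000, "MaxImageResolution": (50, 50),
             "VirtualScreenSize": (80, 80),
             "MediaFormatSet": {"image/jpeg", "image/gif", "image/vnd.wap.wbmp"}}
chosen = select_version(pyramid, terminal2)
```

A more capable terminal (higher bit rate and larger screen) would pass the checks for the utility 3 version on the first iteration and receive it instead.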
7.7.1.4 Results of the Media Content Selection Figure 7.16 shows the selected media content component for three devices.

7.7.1.5 Results of the Overall Adaptation The previous section showed which media component is selected for each terminal. We now describe the adaptation of the presentation part and show the final adaptation result visible on each terminal. The selection of the base HTML/WML pages can also be performed using content selection. Alternatively, or in addition, XSLT is applied to the layout content to meet the terminal characteristics. The layout components include URLs to images, video, and other included media content. Each such URL is an abstract link: it refers not to a specific version but to the set of versions (the original filename). When the selection engine receives a request for such a URL, it selects the content and returns the best version. Using abstract URLs is a very important technique for this purpose. Figure 7.17 shows how the weather application might look on the different devices at the end of the adaptation process.

7.7.2 Scenario for Transcoding: Multimedia Messaging Service
The multimedia messaging service (MMS) [16–19,25,60] is the next evolutionary step from the short message service (SMS). While SMS is typically used to exchange short text messages between users, MMS provides the opportunity to exchange much larger messages composed of a wide and rich variety of content types, including still images, graphics, audio, music, and video clips. MMS is expected to become an important 3G application and enabler. The MMS architecture and the overall concepts have been standardized in the Third Generation Partnership Project (3GPP) [16–18]. On the basis of the work and requirements of 3GPP, the Wireless Application Protocol (WAP)-based implementation specifications have been the responsibility of the WAP Forum.

5 These capabilities are provided for illustration purposes. Actual MCDs are defined in the UAProf specification.
Figure 7.16 Selected media content for different browsing devices.

Terminal 1: BitRate 15000; MaximumImageResolution 320×240; VirtualScreenSize 320×480; MediaFormatSet "image/jpeg", "image/gif", "image/vnd.wap.wbmp". Note: BitRate too low for receiving the utility 5 image.

Terminal 2: BitRate 15000; MaximumImageResolution 50×50; VirtualScreenSize 80×80; MediaFormatSet "image/jpeg", "image/gif", "image/vnd.wap.wbmp". Note: VirtualScreenSize too low for the utility 4 image.

Terminal 3: BitRate 50000; MaximumImageResolution 320×240; VirtualScreenSize 320×240; MediaFormatSet "image/jpeg", "image/gif", "image/vnd.wap.wbmp", "video/3gp". Note: a BitRate lower than 43200 would have resulted in the utility 5 image.
Figure 7.17 Weather application on the different devices.

Now the work is under the responsibility of the Open Mobile Alliance (OMA), where the specification activity continues [19]. The specifications now also take into account requirements from 3GPP2 [58,59]. However, this new service brings new challenges related to interoperability and user experience. MMS is evolving in a pervasive environment composed of mobile terminals with very different characteristics. For instance, some early MMS phones could send and receive messages no larger than 30 kB, while others could support up to 100 kB. To complicate this situation, the
capabilities of new mobile terminal products are evolving very rapidly. For instance, while the first MMS terminals supported images but no video, many MMS terminals today support video and will soon support vector graphics. This environment makes it very challenging to introduce new formats and services over MMS while maintaining backward interoperability with older, less capable mobile terminals.

Server-side multimedia message adaptation (MMA) is a technology that attempts to reduce MMS interoperability problems and allow smoother format and service evolution. Specifically, server-side MMA consists of adapting the content of a multimedia message in the multimedia messaging service center (MMSC), or in an external transcoding server under its control, to suit the capabilities of the receiving terminal. After a short introduction to the applications MMS enables and to the protocol flow, we discuss how content adaptation can be performed in MMS.

7.7.2.1 MMS Applications MMS introduces a generic mechanism to encapsulate and transport multimedia content without restricting the formats used.6 Therefore, MMS can be the foundation of numerous and very diverse applications, as illustrated in Figure 7.18. MMS can support the following applications:

- Mobile to mobile: sending/receiving photos, audio/video clips, voicemail, business cards, and so on
- Web applications to mobile devices: electronic postcards, greeting cards, advertising, news of the day (video/audio clips), screen savers, animations, maps
- Internet to/from mobile devices: receiving selected emails, sending emails

MMS can also be an enabler for many other applications, such as interactive games.

7.7.2.2 MMS Transactions Figure 7.19 shows the message flow for multimedia message delivery. The steps are as follows:

1. The sender's terminal initiates a WAP POST (using WSP or HTTP) request to the MMSC in order to send a message. This operation uploads the message to the MMSC.
The MMSC is then responsible for the delivery. Note that MMS is a store-and-forward messaging service.

2. After the MMSC has stored the message, it sends a notification to the recipient's terminal to inform it that a new message has arrived. The notification is typically carried using WAP PUSH (e.g., with SMS as the bearer). The notification contains a URL associated with the message, as well as information about the message such as when it expires, its size, and optionally the sender's address.

3. The notification triggers in the recipient's terminal a WAP GET (using WSP or HTTP) operation that fetches the message (using its URL) from the MMSC to the mobile device. This transaction carries information about the terminal type (UA header) and may carry information about the terminal capabilities using UAProf. Such information is crucial for message adaptation.

4. The MMSC retrieves, from its database, the message corresponding to the URL. It may then adapt the message to meet the terminal capabilities; message adaptation is not mandatory in earlier MMS specifications but becomes mandatory with MMS v1.2 [60] and future specifications.

5. The MMSC sends the resulting message to the destination terminal.

6. The terminal confirms reception of the message (not shown).

7. The MMSC may send a delivery report to the sender using WAP PUSH (not shown).

Figure 7.18 Exchange of MMS messages between mobile devices, the Internet (email, instant messaging), and Web applications.

Figure 7.19 MMS transactions and adaptation framework.

6 Actually, MMS could transport any type of content, such as Java MIDlets or binary data.
We can see in Figure 7.19 that the delivered message was adapted to meet the lower resolution and memory capabilities of the receiving terminal. For instance, the phone can support messages no larger than 30 kB, including images with a resolution not exceeding 352 × 288.

7.7.2.3 The MMS Conformance Document Since the specifications of the first MMS version did not mandate formats, many equipment manufacturers decided to join forces in writing an MMS conformance document [20]. Its purpose was to ensure some degree of terminal interoperability when MMS was initially introduced, by defining simple baseline requirements that first-generation MMS terminals should meet. It was understood that terminals would later support richer formats, such as the ones defined in 3GPP TS 26.140 [18]. The first conformance document recommends SMIL for presentation; baseline JPEG, GIF, and WBMP for images; and AMR for audio. The minimum supported image resolution should be 160 × 120, and the supported message size should be no less than 30 kB. Again, it is important to emphasize that this conformance document is not intended to limit the functionality of terminals but to set a minimum assumption in early MMS deployment, when the destination terminal capabilities are unknown. These requirements can certainly be exceeded, and most phones introduced on the market now do exceed them. Also, new conformance documents or newer MMS profiles are expected. Therefore, MMS adaptation is still required, but conformance documents limit the scope of the adaptation functionality needed.

7.7.2.4 The UAProf Descriptors for the MMS Application The standardized UAProf capability descriptors for MMS adaptation are presented in Table 7.2. The adaptation is performed taking these capabilities into account, in addition to some local ones that may be associated with the UA header or static UAProf URL.
MmsMaxMessageSize, MmsMaxImageResolution, and MmsCcppAccept are especially important for media content adaptation.
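As an illustration of how these three descriptors could drive adaptation decisions, here is a hedged sketch. The decision rules and the function name `plan_mms_adaptation` are illustrative assumptions, not part of the MMS specifications.

```python
def plan_mms_adaptation(message, caps):
    """Decide which adaptation steps a message needs for a given terminal.

    `caps` holds the three key UAProf MMS descriptors; `message` describes
    the message to deliver. Returns the list of required operations.
    """
    ops = []
    # MmsCcppAccept: the receiving terminal must accept the content type.
    if message["content_type"] not in caps["MmsCcppAccept"]:
        ops.append("transcode-format")
    # MmsMaxImageResolution: downscale images that exceed the maximum.
    w, h = message["image_resolution"]
    max_w, max_h = caps["MmsMaxImageResolution"]
    if w > max_w or h > max_h:
        ops.append("reduce-resolution")
    # MmsMaxMessageSize: shrink (e.g., fewer colors, recompression) to fit.
    if message["size_bytes"] > caps["MmsMaxMessageSize"]:
        ops.append("reduce-size")
    return ops

# Hypothetical terminal capabilities and message (values are illustrative):
caps = {"MmsMaxMessageSize": 30 * 1024,
        "MmsMaxImageResolution": (160, 120),
        "MmsCcppAccept": {"image/jpeg", "image/gif", "text/plain"}}
message = {"content_type": "image/gif",
           "image_resolution": (300, 236), "size_bytes": 53 * 1024}
ops = plan_mms_adaptation(message, caps)
```

For this message, the format is already acceptable, but both the resolution and the message size exceed the terminal's limits, so two adaptation steps are required.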
TABLE 7.2 UAProf Descriptors for MMS Application

MmsMaxMessageSize: the maximum size of a multimedia message in bytes.
MmsMaxImageResolution: the maximum size of an image in pixels (horizontal × vertical).
MmsCcppAccept: list of supported content types, conveyed as MIME types.
MmsCcppAcceptCharSet: list of character sets that the MMS client supports; each item in the list is a character set name registered with IANA.
MmsCcppAcceptLanguage: list of preferred languages; the first item in the list should be considered the user's first choice; each item is the name of a natural language as defined by IETF RFC 1766.
MmsCcppAcceptEncoding: list of transfer encodings that the MMS client supports; each item is a transfer encoding name as specified by RFC 2045 and registered with IANA.
MmsVersion: the MMS versions supported by the MMS client, conveyed as majorVersionNumber.minorVersionNumber.
MmsCcppStreamingCapable (introduced in MMS 1.1): indicates whether the MMS client is capable of invoking streaming.
Presently, not many phones support UAProf, or at most they support static UAProf (not its dynamic form). For phones not supporting UAProf, the MMSC must rely on the UA header, which is used as a key into a database containing capabilities for all phones on the market (and for each software release they may have). When static UAProf is received, the URL can also serve as a key into the same database; the MMSC can also fetch the terminal profile from the given URL and cache it for future requests. Dynamic UAProf, however, is becoming an important feature, as terminals can now download and install software that adds support for new media formats. Without dynamic UAProf, an MMSC can't tell the difference between two terminals of the same model and software release when one has installed new media format support and the other has not.

7.7.2.5 MMS Adaptation Example for a Weather Service Consider an MMS-based weather application service. Every day, the service sends each subscriber an MMS containing weather forecast information. In this case, the source is the weather forecast service server, the intermediary is the MMSC, and the destination is the service subscriber. This corresponds to configuration (b) of the adaptation architecture presented in Figure 7.14 and described in the text following the figure. This is the most common model in MMS since, typically, only the recipient's MMSC has knowledge of the terminal capabilities, and only after the message retrieval is requested.
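The capability-resolution fallback described above (dynamic UAProf when available, otherwise a database keyed by the static UAProf URL or the UA header) might be sketched as follows. The function, the database shape, and the sample entry are hypothetical.

```python
def resolve_capabilities(ua_header, uaprof_url, capability_db, fetch_profile=None):
    """Resolve terminal capabilities in the order described in the text.

    Preference: a dynamically fetched UAProf document; otherwise a prebuilt
    database lookup keyed by the static UAProf URL or by the UA header.
    """
    if uaprof_url and fetch_profile:
        # Dynamic UAProf: authoritative, reflects software the user installed.
        profile = fetch_profile(uaprof_url)
        if profile:
            return profile
    # Fall back to the database (one entry per model and software release).
    if uaprof_url and uaprof_url in capability_db:
        return capability_db[uaprof_url]
    return capability_db.get(ua_header)

# Hypothetical database keyed by UA header:
db = {"Nokia3650/1.0": {"MmsMaxMessageSize": 30 * 1024}}
caps = resolve_capabilities("Nokia3650/1.0", None, db)
```

An unknown terminal resolves to no capabilities, in which case an MMSC would typically fall back to conformance-document baseline assumptions.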
CONTENT ADAPTATION FOR THE MOBILE INTERNET
Nevertheless, the source could send either (1) multiple versions of the content, from which the MMSC could select the best alternative, or (2) a single version that the MMSC may have to transcode. Option 1 is not practical for person-to-person communications, as a user normally sends only a single message version (providing multiple versions would not only be cumbersome but would also significantly increase the overall message size). For application-originated content, however, this would be a good option, assuming that the MMSC can perform the content selection; the SMIL switch statement would permit providing multiple versions. The origin server could also perform the content selection itself if it knows the terminal capabilities, for instance, if the user provided them when subscribing to the service. Since transcoding is the approach used most in MMS today, we will illustrate that case. Consider an MMS containing weather forecast information, as illustrated in Figure 7.20a. It doesn't comply with the capabilities of the two receiving terminals presented in Figures 7.20b and 7.20c. For the first terminal, the MMSC reduced the number of colors of the GIF map to 32 to meet the size constraint. For the second terminal, the MMSC reduced the resolution of the GIF map by half to meet the resolution constraint. This was also sufficient to meet the message size constraint of 30 kB, so 256 colors were retained.

7.7.3 Concluding Remarks
It should be clear to the reader that there is no single best method for adapting content that suits all situations. Content selection gives the author more control over the adapted versions of the content but requires knowledge of the target terminal and some
Figure 7.20 Example of MMS adaptation for a weather service. The message text reads "Finland: Sunny today. Maximum 21C/70F. Minimum 13C/55F. Sweden: ..."
(a) Original message (53 kB): GIF 300×236, 256 colors, 51 kB; text + SMIL, 2 kB.
(b) Adapted message (36 kB): GIF 300×236, 32 colors, 34 kB; text + SMIL, 2 kB. Terminal capabilities: MmsMaxMessageSize = 40 kB, MmsMaxImageResolution = 320×240, MmsCcppAccept = image/jpeg, image/GIF.
(c) Adapted message (17 kB): GIF 150×118, 256 colors, 15 kB; text + SMIL, 2 kB. Terminal capabilities: MmsMaxMessageSize = 30 kB, MmsMaxImageResolution = 160×120, MmsCcppAccept = image/jpeg, image/GIF.
work to create the different versions and establish the selection rules. Transcoding works well for automatic adaptation of simple media content but often fails when the content is more sophisticated, such as a Webpage with a complex layout. Transcoding also often requires more processing resources and may raise legal issues. But it may be the only solution if the origin doesn't perform adaptation.
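The transcoding decisions illustrated in Figure 7.20 can be sketched as follows. This is a deliberately simplified model: the size-reduction factors are crude assumptions chosen only so the numbers roughly track the figure; a real transcoder would re-encode the image and measure the actual result.

```python
# Simplified sketch of the MMSC adaptation logic behind Figure 7.20.
# Size-estimation factors are illustrative assumptions, not measurements.

def adapt(image, other_kb, caps):
    """Downscale and/or reduce colors until the message fits the terminal."""
    w, h = image["resolution"]
    max_w, max_h = caps["MmsMaxImageResolution"]
    # Step 1: halve the resolution while it exceeds the terminal's maximum.
    while w > max_w or h > max_h:
        w, h = w // 2, h // 2
        image = {**image, "resolution": (w, h), "kb": image["kb"] * 0.3}
    # Step 2: if the message is still too large, reduce the color depth.
    while image["kb"] + other_kb > caps["MmsMaxMessageSize_kb"] and image["colors"] > 2:
        image = {**image, "colors": image["colors"] // 8, "kb": image["kb"] * 0.67}
    return image

gif = {"resolution": (300, 236), "colors": 256, "kb": 51}
# Terminal (b): 40 kB limit, 320x240 -- resolution fits, so only colors are reduced.
print(adapt(gif, other_kb=2, caps={"MmsMaxImageResolution": (320, 240),
                                   "MmsMaxMessageSize_kb": 40}))
# Terminal (c): 30 kB limit, 160x120 -- halving the resolution alone suffices,
# so 256 colors are retained.
print(adapt(gif, other_kb=2, caps={"MmsMaxImageResolution": (160, 120),
                                   "MmsMaxMessageSize_kb": 30}))
```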
7.8 STANDARDIZATION AND FUTURE WORK
Content adaptation is a topic of high importance and interest, and it is expected to be a key part of future mobile applications. Several standardization activities are underway to shape new adaptation technologies and services, including those of OMA, the ICAP Forum, MPEG, and W3C. This section briefly presents some of those activities.

In OMA, the MMS working group has introduced the concept of message classes and established minimum adaptation requirements between those classes to be supported by all MMSCs. For instance, MMSCs must support image resolution and size adaptation for formats such as JPEG. These requirements will be part of OMA MMS version 1.2 [60]. A new working group called standard transcoding interface (STI) was also formed in OMA to define a common transcoding interface between multimedia application servers (MMSC, browsing server, downloading server) and a transcoding server. More information can be found at the OMA Website [19].

The Internet Content Adaptation Protocol [49] is another protocol providing application servers with the possibility of making transformation requests to another server. The protocol is HTTP-based. ICAP was designed to support transformation services such as language translation, virus checking, family (PG/R/X) content filtering, local real-time ad insertion, wireless protocol translation, anonymous Web usage profiling, transcoding, and image enhancement. The work of ICAP concentrates mostly on the architecture and the transfer of request/response attributes and data between servers. Regarding transcoding, however, it appears to be assumed that the transcoding server knows what transformations need to be performed on the content. OMA's STI, on the other hand, will define more precisely in the interface what requirements the adapted content must meet (size, formats, etc.) and possibly some preferences on how adaptation should be done.
The vision of MPEG-21 [50] is to define a multimedia framework to enable transparent use of multimedia resources across a wide range of networks and devices used by different communities. MPEG-21 leverages the already existing MPEG standards, such as MPEG-1, -2, and -4 for audiovisual representation and the XML-based MPEG-7 for content description. MPEG-21 contains several parts, including digital item adaptation (DIA). In MPEG-21, the adaptation engines themselves are nonnormative tools of DIA, but the media descriptions and format-independent mechanisms that provide support for DIA are normative. Specifically, the following item descriptions are under MPEG-21's scope: user characteristics, terminal capabilities, network characteristics, natural environment characteristics, resource adaptability, and session mobility.
The World Wide Web Consortium (W3C) [46] develops interoperable technologies (specifications, guidelines, software, and tools) for the Web. It is the most important standards forum in the area of Web markup languages. Within the W3C, two working groups are especially interesting from a content adaptation point of view: the Device Independence working group and the CC/PP working group.
. The Device Independence working group [47] studies issues related to authoring, adaptation, and presentation of Web content and applications that can be delivered effectively through different access mechanisms.
. The CC/PP working group develops a framework for the management of device profile information [48]. The group is chartered to deliver a framework that allows the user to plug in different vocabularies. A vocabulary provides naming and syntax for device properties such as screen size, markup support, or browser version. UAProf [14] (specified by the WAP Forum, whose work is now continued within the Open Mobile Alliance) is probably the most relevant example of such a vocabulary. UAProf also specifies how profile information is attached to HTTP/WSP requests.

REFERENCES

1. 3GPP TS 26.071, Mandatory Speech Codec Speech Processing Functions; AMR Speech Codec; General Description.
2. ISO/IEC 14496-3:2001, Information Technology—Coding of Audio-Visual Objects—Part 3: Audio.
3. Digital Compression and Coding of Continuous-Tone Still Images, ISO/IEC IS 10918-3, ITU-T Recommendation T.84, 1990 (JPEG specification).
4. Graphics Interchange Format, Version 89a, Programming Reference, CompuServe, Inc., 1990; http://256.com/gray/docs/gifspecs.
5. T. Boutell et al., PNG (Portable Network Graphics) Specification Version 1.0, IETF RFC 2083, March 1997.
6. Multiple-image Network Graphics, http://www.libpng.org/pub/mng/.
7. ISO/IEC 15444-1 (2000), Information Technology—JPEG 2000 Image Coding System: Core Coding System, Part 1.
8. W3C Working Draft, Scalable Vector Graphics (SVG) 1.1 Specification, http://www.w3.org/TR/SVG11, Feb. 2002.
9. W3C Recommendation, Mobile SVG Profiles: SVG Tiny and SVG Basic, http://www.w3.org/TR/SVGMobile.
10. ITU-T Recommendation H.263, Video Coding for Low Bit Rate Communication.
11. ISO/IEC 14496-2:2001, Information Technology—Coding of Audio-Visual Objects—Part 2: Visual.
12. R. Mohan, J. R. Smith, and Chung-Sheng Li, Adapting Internet multimedia content for universal access, IEEE Trans. Multimedia, 1(1): 104–114 (March 1999).
13. W3C Working Draft, CC/PP Structure and Vocabularies, http://www.w3.org/Mobile/CCPP/Group/Drafts/WD-CCPP-struct-vocab-20010620/, June 2001.
14. OMA (formerly WAP Forum), WAP UAProf Specification, http://www1.wapforum.org/tech/documents/WAP-248-UAProf-20011020-a.pdf, Oct. 2001.
15. W3C Candidate Recommendation, Resource Description Framework (RDF) Schema Specification 1.0, http://www.w3.org/TR/2000/CR-rdf-schema-20000327, March 2000.
16. 3GPP TS 22.140 V6.5.0, Multimedia Messaging Service (MMS); Technical Specification Group and System Aspects, Stage 1 (Release 5), http://www.3gpp.org/ftp/Specs, March 2004.
17. 3GPP TS 23.140 V6.5.0, Multimedia Messaging Service (MMS); Functional Description, Stage 2 (Release 6), http://www.3gpp.org/ftp/Specs, March 2004.
18. 3GPP TS 26.140 V5.2.0, Multimedia Messaging Service (MMS); Media Formats and Codecs, http://www.3gpp.org/ftp/Specs, Dec. 2002.
19. Open Mobile Alliance (OMA), http://www.openmobilealliance.org.
20. CMG, Ericsson, Nokia, Sony-Ericsson, Comverse, Logica, Siemens, Motorola, MMS Conformance Document, Version 2.0.0, Feb. 2002, http://www.forum.nokia.com.
21. W3C Recommendation, Synchronized Multimedia Integration Language (SMIL 2.0), http://www.w3.org/TR/2001/REC-smil20-20010807/, Aug. 2001.
22. W3C Recommendation, XHTML Basic, http://www.w3.org/TR/2000/REC-xhtml-basic-20001219, Dec. 2000.
23. B. C. Smith and L. Rowe, Algorithms for manipulating compressed images, IEEE Comput. Graph. Appl., 34–42 (1993).
24. N. Merhav and V. Bhaskaran, A transform domain approach to spatial domain image scaling, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP-96), Vol. 4, 1996, pp. 2403–2406.
25. S. Coulombe, G. Grassel, and P. Hjort, Multimedia messaging—the evolution of SMS to MMS, in Mobile Internet Technical Architecture—Technologies and Standardization, IT Press, 2002; http://www.itpress.biz/.
26. OMA (formerly WAP Forum), WAP Wireless Application Environment, Nov. 4, 1999.
27. 3GPP TS 23.140, Release 1999, ftp://www.3gpp.org/ftp/Specs.
28. OMA (formerly WAP Forum), XHTML Mobile Profile, http://www1.wapforum.org/tech/terms.asp?doc=WAP-277-XHTMLMP-20011029-a.pdf, Oct. 2001.
29. Eizel Technologies, Inc., Amplifi Enterprise Server, http://www.eizel.com.
30. W3C Recommendation, HTML 4.01 Specification, http://www.w3.org/TR/1999/REC-html401-19991224/, Dec. 1999.
31. W3C Working Draft, XHTML 2.0, http://www.w3.org/TR/2003/WD-xhtml2-20030506/, May 2003.
32. W3C Candidate Recommendation, CSS Media Queries, http://www.w3.org/TR/css3-mediaqueries/, July 2002.
33. W3C Recommendation, Cascading Style Sheets Level 2—CSS2 Specification, http://www.w3.org/TR/1998/REC-CSS2-19980512/, May 1998.
34. OMA (formerly WAP Forum), Wireless Markup Language Specification 1.3, http://www1.wapforum.org/tech/terms.asp?doc=WAP-191-WML-20000219-a.pdf, Feb. 2000.
35. M. Hudley, A Framework for Multilingual, Device-Independent Web Sites, Sun Developer Connection, April 2001, http://wwws.sun.com/software/xml/developers/xmlldijsp/framework.html.
36. W3C, The Extensible Stylesheet Language Family (XSL), http://www.w3.org/Style/XSL/.
37. W3C Recommendation, XSL Transformations (XSLT) Version 1.0, http://www.w3.org/TR/1999/REC-xslt-19991116, Nov. 1999.
38. OMA (formerly WAP Forum), WAP 2.0 Specifications, http://www.wapforum.org/what/technical.htm.
39. Oracle Corporation, Oracle 9iAS Wireless, http://otn.oracle.com/products/iaswe/content.html.
40. Apache Software Foundation, Apache Cocoon, http://cocoon.apache.org/2.0/.
41. Internet Mail Consortium, vCard and vCalendar, http://www.imc.org/pdi/.
42. Sun Microsystems, Inc., Mobile Information Device Profile, http://java.sun.com/products/midp/.
43. Nokia Corporation, Series 60 Platform, http://www.forum.nokia.com/series60.
44. PalmSource, Inc., Palm OS, http://www.palmsource.com/.
45. Microsoft Corporation, Pocket PC, http://www.pocketpc.com/.
46. World Wide Web Consortium, http://www.w3.org/.
47. W3C Device Independence Working Group, http://www.w3.org/2001/di/Group/.
48. W3C CC/PP Working Group, http://www.w3.org/Mobile/CCPP/Group/.
49. Internet Content Adaptation Protocol (ICAP) Forum, http://www.i-cap.org/.
50. MPEG (Moving Picture Experts Group), ISO/IEC JTC1/SC29 WG11, http://mpeg.telecomitalialab.com/standards/mpeg-21/mpeg-21.htm.
51. Sun Microsystems, Inc., JavaServer Pages Technology, http://java.sun.com/products/jsp/.
52. OMA, SyncML—Data Synchronization and Device Management, http://www.openmobilealliance.org/syncml/.
53. R. Fielding, J. Gettys, J. Mogul, H. Nielsen, and T. Berners-Lee, Hypertext Transfer Protocol—HTTP/1.1, IETF RFC 2068, Jan. 1997.
54. OMA (formerly WAP Forum), WAP Wireless Session Protocol Specification, July 2001.
55. ETSI, Digital Cellular Telecommunications System (Phase 2); International Mobile Station Equipment Identities (IMEI), ETS 300 508 (GSM 02.16 version 4.7.1), Nov. 2000.
56. EU Project Consensus, IST 2001 32407, http://www.consensus-online.org; software to be released early 2004.
57. W3C Recommendation, Extensible Markup Language (XML) 1.0, 2nd ed., Oct. 2000.
58. 3GPP2 X.S0016-000-A, 3GPP2 Multimedia Messaging System, MMS Specification Overview, Revision A, http://www.3gpp2.org, May 2003.
59. 3GPP2 C.S0045-0, Multimedia Messaging Service (MMS) Media Format and Codecs for cdma2000 Spread Spectrum Systems, http://www.3gpp2.org, Dec. 2003.
60. Open Mobile Alliance, OMA Multimedia Messaging Service v1.2, http://www.openmobilealliance.org, Sept. 2003.
CHAPTER 8

CONTENT SYNCHRONIZATION

GANESH SIVARAMAN
Nokia, Helsinki, Finland

8.1 INTRODUCTION
Content synchronization has been used since the early 1990s for database replication. The Internet is a good example of where database replication is widely used. Widely used content, such as Webpages, files, and emails, is stored and frequently updated on central servers, such as Web servers, file servers, and mail servers. Having all content in one central server works well if the Internet traffic is low and if most users consuming the content are in the same location as the server. But in the Internet, that is not the case. Users consuming the content are distributed across various locations, and in such a situation accessing the content from a central server is not just slow but also unreliable, as the central server may encounter a failure and the user will not have access to the data he/she desires. Hence, it is very common to have the content distributed across numerous servers, known as "mirror servers." The mirror servers need not be connected at all times with the central server. The process used for distributing the content is replication, which is a type of synchronization. Replication allows exchange of information from a central data store that holds all content with other servers that may not be connected with the central server at all times. Replication copies content from the central server to other servers, and when changes are made to the central server, these changes are also communicated and exchanged with the other servers. Replication is simply synchronization of the content residing in the central server's database with other servers' databases, thereby ensuring that all servers have identical data. An important point to note and understand in the case of synchronization is that the content stored at a central location is copied and stored locally for access and modification.
Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.

The local storage of the data provides certain advantages:
. Load balancing—with many corporations and service providers, the number of users accessing the data is huge, and the users are usually physically
located in different geographic locations. In such cases, it is very useful and important that users be able to access the data fast.
. Fault tolerance—storing all data in one server can be very risky; one could essentially lose all the data in the event of a system failure.
. Offline access—in order to access and/or modify the data, it is much easier to do it off line with the "local copy," rather than accessing the "master copy" for every change that occurs.

As seen above, data are not in one server and one database, but rather in different servers and different databases, possibly in different geographic locations, and all servers need not be connected to the central server at all times, as shown in Figure 8.1. Since servers may be disconnected, changes and actions made or taken during the disconnected state may be unknown to either side. Hence, for such a distributed setup of databases, synchronization between all the servers and their databases is of utmost importance to ensure that all changes and actions that have taken place on the data are exchanged and that the datasets among all servers are in the same state. Synchronization enables all the distributed copies of the central data store to remain consistent by communicating the changes between each copy and the central database and by resolving conflicts that may arise when contradictory changes occur on the same data of the database. Just as synchronization is very important for computers and the wired world, it is also very important for mobile devices and the wireless world. This chapter explores content
Figure 8.1 Replication system.
synchronization in detail from a mobile device and wireless network perspective. Later, we will briefly discuss an open synchronization standard, which provides a complete, open, and interoperable synchronization solution for mobile devices by taking into account the various mobile and wireless requirements.
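The replication scheme described above can be sketched in a few lines. This is a minimal illustration under assumed names: the central server records an ordered change log, and each mirror pulls only the changes made since its last replication; real systems add versioning, ordering guarantees, and failure handling.

```python
# Minimal sketch of replication as one-way synchronization: the central
# server records changes, and each mirror applies only the deltas.
# All class and attribute names are illustrative.

class CentralServer:
    def __init__(self):
        self.data = {}
        self.log = []               # ordered change log: (key, value)

    def put(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class Mirror:
    def __init__(self):
        self.data = {}
        self.applied = 0            # position reached in the central change log

    def pull(self, central):
        """Apply only the changes made since the last replication."""
        for key, value in central.log[self.applied:]:
            self.data[key] = value
        self.applied = len(central.log)

central = CentralServer()
mirror = Mirror()
central.put("index.html", "v1")
mirror.pull(central)
central.put("index.html", "v2")   # change made while the mirror is disconnected
mirror.pull(central)              # the next replication brings the mirror up to date
print(mirror.data)                # {'index.html': 'v2'}
```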
8.2 WHY MOBILE DEVICES NEED SYNCHRONIZATION
The first generation of mobile phones, introduced for commercial use in 1978, was designed for voice traffic. In those early days, mobile phones were bulky, very expensive, and limited to a small coverage area, making roaming impossible, and they were hardly capable of supporting any of the high-end services and features used today. But with the rapid development of mobile technologies, the second generation (2G) of mobile telephony brought better quality in the form of digital cellular networks and wide coverage that allowed mobile users to roam easily and conveniently from network to network. All this allowed for a high proliferation of 2G-based phones. With more and more people using mobile phones, it became natural to go beyond voice to data-centric services, including browsing, messaging, streaming video and audio, and synchronization. Mobile phones are now viewed as more than merely a means of telephonic communication; instead, they are gaining status as mobile computing devices. Current mobile phones are feature-rich and create a whole new service industry for mobile phone users. Many large corporations employ a mobile workforce that represents the sales and support organization. For such groups, a mobile computing device comes in very handy, as it allows one to gain access in real time to the backend systems for stock status updates and other vital information needed by a mobile workforce on the move. In most cases, mobile users are not always connected to the network and its stored data. Thus mobile users retrieve data from the network and store them locally on the mobile device, where they access and manipulate the local copy of the data. From time to time, users reconnect with the network to send any local changes back to the networked data repository. During this operation, users may also have the opportunity to learn about updates made to the networked data while the device was disconnected.
At certain times, they also need to resolve conflicts among the updates made to the networked data. This reconciliation operation—where updates are exchanged and conflicts are resolved—is known as data synchronization. In other words, data synchronization is the process of making two datasets appear identical. Synchronization is very important for a mobile workforce where data are typically modified and updated locally in a disconnected or offline state. In this state, changes and updates made on the device side are unknown to the server, and changes made to the server are unknown to the client. As an example, mobile phone users are increasingly using their phones to access corporate email and calendar systems. Mobile phones are not typically connected to these systems at all times, but the users can connect to them or "go on line" when needed. For such applications, where data reside on both the client and server sides and the users are off
line for long periods, synchronization is of utmost importance. Figure 8.2 illustrates a scenario for mobile device synchronization.
8.3 FUNDAMENTAL PRINCIPLES OF SYNCHRONIZATION
To provide the reader with a greater understanding of content synchronization, it is important to explain some fundamental principles that form the basis for content synchronization. As seen earlier, synchronization allows "exchanging" (reversal or modification) of changes that have occurred between databases that store the same dataset. This section explores how synchronization takes place and provides some details on the elements that are needed for synchronization.

8.3.1 Types of Synchronization

8.3.1.1 One- versus Two-Way Synchronization
One-Way Synchronization. As the term suggests, one-way synchronization (Fig. 8.3) allows only one side to communicate and send the changes to the data stored by that side. Which side should send the changes depends on the implementation and the configuration of the system. Typically, a content replication system, such as file replication or Web content replication, could be viewed as one-way synchronization, where the central server always propagates to the other servers all the changes that have occurred since the previous synchronization.

Two-Way Synchronization. Two-way or bidirectional synchronization (Fig. 8.4) allows both sides to exchange all changes and is the most common type of content synchronization. Typically, a dataset copied from a central database and stored locally is modified, and new data may also be added to the dataset. As an example, in a corporate environment, most employees would download (essentially copy) the emails from the mail server and store them locally. Once they are available on an employee's system, the employee would modify them by reading, deleting, or even creating new emails. These actions are performed mostly "off line," unknown to the central server, in this case the mail server.
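Two-way synchronization can be sketched as an exchange of each side's changes since the last sync. The sketch below is illustrative, not a protocol implementation: datasets are plain dictionaries, each side's "change log" is simply the set of keys it modified, and keys changed on both sides are reported as conflicts rather than resolved here.

```python
# Sketch of two-way (bidirectional) synchronization: each side sends only
# the items it changed since the last sync. Conflict handling is deliberately
# left out; only the conflicting keys are reported.

def two_way_sync(client, server, client_changes, server_changes):
    """Exchange changes in both directions; return keys changed on both sides."""
    conflicts = client_changes & server_changes
    for key in client_changes - conflicts:   # client -> server
        server[key] = client[key]
    for key in server_changes - conflicts:   # server -> client
        client[key] = server[key]
    return conflicts

client = {"alice": "555-1234", "bob": "555-0000"}
server = {"alice": "555-1234", "carol": "555-9999"}
conflicts = two_way_sync(client, server,
                         client_changes={"bob"}, server_changes={"carol"})
print(client)     # both sides now hold bob and carol
print(server)
print(conflicts)  # set(): no item was changed on both sides
```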
Figure 8.2 Mobile device synchronization.
Figure 8.3 One-way synchronization.
At the same time that the employee is modifying the emails off line, the mail server may be receiving new emails. Hence, in such scenarios, modifications that occur on both sides need to be communicated and exchanged to ensure that the datasets in both databases are in the "same state."

8.3.1.2 Slow versus Fast Synchronization

One- and two-way synchronization may be either fast or slow.

Slow Sync (Full Sync). This type is seldom used. With it, all content stored in the database is synchronized. Since the entire database is synchronized, this is used only in cases where the device and the server synchronize for the first time
Figure 8.4 Two-way synchronization.
and/or the database is unable to detect changes, which may be due to internal failure, database corruption, or some other anomaly.

Fast Sync (Delta Sync). This type is more commonly used. It exchanges only the changes that have occurred since the previous synchronization; hence the term delta synchronization. This is very useful, as it enables one to detect the changes that have occurred in the stored content and send only those changes, as opposed to sending the entire database. As an example, suppose a user has just synchronized a new contact created on his device with his corporate server. After synchronization, he realizes that the phone number he entered was incorrect, so he updates the new entry. By using the fast sync type, the next time he starts a sync he will be able to send only the entry he updated, as opposed to the entire contacts database.

8.3.2 Change Detection
Change detection is very important for content synchronization. Without change detection, the synchronization would be a slow one, where the entire contents of the database, as opposed to only the changes, are synchronized. In databases, change detection is built into the system, and there are numerous means to detect the changes that have occurred. The details of how change detection can be implemented are beyond the scope of this chapter.

8.3.3 Conflict Detection and Resolution
Conflict detection and resolution are as important for content synchronization as change detection. Conflicts occur whenever identical items residing in two different databases are changed and then synchronized. As an example, a contact created on the device is synchronized with the server. After synchronizing, the user updates the contact on the device and also updates the same contact on the server. When such situations occur, conflicts are encountered during synchronization. Hence, to resolve such situations, conflict detection and resolution are important. Conflict resolution can be done in various ways and can be based on certain rules:
. The client always wins, essentially overriding changes made on the server.
. The server always wins, essentially overriding changes made on the client.
. Create duplicates.
. The latest changes win.
. Merge the changes.
Conflicts can be resolved either by the system, on the client side or the server side, or by the user, who is presented with a dialog that reports the conflicts. According to the information presented in the dialog, the user can resolve the problem by selecting one of the rules listed above.
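The resolution rules listed above can be sketched as a single dispatch function. The rule names mirror the list; the item structure, the modification timestamps, and the merge policy (client fields override the server's) are illustrative assumptions.

```python
# Sketch of rule-based conflict resolution for an item changed on both sides.
# Item fields, timestamps, and the merge policy are illustrative assumptions.

def resolve(rule, client_item, server_item):
    if rule == "client_wins":
        return client_item
    if rule == "server_wins":
        return server_item
    if rule == "latest_wins":
        return max(client_item, server_item, key=lambda i: i["modified"])
    if rule == "duplicate":
        return [client_item, server_item]      # keep both copies
    if rule == "merge":
        return {**server_item, **client_item}  # client fields override the server's
    raise ValueError(f"unknown rule: {rule}")

client_item = {"name": "Alice", "phone": "555-1234", "modified": 200}
server_item = {"name": "Alice", "email": "alice@example.com", "modified": 100}
print(resolve("latest_wins", client_item, server_item))  # the client's newer copy
print(resolve("merge", client_item, server_item))        # union of both updates
```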
8.4 ADOPTION OF SYNCHRONIZATION FOR MOBILE DEVICES
Although the basic principles of synchronization are also applicable to mobile device synchronization, some considerations are necessary when synchronizing mobile devices. It is well known that mobile devices have resource constraints: limited processing power, limited battery life, limited processing and storage memory, and a modest data rate [although higher data rates will soon be available with 3G and EDGE (enhanced data rates for GSM evolution)]. Also, the cost of airtime that mobile users have to pay is a crucial factor that needs to be considered when designing applications for mobile devices. As with any other mobile application, synchronization has to be adapted for the mobile environment by addressing the aforementioned constraints posed by mobile devices. This section explains how the basic principles of synchronization are applied to mobile devices and discusses some of the special requirements for mobile devices in synchronization applications.

8.4.1 Synchronization Scenarios for Mobile Devices
The synchronization scenario depends on which access medium, or "bearer," is used to connect the mobile devices with the servers. Typically, synchronization of mobile devices has been based on local connectivity media, such as cable or infrared, as defined by the IrMC specification. But with local synchronization the possibilities are limited, as the mobile device is able to synchronize only with a local server, which restricts the mobility of the user. Instead of such a restrictive synchronization scenario, over-the-air or remote synchronization allows the user to synchronize from anywhere, at any time, providing a true mobile synchronization solution and allowing the user to initiate sync whenever needed.

Local Synchronization. This is the most common and widely used synchronization scenario, as shown in Figure 8.5. There are many proprietary solutions available that synchronize mobile devices, such as PDAs and mobile phones, connected via serial or universal serial bus cable, infrared, or Bluetooth with applications running on desktop computers. The synchronization application running on the desktop computer acts as a local synchronization server that allows mobile devices to synchronize with applications running on that computer. The most commonly used applications are Lotus Notes and Microsoft Outlook, which store content from mobile devices locally.

Over-the-Air Sync (Remote Sync). This scenario (see Fig. 8.6) allows mobile users to initiate synchronization any time, anywhere. This gives users great flexibility to learn of changes that have been made at the server and also to communicate changes made by the user on the device at any time. For over-the-air synchronization, mobile devices, such as PDAs or mobile phones, use well-known wireless network access media such as GSM/GPRS or wireless LAN, with TCP/IP or WAP [9] protocols. There are both proprietary and open-standard-based over-the-air synchronization solutions available.

Development of the open-standard-based synchronization solution
Figure 8.5 Local synchronization based on short-range bearers.
started with the SyncML Initiative, which joined the Open Mobile Alliance in November 2002. Standardization of the open synchronization solution is still being carried out in the OMA [6] Data Synchronization Working Group. (OMA standardization is discussed in further detail in Section 8.5.1.)

8.4.2 Adhering to Mobile Device Constraints
Mobile devices are known to have limited resources, so these constraints should be taken into account when designing any application for mobile devices. Although the fundamental principles of synchronization are applicable even to mobile devices, certain exceptions must be made in order to address the constraints of mobile devices. As with most other mobile applications, synchronization can be based on a client–server architecture, where the server handles all the major functionality. This is true not only for mobile applications but for desktop applications as well. This allows for great savings in memory by implementing a simple synchronization engine for mobile devices that will satisfy the needs of simple applications such as personal information management (PIM), which includes calendar and contact synchronization. But for complex applications, such as relational database applications, a more complex synchronization engine would be required on mobile devices, which will certainly require more memory. Mobile devices may typically synchronize with more than one server. As an example, a mobile user may synchronize a single calendar database with the corporate server and with the portal server for maintaining business appointments and
Figure 8.6 Remote synchronization based on over-the-air bearers.
family appointments that he shares with his family members separately. In such cases, logging changes for each data store per server may require considerable storage memory or static memory. This change log is needed to record which items have changed in the database between synchronization sessions. During the synchronization session the change log is consulted to determine what has changed, and these changes are communicated accordingly. The actual implementation details of the change log are beyond the scope of this chapter. It is worth noting that the change log can grow significantly and could consume a considerable amount of static memory storage. One way to reduce the storage for the change log is to define a maximum limit, but this would restrict the number of changes that can be made between sessions; this is a tradeoff that one may have to accept. Because mobile devices have limited processing capabilities and limited processing and storage memory, complex operations, such as conflict detection and resolution, will inevitably overload mobile devices. Such operations must be carried out on the server side. By employing the client–server architecture, it is easy to move such operations to the server side without losing the functionality of synchronization. An important aspect of synchronization is the use of a local unique identifier and a global unique identifier. Identifiers (IDs) are necessary to address each item uniquely in a database. The database in a mobile device is typically limited in terms of the size of the IDs used for the items that it stores. It is not possible to match the ID length used by the server, which is typically much longer than that used by the client. Hence, items that are created on the server cannot be synchronized with the same ID as used on the server, since the length of the ID will not fit the size defined by the client.
For this purpose, the client creates an ID of its own when a new item created on the server is synchronized to the client. For all subsequent operations on this item, such as changes or deletes, the server must address the item with the ID assigned by the client, not with the server's own ID. To establish this, a special operation called mapping is carried out after synchronization. In mapping, the client sends map information that pairs the client's ID, the local unique identifier, with the server's temporary ID. The server must maintain this mapping information for the entire life of the item. More details on mapping can be found in the next section.
8.5 SYNCHRONIZATION STANDARD
There is a proliferation of different, proprietary, noninteroperable data synchronization protocols for mobile devices. Each of these protocols is available for only selected transports, implemented on a selected subset of devices, and able to access only a small set of networked data. The absence of a single synchronization standard poses many problems for end users, device manufacturers, application developers, and service providers. To address this problem, the SyncML Initiative was formed to develop and promote a single, common data synchronization protocol that could be used industrywide. Driving this initiative were Ericsson, IBM, Lotus, Motorola, Nokia, Palm Inc.,
Psion, and Starfish Software. The first specification version was released in December 2000 with a supporting reference implementation. In June 2002 the Open Mobile Alliance (OMA) [6] was created through the consolidation of the supporters of the open mobile architecture initiative and the WAP Forum [9]. The SyncML Initiative, the Location Interoperability Forum, MMS-IOP, and Wireless Village then joined OMA [6] in December 2002. Currently, the data synchronization standard is developed further in the OMA DS Working Group.

The synchronization standard provides an open, interoperable synchronization solution for a wide spectrum of mobile devices, such as low-end mobile phones, "smart" phones, and PDAs. All these mobile devices have resource constraints by definition, as seen above, but the level of constraint differs among them. Within this spectrum, low-end mobile phones typically have the tightest computing constraints, such as processing power and memory, whereas PDAs have the loosest. Along with resource constraints, the wireless access media used by mobile devices, such as GSM or WLAN [8], have high latency, a high error rate, and low data rates. Hence, these limitations were considered closely during the design of the standard. To further understand the technical aspects of data synchronization for mobile devices, it is necessary to comprehend the various specifications of OMA [6] data synchronization (formerly known as SyncML Data Synchronization).

8.5.1 OMA Data Synchronization Overview
For synchronization between two applications, the changes that have been made by both applications have to be communicated. It may also be necessary to reconcile conflicting changes that occur when changes are made concurrently. Hence, synchronization requires representing the changes in a format and structure understood by both sides and exchanging those changes according to rules with which both sides comply. The synchronization standard addresses this requirement with two fundamental specifications that form the basis of data synchronization: OMA representation and the OMA data synchronization protocol (Fig. 8.7). Both specifications are designed to account for all the mobile device requirements discussed in earlier sections. Additionally, as with other mobile or Internet application standards, synchronization has been designed to be impartial with respect to both bearer and content. Bearer neutrality is necessary so that the synchronization standard can be implemented on top of any data bearer, such as HTTP [3] for Internet/intranet access, WSP for wireless access, and OBEX for local connectivity over Bluetooth [7], infrared [4,5], or cable. Similarly, the standard must allow for the synchronization of any content, without arbitrary restrictions to a particular set of content types.

8.5.1.1 OMA Representation This is one of the two specifications that form the core of the synchronization standard. Representation defines a logical entity called a "package," which
Figure 8.7 OMA data synchronization framework: applications (calendar, contacts, email) use the representation and synchronization protocols over HTTP (Internet/intranet), WSP (WAP), or OBEX (IrDA, USB, Bluetooth).
encapsulates one or more synchronization messages, as shown in Figure 8.8. Each message is based on the eXtensible Markup Language (XML) [10], which defines the structure and format of the message; this provides the structural form for each message. The package is broken into a number of small messages to address the constraints of mobile devices. For example, certain wireless transports, such as WSP, do not allow segmentation of large objects and have a small protocol data unit size, so large data objects must be broken down into smaller message segments that comply with the size limitations defined by the underlying transport protocol.

Each message consists of a header and a body, which provide the information that synchronization applications need during the synchronization session. The header provides routing information, which consists of the server and client addresses, authentication credentials (username, password pair), and session information. The body encapsulates most of the important elements needed for synchronization. It defines all the synchronization commands, such as Add, Delete, Replace, and Sync, that are required to perform the various synchronization operations; alerts the database for synchronization; provides status information for all the operations; conveys mapping information from client to server; and, most importantly, conveys the data object, or payload, itself, as shown in Figure 8.9 [11].

Figure 8.8 Data synchronization packaging model.

Representation provides the syntax for synchronization applications by means of a document type definition (DTD). Applications must represent the changes that have occurred on the data as defined by the representation DTD; Figure 8.9 shows an example.

Figure 8.9 XML snippet of a synchronization message with capabilities [11, Sec. 9.11].
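The overall shape of such a message can be illustrated with Python's standard XML library. The element names (SyncML, SyncHdr, SyncBody, VerDTD, VerProto, SessionID, MsgID, Target, Source, LocURI) follow the SyncML 1.1 representation DTD, but the values are invented and the snippet is only a structural sketch, not a conforming implementation:

```python
import xml.etree.ElementTree as ET

def build_sync_message(session_id, msg_id, server_uri, client_uri):
    """Build the skeleton of a SyncML message: a header carrying routing and
    session information, and a body that will carry the sync commands."""
    root = ET.Element("SyncML")

    hdr = ET.SubElement(root, "SyncHdr")
    ET.SubElement(hdr, "VerDTD").text = "1.1"
    ET.SubElement(hdr, "VerProto").text = "SyncML/1.1"
    ET.SubElement(hdr, "SessionID").text = session_id
    ET.SubElement(hdr, "MsgID").text = msg_id
    target = ET.SubElement(hdr, "Target")        # where the message is going
    ET.SubElement(target, "LocURI").text = server_uri
    source = ET.SubElement(hdr, "Source")        # who is sending it
    ET.SubElement(source, "LocURI").text = client_uri

    ET.SubElement(root, "SyncBody")              # commands (Add, Replace, ...) go here
    return root

# Invented example addresses, in the spirit of Figure 8.9.
msg = build_sync_message("1", "1", "http://sync.example.org/sync",
                         "IMEI:004400112233445")
assert msg.find("SyncHdr/VerProto").text == "SyncML/1.1"
assert msg.find("SyncBody") is not None
```

Serializing the tree with `ET.tostring(msg)` yields the XML text that would be carried over the chosen bearer.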
8.5.1.2 OMA Data Synchronization Protocol Whereas representation defines the packaging model, the structure of the messages, and their syntax, the protocol specification sets the rules that both client and server must follow during the synchronization process. The protocol follows the fundamentals of synchronization discussed in Section 8.3 while factoring in the mobile device synchronization requirements presented in Section 8.4. It makes use of the representation DTD for creating synchronization messages but sets the rules for communication between the client and the server. The message sequence chart (MSC) shown in Figure 8.10 best explains this; it clarifies the different states maintained internally by the client and the server, both of which store state information locally. The protocol specification splits the entire synchronization process into three phases: (1) initialization; (2) exchanging changes and resolving conflicts, if any (essentially, synchronization); and (3) mapping. Each of these phases is explained in detail later in this section.

As seen in Section 8.4, mobile devices have certain constraints, which must be taken seriously whenever applications are designed for them. Although the protocol specification follows the basic principles of data synchronization, it is not feasible for low-end, or even high-end, mobile devices to support all the requirements of data synchronization as such. To provide a solution that respects all the fundamental principles and can be implemented by a wide spectrum of devices, the protocol specification separates the functionalities supported by client and server by defining distinct roles for the synchronization client and server. This approach allows for wider adoption of the protocol and certainly eases the implementation effort.
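The three phases can be condensed into a small driver with stub client and server objects. Everything here is illustrative (the class and method names are invented and no wire format is implied), but the ordering mirrors the protocol: initialization first, then the client's changes, then the server's changes, then the map:

```python
class StubClient:
    def propose_sync_type(self):
        return "fast-two-way"

    def collect_changes(self, sync_type):
        return {"c1": "replace"}                 # changes made since last session

    def apply(self, server_changes):
        # Server-added items arrive under temporary IDs; assign local LUIDs.
        return {tmp: f"luid-{i}" for i, tmp in enumerate(server_changes, start=1)}


class StubServer:
    def __init__(self):
        self.maps = {}

    def negotiate(self, proposed_type):
        return proposed_type                     # accept the client's proposal here

    def apply_and_analyze(self, client_changes):
        # A real server detects/resolves conflicts here; we answer with one add.
        return {"tmp1": "new item for the client"}

    def store_maps(self, maps):
        self.maps.update(maps)                   # kept for the life of each item


def run_sync_session(client, server):
    """Drive one synchronization session through its three phases."""
    # Phase 1: initialization (handshake; the server may NACK the sync type).
    agreed = server.negotiate(client.propose_sync_type())
    # Phase 2: synchronization (client sends first; server analyzes and replies).
    new_items = client.apply(server.apply_and_analyze(client.collect_changes(agreed)))
    # Phase 3: mapping (client reports the LUIDs it assigned to server additions).
    server.store_maps(new_items)
    return agreed


client, server = StubClient(), StubServer()
assert run_sync_session(client, server) == "fast-two-way"
assert server.maps == {"tmp1": "luid-1"}
```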
Figure 8.10 Message sequence chart between sync client and server.
The distinction between synchronization client and server allows a simple, lightweight implementation on the client and moves all the complex operations to the server. Although the client implementation is simple, it still provides a fully functional synchronization solution for low-end as well as high-end mobile devices. The protocol specification requires the client to send its modifications first and to be able to receive the server's responses to those modifications. The client must also be able to handle any changes made on the server side. The requirement that the client send its modifications first addresses the mobile device constraints by moving the synchronization analysis to the server side. Synchronization analysis is the process of comparing the changes made on the client side with those made on the server for the same item. Any conflicts that arise when the same item is modified on both client and server are detected, and possibly resolved, during the synchronization analysis. This is a complex operation, and in most cases clients, with their resource constraints, simply cannot handle it.

Different Phases in a Synchronization Session

Initialization.
This is a handshaking phase in which client and server exchange information and negotiate the conditions that will govern the rest of the synchronization session. Unless the handshake completes successfully, the other phases do not occur; a failure in the handshaking process may result in disconnection of the session. The information exchanged during initialization includes session information, authentication credentials and the type of authentication mechanism (username, password pair), alert information for synchronization, and device capability information. The alert and the device capabilities are the most important items in the initialization phase.

The alert conveys the synchronization type. As seen in Section 8.3.1, the basic types are two-way and one-way sync. Additionally, there are slow and fast variants, used with both one-way and two-way types. Typically, when a new mobile device initiates synchronization with a server for the first time, the type used is a slow, two-way sync, in which all contents of the client's database are exchanged with those of the server's database and vice versa. During initialization, if the alerted synchronization type is not supported or not acceptable for some reason, a NACK, a negative acknowledgment in the form of a status, is sent, and the session may be terminated. As an example, suppose that a client and a server have synchronized earlier and that the client has since made certain changes; the client therefore requests synchronization by alerting for a fast, two-way sync, to communicate only the changes to the server rather than the entire database. If the server is unable to accept this for some reason, it can always send a NACK and compel the client to initiate a slow, two-way synchronization.

The support levels of content types on client and server are not the same.
For instance, the client may not be able to support all the fields defined by a content type because of memory constraints. Also, in certain cases, not all the fields are
commonly used by the end user. A good example is the versit vCard [1] object, which specifies many fields that can be used when creating a contact in a phonebook; clients seldom implement all of them. Typically, the client supports a subset of what the server supports for a content type. Hence, in such situations it is imperative for the server, and in some cases also the client, to know exactly which content capabilities the other side supports. With this knowledge, the server knows up front, before sending an item, whether the client will be able to handle it.

For example, the versit vCard [1] allows contacts with or without photographs. Let's say that the server supports a photo field for contacts but the client does not. The user creates a contact on the server and adds a picture. When synchronization is initiated, the client stores the contact sent by the server without the photo, since it does not support that field. Later, the same contact is synchronized back to the server after the user modifies some details. The server now deletes the photo it stored for this contact, because the client did not send the photo as part of the contact. This behavior is not acceptable, as the client never issued an explicit delete for the photo field. One way to avoid this problem is to refrain from sending anything that is not supported, which requires knowing the device capabilities. For this purpose, OMA data synchronization specifies device information that allows the content capabilities supported by both client and server to be expressed. During the initialization phase the capabilities are exchanged by relaying this device information, as shown in Figure 8.9.

Synchronization. This phase performs all of the synchronization work.
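Returning to the photo-field scenario above: once capabilities have been exchanged, the fix amounts to filtering every outgoing item against the fields the peer announced. A minimal sketch (the field names follow vCard informally; the function is hypothetical):

```python
def filter_item_for_client(item, client_supported_fields):
    """Drop the fields the client cannot store, so the client never silently
    loses data it was sent and the server never misreads an omission as a delete."""
    return {field: value for field, value in item.items()
            if field in client_supported_fields}

contact = {"N": "Smith;John", "TEL": "+1-555-0100", "PHOTO": b"jpeg-bytes"}
# Capabilities the client announced in its device information (cf. Fig. 8.9).
client_fields = {"N", "TEL"}

sent = filter_item_for_client(contact, client_fields)
assert "PHOTO" not in sent            # never sent, so never "deleted" on sync-back
assert sent == {"N": "Smith;John", "TEL": "+1-555-0100"}
```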
The client and the server synchronize according to the outcome of the negotiations during the initialization phase; if both agreed on a slow, two-way synchronization, then this phase simply follows the agreement and sends the entire database content. Synchronizing only the changes, which is usually the fast synchronization type, requires support for change detection. The details of how change detection is implemented are beyond the scope of this chapter, but without a mechanism for it, client and server can perform only slow sync, which is not an efficient solution for mobile devices on networks with low data rates and high airtime costs, such as GSM/GPRS. It is during this phase that synchronization analysis is performed. As noted, this is a complex operation and is usually supported by the server. Synchronization analysis consists of conflict detection and resolution: whenever both client and server change the same data item, a conflict occurs, and the server resolves it by following the rules outlined in Section 8.3.3 [11].

Mapping. Mapping is required because of the constraints imposed on the length of the IDs used by mobile devices. IDs are unique identifiers used in a database to distinguish individual items. Typically, low-end devices use a unique identifier of 4 to 8 bytes. Servers, on the other hand, use much longer unique identifiers.
Figure 8.11 XML snippet of server sending synchronization message to client [11, Sec. 10.2.1].
Figure 8.12 XML snippet of client sending mapping message to server [11, Sec. 10.3.1].
Figure 8.13 Map table maintained by server (adapted from Ref. 11).
When a server adds an item, it cannot use the unique identifier from its own database; rather, it must add the item using a temporary identifier, as shown in the "Add" command in Figure 8.11. Even the temporary identifier cannot be long, as the client has to buffer it until the map operation is executed, so the server must comply with the temporary ID length acceptable to the mobile device. The mobile device accepts the item and adds it to its database. In doing so, the client's database generates a new ID for the added item: the client-side local unique identifier (LUID). This ID must be communicated to the server, since the server must use the client's ID to communicate all changes that may occur on the server side in the future. The mapping operation serves this purpose: the client sends a message pairing the server's temporary ID with the client's ID (LUID). On receiving this message, the server records the pairing in a mapping table that maps the client's ID to the server's ID (LUID mapped to GUID). The server must maintain the map for the entire life of the item. This is illustrated in Figures 8.12 and 8.13 (adapted from Ref. 11, Section 7.3).
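The flow just described can be condensed into a toy server-side map table (illustrative names only; in the protocol the client reports its LUIDs with the Map command):

```python
import itertools

class ServerMapTable:
    """Server-side table pairing its long GUIDs with the client's short LUIDs."""

    def __init__(self):
        self.guid_to_luid = {}
        self._tmp_ids = itertools.count(1)

    def add_to_client(self, guid):
        """Send a new item using a short temporary ID the client can buffer."""
        return f"tmp{next(self._tmp_ids)}"      # short, unlike the GUID itself

    def apply_map(self, temp_id, luid, guid):
        """Client reported: 'the item you sent as temp_id is stored under luid'."""
        self.guid_to_luid[guid] = luid          # kept for the life of the item

    def luid_for(self, guid):
        """All later Replace/Delete commands must address the client's LUID."""
        return self.guid_to_luid[guid]

table = ServerMapTable()
guid = "urn:uuid:0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0"   # invented GUID
temp = table.add_to_client(guid)
assert temp == "tmp1"
table.apply_map(temp, luid="12", guid=guid)              # the map operation after sync
assert table.luid_for(guid) == "12"
```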
8.6 SUMMARY
Content synchronization has been used widely for many years in the wired world, on devices with high computing power. Mobile computing devices, meanwhile, have attracted interest for various services and applications, and the mobile device has come to be regarded as indispensable, something users prefer to use for almost everything possible while on the move. Content synchronization is very important for mobile devices, especially since there are numerous
applications on a mobile device, such as email and the PIM system, that users want to keep up-to-date with backend systems. Since mobile devices are considered to be in a "disconnected state" most of the time, content synchronization is the way to ensure that the applications on mobile devices and the backend systems stay up-to-date.
REFERENCES

1. vCard, Electronic Business Card, http://www.imc.org/pdi/vcard-21.doc.
2. vCalendar, Electronic Calendaring and Scheduling Exchange Format, http://www.imc.org/pdi/vcal-10.doc.
3. Hypertext Transfer Protocol, HTTP/1.1, http://www.ietf.org/rfc/rfc2616.txt.
4. IrDA Object Exchange Protocol, http://www.irda.org/standards/specifications.asp.
5. Infrared Mobile Communications, http://www.irda.org/standards/specifications.asp.
6. Open Mobile Alliance, http://www.openmobilealliance.org.
7. Bluetooth Core Specification, http://www.bluetooth.com.
8. IEEE 802.11, Wireless Local Area Networks, http://grouper.ieee.org/groups/802/11.
9. WAP Forum, Wireless Session Protocol, http://www.wapforum.org/what/technical.htm.
10. Extensible Markup Language, http://www.w3.org/XML/.
11. SyncML Data Sync Protocol, Version 1.1.2, Open Mobile Alliance (OMA), OMA-SyncML-DataSyncProtocol-V1_1_2-20030612-A.
CHAPTER 9
MULTIMEDIA STREAMING IN MOBILE WIRELESS NETWORKS

SANJEEV VERMA, Nokia Research Center, Burlington, Massachusetts
MUHAMMAD MUKARRAM BIN TARIQ, DocoMo Communication Laboratories USA, Inc., San Jose, California
TAKESHI YOSHIMURA, Multimedia Laboratories, NTT DoCoMo, Inc., Yokosuka, Kanagawa, Japan
TAO WU, Nokia Research Center, Burlington, Massachusetts
9.1 INTRODUCTION
Multimedia services, such as streaming applications, are growing in popularity with advances in compression technology, high-bandwidth storage devices, and high-speed access networks. Streaming services are generally used in applications such as multimedia information and message retrieval, video on demand, and pay TV. Portable devices, such as notebook computers, PDAs, and mobile phones, have also grown in popularity in recent years, and emerging technologies like WLAN and 3G networks now make very high-speed access to portable devices possible. For instance, emerging 3G wireless technologies provide data rates of 144 kbps for vehicular, 384 kbps for pedestrian, and 2 Mbps for indoor environments [1,2]. Hence, it is now possible to enrich the end user's experience by combining multimedia services [3,4] with mobile-specific services such as geographic positioning, user profiling, and mobile payment. One example of such a service is "mobile cinema ticketing," which uses geographic positioning and user-defined

Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu
ISBN 0-471-46618-2 Copyright © 2004 John Wiley & Sons, Inc.
preferences to offer a mobile user a selection of movies from nearby movie theatres. The user views the corresponding movie trailers through a streaming service before selecting a movie and purchasing a ticket.

Streaming services are services in which continuous video and audio data are delivered to an end user. A multimedia streaming service consists of one or more media streams. A multimedia streaming application may have both audio and video components (e.g., news reviews, movie trailers), or it may combine audio streaming with a visual presentation comprising still images and/or graphics animations, such as a corporation's quarterly earnings Webcast. These applications are generally stored at a Web-based server and streamed to clients on request. Streaming audio/video clips are large enough that their transmission time (several minutes or longer) exceeds the acceptable playback latency; hence, downloading the entire audio/video content before playback is not an option. Instead, streaming audio/video clips are played out while parts of the clips are still being received and decoded. This is the biggest advantage of a streaming service: the user is able to see video soon after downloading begins.

Figure 9.1 illustrates a general architecture for providing streaming services [5]. The multimedia content for streaming services is created from one or more media sources (videocamera, microphone, etc.). It can also be created synthetically, without any natural media source; examples of synthetically generated multimedia content are computer-generated graphics and digitally generated music. The storage space required for raw multimedia content can be huge, so the multimedia content is digitally edited and compressed in order to provide attractive multimedia retrieval services over low-speed modem connections. The edited
Figure 9.1 A general architecture designed to provide streaming services.
and compressed multimedia clips are then stored in storage devices at the server. On receiving a request from the client, the streaming server retrieves the compressed multimedia clip from storage devices and the application layer QoS module adapts the multimedia stream based on the QoS feedback at the application layer. After adaptation at the application layer, transport protocols packetize the compressed multimedia clips and send them over the Internet. The packets may suffer losses and accumulate delay jitter while traversing the Internet. To further improve the QoS, continuous media distribution services (e.g., caching) may be deployed in the Internet. The successfully delivered media packets are decompressed and decoded at the client end. Compensation or playout buffers are deployed at the terminal end to mitigate the impact of delay jitter in the Internet and to achieve seamless QoS. Clients also use media synchronization mechanisms to achieve synchronization across different media streams, for example, between audio and video streams. There are several challenges in providing streaming services in wireless environments due to some issues that are specific to these environments (see Fig. 9.2). For example, wireless terminals typically have power constraints due to battery power. Also, they have limited buffering and processing power available due to size and power constraints. In addition, wireless environments are very harsh. The characteristics of a wireless channel have a unpredictable time-varying behavior due to factors such as interference, multipath fading, and atmospheric conditions. This results in more delay jitter, more delay, and higher error rates, compared to that in wired networks. Moreover, the mobility or the movement of a mobile user from one cell to another cell introduces additional uncertainty. The movement triggers a handoff mechanism to minimize interruption to an ongoing session. 
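The advantage of playing while downloading, noted above, is simple arithmetic; the sketch below compares startup delays for the two approaches (the clip length, bit rates, and buffer size are invented for illustration):

```python
def download_startup_delay(clip_bytes, bandwidth_bps):
    """Seconds before playback can begin if the whole clip is fetched first."""
    return clip_bytes * 8 / bandwidth_bps

def streaming_startup_delay(initial_buffer_s, bandwidth_bps, playback_rate_bps):
    """Seconds before playback can begin when playing while downloading.

    Only the initial buffer must arrive; this assumes the network can
    sustain the playback rate (bandwidth_bps >= playback_rate_bps).
    """
    assert bandwidth_bps >= playback_rate_bps
    return initial_buffer_s * playback_rate_bps / bandwidth_bps

# A 5-minute clip encoded at 384 kbps, fetched over a 384 kbps link.
clip_bytes = 300 * 384_000 // 8
assert download_startup_delay(clip_bytes, 384_000) == 300.0   # wait out the whole clip
# Streaming with a 5-second initial buffer starts playback after ~5 seconds.
assert streaming_startup_delay(5, 384_000, 384_000) == 5.0
```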
The wireless channel characteristics may be entirely different in the new cell after handoff. The access point (typically a basestation) connecting the mobile host to the wired network also changes after the handoff. This results in the establishment of an entirely new route in the wired network, and the new route in the fixed network may have different path characteristics. This problem becomes even more severe as wireless networks are
Figure 9.2 Constraints in wireless environments: mobile terminals have limited resources (power constraints, limited storage, limited processing) and face harsh wireless environments (high error rate, large and variable delay, expensive spectrum).
being implemented using smaller cell sizes (microcells) to allow higher system capacity. Microcell implementations result in rapid handoff rates, causing even wider variation in path characteristics.

These issues have implications for providing streaming services in mobile environments. A streaming architecture for wireless/mobile environments should ensure minimum processing at the mobile terminal end. For instance, the typical approach of performing QoS adaptation at the application layer may not be suitable in wireless environments: such adaptation involves a great deal of end-to-end signaling, which can eat away precious resources at the terminal end, and it is very difficult for mobile terminals with very limited processing and buffering capability to adapt at the application layer. The wireless network should instead have built-in network-wide mechanisms that minimize the resource and processing requirements placed on the mobile terminals. The overall design goal of a wireless access architecture should be to "make networks friendly to applications" rather than "make applications friendly to networks."

In the remainder of this chapter, we describe the different components and protocols that constitute the streaming architecture. First, we go over the QoS issues involved in supporting streaming services in general. We then give an overview of the various codecs and media types that form an important component of a multimedia streaming architecture. Next, we describe a general architecture for implementing streaming services in mobile environments, first reviewing the architectural components that support these services in wireless/mobile environments. In subsequent sections, we give an overview of the key protocols and languages used for streaming multimedia delivery, including their operation and example usage.
We then describe the packet-switched streaming service architecture developed by 3GPP (referred to as 3GPP-PSS), since it is the most mature standardization activity in this field; most likely, the 3GPP2 architectural solution will be along similar lines. Next, we discuss research issues and related work in providing multimedia services in mobile and wireless environments. Finally, we summarize and look into future trends in supporting multimedia services in broadband wireless access networks.
9.2 QoS ISSUES FOR STREAMING APPLICATIONS
Streaming applications are real-time, noninteractive applications involving one-way delivery of streaming data from the server to the client. Because of their real-time nature, these applications typically have bandwidth, delay jitter, and loss requirements. We first discuss the QoS parameters that are important for streaming applications and then the QoS control mechanisms at the application and lower layers.

Delay jitter [6] is particularly important for these applications. The delay jitter bound for a session is calculated as the difference between the largest and smallest delays incurred by packets belonging to the session. A client (receiver) should choose playback instants so that, when it is ready to output the information contained in a packet, the packet has already arrived. If the delay jitter over a
Figure 9.3 Client-side buffering: playout delay compensates for network-induced delay jitter.
network connection is bounded, the receiver can eliminate the delay jitter incurred by packets while traversing the network by providing a large enough playout, or compensation, buffer (see Fig. 9.3). The buffered packets are then scheduled for playout at the rate at which they were generated at the sender; packets that arrive earlier than their scheduled playout time wait in the playout buffer. Thus, the larger the delay jitter bound, the larger the playout buffer required at the receiver to maintain constant quality. For a given delay jitter bound, the required playout buffer size is the product of the delay jitter bound and the playback rate. Figure 9.4 illustrates the removal of delay jitter at a client.
Figure 9.4 Delay jitter removal at the client end.
Although the output rate of the server could vary with time, for simplicity we assume that the server generates packets at a constant rate, with equal spacing every I seconds. The receiver delays the first packet by the delay jitter bound J and then plays out packets with the same spacing with which they were generated. Suppose that the first packet arrives at the receiver d1 seconds after transmission and is further delayed in the playout buffer by an amount equal to the delay jitter bound J. The kth packet is generated after B = kI seconds, and this packet will incur a delay between dk seconds (the fixed delay, mainly propagation delay) and (dk + J) seconds. Since the client plays back the packets with the same spacing as when they were generated, the kth packet will be scheduled for playout at (d1 + J + B) seconds. Since d1 ≥ dk, the latest possible arrival of the kth packet, at (dk + J + B) seconds, is guaranteed to be before the scheduled time. Thus, by delaying packets in the playout buffer by the delay jitter bound, the receiver can eliminate jitter in the arrival stream and guarantee that a packet has already arrived by the time the client is ready to play it. Note that the playout buffer is useful only to absorb short-term delay variations. The more data initially buffered, the wider the variations that can be absorbed, but the higher the startup playback latency experienced at the client end. The maximum allowable buffering is determined by the acceptable playback latency.

Another important QoS parameter for a streaming application is the error rate. Although streaming applications can tolerate some loss, an error rate beyond a threshold can degrade the quality of the delivered streaming data significantly. To maintain reasonably good quality in the playback stream, a proper error control mechanism is needed to recover packets before their scheduled playback time. The well-known techniques to minimize error for streaming traffic are FEC, interleaving, and redundant transmissions.
In addition, the lost packets can be recovered through limited retransmissions. This necessitates buffering at the client end to allow for retransmissions. Now we look into the specific QoS control mechanisms at the application and lower layers to achieve the QoS needs of multimedia streaming applications.
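Before moving on, the playout-buffer argument above can be checked numerically. The sketch below schedules packet k at time d1 + J + kI and confirms that no packet arrives after its playout instant; all quantities are in seconds and the numbers are invented:

```python
def playout_schedule(first_arrival, J, I, num_packets):
    """Packet k (k = 0, 1, ...) is played out at first_arrival + J + k*I."""
    return [first_arrival + J + k * I for k in range(num_packets)]

# The sender emits a packet every I seconds; each packet's network delay
# lies within [d, d + J], where J is the delay jitter bound.
I, J, d = 0.02, 0.10, 0.05
send_times = [k * I for k in range(5)]
network_delays = [0.05, 0.13, 0.07, 0.15, 0.05]      # each within [d, d + J]
arrivals = [s + delay for s, delay in zip(send_times, network_delays)]

schedule = playout_schedule(arrivals[0], J, I, num_packets=5)
# Every packet has arrived by its playout instant: the jitter is absorbed.
assert all(a <= p + 1e-9 for a, p in zip(arrivals, schedule))

# Required playout buffer = delay jitter bound x playback rate (in bits here).
playback_rate_bps = 64_000
assert abs(J * playback_rate_bps - 6400) < 1e-6
```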
9.2.1 Application Layer QoS Control
The goal of application layer QoS control is to adapt at the application layer in order to provide acceptable-quality streaming service to the end user in the presence of packet loss and congestion in the network. We note here that the Internet in its current form is a best-effort network and does not provide network-wide QoS support: the available bandwidth is not known in advance and varies with time, and packets may suffer variable delay and arrive out of order at the client end. Clients need to adapt at the application layer in order to receive good-quality streaming service. Application layer QoS control techniques include end-to-end congestion and error control. These techniques are employed by the end systems and do not assume any support from the network.
9.2 QoS ISSUES FOR STREAMING APPLICATIONS
9.2.1.1 Congestion Control and Quality Adaptation  The Internet in its rudimentary form provides a transport network that delivers packets from one point to another. It provides a shared environment, and its stability depends on the end systems implementing appropriate congestion control algorithms. End-to-end congestion control helps to reduce packet loss and delay in the network. Unfortunately, streaming applications cannot readily implement TCP-style end-to-end congestion control, since stored multimedia typically has an intrinsic transmission rate. Streaming applications are rate-based: they transmit data at a near-constant rate or loosely adjust their transmission rate on long timescales, so the rapid rate fluctuations required of a well-behaved flow are not compatible with their nature. For streaming applications, congestion control takes the form of rate control, which attempts to minimize the possibility of congestion by matching the rate of the streaming media to the available network bandwidth. A vast majority of Internet applications implement TCP-based congestion control, which uses the additive increase, multiplicative decrease (AIMD) algorithm. Under this algorithm, the transmission rate is increased linearly until a packet loss signals congestion, at which point a multiplicative decrease is performed. TCP as it is, however, is not appropriate for delay-sensitive applications such as streaming. To ensure fairness and efficient utilization of network resources, rate control algorithms for streaming applications should be "TCP-friendly" [7-9]. This means that a streaming application sharing the same path with a TCP flow should obtain the same average throughput during a session. A number of model-based TCP-friendly rate control mechanisms [10] have been proposed for streaming applications. These mechanisms are based on mathematical models that relate the throughput of a typical TCP connection to the network parameters [7]:
l¼
1:22 MTU pffiffiffiffi RTT p
(9:1)
where

λ = throughput of the TCP connection
MTU = maximum transmission unit, the maximum packet size used by the connection
RTT = roundtrip time for the connection
p = packet loss rate experienced by the connection

Under the model-based approach, the streaming server uses Equation (9.1) to determine the sending rate of the streamed media so that it behaves in a TCP-friendly manner. The source basically regulates the rate of the streamed media according to feedback information from the network. This can be used in both unicast and multicast scenarios. However, a source-based rate control scheme is not suitable in heterogeneous network environments, where receivers have heterogeneous network capacity and processing power. Receiver-based rate control [11,12] has been found to be a better rate control mechanism in heterogeneous network environments. Under this mechanism, receivers
regulate the receiving rate of streaming media by adding or dropping channels, without any rate regulation from the source end. This is targeted toward scenarios where the source multicasts layered video with several layers. The basic scheme works as follows:

1. When no congestion is detected, a receiver joins (adds) a layer or channel, which increases its receiving rate. If the addition of a channel does not cause any congestion, the join experiment is deemed successful. Otherwise, the receiver drops the added layer or channel.
2. If congestion is detected, the receiver drops the lowest-priority layer or channel (an enhancement channel).

Alternatively, an architecture may use both source- and receiver-based control mechanisms [13], in which receivers regulate the receiving rate of streaming media by adding or dropping channels while the sender also adjusts the transmission rate of each channel according to feedback from the receivers. One of the main challenges in delivering streaming media to a client is adjusting to variations in network bandwidth while delivering acceptable-quality streaming media to the receiver. As discussed before, short-term variations in bandwidth can be handled by providing a playout or compensation buffer at the receiver. When the available bandwidth exceeds the playback rate, the spare data are stored in the playout buffer; when the available bandwidth is less than that required to maintain constant quality, the deficit is supplied from the spare data in the playout buffer (see Fig. 9.5). However, the bandwidth variations for a long-lived session can be large and random. This may cause the client's buffer to either underflow or overflow. Buffer underflow is particularly undesirable since it causes an interruption of service at the client's end. Rate control mechanisms
Figure 9.5 Short-term quality adaptation at the client. (The figure plots bandwidth against time: in the filling phase the transmission rate exceeds the playback rate and spare data are stored in the playout buffer; in the draining phase the deficit is supplied from the playout buffer as the available bandwidth from the network drops below the playback rate.)
discussed in the preceding paragraphs are one way to handle quality adaptation to long-term variations in network bandwidth. Alternative mechanisms are adaptive encoding and switching between multiple encoded versions. Under the adaptive encoding mechanism, the server adjusts the resolution of encoding by requantizing based on network feedback. However, this task is very CPU-intensive and does not scale to a large number of clients. Also, once the streaming data are compressed and stored, encoders cannot change the output rate over a wide range. In an alternative scheme, a server maintains several versions of each media stream, each with a different quality. As the available bandwidth in the network changes, the server dynamically switches between low- and high-quality media streams as appropriate. Hence, quality adaptation to short-term variations in bandwidth is achieved through the playout/compensation buffer at the client end, while quality adaptation to long-term, wide variations in bandwidth is achieved through appropriate rate control mechanisms at both the client and server ends.
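The rate control ideas above can be sketched in two small functions, one computing the model-based TCP-friendly rate of Equation (9.1) and one taking a single join/drop step of receiver-driven layered control (function names and parameters are ours; real implementations smooth their measurements and time their join experiments):

```python
import math

def tcp_friendly_rate(mtu_bytes, rtt_s, loss_p):
    """Model-based TCP-friendly throughput, Equation (9.1):
    lambda = 1.22 * MTU / (RTT * sqrt(p)).  Returns bytes per second
    when MTU is in bytes and RTT in seconds."""
    return 1.22 * mtu_bytes / (rtt_s * math.sqrt(loss_p))

def adjust_layers(current_layers, congested, max_layers):
    """One step of receiver-driven layered rate control: drop the top
    enhancement layer on congestion (never below the base layer),
    otherwise attempt a join experiment by adding one layer."""
    if congested:
        return max(1, current_layers - 1)
    return min(max_layers, current_layers + 1)
```

For example, with a 1500-byte MTU, a 100 ms roundtrip time, and 1% loss, Equation (9.1) allows roughly 183 kB/s; a receiver subscribed to three layers that detects congestion would fall back to two.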
9.2.1.2 Error Control  As previously mentioned, streaming media can tolerate errors as long as the error rate remains within an acceptable limit. This is particularly important in wireless environments, which have high error rates. Moreover, errors tend to happen in bursts in these environments. Well-known techniques to minimize error for streaming traffic are FEC (forward error correction), error-resilient encoding, error concealment, and retransmissions. The FEC technique adds redundant information to the original packet in order to recover the packet in the presence of errors. Error-resilient encoding is a preventive technique that enhances the robustness of streaming media in the presence of packet loss. The well-known error-resilient encoding schemes are resynchronization marking, data partitioning, and data recovery. These are particularly effective in wireless environments. Another promising error-resilient encoding scheme is multiple description coding (MDC) [14], where raw video data are encoded into a number of streams (or descriptions), each providing an acceptable quality. If a client gets only one description, it should still be able to reconstruct video with reasonably good quality; the receiver can construct better-quality video if it gets more than one description. Error concealment techniques, on the other hand, adopt a reactive approach and aim to conceal lost packets and make the presentation less displeasing to human eyes or ears. Packet retransmission techniques [15] are considered very effective in wireless environments because of the bursty nature of wireless channels. In general, packet retransmission is not deemed very suitable for real-time applications such as video because of the retransmission delay. However, retransmission may be allowed, especially for high-priority packets, if there is sufficient delay until the scheduled playback time of the packet considered.
Clients may request the retransmissions of only those high-priority packets that have sufficient retransmission delay budget. We explain this concept as follows. For simplicity, we assume that the
server is generating packets at a constant frame rate (say, every T seconds). We introduce the following notation:

Pn = playback time of the nth packet
Tn = arrival time of the nth packet
T = interframe time
RTT = estimated roundtrip time
Td = loss detection delay
Tr = retransmission delay
Tc = current time

Thus the scheduled playback time of the kth frame is (P0 + kT), where P0 is the playback time of the 0th frame. Now, if the current time is Tc, the delay budget before the scheduled playback time of the kth packet is given by

Delay budget = (P0 + kT) - Tc    (9.2)
This delay budget should be sufficient to allow retransmission of the frame from the server, taking into account the loss detection delay, the estimated roundtrip delay, and the retransmission time. The client should send the retransmission request to the server only if the following condition is satisfied:

Td + RTT + Tr ≤ delay budget    (9.3)
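The admission check of Equations (9.2) and (9.3) can be sketched as follows (names are ours):

```python
def should_request_retransmission(p0, k, T, tc, td, rtt, tr):
    """Request a retransmission of the kth packet only if the remaining
    delay budget, Equation (9.2), covers loss detection, the roundtrip
    time, and the retransmission itself, Equation (9.3)."""
    delay_budget = (p0 + k * T) - tc        # Equation (9.2)
    return td + rtt + tr <= delay_budget    # Equation (9.3)
```

For instance, a frame due 0.4 s from now can be recovered when detection, roundtrip, and retransmission together take 0.3 s, but not when only 0.2 s of budget remains.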
The objective here is to avoid unnecessary retransmissions that will not arrive in time for display.

9.2.2
Network Layer QoS Control
The preceding discussion of QoS control at the application layer for streaming services assumes no support from the network whatsoever. QoS support at the network layer and below complements the QoS mechanisms at the application layer and reduces the signaling and processing load at higher layers. Providing QoS in the Internet is inherently a difficult problem because of its connectionless nature. However, a number of proposals have been made in the IETF to provide some sort of QoS support in the Internet. Currently there are two approaches, Integrated Services (IntServ [16]) and Differentiated Services (DiffServ [17]), standardized by the IETF to provide QoS support in the Internet. The IntServ model provides per flow QoS guarantees. A flow is defined as a stream of packets between two end nodes with the same tuple of source address, destination address, source port number, and destination port number. The IntServ model consists of four functional blocks: an end-to-end signaling protocol, call admission control at the edge, a packet classifier at the edge, and a packet scheduler at every network element in the path. RSVP [18] is the proposed signaling protocol that carries the reservation requests to all the routers in the path. Underlying
IP routing protocols determine the path, and RSVP signaling is used to reserve resources along the selected path. Given the dynamic nature of IP routing protocols, a soft-state approach is used to reserve resources. Though IntServ provides an excellent QoS model, it suffers from a scalability problem: network elements need to maintain per flow state to provide per flow QoS guarantees. This is particularly burdensome in backbone networks that support tens of thousands of flows. The DiffServ QoS model is another approach; it provides a scalable solution and does not require any signaling support. Unlike the IntServ model, it does not provide per flow QoS guarantees. Under this model, routers simply implement a suite of priority-like scheduling and buffering mechanisms and apply them to IP packets based on the DS field in the packet headers. The service that an individual flow gets is determined by the traffic characteristics of the other flows (cross-traffic) sharing the same service class. The lack of network-wide control implies that, on overload in a given service class, all flows in that class suffer a degradation of service. DiffServ tries to give soft QoS guarantees to flows by using a combination of provisioning, service-level agreements, and per hop behavior implementations. For this purpose, network-wide mechanisms are deployed in the network. The bandwidth broker (BB) is one approach to resource provisioning within a DiffServ domain. The BB is the resource manager within the DiffServ domain; it keeps track of the available resources and topology information for the domain. The BB uses the COPS (common open policy service) protocol [19] to interact with routers inside the domain.
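As an illustration of how an application can ask for DiffServ treatment (this example is ours, not from the text, and is platform-dependent; it is shown for Linux-style sockets), the DS field lives in the former IP TOS byte, so a sender can mark outgoing packets with a DSCP such as Expedited Forwarding (46):

```python
import socket

# DSCP 46 is the Expedited Forwarding per hop behavior; the DSCP occupies
# the six high-order bits of the TOS byte, hence the shift by 2.
EF_DSCP = 46

def mark_expedited_forwarding(sock):
    """Set the DS field of packets sent on this socket (a sketch; whether
    routers honor the marking depends on the domain's provisioning)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF_DSCP << 2)
```

The marking is purely advisory: as the text notes, the service actually received depends on the cross-traffic in the same class and on the domain's service-level agreements.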
9.3
STREAMING MEDIA CODECS
Standardized video coding and decoding methods, such as H.263 by ITU-T and MPEG-4 by ISO, are expected to be supported by a wide range of mobile terminals and networks. For audio-only content, MPEG-4 AAC is an appealing candidate because of its superior coding efficiency, while MP3 is also likely to be supported because of its popularity on the Internet. Some mobile terminals may also support proprietary codecs and file formats, such as those developed by Apple Computer, Microsoft, and Real Networks.

9.3.1
Video Compression
Video compression in mobile networks is usually lossy compression that exploits temporal and spatial redundancy within the video streams. Specifically, motion estimation and compensation are widely used between consecutive video frames to reduce temporal redundancy. Within a frame, block-based transforms such as the DCT (discrete cosine transform) are performed to reduce spatial redundancy. In MPEG, for example, one can encode a video frame into one of the following types of encoded pictures [20]:

. I-picture (I = intraframe). I-pictures are encoded using intraframe information only, independently of other frames. In other words, I-pictures exploit spatial redundancy only.
. P-picture (P = interframe prediction). P-pictures are encoded using the most recent I-picture or P-picture as a reference.
. B-picture (B = bidirectional prediction). B-pictures are encoded using P-pictures and/or I-pictures both in the past and in the future as references.

A video stream composed of I-pictures allows for flexible random access and high editability, but its compression ratio is relatively poor. P-pictures and B-pictures substantially improve compression efficiency at the cost of increased manipulation difficulty (random access, editability, etc.) and, in the case of B-pictures, coding delay. Hence, an MPEG video stream often consists of a sequence of pictures of all three types (e.g., I B B P B B P B B I B) to strike a good balance among different aspects of performance and usability. In addition, MPEG-4 also allows encoding of arbitrarily shaped objects in order to provide content-based interactivity [21]. The mobile environment that we consider in this chapter brings some specific requirements for video compression. For example, wireless channel errors can lead to loss of synchronization because video encoders often use variable-length coding (VLC), and forward error correction (FEC) codes are not very effective in correcting burst errors. Toward this end, error resilience and concealment techniques that minimize the effect of channel errors are important in providing graceful service degradation [22]. Furthermore, many mobile terminals have limited CPU, memory, and battery power resources; thus controlling decoder complexity is important for these terminals.
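The reference rules above can be sketched with a deliberately simplified model (the function and the simplifications are ours; real MPEG reference selection and reordering are more involved): a P-picture refers to the nearest preceding I/P anchor, and a B-picture to the nearest anchors on each side.

```python
def reference_frames(gop):
    """For each picture in a GOP string such as 'IBBP' (display order),
    list the display-order indices of its reference pictures under a
    simplified MPEG model: P refers to the most recent I/P before it,
    B to the nearest I/P on each side."""
    anchors = [i for i, t in enumerate(gop) if t in "IP"]
    refs = []
    for i, t in enumerate(gop):
        if t == "I":
            refs.append([])                       # intra-coded, no references
        elif t == "P":
            refs.append([max(a for a in anchors if a < i)])
        else:                                     # B-picture
            past = [a for a in anchors if a < i]
            future = [a for a in anchors if a > i]
            refs.append(([max(past)] if past else []) + ([min(future)] if future else []))
    return refs
```

This makes the tradeoff in the text concrete: in "IBBP" only frame 0 is independently decodable, which is why all-I streams allow easy random access while P and B frames buy compression at the cost of dependency chains.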
9.3.2
Audio Compression
Besides the speech codecs used for voice services, general audio compression is needed for high-quality audio services such as music delivery. General audio coders typically generate higher bit rates than speech coders, since they cannot rely on a specific audio production model as speech coders do with the human vocal tract model. Additionally, while a speech coder's emphasis is intelligibility, an audio codec may need to provide higher signal fidelity in streaming media services. At high bit rates, an audio codec strives to preserve the original signal waveform [23]. Higher compression can be achieved by taking advantage of the human auditory model, so that signal components to which the human ear is not sensitive can be compressed away. More details on these techniques can be found in, for example, the article by Poll [23].
9.3.3
Codecs Used in 3GPP
As an example, Table 9.1 lists the required or recommended decoders in 3GPP [24]. Figure 9.6 illustrates the general client functional components for the streaming media service in 3GPP [24].

TABLE 9.1 Codec Standards Used in 3GPP

Services            Decoder Requirements or Recommendations
Speech              AMR
Audio               MPEG-4 AAC low complexity
Synthetic audio     Scalable polyphony MIDI
Video               H.263 profile 0 level 10 mandatory; MPEG-4 visual simple profile optional
Still images        JPEG
Bitmap graphics     GIF, PNG
Vector graphics     SVG Tiny profile

Figure 9.6 Functional components of a 3GPP packet-switched streaming service (PSS) client.
9.4 END-TO-END ARCHITECTURE TO PROVIDE STREAMING SERVICES IN WIRELESS ENVIRONMENTS

Streaming multimedia is characterized by an application rendering audio, video, or other media in a continuous way while part of the media is still being transmitted to the application over a data network. Streaming multimedia differs somewhat from conversational multimedia, which involves (usually bidirectional) conversation between multiple parties. Although the type of media (media encoding) used for streaming and conversational multimedia communication may be the same, conversational multimedia usually has more stringent requirements on the end-to-end delay between the parties. Also, streaming multimedia is usually a client-server application in which the media flow in only one direction (from the server to the client), whereas conversational multimedia, such as interactive videoconferencing, is usually peer-to-peer, and the media (often) flow among all peers. In previous sections we saw how streaming media applications process media data (through decoding, error correction, buffering, and scheduling) to compensate for the delay jitter and packet loss incurred over the network and to ensure smooth rendering. Here we will discuss the important logical components needed to enable a streaming service in mobile or wireless networks and the interrelationships between these logical components required to form a complete streaming multimedia delivery system. Our main focus is packet-based streaming systems. We will start with a discussion of the logical layout and components of such a system. In subsequent sections, we will shift focus to the different protocols and languages used for streaming multimedia delivery and provide an overview of their operation and example usage.
9.4.1
Logical Streaming Multimedia Architecture
A streaming multimedia architecture (Fig. 9.7) consists of the following basic components:

1. A streaming server that sends media as a continuous stream over a data network. The server is often referred to as the origin server, to distinguish it from intermediary (proxy or caching) servers.
2. A data network that transports media from the server to the client application.
3. A client application capable of receiving, processing, and rendering a continuous stream of media in a smooth manner.
4. Protocols that are understood amongst the components and allow them to talk with each other. The protocols provide various functionalities, including allowing the client to establish a streaming multimedia session with the server, facilitating delivery of media from the server to the network and from the network to the client, understanding the content of the media stream for correct processing at the client application (encoding and packaging), and allowing interaction with the servers to manipulate the media streams.
Figure 9.7 Basic streaming media architecture. (The streaming media client sends a streaming media request across the network to the streaming media server, which returns the streaming media.)
Besides the basic components and functionalities listed above, a multimedia delivery system often contains additional components, functionalities and protocols to improve various aspects of multimedia delivery. These may include the following:
1. Proxy Servers. Proxy servers provide functionality similar to that of a server from the client's perspective. Proxy servers are often transparent to the application; however, certain streaming media protocols explicitly provide for the existence of proxies [25]. Proxy servers may be present to process client requests locally or to relay the requests to some other server (after performing some optional local processing). If the target is to serve multimedia session requests locally, then a cache of streaming media content usually accompanies the proxy server. On receiving a request, the proxy server determines whether the desired content is available in the cache; if so, the content can be delivered locally; otherwise the proxy server relays the request to some other server.

2. Caching Servers. Caching servers are local repositories of content. As in the case of static Web objects (e.g., images and Webpages), it is advantageous to store local copies of content and serve user requests locally. This not only eliminates the delays incurred due to the topological distance of the origin server from the client application but also results in traffic localization and better utilization of network resources. There are several well-known methods for populating caches with content, but they can be broadly classified in two categories:

Passive Caching. Here only the content delivered by origin or upstream servers in response to regional client application requests is stored at the cache server. Local storage in this method is often a promiscuous process, and a cache server belonging to this category is often termed simply a "cache."

Proactive Caching. Here the content is proactively stored on the cache server by some external mechanism. Often the entire content on a server, or large portions of it, may be replicated onto a caching server. In
this case the term surrogate server is sometimes used for the caching server.

3. Additional Protocols. Additional protocols may include

Protocols for capability exchange between the client application and the server, so as to allow the server to transmit appropriate data.
Protocols for QoS feedback from the client application to the server, enabling the server to adapt the transmission (if possible).
Protocols and languages for (time and space) synchronization of multiple multimedia streams.
Protocols and mechanisms for routing a given client request to the best available surrogate or caching server. We will not discuss request routing any further in this chapter; an overview of a multitude of request routing methods can be found in the report by Barbir et al. [26].

4. Miscellaneous Components. A real-life deployment of a streaming multimedia delivery system will rely on more than just the abovementioned components (see example components in Fig. 9.8). Functionalities such as authentication, authorization, and accounting (AAA) often require additional architectural support. Similarly, ensuring digital rights management (DRM) may require additional functionality from the client application and also from the server and the content creation process. In certain scenarios dedicated components may be present to provide QoS adaptation and feedback.

Standards for streaming media consist of a wide array of protocols, description languages, and media coding techniques. These standards have been developed and standardized at various standardization organizations, such as the Internet Engineering Task Force (IETF), ISO, the Third-Generation Partnership Project (3GPP), and the World Wide Web Consortium (W3C).
Figure 9.8 Some components of a typical streaming media architecture.
9.5 PROTOCOLS FOR STREAMING MEDIA DELIVERY
A streaming multimedia delivery system involves a number of protocols (see Fig. 9.9) to deal with the different aspects of streaming media. The protocols provide a common dialect through which the different components in the architecture can talk with each other. These protocols can be classified in two broad categories: (1) session control protocols and (2) media transport protocols. In most contemporary multimedia streaming setups, separate logical channels are used for session control and media transport. In some cases, however, most notably HTTP and RTSP tunneling, the same logical channel is used for both session control and media transport. Consequently, certain protocols provide functionalities that span more than one aspect of multimedia streaming, and we cannot draw a hard boundary between the categories. We will discuss these as well, but let's first see what functionalities are expected of the two main categories of protocols.

9.5.1
Protocols and Languages for Streaming Media Session Control
Streaming multimedia often has a notion of a (prolonged) association between multiple components, for example, between the client application and the server; this association is called a session.
Figure 9.9 Protocols used in a typical streaming session.
292
MULTIMEDIA STREAMING IN MOBILE WIRELESS NETWORKS
Session control and establishment usually include identifying the parties (the client and server applications) involved in the session and the agreement on, or announcement of, the different session parameters. In IP-based environments the parties are often identified by their transport layer addresses (IP address and port number). Multimedia streaming sessions often have a rich set of parameters, the most important of which are the types of encoding of the media that will later flow from the sender (server application) to the recipient (client application). These parameters allow the application at the recipient to process and render the media correctly. Different session control protocols provide varying degrees of functionality, but all of them provide the minimal functionality for basic session control: session setup, teardown, and establishment of other session parameters. Examples of session control protocols include the Real-Time Streaming Protocol (RTSP) [25], the session announcement protocol (SAP) [27], the session description protocol (SDP) [28], the session initiation protocol (SIP) [29], and ITU-T's H.323 [30]. RTSP is the dominant session control protocol for client-server streaming multimedia applications and is defined in RFC 2326 [25]. In the following section we describe RTSP in some detail and briefly overview the other protocols in this realm; our description is by no means a complete specification of RTSP.
9.5.1.1 Real-Time Streaming Protocol  RTSP is an application-level client-server protocol that provides the functionality needed to establish and control a streaming session. The session may comprise one or more streams, which are described using a presentation description (expressed in, e.g., SMIL or SDP). Once a session is established, RTSP provides methods for controlling the streams, such as VCR-like forward, rewind, pause, and record methods. RTSP primarily provides functionality to retrieve data from the server and to invite a server to a conference, and it is a transaction-oriented, request-response protocol like HTTP. However, there are a number of differences:

. RTSP servers are required to maintain state between most transactions, unlike HTTP servers, which are mostly stateless.
. RTSP defines new methods and a new protocol identifier.
. In RTSP, the server side may issue some requests as well, unlike in HTTP, where the client always makes the request and the server sends back a response.
. In RTSP, the data are carried mostly out of band, on a separate data channel such as RTP. In HTTP, the data are carried in the payload of HTTP (response) messages.
. RTSP uses absolute resource identifiers (request URIs); this eliminates the problems caused by the use of relative URLs in earlier versions of HTTP.
RTSP Messages  Figure 9.10 shows the syntax of RTSP messages. There are only two basic types of RTSP messages: request and response. All RTSP messages are text-based and use the ISO 10646 character set with UTF-8 encoding. The first line of a message identifies the message type: whether it is a request or response message and specifically what kind of request or response. For requests this first line is termed the request line; for responses, the status line. Message headers follow the request line or status line and provide additional information that is critical for the correct interpretation of the message. Finally, messages may optionally contain a message body. Please refer to Section 15 of RFC 2326 [25] for the complete syntax of RTSP.
RTSP Request Messages  The request line in each request message has a method token that indicates the task to be performed on the resource specified in the Request-URI. Eleven methods are defined in RFC 2326 [25], each designed for a different task. Following is a brief description of each of the 11 RTSP methods; please refer to Section 10 of RFC 2326 [25] for an in-depth description of the methods.
RTSP Message = Request | Response

Request      = RequestLine *( generalHeader | requestHeader | entityHeader ) CRLF [ messageBody ]
RequestLine  = Method SP Request-URI SP RTSP_Ver CRLF
Method       = "DESCRIBE" | "ANNOUNCE" | "GET_PARAMETER" | "OPTIONS" | "PAUSE" | "PLAY" | "RECORD" | "REDIRECT" | "SETUP" | "SET_PARAMETER" | "TEARDOWN" | ext-method
ext-method   = token
Request-URI  = "*" | absolute_URI
RTSP_Ver     = "RTSP" "/" 1*DIGIT "." 1*DIGIT

Response     = StatusLine *( generalHeader | responseHeader | entityHeader ) CRLF [ messageBody ]
StatusLine   = RTSP_Ver SP StatusCode SP ReasonPhrase CRLF
StatusCode   = a predefined 3-digit code or a 3-digit extension code
ReasonPhrase = *<TEXT, excluding CR, LF>

Request and Response are the only two types of RTSP messages. The Method identifies the type of request message, and the Request-URI identifies the resource in question; the StatusCode identifies the type of response message. In both cases the leading headers provide additional information for interpreting the message. Eleven methods are defined in the RTSP specification.

Figure 9.10 Syntax for RTSP messages.
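As a concrete illustration of this syntax (our own sketch; the URI and header values are hypothetical), the following helpers assemble a request and pick apart a status line along the lines of the grammar above:

```python
def build_request(method, uri, cseq, headers=None, version="RTSP/1.0"):
    """Assemble an RTSP request: request line, CSeq and other headers,
    then a blank line; every line is CRLF-terminated."""
    lines = [f"{method} {uri} {version}", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"

def parse_status_line(status_line):
    """Split an RTSP status line into (version, status code, reason phrase)."""
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason
```

For example, build_request("DESCRIBE", "rtsp://example.com/stream", 1, {"Accept": "application/sdp"}) yields a DESCRIBE request whose first line is "DESCRIBE rtsp://example.com/stream RTSP/1.0", and parse_status_line("RTSP/1.0 200 OK") splits the server's status line into its three parts.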
. DESCRIBE is a recommended method that is only sent from the client side. The server typically sends a description of the resource identified in the Request-URI; this description is contained in the message body. The session description need not always be obtained using this method: other out-of-band mechanisms may be used for a variety of reasons, including cases where the server does not support the DESCRIBE method. The session may be described using SDP or other formats.

. ANNOUNCE is an optional method that may be sent from the client or the server. When sent from the client to the server, it updates the description of the presentation or media object identified by the Request-URI. When sent from the server to the client, it updates the session description in real time.

. SETUP is a mandatory method that is only sent from the client side. The client specifies the transport mechanism to be used for a media stream (identified by the Request-URI). The SETUP method may also be used to change the transport parameters of a stream that is already playing.

. PLAY is a mandatory method that is always sent from the client to the server. It tells the server to start sending the stream that was set up using a previously (successfully) completed SETUP transaction. PLAY is a versatile method, allowing very precise control by the client, such as identifying the range of the media stream to be played (both starting point and ending point may be specified). Similarly, several PLAY requests may be issued for different segments of the stream set up by the previous SETUP message. Each request may specify both the range of the stream segment and the time at which the server should start streaming the data. These requests queue at the server, and the server generates the stream corresponding to each request at the appropriate time. Obviously, the server is not obliged to fulfill all client requests. A PLAY request is also used to resume a paused stream.
PAUSE is a recommended method that is always sent from the client to the server. This method causes the server to temporarily halt the delivery of a stream (or set of streams, depending on the Request-URI). If a PAUSE request is issued, all the queued PLAY requests related to the Request-URI are discarded by the server; a new PLAY request must be sent to resume the stream(s).

. The OPTIONS method is used by the sender to query information about the communication options available on the resource identified by the Request-URI; for example, it may be used by a client to query the types of methods a server supports for a given media stream. Although either a client or a server may send this message, implementation of this method is mandatory only for servers.

. The TEARDOWN method stops the stream delivery of the resource identified in the Request-URI. All queued requests are discarded, and all resources associated with the resource are freed. As you may have rightly guessed, the TEARDOWN message is always sent from the client to the server, and this is a mandatory method.
9.5 PROTOCOLS FOR STREAMING MEDIA DELIVERY
. The REDIRECT method informs the client that it must contact another server location. If the client wants to continue to send and/or receive the media, it must issue a TEARDOWN request for the current session and issue a new SETUP request to the server location identified in the REDIRECT request. The REDIRECT message is always sent from the server to the client but, strangely, its support is optional.

. The RECORD method initiates recording of a range of media data according to the description of the resource identified in the Request-URI. This description may be made available by a previously sent ANNOUNCE request or by some out-of-band means. A RECORD request is sent from the client to the server, and its implementation is optional for both the client and the server.

. The GET_PARAMETER method retrieves the values of the parameters of a presentation. The desired parameters are specified in the body of the request message. If no parameters are specified in the message body, the message can serve as a way to check the liveness of client and server applications (a sort of RTSP application "ping"). GET_PARAMETER is an optional method that may be used in either direction, that is, from the client to the server and from the server to the client.

. The SET_PARAMETER method is used to set the value of a parameter for a presentation or stream identified in the Request-URI. Only one parameter can be specified per request, so that in the event of failure there is no ambiguity about which parameter was not set. Like GET_PARAMETER, this method can be used in both directions, and its implementation is optional for both client and server side applications.

RTSP Response Messages The status line in each response message includes a status code specifying the recipient's response to the request. Each status code is represented by a three-digit number. Response messages fall into two broad categories: provisional responses and final responses.
All status codes of the form 1xx (i.e., between 100 and 199) are considered provisional responses; they indicate that the recipient is processing the request but the final action has not yet been taken, so the transaction is still considered pending. All other status codes indicate final responses, of which there are four subcategories. Status codes of the form 2xx indicate successful completion of the transaction. Codes of the form 3xx indicate redirection (i.e., the responder "thinks" that the request must be sent elsewhere), 4xx codes indicate client error (i.e., something is wrong with the request made by the client), and 5xx codes indicate server error (i.e., the request itself was fine, syntactically and semantically, but the server cannot process it for some other reason). Although the method token and status codes are helpful in identifying the request and the response, in most cases the recipient of a message cannot determine the exact nature of the task to be performed, or the complete meaning of a response, without looking at some of the other headers included in the message; sometimes the message body must also be interpreted before the message can be fully understood by the recipient. For instance, earlier in this section, we referred
to the range of a stream while discussing the PLAY method. In RTSP, the stream range is specified using the "Range" request header; we discuss some of the RTSP message headers in the next section.
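The status-code classification described above can be captured in a few lines. The helper below is an illustrative sketch, not part of any RTSP library:

```python
def classify_rtsp_status(code: int) -> str:
    """Map a three-digit RTSP status code to its response category."""
    if not 100 <= code <= 599:
        raise ValueError("RTSP status codes are three-digit numbers")
    return {
        1: "provisional",   # request still being processed; transaction pending
        2: "success",       # transaction completed successfully
        3: "redirection",   # the request should be sent elsewhere
        4: "client error",  # something is wrong with the request itself
        5: "server error",  # valid request, but the server cannot process it
    }[code // 100]
```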
Session Setup Using RTSP Figure 9.11 shows a typical interaction between an RTSP client and an RTSP server for establishing an RTSP session and its subsequent teardown. Once the client learns about a certain RTSP resource, rtsp://resource-name.server in this case, it sends a DESCRIBE request to the server to learn more about the resource. The server sends back a description of the session corresponding to the identified resource. If the client is interested, it sends a SETUP request, asking the server to make the necessary arrangements for establishing the session. If successful, the client can issue a PLAY request at a later time to get the media stream flowing. If the session requires a special QoS arrangement, such as resource reservation, the client makes it before issuing the PLAY request. Once the PLAY request succeeds, the media start to flow, and the client can manipulate the stream using various RTSP requests, such as PAUSE or PLAY with different headers. Once the session is completed or the client is no longer interested, the client sends a TEARDOWN request to the server to terminate the session.
Figure 9.11 Session setup and teardown using RTSP.
9.5.1.2 Session Description Protocol The session description protocol (SDP) is widely used for presentation and session description. This protocol is specified in the standards-track IETF RFC 2327 [28]. SDP provides a well-defined format that conveys sufficient information about a multimedia session to allow the recipients of the session description to participate in the session. This information is commonly conveyed by the SAP protocol, which announces a multimedia session by periodically transmitting an announcement packet to a well-known multicast address and port number. Alternatively, session descriptions can be conveyed through electronic mail and the World Wide Web. SDP conveys the following information:

. Session name and purpose
. Media comprising the session: the media type (video, audio, etc.), transport protocol (RTP/UDP/IP), media format (MPEG-4 video, H.261 video, etc.), and the addresses and port numbers for the media
. Time(s) the session is active

A session description using SDP consists of a series of text-based lines (using the ISO 10646 character set in UTF-8 encoding). Each line is of the form <type>=<value>. <type> is strictly one character (drawn only from the US-ASCII subset of UTF-8). <value> is generally either a number of fields delimited by a single space character or a free-format string.
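For illustration, a short session description in this <type>=<value> form might read as follows (all names, addresses, and times are invented):

```
v=0
o=alice 2890844526 2890842807 IN IP4 192.0.2.10
s=Weather Update Session
i=A session carrying one audio and one video stream
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
```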
A typical session description using SDP has three parts:

. Session Description. This part describes the session and provides information about the session owners and the session itself. The mandatory types included in this part are the version (v), owner (o), and session name (s) fields. Other optional fields include session information (i), URI of description (u), email address (e), phone number (p), connection information (c), bandwidth (b), time-zone adjustments (z), encryption keys (k), and attribute lines with type field (a).

. Timing Information. This part has one mandatory field (t) indicating the time at which the session becomes active. The part may optionally include several repeat times (r).

. Media Description. This part describes the type and other parameters of the media stream(s). It includes a mandatory line for each media stream containing its name and transport address; this line is denoted by "m." Additional optional lines for each media stream include media title (i), connection information (c), bandwidth information (b), encryption key (k), and zero or more media attribute lines, each starting with "a." If all the media streams share a common connection address, it can be mentioned once in the media
description part. The value corresponding to most typed fields is not free-form text but has a defined format. Figure 9.12 illustrates parts of a session description with SDP, using an example taken right out of RFC 2327 [28].

9.5.1.3 Other Session Control Protocols A number of other session control protocols are available; the most notable are (1) the wireless session protocol (WSP), used with WAP, and (2) the SIP and H.323 families of protocols, which are typically used for real-time conversational media communication. Although these protocols could in principle be used for streaming multimedia with minor modifications, in practice, despite their rich functionalities, they are seldom used for that purpose. In some cases streaming media protocols may be used in conjunction with conversational media protocols; for example, RTSP may be used for interacting with a voice or video mail system, while the remaining infrastructure is based on SIP. There is, however, some preliminary discussion of using SIP for streaming media as well. This may eliminate the need for multiple protocols of similar functionality at the terminal, and could be something to look forward to in the future.

9.5.1.4 Description Languages A number of description languages are used in today's multimedia systems to describe session integration and scene description, device capabilities, context, and metadata associated with media. The main purpose of well-formed description languages is to facilitate consumption of media information by computers, such as in search engines and the semantic Web. However, this is not the only reason why description languages are used. Synchronized Multimedia Integration Language (SMIL) [31], for instance, is used to describe the space-time relationship
Figure 9.12 Parts of session description using SDP.
between a set of multimedia objects. Other examples of multimedia descriptions include ISO's Multimedia Content Description Interface (MPEG-7) and the Composite Capabilities/Preferences Profile (CC/PP) [32]. In the following sections we will learn about SMIL and CC/PP, as they have an important role to play in multimedia content delivery and presentation.

Synchronized Multimedia Integration Language (SMIL) For commercial services, media presentation is perhaps just as important as the media itself. Content providers want to present the media in a manner that is flexible for commercial services, such as integrating location-specific advertisements with the media presentation, and at the same time functional and appealing to the consumer. SMIL, an XML-based language developed by the World Wide Web Consortium, is the "glue" that combines various media elements such as video, audio, images, and formatted text to create an interactive multimedia presentation. SMIL does not control the session, but it can be used to specify how the media are rendered at the client application (user agent). SMIL allows description of the temporal behavior of a multimedia presentation, associates hyperlinks with media objects, and describes both the temporal and the spatial layout of a multimedia presentation on the user device. SMIL is an HTML-like language and, like HTML, consists of elements, attributes, and attribute values. Following is a simple SMIL presentation that demonstrates the timing, synchronization, prefetch, and layout capabilities of SMIL. The SMIL user agent completely (100%) prefetches the media objects. It then displays a video clip and a series of static images. The images appearing in "region2" change every 10 seconds, and those in "region3" change every second, giving the impression of a counter. The layout and presentation behavior is shown pictorially in Figure 9.13, using a video clip of a moving airplane.
Figure 9.13 A SMIL presentation example.
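A SMIL document with the behavior just described might look roughly as follows; the file names, region geometry, and image counts are invented for illustration:

```xml
<smil>
  <head>
    <layout>
      <root-layout width="320" height="240"/>
      <region id="region1" left="0" top="0" width="320" height="160"/>
      <region id="region2" left="0" top="160" width="240" height="80"/>
      <region id="region3" left="240" top="160" width="80" height="80"/>
    </layout>
  </head>
  <body>
    <seq>
      <!-- fetch 100% of the media before playback begins -->
      <prefetch src="airplane.mpg" mediaSize="100%"/>
      <par>
        <video src="airplane.mpg" region="region1"/>
        <seq>
          <!-- images in region2 change every 10 seconds -->
          <img src="ad1.jpg" region="region2" dur="10s"/>
          <img src="ad2.jpg" region="region2" dur="10s"/>
        </seq>
        <seq>
          <!-- one-second images in region3 give the impression of a counter -->
          <img src="digit1.gif" region="region3" dur="1s"/>
          <img src="digit2.gif" region="region3" dur="1s"/>
          <img src="digit3.gif" region="region3" dur="1s"/>
        </seq>
      </par>
    </seq>
  </body>
</smil>
```

The outer seq ensures the prefetch completes before the par group starts, and the par group plays the video and the two image sequences simultaneously, each in its own region.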
Composite Capabilities/Preferences Profile (CC/PP) RTSP does not provide a very good capability exchange mechanism. In most cases the server decides on the type of media and its other properties without first consulting the client about its capabilities. The client may have capabilities or limitations which, if communicated to the server, would allow the server to customize the presentation and media accordingly. The client device may have limited bandwidth, a constrained display, software constraints (such as support for some SMIL features but not others), or user preferences that affect the presentation of media at the user agent. CC/PP can be used to express all these scenarios and more. A CC/PP description is a statement of the capabilities and profiles of a device or a user agent. CC/PP is based on the resource description framework
Figure 9.14 An example CC/PP profile.
(RDF1) and can be expressed using an XML document or some other structured representation format. A CC/PP description is structured such that each profile has a number of components, and each component has one or more related attribute-value pairs, which are sometimes also referred to as properties. Figure 9.14 shows the CC/PP structure for a hypothetical profile. Two components, HardwarePlatform and Streaming, and some of their respective attributes are shown. The HardwarePlatform component groups together the BitsPerPixel, ColorCapable, and PixelAspectRatio properties, which are presumably properties related to the hardware of the device. As with all languages and description formats, we must have a mutually understood vocabulary and rules for its interpretation, and CC/PP is no exception. With CC/PP, any operational environment may define its own vocabulary and schema specifying the allowable attributes and values, along with their syntax and semantics; this vocabulary and schema may be understood only by the relevant applications. For instance, W3C [32] defines a core vocabulary for print and display, and the WAP Forum's user-agent profile (UAProf) specification [33] defines a vocabulary that can be used to express different capabilities and preferences related to the hardware, software, and networking available at the device. A discussion of CC/PP attribute vocabularies can be found in Ref. 34. CC/PP allows specification of default attributes and values in the schema corresponding to each component. If a user agent's capabilities and preferences related to a particular component match the defaults, it can simply say so without giving details of all the attributes and their values. If the values of some attributes differ from the defaults, a device can create a profile containing only the differing attribute-value pairs while referring to the defaults for the other attributes.
This mechanism shortens the profile descriptions and saves precious wireless bandwidth. Other methods of reducing the size of a profile description include binary encodings such as WAP binary XML.
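In RDF/XML, the hypothetical profile of Figure 9.14 might be expressed roughly as follows; the namespace URI and element spellings are simplified assumptions, not the exact UAProf schema:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:prf="http://example.org/profile-vocabulary#">
  <rdf:Description rdf:ID="Profile">
    <prf:component>
      <rdf:Description rdf:ID="HardwarePlatform">
        <prf:BitsPerPixel>8</prf:BitsPerPixel>
        <prf:ColorCapable>Yes</prf:ColorCapable>
        <prf:PixelAspectRatio>1x2</prf:PixelAspectRatio>
      </rdf:Description>
    </prf:component>
    <prf:component>
      <rdf:Description rdf:ID="Streaming">
        <prf:AudioChannels>Mono</prf:AudioChannels>
        <prf:MaxPolyphony>16</prf:MaxPolyphony>
        <prf:PssVersion>3GPP-R5</prf:PssVersion>
      </rdf:Description>
    </prf:component>
  </rdf:Description>
</rdf:RDF>
```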
1 If you are not familiar with RDF, an excellent primer can be found in [68].
9.5.1.5 UAProf Specification UAProf [33] is worth mentioning here because the capability exchange framework and vocabulary defined in this specification are used, with modifications in some cases, in many mobile content delivery systems, including 3GPP-PSS. UAProf specifies (1) an end-to-end capability exchange architecture; (2) a vocabulary and schema comprising six components, namely, HardwarePlatform, SoftwarePlatform, BrowserUA, NetworkCharacteristics, WapCharacteristics, and PushCharacteristics; (3) encoding methods for the profiles; and (4) methods for transport of profiles. UAProf also outlines usage scenarios for user-agent profiles and the behavior of the different entities involved in the capability exchange process. A brief description of the six components described in Ref. 33 follows in Table 9.2.
CC/PP Exchange HTTP is typically used as the transport protocol for the CC/PP description from client to server. However, potentially tens of components and hundreds of properties may be required to fully express the capabilities and preferences profile of a user device. A profile description can therefore be very large, and transporting such a description between the user device and the server can entail significant overhead.
TABLE 9.2 UAProf Component Description

HardwarePlatform: Comprises a set of attributes that describe the hardware characteristics of a user-agent device, such as type, model, and input/output capabilities.

SoftwarePlatform: Consists of a set of attributes related to the software environment on the device, such as the operating system, available audio/video encoding/decoding components, and user language preferences.

BrowserUA: Encompasses the properties related to the HTML browser at the user agent.

NetworkCharacteristics: The attributes in this component describe the characteristics of the network that the user device is connected to.

WapCharacteristics: Includes attributes concerning Wireless Application Protocol (WAP) capabilities.

PushCharacteristics: Covers attributes specific to the push capabilities of the device. The push model is slightly different from the traditional request/response model used for most content; instead, content can be "pushed" to the client without an explicit request from the client (see Ref. 69 for details).
We already saw that CC/PP allows referring to default attribute values, which may reduce the size of the description, but what about the properties that deviate from the defaults? The CC/PP exchange protocol [35] has been designed with precisely these constraints in mind. This protocol allows user agents to specify only the attributes that differ from the defaults or from the last capability exchange, which reduces the size of the descriptions significantly. Because of the dependency between the different descriptions sent by a client, the network must maintain state information about previous CC/PP exchanges. For this purpose a new logical entity called a CC/PP repository is introduced; this repository stores the default and predefined profiles. The CC/PP exchange protocol [35] extends HTTP by defining three new HTTP headers: two request headers, namely profile and profile-diff, and one response header, named profile-warning. The profile header contains a list of references to (predefined) profiles or to profile descriptions carried in the profile-diff header of the same message. The profile-diff header contains the actual profile description. The profile-warning header is used to convey any warning information to the requestor, such as when the server fails to fully resolve a profile description. Ref. 33 defines similar headers for use with wireless-profiled HTTP; these headers are called x-wap-profile, x-wap-profile-diff, and x-wap-profile-warning, respectively, and have meanings similar to those of the corresponding headers defined for the CC/PP exchange protocol. A simple example of the content delivery process based on CC/PP is shown in Figure 9.15. The client includes the CC/PP description in the request for the content. The server resolves the profile, selects or creates appropriate content, and sends it back to the client.
In reality this same model may include intermediaries such as proxies and gateways, which may manipulate the user request and its capability profile before forwarding the request to the server.
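As an illustration, a request carrying profile references in the UAProf-style headers might look as follows; the URL, path, and header values are invented:

```
GET /weather/today.smil HTTP/1.1
Host: content.example.com
x-wap-profile: "http://profiles.example.com/phones/model-x-1.0.rdf"
x-wap-profile-diff: "1;<?xml version=\"1.0\"?><rdf:RDF>...</rdf:RDF>"
```

Here the x-wap-profile header points at a predefined profile in a repository, and the x-wap-profile-diff header carries only the attributes that deviate from it.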
Figure 9.15 Capability exchange with CC/PP: (1) the client sends an HTTP or RTSP request for content with references to its profile; (2) the server retrieves the referenced pieces of the profile from a profile repository; (3) appropriate content is selected or created; (4) the delivered content is appropriate for the user's capability and preference profile.
Needless to say, CC/PP is a generic mechanism for expressing capabilities and profiles and can be used in a variety of situations besides the classical client-server scenario depicted in Figure 9.15. It should also be noted that although HTTP is currently the usual carrier of CC/PP descriptions, RTSP may become more widely used for this purpose in the future.

9.5.2
The Streaming Media Transport Protocols
For the application to render the media while they are still being transmitted over the data network, some care must be taken in media transport. The media transport mechanisms must provide means through which the media are transported in a sequential manner, along with all the relevant information about how and when they must be rendered (e.g., the media format types and the timestamps). Currently the hypertext transport protocol (HTTP) [36], TCP [37], UDP [38], and the real-time transport protocol (RTP) [39] [coupled with the real-time transport control protocol (RTCP)] are used for multimedia streaming over the Internet. Among these protocols, only RTP can be regarded as a true real-time transport protocol, but the presence of firewalls that do not understand the streaming protocols and block UDP-based traffic can sometimes make the use of HTTP and TCP unavoidable. In many scenarios a multimedia session consists of many different streams, each with its own requirements with respect to media transport, thus necessitating the use of more than one media transport protocol. One such scenario is the 3GPP-PSS architecture, which we describe later in this chapter.

9.5.2.1 The Real-Time Transport Protocol This protocol has emerged as the dominant streaming media transport protocol. The basic protocol is defined in IETF RFC 1889 [39]. The RFC defines two protocols that are meant to work in tandem: RTP for media transport and the accompanying real-time transport control protocol (RTCP) for transport feedback from the receivers to the senders. While RFC 1889 provides the base specification, several additional specifications have been developed for packetization and use with individual media types such as H.263 [40] and GSM-AMR [41]. In the following text we briefly give an overview of the functionality provided by RTP and RTCP and their use in a streaming media environment. Figure 9.16 shows the RTP packet format.
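The 12-byte fixed RTP header can be unpacked with a few lines of code. The sketch below assumes the RFC 1889 field layout and is for illustration only:

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 1889 layout)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,          # always 2 for current RTP
        "padding": (b0 >> 5) & 1,
        "extension": (b0 >> 4) & 1,
        "csrc_count": b0 & 0x0F,     # number of CSRC identifiers that follow
        "marker": b1 >> 7,           # M-bit, e.g. for fragment/frame boundaries
        "payload_type": b1 & 0x7F,   # identifies the codec in use
        "sequence": seq,             # detects loss and reordering
        "timestamp": timestamp,      # sampling instant, for jitter compensation
        "ssrc": ssrc,                # synchronization source identifier
    }
```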
RTP provides payload type identification, fragmentation (M-bit), sequencing, and timing information in each individual packet. The payload type field allows the application to determine the correct codec type to be used with the media. The fragmentation information allows applications to reassemble protocol data units correctly. The timing and sequence information allows applications to recognize out-of-sequence packets and compensate for delay-jitter variations incurred on the network. Together, these allow an application to render the multimedia stream correctly and smoothly. RTP also provides synchronization source (SSRC) and contributing source (CSRC) identifiers to identify the packets belonging to the same stream independently of the transport layer address. This is especially helpful in multiparty streaming
Figure 9.16 RTP packet format.
scenarios but is rarely used in contemporary streaming multimedia delivery. RTP is also capable of transporting encrypted media; however, key generation and distribution are outside the scope of RTP. RTCP specifies periodic transmission of control packets to all the participants in a session. It serves four main functions:

1. Feedback on the quality of data reception, through RTCP sender and receiver reports.

2. Carrying a persistent transport-level identifier for an RTP source, called the canonical name (CNAME). This is very helpful in scenarios where an RTP source contributes more than one stream, such as when transmitting the audio and video streams of a conversation: the common CNAME for the individual SSRCs allows the receiver to recognize these streams as associated, indicating a need for synchronization (e.g., lip-synchronization).

3. Rate control for RTCP messages. The number of RTCP messages generated can quickly get out of control in a conference with a large number of participants; this function allows the participants to control the rate of RTCP reports.

4. Session control information for loosely controlled sessions, where participants may join and leave without strict membership control. Streaming multimedia sessions, however, are often tightly controlled, and complete session control information is established via separate session control protocols such as RTSP, with RTCP allowing only loose control within the parameters established by the session control protocol.

Figure 9.17 shows the format of the RTCP sender report. Receiver reports are similar, except that the header does not contain the NTP timestamp and there is no sender information block. The payload type for receiver reports is 201.
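The RTCP packet types mentioned here carry fixed payload-type numbers; a quick reference sketch, with values as defined in RFC 1889:

```python
# RTCP payload types defined in RFC 1889
RTCP_PACKET_TYPES = {
    200: "SR",    # sender report: NTP timestamp, sender info, reception blocks
    201: "RR",    # receiver report: reception blocks only, no sender info
    202: "SDES",  # source description: CNAME, email, location, etc.
    203: "BYE",   # a source is leaving the session
    204: "APP",   # application-defined extensions
}

def rtcp_type_name(payload_type: int) -> str:
    """Return the RTCP packet type name, or 'unknown' for unassigned values."""
    return RTCP_PACKET_TYPES.get(payload_type, "unknown")
```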
Figure 9.17 RTCP sender report packet format.
In addition to sender and receiver reports, RTCP also provides for source description (SDES) packets (see Fig. 9.18). These packets include information about the synchronization and contributing sources, such as name, email, phone number, and geographic location. Although RTP is transport-independent as long as the transport protocol provides multiplexing and correct delivery, because of the stringent delay requirements of most real-time traffic and the wide acceptance of IP, UDP is primarily used as the transport
Figure 9.18 RTCP source description format.
for RTP. Although RFC 1889 states that RTP uses the checksum and multiplexing capabilities of UDP, it is worth noting that most media codecs are either not sensitive to bit errors or may be encoded with error correction codes; therefore, it is not always wise to discard an entire packet if the checksum fails. In such cases it may be better to disable the UDP checksum or use protocols such as "UDP-lite" [42,43]. RTP and RTCP are usually used in tandem and multiplexed onto the same network layer address; for instance, if UDP/IP is used, they will typically share the IP address. By convention, the RTP stream uses an even-numbered port and the corresponding RTCP channel uses the immediately following odd-numbered port. As stated earlier, individual profiles for specific media types have been defined. These profiles specify the payload type, any modifications to the semantics of the different fields in the header and payload, and any new header types if necessary. Examples of such media-specific profiles include Ref. 44 for H.263 and Ref. 41 for AMR. These profiles sometimes provide functionality for rate adaptation and other in-band signaling; for example, Sjoberg et al. [41] allow the receiver to specify one of several AMR codec rates or modes of operation. Applications using these media types must conform to the corresponding profiles to ensure compatibility.

9.5.2.2 Other Media Transport Protocols HTTP and RTSP tunneling, or plain UDP or TCP, are sometimes used for media transport. HTTP and RTSP tunneling is useful in cases where a firewall blocks RTP/UDP traffic. With HTTP and RTSP tunneling, the streaming media are sent embedded or interleaved in the body of HTTP or RTSP messages; this approach, however, can be highly inefficient in terms of the amount of bandwidth used.
However, as streaming multimedia gains wider deployment and acceptance, more firewalls understand the streaming media protocols and can therefore open the desired ports to allow streaming media, so we will likely see less use of tunneling in the future.
9.6 3GPP PACKET-SWITCHED STREAMING SERVICE
As discussed in previous sections, a basic streaming service consists of streaming control protocols, transport protocols, media codecs, and scene description protocols. 3GPP has formulated a set of 3G PSS standards to provide a mobile packet-switched streaming service (PSS); the standard specifies the protocols, codecs, and architecture for this service. The 3GPP codecs and media types were discussed in Section 3.3 of this chapter. Figure 9.19 depicts the 3GPP protocols and applications used in a PSS client. The protocols and their applications are

. RTSP and SDP for session setup and description
. SMIL for session layout description
Figure 9.19 3GPP streaming protocols and their applications.
. HTTP for capability exchange and for transporting static media such as session layout descriptions (SMIL files), text, graphics, and so on
. RTP for transporting real-time media such as audio, video, and speech

Providing end-to-end streaming service implies harmonized interworking between the protocols and mechanisms specified by the IETF and 3GPP. Both 3GPP and the IETF
Figure 9.20 3GPP packet-switched streaming service.
have their own sets of protocols and mechanisms to provide QoS and connectivity in the 3G access network and the external IP-PDN (Internet), respectively. The external IP-PDN can deploy either the IntServ or the DiffServ QoS model to provide QoS. 3GPP release 4 has support for streaming services in its QoS model. 3GPP release 5 has an upgraded packet-switched core network, adding an "Internet multimedia subsystem (IMS)" that consists of the network elements used in session initiation protocol (SIP)-based session control. Release 5 has also upgraded the GSNs (GPRS support nodes) to support delay-sensitive real-time services, and the radio access network (UTRAN) has been upgraded to support real-time handover of PS (packet-switched) traffic. The main purpose of release 5 is to enable an operator to offer new services such as multimedia, gaming, and location-based services. The Internet multimedia domain is mainly concerned with new services (their access, creation, and payment), but in a way that gives the operator full control over the content and revenue.
9.6.1 3GPP Packet-Switched Domain Architecture
Figure 9.20 depicts the network architecture of an end-to-end 3GPP packet-switched streaming service. We need at least a streaming client and a content server to implement the streaming service. Content servers may be either hosted in the UMTS architecture itself or accessed externally through an IP-PDN. A proxy server may be needed in the UMTS architecture to provide sufficient QoS if the content servers are accessed externally through an IP-PDN. The end-to-end streaming architecture has the following network elements that are specific to streaming:

. Content Servers. These can be either hosted in the UMTS architecture (added to the IMS) or accessed externally. Content servers consist of streaming servers that store streaming content and Web servers that hold SMIL pages, images, and other static content.

. Proxy Server. This may be included in the IMS (especially when the streaming server is external) to provide an enhanced-QoS streaming service. The proxy server's [45,46] main role is to smooth (eliminate delay jitter in) the incoming streaming traffic from the external IP-PDN. During transmission of the streaming content to the client, the proxy dynamically adapts the delivered QoS in accordance with the available bandwidth, using feedback from the client application, the radio network, and the IP network. The proxy server can also implement a quality adaptation scheme by switching on the fly to a lower-quality stream when the available bandwidth is not sufficient. Moreover, it can perform the additional function of transcoding. Transcoding may be needed for several reasons, such as when a user moves from a high-bandwidth wireless LAN to a GPRS or 3G network, or when the mobile node is unable to handle high-bandwidth streaming traffic.
9.6
3GPP PACKET-SWITCHED STREAMING SERVICE
311
. User and Profile Servers. These servers store user preferences and device capabilities. This information can be used to control the presentation of streamed media to a mobile user.
. Content Cache. A content cache can optionally be used to improve the overall service quality.
. Portals. Portals are servers that allow convenient access to streamed media content. For example, a portal might offer content browsing and search facilities. In the simplest case, it can be a Webpage with a list of links to streaming content.
Apart from the abovementioned network elements that are specific to the streaming service, other network elements in the 3GPP UMTS architecture play a significant role in the QoS management of the streaming service. The UMTS radio access network (UTRAN) ensures seamless handover between basestations with minimal disruption to ongoing real-time services. The radio resource control (RRC) protocol [1] (3GPP TS 25.331) is used for controlling resources on the UTRAN (universal terrestrial radio access network). The radio access network application part (RANAP) protocol [1] (TS 25.431) is used between the UTRAN and core network entities. The serving GPRS support node (SGSN) acts as the gateway for all packet-based communications between user equipments (UEs) within its serving area. The SGSN is responsible for packet routing and transfer, mobility management (attach/detach and location management), logical link management, authentication, and charging functions. The gateway GPRS support node (GGSN) acts as a gateway between the UMTS core network and the external IP-PDN. There is an active PDP context for every active packet-switched bearer or session. The PDP context is stored in the UE, SGSN, and GGSN. With an active PDP context, the UE is visible to the external IP-PDN and is able to send and receive data packets. The PDP context describes the characteristics of the session.
It contains a PDP type (e.g., IPv4), the IP address assigned to the UE, the requested QoS, and the address of the GGSN that serves as the access point to the IP-PDN. Table 9.3 shows the different QoS classes supported in the UMTS architecture [1]. PDP context activation (see Fig. 9.21) in the UMTS architecture works as follows. The UE first sends an "Activate PDP context request" message to the SGSN through the session management (SM) protocol. The SGSN contacts the home location register
TABLE 9.3    UMTS QoS Classes

Class            Requirements                            Example
Conversational   Very delay-sensitive                    Traditional voice; VoIP
Streaming        Better channel coding; retransmission   One-way real-time audio/video
Interactive      Delay-insensitive                       Telnet; interactive e-mail; WWW
Background       Delay-insensitive                       Ftp; background e-mail
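Read as a lookup table, Table 9.3 maps an application's traffic type to a UMTS QoS class. A minimal sketch of such a mapping follows; the class names come from the table, while the application keys are illustrative assumptions, not 3GPP-defined identifiers:

```python
# UMTS QoS class per Table 9.3, keyed by illustrative application types.
QOS_CLASS = {
    "voip": "conversational",     # very delay-sensitive, two-way
    "audio_stream": "streaming",  # one-way real-time audio/video
    "web": "interactive",         # request/response traffic (WWW, Telnet)
    "ftp": "background",          # delay-insensitive bulk transfer
}

def qos_class(app, default="background"):
    """Map an application type to its UMTS QoS class (background if unknown)."""
    return QOS_CLASS.get(app, default)
```
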
312
MULTIMEDIA STREAMING IN MOBILE WIRELESS NETWORKS
Figure 9.21    PDP context activation procedure.
(HLR) and performs authentication and authorization functions. The SGSN then performs local admission control and initiates the radio access bearer (RAB) assignment procedure in the RAN/GERAN through the RANAP procedure. Local call admission is based on the availability of radio resources, and the UMTS QoS attributes are mapped onto radio bearer (RB) parameters used in the physical and link layers. After the establishment of the RB, the SGSN sends a "Create PDP context request" message to the GGSN. The GGSN performs local admission control and creates a new entry in its PDP context table that enables it to route data between the SGSN and the external IP-PDN. Afterward, the GGSN returns a confirmation message, "Create PDP context response," to the SGSN that contains the PDP address. The SGSN updates its local PDP context table and sends an "Activate PDP context accept" message to the UE.
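The PDP context record and the activation flow above can be sketched as follows. This is a trace of the described message sequence, not real SM/GTP signaling; the GGSN name, UE address, and QoS value are illustrative:

```python
from dataclasses import dataclass

@dataclass
class PDPContext:
    """Session characteristics stored in the UE, SGSN, and GGSN."""
    pdp_type: str      # e.g. "IPv4"
    ue_address: str    # IP address assigned to the UE (made-up value below)
    qos_profile: str   # requested UMTS QoS class
    ggsn_address: str  # GGSN acting as access point to the external IP-PDN

def activate_pdp(ggsn, qos="interactive"):
    """Return the resulting PDP context plus a trace of the activation flow."""
    trace = [
        ("UE -> SGSN", "Activate PDP context request"),
        ("SGSN", "HLR authentication/authorization; RAB assignment via RANAP"),
        ("SGSN -> GGSN", "Create PDP context request"),
        ("GGSN -> SGSN", "Create PDP context response"),  # carries the PDP address
        ("SGSN -> UE", "Activate PDP context accept"),
    ]
    ctx = PDPContext("IPv4", "10.0.0.7", qos, ggsn)  # 10.0.0.7 is illustrative
    return ctx, trace
```
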
9.6.2
The 3GPP PSS Framework
The 3GPP PSS specifications consist of three technical specifications: 3GPP TS 22.233, 3GPP TS 26.233, and 3GPP TS 26.234. PSS provides a framework for IP-based streaming applications in 3G networks. This framework is very much in line with what we have discussed so far in this chapter. It uses CC/PP for capability exchange (see Fig. 9.22), SMIL for presentation description,
Figure 9.22    Capability negotiation mechanism applied in PSS.
RTSP for session control, and SDP for session description. However, there are minor differences here and there. Let's go over these one by one.
9.6.2.1 Streaming Media Session Setup Procedures for PSS    Figure 9.23 shows an example of a simple session establishment. The first step is to know what content to get and where to start. The client can obtain the URI of the content from an SMIL presentation document, a simple Webpage, or an email, or simply by word of mouth. Once the URI is known, the client application requests a primary PDP context, which is opened to allocate the IP address for the UE as well as the access point. The primary PDP context is used to access content servers in either the IMS domain or the external IP-PDN. Since the primary PDP context is used for RTSP signaling, it is created with the UMTS interactive QoS profile. A socket is opened for RTSP signaling and is tied to the primary PDP context. The client can now query the content server to learn more about the content using an RTSP DESCRIBE request.2 The client may include its CC/PP description in the request. The client does not need to include the profile description if it is sure that the URI it is using in the RTSP request already points to a resource that is compatible with its profile. Such would be the case if the URI were obtained from an SMIL document that was itself obtained after presenting a valid CC/PP description. If the profile is included, it is carried using the x-wap-profile and x-wap-profile-diff headers of the CC/PP exchange protocol that we discussed earlier.
2 The RTSP DESCRIBE method is mandatory in the 3GPP-PSS architecture; however, the IETF does not mandate its use.
Figure 9.23 Streaming multimedia session establishment in PSS.
If the profile description is included, the server can find or create content that is most suitable for the client's request URI and profile. Otherwise it just selects the default content corresponding to the URI. The server sends back a response with the description of the session that will be used to deliver the selected content. On receiving the description, the client can determine whether it likes the description, which is likely to be the case because the content has presumably been tailored to the client's capabilities and preferences. The client can now send a SETUP request to the server, asking it to make the necessary arrangements for the streaming session. The server acknowledges the SETUP request by sending a "200 OK" response message back to the client. The client now needs to establish a PDP context that is suitable for the anticipated multimedia streaming session. It does so by opening two sockets for RTP and RTCP traffic and tying them to two secondary PDP contexts. The secondary PDP contexts are assigned appropriate UMTS QoS profiles and reuse the same IP address and access point as the primary PDP context. Now that everything is ready, the client can send a PLAY request asking the content server to start the streaming session. The streaming media are typically transported over RTP/UDP/IP, as described in the SDP. Figure 9.23 shows the presentation and content server as a single entity, but these may in fact be logically and physically separate entities.
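The DESCRIBE/SETUP/PLAY exchange above can be sketched as plain RTSP messages. The URIs, ports, session ID, and profile URL below are illustrative assumptions; the x-wap-profile header carrying the CC/PP reference is the PSS-specific part:

```python
def rtsp_request(method, uri, cseq, headers=None):
    """Build a minimal RTSP/1.0 request line plus headers (illustrative client)."""
    lines = [f"{method} {uri} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"

# DESCRIBE carrying a reference to the client's CC/PP profile, as in PSS:
describe = rtsp_request(
    "DESCRIBE", "rtsp://example.net/clip", 1,
    {"Accept": "application/sdp",
     "x-wap-profile": '"http://example.net/profiles/phone.rdf"'},
)
# SETUP declaring the client ports later tied to the secondary PDP contexts:
setup = rtsp_request(
    "SETUP", "rtsp://example.net/clip/trackID=1", 2,
    {"Transport": "RTP/AVP;unicast;client_port=4588-4589"},
)
# PLAY starts the streaming session:
play = rtsp_request("PLAY", "rtsp://example.net/clip", 3, {"Session": "12345"})
```
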
9.7 MULTIMEDIA SERVICES IN MOBILE AND WIRELESS ENVIRONMENTS
The main factors that differentiate wireless mobile environments are
. Limited Bandwidth and Error-Prone Channel. The characteristics of a wireless channel show unpredictable time-varying behavior due to several factors such as interference, multipath fading, and atmospheric conditions. The last hop of communication is wireless, which not only offers relatively low bandwidth but also suffers from a higher bit error rate (BER). Furthermore, the retransmissions needed to recover from these errors induce variable delay across the wireless channel.
. Movement. The movement of mobile users triggers a handoff mechanism to minimize interruption to an ongoing session. The wireless channel characteristics may vary significantly from one segment of the network to another. Since handoffs almost always incur packet loss, they further aggravate the already lossy nature of the wireless medium. Finally, the relative pathlength from the server to the client may vary as the client moves across networks. This is especially true if the server is close to the edge, as in content distribution networks.
In the following text we cover some recent proposals to alleviate the problems that arise from the error-prone nature of the wireless channel and from mobility in mobile content delivery systems. We also look into the research issues of providing streaming service in heterogeneous network environments.
9.7.1 Differentiating Transmission Error Losses from Congestion Losses
In the wired, wide-area Internet, most packet loss occurs as a result of congestion. In wireless environments, however, the major source of packet loss is transmission errors over the wireless channel. Rate control is normally used to avoid congestion-induced losses by slowing down the sender, but it is not suitable for avoiding or recovering from errors on wireless channels. The techniques used for error recovery or packet loss avoidance over wireless channels build better error resiliency into the packets so that even if some packets are dropped, the data can still be recovered at the receiver. Alternatively, some senders use aggressive retransmissions, but that is bound to introduce congestion problems. A typical mobile multimedia delivery environment comprises both wired and wireless links. In such an environment an end-to-end feedback mechanism, such as RTCP feedback messages, can convey information only about the net end-to-end packet loss, and there is no way for the sender to ascertain whether a packet was lost on the wired network or the wireless network. Since counteracting the two types of packet loss requires different techniques, the sender cannot cope
with the situation effectively without being able to distinguish between the two types of packet loss. To address this problem, a novel RTP monitoring technique has been introduced [47,48]. This technique relies on the placement of an RTP monitoring agent at the edge of the wired/wireless network. The agent monitors the RTP streams and sends RTCP feedback to the sender of the stream, such as a streaming server. This feedback is in addition to the RTCP feedback generated by the recipient itself (see Fig. 9.24). The RTCP feedback from the client gives the aggregate loss over both the wireless and the wired segments of the end-to-end path, whereas the RTCP feedback from the RTP monitoring agent gives the loss over the wired segment only. This helps the sender (typically a streaming server) determine whether the loss occurred in the wireless or the wired segment of the path. It is worth mentioning that since RTCP feedback messages are not generated at the same rate as the RTP packets, the feedback captures aggregate packet loss over the RTCP period, which is typically a few seconds. Thus the server can only estimate the percentage of packet loss over the wired and wireless segments and must adapt the stream accordingly. Details of the RTP monitoring technique and its applications can be found in two papers by Yoshimura and colleagues [47,48].
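The arithmetic behind the split is simple. If the wired segment drops a fraction p_w of packets (agent's report) and the client sees an end-to-end loss fraction p_t, then the wireless hop dropped (p_t - p_w)/(1 - p_w) of the packets it actually received. A sketch, with illustrative figures:

```python
def split_loss(total_loss, wired_loss):
    """Estimate the wireless-hop loss fraction from two RTCP loss reports:
    total_loss - end-to-end fraction reported by the client
    wired_loss - fraction reported by the monitoring agent at the wired edge
    Both are fractions in [0, 1]; the result is the share of packets that
    survived the wired segment but were lost over the air.
    """
    if wired_loss >= 1.0:
        return 0.0  # nothing reached the wireless hop
    wireless = (total_loss - wired_loss) / (1.0 - wired_loss)
    return max(0.0, wireless)  # clamp noise in the estimates
```

For example, a 19% end-to-end loss with 10% wired loss implies roughly 10% loss over the air, since only 90% of the packets ever reached the wireless hop.
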
9.7.2
Counteracting Handover Packet Loss
As we pointed out earlier in the section, handovers are the cause of additional packet loss in mobile networks. Although network layer mobility protocols such as mobile
Figure 9.24
Streaming agent to differentiate wired and wireless packet loss.
IP [49] and fast mobile IP [50] attempt to provide seamless handovers during host movement, some packet loss is inevitable because of signaling propagation delay. A novel end-to-end technique for soft IP handover has been proposed [51]. Figure 9.25 shows an overview of this scheme. The scheme assumes that the receiver host is at least temporarily attached to multiple interfaces during the handoff process. The receiver host signals this situation to the sender, along with information about the interfaces, such as their IP addresses and their relative priority based on signal strength, estimated bandwidth, or packet loss rate on the individual interfaces. The IP stack on the sender host then generates redundant error correction symbols, denoted as F1, F2, D1, and so on in Figure 9.25, and dispatches them to the multiple interfaces of the receiver. Reed-Solomon codes are used to generate the redundant symbols [51]. In general, if a message is extended from k symbols to n symbols through the addition of (n - k) redundant symbols, then up to (n - k) lost symbols can be recovered at the receiver node. For example, in Figure 9.25 n = 2k, that is, there are just as many redundant symbols as there are symbols in the original message; thus the receiver should be able to recover the application data even when any k of the n symbols are lost.
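The (n, k) recovery property can be illustrated with a toy Reed-Solomon-style erasure code over a prime field. Real implementations [51] work over GF(2^8) for efficiency; this sketch trades speed for brevity and shows only the principle that any k of the n symbols suffice:

```python
# Toy Reed-Solomon-style erasure code over the prime field GF(257).
P = 257  # prime modulus; one byte value (0..255) fits in a field element

def _lagrange_eval(points, x, p=P):
    """Evaluate the unique degree<k polynomial through `points` at `x` (mod p)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def rs_encode(data, n, p=P):
    """Systematic encoding: positions 0..k-1 carry the data symbols,
    positions k..n-1 carry parity (the same polynomial evaluated further)."""
    points = list(enumerate(data))
    return [(x, _lagrange_eval(points, x, p)) for x in range(n)]

def rs_decode(received, k, p=P):
    """Recover the k data symbols from any >= k surviving (position, value) pairs."""
    pts = received[:k]
    return [_lagrange_eval(pts, x, p) for x in range(k)]
```

With n = 2k, as in Figure 9.25, the sender can bicast the 2k symbols over both interfaces and the receiver reconstructs the data from whichever k symbols arrive.
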
Figure 9.25
Bicasting forward error correction codes.
9.7.3 Mobility-Aware Server Selection and Request Routing in Mobile CDN Environments
We mentioned earlier that the movement of a mobile host might result in the establishment of an entirely new path with very different path characteristics. If the servers are present very close to the edge of the network, as in a high-density content distribution network, this change of relative distance and path characteristics may result in significant QoS degradation, especially for streaming multimedia, where the sessions are typically long. This situation can, however, be alleviated by changing the content server as the host moves, as proposed in Refs. 52 and 53. The technique revolves around keeping track of host movement and assigning a new server as the host moves from the optimal content delivery region of one server to that of another (see Fig. 9.26). A number of methods may be used to keep track of host movement and then perform server handoff. Tariq and Takeshita [53] define server coverage areas as sets of IP subnets, and mobile IP binding update messages are used to track user movement. Server handoff is treated as a process of establishing a session with the new server and terminating the session with the old one, and is achieved using extended RTSP methods [53]. Yoshimura et al. [52] use SOAP messages to update the presentation file used by the host, so that the next segment is fetched from the most appropriate server. The techniques described in Ref. 54 go a step further: they analyze the host's mobility in terms of how rapidly or slowly it is moving and try to assign a server on that basis. This predictive algorithm can significantly reduce the number of server handoffs that may be necessary.
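The subnet-based coverage idea of Ref. 53 reduces to a membership test: pick the server whose coverage area (a set of IP subnets) contains the client's current address. A minimal sketch, with made-up server names and subnets:

```python
import ipaddress

# Hypothetical coverage areas: each server serves a set of IP subnets,
# following the subnet-based coverage definition of Ref. 53.
COVERAGE = {
    "server-a": [ipaddress.ip_network("10.1.0.0/16")],
    "server-b": [ipaddress.ip_network("10.2.0.0/16")],
}

def select_server(client_ip, coverage=COVERAGE, default="server-a"):
    """Pick the content server whose coverage area contains the client's address."""
    addr = ipaddress.ip_address(client_ip)
    for server, subnets in coverage.items():
        if any(addr in net for net in subnets):
            return server
    return default  # fall back when the client is outside every coverage area
```

On a mobile IP binding update carrying a new care-of address, re-running this lookup tells the system whether a server handoff is due.
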
Figure 9.26
Mobility-based server selection techniques.
9.7.4 Architectural Considerations to Provide Streaming Services in Integrated Cellular/WLAN Environments
The wireless LAN is fast emerging as a complementary technology to 3G networks. This technology provides very high-speed access (11 Mbps for 802.11b and 54 Mbps for 802.11a) but covers a very small area and allows only limited mobility. The 3G technology, on the other hand, provides access at relatively low speed (100 kbps for GPRS) to medium speed (2 Mbps for UMTS) but covers a wide area and allows high mobility. A number of interworking mechanisms [55-57] have been developed to integrate these two technologies into a single wireless data network that allows very high-speed access in hotspot areas such as airports and shopping malls. Integration of the WLAN and the cellular network falls into two categories depending on who owns and manages the WLAN. For example, operators can own and manage WLANs to augment their cellular data networks; an operator can thus gain a competitive advantage by providing enhanced data services at strategic locations such as airports and hotels. In the alternative scenario, an independent wireless Internet service provider (WISP) or enterprise owns the WLAN. In either case an end user can obtain very high quality streaming service in hotspot locations. Two methods are used to integrate cellular and WLAN networks: tight coupling and loose coupling, as illustrated in Figure 9.27. The architectural issues in providing seamless streaming service with both methods are described in the following paragraphs.
Figure 9.27 Generalized integrated UMTS/WLAN architecture.
320
MULTIMEDIA STREAMING IN MOBILE WIRELESS NETWORKS
9.7.4.1 Tight Coupling    Under this integration scheme, the WLAN is connected to the GPRS core network in the same manner as any other radio access network (RAN), such as the GPRS RAN (GERAN) and the UMTS terrestrial radio access network (UTRAN). The WLAN is treated as a new radio access technology within the cellular system, and may emulate either a radio network controller (RNC) or an SGSN. From the core network's point of view, the WLAN is like any other GPRS routing area (RA) in the system. An interworking unit (IWU) is needed to interface the WLAN to the GPRS core network. The main advantage of this solution is that the mechanisms for mobility, QoS, and security in the core network can be reused. Within this architecture a handover takes place when a mobile user either enters or leaves a hotspot area. The IP address allocated to the mobile user does not change during the handover process, since the mobile user remains under the same GGSN. The hotspot areas and cellular coverage areas normally overlap, and the handover is based on the end user's preference. For example, a mobile user receiving multimedia streaming service would hand over to the WLAN when moving into the hotspot area to improve performance. Since the bottleneck bandwidth in wireless environments lies in the air interface, the transcoding functionality (in the proxy server) may not be needed in the delivery path after the mobile user hands over from the cellular RAN to the WLAN. This scheme may also require additional QoS adaptation mechanisms to support seamless handover between the WLAN and the cellular RAN for real-time applications such as streaming.
9.7.4.2 Loose Coupling    Under this integration scheme, the WLAN interfaces directly with the IP-PDN (e.g., the Internet) and has no direct interface with the GPRS core network. In this scenario, the WLAN and the cellular network are two separate access networks.
The loose coupling scheme may deploy IETF-based protocols to handle authentication, accounting, and mobility. The WLAN appears as a visited network to the UMTS core network. A mobile user is typically allocated a new IP address while handing over from the UMTS network to the WLAN or vice versa. Seamless handover under this scheme may require advanced mechanisms such as context transfer [50] (session context, QoS context, security context, etc.) and resource reservation. Providing seamless streaming service under this integration scheme is an open research problem. Streaming in mobile and wireless environments is a subject of active research. Some of the open research issues in providing multimedia streaming services in mobile and wireless environments are
. Seamless service during interdomain and intertechnology handoffs
. Dynamic QoS adaptations and channel allocations
. Optimizations across lower and higher layers
. Efficient micromobility protocols [58] to make intradomain handovers smoother
. Secured streaming, digital rights management schemes
. Efficient implementations of multicast streaming services
Some of the more recent studies on these topics are listed at the end of this chapter [e.g., 3, 4, 59-67].
9.8
CONCLUSIONS
This chapter has addressed the architectural and design issues in providing streaming services in wireless environments. Supporting streaming services in wireless environments is a big challenge because of error-prone wireless channels and mobility-induced factors. In addition, the limited buffering and processing power available in portable mobile devices affect the design of the wireless access network architecture. A great deal of research has been done to address these issues [51,54,59-62]. The wireless access network architecture should implement appropriate mechanisms to mitigate the impact of wireless- and mobility-induced factors in order to minimize the resource and processing requirements at the mobile terminal. We have discussed some of these research issues and related work. The chapter gave a general overview of an end-to-end architecture, including the network elements and protocols needed to provide streaming services in wireless/mobile environments. We also described the packet-switched streaming service architecture developed by 3GPP (abbreviated 3GPP-PSS). There has been a widespread effort to develop adaptive modulation, equalization, and coding schemes that use real-time estimation of channel characteristics to achieve performance objectives such as error rate and delay at the physical layer. A number of smart-antenna-based technologies have been developed that use space diversity techniques to mitigate the impact of multipath fading and achieve higher capacity. There has also been much work on micromobility protocols [58] (such as FMIP) at the network layer to reduce mobility-induced disruption. There is a need to look into joint optimization across the various layers to provide good-quality seamless streaming service in wireless/mobile environments. The wireless bandwidth can be utilized effectively if the lower layers have a detailed understanding of the application requirements.
A well-defined interface between the IP layer and the lower layers would be very useful in next-generation wireless networks. Indeed, the EU IST project BRAIN has already defined an IP-to-Wireless (IP2W) interface for this purpose. There are still a number of design issues in providing streaming services in heterogeneous wireless networks that include various wireless access technologies (3G, WLAN, Bluetooth, etc.). Secured streaming is yet another area of active research. The ability to protect the intellectual property rights of content owners will be a key factor in the mobile digital content market. Multimedia streaming services are becoming very popular on the Internet, and when these services become mobile, animation, music, and news services will be available to users regardless of location and time. Next-generation mobile networks will combine the standardized streaming service with a range of unique services to offer a wide range of innovative and exciting multimedia services to the rapidly growing mobile market.
REFERENCES
1. The Third Generation Partnership Project, http://www.3gpp.org.
2. The Third Generation Partnership Project 2, http://www.3gpp2.org/.
3. I. Elson et al., Streaming technology in 3G mobile communication systems, IEEE Comput., 34(9): 46-52 (Sept. 2001).
4. H. Montes et al., Deployment of IP multimedia streaming services in third-generation mobile networks, IEEE Wireless Commun., 84-92 (Oct. 2002).
5. D. Wu et al., Streaming video over the Internet: Approaches and directions, IEEE Trans. Circuits Syst. Video Technol., 11(3) (March 2001).
6. S. Keshav, An Engineering Approach to Computer Networking: ATM Networks, the Internet, and the Telephone Network, Addison-Wesley Professional, 1997.
7. S. Floyd and K. Fall, Promoting the use of end-to-end congestion control in the Internet, IEEE/ACM Trans. Network., 7: 458-472 (Aug. 1999).
8. S. Floyd et al., Equation-based congestion control for unicast applications, Proc. ACM SIGCOMM, Stockholm, Sweden, Aug. 2000, pp. 43-56.
9. The TCP-Friendly Web Page, http://www.psc.edu/networking/tcp_friendly.html.
10. R. Rejaie, M. Handley, and D. Estrin, RAP: An end-to-end rate-based congestion control mechanism for real-time streams in the Internet, Proc. IEEE INFOCOM '99, March 1999, Vol. 3, pp. 1337-1345.
11. S. McCanne, V. Jacobson, and M. Vetterli, Receiver-driven layered multicast, Proc. ACM SIGCOMM, Palo Alto, CA, Aug. 1996, pp. 117-130.
12. R. Rejaie, M. Handley, and D. Estrin, Quality adaptation for congestion controlled playback video over the Internet, Proc. ACM SIGCOMM '99, Cambridge, Sept. 1999, pp. 1337-1345.
13. Q. Guo et al., Sender-adaptive and receiver-driven video multicasting, Proc. IEEE Int. Symp. Circuits and Systems (ISCAS 2001), Sydney, Australia, May 2001.
14. Y. Wang, M. T. Orchard, and A. R. Reibman, Multiple description image coding for noisy channels by pairing transform coefficients, Proc. IEEE Workshop on Multimedia Signal Processing, June 1997, pp. 419-424.
15. X. Li et al., Layered video multicast with retransmission (LVMR): Evaluation of error-recovery schemes, Proc. INFOCOM '98, March-April 1998, Vol. 3, pp. 1062-1072.
16. S. Shenker, C. Partridge, and R. Guerin, Specification of the Guaranteed Quality of Service, IETF RFC 2212.
17. S. Blake et al., An Architecture for Differentiated Services, IETF RFC 2475.
18. R. Braden et al., Resource Reservation Protocol (RSVP)—Version 1 Functional Specification, IETF RFC 2205.
19. D. Durham et al., The COPS (Common Open Policy Service) Protocol, IETF RFC 2748.
20. T. Sikora, MPEG digital video-coding standards, IEEE Signal Process. Mag. (Sept. 1997).
21. T. Sikora, The MPEG-4 video standard verification model, IEEE Trans. Circuits Syst. Video Technol., 7(1) (Feb. 1997).
22. R. Talluri, Error-resilient video coding in the ISO MPEG-4 standard, IEEE Commun. Mag. (June 1998).
23. P. Noll, MPEG digital audio coding, IEEE Signal Process. Mag. (Sept. 1997).
24. 3GPP, Transparent End-to-End Packet-Switched Streaming Service (PSS): Protocols and Codecs (Release 5), 3rd Generation Partnership Project TS 26.234, V5.4.0.
25. H. Schulzrinne, A. Rao, and R. Lanphier, Real Time Streaming Protocol (RTSP), IETF Standards Track RFC 2326, April 1998.
26. A. Barbir, B. Cain, R. Nair, and O. Spatscheck, Known CN Request-Routing Mechanisms, IETF Work in Progress, April 2003. (Note: The CDI working group at IETF has concluded.)
27. M. Handley, C. Perkins, and E. Whelan, Session Announcement Protocol, IETF Experimental RFC 2974, Oct. 2000.
28. M. Handley and V. Jacobson, SDP: Session Description Protocol, IETF Standards Track RFC 2327, April 1998.
29. J. Rosenberg et al., SIP: Session Initiation Protocol, IETF Standards Track RFC 3261, June 2002.
30. ITU-T Recommendation H.323, Packet-Based Multimedia Communications Systems, 2003.
31. Synchronized Multimedia Integration Language (SMIL 2.0), http://www.w3.org/TR/2001/REC-smil20-20010807/.
32. Composite Capabilities/Preference Profiles (CC/PP): Structure and Vocabularies, http://www.w3c.org/TR/CCPP-struct-vocab/.
33. WAP User Agent Profile Specification, Oct. 2001.
34. CC/PP Attribute Vocabularies, http://www.w3.org/TR/2000/WD-CCPP-vocab-20000721/.
35. Capability Exchange Using HTTP Extension Framework, http://www.w3.org/TR/NOTE-CCPPexchange.
36. R. Fielding et al., Hypertext Transfer Protocol—HTTP/1.1, IETF Standards Track RFC 2616, June 1999.
37. Transmission Control Protocol, IETF RFC 793, Sept. 1981.
38. J. Postel, User Datagram Protocol, IETF RFC 768.
39. H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RTP: A Transport Protocol for Real-Time Applications, IETF Standards Track RFC 1889, Jan. 1996.
40. C. Bormann et al., RTP Payload Format for the 1998 Version of ITU-T Recommendation H.263 Video (H.263+), IETF Standards Track RFC 2429, Oct. 1998.
41. J. Sjoberg et al., Real-Time Transport Protocol (RTP) Payload Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs, IETF RFC 3267.
42. L.-A. Larzon, M. Degermark, and S. Pink, The UDP-Lite Protocol, IETF Internet Draft, Work in Progress, Dec. 2002.
43. L.-A. Larzon, M. Degermark, and S. Pink, UDP Lite for Real Time Multimedia Applications, HPL-IRI-1999-001, April 1999.
44. J. Sjoberg et al., Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs, IETF Standards Track RFC 3267, June 2002.
45. S. Sen, J. Rexford, and D. Towsley, Proxy prefix caching for multimedia streams, Proc. INFOCOM '99, March 1999, Vol. 3, pp. 1310-1319.
46. J. Rexford, S. Sen, and A. Basso, A smoothing proxy service for variable-bit-rate streaming video, Proc. GLOBECOM '99, Vol. 3, pp. 1823-1829.
47. T. Yoshimura, T. Ohya, T. Kawahara, and M. Etoh, Rate and robustness control with RTP monitoring agent for mobile multimedia streaming, Proc. IEEE Int. Conf. Communications (ICC 2002), April 2002.
48. G. Cheung and T. Yoshimura, Streaming agent: A network proxy for media streaming in 3G wireless networks, Proc. IEEE Packet Video Workshop, April 2002.
49. C. E. Perkins, Mobile IP, IEEE Commun. Mag., 66-82 (May 2002).
50. R. Koodli and C. E. Perkins, Fast handovers and context transfers in mobile networks, paper presented at ACM SIGCOMM, 2002.
51. H. Matsuoka, T. Yoshimura, and T. Ohya, A robust method for soft IP handover, IEEE Internet Comput., 18-24 (March/April 2003).
52. T. Yoshimura, Y. Yonemoto, T. Ohya, M. Etoh, and S. Wee, Mobile streaming media CDN enabled by dynamic SMIL, Proc. WWW2002, Honolulu, May 7-11, 2002.
53. M. Tariq and A. Takeshita, Management of cacheable streaming multimedia content in networks with mobile hosts, Proc. IEEE GLOBECOM 2002, Taipei, Taiwan, Nov. 17-22, 2002.
54. M. Tariq, R. Jain, and T. Kawahara, Mobility aware server selection for mobile streaming multimedia content distribution networks, Proc. 8th Int. Workshop on Web Content Caching and Distribution, Hawthorne, NY, Sept. 29-Oct. 1, 2003.
55. A. K. Salkintzis, C. Fors, and R. Pazhyannur, WLAN-GPRS integration for next-generation mobile data networks, IEEE Wireless Commun., 112-124 (Oct. 2002).
56. V. K. Varma et al., Mobility management in integrated UMTS/WLAN networks, Proc. IEEE ICC '03, May 2003, Vol. 2, pp. 1048-1053.
57. 3GPP, Feasibility Study on 3GPP System to WLAN Interworking, Technical Report 3GPP TR 22.934, V6.1.0, Dec. 2002.
58. A. T. Campbell and J. Gomez-Castellanos, IP micro-mobility protocols, ACM SIGMOBILE Mobile Comput. Commun. Rev., 4(4): 45-54 (Oct. 2001).
59. S. Verma and R. Barnes, A QoS architecture to support streaming applications in the mobile Internet, Proc. 5th IEEE Symp. Wireless Personal Multimedia Communications (WPMC), Honolulu, Oct. 27-30, 2002.
60. S. Verma and R. Barnes, DiffServ-based QoS architecture to support streaming applications in 3G networks, Proc. 13th IEEE Symp. Personal, Indoor and Mobile Radio Communications (PIMRC), Lisbon, Sept. 15-18, 2002.
61. S. Verma and R. Barnes, A QoS architecture to support streaming applications in the mobile Internet, Proc. 12th IEEE Workshop on Local and Metropolitan Area Networks, Stockholm, Aug. 11-14, 2002.
62. F. H. P. Fitzek and M. Reisslein, A prefetching protocol for continuous media streaming in wireless environments, IEEE J. Select. Areas Commun., 19(10): 2015-2028 (Oct. 2001).
63. K. K. Leung et al., Link adaptation and power control for streaming services in EGPRS wireless networks, IEEE J. Select. Areas Commun., 19(10): 2029-2039 (Oct. 2001).
64. S. Dogan et al., Error-resilient video transcoding for robust internetwork communications using GPRS, IEEE Trans. Circuits Syst. Video Technol., 12: 453-464 (June 2002).
65. A. Boukerche, H. Sungbum, and T. Jacob, An efficient synchronization scheme of multimedia streams in wireless and mobile systems, IEEE Trans. Parallel Distrib. Syst., 13: 911-923 (Sept. 2002).
66. A. Majumdar et al., Multicast and unicast real-time video streaming over wireless LANs, IEEE Trans. Circuits Syst. Video Technol., 12: 524-534 (June 2002).
67. B. Zheng and M. Atiquzzaman, A novel scheme for streaming multimedia to personal wireless handheld devices, IEEE Trans. Consum. Electron., 49: 32-40 (Feb. 2003).
68. RDF Primer, http://www.w3c.org/TR/rdf-premier/.
69. WAP Push Architectural Overview, July 2003.
70. R. Rejaie, M. Handley, and D. Estrin, Architectural considerations for playback of quality adaptive video over the Internet, Proc. IEEE ICON 2000, Sept. 2000, pp. 204-209.
CHAPTER 10
MULTICAST CONTENT DELIVERY FOR MOBILES

ROD WALSH and ANTTI-PENTTI VAINIO
Nokia Research Center, Tampere, Finland

JANNE AALTONEN
Nokia Ventures Organization, Turku, Finland
10.1 INTRODUCTION
Multicast is the simultaneous delivery, or communication, of data between several parties. Multicast as a technology has been available in the Internet since the late 1990s, for applications ranging from a multiparty teleconference between a few friends to hundreds of thousands of people watching and listening to a webcast music concert. The delivery of content to many users simultaneously, using a shared multicast transport path and last mile, is attractive for several reasons. More efficient use of infrastructure and radio bandwidth is very important to mobile wireless network operators, especially since higher-data-rate rich-media services become feasible without increasing the total network capacity. Users may benefit from consuming shared content in both technical terms (higher data rates and faster downloads) and social terms (content demand increasingly correlates with common interests for persistent communities and dynamic ad hoc groups).
Initially driven by voice services, wireless data communications have become global, with continued growth over many years. The resulting proliferation of media- and business-oriented handsets to meet many levels of user expectation provides an excellent basis on which multicast can achieve its main benefits and so drive more exciting services and commercial endeavors. Two initiatives are currently progressing through development, standardization, and commercialization: IP datacast (IPDC) and multimedia broadcast multicast service (MBMS). Although these originate from different backgrounds (digital television broadcast and third-generation cellular telecommunications, respectively), they both hold the promise of providing true multipoint services to mobile and wireless users, with all the benefits and opportunities that this brings.

Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.

10.1.1 Chapter Overview
Several aspects and features of the IPDC and MBMS systems originate from a mutual set of needs, and, as a result, some of the system aspects are common. For this reason, this chapter first gives an overview of the salient aspects of multicast, then considers the generic IP multicast system as an abstract entity and describes some of the features of IP multicast as a common enabling technology, before presenting the individual aspects of both IP datacast (IPDC) and multicast in third-generation cellular (MBMS) in more detail. The chapter finishes by summarizing these systems and the next steps we can expect to see in the natural development of multicast content delivery systems for mobiles.

10.2 MULTICAST OVERVIEW

10.2.1 The Justification for Multicast
Multicast is efficient for group communications. The delivery of data to a group of users is preferable to multiple individual data connections between each pair of users, since no part of a multicast system need duplicate actions or data. A simplified view of how multicast can provide efficiency in the data delivery infrastructure is shown in Figure 10.1, where a single sender is delivering the same datastream to several wireless receivers. The figure also illustrates the communications efficiencies that apply to fixed and wireless links, as well as the impact on the three-mile system:

- The first mile, between the servers or senders and their respective network connections, typically the Internet with access provided by an Internet service provider (ISP) of some kind. Providing sufficient quality of service with reasonable connection costs is of paramount importance here.
- The middle mile, over which data passes from the remote source to the remote destination, analogous to the major ISP core networks of the Internet. Data traverses several links and devices, which it shares with other data streams, so high-volume data flows and well-behaved connections, for instance employing friendly congestion control, are the chief concerns.
- The last mile, between the user (or her/his network) and her/his communication access point, such as her/his local ISP with fixed dialup or a cellular radio link with a wireless operator ISP.

The costs and disadvantages of multicast are derived from the same axiom, that multicast is group communication. Thus ownership of costs, intellectual
Figure 10.1 A simple unidirectional end-to-end system showing the relative efficiencies of routing by (a) multicast and (b) unicast.
rights, security, network usage and congestion, and communications management functionality is not the same as in point-to-point communications, which commonly consist of two parties with well-understood roles (e.g., server-client). This functionality must be distributed among the group, and there are many "best" ways to do this depending on the application and its usage. For a massive webcast, it makes sense that any special relationships exist only between the sender and each of the receivers, whereas for a teleconference there is a need for more of a peer-to-peer relationship. In all cases, the provision of group functionalities is key.
Although IP multicast has been available as a basic technology in the Internet for many years, it has not been as widely deployed or used as point-to-point methods. There are several technical and commercial reasons why this has been so, leading
to a widespread acknowledgment that multicast has not yet been successful in the Internet. This perception subsequently developed into a chicken-and-egg problem: vendors manufacture what operators have committed to order, and operators buy what vendors have committed to manufacture, so multicast-optimized hardware and software have not been commercialized or made as widely available as their point-to-point equivalents. However, the technology and marketplace have continued developing during the years of multicast availability, and the main obstacles to widespread multicast deployment are now well understood and either solved or being actively worked on. In particular, well-behaved congestion control mechanisms for multicast transport will allow the coexistence of point-to-point and multipoint data traffic over the same networks and thus allow a natural migration towards fully IP-multicast-enabled ISPs. Group management and control, especially in terms of security techniques, has also undergone significant development, notably in the Internet Engineering Task Force (IETF) Multicast Security [1] and Multicast and Anycast Group Membership [2] working groups. In an increasing number of cases the advantages of multicast are becoming very attractive. Narrowband users cause less congestion in core networks than do broadband users, and increases in the number of broadband connections to homes and elsewhere mean that broadband services are feasible in the last mile. This requires that the first and middle miles scale to the increasing demand for rich media services, an area where multicast has a clear advantage over unicast. The shifting paradigm from "Internet by PC" toward a more ubiquitous mobile Internet leads to changing Internet usage patterns, and many of the exciting next-generation mobile data services, such as mobile TV, are more technically and commercially attractive using one-to-many techniques.
All of these factors work in favor of growing deployment of multicast generally in the Internet and especially in the wireless mobile multicast systems that we shall describe.
10.2.2 Three Perspectives on Multicast
Multicast generally describes multipoint, or multiparty, exchange of information; the specific definition of multicast derives from its usage. Taking a simple layered protocol analogy, we can derive three perspectives on multicast usage:

- Application Layer Multicast. Data or content is shared between many users simultaneously, with applications understanding the multiparty aspects of the service. Email with multiple recipients is an example of application layer multicast that uses unicast for delivering the data packets.
- Network Layer Multicast. The network infrastructure optimizes the routing of data by delivering packets over a link only once, even though they may be destined for many recipients. Packet duplication is thus needed only where different receivers are accessible via different links.
- Physical Layer Multicast. The basic link technology determines whether the physical layer is shared or dedicated. For instance, digital subscriber line (DSL) links deliver IP multicast packets point-to-point, whereas Ethernet segments broadcast the IP multicast packets to all devices on the link.

This list is not intended to be an exhaustive analysis, although it should provide an insight into the sorts of content distribution application environments that can benefit from multicast technologies. The term multicast is commonly used to imply network layer (i.e., IP multicast) techniques, although the total value of multicast to users and to content and network providers is derived from a combination of all of these layers.
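To make the network layer perspective concrete, the sketch below uses the ordinary Berkeley sockets API, as exposed by Python's standard library, to join an IPv4 multicast group and to send a datagram to it. The group address and port here are arbitrary illustrative values, not taken from any system discussed in this chapter:

```python
import socket
import struct

GROUP = "239.1.2.3"  # arbitrary example from the administratively scoped range
PORT = 5004          # arbitrary example port

def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Return a UDP socket that has joined an IP multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # IP_ADD_MEMBERSHIP signals group membership (via IGMP) to the network
    # layer, which then replicates packets only on links that have members.
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def send_to_group(payload: bytes, group: str = GROUP, port: int = PORT) -> None:
    """Send one datagram that reaches every joined receiver."""
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # TTL 1 keeps the example packet on the local link.
    tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    tx.sendto(payload, (group, port))
    tx.close()
```

A single send_to_group call replaces the per-receiver copies that unicast would require, which is exactly the saving that Figure 10.1 illustrates.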
10.2.3 Multicast as a Communication Technique

Unicast describes point-to-point (one-to-one) communications, whereas multicast includes three theoretical multipoint communications subcases (as illustrated in Fig. 10.2):

- Point-to-multipoint (p-t-m). A single entity communicates with two or more others. Internet radio [3] is an example, where streamed audio is sent from one server to many clients, providing application layer multicast (simultaneously shared content) and using either unicast or multicast IP at the network layer.
- Multipoint-to-multipoint (m-t-m). Two or more entities communicate with two or more other entities. Conferencing is a well-used example of this, such as for voice over IP [4].
- Multipoint-to-point (m-t-p). Many entities start communications with a single entity. Examples of this subcase are far fewer than of the previous two, especially for user services. A network protocol example is the DHCPDISCOVER message used in DHCP [5].
Figure 10.2 The three fundamental multicast subcases: (a) point-to-multipoint; (b) multipoint-to-multipoint; (c) multipoint-to-point.
These subcases imply that the originator of communications sends the initial message to the other party [i.e., originator(s)-to-other(s)], which is the distinction between the first (p-t-m) and third (m-t-p) cases. At this level there is no distinction between the functions of these parties; for instance, peer-to-peer, server-client, and router-host communications are all included. In some scenarios a host can act as both a sender and a receiver, and in others it will assume only one of these roles. For content distribution, the point-to-multipoint case is already extensively implemented for services, and multipoint-to-multipoint is undergoing significant efforts, especially for conferencing and multiplayer gaming. Our main focus in this chapter is on the point-to-multipoint case, as it most accurately defines the services provided by the two mobile wireless systems we shall describe, IPDC and MBMS. However, the other two cases are not insignificant, and we can expect further developments in those areas.
The distinction between multicast and broadcast is not well resolved in the literature on this subject, so we will take the approach that broadcast is a subset of multicast. The characteristics of broadcast are that it is a unidirectional communication, involving no return messaging (at least none in band), and transmitted to all users on a link. Multicast also includes the cases of bidirectional signaling and of targeting selected groups, which can limit reception to only some users on a link, or across several links. What is implied by the term broadcast depends on which of the layers discussed above employ it. For instance, it is feasible to deliver IP multicast packets over broadcast radio, and we consider this to be a use of multicast at the network layer, although broadcast would also be a suitable description if no return signaling were sent on the network layer. Another useful concept is the difference between active and passive hosts.
An active host will send and receive messages, while a passive host will only receive. In the broadcast scenario, all receiving hosts are passive, as return signaling is not used. In the more general multicast scenario, all or some of the receiving hosts may be active.

10.2.4 Multicast Applications and Services
Selecting and developing multicast applications to deliver services and content should start with a simple question: "Is there a clear benefit in using multicast?" The answer must take into account the end-to-end characteristics of the service in question. Arguments in favor of multicast tend to be along the lines of "We need scalability to a large number of users," "Users must receive the content at the same time," or "This content is interesting to a large number of people." Other factors that need to be considered are the media (i.e., content source), format, and expected consumption. For instance, streaming of video is natural for real-time consumption, although file delivery may provide more optimal, and opportunistic, use of bandwidth if the video is to be consumed later than it is delivered; the selection of which delivery method to use should take into account the expected usage of the whole receiving group. On the other hand, file delivery may be the
natural choice for executable objects, such as games software. In all cases, the choice of using multicast delivery depends on the audience size, location, and consumption habits of the user group(s) in question. Figure 10.3 shows some of the alternatives for this source-delivery-consumption chain. A real-world application may well combine more than one of these alternatives, as in a live news channel augmented with cached HTML pages and video clips.
The basic metric for the selection of multicast over unicast is the concept of multicast gain, that is, the expected quantitative benefit derived from using multicast [42]. A single value for systemwide multicast gain is generally very difficult to compute, as gains in radio and fixed-line links, as well as correlation of user interests and variability of the service mix, would make such a figure a unique balance of many assumptions, and thus more academic than pragmatic. The function used to calculate multicast gain therefore need only be as complicated as required: if only a certain system element or domain is problematic, the analysis can be limited to that. A typical example is the scarce radio bandwidth of mobile wireless systems [43]. For example, if a live video transmission were expected to be delivered to 100 users on the same radio bearer, there may be a multicast gain of 100, although some interference-sensitive radio technologies could reduce this to, for example, 25. (Note: Dedicated closed-loop power-controlled radio channels in particular can be more efficient, using a small number of dedicated radio channels, than a shared open-loop power-controlled channel.)
The composition of a multicast gain metric needs expert consideration from system and applications developers. However, there are some general issues that form a good foundation for this analysis. The importance of each of the following parameters can be evaluated:

- Bandwidth Capacity and Congestion.
In each link (e.g., air interface, network infrastructure, terminal capability) the cause and effect of congestion can be analyzed. Multicast with a congestion control mechanism may perform very well if users are interested in at least some common content. It is worth noting that unicast, broadcast, and multicast (with a variety of uplink methods) pose different congestion problems, and do so differently for uplink and
Figure 10.3 Common alternatives for the source-delivery-consumption chain.
downlink. For instance, the number of users helps to establish whether broadcast is more efficient than unicast. On the downlink, multicast can be even more efficient than broadcast (as the data are not transmitted to "empty" cells), although additional route state changes and uplink signaling render (uplink-capable) multicast less scalable than broadcast only.

- Application Requirements. Each application will have its own requirements, and the service mix of a certain network or provider will determine the priority given to each. This will impact the value of multicast as a data delivery method in the relevant environment. For instance, delivery synchronization (i.e., limiting jitter, the variation in time between several receivers receiving the same content) can be particularly important for stockmarket alerts and online gaming. Another example is massive file delivery of software updates, where massive scalability and reliable transfer of executable code are both essential.
- Consumption Habits and Timing. Individual consumption habits, and especially their timing, play an important role. Where there is an existing usage paradigm (such as news at 6 P.M.) there is a strong case for a shared real-time transmission. When there is little user activity (generally true at night), common offline content, such as trailers and advertisements, may increase the total per-user data quantity available for push delivery. Also, polarized habits such as TV channel hopping¹ and passive listening² to the radio may vary the requirements on delay and sustained bandwidth from case to case. In any case, the time required to receive the content, combined with the timing of consumption, is a critical factor.
- Usage Shaping. If there is a distinct advantage in shaping user consumption, then this may also affect multicast gain. For instance, the related revenues as well as the scalability of the system may influence the balance between video on demand and live TV.
- Deployment Investment. The cost and effort of deploying and maintaining a technology may play an important role. Where a service provider has an existing point-to-point infrastructure, upgrading to multicast may require a threshold of new users and revenues to be met first. On the other hand, the scalability of multicast may work in its favor for early deployment, as more users may be provisioned with rich media services for generally lower infrastructure requirements; such a system may be the basis for incremental upgrading to increase point-to-point capacity as revenues are realized.
¹ The term channel hopping originates from the widely known user behavior of switching rapidly between television channels, usually to establish whether there is interesting content without consulting a service guide. The term also describes rapid user switching between content flows in general, and sets requirements on timely establishment of new flows to meet user expectations, as well as timely teardown of old flows to avoid network congestion.
² Passive listening is a common user consumption behavior for AM/FM radio systems, where a user "gets on with other things" while listening to music or voice broadcast in the background and does not change the radio station (i.e., the content). This contrasts with active listening, where listening to the service would be a user's main, or only, activity.
Additionally, there needs to be a technical balance between the use of broadcast (passive receivers) and return path signaling, as the latter reduces scalability and increases complexity but may be used to improve reliability, security, and usage reporting. Furthermore, the commercial environment tends to add complexity to the decision: issues such as privacy, content distribution rights/licensing, router capability, and provisioning may all play an important part in the balance between unicast and multicast. Although there is no hard limit on which applications multicast supports or which are particularly suited to multicast, a brief listing of multicast applications can help set the context. There are already many good resources that list multicast applications, and Table 10.1 [6] provides a suitable foundation.
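As a toy illustration of the multicast gain idea discussed above, the sketch below compares the per-bearer transmission cost of unicast and multicast delivery. The radio_efficiency derating factor is a hypothetical stand-in for the interference effects mentioned in the example, not a figure from any real system:

```python
def multicast_gain(receivers: int, radio_efficiency: float = 1.0) -> float:
    """Toy multicast gain for a single shared radio bearer.

    Unicast sends one copy of the stream per receiver, whereas multicast
    sends a single shared copy, so the ideal gain equals the number of
    receivers. `radio_efficiency` (0..1] is a hypothetical derating factor
    for interference-sensitive radio technologies, where a shared open-loop
    channel is less efficient than dedicated power-controlled channels.
    """
    if receivers < 1:
        raise ValueError("need at least one receiver")
    unicast_cost = receivers                  # one transmission per receiver
    multicast_cost = 1.0 / radio_efficiency   # one shared, derated transmission
    return unicast_cost / multicast_cost

# The text's example: 100 users on one bearer gives an ideal gain of 100,
# which a hypothetical derating factor of 0.25 reduces to 25.
print(multicast_gain(100))        # 100.0
print(multicast_gain(100, 0.25))  # 25.0
```

In practice such a function would only model the problematic element or domain, as the text notes, rather than attempt a systemwide figure.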
10.2.5 Mobile Wireless Multicast

Cellular communications has traditionally focused on dedicated voice call services to mobile wireless handsets, with more recent commercialization of packet-switched services, including Internet access. As the convergence of cellular and Internet communications is realized, the need for compelling rich-media content and services for next-generation mobile users will be extremely important to the success of this convergence. Multicast has the potential to offer media-rich services to multiple users at a capacity cost less than serving each user individually, while potentially providing a better quality of service to all. As such, it is a very interesting complement to the dedicated services of second- and third-generation (2G and 3G) cellular communications in terms of the network optimizations and content provisioning benefits it has to offer. The application of, especially, IP-based multicast services to the mobile wireless environment must cater for the demands of this environment. Scalability for an
TABLE 10.1 One Example List of Multicast Applications

One-to-Many Applications: scheduled audio/video distribution; push media; file distribution and caching; announcements; monitoring.
Many-to-Many Applications: multimedia conferencing; synchronized resources; concurrent processing; collaboration; distance learning; chat groups; distributed interactive simulations; multiplayer games; jam sessions.
Many-to-One Applications: resource discovery; data collection; auctions; polling; jukebox; accounting.

Source: Quinn and Almeroth [6].
increasing number of users at feasible infrastructure costs, loss of packets and coverage due to radio propagation and interference characteristics, and the rapid or gradual migration of users between access points as they perform cellular handovers are all typical considerations that multicast must address in the mobile wireless environment. Both of the real-world systems we shall describe, IPDC and MBMS, must contend with these issues and provide solutions to the problems that their developers have prioritized. Multicast (one-to-many) services have long been very successful in the broadcast world, providing well-understood television and radio services to typically stationary (or car-based) receivers. Even in these systems, mobility and access using handheld, battery-powered terminals stretch the basic technologies to, and beyond, their limits. Thus a three-way broadcast-cellular-Internet convergence scenario could see the greatest benefits to all three worlds.
10.3 THE GENERIC IP MULTICAST SYSTEM

10.3.1 Common Multicast System Aspects
Both of the real-world systems described later, IPDC and MBMS, bring terminology and assumptions that are specific to the broadcast and telecommunications worlds, respectively. However, with a common language and basic analysis, it is evident that they share much in common. In this section we provide a basic system reference model with which to compare architectural aspects of IPDC, MBMS, and future variations, and we describe some of the aspects that are conceptually equivalent in both systems even though the detailed solutions may differ. In particular, both systems utilize IP multicast from IETF standards, which leads to a number of common system aspects. Furthermore, both systems specialize IP multicast to provide a unidirectional shared radio downlink (with separate individual uplink channels in the case of MBMS multicast mode). The unidirectional link poses particular problems for media discovery and connection management.
10.3.2 A Reference System Model
Figure 10.4 gives a model of a generic end-to-end system that can be used as a baseline to compare with both IP datacast and MBMS systems [7]. The purpose of this diagram is to spell out the principal domains and interfaces that may be interesting to us (the names are rather arbitrarily chosen). Each domain may be seen as a single component with a set of functionality. In practice, each domain will consist of multiple logical and physical components, such as services and servers, and provide the sum of the functionality of those components. Each interface provides a set of services between two domains. Although the interfaces contain no functionality themselves, they determine the functional requirements of the domain components.
Figure 10.4 Reference model of a generic end-to-end multicast system. (The model comprises the Content, Service Delivery, Core, Access (unidirectional and bidirectional), Client Platform, and Client Application domains, connected by the Content, Service, ServiceNet, Core, CoreNet, AccessNet, Broadcast, Interaction, Client Control, and Client/Network interfaces.)
The framework fits the "3 Cs" business approach:

- Content: content and service delivery. Traditionally this role is fulfilled by a content provider and/or aggregator.
- Connection: core and access networks. Traditionally provided by a network operator.
- Consumption: client platform and application software. Traditionally enabled by end-user equipment and applications.

Typically, a business model would deploy each of these domains separately or in groups for a single operator (or provider) in the value chain. There may be common functions required by different operators, and so domains may overlap; this may lead to redundancy and competition. In all cases, each domain provides a distinct set of functions and components. For example, a content provider may operate a streaming server (and therefore part of a service delivery domain) while keeping his/her business model separate from that of a service provider, which may be aggregating streams. This framework implies complete scalability: unlimited numbers of each domain working with unlimited numbers of other domains. Furthermore, it does not impose business model constraints.
10.3.3 Three-Platform Services

As discussed earlier, the selection of multicast technologies is bound to the selection of services and applications a system will support. According to our reference model, content is sent from a service delivery domain. In practice, this means that we must employ some kind of service system that is the source of multicast services, and possibly the source or repository of the content, too. In other words, a service
system is a collection of content servers. As any service system is generally built for purpose according to the service mix required by the system operator(s) in question, the exact content and servers are impossible to define at a generic level. However, three general platforms are required for the provision of higher-level services that are common to IPDC and MBMS multicast systems: streaming, filecast, and media discovery.

- Streaming. This is for the streaming of data from local storage and live sources. Large audiences can be streamed to at rates from a few tens of kilobits per second to several megabits per second, supporting rich media streaming particularly suited to video and audio, for instance providing mobile television. Streaming is also characterized by timely delivery (consistent data rate and low jitter) being more important than perfectly reliable delivery.
- Filecast. This is for the simultaneous distribution of files, otherwise known as discrete media objects. For some applications, close to real time may be desirable, such as when images are to be displayed and synchronized with real-time streaming media. Many applications interested in file transport also demand reliable (i.e., error-free) delivery of files; thus reliable multicast protocols are especially important. Use cases exist for many combinations of the alternatives: massive or small-group distribution, large and small file sizes, real-time or offline delivery, scheduled or on-demand/spontaneous content, best-effort and reliable transport, one-off and repeated/carouselled files.
- Media Discovery. IP multicast systems announce their content and services in advance, and during the multicast sessions that deliver them. This enables users to select and locate end-user services for consumption, and to access them when the time comes.
Generally, media are described using a description syntax and semantics, such as the session description protocol (SDP), and delivered using one or more transport protocols, such as the session announcement protocol (SAP) or the hypertext transfer protocol (HTTP). The unidirectional nature of structured wireless multicast networks and the lack of a massively scalable uplink mean that unidirectional delivery mechanisms are often preferable (possibly exclusively) to those requiring bidirectional connections. For this reason, reliability and redundancy in delivering media descriptors is important and faces many of the same issues as filecasting.
Some multicast applications (see Table 10.1) are easily based on one of these platforms, whereas others require additional development, such as the choice between streaming and small-file filecast for a messaging or chat service. It should be noted that these three platforms represent a feasible baseline, based on implementation experience, rather than an exhaustive analysis of all the options.
Internal (e.g., walled garden³) and external (e.g., public Internet) content may be available, and the choice of servers, proxies, and digital rights technologies will

³ Walled garden services and content are those that are available from only a limited number of providers and operators based on proprietary agreements.
reflect this. In many deployments there is a need for more than one service system with different functionalities; for instance, the file and management servers of a content provider may feed the content acquisition servers of a service provider, and this service provider may subsequently stream content to a service aggregator or network operator who delivers their service mix to wireless users.
The use of IP multicast provides a toolkit of existing IP stack protocols, both standardized and works in progress in the IETF. The maturity and feature set of IP-based protocols varies for each of the platform services.

10.3.3.1 IETF Streaming Several proprietary streaming protocols, codecs, servers, and players are on the market, but the momentum behind a number of open standard options and their existing implementations means that real-time transport protocol (RTP) [8] delivery is becoming the de facto choice for streaming transport. Related protocols, such as the real-time control protocol (RTCP, part of the same standard as RTP) and the real-time streaming protocol (RTSP), provide additional features to a subset of applications, especially where a return link is available. RTP takes a very flexible approach to the support of multiple and future media codecs, as it requires a payload format to be specified for each payload type it supports. Many are already standardized, and the work on MPEG-4 payload formats promises to solidify a good selection of completely open standard video and audio streaming solutions.

10.3.3.2 IETF Filecast Many open standard and proprietary solutions have been proposed for filecasting. However, efforts to standardize reliable multicast transport have resulted in the IETF chartering a working group on RMT [9] to produce, among other things, specifications that meet the reliable unidirectional delivery requirements.
In particular, asynchronous layered coding (ALC) [10] offers a building block for producing a robust file delivery protocol with sufficient congestion control for IPDC and MBMS environments, and work is in progress on file delivery over unidirectional transport (FLUTE) [11], a protocol instantiation that fully specifies a filecast protocol based on ALC. Although this extremely important work is still in progress, with deployment results needed to complete the IETF standardization process, it is anticipated that a single open standard for IP-based filecast may be available in time for widespread IPDC and MBMS deployments.

10.3.3.3 IETF Media Discovery
There are primarily three delivery scenarios for media discovery in wireless multicast terminals: unidirectional multicast only, bidirectional unicast only, and a combination of both unicast and multicast delivery. Existing IETF (and other) protocols provide solutions to the first two options. The session announcement protocol (SAP) [12] multicasts single session description protocol (SDP) [13] descriptions of media so that user applications can understand the available services sufficiently to locate them on a multicast link. SAP has seen some global deployment but has well-understood flaws that have kept it from wider deployment and from progressing beyond an
340
MULTICAST CONTENT DELIVERY FOR MOBILES
experimental standard: the lack of reliability, the lack of announcement prioritization, and outdated authentication mechanisms. Plenty of unicast protocols exist, with HTTP/TCP almost ubiquitously deployed. The addition of a common description scheme for multicast services over both multicast and unicast transport is essential to allow the combination of these transports and operator-tailored variations. The multitude of existing description formats (MPEG-7, SDP, SDPng) makes it essential to provide a basic framing that enables the delivery of metadata independent of syntax, to describe the basic data model of how elements of metadata relate and should be used, and to specify the system-specific and application-specific metadata formats for the range of multicast services to be provisioned. Common global standards for the basic framing and baseline data model are essential to enable interoperability of IP-multicast-based systems. The IETF has chartered the MMUSIC (multiparty multimedia session control) working group to build on its prior work, including SAP and SDP, and compile a framework that reuses suitable existing IETF protocols and newly specifies the missing blocks. The work item as a whole is called Internet media guides (IMG) [14], which promises to provide a baseline toolkit for each of the described delivery requirements.

10.3.4 IP Multicast Networking Procedure
Several general steps are required to provide and receive wireless multicast services; most are generalized in Figure 10.5. The stages in broken-line boxes are optional but widely used, and are generally less well supported by openly standardized methods. Content creation, and the simple and complex services created from it, result in a ready source of content in formats compatible with the user equipment capabilities. The scheduling and agreements between network operators, intellectual property rights owners, and service aggregators have a basic impact on the technology, generally limited to formats, digital rights management, server choices, and control messaging. Service advertising may be out of band of the system (e.g., billboard advertising) or may be electronically available through Web links or in-band unsolicited service announcements. Its purpose is to arouse user interest and subsequently encourage users to register for services. This registration may involve financial information exchange, such as payment for subscriptions or authorization to bill for services accessed later. All of these steps may be iterative, such that registration for a general bundle of services may or may not imply registration to specific services or content (generally referred to as media). If service media are to be secured (encrypted, authenticated, etc.), sufficient security information to access the media must either be given at registration or made available as a related service. In other words, after registration a user has both the right to access the service and the security data needed to access it, with or without further security messaging. Media discovery occurs as described earlier. It should be noted that media discovery communications behave like any other IP service on multicast bearers, and access to media announcements requires running through each of the same
Figure 10.5 Generalized procedure to provide and receive multicast services. (The network side comprises content creation; service creation, scheduling, and agreement; service advertising; media discovery/announcement; backbone bearer (routing) configuration; radio access network bearer configuration; and multicast content delivery from the source. The user side comprises service selection, subscriber registration, service registration, media discovery, multicast group/channel joining, radio channel access, and multicast content consumption.)
steps for access to this service. Media discovery can also be out of band, such as using a point-to-point communications channel or entering details by hand into the user device. For some services, such as the media announcement itself, fully autonomous (or preconfigured) discovery may be useful, for example using well-known (or standardized) session parameters for announcements. On the network side, both the fixed IP-routed (and IP-switched) backbone network infrastructure and the radio access network infrastructure must be configured. The actual order in which this occurs, and the timeline with respect to the complementary joining and radio access functions on the user device, may vary depending on operator preferences. For instance, user joining may be required to propagate through the network before any backbone or radio bearers are set up, to ensure no bandwidth is wasted; for existing groups, and for broadcast in general, it may instead be desirable to set up the data transmission all the way to the radio link regardless of the state of (new) user devices (and possibly also to make the joining procedure a device-local feature). Preconfiguring the network in this way is suitable for larger audiences and eliminates (or reduces) the need for an uplink channel, thus increasing scalability. For maximum interoperability, protocols and technologies below the IP layer should be autonomous, such that radio access network bearer configuration and subsequent access by a user do not affect the higher-layer signaling. To ensure this, radio and link layer parameters can be signaled at the radio link in
question. This may involve paging, notification, or the sending of service information tables specific to the radio technology. Both MBMS and IPDC systems provide their own access-specific signaling to allow user devices to correctly associate radio and link channels with specific multicast services (IP addressing), so these parameters need not be delivered by a higher-layer media announcement mechanism. Several protocols are available for the routing and switching of IP multicast packets on the Internet [15], including the option of tunneling multicast streams within unicast streams over one or more links (where IP-routed multicast is not feasible). Generally, IP routing is considered superior to forced switching, as it is more autonomous and scalable and requires less administrative (i.e., human) configuration and maintenance. However, for unidirectional delivery with well-known bandwidth constraints, it can be highly desirable to provision the links in advance with well-defined quality of service parameters; thus the use of IP-switched techniques, especially tunneling, is popular in wireless multicast systems. Demand for wireless multicast services will stimulate development, and it is feasible that the need for unicast tunneling may subside in the future if the perceived benefits of routed IP multicast outweigh the upgrade cost. Figure 10.6 illustrates the difference between routed and tunneled multicast in a broadcast service scenario, making three important points: (1) tunnels add a layer of complexity and overhead, (2) tunnels may cause duplicate data on the same link (the clouds in the figure represent subnets), and (3) tunnels guarantee that data pass through certain control points on their path.

10.3.5 Additional Aspects of the Mobile Wireless Environment
Content delivery to mobiles relies on wireless radio techniques, and these place additional requirements on any mobile multicast system. The following sections describe the aspects of wireless multicast that are common to both IP datacast and MBMS systems, including issues generally applicable to any wireless multicast system.

10.3.5.1 Mobility and Movement of Users
The mobility of users creates two further needs: untethered access to the network services and consistent service while users are moving. The first need is inherently met by wireless data communications, but it also forces user devices to be battery-powered under normal use. Thus electrical power consumption (both peak and average) must be carefully considered and minimized where feasible. This affects both the physical design of the equipment and the protocol design: the ability to sleep, that is, to shut down certain software and hardware functions to reduce power consumption for short and longer periods, must be factored in. Also important is the ability to receive passively on one interface without maintaining an uplink channel. Consistent service while users are moving implies robust reception as a user moves within the coverage area of its current access point, and handover between wireless access points. Graceful degradation of service quality is also desirable if the quality of
Figure 10.6 IP multicast delivery techniques: (a) IP routed and (b) IP switched by tunnel.
service cannot be maintained. Robust reception is normally a combined function of reliable delivery protocols and the radio modulation technique, such as its ability to deal with Doppler effects and Rayleigh and fast fading. Handover, also known as handoff, is one of the essential functions supporting user mobility in a mobile communications network, as it provides a means to maintain data traffic connections. It deals with setting up new connections and releasing (or maintaining) old connections to network cells as a mobile terminal moves from the radio coverage area of one cell to another. Cells are generally the radio coverage areas of basestations (i.e., the radio access points). Handover in wireless cellular systems is normally a three-phase process: (1) measurement (measurement criteria, measurement reports), (2) decision (algorithm parameters, handover criteria), and (3) execution (handover signaling, radio resource allocation). To illustrate with an arbitrary example: measurement is nearly continuous (e.g., sampled every 100 ms), decisions are regular (e.g., assessed every 5 seconds), and handover is infrequent (depending on UE usage, e.g., on average every 20 minutes). Handover execution is typically initiated by a decision based on the measurement of
certain criteria (e.g., signal quality between the basestation and the mobile device). There are many good descriptions of this process for point-to-point wireless communications, and it is well covered for 3G networks by Kaaranen et al. [16]. Mobility for multicast services differs from point-to-point services in that the multicast transmission is likely to be delivered to users in several cells as part of the same session, individual members of multicast groups are likely to move somewhat independently of each other, and large group membership implies greater signaling overhead for any uplink. For these reasons there are two generalized types of handover (HO) for multicast services: active and passive. Active HO requires specific signaling from the user device to the network (uplink), whereas passive HO requires the network to provide sufficient information on the downlink for a user device to get the service in the new cell without uplink signaling. Obviously, passive HO requires that the new cell already be configured to deliver the service, whereas active HO also enables inactive cells to be reconfigured and then included in the transmission area. Thus passive HO is less versatile but scales better to large user groups and is particularly suited to multicast and broadcast to passive users. Passive HO can be implemented by using sufficiently similar service parameters in adjacent cells, possibly supplemented by access-specific notification signaling, so that a terminal can learn whether or not the same service is available in the new cell, either before or after the handover. This has clear implications for the user interface, so that users can understand and deal with preconfigured service coverage. On the other hand, active HO may be used to serve smaller user densities (zero to a few users per cell), but this spreads the radio resource cost between fewer users and may carry a premium monetary cost.
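To make the passive/active distinction concrete, the following Python sketch (using hypothetical data structures, not drawn from any standard) shows the device-local check a terminal might perform on entering a new cell: if the cell's broadcast service table already carries the parameters of the service being received, the terminal retunes locally (passive HO); otherwise it would need uplink signaling (active HO) or lose the service.

```python
# Hedged sketch: service tables and parameter names are illustrative only.

def passive_handover(current_service, new_cell_service_table):
    """Return the parameters to retune to in the new cell, or None if
    the service is not transmitted there (active HO would be needed)."""
    params = new_cell_service_table.get(current_service)
    if params is not None:
        return params  # same service configured: retune, no uplink needed
    return None        # service absent: fall back to active HO or lose it

# Example: service "news-tv" is configured in cell B but not in cell C.
table_b = {"news-tv": {"multicast_addr": "232.0.0.1", "channel": 21}}
table_c = {}

assert passive_handover("news-tv", table_b)["channel"] == 21
assert passive_handover("news-tv", table_c) is None
```

The key design point is that everything the terminal needs is already on the downlink, which is why passive HO scales to arbitrarily large (and purely receiving) audiences.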
For both passive and active multicast HO, the concept of area is extremely useful. An area may be used to define two useful parameters: the geographic area configured to transmit the service, and the geographic area authorized to deliver the service. In the passive and broadcast cases these are generally the same, but for active multicast the authorized area is generally larger, as only cells with sufficient users will be configured to transmit the service, to conserve radio resources. Various other intermediate area definitions may be useful. For instance, a network would generally be described by a number of areas, which could be cells, groups of cells, a subgroup of cells covering a known geographical area (e.g., a certain city), or the whole radio network. Figure 10.7 illustrates this by example. In this way a network operator can offer a certain well-known area for a service rather than a list of obscure cell names, whose radio coverage is often dynamic in practice. It also aids passive handover, as a terminal can expect the same services to be available in all the cells of a certain transmission area. This technique enables various parts of the network to abstract others; for instance, 3G core networks generally understand 3G radio access networks in terms of routing (and other) areas, and only the radio access network is assumed to have knowledge of the specific cellular topology. Another feature of multicast handover where cells geographically overlap is that the availability of services is more important than the absolute signal quality in a cell. For instance, for point-to-point communications a
Figure 10.7 An example of the relationship between cells and areas. (The figure shows incoming data streams entering a data network that feeds cells 1 to 6; the cells are grouped into areas A, B, and C within the overall network coverage.)
user would generally prefer to be situated in the cell with the best signal quality, whereas if this cell is not authorized or configured to transmit the multicast services a user desires, then a lower signal quality (with more transmission errors) is preferable if the user can still get the services he or she wants. Handover to cells based on both individual and group communications requirements is complex and is normally solved by selecting the technique suited to the majority case: best signal for MBMS, where individual communication is paramount in 3G systems, and best service selection for IPDC, where one-to-many services dominate. A common aspect of the IPDC and MBMS systems is that they provide handover at layers below IP. The trend toward all-IP in the various mobile systems, which has already had a massive impact on IPDC and MBMS, indicates that multicast mobility at the IP layer may also be considered. It would be a reasonable option for offering bearer-independent mobility suited to heterogeneous and hybrid network systems. However, whereas IP protocols for unicast mobility are fairly well developed in the IETF and can expect widespread acceptance and deployment, mobile IP for multicast is not well developed in standards and promises to open a few technical and commercial debates in the future.

10.3.5.2 Errors in Radio Transmission
All communication systems can be subject to data loss. The fixed/wired Internet generally experiences packet loss due to congestion (excessive data load) prompting routers to drop packets and thus reduce congestion. However, wireless links suffer the majority of their data loss on the radio link due to radio phenomena such as fading and interference. Data loss on radio links is more likely to occur in bursts for the duration of the interference (or other cause) rather than as pseudorandomly dropped packets, as some IP-based protocols assume, such as TCP (Transmission
Control Protocol). Thus, it becomes important to provide reliable transmission that works well in the presence of the bursty errors characteristic of radio transmission. There are essentially two methods for reliable transport in general: increased coding of the data to provide redundant information that can be used to reconstruct the original data, and retransmission of data once loss has been detected. These are not exclusive and can be combined. In practice, redundant information takes two forms: forward error correction (FEC) and unsolicited repeat transmissions. FEC usually adds additional data to the transmission, mathematically calculated from the original data to enable full reconstruction up to a limit of lost data. The resulting FEC data may be transmitted in addition to the original data or instead of it, depending on the scheme used. Hamming codes and Reed-Solomon codes are common examples of FEC and are applied at several communications layers (in fact, channel coding on most radio links reduces the source data to transmitted data ratio). Unsolicited repeat transmissions achieve exactly the same thing as FEC, providing redundant data to be used in case of errors. However, they are simpler and generally consume more bandwidth than FEC, as no complex calculation is used to optimize the ratio of data size to error tolerance. A simple incarnation of this is the object carousel, which repeats the transmission of a number of objects over and over. The additional FEC or repeat transmission data are often transmitted in band with the original data. However, they may also be provided at a later time or on an alternative channel to reduce the statistical correlation between errors in subsequent packets relating to the same data. Scrambling (data block reordering) is also used for the same purpose: to reduce the chance of a burst error removing more data from a transmission than the error correction scheme can recover.
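The redundancy-based approach can be illustrated with a toy parity scheme. This is a minimal Python sketch, not any standardized FEC code such as Reed-Solomon: one XOR parity packet per group of k data packets lets a receiver rebuild any single lost packet in the group, at a bandwidth cost of 1/k.

```python
# Toy single-loss FEC: XOR parity over a group of equal-length packets.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(packets):
    """Append one parity packet (XOR of all data packets) to the group."""
    parity = packets[0]
    for p in packets[1:]:
        parity = xor_bytes(parity, p)
    return packets + [parity]

def recover(received):
    """Rebuild the single missing packet (marked None) by XOR-ing the rest."""
    rebuilt = None
    for i, p in enumerate(received):
        if p is not None:
            rebuilt = p if rebuilt is None else xor_bytes(rebuilt, p)
    return rebuilt

group = [b"abcd", b"efgh", b"ijkl"]
sent = add_parity(group)
sent[1] = None                  # a burst error knocks out packet 1
assert recover(sent) == b"efgh" # receiver reconstructs it locally
```

Real schemes tolerate multiple losses per group and are typically combined with interleaving (the scrambling mentioned above) so that a single burst does not exceed the per-group loss limit.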
The other primary mechanism for reliable transport is the retransmission of data once loss has been detected. For unicast, TCP is ubiquitous and routinely sends acknowledgments (ACKs) based on data received, so the sender may resend any blocks of data that are not acknowledged. A variation of this is the negative acknowledgment (NACK), which puts the onus on the receiver to detect packet loss; it can scale better in the one-to-many case, as the related state information for each sender-receiver pair is stored in just one receiver, instead of all being stored in the sender. Both schemes have been considered for IP multicast in the IETF RMT working group, and a protocol for NACK-oriented reliable multicast (NORM) is under standardization. However, the required NACK signaling needs an uplink channel (i.e., not a purely unidirectional service) and mechanisms to deal with NACK implosion, where many receivers want to send a NACK at the same moment because they have experienced a common fault in transmission. Thus, the use of redundant data must be the primary reliability scheme where media transmission is primarily unidirectional and must scale to mass audiences.

10.3.5.3 Unidirectional Downlink Bearers
The unidirectional nature of the downlink ensures that larger audiences can be served, but it also puts a premium on any protocol requiring a duplex connection. An alternative radio channel, on either the same or a separate radio access technology/system, used
for uplink and individual downlink signaling is feasible for terminals with this need. However, using an additional channel may impose additional costs on the user, and the additional resource usage will limit scalability, so it is better suited to smaller groups using protocols that deal with multicast scalability. This makes an additional channel less desirable when it can be avoided, and it is a serious design consideration for any wireless multicast application. Multipoint-to-multipoint communications may not be able to avoid this issue, but signaling overhead and scalability issues must be addressed to ensure any successful service over these systems. The availability of an alternative channel, especially on an alternative radio access technology (such as GPRS in addition to IPDC), adds hybrid-network and multihoming dynamics to the system design [17]. The network, the terminal, or both may make bearer selection decisions based on more complex formulations than handover criteria, such as access cost or quality of service. This also presents application developers with the challenge of providing services that are largely independent of bearers. Applications that can deal with the different bandwidths, latencies, and costs of several candidate bearers are likely to become more widely deployed, and thus more successful, than those dedicated to only one bearer type. Radio link variations and bearer selection functionality are likely to make some connections intermittent, so applications that require an uplink channel only intermittently are much more versatile than those requiring a continuous uplink. This also permits the special case of mixed active and passive users, where a small subgroup of users sends representative data to the network, such as packet loss reports or membership reports, and the remainder of the group passively receives the service as long as they do not absolutely need to signal the network.
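The idea that only a small subgroup needs to signal the network also underlies the feedback suppression used by NACK-based schemes discussed earlier: each receiver that detects a loss waits a random back-off before sending a NACK and suppresses its own if it hears another receiver's NACK first. The simulation below is a hedged illustration (all numbers and the propagation model are invented, not taken from NORM): only timers that expire before the first NACK has propagated result in duplicate NACKs.

```python
# Toy model of NACK suppression after a loss shared by all receivers.
import random

def simulate_nacks(n_receivers, max_backoff=1.0, prop_delay=0.01, seed=1):
    """Return how many NACKs actually get sent: the earliest timer fires,
    and only timers expiring within prop_delay of it escape suppression."""
    rng = random.Random(seed)
    backoffs = sorted(rng.uniform(0, max_backoff) for _ in range(n_receivers))
    first = backoffs[0]
    return sum(1 for t in backoffs if t < first + prop_delay)

# 1000 receivers share one loss, yet far fewer than 1000 NACKs are sent.
n = simulate_nacks(1000)
assert 1 <= n < 1000
```

This keeps uplink use intermittent and representative, matching the mixed active/passive user model described above.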
10.4 IP DATACAST (IPDC)
10.4.1 The IPDC Concept
The delivery of mass media content to mobile devices is a challenging problem. Mass media include movies, television, radio, newspapers, and other published media, and are, by definition, purposed for many people. Typically, mass media content is edited and published on a certain schedule. For example, an edited newspaper is printed and then delivered to households and points of sale. Television shows are recorded and sent to an audience on a carefully planned schedule. In other words, mass media are delivered to many people at the same time. Convergence will enable delivery of all types of content via any communication network to any device. Traditionally, when talking about the delivery of content to the mobile environment, it is often assumed that a bidirectional access network such as GPRS, WLAN, or UMTS is needed. This is natural, given the bidirectional nature of interactive classes of services such as normal Web browsing. However, mass media content, which is purposed for many people and can typically be delivered to many people at the same time, can be delivered using
unidirectional networks as well. The concept of delivering content using IP delivery over broadcast radio technologies is referred to as IP datacast (IPDC). As broadcast technologies are, by definition, purposed for broadcasting, they are well suited to delivering mass media content types. IPDC is therefore a technology that provides access to popular content for large audiences simultaneously. IP datacasting is based on the IP multicasting paradigm, with some conceptual additions for one-way/unidirectional networks and/or service concepts [18, p. 201]. One possible wireless transport option for delivering IPDC services is to use digital broadcast networks, such as the digital video broadcasting terrestrial (DVB-T) network, as the physical transport. IP data can be encapsulated into a DVB-T transport stream using a method called multiprotocol encapsulation (MPE) [19]. The DVB project office has also noted the possibility of integrating DVB and UMTS systems: in 2000, the DVB organization founded an ad hoc group to define common critical enablers for both DVB and UMTS systems [20]. In addition to the work carried out by the DVB project office, another boost for IP over DVB-T for wide-area cellular communications was the foundation of the IP Datacast Forum [7,21], whose major target is to demonstrate and promote the use of digital broadcast standards for the delivery of digital content using IP connectivity. Figure 10.8 [22] shows a general high-level architecture of a convergence terminal that could be used to access personal interactive content over a cellular interface and mobile mass media content over an IPDC interface. Application-layer and connectivity-layer convergence, as shown in the figure, enable delivery of any type of content over any radio access bearer to a single device.
10.4.2 IPDC Services and Applications
IPDC technology can be used to deliver any type of digital content. It can serve as a point-to-point delivery channel, especially when accompanied by
Figure 10.8 General high-level architecture of a convergence terminal for the consumption of mobile mass media [22]. (The figure shows three layers: an application layer providing digital convergence (applications), a connectivity layer providing IP convergence (IP, IP multicast), and an access layer supporting any transport, e.g., cellular access or IP over DVB-T. Application and connectivity layers are independent of the radio transport.)
a return channel. However, as the underlying radio access technology is designed for broadcast-type delivery, it is most beneficial when used for one-to-many applications; thus this section focuses on one-to-many services. Much research on mobile media consumption offers very detailed classifications of services in the mobile domain. For example, Ref. 20 lists about 20 different service scenarios for IPDC-type services, varying from mobile-office business applications to entertainment applications such as video on demand. Ref. 23 covers multiple scenarios for delivering walled garden portals over IPDC technologies. On the other hand, when we consider how people currently consume media, we see that usage of mobile services is not yet visible. For example, during 2002 in the United States the average daily television viewing time was 4.5 hours; time spent with printed media and the Internet accounted for a total of 1.2 hours, and people listened to radio for about 2.7 hours. Figure 10.9 [24] illustrates these media consumption reports for some different market areas. From these figures it is evident that the most popular medium is television. As the television delivery paradigm is natively broadcast, it is particularly well suited to delivery to mobile devices using IPDC technology. Television service scenarios can be divided into two basic categories: (1) broadcast preprogrammed television and (2) video on demand (VoD). Broadcast preprogrammed television can be either normal television broadcast (including live footage) or carousel-type broadcast where, for example, a news clip is broadcast at repeated intervals. The VoD concept includes both video on demand and near video on demand. IPDC technology is best suited to preprogrammed
Figure 10.9 Media consumption, hours per day, in the United States, Singapore, and Finland [24]. (The bar chart breaks daily media time into television, radio, print, and Internet for each country; television dominates in all three, at 4.5 hours in the United States, 5.0 hours in Singapore, and 3.8 hours in Finland.)
television and near-video-on-demand types of services. The challenge with VoD services is their lower scalability and bandwidth efficiency/reuse, due to the need for individual communication channels in both the uplink and downlink.
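The scalability gap between unicast VoD and broadcast delivery can be shown with simple back-of-envelope arithmetic; the stream rate and audience size below are illustrative, not figures from the text.

```python
# Aggregate bandwidth: per-viewer unicast VoD versus one shared broadcast.

stream_kbps = 300        # illustrative video stream rate
audience = 10_000        # illustrative audience size

unicast_total = stream_kbps * audience   # one channel per viewer
broadcast_total = stream_kbps            # one shared channel for everyone

assert unicast_total == 3_000_000        # 3 Gbps aggregate for unicast VoD
assert broadcast_total == 300            # independent of audience size
```

Unicast cost grows linearly with the audience while broadcast cost is constant, which is why IPDC favors preprogrammed and near-VoD (carousel) services over true on-demand delivery.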
10.4.3 IPDC System Architecture
Figure 10.4 gives a reference model that can be used for an IP datacast system. An example network architecture for a unidirectional IPDC network for broadcast IP data delivery is shown in Figure 10.10. The service and delivery management system (SDMS) controls the service system (SS) and the network elements. Content is provided to the service system by the content provisioning system (CPS). The service system schedules content distribution and delivers content over a quality-of-service-enabled backbone to the IPDC radio access system. The backbone routes selected IP packets to selected multiprotocol encapsulators (MPEs). Each encapsulator encapsulates IP packets in the native transport stream (TS) frames of the IPDC bearer.
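To illustrate the encapsulation step, here is a hedged Python sketch of fitting an IP datagram into fixed-size 188-byte MPEG-2 TS packets (4-byte header plus 184-byte payload). The header fields are deliberately simplified; real MPE [19] additionally defines section framing, MAC addressing, and a CRC.

```python
# Simplified TS packetization of one IP datagram (illustrative only).

TS_SIZE, HEADER_SIZE, SYNC_BYTE = 188, 4, 0x47

def packetize(datagram, pid=0x1FF):
    """Split a datagram across 188-byte TS packets on one PID,
    padding the final packet with 0xFF stuffing bytes."""
    payload_size = TS_SIZE - HEADER_SIZE
    packets = []
    for i in range(0, len(datagram), payload_size):
        chunk = datagram[i:i + payload_size]
        chunk += b"\xff" * (payload_size - len(chunk))  # stuffing
        # Simplified 4-byte header: sync, 5 high PID bits, 8 low PID
        # bits, and a payload-only adaptation/continuity byte.
        header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
        packets.append(header + chunk)
    return packets

pkts = packetize(b"x" * 1500)            # a typical-size IP datagram
assert all(len(p) == TS_SIZE for p in pkts)
assert len(pkts) == 9                    # ceil(1500 / 184) packets
```

The fixed packet size is what lets the broadcast multiplex carry IP data alongside ordinary video and audio elementary streams.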
10.4.4 Mobile Wireless Radio Networks for IPDC
There are three major variants of digital terrestrial television standards in the world today: the European Digital Video Broadcasting Terrestrial (DVB-T) system, the Japanese Terrestrial Integrated Services Digital Broadcasting (ISDB-T) system, and the U.S. Advanced Television Systems Committee (ATSC) system. The primary service for all of these is, at the moment, providing digital television transmissions to households; the creation of mobile television or mobile data services was not a primary driver in their original scopes. The ATSC system was designed to transmit high-quality video and audio [high-definition television (HDTV)] and ancillary data over a single 6-MHz-bandwidth
Figure 10.10 One possible network architecture for IPDC. (The figure shows the content provisioning system feeding the service system, which delivers content over a QoS-enabled backbone to multiprotocol encapsulators ("e"), all under the control of the service and delivery management system.)
radio channel. The system uses trellis-coded eight-level vestigial sideband (8-VSB) radio modulation, designed for single-transmitter [multifrequency network (MFN)] implementation. This modulation does not support any kind of mobility [25]. Because of this lack of capability for mobile reception, the ATSC system is not considered a candidate radio bearer for mobile IPDC services. The ISDB-T system aims to provide stable reception for compact, light, and inexpensive mobile receivers in addition to receivers used in homes. The system uses band-segmented transmission orthogonal frequency-division multiplex (BST-OFDM) modulation, which provides good mobility [24]. In addition, BST segments allow services to be provided on a bandwidth of 1/14th of the terrestrial television channel spacing (the total radio channel bandwidth). This feature enables terminals to consume less power in their radio-frequency (RF) components, at the cost of more system complexity. The DVB-T system was developed by a European consortium of public and private sector organizations named the Digital Video Broadcasting Project. The system was designed to allow digital video and digital audio transmission as well as the transport of multimedia services. DVB-T uses coded orthogonal frequency-division multiplex (COFDM) modulation. The standard was originally developed for stationary and portable reception, but it was later shown that DVB-T supports good mobile reception with certain parameters. In order to optimize DVB-T transport for the delivery of IP data to the mobile environment, the DVB project office has introduced a technical specification known as DVB-H (DVB handheld); this work previously progressed under the titles DVB-M (DVB mobile) and DVB-X. The DVB-H system, as far as is known today, will be usable in 6-, 7-, or 8-MHz UHF channels but will primarily target mobile (not fixed) receivers.
DVB-H is an optimized radio bearer for delivering IP data to mobile/portable handheld devices, such as mobile phones. One key issue, currently the central point of the research activities in the ad hoc group DVB-H, is the power consumption of the DVB-H front end. DVB-H will be used with battery-powered communication devices, including mobile phones. Here, battery lifetime is crucial, and ongoing research is looking at the prospects of providing DVB-H receivers that can operate in battery-powered mode for several hours without the need to recharge the batteries. To accommodate more efficient power usage, a form of time slicing, or time-division multiplexing, will be employed so that services can be delivered in short bursts at higher data rates than they are acquired or consumed. This creates time intervals during which a receiver knows that no data of interest to it are being transmitted. If these idle times are significant in comparison with the active (interesting transmission) times (e.g., 90% of the total time), then a terminal can make a significant power saving by powering down its radio electronics during the idle times. The other main issue for the specification work is to optimize performance in the mobile environment, including optimizing the number of radio subcarriers and increasing the robustness (error correction abilities) of the transmitted data.

MULTICAST CONTENT DELIVERY FOR MOBILES

As the standardization work is ongoing, it would be premature to make more detailed statements at this phase. At the moment, DVB-H looks like the ideal candidate IP datacast bearer for use with mobile convergence terminals.

Figure 10.11 [26] shows the global adoption of digital television standards. ISDB-T is used in Japan. ATSC is used in the United States and Canada. Central and South America had not selected a standard as of early 2003. DVB-T has been selected, or is likely to be selected, by the remainder of the countries.

The digital audio broadcast (DAB) system provides a signal that carries a multiplex of several digital services simultaneously. The system radio channel bandwidth is about 1.5 MHz, providing a total transport bit rate capacity of just over 2.4 Mbps in a complete "ensemble." Depending on the requirements of the broadcaster (transmitter coverage, reception quality), the bit rate available to data services ranges between 1.7 and 0.6 Mbps [27]. DAB provides the feature of IP encapsulation in its transport stream, as is the case with the digital video standards [28]. DAB can thus be regarded as a potential radio bearer for IPDC service.

10.4.4.1 DVB-T/H as a Radio Access Network for IPDC
DVB-T and its forthcoming DVB-H variant are particularly suited to mobile wireless content distribution as part of an IP datacast system because of their widespread deployment and mobile-friendly characteristics. For this reason we shall consider IP over DVB-T/H in a little more detail. Video, audio, and data are carried over a transport stream (TS), as defined by part 1 of the MPEG-2 Systems standard [29], in all the abovementioned digital television networks. There are five protocol profiles defined for data broadcasting over DVB [30], each with different application areas and requirements (Fig. 10.12). The profiles that can be used to provide DVB data services are data piping, data streaming,
Figure 10.11 Global adoption of digital television standards [26].
Figure 10.12 Data broadcasting profiles of DVB.
data/object carousels, and multiprotocol encapsulation. Of the data broadcasting profiles specified by DVB, the DVB multiprotocol encapsulation (MPE) is best suited for generic Internet access, as it provides a standard encapsulation for IP-based protocols [31,32]. The DVB MPE profile is intended for sending datagrams of non-DVB protocols over DVB networks. The encapsulation provided by the DVB MPE profile is closely tailored to the ISO LAN/MAN standards. Thus, the DVB network can be considered an OSI layer 2 data link in the domain between an MPE broadcast service provider and DVB data receivers. However, there are differences between DVB MPE and more traditional OSI data link layer technologies such as Ethernet: data links over DVB MPE are unidirectional, provide logical broadcast channels identified by different packet identifier (PID) values, and often include a much larger number of receiving hosts than does a normal LAN/MAN segment. While datagrams of other protocols can be fragmented and sent over multiple sections, no fragmentation is done for IP packets in MPE. Thus, each IP packet must fit into a single datagram section, which can be up to 4097 bytes in size. This sets an upper limit on the size [maximum transmission unit (MTU)] of IP packets that can be transmitted using multiprotocol encapsulation: 4074 bytes if LLC/SNAP framing is used, or 4080 bytes without LLC/SNAP framing (Fig. 10.13).

MPEG-2 also defines program-specific information (PSI) tables, which each digital television system inherits. In addition, DVB defines some of its own service information (SI) tables. These tables are delivered in carousel fashion, at different rates depending on the quantity and urgency of their information, so that receivers may understand the logical services provided by the digital television radio channel.
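The size limits above can be captured in a short helper. This is a sketch only: the overhead values are back-calculated from the MTU figures quoted in the text (4074 bytes with LLC/SNAP, 4080 without), not taken from the DVB specification itself.

```python
# Maximum DVB datagram section size, as quoted in the text (bytes).
MAX_SECTION = 4097

def mpe_mtu(llc_snap_framing: bool) -> int:
    """Return the IP MTU for DVB multiprotocol encapsulation.

    The per-section overhead (section header plus CRC-32/checksum) and
    the LLC/SNAP overhead are derived from the text's MTU figures;
    treat them as illustrative rather than normative.
    """
    section_overhead = MAX_SECTION - 4080   # header + CRC-32 or checksum
    llc_snap_overhead = 4080 - 4074         # extra framing bytes per packet
    mtu = MAX_SECTION - section_overhead
    if llc_snap_framing:
        mtu -= llc_snap_overhead
    return mtu
```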
DVB specifies enough SI to provide an electronic service guide (ESG), which can be used to render television-program-related information in user interfaces (e.g., a TV schedule guide). To IPDC, the relevance of SI is that it is an access-specific service discovery method. It is used to announce which IP streams (e.g., IP multicast groups) are available on the logical channels of the
Figure 10.13 Encapsulation of IP packets in DVB datagram sections.
available radio channels (multiplexes) [29]. As discussed earlier, this enables the higher-layer service discovery schemes to be bearer- and access-independent. Encapsulated IP datagrams and SI sections are multiplexed together, and possibly remultiplexed several times, into multiprogram transport streams that are delivered over any of many MPEG-TS-supporting infrastructure links to DVB-T transmitter stations. These stations, which are analogous to cellular base stations, perform the necessary modulation, upconversion, and radio transmission of the TS multiplex. IP datacast is able to work with any broadcast network topology, although a wide-area cellular topology allows some frequency reuse and thus more efficient radio usage. The DVB-T definition of a cell enables one or more transmitters (and frequencies) to be used in providing the total geographic coverage area of a cell.

10.4.5 IP Infrastructure for IPDC
Figure 10.14 shows the IP infrastructure of an IPDC system used for broadcasting. The figure shows two different hypothetical services, marked by dashed and dotted flows (lines). The topmost IPDC cell (A) provides only the dashed service, the middle one (B) both the dashed and dotted services, and the lower one (C) only the dotted service. The QoS-enabled backbone is assumed to be a multicast-enabled IP network, and the following discussion applies to the various options for routing and networking. The service and delivery management system on the left of the figure is used to control the encapsulator elements (marked "e"). Because of the unidirectional nature of the delivery, the encapsulator elements act as proxy clients and are responsible for joining the multicast routing tree so as to subsequently forward the relevant data packets onto the broadcast radio link. In principle, encapsulator elements join as a result of a management system decision. The service system is the source of multicast services that provide content using the IP multicast network. In the figure, the management system has instructed the encapsulators feeding cells A and B to send a join message for the dashed service, and the encapsulators feeding cells B and C to send a join message for the dotted service. The IP multicast network routes services automatically to the encapsulators; the exact mechanism depends on the routing protocols and
Figure 10.14 IP infrastructure for IPDC broadcast system.
methods selected by the network operator. As the backbone is multicast-enabled, each data packet need be delivered only once on each link. The encapsulators encapsulate the IP packets into the transport stream as described earlier. The multiprotocol-encapsulated IP packets are delivered to cells A, B, and C. Terminals receive the traffic in the access cell as they would receive it on a normal multicast network. As the system is unidirectional, the terminals are not required to send any multicast join or leave messages toward the network.

10.4.6 The IPDC Service System

An IPDC service system is generally purpose-built according to the service mix that the datacast operators in question wish to provide. Internal, walled-garden, or external (e.g., public Internet) content may be available, and the choice of servers, proxies, and digital rights technologies will reflect this. However, three general platforms are required for the provision of higher-level services, which mirror those described in the earlier section on a generic multicast system:

1. Streaming. The mass media broadband communications provided by IPDC are particularly suited to video and audio streaming, for instance, providing mobile television. The required IPDC transport protocols and codecs ensure that at least MPEG-4 video using real-time transport protocol (RTP) delivery is supported.

2. Filecast. Forward error correction (FEC) and repetition are the best options for improving reliability over the IPDC unidirectional-only channel. Large events, such as major sports events and the launch of new software and media releases, require the kind of wide-area multicast distribution IPDC is optimized for.
3. Media Discovery. IPDC systems announce their content and services in advance of and during the multicast sessions that deliver them. The syntax of the media descriptors is open, although SDP- and XML-based descriptors should be expected, and typically the same protocol would be used for unidirectional delivery as is used for filecasting. Work is in progress on filecast and media discovery standards, and the work on Internet media guides (IMGs) in the IETF promises to fulfill the foreseeable needs of IPDC systems.

Figure 10.10 shows one possible IPDC architecture including three server-side components: the service and delivery management system (SDMS), the service system (SS), and the content provisioning system (CPS). The role of a content provisioning system is to store and provide content. Although any content creator may operate a CPS, large media houses often aggregate content that is oriented toward the mass-media consumer market. An IP datacast service provider must operate one or more service systems that acquire, aggregate, and/or source the content. Acquiring content from content provisioning systems may be entirely manual, such as physically transporting data tapes, although automated techniques, including file transfers and live streaming, are more scalable. In any case, the value of the content and the contractual obligations of CPS and SS operators require that an appropriate level of security be provided to protect against unauthorized use, distribution, and disclosure.

A service and delivery management system is required to control service systems, support electronic transactions between CPS and SS, negotiate and/or dictate transmission schedules, and control the related IP datacast network elements. In the case of scheduled services, such as television programming, distributing the scheduling functionality between SS and SDMS may be beneficial.
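The FEC-plus-repetition option mentioned for filecasting can be illustrated with the simplest possible erasure code: a single XOR parity packet that recovers any one lost packet per block. Real IPDC filecasting would use far stronger codes, so this is a toy sketch only.

```python
def xor_parity(packets):
    """Build one XOR parity packet over equal-length source packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(received, parity):
    """Rebuild the single packet marked None from the rest plus parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("exactly one missing packet can be recovered")
    rebuilt = bytearray(parity)
    for packet in received:
        if packet is not None:
            for i, byte in enumerate(packet):
                rebuilt[i] ^= byte
    return bytes(rebuilt)
```

On a unidirectional channel no retransmission request is possible, which is why the receiver must be able to repair losses from redundancy alone.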
It is through the service and delivery management system that operators would usually administer the IP datacast system as a whole.

10.4.7 E-Commerce for IPDC
In order to ensure revenues in the IPDC broadcast system, there is a need to encrypt data securely and to provide mechanisms that allow end users to request access and that then distribute decryption keys to the mobile terminals. In practice the request for access, and possibly also the delivery of the keys, requires some sort of return channel from the mobile terminal to a network server. As the IPDC transport is unidirectional, the cellular access of the convergence terminal is used. In this scenario IPDC and 2G/3G networks complement each other: IPDC provides a broadband downlink transport medium, while 2G/3G offers a natural bidirectional control channel for IPDC. The integration level can be loose, tight, or anything in between. In the extreme loosely integrated case, a multimode terminal is the only common denominator between the systems, but some shared network functions can also be considered [33]. Figure 10.15 shows a possible implementation of an IPDC system with billing functionality. The service management system exchanges keys for content decryption
with an e-commerce system (e-CS). Mobile devices use short message service (SMS) messages for ordering access to the encrypted content via an SMS center (SMSC). Decryption keys are delivered to end users in an SMS message. The mobile terminal uses these keys to decrypt IPDC content. Optionally, the key delivered may be a key-encrypting key that allows access to content-encrypting keys. The benefits of this scheme would be to decouple the grouping and partitioning of content encryption from user subscriptions, and the ability to calculate the actual content-encrypting keys at a later time to increase security. An IP datacast system utilizes ciphering only in the IP layer or above. Figure 10.15 also shows the supporting cellular network components that enable interactive communications and packet-switched protocols: the radio network controller (RNC), serving GPRS support node (SGSN), and gateway GPRS support node (GGSN).

10.4.8 IPDC in Summary

IPDC is technically based on the multicast/broadcast delivery of IP packets over digital broadcast technologies. The concept can be implemented using DVB-S, ISDB-T, DVB-T, DVB-H, ATSC, or DAB radio technologies. The radio technologies DVB-T, DVB-H, ISDB-T, and DAB enable mobile content delivery. The DVB-H technology is further optimized for delivery of IP data to mobile environments and is therefore eminently suitable as a radio bearer for mobile mass media services. The key components in IPDC systems are the service and delivery management system, the content provisioning system, the service system, the multicast-enabled IP backbone, encapsulators (MPEs), radio transmitters, the e-commerce system,
Figure 10.15 IPDC system with billing functionality.
and terminals with an IPDC receiver and software. The service and delivery management system, the content provisioning system, and the service system are used to send content and to manage content delivery and delivery schedules. The IP backbone enables delivery of content from a service system to the access network components: encapsulators and radio transmitters. The essential part of the system in a business respect is the e-commerce system, which is used for purchasing access rights to content and is the root of the IPDC security mechanisms.
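As a concrete footnote to the encapsulator role summarized above: an encapsulator acting as a proxy multicast client would typically issue a standard IGMP join through the host networking stack. A minimal sketch using Python's socket API follows; the group address is hypothetical.

```python
import socket
import struct

def join_group(sock: socket.socket, group: str, iface: str = "0.0.0.0") -> None:
    """Ask the kernel to emit an IGMP membership report for `group` (IPv4).

    Sketch of the proxy-client role the text assigns to encapsulators;
    the membership request packs the group and interface addresses into
    the 8-byte ip_mreq structure expected by IP_ADD_MEMBERSHIP.
    """
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Typical use (not executed here): create a UDP socket bound to the
# service port, then call join_group(sock, "239.192.0.1").
```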
10.5 MULTICAST IN THIRD-GENERATION CELLULAR (MBMS)

10.5.1 The MBMS Concept
Traditionally, the main emphasis in cellular networks has been on bidirectional point-to-point communication. However, the benefits that come with the point-to-multipoint (p-t-m) model have been noted, and some steps in this direction have therefore been taken in the 3GPP (Third-Generation Partnership Project). In release 99 of the 3GPP standards, two p-t-m concepts have already been defined: cell broadcast service (CBS) and IP multicast support. CBS enables transmission of low-bit-rate data services to a predefined set of cells over a p-t-m bearer on the air interface [34]. Because of bit rate limitations, CBS is not suitable for delivering multimedia types of data, but it fits well for services whose purpose is to be available to all subscribers in a certain area and that do not require high bit rates. As the term states, IP multicast support enables subscribers to receive IP multicast traffic over 3G networks [35]. In contrast to CBS, IP multicast support does not limit the transmitted data types, but enables transmission of any kind of data that can be carried over IP. The major drawback in this concept is that in reality the data are always delivered over dedicated p-t-p bearers. Thus, from a resource usage point of view, the data delivered using IP multicast support do not differ from normal packet calls, and no real savings can be found; in other words, there is no multicast gain (a gain of one).

To overcome these weaknesses in the existing p-t-m services, 3GPP launched a standardization process for a new service concept. Multimedia broadcast/multicast service (MBMS) is a new p-t-m bearer service that enables efficient unidirectional p-t-m multimedia data delivery to mobile subscribers [36]. MBMS has two modes of operation: broadcast mode and multicast mode. The phases of service provisioning in these modes are illustrated in Figure 10.16. In the multicast mode, users need to subscribe to services.
To start service data reception, the user must send an explicit request to join the service toward the network. In contrast, in the broadcast mode the service data are always sent to the predefined network area without the MBMS system having any knowledge of the presence of potential receivers (i.e., subscription and joining are not required in broadcast mode). Thus, in the multicast mode the data can be selectively sent only to cells that contain listeners. Furthermore, charging data (i.e., usage reports) can be collected for the end users in multicast mode, unlike the case in broadcast mode.
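The mode differences just described, and illustrated in Figure 10.16, can be paraphrased as two phase lists. The phase names follow the figure; the data structure itself is this sketch's own choice.

```python
# Phases common to both MBMS modes, in provisioning order (per Fig. 10.16).
COMMON_PHASES = ["service announcement", "session start", "MBMS notification",
                 "data transfer", "session stop"]

MBMS_PHASES = {
    # Broadcast mode: data are always sent; no subscription or joining.
    "broadcast": COMMON_PHASES,
    # Multicast mode adds subscription before the announcement, joining
    # after it, and leaving at the end.
    "multicast": (["subscription", COMMON_PHASES[0], "joining"]
                  + COMMON_PHASES[1:] + ["leaving"]),
}
```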
Figure 10.16 Phases of MBMS service provisioning in multicast and broadcast modes. Broadcast mode comprises service announcement, session start, MBMS notification, data transfer, and session stop; multicast mode additionally includes subscription, joining, and leaving.
From the standardization perspective, the specification of MBMS is a deliverable of 3GPP release 6 (see discussion below) and is a work item for many working groups (WGs) of 3GPP's technical specification groups (TSGs). The overall structure of 3GPP's technical bodies and their areas of responsibility are described in Figure 10.17 [37]. Particularly important to MBMS are the system aspects (SA), radio access networks (RAN), and core networks (CN) groups. The MBMS standardization effort began in summer 2001 with the definition of service requirements in SA1 [36]. Progress in other WGs became possible when the requirement specification reached a stable state in autumn 2002. SA2 took responsibility for the architecture and functionality of MBMS. The security aspects of MBMS, which provide security solutions in the IP layer or above, are defined in SA3 (noting that prior 3GPP ciphering will not be reused in MBMS). SA4 has responsibility for the MBMS work on codecs (video and audio encoding). The radio access network working groups (RAN and GERAN) specify the extensions required for p-t-m data delivery. The definition of core network aspects is the remit of the CN WGs, based on a mature architecture specification prepared by SA2.

3GPP specifications evolve continuously and are enhanced with new features to meet market requirements. To enable simultaneous development of new features and implementation of the 3GPP system, the specifications are organized into releases that include certain groups of new features. A freeze date
Figure 10.17 Technical bodies in 3GPP and their areas of responsibility [37].
is defined for each release, after which no new features can be added (only corrections). The first 3GPP system release (release 99) was frozen in March 1999. A rough timeline for the 3GPP release schedule is given in Figure 10.18. The objective in MBMS standardization was to finalize the specification in the release 6 timeframe. The progress made for release 6 naturally affects the scope of release 7.

10.5.2 MBMS Services and Applications
As explained earlier, MBMS is primarily a unidirectional p-t-m bearer service for IP packets in the packet-switched domain of 3GPP systems. In essence, MBMS does not provide any content services itself, but different kinds of applications can use its bearer capabilities to create new services. Thus, MBMS can be seen as an enabler of other services. Since an MBMS bearer can be used to deliver different types of data (e.g., video, audio, text), it supports a vast range of services. The characteristics of services carried over MBMS bearers vary depending on the mode of operation that is used for service provisioning. The following paragraphs discuss these aspects.

In the broadcast mode, service data are delivered to all users in a certain network area without knowledge of the presence of users, since neither service subscription nor joining is required. Within the scope of the MBMS system, these services are
Figure 10.18 3GPP release schedule.
free of charge to the receivers, and the service data are not encrypted. Examples of these kinds of services would be mobile advertisement services and network welcome messages to users.

The multicast mode services can be further divided into two categories: services available in a hotspot area and services available in a larger network area. In both cases the users need to subscribe to and join the services, and charging information may be collected for the joined users. In the case of a larger network area service, the data are delivered only to those cells in the service area in which joined users reside. A RAN can further optimize the data delivery over the air interface in cells using mechanisms discussed later (see Section 10.5.4). A typical example of this kind of service is a news service, in which subscribers receive news updates (e.g., video clips and text) on their mobile phones during the day.

In the hotspot case, an operator predicts that there will be many receivers in a service area during the service provisioning. Therefore the data are delivered to the cells in the service area over a p-t-m configuration without explicit knowledge of the existence of receivers. The group of receivers may be a combination of both passive and active devices, and the exact configuration of the hotspot may be service- or operator-specific. One frequently used example is the so-called football stadium scenario. In this scenario a service is provided to the spectators in the stadium area, who receive replays of highlights and information on the progress of other matches taking place at the same time.

10.5.3 MBMS System Architecture

The starting point in the MBMS architecture definition was the efficiency of resource usage: multiple receivers should share common bearer resources whenever possible.
In addition, there has been emphasis on reusing existing network components and protocols in order to minimize the changes to the infrastructure [38]. The MBMS reference architecture is illustrated in Figure 10.19 [39] (note that, consistent with the other figures in this chapter, the mobile terminals are shown on the right-hand side, although the original 3GPP specifications normally show the terminal-to-network relationship from left to right). As illustrated by the figure, MBMS is implemented in the packet-switched domain of the 3GPP system. MBMS introduces one new element: the broadcast/multicast service center (BM-SC). This new element resides between the packet core network and the content providers. It acts as an MBMS data source and performs certain control tasks, for example, initiating and terminating MBMS transmissions. The MBMS architecture enables content provisioning from data sources either external or internal to the operator's network [known as the public land mobile network (PLMN) in 3GPP terminology]. Gateway GPRS support nodes (GGSNs) and serving GPRS support nodes (SGSNs) in the core network perform packet tunneling from the BM-SC toward the correct radio access network (RAN) nodes. In addition, GGSNs maintain session information on ongoing MBMS sessions and perform control procedures, such as mobility management. On the RAN side, both UTRAN and GERAN will support MBMS. It is the responsibility of the
Figure 10.19 MBMS reference architecture [39].
individual RAN technology (i.e., UTRAN and GERAN) to select the most efficient delivery mechanism to transmit MBMS data over the air interface to end users. A possible realization of an MBMS-enabled 3G network is given in Figure 10.20, which also illustrates the hierarchy of the different network elements in a 3GPP system. The different parts of this MBMS architecture are described in further detail in the following sections.

10.5.4 MBMS Radio Access Networks
3GPP has stated that it should be possible to provide MBMS services over both WCDMA-based UTRAN and GSM/EDGE-based GERAN. Their main responsibility is to deliver MBMS data efficiently from the core network to mobile receivers. In addition, UTRAN/GERAN shall support RAN-level mobility management for MBMS receivers [40]. From the core network viewpoint, MBMS data delivery is always point-to-multipoint, but on the RAN side the situation is more complicated. The reason for this is that the RAN needs to decide whether it is more efficient to deliver the data over p-t-p or p-t-m radio bearers (RBs). WCDMA is an interference-driven radio technology, and with a small number of receivers, delivery by several p-t-p bearers might be more efficient than by a single p-t-m bearer. Thus, MBMS service data always come to a RAN via a shared bearer from an SGSN, and, based on the number of listening users in its cells, the RAN selects between delivery by p-t-p or p-t-m radio bearers. It should be noted that the RB type
Figure 10.20 Hierarchy of network elements in the 3GPP system: the core network (CN) consists of SGSNs and GGSNs, and the radio access network (RAN) consists of radio network controllers (RNCs) and base stations.
(p-t-p, p-t-m) need not be the same for each cell under a certain radio network controller (RNC) node. For example, in Figure 10.20 the MBMS data could be delivered to cells A1 and B1 over a p-t-m bearer, while in the case of A2, B2, and B3 dedicated p-t-p bearers could be used.

Another aspect making the functionality in the RAN more complex is idle-mode reception. When using dedicated point-to-point packet bearer services, the UE is in RRC_CONNECTED (radio resource control) mode. This means that the UE has an active signaling connection to the network, and therefore the network knows the location of the UE with single-cell accuracy. In the case of MBMS data reception, UEs may receive data in RRC_CONNECTED mode, but to reduce UE battery power consumption, 3GPP has defined that data reception from MBMS bearers should also be possible in RRC_IDLE mode (allowing RF components to sleep some of the time while not receiving data). In RRC_IDLE mode the UE does not have a signaling connection to the network, and therefore the network does not know the location of the UE with cell accuracy; the UE is temporarily acting as a passive-only receiver.

New mechanisms have been introduced to support the features discussed above. With a counting procedure, the RAN can find out whether there are enough receivers
in each cell under its control to justify the usage of a p-t-m bearer instead of p-t-p delivery [41]. It should be noted that before a counting procedure is initiated, some of the joined UEs might be in RRC_IDLE mode, and therefore the exact number of receivers in each cell might not be known. The selection of the bearer type is based on a threshold value that the network operator defines. When setting up an MBMS session, the RNC first checks whether the number of RRC_CONNECTED mode receivers in a cell exceeds this threshold. If the threshold is exceeded, the usage of a p-t-m bearer is justified. Otherwise the RNC sends a notification message including a counting indication to the wanted cell(s), which requests the joined UEs to establish an RRC connection (i.e., make a state transition to RRC_CONNECTED mode). To prevent overload in the RAN, however, not all joined UEs make this transition; the exact number of UEs brought to RRC_CONNECTED mode is an implementation issue. By summing the number of joined UEs in RRC_CONNECTED mode in each cell, the RNC can decide what type of radio bearer to use.

During MBMS data transmissions, notification messages are sent periodically to inform the users about the radio configuration used for the service. The message carries information about the type of radio bearer used (i.e., p-t-p or p-t-m). In the case of a p-t-m bearer, it may be used to deliver other RB parameters as well. This mechanism is used to minimize data loss when a UE receiving MBMS data makes a cell reselection. For instance, thanks to the periodic notifications, a UE receiving MBMS data in RRC_IDLE mode does not need to establish a signaling connection upon cell change to learn the type of radio bearer used in the new cell for MBMS data transmission.

10.5.5 MBMS in the Core Network
Currently the 3GPP system's packet core network supports data delivery only over point-to-point connections. To enable packet data exchange between the UE and packet data networks (e.g., the Internet), GTP (GPRS tunneling protocol) tunnels are set up between the GGSN and the SGSN, and between the SGSN and the RAN. In the core network elements (SGSN, GGSN), packet connections are described by packet data protocol (PDP) contexts, which contain all the necessary parameters to deliver packet data between two endpoints with a defined QoS. Introducing MBMS into the core network requires enhancements, as, from the perspective of the core network, MBMS data delivery is always seen as point-to-multipoint communication. 3GPP decided that MBMS data delivery is also performed through GTP tunnels in the core network (as in the case of p-t-p packet communication), so that the solutions are built on well-known concepts. However, the concept of a PDP context is not sufficient to describe p-t-m connections, and therefore new context types are introduced:

- The MBMS bearer context contains information describing an MBMS bearer, such as the addresses of downstream nodes, QoS, bearer identification (IP multicast address and access point name), and the area in which this service is
available. In addition to the core network elements, this type of context is also maintained in the RAN and BM-SC.

- The MBMS UE context contains UE-specific information related to a certain MBMS bearer that the UE has joined. This context can be used, for example, for charging purposes. The MBMS UE context is maintained in the GGSN, SGSN, and UE.

In addition to tunnel management, the core network performs other MBMS functions as well. SGSNs take care of mobility management, provide per-user network control functions, and relay MBMS data to RAN nodes. A GGSN acts as an entry point for MBMS bearers toward the BM-SC. GGSNs request MBMS bearer establishment and release from SGSNs upon notification from a BM-SC. The GGSN relays IP multicast traffic toward SGSNs as MBMS data and is also responsible for processing the joining requests that users send to request multicast mode service activation. Joining is performed using standard IETF mechanisms over p-t-p packet connections: the UE sends an IGMP (IPv4) or MLD (IPv6) join message, in which the IP multicast address(es) identifies the service requested. Furthermore, SGSNs and GGSNs collect charging data for MBMS service listeners.

10.5.6 MBMS Service Center and Data Sources

The broadcast/multicast service center (BM-SC) is a new element introduced in the MBMS architecture. A BM-SC acts as a data source for MBMS services, but also has some control responsibilities. It acts as a gateway to MBMS services for content providers, which may reside either within or outside the operator domain (as depicted in Fig. 10.18). As 3GPP is primarily interested in MBMS as a new bearer service, the interfaces and functionality between the BM-SC and content providers are beyond the scope of 3GPP standardization. However, some issues in this area are briefly discussed in the next section. Toward the core network elements, the BM-SC has several functions.
It performs high-level session scheduling, based on which it is able to initiate and terminate MBMS transport resources as necessary. In order to initiate an MBMS service session, it sends a session start notification toward the GGSN. At the same time the session parameters (e.g., QoS, service area) are also delivered, so that the core network elements can perform QoS authorization and policing. Once the resources for a session are reserved (i.e., the GTP tunnels and radio bearers for data delivery are set up), a BM-SC can begin the MBMS data delivery toward the GGSN(s). The data are transmitted as IP multicast packets. Similarly, when the stop time of the session is reached, the BM-SC sends a session stop notification to request release of the resources reserved for the session.

Furthermore, a BM-SC provides service announcements to advertise the available MBMS services to the users. With this function the UEs are able to find out the communication parameters (service identification, media descriptions, etc.) of the services. Any required security protection of MBMS service content also takes place in the BM-SC; the security procedures are performed in the IP layer or above.
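The session lifecycle just described (start notification with parameters, user-plane delivery as IP multicast, stop notification) can be sketched as a toy control flow. This is illustrative only, not 3GPP signaling: the class, method, and parameter names (BMSC, SessionParams, on_session_start, and so on) are our own inventions.

```python
from dataclasses import dataclass

@dataclass
class SessionParams:
    """Illustrative session parameters delivered with the start notification."""
    service_id: str    # identifies the MBMS bearer service
    qos_class: str     # QoS the core network must authorize and police
    service_area: list # areas/cells in which the session is available

class LoggingGGSN:
    """Toy GGSN that records the control- and user-plane events it receives."""
    def __init__(self):
        self.events = []
    def on_session_start(self, params):
        self.events.append(("start", params.service_id))
    def on_data(self, service_id, packet):
        self.events.append(("data", service_id))
    def on_session_stop(self, service_id):
        self.events.append(("stop", service_id))

class BMSC:
    """Toy broadcast/multicast service center controlling session setup/teardown."""
    def __init__(self, ggsn):
        self.ggsn = ggsn
    def start_session(self, params):
        # Session start notification: triggers tunnel / radio bearer setup
        self.ggsn.on_session_start(params)
    def deliver(self, params, packet):
        # User-plane data flows toward the GGSN(s) as IP multicast packets
        self.ggsn.on_data(params.service_id, packet)
    def stop_session(self, params):
        # Session stop notification: requests release of reserved resources
        self.ggsn.on_session_stop(params.service_id)

ggsn = LoggingGGSN()
bmsc = BMSC(ggsn)
params = SessionParams("mobile-tv-1", "streaming", ["cell-17", "cell-18"])
bmsc.start_session(params)
bmsc.deliver(params, b"video-frame")
bmsc.stop_session(params)
assert [e[0] for e in ggsn.events] == ["start", "data", "stop"]
```

The point of the sketch is the ordering constraint: data delivery is bracketed by the start and stop notifications, which is what lets the core network reserve and release resources for the session.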
MULTICAST CONTENT DELIVERY FOR MOBILES
In the multicast mode, it is desirable that only subscribed users are able to join MBMS services [36]. Thus, the functionality for per-user service authorization is located in the BM-SC. When a GGSN receives a join request for an MBMS service, it consults the corresponding BM-SC to check whether the user is authorized to receive the service; the BM-SC must therefore maintain information on service subscriptions to perform this function. One additional essential responsibility of a BM-SC is to generate charging information for the data transmitted by content providers, so that content providers may be billed should the service scenario require it.

It is envisioned that MBMS bearers could also be utilized for delivering data from sources other than a BM-SC. For example, this feature would enable efficient transmission of IP multicast sessions available in the Internet to mobile receivers via MBMS bearers. In this type of scenario the BM-SC would still be required to perform the control signaling toward the GGSN(s) to set up and tear down the bearer, but the actual user plane data would flow directly from the data source to the GGSN(s), bypassing the BM-SC. Thus, these kinds of sessions need to be configured in the BM-SC beforehand to enable service provisioning over MBMS bearers. The possibility of using data sources other than the BM-SC is illustrated in Figure 10.18 as data source boxes.

10.5.7 Commercial Interfaces
As mentioned earlier, the interfaces between the BM-SC and content providers will not be standardized in 3GPP. However, 3GPP has defined some functions that should be available in the BM-SC toward content providers. For security reasons, the BM-SC will include functionality to perform third-party content provider authentication and authorization to prevent illegal access to MBMS bearers. The BM-SC will also verify the integrity of the data received from content providers to prevent illegal data insertion. The most important function in this interface is to enable data retrieval for MBMS services from external sources. The data retrieval functionality shall support rich media types, which can be further transmitted via MBMS bearers to receivers.

10.5.8 MBMS in Summary
MBMS brings 3GPP systems one step further in supporting a broader range of service types. This efficient point-to-multipoint bearer service for multimedia data enables the introduction of new kinds of services that would not previously have been feasible to implement with the capabilities of prior 3GPP systems. The application of multicast/broadcast services in this kind of environment is relatively new, and we can expect further developments in this area of 3GPP as experience from real-world implementations is gained. For example, future applications could drive the implementation of support for multipoint-to-multipoint communication in 3GPP systems, but this remains to be seen. Additionally, MBMS in 3GPP release 6 has remained substantially independent
of the largest new component of the previous release: the IP multimedia subsystem (IMS). Since IMS brings many IP-centric solutions to messaging and charging issues, it is likely that MBMS and IMS will each play a greater role in the other's use in future developments. The market reception of MBMS will be important in this area: on one hand, MBMS enables the development of new types of services, but on the other hand, it puts pressure on service developers to come up with new service concepts that will awaken the interest of consumers.
10.6 MULTICAST CONTENT DELIVERY FOR MOBILES IN SUMMARY AND IN THE FUTURE

IPDC and MBMS will be commercially implemented and available in the near future, and promise to offer globally standardized and ubiquitous wireless multicast to mobiles. Both are primarily one-to-many content delivery systems providing data bearers and requiring significant service system and application support. Enhancing current services and business models with higher-bandwidth popular content clearly adds value to existing point-to-point wireless communication systems. Several anticipated third-generation services, such as mobile TV, can be made available with lower resource usage and cost, and thus are much more feasible and likely to experience widespread success. Just as exciting are the new opportunities that mobile multicast presents that were unfeasible before. Surviving massive demand due to spontaneous human interest is a start, but we can expect to be surprised as the newly available technology stimulates ideas that have not been anticipated. In the same way, we can expect that real-world deployment and commercial development will encourage further technical advances.

IPDC and MBMS both offer a single "one size has to fit all" bearer service for all applications. The practicalities of implementation lead us to expect incremental development from bearer-specific applications toward bearer-independent applications, as IP multicast and IP convergence already facilitate. Hence the emergence of the need for, and solution of, bearer selection between the multicast and unicast options. Also, the one-to-many network optimizations of IPDC and MBMS leave the opportunity space somewhat open for smaller, more sparsely distributed multipoint-to-multipoint groups and the services for them.
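The bearer-selection problem mentioned above reduces, in its simplest form, to the rule an RNC applies in MBMS (Section 10.5.4): compare the number of counted receivers in a cell against an operator-defined threshold. The sketch below is a minimal illustration of that rule; the function name and threshold value are ours, not standardized.

```python
def select_bearer(connected_receivers: int, threshold: int) -> str:
    """Choose a shared p-t-m (multicast) bearer when enough receivers share
    a cell; otherwise fall back to per-user p-t-p (unicast) bearers.
    The threshold is defined by the network operator."""
    return "p-t-m" if connected_receivers > threshold else "p-t-p"

# With few receivers, per-user unicast bearers are cheaper; beyond the
# threshold a single shared multicast bearer uses less radio resource.
assert select_bearer(connected_receivers=2, threshold=5) == "p-t-p"
assert select_bearer(connected_receivers=9, threshold=5) == "p-t-m"
```

In a real system the input count is itself uncertain (idle-mode receivers are invisible until a counting procedure runs), which is why the counting and threshold mechanisms exist at all.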
The availability of several technologies, and their integration into terminals, indicates a need for hybrid solutions that can further enhance the service value to users and operators. Integrated networks spanning several bearers and technologies remain a research topic for now, although the fast-moving mobile terminal market may be able to deliver integrated terminals that provide the full range of multicast and unicast services, and allow system optimizations to occur incrementally in future deployment and standardization. As such, IPDC and MBMS form the basis for structured mobile multicast content delivery systems of
the future. Several other technologies also hold promise for IP multicast services. In particular, WLAN technologies offer both structured and ad hoc network use cases for multicast services. For now, momentum for widespread commercial deployment of multicast on these technologies has not formed, but this is one space to watch for both complementary and competing multicast propositions. Meanwhile, both IPDC and MBMS progress in standardization and implementation, and several new systems and standards will appear to meet their requirements. The home of IP multicast, the IETF, is equally hard at work to make multicast a success through work on multicast-enabling protocols.

If these mobile multicast content delivery systems achieve commercial success in several application domains, they are capable of changing the usage patterns of the mobile Internet in its entirety. They would blur the lines between the Internet, telecommunications, and broadcast, and shatter traditional assumptions about segregating service categories between on-the-move, in-home, and in-office usage. Mobile IP multicast content distribution promises to become and remain a very important and exciting area into the foreseeable future.
REFERENCES

1. Multicast Security IETF Working Group, Working Group Charter, http://www.ietf.org/html.charters/msec-charter.html.
2. Multicast & Anycast Group Membership IETF Working Group, Working Group Charter, http://www.ietf.org/html.charters/magma-charter.html.
3. C. K. Miller, Multicast Networking and Applications, Addison-Wesley, 1998, Chapter 7.
4. P. Koskelainen, H. Schulzrinne, and X. Wu, A SIP-based conference control framework, paper presented at NOSSDAV'02, Miami Beach, FL, May 2002.
5. R. Droms, Dynamic Host Configuration Protocol, RFC 2131, Draft Standard, March 1997.
6. B. Quinn and K. Almeroth, IP Multicast Applications: Challenges and Solutions, RFC 3170, Informational, Sept. 2001.
7. Technical Document, IPDC Forum, Proposal for Architectural Framework, version 2.0, 2002, http://www.ipdc-forum.org.
8. H. Schulzrinne et al., RTP: A Transport Protocol for Real-Time Applications, RFC 1889, Proposed Standard, Jan. 1996.
9. Reliable Multicast Transport IETF Working Group, Working Group Charter, http://www.ietf.org/html.charters/rmt-charter.html.
10. M. Luby et al., Asynchronous Layered Coding (ALC) Protocol Instantiation, RFC 3450, Experimental RFC, Dec. 2002.
11. T. Paila et al., FLUTE: File Delivery in Unidirectional Environments, Work in Progress, Nov. 2003, http://www.ietf.org/internet-drafts/draft-ietf-flute-04.txt.
12. M. Handley et al., Session Announcement Protocol, RFC 2974, Experimental RFC, http://www.ietf.org/rfc/rfc2974.txt.
13. M. Handley and V. Jacobson, Session Description Protocol, RFC 2327, Proposed Standard, http://www.ietf.org/rfc/rfc2327.txt.
14. J. Muñoz, ed., Internet media guides, Proc. 56th Internet Engineering Task Force, San Francisco, March 2003, http://www.ietf.org/proceedings/03mar/index.html.
15. B. Williamson, Developing IP Multicast Networks, Vol. 1, Cisco Press, 2000.
16. H. Kaaranen et al., UMTS Networks: Architecture, Mobility and Services, Wiley, 2001.
17. R. Walsh, L. Xu, and T. Paila, Hybrid networks — a step beyond 3G, Wireless Personal Multimedia Communications Conf., WPMC'00, Bangkok, Thailand, Nov. 2000.
18. Mobile Internet Technical Architecture, Nokia, Addison-Wesley, 2002.
19. TR 101 202, Digital Video Broadcasting (DVB); Implementation Guidelines for Data Broadcasting, ETSI, version 1.2.1, Jan. 2003.
20. DVB Project Office, Ad hoc Group DVB-UMTS, TM2466, The Convergence of Broadcast & Telecommunications Platforms, revision 4, Feb. 6, 2002.
21. Press release, New Forum to Promote IP Datacasting Activities, http://press.nokia.com/PR/200109/834086_5.html (referenced May 30, 2003).
22. J. Aaltonen, Proc. Conf. Transition from Analogue TV to Digital Services in Europe, panel 2, June 5, 2002, http://www.eicta.org.
23. L. Tvede, P. Pircher, and J. Bodnekamp, Data Broadcasting: The Technology and Business, Wiley, Aug. 1999.
24. S. Gallup, Suomen Gallup Web Oy, http://www.gallupweb.com/, 2002.
25. Y. Wu, E. Plsizka, B. Caron, P. Bouchard, and G. Chouinard, Comparison of terrestrial DTV transmission systems: The ATSC 8-VSB, the DVB-T COFDM, and the ISDB-T BST-OFDM, IEEE Trans. Broadcast., 46(2):101-113, June 2000.
26. DVB Project, DVB Home Page, http://www.dvb.org.
27. EN 300 401, Radio Broadcasting Systems; Digital Audio Broadcasting (DAB) to Mobile, Portable and Fixed Receivers, ETSI, version 1.3.3, May 2001.
28. ES 201 735, Digital Audio Broadcasting (DAB); Internet Protocol (IP) Datagram Tunnelling, ETSI, version 1.1.1, Sept. 2000.
29. ISO/IEC 13818-1:2000, Information Technology — Generic Coding of Moving Pictures and Associated Audio Information: Systems, ISO, Dec. 2000.
30. EN 301 192, Digital Video Broadcasting (DVB); DVB Specification for Data Broadcasting, ETSI, version 1.3.1, May 2003.
31. J.-P. Luoma, Internet Access over DVB Networks, master of science thesis, Tampere Univ. Technology, Dept. of Information Technology, 2001.
32. J. Ljungquist, Transport Protocols for IP Traffic over DVB-T, master's thesis, Dept. Teleinformatics, Computer Communications Laboratory, Royal Institute of Technology, Stockholm, 1999.
33. K. Ahmavaara, P. Jolma, and Y. Raivio, Broadcast and Multicast Services in Mobile Networks, WTC, 2002.
34. 3GPP, TS 23.041, Technical Realization of Cell Broadcast Service (CBS) (Release 5), version 5.1.0, March 2002.
35. 3GPP, TS 29.061, Interworking between Public Land Mobile Network (PLMN) Supporting Packet Based Services and Packet Data Networks (PDN) (Release 5), version 5.5.0, March 2003.
36. 3GPP, TS 22.146, Multimedia Broadcast/Multicast Service; Stage 1, version 6.1.0, Sept. 2002.
37. Third Generation Partnership Project, 3GPP Home Page, http://www.3gpp.org.
38. 3GPP, TR 23.846, Multimedia Broadcast/Multicast Service; Architecture and Functional Description (Release 6), version 6.1.0, Dec. 2002.
39. 3GPP, TS 23.246, Multimedia Broadcast/Multicast Service; Architecture and Functional Description (Release 6), version 0.5.0, April 2003.
40. 3GPP, TR 25.992, Multimedia Broadcast/Multicast Service (MBMS); UTRAN/GERAN Requirements (Release 6), version 1.3.0, Jan. 2003.
41. 3GPP, TS 25.346, Introduction of the Multimedia Broadcast/Multicast Service (MBMS) in the Radio Access Network (Stage 2) (Release 6), version 1.5.0, March 2003.
42. J. Aaltonen, J. Karvo, and S. Aalto, Multicasting vs. Unicasting in Mobile Communication Systems, WoWMoM 2002, Atlanta, Georgia, Sept. 28, 2002.
43. J. Aaltonen, Content Distribution Using Wireless Broadcast and Multicast Communication Networks, doctoral thesis, Tampere University of Technology, Publications 430, 2003.
CHAPTER 11
SECURITY AND DIGITAL RIGHTS MANAGEMENT FOR MOBILE CONTENT

DEEPA KUNDUR
Department of Electrical Engineering, Texas A&M University, College Station, Texas

HEATHER YU
Panasonic Information & Networking Technologies Laboratory, Princeton, New Jersey

CHING-YUNG LIN
IBM T. J. Watson Research Center, Hawthorne, New York
The large-scale acceptance of digital media distribution rests on its ability to provide legitimate services to all parties. This requires allowing the convenient use of digital media while equitably compensating all members of the information distribution chain, such as content creators, providers, and consumers. This chapter discusses the important issue of information protection and digital rights management in the context of mobile content delivery. We provide an introduction to the problem of content security and digital rights management (DRM); demonstrate how DRM must be designed to reflect the content distribution and business models of a given enterprise; discuss state-of-the-art mobile digital rights systems, such as Nokia's Music Player and the NEC VS-7810, and their component technologies; and highlight some emerging technologies.
Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.

11.1 INTRODUCTION TO INFORMATION SECURITY AND DRM TECHNOLOGIES

Modern advancements in the wireless communications infrastructure, signal processing, and digital storage technologies are enabling pervasive mobile digital media distribution; digital distribution allows the introduction of flexible, cost-effective
business models that are advantageous for multimedia commerce transactions. The digital nature of the information also makes it easier for individuals to manipulate, duplicate, or access media information beyond the terms and conditions agreed on in a given transaction, which has made content protection and rights management an influential issue in mobile content delivery.

We begin this chapter by providing an introduction to the field of information security in order to motivate the basic techniques necessary for DRM. We then discuss key architectures, models, and requirements for DRM in the context of mobile devices. Section 11.3 addresses the state of the art and emerging challenges and solutions.

11.1.1 Information Security
Information security technologies can be classified into three main groups: computer security, network security, and content security methodologies. Each category entails the protection of information within a predefined scope.

The oldest form of security, computer security, involves the protection of information within a computer, such as a single workstation. Traditionally the scope of computer security has been the standalone computing environment. Popular mechanisms include user password authentication and intrusion detection for access control, and antivirus programs to prevent information corruption.

In contrast, network security is the protection of information during transit. In this type of protection, we are concerned with the security of the communication channel. Examples of basic technologies employed include encryption to prevent eavesdropping and digital signatures to ensure authentication of the received information. These technologies are used in conjunction with security-conscious protocols that attempt to prevent undesirable access or processing of the signal.

Content-based or media security is the newest form of information protection, having emerged since the early 1990s. In this form of protection the intellectual property itself is protected against attacks such as unlawful tampering, illegal duplication, and unauthorized access. Since protection is applied or "attached" at the content level, the tools often merge sophisticated digital signal processing with traditional security-related transformations such as encryption, and attempt to provide more semantic meaning to the data through the use of identification tags such as metadata.

There are three main aspects to information security [1]:

- Security attack—an unwanted act performed by a party in order to jeopardize the protection of given information. Examples of security attacks include eavesdropping, forgery, masquerading, tampering, and denial of service (DoS). A security attack can be conducted by an individual or group of individuals, known as the attacker(s), who may or may not be involved with the information creation, processing, communication, or storage. Security attacks are often grouped into two main categories, "passive" and "active." Passive attacks are often more readily prevented and more arduous to detect; they primarily encompass forms of eavesdropping. On the other hand, active attacks are normally easier to detect and more challenging to prevent; these include forgery, masquerading, tampering, and DoS.
- Security mechanism—a means to detect, prevent, and/or react to a security attack. The process usually takes the form of an algorithm and an associated communication protocol. Examples of security mechanisms include digital signatures, public key cryptography algorithms, and the secure socket layer (SSL) protocol. The algorithms and protocols complement one another in order to jointly provide protection against a broad class of attacks. A security mechanism is effectively designed by making use of models of the behavior of the attacker(s).

- Security service—an operation whose main objective is to counter security attacks through the effective use of one or more security mechanisms. Examples of the most common security services, which we discuss in this section, are confidentiality, authentication, integrity, nonrepudiation, access control, availability, and antipiracy protection.
Let us consider a simple information communication system involving a single sender and receiver. Computer security aims to protect information at the end nodes, and network security safeguards the communication channel. The associated mechanisms are applied through processing of the information at the bit level. In contrast, media security shields information from attacks at a semantic level and thus is often applied at a "higher stage" such as the application or presentation layers of a network.

We divide security services into the following seven groups:

1. Confidentiality, which in a broad sense protects information against passive attacks such as eavesdropping, the monitoring of information transmitted in a given medium. The mechanisms used to achieve confidentiality involve encryption. For an eavesdropping attack, the service is necessary both at the workstation and during information transmission. Eavesdropping cannot be easily detected at the time of attack. However, it can be prevented by scrambling the information such that access is impossible for any party other than the sender and the receiver. For DRM, the inaccessibility helps keep the content from being illegitimately copied, viewed, or tampered with while in storage or transit.

2. Integrity, which ensures that the information received has not undergone unwanted tampering that affects its credibility. Integrity is achieved through the use of mechanisms such as hash functions and/or digital signatures. Without it, an unlawful third party can redirect information from the sender, modify it, and transmit the misrepresented information to the receiver.

3. Authentication, which assures that the information received is in fact from the legitimate source. This service defends against masquerading. In particular, the mechanisms must ensure (1) that the initiating parties during a connection are in fact credible and (2) that the connection is not interfered with by an attacker. In DRM, authentication allows both parties to establish trust that they are indeed communicating with legitimate sources.
4. Antipiracy protection, which counters piracy: the illegal duplication and potential retransmission of information. Because digital information can be copied exactly, piracy is of serious concern for the commercial digital media industries, whose revenues are tied to the number of legitimate individuals buying digital content. The mechanisms used to prevent piracy aim to provide greater control over information and are part of an overall DRM system that includes encryption, digital watermarking, metadata and rights expression languages, and protocols such as secure electronic transaction (SET).

5. Access control, which involves the legitimate admittance of information or a party to a host computer and its resources. Such protection can be used to prevent unlawful users from accessing sensitive information, through passwords or biometrics, as well as to prevent the spread of viruses to the host through the use of antivirus programs. In DRM systems, access control can, for instance, enforce the negotiated rights in a transaction by preventing use of content until payment has been made. This is often implemented by keeping the content encrypted even after download and giving a user the decryption key only after (s)he can prove payment.

6. Nonrepudiation, which guarantees that a sender cannot dispute having transmitted a given message to the receiver. This can be established through the use of digital signatures, biometrics, and other data about the sender that would be collectively difficult to repudiate at a later time. Nonrepudiation is useful in proving that both parties have agreed to terms mutually, without fear that one party will later recant.

7. Availability, a service that attempts to recover from loss or reduction of information access. An offense against availability is the DoS attack, in which a service provider is flooded with artificial demands for information such that authentic requests cannot be sufficiently addressed. This attack is difficult to prevent ahead of time. Services such as IBM's denial-of-service alert and response exist to help detect network intrusion and attempt recovery from such an attack [2].
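Two of these services can be made concrete with standard primitives. The sketch below uses only Python's standard hashlib and hmac modules; the messages and key are illustrative. A plain hash detects modification of the data (integrity), while a keyed HMAC additionally binds the message to a shared secret, so a party without the key cannot forge a valid tag (a simple form of message authentication). Real systems embed these primitives in protocols such as SSL/TLS rather than using them bare.

```python
import hashlib
import hmac

secret = b"shared-key"            # known only to sender and receiver
message = b"pay 10 EUR to Alice"

# Integrity: any change to the message changes its digest.
digest = hashlib.sha256(message).hexdigest()
tampered = b"pay 99 EUR to Alice"
assert hashlib.sha256(tampered).hexdigest() != digest

# Authentication: without the key, a valid tag cannot be forged.
tag = hmac.new(secret, message, hashlib.sha256).digest()
forged = hmac.new(b"wrong-key", message, hashlib.sha256).digest()
assert hmac.compare_digest(tag, hmac.new(secret, message, hashlib.sha256).digest())
assert not hmac.compare_digest(tag, forged)
```

Note that `hmac.compare_digest` is used instead of `==` to avoid leaking information through comparison timing, a small example of a mechanism being designed against a model of attacker behavior.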
11.1.2 Content-Based Media Security
In the context of information creation, processing, transmission, and storage, content alludes to a higher-level representation or semantics of the data. Naturally, this implies that content may be comprised of multiple forms of media, such as audio, imagery, video, text, and graphics, in an assorted variety of digital formats. As a result, the characteristics of the information can vary greatly. For example, the required bit rate, maximum acceptable error, decompression complexity, and display requirements may deviate significantly, which creates several challenges in terms of content protection.

Traditional forms of computer and network security do not sufficiently address the needs of content security because they often process information at the bit level, which does not allow appropriate consideration of the semantics of the
information. This consideration is necessary if we are to design security mechanisms that can handle the processing of high-bandwidth information such as video that may undergo loss or format conversion, and allow the possibility of retaining some control over the intellectual property once it is transmitted. For example, encryption and authentication functions must accommodate varying data formats and lossy compression. Furthermore, computer and network security alone cannot appropriately manage the issue of piracy. Content-based or media security responds to these issues through the following attributes:

- Protection is tied to a higher level of the content, as opposed to the actual bits, to provide efficient and effective security that can be designed to be more robust to format conversion or recovery from communication error. For example, digital signatures can be based on high-level features of the information rather than the bit representations, to allow authentication even in the face of lossy compression.

- Security processing, for encryption or digital watermarking, is integrated with other signal processing tasks such as compression and decompression in order to handle fluctuations in bit rate. Combining encryption or digital watermarking with signal processing not only allows the reuse of processing blocks for greater efficiency but also provides a structured method to attach security processing to varying content forms and bandwidths.

- Semantic information such as metadata is associated with the content security mechanisms to ensure that the content is used as negotiated in an information commerce scenario.

A significant application of, and motivation for, the development and use of content-based and media security (in addition to traditional computer and network security) is DRM. For the remainder of the chapter we focus on DRM and its role in mobile content delivery.
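The first attribute above, signing high-level features rather than raw bits, can be sketched in a few lines: hashing a coarsely quantized version of a signal lets the digest survive small distortions (e.g., mild lossy recompression) that would invalidate a bit-exact hash. Here coarse quantization is only an illustrative stand-in for real feature extraction; the function name and step size are our own.

```python
import hashlib

def feature_hash(samples, step=16):
    """Hash coarse features (quantized sample values) instead of exact bits."""
    features = bytes(min(255, (s // step) * step) for s in samples)
    return hashlib.sha256(features).hexdigest()

original = [120, 70, 200, 35]
slightly_lossy = [121, 69, 201, 36]   # small distortions, e.g. recompression

# A bit-exact hash breaks under lossy processing...
assert hashlib.sha256(bytes(original)).hexdigest() != \
       hashlib.sha256(bytes(slightly_lossy)).hexdigest()

# ...but the feature-level hash still matches, so a signature over it
# would authenticate the content after mild distortion.
assert feature_hash(original) == feature_hash(slightly_lossy)
```

The tradeoff is inherent: the coarser the features, the more robust the hash is to benign processing, but also the easier it becomes for an attacker to alter content without changing the digest. Practical robust-hash designs balance these two failure modes.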
11.1.3 Digital Rights Management

11.1.3.1 Overview. Digital rights management (DRM) is the digital management of user rights to content. It entails linking specific user privileges to media in order to control viewing, duplication, access, and distribution, among other operations. Ideally, the goal of a DRM system is to balance information protection, usability, and cost to provide a beneficial environment for all parties in an information commerce transaction; this includes expanded functionality, cost-effectiveness, and new marketing opportunities. Overall, management is achieved through the effective interaction of business models, legal policy, and technology. It is the conflicting characteristics and rapidly changing environments in each of these spaces that make DRM a challenging and interesting problem.
A DRM system comprises the following basic entities: the content, the users, and the rights. We define them as follows:

- Content refers to the actual intellectual property, commonly referred to as the "work."
- Users are any parties involved in the overall content distribution chain. This can include content creators, publishers, aggregators, distributors, and consumers.
- Rights express the permissions, constraints, and obligations between the users and the content.

The relationship between these fundamental entities is conveyed in the information architecture of the DRM system [3] shown in Figure 11.1. The information architecture deals with modeling and describing these entities, as well as conveying their relationships to one another. As Figure 11.1 shows, users may create or make use of content and/or own the rights over specified content. This general model is valid over any type of business or information commerce paradigm that incorporates DRM. From this perspective, it is clear that the purpose of DRM in any technological system is to facilitate the interaction between these abstract entities.

However, to implement these DRM relationships in real systems, it is useful to consult the functional architecture [3,4] for an overview of the components and necessary modules in the overall DRM system. Figure 11.2 provides a possible functional architecture (modified from Refs. 3 and 4). Here, the overall DRM system can be seen in terms of a sequence of possible components that exist in the lifecycle of a piece of content. This sequence, called the content value chain, traces the
Figure 11.1 Information architecture of a general DRM system (Core Entities model). Users (e.g., content owner, copyright holder, aggregator, consumer) create, distribute, and use Content (e.g., text, graphics, video, audio), and own Rights (e.g., copy once, play two times, distribute freely, never tamper) over that content.
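The Core Entities model of Figure 11.1 can be rendered as a minimal data model. The class and field names below are illustrative, not taken from any DRM standard; the point is only the create/use/own relations between the three entities.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Content:
    """The work itself: text, graphics, video, audio, ..."""
    title: str

@dataclass
class Rights:
    """Permissions/constraints over content, e.g. 'play' or 'copy once'."""
    content: Content
    permissions: List[str]

@dataclass
class User:
    """Any party in the distribution chain: creator, aggregator, consumer."""
    name: str
    owned_rights: List[Rights] = field(default_factory=list)

    def may(self, action: str, content: Content) -> bool:
        # A user may perform an action only if some owned rights grant it
        # over that specific piece of content.
        return any(action in r.permissions and r.content is content
                   for r in self.owned_rights)

song = Content("Happy Birthday")
consumer = User("Alice", [Rights(song, ["play", "copy once"])])
assert consumer.may("play", song)
assert not consumer.may("distribute freely", song)
```

Rights expression languages used in real DRM systems express the same triple of entities, with far richer constraint vocabularies (counts, time windows, device bindings) than the flat permission strings sketched here.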
Figure 11.2 Functional architecture of a general DRM system. The content value chain runs: Creator (content creation/capture, content rights establishment, content rights validation); Publisher and Aggregator (content packaging, content repository); Distributor (content trading, content distribution); Retailer (content trading, content distribution, content payments); Consumer (content tracking, permission management).
content from its creation to its consumption. Not all DRM scenarios will have all the components shown in Figure 11.2, and one or more parties can represent each or many of the different blocks because of the different distribution and business models in use. However, from an engineering perspective, in which we view “the chain of hardware and software services and technologies governing the authorized use of digital content and managing any consequences of that use throughout the entire life cycle of the content” [4], we can identify common elements that are employed in a broad class of scenarios. For example, at the technological level, DRM systems incorporate security mechanisms including encryption, digital signatures, digital watermarking, metadata, and network security communication protocols at various stages in the chain. We define these processes as follows:

1. Encryption. This is the process of scrambling digital information with an encryption key such that access to the original information is not possible without applying the inverse process of decryption. By limiting access to the decryption key, access to the information is controlled. There are two main types of encryption: symmetric, which we discuss here; and asymmetric, which we discuss in relation to digital signatures. In symmetric encryption,
378
SECURITY AND DIGITAL
the same key is used to encrypt and decrypt. The underlying mathematics involves bit operations that are computationally simpler to implement than asymmetric encryption. Many DRM systems in practice use a symmetric algorithm known as the Advanced Encryption Standard (AES) [5] to encrypt content. However, if real-time encryption of high-bandwidth information such as video is required, in addition to robustness to channel loss, strategies based on selective or progressive encryption can be employed; these are especially suited to mobile applications. Here, only specific features of the content are encrypted or prioritized in order to distort the perceptual quality without requiring heavy computation. In Section 11.3 we discuss methods of encryption suitable for mobile applications in more detail.

2. Digital Signatures. These provide nonrepudiation of a transaction so that all parties are committed to the exchange and its associated terms on finalization. Digital signatures make use of a “footprint” (of the data to be signed), commonly generated by a hash function that takes the variable-length data (in this case the electronic version of the terms of the exchange and/or the content) and produces a much shorter fixed-length dataset called the hash. This result is then encrypted using asymmetric encryption. In asymmetric encryption, the key KE used to scramble the data is different from the key KD used for decryption. One of the keys of the (KE, KD) pair is publicly available and the other is private. KE and KD are related such that encryption and decryption are possible, but it is computationally infeasible to determine the private key given the public one. In the case of digital signatures, KE is the private key and KD is the public member. Once the sender encrypts the hash with KE (which is a secret), the resulting signature and associated data are transmitted to the receiving end.
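The sign-then-verify mechanics can be illustrated with textbook RSA on deliberately tiny numbers. This is a sketch only: the primes, the message, and the function names are chosen for illustration, and real systems use keys of 2048+ bits with padded, standardized signature schemes.

```python
import hashlib

# Textbook RSA with tiny primes p = 61, q = 53 (illustrative only, not secure).
n = 61 * 53            # public modulus, 3233
KD = 17                # public exponent (verification key)
KE = 2753              # private exponent (signing key): KE * KD = 1 mod 3120

def footprint(data: bytes) -> int:
    # Fixed-length "footprint" of variable-length data, reduced mod n.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes) -> int:
    # The sender encrypts the hash with the private key KE.
    return pow(footprint(data), KE, n)

def verify(data: bytes, signature: int) -> bool:
    # The receiver decrypts the signature with the public KD and compares
    # it against a freshly computed hash of the received data.
    return pow(signature, KD, n) == footprint(data)

terms = b"consumer agrees to 10 plays"
sig = sign(terms)
```

Any change to `terms` changes the footprint, so the comparison in `verify` fails and tampering is detected.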
At the receiver, anyone can verify that the signature of the data, as well as the data themselves, is valid by comparing the decrypted value of the signature (using the publicly available KD) with the hash of the data received. If they match, then it is clear that only someone who knows KE (i.e., the legitimate sender) could have generated it from the correct data. Thus, the signature guarantees that the data have not been forged or tampered with, and nonrepudiation is enforced. In a mobile environment or heterogeneous content distribution chain, the digital signature may be too sensitive to format conversion, transcoding, or unavoidable errors. In such situations, proposals have been made to generate the hash from more robust features of the content [6] or to make use of other strategies for tamper assessment [7].

3. Digital Watermarking and Fingerprinting. This embeds an often imperceptible mark, called the watermark or fingerprint, into the host media such that it travels intrinsically with the resulting content and facilitates copy control, distribution tracking, and usage monitoring. The embedding process often makes use of a secret watermark key, known only at the watermark embedding and detection stages, in order to make access, removal, or forgery of a watermark difficult. The purpose of watermarking in a DRM context is to insert a payload,
which represents the information required by a DRM system for certain tasks such as copy control and distribution tracking. The payload is inserted by making use of the watermark key and features of the host content in order to create a watermark signal, which, when added to the host, is imperceptible and robust to incidental processing. The resulting composite signal, known as the watermarked content, may undergo some minor processing such as lossy compression during distribution before being passed through a detection algorithm that uses the watermark key to extract an estimate of the payload. The detection algorithm may be housed within some part of the DRM system, such as the content player, in order to use the extracted payload to determine which usage rules apply at that stage of the distribution chain. Alternatively, if the payload is unique to each user, then the process is often called “fingerprinting.” In this situation, detection is done offline to determine the original source of pirated content when illegitimate copies are identified. Watermarking is normally applied before any other security processing, such as encryption, within a DRM system. Thus, it is regarded by some as a last line of defense against unauthorized usage of the content [8].

4. Metadata. Metadata for DRM provide machine-readable expressions that link the three entities of content, users, and rights together so that DRM devices can enable permissible operations. In general, metadata are defined as data about information [9]. DRM metadata [10], in particular, convey information about the rights of the various users in the content distribution chain to various parts of the content; they may include data about the copyright owner, allowable usage of the content, and details about the cost to use or distribute the content.
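A minimal sketch of what such content metadata might look like in machine-readable form. All field names, values, and the URL below are hypothetical, invented for illustration; they do not follow any real rights-expression schema.

```python
# Hypothetical DRM metadata record linking content, owner, and usage rights.
track_metadata = {
    "content_id": "song42",
    "copyright_owner": "Example Records",
    "usage_rules": {"play_count": 10, "copy": False, "expires": "2005-12-31"},
    "price": {"play": 0.99, "redistribute": None},   # None: not offered
    "key_server": "https://rfs.example.com/keys",    # where to fetch the key
}

def may_copy(metadata: dict) -> bool:
    # A DRM client consults the usage rules before enabling an operation.
    return bool(metadata["usage_rules"].get("copy", False))
```

A client would check `may_copy(track_metadata)` (here, False) before offering a copy operation, illustrating how metadata bridges the rights and the enforcement point.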
The main objectives of DRM metadata are to (a) provide semantic meaning to an often diverse set of content, so that educated usage decisions can be made regarding the content (e.g., cataloging, purchasing, redistributing), and (b) indicate an explicit relationship between the users’ rights and content, so that a practical bridge can be formed among the areas of commerce, intellectual property law, and technology. Since metadata need to be used by a diverse set of applications (DRM- and non-DRM-related), and interoperability of all of these applications is necessary, a great deal of standardization effort has been put into developing an associated expression language. One undertaking that is anticipated to have much success as a digital rights language is the eXtensible rights Markup Language (XrML).

5. Security Protocols. These ensure the protection of the content while it is transmitted from one party or device to another in the distribution chain. In general, a security protocol defines a set of rules for the trusted communication of two or more parties in a communication network. These rules contain specifications as to the format, the necessary data, and the processing required to ensure secure information transmission. Establishing a secure communication channel requires some initial handshaking on the part of some or all of the parties or devices that is enforced within the protocol. One of the goals of
effective protocol design, especially in the context of mobile applications, is to minimize this overhead while maintaining a reasonable level of protection. To successfully implement an overall DRM system for a given business application or model, designers must integrate the above technologies in an effective way to produce an end-to-end system that serves the needs of all users. System-level design expertise is integral to this process and is currently an ongoing effort in the research and industrial communities. We demonstrate how the various security elements can be put together in the generic DRM content distribution system shown in Figure 11.3; the Rights System [4] developed by InterTrust follows a similar structure. Through our explanation of this system we hope to demonstrate how the different components of a DRM system effectively interact with one another to achieve the overall protection of content. The raw content must first be prepared by a process called the packager. The packager makes use of a securely generated content key to convert the raw content into a specific format (perhaps via lossy compression), encrypts the result (using symmetric encryption [1]) with the content key, and produces metadata to describe the content (e.g., the name of the artist, copyright holder, and work may be included) and to identify the location where the content key for decryption may be obtained (e.g., the network address or URL of the associated site). The packager may generate the content key or may obtain it from another party. In either case, after preparation of the content, the resulting data (compressed encrypted content and metadata) are transported to a content (distribution) server (CS) and the content key is sent to a rights fulfillment server (RFS). A user who would like to retrieve the content (and who we assume has the necessary DRM client software installed on his/her machine) needs to decrypt the content
Figure 11.3 Generic DRM content distribution system (adapted from Ref. 11). [Diagram: the Packager delivers ENCRYPT{content, key} to the Content Distribution Server via the Distributor/Aggregator and sends the key to the Rights Fulfillment Server; the Consumer pays the Retailer for content and receives a token, then redeems the token with the Rights Fulfillment Server for the rights and key.]
and use/access it according to the negotiated rights. In this example, the user obtains the encrypted content and associated metadata from the CS. To gain access to the information, (s)he must purchase rights from a web retailer (WR), who provides the user with a token that can be presented to the RFS as proof of payment. On receiving this proof, the RFS provides the user with rights in the form of digitally signed (using the user ID) metadata and the content key for decryption. The metadata can specify time-outs, the number of times that a song can legally be played, or whether the media can be copied. Using this information, the DRM client can decrypt the content and allow access according to the purchased rights. The scenario presented above is appropriate for a client-server type of distribution model. There are extensions that include portable devices, rights lockers, and peer-to-peer systems, as discussed by Feigenbaum and Freedman [11]. In essence, the business model characterizes the usage rules that are enforced by the technological architecture; thus, the DRM system technology, content distribution, and business models are intrinsically tied. In the next section, we introduce some emerging content distribution models and discuss the associated DRM challenges. We do not provide an exhaustive list of models, nor an in-depth evaluation, but intend to provide a flavor of the various issues and compromises.
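The retailer/RFS token exchange just described can be sketched as follows. The actors are reduced to in-memory tables, and every name and value here (the functions, the key string, the rights fields) is an illustrative assumption; payment processing, digital signing, and the actual decryption are omitted.

```python
PAYMENTS = set()             # retailer's record of completed purchases
KEYS = {"song42": "k-9f3"}   # rights fulfillment server's content-key store

def retailer_issue_token(user: str, content_id: str):
    # Web retailer: on payment, hand the user a proof-of-payment token.
    token = (user, content_id)
    PAYMENTS.add(token)
    return token

def rfs_fulfill(token):
    # Rights fulfillment server: redeem a valid token for rights + content key.
    user, content_id = token
    if token not in PAYMENTS:
        raise PermissionError("no proof of payment")
    # In a real system these rights would be digitally signed metadata.
    rights = {"user": user, "plays": 10, "copy": False}
    return rights, KEYS[content_id]

token = retailer_issue_token("alice", "song42")
rights, content_key = rfs_fulfill(token)
```

A forged token (one the retailer never recorded) is rejected by `rfs_fulfill`, which is the structural point of the token: the RFS releases the content key only against proof of payment.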
11.1.3.2 DRM and Content Distribution Business Models To establish an effective DRM system that links users, content, and rights, one must take into account the way in which the content is distributed (i.e., the distribution model) and the allowable usage rights (which are related to the business model). Examples of several content distribution and business models are introduced in this section. The most fundamental type of content distribution is based on a client-server paradigm in which a client requests and downloads information from a given server. The server is often centralized and serves all host computers. The content, depending on its format, may be streamed in real time or completely downloaded and stored for later use. Many first-generation e-commerce and DRM systems make use of this basic download model. In contrast, in peer-to-peer file sharing systems, content is stored and distributed directly by end systems (termed “peers”) without the need for a central server. Examples of such systems include the infamous Napster and Kazaa. Multicast distribution, which has been shown to scale more naturally than client-server models [12] for a large number of consumers, involves the distribution of content from a single source to multiple recipients. For example, cellular digital packet data (CDPD) technology [13], a specification to support wireless Internet access for mobile devices such as cell phones, supports multicast distribution. Effective business models are also an integral part of the establishment of a successful DRM system. Subscription-based models allow a given user to obtain access to a library of content with a predefined scope based on the details of his/her “membership” rights. The membership is normally purchased ahead of time.
In contrast, promotion models provide restricted usage of the content to a group of users in order to entice them to pay for less limited access. Methods of restriction can include permitting only a fixed number of allowable views or plays, after which the content is inaccessible, or limiting the distributed media to a noncommercial-quality version. In these cases, users can pay additional fees in order to gain unlimited plays and high-quality versions of the content. Pay per view (also called “pay per play”) charges the consumer according to the number of uses of the content. This scenario is relevant for content-on-demand applications in which pricing depends on the number of views or plays. Microtransactions/micropayments are another form of business model in which the users deal with a small volume of content (e.g., a few bytes in size) worth a meager value (e.g., a fraction of a cent). This scenario is envisioned1 to encompass applications involving individual charging and delivery of individual stock market quotes, newspaper articles, and Webpages [14]. Because this model deals with low volumes of inexpensive data, it is presumed that consumers will engage in high numbers of such transactions within a small period of time. In floating licenses, the usage of content is not associated with specific users. Instead, a group of, say, N individuals jointly own or have access to M < N licenses. The system allows up to M members of the group to simultaneously make use of the content. In this case, the business model attaches permissions to a group and must keep track of the number of accesses at any given time. Another popular business model, called superdistribution, overlaps with some of the models described. Mori and Kawahara [15] describe superdistribution as an approach to distributing software in which software is made available freely and without restriction but is protected from modifications and modes of usage not authorized by its vendor. . .
. . Superdistribution relies neither on law nor ethics to achieve these protections; instead it is achieved through a combination of electronic devices, software, and administrative arrangements. . . By eliminating the need of vendors to protect their products against piracy through copy protection and similar measures, superdistribution promotes unrestricted distribution. . . .
The modern use of the term encompasses the distribution of any general type of content by peers to one another. DRM systems aim to provide effective content protection and management for a number of diverse content distribution and business models. A given enterprise must consider several technical and nontechnical issues before settling on a system [4]. We attempt to provide a flavor of the range of technical challenges by considering a number of different scenarios. If a multicast delivery channel is combined with a pay-per-view model (for a video-on-demand scenario, say), a DRM system will face several challenges, including content key distribution to the multiple users. [Footnote 1: To the best knowledge of the authors of this chapter, no real system has yet been implemented, in part due to the lack of an appropriate DRM system.] For efficiency of the encrypted
media distribution, it is preferred that a group key Kg be used to encrypt the content once at the transmitter end. If each user knows Kg, then s(he) can individually decrypt the content once it is received. However, this requires that the group key be securely transmitted to each of the users by a centralized server that keeps track of membership in the group. This membership is usually dynamic; that is, an individual may be able to leave or enter at specific times. Therefore, for reasons of security, Kg must be updated to prevent new members from decrypting information that was sent before they paid for privileges, and to stop members who have left the group from accessing information that is unpaid for after their departure (this latter issue is called “key revocation”). In the DRM literature, this problem is called the “state update problem” for key management, and it has been investigated by a number of researchers [16-19] seeking the best compromise in terms of encryption complexity and update message length. In the realm of microtransactions/micropayments, in which we assume that the content is downloaded using a client-server architecture (e.g., a Webpage providing information on timely stock quotes), one of the main challenges is to make sure that the overhead of DRM per download is low; it should not cost the content provider on the order of one cent to transmit information and process the payment for, say, specific search engine results for which the provider is paid 0.01 cent by the consumer. It is also envisioned that, given the granularity of the content and payment size, such a model will encourage a more diverse set of content providers (e.g., pay-per-use search engines, selective magazine article purchases) that will experience a significantly larger number of such transactions initiated by consumers. As discussed by Waller et al.
[14], DRM methodologies that make use of protocols such as the secure socket layer (SSL) to secure the communication channel have a heavy overhead that renders widespread use for microtransactions/micropayments infeasible. In fact, a better approach is to apply security to the content itself (rather than the channel, as in the case of SSL), appending metadata and digital signatures as appropriate. A third-party “transaction broker” would be used to handle payments and issue “tokens” (where appropriate) in order to obtain the appropriate content keys for decryption and access to the content. The Secure Interactive Broadcast Infotainment Services (SIBS) project [14] is one such system, with the goal of developing mechanisms for secure and scalable microtransactions. To be able to enforce certain types of business models, such as pay per view or even subscription-based pricing, in peer-to-peer networks, the use of the content needs to be tracked. One idea that has been proposed is to use a digital rights locker. A digital rights locker is a storage location that houses an individual’s or device’s rights to access content. The locker can be a central registry for the acquired rights of a given individual or device; it can also contain backups of prepurchased content and usage history. The greatest advantage of using the digital rights locker is the increased flexibility and potential for content protection even if the media leaves the peer-to-peer network and enters devices such as a PDA or a mobile phone. Thus, a rights locker gives users portability beyond the standard distribution channels. One such product is available at the time of this writing by
Digital World Services [20] and is predicted to appear in future generations of InterTrust’s Rights System.

11.1.3.3 DRM and Security For effective information management, it is clear that DRM designers must borrow toolsets from the area of information security. However, there are a number of key differences between the abstract academic notions of security and the needs of practical DRM systems, which we address in this section. The first departure is in scope and objective. The overall goal of a DRM system is to facilitate an electronic marketplace and optimize its usefulness to all parties. For example, consumers should be safeguarded against loss of privacy, and content providers should be defended against intellectual property theft. Thus, DRM applies to the overall management of the information commerce system. In contrast, the purpose of security is to unconditionally prevent a specific attack, such as piracy or eavesdropping, potentially arising in some part of the system. The broader goals of DRM not only involve issues of protection but, more importantly, also include measures of business success in which keeping a product commercially viable is the fundamental priority. Ironically, this latter objective can be at odds with that of strict security. Allowing some small level of piracy to accommodate greater consumer satisfaction and acceptance of DRM-enabled content distribution is more lucrative than restricting all unlawful duplication [4]. The content providers that implement the most user-friendly DRM systems will have a significant market advantage. From this perspective, the effectiveness of a security system for DRM does not imply unbreakability. For instance, the pay TV market has been under continuous attack, with “hacks” freely available on the Web. However, the inconvenience of building and continuously updating such systems to keep up with the scrambling countermeasures has still allowed pay TV to be a viable business.
Another difference between security and DRM involves the formulation of the “good guy/gal” and the “bad guy/gal.” In the field of traditional security, there are precise distinctions between the lawful members of a communication system and the attackers. There are clear definitions of “trust” and “use” that are more challenging to characterize in a practical DRM setting. The primary obstacle is that the nature of alliances is dynamic in reality. For example, competitors may merge, or allies may part ways. This has serious implications if one tries to naively adopt tools developed in the traditional security area and apply them to DRM. The borrowed mechanism may not guarantee protection because the original assumptions and relationships between parties cannot be enforced throughout the evolution of the system. Furthermore, security may be breached during the implementation, deployment, and system-upgrade processes (with no fault on the part of the original algorithm design).

11.1.3.4 General Requirements for Mobile Terminals Mobile DRM (MDRM) is becoming an area of increasing focus to jump-start the legitimate, fair, and secure exchange of mobile content, and thus avoid the “Napsterization” of content (as in the case of the Internet). For DRM to be successfully
developed for mobile terminals, a number of fundamental differences that set mobile terminals apart from their wired counterparts must be considered:

- The processor performance (in terms of power and complexity) of mobile terminals is limited. This has a significant impact on the design of security algorithms, which must, in part, compromise security for practicality. Software must be downsized to accommodate the reduced hardware capabilities.
- The capabilities of mobile devices may vary significantly. Standardization efforts early in MDRM development are necessary to prevent competing systems that may not easily integrate for seamless DRM across different devices.
- There is a greater probability of error and content desynchronization. Thus, DRM algorithms must be robust to variations in delivery quality, and should also be able to recover from distribution failure and loss.
- Because of the convenience of the devices (which is the motivating factor in their success), users expect ease of use and cost-effectiveness from any DRM system implemented. Risk and pain assessment is necessary to deduce the appropriate balance of security and convenience for a successful business.

In the next section, we look into DRM involving MPEG IPMP to provide the reader with an understanding of how DRM is shaping standards activities. Given the as-yet uncertain status of MDRM applications, we are not able to provide an effective mobile DRM case study. Section 11.2 is presented to elucidate the integration of DRM with well-known formats. Section 11.3, however, touches on emerging MDRM technologies.
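The limited processor performance noted above is one motivation for the selective-encryption strategies mentioned in Section 11.1.3.1: encrypt only the perceptually important fraction of the data and leave the rest in the clear. A toy sketch of the idea, with XOR standing in for a real cipher and the 25% ratio chosen arbitrarily:

```python
def toy_encrypt(block: bytes, key: int) -> bytes:
    # XOR stand-in for a real block cipher; illustrative only.
    return bytes(b ^ key for b in block)

def selective_encrypt(blocks, key: int, ratio: float = 0.25):
    # Encrypt only the leading fraction of blocks (e.g., headers or other
    # perceptually critical data), leaving the rest clear to save computation.
    n = max(1, int(len(blocks) * ratio))
    return [toy_encrypt(b, key) if i < n else b for i, b in enumerate(blocks)]

frames = [bytes([i] * 4) for i in range(8)]        # toy "video" of 8 blocks
protected = selective_encrypt(frames, key=0x5A)
restored = selective_encrypt(protected, key=0x5A)  # XOR is self-inverse
```

Only 2 of the 8 blocks are transformed, yet the stream is unusable without the key at those positions, which is the trade-off selective encryption exploits on constrained terminals.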
11.2 MPEG INTELLECTUAL PROPERTY MANAGEMENT AND PROTECTION

One of the goals of the Moving Picture Experts Group (MPEG) is to ensure interoperability, including DRM, between multimedia systems. The MPEG community is mostly concerned with interoperability from the consumer’s perspective: ensuring that content from multiple sources will play on players from different makers. In the following subsections, we briefly introduce the DRM standardization activities in the MPEG community. MPEG refers to DRM as intellectual property management and protection (IPMP) [21].

11.2.1 Copy Protection on MPEG-2 Videos The MPEG-2 standard provides some methods for the identification and protection of the intellectual property in content. For identification, a unique 32-bit copyright identifier, which identifies the type of the work (ISBN, ISSN, etc.), can be used by a registration authority to identify the work at the audio, visual, or system
level. To enable protection, several provisions can be used: (1) signal whether particular packets have been scrambled, (2) send messages to be used in (proprietary) conditional access systems, or (3) identify the conditional access system used. The MPEG-2 standard does not have a specific mechanism for identifying and protecting the video streams themselves. However, when videos are recorded on DVD disks, they are protected by the Content Scrambling System (CSS). CSS is used to protect the content of DVDs from piracy and to enforce region-based viewing restrictions. A DVD system includes three components: the DVD disk, the DVD player, and the host (computer, host board, etc.). The DVD disk contains the encrypted content, as well as a hidden area. The contents of this hidden area cannot be delivered except to an authenticated device. Presumably, any device that can authenticate has been licensed by the DVD Copy Control Association and, as a consequence, is trusted to receive the information. This hidden area contains a table of several encrypted disk keys. The player stores the player keys that are used to decrypt the disk key, the region code that identifies the region in which the player should be used, and another secret that is used for authentication with the host. Details of the interoperation of the keys and the CSS system can be found in Kesden’s report [22].
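The CSS key hierarchy just described can be sketched structurally as follows. This is an illustration only: the key values are invented, XOR stands in for the actual CSS cipher, and the real key sizes, authentication steps, and cipher differ.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy stand-in for the real CSS cipher; illustrative only.
    return bytes(d ^ k for d, k in zip(data, key))

player_key = bytes([7, 1, 9, 3, 5])   # held by a licensed player (hypothetical)
disk_key = bytes([2, 8, 6, 4, 0])     # per-disk key (hypothetical)

# Hidden area of the disk: the disk key encrypted under each licensed
# player key, so only licensed players can recover it.
hidden_area = {"player-A": xor_cipher(disk_key, player_key)}

# An authenticated player uses its player key to recover the disk key,
# which in turn unlocks the scrambled content.
recovered = xor_cipher(hidden_area["player-A"], player_key)
title = xor_cipher(xor_cipher(b"movie", disk_key), recovered)
```

The layering is the point: revoking one player key only invalidates that player's entry in the hidden-area table, without re-encrypting the content itself.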
11.2.2 MPEG-4 IPMP Hook
The MPEG-4 IPMP standard specifies two pieces of technology: identification of copyright, and an IPMP hook to enable protection. It does not prescribe when or how often to use the identification descriptors; it relies on international treaties and legislation to prohibit removal of IPMP information. An identification of an MPEG-4 AV object specifies whether it is protected by an IPMP system, its type (audiovisual, visual, still picture, etc.), and the registration authority that hands out unique numbers (e.g., ISAN, ISBN, ISRC). It can also indicate titles and supplementary information and references to separate data streams. In 1998, MPEG concluded that it was not desirable to enforce IPMP tools on all MPEG-4 content and MPEG-4 players, and that it was neither feasible nor desirable to standardize a complete DRM system [21]. No DRM system could satisfy application needs ranging from real-time low-quality communications to valuable content in set-top boxes. Thus, MPEG-4 standardizes hooks that build secure MPEG-4 delivery chains. Video bitstreams embed information that tells the terminal which IPMP system should be used. Two simple IPMP extensions of basic MPEG-4 systems are specified.

- IPMP Descriptors (IPMP-Ds). These are part of the MPEG-4 object descriptors that describe how an object can be accessed and decoded. The IPMP-Ds are used to denote the IPMP system that was used to encrypt the object. An independent registration authority (RA) is used so that any party can register its own IPMP system and identify it without collisions.
- IPMP Elementary Streams (IPMP-ES). All MPEG objects are represented by elementary streams that can reference each other. These special elementary streams can be used to convey IPMP-specific data. Their syntax and semantics are not specified in the standard.
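Conceptually, the IPMP-D mechanism works like a registry lookup: the descriptor carries a registered system identifier, and the terminal resolves it to the (proprietary) IPMP system that must process the stream. A sketch with invented IDs and names; real RA assignments and descriptor syntax differ.

```python
# Hypothetical registration-authority table mapping registered IPMP system
# IDs to handler descriptions; every ID and name here is illustrative.
IPMP_REGISTRY = {
    0x0101: "AcmeCrypt conditional-access system",
    0x0102: "ExampleCo watermark reader",
}

def resolve_ipmp(descriptor: dict) -> str:
    # A terminal reads the IPMP-D from the object descriptor and looks up
    # which registered IPMP system must process the protected object.
    return IPMP_REGISTRY.get(descriptor["ipmp_system_id"],
                             "unknown IPMP system")
```

Because the RA hands out unique IDs, two vendors' systems never collide in such a table, which is exactly what the hook approach standardizes in place of a full DRM system.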
11.2.3 MPEG-21 and MPEG IPMP Extensions The MPEG-21 IPMP standard aims to provide a uniform framework that enables all users to express their rights and interests in digital items and to have assurance that those rights, interests, and agreements will be persistently and reliably managed and protected across a wide range of networks and devices [23]. In 2002, MPEG adopted XrML 2.0, developed by Xerox and ContentGuard, as the basic rights expression language for describing contractual usage rules for “digital items” [24]. This language provides rules that are flexible and extensible. Its framework does not favor any particular human language, culture, or legal system, and it provides unambiguous semantics and predictable effects. Some expansions, such as public policies, rules, and business initiatives, have been adopted in order to satisfy the versatile needs of multimedia data. The MPEG-21 Rights Expression Language was finalized in July 2003. In addition to the Rights Expression Language, the MPEG-21 Rights Data Dictionary was finalized in late 2003. Additional IPMP extensions are being developed to address the following issues: (1) support access to and interaction with content while keeping the amount of hardware to a minimum, (2) support easy interaction with content from different sources without swapping of physical modules, (3) support conveying to end users which conditions apply to what types of interaction with the content, (4) support protection of user privacy, and (5) support service models in which the end user’s identity is not disclosed to the service/content provider and/or to other parties. The work currently concentrates on various security interfaces. Major issues are the management of trust and tamper resistance. MPEG-21 IPMP was finalized in early 2004.
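The kind of persistent usage rule such a rights language expresses (e.g., a bounded play count) ultimately has to be enforced by a client. A minimal sketch of that enforcement, with hypothetical field names rather than actual MPEG-21 REL semantics:

```python
class RightsObject:
    """Illustrative stand-in for signed rights metadata (e.g., an XrML
    grant); the attribute names here are invented, not the REL schema."""

    def __init__(self, plays_allowed: int, copy_allowed: bool):
        self.plays_left = plays_allowed
        self.copy_allowed = copy_allowed

    def authorize_play(self) -> bool:
        # Persistently enforce the play-count constraint: each authorized
        # play consumes one of the remaining plays.
        if self.plays_left <= 0:
            return False
        self.plays_left -= 1
        return True

rights = RightsObject(plays_allowed=2, copy_allowed=False)
outcomes = [rights.authorize_play() for _ in range(3)]
```

The first two requests succeed and the third is denied, which is the "persistently and reliably managed" behavior the framework demands of compliant devices.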
11.3 EMERGING TECHNOLOGIES AND APPLICATIONS
The popularity of mobile terminals such as phones and personal digital assistants (PDAs) is growing at an incredible rate, and with it we are seeing a dramatic increase in the number and variety of mobile content delivery services. This implies the need for technologies to enable the secure delivery of information to mobile terminals. In this section, we present a technical overview of the current state of mobile digital rights management (MDRM). The main aspects, such as MDRM system requirements, categorizations, and architectures, are studied. We also analyze several sample MDRM systems and further focus on several emerging security technologies to enable MDRM to meet its challenges.
SECURITY AND DIGITAL

11.3.1 State-of-the-Art MDRM Systems
In recent years, many DRM systems have been proposed and commercialized. According to the type of client-side content access device (terminal), these systems can be categorized as platform-independent, mobile DRM only, and fixed DRM only, enabling protection and rights management of live and on-demand content delivered to fixed and mobile devices across networks, to mobile devices only, or to fixed devices only. We shall look at several typical commercial MDRM systems that are either mobile DRM only or platform-independent, with the capability to facilitate rights management of various kinds of digital content delivered to mobile devices. Before proceeding, let us look at some common MDRM system classifications and terminology, organized by various metrics.

Rights enforcement-based:
- Server-side DRM enforcement: flexible and scalable server-side solutions are used to control access to and usage of content delivered to mobile devices that do not yet have client-side DRM capabilities.
- Server- and client-side DRM enforcement: used where client-side mobile devices have DRM capabilities.

System architecture-based:
- Centralized: rights are managed by a trusted third party (the central authority). Each service provider and each user has at least one account managed by the central authority, and an account manager handles the rights processing.
- Distributed: rights are stored in tamper-resistant devices such as smart cards or in self-protecting containers. Rights are managed not by a centralized rights manager but by the user or a delegated account manager. Tamper-resistant hardware is often needed.

Usage rule-based:
- Forward lock: intended for the delivery of news, sports, information, and any content that should not be passed on to others. Unencrypted content is transferred to the mobile device using any delivery method. The mobile device is allowed to play, display, or execute the content, but not forward it. No rights object is delivered; instead, the mobile terminal enforces a default set of rights and ensures that the content cannot be forwarded to any other device.
- Combined delivery: enables usage rules to be set for, and delivered together with, the content. Unencrypted content is packaged with a rights object, and the whole package is transferred to the mobile device using any delivery method. The mobile terminal enforces the usage permissions defined in the rights object and ensures that the content cannot be forwarded to any other device. It enables a preview feature.
- Separate delivery: enables superdistribution to protect content of higher value. Encrypted content is delivered to the mobile device using any delivery method. The device may forward the content but not the rights, which is achieved by delivering the content and the rights via separate channels. Recipients of superdistributed content must contact the content retailer to obtain rights to either preview or purchase the content. This kind of delivery requires a rights refresh mechanism.

DRM model-based:
- Media-player-specific: a model where the media player is responsible for rights management and rights enforcement is exercised by the media player.
- Mobile-terminal-specific: a model where DRM functionality is tightly integrated with the mobile terminal's system software/hardware.
- Network-centric: a model where copyrights and policies are enforced in the network during delivery or superdistribution.

Content delivery method-based:
- Broadcast: the MDRM system targets protection of broadcast content. Only users who have subscribed to the service can receive and use the content under the usage rules.
- Streaming media: protecting and managing the usage rules of streaming media.
- Downloaded: protection of downloaded mobile content, such as ringtones, images, and games.
- Personal storage: an MDRM system that protects and manages the copyright of one's own works.

Content variety-based:
- Content-type-specific: the MDRM system supports only certain specific types of content.
- Any content type: the MDRM system is capable of supporting any type of content.

Content value-based:
- Heavy (premium) content: content with high commercial value. Consumer use rights are explicitly licensed by the copyright holder, and the receiver has to pay for full content access.
- Light content: copyrighted content that may be free for noncommercial use (ringtones, screensavers, etc.).
- Personal storage: content that is not commercialized but is owned and stored by the content creator.

Security level-based:
- Strong MDRM: capable of providing strong protection for premium content that is robust against various kinds of attacks.
- Light MDRM: only lightweight protection is imposed. Such systems may be targeted at "keeping honest people honest," or the content may be nonpremium.
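The three usage-rule models above can be illustrated with a toy terminal model. This is a sketch only: the Package/Terminal classes, the rule names, and the rights-dictionary format are hypothetical illustrations, not an OMA DRM or other real API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Package:
    content_id: str
    payload: bytes
    encrypted: bool           # separate delivery ships encrypted content
    rights: Optional[dict]    # combined delivery bundles a rights object

def forward_lock(cid, data):
    return Package(cid, data, False, None)

def combined_delivery(cid, data, rights):
    return Package(cid, data, False, rights)

def separate_delivery(cid, enc):
    return Package(cid, enc, True, None)

class Terminal:
    # Forward lock: no rights object arrives, so a default rule set applies.
    DEFAULT_RIGHTS = {"play": True, "forward": False}

    def __init__(self):
        self._separate_rights = {}   # rights objects obtained via a separate channel

    def receive_rights(self, content_id, rights):
        self._separate_rights[content_id] = rights

    def allowed(self, pkg, action):
        if pkg.rights is not None:   # combined delivery: enforce bundled rights
            return bool(pkg.rights.get(action, False))
        if pkg.encrypted:            # separate delivery: content is unusable
            r = self._separate_rights.get(pkg.content_id)  # until rights arrive
            return bool(r and r.get(action, False))
        return self.DEFAULT_RIGHTS.get(action, False)      # forward lock
```

Note that in separate delivery the encrypted payload itself may still be passed on to others (superdistribution); only the rights object stays bound to the terminal.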
In Sections 11.3.1.1–11.3.1.3 we briefly discuss several representative, commercially available MDRM systems. Their key characteristics and differences are summarized in Table 11.1.
TABLE 11.1 Several Commercially Available Sample MDRM Systems

System | DRM Technology | Client (Mobile Terminal) System Requirement | Supported Type of Distribution | Content Type | Security Technology
Nokia Music Player | Secure-container-based | TR(a) HW required | Download | MP3, AAC | Encryption (secure container)
VS-7810 | Ticket-based (centralized) | Java-capable mobile terminal required; no TR HW requirement | Streaming, download | Any | Encryption
Helix | Multiple DRM technologies; additional DRM technologies may be added using its plugin architecture | No TR HW required for the media-player plugin | Streaming, download | MP3, AAC, RealAudio, narrowband AMR audio, MPEG-4, H.263, RealVideo | Strong encryption, secure container
Beep Science AS | Multiple DRM technologies | No additional client SW needed | Real-time, streaming, download | Any | Encryption, watermarking, fingerprinting
EncrypTix | — | Compliant TR HW required | — | Ticket, document | Encryption, TR HW, FIPS 140-1(b) compliant (HW L4, SW L3)
DMDmobile | — | — | — | Any, including game and e-book | —

(a) TR = tamper-resistant.
(b) FIPS: Federal Information Processing Standards Publications, Security Requirements for Cryptographic Modules.
11.3.1.1 Nokia Music Player—Distributed DRM

Nokia Music Player, an accessory for Nokia mobile phones, enables the user to listen to an integrated FM stereo radio and to downloadable protected MPEG audio/music files (MP3- and AAC-formatted; i.e., it is content-type-specific) secured with InterTrust digital rights management technology. The InterTrust system is designed so that a secure channel is not required except for privacy reasons. The content is protected via a secure container called DigiBox. DigiBox enables the association of rules and controls that specify the content usage rules and the consequences of usage via cryptographic means. DigiBox is manipulated by a trusted rights protection application, which makes the protected content available according to its associated access control rules. It is one of the most popular distributed DRM systems today. Such a secure-container-based MDRM system allows MDRM components to be integrated with almost any type of system, architecture, or network topology, and it can provide both strong and lightweight security levels. The drawback is that it requires tamper-resistant hardware to realize secure processing of the protected content.
11.3.1.2 NEC VS-7810—Centralized DRM

NEC VS-7810, an MDRM system designed to enable secure delivery of information to mobile terminals, offers the flexibility to be incorporated into a broad variety of systems and architectures. VS-7810 is a "ticket"-based system in which users purchase a "ticket" (decipher key) to access or make use of protected content. Figure 11.4 illustrates content delivery using the VS-7810 MDRM. Unlike secure-container-based systems, this centralized system does not require tamper-resistant hardware on the client side, although the mobile terminal has to be Java-enabled. In general, this kind of system is content-type-nonspecific, and it is feasible to incorporate it into different systems and architectures. One of the
Figure 11.4 VS-7810 MDRM content delivery diagram.
disadvantages of such a centralized system, however, is the cost of maintaining accounts on the server(s) for each and every user, together with its limited scalability.

11.3.1.3 Integrated Model

Helix is another typical example of a distributed MDRM system based on secure container technology; it is platform-independent. Readers can consult Table 11.1 for the differences between Helix and VS-7810 and between Helix and Nokia Music Player. Helix DRM is a complete, end-to-end secure digital delivery platform that consists of four plugin components: packager, license server, client, and universal server DRM. The packager uses strong encryption algorithms and secure container technology to prevent unauthorized use of content and to prepare content for distribution via streaming, download, or other delivery methods. With true superdistribution capability, the packaged media content and the associated business rules for unlocking and using that content are stored separately, so that multiple sets of business rules can be applied to a single file over time. Helix DRM supports RealAudio, RealVideo, MP3, MPEG-4 video, H.263 video, AAC audio, and narrowband AMR audio. The license server verifies content licensing requests, issues content licenses to trusted, authenticated Helix DRM end-user clients, and provides auditing information to facilitate royalty payments. The client enables download and streaming playback of secure formats in a tamper-resistant environment based on the usage rules specified by the content owners. The media-player-specific plugin enables streaming of protected media from the Helix Universal Server to a client terminal with no tamper-resistant hardware. We call this kind of DRM model, which combines different schemes, an integrated model.
Since a single MDRM scheme alone may not provide a sufficiently broad solution for all mobile business requirements, an integrated system is expected to cover a wider range of business requirements, improve user convenience, and offer better rights control flexibility along with enhanced system scalability and upgradability. A representative example in this category is the Beep Science Mobile DRM system (illustrated in Fig. 11.5 and listed in Table 11.1), which comprises player DRM, terminal DRM, and network DRM models and includes the content policy system (CPS), a server-side solution that enables the operator to act as a payment collector for its own and partners' premium content and ensures that copyright restrictions are enforced; the content control engine (CCE), a high-capacity, real-time infrastructure node that activates content protection during download; the policy enforcement server (PES), a real-time component for executing the copyright and charging policies; the license server (PCR), a backend system for managing policies and rights for premium content and services; and the rights issuer server (RIS), a system that supports content superdistribution.

11.3.1.4 MDRM Requirements

The sample cases we have discussed provide some insight into the design of MDRM systems. Essentially, MDRM systems aim to provide usage rights management to establish trust for secure content distribution to, and access through, mobile
Figure 11.5 Beep Science Mobile DRM system components.
devices. From a technological point of view, the content provider's rights, user convenience and privacy, and the device's mobility and capability must all be taken into consideration when designing an MDRM system. The basic requirements for an MDRM system therefore include:

1. Security. The MDRM system must be robust against various attacks and be able to prevent illegitimate usage and unauthorized distribution of rights-protected digital content. Furthermore, it should ensure user privacy.

2. Scalability. The MDRM system must be scalable to handle dynamic communication channels, diverse mobile device capabilities, various types of digital content, and distinct rights and usage rules issued by different issuers with different security and cost requirements.

3. Usability. This involves ease of use, mobility, cost, and compliance.
   Ease of use. The system should not sacrifice user convenience, and it should bring attractive services to end users.
   Mobility. The system should not reduce mobility or increase usage complexity because of mobility.
   Cost. This includes implementation and operation costs for both content provider and end user and is an important factor for successful MDRM systems. The system must be feasible given mobile devices' processing power, storage, and battery power.
   Compliance. The system should be compliant with the existing network infrastructure and the various standard formats.

Scalability and mobility are crucial for mobile DRM. In the next section, we consider some emerging security technologies that enable scalability and mobility in the design of practical MDRM systems.
11.3.2 State-of-the-Art MDRM Component Technologies
It is clear that MDRM technology should provide increased security to ensure the authenticity and integrity of both content and rights. The sample systems discussed in the previous section use encryption, watermarking, fingerprinting, and authentication technologies to provide secure DRM services. In this section we look at several emerging security technologies intended to provide more adequate content protection for seamless content distribution in the mobile environment. We discuss the following topics: scalable and format-compliant encryption schemes, public key watermarking systems, scalable watermarking for authentication and error recovery, and efficient key management for multicast in the mobile environment.

11.3.2.1 Scalable and Format-Compliant Encryption for Multimedia

In a traditional communication system, the encoder compresses the source media into a fixed bit rate that may be equal to or less than the channel capacity and sends it to the receiver. Under the assumption that the receiver can obtain and decode all the bits in time, it reconstructs the media using all the bits received. Similarly, in a traditional encryption system, the sender encrypts each and every bit of a message and sends it to the receiver, where it is decrypted at the same rate at which it was encrypted. This often requires correct reception of each and every bit of the encrypted message to prevent subsequent plaintext from being improperly decrypted. In such a system, at least three assumptions are made: (1) the channel capacity is known, (2) the receiver can receive all the bits correctly in time for decryption, and (3) the receiver is capable of reconstructing the media in time.
Those assumptions are often violated for multimedia data streams (MDS) over wireless networks because of the large size of MDS, the varying and possibly low wireless network bandwidth, the instability (error-proneness) of the wireless channel, the diversity of mobile receivers' processing power and storage space, and the high computational complexity of many encryption algorithms. These conditions demand scalable and flexible approaches capable of adapting to changing network conditions as well as device capabilities. Figure 11.6 illustrates streaming media over various networks to various devices. The time constraints of real-time and streaming media make it particularly crucial to offer scalability for secure wireless multimedia distribution. In the following text we discuss state-of-the-art multimedia encryption schemes designed to meet some of these challenges. As a foundation, we briefly discuss the design of a scalable system for nonsecure multimedia communication.

Scalable Access to Nonsecure Multimedia: Three Basic Schemes

Simulcast as well as scalable and fine-grained scalable (FGS) [25–27] compression algorithms have been proposed and widely adopted to provide scalable access to cleartext multimedia, with interoperability between different services and flexible support for receivers with different device capabilities. With simulcast, multiple bitstreams at multiple bit rates are generated for the same content. With scalable coding, a
Figure 11.6 Streaming media over various networks to various devices.
bitstream is partially decodable at a given bit rate. With FGS coding, the bitstream is partially decodable at any bit rate within a bit rate range, reconstructing the medium signal with optimized quality at that bit rate. In most applications FGS is more efficient than simulcast [27] and is by far the most efficient scheme offering continuous scalability.

Selective Encryption

If a media datastream is encrypted using nonscalable cryptography algorithms, decryption at an arbitrary bit rate to provide scalable services can hardly be accomplished. If a medium compressed using scalable coding needs to be protected and nonscalable cryptography algorithms are used (i.e., the bitstream is encrypted uniformly), the advantages of scalable coding may be lost. To resolve this problem, selective encryption may be used [28–34]; a number of selective encryption algorithms have been proposed since the late 1990s. The idea is to encrypt only part of the multimedia datastream, often with lightweight encryption algorithms, to achieve a level of protection suitable for the application; that is, only some parts of the entire bitstream are encrypted, while the rest is left in the clear. A general selective encryption mechanism works as follows (see also Fig. 11.7). First, an encryption algorithm Enc appropriate for the security requirements of the target application is chosen. Then "selection" is performed by partitioning the source object (the multimedia plaintext bitstream) O into two bitstreams O1 and O2. "Encryption," encrypting only the bitstream O1 of the two, is done next. Let O = O1 + O2; the target object after encryption is then O' = EncK(O1) + O2. If the multimedia data are to be compressed, the compression operation is performed either first or at the same time as the encryption operation. One interesting reason for the latter approach is to support format-compliant encryption, which we discuss later in this section.
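The partition-and-encrypt step can be sketched as follows. This is illustrative only: the hash-counter keystream stands in for whatever cipher Enc the application chooses, and the byte-level selection predicate stands in for a real bitstream parser.

```python
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    # Counter-mode keystream derived from SHA-256; a stand-in for EncK.
    out, ctr = bytearray(), 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def selective_encrypt(obj: bytes, key: bytes, select) -> bytes:
    """Partition O into O1 (positions where select(i) is true) and O2,
    encrypt only O1, and reassemble: O' = EncK(O1) + O2."""
    o1 = bytes(b for i, b in enumerate(obj) if select(i))
    enc = iter(bytes(b ^ k for b, k in zip(o1, _keystream(key, len(o1)))))
    return bytes(next(enc) if select(i) else b for i, b in enumerate(obj))

# XOR with the same keystream is its own inverse, so the same call decrypts.
```

For example, select = lambda i: i % 4 == 0 encrypts one byte in four (the "important" partition), leaving the rest of the bitstream in the clear.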
Using MPEG video as an example, the most popular approach exploits the MPEG layered structure to selectively encrypt only a portion of the bitstream. Basic algorithms include header encryption, sign bit encryption, and I-frame encryption, where only the headers, the sign bits, the I-frames, or a combination
Figure 11.7 Symmetric-key selective encryption: general architecture.
of them are encrypted and the rest are left in the clear. Most of these are lightweight solutions that provide a low level of security. Interested readers can consult Refs. 28 and 30 for details of the security analysis and possible attacks.

Format-Compliant Encryption

When a bitstream is scrambled, the original format of the bitstream may be compromised if care is not taken. This is especially serious for compressed multimedia datastreams: if scrambling destroys certain inherent structure, compression efficiency can be compromised. Consider a simple example. Assume we encrypt only the I-frames of MPEG video using intrablock DCT coefficient shuffling; that is, we shuffle the DCT coefficients within each DCT block. Notice that MPEG uses several characteristics of DCT coefficients when encoding a coefficient block. For instance, typical video frames often have many coefficients that are zero-valued, especially after requantization; the high-frequency AC coefficients are likely to be small or zero, and it is highly probable that a cluster of consecutive AC coefficients are all zero. MPEG effectively uses this property by sending the coefficients in an optimum order, describing their values with Huffman coding, and using run-length encoding for the zero-valued coefficients to achieve a significant reduction in bit rate. Now assume a low-bit-rate video transmission over a wireless network. As a result of shuffling, some clustered zero coefficients may be shuffled apart, increasing the bit rate. To guarantee full compatibility with any decoder, the bitstream should be altered only at places where doing so does not compromise compliance with the original format. This principle is referred to as format compliance.
The design of format-compliant encryption algorithms has to take into account the following factors: security requirements, the original format of the bitstream and its coding structure, computational overhead, bit rate, bitstream partition capability, error resilience capability, channel adaptation capability, and the tradeoffs between each pair. Wen et al. [35] gave a general framework for format-compliant encryption. They pointed out in particular that encrypting a variable-length coding (VLC) codeword may not result in another valid codeword, and hence designed a more suitable
algorithm that solves this problem by encrypting the indices of the codewords instead. The algorithm is as follows:

1. Create a bitstream partition (i.e., extract the bits that are important).
2. Concatenate the extracted bits.
3. Choose a public key or a private key encryption algorithm, such as DES or AES.
4. Encrypt the concatenated bits. For VLC-coded bitstreams, instead encrypt the indices of the codewords in the code table, and then map the result back to codewords in the code table.
5. Put the encrypted bits back into their original positions.

In their simulations they focus on the MPEG-4 video error-resilient mode with data partitioning and discuss which fields can be encrypted and which should not be.

Progressive Encryption

The selective encryption algorithms discussed above can often be easily modified to offer two-layer scalability. For instance, if the original bitstream is partitioned into a base layer and an enhancement layer, as in scalable coding, one can selectively encrypt the base layer and leave the enhancement layer in the clear. This often provides minimal security; if a higher level of security is needed, enhancement layer encryption has to be added. Two-layer scalability can be preserved if the base layer and enhancement layer are encrypted separately. To provide fine-grained scalability, however, many selective encryption algorithms have to be modified, since they are not specifically designed to be compatible with FGS coding; FGS-compatible encryption algorithms have to be used. Imagine a medium compressed using FGS coding but encrypted with a non-FGS-compatible encryption algorithm; in such a case the advantages of FGS coding will be lost. If an uncompressed medium is encrypted with non-FGS-compatible schemes and transmitted through a dynamic channel to various unknown receivers, reconstruction with optimized quality will be difficult to achieve. To provide FGS scalability, progressive encryption may be adopted [36].
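The portion-by-portion idea of progressive encryption can be sketched with a chained keystream cipher. The chunk size and the SHA-256 chaining below are illustrative stand-ins for the cipher block chaining or stream cipher constructions the text mentions:

```python
import hashlib

CHUNK = 16  # illustrative portion size

def _ks(key: bytes, n: int) -> bytes:
    # counter-mode keystream derived from SHA-256
    out, ctr = bytearray(), 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def progressive_encrypt(stream: bytes, key: bytes) -> bytes:
    # The key for each portion is chained from the ciphertext of the
    # previous portion, so any prefix of the ciphertext is decryptable
    # on its own (progressive/partial decoding).
    out, k = bytearray(), key
    for i in range(0, len(stream), CHUNK):
        block = stream[i:i + CHUNK]
        ct = bytes(b ^ x for b, x in zip(block, _ks(k, len(block))))
        out += ct
        k = hashlib.sha256(k + ct).digest()   # chain to the next portion
    return bytes(out)

def progressive_decrypt(prefix: bytes, key: bytes) -> bytes:
    # Works on any truncation of the ciphertext at a chunk boundary.
    out, k = bytearray(), key
    for i in range(0, len(prefix), CHUNK):
        ct = prefix[i:i + CHUNK]
        out += bytes(c ^ x for c, x in zip(ct, _ks(k, len(ct))))
        k = hashlib.sha256(k + ct).digest()
    return bytes(out)
```

On bitstream truncation, the receiver simply decrypts and decodes as much of the prefix as it received, which is exactly the behavior FGS-style delivery needs.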
Given a bitstream S, the encryption operation is performed portion by portion, most often with a later portion encrypted based on an earlier portion, to allow partial decryption of the encrypted bitstream progressively. Using cipher block chaining or a stream cipher, a multimedia bitstream may be encrypted with progressive decoding capability, allowing partial decoding with optimized quality. One variation is to encrypt the base layer independently and the enhancement layer with progressive encryption that is either dependent on or independent of the base layer. Under the assumption that the base layer will always be received without error, on bitstream truncation, partial decryption and hence decoding of the enhancement layer will provide sub- or near-optimized quality at a given bit rate.

Discussion

Scalable cryptography schemes offer a means for multilevel access control of multimedia content. A traditional security system commonly offers two
states: access authorization and access denial. With scalable cryptography algorithms, a single encrypted bitstream can support multiple levels of access authorization and hence multiple levels of rights management capability. For example, in a four-level access-controllable DRM system, we may specify access denial; general preview access authorization, which allows a preview of a sample or part of the movie in low resolution; high-quality preview access authorization, which grants club members a sneak preview of the content in high resolution; and full-content, full-quality access authorization. As an active research area, many issues need to be investigated to provide the best performance, security, scalability, and interoperability. The following are some of the issues to be taken into account when designing a scalable encryption algorithm for a DRM system:

- Minimum and maximum security levels: encryption should provide an adequate security level for a given application.
- System upgradability and renewability: the scheme should be easy to upgrade to a different encryption algorithm that provides a higher level of security for a different application, and should be easily renewable.
- Bit rate: encryption should preserve the size of the original bitstream.
- Computational complexity: encryption should add minimal decoding overhead, appropriate to the decoding device's processing power.
- Error resilience capability: encryption should not lower the error resilience capability of the original system.
- Channel adaptation capability: encryption should not compromise the original bitstream's channel adaptation capability and should preserve capabilities such as transcoding.
- Format-compliance capability: encryption should not compromise compliance with the original format.
- Tradeoff between security and computational complexity.
- Tradeoff between security and scalability.
- Tradeoff between security and bit rate or coding efficiency.
- Tradeoff between security and error resilience capability.
- Tradeoff between security and interoperability.
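The multilevel access control idea can be sketched by encrypting each quality layer under its own key, so that an access level is simply the subset of layer keys a user is licensed to hold. The layer names, the XOR keystream cipher, and the key-per-layer policy below are all illustrative assumptions:

```python
import hashlib

LAYER_ORDER = ["preview_low", "preview_high", "full"]   # hypothetical levels

def _xor(key: bytes, data: bytes) -> bytes:
    # counter-mode keystream cipher; XOR makes it self-inverse
    out, ctr = bytearray(), 0
    while len(out) < len(data):
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(b ^ k for b, k in zip(data, out))

def protect(layers: dict, keys: dict) -> dict:
    # one independent key per layer of the single encrypted bitstream
    return {name: _xor(keys[name], data) for name, data in layers.items()}

def render(protected: dict, granted: dict) -> list:
    # Decode layers in quality order, stopping at the first locked layer;
    # the user's access level is determined purely by the keys granted.
    out = []
    for name in LAYER_ORDER:
        if name not in granted:
            break
        out.append(_xor(granted[name], protected[name]))
    return out
```

Granting no keys models access denial, a subset of keys models the preview levels, and the full key set models full-quality access, all from the same encrypted bitstream.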
11.3.2.2 Public Key Watermarking Systems

Public Key versus Private Key Watermarking Schemes

In classic content protection systems using secure digital watermarking, private key (also called symmetric-key-based) schemes are used. That is, the key used for embedding, Ken, and the key used for retrieval, Kde, of the watermark w are identical: Ken = Kde. In other words, decoding is not public: it makes use of the encoding key Ken to detect the embedded
watermark. The decoder has all the critical information about the watermark needed for correct watermark decoding. This implies that the decoder has to be trusted and highly secure. For applications that cannot meet these conditions, private-key-based schemes potentially allow the embedded watermark to be destroyed or damaged via watermark subtraction, and fake watermarks to be embedded with the same key Ken. Hence, public-key-based watermarking schemes have been studied, in which the encoder key Ken and the decoder key Kde are different (Ken ≠ Kde) and the encoding key Ken cannot be calculated from the decoding key Kde. Public-key-based algorithms are also called asymmetric key algorithms. Figure 11.8 illustrates general private key (a) and public key (b) watermarking schemes.

Some Public Key Watermarking Algorithms

Hartung and Girod [37] proposed one of the first public key robust watermarking algorithms for copy protection applications. It is designed especially for spread-spectrum watermarking schemes. The pseudorandom sequence Pen used for watermark embedding is partitioned, and a part of it, together with arbitrary random values replacing the rest of Pen, is used for watermark decoding; that is, a partial encoding key is used for decoding. Let Pen = P1 + P2, and assume that P0 ≠ P2 and that the length of P0 equals that of P2, |P0| = |P2|. Then Pde = P0 + P1. In the paper they suggest that on average every nth coefficient, n > 2, is used to construct the decoding pseudorandom sequence Pde. It is easy to see that the decoding key is a function of the encoding key, and care has to be taken that Pen cannot easily be calculated from Pde. Various public key watermarking algorithms have been proposed since then. The one-way signal processing [38], Legendre sequence [39], and eigenvector [40] public key watermarking algorithms are several well-known ones.
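The partial-key detection idea in Hartung and Girod's scheme can be sketched numerically as follows. The signal length, embedding strength, and even/odd split are illustrative choices, not the parameters of Ref. 37:

```python
import random

N, ALPHA = 4000, 1.0
rng = random.Random(1)
host = [rng.gauss(0.0, 1.0) for _ in range(N)]        # host signal x
pen = [rng.choice((-1.0, 1.0)) for _ in range(N)]     # embedding key Pen
marked = [x + ALPHA * p for x, p in zip(host, pen)]   # spread-spectrum embedding

# Decoding key Pde: keep only part of Pen (here the even positions) and
# fill the rest with fresh random chips, so Pde reveals only half of Pen.
rng2 = random.Random(2)
pde = [p if i % 2 == 0 else rng2.choice((-1.0, 1.0))
       for i, p in enumerate(pen)]

def detected(signal, key, threshold=N / 4.0):
    # Correlation detector: the shared half of the chips contributes
    # roughly ALPHA * N/2, far above the noise floor.
    return sum(s * k for s, k in zip(signal, key)) > threshold
```

With sizes of this order, detection with the partial key succeeds on the marked signal but not on the unmarked host, even though pde contains only half of the embedding key.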
Unlike Hartung's algorithm, these schemes do not require partial knowledge of the embedding key Ken, which is either a function of the watermark w or equal to w. Furon and Duhamel [38] make use of the power density spectrum (PDS) function: a permutation is performed on the host signal x such that the PDS of x is flat.

Figure 11.8 General (a) private key and (b) public key watermarking schemes.

Public watermark
detection is based on the specific shape of the PDS of the received signal x′. In contrast, van Schyndel et al. [39] described an asymmetric watermarking scheme based on a length-N Legendre sequence used as the watermark w. The watermark is detected at the decoder using the correlation between the received signal x′ and its conjugate Fourier transform. Eggers et al. [40] extended the idea of van Schyndel et al. [39]: assume an N × N matrix A and a watermark vector w with Aw = λ0·w. The watermark is detected by correlating the received signal x′ with its transformed signal Ax′. Since the embedding key is not needed for watermark detection at the decoder, these schemes are considered public key watermarking schemes. Eggers et al. [41] and Craver and Katzenbeisser [42] discussed the vulnerability of existing public key watermarking schemes; it was found that none of the aforementioned schemes is sufficiently robust against malicious attacks. Will public key watermarking lead to secure public watermark detection? The challenge is to detect the presence of the watermark (1) using a (public) key that is different from the one used for embedding the watermark, (2) without revealing enough information to destroy the watermark, (3) without making it possible to forge a valid watermark for a different signal or digital object, and (4) with decoding being computationally feasible. Perhaps the development of a theoretical foundation for public key watermarking schemes is needed. Public key fragile watermarking schemes have also been proposed. Basically, an asymmetric algorithm is used to encrypt the hash value h = H(O) to generate a digital signature S = EncKen(h). A public key is used to decrypt the received digital signature S′. Letting ĥ = DecKde(S′) and h′ = H(O′), the authenticity of the received object O′ is verified if ĥ = h′.
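The signature-based verification just described can be sketched with textbook RSA over deliberately tiny primes. This is for illustration only: a real system would use a standard signature scheme, and where to embed S in the object is a separate question.

```python
import hashlib

# Toy RSA key pair (insecure sizes, illustration only).
p, q = 1009, 1013
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (Python 3.8+)

def H(obj: bytes) -> int:
    # hash of the object, reduced into the RSA modulus range
    return int.from_bytes(hashlib.sha256(obj).digest(), "big") % n

def sign(obj: bytes) -> int:
    # S = EncKen(h): the embedder signs the hash with the private key
    return pow(H(obj), d, n)

def verify(obj: bytes, sig: int) -> bool:
    # ĥ = DecKde(S′) via the public key; authentic iff ĥ == h′ = H(O′)
    return H(obj) == pow(sig, e, n)
```

Any modification of the object changes h′ and breaks the check, which is exactly the fragile (tamper-detecting) behavior wanted here.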
Studies of public key fragile watermarking schemes have focused mainly on how to generate the authentication value, where to embed the digital signature, and which public key encryption algorithm to use for a tamperproof system. Barreto et al. [43] show that using a nondeterministic signature, where each individual signature depends not only on the hashing function but also on some randomly chosen parameters, is more secure than previous approaches using deterministic signatures, such as those proposed in Refs. 44 and 45.

A Dual Watermarking–Fingerprinting System

An ideal asymmetric key watermarking scheme with a single embedding key Ken and a single decoding key Kde ≠ Ken shall limit an adversary's ability to recreate the original content from the watermarked content. This, however, may not be enough for certain applications, such as multicast applications where a server sends the object to multiple clients: if the adversary fully controls any client, it can interfere with the communication from the server to all clients. To solve this problem, Kirovski et al. [46] proposed a public key algorithm that provides multiple distinct decoding keys, all different from the encoding key, which is more suitable for multicast applications. The algorithm is likewise designed for spread-spectrum-based digital watermarking schemes. Let C = {c_ij} denote an m × N matrix, where each c_ij ∈ R is a zero-mean random variable with standard deviation σ. The ith watermark decoding key is generated as Kde_i = Ken + c_i, where the encoder key Ken is hidden in every copy of the different decoding keys Kde_i such
that knowledge of K_de^i does not deterministically imply knowledge of K_en, as long as σ is large enough. An application of this public key algorithm to a dual-watermarking/fingerprinting system has also been described [46]. The watermark detector is straightforward: the received (test) signal x′ is correlated with K_de^i, and the detector declares the signal marked if D_w = x′ · K_de^i > δ, a detection threshold. The fingerprint detector is used to detect whether the ith client is part of a collusion. Unlike the watermark detector, the fingerprint detector has access to the watermark carrier matrix C. It computes the correlation between c_i, the suspect watermark carrier, and (x′ − x): D_i = (x′ − x) · c_i, where x is the originally marked signal, to detect the compromised decoding key. Based on their study, the system achieves content protection with collusion resistance of up to 900,000 users for a 2-hour high-definition video.
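The two detectors just described can be sketched numerically. The signal model, sizes, and thresholds below are invented for illustration; real systems operate on perceptual transforms of the content, not raw Gaussian samples.

```python
import random

random.seed(1)
N, M = 10_000, 8       # signal length and number of clients (toy sizes)
SIGMA = 5.0            # carrier std dev; large sigma hides K_en inside K_de^i

k_en = [random.gauss(0, 1) for _ in range(N)]              # encoding key K_en
carriers = [[random.gauss(0, SIGMA) for _ in range(N)]     # rows c_i of C
            for _ in range(M)]
k_de = [[k + c for k, c in zip(k_en, ci)]                  # K_de^i = K_en + c_i
        for ci in carriers]

def corr(a, b):
    """Normalized correlation between two equal-length signals."""
    return sum(x * y for x, y in zip(a, b)) / N

host = [random.gauss(0, 1) for _ in range(N)]   # original content (assumed model)
marked = [x + k for x, k in zip(host, k_en)]    # spread-spectrum embedding

# Watermark detector: any client's decoding key correlates with the mark.
print(corr(marked, k_de[0]) > 0.5)    # True: marked copy detected
print(corr(host, k_de[0]) > 0.5)      # False: unmarked copy

# Suppose client 3's key is compromised and used to strip the mark.
attacked = [a - k for a, k in zip(marked, k_de[3])]

# Fingerprint detector: it knows C and the marked signal x, and correlates
# (x' - x) with each carrier c_i to find the compromised key.
diffs = [a - m for a, m in zip(attacked, marked)]
suspects = [i for i in range(M) if abs(corr(diffs, carriers[i])) > 5.0]
print(suspects)                       # [3]
```

Note the asymmetry the scheme relies on: the watermark detector needs only a single decoding key, while the fingerprint detector needs the full carrier matrix C and the originally marked signal.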
11.3.2.3 Efficient Key Management for Multicast in the Mobile Environment
Key management can be the weakest point of a digital rights management system. Mobility complicates multicast key management: in addition to supporting dynamic group membership, the system must allow members to move between networks while staying in the same session. The mobility of a group member across different time zones adds a further dimension of complexity. Additional security threats include unauthorized creation, alteration, destruction, and illegitimate use of content by a mobile member who accumulates information for each area that he or she visits. Efficient key management and distribution can address many security concerns for mobile content multicast. In general, the following criteria are commonly used to evaluate the efficiency of key management schemes:
. Scalability—the ability to handle dynamic membership (scale group size) without considerable performance deterioration
. Rekeying efficiency—affected by the frequency of rekeying, the data transmission delay due to rekeying, the number of members affected, and the total number of key update messages that need to be sent
. Storage requirement—the memory overhead for key management at mobile terminals and group managers
Hierarchical structures are widely used for scalable multicast services. In general, there are two types of scalable rekeying solutions, namely, approaches based on a logical key hierarchy (LKH) and those based on a group management hierarchy (GMH). Partitioning a multicast group into subgroups localizes the effect of membership changes and allows differentiated key management techniques. When members are mobile, hierarchical group key management works better than the centralized group key management used in LKH-based approaches. In hierarchical group management, several subgroup managers or area key distributors are used so that mobile members can get access to new group keys as long as they are "near" one of the subgroup
managers. The use of hierarchical key management adds a certain complexity to the key management system. For instance, the mobility of a subgroup manager or area key distributor may result in substantial changes in the hierarchical topology, which must be addressed to ensure end-to-end performance. One common drawback of hierarchical frameworks is the substantial resource overhead required to manage the multicast group. Several interarea key distribution algorithms for group key management in wireless and mobile environments have been introduced in the IETF [47]:
. Baseline Rekeying. This treats mobility across areas as a "leave" from the old area followed by a "join" to the new area. The disadvantage is that data transmission is unnecessarily interrupted twice during a transfer between areas, because the system does not distinguish a leaving member from a moving member. Hence the mobility of a single member affects all the members in the domain.
. Immediate Rekeying. This extends the baseline algorithm by adding explicit semantics for a handoff between areas. Each area updates the local area key, but not the data key, on receiving notification of a member's move. Data transmission is uninterrupted.
. Delayed Rekeying. This reduces repeated local rekeying by postponing local rekeying until a particular criterion is satisfied. Members moving between areas may accumulate multiple area keys and reuse them upon returning. An "extra key owner list" is maintained by each area key distributor to check against those returning members. The impact of member mobility is reduced at the cost of increased semantics.
A sample k-ary key management tree is illustrated in Figure 11.9, where the multicast group has nine members and three hierarchies.
11.3.2.4 Multimedia Content Verification and Error Concealment
Mobile content distribution is subject to error-prone communication, a characteristic of wireless networks.
The task of authentication is to verify the authenticity of the target object or bitstream. Conventional cryptographic systems provide complete verification, such that even a one-bit change gives a "negative" outcome. To provide seamless content access, transcoding, which preserves the content but not the bits, is often used. For this kind of application, it is obvious that complete verification cannot provide sufficient authentication capability. Furthermore, with error-prone communication for real-time and streaming media applications, where transmission time is critical, retransmission for error-free delivery is impossible. Content or semantic verification with error recovery capability offers a moderate level of security for such applications in mobile communication. First, features C = f(O), which preserve the important original content information, are extracted from the original multimedia bitstream. Next, C is authenticated using secure cryptographic algorithms, D = Enc_K(h) = Enc_K(H(C)), to generate a digital signature D with key K. If D is used for authentication, content verification
Figure 11.9 A k-ary key management tree for nine members with three hierarchies (root key Kroot; subgroup keys KB, Kb, and Kc; individual keys K1–K9 for members M1–M9).
instead of complete verification is performed. Assume that C′ is extracted from the received bitstream O′, and that D′ = Enc_K(h′) = Enc_K(H(C′)). If D = D′, then O′ is said to be content-authentic. Ideally, C should be invariant to any content-preserving transformations and robust against a certain amount of bit errors. At the same time, C should be sensitive to any content-varying alterations. One interesting approach is to use digital watermarking to achieve authentication and error concealment at the same time:
. Assume the source object (multimedia plaintext bitstream) O = O1 + O2 + ... + OM.
. Let Ci = f(Oi) be an invariant feature of Oi.
. Authenticate Ci to get Di = Enc_K(h) = Enc_K(H(Ci)).
. Embed Di into Oj, with i ≠ j and the ith and jth subbitstreams sufficiently far apart that the probability of both Oi and Oj being erroneous is low.
. At the decoder, the extracted Di is used to verify the authenticity of the received subbitstream O′i and to recover some significant content information of Oi in the event that Di ≠ D′i = Enc_K(h′) = Enc_K(H(C′i)). Theoretically, if the watermarking scheme is sufficiently robust, the extracted Di equals the embedded Di.
Most of the content verification schemes in the literature, with or without error concealment capability, explore low-level features extracted from a single domain, such as the frequency domain. Interested readers can refer to the literature [48–50]. What makes a feature C theoretically invariant and/or practically robust enough for this application remains an active research area. Lin and Tseng [51] studied semantic-based video authentication using high-level features. A series of advanced multimedia technologies, processes of video shot and object segmentation, label annotation, concept modeling, classification,
watermarking, and digital signatures using a public key infrastructure (PKI), are deployed to combine semantic learning and security for semantic authentication. For multimedia applications in which the content may undergo various transformations that maintain the semantic meaning of the information but modify the bit representation, it is imperative to tie authentication to the semantics of the data; otherwise, routine processing of the content may interfere with DRM protection.
11.3.3 Summary
DRM in a mobile computing environment is challenging because of (1) vulnerability to many types of attacks, such as intrusion and eavesdropping; (2) error-prone communication channels; (3) insufficient network bandwidth; (4) dynamic changes in mobile device location; and (5) the limited processing power, memory, and battery lifespan of mobile devices, which can make strong encryption schemes computationally infeasible. DRM solutions defined in many standards and currently available products should not be considered completely secure, if in fact complete security is ever needed for commercial viability. Indeed, many systems today are designed on the basis of the mantra "keep honest people honest." Although a content protection system for DRM, no matter how strong or sophisticated, will always be vulnerable to some degree of attack from hackers and pirates, it must be sufficiently robust that it can protect the commercial value of the content from widespread attacks, as well as encourage appealing business models for all parties in the distribution chain. Thus, there still exist a great many technical, legal, and business challenges in adopting DRM. However, we believe that early discussion, design, development, and deployment in the context of mobile content delivery will ultimately ensure the existence of MDRM solutions at the core of all digital content services, enabling seamless convergence of digital content access across all networks and devices and providing reliable, convenient, and secure distribution for the fair benefit of all.
REFERENCES

1. W. Stallings, Network Security Essentials: Applications and Standards, Prentice-Hall, 2000.
2. IBM Denial-of-Service—Alert and Response, http://www-1.ibm.com/services/continuity/recover1.nsf/mss/DoS.
3. R. Iannella, Digital rights management (DRM) architectures, D-Lib Mag., 7(6) (June 2001).
4. J. Duhl and S. Kevorkian, Understanding DRM Systems, IDC Whitepaper, Technical Report, Intertrust, 2001.
5. AES Home Page, http://csrc.nist.gov/CryptoToolkit/aes/.
6. M. Schneider and S.-F. Chang, A robust content based digital signature for image authentication, Proc. IEEE Int. Conf. Image Processing, 1996, Vol. 3, pp. 227–230.
7. D. Kundur and D. Hatzinakos, Digital watermarking for telltale tamper-proofing and authentication, Proc. IEEE (special issue on identification and protection of multimedia information), 1167–1180 (July 1999).
8. J. A. Bloom, I. J. Cox, T. Kalker, J.-P. Linnartz, M. L. Miller, and C. B. S. Traw, Copy protection for DVD video, Proc. IEEE (special issue on identification and protection of multimedia information), 1267–1276 (July 1999).
9. A. Steinacker, A. Ghavam, and R. Steinmetz, Metadata standards for web-based resources, IEEE Multimedia, 8(1): 70–76 (Jan.–March 2001).
10. N. Paskin, Toward unique identifiers, Proc. IEEE (special issue on identification and protection of multimedia information), 1208–1227 (July 1999).
11. J. Feigenbaum, M. J. Freedman, T. Sander, and A. Shostack, Privacy engineering for digital rights management systems, Proc. ACM CCS-8 Digital Rights Management Workshop, 2001, pp. 76–105.
12. Y. Chawathe, Scattercast: An Architecture for Internet Broadcast Distribution as an Infrastructure Service, Ph.D. thesis, Univ. California, Berkeley, Dec. 2000.
13. M. Sreetharan and R. Kumar, Cellular Digital Packet Data, Artech House, 1996.
14. A. O. Waller, G. Jones, T. Whitley, J. Edwards, D. Kaleshi, A. Munro, B. MacFarlane, and A. Wood, Securing the delivery of digital content over the Internet, IEE Electron. Commun. Eng. J., 14(5): 239–248 (Oct. 2002).
15. R. Mori and M. Kawahara, Superdistribution: The concept and the architecture, Trans. IEICE (special issue on cryptography and information security), E73(7) (July 1990).
16. D. M. Wallner, E. G. Harder, and R. C. Agee, Key Management for Multicast: Issues and Architecture, Technical Report, Internet Draft, Sep. 1998.
17. C. K. Wong and S. S. Lam, Digital signatures for flows and multicasts, IEEE/ACM Trans. Network., 7(4): 502–513 (Aug. 1999).
18. R. Canetti, J. Garay, G. Itkis, D. Micciancio, M. Naor, and B. Pinkas, Multicast security: A taxonomy and some efficient constructions, Proc. IEEE INFOCOM, March 1999, Vol. 2, pp. 708–716.
19. B. Pinkas, Efficient state updates for key management, Proc. ACM CCS-8 Digital Rights Management Workshop, 2001, pp. 40–56.
20. DWS, www.dwsco.com.
21. JTC1/SC29/WG11 N3943, Intellectual Property Management and Protection in MPEG Standards, Technical Report, ISO/IEC, Jan. 2001.
22. G. Kesden, Introduction on Content Scrambling System, Technical Report, CMU, Dec. 2000.
23. JTC1/SC29/WG11 N4269, Information Technology—Multimedia Framework (MPEG-21) Part 4: Intellectual Property Management and Protection, Technical Report, ISO/IEC, July 2001.
24. ContentGuard XrML 2.0 Technical Overview, http://www.xrml.org.
25. R. Aravind, M. R. Civanlar, and A. R. Reibman, Packet loss resilience of MPEG-2 scalable video coding algorithms, IEEE Trans. Circuits Syst. Video Technol., 6(10) (Oct. 1996).
26. H. Gharavi and M. H. Partovi, Multilevel video coding and distribution architectures for emerging broadband digital networks, IEEE Trans. Circuits Syst. Video Technol., 6(10) (Oct. 1996).
27. W. Li, Overview of fine granularity scalability in MPEG-4 video standard, IEEE Trans. Circuits Syst. Video Technol., 11(3) (March 2001).
28. I. Agi and L. Gong, An empirical study of MPEG video transmissions, Proc. Internet Society Symp. Network and Distributed System Security, San Diego, CA, Feb. 1996.
29. I. Agi and L. Gong, Security enhanced MPEG player, Proc. IEEE 1st Int. Workshop on Multimedia Software Development, March 1996.
30. L. Qiao and K. Nahrstedt, Comparison of MPEG encryption algorithms, Int. J. Comput. Graph., 22(3) (1998).
31. C. Shi, S.-Y. Wang, and B. Bhargava, MPEG video encryption in real-time using secret key cryptography, Proc. Int. Conf. Parallel and Distributed Processing Techniques and Applications, 1999.
32. C.-P. Wu and C.-C. J. Kuo, Fast encryption methods for audiovisual data confidentiality, Proc. SPIE, Multimedia Systems and Applications III, A. G. Tescher, B. Vasudev, and V. M. Bove, eds., Nov. 2000, Vol. 4209.
33. A. Servetti and J. C. De Martin, Perception-based selective encryption of G.729 speech, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, May 2002.
34. A. Pommer and A. Uhl, Selective encryption of wavelet packet subband structures for secure transmission of visual data, Proc. Multimedia and Security Workshop, ACM Multimedia, Dec. 2002.
35. J. Wen, M. Severa, W. Zeng, M. Luttrel, and W. Jin, A format-compliant configurable encryption framework for access control of video, IEEE Trans. Circuits Syst. Video Technol., 6(13): 545–557 (2002).
36. H. H. Yu and X. Yu, Progressive and scalable encryption for multimedia content access control, Proc. IEEE Int. Conf. Communications, May 2003.
37. F. Hartung and B. Girod, Fast public-key watermarking of compressed video, Proc. IEEE Int. Conf. Image Processing, 1997, Vol. 1, pp. 528–531.
38. T. Furon and P. Duhamel, An asymmetric public detection watermarking technique, Proc. Workshop on Information Hiding, Dresden, Germany, Oct. 1999.
39. R. G. van Schyndel, A. Z. Tirkel, and I. D. Svalbe, Key independent watermarking detection, Proc. IEEE Int. Conf. Multimedia Computing and Systems, Florence, Italy, June 1999.
40. J. J. Eggers, J. K. Su, and B. Girod, Public key watermarking by eigenvectors of linear transforms, Proc. Eur. Signal Processing Conf., Tampere, Finland, April 2000.
41. J. J. Eggers, J. K. Su, and B. Girod, Asymmetric watermarking schemes, Proc. Sicherheit in Mediendaten, GMD Jahrestagung, 2000.
42. S. Craver and S. Katzenbeisser, Security analysis of public-key watermarking schemes, Proc. SPIE, Mathematics of Data/Image Coding, Compression and Encryption IV, July 2001, Vol. 4475.
43. P. S. L. M. Barreto, H. Y. Kim, and V. Rijmen, Toward a secure public-key blockwise fragile authentication watermarking, Proc. IEEE Int. Conf. Image Processing, Sep. 2001.
44. P. S. L. M. Barreto, H. Y. Kim, and V. Rijmen, Image authentication and integrity verification via content-based watermarking and public key cryptosystem, Proc. IEEE Int. Conf. Image Processing, 2000, Vol. 3, pp. 694–697.
45. P. W. Wong and N. Memon, Secret and public key watermarking schemes for image authentication and ownership verification, IEEE Trans. Image Process., 10(10) (2001).
46. D. Kirovski, H. Malvar, and Y. Yacobi, Multimedia content screening using a dual watermarking and fingerprinting system, Proc. ACM Multimedia, Juan Les Pins, France, Dec. 2002.
47. L. Dondeti, B. Decleene, S. Griffin, T. Hardjono, J. Kurose, D. Towsley, C. Zhang, and S. Vasudevan, Group Key Management in Wireless and Mobile Environment, Technical Report, Internet Draft, Jan. 2002.
48. C. Y. Lin, D. Sow, and S.-F. Chang, Using self-authentication and recovery for error concealment in wireless environments, Proc. SPIE, Multimedia Systems and Applications IV, A. G. Tescher, B. Vasudev, and V. M. Bove, eds., Aug. 2001, Vol. 4518.
49. P. Yin, B. Liu, and H. Yu, Error concealment using data hiding, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Salt Lake City, UT, May 2001.
50. P. Yin and H. Yu, A semi-fragile watermarking system for MPEG video authentication, Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Orlando, FL, May 2002.
51. C.-Y. Lin and B. L. Tseng, Segmentation, classification and watermarking for multimedia semantic authentication, Proc. Int. Workshop on Multimedia Signal Processing, Virgin Islands, Dec. 2002.
CHAPTER 12
CHARGING FOR MOBILE CONTENT

DAVID BANJO
Nokia Networks, Helsinki, Finland
12.1 INTRODUCTION
Other chapters of this book have focused on describing mechanisms for creating, managing, and delivering mobile content. The underlying assumption is that the consumers of such content are paying to receive or access it. In this chapter we consider the mechanisms that enable the providers of mobile content services to charge for them. Charging for mobile content requires consideration of two aspects: payment for the content (the goods) and payment for the use of the mobile domain (e.g., the wireless bearer) and the mobile context (e.g., the location and "presence" of the consumer, and the personal, trusted status of the mobile device). Charging for mobile content services must be able to distinguish and apply the true value of the content to the consumer, which might differ considerably from the value of that content to the user over a fixed (home or work) Internet connection. Therefore, the charging mechanisms that are utilized should be able to differentiate between, and separately charge, the content-related price and the access- or mobility-related fees to the end user. Charging is an area that is critical to the success of the mobile content business. If users of mobile content services are not able to understand and accept the way that content is charged, they will not use the services, and the "mobile Internet explosion" will at best be significantly delayed. Emphasis must be given to identifying the charging models that are needed to support each type of mobile content usage, and to developing the charging mechanisms and systems needed to enable those charging models. In this chapter we will explore the mechanisms that have been developed for charging within the telecom domain, as these will generally provide the starting point for operators to deploy a mobile charging infrastructure. We will go on to examine how these mechanisms can be adapted to mobile content charging, and look at some of the challenges that are posed in this area.
Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu ISBN 0-471-46618-2 Copyright # 2004 John Wiley & Sons, Inc.
12.1.1 The Scope of "Charging"
First it might be useful to define what we mean by charging in this chapter, and to make some distinctions between various related terms. Questions frequently raised in this general area are:
. What is the difference between charging and payment?
. Aren't charging and billing the same thing?
. Are we talking about charging or accounting here?
Although, to the author's knowledge, there is no established definition that separates charging and payment, for the purposes of this chapter we will apply the following interpretations of the terms. The 3GPP (Third-Generation Partnership Project) definition of charging [1] is "A function whereby information related to a chargeable operation is formatted and transferred in order to make it possible to determine usage for which the charged party may be billed." Payment has been described within the OMA (Open Mobile Alliance) as "a mechanism by which funds are moved from the customer to the merchant in exchange for goods and/or services. It may be on a per transaction basis, or may be aggregated over a number of transactions." A distinction that can be made is that payment involves the explicit interaction of the end user in a transaction, whereas this is not necessarily the case in charging. A mobile payment (m-commerce) transaction may result in charging; for example, where the user has opted to pay via his/her operator bill instead of, say, supplying credit card details. So, we will align our focus on charging with 3GPP's definition, concentrating on the generation, formatting, and transfer of data records that provide details relating to the usage of a service or the consumption of an item of content. These records can be generated from a variety of sources connected with the delivery of the service, utilizing a number of different mechanisms, which we will explore later.
The purpose of generating these data records is to facilitate the identification and apportionment of the various revenue transactions that can be related to the service; for example, the consumer of the service paying the provider of the service, or the provider of the service paying the provider of the content, who in turn pays the owner of the content. These charging records may ultimately be presented to, for example, billing systems through which invoices can be generated for payment (unless the subscriber has a prepaid charging relationship with the mobile network operator). This chapter does not explore the billing process. For instance, bill or invoice presentation, settlement, clearing, taxation, and general ledger entries are not in the scope of charging. In scope are the generation of charging information and the transformation, normalization, and correlation of usage data. Charging can also involve "prerating," the assignment of ratable attributes to the charging record that allow the appropriate values (e.g., currency) to be assigned against the appropriate accounts
[e.g., subscriber, business-to-business (B2B) partner]. The management of the impact of charging events on a prepaid account balance is generally seen as within the charging domain (however, the recharging or "topping up" of the prepaid balance is not considered within the charging scope). From the description above, charging can be understood to be closely related to accounting. Charging is often seen as the telecom(munication) world's equivalent of accounting in the Internet sphere. The scope of accounting can, however, be seen to be wider than that of charging, as it also deals with the collection of usage data for nonbilling purposes (usage analysis, audit, etc.). For the purposes of this chapter, we can assume that charging and accounting are synonymous.
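The "prerating" step mentioned above can be sketched as attaching a price and a chargeable account to a raw usage record before it reaches billing. The tariff table, field names, and rates below are invented for illustration:

```python
# Minimal prerating sketch: attach ratable attributes (price, account) to a
# raw usage record. Tariffs are hypothetical, in eurocents per unit.
TARIFF = {  # (service, peak?) -> rate per unit
    ("voice", True): 10, ("voice", False): 4,
    ("content", True): 50, ("content", False): 50,  # flat content price
}

def prerate(record: dict) -> dict:
    """Return a copy of the record with amount and account filled in."""
    peak = 8 <= record["start_hour"] < 18          # assumed peak window
    rate = TARIFF[(record["service"], peak)]
    rated = dict(record)
    rated["amount"] = rate * record["units"]
    rated["account"] = record["subscriber"]        # who gets charged
    return rated

cdr = {"subscriber": "358401234567", "service": "voice",
       "start_hour": 14, "units": 6}               # e.g., 6 charged minutes
print(prerate(cdr)["amount"])                      # 60 (eurocents)
```

A real prerating stage would also normalize and correlate records from multiple sources, as described above, before assigning values to accounts.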
12.2 FIXED-LINE TELEPHONY CHARGING
In this section we will overview some of the mechanisms that have been employed by telecom operators to charge users, initially within fixed-line networks. (We assume in this chapter that the mobile user is a subscriber of a mobile network operator. However, this does not necessarily mean that the network operator is offering the content or service used; a Web services provider might, for example, supply the content.) In a mobile content provisioning scenario the telecom operator generally holds the primary relationship with the mobile user. To enable the billing of users for a variety of services that have evolved through a sequence of technological advances, the operator has typically developed a complex charging and billing infrastructure. While such an infrastructure is often unsuited to the needs of content charging, it represents a legacy system with which integration is required. Operators have invested significantly in such systems. They cannot be replaced overnight and must be expected to evolve, somewhat slowly, toward the requirements of content charging. Person-to-person voice calling was the first service offered by PSTN (public switched telephone network) operators. Charging for voice calls was facilitated by counting the number of "pulses" generated per call within the telephony exchanges involved in the setup, routing, and control of these calls. The number of pulses generated per call varied in relation to the location of the called number. In the early days of billing, engineers took photographs of the settings on racks of pulse meters in order to identify the change (total number of pulses) in a subscriber's usage since the previous billing period! With the advent of digital exchanges (or "switches"), additional sophistication was introduced to charging through the ability to extract usage data from the switches that manage the connections.
These charging records [call detail records (CDRs)] identified numerous parameters associated with a call (e.g., the calling number, called number, start and end times). CDRs were initially extracted by writing to magnetic or optical media, which had to be physically transported to a
billing center. These days the automated collection of CDR files via the operator's own network has become the norm. It can be relevant to consider simple voice calls as content. From a charging perspective, a voice call includes most of the attributes that are pertinent to content:
. A service deliverable (i.e., voice telephony) that represents a quantifiable value to the end user
. Recognition of the roles of other players in the value chain, from a revenue sharing perspective
An example of fixed-network CDR contents would be as follows:

Calling party: 01023422315
Called party: 05564123444
Call start: 03032003:101007
Call end: 03032003:101703
Recording unit: 22
Record type: 01
Egress circuit: 244
Egress port: 37
From the example voice call CDR above, these aspects can be derived by CDR analysis within typical charging and billing systems:
. The "record type" indicates that this is a normal outgoing voice call (the service), which will be billed to the calling party (the user).
. The called number identifies the destination, which, in relation to such attributes as the calling number and the time and duration of the call, enables the calculation of the retail value of the call (i.e., the price the user will pay for the content).
. The "circuits" and "ports" used in the connection (combined with the retail attributes) support the computation of the charges due to the network operator through which the call was connected.
Additionally, some of the complexities associated with current content charging were originally presented in charging for voice telephony. Where multiple physical elements (e.g., telephony switches) are involved in the delivery of a service, there is a need to determine which element(s) to use as a source of billing information; for example, whether to use the switch to which the subscriber connected, a transit switch, or an interconnecting switch to other networks. (The "recording unit" in the earlier example might indicate that this switch is one that interconnects directly with a specific network.) In Figure 12.1, operator A's charging and billing system would use the residential switch as the source of billing information for subscribers attached to that residential
Figure 12.1 Different switches in a PSTN network (operator A's residential, transit, and interconnect switches; operator B's interconnect switch; and operator A's charging system).
switch and the interconnect switch as the source of interoperator settlement information with operator B. In addition to person-to-person voice calls, various value-added services (VASs) were gradually introduced to voice telephony. Examples of these include premium rate infotainment (number translation), friends and family, VPN, and closed user groups. In addition to providing an increased range of value to the end user, such
services often introduce the concept of a third party in the value chain. For example, a business might offer an adult entertainment voice service that would be charged to the user at a specific premium rate via the user's phone bill, with a predetermined proportion of the charge paid by the operator to the service provider. Conversely, certain services would be provided to the user free of charge, and the service provider would instead be charged by the operator on a per-call basis, derived from analysis of the called number. We shall discuss revenue sharing later in this chapter. Such VASs are often provided using intelligent network (IN) features. IN has evolved from functionality integrated within telephony switches into a separately implemented and flexible service creation environment. As such, an IN system represents an additional source of charging information.
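The premium-rate revenue-sharing arrangement described above can be sketched as a simple settlement calculation; the split percentage is purely illustrative, not taken from any real agreement:

```python
# Hypothetical premium-rate settlement: the user pays the operator, and a
# predetermined share of the charge is passed to the service provider.
def settle_premium_call(user_charge_cents: int, provider_share: float = 0.6):
    """Split a premium-rate charge between provider and operator."""
    provider = round(user_charge_cents * provider_share)
    operator = user_charge_cents - provider
    return {"user_pays": user_charge_cents,
            "to_provider": provider,
            "operator_keeps": operator}

# A 3.00-euro premium call with an assumed 60/40 split.
print(settle_premium_call(300))
# {'user_pays': 300, 'to_provider': 180, 'operator_keeps': 120}
```

The reverse case in the text, where the provider pays the operator per call, is the same computation with the roles of payer and payee swapped.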
12.3 MOBILE TELEPHONY CHARGING
This section introduces some additional factors relevant to charging that have been developed within mobile telephony. The introduction of GSM (Groupe Spécial Mobile) networks in the early 1990s laid the foundation for mobile content. One of the success stories of GSM was the emergence of the short message service (SMS) as a bearer for content-based services. Originally envisaged as a simple text-to-person communication facility, SMS has developed into a conveyor of value-added content to individuals. It is fair to say that the user adoption and commercial potential of SMS were largely unpredicted within the industry. For this reason, the charging mechanisms developed around SMS did not adequately accommodate the fact that SMS messages might be charged differently depending on the inherent value of the content carried; for example, whether a received message (SMS-MT) contained a ringtone, a picture message, or a voicemail notification. While both the GSM MSC (mobile switching center) and the GPRS SGSN (serving GPRS support node) are also able to provide charging data related to SMS usage, only the SMS center (SMSC) can provide the application-level information needed to support content-based SMS billing. This underlines an important issue for content charging, to which we shall return later in this chapter: the selection of the source of charging data. The Wireless Application Protocol (WAP), which became available in the late 1990s, offers new possibilities for transporting and presenting content securely to mobile terminals. WAP still suffers badly from the adverse reaction to the tremendous hyperbole with which it was introduced, but it still has a significant role to play in the future of mobile content.
Significantly, from the charging perspective a WAP server presents an additional source of charging information: one that is able to identify attributes relevant to the content, such as whether the content was “pushed” or “pulled,” the content provider’s identity and pricing (if present), and the bearer used [2].
At the end of the 1990s GPRS (General Packet Radio Service) was introduced, adding packet-switching capability to an enhanced GSM network. The GPRS standards identified two new charging elements:

- The SGSN (serving GPRS support node), which provides charging data relating to the data packets delivered toward and from the mobile terminal, the mobility of the user, and SMS messages sent and received
- The GGSN (gateway GPRS support node), which provides charging data relating to the data packets delivered toward and from the 'Gi' interface (i.e., the Internet, or content sources)

In theory, the mobile Internet had then been realized. In practice, this was not quite the case. There were (and still are) issues to surmount regarding content adaptation and the availability of services geared specifically to mobile terminals. Most relevant to the subject of charging, however, is the issue of what pricing models should be used with mobile content. A significant fact about mobile Internet data is that, unlike fixed Internet data, it cannot yet be offered as a virtual commodity. The investments made by mobile network operators in radio networks and radio spectrum licenses vastly outweigh the investments made, for example, by legacy fixed-line operators and access providers. Mobile operators are therefore constrained to charge more for wireless access to the Internet than is charged for fixed access. It should be noted, however, that mobile operators can also offer additional value propositions to the end user, associated with mobility and context (location, presence, and so on); services that leverage these benefits have nevertheless been slow to emerge. An obvious implication is that the revenue propositions associated with the provision of mobile content cannot be the same as those applied to fixed Internet content, even for the same content.
However, it cannot be expected in all cases that the end user should pay significantly more for mobile content than for fixed Internet content. There must be a focused rebalancing of the value chain associated with mobile content provision. Importantly, mechanisms are needed to identify and correlate the charges for access to the mobile Internet and for the content itself. Both access and content are in many cases charged according to usage data recorded within separate elements, and correlation of this usage data is far from straightforward. These aspects associated with correlation are explored further later in this chapter.

12.4 ASPECTS PERTINENT TO MOBILE CONTENT CHARGING
This section examines various aspects that must be understood and considered in any study of mobile content charging. These include identifying who should be charged (not limited to the end user), what charging models should apply, what additional factors should be reflected in the charging, and from where the charging information should be obtained.
CHARGING FOR MOBILE CONTENT
A pertinent question when considering the necessary mechanisms for charging mobile content is "What is mobile content?" From the charging perspective we might define mobile content as something that can be delivered via the mobile Internet and that has a measurable value to the end recipient, as well as to other participants in the revenue chain. From this description alone it would be difficult to define the required charging mechanisms. However, we can extract from it certain key criteria that affect how charging should be supported:

- Mobility. This does not mean that the content itself is defined by its provision to mobile terminals. However, the charging for the content must be capable of recognizing certain additional aspects associated with delivery to a mobile terminal, such as location, presence, bearer, and roaming.
- Internet. By and large, content is delivered via IP (Internet Protocol). SMS and circuit-switched data are exceptions, but these are largely considered to be of decreasing importance in the longer-term development of mobile content.
- Delivery. There must be means to determine whether delivery of the content has succeeded or failed before definitive charging actions can be performed.
- Measurable value. The content should be classifiable and quantifiable by automated systems. What exactly is measured will vary depending on the type of content and the selected charging model: some content should be charged according to its size, some on the duration of the consumption session, and some on an explicit tariff code assigned by the content provider. Additionally, factors such as location or roaming, time of day, and quality of service should be able to influence the calculated charge.
- End recipient. Content is delivered to one or more end recipients. The identity of these end recipients (or mobile devices) should be resolvable within the charging process to determine the account or entity that should pay for the content.
- Other participants. Charging must recognize that the numerous players in the revenue chain should obtain a share of the revenues. The charging paradigms and enabling mechanisms may differ for each, as explained in later sections. Charging data should support as far as possible the identification and value apportionment of the relevant parties.

12.4.1 Revenue Chain
Figure 12.2 illustrates how complex the revenue flow can be in a typical model of the mobile Internet. However, not all of these associations will be present in any single context. Figure 12.3 provides an example of an individual context in which a mobile user has directly accessed a service provided by a service provider that is partnered with the
Figure 12.2 Possible revenue flow relationships.
user's network operator. The service provider has an arrangement to charge the user via operator billing, and the operator will settle with the service provider after the event. The service provider is responsible for redistributing the revenues toward the content aggregator, who in turn settles with the content copyright owner. The service provider will also receive payment for advertisements placed within the content or during the service discovery process (this helps reduce the cost of the content to the end user). Finally, the user is roaming on a foreign network. The (single) charge by the operator to the user should reflect the fee for the content, plus mobile network access, plus a roaming surcharge. The operator will settle with the visited network operator through the standard transferred account procedure.

12.4.2 Subscription Models

The subscription model offered by the service provider determines the mechanisms needed for content charging. Typical subscription and charging models include:

- Fixed subscription: for example, a monthly charge, with "all you can eat" usage.
Figure 12.3 Revenue sharing example.
- Limited subscription: the monthly charge includes a fixed amount of content consumption, with additional consumption charged under one of the other models.
- Event- or transaction-based charging: the charge is based on individual invocations of the service or content, for example per file or per message. The charge may differ depending on, for instance, the type or size of the content.
- Session-based charging: charging is based on metering during a continuous period of usage of the service or content. Examples are streamed services, which might be charged according to the amount of time the stream is used. (Such charging might also be based on an initial setup charge followed by per-minute charging metered over the session duration.)

In addition to these, certain other charging mechanisms can be utilized:

- Sponsorship. The charging of the user or content consumer may to a variable extent be reduced as a result of third-party participation. Examples
include a company paying (sponsoring) 50% of WAP usage by its employees, or a company sponsoring all users of specific services carrying inserted advertisements.
- Loyalty schemes. The user can accrue "loyalty points" associated with the usage of a service provider's offerings. These might equate to simple currency or service-usage credits, or might involve the accrual of abstract resources such as air miles awarded by a partner airline.

The mechanisms listed above have been described in the context of charging for the content alone. In reality, the mechanism will need to apply to a differentiated charging model that also takes into account the charging for access to the mobile network (which might itself vary depending on the access type). For example, the service provider might wish access charging to be active by default for browsing usage but disabled when selected sites are visited, or content charging to be reduced where UMTS is used as the access bearer (e.g., to promote greater take-up of UMTS subscriptions).
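As a rough illustrative sketch (not from the text: the model names, prices, and the way sponsorship is applied are all invented assumptions), the subscription models above might be expressed as:

```python
# Hypothetical sketch of the subscription/charging models described above.
# Prices and the sponsorship handling are illustrative assumptions.

def compute_charge(model, *, events=0, minutes=0, event_price=0.50,
                   setup_fee=0.20, per_minute=0.10, sponsor_share=0.0):
    """Return the charge to the user for one billing interaction."""
    if model == "fixed":                  # flat-rate, "all you can eat"
        charge = 0.0                      # individual usage is not charged
    elif model == "event":                # per file / per message charging
        charge = events * event_price
    elif model == "session":              # setup charge plus metered minutes
        charge = setup_fee + minutes * per_minute
    else:
        raise ValueError(f"unknown model: {model}")
    # Sponsorship reduces the user's charge by the sponsored fraction.
    return round(charge * (1.0 - sponsor_share), 2)
```

For instance, three event-charged downloads with a 50% sponsor would halve the user's charge relative to the unsponsored case.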
12.4.3 Postpaid and Prepaid Charging

Perhaps the most significant aspect of the subscription, from the viewpoint of the required charging mechanism, is whether it is prepaid or postpaid. In Europe it is estimated that on average more than 50% of mobile users are on prepaid subscriptions, with the figure approaching 90% in certain countries. It is in the interest of mobile network operators and service providers alike to ensure that the fullest range of mobile content and services is equally available to postpaid and prepaid subscribers. However, there are considerable complexities associated with prepaid charging. In a postpaid billing relationship the subscriber is considered by the charging party (i.e., the mobile network operator) to have an acceptable level of creditworthiness. Accordingly, the operator is satisfied to request payment retrospectively for services used during a billing period (typically a calendar month). Consequently, the generation of associated charging information does not carry a high urgency, and a daily collection(1) of charging information is generally sufficient to ensure that it can be rated and included within the upcoming subscriber-billing run. In prepaid billing the subscriber has, for various reasons (not necessarily related to creditworthiness), chosen to have his or her mobile telephony service usage limited by a prepaid account balance deposited with the charging party in advance. The balance can be "topped up" by the subscriber as required, to ensure there is sufficient credit on the prepaid account to cover the intended service usage. This model places a greater overhead on the
(1) In practice the frequency of collection is more often dictated by storage capacities on the CDR-generating elements.
Figure 12.4 Postpaid content charging.
delivery server, to ensure that there is sufficient credit on a subscriber's account before a specific service or content item is delivered, and to adjust the subscriber's prepaid balance in line with the actual service usage (recognizing whether the service or content delivery was successful). Prepaid charging for session-based services requires an "interim accounting" mechanism, in which the prepaid balance is incrementally debited during the session. Figures 12.4 and 12.5 illustrate the differences between the mechanisms required to support postpaid and prepaid charging. Currently it is popular to speak of offline and online charging methods, rather than prepaid and postpaid. These terms are, however, not directly interchangeable. 3GPP offers the following definitions:
Figure 12.5 Event-based prepaid content charging (simplified).
- Offline charging: a charging mechanism where charging information does not affect, in real time, the service rendered.
- Online charging: a charging mechanism where charging information can affect, in real time, the service rendered; a direct interaction of the charging mechanism with session/service control is therefore required.

A postpaying subscriber may elect to utilize a spend-limiting feature offered by an operator, for example, to place a limit of €40 per month on MMS messaging. In this case, the operator will possibly use an online charging mechanism specifically against this subscriber's service, in order to check that the balance accumulated during the given period does not exceed the defined threshold.
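The spend-limit scenario above amounts to an online authorization check before each delivery. A minimal sketch (class and field names are invented; amounts are tracked in cents to avoid floating-point drift):

```python
# Hypothetical sketch of an online spend-limit check: each charge is
# authorized only while the period's accumulated spend stays within the
# subscriber's configured threshold. Names and units are assumptions.

class SpendLimiter:
    def __init__(self, monthly_limit_cents):
        self.limit = monthly_limit_cents
        self.accumulated = 0

    def authorize(self, charge_cents):
        """Online check: accept the charge only if it fits under the limit."""
        if self.accumulated + charge_cents > self.limit:
            return False          # service control must deny delivery
        self.accumulated += charge_cents
        return True
```

With a €40.00 limit (4000 cents), a request that would push the accumulated balance past the threshold is refused before delivery, which is exactly the interaction with service control that distinguishes online from offline charging.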
12.4.4 Business-to-Business (B2B)

In addition to enabling the billing of end users, charging is responsible for supporting the business-to-business (B2B) settlements associated with the delivery of mobile content. This includes recognition of the fees owed to or from the third parties involved in the creation or aggregation of the content, and to other network operators with whom it was necessary to interconnect in order to deliver the content. In the simplest case, third-party charging requires quantitative identification of the specific content or service aggregated, and of the service provider involved. The service provider is in turn responsible for settlements with the other parties in the content provision chain. Settlement periods for B2B charging are typically monthly or quarterly; online accounting is rarely a requirement. Wherever a user involved in the content delivery is located (permanently or temporarily) on a network other than the charging party's, it can be assumed that interoperator charges need to be levied between the parties, based primarily on the traffic (within defined categories) that is passed between the networks. Currently such settlements are made periodically through the interchange and reconciliation of usage data between the two parties. It is a general principle that each party engaging in a B2B relationship will maintain its own charging data. The difficulty of reconciling such data, usually maintained in disparate formats, has helped promote the development of interchange formats such as TAP and CIBER, and the role of "clearinghouse" companies that broker such settlements. It is worth noting that B2B charging is generally performed at an aggregate level, with no requirement to recognize the individual users associated with the transactions. This means that the charging data for user and B2B charging, respectively, can be obtained from different points in the network.
For example, data needed for user charging might most appropriately be obtained from the delivering network element, while the business charging data might be obtained from a transit switch or a border gateway. (There can be small differences between the data recorded in each of these entities, e.g., different volume counts arising from signaling overhead, but this can be regarded as of low importance in this context.)
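A minimal sketch of the aggregate-level B2B settlement described above: per-user usage records are rolled up per partner for the settlement period, discarding the individual user identities (the record layout and field names are invented for illustration):

```python
# Hypothetical sketch: B2B charging aggregates usage per partner, with no
# need to recognize individual users. Record fields are assumptions.
from collections import defaultdict

def b2b_settlement(records):
    """records: iterable of (partner_id, user_id, volume_bytes) tuples."""
    totals = defaultdict(int)
    for partner_id, _user_id, volume in records:  # user identity is dropped
        totals[partner_id] += volume
    return dict(totals)
```

The same per-user records that feed user billing can be summed here, or the aggregate can come from a different network element entirely, as the text notes.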
12.4.5 Roaming
Roaming is the situation that arises when a mobile user is actively attached to a network other than his or her home network. When services are used in this scenario, charging must be able to recognize the subscriber's roaming status, so that the amount the user is charged can vary according to the specific network to which the user is attached. Users will generally pay a premium when roaming, reflecting the charges that the home operator pays to the visited network. These charges will be lower where there is a close business partnership between the networks, and the user charging will need to reflect such network-specific variation.
12.4.6 Multiple Access
Multiaccess networks allow content and services within an operator’s offering to be made available through a number of access technologies (e.g., GPRS, WLAN, xDSL). The importance of multiaccess is that charging should be access-aware. A specific service available to a subscriber may be tariffed differently depending on the access type. For example, content accessed via GPRS might be priced more cheaply during a promotion, and might not be available when the subscriber accesses the services via WLAN. The charging system relies on either the core network or the content delivery server (where possible) to indicate the used access technology within charging requests.
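Access-aware tariffing of the kind described above can be sketched as a lookup keyed on both the service and the access type; the service names, access types, and prices below are invented for illustration:

```python
# Hypothetical sketch of access-aware tariffing: the same service is priced
# per access technology, and may be unavailable on some. All values invented.

TARIFFS = {  # price per item, keyed by (service, access_type)
    ("news", "gprs"): 0.20,   # promotional GPRS price
    ("news", "xdsl"): 0.50,
    # ("news", "wlan") intentionally absent: not offered via WLAN
}

def price(service, access_type):
    try:
        return TARIFFS[(service, access_type)]
    except KeyError:
        raise LookupError(f"{service} not available via {access_type}")
```

The access type used as the lookup key would be supplied by the core network or the delivery server within the charging request, as the text describes.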
12.4.7 Source of Charging Records
As we noted earlier, an important aspect of mobile content charging concerns the choice of source of charging information. In a typical mobile content environment a number of service delivery elements are involved with content delivery, all of which can generate charging information. Take the example of a game download. In the simplified example given in Figure 12.6 there is a mobile GPRS network, represented by the SGSN plus GGSN, an origin server (which holds the content), and a download server (which manages the content delivery). In principle, any or all of these elements could be used to charge for content delivery; in practice, however, there are challenges with any approach taken. The typical mobile data network is unaware of the precise content it is carrying, or of whether the content is successfully delivered. The network does, however, have the best awareness of the subscriber's identity, subscription status, and location, and of the type and quality of the bearer utilized. The origin server is the source that defines the precise content, but it may be completely unaware of the destination and delivery of the content. The delivery or download server has a high awareness of the content (it may often be responsible for adapting it) and of its successful delivery to the user. Being located within the mobile network operator's domain, the download server has some
Figure 12.6 Game download.
subscriber awareness (though the user's network identity may not necessarily be known, depending on the authentication mechanisms used). The content delivery server, or closest equivalent (e.g., an SMSC, MMSC, or download server), is often the source of choice for content charging information. The disadvantage of using only a delivery server for content charging is that the default basis for charging used by mobile network operators is charging for use of radio network access (usually based on the volume of data traffic). Combining this access charging with content charging will result in the user being effectively double-charged: first the cost of the content, then the charges associated with its delivery. It could be argued that a solution lies in careful tariffing of content-based services; however, this has not so far proved successful. GPRS access is fairly expensive, with prices in the range of €1 per megabyte of data being commonplace. If on top of this there is additional charging for content, including both the fees due to the content owner and the operator's margin, the end result is not at all inviting to the user from a cost perspective. It is also worth noting that the revenue sharing basis used with SMS ringtones, where the operator may retain around 50% of the charge to the user, is not likely to succeed if applied to many other types of content.
12.4.8 Multiple Servers Involved in Delivery

A factor that must be carefully considered in charging is the "big picture" of the service delivery architecture: how to ensure that the most pertinent charging information is available to bill the service or content in question, and that other charging information is suppressed or ignored. For example, WAP gateways and SMS centers are often used as sources of charging information, but the charging information they produce is not normally required to charge for MMS
services, even though SMS and WAP gateways are part of the MMS message delivery. Tailoring of charging information can be accomplished in a number of ways:

- Configure each service element to produce only charging information relevant to the specific service delivery context in which it participates.
- Configure logic in the charging system to recognize a service delivery context, and to utilize or disregard charging information accordingly.
- Introduce a "workflow engine" or "broker" to manage the required logic externally to the charging and service delivery systems.
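As a sketch of the second approach (logic in the charging system that recognizes the service delivery context), with element and service names invented for illustration:

```python
# Hypothetical sketch: the charging system keeps a map from service context
# to the one element whose records are pertinent for billing that service,
# and disregards records from other participating elements.

RELEVANT_SOURCE = {
    "mms": "mmsc",         # for MMS, bill only on MMSC records...
    "browsing": "wap_gw",  # ...even though the WAP gateway also saw the traffic
}

def billable(record):
    """record: dict with 'service' and 'source' keys."""
    return RELEVANT_SOURCE.get(record["service"]) == record["source"]
```

In the MMS example from the text, WAP gateway and SMSC records produced during MMS delivery would simply fail this check and be ignored.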
12.5 CHARGING CONCEPTS AND MECHANISMS
This section outlines various concepts, techniques, and processes generally utilized within the charging domain that have particular relevance to mobile content charging. These include the generation of charging data, the differentiation of access and content charges, and the "mediation" processes that can be applied to charging records.

12.5.1 Creation of Charging Records
Here we discuss mechanisms used for recording charging information. Although these mechanisms were developed in the context of network access and bearer-related charging, the same techniques are also used within the servers that deliver content and generate corresponding usage records. Within the fixed Internet, RADIUS (remote authentication dial-in user service) accounting [3] has typically been used to provide usage data for billing. RADIUS has had more limited use for charging in the telecom domain. Other, less standardized mechanisms have also been employed, such as the analysis of log and event information created by access servers and web servers. As noted earlier, the telecom domain has traditionally utilized CDRs as a charging mechanism. Since the early 1990s in particular, there have been drivers to increase the frequency of collection of CDRs from the switches and other recording devices that generate them. Initially these drivers were intended, for instance, to reduce revenue leakage caused by the loss of data when a switch's CDR file capacity was reached, often an issue where the frequency of switch polling (scheduled automated CDR collection) was as low as once per day. CDR creation typically proceeds as follows:

- Individual CDRs are created at configurable intervals. For example, a recording device may be programmed to generate CDRs every 15 minutes for the duration of a content session. (This creates "partial" CDRs, which can be charged individually but must be combined with the other partials created for the session in order to present a complete billing record.)
- CDRs are written to a storage area within the recording device and batched together in "CDR files" with hundreds or thousands of other CDRs generated at the same time.
- When a CDR file is ready, it is made available for collection. Collection is often a secure file transfer process initiated by a billing or mediation device, employing a mutually agreed protocol.

Increasingly, however, demands to minimize the credit risk posed to operators by prepaying subscribers have accelerated the trend toward real-time charging. CDR collection advanced toward a concept known as "warm billing," where, for instance, CDR files may be collected every 5-15 minutes. The further evolved concept of "hot billing" introduced mechanisms whereby CDRs could be individually "pushed" by the recording device, for example as part of a message stream, bypassing the need to accumulate CDRs in files prior to transfer. For prepaid charging, even hot billing has been seen to introduce too much latency. First, the subscriber has already received some service or content before the CDR is generated. Then there is a certain delay before the prepaid account balance can be impacted (mostly due to the need to buffer the CDR at one or more points within the processing stream). Finally, hot billing offers no mechanism to terminate a service when the prepaid account balance is exhausted. (Prepaid charging is discussed in more detail later in this chapter.) These limitations were initially addressed by CAMEL (customized applications for mobile network enhanced logic), a network architecture relevant to both the CS (circuit-switched) and PS (packet-switched) domains that provides mechanisms to support IN (intelligent network) applications and services.
CAP3 (CAMEL Application Part, phase 3) provides prepaid charging support for GPRS data and mobile-originated SMS services, including services accessed by users roaming outside their home network [4]. However, a major limitation of CAMEL is that it does not support the differentiation of content (at least not insofar as different content may be provided within a single GPRS APN). Differentiated charging is discussed in more detail in the following section.
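The "partial" CDR mechanism described above implies a later combination step before billing: all partials belonging to one session are merged into a single complete record. A minimal sketch (record fields are assumptions, not a real CDR format):

```python
# Hypothetical sketch of combining "partial" CDRs for one session into a
# single complete billing record. Field names are illustrative assumptions.

def combine_partials(partials):
    """partials: list of dicts sharing a 'session_id', each with a 'volume'."""
    assert partials, "at least one partial CDR expected"
    session = partials[0]["session_id"]
    assert all(p["session_id"] == session for p in partials), "mixed sessions"
    return {
        "session_id": session,
        "volume": sum(p["volume"] for p in partials),  # total across partials
        "partial_count": len(partials),
    }
```

In a real system this buffering and merging would typically happen in the mediation layer, discussed later in this section.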
12.5.2 Differentiated Charging

The target of truly effective content charging is to enable differentiated content-based charging. This means being able to identify a service or content at the relevant level of granularity within the available charging information, and to relate and manage in harmony all the charges that apply (e.g., separately for the content and for access). Some difficulties with this have been discussed earlier in this chapter; solutions to surmount the challenge are continually developing. One straightforward solution is to perform content charging using information provided by the content origin or delivery server, and to vary the access charging
from the network according to the type of content or service being delivered. This could, for instance, be made possible by reserving a specific traffic "pipe" within the access network for each type of content. An example is MMS charging, where a specific GPRS access point is dedicated to traffic toward the MMSC (via the WAP gateway). MMS charging is then based on the MMSC charging information, and the charging for access (specific to that access point) is either zero-tariffed or tariffed according to the operator's configured policy. There are, however, limitations to this approach. A separate access point is required for each different type of content service. The configuration of multiple access points is problematic (especially for the user of a mobile terminal, but also within the network), and it is regarded as good practice to minimize the number of access points that need to be created. (In the example above it is also necessary to have a dedicated WAP gateway for MMS traffic, as different access tariffing will most probably apply to WAP browsing traffic.)

12.5.3 Flow-Based Charging
Identification and charging of traffic flows within a single access point are not currently supported in the GPRS standards; however, this is the area in which solutions are being sought to address the challenge of differentiated charging. IP flow-based traffic handling can be used to:

- Classify traffic within the core network according to different services or content types
- Route traffic according to the service or content (or the indicated service or content provider), providing virtual access points within a single access point configured in a mobile terminal
- Charge for the flow according to the identified service or content type
- Provide prepaid charging support
- Enable a granular level of service control

Simplified traffic and content analysis can be performed by utilizing IP header information (layers 3-4) within the datastream. More intrusive inspection of the data (layers 5-7) can provide a greater range of information relating to the content or service used. However, analysis of higher-layer information is not straightforward, and requires the development of specific analyzers for each application protocol used. Table 12.1 provides an example of items derivable from flow analysis that could be utilized for charging purposes. In this example the flow analysis might identify that the user is browsing a download selection offered by a partner service provider, and that the user should not be charged for GPRS access in this instance. In addition to enabling differentiated access through variation of the related access charges, flow-based charging (Fig. 12.7) can enable content charging to be performed solely via charging information provided by a "flow-aware" core network.
TABLE 12.1 Flow Analysis

Layers 5-7
  URL: www.contentsshop.com/downloads (can be analyzed against a lookup list to identify the service or content being accessed)

Layers 3-4
  Source IP address: 132.225.35.4 (identifies the mobile terminal)
  Source port: 80
  Destination IP address: 129.37.22.17 (identifies the traffic destination)
  Destination port: 80 (HTTP default)
  Protocol: TCP
In the example pictured in Figure 12.7, flow analysis identifies all UDP traffic destined for IP address 129.37.22.22 and port 9201 as representing a specific WAP browsing service. The flow-aware core network provides charging data (based on volume count) for user charging; no charging data are needed from the WAP gateway. All traffic to IP address 129.37.22.15 is identified as relating to gaming services. In this instance the gaming server is used to provide the charging data, as the user charging might depend on certain application-specific events. No user charging is generated from the core network; however, CDRs might be required for B2B charging between the network operator and the business partners associated with provision of the gaming service.
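The layers 3-4 classification in this example can be sketched as a rule list matched against a flow's protocol and destination; the addresses and ports follow the worked example, while the rule representation and matching logic are illustrative assumptions:

```python
# Hypothetical sketch of flow classification: layers 3-4 rules map a flow's
# (protocol, destination IP, destination port) to a service class. The
# addresses below follow the Figure 12.7 example; the code is illustrative.

RULES = [
    # (protocol or None, dest_ip, dest_port or None, service); None = wildcard
    ("UDP", "129.37.22.22", 9201, "wap_browsing"),
    (None,  "129.37.22.15", None, "gaming"),   # all traffic to the gaming server
]

def classify(protocol, dest_ip, dest_port):
    for proto, ip, port, service in RULES:
        if proto in (None, protocol) and ip == dest_ip and port in (None, dest_port):
            return service
    return "default"   # unmatched flows fall back to default access charging
```

A flow-aware core network element would apply such rules per packet or per flow, counting volume separately for each service class.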
Figure 12.7 Flow-based charging.
3GPP is working on standardizing the concept of traffic-plane flow-based bearer charging within its Release 6 standards. This functionality has not so far been precisely defined within the GPRS core. Currently, solutions exist on the market that place this functionality either inside an evolved GGSN or independent of the GGSN.

12.5.4 Mediation
The role of mediation where charging is concerned cannot be overstated. Mediation is a process that permits the transfer of information between incompatible entities. Incompatibility arises from the disparate information structures and communication protocols used within networks, especially telecom networks. For example, within an operator's environment one network element might generate CDRs in an ASN.1 encoding format with FTAM as the transfer protocol, while another might present a proprietary fixed-field ASCII format with real-time transfer via GTP'. All these records need to be consumed by a billing system that has its own proprietary input CDR format. A mediation device is a standard item in an operator's charging and billing infrastructure, sitting between the network and the business support systems. In the context of billing, it is responsible for automating the secure, scheduled collection and reception of charging records. It manages, according to configured rules, the validation, formatting, and conversion of the charging records, including any required computations and correlation with other related usage data. Mediation also directs the results of its processing to the required output stream(s) facing the operator's business support systems.

12.5.5 Correlation
Charging correlation can be defined as a process involving the consolidation of two or more sets of charging data, generated by one or more sources, but all related to the same connectivity and/or service session. As a result of the correlation process, new or modified charging data are created. Correlation is typically the business of a charging mediation system. A basic form of correlation occurs with long sessions (e.g., voice calls). For reasons of revenue assurance, an operator might generate charging records every 15 minutes during a very long session. These records will normally need to be buffered within the charging process and then combined into a single record before they are passed to a billing system. A more complex type of correlation arises where there is a need to correlate usage information from two separate network elements. Examples are GPRS, where the correlated charging record might use the GGSN CDR as the basis of volume count information but the SGSN CDR for information relating to the user's location; or correlation as a means of enabling differentiated charging, where access network CDRs are treated differently depending on the information presented in a related CDR generated by a content delivery server. In online correlation this association must be performed in near real time during a service delivery session. Complexity in correlation arises from the technical challenge of implementing it (especially online charging correlation), together with the fact that no standards exist to support correlation in flow-based content charging; therefore, no two separate network elements are likely to support the same correlation vectors and mechanisms.

12.5.6 Charging Rules

"Charging rules" are the logic configured within network elements to define the charging behavior of the network element for a given subscriber or service. Examples include whether online or offline charging should be used and what tariff class and metering technique to apply. Dynamic charging rules allow such instructions to be pushed to, and executed within, a network element during a charging session (e.g., to change the metering technique used by the element). 3GPP release 6 standardization work is currently focusing on the concept of charging rules within flow-based bearer charging in the mobile core network.

12.5.7 Rating

Rating is also a fundamental requirement for content charging, and is needed to derive the charge to the subscriber for the service or content received. Even where content is prepriced by a content or delivery server, the final charge to the user might depend on numerous additional attributes, such as date and time, location and roaming status, QoS, accumulated credits or usage balances of the associated subscription, or the bearer used. Rating is used to compute the final price, based on the information supplied in the charging record and on rules and history relating to the subscription.

12.5.8 Advice of Charge

Advice of charge is an increasingly important requirement for content charging.
The charging system supports advice of charge by providing information, on request, as to the cost of a proposed service or content (or, for an open-ended service, the rate at which it will be charged). Rating is a prerequisite for advice of charge support. Advice of charge is not yet standardized explicitly, and the solutions currently provided are largely proprietary and limited in nature. The challenge in advice of charge is to provide information relating to the full (differentiated) cost of the content or service.

12.6 CHARGING INTERFACES
There are a variety of mechanisms used to enable service elements to provide charging information to charging systems. These generally involve the specification of the charging data definition and encoding format, and a record transfer protocol.
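As a toy illustration of the data-definition side of such an interface (and of the format conversion performed by mediation, Section 12.5.4), the sketch below parses a hypothetical fixed-field ASCII CDR into a canonical record; the field layout and names are invented for illustration, not taken from any vendor specification.

```python
# Mediation sketch: convert a fixed-field ASCII CDR into a canonical record
# that a downstream billing system could consume.
FIELDS = [              # (name, start, end) character offsets in the record
    ("user_id", 0, 11),
    ("bytes", 11, 19),
    ("service", 19, 27),
]

def normalize(ascii_cdr: str) -> dict:
    """Slice the fixed-width record and apply type conversion/validation."""
    rec = {name: ascii_cdr[a:b].strip() for name, a, b in FIELDS}
    rec["bytes"] = int(rec["bytes"])   # format conversion with validation
    return rec
```

A real mediation device would apply many such per-source parsers, all emitting the same canonical record for the billing system.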
This section introduces some commonly referenced interfaces that are relevant to content charging. There are no universal standards for offline charging (e.g., CDR-based accounting), and a range of mostly vendor-specific alternatives exist. Many proprietary vendor-specific and domain-specific specifications describe CDR file encoding formats and transfer protocols. IPDR.org proposes an open specification for offline usage reporting [5]. Online and prepaid accounting requires a high degree of coordination between the service element (client) and the charging system (server). Currently there are a number of available and developing specifications in this area, which highlight the importance of sophisticated charging mechanisms for prepaid services and content. The remote authentication dial-in user service (RADIUS) protocol [6] is commonly used in the Internet and telecom worlds to provide authentication, authorization, and accounting (AAA) of users. RADIUS accounting can be used to meet both offline and online charging requirements. RADIUS implements a client-server model, utilizing request and response messages between the client (service element) and server (charging system). Figure 12.8 illustrates this model. RADIUS has a number of limitations from a charging perspective, some of which have been addressed in the Diameter base protocol [7], the protocol proposed within the IETF (Internet Engineering Task Force) as the evolution of RADIUS.

- RADIUS is based on UDP, an unreliable transport, allowing for undetected packet loss (which can translate into lost revenue for an operator or service provider). The Diameter protocol, on the other hand, is based on TCP or SCTP, which provide transport-layer reliability and control.
- The RADIUS specification has restrictions on the size of attribute data that can be carried and on the number of pending requests that can be supported. These restrictions are not present in the Diameter protocol.
Figure 12.8 Accounting client and server.
- RADIUS and Diameter are message-based. This is advantageous for online accounting, which needs the lowest possible latency, but does not scale well for offline accounting, which is more efficiently served by batch-file CDR production. Additionally, the mediation needs of RADIUS and Diameter are higher than, for instance, those of CDR-based charging.
- The basis of RADIUS and Diameter accounting is to report usage to the accounting server "after the event," such as after 100 kB of data has already been consumed. This does not support "true" prepaid charging, in which the 100 kB should be authorized by the charging server before it is provided to the user.

Extensions to the RADIUS and Diameter protocols have been proposed in the IETF to enable them to support true prepaid operation, namely, utilizing authentication and authorization messages to trigger the checking of a prepaid balance before a service is released, and to carry a "quota grant" (e.g., seconds or kilobytes) that is authorized for use in the service before reauthenticating and obtaining further quota to continue using the service [8]. Diameter is specified as a base protocol, which provides common functionality to a set of supported applications. One such application is the Diameter credit control application [9], which enables real-time credit control for network- and content-based event- and session-centric services, and includes operations to support reservation and direct debit against a prepaid balance. The application also supports advice of charge, correlation, and error handling. Figure 12.9 illustrates an accounting or credit control dialogue that might occur between an accounting client and an accounting server.
This example is based on the charging of a streamed service to a prepaying subscriber, and utilizes many of the mechanisms described in the preceding sections: for example, rating (to determine the service cost and to translate it into a number of "units" to be metered by the service element), charging rules (to influence the charging behavior of the access network), and revenue sharing. OSA (Open Service Access) defines a set of APIs enabling external applications to interact with a network's service capabilities. One of these APIs supports content-based charging, also supporting reservation and direct debit models [10]. (OSA API specifications are aligned with, and functionally identical to, those of the Parlay Group.) Currently, specification work in this area is focusing on the definition of a charging Web services interface, supporting the Web services XML, SOAP, and HTTP paradigm. Other specifications and drafts propose the transport of charging and price information within HTTP or SOAP headers, although these can be considered to have limited application and poor support for prepaid charging.
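The reservation-based quota model described above can be sketched as a loop in which the client must obtain authorized units from the charging server before delivering them, rather than reporting usage "after the event." This is a simplification well beyond the actual Diameter credit control message set; the class and function names are illustrative.

```python
# Simplified "quota grant" loop: the service element requests a chunk of
# units, delivers only what was granted, and re-authorizes when the chunk
# is used up. Delivery stops when the prepaid balance is exhausted.
class CreditServer:
    def __init__(self, balance_kb):
        self.balance = balance_kb

    def reserve(self, requested_kb):
        """Grant up to requested_kb against the prepaid balance."""
        grant = min(requested_kb, self.balance)
        self.balance -= grant
        return grant

def deliver(server, total_kb, chunk_kb=100):
    """Deliver total_kb of content, one authorized quota chunk at a time."""
    delivered = 0
    while delivered < total_kb:
        grant = server.reserve(min(chunk_kb, total_kb - delivered))
        if grant == 0:        # balance exhausted: terminate the service
            break
        delivered += grant
    return delivered
```

With a 250-kB balance and a 400-kB download, delivery stops at 250 kB; no unauthorized usage ever reaches the user.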
12.7 CHARGING INFORMATION
What information should be presented in a charging record? The answer depends closely on the type (and vendor) of the charging element and the charging models that the charging element is intended to support.
Figure 12.9 Session-based prepaid content charging example.
Table 12.2 is not intended as a comprehensive list of the information that should appear in a charging record or request, but rather to provide a greater understanding of the charging data needed in various scenarios. The information needed invariably depends on factors such as the service used, the charging model supported, and the capabilities of the systems involved. Typically, only the most essential information is found in an online charging request, since these requests are generated within the service delivery and must introduce minimal latency and network load. The choice and naming of the informational items in Table 12.2 is representative only and does not reflect any particular implementation or specification. The list may nevertheless serve as a checklist for considering parameters that need to be supported in a charging dialog.
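As an illustration of how a subset of these items might be carried, the sketch below models a charging request and the "essential items only" reduction used for online requests. All field names are illustrative, not taken from any specification.

```python
from dataclasses import dataclass
from typing import Optional

# A minimal charging request carrying a few of the informational items
# discussed in this section (see Table 12.2).
@dataclass
class ChargingRequest:
    user_id: str                    # e.g., MSISDN or mapped pseudonym
    service_id: str
    amount_type: str                # "volume-kb", "time-s", or "currency"
    amount_value: float
    tariff_class: Optional[str] = None
    location: Optional[str] = None  # e.g., cell ID, used as a tariff attribute

def essential_only(req: ChargingRequest) -> dict:
    """Online requests carry only the essential items to minimize latency."""
    return {"user_id": req.user_id, "service_id": req.service_id,
            "amount_type": req.amount_type, "amount_value": req.amount_value}
```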
12.8 CHARGING ARCHITECTURE AND SCENARIOS
In this section a suggested charging architecture is illustrated, and some examples are discussed as to how charging might be implemented for various content types. Alternatives do exist to the examples presented, although these might involve a greater degree of complexity or a lack of accuracy.

12.8.1 Charging Architecture

Figure 12.10 presents a high-level charging architecture based on a charging system that incorporates both online charging and offline mediation capabilities. The charging system should be capable of handling charging requests from the access network (e.g., GPRS or WLAN access charges), the operator's internal service delivery platforms (e.g., MMS centers, delivery servers), and externally provided content and services (e.g., partner service providers, Web service providers). The charging system should, if required, be capable of correlating requests from all of these domains where they relate to the same service instance. The charging system should also be able to interact in real time with account balance and rating systems, and should communicate with the operator's billing and business support systems responsible for invoicing, settlements, financial accounting, and taxation.

12.8.2 Charging Scenarios

The following outline charging models provide some further examples of how the mechanisms described in preceding sections of this chapter can be applied within a charging architecture as illustrated above.

12.8.2.1 Browsing In browsing, the charging model is based on volume. A flow-aware packet core network can be used as the primary source of charging information, based on (downlink) volume.
TABLE 12.2 Charging Information Items

User ID: Identifies the user or recipient of the content or service, from which the account to be charged is derived. Typically MSISDN or IMSI are used within a mobile network, although these may not necessarily be used by a content server and might have to be mapped from, e.g., an authenticated user name or pseudonym.
Price: Can be used where the price (currency code and value) is dictated by the content or delivery server.
Service ID: Generally needed to provide audit information; specifically needed where the content or service is not prepriced and must be rated to derive the charge to the user.
Tariff class: Can be used in addition to the service ID, for instance to indicate what price band an item (e.g., SMS ringtone, MP3) should be charged at.
Destination: Can be used, e.g., to describe the target of the content.
Timestamp: Can indicate the time of the service or delivery event (start time and/or end time; duration).
Content/service provider: Can identify the content provider for revenue sharing purposes; can also indicate whether the content is provided internally or externally to the operator's partner network.
QoS: Indicates the quality of service with which the content or service was delivered; can be used as a weighting factor in deriving the charge to the user.
Bearer: Indicates the bearer (e.g., WLAN, GPRS) over which the content or service was delivered; can be used as a tariff attribute in deriving the charge to the user.
Correlation (method, key): Can indicate the correlation method and keys to be used (e.g., IP flow classifier).
Location: Indicates the location of the user; can be used to derive roaming status, and as a tariff attribute in deriving the charge to the user.
Content type: Can indicate the type of content (e.g., JPEG, MPEG, WAV sent in an MMS message).
Delivery indicator: Can indicate whether the content was successfully delivered.
Amount (type, scale, value): A collection of informational items indicating the value that should be charged or metered in the charging element (e.g., time in seconds; downlink volume in kilobytes; currency as an ISO currency code, such as US dollars). More than one type of value should be supportable in a charging request, within different contexts.
Debit/credit indicator: Indicates whether the amount field should be applied positively or negatively.
Application: Identifies the application being used on the terminal. Can be used to derive revenue share for application developers, and to affect user charging where nonauthenticated applications are used.
Revenue share (party, method, value): Can indicate whether revenue sharing should be applied, in respect of which party, and how the value is calculated (e.g., % of total charge).
Sponsored (party, method, value): Can indicate whether sponsorship should be applied, by which party, and how the value is calculated (e.g., % of total charge).
Session ID: Unique ID for a charging session.
Session message type: Can indicate, for example, whether this is an accounting start, interim, or stop; a request or response; or a "final units" quota grant from the charging server.
Method (prepaid-specific): Can indicate whether the request is for balance check only, reservation, deduction, advice of charge, etc.
Session event: Can indicate a change of charging rule during a charging session.
Record sequence: Identifies the sequence of the charging record, e.g., where multiple partial CDRs are generated during a long session.
The core network is able to recognize which browsing traffic is destined outside the operator's partner network. This enables charging scenarios where browsing traffic directed outside the operator's network is charged at rate x, while traffic within the operator's network is charged at rate y or is free of charge. In the latter case, user charging might be based on additional information from a WAP gateway.
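The rate-x/rate-y scenario can be sketched as follows; the partner-network names and rates are invented for illustration.

```python
# Differentiated browsing tariff: on-net (partner network) traffic at rate y
# (here free of charge), everything else at rate x.
PARTNER_NETWORKS = {"operator-portal", "partner-weather"}
RATE_X = 0.02   # currency units per kB, off-net
RATE_Y = 0.0    # on-net browsing free of charge

def browsing_charge(destination: str, downlink_kb: float) -> float:
    rate = RATE_Y if destination in PARTNER_NETWORKS else RATE_X
    return round(rate * downlink_kb, 4)
```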
12.8.2.2 Person-to-Person Messaging In person-to-person messaging, the charging model is based on per-message payment by the sender. The pricing depends on the message content type and message size (these can be expressed within the service ID + tariff class parameters). A flow-aware core network can zero-tariff the access charging, with the MMSC being used as the source of charging data. Charging is based on successful delivery of the MMS to the recipient's MMSC; therefore, prepaid charging should utilize a reservation method, where a balance check and reservation are made before the message is permitted to be sent, and the reservation is committed (or canceled) depending on the message sending success indicator.
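The balance check, reserve, then commit-or-cancel sequence can be sketched as a small account object; the class and method names are illustrative.

```python
# Reservation model for prepaid messaging: funds are reserved before the
# message is sent, then committed on successful delivery or released on
# failure. Reserved funds cannot be double-spent by a concurrent request.
class PrepaidAccount:
    def __init__(self, balance):
        self.balance = balance
        self.reserved = 0.0

    def reserve(self, amount):
        """Balance check + reservation; False blocks the message send."""
        if self.balance - self.reserved < amount:
            return False
        self.reserved += amount
        return True

    def commit(self, amount):
        self.reserved -= amount
        self.balance -= amount     # delivery succeeded: take the charge

    def cancel(self, amount):
        self.reserved -= amount    # delivery failed: release the funds
```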
Figure 12.10 Charging architecture.
More elaborate charging models can be accommodated where the core network is able to support a "roaming premium" for both message senders and message recipients, depending on their network-specific location.

12.8.2.3 Download In download, the charging model is based on the specific content and service involved, but mainly the recipient pays. If the content charge is to be made through the download event, then confirmation of successful download is needed, which is most reliably obtained from the delivery server. Prepaid charging would in this case support a balance reservation + commit-on-download-confirmation model. Revenue sharing (recognition of the content originator) and sponsorship (e.g., based on advertisement display or insertion) should also be supported in the charging information generated by the download server. A flow-aware core network can be used to recognize the flows relating to the download and accordingly suppress access charges, according to the operator's service offering.

12.8.2.4 Streaming Video In streaming video, the charging model is based on the duration of the streaming session, together with an initial setup fee, charged to the viewer of the content. A flow-aware core network can be used to identify the streaming traffic based on, for example, destination IP address, destination port, and protocol, with higher-layer analysis of the RTSP protocol being used to detect messages such as "play,"
“pause,” and “tear-down” as a basis for charging. The core network charging information can also include QoS indicators, which will often be important to indicate how the streamed session should be charged (e.g., if the video stream quality is poor).
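Duration-based metering driven by RTSP-level events can be sketched as follows; the setup fee and per-second rate are invented for illustration.

```python
# Streaming charge sketch: a setup fee plus time actually spent playing,
# derived from "play"/"pause"/"teardown" events detected in the RTSP flow.
SETUP_FEE = 0.50
PER_SECOND = 0.01

def streaming_charge(events):
    """events: time-ordered list of (timestamp_s, name); returns the charge."""
    total, play_start = 0.0, None
    for ts, name in events:
        if name == "play" and play_start is None:
            play_start = ts
        elif name in ("pause", "teardown") and play_start is not None:
            total += ts - play_start     # meter only while playing
            play_start = None
    return round(SETUP_FEE + PER_SECOND * total, 2)
```

Time spent paused is not charged, matching the event-based metering described above; a QoS indicator from the core could additionally scale the per-second rate.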
12.9 SUMMARY
In conclusion, this chapter has presented the evolution of charging systems and mechanisms within the telephony environment, and how these are being adapted to the needs of mobile content charging. A philosophy too often applied in the development of mobile technologies and creation of services is that charging is an issue for the charging systems to solve (i.e., an attitude that if you just "throw in a billing system at the end, it will all work"). Typically, the outcome where charging has not been well considered from the outset is a complex, expensive integration project that nevertheless results in a compromise charging solution, often with the end user completely unclear on how a given service is to be charged. Two main messages can be emphasized from this chapter:

1. Mass consumer take-up of mobile content-based services will not be achieved until understandable, predictable, and acceptable user charging can be provided (i.e., fully differentiated charging).

2. Charging logic should be designed into service applications and solutions from their inception, particularly for prepaid charging, where charging should be integral to the service authorization and should also support correlation of access and content charging data.
REFERENCES

1. 3rd Generation Partnership Project, Technical Specification Group Services and System Aspects, Service Aspects; Charging and Billing (Release 5), 3GPP TS 22.115.
2. WAP Billing Framework Version 1.0, 21 Nov. 2002, Open Mobile Alliance, OMA-WBF-v1_0-20021121-C.
3. C. Rigney, RADIUS Accounting, RFC 2139, April 1997.
4. 3GPP TS 23.078 V4.7.0 (2002-12), Technical Specification, 3rd Generation Partnership Project; Technical Specification Group Core Network; Customized Applications for Mobile Network Enhanced Logic (CAMEL) Phase 3, Stage 2.
5. IPDR.org, Network Data Management Usage Specification, version 3.1.1.
6. C. Rigney, S. Willens, A. Rubens, and W. Simpson, Remote Authentication Dial In User Service (RADIUS), RFC 2865, June 2000.
7. P. Calhoun, J. Arkko, E. Guttman, G. Zorn, and J. Loughney, Diameter Base Protocol, IETF work in progress.
8. A. Lior et al., Prepaid Extensions to Remote Authentication Dial-In User Service (RADIUS), work in progress, draft-lior-radius-prepaid-extensions00.txt, Feb. 2003.
9. H. Hakala et al., Diameter Credit Control Application, work in progress, draft-ietf-aaa-diameter-cc-00.txt, June 2003.
10. 3rd Generation Partnership Project, Technical Specification Group Core Network; Open Service Access (OSA); Application Programming Interface (API); Part 12: Charging (Release 5).
CHAPTER 13
ALGORITHMS AND INFRASTRUCTURES FOR LOCATION-BASED SERVICES

GANG WU and XIA GAO
DoCoMo USA Labs, San Jose, California

KEISUKE SUWA
Musashi Institute of Technology, Japan
13.1 INTRODUCTION
Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.

There has been explosive growth in mobile computing and a speedy emergence of new wireless technologies. The desire to be connected "anytime, anywhere, and anyway" has led to unprecedented research on mobile ubiquitous computing. One main difference between mobile ubiquitous computing and stationary computing is that mobile applications do not occur at a single location with a single context, but rather span a multitude of locations such as offices, homes, streets, highways, and mountains [1]. Therefore, a key distinguishing feature of mobile ubiquitous computing is the ability to detect, react to, and make use of changing environmental conditions (context) to provide users with a more seamless and intuitive experience. Location is considered one of the most fundamental factors of context that influence application behaviors. Context, as discussed by Schilit et al. [1], has three important aspects: where you are, whom you are with, and what resources are nearby. Depending on specific applications, other possible relevant context elements may include noise, light, temperature, speed, network traffic, and charging scheme. Among them, location is a constantly changing parameter in a mobile environment. Also, many resources (such as printers), services (such as wireless coverage), and other elements (such as noise) are location-dependent. The change of location can thus serve as a hint for mobile applications to refresh
their knowledge of the context of interest and respond accordingly. To this end, location-based services are gaining prime importance. It is predicted that worldwide location-based services revenues will grow from approximately $1 billion in 2000 to over $40 billion in 2006, representing a compound annual average growth rate of 81% [2]. Figure 13.1 shows a general framework to support location-based applications. The location estimation system usually has two components: a location sensor infrastructure and a location estimation algorithm. The location sensor infrastructure comprises both transmitters that actively or passively (on request) send out signals and receivers that receive and measure these signals. Different location estimation algorithms make use of various types of measurements of these signals, such as time of flight, angle, and signal strength. Then, after a location estimate is derived, the original location format may not be understood by applications and may have to be transformed into another presentation format by the "location format transformation" component. In the rest of the chapter, we will discuss each component of the framework in detail. We begin by defining location and location-based services. A taxonomy of location is then presented, according to the way that location information is used by mobile applications. Then, the different physical media transmitted between signal transmitters and receivers in the location sensor infrastructure are discussed. The advantages and disadvantages of each medium are covered so that the choice of a medium in a specific application scenario can be easily understood. Next, location estimation algorithms are surveyed. Since measurements received from the location sensor infrastructure are error-prone, the goal of these algorithms is to satisfy the estimation accuracy required by applications. At the same time, these algorithms try to optimize measurement cost and decrease system complexity.
Next, a number of indoor and outdoor location estimation systems are introduced. These systems utilize the different physical media and location estimation algorithms discussed in the previous sections. The work of the Open GIS (geographic information system) Consortium (OGC) is also introduced in this section. OGC, consisting of GIS software vendors, database vendors, integrators, and application providers, has put great effort into defining a standardized format for expressing geographic data, and communication protocols to expedite data exchange among different peers. The results from OGC are considered beneficial to the "location format transformation" component. Finally, we describe how to provide location services based on a cellular system.
Figure 13.1 A framework designed to support location-based applications.
13.2 TAXONOMY OF LOCATION
Location, as one of the most important aspects of context, has been widely factored into the design of mobile applications. Applications that are capable of finding the geographic location of an object and providing services based on this location information are called location-based applications. The location information can be accessed via different devices, such as a desktop, mobile phone, personal digital assistant (PDA), vehicle, or airplane. Diverse application scenarios include Enhanced 911 (E-911) emergency services, road assistance, geotargeted advertising, fleet tracking, navigation, and the smart office. Furthermore, location information can be integrated into network protocol design to provide location-aware routing, handoff, billing, and system planning services. Although all these applications require location information, the types of information are quite different. The most important types discussed here are physical location, symbolic location, absolute location, and relative location [3].

13.2.1 Physical and Symbolic Location

Physical location is expressed in the form of coordinates, which uniquely identify a point on a two-dimensional map of the earth. The most widely used coordinate system is the degrees/minutes/seconds (DMS) system. Some other common coordinate systems are degree decimal minutes and universal transverse Mercator (UTM). In the DMS system, two sets of lines, latitude and longitude, crisscross the map in two directions. Each set of lines is given a set of numbers (coordinates) so that every point on the earth can be expressed as the intersection of a latitude line and a longitude line, which in turn is expressed as the juxtaposition of the coordinates of the two lines. Latitude lines run east and west. The earth's equator is the zero line, the baseline for latitude. The coordinates increase both to the north and to the south from there to a maximum of 90°, which is a single point at each geographic pole.
The longitude lines, also known as meridian lines, run north and south and all cross each other at the poles. The baseline of the longitude lines is the line passing through Greenwich, a small town in England. From there, coordinates increase both to the east and to the west. Since there are only 360° in a circle, the maximum degree for a longitude line is 180° in both the east and west directions. Because the unit of a degree is too coarse to specify a point, each degree is broken up into 60 minutes, which in turn are broken up into 60 seconds. So, in the DMS system, the coordinates of a point have a format similar to N78°33′22″ E140°42′23″, where N78° is the number of degrees of north latitude and E140° is the number of degrees of east longitude. To uniquely identify an object in a 3D environment, the altitude can work together with the DMS coordinates. Symbolic location, on the other hand, expresses a location in a natural-language way: in the home, on the bed, on a train, or similar. This type of location is useful for many applications that do not need precise physical location information, especially for applications that have a fixed service range. For example, home security systems protect only the area that is part of a home, so that symbolic
expressions such as "in the home" or "out of home" are enough to trigger some actions. With help from the location information database, a physical location can be mapped to the corresponding symbolic information and vice versa. However, the resolution of a physical location can influence the definitiveness of the symbolic information. For example, a resolution of 10 m might not be enough to determine whether a person is in a specific room, because there might be several rooms within a 10-m range. If the resolution is improved to 1 m, the probability of successfully estimating the room occupancy increases significantly. On the other hand, because of the vagueness of symbolic location, it typically provides only very coarse-grained physical locations. For example, given a symbolic location such as "in an office," if the office has a radius of 4 m, the resolution of the physical location cannot be better than this if no additional information is collected.

13.2.2 Absolute and Relative Location
An absolute location uses a shared reference grid for all located objects [3]. For example, the DMS system provides absolute location information based on the latitude and longitude grid system. For the same location, the reports from two different location estimation systems should be the same. On the contrary, a relative location depends on its own frame of reference, and the reports from two different location estimation systems may differ for the same location. For example, a mobile host can be reported as "100 m from base station 1" or "200 m from base station 2." With knowledge of the absolute locations of the reference points, which are usually stored in a location database, an absolute location and a relative location can be transformed into each other. Relative location information is usually based on proximity to known reference points, such as access points in a wireless local area network (WLAN) or base stations in a cellular system. So, this type of information can be provided easily by existing infrastructure without specialized location tracking infrastructure. Some useful applications include geotargeted advertising and E-911 services. For example, a geotargeted advertisement could send out location-specific guides to local restaurants, theaters, or even traffic information to mobile users registered at local cells. An E-911 emergency call is able to reveal which cell the emergency call originates from so that rescue teams can be dispatched in an efficient way.
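The transformation between the DMS coordinates described above and decimal degrees, one simple instance of the "location format transformation" component, can be sketched as:

```python
# Convert degrees/minutes/seconds to decimal degrees: 60 minutes per degree,
# 60 seconds per minute. By convention N/E are positive and S/W negative.
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value
```

For example, the latitude N78°33′22″ from the text becomes roughly 78.556 decimal degrees.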
13.3 LOCATION ESTIMATION MEDIA
Location estimation algorithms are based on measurements of specific parameters of a received signal that enable the position of a device to be inferred. Depending on the targeted applications, location estimation algorithms vary enormously in terms of the type of measured signals and parameters, and the accuracy, usage, and category of the derived location information. The measured signals could be physical signals such as ultrasound, radio, infrared, and visible
light, or software signals such as IP (Internet Protocol) packets. The measured parameters could be time, distance, angle, signal attenuation, physical contact, or IP headers. The measurement accuracy could be a few meters (for geographic survey), tens of meters (for fleet tracking), a cellular cell (for paging), an office room (for business usage), or an Internet subnet (for geographic content targeting). In this section we introduce how the characteristics of different physical media (ultrasound, radio, infrared, and visible light) influence their deployment in location estimation systems. IP-based location estimation techniques and services are covered later in a separate section.

13.3.1 Radiofrequency (RF)

Radiofrequency (RF) refers to any frequency within the electromagnetic radiation spectrum normally associated with radio wave propagation. When an RF current is supplied to an antenna, an electromagnetic field, usually called an RF field or simply a "radio wave," is produced that propagates through space and is suitable for wireless communications. An RF signal travels at the speed of light in free space (3 × 10^8 m/s) and has a wavelength inversely proportional to its frequency. RF covers a significant portion of the electromagnetic radiation spectrum, ranging from 9 kilohertz (kHz), the lowest allocated wireless communications frequency, to thousands of gigahertz (GHz). The RF spectrum is further divided into several bands, and its allocation in the United States is managed by the Federal Communications Commission (FCC). Many types of wireless communication systems make use of the RF spectrum, such as cellular telephone systems, satellite communication systems, and WLAN systems (e.g., IEEE 802.11). Because of the popularity of wireless communications and the convergence of cellular networks and WLANs toward ubiquitous mobile computing, the location-based services and estimation technologies enabled by RF are the main focus here.
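The inverse relation between wavelength and frequency noted above can be checked with a one-line helper (the example frequencies are illustrative):

```python
# Sketch: RF wavelength is inversely proportional to frequency (lambda = c / f).
C = 3.0e8  # speed of light in free space, m/s

def wavelength_m(freq_hz):
    return C / freq_hz

print(wavelength_m(2.4e9))  # 2.4 GHz (802.11b/g band): 0.125 m
print(wavelength_m(9e3))    # 9 kHz, the lowest allocated frequency: about 33 km
```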
This section covers the characteristics of radio waves that influence the accuracy of location estimation.

13.3.1.1 Multipath Propagation

Multipath propagation refers to the phenomenon in which a transmitted signal arrives at a receiver from various directions over a multiplicity of paths because of obstacles and reflectors in the radio propagation channel. The direct path between the transmitter and the receiver is called line of sight (LOS), while the other paths are called non-line-of-sight (NLOS) paths. Because most location estimation algorithms depend on the existence of a LOS path between the measured object and the system reference points to correctly measure distance or angle, multipath propagation can be the dominant source of error in location estimation by introducing errors in LOS detection. Three propagation mechanisms [4], as illustrated in Figure 13.2, play a role. Reflection occurs when a radio wave encounters a surface that is large relative to the wavelength of the signal. Diffraction occurs at the edge of an impenetrable body that is large compared to the wavelength of the signal. If the size of an obstacle is on the order of the wavelength of the signal or less, scattering occurs and the incoming signal is scattered into several weaker outgoing signals.
444
ALGORITHMS AND INFRASTRUCTURES FOR LOCATION-BASED SERVICES
Figure 13.2 Three propagation mechanisms of multipath propagation: reflection, diffraction, and scattering.
One problem of multipath is amplitude and phase fluctuation, also referred to as multipath fading. Multipath fading happens when the multipath waves of a signal reach a receiver out of phase, leading to signal cancellation or reinforcement depending on the phase of each wave. Such rapid variations in signal strength and phase occur over a distance on the order of a wavelength; thus multipath fading is a small-scale effect and makes it harder to estimate the distance between a transmitter and a receiver by measuring the received signal strength (RSS). The other problem of multipath is called delay spread. When multiple radio waves of a transmitted signal reach a receiver at different times, the NLOS copies of one data bit may collide with the LOS copy of neighboring data bits and cause intersymbol interference (ISI). These delayed signals act as a form of noise on the subsequent primary signal and in many cases have similar amplitude, which makes recovery of the data more difficult. The Rayleigh and Rician radio propagation models are widely used to model rapid amplitude fluctuation due to multipath fading in outdoor environments. The Rayleigh model is applicable to the situation where there are multiple NLOS paths of equal strength between a transmitter and a receiver but no dominant path such as a LOS path. The Rician model, on the other hand, best characterizes a situation where there is a direct LOS path in addition to a number of NLOS paths; it contains the Rayleigh model as the special case in which the strong LOS path is eliminated. The radio propagation model in an indoor environment is hard to characterize because of severe multipath, the low probability of an available LOS path, and site-specific parameters such as the floor layout, moving people, and numerous reflecting surfaces.
So far there are no good models of the multipath characteristics of indoor radio channels for geolocation estimation [5], and more advanced location estimation algorithms using scene analysis or proximity (discussed in later sections) have been developed to mitigate location measurement errors.
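The Rayleigh and Rician amplitude models described above can be sampled directly from two Gaussian components, using only the standard library. This is a minimal sketch: the Rician case adds a dominant LOS component of amplitude `los` to the in-phase term, and with `los = 0` it reduces to Rayleigh, as the text notes.

```python
# Sketch: sampling received-signal amplitudes under the Rician model.
# With los = 0 this is the Rayleigh model (no dominant LOS path).
import math, random

def rician_amplitude(sigma, los=0.0, rng=random):
    """One sampled amplitude |h|: LOS offset plus NLOS scatter."""
    x = rng.gauss(los, sigma)   # in-phase component (LOS offset + scatter)
    y = rng.gauss(0.0, sigma)   # quadrature component (scatter only)
    return math.hypot(x, y)

rng = random.Random(42)
rayleigh = [rician_amplitude(1.0, 0.0, rng) for _ in range(100_000)]
# Rayleigh mean amplitude is sigma * sqrt(pi / 2), about 1.2533 for sigma = 1
print(sum(rayleigh) / len(rayleigh))
```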
13.3.1.2 Other Interference Factors

Besides the small-scale multipath fading, which fluctuates over a distance on the order of a wavelength, RF channels exhibit the medium-scale effect of shadowing and the large-scale effect of path loss. These two effects have a great impact on outdoor location estimation systems such as satellite systems and cellular systems. Shadowing describes the gradual variation in mean power over a distance on the order of a few tens of wavelengths. It is due mainly to local attenuation by obstacles, such as trees, in the vicinity of an antenna and is well described by a log-normal distribution. Large-scale path loss describes the slow variation in mean power over a large area (usually on the order of tens or hundreds of meters) and is modeled as attenuation of the signal due to the large distance of travel. Examples of large-scale path loss models are free-space loss, plane earth loss, and diffraction loss. In the free-space loss model, a signal attenuates over distance because it is spread over a larger and larger area. In the plane earth loss model, a strong LOS path is present, but ground reflections also exist and significantly influence the path loss. In the diffraction loss model, plane earth loss is modified to take into account significant diffraction losses caused by obstacles cutting into the LOS path. The attenuation generally grows as the distance raised to a power called the path loss exponent, which is usually between 3 and 5.

13.3.2 Infrared (IR)

The infrared (IR) region of the electromagnetic spectrum occupies the terahertz (THz, 10^12 Hz) range of frequencies, and its spectrum is roughly divided into four subregions: short, medium, long, and very long wavelength. These four bands, from short to very long, cover the wavelengths of 1–3, 3–8, 8–14, and 14–30 μm, respectively. One of the main challenges of IR-based systems is the interference imposed by ambient light sources.
Such light sources, including local incandescent sources, fluorescent lighting, and sunlight, have relatively high power compared with the transmitted IR signal. Although IR optical filters can be used to attenuate the visible portion of the spectrum while leaving the IR intact, these sources still cause severe interference because they produce a noise level that is in many cases 60 dB greater than the desired IR signal. Furthermore, these sources, especially sunlight, also produce a large amount of infrared energy (in-band noise) in addition to the visible component. IR systems are generally used for two purposes. The first is object detection, sensing, and tracking, based on the fact that all heated objects emit IR radiation and every type of object has a unique IR signature, or fingerprint. By deriving a digital image from the received radiation signal and matching the image against a fingerprint database, IR sensing systems can detect the existence and type of objects. The advantage of this kind of system is that no specific IR transmitters are needed. The disadvantage is that more sensitive and delicate IR receivers are required to capture sometimes very weak signals and to produce the
image. Because most IR sensing systems use a photon detector made from IR-sensitive materials such as mercury-cadmium-telluride (HgCdTe) to detect IR radiation, such systems are very expensive to produce. Because this type of system also cannot distinguish between objects of the same type and is vulnerable to environmental interference, it is not suitable for general-purpose location estimation services. The second usage of IR systems is data communications, because IR uses high operational frequencies and provides high bandwidth. Popular commercial products include remote controls for domestic appliances and data backup links for PDAs and laptops. Compared with IR sensing systems, IR data communication systems use dedicated IR transmitters that modulate IR waves in different ways to transmit digital signals. Because no image production capability is needed, the IR transmitter can be a light-emitting diode (LED) or an injection laser diode (ILD), and the IR receiver can be a photodiode. Despite their limited transmission range (10 m), these devices are cheap and can be mass-manufactured and integrated into large-scale sensor networks.

13.3.3
Ultrasound
Unlike RF and IR, which are electromagnetic waves, a sound wave is a pressure disturbance that travels through a medium by means of particle interaction. The interaction is an oscillation of the constituent particles of the medium, causing them to be positioned alternately closer to and farther apart from each other. The oscillations, and therefore the sound wave, must be produced by a source that physically oscillates back and forth, causing adjacent medium particles to oscillate with it. Hence sound cannot exist without a medium, and the properties of a given medium heavily influence the manner in which a sound wave propagates through it. The speed at which a sound wave propagates through a given medium depends only on the elasticity and density of the medium, not on the frequency of the wave. For all practical purposes of location estimation, the speed of sound in air depends mainly on the absolute temperature, which directly affects the density of the air. The temperature dependence of the speed of sound in air is approximated by

v = 331 + 0.6T

where T is the temperature of the air in degrees Celsius and the velocity v is in m/s. At normal atmospheric pressure and a temperature of 20°C, the equation yields 343 m/s. Ordinary ultrasound systems operate at frequencies between 20 and 100 kHz: sounds below 20 kHz are audible to humans, and the use of frequencies above 100 kHz is limited by the attenuation of ultrasound in air. Ultrasonic location systems are able to estimate location with a higher degree of accuracy than location systems based on RF, IR, or visible light. This is because the speed of ultrasound in air (approximately 343 m/s in an indoor environment) is much slower than
the speed of the other media (the speed of light is 3 × 10^8 m/s), so the time of flight of an ultrasonic signal between a transmitter and a receiver can be measured accurately. On the other hand, because the intensity of sound decreases quickly with transmission distance, ultrasound location systems have a limited coverage area and are suitable only for indoor use. As an ultrasound wave travels through the air, it also exhibits several behaviors similar to those of RF when it reaches the end of the medium or meets obstacles, including reflection, diffraction, and scattering. Also, because the velocity of sound depends on environmental factors such as the ambient temperature and humidity, it can exhibit both temporal and spatial variations within a building and introduce additional measurement errors.
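Putting the temperature formula to work, ultrasonic time-of-flight ranging is a direct multiplication; a minimal sketch:

```python
# Sketch: ultrasonic time-of-flight ranging using the temperature-dependent
# speed of sound v = 331 + 0.6*T (T in degrees Celsius, v in m/s).
def speed_of_sound(temp_c):
    return 331.0 + 0.6 * temp_c

def distance_m(time_of_flight_s, temp_c=20.0):
    return speed_of_sound(temp_c) * time_of_flight_s

print(speed_of_sound(20.0))  # 343.0 m/s at 20 degrees C
print(distance_m(0.01))      # 10 ms of flight is about 3.43 m
```

Neglecting the temperature correction (e.g., assuming 20°C when the room is at 30°C) biases every range by about 1.7%, which is one source of the environmental error mentioned above.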
13.4
LOCATION ESTIMATION ALGORITHMS
Having introduced the characteristics of the different media used in location estimation systems, we now describe three typical location estimation algorithms: triangulation, scene analysis, and proximity. Targeting different application environments or services, these algorithms have unique advantages and disadvantages; hence some hybrid systems use more than one type of location algorithm at the same time to get better performance.

13.4.1 Triangulation

Triangulation is the technique of using the geometric properties of triangles to compute an object's location. The location is estimated relative to some known framework, which consists of either fixed terrestrial sites (e.g., basestations in a cellular system) or space-based satellites (e.g., the Global Positioning System). The triangulation technique has two derivations: lateration and angulation. Lateration locates an object by measuring its distances from multiple reference positions, while angulation locates an object by computing angles or bearings relative to multiple reference positions. Instead of measuring distance directly, the time of arrival (TOA) or time difference of arrival (TDOA) is usually measured, and the distance is then derived by multiplying the travel time by the signal velocity (3 × 10^8 m/s for light and radio). In other words, all these methods depend on emitting and receiving light or radio signals to determine the location of an object to which a light or radio transceiver is attached.

13.4.1.1 Time of Arrival (TOA)

A TOA system depends on measurements of the time of arrival of multiple signals traveling from reference points to the object at the estimated location. As shown in Figure 13.3, to enable a two-dimensional calculation, TOA measurements must be made with respect to signals from at least three fixed reference points. Ideally, if all the time measurements are perfect and error-free, the intersection of
Figure 13.3 A time-of-arrival (TOA) system.
the three dashed circles will uniquely pinpoint the location P. On the other hand, if one or more measurements exhibit error because of unsynchronized clocks or the other factors discussed later, the three distance circles (shown in Figure 13.3 as the thick circles) will not intersect at one point. Instead, there will be three intersections, P1, P2, and P3. Normally, (P1 + P2 + P3)/3 is a good estimate of the real location P. But if it is known a priori that the clock offset at the estimated object is the main reason for the measurement discrepancy, and the clocks of the three reference points are perfectly synchronized to Coordinated Universal Time (UTC), then by looking for a single correction factor Δt that allows all the measurements to intersect at one point, the system can not only achieve accurate location estimation but also synchronize the clock of the estimated object to UTC. This is exactly the mechanism used by GPS to compensate for the clock offset between GPS receivers and satellites and to provide atomic-accuracy timing even to the lowliest GPS receivers.

13.4.1.2 Time Difference of Arrival (TDOA)

The TDOA system is also based on the lateration technique, but it uses time difference measurements rather than the absolute time measurements used by a TOA system. The time difference is converted to a distance difference by multiplying it by the signal transmission speed. Because the points with a constant distance difference to two reference points form a hyperbolic curve, the TDOA system is also referred to as
Figure 13.4 A time-difference-of-arrival (TDOA) system.
the hyperbolic system. As shown in Figure 13.4, at least three fixed reference points and two pairs of time measurements are needed for two-dimensional location estimation. One hyperbola, h13, is defined by the measurement pair (d1 − d3) = (t1 − t3) × (signal speed) = constant, and the other hyperbola, h23, is defined by the pair (d3 − d2) = (t3 − t2) × (signal speed) = constant. Because time differences are used, as long as the clocks at the reference points are perfectly synchronized, the clock offset of the estimated object cancels out during the calculation, so the accuracy of the location estimation is not affected.

13.4.1.3 Angle of Arrival (AOA)

An AOA system depends on measurements of the angles of arrival of the signals involved in location estimation. Because directional antennas or antenna arrays are usually required, an AOA system is difficult to implement on small mobile devices. As shown in Figure 13.5, at least two geographically fixed reference points and a pair of measurements are required to calculate the
Figure 13.5 Angle-of-arrival (AOA) system.
two-dimensional location. The intersection of the two directional lines of bearing, each formed by a radial from a reference point to the estimated object, defines a unique location. Although the lateration and angulation techniques are introduced separately in this section, a real implementation could combine both in a hybrid system to generate a more accurate estimate at the cost of additional complexity.

13.4.2
Scene Analysis
In the triangulation type of location estimation schemes discussed above, geometric information such as distance, time, or angle is measured directly to derive the location. Scene analysis, by contrast, refers to algorithms that first collect features (fingerprints) of a scene that are not explicitly related to geometric information and then infer the location of an object by matching real-time measurements against the closest a priori location fingerprints. Scene analysis has been widely used in many research fields, such as human face identification and terrain matching, and a few systems for location estimation have already been proposed. The "smart floor" [6] developed at the Georgia Institute of Technology installs pressure sensors in a building floor to capture footfalls and uses the data for position tracking and pedestrian recognition. EasyLiving [7], developed by Microsoft Research, uses high-performance 3D cameras to capture and analyze visual frames to provide vision-based positioning in a home environment. RF-based scene analysis systems also exist and are the main focus of this section because of the popularity of RF-based indoor systems such as IEEE 802.11 and Bluetooth.

13.4.2.1 Rationale for RF-Based Scene Analysis

One big advantage of schemes based on indoor RF over those based on IR, ultrasound, and satellites is that no special positioning infrastructure needs to be deployed. Wireless communication infrastructure is already being widely deployed in indoor areas (e.g., WLAN, Bluetooth), and RF-based scene analysis techniques can use this general-purpose infrastructure to provide location estimation as a value-added service, which other applications can in turn use to give users a more seamless experience. This contrasts with systems based on IR, ultrasound, or satellites, where special IR or ultrasound sensor networks or GPS satellites have to be deployed.
Another advantage of RF-based schemes is their wide coverage area, scalability, and easy maintenance. As discussed above, IR technology has some limitations: (1) it scales poorly because of the limited range of IR, and (2) it is very sensitive to direct sunlight. Ultrasound technology has similar limitations: (1) it behaves poorly over long distances because the signal attenuates quickly in air, and (2) the velocity of ultrasound is greatly influenced by the temperature and humidity of the air. GPS works well outdoors but poorly indoors. The two main characteristics of RF relevant to location estimation are signal strength (SS) and signal-to-noise ratio (SNR). As indicated in another study [8], SS usually is a
better index than SNR. The SS can be used to determine the distance between a transmitter and a receiver in two ways. The first approach is to use the received SS to estimate the path loss of the RF signal over the path and then calculate the distance according to an RF attenuation model such as the Rayleigh or Rician model. With knowledge of the SS from three distinct transmitters, the location of the receiver can then be determined by the trilateration technique discussed above. However, because both SS and SNR are subject to severe multipath fading and fluctuate over very short distances, the variation of SS is too wide for this approach to yield accurate location information. So the second approach, scene analysis based on SS fingerprinting [8–10], is used and gives better results than the first approach [8]. In this approach, real-time measurements of SS are not used directly to calculate distance but are matched against SS entries in a database. The entries of this database contain the SS of different locations, collected a priori. In the simplest case, the location of the entry closest to the real-time SS measurements approximates the location of the object.

13.4.2.2 General Framework of Scene Analysis

Because measurements of RF characteristics have large variation and generally cannot be mapped accurately to a location by a closed-form model, the scene analysis approach uses these measurements in a fingerprinting manner. Most such systems involve three operational steps: profiling, matching, and estimation.

Profiling This step usually takes place off line, before the system can be used for real-time location estimation. First, the wireless coverage area is divided into small portions, each including one or several observation points. Then, depending on the specific algorithm, multiple samplings of the required RF parameters are collected at each observation point. If necessary, the measurement data can be further postprocessed.
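The collect-and-average core of this profiling step can be sketched in a few lines; the sample values and data layout below are invented for illustration.

```python
# Sketch of the profiling step: average several signal-strength (SS) samples
# per observation point and keep them in a location database keyed by (x, y).
def build_location_db(raw_samples):
    """raw_samples: {(x, y): [[ss_ap1, ss_ap2, ...], ...]} -> averaged entry."""
    db = {}
    for point, samples in raw_samples.items():
        n_aps = len(samples[0])
        db[point] = [sum(s[i] for s in samples) / len(samples) for i in range(n_aps)]
    return db

raw = {
    (0.0, 0.0): [[-40, -62, -71], [-42, -60, -69]],  # two samplings, three APs
    (5.0, 0.0): [[-55, -48, -66], [-53, -50, -64]],
}
db = build_location_db(raw)
print(db[(0.0, 0.0)])  # [-41.0, -61.0, -70.0]
```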
Finally, the processed measurement data are stored in a location database to be retrieved later for online matching. To guarantee that the samples stay up to date and reflect the current layout of the building, the profiling step has to be repeated whenever the building environment changes dramatically. The samples also have to be collected multiple times at each observation point, at different times of day, to smooth out measurement variation throughout the day. How to choose the observation points is a design issue: the points can form a simple grid, or they can be distributed unevenly depending on the popularity and layout of the location. Obviously, increasing the density of observation points increases the closeness between fingerprinting points and observation points and therefore the accuracy of the location estimates. The tradeoff is that the number of database entries also increases, which requires more storage and lengthens the search time. There is also a threshold beyond which a further increase in the density of observation points has little influence on estimation accuracy [9].

Matching In this step, a real-time measurement of an object is obtained and input to the computation unit. The computation unit can be on either the client or the infrastructure
side, depending on the specific system. It compares the real-time measurements with the entries in the location database and finds the "closest" match(es). Different systems may use different indices to define the "closest" relation between two measurements, and one or several of the closest entries may be returned, depending on how the actual location is estimated from these entries. For example, a generalized weighted Lp distance between the measured vector [m1, m2, . . . , mN] and a database entry [e1, e2, . . . , eN] can be computed [10] as

L_p = \left( \frac{1}{N} \sum_{i=1}^{N} \frac{1}{v_i} \, |m_i - e_i|^p \right)^{1/p}

Lp becomes the Manhattan L1 distance when p = 1 and the Euclidean L2 distance when p = 2. In most cases all the entries have vi = 1, but possible enhancements could use the weight vi to bias the distance according to how reliable a measurement is. For this purpose, vi can be related to the total number of samples used to compute the average at each observation point (the larger the number of samples, the more accurate the average), or to the standard deviation of the samples (the smaller the standard deviation, the more reliable the measurement). Besides these deterministic algorithms, stochastic algorithms such as Bayesian networks [9] can also be used to find the match. A Bayesian network is a graphical representation of a joint probability distribution that explicitly declares the dependency relationships between the random variables in the distribution. Figure 13.6 shows an example of how a Bayesian network is used for location estimation in the Nibble system, a WLAN-based system for indoor location estimation. The Bayesian network is a rooted tree with directed arcs from the root node Q to a set of WLAN access points (APs). The root node Q is the "query" variable that describes p(Q), the a priori distribution over a set of locations Q = {q1, q2, . . . , qj}.
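The weighted Lp distance defined above translates directly into code; a minimal sketch, with the default vi = 1 case noted in the text:

```python
# Sketch: generalized weighted Lp distance between a measured SS vector
# and a database entry. Weights v_i default to 1, the common case.
def lp_distance(measured, entry, p=2, weights=None):
    n = len(measured)
    w = weights or [1.0] * n
    s = sum((1.0 / w[i]) * abs(measured[i] - entry[i]) ** p for i in range(n))
    return (s / n) ** (1.0 / p)

print(lp_distance([-40, -60], [-43, -56], p=1))  # Manhattan L1: (3 + 4)/2 = 3.5
print(lp_distance([-40, -60], [-43, -56], p=2))  # Euclidean L2: sqrt(25/2)
```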
The location set Q includes all the locations of interest that a mobile device wants to track. The default distribution of Q is uniform, but it can easily be modified to reflect the user's preferences or how frequently the user currently appears at each
Figure 13.6 An example of a Bayesian network for location estimation in the Nibble system [9]: Q = {office, library, seminar room, doorway}; E = {Low SNR, High SNR, Unknown}.
location. In the profiling step, at each location of interest, multiple samples from each AP are collected, and the marginal conditional probability that a value ei ∈ E is observed from an AP, given that the current object location is qj ∈ Q, is calculated. After all locations are sampled, the marginal conditional probability p(E|Q) of each AP is stored in a separate leaf node of the Bayesian network. Then, in the matching step, to estimate a location, a mobile object first samples and quantizes the signals from each AP. The results form a vector R = {r1, r2, . . . , rj | rj ∈ E}, which the Bayesian network uses to calculate p(Q|R), the a posteriori probability distribution over the location set Q. The conditional probability of each location is then used to define "closeness." Another issue of the matching problem lies in the domain of computational geometry: because the location database can be quite large for wide-area coverage, efficient organization of the data and a corresponding search algorithm are important for real-time performance. There is a fair amount of literature dealing with these issues; they are beyond the scope of this chapter.
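For the tree-structured network above, p(Q|R) is proportional to p(Q) multiplied by the product of the per-AP conditional probabilities. A minimal sketch of that posterior computation follows; the probability tables and location names are invented for illustration.

```python
# Sketch of a Nibble-style posterior: p(Q | R) is proportional to
# p(Q) * product over APs of p(r_i | Q), then normalized.
def posterior(prior, cond, reading):
    """prior: {loc: p}, cond: {loc: [{evidence: p} per AP]}, reading: [e per AP]."""
    unnorm = {}
    for loc, p in prior.items():
        for ap_table, e in zip(cond[loc], reading):
            p *= ap_table.get(e, 1e-9)  # small floor for unseen evidence values
        unnorm[loc] = p
    z = sum(unnorm.values())
    return {loc: p / z for loc, p in unnorm.items()}

prior = {"office": 0.5, "library": 0.5}  # uniform a priori p(Q)
cond = {  # invented p(E | Q) tables, one per AP
    "office":  [{"high": 0.8, "low": 0.2}, {"high": 0.1, "low": 0.9}],
    "library": [{"high": 0.3, "low": 0.7}, {"high": 0.6, "low": 0.4}],
}
post = posterior(prior, cond, ["high", "low"])
print(max(post, key=post.get))  # "office"
```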
Estimation This step produces the location estimate of the measured object using the k "closest" entries selected in the matching step. If k is equal to 1, the selected entry is the closest entry and is taken to be the location of the object. If k is larger than 1, the estimation step uses different algorithms to derive the estimated location from these k entries. The rationale is that often there are multiple neighbors at roughly the same distance from the measured object; given the large variation of RF measurements, there is no fundamental reason to pick only the closest neighbor and reject others that are almost as close. It has been shown [8,10] that algorithms such as k nearest-neighbor averaging and the smallest k-vertex polygon provide more accurate estimates. k nearest-neighbor averaging returns the k closest locations and then uses their average as the location estimate. The smallest k-vertex polygon finds the k locations that form the polygon with the smallest perimeter and then takes the average of these k locations as the location estimate. As shown in Figure 13.7, when k is equal to 3, three points (P3, P4, and P5) among a total of seven
Figure 13.7 Smallest k-vertex polygon location estimation algorithm (k = 3).
form a triangle with the minimum perimeter, and the location estimate L is the average of these three points.

13.4.3
Proximity
The main goal of proximity location estimation algorithms is to detect whether an object is near a known location. In other words, proximity algorithms mainly provide symbolic and relative location information. Such proximity information can be used by many applications that do not need very accurate physical location information. A proximity location system is usually implemented as a software enhancement on existing service infrastructure; hence it is often more cost-effective and has a shorter turnaround time than other systems that rely on specialized sensor infrastructure. Proximity information can be used in many different ways. It can help devices requiring physical contact to provide more intelligent services; for example, lamps (or computers, TVs, etc.) can turn on automatically once the presence of a user is detected. It can also be used to provide a mobile "Yellow Pages" that sorts information according to the distance from the information source to the user. In the following section we discuss two algorithms providing proximity location information: IP subnet detection and basestation detection. IP subnet detection occurs in the wired Internet and is supported by most current content delivery network (CDN) operators. Basestation detection is required by the FCC for E-911 emergency service and is mandatory for all cellular operators.
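IP subnet detection amounts to a longest-prefix style lookup against a table of known subnets; the sketch below uses Python's standard `ipaddress` module, with an invented subnet-to-location table (addresses are from the 192.0.2.0/24 documentation range).

```python
# Sketch of IP-subnet proximity detection: map a client address onto the
# known subnets of a (hypothetical) location table.
import ipaddress

SUBNET_LOCATIONS = {  # invented prefix -> symbolic location mapping
    ipaddress.ip_network("192.0.2.0/25"):   "Building A, floor 2",
    ipaddress.ip_network("192.0.2.128/25"): "Building A, floor 3",
}

def locate(addr):
    ip = ipaddress.ip_address(addr)
    for net, place in SUBNET_LOCATIONS.items():
        if ip in net:
            return place
    return None  # address not in any known subnet

print(locate("192.0.2.200"))  # "Building A, floor 3"
```

The result is symbolic ("which subnet"), not geometric, which is exactly the granularity proximity applications such as geographic content targeting need.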
13.5
LOCATION ESTIMATION SYSTEMS
In Sections 13.3 and 13.4 we introduced the main media and the basic estimation algorithms used in current location estimation systems. In this section we describe in detail some representative systems that combine these technologies in different ways and are therefore suited to different usage scenarios. These systems are broadly divided into two categories: indoor systems and outdoor systems. At the end of the section, the work of the Open GIS (geographic information system) Consortium (OGC) is introduced briefly. This work can be considered part of the "location format transformation" component shown in Figure 13.1, which handles the transformation between different location formats.

13.5.1
Indoor Location Estimation Systems
Indoor location estimation systems have a limited coverage area, typically high estimation accuracy requirements, and a complex environment setup. All of the systems based on RF, infrared, and ultrasound can be used in an indoor environment.

13.5.1.1 Scene-Analysis-Based Systems

We introduce two systems, RADAR and Nibble, as examples in this section.
RADAR [8,10] is an RF-based indoor location tracking system developed at Microsoft Research. The system is based on IEEE 802.11 WLAN technology and uses scene analysis for location estimation. In the original RADAR testbed, three access points (APs) of an 802.11 WLAN were able to cover a floor with over 50 rooms. In the profiling step, each AP measures both the SS and SNR of the RF signal transmitted by a mobile object from one of the observation points. Because it has been discovered [8] that signal strength at a given location varies significantly depending on the object's orientation (i.e., east, west, north, or south), multiple measurements facing each orientation are collected at each observation point. The average of the measurements is then taken at each AP. Finally, the samples from the three APs are combined into a tuple of the form (x, y, d, SS_i, SNR_i), where x and y are the coordinates of an observation point, d is the orientation, and i ∈ {1, 2, 3} corresponds to the three APs. Such a tuple is collected for each observation point and stored in a location database. In the matching step, a real-time measurement of a mobile object is collected and compared with the entries in the location database. Finally, a k nearest-neighbor averaging algorithm is used to derive the location estimate. Overall, the RADAR system is able to estimate location with a high degree of accuracy. Its median (50th percentile) error distance is 2–3 m. The k nearest-neighbor averaging scheme (k = 3) significantly outperforms the single-closest-location scheme, and it has been shown that there are thresholds of both the density of observation points and the number of real-time samples beyond which further increases do not improve the accuracy dramatically. The Nibble system [9] is also a WLAN-based indoor location system utilizing scene analysis techniques. It was developed at the UCLA Multimedia Systems Lab as part of its multiuse sensor environment (MUSE) project.
It runs on mobile objects and uses the RF signals sent by nearby WLAN APs to derive location estimates. The most significant difference between Nibble and other systems is that it relies on an evidential reasoning model, namely, a Bayesian network, to aggregate and interpret information from sensors (APs in this case) to provide location estimation services. In the profiling phase, the SNR is collected at each location of interest and then quantized into discrete values; in the current implementation, the SNR value set is E = {low, high, unknown}. The Bayesian network is then used in the matching phase, and the location with the largest conditional probability is taken as the location estimate. Furthermore, Nibble defines a "quality of information" metric to characterize the service performance of sensors. Nibble uses this metric to select the most reliable sensors from which to retrieve RF signal measurements. In this way, for the same estimation accuracy, the number of queried sensors (i.e., the cost) can be minimized. Currently, the Nibble system can discriminate locations roughly 3 m apart.
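The scene-analysis matching step — comparing a live measurement against a fingerprint database and averaging the k best matches, as in RADAR — can be sketched as follows. The fingerprint table, AP count, coordinates, and signal values are invented for illustration; they are not RADAR's data.

```python
import math

# Hypothetical fingerprint database: (x, y, orientation) -> mean signal
# strengths observed from three APs at that observation point.
fingerprints = {
    (0.0, 0.0, "north"): (-40.0, -62.0, -75.0),
    (5.0, 0.0, "north"): (-48.0, -55.0, -70.0),
    (0.0, 5.0, "north"): (-52.0, -66.0, -58.0),
    (5.0, 5.0, "north"): (-60.0, -58.0, -52.0),
}

def estimate_location(sample, k=3):
    """k nearest-neighbor averaging in signal space (RADAR's scheme)."""
    # Rank observation points by Euclidean distance in signal space.
    ranked = sorted(
        fingerprints.items(),
        key=lambda item: math.dist(item[1], sample),
    )
    nearest = ranked[:k]
    # Average the physical coordinates of the k best matches.
    x = sum(pos[0] for pos, _ in nearest) / k
    y = sum(pos[1] for pos, _ in nearest) / k
    return x, y
```

With k = 1 this degenerates to the single-closest-location scheme the text compares against; k = 3 reproduces the averaging variant that RADAR found more accurate.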
13.5.1.2 Ultrasound-Based Location Estimation Systems
There are both narrowband and broadband ultrasonic location systems. The narrowband ultrasonic transducers used in existing ultrasonic systems have piezoelectric
ALGORITHMS AND INFRASTRUCTURES FOR LOCATION-BASED SERVICES
ceramics as their active elements. A piezoelectric material has the property that a mechanical deformation of the material is proportional to a change in the electric field across it. Hence, a piezoelectric material can be used as an ultrasonic transmitter by modulating the electric field across the material; conversely, it can be used as an ultrasonic receiver/detector by measuring the electric field across the material. This kind of transducer has a usable bandwidth of less than 5 kHz but is inexpensive, small, and robust. As a result, it is widely used in location systems with a large number of transmitters [11,12]. The BAT system [12] and the Cricket system [11] share some common properties: (1) both use narrowband ultrasound as the measuring medium, (2) both use triangulation and time of arrival as the estimation algorithm, (3) both achieve accuracies of several centimeters, and (4) both place ultrasound and RF transducers on each measured object. Because narrowband ultrasound is very difficult or expensive to modulate to carry digital data, in both systems the ultrasound takes the form of a bare pulse, and RF is relied on to carry system information. Developed at AT&T Laboratories Cambridge, England, the BAT system consists of a central RF basestation, a matrix of fixed system receivers, and a collection of measured objects each equipped with an RF receiver and an ultrasonic transmitter. To find the location of an object, the central RF basestation sends out an RF signal with a unique ID identifying that object. On receiving the RF signal and verifying the embedded object ID, the probed object immediately sends out an ultrasonic pulse. Having also received the initial RF signal, nearby system receivers can measure the distance between themselves and the measured object by observing the time interval between the receipt of the RF signal and the receipt of the corresponding ultrasonic signal.
The trilateration technique is then used to calculate the location of the measured object. The Cricket system is a location support system developed at the MIT Laboratory for Computer Science. To maintain the location privacy of each mobile host, location estimation is not carried out on the system side. Instead, the Cricket system distributes independent, unconnected transmitters throughout a building. Each transmitter sends an RF signal while simultaneously sending out an ultrasonic pulse. A mobile host needs both an RF and an ultrasonic receiver to derive its location: on receiving the initial RF signal, the mobile host activates its ultrasonic receiver and measures the time difference between the arrival of the RF signal and that of the corresponding ultrasonic signal. This time-of-flight measurement is then used to derive the distance between the mobile host and the transmitter. After the distances to multiple transmitters have been measured, the location of the mobile host can be calculated using the trilateration algorithm. Location systems using narrowband ultrasound have two main limitations: (1) they are very sensitive to in-band noise interference; because in-band noise, such as the clacking of typing on a computer keyboard, occurs frequently in daily life, the accuracy of location estimation can at times deteriorate greatly; and (2) if the transmission times of ultrasonic signals from nearby transmitters overlap, the signals collide, making it difficult for the receiver to distinguish among them and derive the location information.
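The ranging-plus-trilateration pipeline shared by BAT and Cricket can be sketched in 2D as follows. The speed-of-sound constant and the beacon geometry are illustrative assumptions (real deployments work in 3D with more receivers and error handling); the RF travel time is treated as zero since it is negligible next to the ultrasound's.

```python
# Sketch of Cricket/BAT-style ranging and 2D trilateration.

V_SOUND = 343.0  # m/s at room temperature (assumed); RF delay treated as zero

def distance_from_interval(dt_seconds):
    """Range = speed of sound x (ultrasound arrival - RF arrival)."""
    return V_SOUND * dt_seconds

def trilaterate(anchors, dists):
    """Solve a 2D position from three (x, y) anchors and three ranges by
    subtracting the first circle equation from the other two, which
    leaves a 2x2 linear system in (x, y)."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = dists
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    x = (b1 * a22 - b2 * a12) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

In BAT the system side runs this computation over many fixed receivers; in Cricket the mobile host runs it privately over beacon positions it learns from the RF messages.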
To solve these two problems, a broadband ultrasonic location system [13] has been developed at the University of Cambridge. The system prototype uses piezopolymer films as transducers, which are inexpensive and more robust than piezoelectric ceramics. However, piezopolymer films have low sensitivity, so as transmitters they need a larger driving voltage and as receivers they need more sensitive amplifiers. The prototype has a wide above-noise frequency bandwidth of 75 kHz. It uses a direct-sequence spread-spectrum (DSSS) signal structure to achieve simultaneous multiple-access capability and better performance in the presence of noise. The prototype uses the same architecture as BAT: a central RF basestation sends out RF signals to poll each individual mobile host, and each mobile host is equipped with an RF receiver and a broadband ultrasonic transmitter. On receiving the RF poll from the central basestation, a mobile host immediately sends out an ultrasonic signal. The signal is then received by nearby fixed ultrasonic receivers and used to derive the location of the mobile host. Because multiple simultaneous ultrasonic transmissions are allowed, the update rate of the system is greater, and the system also has higher accuracy than BAT.
13.5.1.3 Infrared-Based Location Estimation Systems
One location estimation system using IR techniques is the active badge system [14] developed at Olivetti Research Ltd, England. The active badge system deploys a sensor network inside a building, with one IR sensor per room. Each employee wears a 55 × 55 × 7-mm badge that sends out a short IR beacon with a unique ID every 15 seconds. On receiving a beacon, the sensor forwards the related information to a centralized computer, where the employee is identified and associated with the room in which the sensor resides. Unlike RF signals, which can penetrate office partitions, IR signals will not travel through walls.
Hence, this system can provide good location estimation at the accuracy of a room size. The services supported by the system are:
Find(name), which provides the current location, or a list of the most visited locations, of a badge.
With(name), which provides information about other badges in the same area as the requested badge.
Look(location), which provides information about all the badges close to the requested location.
Notify(name), which sends an alarm to the current location of the requested badge.
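These four queries can be sketched over a simple table of badge sightings. The room names, badge IDs, and function shapes here are hypothetical, not the active badge system's actual interface.

```python
# Hypothetical sighting table: badge -> room last reported by that room's
# IR sensor, as accumulated by the centralized computer.
sightings = {"alice": "room101", "bob": "room101", "carol": "room205"}

def find(name):
    """Find(name): current location of a badge."""
    return sightings.get(name)

def with_(name):
    """With(name): other badges seen in the same area as this badge."""
    room = sightings.get(name)
    return [n for n, r in sightings.items() if r == room and n != name]

def look(location):
    """Look(location): all badges close to the given location."""
    return [n for n, r in sightings.items() if r == location]

def notify(name, message):
    """Notify(name): deliver an alarm to the badge's current location."""
    return (sightings.get(name), message)
```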
13.5.1.4 Location Proximity in CDNs
An IP address is used not only to identify a computer interface but also to route packets from sources to destinations. An IP address contains different fields expressing the network address and the local interface address. The network address is globally unique and is assigned in a hierarchical manner by a set of authorities. For a fixed wired subnet, the mapping between the IP subnet address and location could
be stable for quite a long time. Hence, it is possible to derive location proximity from an IP address. However, the resolution of the proximity location varies widely with the size of the Internet service provider (ISP). Usually each ISP has been assigned a chunk of IP addresses; depending on the size of the ISP, the chunk can be large or small (class A, B, or C in IPv4). The ISP has the authority to further assign IP addresses within its own chunk to its own subnets, and such internal assignments are seldom exposed to outsiders. So if an ISP has a very large coverage area, the proximity information might be very coarse-grained. At the same time, CDN providers, such as Akamai [15] and Speedera [16], operate highly distributed networks comprising a few data centers and thousands of edge servers residing inside ISPs' networks or at the ISPs' POPs (points of presence) through bilateral contracts. These overlay networks run on top of the current Internet, and their nodes interconnect with each other through dedicated links purchased from local ISPs. Content is distributed among the edge servers and retrieved locally from the closest edge server. The global coverage and flexibility of overlay networks make it possible to use more advanced traffic management, failure recovery, denial-of-service protection, and request routing techniques to further improve users' experience. Because edge servers in overlay networks have smaller coverage areas, are closer to end hosts than the origin servers of content providers, and have access to internal IP address information, this infrastructure can achieve better performance and support more edge processing capabilities such as geotargeting, location-aware computing, and dynamic content reassembly. Hence, further leveraging their overlay network infrastructure, CDN providers offer proximity location services as value-added services.
EdgeScape, provided by Akamai [15], is a service that provides geographic, network, and corporate identity information for IP addresses on the Internet. The information provided by EdgeScape includes geographic information (country, area, latitude and longitude, time zone, zip code, etc.), network information (connection type, network name, and actual connection), and corporate identity (company name and domain name). The information is kept in a database maintained by the Akamai network. Using customers' IP addresses as parameters and invoking the EdgeScape API, service providers send inquiries to and retrieve information from the database. By integrating the EdgeScape API into their Web servers or application servers, service providers can realize more complicated business logic, including localized pricing, targeted promotions and advertising, content customization, service regulation, and fraud detection and prevention. The geotargeting service suite provided by Speedera [16] comprises both GeoPoint services and GeoTraffic analysis services. GeoPoint services are similar to Akamai's EdgeScape in providing geographic information to a content provider's application server to facilitate personalized and localized services. Content providers use the GeoPoint APIs to contact GeoPoint servers and receive geographic information that is continuously updated. GeoTraffic analysis services provide a comprehensive look at Web traffic from a geographic perspective.
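An IP-to-geography lookup of this general kind can be sketched as a longest-prefix match over a prefix-keyed database. The prefixes, records, and function shape below are invented for illustration; they are not Akamai's or Speedera's actual APIs or data.

```python
import ipaddress

# Toy geo database: IP prefix -> geographic record. The prefixes are from
# documentation address ranges and the records are made up.
geo_db = {
    ipaddress.ip_network("192.0.2.0/24"):    {"country": "US", "tz": "EST"},
    ipaddress.ip_network("198.51.100.0/25"): {"country": "DE", "tz": "CET"},
}

def lookup(addr):
    """Return the record of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in geo_db if ip in net]
    if not matches:
        return None
    # Longest prefix wins, mirroring how overlapping address allocations
    # (ISP chunk vs. internal subnet) are resolved.
    best = max(matches, key=lambda net: net.prefixlen)
    return geo_db[best]
```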
Weekly and monthly reports detail different resolutions of geographic data along with summarized network and proxy information. The services complement other Web analytics services, adding a geographic dimension to visitor information and enabling the content provider to prioritize content personalization initiatives, improve marketing and sales planning and execution, plan Web server architecture, and indicate appropriate revisions.
13.5.2 Outdoor Location Estimation Systems
Because ultrasound and infrared have only limited coverage areas, RF is the main medium used in outdoor location estimation systems. In this section, we introduce two systems, based on satellites and cellular networks respectively.
13.5.2.1 Location Estimation with GPS-Based Systems
The global navigation satellite system (GNSS) [17] is a generic term for satellite-based radio navigation systems designed to support worldwide high-accuracy position, velocity, and time estimation. The global positioning system (GPS) developed by the U.S. Department of Defense (DoD) and the GLONASS system later developed by the Soviet Union are the two currently operational GNSS systems. In this section, we cover the basics of the GPS system because of its popularity. GPS consists of three major segments: SPACE, CONTROL, and USER. The SPACE segment consists of 24 operational satellites that orbit the earth every 12 hours. There are often more than 24 operational satellites, as new ones are launched to replace older satellites. The satellites are divided into six equally spaced (60° apart) orbital planes, with four satellites in each plane. The orbits have an altitude of 20,200 km and an inclination angle of 55° with respect to the equatorial plane.
Such a constellation allows the satellites to repeat the same track and configuration over any point approximately every 24 hours (4 minutes earlier each day) and provides the user with between five and eight satellites visible from any point on the earth. The CONTROL segment consists of five monitor stations, three ground antennas, and a master control station (MCS). The monitor stations passively track all satellites in view, accumulating ranging data. This information is processed at the MCS to determine satellite orbits and to update each satellite's navigation message; the updated information is transmitted to each satellite via the ground antennas. The USER segment consists of antennas and receiver processors that provide positioning, velocity, and precise timing to users. GPS receivers have now been miniaturized to just a few integrated circuits and are embedded in various consumer electronic devices, such as cars, boats, laptop PCs, and PDAs. Initially operational in 1993 and fully operational in 1995, GPS originally provided two levels of service: the precise positioning service (PPS) and the standard positioning service (SPS). The PPS is a highly accurate military positioning, velocity, and timing service available to authorized users equipped with specialized
receivers. The PPS has an accuracy of 22 m horizontally, 27.7 m vertically, and 200 ns temporally [17]. The SPS, on the other hand, was targeted for civil usage, with a variable error deliberately introduced by the U.S. military in the satellite transmitters to degrade the accuracy to 100 m horizontally, 156 m vertically, and 340 ns temporally. Recognizing that GPS was becoming indispensable to the global information infrastructure, the U.S. government deactivated selective availability (SA), which was used to degrade SPS accuracy, on May 1, 2000, so that PPS-level accuracy is now available for civil usage as well. GPS location services are based on the time-of-arrival (TOA) technique. The distance between a user and a satellite is measured in terms of the transit time of the GPS signal from the satellite to the user. The satellites, which broadcast their positions, are the reference points shown in Figure 13.3. To uniquely determine a point in 3D space, three satellites are theoretically needed to provide three distinct distance measurements. However, to mitigate the clock bias between the GPS satellite clocks and the receiver clock, an extra fourth satellite is required. (Figure 13.3 shows the 2D case, which requires only three satellites.) The accuracy of basic GPS is influenced by several sources of random and systematic errors, including uncompensated errors in the satellite clocks, the accuracy of the predicted satellite positions, unmodeled propagation delays in the ionosphere and the troposphere, multipath fading, and receiver noise [17]. To mitigate such errors, differential GPS (DGPS) systems have been proposed. The design of DGPS is based on the fact that the errors associated with GPS measurements are similar for users located close to each other (within a few hundred kilometers) and change slowly in time (on the order of several seconds).
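The TOA positioning with an extra unknown for receiver clock bias can be sketched in 2D by Gauss-Newton iteration: three transmitters give three pseudoranges rho_i = ||p - s_i|| + b, enough to solve for (x, y, b), mirroring how the 3D case needs a fourth satellite. The transmitter positions and values below are invented (a real GPS solve is 3D with many modeled corrections).

```python
import math
import numpy as np

# Invented 2D "satellite" positions (units arbitrary).
sats = np.array([[0.0, 100.0], [80.0, 60.0], [-80.0, 60.0]])

def pseudoranges(p, bias):
    """True range to each transmitter plus a common clock-bias term."""
    return [math.dist(p, s) + bias for s in sats]

def solve(rho, iters=50):
    est = np.zeros(3)  # (x, y, clock bias expressed as a distance)
    for _ in range(iters):
        d = np.linalg.norm(sats - est[:2], axis=1)
        residual = np.asarray(rho) - (d + est[2])
        # Jacobian of each pseudorange w.r.t. (x, y, b): the unit vector
        # from the transmitter toward the estimate, plus 1 for the bias.
        J = np.column_stack([(est[0] - sats[:, 0]) / d,
                             (est[1] - sats[:, 1]) / d,
                             np.ones(len(sats))])
        est = est + np.linalg.solve(J, residual)
    return est
```

Dropping the bias column reduces this to plain trilateration, which is why one fewer reference point would suffice if the clocks were perfectly synchronized.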
DGPS therefore uses a reference receiver with a known location to estimate the measurement errors and broadcasts these errors to other nearby GPS receivers over a radio link. Neighboring GPS receivers then use this information to mitigate the errors in their own measurements, assuming that they experience the same measurement errors as the reference receiver. DGPS can provide meter-level and even submeter-level position estimates, depending on how close the user is to the reference point and on the latency of the corrections transmitted over the radio link.
13.5.2.2 Location Estimation with Cellular-Based Systems
In 1996 the FCC ruled that cellular operators must provide E-911 emergency services comparable to those currently available to most wireline network users. The deployment of the services is divided into two phases. In phase 1, both the cell location of the caller and an appropriate callback number must be reported to the public safety answering point (PSAP). In phase 2, the estimated location of the mobile station must be within 125 meters of its actual location for at least 67 percent of all wireless E-911 calls [18]. The requirements of phase 2 are hard to satisfy and need major modifications of the existing cellular network infrastructure; the location estimation algorithms are based on the triangulation algorithms discussed in the previous section. The requirements of phase 1, however, are based on proximity location information, so they are relatively easy to satisfy and only require simple add-ons to receive and detect E-911 calls. The system does not modify
13.5
LOCATION ESTIMATION SYSTEMS
461
the existing basestations and has no interaction with them except for access to the necessary antenna signals.
Phase 1 Estimation
Not limited just to E-911 services, the location proximity information of phase 1 can be applied to other applications such as location-sensitive billing, fraud detection, cellular system design, and resource management. Location-sensitive billing allows cellular carriers to offer different rates depending on whether a mobile station is used at home, in an office, or on the road. System designers can use the load information of each location to better position and tune cells, and thus improve spectrum utilization efficiency. In cellular networks, a service coverage area is divided into smaller hexagonal areas called cells. Each cell is served by a basestation with a systemwide unique station ID. Several basestations are controlled by a radio network controller (RNC), a number of which are in turn managed by a mobile switching center (MSC). A mobile station is active if it is powered on. Since the exact cell location of a mobile station is known to the network during a call, the main issue of location proximity is the location estimation of a mobile station between two consecutive calls. Two major operations are involved in location proximity: location update and paging [19]. A location update is performed by an active mobile station to refresh its location information in the cellular network. A paging operation is carried out by the cellular network to alert a mobile station to upcoming events such as an incoming call. There is a basic tradeoff between the updating cost and the paging cost. The updating cost is mainly bandwidth usage and mobile station power consumption. The paging cost is the paging traffic load and the paging delay. Among these parameters, paging delay is the main focus of location proximity algorithms because paging delay influences the availability of location information. The higher the frequency at which the mobile station updates its location, the smaller the paging area of the cellular network.
So, the optimization problem is to find the optimal location update and paging algorithms that minimize the total cost. Many schemes are surveyed by Zhang [19]. One scheme is to define a location area covering a few basestations; a mobile station updates its location whenever it enters a new location area. In this scheme, before paging succeeds, the location proximity is the current location area. Another scheme is the movement-based location update scheme, in which each mobile station keeps a counter initialized to zero. The counter increases by one whenever the mobile station crosses a cell border; when the counter reaches a predefined number M, the mobile station updates its location. In this case, before paging succeeds, the location proximity is a circle centered at the last updated cell with a radius of M cells. Considering the zigzag behavior between two adjacent cells, the actual position may be quite close to the previously updated cell. A variation of this scheme is the distance-based location update strategy, in which a mobile station updates its location only after it is more than a distance of M cells from the previously updated cell. In this
case, the location proximity is the perimeter of the circle centered at the last updated cell with a radius of M cells.
Phase 2 Estimation
Signal strength, angle of arrival (AOA), time of arrival (TOA), and time difference of arrival (TDOA) are the most important measurement algorithms used for location estimation in current cellular networks. As discussed earlier, signal-strength- and AOA-based systems are subject to severe interference from multipath and shadowing effects and are suitable only for applications requiring low location estimation accuracy. TOA and TDOA appear more appropriate for the high-accuracy location estimation required by phase 2 E-911 services and other applications. Since code-division multiple access (CDMA) and time-division multiple access (TDMA) are two dramatically different air interfaces used in current mobile networks, the time estimation techniques deployed in these two types of systems are introduced separately below. One common assumption for both types of systems is that time synchronization must be guaranteed among the participating basestations or mobile hosts; this is achieved either through GPS or through extra network components.
CDMA-BASED SYSTEM
The TOA estimates can be derived from the pseudonoise (PN) code acquisition and tracking algorithms employed in spread-spectrum receivers. The estimation usually has two phases: the coarse acquisition phase, which determines the time delay estimate to within a chip duration, and the fine acquisition phase, which maintains fine alignment between the locally generated and incoming PN sequences by using a delay-locked loop (DLL) or tau-dither loop (TDL) [18]. The TDOA estimates can be derived by forming the cross-correlation between the signals received at a pair of basestations. Assume that the signal received by basestation A is s_A(t) and the signal received by basestation B is s_B(t). Then the cross-correlation function of s_A(t) and s_B(t) is

C_{A,B}(τ) = (1/T) ∫₀^T s_A(t) s_B(t + τ) dt

The TDOA estimate is the value of τ that maximizes C_{A,B}(τ). TDOA can also be derived directly from TOA if TOA measurements at each basestation are available. Besides multipath, shadowing, and path attenuation, CDMA systems are subject to two other main sources of error. The first is multiple-access interference, usually called the "near-far" effect in CDMA systems: strong signals sent from nearby mobile stations make it difficult to correctly receive the weak signals from remote mobile hosts. Power control schemes combat the near-far effect by attempting to ensure that each user's signal is received with equal power at the basestation. But in location estimation systems, because each mobile station has to communicate with at least three basestations at the same time, it can set its transmission power to satisfy only one basestation's power control requirements. As a result, it may cause severe multiple-access interference in the cells of the other two basestations.
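The cross-correlation TDOA estimate can be sketched numerically with a discrete signal: basestation B receives the same transmission as basestation A, delayed by some number of samples, and the lag that maximizes the correlation recovers that delay. The synthetic noise signal and the 25-sample delay are invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
s_a = rng.standard_normal(1000)   # signal as received at basestation A
true_delay = 25                   # basestation B hears it 25 samples later
s_b = np.concatenate([np.zeros(true_delay), s_a[:-true_delay]])

# Discrete analog of C_{A,B}(tau) = (1/T) * integral of s_A(t) s_B(t + tau) dt,
# evaluated for non-negative lags; the maximizing lag is the TDOA in samples
# (divide by the sampling rate to get seconds).
n = len(s_a)
corr = [np.dot(s_a[: n - tau], s_b[tau:]) / n for tau in range(100)]
tdoa_samples = int(np.argmax(corr))
```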
The other error source is dilution of precision (DOP), which refers to the effect by which the geometry of the basestations relative to the mobile station can further degrade the accuracy of location estimation. In some cases, the accuracy of the estimate can vary by an order of magnitude or more due to DOP. Because cellular networks are usually designed to optimize communication services, the basestation configuration may not be good at minimizing DOP. For example, basestations might be set up along a popular highway in order to have good coverage and handover performance for drivers on the highway, but such a linear configuration of basestations incurs large DOP errors. Figure 13.8 uses AOA as an example to show the effect of DOP: geometry (a) has better performance than geometry (b) given the same AOA measurement error. Note that DOP also occurs in the TDMA systems described below.
Figure 13.8 Effect of dilution of precision (DOP). Both panels show basestations BS A and BS B, the correct measurement, the AOA error, and the resulting location error between the actual and erroneous locations; geometry (a) yields a smaller location error than geometry (b) for the same AOA error.
TDMA-BASED SYSTEM
The global system for mobile communications (GSM) is a widely adopted TDMA-based 2G system used in cellular networks around the world. In GSM systems, time-measurement (TOA or TDOA) location estimation also has higher accuracy than signal-strength- and AOA-based solutions [20]. In addition, timing measurements are inherent in the GSM standard as a way of ensuring proper slot framing. TOA is measured by arbitrarily imposing a mobile station handover to two or more basestations. After a handover happens, two mechanisms can be used to measure the TOA between the mobile station and the basestation. In the first mechanism, according to the GSM specification, a basestation informs the mobile station how to advance its frame timing to ensure proper framing synchronization. After two forced handovers, three timing advances are known and sufficient information has been collected to estimate the location. In the second mechanism, a mobile station sends a burst of known data to the current basestation so that the basestation can record the TOA. After two other basestations record the TOA in the same way, the location can be estimated using the three TOA measurements. However, under the current GSM specification, the unit of TOA is one bit period, which equates to a location accuracy of 554 m. After taking other error sources into account, the accuracy of systems based purely on TOA can be worse than 554 m. TDOA is measured using observed time difference (OTD) methods. Each mobile station monitors the OTD between at least three basestations. This information is known in both the idle and communicating modes. If the real-time difference values between basestations are known, the TDOA among these basestations can be derived. Once again, the unit of the estimate is still one bit period and the accuracy of location estimation is at best 554 m, which is worse than the phase 2 E-911 requirements.
To achieve TDOA measurements much more accurate than the current 1-bit resolution, current mobile stations have to be modified to support more accurate pseudosynchronization so as to locate the training sequence and combat multipath effects.
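The 554 m figure can be checked with a quick back-of-envelope calculation, using the standard GSM bit period of 48/13 microseconds (about 3.69 us); since the timing advance measures a round trip, one bit of timing resolution corresponds to c * Tb / 2 of one-way distance.

```python
# Back-of-envelope check of the "554 m" GSM timing resolution.

C = 299_792_458        # speed of light, m/s
TB = 48e-6 / 13        # GSM bit period, seconds (~3.6923e-6)

one_way_resolution = C * TB / 2
# ~553.5 m with the exact c; rounding c to 3e8 m/s gives the 554 m in the text.
```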
13.5.3
Location Format Transformation
For a given location estimation system, the number of output formats of the location estimate is limited by implementation complexity. However, both the number of location-based applications and their desired presentation formats are unlimited. In Section 13.2, we briefly introduced a simple taxonomy of location information. In practice, new types of location-based services and presentation formats will continue to evolve as ubiquitous mobile computing becomes more and more popular in our daily lives. Hence, the location format transformation (LFT) component, shown in Figure 13.1, plays an important role in transforming the information formats provided by a specific location estimation system into the different presentation formats understood by applications. During the transformation, LFT might cooperate with other information resources, such as various map databases and geoinformation processing vendors, to complete the transformation. Given the fact that the vendors of
location estimation systems, location format transformation, applications, and other resources can be completely different, the seamless integration and cooperation of these systems is not a trivial issue. As a result, the OpenGIS (geographic information system) Consortium (OGC) was formed. OpenGIS is defined as transparent access to heterogeneous geodata and geoprocessing resources in a networked environment. The goal of the OpenGIS project is to provide a comprehensive suite of open interface specifications that enable developers to write interoperating components that provide these capabilities [21]. OGC, consisting of GIS software vendors, database vendors, integrators, and application providers, manages consensus processes that result in standardized formats of geographic data expression and communication protocols to expedite data exchange among diverse geoprocessing systems. Some of the possible benefits of OGC are as follows [21]:
. Geolocation information should be easy to find, without regard to its physical location.
. Once found, geolocation information should be easy to access or acquire.
. Geolocation information from different sources should be easy to integrate, combine, or use in spatial analyses, even when the sources contain dissimilar types of data (raster, vector, coverage, etc.) or data with disparate feature-name schemas.
. Geolocation information from different sources should be easy to register, superimpose, and render for display.
. Special displays and visualizations, for specific audiences and purposes, should be easy to generate, even when many sources and types of data are involved.
. It should be easy, without expensive integration efforts, to incorporate geoprocessing resources from many software and content providers into enterprise information systems.
13.6
LOCATION SERVICES BASED ON CELLULAR SYSTEMS
In this section, we focus on how to provide location services to end users via a cellular network. First, we introduce a system architecture for providing location services based on a cellular system. Then, the mobile location protocol (MLP) is described. Finally, we introduce an example of a location service platform.

13.6.1 Location Service System Architecture

An architecture for providing location service (LCS) based on a cellular system is shown in Figure 13.9. There are four entities in the architecture: the LCS client, the gateway mobile location center (GMLC), the cellular network, and the handset. An LCS
466
ALGORITHMS AND INFRASTRUCTURES FOR LOCATION-BASED SERVICES
Figure 13.9
Location service system architecture.
client, such as an application service provider (ASP), is an end user of location information. The GMLC coordinates between LCS clients and cellular networks. There is a standardized communication interface (protocol) between the LCS client and the GMLC. One example is the mobile location protocol (MLP), which is standardized in 3GPP (Third-Generation Partnership Project) for the IMT-2000 system [22]. MLP was developed in the OMA (Open Mobile Alliance) for transmitting the location information of cellular users, measured in the core network of the cellular system, to external servers [23]. It provides a simple way for corporations and/or ASPs to utilize location services: a corporation or an ASP may use the system to obtain the location information of cellular users and then provide specific services to them. Within the cellular network there are also mechanisms and protocols, for example those specified by 3GPP [22], to support location services. For instance, a privacy function lets a cellular user determine whether (s)he wants to disclose his/her current location information. In general, there are two types of location services [24]:

1. The Third-Party Search Service. This service is provided to LCS clients who want to know the location of a mobile user. For example, a company may want to know the current locations of its salespeople and/or delivery cars. First, the LCS
client sends a request to the GMLC for the location information of a mobile user. The GMLC authenticates the LCS client after receiving the request. It then checks the privacy settings of the requested mobile user. A mobile user presets the privacy policy, for example, whether to disclose his/her location and, if so, to whom. If all the conditions are met, the user's location is measured by the cellular network and then delivered to the LCS client.

2. The User Location Notification Service. A mobile user can ask his/her mobile terminal to measure the current location and send the result to an LCS client. The LCS client, in turn, may provide the user with information related to that location. For example, a content provider can offer a mobile user information about all shops, or user-specific shops, nearby. The mobile terminal sends a request to the base station asking that its location be measured. On receiving the request, the radio network controller coordinates the base stations neighboring the terminal to measure the location using OTDOA. If the terminal is equipped with a GPS receiver, it sends the location data measured locally rather than a measurement request. The location information is then transmitted to the GMLC for authentication. If there is no problem, the GMLC forwards the location data to the corresponding LCS client.

13.6.2 Mobile Location Protocol

As described above, MLP is a protocol supporting the communication interface between the LCS client and the GMLC, which is specified as the "Le interface" in 3GPP. The MLP hierarchy consists of a transport layer, an element layer, and a service layer; Figure 13.10 depicts this three-layer hierarchy. The transport layer relies on underlying protocols such as HTTP, WSP, and SOAP to carry location information. XML (eXtensible Markup Language) is used to describe the basic functions in both the element and service layers.
The element layer defines the common functions of the different location information services in the service layer. This makes it possible to reuse the existing element layer when new functions are added to the service layer. In the service layer, multiple MLP services can be defined, as shown in Figure 13.10: the basic MLP supports the basic location services defined in 3GPP, while the advanced MLP provides more flexible and convenient location services. Since they are defined separately, adding new functions to one part need not affect the other part(s). Also, the basic common elements of each MLP service, defined as a sublayer of the service layer, can easily be reused. Table 13.1 lists the location services supported and specified in MLP 3.0. SLIS and ELIS belong to the third-party search type, while SLRS and ELRS belong to the user location notification type. TLRS is a new type of service that can be triggered by time, by period, or by the mobile terminal's operation. An implementation of MLP over HTTP can be described as follows. In the third-party search case, an LCS client sends an HTTP POST for service initiation to the GMLC. The GMLC responds with an HTTP response that includes the location information of the mobile user. In the user location
Figure 13.10
Three-layer hierarchy of MLP.
notification case, on the other hand, the GMLC sends an HTTP POST, including the location information of the mobile user requesting a location service, to the LCS client. In turn, the LCS client responds with an HTTP response for service initiation.
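As a rough sketch of the MLP-over-HTTP exchange just described, the fragment below builds a simplified third-party search request and parses a canned answer. This is illustrative only: the element names, the endpoint URL, and the msid value are placeholders rather than the normative OMA MLP 3.0 schema, and a real LCS client would POST the request bytes to the GMLC's Le interface and handle authentication and error results.

```python
# Sketch of a third-party search over MLP/HTTP. Simplified, illustrative
# element names -- NOT the normative OMA MLP 3.0 document types.
import xml.etree.ElementTree as ET

def build_slir(client_id: str, password: str, msid: str) -> bytes:
    """Build a simplified standard-location-immediate request document."""
    svc_init = ET.Element("svc_init", ver="3.0.0")
    hdr = ET.SubElement(svc_init, "hdr")
    client = ET.SubElement(hdr, "client")
    ET.SubElement(client, "id").text = client_id
    ET.SubElement(client, "pwd").text = password
    slir = ET.SubElement(svc_init, "slir")
    msids = ET.SubElement(slir, "msids")
    ET.SubElement(msids, "msid").text = msid
    # An LCS client would HTTP POST these bytes to the GMLC's MLP endpoint,
    # e.g. urllib.request.urlopen("https://gmlc.example.net/mlp", data=...).
    return ET.tostring(svc_init, encoding="utf-8")

def parse_slia(body: bytes):
    """Extract (msid, x, y) from a simplified GMLC answer document."""
    root = ET.fromstring(body)
    pos = root.find(".//pos")
    return (pos.findtext("msid"),
            float(pos.findtext(".//X")),
            float(pos.findtext(".//Y")))

# Canned GMLC answer standing in for the HTTP response body.
SAMPLE_SLIA = b"""<svc_result ver="3.0.0">
  <slia>
    <pos><msid>461011334411</msid>
      <pd><shape><point><coord><X>35.685</X><Y>139.751</Y></coord></point></shape></pd>
    </pos>
  </slia>
</svc_result>"""
```

In the user location notification case the direction is reversed: the GMLC would POST an answer-style document like SAMPLE_SLIA to the LCS client, which replies with the service-initiation response.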
13.6.3
Location Service Platform
In order to provide flexible and convenient location services for both mobile users and ASPs (LCS clients), it is necessary to develop a location service platform. As indicated in Section 13.6.1, location services can be provided via the architecture shown in Figure 13.9. Although the GMLC and MLP have been standardized by 3GPP and OMA, respectively, there are heterogeneous wired and wireless networks
TABLE 13.1 MLP-Specified Services

SLIS (standard location immediate service): Provides the location information of a mobile user to an LCS client based on the LCS client's request.
ELIS (emergency location immediate service): Provides the location information of a mobile user to an LCS client based on the LCS client's request in emergency cases.
SLRS (standard location reporting service): Provides the location information of a mobile user to an LCS client based on the user's request.
ELRS (emergency location reporting service): Provides the location information of a mobile user to an LCS client when an emergency call is initiated.
TLRS (triggered location reporting service): Provides the location information of a mobile user to an LCS client based on preset events.
in the real world. In 1999, NTT DoCoMo proposed the DoCoMo location platform (DLP) to provide a common platform enabling location services via various interfaces [25]; it is a solution for providing seamless location services in this heterogeneous environment. After two years of development and deployment, including various experiments, the DLP service started in 2001. In this section we describe the DLP as an example of how a location service is provided by an operator. The DLP provides two general functions: location information provision and ASP support. The location information provision function includes the user location search, user location notification, and third-party location search functions. The ASP support function includes the group information management, zone monitoring, and push-type information delivery management functions. Figure 13.11 shows the network configuration of a DLP network. Besides the functions described above, the DLP offers a number of communication interfaces to corporations and ASPs. In the case of the Internet, security protocols such as SSL (secure sockets layer) and TLS (transport layer security) are used to encrypt location information. There are five groups of servers in a DLP center: request reception servers, location measurement servers, user management servers, ASP support servers, and status monitoring servers. Request reception servers receive requests from mobile users and LCS clients through a unified interface, LISAP (location information service access protocol). Location measurement servers are responsible for measuring the location of a terminal subscribing to the DLP service. User management servers authenticate mobile users as well as LCS clients when receiving a request, and manage the location information of mobile users.
ASP support servers support ASPs via a CGI (common gateway interface) that ASPs can use to develop location-based applications easily. Finally, status monitoring servers collect logs and monitor/manage the status of all the servers. The service sequences based on these servers are depicted in Figure 13.12. With these sequences, the DLP supports a number of location-based applications.
Figure 13.11
Network configuration of DLP.
Figure 13.12 Service sequences: (a) location measurement sequence; (b) user location notification sequence; (c) user location registration sequence; (d) location reference sequence; (e) third-party search sequence.
REFERENCES

1. B. Schilit, N. Adams, and R. Want, Context-aware computing applications, Proc. Workshop on Mobile Computing Systems and Applications, Dec. 1994, pp. 85–90.
2. Allied Business Intelligence, Inc., Location Based Services: A Strategic Analysis of Wireless Technologies, Markets, and Trends, ABI report, 2000.
3. J. Hightower and G. Borriello, Location systems for ubiquitous computing, IEEE Comput. 34(8):57–66 (Aug. 2001).
4. J. Anderson, T. Rappaport, and S. Yoshida, Propagation measurements and models for wireless communications channels, IEEE Commun. Mag. 33(1):42–49 (Jan. 1995).
5. K. Pahlavan, X. Li, and J. Makela, Indoor geolocation science and technology, IEEE Commun. Mag. 40(2):112–118 (Feb. 2002).
6. R. J. Orr and G. D. Abowd, The smart floor: A mechanism for natural user identification and tracking, Proc. 2000 Conf. Human Factors in Computing Systems (CHI 2000), April 2000, pp. 275–276.
7. Microsoft Research, http://www.research.microsoft.com/easyliving/.
8. P. Bahl and V. N. Padmanabhan, RADAR: An in-building RF-based user location and tracking system, Proc. IEEE INFOCOM 2000, March 2000, pp. 775–784.
9. P. Castro, P. Chiu, T. Kremenek, and R. Muntz, A probabilistic room location service for wireless networked environments, Proc. Ubiquitous Computing, Sept. 2001, pp. 18–34.
10. P. Prasithsangaree, P. Krishnamurthy, and P. K. Chrysanthis, On indoor position location with wireless LANs, Proc. IEEE PIMRC 2002, Lisbon, Sept. 2002.
11. N. B. Priyantha, A. Chakraborty, and H. Balakrishnan, The cricket location-support system, Proc. MOBICOM 2000, Aug. 2000, pp. 32–43.
12. A. Harter, A. Hopper, P. Steggles, A. Ward, and P. Webster, The anatomy of a context-aware application, Proc. MOBICOM 1999, Aug. 1999, pp. 59–68.
13. M. Hazas and A. Ward, A novel broadband ultrasonic location system, Proc. UbiComp 2002, Sept. 2002, pp. 264–280.
14. R. Want, A. Hopper, V. Falcao, and J. Gibbons, The active badge location system, ACM Trans. Inform. Syst. 10(1):91–102 (Jan. 1992).
15. Akamai White Paper, Turbo-charging Dynamic Web Sites with Akamai EdgeSuite, 2001.
16. Speedera White Paper, Speedera Edge Delivery Network, Sept. 2001.
17. P. Enge and P. Misra, Special issue on global positioning system, Proc. IEEE 87(1) (Jan. 1999).
18. J. J. Caffery and G. L. Stuber, Overview of radiolocation in CDMA cellular systems, IEEE Commun. Mag. 36(4):38–45 (April 1998).
19. J. Zhang, Location management in cellular system, in Handbook of Wireless Networks and Mobile Computing, Wiley, New York, 2002, Chapter 2.
20. C. Drane, M. Macnaughtan, and C. Scott, Positioning GSM telephones, IEEE Commun. Mag. 36(4):46–54 (April 1998).
21. Open GIS Consortium webpage, http://www.opengis.org, 2003.
22. Function Stage 2 Description of LCS, 3GPP TS 23.271, version 6.4.0, June 2003.
23. Mobile Location Protocol, LIF TS 101, version 3.0.0, June 2002.
24. N. Miura and M. Takahata, Trends of location information distribution methods, NTT DoCoMo Tech. J. 9(4):34–43 (April 2001).
25. M. Fujita and M. Chikamori, DLP (DoCoMo location platform) service, NTT DoCoMo Tech. J. 10(1):41–47 (Jan. 2002).
CHAPTER 14
FIXED AND MOBILE WEB SERVICES

MICHAEL MAHAN1
Nokia Research Center, Burlington, Massachusetts
14.1
WEB SERVICES INTRODUCTION
Comprehensive coverage of the topic of Web services is impossible within the constraints of one book chapter. Web services are a highly volatile technical domain, and any complete treatment would both read like a Tolstoy novel and soon be outdated. Web services are an amalgamation of many detailed protocols—some leveraged by Web services and some created specifically for them. Advocates stress that the whole value of Web services exceeds the sum of these protocols. Web services describe the interactions between network-accessible runtime components, assembled to create distributed applications. They strike a sophisticated balance between the desirable, yet conflicting, goals of system extensibility and application interoperability. Web services diverge from the traditional Web in that the interactions carry application-customized, function-oriented data rather than presentation-oriented data universally understood by any browser. From a software engineering perspective, Web services open the way toward more efficient and effective software application development through component reuse and the sharing of runtime resources.

14.1.1 Web Services Defined

There is a broad movement in the information technology (IT) industry to expand the existing World Wide Web (Web) from a server-to-user interaction model to an application-to-application interaction model. The overall goal is to extend the Web from an information-oriented system to a service-oriented,1
With SOAP Message Security contribution (Section 14.2.4.7) from Frederick Hirsch, Nokia Mobile Phones, Burlington, MA.
Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.
transactional system. To accomplish this goal, new and openly developed Web services specifications will exploit and extend the existing foundation Web specifications. Competing IT vendors, open-source projects, and freeware providers will create and distribute Web services tools and platforms that comply with these new specifications. Developers will create distributed applications using these available Web services tools and platforms and deliver them to their users. What is unique about Web services is the creation of Web-based application middleware that is openly developed and specified through standards bodies. Taken together, the existing Web specifications plus the new and emerging Web services specifications form the requirements set for this new breed of application middleware. Corporate entities, open-source or freeware organizations, and individuals can take these requirements and implement partial to full solutions, ranging from simple message parsers to full development platforms to manageable service deployment platforms. A Web services developer chooses which middleware products or tools to use depending on his/her application constraints, such as cost, time to market, and available product support. Early adopters of Web services came from different communities with often contradictory visions and goals. These early adopters include the distributed computing industry, the EDI/B2B industry, the Web community itself, and proponents of the emerging Semantic Web. Because of the divergent views of these special-interest groups, the development of Web services standards and technologies has been slow, chaotic, and undisciplined. Adding to the difficulty, the foundation technology—the Web—is itself decentralized, and its organizational guardian, the World Wide Web Consortium (W3C), has only recently (as of 2003) started to produce a normative, definitive architectural description [1].
Thus, it is understandable that evolving an ill-defined, albeit highly successful, architecture, influenced by competing visions, has proved difficult. As an example of the chaos, even the most basic of definitions in this space—that of Web services proper—does not enjoy consensus. The term "Web services" first gained traction in the marketing domain rather than the technical domain. Technicians have tried to catch up; however, a single rigorous definition has eluded wide acceptance with this audience. Perhaps it is because the term was co-opted by marketers that the technical waters have been muddied. Recently, some consensus on a definition has been building within the W3C's Web Services Architecture working group, which was converging on the following definition in 2003:

A Web service is a distributed software system designed for the exchange of messages encoded with functional rather than presentation data. Web services use URIs for identifiers, and have interfaces described using XML (typically WSDL). Agents interact with the Web service in a manner prescribed by its description, using XML-based messages typically conveyed using HTTP, SOAP and other Web-related standards.
This definition characterizes Web services more as an architectural pattern, or a collection of system capabilities, than as a particular specification, application, or solution. This pattern or collection is a work in progress and is being realized by a set of
emerging open standards specifications, software tools, and development and execution platforms. One powerful, yet subtle, pattern expressed in the definition above is the ability to dynamically enhance the infrastructure itself—to define, develop, and deploy new system features as needed. Web services describe more than just an application messaging interface to a specific service. They can also describe optional interoperable infrastructure features that applications can use as needed. Such horizontal system features include, for example, security, reliability, transactions, privacy, identity management, and service composition. Through the application of intermediaries, Web services define the ability to deploy both functional and optimizing features at runtime. This extensibility is a highly sought property in any distributed system or middleware. Thus, we arrive at the crucial benefit that Web services can deliver to the distributed computing industry, and perhaps its most distinctive characteristic: Web services strike a sophisticated balance between system extensibility and application interoperability. Traditionally, a system architect must consciously trade off between these two desired properties, just as a software designer chooses between code size and execution speed. In contrast, the latest Web services specifications, from standards organizations such as the W3C, OASIS, and WS-I, accomplish the prime objective of application-to-application interoperability while supporting the extensible deployment of system capabilities in a robust fashion. The next few years should show whether Web services can truly deliver both promises commercially.
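The XML-based, function-oriented messaging named in the W3C definition above can be made concrete with a minimal sketch. Only the envelope namespace below comes from the SOAP 1.1 specification; the GetLastTradePrice payload, its namespace, and the symbol element are hypothetical stand-ins for an application-defined service interface.

```python
# Minimal sketch of a SOAP 1.1 message carrying function-oriented data.
# The payload element and its namespace are hypothetical; only the
# envelope namespace is defined by the SOAP 1.1 specification.
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
APP_NS = "http://example.com/stockquote"  # hypothetical application namespace

def build_request(symbol: str) -> bytes:
    """Wrap an application-defined operation call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetLastTradePrice")
    ET.SubElement(call, f"{{{APP_NS}}}symbol").text = symbol
    return ET.tostring(envelope, encoding="utf-8")

def extract_symbol(message: bytes) -> str:
    """What a provider would do on receipt: pull the functional data out."""
    root = ET.fromstring(message)
    return root.findtext(f".//{{{APP_NS}}}symbol")
```

Note that nothing here is presentation-oriented: the message is meaningful only to agents that share the interface description (typically a WSDL document), which is exactly the contrast with browser-rendered HTML drawn in the text.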
14.1.2 Service-Oriented Architectures

Web services are an example of a service-oriented architecture (SOA). Service-oriented architectures are distributed systems that promote the concept of atomic, self-contained services accessible and available to a multitude of applications. Hence, implementations of a SOA reduce the coupling between service providers and consumers. Client development and deployment is less dependent on service development and deployment, enabling clients to mix, match, and group services to best meet application requirements. For instance, a client might use many different airline Web services to check availability and pricing information, another Web service for credit card validation, and yet another for creating paper (hardcopy) tickets and shipping them to the buyer—all possibly using preexisting services. Applications built on a SOA should leverage the decoupling of applications from services in terms of requirements, semantics, and platform dependencies. In the example above, the travel agent client application needs to understand only the syntax and semantics of the service interface. All other details of the service are opaque to the client, including programming language, algorithms used, internal data representations and semantics, and platform execution environment. Likewise, the service needs to know little about the client application. In fact, at development time, the service does not need to know anything about any of the many potential clients that may use the service once it is deployed.
476
FIXED AND MOBILE WEB SERVICES
This decoupling enables emergent behavior: because application semantics are not built into the service, services are free to interact with any client that recognizes the service's interface. This in turn encourages client developers to create applications that are more nuanced, more complex, and more specialized to meet a customer's requirements. A Web services client developer uses the ubiquitous infrastructure to access and leverage these available, atomic services. The client developer binds the semantics of the application to these available Web services, and the user of the client gets the application semantics desired. The client developer focuses on understanding and codifying the application requirements and application semantics. The Web service provider focuses only on the service semantics and is completely ignorant of any client application's requirements and semantics. The Web services architecture is an instantiation of a SOA. The SOA model used by Web services is depicted in Figure 14.1. The Web services architecture defines roles for a service requestor, a service provider, and a service registry. Web services requestors and providers fulfill well-defined roles originally defined by traditional distributed systems. The difference with Web services is that requestors and providers interact using messages defined by open standards bodies, and providers describe their offered services in a standardized format. Layered on these two distinctions is one additional key differentiator: Web services requestors and providers are often developmentally decoupled—implemented and deployed by separate development teams. Web services can also be used to enable requestors to discover providers of suitable services.
Figure 14.1
Web services conceptual model.
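As a toy illustration of the requestor/provider/registry roles in Figure 14.1, the sketch below models publication and discovery within a single process. The interface names and the registry API are hypothetical; a deployed Web services registry (e.g., UDDI) would hold service descriptions and network endpoints rather than in-process callables.

```python
# Toy model of the service requestor / provider / registry pattern.
# Hypothetical API -- illustrates the decoupling, not a real registry.
from typing import Callable, Dict

class ServiceRegistry:
    def __init__(self) -> None:
        self._services: Dict[str, Callable] = {}

    def publish(self, interface: str, provider: Callable) -> None:
        """A provider advertises an implementation of a named interface."""
        self._services[interface] = provider

    def find(self, interface: str) -> Callable:
        """A requestor discovers a provider by interface name alone;
        it never learns the provider's implementation details."""
        return self._services[interface]

registry = ServiceRegistry()
# Provider side: deployed independently of any particular client.
registry.publish("CreditCheck", lambda card: len(card) == 16)
# Requestor side: binds to the interface at lookup time.
credit_check = registry.find("CreditCheck")
```

The point of the exercise is that the requestor's code mentions only the interface name; the provider could be replaced behind the registry without any change on the client side, which is the decoupling the text attributes to a SOA.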
Currently, Web services discovery is primarily a design-time endeavor. However, a Web service registry provides a necessary, yet not sufficient, condition for requestors to find and interact with providers without having a priori knowledge of the provider. Automated discovery is the goal of the emerging field of semantic Web services. Although automated discovery has been promoted as a prime benefit of Web services, without explicit semantic annotation it can occur only in isolated environments where all application semantics have been standardized or are well known by the participants and are encoded in the program logic of both the requestor and the provider. Much of software production still resembles individualized craft more than mass-produced product, similar to manufacturing industries before interchangeable components and assembly lines enabled mass production of complex goods. Service-oriented architectures and Web services are an attempt to bring interoperable service components to the software industry. SOAs can be viewed as the latest attempt by the computer engineering industry to improve software production methods. Earlier efforts included the development of procedural languages, object orientation, and component software, each of which increased software modularity and reuse. SOAs in general, and Web services in particular, extend previous efforts at modularity and reuse to include runtime elements and to distribute functionality across a network instead of encapsulating it within one runtime process or a set of linked libraries executing in a single memory space. Application functionality is partitioned and farmed out to perhaps competing service providers. These are distributed runtime components—extending reuse and modularity out to network-accessible services.
The goal, from an engineering perspective, is to produce and deploy distributed software that allows greater client access, results in higher-quality applications, and is delivered predictably on time and within budget. Will runtime interchangeable parts transform information engineering in the same way that industrial engineering was transformed by manufactured interchangeable parts? Given the increasing dependency of all segments of the economy on the efficiency and robustness of the information technology sector, the demand for such a transformation is high, and vendors who deliver solutions in this space will be rewarded. Web services have some unique service-oriented architecture properties. Although the Web is a distributed, highly decoupled information system, it is built around the assumption of human interaction. This makes it inappropriate to use Web servers as a set of interchangeable parts or services to programmatically form more complex or specialized applications. Other service-oriented approaches such as CORBA (Common Object Request Broker Architecture) are not built on the open Internet foundations of IP and the Web, are seen as vendor-encumbered, and are not as pervasive. Web services have the strength of leveraging the successful Internet infrastructure while providing a framework based on open standards.

14.1.3 Motivating Technologies: Creating a Mature Foundation

Web services draw on experience and mature technologies developed in the late 1990s, when technologists explored issues in a number of technology areas,
including distributed computing, business data interchange, and information distribution and retrieval. These efforts have contributed to the new software domain called Web services. From this perspective, the creation of Web services technology and specifications is a collective attempt to address many of the problems independently encountered by each motivating technology. Additionally, for some of the motivating technologies, the trend toward Web services concepts is a byproduct of attempts to reconcile a lack of widespread adoption. Of course, each of the motivating technologies brings tangible assets along with hard-to-solve issues.

. Distributed Computing. Distributed computing relies on middleware for object exchange or function invocation across a distributed application. Distributed applications typically are intranet-oriented; applications that span multiple trust boundaries have not gained significant traction. Leading technologies are DCOM, CORBA, and Java's RMI and Jini, along with distributed security initiatives such as Kerberos, DCE, and public key infrastructure. Distributed computing often suffers from interoperability issues across vendor tools and platforms, especially since it requires clients and servers to be tightly coupled in both a business and a development sense. Thus, this technology has not come close to meeting scalability expectations, especially when compared to the Web—a loose, ad hoc coupling of components traversed by following identifier links and observing a few simple rules. Distributed computing applications are often fragile in the event of partial failure and are inflexible to redistribution of objects or application functionality. They are also inflexible at runtime when new system components such as firewalls, proxies, and gateways are deployed. They tend to have a high administrative cost.
Finally, there is a significant technical barrier preventing distributed computing middleware from directly leveraging the desired global pervasiveness of the Web: the Web is a disconnected infrastructure, whereas vital middleware services such as security and transactions are typically connection-oriented.

. Electronic Data Interchange (EDI). EDI is a type of distributed system focusing on particular requirements from the business community—typically agreed-upon requirements for buying, selling, and trading between two or more trading partners. These are often referred to as business-to-business (B2B) requirements. B2B requirements are technology-oriented solutions to traditional business challenges. Much of the B2B focus has been on specifying automated mechanisms for uniform business needs—those that concern every business venture. These are horizontal issues such as service- and business-level agreements, multiparty transactions (including rollback schemes), and reliable messaging. EDI also addresses vertical market issues: defining market-specific vocabularies for industry-mandated semantics (insurance, commodity trading, etc.) and electronic exchanges or marketplaces where consumers and producers discover and interact with one another in an ad hoc fashion. In this vertical context, EDI methods were developed to meet specific governmental requirements for an industry such as aerospace or defense. EDI
systems are typically deployed on private networks and do not scale, especially since EDI suffers from being complex and inflexible.

. The Web. The Web is a globally accessible, distributed system designed for direct human consumption of remote information. The Web has enjoyed enormous success—it and email are the standard-bearer applications of the computer age. The Web uses UI application browsers to request and subsequently render information retrieved from a provider's Web server. Hence, a general-purpose client is used to deploy applications as diverse as stock trading, news, auctions, blogs (Web logs), peer-to-peer (P2P) applications (media sharing such as Napster clones and process sharing such as SETI@home), and simple to moderately complex retail services for secure commodity purchases (e.g., Amazon, Travelocity). The Web infrastructure does not require communicating parties to develop or preconfigure software and does not require them to have preestablished application-level knowledge of each other. This is because humans supply the semantics to Web applications at runtime, interpreting the information appropriately. This enables the Web to be a simple architecture with request–response messaging, a uniform set of actions (primarily GET and POST), and independent components loosely connected through hypermedia links. This is in contrast to distributed computing and EDI efforts, which required tight coupling, preestablished understanding, and extensive upfront costs and efforts. In addition, the Web scales well and promotes interoperability; however, it was designed for simple information retrieval, not for sophisticated business-to-consumer (B2C) or business-to-business (B2B) processing or complex, interactive applications. Business-oriented applications require solutions for message reliability, transactional support, security, service composition, and choreography. Complex applications may include multiple parties, asynchronous interactions, and service discovery.
The Semantic Web. The Semantic Web is an emerging set of specifications that, like Web services, build on the Web infrastructure to create sophisticated program-to-program software applications. However, Web services address only service syntax and rely on humans to supply the meaning of services and their interfaces to the consumer application. The Semantic Web aims to move beyond this human barrier to distributed system utility and sophistication. The Semantic Web defines an XML-based language for annotating resources and services, as well as well-defined meanings of terms, or ontologies. This allows a semantic processor to comprehend a resource or service and enables automated service discovery. A semantic processor embedded in a consumer application will be able to interrogate, at runtime, a repository of service descriptions and analyze whether any registered service meets its processing requirements to fulfill an active task. This semantic capability also enables the consumer application to understand the chosen service's interface sufficiently to automatically invoke the advertised service and comprehend the results. Hence, the Semantic Web describes
FIXED AND MOBILE WEB SERVICES
Web services in machine-understandable (not just machine-processible) form to enable runtime rather than design-time application-to-application discovery and non-a priori, or emergent, behavior. Distributed computing, EDI, and the Web have established roots in the computing industry and academia. In contrast, the Semantic Web has grown up in parallel with Web services and is still largely associated with research. These precursor technologies were necessary but not sufficient to create Web services. The catalyst was the advent and widespread adoption of the eXtensible Markup Language (XML)—a flexible, readable, and openly developed syntax for encoding data. XML is based on a mature technology for marking up content, SGML, contributing to its wide adoption. Given the advantages and availability of XML, parallel ideas germinated in all precursor technology camps to apply XML in each domain to solve problems or expand their solution space. What has become evident is that this new, large area of intersection between distributed computing, EDI, and the Web requires its own focus. This focused activity is Web services.
14.1.4 Quick Glance at Foundation and Core Technologies of Web Services

The genesis of Web services was the creation and widespread adoption of XML (eXtensible Markup Language) together with the already established Web. The business and technical potential produced by combining XML and the Web generated widespread attention and an incipient marketplace. Built on these foundation technologies are the core Web services specifications of SOAP and WSDL. SOAP, through version 1.1, is an acronym standing for simple object access protocol. Interestingly, the W3C kept the name SOAP for its standardized version 1.2, yet asserts that it is no longer an acronym. This is not cause for alarm, as the 1.2 work clarifies or expands on version 1.1 rather than narrowing its applicability. WSDL stands for the Web Services Description Language through all versions. Just as the Web is built on the services of the Internet [3], Web services depend on the distributed system backbone of the Web. The Web provides three essential elements that Web services exploit:

1. Globally addressable identifiers—URIs
2. Typed (integers, strings, binary) representations associated with identifiers
3. A transfer protocol to retrieve resource representations

The fundamental concept is that the Web is composed of resources, each of which has a globally addressable identifier called a uniform resource identifier (URI). In addition, each resource has a representation of state that is represented by structured data conforming to some well-known, yet unbounded, media type. Lastly, the Web provides a transfer protocol so that Web applications can exchange
resource representations. The primary Web protocol is HTTP (Hypertext Transfer Protocol) [4]. XML is an open standard data representation language that enjoys widespread adoption across IT vendors and computing platforms. XML is popular because of its openness, simplicity, expressiveness, extensibility, and readability. XML defines a simple syntax to structure and collect data into documents. Application designers use XML to define a markup vocabulary appropriate to their application and industry. (Standards bodies such as OASIS help define uniform industry vocabularies, allowing application-level interoperability.) Application grammar and the baseline XML syntax are used to structure the data through the technique of data markup. Hence, XML documents are application-customized to be both human-readable and machine-processible. Since any XML document must conform to the syntax rules defined by the XML specification [5], developers can choose from a host of available, general-purpose XML tools, including codecs, programming libraries, database libraries, and user interface tools. SOAP is the core Web services messaging technology.2 The SOAP specification addresses XML messaging between processing nodes supporting a distributed application. SOAP details how an XML message is structured, how the XML data are encoded (data typing), how a SOAP node must behave according to a specific processing model, and how the XML message is bound to an Internet protocol, namely, the Hypertext Transfer Protocol (HTTP). SOAP defines how the message infrastructure may be extended using message headers and how intermediaries must behave when SOAP messages are passed through them. The use of headers and intermediaries gives application architects great flexibility in how SOAP messaging is used to meet application requirements. The Web Services Description Language (WSDL) is the core Web services metadata technology.
The WSDL specification addresses the need for exposing Web service information that will enable a client to access and invoke that service. Thus, WSDL is akin to an Interface Definition Language (IDL) or a Java object's public interface. It is client-centric in that it specifies only what the client needs to know; all other service information consists of opaque implementation details. The minimal metadata that a WSDL document describes are the structure of the XML content to be used in a SOAP message and the transport protocol details required for the client to connect to and access the service. WSDL information may be obtained by a Web services consumer by a variety of techniques, depending on the requirements of the application, consumer, and service provider. It is expected that the early adoption of Web services will follow one of two models. First will be enterprise-oriented, intranet, back-office integration tasks. The second will be business-to-business integration between established, trusted partners. In both cases, client developers have a close association with service developers, enabling exchange of WSDL information as part of the development process (out of band). Use of a private registry will allow such closed communities to share WSDL information within the community, using a standard such as UDDI

2. As noted previously, as of SOAP version 1.2 this is no longer an acronym.
(Universal Description, Discovery and Integration). Future services may also support public registries for wider communities. Using WSDL and SOAP, a service provider is not limited to creating standard interfaces for new services only, but can also wrap existing legacy services with a SOAP interface and a metadata description. This enables legacy services to benefit from Web services interoperability and use the Web foundation technologies. The service provider can allow authorized consumers to use the service by applying server authorization mechanisms in conjunction with consumer authentication techniques. Emerging as another foundation standard in the Web services domain is the Web Services Interoperability Organization's (WS-I) Basic Profile 1.0 (BP 1.0). BP 1.0 addresses ambiguities in the SOAP 1.1 and WSDL 1.1 specifications, along with how SOAP 1.1 uses HTTP headers, status codes, and cookies. These usage clarifications are targeted to enable one of the key stated benefits of Web services—interoperability. Prior to BP 1.0, each Web services platform or tool vendor offered its own interpretation of the core specifications. This resulted in Web services applications that were isolated and could serve only a limited community. This situation was clearly antithetical to the goals of Web services, thus motivating the primary Web services vendors to create WS-I and to produce the BP 1.0 specification. To summarize, the core Web services specifications describe a messaging infrastructure, message processing rules, a service description syntax, a discovery system, and best-practices guidelines. These are based on Web standards for resource identification and transfer as well as the XML standards family (XML Schema, namespaces, XML Infoset, XML Canonicalization, etc.). Taken together, these specifications enable interoperability, yet have extension points allowing deployments to be customized appropriately. This allows infrastructure services (security, reliability, etc.)
to be customized in a standard, interoperable manner. It is this property of flexibility that makes Web services attractive to Internet and intranet service developers. Developers can rely on a predictable distributed infrastructure while architecting the correct mixture of functional or optimizing properties for each particular application's requirements.

14.1.5 Web Services Hype
XML and Web services have been oversold as a panacea, a silver bullet for various problems. XML has been marketed as a means to eliminate the difficult problem of defining the shared understanding and processing rules needed to integrate businesses. Web services have been sold as a means to trivially interconnect software without thought or effort. Using Web services as a way to create fantastic new software, seamlessly and automatically connecting any two business processes or applications anywhere on the network as if by magic, is unrealistic. Major research projects such as the Semantic Web project are addressing the issues of shared understanding and automatic integration of unintroduced applications and software components. This work will eventually be integrated with Web services efforts, but Web services technologies alone are not adequate to achieve such goals.
Similarly, XML enables software to operate only at the syntactic level. There is a widely promoted fallacy that data that can be parsed are sufficient for software to interoperate. At best, XML makes it possible for businesses or developer groups to share data, provided they agree on the semantics of those data in advance. Hence, XML provides data interoperability where shared semantics can be assumed. XML does nothing at all to create semantic interoperability. Although XML is labeled software's "lingua franca," it isn't even a "lingua." XML is an alphabet or token set that provides the primitives for describing larger concepts, and it works by allowing an unlimited number of semantic concepts to be encoded using those primitives. A true lingua consists of not only a set of tokens but also a specific grammar and the corresponding semantics. Projects like the Semantic Web and DAML-S are working toward defining a true "lingua franca" by defining ontologies, enabling meaning to be shared.
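This syntax-versus-semantics point can be sketched in a few lines of code. In the sketch below (element names and values are hypothetical, echoing the chapter's catalog examples), the parser guarantees only well-formedness; interpreting "partNumber" as a catalog key is a prior human agreement that XML itself does not provide:

```python
# Illustrative sketch: two parties that agreed on a vocabulary in advance
# can exchange XML, but the parser supplies only syntax -- the meaning of
# each tag is a convention established outside XML.
import xml.etree.ElementTree as ET

# Producer side: serialize an order using the agreed (hypothetical) vocabulary.
order = ET.Element("order")
ET.SubElement(order, "partNumber").text = "ABC-1234"
ET.SubElement(order, "quantity").text = "12"
document = ET.tostring(order, encoding="unicode")

# Consumer side: the parser checks well-formedness only; treating
# "partNumber" as a catalog key is the parties' prior agreement.
parsed = ET.fromstring(document)
part = parsed.findtext("partNumber")
qty = int(parsed.findtext("quantity"))
print(part, qty)  # ABC-1234 12
```

A consumer that had not agreed on this vocabulary could still parse the document, but could not act on it—exactly the gap the Semantic Web aims to close.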
14.2 WEB SERVICES FOUNDATION TECHNOLOGIES
14.2.1 XML/XML Schema

XML is a family of specifications developed by the W3C. The prime feature of XML is that it defines a format and syntax in which you can write specialized, application-specific grammars to express structured data. Web services use XML for both message structure and service definitions. XML is popular because it is simple, platform-independent, and readable. XML documents must conform to the XML syntax in order to be "well formed." To be valid, an XML document must also conform to a schema. XML documents are made up of element tags and attributes. Figure 14.2 lists an example XML document. An XML Schema is a document that describes the valid format of an XML dataset. This definition includes which elements are (and are not) allowed at any point, what the attributes of any element may be, and the permitted number of occurrences of elements. Hence, XML Schemas express shared vocabularies and allow machines to carry out rules made by people. DTDs, which came from XML's parent, SGML, were first used for this function. However, XML Schema has some distinct advantages; most notably, XML Schema can deal with namespaces and can constrain values to define meaningful application types (such as a part type) as well as complex data types. This allows automatic value checking when parsing XML. DTDs also use a non-XML grammar, making them difficult to understand and requiring specialized tools. An XML Schema associated with the example XML document in Figure 14.2 is presented in Figure 14.3.

14.2.2 The Web

The Web is both a foundation technology and a motivating technology for Web services. The Web is one of the most successful distributed systems ever built. The only other system that enjoys the same broad popularity is email. The Web
Figure 14.2 Example XML document.
can be characterized as a ubiquitous, shared information space coupled to a hypermedia processing model. The Web is ubiquitous—the primary Web protocols are simple and deployed on all major user-oriented computing platforms, even handheld telephony devices. The Web is a shared information space—accessible data are structured content representing whatever a service provider sees fit to publish. Hypermedia provide the ability to address remote content that is organized in some known structure. The hypermedia processing model defines data retrieval and presentation behavior, how to perform simple queries on content servers, and how to push new or modified content back to a content server. The definitive source for a definition of the Web is the W3C TAG (Technical Architecture Group). It categorizes the Web architecture into three primary areas:

. Identification—Web processes identify resources using uniform resource identifiers (URIs)
. Representation—Web processes represent the state of an identifiable resource
. Transfer—representational state transfer (REST)

The Web is currently a human-initiated client–server system based on browsers, document access, simple manual purchases, and file downloads. The Web owes
Figure 14.3 XML Schema corresponding to the example document.
its success to its simplicity, ubiquity, and extensibility. Content suppliers and consumers both exploit the ubiquity and pervasiveness of the Web. A service provider merely structures its content and sets up a Website—voilà—it has just joined a global data and service access system. Any user with a browser can then access the content and services through a universally resolvable address. The Web flourishes because of its robust and extensible architecture. New features and functions can be deployed on the Web's distributed infrastructure as long as they conform to the Web's principles. Content handlers and plugins enable new content types to be executed on the client application. Simple protocols keep the entry barrier low. For example, the data and service interface consists of only three methods, each with clear semantics: HTTP GET, POST, and PUT. On top of that, the content language, HTML, is a simple markup language. To obtain these highly desired characteristics of simplicity, ubiquity, and extensibility, the Web architecture is purposefully constrained. These architectural constraints are described and promoted by the network-based architectural style defined as representational state transfer (REST) [2]. In REST, as instantiated by the Web architecture, representations of resources are transferred from Web servers to Web clients. The REST constraints on a distributed hypermedia system are

1. Client–server—separates the user interface from the data storage system. Gains: portability, scalability, independent component evolvability.
2. Stateless communications—each interaction must contain all information necessary for the service to process the request. Gains: visibility (debuggability), reliability (easier recovery from partial failures), looser coupling of system components, and scalability (server components can be simpler and can quickly free resources). Tradeoffs: decreased network performance, loss of the server's complete control over system behavior.
Note that the deployed Web extensively uses cookies (caching server state on the client), which breaks this constraint.
3. Caching—some representations may be cached. Intermediaries may respond on behalf of a server with cached data. Gains: efficiency, scalability, and user-perceived performance. Tradeoff: potential loss of reliability.
4. Uniform interface—consistent interfaces for resource identification, resource manipulation through representations, self-describing messages, and messages as the embodiment of application state. Gains: simplicity, visibility, independent evolvability. Tradeoff: efficiency.
5. Layered system—each component "knows" only about the components with which it is interacting. Gains: bounds system complexity, independent component evolvability, legacy encapsulation, simpler components (infrequently used functionality moves to a shared intermediary), and intermediaries that can improve system scalability by enabling load balancing. Also enables security policies to be enforced on data crossing organizational boundaries (firewalls). Tradeoffs: adds overhead and latency to the processing
of data, reducing user-perceived performance (this can be offset by shared caching at intermediaries).
6. Code on demand—the client may download and execute code (e.g., applets, scripts, ActiveX controls, XSLT). Gains: simplicity (reducing the number of preimplemented client features) and runtime extensibility. Tradeoff: reduced visibility.

Taken together, these constraints create the large-scale effect of a shared information space that scales well and behaves predictably. URI and HTTP have been specially designed for REST interactions. REST components communicate by transferring a representation of a resource, selected dynamically based on the capabilities or desires of the recipient and the nature of the resource. The Web focuses on a shared understanding of data types with metadata, but limits the scope of what is revealed to a standardized interface. User agents, gateways, proxies, and origin servers are the main roles in which a component can act. A component may act in different roles depending on the interaction [1]. Are Web services an extension of the Web as defined by REST? REST proponents think that using the RPC style of SOAP, rather than the document style, violates the uniform interface constraint. REST proponents argue that a service-oriented system is a special case of a distributed shared information space like the Web. This viewpoint asserts that services are just resources that should be exposed as URIs by service description documents, and that the HTTP methods GET, POST, PUT, and DELETE are sufficient to perform all conceivable operations without resorting to custom (Web services) methods. Following this reasoning, REST capabilities support document-workflow-based interactions, such as passing a purchase order for processing, in a simpler and more robust fashion than Web services can with its custom interfaces.
As evidence, REST proponents point toward RPC-based distributed systems like CORBA, DCOM, and DCE, which failed to deploy at Internet scale because they have nonuniform interfaces. There is a large amount of controversy surrounding this argument, which is further slowing down the Web services working groups within the W3C.
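The uniform-interface constraint at the heart of this debate can be sketched in code. The following is a minimal, non-networked sketch (the class and the URIs are hypothetical, not part of any real HTTP library): every resource is named by a URI, and all resources are manipulated through the same small set of generic methods rather than per-service custom operations:

```python
# Sketch of REST's uniform interface: one generic set of methods
# (GET/PUT/DELETE analogues) applied to any resource named by a URI.
class ResourceStore:
    def __init__(self):
        self._resources = {}  # URI -> current representation

    def get(self, uri):
        # Retrieve a representation of the resource, or None if absent.
        return self._resources.get(uri)

    def put(self, uri, representation):
        # Create or replace the resource at a client-chosen URI.
        self._resources[uri] = representation

    def delete(self, uri):
        # Remove the resource; deleting a missing resource is a no-op.
        self._resources.pop(uri, None)

store = ResourceStore()
store.put("/orders/1", {"part": "ABC-1234", "qty": 12})
print(store.get("/orders/1"))
store.delete("/orders/1")
print(store.get("/orders/1"))  # None
```

In the REST view, a "submit purchase order" service is just a PUT or POST of an order representation to an order resource; no custom `submitOrder` method is needed.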
14.2.3 Web Services Standards

Figure 14.4 organizes the Web services standards by functional area: security, messaging, discovery, security management, and description. There are elements of a protocol stack in this diagram. However, the diagram is intended primarily to graphically separate the main functional areas and map the mostly ongoing standards work to these broad categories. The core and foundation technologies are found near the base of the diagram, and the most volatile technologies are toward the top. Given this volatility, this chapter will address only the foundation technologies of SOAP, WSDL, and the WS-I specification profiles. Note that WS-I specifications are profiles of selected specifications presented below and hence are not themselves part of the diagram, with the exception of calling out the Basic
Figure 14.4 Emerging Web services standards.
Profile 1.1 (BP 1.1). This specification is special in how it describes the use of attachments. Also note that the security stack is typically applied to the messaging stack and can also be applied to the discovery stack.

14.2.4 Simple Object Access Protocol (SOAP)

SOAP defines the intersection of two technology domains: XML and the Web. SOAP details how to structure and transfer data across the Web between participating Web service nodes. The transferred data are organized into messages. Each SOAP message is represented by an XML document. These XML documents must conform to SOAP rules regarding a baseline vocabulary and structure. In this fashion, SOAP constrains the message contents to a known structure and syntax. This promotes interoperability—clients and servers can rely on a mutually understood minimal message syntax. SOAP is arguably the primary Web services protocol and, as such, this chapter focuses on this specification. SOAP and XML-RPC were both derived from the insight of Dave Winer, who realized the potential of RPC over HTTP via XML in early 1998. The SOAP branch morphed from the XML-RPC branch as Microsoft and other companies adopted the concept and collaborated on the specification. These specifications were developed while XML Schema was evolving; as a result, SOAP not only included some XML Schema conventions and namespaces but also defined its own encoding model. The first SOAP specification was published in September 1999, authored by UserLand, DevelopMentor, and Microsoft. By December 1999, the protocol had stabilized with the release of version 1.1, and by then IBM and Lotus were part of the authoring team. In May 2000, this SOAP version was submitted to the W3C. The W3C established the XML Protocol (XMLP) working group in September 2000 to handle this submission. By June 2003, the W3C released SOAP 1.2 as a W3C recommendation.
Note that as of mid-2003, the 1.1 version of the SOAP specification is the most widely deployed and enjoys the most tool and platform support. However, tools and platforms are transitioning to SOAP 1.2 compliance during this same timeframe, indicating commercial support for the evolution of the SOAP specification. SOAP originally stood for simple object access protocol, but the latest version of the specification interestingly keeps the well-known acronym and drops the full name—probably because there is nothing particularly object-oriented about the technology, nor does it define an object model. SOAP is simple in the sense that the messaging section of the specification doesn't attempt to solve many of the nasty issues of distributed computing. These are features such as message reliability, end-to-end security, routing through application intermediaries, and multiple message dependencies. What the developers of the SOAP specification opted to do instead was to provide a generic mechanism that system developers can exploit to provide these horizontal features. In the latest SOAP specification (version 1.2), this is described in the section called the SOAP messaging framework [6].
14.2.4.1 SOAP Deployment Environments
SOAP application environments can range from simple to arbitrarily complex. Complex application environments may involve network intermediaries, multiple transport protocols, service discovery, or sophisticated message patterns. In the most straightforward and typical Web services use case, SOAP is used with the HTTP protocol to perform a single request and the corresponding response between two distributed processes—one assuming a client role and the other assuming a server role. Unlike the typical Web browser use case, the message content in both the request and the response is in the form of an XML document. These XML documents are not intended for rendering. Each document contains data targeted for processing at the application layer rather than at the presentation layer. SOAP can be used as an integration technology, providing the infrastructure to feasibly couple heterogeneous systems. This follows from SOAP's protocol neutrality and SOAP's flexibility regarding SOAP headers. Although SOAP defines a concrete binding to HTTP, SOAP messaging does not assume the presence of HTTP or even TCP/IP. Thus, SOAP is more "protocol-biased" than "protocol-neutral," because SOAP requires an outside-the-specification effort to map it to another transport protocol. However, some of these bindings have already been specified or productized—for instance, the SOAP bindings to email [9], BEEP (Blocks Extensible Exchange Protocol) [10], EJB (Enterprise JavaBeans), and JMS (Java Message Service). In addition, SOAP allows application designers to determine the separation between message metadata and message payload—there is no static set of SOAP headers to which an application must comply. This flexibility to define custom metadata is another key enabler for using SOAP as an application integration technology. Leveraging these properties, SOAP can be used to migrate protocol semantics between bridged protocols.
Differences in error propagation and compensation, message correlation, and message reliability can be addressed using SOAP headers to carry the data representing the semantic impedance. Thus, SOAP can be deployed as an integration framework used to bridge different protocols that do not share each other's full semantics. This is a popular early application of SOAP within the enterprise environment—back-office integration. Larger companies often start deployments of SOAP in this manner before opening up to application exchanges outside their trust boundary. SOAP applications can be deployed in a number of system patterns to solve a particular business problem. These patterns may involve SOAP intermediaries or non-SOAP nodes such as proxies and gateways. SOAP intermediaries are strictly defined as processing nodes that operate at the SOAP layer. They receive a SOAP message, understand and potentially modify it, and then forward it to the next SOAP node in the application's message path. SOAP and non-SOAP intermediaries can be categorized as either functional or optimizing. Functional intermediaries provide some functional processing that is required for the application to work. Functional intermediaries are often used to support requester authentication and service authorization. Optimizing intermediaries enhance an application yet are not required for the application to work. Optimizing intermediaries can often be introduced
at runtime rather than at design time—a significant capability that enables a provider to "tune" the service after the initial service deployment. Optimizing intermediaries typically enhance the scalability, reliability, or perceived performance of the provided service. Common deployment patterns of Web services intermediaries include

. Gateways. These are intermediaries that reside at the trust boundary of the Web services provider. Gateways enable an enterprise to provide a single point of contact for all requesters outside its trust domain to all the Web services that it hosts. Gateways may implement some business logic such as authentication, authorization, and privacy handling. Gateways assume the role of ultimate receiver of the message and hence are not true SOAP intermediaries.
. Proxies/Adapters. These intermediaries assume the roles of requester and provider, respectively, in order to enable legacy systems to participate in a Web services application. Like gateways, these are not intermediaries in the strict sense of the SOAP processing model, as they do not forward SOAP messages. Rather, proxies are intermediaries that generate SOAP messages, and adapters are intermediaries that terminate the SOAP message path—the ultimate receiver in "SOAP speak." Proxies enable non-SOAP service requesters to initiate SOAP requests, whereas adapters enable non-SOAP services to be exposed as Web services.
. Routers. A Web service message can follow a particular "path" through an arbitrary number of SOAP intermediaries, where each Web service intermediary provides a value-added service to the message and hence to the application. Note that for a request–response message exchange pattern (MEP), the request message may traverse different intermediaries than the response message does. Routing intermediaries may belong to the trust domain of the requester, the provider, or some third party.
. Dispatchers.
These SOAP intermediaries enable a Web service request message to be routed to one of several instances of a Web service provider based on some "filter" criteria applied to the message's contents/data. The dispatcher typically belongs to the trust domain of the Web services provider. Dispatchers may perform some form of "load sharing" or "data partitioning," so that requests for a particular Web service may be filtered on the presence and/or value of certain data elements in a SOAP request and diverted to an appropriate provider that hosts the resource to which the data in question are relevant. Dispatching intermediaries may be used to filter Web service messages on the namespaces used within the SOAP message, or to support service extensibility by directing requests for different versions of an interface to the appropriate runtime implementation.
. Orchestrators/Composers. These intermediaries offer a composite service interface built from the coordination of a set of individual Web services. Requesters see the composed service, not the component services. Orchestrators/composers assume the role of ultimate receiver of the message and hence are not true SOAP intermediaries.
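The dispatcher pattern above can be sketched briefly. In this hypothetical sketch (element names, part-number scheme, and provider URLs are all invented for illustration), the dispatcher inspects one data element in the request body and diverts the message to the provider instance that hosts the relevant partition of the data:

```python
# Sketch of a dispatching intermediary: route a request to one of several
# provider instances based on a "filter" applied to a body data element.
import xml.etree.ElementTree as ET

# Hypothetical data partitioning: each part-number prefix is hosted by a
# different provider instance.
PROVIDERS = {
    "ABC": "http://provider-a.example.com/catalog",
    "XYZ": "http://provider-b.example.com/catalog",
}

def dispatch(soap_request: str) -> str:
    """Return the provider endpoint responsible for this request's data."""
    envelope = ET.fromstring(soap_request)
    part = envelope.findtext(".//partNumber") or ""
    prefix = part.split("-")[0]
    # Divert to the provider hosting this data range; default to the first.
    return PROVIDERS.get(prefix, PROVIDERS["ABC"])

request = "<Envelope><Body><partNumber>XYZ-9876</partNumber></Body></Envelope>"
print(dispatch(request))  # http://provider-b.example.com/catalog
```

A real dispatcher would forward the unmodified SOAP message to the selected endpoint; only the routing decision is shown here.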
14.2.4.2 SOAP Example
The following example comes from the W3C architecture working group's usage scenarios [7]. It illustrates some of the features that differentiate SOAP from XML-RPC—use of SOAP extensibility (headers) and support for network intermediaries. The use case is a product price request between a SOAP sender and a SOAP receiver. A data caching intermediary sits between them to provide higher system performance. Figure 14.5 shows the system deployment and message flows. Figures 14.6–14.8 show the exchanged SOAP messages. Note that these are SOAP 1.2 examples using the namespaces and attributes associated with that version. The SOAP request message is routed through a caching intermediary. The caching intermediary (SOAP application 2) checks its caching store to see if it can directly respond to the request. Message path 2 in the SOAP application diagram corresponds to the intermediary directly supporting the request. If the intermediary cannot fulfill the request, it will forward the message to the catalog Web
SOAO Node acting as initial sender SOAP Sender SOAP Application 1
SOAP Application 2
Caching Handler
SOAP Application 3 SOAP Block 1
Caching Handler
Message Path 1 Message Path 2
SOAP Layer
SOAP Processor
Underlying Protocol Layer
SOAP Message
SOAP Processor
SOAP Message
XMLP Processor
Underlying Protocol Message Path
Underlying Protocol Intermediary e.g. HTTP proxy SMTP relay Host I
Host II
Host III
Host IV
Host V
Figure 14.5 SOAP distributed application using a caching intermediary [24]. [Copyright # 2002 W3C (MIT, INRIA, Keio University), all rights reserved.]
14.2
WEB SERVICES FOUNDATION TECHNOLOGIES
493
ABC-1234 Figure 14.6 SOAP request message for a cataloged price [24]. [Copyright # 2002 W3C (MIT, INRIA, Keio University), all rights reserved.]
service (SOAP application 3). This is message path 1. The catalog SOAP processor will respond to the request and insert into the message additional data targeted to the caching intermediary (see figure B). These additional data are placed in a SOAP header. This demonstrates SOAP’s extensibility mechanism—to provide a means to attach message metadata or intermediary targeted content—coupled with a deterministic proceesing model for handling these message headers. In this scenario, the
ABC-1234 2001-03-09T08:00:00Z ABC-1234 120.37 Figure 14.7 SOAP response emitted from catalog SOAP processor [24]. [Copyright # 2002 W3C (MIT, INRIA, Keio University), all rights reserved.]
494
FIXED AND MOBILE WEB SERVICES
ABC-1234 120.37 Figure 14.8 SOAP response received by originating SOAP sender [24]. [Copyright # 2002 W3C (MIT, INRIA, Keio University), all rights reserved.]
SOAP header is inserted to control any caches that may reside in any intermediaries along the message return path to the originating sender. The CacheControl header data are consumed by the caching intermediary. This message header must be processed at this SOAP node as determined by the semantics of the mustUnderstand and role attributes (more details later on these semantics). The application semantics of caching determines the behavior at the intermediary. In this case the salient data in the message payload are copied to the cache with the indicated index and expiration time. The CacheControl header is stripped by the intermediary (see Figure 14.8) and the message follows its path back to the SOAP sending application.

14.2.4.3 SOAP 1.1 Structure and Processing Model

SOAP version 1.1 primarily defines a message structure, a processing model, an encoding scheme for message data, an HTTP binding, and an RPC programming model mapping. Arguably, the buzz about Web services is due mostly to this specification. SOAP is similar to XML-RPC, which is to be expected given their shared origin. Many of the features are the same—comparable encoding rules, an RPC convention, and HTTP binding orientation. What is different is that SOAP allows custom data types, direct support for application intermediaries, an extensibility mechanism, and a defined processing model for these last two features. The cost to SOAP, relative to XML-RPC, is simplicity. It seems that the value of this tradeoff is borne out by SOAP’s popularity.

The SOAP message structure and processing model are tightly coupled. Hence, it is illustrative to discuss these topics together. Figure 14.9 demonstrates the general structure of a SOAP 1.1 message. A SOAP message is an XML document. The outermost container is the “envelope” element—thus the envelope is essentially the SOAP message. The envelope is where global namespaces are set. The envelope tag itself must be scoped to a namespace.
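The envelope/header/body structure can be sketched with Python’s standard library; this is an illustrative fragment, and the payload element names (GetLastTradePrice, symbol) are hypothetical:

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("env", ENV)

envelope = ET.Element(f"{{{ENV}}}Envelope")
ET.SubElement(envelope, f"{{{ENV}}}Header")          # optional; must come first
body = ET.SubElement(envelope, f"{{{ENV}}}Body")     # mandatory
payload = ET.SubElement(body, "GetLastTradePrice")   # hypothetical payload element
ET.SubElement(payload, "symbol").text = "DIS"

print(ET.tostring(envelope, encoding="unicode"))
```

Note that the Envelope, Header, and Body elements are all qualified with the envelope namespace, while the payload elements live in the application’s own (here unqualified) vocabulary.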
The envelope namespace declares the version of the SOAP message. For SOAP version 1.1, the namespace URI to use is http://schemas.xmlsoap.org/soap/envelope/. If a SOAP 1.1 processor receives a SOAP message that uses a different envelope namespace, it will generate a SOAP fault with a faultcode value of VersionMismatch. The SOAP envelope also can be used to declare the encoding style used to represent the message data. If the encoding style is the one defined by the SOAP specification (called “SOAP encoding”), then it is often declared here at the envelope using the URI http://schemas.xmlsoap.org/soap/encoding/. SOAP encoding will be discussed in more detail later in this chapter.

Figure 14.9 SOAP message structure.

The SOAP envelope is divided into two sections—one for header element blocks and the other for body element blocks. The SOAP header is intended to encapsulate ancillary content relative to the service being invoked. The SOAP body is intended to encapsulate the primary content of the message exchange. From a syntax viewpoint, the header and body elements are similar. Each must be an immediate child
element of the envelope element. Header elements are optional and, when present, must precede the single mandatory body element.

SOAP Body Processing

The body element is the container for data that maps to functional requirements of a SOAP application. This is the message payload. The payload contains information intended for processing by the main application logic. It can take the form of a remote procedure call or an XML document to exchange. The two canonical examples of these options are the stock quote and the purchase order, respectively. (Syntactically, the body element is an immediate child of the envelope element. If there is no header element, then the body element is the first child; if a header element does appear in the message, then the body element immediately follows it. The payload of the message is represented as child elements of body, and is serialized according to the chosen convention and encoding. Most of this chapter deals with the contents of the body and how to build payloads.)

The processing model for messages that do not contain header elements is simple and straightforward. The client encodes a service request into the body of a SOAP message and forwards the message to a SOAP node that is acting in the role of ultimate receiver or Web service provider. That node performs the service and typically sends back a response message, depending on the application semantics. An error may occur during service processing: the service provider may not recognize the content of the SOAP message body, the body’s content may be incomplete or erroneous, or some internal service processing problem may occur. The service would then generate a SOAP fault message describing the error and forward the message back to the client.

SOAP Header Processing

The SOAP 1.1 specification states that the use of SOAP headers is to extend a message without prior knowledge between the communicating parties. This definition seems both ambiguous and somewhat misleading.
The specification then cites SOAP header use for purposes such as authentication, transaction management, and payments. These examples better illustrate current SOAP header practice—to extend a base application with features that communicating SOAP nodes recognize and process. This is the dominant use case for SOAP headers—where the semantics of the SOAP header data are orthogonal to the semantics of the SOAP body payload. Consider a sophisticated travel agency Web service that reserves and purchases hotel bookings and airline tickets. The SOAP processors implement header elements to handle transactions, security, performance, message reliability, or message correlation. These are all features orthogonal to the offered service. In contrast, another class of SOAP header use fits the case where the semantics of header data and payload processing are mutually dependent. For instance, in the same use case, SOAP messages exchanging travel itinerary information could capture the traveler’s identity (name, address, etc.) in a SOAP header, where the SOAP body would carry the flight numbers, arrival and departure times, hotel
address, and other specifics. Thus, the SOAP header is used to carry application data that applies to the whole body payload in a direct, application-oriented manner. The third class of SOAP header usage is for intermediary processing, including the deliberate processing of data at intermediaries. The earlier SOAP example illustrating cache control demonstrates this style of header usage. Another example is routing messages to intermediaries that implement part of the application logic, such as credit card authorization, before routing the message to the ultimate receiver. Note that a SOAP message is not constrained to any one style of header usage. A SOAP message can use headers that map to any combination of the header types described above—or even to all types in some extreme case. The SOAP processing model adds novel value only in the case when SOAP headers are used to control intermediary behavior.

The SOAP processing model is scoped to a single message exchange routed from an originating sender to the ultimate (last) receiver. Hence, it doesn’t describe behavior involving more than a single message exchange between originator and destination—unlike multiple message exchanges such as request–response or publish–subscribe. Along the path from sender to ultimate receiver, the SOAP message may pass through SOAP intermediaries. A SOAP intermediary is a SOAP processor that receives a SOAP message, processes one or more of the SOAP headers in the message, and then forwards the message to either another intermediary or the ultimate receiver. SOAP messaging is hop to hop, so the SOAP processing model is defined in terms of individual node processing. The SOAP 1.2 processing model describes the behavior at each SOAP node and involves three steps:

1. Identify all the SOAP blocks to be processed. This means inspecting each SOAP header in turn and identifying whether that header is to be processed by the SOAP node.
The “role” attribute is used to target a specific SOAP node in the message path to process a specific SOAP header.

2. Verify that the SOAP node is capable of processing all above-identified headers that are attributed as “mandatory.” If the SOAP node is not capable of processing any of these headers, then generate a SOAP fault message targeted to the originating message sender. If identified SOAP headers are attributed as optional, they may be ignored. The mustUnderstand attribute is used to assert a mandatory request for the SOAP node to process the SOAP header.

3. If the SOAP node is an intermediary, then remove all identified SOAP header blocks from the message and then forward the message toward the ultimate receiver.

SOAP 1.2 extended the SOAP 1.1 processing model by offering more detail and options regarding role processing. The SOAP 1.1 processing model does not stipulate the processing order for header blocks. Some SOAP nodes process in lexical order, some in placement order. This has been a source of interoperability problems. Another interoperability issue is the smuggling of platform-dependent, proprietary headers into the SOAP
message. These actions contradict the goal of SOAP to be a cross-platform application messaging protocol. Some interoperability issues have been addressed in the WS-I Basic Profile 1.0, a document that profiles best practices for SOAP, WSDL, and UDDI. Web services specifications based upon SOAP 1.1 extensibility have been released in droves. Each of these defines a set of headers—their syntax and processing semantics. There are headers specified for security, transactions, reliability, business processing, asynchronous callbacks, message routing, and other generic requirements. Some of these specifications have been submitted to one of the Web services standards bodies, and more specifications assuredly will be. Once standardized, the resulting specifications will broaden the minimal SOAP protocol to support features required for some distributed applications. A system architect will be able to map the application requirements to the proper standards. Hence, SOAP extensibility enables system design flexibility.
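The three-step per-node processing model described above can be sketched in Python; this is an illustrative fragment only, and the role URI and understood header block are hypothetical:

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
NODE_ROLE = "http://example.org/roles/cache"            # role this node acts in (assumed)
UNDERSTOOD = {"{http://example.org/ns}CacheControl"}    # header blocks this node implements

def process_headers(envelope):
    """One node's pass over the header blocks, per the three steps above."""
    header = envelope.find(f"{{{ENV}}}Header")
    if header is None:
        return "ok"
    for block in list(header):
        # Step 1: identify blocks targeted at this node (SOAP 1.1 spells the
        # targeting attribute "actor"; SOAP 1.2 renames it "role").
        if block.get(f"{{{ENV}}}actor") != NODE_ROLE:
            continue
        # Step 2: a mandatory block we cannot process means a fault to the sender.
        if block.get(f"{{{ENV}}}mustUnderstand") == "1" and block.tag not in UNDERSTOOD:
            return "fault: MustUnderstand"
        # Step 3: an intermediary removes processed blocks before forwarding.
        header.remove(block)
    return "ok"

msg = (
    f'<env:Envelope xmlns:env="{ENV}" xmlns:n="http://example.org/ns">'
    f'<env:Header><n:CacheControl env:actor="{NODE_ROLE}" env:mustUnderstand="1"/>'
    "</env:Header><env:Body/></env:Envelope>"
)
print(process_headers(ET.fromstring(msg)))
```

The same loop run against a mustUnderstand header the node does not implement would return the fault indication instead of "ok".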
SOAP Fault Message Structure and Processing

For error processing, SOAP defines a SOAP fault message structure. A SOAP fault message is structured to support a SOAP processing error response associated with a received SOAP request. SOAP allows a single SOAP fault element, and it must be the only immediate child of the body element. The simple example in Figure 14.10 demonstrates an error thrown during payload processing. SOAP faults are mapped to handle either messaging or processing exceptions. SOAP messaging errors are generated when a SOAP node detects a poorly formed SOAP message or a SOAP message containing erroneous or incomplete information. SOAP processing errors are generated when a SOAP node throws an application-level exception not directly attributed to the structure or content of the received SOAP message. A SOAP fault contains an extensible faultcode element that allows a SOAP processor to branch appropriately when it receives the SOAP fault message. For complex applications involving intermediaries, a faultactor element will identify, by URI, the SOAP node that detected the error and generated the SOAP fault message.
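A minimal SOAP 1.1 fault can be sketched as follows; this is an illustrative fragment, and the fault code, reason text, and actor URI are hypothetical:

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("env", ENV)

def make_fault(code, reason, actor=None):
    """Build a fault message; Fault is the only immediate child of Body."""
    envelope = ET.Element(f"{{{ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{ENV}}}Body")
    fault = ET.SubElement(body, f"{{{ENV}}}Fault")
    ET.SubElement(fault, "faultcode").text = code        # e.g. env:Client, env:Server
    ET.SubElement(fault, "faultstring").text = reason    # human-readable explanation
    if actor is not None:
        ET.SubElement(fault, "faultactor").text = actor  # URI of the node that failed
    return ET.tostring(envelope, encoding="unicode")

print(make_fault("env:Client", "Unrecognized payload element",
                 actor="http://example.org/roles/cache"))
```

In SOAP 1.1 the fault subelements (faultcode, faultstring, faultactor) are unqualified, while the Fault element itself is in the envelope namespace.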
Figure 14.12 Document style SOAP.
Figure 14.14 SOAP 1.2 RPC response.
Using SOAP for RPC and choosing an appropriate protocol binding are treated as orthogonal concepts in the specification. In practice, mapping the RPC semantics to a protocol binding that does not clearly define a request–response message pattern will require extra design. HTTP correlates the two messages by reusing the same connection. Thus a message is certain to be a response if delivered along the same connection where an earlier request was sent. Binding to a different protocol may require the application designer to implement a correlation identifier directly as a SOAP header.

SOAP Encoding

SOAP encoding specifies how to represent typed data in a SOAP message. Encoding rules are necessary to be able to transform data from a programming environment into a SOAP message and back to a programming environment. SOAP encoding rules are specified in the SOAP 1.1 specification and are biased toward marshaling RPC parameters into and from their programming constructs. Encoding rules operate on a chosen set of datatypes. Recall that SOAP is not tied to any particular programming language and that a heterogeneous mixture of programming languages can implement a set of interacting SOAP nodes. Hence, the
encoding rules that SOAP employs must support datatypes common to popular programming and database languages. These datatypes include integers, strings, floating-point numbers, structures, and arrays. SOAP does not dictate which encoding rules to use. It is important that each application be allowed to choose the most appropriate encoding to suit its requirements. What tempers a proliferation of encoding rules is the prime objective of Web services—interoperability. What SOAP does is support a mechanism so that communicating SOAP nodes can be clear on what encoding is used to serialize a SOAP message. This is the encodingStyle attribute, and its value is a URI that indicates the controlling schema. This URI may indicate only a verbal or documented agreement on encoding rules, rather than an actual parsable schema to use for validation. When encodingStyle is attributed to the envelope element, it applies to the whole SOAP message. Although this is the typical case, it is not mandatory. Some messages will set the encodingStyle attribute on the subcontainer with data to be encoded. SOAP does not define a default encoding. If encodingStyle is not present on any container of a SOAP message, the receiving SOAP processor has no embedded clues about the message’s data serialization. SOAP defines an encoding as part of the specification. This is called “SOAP encoding” or “Section 5 encoding,” taken from the section of the SOAP 1.1 specification that defines it. Since XML Schema and SOAP were developed in the same timeframe, SOAP could take only partial advantage of this standard schema technology. XML Schema satisfies the SOAP encoding requirements for defining common datatypes and their corresponding encoding rules. SOAP encoding takes from XML Schema the datatypes’ namespace and the type attribute. XML Schema encoding is often used in place of the native SOAP encoding rules. WSDL, which was released after XML Schema, describes message formats using only XML Schema.
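The interplay of the encodingStyle attribute and XML Schema typing can be sketched as follows; this is an illustrative fragment, and the RPC element and parameter names are hypothetical:

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ENC = "http://schemas.xmlsoap.org/soap/encoding/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("env", ENV)
ET.register_namespace("xsi", XSI)

# encodingStyle on the envelope applies to the whole message; it could equally
# be set on just the subcontainer holding the encoded data.
envelope = ET.Element(f"{{{ENV}}}Envelope", {
    f"{{{ENV}}}encodingStyle": ENC,
    "xmlns:xsd": "http://www.w3.org/2001/XMLSchema",  # for the xsd: prefix below
})
body = ET.SubElement(envelope, f"{{{ENV}}}Body")
call = ET.SubElement(body, "GetLastTradePrice")          # hypothetical RPC element
symbol = ET.SubElement(call, "symbol", {f"{{{XSI}}}type": "xsd:string"})
symbol.text = "DIS"

print(ET.tostring(envelope, encoding="unicode"))
```

The xsi:type attribute is the XML Schema typing mechanism that SOAP encoding borrows to label each accessor with its datatype.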
SOAP encoding constitutes a data model based on XML along with corresponding encoding rules. This data model consists of simple types such as strings, integers, and floats. The SOAP data model also includes complex types such as arrays and structures, which are compositions of simple types. An application creates an instance of the data model that is represented as a graph of typed data. The SOAP encoding rules dictate how this graph is serialized into a SOAP message. To indicate this encoding in a SOAP message, the encodingStyle URI should be set to the value http://schemas.xmlsoap.org/soap/encoding/. This URI points to a schema that defines the SOAP datatypes and encoding rules. “Section 5” encoding has been popular for RPC-oriented SOAP applications. However, there is momentum toward document-style SOAP, and this requires XML Schema encoding to produce the required sophisticated data representations. Toward this end, the WS-I Basic Profile has deprecated “Section 5” encoding.

14.2.4.6 SOAP with Attachments

SOAP is principally an application-to-application messaging protocol. However, SOAP is increasingly acting as a content delivery conduit for arbitrarily typed
and sized data that are attached to the message. It is up to the SOAP application’s semantics whether this data attachment is to be used as an ancillary rider to the SOAP message or whether the attachment itself is the core reason for the transmission of the SOAP message. This second scenario may be exploited by applications whose requirement is to move binary data to another network application using a common transport protocol. Given an optimistic future for the uptake of SOAP, either scenario is plausible. For transmitting binary data or a large quantity of marked-up data, an application designer can choose between embedding the data (e.g., as base64-typed data) between XML tags inside the SOAP envelope or using an attachment mechanism. Attachments can increase efficiency by avoiding the need to parse the attached data as part of a large XML envelope. However, attachments themselves add some overhead to the message, and the application processor must be more sophisticated to understand attachment syntax and semantics. Attachments can be conveyed with a SOAP message using MIME multipart technology. This technique was defined in the “SOAP with attachments” (SwA) specification [13]. SwA defines how a message is to be carried within a MIME multipart form in such a way that the SOAP processing rules are preserved. The MIME multipart mechanism supports binary attachments such as images, sound files, or Word documents. To handle attachments, the XML Protocol (XMLP) working group of the W3C first published the SOAP 1.2 Attachment Feature [17]. The SOAP 1.2 Attachment Feature refers to SwA as a concrete attachment binding. The SOAP 1.2 Attachment Feature does not define concrete encoding rules for SOAP attachments. Rather, it defines an abstract SOAP 1.2 attachment feature that will enable various SOAP bindings to be defined that support the exchange of messages with binary attachments.
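The SwA packaging idea, a SOAP envelope as the root part of a MIME Multipart/Related package, with attachments referenced by Content-ID, can be sketched with Python’s email library. This is an illustrative approximation (SwA imposes further constraints on headers and part ordering), and the payload element names and Content-IDs are hypothetical:

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# The SOAP body references the attachment by its Content-ID ("cid:" URI).
soap = (
    f'<env:Envelope xmlns:env="{ENV}"><env:Body>'
    '<AddPhoto><photo href="cid:photo-1"/></AddPhoto>'
    "</env:Body></env:Envelope>"
)

package = MIMEMultipart("related", type="text/xml", start="<soap-envelope>")
root = MIMEText(soap, "xml")                 # the SOAP message is the root part
root.add_header("Content-ID", "<soap-envelope>")
package.attach(root)

photo = MIMEImage(b"\x89PNG placeholder bytes", "png")   # binary attachment
photo.add_header("Content-ID", "<photo-1>")
package.attach(photo)

print(package.as_string()[:120])
```

The receiving processor locates the root part via the start parameter, applies the normal SOAP processing rules to it, and resolves cid: references against the other parts.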
In addition to SwA, another attachments specification, WS-Attachments [18], is also referenced by the SOAP 1.2 Attachment Feature. WS-Attachments leverages DIME rather than MIME as the document attachment mechanism. DIME has the advantage of maintaining a length attribute for each payload component. Knowing the length relieves the receiving SOAP application from possibly significant overhead in processing and memory allocation. After creating the SOAP 1.2 Attachment Feature, the XMLP working group further analyzed the attachments space and found the existing work lacking. SwA did not fully consider attachments relative to the XML Infoset and the SOAP processing model expressed in SOAP 1.2. Additionally, opportunities for optimizing the exchange of SOAP messages that contain binary data had not been explored in a standards body. To this end, the XMLP working group has created a trio of new specifications. XML-binary Optimized Packaging [25] (XOP, pronounced “zop”) is an optimized serialization of an XML Infoset that contains binary content. Content that is typically base64 encoded is transmitted instead as its native octet stream; thus it does not incur either the message bloat or the processing overhead associated with base64 encoding binary data. XOP uses MIME Multipart/Related [26] as its packaging format. Also, XOP enables binary data to be conceptualized as typical base64-encoded content so it can be processed as a typical XML document. This layering thus enables a
XOP package to be digitally signed, for instance. SOAP Message Transmission Optimization Mechanism (MTOM) [27] is another XMLP specification; it describes how to use XOP with SOAP 1.2. A new media type is described that is to be used when a SOAP message Infoset is XOP encoded, thus alerting a receiving SOAP application to optimize appropriately during processing. The last of the trio of specifications is the SOAP resource representation header [28]. This specification describes a mechanism for carrying a representation of a Web resource as a SOAP header in situations where the receiver of this message would typically want to retrieve this representation. This capability is important in cases where the resource may be inaccessible or when it is desired to reduce the network overhead and eliminate the need for an additional HTTP GET on the resource.

14.2.4.7 SOAP Message Security

Security is an essential aspect of SOAP messaging due to the various risks associated with sharing information or performing transactions. Inappropriate information disclosure can impact ongoing negotiations; customer, partner, or employee relationships; adherence to privacy regulations; and reputation. Eavesdropping of SOAP messages can allow a passive attacker to obtain and misuse information. An attacker may also perform active attacks such as sending messages pretending to be someone else, changing message content, or resending old messages to obtain services or other side effects. Such attacks can result in products being shipped to the attacker rather than the original message sender, false information being used in decisionmaking, or services being obtained without payment, to give some examples. Such attacks may occur when a capable attacker exploits vulnerabilities to achieve a goal. Such threats may be mitigated by deploying security services as countermeasures.
Since there is a cost to security, security services need to be optional and used as needed depending on the application, risks, and costs. Some well-established security services include the following (complete definitions may be found in the Internet Security Glossary, RFC 2828):

. Message integrity—message content cannot be changed without the receiver detecting the change
. Message confidentiality—an eavesdropper is unable to view protected message content
. Mutual authentication—sender and receiver are able to determine that the other party is as expected
. Authorization—information can be conveyed to enable a party to determine whether to provide service
. Timeliness—messages cannot be reused without detection

Integrity requires more than a checksum, since in the context of security threats, forgery and substitutions also need to be detected. This is accomplished by using cryptographic methods to achieve data origin integrity, the ability to associate a party with the content. In addition to these security services, care must be taken
to mitigate the risk of denial-of-service attacks that can be used to either disable or degrade a service. Denial of service is hard to prevent, but measures can be taken to increase the cost to an attacker.

Mature technologies exist to provide transport security and can be used in certain circumstances when SOAP is bound to HTTP or another protocol that uses TCP/IP. A well-known solution is the secure sockets layer (SSL) and the revised RFC standard, transport layer security (TLS). SSL was a Netscape de facto standard that was later adopted and revised by the IETF to produce TLS in January 1999. This security technology may be used to provide integrity, timeliness, and confidentiality as well as mutual authentication. Server authentication is achieved using a server X.509 certificate. Client authentication may be achieved using a client X.509 certificate or, in the case of an HTTP binding, using HTTP basic or digest authentication in conjunction with SSL/TLS confidentiality and integrity protection. Care must be taken to configure SSL/TLS correctly to use the appropriate security features, algorithms, and key lengths.

Transport layer security is only appropriate for securing a link between two SSL/TLS endpoints. This is useful when SOAP is used with HTTP without any SOAP intermediary nodes, but not when SOAP intermediary nodes are present. Since SOAP intermediary nodes must be able to examine and possibly modify the SOAP headers, SSL/TLS must terminate at the intermediary. This means that SSL/TLS protection is lost at the intermediary node, exposing the payload of the message to risks if the SOAP node is compromised.

The SOAP Message Security specification under development at the Oasis Web Services Security technical committee is designed to specifically address security concerns related to SOAP messaging. This effort was initiated on July 9, 2002 with the charter referring to the April 5, 2002 WS-Security specification submission from IBM, Microsoft, and VeriSign.
The WSS committee specifications provide an example of how SOAP messaging may be extended using SOAP headers and processing rules to add optional infrastructure capabilities to Web services. The SOAP Message Security specification defines how to use the W3C XML Digital Signature recommendation to provide (1) integrity for any combination of SOAP header or body elements and (2) verification that claims made in security tokens are from the sender. It also defines use of the W3C XML Encryption recommendation to allow encryption of any combination of SOAP body elements and content, SOAP header elements and content, and attachments (but not the soap:Envelope, soap:Header, and soap:Body elements themselves, since the SOAP enveloping structure must remain intact). The specification defines a timestamp mechanism to support establishing the freshness of SOAP Security header blocks. Finally, the core specification and security token profiles define security tokens and associated mechanisms and processing rules. Security tokens are XML elements representing one or more claims made by the token authority, used to convey identity, key, authentication, authorization, or other information. The SOAP Message Security specification supports both XML and binary tokens and defines generic mechanisms to reference tokens. The Oasis committee is also
producing security token profiles for Username, X.509, Kerberos, SAML, and XrML security tokens. The username token may be used to convey a username and optionally a password for authentication, as well as supporting key derivation based on a username. The X.509 token profile outlines a means to convey X.509 certificate (and hence key) information, and the Kerberos token profile outlines conveying Kerberos tickets (and hence symmetric key information). The SAML profile outlines how to convey SAML assertions (authentication, authorization, and attribute assertions) and how the subject confirmation method is used in a SOAP messaging context. The XrML profile defines how rights management tokens may be used. Other token profiles may be developed in the future. The SOAP Message Security specification defines a SOAP security header block enabling security for both an ultimate receiver and SOAP intermediaries through the use of the SOAP role. A SOAP message may contain one or more wsse:Security header blocks, as needed, although only one may be defined for a specific role. Each security header block may contain a combination of XML signature elements, encryption reference lists, encrypted keys, wsu:Timestamp elements, or security tokens. Thus the security header contains signatures, information necessary to process encrypted message components, information useful to give security timeliness, and security tokens. The SOAP Message Security specification requires that XML signatures used for SOAP message security be placed in a SOAP security header block associated with the intended recipient (either the ultimate receiver or an intermediary, depending on the role) and recommends that supporting cryptographic key information also be conveyed in security tokens in that header block, although other ds:KeyInfo mechanisms are possible.
In order to fit with the SOAP message processing model, senders should not send enveloping signatures, nor should receivers use an EnvelopedSignatureTransform. The SecurityTokenReference provides a mechanism to reference security tokens, either directly using a URI or indirectly using an opaque identifier value or name as defined in the token profile. An embedded token may also be contained within a SecurityTokenReference element. When keys are conveyed in security tokens in the security header block, they may be referenced as an XML signature key by placing a SecurityTokenReference as a child of the ds:KeyInfo element. Security tokens may also be signed as part of an XML signature by referencing a SecurityTokenReference element with a ds:Reference URI and specifying a “STR Dereference Transform.” This transform dereferences the SecurityTokenReference, allowing the resulting security token to be hashed and included in the XML signature ds:Reference.

(The prefixes wsse and wsu refer to the SOAP Message Security namespace and the SOAP Message Security utility namespace, respectively, and ds refers to the XML Digital Signature namespace; the prefix strings are used here for clarity and are not normative.)
SOAP Message Security addresses confidentiality by using the W3C XML Encryption recommendation. Any SOAP message content may be encrypted, either elements or element content, except for the soap:Envelope, soap:Header, and soap:Body elements themselves. As required by XML Encryption, when an XML element or element content is encrypted, it is replaced by the xenc:EncryptedData element. Attachments may also be encrypted, with the attachment replaced by cipher text that is referenced from an EncryptedData element in the security header. The SOAP Message Security specification imposes additional requirements on the use of XML Encryption by requiring that a manifest of what has been encrypted be added to the security header block. Each encryption may use a different key, and if the encryption key needs to be conveyed explicitly, it may be referenced using a ds:KeyInfo element in the xenc:EncryptedData element. The ds:KeyInfo may include a SecurityTokenReference child when the key is conveyed in a security token in the security header. If a symmetric encryption key has been encrypted, then this encrypted key may be conveyed in the security header using a xenc:EncryptedKey element. A common case is to encrypt a symmetric key with a public key, for example. If an EncryptedKey element is used in the header, then it should contain a xenc:ReferenceList containing a xenc:DataReference element for each encryption. If the encryption key is not encrypted, then what was encrypted with it should be included in an xenc:ReferenceList in the security header block. Use of security techniques for SOAP messaging is being profiled by the WS-I Basic Security Profile working group, profiling use of SSL/TLS as well as SOAP Message Security and a subset of the token profiles. Such profiling is important since security techniques may be combined, but care must be taken to avoid introducing new security vulnerabilities.
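The overall shape of a wsse:Security header block can be sketched as follows; this is an illustrative fragment only (no actual signing or encryption is performed), the namespace URIs shown are those of the 2004 WSS 1.0 schemas, and the username and timestamp values are hypothetical:

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# WSS 1.0 (2004) namespace URIs, split across two lines for readability.
WSSE = ("http://docs.oasis-open.org/wss/2004/01/"
        "oasis-200401-wss-wssecurity-secext-1.0.xsd")
WSU = ("http://docs.oasis-open.org/wss/2004/01/"
       "oasis-200401-wss-wssecurity-utility-1.0.xsd")
for prefix, uri in (("env", ENV), ("wsse", WSSE), ("wsu", WSU)):
    ET.register_namespace(prefix, uri)

envelope = ET.Element(f"{{{ENV}}}Envelope")
header = ET.SubElement(envelope, f"{{{ENV}}}Header")
security = ET.SubElement(header, f"{{{WSSE}}}Security",
                         {f"{{{ENV}}}mustUnderstand": "1"})
token = ET.SubElement(security, f"{{{WSSE}}}UsernameToken")   # a security token
ET.SubElement(token, f"{{{WSSE}}}Username").text = "alice"
stamp = ET.SubElement(security, f"{{{WSU}}}Timestamp")        # freshness of the header
ET.SubElement(stamp, f"{{{WSU}}}Created").text = "2004-01-01T00:00:00Z"
ET.SubElement(envelope, f"{{{ENV}}}Body")

print(ET.tostring(envelope, encoding="unicode"))
```

In a real message the same header block would also carry the ds:Signature elements, xenc reference lists, and any SecurityTokenReference elements discussed above.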
14.2.4.8 SOAP 1.2 Changes The SOAP 1.2 specification was a W3C-proposed recommendation (PR) in summer 2003 and is on track to be a full recommendation (REC). There are some significant changes in SOAP 1.2. One large difference is that SOAP 1.2 is built around the XML Infoset [11] rather than XML 1.0 document syntax. This means that the SOAP structure is defined abstractly, instead of pre-encoded as an XML document. This enables alternate representations of the same SOAP information to be mapped to different protocol bindings. SOAP 1.2 has some syntax changes dealing with message structure. It addresses some ambiguities and missing features related to headers and intermediary processing, along with faults. The HTTP binding has been modified. The SOAPAction header is removed and is replaced by an optional parameter of the Content-type HTTP header. A new media type, application/soap+xml, replaces text/xml as the value of the Content-type HTTP header. SOAP 1.2 maps better to the HTTP status codes. The HTTP GET method is now incorporated into a SOAP
6. xenc is a prefix referring to the XML Encryption namespace; the prefix string used here for clarity is not normative.
14.2
WEB SERVICES FOUNDATION TECHNOLOGIES
511
response message exchange pattern. This leverages the safety and idempotence of GET requests for queries, instead of misusing HTTP POST for this purpose. SOAP 1.2 also adds some features to both the RPC style and the SOAP encoding. The SOAP GET binding is a response to REST advocates' criticism that SOAP messages with query semantics should work within the Web architecture principles and use the syntax and semantics of HTTP GET rather than tunnel over HTTP POST. Use of GET requires some subtle requirements to be met, namely, that the resource to query is reachable by a URI, there are no side effects on the service platform (the operation is safe), the query performed many times has the same results as if the query were performed only once (idempotence), and caching the retrieved data creates no security or privacy concerns. 14.2.5 WSDL The acronym WSDL stands for Web Services Description Language. The WSDL specification defines service interface metadata that can be used to formulate messages targeted to and from a Web service. SOAP is the primary message format that WSDL describes. WSDL 1.1 is an IBM and Microsoft submission to the W3C in spring 2001 [12]. In 2002, the W3C started a working group to turn this into a W3C Recommendation. Web services are about interoperability. Interoperability can be difficult to achieve even when the development process is controlled within an enterprise. Interoperability outside the enterprise becomes more elusive when you factor in multiple development teams, a variety of programming languages and platforms, geographic dispersal of developers and locale differences, and no single controlling party across the developed components to ensure quality. What is minimally required is an interface definition language (IDL) that is language- and platform-neutral for Web services. Other distributed systems such as CORBA and DCOM use an IDL to specify how to bind to and invoke services. For Web services, WSDL is the IDL.
WSDL is limited in its ability to encapsulate all the information necessary for a client process to fully use the offered Web service. This is because WSDL, like other IDLs, can define only the syntax, and not the semantics, of the service in a formal manner. Therefore, what a service actually does is left to humans to interpret. The missing formal semantics should not be underestimated. Missing or misunderstood semantics introduce errors and costly development cycles to fix them. This is the bane of most middleware, and it will not improve until technologies such as the semantic Web are fully deployed. WSDL is an XML document that describes three general areas of metadata necessary to invoke a Web service—message data, operations, and bindings. Message data defines any needed structure or datatypes of the exchanged data and the mapping of custom or XSD types into messages. Operations define how the service is accessed by the defined messages, such as an input/output message. Bindings define the transfer protocol that messages use for physical delivery. WSDL defines three bindings, for HTTP, for MIME, and, most importantly, for SOAP. It is the SOAP binding that gives WSDL its high value in the Web services domain.
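The three areas map directly onto the top-level elements of a WSDL document. The skeleton below is a sketch (the service name and target namespace are hypothetical); the comments indicate which area each element serves:

```xml
<definitions name="StockQuote"
             targetNamespace="http://example.com/stockquote.wsdl"
             xmlns:tns="http://example.com/stockquote.wsdl"
             xmlns="http://schemas.xmlsoap.org/wsdl/">
  <types>...</types>         <!-- message data: custom XSD datatypes -->
  <message>...</message>     <!-- message data: named, typed parts -->
  <portType>...</portType>   <!-- operations: message exchange patterns -->
  <binding>...</binding>     <!-- bindings: mapping onto SOAP/HTTP/MIME -->
  <service>...</service>     <!-- bindings: concrete network endpoints -->
</definitions>
```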
Figure 14.15
Simple WSDL definition.
The WSDL document structure defines elements to build up these three main concepts. Let's use examples and examine the subelements. WSDL Message Data—Types, Parts, and Messages First is the message data section. Two WSDL element constructs are used to define the data format of a Web service message. The WSDL types section defines the structure of custom typed data to be used in the message. Hence, these are datatypes tied semantically to the application and are not general datatypes. General datatypes are predefined by XSD (such as xsd:int) and are the foundation of custom-defined datatypes. The datatypes are then mapped into a WSDL message element. This mapping is done by message parts. Each message part is related to a defined datatype, possibly a custom datatype container modeling a data structure. For RPC-style Web services, each part typically corresponds to a function parameter. Message parts are also used when a message has multiple logical sections, each composed of an arbitrarily complex XSD structure. Modeling messages in this fashion supports document-style Web services. Both modeling styles are supported by WSDL. A message is independent of the binding and hence is said to be "abstract." Thus, a message may be used with SOAP, HTTP GET, MIME, or any other protocol binding. For RPC-style Web services, two messages must be described—the input/request message and the output/response message. WSDL elements and XSD allow many ways to model messages. Figures 14.15 and 14.16 give two examples; the first is simple, the second more complex and more realistic. The example in Figure 14.15 demonstrates the creation of WSDL messages without the need for custom types. These WSDL messages model RPC-style Web services. Each message part corresponds to an RPC parameter. Since custom datatypes are not needed, the types section has been omitted from the WSDL document.
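In the spirit of this first example (message and part names here are hypothetical), a request/response message pair for an RPC-style service can be built entirely from XSD primitive types, so no types section is required:

```xml
<!-- request: one part per RPC parameter, typed by an XSD primitive -->
<message name="GetLastTradePriceInput">
  <part name="tickerSymbol" type="xsd:string"/>
</message>
<!-- response: the RPC return value -->
<message name="GetLastTradePriceOutput">
  <part name="price" type="xsd:float"/>
</message>
```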
The second example demonstrates WSDL messages for a document-style Web service that models a factory production system. This demonstrates a WSDL message requiring custom types and multiple message parts. Custom types are defined using XSD constructs and build on the foundation XSD primitive types. Custom datatypes are created according to the semantics of the application. Message parts are used to logically group the data. In this example, there are two such parts; a single ProductionReport
Figure 14.16
WSDL custom types and multiple message parts.
message combines these two parts. The first part is straightforward—it models a structure that contains production information collected off the factory floor. The second part is more complex—it models what the product is manufactured to fill, intended either for warehoused stock or for one or more customer orders for that product. WSDL Operations—Message Exchange Patterns and portTypes The second WSDL concept to address is operations. An operation corresponds to a logical invocation of a Web service. WSDL 1.1 operations are defined in the context of one of four basic message exchange patterns called transmission primitives. Hence, operations define the sequence of messages required to logically complete a Web service invocation. Operations define input, output, or optional fault child elements. These correspond to the message flow relative to the Web service receiver, not the invoking sender. The four operation patterns defined by WSDL are:
- One-way—the Web service receives a message and generates no reply. This is modeled by only one child input element.
- Request-response—the Web service receives a message, and either sends a reply message that is correlated to the request message or generates a fault. This is modeled by one child input element, followed by an output element, followed by an optional fault element.
- Solicit-response—the Web service endpoint sends a message and receives a correlated reply. This pattern is the converse of the request-response pattern. Note that the response message should not have an "in" parameter
Figure 14.17
Example WSDL portType and operation.
because this pattern dictates, by definition, that the Web service cannot react to input data in the reply. WSDL does not define a binding for this pattern, so it may be of limited use in practice. This is modeled by one child output element, followed by an input element, followed by an optional fault element.
- Notification—the Web service emits a message. This corresponds to a broadcast or a publish-subscribe programming model. WSDL also does not define a binding for this pattern. This is modeled by only one child output element.
To support the tight syntax of RPC-style services, yet be flexible enough to handle document-style services, the operation element can optionally use the parameterOrder attribute. This attribute explicitly specifies the part order for RPC-style messages. Parameters are space-delimited and identified by part name. Related operations may be grouped into portType elements. The portType element is a logical container of functionally related operations, similar to object methods or COM interface methods. The portType element provides WSDL a mechanism to bind related operations to a concrete protocol. A WSDL document can host a number of portType container elements. The example in Figure 14.17 demonstrates both portType and operation elements. Three operations are modeled in Figure 14.17: two request-response and one notification. All are grouped in the same portType element because, as one service, they would likely be bound to the same protocol.
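As a sketch of these ideas (operation and message names are hypothetical, not taken from Figure 14.17), a portType grouping a request-response operation and a notification operation could look like:

```xml
<portType name="StockQuotePortType">
  <!-- request-response: a child input element followed by an output element -->
  <operation name="GetLastTradePrice">
    <input message="tns:GetLastTradePriceInput"/>
    <output message="tns:GetLastTradePriceOutput"/>
  </operation>
  <!-- notification: a single output element, emitted by the service -->
  <operation name="PriceChanged">
    <output message="tns:PriceChangedMessage"/>
  </operation>
</portType>
```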
WSDL Bindings—Concrete Protocols, Ports, and Services The last WSDL concept to address is bindings. WSDL bindings map the abstract datatypes, messages, and operations to a concrete physical representation of messages. There are
Figure 14.18
WSDL example from WSDL 1.1 specification.
three WSDL components required to define this mapping: binding, port, and service. A WSDL binding element maps a portType to an actual transport protocol. Bindings are defined in the WSDL 1.1 specification for SOAP, HTTP, and MIME. However, the SOAP 1.1 binding is the primary use case for WSDL. A binding does not specify a network address, just the protocol artifacts that remain static no matter which host machine the service is deployed on. For SOAP bindings, WSDL 1.1 describes examples of SOAP over HTTP (request-response operation) and SOAP over SMTP (one-way operation). Figure 14.18 presents a SOAP-over-HTTP binding example from the WSDL 1.1 specification. WSDL defines a set of extension elements for the binding element. Extension elements transition the WSDL document from the abstract to the concrete. Extension elements declare how a binding is realized—by using SOAP or HTTP or MIME. Hence, the role of the extension elements is to specify a concrete grammar in the notation of the physical binding. In the case of SOAP, a set of SOAP extension elements is used. The extension elements defined by the WSDL 1.1 specification are for SOAP, HTTP, and MIME. Concentrating on this example and the SOAP binding, the soap:binding element specifies SOAP over HTTP as the transport protocol and the document style of messaging. A correlation between the abstract operation and this binding is made by identifying that operation by name and creating a child soap:operation element. The soap:operation element defines the SOAPAction header to use. This is discussed in the SOAP section, but it is an HTTP-visible mechanism to dispatch this SOAP message to the proper component in the Web service. The soap:body element is used to describe the encoding of the input and output messages related to the containing operation. There are a number of options governing how the message is encoded and whether the indicated encoding applies to the whole message body or is targeted to individual message parts.
This simple example identifies literal encoding. This declares that the concrete schema definitions are "literally" defined in the WSDL types section. If the value of the "use" attribute is "encoded," then the datatypes used to construct the message parts in the WSDL types section should be thought of as "abstract." Another step is required to encode the data, and that step is defined by an encodingStyle attribute. The encodingStyle attribute value is a URI pointing to the encoding rules to use. For example, a soap:body element that indicates the SOAP encoding style is presented in Figure 14.19.
Figure 14.19
Declaring SOAP encoding in a WSDL binding.
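Combining these ideas, a hedged sketch of an RPC/encoded SOAP-over-HTTP binding (binding, operation, and namespace names are hypothetical, not those of Figures 14.18 and 14.19) might look like this; each soap:body declares use="encoded" and points at the SOAP encoding rules via encodingStyle:

```xml
<binding name="StockQuoteSoapBinding" type="tns:StockQuotePortType"
         xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/">
  <!-- concrete protocol: SOAP over HTTP, RPC-style messaging -->
  <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
  <!-- correlated to the abstract operation by name -->
  <operation name="GetLastTradePrice">
    <soap:operation soapAction="http://example.com/GetLastTradePrice"/>
    <input>
      <soap:body use="encoded" namespace="http://example.com/stockquote"
                 encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
    </input>
    <output>
      <soap:body use="encoded" namespace="http://example.com/stockquote"
                 encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
    </output>
  </operation>
</binding>
```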
Figure 14.20
Two WSDL services and associated elements.
All that is left for WSDL to declare is the runtime address of the defined Web service. This is done with the WSDL service and port elements. A service is a collection of ports. A port specifies a single network address for a binding. Hence, a port can be considered a concrete instantiation of an abstract portType. The address information is supplied by a protocol extension element. As with bindings, protocol extension elements are defined for SOAP and HTTP (the MIME binding assumes SOAP as a transport protocol). If a service groups multiple ports, then the ports are instantiations of the same portType. The ports must vary by either employing a different binding (protocol) or a different network address for the same binding. The ports are semantically equivalent and can be regarded as alternatives. In Figure 14.20, two services are offered by the WSDL file. The first service deploys three alternative endpoints—two SOAP and one HTTP—to the same logical set of operations. The SOAP ports reuse the same binding information, but use separate network addresses. A WSDL document can expose multiple services. The second service instantiates a port that doesn't expose the same portType.
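A sketch of two alternative SOAP ports in one service (service, port, and address names are hypothetical): both ports reuse the same binding, varying only in network address, so a client may treat them as interchangeable:

```xml
<service name="StockQuoteService">
  <!-- two alternative endpoints for the same binding -->
  <port name="StockQuotePortEast" binding="tns:StockQuoteSoapBinding">
    <soap:address location="http://east.example.com/stockquote"/>
  </port>
  <port name="StockQuotePortWest" binding="tns:StockQuoteSoapBinding">
    <soap:address location="http://west.example.com/stockquote"/>
  </port>
</service>
```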
WSDL 1.2 The W3C formed the Web Services Description working group in 2002 and had produced significant deliverables by summer 2003. In addition to a requirements document and a usage scenarios document, the group is creating the WSDL 1.2 specification. The current form of the specification is in three parts. WSDL Part 1: Core Language defines the overall document framework in terms of an abstract component model modeled as an XML Infoset. WSDL Part 2: Message Patterns defines the supported primitive message combinations used to support a Web service operation. Message patterns are binding independent and are defined in terms of sequence, direction, cardinality (single or multicast), and potential faults. WSDL Part 3: Bindings defines WSDL protocol extensions to use with SOAP and HTTP, along with an extension to use with the MIME message format. The WSDL 1.2 specification has made some changes relative to WSDL 1.1. Operator overloading was possible in 1.1, and this feature has been removed in 1.2. The portType element has been renamed interface, and interface inheritance is supported. The port element has been renamed endpoint. WSDL 1.2 is also defining a mechanism to describe extensibility in terms of features and properties. Features are described in SOAP 1.2. Features are defined abstractly, and their role is to introduce a new capability, such as security or correlation, which extends the SOAP processing model. Properties are concrete extension elements that support a given feature. Hence, properties and features should enable WSDL 1.2 to describe extended distributed processing functionality beyond the basic SOAP features.
14.2.6 Discovery Protocols
Web service registries will support the discovery of provided Web services by interested requestors. Discovery is critical when requestors and providers are unknown to one another. However, in this nascent phase of Web services technology, it will be more typical that business partners already have an established relationship before interacting with one another. All partners will typically know each other's capabilities in terms of both offered services and the quality of that service. However, some Web services proponents state that automating the discovery process is the key to the uptake of Web services. Lost in the discovery process is the transmission of service semantics. It is assumed that humans and natural languages will supply the semantics until technologies central to the semantic Web are deployed. Universal Description, Discovery, and Integration (UDDI) UDDI [19] is a technology that defines a Web services discovery mechanism based on a centralized registry model. It defines and describes the format for storing metadata on published services along with the APIs for both publishing and querying the registry. Registries are deployed and managed by an operator. A registry can support services for a single large enterprise, or across multiple enterprises. While UDDI can support either configuration, it is geared toward supporting service discovery across enterprises—in much the same way as the Yellow Pages organize offered services and providers. Besides detailed service metadata, UDDI registries store
business metadata relating to the service providers. The UDDI discovery process mandates that the requestor know the UDDI query SOAP API along with the UDDI registry endpoint. UDDI registries can be replicated across UDDI registry operators. Web Services Inspection Language (WS-IL) In contrast to UDDI, WS-IL [20] describes a service discovery model that is decentralized and relatively simple. WS-IL describes how a service provider exposes its offered services. WS-IL does two things to accomplish this: (1) it describes the structure of an inspection document or a document set that references a provider's available service descriptions, and (2) it describes a mechanism for a client to retrieve the inspection document(s). An inspection document is an XML document and is used to aggregate references for each service that the provider is exposing. A provider can partition the links to service description documents into more than one inspection document. In such a case, the inspection documents are linked, and the root inspection document can be used as a starting point for discovery queries. The service description references contained in an inspection document typically point to WSDL files. WS-IL also defines a binding to enable the inspection document to reference a UDDI-registered service. This enables a provider to register a service in a UDDI registry, yet still offer local inspection access to the service metadata. This has the advantage of avoiding duplicate service metadata. WS-IL describes two mechanisms to place and find inspection documents. The first is to name the root inspection document inspection.wsil and make it available via URL at the common entry point to the provider domain. For instance, the provider foo.com would place its root inspection document at the root of its Web server—http://foo.com/inspection.wsil.
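A sketch of such a root inspection document (the element usage here follows the WS-IL note as best recalled; the WSDL locations and the linked document are hypothetical): each service aggregates references to its description documents, and a link can point at a further inspection document:

```xml
<inspection xmlns="http://schemas.xmlsoap.org/ws/2001/10/inspection/">
  <!-- one service, described by a WSDL file -->
  <service>
    <description referencedNamespace="http://schemas.xmlsoap.org/wsdl/"
                 location="http://foo.com/stockquote.wsdl"/>
  </service>
  <!-- references partitioned into a second, linked inspection document -->
  <link referencedNamespace="http://schemas.xmlsoap.org/ws/2001/10/inspection/"
        location="http://foo.com/catalog/moreservices.wsil"/>
</inspection>
```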
Another mechanism is to use the HTML META tag to provide links to the provider's inspection documents. Although not a public specification produced by a standards body, WS-IL concepts form a practical alternative to centralized registries. WS-IL consolidates concepts found in earlier technologies that promoted decentralized Web services registries. 14.2.7 Web Services Interoperability Organization (WS-I) One of the most fundamental goals for the establishment of Web services is interoperability. Interoperability is required when building new distributed applications under heterogeneous conditions—developed by different individuals, with different development tools, and deployed on different computing platforms. Another common scenario occurs when a Web services client is built to interact with a previously deployed Web service. In both cases, interoperability issues are exacerbated when the distributed application crosses one or more trust boundaries—all the difficulties of developing in heterogeneous environments are present, with the added issues of security, privacy, and reduced human access to application semantics. Interoperability is also required when using Web services for integration tasks,
connecting legacy systems, typically in a back-office environment. In all these situations, interoperability is impossible unless the Web services technologies used are consistently interpreted and implemented by all parties. Web services interoperability has been elusive. Different standards bodies (W3C, OASIS, IETF) generate Web services specifications; no single organization owns or coordinates the development and cohesion of the specifications. The problem is especially acute in Web services' infancy, as the primary implemented specifications, SOAP 1.1 and WSDL 1.1, are vendor-produced and not the result of a rigorous standards process. The resulting environment is one in which each platform and tool vendor must interpret gaps between or ambiguity within the specifications, which leads to divergent, noninteroperable implementations. Solving these issues is why WS-I was formed. WS-I was formed by the principal vendors of Web services tools and platforms. They collectively realized that this new technology domain of Web services would not gain traction in the marketplace unless it could deliver cross-platform and cross-tool interoperability. Pushing this responsibility down to a standards organization would result in loss of control over the process when time to market is essential. Hence, WS-I was formed in early 2002. WS-I brands versions of Web services specifications as interoperable profiles. Interoperable profiles identify target Web services technologies and provide clarifications on their usage both individually and in conjunction. The first profile is WS-I Basic Profile 1.0 (BP 1.0) and includes the specifications HTTP 1.1, XML Schema 1.0, SOAP 1.1, WSDL 1.1, and UDDI 1.0. The BP 1.0 became a final specification in summer 2003. The Basic Profile 1.1 (BP 1.1) is currently under development and pulls "SOAP with attachments" (SwA) into the BP umbrella. The Basic Security Profile 1.0 (BSP 1.0) is currently under development.
BSP 1.0 will profile security specifications appropriate for SOAP message security, including SSL/TLS and the OASIS SOAP Message Security specifications, as well as the Username, X.509, and Kerberos token profiles. In addition to generating profiles, WS-I also performs a profile validation function for participating vendor Web services products. This activity occurs in the WS-I Sample Applications working group. The Sample Applications group validates interoperability against a given profile by designing a mock application (supply chain management) with multiple Web service endpoints. Each vendor platform implements each endpoint and hosts each service to demonstrate cross-platform interoperability. Additionally, WS-I produces test tools that sniff SOAP messages on the wire and evaluate conformance of the messages against a target profile.
14.2.8 Mobile Terminal Web Services
The mobile domain is motivated to exploit the increasing availability of information and services available on the Web. Mobile terminals can now browse the Web and are becoming connected to personal information management (PIM) services (such as email, scheduling, and contacts) along with other enterprise systems. The demand will be to incorporate new applications and functions that make particular
sense to a mobile user, such as traffic and airport data and other localized information. There will also be a demand on the mobile industry to make these facilities available on more mid- and entry-level mobile phones. Given the staggering number of mobile terminals in use and the upgrade rate, the opportunities for all parties involved—consumers, enterprises, network operators, and service providers—are great. As Web services are built on the foundation of the Web, offering mobile Web services is dependent on the physical capabilities of the mobile Web. The mobile Web differs from the fixed Web in some significant ways. Mobile terminals are subject to some well-documented limitations. Mobile handset processing power is restricted in terms of CPU capabilities, addressable memory, and permanent memory. The user interface is limited in terms of both screen and input device. The data network is constrained in both data throughput and latency. However, on all these fronts, mobile terminals are improving dramatically. Mobile operating systems and user interfaces are becoming increasingly powerful and sophisticated—witness the Symbian operating system and the Nokia Series 60 platform. These advanced terminal platforms have opened up their local resources to host third-party applications. Current and widely deployed wireless packet-switching protocols (such as GPRS) offer greater capacity, higher bit rates, and always-connected functionality. These emerging mobile enhancements are creating positive conditions for terminal-hosted Web services applications. While direct terminal participation in Web services is coming, architectural alternatives can bridge the gap. The common architectural solution is to deploy a proxy into a network host that assumes the role of a SOAP client when facing the Web and connects to the mobile terminal using a supported technology. This enables the SOAP client to be arbitrarily complex.
The terminal application that connects to the proxy could be either general-purpose, such as a browser, or a custom application such as myAirportKiosk. These are temporary solutions that await a new generation of mobile terminals capable of hosting a Web services stack and addressing the issues that will obstruct the uptake of terminal-hosted mobile Web services applications. Two main issues of mobile Web services uptake are performance and openness. The constraints described above will limit the performance of Web services on the mobile device. Web services messages are large XML documents, so bandwidth, processing, and memory limitations all will impact perceived application performance. Of particular concern is the performance impact of marshaling and unmarshaling XML-serialized data to program memory. Hence, the serialization of the SOAP message XML Infoset into a binary representation (such as ASN.1) is being actively discussed in the mobile Internet industry and already being promoted by some vendors. The performance gains from binary serialization of the SOAP message are significant, and are made greater by sending the binary message without the tag set. The argument is that the data alone is sufficient, as each node already knows the message schema. These approaches, however, reduce the "transparency" of the system, which makes debugging difficult and runs counter to XML and Web precepts.
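The size argument for tag-less binary serialization can be illustrated with a toy comparison (the message fields and envelope content are entirely hypothetical, and this uses a fixed-layout binary record rather than ASN.1): when both endpoints know the schema, only the values need to travel.

```python
import struct

# A hypothetical three-field reading, once as SOAP XML and once as a
# schema-implied ("tag-less") binary record.
xml_msg = (
    '<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">'
    '<env:Body><Reading><station>HEL</station><tempC>21.5</tempC>'
    '<samples>1000</samples></Reading></env:Body></env:Envelope>'
)

# Both endpoints know the layout, so only the values are sent:
# 3-byte string + 4-byte float + 4-byte unsigned int = 11 bytes.
binary_msg = struct.pack("!3sfI", b"HEL", 21.5, 1000)

print(len(xml_msg), len(binary_msg))
```

The trade-off noted above applies: a network trace of binary_msg is opaque without the schema, whereas the XML form is self-describing.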
Open Mobile Alliance (OMA) The Open Mobile Alliance [22] is an organization of mobile device manufacturers, operators, and IT vendors. OMA is currently addressing how Web services can be leveraged by mobile operators. OMA is defining enabling services such as location, digital rights, and presence; use cases involving mobile subscribers, mobile operators, and service providers; an architecture for the access and deployment of enabling services; and a Web services framework for using secure SOAP in this environment. These Web service interfaces are intended to enhance a service provider's data for a particular mobile subscriber. A common scenario starts with a data request from some application (perhaps a mobile browser) to a service provider. The service provider then uses Web services to interact with a subscriber's mobile operator to retrieve some relevant data about the subscriber, such as location or presence. This data can be used to enhance the service provider's response to the initial request. J2ME Web Services J2ME Web Services is a specification created by the Java Community Process (JCP) under Java Specification Request (JSR) 172 [23]. J2ME Web Services builds out from J2ME, an implementation of the Mobile Information Device Profile (MIDP) specification. All MIDP implementations support HTTP, the primary transfer protocol to use with SOAP. Given the availability of HTTP networking, J2ME Web Services can focus on defining the additional components needed to parse XML and to create and consume SOAP messages, while constrained to meet platform size and performance metrics. To meet the XML parsing and platform size requirements, J2ME Web Services 1.0 does not provide DOM (Document Object Model) level 1.0 or 2.0 support and instead supports SAX (Simple API for XML) 2.0. DOM builds a tree data structure of a parsed XML document, and this can violate the memory constraint.
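The SAX model can be sketched as follows (shown in Python rather than the J2ME Java API, and with a hypothetical document; the streaming-callback idea is the same): the parser invokes handler callbacks as elements stream past, so no document tree is ever held in memory.

```python
import xml.sax

# Extract one field from a small response document via streaming callbacks.
class PriceHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.price = ""

    def startElement(self, name, attrs):
        # Track whether we are inside the element of interest.
        self.in_price = (name == "price")

    def characters(self, content):
        if self.in_price:
            self.price += content

doc = "<quote><symbol>NOK</symbol><price>13.50</price></quote>"
handler = PriceHandler()
xml.sax.parseString(doc.encode(), handler)
print(handler.price)
```

A DOM parser would instead materialize the whole document as a tree before the application sees any of it, which is exactly the cost MIDP-class devices cannot afford.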
J2ME Web Services also describes a WSDL-to-Java mapping and a runtime API to support generated stubs. There also exist third-party alternatives to JSR 172 implementations. These are XML and SOAP APIs for the MIDP platform and have the advantage of a smaller footprint and more mature implementations.
14.2.9 Privacy and Identity Management
Privacy is often an essential requirement for Web service deployments, mandated by regulations in some environments. One aspect of privacy is to ensure that identity attributes of a user are not disclosed during a Web service invocation. Anonymity in the Web services context is a mechanism used to ensure that the identity of a user is not disclosed, and it may be implemented using a pseudonym. This may be achieved by using an identity management solution that maps identities by federating accounts via pseudonyms, as done by Liberty Alliance [21], for example. Architecturally, use of pseudonyms requires an application intermediary that is inside the user's trust boundary. The application intermediary will relay the user request to the ultimate receiver after stripping out user-sensitive information and replacing any required data with intermediary-oriented data, including the pseudonym. Web service providers often require authentication of a user, but this does not
necessarily mean that the identity of the user is known to the service, as long as authentication was performed by a trusted party. A Web service user will typically interact with many Web service providers. Hence, each Web service user will need to maintain many identifiers, regardless of whether the user employs specialized-application Web service clients, a general-application Web services client (i.e., browser-based), or some mixture of these two client types. Service providers often capture the same sensitive information, such as address, personal preferences, and financial data. This duplicated, distributed data is apt to become stale, as each user's sensitive information may be volatile. A solution to these issues is to provide Web service users a federated network identity. A federated network identity will provide a Web service user a single repository for all online identities and sensitive information, including preferences and purchasing habits. Architecturally, an identity provider fulfils the role of a federated network identity. Each Web service user can then administer his/her multiple identities. The identity provider will then securely and selectively share identity and sensitive information with Web service providers. Identity management services can be built using Web service technology components, as well as adding value to Web services deployments.
14.3 CONCLUSION
Web services are an immature industry with strong backing from the businesses best positioned to capitalize if a robust market emerges. Because of this enormous interest, it is hard to distinguish true customer demand from the marketing machinery. However, Web services are a natural extension of the successful Web, and Web services deployment is increasing in the B2B environment, typically for integration tasks within an enterprise behind a secure perimeter. Once the Web services security specifications mature and gain considerable buy-in and platform support, the necessary preconditions for Web services to reach critical mass may be fulfilled. Secure Web services will enable businesses to confidently expose external service interfaces and consumers to confidently interact with these service providers. The next level of business process issues, such as message reliability, transaction support, and message addressing, will then draw the focus of IT vendors and the standards bodies. Until basic Web services messaging, interface descriptions, and security are resolved and widely adopted, however, Web services technology will have limited application. The good news is that between the WS-I Basic Profile and the SOAP and WSDL specifications, two-thirds of the necessary and sufficient conditions have been met. Once the security standards make their way through the standards bodies and are adopted, the industry will truly witness the value that the market places on Web services.
REFERENCES

1. I. Jacobs, ed., The Architecture of the World Wide Web, W3C TAG; http://www.w3.org/2001/tag/webarch/.
2. R. Fielding and R. Taylor, Principled design of the modern Web architecture, Proc. 2000 Int. Conf. Software Engineering (ICSE 2000), Limerick, Ireland, June 2000, pp. 407–416.
3. B. Carpenter, Architectural Principles of the Internet, RFC 1958, IETF, June 1996; http://www.ietf.org/rfc/rfc1958.txt.
4. R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee, Hypertext Transfer Protocol—HTTP/1.1, RFC 2616, IETF, June 1999; http://www.ietf.org/rfc/rfc2616.txt.
5. T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau, eds., Extensible Markup Language (XML) 1.0, 2nd ed., W3C Recommendation, Oct. 6, 2000; http://www.w3.org/TR/REC-xml.
6. M. Gudgin, M. Hadley, N. Mendelsohn, J.-J. Moreau, and H. F. Nielsen, SOAP Version 1.2 Part 1: Messaging Framework; http://www.w3.org/2000/xp/Group/.
7. H. Haas and D. Orchard, Web Services Architecture Usage Scenarios, W3C Working Draft; http://www.w3.org/TR/2002/WD-ws-arch-scenarios-20020730/.
8. N. Mitra, SOAP Version 1.2 Part 0: Primer; http://www.w3.org/2000/xp/Group/2/06/LC/soap12-part0.html.
9. M. Mountain et al., SOAP Version 1.2 Email Binding, W3C Note; http://www.w3.org/TR/2002/NOTE-soap12-email-20020626.
10. Using the Simple Object Access Protocol (SOAP) in Blocks Extensible Exchange Protocol (BEEP), RFC 3288, IETF; http://www.ietf.org/rfc/rfc3288.txt.
11. J. Cowan and R. Tobin, XML Information Set, W3C Recommendation, Oct. 24, 2001; http://www.w3.org/TR/xml-infoset/.
12. Web Services Description Language (WSDL) 1.1, W3C Note; http://www.w3.org/TR/wsdl.
13. D. Box et al., Simple Object Access Protocol (SOAP) 1.1, W3C Note, May 8, 2000; http://www.w3.org/TR/SOAP/.
14. K. Ballinger et al., Basic Profile 1.0, WS-I Board Approval Draft, June 21, 2003; http://www.ws-i.org/Profiles/Basic/2003-06/BasicProfile-1.0BdAD.html.
15. (a) D. Eastlake et al., XML-Signature Syntax and Processing, W3C Recommendation, Feb. 12, 2002; http://www.w3.org/TR/2002/REC-xmldsig-core-20020212/; (b) D. Eastlake and J. Reagle, XML Encryption Syntax and Processing, W3C Recommendation, Dec. 10, 2002; http://www.w3.org/TR/2002/REC-xmlenc-core-20021210/.
16. R. Fielding, Architectural Styles and the Design of Network-Based Software Architectures, Ph.D. dissertation, Univ. of California, Irvine, 2000; http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm.
17. H. F. Nielsen and H. Ruellan, SOAP 1.2 Attachment Feature, W3C Working Draft, Sept. 24, 2002; http://www.w3.org/TR/soap12-af/.
18. H. F. Nielsen, E. Christensen, and J. Farrell, WS-Attachments, Internet Draft, June 2002.
19. UDDI Version 3 Specification; http://uddi.org/.
20. K. Ballinger et al., Web Services Inspection Language (WS-Inspection) 1.0; http://www-106.ibm.com/developerworks/webservices/library/ws-wsilspec.html.
21. Liberty Alliance; http://www.projectliberty.org/.
22. Open Mobile Alliance; http://www.openmobilealliance.org/.
23. Java Community Process, JSR 172; http://www.jcp.org/en/jsr/detail?id=172.
24. J. Ibbotson, ed., SOAP Version 1.2 Usage Scenarios, W3C Working Draft; http://www.w3.org/TR/2002/WD-xmlp-scenarios-20020626/.
25. N. Mendelsohn, M. Nottingham, and H. Ruellan, eds., XML-binary Optimized Packaging, W3C Working Draft; http://www.w3.org/TR/2004/WD-xop10-20040608/.
26. E. Levinson, The MIME Multipart/Related Content-type, RFC 2387, IETF, Aug. 1998; http://www.ietf.org/rfc/rfc2387.txt.
27. N. Mendelsohn, M. Nottingham, and H. Ruellan, eds., SOAP Message Transmission Optimization Mechanism, W3C Working Draft; http://www.w3.org/TR/2004/WD-soap12-mtom-20040608/.
28. A. Karmarkar, M. Gudgin, and Y. Lafon, eds., SOAP Resource Representation Header, W3C Working Draft; http://www.w3.org/TR/2004/WD-soap12-rep-20040608/.
INDEX
AAC audio, 392
Absolute location, 442
Access, generally: control, security services, 374; Internet service providers (ISPs), 109; offline, 144–145, 265; points, wireless LAN network, 142; workload characterization, 144–145
Acknowledgments (ACKs): negative, 95–98, 346; positive, 93, 95, 345
ACME (architecture for content delivery in the mobile environment): defined, 5–6; overview of, 183–185; performance analysis: in CDMA networks, 191, 196–197; slotted ALOHA system, 191–196; system description, 191; in radio resource management, 201; terminal power consumption in, 185–186, 198; user interest correlation: algorithm, 197–198; simulations, 198–200; traces, 198
ACME Director: characteristics of, 190–192, 197–198; development of, 185–186; effectiveness, 198–200; terminal power consumption, 198–199
Active cache, 90
Active handover, IP multicast system, 344
ActiveX, 487
Activity factor, defined, 141
Adaptation: appearance, 217–219; quality, in multimedia streaming, 281–283
Adapters, SOAP, 491
Adaptive threshold, 121
Adaptive TTL protocol, 120–122, 125
Adaptive Web caching, 123–124
Additive increase, multiplicative decrease (AIMD) algorithm, 281
Administratively scoped multicast, 98
ADSL (asymmetric digital subscriber line), 31
Advanced encryption standard (AES), 378
Advanced mobile phone system (AMPS), 9
Agent-driven negotiations, 48
AIM, 99
Akamai, 91
Alert protocol, WAP 1.0, 77
AMR (adaptive multirate voice codec), 29, 208, 248, 392
Content Networking in the Mobile Internet, Edited by Sudhir Dixit and Tao Wu. ISBN 0-471-46618-2. Copyright © 2004 John Wiley & Sons, Inc.
Angle of arrival (AOA), location estimation, 449–450, 462
Angulation, 447
Antenna technologies, 23
Antipiracy protection, 374
AOL Anywhere, 135
Apache Cocoon, 213
Appearance adaptation, 217–219
Application layer: defined, 3; multicast, see Application layer multicast; multimedia streaming, QoS control, 278, 280–284; overview of, 91–92; scalable content delivery, 91–92
Application layer multicast: characteristics of, generally, 103–104, 330; functions of: content distribution, 111; end-user subscription, 111; overlay setup, 110; tree organization, 110–111; rationale for: application layer routing, 107; asynchronous delivery, 106; easy deployment, 105–106; effective transport, 104–105; versatility, 107
Application-level multicast, 168, 174
Application service provider (ASP), 466, 469
Architecture, see specific types of systems: core network, 11–13; end-user performance, GSM/EDGE and WCDMA, 31–33; GSM/GPRS/EDGE, 24–27; IS-95 radio access, 27–29; operator performance, GSM/EDGE and WCDMA, 29–31; overview of, 9–10; standardization framework, 10; WCDMA radio access network: architecture, 13–14, 362–363; beyond 2 Mbps with HSDPA, 20–21; evolution of, 22–24; layer 2/3, 14–18; physical layer, 18–20
ARIB, 10
ARM (active reliable multicast), 99
ASCII, 208, 428
ASF (advanced streaming format), 209
Asymmetric encryption, 377–378
Asynchronous delivery, 106
Asynchronous multicast push (AMP), 119, 125
AT&T, 138
Atmospheric conditions, impact of, 277
Attachments, 505–506
Audio, streaming, 285, 338. See also Multicast content delivery; Multimedia streaming
Audiovisual content: characteristics of, 208; nonscalable, 208–209; scalable, 209–210
Authentication: digital rights management, 402–404; HTTP/1.1, 48; OMA data synchronization, 269; security services, 373, 375, 507; 3GPP MBMS system, 366
Authentication, authorization, and accounting (AAA), 290, 430
Automatic retransmission query (ARQ), 26
Automesh, defined, 113
Availability, security services, 374
Avantgo, 144–145
Backbone ISPs, 109
Backend architecture, 143
Band-segmented transmission-orthogonal frequency-division multiplex (BST-OFDM) modulation, 351
Bandwidth: caching and, 126; distribution trees, 116; efficiency, 185, 190; IP multicast system, 107–108, 347; multicast content delivery, 333–334; multimedia streaming, 282–283; significance of, 2; streaming media, 283; wireless LAN network, 142; wireless mobile environment, 315
Bandwidth Broker (BB), 285
Bandwidth-on-demand services, 16
Basestation controller (BSC), 26
BAT system, location estimation, 456–457
Bayesian network, location estimation, 452, 455
Beamforming antenna, 23
BEEP (block extensible exchange protocol), 499
Beep Science AS, 390; mobile DRM system, 392–393
Berners-Lee, Tim, 39
Bidirectional access networks, 347
Bit error rate (BER), 183, 315
Bit rate capability, 10
Blogs (Web logs), 479
Bloom filters, 123
Bluetooth, 206, 232–233, 261, 264
Body area network (BAN), 3
Bottlenecks, 6, 123, 136, 184–185
B-pictures, streaming video, 286
Broadcast control channel (BCCH), 15
Broadcast/multicast control (BMC) protocol, 15
Browsing charges, 433, 435
Bus cable, 261
Business-to-business (B2B), 421, 478–479
Byte-range operations, HTTP/1.1, 45, 48
Cache, see Caching: applet, 90; consistency, 118, 120–122; HTTP/1.1, 48–50; mesh, 122, 126; miss, 117; radius, 125; router proposal, 99
Caching: implications of, generally, 45, 48–50; proxy, 117–125; scalable content delivery, 91–92; servers, streaming media, 289–290, 311; web-based applications: information dissemination, 87–90; information exchange, 90; web proxy, 117–126
Callback address, 135
Call detail records (CDRs), 411–412, 424–425, 427–431
CAMEL (customized application for mobile network enhanced logic), 425
Canonical (CNAME), streaming media, 306
CAP3 (CAMEL Application Part 3), 425
Carriage return and a linefeed (CRLF), 40
Cascaded unicast, 97–98
Cascading style sheets (CSS), 81, 210–211, 228–229
CC/PP, see Composite capability/preference profiles (CC/PP)
CCTrCH, 16
CDMA (code-division multiple access): characteristics of, generally, 11; classical near-far problem, 19; fast power control, 18–19; location estimation, 462–463; networks, ACME in, 191, 196–197
cdma2000, 27, 196
Cell broadcast service (CBS), 358
Cellular digital packet data (CDPD) technology, 381
Cellular phones, 66, 144–145, 153
CGI (common gateway interface), 469
Channels, error-prone, 315
Characteristics adaptation, 217, 220, 239
Charging: advice of charge, 429; architecture, 433; correlation, 428–429; differentiated, 425–426; fixed-line telephony, 411–414; flow-based, 426–428; information, 431–435; interfaces, 429–431; mediation, 428; mobile content features: business-to-business (B2B), 421; multiple access, 422; multiple services in delivery, 423–424; overview, 415–416; postpaid charging, 419–421; prepaid charging, 419–421, 432; records, source of, 422–423; revenue chain, 416–417; roaming, 422; subscription models, 417–419; mobile telephony, 414–415; rating, 429; records: creation of, 424–425; source of, 422–423; rules, 429; scenarios: browsing, 433, 435; downloads, 423, 436; person-to-person messaging, 435–436; streaming video, 436–437; scope of, 410–411
CIBER, 421
Circuit-switched (CS) domain, 11–13
Cisco, 89, 103, 124
Classical near-far problem, 19
Client information request headers, 58–59
Client response preferences request headers, 59–62
CNN.com, 87–88
Coded orthogonal frequency-division multiplex (COFDM) modulation, 351
Common control channel (CCCH), 15
Common packet channel (CPCH), 16–17
Composite capability/preference profiles (CC/PP): content adaptation, 231–233; exchanges, 303–305, 313; repository, 304; streaming media, 299, 301–305
Conditional request headers, 57–58
Confidentiality, security services, 373, 507
Conflict resolution, data synchronization, 260, 270
Congestion: implications of, 47, 125; losses, 315–316; multicast content delivery, 333–334; multimedia streaming, 281–283
Connections: IP multicast system, 337; persistent, HTTP/1.1, 46–47
Connectivity, 3, 126
Consumption: IP multicast system, 337; IPDC, 349–350; multicast content delivery, 334
Content, generally: caching, 5; defined, 2; distribution, 111; IP multicast system, 337; metadata, 230, 236–237; negotiations, 45, 47–48; types, media: audiovisual content, 208–210, 222–223; nonaudiovisual content, 223–224; textual content, 207–208; value chain, 376–377; verification, 402–403
Content adaptation: application scenarios: content selection, browsing, 241–244; transcoding, multimedia messaging service, 244–250; architectures: configurations, 239–240; location of adaptation, 237–239; capabilities: composite capability/preference profiles (CC/PP), 231–233; defined, 230; subscriber databases, 233; UAProf, 233–235, 238–239, 248–249; user-agent information, 231; future directions for, 251–252; implications of, generally, 6; metadata, 230, 236–237; methods of: content selection, 225–228, 241–244; hybrid approaches, 230; multimedia transcoding, 221–225, 244–250; rendering at the client, 228–230; motivation for, 205–207; multimedia content types: application data, 214–215; media content, 207–210; presentation content, 210–214; procedural code, 215; standardization, 251; types of adaptation: appearance, 217–219; characteristics, 217, 239; encapsulation, 221, 239–240; format, 216–217, 239; size, 219–220, 239
Content analysis, workload characterization: content modification pattern, 138–139; content popularity, 138; content size, 138; content types, 138; defined, 137; notification, 164–168, 176–179; web browsing, 145–149, 176–179
Content control engine (CCE), 392
Content delivery: multicast, see Multicast content delivery; web proxy caching, 118–120
Content delivery networks (CDNs): characteristics of, 3, 187, 454, 457–459; end-system acceleration, 187–188; mobile, 189; network scaling, 187; optimization, content and protocol, 188–189
Content networking, generally: characteristics of, 1–2; defined, 2; in the mobile internet, 2–4
Content provisioning system (CPS): IPDC system, 356; IP multicast system, 350
Content scrambling system (CSS), 386
Content servers (CS), 143, 310, 380
Content synchronization: adoption of: constraints, adherence to, 262–263, 269–270; mobile device scenarios, 261–262; change detection, 260, 270; conflict detection/resolution, 260, 270; data storage and, 255–257; implications of, 6; need for, 257–258, 273–274; OMA standards: data synchronization protocol, 268–273; overview of, 263–264; representation, 264–267; types of: delta sync, 260; fast sync, 260
Content synchronization, types of (continued): full sync, 259–260; one-way, 258–259, 269; slow sync, 259–260; two-way, 258–259, 269
Continuous multicast push (CMP), 119, 125
Contributing source (CSRC), 305
CONTROL, GPS system, 459
Cookies, WAP 2.x, 82
COPS (common open policy service), 285
Copyright, 225
CORBA, 478, 487
Cricket system, location estimation, 456
Crying baby problem, 94, 100
Cumulative distribution function (CDF), 146, 149
Curly, 54
Customizer, 226–227
CWTS (China Wireless Telecommunication Standard Group), 10
DA delay, 97
DAML-S, 483
Database server, 144
Data gathering, web workload characterization, 3, 144
Datagroups, IP multicast, 110
Data synchronization, defined, 257. See also Content synchronization
DCE, 487
DCOM, 478, 487
DCT (discrete-cosine transform), 385, 396
Debugging, 46
DECE, 478
Decryption, 380
Dedicated channel (DCH), 16–17
Dedicated control channel (DCCH), 15
Dedicated physical control channel (DPCCH), 16–17, 19–20
Dedicated physical data channel (DPDCH), 16–17, 19
Dedicated traffic channel (CTCH), 15
Deering, S., 93
Degree/minutes/seconds (DMS) system, physical location, 441
Delay, generally: budget, 284; DA, 97; jitter, 277–280; loss detection, 284; queuing, 184; spread, 444
Delay-locked loop (DLL), 462
Delta encoding, 120
Delta synchronization, 260
Denial of service (DoS), 372, 374
Dependency graph, 118
Deployment, multicast content delivery, 334
Designated receivers (DRs), 96
Desktop access, 144–145
Desktop computers, 261
Desktop users, web workload, 147, 178
Destination, adaptation architecture, 238–240
Device Independence, 252
DHCP (dynamic host configuration protocol), 331
Differential GPS (DGPS), 460
Differentiated Services (DiffServ), 284–285, 310
Diffraction: defined, 443; loss, 445
Digest authentication, 45, 48
DigiBox, 391
Digital, 138
Digital audio broadcast (DAB) system, 352
Digital cameras, 217
Digital fingerprinting, 378–379, 400–401
Digital Fountain, 100, 102, 119
Digital item adaptation (DIA), 251
Digital rights locker, 383–384
Digital rights management (DRM): authentication, 373; content distribution business models: characteristics of, 380–381; floating licenses, 382; microtransactions/micropayments, 382; pay per view, 382–383; promotion models, 381; subscription-based, 381; superdistribution, 382; functional architecture, 376; information architecture, 376; mobile (MDRM), terminal requirements, 384–385; multicast content delivery, 338; multimedia services, 321; overview of, 375–381; security protocols, 373–374, 379–380, 384; significance of, 7; streaming media, 290
Digital signatures, 377–378
Digital video broadcasting terrestrial (DVB-T) network, IPDC system, 348, 352–354
Digital watermarking, 375, 377–379
Dilution of precision (DOP), 463
Direct-sequence spread-spectrum (DSSS) signal, 457
Direction services, 147
Director effectiveness, 198–200
Directory-based volumes, 121
Disconnection, 256
Discovery protocols, 518–519
Dispatchers, SOAP, 491
Disseminate information, 5
Distance-based location update strategy, 461–462
Distributed computing, 478
Distributed shared memory (DSM), 117
Distribution tree: application layer, 110–112; construction of: maintenance, 116; neighbor selection, 113–114; parent selection, 114–116; peer discovery, 112–113
DMDmobile, 390
DNS (domain name service), 91
DoCoMo location platform (DLP), 469
Document object model (DOM), 223–224
Document type definition (DTD), 266, 268
DOM (Document Object Model), 522
Doppler effects, 343
Downlink shared channel (DSCH), 17
Downlink signaling, unidirectional, 346–347
Downloader program, 144
Downloads: charges for, 423, 436; web services, 484
Duplicate avoidance (DA), 95–97
DVB-H (DVB handheld), 351–353, 357
DVB-M (DVB mobile), 351
DVB-S (DVB-satellite), 357
DVB-T (DVB-terrestrial), 184, 197
DVB-X, 351
DVDs, intellectual property management, 386
DVMRP (distance vector multicast routing protocol), 96, 115
DWDM, 184
Dynamic content, defined, 88–90
Dynamic objects, 88–89, 125
ECMAScript, 81
E-commerce, 91, 356–357
Edge caching, 184–185
EdgeScape, 458
Edge Side Includes (ESI), 189
EDI/B2B industry, 474
EFRC, 208
Eight-level vestigial sideband (8-VSB) modulation, 351
Electronic data interchange (EDI), 478–480
Electronic service guide (ESG), 353
Email, 258, 297, 483–484. See also Mail service
Encapsulation adaptation, 220–221, 239–240
Encoding: chunked, 47; error-resilient, 283
Encryption: characteristics of, 48, 375, 377–378; format compliant, 396–397; progressive, 397; scalable, 394–395, 397–398; selective, 395–396; SOAP Message Security, 508–510
EncrypTix, 390
End-system acceleration, 187–188
End-to-end, generally: architecture, 288–290; congestion, 281; connection, 18; headers, HTTP/1.1, 50–51
E-911 calls, 460–464
Entity headers, HTTP/1.1, 51, 54–56
Ericsson, 67, 263
Error-concealment, 283
Error control, multimedia streaming, 283–284
Estimation, in location estimation process, 453–454
Ethernet LANs, 108
ETSI, 10
EU IST, BRAIN, 321
European Digital Video Broadcasting Terrestrial (DVB-T) system, 350–353, 357
Event-based charging, 418
EVRC, 218
Exchange information, 5
Expanding-ring search (ERS), 96, 112–113
eXtensible Markup Language (XML): characteristics of, 209, 265–267, 271–272, 480–485; XML-RPC, 494; XML/XML schema, 483
eXtensible rights Markup Language (XrML), 379, 387, 509
eXtensible Style Language Transformation (XSLT), 223–224, 229, 487
Extensible Stylesheet Language (XSL) Transformations (XSLT), 212–214, 228
External functional interface (EFI), 83
False-hit rate, 123
Fast fading, 343
Fast hit, 118
Fast synchronization, 260
Fault tolerance, 256
Fcast, 100
FDMA, 196
FEC packets, 101–102. See also Forward error correction (FEC)
Federal Communications Commission (FCC), 443, 460
Feedback implosion, 94–95
Fees, 7, 358. See also Charging
Filecast: IPDC system, 355; IP multicast system, 338
File delivery over unidirectional transport (FLUTE), 339
Fine-grained scalable (FGS) compression, 394–395, 397
Firewalls, 486
First-hop router, 102
Fixed-access networks, 1
Fixed-line telephony, charges for, 411–414
Fixed subscription, 417
Flexible layer one (FLO), 26–27
Flow-based charging, 426–428
Format adaptation, 216–217, 239
Format compliant encryption, MDRM, 396–397
Forward access channel (FACH), 17
Forward error correction (FEC): ACME architecture and, 188; content caching, 100–102, 104; IPDC system, 355; IP multicast system, 346; multimedia streaming, 280, 283, 286, 317
Free-space loss, 445
Freeware organizations, 474
Front-door server, 143
FTAM, 428
Full synchronization, 259–260
Fusion tree, 96
Gateway GPRS support node (GGSN), 13, 311, 320, 357, 361, 364–366, 415, 428–429
Gateway mobile location center (GMLC), 465, 467–468
Gateways, generally: SOAP, 491; wireless access, 144–145
Gaussian minimum shift keying (GMSK), 25
General headers, HTTP/1.1, 51–54
Geographic locality, system load analysis, 162–163
Geographic positioning, 275
Geographic push caching, 120
GeoPoint services, 458
Geotargeting, 458–459
GeoTraffic analysis services, 458–459
GIF, 209, 216–218, 220, 222, 248, 250
Global Mobile Suppliers Association, 10
Global navigation satellite system (GNSS), 459
Global positioning system (GPS), 459
GLONASS system, 459
GMSC (gateway MSC), 13
Google.com, 88
GPRS (general packet radio service), see GSM/GPRS/EDGE: characteristics of, generally, 83, 206; charging for, 415, 422–423, 426; network, 183
GPRS RAN (GERAN), 320, 361–362
GPRS SGSN, 414–415
GPRS tunneling protocol (GTP), 364
GPRS/WCDMA, 32–33
Group key Kg, 383
Group management hierarchy (GMH), digital rights management, 401
G.723.1, 208
G.729, 208
GSM (global system for mobile communication), see GSM/EDGE; GSM/GPRS/EDGE: characteristics of, generally, 208; evolution of, 66; phones, 205; standardization, 13
GSM (Groupe Spécial Mobile), 414
GSM Association, 10
GSM/EDGE, 9–10
GSM/GPRS/EDGE: end-user performance, 31–33; GSM principle, 24–26; operator performance, 29–31; radio access network architecture, 26; service creation principle, 26–27
GSM/GPRS, synchronization, 261
GSM MSC, 414
GSM900, 29
Hacks, 384
Handheld Device Markup Language (HDML), 5, 67
Handheld PCs, 205
Handover (HO)/handoff, IP multicast system, 343–345
Handover packet loss, 316–317
Handshake protocol: DRM security, 379–380; WAP 1.0, 77
HARQ, 28
Harvest Web cache, 121–123
Headers, HTTP/1.1: end-to-end, 50–51; entity, 51, 54–56
Headers, HTTP/1.1 (Continued): functions of, 50–51; general, 51–54; hop-to-hop, 50–53, 61, 65; request, 51, 56–63; response, 51, 63–65
Heavy-tail distribution, 138–139
Helix, 390–391
Hierarchical caching, 117, 119, 122, 126
High-definition television (HDTV), 350
High-speed circuit-switched data (HSCSD), 24
High-speed DSCH (HS-DSCH), 21
HLR (home location register), 12
Hop-by-hop headers, HTTP/1.1, 50–53, 61, 65
Hop count, 107
Host mobility, 315, 317–318
Hot billing, 425
Hotspots, 319, 361
HSDPA (high-speed downlink packet access), 4, 20–22, 26, 29, 31–33
HTTP/0.9, 40
HTTP/1.0, 410
HTTP/1.1: authentication, 43–44, 48; byte-range operation, 48; caching, 48–50; chunked encoding, 47; content negotiations, 47–48; goals of, 44; headers: end-to-end, 50–51; entity, 54–56; functions of, 50–51; general, 52–54; hop-to-hop, 50–52; request, 56–63; response, 63–65; persistent connections, 46–47; request methods, 45–46; response methods, 45; status code with description, 43
HTTP/2.0, 53
Human factors studies, 1
Hybrid networks, 97–98, 347
Hypertext Markup Language (HTML): characteristics of, 210; macros, 188–189; multimedia transcoding, 222
Hypertext Transfer Protocol (HTTP), see HTTP/0.9; HTTP/1.0; HTTP/1.1; HTTP/2.0: basic operation, 39; defined, 5, 38; evolution: HTTP/0.9, 40; HTTP/1.0, 41–44; HTTP/1.1, see HTTP/1.1; general operation, 38–39; GET, 479, 486, 510–511; IP multicast systems, 338; OMA data synchronization, 264; popularity of, 188; POST, 467–468, 479, 486, 499, 511; scalable content delivery, 91
IANA, 56
IBM, 263, 508
ICAP Forum, 251
ICP (Internet caching protocol), 123
Identifiers, content synchronization, 263, 270, 273
Idle-mode reception, 363
IGMP (Internet group management protocol), 97, 99, 365
IMEI code, 233
IMT-2000 technologies, 10, 23, 466
Indoor location estimation system: infrared-based, 457; scene-analysis-based, 454–455; ultrasound-based, 455–457
Infopyramid: characteristics of, 225–226; content selection application, 242–243; creation process, 227
Information, generally: dissemination, 5, 87–90; exchange, 5, 89–91; security, see Security
InfoSpace, 135
Infrared, 264
Infrared (IR) signals, location estimation, 445–446
Initialization, OMA data synchronization, 269–270
Inktomi: "Media Distribution Network," 116; reverse proxy, 91
Integrated cellular/WLAN environments, 319–321
Integrated Services (IntServ), 284, 310
Integrity, security services, 373, 507
Intellectual property, 375. See also Moving Picture Experts Group (MPEG), intellectual property management
Intelligent network (IN), 414
Interference, 25, 277, 336, 445
Interleaving, 280
Intermediary, adaptation architecture, 238–240
Internet, generally: access, 264
Internet, generally (continued): infrastructure, 92; TV broadcasts, 109
Internet Engineering Task Force (IETF): filecast, 339; functions of, 18, 290; group key management, 402; IP Multicast, 368; media discovery, 339–340; Multicast Security, 330; OPES Working Group, 187; RFCs: 1889, 305, 308; 2326, 293; 2327, 297; streaming media, 284, 339; web services interoperability (WS-I), 519
Internet media guides (IMGs), 340
Internet multimedia system (IMS), 310
Internet service providers (ISPs): functions of, 109, 458; wireless, 164, 319
InterTrust, 380
Intranet, 264
Invalidation, web proxy caching, 121
IP datacast (IPDC): characteristics of, 7; concept of, 347–348, 357–358; e-commerce for (e-CS), 356–357; IP infrastructure, 354–355; mobile wireless radio networks, 350–354; services and applications, 348–350; service system, 355–356; system architecture, 350
I-pictures, streaming video, 285
IP layer, scalable content delivery, 91–92
IP multicast system, generic: common aspects, 336; networking procedure, 340–342; reference system model, 336–337; three-platform services, 337–340
IP multimedia subsystem (IMS), 11, 13, 367
IP-PDN, 310–311, 320
IPMP-Ds, 386
IPMP-ES, 387
IPSec, 83
IP unicast, 93, 345
IPv4, 83, 91, 311, 365
IPv6, 83, 91, 365
IPv6 Forum, 10
IRC/6.9, 53
IS-95: characteristics of, 9–10
IS-95 (continued): evolution of, 10; radio access, 27–28
ISO, 285, 290
IS-2000, 27
ITU-T: H.261, 223; H.263, 209, 217, 223, 285, 392
Japanese Terrestrial Integrated Service Digital Broadcasting (ISDB-T) system, 350, 352–353, 357
Java, RMI, 478
Java Community Process (JCP), 522
JavaEnabled, 232–233
Java MIDP, 215
Java Server Pages, 214
Java Specification Request (JSR), 522
Jini, 478
JMS, 499
JPEG, 209, 216–218, 220, 222–223, 248
JPEG2000, 209, 216, 220, 223
J2ME web services, 522
JVMVersion, 232–233
Kazaa, 381
Kerberos, 478, 509
k nearest-neighbor averaging, 453
Large-scale active middleware (LSAM), 124
Large-scale path loss, 445
Latency: implications of, 185, 190; reduction, 125, 185, 190
Lateration, 447
LGMP, 98
Liberty Alliance, 522
Limited-scope multicast (LSM), 98–99
Limited subscription, 418
Line of sight (LOS) path, 443–445
Link layer, 2, 183
LISAP (location information service access protocol), 469
Location-based services: based on cellular systems: location service platform, 468–470; mobile location protocol (MLP), 467–468; system architecture, 465–467; characteristics of, generally, 7–8; location estimation algorithm: defined, 440
Location-based services (Continued): location estimation algorithm: proximity, 454, 457–459; scene analysis, 450–454; triangulation, 447–450; location estimation media: infrared (IR), 445–446, 457; radiofrequency (RF), 443–445; ultrasound, 446–447, 455–457; location estimation system: defined, 440; indoor, 454–459; outdoor, 459–464; location format transformation (LFT), 464–465; location sensor infrastructure, 440; location taxonomy: absolute location, 442; physical location, 441; relative location, 442; symbolic location, 441–442
Location Interoperability Forum, 264
Location service system (LCS), 465–467
Log analyses, workload characterization, 161–163, 172–174
Logical key hierarchy (LKH), digital rights management, 401
Lookahead window, 118
Loose coupling, 319–320
Lorax, 98
Loss detection delay, 284
Loss rate, 107
Lotus/Lotus Notes, 261, 263
Low-end mobile phones, 264, 270
Loyalty schemes, 419
MAC (media access control) layer: functions of, 91; scalable content delivery, 91–92; WCDMA, 14
Macrocell environment, 31
Mail servers, 258
Mail service, 147, 165
Manual mesh, distribution tree, 113
Mapping, data synchronization, 263, 270, 273
Matching, location estimation, 451–453, 455
Maximum transmission unit (MTU), 281, 353
MBone, 89
Media discovery: IPDC system, 356; IP multicast system, 338, 340–341
Media gateway (MGW), 13
Media gateway control function (MGCF), 13
Mediation, charging issues, 428
Media transcoder, architecture of, 218, 221–222
Media transport protocols, 308
Memory, 232–233
Meridian lines, 441
Mesh overlay network, 110, 113
Message sequence chart (MSC), 268, 461
Metadata: HTTP/1.0, 41; security strategies, 375, 377, 379
Metropolitan area wireless network, workload characterization analysis, 141
MFTP, 100
Microcells, 278
Micromobility protocols, 320
Microsoft, 508
Middleware, 2, 478
MIME (multipurpose Internet mail extensions), 41, 231, 511, 512
Mirror servers, 255
MMS-IOP, 264
Mobile cinema ticketing, 275–276
Mobile clients, web workload, see Web workload
Mobile content: charging for, 409–437; delivery, for the Internet, 189; digital rights management, 371–404; security, 371–404
Mobile DRM (MDRM): state-of-the-art component technologies: efficient key management, 401–402; encryption, generally, 394–398; error concealment, 402–403; format compliant encryption, 396–397; multimedia content verification, 402–404; progressive encryption, 397; public key watermarking system, 398–401; scalable encryption, 394–395, 397–398; selective encryption, 395–396; state-of-the-art systems: components of, 388–389; Helix, 390–391; integrated model, 391; NEC VS-7810, 390–392; Nokia Music Player, 390–391; table of, 390; terminal requirements, 384–385, 392–393
Mobile Information Device Profile (MIDP), 522
Mobile Internet: architecture, overview of, 4, 9–33; characteristics of, generally, 2–4; content adaptation, 205–252; content networking, 1–8; protocols for, 35–84
Mobile location protocol (MLP), 466–468
Mobile phones, 197. See also Cellular phones
Mobile terminals, web services, 520–521
Mobile Web services, see Web services
Mobile web users, 178–179
Mobile wireless multicast: characteristics of, 335–336; mobility/movement of users, 342–345; radio transmission errors, 345–346; unidirectional downlink bearers, 346–347
Mobile wireless networks, multimedia streaming: end-to-end architecture, 288–290; media delivery protocols, 291–305; multimedia services, 315–321; QoS control: application layer, 275–284; network layer, 284–285; streaming media codecs, 285–288; streaming media transport protocols, 305–308; 3GPP packet-switched streaming service, 308–315
Mobile wireless radio networks, IPDC system: characteristics of, generally, 350–353; DVB-T/H, 352–354
Modality axis, Infopyramid, 226
Modulation, GSM, 25
Modulo caching, 124–125
Motorola, 67, 263
Moving Picture Experts Group (MPEG), intellectual property management: MPEG-4 IPMP hook, 386–387; MPEG IMP extensions, 387; MPEG-21 (Rights Data Dictionary), 387; MPEG-2 videos, copy protection, 385–386
Mozilla Windows, 145
MP, 209
MP3, 392
MPEG: characteristics of, generally, 251; IMP extensions, 387; video, 395–396
MPEG-4: AAC, 209, 285; AVC, 216; characteristics of, 107, 223, 286; IPMP hook, 386–387; video, 392
MPEG-7, 299, 340
MPEG-21 (Rights Data Dictionary), 251, 387
MPEG-2: systems standards, 352; videos, copy protection, 385–386
MSC/VLR (mobile services switching center/visitor location register), 12
MSNBC server traces, 139–140
MSN Mobile, 135 Multicast application layer, see Application layer multicast characteristics of, 5–7, 93 confinement, 96 content delivery, see Multicast content delivery datagrams, 93 distribution tree, 113–116 FIB (forwarding information base), 93 gain, 333 IP characteristics of, generally, 93–94 need for, 108–110 multigroup, 98 push, 118–120 reliable, see Reliable multicast (RM) scalable content delivery, 91–92 streaming services, 321 Multicast content delivery applications, generally, 327–328, 332–335 architecture, 188 as communication technique, 331–332 future directions for, 367–368 generic IP multicast system common aspects, 336 networking procedure, 340–342 reference system model, 336–337 three-platform services, 337–340 IP datacast (IPDC) concept of, 347–348, 357–358 e-commerce for, 356–357 IP infrastructure, 354–355 mobile wireless radio networks, 350–354 services and applications, 348–350 service system, 355–356 system architecture, 350 justification for, 328–330 mobile wireless multicast characteristics of, 335–336 mobility/movement of users, 342–345 radio transmission errors, 345–346 unidirectional downlink bearers, 346–347 multimedia broadcast multicast service (MBMS) commercial interfaces, 366 concept of, 358–360, 366–367 in core network, 364–365 data sources, 365–366 radio access networks, 362–364 service center (BM-SC), 361, 365–366
Multicast content delivery (Continued) services and applications, 360–361 standardization, 359 system architecture, 361–362 perspectives of, 330–331 Multicast content distribution, digital rights management, 381 Multicast distributed virtual cache, 124 Multicast expanding-ring search, distribution tree, 112–113 Multifrequency network (MFN), 351 Multigroup multicast, 98 Multihit sessions, 154 Multihoming, 347 Multimedia broadcast multicast service (MBMS) characteristics of, generally, 7, 24, 184, 197 commercial interfaces, 366 concept of, 358–360, 366–367 in core network, 364–365 data sources, 365–366 radio access networks, 362–364 release schedule, 360 service center (BM-SC), 361, 365–366 services and applications, 360–361 standardization, 359 system architecture, 361–362 Multimedia data streams (MDS), 394 Multimedia message adaptation (MMA), 246 Multimedia messaging service (MMS) charging, 426 content adaptation, 217, 219, 221, 239 multimedia transcoding, 222, 244–250 wireless access protocol (WAP), 81 Multimedia messaging service center (MMSC), 239–240, 244, 246, 248–250, 426 Multimedia streaming architecture integrated schemes, 319–321 logical, 288–290 overview of, 276–278, 288 benefits of, 276 characteristics of, generally, 6 challenges for, 277–278, 282 codecs audio compression, 286 in 3GPP, 286–287 video compression, 285–286 defined, 276 delivery protocols session control, 291–305 transport protocols, 305–308 future directions for, 321–322 QoS application layer, 278, 280–284
network layer, 284–285, 316–317 overview of, 278–280 seamless, 320 3GPP packet-switched streaming service characteristics of, 308–310 domain architecture, 310–312 PSS framework, 312– 314 wireless environment congestion losses, 315– 316 handover packet loss, 316–317 integrated cellular/WLAN environments, 319–321 mobility-aware server selection, 318 request routing, 318 transmission error loss, 315–316 Multimedia transcoding advantages of, 224 architecture, 221–222 audiovisual content, 222–223 drawbacks of, 224– 225 nonaudiovisual content, 223–224 procedural code, 224 Multimedia units (MMUs), content adaptation, 216, 219–221 Multiparty multimedia session control (MMUSIC), 340 Multipath fading, 277, 444 Multiple description coding (MDC), 283 Multiple-input multiple-output (MIMO), 23 Multipoint-to-multipoint (m-t-m) multicast, 331 Multipoint-to-point (m-t-p) multicast, 331–332 Multiprogram transport streams, 354 Multiprotocol encapsulation (MPE) digital video broadcasting (DVB), 353–354 IPDC system, 348, 350, 354–355, 357 Multiuse sensor environment (MUSE), 455
Napster, 107, 381 Narrowband systems, 29 Navigational service, WAP 2.x, 83 Nearest-neighbor averaging, 453 NEC VS-7810, 371, 390–392 Negative acknowledgments (NAKs) characteristics of, 95–98 implosion, 95 Neighbor selection, distribution tree, 113–114 Netscape, 231, 238 Network infrastructure, 2–3 Networking, defined, 2 Network layers multicast, 330, 332
multimedia streaming, QoS control, 284– 285, 316 –317 overview of, 91–92 Network scaling, 187 Nibble system, 452 –455 NLANR trace, 198, 200 NMT, 9 Nokia Music Player, 371, 390–391 Series 60 platform, 521 synchronization standards, 263 wireless access protocol (WAP), 67 Non-line of sight (NLOS) path, 443–444 Nonmesh overlay network, 110 Nonrepudiation, 374 Notification document, 163 Notification messages, 163–164, 364. See also Notification workload Notification server, 143 Notification workload, characterization of content analysis message popularity analysis, 167– 168 notification message, 164– 168 popular categories, 164 –167, 176 –178 log analyses, 172– 174 significance of, 163–164 system load analysis, 171–172 user behavior analysis load distribution, 168–169 spatial locality, 168 –171 web browsing correlations, 174–178 Oasis, 475, 508, 520 OBEX, 264 Observed time difference (OTD), 464 Offline, generally access, 144 –145, 265 charging, 421 users content analysis, 147 system load analysis, 160, 162 user behavior analysis, 151–152 Offloading TCP processing, 188 On-demand, see Video on demand caching, 126 request-response, 118 One-way synchronization, 258–259, 269 Online charging, 421 Open GIS (geographic information system) Consortium (OGC), 440, 465 Open Mobile Alliance (OMA) content adaptation, 245, 251 content synchronization data synchronization protocol, 268– 273
representation, 264–267 functions of, 6, 84, 410 web services, 522 Open-source organizations, 474 Open System Interconnection (OSI), 1, 92 Openwave, 66 Orchestrators, SOAP, 491 Origin server charging records, 422 streaming media, 288 OSA (Open Service Access), 431 OSPF (open shortest path first), 115 OTDOA, 467 OTERS, 98 Outdoor location estimation systems cellular-based system, 460–464 GPS-based system, 459–460 Outlook Express, 261 Overcast, 111, 114, 116 Overlay networks, 103, 108, 458 Over-the-air synchronization, 261
Packager, digital rights management, 380 Packet-switched (PS) domain, 11, 13, 24 Packet(s), generally decoding failure, 26 loss, 184, 315–317 retransmission, 283–284 Packet data convergence protocol (PDCP), 15 Packet data protocol (PDP), 364 Packet-switched streaming service (PSS), 3GPP characteristics of, 308–310 domain architecture, 310–312 framework, overview, 312–314 setup procedures, 313–314 Paging channel (PCH), 16 Paging control channel (PCCH), 15 Palm Inc., 263 Palm OS, 215 Parent selection, distribution tree, 114–116 Passive attacks, 372 Passive caching, 289 Passive handover, IP multicast system, 344 Path loss exponent, 445 Payload DRM system, 378–379 format, 339 Peak loads, 125 Peer discovery, distribution tree, 112–113 Peer-to-peer (P2P), generally applications, 479 file sharing systems, 381, 383
Personal area network (PAN), 3 Personal devices, 3 Personal digital assistants (PDAs), 66, 144, 151, 153, 160–161, 163, 165, 197, 205, 215, 228, 261, 264, 383 Personal information management (PIM), 214–215, 262–263 Person-to-person messaging, charges for, 435–436 PGM, 99, 103 Phone.com, 66 Physical common packet channel (PCPCH), 17 Physical layer functions of, 2, 10, 91 multicast, 331 1X, IS-95, 27–28 scalable content delivery, 91–92 WCDMA radio access network, 18–20 Physical location, 441 Piezoelectric ceramics, location estimation, 456 Piggyback cache invalidation (PCI), 119, 121 Piggyback cache validation (PCV), 119, 121 Piggybacking, 117 PIM-DM, 96 PING roundtrip time, 183 Plane earth loss, 445 Playback rate, multimedia streaming, 276, 282–284 Playout buffer, streaming media, 279–280, 282 PNG format, 209, 216 Point of contact, distribution tree, 112 Points of presence (POPs), 107, 110 Point-to-multipoint (p-t-m) multicast, 331–332, 358, 361–364 Point-to-point multicast, 362–363 Policy enforcement server (PES), 392 Polling every time protocol, 120, 122, 125 Portals, packet-switched streaming service, 311 Postpaid charging, 419–421 P-pictures, streaming video, 286 Precise positioning service (PPS), 459–460 Prefetch threshold, 118 Prefetching, 118, 125, 151, 190 Prepaid charging, 419–421, 432 Presentation content device-independent, 214 overview of, 210 stylesheets, 210–213 Presentation layer, scalable content delivery, 92 Primary common control physical channel (P-CCPCH), 17 Private key watermarking, 398–399 Proactive caching, 289–290
Proactive multicast, 126 Probability-based volumes, 121 Profile servers, packet-switched streaming service, 311 Profiling, location estimation, 451 Program service information (PSI), 353–354 Progressive encryption, 378, 397 Protocol data units (PDUs), 75 Provisioning service, WAP 2.x, 83 Proximity, location estimation algorithm, 454, 457–459 Proxy, generally caching basics of, 117–118 cache consistency, 120–122 cache cooperation, 117, 122–125 content delivery, 118–120 limitations of, 125–126 clients, IPDC system, 354 filters, 119, 121 server, streaming media, 289, 310 SOAP, 491 Psion, 264 PSVP signaling, 284–285 Public key infrastructure (PKI), 404, 478 Public key watermarking, 398–400 Public land mobile network (PLMN), 361 Public safety answering point (PSAP), 460–464 Public switched telephony network (PSTN), 411 Push multicast, 118–120 wireless session protocol (WSP), 74, 81, 233 Push-pull scheme, 190 QCELP, 208 QPSK modulation, 28 Quality of service (QoS) ACME, 184 importance of, 3, 6, 8, 10 multimedia streaming, 278–285 Queuing delay, 184 QuickTime, 107, 111 RADAR, 454–455 Radio access bearer (RAB), 312 Radio access network (RAN), 11, 320, 361–365 Radio bearers (RBs), 362–363 Radio frequencies, WCDMA, 23 Radiofrequency (RF) signals, location estimation generally, 443 interference factors, 445 multipath propagation, 443– 444
Radio modulation, types of, 351 Radio network controllers (RNCs), 11, 14, 357, 461 Radio propagation implications of, 336 models, 444 Radio resource control (RRC), 16, 311 Radio resource management, 201 Radio traffic engineering, 201 Radio transmission errors, IP multicast system, 345–346 RADIUS (remote authentication dial-in user service), 424, 430–431 RAMP (reliable adaptive multicast protocol), 103 RAN/GERAN, 312 Random access control channel (RACH), 16–17 Random hopping, 25 Rayleigh fading, 343 Rayleigh propagation, 444 RDF (resource definition framework), 223, 232 RealAudio, 392 Real Audio G2, 209 Real Networks, 89 RealServer 8, 116 Real-time packet-based services, 32 Real-time streaming protocol (RTSP) IP multicast system, 339 streaming media characteristics of, 291–292, 305 messages, 293 packet-switched streaming, 313 request messages, 293–295 response messages, 295–296 session setup, 296 Real-time transport control protocol (RTCP) IP multicast system, 339 streaming media, 305–306, 315–316 Real-time transport protocol (RTP) IP multicast system, 339 streaming media, 305–308, 316 RealVideo, 392 Received signal strength (RSS), 444 Receiver-driven reliability, 95 Redundant transmissions, 280 Reed-Solomon code, 101–102, 317 Relative location, 442 Reliable multicast (RM) active (ARM), 99 challenges of, 94–95 characteristics of, 93–94 distributed recovery, 95–98 FEC-based recovery, 100–102 NAK-based recovery, 95
router-assisted recovery, 98–100 state of the art software, 102–103 Reliable multicast transport (RMT), 339 Remote procedure call (RPC), SOAP, 501, 503–504 Remote synchronization, 261 Repeated unicast model, 117 Replication, 255–256, 258 Representational state transfer (REST), 484, 486–487 Requantization, 283 Request headers, HTTP/1.1, 51, 56–63 Request line, 293 Request routing, multimedia streaming, 318 Resolution adaptation, 217–219, 239 axis, Infopyramid, 226 Response headers, HTTP/1.1, 51, 63–65 Retransmission, 26, 74–75, 283 Revenue chain, 416–417 Revenue sharing, 7 Reverse proxy, 91 RFC, generally 1458, 103 1945, 40, 42–43 2068, 44 2616, 44 Rich calls, 32 Rician propagation, 444 Rights fulfillment server (RFS), 380–381 Rights issuer server (RIS), 392 RLC (radio link control) layer, WCDMA, 14 RMDP, 100, 119 Roaming charges, 422 Roundtrip times (RTTs), 31–32, 95–96, 107, 281 Routers multicast-capable, 93 SOAP, 491 Routing application layer, 107 IP multicast system, 340–342 protocols, distribution trees, 115 RTP/UDP traffic, 308 SAML, 509 Sandpiper, 91 Satellite Internet services, 190 SAX, 223–224, 522 Scalable encryption, MDRM, 394–395, 397–398 Scalable vector graphics (SVG), 209–210 Scattering, 443 SCE (single-connection emulation), 93, 95
Scene analysis, location estimation algorithm estimation, 453–454 indoor systems, 454–455 matching, 451–453, 455 profiling, 451 RF-based, rationale for, 450–451 SDES packets, 307 SDPng, 340 Second-generation (2G) cellular communications, 36, 257, 335 Secure electronic transaction (SET), 374 Secure Interactive Broadcast Infotainment Services (SBIS) project, 383 Secure socket layer (SSL), 44, 68, 384, 469, 508 Security attacks, types of, 372, 507 content-based media security, 374–375 emerging technologies state-of-the-art MDRM component technologies, 394–404 state-of-the-art MDRM systems, 387–394 importance of, 7 information, see Information security intellectual property, see Moving Picture Experts Group (MPEG), intellectual property management mechanism, 373 overview of, 371–374 services, types of, 373 SOAP messages, 507–510 Selective encryption, 378, 395–396 Semantic Web, 479–480 Server architecture accesses, types of, 144–145 components of, 143–144 content server, 143 database server, 144 data log description, 144 front-door server, 143 notification server, 143 significance of, 142–143 Server-driven negotiations, 47 Server selection, mobility-aware, 318 Server volumes, 119 Service and delivery management system (SDMS), 356 Service lookup, WAP 2.x, 83 Service-oriented architecture (SOA), 475–477 Serving GPRS support node (SGSN), 13, 311, 357, 361, 364–365 Session announcement protocol (SAP), IP multicast systems, 338–340 Session-based charging, 418
Session control, streaming media description languages, 298–302 H.323 protocol, 298 real-time streaming protocols, 292–296 session description protocol (SDP), 297–298 SIP protocol, 298 UAProf specification, 303–305 wireless session protocol (WSP), 298 Session description protocol (SDP) IP multicast systems, 338–340 streaming media, 292, 297–298 Session duration, 137 Session inactivity period, 153–154 Session initiation protocol (SIP), 310 Session layer, scalable content delivery, 92 Shadowing, 445 Short message service (SMS), 24, 221, 244, 357, 414 Short message service center (SMSC), 357, 414, 423 Siblings, 122 Signal-to-interference ratio (SIR), 23, 196–197 Simple object access protocol (SOAP) attachments and, 505–506 bindings, 490, 499–500, 516–517 characteristics of, generally, 8, 318, 481–482, 489 defined, 481, 489 deployment environments, 490–491 encoding, 500, 504–505 example of, 492–494 HTTP binding, 499–500 message security, 507–510 mobile terminals, 521 1.2, changes to, 510 processing body, 496 fault message, 497–499 header, 496–498 structure, 494– 496, 498– 499 styles document, 500–502 RPC-style, 501, 503–504 Single-hit sessions, 154 Single-source multicast (SSM), 93 Size adaptation, 219–220, 239 Sleep, 342, 363 Slotted ALOHA system, ACME performance analysis, 191–196 Slow hit, 118 Slow synchronization, 259–260 Smallest k-vertex polygon, 453 Smart phones, 205, 264 SMTP (simple mail transport protocol), 91, 499
SNMP trace, wireless LAN network, 141 Soft handover, 19 Source, adaptation architecture, 237–238, 240 SPACE, GPS system, 459 Spatial locality analysis, 140, 158–160 Spatial redundancy, 285 Spectral efficiency, 29– 30 Speech compression, 208 –209 Split-proxy architecture, 189 –190 Sponsorship, 418–419 Spreading factor (SF), 16 Squid Web cache, 121 SRM (scalable reliable multicast), 103 SSL/TLS, SOAP Message Security, 508–509 Standard positioning service (SPS), 459 Standard transcoding interface (STI), 251 Starfish Software, 264 Static content, defined, 88, 90 Status line, 293 Stock quotes, 147, 161, 166, 174 Streaming, generally IP multicast system, 338 IPDC system, 355 media, 89 video, charging for, 436–437 Stream thinning, 109 Stylesheets, 210–213, 223 Subcast ERS, 96 –98 Subcasting, 96–97 Subcast repair, 98 Subscriber databases, 233 Subscription charges, 417–419 Summary cache, 123 –124 Sun XML LDI Extensions tag library, 214 Surrogate server, 290 SVG, 216, 223 Symbolic location, 441–442 Symmetric encryption, 377– 378, 380 Synchronization, see specific types of synchronization multimedia streaming, 277 protocol, PDAs, 151, 160–161 Synchronization source (SSRC), streaming media, 305 Synchronized Multimedia Integration Language (SMIL) characteristics of, generally, 210, 222 –223 streaming media, 292, 298 –301, 308 –309, 312 –313 SyncML Initiative, 262– 264 SyncML synchronization, 215, 262 Systematic hopping, 25 System load analysis, workload characterization defined, 137, 140
notification, 171–172 web browsing, 160–161 TACS, 9 Talarian, 103 TAP, 421 Tau-dither loop (TDL), 462 TCP (transmission control protocol), see TCP/IP -based invalidation, 121–122 connections, 125, 164 defined, 32 IP multicast system, 345–346 tcpdump, 141 TCP/IP, see Transmission control protocol/Internet protocol (TCP/IP) TDMA (time division multiple access), 196 Telecommunications market, 1 Television broadcasts, preprogrammed, 349 Temporal redundancy, 285 Temporal stability analysis, 137, 139, 155–158 Third-generation (3G) cellular, generally communications, 335 multicast service, see Multimedia broadcast multicast service (MBMS) Third Generation Partnership Project (3GPP) on charging, 410 generally, 10, 13, 18, 22–24, 197, 244 packet-switched streaming service characteristics of, 308–310, 321 domain architecture, 310–312 PSS framework, 312–314 streaming media, 286–287, 308–314 Third-party search service, 466–467 13K, 208 3GPP2, 10, 13, 290 Three mile system, multicast services, 328 Tibco, 103 Tight coupling, 319–320 Time difference of arrival (TDOA), location estimation, 447–449, 462, 464 Time-division multiple access (TDMA) system, 24, 26, 462, 464 Timeliness, in security, 507 Time of arrival (TOA), location estimation, 447–448, 460, 462, 464 Time-to-live (TTL) values, 96, 112, 117–118 Tivoli NetView “Distribution Manager”/“Software Distribution,” 116 TMTP, 97–98 T1P1, 10 Tornado code, 101–102 Trace-driven simulations, 122 Traffic, content synchronization and, 255 Transaction-based charging, 418
Transaction identifier (TID), WAP 1.0, 74–75 Transcoding, 109. See also Multimedia transcoding Transmission, generally error loss, 315 –316 primitives, 514 Transmission control protocol/Internet protocol (TCP/IP), 2–3, 5, 38, 41, 83–84, 185, 261 Transmission power control (TPC), 20 Transparent caching, 117, 122, 124, 126 Transport format combination indication (TFCI), 19–20 Transport layer, scalable content delivery, 91–92 Transport layer security (TLS) characteristics of, 469, 508 wireless, 69, 76–77, 83 Transport protocols implications of, 136 streaming media HTTP tunneling, 291, 308 real-time, 305–308 RTSP tunneling, 291, 308 Triangulation, location estimation algorithm, 447–450 TTA, 10 TTC, 10 Tunneling GPRS protocol, 364 HTTP, 291, 308 IP, 89 RTSP, 291, 308 Turbo coding, 19 Two-way synchronization, 258–259, 269 UA header, 221–222, 231, 248 UAProf content adaptation, 233 –235, 238, 240, 248 – 249 streaming media, 302 –303 UDP charging for, 427 streaming media, 307 –308 Ultrasound signals, location estimation, 446– 447 UMTS (universal mobile telecommunications system) architecture, 10, 206, 310 –311 charging, 419 QoS classes, 311, 314 UMTS Forum, 10 UMTS PLMN (public land mobile network), 13 UMTS terrestrial RAN (UTRAN), 11, 16, 310, 320, 361 –362
UMTS/WLAN architecture, 319 Unicode (UTF-8/UTF-16), 208 Uniform resource identifier (URI), 38, 41, 292, 313–314, 480, 484, 487, 494–495, 509, 511 Uniform resource locator (URL), 61, 64, 74, 112, 122, 144, 146–147, 149, 248–249, 292, 519 U.S. Advanced Television System Committee (ATSC) system, 350–351, 357 Universal Description, Discovery, and Integration (UDDI), 518 Universal transverse mercator (UTM), 441 Universal Wireless Communications Consortium (UWCC), 10 Unwired Planet (UP), 66–67 UP.Browser, 145 Uplink, generally dedicated channel, 22–23 signaling, 346–347 URI RFC, 41 Usage shaping, multicast content delivery, 334 User agent profiling (UAProf), 81 User behavior analysis, workload characterization defined, 137 load distribution, 150–153 notification workload, 168–171 spatial locality, 140, 158– 160 temporal locality, 139 temporal stability, 139, 155–158 user request arrival and duration, 139 wireless user sessions, distribution of, 153–156 wireline web and mobile web compared, 170, 178–179 User, generally behavior analysis, see User behavior analysis interest correlation, 186 level granularity, 152–153 load distribution, 137 location notification service, 467 servers, packet-switched streaming service, 311 USER, GPS system, 459 User equipment (UE), 11–12, 15, 22, 311, 363–365 User interest correlation algorithm, 197–198 simulations, 198–200 traces, 198 Username, 509 UTRA (universal terrestrial radio access), standardization of, 10 UTRA FDD, 15 UTRAN FDD, 16
Value-added services (VASs), 2, 413–414 Variable-length coding (VLC), 286, 396 –397 Verizon, 508 versit vCard, 270 Video, generally splitters, 89 streaming, 285– 286, 338. See also Multicast content delivery; Multimedia streaming Videoconferencing, interactive, 288 Video on demand (VoD), 109, 349 –350 Video transmission, multicast content delivery, 332–333 Vindigo, 144 Virtual hosting support, 45 VLR (visitor location register), 12 –13 Voice over IP (VoIP), 32, 331 Volume leases, 121 –122
Walled garden, 338 WAP 1.0 bearer layer, 77, 83 components of, overview, 69– 70 wireless application environment (WAE) layer, 69, 71–73 wireless datagram protocol (WDP), 69, 73, 76– 77, 82 wireless session layer (WSP), 69, 73–74, 82 wireless transaction protocol (WTP), 69, 73– 76, 82 wireless transport layer security (WTLS), 69, 76– 77, 83 WAP Forum, 264 Watermarked content, 379 Watermarking digital, 375, 377 –379 dual watermarking-fingerprinting system, 400 –401 public key vs. private key, 398 –399 WBMP, 209, 248 WCCP (Web Cache Control Protocol), 124 WCDMA (wideband code-division multiple access) radio access network architecture, 13–14, 362–363 beyond 2Mbps with HSDPA, 20– 21 evolution of advanced antenna technologies, 23 enhanced uplink dedicated channel, 22–23 multimedia broadcast and multicast service (MBMS), 24 new frequency variants, 23 layer 2/3, 14–18 physical layer, 18–20
WCDMA (wideband code-division multiple access) technology architecture, generally, 9–10, 196 core network, 10– 11 radio access network, see WCDMA radio access network standardization, 12 Weak cache consistency, 118 Weather service, MMS adaptation, 249–250 Web-based applications classification of, 116– 117 information dissemination, 87 –90 information exchange, 89–91 Web browsing workload, characterization of content analysis content size, 146 document popularity, 148–149 overview, 145 popular content categories, 146–148 correlation with notification workload, 174–178 log analyses, 161–163 system load analysis, 160–161 user behavior analysis load distribution, 150– 153 overview, 149–150 spatial locality, 158–160 temporal stability, 155–158 wireless user sessions, distribution of, 153–155 Web retailers, digital rights management, 381 Web services core technologies, 480–482 defined, 473–475 foundation technologies discovery protocols, 518–519 simple object access protocol (SOAP), 489–511 the Web, 483–487 Web Services Description Language (WSDL), 511– 518 XML/XML schema, 483 hype, 482–483 identity management, 522–523 mobile terminal, 520–522 motivating technologies, 477–480 privacy, 522–523 service-oriented architectures (SOA), 475–477 standards, 487–489 Web Services Interoperability Organization (WS-I), 519–520 Web Services Description Language (WSDL) bindings, 515–516
Web Services Description Language (WSDL) (Continued ) characteristics of, generally, 8, 481 –482, 511–512 defined, 481 message data, 512–514 1.2, 516–518 operations, 512, 514–515 Web Services Inspection Language (WS-IL), 519 Web Services Interoperability Organization (WS-I), 475, 519–520 Web workload, see Web browsing workload characterization of, generally analysis, types of, 137 motivation for, 136 impact of, 5, 135–136 notification workload, characterization of content analysis, 164–168 log analyses, 172–174 significance of, 163–164 system load, 171–172 user behavior analysis, 168–171 server architecture accesses, types of, 144–145 components of, 143 –144 data log description, 144 significance of, 142–143 web browsing workload, characterization of content analysis, 145–149 log analyses, 161–163 system load analysis, 160 –161 user behavior analysis, 149–160 web browsing correlated with notification amount of usage, 174– 176 popular content categories, 176 –178 wireless user workload characterization, 140–142 wireline user workload characterization content analysis, 138–139 overview of, 137–138 system load analysis, 140 user behavior analysis, 139–140 wireline web compared with mobile web system load, 179 user behavior, 179 web content, 178–179 Wide area network (WAN), 3 Wireless access, 144–145 Wireless Access Protocol (WAP) charging for, 414, 423–424, 427 defined, 5, 66 evolution of architecture, 67–69 overview, 66–67
future directions for, 83–84 mobile content delivery, 189 synchronization, 261 traffic analysis at Bell Mobility’s PCS, 141 WAP 1.0 components, 69–77 WAP 2.0, architecture overview, 78–80 WAP 2.x components application framework, 80–81 bearer networks, 83 security services, 83 service discovery, 83 session services, 81–82 transfer services, 82 transport services, 82–83 Wireless environments, constraints in, 277 Wireless identity module (WIM), 83 Wireless Internet service provider (WISPs), 164, 319 Wireless LAN (WLAN) charging, 422 content adaptation, 206 radiofrequency signals, 443 relative location, 442 streaming media, 275, 319–321 synchronization, 261 workload characterization study, 141–142 Wireless Markup Language (WML), 5, 210, 214, 223, 225, 228 Wireless profiled TCP (WP-TCP), 82 Wireless public key infrastructure (WPKI), 83 Wireless session protocol (WSP), 69, 73–74, 82, 264–265 Wireless transport layer security (WTLS), 69, 76–77, 83 Wireless transport security layer (WTSL), 83 Wireless users content analysis, 147 notification workload, 178 system load analysis, 162 user behavior analysis, 151–159 web browsing workload, 177–178 Wireless Village, 264 Wireline Internet, 184 Wireline users, 178–179 WML language, 71–73 WMLScript, 71, 73, 81 World Wide Web (WWW), see Web services current status of, 37 future directions for, 37–38 historical perspectives, 35–37 impact of, generally, 5, 473 protocols for, see HyperText Transfer Protocol; Wireless Access Protocol (WAP)
World Wide Web Consortium (W3C) composite capability/preference profiles (CC/PP) working group, 231, 252 functions of, generally, 81–82, 251– 252, 290, 474 –475 Web Services Architecture working group, 474 WS-Security, 508 WTA (wireless telephony application), 71
X.509, 509 Xbone, 116 xDSL, 422 XForms, 211 XHTML 2.0, 210–211, 214, 216 XHTML mobile profile markup languages (XHTML MP), 81, 223 XML/XML schema, 483 XML-RPC, 494 XSD, 512
Yahoo.com, 88 Yahoo Mobile, 135 Yellow Pages, 147, 161