Jeng-Shyang Pan, Hsiang-Cheh Huang, and Lakhmi C. Jain (Eds.) Information Hiding and Applications
Studies in Computational Intelligence, Volume 227

Editor-in-Chief: Prof. Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warsaw, Poland. E-mail: [email protected]

Further volumes of this series can be found on our homepage: springer.com
Jeng-Shyang Pan, Hsiang-Cheh Huang, and Lakhmi C. Jain (Eds.)
Information Hiding and Applications
Prof. Jeng-Shyang Pan
National Kaohsiung University of Applied Sciences, No. 415, Jiangong Rd., Sanmin District, Kaohsiung City 80778, Taiwan (R.O.C.)
E-mail: [email protected]

Prof. Hsiang-Cheh Huang
Department of Electrical Engineering, National University of Kaohsiung, No. 700, Kaohsiung University Rd., Nan Tzu Dist., 811 Kaohsiung, Taiwan (R.O.C.)
E-mail: [email protected]

Prof. Dr. Lakhmi C. Jain
University of South Australia, Mawson Lakes Campus, Adelaide, South Australia SA 5095, Australia
E-mail: [email protected]

ISBN 978-3-642-02334-7
e-ISBN 978-3-642-02335-4
DOI 10.1007/978-3-642-02335-4

Studies in Computational Intelligence ISSN 1860-949X
Library of Congress Control Number: Applied for

© 2009 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.

Printed on acid-free paper

springer.com
Foreword
It is generally agreed that the three basic requirements of an information hiding system are the quality of the watermarked contents, the embedding capacity, and the robustness. However, these three factors conflict with each other, and a trade-off must be made for different applications. Designing a robust watermarking system is a technically challenging task.

This book offers various schemes for robust watermarking of digital images and videos in the spatial, compressed, and frequency domains. Chapter 1 presents a robust image watermarking scheme in the frequency domain. This scheme uses a genetic algorithm (GA) to design an applicable system that obtains good image quality, reasonable hiding capacity, and acceptable robustness. A robust video watermarking method is presented in Chapter 2. This method is robust against image processing operations (e.g., rotation, scaling, clipping and translation, random distortion, and combinations thereof) while maintaining picture quality. Chapter 3 offers a technique for digital image inpainting, used to repair scratches or stains in aged images or films. This scheme uses a data embedding technique to embed object data into a video to prevent the objects from being eliminated. Chapter 4 introduces a data hiding system in the compressed domain. This scheme employs a hybrid data embedding technique using side match vector quantization (SMVQ) and gray-code computation. The proposed scheme achieves reasonable image quality and a reasonable compression ratio while guaranteeing the security of the hidden data. A framework of scale-space feature point based robust image watermarking (SSFW) is presented in Chapter 5. This watermarking scheme combines the scale-invariant feature transform (SIFT) and Zernike moments. Chapter 6 proposes a robust watermarking system in the frequency domain. This scheme introduces a new watermarking system based on intelligent perceptual shaping of a digital watermark using Genetic Programming (GP). An image authentication scheme based on digital signatures is introduced in Chapter 7. The scheme utilizes a novel concept named the lowest authenticable difference (LAD). This scheme is robust against JPEG and JPEG 2000
compression and scaling simultaneously. Chapter 8 proposes a new genetic fingerprinting scheme for copyright protection of multicast video. The proposed method has three features, namely, adaptive fingerprinting, effective transmission, and security and imperceptibility. Chapter 9 proposes two lossless data hiding schemes for gray-level halftone images. Human visual system (HVS) characteristics are utilized to reduce the visual distortion introduced by data embedding. In addition, the microstructure statistical features of error-diffused halftone images are exploited to enhance the data capacity. A robust watermarking technique with high capacity in the frequency domain is presented in Chapter 10. The scheme employs the genetic algorithm (GA) and a chaotic map. A block-based chaotic map is developed to increase the number of significant coefficients in the transformed image, leading to higher embedding capacity.

This edited book is a valuable reference for information hiding techniques in general and robust watermarking techniques in particular.

January 2009
Taichung, Taiwan
Prof. Chin-Chen Chang
Department of Information Engineering and Computer Science
Feng Chia University
Contents

Genetic Watermarking for Copyright Protection
Hsiang-Cheh Huang, Chi-Ming Chu, Jeng-Shyang Pan . . . 1

Dual-Plane Correlation-Based Video Watermarking for Immunity to Rotation, Scale, Translation, and Random Distortion
Isao Echizen, Yusuke Atomori, Shinta Nakayama, Hiroshi Yoshiura . . . 21

Restoring Objects for Digital Inpainting
Yung-Chen Chou, Chin-Chen Chang . . . 47

A Secure Data Embedding Scheme Using Gray-Code Computation and SMVQ Encoding
Chin-Chen Chang, Chia-Chen Lin, Yi-Hui Chen . . . 63

Robust Image Watermarking Based on Scale-Space Feature Points
Bao-Long Guo, Lei-Da Li, Jeng-Shyang Pan . . . 75

Intelligent Perceptual Shaping in Digital Watermarking
Asifullah Khan, Imran Usman . . . 115

Semi-fragile Image Authentication Method for Robust to JPEG, JPEG 2000 Compressed and Scaled Images
Chih-Hung Lin, Wen-Shyong Hsieh . . . 141

Genetic-Based Fingerprinting for Multicast Multimedia
Yueh-Hong Chen, Hsiang-Cheh Huang . . . 163

Lossless Data Hiding for Halftone Images
Fa-Xin Yu, Hao Luo, Shu-Chuan Chu . . . 181

Information Hiding by Digital Watermarking
Frank Y. Shih, Yi-Ta Wu . . . 205

Subject Index . . . 225
List of Contributors

Yusuke Atomori
The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, 182-8585, Japan
[email protected]

Chin-Chen Chang
Feng Chia University, Taichung, Taiwan, R.O.C.
[email protected]
http://msn.iecs.fcu.edu.tw/ccc/

Yi-Hui Chen
National Chung Cheng University, Chiayi 621, Taiwan, R.O.C.
[email protected]

Yueh-Hong Chen
Far East University, Tainan, Taiwan, R.O.C.
[email protected]

Yung-Chen Chou
National Chung Cheng University, Chiayi 621, Taiwan, R.O.C.
[email protected]

Chi-Ming Chu
National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan, R.O.C.
[email protected]

Shu-Chuan Chu
Cheng Shiu University, Kaohsiung, Taiwan, R.O.C.
[email protected]

Isao Echizen
National Institute of Informatics, 2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan
[email protected]

Bao-Long Guo
Xidian University, Xi'an 710071, P.R. China
[email protected]

Wen-Shyong Hsieh
Shu-Te University, Kaohsiung, Taiwan, R.O.C.
[email protected]

Hsiang-Cheh Huang
National University of Kaohsiung, Kaohsiung, Taiwan, R.O.C.
[email protected]
http://hchuang.ee.nuk.edu.tw/

Lakhmi C. Jain
University of South Australia, Adelaide, SA, Australia
[email protected]
http://www.kes.unisa.edu.au/

Asifullah Khan
Pakistan Institute of Engineering and Applied Sciences, Nilore-45650, Islamabad, Pakistan
[email protected]

Lei-Da Li
Xidian University, Xi'an 710071, P.R. China
[email protected]

Chia-Chen Lin
Providence University, Taichung 43301, Taiwan, R.O.C.
[email protected]

Chih-Hung Lin
Southern Taiwan University, Tainan, Taiwan, R.O.C.
[email protected]

Hao Luo
Zhejiang University, 310027, Hangzhou, P.R. China
[email protected]

Shinta Nakayama
The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, 182-8585, Japan
[email protected]

Jeng-Shyang Pan
National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan, R.O.C.
[email protected]
http://bit.kuas.edu.tw/jspan/

Frank Y. Shih
New Jersey Institute of Technology, Newark, NJ 07102
[email protected]

Imran Usman
Pakistan Institute of Engineering and Applied Sciences, Nilore-45650, Islamabad, Pakistan
[email protected]

Yi-Ta Wu
New Jersey Institute of Technology, Newark, NJ 07102

Hiroshi Yoshiura
The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, 182-8585, Japan
[email protected]

Fa-Xin Yu
Zhejiang University, 310027, Hangzhou, P.R. China
[email protected]

1 Genetic Watermarking for Copyright Protection

Hsiang-Cheh Huang1, Chi-Ming Chu2, and Jeng-Shyang Pan2
1 National University of Kaohsiung, Kaohsiung 811, Taiwan, R.O.C.
  [email protected], http://hchuang.ee.nuk.edu.tw/
2 National Kaohsiung University of Applied Sciences, Kaohsiung 807, Taiwan, R.O.C.
  [email protected], http://bit.kuas.edu.tw/~jspan
Summary. Applications of robust watermarking form one of the major branches in digital rights management (DRM) systems. Based on existing experience in assessing how good a robust watermarking scheme is, it is generally agreed that three parameters or requirements need to be considered: the quality of the watermarked contents, the survivability of the extracted watermark after deliberate or unintentional attacks, and the number of bits embedded. However, the performances relating to these three parameters conflict with each other, and a trade-off must be sought. In this chapter, we take these requirements into consideration and find the optimized combination among the three parameters. With the aid of the genetic algorithm, we design an applicable system that obtains good quality, acceptable survivability, and reasonable capacity after watermarking. Simulation results demonstrate the effectiveness of the proposed algorithm in practical implementation and its possible applications.
1.1 Introduction

Multimedia contents are easily spread over the Internet. Due to the ease of delivery and modification of digital files, copyrights might be infringed upon. To deal with this problem, digital rights management (DRM) systems can prevent users from using such contents illegally [3]. In DRM systems, encryption and robust watermarking are the two major schemes for applications [6, 7, 10]. When data are protected by encryption, the encrypted digital contents look like random, noisy patterns, which will cause eavesdroppers to suspect the existence of hidden secrets. Furthermore, if one bit is received erroneously during transmission, part or all of the received data cannot be decrypted, rendering such contents useless. Under
the umbrella of watermarking research, the main goal is to cope with deliberately or unintentionally applied modifications, called attacks, and this kind of watermarking scheme is regarded as robust watermarking. Watermarking can be classified into two major categories: one is fragile watermarking (also named data hiding or steganography), and the other is robust watermarking. In fragile watermarking, the watermarked content looks identical to its original counterpart, but it cannot withstand even the slightest intentional or unintentional modification, since such modification makes the embedded watermark vanish. From this viewpoint, fragile watermarking and encryption are similar in that both are vulnerable to modifications. In robust watermarking, the watermarked contents and their original counterparts look similar, or even identical from a subjective point of view. The major advantage of robust watermarking is that the embedded watermark can survive even when the watermarked content gets modified. Thus, we focus on robust watermarking and propose an applicable solution with optimization techniques for DRM implementation. In this chapter, we use digital images to represent the multimedia contents. It is generally agreed that for a robust watermarking algorithm, the watermarked image quality (or imperceptibility), the survivability represented by the correct rate of the extracted watermark (or robustness), and the number of bits embedded (or capacity) are the three most important factors in assessing how good the algorithm and its implementation are. However, some trade-off must be sought because the three factors conflict with each other. Here we employ a genetic algorithm (GA) [1] to numerically find an optimized solution that can reach better imperceptibility, more robustness, and reasonable capacity. The scheme is directly applicable to DRM systems. This chapter is organized as follows. In Section 1.2 we point out the need for optimization in a watermarking system. In Section 1.3 we describe the proposed algorithm, which modifies and extends previous works. Two detailed case studies are presented in Section 1.4, and simulation results are demonstrated in Section 1.5. Finally, we draw conclusions in Section 1.6.
1.2 Watermarking Requirements

As stated in Section 1.1, the three major requirements for robust watermarking are imperceptibility, robustness, and capacity. Their interrelationships can be discussed as follows, showing why they conflict with each other and why the trade-off among the three should be reached with the aid of optimization techniques.

1.2.1 Watermark Imperceptibility
Watermark imperceptibility refers to whether the viewer can perceive the existence of the embedded watermark from a subjective point of view [2, 13].
It also refers to the quality of the watermarked image, measured by the distortion induced between the watermarked and original images due to watermark embedding, and it is represented objectively by numerical values such as the peak signal-to-noise ratio (PSNR). The PSNR is defined in Eq. (1.1). Let the original image and the watermarked one be X and X', respectively, with image size M × N. The PSNR is

\mathrm{PSNR} = 10 \times \log_{10} \frac{255^2}{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( X(i,j) - X'(i,j) \right)^2}.   (1.1)

Larger PSNR values imply less distortion, hence better outcomes. To obtain better imperceptibility, less modification of the original content is desired. Hence, we arrive at the following observations.

• Embedding more bits induces more distortion in the original image. Thus, embedding fewer bits, namely, decreasing the capacity, meets the goal.
• If we set the capacity to a constant value, embedding bits into least significant bits or higher-frequency coefficients alters the image as little as possible. However, this degrades robustness, since the watermarked image may undergo filtering and the hidden watermark may vanish.
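For concreteness, the PSNR of Eq. (1.1) can be computed as in the following short sketch. This is our own illustration in Python/NumPy, not code from the chapter:

```python
import numpy as np

def psnr(original: np.ndarray, watermarked: np.ndarray) -> float:
    """Peak signal-to-noise ratio of Eq. (1.1) for 8-bit images X and X'."""
    diff = original.astype(np.float64) - watermarked.astype(np.float64)
    mse = np.mean(diff ** 2)          # (1/MN) * sum of squared differences
    if mse == 0:
        return float('inf')           # identical images
    return 10.0 * np.log10(255.0 ** 2 / mse)
```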
1.2.2 Watermark Robustness

Watermark robustness means the capability of the watermarked media to withstand deliberate or unintentional media processing, called attacks, including filtering, resizing, or rotation [6, 7]. There are also benchmarks for performing attacks [8]. From a subjective viewpoint, the extracted watermark needs to be as similar as possible to the embedded one. Objectively speaking, the correlation between the two needs to be measured. People often use the bit correct rate (BCR), defined in Eq. (1.2), to assess the robustness of a watermarking algorithm. The normalized cross-correlation (NC) between the embedded watermark and the extracted one may also be considered to assess the survivability of the extracted watermark. Let the embedded and extracted watermarks be W and W', respectively, with size M_W × N_W. Because of its ease of calculation, we choose the BCR to measure the robustness of extracted watermarks. The BCR is

\mathrm{BCR} = 1 - \frac{1}{M_W \times N_W} \sum_{i=1}^{M_W} \sum_{j=1}^{N_W} \left[ W(i,j) \oplus W'(i,j) \right],   (1.2)
where ⊕ denotes the exclusive-or (XOR) operation. The larger the BCR value, the better the result. To make the algorithm robust, the watermark needs to be hidden in more important parts, such as the most significant bits or the low-frequency components, in order to resist common attacks.
• In addition to yielding a larger BCR value, the extracted watermark needs to be recognizable. From this standpoint, the capacity should be more than some pre-determined threshold value.
• Embedding bits into lower-frequency coefficients increases robustness. However, this sacrifices imperceptibility.
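For reference, the BCR of Eq. (1.2) can be computed as in this short sketch (again our own illustration, assuming binary 0/1 watermark arrays):

```python
import numpy as np

def bcr(embedded: np.ndarray, extracted: np.ndarray) -> float:
    """Bit correct rate of Eq. (1.2) for binary watermarks W and W'."""
    errors = np.logical_xor(embedded.astype(bool), extracted.astype(bool))
    return 1.0 - errors.mean()        # fraction of correctly recovered bits
```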
1.2.3 Watermark Capacity

Watermark capacity refers to the number of bits embedded into the original media, that is, the size of the watermark. As the results in [11] show, embedding more bits into the contents directly causes degradation of the watermarked image quality. On the contrary, embedding too few bits may lead to an extracted watermark that is hardly comprehensible, even though the watermarked image quality can be guaranteed. Thus, the watermark capacity needs to be carefully chosen to be meaningful.

• Even though increasing the capacity is desired, the capacity should lie above some threshold in order to make the extracted watermark recognizable. On the other hand, embedding too many bits may sacrifice imperceptibility. If an increase in capacity is feasible under the condition that acceptable watermarked image quality is obtained, encoding the watermark with error control codes (ECC) [5] is a feasible way to meet the requirement.
• Once the capacity is determined, embedding into higher-frequency coefficients meets the goal of imperceptibility. On the contrary, embedding into lower-frequency coefficients meets the goal of robustness. Hence, some trade-off must be sought, and this is the major contribution of the paper in [12].
1.2.4 Optimization for Requirements

Since there are conflicts among the three requirements described in Secs. 1.2.1 to 1.2.3, we employ an optimization technique to find a better outcome. First, the fitness function should be designed. Next, the parameters relating to the optimization technique should be carefully chosen. From the above discussions, we conclude that better imperceptibility, more robustness, and a reasonable capacity are all required when designing the algorithm. For measuring imperceptibility, we use the PSNR of the watermarked image as an objective measure. For evaluating robustness, after applying some deliberate attacks [2], we calculate the BCR. For measuring capacity, we count the average number of bits, C, embedded into one 8 × 8 block, since we are going to embed the watermark with the discrete cosine transform (DCT). The fitness function for training at iteration i is defined as

f_i = \mathrm{PSNR}_i + \lambda_1 \cdot \frac{1}{n} \sum_{k=1}^{n} \mathrm{BCR}_{k,i} + \lambda_2 \cdot C_i.   (1.3)
The first term, PSNR_i, denotes the imperceptibility. In the second term, because we expect to cope with n different attacks, we calculate the robustness after each attack, BCR_{k,i}, and the average of these BCR values serves as the robustness measure. In the third term, C_i denotes the capacity. Because PSNR values are usually more than 30 dB, the BCR values lie between 0 and 1, and the capacity is set to 1 to 4 bits per block after considering practical situations (so the average capacity must lie between 1 and 4 bit/block), the values corresponding to the three components lie in various ranges because of their inherent characteristics. Thus, we introduce two weighting factors, λ1 and λ2, into the fitness function. The main reason is to balance the effects and contributions of these three factors. The goal of our optimization algorithm is to find the maximum value of Eq. (1.3). Figure 1.1 is a conceptual illustration of robust watermarking with GA optimization. The original image X is the input. At the beginning, we set the number of training iterations. In the training process, for every population in the GA, the number of bits for embedding into every 8 × 8 block of the original image, one after another, is decided first, and the watermark capacity is regarded as one of the three parts of the fitness function. Next, the watermark is embedded into the original images, namely, the populations in the GA, and the PSNR values of the watermarked images are obtained. Finally, we apply n different attacks, for instance, JPEG compression, to every population, and we try to extract the embedded watermarks from every attacked image. The BCR values between the embedded and extracted watermarks under different attacks are obtained. After calculating the fitness function, we proceed with this process in the next iteration until the terminating condition is met. Finally, the optimized watermarked image, Y, and the associated secret key, key1, are delivered to the reception side.
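As a small aid to reading Eq. (1.3), a direct transcription into code might look as follows (argument names are our own, hypothetical choices):

```python
def fitness(psnr_i, bcr_list, capacity_i, lambda1, lambda2):
    """Fitness of Eq. (1.3): f_i = PSNR_i + λ1 * mean(BCR_{k,i}) + λ2 * C_i."""
    return psnr_i + lambda1 * sum(bcr_list) / len(bcr_list) + lambda2 * capacity_i
```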
1.3 Proposed Algorithm

We employ a genetic algorithm (GA) to optimize the three requirements above. The GA consists of three major steps: selection, crossover, and mutation [1]. Based on the fitness function in Eq. (1.3) and the dashed box on the left side of the flow graph in Figure 1.1, we propose an integration of our watermarking scheme with the GA procedures.

1.3.1 Preprocessing in GA
The GA is a process that emulates natural selection. We need populations for training in the GA to perform the three steps. The number of populations is generally chosen to be an even number to ease the operation of the crossover step. Every population is composed of chromosomes. A chromosome is a binary string whose binary representation denotes the positions for watermark embedding in one block. It has variable length, which depends on the
[Figure 1.1: flow diagram of the GA-based watermarking process, showing, for each population X_i, watermark embedding, PSNR calculation, attacks #1 to #n, watermark extraction, BCR calculation, fitness evaluation with Eq. (1.3), and the GA selection, crossover, and mutation steps, iterated until the final iteration yields the watermarked image Y and the secret key key1.]

Fig. 1.1. Building blocks of watermarking with GA-based optimization schemes
watermark capacity decided. By concatenating the binary strings of every block in the image, we obtain one population in the GA. Because we perform the 8 × 8 DCT for watermark embedding [12], we have 64 DCT coefficients per block, ranging between 0 and 63, and these coefficient positions are represented by 6-bit strings. If the size of the original image is W × H, and the maximal capacity is C_max bit/block, the length of the population is (W/8) × (H/8) × 6 × C_max bits. For blocks in which fewer than C_max bits are embedded, considering the practicability of implementation, the remaining parts of the chromosome are filled with consecutive 0 bits. In this chapter, the size of the original image is 512 × 512 and that of the binary watermark is 128 × 128. Hence, C_max is set to (128 × 128) / ((512/8) × (512/8)) = 4 bit/block.

1.3.2 Deciding the Capacity in One Iteration
After considering the practical GA implementation issues in Sec. 1.3.1, C_max = 4, and the capacity for every 8 × 8 block is variable, ranging between 1 and 4 bits per block. Calculating the 8 × 8 DCT produces 64 DCT coefficients, indexed from 0 to 63, where 0 denotes the DC coefficient and i denotes the i-th AC coefficient, 1 ≤ i ≤ 63, available for watermark embedding. To clearly explain our implementation, we describe an instance as follows. If the capacity C = 3 for a certain block, and suppose that the 19th, 28th, and 43rd coefficients are selected, then the chromosome for this block is represented by the 24-bit string 010011 011100 101011 000000, where the 18 bits in the first three segments denote the positions, and the final 6 bits
represent the remaining positions that are not selected, intentionally inserted to ease the implementation. At the first training iteration, all the AC coefficients are randomly selected for watermark embedding.
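The per-block chromosome layout can be illustrated as follows. The helper names `encode_block` and `decode_block` are ours, and the zig-zag ordering of coefficients is abstracted away; the worked example reproduces the 24-bit string above:

```python
def encode_block(positions, c_max=4):
    """Encode the selected AC coefficient indices (1..63) of one 8x8 block
    as c_max 6-bit segments; unused segments are padded with '000000'."""
    bits = ''.join(format(p, '06b') for p in positions)
    return bits.ljust(c_max * 6, '0')

def decode_block(bits):
    """Recover the embedding positions, ignoring all-zero padding segments
    (index 0 is the DC coefficient, which is never used for embedding)."""
    segments = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    return [int(s, 2) for s in segments if int(s, 2) != 0]

# Worked example from the text: coefficients 19, 28, 43 with capacity C = 3.
assert encode_block([19, 28, 43]) == '010011011100101011000000'
assert decode_block('010011011100101011000000') == [19, 28, 43]
```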
1.3.3 Embedding Watermark with DCT

We modify conventional schemes [12] for DCT-based watermarking to embed the binary watermark.

Step 1. Perform DCT on the original image: the 8 × 8 DCT is performed on the entire 512 × 512 image. For each block, this yields one DC coefficient, denoted by coefficient 0, and 63 AC coefficients, denoted by coefficients 1 to 63.
Step 2. Determine the capacities and positions: the goal of our algorithm is to search for the proper embedding positions given the decided capacity, leading to a trade-off among the three requirements and suitable positions.
Step 3. Obtain the thresholds for embedding: the average values of the DC and the other 63 AC coefficients over the (512/8) × (512/8) = 4096 blocks serve as the thresholds for watermark embedding.
Step 4. Embed the watermark: the thresholds in Step 3 are represented by a vector a = [a_0, a_1, · · · , a_63], where a_i denotes the average value of the i-th coefficient. The DC coefficient is prohibited from embedding. We use the vector r = [a_1/a_0, a_2/a_0, · · · , a_63/a_0] as the reference for modifying the AC coefficients, where r denotes the ratios between the AC and DC coefficients. Embedding a watermark bit meets one of the two situations below.
• If bit 0 is embedded, the AC coefficient at the selected position is modified. If it is larger than the reference value in r, it is decreased to be smaller than the corresponding element in r by a parameter δ. If not, the value is kept unchanged.
• If bit 1 is embedded, the coefficient is modified in the contrary manner.
Step 5. Perform the inverse DCT on the modified coefficients: the inverse DCT is calculated to obtain the watermarked image. The positions used for watermark embedding are also recorded, and the PSNR of the watermarked image can be obtained.
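The following sketch shows one possible reading of Steps 1, 4, and 5 for a single 8 × 8 block. SciPy's `dctn`/`idctn` stand in for the 8 × 8 DCT, raster rather than zig-zag indexing is used for brevity, and the exact update rule is our interpretation of the bullet points above, not the chapter's code:

```python
import numpy as np
from scipy.fft import dctn, idctn   # 2-D DCT/IDCT (type-II, orthonormal)

def embed_block(block, bits, positions, r, a0, delta=5.0):
    """Embed `bits` into one 8x8 block at the given coefficient positions.

    `r[k]` is the global reference ratio a_k / a_0 and `a0` the average DC
    coefficient, so comparing coeff/a0 against r[k] is done here as
    comparing coeff against r[k] * a0 (an assumption of this sketch).
    """
    coeffs = dctn(block.astype(np.float64), norm='ortho')
    flat = coeffs.reshape(-1)         # raster order; zig-zag omitted
    for bit, k in zip(bits, positions):
        ref = r[k] * a0               # reference level for coefficient k
        if bit == 0 and flat[k] > ref:
            flat[k] = ref - delta     # push below the reference
        elif bit == 1 and flat[k] < ref:
            flat[k] = ref + delta     # push above the reference
    return idctn(flat.reshape(8, 8), norm='ortho')
```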
1.3.4 Choosing Proper Attacks

For verifying robust watermarking, applying attacks to the watermarked image is necessary. However, the attacks need to be properly selected such that the attacked images still retain their meaningfulness and commercial value. For instance, the image cropping attack is unsuitable, since too much information would be discarded and the subjective image quality degraded. Here, we choose three kinds of attacks [8], namely, JPEG compression with different quality
factors (QF), low-pass filtering (LPF), and median filtering (MF), to perform the attacks. Attacked images look similar to their original counterparts after applying these properly selected attacks. The BCR values after these attacks are calculated, and the average of these values is included in the fitness function.
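Such attacks could be realized, for instance, with Pillow and SciPy as below. The chapter itself relies on the benchmark in [8], so these helpers are only illustrative stand-ins:

```python
from io import BytesIO
import numpy as np
from PIL import Image
from scipy.ndimage import uniform_filter, median_filter

def jpeg_attack(img: np.ndarray, qf: int = 80) -> np.ndarray:
    """Re-encode a grayscale uint8 image through JPEG at quality factor qf."""
    buf = BytesIO()
    Image.fromarray(img).save(buf, format='JPEG', quality=qf)
    buf.seek(0)
    return np.asarray(Image.open(buf))

def lpf_attack(img: np.ndarray) -> np.ndarray:
    """3x3 low-pass (mean) filtering."""
    smoothed = uniform_filter(img.astype(np.float64), size=3)
    return np.clip(smoothed.round(), 0, 255).astype(np.uint8)

def mf_attack(img: np.ndarray) -> np.ndarray:
    """3x3 median filtering."""
    return median_filter(img, size=3)
```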
1.3.5 Extracting the Watermark

Let the watermarked image Y in Figure 1.1, after an attack is applied, be denoted by Z. We calculate the DCT of the attacked image Z and generate the new reference vector r' by following Step 4 in Sec. 1.3.3. The extracted watermark bit is determined by one of the two situations below.

• If the selected coefficient divided by the average DC value of Z is smaller than its corresponding element in r', we decide the extracted watermark bit to be 0.
• If the selected coefficient divided by the average DC value of Z is larger than its corresponding element in r', we decide the extracted watermark bit to be 1.
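This decision rule amounts to the following one-liner (argument names are hypothetical; `r_attacked` is the reference vector r' and `a0_attacked` the average DC value of Z):

```python
def extract_bit(coeff_value, k, r_attacked, a0_attacked):
    """Decision rule of Sec. 1.3.5 for the coefficient at position k."""
    return 0 if coeff_value / a0_attacked < r_attacked[k] else 1
```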
1.3.6 Evaluating Fitness

We gather the average capacity C from Sec. 1.3.2, calculate the PSNR from Step 5 of Sec. 1.3.3, obtain the average BCR from Secs. 1.3.4 and 1.3.5, and then combine them to calculate the fitness value with Eq. (1.3). Every population corresponds to one fitness value in each training iteration.
1.3.7 GA Procedures

The generated binary strings are ready for the GA procedures. In this chapter, the number of populations is 10. The selection rate is 0.5, meaning that only the 5 populations with higher fitness values are kept for the next iteration, while the remaining 5 are produced by the crossover operation. The mutation rate is 0.1, meaning that 10% of all the bits are randomly selected and intentionally flipped. The main theme of the GA is to search for the proper coefficient positions for watermark embedding, leading to the associated secret key, key1. The secret key can be delivered with the scheme in [9]. The weighting factor λ1 is set between 0 and 200, and its counterpart λ2 is set between 0 and 50 in the GA training process. The parameter δ for altering the selected DCT coefficients in Step 4 of Sec. 1.3.3 is fixed at 5. The major reason for choosing these values is to balance the effects of the three requirements, because we would like each requirement to contribute roughly equally to some extent. With this setting, the fitness contributions from the three weighted requirements lie in the following ranges:
• PSNR part: from 30 to 55, as observed from simulation results;
• BCR part multiplied by its weighting factor: from 0 to 200, because BCR ∈ [0, 1] and λ1 ∈ [0, 200];
• capacity part multiplied by its weighting factor: from 0 to 200, because C ∈ [1, 4] and λ2 ∈ [0, 50].
Results with these different combinations of weighting factors are verified in Sec. 1.5.

1.3.8 The Stopping Condition
Once the number of training iterations in the GA is reached, the optimization process stops. The population with the largest fitness value in the final iteration is the optimized watermarked image. The secret key corresponding to this image is also delivered to the receiver [9]. A skeleton of the overall GA loop is sketched below.
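The skeleton uses the chapter's parameter values (10 populations, selection rate 0.5, mutation rate 0.1) but the helper structure is ours; the chapter does not specify the crossover point selection or whether the surviving populations are also mutated, so those details are assumptions:

```python
import random

POP_SIZE, SELECTION_RATE, MUTATION_RATE, N_ITER = 10, 0.5, 0.1, 100

def flip(bit):
    return '1' if bit == '0' else '0'

def ga_optimize(population, evaluate):
    """Skeleton of Secs. 1.3.7-1.3.8. `evaluate` maps a chromosome (binary
    string of embedding positions) to the fitness value of Eq. (1.3)."""
    for _ in range(N_ITER):
        population.sort(key=evaluate, reverse=True)
        keep = int(POP_SIZE * SELECTION_RATE)          # keep the best half
        parents, children = population[:keep], []
        while len(children) < POP_SIZE - keep:         # one-point crossover
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))
            children.append(a[:cut] + b[cut:])
        # Mutation: each bit flipped with probability 0.1 (applied to the
        # offspring here; whether elites also mutate is an assumption).
        children = [''.join(flip(c) if random.random() < MUTATION_RATE else c
                            for c in ch) for ch in children]
        population = parents + children
    return max(population, key=evaluate)
```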
1.4 Case Studies in Optimized Embedding

1.4.1 Fixed Embedding Capacity
We choose the test image bridge with picture size 512 × 512, illustrated in Figure 1.2(a). A binary watermark with size 128 × 128 is prepared, shown in Figure 1.2(b). In Figure 1.2, the ratios of width and height between the two images are carefully chosen to be 4 : 1. We will compare with the results shown in [12]. Because the authors of [12] used the normalized correlation (NC) to represent watermark robustness while we use the BCR here, we show the extracted watermarks for subjective evaluation in Figure 1.3. Since both [12] and this chapter denote imperceptibility by the PSNR, we make comparisons in Table 1.1. Based on the settings in [12], which include JPEG quantization tables in the watermarking, as a reference, we obtain reasonable results with the algorithm proposed in this chapter. We choose the JPEG attack with quality factor QF = 80 to validate the proposed algorithm. The weighting factors λ1 and λ2 in Eq. (1.3) are set to λ1 = 50 and λ2 = 0, respectively. The main reason for setting λ2 = 0 is that we can then manually adjust the capacity to examine the trade-off between imperceptibility and robustness. First, we compare the extracted watermarks in Figure 1.3. Figure 1.3(a) shows the one with a capacity of 4 bit/block. Figures 1.3(b)–(d) illustrate those with capacities of 3, 2, and 1 bit/block, respectively, leading to watermark sizes of 128 × 96, 128 × 64, and 128 × 32. Figure 1.3(a) can be clearly perceived, and the BCR value is high. We can also see that in Figure 1.3(b), where the capacity is 3 bit/block, only the upper three quarters can be recognized. The bottom quarter is intentionally set to bit 0 for comparison. Figures 1.3(c) and (d) exhibit similar phenomena. For embedding 3 or 4 bit/block, similar BCR values are obtained. When decreasing the
[Figure 1.2: (a) original bridge image; (b) binary watermark]

Fig. 1.2. (a) The original bridge image with size 512 × 512. (b) The binary watermark with size 128 × 128.
capacity to 2 or 1 bit/block, the BCR values grow. However, even though the BCR values are high enough, decreasing the capacity makes the extracted watermarks less meaningful. Table 1.1 compares our scheme with that in [12]. We can see that comparable results are obtained. When we lower the embedded capacity, the PSNR values get higher. This is because fewer DCT coefficients get modified, and it confirms our discussions in Sec. 1.2.

Table 1.1. Comparisons of capacity (in bit/block) and imperceptibility, represented by PSNR (in dB), between our algorithm and an existing one

Scheme           Capacity      Imperceptibility
Existing ([12])  4 bit/block   34.79 dB
Proposed         4 bit/block   33.95 dB
                 3 bit/block   35.24 dB
                 2 bit/block   37.57 dB
                 1 bit/block   40.50 dB
In Figure 1.4, we present the numbers of embedded positions associated with the results in Figure 1.2. Because the JPEG compression attack tends to discard the higher-frequency coefficients, lower- to middle-frequency coefficients, namely AC2 and AC18, are embedded most often. Moreover, from the values indicated on the vertical axes, we can see that the total number of embedded bits decreases from Figure 1.4(a) to Figure 1.4(d). From the data in Figure 1.3, Figure 1.4, and Table 1.1 above, we find that the three requirements have their own inherent characteristics and influence one another. By taking the watermark capacity into account, we have more flexibility in the design of the algorithm.

1.4.2 Variable Embedding Capacity
Considering the fitness function in Eq. (1.3), we choose λ1 = 50 and λ2 = 15 for the detailed case study among the three requirements. The main reason for choosing such values is to balance the contributions from all three requirements. Regarding the attacking schemes, JPEG compression with QF = 80 is chosen for verifying our algorithm in this case study. Moreover, attacking schemes with 3 × 3 low-pass filtering (LPF) and 3 × 3 median filtering (MF) are also examined, and the results are depicted in Sec. 1.5. In the GA, we choose 20 populations with a selection rate of 0.5 and a mutation rate of 0.1 for optimization. After training for 100 iterations under the preliminary goal of better imperceptibility and better robustness under the JPEG attack, we obtain the optimized output with PSNR = 45.91 dB in Figure 1.5, and we can hardly
Fig. 1.3. (a)–(d) Extracted watermarks with capacities 4, 3, 2, and 1 bit/block, respectively: (a) 128 × 128 bits, BCR = 0.9232; (b) 128 × 96 bits, BCR = 0.9224; (c) 128 × 64 bits, BCR = 0.9430; (d) 128 × 32 bits, BCR = 0.9739.
[Figure 1.4: histograms of DCT band number vs. number of embedded bits for the optimized results. (a) 4 bit/block, mostly embedded: AC2 → AC18 → AC1 → AC13. (b) 3 bit/block, mostly embedded: AC2 → AC9 → AC3. (c) 2 bit/block, mostly embedded: AC2 → AC5. (d) 1 bit/block, mostly embedded: AC2.]

Fig. 1.4. The histogram between embedding DCT coefficients and the number of bits embedded. (a)–(d) Embedding coefficients with capacity 4, 3, 2, and 1 bit/block, respectively.
differentiate the differences between the original image and the watermarked one subjectively. Regarding watermark robustness, in addition to the JPEG attack we also apply the two other attacking schemes to the watermarked image to see whether our watermark can survive other attacks. For convenient comparison, we show the embedded watermark again in Figure 1.6(a). We can see that BCR = 0.9603 for the JPEG attack in Figure 1.6(b), BCR = 0.7128 for the LPF attack in Figure 1.6(c), and BCR = 0.7469 for the MF attack in Figure 1.6(d). It is easily comprehended that in this case we obtain better imperceptibility and the watermark successfully resists the JPEG attack. However, it cannot survive other attacks, such as LPF and MF, because the BCR after the JPEG attack is included in the GA fitness function in Eq. (1.3),
Fig. 1.5. Watermarked output, PSNR = 45.91 dB
[Figure 1.6: (a) embedded watermark; (b) extracted after JPEG attack, BCR = 0.9603; (c) extracted after LPF attack, BCR = 0.7128; (d) extracted after MF attack, BCR = 0.7469]

Fig. 1.6. Comparisons of the embedded watermark and the extracted ones after different attacks. From a subjective viewpoint, (c) and (d) do not survive well under the LPF and MF attacks. (a) Embedded watermark containing 128 × 128 = 16384 bits. (b) Extracted after the JPEG attack. (c) Extracted after the LPF attack. (d) Extracted after the MF attack.
[Figure 1.7: histogram of DCT band number vs. number of embedded bits, 3.4285 bit/block, with optimized results; top 4 bands: AC16 → AC14 → AC43 → AC2]

Fig. 1.7. The histogram between embedding DCT coefficients and the number of bits embedded. AC16 and AC14 are the top-two coefficients for embedding, and a total of 14043 bits, or 3.4285 bit/block, are embedded.
but the others are not. With this observation, when coping with several different attacks, all the extracted BCR values need to be integrated into the fitness function. To clearly represent the effects under various capacities, in the extracted watermark, bit 0 and bit 1 are denoted by black and white pixels, respectively, while the remaining parts are intentionally denoted by grey pixels. These phenomena can be seen in Figures 1.6(b) to 1.6(d). On the one hand, the watermark extracted from the JPEG-attacked image, shown in Figure 1.6(b), is clearly recognizable, and the BCR value is very high. On the other hand, the watermarks extracted from the LPF- and MF-attacked images, illustrated in Figures 1.6(c) and (d), respectively, can hardly be recognized, and the BCR values are not high enough. This result is reasonable because we focus on the JPEG attack, and Figure 1.6(b) verifies this phenomenon. Next, we check the histogram of embedding coefficients in Figure 1.7 and find that a total of 14043 bits are embedded. Regarding imperceptibility, the objective value is acceptable and most watermarked parts are invisible. Regarding robustness, the BCR value is high enough, and the extracted watermark is easily recognized under the JPEG attack. Regarding the embedding positions, the 16th and 14th coefficients (AC16 and AC14, respectively) are embedded most often, which follows the concept of embedding into the 'middle frequency bands' proposed in the literature [2, 12].
1.5 Simulation Results

1.5.1 Selection of Weighting Factors

Besides the case study depicted in Sec. 1.4, we provide more results from our experiments as follows. Table 1.2 demonstrates the performances among
imperceptibility, robustness, and capacity, and the two DCT coefficients that are embedded most often, under a variety of weighting factors. These results are obtained after 100 training iterations in the GA, with a selection rate of 0.5 and a mutation rate of 0.1. We performed many experiments under the preliminary conditions λ1 ∈ [0, 200] and λ2 ∈ [0, 50], and we present the results of 15 of these experiments in Table 1.2. The experiments can be classified into three categories:

1. fixing the robustness factor λ1 to 50 and varying the capacity factor λ2 from 10 to 30 with a step size of 5;
2. fixing the robustness factor λ1 to 100 and varying the capacity factor λ2 from 10 to 30 with a step size of 5;
3. fixing the robustness factor λ1 to 150 and varying the capacity factor λ2 from 10 to 30 with a step size of 5.

From the numerical values in Table 1.2 and the subjective evaluation in Figure 1.9, we observe that increasing the weighting factor of capacity increases the average capacity, while the BCR values get somewhat reduced. The PSNR values fluctuate a bit, but compared with the original image, the watermarked parts remain objectively unnoticeable. This is because of the embedding positions selected after GA optimization. According to the data presented in Figure 1.8, the best embedding bands also lie in low- to middle-frequency bands.

Table 1.2. Comparisons of imperceptibility (in dB), robustness, and capacity (in bit/block) with different weighting factors

λ1   λ2   PSNR (dB)  BCR (JPG)  BCR (LPF)  BCR (MF)  Capacity (bit/block)  The two best bands
50   10   45.46      0.9161     0.7897     0.8484    3.3364                AC4 → AC2
50   15   44.34      0.9013     0.7199     0.8103    3.7200                AC4 → AC2
50   20   43.23      0.8838     0.6698     0.7695    3.9302                AC1 → AC12
50   25   43.01      0.8820     0.6611     0.7599    3.9819                AC7 → AC4
50   30   42.95      0.8824     0.6598     0.7588    3.9941                AC1 → AC2
100  10   41.58      0.9349     0.8799     0.9011    2.8264                AC4 → AC1
100  15   40.98      0.9402     0.8268     0.8959    3.1804                AC1 → AC4
100  20   40.94      0.9308     0.7875     0.8681    3.5144                AC4 → AC1
100  25   40.33      0.9118     0.7374     0.8375    3.7473                AC4 → AC1
100  30   39.92      0.9058     0.7054     0.8167    3.8752                AC1 → AC4
150  10   40.78      0.9421     0.8762     0.9181    2.7021                AC4 → AC2
150  15   39.94      0.9530     0.8871     0.9215    2.8979                AC4 → AC2
150  20   39.90      0.9534     0.8683     0.9131    3.1506                AC4 → AC1
150  25   39.55      0.9420     0.8304     0.8994    3.3611                AC1 → AC4
150  30   38.99      0.9332     0.7838     0.8702    3.5781                AC4 → AC1
[Figure 1.8: histograms of DCT band number vs. number of embedded bits. (a) λ1 = 50 and λ2 = 15, 3.7200 bit/block, top 4: AC4 → AC2 → AC7 → AC5. (b) λ1 = 100 and λ2 = 15, 3.1804 bit/block, top 4: AC1 → AC4 → AC2 → AC7. (c) λ1 = 150 and λ2 = 15, 2.8979 bit/block, top 4: AC4 → AC2 → AC1 → AC7.]

Fig. 1.8. Comparisons of the histograms of embedding coefficients with different weighting factors in Eq. (1.3)
[Figure 1.9: extracted watermarks after JPEG, LPF, and MF attacks. (a) λ1 = 50 and λ2 = 15: BCR = 0.9013 (JPEG), 0.7199 (LPF), 0.8103 (MF). (b) λ1 = 100 and λ2 = 15: BCR = 0.9402 (JPEG), 0.8268 (LPF), 0.8959 (MF). (c) λ1 = 150 and λ2 = 15: BCR = 0.9530 (JPEG), 0.8268 (LPF), 0.8959 (MF).]

Fig. 1.9. Comparisons of the extracted watermarks and BCR values with different weighting factors in Eq. (1.3)
Furthermore, we can easily see that the BCR values after the LPF attack are much lower than their counterparts after the MF and JPEG attacks. To alleviate this problem, the weighting factor associated with robustness, λ1, should be increased to enhance the contribution of robustness to the fitness function. Comparing the three sets of data with (λ1, λ2) = (50, 15), (100, 15), and (150, 15), for instance, we observe that simply increasing the value of λ1 decreases both the resulting PSNR and the capacity. Figure 1.9 also demonstrates this observation from a subjective point of view. Summing up, the weighting factors need to be carefully chosen with two considerations in mind. The first is that the different requirements are supposed to contribute nearly equally. The second is that both the watermarked image and the extracted watermark need to be recognizable from a subjective point of view.
Table 1.3. Comparisons of imperceptibility (in dB), robustness, and capacity (in bit/block) with different combinations of attacks. Weighting factors are set to (λ1, λ2) = (50, 10). To simplify the representation, we use the abbreviations J, L, and M for JPEG, LPF, and MF, respectively.

Attack  PSNR (dB)  BCR (JPG)  BCR (LPF)  BCR (MF)  Capacity (bit/block)  The two best bands
J       45.91      0.9603     0.7128     0.7469    3.4285                AC16 → AC14
L       46.44      0.9194     0.8380     0.7963    3.2273                AC1 → AC4
M       43.45      0.9058     0.7947     0.8486    3.3423                AC1 → AC4
J&L     46.45      0.9262     0.8395     0.7955    3.1863                AC4 → AC1
J&M     43.56      0.9107     0.7500     0.8718    3.3145                AC1 → AC4
L&M     46.21      0.9058     0.7947     0.8486    3.3081                AC4 → AC2
J&L&M   45.46      0.9196     0.7897     0.8484    3.3364                AC4 → AC2

1.5.2 Combination of Various Attacks
Based on the building blocks in Figure 1.1, the algorithm designer can choose different attacks for the optimization. We show the combinations of various attacks with the GA in Table 1.3. In Table 1.3, we list all seven combinations of the three independent attacks with the weighting factors λ1 = 50 and λ2 = 10. The BCR values shown in italics in the original table are those not optimized for the given type of attack. For instance, for the results with the JPEG-type attack in the first row, the numerical values for BCR(LPF) and BCR(MF) are not optimized, because only BCR(JPG) is optimized. Because we embed the watermark into DCT coefficients, our algorithm seems to resist the JPEG attack inherently. Therefore, in the first three rows, we can see that the BCR values after JPEG attacks are high enough, and we need to include the LPF or MF attack in the optimization to obtain improved BCR values. In the fourth to sixth rows, we can see that only the BCR values for the selected attacks perform better. In the last row, because we calculate the average BCR value in the fitness function in Eq. (1.3), if we choose all three attacks together, BCR(JPG) tends to perform better inherently. Therefore, the two remaining BCR values may decrease, and the numerical results exhibit this phenomenon. Summing up, our algorithm tends to resist the JPEG attack by its very characteristics, and we may suggest ignoring the JPEG attack during the optimization process. By including the combination of the LPF and MF attacks in the fitness function, acceptable results can be reached.
1.6 Conclusions

In this chapter, we discussed the optimization of robust watermarking with genetic algorithms. By finding trade-offs among robustness, capacity, and imperceptibility, we designed a practical fitness function for the optimization.
We observe that the three requirements conflict with one another; thus, by applying the GA, we can obtain an optimized outcome. Simulation results depict the improvements of our algorithm, hence the implementation of a copyright protection system, and it is directly extendable to cope with a variety of attacks in the benchmarks. Since the capacity can be made variable here, the corresponding results perform better than those in the literature [12] with fixed watermarking capacity. In addition, the weighting factors in the fitness function play an important role in the design of the algorithm. Properly selected weighting factors can lead to better overall performance. Other schemes, such as applying ECC to the watermark, can be integrated into our implementation to obtain better performance at a slight increase in watermark encoding and decoding cost.
References

1. Gen, M., Cheng, R.: Genetic Algorithms and Engineering Design. Wiley, New York (1997)
2. Huang, H.C., Pan, J.S., Huang, Y.H., Wang, F.H., Huang, K.C.: Progressive watermarking techniques using genetic algorithms. Circuits, Systems, and Signal Processing 26, 671–687 (2007)
3. Koenen, R.H., Lacy, J., Mackay, M., Mitchell, S.: The long march to interoperable digital rights management. Proc. of the IEEE 92, 883–897 (2004)
4. Macq, B., Dittmann, J., Delp, E.J.: Benchmarking of image watermarking algorithms for digital rights management. Proc. of the IEEE 92, 971–984 (2004)
5. Morelos-Zaragoza, R.H.: The Art of Error Correcting Coding, 2nd edn. Wiley, New York (2006)
6. Pan, J.S., Huang, H.C., Jain, L.C. (eds.): Intelligent Watermarking Techniques, pp. 3–38. World Scientific Publishing Company, Singapore (2004)
7. Pan, J.S., Huang, H.C., Jain, L.C., Fang, W.C. (eds.): Intelligent Multimedia Data Hiding. Springer, Heidelberg (2007)
8. Petitcolas, F.A.P.: Stirmark benchmark 4.0 (2004), http://www.petitcolas.net/fabien/watermarking/stirmark/
9. Piva, A., Bartolini, F., Barni, M.: Managing copyright in open networks. IEEE Internet Comput. 6, 18–26 (2002)
10. Shehab, M., Bertino, E., Ghafoor, A.: Watermarking relational databases using optimization-based techniques. IEEE Trans. on Knowledge and Data Engineering 20, 116–129 (2008)
11. Shieh, C.S., Huang, H.C., Wang, F.H., Pan, J.S.: An embedding algorithm for multiple watermarks. Journal of Information Science and Engineering 19, 381–395 (2003)
12. Shieh, C.S., Huang, H.C., Wang, F.H., Pan, J.S.: Genetic watermarking based on transform domain techniques. Patt. Recog. 37, 555–565 (2004)
13. Wang, S., Zheng, D., Zhao, J., Tam, W.J., Speranza, F.: An image quality evaluation method based on digital watermarking. IEEE Trans. Circuits and Systems for Video Technology 17, 98–105 (2007)
2 Dual-Plane Correlation-Based Video Watermarking for Immunity to Rotation, Scale, Translation, and Random Distortion

Isao Echizen1, Yusuke Atomori2, Shinta Nakayama2, and Hiroshi Yoshiura2

1 National Institute of Informatics, 2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan
  [email protected]
2 Faculty of Electro-Communication, The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, 182-8585, Japan
  {atomori,shinta}@edu.hc.uec.ac.jp, [email protected]

Summary. A robust video watermarking method is described that can embed watermarks immune not only to rotation, scaling, and clipping and translation but also to random geometric distortion and any of their combinations. It can detect watermarks without any search for canceling the effect of random distortion. Rotation, scale, and translation are canceled by searching for the angle, size, and origin of the original picture. The search for the angle and size is combinatorial and independent of the search for the origin. That is, the angle and size of the target picture can be restored without necessarily restoring the picture's origin. Furthermore, the search for the origin is divided into horizontal and vertical searches that can be carried out independently. The number of searches is thus drastically reduced, making the processing time short enough for practical application. Watermark strength is controlled using an improved human visual system model for processing color information. This is done in the L*u*v* space, where human-perceived degradation of picture quality can be measured in terms of Euclidean distance. The watermarks are actually embedded and detected in the YUV space, where watermarks can be detected more reliably than in the L*u*v* space. Experimental evaluation using actual video samples demonstrated that the proposed method can embed watermarks immune to rotation, scaling, clipping and translation, random distortion, and combinations thereof while maintaining picture quality.

Keywords: video watermarking, RST, random distortion, human visual system.
2.1 Introduction

Digital content is becoming widely available through various types of media such as the Internet, digital broadcasting, and DVDs because of its
advantages over analog content. It requires less space, is easier to process, and is not degraded by age or repeated use. A serious problem, however, is that the copyrights of digital videos are easily violated, because the contents can easily be copied and sent illegally over the Internet. Video watermarking, which helps protect copyrights by embedding copyright information, is therefore becoming increasingly important. We are working on ways to improve the digital watermarking of videos. Video watermarks are more useful when they are better able to survive geometric transformations (such as rotation and scaling) and non-geometric ones (such as MPEG compression and filtering) and when they cause less degradation of picture quality. Watermarking robust to geometric transformation has been studied through the development of various methods, such as searching for watermarks embedded in pictures [18], using watermark patterns robust to rotation, scaling, and translation [14, 17], and using watermark patterns suitable for estimating distortion [3]. Random distortion is one of the most difficult kinds of geometric image processing for watermarks to survive, and coping with it requires additional methods such as searching with the help of the original picture [11, 13], using robust watermark patterns with specific sequences in the embedding process [20], embedding watermarks with respect to the feature points of pictures [2], and clarifying the characteristics of various types of distortion from training data [15]. Much research has been done on surviving geometric transformation, and several methods can embed watermarks immune to rotation, scaling, and translation (RST). Only a few methods, however, can embed watermarks immune to random geometric distortion. Since random distortion distorts various parts of the embedded watermark in various ways, so that the expected watermark pattern is no longer the same as the watermark embedded in the picture, random distortion is difficult to treat. We have developed a dual-plane correlation-based (DPC-based) video watermarking method that can embed watermarks immune to RST, to random distortion, and to any of their combinations [1, 8]. The watermarks are embedded in two constituent planes (the U and V planes) of a color picture and are detected by evaluating the correlation between the planes. We also developed a human visual system (HVS) model of color information that we use to control watermark strength in our watermarking method. This model can be used in a wide variety of watermarking methods. Section 2.2 describes conventional watermarking methods for improving resistance to geometric distortion. Section 2.3 describes our previously reported DPC-based watermarking method for immunity to random distortion and the problems with this method. Section 2.4 describes our newly developed DPC-based video watermarking method for immunity to RST and random distortion. Section 2.5 describes the HVS model for processing color information and its incorporation into the DPC-based video watermarking method. Section 2.6 reports our experimental evaluation showing the effectiveness of
2
Dual-Plane Correlation-Based Video Watermarking for Immunity
23
the DPC-based video watermarking method, and Section 2.7 concludes the chapter with a brief summary.
2.2 Conventional Methods for Improving Resistance to Geometric Distortion Improving watermark resistance to geometric distortion has been actively studied for both the pixel and frequency domains, and the methods established by these studies can be classified into four types. (1) Search methods: These are the most basic methods in that parameters such as the origin, angle, and size of the watermarked picture are searched until the watermark is detected or synchronization between expected watermark patterns and picture elements is achieved [18]. (2) Methods using invariant patterns: Patterns of embedded watermarks are devised so that they can be detected directly from geometrically distorted pictures or can be detected without exhaustive searches. Such invariant patterns have been proposed for rotation, scaling, and shifting [14, 17]. (3) Methods using template watermarks: Watermark patterns suitable for estimating geometric distortion are devised and embedded along with watermarks conveying information. These template watermarks are first detected to estimate the distortion, and then the watermarks conveying information are detected on the basis of the estimation [3]. Watermark patterns that can be used both as templates and for conveying information have also been proposed [10, 16]. (4) Learning methods: Training data are used by learning procedures to estimate the geometric distortion added to pictures, and these estimations are used to detect the watermarks [15]. Although the methods using learning can deal with random geometric distortion, they need additional data for training. The other three types of methods were not originally intended for treating random geometric distortion but for treating uniform geometric distortion (i.e., that uniformly added to the whole picture). These methods have been therefore extended for dealing with random geometric distortions as follows. The watermark methods using searches require impractically large amounts of computation for treating random geometric distortion because they need to search for various parameters (i.e., origins, angles, and sizes) in each portion of the picture. One method that solves this problem [13] matches the target picture portions (i.e., the randomly distorted portions) to the corresponding original picture portions3 and uses the results of this matching to transform the target portions back to their original shapes. This method reduces the amount of computation by limiting searches for matching to the portions 3
More precisely, the target picture portions are matched with the picture portions immediately after watermarking and prior to distortion.
24
I. Echizen et al.
where watermarks are embedded and detects the watermarks correctly by restoring the original shapes. Since patterns that are completely invariant for random distortion have not yet been found, there are no methods that use invariant patterns. However, one method [12] can treat random distortion by using patterns less sensitive to random distortion coupled with searching and majority decision logic. Methods using template watermarks are now feasible given that watermarks suitable for estimating random distortion have been proposed [2]. Our DPC-based watermarking method is based on a principle different from those of the previous methods: watermarks are embedded in two of the three color planes (e.g., the R and G planes), and they are detected using the covariance between these two planes. This method is based on the fact that these two planes are distorted in the same way.
2.3 DPC-Based Still-Picture Watermarking for Resistance to Random Distortion This section describes our previously reported DPC-based watermarking method for still pictures (“basic method”) [24] on which our method for video watermarking is constructed. Our DPC-based watermarking method embeds synchronized pseudo-noise (PN) sequences into two of the planes constituting a color picture (e.g., the R and G planes) and detects the sequences by correlating these two planes. Because the two planes are distorted in the same way, the PN sequence embedded in one plane is synchronized with that embedded in the other even after random geometric distortion. Thus, the detection based on their correlation works without searches or special patterns. 2.3.1
Terminology
A color picture, P , is a two-dimensional pixel array with width W and height H. That is, (d) (2.1) P = pi,j | 0 ≤ i < W, 0 ≤ j < H, 0 ≤ d < D , (d)
where pi,j represents the d-th pixel value at pixel (i, j), and D represents the dimension of color space (e.g., RGB color space) and is usually three. A picture can therefore be resolved into D constituent planes, for example, R, G, and B planes. The d-th constituent plane, P (d) , of color picture P with width W and height H is defined as (d) (2.2) P (d) = pi,j | 0 ≤ i < W, 0 ≤ j < H . A unit block of watermarking is an L × L pixel block in which the values of all the pixels are changed in the same direction (i.e., all of them are either increased or decreased). L is called the block size.
2
Dual-Plane Correlation-Based Video Watermarking for Immunity
25
A pseudo-random noise (PN) sequence R is a pseudo-random sequence of integers +1 and −1: R = {rk,l | rk,l ∈ {−1, +1}, 0 ≤ k < W/L, 0 ≤ l < H/L} .
(2.3)
Mask M using PN sequence R is defined as
M = mi,j | mi,j = ri/L,j/L , 0 ≤ i < W, 0 ≤ j < H .
(2.4)
The mask is the watermark pattern actually embedded into the constituent plane. 2.3.2
Basic Procedure
For clarity, we describe here the method for one-bit watermarking. In the multi-bit case, each bit is assigned to each area of the picture and is embedded and detected in the same way as in the one-bit case. The embedding part of the one-bit watermarking method is as follows. Watermark embedding Step E1. Resolve P into constituent planes, P (0) , P (1) , and P (2) ; planes P (1) and P (2) are used for watermarking. Step E2. Construct mask M from R and L in accordance with the width and height of P . Step E3. Embed PN sequence R (i.e., corresponding to mask M ) into P (1) using (1) (1) (1) p i,j = pi,j + αi,j mi,j , (2.5) (d)
(d)
where αi,j (> 0) represents the watermark strength at pi,j . This operation (1)
(1)
increases or decreases pi,j by degree αi,j depending on the value of mi,j . Step E4. Embed R into P (2) using (2) (2) pi,j + αi,j mi,j if b = 1 (2) p i,j = (2.6) (2) (2) pi,j − αi,j mi,j if b = 0. The value of bit b determines whether this operation embeds into P (2) a pattern that is either the same as or the reverse of that embedded into P (1) . (1) (2) Step E5. Construct P from P (0) , P , and P . Watermark detection The detection part of the one-bit watermarking method is as follows.
26
I. Echizen et al.
Step D1. Extract the two constituent planes, P way as in the embedding. Step D2. Calculate the covariance c using
(1)
and P
c = (p i,j − p i,j )(p i,j − p i,j , (1)
(1)
(2)
(2)
(2)
, in the same
(2.7)
where • means the average value over i and j. Step D3. If c > 0, detect b = 1; if c < 0, detect b = 0; if c = 0, b is not detected. The detection of the embedded bit can be based on the correlation between the two planes. Even after random geometric distortion, the PN sequences em(1) (2) bedded in P and P are still synchronized because these two planes are distorted in the same way. The embedded bit can therefore still be detected. 2.3.3
Dealing with Random Geometric Distortion
Theoretical analysis of basic procedure As mentioned in Section 2.3.2, the determination of the bit value is based on (1) covariance c (formula (2.7) of Step D2), which, by the definition of P and (2) P , can be transformed as follows: (1)
(1)
(2)
(2)
(2.8)
(2) (1) pi,j αi,j mi,j )
(2.9)
c = (pi,j − pi,j )(pi,j − pi,j ) +
(2) (1) (pi,j αi,j mi,j (1)
−
(2)
(1)
(2)
± (pi,j αi,j mi,j − pi,j αi,j mi,j ) ±
(1) (2) (αi,j αi,j
−
(1) (2) αi,j mi,j αi,j mi,j ),
(2.10) (2.11)
where “±” means “+” when embedded bit b is 1 and “−” when b is 0. Portion (2.8) represents the covariance between P (1) and P (2) and depends only on the nature of the picture and the selection of the two constituent planes. (2) (1) (1) (2) The first terms in portions (2.9) and (2.10), pi,j αi,j mi,j and pi,j αi,j mi,j , have an expected value of 0, as Bender et al. showed in their analysis of the Patchwork watermarking method [3]. The second terms in these two portions also have an expected value of 0. The expected value and variance of c for L = 1 can thus be evaluated as follows.
(1) (2) (1) (2) E(c) = ˆ pi,j pˆi,j ± αi,j αi,j 1 −
1 HW
,
(2.12)
1 (2)2 (1)2 (1)2 (2)2 (1) (2) (1) (2) αi,j pˆi,j +αi,j pˆi,j ± 2αi,j αi,j pˆi,j pˆi,j HW 1 1 (1)2 (2)2 (1)2 (2)2 (1) (2) αi,j αi,j +αi,j αi,j −αi,j αi,j 2 (, 2.13) 2+ + 2 2 H W HW
V (c) =
(d)
(d)
(d)
(d)
where pˆi,j is given by pˆi,j = pi,j − pi,j . This formula reveals that the requirements for reliable detection after random geometric distortion are that
2
Dual-Plane Correlation-Based Video Watermarking for Immunity (2)2 (1)2
27
(1)2 (2)2
the noise, especially variances αi,j pˆi,j and αi,j pˆi,j and covariances (1) (2)
(1)
(2) (1) (2)
ˆ pi,j pˆi,j and αi,j αi,j pˆi,j pˆi,j , should be reduced. The next section explains how we did this. Improvement of basic procedure Noise-reduction preprocess The noise reduction method developed by Choi et al. for DCT-domain watermarking [4] preprocesses the DCT coefficients of the watermarked area before using them to detect watermarks. The preprocess subtracts the average DCT coefficients of the surrounding neighborhood areas, where watermarks are not embedded, from the DCT coefficients of the watermarked area. The watermarked area and the surrounding areas have similar DCT coefficients in the original picture because of the natural similarity of neighboring areas. The subtraction thus reduces the original DCT information of the watermarked area. That is, it reduces the detection noise. It does not reduce the watermark DCT information, however, because the surrounding areas are not watermarked. We use a modified version of this method as a noise reduction preprocess in our method. Before detection step D2, plane P (d) (d = 1, 2) is processed using (d)
(d)
pi,j ← pi,j −
1 (d) (d) (d) (d) pi−L,j + pi,j−L + pi+L,j + pi,j+L . 4
(2.14)
Because neighboring pixels in the horizontal and vertical directions tend to have similar values, this preprocess reduces the variance of plane P (d) when L is small. Selection of two planes The two planes for watermarking should be selected in such a way that the covariance between them before watermarking is small. Because RGB and YUV color systems are generally used on computers, we select the planes from them. Figure 2.1 shows the covariances between two planes for a standard Lenna picture after the noise reduction preprocess. Size L corresponds to the distance between the target pixel and the surrounding pixels. The covariances between U and V are the smallest absolute values. The U and V planes are therefore selected for watermarking; P (0) = Y , P (1) = U , and P (2) = V . Shift of plane Two constituent planes of a picture have an inherent correlation. We reduce this correlation by shifting one of them during watermark detection; that is, watermarks are embedded in P (2) at the position shifted by s = (Δx, Δy).
28
I. Echizen et al.
1000
Covariance
800
RG GB BR YU UV VY
600 400 200 0 -200 1
2
3
4
5
6
Size L of watermarking block
7
8
Fig. 2.1. Covariance between two constituent planes of standard picture Lenna
We call s a shift vector. Corresponding to the shifted embedding, shifted watermark detection is done by correlating P (1) with P (2) shifted by −s. This shifted detection desynchronizes the two planes and reduces the inherent correlation while keeping the PN sequences embedded in the two planes synchronized. Thus, embedding Step E4 and detection step D2 are changed as follows. Step E4’. Embed R into P (2) using (2) (2) pi,j + αi,j mi−Δx,j−Δy if b = 1, (2) p i,j = (2) (2) pi,j − αi,j mi−Δx,j−Δy if b = 0,
(2.15)
where i and j satisfy Δx ≤ i < W and Δy ≤ j < H respectively. Step D2’. Calculate covariance value c using c = (p i,j − p i,j )(p i+Δx,j+Δy − p i+Δx,j+Δy . (1)
(1)
(2)
(2)
(2.16)
2.4 DPC-Based Video Watermarking for Resistance to Rotation, Scale, Translation, and Random Distortion In this section we describe our DPC-based video watermarking method that can embed watermarks immune to RST, random distortion, and any of their combinations. In the basic DPC-based watermarking method described above, watermarks are embedded into the constituent planes of a still picture so that they are immune to random distortion. We improved this basic method to achieve video watermarking that is immune to both RST and random distortion. Here we introduce the techniques for both watermark embedding and detection.
2
2.4.1
Dual-Plane Correlation-Based Video Watermarking for Immunity
29
Terminology
We redefine the terminology described in Section 2.3.1 for video watermarking. A color picture frame, P (f ) , is a two-dimensional pixel array with width W and height H of the f -th frame. That is, (f,d)
P (f ) = {pi,j
| 0 ≤ i < W, 0 ≤ j < H, 0 ≤ d < D}.
(2.17)
The d-th constituent plane of the f -th frame P (f,d) of P (f ) is defined as
(f,d)
P (f,d) = {pi,j 2.4.2
| 0 ≤ i < W, 0 ≤ j < H}.
(2.18)
Watermark Embedding
We use the following techniques to achieve immunity to RST and random distortion. They are used to embed the same watermark pattern in each frame of the video. The pattern is immune to clipping and the accompanying translation. Cell as minimum detection area: The target picture is divided into rectangular areas of the same size, and each area is called a cell. Mask M , which is generated for the whole picture in the basic method, is now generated for and embedded into each cell, as shown in Fig. 2.2. A cell is the minimum area for detecting watermarks. Finding start point of cell: If a picture has been clipped, the start point (e.g., the upper-left corner) of a cell may not be the start point of the picture. We therefore divide each cell into area for embedded information (called “INF”) and that called belt (“BLT”) for use in finding the start point of the cell (see Fig. 2.2). A belt is a rectangular area that lies between two cells. Two PN sequences that are mutually independent are embedded into P (f,1) and P (f,2) for each belt, so the expected values of the correlations between P (f,1) and P (f,2) are zero for the belts while the expected values for the cells deviate from zero in accordance with the embedded bits. 2.4.3
Watermark Detection
Watermark detection in the improved method is the same as in the basic method except for treating geometric transformation. Additionally, we introduce redundant coding and frame accumulation techniques to prevent watermark detection errors. Treating random distortion: Watermarks embedded using the basic method are immune to random distortion because the detection uses two constituent planes that are distorted in the same way. The improved method does likewise, so the watermarks are immune to random distortion.
30
I. Echizen et al.
Treating rotation and scaling: Rotation and scaling are identified using a combinatorial search; the target picture is rotated and scaled using all combinations of candidate angles and sizes to find the combination that recovers the original angle and size. To find the correct combination, we use the notion of “number of sign flippings” (NF). The covariance plane, C, is a monochrome plane consisting of pixel values ci,j s. Each value ci,j in the covariance plane is an element of the covariance of P (f,1) and P (f,2) (the details are described in Section 2.4.4). NF is defined as the number of pixels in the covariance plane for which the sign differs from that of the pixel on the right and from that of the pixel below. Each area of P (f,1) and the corresponding area of P (f,2) are made to have a positive or negative covariance on the basis of the embedded bit. The ci,j therefore has the same sign within each area if the shift vector is correctly applied, i.e. if both the angle and size are correctly restored in the search (Fig. 2.3(a)). On the other hand, it has a positive or negative sign randomly if either the angle or size is not correctly restored (Fig. 2.3(b)). NF is therefore minimum at the correct combination of angle and size. Note that finding the minimum NF is possible even if translation is not resolved. Treating clipping: Clipping is treated after recovering the angle and size. The belts are found horizontally and vertically by using NF and are used to find the start point of each cell. To search for a horizontal belt, the NF is counted along each horizontal pixel array. It increases only at the boundary between areas with different bit values. In a belt, however, it increases at every pixel in the belt with a probability of 0.5. Large NFs are therefore obtained for belts with a period equal to the cell size. The horizontal belts are thereby found. The vertical belts are found in a similar manner. Note that the searches for the horizontal and vertical belts can be done independently. Redundant coding and frame accumulation: Because various geometric and non-geometric transformations may attenuate watermark signals and cause detection errors, we introduce a redundant coding and frame accumulation technique into our method. It embeds watermarks repeatedly in every frame of a video and can thus prevent errors by accumulating a specific number of sequential frames coded repeatedly during watermark detection. 2.4.4
Procedure
To simplify the explanation, we introduce a one-bit watermark-embedding procedure. When multi-bit information is to be embedded, the one-bit schema is applied to each area of the frame. Watermark embedding Step E1. Resolve P (f ) into three constituent planes, P (f,d) s (d = 0, 1, 2), select two of them, P (f,1) and P (f,2) , and divide each P (f,d) into M
2
Dual-Plane Correlation-Based Video Watermarking for Immunity
31
Independent sequences are embedded using belts. r⎣i L ⎦ ⎣ j ( 1)
P
,
( f ,1)
BLT
BLT
INF
INF
H
BLT c
r⎣i L ⎦ ⎣ j ( 2)
L⎦
,
Shift vector
INF BLT
INF
q ⎣i L ⎦ ⎣ j
c
,
( f ,2)
BLT
BLT
INF
W
BLT
P
INF
BLT
INF
L⎦
INF
L⎦
Sequence is repeatedly embedded into areas for embedded info. Fig. 2.2. Watermark embedding
(a) Correctly restored size and angle
(b) Incorrectly restored size and angle
Fig. 2.3. Use of NF to restore angle and size of picture (each area surrounded by bold lines corresponds to an embedded bit)
rectangular areas of the same size (M cells), P (f,d,k) s (k = 1, . . . , M ). Cell P (f,d,k) is given by (f,d) (2.19) P (f,d,k) = pi,j | 0 ≤ i < Wc , 0 ≤ j < Hc , where Wc and Hc are the width and height of the cell. Note that the basic method selects the Y, U, and V planes for P (f,0) , P (f,1) , and P (f,2) as described in Section 2.3.3, and watermarks are embedded into the U and V planes4 . Step E2. Construct two masks, M (d) s (d = 1, 2), and embed them into the corresponding cells. Each mask is given by 4
It can also select the V and U planes for P (f,1) and P (f,2) .
32
I. Echizen et al.
(d) M (d) = mi,j | 0 ≤ i < Wc , 0 ≤ j < Hc ,
(2.20)
(d)
where mi,j is defined as (d) mi,j
=
(d)
ri/L,j/L if (i, j) ∈ BLT qi/L,j/L if (i, j) ∈ IN F ,
(2.21)
(d)
where ri,j and qi,j are the pseudo-random sequences of ±1s with 0 ≤ i < (1) (2) (d) Wc /L and 0 ≤ j < Hc /L. They satisfy i,j ri,j ri,j = i,j ri,j qi,j = (1)
(2)
0. Note that ri,j and ri,j are used for finding the start point of each cell in watermark detection (See Fig 2.2), and the sets of IN F and BLT are respectively defined as IN F = {(i, j) | Wb ≤ i < Wc , Wb ≤ j < Hc } and BLT = BLTH ∪ BLTV , where BLTH and BLTV are given by BLTH = {(i, j) | 0 ≤ i < Wc , 0 ≤ j < Wb } ,
(2.22)
BLTV = {(i, j) | 0 ≤ i < Wb , Wb ≤ j < Hc } ,
(2.23)
where Wb is called the belt width. Step E3. Embed M (1) into P (f,1,k) for each k using (f,1,k)
pi,j
(f,1,k)
= pi,j
(f,1,k)
+ αi,j
(1)
mi,j ,
(f,d,k)
(2.24) (f,d,k)
(> 0) represents the watermark strength at pi,j . where αi,j (2) (f,2,k) into P for each k in accordance with embedded Step E4. Embed M bit b using (f,2,k) (f,2,k) (2) pi,j + αi,j mi−s,j−s if b = 1 (f,2,k) (2.25) = pi,j (f,2,k) (f,2,k) (2) pi,j − αi,j mi−s,j−s if b = 0. Step E5. Construct watermarked frame P 1, . . . , M ).
(f )
from P
(f,d,k)
s (d = 1, 2, k =
Watermark detection Detecting one-bit information in the picture is done as follows. Step D1. Input F watermarked frames, P (f ) s (f = f0 , f0 +1, . . . , f0 +F −1), (f,d) s (d = 1, 2). and resolve them into constituent planes P Step D2. Accumulate the F watermarked planes for each d. Accumulated planes P˜ (d) s (d = 1, 2) are given by (d)
p˜i,j =
1 F
f0 +F −1
p i,j , (f,d)
f =f0
where F is called the accumulation number.
(2.26)
2
Dual-Plane Correlation-Based Video Watermarking for Immunity
33
Step D3. Divide each P˜ (d) into M cells: P˜ (d,k) s (k = 1, . . . , M ). Step D4. Preprocess P˜ (d,k) (d = 1, 2, k = 1, . . . , M ) using a smoothing operation: (d,k)
p˜i,j
(d,k)
← p˜i,j
−
1 (d,k) (d,k) (d,k) (d,k) p˜i−L,j + p˜i,j−L + p˜i+L,j + p˜i,j+L . 4
(2.27)
Step D5. Calculate covariance plane C = {ci,j } using ci,j =
M−1
(0,k)
(˜ pi,j
(0,k)
− p¯ ˜
(1,k)
(1,k)
)(˜ pi+s,j+s − p¯˜
),
(2.28)
k=0 (d,k) (d,k) is the average value of p˜i,j over i and j. where p¯ ˜ Step D6. Detect geometric transformation using covariance plane C, and generate correctly modified covariance plane C˜ = {˜ ci,j }. The belts of the cells are found horizontally and vertically by using independent pseudo(0) (1) random sequences ri,j and ri,j , which are used to find the start point of each cell. ˜ StepD7. Detect the embedded bit using C; that is, b is detected as 1 if c ˜ > 0, b is detected as 0 if c ˜ i,j i,j < 0, and b is not detected if i,j i,j c ˜ = 0. i,j i,j
2.5 Use of Human Visual System Model for DPC-Based Video Watermarking The DPC-based video watermarking method described in Section 2.4.4 orig(f,d,k) inally embeds watermarks with a constant watermark strength (αi,j = α) regardless of the visual characteristics of the picture content. Watermarks produced by our original method therefore degrade the picture quality or else have to be weakened uniformly, meaning they cannot reliably survive random distortion or RST. This is because the use of human vision features to control watermarking has been studied mainly for watermarking using luminance information [6, 21] and little for watermarking using color information (such as chrominance information). In this section, we describe the use of a human visual system model for processing color information to control (f,d,k) . The improved the strength of the DPC-based video watermarking, αi,j method determines the watermark strength in the L*u*v* color space, where human-perceived degradation of picture quality can be measured in terms of Euclidean distance. This HVS model can be applied not only to the DPCbased method but also to other methods, both in the pixel and frequency domains.
34
I. Echizen et al.
2.5.1
Controlling Watermark Strength in Accordance with Pixel Values
Basic scheme Human sensitivity to color change depends on the luminance and color of the target picture portion. This phenomenon has been well studied in the field of color engineering. We use the L*u*v* uniform color space [5, 9] as a basis for controlling watermark strength. A pixel in the L*u*v* space is represented by three coordinate values, L∗ , u∗ , and v ∗ . Formulas mapping from X, Y , and Z (well-known color coordinates derived from R, G, and B using linear conversions) to L∗ , u∗ , and v ∗ are defined as follows. ⎧
1/3 ⎪ Y Y ⎪ ⎪ 116 − 16 if > 0.008856, ⎪ ⎨ Yn Yn ∗ L (Y ) = (2.29)
⎪ ⎪ ⎪ Y Y ⎪ ⎩ 903.29 ≤ 0.008856, if Yn Yn (2.30) u∗ (Y, U, V ) = 13L∗(u − un ), ∗ ∗ (2.31) v (Y, U, V ) = 13L (v − vn ), where Yn , un , and vn are constants reflecting the properties of the standard light source, and the quantities un and vn are the (u , v ) chromaticity coordinates of a specified white object. Formulas for u and v are given by 4X , X + 15Y + 3Z 9Y . v (Y, U, V ) = X + 15Y + 3Z
u (Y, U, V ) =
(2.32) (2.33)
A remarkable property of the L*u*v* space is that human-perceived degradation of picture quality can be measured in terms of Euclidean distance, which is called “color difference,” or ΔE. Our basic idea is to control watermark strength so that color changes caused by watermarking are constant for all pixels in terms of color differences in the L*u*v* space. With this control, degradation of picture quality as perceived by the human eye can be kept constant, so watermarks can be embedded preferentially into pixels where watermarks are less perceptible. A straightforward realization of this idea is to embed watermarks directly in the L*u*v* space with uniform watermarking strength. Our preliminary evaluation showed, however, that watermarking directly in the L*u*v* space did not reduce the inherent correlation when the picture quality was maintained. We thus used the following scheme to improve our method. (a) Watermarks are, as in our original method, embedded and detected in the U and V planes of the YUV space, where watermarks can be detected more reliably than in the L*u*v* space.
2
Dual-Plane Correlation-Based Video Watermarking for Immunity
35
(b) Watermark strength is controlled in the L*u*v* space so that the color difference in the L*u*v* space corresponding to the watermark is constant for all pixels in the picture. Determining watermark strength The color difference is defined by ΔE = (ΔL∗ )2 + (Δu∗ )2 + (Δv ∗ )2 ,
(2.34)
where ΔL∗ , Δu∗ , and Δv ∗ are the changes in the L*u*v* space due to changes in the YUV space (ΔY , ΔU , and ΔV ). Since our original method does not embed into the Y plane (ΔY = 0), the L* plane does not change (ΔL∗ = 0). Consequently, Δu∗ and Δv ∗ are given by ∂u∗ ∂u∗ + ΔV , (2.35) ∂U ∂V ∗ ∗ ∂v ∂v + ΔV . (2.36) Δv ∗ = v ∗ (Y, U +ΔU, V +ΔV ) − v ∗ (Y, U, V ) ∼ ΔU ∂U ∂V
Δu∗ = u∗ (Y, U +ΔU, V +ΔV ) − u∗ (Y, U, V ) ∼ ΔU
From these formulas, ΔE can be approximated using 2 2 ∂u∗ ∂v ∗ ∂v ∗ ∂u∗ ΔE ∼ + ΔV + ΔV + ΔU . ΔU ∂U ∂V ∂U ∂V If condition |ΔU | = given by ⎧ ⎪ ⎪ ⎨ (ΔU, ΔV ) = ⎪ ⎪ ⎩
(2.37)
|ΔV | is assumed in formula (2.37), ΔU and ΔV are (ΔEδ + , ΔEδ + ) (ΔEδ − , −ΔEδ − ) (−ΔEδ − , ΔEδ − ) (−ΔEδ + , −ΔEδ + )
if if if if
ΔU ΔU ΔU ΔU
> 0, > 0, < 0, < 0,
ΔV ΔV ΔV ΔV
>0 0 < 0,
(2.38)
where δ + = δ + (Y, U, V ) = ∂u∗ ∂U
δ − = δ − (Y, U, V ) = ∂u∗ ∂U
1 ∗ ∗ 2 + ∂u + ∂v ∂V ∂U +
∂v ∗ 2 ∂V
,
(2.39)
1 2 ∂v∗ + ∂U −
∂v ∗ 2 ∂V
.
(2.40)
−
∂u∗ ∂V
From formula (2.38), we can measure the human-perceived degradation of picture quality in the U and V planes. For consistency with the original (f,k) method, we define acceptability of watermark (AoW), wi,j , representing the degree of imperceptibility of a chrominance change at pixel (i, j) in the k-th cell of the f -th frame. If, for example, the AoW at pixel (1, 0) is larger than that at pixel (2, 0), a change in chrominance at pixel (1, 0) is less
36
I. Echizen et al. Table 2.1. Outputs of φ (1)
(2)
(1)
(2)
φ mi,j mi−s,j−s > 0 mi,j mi−s,j−s < 0 b=1 + − b=0 − +
perceptible than one at pixel (2, 0). AoW is positive in the range of i, j, f , and k. Considering the signs (positive or negative) of ΔU and ΔV in formula (2.38) and steps E3 and E4 of watermark embedding in Section 2.4.4, we (f,1,k) (f,2,k) and Δpi,j , which can replace (ΔU, ΔV ) in formula (2.38) with Δpi,j are defined using AoW. (f,1,k)
Δpi,j
(f,2,k) Δpi,j
(f,k)
(1)
= ΔEwi,j mi,j , =
(2.41)
(f,k) (2) ±ΔEwi,j mi−s,j−s ,
(2.42)
where the sign of formula (2.42) depends on embedded bit b, the same as in step E4. From formulas (2.38), (2.41), and (2.42), we can derive AoW, (1) (2) which depends on b and the sign of the product of the masks, mi,j mi−s,j−s ; that is, (f,k) (f,0,k) (f,1,k) (f,2,k) , (2.43) wi,j = δ φ pi,j , pi,j , pi,j (1)
(2)
where φ = φ(b, mi,j mi−s,j−s ) outputs ‘+’ or ‘−’ depending on b and (1)
(2)
mi,j mi−s,j−s in accordance with the rules shown in Table 2.1. 2.5.2
Controlling Watermark Strength in Accordance with Pixel Value Relationships
The control of watermark strength described in Section 2.5.1 can be done using not only the values of the target picture portion but also the relationships between the values of the target portion and the surrounding portions. The latter type of control has been well studied for watermarking using luminance information [6, 7, 21] but not for watermarking using color information, as in our original method. We therefore propose using heuristic knowledge, like that identified by Tajima and Ikeda in their study of color image quantization for limited color display [22]. They investigated how large the quantization steps in the L∗ , u∗ , and v ∗ coordinates could be without degrading picture quality. In their heuristics, the randomness of pixel (i, j) in the k-th cell of the f -th frame is defined as proportional to exp
(f,k)
i,j 512
(f,k)
, and i,j
is
defined as (f,k)
i,j
=
x=±1
d(f,k) [(i, j), (i + x, j)] +
y=±1
d(f,k) [(i, j), (i, j + y)],
(2.44)
2
Dual-Plane Correlation-Based Video Watermarking for Immunity
37
where d(f,k) [(i, j), (i , j )] is the square of the skewed difference between target pixel (i, j) and pixel (i , j ) and is defined as 2 ∗(f,k) ∗(f,k) d(f,k) [(i, j), (i , j )] = 4 Li,j − Li ,j 2 1 2 1 ∗(f,k) ∗(f,k) ∗(f,k) ∗(f,k) ui,j vi,j + − ui ,j + − vi ,j (2.45) . 4 4 The quantization step for representing pixel (i, j) is set to be proportional to the pixel’s randomness. The quantization step corresponds to the strength of the noise in the color information, which can be seen as the strength of the watermark. We assume that the average watermark strength can be increased while maintaining picture quality the watermark strength for each
by setting pixel (i, j) proportional to exp
(f,k)
i,j 512
. AoW is therefore redefined as
(f,k) w ˜i,j
= exp
(f,k)
i,j 512
(f,k)
wi,j .
(2.46)
The research results of Tajima and Ikeda cannot be directly applied to our watermarking in picture portions where pixel values change drastically, such as at boundaries. Some correction is needed if we are to use the results of Tajima and Ikeda. Here we use a simple correction. (f,k)
w ˜i,j
(f,k)
(f,k)
⎧ ⎪ ⎪ ⎨
where ξi,j
(f,k)
= ξi,j wi,j ,
=
(2.47) (f,k)
i,j 512
T if exp ≥ T,
(f,k) ⎪ ⎪ otherwise, ⎩ exp i,j 512
where T is the threshold for the value of exp
(f,k)
i,j 512
(2.48)
. It was set to 2.3 on the
basis of preliminary evaluations in which the picture quality of the standard motion pictures described in Section 2.6 was evaluated with different values of T . 2.5.3
Steps For Controlling Watermark Strength
The proposed use of the HVS model described in Sections 2.5.1 and 2.5.2 can be applied to a wide variety of watermarking methods. We have added its use to our original watermarking method to control the strength of the watermarks. We did this by changing one of the parameters used in the orig(f,d,k) (f,k) is replaced with βγi,j , where β is inal method: watermark strength αi,j (f,k)
the average watermark strength, and γi,j represents the relative acceptability of the watermark at pixel (i, j) in the k-th cell of the f -th frame,
38
I. Echizen et al. (f,k)
(f,k)
γi,j
=
1 Hc Wc
w ˜i,j
i,j
(f,k)
w ˜i,j
,
(2.49)
where Hc and Wc are the width and height of the cell. We also modified steps E3 and E4. Step E3’. Embed M into P (f,1,k) for each k using (f,1,k)
pi,j
(f,1,k)
= pi,j
(f,k)
(1)
+ βγi,j mi,j .
Step E4’. Embed M into P (f,2,k) for each k using (f,2,k) (f,k) (2) pi,j + βγi,j mi−s,j−s if b = 1, (f,2,k) = pi,j (f,2,k) (f,k) (2) pi,j − βγi,j mi−s,j−s if b = 0.
(2.50)
(2.51)
The relative acceptability of the watermark is used to control the watermark strength. The effectiveness of this approach was shown by experimental evaluation.
2.6 Experimental Evaluation The ability of the DPC-based method to maintain picture quality and its ability to detect watermarks after geometric transformations were compared experimentally with those of our original method by using standard motion pictures [23] (each with 300 frames of 720 × 480 pixels) having different properties (see Figure 2.4). • • •
Walk through the Square (“Walk”): people walking in a town square (low-speed panning shot). Rustling Leaves (“Leaves”): leaves fluttering in the breeze (static shot). Whale Show (“Whale”): whale splashing in front of audience (highspeed panning and tilting shot).
The evaluation was done using the parameters and values shown in Table 2.2. 2.6.1
Evaluation of Picture Quality
Procedure Seven copies of each standard picture were watermarked at a different strength for each using the DPC-based method using the HVS described in Section 2.5.3 (“improved method”), and seven were watermarked at a constant strength using the DPC-based method not using the HVS described in
2
Dual-Plane Correlation-Based Video Watermarking for Immunity
39
Table 2.2. Evaluation parameters Planes, P (f,d) s Belt width, Wb Shift vector (s, s) Payload Block size, L × L Cell size, Hc × Wc
U and V planes of YUV color system 8 pixels (10, 10) 64 bits 4 × 4 pixels 136 × 136 pixels
(a) Walk
(b) Leaves
(c) Whale Fig. 2.4. Evaluated pictures Table 2.3. Rating scale Disturbance Points Imperceptible 5 Perceptible but not annoy- 4 ing Slightly annoying 3 2 Annoying 1 Very annoying
Section 2.4.4 (“original method”). The pictures were subjectively evaluated using the procedure described in Recommendation ITU-R BT.500-7 [19]. Each watermarked picture was shown, along with the original picture, to ten evaluators (each an expert in image or video processing) who rated the quality of the watermarked picture in accordance with the scale shown
40
I. Echizen et al.
in Table 2.3. For each picture, the average of the ten scores was used as the quality level. Results As shown in Figure 2.5, the quality of the pictures watermarked using the improved method was consistently judged to be better than that of those watermarked using the original method, except for Whale at a strength of 6. For most of the quality levels evaluated, the improved method enabled the watermark strength to be increased while maintaining picture quality. Table 2.4 shows the watermarking strengths of the original and improved methods at a quality level of 4 (perceptible but not annoying). It shows that the watermark strength could be increased by 81% for Walk, 29% for Leaves, and 14% for Whale without degrading quality. 2.6.2
Robustness against Geometric Transformation
Procedure We evaluated robustness against RST and random distortion using StirMark 3.1 with the default parameters as the representative random distortion5 . The watermarked pictures were transformed by rotation, scaling, clipping and translation, random distortion, and combinations thereof. Then, 64-bit information was sequentially detected using three different accumulation numbers (F = 1, 5, 10) from 300 frames of the watermarked pictures using the detection procedure (steps D1 through D7) described in Section 2.4.4. The detection rates were measured for F = 1, 5, and 10. The watermark strengths of the original and improved methods for each picture were set to the values listed in Table 2.4 so that the corresponding quality level was 4. Results Table 2.5 summarizes the correct-detection ratios for F = 1, 5, and 10 (numbers of detected points were, respectively, 300, 60, and 30). The condition for each geometric transformation is also shown in the table. The correctdetection ratio is the number of correctly detected points, meaning that all 64 bits were correctly detected, divided by the total number of detected points (300, 60, or 30). For all pictures and for all cases evaluated, the improved method had a higher or equal detection ratio, meaning that watermark survivability was improved. For F = 1 (non accumulation), the detection ratio of the improved method varied from 9.0% to 100%. For F = 5 and 10, the ratio varied from 63.3% to 100% and from 80.0% to 100%. We can thus infer 5
Although StirMark Benchmark 4.0 was recently released, we used 3.0 because most research in this area has been done using the older version, so its use makes our results easier to compare with those of other evaluations.
2
Dual-Plane Correlation-Based Video Watermarking for Immunity
41
5
5 4 Quality level 3
Original Improved
4 Quality level 3 2
2 1
2 3 4 5 6 Watermark strength
7
1
(a) Walk
2 3 4 5 6 7 Watermark strength
(b) Leaves 5
4 Quality level 3 2 1
2 3 4 5 6 7 Watermark strength
(c) Whale Fig. 2.5. Quality of watermarked pictures Table 2.4. Watermark strengths of original and improved methods at quality level of 4
Original Improved
Walk Leaves Whale 3.61 3.55 4.03 6.53 4.58 4.60
that the improved method can be high enough for practical use when frame accumulation is used. Due to space limitations, we focus on the results for Walk and give a detailed description for each transformation. Rotation: As shown in Fig. 2.6(a), the NF was minimum at −5 deg, which correctly canceled a +5 deg. rotation attack . Scaling: The NF was minimum at 166% scaling, which correctly canceled a 60%-scaling attack (Fig. 2.6(b)). Clipping and translation: As shown in Figs. 2.6(c) and (d), the NF was maximum at 200 and 80, which was the correct starting point of the cell. Combination of random distortion, rotation, scaling, and translation: Rotation and scaling were first canceled without canceling random distortion and translation. Figure 2.7(a) shows that the NF for the angle and size pair was minimum at the correct angle (−5 deg) and size (166%).
42
I. Echizen et al. Table 2.5. Evaluation results for RST and random distortion Transformation Sample Random distortion
Rotation
Scaling
Condition
Walk
NA
Leaves
NA
Whale
NA
Walk
+5 deg
Leaves
+20 deg
Whale
−10 deg
Walk
60%
Leaves
125%
Whale
75%
Clipping Walk and translation Leaves Whale Combination of Walk all four Leaves Whale
(200, 80) through (600, 380) (170, 80) through (570, 380) (150, 50) through (550, 350) Use four above conditions for Walk Use four above conditions for Leaves Use four above conditions for Whale
Method Detection ratio (%) F = 1 F = 5 F = 10 Original 44.0 48.3 56.6 Improved 99.3 100 100 Original 28.0 60.0 90.0 Improved 83.3 96.6 100 Original 94.3 98.3 96.6 Improved 100 100 100 Original 97.6 100 100 Improved 100 100 100 Original 51.3 91.6 100 Improved 94.3 100 100 Original 100 100 100 Improved 100 100 100 Original 84.6 95.0 96.6 Improved 100 100 100 Original 53.6 98.3 100 Improved 96.6 100 100 Original 94.6 100 100 Improved 100 100 100 Original 20.3 36.6 56.6 Improved 95.6 98.3 100 Original 19.0 55.0 80.0 Improved 72.3 93.3 100 Original 67.0 80.0 90.0 Improved 89.6 91.6 90.0 Original 0.0 16.6 30.0 Improved 53.6 75.0 80.0 Original 5.3 50.0 68.3 Improved 9.0 63.3 93.3 Original 53.0 83.3 76.6 Improved 69.3 88.3 93.3
Translation was then canceled without canceling random distortion. Figures 2.7(b) and (c) shows that the NFs for the horizontal and vertical lines were maximum at the correct starting point of the cell. Embedded bits were then detected without canceling random distortion. Notice that, in the DPC-based method, the search for rotation and scaling is independent of that for translation, the searches for horizontal and vertical translations are mutually independent, and search for random distortion is not needed. Search space and processing time: Assume that the numbers of searches for random distortion, rotation, scaling, horizontal translation, and
2
Dual-Plane Correlation-Based Video Watermarking for Immunity
25
20
20
%][ F 15 N
]% 15 [ F N 10
10 5 -15
43
NF -10
-5 Angle [degree]
0
5
NF
5 140
150
160
170
180
Scale [%]
(a) Angle search
(b) Scaling search
30
25
25
20
]% 20 [ F N 15
]% 15 [ FN 10 5
10
vertical belt
5 160
180
200
220
Offset [pixel]
(c) Offset search (vertical belt)
240
0
horizontal belt 40
60
80
100
120
Offset [pixel]
(d) Offset search (horizontal belt)
Fig. 2.6. NFs for angle, scaling, and offset searches
vertical translation are Sd , Sr , St , Sht , and Svt . If the searches are fully combinatorial, the size of the search space per accumulated picture frame is Sd Sr St Sht Svt . In the DPC-based method, however, the search for random distortion is not needed, and those for RST are performed independently, so the size of the search space is Sr St + Sht + Svt , which is drastically smaller than that in the fully combinatorial case. We developed a software prototype of the DPC-based method and measured the processing time for 300 frames of Walk using a PC (Intel Pentium 4/2.8 GHz). Note that, in actual use, the process for detecting geometric transformation in step D6 (Section 2.4.4) is done only in the first accumulated frame, and the following detection process reuses the geometric parameters detected in this step. This is because actual geometric transformation for videos is not likely to change in the middle of sequences. The measured processing time under the above condition for detecting geometric transformation was 15.4 sec. The total processing time depended on the accumulation number (F = 1, 10, 30): the average time per frame with F = 1 (non accumulation) was 0.879 sec., that with F = 10 was 0.146 sec., and that with F = 30 was 0.092 sec. These results demonstrate that the DPC-based method effectively uses frame accumulation and that it is suitable for various applications such as illegal copy tracing.
44
I. Echizen et al.
17 16 15 14 13
%][ FN
12 11 10 9
-15
-10
-5
Angle [degree]
0
1556
161
171
166
176
Scale [%]
(a) Angle and size search 30
35 30
]% 25 [ FN
]% [ 25 FN
20
20 15
vertical belt 160
180
200
220
Offset [pixel]
(b) Offset search (vertical belt)
15 240
horizon tal belt 40
60
80 Offset [pixel]
100
120
(c) Offset search (horizontal belt)
Fig. 2.7. NFs for combinatorial transformation search
2.7 Conclusion We have developed a dual-plane correlation (DPC)-based video watermarking method that embeds watermarks into the U and V planes to improve immunity against rotation, scaling, and translation transformations and randomdistortion geometric transformations. Special watermark patterns called Cells are used to reduce the search space and to treat clipping. We have also developed a human visual system model for use with the DPC-based watermarking. Watermark strength is determined in the L*u*v* space, where human-perceived degradation of picture quality can be measured in terms of the color difference. The watermarks are actually embedded and detected in the YUV space, where watermarks can be detected more reliably than in the L*u*v* space. Subjective evaluation using three standard motion
2
Dual-Plane Correlation-Based Video Watermarking for Immunity
45
pictures having different properties showed that using the DPC-based method using HVS results in higher picture quality and that, for the same picture quality, up to 81% more watermarks can be embedded. Testing using a prototype showed that the watermarks were robust against image processing (rotation, scaling, clipping and translation, random distortion, and combinations thereof) and that the processing time was sufficiently short for practical application.
References 1. Atomori, Y., Echizen, I., Dainaka, M., Nakayama, S., Yoshiura, H.: Robust video watermarking based on dual-plane correlation for immunity to rotation, scale, translation, and random distortion. Journal of Digital Information Management 6, 161–167 (2008) 2. Bas, P., Chassery, J.-M., Macq, B.: Geometrically invariant watermarking using feature points. IEEE Trans. Image Processing 11, 1014–1028 (2002) 3. Bender, W., Gruhl, D., Morimoto, N., Lu, A.: Techniques for data hiding. IBM Systems Journal 35, 313–336 (1996) 4. Choi, Y., Aizawa, K.: Digital watermarking technique using block correlation of DCT coefficients. IEICE Trans. Inf.& Syst. J83-D2, 1620–1627 (2000) 5. Commission Internationale de l’Eclairage, Colorimetry, CIE 15 (2004) 6. Delaigle, J.F., Devleeschouwer, C., Macq, B.: Watermarking algorithm based on a human visual model. Signal Processing 66, 319–335 (1998) 7. Delaigle, J.F., Devleeschouwer, C., Macq, B., Lagendijk, I.: Humans visual system features enabling watermarking. In: Proc. IEEE International Conference on Multimedia & Expo, vol. 2, pp. 489–492 (2002) 8. Echizen, I., Atomori, Y., Nakayama, S., Yoshiura, H.: Use of human visual system to improve video watermarking for immunity to rotation, scale, translation, and random distortion. Circuits, Systems and Signal Processing 27, 213–227 (2008) 9. Fairchild, M.: Color Appearance Models. Addison-Wesley, Reading (1998) 10. Fleet, D.J., Heeger, D.J.: Embedding invisible information in color images. In: Proc. ICIP 1997, vol. 1, pp. 532–535 (1997) 11. Hisanaga, R., Kuribayashi, M., Tanaka, H.: Improvement of watermark detection ability based on blockwise image compensation for geometrical distortions. In: Wiley, Electronics and Communications in Japan, Part III: Fundamental Electronic Science, vol. 89, pp. 1–9 (2006) 12. Iwata, M., Shiozaki, A.: Improvement of watermark robustness against affine transform by Hilbert scanning key generation and local search. IEICE Trans. Inf.& Syst. J84-D2, 1351–1359 (2001) 13. Kuribayashi, M., Tanaka, H.: Watermarking schemes using the addition property among DCT coefficients. In: Wiley, Electronics and Communications in Japan, Part III: Fundamental Electronic Science, vol. 86, pp. 11–23 (2003) 14. Lin, C.Y., Wu, M., Bloom, J.A., Cox, I.J., Miller, M.L., Lui, Y.M.: Rotation, scale, and translation resilient public watermarking for images. In: Proc. SPIE, vol. 3971, pp. 90–98 (2000) 15. Miyazaki, A.: Digital watermarking for images–its analysis and improvement using digital signal processing technique. IEICE Trans. Fundamentals E-85A, 582–590 (2002)
46
I. Echizen et al.
16. Nakamura, T., Ogawa, H., Tomioka, A., Takashima, Y.: An improvement of watermark robustness against moving and/or cropping the area of the image. In: Proc. SCIS 1999, pp. 193–198 (1999) 17. O’Ruanaidh, J.J.K., Pun, T.: Rotation, scale and translation invariant spread spectrum digital image watermarking. Signal Processing 66, 307–317 (1998) 18. Powell, R.D., Nitzberg, M.J.: Method for encoding auxiliary data within a source signal, United State Patent 6,385,330 (2002) 19. Rec. ITU-R, BT.500-7: Methodology for the subjective assessment of the quality of television pictures (1995) 20. Shiozaki, A., Tanimoto, J., Iwata, M.: A digital image watermarking scheme withstanding malicious attacks. IEICE Trans. Fundamentals E83-A, 2015–2022 (2000) 21. Swanson, M., Kobayashi, M., Tewfik, A.: Multimedia data-embedding and watermarking technologies. Proc. IEEE 86, 1064–1087 (1998) 22. Tajima, J., Ikeda, T.: High quality color image quantization, utilizing human vision characteristics. Journal of IIEEJ 18, 293–301 (1989) 23. The Institute of Image Information and Television Engineers: Evaluation video sample (standard definition) 24. Yoshiura, H., Echizen, I.: Color picture watermarking correlating two constituent planes for immunity to random geometric distortion. IEICE Trans. on Information & Systems E87-D, 2239–2252 (2004)
3 Restoring Objects for Digital Inpainting Yung-Chen Chou1 and Chin-Chen Chang1,2 1
2
Department of Computer Science and Information Engineering National Chung Cheng University, Chiayi 62102, Taiwan, R.O.C.
[email protected] Department of Information Engineering and Computer Science Feng Chia University, Taichung 40724, Taiwan, R.O.C.
[email protected] Summary. The technique of digital image inpainting is used to repair scratches or stains in aged images or films. Furthermore, inpainting can also be used to remove selected objects from a video. Video inpainting is more difficult than image inpainting because video inpainting considers two or more frames with the same or different backgrounds. In order to prevent an unexpected user from eliminating objects in a video, the technique of data embedding is used to embed object data into the video. This paper proposes a framework that combines object segmentation and data embedding. The experimental results show that the stego video carries object data with high visual quality and the lost objects can be clearly restored from the inpainted stego video.
3.1 Introduction Digital inpainting is a technique for repairing damaged areas of images or videos [8]. Digital inpainting can also be used to remove selected objects from videos (called video inpainting). For instance, a person standing on the beach can be removed from the video frames using inpainting. For secure monitor applications, objects are very important for proving illegal actions. If an unexpected user removes illegal behavior, then it will be difficult for judges to find evidence. To consider this, we propose an object data preserving framework to embed object information into the video frame using data embedding methods. Inpainting is a useful technique for repairing scratches or contamination of aged photos or videos. Bertalmio et al. proposed an automatic inpainting method to inpaint damaged pictures using surrounding information [1]. Yamauchi et al. proposed an image inpainting method that fills colors into a missing region and also considers the texture and structure of the mission region [14]. Yamauchi et al.’s method translates the image from the spatial domain into the frequency domain using discrete cosine transform (DCT). J.-S. Pan et al. (Eds.): Information Hiding and Applications, SCI 227, pp. 47–61. c Springer-Verlag Berlin Heidelberg 2009 springerlink.com
48
Y.-C. Chou and C.-C. Chang
For high frequency bands, multi-level texture synthesis is applied; for low frequency bands, diffusion is used. The missing region inpainted by Yamauchi et al.’s method is more smoothly than that by Bertalmio et al.’s method. Video inpainting is more difficult than image inpainting because video inpainting considers two or more frames with the same or different backgrounds. That is, video inpainting may lead to shadows in the inpainted video. For object removal applications, digital inpainting is used to remove objects from images or videos by filling in colors into the eliminated region. Shih et al. proposed several techniques for video inpainting [9, 10, 11, 12]. Shih et al.’s method considers the temporal factor of video and effectively eliminates shadows in the inpainted video. Data embedding techniques, also called steganographic techniques, are used to embed secret messages into cover images [2, 3, 4, 5]. Because of the redundant space of digital images, a cover image can be used to carry secret data without attracting unexpected users’ attention. In other words, the distortion between stego image and original image can not be distinguished by human eyes. These techniques can be briefly divided into reversible and irreversible. For reversible data embedding, the original image can be restored from the stego image after extracting the secret data. For irreversible data embedding, the cover image cannot be restored after the secret data have been extracted. The reversibility of data embedding methods is related to the accuracy of the cover image, for instance, an x-ray image of patient carrying the case history for distance treatment and the x-ray image must be exactly reconstructed on the receiver side. For irreversible data embedding, Chan and Cheng proposed a simple data embedding method to embed secret data into cover images [2]. This method modifies k least significant bits (LSB) of pixels to imply the secret data with optimal substitution rules in order to reduce the distortion of the stego image. Mielikainen proposed an LSB-based data embedding method called the LSB matching revisited method [7]. This method pairs the pixels of the cover image; each pair embeds two secret bits by increasing or decreasing one pixel by one at most. The visual quality of the stego image produced by Mielikainen’s method is better than the stego image produced by Chan and Cheng’s method. Zhang and Wang proposed an exploiting modification direction method (EMD) to embed one digit of the 2n + 1 numeral system into a group containing n pixels [15]. This method embeds a secret digit into the pixel group by increasing or decreasing one of the pixels in the group by one. The visual quality of stego images produced by Zhang and Wang’s method is better than the stego images produced by Mielikainen’s method. For reversible data embedding, Tian utilized the concept of difference expansion to modify the pixels in order to imply the secret data [13]. This technique successfully achieves secret data delivery and completely restores the cover image. Tian’s method is limited in terms of payload because the method needs to remember the extra information for reversibility. Furthermore, Chang and Lu proposed a data embedding method to improve Tian’s
3
Restoring Objects for Digital Inpainting
49
method in terms of payload [4]. Chang and Lu’s method uses m neighboring pixels to conceal secret data and successfully restore the cover image. Chang and Lin proposed a reversible data embedding method for VQ-compressed images [3]. Their method applies the concept of side-match with relocation to modify the indices of the compressed image and to carry the secret message. After the secret data have been extracted, the original image can be restored. In this Chapter, an object data preserving framework is proposed to embed objects into video frames. Object segmentation and data embedding are two main components in the proposed framework. Any suitable object segmentation method can be applied to segment the objects, which are denoted as the interested objects. A suitable object segmentation method means that the interested object can be segmented by a user or an automatic segmentation algorithm; the object data are then embedded into frames of video. From the experimental results, the proposed framework not only successfully embeds the object data with tiny distortion but also successfully restores the objects. The rest of this Chapter is organized as follows. Section 3.2 briefly describes the concept of image inpainting and simple LSB substitution. The proposed framework of object data embedding is detailed in Section 3.3. Experimental results are given in Section 3.4. Conclusions are made in Section 3.5.
3.2 Related Works 3.2.1
Digital Inpainting
The drawback of video inpainting is the resulting shadow of the inpainted video. In order to improve the visual quality of inpainted video, Shih et al. proposed a multi-resolution image inpainting technique [12]. The inpainting regions can be completed using the following two steps. Step 1: Divide the image into many blocks of different resolutions; that is, the image is divided many times into different sized blocks. Step 2: Inpaint a target block by using one of following rules: Rule 1: If a suitable block is found from the small-sized blocks, then the target region is patched using that block. Rule 2: If a suitable block is not found in small-sized blocks, then the size of block is enlarged, and find a suitable block from large-sized blocks. Rule 3: If the target region cannot find a suitable block, then the interpolation strategy is applied to inpaint the region. These two steps are repeated until all of the missing regions have been inpainted. On the other hand, Shih et al. proposed an inpainting algorithm to improve the visual quality of inpainted video of the continuous large area [10]. Shih et al. considered the image properties to develop an exemplar-based image inpainting method for the continuous large missing region inpainting. Their
50
Y.-C. Chou and C.-C. Chang
method applies the technique of edge detection to classify the types of blocks; then the missing region is patched using the search procedure. After that, if a block in the region is predicted as the edge block, then the target region is patched by a suitable block found from the edge block pool; otherwise, a suitable block is found from the smooth block pool. Since, Shih et al.’s method considers the attributes of color distribution and structure information at the same time, the visual quality of the continuous large missing region has been significantly improved. 3.2.2
Simple LSB Substitution
Data embedding is a widely used technique for secret data delivery. The technique uses a digital image as a cover medium to carry the secret message. The embedded image, also called the stego image, is sent to the receiver over a public computer network without attracting unintended users’ attention. Embedding the secret message into a pixel by substituting its k least significant bits is a simple data embedding method. In order to improve the visual quality of the stego image, Chan and Cheng proposed a data embedding method [2] that uses an optimal pixel adjustment process when embedding the secret message. Chan and Cheng’s method is described as follows. Let I be the grayscale cover image with H × W pixels, represented as I = {p_ij | 0 ≤ i < H, 0 ≤ j < W, p_ij ∈ {0, 1, . . . , 255}}. The set S of secret data is represented as {s_i | 0 ≤ i < n, s_i ∈ {0, 1}}, where n is the number of secret bits. First, the secret data are rearranged as the set S′ = {s′_i | 0 ≤ i < n′, s′_i ∈ {0, 1, . . . , 2^k − 1}}, where n′ ≤ H × W. Second, a set {x_i | 0 ≤ i < n′, x_i ∈ {0, 1, . . . , 255}} of pixels is selected from the cover image using a predefined choosing strategy. Then, each selected pixel is modified using Eq. (3.1):

x′_i = x_i − (x_i mod 2^k) + s′_i, (3.1)

where x′_i represents the modified pixel, also called the stego pixel. The embedding procedure is repeated until all of the secret message has been embedded. Data extraction is the reverse of data embedding. First, a pixel selection strategy is applied to select the pixels from the stego image for extracting the secret data; the pixel selection strategies of data embedding and extraction are the same. For each selected pixel, the secret data are extracted using s′_i = x′_i mod 2^k. After all of the secret data have been extracted, the message can be converted back to the original.
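As a concrete illustration, here is a minimal sketch of the k-LSB substitution of Eq. (3.1) and its extraction, assuming the pixels have already been selected and the secret has been regrouped into 2^k-ary symbols; Chan and Cheng's optimal pixel adjustment step is omitted for brevity.

```python
import numpy as np

def lsb_embed(pixels, symbols, k):
    """Replace the k least significant bits of each selected pixel
    with one 2^k-ary secret symbol, as in Eq. (3.1)."""
    pixels = np.asarray(pixels, dtype=np.int32)
    stego = pixels - (pixels % (2 ** k)) + np.asarray(symbols)
    return stego.astype(np.uint8)

def lsb_extract(stego, k):
    """Recover the secret symbols from the stego pixels."""
    return np.asarray(stego, dtype=np.int32) % (2 ** k)

# Hide the 2^3-ary symbols [5, 2, 7] in three pixels using k = 3.
stego = lsb_embed([120, 37, 201], [5, 2, 7], k=3)
assert list(lsb_extract(stego, k=3)) == [5, 2, 7]
```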
3.3 Object Data Preserving Framework
In order to preserve the object data in the video, we propose a framework for concealing object information in video frames. Based on our framework,
Fig. 3.1. Flowchart of the proposed framework
the lost object can be reconstructed from an inpainted stego video. Briefly, the proposed framework can be divided into an embedding phase and an object reconstruction phase (see Fig. 3.1). The details of these phases are described in Subsections 3.3.1 and 3.3.2, respectively.
3.3.1 Embedding Object Data
Let us assume that a video V is represented as {f_i | 0 ≤ i < N_f}, where N_f is the number of frames of V. The objects are segmented using any suitable segmentation method or user intervention. The object size is reduced as follows: the colors of the objects are replaced using a color palette. Thus, a color palette generator is needed to produce a representative palette for the objects. Here, the color palette can be obtained using the LBG training algorithm [6]. Each color is replaced by the palette color with the smallest Euclidean distance from the original color, so each object color is then represented by an index and the object size is significantly reduced. The object data are composed of color information and pixel locations. In general, the object region is smaller than the background region, so the objects’ locations can be recorded using the pixels’ coordinates. An object pixel is represented as (x, y, c), where x and y are the coordinates of a pixel with color index c. To summarize, the object data consist of four parts: the number of colors in the color palette, the number of pixels in the object, the color palette content, and the pixel coordinates with color indices (see Fig. 3.2). Here, NS denotes the number of symbols of the converted object data.
Fig. 3.2. The structure of object’s data stream
The key steps of object data generation are summarized as follows. In Step 4, NC and NP represent the number of colors in the color palette and the number of object pixels, respectively.

Procedure 1: Object data generation
Input: Selected object O
Output: Data stream of object S
Step 1: Set all pixels of object O as training pixels.
Step 2: Apply the LBG algorithm to generate the color palette CP containing NC colors.
Step 3: For each object pixel, find the most similar color in CP.
Step 4: Output NC || NP || {C_i | 0 ≤ i < NC} || {p_i | 0 ≤ i < NP}.

The generated object data are represented in binary format. According to our data embedding procedure, the generated object data are converted to the 2^k numeral system; that is, every k bits are converted into one 2^k-ary digit. Thus, the converted object data are represented as S = {s_1 s_2 . . . s_NS}, where NS is the number of symbols of the converted object data. Next, the data embedding procedure is applied to embed the object data into a video frame. Here, a video frame is composed of red, green, and blue planes. To exploit the insensitivity of the human eye to certain colors, the sequence of color planes used to embed the object data is blue, then green, then red. Blue is the color the human eye is least sensitive to; that is, the eye can hardly perceive the distortion of a frame whose blue plane has been modified. Likewise, the eye is less sensitive to green than to red. Data embedding is completed using the following steps, although any suitable data embedding method could be applied. A simple data embedding method is described as follows. First, a pixel selection strategy is used to select NS pixels from frame f_i to form P = {p_1, p_2, . . . , p_NS}, excluding pixels located in the object. After that, the object data symbol s_i is embedded into p_i using p′_i = p_i − (p_i mod 2^k) + s_i. Furthermore, the embedding procedure is applied to all frames in the video to completely embed the object data. Here, a video embedded with the object data is called a stego video. The key steps of object embedding are summarized as follows.

Procedure 2: Data embedding
Input: A frame f_i of the video with object data stream S
Output: A stego frame f′_i of the video
Step 1: Set i = 0.
Step 2: Select NS pixels to form P = {p_1, p_2, . . . , p_NS} for data embedding, excluding pixels located in the objects’ region.
Step 3: Embed s_i into p_i using p′_i = p_i − (p_i mod 2^k) + s_i.
Step 4: If i < NS, then i = i + 1 and go to Step 3.
Step 5: Output the stego frame f′_i.
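The following sketch puts Procedures 1 and 2 together. It is our own illustrative reading of the scheme: the helper names and field widths are hypothetical, the palette is assumed to be already trained (standing in for the LBG algorithm), and a simple raster-order scan outside the object mask stands in for the unspecified pixel selection strategy.

```python
import numpy as np

def object_to_symbols(palette, obj_pixels, k=3):
    """Procedure 1 (sketch): map each object pixel to its nearest palette
    color, serialize NC || NP || palette || (x, y, c) records as bits, and
    regroup the bits into 2^k-ary symbols. Field widths are illustrative."""
    bits = []
    def put(value, width):
        bits.extend(int(b) for b in format(value, f'0{width}b'))
    put(len(palette), 8)                  # Part 1: NC, number of colors
    put(len(obj_pixels), 24)              # Part 2: NP, number of pixels
    for r, g, b in palette:               # Part 3: palette content
        put(r, 8); put(g, 8); put(b, 8)
    pal = np.asarray(palette, dtype=np.float64)
    for x, y, color in obj_pixels:        # Part 4: coordinates + color index
        c = int(np.argmin(((pal - np.asarray(color)) ** 2).sum(axis=1)))
        put(x, 16); put(y, 16); put(c, 8)
    bits += [0] * (-len(bits) % k)        # pad to a whole number of symbols
    return [int(''.join(map(str, bits[i:i + k])), 2)
            for i in range(0, len(bits), k)]

def embed_in_plane(plane, symbols, object_mask, k=3):
    """Procedure 2 (sketch): raster-order pixel selection outside the
    object region, then k-LSB substitution of each symbol (cf. Eq. 3.1)."""
    stego = plane.astype(np.int32).copy()
    selected = ((i, j) for i in range(plane.shape[0])
                       for j in range(plane.shape[1]) if not object_mask[i, j])
    for (i, j), s in zip(selected, symbols):
        stego[i, j] = stego[i, j] - (stego[i, j] % (2 ** k)) + s
    return stego.astype(np.uint8)
```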
3.3.2 Object Data Restoring
The lost object can be restored once the object data have been extracted from the stego video. Object data extraction is the reverse of object data embedding; that is, the object data are extracted from the stego frames and the object is repainted on the frames. Thus, object data restoring consists of two phases, namely object data extraction and object repainting. The object data are extracted as follows. First, the predefined pixel selection strategy is used to select the pixels carrying the object data from the stego frame. Next, the object data are extracted using Eq. (3.2):

s_i = p′_i mod 2^k, (3.2)

where p′_i and s_i represent the stego pixel in the frame and the extracted object data symbol, respectively. After all of the object data have been extracted, they are converted from the 2^k numeral system back into binary format. The extracted object data can then be used to restore the lost object. For object restoration, the color palette is reconstructed first. Since Part 3 of the object data stream (see Fig. 3.2) is the content of the color palette, the palette can be reconstructed exactly as in the object data generation phase. After that, the objects can be restored according to Part 4 of the object data stream (see Fig. 3.2). Finally, the lost objects are restored by applying this procedure to all frames in the stego video. The key steps of data extraction are summarized as follows.

Procedure 3: Extracting object data and restoring the object
Input: A stego frame f′_j
Output: Restored frame f_j
Step 1: Select the stego pixels P = {p_1, p_2, . . . , p_NS} from f′_j using the same pixel selection strategy as in the data embedding phase.
Step 2: Set i = 0 and r = 0.
Step 3: Extract the object data using s_i = p′_i mod 2^k.
Step 4: If all object data have been extracted, go to Step 5; else set i = i + 1 and go to Step 3.
Step 5: Convert the extracted data from the 2^k numeral system into binary format.
Step 6: Reconstruct the color palette CP.
Step 7: Restore the object using Part 4 of the extracted object data.
Step 8: Output f_j.
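A matching sketch of Procedure 3, under the same assumptions as the embedding sketch above (raster-order pixel selection outside the object mask; the field widths are our hypothetical choices):

```python
def extract_object_data(stego_plane, object_mask, k=3):
    """Procedure 3 (sketch): re-select the same pixels, read back the
    2^k-ary symbols via Eq. (3.2), and decode the four-part data stream."""
    bits = []
    for i in range(stego_plane.shape[0]):
        for j in range(stego_plane.shape[1]):
            if not object_mask[i, j]:
                s = int(stego_plane[i, j]) % (2 ** k)   # Eq. (3.2)
                bits.extend(int(b) for b in format(s, f'0{k}b'))
    pos = 0
    def take(width):
        nonlocal pos
        value = int(''.join(map(str, bits[pos:pos + width])), 2)
        pos += width
        return value
    nc, npix = take(8), take(24)                          # Parts 1 and 2
    palette = [(take(8), take(8), take(8)) for _ in range(nc)]      # Part 3
    pixels = [(take(16), take(16), take(8)) for _ in range(npix)]   # Part 4
    return palette, pixels  # repaint: frame[x, y] = palette[c] for (x, y, c)
```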
3.4 Experimental Results
To evaluate the effectiveness of the proposed framework, we implemented the procedures in MATLAB 7.0 on an Intel 3.60 GHz CPU with 1 GB of RAM. The visual quality of the restored frame is an important factor
Fig. 3.3. Test videos: (a)–(e) one frame from each of videos 1–5
for measuring the effect of object restoration. We adopt the peak signal-to-noise ratio (PSNR) to measure the visual quality of the restored frame, so as to avoid subjective evaluation by human eyes. The PSNR value represents the degree of similarity between a restored frame and the original frame. In other words, a large PSNR value means that the restored frame is very similar to the original frame; on the contrary, a small PSNR value indicates that the restored frame is dissimilar to the original frame. PSNR is calculated using the following equations:
Fig. 3.4. Selected objects: (a)–(e) objects selected from Figs. 3.3(a)–(e)
PSNR = 10 log₁₀ (255² / MSE), (3.3)

where

MSE = (1 / (H × W × 3)) Σ_{x=0}^{H−1} Σ_{y=0}^{W−1} Σ_{r=0}^{2} (f_xyr − f′_xyr)². (3.4)

Here, MSE is the mean square error between two frames; x and y represent the pixel coordinates in the frame; r represents the color plane of the frame; and f_xyr and f′_xyr represent the pixel values of frames f and f′, respectively.
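For reference, the computation of Eqs. (3.3) and (3.4) takes only a few lines (a sketch assuming 8-bit RGB frames held as NumPy arrays):

```python
import numpy as np

def psnr(original, restored):
    """PSNR between two H x W x 3 frames (Eqs. 3.3 and 3.4)."""
    diff = original.astype(np.float64) - restored.astype(np.float64)
    mse = np.mean(diff ** 2)          # averages over all H * W * 3 values
    return 10 * np.log10(255.0 ** 2 / mse)
```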
Fig. 3.5. Inpainted video frames: (a)–(e) frames inpainted from Figs. 3.4(a)–(e)
Moreover, the object data are embedded into the k least significant bits of each pixel; here, we let k = 3. Fig. 3.3 shows the five videos used in our simulation. Each video frame consists of 320 × 240 pixels, and the numbers of frames of the test videos are 39, 77, 62, 208, and 305, respectively.
Fig. 3.6. Restored video frames: (a) restored from Fig. 3.5(a), PSNR = 50.54 dB; (b) from Fig. 3.5(b), PSNR = 53.99 dB; (c) from Fig. 3.5(c), PSNR = 59.39 dB; (d) from Fig. 3.5(d), PSNR = 60.34 dB; (e) from Fig. 3.5(e), PSNR = 59.05 dB
Fig. 3.4 shows the objects selected from the frames in Fig. 3.3. For instance, the red area in Fig. 3.4(a) represents the selected object.
Fig. 3.7. Inpainted frames for video 1: frames 1, 5, 10, 15, 20, 25, 30, 35, and 39
Fig. 3.8. Restored frames for video 1: frames 1, 5, 10, 15, 20, 25, 30, 35, and 39
Fig. 3.5 shows the video frames inpainted using Shih et al.’s method [10]. The objects were successfully removed. Fig. 3.6 shows the video frames restored from Fig. 3.5. The PSNR values of the restored video frames are all higher than 50 dB, which means that their visual quality is very close to that of the original video frames. Furthermore, the average PSNR values of the test videos are 49.04 dB, 53.23 dB, 59.24 dB, 60.41 dB, and 59.39 dB, respectively. In general, a user cannot distinguish the difference between a restored frame and the original frame when the PSNR value is higher than 30 dB. From Fig. 3.6, the inpainted objects are not only successfully restored but also highly similar to the objects in the original video frames. Fig. 3.7 shows a subset of the inpainted frames of video 1, and Fig. 3.8 shows the corresponding restored frames. From Fig. 3.8, the missing object has been successfully restored. Thus, we can conclude that a missing object can be clearly restored under the proposed framework.
3.5 Conclusions
Object elimination is an extended application of digital image inpainting. To prevent permanent object elimination, we propose an object restoring framework that conceals object information in video frames. The proposed framework uses a data embedding technique to conceal the object information in the background region. The experimental results show that the human eye can hardly distinguish the stego frame from the original frame. Thus, we can conclude that the proposed framework not only successfully embeds object data into video frames but also restores visible and clear objects.
References
1. Bertalmio, M., Sapiro, G., Caselles, V., Ballester, C.: Image inpainting. In: Proceedings of ACM SIGGRAPH Conference on Computer Graphics, Louisiana, U.S.A., pp. 417–424 (2000)
2. Chan, C.K., Cheng, L.M.: Hiding data in images by simple LSB substitution. Pattern Recognition 37, 469–474 (2004)
3. Chang, C.C., Lin, C.Y.: Reversible steganography for VQ-compressed images using side matching and relocation. IEEE Trans. Information Forensics and Security 1, 493–501 (2006)
4. Chang, C.C., Lu, T.C.: A difference expansion oriented data hiding scheme for restoring the original host images. The Journal of Systems and Software 79, 1754–1766 (2006)
5. Chang, C.C., Tai, W.L., Lin, C.C.: A reversible data hiding scheme based on side match vector quantization. IEEE Trans. Circuits and Systems for Video Technology 16, 1301–1308 (2006)
6. Linde, Y., Buzo, A., Gray, R.M.: An algorithm for vector quantizer design. IEEE Trans. Communications COM-28, 84–95 (1980)
7. Mielikainen, J.: LSB matching revisited. IEEE Signal Processing Letters 13, 285–287 (2006)
8. Oliveira, M.M., Bowen, B., McKenna, R., Chang, Y.S.: Fast digital image inpainting. In: Proceedings of International Conference on Visualization, Imaging and Image Processing (VIIP 2001), Marbella, Spain, pp. 261–266 (2001)
9. Shih, T.K., Chang, R.C., Lu, L.C.: Multi-layer inpainting on Chinese artwork. In: Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME 2004), Taipei, Taiwan, pp. 26–30 (2004)
10. Shih, T.K., Lu, L.C., Chang, R.C.: Multi-resolution image inpainting. In: Proceedings of the 2003 IEEE International Conference on Multimedia and Expo (ICME 2003), Baltimore, USA, pp. I-485–I-488 (2003)
11. Shih, T.K., Tang, N.C.: Digital inpainting. In: Proceedings of the First International Conference on Ubiquitous Information Management and Communication, Suwon, Korea, pp. 195–205 (2007)
12. Shih, T.K., Tang, N.C., Yeh, W.S., Chen, T.J.: Video inpainting and implant via diversified temporal continuations. In: Proceedings of 2006 ACM Multimedia Conference, California, U.S.A., pp. 23–27 (2006)
13. Tian, J.: Reversible data embedding using a difference expansion. IEEE Trans. Circuits and Systems for Video Technology 13, 831–841 (2003)
14. Yamauchi, H., Haber, J., Seidel, H.P.: Image restoration using multiresolution texture synthesis and image inpainting. In: Proceedings of Computer Graphics International, Tokyo, Japan, pp. 120–125 (2003)
15. Zhang, X., Wang, S.: Efficient steganographic embedding by exploiting modification direction. IEEE Communications Letters 10, 1–3 (2006)
4 A Secure Data Embedding Scheme Using Gray-Code Computation and SMVQ Encoding
Chin-Chen Chang¹,³, Chia-Chen Lin², and Yi-Hui Chen³
¹ Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724, Taiwan, R.O.C.
² Department of Computer Science and Information Management, Providence University, Taichung 43301, Taiwan, R.O.C.
³ Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi 621, Taiwan, R.O.C.
{ccc,chenyh}@cs.ccu.edu.tw, [email protected]

Summary. Data hiding is widely used for concealing secrets in images so that senders can securely transfer these secrets to the receivers. In data embedding schemes, the image quality, hiding capacity, and compression ratio, as well as the security of the hidden data, are all critical issues. Instead of using traditional data encryption techniques to encrypt secret data in the preprocessing phase, this chapter proposes a hybrid data embedding scheme using SMVQ and gray-code computation. The experimental results demonstrate that the proposed scheme can achieve reasonable image quality in decoded images and a reasonable compression ratio while guaranteeing the security of the hidden data.

Keywords: Data hiding, SMVQ, gray-code computation, security.
4.1 Introduction
Secure data transmission has become an important issue in recent years. In addition to traditional data encryption techniques, data hiding has become another approach for users to secure data transmission. The significant difference between traditional data encryption and data hiding is that the latter approach can transmit data secretly without arousing malicious attackers’ suspicion. In data hiding schemes, the cover media can be images, text files, and videos, although usually an image is the simplest medium used for embedding a large amount of secret data. The embedded image is often called a stego-image. Once secret data are embedded into a cover image, the data become invisible to attackers, who do not sense the existence of the secret data during data transmission. The two fundamental requirements for data hiding are reasonable image quality of the stego-image and a hiding capacity
that is as large as possible. However, high capacity often causes low quality; as such, many studies [1, 2, 3, 4, 5, 6, 8, 10, 11, 12, 14, 15] have proposed ways to promote hiding capacity with reasonable degradation. Data hiding schemes can be roughly divided into three categories: spatial-domain, frequency-domain, and compression-codes manners. In the spatial-domain manner, users mix the pixel values of the cover image and secret data to create a stego-image. The simplest example involves replacing the least significant bit (LSB) plane of the cover image with secret data. However, this process is insecure because many attack methods have already been proposed for detecting whether secret data are embedded in the cover image. In addition, it is not robust because the secret data cannot be extracted if the stego-image becomes damaged. In the frequency-domain manner, prior to data embedding, the image must be transformed into frequency coefficients using a tool such as the DCT or DFT. The secret data are then embedded by modifying the frequency coefficients. In general, the frequency-domain manner is more robust than the spatial-domain one because it can resist several image processing attacks. In the compression-codes manner, the encoder embeds secret data into the compression results. Vector quantization (VQ) [13] and side-match vector quantization (SMVQ) [7, 8] are two of the most well-known compression techniques currently being applied to design data hiding schemes. The existing data embedding schemes based on VQ or SMVQ will be described in Subsection 4.2.2. Unlike the existing VQ- or SMVQ-based data embedding schemes, the proposed scheme applies not only clustering technology but also gray-code computation in designing the data embedding scheme. The clustering technology is used to ensure that two similar codewords are assigned to the same group so that acceptable image quality of the decoded image can be achieved. The gray-code computation is used to combine VQ and SMVQ encoding in the data embedding phase to ensure that the strongest possible security of the hidden data can be guaranteed. Experimental results indicate that the image quality of the stego-images produced by our proposed scheme is reasonable after data embedding. On average, other than blocks that appear in the first row and column of the cover image, each residual block can hide one secret bit. Although the hiding capacity of this proposed scheme is not high, the proposed scheme employs the same strategy as traditional SMVQ to identify whether a block is encoded by VQ or SMVQ. Therefore, it is difficult for attackers to perceive any embedded secrets. The rest of this chapter is organized as follows. Section 4.2 will briefly describe the existing VQ- and SMVQ-based data embedding schemes. The proposed data embedding scheme is described in Section 4.3. The experimental results are presented in Section 4.4. Finally, conclusions and future works are addressed in Section 4.5.
4.2 Literature Review
This section will briefly address traditional SMVQ as well as some existing data embedding schemes based on VQ and SMVQ.
4.2.1 SMVQ
SMVQ [7, 8] is an improvement over the VQ technique. To improve the bit rate and reduce the block effect of an image, SMVQ uses two kinds of codebooks: the super codebook, which is the ordinary codebook used by VQ, and the state codebook, which is a subset of the super codebook. Blocks located in an image’s first row or first column are encoded by VQ using the super codebook. The remaining blocks, also called residual blocks, are encoded by SMVQ using their corresponding state codebooks. To generate a state codebook for the current block, SMVQ uses the upper and left blocks of the current block instead of the current block itself. Because the state codebook is smaller than the super codebook and the codewords of each state codebook are very similar to the current block, both the bit rate of the compression result and the image quality of the reconstructed image offered by SMVQ are significantly improved as compared to VQ. Assuming a given block X will be encoded by SMVQ as shown in Fig. 4.1, seven boundary pixel values of block X are predicted using the neighboring pixels above it and to its left (U and L, respectively). That is, the values of X1, X2, X3, X4, X5, X9, and X13 are copied from the mean of U13 and L4, then from U14, U15, U16, L8, L12, and L16, respectively. Therefore, a predicted vector of block X, (X1, X2, X3, X4, X5, ?, ?, ?, X9, ?, ?, ?, X13, ?, ?, ?), is generated, where an unknown pixel is denoted as “?”. Subsequently, the encoder ignores the unknown pixel values and computes the Euclidean distances between the predicted vector and all codewords of the super codebook. The next step involves sorting the codewords according to their calculated Euclidean distances and picking out the first ts minimal codewords to construct
Fig. 4.1. Prediction of boundary pixel values of block X
a state codebook for block X, where ts is the size of the state codebook. Finally, the best codeword, which is the one with the smallest Euclidean distance between the original block X and the codewords in the state codebook, is selected to encode block X. However, sometimes this prediction is not precise enough, especially when the current block is an edged block. To ensure a better image quality of the reconstructed image, a threshold T is applied to evaluate whether the block can be encoded by SMVQ. If the Euclidean distance between the best codeword and the original block is less than the threshold T, the current block can be encoded by SMVQ using the state codebook. Otherwise, it must be encoded by VQ using the super codebook.
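The side-match prediction and state-codebook construction above can be sketched as follows for 4 × 4 blocks; the row-major pixel numbering (X1 maps to index 0, X16 to index 15) and the function name are our assumptions:

```python
import numpy as np

def state_codebook(u, ell, super_codebook, ts):
    """Predict the border pixels of block X from its upper (u) and left
    (ell) neighboring blocks, then keep the ts closest codewords.
    Vectors index the 4x4 block row by row (X1..X16 -> 0..15)."""
    pred = np.full(16, np.nan)
    pred[0] = (u[12] + ell[3]) / 2.0        # X1  = mean(U13, L4)
    pred[1:4] = u[13:16]                    # X2..X4 = U14..U16
    pred[4], pred[8], pred[12] = ell[7], ell[11], ell[15]  # X5, X9, X13
    known = ~np.isnan(pred)
    # Euclidean distance over the known (border) positions only.
    d = np.sqrt(((super_codebook[:, known] - pred[known]) ** 2).sum(axis=1))
    order = np.argsort(d)[:ts]
    return order, super_codebook[order]     # indices into C and codewords
```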
4.2.2 The Existing VQ- or SMVQ-Based Data Embedding Schemes
In recent years, many scholars have proposed data embedding schemes based on VQ or SMVQ [5, 10, 12, 14]. In 1999, Lin et al. [10] applied the VQ technique to design a data embedding scheme so that secret data can be embedded into VQ compression results. In Lin et al.’s scheme, two similar codewords from the codebook are clustered into a pair. After clustering, the codewords in each pair are assigned to two sub-codebooks, entitled the 0-codebook and the 1-codebook. Subsequently, according to the secret data, the encoder selects the best codeword from one of the sub-codebooks to encode the current block. That is, the block is encoded using the codeword with the smallest distortion from the 0-codebook or the 1-codebook when the secret bit is 0 or 1, respectively. In 2000, Lu and Sun [12] extended Lin et al.’s scheme to cluster 2^k similar codewords into a group and then distribute them into 2^k sub-codebooks. On average, Lu and Sun’s scheme can embed k bits per block into an image. Later, Du and Hsu [5] proposed a VQ-based data hiding scheme for digital images. In their scheme, similar codewords from the codebook are first clustered into several groups and then all the binary secret bits are transformed into an unsigned integer. Next, according to this unsigned integer, they assign a corresponding codeword index to each block. Du and Hsu successfully improved the hiding capacity, but their unsigned integer transformation and index assignment are time-consuming processes. Subsequently, Shie et al. [14] combined the VQ and SMVQ techniques to propose a low bit rate data embedding scheme. In Shie et al.’s scheme, secret data are embedded into smooth blocks of the cover image using state codeword replacement. Two thresholds, TH_var and TH_smd, are used to pick out smooth and embeddable blocks, respectively. These two thresholds also determine the hiding capacity and the image quality of the reconstructed images. In other words, higher values of TH_var and TH_smd result in higher capacity but lower image quality of the reconstructed images.
4.3 The Proposed Scheme
This chapter proposes a data embedding scheme based on SMVQ and gray-code computation. Using this data embedding strategy, the sender can transform a cover image into a stego-image according to the secret data and then send the stego-image to a receiver for secure data transmission. The receiver can then easily extract the secret data using the proposed extracting policy. The proposed scheme can be broken down into three phases: the preprocessing, data embedding, and data extracting phases. These phases are discussed in Subsections 4.3.1, 4.3.2, and 4.3.3, respectively.
4.3.1 Preprocessing Phase
First, a codebook C of size n is generated using the well-known LBG algorithm [9]. Next, all codewords w_i in codebook C are clustered into n/2 groups, denoted as G = {g_1, g_2, . . . , g_{n/2}}, where w_i ∈ C and 1 ≤ i ≤ n. Each group consists of two similar codewords so that a block can be encoded by either codeword from the same group while maintaining similar degradation of the cover image. Finally, all codewords in codebook C are rearranged to generate a new codebook C′, ensuring that the two codewords belonging to the same group have the closest indices. For clarity, an example is provided in Fig. 4.2. As Fig. 4.2 indicates, eight codewords cw1, cw2, . . . , cw8 in codebook C are clustered into four groups, g1, g2, g3, and g4, which contain the codewords {cw1, cw6}, {cw2, cw5}, {cw3, cw8}, and {cw4, cw7}, respectively. Based on the rearrangement policy depicted above, codebook C is reorganized as codebook C′. Therefore, the original indices 0, 1, 2, 3, 4, 5, 6, and 7 for codewords cw1,
Fig. 4.2. An example of the generation of a new codebook C′
cw2, cw3, cw4, cw5, cw6, cw7, and cw8 are updated to the new indices 0, 2, 4, 6, 3, 1, 7, and 5, respectively. Based on these re-indexing results, a replacement method based on gray-code computation of the candidate codewords is later adopted in the data embedding phase.
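A sketch of this preprocessing, with a greedy nearest-neighbor pairing standing in for whatever clustering criterion forms the groups (the chapter does not fix one):

```python
import numpy as np

def reindex_codebook(codebook):
    """Pair each codeword with its nearest unpaired neighbor (a greedy
    stand-in for the clustering step), then lay the pairs out so the two
    members of group g_j get the adjacent indices 2j and 2j + 1 in C'.
    Assumes an even codebook size."""
    cb = np.asarray(codebook, dtype=np.float64)
    unused, new_order = set(range(len(cb))), []
    while unused:
        i = min(unused); unused.discard(i)
        j = min(unused, key=lambda m: np.linalg.norm(cb[i] - cb[m]))
        unused.discard(j)
        new_order += [i, j]
    # new_order[p] is the old index of the codeword placed at new index p.
    return cb[new_order], new_order
```

The two members of a group receive the adjacent indices 2j and 2j + 1, which differ only in their last bit, so their XOR-of-bits values (computed by Eq. (4.1) below) always differ; this is consistent with the example in the text, where each pair contributes one codeword for each secret bit value.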
4.3.2 Data Embedding Phase
To conduct the data embedding process, a cover image is first divided into several non-overlapping blocks. Within these blocks, those located in the first row and first column are directly encoded by VQ using the super codebook. The remaining blocks are encoded by either VQ or SMVQ to conceal the secret bits. Before the data embedding proceeds, it must be determined whether each remaining block is embeddable according to the distortion it would incur. If the block is embeddable, it is encoded by SMVQ or VQ; otherwise, it is encoded by VQ using the super codebook. Since a block may be encoded by VQ or SMVQ, an indicator is required; “0” and “1” indicate that the current block is encoded by SMVQ and VQ, respectively. For simplicity, it is assumed that the hidden secret bit for each embeddable block is denoted as s. Fig. 4.3 presents the flowchart of the data embedding phase. A state codebook of size m is generated for each residual block by using the SMVQ technique; in this case, the value of m is 4. Notably, each codeword in the state codebook can always find its corresponding index in codebook C′. According to the re-indexing results generated in the preprocessing phase, each codeword’s neighbor in the codebook is its similar codeword. In the example in Fig. 4.3, it is assumed that the four codewords mapping to indices 14, 15, 127, and 255 in codebook C′ are collected as the state codebook for block X. Based
Fig. 4.3. An example of the proposed data embedding
on the rearrangement policy mentioned in Subsection 4.3.1, the codewords mapping to indices 14 and 15 are similar to each other, those mapping to indices 126 and 127 are in the same group, and those mapping to indices 254 and 255 are in the same group. Therefore, the six indices 14, 15, 126, 127, 254, and 255 are picked for further evaluation. Next, each index of the selected codewords is transformed into a binary bit stream B = b_1 b_2 . . . b_k, where k = log₂ n and n is the size of codebook C′. Subsequently, an integer s′ can be produced from each B by using gray-code computation, as demonstrated in Equation (4.1), where ⊕ is the XOR operation:

s′ = b_1 ⊕ b_2 ⊕ · · · ⊕ b_k. (4.1)

In the example from Fig. 4.3, after the gray-code computation, the values 1, 0, 0, 1, 1, and 0 are produced for the codewords whose indices are 14, 15, 126, 127, 254, and 255, respectively.
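Eq. (4.1) is simply the parity of the index bits; a minimal sketch:

```python
def gray_bit(index, k=8):
    """s' = b1 XOR b2 XOR ... XOR bk over the k-bit index (Eq. 4.1)."""
    return bin(index & ((1 << k) - 1)).count('1') % 2

# Reproduces the values quoted in the text for the Fig. 4.3 example:
assert [gray_bit(i) for i in (14, 15, 126, 127, 254, 255)] == [1, 0, 0, 1, 1, 0]
```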
Next, the codewords for the candidate pool are chosen from the selected codewords whose values of s′ are equal to the secret bit s of the current block. Among the codewords in the candidate pool, the encoder chooses the codeword with the least distortion to encode the current block. This example assumes that the secret bit s is “1”; as a result, the codewords mapping to indices 14, 127, and 254 are selected to form the candidate pool. Among them, the codeword of index 14 has the least distortion; therefore, it is treated as the candidate codeword. If the current block is smooth, the distortion between the original block and the candidate codeword is small; conversely, when the block is complex, the distortion is large. Based on this property, before encoding the current block, a threshold T is applied to estimate whether the current block is embeddable. If the distortion caused by the candidate codeword is less than T, the block is embeddable and encoded by the candidate codeword. Otherwise, it is encoded by the best codeword of VQ. In Fig. 4.3, the function D returns the distortion between two input vectors, computed as the Euclidean distance. For an embeddable block, the candidate codeword might be a codeword from its state codebook or from the original codebook. If the candidate codeword is found in its state codebook, the embeddable block is encoded using the candidate codeword and its indicator is set to “0.” Otherwise, the embeddable block is encoded by VQ using the candidate codeword, and the indicator is set to “1.”
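Putting the pieces together, the per-block embedding decision can be sketched as follows (reusing gray_bit from above; pair_of is a hypothetical map from an index in C′ to the index of its similar codeword):

```python
import numpy as np

def embed_block(block, state_idx, pair_of, codebook, s, T):
    """Embedding decision for one residual block (a sketch). state_idx
    holds the indices (into C') of the state codewords; s is the secret
    bit; T is the embeddability threshold."""
    candidates = set(state_idx) | {pair_of[i] for i in state_idx}
    pool = [i for i in candidates if gray_bit(i) == s]   # matching parity
    dist = lambda i: np.linalg.norm(np.asarray(block) - codebook[i])
    best = min(pool, key=dist)
    if dist(best) < T:                     # embeddable block
        indicator = 0 if best in state_idx else 1
        return best, indicator, True
    best_vq = min(range(len(codebook)), key=dist)
    return best_vq, 1, False               # not embeddable: plain VQ
```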
4.3.3 Data Extracting Phase
The data extracting phase is broken down into two procedures: decoding and extracting. After receiving the compression codes and the corresponding indicators, the receiver can easily produce the reconstructed image and extract the secret bits using the proposed decoding and extracting procedures, respectively. In the decoding procedure, blocks located in the first row and first column are directly decoded by VQ using the super codebook. The
Fig. 4.4. The flowchart of the proposed data extracting phase
remaining blocks are decoded according to their indicators and compression results. In other words, if the indicator is “0,” a state codebook is generated and the SMVQ decoding procedure is used to decode the corresponding block. Otherwise, the block is decoded by VQ using the super codebook. In the extracting procedure, shown in Fig. 4.4, a state codebook is generated for each remaining block. Each codeword of the state codebook has a corresponding index in codebook C′ and a similar codeword generated in the preprocessing phase. If the indicator is “1” and the compression result is
Fig. 4.5. Three 512 × 512 test images
Table 4.1. The performances of the test images at different thresholds Ts

Threshold Ts   Test image   Image quality (dB)   Hiding capacity (bits)   Compression ratio (bpp)
8              Lena         32.24                213                      0.620
8              Pepper       31.40                54                       0.623
8              Gold Hills   29.94                17                       0.624
9              Lena         32.24                510                      0.615
9              Pepper       31.40                177                      0.621
9              Gold Hills   29.94                33                       0.624
10             Lena         32.23                913                      0.607
10             Pepper       31.40                405                      0.617
10             Gold Hills   29.94                63                       0.623
25             Lena         31.79                7018                     0.511
25             Pepper       31.02                7268                     0.508
25             Gold Hills   29.83                2801                     0.582
100            Lena         27.65                14973                    0.467
100            Pepper       27.73                14914                    0.454
100            Gold Hills   25.94                14968                    0.485
neither the codeword from its state codebook nor its similar codeword, no secret is hidden in the current block. If the indicator is “1” and the corresponding compression result can be found in the candidate pool but not in the state codebook, it means that the current block hides a secret bit and the secret data can be extracted by using Equation (4.1). Finally, if the indicator is “0” and its compression result maps to an index found in its state codebook, the hidden secret data can be extracted by using Equation (4.1).
4.4 Experimental Results
To evaluate the performance of the proposed scheme with regard to image quality, hiding capacity, and compression ratio, three experiments were conducted on a test platform of Microsoft Windows XP and a Pentium 3 with 512 MB of memory. The proposed data embedding scheme was implemented as a Java program. Three gray-scale images entitled “Lena,” “Pepper,” and “Gold Hills” serve as the test images; they are shown in Fig. 4.5. In the experiments, the peak signal-to-noise ratio (PSNR) was used to measure the image quality of the decoded image. Generally, the higher the value of PSNR, the lower the degradation after data embedding. The number of secret bits that can be embedded in a given test image represents the hiding capacity of the proposed scheme. Moreover, the bit rate is denoted as the compression ratio of the proposed scheme. Table 4.1 lists the different thresholds Ts, the image quality of the decoded image, the hiding capacity, and the compression ratio for each image. This information demonstrates that the image quality and compression ratio decrease gradually as the value of the threshold Ts increases. Certainly, the hiding capacity of the proposed scheme cannot compete with existing schemes that emphasize a high data capacity. However, the proposed scheme can definitely provide stronger protection for the hidden data without using data encryption techniques, because the hybrid data embedding strategy used is based on the gray-code computation results of the candidate codewords and the distortion caused by the best codewords. This means that, even if attackers know a block is encoded by VQ, they still have no idea whether it hides secret bits. In addition, the proposed scheme employs the same strategy as traditional SMVQ to identify whether a block is encoded by VQ or SMVQ, and the proposed decoding procedure is the same as that of ordinary SMVQ. Therefore, attackers cannot detect secrets embedded through the decoding procedure or based on the compression results generated in the embedding phase.
4.5 Conclusions
This chapter has proposed a data embedding scheme. Instead of using traditional data encryption such as DES or RSA to encrypt the secret data in
the preprocessing phase, the proposed scheme applied hybrid data encoding strategies based on the gray-code computation results of the candidate codewords and the resulting distortion of the best codeword to enhance the security of the hidden data. Because the proposed scheme employs the same strategy as traditional SMVQ to identify whether a block is encoded by VQ or SMVQ and the proposed decoding procedure is the same as the traditional SMVQ, attackers have difficulty detecting hidden data through the decoding phase or from the compression results. Since the security of the hidden data is achieved at the cost of lower hiding capacity, our future work will focus on increasing the hiding capacity while maintaining the same security level for the hidden data. Certainly, eliminating the extra indicator is another objective so that the compression ratio can also be improved.
References
1. Chang, C.C., Hsiao, J.Y., Chan, C.S.: Finding optimal least-significant-bit substitution in image hiding by dynamic programming strategy. Pattern Recognition 36(7), 1583–1593 (2003)
2. Chang, C.C., Lin, C.Y.: Reversible steganography for VQ-compressed images using side matching and relocation. IEEE Transactions on Information Forensics and Security 1(4), 493–501 (2006)
3. Chang, C.C., Lin, C.Y., Wang, Y.Z.: New image steganographic methods using run-length approach. Information Sciences 176(22), 3393–3408 (2006)
4. Chang, C.C., Tai, W.L., Lin, C.C.: A reversible data hiding scheme based on side match vector quantization. IEEE Transactions on Circuits and Systems for Video Technology 16(10), 1301–1308 (2006)
5. Du, W.C., Hsu, W.J.: Adaptive data hiding based on VQ compressed images. IEE Proceedings on Vision, Image and Signal Processing 150(4), 233–238 (2003)
6. Kamstra, L.H., Henk, J.A.M.: Reversible data embedding into images using Wavelet techniques and sorting. IEEE Transactions on Image Processing 14(12), 2082–2090 (2005)
7. Kim, T.: Side match and overlap match vector quantizers for images. IEEE Transactions on Image Processing 1(2), 170–185 (1992)
8. Lin, S.D., Shie, S.C.: Side-match finite-state vector quantization with adaptive block classification for image compression. IEICE Transactions on Information Systems E83-D(8), 1671–1678 (2000)
9. Lin, Y.C., Tai, S.C.: A fast Linde-Buzo-Gray algorithm in image vector quantization. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 45(3), 432–435 (1998)
10. Lin, Y.C., Wang, C.C.: Digital images watermarking by vector quantization. In: Proceedings of 9th National Computer Symposium, Taichung, Taiwan (1999)
11. Lu, C.S., Huang, S.K., Sze, C.J., Liao, H.Y.M.: A new watermarking technique for multimedia protection. In: Guan, L., Larsen, J., Kung, S.Y. (eds.) Multimedia Image and Video Processing, ch. 18, pp. 507–530. CRC Press Inc., Boca Raton (2000)
12. Lu, Z.M., Sun, S.H.: Digital image watermarking technique based on vector quantization. Electronics Letters 36(4), 303–305 (2000)
13. Nasrabadi, N.M., King, R.A.: Image coding using vector quantization: A review. IEEE Transactions on Communications 36(8), 957–971 (1988)
14. Shie, S.C., Lin, S.D., Fang, C.M.: Adaptive data hiding based on SMVQ prediction. IEICE Transactions on Information and Systems E89-D(1), 358–362 (2006)
15. Wang, R.Z., Lin, C.F., Lin, J.C.: Image hiding by optimal LSB substitution and genetic algorithm. Pattern Recognition 34, 671–683 (2001)
5 Robust Image Watermarking Based on Scale-Space Feature Points
Bao-Long Guo¹, Lei-Da Li¹, and Jeng-Shyang Pan²
¹ Institute of Intelligent Control and Image Engineering (ICIE), Xidian University, Xi’an 710071, P.R. China
[email protected], [email protected]
² National Kaohsiung University of Applied Sciences, Kaohsiung 807, Taiwan
[email protected]
http://bit.kuas.edu.tw/~jspan/
Summary. Digital watermarking techniques have been explored extensively since their first appearance in the 1990s. However, watermark robustness to geometric attacks is still an open problem. The past decade has witnessed a significant improvement in the understanding of geometric attacks and of how watermarks can survive such attacks. In this chapter, we will introduce a set of image watermarking schemes which can resist both geometric attacks and traditional signal processing attacks simultaneously. These schemes follow a uniform framework, which is based on the detection of scale-space feature points. We call it scale-space feature point based watermarking, SSFW for short. Scale-space feature points have been developed recently for pattern recognition applications. This kind of feature point is commonly invariant to rotation, scaling and translation (RST), and therefore fits naturally into the framework of geometrically robust image watermarking. Scale-space feature points are typically detected from the scale space of the image. As a result, we will first introduce scale space theory and how the feature points can be extracted. The basic principles of how scale-space feature points can be adapted for watermark synchronization are then discussed in detail. Subsequently, we will present several content-based watermark embedding and extraction methods which can be directly implemented based on the synchronization scheme. A detailed watermarking scheme which combines the scale-invariant feature transform (SIFT) and Zernike moments is then presented for further understanding of SSFW. Watermarking schemes based on the SSFW framework have the following advantages: (a) Good invisibility. The peak signal-to-noise ratio (PSNR) value is typically higher than 40 dB. (b) Good robustness. These schemes can resist both signal processing attacks and geometric attacks, such as JPEG compression, image filtering, added noise, RST attacks, local cropping, as well as some combined attacks.
5.1 Introduction
The past decade has seen an explosion in the use and distribution of digital multimedia data. Computers with Internet connections have made the distribution of digital products much easier. Digital signals are advantageous over traditional analog signals in that they are easy to process and transmit. Digital products, such as images, audio and video, can be shared or distributed over the World Wide Web without any degradation. Meanwhile, we have witnessed a rise in copyright encroachment. Digital watermarking is a promising way to protect the copyright of digital contents [3, 18, 19]. According to the application, digital watermarking can be classified into robust watermarking and fragile (semi-fragile) watermarking. Robust watermarking is commonly used for copyright protection, while fragile watermarking is usually used for content authentication. Most of the existing schemes focus on robust image watermarking. The two basic requirements of a watermarking scheme are watermark invisibility and watermark robustness. Invisibility means that the watermark signal should be embedded into the cover signal so that human eyes cannot perceive obvious changes. In other words, the fidelity of the cover signal should be maintained after the watermark is inserted. Robustness means that the watermark should be detectable even when the watermarked image is subject to unintentional or intentional attacks. Traditional attacks consist of image compression, image filtering, added noise, etc. These attacks are usually introduced when the image is processed for specific purposes. For example, images are often compressed before they are published, while the contrast or brightness may be enhanced for better display. Besides, the transmission of the image may introduce channel noise. These attacks tend to weaken the watermark energy so that the watermark detector cannot detect the presence of the watermark correctly. Another kind of attack acts on the watermarked image in a quite different way; these are called geometric attacks. Geometric attacks mainly consist of image rotation, scaling, translation (RST), cropping, random bending, etc. RST invariant image watermarking has become an active research field recently [11, 28]. When the watermarked image is subject to geometric attacks, the watermark signal still exists in the image and the watermark energy remains almost unchanged. However, the position of the watermark has changed because of the geometric transformation of the image. As a result, the watermark detector cannot find the watermark at the place where it was originally inserted. In order to successfully detect the watermark in a geometrically distorted image, an additional step is necessary before watermark detection, i.e., watermark synchronization. The task of watermark synchronization is to find the position where the watermark has been inserted. The key problem of geometrically robust image watermarking is thus watermark synchronization, and this kind of scheme can be classified according to the different types of watermark synchronization approaches.
This chapter is organized as follows. Section 5.2 reviews previous works. In Section 5.3, we introduce scale-space theory and feature point extraction. The framework of scale-space feature point based watermarking (SSFW) is described in Section 5.4, together with some content-based watermark embedding and extraction schemes. A geometrically robust image watermarking scheme using the scale-invariant feature transform and Zernike moments is discussed in detail in Section 5.5 to further clarify the framework. Finally, Section 5.6 concludes this chapter.
5.2 Related Works
In this section, we will review the existing RST invariant image watermarking schemes. Geometric distortions can be global or local. Global geometric distortions affect all the pixels of an image in the same manner, while local geometric distortions affect different portions of an image in different manners [11]. Most of the existing schemes focus on watermarking schemes resistant to global geometric attacks, and they can be roughly classified into the following categories [28]:
1. embedding the watermark in a domain that is invariant to geometric attacks;
2. embedding an additional template and employing the template to estimate the parameters of geometric distortions;
3. embedding a watermark that has a self-synchronizing property;
4. employing image features to achieve watermark synchronization.
Invariant domain embedding. The key idea of this kind of scheme is to find a domain that is invariant to geometric transformations and embed the watermark therein. O’Ruanaidh first reported RST invariant domain image watermarking based on the Fourier-Mellin transform [17]. Fig. 5.1 shows the diagram of this scheme. The original image is first transformed into the Discrete Fourier Transform (DFT) domain and the magnitude spectrum is mapped into the log-polar coordinate system. Then a second DFT is conducted on the new coordinate system and the watermark is embedded into the magnitudes of the second DFT. In this scheme, shift invariance is achieved from the translation invariant property of the magnitudes of the Fourier coefficients. Rotation and scale invariance are achieved from the Fourier magnitudes of the log-polar mapped Fourier magnitudes of the image. This scheme suffers from severe implementation difficulties, mainly due to the interpolation error induced by the log-polar mapping (LPM) and inverse log-polar mapping (ILPM) during watermark embedding. Similar methods can be found in [12, 29]. In [12], Lin et al. embed the watermark into a one-dimensional signal obtained by taking the Fourier transform of the image, resampling the Fourier magnitudes into log-polar coordinates, and then summing a function of those magnitudes along the log-radius axis. Rotation of the image results
Fig. 5.1. Diagram of O’Ruanaidh’s scheme [17]
in a cyclical shift of the extracted signal, scaling of the image results in amplification of the extracted signal, and translation of the image has no effect on the extracted signal. Therefore, they can compensate for rotation with a simple search, and compensate for scaling by using the correlation coefficient as the detection measure. In [29], Zheng et al. embed the watermark in the LPM of the Fourier magnitude spectrum of an original image, and use the phase correlation between the LPM of the original image and the LPM of the watermarked image to calculate the displacement of the watermark positions in the LPM domain. This approach preserves the image quality by avoiding computation of the inverse LPM. Another kind of invariant domain scheme embeds the watermark using Zernike or pseudo-Zernike moments [5, 24, 25, 26]. Kim et al. embedded the watermark by modifying Zernike moments with orders less than five [5]. Rotation invariance is achieved by using the magnitudes of the Zernike moments, while scaling and translation invariance are achieved by image normalization. Xin et al. embed the watermark using Zernike/pseudo-Zernike moments by dither modulation [24, 25, 26]. Some selected Zernike/pseudo-Zernike moments of an image are computed, and their magnitudes are quantized by dither modulation to embed an array of bits. In watermark detection, the embedded bits are estimated from the invariant magnitudes of the moments using a minimum distance decoder.
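To make the log-polar mapping step used by the Fourier-Mellin family of schemes concrete, here is a minimal sketch of resampling a centered DFT magnitude spectrum onto a log-polar grid; it uses nearest-neighbor sampling for brevity, whereas practical implementations interpolate (the source of the interpolation error mentioned above):

```python
import numpy as np

def log_polar_magnitude(image, n_rho=128, n_theta=128):
    """Resample the centered DFT magnitude spectrum onto a log-polar
    grid: image rotation becomes a shift along theta, image scaling a
    shift along the log-radius axis."""
    mag = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    h, w = mag.shape
    cy, cx = h / 2.0, w / 2.0
    rho_max = np.log(min(cy, cx))
    out = np.zeros((n_rho, n_theta))
    for r in range(n_rho):
        radius = np.exp(rho_max * (r + 1) / n_rho)
        for t in range(n_theta):
            theta = 2 * np.pi * t / n_theta
            y = int(round(cy + radius * np.sin(theta)))
            x = int(round(cx + radius * np.cos(theta)))
            if 0 <= y < h and 0 <= x < w:
                out[r, t] = mag[y, x]       # nearest-neighbor sample
    return out
```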
Template based embedding. This kind of scheme embeds a template at predetermined locations in addition to the watermark [15, 20]. The template is commonly composed of local peaks. Before watermark detection, the template is first detected and used to estimate the parameters of the distortions that the image has undergone. The watermark can then be detected after inverting the distortions. Template based schemes are usually subject to template-removal attacks because anyone can access the local peaks and eliminate them.
Self-synchronizing watermarks. This kind of scheme relies on the watermark’s autocorrelation properties to achieve synchronization. Generally speaking, the watermark is designed such that its autocorrelation function contains several peaks. On the receiver side, the decoder correlates the received watermarked image with itself and uses the knowledge about the autocorrelation function’s periodic nature to synchronize the watermark. In [6], Kutter uses space diversity to estimate the attack parameters and invert them before detection. Delannay et al. use key-dependent 2D cyclic patterns to facilitate detector synchronization [4]. Self-synchronizing watermarks are susceptible to removal or estimation attacks in much the same way as template-based methods, because an attacker can use knowledge about the watermark’s periodic tiling to remove it.
Feature based synchronization. By binding the watermark to image features, watermark detection can be done without synchronization error. Synchronization based on image features relies on the ability to identify certain feature points in the image before and after an attack. If enough of these points establish correspondence, it is possible to reduce the effect of geometric attacks by referencing these features. This kind of scheme belongs to the second generation of watermarking [7]. Recently, feature-based synchronization has become an active research field in robust image watermarking. Several feature point based watermarking schemes have been proposed in [2, 8, 9, 10, 21, 22, 23]. Bas et al. first extract feature points from the original image and decompose the image into a set of disjoint triangles by Delaunay tessellation. The watermark is embedded into each triangle using a classical additive scheme and can be detected by correlation [2]. Figure 5.2 shows the diagram of this scheme. It is among the first few papers that address feature point based image watermarking. The drawback of this method is that a large number of feature points are extracted and many of the points from the original and distorted images are not matched. As a result, the triangles generated during watermark insertion and detection are different. Lee et al. improved Bas’ scheme by using the SIFT detector [9]. Tang et al. adopt the Mexican Hat wavelet scale interaction to extract feature points. Local regions are then generated based on the feature points. Two sub-blocks are further generated and a 16-bit watermark is embedded
Fig. 5.2. Diagram of Bas’ scheme [2]
into the sub-blocks in the DFT domain [23]. This method is more robust to signal processing attacks because the watermark is embedded in the DFT domain. However, it cannot resist scaling attacks. Besides, it fails to detect the watermark even when the rotation angle is only 5 degrees. In our analysis, this is mainly due to the feature point selection strategy. In this scheme, a feature point has a higher priority for watermark embedding if it has more neighboring feature points inside its disk. This can produce feature points located in highly textured areas while missing the points with the best robustness. As a result, the feature points used for embedding and extraction cannot be matched. In Qi’s scheme, the image content is represented by important feature points obtained by the adaptive Harris corner detector [21]. An image-content-based adaptive embedding scheme is applied in the DFT domain of each perceptually highly textured subimage. This scheme can resist more general geometric attacks. Its shortcoming is that the positions of the important feature points have to be saved for watermark detection. The key ideas in [8] and [22] are quite similar. Scale-space feature points, such
as Harris-Laplace and SIFT, are employed to generate the local regions. As the feature points are detected at different scales, the regions can be determined using the characteristic scale. The advantage of using scale-space feature points is that the extracted regions always cover the same image content even when the image is subject to scaling attacks. However, the watermark cannot be embedded using traditional transform domain techniques, because the local regions have different sizes. As an alternative, content-based embedding and extraction are usually employed. In [22], the watermark is embedded by partitioning each circular region in the spatial domain and adapting it to the size of the original region. In [8], the rectangular watermark is first warped into a circular pattern and embedded in the spatial domain after being adapted to the size of the circular region. In [10], we present a novel robust image watermarking scheme which combines SIFT and Zernike moments. Watermark synchronization is first achieved using SIFT feature points. The watermark is then embedded using Zernike moments: as circular regions are extracted for watermark synchronization and the computation area of Zernike moments is exactly a circular disk, the Zernike moments are computed for watermark embedding. In this scheme, scale invariance is achieved using the SIFT characteristic scale and rotation invariance is achieved from the rotation invariant property of Zernike moments. This scheme can resist both signal processing attacks and geometric attacks, such as JPEG compression, median filtering, added noise, RST attacks, as well as some combined attacks. It can be seen from [2, 21, 23] that the feature point used is either the Harris corner or the Mexican Hat wavelet scale interaction based feature point. These feature points are detected at a single, uniform scale, so they may disappear when the image is scaled. Recent developments in pattern recognition and computer vision have brought new methods of feature extraction, among which Mikolajczyk’s Harris-Laplace detector [16] and Lowe’s SIFT detector [14] are the two most popular ones. They differ from traditional feature points in that each feature point has a characteristic scale. The characteristic scale is useful for determining a support region from which an invariant descriptor can be generated for feature matching. Watermarking schemes using this kind of feature point can resist more general geometric attacks, as will be seen in the latter part of this chapter.
5.3 Scale Space Theory and Feature Point Extraction
In order to better understand the framework of scale-space feature point based watermarking, some prior knowledge of scale space theory is necessary. We will therefore briefly introduce the image scale space in this section and then describe two scale-space feature point detectors, namely the Harris-Laplace detector [16] and the SIFT detector [14].
5.3.1 Scale Space Theory
As described in [13], scale-space representation is a special type of multi-scale representation that comprises a continuous scale parameter and preserves the same spatial sampling at all scales. The scale-space representation of a signal is an embedding of the original signal into a one-parameter family of Gaussian kernels of increasing width. Moreover, it has been proved that, under a variety of reasonable assumptions, the only possible scale-space kernel is the Gaussian function. Formally, the linear scale-space representation of a continuous signal is constructed as follows. Let f : R^N → R represent any given signal. Then the scale-space representation L : R^N × R_+ → R is defined by L(·; 0) = f and

L(·; t) = G(·; t) ∗ f,   (5.1)

where ∗ denotes the convolution operation, t ∈ R_+ is the scale parameter, and G : R^N × R_+ \ {0} → R is the Gaussian kernel, which in arbitrary dimensions can be written as

$$G(\mu; t) = \frac{1}{(2\pi t)^{N/2}}\, e^{-\mu^T \mu/(2t)} = \frac{1}{(2\pi t)^{N/2}}\, e^{-\sum_{i=1}^{N} \mu_i^2/(2t)},$$   (5.2)
where μ ∈ R^N and μ_i ∈ R. The square root of the scale parameter, σ = √t, is the standard deviation of the kernel G, and is the natural measure of spatial scale in the smoothed signal at scale t. For a digital image, the scale space is defined as a function, L(x, y, σ), that is produced from the convolution of a variable-scale Gaussian, G(x, y, σ), with an input image, I(x, y):

L(x, y, σ) = G(x, y, σ) ∗ I(x, y),   (5.3)
where

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2+y^2)/(2\sigma^2)}.$$   (5.4)

Fig. 5.3 shows an example of the images taken from the scale space of the standard image Lena.
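As a concrete illustration of Eqs. (5.3) and (5.4), the sketch below builds a few levels of a Gaussian scale space for a grayscale image. This is a minimal sketch, not the implementation used in this chapter; the choice of σ values and the use of SciPy's Gaussian filter are our own illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_scale_space(image, sigmas):
    """Return L(x, y, sigma) for each sigma, per Eq. (5.3): every level is
    the input image convolved with a Gaussian of standard deviation sigma."""
    image = image.astype(np.float64)
    return [gaussian_filter(image, sigma) for sigma in sigmas]

# Five levels with geometrically increasing scale, mimicking the
# progressively smoothed images of Fig. 5.3 (sigma values are illustrative).
sigmas = [1.6 * 2 ** (i / 2) for i in range(5)]
# levels = build_scale_space(lena_gray, sigmas)  # lena_gray: 2-D uint8 array
```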
5.3.2 Harris-Laplace Detector [16]
The Harris-Laplace detector locates feature points in scale space based on the scale-adapted Harris function. The Harris detector is based on the second moment matrix. The second moment matrix, also called the autocorrelation matrix, is often used for feature detection or for describing local image structures. This matrix must be adapted to scale changes to make it independent of the image resolution. The scale-adapted second moment matrix M is defined by:

$$M(x, y, \sigma_I, \sigma_D) = \sigma_D^2\, G(\sigma_I) \ast \begin{pmatrix} L_x^2(x, y, \sigma_D) & L_x L_y(x, y, \sigma_D) \\ L_x L_y(x, y, \sigma_D) & L_y^2(x, y, \sigma_D) \end{pmatrix},$$   (5.5)
Fig. 5.3. Some images taken from the scale space of image Lena. From top to bottom and from left to right, the images are produced with increasing scales.
where σ_I is the integration scale, σ_D is the differentiation scale, and L_a is the derivative computed in the a direction. The eigenvalues of this matrix represent the two principal signal changes in the neighborhood of a point. This property enables the extraction of points for which both curvatures are significant, that is, the signal change is significant in orthogonal directions, i.e. corners, junctions, etc. Such points are stable under arbitrary lighting conditions and are representative of an image. The Harris measure, which combines the trace and the determinant of the second moment matrix, is calculated as:

$$R(x, y, \sigma_I, \sigma_D) = \mathrm{Det}(M(x, y, \sigma_I, \sigma_D)) - k \cdot \mathrm{Tr}^2(M(x, y, \sigma_I, \sigma_D)),$$   (5.6)

where Det(·) and Tr(·) are the determinant and the trace of M respectively, and k is a constant (commonly 0.04 ∼ 0.06). Local maxima of R(x, y, σ_I, σ_D) determine the locations of interest points. Once the location of an interest point is determined, the detector selects the points for which the Laplacian-of-Gaussian attains a maximum over scale. The Laplacian-of-Gaussian (LoG) is computed as:

$$|\mathrm{LoG}(x, y, \sigma_I)| = \sigma_I^2\, |L_{xx}(x, y, \sigma_I) + L_{yy}(x, y, \sigma_I)|.$$   (5.7)
They then verify for each of the initial points whether the LoG attains a maximum at the scale of the point, that is, whether the LoG response is lower at both the finer and the coarser neighboring scales. The points for which the Laplacian attains no extremum, or for which the response is below a threshold, are rejected. In this way a set of characteristic points with associated scales is obtained. Note that the scale interval between two successive levels should be small in order to find the location and scale of an interest point with high accuracy. Figure 5.4 shows an example of the Harris-Laplace interest points detected from image Lena. The Harris-Laplace approach provides a compact and representative set of points which are characteristic both in the image and in the scale dimension.

Fig. 5.4. The Harris-Laplace feature points detected from image Lena (the radii of the circles are obtained by magnifying the characteristic scales by a uniform factor)
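For completeness, a rough sketch of the detector just described is given below, using SciPy Gaussian derivatives. The ratio between differentiation and integration scales (0.7) and the response threshold are illustrative assumptions, not the values of [16], and no sub-pixel refinement is performed.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_laplace(img, sigmas, k=0.05, thresh=1e4):
    """Sketch of Harris-Laplace: scale-adapted Harris (Eqs. 5.5-5.6) at each
    level, keeping points whose normalized LoG (Eq. 5.7) peaks over scale."""
    img = img.astype(np.float64)
    R, log = [], []
    for s in sigmas:
        sd, si = 0.7 * s, s                      # differentiation / integration scales
        Lx = gaussian_filter(img, sd, order=(0, 1))   # derivative along x
        Ly = gaussian_filter(img, sd, order=(1, 0))   # derivative along y
        # entries of the scale-adapted second moment matrix, smoothed at sigma_I
        a = gaussian_filter(Lx * Lx, si) * sd ** 2
        b = gaussian_filter(Lx * Ly, si) * sd ** 2
        c = gaussian_filter(Ly * Ly, si) * sd ** 2
        R.append(a * c - b * b - k * (a + c) ** 2)    # Harris measure, Eq. (5.6)
        Lxx = gaussian_filter(img, s, order=(0, 2))
        Lyy = gaussian_filter(img, s, order=(2, 0))
        log.append(s ** 2 * np.abs(Lxx + Lyy))        # |LoG|, Eq. (5.7)
    points = []
    for i, s in enumerate(sigmas):
        peaks = (R[i] == maximum_filter(R[i], size=3)) & (R[i] > thresh)
        for y, x in zip(*np.nonzero(peaks)):
            v = log[i][y, x]   # keep only points whose LoG peaks over scale
            if ((i == 0 or v > log[i - 1][y, x]) and
                    (i == len(sigmas) - 1 or v > log[i + 1][y, x])):
                points.append((x, y, s))
    return points   # (x, y, characteristic scale) triples
```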
5.3.3 SIFT Detector [14]
A more recent scale-space feature point detector was proposed by Lowe and is called the scale-invariant feature transform (SIFT) [14]. The SIFT feature points are invariant to image scale and rotation, and have been shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, so that a single feature can be correctly matched with high probability against a large database of features from many images. The SIFT feature points are detected using a cascade filtering approach that uses efficient algorithms to identify candidate locations, which are then examined in further detail. The first step of feature point detection is to identify locations and scales that can be repeatedly assigned under differing views of the same object. To efficiently detect stable keypoint locations in scale space, Lowe proposes to use the scale-space extrema of the difference-of-Gaussian function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative factor k:

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y) = L(x, y, kσ) − L(x, y, σ).   (5.8)
Figure 5.5 illustrates how the Difference-of-Gaussian images can be generated. The initial image is incrementally convolved with Gaussian functions to produce images separated by a constant factor k in scale space, shown stacked in the left column; adjacent images are then subtracted to produce the Difference-of-Gaussian images shown on the right.

Fig. 5.5. Image scale space and DoG images

In order to detect the local maxima and minima of D(x, y, σ), each sample point is compared to its eight neighbors in the current image and its nine neighbors in the scales above and below (see the right column of Fig. 5.5). It is selected only if it is larger than all of these neighbors or smaller than all of them. Once a candidate keypoint has been found by comparing a pixel to its neighbors, a 3-D quadratic function is fitted to determine accurately the location and scale of each feature. In addition, candidate locations that have low contrast or are poorly localized along edges are removed by measuring the stability of each feature using a 2 × 2 Hessian matrix H as follows:

$$H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}$$   (5.9)

The stability of the keypoint can be measured by

$$\mathrm{stability} = \frac{(D_{xx} + D_{yy})^2}{D_{xx} D_{yy} - D_{xy}^2} < \frac{(r+1)^2}{r},$$   (5.10)
where D_xx, D_xy, and D_yy are the derivatives of the scale-space images, and r is the ratio of the largest eigenvalue to the smallest eigenvalue of the Hessian matrix, which can be used to control the stability. In order to achieve invariance to image rotation, a consistent orientation is then assigned to each feature point. The scale of the keypoint is used to select the Gaussian smoothed image, L, with the closest scale, so that all computations are performed in a scale-invariant manner. For each image sample, L(x, y), at this scale, the gradient magnitude, m(x, y), and orientation, θ(x, y), are precomputed using pixel differences:

$$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$$
$$\theta(x, y) = \arctan\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}$$   (5.11)
Then an orientation histogram is formed from the gradient orientations of sample points within a region around the keypoint. The orientation histogram has 36 bins covering the 360-degree range of orientations. Each sample added to the histogram is weighted by its gradient magnitude and by a Gaussian-weighted circular window with a σ that is 1.5 times the scale of the keypoint. The peak in the orientation histogram corresponds to the orientation of the keypoint. The last step is to compute a descriptor for the local image region. Fig. 5.6 illustrates the computation of the keypoint descriptor. In order to achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation. A Gaussian weighting function with σ equal to one half the width of the descriptor window is used to assign a weight to the magnitude of each sample point. The descriptor is formed from a vector containing the values of all the orientation histogram entries, corresponding to the lengths of the arrows on the right side of Fig. 5.6. The figure shows a 2 × 2 array of orientation histograms. In implementation, a 4 × 4 array of histograms with 8 orientation bins each produces the best results. Therefore, the final descriptor is a 4 × 4 × 8 = 128 dimensional vector. This descriptor is highly distinctive, so it can be used for reliable feature point matching.
Fig. 5.6. Generation of the keypoint descriptor
Fig. 5.7 shows an example of the SIFT feature points detected from the standard image Lena. It can be seen from Fig. 5.4 and Fig. 5.7 that many more feature points can be extracted using the SIFT detector than using the Harris-Laplace detector. Both types of feature points are extracted with characteristic scales, as can be seen from the disks of different sizes in Fig. 5.4 and Fig. 5.7.
Fig. 5.7. The SIFT feature points detected from image Lena
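In practice, keypoints such as those of Fig. 5.7 can be obtained with an off-the-shelf SIFT implementation; a minimal sketch using OpenCV is shown below. The file name is a placeholder, and OpenCV reports the characteristic scale indirectly through the keypoint diameter kp.size.

```python
import cv2

# Detect SIFT keypoints on a grayscale image, as in Fig. 5.7.
# Requires an OpenCV build that includes SIFT (>= 4.4 in the main package).
img = cv2.imread("lena.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

for kp in keypoints[:5]:
    # kp.pt is the (x, y) location; kp.size is the diameter of the meaningful
    # neighborhood, derived from the characteristic scale; kp.angle is the
    # assigned orientation of Section 5.3.3.
    print(kp.pt, kp.size, kp.angle)
```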
5.4 Framework of Scale-Space Feature Point Based Robust Image Watermarking

In this section, we present a framework of geometrically robust image watermarking which uses scale-space feature points; we call it SSFW for short. It consists of two phases, namely watermark embedding and watermark detection. Fig. 5.8 shows the diagram of the framework. In Fig. 5.8, the first several steps of watermark embedding and watermark extraction are exactly the same, except that watermark insertion is implemented on the original image while watermark detection is implemented on the possibly distorted image. These steps are called watermark synchronization, which is necessary for both watermark embedding and watermark detection. Watermark synchronization is the first key technique in SSFW. The other two key techniques of SSFW are content-based watermark embedding and content-based watermark extraction, respectively. Next, we will first introduce how watermark synchronization can be achieved using scale-space feature points, and then we will present some content-based watermark embedding and extraction approaches.

5.4.1 Watermark Synchronization Using Scale Space Feature Points
Presently, two methods are commonly employed to extract the scale space feature points, i.e., the Harris-Laplace detector [16] and the SIFT detector [14]. In both schemes, a feature point is detected together with its coordinates and the characteristic scale. The coordinates indicate where the feature point is detected and the characteristic scale indicates the very scale at which it is detected. Both of these parameters are important for reliable watermark synchronization.
Fig. 5.8. Framework of scale-space feature point based watermarking
Once the feature points are detected, the next step is to select the robust ones. As the scale-space feature points were originally designed for image matching, many feature points can be detected, especially with the SIFT detector. For example, more than 1000 feature points can be detected from a gray-level image of size 512 × 512. Furthermore, even more features can be detected for highly textured images, such as the standard image Baboon. As a result, an optimization procedure is necessary to select the robust ones for watermark synchronization. The characteristic scale of a feature point is related to the variance of the Gaussian function in the scale space. In implementation, feature points whose scales are small have a low probability of being re-detected, because they disappear easily when the image content is modified. Features whose scales are large also have a low probability of being re-detected in distorted images, because they move easily to other locations [8]. As a result, it is necessary to set a range for the characteristic scale; feature points whose scales fall outside the range should not be used for watermark synchronization. In [8], the lower and upper limits of the scale range are set to 2 and 10, respectively. It should be noted that the scale range for watermark extraction should be larger than that for watermark embedding. This is due to the fact that the watermarked image may be subject to scaling attacks before watermark detection. For example, if the range is set to 2 to 10 during watermark embedding, then the lower limit of the scale range during watermark detection should be smaller than 2, while the upper limit should be larger than 10. The last step of watermark synchronization is to generate some local regions from the image which can be directly used for watermark embedding and extraction. This step is important because the stability of these local regions is related to the watermark robustness. If these regions are stable enough and can be reliably regenerated during watermark extraction, the watermark tends to be detected correctly. On the contrary, if these regions cannot be re-detected, the watermark cannot be successfully detected at all.
Fig. 5.9. Local regions for watermarking, (a) triangles in [2], (b) circles in [8, 10, 22]
To the best of our knowledge, two kinds of local regions are commonly employed, i.e. the triangle and the circle. In [2], the Delaunay tessellation is used to decompose the original image into a set of triangles; Fig. 5.9(a) shows an example. The Delaunay tessellation has the following properties [2]:

1. the tessellation has local properties: if a vertex disappears, the tessellation is only modified on the connected triangles,
2. each vertex is associated with a stability area in which the tessellation is not modified when the vertex moves inside this area, and
3. the computational cost is low: a Delaunay tessellation can be computed using fast algorithms.

In [8, 10, 22], circular regions are generated for watermark synchronization. Fig. 5.9(b) shows an example of the circular regions. Note that the sizes of the circular regions differ, because the radius of each circle is determined by the characteristic scale. In other words, the characteristic scale is magnified by a factor and used as the radius of the circle. Suppose that the coordinates of a scale-space feature point are (x_0, y_0) and its characteristic scale is σ; then the circular region is determined by the following equation:

(x − x_0)² + (y − y_0)² = (kσ)²,   (5.12)

where k is a magnification factor which controls the radius of the circle. If all the feature points are used to generate the circular regions, the regions may overlap with each other. For watermarking applications, it is necessary to generate non-overlapped regions so that each region can be used for watermark embedding independently. In order to achieve this goal, some of the feature points have to be dropped. In order to preserve the robust feature points, the stability of the features should be measured first. For a Harris-Laplace feature point, this measure can be the detector response, namely Eq. (5.6): the larger the response, the better the stability. For SIFT feature points, the values of the Difference-of-Gaussian (DoG) function can be used instead. In implementation, supposing that the initial feature point set is Ω_0, the following operations are employed to generate the non-overlapped circular regions (see the code sketch after this subsection):

Step 1: Choose, from Ω_0, the feature point with the best stability, say P_0;
Step 2: Dismiss the feature points whose corresponding regions overlap with that of P_0;
Step 3: Update Ω_0 by dismissing P_0;
Step 4: If the circular regions generated using the remaining points in Ω_0 still overlap with others, repeat Steps 1-3, otherwise go to Step 5;
Step 5: Generate non-overlapped regions using the reserved feature points.

Circular-region based schemes have the advantage that when a feature point is missing, the other circular regions can still be re-detected, thus facilitating watermark detection. Besides, the circular regions are invariant to a considerable number of attacks, such as RST attacks and traditional signal processing attacks. Fig. 5.10 shows an example of the circular regions extracted from the original image and from some distorted images. It can be seen from Fig. 5.10 that the circular region always covers the same image content. It follows that if watermark embedding and extraction are based on the image content, the watermark can resist such attacks. As a result, we will introduce some content-based watermark embedding and extraction approaches.
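A compact sketch of Steps 1-5 is given below. It assumes each feature point carries a stability value (the Harris response of Eq. (5.6) or the DoG value), and uses the fact that two circles of radii kσ₁ and kσ₂ overlap exactly when their centers are closer than k(σ₁ + σ₂).

```python
import numpy as np

def select_nonoverlapping(points, k=4.0):
    """Greedy selection of non-overlapped circular regions (Steps 1-5).
    points: list of (x, y, sigma, stability) tuples, where stability is the
    Harris response or DoG value; k is the magnification factor of Eq. (5.12)."""
    remaining = sorted(points, key=lambda p: p[3], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)                 # Step 1: most stable point
        kept.append(best)
        bx, by, bs, _ = best
        # Step 2: dismiss points whose circles would overlap with best's circle
        remaining = [p for p in remaining
                     if np.hypot(p[0] - bx, p[1] - by) >= k * (p[2] + bs)]
    # Step 5: return centers and radii of the reserved, non-overlapped regions
    return [(x, y, k * s) for x, y, s, _ in kept]
```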
5.4.2 Content Based Watermark Embedding and Extraction
In the previous subsection, we presented how to achieve watermark synchronization using scale-space feature points. As described, local circular regions can be generated for watermark embedding and extraction. In this subsection, we introduce some content-based watermark embedding and extraction schemes that can be directly applied to the circular regions. Four approaches are presented here, namely the two approaches of [8, 22] and two of our own schemes. Each scheme embeds the same watermark repeatedly into all the circular regions in order to enhance watermark robustness.

Fig. 5.10. Circular regions extracted from (a) the original image, (b) the median-filtered image, (c) the added-Gaussian-noise image, (d) the 30% JPEG image, (e) the 30-degree rotated image, (f) the 0.8× scaled image

Seo's approach [22]. In this approach, the watermark is embedded as follows:

1. Extract scale-space feature points using the Harris-Laplace detector,
2. Select n feature points with the strongest scale-normalized corner strength measures, i.e. Eq. (5.6),
3. Select the final feature points by considering the spatial positions, corner strength measures and characteristic scales of the n feature points,
4. Embed the circularly symmetric watermark after adapting it to the characteristic scale at each feature point.

In the spatial domain, the watermark is embedded in a circularly symmetric way centered at each selected feature point. A binary pseudo-random M × M pattern O(m_1, m_2) is prepared as the watermark, where m_1, m_2 = 0, 1, ..., M − 1. Regarding a feature point as the center of a disk, the disk is separated into M homocentric circles of radius r_i (i = 0, 1, ..., M − 1, M) and into M sectors, as shown in Fig. 5.11. In Fig. 5.11, the radius r_i is set to make the area of each sector the same. In each sector the value of the watermark is the same. For each disk, the embedded watermark W(x, y) is obtained from the original watermark O(m_1, m_2) as follows:

W(x, y) = O(m_1, m_2) if (x, y) ∈ S_{m_1, m_2},   (5.13)
Fig. 5.11. Sector-based watermarking in Seo’s approach [22]
where S_{m_1,m_2} = {(x, y), (x, −y), (−x, y), (y, x)} with the constraints that r_{m_1} ≤ r < r_{m_1+1}, ⌊θM/(0.5π)⌋ = m_2 for x, y ≥ 0, r = √(x² + y²), θ = arctan(y/x), and ⌊·⌋ is the floor operation. As a result, the same watermark is embedded in the symmetric sectors, as shaded in Fig. 5.11; i.e. W(x, y) = W(x, −y) = W(−x, y) = W(y, x). The watermark pattern W(x, y) is then embedded additively into the image, centered at the feature point x_p, as follows:

f'(x) = f(x) + α(x)W(x − x_p),   (5.14)
where α(x) is a local masking function calculated from the human visual system (HVS) to make the embedded watermark imperceptible. In the decoder, it is assumed that the watermark exists at the feature points with strong scale-normalized corner strength measures, as where the watermark was embedded. The first and second steps are the same as in embedding. Watermark detection is performed at each detected feature point; in their method, it is performed at the n selected feature points with the strongest scale-normalized corner strength measures. The detection mask D of each feature point is obtained in each sector as follows:

$$D(m_1, m_2) = \begin{cases} \dfrac{\sum_{(x,y)\in S_{m_1,m_2}} I(x,y)}{\mathrm{CARD}(S_{m_1,m_2})}, & \text{if } \mathrm{CARD}(S_{m_1,m_2}) \neq 0 \\ 0, & \text{if } \mathrm{CARD}(S_{m_1,m_2}) = 0 \end{cases}$$   (5.15)

where CARD(S_{m_1,m_2}) is the number of elements in the set S_{m_1,m_2}. Watermark detection is based on the correlation between the detection mask D and the original watermark pattern O. To achieve rotational invariance, they compute the cyclic convolution C between D and O along m_2 as follows:

$$C(k) = \frac{1}{M^2} \sum_{m_1, m_2} D(m_1, m_2)\, O(m_1, m_2 - k),$$   (5.16)
where k = 0, 1, ..., M − 1. Using the cyclic convolution C, the watermark detection problem can be formulated as a hypothesis test. The reader may refer to [22] for further details.

Lee's approach [8]. In Lee's approach, a 2-D watermark is generated and transformed into circular form for each circular region. The circular watermarks are then added to the circular regions in the spatial domain. They generate a 2-D rectangular watermark that follows a Gaussian distribution, using a random number generator. To be inserted into circular patches, this watermark must be transformed so that its shape is circular. They consider the rectangular watermark to be a polar-mapped watermark and inversely polar-map it to assign the insertion locations within the circular patches. In this way, a rotation attack is mapped to a translation of the rectangular watermark, and the watermark can still be detected using the correlation detector. Note that the sizes of the circular patches differ, so they generate a separate circular watermark for each patch.
Fig. 5.12. Polar mapping between the rectangular watermark and the circular watermark [8]
Fig. 5.12 describes how the rectangular watermark can be transformed into a circular pattern. Let the x and y dimensions of the rectangular watermark be denoted by M and N, respectively. Let r be the radius of a circular patch. As shown in Fig. 5.12, they divide a circular patch into homocentric regions. To generate the circular watermark, the x- and y-axes of the rectangular watermark are inversely polar-mapped into the radius and angle directions of the patch. The relation between the coordinates of the rectangular watermark and the circular watermark is represented as follows:

$$x = \frac{r_i - r_0}{r_M - r_0} \cdot M, \quad y = \frac{\theta}{\pi} \cdot N \qquad \text{if } 0 \le \theta < \pi,$$
$$x = \frac{r_i - r_0}{r_M - r_0} \cdot M, \quad y = \frac{\theta - \pi}{\pi} \cdot N \qquad \text{if } \pi \le \theta < 2\pi,$$   (5.17)
where x and y are the rectangular watermark coordinates, r_i and θ are the coordinates of the circular watermark, r_M is equal to the radius of the patch, and r_0 is a fixed fraction of r_M. The insertion of the watermark is represented as the spatial addition of the image pixels and the pixels of the circular watermark as follows:

$$\hat{v}_i = v_i + \Lambda_i \cdot w_{ci}, \qquad w_{ci} \sim N(0, 1),$$   (5.18)
where v_i and w_{ci} denote the pixels of the image and of the circular watermark, respectively, and Λ denotes the perceptual mask that controls the insertion strength of the watermark. In detection, the first step is to analyze the image contents to extract circular regions. The watermark is then detected from these regions. If the watermark is detected correctly from at least one region, ownership can be proved successfully. The additive watermarking method in the spatial domain inserts the watermark into the image contents as noise. Therefore, they first apply a Wiener filter to extract this noise by calculating the difference between the watermarked image and its Wiener-filtered version, and then regard the difference as the retrieved watermark. To measure the similarity between the reference watermark generated during watermark insertion and the retrieved watermark, the retrieved circular watermark is converted back into a rectangular watermark by applying the polar mapping. Considering the fact that the watermark is inserted symmetrically, they take the mean value of the two semicircular areas. By this mapping, a rotation of the circular patches is represented as a translation, and hence rotation invariance is achieved for their scheme. They apply circular convolution to the reference watermark and the retrieved one. The degree of similarity is called the response of the detector, which is represented by the maximum value of the circular convolution as follows:

$$\mathrm{similarity} = \max_r \frac{\sum_{m,n} w(m, n)\, w^*(m, n - r)}{\left[\sum_{m,n} w(m, n)\, w(m, n)\right]^{1/2}} \qquad \text{for } r \in [0, n],$$   (5.19)

where w is the reference watermark and w* is the retrieved watermark. The range of similarity values is from −1.0 to 1.0. They can identify the rotation angle πr of the circular region by finding the r with the maximum value. If the similarity exceeds a predefined threshold, the watermark is successfully detected.

Our approach I. We have also conducted in-depth research on robust image watermarking based on the detection of scale-space feature points. The diagram of watermark embedding of our approaches is shown in Fig. 5.13, while that of watermark extraction is shown in Fig. 5.14. As illustrated before, the local regions are invariant to geometric attacks: they cover the same image content even when the image is subject to rotation, scaling and traditional signal processing attacks. However, when rotation occurs, the orientation of the circular regions differs. In order to make the watermark invariant to image rotation, our content-based watermark embedding scheme is proposed as follows.

Fig. 5.13. Diagram of our approaches: watermark embedding

Fig. 5.14. Diagram of our approaches: watermark extraction

Suppose that we would like to embed an N-bit binary watermark w = {w_1, w_2, ..., w_N}, w_i ∈ {0, 1}, into the original image. Each circular region is first divided into homocentric cirque regions (CR), as shown in Fig. 5.15. The number of cirque regions is determined by the watermark length (i.e., N), and the interval between two neighboring regions is R_0/N, with R_0 the radius of the circular region. First, move the origin of the Cartesian coordinates to the center of the circular region, and map the pixels inside the circular region into polar coordinates:
Fig. 5.15. Homocentric cirque regions for watermarking
$$\rho_{x,y} = \sqrt{x^2 + y^2}, \qquad \theta_{x,y} = \arctan\frac{y}{x}.$$   (5.20)

Then the ith cirque region (CR_i) can be expressed as:

$$CR_i = \left\{ (x, y) \,\middle|\, (i-1) \cdot \frac{R_0}{N} \le \rho_{x,y} < i \cdot \frac{R_0}{N} \right\}, \qquad i = 1, 2, \ldots, N.$$   (5.21)
In our approach, watermark embedding is achieved by odd-even quantization. In implementation, all pixels in the same cirque region are quantized into odd or even pixels, resulting in an odd or even region. The pixels inside a CR are first assigned a sign "0" or "1" using the quantization function:

$$Q(x, y) = \begin{cases} 0, & \text{if } k\Delta \le I(x, y) < (k+1)\Delta \text{ for } k = 0, \pm 2, \pm 4, \ldots \\ 1, & \text{if } k\Delta \le I(x, y) < (k+1)\Delta \text{ for } k = \pm 1, \pm 3, \pm 5, \ldots \end{cases}$$   (5.22)

where Δ is the quantization interval and I(x, y) is the pixel value. For the purpose of robustness, a watermark bit should be encoded by moving the pixel value to the middle of the corresponding quantization interval, such that the modified pixel value cannot easily be moved away from the current interval [27]. As a result, our approach operates as follows. Let w_i denote the target watermark bit that is to be encoded in the cirque region CR_i; the quantization noise is then defined as:

$$r(x, y) = I(x, y) - \left\lfloor \frac{I(x, y)}{\Delta} \right\rfloor \cdot \Delta,$$   (5.23)

where ⌊·⌋ is the floor operation. The amount of modification u(x, y) added to the pixel I(x, y) is determined by

$$u(x, y) = \begin{cases} -r(x, y) + 0.5\Delta, & \text{if } Q(x, y) = w_i; \\ -r(x, y) + 1.5\Delta, & \text{if } Q(x, y) \neq w_i \text{ and } r(x, y) > 0.5\Delta; \\ -r(x, y) - 0.5\Delta, & \text{if } Q(x, y) \neq w_i \text{ and } r(x, y) \le 0.5\Delta. \end{cases}$$   (5.24)

As a result, the modified pixel value is obtained by the following formula:

$$I'_{x,y} = I_{x,y} + u(x, y),$$   (5.25)

where I'_{x,y} is the modified pixel value. The above embedding procedure ensures that the modified pixel value is located at the middle of the corresponding quantization interval. The embedding operation is carried out for all regions in turn, producing the whole watermarked image. In the decoder, the watermark can be extracted using an odd-even detector. The first several steps of watermark extraction are exactly the same as in watermark embedding: local invariant regions are first extracted, and each region is divided into N cirque regions using Eq. (5.21). The odd-even detector (OED) is designed as follows to extract the watermark bits from the cirque regions. The pixels inside a cirque region CR_i are first assigned a number "0" or "1" using Eq. (5.22). The number of "0" pixels is denoted by NUM_{i,0}, and the number of "1" pixels is denoted by NUM_{i,1}. Then the watermark bits can be extracted using the following equation:

$$w'_i = \begin{cases} 0, & \text{if } NUM_{i,0} > NUM_{i,1}; \\ 1, & \text{if } NUM_{i,1} > NUM_{i,0}. \end{cases}$$   (5.26)
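The following sketch implements the odd-even quantization of Eqs. (5.22)-(5.25) and the detector of Eq. (5.26) for the pixels of one cirque region; the default Δ = 5 follows the setting used later in this chapter, and the vectorized NumPy form is our own rendering.

```python
import numpy as np

def embed_bit(pixels, bit, delta=5.0):
    """Odd-even quantization, Eqs. (5.22)-(5.25): move every pixel of a
    cirque region to the middle of a quantization interval whose parity
    encodes the watermark bit. pixels: 1-D float array of region pixels."""
    k = np.floor(pixels / delta)
    parity = np.mod(k, 2).astype(int)            # Q(x, y) of Eq. (5.22)
    r = pixels - k * delta                       # quantization noise, Eq. (5.23)
    u = np.where(parity == bit, -r + 0.5 * delta,              # Eq. (5.24)
                 np.where(r > 0.5 * delta, -r + 1.5 * delta,
                          -r - 0.5 * delta))
    return pixels + u                            # Eq. (5.25)

def extract_bit(pixels, delta=5.0):
    """Odd-even detector (OED) of Eq. (5.26): majority vote on interval parity."""
    parity = np.mod(np.floor(pixels / delta), 2).astype(int)
    return int(np.sum(parity == 1) > np.sum(parity == 0))
```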
Finally, the normalized correlation (NC) is employed to evaluate the similarity between the extracted watermark w' and the original watermark w:

$$NC = \frac{\sum_{i=1}^{N} w_i\, w'_i}{\sqrt{\sum_{i=1}^{N} w_i^2}\, \sqrt{\sum_{i=1}^{N} {w'_i}^2}}.$$   (5.27)
The presence of the watermark can be claimed if the normalized correlation is higher than a threshold.

Our approach II. In our first approach, the local circular regions are partitioned into homocentric cirque regions to embed the watermark. We also present an alternative approach that can be used to embed the watermark. The key difference between the first approach and this one lies in the partitioning of the circular regions. Fig. 5.16 illustrates how the circular regions can be partitioned. It should be noted that when rotation occurs, the region extracted from the original image and the region from the rotated image differ in orientation. In order to achieve watermark robustness to rotation, the partition should be implemented in a rotation-invariant pattern. In other words, the start radius should remain unchanged relative to the image content even during image rotation. This can be done by first aligning the region to a standard orientation using rotation normalization [1]. However, if we do it that way, the watermarked region has to be rotated back to the original orientation, which will inevitably introduce interpolation error. To avoid interpolation error, we determine the start radius using only the rotation normalization angle. Suppose that the normalization angle is Θ; then the start radius is determined as shown in Fig. 5.16.

Fig. 5.16. Fan-shaped regions for watermarking

Fig. 5.17. Illustration of the start radius of the partition

Suppose that the watermark to be embedded is w = {w_1, w_2, ..., w_N}, w_i ∈ {0, 1}; then the angle of each fan-shaped sub-area is 2π/N. In order to describe the sub-areas, the pixels are first transformed into polar coordinates as follows:

$$\rho_{x,y} = \sqrt{x^2 + y^2}, \qquad \theta_{x,y} = \arctan\frac{y}{x}.$$   (5.28)

Then each sub-area A_i can be denoted as:

$$A_i = \left\{ (x, y) \,\middle|\, -\Theta + (i-1)\frac{2\pi}{N} \le \theta_{x,y} < -\Theta + i\frac{2\pi}{N} \right\}, \qquad i = 1, 2, \ldots, N.$$   (5.29)

The above partition of the circular region is rotation invariant. Fig. 5.17 shows an example of the start radius of the partition. In Fig. 5.17, the two circular regions are extracted from the original image and the 30-degree-rotated image, respectively. It can be seen that the start radii of the two circular regions remain unchanged relative to the image content. As a result, the partition into fan-shaped areas is done in a rotation-invariant way. Therefore, it is possible to achieve watermark synchronization even when the watermarked image is subject to rotation attacks.

Fig. 5.18. False-alarm probability, assuming N = 24 and M = 5

Fig. 5.19. Watermark invisibility: (a) the original image, (b) the watermarked image, (c) the magnified residual image
In approach II, watermark embedding and extraction can be achieved using the same methods as in approach I. In both approaches, the watermark is claimed to be present by comparing the similarity with a predetermined threshold. In both of our approaches, this threshold can be determined based on the false-alarm probability. Suppose that each bit of the watermark is an independent variable. The probability of a k-bit match between the N-bit extracted and original watermark bit sequences is calculated as:

$$p_k = \binom{N}{k} p^k (1 - p)^{N-k},$$   (5.30)

where p is the success probability, which indicates the probability that an extracted bit matches the original watermark bit. Assuming that p = 0.5, p_k can be rewritten as

$$p_k = \frac{N!}{k!(N-k)!}\, (0.5)^N.$$   (5.31)

Then the false-alarm probability for one local region is computed as

$$P_{local} = \sum_{k=T}^{N} (0.5)^N \frac{N!}{k!(N-k)!},$$   (5.32)

where k is the number of matched bits and T is the threshold for watermark detection. In this chapter, the image is claimed to be watermarked if we can successfully detect the watermark from at least two local regions. In this case, the global false-alarm probability is

$$P_{global} = \sum_{i=2}^{M} \binom{M}{i} (P_{local})^i (1 - P_{local})^{M-i},$$   (5.33)

where M is the number of local regions. Fig. 5.18 shows a plot of the global false-alarm probability against various thresholds when the watermark is 24 bits long. For example, if the threshold is set to 19, the false-alarm probability is as low as 1 × 10⁻⁴.
where M is number of the local regions. Fig. 5.18 shows a plot of the global false-alarm probability against various thresholds when the watermark is 24bit long. For example. if the threshold is set to be 19, the false-alarm probability is as low as 1 × 10−4 . In order to demonstrate the efficiency of our proposed scheme, we present some simulation results using our approach II. Fig. 5.19 shows an example of the watermarked images on Lena, House and the corresponding residual images when the watermark is 24-bit-long and = 5. It can be seen from Fig. 5.19(a) and Fig. 5.19(b) that the inserted watermark is invisible to the naked eyes. The Peak Signal to Noise Ratio (PSNR) values are 41.11dB and 41.03dB, respectively. The quality of the watermarked image is satisfactory. Note that the residual images are magnified 60 times for better display. Fig. 5.20 shows the relation between PSNR values and the quantization step when a 24-bit-long watermark is embedded. It is observed that the PSNR
5
Robust Image Watermarking Based on Scale-Space Feature Points
101
Fig. 5.20. PSNR values with different quantization steps
decreases with increasing quantization steps. When approaches 14, the PSNR falls around 30dB. In this paper, we set = 5 so as to make the image quality keep around 40dB while achieving the most desired robustness. Simulation results also show that the length of the watermark has little effect on the quality of the watermarked image. To test the robustness of the proposed scheme, the watermarked images are subject to different attacks, including signal processing attacks and geometric attacks. The simulation results are listed in Table 5.1. We also compared our results with those of Lee’s scheme [8]. In Table 5.1, the number represent the maximum of the NC values extracted from all the local regions. Note that “/” indicates that there is no result provided by the author. It can be seen from Table 5.1 that the proposed scheme is robust to both signal processing attacks and geometric attacks. When no attack occurs, it can extract the watermark accurately, while the similarity is only 0.727 in Lee’s approach. The proposed scheme is robust to RST attacks. The embedded watermark can be exactly extracted even after large rotations, such as 60◦ and 90◦ . When the image is scaled up, the NC values are all 1. The similarities are a little lower under scaling down attacks, because some information loses therein. It is also robust to locally cropping attacks, because the watermark is embedded locally into multiple regions. The watermark can also be detected even after some combined attacks, such as rotation and scaling. For all attacks, the watermark can be successfully detected. Besides, the watermark similarities are higher than Lee’s method, in all cases.
Table 5.1. Experimental results on watermark robustness

Type of attack                  Ref. [8]        Proposed
                                          Lena    Peppers  House
No attack                       0.727     1       1        1
JPEG compression 50             0.532     0.958   0.792    0.708
JPEG compression 80             0.660     1       0.875    0.875
Median filtering 3*3            0.629     1       1        1
Added Gaussian noise            0.532     0.750   0.708    0.792
Rotation 10 degree              0.600     1       1        1
Rotation 30 degree              0.514     1       1        1
Rotation 60 degree              /         1       1        1
Rotation 90 degree              /         1       1        1
Rot 5 degree + crop             0.641     1       1        1
Rot 15 degree + crop            0.536     1       1        1
Rot 45 degree + crop            0.550     1       1        1
Scaling 0.7                     0.472     1       0.917    0.958
Scaling 0.8                     0.539     1       0.958    1
Scaling 1.2                     0.614     1       1        1
Scaling 1.5                     0.524     1       1        1
Translation 20 pixels           /         1       1        1
Translation 40 pixels           /         1       1        1
Centered crop 10 percent        0.683     1       1        1
Centered crop 25 percent        0.621     1       1        1
Rot 20 degree + scaling 0.8     /         1       0.875    0.958
Rot 40 degree + scaling 1.5     /         1       1        1
5.5 Geometrically Robust Image Watermarking Using SIFT Feature Points and Zernike Moments [10]

Most of the existing image watermarking schemes using scale-space feature points embed the watermark in the spatial domain. Spatial-domain watermarking schemes are generally less robust to signal processing attacks, because the watermark is embedded by modifying pixel values directly. Furthermore, watermark robustness to geometric attacks is also compromised, because geometric attacks such as rotation, scaling and translation introduce interpolation error, which can be seen as a kind of signal processing attack. In our opinion, if multi-scale feature point based synchronization can be combined with transform-domain watermarking techniques, the above problems can be solved. However, the difficulty lies in the fact that the extracted local regions differ in size, so the watermark cannot be embedded in a uniform framework. In this section, we present an algorithm based on the above idea. In the proposed scheme, watermark synchronization is achieved using the scale-invariant feature transform. The watermark signal is composed of the Zernike moments (ZM) computed on all the extracted circular regions after scale normalization. The ZM vectors are then modified and reconstructed, producing error images. The watermark is embedded by adding the error images to the corresponding local circular regions directly in the spatial domain. During watermark detection, local circular regions are first extracted from the distorted image, and the Zernike moments are computed over each local region after scale normalization. A minimum distance decoder is proposed to detect the watermark blindly. Simulation results show that the watermark is robust to traditional signal processing attacks, rotation, scaling, as well as combined attacks.

5.5.1 Watermark Synchronization
As described in Section 5.3.3, the SIFT extracts keypoints with their location (p_1, p_2), scale σ and orientation θ. In the proposed scheme, we adopt the location and scale information to generate a circular patch centered at the feature point as follows:

(x − p_1)² + (y − p_2)² = (kσ)²,   (5.34)

where k is a magnification factor that controls the radius of the patch. The SIFT was originally designed for image matching. It extracts many interest points which densely cover the image content. To adapt them to the watermarking system, the keypoints must be pre-processed. In this chapter, we determine their distribution using feature point matching. The original image is first rotated. We then extract SIFT keypoints from the original image and from the rotated one, respectively, and apply feature matching to the detected points using the fast nearest-neighbor algorithm [14] with a threshold. A set of matched keypoints from the original image is thus obtained, together with their descriptors. These points are the initial candidates for local region generation. The candidate points are then filtered according to their scale. The SIFT detects keypoints from a set of Gaussian-smoothed images, so candidate keypoints have different scales. Keypoints whose scales are small or large have a low probability of being re-detected, because they are unstable when the image content is modified. As a result, we select keypoints from the candidate points whose scales lie between a minimum and a maximum value [8]. In our scheme, the minimum scale value is set to 4 and the maximum scale value to 8. After feature matching and scale selection, the next step is to reduce keypoints that are too close together. In feature matching, each candidate point is associated with a distance ratio produced by the fast nearest-neighbor algorithm: the smaller the ratio, the more robust the keypoint. As a result, we first select the point with the smallest distance ratio. We then compute the distance between every other point and this one; if it is smaller than the sum of the two expected radii (kσ), the point is dropped, otherwise it is kept. This operation is repeated until all points are processed.
Fig. 5.21. Circular regions for watermarking
Finally, we dismiss the keypoints that are too close to the image edge, obtaining the most robust keypoints, and use them to generate non-overlapped circular patches. These local regions are rotation and scale invariant. Fig. 5.21 shows an example of the patches generated on the image Lena.
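The synchronization pre-processing just described can be sketched as follows with OpenCV. The rotation angle, the distance-ratio threshold, and the mapping from OpenCV's keypoint diameter kp.size to a scale value are our own illustrative assumptions; the scale limits 4 and 8 follow the text.

```python
import cv2

def robust_keypoints(img, smin=4.0, smax=8.0, ratio=0.6, angle=30.0):
    """Sketch of Section 5.5.1 pre-processing: keep SIFT keypoints that
    survive a test rotation and whose scales lie in [smin, smax]."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img, None)
    h, w = img.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    kp2, des2 = sift.detectAndCompute(cv2.warpAffine(img, M, (w, h)), None)
    # fast nearest-neighbour matching with Lowe's distance-ratio test
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    kept = []
    for m, n in matches:
        if m.distance < ratio * n.distance:
            kp = kp1[m.queryIdx]
            sigma = kp.size / 2    # kp.size is a diameter, not sigma itself
            if smin <= sigma <= smax:
                # smaller distance ratio = more robust point (used for the
                # greedy overlap-removal step that follows)
                kept.append((kp.pt, sigma, m.distance / n.distance))
    return kept
```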
5.5.2 Zernike Moments and Watermark Generation
In order to embed the watermark into the circular regions, the embedding scheme should be designed in a rotation-invariant pattern, because the orientations of the local regions differ when the image is rotated. In this chapter, we employ the Zernike moment to design the watermark, because the magnitude of a ZM is rotation invariant. We first introduce the basic theory of ZMs and then present the process of watermark generation. The Zernike basis is a set of orthogonal and complete polynomials defined on the unit circle [5]. The polynomial is defined as follows:

$$V_{nm}(x, y) = V_{nm}(\rho, \theta) = R_{nm}(\rho)\, e^{jm\theta},$$   (5.35)

where ρ = √(x² + y²) and θ = arctan(y/x). Here the order n is a nonnegative integer, while the repetition m is an integer subject to the constraint that n − |m| is nonnegative and even. R_{nm}(ρ) is the radial polynomial defined as:

$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^s (n-s)!\, \rho^{n-2s}}{s!\left(\frac{n+|m|}{2} - s\right)!\left(\frac{n-|m|}{2} - s\right)!}.$$   (5.36)

These polynomials are orthogonal and satisfy

$$\iint_{x^2+y^2 \le 1} V_{nm}(x, y)\, V^*_{pq}(x, y)\, dx\, dy = \frac{\pi}{n+1}\, \delta_{np}\, \delta_{mq},$$   (5.37)

where

$$\delta_{ab} = \begin{cases} 1, & a = b; \\ 0, & a \neq b. \end{cases}$$
Given an image f(x, y), the ZMs can be computed by

$$A_{nm} = \frac{n+1}{\pi} \iint_{x^2+y^2 \le 1} f(x, y)\, V^*_{nm}(x, y)\, dx\, dy.$$   (5.38)
For a digital image of size N × N, Eq. (5.38) can be approximated by

$$\hat{A}_{nm} = \frac{n+1}{\pi} \sum_{i=1}^{N} \sum_{j=1}^{N} f(x_i, y_j)\, V^*_{nm}(x_i, y_j)\, \Delta x\, \Delta y,$$   (5.39)

where x_i² + y_j² ≤ 1 and Δx = Δy = 2/N. Given all ZMs up to a maximum order N_max, the image can be reconstructed as

$$f(x, y) = \sum_{n=0}^{N_{max}} \; \sum_{\{m \,:\, n \ge |m|,\; n - |m| \text{ even}\}} A_{nm}\, V_{nm}(x, y).$$   (5.40)
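A direct NumPy rendering of Eqs. (5.36) and (5.39) is sketched below. It is an unoptimized illustration (real implementations typically use recurrences for the radial polynomials), and the pixel-to-unit-disk mapping is one common convention.

```python
import numpy as np
from math import factorial

def radial_poly(rho, n, m):
    """Radial polynomial R_nm of Eq. (5.36); rho may be an array."""
    m = abs(m)
    R = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = ((-1) ** s * factorial(n - s) /
             (factorial(s) * factorial((n + m) // 2 - s)
              * factorial((n - m) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    return R

def zernike_moment(patch, n, m):
    """Discrete Zernike moment of a square float patch, Eq. (5.39);
    pixels outside the unit disk are ignored."""
    N = patch.shape[0]
    xs = np.linspace(-1 + 1 / N, 1 - 1 / N, N)   # pixel-centre coordinates
    x, y = np.meshgrid(xs, xs)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0
    V = radial_poly(rho, n, m) * np.exp(1j * m * theta)   # Eq. (5.35)
    dxdy = (2.0 / N) ** 2                                 # delta-x * delta-y
    return (n + 1) / np.pi * np.sum(patch[mask] * np.conj(V[mask])) * dxdy
```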
As the proposed synchronization method extracts circular regions from the original image, a ZM vector can be computed directly over each extracted region. In this scheme, the watermark signal is composed of all the ZM vectors. Suppose that we extract K local circular regions from the original image and that the order of the computed ZMs is n; then the watermark signal consists of K ZM vectors as follows:

$$\text{Watermark signal:} \quad \begin{bmatrix} A^1_{0,0}, & A^1_{1,-1}, & A^1_{1,1}, & \cdots, & A^1_{n,-n}, & \cdots, & A^1_{n,n} \\ A^2_{0,0}, & A^2_{1,-1}, & A^2_{1,1}, & \cdots, & A^2_{n,-n}, & \cdots, & A^2_{n,n} \\ A^3_{0,0}, & A^3_{1,-1}, & A^3_{1,1}, & \cdots, & A^3_{n,-n}, & \cdots, & A^3_{n,n} \\ & & & \cdots & & & \\ A^K_{0,0}, & A^K_{1,-1}, & A^K_{1,1}, & \cdots, & A^K_{n,-n}, & \cdots, & A^K_{n,n} \end{bmatrix}$$   (5.41)

In Eq. (5.41), each row is the ZM vector computed over one circular region.
5.5.3 Moment Modification and Watermark Insertion
In this chapter, the watermark is embedded repeatedly into each local region by modifying its ZM vector. We therefore first introduce how to modify the moments and then present the watermark insertion process. Some of the following is based on [5]. Given the ZM vector (denoted by A) of one local region R(x, y), suppose we modify A_{nm} by Δ_{nm} and denote the modified moment by A'_{nm}; then the image reconstructed from the modified moments is

$$\hat{R}(x, y) = \sum_{n=0}^{\infty} \sum_{m} A'_{nm} V_{nm}(x, y) = \sum_{n=0}^{\infty} \sum_{m} (A_{nm} + \Delta_{nm}) V_{nm}(x, y)$$
$$= R(x, y) + \sum_{n=0}^{\infty} \sum_{m} \Delta_{nm} V_{nm}(x, y) = R(x, y) + e(x, y).$$   (5.42)
The reconstructed image consists of two parts: the original image and an error image. If we only modify A_{k,l} by α, then the error signal e(x, y) is

$$e(x, y) = \hat{R}(x, y) - R(x, y) = \sum_{n=0}^{\infty} \sum_{m} \Delta_{nm} V_{nm}(x, y)$$
$$= \cdots + \Delta_{k,-l} V_{k,-l} + \cdots + \Delta_{k,l} V_{k,l} + \cdots = \alpha (V_{k,-l} + V_{k,l}).$$   (5.43)

Note that, because of the conjugate symmetry of ZMs, A_{k,−l} must also be modified accordingly to obtain a real image. Therefore,

$$e(x, y) = \begin{cases} \alpha V_{k,l}, & l = 0; \\ \alpha (V_{k,l} + V_{k,-l}), & l \neq 0. \end{cases}$$   (5.44)

If we add e(x, y) to the original image in the spatial domain, the ZMs will change. Ideally, the added image e(x, y) only affects A_{k,l}, and A_{k,l} becomes

$$A'_{k,l} = A_{k,l} + \alpha.$$   (5.45)
As an example, we modified the moment A_{3,1} of one circular region and added the reconstructed error image to the original region. We then computed the ZMs of the modified region again; the resulting magnitude differences are shown in Fig. 5.22. In Fig. 5.22, obvious peaks can only be found where the ZMs have been modified. This indicates that we can modify the ZM vector of the original watermark at a predetermined order and repetition for watermark embedding. During watermark detection, we compute the ZM vector from a synchronized region and compute the magnitude difference between the original vector and the newly extracted one. If peaks appear only at the same order and repetition, we can claim the presence of the watermark; otherwise not. The diagram of the proposed watermark insertion scheme is shown in Fig. 5.23. The SIFT feature points are first extracted from the original image, and the non-overlapped circular regions are generated using the method described in Section 5.5.1. Each region is first scale normalized, and its ZM vector is then extracted as the watermark. The ZM vector is modified at the specified order and repetition. An error image is then produced by reconstructing the modified vector. The error image is inversely scale normalized and added to the original region with a proper strength to obtain the watermarked region, which then replaces the original region. This operation is repeated until all local regions are watermarked. Finally, the whole image is inversely scale normalized to obtain the watermarked image.
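Using the radial_poly helper from the Zernike sketch above, the moment-modification step can be illustrated as follows. For l ≠ 0 the error image of Eq. (5.44) reduces to α·2Re(V_{n,m}) by conjugate symmetry; the (4, 2) order/repetition mirrors the choice made later in this chapter, while the value of α is an illustrative assumption.

```python
import numpy as np

def embed_in_patch(patch, alpha=0.01, n=4, m=2):
    """Sketch of the insertion of Section 5.5.3: shift A_{n,+-m} by alpha and
    add the reconstructed error image (Eqs. 5.43-5.44) to the region in the
    spatial domain. patch: square 2-D float array (one circular region)."""
    N = patch.shape[0]
    xs = np.linspace(-1 + 1 / N, 1 - 1 / N, N)
    x, y = np.meshgrid(xs, xs)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    V = radial_poly(rho, n, m) * np.exp(1j * m * theta)
    # e(x, y) = alpha * (V_{n,m} + V_{n,-m}) = alpha * 2 * Re(V), a real image
    err = alpha * 2.0 * np.real(V)
    err[rho > 1.0] = 0.0          # the error image lives on the unit disk only
    return patch + err
```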
5.5.4 Watermark Detection
Figure 5.24 shows the diagram of watermark detection. In Fig. 5.24, the first several steps are exactly the same as those of watermark insertion: non-overlapped circular regions are first extracted for synchronization. Each region is scale normalized and its ZMs are computed, and a minimum distance decoder is proposed to detect the watermark.

Fig. 5.22. Magnitude difference at (3, −1) and (3, 1)

Fig. 5.23. Diagram of watermark insertion

The minimum distance decoder is designed as follows. Suppose that the ZM vector computed over one synchronized region is denoted by (Â^i_{0,0}, Â^i_{1,−1}, Â^i_{1,1}, ..., Â^i_{n,−n}, ..., Â^i_{n,n}). The minimum distance decoder finds the ZM vector from the original watermark
Fig. 5.24. Diagram of watermark extraction
signal of Eq. (5.41) that has the smallest distance to the extracted one. The distance is computed as follows:

$$D_{i,j} = (\hat{A}^i_{0,0} - A^j_{0,0})^2 + (\hat{A}^i_{1,-1} - A^j_{1,-1})^2 + \cdots + (\hat{A}^i_{n,n} - A^j_{n,n})^2,$$   (5.46)

where j = 1, 2, ..., K. Note that in Eq. (5.46), the moments that were previously modified for watermark insertion should be excluded from the computation. For example, if we modify A_{3,1} and A_{3,−1} to insert the watermark, then the terms involving Â^i_{3,1} and Â^i_{3,−1} should not appear in Eq. (5.46). Then, the minimum distance between the extracted ZM vector and the watermark is:

$$D_{min}(i) = \min(D_{i,1}, D_{i,2}, D_{i,3}, \cdots, D_{i,K}).$$   (5.47)
Upon obtaining the minimum distance D_min(i), say D_min(i) = D_{i,m}, the absolute difference between (Â^i_{0,0}, Â^i_{1,−1}, ..., Â^i_{n,n}) and (A^m_{0,0}, A^m_{1,−1}, ..., A^m_{n,n}) is then computed as:

$$E_{diff} = \left( |\hat{A}^i_{0,0} - A^m_{0,0}|, \; |\hat{A}^i_{1,-1} - A^m_{1,-1}|, \; \cdots, \; |\hat{A}^i_{n,n} - A^m_{n,n}| \right).$$   (5.48)
Ideally, peaks can only be found at positions (ORD_mod, −REP_mod) and (ORD_mod, REP_mod) of E_diff, where the moments were previously modified to embed the watermark. The shape of the detected watermark is shown in Fig. 5.24. In implementation, we set a threshold (T) to enhance watermark robustness. If for any s ≠ ORD_mod, t ≠ ±REP_mod the following relation always holds,
$$|\hat{A}^i_{ORD_{mod}, \pm REP_{mod}} - A^m_{ORD_{mod}, \pm REP_{mod}}| - |\hat{A}^i_{s,t} - A^m_{s,t}| > T,$$   (5.49)

then we can claim that the watermark has been successfully detected. In this chapter, the threshold T is set to 0.002, which was determined experimentally.
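A sketch of the minimum distance decoder of Eqs. (5.46)-(5.49) is given below. For simplicity it works on moment magnitudes throughout, and it excludes the modified indices from the distance of Eq. (5.46) as the text prescribes; the array layout is our own assumption.

```python
import numpy as np

def detect_watermark(extracted, watermark_signal, modified_idx, T=0.002):
    """Minimum distance decoder, Eqs. (5.46)-(5.49).
    extracted:        complex ZM vector of one synchronized region (length L)
    watermark_signal: K x L complex array of original ZM vectors, Eq. (5.41)
    modified_idx:     indices of the moments altered at embedding time,
                      excluded from the distance of Eq. (5.46)"""
    L = watermark_signal.shape[1]
    keep = np.setdiff1d(np.arange(L), modified_idx)
    # Eqs. (5.46)-(5.47): closest original vector over the unmodified moments
    d = np.sum((np.abs(watermark_signal[:, keep])
                - np.abs(extracted[keep])) ** 2, axis=1)
    m = int(np.argmin(d))
    # Eq. (5.48): absolute magnitude difference against the best match
    diff = np.abs(np.abs(extracted) - np.abs(watermark_signal[m]))
    # Eq. (5.49): the peaks at the modified indices must exceed all others by T
    return diff[modified_idx].min() - np.delete(diff, modified_idx).max() > T
```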
5.5.5 Experimental Results and Discussions
In this section, two experiments are conducted to evaluate the performance of the proposed scheme, namely watermark invisibility and watermark robustness. In the experiments, gray images of size 512 × 512 are used as the original images, including Lena, Peppers, Boat, Girl, Barbara, Baboon and House. The order of the ZMs used for watermark generation is 6. The moments (4, 2) and (4, −2) are modified to insert the watermark, i.e. ORD_mod = 4 and REP_mod = 2.

Watermark invisibility. Fig. 5.25 shows an example of watermark embedding. The original images and the watermarked images are shown in Fig. 5.25(a) and Fig. 5.25(b), respectively. Fig. 5.25(c) shows the corresponding residual images between Fig. 5.25(a) and Fig. 5.25(b), magnified 20 times for better display. It is easily seen from Fig. 5.25(a) and Fig. 5.25(b) that the inserted watermark is not visible to the naked eye. The PSNR values are 40.4 dB and 43.2 dB for Lena and House, respectively. In implementation, the PSNR values for all test images are higher than 40 dB.
Fig. 5.25. (a) Original images, (b) Watermarked images, (c) Magnified residual images
The numbers of extracted regions for watermarking are 6 and 7, respectively, for Lena and House. Our scheme achieves image quality comparable with that of [21], while both are better than that of [23]. The reason is that the method in [8] embeds the watermark by modifying the pixels in the spatial domain directly, while [21] and our scheme embed the watermark using transform-domain techniques.

Watermark robustness. In this subsection, we test the robustness of our scheme against both traditional signal processing attacks and geometric attacks, such as JPEG compression, added noise, image rotation, scaling, etc. As the order of the ZMs is 6 and the (4, ±2)th moments are modified to embed the watermark, this produces (6+1)(6+2)/2 = 28 ZMs, and the detected watermark is located at index 12 and index 14, respectively. Fig. 5.26 to Fig. 5.32 show some distorted images and the corresponding detected watermarks. It can be seen from Fig. 5.26 to Fig. 5.32 that, for JPEG compression and added Gaussian noise, the watermark can be readily detected. The above results are comparable with those of [21], while both are better than those of [2] and [8]. The reason is that in [2] and [8] the watermarks are embedded in the spatial domain additively, somewhat like added noise.
Fig. 5.26. Watermarked image and the detected watermark
Fig. 5.27. 20% JPEG compressed image and the detected watermark
Fig. 5.28. Added-noise image and the detected watermark
Fig. 5.29. Rotated image (10 degree) and the detected watermark
Fig. 5.30. Scaled image (0.8×) and the detected watermark
As a result, they are more likely to be affected by signal processing attacks. Rotation and scaling are two types of geometric attacks that can easily be carried out without causing visible degradation. Our scheme is robust to these attacks. Scale invariance is achieved by normalizing the original image to a uniform scale before feature detection. Bas' scheme [2] and Tang's scheme [21] are theoretically not robust to scaling attacks, because their feature detection methods are scale sensitive.
Fig. 5.31. Rotated plus scaled image (10 degree and 1.2×) and the detected watermark
Fig. 5.32. Center-cropped image and the detected watermark
Lee's scheme [8] can resist image scale changes, but all the watermark similarities are smaller than 0.7. Rotation robustness of our scheme is achieved using pseudo-Zernike moments. The three schemes in [2, 8, 21] can all resist image rotation; however, rotation invariance is only achieved at small angles, typically 10 degrees in [2], 5 degrees in [21] and 10 degrees in [8]. Our scheme can resist large rotations, such as 90 degrees or even obtuse angles. Our scheme is also robust to combined attacks, such as rotation plus scaling.
5.6 Conclusion

In this chapter, we have presented a framework of scale-space feature point based robust image watermarking (SSFW). This kind of watermarking scheme is based on the detection of image feature points in the scale space. The basic principles of scale space and how the interest points can be extracted were first introduced. As these points were originally designed for image matching, a further modification procedure is necessary for effective watermark synchronization, which has been addressed in this chapter. We have also presented several content-based watermark embedding and extraction schemes which are applicable to the SSFW. An algorithm which combines the scale-invariant feature transform and Zernike moments was introduced in detail to further illustrate the SSFW framework. Most of the existing schemes using scale-space feature points embed the watermark in the spatial domain, so that the performance in terms of watermark robustness is not satisfactory. How to combine scale-space feature based watermark synchronization with traditional transform-domain watermark embedding is still an open problem.
Acknowledgment The research work related to this chapter was supported by Hi-Tech Research and Development Program of China (863) (2006AA01Z127), National Natural Science Foundation of China (60572152) and Ph.D Programs Foundation of Ministry of Education of China (20060701004).
References

1. Alghoniemy, M., Tewfik, A.: Geometric invariance in image watermarking. IEEE Trans. Image Processing 13, 145–153 (2004)
2. Bas, P., Chassery, J., Macq, B.: Geometrically invariant watermarking using feature points. IEEE Trans. Image Processing 11, 1014–1028 (2002)
3. Cox, I.J., Miller, M.L., Bloom, J.A.: Digital Watermarking: Principles & Practice. Morgan Kaufmann, Los Altos (2001)
4. Delannay, D., Macq, B.: Generalized 2-D cyclic patterns for secret watermark generation. In: Proc. Int'l Conf. Image Processing, vol. 2, pp. 72–79 (2000)
5. Kim, H., Lee, H.: Invariant image watermark using Zernike moments. IEEE Trans. Circuits and Systems for Video Technology 13, 766–775 (2003)
6. Kutter, M.: Watermarking resisting to translation, rotation, and scaling. In: Proc. SPIE Multimedia Systems and Applications, vol. 3528, pp. 423–431 (1999)
7. Kutter, M., Bhattacharjee, S., Ebrahimi, T.: Towards second generation watermarking schemes. In: Proc. Int'l Conf. Image Processing, vol. 1, pp. 320–323 (1999)
8. Lee, H., Kim, H., Lee, H.: Robust image watermarking using local invariant features. Optical Engineering 45, 037002 (2006)
9. Lee, H., Lee, H.: Copyright protection through feature-based watermarking using scale-invariant keypoints. In: Proc. Int'l Conf. Consumer Electronics, pp. 225–226 (2006)
10. Li, L., Guo, B., Shao, K.: Geometrically robust image watermarking using scale-invariant feature transform and Zernike moments. Chinese Optics Letters 5, 332–335 (2007)
11. Licks, V., Jordan, R.: Geometric attacks on image watermarking systems. IEEE Multimedia 12, 68–78 (2005)
12. Lin, C., Wu, M., Bloom, J., Cox, I., Miller, M., Lui, Y.: Rotation, scale, and translation resilient watermarking of images. IEEE Trans. Image Processing 10, 767–782 (2001)
13. Lindeberg, T.: Scale-space theory: a basic tool for analysing structures at different scales. Journal of Applied Statistics 21, 224–270 (1994)
14. Lowe, D.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 91–110 (2004)
15. Lu, W., Lu, H., Chung, F.: Feature based watermarking using watermark template match. Applied Mathematics and Computation 177, 377–386 (2006)
16. Mikolajczyk, K., Schmid, C.: Indexing based on scale invariant interest points. In: Proc. Int'l Conf. Computer Vision, vol. 1, pp. 525–531 (2001)
17. O'Ruanaidh, J., Pun, T.: Rotation, scale and translation invariant spread spectrum digital image watermarking. Signal Processing 63, 303–317 (1998)
18. Pan, J.S., Huang, H.C., Jain, L.C. (eds.): Intelligent Watermarking Techniques. World Scientific Publishing Company, Singapore (2004)
19. Pan, J.S., Huang, H.C., Jain, L.C., Fang, W.C. (eds.): Intelligent Multimedia Data Hiding. Springer, Heidelberg (2007)
20. Pereira, S., Pun, T.: Robust template matching for affine resistant image watermarks. IEEE Trans. Image Processing 9, 1123–1129 (2000)
21. Qi, X., Qi, J.: A robust content-based digital image watermarking scheme. Signal Processing 87, 1264–1280 (2007)
22. Seo, J., Yoo, C.: Localized image watermarking based on feature points of scale-space representation. Pattern Recognition 37, 1365–1375 (2004)
23. Tang, C., Hang, H.: A feature-based robust digital image watermarking scheme. IEEE Trans. Signal Processing 51, 950–959 (2003)
24. Xin, Y., Liao, S., Pawlak, M.: Geometrically robust image watermarking via pseudo-Zernike moments. In: Proc. Canadian Conf. Electrical and Computer Engineering, vol. 2, pp. 939–942 (2004)
25. Xin, Y., Liao, S., Pawlak, M.: Robust data hiding with image invariants. In: Proc. Canadian Conf. Electrical and Computer Engineering, pp. 963–966 (2005)
26. Xin, Y., Liao, S., Pawlak, M.: Circularly orthogonal moments for geometrically robust image watermarking. Pattern Recognition 40, 3740–3752 (2007)
27. Yu, G., Lu, C., Liao, H.: Mean-quantization-based fragile watermarking for image authentication. Optical Engineering 40, 1396–1408 (2001)
28. Zheng, D., Liu, Y., Zhao, J., El-Saddik, A.: A survey of RST invariant image watermarking algorithms. ACM Computing Surveys 39, 1–89 (2007)
29. Zheng, D., Zhao, J., El-Saddik, A.: RST-invariant digital image watermarking based on log-polar mapping and phase correlation. IEEE Trans. Circuits and Systems for Video Technology 13, 753–765 (2003)
6 Intelligent Perceptual Shaping in Digital Watermarking
Asifullah Khan and Imran Usman
Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences, Nilore-45650, Islamabad, Pakistan
[email protected], [email protected]
Summary. With the rapid technological advancement in the development, storage and transmission of digital content, watermarking applications are both growing in number and becoming more complex. This has prompted the use of computational intelligence in watermarking, especially for thwarting attacks. In this context, we describe the development of a new watermarking system based on the intelligent perceptual shaping of a digital watermark using Genetic Programming (GP). The proposed approach utilizes optimum embedding strength together with appropriate DCT position selection and information pertaining to the conceivable attack, in order to achieve a superior tradeoff between the two conflicting properties in digital watermarking, namely robustness and imperceptibility. This tradeoff is achieved by developing superior perceptual shaping functions using GP, which learn the content of a cover image by exploiting the sensitivities/insensitivities of the Human Visual System (HVS) as well as attack information. The improvement in imperceptibility and the bit correct ratio after attack are employed as the multi-objective fitness criteria in the GP search.
6.1 Introduction
Recent years have witnessed a striking increase in digital multimedia representation, storage and distribution, due to the widespread usage of interconnected networks. Digital media, such as images, are used on the Internet and portable devices for display purposes in order to market ideas and sell products. This, indeed, has triggered an increase in multimedia content piracy. In this context, digital watermarking is a promising technique to help protect intellectual property rights and enforce piracy control. Robust watermarking is based on embedding a transparent watermark in the cover work for the purpose of copyright protection or identifying the sender/receiver of the data. In this type of watermarking, the integrity of the watermark is of prime importance, i.e., the watermark should sustain all
attacks unless the cover work is substantially degraded and becomes of little or no use. In robust watermarking, capacity is usually not an issue; however, high capacity is always desirable. Different applications pose different requirements on watermark shaping in terms of the attacks that are likely to be mounted on them [11]. An attack may be an intentional and/or unintentional manipulation of the marked cover work, thus reducing the watermark detection and decoding performance. A universal watermarking system that can withstand all attacks and at the same time fulfill all other desirable requirements is almost impossible to develop [3, 11]. Usually, robustness is achieved at the cost of imperceptibility. As these two properties contradict each other, while designing a watermarking system one needs to make an optimum choice between them in accordance with the conceivable attack and the intended application. This need has prompted the use of intelligent optimization techniques, whereby the problem of making balanced alterations to the original features during embedding is formulated as an optimization problem. The work by Huang et al. [16] demonstrates that, keeping in view the robustness versus imperceptibility tradeoff, optimal embedding positions in block-based DCT domain watermarking can be selected using Genetic Algorithms. Exploiting machine learning capabilities further for the improvement of watermarking techniques, Khan et al. [5] proposed the idea of selecting a suitable strength of alteration in DCT coefficients using GP. For this purpose, enhanced Perceptual Shaping Functions (PSFs) are developed that make an effective tradeoff between robustness and imperceptibility. These shaping functions are image adaptive and offer superior performance in terms of the tradeoff as compared to the conventional Watson's model, originally designed for JPEG compression. In addition to the tradeoff, the conceivable attack information can be exploited as well. For this purpose, the problem is considered as a multi-objective optimization problem, i.e., achieving an optimal tradeoff as well as structuring the watermark in accordance with the anticipated attack, by selecting the appropriate strength of watermark embedding. This type of strategy is known as intelligent perceptual shaping. To shape a watermark according to any cover image, perceptual models [12, 13, 18, 20] that are frequently used in image compression are utilized. These perceptual models make a tradeoff between robustness and imperceptibility according to the cover image. However, they do not take into consideration the watermark application and thus the anticipated attack. The aim of this chapter is to describe an intelligent and automatic system capable of structuring digital watermarks for practical applications where not only fair imperceptibility is required, but where achieving resistance against a conceivable attack is also considered a more realistic approach. A system for developing appropriate PSFs is presented, which are image independent, but application specific. This approach is adaptive not only towards perceptual shaping but towards the application environment as well. It is easy to implement and applicable to all watermarking approaches that employ structuring
of the watermark in view of the HVS before embedding. We call this technique the Genetic Perceptual Shaping (GPS) scheme, and the genetically developed PSFs Genetic Perceptual Shaping Functions (GPSFs). In section 6.2 we present a brief overview of HVS modeling, attacks and their countermeasures, and spread spectrum based watermark embedding and decoding. It also provides a general overview of GP, along with some discussion of the objective measures of watermark robustness and imperceptibility. The GPS scheme is presented in section 6.3, which includes both the training and the testing phases. Results obtained through experimentation are discussed in section 6.4. Section 6.5 concludes this chapter by analyzing the prospects and future directions of intelligent watermarking.
6.2 Attack-Resistant Perceptual Shaping: Some Preliminaries
6.2.1 Human Visual System Modeling
Human perception is not uniformly sensitive across all regions of an image signal; it varies with the spatial frequency, luminance and color of its input. This suggests that, when embedded, all components of a watermark may not be equally perceptible. If we amplify the watermark in the areas where it is well disguised and attenuate it in the areas where it is easily perceptible, we are able to embed a stronger watermark. This type of strategy is known as perceptual shaping of a watermark [3], and the models used are called perceptual models. Depending upon the ability of the HVS to perceive or not perceive a certain stimulus, perceptual models assign a perceptual slack, α(i, j), to each term of the cover work expressed in some domain. Many perceptual models have been proposed in the literature, including Watson's perceptual model [12, 18], originally proposed by Ahumada and Peterson [1], which assigns a slack to each term in the block DCT domain. In the context of watermark shaping, in the sequel, we will use perceptual models and PSFs interchangeably. Other proposed PSFs for images assign slacks to frequencies in the Fourier and wavelet domains [19], or to pixels in the spatial domain [14]. Similarly, Kutter and Winkler proposed a model based on a local isotropic measure and a masking function [10], while Lambrecht and Farrell's model is based on Gabor filters [13]. Our proposed GPS scheme is largely based on genetic modification of Watson's PSF. Consider a 2D image signal x in the spatial domain. It is transformed to a matrix X by applying the 8×8 block DCT. According to Watson's perceptual model, the visibility threshold T(i, j) for every (i, j) DCT coefficient of an 8 × 8 block is defined as:

$$\log T(i,j) = \log \frac{T_{\min}\left(f_{i,0}^{2} + f_{0,j}^{2}\right)^{2}}{\left(f_{i,0}^{2} + f_{0,j}^{2}\right)^{2} - 4(1-r)\,f_{i,0}^{2} f_{0,j}^{2}} + u\left(\log\left(f_{i,0}^{2} + f_{0,j}^{2}\right) - \log f_{\min}^{2}\right)^{2} \qquad(6.1)$$
where f_{i,0} and f_{0,j} denote the vertical and horizontal frequencies (in cycles/degree) of the DCT basis functions, respectively, and T_min is the minimum value of T(i, j), attained at f_min. In order to include luminance sensitivity, this threshold is corrected according to the average luminance of each block:
$$T'(i,j) = T(i,j)\,\frac{X_{0,0}}{\bar{X}_{0,0}} \qquad(6.2)$$

where X_{0,0} is the DC coefficient of each block and X̄_{0,0} represents the average screen luminance (1024 for an 8-bit image). If we include the effect of contrast masking as well, the following relation is formulated:

$$T^{*}(i,j) = \max\left\{ T'(i,j),\; \left|T'(i,j)\right|^{1-\omega} \left|X(i,j)\right|^{\omega} \right\} \qquad(6.3)$$

where X(i, j) is the AC DCT coefficient of each block and ω has been empirically set to a value of 0.7. These allowed alterations represent the perceptual mask, denoted by α. Though fair enough to give us imperceptible alterations, Watson's perceptual model does not provide an optimum PSF, since many of its constants are set empirically. Furthermore, some effects, like spatial masking in the frequency domain, are ignored. Based on this, an effective system for developing appropriate PSFs is presented in this chapter: a system which caters for fine details in human perception such that both robustness and imperceptibility of the watermarking system are enhanced.
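To make the shaping concrete, the following sketch computes Watson-style slacks for a single 8 × 8 block per Eqs. (6.1)-(6.3). It is a minimal illustration rather than the exact model of [12, 18]: the constants T_min, f_min, r, u and the pixels-per-degree value PPD are typical published Watson-model values supplied here as assumptions, since the chapter leaves them unspecified.

```python
import numpy as np
from scipy.fft import dctn

# Assumed Watson-model constants (typical published values).
T_MIN, F_MIN, R, U = 1.1548, 3.68, 0.7, 1.728
OMEGA = 0.7          # contrast-masking exponent (given in the chapter)
X_DC_MEAN = 1024.0   # average screen luminance for an 8-bit image (chapter)
PPD = 32.0           # assumed display resolution, pixels per degree

def frequency_thresholds():
    """Frequency-sensitivity thresholds T(i, j) of Eq. (6.1)."""
    f = np.arange(8) * PPD / 16.0           # basis-function frequencies (cyc/deg)
    fi2, fj2 = (f ** 2)[:, None], (f ** 2)[None, :]
    s = fi2 + fj2
    s[0, 0] = F_MIN ** 2                    # avoid log(0); DC is luminance-driven
    log_t = (np.log10(T_MIN * s ** 2 / (s ** 2 - 4 * (1 - R) * fi2 * fj2))
             + U * (np.log10(s) - np.log10(F_MIN ** 2)) ** 2)
    return 10.0 ** log_t

def watson_slack(block):
    """Perceptual slacks alpha(i, j) for one 8x8 spatial-domain block."""
    X = dctn(block.astype(float), norm='ortho')   # X[0, 0] = 8 * mean luminance
    t_lum = frequency_thresholds() * X[0, 0] / X_DC_MEAN          # Eq. (6.2)
    return np.maximum(t_lum,                                       # Eq. (6.3)
                      np.abs(t_lum) ** (1 - OMEGA) * np.abs(X) ** OMEGA)

print(watson_slack(np.full((8, 8), 128.0)).round(2))  # flat mid-gray block
```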
6.2.2 Attacks and Their Countermeasures
Watermark attacks and their counteractions are complex, and still a topic which needs much attention from the research community. A watermarked medium can be attacked in a variety of distinct manners. Modeling all possible attacks that preserve the perceptual requirements is fundamentally difficult. Normally, attacks comprising a specific set of distortions are associated with each application. The types and levels of robustness that might be required for a particular watermarking application are discussed in detail by Cox et al. [3]. Several strategies have been adopted for making a watermarking system reliable against the expected distortions. A few examples include spread spectrum modulation, redundant embedding, selection of perceptually significant coefficients, and inverting the distortion in the detection phase. Attacks can be classified into four categories [15]:
1. removal and interference attacks,
2. geometrical attacks,
3. cryptographic attacks, and
4. protocol attacks.
Intentional or illegitimate attacks are more difficult to counteract than common signal processing and transmission distortions. Therefore, in thwarting an attack, many assumptions are made about the attacker and the tools he might use. For example, does the attacker know the watermarking algorithm, does he possess a detector that he can modify, what software tools are available to him, etc. Once the watermarking system is specified publicly, an attacker usually has more freedom than the embedder. This is because the attacker is free to develop extra and more intricate attacks, while the watermarker can no longer change the watermarked work.
6.2.3 Additive Spread Spectrum Watermarking Approach
Information encoding and embedding
Let us consider the 8×8 block DCT transformed image X, as described earlier in section 6.2.1. Let W be the watermark, viewed as a 2-D signal, which will be added to X in order to produce the watermarked DCT image Y. Let M be a message, which is mapped to a codeword vector (Figure 6.1). In the decoding stage, we have to retrieve this message M. The codeword vector is then expanded to generate b, with each element b_i repeated over a selected set of DCT coefficients in order to enhance robustness. The resulting signal b is further direct-sequence spread-spectrum modulated by employing a 2-D pseudo-random sequence (PRS) denoted by S. The PRS behaves as a spreading sequence taking values ±1 and has zero mean. In order to shape the resultant signal according to the cover image, it is multiplied with the perceptual mask α (obtained by applying the PSF to the cover image in the DCT domain) to produce the watermark W as follows:

$$\mathbf{W} = \alpha \cdot \mathbf{S} \cdot \mathbf{b} \qquad(6.4)$$
Addition of this watermark to the original image, element by element, thus performs the embedding:

$$\mathbf{Y} = \mathbf{X} + \mathbf{W} \qquad(6.5)$$

where the watermark W is our desired signal, while the cover work X acts as additive noise.
Information decoding
The additive spread spectrum watermarking approach [4] is based on the assumption that the pdf of the original coefficients remains the same even after embedding. The DCT coefficients are modeled using a zero-mean generalized Gaussian pdf, given as:

$$f_x(x) = A\, e^{-|\beta x|^{c}} \qquad(6.6)$$
Fig. 6.1. Additive spread spectrum based watermark embedding
where both A and β are expressed as functions of the unknown parameters c and the standard deviation σ:

$$A = \frac{\beta c}{2\,\Gamma\!\left(\frac{1}{c}\right)}, \qquad \beta = \frac{1}{\sigma}\left(\frac{\Gamma\!\left(\frac{3}{c}\right)}{\Gamma\!\left(\frac{1}{c}\right)}\right)^{\frac{1}{2}} \qquad(6.7)$$

Γ represents the gamma function. The unknown parameters c and σ are estimated from the received image in the decoding stage.
Generalized Gaussian based Maximum Likelihood decoder: Since detection of the watermark is not our primary concern, the decoder assumes that the received image is watermarked. Let us suppose that there are L possible messages. In the verification process, our objective is to obtain an estimate of the hidden message M from the marked image (Figure 6.2). In cases where there is little or no knowledge of the prior probabilities of the classes/hypotheses, priors are selected that make the classes equally likely. Hence, if we assume that the messages are equiprobable, then it is judicious to consider the maximum likelihood test. The estimated message should satisfy:
$$\ln \frac{f\!\left(\mathbf{Y} \mid \mathbf{b}_l\right)}{f\!\left(\mathbf{Y} \mid \mathbf{b}_m\right)} > 0, \qquad \forall\, m \ne l. \qquad(6.8)$$
The sequences generated by considering each (i, j) DCT coefficient of all 8 × 8 blocks are assumed to behave like a generalized Gaussian distribution and to be statistically independent. Let us denote 2-D indices by k and the 2-D sequences by Q_{i,j}[k], which are obtained as follows:

$$Q_{l_1, l_2}[k_1, k_2] = X[8k_1 + l_1,\; 8k_2 + l_2], \qquad l_1, l_2 \in \{0, \ldots, 7\}. \qquad(6.9)$$
The ML decoding is equivalent to finding the index l ∈ {1, 2, ..., L} that obeys

$$\sum_{k} \frac{\left|Y[k] - W_m[k]\right|^{c[k]} - \left|Y[k] - W_l[k]\right|^{c[k]}}{\sigma[k]^{c[k]}} > 0, \qquad \forall\, m \ne l. \qquad(6.10)$$
In this case, it is to be noted that each bit can be +1 or −1 only, i.e., a bipolar signal. If the sample vector G_i denotes all the DCT coefficients of different 8 × 8 blocks that correspond to a single bit i, then the sufficient statistic of this sample vector is given by:

$$r_i = \sum_{k \in G_i} \frac{\left|Y[k] + \alpha[k]S[k]\right|^{c[k]} - \left|Y[k] - \alpha[k]S[k]\right|^{c[k]}}{\sigma[k]^{c[k]}}. \qquad(6.11)$$
For a bipolar signal, i.e., b ∈ {−1, +1}, the hidden bits are then estimated using '0' as a threshold:

$$\hat{b}_i = \mathrm{sgn}(r_i), \qquad \forall\, i \in \{1, 2, \ldots, N\} \qquad(6.12)$$
where sgn(·) is the sign function. Details of this additive spread spectrum watermarking approach are well documented in [4].
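The following sketch ties Eqs. (6.4), (6.5), (6.11) and (6.12) together for a bipolar message. It is a simplified toy, not the decoder of [4]: the coefficients are drawn from a Laplacian stand-in for real DCT data, the perceptual mask is flat, and the generalized Gaussian parameters c and σ are assumed known rather than estimated from the received image.

```python
import numpy as np

rng = np.random.default_rng(7)

def embed(X, alpha, bits, spread):
    """Additive spread-spectrum embedding, Eqs. (6.4)-(6.5).
    X      : selected DCT coefficients, one row per message bit
    alpha  : perceptual slacks, same shape as X
    bits   : bipolar message b, entries in {-1, +1}
    spread : pseudo-random +/-1 spreading sequence S, same shape as X"""
    W = alpha * spread * bits[:, None]     # W = alpha . S . b   (6.4)
    return X + W                           # Y = X + W           (6.5)

def decode(Y, alpha, spread, c, sigma):
    """Blind ML decoding of a bipolar message, Eqs. (6.11)-(6.12)."""
    r = np.sum((np.abs(Y + alpha * spread) ** c
                - np.abs(Y - alpha * spread) ** c) / sigma ** c, axis=1)
    return np.sign(r)                      # b_hat = sgn(r_i)    (6.12)

# toy run: 16 bits, each spread over 64 coefficients
N, G = 16, 64
X = rng.laplace(scale=10.0, size=(N, G))   # Laplacian stand-in for DCT data
alpha = np.full((N, G), 2.0)               # flat stand-in perceptual mask
S = rng.choice([-1.0, 1.0], size=(N, G))
b = rng.choice([-1.0, 1.0], size=N)
b_hat = decode(embed(X, alpha, b, S), alpha, S, c=1.0, sigma=10.0)
print('BCR =', np.mean(b_hat == b))
```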
6.2.4 Basics of Genetic Programming
Evolutionary computational methods are inspired by biological evolution, which is based on enhancement through proper selection and breeding. Genetic Programming is one such method, which imitates biological evolution and generational refinement. It is mostly used in optimization problems, using a selection criterion to measure and compare candidate solutions in a stepwise enhancement process. In GP, candidate solutions are represented using a data structure such as a tree. Multiple potential solutions for a specific problem are encoded as trees to form a population of functions (programs), as shown in Figure 6.3. Initially, a random population of such candidate solutions is created. Every candidate solution is evaluated and assigned a numerical rating, or fitness, using a predefined, application-dependent fitness function. The survival of the fittest is implemented by retaining the best individuals in a generation. The rest are deleted and replaced by the offspring of the best individuals. Offspring are created by applying genetic operators (crossover, mutation and replication) to the best individuals. The retained ones and the offspring make up a new generation. The whole process is repeated for subsequent generations, with the scoring and selection procedure in place. Every new generation, on average, has a slightly higher score than the previous one. The process is stopped when any of the stopping criteria is met, which may include the number of generations, a fitness value or a time limit. In this way, the solution space is refined generation by generation and thus converges to an optimal/near-optimal solution. Figure 6.4 illustrates this GP search mechanism.
Fig. 6.2. Additive spread spectrum based watermark extraction
Fig. 6.3. An example attack-resistant GPSF represented by a tree data structure
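A schematic sketch of this generational loop is given below. The tuple-based tree representation, the placeholder fitness and the mutation-only breeding are simplifications for brevity; the GPS scheme also uses crossover and replication, and its fitness is the multi-objective measure of Sect. 6.2.5.

```python
import random

# Schematic GP loop in the spirit of Fig. 6.4; all names are illustrative.
FUNCS = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}
TERMS = ['x', 1.0, -1.0]

def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    return (random.choice(list(FUNCS)), random_tree(depth - 1),
            random_tree(depth - 1))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == 'x' else tree

def fitness(tree):
    # placeholder objective: approximate f(x) = x^2 + 1 on a few samples
    return -sum((evaluate(tree, x) - (x * x + 1)) ** 2 for x in (-1, 0, 1, 2))

def mutate(tree):
    # occasionally replace the individual with a fresh random subtree
    return random_tree(2) if random.random() < 0.3 else tree

population = [random_tree() for _ in range(50)]
for generation in range(30):               # stop on a generation limit
    population.sort(key=fitness, reverse=True)
    elite = population[:10]                # survival of the fittest
    population = elite + [mutate(random.choice(elite)) for _ in range(40)]
print('best fitness:', max(map(fitness, population)))
```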
Fig. 6.4. GP search mechanism
6.2.5 Robustness and Imperceptibility Measures
To compare the performance of watermarking systems implemented in different feature domains, a set of general and objective measures is defined. In
this regard, robustness is generally measured in terms of the Bit Correct Ratio (BCR) [2], the bit error rate and the false alarm probability, whereas the imperceptibility of a watermark is measured in terms of the Structural Similarity Index Measure (SSIM) [17], the weighted Peak Signal to Noise Ratio (wPSNR) [14], and the Watermark to Document Ratio (WDR) [9]. We use the Mean Squared Strength (MSS) as an objective measure to depict watermark power. It is used to represent the estimated robustness of individuals in a GP population and is given by:

$$\mathrm{MSS} = \frac{1}{N_b N_c} \sum_{x_1=1}^{N_b} \sum_{x_2=1}^{N_c} \alpha(x_1, x_2)^2 \qquad(6.13)$$

where N_b is the total number of 8 × 8 blocks in the cover image and N_c is the number of selected DCT coefficients in a block; x_1 and x_2 are the respective indices. Based on the above set of objective measures for watermark robustness and imperceptibility, a fitness function is defined for the performance evaluation of each individual of a GP population. The fitness evaluation is based on how good the imperceptibility is, as well as how high the BCR value is. The numerical rating of the fitness function is used to rank each individual of the population. The greater the fitness, the better the individual has performed:

$$\mathrm{Fitness} = W_1 \cdot \mathrm{SSIM} + W_2 \cdot \mathrm{BCR} \qquad(6.14)$$

where W_1 and W_2 represent the corresponding weights of the different measures used in the fitness. SSIM denotes the structural similarity index measure of the marked image at a certain level of estimated robustness. The work by Wang et al. [17] shows that a measure of change in structural information can provide a good approximation of perceived distortion. The SSIM between two images x and y is given as:

$$\mathrm{SSIM}(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma} \qquad(6.15)$$

where l(x, y) is the luminance comparison function, c(x, y) is the contrast comparison function, and s(x, y) is the structure comparison function. α, β and γ are parameters used to adjust the relative importance of the three components, subject to the constraints α > 0, β > 0 and γ > 0. Further details on SSIM are provided in [17]. BCR is the second objective of the fitness function and represents the Bit Correct Ratio after the attack is carried out on the watermarked image. It is defined as:

$$\mathrm{BCR}(M, M') = \frac{1}{L_m} \sum_{i=1}^{L_m} \overline{m_i \oplus m'_i} \qquad(6.16)$$

where M is the original message, while M' is the decoded message. L_m is the length of the message, ⊕ represents the exclusive-OR operation, and the overbar denotes logical complement, so that correctly decoded bits count as 1.
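These measures translate directly into code. The sketch below assumes the SSIM value is computed elsewhere (e.g., by an image-quality library) and shows only MSS, BCR and the combined fitness of Eq. (6.14).

```python
import numpy as np

def mss(alpha):
    """Mean Squared Strength of a perceptual mask, Eq. (6.13)."""
    return float(np.mean(np.asarray(alpha) ** 2))

def bcr(message, decoded):
    """Bit Correct Ratio, Eq. (6.16): the fraction of correctly decoded
    bits, i.e. the mean of the complemented XOR of the two bit strings."""
    message, decoded = np.asarray(message), np.asarray(decoded)
    return float(np.mean(message == decoded))

def fitness(ssim_value, bcr_value, w1=1.0, w2=1.0):
    """Plain multi-objective fitness of Eq. (6.14)."""
    return w1 * ssim_value + w2 * bcr_value

print(fitness(0.98, bcr([1, 0, 1, 1], [1, 0, 0, 1])))  # 0.98 + 0.75
```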
6.3 Attack-Resistant Perceptual Shaping
This section presents a detailed discussion of the attack-resistant GPS scheme. Its basic architecture for developing GPSFs is shown in Figure 6.5, where five modules work in a repeating manner. The overall working of the basic architecture is as follows.
Fig. 6.5. Basic architecture of the GPS scheme
The GP module produces a population of GPSFs. Each GPSF is presented to the perceptual shaping module, where it is applied to the cover image in the DCT domain and generates a perceptual mask. In the watermarking stage, the watermark is shaped using this perceptual mask. The imperceptibility of the watermark is calculated in terms of SSIM before presenting it to the attack module. Next, the conceivable attack is performed on the watermarked image in the attack module. In the decoding module, the embedded message is retrieved from the corrupted image. Watermark robustness in terms of BCR is calculated, which is then used as a scoring criterion in the GP module. In this way, the GP module evaluates the performance of its several generated GPSFs.
6.3.1 Evolution of Perceptual Shaping Function
Figure 6.6 demonstrates the detailed block diagram of the GPS scheme, where the different modules of Figure 6.5 are elaborated. These modules are described as follows:
Fig. 6.6. Detailed flow chart of the GPS scheme
1. The GP module
The GP settings for evolving GPSFs are as follows:
GP Function Set: The function set in GP is a collection of functions available to the GP system. In this regard, there is a wide range of choices to be included in the GP function set. For simplicity, we choose simple functions, including four binary floating-point arithmetic operators (+, −, ∗, and protected division), LOG, EXP, SIN and COS.
GP Terminals: To develop the initial population of GPSFs, we consider the GPSF as a watermark shaping function and the characteristics of the HVS as independent variables. By doing this, we let GP exploit the search space representing different possible forms of dependencies of the watermark shaping function on the characteristics of the HVS. Therefore, the current value of the Watson perceptual model (WPM) based perceptual mask, and the DC and AC DCT coefficients of the 8 × 8 block, are provided as variable terminals. Random constants in the range [−1, 1] are used as constant terminals.
Fitness Function: The fitness function used in our GP simulation is represented by equation 6.14, which is designed to provide feedback during the
GP simulation about how well an individual is performing in terms of watermark robustness and imperceptibility. This evaluation is based on the value of the SSIM measure at a certain level of watermark power, as well as how high the BCR value is.
Termination Criterion: The GP simulation is stopped when one of the following conditions is encountered:
1. The fitness score exceeds 1.95 with MSS > 20.
2. The number of generations reaches the predefined maximum number of generations.
2. Perceptual shaping module
In order to embed a high-energy watermark at a low cost of resultant distortion in the cover image, a PSF structures the watermark. In doing so, it exploits the characteristics of the HVS. The perceptual shaping module in the GPS scheme receives the individual GPSF provided by the GP module as an input. Each GPSF is operated on the cover image in the DCT domain. Corresponding to the selected DCT coefficient of a block, the GPSF returns a value. The magnitude of this value represents the perceptual strength of the alteration made in that coefficient. We represent the functional dependency of the perceptual shaping function on the characteristics of the HVS as follows:

$$\alpha(k_1, k_2) = f\left(T(i,j),\; X_{0,0},\; X(i,j)\right) \qquad(6.17)$$
where T(i, j) is the visibility threshold representing the frequency sensitivity of the HVS, X_{0,0} is the DC DCT coefficient representing luminance sensitivity, and X(i, j) is the AC DCT coefficient of the current block representing the contrast masking characteristics of the HVS. We obtain the perceptual mask for the current cover image by operating the GPSF on all of the selected DCT coefficients. It is to be noted that instead of using all 64 coefficients inside an 8 × 8 block, a predefined zigzag-scanned set of DCT coefficients can also be used. If GP is allowed to incorporate coefficient selection for watermark embedding, then it can further refine the predefined selection. The product of the spread-spectrum sequence and the expanded message bits is multiplied with this perceptual mask to obtain the 2D watermark signal W (see Eq. (6.4)). If α_G is the perceptual mask obtained from the GPSF, then equation 6.4 is modified as follows:

$$\mathbf{W} = \alpha_G \cdot \mathbf{S} \cdot \mathbf{b} \qquad(6.18)$$

where α_G incorporates the dependencies from Watson's PSF, the AC and DC coefficients, and the intended attack. If Λ denotes the information about the intended attack, then equation 6.17 is modified to include the resultant changes in the distribution of the DCT coefficients caused by the attack as follows:
$$\alpha_G(k_1, k_2) = f\left(T(i,j),\; X_{0,0},\; X(i,j),\; \Lambda\right) \qquad(6.19)$$
If i, j represent the index values of the current AC coefficient corresponding to the selected bandpass DCT coefficients, then equation 6.19 is modified as follows:

$$\alpha_G(k_1, k_2) = f\left(T(i,j),\; X_{0,0},\; X(i,j),\; \Lambda,\; i,\; j\right) \qquad(6.20)$$

3. Watermarking module
The watermarking module implements the spread spectrum based watermarking technique. This watermarking technique is oblivious and embeds the message in the low and mid frequency coefficients of the 8 × 8 DCT blocks of a cover image. It performs the statistical modeling of the DCT coefficients using the generalized Gaussian distribution, as explained in section 6.2.3. Besides this technique, any other watermarking approach that makes use of a perceptual mask before embedding can also be utilized. Additive spread spectrum based watermarking approaches, such as those presented in the literature [4], are easy to implement. The watermark, W, is added coefficient by coefficient to the original image in the DCT domain using equation 6.5. The watermarking module provides the imperceptibility in terms of SSIM after the watermark is embedded.
4. Attack module
The attack module implements different types of attacks on the marked image before decoding the embedded message. For instance, Gaussian noise is added before decoding to evolve a Gaussian attack-resistant GPSF. Hence, different GP simulations are carried out for evolving different attack-resistant GPSFs.
5. Decoding module
After an attack has been carried out in the attack module, the corrupted image is presented to the decoding module. It performs decoding of the embedded message as discussed in section 6.2.3. The same GPSF as used in the embedding stage is required to obtain the perceptual mask for the received image. The perceptual mask is then used to compute the required sufficient statistics for the Maximum Likelihood based decoder, in order to extract the hidden message blindly.
6.3.2 Bonus-Fitness Based Evolution
In the decoding stage of the GPS scheme, both the robustness and the imperceptibility requirements of a watermark are implemented through a multi-objective fitness function [6, 7]. One way to perform this is to use equation 6.14. However, the drawback of this type of fitness function is that, instead of searching
Fig. 6.7. GP search beam indicating the idea of bonus fitness in scoring individuals
for a superior and image independent GPSF, main effort of the GP search is spent on searching a GPSF that results in high BCR value. This in fact belittles the optimization of robustness versus imperceptibility trade off.The GPSF evolved using this type of strategy is not image adaptive and might have very poor performance for attacks other than the intended attacks. This problem is solved by using the idea of bonus fitness as used in [7]. As shown in Figure 6.7, those GPSF that make a better tradeoff between robustness and imperceptibility, are given bonus fitness. The bonus fitness is the amount of resistance against the intended attack in terms of BCR. Thus equation 6.14 is modified as follows: W1 ∗ SSIM + W2 ∗ BCR if SSIM ≥ T1 and M SS ≥ T2 F itness = W1 ∗ SSIM otherwise (6.21) where T1 and T2 are lower bounds of SSIM and M SS respectively. In this way, the second driving force is separated from the first driving force through the concept of bonus fitness. Otherwise, the GP simulation will usually tend to focus on the second requirement and will altogether neglect the imperceptibility requirement and vice versa. Figure 6.7 elaborates this idea of bonus fitness. In the GP search, those GPSF that make a good tradeoff are represented with star symbol and thus conceptually separated from the main GP search beam. The GPSFs would then compete in terms of the second fitness. The thresholds T1 and T2 are of prime importance for the selection of individuals regarding the trade off. The smaller these thresholds for fulfilling the fitness criteria, the larger is the diversity among the selected GPSFs. 6.3.3
6.3.3 Generalization of the Best Evolved GPSF
At the end of the GP simulation, the best evolved GPSF is saved in order to assess its performance. Using this evolved expression, test images are watermarked, and the robustness and imperceptibility measures are computed. In order to assess its watermark shaping capability, the evolved GPSF is then compared with Watson's PSF.
Fig. 6.8. Watermark strength distribution corresponding to the attack-resistant GPSF
6.4 Experimental Results
6.4.1 Watermark Shaping Performance
Figure 6.8 demonstrates the ability of the GPS scheme to select suitable embedding strengths in the desired regions. These strengths are produced by the GPSF developed against the Gaussian attack with σ = 50 using the Lena image. It is observed that, depending upon the current AC and DC coefficients, it provides suitable imperceptible alterations according to the spatial content of that block. Figure 6.9 shows the corresponding strength of embedding using Watson's PSF. By comparing Figures 6.8 and 6.9, it is noticeable that the evolved Gaussian attack-resistant GPSF is able to select appropriate DCT positions for embedding along with suitable strengths. This fact indicates that the developed GPSF has learnt the content of the cover image and is adaptive with respect to the attack. The resultant 2-D watermarks in terms of amplitude for the GPSF and Watson's PSF are shown in Figures 6.10 and 6.11, respectively. The Gaussian attack-resistant GPSF is given below:

$$\alpha_G(k_1, k_2) = A \cdot B^{\min\left(i^2,\;(0.2077 \le X_{0,0})\right)} \qquad(6.22)$$

where
A = log(j) ∗ (j + X_{0,0}),
B = sin(j ∗ X_{0,0}) > C,
C = D ∗ j ∗ α(k_1, k_2), and
D = cos(exp(cos((i + 0.29242) − max(j ∗ X_{0,0}, X_{0,0})))).
Here comparison sub-expressions evaluate to 1 when true and 0 when false.
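Because the printed layout of Eq. (6.22) makes its exact grouping hard to recover, the sketch below should be read only as one plausible interpretation of how such an evolved tree is executed; it is not the authors' function.

```python
import numpy as np

def evolved_gpsf(i, j, x00, alpha_watson):
    """One plausible reading of the evolved tree in Eq. (6.22).
    Comparison sub-expressions evaluate to 0.0 or 1.0; j >= 1 is
    assumed so that log(j) is defined."""
    D = np.cos(np.exp(np.cos((i + 0.29242) - max(j * x00, x00))))
    C = D * j * alpha_watson
    B = float(np.sin(j * x00) > C)
    A = np.log(j) * (j + x00)
    return A * B ** min(i ** 2, float(0.2077 <= x00))

# example: slack for coefficient (i, j) = (2, 3) of a block with DC = 900
print(evolved_gpsf(2, 3, 900.0, alpha_watson=1.5))
```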
Fig. 6.9. Watermark strength distribution corresponding to Watson’s PSF
Fig. 6.10. 2-D watermark signal obtained from the attack-resistant GPSF
6.4.2 Fidelity of the Resultant Image
The difference image, obtained by subtracting the original image (Figure 6.12) from the watermarked image (Figure 6.13) in the spatial domain, is shown in Figure 6.14. The pixel values of the difference image are amplified ten times for illustration purposes. The evolved GPSF is able to learn the spatial distribution of the cover image. Though the embedding was performed in the DCT domain, it is visible that most of the strong embedding is in highly textured areas.
Fig. 6.11. 2-D watermark signal obtained from Watson's PSF
Fig. 6.12. Original Couple Image
Fig. 6.13. Watermarked Couple Image using Attack-resistant GPSF
Fig. 6.14. Scaled difference image
Fig. 6.15. BCR comparison of Wiener attack-resistant GPSF and Watson’s PSF for five standard images
Fig. 6.16. BCR comparison of Gaussian attack resistant GPSF and Watson’s PSF
6.4.3 Wiener Attack-Resistant GPSF
Figure 6.15 shows a comparison of Watson's PSF and the Wiener attack-resistant GPSF in terms of BCR for five different standard images. In order to achieve equal distortion of the resultant watermarked image corresponding to both Watson's PSF and the GPSF, some equalization in terms of imperceptibility is required. By doing so, we are able to compare the robustness performance of both schemes. Therefore, the SSIM values for each image are kept approximately equal by multiplying the corresponding perceptual masks with a scaling factor. The results in Figure 6.15 show that the Wiener attack-resistant GPSF, evolved for the Lena image alone, is image independent. In terms of BCR performance, the Wiener attack-resistant GPSF is superior to Watson's PSF for almost all of the test images.
Fig. 6.17. BCR versus standard deviation performance of both perceptual shaping functions
6.4.4 Gaussian Attack-Resistant GPSF
The performance of the GPSF developed for the Gaussian noise attack with σ = 50 is presented in Figure 6.16. The imperceptibility performance of the evolved GPSF is similar to that of Watson's PSF. On the other hand, it outperforms Watson's PSF in terms of BCR and, hence, robustness. Figure 6.17 demonstrates the BCR versus standard deviation performance of both perceptual shaping functions. It can be observed that the Gaussian noise attack-resistant GPSF has high BCR values corresponding to different standard deviations. The watermarked image after being attacked by the Gaussian noise attack is presented in Figure 6.18.
Fig. 6.18. Watermarked image attacked by Gaussian noise (σ = 50)
Fig. 6.19. BCR comparison of both perceptual shaping functions against the Median filtering attack
6.4.5 GPSF Developed for Median Filtering Attack
The evolved median filtering attack-resistant GPSF is compared with Watson's PSF in Figure 6.19. Again, keeping the same level of imperceptibility, the best evolved GPSF demonstrates a substantial improvement in terms of robustness as compared to Watson's PSF. This is because the evolved expression has learnt the optimum tradeoff and spreads the watermark energy in areas where the distortion effect is smaller.
Fig. 6.20. Performance of the evolved GPSF with respect to complexity
Fig. 6.21. BCR against JPEG compression with Quality Factor = 70
Fig. 6.22. BCR comparison in terms of Quality Factor for both perceptual shaping functions
The accuracy versus complexity curve of a GP run, as displayed in Figure 6.20, characterizes the performance of the evolved GPSF with respect to its complexity. As the fitness of the best GPSF of a generation increases, its total number of nodes as well as its average tree depth increase. This means that the GPSF is able to further exploit the hidden functional dependencies pertaining to the HVS, but at the cost of higher complexity.
6.4.6 GPSF Developed for JPEG Compression Attack
The performance of the GPSF evolved with respect to the JPEG compression attack is shown in Figures 6.21 and 6.22. The performance of the JPEG attack-resistant GPSF is better in terms of BCR compared with Watson's PSF. This, indeed, justifies the use of such intelligent perceptual shaping techniques in order to counter attacks.
6.5 Conclusion
In this chapter we have considered the GP-based perceptual shaping of a digital watermark in accordance with the cover image and the anticipated attack. The GP-developed PSFs are image and attack adaptive. A considerable improvement in resistance against the intended attack is achieved by letting the GP search exploit the attack information. Achieving a superior tradeoff and high resistance against an attack is implemented through the concept of bonus fitness in the multi-objective fitness function. Once the best GPSF has been developed, applying the evolved GPSF for watermark shaping is straightforward and easy to implement. This approach is well suited to the complex and dynamic applications of watermarking, e.g., protecting data sent through context-aware medical networks. It can be used in any watermarking technique that makes use of perceptual shaping at the embedding stage. In order to resist the intelligent attempts of an adversary to break watermarking schemes, the use of intelligent coding, embedding, and decoding techniques [8] seems very promising.
References
1. Ahumada, A.J., Peterson, H.A.: Luminance-model-based DCT quantization for color image compression. In: Proc. SPIE on Human Vision, Visual Processing, and Digital Display III, vol. 1666, pp. 365–374 (1992)
2. Barni, M., Bartolini, F.: Watermarking Systems Engineering: Enabling Digital Assets Security and Other Applications. Marcel Dekker, Inc., New York (2004)
3. Cox, I.J., Miller, M.L., Bloom, J.A.: Digital Watermarking: Principles & Practice. Morgan Kaufman, Los Altos (2001)
4. Hernandez, J.R., Amado, M., Perez-Gonzalez, F.: DCT-domain watermarking techniques for still images: Detector performance analysis and a new structure. IEEE Trans. on Image Processing 9, 55–68 (2000)
5. Khan, A., Mirza, A.M., Majid, A.: Optimizing perceptual shaping of a digital watermark using genetic programming. Iranian Journal of Electrical and Computer Engineering 3, 144–150 (2004)
6. Khan, A.: Intelligent Perceptual Shaping of a Digital Watermark, PhD thesis, Faculty of Computer Science, GIK Institute, Pakistan (2006)
7. Khan, A., Mirza, A.M.: Genetic perceptual shaping: Utilizing cover image and conceivable attack information using genetic programming. Information Fusion 8, 354–365 (2007)
8. Khan, A., Tahir, A.F., Majid, A., Choi, T.S.: Machine learning based adaptive watermark decoding in view of an anticipated attack. Pattern Recognition 41, 2594–2610 (2008)
9. Koprulu, F.I.: Application of Low-Density Parity-Check Codes to Watermark Channels, M.S. Thesis, Electrical and Electronics, Bogazici University, Turkey (2001)
10. Kutter, M., Winkler, S.: A vision-based masking model for spread-spectrum image watermarking. IEEE Trans. Image Processing 11, 16–25 (2002)
11. Pan, J.S., Huang, H.C., Jain, L.C. (eds.): Intelligent Watermarking Techniques. World Scientific Publishing Company, Singapore (2004)
12. Solomon, J.A., Watson, A.B., Ahumada, A.J.: Visibility of DCT basis functions: Effects of contrast masking. In: Proc. Data Compression Conf., Snowbird, UT, pp. 361–370 (1994)
13. van den Branden Lambrecht, C.J., Farrell, J.E.: Perceptual quality metric for digitally coded color images. In: Proc. EUSIPCO, pp. 1175–1178 (1996)
14. Voloshynovskiy, S., Herrigel, A., Baumgaertner, N., Pun, T.: A stochastic approach to content adaptive digital image watermarking. In: Pfitzmann, A. (ed.) IH 1999. LNCS, vol. 1768, pp. 211–236. Springer, Heidelberg (2000)
15. Voloshynovskiy, S., Pereira, S., Iquise, V., Pun, T.: Attack modeling: Towards a second generation watermarking benchmark. Signal Processing 81, 1177–1214 (2001)
16. Shieh, C.S., Huang, H.C., Wang, F.H., Pan, J.S.: Genetic watermarking based on transform-domain techniques. Pattern Recognition 37, 555–565 (2004)
17. Wang, Z., Bovik, A.C., Sheikh, H.R.: Image quality assessment: From error visibility to structural similarity. IEEE Trans. on Image Processing 13, 600–612 (2004)
18. Watson, A.B.: Visual optimization of DCT quantization matrices for individual images. In: Proc. AIAA Computing in Aerospace 9, San Diego, CA, pp. 286–291 (1993)
19. Watson, A.B., Yang, G.Y., Solomon, J.A., Villasenor, J.: Visual thresholds for wavelet quantization error. In: Human Vision and Electronic Imaging, SPIE, vol. 2657, pp. 381–392 (1996)
20. Watson, A.B., Yang, G.Y., Solomon, J.A., Villasenor, J.: Visibility of wavelet quantization noise. IEEE Trans. Image Processing 6, 1164–1175 (1997)
7 Semi-fragile Image Authentication Method for Robust to JPEG, JPEG2000 Compressed and Scaled Images
Chih-Hung Lin¹ and Wen-Shyong Hsieh²
¹ Department of Computer Science and Information Engineering, Southern Taiwan University, Tainan, Taiwan. [email protected]
² Department of Computer Science and Information Engineering, Shu-Te University, Kaohsiung, Taiwan. [email protected]
Summary. This chapter presents a secure and simple content-based digital signature method for verifying image authenticity under JPEG compression, JPEG2000 compression and scaling, based on a novel concept named the lowest authenticable difference (LAD). The whole method, which is extended from the crypto-based digital signature scheme, mainly consists of statistical analysis and signature generation/verification. The invariant features, which are generated from the relationship among image blocks in the spatial domain, are coded and signed by the sender's private key to create a digital signature for each image, regardless of the image size. The main contributions of the proposed scheme are: (1) only the spatial domain is adopted during feature generation and verification, making the domain transformation process unnecessary; (2) more non-maliciously manipulated images (JPEG, JPEG2000 compressed and scaled images) than in related studies can be authenticated by the LAD, achieving a good trade-off between fragility and robustness under practical image processing; (3) non-malicious manipulation is clearly defined to closely meet the requirements of storing images or sending them over the Internet. The related analysis and discussion are presented. The experimental results indicate that the proposed method is feasible and effective.
7.1 Introduction
Digitized media can be easily copied, and digital data can easily be altered without leaving any clues. Under these circumstances, integrity and authenticity verification has become a significant issue in the digital world. Therefore, the following conditions have to be protected [20]: (1) Content integrity
protection: the content is not allowed to be modified such that its meaning is altered; and (2) Sender's repudiation prevention: once an image sender generates the signature, he cannot subsequently deny such a signing if both the signature and the image have been verified as being authentic. Fragile authentication methods [2, 9], which can detect a single-bit change in an image, can solve the above problems either by crypto signature methods such as RSA or DSA [18], or by public watermarking methods [1, 14]. This study designs a method with the same functions as fragile methods, but at a semi-fragile level. The motivation stems from real and common applications where some image processing (e.g., lossy compression or scaling) has to be permitted during the process of media transmission and storage (hereinafter referred to as non-malicious manipulation), while other modifications from attacks should be rejected (hereafter referred to as malicious manipulation). The proposed content-based authentication method focuses on verifying the authenticity of an image manipulated by JPEG compression, JPEG2000 compression or scaling, in terms of the lowest authenticable difference (LAD). A non-maliciously manipulated image is authenticated as long as all the block feature differences (BFD) of the image are less than the LAD. The invariant image feature, which is generated by analyzing the relationship of the standard deviation between one block and its neighbors in the spatial domain using a statistical model, is coded and signed by the sender's private key to generate only one digital signature per image. The hybrid cryptosystem is adopted to enhance the security of the proposed method. The digital signature is then encrypted by the receiver's public key to generate the ciphertext, and the original image is then sent with the ciphertext to the receiver. A preliminary version of this chapter was presented in [10], and this chapter presents a further comparison with recent related studies [3, 7]. The remainder of this chapter is organized as follows. Section 7.2 describes related works and the motivation. Section 7.3 presents the proposed method in detail, focusing mainly on feature extraction, signature generation and verification, all of which are based on the LAD. Experimental results are shown and compared with those of related works in Sect. 7.4. Section 7.5 presents the analysis and discussion of the proposed method. Conclusions and future works are given in Sect. 7.6.
7.2 Related Works and Motivation Many methods for authenticating and verifying the content integrity of digital image have been presented in the past decade. Since the proposed method is signature-based, prior related works focusing on semi-fragile digital signaturebased solutions are surveyed. Figure 7.1 shows the original concept of fragile digital signatures [18, 20]. The feature is extracted from the content. Acceptable manipulations are those that only cause small changes to the feature content, while malicious attacks cause large changes. Schneider [19] and other researchers proposed to adopt some typical content signature, by
7
Semi-fragile Image Authentication Method
143
Fig. 7.1. Digital signature scheme for message authentication
making assumptions about those features for generating content signature. Schneider assumed that those features are insensitive to non-malicious manipulations, but sensitive to malicious manipulations. The features include histogram maps, edge/corner maps and moments. Lossy compression such as JPEG should be deemed a non-malicious manipulation in most applications. Lin and Chang [11] discovered a mathematical invariant relationship between two discrete cosine transform (DCT) coefficients in a block pair before and after JPEG compression, and selected it as the feature. Similarly, Lu and Liao [15] presented a structural signature solution for image authentication by identifying a stable relationship between a parent-child pair of coefficients in the wavelet domain. Sun and Chang [20] proposed a content-based digital signature method for verifying the authenticity of JPEG2000 images, in terms of a unique concept called lowest authenticable bit rates. Their method generates the feature in the wavelet domain, and mainly comprises signature generation/verification, error correction coding (ECC) and watermark embedding/extracting. Some methods directly apply crypto hashing to images (named content hash). Venkatesan et al. [21] presented a method for generating content hash for an image. However, their method also lacks the ability to locate content modifications if the authentication fails. All of the above signature-based methods have the same properties, that is, they require domain transformation (such as DCT or discrete wavelet transformation (DWT)) operations, and can only authenticate compressed images (such as JPEG or JPEG2000 compression). An increasing range of image processing options, such as blurring, sharpening, rotation and scaling, are now considered as non-malicious manipulation for different applications.
Liu et al. [13] presented a content-based semi-fragile watermarking method for image authentication. The Zernike moment of the lowpass wavelet band of the host image is chosen as the feature. Their method classifies JPEG compression, scaling and ratio as non-malicious manipulations, and jitter and cropping as malicious manipulations. Hu et al. [6] presented a semi-fragile watermarking method for image authentication which extracts image features from the low-frequency wavelet domain to generate two watermarks: one for classifying intentional content modifications, and one for indicating the modified locations. The method of [6] classifies lossy image compression (such as JPEG or JPEG2000 compression) and other image processing modifications (such as average filtering, median filtering, Gaussian noise addition, salt & pepper noise addition, sharpening and histogram equalization) as non-malicious manipulations. Feng et al. [4] presented a structure-based hierarchical image content representation method. Their method considers the influence of the pixon kernel shape on the Bayesian Structural Content Abstraction (BaSCA) model, and constructs an appropriate distance metric for the BaSCA model. The method of Feng et al. [4] considers JPEG and SPIHT compression, median filtering and random line removal from images as non-malicious manipulations. Monga et al. [16] presented an image content authentication method using salient feature points. Their method includes an iterative feature detector, and compares features from two images by developing a generalized Hausdorff distance measure. The wavelet transformation is needed during feature generation and detection. The experimental results indicate that JPEG compression, scaling (by 50%), rotation by 15°, random bending, print-and-scan, and cropping by 10% are considered as non-malicious manipulations. Chen and Lin [3] present an image authentication approach that distinguishes malicious attacks from JPEG lossy compression; their method calculates the relationships between important DCT coefficients in each pair of DCT blocks and predefined thresholds to form the authentication message. In Ishihara and Abe's [7] image authentication method, each watermark bit is duplicated and randomly embedded in the original image in the discrete wavelet domain by modifying the corresponding image coefficients through quantization. Ishihara and Abe's method is robust against JPEG compression, but sensitive to malicious attacks such as cutting and pasting. The above surveys indicate that the definitions of non-malicious manipulation differ across studies. Feature selection is application dependent [20]: different applications define non-malicious as well as malicious manipulations differently. Therefore, defining non-malicious and malicious manipulation of the image content is the first step in designing a good semi-fragile authentication system. This study defines non-malicious manipulation according to two important, definite and reasonable properties: (1) non-malicious manipulation must preserve the meaning of the image content, and (2) non-malicious manipulation must reduce the size of the original image file.
Property (1) is intuitive, but it is interpreted differently in different semi-fragile image authentication schemes; on its own, it is insufficient and sometimes ambiguous. For instance, if an image contains a little white kite flying in the foreground, along with many white clouds appearing in the background, then the white kite might disappear after blurring. Therefore, some methods consider blurring a malicious manipulation [3, 7, 11, 15, 19, 20, 21]. By contrast, other methods [4, 6] consider blurring a non-malicious manipulation. Other manipulation operations, such as adding noise, rotation, sharpening and lowpass/highpass filtering, also present ambiguous cases. Consequently, property (2) is proposed in order to avoid ambiguous cases, and to form a definite and suitable definition of non-malicious manipulation for future applications on the Internet or other communication systems. Reducing the size of the original file is required by property (2). Consider the case of Alice transmitting to Bob a digital image (assume an original file size of 10MB) photographed with a Digital Single Lens Reflex (DSLR) camera. Alice must reduce the file size before transmitting the file in order to speed up the media transmission and save communication time. Another benefit of reducing a file's size is the saving of storage space. Therefore, the above two properties are met simultaneously not only by the well-known compression manipulations (such as JPEG and JPEG2000 compression), but also by scaling down. Thus, this study clearly defines non-malicious manipulation, to facilitate applications that need to use compression and scaling-down manipulations during media transmission and storage, and ensures that other malicious manipulations are rejected. Another motivation stems from developing a simple image authentication method. Most previous related semi-fragile image authentication schemes need domain transformation procedures (such as DCT or DWT) during feature generation/verification [1, 3, 4, 6, 7, 11, 13, 14, 15, 16, 19, 20, 21]. Hence, this study presents a simple method, based on secret signatures, that generates and verifies the image feature in the spatial domain. Restated, the proposed method does not need domain transformation procedures, and therefore saves computation time.
7.3 The Proposed Semi-fragile Image Authentication Method
One property of the proposed notion of non-malicious manipulation is that it is "content preserving." That is, the distribution of luminance for one block with its neighbors must be preserved before and after non-malicious manipulations. The analysis of standard deviation is a well-known method for measuring the dispersion of data. Therefore, this work measures and analyzes the standard deviation of blocks in the same neighborhood set in order to generate the feature. Figure 7.2 shows the framework of the signing and encrypting procedure for semi-fragile content-based image authentication, and Fig. 7.3 shows the procedure of decrypting and verifying content-based authentication.
Fig. 7.2. Diagram of image feature and cryptographic ciphertext generation methods
Fig. 7.3. Diagram of image authentication method
The image feature measurement, signature generation, encryption and decryption, and image authentication are described in detail below.
7.3.1 Image Feature Measurement and Cryptographic Ciphertext Generation Methods
The relationship of the standard deviation between one block and its neighbor blocks in the spatial domain is analyzed in order to measure the image feature. Firstly, the original image, denoted as X with size w × h, is divided into 2^p × 2^q non-overlapping blocks. Therefore, the size of each block b_i is (w/2^p) × (h/2^q), 0 ≤ i < 2^p × 2^q. The r-neighborhood set of b_i is denoted as N_r(i), comprising the r − 1 traditional second-order neighbor blocks of b_i and b_i itself. Figure 7.4 shows an example of the 25-neighborhood set N_25(i). Secondly, the mean luminance of N_r(i), named m_i, is calculated as follows:

$$m_i = \frac{1}{\frac{w}{2^p} \times \frac{h}{2^q} \times r} \sum_{j=0}^{r-1} \sum_{k=0}^{\left(\frac{w}{2^p} \times \frac{h}{2^q}\right)-1} b_{i,j}(k) \qquad(7.1)$$
where b_{i,j}(k) denotes the luminance of the k-th pixel in b_{i,j}, and b_{i,j} denotes the j-th block in N_r(i), 0 ≤ j < r − 1.
Fig. 7.4. The 25-neighborhood set N_25(i) of block b_i, represented as the gray region
After calculating all m_i corresponding to b_i according to Eq. (7.1), all m_i are then adopted to train and generate a codebook C, C = {c_t | t = 0, 1, 2, ..., n − 1}, where c_t denotes a codeword and n denotes the codebook size. The index of the representative codeword corresponding to m_i can be found easily in C, and this index is recorded as the i-th element of M, where M = {M[i] | 0 ≤ i < 2^{p+q}} denotes a one-dimensional array of size 2^{p+q} (= 2^p × 2^q). Therefore, the index matrix M is adopted to record the index of the codeword corresponding to m_i, and c_{M[i]} denotes the codeword corresponding to m_i in codebook C. Thirdly, C and M are adopted to calculate the standard deviation σ_{i,j} for b_{i,j} as follows:

$$\sigma_{i,j} = \left( \frac{1}{\frac{w}{2^p} \times \frac{h}{2^q}} \sum_{k=0}^{\left(\frac{w}{2^p} \times \frac{h}{2^q}\right)-1} \left( b_{i,j}(k) - c_{M[i]} \right)^2 \right)^{\frac{1}{2}} \qquad(7.2)$$
Consequently, the maximum and the minimum values among the σ_{i,j}, named σ_{i,max} and σ_{i,min} respectively, are found. The block feature of b_i, denoted as f_i, is then measured as follows:

$$f_i = \frac{\sigma_{i,(r-1)/2} - \sigma_{i,\min}}{\sigma_{i,\max} - \sigma_{i,\min}} \times 2^e \qquad(7.3)$$
where e denotes the number of bits for encoding f_i; the f_i are then cascaded to generate the image feature F. To generate the LAD, the least authenticable image (LAI), as determined by the sender's application, should be defined. Assume that X̂ denotes the LAI, with a size of ŵ × ĥ pixels. The LAI X̂ is divided into 2^p × 2^q non-overlapping blocks, each denoted as b̂_i with size (ŵ/2^p) × (ĥ/2^q) pixels, 0 ≤ i < 2^p × 2^q. The neighborhood set of b̂_i is denoted as N̂_r(i). The standard deviation σ̂_{i,j} and the block feature f̂_i for b̂_i are calculated by Eqs. (7.2) and (7.3), respectively. The absolute block feature difference (BFD) between f_i and f̂_i, denoted by BFD(f_i, f̂_i), is given as follows:

$$\mathrm{BFD}(f_i, \hat{f}_i) = \left| f_i - \hat{f}_i \right| \qquad(7.4)$$

The maximum absolute BFD between X and X̂, named the lowest authenticable difference (LAD), is then calculated as follows:

$$\mathrm{LAD} = \max\left\{ \mathrm{BFD}(f_i, \hat{f}_i) \,\middle|\, 0 \le i < 2^p \times 2^q \right\} \qquad(7.5)$$

Finally, to enhance the security of the proposed method, a hybrid cryptosystem is adopted. The p, q, e, r, C, M, F, LAD and original file size (OFS) are signed by the image sender's private key to form the digital signature, based on traditional crypto signature schemes such as RSA and DSA [18]. The digital signature is then encrypted as a ciphertext by the receiver's public key. The original image is sent with the ciphertext to the receiver.
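A compact sketch of the feature measurement and LAD computation (Eqs. (7.1)-(7.5)) is given below. Two simplifications are assumed for brevity: a 3 × 3 (r = 9) neighborhood with border clamping instead of the 25-neighborhood, and the neighborhood mean m_i used directly in place of its vector-quantized codeword c_M[i].

```python
import numpy as np

def block_features(img, p=5, q=5, e=5):
    """Block features f_i of Eqs. (7.1)-(7.3), under the simplifications
    stated above.  Image dimensions are assumed divisible by 2^p, 2^q."""
    h, w = img.shape
    bh, bw = h // 2 ** q, w // 2 ** p
    blocks = img.reshape(2 ** q, bh, 2 ** p, bw).swapaxes(1, 2).astype(float)
    feats = np.zeros((2 ** q, 2 ** p))
    for br in range(2 ** q):
        for bc in range(2 ** p):
            rows = [max(0, br - 1), br, min(2 ** q - 1, br + 1)]
            cols = [max(0, bc - 1), bc, min(2 ** p - 1, bc + 1)]
            nbhd = [blocks[rr, cc] for rr in rows for cc in cols]
            m = np.mean(nbhd)                                     # Eq. (7.1)
            sig = [np.sqrt(np.mean((b - m) ** 2)) for b in nbhd]  # Eq. (7.2)
            centre = np.sqrt(np.mean((blocks[br, bc] - m) ** 2))
            lo, hi = min(sig), max(sig)
            feats[br, bc] = (centre - lo) / (hi - lo + 1e-12) * 2 ** e  # (7.3)
    return feats

def lad(feats_original, feats_lai):
    """Lowest authenticable difference, Eqs. (7.4)-(7.5)."""
    return np.max(np.abs(feats_original - feats_lai))

# toy usage with a random 512x512 "image"
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(512, 512))
f_orig = block_features(img)
print('LAD vs. itself:', lad(f_orig, f_orig))
```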
7.3.2 Image Authentication
Figure 7.3 indicates that one extra piece of information is needed, in addition to the image itself, to authenticate the received image, namely the received ciphertext. The received ciphertext is decrypted with the receiver's private key, yielding the digital signature. This digital signature can be verified with the sender's public key to derive p, q, e, r, C, M, F, LAD and OFS. A received image X′ with size w × h is divided into 2^p × 2^q non-overlapping blocks denoted as b′_i (0 ≤ i < 2^p × 2^q) with size (w/2^p) × (h/2^q). The neighborhood set N_r(i) is considered. Based on the procedures in Eqs. (7.2) and (7.3), the values of σ′_{i,j} are calculated as follows:

$$ \sigma'_{i,j} = \left( \frac{1}{\frac{w}{2^p} \times \frac{h}{2^q}} \sum_{k=0}^{\frac{w}{2^p} \times \frac{h}{2^q} - 1} \left( b'_{i,j}(k) - c_{M[i]} \right)^2 \right)^{\frac{1}{2}}. \qquad (7.6) $$
Therefore, the block feature f′_i is calculated as follows:

$$ f'_i = \frac{\sigma'_{i,(r-1)/2} - \sigma'_{i,\min}}{\sigma'_{i,\max} - \sigma'_{i,\min}} \times 2^e \qquad (7.7) $$
where σ′_{i,max} and σ′_{i,min} denote the maximum and minimum values, respectively, among all standard deviations σ′_{i,j}. If all BFD(f_i, f′_i) are less than or equal to the LAD, and the file size of X′ is less than the file size of X, then X′ is authentic; otherwise, X′ is unauthentic, and those N_r(i) where BFD(f_i, f′_i) > LAD are marked in the authentication result image. The security of the proposed image authentication method is discussed in Sect. 7.5.2.
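A hedged sketch of the receiver-side decision rule follows, combining Eqs. (7.4)–(7.7) with the file-size condition from Sect. 7.4. It reuses the hypothetical block_features helper from the previous sketch, and the bookkeeping of marked neighborhoods is simplified to block indices.

```python
def authenticate(received, F, lad, original_size, received_size,
                 p, q, codebook, e, r=25):
    """Receiver-side check (sketch): recompute f'_i via Eqs. (7.6)-(7.7),
    then require MBFD <= LAD and a smaller file (CR > 1)."""
    f_prime = block_features(received, p, q, codebook, e, r)
    bfd = np.abs(F - f_prime)                     # Eq. (7.4), per block
    marked = np.where(bfd > lad)[0]               # neighborhoods to flag
    authentic = marked.size == 0 and received_size < original_size
    return authentic, marked
```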
7.4 Experimental Results

Three well-known images, Lena, Pepper and Fruits, each of size 512 × 512 pixels, were adopted to measure the performance of the proposed method. Each image is divided into 32 × 32 blocks of size 16 × 16 pixels. The well-known LBG vector quantization (VQ) design algorithm, presented by Linde, Buzo and Gray in 1980 [12], was adopted to generate the codebook C. The codebook size was set to 8, and the codeword length was 8 bits. A 25-neighborhood set was adopted, and the number of bits for encoding the feature, e, was set to 5. As described in the previous section, the proposed content-based authentication scheme can verify the authenticity of JPEG-compressed, JPEG2000-compressed and scaled images in terms of the LAD. Because the LAD is determined and generated from the least authenticable image (LAI) according to human visual perception, the JPEG2000-compressed image whose peak signal-to-noise ratio (PSNR) is close to 34.0 dB was adopted as the LAI in this experiment. Therefore, in the following experiments, the LAIs for the three test images were: the JPEG2000-compressed Lena image with bit rate 0.375 bpp (bpp denotes bits per pixel; LAD and PSNR of 2.9 and 34.2, respectively), the JPEG2000-compressed Pepper image with bit rate 0.5 bpp (LAD and PSNR of 2.5 and 33.7, respectively), and the JPEG2000-compressed Fruits image with bit rate 0.5 bpp (LAD and PSNR of 2.9 and 34.7, respectively). These LAD values of the three test images are underlined in Table 7.1, and the related values (PSNR, MBFD and CR) of the considered non-malicious manipulations are printed in bold and italic in Tables 7.1–7.3. Two related quantities are adopted in the following experiments: (1) the maximum block feature difference (MBFD) between the original and manipulated images, and (2) the compression ratio (CR) [5]. They are defined as follows:

$$ \mathrm{MBFD} = \max \left\{ \mathrm{BFD}(f_i, f'_i) \,\middle|\, 0 \le i < 2^p \times 2^q \right\} $$

$$ \mathrm{CR} = \frac{\text{file size of the original image}}{\text{file size of the manipulated image}} $$

where BFD(·) is defined in Eq. (7.4). Several manipulations are adopted to demonstrate the effectiveness of the proposed scheme in the following experiments.
7.4.1 JPEG2000 Compression
Images were compressed with JPEG2000 at bit rates in the range 0.125–3.0 bpp (using the PhotoImpact software package). Table 7.1 shows the authentication results after JPEG2000 compression at different bit rates. According to the two proposed non-malicious manipulation properties, a non-malicious manipulation must not only preserve the meaning of the content, but also reduce the size of the original file. That is, a manipulation is considered non-malicious if the following two conditions are met simultaneously:
Table 7.1. Experimental results of JPEG2000 compressed images

        |        Lena        |       Pepper       |       Fruits
  bpp   | PSNR  MBFD   CR    | PSNR  MBFD   CR    | PSNR  MBFD   CR
  0.125 | 29.4   6.9  193.8  | 28.5   8.2  186.2  | 28.9   7.6  187.4
  0.200 | 31.5   5.3  118.2  | 30.3   6.8  117.7  | 30.8   4.7  117.7
  0.250 | 32.5   4.6   94.7  | 31.2   4.5   94.5  | 31.7   4.5   94.7
  0.300 | 33.4   3.8   79.2  | 31.9   3.1   79.0  | 32.2   4.0   80.7
  0.375 | 34.2   2.9   63.3  | 32.8   3.5   63.5  | 33.4   2.9   63.3
  0.400 | 34.6   2.2   59.7  | 33.0   2.7   59.5  | 33.6   3.5   59.5
  0.500 | 35.4   2.2   48.1  | 33.7   2.5   47.7  | 34.7   2.9   47.7
  0.600 | 36.4   2.1   39.7  | 34.3   2.1   40.1  | 35.5   2.4   39.9
  0.625 | 36.6   2.1   38.2  | 34.5   2.0   38.4  | 35.6   2.4   38.2
  0.700 | 36.8   2.0   34.1  | 34.9   2.0   34.1  | 36.2   2.0   34.3
  0.750 | 37.1   2.0   32.0  | 35.2   2.0   31.9  | 36.6   2.0   31.8
  0.800 | 37.3   1.7   30.0  | 35.4   1.9   29.9  | 36.9   1.9   30.0
  0.900 | 37.8   1.7   26.7  | 35.7   1.9   26.6  | 37.5   1.7   26.6
  1.000 | 38.1   1.4   24.0  | 36.0   1.9   23.9  | 38.1   1.4   24.0
  1.100 | 38.5   1.3   21.9  | 36.4   1.7   21.8  | 38.5   1.4   21.8
  1.200 | 39.0   1.3   20.0  | 36.7   1.6   20.0  | 38.9   1.3   19.9
  1.300 | 39.3   1.3   18.5  | 36.9   1.6   18.5  | 39.6   1.3   18.4
  1.400 | 39.5   1.3   17.1  | 37.1   1.5   17.1  | 40.1   1.3   17.2
  1.500 | 39.8   1.3   16.0  | 37.3   1.2   16.0  | 40.5   1.2   16.0
  2.000 | 41.2   1.3   12.0  | 38.9   1.2   12.0  | 42.3   1.1   12.0
  2.200 | 41.6   1.0   10.9  | 39.3   1.2   10.9  | 42.7   0.7   10.9
  2.400 | 41.8   0.9   10.0  | 39.5   1.2   10.0  | 43.3   0.7   10.0
  2.600 | 42.1   0.8    9.2  | 39.8   0.9    9.2  | 44.0   0.6    9.3
  2.800 | 42.5   0.5    8.6  | 40.0   0.9    8.6  | 44.6   0.6    8.6
  3.000 | 43.1   0.5    8.0  | 40.6   0.9    8.0  | 45.1   0.6    8.0
(1) all BFD values between f_i and f′_i are less than or equal to the LAD (that is, the MBFD of the manipulated image must be less than or equal to the LAD), and
(2) CR > 1.

Otherwise, the manipulation is considered malicious, because of an obvious distortion in visual quality compared to the original image, or a lack of compression. For the Pepper image, Table 7.1 indicates that non-malicious manipulation produces images where MBFD ≤ 2.5 (the LAD of the Pepper image); the PSNR values of these non-maliciously manipulated images are close to or greater than 33.7 dB, and their CR values are all greater than 1. Such manipulations produce images with low distortion compared with the original image and with smaller file sizes than the original, and were therefore considered non-malicious. Manipulations that did not conform to the above criteria were considered malicious. The same conditions held for the Lena and Fruits images.
Table 7.2. Experimental results of JPEG compressed images (QF denotes the JPEG quality factor)

       |       Lena        |      Pepper       |      Fruits
  QF   | PSNR  MBFD   CR   | PSNR  MBFD   CR   | PSNR  MBFD   CR
  30   | 34.3   3.2  44.6  | 33.6   2.5  41.8  | 33.7   3.3  46.4
  40   | 35.1   2.2  37.5  | 34.2   1.5  35.0  | 34.6   3.0  38.0
  50   | 35.8   1.6  32.3  | 34.8   1.5  30.5  | 35.3   2.8  32.3
  60   | 36.5   1.4  28.1  | 35.3   1.2  25.8  | 36.0   1.6  27.8
  70   | 37.3   1.2  23.2  | 35.9   1.0  21.2  | 37.0   1.2  22.9
  80   | 38.5   1.2  17.8  | 36.8   0.8  16.3  | 38.5   1.1  17.9
  90   | 40.8   0.5  11.4  | 38.8   0.7  10.2  | 41.2   0.6  11.8
Table 7.3. Experimental results of scaled images

                             |    Lena    |   Pepper   |   Fruits
  Scale factor (image size)  | MBFD   CR  | MBFD   CR  | MBFD   CR
  31.3% (160×160)            |  6.4  10.2 |  8.1  10.2 | 10.8  10.2
  37.5% (192×192)            |  5.8   7.1 |  6.7   7.1 |  9.8   7.1
  50.0% (256×256)            |  2.8   4.0 |  4.5   4.0 |  9.6   4.0
  62.5% (320×320)            |  2.4   2.6 |  2.9   2.6 |  4.0   2.6
  68.8% (352×352)            |  2.3   2.1 |  2.3   2.1 |  3.8   2.1
  75.0% (384×384)            |  2.3   1.8 |  2.1   1.8 |  2.9   1.8
  87.5% (448×448)            |  2.2   1.3 |  1.6   1.3 |  2.5   1.3
  93.8% (480×480)            |  1.6   1.1 |  1.6   1.1 |  2.1   1.1
7.4.2 JPEG Compression
Several JPEG-compressed images, with quality factors (QF) of 30, 40, 50, 60, 70, 80 and 90 (produced with the PhotoImpact software package), were selected and examined to calculate the MBFD and CR. Table 7.2 summarizes the authentication results for the three test images. In Table 7.2, taking the Fruits image as an example, the values that guarantee non-malicious manipulation are MBFD ≤ 2.9 (the LAD of the Fruits image); the PSNR values of these non-maliciously manipulated images are close to or greater than 34.7 dB, and CR > 1. These values produce images with little distortion from the original image and small file sizes. Values outside this range, MBFD > 2.9, were considered malicious manipulations. The same conditions held for the Lena and Pepper images.
7.4.3 Scaling
Firstly, the scale factor (SF) was defined as follows:

$$ \mathrm{SF} = \frac{\text{width of the scaled image}}{\text{width of the original image}} = \frac{\text{height of the scaled image}}{\text{height of the original image}}. $$
Several scale factors, namely 31.3%, 37.5%, 50.0%, 62.5%, 68.8%, 75.0%, 87.5% and 93.8% (the corresponding image sizes are 160×160, 192×192, 256×256, 320×320, 352×352, 384×384, 448×448 and 480×480 pixels, respectively), were selected and examined with PhotoImpact. Table 7.3 shows the MBFD and CR of these scaled images. Unlike in the previous two experiments, the PSNR cannot be adopted here to measure the visual distortion between images of different sizes. Therefore, the proposed MBFD serves as a good measure for determining the distortion in visual quality relative to the original image in this experiment. Considering the Lena image in Table 7.3, scaled images with MBFD ≤ 2.9 (the LAD of the Lena image) and CR > 1 are considered non-maliciously manipulated, and otherwise are considered maliciously manipulated. The same conditions also held for the Fruits and Pepper images. However, scaling may change the proposed feature measurement if an image has large energies in high-frequency bands, which are mostly eliminated by scaling. For instance, Table 7.3 indicates that the Fruits image is less robust to scaling than the other images. Section 7.5.1 presents the related analysis and discussion of the robustness of the proposed method to scaling.
7.4.4 Additional Image Manipulations
The three test images were manipulated in the following two categories: (1) manipulations with the PhotoImpact software package: sharpening (the option path in PhotoImpact was "photo\sharpen\sharpen" and the level of sharpening was set to 50), blurring (the option path "photo\blur\flatten uneven area" in PhotoImpact, with a "lowpass filtering" factor of 3), Gauss blurring (with radii of 0.2 and 3.0, respectively), adding noise (with variances of 1 and 25, respectively), brightness up 10%, brightness down 10%, contrast up 10% and contrast down 10%; (2) manipulations coded in Matlab, with the related coefficients adopted from [5]: smoothing spatial filtering with mask1 (in Fig. 7.5(a)) and mask2 (in Fig. 7.5(b)), respectively; sharpening spatial filtering with mask3 (in Fig. 7.5(c)) and mask4 (in Fig. 7.5(d)), respectively; lowpass frequency filtering (Butterworth lowpass filtering (BLPF) of order 2, with cutoff frequencies (CF) at radii of 15, 30, 80 and 230, respectively) and highpass frequency filtering (Butterworth highpass filtering (BHPF) of order 2 with cutoff distance (CD) set to 15, 30 and 80, respectively, where CD is the cutoff distance measured from the origin of the frequency rectangle).
Fig. 7.5. Masks of smoothing/sharpening spatial filtering. (a) mask1, (b) mask2, (c) mask3, (d) mask4.
Tables 7.4 and 7.5 list the authentication results of this experiment. They indicate that the CR values of the three test images were all equal to 1, meaning that these manipulations do nothing to speed up transmission in an Internet environment or to save storage space. Moreover, Table 7.4 contains two significant experimental results, namely Gauss blurring with radius 0.2 and adding noise with variance 1. Table 7.5 also contains one significant experimental result, namely lowpass frequency filtering with cutoff frequency at a radius of 230. Although the MBFD values of these three manipulations were less than the LAD, they were still considered malicious manipulations. This finding is discussed in Sect. 7.5.3.

Table 7.4. Experimental results of malicious manipulations performed with PhotoImpact

                                |   Lena     |  Pepper    |  Fruits
  Manipulations                 | MBFD  CR   | MBFD  CR   | MBFD  CR
  sharpening                    | 14.7  1.0  |  9.9  1.0  | 16.4  1.0
  blurring                      | 23.1  1.0  | 13.6  1.0  | 22.8  1.0
  Gauss blurring (radius=0.2)   |  0.2  1.0  |  0.8  1.0  |  1.2  1.0
  Gauss blurring (radius=3.0)   | 23.5  1.0  | 14.6  1.0  | 24.0  1.0
  adding noise (variance=1)     |  0.3  1.0  |  1.2  1.0  |  1.8  1.0
  adding noise (variance=25)    |  9.1  1.0  |  6.9  1.0  | 15.3  1.0
  brightness up 10%             | 27.5  1.0  | 24.2  1.0  | 26.4  1.0
  brightness down 10%           | 27.9  1.0  | 24.8  1.0  | 31.0  1.0
  contrast up 10%               |  9.2  1.0  |  8.1  1.0  | 17.9  1.0
  contrast down 10%             |  9.3  1.0  |  8.7  1.0  | 31.0  1.0
Another experiment was performed to demonstrate the ability to locate modified regions. Figure 7.6(a) is the original Pepper image, and Fig. 7.6(b) is pasted with three small pepper patterns. Figure 7.6(c) shows the neighborhood sets $N'_{25}(i)$ in which BFD(f_i, f′_i) > LAD, marked in the authentication result image.
Table 7.5. Experimental results of malicious manipulations performed with Matlab code

                                          |   Lena     |  Pepper    |  Fruits
  Manipulations                           | MBFD  CR   | MBFD  CR   | MBFD  CR
  smoothing spatial filtering (mask1)     |  9.7  1.0  |  4.2  1.0  |  9.5  1.0
  smoothing spatial filtering (mask2)     |  6.8  1.0  |  3.7  1.0  |  7.5  1.0
  sharpening spatial filtering (mask3)    | 20.6  1.0  | 17.1  1.0  | 19.8  1.0
  sharpening spatial filtering (mask4)    | 25.4  1.0  | 23.6  1.0  | 29.3  1.0
  lowpass frequency filtering (CF=15)     | 25.4  1.0  | 22.5  1.0  | 27.2  1.0
  lowpass frequency filtering (CF=30)     | 23.1  1.0  | 14.4  1.0  | 23.8  1.0
  lowpass frequency filtering (CF=80)     |  6.8  1.0  |  9.2  1.0  | 11.8  1.0
  lowpass frequency filtering (CF=230)    |  1.8  1.0  |  2.1  1.0  |  1.0  1.0
  highpass frequency filtering (CD=15)    | 31.0  1.0  | 31.0  1.0  | 31.0  1.0
  highpass frequency filtering (CD=30)    | 31.0  1.0  | 31.0  1.0  | 31.0  1.0
  highpass frequency filtering (CD=80)    | 31.0  1.0  | 31.0  1.0  | 31.0  1.0
Fig. 7.6. Pepper image: (a) original image, (b) pasting-attacked image, (c) the neighborhood sets $N'_{25}(i)$ where BFD(f_i, f′_i) > LAD, marked
7.5 Analysis and Discussion

7.5.1 Robustness of the Proposed Image Authentication Method
Because the authentication of scaled images is the main contribution of this chapter, this section discusses the robustness of the proposed method to image scaling. Mathematical morphology can be adopted as a tool to extract image components, such as boundaries, skeletons, and the convex hull, that are useful for representing and describing region shapes [5]. Wu and Shih [22] proposed an adjusted-purpose digital watermarking technique and observed that almost all the pixel-based features of an image remain the same after compression. In an example presented in [22], a pixel ρ and its 3×3 neighborhood were selected; the value of ρ (the central pixel) was originally ranked "5" within the 3×3 neighborhood, and was then ranked "4", "4" and "6" after
JPEG compression with compression rates of 80%, 50%, and 20%, respectively. They generated the feature based on this ranking property, and the property can also be adopted and extended to a block basis. Therefore, the proposed block-based image authentication adopts a property similar to that of [22] and generates the feature based on a standard deviation measurement. The standard deviation is popularly adopted in mathematical morphology to generate features [8, 17]. This study also adopts the standard deviation to generate block features. Firstly, the proposed measurement of the block feature, from Eq. (7.3), is restated as follows:

$$ f_i = \frac{\sigma_{i,(r-1)/2} - \sigma_{i,\min}}{\sigma_{i,\max} - \sigma_{i,\min}} \times 2^e. $$

In statistics, the standard deviation is defined as a measure of variability in a set of data, calculated as the square root of the variance; that is, the standard deviation is a statistical measure of spread or variability. Therefore, the ratio in Eq. (7.8),

$$ \frac{\sigma_{i,(r-1)/2} - \sigma_{i,\min}}{\sigma_{i,\max} - \sigma_{i,\min}} \qquad (7.8) $$

denotes the rate of variance for block b_i within its corresponding neighborhood set N_r(i). The value of f_i is normalized between 0 and 2^e − 1 by Eq. (7.3). If f_i is close to 0, then the pixel luminances of b_i are close to c_{M[i]}; if f_i is close to 2^e − 1, then the pixel luminances of b_i vary greatly from c_{M[i]}. The scale-down operation can be considered a sampling operation on the image. If N_r(i) consists of α × β pixels, then for a scale factor μ the corresponding neighborhood set N_r^μ(i) comprises (μ×α) × (μ×β) pixels, where μ ≤ 1. That is, the number of elements in sample space N_r(i) is α×β, and that in sample space N_r^μ(i) is (μ×α)×(μ×β), where (μ×α)×(μ×β) ≤ α×β. The accuracy and stability of a statistical measurement depend on two factors: (1) the number of elements in the sample space, and (2) the variance of the elements in the sample space. Increasing the number of elements and reducing the variance of the elements in the sample space increases the accuracy and stability of the statistical measurement. These two factors determine the difference in standard deviation between the original space N_r(i) and the sampled space N_r^μ(i), and thus the robustness of the proposed method. The value of μ determines the number of elements, while the structure of N_r(i) determines the variance of the elements. The following two cases were analyzed.

Case 1. The structure of N_r(i) is uniform or has low texture: In this case, the luminance of each pixel in the original block b_i is similar; that is, the variation within b_i is smooth. Hence, the difference between σ_{i,j} (the standard deviation of b_{i,j}, defined in Eq. (7.2)) and σ^μ_{i,j} (the standard deviation of the scaled-down block b^μ_{i,j}) is small. This difference is close to 0, and the variation of f_i is slight, when μ is close to 1. The difference rises as μ falls. Based
Fig. 7.7. (a) A uniform neighborhood set $N_{25}(i)$ of the original Lena image, (b) LDC of $N_{25}(i)$ and $b_i$, (c) LDC of $N^{50\%}_{25}(i)$ and $b^{50\%}_{i,j}$
on the above two factors, because the variance of the elements in b_i is small, the proposed method is robust to a falling μ in this case. Therefore, the proposed authentication system is robust for N_r(i) with a uniform structure. This case can also be illustrated by the luminance-distribution chart (LDC). Figure 7.7(a) shows a uniform neighborhood set N_25(i) of the original Lena image. Figure 7.7(b) shows the LDC of N_25(i) (upper curve) and b_i (lower curve); the symbol "♦" marks the position of the value c_{M[i]}. Figure 7.7(c) is the LDC of $N^{50\%}_{25}(i)$ (upper curve) and $b^{50\%}_{i,j}$ (lower curve). In Figs. 7.7(b) and (c), the x-axis denotes the pixel luminance between 0 and 255, and the y-axis denotes the number of pixels. The shapes of the luminance distributions in Figs. 7.7(b) and (c) are similar; therefore, the difference between σ_{i,j} and $\sigma^{50\%}_{i,j}$ is small. Hence, this method is robust for N_r(i) with a uniform structure.

Case 2. The structure of N_r(i) contains highly detailed textures: In this case the differences between pixels in b_i are large. Therefore, the number of elements in the sample space N_r^μ(i) easily affects the difference between σ_i and σ_i^μ. Based on Eq. (7.2), the degree of variance between pixels in b_i affects the accuracy and stability of the statistical measurement. If the variation of luminance is strong, then the difference between σ_i and σ_i^μ is volatile, and the degree of the scaling operation is restricted. This case is illustrated by Table 7.3, which indicates that the Fruits image contains blocks of high variance and is thus degraded by scaling. That is, the Fruits image is less robust to scaling than the other test images.
Fig. 7.8. (a) A textured neighborhood set $N_{25}(i)$ of the original Lena image, (b) LDC of $N_{25}(i)$ and $b_i$, (c) LDC of $N^{37.5\%}_{25}(i)$ and $b^{37.5\%}_{i,j}$
Figure 7.8 can also be adopted to illustrate this case. A neighborhood set with a textured structure from the original Lena image is shown in Fig. 7.8(a), and the corresponding LDC is shown in Fig. 7.8(b). Figure 7.8(c) is the LDC with a scale factor of 37.5%. The conventions in Figs. 7.7 and 7.8 are the same. The shapes of the luminance distributions in Figs. 7.8(b) and (c) differ strongly; therefore, the measured difference between σ_{i,j} and $\sigma^{37.5\%}_{i,j}$ is volatile. That is, if the variation of luminance is high, the result of Eq. (7.8) differs before and after scaling, and the degree of the scaling operation is therefore limited. Although the proposed method is sensitive to textured regions, its robustness can be improved by adjusting the two coefficients p and q: reducing p and q increases the robustness of an image with a fixed μ. Because the block size is (w/2^p) × (h/2^q), smaller p and q values generate bigger blocks. Moreover, a bigger block has more pixels (and hence more elements), increasing the accuracy and stability of the measurement and avoiding large differences in standard deviation due to scaling.
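The two cases above can also be illustrated numerically. The following toy experiment is a sketch under the simplifying assumption that blocks are i.i.d. Gaussian patches (real image blocks are not); it shows that the absolute change in standard deviation under naive downsampling tends to be larger for a high-variance (textured) block than for a low-variance (smooth) one.

```python
import numpy as np

rng = np.random.default_rng(0)
smooth = 128 + rng.normal(0, 2, (16, 16))     # Case 1: low-variance block
textured = 128 + rng.normal(0, 50, (16, 16))  # Case 2: high-variance block

def downsample(block, mu):
    """Naive sampling by scale factor mu <= 1 (keep every 1/mu-th pixel)."""
    step = int(round(1 / mu))
    return block[::step, ::step]

for name, block in [("smooth", smooth), ("textured", textured)]:
    diff = abs(block.std() - downsample(block, 0.5).std())
    print(f"{name}: |sigma - sigma^mu| = {diff:.3f}")  # larger for textured
```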
7.5.2 Security
A hybrid cryptosystem is adopted to enhance the security of the proposed method. Figure 7.9 shows the hybrid cryptosystem protocol. At the sender, the p, q, e, r, C, M, F, LAD and OFS are signed with the image sender's private key to form the digital signature. The digital signature is then encrypted with the receiver's public key to generate the ciphertext (see Fig. 7.2). At the
receiver, the received ciphertext is decrypted with the receiver's private key to recover the digital signature. This digital signature can then be verified with the sender's public key (see Fig. 7.3). This cryptosystem increases the security of the proposed authentication system, and has the following advantages:

(1) strong privacy,
(2) convenient key management (one public and one private key per party),
(3) authentication (digital signature),
(4) integrity (digital signature), and
(5) nonrepudiation (digital signature).
Fig. 7.9. The protocol of the hybrid cryptosystem
Therefore, if a careful attacker wants to modify the image, such as by permuting the pixels in a block, in order to counterfeit the feature without affecting its authenticity, then he must know the exact size and location of the blocks (determined by p and q) and the size of the neighborhood set (determined by r). However, this significant information is signed and encrypted by the sender; that is, the attacker would have to obtain the receiver's private key. Since the attacker cannot obtain the receiver's private key in a public-key cryptosystem, he cannot decode the block structure and the other information affecting the image authenticity.
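The sign-then-encrypt protocol of Fig. 7.9 can be sketched with the third-party Python cryptography package, for example as below. The key sizes and padding choices are our assumptions rather than the chapter's: a 4096-bit receiver key is chosen so that the 256-byte RSA-PSS signature of a 2048-bit sender key fits in a single RSA-OAEP block, and the signed parameter payload itself is assumed to travel alongside the image so the receiver can verify it.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

sender_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
receiver_key = rsa.generate_private_key(public_exponent=65537, key_size=4096)

payload = b"p,q,e,r,C,M,F,LAD,OFS"   # serialized authentication data (illustrative)

# Sender: sign the payload with the sender's private key ...
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)
signature = sender_key.sign(payload, pss, hashes.SHA256())

# ... then encrypt the signature with the receiver's public key.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
ciphertext = receiver_key.public_key().encrypt(signature, oaep)

# Receiver: decrypt with the receiver's private key,
# then verify with the sender's public key (raises on failure).
recovered = receiver_key.decrypt(ciphertext, oaep)
sender_key.public_key().verify(recovered, payload, pss, hashes.SHA256())
```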
7.5.3 Distinguishing Non-malicious Manipulations from Malicious Attacks
This study presents a simple method for distinguishing non-malicious manipulations (normal operations) from malicious manipulations by measuring the standard deviation. As described in Sect. 7.2, previous image authentication methods do not accurately define the boundary between malicious and non-malicious manipulations. Many image manipulations do not reduce either
storage space or transmission time over the Internet, but are adopted for purposes such as image enhancement or restoration, for example lowpass or highpass filtering [5]. To resolve the ambiguous definition of malicious and non-malicious manipulation in image authentication, an extra property besides content preservation, namely the compression ratio (CR), is proposed in this chapter. Because reducing an image's file size speeds up transmission and storage, scaling (scale-down) manipulation, as well as JPEG and JPEG2000 compression, is considered non-malicious. The two most important aims in image authentication for the receiver are: (1) receiving and storing the image immediately, and (2) verifying the authenticity and integrity of the received image. Therefore, manipulations that cannot reduce the size of an image file should be considered malicious, even if MBFD ≤ LAD, as in Gauss blurring with radius 0.2 and adding noise with variance 1 (in Table 7.4), and lowpass frequency filtering with cutoff frequency 230 (in Table 7.5). Hence, the property "CR > 1" is important for distinguishing between non-malicious and malicious manipulations. If the property "CR > 1" were not required in image authentication, then almost any manipulation with a slight modification could be considered non-malicious, because MBFD ≤ LAD (meaning the visual quality has little distortion relative to the original image). Other studies have not considered this case. This study introduces the property "CR > 1" to strengthen the distinction between non-malicious and malicious manipulation. The proposed method therefore applies the following two conditions, MBFD ≤ LAD and CR > 1, for a manipulation to be considered non-malicious.
7.6 Conclusions

Image authentication requires verification that can tolerate non-malicious manipulations while remaining sensitive to malicious manipulations. This study presents a new content-based digital signature method for verifying the authenticity of JPEG-compressed, JPEG2000-compressed and scaled images in terms of a concept named the lowest authenticable difference (LAD). The structure of the statistical standard deviation between a block and its neighboring blocks is analyzed to construct the digital signature. This construction is based on the fact that some BFD values of malicious manipulations are always greater than the LAD corresponding to the LAI, while the BFD values of non-malicious manipulations are mostly less than the LAD. The proposed solution makes the following main contributions:

(1) Most traditional semi-fragile image authentication methods must transform the spatial domain to the frequency domain in order to generate and verify the image feature. In the proposed scheme, feature generation and verification are all performed in the spatial domain.
Therefore, domain transformation is unnecessary, and time is saved in the entire procedure.

(2) Additional non-maliciously manipulated images, including JPEG-compressed, JPEG2000-compressed and scaled images, can be authenticated by the proposed method, achieving a good trade-off between security and practical image processing in an Internet-based environment. Table 7.6 compares the proposed method with related studies [3, 7, 11, 15, 19, 20, 21] regarding the supported non-malicious manipulations and the required domain transformation, indicating that the non-malicious manipulations that can reduce the file size in these related works are either JPEG or JPEG2000 compression. However, the proposed semi-fragile image authentication method is robust to JPEG compression, JPEG2000 compression and scaling simultaneously. Therefore, the proposed method has more applications than the others in the field.

Table 7.6. Comparison of the proposed scheme with related works regarding supported non-malicious manipulations and the required domain transformation

                    | Non-malicious manipulations | Domain transformation
  Sun [20]          | JPEG2000 compression        | Frequency (DWT)
  Lin [11]          | JPEG compression            | Frequency (DCT)
  Lu [15]           | JPEG compression            | Frequency (DWT)
  Venkatesan [21]   | JPEG compression            | Frequency (DWT)
  Chen [3]          | JPEG compression            | Frequency (DCT)
  Ishihara [7]      | JPEG compression            | Frequency (DWT)
  Proposed scheme   | JPEG compression,           | Spatial domain
                    | JPEG2000 compression,       |
                    | scaling                     |
(3) Non-malicious manipulation is clearly defined, in order to closely meet the requirements when images are transmitted over the Internet or saved in storage.
(4) The MBFD is proposed as a measurement for determining the distortion in visual quality of a scaled image in comparison with the original image.
(5) The modified parts of a received image can be located if the image is classified as maliciously manipulated.

Experimental results of this study demonstrate that the proposed method is feasible and effective. Future work should more closely examine robust solutions under the proposed framework to tolerate more acceptable manipulations combining JPEG compression, JPEG2000 compression, scaling, and others, e.g., scaled JPEG-compressed images or scaled lowpass-filtered images.
References 1. Bao, P., Ma, X.: Image adaptive watermarking using wavelet domain singular value decomposition. IEEE Trans. Circuits and Systems for Video Technology 15, 96–102 (2005) 2. Celik, M.U., Sharma, G., Saber, E., Tekalp, A.M.: Hierarchical watermarking for secure image authentication with localization. IEEE Trans. Image Processing 11, 585–595 (2002) 3. Chen, C.C., Lin, C.S.: Toward a robust image authentication method surviving JPEG lossy compression. Journal of Information and Engineering 23, 511–524 (2007) 4. Feng, W., Liu, Z.Q.: Bayesian structural content abstraction for region-level image authentication. In: Proc. IEEE Int’l Conf. on Computer Vision, vol. 2, pp. 1042–1047 (2005) 5. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Prentice-Hall, Englewood Cliffs (2002) 6. Hu, Y.P., Han, D.Z.: Using two semi-fragile watermark for image authentication. In: Proc. IEEE Int’l Conf. on Machine Learning and Cybernetics, vol. 9, pp. 5484–5489 (2005) 7. Ishihara, N., Abe, K.: A semi-fragile watermarking scheme using weighted vote with sieve and emphasis for image authentication. IEICE Trans. Fundamentals E90-A, 1045–1054 (2007) 8. Khan, F., Sun, Y.: Morphological templates for extracting texture information in X-ray mammography. In: Proc. IEEE Symp. on Computer-Based Medical Systems, pp. 375–380 (2001) 9. Lin, C.H., Hsieh, W.S.: Applying projection and B-spline to image authentication and remedy. IEEE Trans. Consumer Electronics 49, 1234–1239 (2003) 10. Lin, C.H., Hsieh, W.S.: Image authentication scheme for resisting JPEG, JPEG2000 compression and scaling. IEICE Trans. Information and Systems E90-D, 126–136 (2007) 11. Lin, C.Y., Chang, S.F.: A robust image authentication method distinguishing JPEG compression from malicious manipulation. IEEE Trans. Circuits and Systems for Video Technology 11, 153–168 (2001) 12. Linde, Y., Buzo, A., Gray, R.: An algorithm for vector quantizer design. IEEE Trans. Communications 28, 84–95 (1980) 13. Liu, H., Lin, J., Huang, J.: Image authentication using content based watermark. In: Proc. IEEE Int’l Symp. on Circuits and Systems, vol. 4, pp. 4014–4017 (2005) 14. Lu, C.S., Liao, H.Y.M.: Multipurpose watermarking for image authentication and protection. IEEE Trans. Image Processing 10, 1579–1592 (2001) 15. Lu, C.S., Liao, H.Y.M.: Structural digital signature for image authentication: An incidental distortion resistant scheme. IEEE Trans. Multimedia 5, 161–173 (2003) 16. Monga, V., Vats, D., Evans, B.L.: Image authentication under geometric attacks via structure matching. In: Proc. IEEE Int’l Conf. on Multimedia and Expo, pp. 229–232 (2005) 17. Park, H.S., Ra, J.B.: Morphological image segmentation for realistic image representation preserving semantic object shapes. Optical Engineering 39, 1909– 1916 (2000)
18. Schneier, B.: Applied Cryptography. Wiley, Chichester (1996) 19. Schneider, M., Chang, S.F.: A robust content-based digital signature for image authentication. In: Proc. IEEE Int’l Conf. on Image Processing, vol. 3, pp. 227–230 (1996) 20. Sun, Q., Chang, S.F.: A secure and robust digital signature scheme for JPEG2000 image authentication. IEEE Trans. on Multimedia 7, 480–494 (2005) 21. Venkatesan, R., Koon, S.M., Jakubowski, M.H., Moulin, P.: Robust image hashing. In: Proc. IEEE Int’l Conf. on Image Processing, vol. 3, pp. 664–666 (2000) 22. Wu, Y.T., Shih, F.Y.: An adjusted-purpose digital watermarking technique. Pattern Recognition 37, 2349–2359 (2004)
8 Genetic-Based Fingerprinting for Multicast Multimedia

Yueh-Hong Chen (1) and Hsiang-Cheh Huang (2)

(1) Far East University, Tainan, Taiwan, R.O.C.
(2) National University of Kaohsiung, Kaohsiung, Taiwan, R.O.C.
Summary. The multicast method is more efficient than the unicast method for transmitting multimedia data to massive numbers of users. However, the ease of delivering multimedia data also makes copyright easy to violate, and fingerprinting is one of the effective means of combating this problem. The fingerprinting process typically turns the multimedia content into many different versions, which must be transmitted via the unicast method. In this chapter, we propose a new genetic fingerprinting scheme for copyright protection of multicast media. In this method, the encryption and decryption keys, which aim at scrambling and descrambling the multimedia content, are first produced with genetic algorithms. Next, the multimedia data are encrypted and multicast to all clients. At the same time, a secure channel is employed to unicast a designated decryption key to each client. When a user deploys the designated key to decrypt the received data, a corresponding fingerprint is embedded into the content. Upon reception of the fingerprinted content, the embedded fingerprint can be extracted promptly, and the copyright can be confirmed. Experimental results demonstrate that the proposed method can transmit multimedia data to clients effectively, causing only a slight degradation in perceptual quality.
8.1 Introduction

The entertainment applications over the Internet contribute to the importance of Internet TV and video/multimedia on demand (VOD/MOD) services, besides the existing text- or image-based retrieval applications. Nevertheless, Internet-based applications have encountered some difficulties, including (1) the dramatic increase in the bandwidth required for transmitting multimedia data, especially video data, and (2) the rampant piracy problem that has existed for some time. These are important issues and interesting research topics for video-based Internet applications. There is a growing interest in digital rights protection (DRP) and digital rights management (DRM) research because of their potential capabilities
to prevent media content from being pirated [6, 15, 16]. Besides the conventional cryptographic schemes for protecting data, digital watermarking provides another alternative for DRP and DRM. As is well known, after the encryption process the output looks like a noisy pattern [3], which may easily arouse the suspicion of eavesdroppers. On the contrary, after watermarking, the original multimedia content and the corresponding output look very similar, hence reducing the possibility of suspicion by eavesdroppers. Watermarking is a way to secretly and imperceptibly embed specific information, called the watermark, into the original multimedia content. When an embedded watermark is associated with a particular user, it can be considered a fingerprint. Once fingerprinted media have been illegally distributed, the corresponding user can easily be traced back from the redistributed versions. Unfortunately, the fingerprint embedding process often produces many slightly different versions of the media content, and they have to be transmitted via the unicast method. Generally speaking, it is more efficient to transmit a unique medium via the multicast method to massive numbers of users [5, 21]. It thus becomes quite important in video-based applications to efficiently transmit video data embedded with fingerprints to all users. Several different techniques have been proposed to tackle this problem. A brief review of those research efforts follows. According to the timing of embedding the fingerprint data into the video, most of the proposed methods can be classified into one of the following cases:
1. transmitter-side fingerprint embedding,
2. receiver-side fingerprint embedding,
3. intermediate-node fingerprint embedding, and
4. joint fingerprinting and decryption.
Each of these cases is briefly introduced and discussed in the following subsections.
8.1.1 Transmitter-Side Fingerprint Embedding
The goal of transmitter-side fingerprint embedding schemes is to embed users' fingerprints directly into the video to be multicast at the server side. Next, the video is scrambled such that each user can only descramble his/her own fingerprinted video. Wu and Wu [20] proposed a scheme to multicast most of the video and unicast a portion of the video with unique fingerprints. When a larger percentage of the video is chosen to be fingerprinted, scrambled, and unicast, the security of the transmitted video is enhanced, but the efficiency of the protocol begins to resemble that of the simple unicast model. Boneh and Shaw [2] presented a method to distribute fingerprinted copies of digital content with a scrambling approach, also called an encryption approach therein. In their approach, only two watermarked versions of the video were
needed, and the video data were transmitted in a multicast manner. However, the bandwidth requirement was almost double that of the normal multicast case. The strategy was also adopted by some other methods operating on frames [4], packets [17], and segments of video streams [19].
8.1.2 Receiver-Side Fingerprint Embedding
The architecture of this type of embedding method was initially introduced in [14], and more recent discussions appeared in [1, 9]. Here, a video is protected to produce scrambled content, also called encrypted content, which is then multicast to users from the server side. At the receiver side, the encrypted video is decrypted and fingerprinted with a unique mark by a decryption operator. For security reasons, tamper-proof hardware must be employed in order to protect the purely decrypted host media content from eavesdropping. However, tamper-proof hardware is difficult to build, and it remains an open research topic.
8.1.3 Intermediate-Node Fingerprint Embedding
This method distributes the fingerprinting process over a set of intermediate nodes such as routers [11]. Thus, by tracing the routing paths, the owner of a specific fingerprinted copy can be identified. However, this method creates a different set of challenges, such as vulnerability to intermediate-node compromise, and susceptibility to standard network congestion and packet dropping [13]. Hence, these problems are still being studied.
8.1.4 Joint Fingerprinting and Decryption (JFD)
The JFD method, proposed by Kundur and Karthik [12], integrates the decryption and fingerprinting processes at the client side. In this method, a server is allowed to multicast only one encrypted video to all customers, and to unicast a designated decryption key to each user. The user can only decrypt a portion of the video data with the decryption key, and the video data that remain encrypted constitute a fingerprint. In the method in [12], the fingerprinting process is performed in the discrete cosine transform (DCT) domain, and the DCT coefficients are partitioned into subsets to represent binary strings, that is, the user ID. The bandwidth required for unicasting decryption keys increases with the number of subsets. In this chapter, we propose a new JFD method based on genetic algorithms (GA). The method first generates encryption and decryption keys with GA. The multimedia data are then protected and multicast to all users. At the same time, a secure channel is used to unicast a designated decryption key to each user. When a client uses the designated key to decrypt the received video, a designated fingerprint is embedded into the video. Three features of
the proposed method make it a suitable approach for protecting the copyright of multicast media on the Internet. These features are:

1. Adaptive fingerprinting. The transform-domain coefficients that are suitable for embedding fingerprints are selected with the genetic algorithm. Therefore, different criteria can be adopted for different applications through a suitable design of the fitness function.
2. Effective transmission. Only the decryption keys, that is, the random seeds in our implementation, need to be delivered with the unicast method. All the remaining data, including the encrypted content and the information related to decryption, can be transmitted with the multicast method.
3. Security and imperceptibility. While a video is encrypted, most of the transform-domain coefficients are scrambled such that it has little or no commercial value. On the other hand, a decrypted video leaves only a few coefficients still encrypted, and the fingerprint, i.e., the encrypted coefficients left in the decrypted video, causes only imperceptible degradation in video quality.

The rest of the chapter is organized as follows. Section 8.2 introduces the concepts of genetic algorithms. Section 8.3 presents the scheme for encrypting a video frame and generating decryption keys with genetic algorithms. Section 8.4 discusses the fingerprint detection method of the proposed scheme. Experimental results are presented in Section 8.5. Finally, Section 8.6 summarizes the proposed method and draws some concluding remarks.
8.2 Brief Descriptions of Genetic Algorithms

For a non-linear function with multiple variables, finding the maximum and minimum values is a difficult task with conventional optimization schemes. One scheme, called the "genetic algorithm" (GA) and based on the concept of natural genetics, is a directed random search technique. The foundational contributions to this method were developed by Holland [10] over the course of the 1960s and 1970s, and it was finally popularized by Goldberg [8]. In genetic algorithms, the parameters are represented by an encoded binary string, called the "chromosome," and the elements in the binary string, or the "genes," are adjusted to minimize or maximize the fitness value of a properly designed fitness function. The means of adjusting the binary strings comprise three major building blocks: selection, crossover, and mutation. The fitness function is defined by the algorithm designer, with the goal of optimizing the outcome for the specific application, for instance the conventional traveling salesman problem (TSP) [7] and, more recently, watermarking applications. It generates a fitness value, which is composed of the multiple variables to be optimized by the GA. For every iteration in
GA, a pre-determined number of chromosomes will correspondingly produce fitness values. The training process in GA begins by defining the parameters for optimization, the fitness function provided by the algorithm designer, and the corresponding fitness value, and it ends by testing for convergence. Depending on the application to be optimized, the designer needs to carefully define the necessary elements for GA training. The three major building blocks in GA can be briefly depicted as follows.

Mate selection: Assume that there are N chromosomes, each of length l bits, for training in GA. A large portion of the chromosomes with low fitness values is discarded through this natural selection step. The algorithm designer provides a parameter called the "selection rate," p_s, for training in GA. The selection rate defines the portion of chromosomes with high fitness values that survive into the next training iteration. Consequently, (N · p_s) chromosomes survive into the next iteration.

Crossover: Crossover is the first way that a GA explores a fitness surface. Two of the (N · p_s) surviving chromosomes are chosen from the current training iteration to produce two new offspring. A crossover point is selected, the fractions of each chromosome after the crossover point are exchanged, and two new chromosomes are produced.

Mutation: Mutation is the second way that a GA explores a fitness surface. The mutation procedure is accomplished by intentionally flipping the bit values at chosen positions. It introduces traits not present in the original individuals, and keeps the GA from converging too quickly. The fraction of chosen positions among the total length of all chromosomes is called the mutation rate, p_m. Consequently, a total of (N · l · p_m) bits are intentionally flipped during the mutation procedure. The pre-determined mutation rate should be low. Most mutations deteriorate the fitness of an individual; however, the occasional improvement of fitness adds diversity and strengthens the population.

With these fundamental concepts of GA, we are able to design an optimized, DCT-based fingerprinting system with the aid of GA, and to evaluate the fitness function, together with the terminating criteria, using the natural selection, crossover, and mutation operations in a reasonable way [7].
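A compact sketch of the GA loop just described (selection, single-point crossover, bit-flip mutation) is given below for a generic binary chromosome; fitness_fn is a placeholder for an application-specific fitness function, and all parameter defaults are illustrative.

```python
import numpy as np

def run_ga(fitness_fn, n_chrom=200, length=64, p_s=0.5, p_m=0.04, iters=100, seed=0):
    """Generic GA loop: selection, single-point crossover, bit-flip mutation."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, (n_chrom, length), dtype=np.uint8)
    for _ in range(iters):
        scores = np.array([fitness_fn(c) for c in pop])
        # Selection: keep the top (N * p_s) chromosomes.
        survivors = pop[np.argsort(scores)[::-1][: int(n_chrom * p_s)]]
        children = []
        while len(survivors) + len(children) < n_chrom:
            # Crossover: exchange the tails of two survivors at a random cut.
            a, b = survivors[rng.integers(len(survivors), size=2)]
            cut = rng.integers(1, length)
            children.append(np.concatenate([a[:cut], b[cut:]]))
        pop = np.vstack([survivors] + children)
        # Mutation: flip roughly (N * l * p_m) bits across the population.
        flips = rng.random(pop.shape) < p_m
        pop[flips] ^= 1
    return pop[np.argmax([fitness_fn(c) for c in pop])]

# Example: maximize the number of 1-bits (a toy fitness function).
best = run_ga(lambda c: c.sum())
```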
8.3 Genetic Fingerprinting Scheme

The genetic fingerprinting scheme proposed here is mainly applied to transmit and protect media content in a multicast scenario. In this section, the multicast scenario and a performance metric for multicast fingerprinting schemes are introduced. Next, the embedding and detection methods of the proposed GA fingerprinting scheme are described.
8.3.1 Performance Evaluation for Multicast Fingerprinting Schemes
The multicast transmission described in this chapter is adopted from that in [12]. First, we assume that there is only a public channel between the media server and all clients. Data transmitted over the public channel can be received by all of the clients simultaneously. In other words, sending data directly over the public channel is called broadcast transmission. If the server needs to send secret data to a specific client, the data should be encrypted with the secret key associated with that client before transmission. All clients' secret keys are delivered over a secure channel. Encrypting and transmitting data to a specific client is referred to as unicast transmission. We use the term multicasting for the transmission of data using a combination of the unicast and broadcast methods. Regarding the evaluation criterion, the transmission of media content is efficient if it incorporates both the broadcast and unicast methods such that the broadcast channel is used only a few times, while the unicast channel is seldom employed [12]. From a quantitative point of view, the efficiency of a distribution method is measured relative to the purely naive broadcasting scenario and can be defined by the ratio given in Eq. (8.1),

$$ \eta = \frac{m_D}{m_0}, \qquad (8.1) $$
where m_D is a value proportional to the bandwidth used by a fingerprinting scheme, and m_0 is a value proportional to the bandwidth used in the unicast-channel case. In particular, m_0 is defined as the number of times the public channel is used when the fingerprinted content is sent to each user separately, and m_D is the number of times the public channel is used by the fingerprinting scheme. For example, a scheme that broadcasts one encrypted copy to U users instead of unicasting U individual copies uses the public channel roughly 1/U as often. We expect that 0 ≤ η ≤ 1. In addition, for two fingerprinting methods, if Method 1 is more efficient than Method 2, then η_1 < η_2.
8.3.2 Genetic Fingerprinting Embedding
As discussed in Sec. 8.3.1, it is much more economical to use the broadcast channel than the unicast channel to transmit data. Therefore, the proposed scheme employs the broadcast channel to deliver the encrypted media content and the decryption-related data required by all clients. To verify the applicability of the proposed genetic fingerprinting algorithm, we use image data to serve as the multimedia content; the scheme is directly extendable to video or other media formats with only minor modifications. In our scheme, a method for modifying the frequency-domain coefficients of images is required for encrypting and fingerprinting the images. Any method having the following properties, along with the usual watermarking requirements, can be adopted:
1. robustness,
2. imperceptibility, and
3. reversibility.

The first two properties are similar to the requirements of a general watermarking system [16, 18]. The third property means that the modification made by the method should be reversible, so that clients can decrypt the image. In this chapter, we intentionally invert the sign bits of some DCT coefficients to encrypt the image. Unlike conventional schemes, the image is not divided into 8 × 8 blocks, for reasons of fingerprint perceptibility; the DCT is applied directly to the entire image. Next, the decryption keys (the random seeds in our implementation) are provided to clients through the unicast channel. The encrypted image and the decryption-related information are broadcast to all clients. By combining the decryption keys and the decryption-related information, each client is able to decrypt most of the encrypted coefficients and obtain his or her own fingerprinted image.
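A minimal sketch of this server-side scrambling step is shown below, using SciPy's full-frame 2-D DCT. The selected index set would in practice come from the GA-trained chromosome of Sec. 8.3.2, and the function name is our own; this is an illustration of the sign-inversion idea, not the authors' exact implementation.

```python
import numpy as np
from scipy.fft import dctn, idctn

def encrypt_image(image, selected):
    """Server-side scrambling (sketch): full-frame DCT, then invert the
    sign of the coefficients chosen by the GA chromosome."""
    coeffs = dctn(image.astype(float), norm="ortho")   # DCT over the entire image
    flat = coeffs.ravel()                              # view into coeffs
    flat[selected] = -flat[selected]                   # invert the selected signs
    return idctn(coeffs, norm="ortho"), flat           # scrambled image + coefficients
```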
Fig. 8.1. The process of fingerprint embedding in this chapter
Figure 8.1 demonstrates the process of fingerprint embedding. First, we randomly select a decryption key for each client. The decryption key is regarded as a random seed at the client side. Then, GA is applied to choose the suitable DCT coefficients to be encrypted, and to generate a coefficient decryption table. The coefficient decryption table is shared among all clients. Finally, the encrypted image and the coefficient decryption table are delivered to all clients with the broadcast method, while the decryption keys are sent to each client with the unicast method. An example of the coefficient decryption table is depicted in Fig. 8.2. An entry of the coefficient decryption table contains at most m coefficient indices. Each index in Fig. 8.2 corresponds to a DCT coefficient. A larger coefficient decryption table requires more bandwidth for transmission, and
  Entry | Coefficient #1 | ... | Coefficient #m
    1   |       44       | ... |      −1
    2   |      173       | ... |      35
    3   |       62       | ... |      44

Fig. 8.2. The coefficient decryption table
consequently adds variety to the encryption and decryption processes. At the client side, the decryption key serves as a random seed to select K entries from the coefficient decryption table at random, and then to decrypt the coefficients indexed by the selected entries. Some encrypted coefficients will not be decrypted, and these form a fingerprint for the user. Figure 8.3 presents another example of the decryption and fingerprinting processes. On the left side of Fig. 8.3, suppose that an image of size 3 × 3 is to be protected. Both the encrypted DCT coefficients and the coefficient decryption table are ready to be broadcast to all clients by the server. The encrypted DCT coefficients are represented by dark blocks; namely, Coefficients 2, 3, 4, 6 and 8 are encrypted at the server side. On the right side of Fig. 8.3, suppose that two clients have received the encrypted image. With their own decryption keys, the two clients choose three entries from the coefficient decryption table and decrypt the six corresponding DCT coefficients. Since the clients have no knowledge of the exact combination of coefficients that were encrypted at the server side, some coefficients may be left unchanged during the decryption process. On the one hand, some encrypted coefficients that are not selected remain encrypted. On the other hand, if
Fig. 8.3. An example for the decryption and fingerprinting processes of the proposed method
some coefficient is not encrypted by the server, it may become fingerprinted at the client side because of the coefficient decryption table. We can see that Coefficient 7 for Client 1 and Coefficient 9 for Client 2 both exhibit this phenomenon. More precisely, let X be the candidate coefficients to be encrypted in the original image, and let $\hat{X}$ be the candidate coefficients extracted from the fingerprinted image of client u. The fingerprint of client u can be calculated with Eq. (8.2),

$$ F_u(i) = \frac{1}{2} \left( 1 + \mathrm{sign}(X(i)) \cdot \mathrm{sign}(\hat{X}(i)) \right), \qquad 1 \le i \le L, \qquad (8.2) $$

where

$$ \mathrm{sign}(x) = \begin{cases} 1, & x \ge 0; \\ -1, & x < 0; \end{cases} $$
and L is the number of candidate coefficients. Thus, the fingerprint of the client is a binary string. If an encrypted coefficient is not decrypted, the corresponding bit is equal to 1; otherwise, it is equal to 0. Since the decryption keys associated with the clients are chosen randomly, the genetic algorithm is used to select proper coefficients for encryption and to generate the coefficient decryption table such that each client will leave some encrypted coefficients to form a fingerprint. Therefore, the chromosomes of the genetic algorithm consist of a binary string and an integer array. The length of the binary string is equal to the length of X. When a bit in the string is equal to '1', it indicates that the corresponding coefficient should be encrypted at the server side; when the bit is equal to '0', the coefficient is left unchanged. The integer array represents a coefficient decryption table. The value of each integer is regarded as the index of a coefficient to be decrypted, and a special value −1 is used to indicate that the table cell does not point to any coefficient. From the discussion above, we observe that there is a great deal of flexibility in implementing the encryption and fingerprinting, and GA is suitable for this task given a properly designed fitness function. In designing the fitness function, we consider the following goals:

1. The original image and the encrypted one should be as dissimilar as possible.
2. The correctly decrypted image of each client should be as similar to the original image as possible.
3. Different fingerprints should be as diverse as possible, in order to differentiate specific clients.
4. The statistical difference between different fingerprints should be as large as possible.

Therefore, the fitness function proposed for training with the genetic algorithm, containing four different parts, is depicted in Eq. (8.3),
$$ \mathrm{fitness} = -\omega_1 \cdot \mathrm{PSNR}_e + \omega_2 \cdot \sum_{u=1}^{U} \mathrm{PSNR}_u - \omega_3 \cdot \sum_{i=1}^{U} \sum_{\substack{j=1 \\ j \ne i}}^{U} \mathrm{sim}(F_i, F_j) + \omega_4 \cdot \sum_{i=1}^{U} \sum_{\substack{j=1 \\ j \ne i}}^{U} \mathrm{diff}(F_i, F_j), \qquad (8.3) $$
where ω_i, i = 1, 2, 3, 4, are weighting factors for each term and U is the number of clients. In the first two terms, corresponding to image quality, PSNR_e is the peak signal-to-noise ratio (PSNR) of the encrypted image, and PSNR_u is the PSNR of the decrypted image of client u. Next, for the remaining two terms relating to the fingerprints, and in order to reduce the computational complexity, the similarity and difference measures between different fingerprints, sim(·) and diff(·), are defined as follows:

$$ \mathrm{sim}(F_i, F_j) = \begin{cases} 1, & \text{if } L_1(F_i \cdot F_j) > T, \\ 0, & \text{otherwise}; \end{cases} \qquad (8.4) $$

$$ \mathrm{diff}(F_i, F_j) = \begin{cases} 1, & \text{if } L_1(F_i \cdot F_j) < t, \\ 0, & \text{otherwise}; \end{cases} \qquad (8.5) $$

where F_i and F_j denote fingerprints i and j, and T and t are the corresponding pre-defined thresholds. By doing so, we are ready to maximize the fitness value in Eq. (8.3). With this fitness function, we hope to find an encryption manner that degrades the visual quality of the encrypted image while preserving the quality of the clients' decrypted images. Moreover, the fingerprints of any two clients should be only partly different, to prevent a comparison attack. We can also take into account the robustness of the fingerprints to certain image processing methods; this would place the fingerprints in more secure coefficients. Consequently, the GA described in Sec. 8.2 can be employed to find proper coefficients to encrypt and to form the coefficient decryption table. Since the decryption key associated with a specific client is only a random seed, the bandwidth required to send a decryption key is almost negligible. Hence, by using a coefficient decryption table of a proper size, we can distribute the fingerprinted image efficiently.
8.4 Fingerprint Detection

When client u decrypts an encrypted image, a fingerprint F_u is embedded into the image immediately. If a redistributed copy of the image is obtained, we can detect the fingerprint F_u with Eq. (8.6),
$$ \mathrm{similarity} = \frac{1}{L} \left( F_u \cdot S \right), \qquad (8.6) $$

and

$$ S(i) = \frac{1}{2} \left( 1 + \mathrm{sign}(X(i)) \cdot \mathrm{sign}(\tilde{X}(i)) \right), \qquad 1 \le i \le L, $$
where $\tilde{X}$ denotes the candidate coefficients for encryption, extracted from the redistributed image. If the similarity calculated by Eq. (8.6) is larger than a pre-defined threshold T, the redistributed image is considered to be embedded with the fingerprint F_u.
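A short sketch of this detection test follows, reusing the hypothetical sgn helper from the previous sketch; the default threshold value is illustrative only.

```python
def detect(fp, original, redistributed, threshold=0.9):
    """Eq. (8.6): normalized correlation of a stored fingerprint with the
    sign pattern recovered from a redistributed copy."""
    s = (1 + sgn(original) * sgn(redistributed)) // 2
    similarity = float(np.dot(fp, s)) / len(fp)
    return similarity > threshold, similarity
```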
8.5 Experimental Results

In this section, we present some experimental results to evaluate the performance of the proposed method. A 256 × 256, 8-bit/pixel gray-scale image, Lena, is used as the test image in all experiments. This test image is encrypted and transmitted to 5 different clients. 15000 DCT coefficients in the middle frequencies are chosen as candidate coefficients for encryption, that is, L = 15000. The coefficient decryption table contains 15000 entries, and each entry can keep at most 2 coefficient indices. After receiving the decryption key, each client randomly selects 70% of the entries of the coefficient decryption table and decrypts the coefficients they index.

Fig. 8.4. Fitness values of the best and worst chromosomes over 450 generations (x-axis: generation, 0 to 450; y-axis: fitness value)
Fig. 8.5. The encrypted image, PSNR = 25.24 dB
Four weighting factors in the fitness function are assigned to be ω1 = 0.5, ω2 = 1, ω3 = 5, and ω4 = 5. Since the coefficient modification method in this chapter is inherently robust against image processing operations, the fitness function does not take a robustness measure into consideration. Even though there is no robustness measure in the fitness function, we still evaluate the robustness of the proposed fingerprinting scheme under general image processing operations. First, the genetic algorithm is applied to find proper coefficients to be encrypted, and a corresponding coefficient decryption table for the five clients is generated. Corresponding to Sec. 8.2, in our experiments, the number of chromosomes is N = 200, the length of each chromosome is l = 45000, the selection and mutation rates are set to ps = 0.5 and pm = 0.04, respectively, and the number of training iterations is 450. Figure 8.4 depicts the fitness values of the best and the worst chromosomes over all 450 generations. The resulting image encrypted by the best chromosome of the GA is shown in Fig. 8.5, with PSNR = 25.24 dB.
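For orientation, a schematic training loop with the parameters reported above is sketched below. It is only an outline under assumptions: the chromosome is reduced to its binary coefficient-selection string (the integer-array part encoding the decryption table is omitted for brevity), `evaluate` is assumed to wrap the fitness of Eq. (8.3), and truncation selection with one-point crossover stands in for whatever variant Sec. 8.2 actually uses.

```python
import numpy as np

N, CHROM_LEN = 200, 45000        # population size, chromosome length
PS, PM = 0.5, 0.04               # selection rate, mutation rate
GENERATIONS = 450

def evolve(evaluate, rng=np.random.default_rng(0)):
    # evaluate: maps a 0/1 chromosome to its fitness value, Eq. (8.3).
    pop = rng.integers(0, 2, size=(N, CHROM_LEN))
    for _ in range(GENERATIONS):
        scores = np.array([evaluate(c) for c in pop])
        order = np.argsort(scores)[::-1]             # best chromosomes first
        parents = pop[order[:int(N * PS)]]           # truncation selection
        children = []
        while len(children) < N - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = int(rng.integers(1, CHROM_LEN))    # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child[rng.random(CHROM_LEN) < PM] ^= 1   # bit-wise mutation
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([evaluate(c) for c in pop])
    return pop[int(np.argmax(scores))]               # best chromosome found
```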
Table 8.1. The PSNR values of five fingerprinted images, corresponding to five clients

Client number                         1      2      3      4      5
PSNR for fingerprinted image (dB)   43.35  43.42  43.31  43.22  43.54
Fig. 8.6. The image with the worst quality among 5 decrypted and fingerprinted images, PSNR = 43.22 dB
The fingerprinted image with the lowest image quality among the five clients is illustrated in Fig. 8.6, with PSNR = 43.22 dB. Figure 8.7 demonstrates the resulting image decrypted with a randomly generated decryption key, which does not belong to any of the five clients.
Fig. 8.7. The image decrypted with a random key, PSNR = 28.64 dB
We can see that the randomly decrypted image has poor image quality and is not suitable for commercial use. As shown in Fig. 8.7, after the encryption process, from a subjective point of view the image content is still visible, but the image quality is too low to have commercial value. After the decryption process, although the decrypted image contains a fingerprint, it is imperceptible to human eyes. As depicted in Table 8.1, the PSNR values of all fingerprinted images are higher than 43 dB, and the one with the lowest PSNR value is illustrated in Fig. 8.6, with acceptable quality from a subjective point of view. Moreover, based on our algorithm design, any two fingerprints are only partly different, so the collusion attack can be prevented. In the second set of experiments, the five fingerprinted images are attacked with three general image-processing methods: (1) low-pass filtering, (2) high-pass filtering, and (3) JPEG compression. The detection values are presented in Table 8.2. From Table 8.2, it can be seen that although the fitness function does not take these attacks into account, the fingerprints
Table 8.2. The detection values of fingerprints after various image processing operations

Client number                            1      2      3      4      5
Low-pass filtering                      0.92   0.93   0.95   0.91   0.91
High-pass filtering                     0.99   0.98   0.99   0.99   0.99
JPEG compression, quality factor = 15  0.717  0.706  0.693  0.660  0.683
are still robust to these intentionally applied image processing schemes, because the proposed algorithm is inherently robust to such processing. Thus, the fingerprint can hardly be removed unless the image quality is severely degraded. Finally, in our experiments, the encrypted Lena image is transmitted to five clients and then fingerprinted. As discussed in Sec. 8.3.1, the efficiency of our method is
$$\eta = \frac{m_D}{m_0} = \frac{1 + 0.9155}{5} = 0.3831,$$
where the term 0.9155 is the bandwidth required to broadcast the coefficient decryption table. This result, 0.3831, is lower than the results presented in [2, 4, 17, 19], which have η = 0.4. Moreover, the fingerprinted images have acceptable to good visual quality. Summing up, the proposed method is very suitable for applications that need to transmit fingerprinted media in a broadcast environment.
8.6 Conclusion
In this chapter, we propose a GA-based joint fingerprinting and decryption (JFD) method. The method randomly selects decryption keys for the clients, and then generates an encryption key and decryption-related information with the genetic algorithm. The media data is then encrypted and multicast to all clients. At the same time, a secure channel is used to unicast a designated decryption key to each client. When a client uses the designated key to decrypt the received media data, a designated fingerprint is embedded in the data correspondingly. In summary, based on our observations, the proposed method has three features: 1. adaptive fingerprinting, 2. effective transmission, and 3. security and imperceptibility. Experimental results show that the proposed method has the ability to transmit media data to clients effectively and causes only a slight degradation
in perceptual quality. In addition, the proposed method has the capability to resist some attack methods when an appropriate encryption method is adopted.
References 1. Bloom, J.A.: Security and rights management in digital cinema. In: Int’l Conf. on Multimedia and Expo, pp. 621–624 (2003) 2. Boneh, D., Shaw, J.: Collusion-secure fingerprinting for digital data. IEEE Trans. Information Theory 44, 1897–1905 (1998) 3. Chang, F.C., Huang, H.C., Hang, H.M.: Layered access control schemes on watermarked scalable media. Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology 49, 443–455 (2007) 4. Chu, H.H., Qiao, L., Nahrstedt, K.: A secure multicast protocol with copyright protection. ACM Computer Communication Review 32, 42–60 (2002) 5. Chuang, J.C.I., Sirbu, M.A.: Pricing multicast communication: A cost-based approach. Telecommunication Systems 17, 281–297 (2001) 6. Cox, I.J., Miller, M.L., Bloom, J.A.: Digital Watermarking: Principles & Practice. Morgan Kaufman, Los Altos (2001) 7. Gen, M., Cheng, R.: Genetic Algorithms and Engineering Design. Wiley, New York (1997) 8. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading (1989) 9. Hartung, F., Girod, B.: Digital watermarking of MPEG-2 coded video in the bitstream domain. In: IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing, pp. 2621–2624 (1997) 10. Holland, J.H.: Adaptation in Natural and Artificial Systems. The MIT Press, Cambridge (1992) 11. Judge, P., Ammar, M.: WHIM: Watermarking multicast video with a hierarchy of intermediaries. Computer Networks 39, 699–712 (2002) 12. Kunder, D., Karthik, K.: Video fingerprinting and encryption principles for digital rights management. Proc. of the IEEE 92, 918–932 (2004) 13. Luh, W., Kundur, D.: Digital media fingerprinting: Techniques and trends. In: Multimedia Security Handbook, ch. 19. CRC, Boca Raton (2004) 14. Macq, B.M., Quisquater, J.J.: Cryptology for digital TV broadcasting. Proc. of the IEEE 83, 944–957 (1995) 15. Pan, J.S., Huang, H.C., Jain, L.C. (eds.): Intelligent Watermarking Techniques. World Scientific Publishing Company, Singapore (2004) 16. Pan, J.S., Huang, H.C., Jain, L.C., Fang, W.C. (eds.): Intelligent Multimedia Data Hiding. Springer, Heidelberg (2007) 17. Parviainen, R., Parnes, P.: Large scale distributed watermarking of multicast media through encryption. In: IFIP Conference Proceedings on Communications and Multimedia Security, pp. 149–158 (2001) 18. Shieh, C.S., Huang, H.C., Wang, F.H., Pan, J.S.: Genetic watermarking based on transform domain techniques. Patt. Recog. 37, 555–565 (2004)
19. Thanos, D.: COiN-Video: A model for the dissemination of copyrighted video streams over open networks. In: 4th International Workshop on Information Hiding, pp. 169–184 (2001) 20. Wu, T.L., Wu, S.F.: Selective encryption and watermarking of MPEG video. In: Int’l Conf. on Image Science, Systems and Technology, pp. 261–269 (1997) 21. Zhao, H., Liu, K.J.R.: Bandwidth efficient fingerprint multicast for video streaming. In: IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing, pp. 849–852 (2004)
9 Lossless Data Hiding for Halftone Images Fa-Xin Yu1 , Hao Luo1 , and Shu-Chuan Chu2 1
2
School of Aeronautics and Astronautics, Zhejiang University, 310027, Hangzhou, 101-8430, P.R. China
[email protected],
[email protected] Department of Information Management, Cheng Shiu University, Kaohsiung County 833, Taiwan, ROC
[email protected] Summary. With the development of image halftoning techniques and computer networks, a large quantity of digital halftone images are produced and transmitted in the Internet. Meanwhile, hiding data in images becomes a powerful approach for covert communications, copyright protection, owner announcement, content authentication, traitor tracing, etc. This chapter proposes two look-up table based methods to hide data in the cover media of halftone images. Both of them belong to lossless data hiding techniques, i.e., not only the secret data but also the original cover image can be accurately recovered in data extraction when the stego image is intact. Besides, an application example of the proposed method in lossless content authentication is illustrated. Furthermore, a data capacity enhancement strategy is introduced based on a statistical property of error diffused halftone image micro structure.
9.1 Introduction
Digital halftoning is a process to transform continuous-tone images into two-tone images, e.g., from 8-bit gray level images to 1-bit binary images. Halftone images can resemble their original versions when viewed from a distance, owing to the low-pass filtering of the human eye. Popular halftoning techniques can be divided into three categories: ordered dithering [20], error diffusion [3], and direct binary search [10]. Among these, error diffusion based techniques achieve a preferable tradeoff between high visual quality and reasonable computational complexity. In the past several decades, halftone images have been widely used in hard copies such as books, magazines, printer outputs and fax documents; they are produced by digital halftoning during printing. In recent years, the digital halftone image itself has also drawn much attention among researchers because of the many properties it provides. For example, it can be
regarded as a compressed version of the original continuous-tone image. This property can be used in continuous-tone image tamper detection, localization and distortion recovery [12]. For another example, digital halftone images can be used in low-level display devices or in situations of narrow-band transmission [19]. Consequently, it is also desirable to embed data in this special kind of image for the applications of covert communication, copyright protection, content authentication, and tamper detection. Up to now, many data hiding techniques have been developed for still continuous-tone images, audio, video, 2D vector data, 3D meshes [14], etc. In contrast, only a small number of data hiding approaches are available for halftone images. Different from gray level or color images, there are mainly three challenges in embedding data in halftone images. The first is that less information redundancy exists, since each pixel value is either black or white. Consequently, many data hiding approaches, such as some transform domain based techniques for continuous-tone images, cannot be directly transplanted to halftone images. Another challenge is the visual quality degradation. To insert data in halftone images, the change of a pixel value is either from black to white or vice versa. According to the study of the human visual system (HVS), human eyes are sensitive to the abrupt changes aroused by data embedding, e.g., new appearances of white crosses and black crosses. The third challenge is the lower capacity compared with continuous-tone image data hiding. High capacity is one of the key factors in evaluating the performance of data hiding techniques. In fact, for halftone images, this challenge is closely related to the former two: a large quantity of data is difficult to embed into halftone images, considering the high distortion and the little information redundancy that can be employed. In recent years, a set of data hiding methods for halftone images has been reported. These approaches can be divided into three categories:
(1) Pixel-based: change the values of individual pixels that are usually randomly selected [5].
(2) Block-based: partition the original image into pixel blocks and modify the characteristics of some blocks [1, 6, 9].
(3) Hybrid-based: insert data by combining the characteristics of pixel-based and block-based approaches [15, 16].
However, most existing data hiding methods cannot recover the original image because of the irreversible distortion introduced. Although the distortion is slight, it may not satisfy the requirements of some specific applications, where content accuracy of the host image must be guaranteed, e.g., military maps, medical images, great works of art, etc. Therefore, it is quite necessary to develop lossless data hiding methods for halftone images. However, till the present time, little attention has been paid to this. This chapter proposes two methods to hide data in the cover media of halftone images. Both of them belong to lossless data hiding [4], i.e., not only the watermark but also the
original cover image can be accurately recovered in data extraction when the stego image is intact. The rest of this chapter is organized as follows. Section 9.2 briefly reviews the popular used digital halftoning techniques. Section 9.3 describes the two proposed methods. Furthermore, an example of lossless content authentication for halftone image is illustrated. Besides, a capacity enhancement strategy based on micro structure statistical feature of halftone is introduced. Section 9.4 concludes the whole chapter.
9.2 Digital Halftoning
As digital halftone images are the products of digital halftoning rather than natural images, the principle of digital halftoning is briefly reviewed in this section. Ordered dithering and error diffusion are two kinds of popularly used digital halftoning techniques. In ordered dithering, the input continuous-tone image is compared with a periodic threshold matrix that is also called a halftone screen. For example, a dispersed-dot dithering halftone screen is shown in Fig. 9.1. Suppose HS denotes the halftone screen. The ordered dithering process is quite simple. First, the input image is partitioned into non-overlapping blocks, with each block of the same size as HS. Then we compare the pixel values in each block CT with HS as
$$HT(i, j) = \begin{cases} 255 & \text{if } CT(i, j) \ge HS(i, j) \\ 0 & \text{if } CT(i, j) < HS(i, j) \end{cases}, \qquad (9.1)$$
where (i, j) are the pixel's coordinates, HT is the output block, and 255 and 0 denote a white pixel and a black pixel, respectively.
Fig. 9.1. A dispersed-dot dithering halftone screen
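A minimal sketch of Eq. (9.1) in NumPy, assuming an 8-bit input image and any rectangular screen such as the one in Fig. 9.1:

```python
import numpy as np

def ordered_dither(ct, hs):
    # ct: 8-bit gray-level image; hs: halftone screen (threshold matrix).
    # The screen is tiled over the image and Eq. (9.1) is applied per pixel.
    h, w = ct.shape
    sh, sw = hs.shape
    tiled = np.tile(hs, (h // sh + 1, w // sw + 1))[:h, :w]
    return np.where(ct >= tiled, 255, 0).astype(np.uint8)
```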
Fig. 9.2. Flow chart of the error diffusion halftoning
Compared with ordered dithering, error diffusion is a halftoning technique which can generate higher quality halftone images. Its processing flow chart is shown in Fig. 9.2. When halftoning a continuous-tone image line by line sequentially, the past error is diffused to the current pixel. Here x(i, j) is the current processing pixel and x′(i, j) is the diffused error sum accumulated from the neighboring processed pixels; b(i, j) represents the binary output at position (i, j); u(i, j) is the modified gray output and e(i, j) is the difference between u(i, j) and b(i, j). The relationships of these variables are given below:
$$u(i, j) = x(i, j) + x'(i, j), \qquad (9.2)$$
$$x'(i, j) = \sum_{m=0}^{1} \sum_{n=-1}^{1} e(i + m, j + n)\, k(m, n), \qquad (9.3)$$
$$e(i, j) = u(i, j) - b(i, j), \qquad (9.4)$$
$$b(i, j) = \begin{cases} 255 & \text{if } u(i, j) \ge t \\ 0 & \text{if } u(i, j) < t \end{cases}, \qquad (9.5)$$
where the parameter t in Eq. (9.5) is a threshold usually set to 128. From Fig. 9.2, it is easily found that different kernels k(m, n) correspond to different visual qualities of halftone images. Nowadays, Floyd-Steinberg [3], Jarvis [7] and Stucki [18] are three popular error diffusion kernels applied to transform gray level images into halftone images. These kernels are shown in Fig. 9.3. In this chapter, all test halftone images are produced by error diffusion halftoning of an 8-bit gray level image using the Floyd-Steinberg kernel in Eq. (9.3).
Fig. 9.3. Error diffusion kernels: Floyd-Steinberg, Jarvis and Stucki (from left to right; X is the current pixel)
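The following is a hedged sketch of Floyd-Steinberg error diffusion in the spirit of Eqs. (9.2)-(9.5); it uses the classic push-style weights (7/16, 3/16, 5/16, 1/16) and a plain left-to-right raster scan, which is one common variant rather than necessarily the exact pipeline of Fig. 9.2.

```python
import numpy as np

def floyd_steinberg(gray, t=128):
    # gray: 8-bit gray-level image; returns a 0/255 halftone image.
    u = gray.astype(float).copy()
    h, w = u.shape
    out = np.zeros((h, w), np.uint8)
    # (dy, dx, weight): right, lower-left, lower, lower-right neighbors
    kernel = [(0, 1, 7/16), (1, -1, 3/16), (1, 0, 5/16), (1, 1, 1/16)]
    for i in range(h):
        for j in range(w):
            b = 255 if u[i, j] >= t else 0     # thresholding, Eq. (9.5)
            out[i, j] = b
            e = u[i, j] - b                    # quantization error, Eq. (9.4)
            for dy, dx, k in kernel:
                y, x = i + dy, j + dx
                if 0 <= y < h and 0 <= x < w:
                    u[y, x] += e * k           # diffuse the error forward
    return out
```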
9.3 Look-Up Table Based Lossless Halftone Image Data Hiding

9.3.1 Information Redundancy Investigation
In general, the first task of data hiding is to investigate the information redundancy of the cover media. Although a halftone image is a 1-bit binary image, some information redundancy can still be found and exploited. Suppose the cover image is partitioned into a set of non-overlapping 4 × 4 blocks. Obviously, a 4 × 4 binary block has a total of 2^16 different patterns. Assume each 4 × 4 block is rearranged into a binary sequence and transformed into a decimal integer. As a result, each pattern is uniquely associated with an integer called the pattern index (PI) in the range of [0, 2^16 − 1]. We count the appearance times (AT) of each pattern according to its PI, and thus a histogram can be constructed (a sketch of this construction is given after the list below), where the x-axis is the PI and the y-axis is the AT. This histogram is called the pattern histogram (PH). Fig. 9.4 shows the PHs of the six 512 × 512 halftone images shown in Fig. 9.18, respectively. From Fig. 9.4 we can see that the distribution of these bins is statistically uneven. The exploited information redundancy is motivated by this distribution property. In particular, two kinds of information redundancy can be used to embed data.
1. A quite small portion of patterns appear many times, while some other patterns never appear.
2. A quite small portion of patterns appear many times, while some other patterns appear only once.
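A small sketch of the pattern histogram construction, assuming a 0/255 halftone image whose sides are multiples of 4:

```python
import numpy as np

def pattern_histogram(halftone):
    # Map each non-overlapping 4x4 block to its pattern index PI in
    # [0, 2^16 - 1] and count the appearance times (AT) of each pattern.
    bits = (halftone > 0).astype(np.uint32)
    h, w = bits.shape
    weights = 1 << np.arange(16, dtype=np.uint32)   # bit weight per cell
    ph = {}
    for i in range(0, h, 4):
        for j in range(0, w, 4):
            pi = int(np.dot(bits[i:i+4, j:j+4].ravel(), weights))
            ph[pi] = ph.get(pi, 0) + 1              # AT of this pattern
    return ph
```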
Fig. 9.4. Pattern histograms of Lena, F16, Baboon (above row, from left to right), Boat, Pepper and Barbara (below row, from left to right)
From plenty of experimental results, usually only 3000 or so kinds of patterns appear in a halftone image; in contrast, a majority of patterns never appear or appear only once. According to these two kinds of statistical properties, two lossless data hiding schemes are designed.

9.3.2 Scheme 1
From the inherent properties and the investigated information redundancy of halftone images, we can see that it is not appropriate to embed data by directly changing the values of some selected pixels. Instead, a hybrid-based data embedding strategy is more reasonable. That is, some appropriate blocks are selected and their properties are changed, and meanwhile some other pixels are selected for side information embedding. In Scheme 1, we use the first property observed from the PH: a quite small portion of patterns appear many times, while some other patterns never appear. The block diagram of data embedding and extraction is shown in Fig. 9.5 and the details are described as follows.
Fig. 9.5. Block diagram of data embedding and extraction of Scheme 1
Data embedding
As shown in Fig. 9.5, the cover image is partitioned into non-overlapping 4 × 4 blocks cb = {cb1, cb2, . . . , cbk}. In both of our schemes, the data embedding operation is aided by a look-up table (LUT) composed of a set of similar pattern pairs. Therefore, the LUT must be constructed in advance. In Scheme 1, the similarity metric is the Hamming distance. In particular, we define that a 4 × 4 binary pattern has 16 similar patterns: if one pixel value of the original pattern is changed, a corresponding similar pattern is obtained. In other words, the Hamming distance between the original and the similar pattern is 1. An example of a 4 × 4 original pattern and its similar patterns is shown in Fig. 9.6.
Fig. 9.6. An example of an original block and its similar blocks
Fig. 9.7. A LUT example of the halftone Lena image with the similarity metric of Hamming distance
According to the PH, the ATs of all patterns that ever appear are sorted in ascending order. Then we select the r patterns with the largest ATs, h = {h1, h2, . . . , hr}, and find their corresponding similar patterns l = {l1, l2, . . . , lr}. The procedure is given below. For a block pattern h, if all of its 16 similar patterns appear at least once in the cover image, the pattern h is discarded. If one or more similar blocks do not appear, we randomly select one of them as its similar block. In this way, a similar pattern pair is found. All patterns in h are investigated like this, and the corresponding similar patterns are recorded in l. Besides, the ATs are also recorded for the embedding capacity computation (this information is not required in data extraction). Hence a LUT is constructed. It is necessary to note that each pattern in l must be different from any other one. As an example shown in Fig. 9.7, a LUT is constructed for the halftone Lena image, with the sizes of the Lena image and of a block being 512 × 512 and 4 × 4 pixels, respectively. Only the 5 patterns with the highest frequency of appearance and their similar patterns are given. In general, the size of the LUT is determined by the quantity of secret data; more secret data to be embedded requires a larger LUT. The data embedding operation is called similar pair toggling. In particular, we examine the original blocks cb in the cover image from the first one to the last one. If a block is the same as one of the patterns in h, one bit of secret data w can be hidden according to the following rule.
If "0" is to be embedded, we do not change the current block; if "1" is to be embedded, it is replaced with its corresponding similar pattern in l. The operation is shown in Eq. (9.6),
$$sb = \begin{cases} h & \text{if } w = 0 \\ l & \text{if } w = 1 \end{cases}, \qquad (9.6)$$
where sb denotes the stego block. Hence we can see that each similar pair toggling is associated with one bit of embedding. Since different images have different LUTs, no universal LUT is suitable for all images. Nevertheless, the LUT is quite essential in data embedding and extraction. In [9], it is transmitted along with the stego image to the decoder in a secure way. In that case, the security of the LUT must be carefully considered, some extra channel resource must be used in the transmission, and some storage space is also required. In Scheme 1, however, these insufficiencies are overcome in that the LUT is also hidden in the cover image along with the secret data. Since the operations are similar to those in Scheme 2, they are presented in that context. After the secret data and the LUT are embedded, the stego image is obtained.

Data extraction
The data extraction and cover image recovery is the inverse process of the data embedding. As shown in Fig. 9.5, the stego image is partitioned into non-overlapping 4 × 4 blocks sb = {sb1, sb2, . . . , sbk}. Obviously, the LUT must be reconstructed first: we use the same key to localize the LUT embedded blocks, retrieve these block pixel values, and rearrange them into the LUT. Next, we compare each block in the stego image with the patterns in the LUT. If it is the same as one of the patterns in h, one bit of secret data "0" is extracted, because the original block was not changed. If it is the same as one of the patterns in l, one bit of secret data "1" is extracted, for the original block was changed to its similar pattern. This can be expressed as in Eq. (9.7),
$$w = \begin{cases} 0 & \text{if } sb = h \\ 1 & \text{if } sb = l \end{cases}. \qquad (9.7)$$
After all data are extracted, the data is divided into two parts: the first part is the retrieved secret data and the second is the side information caused by the LUT embedding. In the cover image recovery, on one hand, when we come across a pattern of l, it is replaced with its corresponding pattern h; on the other hand, the side information is set back into its original positions localized by the key.
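In code, the toggling of Eqs. (9.6) and (9.7) reduces to a pair of dictionary look-ups over pattern indices; the sketch below assumes blocks are represented by their 16-bit PIs and that `lut` maps each frequent pattern h to its never-appearing similar pattern l.

```python
def embed_bit(block_pi, bit, lut):
    # Similar pair toggling, Eq. (9.6): keep the pattern h for bit 0,
    # replace it by its similar pattern l for bit 1.
    return lut[block_pi] if bit == 1 else block_pi

def extract_bit(block_pi, lut, inv_lut):
    # Eq. (9.7): an h pattern decodes as 0, an l pattern as 1; blocks
    # matching neither group carry no payload.
    if block_pi in lut:       # pattern is some h: untouched block
        return 0
    if block_pi in inv_lut:   # pattern is some l: restore h = inv_lut[block_pi]
        return 1
    return None
```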
Application in lossless content authentication
Data hiding techniques play an important role in halftone image authentication. Kim and Afif introduce an authentication watermark AWST (authentication watermarking by self toggling) for halftone images in [8]. It consists of the following steps: choosing a set of pseudo-random pixels in the image, clearing them, computing the message authentication code (MAC) or the digital signature (DS) of the random-pixels-cleared image, and inserting the resulting code into the selected random pixels. One disadvantage of AWST is that it cannot recover the original image in watermark extraction and image authentication, even when the host image has not been changed, because the randomly selected pixels it clears can never be recovered. In situations where the cover image content must be accurately maintained if no alterations are suffered, AWST fails. In contrast, lossless data hiding used in content authentication outperforms the conventional techniques in the aspect of reversibility. In this subsection, Scheme 1 is applied to halftone image content authentication. The cover image can be perfectly restored as long as it suffers no alteration; otherwise, even a single pixel change can be detected. The details are described below. On the data hiding side, the image hash is exploited as the secret data. Image hashing is known as the problem of mapping an image to a short binary string. An image hash function has the properties that perceptually identical images have the same hash value with high probability, while perceptually different images have independent hash values. In addition, the hash function is secure, so that an attacker cannot predict the hash value of a known image. Image hashing is one-way, collision-free and relatively easy to compute for any given image. Hence, the secret data can be viewed as adaptive, owing to its sensitivity to changes of the image. The generated hash sequence is hidden in the cover image using Scheme 1. On the content authentication side, the secret data is extracted and a "cover image" is recovered. Then a hash sequence is computed on the recovered image with the same hash function. Next, the extracted secret data and the computed hash sequence are compared. If they are exactly the same, a conclusion can be made that the original image suffered no alteration; otherwise, it was changed intentionally or unintentionally.

Experimental results
In the experiment, the 512 × 512 halftone Lena and Baboon images are selected to test the effectiveness of the method. As shown in Fig. 9.8(a), the halftone Lena image is divided into 4 × 4 blocks. The original secret data, i.e., the hash sequence of Lena, is computed by the MD5 hash function [17]. After translating the string into a "0-1" sequence, a 128-bit digest is obtained. Note that using other cryptographic hash functions in place of MD5 is also allowable. In authentication, we compare the secret data extracted from the stego image with the hash sequence computed from the restored image. If they are equal, the stego Lena can be confirmed intact. Both of them are equal to the original watermark, as shown in Figs. 9.8(g), 9.8(h) and 9.8(e).
Fig. 9.8. Experimental results on halftone Lena Image. (a) The original Lena, (b) the stego Lena without alteration, (c) the restored Lena of (b), (d) the stego Lena with alteration, (e) the restored Lena of (d), (f) the original secret data, (g) the secret data extracted from (b), (h) hash of (c), (i) the secret data extracted from (d), (j) hash of (e).
In contrast, if the stego Lena is tampered with by a mark "KUAS HIT" (Fig. 9.8(d)), the two sequences are different, as shown in Fig. 9.8(i) and Fig. 9.8(j). Therefore, we can make a judgment by virtue of whether the two sequences are equal or not: if equal, the image suffered no alteration; otherwise, it was changed. The Baboon image is also tested in this way. As shown in Fig. 9.9, the experimental results also verify the effectiveness of the scheme.
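To make the authentication flow concrete, here is a hedged sketch using MD5; `scheme1_embed` and `scheme1_extract_and_restore` are hypothetical wrappers around the embedding and extraction procedures described above, not functions defined in this chapter.

```python
import hashlib
import numpy as np

def protect(cover):
    # Hash the cover image and hide the 128-bit digest with Scheme 1.
    digest = hashlib.md5(cover.tobytes()).digest()
    bits = np.unpackbits(np.frombuffer(digest, dtype=np.uint8))
    return scheme1_embed(cover, bits)            # assumed Scheme 1 embedder

def authenticate(stego):
    # Extract the hidden hash, restore the cover, and recompute its hash.
    bits, restored = scheme1_extract_and_restore(stego)   # assumed decoder
    digest = hashlib.md5(restored.tobytes()).digest()
    recomputed = np.unpackbits(np.frombuffer(digest, dtype=np.uint8))
    return np.array_equal(bits, recomputed)      # True iff image is intact
```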
9.3.3 Scheme 2
Different from Scheme 1, Scheme 2 exploits the second property observed from the PH: a quite small portion of patterns appear many times, while some other patterns appear only once. The original idea is motivated by the R-S algorithm developed in [4]. The block diagram of data embedding and extraction is shown in Fig. 9.10. Firstly, a LUT is constructed. It also consists of pairs of similar block patterns, which are selected according to the statistics combined with characteristics of the HVS. In this case, "0" and "1" represent the states of the two pattern groups of the LUT, respectively. Secondly, we search all blocks in the image: if one is the same as some pattern in the LUT, we record its state. Thus a state sequence can be obtained.
Fig. 9.9. Experimental results on halftone Baboon Image. (a) The original Baboon, (b) the stego Baboon without alteration, (c) the restored Baboon of (b), (d) the stego Baboon with alteration, (e) the restored Baboon of (d), (f) the original secret data, (g) the secret data extracted from (b), (h) hash of (c), (i) the secret data extracted from (d), (j) hash of (e).
Thirdly, this sequence is losslessly compressed, and the saved space is filled with the hidden data and some side information. Here the side information refers to the extra data arising from the LUT embedding. Fourthly, the data is hidden by similar pattern toggling with reference to the new sequence. The last step is to insert the LUT with a secret key, whereupon the stego halftone image is obtained. In the data extraction stage, the LUT must be reconstructed first, and the other procedures are just the inverse process of data hiding. Details of Scheme 2 are described as follows.

Data embedding
The cover image is partitioned into non-overlapping 4 × 4 blocks. Different from the Hamming distance used as the similarity metric in Scheme 1, Scheme 2 employs human visual system (HVS) characteristics to reduce the introduced visual distortion. According to the study on the HVS [13], the spatial frequency sensitivity of human eyes is usually estimated as a modulation transfer function. Specifically, the impulse response to a printed image of 300 dpi at a viewing distance of 30 inches is virtually identical to that of a Gaussian filter with σ = 1.5 and τ = 0.0095°.
Fig. 9.10. Block diagram of data embedding and extraction of Scheme 2
In our research, we adopt the 5 × 5 visual response filter given in [2] as
$$f = \frac{1}{11.566} \begin{bmatrix} 0.1628 & 0.3215 & 0.4035 & 0.3215 & 0.1628 \\ 0.3215 & 0.6352 & 0.7970 & 0.6352 & 0.3215 \\ 0.4035 & 0.7970 & 1.0000 & 0.7970 & 0.4035 \\ 0.3215 & 0.6352 & 0.7970 & 0.6352 & 0.3215 \\ 0.1628 & 0.3215 & 0.4035 & 0.3215 & 0.1628 \end{bmatrix}. \qquad (9.8)$$
Based on the statistics of the PH, the patterns h = {h1, h2, . . . , hr} with the r largest ATs are easily found. Suppose m = {m1, m2, . . . , mv} denotes those patterns with AT = 1. Then h and m are convolved with f, respectively, as
$$h_c = h \otimes f, \qquad (9.9)$$
$$m_c = m \otimes f. \qquad (9.10)$$
Obviously, the size of the convolution results h_c and m_c is 8 × 8 (a 4 × 4 matrix convolved with a 5 × 5 matrix). Next, the Euclidean distance d computed as in Eq. (9.11) is used to measure the similarity between h_c and m_c. The pattern in m with the smallest d to h_c is recorded and selected as the l of h,
$$d = \left\| h_c - m_c \right\|. \qquad (9.11)$$
Note that every pattern in l must be different from each other. In this way, a LUT composed of similar pairs (h, l) is constructed; h is the similar pattern of l and vice versa.
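A minimal sketch of this pairing step, assuming SciPy is available, that the patterns are 4 × 4 NumPy arrays, and that `f` is the normalized filter of Eq. (9.8):

```python
import numpy as np
from scipy.signal import convolve2d

def best_similar_pattern(h_block, rare_blocks, f):
    # Pair a frequent pattern h with the perceptually closest pattern
    # among those with AT = 1, following Eqs. (9.9)-(9.11).
    hc = convolve2d(h_block, f)          # Eq. (9.9): full 8x8 response
    best, best_d = None, np.inf
    for m in rare_blocks:
        mc = convolve2d(m, f)            # Eq. (9.10)
        d = np.linalg.norm(hc - mc)      # Euclidean distance, Eq. (9.11)
        if d < best_d:
            best, best_d = m, d
    return best                          # selected as the l of this h
```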
Fig. 9.11. A LUT example of the halftone Lena image with the similarity metric based on HVS characteristics
An example LUT constructed on the 512 × 512 halftone Lena image is shown in Fig. 9.11; only 10 similar pattern pairs are listed. The upper row shows the group h, and the lower row is the corresponding group l. From Fig. 9.7 and Fig. 9.11, we can find that although both are produced on the same Lena image, the same pattern in h may have a quite different similar pattern in l. In fact, the Hamming distances between the pairs in Fig. 9.11 are usually larger than 1. However, from the aspect of HVS characteristics, the LUT shown in Fig. 9.11 is better at reducing the visual distortion introduced by data embedding. Once the LUT is constructed, the LUT pattern states must be recorded. In particular, we examine all blocks in the original image. Whenever we come across a pattern in the LUT, we record its state based on the following rule: if it belongs to h, "0" is recorded; if it belongs to l, then "1" is recorded. Thus the state sequence S can be obtained. Next, S is losslessly compressed into Sc; in our case, the arithmetic coding algorithm is used. As the patterns in h have the highest frequency of appearance and those in l appear only once, the state sequence is clearly composed of many "0"s and only a small quantity of "1"s. Therefore, the compression ratio is expected to be very high, and much space is saved for hiding the secret data and side information. As shown in Fig. 9.12, a new state sequence S′ is produced by concatenating the compressed state sequence (Sc), the secret data (W) and the side information (SI). The key data embedding operation is again similar pair toggling. We examine the cover blocks from the first one to the last one. In Scheme 2, the task is to modulate the states of the patterns belonging to the LUT from the original states (S) to the new states (S′) based on the following rule.
Fig. 9.12. State sequence lossless compression and the saved space allocation
1. If the current block is the same as one of the patterns in l and the current new state sequence bit s′ is "0", we replace l with its similar pattern h.
2. If the current block is the same as one of the patterns in h and the current new state sequence bit s′ is "1", we replace h with its similar pattern l.
In the other two cases, blocks are unchanged. These operations can be expressed as follows,
$$cp' = \begin{cases} h, & \text{if } s' = 0,\ cp = l \\ h, & \text{if } s' = 0,\ cp = h \\ l, & \text{if } s' = 1,\ cp = l \\ l, & \text{if } s' = 1,\ cp = h \end{cases} \qquad (9.12)$$
where cp and cp′ are the currently processed cover block and stego block, respectively. The LUT is also embedded in the cover image, with the steps described as follows. It is rearranged into a binary sequence. A secret key is used to generate some random pixel locations to embed the LUT. Note that these pixels must not fall into the blocks of the LUT. We extract the selected pixel values into a binary sequence and embed it based on Eq. (9.12). Then we directly replace the selected pixels with the LUT sequence. This principle is illustrated in Fig. 9.13. The red blocks denote patterns used to extract states according to the LUT, while the blue points denote the pixels selected with the key; these pixel values are extracted as the side information SI, and then the rearranged LUT (a binary sequence) is inserted into these locations. For example, if the LUT shown in Fig. 9.11 is to be inserted, we need to select 320 pixel positions (blue points) and replace their pixel values using the rearranged LUT, and the original 320 pixel values are hidden in the same way as the watermark embedding (red blocks). More details of the LUT embedding method can be found in [11]. After the secret data and the side information are embedded, the stego image is produced.
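The state modulation of Eq. (9.12) above can be sketched as follows; it is an assumption-laden outline in which blocks are 16-bit pattern indices, `l_of` maps each h to its l, and `h_of` is the inverse map. Note that in a full implementation the number of bits in S′ would have to match the number of LUT blocks.

```python
def modulate_states(blocks, new_states, l_of, h_of):
    # blocks: pattern indices of all 4x4 blocks, in scan order.
    # new_states: bits of S' = compress(S) || W || SI.
    # Eq. (9.12): a LUT block becomes h when s' = 0 and l when s' = 1.
    s_bits = iter(new_states)
    out = []
    for b in blocks:
        if b in l_of:                    # currently an h pattern
            s = next(s_bits)
            out.append(b if s == 0 else l_of[b])
        elif b in h_of:                  # currently an l pattern
            s = next(s_bits)
            out.append(h_of[b] if s == 0 else b)
        else:
            out.append(b)                # not a LUT pattern: unchanged
    return out
```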
Fig. 9.13. Principle of LUT embedding
Data extraction
As shown in Fig. 9.10, the data extraction and cover image recovery is the inverse process of the data embedding. First, the stego image is partitioned into non-overlapping 4 × 4 blocks. Second, the LUT is reconstructed, i.e., the same key is used to localize the LUT embedded pixels, and their values are retrieved to rearrange the LUT. Third, the state sequence S′ is extracted by the same operation as the S extraction in data embedding. Obviously, S′ is composed of three parts: the first part is the compressed version of the original state sequence, the second part is the secret data, and the last part is the side information. Hence the secret data is retrieved. Moreover, the first part of S′ is decompressed into a binary sequence. The process is illustrated in Fig. 9.14. To recover the cover image, an inverse state modulation must be performed. If the current pattern cp′ is exactly the same as l and the current state bit s is "0", we replace it with its similar pattern h. If the current pattern cp′ is exactly the same as h and the current state bit s is "1", we replace it with its similar pattern l.
Fig. 9.14. Principle of LUT embedding
In the other two cases, block patterns are preserved. These operations are given as
$$cp = \begin{cases} h & \text{if } s = 0,\ cp' = l \\ h & \text{if } s = 0,\ cp' = h \\ l & \text{if } s = 1,\ cp' = l \\ l & \text{if } s = 1,\ cp' = h \end{cases}. \qquad (9.13)$$

Capacity Enhancement
From the analysis of Scheme 1 and Scheme 2, each similar pattern toggling corresponds to one bit of data embedding. For example, the LUT shown in Fig. 9.7 can provide 723 bits for data hiding (including the side information). Although this capacity is suitable for content authentication applications, it is not large enough for embedding a large quantity of secret data. This subsection exploits the micro structure statistical feature of error diffused halftone images to enhance the data capacity.
Fig. 9.15. 2×2 binary patterns
The original image is partitioned into non-overlapping 2 × 2 blocks instead of 4 × 4 blocks. Obviously, there are 16 patterns of a 2 × 2 binary block, P1, P2, . . . , P16, as shown in Fig. 9.15. We investigate their appearance frequencies in error diffusion halftone images. Through numerous experiments, it is striking to find that in most cases the two patterns P4 and P13 rarely appear compared with the other two, P7 and P10. If we consider a 2 × 2 binary pattern as a fine texture in halftone images, it is easy to understand that the error diffusion halftoning process produces a large quantity of P7 and P10 and, in contrast, much fewer P4 and P13. This is the micro structure statistical feature used for capacity enhancement. Although the Floyd-Steinberg, Jarvis and Stucki kernels vary in size and in weight values along different directions, a similar micro structure statistical feature can be obtained by performing error diffusion filtering with these kernels on the corresponding 8-bit gray level images. Consequently, the statistical features prevalently exist in these halftone images. For simplicity, in the following sections we focus only on images obtained by Floyd-Steinberg kernel halftoning. To illustrate the micro structure statistical features, an example is shown in Fig. 9.16, where six 512 × 512 halftone images (see Fig. 9.18), Lena, F16, Baboon, Pepper, Boat and Barbara, are examined.
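A small sketch that gathers these micro structure statistics is given below; the mapping from bit positions to the labels P1-P16 of Fig. 9.15 is an assumption, since the chapter does not spell out the ordering.

```python
import numpy as np

def count_2x2_patterns(halftone):
    # Count appearance times of the 16 possible 2x2 binary patterns
    # over non-overlapping 2x2 blocks of a 0/255 halftone image.
    bits = (halftone > 0).astype(np.uint8)
    h, w = bits.shape
    counts = np.zeros(16, dtype=np.int64)
    for i in range(0, h - h % 2, 2):
        for j in range(0, w - w % 2, 2):
            idx = (bits[i, j] | (bits[i, j+1] << 1) |
                   (bits[i+1, j] << 2) | (bits[i+1, j+1] << 3))
            counts[idx] += 1               # assumed block-to-label order
    return counts
```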
Fig. 9.16. 16 2×2 patterns’ appearance times in test images
Fig. 9.17. An LUT example in the capacity enhancement scheme
To enhance the data capacity, we only need to adopt the LUT shown in Fig. 9.17 instead of a larger LUT in Scheme 1 and Scheme 2. As the image is partitioned into smaller blocks and the number of block patterns in the group h is greatly increased, more similar pair toggling is available for data embedding. Besides, the LUT shown in Fig. 9.17 has many variations obtained by recombining the pairs (h1, l1) and (h2, l2), and since it is also hidden in the cover image, its security can be guaranteed.

Experimental Results
Six 512 × 512 error diffused halftone images, Lena, Baboon, F16, Boat, Pepper and Barbara, are selected to test the performance of the proposed method, as shown in Fig. 9.18. These halftones are obtained by performing Floyd-Steinberg error diffusion filtering on the 8-bit gray level images.
Fig. 9.18. Six test error diffused images Lena, F16, Baboon (above row, from left to right), Boat, Pepper and Barbara (below row, from left to right)
In the first experiment, the performance of the basic Scheme 2 is tested. The secret data is a binary sequence created by a pseudo-random number generator. Fig. 9.19(a) and Fig. 9.19(b) illustrate the original Lena image and its stego version, whereas Fig. 9.19(c) shows the recovered one. To evaluate the introduced distortion, we apply an effective quality metric proposed in [21], i.e., the weighted SNR (WSNR). The linear distortion is quantified in [21] by constructing a minimum mean squared error Wiener filter; in this way, the residual image is uncorrelated with the input image. The residual image represents the nonlinear distortion plus additive independent noise. Valliappan et al. [21] spectrally weight the residual by a contrast sensitivity function (CSF) to quantify the effect of nonlinear distortion and noise on quality. A CSF is a linear approximation of the HVS response to a sine wave of a single frequency, and a low pass CSF assumes the human eyes do not focus on one point but move freely around the image. Since the halftone image is intended to preserve the useful information of the gray level image, we compare the halftone or watermarked image with the original gray level image. Similar to PSNR, a higher WSNR means higher quality. In our experiments, the WSNR between the gray level Lena and the halftone Lena is 29.18 dB, while the WSNR between the gray level Lena and the watermarked Lena is 28.59 dB. It can be seen that the introduced distortion of the visual quality is slight. Since the WSNR between the gray level Lena and the recovered Lena is 29.18 dB, the recovered version is exactly the same as the original image.
Fig. 9.19. Lossless data hiding on the halftone Lena, (a) the original Lena, WSNR=29.18dB, (b) the stego Lena with 831 bits hidden, WSNR=28.58dB, (c) the recovered Lena, WSNR=29.18dB
The second experiment is designed to test the performance of the enhanced strategy. The experimental results on the Lena and Baboon images are shown in Fig. 9.20. As no alterations are suffered, the cover images are accurately recovered, since the normalized cross-correlation values between the original and the recovered images are equal to 1.
Fig. 9.20. Lossless data hiding on the halftone Lena and Baboon, (a) the original Lena, (b) the original Baboon, (c) the stego Lena with 17550 bits hidden, (d) the stego Baboon with 1806 bits hidden, (e) the recovered Lena, (f) the recovered Baboon
Table 9.1. Capacity (bits) of Scheme 2 without the capacity enhancement strategy

LUT    Lena   F16   Baboon  Boat  Peppers  Barbara
I=1     201   195      8      68    152       33
I=2     339   385     21     142    258       73
I=3     432   522     28     204    355      112
I=4     512   635     37     261    429      140
I=5     582   742     41     314    487      165
I=6     641   844     45     366    540      188
I=7     690   936     48     416    582      212
I=8     739  1025     51     464    620      228
I=9     788  1109     52     512    655      241
I=10    831  1191     54     553    685      254
Table 9.2. Capacity (bits) of Scheme 2 with the capacity enhancement strategy

LUT    Lena    F16   Baboon   Boat   Peppers  Barbara
I=2   17550   4903    1806   13588    17460    14142
Although much more data is hidden, the visual qualities of the stego images are acceptable. In fact, the average luminance of the stego image is approximately equal to that of the original image. This is because the average gray levels of P4, P13, P7 and P10 are all equal, and thus similar pair toggling among them cannot result in sharp luminance changes. Capacities of the basic Scheme 2 are listed in Table 9.1, while those with the capacity enhancement strategy are shown in Table 9.2. In Table 9.1, ten sizes of LUT are tested, with the number of similar pattern pairs ranging from 1 to 10. However, even with the largest LUT, the capacities are much smaller than those provided when the capacity enhancement strategy is used.
9.4 Conclusions
In this chapter, two lossless data hiding schemes for halftone images are proposed. In both of them, data embedding is based on similar pair toggling according to a look-up table. As long as the stego image suffers no unauthorized change, the cover image can be perfectly recovered with only a secret key. Besides, Scheme 1 is illustrated in lossless content authentication for halftone images. In Scheme 2, HVS characteristics are utilized to reduce the visual distortion introduced by data embedding. Moreover, the micro structure
statistical feature of error diffused halftone images is exploited to enhance the data capacity. Although both schemes focus on gray level halftone images, they can easily be extended to lossless data hiding for color halftone images.
References 1. Baharav, Z., Shaked, D.: Watermarking of dither halftone images, HewlettPackard Labs Tech. Rep., HPL-98-32 (1998) 2. Cheung, S.M., Chan, Y.H.: A technique for lossy compression of error-diffused halftones. In: Proc. of ICME, pp. 1083–1086 (2004) 3. Floyd, R., Steinberg, L.: An adaptive algorithm for spatial gray scale, Society for Information Display Symp. Tech. Papers, 36–37 (1975) 4. Fridrich, J., Goljan, M., Du, R.: Lossless data embedding — New paradigm in digital watermarking. EURASIP Journal on Applied Signal Processing, 185– 196 (2002) 5. Fu, M.S., Au, O.C.: Data hiding watermarking for halftone images. IEEE Trans. Image Processing. 11, 477–484 (2002) 6. Hel-Or, H.Z.: Watermarking and copyright labeling of printed images. Journal of Electronic Imaging 10, 794–803 (2001) 7. Jarvis, J.F., Judice, C.N., Ninke, W.H.: A survey of techniques for the display of continuous-tone pictures on bilevel displays. Computer Graphics Image Process 5, 13–40 (1976) 8. Kim, H.Y., Afif, A.: Secure authentication watermarking for binary images. In: Proc. Brazilian Symp. on Computer Graphics and Image Processing, pp. 199–206 (2003) 9. Liao, P.S., Pan, J.S., Chen, Y.H., Liao, B.Y.: A lossless watermarking technique for halftone images. In: Proc. Knowledge-Based Intelligent Information and Engineering Systems, vol. 2, pp. 593–599 (2005) 10. Lieberman, D., Allebach, J.: Digital halftoning using the direct binary search algorithm. In: Proc. IST Int’l Conf. on High Technology, pp. 114–124 (1996) 11. Lu, Z.M., Luo, H., Pan, J.S.: Reversible watermarking for error diffused halftone images using statistical features. In: Shi, Y.Q., Jeon, B. (eds.) IWDW 2006. LNCS, vol. 4283, pp. 71–81. Springer, Heidelberg (2006) 12. Luo, H., Chu, S.C., Lu, Z.M.: Self embedding watermarking using halftoning technique. Circuits Systems and Signal Processing 27, 155–170 (2008) 13. Pappas, T.N., Neuhoff, D.L.: Least-squares model based halftoning. IEEE Trans. Image Processing 8, 1102–1116 (1999) 14. Pan, J.S., Huang, H.C., Jain, L.C. (eds.): Intelligent Watermarking Techniques. World Scientific Publishing Company, Singapore (2004) 15. Pan, J.S., Luo, H., Lu, Z.M.: A lossless watermarking scheme for halftone image authentication. International Journal of Computer Science and Network Security 6, 147–151 (2006) 16. Pei, S.C., Guo, J.M.: Hybrid pixel-based data hiding and block-based watermarking for error-diffused halftone images. IEEE Trans. Circuits and Systems for Video Technology 13, 867–884 (2003) 17. Rivest, R.L.: The MD5 Message digest algorithm, Technical Report of MIT Laboratory for Computer Science and RSA Data Security (1992)
18. Stucki, P.: MECCA — A multiple error correcting computation algorithm for bilevel image hardcopy reproduction, Research Report RZ1060, IBM Res. Lab., Zurich, Switzerland (1981) 19. Sun, Z.: Video halftoning. IEEE Trans. Image Processing 15, 678–686 (2006) 20. Ulichney, R.: Digital Halftoning. MIT Press, Cambridge (1987) 21. Valliappan, M., Evans, B.L., Tompkins, D.A.D., Kossentini, F.: Lossy compression of stochastic halftones with JBIG2. In: Proc. Int’l Conf. on Image Processing, pp. 214–218 (1999)
10 Information Hiding by Digital Watermarking
Frank Y. Shih and Yi-Ta Wu
Computer Vision Laboratory, College of Computing Sciences, New Jersey Institute of Technology, Newark, NJ 07102, USA
[email protected]

Summary. Digital information and data are transmitted more often over the Internet now than ever. Free-access digital multimedia communication unfortunately provides virtually unprecedented opportunities to pirate copyrighted material. Therefore, the idea of using a digital watermark to detect and trace copyright violations has stimulated significant interest among engineers, scientists, lawyers, artists and publishers. In this chapter, we present robust high-capacity digital watermarking techniques based on a genetic algorithm (GA) and a chaotic map. A GA-based technique is presented to correct the rounding errors. The fundamental idea is to adopt a fitness function for choosing the best chromosome, which determines the conversion rule of real numbers into integers during the cosine transformation. We develop a block-based chaotic map, which outperforms the traditional one by breaking local spatial similarity, to increase the number of significant coefficients in the transformed image. We demonstrate the superiority of the proposed scheme in terms of better embedding capacity and lower message error rate. Furthermore, we show that the proposed algorithm works well under some image-processing distortions, such as JPEG compression, Gaussian noise, and low-pass filtering.
10.1 Introduction
Computers and networking facilities are becoming less expensive and more widespread. Creative approaches to storing, accessing and distributing data have brought many benefits to the digital multimedia field, mainly due to properties such as distortion-free transmission, compact storage, and easy editing. Free-access digital multimedia communication unfortunately provides virtually unprecedented opportunities to pirate copyrighted material. Therefore, the idea of using a digital watermark to detect and trace copyright violations has stimulated significant interest among engineers, scientists, lawyers, artists and publishers. As a result, research in watermark embedding that is robust with respect to compression, image-processing operations, and cryptographic attacks has become very active in recent years, and the developed techniques have matured considerably.
Fig. 10.1. A general digital watermarking system
Watermarking is not a brand new phenomenon. For nearly one thousand years, watermarks on paper have often been used to visibly indicate a particular publisher and to discourage the counterfeiting of currency. A watermark is a design impressed on a piece of paper during production and used for copyright identification. The design may be a pattern, a logo or an image. In the modern era, as most data and information are stored and communicated in digital form, proving authenticity plays an increasingly important role. As a result, digital watermarking is a process whereby arbitrary information is encoded into an image in such a way that the additional payload is imperceptible to image observers. Digital watermarking has been proposed as a suitable tool to identify the source, creator, owner, distributor, or authorized consumer of a document or an image [1]. It can also be used to detect a document or an image that has been illegally distributed or modified. Another technology, encryption, used in cryptography, is a process of obscuring information to make it unreadable to observers without specific keys or knowledge; it is sometimes referred to as data scrambling. Watermarking, when complemented by encryption, can serve a vast number of purposes, including copyright protection, broadcast monitoring, and data authentication. In the digital world, a watermark is a pattern of bits inserted into a digital medium that can identify the creator or authorized users. The digital watermark, unlike the printed visible stamp watermark, is designed to be invisible to viewers. The bits embedded into an image are scattered all around to avoid identification or modification. Therefore, a digital watermark must be robust enough to survive detection, compression, and the operations applied to it. Figure 10.1 depicts a general digital watermarking system. A watermark message W is embedded into a media message, which is defined as the host image H. The resulting image is the watermarked image H*. In the embedding process, a secret key K, e.g., a random number generator, is sometimes involved to generate a more secure watermark. The watermarked image H* is then transmitted along a communication channel. The watermark can later be detected or extracted by the receiver.
There are many aspects to be considered in watermarking design, for example, imperceptibility, security, capacity and robustness. The watermarked image must look indistinguishable from the original image: if a watermarking system distorts the host image to the point of being perceptible, it is of no use. An ideal watermarking system could embed a large amount of information perfectly securely with no visible degradation to the host image. The embedded watermark should be robust and invariant to intentional (e.g., noise) or unintentional (e.g., image enhancement, cropping, resizing or compression) attacks. Many researchers have been focusing on security and robustness, but rarely on watermarking capacity [5, 17]. The amount of data an algorithm can embed in an image has implications for how the watermark can be applied. Indeed, both security and robustness are important because the embedded watermark is expected to be unperceivable and irremovable. Nevertheless, if we can embed a large amount of watermark data into a host image, the applications become widely open in many areas. Another scheme is the use of keys to generate random sequences during the embedding process; in this scheme, the cover image (i.e., the host image) is not needed during the watermark detection process. It is also a goal that the watermarking system utilize an asymmetric key, as in public/private key cryptographic systems. A public key is used for image verification and a private key is needed for the embedding security. Knowledge of the public key neither helps compute the private key nor allows removal of the watermark. According to the user's embedding purposes, watermarks can be categorized into three types: robust, fragile, and semi-fragile. Robust watermarks [3, 9, 10, 13] are designed to withstand arbitrarily malicious attacks, such as image scaling, bending, cropping, and lossy compression. They are usually used for copyright protection to declare rightful ownership. On the contrary, for the purpose of image authentication, fragile watermarks [4, 8, 15, 16, 19] are adopted to detect any unauthorized modification. Semi-fragile watermarks [12] are designed to detect any unauthorized modification while at the same time allowing some image processing operations. In other words, they are for selective authentication that detects illegitimate distortion, while ignoring applications of legitimate distortion. In general, we can embed a watermark in two types of domains: the spatial domain and the frequency domain [7, 11, 14, 18]. In the spatial domain [2, 9], we can replace the pixels in the host image by the pixels in the watermark image. Note that a sophisticated computer program may easily detect the inserted watermark. In the frequency domain [6, 7], we can replace the coefficients of a transformed image by the pixels in the watermark image. The often-used frequency-domain transformations are the Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), and Discrete Wavelet Transform (DWT). This kind of embedded watermark is in general difficult to detect. However, its embedding capacity is usually low, since a large amount of data will distort the host image significantly.
The size of the watermark must be smaller than that of the host image; in general, the size of the watermark is one sixteenth of the host image. It is reasonable to think that frequency-domain watermarking would be robust, since the embedded watermarks are spread out all over the spatial extent of an image [4]. If the watermarks are embedded into locations of large absolute values (which we name "significant coefficients") of the transformed image, the watermarking technique becomes more robust. Unfortunately, transformed images in general contain only a few significant coefficients, so the watermarking capacity is limited.
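As a rough illustration of this limitation, one can transform an image and count the coefficients whose magnitude exceeds a chosen threshold; the sketch below uses SciPy's DCT, and the threshold value is left to the experimenter.

```python
import numpy as np
from scipy.fft import dctn

def count_significant(image, threshold):
    # 2-D DCT of the whole image; "significant coefficients" are those
    # whose absolute value exceeds the threshold.
    coeffs = dctn(image.astype(float), norm='ortho')
    return int(np.sum(np.abs(coeffs) > threshold))
```

On a typical natural image, this count is small because of local spatial similarity; the pixel relocation discussed in Sec. 10.4 aims to increase it.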
10.2 Weakness of Current Robust Watermarking

There are several approaches to achieving robust watermarking, such that the watermarks remain detectable after the watermarked images are distorted. However, the capacity of robust watermarking techniques is usually limited by their strategy. For example, the redundant embedding approach achieves robustness by embedding more than one copy of the same watermark into an image; however, the multiple copies reduce the size of watermark that can be carried. That is, the more copies are embedded, the smaller the watermark must be. For significant-coefficients embedding, the capacity is clearly bounded by the number of significant coefficients. Unfortunately, the number of significant coefficients is quite limited in most real images due to local spatial similarity. For region-based embedding, one bit of the watermark is embedded over a region of the image in order to spread the message across the selected region. Hence, the capacity is restricted by the block size.
10.3 Concept of Robust Watermarking

Zhao et al. [19] presented a robust wavelet-domain watermarking algorithm based on the chaotic map. They divide an image into a set of labeled 8 × 8 blocks. After the order is mixed by a chaotic map, the first 256 blocks are selected for embedding watermarks. Miller et al. [8] proposed a robust high-capacity watermarking algorithm using informed coding and embedding. However, they can embed only 1,380 bits of information in an image of size 240 × 368, i.e., a capacity of 0.015625 bits/pixel. There are two criteria to be considered when we develop a robust high-capacity watermarking technique. First, the strategy for embedding and extracting watermarks must ensure robustness. Second, the strategy for enlarging the capacity must not affect the robustness of the watermarking. Frequency-domain watermarking possesses strong robustness since the embedded messages are spread out all over the spatial extent of an image [4]. Moreover, if the messages are embedded into significant coefficients, the
watermarking technique will be more robust. Therefore, the significant-coefficients embedding approach is adopted as the basis of our robust high-capacity watermarking to satisfy the first criterion. The remaining problem is to enlarge the capacity without degrading the robustness.
10.4 Enlargement of Significant Coefficients

During spatial-frequency transformation, the low frequencies in the transformed domain reflect smooth areas of an image, and the high frequencies reflect areas with large intensity changes, such as edges and noise. Therefore, due to local spatial similarity, the significant coefficients of a transformed image are limited. Unless an image contains heavy noise, we cannot obtain a large number of significant coefficients through any of the transformation approaches (DCT, DFT, or DWT).

10.4.1 Break the Local Spatial Similarity
This observation about noisy images suggests a way to enlarge the capacity. If we could rearrange the pixel positions such that the rearranged image resembles noise, the number of significant coefficients of the rearranged image would increase dramatically. Therefore, we adopt the chaotic map to relocate pixels. Figure 10.2 illustrates an example of increasing the number of significant coefficients by rearranging the pixel locations. Figure 10.2(a) shows an image with its gray-values displayed in (b). Figure 10.2(d) shows its relocated image, with gray-values displayed in (e). Figures 10.2(c) and (f) are obtained by applying the DCT to Figures 10.2(b) and (e), respectively. If the threshold is set to 70, we obtain 14 significant coefficients in Figure 10.2(f), but only two in (c).
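To make the effect concrete, the following sketch counts DCT coefficients whose magnitude exceeds a threshold before and after pixel relocation. The threshold of 70, the sample gradient block, and the random permutation standing in for the chaotic relocation are illustrative assumptions, not the chapter's exact data.

```python
import numpy as np
from scipy.fft import dctn

def count_significant(block, threshold=70.0):
    """Count DCT coefficients whose magnitude exceeds the threshold."""
    coeffs = dctn(block.astype(float), norm='ortho')
    return int(np.sum(np.abs(coeffs) > threshold))

rng = np.random.default_rng(1)
smooth = np.tile(np.linspace(0, 255, 8), (8, 1))           # locally similar block
relocated = rng.permutation(smooth.ravel()).reshape(8, 8)  # similarity broken

# The smooth block yields only a couple of significant coefficients;
# the relocated, noise-like block yields far more.
print(count_significant(smooth), count_significant(relocated))
```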
Fig. 10.2. An example of increasing the number of significant coefficients. (a) and (b) An image and its gray-values, (c) the DCT coefficients, (d) and (e) the relocated image and its gray-values, (f) the DCT coefficients.
10.4.2 Block-Based Chaotic Map
In order to enlarge the watermark capacity, we adopt the chaotic map to break the local spatial similarity of an image, so that it generates more significant coefficients in the transformed domain. Since the traditional chaotic map is pixel-based [11, 13, 19] and not suited to generating the reference register, we develop a new block-based chaotic map to break the local spatial similarity. In other words, the block-based chaotic map relocates pixels in units of "blocks" (i.e., sets of connected pixels) instead of individual pixels. Figures 10.3(a) and (b) show a diagram and its relocated result based on a block size of 2 × 2. Figures 10.3(c) and (d) show a diagram and its relocated result based on a block size of 2 × 4. In general, the larger the block size, the more local similarity is preserved. We apply the block size of 2 × 2 and l = 2 to Figure 10.4, where (a) is the Lena image, (b) its relocated image, and (c) the resulting image in the next iteration.
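The chapter does not reproduce the map's equations at this point, so the sketch below stands in with a generalized Arnold cat map (a toral automorphism, cf. [13]) applied to block coordinates; treating l as the map parameter and requiring a square block grid are assumptions.

```python
import numpy as np

def bbcm(image, block=(2, 2), l=2, iterations=1):
    """Block-based chaotic map sketch: relocate whole blocks under the
    area-preserving cat map [[1, 1], [l, l+1]] (det = 1, hence a bijection)."""
    bh, bw = block
    h, w = image.shape
    ny, nx = h // bh, w // bw
    assert h == ny * bh and w == nx * bw and ny == nx, "square block grid assumed"
    out = image.copy()
    for _ in range(iterations):
        nxt = np.empty_like(out)
        for y in range(ny):
            for x in range(nx):
                yy = (y + x) % ny                  # new block row
                xx = (l * y + (l + 1) * x) % nx    # new block column
                nxt[yy*bh:(yy+1)*bh, xx*bw:(xx+1)*bw] = \
                    out[y*bh:(y+1)*bh, x*bw:(x+1)*bw]
        out = nxt
    return out
```

Because the map is a bijection on the block grid, iterating it enough times returns the original arrangement, which is why the experiments in Section 10.7 recover the image layout with additional BBCM iterations.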
Fig. 10.3. An example of the block-based relocation. (a) and (b) A diagram and its relocated result based on the 2 × 2 block, (c) and (d) a diagram and its relocated result based on the 2 × 4 block.
Fig. 10.4. An example of performing the block-based chaotic map. (a) A Lena image, (b) the relocated image, (c) the continuously relocated image.
10.5 Determination of Embedding Locations

10.5.1 Intersection-Based Pixels Collection
The intersection-based pixels collection (IBPC) labels an image with two symbols alternately and then collects the pixels of each symbol into two sub-images. Figures 10.5(a)–(c) show three different approaches to collecting the pixels with the same label: horizontally, vertically, and diagonally, respectively. The two sub-images formed are shown in Figures 10.5(d) and (e). Note that the pair of images obtained exhibits local spatial similarity, even after transformation or attacks. From experiments, the diagonal IBPC shows the best similarity in our RHC watermarking algorithm. Therefore, we generate a pair of 8 × 8 coefficient blocks from each 16 × 8 image by using the IBPC approach followed by the DCT.
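A minimal sketch of a diagonal IBPC split and its inverse follows; the checkerboard labeling (i + j) mod 2 and the row-major collection order are assumptions, since the chapter specifies the labeling only pictorially in Figure 10.5.

```python
import numpy as np

def ibpc_split(block):
    """Split a 16x8 block into two 8x8 sub-images by diagonal labeling:
    pixels with (i + j) even go to sub-image A, the rest to B."""
    h, w = block.shape
    mask = (np.add.outer(np.arange(h), np.arange(w)) % 2) == 0
    return block[mask].reshape(h // 2, w), block[~mask].reshape(h // 2, w)

def ibpc_merge(a, b):
    """Inverse of ibpc_split: interleave the two sub-images back into 16x8."""
    h, w = a.shape[0] * 2, a.shape[1]
    mask = (np.add.outer(np.arange(h), np.arange(w)) % 2) == 0
    block = np.empty((h, w), dtype=a.dtype)
    block[mask], block[~mask] = a.ravel(), b.ravel()
    return block
```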
Fig. 10.5. An example of the IBPC.
Fig. 10.6. An example of illustrating the similarity of two sub-images obtained by the IBPC. (a) A Lena image, (b) the extracted 16 × 8 image, (c) and (d) a pair of extracted 8 × 8 sub-images, (e) and (f) the 8 × 8 DCT coefficients, (g) and (h) the quantized images.
Afterwards, one block is used as the reference register for indicating significant DCT coefficients, and the other is used as the container for embedding watermarks.

10.5.2 The Reference Register and Container
An example demonstrating the similarity of the two sub-images obtained by the IBPC is shown in Figure 10.6. Figure 10.6(a) shows the Lena image, from which a small block of pixels is cropped in (b). Figures 10.6(c) and (d) show the pair of sub-images obtained from (b) by the IBPC. Figures 10.6(e) and (f) are their respective DCT images. Figures 10.6(g) and (h) show the results after dividing (e) and (f), respectively, by the quantization table with a quality factor (QF) of 50. Note that the QF is used in Eq. (10.1), and an example of the JPEG quantization table is given in Figure 10.7. We observe that Figures 10.6(g) and (h) are similar. Therefore, either one can be used as the reference register to indicate the significant coefficients
for embedding watermarks. The significant coefficients are those whose values are larger than a pre-defined threshold $RR_{Th}$. Let $QTable(i, j)$ denote the quantization table, where $0 \le i, j \le 7$. The new quantization table $NewTable(i, j)$ is obtained by

$$NewTable(i,j) = \begin{cases} QTable(i,j) \times \dfrac{50}{QF}, & \text{if } QF < 50; \\[4pt] QTable(i,j) \times (2 - 0.02 \times QF), & \text{otherwise.} \end{cases} \qquad (10.1)$$
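A direct transcription of Eq. (10.1) follows, assuming the table in Figure 10.7 is the standard JPEG luminance quantization table.

```python
import numpy as np

# Standard JPEG luminance quantization table (assumed to match Figure 10.7).
JPEG_QTABLE = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=float)

def new_table(qf, qtable=JPEG_QTABLE):
    """Scale the quantization table for quality factor QF, per Eq. (10.1)."""
    return qtable * (50.0 / qf) if qf < 50 else qtable * (2.0 - 0.02 * qf)
```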
10.6 The RHC Watermarking Algorithm

Our robust high-capacity (RHC) watermarking algorithm contains two main components: the block-based chaotic map (BBCM) and the reference register (RR).

10.6.1 Embedding Procedure
In order to explain our watermark embedding procedure clearly, we first introduce the following symbols:

$H$: the gray-level host image of size $n \times m$.
$H^{16\times8}_{(i,j)}$: the $(i, j)$-th sub-image of $H$ of size $16 \times 8$, where $1 \le i \le \frac{n}{16}$ and $1 \le j \le \frac{m}{8}$.
$HA^{8\times8}_{(i,j)}$ and $HB^{8\times8}_{(i,j)}$: the pair of images obtained from $H^{16\times8}_{(i,j)}$ by the IBPC.
$DA^{8\times8}_{(i,j)}$ and $DB^{8\times8}_{(i,j)}$: the transformed images obtained by applying the DCT to $HA^{8\times8}_{(i,j)}$ and $HB^{8\times8}_{(i,j)}$, respectively.
$QA^{8\times8}_{(i,j)}$ and $QB^{8\times8}_{(i,j)}$: the resulting images of dividing $DA^{8\times8}_{(i,j)}$ and $DB^{8\times8}_{(i,j)}$ by the quantization table, respectively.
$E^{8\times8}_{(i,j)}$: the watermarked image of $DA^{8\times8}_{(i,j)}$ using $QA^{8\times8}_{(i,j)}$ and $QB^{8\times8}_{(i,j)}$.
$M^{8\times8}_{(i,j)}$: the image after multiplying $E^{8\times8}_{(i,j)}$ by the quantization table.
$I^{8\times8}_{(i,j)}$: the image obtained by the inverse discrete cosine transform (IDCT) of $M^{8\times8}_{(i,j)}$.
Fig. 10.7. A quantization table in JPEG.
$C^{16\times8}_{(i,j)}$: the watermarked sub-image obtained by combining $I^{8\times8}_{(i,j)}$ and $HB^{8\times8}_{(i,j)}$ using the IBPC.
$O$: the output watermarked image obtained by collecting all the sub-images $C^{16\times8}_{(i,j)}$.
We present the overall embedding procedure below; its flowchart is shown in Figure 10.8.
Fig. 10.8. The embedding procedure of our RHC watermarking
The Embedding Procedure of our Robust High-Capacity Watermarking Algorithm:

1. Divide the input image $H$ into a set of sub-images $H^{16\times8}_{(i,j)}$ of size $16 \times 8$.
2. Build $HA^{8\times8}_{(i,j)}$ and $HB^{8\times8}_{(i,j)}$ from each sub-image $H^{16\times8}_{(i,j)}$ by the IBPC.
3. Obtain $DA^{8\times8}_{(i,j)}$ and $DB^{8\times8}_{(i,j)}$ from $HA^{8\times8}_{(i,j)}$ and $HB^{8\times8}_{(i,j)}$ by the DCT, respectively.
4. Obtain $QA^{8\times8}_{(i,j)}$ and $QB^{8\times8}_{(i,j)}$ by dividing $DA^{8\times8}_{(i,j)}$ and $DB^{8\times8}_{(i,j)}$ by the JPEG quantization table, respectively.
5. Determine the positions of significant coefficients in $QB^{8\times8}_{(i,j)}$ and embed watermarks into the corresponding positions in $QA^{8\times8}_{(i,j)}$ to obtain $E^{8\times8}_{(i,j)}$. The detailed embedding strategy is described in Section 10.6.3; a code sketch follows this list.
6. Obtain $M^{8\times8}_{(i,j)}$ by multiplying $E^{8\times8}_{(i,j)}$ by the JPEG quantization table.
7. Obtain $I^{8\times8}_{(i,j)}$ by applying the IDCT to $M^{8\times8}_{(i,j)}$.
8. Reconstruct $C^{16\times8}_{(i,j)}$ by combining $I^{8\times8}_{(i,j)}$ and $HB^{8\times8}_{(i,j)}$.
9. Obtain the output watermarked image $O$ by collecting all the $C^{16\times8}_{(i,j)}$.
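A sketch of steps 2–8 for a single 16 × 8 sub-image, building on the ibpc_split/ibpc_merge and new_table helpers sketched earlier; the quality factor of 50 and the row-major scan of significant positions are assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_subimage(block, bits, alpha=0.5, rr_th=9.0, qf=50):
    """Steps 2-8 of the embedding procedure for one 16x8 sub-image.
    Returns the watermarked block and the number of bits consumed.
    Relies on ibpc_split/ibpc_merge and new_table defined earlier."""
    ha, hb = ibpc_split(block.astype(float))
    qt = new_table(qf)
    qa = dctn(ha, norm='ortho') / qt           # container
    qb = dctn(hb, norm='ortho') / qt           # reference register
    e, used = qa.copy(), 0
    for k, l in zip(*np.where(qb >= rr_th)):   # significant positions
        if used == len(bits):
            break
        if bits[used] == 1:                    # Eq. (10.2)
            e[k, l] += alpha * qb[k, l]
        used += 1
    ia = idctn(e * qt, norm='ortho')           # steps 6-7
    return ibpc_merge(ia, hb), used            # step 8
```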
10.6.2 Extracting Procedure
After receiving the watermarked image O, we intend to extract the watermark information. The following symbols are introduced in order to explain the watermark extracting procedures:
$O^{16\times8}_{(i,j)}$: the $(i, j)$-th sub-image of $O$ of size $16 \times 8$.
$OA^{8\times8}_{(i,j)}$ and $OB^{8\times8}_{(i,j)}$: the pair of images extracted from $O^{16\times8}_{(i,j)}$ using the IBPC.
$TA^{8\times8}_{(i,j)}$ and $TB^{8\times8}_{(i,j)}$: the transformed images obtained by applying the DCT to $OA^{8\times8}_{(i,j)}$ and $OB^{8\times8}_{(i,j)}$, respectively.
$RA^{8\times8}_{(i,j)}$ and $RB^{8\times8}_{(i,j)}$: the resulting images of dividing $TA^{8\times8}_{(i,j)}$ and $TB^{8\times8}_{(i,j)}$ by the quantization table, respectively.
We present the watermark extracting procedure below; its flowchart is shown in Figure 10.9.
Fig. 10.9. The extracting procedure in our RHC watermarking algorithm.
The Extracting Procedure of our Robust High-Capacity Watermarking Algorithm:

1. Divide the watermarked image $O$ into a set of sub-images $O^{16\times8}_{(i,j)}$ of size $16 \times 8$.
2. Build $OA^{8\times8}_{(i,j)}$ and $OB^{8\times8}_{(i,j)}$ from each sub-image $O^{16\times8}_{(i,j)}$ by the IBPC.
3. Obtain $TA^{8\times8}_{(i,j)}$ and $TB^{8\times8}_{(i,j)}$ from $OA^{8\times8}_{(i,j)}$ and $OB^{8\times8}_{(i,j)}$ by the DCT, respectively.
4. Obtain $RA^{8\times8}_{(i,j)}$ and $RB^{8\times8}_{(i,j)}$ by dividing $TA^{8\times8}_{(i,j)}$ and $TB^{8\times8}_{(i,j)}$ by the JPEG quantization table, respectively.
5. Determine the positions of significant coefficients in $RB^{8\times8}_{(i,j)}$ and extract the sub-watermark from the corresponding positions in $RA^{8\times8}_{(i,j)}$. The detailed extracting strategy is described in Section 10.6.3; a code sketch follows this list.
6. Collect all the sub-watermarks to obtain the watermark.
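A matching sketch of steps 2–5 for one watermarked 16 × 8 sub-image, under the same assumptions as the embedding sketch:

```python
import numpy as np
from scipy.fft import dctn

def extract_subimage(block, alpha=0.5, rr_th=9.0, qf=50):
    """Steps 2-5 of the extracting procedure for one 16x8 sub-image.
    Relies on ibpc_split and new_table defined earlier."""
    oa, ob = ibpc_split(block.astype(float))
    qt = new_table(qf)
    ra = dctn(oa, norm='ortho') / qt           # container
    rb = dctn(ob, norm='ortho') / qt           # reference register
    bits = []
    for k, l in zip(*np.where(rb >= rr_th)):   # same scan as embedding
        # Eq. (10.3): the bit is 1 when the container exceeds the register
        # by at least half of the embedding strength.
        bits.append(int(ra[k, l] - rb[k, l] >= 0.5 * alpha * rb[k, l]))
    return bits
```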
10.6.3 The Embedding and Extracting Strategies
In this section, we describe the embedding and extracting strategies of our RHC watermarking algorithm.

The embedding strategy

For embedding, each pair of $QA^{8\times8}_{(i,j)}$ (container) and $QB^{8\times8}_{(i,j)}$ (reference register) is obtained first. After the significant coefficients are determined by the reference register, the watermarks are embedded into the corresponding positions of the cover coefficients by adding the values as in Eq. (10.2). Let $V^C_{(k,l)}$ and $V^R_{(k,l)}$ denote the values of $QA^{8\times8}_{(i,j)}$ and $QB^{8\times8}_{(i,j)}$, respectively, and let $S^C_{(k,l)}$ be the result after embedding the message:

$$S^C_{(k,l)} = \begin{cases} V^C_{(k,l)} + \alpha V^R_{(k,l)}, & \text{if } V^R_{(k,l)} \ge RR_{Th}; \\ V^C_{(k,l)}, & \text{otherwise,} \end{cases} \qquad (10.2)$$

where $0 \le k, l \le 7$ and $\alpha > 0$. Note that the bigger $\alpha$ is, the higher the robustness. The $8 \times 8$ values $S^C_{(k,l)}$ are collected to form the corresponding $E^{8\times8}_{(i,j)}$. The embedding strategy is presented below.

Watermark Embedding Strategy:

1. Determine the embedding locations by checking the significant coefficients of $QB^{8\times8}_{(i,j)}$.
2. If the embedding message is "1" and $V^R_{(k,l)} \ge RR_{Th}$, we obtain $S^C_{(k,l)} = V^C_{(k,l)} + \alpha V^R_{(k,l)}$; otherwise, we set $S^C_{(k,l)} = V^C_{(k,l)}$.

The extracting strategy

For extraction, each pair of $RA^{8\times8}_{(i,j)}$ (container) and $RB^{8\times8}_{(i,j)}$ (reference register) is obtained first. After the embedded positions indicated by the reference register are determined, the watermark can be extracted by Eq. (10.3), and the watermarked coefficients can be restored to the original ones by Eq. (10.4). Let $W^C_{(k,l)}$ and $W^R_{(k,l)}$ denote the values of $RA^{8\times8}_{(i,j)}$ and $RB^{8\times8}_{(i,j)}$, respectively. Let $F^C_{(k,l)}$ be the result after the added amount is removed by Eq. (10.4), and let $w$ be the embedded message. We have

$$w = \begin{cases} 1, & \text{if } \left(W^C_{(k,l)} - W^R_{(k,l)}\right) \ge \dfrac{\alpha}{2} W^R_{(k,l)}; \\ 0, & \text{otherwise,} \end{cases} \qquad (10.3)$$

$$F^C_{(k,l)} = \begin{cases} W^C_{(k,l)} - \alpha W^R_{(k,l)}, & \text{if } w = 1; \\ W^C_{(k,l)}, & \text{otherwise.} \end{cases} \qquad (10.4)$$
The extracting strategy of our RHC watermarking is presented below.

Watermark Extracting Strategy:

1. Determine the extracting locations by checking the significant coefficients of $RB^{8\times8}_{(i,j)}$.
2. Obtain the embedded message using Eq. (10.3).
3. If the embedded message is 1, calculate $F^C_{(k,l)} = W^C_{(k,l)} - \alpha W^R_{(k,l)}$; otherwise, set $F^C_{(k,l)} = W^C_{(k,l)}$.

A round-trip check of these equations on a single coefficient appears below.
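The check uses illustrative values ($V^C = 20$, $V^R = 18$, $\alpha = 0.5$, $RR_{Th} = 9$), which are assumptions chosen only to exercise Eqs. (10.2)–(10.4):

```python
alpha, rr_th = 0.5, 9.0
v_c, v_r = 20.0, 18.0                               # container / register values

s_c = v_c + alpha * v_r if v_r >= rr_th else v_c    # embed bit "1", Eq. (10.2)
w = 1 if (s_c - v_r) >= 0.5 * alpha * v_r else 0    # extract bit, Eq. (10.3)
f_c = s_c - alpha * w * v_r                         # restore value, Eq. (10.4)

assert (w, f_c) == (1, v_c)   # the bit and the original value are recovered
```

The margin in Eq. (10.3) is half the added amount, so the decision survives moderate perturbation of the coefficients; this is the source of the scheme's robustness to distortion.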
10.7 Experimental Results

In this section, we show an example to illustrate that our algorithm can enlarge the watermarking capacity by using the block-based chaotic map (BBCM) and the intersection-based pixels collection (IBPC). We also provide experimental results on 200 images and comparisons with the "iciens" algorithm by Miller et al. [8].

10.7.1 Capacity Enlargement
We give an example of the embedding strategy in Figure 10.10. Figure 10.10(a) is the relocated image of Figure 10.4(a) after performing seven iterations of relocation using the BBCM, where the block size is 2 × 2 and l = 2. After obtaining Figure 10.10(b) from the small box in (a), we generate Figures 10.10(c) and (d) by the IBPC.
Fig. 10.10. An example of the embedding strategy. (a) A relocated image, (b) a 16 × 8 sub-image, (c) and (d) a pair of extracted results using the IBPC, (e) a container, (f) a reference register, (g) the watermarked result.
Fig. 10.11. An example of the extracting strategy. (a) A 16×8 watermarked image, (b) and (c) a pair of extracted 8 × 8 results by the IBPC, (d) an 8 × 8 watermarked image by the DCT, (e) the result after the embedded data are removed, (f) a reconstructed 16 × 8 image.
Figures 10.10(e) and (f) are obtained from Figures 10.10(c) and (d), respectively, by the DCT followed by division by the quantization table. Note that Figure 10.10(e) serves as the container for watermark embedding, and Figure 10.10(f) is the reference register indicating the embedding positions, colored in gray. Since the number of embedding positions is 8, we set the watermark to be "11111111." Figure 10.10(g) shows the result after embedding the watermark into these positions by Eq. (10.2), where α = 0.5 and RR_Th = 9.

Figure 10.11 shows an example of the extracting strategy. Figure 10.11(a) is the watermarked sub-image, and Figures 10.11(b) and (c) are extracted from (a) by the IBPC. Note that since Figure 10.11(c) is exactly the same as Figure 10.10(d), the reference register is the same as Figure 10.10(f). Therefore, after obtaining Figure 10.11(d) from (b) by the DCT, we can extract the watermark "11111111" from the coefficients and generate Figure 10.11(e). Figure 10.11(f) is the reconstructed result. We observe that Figure 10.11(f) is exactly the same as the original image in Figure 10.10(b).
Fig. 10.12. An example of robustness experiment using JPEG compression. (a) A 202 × 202 Lena image, (b) the relocated image, (c) the watermarked image, (d) the image by the JPEG compression, (e) the reconstructed image.
10.7.2 Robustness Experiments
Miller et al. [8] presented a robust high-capacity watermarking scheme using informed coding and embedding. They conclude that a watermarking scheme is robust if the message error rate (MER) is lower than 20%. In order to evaluate the robustness of our watermarking algorithm, we test 200 images and attack the watermarked images with JPEG compression, Gaussian noise, and a low-pass filter. The resulting MERs for these attacked images are reported below. Figure 10.12 shows the robustness experiment under JPEG compression. Figure 10.12(a) is the original Lena image. After performing nine consecutive iterations of the BBCM, we obtain Figure 10.12(b), in which we embed watermarks using the following parameters: a block size of 2 × 2, l = 2, α = 0.2, and RR_Th = 1. Figure 10.12(c) is the watermarked image obtained by performing eight further iterations of the BBCM on Figure 10.12(b). Figure 10.12(d) is the attacked image obtained by applying JPEG compression to Figure 10.12(c) with a quality factor of 20. Figure 10.12(e) is the reconstructed image after the embedded watermarks are removed. Note that our RHC watermarking is not only robust, with an MER of 0.19 (below the 20% threshold),
Fig. 10.13. An example of robustness experiment using low-pass filter. (a) A 202 × 202 image, (b) the relocated image, (c) the watermarked image, (d) the attacked image by a low-pass filter, (e) the reconstructed image.

Table 10.1. Effect of RR_Th

RR_Th   Capacity   Capacity       JPEG (QF=20)        Gaussian noise (SD=500, mean=0)
        (bits)     (bits/pixel)   Error bits   MER    Error bits   MER
10      810        0.0199         104          0.12   140          0.17
9       924        0.0226         120          0.13   170          0.18
8       1061       0.0260         139          0.13   195          0.18
7       1228       0.0301         163          0.13   228          0.19
6       1439       0.0353         197          0.14   271          0.19
5       1715       0.0420         243          0.14   329          0.19
4       2087       0.0511         304          0.15   403          0.19
3       2617       0.0641         404          0.15   522          0.20
2       3442       0.0844         577          0.18   712          0.21
1       5050       0.1238         968          0.19   1116         0.22
but also achieves a high capacity of 0.116 bits/pixel, much larger than the 0.015625 bits/pixel of Miller et al. [8]. Figures 10.13 and 10.14 show the experimental results under low-pass filter and Gaussian noise attacks, respectively, using the same parameters as in the previous example. In Figure 10.13, a 3 × 3 low-pass filter with all
Fig. 10.14. An example of robustness experiment using Gaussian noise. (a) A 202 × 202 image, (b) the relocated image, (c) the watermarked image, (d) the attacked image by Gaussian noise, (e) the reconstructed image.
Fig. 10.15. Robustness versus JPEG compression
Fig. 10.16. Robustness versus Gaussian noise
1's is utilized as the attacker. The capacity and the MER are 0.117 bits/pixel and 16%, respectively. In Figure 10.14, Gaussian noise with standard deviation (SD) 500 and mean 0 is added to the watermarked image. The capacity and the MER are 0.124 bits/pixel and 23%, respectively. It is obvious that our RHC watermarking algorithm not only achieves the robustness requirement but also maintains a high watermark capacity.

10.7.3 Performance Comparisons
In this section, we compare our RHC watermarking algorithm with the "iciens" algorithm of Miller et al. [8]. Figure 10.15 shows the effect of JPEG compression with different quality factors (QFs); for simplicity, the symbol "∗" represents our RHC and "" represents iciens. Our algorithm has a slightly higher MER when the QF is above 50, but a significantly lower MER when the QF is below 40. Figure 10.16 shows the effect of Gaussian noise with mean 0 and different standard deviations; our RHC algorithm outperforms the iciens algorithm in terms of low MER. Note that the parameters are a block size of 2 × 2, l = 2, α = 0.2, and RR_Th = 1. For watermarking capacity, the average capacity of our RHC watermarking is 0.1238 bits/pixel, which is much larger than the 0.015625 bits/pixel obtained by Miller's iciens algorithm. The capacity of our RHC watermarking is affected by RR_Th, the threshold that determines the significant coefficients: the smaller RR_Th is, the higher the capacity. Table 10.1 shows the effect of RR_Th under JPEG compression with QF = 20 and Gaussian noise with SD = 500 and
mean = 0. We observe that our algorithm not only enlarges the capacity (to more than 0.12 bits/pixel) but also maintains a low MER (less than 0.22) even under severe distortions.
References

1. Berghel, H., O'Gorman, L.: Protecting ownership rights through digital watermarking. IEEE Computer Mag. 29, 101–103 (1996)
2. Bruyndonckx, O., Quisquater, J.-J., Macq, B.: Spatial method for copyright labeling of digital images. In: Proc. IEEE Workshop on Nonlinear Signal and Image Processing, Neos Marmaras, Greece, pp. 456–459 (1995)
3. Celik, M.U., et al.: Hierarchical watermarking for secure image authentication with localization. IEEE Trans. Image Processing 11, 585–595 (2002)
4. Cox, I.J., et al.: Secure spread spectrum watermarking for multimedia. IEEE Trans. Image Processing 6, 1673–1687 (1997)
5. Eggers, J., Girod, B.: Informed Watermarking. Kluwer Academic Publishers, Dordrecht (2002)
6. Huang, J., Shi, Y.Q., Shi, Y.: Embedding image watermarks in DC components. IEEE Trans. Circuits and Systems for Video Technology 10, 974–979 (2000)
7. Lin, S.D., Chen, C.-F.: A robust DCT-based watermarking for copyright protection. IEEE Trans. Consumer Electronics 46, 415–421 (2000)
8. Miller, M.L., Doerr, G.J., Cox, I.J.: Applying informed coding and embedding to design a robust high-capacity watermark. IEEE Trans. Image Processing 13, 792–807 (2004)
9. Mukherjee, D.P., Maitra, S., Acton, S.T.: Spatial domain digital watermarking of multimedia objects for buyer authentication. IEEE Trans. Multimedia 6, 1–15 (2004)
10. Nikolaidis, N., Pitas, I.: Robust image watermarking in the spatial domain. Signal Processing 66, 385–403 (1999)
11. Shih, F.Y., Wu, Y.T.: Combinational image watermarking in the spatial and frequency domains. Pattern Recognition 36, 969–975 (2003)
12. Shih, F.Y., Wu, Y.T.: Enhancement of image watermark retrieval based on genetic algorithm. Journal of Visual Communication and Image Representation 16, 115–133 (2005)
13. Voyatzis, G., Pitas, I.: Applications of toral automorphisms in image watermarking. In: Proc. IEEE Int'l Conf. on Image Processing, Lausanne, Switzerland, vol. 2, pp. 237–240 (1996)
14. Wang, H., Chen, H., Ke, D.: Watermark hiding technique based on chaotic map. In: Proc. IEEE Int'l Conf. on Neural Networks and Signal Processing, Nanjing, China, pp. 1505–1508 (2003)
15. Wong, P.W.: A public key watermark for image verification and authentication. In: Proc. IEEE Int'l Conf. Image Processing, Chicago, IL, pp. 455–459 (1998)
16. Wu, Y.T., Shih, F.Y.: An adjusted-purpose digital watermarking technique. Pattern Recognition 37, 2349–2359 (2004)
17. Wu, Y.T.: Multimedia Security, Morphological Processing, and Applications. Ph.D. Dissertation, New Jersey Institute of Technology, Newark, NJ (2005)
18. Wu, Y.T., Shih, F.Y.: Genetic algorithm based methodology for breaking the steganalytic systems. IEEE Trans. Systems, Man, and Cybernetics – Part B 36, 24–31 (2006)
19. Zhao, D., Chen, G., Liu, W.: A chaos-based robust wavelet-domain watermarking algorithm. Chaos, Solitons and Fractals 22, 47–54 (2004)