Cryptanalysis of Discrete-Sequence Spread Spectrum Watermarks M. Kıvan¸c Mıh¸cak1 , Ramarathnam Venkatesan1 , and Mustafa Kesal2 1
2
Microsoft Research {kivancm,venkie}@microsoft.com University of Illinois, Urbana-Champaign
[email protected] Abstract. Assume that we are given a watermark (wm) embedding algorithm that performs well against generic benchmark-type attacks that comprise of simple operations that are independent of the algorithm and generally of the input as well. A natural question then is to ask for a nearly perfect cryptanalytic attack for the specific watermarking method. In this paper we present and analyze an attack on a state-ofthe-art Discrete-Sequence Spread Spectrum (dsss) audio watermarking algorithm. Our method uses detailed models for watermarked signal and almost always jams or recovers > 90% of the watermarking-key. It exploits the host and wm correlations, and the fact that one can locally correct errors in the wm estimates if the watermarking coefficients are discrete. It is natural to use error-correction codes in a watermarking algorithm, and we study the effects of the accompanying redundancy as well.
1
Introduction
Many wm embedding schemes use spread spectrum (ss) techniques [1], where typically the wm is an additive perturbation independent of the host signal. We consider a “blind private-key watermarking scheme”, where the detector knows the secret key, but not the the original host. The attacker knows everything about the watermarking algorithm, but has no access to the secret key, the host data or the detector itself. We denote the host by s ∈ RN and the wm by m ∈ RN where mi ∈ B, where B is a set of possible values. The wm m is usually generated pseudo-randomly from some class of distributions using a random number generator, given a secret key. Then, the watermarked signal is given by y ∈ RN , where y = s + m. At the detector end, the goal is to successfully detect the existence of the wm in the input signal. In dsss watermarking , B is a finite set. In this paper, we consider the case B = {∆, −∆}. By key extraction, we mean computing m from y. We call an attack against a given watermarking algorithm, an ε-perfect cryptanalytic attack (or ε-perfect attack for short) if with probability ≥ 1 − ε, it (a) yields perceptually undistorted outputs and (b) removes wm . Barring such an attack, a given watermarking algorithm may be useful in many circumstances. F.A.P. Petitcolas (Ed.): IH 2002, LNCS 2578, pp. 226–246, 2003. c Springer-Verlag Berlin Heidelberg 2003
Cryptanalysis of Discrete-Sequence Spread Spectrum Watermarks
227
If a pirate has an attack that works, for example, only with 90% probability, he will be exposed with ε = 10% probability, which limits his business model. Thus, ε-perfect attacks have a significantly higher threshold over generic benchmark attacks. As we discuss next, it is non-trivial and can be hopeless to convert a generic attack into a ε-perfect cryptanalytic attack. Previous Work and Benchmarks: The watermarking problem is often viewed as a communication problem with side information, where the message to be transmitted is the wm m and the side information is the host data s. If one wishes to account for adversaries, in particular with bounded computational resources (the only type of adversaries we are interested in), then this is no longer a typical communication problem and is inextricably tied to computational complexity issues, which have little to do with the methods of communication and information theory. This issue has often been overlooked in the literature. In addition, suggested (or tested) attacks are often independent of the particular watermarking algorithm and do not fully exploit the available correlations in the host and possibly in the wm in an adaptive manner, or information about the particular targeted watermarking algorithm: usual attacks are non-malicious such as additive independent noise, denoising, smoothing, compression, rotation, cropping, shearing, etc [2, 3]. See [4] for a comprehensive survey. With respect to most watermarking algorithms, some of which have characterizations of optimal attacks (which involves scaling and addition of Gaussian or other noise) and proofs that in their models that the attack can be withstood, Stirmark, with its fairly elementary methods, established an important point: they are vulnerable for reasons unclear or unaccounted for in the theoretical model of their design. Using mathematically more sophisticated and remarkably successful models and tools of signal processing (such as estimation, compression, de-noising) for attacks is a natural variation and a motivation for more benchmarks [3]. However, such benchmarks are at best sanity checks, and akin (in cryptography) to running randomness tests such as Diehard[5] or NIST’s[6] (or applying favorite cryptanalytic tools [7] such as differential or linear analysis) to one’s latest cipher. There is no guarantee whatsoever such generic methods apply to a targeted algorithm. For example, Checkmark as an attack on image watermarking algorithm [8] is comparable to jpeg compression; this image watermarking algorithm is yet to yield to targeted and specific estimation attacks, which are in part effective against plain dsss watermarking . It may be non-trivial to make it work near perfectly, for images, see website of [3] for some examples. We remark that issues here may be more complicated than in cryptography: both watermarking algorithms and attacks/benchmarks ail from the same problem: lack of a reasonably robust perceptual metric applicable in this context [9]. Furthermore, most of such attacks aim to jam the detector, rather than extracting the key. Contribution of Our Work: Assuming we are given a specific watermarking algorithm that is well engineered and performs remarkably well against a wide-
228
M. Kıvan¸c Mıh¸cak et al.
range of generic attacks, (e.g., audio analogies of those in [10]), a natural and important question to decide the practical applicability of the algorithm is to ask if there is a ε-perfect attack. We studied a state of the art dsss audio watermarking algorithm in [11] that embeds the wm which also uses error correcting codes for robustness. We study the algorithm for both hidden and known codebooks, with the goal of extracting the secret key, and as a byproduct get a ε-perfect cryptanalytic attack with ε ∼ 0. Our attack exploits the host and wm correlations. We also implemented and tested an attack that does not recover secret key, but was ε-perfect and used more detailed source models and in fact may be better for wider applicability since there may exist some watermarking scheme for which it may be possible to jam, but not recover the key. Owing to space constraints we do not describe this here. We estimate the embedded wm and subtract a scaled version of the estimate from the watermarked data. This is similar to the so-called “remodulation attacks” in [10] and Wiener filtering in [12]; the non-trivial part is to derive a method to perform this task for the targeted algorithm that is theoretically and empirically justifiable. In our results the attacked signal sounded closer to the original than signal with the wm . We use maximum a posteriori estimation which is optimal in the sense of probability of error (unlike the Wiener filter in [12] which is optimal only for stationary Gaussian sources). Furthermore, we employ a Gaussian-based stochastic model for audio coefficients in the log-MCLT transform domain [13]. It is natural to use error correcting codes in watermarking algorithms, and we analytically characterize the tradeoff between the estimation accuracy and redundancy, and provide experimental results on the success of the proposed attack. Moreover, we precisely quantify the performance of the attack in a detection-theoretic sense. We are not aware of such a discussion in related literature (see [10, 12] or their references). Outline of the Paper: In section 2 we give an overview of the method. Section 3 explains our source model, section 4 analyzes correlation detectors for dsss watermarking schemes, section 5 describes the key extraction and section 6 presents the attack and quantifies the degradation of the detector. Section 7 describes the method’s empirical effectiveness against [11]: our key extraction almost always recovers > 90% of the key. For further details, see [14]; we omit our experimental results on images for ss watermarking schemes with no repetition and our attacks are not yet effective against watermarking methods which use explicit randomization and choose watermarking pseudo-randomly from an interval [8]. Notation: We are going to use calligraphic letters for sets, | . | operator for the cardinality of sets, superscripts to index set elements, boldface letters for vectors, and corresponding regular letters with subscripts to index elements of vectors. For example, consider set A where A = a1, a2 , . . . , a|A| and aji denotes the ith element of vector aj . Also let N µ, σ 2 denote the Gaussian distribution
Cryptanalysis of Discrete-Sequence Spread Spectrum Watermarks
229
with mean µ and variance σ 2 , log stand for the natural logarithm and < . , . > represent the inner product corresponding to Euclidean norm. Further notation shall be introduced in Secs. 2, 3 and 4, wherever applicable.
2
Overview of the Proposed Attack
Our attack consists of two steps: ˆ of the wm m using models that exploit 1. Given y, we produce an estimate m correlations in the watermarked image and the wm if any; 2. An estimate for input signal is now made; for example, one can use z = ˆ where α > 0 is an input parameter and y is the input to the y − αm wm detector. For step 1, one can use various methods to suit different source models. A good estimate for wm depends on the specific watermarking algorithm itself, in particular the structure of the codeword set M ⊆ {∆, −∆}N from which the wm m is chosen. Obviously, more accurate estimation will make the attack more effective. On the other hand, the estimation accuracy in step 1, depends on the amount of redundancy used in the design of the codeword set M. The redundancy can be used in the form of repetitions as in most ss based watermarking algorithms. We remark that it is safe for the designers to assume that (i.e., unsafe to assume the contrary) if repetitions help the watermarking detector quantitatively (in terms of robustness against common signal processing attacks), it is likely to help the attack as well. As for a good defense, we suggest explicit randomization of the steps in watermarking algorithm (e.g. as in [8, 15]).
3
Source Model
We assume that the host data is a realization of conditionally independent Gaussian distribution (conditioned on the parameters of the Gaussian distribution). The parameters of the Gaussian distribution can be heavily correlated. Note that, in terms of the assumptions about the source distribution, there is a significant difference between the watermarking algorithm designer and the attacker. The designers could possibly design the wm encoding algorithm based on a source distribution, however from a security perspective it is inherently dangerous to design a watermarking detector relying on a source distribution, since an attacker can in principle “skew” or “bend” the distribution so as to force the detector to produce unwanted outputs. On the other hand, the situation is more or less the opposite for attackers. This is because a fixed watermarking algorithm cannot easily ensure that the outputs do not obey a class of distributions. Hence the attack analysis presented here has its importance as it clarifies the algorithm and points some subtleties. Let the unwatermarked source data be s, a length-N zero mean Gaussian vector whose elements are conditionally independent (conditioned on the variances). Throughout the text, we drop the conditioning from notation for the
230
M. Kıvan¸c Mıh¸cak et al.
sake of simplicity. The correlations in the source are embedded in the correlations between the variances. As a result, we propose a locally i.i.d. (independent identically distributed) source model. Under this model we assume that s consists of segments of length M where N = qM , both q and M are positive integers. 2 Within segment i,2 s is assumed to be i.i.d. Gaussian with variance σi , 1 ≤ i ≤ q, i.e., sj ∼ N 0, σi , (i − 1)M + 1 ≤ j ≤ iM , 1 ≤ i ≤ q, where j ∈ {1, . . . N }. Then, we have q iM s2j 1 √ p (s) = exp − 2 , 2σi 2πσi i=1 j=(i−1)M+1
and
q
log p (s) = −
q
1 1 M log 2πσi2 − 2 i=1 2 i=1
iM j=(i−1)M+1
s2j , σi2
where p(.) is the corresponding probability density function. Variants of this model have been shown to be quite useful within the context of image compression [16] and image denoising [17].
4
On dsss Watermarking Methods
Let m ∈ M be the wm vector (message) which is chosen randomly (under N uniform distribution) from the “codeword set” M where M ⊆ {∆, −∆} (randomization is carried out using the secret key as the seed of the random number generator). For dsss methods, watermarking rule is defined by y = s + m where addition is component-wise and y is the watermarked signal. At the detector end, the purpose is to reliably detect the presence of wm . Under private key blind watermarking scenario, it is assumed that the detector knows the secret key and M, hence m, however it does not know s. Under these conditions, the detector provides a solution to the following binary hypothesis testing problem : H0 : y = s H1 : y = s + m In this paper, we consider the detectors that use “correlation test”, since this is usually the detection rule that is used in the literature for sswatermarking schemes1 . The correlation detector is given by H1 > y i mi τ < i=1 H0
N
1
(4.1)
It can be shown that the correlation detector is not optimal (in the sense of probability of error) in general for non-i.i.d. host signals. The optimality holds only for i.i.d. Gaussian host signals. The performance loss due to suboptimality of the correlation detector shall be quantified in our future work
Cryptanalysis of Discrete-Sequence Spread Spectrum Watermarks
231
Next, we derive the performance of the correlation detector using our source model. Within a detection-theoretic setting, the performance of the detector is characterized by PF (probability of false alarm) and PM (probability of miss)[18], where PF = Pr [deciding on H1 |H0 ] and
PM = Pr [deciding on H0 |H1 ] .
N N y m > τ |H = Pr y m < τ |H and P . Using rule (4.1), PF = Pr i i 0 M i i 1 i=1 i=1
N
q Under H0 , we have i=1 yi mi ∼ N 0, ∆2 M i=1 σi2 and hence τ √ q PF = Q , (4.2) 2 ∆ M i=1 σi ∞ where Q(t) = t √12π exp −u2 /2 du. Under H1 , we have
N
q 2 2 2 i=1 yi mi ∼ N N ∆ , ∆ M i=1 σi and hence N ∆2 − τ √ q PM = Q . 2 ∆ M i=1 σi
(4.3)
The results (4.2) and (4.3) shall be useful in the subsequent sections in order to evaluate the degradation in the performance of the detector after the proposed attack. In the next section, we outline our attack approach and provide a brief discussion. In Secs. 5 and 6, we explain the details of our approach and derive the related results.
5
Key Extraction
We use ML estimation to extract the key. We assume that the estimation of wm will be carried out on watermarked data only, i.e., we assume that the attacker is aware of the presence of a wm and hence, the attack is applied only to watermarked signals. Moreover, we assume that the attacker knows everything about the wm embedding algorithm, except for the secret key, i.e., the attacker knows the domain where wm is embedded, the magnitude of wm samples, ∆ and the set M from which wm is selected; however he does not know the wm sequence itself. Furthermore, we assume that the total wm m consists of concatenation of wm vectors w(i) where w(i) is the wm for segment i, and w(i) is chosen from M set W ⊆ {∆, −∆} , 1 ≤ i ≤ q. First note that, according to MAP (Maximum A Posteriori) rule, the estimation problem is ˆ MAP = argmax p (y|m) f (m) , m m∈M
232
M. Kıvan¸c Mıh¸cak et al.
where f (.) is the probability density function according to which codewords are selected from M. Now, we assume that the choice of a particular wm from set M bears no bias over another one. In that case f (m) = 1/ |M|, ∀m ∈ M and the MAP rule reduces to ML rule : ˆ ML = argmax p (y|m) . m
(5.4)
m∈M
Next, we show that under the assumptions stated above, it is optimal to carry out estimation locally within each segment independent of other segments. (1) (2) (q) ˆ ML . . . w ˆ ML w ˆ ML is a solution to (5.4) where for Lemma 1. The choice of w (i)
ˆ ML is a solution to each i w
(i) ˆ ML = argmax p y(i) |w = argmax log p y(i) |w . w w∈W
w∈W
(5.5)
where y(i) is the watermarked data in segment i, 1 ≤ i ≤ q. Proof : See Appendix A. 5.1
Estimation Analysis - Most General Case
In this section, our goal is to quantify the accuracy of the estimator in terms of finding the probability distribution of the number of wm samples that are estimated inaccurately. Due to Lemma 1, it is optimal to carry out estimation independently in each segment. Thus, we shall follow this procedure. In this section, we will confine our analysis to a single segment. However the results presented here can be extended to the total signal without difficulty once the structure of W is known. For convenience, we drop the superscripts of type (i) that index the segments in this section. Let σ 2 be the variance of each sj within segment i. In the most general case, our goal is to solve the following discrete optimization problem for each segment ˆ ML = argmax log p (y|w) , w
(5.6)
w∈W
where W = w1 , w2 , . . . , w|W| . Since p (yi |wi ) ∼ N wi , σ 2 ,
ˆ ML w
M 1 M = argmax − log 2πσ 2 − 2 (yi − wi )2 2 2σ w∈W i=1
= argmax < y, w >, w∈W
where the last equality follows since wi2 = ∆2 is constant, 1 ≤ i ≤ M . Before stating the result, first we introduce the following definitions:
– e = number of bits that are estimated incorrectly within a segment.
(5.7) (5.8)
Cryptanalysis of Discrete-Sequence Spread Spectrum Watermarks
233
ˆ ML = wi |w = wj (conditional probability of getting wi as a re– pij = Pr w sult of ML estimation where wj was actually embedded). 1 M i j – dij = 2∆ k=1 wk − wk (normalized l1 distance between wi and wj , also can be viewed as the Hamming distance between “variants” of wi and wj in GF2 ). Note that 1 ≤ dij ≤ M for i = j. – Ajk = wi ∈ W|dij = k , 0 ≤ k ≤ M (the set of codewords which are at M distance k from wj ). Note that Aj0 = wj , 1 ≤ j ≤ |W|, k=0 Ajk = W, ∀j.
– M × 1 vector ai where aik = < s, wi − wk >, 1 ≤ k ≤ M .
ij i j k j – M × 1 vector b where bij k = < w , w > − < w , w >, 1 ≤ k ≤ M . T – Ri = E ai ai (autocorrelation matrix of ai ).
Now, we present Lemma 2 where we quantify the results of wm estimation in the most general case and provide closed form expressions for estimation error. Lemma 2 presents closed form expressions for: – Conditional probability of estimating wj where wi is actually embedded (i.e. pij ). – The probability distribution of the error made as a result of the estimation (i.e., the probability distribution of e) Lemma 2. (i) pij = Pr aik + bij ≥ 0, 1 ≤ k ≤ M where aik ∼ N 0, 4∆2 σ 2 dik and bij k k = 2∆2 (dkj − dij ). an eigenvector (ii) Assuming that Ri is strictly positive definite, there exists M ˜ bij i i i iT k ˜ ij = decomposition R = V Λ V such that pij = k=1 Q − √ i where b iT
λk
ij
λik
i
is the k-th element of Λ along the diagonal. V b and
(iii) Pr e = k|w = wj = i s.t.wi ∈Aj pij and k
|W|
1 j pij . Pr [e = k] = |W| i j=1 i s.t.w ∈Ak Proof : See Appendix B. Naturally the characteristic of the estimation error depends on the structure of the codeword set W. In general it is a nontrivial task to find the these characteristics (pij and Pr [e = k]) for arbitrary codeword sets W. In this paper, we concentrate on the special case of “block repetition codes”, i.e., the case when ∆ or −∆ is repeated within each segment. As a result, we derive tractable expressions for the distribution of the error that is made in the estimation process. More detailed analysis on different codes shall be considered in our future work. 5.2
Estimation Analysis - Block Repetition Code In this section, we consider the case where W = w0 , w1 where wk0 = −∆, wk1 = ∆, 1 ≤ k ≤ M .
234
M. Kıvan¸c Mıh¸cak et al.
Lemma 3. √ If W = w0 , w1 , for segment k, p01 = p10 = Q σ∆k M , p00 = p11 = 1 − p01 , Pr [e = 0] = p00 , Pr [e = M ] = p01 , where σk2 is the variance of segment k, 1 ≤ k ≤ q. Proof : See Appendix C. Now, we extend the estimation error result to the whole signal under our locally i.i.d. source model for the block repetition code case. Let
B k = {(l1 , l2 , . . . , lk ) |li = lj fori = j and li ∈ {1, 2, . . . , q} , 1 ≤ i ≤ k} ,
k 1 ≤ k ≤ q with B 0 = ∅ (i.e., B isthe set of all possible k–tuples from the set k q . Also let etotal be the number of bits that {1, 2, . . . q}. Note that B = k are estimated incorrectly in the whole signal. Then we have the following result.
Corollary 1. Pr [etotal = kM ] =
(l1 ,...,lk
)∈Bk
k
m=1
Q
∆ √ M σlm
q m=k+1
1−Q
∆ √ M σlm
.
Corollary 1 is immediate from the independence of codeword selection between different segments. Remark: Clearly, the wm estimation process will perform strictly better than 50% (i.e., in expectation sense strictly more than half of the watermarking bits shall be estimated correctly). The estimation accuracy depends on relative strength of the wm with respect to the signal as well as the wm length and the particular codebook used. For instance, the usage of block repetition codes greatly increases the estimation accuracy. Also as the wm strength increases, the estimation error would in general decrease. Note that, some controlled redundancy and relatively strong wms are usually essential in ss based watermarking schemes in order to provide synchronization against de–synch attacks and withstand against common signal processing attacks. This redundancy is usually provided in terms of using a block repetition code in the watermarking community. Such a watermarking scheme, if designed properly, is expected to withstand most reasonable “blind” attacks (such as de–synchronization attacks, band pass filtering, compression, denoising, etc.), that aim to jam the detector only. We use the term “blind” for an attack if the attack does not make use of the watermarking algorithm used in the system. In general, as a rule of thumb, as the amount of redundancy in the watermarking code and the strength of the wm increase, the robustness against blind attacks is expected to increase. On the other hand, this also brings advantages to a “non–blind” attacker (a non–blind attacker is an attacker that knows the watermarking algorithm completely or has some partial information about the watermarking algorithm), who aims to extract the key.
Cryptanalysis of Discrete-Sequence Spread Spectrum Watermarks
6
235
Analysis of the Proposed Attack
Our proposed attack produces the following signal: ˆ ML = s + m − αm ˆ ML . z = y − αm
(6.9)
Since the attacker is going perform strictly better than “random coin flips” in expectation sense, clearly if α is chosen large enough, the wm detector is expected to fail. If the estimation accuracy is high (say around %90) α ∼ 1 would be sufficient. However, as the estimation accuracy degrades (i.e., gets closer to %50), it would be necessary to use higher values of α in order to increase the PM of detector to a desired level. On the other hand, if α is too large, the attacker is going to introduce unacceptable amount of distortion to the signal. In this section, our goal is to quantify this trade–off. In particular, in Sec. 6.1, we quantify the distortion introduced by the proposed attack in an expected MSE (mean squared error) sense. In Sec. 6.2, we quantify the degradation in wm detection by analyzing the variations in PF and PM of a correlation detector of a dsss scheme after proposed attack. We provide results for block repetition codes. 6.1
Distortion Induced by Proposed Attack
Let
ˆ ML dtotal = z − s = m − αm be the total distortion vector introduced by the attack. Our goal in this section 2 is to find E ||dtotal || . First, we derive the expected MSE in a particular segment. This result can easily be generalized to the whole signal. Afterwards, we specialize to the case of block repetition codes. Lemma 4. Within a particular segment, |W| |W| 2 2 E ||d|| = ∆2 M 1 + α2 + 2α∆2 −M + dij pij , |W| i=1 j=1
(6.10)
ˆ ML and pij = Pr w ˆ ML = wi |w = wj within that segment where d = w − αw and dij is the “Hamming distance” between wi and wj (as defined in Sec. 5.1). Proof : See Appendix D.
Corollary 2. In the block repetition code case, i.e., W = w0 , w1 , where wl0 = −∆, wl1 = ∆, 1 ≤ l ≤ M , we have ∆√ 2 M −1 , (6.11) E ||d|| = ∆2 M 1 + α2 + 2α 2Q σk in segment k, and
q ∆√ 2 2 Q M . E ||dtotal || = ∆ N 1 + α + ∆ M 2α −q + 2 σk 2
2
k=1
Proof : See Appendix E.
(6.12)
236
6.2
M. Kıvan¸c Mıh¸cak et al.
Degradation of wm Detection after Proposed Attack
Now, we quantify the degradation in the performance of the wm detector, which uses correlation test. First recall that we assume the attacker is aware of the presence of the wm i.e., proposed estimation attack is applied only if the input signal is watermarked. Therefore, PF (i.e., probability of detecting wm if wm is not embedded) does not change after the attack. The proposed attack only changes PM (i.e., probability of declaring wm is not present even though wm was embedded). In order to gain insight, we first consider the simplest case, where we consider a single Gaussian random variable. We are able to find tractable closed form expressions in this case. Then, we extend this result to the case of block repetition codes and provide tractable lower bounds on PM . Single Random Variable Case: Consider a single Gaussian random variable s, that is watermarked via dsss : y = s + w, w is randomly chosen from W = {∆, −∆} with a fair coin toss. The detector applies the simple correlation test; then the decision rule is H1 > yw τ, < H0 2 −τ and σ 2 is the variance of s. The where PM = Pr [yw < τ |y = s + w] = Q ∆∆σ estimation attack produces w ˆ = ∆signy. After the attack, the detector input is ˆ z = y − αwˆ = s + w − αw. ˆ Let E = zw = sw + ∆2 − αww. Lemma 5. For the single random variable setup, 2 2 Q ∆ (1+α)−τ − Q ∆ + Q ∆ (1−α)−τ if ∆2 α > τ ∆σ ∆σ 2 σ PM = Pr [E < τ ] = . Q ∆ (1−α)−τ else ∆σ (6.13) Proof : See Appendix F. Remark: There is a nonuniform behavior in the functional behavior of PM . There are 2 regions (2 different functional forms); these regions are determined by comparing ∆2 α with τ . It can be shown that PM is an increasing function of α, however the rate of increase changes in the regions of ∆2 α > τ and ∆2 α ≤ τ . In particular it can be shown that when ∆2 α ≤ τ , PM increases a lot faster with respect to α than the other regime. Hence, for fixed τ , an increase in ∆2 α is very useful for attacker (since the rate of increase in PM increases), i.e. increasing wm strength (∆) and/or attack strength (α) benefits the attacker. Also, for fixed ∆ and α, decreasing τ damages the attackers performance. However, note that, as τ decreases, the chances of declaring unwatermarked data as watermarked (i.e., PF ) increases, which is not shown here.
Cryptanalysis of Discrete-Sequence Spread Spectrum Watermarks
237
Block Repetition Code Case Now, consider the case where block repetition code is used within each segment. Due to our locally i.i.d. model, this case can be viewed as watermarking the length–q signal (recall that q is the number of segments) ˜s, which is given by the local averages of s. Hence, s˜i =
iM 1 ˜i ∼ N 0, σi2 /M for segment i, where 1 ≤ i ≤ q. Furj=(i−1)M+1 sj , and s M ˜ be the length q signal that consists
thermore let y of arithmetic means of the iM 1 watermarked signal y for each segment, i.e., y˜i = M j=(i−1)M+1 yj , 1 ≤ i ≤ q. 0 1 yi = s˜i + ∆). Then for segment i, if w = w (w = w ), then clearly y˜i = s˜i − ∆ (˜ ˜, Hence, the whole setup can be viewed as attacking the watermarked signal y where no ECC (error correction coding) has been used in the wm generation. Utilizing this approach, we can use the result of Lemma 5 in deriving probability of error results for the block repetition code case. Thus, we present the following result. Corollary 3. When W = w0 , w1 , where wk0 = −∆, wk1 = ∆, 1 ≤ k ≤ M , according to 0–mean locally i.i.d. Gaussian model with variance σi2 in segment i, 1 ≤ i ≤ q, N = qM , we have the following: (i) For segment i, we have √ √ √ 2 ∆2 (1−α)−τ ∆ Q M ∆ (1+α)−τ M M − Q + Q if ∆2 α > τ ∆σi σi ∆σi √ . PM = 2 else Q M ∆ (1−α)−τ ∆σi (6.14) ˜ k be the set of all k–tuples from index set {1, 2, . . . , q}, i.e., (ii) Let D " # q k kl kl kl kl k ˜ ˜ ˜ |˜ D = v vi = v˜j for i = j and v˜i ∈ {1, . . . , q} , 1 ≤ i ≤ k, 1 ≤ l ≤ |D | = . k ˜ kl (i.e., elements of the l-th k-tuple Accordingly define the index set of v k ˜ from set D ): V˜ kl = v˜kl , v˜kl , . . . , v˜kl . 0
˜ kl,C
Also let V
1
k
˜ kl
denote {1, . . . , q} − V . Then for the whole signal we have ˜k
D | N |
q
q∆2 (1 + α) − τ − 2kα∆2 , s˜m < ∆, m ∈ V˜ kl , ∆ i=1 k=0 l=1 (6.15) s˜n > ∆, n ∈ V˜ kl,C " q √ q∆2 (1 + α) − τ − 2kα∆2 q ≥ M Q k q∆σmin τ k= 12 (q− α∆ ) 2 k q−k $ √ √ ∆ ∆ M M , (6.16) −Q Q σmin σmin
PM =
Pr
s˜i >
where σmin = min1≤i≤q σi .
238
M. Kıvan¸c Mıh¸cak et al.
The results (6.14) and (6.15) follow directly from the extension of (6.13). In order to obtain (6.16), we use union bound and monotonicity properties of the Q (·) function. The complete proof has been omitted here due to space constraints.
7
Practical Applications
We developed 3 variants of our attack approach and applied them on the state-ofthe-art audio watermarking algorithm [11], which was designed using dsss signal watermarking approach. 7.1
An Overview of the Tested Audio Watermarking Scheme [11]
Here, we provide an outline of [11]. Some of the pre– and post–processing details are omitted due to space constraints. Both wm embedding and detection takes place in the time–frequency representation of the audio clip. In particular MCLT (Modulated Complex Lapped Transform) has been used in order to pass to this domain [13]. 8 bits are hidden in a L = 11 second audio clip. The first 4 bits are used for randomization purposes together with the secret key. The last 4 bits are actually embedded after the randomization process. An L second audio clip is divided into “cells” in the time–frequency domain. There are total of 120 splits along the frequency axis and 24 splits along the time axis. The splitting along the time axis is done in a uniform fashion whereas the splitting along the frequency axis is done in a geometric progression. In the embedding process a codeword set C is used. The set C has the following properties: 6
– C ∈ {∆, −∆} – |C| = 12 – c∈C ⇔ ¯ c ∈ C where c¯ is defined such that c¯i = −ci , 1 ≤ i ≤ 6. Now for each frequency split (there are 120 of them), there are 24 cells to consider. These cells are divided into 4 groups (because there are 4 bits to embed), 6 cells in each group. 1 bit falls to each group. Given 1 bit, an element c is chosen randomly from C. Then, the chosen c is embedded to each group by using the block repetition code, i.e., ci ∈ {∆, −∆} is embedded in the i–the cell of the group. The addition of ±∆ is done in the magnitude MCLT domain after applying log non–linearity. The detector uses the correlation test. The frequency band, that is used both in wm embedding and detection, is 2–7 kHz band. For further details, we refer the reader to [11]. 7.2
Simulation Results Based on Quantitative Analysis of the Attack
Now, we present predictions about proposed attack for the setup of [11] by numerically evaluating our theoretic results that were derived in the previous sections. Our experiments revealed that, the variance of MCLT coefficients (using
Cryptanalysis of Discrete-Sequence Spread Spectrum Watermarks
239
on a 0-mean locally i.i.d. Gaussian model) were in the range of 4 − 6 most of the time (after applying the proper pre-processing mentioned in [11]). Also average cell size for proposed method was about 40 coefficients. In [11], ∆ is typically chosen in the range of [1, 3]. ∆ = 1 is used for numerical simulations of our analysis. Next, we give plots that show the behavior of PM without any attack and after attack. Recall that, although we derive closed form expressions for PM after attack, they are not tractable. Thus, in the plots the curve that gives the behavior of wm detector after attack is a lower bound on actual PM (given by (6.16)). The plots are given in Fig. 1 in Appendix. Consider 0-mean locally i.i.d. Gaussian signal model within each cell, (i.i.d. , 1 ≤ i ≤ q where q independent everywhere), ∆ = 1, and σi = 5 + sin 2πi L is the number of segments (i.e., the number of cells); i indexes each cell in the MCLT domain and L = 5. We assume that each cell consists of 40 coefficients (i.e., M = 40) and we choose τ /number of coefficients = ∆2 /2 to equalize PF and PM when there are no attacks. . In the plots (a) and (b) given in Fig. 1, the solid line shows PM at the detector without any attack (given by (4.3)) and the dashed line shows a lower bound on PM after our proposed attack (given by (6.16)). The lower bound (6.16)) is given by the solid line in Fig. 1(c). First we fix α = 3 and look at the performance of detector in terms of degradation in PM vs N (total number of coefficients) (Fig. 1(a)). Then, we fix N = 80000 and examine the performance with respect to varying α: 0 < α ≤ 5. The degradation in the performance of the detector is shown in Fig. 1(b) with respect to changing α. Note that, the meaning of varying α is changing the strength of the attack, thereby increasing the distortion introduced to the signal. Hence, next we examined the performance degradation of the detector performance with respect to the average distortion introduced by the attack. The degradation in the performance of the detector is shown in Fig. 1(c) vsthe “average normalized distortion” introduced by the attack, i.e., E||dtotal ||2 / N ∆2 given by (6.12). 7.3
Practical Details and Experimental Results of Proposed Attack Methods on Audio Clips
We assume that attacker knows everything about the watermarking algorithm except for the secret key, i.e., the locations of the cells and the codeword set C are known, however the exact codewords, that have been used, are unknown. We applied 3 variations of our estimation attack approach as described briefly below. Method 1: Relying on the 0–mean locally i.i.d. Gaussian source model (i.i.d. within each cell, independent everywhere), we carried out the following ML estimation independently for each group of 6 cells: ˆcML = argmax < y, c >, c∈C
240
M. Kıvan¸c Mıh¸cak et al.
where y is the watermarked signal within each cell. Then for each group of 6 cells, the attacked signal is given by z = y − αˆ cML . Here α is an attack parameter and determines the strength of the attack. Method 2: Relying on the 0–mean locally i.i.d. Gaussian source model (i.i.d. within each cell), this time we assumed that we do not know C. Thus we assumed that 6 C = {∆, −∆} in general. Hence, we carried out estimation independently within each cell. Thus for the i–th cell of a group of cells, we used yj . cˆi,ML = ∆ sign j∈cell i Then for each cell, the attacked signal is given by z = y − αˆ cML . Method 3: In the latest method, we used a slightly different source model: We assumed that the signal is independent Gaussian and locally i.i.d., however not 0–mean in general. This is based on our experimental observations. Based on this model, we used vertical strips (in the time–frequency plane where frequency stands for the y axis) of watermarked data (that cover more than 1 cell) in order to find estimate of the mean at each location. The rationale for using vertical strips that cover more than 1 cell is to cancel the effect of bias due to wm as much as possible in mean estimation. The estimated mean is subtracted from the watermarked signal. Then Method 2 is applied to the resulting signal. Experimental Results: 25 audio clips of length approximately 22 seconds (and of different types and characteristics) have been experimented. 16 bits of wm have been embedded to each of these audio clips; 8 bits in the first 11 second half and 8 bits in the second 11 second half. 7 different wm strength levels have been tried, namely ∆ = 1, 1.5, 2, 2.5, 3, 3.5, 4 (all in dB since wm is added in log magnitude MCLT domain). All 3 attack methods have been tried at all wm strength levels. The wm detector in the proposed scheme of [11] is a normalized correlation detector and naturally we used this detector in our experiments. Furthermore, the detector does not know which attack took place. It has been observed that wm detector always failed to detect the presence of wm after the attack when we chose α ∼ 2 − 3 for Method 1 and Method 3 and α ∼ 4 − 5 for Method 2. After all attacks, we observed very mild or no degradation in the perceptual quality of the attacked audio clip; in fact the attacked audio clip sounded closer to the original than the watermarked clip in most of our experiments. Also, when Method 1 is applied, we observed that more than 90% of wm was estimated correctly on average. Thus, approximate key extraction is achieved in this case.
Cryptanalysis of Discrete-Sequence Spread Spectrum Watermarks
241
Acknowledgments We would like to thank Mariusz Jakubowski of Microsoft Research for providing experimental results on applying [2, 3] to [8].
References [1] I. J. Cox, J. Killian, F. T. Leighton and T. Shamoon, “Secure Spread Spectrum Watermarking for Multimedia,” IEEE Trans. Image Proc., Vol. 6, No. 12, pp. 1673–1687, Dec. 1997. 226 [2] F. A. P. Petitcolas and M. G. Kuhn: StirMark software, available from www.cl.cam.ac.uk/~fapp2/watermarking/image watermarking/stirmark/. 227, 241 [3] S. Pereira, S. Voloshynovskiy, M. Madue˜ no, S. Marchand-Maillet and T. Pun: Checkmark software, available from watermarking.unige.ch/Checkmark/. 227, 241 [4] S. Voloshynovskiy, S. Pereira, T. Pun, J. J. Eggers and J. K. Su, “Attacks on Digital Watermarks: Classification, Estimation-based Attacks and Benchmarks,” IEEE Communications Magazine (Special Issue on Digital watermarking for copyright protection: a communications perspective), F. Bartolini, I. J. Cox, J. Hernandez, F. P´erez-Gonz´ alez, Guest Eds. , Vol. 39, No. 8, pp. 118–127, 2001, Invited paper. 227 [5] G. Marsaglia: Diehard software, available from stat.fsu.edu/~geo/diehard.html. 227 [6] Software for random number generation and testing, available from csrc.nist.gov/rng/. 227 [7] A. J. Menezes, P. C. van Oorschot and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, 1997. 227 [8] R. Venkatesan and M. H. Jakubowski, “Robust Image Watermarking,” Proc. ICIP, Vancouver, B. C., Canada, 2000. 227, 228, 229, 241 [9] R. Venkatesan, “Signal Processing in the Presence of Adversary,” preprint, available from research.microsoft.com/~venkie/. 227 [10] S. Voloshynovskiy, S. Pereira, V. Iquise and T. Pun, “Attack modeling: Towards a second generation benchmark”, Signal Processing, Special Issue on Information Theoretic Issues in Digital Watermarking, Vol. 81, No. 6, pp. 1177–1214, June 2001. 228 [11] D. Kirovski and H. S. Malvar, “Robust Covert Communication Over a Public Audio Channel Using Spread Spectrum,” Proceedings of Information Hiding Workshop, Pittsburgh, PA, 2001. 228, 238, 239, 240 [12] J. K. Su, J. J. Eggers, and B. Girod, “Analysis of Digital Watermarks Subjected to Optimum Linear Filtering and Additive Noise,” Signal Processing, Special Issue on Information Theoretic Issues in Digital Watermarking, Vol. 81, No. 6., pp. 1141–1175, 2001. 228 [13] H. S. Malvar, “A modulated complex lapped transform and applications to audio processing,”, Proc. IEEE ICASSP, Phoenix, AZ, March 1999. 228, 238 [14] M. K. Mıh¸cak, R. Venkatesan and M. Kesal, “Discrete-Sequence Spread Spectrum Watermarking Methods and Estimation Attacks,” preprint, August, 2001. 228 [15] M. K. Mıh¸cak and R. Venkatesan, “Blind Image Watermarking via Derivation and Quantization of Robust Semi-Global Statistics,” Proc. IEEE ICASSP, Florida, FL, June 2002. 229
242
M. Kıvan¸c Mıh¸cak et al.
[16] S. LoPresto, K. Ramchandran and M. T. Orchard, “Image Coding based on Mixture Modeling of Wavelet Coefficients and a Fast Estimation–Quantization Framework,” Proc. Data Compression Conference 1997, Snowbird, Utah, pp. 221—230, 1997. 230 [17] M. K. Mıh¸cak, I. Kozintsev, K. Ramchandran and P. Moulin, “Low-Complexity Image Denoising Based on Statistical Modeling of Wavelet Coefficients,” IEEE Signal Processing Letters, Vol. 6, No. 12, pp. 300—303, Dec. 1999. 230 [18] H. L. Van Trees, Detection, Estimation and Modulation Theory, Wiley, 1968. 231
A
Proof of Lemma 1
ˆ ML = argmaxm∈M The optimization problem (5.4) can be rewritten as m log p (y|m) , which in turn can be rewritten as ˆ ML = m
argmax
q
w(i) ∈W,1≤i≤q i=1
log p y(i) |w(i)
(i) ˆ ML ≥ due to conditional independence of y on m. Now we have log p y(i) |w (i) ˆ ji for all 1 ≤ i ≤ q, 1 ≤ ji ≤ |W| where w ˆ ML are defined by (5.5). log p y(i) |w
q (i) ˆ ML ≥ qi=1 log p y(i) |w ˆ ji for all 1 ≤ ji ≤ |W|. This implies i=1 log p y(i) |w Hence the proof.
B
Proof of Lemma 2
First,we present the proof of part (i). Note that, using (5.8), pij = Pr < y, wi > ≥ < y, wk >, 1 ≤ k ≤ |W| | w = wj . Conditioned on wj , y = s + wj . Hence < y, wi > − < y, wk > |wj = aik + bij k. Since wi − wk are different at dik locationsand at eachof those locations the difference value is ±2∆, we have aik ∼ N 0, 4∆2 σ 2 dik . Also < wi , wj >= ij 2 ∆2 (M − 2dij ), < wk , wj >= ∆2 (M − 2dkj ), and hence 2 bk = 2∆ (dkj2−2dij ). i k j Therefore, < y, w > − < y, w > |w ∼ N 2∆ (dkj − dij ) , 4∆ σ dik
and pij = Pr aik ≥ ∆2 (dij − dkj ) = −bij k , 1 ≤ k ≤ M . Hence the proof of part (i). Next, we give the proof of part (ii). Note that by construction Ri is a positive semidefinite matrix; the assumption of strict positive definiteness is equivalent to having no full correlation between the components of ai (i.e., one cannot be determined from the other with full certainty) and hence this assumption is quite mild. If this assumption is not satisfied, it is possible to decrease dimension until this assumption is satisfied. Now, we briefly show that there exists an eigenvec- i i i iT i ˜k ≥ −˜bij tor decomposition R = V Λ V such that pij = Pr a k ,1 ≤ k ≤ M
Cryptanalysis of Discrete-Sequence Spread Spectrum Watermarks T
243
T
˜ ij = Vi bij . Firstly, note that for any eigenvector de˜i = Vi ai and b where a i composition of R , since the eigenvector matrix is unitary, the norm is preserved T and hence the probability is invariant under the transform with Vi . However T in general, after transforming with an arbitrary Vi , the corresponding probabilities would be the probability of a˜ik ≥ −˜bij k some k and the probability of T ij i ki T ij ˜ a ˜k ≤ −bk for some other k. Now note that a ˜ik = vki ai and ˜bij b k = v where vki is the k-th eigenvector of Ri (i.e., k-th column of Vi ). If for some k, ki the corresponding probability is a ˜ik ≤ −˜bij with −vki and k , we just replace v we still have a valid eigenvector decomposition. Thus there exists an eigenvector ij i i ˜ decomposition of R such that pij = Pr a ˜k ≥ −bk , 1 ≤ k ≤ M . Furthermore, i iT T T T T ˜ ˜i is an inde˜a = Vi E ai ai Vi = Vi Vi Λi Vi Vi = Λi . Hence a E a of pendent Gaussian vector and a ˜ik ∼ N 0, λik where λik is the k-th element ij ˜ b M M ij Λi along the diagonal. Thus, pij = k=1 Pr a ˜ik ≥ −˜bk = k=1 Q − √k i . λk
Hence the proof of part (ii). Part (iii) is obvious.
C
Proof of Lemma 3
Using W = w0 , w1 within each segment, is equivalent to adding −∆ to the arithmetic mean of that segment if w = w0 or adding ∆ to the arithmetic mean of that segment if w = w1 . In that case the estimation problem can be rewritten as yi . w ˆML = argmax w w∈{∆,−∆}
The solution is given by
" ˆ ML = w
w0 if w1
i
i yi < 0, else
for each segment. Then we have Pr [w ˆML = ∆|w = −∆] = Pr ( i yi > 0|w √
= −∆) = Pr ( i si > M ∆) = Q M ∆/σ . From symmetry we have Pr [w ˆML √ = −∆|w = ∆] = Pr [w ˆML = ∆|w = −∆] = Q M ∆/σ .
D
Proof of Lemma 4
After the attack,
" di =
(1 − α) wi if wi = w ˆML,i . (1 + α) wi if wi = w ˆML,i
ˆ ML = wi and w = wj , then Therefore, if w ||d||2 =< d, d > = ∆2 (1 − α)2 (M − dij ) + (1 + α)2 dij = ∆2 M 1 + α2 + 2α (2dij − M ) .
244
M. Kıvan¸c Mıh¸cak et al.
Hence,
|W| |W| 2 ˆ ML = wi , w = wj . E ||d|| = ∆2 M 1+α2 +2α −M +2 dij Pr w i=1 j=1
(D.1) ˆ ML = wi , w = wj = pij Pr w = wj = pij / |W|. Using this in (D.1) But, Pr w yields (6.10).
E
Proof of Corollary 2
By using Lemma 3 in segment k, we have |W| |W|
1 1 dij pij = [d01 p01 + d10 p10 ] = d01 p01 = M Q |W| i=1 j=1 2
∆√ M σk
.
(E.1)
Using (E.1) in (6.10) we get (6.11).(6.12) is a trivial extension of (6.11) to the whole signal.
F
Proof of Lemma 5 PM = Pr [E < τ |w = −∆] Pr [w = −∆] + Pr [E < τ |w = ∆] Pr [w = ∆] = Pr [E < τ |w = −∆] ,
from symmetry. Now, Pr [E < τ |w = −∆] = Pr [E < τ |y > 0, w = −∆] Pr [y > 0|w = −∆] + Pr [E < τ |y < 0, w = −∆] Pr [y < 0|w = −∆] . (F.1) Concentrating on each of the components of (F.1), we get : Pr [y > 0|w = −∆] = Pr [s > ∆] , Pr [y < 0|w = −∆] = Pr [s < ∆] , Pr [E < τ |y > 0, w = −∆] = Pr −s∆ + ∆2 + α∆2 < τ |s > ∆ ∆2 (1 + α) − τ = Pr s > |s > ∆ , ∆ Pr [E < τ |y < 0, w = −∆] = Pr −s∆ + ∆2 − α∆2 < τ |s < ∆ ∆2 (1 − α) − τ = Pr s > |s < ∆ . ∆ Employing (F.2), (F.3), (F.4) and (F.5) in (F.1), we obtain ∆2 (1 + α) − τ Pr [E < τ |w = −∆] = Pr s > , s>∆ ∆
(F.2) (F.3)
(F.4)
(F.5)
(F.6)
Cryptanalysis of Discrete-Sequence Spread Spectrum Watermarks
∆2 (1 − α) − τ , s ∆ 2 ∆ (1 + α) − τ = Pr s > max , ∆ ∆ 2 ∆ (1 − α) − τ + Pr s > , s τ . Thus, if ∆ α > τ , (F.7) can be rewritten as ∆2 (1 + α) − τ ∆2 (1 − α) − τ Pr [E < τ |w = −∆] = Pr s > + Pr ∆ > s > , ∆ ∆ 2 2 ∆ (1+α)−τ ∆ ∆ (1−α)−τ =Q −Q +Q , (F.8) ∆σ σ ∆σ after carrying out necessary manipulations. Similarly, if ∆2 α ≤ τ , (F.7) can be rewritten as ∆2 (1 − α) − τ Pr [E < τ |w = −∆] = Pr [s > ∆] + Pr ∆ > s > , ∆ 2 ∆ (1 − α) − τ , (F.9) =Q ∆σ after carrying out necessary algebra. Combining (F.8) with (F.9), we get (6.13).
246
M. Kıvan¸c Mıh¸cak et al.
1 0.9
0.8
0.8
0.7
0.7
0.6
0.6
PM
1 0.9
P
M
0.5
0.5
0.4
0.4
0.3
0.3
0.2
0.2
0.1
0.1
0
0 0
1
2
3
4
5 N
6
7
8
9
10
0
6
0.5
1
1.5
2
2.5 alpha
x 10
(a) PM before attack(solid) - lower bound on PM after attack (dashed) vs. N , τ = N ∆2 /2
3
3.5
4
4.5
5
(b) PM before attack(solid) - lower bound on PM after attack (dashed) vs, α, τ = N ∆2 /2
1
0.9
0.8
lower bound on P
M
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0
2
4
6
8
10 2
12
14
16
18
20
2
E||dtotal|| /(N∆ )
(c) Lower bound on PM after attack vs E||dtotal ||2 / N ∆2 , τ = N ∆2 /2
Fig. 1. Locally i.i.d. 0–mean Gaussian model is employed where σi = 25 + sin 2πi L , L = 5, 1 ≤ i ≤ q, M = 40, N = M q; τ /number of coefficients = ∆ /2. In plots (a) and (b), solid line shows PM without any attack (4.3); dashed line shows lower bound on PM after proposed attack (6.16) (the latter one is shown by solid line in (c)). In (a), α = 3 is fixed, x-axis shows N . In (b) and (c), N = 80000 is fixed, 0 < α ≤ 5. In (b), x-axis shows α; in (c), x-axis shows the average nor- malized distortion introduced in the attack signal, given by E||dtotal ||2 / N ∆2 (6.12)