Collusion-Resistant Audio Fingerprinting System in the Modulated Complex Lapped Transform Domain

Jose Juan Garcia-Hernandez; Claudia Feregrino-Uribe; Rene Cumplido

doi:10.1371/journal.pone.0065985

Abstract

Collusion-resistant fingerprinting paradigm seems to be a practical solution to the piracy problem as it allows media owners to detect any unauthorized copy and trace it back to the dishonest users. Despite the billionaire losses in the music industry, most of the collusion-resistant fingerprinting systems are devoted to digital images and very few to audio signals. In this paper, state-of-the-art collusion-resistant fingerprinting ideas are extended to audio signals and the corresponding parameters and operation conditions are proposed. Moreover, in order to carry out fingerprint detection using just a fraction of the pirate audio clip, block-based embedding and its corresponding detector is proposed. Extensive simulations show the robustness of the proposed system against average collusion attack. Moreover, by using an efficient Fast Fourier Transform core and standard computer machines it is shown that the proposed system is suitable for real-world scenarios.

Citation: Garcia-Hernandez JJ, Feregrino-Uribe C, Cumplido R (2013) Collusion-Resistant Audio Fingerprinting System in the Modulated Complex Lapped Transform Domain. PLoS ONE 8(6): e65985. https://doi.org/10.1371/journal.pone.0065985

Editor: Randen Lee Patterson, UC Davis School of Medicine, United States of America

Received: March 20, 2013; Accepted: April 29, 2013; Published: June 6, 2013

Copyright: © 2013 Garcia-Hernandez et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The work was supported by CONACyT (www.conacyt.gob.mx) under grant CB-2010-1-50910. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

In the Information Technology era, expansion of the Internet service together with the rapid advance of high capacity storage systems facilitated the fast and perfect copy of digital content. However, at the same time the use of these technologies causes serious problems, such as unauthorized copying and distribution of digital materials, [1]. Conventional cryptography systems encrypt digital data during its transmission and permit only authorized person to decrypt the encrypted data, nevertheless, once such data are decrypted they are totally vulnerable to illegal copying and distribution. One possible solution to this problem is the fingerprinting paradigm, where, a unique signature (which identifies to the legal user) known as a digital fingerprint is hidden using a watermarking technique into the content previously to distribution. Watermarking has several applications such as: ownership proof [2], secret communications [3], bio-security [4], [5], etc. Digital fingerprinting, which is also a watermarking application, has the capacity of identifying illegal users by extracting the fingerprint of a suspicious copy. A typical attack in fingerprinting systems is the collusion attack, where a group of users combine their copies in order to remove the original fingerprint. If a sufficient number of copies are combined, the noise produced by the collusion attack can disable/confuse the fingerprint detector and prevent the content owner from identifying the illegal users. Although several linear and nonlinear operations can be utilized for a collusion attack, it has been shown that the worst one is the linear averaging [6]. Therefore, it is necessary to design collusion-resistant fingerprints that can identify the greatest number of colluders involved in a pirate copy.

Collusion-resistant fingerprint codes have been proposed as a solution to the collusion attack [6]–[9]. Theoretical results for collusion-resistant fingerprint codes have shown interesting properties against collusion attacks, however, in practical sceneries their performance needs further research as these can be sensible to other kinds of attacks, [9].

On the other hand, Spread Spectrum (SS) modulation is a watermarking technique that has shown to be remarkably robust to several attacks, collusion included, [10]–[13]; therefore, it has been frequently utilized in fingerprinting systems [14]–[16]. The main drawback of fingerprinting schemes based in spread spectrum modulation is their high computational complexity as the number of correlations performed is proportional to the number of possible users. A users grouping approach was proposed in [17]. That idea is based on the consideration of colluders being more likely to have similar geographical area and interests with each other. Users are grouped according to common conditions between them. When a suspicious copy is identified, the first search is about the group IDs and then for user' IDs. The computational complexity is reduced, due to the colluders search is carried out in a tree fashion, [17].

In [18], the use of PN-modulated Discrete Cosine Transform (DCT) basis as fingerprints for digital images is proposed. The DCT operation can be represented as a multiplication between the input vector and one matrix conformed by the DCT basis. That multiplication is equivalent to correlations between the input vector and each column of the DCT matrix. Therefore, a fast DCT algorithm reduces the computational complexity of correlations needed in the IDs detection to the logarithmic scale. The fingerprint is formed by the sum of two PN-modulated DCT basis, one for the group ID and the other for the user ID. In the detection stage, firstly the groups to which colluders belong are detected, and then colluders are detected for each of them. In [19] the interference due to colluder fingerprints is removed and performance of the system in [18] is improved drastically.

The music piracy produces large monetary losses around the world [20], [21], therefore, a tool that helps to mitigate the music piracy is mandatory. However, most of the reported collusion-resistant fingerprinting schemes are devoted to digital images [7], [8], [16], [18], [19], and only very few are validated with audio signals [24]. This paper is about collusion-resistant audio fingerprinting. A collusion-resistant audio fingerprinting system based on some of the ideas developed for digital images in [18], [19] is proposed. Instead of using the full signal as [18], [19] a block-based fingerprint embedding strategy is followed and the corresponding detector is derived. In this paper, the Modulated Complex Lapped Transform (MCLT) domain is utilized as fingerprint channel due to no block-artifact property in audio watermarking systems [11], [25]–[28].

Related Work

Work reported in [24] claims to be able to detect 80 colluders in a pirate audio clip. However, that system seems to be not suitable for real world scenarios. One weakness is about construction of component vector which is carried out using two audio channels in the Fourier domain. Due to a trigonometric function (inverse tangent) is involved in this stage, a simple attack like sign inversion in one audio channel prevents the correct ID detection as multiplication by −1 is equivalent to a phase shift by radians. Moreover, if an audio channel is scaled (volume gain) the relation between both channels will be different to the original and the detection will fail. Detector performance after lossy compression, such as MP3 coding or Advanced Audio Coding (AAC) is not reported. It is important to mention that sign inversion, volume gain and lossy compression are real world scenarios. Neither viability of the system nor the number of users is reported. To the best of our knowledge, it is the only work about collusion-attack resistant fingerprinting in audio signals. Although there are several works about audio fingerprinting in the literature, almost all of them do not consider the collusion-attack [29]–[33].

On the other hand, most of the works about collusion-attack resistant fingerprinting systems are devoted to digital images mainly based in Spread Spectrum techniques. The main drawback of fingerprinting schemes based in SS techniques is their high computational complexity as it is discussed in the Introduction. In order to achieve lower computational complexity than SS-fingerprinting schemes for digital images, in [18] it is proposed to utilize PN-modulated orthogonal sequences. These orthogonal sequences can be obtained from DCT or DFT basis. In the DCT case, each user is related to a DCT matrix column which is defined in equation (26). Therefore, the SS sequence for the th user becomes:(1)

where is a robustness factor, is a PN sequence generated using an initial value , is a secret key, is the th DCT matrix column and is the element-wise multiplication. The sequence is embedded into the frequency components of a digital medium, in this paper audio signals. As an example, Figure 1 shows the SS sequence, , for the user 1890 of 2048 and .

Download:

Figure 1. Spread spectrum sequence for the user 1890 of 2048.

https://doi.org/10.1371/journal.pone.0065985.g001

Unlike other watermarking applications, in the fingerprinting paradigm, detection is usually carried out in a non-blind fashion [34], i.e. the original signal is available to the detector. Under that condition, after subtracting the original sequence from the pirate copy the sequence is obtained. In order to carry out the detection the sequence is obtained by applying the Inverse DCT to which is demodulated by the PN sequence as follows:(2)where InverseDCT(.) denotes a fast inverse discrete cosine transform algorithm as described in the Materials and Methods section. Figure 2 shows the corresponding for detection of the user 1890 out of 2048, as exemplified above.

Download:

Figure 2.

for detection of the user 1890 of 2048.

https://doi.org/10.1371/journal.pone.0065985.g002

From Figure 2, it is possible to observe that a threshold is necessary in order to determine the user under a statistical point of view. If is supposed to be except for a fingerprinted component , it is possible to calculate a threshold according to the probability of false detection [18] as follows:(3)

where erfc() is the complementary error function defined as:(4)

Therefore, the threshold is given by the expression (5),(5)where erfc(.) stands for the inverse complementary error function.

Grouping a set of users has been proposed in the literature as a solution to high computational costs [8], [16]. The assumption behind this proposal is that users who have a similar background and region are more likely to collude each other. In [18] the idea of introducing dependency between two SS sequences by exploiting the property of quasi-orthogonality of PN sequences is proposed. Thus, the fingerprint is integrated by two spread spectrum sequences related to a group ID and an user ID as follows:(6)

where is the robustness factor for groups, is a PN sequence generated with the secret key , is the th basis vector that identified to the th group and(7)

where is the robustness factor for users, is a PN sequence corresponding to the th group, and is the th basis vector that identified the th user.

Then, the fingerprint assigned to the th user of the th group is conformed by:(8)

The energy of the fingerprint is represented by(9)

From equation (8) it is easy to see that a couple of detectors is required, one for the spread spectrum sequence related to group ID and other for the user ID . These detectors are derived from equation (2) as follows:

For group ID detection:(10)

and for user ID detection:(11)

with thresholds, and , derived according to equation (5) as follows:(12)(13)

where and are given false positive probabilities for the group and user ID detection procedures respectively. and are the variance of the group and user ID detection sequences respectively.

The outline of the paper is as follows: First, experimental results and discussion are offered. In the Materials and Methods section, we recall the Modulated Complex Lapped Transform and Discrete Cosine Transform and their fast algorithms used in this work. In the Fingerprinting System section steps are described comprising audio fingerprinting method by DCT modulation in the MCLT domain. Finally, conclusions are offered.

Results and Discussion

The proposed audio fingerprinting system is evaluated under averaging collusion attacks. Through abundant experiments; the operation parameters are determinate too. For experimentations, CD-quality audio files are utilized from a set of 1000 popular music recordings. The probability of false detection is set to for both group () and user ID detection () procedures, as this is a typical value in audio spread spectrum-based watermarking systems [11].

Fingerprint Robustness Determination

In order to determinate the adequate and values in equations (6) and (7); an audio transparency metric is utilized, the Objective Difference Grade (ODG) [35]. An ODG value between and −1 is considered a good perceptual transparency [35]. In the experiment, several audio clips are fingerprinted with different combinations of and values and the ODG metric for each combination is obtained. The limit for practical and values is determinate for ODG as the bigger the fingerprint energy the lower the ODG value. In the spread spectrum watermarking, it is well known that the bigger the watermark energy the bigger the watermark robustness [2]. Therefore, it is interesting to investigate the biggest fingerprint energy values that maintain a good perceptual transparency. Figure 3 shows the ODG region for an average of 10 sets of 225 fingerprinted audio clips.

Download:

Figure 3. ODG region for fingerprinted audio clips.

https://doi.org/10.1371/journal.pone.0065985.g003

In order to provide a reference for practical and values, Table 1 shows the corresponding ODG values for combinations of and values.

Download:

Table 1. ODG values for combinations of

and

values.

https://doi.org/10.1371/journal.pone.0065985.t001

Figure 4 shows the collusion-attack robustness for the combination with the higher acceptable and the combination with the higher acceptable , with colluders from the same group and block length, . It is interesting to note that the detection performance is better when the robustness factor for users is greater than the robustness factor for groups, i.e. , moreover, according to [17], users in a group are more likely to collude with each other, therefore, the number of group IDs involved in a pirate copy must be smaller than the number of colluder IDs. As a consequence, the energy of the user ID PN-sequence must be higher than the group ID PN-sequence, i.e. . Under that asseveration, combinations , , and seem to be good candidates for fingerprint embedding.

Download:

Figure 4. Collusion-attack robustness for

and

.

https://doi.org/10.1371/journal.pone.0065985.g004

Block Length Influence

From Figure 4 it is possible to observe that in the best case, , the number of detected colluders appears low for practical applications. In order to improve the performance of the proposed system, the influence of the block length, , is investigated. Collusion-attack robustness is studied for different block lengths and Figure 5 shows the results for such study.

Download:

Figure 5. Collusion-attack robustness for several block length,

.

https://doi.org/10.1371/journal.pone.0065985.g005

The higher block length the higher collusion-attack robustness. However, for the increase in performance is not significant in comparison with as it can be seen in Figure 5. Moreover, the computing resources for computing FFT in the MCLT and DCT transforms can be critical for some platforms when the number of points is larger [36]. Therefore, , seems to be the best option as it is possible to detect more colluders users from the totality of them. Figure 6 shows the detection rate of colluders in function of block length, . This confirms what mentioned from results in Figure 5.

Download:

Figure 6. Detection rate of colluders in function of block length,

.

https://doi.org/10.1371/journal.pone.0065985.g006

Implementation Issues

In this subsection, the viability of the proposed system is addressed. It can be interesting to measure the computing time of fingerprint embedding and detection as a function of the block sizes as several block sizes have been studied. Table 2 shows time requirements for several block sizes in real-time terms. It is interesting to point out that the computing complexity increases very slightly when the block size increases in two-power factor.

Download:

Table 2. Time requirements for several block sizes.

https://doi.org/10.1371/journal.pone.0065985.t002

According to Table 2, an 80 min. music album can be fingerprinted in about 2.37 min. () which could be attractive for on-line music distribution services.

Let be the number of groups of colluders involved in a pirate copy, it is necessary detection process operations in order to find all of the colluders as one detection process is utilized for group IDs detection and 1 detection process for each detected group in order to identify colluder user IDs. As an example, if an 80 min. music album is pirated by 40 users from 5 groups, the colluders detection in the whole album requires about 15.3 min. () which seems to be a non-prohibitive amount of time for commercial applications. Moreover, if the number of colluders is higher but the number of groups is the same, the computational complexity will be maintained about 15.3 min. as it only depends of number of detected groups.

Audio Clip Requirements for IDs Detection

Due to the nature of the fingerprint insertion process, it is possible to assume that it is not necessary the whole audio clip in the detection process. The IDs detection is carried out by a counter of events that exceed thresholds, therefore, if there are enough events the system achieves its maximum detection capacity. This is expected to happen after a certain number of events and after the behavior of the detector goes stable. In order to validate that claim, the next experiment was carried out: a set of audio clips were fingerprinted with different IDs, and a pirate copy was generated for 2 to 50 colluders; for 1 to 55 seconds of the audio clip ID detection is executed and detected colluders are counted. This experimentation was carried out with 100 different pirate audio clips with and their results are averaged. Figure 7 shows the detector behavior in function of pirate audio clip duration and number of colluders.

Download:

Figure 7. Colluders detection in function of pirate audio clip duration.

https://doi.org/10.1371/journal.pone.0065985.g007

It is interesting to observe that the curve remains without notable changes from 26 seconds to 55 seconds. In other experimentation, using several 30 seconds pirate audio clips, the detector capacity is the same as compared with detection using the whole pirate audio clips, which corroborates the behavior shown in Figure 7. On the other hand, according to Figure 8, for a probability of colluders detection equal to 1, the detector behavior is practically the same for durations longer that 2 seconds.

Download:

Figure 8. Probability of colluders detection for several pirate audio clip durations.

https://doi.org/10.1371/journal.pone.0065985.g008

Lossy Compression Attack

In order to validate the proposed system in a practical scenario, robustness to collusion attack after lossy compression is explored. Advanced Audio Coding (AAC) is used for experimentation as it has shown better performance in perceptual transparency and compression rates terms as compared with MPEG-1 and MPEG-2 Audio Layer 3 [37]. The block length utilized in the experiment is , and the number of audio clips involved is 225. Figure 9 shows the detector performance under collusion attack after AAC compression for several bitrates.

Download:

Figure 9. Collusion-attack robustness for several AAC bitrates.

https://doi.org/10.1371/journal.pone.0065985.g009

It is possible to see from Figure 9 that the lower AAC bitrate the lower performance. Figure 10 shows the detection probability of colluded attacked audio clip after AAC compression for several bitrates. The detector performance reduces about 12% after high quality lossy compression; which is competitive for real work environments.

Download:

Figure 10. Detection rate of colluders in function of several AAC bitrates.

https://doi.org/10.1371/journal.pone.0065985.g010

Gain and Inverse Attack

It is well known that SS watermarking is strong against gain attack, however, in order to corroborate that claim an experiment was carried out. After several pirate audio clips were uniformly scaled in the range [0.5,1.5] in 0.1 steps, the detector performance was the same for all of the scaled values.

Due to the linearity of the embedding domain, when the sign of the pirate audio signal is inverted, the same happens in the embedding domain. Therefore, in order to guarantee the correct detection in an inverse attack scenario, the ID in a block is counted when correlation value is bigger that thresholds and or lower that and , which is a very small change to the detector. This claim was corroborated with several experiments where, in the presence of the inverse attack, the detector performance is not altered

Comparison

To the best of our knowledge, the work reported in [24] is the only one addressing the collusion-resistant fingerprinting problem with audio signals. Table 3 shows a detailed feature comparison of the proposed system against that proposed in [24]. As it was described in the Related Work section, the work reported in [24] is not robust against sign inversion attack whereas, according to the results described in the Results and Discussion section, the proposed system is robust against this type of attack. A very common real-world audio processing operation is volume gain, the proposed system is able to resist this processing while the work reported in [24] does not. Moreover, unlike [24], this paper reports results for lossy compression and system viability, which are real-world scenarios as music distribution is nowadays in compressed format and real-time. Comparing the work reported in [24] with the proposed system in terms of detection probability per number of colluders is a difficult task as it is unclear which value of false alarm, , is considered in that work. The aforementioned work lacks in a statistical analysis of the system performance, therefore, a deepest comparison with the proposed system can be biased.

Download:

Table 3. Comparison between the proposed work and [24].

https://doi.org/10.1371/journal.pone.0065985.t003

Summary of Results

In this paper, a block-based approach for fingerprinting is considered. This consideration is due to two facts: 1) a frequency transform for a full typical audio clip is practically intractable and 2) if the fingerprint is replied each block, then, for detection is not necessary the full pirate audio signal. As a consequence of the block-based approach, the detection is carried out according to the half-normal distribution. Through experimentation, it was shown that about 1 second of CD-quality pirate audio signal is enough for probability of colluders detection equal to 1.

The optimal energy for user and group ID fingerprints in function of ODG metric is also studied. It was observed that the bigger user ID fingerprint energy, , the better detection performance. This characteristic is interesting because users in a group are more likely to collude with each other [17], therefore; the number of group IDs involved in a pirate copy must be minor to the number of colluder IDs.

The impact in the fingerprint detection process of the block length was investigated through experimentation. It was observed that the higher block length the higher collusion-attack robustness. However, for a block length bigger than samples the performance improvement is not significative. Moreover, for a bigger block length the needed computing resources are also bigger and even intractable for some platforms.

For validation purposes, the proposed fingerprinting system was implemented in an standard modern computer using free libraries. The performance is guaranteed to be several times better that the real-time restriction. The proposed system viability is demonstrated.

Finally, the robustness of the proposed system to typical attacks in real-world scenarios, such as lossy compression, gain and inverse attacks, was shown. Then results suggest that the proposed fingerprinting system is suitable for practical applications, therefore, attractive for the music industry.

Materials and Methods

Due to the proposed fingerprinting system utilizes the DCT basis as fingerprint modulators and the insertion domain is the set of MCLT magnitudes, in this section are recalled two Fast Fourier Transform (FFT)-based fast algorithms for MCLT and DCT calculations which are utilized for the proposed fingerprinting system implementation.

Modulated Complex Lapped Transform

The Modulated Complex Lapped Transform (MCLT) is a particular kind of a 2x oversampled generalized DFT filter bank proposed in [38] whose basis are:(14)(15)(16)

with:(17)

and(18)

where is the time-domain index, is the frequency-domain index, is the sample block length and . The MCLT coefficients of the input vector are calculated as with:(19)

Fast MCLT Algorithm.

In [39] it was proposed a FFT-based fast MCLT algorithm. The MCLT coefficients can be obtained as follows:(20)

where(21)

and is the common notation for the complex exponential used in Fourier transforms, namely:(22)

is a point FFT with orthonormal basis function of the input block , which means that MCLT coefficients can be computed by computing FFT of to obtain and carring out the operations with factors .

Fast Inverse MCLT Algorithm.

In order to carry out the inverse MCLT, in [39] is developed the next relation:(23)where are the MCLT coefficients, the superscript * denotes complex conjugation, and the modulation is the same as that in (21). Using (23) we compute the first FFT coefficients of , but it is well known that FFT coefficients must satisfy the conjugate symmetry property:(24)

Finally, we know that and must be real-valued, and after some manipulations:(25)

with and taking the real and imaginary parts, respectively.

Discrete Cosine Transform

The Discrete Cosine Transform (DCT) is a linear and invertible function in the Real Numbers set, originally derived from Chebyshev polynomials [40]. The DCT basis are orthogonal and defined as follows:(26)

where(27)

Fast DCT and Inverse-DCT Algorithms.

It is known that the Fourier transform of a real-even function is real-even, and times the Fourier transform of a real-odd function is real-odd, thus for these symmetry conditions it is not necessary to use complex inputs/output. Therefore, it is possible to compute the DCT or the Discrete Sine Transform (DST) by utilizing an FFT algorithm.

Let be the input vector even around and even around , it is possible to show that DFT() is the non-normalized DCT of , described as follows:(28)

with basis:(29)

The basis set described in equation (29) is non-orthogonal, therefore, it is necessary to normalize equation (28) in order to get the orthogonal transform as follows:(30)

On the other hand, let be the input vector even around and odd around , it is possible to show that DFT() is the non-normalized Inverse DCT of , described as follows:(31)

As in equation (28), it is necessary a normalization procedure in order to get the orthogonal transform. The normalization is carried out as follows:(32)

In the literature, fast algorithms for the DFT have been extensively reported and very efficient software libraries exist [41]. In this work, these libraries are utilized as a module of the DCT and MCLT computing, reducing the effort required for efficient implementation to a butterfly stage implementation for MCLT and a normalization stage implementation for DCT.

The Fingerprinting System

The frequency domain for embedding is the Modulated Lapped Complex Transform (MCLT). In order to bring perceptual transparency, the fingerprint is embedded into MCLT magnitudes while keeping phases without changes.

Fingerprint Embedding.

Instead of [18], [19], in this paper the fingerprint is replicated several times along the audio signal in a block-processing fashion as typical CD-quality music clips are conformed by about 8 million of samples and the embedding/detecting process can become intractable if an orthogonal transform is applied to the whole audio clip. Moreover, by splitting the audio signals in blocks for fingerprinting it is possible to detect colluders with a fraction of the whole audio clip which is demonstrated in the Results section. Each samples-block is 50% overlapped as the MCLT is a lapped transform. Due to MCLT is a 2x oversampled DFT filter bank, audio samples are required in order to compute MCLT coefficients. Figure 11 shows a block diagram of the embedding system.

Download:

Figure 11. Fingerprint embedding system.

https://doi.org/10.1371/journal.pone.0065985.g011

The fingerprint embedding process is carried out as follows: firstly host audio signal is divided into frames of samples per frame. Next, each frame is transformed using the MCLT. Subsequently both magnitude and phase of MCLT are computed. The fingerprint is then added to the MCLT magnitudes while keeping phase without change. The additive technique is utilized for embedding as follows:(33)

where is the fingerprinted MCLT magnitude, is the original MCLT magnitude and is the fingerprint assigned to the jth user of the ith group. Finally, inverse MCLT is applied to both processed magnitude and original phase to get the audio signal with hidden fingerprint. The fingerprint is conformed according to equation (8), the secret key provides the system security in a symmetric-key fashion. The and variables represent the authorized user and its group respectively. The PN-Generators produce pseudo-noise with an uniform distribution.

Fingerprint Detection.

Figure 12 shows the colluders detection system. In fingerprinting systems is a common assumption to get access to the original media. That consideration is taken into account for the proposed system.

Download:

Figure 12. Colluder detection system.

https://doi.org/10.1371/journal.pone.0065985.g012

Detection procedure is carried out in a block fashion as the fingerprint is embedded in the same way. In this paper, a detection strategy using several MCLT magnitude blocks is proposed.

Group ID Detection.

Figure 13 shows the group IDs detection system. For each available MCLT coefficients block, group detection is carried out according to the threshold, , described in equation (12). For the whole pirate audio clip, there is a counter vector that registers the number of times that each component of exceeds .

Download:

Figure 13. Group IDs detection system.

https://doi.org/10.1371/journal.pone.0065985.g013

Instead of the group ID detection in each block where the threshold is computed assuming a Gaussian distribution, the threshold, , for detection in the counter vector must consider other distribution as the lower limit of that distribution will always be zero. In order to know the statistical behavior of the counter vector , 120 different fingerprinted audio clips are utilized. Figure 14 shows the distribution of the counter vector .

Download:

Figure 14. Distribution of the counter vector

.

https://doi.org/10.1371/journal.pone.0065985.g014

As can be seen from Figure 14, can be modeled by the Half-normal distribution, which is defined as follows:(34)

with cumulative distribution function as follows,(35)

For a given threshold, , the false detection probability , is computed by subtracting the cumulative distribution function to the unit as follows:(36)

Figure 15 shows the for a given threshold in a Half-normal distribution.

Download:

Figure 15. Half-normal distribution.

https://doi.org/10.1371/journal.pone.0065985.g015

Using a change of variable in equation (35) it becomes:(37)

where erf(.) is the error function and is related to the complementary error function as:(38)

From equations (36), (37) and (38); the false detection probability for a given threshold can be rewritten as:(39)

Therefore, the threshold, for a given for a group ID detection in a pirate audio clip can be computed as follows:(40)

where is the variance of .

User ID Detection.

Figure 16 shows the user ID detection system. In similar form that group ID detection, for each available MCLT coefficients block, user detection is carried out according to the threshold, , described in equation (13). For the whole pirate audio clip, there is a counter vector that registers the number of times that each component of exceeds . Counter vector is modeled as a Half-normal distribution and the corresponding threshold for a given is calculated according to:

Download:

Figure 16. User IDs detection system.

https://doi.org/10.1371/journal.pone.0065985.g016

(41)where is the variance of . In order to improve user ID detection, the interference due to group ID is previously removed.

Number of Users.

Due to fingerprints are formed by two DCT modulated PN-sequences, the number of possible IDs for each of them is equal to their respective lengths, for both. Therefore, the maximum number of possible users of the system is .

Software Implementation

Audio signals are processed as vectors of float numbers in the range . For audio file manipulations the libsndfile library [42] is used. The entire fingerprinting system was implemented in C language in a Intel Core i7 CPU and 8 GB RAM. The compiler used in this work is GCC version 4.2.1 and the operating system is Mac Os X version 10.7.5. The programs are compiled with -o3 optimization flag. In order to compute the FFT for the fast MCLT and DCT algorithms described above, the FFTW library [41] is utilized.

FFTW is an optimized library that implements most of the variants of the Discrete Fourier Transform. Moreover, FFTW is able to exploit the Message Passing Interface (MPI) and multi-threaded strategies in order to utilize the full power of modern personal computers. Due to the computer used for validation of the proposed fingerprinting system is a multi-core shared-memory computer, the instantiation of FFTW is carried out using multi-threaded calls.

Conclusions

In this paper, a collusion-resistant fingerprinting system for audio signals is proposed. Each fingerprint is formed by a PN-sequence representing a group ID and other representing one user ID, following state-of-the-art ideas for fingerprinting systems in digital images. Due to nature of audio signals, the fingerprint is replied several times along the audio clip, therefore, it is not necessary the whole audio clip in the detection process. This characteristic guarantees the performance to be several times better that the real-time restriction. The detector performance after high quality lossy compression remains competitive for real work environments. The number of users available, the low computational complexity and the high quality lossy compression robustness make the proposed algorithm attractive for a number of audio processing applications.

Author Contributions

Conceived and designed the experiments: JJG CFU RC. Performed the experiments: JJG CFU RC. Analyzed the data: JJG CFU RC. Contributed reagents/materials/analysis tools: JJG CFU RC. Wrote the paper: JJG CFU RC.

References

1. Zhang X, Liu Q, Wang H (2012) Ontologies for intellectual property rights protection. Expert Systems with Applications 39: 1388–1400.
- View Article
- Google Scholar
2. Cox I, Miller M, Bloom J, Fridrich J, Kalker T (2007) Digital Watermarking and Steganography, 2nd Ed. (The Morgan Kaufmann Series in Multimedia Information and Systems). Morgan Kaufmann, 2nd edition.
3. Fridrich J (2009) Steganography in Digital Media: Principles, Algorithms, and Applications. Cambridge University Press.
4. Jupiter D, Ficht T, Samuel J, Qin Q, de Figueiredo P (2010) Dna watermarking of infectious agents: Progress and prospects. PLoS ONE 6: e1000950.
- View Article
- Google Scholar
5. Liss M, Daubert D, Brunner K, Kliche K, Hammes U, et al. (2012) Embedding permanent watermarks in synthetic genes. PLoS ONE 7: e42465.
- View Article
- Google Scholar
6. Kiyavash N, Moulin P, Kalker T (2009) Regular simplex fingerprints and their optimality properties. IEEE Transactions on Information Forensics and Security 4: 318–329.
- View Article
- Google Scholar
7. Boneh D, Shaw J (1998) Collusion-secure fingerprinting for digital data. IEEE Transactions on Information Theory 44: 1897–1905.
- View Article
- Google Scholar
8. Trappe W, Wu M, Wang J, Liu R (2003) Anti-collusion fingerprint for multimedia. IEEE Transactions on Signal Processing 51: 1069.
- View Article
- Google Scholar
9. Tardos G (2003) Optimal probabilistic fingerprint codes. In: Proceedings of 35th ACM Symposium on Theory Computing. pp. 116–125.
10. Cox I, Kilian J, Leighton T, Shamoon T (1997) Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing 6: 1673–1687.
- View Article
- Google Scholar
11. Kirovski D, Malvar H (2003) Spread spectrum watermarking of audio signals. IEEE Transactions on Signal Processing 51: 1020–1033.
- View Article
- Google Scholar
12. Tsai MJ, Luo YF (2009) Service-oriented grid computing system for digital rights management (gc-drm). Expert Systems with Applications 36: 10708–10726.
- View Article
- Google Scholar
13. Geetha S, Ishwarya N, Kamaraj N (2010) Audio steganalysis with hausdorff distance higher order statistics using a rule based decision tree paradigm. Expert Systems with Applications 37: 7469–7482.
- View Article
- Google Scholar
14. Kirovski D, Malvar H, Yacobi Y (2004) A dual watermark-fingerprint system. IEEE Multimedia 11: 59–73.
- View Article
- Google Scholar
15. Zhao HV, Wu M, Wang ZJ, Liu KJR (2005) Forensic analysis of nonlinear collution attacks for multimedia fingerprinting. IEEE Transactions on Image Processing 14: 646–61.
- View Article
- Google Scholar
16. Wang ZJ, Wu M, Zhao HV, Trappe W, Liu KJR (2005) Anti-collusion forensics of multimedia fingerprinting using orthogonal modulation. IEEE Transactions on Image Processing 14: 804–821.
- View Article
- Google Scholar
17. Wang ZJ, Wu M, Trappe W, Liu KJR (2004) Group-oriented fingerprinting for multimedia forensics. EURASIP Journal on Advances in Signal Processing 2004: 2153–2173.
- View Article
- Google Scholar
18. Kuribayashi M (2011) Hierarchical spread spectrum fingerprinting scheme based on the cdma technique. EURASIP Journal on Information Security 2011.
19. Kuribayashi M (2012) Interference removal operation for spread spectrum fingerprinting scheme. IEEE Transactions on Information Forensics and Security 7: 403–417.
- View Article
- Google Scholar
20. Siwek SE (2007) The true cost of sound recording piracy to the u. s. economy. Policy Report 188, Institute for Policy Innovation.
21. IIPA (2010) Iipa 2010 “special 301” recommendations. Available: http://www.iipa.com/rbc/2010/2010SPEC301LOSSLEVEL.pdf. Accessed 2013 March 6.
22. Kim Wg, Lee S, Seo Ys (2006) Image fingerprinting scheme for print-and-capture model. In: Zhuang Y, Yang SQ, Rui Y, He Q, editors, Advances in Multimedia Information Processing -PCM 2006, Springer Berlin/Heidelberg, volume 4261 of Lecture Notes in Computer Science. pp. 106–113.
23. Celik MU, Sharma G, Tekalp AM (2004) Collusion-reilient fingerprinting by random pre-warping. IEEE Signal Processing Letters 11: 831–835.
- View Article
- Google Scholar
24. Tirkel A, Hall TE, Osborne CF, Meinhold N, Moreno O (2011) Collusion resistant fingerprinting of digital audio. In: Proceedings of the 4th international conference on Security of information and networks. ACM, pp. 5–12.
25. Kirovski D, Malvar H (2001) Robust covert communication over a public audio channel using spread spectrum. In: 4th International Information Hiding Workshop. pp. 354–368.
26. Zezula R, Misurec J (2006) Audio signal watermarking in mclt domain with the aid of 2d pattern. In: Proceedings of 2nd International Conference on Digital Telecommunications, ICDT '07. p. 40.
27. Garcia-Hernandez JJ, Nakano M, Perez H (2008) Data hiding in audio signals using rational dither modulation. IEICE Electron Express 5: 217–222.
- View Article
- Google Scholar
28. Garcia-Hernandez JJ, Feregrino-Uribe C, Cumplido R, Reta C (2011) On the implementation of a hardware architecture for an audio data hiding system. Journal of Signal Processing Systems 64: 457–468.
- View Article
- Google Scholar
29. Cano P, Batlle E, Kalker T, Haitsma J (2002) A review of algorithms for audio fingerprinting. In: IEEE Workshop on Multimedia Signal Processing. IEEE, pp. 169–173.
30. Cano P, Batlle E, Gomez E, Gomes LCT, Bonnet M (2005) Audio Fingerprinting: Concepts and Applications, Springer-Verlag, chapter 2. Studies in Computational Intelligence. pp. 233–245.
31. Jang D, Jin M, Lee J, Lee S, Lee S, et al.. (2006) Automatic commercial monitoring for tv broadcasting using audio fingerprinting. In: AES, editor, 29th International Conference: Audio for Mobile and Handheld Devices. 3–2.
32. Baluja S, Coveli M (2008) Waveprint: Efficient wavelet-based audio fingerprinting. Pattern Recognition 41: 3467–3480.
- View Article
- Google Scholar
33. Bellettini C, Mazzini G (2010) A framework for robust audio fingerprinting. Journal of Communications 5: 409–424.
- View Article
- Google Scholar
34. Liu KJR, Trappe W, Wang ZJ, Wu M, Zhao H (2005) Multimedia Fingerprinting Forensics for Traitor Tracing, volume 4 of EURASIP Book Series on Signal Processing and Communications. Hindawi Publishing Corporation.
35. Thiede T, Treurniet W, Bitto R, Schmidmer C, Sporer T, et al. (2000) Peaq - the itu standard for objective measurement of perceived audio quality. AES 48: 3–29.
- View Article
- Google Scholar
36. Smith SW (2002) Digital Signal Processing: A Practical Guide for Engineers and Scientists. Newnes.
37. Brandenburg K (1999) Mp3 and aac explained. In: AES, editor, Audio Engineering Society Conference: 17th International Conference: High-Quality Audio Coding. 1709, pp. 1–12.
38. Malvar HS (1999) A modulated complex lapped transform and its applications to audio processing. In: IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, volume 3 , pp. 1421–1424.
39. Malvar HS (2003) Fast algorithm for the modulated complex lapped transform. IEEE Signal Processing Letters 10: 8–10.
- View Article
- Google Scholar
40. Ahmed N, Natarajan T, Rao KR (1974) Discrete cosine transform. IEEE Transactions on Computers 23: 90–93.
- View Article
- Google Scholar
41. Frigo M, Johnson SG (2005) The desing and implementation of fftw3. Proceedings of the IEEE 93: 216–231.
- View Article
- Google Scholar
42. de Castro Lopo E (2011) Libsndfile. Available: http://www.mega-nerd.com/libsndfile/. Accessed 2013 January 25.

[ref1] 1. Zhang X, Liu Q, Wang H (2012) Ontologies for intellectual property rights protection. Expert Systems with Applications 39: 1388–1400.
View Article
Google Scholar

[2] View Article

[3] Google Scholar

[ref2] 2. Cox I, Miller M, Bloom J, Fridrich J, Kalker T (2007) Digital Watermarking and Steganography, 2nd Ed. (The Morgan Kaufmann Series in Multimedia Information and Systems). Morgan Kaufmann, 2nd edition.

[ref3] 3. Fridrich J (2009) Steganography in Digital Media: Principles, Algorithms, and Applications. Cambridge University Press.

[ref4] 4. Jupiter D, Ficht T, Samuel J, Qin Q, de Figueiredo P (2010) Dna watermarking of infectious agents: Progress and prospects. PLoS ONE 6: e1000950.
View Article
Google Scholar

[7] View Article

[8] Google Scholar

[ref5] 5. Liss M, Daubert D, Brunner K, Kliche K, Hammes U, et al. (2012) Embedding permanent watermarks in synthetic genes. PLoS ONE 7: e42465.
View Article
Google Scholar

[10] View Article

[11] Google Scholar

[ref6] 6. Kiyavash N, Moulin P, Kalker T (2009) Regular simplex fingerprints and their optimality properties. IEEE Transactions on Information Forensics and Security 4: 318–329.
View Article
Google Scholar

[13] View Article

[14] Google Scholar

[ref7] 7. Boneh D, Shaw J (1998) Collusion-secure fingerprinting for digital data. IEEE Transactions on Information Theory 44: 1897–1905.
View Article
Google Scholar

[16] View Article

[17] Google Scholar

[ref8] 8. Trappe W, Wu M, Wang J, Liu R (2003) Anti-collusion fingerprint for multimedia. IEEE Transactions on Signal Processing 51: 1069.
View Article
Google Scholar

[19] View Article

[20] Google Scholar

[ref9] 9. Tardos G (2003) Optimal probabilistic fingerprint codes. In: Proceedings of 35th ACM Symposium on Theory Computing. pp. 116–125.

[ref10] 10. Cox I, Kilian J, Leighton T, Shamoon T (1997) Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing 6: 1673–1687.
View Article
Google Scholar

[23] View Article

[24] Google Scholar

[ref11] 11. Kirovski D, Malvar H (2003) Spread spectrum watermarking of audio signals. IEEE Transactions on Signal Processing 51: 1020–1033.
View Article
Google Scholar

[26] View Article

[27] Google Scholar

[ref12] 12. Tsai MJ, Luo YF (2009) Service-oriented grid computing system for digital rights management (gc-drm). Expert Systems with Applications 36: 10708–10726.
View Article
Google Scholar

[29] View Article

[30] Google Scholar

[ref13] 13. Geetha S, Ishwarya N, Kamaraj N (2010) Audio steganalysis with hausdorff distance higher order statistics using a rule based decision tree paradigm. Expert Systems with Applications 37: 7469–7482.
View Article
Google Scholar

[32] View Article

[33] Google Scholar

[ref14] 14. Kirovski D, Malvar H, Yacobi Y (2004) A dual watermark-fingerprint system. IEEE Multimedia 11: 59–73.
View Article
Google Scholar

[35] View Article

[36] Google Scholar

[ref15] 15. Zhao HV, Wu M, Wang ZJ, Liu KJR (2005) Forensic analysis of nonlinear collution attacks for multimedia fingerprinting. IEEE Transactions on Image Processing 14: 646–61.
View Article
Google Scholar

[38] View Article

[39] Google Scholar

[ref16] 16. Wang ZJ, Wu M, Zhao HV, Trappe W, Liu KJR (2005) Anti-collusion forensics of multimedia fingerprinting using orthogonal modulation. IEEE Transactions on Image Processing 14: 804–821.
View Article
Google Scholar

[41] View Article

[42] Google Scholar

[ref17] 17. Wang ZJ, Wu M, Trappe W, Liu KJR (2004) Group-oriented fingerprinting for multimedia forensics. EURASIP Journal on Advances in Signal Processing 2004: 2153–2173.
View Article
Google Scholar

[44] View Article

[45] Google Scholar

[ref18] 18. Kuribayashi M (2011) Hierarchical spread spectrum fingerprinting scheme based on the cdma technique. EURASIP Journal on Information Security 2011.

[ref19] 19. Kuribayashi M (2012) Interference removal operation for spread spectrum fingerprinting scheme. IEEE Transactions on Information Forensics and Security 7: 403–417.
View Article
Google Scholar

[48] View Article

[49] Google Scholar

[ref20] 20. Siwek SE (2007) The true cost of sound recording piracy to the u. s. economy. Policy Report 188, Institute for Policy Innovation.

[ref21] 21. IIPA (2010) Iipa 2010 “special 301” recommendations. Available: http://www.iipa.com/rbc/2010/2010SPEC301LOSSLEVEL.pdf. Accessed 2013 March 6.

[ref22] 22. Kim Wg, Lee S, Seo Ys (2006) Image fingerprinting scheme for print-and-capture model. In: Zhuang Y, Yang SQ, Rui Y, He Q, editors, Advances in Multimedia Information Processing -PCM 2006, Springer Berlin/Heidelberg, volume 4261 of Lecture Notes in Computer Science. pp. 106–113.

[ref23] 23. Celik MU, Sharma G, Tekalp AM (2004) Collusion-reilient fingerprinting by random pre-warping. IEEE Signal Processing Letters 11: 831–835.
View Article
Google Scholar

[54] View Article

[55] Google Scholar

[ref24] 24. Tirkel A, Hall TE, Osborne CF, Meinhold N, Moreno O (2011) Collusion resistant fingerprinting of digital audio. In: Proceedings of the 4th international conference on Security of information and networks. ACM, pp. 5–12.

[ref25] 25. Kirovski D, Malvar H (2001) Robust covert communication over a public audio channel using spread spectrum. In: 4th International Information Hiding Workshop. pp. 354–368.

[ref26] 26. Zezula R, Misurec J (2006) Audio signal watermarking in mclt domain with the aid of 2d pattern. In: Proceedings of 2nd International Conference on Digital Telecommunications, ICDT '07. p. 40.

[ref27] 27. Garcia-Hernandez JJ, Nakano M, Perez H (2008) Data hiding in audio signals using rational dither modulation. IEICE Electron Express 5: 217–222.
View Article
Google Scholar

[60] View Article

[61] Google Scholar

[ref28] 28. Garcia-Hernandez JJ, Feregrino-Uribe C, Cumplido R, Reta C (2011) On the implementation of a hardware architecture for an audio data hiding system. Journal of Signal Processing Systems 64: 457–468.
View Article
Google Scholar

[63] View Article

[64] Google Scholar

[ref29] 29. Cano P, Batlle E, Kalker T, Haitsma J (2002) A review of algorithms for audio fingerprinting. In: IEEE Workshop on Multimedia Signal Processing. IEEE, pp. 169–173.

[ref30] 30. Cano P, Batlle E, Gomez E, Gomes LCT, Bonnet M (2005) Audio Fingerprinting: Concepts and Applications, Springer-Verlag, chapter 2. Studies in Computational Intelligence. pp. 233–245.

[ref31] 31. Jang D, Jin M, Lee J, Lee S, Lee S, et al.. (2006) Automatic commercial monitoring for tv broadcasting using audio fingerprinting. In: AES, editor, 29th International Conference: Audio for Mobile and Handheld Devices. 3–2.

[ref32] 32. Baluja S, Coveli M (2008) Waveprint: Efficient wavelet-based audio fingerprinting. Pattern Recognition 41: 3467–3480.
View Article
Google Scholar

[69] View Article

[70] Google Scholar

[ref33] 33. Bellettini C, Mazzini G (2010) A framework for robust audio fingerprinting. Journal of Communications 5: 409–424.
View Article
Google Scholar

[72] View Article

[73] Google Scholar

[ref34] 34. Liu KJR, Trappe W, Wang ZJ, Wu M, Zhao H (2005) Multimedia Fingerprinting Forensics for Traitor Tracing, volume 4 of EURASIP Book Series on Signal Processing and Communications. Hindawi Publishing Corporation.

[ref35] 35. Thiede T, Treurniet W, Bitto R, Schmidmer C, Sporer T, et al. (2000) Peaq - the itu standard for objective measurement of perceived audio quality. AES 48: 3–29.
View Article
Google Scholar

[76] View Article

[77] Google Scholar

[ref36] 36. Smith SW (2002) Digital Signal Processing: A Practical Guide for Engineers and Scientists. Newnes.

[ref37] 37. Brandenburg K (1999) Mp3 and aac explained. In: AES, editor, Audio Engineering Society Conference: 17th International Conference: High-Quality Audio Coding. 1709, pp. 1–12.

[ref38] 38. Malvar HS (1999) A modulated complex lapped transform and its applications to audio processing. In: IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, volume 3 , pp. 1421–1424.

[ref39] 39. Malvar HS (2003) Fast algorithm for the modulated complex lapped transform. IEEE Signal Processing Letters 10: 8–10.
View Article
Google Scholar

[82] View Article

[83] Google Scholar

[ref40] 40. Ahmed N, Natarajan T, Rao KR (1974) Discrete cosine transform. IEEE Transactions on Computers 23: 90–93.
View Article
Google Scholar

[85] View Article

[86] Google Scholar

[ref41] 41. Frigo M, Johnson SG (2005) The desing and implementation of fftw3. Proceedings of the IEEE 93: 216–231.
View Article
Google Scholar

[88] View Article

[89] Google Scholar

[ref42] 42. de Castro Lopo E (2011) Libsndfile. Available: http://www.mega-nerd.com/libsndfile/. Accessed 2013 January 25.

Figures

Abstract

Introduction

Related Work

Results and Discussion

Fingerprint Robustness Determination

Block Length Influence

Implementation Issues

Audio Clip Requirements for IDs Detection

Lossy Compression Attack

Gain and Inverse Attack

Comparison

Summary of Results

Materials and Methods

Modulated Complex Lapped Transform

Fast MCLT Algorithm.

Fast Inverse MCLT Algorithm.

Discrete Cosine Transform

Fast DCT and Inverse-DCT Algorithms.

The Fingerprinting System

Fingerprint Embedding.

Fingerprint Detection.

Group ID Detection.

User ID Detection.

Number of Users.

Software Implementation

Conclusions

Author Contributions

References