AudioWatermarking.info
robust audio watermarking technology homepage

 

If you prefer, you can skip this overview and go directly to the Examples section.
You can also download the AWT User Guide (PDF), which contains detailed information about the product and its usage.

 

AWT at a glance

The watermark is extremely robust. Read details below or proceed to the examples section.

The AWT algorithm implements the so-called "strict watermarking" approach, meaning that the original (non-watermarked) audio stream is required to extract the watermark from the watermarked stream. Watermark extraction is performed by “comparing” the source and watermarked streams (see "Pros and cons" below).

The AWT encoder operates on WAV PCM audio files of almost any practical format: mono or stereo, with sampling rates from 8 to 192 kHz and amplitude resolutions of 8, 16, 24 or 32 bits.

Payloads of virtually any size are supported, although practical payload sizes range from 1 to 20 bytes.

With default parameters, the watermarking data rate is 8 bps for a 1-byte payload, 12 bps for a 2-byte payload, 15 bps for a 4-byte payload, 18 bps for an 8-byte payload, and so on. The rate increases with the payload size because each copy of the watermarking payload carries a constant overhead.
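The quoted rates can be turned into a number of payload copies per minute with simple arithmetic. The sketch below assumes that the quoted rate counts payload bits only; this interpretation is an assumption, although it agrees with the figure of 28 copies per minute given in the Examples section for a 4-byte payload at about 15 bps. The function and constant names are illustrative.

```python
# Default data rates quoted above (bits per second), keyed by payload size in bytes.
DEFAULT_RATE_BPS = {1: 8, 2: 12, 4: 15, 8: 18}


def copies_per_minute(payload_bytes: int, rate_bps: float) -> float:
    """Payload copies per minute, assuming the rate counts payload bits only."""
    payload_bits = 8 * payload_bytes
    return rate_bps * 60 / payload_bits


for size, rate in DEFAULT_RATE_BPS.items():
    copies = copies_per_minute(size, rate)
    print(f"{size}-byte payload at {rate} bps: {copies:.1f} copies per minute")
```

Under this assumption, larger payloads are repeated less often per minute even though the bit rate is higher.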

The watermarking algorithm works in the time domain. The overall idea is to embed a binary watermarking payload into a carrier audio signal by time-shifting blocks of the carrier signal in one or several frequency sub-bands. The algorithm is patent pending*.

The algorithm can be applied to all kinds of audio data. Typical examples are music (pop, jazz, classical, rock and so on), speech recordings and audio samples.

Each copy of the AWT binaries contains a unique numeric identifier that is used during both encoding and decoding. This security feature prevents one user from extracting or even detecting a watermark in the output files of another user.

Download AWT User Guide (PDF) >>>
Download free AWT evaluation package >>>

 

Watermark robustness and aural (im)perceptibility

On robustness…

The proposed watermarking scheme demonstrates extreme robustness to almost all kinds of audio conversions and their combinations. Here are some examples:

  • transcoding using MP3, Ogg Vorbis and other lossy coders
  • acoustic coupling (i.e. the sound travels from the D/A converter to a loudspeaker, then through the air to a microphone and on to the A/D converter)
  • mixing with other signals, noise addition
  • signal cropping, cutting
  • sample rate conversion, even down to extremely low sample rates such as 6 kHz; amplitude re-quantization
  • effect processing, from a simple EQ to an extreme dynamic range compression, reverberation, echo, spectral effects, etc.
  • waveform distortions such as limiting, clipping, slope manipulation, gain control
  • A/D - D/A conversion
  • almost any combination of the above

A quick example: the watermark survives even after several dozen(!) rounds of transcoding to low MP3 bitrates and back, followed by transmission of the transcoded signal through the air (i.e. playing it through a loudspeaker and recording it with a microphone) in the presence of loud background speech. See the Examples section for more details.

You may ask whether this scheme is robust against time scaling (stretching). The answer is both "yes" and "no". The decoder cannot automatically extract the watermark from a time-stretched audio file. However, the watermark still exists in the stretched file, so it can be extracted by manually correcting the signal speed prior to decoding.

On imperceptibility…

The watermark has been found to be practically imperceptible to an average listener on audio equipment of any quality. Depending on your needs, you can adjust the encoding parameters (namely, “density” and “aggressiveness”) to achieve a suitable balance between aural transparency and robustness.

 

How the algorithm works

For your information, here is a high-level description of the watermarking algorithm:

  • The watermark payload is converted into a data packet with error-correction coding and encryption applied.
  • The source (carrier) audio stream is divided into blocks of a certain length.
  • Each block of the signal is associated with the corresponding symbol (bit) of the watermark payload data packet.
  • Each block is then time-shifted by an amount determined by the value of the corresponding payload symbol (bit).
  • The watermark payload data packet is repeated throughout the audio stream as many times as the stream duration permits.
  • The signal consisting of the time-shifted blocks is the output of the algorithm and is stored on disk.
  • At the extraction stage, the two signals (original and watermarked) are synchronized (either automatically or manually) and then compared block by block. The time shift of each block is determined, and the watermark payload symbols are extracted.
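The steps above can be illustrated with a deliberately simplified toy model. The sketch below is not the AWT algorithm: it shifts whole blocks circularly by a fixed number of samples instead of shifting frequency sub-bands, it omits the error-correction and encryption layers, and it assumes the two signals are already synchronized. The names BLOCK, SHIFT, embed and extract, and the values chosen for them, are illustrative assumptions.

```python
import random

BLOCK = 64  # block length in samples (hypothetical value)
SHIFT = 2   # shift magnitude in samples (hypothetical value)


def shift_block(block, k):
    """Circularly shift a list of samples by k positions (negative k shifts left)."""
    k %= len(block)
    return block[-k:] + block[:-k]


def embed(carrier, bits):
    """Shift each block right for bit 1 and left for bit 0, repeating the bits."""
    out = []
    for start in range(0, len(carrier) - BLOCK + 1, BLOCK):
        bit = bits[(start // BLOCK) % len(bits)]
        block = carrier[start:start + BLOCK]
        out.extend(shift_block(block, SHIFT if bit else -SHIFT))
    out.extend(carrier[len(out):])  # incomplete trailing block is copied unchanged
    return out


def correlate(a, b):
    return sum(x * y for x, y in zip(a, b))


def extract(original, marked, n_bits):
    """Compare both signals block by block and vote on every bit position."""
    votes = [0] * n_bits
    usable = min(len(original), len(marked))
    for start in range(0, usable - BLOCK + 1, BLOCK):
        src = original[start:start + BLOCK]
        wm = marked[start:start + BLOCK]
        as_one = correlate(wm, shift_block(src, SHIFT))
        as_zero = correlate(wm, shift_block(src, -SHIFT))
        votes[(start // BLOCK) % n_bits] += 1 if as_one > as_zero else -1
    return [1 if v > 0 else 0 for v in votes]


rng = random.Random(0)
payload = [1, 0, 1, 0, 0, 1, 0, 1]
carrier = [rng.uniform(-1.0, 1.0) for _ in range(BLOCK * len(payload) * 5)]
marked = embed(carrier, payload)
noisy = [s + rng.gauss(0.0, 0.3) for s in marked]  # crude stand-in for distortion
print(extract(carrier, noisy, len(payload)) == payload)
```

Even in this toy form, extraction needs the original carrier, and repeating the payload gives the decoder several votes per bit.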

For increased reliability, various statistical, security, error-correction and signal processing mechanisms are applied.

Outcomes:

  • The number of payload copies embedded in the signal is proportional to the signal duration.
  • The maximum time shift (the algorithm parameter called “aggressiveness”) affects both the robustness and the (im)perceptibility of the watermark: the smaller the shifts, the less robust and the less perceptible the watermark becomes, and vice versa.
  • The block size (the algorithm parameter called “density”) also affects robustness and imperceptibility: the larger the block, the less perceptible the watermark becomes, but the fewer payload copies are embedded, which reduces robustness.

Because each copy of the watermark payload carries a constant overhead, the watermarking rate increases with the payload size.

 

Pros and cons

The main apparent disadvantage of the scheme is that the original audio stream is required to decode the watermark. However, this requirement is also a security feature, because it prevents third parties who do not have the original source from extracting or even detecting the watermark in a watermarked file. Additionally, such a scheme demonstrates extreme robustness that is unreachable for “blind watermarking” schemes, which do not require the source.

Because the signals must be synchronized accurately, decoding generally takes longer than encoding, with the decoding time being proportional to the audio file size. Fortunately, watermark extraction is generally a less frequent procedure than embedding. Also, you typically do not need to process a whole 3-5 minute audio recording to extract the watermark, as 15-40 seconds of audio will generally suffice (depending on the payload size used).

The current implementation of the encoder and decoder is somewhat RAM-hungry. This drawback is due solely to this particular implementation and not to the algorithm. However, given typical present-day RAM sizes and prices, this shortcoming should rarely be a problem.

Watermarking data rate is quite high (12-30 bits per second, depending on parameters).

Thanks to the simplicity of the idea behind it, the algorithm shows great robustness, which puts it a step ahead of competing watermarking solutions.

 

Examples

To show the robustness of AWT watermarks, I have provided audio samples that you can experiment with during your evaluation of AWT. Please do not forget to download the AWT demo package from the main page.

First, download the source audio signals (WAV PCM, 44.1 kHz, 16 bit):
    bach-in.wav
    brahms-in.wav
    gazebo-in.wav
    jarre-in.wav
    speech-in.wav
    yello-in.wav
These are audio recordings of different types: music (pop, electronic, classical) and speech. Note that these are the source files, not watermarked ones; they are used as input for encoding in this demonstration.

All of the above source files have been encoded with the AWT encoder, and the watermark 0xABCDEF12 (4 bytes) has been embedded into each of them. The encoder was run as follows:

    awt_enc source.wav output.wav 0xABCDEF12 -aggressiveness=1.0 -density=1.0

With these parameters the watermarking data rate is approximately 15 bits per second, which results in about 28 copies of the watermarking payload being embedded per minute of audio.

The table below lists the input and output (watermarked) files together with their distorted copies (transcoded, air-transduced, mixed with speech, etc.). You may download and listen to them in order to:
* check the aural transparency of the watermark (by comparing the quality of the source and watermarked files)
* get an impression of the distortions introduced into the original recordings and confirm that the watermarks are still detectable by the AWT decoder even in such heavily distorted files.
You can also edit or distort these files even further to test the robustness of the watermark.

To decode the watermark you need to run:

    awt_dec source.wav output.wav 4 -aggressiveness=1.0 -density=1.0
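When evaluating all six sample files, it may be convenient to generate the command lines rather than type them. The sketch below only builds and prints the commands in the form shown above. It assumes that the third argument of awt_dec (4 in the example) is the payload size in bytes, which the page does not state explicitly; the helper names are illustrative and are not part of AWT.

```python
import shlex

SAMPLES = ["bach", "brahms", "gazebo", "jarre", "speech", "yello"]


def encode_command(name, payload="0xABCDEF12", aggressiveness=1.0, density=1.0):
    """Build the awt_enc call shown above for one sample file."""
    return ["awt_enc", f"{name}-in.wav", f"{name}-out.wav", payload,
            f"-aggressiveness={aggressiveness}", f"-density={density}"]


def decode_command(name, watermarked, payload_bytes=4,
                   aggressiveness=1.0, density=1.0):
    """Build the awt_dec call; payload_bytes is assumed to be the third argument."""
    return ["awt_dec", f"{name}-in.wav", watermarked, str(payload_bytes),
            f"-aggressiveness={aggressiveness}", f"-density={density}"]


for name in SAMPLES:
    print(shlex.join(encode_command(name)))
    print(shlex.join(decode_command(name, f"{name}-out-hardtranscoded.wav")))
```

To actually run a command, each list can be passed to subprocess.run(cmd, check=True), provided the AWT binaries are on the search path.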

You can download the wave files in the table below or just listen to them in place using the embedded audio player.

Source (input) file: original, without watermark | Used watermark | Watermarked (output) file: to ensure that the watermark is indeed inaudible | Transcoded watermarked file*: to test watermark robustness using AWT decoder | Transcoded* and then air transduced partial watermarked file in presence of loud background speech**: to test watermark robustness using AWT decoder
bach-in.wav (50 sec)   | 0xABCDEF12 | bach-out.wav (50 sec)   | bach-out-hardtranscoded.wav (50 sec)   | bach-out-hardtranscoded-airtransduced-voiced_part.wav (24 sec)
brahms-in.wav (50 sec) | 0xABCDEF12 | brahms-out.wav (50 sec) | brahms-out-hardtranscoded.wav (50 sec) | brahms-out-hardtranscoded-airtransduced-voiced_part.wav (34 sec)
gazebo-in.wav (60 sec) | 0xABCDEF12 | gazebo-out.wav (60 sec) | gazebo-out-hardtranscoded.wav (60 sec) | gazebo-out-hardtranscoded-airtransduced-voiced_part.wav (33 sec)
jarre-in.wav (60 sec)  | 0xABCDEF12 | jarre-out.wav (60 sec)  | jarre-out-hardtranscoded.wav (60 sec)  | jarre-out-hardtranscoded-airtransduced-voiced_part.wav (31 sec)
speech-in.wav (30 sec) | 0xABCDEF12 | speech-out.wav (30 sec) | speech-out-hardtranscoded.wav (30 sec) | speech-out-hardtranscoded-airtransduced-voiced_part.wav (22 sec)
yello-in.wav (60 sec)  | 0xABCDEF12 | yello-out.wav (60 sec)  | yello-out-hardtranscoded.wav (60 sec)  | yello-out-hardtranscoded-airtransduced-voiced_part.wav (33 sec)

(*) Transcoded files were created by repeatedly encoding and decoding the watermarked files with the MP3 (Lame) and Ogg Vorbis lossy codecs (18 successive encodings at different bitrates: source WAV -> MP3 256 kbps -> MP3 192 kbps -> MP3 192 kbps -> MP3 128 kbps -> MP3 128 kbps -> MP3 128 kbps -> ...). You can download the batch file that was used for transcoding by clicking here.
(**) Air transduction (acoustic coupling) was performed by playing the lossy transcoded output files* through multimedia loudspeakers and recording the signal with a microphone placed 30 cm from one of the loudspeakers. In addition, loud background speech disturbed the recording throughout, and the recording was then cropped.

In conclusion, the watermark is still detectable even in recordings made through the air from poor-quality MP3 output that has been transcoded 18 times, in the presence of loud background speech and with only a partial recording available! Use the AWT decoder to verify this yourself.

Just for fun, I ran another experiment: I took yello-out-hardtranscoded-airtransduced-voiced_part.wav, introduced additional distortion by severely clipping (limiting) its waveform over the entire signal duration, and then cropped the signal, leaving only 14 seconds of it. The result is here: yello-out-hardtranscoded-airtransduced_voiced_cropped_clipped.wav. You can download this file and verify that the watermark is still detectable even in it!

 

FAQ

Q: The AWT algorithm requires the user to retain all original (non-watermarked) files in order to be able to extract watermarks. Should I store all the source files in their original wave format (PCM), exactly as used for watermark embedding?
A: No, you do not have to. The idea is to keep the non-watermarked sources in reasonably good quality. If your original files are of very high resolution, then for decoding purposes you can store only down-sampled versions, such as 44.1 kHz or 22.05 kHz, 16 bit. If you wish to save even more disk space, you can also use lossless coders such as WavPack or FLAC. If this is still not enough, you can even compress your sources into MP3 (e.g. using the Lame MP3 Encoder), but please make sure that the bitrate you use is high enough (e.g. 160-256 kbps) to keep the audio quality as close to the source as possible and not to compromise watermark extraction performance. Depending on the number of files you are going to watermark, you can choose any of the above options.

Q: Can special “watermarking attacks” compromise watermark robustness of AWT? In other words, are there methods able to destroy AWT watermarks?
A: Yes, of course. It is always possible to develop a special attack targeted at a specific watermarking technique in order to damage the watermark, so the technique used in AWT is not immune to attacks either. However, the author believes that the proposed technique is strong enough to survive most standard watermarking attacks.

Q: Does the use of this watermarking solution compromise the performance of automatic audio recognition services (such as TrackID, MusicBrainz, etc.) when they are applied to the watermarked files?
A: The answer depends on the particular audio recognition technology. As the AWT watermark is transparent to the listener, it should be transparent to a good audio recognition engine as well. AWT watermarking should not harm the recognition results of most widespread audio recognition technologies.

Q: Can one AWT user detect and/or extract watermarks from audio streams of another AWT user?
A: No. Each copy of the AWT binaries contains a unique numeric identifier that is used both during encoding and decoding. This security feature prevents one user from extracting or even detecting a watermark in output files of another user even if both of them have identical source (non-watermarked) files.

Q: Can I extract watermark using only a part of the source file?
A: Yes, you can, but the source file must not be cropped at the head; it can be cropped at the tail only. This is because the blocks are aligned relative to the start of the source stream during the watermarking process. Therefore, for successful watermark extraction, the head of the source stream must be preserved.
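The effect of block alignment can be seen in a small illustration. The sketch below assumes fixed-length blocks counted from the first sample of whatever source the decoder is given; the block length and the helper name are hypothetical and do not reflect actual AWT values.

```python
BLOCK = 1000  # hypothetical block length in samples


def block_starts(n_samples, offset=0):
    """Block start positions, in original-stream coordinates, for a source
    that begins `offset` samples into the original recording."""
    return [offset + i for i in range(0, n_samples - BLOCK + 1, BLOCK)]


full = block_starts(5000)
tail_cropped = block_starts(3500)                    # last 1500 samples removed
head_cropped = block_starts(5000 - 300, offset=300)  # first 300 samples removed

print(tail_cropped == full[:len(tail_cropped)])  # block boundaries still coincide
print(sorted(set(head_cropped) & set(full)))     # no boundary matches any more
```

With the tail removed, the remaining blocks start at the same positions as during embedding; with the head removed, every block boundary moves, so the block-by-block comparison no longer lines up.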

Q: Can I extract watermark from a small fragment of the watermarked stream?
A: Yes, the whole watermarked stream is generally not required. As the watermarking rate is quite high, a portion of the watermarked stream (30-50 seconds) is generally enough to extract the watermark, especially if the audio is only marginally distorted compared to the source file. On the other hand, if the audio stream is heavily distorted, you might need to use a longer portion or the entire available watermarked recording, because analyzing longer signals can significantly improve extraction reliability statistically.
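Why longer fragments help can be illustrated with a simple probability model. This page does not describe how the decoder combines payload copies, so the sketch below rests on two assumptions: each copy yields an independent read of a bit that is wrong with probability p, and the decoder takes a majority vote over n copies.

```python
from math import comb


def majority_error(p, n):
    """Probability that more than half of n independent reads of a bit are wrong."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))


for n in (1, 5, 15, 29):
    print(f"{n:2d} copies: bit error probability {majority_error(0.2, n):.6f}")
```

Under these assumptions the error probability falls quickly as more copies become available, which matches the advice to decode a longer portion of a heavily distorted recording.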

Q: Can I watermark MP3/OGG/… files?
A: Yes, but this requires transcoding. The AWT encoder operates on wave files only and cannot embed watermarks directly into an MP3 bit stream. Therefore, the only way to watermark an MP3 file is to decode it into a wave file, watermark that wave file, and then encode the watermarked wave file back to MP3.
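The three steps can be written down as a command sequence. The sketch below only builds the commands and does not run them. It assumes the LAME command-line encoder for the MP3 steps (any MP3 decoder and encoder would do), reuses the awt_enc parameters from the Examples section, and uses hypothetical file names and an arbitrary 192 kbps output bitrate.

```python
import shlex


def mp3_watermark_commands(mp3_in, mp3_out, payload, bitrate_kbps=192,
                           decoded="decoded.wav", marked="marked.wav"):
    """Return the decode, watermark and re-encode commands as argument lists."""
    return [
        ["lame", "--decode", mp3_in, decoded],               # MP3 -> WAV
        ["awt_enc", decoded, marked, payload,
         "-aggressiveness=1.0", "-density=1.0"],             # embed the watermark
        ["lame", "-b", str(bitrate_kbps), marked, mp3_out],  # WAV -> MP3
    ]


for cmd in mp3_watermark_commands("song.mp3", "song-marked.mp3", "0xABCDEF12"):
    print(shlex.join(cmd))
```

Because extraction needs the non-watermarked source, the decoded wave file (or a good-quality copy of it) should be kept for later decoding.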
