AWT1 at a glance
AWT1 implements a "strict watermarking" algorithm, i.e. the source (non-watermarked) audio file is required in order to find and decode the watermark in a watermarked recording. Watermark extraction is performed by "comparing" the source and watermarked streams (see "Pros and cons" below).
The watermark produced by AWT1 is extremely, insanely robust. Read details below or proceed to the examples section.
The AWT1 encoder and decoder operate on WAV PCM (.wav) audio files of almost any format: mono or stereo, with sampling rates from 8 to 192 kHz and amplitude resolutions of 8/16/24/32 bits. Additional audio formats (such as MP3, OGG, AMR, etc.) are supported by the decoder via an external tool, FFmpeg (http://ffmpeg.org).
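If a recording is only available in a compressed format, one option is to convert it to WAV PCM with FFmpeg yourself before decoding. The sketch below only builds the FFmpeg command line and does not run it. The file names and the 44.1 kHz, 16-bit, stereo target are assumptions chosen for illustration, and how the AWT1 decoder itself calls FFmpeg is not described here.

```python
import shlex

def ffmpeg_to_wav_cmd(src, dst, sample_rate=44100, channels=2):
    """Build an FFmpeg command that converts `src` to 16-bit PCM WAV.

    The default sample rate and channel count are illustrative assumptions,
    not values required by AWT1.
    """
    return [
        "ffmpeg",
        "-i", src,                # input file in any format FFmpeg can read
        "-ar", str(sample_rate),  # output sampling rate in Hz
        "-ac", str(channels),     # number of output channels
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        dst,
    ]

# Hypothetical file names, for illustration only.
cmd = ffmpeg_to_wav_cmd("recording.mp3", "recording.wav")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.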
Virtually any watermarking payload size is supported (subject to limitations in different AWT1 packages). Feasible payload sizes range from 1 to 20 bytes. The recommended payload size for uncompromised robustness is up to 8 bytes.
With default parameters, the watermarking data rate is 8 bps for a 1-byte payload, 12 bps for a 2-byte payload, 15 bps for a 4-byte payload, 18 bps for an 8-byte payload, etc. The data rate increases with the payload size because each watermark copy carries a constant overhead, which is spread over more payload bits as the payload grows. A special encoder parameter allows adjusting the data rate, making it higher or lower than the default.
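To see what these rates mean in practice, the following sketch converts them into the time taken by one watermark copy and the number of copies per minute. It assumes that the quoted data rate counts payload bits only, which agrees with the figure of 28 copies per minute for a 4-byte payload given in the examples section. The function names are hypothetical.

```python
# Default data rates quoted above: payload size in bytes -> bits per second.
DEFAULT_RATE_BPS = {1: 8, 2: 12, 4: 15, 8: 18}

def seconds_per_copy(payload_bytes, rate_bps):
    """Time needed to embed one copy of the payload (assumes payload bits only)."""
    return payload_bytes * 8 / rate_bps

def copies_per_minute(payload_bytes, rate_bps):
    """Number of payload copies embedded in 60 seconds of audio."""
    return 60 * rate_bps / (payload_bytes * 8)

for size, rate in DEFAULT_RATE_BPS.items():
    print(f"{size}-byte payload: "
          f"{seconds_per_copy(size, rate):.2f} s per copy, "
          f"{copies_per_minute(size, rate):.1f} copies per minute")
```

Under this assumption a single copy takes roughly 1 to 4 seconds for payloads of 1 to 8 bytes, so longer payloads leave fewer redundant copies in a recording of a given length.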
The algorithm can be applied to all kinds of audio data. Typical examples: music (pop, jazz, classics, rock), speech recordings, instrument samples, etc.
Each copy of the AWT1 binaries with a particular Serial Number (SN) contains a unique numeric identifier that is used during encoding to scramble the watermark payload. This security feature prevents one AWT1 user (with one SN) from extracting watermarks from watermarked files created by another AWT1 user (with another SN).
AWT1 is very fast: encoding is at least 100 times faster than real time on a modest Intel Core 2 Duo E6750 @ 2.6 GHz using 1 core.
Watermark robustness and aural (im)perceptibility
The proposed watermarking scheme demonstrates very high robustness to almost all kinds of audio conversions. Here are some typical examples:
A quick example: the watermark survives even after several dozen(!) transcodings to low-bitrate MP3 and back, followed by transmission of the transcoded signal through the air (i.e. playing it through a loudspeaker and recording it with a microphone) in the presence of loud background speech. See the "examples" section for more details.
You may ask whether this scheme is robust against time scaling (stretching). The answer is both "yes" and "no". The decoder is unable to automatically extract the watermark from a time-stretched audio file. However, the watermark is still present in the stretched file, so it can be extracted by manually correcting the signal speed prior to decoding.
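One possible way to perform such a manual correction is FFmpeg's `atempo` audio filter, sketched below. This is not part of AWT1. It assumes the stretch changed tempo without changing pitch and that the original and stretched durations are known. The durations, file names and function names are hypothetical, and the command is only built, not run.

```python
def tempo_correction(original_seconds, stretched_seconds):
    """Playback-speed factor that brings the stretched file back to the
    original duration (greater than 1 speeds up, less than 1 slows down)."""
    if original_seconds <= 0 or stretched_seconds <= 0:
        raise ValueError("durations must be positive")
    return stretched_seconds / original_seconds

def ffmpeg_atempo_cmd(src, dst, factor):
    """Build an FFmpeg command that applies the atempo filter with `factor`."""
    return ["ffmpeg", "-i", src, "-filter:a", f"atempo={factor:.6f}", dst]

# Hypothetical case: a 200-second original was stretched to 210 seconds.
factor = tempo_correction(original_seconds=200.0, stretched_seconds=210.0)
cmd = ffmpeg_atempo_cmd("stretched.wav", "corrected.wav", factor)
print(" ".join(cmd))
```

If the speed change also shifted the pitch (plain resampling), a resampling-based correction would be needed instead, and the exact factor may have to be found by trial.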
With default parameters, the watermark is practically indistinguishable: on most audio content it is transparent to an average listener using audio equipment of any quality. To be fair, it should be noted that (as with any other real-world technology) there are very specific audio samples that may reveal some watermarking artifacts when compared to the original non-watermarked audio. In these cases the effects are rather minor and may be noticeable only to an experienced listener. Depending on the target needs, the user may adjust the encoding parameters (namely, watermarking "density" and "aggressiveness") to achieve the best balance of aural transparency and robustness.
How the watermarking algorithm works
...no, it is not another spread spectrum or echo hiding technique...
A high-level description of the patented watermarking algorithm implemented in AWT1 is given in the AWT1 User Guide (PDF), section 2.
Pros and cons
The main apparent disadvantage of the scheme is the requirement to have the original audio stream in order to decode the watermark. However, this requirement is also a security feature, because it prevents third parties who do not have the original source from extracting or even detecting the watermark in a watermarked file. Additionally, such a scheme demonstrates extreme robustness that is unreachable for any "blind watermarking" scheme that does not require the source.
Due to the required accuracy of signal synchronization, decoding generally takes longer than encoding, with the time to decode being proportional to the audio file size. Fortunately, extraction of the watermark is generally a less frequent procedure than embedding. Also, you typically do not need to process whole 3-5 minute audio recordings to extract the watermark, as 15-40 seconds of audio will generally suffice (depending on the payload size used).
The watermarking data rate is quite high (12-30 bits per second, depending on parameters).
Encoding is very fast.
Thanks to the simplicity of the idea behind the algorithm, it demonstrates extreme, almost insane watermark robustness that puts it a step ahead of competing watermarking solutions.
To demonstrate the robustness and imperceptibility of AWT1 watermarks, I have provided audio samples that you can play with during your evaluation of AWT1. Please do not forget to download the AWT1 demo package, which includes the AWT1 encoder, decoder, a convenient GUI tool and documentation.
Here are several source audio signals (WAV PCM, 44.1 kHz, 16 bit) that are used in this demonstration:
These are audio recordings of different types: music (pop, electronic, classics) and speech. Note that these are the source (non-watermarked) files that will be used for encoding in this demonstration.
All of the above source files have been encoded using AWT1 encoder, and the watermark 0xABCDEF12 (4 bytes) has been embedded into each of them. The encoding was done by running the encoder:
awt1_enc source.wav output.wav 0xABCDEF12 -aggressiveness=1.0 -density=1.0
With these parameters the watermarking data rate is approximately 15 bits per second, which results in about 28 copies of the 4-byte (32-bit) payload being embedded per minute of audio.
Below is a table of input and output (watermarked) files together with their distorted copies (transcoded, air transduced, mixed with speech, etc). You may download and listen to them in order to:
To decode the watermark you need to run:
awt1_dec source.wav output.wav 4 -aggressiveness=1.0 -density=1.0
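For batch processing, the two command lines above can be generated from a script. The sketch below mirrors the argument order shown above and only builds the commands. The helper names are hypothetical, and reading the `4` passed to the decoder as the payload size in bytes is an assumption based on the 4-byte watermark used here.

```python
def payload_size_bytes(payload_hex):
    """Number of bytes in a payload written as a hex string such as 0xABCDEF12."""
    digits = payload_hex[2:] if payload_hex.lower().startswith("0x") else payload_hex
    if len(digits) % 2 != 0:
        raise ValueError("payload must contain a whole number of bytes")
    return len(digits) // 2

def encode_cmd(source, output, payload_hex, aggressiveness=1.0, density=1.0):
    """Command line for awt1_enc, following the invocation shown above."""
    return ["awt1_enc", source, output, payload_hex,
            f"-aggressiveness={aggressiveness}", f"-density={density}"]

def decode_cmd(source, watermarked, payload_bytes, aggressiveness=1.0, density=1.0):
    """Command line for awt1_dec; payload_bytes is assumed to be the payload size."""
    return ["awt1_dec", source, watermarked, str(payload_bytes),
            f"-aggressiveness={aggressiveness}", f"-density={density}"]

payload = "0xABCDEF12"
enc = encode_cmd("source.wav", "output.wav", payload)
dec = decode_cmd("source.wav", "output.wav", payload_size_bytes(payload))
print(" ".join(enc))
print(" ".join(dec))
```

Either list can be passed to `subprocess.run` once the AWT1 binaries are on the path. Note that the same "aggressiveness" and "density" values are given to both the encoder and the decoder, as in the commands above.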
Below is the table containing all input and output files. You can download the WAV files or just listen to them in place using the embedded audio player.
(*) Transcoded files were created by repeated lossy encoding and decoding of the watermarked files using the MP3 (LAME) and Ogg Vorbis codecs (18 successive encodings at different bitrates: source WAV -> MP3 256 kbps -> MP3 192 kbps -> MP3 192 kbps -> MP3 128 kbps -> MP3 128 kbps -> MP3 128 kbps -> ...). You can download the batch file that was used for transcoding by clicking here.
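A chain of this kind can be approximated with the LAME command-line encoder, as sketched below. This is not the batch file mentioned above. It covers only the six MP3 steps listed explicitly, leaves out the Ogg Vorbis steps and the steps elided by "...", and uses hypothetical intermediate file names. The commands are built but not run.

```python
def transcode_chain(wav_in, bitrates_kbps):
    """Build LAME commands that encode to MP3 and decode back to WAV,
    once per bitrate, feeding each decoded WAV into the next step."""
    commands = []
    current = wav_in
    for step, kbps in enumerate(bitrates_kbps, start=1):
        mp3 = f"step{step:02d}_{kbps}k.mp3"   # hypothetical intermediate names
        wav = f"step{step:02d}_{kbps}k.wav"
        commands.append(["lame", "-b", str(kbps), current, mp3])  # WAV -> MP3
        commands.append(["lame", "--decode", mp3, wav])           # MP3 -> WAV
        current = wav
    return commands, current

# Only the bitrates stated explicitly in the text above.
commands, final_wav = transcode_chain("output.wav", [256, 192, 192, 128, 128, 128])
for c in commands:
    print(" ".join(c))
```

The last decoded WAV file (`final_wav`) is the one that would then be passed to the decoder together with the source file.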
In conclusion, the watermark is still detectable even in recordings made through the air from poor-quality MP3 output transcoded 18 times, in the presence of loud background speech, and with only a partial recording available! Use the AWT1 decoder to verify this yourself.
Just for fun, I ran another experiment: I took yello-out-hardtranscoded-airtransduced-voiced_part.wav, introduced additional distortion by heavily clipping (limiting) its waveform over the entire signal duration, and then cropped the signal, leaving only 14 seconds of it. The result is here: yello-out-hardtranscoded-airtransduced_voiced_cropped_clipped.wav. You can download this file and verify that the watermark is still detectable even in it!