Want to dive even deeper?

Take the course Build Your Own Online Business with WordPress by Stone River and become an expert!
Build Your Own Online Business with WordPress
by Stone River

Check it out!
You're watching a preview of this video, click the button on the left to puchase the full version from Paris JUG.

What Shazam doesn't want you to know

Roy always wants to know how things work, to the smallest detail. This session will focus on music recognition like Shazam and SoudHound. Those "magic" programs that identify songs by listening to it. After this session you'll not only know how to implement this in Java. You'll also have learned how a microphone works, how the human ear works, how to capture and analyze sound in Java SE, what the Fourier Transformation does, and of course how those music recognition algorithms do their magic. Also, after publishing this information on my blog I've received a couple of patent infringement claims from Shazam's patent holders. Can you really be sued after a weekend of programming and releasing the source code?

Published on
  • 2.947
  • 56
  • 1
  • 19
  • 2
  • Wh a t Sh a za m d o e s n 't w a n t y o u t o k n o w Roy van Rijn Software Craftsman
  • I m a g i n e t h i s ...
  • Friday afternoon 3
  • Getting inspiration 4
  • Singing along... 5
  • Meet Stewie! 6
  • M u s i c m at c h i n g
  • Music matching · Shazam is magic... alien technology! 1) Start the application. 2) Let it listen for +/- 20 seconds. 3) It tells you: 8
  • Sat u r d a y m o r n i n g
  • The microphone · Specifying the audio format: private AudioFormat getFormat() { float sampleRate = 44100; int sampleSizeInBits = 8; int channels = 1; boolean signed = true; boolean bigEndian = true; return new AudioFormat( sampleRate, sampleSizeInBits, channels, signed, bigEndian); } 10
  • The microphone · Accessing the microphone final AudioFormat format = getFormat(); DataLine.Info info = new DataLine.Info(TargetDataLine.class, format); final TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info); line.open(format); line.start(); 11
  • The microphone · Reading the sound: out = new ByteArrayOutputStream(); running = true; try { while (running) { int count = line.read(buffer, 0, buffer.length); if (count > 0) { out.write(buffer, 0, count); } } out.close(); } catch (IOException e) { throw new RuntimeException(e); } 12
  • Inside the byte array 0 0 0 1 2 5 0 -1 -3 -4 -5 -2 0 1 2 0 2 (etc) 13
  • Plotting a graph 14
  • The human ear · Membrane, cochlear, brain 15
  • Plotting a graph 14
  • The microphone · Specifying the audio format: private AudioFormat getFormat() { float sampleRate = 44100; int sampleSizeInBits = 8; int channels = 1; boolean signed = true; boolean bigEndian = true; return new AudioFormat( sampleRate, sampleSizeInBits, channels, signed, bigEndian); } 10
  • The microphone · Reading the sound: out = new ByteArrayOutputStream(); running = true; try { while (running) { int count = line.read(buffer, 0, buffer.length); if (count > 0) { out.write(buffer, 0, count); } } out.close(); } catch (IOException e) { throw new RuntimeException(e); } 12
  • Plotting a graph 14
  • Frequencies...? · Hertz · The amount of cycles per second · i.e. one sine wave. 16
  • Time vs Frequency · To match music we need frequencies, not waves. 17
  • Fourier Transformation 18
  • Fourier Transformation 19
  • Fourier Transformation 18
  • Fourier Transformation 19
  • Fourier Transformation 20
  • Fourier Transformation 21
  • Fourier Transformation 22
  • Fourier Transformation · Excellent explanation by Stuart Riffle ­ http://altdevblogaday.com 23
  • Frequency domain · We've lost track of time! 24
  • Windowing · Solution: Apply transformation on pieces byte audio[] = out.toByteArray(); final int amountSlices = audio.length/SLICE_SIZE; Complex[][] results = new Complex[amountChucks][]; for(int slice = 0;slice < amountSlices; slice++) { Complex[] complex = new Complex[SLICE_SIZE]; for(int i = 0;i<SLICE_SIZE;i++) { complex[i] = new Complex(audio[(slice*SLICE_SIZE)+i], 0); } results[slice] = FFT.fft(complex); } 25
  • Spectrum Analyzer · From wikipedia: Spectum Analyzer A spectrum analyzer or spectral analyzer is a device used to examine the spectral composition of some electrical, acoustic, or optical waveform. 26
  • Windowing · Solution: Apply transformation on pieces byte audio[] = out.toByteArray(); final int amountSlices = audio.length/SLICE_SIZE; Complex[][] results = new Complex[amountChucks][]; for(int slice = 0;slice < amountSlices; slice++) { Complex[] complex = new Complex[SLICE_SIZE]; for(int i = 0;i<SLICE_SIZE;i++) { complex[i] = new Complex(audio[(slice*SLICE_SIZE)+i], 0); } results[slice] = FFT.fft(complex); } 25
  • Spectrum Analyzer · From wikipedia: Spectum Analyzer A spectrum analyzer or spectral analyzer is a device used to examine the spectral composition of some electrical, acoustic, or optical waveform. 26
  • De m o : A p h ex Tw i n
  • Su n d a y
  • Matching the song · Determine the key points (in ranges): 34 41 39 41 40 42 40 47 40 53 40 48 39 45 40 42 40 41 38 42 92 117 106 117 81 109 89 84 81 113 129 130 129 121 129 132 135 125 121 131 186 218 191 217 208 260 247 251 232 245 (etc...) 29
  • Something to match against · Playing/decoding MP3 files: ­ JLayer (real time MP3 decoder) · jl1.0.1.jar ­ MP3SPI (Java plugin, based on JLayer) · mp3spi1.9.4.jar ­ Tritonus (implementation of Java Sound API) · tritonus_share.jar 30
  • Something to match against · Harvesting my music collection: public void harvest(File rootDirectory) { String[] itemsInDirectory = rootDirectory.list(); for(String itemInDirectory:itemsInDirectory) { if(itemInDirectory.endsWith(".mp3")) { //Assume mp3 file File mp3File = new File(mp3Directory, itemInDirectory); captureAudio(mp3File); } else if(new File(mp3Directory, itemInDirectory).isDirectory()) { //Directory? Recurse! harvest(new File(mp3Directory, itemInDirectory)); } } } 31
  • What we have now · We have: ­ Set of +/- 3000 files of reference data (songs) ­ Way of capturing key moments with microphone · Lets do some matching! 32
  • Hash function · Create a single hash per slice private static final int FUZ_FACTOR = 2; private long hash(String line) { String[] p = line.split("\t"); long p1 = Long.parseLong(p[0]); long p2 = Long.parseLong(p[1]); long p3 = Long.parseLong(p[2]); long p4 = Long.parseLong(p[3]); // long p5 = Long.parseLong(p[5]); // Not using the fifth point currently return (p4-(p4%FUZ_FACTOR)) * 100000000 + (p3-(p3%FUZ_FACTOR)) * 100000 + (p2-(p2%FUZ_FACTOR)) * 100 + (p1-(p1%FUZ_FACTOR)); } 33
  • Matching algorithm #1 1) 2) 3) 4) Load all the reference hashes Listen to the microphone and generate hashes Find all matching hashes Return the reference-song with most hits · This worked (a bit) but produced a lot of mis-hits · How can we improve this? 34
  • Matching algorithm #2 · Microphone, sample #1, matches with: · Song 4, sample #4 · Song 6, sample #9 · Microphone, sample #2, matches with: · Song 4, sample #6 · Song 6, sample #10 · Song 8, sample #4 · Microphone, sample #3, matches with: · Song 4, sample #5 · Song 5, sample #3 35
  • Matching algorithm #2 · Microphone, sample #1, matches with: · Song 4, sample #4: · Song 6, sample #9 4­1=3 9­1=8 · Microphone, sample #2, matches with: · Song 4, sample #6 · Song 6, sample #10 · Song 8, sample #4 6­2=4 10 ­ 2 = 8 4­2=2 · Microphone, sample #3, matches with: · Song 4, sample #5 · Song 5, sample #2 5­3=2 2 ­ 3 = -1 36
  • Matching algorithm #2 · Now we group the results: 2x: 1x: 1x: 1x: 1x: 1x: Song 6 with offset 8 Song 4 with offset 2 Song 4 with offset 3 Song 4 with offset 4 Song 8 with offset 2 Song 5 with offset -1 37
  • Recap · Music matching (Shazam, SoundHound) isn't magic · Searching can be done very fast: ­ Searching for hashes: O(1) ­ Align and match · The Fourier Transformation is like a spirograph: 38
  • Other uses for this algorithm · What are other uses for this algorithm: ­ Speech recognition? · Probably not.. ­ Detecting duplicate songs in your music collection? · Yes! Took 5 minutes for crude implementation ­ Subtitle synchronisation in India ! 40
  • Landmark Digital Services ... Landmark Digital Services owns the patents that cover the algorithm used as the basis for your recently posted "Creating Shazam In Java". While it is not Landmark's intention to alienate those in the Open Source and Music Information Retrieval community, Landmark must request that you do not ship, deploy or post the code presented in your post. Landmark also requests that in the future you do not ship, deploy or post any portions or versions of this code in its current state or in any modified state. We hope you understand our position and that we would be legally remiss not to make this request. We appreciate your immediate attention and response. ... 41
  • Getting information · After this email I contacted: ­ Arnoud Engelfriet (Dutch IT lawyer, patent attorney) ­ Free Software Foundation ­ And others. 42
  • Now the blogpost? · From another email: As I'm sure you are aware, your blogpost may be viewed internationally. As a result, you may contribute to someone infringing our patents in any part of the world. While we trust your good intentions, yes, we would like you to refrain from releasing the code at all and to remove the blogpost explaining the algorithm. 43
  • No way... · My reply was short and concise: I'm sorry, I can't comply. The blogpost will absolutely not be removed. Good luck. 44
  • Qu e s t i o n s ?

Comments

Be the first one to add a comment

User is not allowed