Tag: teaching

ep-341 syllabus – Spring 2015

Programming interactive audio software and plugins in Max/MSP

Spring 2015

teacher: Tom Zicarelli – http://tomzicarelli.com

You can reach me at:  tzicarelli@berklee.edu

Office hours: Tuesday 1-2 PM, or Tuesday 4-5PM, at the EPD office #401 at 161 Mass Ave. Please email or call ahead.

Assignments and class notes will be posted to this blog: http://reactivemusic.net before or after the class. Search for: ep-341 to find the notes

Examples, software, links, and references demonstrated in class are available for you to use. If there is something missing from the notes,  please ask about it. This is your textbook.


Prototyping is the focus. Max is a seed that has grown into music, art, discoveries, products, and entire businesses.

After you take the course, you will have developed several projects. You might design a musical instrument or a plugin. You will have opportunities to solve problems.  But mostly you will have a sense of how to explore possibilities by building prototypes in Max. You will have the basic skills to quickly make software to connect things, and answer questions like, “Is it possible to make something that does x?”.

You will become familiar with how other artists use Max to make things. You will be exposed to to a world of possibilities – which you may embrace or reject.

We will explore a range of methods and have opportunities to use them in projects. We’ll look at examples by artists – asking the question: How does this work?

Success depends on execution as well as good ideas.

Topics: (subject to change)

  1. Max
  2. Reverse engineering
  3. Transforming and scaling data
  4. Designing user interfaces
  5. Messages and communication, MIDI/OSC
  6. randomness and probability
  7. Connecting hardware and other devices
  8. Working with sensors, data, and API’s
  9. Audio signal processing and synthesis.
  10. Problem solving, prototyping, portfolios.
  11. plugins, Max for Live.
  12. Basic video processing and visualization
  13. Alternative tools: Pd
  14. Max externals
  15. How to get ideas
  16. Computers and Live performance
  17. Transcoding

Grading and projects:

Grades will be assigned projects, several small assignments/quizzes, and class participation. Please see Neil Leonard’s EP-341 syllabus for details. I encourage and will give credit for: collaboration with other students, outside projects, performances, independent projects, and anything else that will encourage your growth and success.

I am open to alternative projects. For example, if you want to use this course as an opportunity to develop a larger project or continue a work in progress.

Reference material



New musical instruments

A presentation for Berklee BTOT 2015 http://www.berklee.edu/faculty 


Around the year 1700, several startup ventures developed prototypes of machines with thousands of moving parts. After 30 years of engineering, competition, and refinement, the result was a device remarkably similar to the modern piano.

What are the musical instruments of the future being designed right now?

  • new composition tools,
  • reactive music,
  • connecting things,
  • sensors,
  • voices, 
  • brains



Ray Kurzweil’s future predictions on a timeline: http://imgur.com/quKXllo (The Singularity will happen in 2045)

In 1965 researcher Herbert Simon said: “Machines will be capable, within twenty years, of doing any work a man can do”. Marvin Minsky added his own prediction: “Within a generation … the problem of creating ‘artificial intelligence’ will substantially be solved.” https://forums.opensuse.org/showthread.php/390217-Will-computers-or-machines-ever-become-self-aware-or-evolve/page2


Are there patterns in the ways that artists adapt technology?

For example, the Hammond organ borrowed ideas developed for radios. Recorded music is produced with computers that were originally as business machines.

Instead of looking forward to predict future music, lets look backwards to ask,”What technology needs to happen to make musical instruments possible?” The piano relies upon a single-escapement (1710) and later a double-escapement (1821). Real time pitch shifting depends on Fourier transforms (1822) and fast computers (~1980).

Artists often find new (unintended) uses for tools. Like the printing press.

New pianos

The piano is still in development. In December 2014, Eren Başbuğ composed and performed music on the Roli Seaboard – a piano keyboard made of 3 dimensional sensing foam:

Here is Keith McMillen’s QuNexus keyboard (with Polyphonic aftertouch):



Here are tools that might lead to new ways of making music. They won’t replace old ways. Singing has outlasted every other kind of music.

These ideas represent a combination of engineering and art. Engineers need artists. Artists need engineers. Interesting things happen at the confluence of streams.

Analysis, re-synthesis, transformation

Computers can analyze the audio spectrum in real time. Sounds can be transformed and re-synthesized with near zero latency.

Infinite Jukebox

Finding alternate routes through a song.

by Paul Lamere at the Echonest

Echonest has compiled data on over 14 million songs. This is an example of machine learning and pattern matching applied to music.


Try examples: “Karma Police”, Or search for: “Albert Ayler”)

Remixing a remix

“Mindblowing Six Song Country Mashup”: https://www.youtube.com/watch?v=FY8SwIvxj8o (start at 0:40)

Screen Shot 2015-01-09 at 11.25.13 PM

Local file: Max teaching examples/new-country-mashup.mp3

More about Echonest

Feature detection

Looking at music under a microscope.

removing music from speech

First you have to separate them.


by Xavier Serra and UPF

Harmonic Model Plus Residual (HPR) – Build a spectrogram using STFT, then identify where there is strong correlation to a tonal harmonic structure (music). This is the harmonic model of the sound. Subtract it from the original spectrogram to get the residual (noise).

Screen Shot 2015-01-06 at 1.40.37 AM

Screen Shot 2015-01-06 at 1.40.12 AM

Settings for above example:

  • Window size: 1800 (SR / f0 * lobeWidth) 44100 / 200 * 8 = 1764
  • FFT size: 2048
  • Mag threshold: -90
  • Max harmonics: 30
  • f0 min: 150
  • f0 max: 200
Many kinds of features
  • Low level features: harmonicity, amplitude, fundamental frequency
  • high level features: mood, genre, danceability
Examples of feature detection
Music information retrieval

Finding the drop

“Detetcting Drops in EDM” – by Karthik Yadati, Martha Larson, Cynthia C. S. Liem, Alan Hanjalic at Delft University of Technology (2014) http://reactivemusic.net/?p=17711

Polyphonic audio editing

Blurring the distinction between recorded and written music.


by Celemony


A minor version of “Bohemian Rhapsody”: http://www.youtube.com/watch?v=voca1OyQdKk

Music recognition

“How Shazam Works” by Farhoud Manjoo at Slate: http://reactivemusic.net/?p=12712, “About 3 datapoints per second, per song.”

  • Music fingerprinting: https://musicbrainz.org/doc/Fingerprinting
  • Humans being computers. Mystery sounds. (Local file: Desktop/mystery sounds)
  • Is it more difficult to build a robot that plays or one that listens?

Sonographic sound processing

Transforming music through pictures.

by Tadej Droljc


(Example of 3d speech processing at 4:12)

local file: SSP-dissertation/4 – Max/MSP/Jitter Patch of PV With Spectrogram as a Spectral Data Storage and User Interface/basic_patch.maxpat

Try recording a short passage, then set bound mode to 4, and click autorotate

Spectral scanning in Ableton Live:

Web Audio

Web browser is the new black


by Joe Berkowitz 



by Dinahmoe


Can you jam over the internet?

What is the speed of electricity? 70-80 ms is the best round trip latency (via fiber) from the U.S. east to west coast. If you were jamming over the internet with someone on the opposite coast it might be like being 100 ft away from them in a field. (sound travels 1100 feet/second in air).

Global communal experiences – Bill McKibben – 1990 “The Age of Missing Information”

More about Web Audio

Conversation with robots

Computers finding meaning

The Google speech API


The Google speech API uses neural networks, statistics, and large quantities of data.

Microsoft: real-time translation

Reverse entropy


Making music from from sounds that are not music.

by Katja Vetter

. (InstantDecomposer is an update of SliceJockey2):   http://www.katjaas.nl/slicejockey/slicejockey.html

  • local: InstantDecomposer version: tkzic/pdweekend2014/IDecTouch/IDecTouch.pd
  • local: slicejockey2test2/slicejockey2test2.pd
More about reactive music

Sensors and sonification

Transforming motion into music

Three approaches
  • earcons (email notification sound)
  • models (video game sounds)
  • parameter mapping (Geiger counter)
Leap Motion

camera based hand sensor

“Muse” (Boulanger Labs) with Paul Bachelor, Christopher Konopka, Tom Shani, and Chelsea Southard: http://reactivemusic.net/?p=16187

Max/MSP piano example: Leapfinger: http://reactivemusic.net/?p=11727

local file: max-projects/leap-motion/leapfinger2.maxpat

Internet sensors project

Detecting motion from the Internet


Twitter streaming example


MBTA bus data

 Sonification of Mass Ave buses, from Harvard to Dudley


Screen Shot 2014-11-11 at 3.26.16 PM

Stock market music


More sonification projects
Vine API mashup

By Steve Hensley

Using Max/MSP/jitter

local file: tkzic/stevehensely/shensley_maxvine.maxpat

Audio sensing gloves for spacesuits

By Christopher Konopka at future, music, technology


Computer Vision

Sensing motion with video using frame subtraction

by Adam Rokhsar


local file: max-projects/frame-subtraction

The brain

Music is stored all across the brain.

Mouse brain wiring diagram

The Allen institute


“Hacking the soul” by Christof Koch at the Allen institute

(An Explanation of the wiring diagram of the mouse brain – at 13:33) http://www.technologyreview.com/emtech/14/video/watch/christof-koch-hacking-the-soul/

OpenWorm project

A complete simulation of the nematode worm, in software, with a Lego body (320 neurons)



Harold Cohen’s algorithmic painting machine


Brain plasticity

A perfect pitch pill? http://www.theverge.com/2014/1/6/5279182/valproate-may-give-humans-perfect-pitch-by-resetting-critical-periods-in-brain


Could we grow music producing organisms? http://reactivemusic.net/?p=18018


Two possibilities

Rejecting technology?
An optimistic future?

There is a quickening of discovery: internet collaboration, open source, linux,  github, r-pi, Pd, SDR.

“Robots and AI will help us create more jobs for humans — if we want them. And one of those jobs for us will be to keep inventing new jobs for the AIs and robots to take from us. We think of a new job we want, we do it for a while, then we teach robots how to do it. Then we make up something else.”

“…We invented machines to take x-rays, then we invented x-ray diagnostic technicians which farmers 200 years ago would have not believed could be a job, and now we are giving those jobs to robot AIs.”

Kevin Kelly – January 7, 2015, reddit AMA http://www.reddit.com/r/Futurology/comments/2rohmk/i_am_kevin_kelly_radical_technooptimist_digital/

Will people be marrying robots in 2050? http://www.livescience.com/1951-forecast-sex-marriage-robots-2050.html

“What can you predict about the future of music” by Michael Gonchar at The New York Times http://reactivemusic.net/?p=17023

Jim Morrison predicts the future of music:


More areas to explore

Hearing voices

A presentation for Berklee BTOT 2015 http://www.berklee.edu/faculty

(KITT dashboard by Dave Metlesits)

The voice was the first musical instrument. Humans are not the only source of musical voices. Machines have voices. Animals too.

  • synthesizing voices (formant synthesis, text to speech, Vocaloid)
  • processing voices (pitch-shifting, time-stretching, vocoding, filtering, harmonizing),
  • voices of the natural world
  • fictional languages and animals
  • accents
  • speech and music recognition
  • processing voices as pictures
  • removing music from speech
  • removing voices


We instantly recognize people and animals by their voices. As an artist we work to develop our own voice. Voices contain information beyond words. Think of R2D2 or Chewbacca.

There is also information between words: “Palin Biden Silences” David Tinapple, 2008: http://vimeo.com/38876967

Synthesizing voices

The vocal spectrum

What’s in a voice?

Singing chords

Humans acting like synthesizers.

More about formants
Text to speech

Teaching machines to talk.


  • phonemes (unit of sound)
  • diphones (combination of phonemes) (Mac OS “Macintalk 3  pro”)
  • morphemes (unit of meaning)
  • prosody (musical quality of speech)
  • articulatory (anatomical model)
  • formant (additive synthesis) (speak and spell)
  • concatentative (building blocks) (Mac Os)

Try the ‘say’ command (in Mac OS terminal), for example: say hello

More about text to speech

Combining the energy of voice with musical instruments (convolution)

  • Peter Frampton “talkbox”: https://www.youtube.com/watch?v=EqYDQPN_nXQ (about 5:42) – Where is the exciting audience noise in this video?
  • Ableton Live example: Local file: Max/MSP: examples/effects/classic-vocoder-folder/classic_vocoder.maxpat
  • Max vocoder tutorial (In the frequency domain), by dude837 – Sam Tarakajian http://reactivemusic.net/?p=17362 (local file: dude837/4-vocoder/robot-master.maxpat
More about vocoders

By Yamaha

(text + notation = singing)

Demo tracks: https://www.youtube.com/watch?v=QWkHypp3kuQ

Vocaloop device http://vocaloop.jp/ demo: https://www.youtube.com/watch?v=xLpX2M7I6og#t=24

Processing voices


Pitch transposing a baby http://reactivemusic.net/?p=2458

Real time pitch shifting

Autotune: “T-Pain effect” ,(I-am-T-Pain bySmule), “Lollipop” by Lil’ Wayne. “Woods” by Bon Iver https://www.youtube.com/watch?v=1_cePGP6lbU

Autotuna in Max 7

by Matthew Davidson

Local file: max-teaching-examples/autotuna-test.maxpat

InstantDecomposer in Pure Data (Pd)

by Katja Vetter


Autocorrelation: (helmholtz~ Pd external) “Helmholtz finds the pitch” http://www.katjaas.nl/helmholtz/helmholtz.html

(^^ is input pitch, preset #9 is normal)

  • local file: InstantDecomposer version: tkzic/pdweekend2014/IDecTouch/IDecTouch.pd
  • local file: slicejockey2test2/slicejockey2test2.pd
Phasors and Granular synthesis

Disassembling time into very small pieces


Adapted from Andy Farnell, “Designing Sound”

http://reactivemusic.net/?p=11385 Download these patches from: https://github.com/tkzic/max-projects folder: granular-timestretch

  • Basic granular synthesis: graintest3.maxpat
  • Time-stretching: timestretch5.maxpat

More about phasors and granular synthesis
Phase vocoder

…coming soon

Sonographic sound processing

Changing sound into pictures and back into sound

by Tadej Droljc


(Example of 3d speech processing at 4:12)

local file: SSP-dissertation/4 – Max/MSP/Jitter Patch of PV With Spectrogram as a Spectral Data Storage and User Interface/basic_patch.maxpat

Try recording a short passage, then set bound mode to 4, and click autorotate

Speech to text

Understanding the meaning of speech

The Google Speech API

A conversation with a robot in Max


Google speech uses neural networks, statistics, and large quantities of data.

More about speech to text

Voices of the natural world

Changes in the environment reflected by sound

Fictional languages and animals

“You can talk to the animals…”

Pig creatures example: http://vimeo.com/64543087

  • 0:00 Neutral
  • 0:32 Single morphemes – neutral mode
  • 0:37 Series, with unifying sounds and breaths
  • 1:02 Neutral, layered
  • 1:12 Sad
  • 1:26 Angry
  • 1:44 More Angry
  • 2:11 Happy

What about Jar Jar Binks?


The sound changes but the words remain the same.

The Speech accent archive http://reactivemusic.net/?p=9436

Finding and removing music in speech

We are always singing.

Jamming with speech
Removing music from speech

by Xavier Serra and UPF

Harmonic Model Plus Residual (HPR) – Build a spectrogram using STFT, then identify where there is strong correlation to a tonal harmonic structure (music). This is the harmonic model of the sound. Subtract it from the original spectrogram to get the residual (noise).

Screen Shot 2015-01-06 at 1.40.37 AM

Screen Shot 2015-01-06 at 1.40.12 AM

Settings for above example:

  • Window size: 1800 (SR / f0 * lobeWidth) 44100 / 200 * 8 = 1764
  • FFT size: 2048
  • Mag threshold: -90
  • Max harmonics: 30
  • f0 min: 150
  • f0 max: 200
feature detection
  • time dependent
  • Low level features: harmonicity, amplitude, fundamental frequency
  • high level features: mood, genre, danceability

Acoustic Brainz: (typical analysis page) http://reactivemusic.net/?p=17641

Essentia (open source feature detection tools)  https://github.com/MTG/essentia

Freesound (vast library of sounds):  https://www.freesound.org – look at “similar sounds”

Removing voices from music

A sad thought

phase cancellation encryption

This method was used to send secret messages during world war 2. Its now used in cell phones to get rid of echo. Its also used in noise canceling headphones.



Center channel subtraction

What is not left and not right?

Ableton Live – utility/difference device: http://reactivemusic.net/?p=1498 (Allison Krause example)

Local file: Ableton-teaching-examples/vocal-eliminator

More experiments


  • Why do most people not like the recorded sound of their voice?
  • Can voice be used as a controller?
  • How do you recognize voices?
  • Does speech recognition work with singing?
  • How does the Google Speech API know the difference between music and speech?
  • How can we listen to ultrasonic animal sounds?
  • What about animal translators?


ep-413 DSP week 15


  1. Syllabus: http://reactivemusic.net/?p=17122
  2. Ways to approach a project http://reactivemusic.net/?p=17132
    • Make machines that make art
    • Reverse engineering
    • Use the wrong tools
    • Abstraction and destruction
    • Backwards, extreme, opposite – connect two things
    • Ask questions
  3. Composition tools and dramatic shape http://reactivemusic.net/?p=17157
  4. Problem solving (pitch detection) and prototyping (Muse) http://reactivemusic.net/?p=17159
  5. Sound byte composition http://reactivemusic.net/?p=17190
  6. Convolution and voices http://reactivemusic.net/?p=17211
  7. (No class this week)
  8. Granular synthesis, the frequency domain, and phasors http://reactivemusic.net/?p=17360
  9. Data, Internet API’s, Vine API in Max http://reactivemusic.net/?p=17466
  10. Communication, Osc, Sonification, MBTA API in Max http://reactivemusic.net/?p=17518
  11. Filters: analog, digital, other, reversability http://reactivemusic.net/?p=17542
  12. Web Audio API http://reactivemusic.net/?p=17600
  13. Feature detection, and Music Information Retrieval http://reactivemusic.net/?p=17689
  14. Waves: light, radio, water http://reactivemusic.net/?p=17787
  15. This

John Coltrane: You can learn something from everybody, no matter how good or bad they play, everybody has something to say.

Sal Khan: In the future people will take agency for their own education.

For artists, everything is a tool.

ep-413 DSP – week 13

Feature detection and randomness

What could it possibly have to do with my life?

Feature detection

  • Spectral
    • BarkBands, MelBands, ERBBands, MFCC, GFCC, LPC, HFC, Spectral Contrast, Inharmonicity, and Dissonance, …
  • Time-domain
    • EffectiveDuration, ECR, Loundness, …
  • Tonal
    • PitchSalienceFunction, PitchYinFFT, HPCP, TuningFrequency, Key, ChordsDetection, …
  • Rhythm
    • BestTrackerDegara, BeatTrackerMultiFeature, BPMHistogramDescriptors, NoveltyCurve, OnsetDescription, OnsetDetection, Onsets, …
  • SFX
    • LogAttackTime, MaxToTotal, MinToTotal, TCToTotal, …
  • High-level
    • Danceability, DynamicComplexity, FadeDetection, SBic, …

-from X. Serra (2014) “Audio Signal Processing for Music Applications”

  • low level vs. high level
  • single events vs. groups of events
  • combinations of descriptors
  • order of events (markov chains)

Humans are very good at pattern recognition. Is it a survival mechanism? People who listen to music are very good at analysis. Compared to the abilities of an average child, computer music information retrieval has not yet reached the computational ability of a worm: http://reactivemusic.net/?p=17744

Pattern recognition

(in computer science)

Music information retrieval



(These demonstrations will be done in Ubuntu Linux 14.04.1)

1. Separating and removing musical tones from speech

1a. Harmonic plus residual model (HPR) in  sms-tools (speech-female 150-200 Hz. , and sax phrase)

1b. Do the same thing with transformation model

What about pitch salience and chroma?

2. descriptors

(use workspace/A9)

import soundAnalysis as SA

Here is the list of descriptors that are donwloaded:

Index — Descriptor

0 — lowlevel.spectral_centroid.mean
1 — lowlevel.dissonance.mean
2 — lowlevel.hfc.mean
3 — sfx.logattacktime.mean
4 — sfx.inharmonicity.mean
5 — lowlevel.spectral_contrast.mean.0
6 — lowlevel.spectral_contrast.mean.1
7 — lowlevel.spectral_contrast.mean.2
8 — lowlevel.spectral_contrast.mean.3
9 — lowlevel.spectral_contrast.mean.4
10 — lowlevel.spectral_contrast.mean.5
11 — lowlevel.mfcc.mean.0
12 — lowlevel.mfcc.mean.1
13 — lowlevel.mfcc.mean.2
14 — lowlevel.mfcc.mean.3
15 — lowlevel.mfcc.mean.4

16 — lowlevel.mfcc.mean.5

2a. euclidian distance

What happens when you look at multiple descriptors

In [22]: SA.descriptorPairScatterPlot( ‘tmp’, descInput=(0,5))


2b. k- means clustering

What descriptors best classify sounds into instrument groups?

SA.clusterSounds(‘tmp’, nCluster = 3, descInput=[0,2,9])

In [21]: SA.clusterSounds(‘tmp’, nCluster = 3, descInput=[0,2,9])
(Cluster: 0) Using majority voting as a criterion this cluster belongs to class: violin
Number of sounds in this cluster are: 15
sound-id, sound-class, classification decision
[[‘61926’ ‘violin’ ‘1’]
[‘61925’ ‘violin’ ‘1’]
[‘153607’ ‘violin’ ‘1’]
[‘153629’ ‘violin’ ‘1’]
[‘153609’ ‘violin’ ‘1’]
[‘153608’ ‘violin’ ‘1’]
[‘153628’ ‘violin’ ‘1’]
[‘153603’ ‘violin’ ‘1’]
[‘153602’ ‘violin’ ‘1’]
[‘153601’ ‘violin’ ‘1’]
[‘153600’ ‘violin’ ‘1’]
[‘153610’ ‘violin’ ‘1’]
[‘153606’ ‘violin’ ‘1’]
[‘153605’ ‘violin’ ‘1’]
[‘153604’ ‘violin’ ‘1’]]

(Cluster: 1) Using majority voting as a criterion this cluster belongs to class: bassoon
Number of sounds in this cluster are: 22
sound-id, sound-class, classification decision
[[‘154336’ ‘bassoon’ ‘1’]
[‘154337’ ‘bassoon’ ‘1’]
[‘154335’ ‘bassoon’ ‘1’]
[‘154352’ ‘bassoon’ ‘1’]
[‘154344’ ‘bassoon’ ‘1’]
[‘154338’ ‘bassoon’ ‘1’]
[‘154339’ ‘bassoon’ ‘1’]
[‘154343’ ‘bassoon’ ‘1’]
[‘154342’ ‘bassoon’ ‘1’]
[‘154341’ ‘bassoon’ ‘1’]
[‘154340’ ‘bassoon’ ‘1’]
[‘154347’ ‘bassoon’ ‘1’]
[‘154346’ ‘bassoon’ ‘1’]
[‘154345’ ‘bassoon’ ‘1’]
[‘154353’ ‘bassoon’ ‘1’]
[‘154350’ ‘bassoon’ ‘1’]
[‘154349’ ‘bassoon’ ‘1’]
[‘154348’ ‘bassoon’ ‘1’]
[‘154351’ ‘bassoon’ ‘1’]
[‘61927’ ‘violin’ ‘0’]
[‘61928’ ‘violin’ ‘0’]
[‘153769’ ‘cello’ ‘0’]]

(Cluster: 2) Using majority voting as a criterion this cluster belongs to class: cello
Number of sounds in this cluster are: 23
sound-id, sound-class, classification decision
[[‘154334’ ‘bassoon’ ‘0’]
[‘61929’ ‘violin’ ‘0’]
[‘61930’ ‘violin’ ‘0’]
[‘153626’ ‘violin’ ‘0’]
[‘42252’ ‘cello’ ‘1’]
[‘42250’ ‘cello’ ‘1’]
[‘42251’ ‘cello’ ‘1’]
[‘42256’ ‘cello’ ‘1’]
[‘42257’ ‘cello’ ‘1’]
[‘42254’ ‘cello’ ‘1’]
[‘42255’ ‘cello’ ‘1’]
[‘42249’ ‘cello’ ‘1’]
[‘42248’ ‘cello’ ‘1’]
[‘42247’ ‘cello’ ‘1’]
[‘42246’ ‘cello’ ‘1’]
[‘42239’ ‘cello’ ‘1’]
[‘42260’ ‘cello’ ‘1’]
[‘42241’ ‘cello’ ‘1’]
[‘42243’ ‘cello’ ‘1’]
[‘42242’ ‘cello’ ‘1’]
[‘42253’ ‘cello’ ‘1’]
[‘42244’ ‘cello’ ‘1’]
[‘42259’ ‘cello’ ‘1’]]
Out of 60 sounds, 7 sounds are incorrectly classified considering that one cluster should ideally contain sounds from only a single class
You obtain a classification (based on obtained clusters and majority voting) accuracy of 88.33 percentage

2c. KNN find nearest neighbors

Which neighborhood or group does a sound belong to?

(using bad male vocal)

SA.classifySoundkNN(“qs/175454/175454/175454_2042115-lq.json”, “tmp”,13, descInput = [0,5,10])

from workspace…

First test: Use sounds by querying “saxophone”, tag=”alto-sax”
I am using the same descriptors [0,2,9] that worked well in previous section. with K=3. Tried various values of K with this analysis and it always came out matching ‘violin’ which I think is correct.

In [26]: SA.classifySoundkNN(“qs/saxophone/119248/119248_2104336-lq.json”, “tmp”, 33, descInput = [0,2,9])
This sample belongs to class: violin
Out[26]: ‘violin’

Second test: I am trying out the freesound “similar sound” feature. Using one of the bassoon sounds I clicked “similar sounds” and chose a sound that was not a bassoon – “Bad Singer” (male).


Running the previous descriptors returned a match for violin. So I tried various other descriptors, and was able to get it to match bassoon consistently by using: [0,5,10] which are lowlevel.spectral_centroid.mean, lowlevel.spectral_contrast.mean.0, and lowlevel.mfcc.mean.0.

I honestly don’t know the best strategy for choosing these descriptors and tried to go with ones that seemed the least esoteric. The value of K does not seem to make any difference in the classification.

Here is the output

In [42]: SA.classifySoundkNN(“qs/175454/175454/175454_2042115-lq.json”, “tmp”,13, descInput = [0,5,10])
This sample belongs to class: bassoon
Out[42]: ‘bassoon’

2d. JSON analysis data for “bad voice” example…

#: cd ~/sms-tools/workspace/A9/qs/175454/175454

# cat 175454_2042115-lq.json | python -mjson.tool

3. acousticbrainz

(typical analysis page…) http://reactivemusic.net/?p=17641

4. Echonest acoustic analysis

fingerprinting and human factors like danceability…



random -> ordered (stochastic -> deterministic (for academics))

Does technology have a sound?

Can you add small bits of randomness?
  • Can you detect randomness?
  • How much repetition is enough?

Brain wiring diagram: http://reactivemusic.net/?p=17758 

Christof Koch – check out this video at around 13:33 for about 2 minutes http://www.technologyreview.com/emtech/14/video/watch/christof-koch-hacking-the-soul/

With every technology, musicians figure out how to use it another way. Starting with stone tools, bow and arrow, and now computers.

Data science skills: http://reactivemusic.net/?p=17707

Car engine synthesizer

At the end of the class we revved the engine of a 2015 Golf TSI, connected to an engine sound simulator in Max, on Boylston street…



ep-413 DSP – week 12

Web Audio API


A good place to start

There are many links to Web Audio project right here on this blog: http://reactivemusic.net/?tag=web-audio

Technical reference manual

Comprehensive guide from Mozilla Developer Network https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API

Quick tutorial

  • Open a javascript console in a Web browser. This example uses Chrome. From the Chrome menu, select View | Developer | Javascript console
  • Type the following commands (in the shaded blocks) into the console:

Assign an instance of the Web Audio class to the object: context

context = new AudioContext();

Make an oscillator node

osc = context.createOscillator();

Connect the oscillator to the audio output (speaker)


Turn on the oscillator at the current time and plays a tone.


Turn of the oscillator


For more basics, see: “Using the Web Audio API”, at the Mozilla Developer Network: https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API/Using_Web_Audio_API

Basic html examples:

Demonstrated in class. Download from here: https://github.com/tkzic/web-audio-projects

  • Oscillator
  • Audio file player

HTML versions of examples from Boris Smus’s book: http://reactivemusic.net/?p=6094


Tutorials and references

Development (libraries, frameworks)

tz – examples


Write a composition to induce magical effects.

Here is an example from Aseem Suri http://www.aseemsuri.com/journal/piece-of-mind-second-run-at-the-csound-conference 

The project was derived from computer technology, but the overall effect was that people would go into a mysterious room, for a minute, and when they emerged, they would be smiling and happy.

Due on December 15th (last class)

ep-413 DSP – week 11







We usually think of filters in terms of frequency. Any process that removes information is a filter. Curation or abstraction, for example.




ep-413 DSP week 10

Music from data


We looked at Vine API Examples from Eli and Steve H.

OSC (Open Sound Control)
  • Max udpsend, udpreceive, touchOsc, aka.speech
  • using a wifi router for performance projects

Using an Osc server to handle Internet API’s

Searchtweet (rate limited): http://reactivemusic.net/?p=17425

Max oggz streaming example: http://reactivemusic.net/?p=17504

We talked about cartoons: http://reactivemusic.net/?p=10091

  • parameter mapping
  • modeling
  • earcons

We didn’t talk about alarm fatigue…

MBTA bus example in Max:

(parameter mapping sonification)


  • documentation
  • API keys
  • JSON
  • Max example
Unsolved problems

Is it possible to generate music from data?

ep-413 DSP week 9



Building a Max patch that displays, transforms, and responds to internet data.

building materials
  • Max (6.1.7 or newer)
  • Soundflower –

Both available from Cycling 74 http://cycling74.com/

The Max patch is based on a tutorial by dude837 called “Automatic Silly Video Generator”


The patch at the download link in the video is broken – but the javascript code for the Max js object is intact. You can download the entire patch from the Max-projects archive: https://github.com/tkzic/max-projects folder: maxvine

Internet API’s

API’s (application programming interfaces) provide methods for programs (other than web browsers) to access Internet data. Any app that access data from the web uses an API.

Here is a link to information about the Vine API: https://github.com/starlock/vino/wiki/API-Reference

For example, if you copy this URL into a web browser address bar, it will return a block of data in JSON format about the most popular videos on Vine: https://api.vineapp.com/timelines/popular

HTTP requests

An HTTP request transfers data to or from a server. A web browser handles HTTP requests in the background. You can also write programs that make HTTP requests. A  program called “curl” runs http requests from the terminal command line. Here are examples: http://reactivemusic.net/?p=5916

Response data

Data is usually returned in one of 3 formats:

  • JSON
  • XML
  • HTML

JSON is the preferred method because its easy to access the data structure.

Max HTTP requests

There are several ways to make HTTP requests in Max, but the best method is the js object: Here is the code that runs the GET request for the Vine API:

function get(url)
    var ajaxreq = new XMLHttpRequest();
    ajaxreq.open("GET", url);
    ajaxreq.onreadystatechange = readystatechange;

function readystatechange()
    var rawtext = this._getResponseKey("body");
    var body = JSON.parse(rawtext);
    outlet(0, body.data.records[0].videoUrl);


The function: get() formats and sends an HTTP request using the URL passed in with the get message from Max. When the data is returned to Max, the readystatechange() function parses it and sends the URL of the most popular Vine video out the left outlet of the js object.

Playing Internet audio/video files in Max

The qt.movie object will play videos, with the URL passed in by the read message.

Unfortunately, qt.movie sends its audio to the system, not to Max. You can use Soundflower, or a virtual audio routing app, to get the audio back into Max.

Audio from video


Video from audio


Other Internet API examples in Max

There is a large archive of examples here: Internet sensors: http://reactivemusic.net/?p=5859

We will look at more of these next week. Here is simple Max patch that uses the Soundcloud API: http://reactivemusic.net/?p=17430

Gokce Kinayoglu has written a java external for Max called Searchtweet: http://cycling74.com/toolbox/searchtweet-design-patches-that-respond-to-twitter-posts/

Many API’s require complex authentication, or money, before they will release their data. We will look ways to access these API’s from Max next week.


There are API services that consolidate many API’s into one API. For example:

Scaling data

Look at the Max tutorial (built in to Max Help) called “Data : data scaling” It contains most of what you need to know to work with streams of data.


Using the Vine API patch that we built during the class as a starting point: Build a better app.

Ideas to explore:

  • Is it possible to run several API requests simultaneously?
  • Recording? Time expansion? Effects that evolve over time?
  • Generate music from motion, data, and raw sound?
  • Make a video respond to your instrument or voice?
  • Design a better user interface or external controller?
  • Will this idea work in Max For Live?
  • How would you make adjustments to the loop length, or synchronize a video to other events?
  • Make envelopes to change the dynamic shape?
  • Destruction? Abstraction?
  • Find or write a Max URL streaming object?
  • What about using a different API or other data from the Vine API?

This project will be due in 2-3 weeks. But for next week please bring in your work in progress, and we will help solve problems.