Hearing voices

A presentation for Berklee BTOT 2015 http://www.berklee.edu/faculty

(KITT dashboard by Dave Metlesits)

The voice was the first musical instrument. Humans are not the only source of musical voices. Machines have voices. Animals too.

Topics
  • synthesizing voices (formant synthesis, text to speech, Vocaloid)
  • processing voices (pitch-shifting, time-stretching, vocoding, filtering, harmonizing)
  • voices of the natural world
  • fictional languages and animals
  • accents
  • speech and music recognition
  • processing voices as pictures
  • removing music from speech
  • removing voices

Voices

We instantly recognize people and animals by their voices. As artists, we work to develop our own voices. Voices contain information beyond words. Think of R2D2 or Chewbacca.

There is also information between words: “Palin Biden Silences” David Tinapple, 2008: http://vimeo.com/38876967

Synthesizing voices

The vocal spectrum

What’s in a voice?

Singing chords

Humans acting like synthesizers.

More about formants
Text to speech

Teaching machines to talk.

vocodblk.gif

  • phonemes (unit of sound)
  • diphones (combination of phonemes) (Mac OS “Macintalk 3 pro”)
  • morphemes (unit of meaning)
  • prosody (musical quality of speech)
Methods
  • articulatory (anatomical model)
  • formant (additive synthesis) (Speak & Spell)
  • concatenative (building blocks) (Mac OS)
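
The formant method can be sketched in a few lines of Python: add up the harmonics of a fundamental and weight each one by how close it sits to a formant. This is a minimal sketch, not how any particular product works. The formant frequencies and bandwidths for an "ah" vowel are rough illustrative values, and the names synth_vowel and formant_gain are made up for this example.

```python
import math

# Rough, illustrative formant centres and bandwidths (Hz) for an "ah" vowel.
# These numbers are assumptions for the sketch, not measured data.
AH_FORMANTS = [(700.0, 110.0), (1220.0, 120.0), (2600.0, 160.0)]


def formant_gain(freq, formants):
    """Sum of simple resonance curves: each formant boosts nearby frequencies."""
    gain = 0.0
    for centre, bandwidth in formants:
        gain += 1.0 / (1.0 + ((freq - centre) / bandwidth) ** 2)
    return gain


def synth_vowel(f0=110.0, seconds=0.5, sample_rate=8000, formants=AH_FORMANTS):
    """Additive synthesis: harmonics of f0, each weighted by the formant curve."""
    partials = []
    k = 1
    while k * f0 < sample_rate / 2:  # stay below the Nyquist frequency
        partials.append((k * f0, formant_gain(k * f0, formants)))
        k += 1
    total = sum(gain for _, gain in partials)
    samples = []
    for n in range(round(seconds * sample_rate)):
        t = n / sample_rate
        value = sum(gain * math.sin(2 * math.pi * freq * t) for freq, gain in partials)
        samples.append(value / total)  # normalise so the output stays within -1..1
    return samples
```

Changing f0 changes the pitch while the vowel colour stays put, because the formant curve does not move with the fundamental.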

Try the ‘say’ command (in Mac OS terminal), for example: say hello
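
The same command can be driven from a script. Here is a small sketch in Python, assuming a Mac with the say command on the path. The helper names are made up for this example, and the voice name in the usage note is only an assumption about which voices are installed.

```python
import shutil
import subprocess


def build_say_command(text, voice=None, rate=None):
    """Build the argument list for the Mac OS 'say' command."""
    command = ["say"]
    if voice is not None:
        command += ["-v", voice]      # -v selects a voice by name
    if rate is not None:
        command += ["-r", str(rate)]  # -r sets the speaking rate in words per minute
    command.append(text)
    return command


def speak(text, voice=None, rate=None):
    """Speak the text if 'say' is available; return False on other systems."""
    if shutil.which("say") is None:
        return False
    subprocess.run(build_say_command(text, voice, rate), check=True)
    return True
```

For example, speak("hello", voice="Alex", rate=120) would say hello slowly, provided a voice called Alex is installed.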

More about text to speech
Vocoders

Combining the energy of voice with musical instruments (convolution)

  • Peter Frampton “talkbox”: https://www.youtube.com/watch?v=EqYDQPN_nXQ (about 5:42) – Where is the exciting audience noise in this video?
  • Ableton Live example: Local file: Max/MSP: examples/effects/classic-vocoder-folder/classic_vocoder.maxpat
  • Max vocoder tutorial (In the frequency domain), by dude837 – Sam Tarakajian https://reactivemusic.net/?p=17362 (local file: dude837/4-vocoder/robot-master.maxpat)
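
For comparison with the patches above, here is a rough sketch of a classic channel vocoder in Python. It is not taken from either patch. The band centres, the filter Q, and the smoothing value are arbitrary assumptions. The voice (modulator) is split into bands, the loudness of each band is tracked, and that loudness controls the same band of the instrument (carrier).

```python
import math


def bandpass(signal, centre_hz, sample_rate, q=4.0):
    """Second-order band-pass filter (standard biquad form)."""
    w0 = 2 * math.pi * centre_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope(signal, smoothing=0.99):
    """Envelope follower: rectify, then smooth with a one-pole low-pass."""
    env, level = [], 0.0
    for x in signal:
        level = smoothing * level + (1 - smoothing) * abs(x)
        env.append(level)
    return env


def vocode(modulator, carrier, sample_rate, bands=(200, 400, 800, 1600, 3200)):
    """Impose the band-by-band loudness of the modulator onto the carrier."""
    n = min(len(modulator), len(carrier))
    out = [0.0] * n
    for centre in bands:
        env = envelope(bandpass(modulator[:n], centre, sample_rate))
        car = bandpass(carrier[:n], centre, sample_rate)
        for i in range(n):
            out[i] += env[i] * car[i]
    return out
```

The band centres must stay below half the sample rate. More bands give more intelligible speech at the cost of more computation.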
More about vocoders
Vocaloid

By Yamaha

(text + notation = singing)

Demo tracks: https://www.youtube.com/watch?v=QWkHypp3kuQ

Vocaloop device http://vocaloop.jp/ demo: https://www.youtube.com/watch?v=xLpX2M7I6og#t=24

Processing voices

Transformation

Pitch transposing a baby https://reactivemusic.net/?p=2458
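
The simplest form of pitch transposition is resampling: read through the recording faster or slower. A minimal sketch in Python, assuming a mono list of samples. The function name is made up for this example, and this is not necessarily the method used in the linked post. Note that the duration changes along with the pitch.

```python
def transpose(samples, semitones):
    """Transpose by resampling with linear interpolation (duration changes too)."""
    ratio = 2 ** (semitones / 12)  # +12 semitones doubles the playback speed
    out = []
    position = 0.0
    while position < len(samples) - 1:
        i = int(position)
        frac = position - i
        # blend the two neighbouring samples
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        position += ratio
    return out
```

Transposing up an octave gives a result half as long. Transposing down an octave gives one twice as long.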

Real time pitch shifting

Autotune: “T-Pain effect” (I-am-T-Pain by Smule), “Lollipop” by Lil’ Wayne, “Woods” by Bon Iver https://www.youtube.com/watch?v=1_cePGP6lbU

Autotuna in Max 7

by Matthew Davidson

Local file: max-teaching-examples/autotuna-test.maxpat
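
The core idea of pitch correction is to snap a detected frequency to the nearest note. Here is a minimal sketch in Python, assuming equal temperament with A4 = 440 Hz. It is not the algorithm used by Autotuna or any commercial product, and the function names are made up for this example.

```python
import math


def hz_to_midi(freq_hz):
    """Convert a frequency to a (fractional) MIDI note number."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def midi_to_hz(note):
    """Convert a MIDI note number back to a frequency."""
    return 440.0 * 2 ** ((note - 69) / 12)


def snap_to_semitone(freq_hz):
    """Return the frequency of the nearest equal-tempered note."""
    return midi_to_hz(round(hz_to_midi(freq_hz)))


def correction_ratio(freq_hz):
    """Pitch-shift ratio that would move the input onto the nearest note."""
    return snap_to_semitone(freq_hz) / freq_hz
```

A slightly sharp 450 Hz snaps down to 440 Hz. Applying the correction instantly gives the hard “T-Pain” sound, while easing toward it gives a more natural result.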

InstantDecomposer in Pure Data (Pd)

by Katja Vetter

http://www.katjaas.nl/slicejockey/slicejockey.html

Autocorrelation: (helmholtz~ Pd external) “Helmholtz finds the pitch” http://www.katjaas.nl/helmholtz/helmholtz.html

(^^ is input pitch, preset #9 is normal)

  • local file: InstantDecomposer version: tkzic/pdweekend2014/IDecTouch/IDecTouch.pd
  • local file: slicejockey2test2/slicejockey2test2.pd
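
Autocorrelation pitch detection can be sketched in plain Python: slide the signal against a delayed copy of itself and find the delay where the two line up best. This is a bare-bones version, not the refined method inside helmholtz~. The search range of 60 to 500 Hz is an assumption suited to voices.

```python
import math


def detect_pitch(samples, sample_rate, min_hz=60.0, max_hz=500.0):
    """Estimate the fundamental frequency using plain autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # how well does the signal match itself when delayed by 'lag' samples?
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # silence (no positive correlation) has no pitch
    return sample_rate / best_lag if best_lag else None
```

The best delay is one period of the waveform, so the sample rate divided by that delay gives the pitch in Hz.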
Phasors and Granular synthesis

Disassembling time into very small pieces

Time-stretching

Adapted from Andy Farnell, “Designing Sound”

https://reactivemusic.net/?p=11385 Download these patches from: https://github.com/tkzic/max-projects folder: granular-timestretch

  • Basic granular synthesis: graintest3.maxpat
  • Time-stretching: timestretch5.maxpat
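
Here is a rough sketch of granular time-stretching in Python. It is not a port of the Max patches above. The grain size and the 50% overlap are assumptions. Grains are read from the input at a slower (or faster) step than they are written to the output, so the length changes but the pitch inside each grain does not.

```python
import math


def hann(size):
    """Hann window; two copies offset by half the size add up to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / size) for n in range(size)]


def granular_stretch(samples, stretch, grain_size=256):
    """Overlap-add windowed grains, stepping through the input 1/stretch as fast."""
    if len(samples) < grain_size:
        raise ValueError("input is shorter than one grain")
    hop_out = grain_size // 2   # output grains overlap by 50%
    hop_in = hop_out / stretch  # the input read position advances more slowly
    window = hann(grain_size)
    n_grains = int((len(samples) - grain_size) / hop_in) + 1
    out = [0.0] * ((n_grains - 1) * hop_out + grain_size)
    for g in range(n_grains):
        start_in = int(g * hop_in)
        start_out = g * hop_out
        for n in range(grain_size):
            out[start_out + n] += samples[start_in + n] * window[n]
    return out
```

A stretch of 2 gives a result about twice as long. Very small grains sound buzzy and very large grains produce audible echoes.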

More about phasors and granular synthesis
Phase vocoder

…coming soon

Sonographic sound processing

Changing sound into pictures and back into sound

by Tadej Droljc

 https://reactivemusic.net/?p=16887

(Example of 3d speech processing at 4:12)

local file: SSP-dissertation/4 – Max/MSP/Jitter Patch of PV With Spectrogram as a Spectral Data Storage and User Interface/basic_patch.maxpat

Try recording a short passage, then set bound mode to 4, and click autorotate

Speech to text

Understanding the meaning of speech

The Google Speech API

A conversation with a robot in Max

https://reactivemusic.net/?p=9834

Google speech uses neural networks, statistics, and large quantities of data.

More about speech to text

Voices of the natural world

Changes in the environment reflected by sound

Fictional languages and animals

“You can talk to the animals…”

Pig creatures example: http://vimeo.com/64543087

  • 0:00 Neutral
  • 0:32 Single morphemes – neutral mode
  • 0:37 Series, with unifying sounds and breaths
  • 1:02 Neutral, layered
  • 1:12 Sad
  • 1:26 Angry
  • 1:44 More Angry
  • 2:11 Happy

What about Jar Jar Binks?

Accents

The sound changes but the words remain the same.

The Speech accent archive https://reactivemusic.net/?p=9436

Finding and removing music in speech

We are always singing.

Jamming with speech
Removing music from speech
SMS-tools

by Xavier Serra and UPF

Harmonic Model Plus Residual (HPR) – Build a spectrogram using STFT, then identify where there is strong correlation to a tonal harmonic structure (music). This is the harmonic model of the sound. Subtract it from the original spectrogram to get the residual (noise).
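
The subtraction step can be illustrated with a heavily simplified sketch in Python. This is not the sms-tools code. It assumes the fundamental (f0) is already known and constant, and that the frame holds a whole number of periods. The real tools estimate f0 frame by frame from the STFT.

```python
import cmath
import math


def harmonic_plus_residual(frame, sample_rate, f0, max_harmonics=30):
    """Split a frame into a harmonic part and whatever is left (the residual)."""
    n = len(frame)
    harmonic = [0.0] * n
    for k in range(1, max_harmonics + 1):
        freq = k * f0
        if freq >= sample_rate / 2:  # stop at the Nyquist frequency
            break
        # measure the amplitude and phase of this harmonic in the frame
        coeff = sum(
            frame[i] * cmath.exp(-2j * math.pi * freq * i / sample_rate)
            for i in range(n)
        ) * 2 / n
        # resynthesize the harmonic and add it to the harmonic model
        for i in range(n):
            harmonic[i] += (coeff * cmath.exp(2j * math.pi * freq * i / sample_rate)).real
    residual = [x - h for x, h in zip(frame, harmonic)]
    return harmonic, residual
```

Anything that does not fit the harmonic series of f0 (breath, consonants, room noise) ends up in the residual.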

Screen Shot 2015-01-06 at 1.40.37 AM

Screen Shot 2015-01-06 at 1.40.12 AM

Settings for above example:

  • Window size: 1800 (approximately SR / f0 * lobeWidth: 44100 / 200 * 8 = 1764)
  • FFT size: 2048
  • Mag threshold: -90
  • Max harmonics: 30
  • f0 min: 150
  • f0 max: 200
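
The window size arithmetic above, worked out in Python. The function names are made up for this example. The observation that 2048 is the next power of two above the window size is an assumption about how the FFT size was chosen.

```python
def analysis_window_size(sample_rate, f0, lobe_width):
    """Window length in samples: SR / f0 * lobeWidth."""
    return sample_rate / f0 * lobe_width


def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    return 1 << (n - 1).bit_length()


size = analysis_window_size(44100, 200, 8)  # 1764.0, used as 1800 in the settings
fft_size = next_power_of_two(1800)          # 2048
```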
feature detection
  • time dependent
  • Low level features: harmonicity, amplitude, fundamental frequency
  • high level features: mood, genre, danceability

Acoustic Brainz: (typical analysis page) https://reactivemusic.net/?p=17641

Essentia (open source feature detection tools)  https://github.com/MTG/essentia

Freesound (vast library of sounds):  https://www.freesound.org – look at “similar sounds”

Removing voices from music

A sad thought

phase cancellation encryption

This method was used to send secret messages during World War II. It’s now used in cell phones to get rid of echo. It’s also used in noise-canceling headphones.
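
A toy sketch of the idea in Python: bury a message under a masking signal, then recover it by adding an inverted copy of the same mask. The signals here are made-up sine waves standing in for speech and noise. Real systems also have to keep the two copies of the mask perfectly synchronized.

```python
import math


def mix(a, b):
    """Add two signals sample by sample."""
    return [x + y for x, y in zip(a, b)]


def invert(signal):
    """Flip the phase (polarity) of a signal."""
    return [-x for x in signal]


message = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
mask = [3 * math.sin(2 * math.pi * 17 * n / 100 + 1.0) for n in range(100)]

sent = mix(message, mask)              # the message is buried under the mask
recovered = mix(sent, invert(mask))    # the inverted mask cancels the original mask
```

Only someone holding an exact copy of the mask can cancel it, which is what makes this usable as a secret channel.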

https://reactivemusic.net/?p=8879

max-projects/phase-cancellation/phase-cancellation-example.maxpat

Center channel subtraction

What is not left and not right?

Ableton Live – utility/difference device: https://reactivemusic.net/?p=1498 (Allison Krause example)

Local file: Ableton-teaching-examples/vocal-eliminator
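
Center channel subtraction in a few lines of Python. This is a sketch that assumes the vocal is mixed identically into the left and right channels. The tiny sample lists are made-up numbers for illustration.

```python
def remove_center(left, right):
    """Subtract right from left: anything identical in both channels cancels."""
    return [l - r for l, r in zip(left, right)]


vocal = [0.5, -0.25, 0.125, 0.0]   # panned to the center (same in both channels)
guitar = [0.25, 0.25, -0.5, 1.0]   # panned hard left
piano = [0.0, 0.5, 0.25, -0.25]    # panned hard right

left = [v + g for v, g in zip(vocal, guitar)]
right = [v + p for v, p in zip(vocal, piano)]
no_vocal = remove_center(left, right)  # guitar minus piano, vocal gone
```

The result is mono, and anything else panned to the center (often bass and kick drum) disappears along with the voice.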

More experiments

Questions

  • Why do most people not like the recorded sound of their voice?
  • Can voice be used as a controller?
  • How do you recognize voices?
  • Does speech recognition work with singing?
  • How does the Google Speech API know the difference between music and speech?
  • How can we listen to ultrasonic animal sounds?
  • What about animal translators?

 

chord generator using Live Object Model

Global tempo setting triggers MIDI chords

download

https://github.com/tkzic/max-for-live-projects

folder: repeater

device: chord.amxd

instructions
  1. drag chord.amxd into a MIDI track
  2. put a MIDI instrument after the device in the track
  3. start the global transport (play on top toolbar)
  4. arm the MIDI track for record
  5. play some MIDI notes
  6. adjust the note value in the device
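
The timing arithmetic behind a tempo-synced chord trigger can be sketched in Python. This is not the code inside chord.amxd. It assumes the beat is a quarter note, and the choice of a major triad is only an example.

```python
def note_interval_ms(bpm, note_value):
    """Milliseconds between triggers; note_value is a fraction of a whole note."""
    quarter_ms = 60000.0 / bpm         # assumes one beat is a quarter note
    return quarter_ms * 4 * note_value


def major_triad(root):
    """MIDI note numbers for a major triad, kept inside the MIDI range 0-127."""
    return [note for note in (root, root + 4, root + 7) if 0 <= note <= 127]
```

At 120 bpm a quarter note (0.25) triggers every 500 ms, and an eighth note (0.125) every 250 ms.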

random tone and speed changes

This version randomly generates chord tones and note durations.

In the same folder, device: chord2.amxd

instructions
  1. drag chord2.amxd into a MIDI track
  2. put a MIDI instrument after the device in the track
  3. start the global transport (play on top toolbar)
  4. arm the MIDI track for record
  5. hold down a MIDI note
  6. use toggles to select parameters to randomize
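
A sketch of the randomizing idea in Python. It is not the logic inside chord2.amxd. The one-octave span and the list of note values are assumptions, and the function names are made up for this example.

```python
import random


def random_chord(root, rng, size=3, span=12):
    """Keep the held note and add random chord tones within 'span' semitones above it."""
    offsets = rng.sample(range(1, span + 1), size - 1)  # distinct intervals
    return sorted([root] + [root + offset for offset in offsets])


def random_note_value(rng, choices=(0.0625, 0.125, 0.25, 0.5)):
    """Pick a random duration, expressed as a fraction of a whole note."""
    return rng.choice(choices)
```

Passing in a seeded generator, for example random.Random(7), makes the “random” sequence repeatable, which is handy when a good one turns up.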

Ableton Live looping project

Current local version is in tkzic/van project/van47g key9d.als

changes

  • Using Korg nanoKontrol instead of foot pedals (actually either will work) – see map below
  • Now works with any audio interface
  • Mic input for channel 9 is fed to output to use as direct live monitor

Essentially it now works with a minimal amount of external hardware.

midiStroke

Backed up the midiStroke config file in its current state. The file is located at: /Users/tkzic/Library/Application Support/midiStroke/midiStroke.cdcmidistroke. Backups will be kept in the same folder.

nanoKontrol map

Scene 1
Top row of buttons
  1. up
  2. down
  3. left
  4. right
  5. stop all clips
  6. delete
  7. finish recording and arm track to the right (enter/right/enter)
  8. record start/stop (enter)
  9. launch scene in current row and move cursor to track 5
bottom row of buttons
  1. unassigned…
dials

Correspond to track levels 1-9

faders
  1. Master level
  2. unassigned…

what’s next

  • Make a lightweight template version (without all of Van’s music) – that can be used for new projects
  • set up with Launchpad to do fx processing
  • MIDI version – or MIDI tracks?

 

max-for-live-projects index

A collection of experiments using Max for Live.

Each project is in a separate folder. Several projects require additional external objects or dependencies. You will find helpful instructions by clicking links next to the project names below.

download

max-for-live-projects on Github: https://github.com/tkzic/max-for-live-projects

Runs in Live 9 and Max 6.1, on Mac OS 10.9

index

Location of midiStroke preset file

MidiStroke is a utility app that converts MIDI messages into typewriter keystrokes.

By Charlie Roberts

Configuration file (On a Macbook):

/Users/tkzic/Library/Application Support/midiStroke/midiStroke.cdcmidistroke

Helpful article at djtechtools.com:

http://www.djtechtools.com/2010/06/18/using-multile-controllers-with-itch/

You can’t create multiple preset files within midiStroke – but you can work within the file system as they explain in the article above.

HISS impulse response tutorials

notes

From Pierre Alex Tremblay – 2 videos demonstrating how to create an IR using HISS tools

http://vimeo.com/tremblap/videos

Link to a version of the paper that is not a photograph of the journal pages:

http://quod.lib.umich.edu/cgi/p/pod/dod-idx/hisstools-impulse-response-toolbox-convolution-for.pdf?c=icmc;idno=bbp2372.2012.029

[update 1/2014]

Here’s the Max Patch from the video – from this C74 forum thread: http://cycling74.com/forums/topic/a-quick-tutorial-video-on-how-to-create-an-impulse-response/

the local file is: tkzic/max teaching examples/impulse-response-rodrigo-vid.maxpat

Ok, I must be living under a rock. Did not realize this stuff was in Live 9.

This tutorial explains how to use Live’s convolution reverb effect, as well as the IR measurement tool. The measurement tool lets you record impulse responses using a sine spectrum sweep. Essentially these are the Alex Harker Max tools made into M4L devices.

 http://www.macprovideo.com/hub/ableton-live/create-your-own-impulse-responses-in-ableton-live-9
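
The two ingredients, a sine sweep and convolution, can be sketched in plain Python. This is not the HISS tools or Live implementation. The exponential sweep formula is a common choice and is an assumption here, and the direct convolution loop is far too slow for real audio.

```python
import math


def sine_sweep(f_start, f_end, seconds, sample_rate):
    """Exponential sine sweep from f_start to f_end (both in Hz)."""
    k = math.log(f_end / f_start)
    out = []
    for n in range(round(seconds * sample_rate)):
        t = n / sample_rate
        phase = 2 * math.pi * f_start * seconds / k * (math.exp(k * t / seconds) - 1)
        out.append(math.sin(phase))
    return out


def convolve(dry, impulse_response):
    """Direct convolution: every input sample triggers a scaled copy of the IR."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out
```

Play the sweep in a room, record the result, and process it to get the room's impulse response. Convolving any dry sound with that response places it in the room.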