API Archives

May 11, 2014 (updated June 22, 2014)

Google speech API v2

notes

update 5/17/2014: The key in the post below is now disabled. Trying this one: AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw

It works for now, but I will probably need to get a real key and add instructions for inserting the key into the patches robot_conversation5.maxpat and speech-to-google-text-api5.maxpat.

Basic instructions:

Edit the Max patch.
Go into the subpatch call-google-speech and replace the key string inside the curl command with the correct key.

earlier post

The v1 Google Speech API broke a few days ago.

Here is an example of how to run v2: https://github.com/gillesdemey/google-speech-v2

I have updated the Max patches in the Internet Sensors project: https://github.com/tkzic/internet-sensors

This version of the API produces malformed JSON responses.
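The notes don't say exactly how the responses are malformed. One commonly reported pattern for this endpoint is a body that contains several JSON objects, one per line, with the first one often an empty {"result":[]}. Here is a minimal Python sketch for handling that case. The pattern itself is an assumption, and the sample body is made up for illustration, not captured output.

```python
import json

def parse_speech_response(body):
    """Return the first transcript found in a multi-object response body, or None."""
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any line that is not a complete JSON value
        if not isinstance(obj, dict):
            continue
        for result in obj.get("result", []):
            for alt in result.get("alternative", []):
                if "transcript" in alt:
                    return alt["transcript"]
    return None

# Hypothetical response body: an empty object followed by the real result.
sample = (
    '{"result":[]}\n'
    '{"result":[{"alternative":[{"transcript":"hello robot"}],"final":true}],'
    '"result_index":0}\n'
)
print(parse_speech_response(sample))
```

Parsing line by line avoids handing the whole body to a JSON parser that expects exactly one object.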

Here’s an example using curl:

curl -v -i -X POST -H "Content-Type:audio/x-flac; rate=16000" -T /tmp/tweet.flac "https://www.google.com/speech-api/v2/recognize?xjerr=1&client=chromium&lang=en-US&maxresults=10&pfilter=0&key=AIzaSyCnl6MRydhw_5fLXIdASxkLJzcJh5iX0M4"
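For reference, the same request can be assembled in Python with the standard library, which makes the key easy to swap. This is only a sketch that mirrors the curl parameters above. The function name and the YOUR_API_KEY placeholder are hypothetical, and nothing is sent until you call urlopen yourself.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://www.google.com/speech-api/v2/recognize"

def build_request(flac_bytes, api_key, lang="en-US", rate=16000):
    """Build (but do not send) a POST request matching the curl example."""
    params = {
        "xjerr": 1,
        "client": "chromium",
        "lang": lang,
        "maxresults": 10,
        "pfilter": 0,
        "key": api_key,  # the only value that normally needs to change
    }
    url = ENDPOINT + "?" + urllib.parse.urlencode(params)
    headers = {"Content-Type": "audio/x-flac; rate=%d" % rate}
    return urllib.request.Request(url, data=flac_bytes, headers=headers, method="POST")

# Placeholder audio bytes and key, for illustration only.
req = build_request(b"fake-flac-bytes", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```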

Instructions for getting a real key… http://www.chromium.org/developers/how-tos/api-keys

Note: Need to look at the double buffering methods in the Max patches to make sure they handle various sample rates properly. I think they may be optimized for 44.1 kHz.

May 8, 2014 (updated June 22, 2014)

What we watch

From the MIT Civic Media Lab (Ethan Zuckerman)

http://whatwewatch.mediameter.org/#all

May 4, 2014 (updated June 24, 2014)

Web Audio API demo list

https://github.com/WebAudio/demo-list

March 16, 2014 (updated June 24, 2014)

Interviewly

Reformats Reddit AMAs into readable interviews

found at kottke.org

http://interviewly.com

February 12, 2014 (updated June 19, 2014)

Searching by image

Using Google’s search by image feature to return similar images

http://images.google.com/imghp?hl=en

With Google you can search by image. But it gets really interesting when you upload an image that is not available on the internet and look at the set of similar images returned. Or if you use a common image but just view the visually similar results. For example, here is a protein molecule (http://www.kurzweilai.net/images/ferritin.jpg)

Here are similar image results returned by Google.

You can also restrict the results to faces:

A few internet images to try:

Camera images (not on the Internet until they were posted here): these give more interesting results. For example, the woman with the flower (using face matching) returns images of Erik Prince and Brad Pitt.

February 11, 2014 (updated January 22, 2024)

More conversations with robots in Max

Using Google speech API and Pandorabots API

(updated 1/21/2024)

All of these changes are local – for now.

Replace the path to sox with /opt/homebrew/bin/sox in [p call-google-speech].

Also had to write a new Python script to convert XML to JSON. It’s in the subfolder /xml2json/xml4json.py

The program came from this link: https://www.geeksforgeeks.org/python-xml-to-json/

Also, inside [p call-pandorabots] the path to this Python program had to be given explicitly as the full path on the computer. This will vary depending on your Python installation.

Also, note that you must install a dependency with pip:

pip install xmltodict
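The script itself isn't shown here. As a rough idea of what the XML-to-JSON step does, here is a standard-library-only sketch. The actual script uses xmltodict, and this version is not guaranteed to produce identical output. The XML sample and its element names are assumptions for illustration, not a captured chatbot response.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Attributes become '@name' keys, child elements are keyed by tag,
    and an element with only text collapses to that text."""
    node = {"@" + k: v for k, v in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag turns into a list of values.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text
        node["#text"] = text
    return node

def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)})

# Hypothetical chatbot reply, for illustration only.
sample = '<result status="0"><input>hello</input><that>Hi there!</that></result>'
print(xml_to_json(sample))
```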

After all that I was actually able to have a conversation. These bots seem primitive, but lovable, now compared to ChatGPT. Guess it’s time for a new project.

Also, the voice selection for speech synth is still not connected.

(updated 1/21/2021)

This project is an extension to the speech-to-text project: https://reactivemusic.net/?p=4690 You might want to try running that project first to get the Google speech API running.

features

Everything runs in one Max patch
menu selection of chat bots and voices (currently disabled)
filtering of non speakable text (like HTML tags)
Python script now runs under current directory of patch using relative path
refinements to recording and chatbot engines
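The project does its text filtering in clean-html.js, which isn't reproduced here. As an illustration of the general idea (stripping tags so the speech synthesizer doesn't try to read them), here is a small Python sketch. The function name and the regular expressions are my own assumptions, not a port of the project's script.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
SPACE_BEFORE_PUNCT_RE = re.compile(r"\s+([.,!?;:])")

def clean_for_speech(text):
    """Strip HTML tags, decode entities, and tidy whitespace."""
    no_tags = TAG_RE.sub(" ", text)           # replace each tag with a space
    decoded = html.unescape(no_tags)          # &amp; -> &, &#39; -> '
    collapsed = " ".join(decoded.split())     # collapse runs of whitespace
    return SPACE_BEFORE_PUNCT_RE.sub(r"\1", collapsed)

print(clean_for_speech("Hello <b>there</b>,<br> I&#39;m a bot."))
# prints: Hello there, I'm a bot.
```

A regular expression is good enough for short chatbot replies, but it is not a general HTML parser.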

download

https://github.com/tkzic/internet-sensors

folder: google-speech

files

main Max patch

robot-conversation7.maxpat

abstractions and other files

clean-html.js
xml2json/xml2json.py
JSON-google-speech.js
JSON-pandorabot.js
ms-counter.maxpat (timer for recording messages)
pandorabots.txt

Max external objects

[shell] from https://github.com/jeremybernstein/shell/releases/tag/1.0b2 – download this external and add its folder under Options | File Preferences in Max

external programs:

sox: the sox audio conversion program must be in the computer’s executable file path, e.g., /usr/bin – or you can rewrite the [sprintf] input to [aka.shell] with the actual path. In our case we installed sox using MacPorts. The executable path is /opt/local/bin/sox – which is built into a message object in the subpatcher [call-google-speech]

get sox from: http://sox.sourceforge.net
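Since the sox path differs between installs, it can help to check where sox actually lives before editing the patch. Here is a hedged Python sketch. The candidate list comes from paths mentioned in these notes. The conversion arguments (16 kHz, mono, FLAC output) and the file names are assumptions based on the curl example in the related post, not a copy of the command inside the patch.

```python
import shutil

# Locations mentioned in these notes; adjust for your machine.
CANDIDATES = ["/opt/homebrew/bin/sox", "/opt/local/bin/sox", "/usr/bin/sox"]

def find_sox(candidates=CANDIDATES):
    """Return a usable path to sox, or None if it cannot be found."""
    found = shutil.which("sox")  # search the executable path first
    if found:
        return found
    for path in candidates:
        if shutil.which(path):   # a full path is checked directly
            return path
    return None

def sox_convert_command(sox_path, src, dst, rate=16000):
    """Argument list to convert src to a mono file at the given sample rate."""
    return [sox_path, src, "-r", str(rate), "-c", "1", dst]

sox = find_sox()
print("sox found at:", sox)
# Hypothetical file names:
print(sox_convert_command(sox or "sox", "/tmp/tweet.wav", "/tmp/tweet.flac"))
```

The resulting list can be passed to subprocess.run, or joined into a string for the message object that feeds the shell external.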

Instructions

Open robot-conversation7.maxpat and turn on audio
Select chatbot as destination
Press the spacebar to start recording.
Ask a question.
Press the spacebar to stop recording.

notes

Need to fix the selection of voices.

revision history

1/21/2021: Complete rewrite for Max 8 and Catalina.
4/24/2016: An explicit path to sox is needed in the call-google-speech subpatch. In my MacPorts version the path is /usr/local/opt/bin/sox.
6/6/2014: re-added missing pandorabots.txt (list of chatbots) – also noticed that pandorabots.com was not available. May need to look for another site.
5/11/2014: The newest version requires Max 6.1.7 (for JSON parsing). Also have updated to Google Speech API v2.
Note: Instructions for getting a real key from Google, which will need to be inserted into the patch, are at http://www.chromium.org/developers/how-tos/api-keys – so far we have been getting by with common keys from a GitHub site (see notes in next link)

Also please see these notes about how to modify the patch with your key – until this gets resolved: https://reactivemusic.net/?p=11035

This project added to internetsensors 3/26/2014
This is an update to the robot conversation project https://reactivemusic.net/?p=4710

February 4, 2014 (updated June 25, 2014)

Notes: Chatbots in Conversation

update 6/2014 – Now part of the Internet sensors projects: https://reactivemusic.net/?p=5859

original post

They can talk with each other… sort of.

Last spring I made a project that lets you talk with chatbots using speech recognition and synthesis. https://reactivemusic.net/?p=4710.

Yesterday I managed to get two instances of this program, running on two computers, using two chatbots, to talk with each other, through the air. Technical issues remain (see below). But there were moments of real interaction.

In the original project, a human pressed a button in Max to start and stop recording speech. This has been automated. The program detects and records speech using audio level sensing. The auto-recording sensor turns on a switch when the level hits a threshold, and turns it off after a period of silence. The threshold level and the duration of silence can be adjusted by the user. There is also a feedback gate that shuts off auto-record while the computer is converting speech to text and ‘speaking’ a reply.
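The logic lives in the Max patch, but the behaviour described above can be sketched as a small state machine. This Python version is only an illustration. The class name, the default values, and the use of a frame count for the silence duration are assumptions, not taken from the patch.

```python
class AutoRecorder:
    """Start on a loud level, stop after a run of quiet levels, ignore input while gated."""

    def __init__(self, threshold=0.1, silence_frames=3):
        self.threshold = threshold            # level that counts as speech
        self.silence_frames = silence_frames  # quiet readings needed to stop
        self.recording = False
        self.gated = False                    # True while converting or speaking
        self._quiet = 0

    def process(self, level):
        """Feed one level reading; return 'start', 'stop', or None."""
        if self.gated:
            return None
        if not self.recording:
            if level >= self.threshold:
                self.recording = True
                self._quiet = 0
                return "start"
            return None
        if level >= self.threshold:
            self._quiet = 0                   # speech resumed, reset the timer
            return None
        self._quiet += 1
        if self._quiet >= self.silence_frames:
            self.recording = False
            return "stop"
        return None

recorder = AutoRecorder()
levels = [0.0, 0.2, 0.3, 0.05, 0.2, 0.0, 0.0, 0.0, 0.0]
events = [(i, e) for i, e in ((i, recorder.process(v)) for i, v in enumerate(levels)) if e]
print(events)
```

Setting gated to True while the reply is being synthesized plays the role of the feedback gate, so the program does not record its own voice.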

technical issues

The Google speech API has difficulty with some of the voices used by the Mac OS speech synthesizer. We’ll need to experiment to find which voices produce accurate results.
The overall level produced by the built-in MacBook speakers is not quite enough to achieve clear communication. The auto-recorder sometimes missed the onset of speech. One solution would be to insert a click to trigger the recorder just before the speech synthesizer begins the actual speech. Another would be to use external speakers, or a secondary “wired” connection.
It would be nice to have menus of chatbots and voices. Also to automate the start of a new conversation thread.
The button to start the audio detector had to be operated by key-press because pushing the trackpad on a MacBook makes too much noise and always triggers the audio level detector.
Occasionally a chat bot would deliver a long response, or one containing a web address. These were problematic for recognition and synthesis.
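For the last issue, one possible approach (not something the project currently does) would be to shorten replies before they are spoken. Here is a hedged Python sketch. The function name, the two-sentence limit, and the "a web link" replacement phrase are arbitrary choices for illustration.

```python
import re

URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")

def shorten_reply(text, max_sentences=2):
    """Replace web addresses with a spoken placeholder and keep the first few sentences."""
    without_urls = URL_RE.sub("a web link", text)
    collapsed = " ".join(without_urls.split())
    sentences = SENTENCE_SPLIT_RE.split(collapsed)
    return " ".join(sentences[:max_sentences])

reply = "I like music. See http://example.com/page for more. Do you? I do."
print(shorten_reply(reply))
# prints: I like music. See a web link for more.
```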