I’ve revised the PHP program that streams Tweets and sends them to Max, so that it removes hyperlinks, RT indicators, user mentions, and ASCII art. Now it works better with text-to-speech.
things that could be done in a future project:
This diagram shows the process of handling a stream, using a data store as an intermediary.
The JSON response breaks out various components of the tweet, like hashtags and URLs, but it doesn’t provide a clean version of the text – which, for example, could be converted to speech.
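One way to get a clean version is to use the `indices` pair that each entity carries in the response. This is only a minimal sketch: the function name `clean_tweet_text` and the sample tweet are made up for illustration, and it assumes the mbstring extension is available (the indices count characters, not bytes). It keeps the hashtag word but drops the `#`, and removes URLs and user mentions entirely.

```php
<?php
// Hypothetical helper: rebuild the tweet text from its entity indices.
function clean_tweet_text(string $text, array $entities): string
{
    $edits = [];
    // Hashtags: keep the word, drop the leading '#'.
    foreach ($entities['hashtags'] ?? [] as $h) {
        $edits[] = [$h['indices'][0], $h['indices'][1], $h['text']];
    }
    // URLs and user mentions: remove the whole span.
    foreach (['urls', 'user_mentions'] as $kind) {
        foreach ($entities[$kind] ?? [] as $e) {
            $edits[] = [$e['indices'][0], $e['indices'][1], ''];
        }
    }
    // Apply the edits right to left so earlier indices stay valid.
    usort($edits, fn($a, $b) => $b[0] <=> $a[0]);
    foreach ($edits as [$start, $end, $replacement]) {
        $text = mb_substr($text, 0, $start) . $replacement . mb_substr($text, $end);
    }
    // Collapse the leftover runs of white space.
    return trim((string) preg_replace('/\s+/u', ' ', $text));
}

// Made-up tweet in the same shape as the streaming response.
$tweet = json_decode('{
    "text": "my #cat is purple http://example.com/x",
    "entities": {
        "hashtags": [{ "text": "cat", "indices": [3, 7] }],
        "urls": [{ "url": "http://example.com/x", "indices": [18, 38] }],
        "user_mentions": []
    }
}', true);

$clean = clean_tweet_text($tweet['text'], $tweet['entities']);
echo $clean, "\n"; // prints "my cat is purple"
```

Working from the indices avoids the problem with plain string replacement, where a short hashtag or screen name can also match inside some other word.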
Here’s a sample response which shows all the fields:
{ "created_at": "Sat Feb 23 01:30:55 +0000 2013", "id": 305127564538691584, "id_str": "305127564538691584", "text": "Window Seat #photo #cats http:\/\/\/sf0fHWEX2S", "source": "\u003ca href=\"http:\/\/\/\" rel=\"nofollow\"\u003eEchofon\u003c\/a\u003e", "truncated": false, "in_reply_to_status_id": null, "in_reply_to_status_id_str": null, "in_reply_to_user_id": null, "in_reply_to_user_id_str": null, "in_reply_to_screen_name": null, "user": { "id": 19079184, "id_str": "19079184", "name": "theRobot Vegetable", "screen_name": "roveg", "location": "", "url": "http:\/\/\/", "description": "choose art, not life", "protected": false, "followers_count": 975, "friends_count": 454, "listed_count": 109, "created_at": "Fri Jan 16 18:38:11 +0000 2009", "favourites_count": 1018, "utc_offset": -39600, "time_zone": "International Date Line West", "geo_enabled": false, "verified": false, "statuses_count": 10888, "lang": "en", "contributors_enabled": false, "is_translator": false, "profile_background_color": "1A1B1F", "profile_background_image_url": "http:\/\/\/profile_background_images\/6824826\/BlogisattvasEtc.gif", "profile_background_image_url_https": "https:\/\/\/profile_background_images\/6824826\/BlogisattvasEtc.gif", "profile_background_tile": true, "profile_image_url": "http:\/\/\/profile_images\/266371487\/1roveggreen_normal.gif", "profile_image_url_https": "https:\/\/\/profile_images\/266371487\/1roveggreen_normal.gif", "profile_link_color": "2FC2EF", "profile_sidebar_border_color": "181A1E", "profile_sidebar_fill_color": "252429", "profile_text_color": "666666", "profile_use_background_image": false, "default_profile": false, "default_profile_image": false, "following": null, "follow_request_sent": null, "notifications": null }, "geo": null, "coordinates": null, "place": null, "contributors": null, "retweet_count": 0, "entities": { "hashtags": [{ "text": "photo", "indices": [12, 18] }, { "text": "cats", "indices": [19, 24] }], "urls": [{ "url": "http:\/\/\/sf0fHWEX2S", 
"expanded_url": "http:\/\/\/?p=186", "display_url": "\/?p=186", "indices": [25, 47] }], "user_mentions": [] }, "favorited": false, "retweeted": false, "possibly_sensitive": false, "filter_level": "medium" }
A method using regular expressions and PHP. What it actually does is parse Tweets with regexps and reformat the text as HTML with links. A tutorial here:
This is a php library that breaks out hashtags, usernames, etc., but doesn’t really provide a way to isolate the remaining stuff. I have put it in tkzic/API – there is an example php program provided.
hashtags – using regular expressions
twitter-text-rb : ruby gem which parses out usernames and hashtags
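For the regular-expression route, something like the following could pull hashtags out of raw tweet text when the `entities` block isn’t available. It’s a rough sketch, not Twitter’s official hashtag grammar (the twitter-text libraries handle many more edge cases), and `extract_hashtags` is a name I made up.

```php
<?php
// Hypothetical helper: return the hashtag words (without '#') found in $text.
function extract_hashtags(string $text): array
{
    // A '#' counts only if it is not glued to a preceding letter, digit,
    // underscore or '&' (the last one skips HTML entities such as &#39;).
    preg_match_all('/(?<![\p{L}\p{N}_&])#([\p{L}\p{N}_]+)/u', $text, $matches);
    return $matches[1];
}

print_r(extract_hashtags('Window Seat #photo #cats')); // ['photo', 'cats']
```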
update 6/2014
original post
Got a test patch running today which breaks out tweets (using php and curl) and sends them to Max via OSC.
(update) Have parsed data to remove hyperlinks and Twitter symbols.
It took some tweaking of global variables in php – and it would probably be better written using classes, as in this example (see the post from GZipp).
Max patch: tkzic/max teaching examples/twitter-php-streamer1.maxpat
php code: twitterStreamMax.php
<?php
// max-osc-play.php
//
// collection of php OSC code from Max stock-market thing
//

include 'udp.php'; // udp data sending stuff

$DESTINATION = 'localhost';
$SENDPORT = '7400';
$RECVPORT = '7401';

//////////////////////////////////////////////////////////////////////////////////////////

$USERNAME = 'username';
$PASSWORD = 'password';
$QUERY = 'cats'; // the hashtag # is optional

// these variables are defined as global so they can be used inside the write callback function
global $osc;
global $kount;

// initialize OSC
$osc = new OSCClient(); // OSC object
$osc->set_destination($DESTINATION, $SENDPORT);

// This amazing program uses curl to access the Twitter streaming API and breaks the data
// into individual tweets which can be saved in a database, sent out via OSC, or whatever
//

/**
 * Called every time a chunk of data is read, this will be a json encoded message
 *
 * @param resource $handle The curl handle
 * @param string $data The data chunk (json message)
 */
function writeCallback($handle, $data)
{
    /*
    echo "-----------------------------------------------------------\n";
    echo $data;
    echo "-----------------------------------------------------------\n";
    */

    $maxdata = "/tweet"; // header - begin

    global $kount; // test counter
    global $osc;   // osc object

    $json = json_decode($data);
    if (isset($json->user) && isset($json->text)) {

        // here we have a single tweet
        echo "@{$json->user->screen_name}: {$json->text}\n\n";

        // do some cleaning up...
        $s = $json->text; // raw tweet text

        // ok now need to do the same thing below for URL's, RT's, @'s etc.,
        // and then remove redundant spaces

        /* example
        Depending on how greedy you'd like to be, you could do something like:
        $pg_url = preg_replace("/[^a-zA-Z 0-9]+/", " ", $pg_url);
        This will replace anything that isn't a letter, number or space
        */

        // display all hashtags and their indices
        foreach ($json->entities->hashtags as $obj) {
            echo "#:{$obj->text}\n"; // display hashtag
            // get rid of the hashtag
            // note: this gets rid of all hashtags, which could obscure the meaning of the tweet, if
            // the hashtag is used inside a sentence like: "my #cat is purple" - would be changed to: "my is purple"
            // so we could use some intelligent parsing here...
            // $s = str_replace("#{$obj->text}", "", $s);

            // this is a more benign approach, which leaves the word but removes the #
            $s = str_replace("#{$obj->text}", "{$obj->text}", $s);
        }

        foreach ($json->entities->urls as $obj) {
            echo "U:{$obj->url}\n"; // display url
            $s = str_replace("{$obj->url}", "", $s); // get rid of the url
        }

        foreach ($json->entities->user_mentions as $obj) {
            echo "@:{$obj->screen_name}\n"; // display
            $s = str_replace("RT @{$obj->screen_name}:", "", $s); // get rid of re-tweets
            $s = str_replace("@{$obj->screen_name}:", "", $s);    // get rid of other user mentions
            $s = str_replace("@{$obj->screen_name}", "", $s);     // get rid of other user mentions
        }

        // $s = str_replace("RT ", "", $s); // get rid of RT's (re-tweet indicators)
        // $s = preg_replace('/[^[:print:]]/', '', $s); // remove non printable characters

        $s = htmlspecialchars_decode($s); // decode stuff like &gt;
        $s = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/u', '', $s); // get rid of unicode junk
        $s = preg_replace('/[^(\x20-\x7F)]*/', '', $s); // get rid of other non printable stuff
        $s = preg_replace('!\s+!', ' ', $s); // remove redundant white space

        echo "revised tweet: {$s}\n";

        // send the cleaned text to Max ($json->text is the raw tweet)
        $maxdata = "/tweet " . "{$s}";
        // $maxdata = $maxdata . " " . $kount++;
        $osc->send(new OSCMessage($maxdata));
    }
    return strlen($data);
}

// initialize curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, '' . urlencode($QUERY));
curl_setopt($ch, CURLOPT_USERPWD, "$USERNAME:$PASSWORD");
curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'writeCallback');
curl_setopt($ch, CURLOPT_TIMEOUT, 20); // disconnect after 20 seconds for testing
curl_setopt($ch, CURLOPT_VERBOSE, 1); // debugging
curl_setopt($ch, CURLOPT_ENCODING, 'gzip, deflate'); // req'd to get gzip
curl_setopt($ch, CURLOPT_USERAGENT, 'tstreamer/1.0'); // req'd to get gzip

curl_exec($ch); // commence streaming

$info = curl_getinfo($ch);
var_dump($info);
?>
A sequencer that works like Tetris.
by Avery Rossow
Cool matrixing in the sub-patches…
local file: Live set is in tkzic/max teaching examples/rainmaker-thing
Thoughts on a streaming API project model with Max.
I’ve been trying to come up with generalized methods to handle the class of Max projects which read a stream of data from the Web, and use it to trigger events, for example, sound and graphics.
OSC is generally a good way to get data into Max from Web APIs. One issue with data streams is that they do not always provide a constant flow. In some cases, this is what makes them musical: the rhythm of the flow becomes the rhythm of the music.
But in some cases we are vexed by too little flow or too much.
When the flow is too sparse, and the project requires a constant flow – the stream can be fattened up by using a [metro] object to output the current stream value at a higher frequency.
When the flow is too fast – you can use [speedlim] for numbers – but not for text data like tweets about cats, which seem to stream in like a flood. One solution is to use a data-recorder, like our modified CNMAT list recorder in the Irish Train project.
You would need separate access to the record and play ‘heads’ – so for example you could record in real time, but start playing back at a slower rate (while the recording continues). This is essentially a form of stream buffering. The data recorder approach would also allow you to use various algorithms to ‘thin’ the data – for example, to keep up with the real time rate, but by using less of the data.
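The same record/play-head idea could also be sketched on the php side, before the data ever reaches Max. This is just an illustration under my own assumptions: `StreamRecorder` is a made-up class, not the CNMAT recorder, and it keeps everything in memory, so a long-running stream would need to trim old items.

```php
<?php
// Hypothetical buffer with independent record and play heads.
final class StreamRecorder
{
    private array $items = [];   // everything recorded so far
    private int $playHead = 0;   // index of the next item to play

    // Record head: append an item as it arrives from the stream.
    public function record(string $item): void
    {
        $this->items[] = $item;
    }

    // Play head: return the next item, or null when playback has caught up.
    // A $stride greater than 1 thins the data by skipping items.
    public function play(int $stride = 1): ?string
    {
        if ($this->playHead >= count($this->items)) {
            return null;
        }
        $item = $this->items[$this->playHead];
        $this->playHead += max(1, $stride);
        return $item;
    }

    // How many recorded items are still waiting to be played.
    public function backlog(): int
    {
        return max(0, count($this->items) - $this->playHead);
    }
}

$rec = new StreamRecorder();
foreach (['t0', 't1', 't2', 't3', 't4', 't5'] as $tweet) {
    $rec->record($tweet);        // recording runs at the stream's own rate
}
$first  = $rec->play();          // normal playback
$second = $rec->play(2);         // falling behind: use only every other item
$third  = $rec->play(2);
```

In a real program, `play()` would be called on a timer at whatever rate Max can handle, and the stride could be chosen from `backlog()` so that playback keeps up with the real-time rate while using less of the data.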
[update] got this working with the modified CNMAT data recorder patch. It allows separate control of recording and playback, simultaneously.
patch is in tkzic/max teaching examples/ data-recorder-tester.maxpat
The OSC code from the stock market music project is not really doing OSC.
But… it works well going from php->max. In the other direction it’s using a kluge of nc and an alarm clock shell program to receive UDP messages from Max, but it’s really kind of horrible – so I’m going to look again for an OSC library in php.
update 2/2013 This is hard to believe, but I haven’t yet found a real OSC library for php. Apparently php is so uncool that nobody wants to write for it anymore. Anyway, the code above works unidirectionally, so it’s of some use for existing php code.
Local files are max-php-osc-tester.maxpat and max-osc-play.php in tkzic/api
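For comparison, here is roughly what “really doing OSC” would take for the simplest case, one address with one string argument. In OSC 1.0 every string is null-terminated and padded to a multiple of four bytes, and the arguments are preceded by a type tag string (`,s` for a single string). This is a sketch, not the code in udp.php: the function names are made up, and the default host and port just mirror the values used above.

```php
<?php
// Null-terminate a string and pad it to a multiple of 4 bytes (OSC-string).
function osc_pad(string $s): string
{
    $s .= "\0";
    $pad = (4 - strlen($s) % 4) % 4;
    return $s . str_repeat("\0", $pad);
}

// Build an OSC message carrying a single string argument.
function osc_string_message(string $address, string $text): string
{
    return osc_pad($address) . osc_pad(',s') . osc_pad($text);
}

// Send one packet over UDP; returns false if the socket can't be opened
// or the whole packet wasn't written.
function osc_send(string $packet, string $host = 'localhost', int $port = 7400): bool
{
    $sock = @fsockopen("udp://{$host}", $port, $errno, $errstr, 1.0);
    if ($sock === false) {
        return false;
    }
    $ok = fwrite($sock, $packet) === strlen($packet);
    fclose($sock);
    return $ok;
}

$packet = osc_string_message('/tweet', 'hi'); // 16 bytes: "/tweet\0\0" ",s\0\0" "hi\0\0"
```

Calling `osc_send($packet)` would then deliver the message to a [udpreceive 7400] object in Max. This only covers the php->max direction; receiving from Max would still need a UDP listener plus a decoder for the same format.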
Analysis that might help with parsing:
from Captain Caveman
It’s starting to seem like the R-Pi is turning out to be a real musical instrument.