Parsing Tweets into spoken language

notes

I’ve revised the PHP program that streams Tweets and sends them to Max so that it removes hyperlinks, RT indicators, user mentions, and ASCII art. The output now works better with text-to-speech.
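As a rough illustration of this kind of cleanup, here is a minimal sketch in Python (not the actual PHP program). The regular expressions and the `clean_tweet` name are assumptions made for the example; in particular, the ASCII art rule is just a guess that treats any run of three or more symbol characters as art, so it also strips things like "..." and "!!!".

```python
import re

# All patterns below are illustrative assumptions, not the original PHP logic.
URL_RE = re.compile(r"https?://\S+")
RT_RE = re.compile(r"^\s*RT\s+@\w+:?\s*")      # leading "RT @user:" indicator
MENTION_RE = re.compile(r"@\w+")               # any remaining @user mention
ART_RE = re.compile(r"(?:[^\w\s]|_){3,}")      # 3+ symbols in a row, e.g. ^_^


def clean_tweet(text):
    """Strip parts of a tweet that text-to-speech reads badly."""
    text = RT_RE.sub("", text)
    text = URL_RE.sub("", text)       # remove URLs before the symbol-run rule
    text = MENTION_RE.sub("", text)
    text = ART_RE.sub(" ", text)
    return " ".join(text.split())     # collapse leftover whitespace


print(clean_tweet("RT @zooloo: my cat is turning purple http://t.co/abc ^_^ @bob"))
```

URLs are removed before the symbol-run rule because "://" would otherwise be treated as ASCII art and leave the rest of the link behind.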

things that could be done in a future project:

figure out which #hashtags are integral to content, and which are just tagged onto the end of a tweet
remove extraneous hyperlinks which don’t get parsed by the API
translate symbols like > into “greater than” or “better than”
translate (or at least flag) foreign languages – this could be aided by geocoding data
translate slang acronyms like OMG, LOL
natural language parsing (see Stanford open source program) for content and grammatical analysis
replace hyperlinks/picture-links with a ‘title’ from the actual target
natural language equivalents of things like: RT @zooloo:
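A few of the ideas above (trailing hashtags, symbols, slang acronyms, and a spoken form of "RT @zooloo:") can be roughed out with simple lookups. This is only a sketch in Python under stated assumptions: the tables `ACRONYMS` and `SYMBOLS`, the function names, and the "said:" phrasing are all hypothetical, and the choice of "greater than" for > ignores the "better than" reading entirely.

```python
import re

# Hypothetical lookup tables; a real project would need far larger ones.
ACRONYMS = {"OMG": "oh my god", "LOL": "laughing out loud"}
SYMBOLS = {">": "greater than", "&": "and"}

TRAILING_TAGS_RE = re.compile(r"(?:\s*#\w+)+\s*$")   # hashtags at the very end
RT_RE = re.compile(r"^RT @(\w+):\s*")


def strip_trailing_hashtags(text):
    """Drop hashtags tacked onto the end; keep the words of inline ones."""
    body = TRAILING_TAGS_RE.sub("", text)
    return body.replace("#", "")


def expand_for_speech(text):
    """Replace known acronyms and symbols with spoken equivalents."""
    def expand(match):
        word = match.group(0)
        return ACRONYMS.get(word.upper(), word)

    text = re.sub(r"\b[A-Za-z]+\b", expand, text)
    for symbol, spoken in SYMBOLS.items():
        text = text.replace(symbol, f" {spoken} ")
    return " ".join(text.split())


def speak_retweet(text):
    """Turn a leading 'RT @user:' into '<user> said:' (assumed phrasing)."""
    return RT_RE.sub(lambda m: f"{m.group(1)} said: ", text)


print(strip_trailing_hashtags("I love #cats so much #caturday #pets"))
print(expand_for_speech("OMG cats > dogs lol"))
print(speak_retweet("RT @zooloo: my cat is turning purple"))
```

Position is the only signal used for hashtags here: tags at the end are dropped and tags in the middle are kept as plain words. Deciding whether a tag is really integral to the content would take more than this.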

things to try

Running the output of text-to-speech through musical analysis tools, to detect pitch and rhythm
Chaining: use the content of one tweet to direct the search for the next one. For example, say you search for “cats” and get “my cat is turning purple”; then you would search for “purple” and get “I’ve never eaten a purple cow”; then you would search for “cow”, and so forth.
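The chaining idea could be prototyped along these lines. This Python sketch makes several assumptions: `search` is a hypothetical callable standing in for whatever Twitter search is actually used, the stopword list is a tiny placeholder, and the rule for picking the next term (the last word that is not a stopword and has not been searched yet) is just one heuristic that happens to reproduce the cats, purple, cow example.

```python
import re

# Placeholder stopword list, assumed for illustration only.
STOPWORDS = {"a", "an", "the", "is", "my", "i", "and", "of", "to", "never"}


def next_term(tweet, used):
    """Pick the last word that is not a stopword and not already searched."""
    words = re.findall(r"[a-z']+", tweet.lower())
    for word in reversed(words):
        if word not in STOPWORDS and word not in used:
            return word
    return None


def chain(search, start, steps):
    """Follow a chain of searches; `search` maps a term to a tweet or None."""
    term, used, tweets = start, {start}, []
    for _ in range(steps):
        tweet = search(term)
        if tweet is None:
            break
        tweets.append(tweet)
        term = next_term(tweet, used)
        if term is None:
            break
        used.add(term)
    return tweets


# A fake search backed by a dictionary, standing in for a real API call.
fake_results = {
    "cats": "my cat is turning purple",
    "purple": "I've never eaten a purple cow",
}
for tweet in chain(fake_results.get, "cats", 3):
    print(tweet)
```

Keeping track of terms that have already been used stops the chain from bouncing between the same two words.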