Parsing Tweets into spoken language


I’ve revised the PHP program that streams Tweets and sends them to Max so that it now removes hyperlinks, RT indicators, user mentions, and ASCII art. The output works much better with text-to-speech.
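The PHP source isn't reproduced here. The kind of filtering described above can be sketched in Python with a few regular expressions. This is a minimal sketch under assumptions. The patterns are hypothetical heuristics, not the rules the original program uses, and this is especially true of the rule that treats any run of three or more symbol characters as ASCII art.

```python
import re

# Hypothetical patterns; the original PHP program's rules are not shown.
RT_RE = re.compile(r"^RT\s+@\w+:?\s*")       # leading "RT @user:" retweet marker
URL_RE = re.compile(r"https?://\S+")         # http/https links
MENTION_RE = re.compile(r"(?<!\w)@\w+")      # @user, but not the @ inside an email
# Rough ASCII-art heuristic: 3+ consecutive characters that are not letters,
# digits, underscores, whitespace, or everyday punctuation.
ART_RE = re.compile(r"[^\w\s.,!?'\"#-]{3,}")
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip elements that text-to-speech would read out awkwardly."""
    text = RT_RE.sub("", text)       # remove the retweet prefix first
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = ART_RE.sub("", text)
    return SPACES_RE.sub(" ", text).strip()  # collapse leftover gaps


if __name__ == "__main__":
    print(clean_tweet("RT @zooloo: my cat is turning purple <(^^)> http://t.co/abc123 @bob"))
```

With the sample input, this prints `my cat is turning purple`. The retweet prefix is removed before the general mention pattern runs. This ensures the stray "RT" is removed along with its "@user".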

Things that could be done in a future project:

  • Figure out which #hashtags are integral to the content and which are just tacked onto the end of a tweet
  • Remove extraneous hyperlinks that the API doesn’t recognize as links
  • Translate symbols like > into “greater than” or “better than”
  • Translate (or at least flag) tweets in foreign languages; geocoding data could help identify them
  • Expand slang acronyms like OMG and LOL
  • Use natural language parsing (see Stanford’s open-source parser) for content and grammatical analysis
  • Replace hyperlinks and picture links with the title of the page they point to
  • Find natural-language equivalents for markup like “RT @zooloo:”
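Two of the items above, trailing hashtags and symbol/acronym translation, can be roughed out with simple lookups. The Python sketch below uses hypothetical tables and function names. It assumes that any run of hashtags at the very end of a tweet was tacked on and is safe to drop. Hashtags inside a sentence are assumed to be part of the content and only lose their "#". It always maps > to "greater than". Choosing "better than" instead would require context that a plain lookup doesn't have.

```python
# Hypothetical lookup tables; a real system would need much larger ones.
SYMBOLS = {">": "greater than", "<": "less than", "&": "and", "=": "equals"}
ACRONYMS = {"OMG": "oh my god", "LOL": "laughing out loud", "BRB": "be right back"}


def strip_trailing_hashtags(text: str) -> str:
    """Drop the run of hashtags at the very end of a tweet; keep inline ones."""
    words = text.split()
    while words and words[-1].startswith("#"):
        words.pop()
    return " ".join(words)


def expand_token(token: str) -> str:
    """Replace a symbol or known acronym, keeping trailing punctuation."""
    core = token.rstrip(".,!?")
    tail = token[len(core):]
    if core in SYMBOLS:
        return SYMBOLS[core] + tail
    if core.upper() in ACRONYMS:
        return ACRONYMS[core.upper()] + tail
    return token


def speakable(text: str) -> str:
    text = strip_trailing_hashtags(text)
    words = [expand_token(w) for w in text.split()]
    # Inline hashtags stay, minus the "#", so TTS reads them as plain words.
    return " ".join(w.lstrip("#") for w in words)


if __name__ == "__main__":
    print(speakable("OMG cats > dogs LOL! #pets"))
```

This prints `oh my god cats greater than dogs laughing out loud!`.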
Things to try:

  • Run the text-to-speech output through musical analysis tools to detect pitch and rhythm
  • Chaining: use the content of one tweet to direct the search for the next. For example, say you search for “cats” and get “my cat is turning purple”; you would then search for “purple” and get “I’ve never eaten a purple cow”, then search for “cow”, and so forth.
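The chaining idea can be prototyped without the live API. In this Python sketch, `search_tweets()` is a hypothetical stand-in that scans a small in-memory list rather than calling Twitter's search. The keyword rule is only one possible heuristic. It takes the last word of the tweet that is neither a stopword nor a word already searched. It happens to reproduce the cat, purple, cow example. The sketch starts from "cat" rather than "cats" because the toy search only matches whole words.

```python
from __future__ import annotations

import re

# Toy stand-in for Twitter search. The first two tweets come from the example
# above; the third is made up so the chain can continue one more step.
CORPUS = [
    "my cat is turning purple",
    "I've never eaten a purple cow",
    "the cow jumped over the moon",
]
# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "is", "my", "i", "i've", "never", "over", "of", "to"}


def search_tweets(keyword: str, seen: set[str]) -> str | None:
    """Return the first tweet not yet used that contains keyword as a whole word."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    for tweet in CORPUS:
        if tweet not in seen and pattern.search(tweet):
            return tweet
    return None


def next_keyword(tweet: str, used: set[str]) -> str | None:
    """Heuristic: the last non-stopword in the tweet that hasn't been searched."""
    words = re.findall(r"[a-z']+", tweet.lower())
    for word in reversed(words):
        if word not in STOPWORDS and word not in used:
            return word
    return None


def chain(start: str, max_steps: int = 10) -> list[tuple[str, str]]:
    """Follow keyword -> tweet -> keyword; return the (keyword, tweet) pairs."""
    steps: list[tuple[str, str]] = []
    used: set[str] = set()
    seen: set[str] = set()
    keyword: str | None = start
    while keyword is not None and len(steps) < max_steps:
        used.add(keyword)
        tweet = search_tweets(keyword, seen)
        if tweet is None:
            break
        seen.add(tweet)
        steps.append((keyword, tweet))
        keyword = next_keyword(tweet, used)
    return steps


if __name__ == "__main__":
    for keyword, tweet in chain("cat"):
        print(f"{keyword!r} -> {tweet!r}")
```

The `used` and `seen` sets keep the chain from looping back to an earlier keyword or tweet. The `max_steps` limit caps the chain when a search never runs dry.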