Posts Tagged ‘tools’

Gawk Scripts for Processing Twitter Data, Vol. 1

Well, getting stuck in Melbourne for a day and being unable to participate in day one of our ATN-DAAD workshop with Cornelius Puschmann and Katrin Weller from the University of Düsseldorf has at least enabled me to put the finishing touches on something I’ve been meaning to do for some time: collecting and sharing our various Gawk scripts for processing Twitter data gathered with Twapperkeeper or our modified yourTwapperkeeper. A ZIP file of all our (half-way decent) scripts is now available in the Tools section of our site.

These scripts enable the processing of comma- or tab-separated value files containing tweets related to specific hashtags or keywords, as Twapperkeeper used to produce them, and as yourTwapperkeeper does once you’ve installed the modified export functions which I shared in a previous post.


22 June 2011

Switching from Twapperkeeper to yourTwapperkeeper

As those of you who are regular followers of our research might have gleaned already, we started using yourTwapperkeeper a few months ago to gather our Twitter data. yourTwapperkeeper is the open source version of the software that runs Twapperkeeper.com, which was one of the best tools for gathering Twitter data on selected #hashtags and keywords; sadly, Twitter’s move to a significantly stricter interpretation of its terms and conditions has made using Twapperkeeper itself all but impossible now. (I won’t go into the details of that discussion here – the key issue is that the public Twapperkeeper Website enabled researchers to share the datasets they’d gathered using the site, which Twitter took exception to.)

Happily, yourTwapperkeeper is a perfectly workable replacement for Twapperkeeper itself – but it requires researchers to run their own instance of the tool on their own Web servers, and should not be used for the public sharing of datasets. yourTwapperkeeper is available from the project’s Website at Google Projects; as we’ve found, however, it also requires a few additional modifications before it can be used as a straight replacement for Twapperkeeper itself. In this post, I’m outlining the changes we’ve made – and I’m including the added and revised PHP files which are required for making them.


21 June 2011

Creating Twitter Timelines from Twapperkeeper Data

This is the first in what will be an irregular series of methods posts outlining some of our approaches to working with datasets from various sources. Part of our work over the next few weeks will be to examine what happens in the Australian Twittersphere around the upcoming federal election, so I figured it would be a good idea to start with some of the basics of working with Twapperkeeper data. (Note that what I’ll outline here is a working solution, but not necessarily an elegant one – if anybody has a better suggestion, we’d love to hear it.)

Twapperkeeper is an online tool for capturing (public) tweets that contain specific #hashtags, keywords, or @usernames. The datasets it creates are delivered in a standard comma-separated value (CSV) format – including fields such as the tweet itself, the username of the poster, and a timestamp in various formats, as well as a few other bits of backend information.

One of the most immediate points of interest in working with a Twapperkeeper dataset is often to get a sense of the tweet timeline: how does the volume of tweets change over time, for example in response to events occurring in the world? The datasets provide that information – but creating an accurate visualisation of the timeline takes some doing. In this post, I’m going to work through an example, using Twapperkeeper data collected by my colleague Jean Burgess during the recent Australian Labor Party leadership spill (centred around the #spill hashtag and a few related ones).
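To give a rough idea of what building such a timeline involves, here is a minimal Python sketch that counts tweets per hour from a CSV export. It is not the method worked through in the full post, which this excerpt does not show. The column names (`text`, `from_user`, `time`), the assumption that `time` holds a Unix timestamp, and the sample rows are all illustrative guesses rather than a documented Twapperkeeper format, so adjust them to match your own export.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone


def tweets_per_hour(csv_file, time_column="time"):
    """Count tweets per UTC hour.

    Assumes (hypothetically) that `time_column` holds a Unix timestamp
    in seconds; rows with a missing or malformed value are skipped.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        try:
            stamp = int(row[time_column])
        except (KeyError, TypeError, ValueError):
            continue  # no usable timestamp in this row
        # Truncate the timestamp to the start of its hour (UTC).
        hour = datetime.fromtimestamp(stamp, tz=timezone.utc).strftime("%Y-%m-%d %H:00")
        counts[hour] += 1
    # Sorted (hour, count) pairs are ready to paste into a spreadsheet chart.
    return sorted(counts.items())


# Made-up sample rows, standing in for a real export file.
SAMPLE = """text,from_user,time
first #spill tweet,user_a,1277283600
second #spill tweet,user_b,1277285400
third #spill tweet,user_a,1277287200
"""

if __name__ == "__main__":
    for hour, count in tweets_per_hour(io.StringIO(SAMPLE)):
        print(hour, count)
```

To run this against a real file, pass an open file handle instead of the sample string, for example `open("spill.csv", newline="", encoding="utf-8")` (the filename is hypothetical). Note that hours with no tweets do not appear in the output at all, so a chart drawn straight from these counts will compress quiet periods; filling in those empty hours with zeros is one of the things an accurate timeline needs.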


22 July 2010