Posts Tagged ‘Twapperkeeper’

Twapperkeeper and Beyond: A Reminder

Those of you who have followed our adventures in Twitter research for some time will know that we’ve relied to a significant extent on John O’Brien III’s excellent Twapperkeeper as a tool for capturing tweets. Twapperkeeper no longer exists in its original form as a stand-alone, free Web-based service, however – though some of its functionality for creating Twitter archives appears to have been subsumed into the premium, for-pay services offered by Hootsuite – and so we’ve been getting the occasional inquiry about what to do now.


09 01 2012

Twitter Research Methods

Following on from the “World According to Twitter” research workshop at QUT, today we presented our research methods at a pre-conference workshop at Communities & Technologies 2011. This was probably the most extensive presentation of our work on Twitter research to date – including a live demonstration of how to work with basic yourTwapperkeeper datasets.

Below are the two presentations I made during the day, with audio attached. Obviously, some of the audio commentary refers to the live demonstrations, which we didn’t capture – but I hope it’s useful nonetheless.


29 06 2011

Gawk Scripts for Processing Twitter Data, Vol. 1

Well, getting stuck in Melbourne for a day and being unable to participate in day one of our ATN-DAAD workshop with Cornelius Puschmann and Katrin Weller from the University of Düsseldorf has at least enabled me to put the finishing touches on something I’ve been meaning to do for some time: to collect and share the various Gawk scripts for processing Twitter data collected by Twapperkeeper or our modified yourTwapperkeeper. A ZIP file of all our (half-way decent) scripts is now available on the Tools section of our site.

These scripts enable the processing of comma- or tab-separated value files containing tweets related to specific hashtags or keywords, as Twapperkeeper used to produce them, and as yourTwapperkeeper does once you’ve installed the modified export functions which I shared in a previous post.


22 06 2011

Switching from Twapperkeeper to yourTwapperkeeper

As those of you who are regular followers of our research might have gleaned already, we started using yourTwapperkeeper a few months ago to gather our Twitter data. yourTwapperkeeper is the open source version of the software that runs Twapperkeeper.com, which was one of the best tools for gathering Twitter data on selected #hashtags and keywords; sadly, Twitter’s move to a significantly stricter interpretation of its terms and conditions has made using Twapperkeeper itself all but impossible now. (I won’t go into the details of that discussion here – the key issue is that the public Twapperkeeper Website enabled researchers to share the datasets they’d gathered using the site, which Twitter took exception to.)

Happily, yourTwapperkeeper is a perfectly workable replacement for Twapperkeeper itself – but it requires researchers to run their own instance of the tool on their own Web servers, and should not be used for the public sharing of datasets. yourTwapperkeeper is available from the project’s Website at Google Projects; as we’ve found, however, it also requires a few additional modifications before it can be used as a straight replacement for Twapperkeeper itself. In this post, I’m outlining the changes we’ve made – and I’m including the added and revised PHP files which are required for making them.


21 06 2011

Extracting images from Twapperkeeper archives

This is just a quick post to share another new script – this one takes a list of tweets with pre-resolved URLs, and filters the list for known image-hosting services. I whipped this up as part of our ongoing efforts to go deeper into the dynamics of communication at various phases of the Queensland Floods disaster – prompted in part by the observations I made on the link data, which showed a very high prevalence of user-uploaded images being posted and retweeted. Besides that, our project aims to investigate not only text-based public communication, but also the role of image- and video-sharing (as well as the communities that have emerged around these practices, particularly on the Flickr and YouTube platforms). I’m partway through drafting a substantial post taking a closer look at the role of image sharing (and communication around images) in both Twitter and Flickr during the floods, but for now here is the script and the instructions.

Please note that this script won’t work unless the urlextract.awk and urlresolve.awk scripts have been run on the archive first.


# extractimages.awk - extract tweets containing links to images
#
# This script takes a preprocessed CSV of tweets based on the Twapperkeeper format,
# looks at the longurl field, and removes any lines that do not contain a link to
# a known image hosting service.
# The urlextract.awk and urlresolve.awk scripts should be run prior to running this script.
#
# Expected data format:
# longurl,url,text,[other columns]
#
# Note: awk splits fields on whitespace by default, so set the field separator to
# match the file (for example -F , for comma-separated data) so that $1 is the
# longurl field.
#
# Released under Creative Commons (BY, NC, SA) by Jean Burgess - je.burgess@qut.edu.au and Axel Bruns - a.bruns@qut.edu.au
# Project website: http://mappingonlinepublics.net

# Pass the header row through unchanged.
BEGIN {
	getline
	print $0
}

# Keep only rows whose longurl points at a known image host.
# Add more services below as you find them.
$1 ~ /(twitpic\.com|flickr\.com|yfrog\.com|plixi\.com|instagr\.am|photobucket\.com|occip\.it|picasaweb\.google|sphotos\.ak\.fbcdn\.net|facebook\.com\/photo|imgur\.com)/ {
	print $0
}
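To see the filter in action, here is a minimal sketch that runs the same pattern over a few made-up rows. The file names and sample tweets are invented for illustration; the awk program is inlined (with the header handled via NR == 1 rather than getline) so that the example is self-contained; and -F , assumes comma-separated input.

```shell
# Made-up sample rows in the expected layout: longurl,url,text
cat > sample_tweets.csv <<'EOF'
longurl,url,text
http://twitpic.com/3abcde,http://bit.ly/aaa111,Water rising near the river #qldfloods
http://www.abc.net.au/news/,http://bit.ly/bbb222,Latest updates #qldfloods
http://www.flickr.com/photos/example/123/,http://flic.kr/p/xyz,Photos from this morning #qldfloods
EOF

# Equivalent logic to extractimages.awk, inlined for the sake of the example.
# -F , makes $1 the longurl field; use -F '\t' for tab-separated files instead.
awk -F , '
NR == 1 { print $0; next }    # pass the header row through unchanged
$1 ~ /(twitpic\.com|flickr\.com|yfrog\.com|plixi\.com|instagr\.am|photobucket\.com|occip\.it|picasaweb\.google|sphotos\.ak\.fbcdn\.net|facebook\.com\/photo|imgur\.com)/ {
    print $0                  # keep rows whose longurl is on a known image host
}
' sample_tweets.csv > sample_images.csv

cat sample_images.csv
```

The output should contain the header plus the Twitpic and Flickr rows, with the news link dropped. With the real script saved as extractimages.awk, the equivalent call would be along the lines of gawk -F , -f extractimages.awk input.csv > output.csv, where the input and output names are placeholders.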

18 02 2011

Dynamic Networks in Gephi: From Twapperkeeper to GEXF

In between last week’s ECREA conference in Hamburg, where we presented some of our methodologies and early outcomes from the Mapping Online Publics project, and the AoIR conference in Gothenburg, where we’ll talk some more about tracking and mapping interaction in online social networks, I wanted to finally follow up on Jean’s teaser post of a dynamic animation of Twitter @reply activity from a couple of weeks ago. This animation of network activity over time has become possible with the release of the latest beta version of Gephi, the open source network visualisation software, which now includes support for time-based data – and on the flight over to Europe as well as in between conferences and workshops, I’ve made some first steps towards building the tools to prepare our Twitter data for such dynamic visualisations.

First, though, I need to stress that the video which we posted a little while ago was only a very preliminary attempt; in the meantime, and with considerable and speedy support from the Gephi team (thanks, guys!), we’ve managed to improve our methods significantly. In the following, I’ll explain what our current approach looks like; a little further down the track, we’ll also post another animation of the results.


20 10 2010

Twitter’s Response to Q&A: Abbott Edition

The other day I had a look at Twitter’s response to the Australian political leaders’ appearances on ABC1’s citizen forum-style show Q&A – by looking at the #qanda hashtag. My last post focussed especially on the commentary about Julia Gillard’s performance – today, it’s Tony Abbott’s turn.

First, though: in comparing the volume of tweets across the two programmes I noted that the Twapperkeeper archive for Tony Abbott’s appearance had a number of crucial gaps – for several periods of up to ten minutes at a time, we’re simply missing tweets altogether. I’ve checked this with the good folks at Twapperkeeper, and I’m afraid the response is that there’s nothing that can be done to retrieve those tweets now – so we’ll have to make do with what we’ve got. In that light, I’ve re-done the side-by-side comparison of tweeting activity in response to both leaders, and – for illustration only – added in a ‘moving average’ trendline to extrapolate what volume we might have seen during those gaps in the Abbott tweetstream.
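How such a trendline is calculated is not spelled out here, so the following is only a sketch of one possible approach: a centred moving average over per-minute counts which skips missing minutes, and therefore also yields an estimate inside the gaps. The sample figures, the file names, and the five-minute window are all assumptions made for illustration.

```shell
# Made-up per-minute counts; an empty second field marks a gap in the archive.
cat > minute_counts.csv <<'EOF'
minute,tweets
21:30,40
21:31,50
21:32,
21:33,
21:34,60
21:35,50
EOF

# w is the window width in minutes (an assumption; pick what suits the data).
awk -F , -v w=5 '
NR == 1 { print $0 ",trend"; next }      # extend the header row
{
    n++
    minute[n] = $1
    raw[n] = $2
    known[n] = ($2 != "")                # 0 for gap minutes, 1 otherwise
}
END {
    half = int(w / 2)
    for (i = 1; i <= n; i++) {
        sum = 0; cnt = 0
        for (j = i - half; j <= i + half; j++) {
            if (j >= 1 && j <= n && known[j]) { sum += raw[j]; cnt++ }
        }
        # average over the known neighbours only; leave blank if there are none
        if (cnt > 0) printf "%s,%s,%.1f\n", minute[i], raw[i], sum / cnt
        else         printf "%s,%s,\n", minute[i], raw[i]
    }
}
' minute_counts.csv > minute_trend.csv

cat minute_trend.csv
```

Because gap minutes are ignored rather than counted as zero, the estimate inside a gap is drawn from the surrounding minutes; it is an illustration of what the volume might have been, not recovered data.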


18 08 2010

Twitter’s Response to Gillard (and Abbott) on Q&A

By popular demand, here’s part one of a first quick take on how Australia’s major political leaders fared with their appearances on the ABC’s Q&A programme, in the eyes of the (surprisingly massive) Twitter audience that Q&A manages to generate – for both of their appearances this week (Tony Abbott) and last (Julia Gillard), the #qanda hashtag became a globally trending topic.

Let’s begin with some baseline data (provided, once again, by Twapperkeeper): here’s the total number of tweets before, during, and after the screening of Q&A on ABC1, hour by hour.


17 08 2010

Top 20 election-related YouTube videos (according to Twitter)

Update: this analysis covers fewer days than I originally stated – the results should look quite different once we add in this week’s links (and next week’s!).

Here are the top 20 Australian election-related YouTube videos so far up to last Friday morning, according to the Twitterati. Or to be more precise, here are the 20 videos which have been linked to the most in tweets containing the #ausvotes hashtag posted between 17 July and 6 August, according to the Twapperkeeper archive.

Couple of interesting things to note:

  • the mismatches between the Twitter link rankings of some of these videos with the number of views they have received on YouTube;
  • the low numbers of links generally (could be a glitch with the scripts, but I’m reasonably confident it isn’t);
  • the reasonably solid performance of ‘made-for-web’ comedy videos performed and/or produced by professionals;
  • the high retweet value of ‘official’ campaign videos (in which I’d probably count GetUp!) – although it’s important to note that the tweets that go alongside the videos are frequently less than flattering…
  • and if I may add a personal note, the only mild sharpness or funniness of even the sharpest and funniest of these videos…
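The ranking itself boils down to counting how often each video is linked. The following is a rough sketch of that idea, not necessarily the script used for this analysis: it assumes a comma-separated file with the resolved URL in the first column, recognises only youtube.com/watch?v=... links, and uses invented video IDs and file names.

```shell
# Made-up rows in the longurl,url,text layout; the video IDs are invented.
cat > ausvotes_links.csv <<'EOF'
longurl,url,text
http://www.youtube.com/watch?v=AAAAAAAAAAA,http://bit.ly/v1,Watch this #ausvotes
http://www.youtube.com/watch?v=BBBBBBBBBBB&feature=share,http://bit.ly/v2,LOL #ausvotes
http://www.youtube.com/watch?v=AAAAAAAAAAA&feature=player_embedded,http://bit.ly/v3,RT Watch this #ausvotes
http://www.abc.net.au/news/,http://bit.ly/n1,Election news #ausvotes
EOF

# Count links per video ID, then list the 20 most-linked videos.
awk -F , '
NR > 1 && $1 ~ /youtube\.com\/watch/ && match($1, /[?&]v=[A-Za-z0-9_-]+/) {
    id = substr($1, RSTART + 3, RLENGTH - 3)   # drop the leading ?v= or &v=
    links[id]++
}
END { for (id in links) print id "," links[id] }
' ausvotes_links.csv | sort -t , -k 2,2nr -k 1,1 | head -n 20 > top_videos.csv

cat top_videos.csv
```

Keying the count on the video ID rather than the full URL means that variants of the same link (with extra parameters such as feature=share) are counted together, and retweets of a link count as additional links.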


12 08 2010

Using Gawk and Wget to Resolve URL Shorteners

Jean’s post today points to a key problem in examining user activities on Twitter and elsewhere – people are increasingly using bit.ly and other URL shorteners, which means that a) the same target URL might appear in any number of different shortened versions, and b) it’s no longer possible from a quick look at a list of URLs to select only those which are from a specific site (for example, YouTube videos).

For our purposes, that’s a significant problem – we might want to find out, for example, which were the most popular videos shared during the election campaign, the most popular articles on abc.net.au, and so on. So, we need to resolve those shortened URLs back to their original state. This could be done through the APIs of the various shortening services, of course, but with literally hundreds of different shorteners now available, that would probably require specific unshortening scripts for each service – far too much work. So, what can we do?


02 08 2010