Archive for the ‘Methods’ Category

More Twitter Metrics: Metrify Revisited

About a month ago I introduced my new Gawk script metrify.awk, which generates a wide range of Twitter metrics for a given Twapperkeeper/yourTwapperkeeper hashtag or keyword archive. Even as I was writing those posts, though – and certainly while playing with the language metrics I discussed in my last post – I started to find a few areas where metrify could provide even more information on the dataset. So, the time has come for a first service release which upgrades metrify.awk to add some more functionality (and fix a few inconsistencies along the way). This is a revision rather than a full rewrite of the script, so let’s call it metrify 1.2; it’s now available for download here, where it replaces the older version.

As before, the new version of metrify.awk is called as follows:

gawk -F , -f metrify.awk time="[year|month|day|hour|minute]" [divisions=x,y,z,…] [skipusers=1] input.csv >metrics.csv

(divisions defaults to ‘90,99’ – i.e. a 90%/9%/1% split of the userbase – if it is not specified).
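The divisions logic itself isn't reproduced in this excerpt, so what follows is only a minimal Python sketch of one way such a split could work, not the code inside metrify.awk. It assumes that users are ranked by their number of tweets, and that each division value marks a cumulative percentage of the userbase counted from the least active end; the function name split_users is hypothetical.

```python
from collections import Counter


def split_users(tweet_authors, divisions=(90, 99)):
    """Split users into activity groups, most active group first.

    tweet_authors: one username per tweet in the archive.
    divisions: cumulative percentages of the userbase, counted from the
    least active user (90, 99 -> a 90% / 9% / 1% split).
    """
    counts = Counter(tweet_authors)
    # Most active first; users with equal counts keep first-seen order.
    ranked = [user for user, _ in counts.most_common()]
    n = len(ranked)
    # Size of the top slice above each division, rounded up in integer
    # arithmetic: 99 -> top 1% of users, 90 -> top 10%.
    cuts = [(n * (100 - d) + 99) // 100 for d in sorted(divisions, reverse=True)]
    groups, start = [], 0
    for cut in cuts + [n]:
        groups.append(ranked[start:cut])
        start = cut
    return groups


# Example: one heavy user, one moderate user, 98 one-off participants.
authors = ["alice"] * 50 + ["bob"] * 20 + [f"user{i}" for i in range(98)]
print([len(g) for g in split_users(authors)])  # [1, 9, 90]
```

Users with identical tweet counts are simply cut by rank here; the real script may treat such ties differently, so group sizes from this sketch need not match its output exactly.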

31 January 2012

Creating Basic Twitter Language Metrics

OK, this may be a somewhat esoteric subject for researchers who mainly work with Twitter data from specific countries and cultures, but over the past few weeks I’ve been working on a paper that analyses Twitter activities in the #egypt and #libya hashtags – and as part of that work, I’ve been interested in exploring the interactions between users tweeting in Arabic and users tweeting in other languages (mainly in English). Unfortunately, there’s no reliable means of identifying the language of specific tweets, or of the users who post them; while the Twitter API provides an ISO language code (e.g. ‘en’ for English, ‘no’ for Norwegian, etc.) for each tweet, this is drawn simply from the overall language setting of the user’s account, and not specific to each individual tweet itself. For users who alternate between languages in their tweeting, all tweets will be tagged with their chosen language code; for users who haven’t bothered to change their Twitter profile settings away from the default English, all their tweets will be tagged ‘en’, regardless of their actual language.
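The approach developed in the full post isn't shown in this excerpt. Purely as an illustration of one possible workaround, here is a minimal Python sketch that guesses a tweet's script (not its language) from the share of its letters falling into the basic Arabic Unicode block (U+0600 to U+06FF). The 0.5 threshold, the decision to strip @usernames, hashtags and URLs first, and the function names arabic_share and guess_script are all assumptions.

```python
import re

# @usernames, #hashtags and URLs say little about the language of the
# tweet itself, so they are removed before counting letters.
NOISE = re.compile(r"@\w+|#\w+|https?://\S+")


def arabic_share(text):
    """Fraction of letters in the text that fall in the basic Arabic block."""
    letters = [c for c in NOISE.sub(" ", text) if c.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for c in letters if "\u0600" <= c <= "\u06ff")
    return arabic / len(letters)


def guess_script(text, threshold=0.5):
    """Label a tweet 'arabic' or 'other' by its dominant script."""
    return "arabic" if arabic_share(text) >= threshold else "other"


print(guess_script("Crowds gathering in Tahrir Square #egypt"))  # other
print(guess_script("الشعب يريد إسقاط النظام #egypt"))  # arabic
```

Script is only a rough proxy: it separates Arabic-script from Latin-script tweets, but cannot tell English from French, and Persian or Urdu tweets would also be labelled 'arabic' here.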

28 January 2012

Taking Twitter Metrics to a New Level (Part 4)

Update: revision 1.2 of metrify.awk is now available (still at the link below), and introduces some further functionality, which is outlined here.

This is the final instalment of my four-part introduction to the metrify.awk script for generating detailed metrics for specific Twapperkeeper/yourTwapperkeeper hashtag archives. Over the last couple of posts, we’ve mainly dealt with overall stats for the hashtag, as well as for specific, definable percentiles of more or less active users. Finally, now, it’s time to look more closely at patterns within the overall userbase.

2 January 2012

Taking Twitter Metrics to a New Level (Part 3)

Update: revision 1.2 of metrify.awk is now available (still at the link below), and introduces some further functionality, which is outlined here.

Over the past couple of posts, I’ve introduced our new metrify.awk Twitter metrics script, and looked at the first of the three metrics tables produced by the script. Let’s move on now to the second table. Here, I’ll use a snapshot of Australian political discussion on Twitter under the #auspol hashtag between February and August 2011, instead of #qldfloods – the overall metrics for the different user percentiles in the #qldfloods dataset turn out not to be particularly interesting… As before, we’re dividing the total userbase according to the 1/9/90 rule into the 1% most active users, the next 9% of moderately active users, and the remaining 90% least active users. (In the case of #auspol, the first percentile contains 142, the second 1291, and the final percentile 12700 of a total of 14133 users.)
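As a quick sanity check on these figures: the three #auspol groups add up to the stated total, and each lands close to (though not exactly on) its nominal share. A minimal Python sketch of that arithmetic, using only the counts given above (the helper name group_shares is hypothetical):

```python
def group_shares(counts):
    """Return each group's percentage of the total userbase."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


auspol = {"top 1%": 142, "next 9%": 1291, "final 90%": 12700}
print(sum(auspol.values()))  # 14133
for name, pct in group_shares(auspol).items():
    print(f"{name}: {pct:.2f}%")
# top 1%: 1.00%
# next 9%: 9.13%
# final 90%: 89.86%
```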

2 January 2012

Taking Twitter Metrics to a New Level (Part 2)

Update: I’ve clarified/corrected some of the details relating to the percentile metrics contained in the first table which metrify.awk generates.

Update 2: revision 1.2 of metrify.awk adds further functionality in addition to what is described below. These changes are detailed here.

In the previous post, I introduced metrify.awk, our new multi-purpose tool for generating Twitter metrics. Over the next instalments in this series, I’ll take you through the results it produces. As we’re coming up to the anniversary of the January 2011 south-east Queensland floods, and as I needed to generate those metrics anyway for a report on social media in the floods which we’re publishing soon, I’ll be using an archive of #qldfloods tweets between 10 and 17 January 2011 as an example here.

I’m running metrify.awk as follows for this:

gawk -F , -f metrify.awk divisions=90,99 time=day qldfloods.csv >qldfloods-metrics.csv

In other words, we’re using a 1/9/90 division of users, and we’re tracking activities per day; the skipusers switch is not set, so full stats for all users will be generated.
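How metrify.awk assigns tweets to days isn't shown in this excerpt. As a minimal Python sketch, assuming each archived tweet carries a Unix timestamp (an assumption about the archive format here) and that buckets are counted in UTC, per-day (or per-hour, per-minute, and so on) tracking comes down to formatting each timestamp at the chosen resolution and counting. The bucket label formats and the function names time_bucket and tweets_per_bucket are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

# Bucket label formats for each resolution accepted by time=
# (the label formats themselves are an illustrative choice).
FORMATS = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H:00",
    "minute": "%Y-%m-%d %H:%M",
}


def time_bucket(timestamp, resolution="day"):
    """Map a Unix timestamp to a UTC bucket label."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime(FORMATS[resolution])


def tweets_per_bucket(timestamps, resolution="day"):
    """Count tweets per bucket; zero-padded labels sort chronologically."""
    counts = Counter(time_bucket(t, resolution) for t in timestamps)
    return dict(sorted(counts.items()))


start = 1294617600  # 2011-01-10 00:00:00 UTC
print(tweets_per_bucket([start, start + 3600, start + 86400]))
# {'2011-01-10': 2, '2011-01-11': 1}
```

Bucketing in UTC is a design choice worth checking: Brisbane is ten hours ahead of UTC, so a UTC 'day' of #qldfloods tweets straddles two local days. Shifting each timestamp by a fixed offset before formatting would give local-time days instead.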

2 January 2012

Taking Twitter Metrics to a New Level (Part 1)

So, 2011 is finally over – and what a year it’s been. While the confluence of natural disasters, political crises, and other major events has also provided us with the basis for a new research programme in crisis communication, let’s hope that 2012 is a little less intense, please…

To start the new year on a positive note, I’m finally getting around to sharing some more information about the new approach to generating Twitter metrics which we’ve developed over the past few months – this actually started during the research workshops we had with Stefan Stieglitz’s group at the University of Münster in August, so it’s taken some time to gestate into its present form. What it’s now turned into is quite a powerful tool for generating detailed information about a specific Twitter dataset – intended mainly for the study of hashtags, but with applications well beyond this as well. Amongst other things, it enables us to distinguish more effectively between different groups of participating users (from highly active lead users to much less active casual participants), and to track different types of participation, in total or by these specific groups, over time.
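This excerpt doesn't list the participation types metrify.awk tracks. As a minimal Python sketch, assuming conventions commonly used in hashtag research (a tweet containing "RT @user" counts as a retweet, one starting with "@" as an @reply, everything else as an original tweet, with URL-bearing tweets counted separately), the classification could look like this. The rules and the function names classify_tweet and participation_profile are hypothetical.

```python
import re
from collections import Counter

RETWEET = re.compile(r"\bRT @\w+")
URL = re.compile(r"https?://\S+")


def classify_tweet(text):
    """Assign a tweet to one participation type (hypothetical rules)."""
    if RETWEET.search(text):
        return "retweet"  # "RT @user" anywhere, including quoted retweets
    if text.lstrip().startswith("@"):
        return "reply"  # addressed to another user
    return "original"


def participation_profile(texts):
    """Count tweets of each type, plus tweets containing a URL."""
    texts = list(texts)  # allow generators, which we traverse twice
    profile = Counter(classify_tweet(t) for t in texts)
    profile["with_url"] = sum(1 for t in texts if URL.search(t))
    return profile


sample = ["RT @qldpolice: Road closures http://t.co/abc",
          "@friend are you OK?",
          "Water still rising in Rocklea"]
print(dict(participation_profile(sample)))
# {'retweet': 1, 'reply': 1, 'original': 1, 'with_url': 1}
```

Running participation_profile separately over the tweets of each user group (for example, the 1% / 9% / 90% groups) would give the kind of per-group comparison over time described above.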

2 January 2012

Quick Update from the Road: Twitter Research Methods

Cardiff.
Another week, another presentation: Jean, Stephen, and I have now made it to Cardiff, where we’re participating in the Future of Journalism conference. Today, we presented our paper on Twitter research methods for journalists and journalism researchers, which offers a quick overview of our main approaches to studying Twitter (and Twitter hashtags in particular). Our slides and audio from the presentation are below – the full paper is also online. For my liveblogging from the conference, check the Future of Journalism posts on snurb.info – and there’s also the #foj11 hashtag, of course.

9 September 2011

Twitter and the Royal Wedding, Pt. 2: Something New

The first part of this post examined some of the basic stats on Twitter use during the 29 April 2011 royal wedding. Here, we’ll try something a little different: in the tweets using the #royalwedding hashtag between 00:00 and 23:59 GMT that day, what other hashtags were also used?

Hashtags, of course, aren’t mutually exclusive, and are often used for emphasis (or comic relief) as much as to make a genuine contribution to an existing conversational hashtag feed – or indeed, both at the same time. So, beyond #royalwedding as the key hashtag referring to the event itself, an examination of these other hashtags provides us with some useful nuances on how Twitter users perceived and contextualised the wedding, and correlating which secondary hashtags were used by which groups of users helps us to cluster these perspectives to some extent.
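The full analysis isn't reproduced in this excerpt, but the basic counting step can be sketched in Python. This minimal version assumes tweets are available as (username, text) pairs, treats any '#' followed by word characters as a hashtag, and ignores case; the function name co_hashtags is hypothetical.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")


def co_hashtags(tweets, key="royalwedding"):
    """Count hashtags used alongside #key, and record who used them.

    tweets: iterable of (username, text) pairs.
    Returns (tweets per co-occurring hashtag, users per hashtag).
    """
    key = key.lower()
    counts = Counter()
    users = defaultdict(set)
    for user, text in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if key not in tags:
            continue  # only tweets that also carry the key hashtag
        for tag in tags - {key}:
            counts[tag] += 1  # counted once per tweet, even if repeated
            users[tag].add(user)
    return counts, users


sample = [("anna", "That dress! #RoyalWedding #kate"),
          ("ben", "#royalwedding #kate #kate #hats"),
          ("cat", "#kate is not at this wedding")]
counts, users = co_hashtags(sample)
print(counts.most_common())  # [('kate', 2), ('hats', 1)]
print(sorted(users["kate"]))  # ['anna', 'ben']
```

Keeping the set of users per hashtag, rather than just a tweet count, is what makes it possible to relate secondary hashtags back to particular groups of users.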

12 August 2011

Twitter and the Royal Wedding, Pt. 1: Something Processed

OK: I realise this may induce some cognitive dissonance in susceptible readers while those images of the London riots continue to flash across our TV screens (and we’re now also tracking some of the Twitter coverage of the riots and subsequent cleanup – more on that some other time, if anything interesting emerges). For some time, though, I’ve been meaning to post up some observations about that rather more glamorous event in recent British history: the royal wedding between Kate Middleton and Prince William on 29 April 2011.

We’re planning to explore this in detail in a paper some time down the track, so the main purpose of this blog post is to try out some approaches to analysing the event, and to test some new ways of crunching the data that I’ve played with recently – some of these ideas, in fact, resulted from our intensive research workshops with our visitors from the Universität Düsseldorf, Katrin Weller and Cornelius Puschmann, so they’re also a first outcome of the ATN-DAAD project.

12 August 2011

Twitter Research Methods

Following on from the “World According to Twitter” research workshop at QUT, today we presented our research methods at a pre-conference workshop at Communities & Technologies 2011. This was probably the most extensive presentation of our work on Twitter research to date – including a live demonstration of how to work with basic yourTwapperkeeper datasets.

Below are the two presentations I made during the day, with audio attached. Obviously, some of the audio commentary refers to the live demonstrations, which we didn’t capture – but I hope it’s useful nonetheless.

29 June 2011