Posts Tagged ‘Gephi’

Twitter Research Methods

Following on from the “World According to Twitter” research workshop at QUT, today we presented our research methods at a pre-conference workshop at Communities & Technologies 2011. This was probably the most extensive presentation of our work on Twitter research to date – including a live demonstration of how to work with basic yourTwapperkeeper datasets.

Below are the two presentations I made during the day, with audio attached. Obviously, some of the audio commentary refers to the live demonstrations, which we didn’t capture – but I hope it’s useful nonetheless.

29 June 2011

Visualising Twitter Dynamics in Gephi, Part 2

OK, so this is the second part of my post on turning Twitter data from Twapperkeeper into a dynamic network visualisation in Gephi. Last night’s post did the groundwork, generating a GEXF file from our #spill hashtag dataset (covering Twitter discussion of an Australian Labor Party leadership spill between 7 p.m. and midnight AEST on 23 June 2010). In this post, we’ll work with this data file to generate a number of dynamic visualisations of the @reply activity (including old-style ‘RT @username’ retweets) during this time.
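As a minimal sketch of restricting an archive to this five-hour window, the Python below assumes that each tweet carries a Unix timestamp in seconds (an assumption about the export format). It uses a fixed UTC+10 offset for AEST, which is adequate here because daylight saving does not apply in June. The names `WINDOW_START`, `in_window` and `seconds_into_window` are hypothetical; the offset in seconds is just one possible numeric time value for a timeline.

```python
from datetime import datetime, timedelta, timezone

# AEST is UTC+10; no daylight saving applies in June, so a fixed offset suffices.
AEST = timezone(timedelta(hours=10))

# Hypothetical analysis window: 7 p.m. to midnight AEST on 23 June 2010.
WINDOW_START = datetime(2010, 6, 23, 19, 0, tzinfo=AEST)
WINDOW_END = datetime(2010, 6, 24, 0, 0, tzinfo=AEST)


def in_window(unix_time):
    """Return True if a Unix timestamp (seconds since the epoch) falls in the window."""
    t = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return WINDOW_START <= t < WINDOW_END


def seconds_into_window(unix_time):
    """Offset from the start of the window in seconds, as a float."""
    t = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    return (t - WINDOW_START).total_seconds()
```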

Essentially, here’s the overall network of the most active participants which we ended up with last night, now with each node’s degree value (number of @replies sent + number of @replies received, from within this most active group) next to its name. (If the positions of some nodes have shifted slightly, that’s because I had to recalculate the map.) As noted at the end of part one, this overall map somewhat underestimates the weight of connections within the network, due to a limitation in how Gephi currently calculates its edge weight averages; hopefully this will be fixed soon. What I’ve done in this new version of the map, though, is to highlight a number of interesting nodes in the network whom we’ll want to follow further:
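Read literally, the degree value described here counts every @reply sent plus every @reply received, but only where both users belong to the most active group. Here is a minimal Python sketch, assuming the @replies are available as (sender, recipient) pairs. `group_degree` is a hypothetical helper, and this is not necessarily how Gephi computes degree on a graph with merged edges.

```python
from collections import Counter


def group_degree(edges, group):
    """Degree = @replies sent + @replies received, counting only @replies
    whose sender and recipient are both members of `group`."""
    degree = Counter()
    for sender, recipient in edges:
        if sender in group and recipient in group:
            degree[sender] += 1     # one @reply sent
            degree[recipient] += 1  # one @reply received
    return degree
```

For example, with `alice` and `bob` replying to each other and `alice` also replying to `carol`, `alice` scores 3. Any @replies to users outside the group are ignored.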

30 December 2010

Visualising Twitter Dynamics in Gephi, Part 1

In the following posts I’m finally keeping my promise to explore in earnest the use of Gephi’s dynamic timeline feature for visualising Twitter-based discussions as they unfolded in real time. A few months ago, Jean posted a first glimpse of our then still very experimental data on Twitter dynamics, with a string of caveats attached – and I followed up on this a little while later with some background on the Gawk scripts we’re using to generate timeline data in GEXF format from our trusty Twapperkeeper archives (note that I’ve updated one of the scripts in that post, to make the process case-insensitive). Building on those posts, here I’ll outline the entire process and show some practical results (disclaimer: actual dynamic animations will follow in part two, tomorrow – first we’re focussing on laying the groundwork).

First, a quick overview: what we’re after is a process that provides us not only with a static map of all connections (i.e., @replies – including old-style ‘RT @user’ retweets) made between a specific group of users on Twitter during a given period of time, but a dynamic visualisation of how those connections unfolded over the course of that period: how specific users assume more or less central positions in the @reply network as time unfolds; how discussion activity waxes and wanes; how particular tweets stimulate further activity in the network (for example as users reply to them or retweet them).
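One way to obtain such time-stamped connections is to scan each tweet for @usernames, treating direct @replies, mentions and old-style ‘RT @user’ retweets alike. The sketch below is a Python stand-in for that step, not the Gawk scripts mentioned above. It assumes that tweets arrive as (timestamp, sender, text) tuples and that a username is 1 to 15 letters, digits or underscores. It lower-cases names so that matching is case-insensitive, skips self-mentions, and ignores an ‘@’ inside e-mail addresses; all three choices are assumptions.

```python
import re

# Assumed username pattern: 1-15 letters, digits or underscores, not preceded
# by another word character (so 'x@example.com' does not count as a mention).
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")


def reply_edges(tweets):
    """Yield (timestamp, source, target) for every @username in each tweet."""
    for timestamp, sender, text in tweets:
        source = sender.lower()
        for name in MENTION.findall(text):
            target = name.lower()
            if target != source:  # skip users mentioning themselves
                yield timestamp, source, target
```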

30 December 2010

Dynamic Networks in Gephi: From Twapperkeeper to GEXF

In between last week’s ECREA conference in Hamburg, where we presented some of our methodologies and early outcomes from the Mapping Online Publics project, and the AoIR conference in Gothenburg, where we’ll talk some more about tracking and mapping interaction in online social networks, I wanted to finally follow up on Jean’s teaser post of a dynamic animation of Twitter @reply activity from a couple of weeks ago. This animation of network activity over time has become possible with the release of the latest beta version of Gephi, the open source network visualisation software, which now includes support for time-based data – and on the flight over to Europe as well as in between conferences and workshops, I’ve made some first steps towards building the tools to prepare our Twitter data for such dynamic visualisations.
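To illustrate what time-based data in GEXF can look like, here is a minimal Python sketch that writes a dynamic graph in the GEXF 1.2 draft format. Each edge gets `start` and `end` attributes, and each node gets a `start` attribute. It is a stand-in for the actual Gawk tools and rests on several assumptions. It expects (time, source, target) tuples with numeric times and gives every @reply its own edge. It also keeps each edge visible for a hypothetical `edge_lifetime` of 600 seconds, so that activity fades from the animation.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def dynamic_gexf(edges, edge_lifetime=600.0):
    """Build a dynamic GEXF document (as a string) from (time, source, target) tuples.

    Each @reply becomes its own directed edge, visible from its timestamp for
    `edge_lifetime` time units; each node appears from its first involvement.
    """
    root = ET.Element("gexf", xmlns=GEXF_NS, version="1.2")
    graph = ET.SubElement(root, "graph", mode="dynamic",
                          defaultedgetype="directed", timeformat="double")
    nodes_el = ET.SubElement(graph, "nodes")  # created first so it precedes <edges>
    edges_el = ET.SubElement(graph, "edges")

    first_seen = {}
    for i, (t, source, target) in enumerate(edges):
        for user in (source, target):
            first_seen[user] = min(t, first_seen.get(user, t))
        ET.SubElement(edges_el, "edge", id=str(i), source=source, target=target,
                      start=str(float(t)), end=str(float(t) + edge_lifetime))

    for user, t in sorted(first_seen.items()):
        ET.SubElement(nodes_el, "node", id=user, label=user, start=str(float(t)))

    return ET.tostring(root, encoding="unicode")
```

The returned string can be saved to a `.gexf` file for Gephi. How long an edge should persist is an analytical judgement, and the default here is only a placeholder.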

First, though, I need to stress that the video which we posted a little while ago was only a very preliminary attempt; in the meantime, and with considerable and speedy support from the Gephi team (thanks, guys!), we’ve managed to improve our methods significantly. In the following, I’ll explain what our current approach looks like; a little further down the track, we’ll also post another animation of the results.

20 October 2010

Fun with Gephi’s new dynamic visualisation feature

This is a quick demo of how the new timeline feature works in Gephi 0.7 beta. We’ve used five hours’ worth of @reply data from the Twapperkeeper archives for the #spill hashtag. This period corresponds to the ‘acute event’ in Australian politics that kicked off the election that sidetracked our research (in all kinds of productive ways, of course) – the day (the evening, and then the next morning) when now-PM Julia Gillard overthrew then-PM Kevin Rudd. Please don’t read too much (or indeed anything) into the actual analysis here, but for the sake of completeness: I’ve indicated betweenness centrality with both colour (red at the high end, yellow at the low end) and size.
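Gephi computes betweenness centrality itself. To show what the measure captures (how often a node lies on shortest paths between other nodes), here is a compact Python version of Brandes' algorithm for an unweighted directed graph. The adjacency format and the `betweenness` name are assumptions, and the scores are unnormalised. The numbers are not guaranteed to match Gephi's output exactly.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalised betweenness centrality (Brandes' algorithm) for an
    unweighted directed graph given as {node: [successor, ...]}.
    Successor lists should not contain duplicates."""
    nodes = set(adjacency) | {w for succ in adjacency.values() for w in succ}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score
```

In a chain `a → b → c`, only `b` sits between other users, so it alone receives a non-zero score.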

The possibilities here are very interesting, particularly if we use better quality data that is properly set up for longitudinal analysis – e.g. so the nodes scale up and down properly through time. I’m pretty sure Axel has one of his epic and highly detailed methods posts up his sleeve in relation to all this, but for now, enjoy the pretty moving pictures – and apologies for the jerky cursor movements – I’m on the road and so without a mouse.

If you’re interested in any of the detail it is probably best viewed at the YouTube website in HD and fullscreen:

6 October 2010

Mapping the Australian Blogosphere Some More

My previous post outlined a few more steps I’ve taken in cleaning up our emerging dataset of links in the Australian blogosphere (current limitations of our data are also listed there). It’s time to take those cleaner data for a spin, then. Beyond mapping the interlinkages between our known blogs during the period of 17 July to 27 August 2010 (roughly coinciding with the Australian federal election campaign), as I did a couple of posts ago, I’ll now work off the cleaned dataset which contains only those links which:

  • originate from those sites in our list which we have confirmed to be (independent or professional) Australian blogs; and
  • point to sites which are more than merely functional (i.e. sites which aren’t on the destination filter list at the bottom of my previous post).
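The two criteria in the list above amount to a simple filter over (source, destination) link pairs. Here is a minimal Python sketch, assuming that hypothetical sets `confirmed_blogs` and `destination_filter` hold the confirmed blog list and the destination filter list. A second hypothetical helper applies an incoming-link threshold of the kind used later in this post, although in practice that step happens inside Gephi.

```python
from collections import Counter


def filter_links(links, confirmed_blogs, destination_filter):
    """Keep links that originate from a confirmed Australian blog and do not
    point to a site on the destination filter list."""
    return [(src, dst) for src, dst in links
            if src in confirmed_blogs and dst not in destination_filter]


def sites_with_min_inlinks(links, min_inlinks=10):
    """Return the set of sites receiving at least `min_inlinks` incoming links."""
    inlinks = Counter(dst for _, dst in links)
    return {site for site, count in inlinks.items() if count >= min_inlinks}
```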

What I’m especially interested in as I work with these network data is:

  1. Which non-blog sites appear prominently in the network, and in what contexts; and
  2. which blog sites appear to serve as connectors between the various components of the overall network.

So, feeding the network data (close to 3.4 million links) into Gephi and filtering out any sites which don’t receive at least ten incoming links from anywhere in the network, here’s what we get (PDF here):

22 September 2010

More Blog Network Data Cleaning with Gawk

The other day I outlined some first steps in cleaning our blog network data (generated by our partner researchers at Sociomantic Labs) ahead of visualising it, and posted a first tentative visualisation of the part of the Australian blogosphere that we’re currently tracking. In this post I’ll continue that discussion, describing a few more steps in processing the data (again using Gawk).

Just to reiterate briefly the current limitations of our dataset:

  • We’re tracking some 8,500 feeds at the moment, some of which are mainstream news sites or other sites with RSS feeds – so we’re only covering a part of the overall Australian blogosphere at this point.
  • We’re still improving our approaches to extracting post texts and links from the blog pages – right now, our data still include text and links which are not in the posts themselves, but elsewhere on the page.

But even so, we can already begin to test our methods. In the previous post, we developed a Gawk script that truncated link destinations to their most meaningful component, in order to make network visualisation possible: if the link destination matched the base URL of one of the sites we’re following (e.g. domain.com.au/blog/), we used that URL instead of the full link URL (e.g. domain.com.au/blog/post-title.html); if the link destination was unknown, we truncated it to the domain only (e.g. domain.com.au). To improve the readability of the resulting network graph, we also dropped ‘http://’ and ‘www.’.
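The truncation rules just described translate fairly directly into code. Here is a minimal Python sketch, not the original Gawk script. It assumes that tracked base URLs are stored in already-normalised form with a trailing slash (e.g. `domain.com.au/blog/`). It strips `https://` as well as `http://`, which goes beyond the rules above. It also prefers the longest matching base, so that a blog hosted under another tracked site keeps its own node. Both function names are hypothetical.

```python
import re


def normalise(url):
    """Drop a leading 'http://' or 'https://' and then a leading 'www.'."""
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
            break
    if url.startswith("www."):
        url = url[len("www."):]
    return url


def truncate_destination(url, tracked_bases):
    """Map a link URL to a tracked site's base URL if it falls under one,
    otherwise truncate it to its bare domain."""
    url = normalise(url)
    # Longest base first, so nested tracked blogs win over their host site.
    for base in sorted(tracked_bases, key=len, reverse=True):
        if url.startswith(base) or url + "/" == base:
            return base
    # Unknown destination: keep only the domain (cut at the first / ? or #).
    return re.split(r"[/?#]", url, maxsplit=1)[0]
```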

The first outcomes of this process were the network maps I published in my last post, which further filtered the network to include only those sites which we’re actively tracking (including a number of mainstream media sites). But clearly that’s only one part of the picture – we’re just as interested in the extent to which the blogs we’re tracking are linking to other sites, from mainstream media in Australia and elsewhere through YouTube, Flickr, Facebook, and other social media sites, to any other sites which may be relevant to all Australian bloggers or any specific clusters in the blogosphere. To get there, we’ll have to massage the data a little further.

22 September 2010

First Steps in Mapping the Australian Blogosphere

Following on from my previous post about the methods we’re starting to use to make sense of the Australian blogosphere data we’re receiving from our colleagues at Sociomantic Labs, here’s a first look at what happens when we begin to visualise those data in the open source network visualisation software Gephi. Let me begin by making one thing very clear, though: this is based on as yet incomplete data, and should not be seen to say anything comprehensive about the shape of the Australian blogosphere. What we’re currently working with is:

  • a highly incomplete list of Australian blogs that is biased towards those genres of blogging that we already know quite a bit about, and
  • hyperlink data that hasn’t yet been cleaned up to contain only those links present in the blog posts themselves, rather than links elsewhere on the page.

So, as we’ve explained in our previous work, we can expect plenty of false positives (e.g. sites like WordPress.org which appear to be central to the blog network, but are so only because many blogs run on and link to WordPress – not because their posts actually talk about WordPress-related topics), and a network structure which overrepresents those sectors of the overall Australian blogosphere where we already know and track a majority of existing blogs (e.g. Australian politics, which we’ve studied in detail over the past few years).

With those caveats in mind, though, in this post I’ll work through the data as they are at the moment, largely to test our methods as we’ve established them and to see what insights can emerge from this process. I’m drawing here on a slice of hyperlink data from the nearly 8,300 blogs that we follow (also including a number of mainstream news sites which have RSS feeds – these will be sorted into a separate category at a later stage), collected between 17 July and 27 August 2010 – i.e. roughly coinciding with the Australian federal election campaign between 17 July and 21 August. (Given this heightened activity, we should therefore expect an overrepresentation of political blogs, even beyond the skew towards politics in our overall list of blogs.)

20 September 2010

Cleaning Up Blog Network Data with Gawk

Having done a fair amount of work with Twitter data over the past couple of months, I’m keen to get back now to the other substantive part of our ARC Discovery project on mapping online public communication in Australia during this first year of the project: examining patterns of interaction within and across the Australian blogosphere.

This post will start off that process by exploring some of the methodological issues, and asking Gawk nerds for some help in refining our methods along the way. What we’re building on with the blog mapping is our previous work with our fantastic colleagues from Sociomantic Labs in Berlin, who are also doing the data gathering for this new slice of research. We’ve outlined the basic approach of our blog mapping in some detail elsewhere already (also see the Publications section of this blog), but here’s a very quick summary:

20 September 2010

Twitter @reply Networks on #ausvotes

This post comes as something of a postscript to my four-part series about the key themes of discussion under the #ausvotes hashtag on Twitter during the recent Australian election campaign (17 July to 21 August 2010 – see posts #1, #2, #3, and #4). In addition to looking at the content of those tweets, I also wanted to examine the networks of conversation which took place during that time. This again builds on our trusty Twapperkeeper #ausvotes archives, covering 17 July to 24 August.

Those networks are created, of course, by Twitter users including @replies in their tweets (e.g. ‘@snurb_dot_info’ to get my attention). I need to point out two major limitations of looking at @replies in this way, though: first, not all @reply conversations will necessarily continue to include the #ausvotes hashtag in further tweets – one way of describing this is to say that where #ausvotes is missing from follow-up tweets, the users @replying to one another have stepped away from the crowd and begun a more private conversation (though still in a public space, unless they move to direct messaging). What I’m analysing in the following, by contrast, are only public conversations where the #ausvotes hashtag was retained – i.e. where users were talking to (or at) one another, but did so still with the wider #ausvotes audience in mind; we might understand this as a deliberately publicly performed conversation.
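Because a hashtag archive only contains tweets that include the hashtag, restricting the analysis to conversations that retained #ausvotes needs no extra filtering step. What remains is to collapse repeated @replies between the same two users into weighted edges. Here is a minimal Python sketch, assuming that the @replies have already been extracted as (sender, recipient) pairs. The Source, Target and Weight column names are ones Gephi's spreadsheet import generally recognises, and `weighted_edge_csv` is hypothetical.

```python
import csv
import io
from collections import Counter


def weighted_edge_csv(edges):
    """Collapse repeated (sender, recipient) @replies into weighted edges and
    return them as CSV text with Source, Target and Weight columns."""
    weights = Counter(edges)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return out.getvalue()
```

Keeping direction (A to B and B to A as separate rows) preserves who addressed whom, which matters when distinguishing users who are talking *to* others from those who are merely talked *at*.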

10 September 2010