Twitter Concept Mapping with Wordstat and Gephi: First Steps
This is a fairly tentative post in my series on methods for doing quantitative research using Twitter data. I’m currently looking into ways to examine the terms and concepts used by tweeters as they discuss specific issues. We’ve done similar work on the content of blog-based debates in the past, using the (commercial) concept mapping software Leximancer, but I’ve never been fully satisfied with the information Leximancer generates, and especially with its data visualisation functionality, so it’s time to look at the alternatives.
Ideally, I’d like to leave the visualisation aspects to the open source software Gephi, which I’ve already used for some useful network visualisations (more on that in another post). What I’m really after, then, is software that produces word and concept co-occurrence data for my source texts (in this case, a database of tweets on a specific subject) and exports it in a format that Gephi can understand (e.g. UCINET or Pajek, or even Gephi’s own network data format). At the ICA conference in Singapore last month, I came across a (commercial, sadly) quantitative text analysis package called WordStat, part of a larger software suite from Provalis Research that includes various other statistical tools which are less relevant for me here, so that’s where I’ll start.
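To make "co-occurrence data in a format Gephi can understand" a little more concrete, here is a minimal Python sketch of the general idea. It is not how WordStat or Leximancer work internally. It assumes that two words co-occur whenever they appear in the same tweet, it uses a deliberately crude tokeniser with no stop-word list or stemming, and the function names `cooccurrence` and `to_pajek` are made up for this illustration. The output is a weighted, undirected network in Pajek's plain-text layout.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence(tweets, min_len=3):
    """Count how often each pair of words appears in the same tweet.

    Assumption: the co-occurrence window is the whole tweet.
    """
    pairs = Counter()
    for text in tweets:
        # Crude tokeniser: lowercase runs of letters, short words dropped,
        # each word counted at most once per tweet.
        words = {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_len}
        # Sorting makes every pair appear in one canonical order.
        pairs.update(combinations(sorted(words), 2))
    return pairs


def to_pajek(pairs):
    """Render the pair counts as a Pajek-style network with weighted edges."""
    vocab = sorted({word for pair in pairs for word in pair})
    index = {word: i for i, word in enumerate(vocab, start=1)}  # Pajek ids start at 1
    lines = [f"*Vertices {len(vocab)}"]
    lines += [f'{i} "{word}"' for word, i in index.items()]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]} {n}" for (a, b), n in sorted(pairs.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Invented sample tweets, purely for illustration.
    sample = [
        "Election debate tonight",
        "Debate on election policy",
        "Policy debate",
    ]
    print(to_pajek(cooccurrence(sample)))
```

Saving the printed text to a `.net` file should give something Gephi can open, with the edge weights standing in for co-occurrence counts. A real analysis would need at least a stop-word list and a minimum-frequency threshold to keep the network readable.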