<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mapping Online Publics</title>
	<atom:link href="http://www.mappingonlinepublics.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mappingonlinepublics.net</link>
	<description></description>
	<lastBuildDate>Tue, 31 Jan 2012 08:20:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>More Twitter Metrics: Metrify Revisited</title>
		<link>http://www.mappingonlinepublics.net/2012/01/31/more-twitter-metrics-metrify-revisited/</link>
		<comments>http://www.mappingonlinepublics.net/2012/01/31/more-twitter-metrics-metrify-revisited/#comments</comments>
		<pubDate>Tue, 31 Jan 2012 08:04:18 +0000</pubDate>
		<dc:creator>Snurb</dc:creator>
				<category><![CDATA[Methods]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[Gawk]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.mappingonlinepublics.net/?p=1063</guid>
		<description><![CDATA[About a month ago I introduced my new Gawk script metrify.awk, which generates a wide range of Twitter metrics for a given Twapperkeeper/yourTwapperkeeper hashtag or keyword archive. Even as I was writing those posts, though – and certainly while playing with the language metrics I discussed in my last post –, I started to find [...]]]></description>
			<content:encoded><![CDATA[<p>About a month ago <a href="http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-1/">I introduced my new Gawk script metrify.awk</a>, which generates a wide range of <em>Twitter</em> metrics for a given <em>Twapperkeeper</em>/<em>yourTwapperkeeper</em> hashtag or keyword archive. Even as I was writing those posts, though – and certainly while playing with the language metrics I discussed <a href="http://www.mappingonlinepublics.net/2012/01/28/creating-basic-twitter-language-metrics/">in my last post</a> –, I started to find a few areas where metrify could provide even more information on the dataset. So, the time has come for a first service release which upgrades metrify.awk to add some more functionality (and fix a few inconsistencies along the way). This is a revision rather than a full rewrite of the script, so let’s call it metrify 1.2; <a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/metrify.zip">it’s now available for download here</a>, where it replaces the older version.</p>
<p>As before, the new version of metrify.awk is called as follows:</p>
<blockquote><p>gawk -F , -f metrify.awk time="[year|month|day|hour|minute]" [divisions=x,y,z,…] [skipusers=1] input.csv &gt;metrics.csv</p>
</blockquote>
<p>(divisions defaults to ‘90,99’ – i.e. a 90%/9%/1% split of the userbase – if it is not specified).</p>
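<p>This post doesn't spell out how the ‘divisions’ percentages become tweet-count thresholds. The following Python sketch is one plausible reading, not metrify's actual code. It ranks users by tweet count, and for each division d it takes the tweet count of the user at the d-th percentile position as the cut-off u. Users with fewer than u tweets fall into the least active d%, and users with u or more tweets (i.e. more than u-1) qualify for the ‘&gt; d%’ group. The function name percentile_thresholds is hypothetical.</p>

```python
def percentile_thresholds(tweet_counts, divisions=(90, 99)):
    """Hypothetical reconstruction: map each division d to the minimum number
    of tweets a user needs in order to qualify for the '> d%' group."""
    ranked = sorted(tweet_counts)      # least active user first
    n = len(ranked)
    thresholds = {}
    for d in divisions:
        # the user at the d-th percentile position sets the cut-off u;
        # clamp the index so that d = 100 still points at the last user
        index = min(n * d // 100, n - 1)
        thresholds[d] = ranked[index]
    return thresholds


# 'divisions=90,99' on the command line, parsed into integers
divisions = [int(d) for d in "90,99".split(",")]
# 90 users with 1 tweet, 9 users with 5 tweets, 1 user with 50 tweets
counts = [1] * 90 + [5] * 9 + [50]
print(percentile_thresholds(counts, divisions))  # {90: 5, 99: 50}
```

<p>In this toy example, users with fewer than 5 tweets form the least active 90%. Users with 5 to 49 tweets form the ‘&gt; 90%’ group, and the single user with 50 tweets forms the ‘&gt; 99%’ group. With many tied tweet counts, real groups will drift away from an exact 90/9/1 split, so the actual group sizes are worth checking.</p>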
<p><span id="more-1063"></span>
<p>In this post, I won’t go from scratch through the entire range of metrics that metrify.awk generates; <a href="http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-1/">my original four-part post</a> is still sufficient for that purpose. Rather, I’ll focus only on the major changes in this new revision, which relate mainly to <a href="http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-2/">part two</a> of that series (and I’ve noted the updates in those posts as well, to avoid confusion): the metrics over time.</p>
<h2>Changes to Metrics over Time</h2>
<p>The first table generated by metrify shows the metrics over time, broken down by the chosen time unit (e.g. day or hour), and it now contains a number of additional data points. The changes concern only the columns which contain metrics for the various user percentiles defined with the ‘divisions’ argument. Metrify 1.0 reported only how many users from each percentile were actively participating during each timeframe (expressed as a percentage of the total number of currently active users). Revision 1.2 provides a number of further metrics:</p>
<ul>
<li>the number of users from each percentile who are currently active, and what percentage of the total currently active userbase that number represents;</li>
<li>the number of tweets posted during the timeframe by users in each percentile, and what percentage of the total current volume of tweets that number represents.</li>
</ul>
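<p>The four per-percentile columns can be outlined in a short Python sketch. It is not metrify's Gawk code. It assumes each tweet has already been reduced to a (timeframe, username) pair, and that every user has already been assigned to exactly one percentile group. The function metrics_over_time and its input format are hypothetical. Groups with no active users in a timeframe are simply omitted from that timeframe's result.</p>

```python
from collections import defaultdict


def metrics_over_time(tweets, group_of):
    """tweets: iterable of (timeframe, username) pairs, one pair per tweet.
    group_of: dict mapping each username to its percentile group label.
    Returns {timeframe: {group: (users, % of users, tweets, % of tweets)}}."""
    active = defaultdict(lambda: defaultdict(set))   # timeframe -> group -> users
    volume = defaultdict(lambda: defaultdict(int))   # timeframe -> group -> tweets
    for timeframe, user in tweets:
        group = group_of[user]
        active[timeframe][group].add(user)
        volume[timeframe][group] += 1

    result = {}
    for timeframe, groups in active.items():
        # every user belongs to exactly one group, so these are the period totals
        total_users = sum(len(users) for users in groups.values())
        total_tweets = sum(volume[timeframe].values())
        result[timeframe] = {
            group: (
                len(users),
                100 * len(users) / total_users,
                volume[timeframe][group],
                100 * volume[timeframe][group] / total_tweets,
            )
            for group, users in groups.items()
        }
    return result


tweets = [("2011-02-27", "a"), ("2011-02-27", "a"), ("2011-02-27", "a"),
          ("2011-02-27", "b")]
group_of = {"a": "> 99%", "b": "> 0%"}
print(metrics_over_time(tweets, group_of))
```

<p>In this toy example, the single ‘&gt; 99%’ user accounts for half of the active users but three quarters of the tweets. That gap between the two percentages is exactly what the new tweet columns make visible.</p>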
<p>Here’s a comparison of the relevant output columns between versions 1.0 and 1.2:</p>
<table cellspacing="0" cellpadding="0" width="100%" border="0">
<tbody>
<tr>
<td width="50%"><strong>metrify.awk 1.0</strong></td>
<td width="50%"><strong>metrify.awk 1.2</strong></td>
</tr>
<tr>
<td wwwit="362">&#160;</td>
<td wwwit="113">number of current users from least active x% (&lt; u tweets)</td>
</tr>
<tr>
<td wwwit="362">lowest x% users (&lt;= u tweets)</td>
<td wwwit="113">% of current users from least active x% (&lt; u tweets)</td>
</tr>
<tr>
<td wwwit="362">&#160;</td>
<td wwwit="113">number of tweets from least active x% (&lt; u tweets)</td>
</tr>
<tr>
<td wwwit="362">&#160;</td>
<td wwwit="113">% of tweets from least active x% (&lt; u tweets)</td>
</tr>
<tr>
<td wwwit="362">&#160;</td>
<td wwwit="113">&#160;</td>
</tr>
<tr>
<td wwwit="362">&#160;</td>
<td wwwit="113">number of current users from &gt; x% group (&gt; u-1 tweets; a of n users)</td>
</tr>
<tr>
<td wwwit="362">users &gt; x% (&gt; u tweets; a of n users)</td>
<td wwwit="113">% of current users from &gt; x% group (&gt; u-1 tweets; a of n users)</td>
</tr>
<tr>
<td wwwit="362">&#160;</td>
<td wwwit="113">tweets from &gt; x% group (&gt; u-1 tweets; a of n users)</td>
</tr>
<tr>
<td wwwit="362">&#160;</td>
<td wwwit="113">% of tweets from &gt; x% group (&gt; u-1 tweets; a of n users)</td>
</tr>
<tr>
<td wwwit="362">&#160;</td>
<td wwwit="113">&#160;</td>
</tr>
<tr>
<td wwwit="362">&#160;</td>
<td wwwit="113">number of current users from &gt; y% group (&gt; v tweets; b of n users)</td>
</tr>
<tr>
<td wwwit="362">users &gt; y% (&gt; v tweets; b of n users)</td>
<td wwwit="113">% of current users from &gt; y% group (&gt; v tweets; b of n users)</td>
</tr>
<tr>
<td wwwit="362">&#160;</td>
<td wwwit="113">tweets from &gt; y% group (&gt; v tweets; b of n users)</td>
</tr>
<tr>
<td wwwit="362">&#160;</td>
<td wwwit="113">% of tweets from &gt; y% group (&gt; v tweets; b of n users)</td>
</tr>
</tbody>
</table>
<p>(With the default settings, x% would be 90% and y% would be 99%; the tweet-count thresholds u and v, the group sizes a and b, and the total number of users n depend on the dataset.)</p>
<p>So it is now possible to track not only what percentage of all currently active users come from each of the percentiles we have defined, but also what percentage of the total volume of tweets during each period each user percentile contributes. By way of example, here’s a comparison of those metrics for the #egypt dataset during February 2011:</p>
<p><a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image8.png"><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" alt="image" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb11.png" width="1028" border="0" /></a><br />Active users in the 90/9/1 user percentiles as a percentage of the total active userbase</p>
<p><a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image9.png"><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" alt="image" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb12.png" width="1028" border="0" /></a><br />Tweets by users in the 90/9/1 user percentiles as a percentage of the total current tweet volume</p>
</p>
<p>Unsurprisingly, the two charts move together – the greater the presence of a specific user group in the total active userbase, the greater their contribution to the current tweet volume – but only the second chart shows just how dominant the most active one per cent of users really are. During the final days of February, they still make up just under 20% of the participating userbase – but more than half of all tweets posted at that time originate from them.</p>
<p>(At a later stage, I may also add functionality to track the use of different tweet types over time, by the different percentiles – but that’s a feature for metrify 1.5 or so.)</p>
<h2>Other Changes</h2>
<p>The only other notable change in this new revision is that the third table generated by metrify.awk, which describes the participating users themselves, has gained a further column, ‘percentile’. This contains a simple descriptor of the percentile a user has been placed in, which makes it easier to filter the list (for example, using Excel’s data filter functions). For the standard 90/9/1 division of the userbase, each user’s entry in this column will contain one of the following four values:</p>
<ul>
<li>&gt; 99% – user belongs to the top 1% of most active users </li>
<li>&gt; 90% – user belongs to the top 10% of most active users, but is outside the top 1% </li>
<li>&gt; 0% – user belongs to the 90% of least active users </li>
<li>none – user appears only in @reply or retweet mentions by others, but does not actively contribute to the hashtag </li>
</ul>
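<p>A Python sketch of how such a label could be derived follows. It assumes a user's group is found by comparing their tweet count against a minimum tweet count for each division. The thresholds dict and the percentile_label function are hypothetical illustrations, not metrify's code. A user with zero tweets has only been mentioned by others and is labelled ‘none’.</p>

```python
def percentile_label(tweet_count, thresholds):
    """Return the label of the highest '> d%' group whose minimum tweet count
    the user reaches. thresholds: {division: minimum tweets}, e.g. {90: 5, 99: 50}."""
    if tweet_count == 0:
        return "none"              # only mentioned by others, never tweeted
    for d in sorted(thresholds, reverse=True):
        if tweet_count >= thresholds[d]:
            return f"> {d}%"
    return "> 0%"                  # the least active group


thresholds = {90: 5, 99: 50}
for count in (60, 7, 1, 0):
    print(count, percentile_label(count, thresholds))
```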
<p>Additionally, and less obviously, I’ve also rewired how users are tracked through the dataset. In principle, this should be a very simple process: each user has both a unique numerical <em>Twitter</em> user ID and a unique alphanumeric username. However, for some esoteric reason the user IDs returned by the <em>Twitter</em> search and streaming APIs, which <em>Twapperkeeper</em> uses to retrieve its datasets, do not always match, especially for older archives (or perhaps for older accounts?); <a href="http://code.google.com/p/twitter-api/issues/detail?id=214">the same user may have two completely different user IDs</a> (thanks to <a href="http://twitter.com/jobrieniii">John O’Brien</a> for the details on this). This makes the user IDs an unreliable means of tracking user activities in the dataset. Usernames, for their part, can be changed by the user at any point – @KRuddMP could become @KRuddPM when you least expect it. (Sorry, couldn’t resist!)</p>
<p>Still, as this doesn’t happen all too often, and given the unreliability of the numerical user IDs, metrify now uses (lowercase) usernames as its internal tracking ID. Wherever possible, the final output shows each username in the capitalisation first encountered in the user’s own tweets. Users may also change that capitalisation at a later date, though, and we’re not checking for that. For users who are only mentioned by others but don’t tweet themselves, we use the capitalisation first encountered in those mentions.</p>
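<p>The tracking rule described above can be sketched in Python as follows. The UserRegistry class and its method names are hypothetical. The sketch only illustrates the logic: a lowercase key, a display form taken from the user's own tweets where available, and a fallback to the first form seen in mentions.</p>

```python
class UserRegistry:
    """Track users by lowercase username while remembering a display form.
    A capitalisation seen in the user's own tweets wins over one seen in
    mentions; within each source, the first form encountered is kept."""

    def __init__(self):
        self.own = {}        # key -> capitalisation from the user's own tweets
        self.mentioned = {}  # key -> capitalisation from @replies/retweets

    def saw_author(self, username):
        self.own.setdefault(username.lower(), username)

    def saw_mention(self, username):
        self.mentioned.setdefault(username.lower(), username)

    def display_name(self, username):
        key = username.lower()
        # None if the user has never been seen at all
        return self.own.get(key) or self.mentioned.get(key)


reg = UserRegistry()
reg.saw_mention("kruddmp")     # mentioned first, in lower case
reg.saw_author("KRuddMP")      # then tweets with this capitalisation
print(reg.display_name("KRUDDMP"))  # KRuddMP
```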
<p>Finally, one <em>caveat</em> remains: as before, metrify will take quite some time to process a large dataset, and is likely to run out of memory when generating full user metrics for such datasets. (There doesn’t seem to be any way to allocate more memory to Gawk – or to the shell it runs in – so there’s little I can do to fix this.) Where full, detailed per-user metrics aren’t required, use the skipusers=1 command-line argument. Gawk will then output only the number of tweets contributed by each user, and the percentile they’ve been allocated to on that basis. It will also take a lot less time to do so.</p>
<p>So much, then, for this service update of metrify.awk. In a follow-up post in a few days, I’ll show how metrify metrics can also be imported into Gephi to turbo-charge our network visualisations of <em>Twitter</em> @reply and retweet networks…     </p>
]]></content:encoded>
			<wfw:commentRss>http://www.mappingonlinepublics.net/2012/01/31/more-twitter-metrics-metrify-revisited/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Call for Papers: Emerging Methods for Digital Media Research</title>
		<link>http://www.mappingonlinepublics.net/2012/01/30/call-for-papers-emerging-methods-for-digital-media-research/</link>
		<comments>http://www.mappingonlinepublics.net/2012/01/30/call-for-papers-emerging-methods-for-digital-media-research/#comments</comments>
		<pubDate>Mon, 30 Jan 2012 10:40:10 +0000</pubDate>
		<dc:creator>Jean Burgess</dc:creator>
				<category><![CDATA[Announcements]]></category>

		<guid isPermaLink="false">http://www.mappingonlinepublics.net/?p=1051</guid>
		<description><![CDATA[Another brief announcement: along with our CCI colleague Larissa Hjorth, Axel and I are looking forward to editing a special issue of the Journal of Broadcasting &#038; Electronic Media (JOBEM) on the theme &#8220;Emerging Methods for Digital Media Research&#8221;, due for publication in March 2013. If you work in a related area, please consider submitting [...]]]></description>
			<content:encoded><![CDATA[<p>Another brief announcement: along with our <a href="http://cci.edu.au">CCI</a> colleague Larissa Hjorth, Axel and I are looking forward to editing a special issue of the <a href="http://www.tandf.co.uk/journals/HBEM">Journal of Broadcasting &#038; Electronic Media</a> (JOBEM) on the theme &#8220;Emerging Methods for Digital Media Research&#8221;, due for publication in March 2013. If you work in a related area, please consider submitting an abstract by the March deadline. Details follow below.<br />
<span id="more-1051"></span></p>
<blockquote><p><strong>Emerging Methods for Digital Media Research</strong><br />
Special Themed Issue of the Journal of Broadcasting &#038; Electronic Media (JOBEM), March 2013.</p>
<p>Guest Editors:<br />
Jean Burgess (QUT)<br />
Axel Bruns (QUT)<br />
Larissa Hjorth (RMIT)<br />
ARC Centre of Excellence for Creative Industries &#038; Innovation (http://cci.edu.au/)</p>
<p>Editor: Zizi Papacharissi</p>
<p>With the rise of ‘big data’, locative media, and smartphones, existing media and communication studies methods are being recombined, reconfigured and replaced alongside their objects of study. This special issue of JOBEM seeks to expose new research methods for understanding the changing nature of the content industries, the impact of digital media on the practices of creative workers, and the experiences and practices of everyday users of digital media technologies.</p>
<p>We welcome papers based in the humanities and social sciences that reflect on, discuss or critique current methodological trends in digital media research, shedding light on the following questions:<br />
1. Where are the emerging methodological <strong>gaps</strong> &#8211; are there pressing research problems that require the development of new methods, techniques and tools?<br />
2. Where are there needs for new <strong>combinations</strong> of methods, within or across disciplines?<br />
3. What are the implications for future <strong>pedagogical models</strong> in internet, media and communication studies, including doctoral education and other forms of research training?</p>
<p>We especially welcome papers grounded in the experience of conducting empirical digital media research. However, we will give preference to papers that contextualise, historicise, and reflect on current methodological trends, rather than simply report on the applications or results of new methods.</p>
<p>Abstracts of 250 words are due by 31 March, 2012. Depending on the number of abstracts received, we may shortlist submissions at this stage. Please email your abstract and a list of 3 or 4 suggested peer reviewers to: jobem.edm@gmail.com.</p>
<p>Full articles of no more than 7000 words should be submitted on or before 1 August, 2012 at: <a href="http://mc.manuscriptcentral.com/hbem">http://mc.manuscriptcentral.com/hbem</a> (select “Special Issue: Emerging Digital Methods” as a manuscript type). Manuscripts should conform to the <a href="http://www.tandf.co.uk/journals/journal.asp?issn=0883-8151&#038;linktype=44">guidelines of the Journal of Broadcasting &#038; Electronic Media</a>.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.mappingonlinepublics.net/2012/01/30/call-for-papers-emerging-methods-for-digital-media-research/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating Basic Twitter Language Metrics</title>
		<link>http://www.mappingonlinepublics.net/2012/01/28/creating-basic-twitter-language-metrics/</link>
		<comments>http://www.mappingonlinepublics.net/2012/01/28/creating-basic-twitter-language-metrics/#comments</comments>
		<pubDate>Sat, 28 Jan 2012 02:27:01 +0000</pubDate>
		<dc:creator>Snurb</dc:creator>
				<category><![CDATA[Methods]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[#egypt]]></category>
		<category><![CDATA[Gawk]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.mappingonlinepublics.net/?p=1049</guid>
		<description><![CDATA[OK, this may be a somewhat esoteric subject for researchers who mainly work with Twitter data from specific countries and cultures, but over the past few weeks I’ve been working on a paper that analyses Twitter activities in the #egypt and #libya hashtags – and as part of that work, I’ve been interested in exploring [...]]]></description>
			<content:encoded><![CDATA[<p>OK, this may be a somewhat esoteric subject for researchers who mainly work with <em>Twitter</em> data from specific countries and cultures, but over the past few weeks I’ve been working on a paper that analyses <em>Twitter</em> activities in the #egypt and #libya hashtags – and as part of that work, I’ve been interested in exploring the interactions between users tweeting in Arabic and users tweeting in other languages (mainly in English). Unfortunately, there’s no reliable means of identifying the language of specific tweets, or of the users who post them; while the <em>Twitter</em> API provides an ISO language code (e.g. ‘en’ for English, ‘no’ for Norwegian, etc.) for each tweet, this is drawn simply from the overall language setting of the user’s account, and not specific to each individual tweet itself. For users who alternate between languages in their tweeting, all tweets will be tagged with their chosen language code; for users who haven’t bothered to change their <em>Twitter</em> profile settings away from the default English, all their tweets will be tagged ‘en’, regardless of their actual language.</p>
<p><span id="more-1049"></span>
<p>So far, so unhelpful. Further, short of running every tweet through some form of automatic language recognition tool (using <em>Google Translate</em> or a similar mechanism, for example) – which would be extremely time-consuming for <em>Twitter</em> archives of more than a few thousand tweets – it is prohibitively difficult to identify the exact language of each tweet, not least because the 140-character limit leaves very little text to work with. In theory, if we had word corpora for all major languages, we could cross-check each tweet against those corpora to see which language’s words occur most frequently – but again, that process would be extremely time-consuming, and would probably have serious difficulties with the abbreviations and contractions which <em>Twitter</em> users commonly employ to stay within that limit.</p>
<p>A much simpler approach – which does generate somewhat less conclusive results, though – works by examining the character sets used in tweets. This is able to make only relatively broad distinctions, but it’s good enough for what I’m trying to achieve with my #egypt/#libya datasets: here, a quick qualitative look at the data suggests that the major division is between Arabic tweets and tweets in English (and to some extent in other European languages) – so the main challenge is to distinguish between Latin and Arabic character sets. This we can do, even just with a basic Gawk script.</p>
<p><em>Twitter</em> datasets as they are generated by our standard hashtag tracking solution, <em>yourTwapperkeeper</em>, are available in UTF-8 encoding, leaving virtually all characters and character sets intact. Each character is assigned a specific character code, and for historical reasons, the basic characters of the Latin script (unaccented letters, standard punctuation marks, etc.) retain their traditional ASCII codes, with values below 128; beyond that range, we’re moving into accented letters, more unusual punctuation marks, and non-Latin character sets. Sadly, our preferred tool for processing <em>yourTwapperkeeper</em> datasets, Gawk, doesn’t cope all that well with these advanced UTF-8 characters. It handles single-byte characters (the ASCII range, below 128) fine. However, UTF-8 stores every character above that range as a sequence of two or more bytes, and Gawk reads each such character as several separate characters, each with a code between 128 and 255. At least on a Windows PC, there doesn’t seem to be any way to change that behaviour, either.</p>
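<p>To see what a byte-oriented Gawk is actually working with, it helps to look at the raw UTF-8 bytes. This Python sketch encodes a string as UTF-8 and lists the byte values; the helper name gawk_byte_view is hypothetical. ASCII characters stay single bytes below 128. Accented Latin letters and basic Arabic letters each become two bytes above 127, and for Arabic the first byte is 216 or 217.</p>

```python
def gawk_byte_view(text):
    """Return the byte values a byte-oriented awk sees for a UTF-8 string."""
    return list(text.encode("utf-8"))


print(gawk_byte_view("e"))        # [101]: plain ASCII, a single byte below 128
print(gawk_byte_view("\u00e9"))   # e with acute accent: [195, 169]
print(gawk_byte_view("\u0627"))   # Arabic alef: [216, 167]
print(gawk_byte_view("\u0645"))   # Arabic meem: [217, 133]
```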
<p>However, that’s still good enough for our immediate purpose of distinguishing between Latin and non-Latin (i.e. mainly English and Arabic) tweets. As it turns out, Gawk consistently sees each Arabic character as a sequence of two codes: either 216 (Ø) or 217 (Ù), followed by another character with a code above 127. So, for a basic distinction between tweets using Latin and tweets using non-Latin scripts, we simply need to count the number of high-ASCII characters (with a code above 127) which Gawk sees in each tweet. We then set a threshold up to which a tweet is still classified as ‘Latin’, so that tweets using accented characters or ‘fancy’ quotation marks are still classed as Latin. Through trial and error, I’ve found that a threshold of 20 (i.e. about ten Arabic or other two-byte non-Latin characters) seems to work reasonably well. Few tweets in languages using the Latin alphabet will be misclassified as ‘non-Latin’, even if they contain a number of umlauts or accented characters. Meanwhile, tweets in Arabic, Hebrew, Greek, Chinese, Korean, and other non-Latin scripts are reliably recognised.</p>
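<p>The counting rule can be expressed in a few lines of Python. This is a sketch, not the Gawk implementation. It counts the bytes above 127 in the UTF-8-encoded tweet, which is effectively what a byte-oriented Gawk does. It then flags a tweet as non-Latin once that count exceeds the tolerance, set to 20 by default here to match the threshold suggested above. The function names are hypothetical.</p>

```python
def high_byte_count(text):
    """Count bytes above 127 in the UTF-8 encoding of a tweet."""
    return sum(1 for b in text.encode("utf-8") if b > 127)


def is_non_latin(text, tolerance=20):
    """A tweet counts as non-Latin only if its high-byte count exceeds the tolerance."""
    return high_byte_count(text) > tolerance


latin = "Caf\u00e9 in Krak\u00f3w"        # two accented letters, 4 high bytes
arabic = "\u0645\u0635\u0631 " * 4        # 'Egypt' in Arabic, 4 times: 24 high bytes
print(is_non_latin(latin), is_non_latin(arabic))   # False True
```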
<p>We could use this to mark up the language of every line in a <em>yourTwapperkeeper</em> archive – but that’s not necessarily very useful or interesting. Instead, the script below operates on a user-by-user basis: for each user, it counts the number of their tweets which were above the ‘non-Latin’ threshold, and also calculates a language_ratio value: the percentage of their tweets which used non-Latin characters. The script accepts an optional ‘tolerance’ parameter, to set the ‘non-Latin’ threshold: a typical way to use it would be</p>
<blockquote><p>gawk -F , -f userlanguage.awk tolerance=20 input.csv &gt;output.csv</p>
</blockquote>
<p>(tolerance defaults to zero if it isn’t set).</p>
<div class="wlWriterEditableSmartContent" id="scid:887EC618-8FBE-49a5-A908-2339AF2EC720:8bce1867-5a64-47d3-8bea-b9d89980756d" style="padding-right: 0px; display: inline; padding-left: 0px; float: none; padding-bottom: 0px; margin: 0px; padding-top: 0px">         <code>
<pre>

# userlanguage.awk - Extract stats on the language use of each user, as metrics for network visualisation in Gephi
#
# this script takes a Twapperkeeper CSV/TSV archive of tweets, and calculates for each user a ratio
# indicating how many of their tweets were in non-Latin character sets
#
# output is in a format ready to be imported as a node list into the Gephi Data Laboratory
# on import, note that new data columns must be imported as 'float' type
#
# the script skips the first line, expecting that it contains header information
#
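# script expects an optional numerical "tolerance" parameter, setting how many high-ASCII (non-Latin) characters a tweet may contain while still being counted as Latin script
# set tolerance to ~20 to treat most accented European languages as Latin (note that, reading byte by byte, Gawk counts every non-ASCII UTF-8 character as two or more high-ASCII characters)
# default value for tolerance is 0
#
# the counting assumes Gawk processes input byte by byte; in a UTF-8 locale on Unix-like systems,
# newer gawk versions read multi-byte characters instead, so run the script as LC_ALL=C gawk ... there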
#
# expected data format:
# text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,geo_coordinates_0,geo_coordinates_1,created_at,time
#
# output format:
# nodes,id,label,user_tweets,user_highASCII_tweets,language_ratio
# (language_ratio is a value between 1 = no Latin tweets and 0 = 100% Latin tweets)
#
# Released under Creative Commons (BY, NC, SA) by Axel Bruns - a.bruns@qut.edu.au

BEGIN {
	getline 

	if(!tolerance) tolerance = 0;						# highASCII tolerance level: default 0

	for(char = 0; char < 256; char++) {
		charnum[sprintf("%c", char)] = char
	}

	print "Nodes" FS "Id" FS "Label" FS "user_tweets" FS "user_highASCII_tweets" FS "language_ratio"
}

{
	nodename[tolower($3)] = $3
	node[tolower($3),"tweets"]++

	highASCII = 0
	for(char = 1; char<=length($1); char++) {
		if(charnum[substr($1, char, 1)] > 127) highASCII++		# count number of high ASCII (>127) characters in tweet; note: some UTF-8 characters count as multiples
	}
	if(highASCII > tolerance) node[tolower($3),"highASCII"]++
}

END {
	for(name in nodename) {
		# "+ 0" prints users without any high-ASCII tweets as 0 rather than as an empty field
		print name FS name FS nodename[name] FS node[name,"tweets"] FS (node[name,"highASCII"] + 0) FS node[name,"highASCII"] / node[name,"tweets"]
	}
}
</pre>
</code>
      </div>
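<p>For comparison, here is the same per-user calculation sketched in Python rather than Gawk. One assumption differs from the script above. The sketch reads the archive with Python's csv module, so tweet text containing commas does not shift the from_user column, provided the archive quotes such fields; plain -F , splitting cannot handle that case. The function name and the tuple it returns are hypothetical. Like the Gawk script, it keys users by lowercase username and keeps the last capitalisation seen as the label.</p>

```python
import csv
import io


def user_language_metrics(csv_text, tolerance=0):
    """Python counterpart of the per-user calculation in userlanguage.awk.
    Assumes the header line comes first, the tweet text is column 1 and
    from_user is column 3, as in the script's expected data format.
    Returns {lowercase username: (label, tweets, high_tweets, language_ratio)}."""
    labels, tweets, high_tweets = {}, {}, {}
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)                       # skip the header line
    for row in reader:
        if len(row) < 3:
            continue                         # ignore blank or truncated lines
        text, user = row[0], row[2]
        key = user.lower()
        labels[key] = user                   # like the Gawk script: last form seen wins
        tweets[key] = tweets.get(key, 0) + 1
        high_bytes = sum(1 for b in text.encode("utf-8") if b > 127)
        if high_bytes > tolerance:
            high_tweets[key] = high_tweets.get(key, 0) + 1
    return {
        key: (labels[key], tweets[key], high_tweets.get(key, 0),
              high_tweets.get(key, 0) / tweets[key])
        for key in tweets
    }


sample = ('text,to_user_id,from_user\n'
          '"Protests in Tahrir, Cairo",,Alice\n'
          '"\u0645\u0635\u0631 \u0645\u0635\u0631",,alice\n'
          'hello,,Bob\n')
for key, values in user_language_metrics(sample).items():
    print(key, *values, sep=",")
```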
<p>The resulting data can be used in a number of ways. For one, we might divide the total userbase into three groups: users who mainly used Latin characters (with a language_ratio below 0.33); users who mainly used non-Latin characters (language_ratio &gt; 0.66); and users posting in a mix of languages (language_ratio between 0.33 and 0.66). If we further combine this grouping with the distinctions between lead users, highly active users, and less active users <a href="http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-1/">which the metrify.awk script makes possible</a>, we can examine the prevalence of different languages across these groups. For #egypt during February 2011, for example, this is the result:</p>
<p><a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image7.png"><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" alt="image" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb10.png" width="491" border="0" /></a></p>
<p>An interesting result: while ‘Latin’ (in this case, mainly English-speaking) users dominate overall, they’re mainly found amongst the less engaged 90% of users, who post or retweet a small number of hashtagged comments about the situation in Egypt during February. The most engaged one per cent of users includes a much larger share of Arabic (i.e. non-Latin) speakers, as well as a sizeable proportion of users tweeting in a mix of languages and character sets.</p>
<p>(Note: of course, speakers of languages such as Chinese, Korean, Japanese, Greek, Hebrew, Russian, etc. will be included in the ‘non-Latin’ group here, and speakers of many European languages other than English will be counted amongst the ‘Latin’ group. In many cases, this will be a problem, and our approach here doesn’t allow for easy distinctions between, say, English and French, or Arabic and Hebrew. For our present purposes, however, that’s a negligible problem – few ‘non-Latin’ languages other than Arabic, and few ‘Latin’ languages other than English, are present in the #egypt dataset to any significant extent.)</p>
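<p>A minimal Python sketch of this grouping and cross-tabulation follows. It assumes each user's percentile label (from metrify.awk) and language_ratio (from userlanguage.awk) have already been joined into one list of pairs; the function names are hypothetical. Ratios of exactly 0.33 or 0.66 are counted as mixed here, since the post doesn't say which group the boundary values belong to.</p>

```python
from collections import Counter


def language_group(ratio):
    """Three-way split: below 0.33 Latin, above 0.66 non-Latin, otherwise mixed."""
    if ratio < 0.33:
        return "Latin"
    if ratio > 0.66:
        return "non-Latin"
    return "mixed"


def cross_tabulate(users):
    """users: iterable of (percentile_label, language_ratio) pairs.
    Returns a Counter keyed by (percentile_label, language_group)."""
    return Counter((label, language_group(ratio)) for label, ratio in users)


users = [("> 99%", 0.9), ("> 99%", 0.5), ("> 0%", 0.1), ("> 0%", 0.0)]
for (label, group), count in sorted(cross_tabulate(users).items()):
    print(label, group, count)
```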
<p>Additionally, the output of userlanguage.awk is also designed to be easily imported into Gephi as an additional source of data on the users in the network. Suppose we’ve already created a network for our dataset (for example, one showing @replies and retweets), using the <em>Twitter</em> usernames (normalised to lower case) as node IDs. We can then use the Data Laboratory to import the language data into the nodes table as additional columns. Here, it’s important to make sure the numerical metrics generated by userlanguage.awk (user_tweets, user_highASCII_tweets, language_ratio) are imported as columns of the ‘Float’ type, so that Gephi can use them effectively.</p>
<p>(I’ll say much more about importing <em>Twitter</em> metrics data into Gephi in a future blog post – stay tuned.)</p>
<p>Once imported, these metrics are now available to be used for various purposes: as a means of sizing or colouring nodes in the network, or as criteria for filtering it. To finish off for now, here’s a simple example, which shows @replies and retweets in the #egypt hashtag during February 2011. I’ve used the language_ratio value as the guide for the colour scale here: blue indicates a language_ratio close to zero (predominantly tweeting in Latin characters); green a language_ratio close to one (predominantly tweeting in non-Latin characters); with a gradient of colours between them. Connections between users are coloured according to the language ratio of the sender. (<a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/egypt-all-users.png">Full graph here – PNG, 9 MB.</a>)</p>
<p>There’s an obvious language divide here – English- and Arabic-speaking users are mainly tweeting amongst themselves. But there are also a good number of connections across the divide – and for these, given the graph above, the most active #egypt participants are disproportionately responsible: mixed-language users are much more likely to be found in that group than in any of the others.</p>
<p>And that’s it for now – more on my language analysis of #egypt and #libya when the paper gets published, and more on using <em>Twitter</em> metrics in Gephi in a future post!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mappingonlinepublics.net/2012/01/28/creating-basic-twitter-language-metrics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CCI Winter School &#8211; Apply Now</title>
		<link>http://www.mappingonlinepublics.net/2012/01/13/cci-winter-school-apply-now/</link>
		<comments>http://www.mappingonlinepublics.net/2012/01/13/cci-winter-school-apply-now/#comments</comments>
		<pubDate>Fri, 13 Jan 2012 02:20:05 +0000</pubDate>
		<dc:creator>Jean Burgess</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[CCI]]></category>
		<category><![CDATA[summer school]]></category>
		<category><![CDATA[winter school]]></category>

		<guid isPermaLink="false">http://www.mappingonlinepublics.net/?p=1036</guid>
		<description><![CDATA[In my new role as Deputy Director of the ARC Centre of Excellence for Creative Industries &#038; Innovation (CCI for short), I&#8217;m excited to be leading the team that&#8217;s organising our most ambitious PhD and Early Career Researcher activity to date &#8211; the CCI Winter School, to be held in balmy Brisbane in late June [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.cciwinterschool.org/wp-content/uploads/2011/11/P1030333_A.jpg" align="center"/></p>
<p>In my new role as Deputy Director of the ARC Centre of Excellence for Creative Industries &#038; Innovation (<a href="http://cci.edu.au" title="CCI Website" target="_blank">CCI</a> for short), I&#8217;m excited to be leading the team that&#8217;s organising our most ambitious PhD and Early Career Researcher activity to date &#8211; the <a href="http://cciwinterschool.org" target="_blank">CCI Winter School</a>, to be held in balmy Brisbane in late June this year. It&#8217;s a selective but free event (you or your institution only need to cover your travel), involving a fairly small group of promising PhD students and early career researchers from around the world. If you&#8217;re in the northern hemisphere and looking for a 2012 summer research school, why not consider being adventurous and coming down under instead? Axel and I will both be on hand as mentors, along with <a href="http://www.cciwinterschool.org/the-team/" target="_blank">a bunch of other fabulous people</a>.</p>
<p>Applications close on 31 January &#8211; don&#8217;t miss out!<br />
<span id="more-1036"></span></p>
<blockquote><p>
<a href="http://cciwinterschool.org" title="CCI winter school website" target="_blank">CCI’s 2012 Winter School</a> (coinciding with summer in the northern hemisphere) offers selected doctoral students and early career researchers a week-long program of interdisciplinary study, collaboration and social interaction in the broad area of creative industries and innovation research, drawing on the Centre’s expertise in media, cultural and communication studies, economics, education, policy and law, in relation to the creative economy.</p>
<p>We welcome applications from emerging scholars working on related topics including, but not limited to:</p>
<ul>
<li>Cultural, media and creative industries policy</li>
<li>Digital society</li>
<li>Community arts and media</li>
<li>New business models in the creative economy</li>
<li>Innovation studies</li>
<li>Economics of the creative industries</li>
<li>The creative industries in Asia</li>
<li>Transmedia</li>
<li>Internet studies</li>
<li>Copyright and intellectual property</li>
<li>The challenges of ‘big data’</li>
<li>Creative careers and creative labour</li>
</ul>
<p>Participants will work with leading researchers, engage in intensive workshop activities and receive direct feedback and individual mentoring on their own work. Social activities will provide additional opportunities for participants to get to know each other and form collaborative relationships that will last for years to come.</p></blockquote>
<p>For all the info, lists of mentors, an indicative program and the online application form, visit the <a href="http://cciwinterschool.org">CCI Winter School</a> website.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mappingonlinepublics.net/2012/01/13/cci-winter-school-apply-now/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CCI Report on #qldfloods and @QPSMedia in the 2011 Floods</title>
		<link>http://www.mappingonlinepublics.net/2012/01/11/cci-report-on-qldfloods-and-qpsmedia-in-the-2011-floods/</link>
		<comments>http://www.mappingonlinepublics.net/2012/01/11/cci-report-on-qldfloods-and-qpsmedia-in-the-2011-floods/#comments</comments>
		<pubDate>Wed, 11 Jan 2012 00:00:00 +0000</pubDate>
		<dc:creator>Snurb</dc:creator>
				<category><![CDATA[Analysis]]></category>
		<category><![CDATA[Publications]]></category>
		<category><![CDATA[#qldfloods]]></category>
		<category><![CDATA[@QPSMedia]]></category>
		<category><![CDATA[crisis communication]]></category>
		<category><![CDATA[floods]]></category>
		<category><![CDATA[Queensland]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.mappingonlinepublics.net/?p=1031</guid>
		<description><![CDATA[It’s difficult to believe that one year ago, significant parts of Brisbane were inundated by floodwaters; thankfully, there has been no repeat of the flood crisis this year. One of the few good news stories to emerge from the disaster was the – overall, very successful – way in which social media such as Twitter [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/qldfloods-and-@QPSMedia.pdf"><img style="margin: 0px 0px 5px 5px; border-width: 0px;" title="#qldfloods and @QPSMedia thumbnail" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/qldfloodsandQPSMediathumbnail.png" alt="#qldfloods and @QPSMedia thumbnail" width="154" height="204" align="right" border="0" /></a> It’s difficult to believe that one year ago, significant parts of Brisbane were inundated by floodwaters; thankfully, there has been no repeat of the flood crisis this year. One of the few good news stories to emerge from the disaster was the – overall, very successful – way in which social media such as <em>Twitter</em> and <em>Facebook</em> were used during the event, both by key emergency authorities and by everyday users, from directly affected local residents to onlookers further afield.</p>
<p>Particular kudos here must go to the Queensland Police Service Media Unit, which – not quite from a standing start, but certainly without much time to prepare a comprehensive strategy for its social media crisis communication approaches – delivered timely, informative, and level-headed updates on the flood crisis as it unfolded. Its <em><a href="http://www.facebook.com/QueenslandPolice">Facebook</a></em> followers grew, literally overnight, by a factor of ten, and <a href="http://twitter.com/QPSMedia">@QPSMedia</a> also became the single most visible account participating in the <a href="http://twitter.com/#!/search/%23qldfloods">#qldfloods</a> <em>Twitter</em> hashtag.</p>
<p><span id="more-1031"></span></p>
<p>We’ve presented some analyses of the use of <em>Twitter</em> during the crisis in various contexts during 2011 – including the <a href="http://www.mappingonlinepublics.net/2011/03/28/event-social-media-in-times-of-crisis/">Eidos Institute symposium</a> at the Queensland State Library in April, and <a href="http://www.mappingonlinepublics.net/2011/04/11/emergency-media-and-public-affairs-conference/">various</a> <a href="http://www.mappingonlinepublics.net/2011/05/07/e-democracy-learning-from-qldfloods-and-wikileaks/">conference</a> <a href="http://www.mappingonlinepublics.net/2011/10/29/twitter-and-crises-qldfloods-eqnz-and-sj/">presentations</a> later in the year. In time for the first anniversary of the floods, we are now releasing a major report on #qldfloods and @QPSMedia through the <a href="http://cci.edu.au/">ARC Centre of Excellence for Creative Industries and Innovation</a>, where we are based.</p>
<p>Co-authored by Axel Bruns, Jean Burgess, Kate Crawford, and Frances Shaw, the report takes a comprehensive look at overall patterns of <em>Twitter</em> activity in #qldfloods, as well as analysing in much greater detail the contents both of the #qldfloods update stream itself and of the conversation specifically surrounding @QPSMedia. (We are especially indebted for this to our colleague Frances Shaw, who carried out the tedious task of coding those tweets.)</p>
<p><a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/qldfloods-and-@QPSMedia.pdf">The report is available for download here.</a> More information is also available from the CCI Website, <a href="http://cci.edu.au/publications/cci-report-highlights-role-social-media-floods-coverage-and-re">which has the full press release</a>, too.</p>
<p>We’re hoping that this report will make a useful contribution to the further development of social media crisis communication strategies in emergency services and media organisations. It’s also a useful starting-point for <a href="http://www.mappingonlinepublics.net/2011/11/01/new-arc-linkage-project-social-media-in-times-of-crisis/">our ARC Linkage project</a> in partnership with the <a href="http://eidos.org.au/">Eidos Institute</a> and the <a href="http://www.communitysafety.qld.gov.au/">Queensland Department of Community Safety (DCS)</a>, which will further investigate the use of social media in crisis communication and work with the DCS to develop its social media activities.</p>
<p><span style="font-size: xx-small;">(Report cover image by </span><a href="http://www.flickr.com/photos/gusveitch/5363574870/"><span style="font-size: xx-small;">Angus Veitch</span></a><span style="font-size: xx-small;"> on <em>Flickr</em>. Used under a Creative Commons BY-NC licence.)</span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mappingonlinepublics.net/2012/01/11/cci-report-on-qldfloods-and-qpsmedia-in-the-2011-floods/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Gearing Up for the Election(s)</title>
		<link>http://www.mappingonlinepublics.net/2012/01/10/gearing-up-for-the-elections/</link>
		<comments>http://www.mappingonlinepublics.net/2012/01/10/gearing-up-for-the-elections/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 00:35:41 +0000</pubDate>
		<dc:creator>Snurb</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[elections]]></category>
		<category><![CDATA[Norway]]></category>
		<category><![CDATA[projects]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[social media]]></category>

		<guid isPermaLink="false">http://www.mappingonlinepublics.net/2012/01/10/gearing-up-for-the-elections/</guid>
		<description><![CDATA[We’ve got a few busy years ahead of us, it seems. In addition to the ARC Linkage project on social media and crisis communication which was awarded to us (the QUT Mapping Online Publics team along with our CCI colleague Kate Crawford, the Queensland Department of Community Safety, and the Eidos Institute), which we’ll carry [...]]]></description>
			<content:encoded><![CDATA[<p>We’ve got a few busy years ahead of us, it seems. In addition to <a href="http://www.mappingonlinepublics.net/2011/11/01/new-arc-linkage-project-social-media-in-times-of-crisis/">the ARC Linkage project on social media and crisis communication</a> which was awarded to us (the QUT Mapping Online Publics team along with our <a href="http://cci.edu.au/">CCI</a> colleague Kate Crawford, the Queensland Department of Community Safety, and the Eidos Institute), which we’ll carry out during 2012-14, we’ve also had word in December that another project application has been successful. </p>
<p>Titled “The Impact of Social Media on Agenda-Setting in Election Campaigns: Cross-Media and Cross-National Comparisons”, that project will study the use of social media in a series of election campaigns which are coming up over the next few years (2012-15) – including the Queensland state election and the US presidential election this year (and I’m tempted to throw in the French presidential election as well, just for fun), and elections in Sweden, Norway, and Australia which are coming up in 2013 and 2014.</p>
<p><span id="more-1027"></span></p>
<p>The project is led by <a href="http://www.hf.uio.no/imk/personer/vit/gunnen/">Gunn Enli</a> at the University of Oslo, and also involves <a href="http://www.hf.uio.no/imk/english/people/aca/skogerbo/index.html">Eli Skogerbø</a> at Oslo, <a href="http://hm.uib.no/index.html">Hallvard Moe</a> from the University of Bergen (currently visiting the CCI), <a href="http://chrchristensen.wordpress.com/">Christian Christensen</a> at the University of Uppsala, and <a href="http://www.csulb.edu/colleges/cla/departments/polisci/people/wallsten.html">Kevin Wallsten</a> at California State University. It’s funded by the Norwegian Research Council, who have awarded us the impressive sum of 9.9m NOK (a still impressive 1.5m in Australian Dollars). Here’s the project overview:</p>
<blockquote><h4>The Impact of Social Media on Agenda-Setting in Election Campaigns: Cross-Media and Cross-National Comparisons</h4>
<p>The project has as its primary objective to establish new and unique knowledge on the interaction and inter-media agenda-setting between social media and mainstream media in different cultural and political settings. The findings of the project will provide empirical insights into the development of hybrid public spheres, and contribute to refining and revising theories on political communication in cross-national environments.</p>
<p>The project will establish a high quality international research network, involving some of the leading scholars on social media, internationally as well as in Norway, Sweden, USA and Australia. The publications from the project will contribute to the ongoing international scholarly debate on the role of social media in public communication across the world.      </p>
<p>Social media not only serve as arenas for debate and discussion, they are also increasingly integrated in inter-media agenda-setting, as they serve as input to the mainstream media. Political actors as well as citizens use them in order to draw attention to issues and manage their public images. The increasing cross-mediality between the social media and the mainstream media can be described in terms of creating &quot;hybrid public spheres&quot; in which the social and mainstream media overlap and interact. The project takes a cross-media and cross-national approach, by researching political communication in election campaigns in Australia, Norway, Sweden and USA. </p>
<p>The project has one overall and three sub-RQs:</p>
<ul>
<li>What characterizes the dynamics between social media and mainstream media in political agenda-setting, and how does this dynamic impact the relationship between politicians and voters in different political systems?</li>
<li>What characterizes politicians&#8217; use of social media as a tool of political communication in countries of different size and with different election systems, and to what degree has political debate migrated from mainstream media to social media? </li>
<li>What characterizes the dynamics between social media and mainstream media in agenda setting during election campaigns, and to what degree do journalists in different nations relate to and incorporate social media as editorial raw material? </li>
<li>What characterizes the &#8216;hybrid public sphere&#8217; in the intersection between social media and mainstream media, and to what degree are traditional power hierarchies and elite domination challenged in the new hybrid public sphere? </li>
</ul>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.mappingonlinepublics.net/2012/01/10/gearing-up-for-the-elections/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Twapperkeeper and Beyond: A Reminder</title>
		<link>http://www.mappingonlinepublics.net/2012/01/09/twapperkeeper-and-beyond-a-reminder/</link>
		<comments>http://www.mappingonlinepublics.net/2012/01/09/twapperkeeper-and-beyond-a-reminder/#comments</comments>
		<pubDate>Mon, 09 Jan 2012 00:20:35 +0000</pubDate>
		<dc:creator>Snurb</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[archiving]]></category>
		<category><![CDATA[Twapperkeeper]]></category>
		<category><![CDATA[Twitter]]></category>
		<category><![CDATA[yourTwapperkeeper]]></category>

		<guid isPermaLink="false">http://www.mappingonlinepublics.net/2012/01/09/twapperkeeper-and-beyond-a-reminder/</guid>
		<description><![CDATA[Those of you who have followed our adventures in Twitter research for some time now will know that we’ve relied to a significant extent on Joe John O’Brien III’s excellent Twapperkeeper as a tool for capturing tweets. Twapperkeeper (as a stand-alone, free Web-based service) no longer exists in its original form, however – though some [...]]]></description>
			<content:encoded><![CDATA[<p>Those of you who have followed our adventures in <em>Twitter</em> research for some time now will know that we’ve relied to a significant extent on <strike>Joe</strike> John O’Brien III’s excellent <em><a href="http://twapperkeeper.com/">Twapperkeeper</a></em> as a tool for capturing tweets. <em>Twapperkeeper</em> (as a stand-alone, free Web-based service) no longer exists in its original form, however – though some of its functionality for creating <em>Twitter</em> archives appears to have been subsumed into the for-pay services available as premium offerings from <em><a href="http://hootsuite.com/">Hootsuite</a></em> – and so we’ve been getting the occasional <a href="http://www.mappingonlinepublics.net/2010/10/20/dynamic-networks-in-gephi-from-twapperkeeper-to-gexf/#comment-18422">inquiry</a> about what to do now.</p>
<p><span id="more-1022"></span></p>
<p>Some months ago, I published a quick post to outline how we’ve transitioned from <em>Twapperkeeper</em>(<em>.com</em>) to the open-source solution <em><a href="https://github.com/jobrieniii/yourTwapperKeeper">yourTwapperkeeper</a></em>, which offers comparable functionality as a Web package which users can install on their own servers; the start of a new year seems like a good point to reiterate this, and to add a few further pointers. So:</p>
<ul>
<li><em>yourTwapperkeeper</em> does pretty much exactly what <em>Twapperkeeper</em> did, and provides data in almost the same format. For the purposes of using the Gawk scripts which much of our work is built on, though, we need CSV or TSV files in the original <em>Twapperkeeper</em> format, and I’ve made available a small modification for <em>yourTwapperkeeper</em> which generates them. <a href="http://www.mappingonlinepublics.net/2011/06/21/switching-from-twapperkeeper-to-yourtwapperkeeper/">More details here.</a> </li>
<li>Your Web server must be running 24/7 if you want to capture comprehensive datasets from <em>Twitter</em>. If there’s any chance that it may go down at some point (e.g. due to regular scheduled maintenance), you need to make sure that <em>yTK</em> is restarted as soon as the server is back up again. <a href="http://groups.google.com/group/yourtwapperkeeper/browse_thread/thread/2064f9393be7c57c">Some information on how to do so is here.</a> </li>
<li><strong>Most importantly:</strong> recent changes at <em>Twitter</em> now require API requests to use the https (rather than plain http) protocol. This means that <font color="#ff0000">(if you have an existing install of <em>yTK</em>)</font> you need to make some minor changes to the <em>yourTwapperkeeper</em> code <font color="#ff0000">(or use <a href="https://github.com/jobrieniii/yourTwapperKeeper">the latest version from Github</a>, which has the changes already built in)</font>; without these changes, you may only receive data from the search API, but not from the (more important) streaming API, or even no data at all. <a href="http://groups.google.com/group/yourtwapperkeeper/browse_thread/thread/599d51e7d250a52e?pli=1">Details on how to make these changes are here.</a> </li>
</ul>
<p>Hope this helps. Happy <em>Twapperkeeping</em>!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mappingonlinepublics.net/2012/01/09/twapperkeeper-and-beyond-a-reminder/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Taking Twitter Metrics to a New Level (Part 4)</title>
		<link>http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-4/</link>
		<comments>http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-4/#comments</comments>
		<pubDate>Mon, 02 Jan 2012 01:48:59 +0000</pubDate>
		<dc:creator>Snurb</dc:creator>
				<category><![CDATA[Methods]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[#auspol]]></category>
		<category><![CDATA[Gawk]]></category>
		<category><![CDATA[hashtags]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-4/</guid>
		<description><![CDATA[Update: revision 1.2 of metrify.awk is now available (still at the link below), and introduces some further functionality, which is outlined here. This is the final instalment of my four-part introduction to the metrify.awk script for generating detailed metrics for specific Twapperkeeper/yourTwapperkeeper hashtag archives. Over the last couple of posts, we’ve mainly dealt with overall [...]]]></description>
			<content:encoded><![CDATA[<p><font color="#ff0000"><strong>Update:</strong> revision 1.2 of metrify.awk is now available (still at the link below), and introduces some further functionality, </font><a href="http://www.mappingonlinepublics.net/2012/01/31/more-twitter-metrics-metrify-revisited/"><font color="#ff0000">which is outlined here</font></a><font color="#ff0000">.</font></p>
<p>This is the final instalment of <a href="http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-3/">my four-part introduction</a> to the <a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/metrify.zip">metrify.awk</a> script for generating detailed metrics for specific <em>Twapperkeeper</em>/<em>yourTwapperkeeper</em> hashtag archives. Over the last couple of posts, we’ve mainly dealt with overall stats for the hashtag, as well as for specific, definable percentiles of more or less active users. Finally, now, it’s time to look more closely at patterns within the overall userbase.</p>
<p><span id="more-1003"></span></p>
<h2>User Metrics</h2>
<p>For this, we’re using the final (and by far the largest) data table which metrify.awk generates. To produce a full table, by the way, the <em>skipusers=1</em> command-line argument must not be specified this time around – otherwise the only per-user data which metrify.awk will output is each user’s number of tweets. With <em>skipusers</em> off, on the other hand, we get a great deal more – but a word of warning: for large datasets, processing times can also increase quite considerably. For each user, metrify.awk tracks <font color="#ff0000">which user percentile they’ve been assigned to,</font> how many tweets they’ve sent and received (in the form of public @replies or retweets – note that this does not include any non-hashtagged tweets, which would not be included in the original dataset, of course), as well as how these sent and received tweets break down into our by now familiar categories:</p>
<ul>
<li>original tweets </li>
<li>@replies </li>
<li>genuine @replies </li>
<li>retweets </li>
<li>unedited retweets </li>
<li>edited retweets </li>
<li>URLs </li>
</ul>
<p>as well as</p>
<ul>
<li>@replies received </li>
<li>genuine @replies received </li>
<li>retweets received </li>
<li>unedited retweets received </li>
<li>edited retweets received </li>
</ul>
<p>(with these metrics again provided both as a total number, and as a percentage of all tweets sent or @replies received, respectively). Again, with the exception of URLs, these will add up to the total:</p>
<ul>
<li>edited retweets + unedited retweets = retweets </li>
<li>retweets + genuine @replies = @replies </li>
<li>original tweets + genuine @replies + retweets = total number of tweets </li>
</ul>
<p>as well as</p>
<ul>
<li>unedited retweets received + edited retweets received = retweets received </li>
<li>retweets received + genuine @replies received = @replies received </li>
</ul>
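<p>As a rough illustration of the bookkeeping behind this per-user table, here is a minimal Python sketch that tallies sent and received counts from (sender, tweet text) pairs. It is not a port of metrify.awk. Its classification rules are simplified assumptions: a tweet containing “RT @user” counts as a retweet (unedited if it starts with that string, edited otherwise) and is credited only to the user named after “RT @”; any other tweet mentioning an @user is a genuine @reply credited to every mentioned user; everything else is an original tweet. All key names are hypothetical.</p>

```python
import re
from collections import Counter, defaultdict

# Simplified, assumed classification rules; metrify.awk's own logic may differ.
MENTION = re.compile(r"@(\w+)")
RT_SOURCE = re.compile(r"RT @(\w+)")
URL = re.compile(r"https?://")


def classify(text):
    """Return (category, users credited with receiving this tweet)."""
    rt = RT_SOURCE.search(text)
    if rt:
        kind = "unedited_rt" if rt.start() == 0 else "edited_rt"
        return kind, [rt.group(1).lower()]
    mentioned = sorted({name.lower() for name in MENTION.findall(text)})
    if mentioned:
        return "genuine_reply", mentioned
    return "original", []


def tally(tweets):
    """Count sent and received tweet types per user from (sender, text) pairs."""
    users = defaultdict(Counter)
    for sender, text in tweets:
        kind, credited = classify(text)
        sent = users[sender.lower()]
        sent["tweets"] += 1
        sent[kind] += 1
        if kind != "original":
            sent["replies"] += 1            # @replies = genuine @replies + retweets
        if kind in ("unedited_rt", "edited_rt"):
            sent["retweets"] += 1
        if URL.search(text):
            sent["urls"] += 1               # URLs cut across all other categories
        for name in credited:
            got = users[name]               # may create a mention-only user
            got["replies_received"] += 1
            if kind == "genuine_reply":
                got["genuine_replies_received"] += 1
            else:
                got["retweets_received"] += 1
                got[kind + "_received"] += 1
    return users
```

<p>Because the counts are built up category by category, the additive relationships listed above hold for every user by construction; accounts that were only ever mentioned end up in the result with zero tweets sent but non-zero received counts.</p>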
<p>But wait, there’s more – we can also calculate the ratio between these incoming @replies and the tweets sent by the user, to get a sense of the resonance of their activities:</p>
<ul>
<li>@replies received : total tweets sent </li>
<li>genuine @replies received : total tweets sent </li>
<li>retweets received : total tweets sent </li>
<li>unedited retweets received : total tweets sent </li>
<li>edited retweets received : total tweets sent </li>
</ul>
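<p>These ratios are simple divisions, but accounts that received @replies without ever tweeting need special handling, because their ratio is undefined. A minimal Python sketch, assuming per-user counts are held in a dict with hypothetical key names:</p>

```python
# Hypothetical key names for the received-count categories listed above.
RECEIVED = [
    "replies_received",
    "genuine_replies_received",
    "retweets_received",
    "unedited_rt_received",
    "edited_rt_received",
]


def resonance(metrics):
    """Return each received count divided by the number of tweets sent.

    Accounts that were only mentioned (no tweets sent) get None for every
    ratio, since a ratio over zero tweets is undefined.
    """
    sent = metrics.get("tweets", 0)
    if sent == 0:
        return {key: None for key in RECEIVED}
    return {key: metrics.get(key, 0) / sent for key in RECEIVED}
```

<p>For example, a user with 4 tweets sent and 2 genuine @replies received gets a <em>genuine_replies_received</em> ratio of 0.5; a ratio above 1.0 means that, on average, they received more than one such tweet per tweet sent.</p>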
<h2>Some Results</h2>
<p>So, let’s see what these data tell us. In the first place, let’s look more closely at that small group of highly active users: here’s a graph for the top 150 most active participants (i.e. slightly more than the top 1%):</p>
<p><a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image3.png"><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" alt="image" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb5.png" width="1028" border="0" /></a> </p>
<p>We see immediately that even amongst this top group, there’s a very pronounced long tail distribution: just two #auspol users (you know who you are) contributed more than 10,000 tweets each, and a total of eight contributed more than 5,000 tweets each. Beyond those hyper-active few, activity drops off quickly: the users at the end of that top 150 contributed just over 500 tweets each (and counts fall further as we move into the second and third percentile groups). The graph above also breaks those tweets down into original tweets, genuine @replies, and retweets – and remarkably, the lead user here achieved their position mainly by sending copious amounts of @replies…</p>
<p>The total activity distribution across all 14,133 active #auspol participants, by the way, looks like this:</p>
<p><a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image4.png"><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" alt="image" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb6.png" width="529" border="0" /></a> </p>
<p>An extreme activity distribution if ever I’ve seen one!</p>
<p>But of course, tweeting a lot is only one side of the coin on <em>Twitter</em>: if nobody is reading (and responding), the user’s influence may still not be particularly great. So, instead of tweets sent, we can also examine the @replies received (showing the top 150 users again):</p>
<p><a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image5.png"><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" alt="image" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb7.png" width="1028" border="0" /></a> </p>
<p>This gives us a much better idea of who’s central to the conversation, I think: these are the users receiving the largest number of @replies and retweets (and in this case, mostly genuine @replies, which is remarkable in its own right). It should be noted that – since we’re looking at @replies <em>received</em> here – this list may also include users who were <em>only</em> mentioned, but never actively participated in the hashtag; in the case of #auspol, this includes accounts like @JuliaGillard and @TonyAbbottMHR, for example, both of whom are among the top 50 @reply recipients.</p>
<p>For those users, of course, it’s impossible to calculate the ratio of @replies received to tweets sent (since they didn’t send any) – but for the rest, that ratio may also be valuable, as an indication of what we might call resonance. A user receiving a great number of @replies (whether genuine @replies or retweets) for a comparatively small number of tweets could be said to have substantial resonance; a user tweeting a great deal, but receiving few @replies in return for their efforts, has relatively little resonance.</p>
<p>There are plenty of different ways to examine such resonance, using the different metrics which metrify.awk provides us with; as one example, here I’ve plotted the ratio of <em>genuine</em> @replies (i.e. non-retweets) received per sent tweet against the total number of tweets for the fifty most active users:</p>
<p><a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image6.png"><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" alt="image" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb9.png" width="1028" border="0" /></a> </p>
<p>For the very top users, then, their resonance rating isn’t actually all that great: the top user receives only roughly one genuine @reply for every second tweet they’ve sent, and for the next most active users this gradually increases to a 1:1 ratio. A handful of others, on the other hand, break through the parity barrier, receiving (on average) more than one genuine @reply for each tweet they’ve sent. Remarkably, one user in the top fifty even received an average of more than two genuine @replies for each of the more than 2,000 tweets they contributed to #auspol!</p>
<p>(Again, I should stress here that we’re only counting those @replies which are contained in our dataset – which in this example means @replies which were themselves tagged with the #auspol hashtag. In the absence of comprehensive data on non-hashtagged <em>Twitter</em> traffic we have no way of knowing how much non-hashtagged follow-on communication may also have occurred – our measures of tweet resonance, therefore, only measure resonance within the hashtagged conversation.)</p>
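<p>The figures behind a plot like the one above (tweets sent against genuine @replies received per tweet, for the most active users) can be assembled with a small helper. This Python sketch assumes per-user counts are held in a dict of dicts with hypothetical keys <em>tweets</em> and <em>genuine_replies_received</em>; accounts that sent no tweets are left out, since they have no ratio.</p>

```python
def top_active(user_metrics, n=50):
    """List (user, tweets sent, genuine @replies received per tweet) for the
    n most active users, most active first; ties are broken by user name."""
    active = [
        (name, m.get("tweets", 0))
        for name, m in user_metrics.items()
        if m.get("tweets", 0) > 0
    ]
    active.sort(key=lambda pair: (-pair[1], pair[0]))
    return [
        (name, sent, user_metrics[name].get("genuine_replies_received", 0) / sent)
        for name, sent in active[:n]
    ]
```

<p>Each returned tuple yields one data point: tweets sent on one axis, the genuine @reply ratio on the other.</p>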
<p>Phew – well, with these posts at least we’ve started to scratch the surface of the <em>Twitter</em> metrics which metrify.awk can generate for a given dataset. Exactly how any of these metrics may be used in any specific case depends on the research questions to be examined, of course. Go experiment – and let me know if there are other metrics which we could add to the script as well!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-4/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Taking Twitter Metrics to a New Level (Part 3)</title>
		<link>http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-3/</link>
		<comments>http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-3/#comments</comments>
		<pubDate>Mon, 02 Jan 2012 01:48:08 +0000</pubDate>
		<dc:creator>Snurb</dc:creator>
				<category><![CDATA[Methods]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[#auspol]]></category>
		<category><![CDATA[Gawk]]></category>
		<category><![CDATA[hashtags]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-3/</guid>
		<description><![CDATA[Update: revision 1.2 of metrify.awk is now available (still at the link below), and introduces some further functionality, which is outlined here. Over the past couple of posts, I’ve introduced our new metrify.awk Twitter metrics script, and looked at the first of the three metrics tables produced by the script. Let’s move on now to [...]]]></description>
			<content:encoded><![CDATA[<p><font color="#ff0000"><strong>Update:</strong> revision 1.2 of metrify.awk is now available (still at the link below), and introduces some further functionality, </font><a href="http://www.mappingonlinepublics.net/2012/01/31/more-twitter-metrics-metrify-revisited/"><font color="#ff0000">which is outlined here</font></a><font color="#ff0000">.</font></p>
<p>Over <a href="http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-2/">the past couple of posts</a>, I’ve introduced our new <a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/metrify.zip">metrify.awk</a> <em>Twitter</em> metrics script, and looked at the first of the three metrics tables produced by the script. Let’s move on now to the second table, where I’ll use a snapshot of Australian political discussion on <em>Twitter</em> under the #auspol hashtag between February and August 2011, instead of #qldfloods – the overall metrics for the different user percentiles in the #qldfloods dataset turn out not to be particularly interesting… As before, we’re dividing the total userbase according to the 1/9/90 rule into the 1% of most active users, the next 9% of moderately active users, and the final 90% of least active users. (In the case of #auspol, that first percentile contains 142, the second percentile contains 1291, and the final percentile contains 12700 of a total of 14133 users.)</p>
<p><span id="more-994"></span></p>
<h2>Percentile Metrics</h2>
<p>The second table generated by metrify.awk provides us with detailed metrics on these three percentiles, on an overall basis rather than per specific time period.</p>
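<p>The percentile grouping itself is straightforward to sketch. The Python function below is an illustration rather than metrify.awk’s own code: it ranks users by the number of tweets they sent and cuts the ranked list after the top 1% and after the next 9%. The rounding (cut-offs rounded up) and the tie-breaking (alphabetical by user name) are assumptions of this sketch, so the group sizes it produces for a given dataset need not match those reported by metrify.awk.</p>

```python
def assign_percentiles(tweet_counts):
    """Map each user to '1%', '9%' or '90%' according to their rank by tweets sent.

    Cut-offs are rounded up (an assumption), and users with equal counts are
    ordered by name, so ties at a boundary are split arbitrarily.
    """
    ranked = sorted(tweet_counts, key=lambda user: (-tweet_counts[user], user))
    n = len(ranked)
    first_cut = -(-n // 100)        # ceil(1% of n), using integer arithmetic
    second_cut = -(-n * 10 // 100)  # ceil(10% of n)
    groups = {user: "1%" for user in ranked[:first_cut]}
    groups.update({user: "9%" for user in ranked[first_cut:second_cut]})
    groups.update({user: "90%" for user in ranked[second_cut:]})
    return groups
```

<p>Integer ceiling division avoids floating-point surprises at the cut-offs; a real analysis might prefer to keep users with identical tweet counts together rather than split them by name, a choice the 1/9/90 rule itself does not settle.</p>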
<p>This table contains the following columns:</p>
<ul>
<li><strong>percentile: </strong>the various percentiles making up the userbase, as well as total metrics for the entire userbase </li>
<li><strong>various stats on tweets of these different types:</strong> the types match those we’ve already encountered in the previous blog post, and stats on these tweet types are provided in each case as total numbers, and as a percentage of the number of tweets posted by the user percentile in question
<ul>
<li>original tweets </li>
<li>@replies </li>
<li>genuine @replies </li>
<li>retweets </li>
<li>unedited retweets </li>
<li>edited retweets </li>
<li>tweets containing URLs </li>
</ul>
</li>
</ul>
<p>&#160; <br />Again, too, these figures will add up to the total:</p>
<ul>
<li>edited retweets + unedited retweets = retweets </li>
<li>retweets + genuine @replies = @replies </li>
<li>original tweets + genuine @replies + retweets = total number of tweets </li>
</ul>
<p>&#160; <br />and</p>
<ul>
<li>% edited retweets + % unedited retweets = % retweets </li>
<li>% original tweets + % genuine @replies + % retweets = 100% </li>
</ul>
<p>&#160; <br />(with tweets containing URLs again constituting a separate category, since any type of tweet may also contain URLs).</p>
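<p>To make these identities concrete, here is a minimal Python sketch (not metrify.awk’s own code) which tallies tweet types per percentile from pre-classified records. The record layout, the group labels and the type names are assumptions for illustration. The sketch relies on the fact that every retweet format counted here contains an @user, so all @replies are the sum of genuine @replies and retweets.</p>

```python
from collections import Counter

# Hypothetical record layout: (percentile_group, tweet_type, has_url), one per tweet.
# tweet_type is assumed to be one of: "original", "genuine_reply", "unedited_rt", "edited_rt".


def percentile_stats(records):
    """Counts and percentages of each tweet type per percentile group, plus a 'total' row."""
    records = list(records)
    stats = {}
    groups = sorted({group for group, _, _ in records})
    for group in groups + ["total"]:
        subset = [r for r in records if group == "total" or r[0] == group]
        if not subset:
            continue
        n = len(subset)
        types = Counter(kind for _, kind, _ in subset)
        counts = {
            "original": types["original"],
            "genuine_replies": types["genuine_reply"],
            "unedited_retweets": types["unedited_rt"],
            "edited_retweets": types["edited_rt"],
            "urls": sum(1 for _, _, has_url in subset if has_url),
        }
        counts["retweets"] = counts["unedited_retweets"] + counts["edited_retweets"]
        # Every counted retweet format contains @user, so @replies = retweets + genuine @replies.
        counts["replies"] = counts["retweets"] + counts["genuine_replies"]
        # Identity from the text: holds only if each tweet has exactly one of the four types.
        assert counts["original"] + counts["genuine_replies"] + counts["retweets"] == n
        stats[group] = {
            "tweets": n,
            "counts": counts,
            "pct": {k: 100.0 * v / n for k, v in counts.items()},
        }
    return stats
```

<p>URLs are deliberately left out of the identity check, because any type of tweet may contain a link.</p>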
<h2>Some Results</h2>
<p>Applying this to our #auspol dataset, here’s what the activities of the different user percentiles look like:</p>
<p><a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image.png"><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="295" alt="image" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb.png" width="487" border="0" /></a> </p>
<p>We’re clearly seeing some very significant differences between the various percentile groups here. Interaction amongst the top 1% of most active users is especially discursive, with more than 55% of all of their tweets constituting genuine @replies: these people are very actively talking to (or at) one another. </p>
<p>The next lower group of active users, by contrast, doesn’t engage as much: only one third of their tweets are genuine @replies, but nearly 39% are original tweets. They’re more active at posting their own views and comments, rather than responding to others – or at least (and this is important to keep in mind with any such metrics), they’re less in the habit of also marking their @replies with the #auspol hashtag. The top group, on the other hand, are much more overtly <em>performing</em> their conversations, making them visible to all followers of #auspol; the second group may well send their own @replies, but if those @replies don’t contain the hashtag #auspol, they’re less visible to others and not included in our hashtag dataset. </p>
<p>Finally, too, the least active 90% of users are participating differently again: some 52% of their tweets are retweets, so (given that they’re not posting to #auspol that often in the first place) they’re probably more likely to be present here simply as ‘drive-by’ retweeters who occasionally pass along an interesting #auspol-tagged message that shows up in their <em>Twitter</em> feeds, but don’t deliberately follow the continuing #auspol conversation itself.</p>
<p><a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image1.png"><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="300" alt="image" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb1.png" width="487" border="0" /></a> </p>
<p>There are two more useful statistics to examine for #auspol, and I’ve combined them in the graph above. The first is the percentage of the total volume of #auspol tweets that each group is responsible for (shown here in blue): the one percent of most active users – a total of 142 <em>Twitter</em> users, for the period we’re looking at – accounts for a staggering <em>62%</em> of all #auspol tweets. In other words, Australian political discussion on <em>Twitter</em>, under the #auspol banner, is dominated by a vanishingly small group of users whose output is massively disproportionate to the size of the group. Compare this with the least active 90%: those 12,700 users contribute less than 9% of all #auspol posts. Quite a difference – #auspol shows a very strong long-tail distribution amongst its active participants, then. (This is very different for many of the crisis-related hashtags we’ve looked at, by the way: the top 1% of most active users in #qldfloods, for example, are responsible for less than 17% of all tweets; the least active 90% of #qldfloods users for nearly 57%.)</p>
<p>Second, the distribution of tweets containing URLs is also interesting here. We already know that the lowest 90% are more likely to retweet than post their own commentary or @replies – and it looks like many of those retweets are of posts containing URLs: some 37% of all tweets by the bottom 90% include links. By contrast, the discursive few at the top of the activity scale include URLs in only 18% of their tweets.</p>
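<p>The arithmetic behind these two statistics is simple, but dividing a group’s share of the tweets by its share of the users makes the long-tail effect easier to see at a glance. A minimal Python sketch, assuming per-group totals of users, tweets and URL-bearing tweets are already to hand (for instance from the second metrify.awk table); the dictionary layout and the ‘overrepresentation’ ratio are illustrative additions, not metrify.awk output columns:</p>

```python
def volume_shares(groups):
    """
    groups: dict mapping a percentile label to {"users": ..., "tweets": ..., "url_tweets": ...}.
    Returns each group's share of users and of tweets, their ratio, and the group's URL rate.
    """
    total_users = sum(g["users"] for g in groups.values())
    total_tweets = sum(g["tweets"] for g in groups.values())
    result = {}
    for label, g in groups.items():
        user_share = 100.0 * g["users"] / total_users
        tweet_share = 100.0 * g["tweets"] / total_tweets
        result[label] = {
            "user_share": user_share,
            "tweet_share": tweet_share,
            # Above 1: the group posts more than its head-count alone would suggest.
            "overrepresentation": tweet_share / user_share,
            # Percentage of this group's own tweets that contain URLs.
            "url_pct": 100.0 * g["url_tweets"] / g["tweets"],
        }
    return result
```

<p>With the #auspol shares quoted above (142 of 14,133 users, about 1%, posting 62% of all tweets), the ratio for the top group comes to roughly 62, against about 0.1 for the least active 90%.</p>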
<h2>Percentile Metrics, Compared</h2>
<p>But beyond these metrics for the various user percentiles in individual hashtags, we can also compare these findings <em>across</em> different hashtag datasets – and that’s where things get <em>really</em> interesting. There are very many possible comparisons here: how do the individual percentiles of users compare across the different hashtags (something I’ve already hinted at above, comparing the relative contribution of the top 1% in #auspol and #qldfloods, for example), which hashtags contain more @replies, retweets, URLs, etc.?</p>
<p>We’ve only scratched the surface on these broader comparisons, but one very interesting pattern which has already emerged is shown in the graph below (which remains preliminary; one of my plans for the next month or so is to develop this further):</p>
<p><a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image2.png"><img title="image" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="451" alt="image" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb3.png" width="441" border="0" /></a> </p>
<p>Here, we’re comparing the <em>total</em> metrics (for all users, rather than for specific percentiles) across a range of different hashtags: #qldfloods, #eqnz, the Japanese #tsunami, #libya, the #londonriots, #ukriots, and #riotcleanup, the #royalwedding, election nights in Australia and Ireland (#ausvotes and #ge11), the Tour de France (#tdf), #eurovision, and #wikileaks. The size of each point on the graph shows the total size of the userbase for each hashtag – so, the #royalwedding and the #tsunami attracted a vastly larger <em>Twitter</em> userbase (of around half a million unique users each) than the Irish election or Queensland floods, for example.</p>
<p>But what the graph shows is that regardless of the size of the userbase, there are some very obvious patterns here. All of the crisis events are characterised by a large number of both (unedited) retweets and tweets sharing links; people are actively finding and disseminating information. All of the widely televised events, on the other hand, have very few URLs, and only marginally more retweets: <em>Twitter</em> may be used as a backchannel for the television, in a shared experience of audiencing, but there’s not much additional information sharing going on here. #wikileaks, in turn, is a different story altogether – but perhaps we’ll come across more hashtags with similar metrics, and it will turn out to be the first sign of a third major category.</p>
<p>I’m reluctant to read too much more into these patterns as yet: I’ll need to do some more work cleaning up the datasets which the graph above is based on (working out which exact periods of time to use for each hashtag), and to try comparing a few more combinations of metrics. Still, I do think there’s a first sign in this of much more fundamental patterns in how <em>Twitter</em> hashtags are used for specific purposes. But that’s a longer discussion for another time.</p>
<p>And we haven’t yet exhausted all the possibilities which metrify.awk itself offers. In addition to the time- and/or percentile-based metrics which we’ve discussed over these last couple of posts, it also calculates metrics for each individual user in the dataset. And that’s what we’ll look at <a href="http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-4/">in the final instalment in this series</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-3/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Taking Twitter Metrics to a New Level (Part 2)</title>
		<link>http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-2/</link>
		<comments>http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-2/#comments</comments>
		<pubDate>Mon, 02 Jan 2012 01:46:30 +0000</pubDate>
		<dc:creator>Snurb</dc:creator>
				<category><![CDATA[Methods]]></category>
		<category><![CDATA[Tools]]></category>
		<category><![CDATA[#qldfloods]]></category>
		<category><![CDATA[Gawk]]></category>
		<category><![CDATA[hashtags]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-2/</guid>
		<description><![CDATA[Update: I’ve clarified/corrected some of the details relating to the percentile metrics contained in the first table which metrify.awk generates. Update 2: revision 1.2 of metrify.awk adds further functionality in addition to what is described below. These changes are detailed here. In the previous post, I’ve introduced metrify.awk, our new multi-purpose tool for generating Twitter [...]]]></description>
			<content:encoded><![CDATA[<p><font color="#ff0000"><strong>Update:</strong> I’ve clarified/corrected some of the details relating to the percentile metrics contained in the first table which metrify.awk generates.</font></p>
<p><font color="#ff0000"><strong>Update 2:</strong> revision 1.2 of metrify.awk adds further functionality in addition to what is described below. <a href="http://www.mappingonlinepublics.net/2012/01/31/more-twitter-metrics-metrify-revisited/">These changes are detailed here.</a></font></p>
<p>In <a href="http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-1/">the previous post</a>, I introduced <a href="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/metrify.zip">metrify.awk</a>, our new multi-purpose tool for generating <em>Twitter</em> metrics. Over the next instalments in this series of posts, I’ll take you through the results it produces. And seeing as we’re coming up to the anniversary of the January 2011 south-east Queensland floods, and as I needed to generate those metrics anyway for a report on social media in the floods which we’re publishing soon, I’ll be using an archive of #qldfloods tweets between 10 and 17 January 2011 as an example here.</p>
<p>I’m running metrify.awk as follows for this:</p>
<blockquote><p>gawk -F , -f metrify.awk divisions=90,99 time=day qldfloods.csv &gt;qldfloods-metrics.csv</p>
</blockquote>
<p>In other words, we’re using a 1/9/90 division of users, and we’re tracking activities per day; the skipusers switch is not set, so full stats for all users will be generated.</p>
<p><span id="more-987"></span><br />
<h2>Metrics over Time</h2>
<p>The output file from this, qldfloods-metrics.csv, contains three separate data tables in the same spreadsheet, which I’m now loading into Excel. The first of these contains the following information:</p>
<ul>
<li><strong>day</strong> (in my case; otherwise minute, hour, month or year, depending on the <em>time</em> setting)<strong>:</strong> each time period covered by the dataset </li>
<li><strong>tweets:</strong> total number of tweets for that period </li>
<li><strong>users:</strong> total number of unique users posting tweets for that period </li>
<li><strong>various stats on tweets of these different types: </strong>these are provided as stats per user, as total numbers, and as percentages of the total number of tweets for each period
<ul>
<li>original tweets: tweets which are neither @reply nor retweet </li>
<li>retweets: manual retweets which contain any of <em>RT @user…</em> / “<em>@user…</em> / <em>MT @user</em> / <em>via @user</em> </li>
<li>unedited retweets: manual retweets which <em>start with</em> any of <em>RT @user…</em> / “<em>@user…</em> / <em>MT @user</em> / <em>via @user</em> </li>
<li>edited retweets: manual retweets which contain, but <em>don’t start with</em> any of <em>RT @user…</em> / “<em>@user…</em> / <em>MT @user</em> / <em>via @user</em> </li>
<li>genuine @replies: tweets which contain <em>@user</em>, but are not retweets </li>
<li>URLs: tweets which contain URLs </li>
</ul>
</li>
<li><strong>stats for the various percentiles of users:</strong> in my example, following the 1/9/90 division
<ul>
<li>lowest 90% users (&lt;= <em>a</em> tweets) <font color="#ff0000">as a percentage of the total number of users</font> </li>
<li>users &gt; 90% (&gt; <em>a</em> tweets; <em>x</em> of <em>n</em> users) <font color="#ff0000">as a percentage of the total number of users</font> </li>
<li>users &gt; 99% (&gt; <em>b</em> tweets; <em>y</em> of <em>n</em> users) <font color="#ff0000">as a percentage of the total number of users</font> </li>
<li><font color="#ff0000"><strong>(further stats for those user percentiles were introduced in metrify 1.2 – </strong></font><a href="http://www.mappingonlinepublics.net/2012/01/31/more-twitter-metrics-metrify-revisited/"><font color="#ff0000"><strong>details are here</strong></font></a><font color="#ff0000"><strong>)</strong></font></li>
</ul>
</li>
</ul>
<p>&#160; <br /><strong>Some more side notes are required here:</strong> first, as you already know, <em>Twapperkeeper</em> / <em>yourTwapperkeeper</em> does not capture ‘button’ retweets – so all we can examine in the retweet department are ‘manual’ retweets. We count tweets as retweets if they follow any of the four formats listed above (<em>RT</em> = retweet, <em>“@user</em> = quoted tweet, <em>MT</em> = manual retweet, <em>via @user</em>); between them, these formats capture the overwhelming majority of retweets, but some very unusual retweeting formats will slip through the cracks. We also distinguish between edited and unedited retweets simply by checking whether the tweet in question starts with these retweet indicators, or not; that’s the only reliable way of checking without entering vastly more complicated territory. Again, this will miss retweets where the retweeting user added comments at the end of the retweet; these will be (incorrectly) counted as unedited retweets.</p>
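<p>The classification rules described above can be sketched in a few lines of Python. This is an approximation under stated assumptions, not a port of metrify.awk: the regular expressions, the case-insensitive matching and the simple http(s) URL test are all assumptions, and the real script’s matching may differ in detail. It shares the limitation just noted: a retweet with a comment appended at the end still starts with a retweet marker, and so is counted as unedited.</p>

```python
import re

# Assumed approximations of the four retweet formats (RT @user, "@user, MT @user, via @user).
# Case-insensitive matching is an assumption; metrify.awk may treat case differently.
RT_MARKER = re.compile(r'(?:\bRT @\w+|\bMT @\w+|\bvia @\w+|["“]@\w+)', re.IGNORECASE)
# An @mention must start the text or follow a non-word character (skips e-mail addresses).
MENTION = re.compile(r'(?:^|\W)@\w+')
# Simplified URL test: anything beginning with http:// or https://.
URL = re.compile(r'https?://\S+', re.IGNORECASE)


def classify(text):
    """Return (tweet_type, has_url) for the text of a single tweet."""
    stripped = text.lstrip()
    has_url = bool(URL.search(stripped))
    if RT_MARKER.search(stripped):
        # Starts with a retweet marker: unedited; contains one elsewhere: edited.
        kind = "unedited_rt" if RT_MARKER.match(stripped) else "edited_rt"
    elif MENTION.search(stripped):
        kind = "genuine_reply"
    else:
        kind = "original"
    return kind, has_url
```

<p>Checking the retweet markers before plain @mentions matters here: every retweet format also contains an @user, so testing for @replies first would misfile all retweets as genuine @replies.</p>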
<p>These different tweet types will always add up to the total:</p>
<ul>
<li>edited retweets + unedited retweets = retweets </li>
<li>original tweets + genuine @replies + retweets = total number of tweets </li>
</ul>
<p>&#160; <br />and</p>
<ul>
<li>% edited retweets + % unedited retweets = % retweets </li>
<li>% original tweets + % genuine @replies + % retweets = 100% </li>
</ul>
<p>&#160; <br />and</p>
<ul>
<li>edited retweets:user + unedited retweets:user = retweets:user </li>
<li>original tweets:user + genuine @replies:user + retweets:user = tweets:user </li>
</ul>
<p>&#160; <br />(The odd ones left out from this are the stats on URLs, since URLs may be contained in original tweets as much as in @replies or retweets.)</p>
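<p>As a rough illustration of how the per-period rows of this first table fit together, the following Python sketch aggregates pre-classified tweets by time period. The input layout and type labels are assumptions, and metrify.awk’s own column names and rounding will differ. The point is simply that the per-user ratios divide each count by the period’s unique users, and the percentages divide it by the period’s tweets, which is why the identities above hold.</p>

```python
from collections import defaultdict

# Hypothetical input: (period, user, tweet_type, has_url) per tweet, with tweet_type one of
# "original", "genuine_reply", "unedited_rt", "edited_rt" and period already bucketed (e.g. a date).


def period_metrics(tweets):
    """One row per period: totals, per-user ratios and percentages for each tweet type."""
    by_period = defaultdict(list)
    for period, user, kind, has_url in tweets:
        by_period[period].append((user, kind, has_url))
    rows = {}
    for period in sorted(by_period):
        items = by_period[period]
        n = len(items)
        users = len({user for user, _, _ in items})
        counts = {
            "original": sum(kind == "original" for _, kind, _ in items),
            "genuine_replies": sum(kind == "genuine_reply" for _, kind, _ in items),
            "unedited_retweets": sum(kind == "unedited_rt" for _, kind, _ in items),
            "edited_retweets": sum(kind == "edited_rt" for _, kind, _ in items),
            "urls": sum(bool(has_url) for _, _, has_url in items),
        }
        counts["retweets"] = counts["unedited_retweets"] + counts["edited_retweets"]
        rows[period] = {
            "tweets": n,
            "users": users,
            "counts": counts,
            # Per-user ratios divide by unique users; percentages divide by tweets.
            "per_user": {k: v / users for k, v in counts.items()},
            "pct": {k: 100.0 * v / n for k, v in counts.items()},
        }
    return rows
```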
<p>Second, this is where the stats for our (in my case, three) user percentiles make their first appearance. In my example, the following three column headings appear in the table:</p>
<ul>
<li>lowest 90% users (&lt;= 4 tweets) </li>
<li>users &gt; 90% (&gt; 4 tweets; 1670 of 15581 users) </li>
<li>users &gt; 99% (&gt; 18 tweets; 177 of 15581 users)</li>
<li><font color="#ff0000"><strong>(further stats for those user percentiles were introduced in metrify 1.2 – </strong></font><a href="http://www.mappingonlinepublics.net/2012/01/31/more-twitter-metrics-metrify-revisited/"><font color="#ff0000"><strong>details are here</strong></font></a><font color="#ff0000"><strong>)</strong></font>&#160;</li>
</ul>
<p>&#160; <br />This already provides us with some information about how the percentiles ended up being defined in this case (more detailed information appears in the second table generated by metrify.awk – more on that later). First, the activity cutoffs: the least active 90% of users were defined as users who contributed 4 tweets or less to the total dataset; the middle group contributed more than four and up to 18 tweets; the most active 1% of users contributed more than 18 tweets over the entire duration covered by the dataset.</p>
<p>Additionally, we also see the numbers of users included in each group: 177 users posted more than 18 tweets; another 1670 users posted more than 4 and up to 18 tweets, and the rest (15581 – 1670 – 177 = 13734) posted 4 tweets or less. This also exemplifies the slight size creep which I’ve mentioned before: the 177 users in the top group are actually 1.14% of the total group (rather than 1%), the 1670 in the next lot are 10.72% (rather than 9%). If the creep gets too big for your liking, you could adjust the division cutoffs slightly (I could have used <em>divisions=91,99</em> as a parameter to try to make the middle group smaller, for example).</p>
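<p>One plausible source of that size creep is ties: tweet counts are small whole numbers, so many users share exactly the same count at a percentile boundary, and they all end up on the same side of it. The Python sketch below shows one way of deriving cutoffs that produces this effect. It is an assumption for illustration, not necessarily how metrify.awk computes its divisions, and the function names and return layout are hypothetical.</p>

```python
from collections import Counter


def percentile_groups(tweet_counts, divisions=(90, 99)):
    """
    tweet_counts: dict mapping each user to their number of tweets in the whole dataset.
    Returns (cutoffs, groups). cutoffs[i] is the minimum tweet count needed to clear
    divisions[i]; groups maps each user to how many cutoffs they clear (0 = least active).
    """
    ranked = sorted(tweet_counts.values())
    n = len(ranked)
    cutoffs = []
    for d in divisions:
        # Rank position where the nominal top (100 - d)% of users would begin.
        k = min(n - 1, n * d // 100)
        # Everyone tied with that position is pulled into the upper group,
        # which is why the group can end up larger than (100 - d)% of users.
        cutoffs.append(ranked[k])
    groups = {user: sum(c >= t for t in cutoffs) for user, c in tweet_counts.items()}
    return cutoffs, groups


def group_sizes(groups):
    """Percentage of all users in each group, useful for spotting size creep."""
    sizes = Counter(groups.values())
    n = len(groups)
    return {g: 100.0 * sizes[g] / n for g in sorted(sizes)}
```

<p>Under this sketch’s convention, a cutoff of 19 tweets would appear in a column heading as “&gt; 18 tweets”, since tweet counts are whole numbers.</p>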
<p>At any rate, what the data in these columns track is <strike>what percentage of the total volume of tweets for each time period is contributed by each of the user percentiles</strike> <font color="#ff0000">the percentage of the total number of unique users during each period which belong to each of the percentile groups</font> – in other words, the extent to which any of these groups dominate the hashtag feed at any one point. Note that which users get to be in which percentile is determined once, for the entire dataset, rather than on a per-time period basis: what these columns indicate, therefore, is how <strike>active</strike> <font color="#ff0000">present</font> the <em>overall</em> lead (and other) user groups are in each time period, rather than how much a changing <em>current</em> group of most active users have contributed in each time period.</p>
<p><font color="#ff0000"><strong>(Again, please note that further stats for those user percentiles were introduced in metrify 1.2 – </strong></font><a href="http://www.mappingonlinepublics.net/2012/01/31/more-twitter-metrics-metrify-revisited/"><font color="#ff0000"><strong>details are here</strong></font></a>.<font color="#ff0000"><strong>)</strong></font></p>
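<p>The corrected definition is easy to get wrong when recomputing these columns by hand, so here is a minimal Python sketch of it. Group membership is fixed once for the whole dataset, and for each period we count what share of that period’s unique users fall into each group. The input layout is an assumption; this is not metrify.awk’s code, and it does not cover the additional percentile stats added in metrify 1.2.</p>

```python
from collections import defaultdict


def group_presence(tweets, groups):
    """
    tweets: iterable of (period, user) pairs, one per tweet.
    groups: dict mapping user to a percentile label fixed for the whole dataset.
    Returns {period: {label: % of that period's unique users belonging to the group}}.
    """
    users_by_period = defaultdict(set)
    for period, user in tweets:
        users_by_period[period].add(user)
    labels = sorted(set(groups.values()))
    result = {}
    for period in sorted(users_by_period):
        active = users_by_period[period]
        result[period] = {
            label: 100.0 * sum(groups[u] == label for u in active) / len(active)
            for label in labels
        }
    return result
```

<p>Because each user is counted once per period, a lead user posting twenty tweets in a day weighs no more here than a one-off retweeter; that is the difference between this ‘presence’ measure and the struck-through tweet-volume reading.</p>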
<h2>Some Results</h2>
<p>Time for some first results from this table, then. What these data allow us to do is already quite useful, and I’ll only provide a handful of examples here; you can experiment further on your own. Using my #qldfloods data, and selecting just this first table of metrics from the metrify.awk output, I’ll create a pivot table in Excel, which enables me to plot various metrics over time, for example:</p>
<p><img title="image_thumb[2]" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="295" alt="image_thumb[2]" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb2.png" width="487" border="0" /> </p>
<p>This first graph simply shows that the number of unique participating users, and the volume of tweets posted under the hashtag #qldfloods, move together over time; for most hashtags, that’s what you’d expect to see, I think.</p>
<p><img title="image_thumb[4]" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="295" alt="image_thumb[4]" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb4.png" width="521" border="0" /> </p>
<p>Next, we see how different types of tweets contribute to the overall volume of tweets. Retweets (which I haven’t divided into edited and unedited retweets here) are quite prominent at the start of the crisis, as everyone is looking to share what little information is already available, and gradually drop towards the end, as more information becomes available and retweeting isn’t as important any more. (There’s a big tick up on the last day, but the overall volume of tweets is very low then, so this may be an outlier.) @replies, on the other hand, gradually rise, perhaps because there’s a shift from simply sharing news and information to discussing how best to organise the recovery effort. URLs also rise gradually – possibly a sign of more and better information becoming available.</p>
<p><img title="image_thumb[8]" style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="295" alt="image_thumb[8]" src="http://www.mappingonlinepublics.net/wp-content/uploads/2012/01/image_thumb8.png" width="521" border="0" /> </p>
<p>Finally, a look at our user percentiles: what we see here is that the ‘lead’ users aren’t actually that <strike>active</strike> <font color="#ff0000">prominent</font>, especially during the busiest days for the hashtag (11-13 January): on those days, even the top two user percentiles combined don’t account for more than 20% of all <strike>messages</strike> <font color="#ff0000">unique users</font>. This shouldn’t be misunderstood to mean that these top users were being drowned out by the <em>hoi polloi</em>, though: rather – given what we’ve already found out about retweeting rates in the previous graph – much of what the least active 90% of users were doing during these days was to retweet the messages of those lead users. (From all we’ve seen so far, this is a pattern common to crisis-related hashtags; it may be very different for a non-crisis case.)</p>
<p>We’ll see more evidence of this, in fact, when we turn to the next metrics table produced by metrify.awk – <a href="http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-3/">in the next post in this series</a>…</p>
]]></content:encoded>
			<wfw:commentRss>http://www.mappingonlinepublics.net/2012/01/02/taking-twitter-metrics-to-a-new-level-part-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

