Resolving Short URLs: A New Approach
When working with Twitter data, one of the most interesting questions is always which URLs tweets are linking to. As Twitter users discuss any given topic or issue, the URLs they share indicate the online media they’re drawing on for information and/or entertainment (and by counting which sites appear most frequently, we’re also able to measure the relative visibility or relevance of such sites).
But of course, there’s a complication: the vast majority of URLs in tweets have been shortened using a variety of URL shorteners, and multiple short URLs may point to the same eventual target. It’s even possible, and not too uncommon, for shorteners to be nested: for example, a bit.ly short URL might subsequently be shortened by ow.ly, and finally by t.co, in the course of retweeting. Working with the short URLs themselves is therefore of little use, and we must find ways to resolve them to their eventual targets.
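To make the nesting problem concrete, here is a minimal Python sketch of one way to unwind a chain of shorteners: request each URL, read the redirect target from the `Location` header, and repeat until no further redirect is offered. This is an illustration under stated assumptions, not necessarily the approach this post goes on to describe. The names `fetch_location`, `resolve`, and `count_domains` are hypothetical, as is the limit of ten hops, and the sketch assumes that each shortener answers a HEAD request with a standard HTTP redirect.

```python
from collections import Counter
from urllib.parse import urljoin, urlsplit
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stops urllib from following redirects, so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def fetch_location(url, timeout=10):
    """Return the redirect target of `url`, or None if it does not redirect.

    Hypothetical helper: assumes the server answers HEAD requests.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout):
            return None  # a 2xx answer means we have reached a final page
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            return err.headers.get("Location")
        return None  # other HTTP errors: treat this URL as the end point
    except OSError:
        return None  # network failure or timeout: give up on this hop


def resolve(url, fetch=fetch_location, max_hops=10):
    """Follow nested short URLs one hop at a time to the eventual target."""
    seen = {url}
    for _ in range(max_hops):
        location = fetch(url)
        if not location:
            break
        next_url = urljoin(url, location)  # Location may be a relative URL
        if next_url in seen:  # guard against redirect loops
            break
        seen.add(next_url)
        url = next_url
    return url


def count_domains(urls, fetch=fetch_location):
    """Count how often each host appears once every URL has been resolved."""
    return Counter(urlsplit(resolve(u, fetch)).hostname for u in urls)
```

Because `resolve` accepts any function that maps a URL to its redirect target, it can be tried without network access by passing a plain lookup table, for example `resolve("https://t.co/a", fetch=table.get)`, where `table` is a dictionary of made-up short URLs that mirrors the t.co, ow.ly, bit.ly chain described above. Counting hosts only after resolution is what lets several different short URLs be credited to the same site.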