TechPolicy.ca Data mining politics and public policy. The politics of data mining.

2 Mar 2010

Visualizing Networks in JavaScript

Continuing my exploration of JavaScript-based data visualization, I've created a basic network visualizer for the MP data I'm collecting. Below is a social network of all the Canadian federal ministers who have been mentioned together in various press and social media sources in the last week.

Note that the size of the node represents the number of articles mentioning the MP in the past week.

If you want the source code or if the visualization does not work, please e-mail me.

7 Feb 2010

Tracking the Press: Minister Networks

About a week ago, I discussed tracking Canadian MPs based on the number of times they get mentioned in various news media, and who they get mentioned with. At the time, I only showed a chart of mentions, and discussed some shortcomings of the approaches used for tracking politicians -- or, for that matter, any brands.

I've been improving my tracking software and working on new visualizations. The work has culminated in the network below, and a high-quality PDF version is also available:

This network tracks Canadian federal ministers in various blogs, magazines, and newspapers. The size of the circle with the minister's name represents the number of articles (i.e. the larger the circle, the more articles), while a connection exists between ministers if they have been mentioned together in at least one article or blog post over the last week.
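The construction described above can be sketched in a few lines of Python. This is only an illustration, not the software behind the network: the `articles` list and the `build_network` helper are hypothetical, and it assumes each article has already been reduced to the set of ministers it mentions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: one set of minister names per article or blog post.
articles = [
    {"Harper"},
    {"Harper"},
    {"Harper", "Flaherty"},
    {"Flaherty", "Prentice", "Clement"},
]

def build_network(articles):
    """Return (node_sizes, edges) for a co-mention network.

    node_sizes maps each minister to the number of articles mentioning them.
    edges holds one (a, b) pair for every two ministers who share an article.
    """
    node_sizes = Counter()
    edges = set()
    for mentioned in articles:
        node_sizes.update(mentioned)
        # Sort so that each pair is stored in a single canonical order.
        for a, b in combinations(sorted(mentioned), 2):
            edges.add((a, b))
    return node_sizes, edges

node_sizes, edges = build_network(articles)
```

In this toy data, Harper has the largest node but only one connection, while Flaherty has fewer articles and more connections, which is the same pattern discussed below.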

Such a network representation provides very useful information about press coverage of Canadian ministers. A great example is that Prime Minister Stephen Harper gets mentioned very often relative to other ministers, but is not mentioned often with other ministers. Tony Clement or Jim Prentice, on the other hand, get mentioned with more ministers, but have fewer articles about them.

One thing the network does not show, however, is how often the co-mentions occur. It's possible, for example, that a set of five or six ministers was mentioned in one article, and this would create something like the dense set of connections with ministers Flaherty, Prentice, Clement, and others. More information would be necessary to analyze whether this is the case or not.
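One way to get at that missing information would be to count how many articles each pair of ministers shares, rather than only recording that a connection exists. The sketch below assumes, hypothetically, that each article is available as a set of minister names; `co_mention_counts` is an illustrative helper, not part of the tracking software.

```python
from collections import Counter
from itertools import combinations

def co_mention_counts(articles):
    """Count, for each pair of ministers, the articles mentioning both."""
    counts = Counter()
    for mentioned in articles:
        # Sorted pairs give one canonical key per pair of ministers.
        counts.update(combinations(sorted(mentioned), 2))
    return counts

# Hypothetical data: one article naming three ministers, one naming two.
articles = [
    {"Flaherty", "Prentice", "Clement"},
    {"Flaherty", "Prentice"},
    {"Harper"},
]
counts = co_mention_counts(articles)
```

A pair with a count of 1 may come from a single article that lists many ministers at once, while a higher count suggests the two are repeatedly discussed together.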

Stay tuned for more updates on the software. I also hope to have a website set up where this is all done automatically and people can peruse social media surrounding Canadian politics.

1 Feb 2010

Tracking the Press: MPs in Canada

In my last post, I discussed different approaches to social media mining. While I am currently working on complex approaches to mining information in blogs, newspapers, and other forms of social and news media, even simple approaches can yield interesting information.

The graph below plots, for each Canadian Member of Parliament (MP), the number of articles that mention the MP against the number of different MPs mentioned alongside that MP. For example, if MP A is mentioned in 30 articles, and across those articles two other MPs are also mentioned, then A would be located at point (30, 2). Note that the graph follows posts over the week ending on January 31.
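As a rough illustration of how such points could be computed, the sketch below assumes each article has been reduced to the set of MPs it mentions. The `mention_points` helper and the sample data are hypothetical, chosen to reproduce the (30, 2) example.

```python
def mention_points(articles):
    """Map each MP to (articles mentioning them, distinct co-mentioned MPs)."""
    article_counts = {}
    co_mentioned = {}
    for mentioned in articles:
        for mp in mentioned:
            article_counts[mp] = article_counts.get(mp, 0) + 1
            # Every other MP in the same article counts as co-mentioned.
            co_mentioned.setdefault(mp, set()).update(mentioned - {mp})
    return {mp: (article_counts[mp], len(co_mentioned[mp]))
            for mp in article_counts}

# Hypothetical data: A appears in 30 articles, two of which name another MP.
articles = [{"A"}] * 28 + [{"A", "B"}, {"A", "C"}]
points = mention_points(articles)
```

Here `points["A"]` is `(30, 2)`, while B and C each sit at `(1, 1)`.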

What's interesting about this graph is that it shows the centrality of MPs to political discussions. As one would expect, Stephen Harper is mentioned fairly often and in relation to many other MPs. The same is true for Michael Ignatieff. While we lose a great deal of information by not reading the articles themselves, it is instructive to see how observing the information in aggregate helps elucidate the underlying social and political structure of Canada.

Note that Stockwell Day is seemingly mentioned in a great many articles, but this is an artifact of the data collection process. Specifically, "day" is a common word that appears in articles whether or not they mention the MP. I wanted to leave this data point in, however, to show how developing tools for press and media tracking is often more difficult than one would expect. The initial software for downloading newspaper or blog articles and counting words is seemingly straightforward to build. However, many practical hurdles often hamper the process. Differentiating between "Day" the politician and "day" the common noun is but one example.
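To make the problem concrete, the sketch below contrasts a naive case-insensitive surname count with a stricter full-name match. Both helpers are hypothetical illustrations rather than the approach used in the tracking software, and the stricter match is only one possible mitigation: it would miss articles that refer to the MP by surname alone.

```python
import re

def naive_mentions(text, surname):
    """Count case-insensitive whole-word matches of a surname."""
    pattern = r"\b" + re.escape(surname) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def full_name_mentions(text, full_name):
    """Count case-sensitive matches of the full name only."""
    pattern = r"\b" + re.escape(full_name) + r"\b"
    return len(re.findall(pattern, text))

# Hypothetical sample text.
text = ("Every day this week, Stockwell Day met reporters. "
        "The next day he left Ottawa.")

naive = naive_mentions(text, "day")
strict = full_name_mentions(text, "Stockwell Day")
```

On this sample, the naive count is inflated by the common noun, while the full-name match finds the single real mention.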

7 Nov 2009

Types of Bloggers

I recently gave a talk at Brunel University. It was about 40 minutes long and focused on my work in data mining the political blogosphere. While I won't discuss most of the work in this post, one area that really got me excited was categorizing bloggers by their posting habits.

I haven't done any formal work in this area yet, but I've plotted a few graphs showing how different bloggers post in the context of the 2008 US Presidential election. The graphs show the number of posts on the blog within a seven-day period. The election itself takes place around day 175.
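A minimal sketch of this kind of count is shown below. It assumes posts are recorded as day offsets from the start of the observation period and groups them into non-overlapping seven-day bins; the graphs may instead have used a sliding window, and `weekly_post_counts` is a hypothetical helper.

```python
from collections import Counter

def weekly_post_counts(post_days, period=7):
    """Count posts in consecutive, non-overlapping bins of `period` days.

    post_days holds one non-negative day offset per post.
    """
    counts = Counter(day // period for day in post_days)
    if not counts:
        return []
    # Fill in bins with no posts so the series has no gaps.
    return [counts.get(i, 0) for i in range(max(counts) + 1)]

# Hypothetical posting days for one blog.
series = weekly_post_counts([0, 1, 6, 7, 22])
```

Here `series` is `[3, 1, 0, 1]`: three posts in the first week, one in the second, none in the third, and one in the fourth.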

What really jumps out at me with these graphs is that bloggers are very different, but that some intuitive categories exist. The obvious one is a very active blogger regardless of political context or external events. The graph below shows one such example.

[Graph jpeg-9091-sm: posts per seven-day period for a very active blogger]

Another example is a single issue blogger. These could include blogs focusing on the "Palin for VP" or "Clinton for President" campaigns. Such bloggers tend to be very active around the time when there is most hope for success in such a campaign, and activity drops off when the campaign shuts down or fails.

[Graph jpeg-8196-sm: posts per seven-day period for a single issue blogger]

While the single issue blogger above seems to ramp up and then die down slowly, some bloggers have much more obvious swings in activity. An obvious "issue" is the presidential election itself. While very active blogs were active throughout the entire period, some of the less popular and less active blogs had a big increase in activity around November 2008. This is shown in the example below.

[Graph jpeg-7527-sm: posts per seven-day period for a blog with a spike in activity around November 2008]

I'm not quite sure where this research can go, but I have a few ideas. My broader research focus is social influence in the blogosphere, and I imagine being able to categorize blogs using mathematical definitions based on the above would certainly help my work.