TechPolicy.ca Data mining politics and public policy. The politics of data mining.

27 May 2010

Canadian CPI: Visualization Brainstorm

After finishing the R prototype for data visualization, I've started abstracting the various methods necessary to create beautiful graphs. While there's no preliminary version of the R package yet, I think I've taken a number of exciting steps. These include:

  • Abstracting graph objects. Objects such as lines, scatter plots, and other graph types can all be treated in a similar fashion in JavaScript. I use this approach in the new version of the JavaScript graph presented below.
  • Including axes. The last graphs did not have axes, grid lines, or other information cues. These ones do. They have to be set manually, but this has an advantage: one can choose exactly which grid lines and axis points to show.
  • Interactivity. The graph below actually has useful interactive features. Mousing over points provides information on the value of the point itself, while mousing over the line plot provides the title. Nothing too complex, but already fairly useful.
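As a rough illustration of the three points above, here is a minimal sketch of what such an abstraction could look like. It is not the code behind the graph on this page: every name in it (makeSeries, makeAxis, toPixels, tooltipFor) is hypothetical, the data values are placeholders rather than real CPI figures, and the drawing code is left out entirely.

```javascript
// Hypothetical names throughout; this is a sketch, not the blog's actual code.

// Every graph object shares the same shape: a kind, a title, and points.
function makeSeries(kind, title, points) {
  return { kind, title, points };
}

// Axes are set by hand: the caller chooses exactly which ticks to show.
function makeAxis(label, min, max, ticks) {
  return { label, min, max, ticks };
}

// Map a data value to a pixel offset along an axis of the given length.
function toPixels(axis, value, lengthPx) {
  return ((value - axis.min) / (axis.max - axis.min)) * lengthPx;
}

// Tooltip text: a hovered point reports its value,
// while hovering the line itself (pointIndex === null) reports the title.
function tooltipFor(series, pointIndex) {
  if (pointIndex === null) return series.title;
  const p = series.points[pointIndex];
  return `${p.x}: ${p.y}`;
}

// Placeholder values only, not real CPI data.
const cpi = makeSeries("line", "Canadian CPI", [
  { x: 1, y: 10 },
  { x: 2, y: 20 },
]);
const yAxis = makeAxis("Index", 0, 40, [0, 20, 40]);

console.log(tooltipFor(cpi, 1));       // value of the hovered point
console.log(tooltipFor(cpi, null));    // title of the line
console.log(toPixels(yAxis, 10, 200)); // pixel offset for y = 10
```

Because lines and scatter plots share one shape, the same tooltip and scaling functions work for both; only the drawing step would need to differ by kind.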

I chose to present data on the Canadian consumer price index (CPI). This is freely available data and serves as a reminder of the major political issue of our time... While I don't want to make this post political, the ultimate goal of this blog is to use such visualizations and mathematical models to better understand public policy and the role of data mining therein. Might as well start referencing useful data in this regard.

So without further ado, here's the graph...



The next step is fairly clear: making the above possible in R!

7 Feb 2010

Tracking the Press: Minister Networks

About a week ago, I discussed tracking Canadian MPs based on the number of times they get mentioned in various news media, and who they get mentioned with. At the time, I only showed a chart of mentions, and discussed some shortcomings of the approaches used for tracking politicians -- or, for that matter, any brands.

I've been working on improving my tracking software and on new visualizations. The work has culminated in the network below; a high-quality PDF version is also available.

This network tracks Canadian federal ministers in various blogs, magazines, and newspapers. The size of the circle with the minister's name represents the number of articles mentioning that minister (i.e. the larger the circle, the more articles), while a connection exists between two ministers if they have been mentioned together in at least one article or blog post over the last week.
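The construction described above can be sketched in a few lines. This is only an illustration under assumptions, not the tracking software itself: it assumes each article has already been reduced to the list of ministers it mentions, and the names buildNetwork, articleCount, and edges are hypothetical. The sample articles are invented for the example.

```javascript
// Hypothetical input: each article is reduced to the ministers it mentions.
// These sample articles are made up for illustration.
const articles = [
  ["Harper"],
  ["Harper"],
  ["Clement", "Prentice"],
  ["Clement", "Flaherty"],
];

function buildNetwork(articleList) {
  const articleCount = new Map(); // minister -> number of articles (circle size)
  const edges = new Set();        // "A|B" keys, names sorted so each pair appears once

  for (const names of articleList) {
    // Count each minister once per article, even if named several times.
    const unique = [...new Set(names)].sort();
    for (const name of unique) {
      articleCount.set(name, (articleCount.get(name) || 0) + 1);
    }
    // One co-mention in one article is enough to create a connection.
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        edges.add(`${unique[i]}|${unique[j]}`);
      }
    }
  }
  return { articleCount, edges };
}

const network = buildNetwork(articles);
console.log(network.articleCount.get("Harper")); // circle size for Harper
console.log([...network.edges]);                 // list of connections
```

In this toy input, Harper gets the largest circle but no connections, while Clement has fewer articles than Harper's total share of solo coverage would suggest yet is linked to two other ministers.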

Such a network representation provides very useful information about press coverage of Canadian ministers. A great example is that Prime Minister Stephen Harper gets mentioned very often relative to other ministers, but is not mentioned often with other ministers. Tony Clement or Jim Prentice, on the other hand, get mentioned with more ministers, but have fewer articles about them.

One thing the network does not show, however, is how often the co-mentions occur. It's possible, for example, that a set of five or six ministers was mentioned in one article, and this would create something like the dense set of connections with ministers Flaherty, Prentice, Clement, and others. More information would be necessary to analyze whether this is the case or not.
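One way to get at this missing information would be to count how many articles support each connection. The sketch below is an assumption about how that could be done, not a feature of the current software; coMentionCounts is a hypothetical name, and the ministers are replaced by the placeholder labels A to E.

```javascript
// Count, for every pair of names, the number of articles mentioning both.
function coMentionCounts(articleList) {
  const counts = new Map(); // "A|B" -> number of articles containing both
  for (const names of articleList) {
    const unique = [...new Set(names)].sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]}|${unique[j]}`;
        counts.set(key, (counts.get(key) || 0) + 1);
      }
    }
  }
  return counts;
}

// A single article naming five people (placeholder labels, not real data).
const sampleArticles = [["A", "B", "C", "D", "E"]];
const weights = coMentionCounts(sampleArticles);

console.log(weights.size);               // number of connections created
console.log(Math.max(...weights.values())); // strongest connection
```

A single five-person article produces ten connections, each backed by only one article. A dense cluster whose edges all have weight one is therefore a hint that it may come from one or two long articles rather than from sustained joint coverage.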

Stay tuned for more updates on the software. I also hope to have a website set up where this is all done automatically and people can peruse social media surrounding Canadian politics.

1 Feb 2010

Tracking the Press: MPs in Canada

In my last post, I discussed different approaches to social media mining. While I am currently working on complex approaches to mining information in blogs, newspapers, and other forms of social and news media, even simple approaches can yield interesting information.

For example, the graph below shows the number of articles that mention each Canadian Member of Parliament (MP) versus the number of different MPs mentioned alongside that MP. For instance, if MP A is mentioned in 30 articles, and two other MPs appear across those articles, then A is located at point (30, 2). Note that the graph follows posts over the week ending on January 31.
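The coordinates described above can be computed roughly as follows. This is a sketch under stated assumptions rather than the software used for the graph: it reads the second coordinate as the number of distinct other MPs co-mentioned with a given MP, it assumes each article is already reduced to a list of MP names, and scatterPoints and the placeholder labels A, B, C are hypothetical.

```javascript
// For each MP: x = number of articles mentioning them,
//              y = number of distinct other MPs appearing in those articles.
function scatterPoints(articleList) {
  const stats = new Map(); // MP -> { articles: number, others: Set of names }
  for (const names of articleList) {
    const unique = [...new Set(names)];
    for (const mp of unique) {
      if (!stats.has(mp)) stats.set(mp, { articles: 0, others: new Set() });
      const s = stats.get(mp);
      s.articles += 1;
      for (const other of unique) {
        if (other !== mp) s.others.add(other);
      }
    }
  }
  const points = {};
  for (const [mp, s] of stats) {
    points[mp] = { x: s.articles, y: s.others.size };
  }
  return points;
}

// Placeholder input: A appears in four articles, alongside B and C.
const mpArticles = [["A"], ["A", "B"], ["A", "C"], ["A", "B"]];
const pts = scatterPoints(mpArticles);
console.log(pts.A); // x = 4 articles, y = 2 distinct other MPs
```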

What's interesting about this graph is that it shows the centrality of MPs to political discussions. As one would expect, Stephen Harper is mentioned fairly often and in relation to many other MPs. The same is true for Michael Ignatieff. While we lose a great deal of information by not reading the articles themselves, it is instructive to see how observing the information in aggregate helps elucidate the underlying social and political structure of Canada.

Note that Stockwell Day seemingly appears in a great many articles, but this is an artifact of the data collection process. Specifically, "day" is a common word that shows up in articles whether or not they mention the MP. I wanted to leave this data point in, however, to show how developing tools for press and media tracking is often more difficult than one would expect. The initial software for downloading newspaper or blog articles and counting words is seemingly straightforward to build. However, many practical hurdles hamper the process. Differentiating between "Day" the politician and "day" the common noun is but one example.
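To make the problem concrete, the sketch below contrasts a naive case-insensitive surname count with a stricter full-name match. Neither function is taken from the tracking software; naiveCount and fullNameCount are hypothetical names, the sample sentence is invented, and the names passed in are assumed to contain no regular expression special characters.

```javascript
// Invented sample text for illustration.
const sampleText =
  "Stockwell Day spoke on the first day of the session. One day later, Day replied.";

// Naive approach: count every case-insensitive match of the surname.
function naiveCount(text, surname) {
  const re = new RegExp(`\\b${surname}\\b`, "gi");
  return (text.match(re) || []).length;
}

// Stricter heuristic: require the full name, matched case-sensitively.
function fullNameCount(text, fullName) {
  const re = new RegExp(`\\b${fullName}\\b`, "g");
  return (text.match(re) || []).length;
}

console.log(naiveCount(sampleText, "day"));              // counts the common noun too
console.log(fullNameCount(sampleText, "Stockwell Day")); // counts only the full name
```

The stricter match removes the false positives but also misses the later reference to "Day" on his own, so it trades overcounting for undercounting. A more careful approach would need some notion of context, which is exactly where the practical difficulty lies.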

24 Jan 2010

Rebranding the Blog

A number of recent developments have convinced me to rebrand this blog. Rather than focusing on research notes, I'm going to begin actively discussing data mining and politics.

There are a number of reasons for this change. I'm currently based in the UK, and several developments here have led me to want to share my views on these issues. First, the launch of data.gov.uk, alongside similar initiatives in my home town (Toronto), has made it clear that government-focused data analytics is important and gaining in popularity. Second, rumours are circulating of a national election in the UK in May, and I may be analyzing the data surrounding it. Furthermore, a great deal of governmental scrutiny is going into data mining, profiling, and tracking information about Internet users, travelers, customers, and so on.

A big question surrounding these developments is: "What does it all mean?" And this is what this blog will focus on. My research tries to explore such questions, but there's more to this issue than analyzing political discussions online. So stay tuned!