Prototype: More Web-Friendly Visualizations in R
I've spent some more time thinking about how best to put together the package for creating web-friendly, interactive data visualizations in R. I have a pretty substantial JavaScript package that does a lot of basic visualizations now, and it's really exciting to see where this is going. With this in mind, I'm releasing a new version of the R package prototype I keep discussing in this blog.
Several functions are included, among them wv.plot(), wv.lineplot(), wv.snaplot(), and wv.bargraph(). The documentation still needs a lot of work, and there are no interactive abilities yet (though they exist in the JavaScript code).
What is most exciting about this package is that many of the steps one takes to make a complete graph have been split into individual functions. Thus, while one can make a scatterplot with wv.plot(), one can also build the same plot from wv.axis() and wv.points(). Each data visualization gets its own ID, or can be assigned one, so one can later pass visualizations (e.g. the points in the scatterplot itself) as arguments to other functions, which opens the door to adding functions for interactivity.
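To make the ID idea concrete, here is a minimal sketch in base R. It is not the package's actual code: the names state, add_element(), and add_mouseover() are hypothetical, and the sketch only shows how an element might receive an automatic or user-supplied ID and later be looked up by that ID to attach an interaction.

```r
# Hypothetical registry of plot elements keyed by ID (illustration only).
state <- new.env()
state$next_id <- 0
state$elements <- list()

# Register an element; generate an ID such as "points-1" when none is supplied.
add_element <- function(type, data, id = NULL) {
  if (is.null(id)) {
    state$next_id <- state$next_id + 1
    id <- paste0(type, "-", state$next_id)
  }
  state$elements[[id]] <- list(type = type, data = data, handlers = list())
  id
}

# Attach a mouseover label to a previously registered element, found by its ID.
add_mouseover <- function(id, label) {
  stopifnot(id %in% names(state$elements))
  state$elements[[id]]$handlers$mouseover <- label
  invisible(id)
}

pts <- add_element("points", list(x = c(1, 2, 3), y = c(2, 4, 6)))
ln  <- add_element("lines", list(x = c(1, 3), y = c(2, 6)), id = "trend")
add_mouseover(pts, "Observed values")
```

Because every element is addressable by a string ID, a later call only needs that ID, not the data, to add behaviour to the element.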
A few examples of the visualizations are shown below, along with the R code needed to display them. Note that these are embedded into the blog through the use of an inline frame.
Basic Scatterplot
The code below will generate a basic scatterplot.
x = rnorm(30)
y = rnorm(30)
wv.plot(x, y, "~/Desktop/scatterplot", height=300, width=300, xlim=c(-2.5,2.5), ylim=c(-2.5,2.5), xbreaks=c(0), ybreaks=c(0))
Plot with Multiple Data Types
Suppose you want a scatterplot with multiple point types and a line. You can build this manually with the following code.
x = rnorm(30); y = rnorm(30); z = runif(30);
wv.open("~/Desktop/plot3/", height=300, width=300);
wv.axis(c(-3.5, 3.5), c(-3.5, 3.5), xbreaks=-2:2, ybreaks=-2:2);
wv.points(x, y, xlim=c(-3.5, 3.5), ylim=c(-3.5, 3.5));
wv.lines(sort(x), z, col="red", xlim=c(-3.5, 3.5), ylim=c(-3.5, 3.5));
wv.close();
Bar Graph
This is a new graph format.
x = c(2.5, 7, 11);
cats = c("A", "B", "C");  # category labels, one per bar (placeholder values)
wv.bargraph(x, cats, "~/Desktop/barplot", ylim=c(0, 15), ybreaks=(1:5)*3);
As always, comments are welcome.
Canadian CPI: Visualization Brainstorm
After finishing the R prototype for data visualization, I've started abstracting the various methods necessary to create beautiful graphs. While there's no preliminary version of the R package yet, I think I've taken a number of exciting steps. These include:
- Abstracting graph objects. Objects such as lines, scatter plots, and other graph types can all be treated in a similar fashion in JavaScript. I use this approach in the new version of the JavaScript graph presented below.
- Including axes. The last graphs did not have axes, grid lines, and other information cues. These ones do. While they have to be manually set, this presents an advantage in that one can choose which grid lines and axis points to show.
- Interactivity. The graph below actually has useful interactive features. Mousing over points provides information on the value of the point itself, while mousing over the line plot provides the title. Nothing too complex, but already fairly useful.
I chose to present data on the Canadian consumer price index (CPI). This is freely available data and serves as a reminder of the major political issue of our time... While I don't want to make this post political, the ultimate goal of this blog is to use such visualizations and mathematical models to better understand public policy and the role of data mining therein. Might as well start referencing useful data in this regard.
So without further ado, here's the graph...
The next step is fairly clear: making the above possible in R!
Prototype: Web-Friendly Visualizations in R
Developing web-friendly data visualizations is not very difficult, though as far as I know, a package that allows one to do this directly in R does not exist (e-mail me if you know of one). As someone who has been developing lots of data-oriented software tools, I always find it nice to post visualizations online. To facilitate this task, I've been fooling around with creating a data visualization prototype in R. While the package is very limited in what it does, I hope it'll generate a discussion on the types of visualization tools that could help R users post their work on the web.
At this stage, the package has three functions to illustrate scatter plots, line graphs, and social networks. Each function creates a new directory with all the necessary JavaScript and HTML files. The HTML file could then be embedded using an inline frame (as done below) or used as a standalone website.
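As a rough illustration of the "one directory per visualization" idea, the sketch below writes an HTML page and a JavaScript data file into a new directory using only base R. The function name write_web_plot() and the file names index.html and data.js are assumptions made for this example; the prototype's real file layout and drawing code are not shown here.

```r
# Hypothetical helper: write a standalone HTML page plus a data file into out_dir.
write_web_plot <- function(x, y, out_dir, width = 300, height = 300) {
  stopifnot(length(x) == length(y))
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  # Serialize the points as a JavaScript array of [x, y] pairs.
  pairs <- paste0("[", x, ", ", y, "]", collapse = ", ")
  writeLines(paste0("var points = [", pairs, "];"), file.path(out_dir, "data.js"))

  # Minimal page that loads the data; a drawing script would be added here.
  html <- c(
    "<!DOCTYPE html>",
    "<html><head><meta charset=\"utf-8\">",
    "<script src=\"data.js\"></script>",
    "</head><body>",
    sprintf("<div id=\"canvas\" style=\"width:%dpx;height:%dpx\"></div>", width, height),
    "</body></html>"
  )
  index <- file.path(out_dir, "index.html")
  writeLines(html, index)
  invisible(index)
}

out <- write_web_plot(c(1, 2, 3), c(2, 4, 6), file.path(tempdir(), "wv-demo"))
```

The resulting index.html can be opened on its own or referenced from another page with an iframe element whose src attribute points at it.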
You can download the prototype here, and below are some examples of visualizations.
Scatter Plot
x = rnorm(25)
y = rnorm(25)
wv.scatterplot(x, y, "/wv-scatterplot", height=300, width=300, marginsize=0.1)
Line Graph
x = (-100:100)/10
y = sin(x)
wv.lineplot(x, y, "/wv-lineplot", height=300, width=300, marginsize=0.1)
Social Network
library(igraph)
g <- erdos.renyi.game(15, 0.175)
wv.sna(g, "/wv-sna", rnorm(15, 2, 0.75), width=400, height=400)
Next Steps
I apologize in advance, as some of the code above may be buggy and it certainly isn't very customizable. The next step -- assuming there's interest -- is to abstract the graph drawing to individual functions so one can then produce multiple graphs in one canvas or frame. Making more options for interactivity, labels, and so on is also a must. Again, comments and suggestions are very welcome.
Beautiful Web-Based Graphs
I regularly show charts on this website, and for the past few days, have been trying to find a good way to do this. Many of the charts so far have been shown as PDF or JPG files. These are fine, but they are not very responsive. Furthermore, many of the packages available for graphing are proprietary or not open source, and this is a problem for me. I decided to look for something I could live with when it comes to displaying charts and graphs.
Quite a few people have recommended Google Charts, which definitely has a lot to offer. However, I also want to customize my charts and make my own chart types (for example, social network illustrations). Another good package is Open Flash Chart, but I don't have a Flash license and prefer things to be a bit more open. Finally, there's Processing. This is a great language, but Java applets on a website bug me.
I'm quite picky, but have finally found a useful tool: Raphaël -- a library meant for representing vector graphics using JavaScript. While they have a graphing library, I decided to write my own code to play around with the library and customize the graphics. Overall, I must say that I am very impressed with the package.
As an example, the chart below shows a bubble plot. While fairly basic, I'm really happy with how easy it is to make interactive charts. Mousing over the bubbles changes their colour, and adding other features is fairly easy.
Another example is a line chart, shown below.
I'll do my best to improve these charts and make them more interactive and useful. Please e-mail me if you want the source code.
Mobile World Congress 2010
In a few hours, I'm flying to Barcelona for the Mobile World Congress, an annual event that showcases pretty much everything related to mobile technologies. I'm quite excited about this event, as it's bringing together around 40,000 to 50,000 people interested in mobile technologies, business, and related areas.
If you're attending and interested in data mining, social network analysis, social media mining, and mobile technologies, feel free to e-mail me. I'm always open to meeting people!
Tracking the Press: Minister Networks
About a week ago, I discussed tracking Canadian MPs based on the number of times they get mentioned in various news media, and who they get mentioned with. At the time, I only showed a chart of mentions, and discussed some shortcomings of the approaches used for tracking politicians -- or, for that matter, any brands.
I've been working on improving my tracking software and also working on new visualizations. The work has culminated in the network below, and a high quality PDF version is also available:

This network tracks Canadian federal ministers in various blogs, magazines, and newspapers. The size of the circle with the minister's name represents the number of articles (i.e. the larger the circle, the more articles), while a connection exists between ministers if they have been mentioned together in at least one article or blog post over the last week.
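The construction described above can be sketched in a few lines of base R. The data here are invented toy values (three ministers labelled A, B, and C across four articles), and this is not the tracking software itself; it only shows how circle sizes and connections could be derived from a table of mentions.

```r
# Toy data (invented for illustration): rows are articles, columns are ministers,
# and a 1 means the minister is mentioned in that article.
mentions <- matrix(
  c(1, 0, 0,
    1, 1, 0,
    1, 0, 0,
    0, 1, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("article", 1:4), c("A", "B", "C"))
)

# Circle size: number of articles mentioning each minister.
node_size <- colSums(mentions)

# Connection: two ministers appear together in at least one article.
co_mentions <- crossprod(mentions)   # minister-by-minister co-mention counts
adjacency <- (co_mentions > 0) * 1
diag(adjacency) <- 0                 # no self-connections
```

In this toy case minister A has the largest circle but only one connection, the same pattern noted for the Prime Minister in the discussion that follows. If igraph is installed, the adjacency matrix could be passed to graph_from_adjacency_matrix(adjacency, mode = "undirected") for plotting.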
Such a network representation provides very useful information about press coverage of Canadian ministers. A great example is that Prime Minister Stephen Harper gets mentioned very often relative to other ministers, but is not mentioned often with other ministers. Tony Clement or Jim Prentice, on the other hand, get mentioned with more ministers, but have fewer articles about them.
One thing the network does not show, however, is how often the co-mentions occur. It's possible, for example, that a set of five or six ministers was mentioned in one article, and this would create something like the dense set of connections with ministers Flaherty, Prentice, Clement, and others. More information would be necessary to analyze whether this is the case or not.
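One way to get at this extra information is to count, for every pair of ministers, how many articles mention both. The sketch below uses invented toy data and hypothetical variable names; it shows how a single article naming five ministers produces ten connections that each rest on one article, while a pair mentioned together repeatedly stands out with a higher count.

```r
# Toy data (invented): each element lists the ministers named in one article.
articles <- list(
  c("A", "B", "C", "D", "E"),  # one article naming five ministers
  c("A", "B"),
  c("A", "B")
)

# Pairs only exist in articles that name at least two ministers.
articles <- Filter(function(m) length(m) >= 2, articles)

# All unordered minister pairs within each article, one row per pair.
pairs <- do.call(rbind, lapply(articles, function(m) t(combn(sort(m), 2))))

# Number of articles supporting each pair.
weights <- table(paste(pairs[, 1], pairs[, 2], sep = "-"))

# Pairs backed by a single article are the ones a dense cluster may hinge on.
single_article <- names(weights)[as.vector(weights) == 1]
```

Drawing edge thickness from these counts would make the difference visible directly in the network.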
Stay tuned for more updates on the software. I also hope to have a website set up where this is all done automatically and people can peruse social media surrounding Canadian politics.
Graphs, Maps, and Trees
I just finished reading Graphs, Maps, and Trees by Franco Moretti. The book was recommended to me by a friend (thanks Tom!) and I must say I really enjoyed it.
While the book does not discuss information theory, machine learning, or data mining, it provides a very interesting argument for more rigour in literary studies. Furthermore, I believe it provides a great introduction to the possibilities that information theory holds for political science, business intelligence, and related fields. A particularly powerful example of this is when Moretti writes,
What do literary maps do... First, they are a good way to prepare text for analysis. You choose a unit--walks, lawsuits, luxury goods, whatever--find its occurrences, place them in space... Or in other words, you reduce the text to a few elements, and abstract them from the narrative flow, and construct a new, artificial object like the maps that I have been discussing. And with a little luck, these maps will be more than the sum of their parts: they will possess 'emerging' qualities, which were not visible at the lower level.
In this paragraph, Moretti specifically discusses the use of geographical representations of novels to study the patterns behind the stories therein. If we go beyond maps specifically and discuss graphs, trees, networks, and other abstract analytical tools, we can see how using any such tools may illuminate underlying patterns in literary works.
As Moretti discusses at the start of his book, a major challenge to literary research is that reading all the novels published in a specific period is impossible. There are simply too many of them. The use of graphs allows one to analyze such works in aggregate while dealing with the shortcoming of not being able to read as fast as content is produced. Social media and press tracking face a similar challenge. There are too many blog posts, articles, Tweets, status updates, and websites out there for a consultant or researcher to read and aggregate by hand. As such, one needs more abstract frameworks for dealing with the data.
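As a small illustration of what "choose a unit, find its occurrences" might look like when automated, the sketch below counts how many posts mention each unit per week. The corpus, the units, and the variable names are all invented for the example, and simple pattern matching stands in for whatever retrieval method one would actually use.

```r
# Toy corpus (invented for illustration): one row per post.
posts <- data.frame(
  week = c(1, 1, 2, 2, 2),
  text = c("Budget debate opens", "The budget and the census",
           "Census changes announced", "Census reaction grows",
           "Budget vote delayed"),
  stringsAsFactors = FALSE
)
units <- c("budget", "census")

# For each unit, flag the posts that mention it (case-insensitive match).
hits <- sapply(units, function(u) grepl(u, posts$text, ignore.case = TRUE))

# Aggregate: number of posts per week mentioning each unit.
counts <- rowsum(hits * 1, group = posts$week)
```

The counts table is the kind of reduced, artificial object Moretti describes: none of the posts has to be read in full to see that attention shifts from one unit to the other between the two weeks.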
If you are looking for a non-technical introduction to the possibilities held within information retrieval and data mining, this is a great book. While Moretti doesn't discuss automated or algorithmic approaches to his work, the mental leap from his work to automated strategies is short and easy.
