October 18, 2010 / jphoward

Visualising Time Series

Over at Kaggle there’s an interesting competition involving time series prediction. Since I’ve never done much with time series before, I figured I’d give it a go. It’s a good chance to learn something new, and have some fun in the process.

I decided to try a new (for me) approach to the analysis: using general-purpose programming tools for all of it, including import/export, visualization, modelling, etc. My hypothesis was that, with powerful languages that have strong functional capabilities, I could get results just as quickly as with a dedicated tool (like R), plus have the benefits of a “proper” programming language (e.g. strong language design, an excellent IDE, speed, etc.).

My first approach was to use JavaScript to chart the 400-odd time series in each category (quarterly and monthly). It turns out that this takes only about 10 lines of code, plus a cut-and-paste of a function from the Google Charts docs:

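// Assumes jQuery ($), a global qdata array (one array of numbers per series),
// and extendedEncode() copied from the Google Charts docs.
var content = $("#content");
for (var i = 0; i < qdata.length; i++) {
    var url = chartUrl(qdata[i], i);
    content.append('<img src="' + url + '" /> ');
}

// Build a 440x220 line-chart URL, titled with the series index.
function chartUrl(data, i) {
    var res = "http://chart.apis.google.com/chart?chs=440x220&cht=lc&chtt=" + i + "&chd=";
    var maxval = Math.max.apply(Math, data); // scale each chart to its own maximum
    return res + extendedEncode(data, maxval);
}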
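The chartUrl function hands the actual encoding to extendedEncode, the function copied from the Google Charts docs. To give a rough idea of what that "extended" format does, here is a minimal re-implementation sketch (extendedEncodeSketch is a hypothetical name, and this is not the docs' code verbatim): each value is scaled against the maximum into one of 64 * 64 = 4,096 levels and written as two characters from a 64-symbol alphabet, after an "e:" prefix.

```javascript
// 64-symbol alphabet of the Google Image Charts "extended" encoding.
// Two symbols per value give 64 * 64 = 4096 levels (0..4095).
const EXTENDED_MAP =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-.";
const N = EXTENDED_MAP.length; // 64

// Hypothetical sketch of the encoding, for illustration only.
function extendedEncodeSketch(values, maxVal) {
  let out = "e:";
  for (const v of values) {
    // Scale the value into 0..4096 relative to maxVal.
    const scaled = Math.floor((N * N * v) / maxVal);
    if (!Number.isFinite(scaled) || scaled < 0) {
      out += "__"; // missing or invalid value
    } else if (scaled > N * N - 1) {
      out += ".."; // value at (or above) maxVal: clamp to the top level, 4095
    } else {
      // High "digit" then low "digit", both base 64.
      out += EXTENDED_MAP[Math.floor(scaled / N)] + EXTENDED_MAP[scaled % N];
    }
  }
  return out;
}
```

For example, extendedEncodeSketch([0, 1, 64, 4096], 4096) gives "e:AAABBA..": zero is "AA", one step up is "AB", 64 steps carry into the high digit ("BA"), and the maximum clamps to "..". Two characters per point buys far finer vertical resolution than the 62-level simple encoding.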

The result is this page, which is a fast and easy way to see all the time series at once (click one of the buttons on that page to see the data). If you’re interested in seeing how it works, feel free to look at the JavaScript linked from that page.

Next, I moved to C#, and found that the functional features added in .NET 3.5 (LINQ et al.) and the automatic parallelization added in .NET 4 made it a real pleasure to work with. I also used GlowCode to profile my algorithms as I went, which made it easy to keep them running fast. I used the free Microsoft Chart components, plus a FlowLayoutPanel, to generate visualizations easily. For example, here’s a (subset of a) visualization showing in-sample predictions (blue) vs actual data (orange):

[Figure: TsMetrics]

In this example, it’s easy to spot some models that aren’t ideal: in series 2 the fitted trend doesn’t follow the underlying trend closely enough, and series 5 shows the problem of using additive seasonality inappropriately. Adding the series number and a fitness metric to each chart also makes the results much easier to work with.
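The additive-seasonality problem is easy to reproduce on a toy example. The sketch below is an illustration under simplifying assumptions, not the models used here: the trend (level) is taken as known, seasonal effects are estimated by averaging per season, and mean absolute error stands in for the fitness metric (the post doesn't say which metric the charts show). All function names are hypothetical. When the seasonal swing grows in proportion to the level, a multiplicative seasonal index fits exactly, while an additive one overstates the swing early in the series and understates it late.

```javascript
// Toy comparison of additive vs multiplicative seasonality.
// Assumes the trend ("level") is already known; all names are hypothetical.

// Average of arr over positions that share the same season (index mod m).
function seasonalMeans(arr, m) {
  const sums = new Array(m).fill(0);
  const counts = new Array(m).fill(0);
  arr.forEach((v, t) => {
    sums[t % m] += v;
    counts[t % m] += 1;
  });
  return sums.map((s, j) => s / counts[j]);
}

// Additive: y ≈ level + a[season]
function fitAdditive(y, level, m) {
  const a = seasonalMeans(y.map((v, t) => v - level[t]), m);
  return level.map((l, t) => l + a[t % m]);
}

// Multiplicative: y ≈ level * s[season]
function fitMultiplicative(y, level, m) {
  const s = seasonalMeans(y.map((v, t) => v / level[t]), m);
  return level.map((l, t) => l * s[t % m]);
}

// Mean absolute error, standing in for a per-chart fitness metric.
function meanAbsError(actual, fitted) {
  let total = 0;
  for (let t = 0; t < actual.length; t++) total += Math.abs(actual[t] - fitted[t]);
  return total / actual.length;
}

// Three years of quarterly data whose seasonal swing grows with the level.
const m = 4;
const factors = [0.8, 1.0, 1.3, 0.9];
const level = Array.from({ length: 12 }, (_, t) => 100 + 10 * t);
const y = level.map((l, t) => l * factors[t % m]);

const additiveMae = meanAbsError(y, fitAdditive(y, level, m));
const multiplicativeMae = meanAbsError(y, fitMultiplicative(y, level, m));
console.log(additiveMae.toFixed(2), multiplicativeMae.toFixed(2)); // prints: 4.00 0.00
```

In the additive fit the errors are zero in the middle year and largest at both ends, the kind of systematic, level-dependent residual pattern that a per-series chart makes easy to spot.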

Here’s a visualization showing out-of-sample predictions for a different model:

[Figure: TsPredict]

In this case we can confirm visually that the models have reasonable-looking predictions.
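The C# charts overlay predictions on the actual data and label each chart with its series number and a fitness score. For comparison, here is a rough sketch of the same kind of chart using the Google chart URL approach from the JavaScript version. This is my own illustration, not the code behind these screenshots: overlayChartUrl is a hypothetical helper; it uses the chart API's text format with custom scaling (chd=t:..., chds=min,max) so both lines share one scale, sets one color per series with chco, and assumes there are no missing values. Google has since deprecated this Image Charts API, so treat it as showing the URL format.

```javascript
// Hypothetical helper: actual (orange) vs predicted (blue) on one line chart,
// titled with the series number and a fitness score.
// Assumes both arrays contain plain numbers with no missing values.
function overlayChartUrl(actual, predicted, seriesNo, score) {
  const all = actual.concat(predicted);
  const lo = Math.min(...all);
  const hi = Math.max(...all);
  const title = encodeURIComponent(`${seriesNo}: ${score.toFixed(2)}`);
  return "http://chart.apis.google.com/chart?chs=440x220&cht=lc" +
    "&chco=FFA500,0000FF" +    // one color per series, same order as chd
    "&chtt=" + title +
    "&chds=" + lo + "," + hi + // a single min,max pair shared by both series
    "&chd=t:" + actual.join(",") + "|" + predicted.join(",");
}

const url = overlayChartUrl([1, 2, 3], [1.5, 2, 2.5], 5, 3.2);
```

Using one shared min/max, rather than scaling each line to its own maximum as chartUrl does for a single series, keeps the predicted and actual lines directly comparable.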
