We are Cambridge Energy Data Lab, a smart energy startup based in Cambridge, UK.
This blog, named "Cambridge Energy Data Analysis", aims to incrementally unveil our big data analysis and technologies to the world. We are a group of young geeks: computer scientists, data scientists, and serial entrepreneurs, with a passion for smart energy and a sustainable world.

Monday 19 May 2014

How Do You Use Electricity?

Collecting data with smart-meters

Smart-meters, through their ability to communicate data instantly, are re-shaping the electricity market landscape. These new-generation meters collect and transmit instantaneous electricity consumption data, which can then be used by various actors: the user (e.g. to monitor their own usage), the supplier (e.g. to forecast energy demand), and independent companies (like Cambridge Energy Data Lab) which help make more sense of this data.

In this short study, we will focus on identifying generic behaviours of electricity consumption within a dataset of more than 400 users for the February-March 2014 period. Because the raw dataset is too large to interpret directly, we will perform what is usually referred to as a "model reduction."

Principal component analysis

The first step in the analysis is to perform a model reduction to define several types of days. Among all the unique day time-series, we select a few thousand (8000 days exactly, out of the 60 days x 400 users = 24000 total days available) on which to perform a Principal Component Analysis (PCA). PCA is a linear algebra method that finds the directions of largest variance in a dataset composed of several samples of a given variable. See Figure 1 for a visual example.

Figure 1: PCA, 2-dimensional example. PCA finds the orthogonal directions which maximise the variance of the samples. 
After having performed a PCA, we can order the different samples along the first principal component (PC1 in Figure 1). We perform a PCA on the dataset composed of the 8000 different days of electricity consumption and order the different days along the first principal component. The result is presented in Figure 2.
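The ordering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our production code: the synthetic data, the 48 half-hourly readings per day, and the helper name first_pc_scores are all made up for the example, and the PCA is computed through a singular value decomposition with NumPy.

```python
import numpy as np

def first_pc_scores(X):
    """Return the coordinate of each row of X along the first principal component."""
    Xc = X - X.mean(axis=0)  # centre every column (time slot) on its mean
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]

# Synthetic stand-in for the real data (an assumption for this sketch):
# 200 "days", each with 48 half-hourly readings and a single usage peak.
rng = np.random.default_rng(0)
n_days, n_slots = 200, 48
hours = np.arange(n_slots) / 2.0
peak_hour = rng.uniform(10, 23, size=n_days)
days = np.exp(-0.5 * ((hours[None, :] - peak_hour[:, None]) / 2.0) ** 2)
days += 0.05 * rng.standard_normal(days.shape)

scores = first_pc_scores(days)
order = np.argsort(scores)   # the sign of a principal component is arbitrary,
days_sorted = days[order]    # so the ordering may come out reversed
```

Plotting days_sorted as an image (one row per day) gives the kind of picture shown in Figure 2, up to a possible top-to-bottom flip.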

Figure 2: PCA performed on a dataset of 8000 days of electricity consumption. The days are ordered with respect to their coordinate along the first principal component.
We notice that the days are now ordered with respect to a relevant criterion since we can detect a continuous evolution from users consuming electricity during the day and in the evening (top of Figure 2) to users who mostly use electricity in the evening and at night (bottom of Figure 2). From this observation, we can therefore define different types of days.
We then simplified the full dataset thanks to this criterion, creating around 10 different "types of days." It is therefore possible to simplify the 2-month time-series by attributing a value to each day corresponding to its type. This is represented in the left panel of Figure 3.
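One possible way to turn the first-principal-component coordinate into a small number of day types is to cut it into bins. The sketch below assumes equal-population (quantile) bins, which is an illustrative choice rather than a description of the exact rule we used; the random scores and the helper name day_types are also placeholders.

```python
import numpy as np

def day_types(scores, n_types=10):
    """Label each day 0..n_types-1 by cutting its PC1 coordinate into quantile bins."""
    # Inner bin edges only: n_types bins need n_types - 1 cut points.
    edges = np.quantile(scores, np.linspace(0, 1, n_types + 1)[1:-1])
    return np.digitize(scores, edges)

# Placeholder PC1 coordinates for 400 users x 60 days = 24000 days.
rng = np.random.default_rng(1)
scores = rng.standard_normal(400 * 60)

types = day_types(scores)
# One row per user, one column per day: the simplified 2-month time-series.
type_series = types.reshape(400, 60)
```

Each user's two months are now a row of 60 small integers instead of 60 full daily curves, which is what makes the second ordering step tractable.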

Figure 3: Left panel: unordered "type of day" time-series. Right panel: ordered "type of day" time-series obtained by ordering along the first principal component. We notice an evolution from users who mostly use electricity during the day (top) to users who mostly use electricity at night (bottom).
By re-applying the concept of first principal component ordering, we can re-order the simplified "type of day" time-series. This is presented in the right-hand panel of Figure 3. This time, rather than ordering individual days, we order the users. Each user can therefore be attributed to a category, depending not only on the type of daily consumption, but also on the longer time-scale (weekly, monthly) behaviour. At the top of the right-hand panel are the "type of day" time-series for the users consuming electricity mostly during the day and the evening, whereas the bottom part of this colour plot is associated with users consuming electricity mostly at night. We can also notice in this figure an ordering of the longer time-scale behaviour, and the signature of the weekends, when people tend to stay awake (and use more energy) later at night.
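The same idea applied to users can be sketched as follows. Again this is a hedged illustration: the "type of day" matrix is simulated (each hypothetical user has a habitual type plus day-to-day noise), and order_rows_by_first_pc is a name invented for the example.

```python
import numpy as np

def order_rows_by_first_pc(M):
    """Return the row indices of M sorted by their first principal component coordinate."""
    Mc = M - M.mean(axis=0)
    _, _, vt = np.linalg.svd(Mc, full_matrices=False)
    return np.argsort(Mc @ vt[0])

# Simulated "type of day" matrix: 400 users x 60 days, types in 0..9.
rng = np.random.default_rng(2)
n_users, n_days = 400, 60
habit = rng.uniform(0, 9, size=n_users)           # hypothetical habitual type
noise = rng.normal(0, 1, size=(n_users, n_days))  # day-to-day variation
type_series = np.clip(np.rint(habit[:, None] + noise), 0, 9)

user_order = order_rows_by_first_pc(type_series)
ordered = type_series[user_order]  # rows now run from one kind of user to the other
```

In this simulation the first principal component essentially recovers each user's habitual type, so the reordered matrix shows a smooth gradient from top to bottom, similar in spirit to the right-hand panel of Figure 3.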


The large amount of data collected by smart-meters can only be visualised and interpreted with advanced mathematical tools, PCA being one of them. This method allowed us to define different types of days in terms of electricity usage and therefore simplify the complete users' electricity time-series. From this model reduction, another PCA was then performed to order the users directly, giving insight into the different types of electricity consumption behaviour present in the dataset.