## Description

This blog, named "Cambridge Energy Data Analysis", aims to incrementally unveil our big data analysis and technologies to the world. We are a group of young geeks: computer scientists, data scientists, and serial entrepreneurs with a passion for smart energy and a sustainable world.

## Wednesday 5 November 2014

### Feed in tariffs: Small scale solar PV cost

In our previous posts we focused on the excess energy that can be generated using photovoltaic (PV) panels and on its trends. In this post we focus on the price of solar PV installations through two interactive visualizations based on the latest data provided by the UK government on the cost per kW of PV deployments by month.

## Friday 31 October 2014

### Big Data Crunching

There has been much talk about Big Data in recent years, and the word cloud shows terms commonly related to the definition of Big Data. First and foremost, the most important attribute of Big Data comes as no surprise: its volume! Big Data is, as the name suggests, BIG. What big actually means in terms of bytes or number of records is circumstantial. Data becomes big when your traditional way of processing it hits a wall and becomes infeasible.

The first symptom will be that your data does not fit into memory. In the beginning you might simply beef up your computer with some extra memory. This is commonly called scaling up. A more sophisticated solution is to load only part of the data into memory at a time, as disk size is much less of a bottleneck. This is how a database operates. A join operation on two massive tables, e.g. in Postgres, will read and write many chunks of intermediate data but will eventually succeed even though the data never fits into memory all at once. Relational databases and scaling up to more powerful computers were the gold standard for tackling growing data volumes. Things changed, in particular after 2004, with Google's publication of "MapReduce: Simplified Data Processing on Large Clusters" [1].

Instead of running huge databases on expensive supercomputers the trend went to massive parallelisation on clusters of cheap hardware. With this came new challenges which MapReduce successfully addresses:

- parallelisation must be easy
- automatic distribution of data between the workers of the cluster
- fault tolerance

If you process data on a big cluster of cheap hardware, the chances are quite high that one of the computers breaks down. In the ACID world of relational databases (RDBs), with their all-or-nothing transactions, this would mean we never get any results.

So what exactly is MapReduce doing differently?

### The MapReduce Paradigm

Let's discuss a simple example inspired by a common task in processing genetic data: imagine you have a vast number of strings and you want to trim the last 5 letters of each string.

In such a task we have to process each single record. This means the task is of linear order:

\[ O(n) \]

where the computational effort grows linearly with the number of records.

However, each record can be processed independently of the other records, which allows us to scale out the task over multiple processes, cores, or computers.

Let \(k\) be the number of processes, cores, or computers available; the order of our task then becomes

\[ O \left ( \frac{n}{k} \right ). \]

This is much better for Big Data, where \(n\) is very large, as we can easily control the computational effort by increasing \(k\). Additionally, if one subtask fails we only have to rerun that specific subtask. A complete rollback of the transaction is not required, as would be the case in ACID-compliant RDBs.
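As a minimal sketch of this idea (not Disco or Hadoop code), the trimming task can be split into \(k\) chunks that are handled by a pool of workers. The helper names `trim_record`, `split_into_chunks`, `trim_chunk` and `trim_all` are made up for illustration, and a thread pool merely stands in for the separate computers of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def trim_record(record, n_chars=5):
    """Drop the last n_chars characters of a single record."""
    return record[:-n_chars] if n_chars > 0 else record


def split_into_chunks(records, k):
    """Split the records into at most k chunks of near-equal size."""
    size = max(1, -(-len(records) // k))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def trim_chunk(chunk):
    """The subtask run by one worker: trim every record in its chunk."""
    return [trim_record(r) for r in chunk]


def trim_all(records, k):
    """Process the chunks independently, preserving the input order."""
    chunks = split_into_chunks(records, k)
    with ThreadPoolExecutor(max_workers=k) as pool:
        results = list(pool.map(trim_chunk, chunks))
    return [r for chunk in results for r in chunk]


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "TTGACCATGA", "GGGCCCAAAT", "ACACACACAC"]
    print(trim_all(reads, k=2))
```

Because `trim_chunk` only ever sees its own chunk, a failed chunk could be resubmitted on its own without touching the results of the others.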

Let’s consider a slightly more complex task: the "hello world" program in the world of MapReduce is counting word frequencies in a very large number of documents. As in the previous task, the word count of a single document is independent of the other documents, which makes the task perfectly suitable for scaling out.

A common pattern is emerging here: we use a function which maps each document to a list of independent word counts. The result of this map is a distributed list of word counts. So far our MapReduce programme comprises the following steps:

- distribute the documents over multiple computers in a cluster
- apply a word count function on each computer
- generate a distributed list of word counts

The next step is to aggregate the distributed list of word counts. However, we want to scale out the aggregation over multiple computers as well.

This aggregation step is also called reduce. The steps involved are as follows:

- send the same words to the same computer for aggregation (called shuffle)
- apply a sum function to generate the word count over the complete set of documents (reduce)
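The map, shuffle and reduce steps can be sketched in plain Python on a single machine. This is only an illustration of the data flow, not how a cluster framework implements it; the function names and the `n_reducers` parameter are assumptions made for this example, and the list of partitions stands in for the reducer machines.

```python
from collections import defaultdict


def map_document(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_pairs, n_reducers):
    """Shuffle step: route every pair for a given word to the same reducer."""
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for word, count in mapped_pairs:
        partitions[hash(word) % n_reducers][word].append(count)
    return partitions


def reduce_partition(partition):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in partition.items()}


def word_count(documents, n_reducers=2):
    """Run map, shuffle and reduce in sequence and merge the results."""
    mapped = [pair for doc in documents for pair in map_document(doc)]
    totals = {}
    for partition in shuffle(mapped, n_reducers):
        totals.update(reduce_partition(partition))
    return totals


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(sorted(word_count(docs).items()))
```

Hashing the word to pick a partition guarantees that all counts for one word end up in the same place, so each partition can be reduced without knowing about the others.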

And there we have the complete MapReduce paradigm.

Some tasks might be more difficult to translate into map and reduce steps and can require multiple rounds of MapReduce. However, the MapReduce ecosystem is growing steadily, with new libraries now implementing even complex machine learning algorithms in MapReduce [3,4,5].

Last but not least, comparing MapReduce to RDBs, we see that MapReduce applies the schema on read, which is ideal for messy and inconsistent data, whereas RDBs traditionally apply the schema on write. In the world of Big Data the schema-on-read approach has the following advantages:

- the flexibility to store data of any kind including unstructured or semi-structured data
- it allows flexible data consumption
- it allows the storage of raw data for future processing and changing objectives
- it removes the cost of data formatting at the moment of data creation which results in faster data availability
- it allows you to experiment with the data at low risk as the raw data can be kept to correct mistakes
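A small sketch of what schema on read can look like in practice: the raw lines are stored untouched, and a schema is applied only when a particular analysis reads them. The sample records, the field names `meter` and `kwh`, and the helper `read_kwh` are invented for this illustration.

```python
import json

# Raw records kept exactly as they arrived, including messy ones.
RAW_LINES = [
    '{"meter": "A1", "kwh": 1.2, "ts": "2014-10-31T10:00"}',
    '{"meter": "B7", "kwh": "0.8"}',   # number stored as text, no timestamp
    'not json at all',                  # corrupt record
]


def read_kwh(raw_lines):
    """Apply a schema while reading: keep only what this analysis needs."""
    for line in raw_lines:
        try:
            record = json.loads(line)
            yield record["meter"], float(record["kwh"])
        except (ValueError, KeyError, TypeError):
            continue  # skip records that do not fit today's schema


if __name__ == "__main__":
    print(list(read_kwh(RAW_LINES)))
```

Since the raw lines are never modified, a later analysis could read the same data with a different schema, for example one that also needs the timestamp.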

There is always the elephant in the room when speaking about MapReduce: Hadoop!

Most importantly, Hadoop is not MapReduce; it is just one implementation of the MapReduce framework! Hadoop is quite a beast and targets the really BIG Big Data. An alternative implementation we are using here at Cambridge Energy Data Lab is Disco.

The main reason we use Disco over Hadoop is that Disco jobs are written in Python, whereas Hadoop jobs are mainly written in Java (strictly speaking, you can also use other languages with Hadoop). Disco is also much lighter and easier to administer. [2]

The word count example in Disco is as simple as the underlying problem itself:

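```python
from disco.core import Job, result_iterator


def map(line, params):
    # Map step: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word, 1


def reduce(iter, params):
    # Reduce step: group the sorted pairs by word and sum their counts.
    from disco.util import kvgroup
    for word, counts in kvgroup(sorted(iter)):
        yield word, sum(counts)


if __name__ == '__main__':
    input = ["http://discoproject.org/media/text/chekhov.txt"]
    job = Job().run(input=input, map=map, reduce=reduce)
    # Wait for the job to finish, then print each word with its total count.
    for word, count in result_iterator(job.wait()):
        print(word, count)
```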


I leave it to you to compare this with the Java version for Hadoop: https://wiki.apache.org/hadoop/WordCount

## Friday 19 September 2014

### 6 screens for monitoring with Raspberry Pi cluster

Do you monitor?

Key performance indicators (KPIs), project progress, server status, user logs (and many more) are constantly changing, and the amount of data collected is growing every day. However, it is hard to monitor these values and to share them with all the members of the team.

Instead, let's constantly display them.

We had too many things to display on a single monitor. Our solution was to buy 6 monitors!

6 screens for monitoring.

In order to handle 6 monitors, we would normally have to buy a massive desktop PC and a top-end graphics card. The solution we found is the Raspberry Pi! The Raspberry Pi is one of the smallest computers in the world, and it was born in Cambridge, UK. A single Pi is not very powerful, but several of them can be powerful enough if we assemble them into a cluster.

We immediately bought 6 Raspberry Pis. The main issue was finding a rack to hold the 6 Pis. It's common to build a case from Lego; however, we didn't have any Lego lying around. The solution we found is a shoebox!

Raspberry Pis in a Nike shoe case.

Just Do It!

It looks cool. Cambridge style monitoring environment. Try it like us!

## Monday 15 September 2014

### Challenge for Excess Generation

Are you considering buying a photovoltaic (PV) installation for your house? Or do you already have some solar panels on your roof and are looking for ways to maximise the return on this investment? Read on, as we have some important advice for you:

### What is Excess Generation?

Excess generation is sometimes also referred to as surplus generation, excess electricity, or exported energy, among others. It is defined as the amount of electricity generated by your rooftop panels (1. Total generation) minus your daytime electricity consumption (2. Electricity used). It is this excess generation that is available for export to the grid (3. Export energy).
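The definition can be written down in a few lines of Python. This is a sketch under two assumptions that are not part of the definition itself: generation and consumption are given per time interval in kWh, and excess is clipped at zero in intervals where the household uses more than it generates. The readings are made up for illustration.

```python
def excess_generation(generation_kwh, consumption_kwh):
    """Excess per interval: generation minus consumption, never negative."""
    return [max(g - c, 0.0) for g, c in zip(generation_kwh, consumption_kwh)]


if __name__ == "__main__":
    # Hypothetical readings for six daytime intervals, in kWh.
    generation = [0.0, 0.5, 1.5, 2.0, 1.0, 0.1]
    consumption = [0.3, 0.4, 0.5, 0.6, 1.2, 0.8]
    excess = excess_generation(generation, consumption)
    print("Excess per interval:", excess)
    print("Total available for export:", round(sum(excess), 2), "kWh")
```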

### Feed-in-tariff Incentive Scheme

Feed-in tariffs (FITs) are the most widely used policy in the world for accelerating renewable energy (RE) deployment, accounting for a greater share of RE development than either tax incentives or renewable portfolio standard (RPS) policies. In the European Union (EU), FIT policies have led to the deployment of more than 15,000 MW of solar photovoltaic (PV) power and more than 55,000 MW of wind power between 2000 and the end of 2009. In total, FITs are responsible for approximately 75% of the global PV deployment.

In a grid-connected rooftop photovoltaic power station, the generated electricity can be sold to the grid at a higher price than what the grid charges consumers. This arrangement provides a secure return on the installer’s investment. Many consumers across the world are switching to this mechanism because of the revenue it yields. However, the details of the financial mechanism vary from country to country, as the following two examples illustrate:

#### Case Study 1: Disincentive for Excess Generation (UK)

In the UK, consumers have a stronger incentive to minimise Excess Generation by using the majority of their generated electricity on sunny days. UK customers receive a guaranteed feed-in tariff for all electricity generated (10-14 p/kWh), plus an 'Export tariff' (4.77 p/kWh) for their excess generation, which, however, is much smaller than the average electricity price (12-15 p/kWh). Therefore, customers should consume their generated electricity rather than export it to the grid.

So goes the theory.

However, in reality excess generation is assumed to be a fixed 50% of PV generation because most installations lack smart meters. The importance of 'Excess Generation' will therefore emerge with the rollout of smart meters in the near future.
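To make the UK incentive concrete, here is a rough calculation of a household's benefit from its PV generation. The generation tariff and retail price are illustrative values picked from within the ranges quoted above, the function name `uk_benefit_pence` is made up, and the model ignores everything except the three prices.

```python
GENERATION_TARIFF = 12.0   # p/kWh, illustrative value within the quoted 10-14 range
EXPORT_TARIFF = 4.77       # p/kWh, as quoted above
RETAIL_PRICE = 13.5        # p/kWh, illustrative value within the quoted 12-15 range


def uk_benefit_pence(generated_kwh, self_consumed_kwh, metered_export=True):
    """Total benefit in pence: tariffs received plus avoided purchases."""
    if metered_export:
        exported_kwh = generated_kwh - self_consumed_kwh
    else:
        exported_kwh = 0.5 * generated_kwh  # fixed 50% export without a smart meter
    return (generated_kwh * GENERATION_TARIFF
            + exported_kwh * EXPORT_TARIFF
            + self_consumed_kwh * RETAIL_PRICE)


if __name__ == "__main__":
    for used in (3.0, 7.0):
        print("self-consumed", used, "kWh:",
              round(uk_benefit_pence(10.0, used), 2), "p metered,",
              round(uk_benefit_pence(10.0, used, metered_export=False), 2), "p fixed 50%")
```

Under these assumed prices, every kWh that is consumed at home instead of exported is worth the retail price rather than the export tariff, which is why self-consumption pays off.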

#### Case Study 2: Incentive for Excess Generation (JAPAN)

In Japan, FITs are only paid for 'Excess Generation', not 'Total Generation' as is the case in the UK. The FIT price is currently much higher (38-42 JPY per kWh) than the average electricity price (20-25 JPY per kWh), so customers have a strong financial incentive to maximise their 'Excess Generation'. Customers are therefore willing to change their consumption behaviour by shifting the usage of electricity-heavy appliances, such as dishwashers and washing machines, to the nighttime, when the electricity tariff is cheaper. This individual behavioural change is expected to contribute to a nationwide peak reduction in the future.

### Our Challenge for Excess Generation

We are developing Eneberg, a domestic PV generation forecasting and trading software. Eneberg mainly deals with aggregated "Excess Generation". Whilst there is a vast body of research and models dealing with PV generation and energy demand, "Excess Generation" is still an open frontier. It is our aim to pioneer this new field.

In-depth excess generation analysis is already covered by this post: Energy Surplus Trends from Domestic UK Solar Panels in October 2013 to January 2014

## Thursday 11 September 2014

### Artificial Neural Networks

An artificial neural network (ANN) is a computational model inspired by the information processing functionality of the brain. But how does the brain compute?


Generally, the central elements of computation are processing, transmission, and storage. Within the brain the neuron is the central computing element. Neurons receive signals and produce responses. The transmission of information at the neural level involves electrical signals – so called action potentials – based broadly on ions and semi-permeable membranes, and chemical signals at the synapses. In the brain the storage of information corresponds to learning which occurs at the synapses. These synapses are at the interface between neurons and regulate the transmission of information from neuron to neuron.

An ANN broadly corresponds to the processing paradigm of neural networks, with the nodes of the ANN being the central computing elements, similar to neurons. In fact, ANNs are nothing but networks of primitive functions where the chain of function compositions transforms an input to an output. The composition of the computational model is contained implicitly in the interconnections of the nodes and is referred to as the network function. Each node comprises a primitive function transforming its input into an output.

Typically, the inputs of a node have an associated weight *w* by which the input \(x_i\) is multiplied. The node integrates all its inputs, usually by adding them, and then evaluates its primitive function *f*. The primitive function *f* computed in the node can be any function, but common choices are differentiable functions such as the sigmoid function. Models of ANNs differ mainly in their choice of the primitive function and the topology of the network, and rarely in the timing of the evaluation of the primitive function. In *feed-forward* ANNs the network is composed of distinct layers where each neuron only receives input from neurons of the previous layer. Accordingly, a feed-forward network has a distinct input and output layer, with the intermediate layers being referred to as hidden layers. (A second class of ANNs are recurrent networks, where connections between nodes form directed cycles.)
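The node computation described above can be sketched in a few lines of Python. The function names are made up for this illustration, the nodes have no bias term, and a network is represented simply as a list of layers, each layer being a list of per-node weight vectors.

```python
import math


def sigmoid(e):
    """A common differentiable primitive function."""
    return 1.0 / (1.0 + math.exp(-e))


def node_output(inputs, weights, f=sigmoid):
    """Integrate the weighted inputs by summation, then apply f."""
    e = sum(w * x for w, x in zip(weights, inputs))
    return f(e)


def feed_forward(inputs, layers):
    """Evaluate a layered network; each layer is a list of weight vectors."""
    activations = inputs
    for layer in layers:
        activations = [node_output(activations, w) for w in layer]
    return activations


if __name__ == "__main__":
    hidden_layer = [[0.4, -0.2], [0.3, 0.8]]   # two hidden nodes, two inputs each
    output_layer = [[1.0, -1.0]]               # one output node
    print(feed_forward([1.0, 0.5], [hidden_layer, output_layer]))
```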

The network function of an ANN can be understood as a universal function approximator. However, the difference between ANNs and a Taylor or Fourier series is that the function to be approximated is given not explicitly but implicitly, through a representative set of input-output examples. It is the task of the learning algorithm to adjust the parameters of the ANN to reflect the input-output examples and to extrapolate to new input patterns in an optimal manner. The learning algorithm is an adaptive method by which the network self-organises to reflect the function to be approximated. The computational effort relates directly to the number of parameters, and therefore to the topology of the network, and increases substantially for more complicated ANNs. It was not until the proposal of *back-propagation* as a learning algorithm [Werbos, 1974] that the application of ANNs gained momentum, and it has been the most widely used algorithm for neural network learning ever since.

The back-propagation algorithm uses gradient descent on the error function of an ANN in weight space. Thus, the weights of an ANN which minimise its error function are considered to be the solution of the learning problem. As a precondition for gradient descent, the error function of an ANN needs to be continuous and differentiable. Since the ANN is simply the composition of its primitive functions, the error function becomes differentiable if the network's primitive functions are differentiable themselves.

In the back-propagation algorithm the weights of an ANN are initialised randomly. Next, the gradient of the error function is computed recursively and the weights of the ANN are adjusted accordingly using gradient descent. Because an ANN is a long chain of function compositions, the chain rule plays a central role in calculating the gradient of the network function's error. The back-propagation algorithm implements the chain rule for the recursive calculation of the gradient of the error function in weight space in a very efficient manner.

Learning in an ANN with back-propagation consists of two stages. In the first stage, the *feed-forward* step, the information progresses from the input layer through the network towards the output layer.

Each node of the network evaluates its primitive function \(f_j(e)\) and emits the result \(y_j\) to the connected nodes in the subsequent layer. Additionally, each node calculates and stores the derivative of its primitive function \(df_j(e)/de\).

The second stage, the back-propagation step, consists of reversing the flow of information through the network, whereby a unit input propagates from the output layer towards the input layer, with the activation of each neuron now being the back-propagation term \(\delta_j\).

At each node the back-propagation term \(\delta_j\) is multiplied by the stored derivative of the node's primitive function from the previous feed-forward step which gives the gradient in weight space \((d f_j(e)/de) \delta_j\).

Finally, the weights are updated using gradient descent as given by

$$

w'_{i,j} = w_{i,j} + \alpha y_{i} \frac{d f_j(e)}{de} \delta_j

$$

with \(\alpha\) being the learning rate and \(w_{i,j}\) being the weight of the feed-forward connection from neuron \(i\) in the previous layer to neuron \(j\) in the subsequent layer.
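The update rule can be tried out on the smallest possible case, a single sigmoid output node. This sketch assumes that the back-propagation term at the output node is \(\delta_j = t_j - y_j\) (target minus output), which is what makes the plus sign in the update a descent step on the squared error; the function `train_step` and the numbers used are illustrative only.

```python
import math


def sigmoid(e):
    return 1.0 / (1.0 + math.exp(-e))


def train_step(weights, y_prev, target, alpha=0.5):
    """One feed-forward pass and one weight update for a single output node."""
    # Feed-forward step: evaluate f(e) and store its derivative df/de.
    e = sum(w * y for w, y in zip(weights, y_prev))
    out = sigmoid(e)
    dfde = out * (1.0 - out)  # derivative of the sigmoid at e
    # Back-propagation term at the output node (assumed: target - output).
    delta = target - out
    # w'_{i,j} = w_{i,j} + alpha * y_i * df_j(e)/de * delta_j
    new_weights = [w + alpha * y * dfde * delta for w, y in zip(weights, y_prev)]
    return new_weights, out


if __name__ == "__main__":
    w = [0.1, -0.2]
    for step in range(1000):
        w, out = train_step(w, y_prev=[1.0, 0.5], target=0.9)
    print("output after training:", round(out, 3))
```

For hidden nodes, \(\delta_j\) would instead be accumulated from the nodes of the subsequent layer, which this single-node sketch leaves out.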

[Werbos, 1974] Beyond regression: New tools for prediction and analysis in the behavioural sciences, Ph.D. thesis, Harvard University (1974).

[Gurney, 1997] An introduction to neural networks, UCL Press (1997).

[Montavon, 1998] Neural Networks: Tricks of the Trade, Springer (1998).

## Wednesday 10 September 2014

### Motivation Over CVs

As part of our recruitment process we run the EnergyDataSimulationChallenge on GitHub, which consists of different data science and web application development tasks. Since I started the first challenge on GitHub almost one year ago, the repository has been forked 48 times and we have received 36 pull requests. We have reviewed some really great submissions and have hired seven candidates so far, all of whom have successfully contributed to our business without exception. Under our current recruitment policy, we do not review any candidate's CV before we have received and reviewed their pull request on GitHub.

As we laid out in our previous article "Talent over CVs", our motivation for running the challenges as part of our recruitment is to get a better understanding of an applicant's actual skills in data analysis, programming, and communication. Reviewing the actual code is the easiest way to assess an applicant's technical skills. This makes sense, as we are looking for people who write great code rather than great CVs. However, this is not the only insight we gain from an applicant's submission. The challenge also gives applicants the chance to showcase their motivation.

**Why Motivation?**

Why do you want to work as a data scientist or a software engineer? Is it solely for the good salary or great career opportunities? Or do you actually love data and cannot think of any greater joy than crunching some difficult analysis? Yes, that is what matters most! Your motivation is your strongest selling point and one of the most important factors for us during the recruitment process. We all love our jobs here at Cambridge Energy Data Lab and so should you!

Each member of our data science team has very strong opinions and they are all a bit stubborn. One of them is the author of the popular data science blog "The Glowing Python" and a guru in the world of data analysis in Python. They can easily spend a few hours discussing our data analysis strategy. Would you enjoy long and complex data analysis discussions with our data geeks?

Also, our developers are crazy about programming. Our CTO loves programming contests like TopCoder, and he can easily spend a whole weekend on them. Another developer has his own project: he spends almost all of his free time on "c3", a super cool JavaScript library for creating fancy charts easily. Would you enjoy programming with the code geeks for a whole day?

Heroic CTO

I don't think you can enjoy either unless you are really motivated about data science and development. We design our challenges to be like a mini-set of our data analysis and development within our actual business operations. If you love data analysis and development, and you are highly self-motivated, I am confident you will enjoy our challenges. And if you indeed enjoy our challenges I guarantee you will love to work with us as a member of the team! As long as you enjoy working with us, we are confident you will pick up necessary skills and start to contribute to our projects in no time.

**Summary**

We would like to work with people who love data science and development rather than people who have impressive careers and CVs. If you are highly motivated and passionate, you can easily catch up with the other members, and we will do our best to support you! We are looking forward to your cool submissions!

We are waiting for your challenge!

## Friday 22 August 2014

### Some insights about domestic electricity prices in the IEA countries

In this post we will provide three interactive visualizations of the latest data released by the International Energy Agency (IEA) about the domestic electricity prices*.

### Prices in 2013

In the first figure below we compare domestic electricity prices among the countries monitored by the IEA. The plot also shows which fraction of the price is represented by taxes.

In 2013, average domestic electricity prices, including taxes, in Denmark and Germany were the highest in the IEA. We also note that in Denmark the fraction of taxes paid is higher than the actual electricity price, whereas in Germany the actual electricity price and the taxes are almost the same. Interestingly, the USA has the lowest price and the lowest taxation.

### Relationship between taxes and full prices

In this figure we highlight the correlation between taxes and full prices.

Here we can see that there is a positive correlation (correlation = 0.82) between the prices with taxes and the prices without taxes. This indicates that, according to this data, when the full price increases, the taxes also increase. Hovering the pointer over the points, we can see that Germany and Denmark have the highest taxes, while the USA, the UK and Japan have the lowest. We also note that Ireland has expensive electricity and low taxes, while Norway shows the reverse trend.

### Evolution of the prices from 2010 to 2013

Here we compare the price trends of the five countries with the highest prices in 2013.

From this chart we can observe that only in 2013 did the cost of electricity for domestic consumers become very similar in Germany and Denmark, and that the Danish prices were substantially higher in the past. We can also see that prices in Italy and Ireland follow a very similar increasing trend, while prices in Austria dropped in 2012 but rose again in 2013.

\*The prices are shown in pence per kWh.

### A weather forecast accuracy study

# Can you trust the weather forecast?

It's a question that can trigger long, passionate conversations, especially in this country (UK), where talking about the weather is deeply embedded in the culture. An answer to such a question can therefore be of interest to anybody living in the UK, but also to companies like us who work constantly with weather forecast data in order to produce accurate estimates of renewable energy production. This study focuses on the daily-averaged solar radiation over the Tokyo area from November 2013 to July 2014.

First, we plot in figure 1 the actual solar radiation in kWh over the Tokyo area.

We can access weather forecasts for the day ahead, the following day, or even a week or a month in advance. The number of days separating the date the prediction was made and the date it was made for is called the forecast horizon (abbreviated fh in the following). Our dataset consists of 31 different forecasts for every day, with forecast horizons from 1 to 31. Short term forecasts (0 < fh ≤ 11) are created by weather models. These models take weather data from the past and use it to predict future weather conditions (temperature, pressure, wind, humidity…). Long term forecasts (11 < fh ≤ 31) are derived from previous years’ averages. For all these forecasts, we can compute the error by comparing the forecast solar radiation value to the actual value. We represent in figure 2 the error distribution for both long term and short term forecasts.

This figure shows us that the short term forecast performs better than the long term forecast. Now we can ask how this error varies with the forecast horizon. We therefore plot in figure 3 the MAPE (mean absolute percentage error, a measure of the accuracy) and the MPE (mean percentage error, a measure of the bias) as a function of the forecast horizon.
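For readers who want to reproduce these two metrics, here is a minimal sketch. The sign convention for the MPE (forecast minus actual, so that negative values mean under-estimation) is an assumption, as are the sample values; the actual values must be non-zero for the percentages to be defined.

```python
def mape(actual, forecast):
    """Mean absolute percentage error: a measure of accuracy."""
    errors = [abs((f - a) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def mpe(actual, forecast):
    """Mean percentage error: a measure of bias (sign convention assumed)."""
    errors = [(f - a) / a for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical daily solar radiation values, actual vs forecast.
    actual = [4.0, 5.0, 2.0]
    forecast = [3.0, 5.5, 2.0]
    print("MAPE: %.2f%%" % mape(actual, forecast))
    print("MPE:  %.2f%%" % mpe(actual, forecast))
```

Positive and negative errors cancel in the MPE but not in the MAPE, which is why the former reveals systematic over- or under-estimation while the latter measures overall accuracy.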

Figure 3: MAPE (measure of the accuracy) and MPE (measure of the bias) as a function of the forecast horizon. The error and the confidence interval decrease as the forecast horizon approaches 0.

We note that the forecast accuracy improves (the error decreases) as the forecast horizon gets closer to 0, where the error vanishes since it corresponds to actual weather values. Not only does the accuracy get better, but so does the confidence interval (the shaded regions are narrower for short term forecasts). We also confirm that the solar radiation is under-estimated by the long term forecast and slightly over-estimated by the short term one (see right-hand-side plot). This could be due to the fact that the studied period is particularly sunny compared to previous years.

However, we also notice that a 10-day weather model forecast is, on average, worse than a prediction based on previous years’ averages. We have known since Lorenz that chaotic systems (weather models are good examples of chaotic equations) are very sensitive to initial conditions. Therefore, a medium-term forecast based on a weather model has to be treated with caution.

So please, keep chatting about the weather but when it comes to planning a barbecue a week in advance, don't put all your trust in the forecast!
