Cambridge Energy Data Analysis: September 2014

Friday, 19 September 2014

6 screens for monitoring with Raspberry Pi cluster

Do you monitor?

Key performance indicators (KPIs), project progress, server status, user logs (and much more) are constantly changing, and the amount of data collected grows every day. However, it is hard to monitor these values and to share them with all members of the team.

Instead, let's display them constantly.

We had too many things to display on a single monitor. Our solution was to buy 6 monitors!

6 screens for monitoring.

In order to handle 6 monitors, we would normally have to buy a massive desktop PC and a top-end graphics card. The solution we found was to use Raspberry Pis! The Raspberry Pi is one of the smallest computers in the world, and it was born in Cambridge, UK. A single Pi's performance is modest, but the Pis are powerful enough when we combine them into a cluster.

We immediately bought 6 Raspberry Pis. The main issue was finding a rack to hold the 6 Pis. It's common to build a case from Lego; however, we didn't have any Lego lying around. The solution we found was a shoebox!

Raspberry Pis in a Nike shoe case.

Just Do It!

It looks cool: a Cambridge-style monitoring environment. Try it yourself!

Monday, 15 September 2014

Challenge for Excess Generation

Are you considering buying a photovoltaic (PV) installation for your house? Or do you already have solar panels on your roof and are looking for ways to maximise the return on this investment? Read on, as we have some important advice for you:

So far, the PV industry's growth has been driven mainly by government subsidies. Unfortunately, solar panel installations are not yet competitive without this support, because of their still limited efficiency and high cost. This is bound to change in the future. However, if you are considering investing in a domestic PV installation today, government-backed financial incentives are key to your return on investment! Here is the secret.

What is Excess Generation?

Excess generation is sometimes also referred to as surplus generation, excess electricity, or exported energy, among other terms. It is defined as the amount of electricity generated by your rooftop panels (1. Total generation) minus your daytime electricity consumption (2. Electricity used). This excess generation is what is available for export to the grid (3. Export energy).
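In practice, excess generation is computed from meter readings interval by interval. The Python sketch below assumes hypothetical half-hourly readings of generation and consumption for the same intervals; the function name `excess_generation` and the numbers are illustrative only. Each interval is clipped at zero, because when consumption exceeds generation nothing is exported.

```python
def excess_generation(generation_kwh, consumption_kwh):
    """Per-interval excess generation (kWh) available for export.

    Assumes both sequences hold readings for the same intervals
    (e.g. hypothetical half-hourly meter data). Whenever consumption
    exceeds generation nothing is exported, so each value is clipped at zero.
    """
    if len(generation_kwh) != len(consumption_kwh):
        raise ValueError("generation and consumption must cover the same intervals")
    return [max(g - c, 0.0) for g, c in zip(generation_kwh, consumption_kwh)]


# Hypothetical half-hourly readings around midday (kWh)
generation = [0.8, 1.2, 1.5, 0.6]
consumption = [0.5, 0.4, 1.8, 0.2]
exported = excess_generation(generation, consumption)
print(round(sum(exported), 3))  # 1.5
```

Working per interval matters: netting the daily totals here (4.1 kWh generated minus 2.9 kWh used) would give only 1.2 kWh, because the shortfall in the third interval is covered by importing from the grid, not by cancelling the exports of the other intervals.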

Feed-in Tariff Incentive Scheme

Feed-in tariffs (FITs) are the most widely used policy in the world for accelerating renewable energy (RE) deployment, accounting for a greater share of RE development than either tax incentives or renewable portfolio standard (RPS) policies. In the European Union (EU), FIT policies led to the deployment of more than 15,000 MW of solar photovoltaic (PV) power and more than 55,000 MW of wind power between 2000 and the end of 2009. Overall, FITs account for approximately 75% of global PV deployment.

With a grid-connected rooftop photovoltaic installation, the generated electricity can be sold to the grid at a higher price than the grid charges consumers. This arrangement provides a secure return on the installer's investment. Many consumers across the world are adopting this mechanism because of the revenue it yields. However, the details of the financial mechanism vary from country to country, as the following two examples illustrate:

Case Study 1: Disincentive for Excess Generation (UK)

In the UK, consumers have a strong incentive to minimise excess generation by using the majority of their generated electricity themselves on sunny days. UK customers receive a guaranteed feed-in tariff for all generated electricity (10-14 p/kWh), plus an 'export tariff' (4.77 p/kWh) for their excess generation. The export tariff, however, is much lower than the average retail electricity price (12-15 p/kWh). Therefore, customers should consume their generated electricity rather than export it to the grid.

So goes the theory.

In reality, however, excess generation is deemed to be a fixed 50% of PV generation, because households lack smart meters. Thus, the importance of excess generation will certainly grow with the rollout of smart meters in the near future.
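The UK trade-off can be checked with a little arithmetic. The sketch below is a hypothetical calculator, not an official tariff model: it uses the rates quoted above, picks 12 p/kWh and 13.5 p/kWh as illustrative mid-range values, and assumes a system generating 3,000 kWh per year. The `metered` flag switches between paying the export tariff on measured exports and on a deemed 50% of generation.

```python
def uk_annual_benefit(generation_kwh, self_consumed_kwh, metered=True,
                      fit_p=12.0, export_p=4.77, price_p=13.5, deemed_share=0.5):
    """Annual benefit (GBP) of a UK domestic PV installation.

    Rates from the text: generation tariff 10-14 p/kWh on ALL generation,
    export tariff 4.77 p/kWh, retail electricity price 12-15 p/kWh.
    fit_p=12.0 and price_p=13.5 are hypothetical mid-range picks.
    Without a smart meter (metered=False), export is deemed to be
    deemed_share of generation, whatever was actually exported.
    """
    if metered:
        exported_kwh = generation_kwh - self_consumed_kwh
    else:
        exported_kwh = deemed_share * generation_kwh
    pence = (fit_p * generation_kwh          # generation tariff
             + export_p * exported_kwh       # export tariff
             + price_p * self_consumed_kwh)  # electricity bill avoided
    return pence / 100.0


# Hypothetical system generating 3,000 kWh/year, using 30% or 60% on site
for share in (0.3, 0.6):
    used = 3000 * share
    print(f"{share:.0%} self-consumed: "
          f"metered £{uk_annual_benefit(3000, used):.2f}, "
          f"deemed £{uk_annual_benefit(3000, used, metered=False):.2f}")
```

With metered export, every kWh moved from export to on-site use is worth the retail price minus the export tariff (13.5 minus 4.77, i.e. 8.73 p with these defaults), which is why the theory favours minimising excess generation. Under deemed export, the export payment is the same whatever is actually exported, so actual excess generation has no effect on the payment.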

Case Study 2: Incentive for Excess Generation (JAPAN)

In Japan, FiTs are paid only for 'Excess Generation', not for 'Total Generation' as is the case in the UK. The FiT price is currently much higher (38-42 JPY per kWh) than the average electricity price (20-25 JPY per kWh), so customers have a strong financial incentive to maximise their excess generation. As a result, customers are willing to change their consumption behaviour by shifting the use of electricity-heavy appliances, such as dishwashers and washing machines, to the night-time, when the electricity tariff is cheaper. This individual behavioural change is expected to contribute to a nationwide peak reduction in the future.
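The Japanese incentive points the other way. In this hypothetical sketch, the FiT (40 JPY/kWh) and the retail price (22.5 JPY/kWh) are illustrative mid-range values from the ranges quoted above, and the 4,000 kWh/year system is an assumption; `japan_annual_benefit` is not a real tariff API.

```python
def japan_annual_benefit(generation_kwh, self_consumed_kwh,
                         fit_jpy=40.0, price_jpy=22.5):
    """Annual benefit (JPY) when the FiT is paid only on excess generation.

    Rates from the text: FiT 38-42 JPY/kWh, retail price 20-25 JPY/kWh.
    The defaults are hypothetical mid-range picks.
    """
    excess_kwh = generation_kwh - self_consumed_kwh
    return (fit_jpy * excess_kwh               # FiT on exported surplus only
            + price_jpy * self_consumed_kwh)   # electricity bill avoided


# Hypothetical 4,000 kWh/year system: 40% vs 20% of it used during the day
for share in (0.4, 0.2):
    print(share, japan_annual_benefit(4000, 4000 * share))
```

With these values, every kWh exported instead of used on site is worth 40 minus 22.5, i.e. 17.5 JPY more. The displaced load still has to be bought from the grid, so for a load shifted to the night the net gain per kWh is the FiT minus the night-time tariff.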

Our Challenge for Excess Generation

We are developing Eneberg, domestic PV generation forecasting and trading software. Eneberg deals mainly with aggregated "Excess Generation". Whilst there is a vast body of research and models on PV generation and energy demand, "Excess Generation" is still an open frontier. Our aim is to pioneer this new field.

An in-depth analysis of excess generation is already covered in this post: Energy Surplus Trends from Domestic UK Solar Panels in October 2013 to January 2014

Thursday, 11 September 2014

Artificial Neural Networks

An artificial neural network (ANN) is a computational model inspired by the information processing functionality of the brain. But how does the brain compute?

Generally, the central elements of computation are processing, transmission, and storage. Within the brain, the neuron is the central computing element: neurons receive signals and produce responses. The transmission of information at the neural level involves electrical signals (so-called action potentials, which rely broadly on ions and semi-permeable membranes) and chemical signals at the synapses. In the brain, the storage of information corresponds to learning, which occurs at the synapses. The synapses sit at the interface between neurons and regulate the transmission of information from neuron to neuron.

An ANN broadly follows the processing paradigm of biological neural networks, with the nodes of the ANN being the central computing elements, analogous to neurons. In fact, ANNs are nothing but networks of primitive functions in which a chain of function compositions transforms an input into an output. The function computed by the model is contained implicitly in the interconnections of the nodes and is referred to as the network function. Each node comprises a primitive function that transforms its input into an output.

Typically, each input $x_i$ of a node has an associated weight $w_i$ by which it is multiplied. The node integrates all its inputs, usually by summing the weighted inputs, and then evaluates its primitive function $f$. The primitive function $f$ computed in the node can be any function, but common choices are differentiable functions such as the sigmoid function. Models of ANNs differ mainly in their choice of primitive function and network topology, and more rarely in the timing of the evaluation of the primitive function. In feed-forward ANNs, the network is composed of distinct layers in which each neuron receives input only from neurons of the previous layer. Accordingly, a feed-forward network has distinct input and output layers, with the intermediate layers being referred to as hidden layers.
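To make the node model concrete, here is a minimal Python sketch, assuming a sigmoid primitive function and no bias terms (the post does not discuss biases). The names `node` and `feed_forward` and the example weights are hypothetical. Each node sums its weighted inputs and applies $f$; a feed-forward network applies one layer after another, so the network function is the composition of the nodes' primitive functions.

```python
import math


def sigmoid(e):
    """A common differentiable choice for the primitive function f."""
    return 1.0 / (1.0 + math.exp(-e))


def node(inputs, weights, f=sigmoid):
    """One node: integrate the weighted inputs by summation, then apply f."""
    e = sum(w * x for w, x in zip(weights, inputs))
    return f(e)


def feed_forward(x, layers):
    """Evaluate a feed-forward network layer by layer.

    layers[k][j] holds the weights of node j in layer k, one weight per
    output of the previous layer; each node sees only the previous layer.
    """
    y = list(x)
    for weights in layers:
        y = [node(y, w_j) for w_j in weights]
    return y


# Hypothetical 2-2-1 network: 2 inputs, 2 hidden nodes, 1 output node
layers = [
    [[0.5, -0.4], [0.3, 0.8]],  # hidden layer
    [[1.0, -1.0]],              # output layer
]
print(feed_forward([1.0, 0.0], layers))
```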

(A second class of ANNs are recurrent networks where connections between nodes form directed cycles.)

The network function of an ANN can be understood as a universal function approximator. However, unlike a Taylor or Fourier series, where the function to be approximated is given explicitly, an ANN is given the function only implicitly, through a representative set of input-output examples. It is the task of the learning algorithm to adjust the parameters of the ANN to reproduce these examples and to generalise as well as possible to new input patterns. The learning algorithm is an adaptive method by which the network self-organises to reflect the function to be approximated. The computational effort is directly related to the number of parameters, and therefore to the topology of the network, and increases substantially for more complex ANNs. It was not until back-propagation was proposed as a learning algorithm [Werbos, 1974] that the application of ANNs gained momentum, and it has been the most widely used algorithm for neural network learning ever since.

The back-propagation algorithm uses gradient descent on the error function of an ANN in weight space. Thus, the weights that minimise the error function are considered the solution of the learning problem. As a precondition for gradient descent, the error function of an ANN needs to be continuous and differentiable. Since the ANN is simply a composition of its primitive functions, the error function is differentiable if the network's primitive functions are themselves differentiable.
In the back-propagation algorithm, the weights of the ANN are first initialised randomly. Next, the gradient of the error function is computed recursively, and the weights of the ANN are adjusted accordingly using gradient descent. Because an ANN is a long chain of function compositions, the chain rule plays a central role in calculating the gradient of the network's error. The back-propagation algorithm applies the chain rule recursively to compute the gradient of the error function in weight space very efficiently.
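The role of the chain rule can be seen in the smallest possible case: a single sigmoid node with the squared error $E = \frac{1}{2}(t - y)^2$, where $t$ is the target. The sketch below is illustrative; the error function, the function names, and the numbers are assumptions. It computes $\partial E / \partial w_i$ by the chain rule, checks the result against a finite-difference estimate, and then runs plain gradient descent.

```python
import math


def sigmoid(e):
    return 1.0 / (1.0 + math.exp(-e))


def error(w, x, t):
    """Squared error E = 1/2 (t - y)^2 of one sigmoid node, y = sigmoid(w . x)."""
    y = sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)))
    return 0.5 * (t - y) ** 2


def gradient(w, x, t):
    """dE/dw_i by the chain rule: dE/dy * dy/de * de/dw_i."""
    y = sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)))
    dE_dy = -(t - y)
    dy_de = y * (1.0 - y)  # derivative of the sigmoid, expressed through y
    return [dE_dy * dy_de * x_i for x_i in x]  # de/dw_i = x_i


def numerical_gradient(w, x, t, h=1e-6):
    """Central finite differences, used here only to check gradient()."""
    grad = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += h
        w_minus[i] -= h
        grad.append((error(w_plus, x, t) - error(w_minus, x, t)) / (2 * h))
    return grad


# Hypothetical training example (input x, target t) and initial weights
x, t = [1.0, 2.0], 1.0
w = [0.2, -0.5]
print(gradient(w, x, t), numerical_gradient(w, x, t))

# Gradient descent: repeatedly step against the gradient with learning rate alpha
alpha = 0.5
for _ in range(100):
    w = [w_i - alpha * g_i for w_i, g_i in zip(w, gradient(w, x, t))]
print(error(w, x, t))
```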

Learning in an ANN with back-propagation consists of two stages. In the first stage, the feed-forward step, information propagates from the input layer through the network towards the output layer.
Each node of the network evaluates its primitive function $f_j(e)$, where $e$ is the node's integrated input, and emits the result $y_j$ to the connected nodes in the subsequent layer. Additionally, each node calculates and stores the derivative of its primitive function, $df_j(e)/de$.

The second stage, the back-propagation step, reverses the flow of information through the network: a unit input propagates from the output layer back towards the input layer, and the activation of each neuron is now the back-propagation term $\delta_j$.
At each node, the back-propagation term $\delta_j$ is multiplied by the derivative of the node's primitive function stored during the feed-forward step, giving $(d f_j(e)/de)\,\delta_j$. Multiplied further by the input $y_i$ arriving over the connection $w_{i,j}$, this term yields the gradient component that determines the update of $w_{i,j}$.
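Written out in the notation of this post, and assuming the squared error $E = \frac{1}{2}\sum_j (t_j - y_j)^2$ over the output nodes with targets $t_j$ (an assumption, since the post does not fix the error function), the back-propagation terms and the gradient are:

$$
\delta_j = t_j - y_j \quad \text{(output layer)}, \qquad
\delta_i = \sum_j w_{i,j} \, \frac{d f_j(e)}{de} \, \delta_j \quad \text{(hidden layers)},
$$

$$
-\frac{\partial E}{\partial w_{i,j}} = y_i \, \frac{d f_j(e)}{de} \, \delta_j .
$$

With $\delta_j$ defined as $t_j - y_j$, this term is the negative gradient, which is why the update rule adds it to the weight.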

Finally, the weights are updated using gradient descent as given by
$$
w'_{i,j} = w_{i,j} + \alpha y_{i} \frac{d f_j(e)}{de} \delta_j
$$
with $\alpha$ being the learning rate and $w_{i,j}$ being the weight of the feed-forward connection from neuron $i$ in the previous layer to neuron $j$ in the subsequent layer.
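The two stages and the update rule can be combined into a short, self-contained Python sketch. It is an illustration under stated assumptions, not a reference implementation: it uses a sigmoid primitive function, a squared error so that the output-layer term is $\delta_j = t_j - y_j$ (with $t_j$ the target), per-example updates, and a hypothetical 3-3-1 network trained on logical OR, where a constant third input of 1.0 stands in for a bias. All $\delta_j$ are computed with the current weights before any weight is changed.

```python
import math
import random


def sigmoid(e):
    return 1.0 / (1.0 + math.exp(-e))


def forward(x, weights):
    """Feed-forward step: outputs y_j of every layer plus stored df_j(e)/de.

    weights[k][i][j] is w_{i,j}, the connection from neuron i of layer k
    to neuron j of layer k + 1 (layer 0 is the input layer).
    """
    ys, dfs = [list(x)], []
    for W in weights:
        prev = ys[-1]
        e = [sum(prev[i] * W[i][j] for i in range(len(prev)))
             for j in range(len(W[0]))]
        y = [sigmoid(e_j) for e_j in e]
        ys.append(y)
        dfs.append([y_j * (1.0 - y_j) for y_j in y])  # sigmoid'(e) = y (1 - y)
    return ys, dfs


def backward(target, weights, ys, dfs):
    """Back-propagation step: delta_j for every layer after the input layer."""
    # Assumption: squared error, so the output-layer term is t_j - y_j.
    deltas = [[t - y for t, y in zip(target, ys[-1])]]
    for k in range(len(weights) - 1, 0, -1):
        W, df, d = weights[k], dfs[k], deltas[0]
        # Multiply each delta by the stored derivative and send it back along w_{i,j}.
        deltas.insert(0, [sum(W[i][j] * df[j] * d[j] for j in range(len(d)))
                          for i in range(len(W))])
    return deltas


def update(weights, ys, dfs, deltas, alpha):
    """Apply w'_{i,j} = w_{i,j} + alpha * y_i * df_j(e)/de * delta_j in place."""
    for k, W in enumerate(weights):
        for i in range(len(W)):
            for j in range(len(W[i])):
                W[i][j] += alpha * ys[k][i] * dfs[k][j] * deltas[k][j]


def squared_error(data, weights):
    total = 0.0
    for x, t in data:
        y = forward(x, weights)[0][-1]
        total += 0.5 * sum((t_j - y_j) ** 2 for t_j, y_j in zip(t, y))
    return total


# Hypothetical 3-3-1 network learning logical OR (third input is a constant 1.0)
random.seed(0)
weights = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)],
           [[random.uniform(-1, 1)] for _ in range(3)]]
data = [([0, 0, 1], [0]), ([0, 1, 1], [1]), ([1, 0, 1], [1]), ([1, 1, 1], [1])]

before = squared_error(data, weights)
for _ in range(2000):
    for x, t in data:
        ys, dfs = forward(x, weights)
        update(weights, ys, dfs, backward(t, weights, ys, dfs), alpha=0.5)
after = squared_error(data, weights)
print(f"error before: {before:.4f}, after: {after:.4f}")
```

Comparing the update terms $y_i \, (df_j(e)/de) \, \delta_j$ with finite differences of the squared error is a quick way to confirm that such an implementation really follows the gradient.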

[Werbos, 1974] Beyond regression: New tools for prediction and analysis in the behavioural sciences, Ph.D. thesis, Harvard University (1974).
[Gurney, 1997] An introduction to neural networks, UCL Press (1997).
[Montavon, 1998] Neural Networks: Tricks of the Trade, Springer (1998).

Wednesday, 10 September 2014

Motivation Over CVs

As part of our recruitment process, we run the EnergyDataSimulationChallenge on GitHub, which consists of various data science and web application development tasks. Since I started the first challenge almost one year ago, the repository has been forked 48 times and has received 36 pull requests. We have reviewed some really great submissions and have hired seven candidates so far, all of whom have contributed successfully to our business without exception. Under our current recruitment policy, we do not review any candidate's CV before we have received and reviewed their pull request on GitHub.

A lot of challenges :)

As we laid out in our previous article "Talent over CVs", our motivation for running the challenges as part of our recruitment is to get a better understanding of an applicant's actual skills in data analysis, programming, and communication. Reviewing actual code is the easiest way to assess an applicant's technical skills. This makes sense, as we are looking for people who write great code rather than great CVs. However, this is not the only insight we gain from an applicant's submission: the challenge also gives applicants the chance to showcase their motivation.

Why Motivation?

Why do you want to work as a data scientist or a software engineer? Is it solely for the good salary or great career opportunities? Or do you actually love data, and can you not think of any greater joy than crunching through a difficult analysis? Yes, that is what matters most! Your motivation is your strongest selling point and one of the most important factors for us during the recruitment process. We all love our jobs here at Cambridge Energy Data Lab, and so should you!

Every member of our data science team has strong opinions, and they are all a bit stubborn. One of them is the author of the popular data science blog "The Glowing Python", a guru of data analysis in Python. They can easily spend a few hours discussing our data analysis strategy. Would you enjoy long and complex data analysis discussions with our data geeks?

Stubborn Data Geek

Our developers are also crazy about programming. Our CTO loves programming contests like TopCoder, and he can easily spend a whole weekend on them. Another developer has his own project: he spends almost all of his free time on "c3", a super cool JavaScript library for creating fancy charts easily. Would you enjoy programming with our code geeks for a whole day?

Heroic CTO

I don't think you can enjoy either unless you are truly passionate about data science and development. We design our challenges as a miniature version of the data analysis and development work in our actual business operations. If you love data analysis and development and are highly self-motivated, I am confident you will enjoy our challenges. And if you do enjoy them, I guarantee you will love working with us as a member of the team! As long as you enjoy working with us, we are confident you will pick up the necessary skills and start contributing to our projects in no time.

Summary

We would like to work with people who love data science and development rather than people who have impressive careers and CVs. If you are highly motivated and passionate, you can easily catch up with the other members, and we will do our best to support you! We look forward to your cool submissions!

We are waiting for your challenge!
