
Project Friday 1.3: Artificial Intelligence meets coffee

Last Friday, the third afternoon of Project Friday took place. In Project Friday we spend an afternoon, roughly once a month, on something completely useless. Why do we do this? Because we can, it's fun and interesting, and it's a good reason to grab a couple of beers. This Project Friday is all about mixing espresso machines with Artificial Intelligence: adding facial recognition to the machine so that you don't need to push a button to get your favorite coffee.

In the previous afternoons (day 1 and day 2), we installed everything on the Raspberry Pi (which was quite a hassle), learned what relays are and how to use them, soldered the first buttons and were able to control these buttons from the computer. This Friday we set ourselves the goal of getting real-time face detection working on the Pi camera.

As the weather was warm and sunny, we decided it was better to leave the office and pursue our goal in a more suitable environment, somewhere the Raspberry Pi (and we ourselves) wouldn't overheat. So we drove to my place and settled in the garden, brought a television outside, hooked up the Raspberry Pi and off we went coding. An additional benefit was the BBQ!

Astonishingly enough, it wasn't only the Raspberry Pi that had trouble with the warm weather; so did our cognitive capabilities. The move didn't make us more productive (but was still the right choice in this kind of weather). Progress was therefore slow, and we were at half strength as three team members were on holiday. We did have some success: we managed to get the pre-trained Haar cascades for detecting faces working, as well as our own trained cascade that detects a middle finger when you flip it. We didn't get as far as basic face recognition, though, so we'll leave that for the next afternoon!



Project Friday 1.2: Artificial Intelligence meets coffee

A month ago we started with 'Project Friday': once a month we lock some colleagues in a room with a couple of beers and a fun project. This project: give a coffee machine a brain and an eye, so it can see who's standing in front of the machine and knows what he/she wants. Why do we do this? Well, because it's fun.

This Friday we made some significant progress. First of all, we didn't short-circuit the Arduino board, which we accomplished within the first ten minutes last time. We managed to reverse engineer some of the buttons on the circuit board of the coffee machine and, after learning about transistors and relays, we managed to "push" a few of the machine's buttons using the Arduino. We bought a cheap soldering iron around the corner from the office and soldered the first three buttons.

Meanwhile, on the Raspberry Pi we succeeded in extracting faces from the images using Haar cascades; when multiple faces are detected, it selects the biggest one. Face recognition consists of two parts: detection (where in the image/frame is a face?) and recognition (whose face is it?). The first part works; for the second part we had some trouble installing libraries, which apparently takes a long time!
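The "pick the biggest face" step itself is nothing more than comparing bounding-box areas. A minimal sketch in R, assuming the detector has already returned its bounding boxes as a data frame with x, y, width and height columns (the numbers below are made up; the actual detection ran through OpenCV's Haar cascades on the Pi):

# hypothetical detector output: one row per detected face
detections <- data.frame(
  x = c(40, 210, 330),
  y = c(60, 90, 120),
  width = c(80, 150, 60),
  height = c(80, 150, 60)
)
# area of every bounding box
detections$area <- detections$width * detections$height
# keep only the biggest face, presumably the person closest to the camera
biggest_face <- detections[which.max(detections$area), ]
biggest_face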

Next time, we're going to solder the rest of the buttons and program the Arduino, create training data by taking pictures of ourselves, and try to train the first algorithms.



R Experience @ University of Groningen

This Monday was the last day of the R Experience course at the University of Groningen. As of this year, Marketing (Intelligence) students are required to use R for their assignments, whereas before they had only worked with SPSS. To kickstart their learning curve in R, we taught the R Experience course in cooperation with the MARUG (the Marketing Association of the University of Groningen).

The course is intended for analysts who already know the statistical and modelling theories and principles. The goal of the course is to learn how to do this in R, instead of using tools like SPSS or SAS. In five sessions, we showed how to work with RStudio and how to write code.

They should make this course mandatory for the Marketing Intelligence programme

The first module started with an entry-level introduction to writing code. In the following module we touched on how RStudio works, how to import data and how to use functions. The next modules covered data manipulation and preparation (such as missing values, outliers, correlations and near-zero variance). In the last sessions we showed how to use the caret package to train lots of different kinds of models with only minor changes to the code.
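To give an idea of what that looks like, here is a minimal caret sketch (not the actual course material; it uses the built-in iris data purely as an example) in which switching algorithms only requires changing the method argument:

library(caret)

# reuse the same 5-fold cross-validation setup for every model
ctrl <- trainControl(method = "cv", number = 5)

# train a random forest...
set.seed(42)
rf_model <- train(Species ~ ., data = iris, method = "rf", trControl = ctrl)

# ...and a gradient boosting machine with the same call; only 'method' changes
set.seed(42)
gbm_model <- train(Species ~ ., data = iris, method = "gbm", trControl = ctrl, verbose = FALSE)

# compare the cross-validated performance of both models
summary(resamples(list(rf = rf_model, gbm = gbm_model)))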

Very clear explanation and definitely added value for every Marketing Intelligence Student

We were amazed by the enthusiasm of the group: students were on Facebook, but instead of looking at their timelines they used it to send code to each other via Messenger. The reactions in the feedback forms were also very positive. With an average grade of 8.5/10 and not a single student who wouldn't recommend the course to fellow students, we consider the course a great success. Next year, we'll definitely do this again!


Tweede Kamer Elections Hackathon – The Aftermovie!

On Friday 10 March, the second "The Analytics Lab Hackathon" took place. This time the assignment was to predict the distribution of seats in the Tweede Kamer (the Dutch House of Representatives) after the elections of 15 March.

Thanks to the serious, sportsmanlike and also humorous efforts of our participating teams, it was once again a very enjoyable day, in which everyone wanted to win but taking part mattered most of all. We would like to thank all participants for their presence and their contribution to this great day!

Winner of the Elections Hackathon

 

After the provisional election results were announced on Thursday morning, we could declare the ANWB team the winner, closely followed by the Essent – Vrouwen Voor Data team. The ANWB took home the prize with an RMSE (root mean square error) of just 5.29.
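For reference, the RMSE was calculated over the predicted versus the actual number of seats per party. A minimal sketch in R (the predicted seat numbers below are illustrative, not a team's actual submission):

# illustrative predicted seats versus the actual 2017 results for a handful of parties
predicted <- c(VVD = 30, PVV = 23, CDA = 17, D66 = 17, SP = 13)
actual    <- c(VVD = 33, PVV = 20, CDA = 19, D66 = 19, SP = 14)

# root mean square error over the seat distribution
rmse <- sqrt(mean((predicted - actual)^2))
rmse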

As far as we're concerned: on to next year, on to the next hackathon!


Project Friday 1.1: Artificial Intelligence meets coffee

This Friday we started with our first 'Project Friday'. About once a month, on a Friday, we'll lock ourselves away for the afternoon with a couple of beers and a fun project. The project doesn't need to bring money to the table; it needs to bring fun, challenges, knowledge and inspiration to the table.

Our first project is definitely an awesome one: Artificial Intelligence meets Coffee! Douwe Egberts was kind enough to provide us with a fully automatic coffee machine. Well, almost fully automatic: you still need to push a button to let the machine know whether you want a regular coffee, a cappuccino, an espresso or whatever grinds your gears. Our objective: create a truly fully automatic coffee machine. The goal is to add face recognition to the coffee machine: stand in front of the machine, the machine recognizes who you are and knows which coffee you want.

We started by prepping a Raspberry Pi and installing OpenCV, then dismantled the machine and figured out how the circuit board works using the well-proven method of trial and error. Since we're not engineers, the biggest challenge is probably going to be hacking the coffee machine: within 15 minutes, one colleague managed to short-circuit the Arduino board. Great start!

 


Python & R vs. SPSS & SAS

When working for clients we mostly come across the statistical programming languages SAS, SPSS, R and Python. Of these, SAS and SPSS are probably the most widely used. However, interest in the open-source languages R and Python is increasing. In recent years, some of our clients have migrated from SAS or SPSS to R and/or Python. And even if they haven't (yet), most commercial software packages (including SAS and SPSS) nowadays make it possible to connect to R and Python.

SAS was developed at North Carolina State University, primarily to analyse large quantities of agricultural data. The abbreviation SAS stands for Statistical Analysis System. In 1976 the company SAS was founded as the demand for such software increased. The Statistical Package for the Social Sciences (SPSS) was developed for the social sciences and was the first statistical programming language for the PC. It was developed in 1968 at Stanford University, and eight years later the company SPSS Inc. was founded, which was bought by IBM in 2009.

In 2000 the University of Auckland released the first version of R, a programming language primarily focused on statistical modeling, open-sourced under the GNU license. Python is the only one of the four that was not developed at a university. Python was created by a Dutchman, Guido van Rossum, who is a big fan of Monty Python (which is where the name comes from). He needed a project over Christmas and created the language, basing it on ABC, a language he had also worked on, designed to teach non-programmers how to program. Python is a multi-purpose language, like C++ and Java, with the big difference, and advantage, that Python is far easier to learn. Programmers kept building on it and created lots of modules on top of Python, so it now offers a wide range of statistical modeling capabilities. That's why Python definitely belongs in this list.

In this article, we compare the four languages on methods and techniques, ease of learning, visualisation, support and costs. We explicitly focus on the languages; the user interfaces SAS Enterprise Miner and SPSS Modeler are out of scope.


Statistical methods and techniques

My view on Data Analysis is that there is a continuum between explanatory models on one side and predictive models on the other. The decisions you make during the modeling process depend on your goal. Take customer churn as an example: you can ask yourself why customers are leaving, or you can ask which customers are leaving. The first question has explaining churn as its primary goal, while the second has predicting churn as its primary goal. These are two fundamentally different questions, and this has implications for the decisions you take along the way. The predictive side of Data Analysis is closely related to terms like Data Mining and Machine Learning.

Both SPSS and SAS originate from the explanatory side of Data Analysis. They were developed in an academic environment in which hypothesis testing plays a major role. As a result, they offer significantly fewer methods and techniques than R and Python. Nowadays SAS and SPSS both have data mining tools (SAS Enterprise Miner and SPSS Modeler), but these are separate tools for which you need extra licenses.

One of the major advantages of open-source tooling is that the community continuously improves and extends its functionality. R was created by academics who wanted their algorithms to spread as easily as possible. As a result, R has the widest range of algorithms, which makes it strong on both the explanatory and the predictive side of Data Analysis.

Python was developed with a strong focus on (business) applications, not from an academic or statistical standpoint. This makes Python very powerful when algorithms are used directly in applications, and its statistical capabilities are therefore primarily focused on the predictive side. Python is mostly used in Data Mining or Machine Learning applications where a data analyst doesn't need to intervene. Python is also strong in analysing images and videos; for example, this summer we used Python to build our own self-driving RC car. Python is also the easiest language to use with big data frameworks like Spark.

Ease of learning

Both SPSS and SAS have a comprehensive user interface, with the consequence that a user doesn't necessarily need to write code. Furthermore, SPSS has a paste function which generates syntax from steps executed in the user interface, and SAS has PROC SQL, which makes SAS coding a lot easier for people who know the SQL query language. Syntactically, SAS and SPSS code are far from similar to each other and also very different from other relevant programming languages, so if you need to learn one of them from scratch: good luck!

Although there are GUI alternatives for R, like Rattle, they don't come close to SAS or SPSS in terms of functionality. R is easy to learn for programmers, but a lot of analysts don't have a background in programming. R has the steepest learning curve of them all; it's the most difficult one to start with, but once you get the basics, it quickly gets easier. For this specific reason, we've created an R course, called Experience R, which kickstarts (aspiring) data analysts and data scientists in learning R. Python is based on ABC, which was developed with the sole purpose of teaching non-programmers how to program. Readability is one of the key features of Python, which makes it the easiest language to learn. As Python is so broad, there are no GUIs for Python.

To conclude: in terms of ease of learning, SPSS and SAS are the best option for starting analysts, as they provide tools in which the user doesn't need to program.

Support

Both SAS and SPSS are commercial products and therefore come with official support. This motivates some companies to choose these languages: if something goes wrong, they have support to fall back on.

There is a misconception about the support for open-source tooling. It's true that there is no official support from the creators or owners; nonetheless, there's a large community for both languages that is more than willing to help you solve your problem. And 99 times out of 100 (if not more often), your question has already been asked and answered on sites like Stack Overflow. On top of that, there are numerous companies that do provide professional support for R and Python. So, although there's no official support for R and Python, in practice we see that if you've got a question, you'll likely get your answer sooner when it's about R or Python than when it's SAS or SPSS related.

Visualisation

The graphical capabilities of SAS and SPSS are purely functional; although it is possible to make minor changes to graphs, fully customizing your plots and visualizations in SAS and SPSS can be very cumbersome or even impossible. R and Python offer far more opportunities to customize and optimize your graphs thanks to the wide range of modules that are available. The most widely used package for R is ggplot2, which offers a wide set of graphs in which you can adjust practically everything. These graphs are also easily made interactive, which allows users to play with the data through applications like Shiny.
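As a small illustration of that flexibility, a sketch using R's built-in mtcars data (not taken from any client work):

library(ggplot2)

# scatter plot of car weight versus fuel consumption, coloured by number of cylinders
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Fuel consumption by weight",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Cylinders") +
  theme_minimal()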

Python and R have learned (and still learn) a lot from each other. One of the best examples of this is that Python also has a ggplot module, which has practically the same functionality and syntax as it does in R. Another widely used module for visualisation in Python is Matplotlib.

Costs

R and Python are open source, which makes them freely available to everybody. The downside is that, as discussed before, these languages are harder to learn than starting out with the SAS or SPSS GUI. As a result, analysts with R and/or Python in their skill set command higher salaries than analysts without. Educating employees who are currently not familiar with R and/or Python costs money as well. So in practice the open-source programming languages aren't completely free of costs either, but when you compare them with the license fees for SAS or SPSS, the business case is easily made: R and Python are way cheaper!

My choice

“Software is like sex, it’s better when it’s free” – Linus Torvalds (creator of Linux)

My go-to tools are R and Python. I can use these languages everywhere without having to buy licenses, and I don't have to wait for licenses either; time is key in my job as a consultant. Aside from licensing, probably the main reason is the wide range of statistical methods: I can use any algorithm out there and pick the one that best suits the challenge at hand.

Which of the two languages I use depends on the goal, as mentioned above. Python is a multi-purpose language developed with a strong focus on applications, which makes it strong in Machine Learning applications; hence I use Python for, say, face or object recognition and deep learning applications. I use R for goals related to customer behaviour, where the explanatory side also plays a major role: if I know which customers are about to churn, I would also like to know why.

The two languages are to a large extent complementary. There are R libraries that allow you to run Python code (reticulate, rPython), and there are Python modules that allow you to run R code (rpy2). This makes the combination of the two even stronger.
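A small sketch of what that interplay can look like from the R side, using the reticulate package (this assumes Python and NumPy are installed on the machine):

library(reticulate)

# import a Python module and call it directly from R
np <- import("numpy")
m <- np$array(matrix(1:6, nrow = 2))
np$mean(m)

# or run a snippet of Python code and pull the result back into R
py_run_string("squares = [x ** 2 for x in range(5)]")
py$squares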


Jeroen Kromme, Senior Consultant Data Scientist

 


The votes have been counted…

Despite the heavy extra workload for the vote counters and the resulting delay of the final election results, we, as the hackathon's vote counters, have already managed to single out a winner.

It turned out to be an epic battle, both in the arena of the Tweede Kamer elections and in the hackathon arena. Where the former mainly revolves around positions, credibility and charisma, the latter revolves around one thing only: hard facts!


Asscher and the MARUG had bet heavily on the PvdA. Unfortunately that didn't pan out, resulting in a historic low for the PvdA and, sadly, no win for the MARUG. Thanks to Erdogan, Rutte made a nice final sprint in the last few days before the elections; the VVD still lost seats, but compared with the polls and many of the hackathon predictions it was a good day for the VVD after all! Wilders, despite all his efforts not to be, turned out to be very predictable: the hackathon predictions were all very close to the number of seats the PVV actually won.


But now, on to the news everyone has been waiting for…
THE WINNER!

The big battle on the hackathon stage was between Essent and the ANWB, and we can announce that the ANWB may consider itself the VVD of these elections and has come out as the winner!


Congratulations, ANWB!

 

On behalf of The Analytics Lab and Cmotions, we, as the hackathon crew, would like to warmly thank the participants for their humorous yet serious and sportsmanlike efforts during the hackathon. We once again thoroughly enjoyed organising this event and hope to welcome all participants back at the next hackathon!



Happy pi day!

Just something funny because it's pi day. Enjoy!

# clear your environment
rm(list = ls())
# load the necessary libraries
library(png)
library(plotrix)
# lab colors: oranje (orange), donkergrijs (dark grey), lichtblauw (light blue)
oranje <- rgb(228/255, 86/255, 65/255)
donkergrijs <- rgb(75/255, 75/255, 74/255)
lichtblauw <- rgb(123/255, 176/255, 231/255)
# read the image of pi
img <- readPNG("C:/Users/j.schoonemann/Desktop/pi.png")
# read the logo of The Analytics Lab
logo <- readPNG("C:/Users/j.schoonemann/Desktop/Lab.png")
# define the x-position of the pie charts
x_position <- c(2, 4, 8, 14, 22)
# define the y-position of the pie charts
y_position <- c(4, 6, 8, 10, 12)
# define the size of the pie charts
pie_size <- c(0.5,1.0,1.5,2.0,2.5)
# slice data for the PacMan pie-charts: a 20% "mouth" slice and an 80% "body" slice
pacman <- list(c(20,80), c(20,80), c(20,80), c(20,80), c(20,80))
# calculate the chart limits for the x-axis
x_axis <- c(min(x_position - pie_size), max(x_position + pie_size))
# calculate the chart limits for the y-axis
y_axis <- c(min(y_position - pie_size),max(y_position + pie_size))
# define the colors of the PacMan pie-charts 
sector_col<- c("black", "yellow")
# define the start position of the first slice of each pie chart
start_position <- c(-0.1, -0.2, -0.3, -0.4, -0.5)
# create the canvas for the plot
plot(0, xlim = x_axis, ylim = y_axis, type = "n", axes = F, xlab = "", ylab = "")
# add a title and subtitle to the plot, adjust size and color
title(main = "Eating Pi makes PacMan grow!\nHappy pi(e) day!", col.main = lichtblauw, cex.main = 2, 
 sub = "Powered by: The Analytics Lab", col.sub = oranje, cex.sub = 1)
# plot all the PacMan pie-charts
for(bubble in 1:length(x_position)){ 
 floating.pie(xpos = x_position[bubble], ypos = y_position[bubble], x = pacman[[bubble]], radius = pie_size[bubble], col = sector_col, startpos = start_position[bubble]) 
}
# add the logo of The Analytics Lab to the plot
rasterImage(image = logo, xleft = 0, ybottom = 12, xright = 5, ytop = 16)
# add pi multiple times to the plot
# pi between 1st and 2nd
rasterImage(image = img, xleft = 2.5, ybottom = 4.5, xright = 3.5, ytop = 5)
# pi between 2nd and 3rd
rasterImage(image = img, xleft = 5, ybottom = 6.5, xright = 6, ytop = 7)
rasterImage(image = img, xleft = 5.8, ybottom = 7, xright = 6.8, ytop = 7.5)
# pi between 3rd and 4th
rasterImage(image = img, xleft = 10, ybottom = 8.5, xright = 11, ytop = 9)
rasterImage(image = img, xleft = 11, ybottom = 9, xright = 12, ytop = 9.5)
# pi between 4th and 5th
rasterImage(image = img, xleft = 16.2, ybottom = 10, xright = 17.2, ytop = 10.5)
rasterImage(image = img, xleft = 17, ybottom = 10.5, xright = 18, ytop = 11)
rasterImage(image = img, xleft = 18, ybottom = 11, xright = 19, ytop = 11.5)


 


Elections Hackathon a great success

Last week, several media outlets reported that 75 percent of all eligible voters were still undecided between two or more parties ahead of next Wednesday's Tweede Kamer elections. With that in mind, the more than 60 participants of the Elections Hackathon, organised by The Analytics Lab and Cmotions, travelled to Utrecht last Friday to take part in the second edition of this event. Eleven teams, from the financial sector and the energy, automotive and insurance industries among others, tried to predict the seat distribution of the upcoming elections as accurately as possible, despite the considerable doubt among the electorate.


 

At 14:00 the starting signal was given by the organisers, who for the occasion had rebranded themselves as party leaders of, among others, the Partij van de Analyses (PvdA), Volume and Value of Data (VVD), Correcte en Degelijke Analyses (CDA) and the Statistisch Georganiseerde Partij (SGP). Surprisingly, the Partij Voor de Vrijheidsgraden (PVV) failed to show up for the debate. Once the polling stations had opened, the fanatical participants analysed and modelled away happily. Renowned techniques were used, including random forest models and neural networks, but the creative minds could also fully vent their energy. One of the teams based its predictions on Kamergotchi data, and the RDC team fell back on its knowledge of the Dutch car market via the 'Krol correction': the phenomenon that elderly people buy ever trendier cars, thereby distancing themselves from the 50-plus label assigned to them by the leader of the party for the elderly. The link between the number of red cars sold and the number of seats for the SP, captured in the so-called 'SP coefficient', also came from RDC.


 

From 17:30 onwards, the results trickled in from the various parts of the country and the votes were counted. After the MARUG (Marketing Association of the University of Groningen), as befits true students, neatly submitted its predictions one minute before the deadline, the team captains prepared for the big party leaders' debate, in which the participants were asked about the tactics and modelling techniques they had used. Despite the wide variety of sources and analysis methods, the participants were fairly unanimous about the seat distribution, with a maximum spread of seven seats per political party. The biggest outlier was the number of seats the MARUG team predicted for the PvdA: even though this party opposes reinstating the basic student grant, the students awarded the PvdA no fewer than 29 seats in the Tweede Kamer.


 

Once the election results are known next Wednesday, the Root Mean Square Error (RMSE) method will be used to determine which team may call itself the winner of the second edition of The Analytics Lab Hackathon and receive the coveted chairman's gavel.



10 articles about data and elections

Now that the participants of our hackathon are in the starting blocks to make the most fantastic election predictions tomorrow, we'd like to give them some inspiration with a few great articles and websites about data and elections.

Good luck tomorrow, everyone!

  1. A new polling method in the Netherlands
  2. Several polls combined
  3. Why there is hardly any movement in the polls
  4. Why the polls didn't see Trump's victory coming
  5. How Trump used data analysis to win the elections
  6. How Dutch political parties use data analysis to win votes
  7. Google popularity of parties and party leaders
  8. A good predictor of elections, published in Science
  9. The importance of social media in elections
  10. The role of Facebook in the Dutch elections