Posts


Let R/Python send messages when the algorithms are done training

As Data Scientists, we often train complex algorithms in order to tackle certain business problems and generate value. These algorithms, however, can take a while to train. Sometimes they take a couple of hours, hours which I’m not going to spend just sitting and waiting. But regularly checking whether the training is done isn’t very efficient either.

I’ve now started using Telegram to send me notifications from R and Python to let me know when training is done. I also use it to notify me when pipelines / ETLs fail, which allows me to repair them as soon as they fail.

It’s really easy, so I thought I’d share my code!

First, after you’ve installed Telegram, search for the BotFather, which is a bot from the app itself. When you text /newbot and follow the instructions, it will create your first bot and give you a token. Copy this!

Next step is to find the id to send messages to. Find your bot in Telegram and say something. Then, go to your browser and go to https://api.telegram.org/bot<token>/getUpdates, where it should show you your chat id.
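If you’d rather not read the raw JSON in the browser, the lookup can be sketched in Python. This is a minimal sketch using only the standard library; the function names fetch_updates and extract_chat_ids are my own for illustration, and it assumes each update carries a regular "message" with a "chat" object, which is what you get after texting your bot.

```python
import json
import urllib.request


def fetch_updates(bot_token):
    """Call the Bot API getUpdates method and return the decoded JSON."""
    url = f"https://api.telegram.org/bot{bot_token}/getUpdates"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def extract_chat_ids(updates):
    """Collect the distinct chat ids found in a getUpdates response."""
    chat_ids = []
    for update in updates.get("result", []):
        # Only regular messages are inspected; other update types are skipped.
        chat = update.get("message", {}).get("chat", {})
        chat_id = chat.get("id")
        if chat_id is not None and chat_id not in chat_ids:
            chat_ids.append(chat_id)
    return chat_ids
```

Calling extract_chat_ids(fetch_updates(token)) with your own token should then list the chat id(s) of everyone who has texted the bot recently.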

Finally, install the necessary packages for R [install.packages("telegram")] and / or Python [pip install python-telegram-bot, which provides the telegram module]. And you’re ready!

For R, use the following function:

send_telegram_message <- function(text, chat_id, bot_token) {
  require(telegram)
  bot <- TGBot$new(token = bot_token)
  bot$sendMessage(text = text, chat_id = chat_id)
}

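And this one for Python (written for python-telegram-bot releases before version 20, where send_message is a regular synchronous call; from version 20 on it is a coroutine and has to be awaited):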

def send_telegram_message(text, chat_id, bot_token):
    import telegram
    bot = telegram.Bot(token=bot_token)
    bot.send_message(chat_id=chat_id, text=text)
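To avoid calling the function by hand at the end of every script, you could wrap the training step in a small decorator. The sketch below is one possible way to do it, not part of the original code: notify_when_done is a hypothetical helper, and it takes any send(text) callable so it doesn’t depend on Telegram itself.

```python
import functools
import time


def notify_when_done(send):
    """Wrap a function so that send(text) is called when it finishes or fails."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                # Report the failure, then re-raise so the error is not hidden.
                send(f"{func.__name__} failed: {exc!r}")
                raise
            minutes = (time.monotonic() - start) / 60
            send(f"{func.__name__} finished in {minutes:.1f} min")
            return result
        return wrapper
    return decorator
```

To hook it up to Telegram, build the sender with functools.partial(send_telegram_message, chat_id=your_chat_id, bot_token=your_token) and put @notify_when_done(sender) above your training function. The same wrapper covers the failing pipeline / ETL case, because the exception is reported before it is re-raised.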


Education for the Next Generation: a Handsign recognition project in Python

“Could you create a handsign recognition model which we can use to teach High School students a bit more about A.I. in a fun way?”
This is the question a few colleagues asked a couple of weeks ago, and of course, the only real response here could be YES! I was immediately enthusiastic and started working on this fun project.

After a lot of messing around with different models, among which xgboost and neural networks, I found a real goldmine: the GitHub page of loicmarie, where he shares a script that not only trains such a model using an Inception Model (a convolutional neural network classifier), but also uses it. So I combined my own script with those of loicmarie and we were ready to go!

The Inception Model V3 is a deep learning model created by Google and trained on images from ImageNet.


The Inception Model can classify images into 1,000 classes with an error rate close to that of a human. It is an impressive model that isn’t only cool on its own, but can also be used for Transfer Learning: we reuse the knowledge already stored in this model and extend it with our own images. This makes it quite “easy” and “fast” to create a well-performing model on our own images, which, in this case, show different handsigns.

When we arrived at the High School, we first gave the students an introduction to what A.I. actually is and where they encounter A.I. in their world. After that we introduced them to our handsign recognition model and gave them the assignment to create their own handsigns.


They then used a script to take their own pictures of each handsign.


And then it was time for us to put our computers to work! It started with a script to generate 10,000 pictures for each handsign. As soon as this script was done, the training of the model started.


After 23 hours (!) all the models were successfully trained and it was battle time! The group that could write the most flawless text within 5 minutes, using their own handsigns, was the project winner!


 

Check out this video to see how it works:

 

Are you interested in our code? Please feel free to take a look at our GitHub repo!


Project Friday 2.2: let’s fly!

A little while ago we started our second Project Friday; once a month (or so) we lock some colleagues in a room with a couple of beers and a fun project. This project: give a drone a brain and an eye, so we can call it and make it do stuff for us. Why do we do this? Well, because it’s fun, and we learn a lot.

During our first session we mostly discovered how difficult it is to maneuver the drone around inside our office building. This didn’t put us off even a tiny little bit; we love a good challenge!

We spent most of this session thinking of a way to use the camera on the bottom of the drone to make it follow a path we’ve laid out for it. Put simply: we want the drone to be able to follow a line on the floor. The first thing we did was create a line out of white adhesive tape on our dark carpet. After that we held the drone above this line to take pictures. And then the thinking started… We had to make sure we took every possible deviation into account and thought of the best way to correct the drone if that deviation occurred. Believe it or not, but this drawing helped us do that.

While thinking of every possible deviation and the correction to apply for it, we immediately programmed it into our Python script for the drone. As soon as this script was done, it was time for our first test flight, which you can see in this video:

Ok… Not successful yet. Enough work left for some more Friday afternoons. As far as we’re concerned: bring it on!

 

Read more about what we did before


Project Friday 2.1: let’s fly!

After all the fun we had, while also learning a lot, during our first Project Friday project, “Artificial Intelligence meets Coffee”, we felt it was time for a second project. So this time, instead of giving an eye and a brain to a coffee machine, why not try to do the same to a drone?! What if we could make a drone fly up to us when we call for it and tell it what to do after it recognizes who we are…

Our second Project Friday was born!

You might wonder what Project Friday actually is… Well, that’s an easy one; once a month (or so) we lock some colleagues in a room with a couple of beers and a fun project. Why do we do this? Well, because it’s fun and we also learn a lot.

To get started with this Project Friday, we first needed a drone! We chose the Parrot AR.Drone 2.0, because you can easily connect this drone to your computer and take over control.


Most of this first afternoon was spent trying to fly the drone inside, which, we found out the hard way, isn’t so easy! A few walls were hit and we did see some people running for their lives, but at the end of the day everybody, including the drone, survived. All is good!

We even managed to give a few commands to the drone from the computer, although the effect of the commands wasn’t as successful as we had hoped…

 

What we’ve learned so far:

  • flying a drone inside is difficult,
  • connecting to the drone from the computer is easy,
  • giving the right commands isn’t easy at all,
  • we love being pilots!


 


Python & R vs. SPSS & SAS

When we’re working for clients we mostly come across the statistical programming languages SAS, SPSS, R and Python. Of these, SAS and SPSS are probably the most used. However, the interest in the open source languages R and Python is increasing. In recent years, some of our clients migrated from SAS or SPSS to R and/or Python. And even if they haven’t (yet), most commercial software packages (including SAS and SPSS) make it possible to connect to R and Python nowadays.

SAS was developed at North Carolina State University, primarily to analyse large quantities of agricultural data. The abbreviation SAS stands for Statistical Analysis System. In 1976 the company SAS was founded as the demand for such software increased. Statistical Package for the Social Sciences (SPSS) was developed for the social sciences and was the first statistical programming language for the PC. It was developed in 1968 at Stanford University and eight years later the company SPSS Inc. was founded, which was bought by IBM in 2009.

In 2000 the University of Auckland released the first version of R, a programming language primarily focused on statistical modeling, which was open sourced under the GNU license. Python is the only one that was not developed at a university. Python was created by Guido van Rossum, a Dutch guy who is a big fan of Monty Python (where the name comes from). He needed a project during Christmas and created this language, which is based on ABC. ABC is a language he had worked on before, designed with the goal of teaching non-programmers how to program. Python is a multi-purpose language, like C++ and Java, with the big difference and advantage that Python is way easier to learn. Programmers carried on and created lots of modules on top of Python, and it therefore has a wide range of statistical modeling capabilities nowadays. That’s why Python definitely belongs in this list.

In this article, we compare the four languages on methods and techniques, ease of learning, visualisation, support and costs. We explicitly focus on the languages; the user interfaces SAS Enterprise Miner and SPSS Modeler are out of scope.

table

Statistical methods and Techniques

My view of Data Analysis is that there is a continuum between explanatory models on one side and predictive models on the other side. The decisions you make during the modeling process depend on your goal. Let’s take Customer Churn as an example. You can ask yourself: why are customers leaving? Or you can ask yourself: which customers are leaving? The first question has as its primary goal to explain churn, while the second question has as its primary goal to predict churn. These are two fundamentally different questions and this has implications for the decisions you take along the way. The predictive side of Data Analysis is closely related to terms like Data Mining and Machine Learning.

When we look at SPSS and SAS, both of these languages originate from the explanatory side of Data Analysis. They were developed in an academic environment, where hypothesis testing plays a major role. As a result, they offer significantly fewer methods and techniques than R and Python. Nowadays, SAS and SPSS both have data mining tools (SAS Enterprise Miner and SPSS Modeler); however, these are separate tools and you’ll need extra licenses.

One of the major advantages of open source tooling is that the community continuously improves and increases functionality. R was created by academics, who wanted their algorithms to spread as easily as possible. Ergo R has the widest range of algorithms, which makes R strong on the explanatory side and on the predictive side of Data Analysis.

Python was developed with a strong focus on (business) applications, not from an academic or statistical standpoint. This makes Python very powerful when algorithms are directly used in applications. Hence, we see that the statistical capabilities are primarily focused on the predictive side. Python is mostly used in Data Mining or Machine Learning applications where a data analyst doesn’t need to intervene. Python is therefore also strong in analysing images and videos; for example, we used Python this summer to build our own autonomous driving RC car. Python is also the easiest language to use with Big Data frameworks like Spark.

Ease of learning

Both SPSS and SAS have a comprehensive user interface, with the consequence that a user doesn’t necessarily need to code. Furthermore, SPSS has a paste function which creates syntax from the steps executed in the user interface, and SAS has Proc SQL, which makes SAS coding a lot easier for people who know the SQL query language. SAS and SPSS code are syntactically far from similar to each other and also very different from other relevant programming languages, so when you need to learn one of these from scratch, good luck with it!

Although there are GUI alternatives for R, like Rattle, they don’t come close to SAS or SPSS in terms of functionality. R is easy to learn for programmers; however, a lot of analysts don’t have a background in programming. R has the steepest learning curve of all; it’s the most difficult one to start with. But once you get the basics, it soon gets easier. For this specific reason, we’ve created an R course, called Experience R, which kickstarts (aspiring) data analysts / scientists in learning R. Python is based on ABC, which was developed with the sole purpose of teaching non-programmers how to program. Readability is one of the key features of Python. This makes Python the easiest language to learn. As Python is so broad, there are no GUIs for Python.

To conclude, as for ease of learning SPSS and SAS are the best option for starting analysts as they provide tools where the user doesn’t need to program.

Support

Both SAS and SPSS are commercial products and therefore have official support. This motivates some companies to choose these languages: if something goes wrong, they’ve got support.

There is a misconception around the support for open-source tooling. It’s true that there is no official support from the creators or owners, nonetheless, there’s a large community for both languages most willing to help you to solve your problem. And 99 out of 100 times (if not more often), your question has already been asked and answered on sites like Stack Overflow. On top of that, there are numerous companies that do provide professional support for R and Python. So, although there’s no official support for both R and Python, in practice we see that if you’ve got a question, you’ll likely have your answer sooner if it’s about R or Python than in case it’s SAS or SPSS related.

Visualisation

The graphical capabilities of SAS and SPSS are purely functional; although it is possible to make minor changes to graphs, fully customizing your plots and visualizations in SAS and SPSS can be very cumbersome or even impossible. R and Python offer many more opportunities to customize and optimize your graphs due to the wide range of modules that are available. The most widely used module for R is ggplot2, which has a wide set of graphs where you’re able to adjust practically everything. These graphs are also easily made interactive, which allows users to play with the data through applications like shiny.

Python and R learned (and still learn) a lot from each other. One of the best examples of this is that Python also has a ggplot module, which has practically the same functionality and syntax as it does in R. Another widely used module for visualisation in Python is Matplotlib.

Costs

R and Python are open source, which makes them freely available for everybody. The downside is that, as we’ve discussed before, these languages are harder to learn than the SAS or SPSS GUI. As a result, analysts with R and/or Python in their skillset have higher salaries than analysts who don’t. Educating employees who are currently not familiar with R and/or Python costs money as well. Therefore, in practice the open source programming languages aren’t completely free of costs, but when you compare this with the license fees for SAS or SPSS, the business case is very easily made: R and Python are way cheaper!

My choice

“Software is like sex, it’s better when it’s free” – Linus Torvalds (creator Linux)

My go-to tools are R and Python; I can use these languages everywhere without having to buy licenses. I also don’t need to wait for the licenses, and time is a key factor in my job as a consultant. Aside from licenses, probably the main reason is the wide range of statistical methods; I can use any algorithm out there and choose the one that best suits the challenge at hand.

Which of the two languages I use depends on the goal, as mentioned above. Python is a multi-purpose language and is developed with a strong focus on applications. Python is therefore strong in Machine Learning applications; hence I use Python for example for Face or Object Recognition or Deep Learning applications. I use R for goals which have to do with customer behaviour, where the explanatory side also plays a major role; if I know which customers are about to churn, I would also like to know why.

These two languages are for a large part complementary. There are libraries for R that allow you to run Python code (reticulate, rPython), and there are Python modules which allow you to run R code (rpy2). This makes the combination of the two languages even stronger.


Jeroen Kromme, Senior Consultant Data Scientist