Anna feels sad and tired. She leaves a meeting that was supposed to be exciting and the start of something new, but it went in an entirely different direction. Two weeks earlier, she had been excited. She had been given the task of researching and proposing new ideas to improve team collaboration. She created a list of promising agile methods she believed would improve team spirit and productivity. However, as she explained her ideas to her team members in today’s meeting, few were attentive, some were bored, and two openly questioned why there was any need to change anything.
Had you a…
Hi, I am Gregor, a Data Scientist, and a passionate non-competitive runner. I also enjoy the art of photography, though, as with everything, I am still learning.
One of my strategies for staying creative alongside a full-time job is to write about Data Science. I recently wrote an article about how to extract and visualize data collected by my smartwatch. It is mostly technical and covers knowledge that I wanted to share. But then, most recently, I read an article from Chesca Kirkland where she described how she collected personal data during the lockdown phase and how she…
Hi, I am Gregor, a Data Scientist, and a passionate non-competitive runner. I just realized that I started to use a running app on my phone ten years ago. Back then, I just recorded GPS, start and end times. I had no means to record cadence, heart rate, elevation, and the like. I remember being a bad runner: slow and easily out of breath. I had just finished my Ph.D. and was working way too much at my desk. So I decided to start this journey as a runner.
Ten years is quite a long time and I am wondering what…
Gregor is a data scientist who loves to solve data riddles. One day, he is given a new project and wants to jump into the task immediately. He is prepared: he has mastered his coding tool of choice, knows which packages to use, and already has an idea of how to structure the data project. Logging is not on his mind. Not at the start, that is. It dawns on him only when most of the code is written and there is one remaining bug that he is trying to pinpoint. Once he…
Hi, I am Gregor, a researcher, a writer, a data scientist, and a consultant. And I like all of it, but these passions compete for the same limited time and energy. At any given moment, new ideas emerge that compete with the ones I am already working on. Maybe you can relate to this scenario. Wouldn’t it be great to have a place to put your ideas, one that allows you to work on them gradually as well as take a step back and sort them? …
By trade, I am an R person. The Tidyverse in particular is a powerful, clean, easy-to-understand, and well-documented data science platform. I highly recommend the free online book R for Data Science to every beginner.
However, my team’s programming language of choice is Python with Pandas, which is also a wonderful data science platform. One of the major differences (to me, at least) is how we write Python code, which is very different from how we write R code, and that has nothing to do with the syntax itself.
March 15th, 2021 marks my ninth year on LinkedIn. I did not join LinkedIn at the beginning of my professional life, but nine years represent the better part of my working life. I was a researcher with Siemens CT before I went into the consulting business in 2011, where I am still active today. Looking back, my consistent topics are process management and data science, topics I really enjoy. Since joining LinkedIn in 2012, I have made 720 virtual connections 😯.
Boxplots are extremely useful for learning more about any given dataset. Basically, a boxplot lets you compare a continuous variable across the levels of a categorical variable, showing its distribution and statistics such as the median. As an example, let us explore the Iris dataset.
Let’s say you want to know more about the variable Sepal.Length. One way to do this would be to look at its statistics.
summary(iris$Sepal.Length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   4.300   5.100   5.800   5.843   6.400   7.900
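For readers on a Python team, the quartile figures that summary() reports, which are also the values a boxplot draws, can be sketched with the standard library alone. This is a minimal illustration, not the article's code: the function name five_number_summary, the group labels, and the measurements are all made up for the example and are not taken from the Iris dataset. The "inclusive" method is chosen on the assumption that it matches R's default quantile calculation.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return min, Q1, median, Q3, and max: the values a boxplot draws."""
    # method="inclusive" treats the data as the whole population;
    # assumed here to correspond to R's default quantile type.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical measurements, grouped by a made-up categorical variable.
groups = {
    "a": [4.9, 5.0, 5.1, 5.4, 5.8],
    "b": [5.5, 6.0, 6.1, 6.4, 7.0],
}

# One summary per category, which is what side-by-side boxplots compare.
summaries = {name: five_number_summary(vals) for name, vals in groups.items()}

for name, stats in summaries.items():
    print(name, stats)
```

Computing one summary per category mirrors what side-by-side boxplots show: the same statistics, repeated for each level of the grouping variable.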
If you want to look at the variable Sepal.Length and differentiate by another variable - let's say…
If you love plotting your data with R’s ggplot2 but you are bound to use Python, the plotnine package is worth looking into as an alternative to matplotlib. In this post, I show you how to get started with plotnine for productive output.
If you want to follow along please find the whole script on GitHub:
ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details. Source: http://ggplot2.tidyverse.org/
Gregor Scheithauer is a consultant, data scientist, and researcher. He specializes in Process Mining, Process Management, and Data Analytics.