Photo by Pascal van de Vendel on Unsplash


A Step-by-Step Walkthrough of the most basic yet most-used Data Manipulation Functions

1 Introduction

Hi, I am Gregor, a data scientist, or rather someone who needs to assess and clean data most of the time. I enjoy working with Python/Pandas and R/tidyverse equally in my projects. Since we used R and the tidyverse package in our most recent project, I would like to share the most basic but most-used functions for manipulating data sets.

In the next section, I outline my technical setup so that you can run the examples in this article yourself right away. Then, in section 3, I present the seven functions using the Gapminder dataset. …

Photo by Jeremy Zero on Unsplash


A Step-by-Step Tutorial on How to Create Publication-Ready Tables

I am a Data Scientist, and most of the time, I think about a perfect way to visualize a vast amount of data to convey interesting findings to clients and team members. And to be honest, in most cases, if not in every case, showing the data and its structure in the form of a simple table is necessary and will help to improve the overall understanding.

However, in most cases, I use PowerPoint or Excel to make this table look presentable and/or publishable. This, of course, makes it impossible to reproduce the result automatically. …

Photo by Tim Bogdanov on Unsplash


A Step-by-Step Tutorial on How to Use R and ggplot2 along with Linear Regression

In one of my last projects, I was asked to perform a simple linear regression to forecast possible price developments. As a baseline for comparison with the actual price development, we used the consumer price index. This article will show you how I tried to achieve this with a different data set, using ggplot2 for plotting and linear regression for prediction.

1. Setup

I will briefly explain my setup, including the data and R packages that I am using.

In general, I always use the Tidyverse package. It includes packages like ggplot2 to create beautiful plots in a very intuitive way…

Image created by author, based on images by Neven Krcmarek on Unsplash and Andyone on Unsplash


Learn Which Strategies and Factors Highly Influence Innovation Adoption

Anna feels sad and tired. She leaves a meeting that was supposed to be exciting and the start of something new, but it went in another direction entirely. Two weeks earlier, she was excited. She had been given the task of researching and proposing new ideas to improve team collaboration. She created a list of promising agile methods she believed would improve team spirit and productivity. However, as she explained her ideas to her team members in today’s meeting, few were attentive, some were bored, and two openly questioned why there was any need to change anything.

Had you a…

Image created by author; based on work from Pixabay Close-up of Wooden Plank and Photo by Frank Luca on Unsplash and Photo by Anshu A on Unsplash

Hands-on Tutorials, TUTORIAL — R — OKTOBERFEST

A Step-By-Step Tutorial on How to Analyze and Visualize Oktoberfest Data Using R and ggplot2 and How to Predict Price Information

My running journey; image by author


A Step-by-Step Tutorial to Create Beautiful Art Using Data Viz Tools and Affinity Designer

Hi, I am Gregor, a Data Scientist, and a passionate non-competitive runner. I also enjoy the art of photography, although, as with everything, I am still learning.

One of my strategies to be creative alongside a full-time job is to write about Data Science. I recently wrote an article about how to extract and visualize data collected by my smartwatch. It is mostly technical and about knowledge that I wanted to share. But then, most recently, I read an article from Chesca Kirkland where she described how she collected personal data during the lockdown phase and how she…

Photo by Bruno Nascimento on Unsplash; slightly altered by author


A Step-by-Step Guide to Retrieving and Visualizing Your Running Data with Python and Altair

Hi, I am Gregor, a Data Scientist, and a passionate non-competitive runner. I just realized that I started to use a running app on my phone ten years ago. Back then, I just recorded GPS, start times, and end times. I had no means to record cadence, heart rate, elevation, and the like. I remember being a bad runner: slow and easily out of breath. I had just finished my Ph.D. and was working way too much at my desk. So I decided to start this journey as a runner.

Ten years is quite a long time and I am wondering what…


A step-by-step guide to understand, install, use, and enjoy the standard logging package in Python

Photo by Jaymantri on Pexels

Gregor is a data scientist who loves to solve data riddles. One day, he is given a new project and wants to jump into this new task immediately. He is prepared: he is a master of his coding tool of choice, knows which packages to use, and already has an idea of how to structure the data project. Logging is not on his mind. Not at the start, that is. It dawns on him when most of the code is written and there is only one bug left that he is trying to pinpoint in his code. Once he…
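The teaser stops before any code appears, so the following is only a minimal sketch of what using Python's standard `logging` package from the start of a project might look like. It is not taken from the article: the logger name `data_project` and the helper functions `configure_logging` and `clean_values` are hypothetical names chosen for illustration.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("data_project")


def configure_logging(level: int = logging.INFO) -> None:
    """Attach a single stream handler with a timestamped format."""
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(level)


def clean_values(values):
    """Drop None entries and log how many were removed (illustrative helper)."""
    cleaned = [v for v in values if v is not None]
    dropped = len(values) - len(cleaned)
    if dropped:
        logger.warning("dropped %d missing values", dropped)
    logger.debug("cleaned values: %s", cleaned)
    return cleaned


if __name__ == "__main__":
    configure_logging(logging.DEBUG)
    clean_values([1, None, 3])
```

With log calls like these in place from the beginning, narrowing down a late bug becomes a matter of lowering the level to `DEBUG` rather than adding print statements after the fact.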


And how you can use a free database tool to start building your own Article Pipeline right now, from idea generation and planning through review

Image by author; based on work from RetroSupply and Jordan Wozniak on Unsplash, inspired by Michal Malewicz’s article

Hi, I am Gregor, a researcher, a writer, a data scientist, and a consultant. I like all of it, but these passions compete for the same limited time and energy. At any given moment, new ideas emerge that compete with the older ideas I am already working on. Maybe you can relate to this scenario. Wouldn’t it be great to have a place to put your ideas that allows you to work on them gradually, as well as to take a step back and sort them? …

Tips and Tricks

In this article, you will learn how to use a great concept in Python and Pandas to make your code more efficient and easier to read (even for your future self)

Photo by JJ Ying on Unsplash


By trade, I am an R person. The Tidyverse in particular is a powerful, clean, easy-to-understand, and well-documented data science platform. I highly recommend the free online book R for Data Science to every beginner.

However, my team’s programming language of choice is Python/Pandas, which is also a wonderful data science platform. One of the major differences (to me, at least) is how we write Python code, which is very different from how we write R code, and that has nothing to do with the syntax itself.

One of R’s elegant features is its use of the pipe as a programming metaphor. This…
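The teaser is cut off at this point, so the sketch below is not the author's code. It assumes the concept in question is pipe-style method chaining in Pandas, using `DataFrame.query`, `DataFrame.pipe`, `DataFrame.assign`, and `DataFrame.sort_values`. The column names `km` and `minutes`, the sample values, and the helpers `add_pace` and `summarize_runs` are made up for illustration.

```python
import pandas as pd


def add_pace(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with pace in minutes per kilometre (illustrative helper)."""
    return df.assign(pace_min_per_km=df["minutes"] / df["km"])


def summarize_runs(runs: pd.DataFrame) -> pd.DataFrame:
    """Chain the steps top to bottom, similar in spirit to an R pipe."""
    return (
        runs
        .query("km > 0")                 # drop empty recordings
        .pipe(add_pace)                  # plug a custom function into the chain
        .sort_values("pace_min_per_km")  # fastest pace first
        .reset_index(drop=True)
    )


# Made-up sample data, for illustration only.
runs = pd.DataFrame({"km": [5.0, 0.0, 10.0], "minutes": [30.0, 0.0, 50.0]})
result = summarize_runs(runs)
print(result)
```

Each step returns a new DataFrame, so the chain reads in the order the transformations happen and the original `runs` frame is left unchanged.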

Dr. Gregor Scheithauer

Gregor Scheithauer is a consultant, data scientist, and researcher. He specializes in Process Mining, Process Management, and Data Analytics.
