How to add number of observations to a ggplot2 boxplot

Dr. Gregor Scheithauer
3 min read · Sep 6, 2018

Introduction

Boxplots are extremely useful for learning more about a dataset. A boxplot lets you compare a continuous variable across the levels of a categorical variable, and it shows information about the distribution and its statistics, such as the median. As an example, let us explore the Iris dataset.

Let’s say you want to know more about the variable Sepal.Length. One way to do this would be to look at its statistics.

summary(iris$Sepal.Length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.300   5.100   5.800   5.843   6.400   7.900

If you want to look at the variable Sepal.Length broken down by another variable, let's say Species, you could summarize it like this:

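library(dplyr)  # %>%, group_by(), summarize(), n()
library(knitr)  # kable()

# Mean sepal length and number of observations per species
kable(
  iris %>%
    group_by(Species) %>%
    summarize(
      mean = mean(Sepal.Length),
      count = n())
)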
Species       mean    count
-----------   ------  ------
setosa        5.006   50
versicolor    5.936   50
virginica     6.588   50
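
Because a boxplot is built around the median and the quartiles, it can help to see those numbers next to the counts. The following is a minimal sketch, assuming the dplyr package is installed; the column names q1 and q3 and the object name iris_stats are illustrative choices and not part of the summary above.

```r
library(dplyr)

# Per-species quartiles, median, and number of observations.
# q1, q3, and iris_stats are illustrative names (assumptions).
iris_stats <- iris %>%
  group_by(Species) %>%
  summarize(
    q1     = quantile(Sepal.Length, 0.25, names = FALSE),
    median = median(Sepal.Length),
    q3     = quantile(Sepal.Length, 0.75, names = FALSE),
    count  = n()
  )

print(iris_stats)
```

The result has one row per species, so it can be passed to kable() in the same way as the table above.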

A visual way of exploring the data is to use a boxplot. It shows you the distribution of the data, including the median and the lower and upper quartiles.

library(ggplot2)
library(ggthemes)  # provides theme_fivethirtyeight()

ggplot(iris, aes(Species, Sepal.Length)) +
  geom_boxplot() +
  theme_fivethirtyeight()
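
One possible way to bring the counts from the summary table onto the plot is stat_summary() with a small helper function. This is only a sketch and not necessarily the approach used in the rest of this story; the helper name n_label and the 0.2 offset above each group's maximum are assumptions made for illustration. The theme is left out to keep the example short.

```r
library(ggplot2)

# Hypothetical helper: for the y values of one group, return the position
# and text of a label showing the number of observations.
n_label <- function(y) {
  data.frame(
    y     = max(y) + 0.2,               # just above the highest point (assumed offset)
    label = paste0("n = ", length(y))
  )
}

p <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_boxplot() +
  stat_summary(fun.data = n_label, geom = "text")

print(p)
```

stat_summary() calls n_label once per species, so each box gets its own label. If the label overlaps an outlier, the offset can be adjusted.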

Gregor Scheithauer is a consultant, data scientist, and researcher. https://gscheithauer.medium.com/membership
