How to add number of observations to a ggplot2 boxplot
Introduction
Boxplots are extremely useful for getting to know a dataset. A boxplot lets you compare a continuous variable across the levels of a categorical variable, and it shows the distribution along with summary statistics such as the median. As an example, let us explore the Iris dataset.
Let’s say you want to know more about the variable Sepal.Length. One way to do this would be to look at its summary statistics.
summary(iris$Sepal.Length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   4.300   5.100   5.800   5.843   6.400   7.900
If you want to look at the variable Sepal.Length and differentiate by another variable, let's say Species, you could summarize it as follows:
library(dplyr)  # %>%, group_by(), summarize(), n()
library(knitr)  # kable()

kable(
  iris %>%
    group_by(Species) %>%
    summarize(
      mean = mean(Sepal.Length),
      count = n()
    )
)

Species       mean    count
-----------  ------  ------
setosa        5.006      50
versicolor    5.936      50
virginica     6.588      50
A visual way of exploring the data is to use a boxplot. It shows the distribution of the data, including the median and the lower and upper quartiles.
library(ggplot2)
library(ggthemes)  # provides theme_fivethirtyeight()

ggplot(iris, aes(Species, Sepal.Length)) +
  geom_boxplot() +
  theme_fivethirtyeight()
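To check the numbers behind the plot, you can also compute the quartiles directly. The following is a minimal sketch in base R. The variable name quartiles is just an illustrative choice and not part of the original analysis. It assumes that geom_boxplot() with default settings draws the box from the same default quantile() estimates.

```r
# Lower quartile, median and upper quartile of Sepal.Length per species.
# split() returns one numeric vector per species; sapply() applies
# quantile() to each vector and binds the results into a matrix.
quartiles <- sapply(
  split(iris$Sepal.Length, iris$Species),
  quantile,
  probs = c(0.25, 0.5, 0.75)
)

# Rows are the quantiles ("25%", "50%", "75%"), columns are the species.
print(quartiles)
```

The "50%" row should match the thick line inside each box, and the "25%" and "75%" rows should match the bottom and top edges of the box.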