Data set gold mine
Quick one here. As a statistics educator, I am always on the lookout for interesting, real, digestible data that illustrate important statistical concepts. That’s a tall order!
One site that I visit again and again is this excellent repository hosted at the University of Florida. Here’s the link. I regularly ping this website for classes ranging from intro stats to experimental design to regression analysis. Not only are the data sets varied in scope and organized by topic, they also come with brief descriptions and citations of original sources. It’s a gold mine! Hat tip to my colleague Brant Deppa aka Data Hound for originally cluing me in to this website.
As an example, here’s one on modeling math scores as a function of LSD concentration that I recently used in a homework assignment for my intermediate statistics course (spoiler alert: taking LSD is not recommended for improving math test scores).
library(ggplot2)

df <- read.table('http://www.stat.ufl.edu/~winner/data/lsd.dat',
                 header = FALSE, col.names = c('LSD', 'Score'))

ggplot(data = df, aes(x = LSD, y = Score)) +
  geom_point() +
  geom_smooth(method = "lm") +
  xlab('LSD concentration (mcg/kg)') +
  ylab('Math score (out of 100)')
## `geom_smooth()` using formula 'y ~ x'
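The plot only draws the fitted line, so if you want the numbers behind it, here is a minimal sketch of the same straight-line fit done with `lm()`. The helper name `fit_lsd_model` is made up for illustration, and it assumes a data frame with the `LSD` and `Score` columns named as in the snippet above.

```r
# Hypothetical helper (not from the original assignment): fit the
# simple linear regression Score ~ LSD and return the key estimates.
# Assumes `df` has numeric columns named LSD and Score.
fit_lsd_model <- function(df) {
  fit <- lm(Score ~ LSD, data = df)
  list(
    intercept = unname(coef(fit)[1]),     # estimated score at LSD = 0
    slope     = unname(coef(fit)[2]),     # change in score per unit of LSD
    r_squared = summary(fit)$r.squared    # share of variance explained
  )
}

# Usage, assuming `df` was read in as shown earlier:
# fit_lsd_model(df)
```

Returning a small list keeps the slope and intercept easy to pull into a homework solution, and `summary(lm(Score ~ LSD, data = df))` gives the full regression table if students need standard errors and p-values.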