Public data sources with social justice applications

Recently, some WSU student volunteers and I gave a talk at St. Olaf on great public data sources I have been using to investigate “social justice” applications. Most of my uses of the data have been for class projects or assignments. That talk was the impetus to compile all the resources I’ve been using over the years, along with some example class projects using pre-aggregated data.

Integrated Public Use Microdata Series (IPUMS)

IPUMS is one of my favorite data sources. The great folks at the Minnesota Population Center have done fabulous work collecting, harmonizing, and producing data from the American Community Survey, the Current Population Survey, and the National Health Interview Survey, among others. The group at IPUMS-International even collaborated with us on an REU during Summer 2017. Once you get the hang of their data query system, you have an enormous wealth of data available to you. In December 2017, they even released ipumsr, an R package that facilitates wrangling their data extracts with R.

As the M in IPUMS indicates, data extracted from IPUMS are microdata. This means each row is an individual observation, which makes IPUMS data especially interesting. Any sensible analysis of IPUMS data must account for the PERWT variable, which indicates how many individuals in the population each row represents. IPUMS has extensive documentation on PERWT (e.g. here and here).
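To make the role of PERWT concrete, here is a minimal sketch in Python of a weighted versus unweighted estimate. The four rows are invented for illustration, and EMPLOYED is a hypothetical 0/1 indicator rather than an actual IPUMS variable; only PERWT comes from the discussion above.

```python
# Toy rows invented for illustration; a real extract has many more rows and variables.
# PERWT is the IPUMS person weight; EMPLOYED is a hypothetical 0/1 indicator.
rows = [
    {"PERWT": 120, "EMPLOYED": 1},
    {"PERWT": 80, "EMPLOYED": 0},
    {"PERWT": 200, "EMPLOYED": 1},
    {"PERWT": 100, "EMPLOYED": 0},
]


def weighted_share(rows, flag, weight="PERWT"):
    """Share of the represented population for which `flag` equals 1."""
    total = sum(r[weight] for r in rows)
    hits = sum(r[weight] for r in rows if r[flag] == 1)
    return hits / total


# Treating every row equally ignores how many people each row stands for.
unweighted_share = sum(r["EMPLOYED"] for r in rows) / len(rows)

# Weighting by PERWT estimates the share in the population, not in the sample.
population_share = weighted_share(rows, "EMPLOYED")

# Summing the weights estimates the size of the represented population.
represented_population = sum(r["PERWT"] for r in rows)

print(unweighted_share, population_share, represented_population)
```

Here the two shares differ because the employed rows carry larger weights. This sketch covers point estimates only; standard errors for weighted survey data need more care than is shown here.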

Example student projects using IPUMS data


CDC WONDER

The WONDER database operated by the Centers for Disease Control and Prevention provides a rich query system for analysis of public health data. All data are aggregated. If counts are too small at the requested level of aggregation, WONDER typically suppresses them. The databases I most often use are the Natality and Mortality queries.
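Suppressed cells need to be handled as missing values rather than zeros when working with an export. The sketch below, in Python, assumes a tab-delimited export in which a suppressed cell holds the text "Suppressed"; the column names, the marker text, and the counts are all assumptions made for illustration, so check them against the file you actually download.

```python
import csv
import io

# Hypothetical export: column names, values, and the "Suppressed" marker are assumptions.
export = io.StringIO(
    "County\tDeaths\n"
    "County A\t152\n"
    "County B\tSuppressed\n"
    "County C\t37\n"
)


def parse_count(value):
    """Return the count as an int, or None when the cell is not a number."""
    try:
        return int(value)
    except ValueError:
        return None


rows = [
    {"County": r["County"], "Deaths": parse_count(r["Deaths"])}
    for r in csv.DictReader(export, delimiter="\t")
]

# Keep suppressed cells as missing; counting them as zero would bias totals downward silently.
known_counts = [r["Deaths"] for r in rows if r["Deaths"] is not None]
n_suppressed = sum(r["Deaths"] is None for r in rows)

# The sum of the visible counts is only a lower bound on the true total.
known_total = sum(known_counts)

print(known_total, n_suppressed)
```

Reporting the number of suppressed cells alongside any total makes it clear how much of the picture is missing at that level of aggregation.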

Example student projects using WONDER data

MN Department of Education

The Minnesota Department of Education’s Data Center has a wealth of data, though not nearly as nice a query system as IPUMS or WONDER. Queries must often be run by year, with separate Excel/CSV files for each year. Looking at trends for 10 years, then, requires downloading 10 separate data files and aggregating manually. Not fun! But there are a lot of data sources here for those interested in looking at staffing, ACT scores, school enrollment, etc. I spent some time aggregating data on disciplinary actions for the Winona Area Public School district. I’ve turned it into a Tableau visualization and also used it as an assignment for my intermediate statistics (“Stat 2”) class.

  • Intermediate statistics assignment: modeling racial disparity in disciplinary action rate using data from Winona Area Public Schools
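The year-by-year aggregation described above can be scripted once the files are downloaded. This is a minimal sketch in Python, not the workflow I actually used: the in-memory strings stand in for per-year CSV files, and the district names, column names, and numbers are invented for illustration.

```python
import csv
import io

# In-memory stand-ins for per-year downloads; contents and column names are invented.
yearly_files = {
    2015: "district,enrollment\nDistrict A,100\nDistrict B,250\n",
    2016: "district,enrollment\nDistrict A,110\nDistrict B,240\n",
}


def stack_years(files):
    """Combine per-year CSV text into one list of rows, tagging each row with its year."""
    combined = []
    for year, text in sorted(files.items()):
        for row in csv.DictReader(io.StringIO(text)):
            row["year"] = year
            row["enrollment"] = int(row["enrollment"])
            combined.append(row)
    return combined


long_table = stack_years(yearly_files)

# Group the stacked rows by district to get one time series per district.
trend = {}
for row in long_table:
    trend.setdefault(row["district"], []).append((row["year"], row["enrollment"]))

print(trend)
```

With real downloads, each string would be replaced by an opened file. Column names can change from year to year, so it is worth checking the headers of every file before stacking.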

Police Data Initiative

The Police Data Initiative contains data on 911 calls for over 20 different U.S. cities. The data are relatively clean, but the variables available vary greatly from city to city. A recent group of students from my data visualization class did very well visualizing the calls from Seattle.