A word in favor of summarytools

Yesterday, I was preparing material for STAT 405 (biostatistics), which I am teaching this spring, and was on the prowl for an improvement upon the base R summary() function (it doesn’t even give standard deviations!). The ideal package would also improve upon the base R table() function, for which getting row and/or column percents is a huge pain. The base function xtabs() is great for getting arrays of contingency tables, but it gives no percents.
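To make the complaint concrete, here is a minimal base R sketch of the workarounds described above. The mtcars data set and the variable names are my own choices for illustration, not taken from the course material. The guarded block at the end assumes the summarytools functions descr() and ctable() with prop = "r" for row percents, and it only runs if the package is installed.

```r
# Base R sketch using the built-in mtcars data (chosen only for illustration).
data(mtcars)

# summary() reports quartiles and the mean but no standard deviation,
# so sd() has to be called separately.
summary(mtcars$mpg)
sd(mtcars$mpg)

# Counts of cylinders (rows) by gears (columns).
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Row and column percents need an extra prop.table() step.
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
row_pct
col_pct

# xtabs() builds the same contingency table from a formula, still as counts only.
xtabs(~ cyl + gear, data = mtcars)

# Assumption: summarytools is installed. descr() adds the standard deviation
# and ctable() prints counts with row percents in one call.
if (requireNamespace("summarytools", quietly = TRUE)) {
  print(summarytools::descr(mtcars$mpg))
  print(summarytools::ctable(factor(mtcars$cyl), factor(mtcars$gear), prop = "r"))
}
```

The margin argument of prop.table() selects the direction: 1 gives percents within each row, 2 within each column.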

Stat consulting: a data science playground

This past semester I taught our STAT 370 (statistical consulting and communication) for the first time. This course gave students experience consulting for real clients from the university and community, and focused on communicating with a client as well as best practices for preparing reports and presentations. Most of the required analyses were simple: paired t-tests, simple linear regression, etc. What struck me was the nontriviality of the data tidying process! While STAT 370 is taken mostly by our statistics majors, so many of the examples we encountered would be beautiful case studies for our introductory DSCI (data science) curriculum.

Quantifying thrill

Monday morning, October 30, found me groggy and sandy-eyed. The culprit was the 5-hour and 17-minute, 10-inning thriller between the LA Dodgers and Houston Astros in Game 5 of the 2017 World Series the night before. Thanks to living in the Central Time Zone, I went to bed around 1am. The Astros ended up defeating the Dodgers 13-12, but the game was insane, featuring three comebacks from deficits of 3 runs or more.


As I was watching Chris Rock host the 2016 Oscars, I decided to finally scratch my curiosity itch and learn twitteR, an R package for the Twitter API. The 2016 Oscars were controversial because, for the second year in a row, all the actors and actresses nominated were white. Chris Rock made sure to point this out in his opening monologue, and tweets began using the hashtag #OscarsSoWhite to advance the conversation.