Notes from Liberal Arts Data Science Workshop

When temperatures hit 0°F in Minnesota, what better remedy than to head to Florida and talk data science curriculum! The 2-day workshop was held at the New College of Florida in Sarasota, FL. This post reflects some of the ideas circulated at the workshop that stood out to me.

Multivariate thinking and the introductory statistics and data science course: preparing students to make sense of a world of observational data (Nick Horton)

In this talk, Dr. Horton emphasized that most data nowadays is “found data” of an observational nature. In other words, it is rare to encounter studies that carefully randomize subjects into groups in order to account for confounding variables. In light of this, he made the following suggestions.

  • In introductory statistics classes, focus less on the technical assumptions of the 2-sample t-test (for example, sample size and degrees of freedom), and more on issues of confounding and randomization.
  • Bring multivariate thinking into the course early. One easy way this can be done is to introduce data visualization from Day 1.
  • Introductory classes should emphasize writing, projects, and visualization.

He also gave several examples of confounding; one can never have too many in their repertoire! An example I really liked: if a study finds that people who use sunscreen tend to have higher rates of skin cancer, would this imply that sunscreen is dangerous to use? The confounding variable in this case would be sun exposure. Of course, people who are using sunscreen are probably experiencing greater sun exposure, which is a risk factor for skin cancer.
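
The sunscreen example can be made concrete with a small simulation. This is only a sketch under made-up assumptions: the probabilities below are invented for illustration (they are not from the talk or from any study), and `simulate` and `rate` are hypothetical helper names. Sun exposure drives both sunscreen use and cancer risk, and sunscreen is assumed to be mildly protective within each exposure level.

```python
import random

# Assumed (invented) cancer risks by (exposure, uses_sunscreen).
# Within each exposure level, sunscreen lowers the risk a little.
RISK = {
    ("high", True): 0.15, ("high", False): 0.20,
    ("low", True): 0.03, ("low", False): 0.05,
}

def simulate(n=100_000, seed=0):
    """Return counts[(exposure, uses_sunscreen)] = [people, cancer cases]."""
    rng = random.Random(seed)
    counts = {key: [0, 0] for key in RISK}
    for _ in range(n):
        exposure = "high" if rng.random() < 0.5 else "low"
        # The confounding link: more sun exposure -> more likely to use sunscreen.
        uses = rng.random() < (0.8 if exposure == "high" else 0.2)
        cell = counts[(exposure, uses)]
        cell[0] += 1
        cell[1] += int(rng.random() < RISK[(exposure, uses)])
    return counts

def rate(counts, sunscreen, exposure=None):
    """Cancer rate for sunscreen users or non-users, optionally within one exposure level."""
    cells = [v for (e, s), v in counts.items()
             if s == sunscreen and exposure in (None, e)]
    return sum(c[1] for c in cells) / sum(c[0] for c in cells)

counts = simulate()
print("Overall:       users", round(rate(counts, True), 3),
      "vs non-users", round(rate(counts, False), 3))
for level in ("high", "low"):
    print(f"{level:>4} exposure: users", round(rate(counts, True, level), 3),
          "vs non-users", round(rate(counts, False, level), 3))
```

Under these assumptions, the overall comparison should make sunscreen users look worse off, while the comparison within each exposure group should point the other way. Conditioning on the confounder is what reverses the naive conclusion.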

Projects first in an interdisciplinary data science curriculum (Jessen Havill)

Dr. Havill gave an overview of the new Data Analytics major at Denison University. The major is intentionally not named Data Science to emphasize the liberal arts nature of the major. It is extremely cross-disciplinary (the two new upper-level Data Analytics courses are taught by an ecologist and an operations researcher). It was interesting to hear about the program at Denison and the thought they put into it. Check out the program website for more details.

Computer science in the data science curriculum (Panel)

This panel included Jessen Havill of Denison University; Dennis F.X. Mathaisel of Babson College; Julie Medero of Harvey Mudd College; and Imad Rahal of St. John’s University and The College of St. Benedict. Some highlights from the panel:

  • What CS skills are essential for data science?

    • The single most important thing, according to Dr. Havill, is abstraction. This concept is more important than arguing over whether one language is better than another, and it can be taught in CS courses from Day 1.
    • Computational thinking that translates a problem into a computational solution, according to Dr. Medero.
    • How to represent data that comes in a nonstandard form, according to Dr. Rahal. Also, the ability to work with data of large Volume, high Velocity, and a wide Variety of structures.
  • Are proprietary tools or general-purpose tools more important?

    • The ability to learn something new is more important than expertise in a specific tool, according to Dr. Medero.
    • We will never keep up with all the proprietary tools. The languages I want to use are those that are best for teaching. Choosing a tool because it’s hot right now is not necessarily wise, according to Dr. Havill.

Florida Panthers consulting projects (Brian Macdonald)

Probably my favorite presentation! Brian is the Director of Hockey Analytics for the Florida Panthers, transitioning toward Director of Data Science and Research for the Panthers. In this talk, Brian discussed some fascinating projects he’s worked on with students pursuing master’s degrees in business analytics.

In the first project, he described a model for predicting attendance for games using only information known before tickets go on sale. This will help answer questions like, which games should be in which tiers for variable pricing? What kinds of requests should the team make when the league is developing the schedule? For example, does it make better sense from a sales standpoint to schedule good teams on a Saturday and a bad team during the week, or vice versa?

This project used data on announced attendance. Predictors of attendance included day of week, holiday, month, and opponent. Interesting nuggets: nobody wants to go to games on Halloween or Easter, and people like to go to games against the “Original Six” NHL teams.

The second project centered on understanding what influences season ticket renewal. In short, people who attend more high-scoring, close games that their team wins are more likely to renew.

Brian also discussed skills he looks for in interns. He emphasized skills in data management and merging and data visualization more than analysis skills. Coding experience is non-negotiable.

And, of course…

…there was beach time.