This post describes an activity I developed for Stat 310: Intermediate Statistics, the second statistics course at Winona State. I like to think of it as our “introduction to modeling” course, and this activity does just that: it introduces students to the idea of a statistical model, including model assessment and fitting. The activity comes in two parts, administered at different times in the semester. In the first part, I want students to think about how to assess and compare proposed models using residuals. In the second, students fit their own models and compare the performance of the fitted models.

The *Vitruvian Man* is a well-known drawing and study by Leonardo DaVinci:

This work is sometimes referred to as *Canon of Proportions*, and is essentially a series of proposed proportions. My activity focuses on two of these proportions:

- *the length of the outspread arms is equal to the height of a man*
- *the distance from the elbow to the tip of the hand is a quarter of the height of a man*

## Part 1: comparing proposed models

These proportions proposed by DaVinci are essentially two proposed statistical models! We can test these models by collecting some data. I did this by having the students pair up and measure the following three quantities:

A. Height;

B. “Wingspan” (length of the outspread arms);

C. “Elbow-tip” (the distance from the elbow to the tip of the hand).

With these three measurements, we can assess which of DaVinci’s proposed proportions is “best”! Notice that his first proportion amounts to the model:

\[Height = \beta_0 + \beta_1 \times Wingspan + \epsilon\]

In this model, both the intercept and the slope are fixed, with \(\beta_0 = 0\) and \(\beta_1 = 1\).

The second model is:

\[Height = \beta_0 + \beta_1 \times ElbowTip + \epsilon\]

Again, both parameters in this model are fixed, with \(\beta_0 = 0\) and \(\beta_1 = 4\).

So which model is better?! This motivates computing the modeled \(\widehat{Height}\) from each equation and comparing the two sums of squared residuals, *SSError*.
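Written out for each student \(i = 1, \dots, n\), DaVinci’s two proportions give the fitted values and the SSError below, where each residual is an observed height minus its fitted height:

\[\widehat{Height}_i = Wingspan_i \qquad \text{and} \qquad \widehat{Height}_i = 4 \times ElbowTip_i\]

\[SSError = \sum_{i=1}^{n} \left(Height_i - \widehat{Height}_i\right)^2\]

The model with the smaller SSError sits closer to the observed heights overall.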

But first, let’s visualize! Here is a scatterplot of Height vs Wingspan using the data collected by the 22 students in my Spring 2018 section of Stat 310. The line indicates the proposed model with \(\beta_0 = 0\) and \(\beta_1 = 1\):

The model is nearly perfect for two students, but clearly imperfect for the other 20. What about Elbow-Tip?

This fit looks clearly worse. We can quantify the difference with SSError, which equals 156 using Wingspan and 504 using Elbow-Tip.
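The class measurements aren’t reproduced in this post, so here is a minimal Python sketch of the SSError computation on five made-up students. The heights, wingspans, and elbow-tip lengths below are hypothetical (think inches), not the Stat 310 data, and the helper names `sse` and `fixed_model` are illustrative. The only “model” code is plugging each measurement into DaVinci’s fixed line.

```python
# Hypothetical measurements for five made-up students (NOT the class data).
heights    = [64.0, 70.0, 67.5, 72.0, 61.0]
wingspans  = [63.0, 71.5, 67.0, 73.5, 60.0]
elbow_tips = [17.0, 18.5, 17.5, 18.0, 15.5]


def sse(observed, fitted):
    """Sum of squared residuals: sum of (y_i - yhat_i)^2."""
    return sum((y - yhat) ** 2 for y, yhat in zip(observed, fitted))


def fixed_model(x, b0, b1):
    """Fitted values from a line whose intercept and slope are fixed in advance."""
    return [b0 + b1 * xi for xi in x]


# DaVinci's proportions as fixed-parameter models.
fit_wingspan = fixed_model(wingspans, b0=0.0, b1=1.0)   # Height = Wingspan
fit_elbow    = fixed_model(elbow_tips, b0=0.0, b1=4.0)  # Height = 4 x Elbow-Tip

sse_wingspan = sse(heights, fit_wingspan)
sse_elbow = sse(heights, fit_elbow)
print(f"SSError (Wingspan):  {sse_wingspan:.2f}")
print(f"SSError (Elbow-Tip): {sse_elbow:.2f}")
```

On these made-up numbers the Wingspan proportion gives an SSError of 6.75 and the Elbow-Tip proportion 39.25, the same ordering as in class, though the values themselves mean nothing beyond the illustration.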

## Part 2: Fitting simple linear regression models

So, DaVinci’s proposed model using Wingspan wasn’t horrible, but the proposal using Elbow-Tip was. Can we improve these proposed proportions by fitting simple linear regression models, and if so, which *fitted* model is best?
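Fitting a simple linear regression replaces DaVinci’s fixed \(\beta_0\) and \(\beta_1\) with the least-squares estimates \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1\bar{x}\). The sketch below, again on hypothetical data and with illustrative helper names (`fit_slr`, `sse`), fits both models and prints each fitted SSError next to the fixed-parameter one. Because least squares minimizes SSError over every possible intercept and slope, a fitted line can never do worse in-sample than DaVinci’s fixed line on the same data.

```python
# Hypothetical measurements for five made-up students (NOT the class data).
heights    = [64.0, 70.0, 67.5, 72.0, 61.0]
wingspans  = [63.0, 71.5, 67.0, 73.5, 60.0]
elbow_tips = [17.0, 18.5, 17.5, 18.0, 15.5]


def sse(observed, fitted):
    """Sum of squared residuals: sum of (y_i - yhat_i)^2."""
    return sum((y - yhat) ** 2 for y, yhat in zip(observed, fitted))


def fit_slr(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S_xy and S_xx: sums of centered cross-products and centered squares.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


results = {}
for name, x, fixed_slope in [("Wingspan", wingspans, 1.0),
                             ("Elbow-Tip", elbow_tips, 4.0)]:
    b0, b1 = fit_slr(x, heights)
    fitted_sse = sse(heights, [b0 + b1 * xi for xi in x])
    fixed_sse = sse(heights, [fixed_slope * xi for xi in x])  # DaVinci: b0 = 0
    results[name] = (b0, b1, fitted_sse, fixed_sse)
    print(f"{name}: Height-hat = {b0:.2f} + {b1:.3f} * {name}; "
          f"SSError fitted = {fitted_sse:.2f}, fixed = {fixed_sse:.2f}")
```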

The figure below plots actual height against the fitted heights from the two models, along with the line with intercept 0 and slope 1:

It’s difficult to tell which model performs better! Here we really do need the SSErrors, which are 117.8 using Wingspan and 122.4 using Elbow-Tip. So close! But Wingspan slightly outperforms Elbow-Tip as a predictor of height. (Of course, these are in-sample SSErrors; a more reliable comparison would use cross-validation, which we discuss later in the course.)
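As a hedged preview of that later discussion, here is a minimal leave-one-out cross-validation (LOOCV) sketch on the same kind of hypothetical data: each student is held out in turn, the line is refit on the remaining students, and the held-out height is predicted. LOOCV is only one cross-validation scheme (k-fold is another common choice), and the helper names are illustrative. For least squares, a held-out residual is never smaller in magnitude than the corresponding in-sample residual, so the LOOCV SSError is never smaller than the in-sample SSError.

```python
# Hypothetical measurements for five made-up students (NOT the class data).
heights    = [64.0, 70.0, 67.5, 72.0, 61.0]
wingspans  = [63.0, 71.5, 67.0, 73.5, 60.0]
elbow_tips = [17.0, 18.5, 17.5, 18.0, 15.5]


def fit_slr(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def in_sample_sse(x, y):
    """SSError of the line fit to all of the data, evaluated on that same data."""
    b0, b1 = fit_slr(x, y)
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def loocv_sse(x, y):
    """Leave-one-out SSError: refit without student i, then predict student i."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_slr(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


for name, x in [("Wingspan", wingspans), ("Elbow-Tip", elbow_tips)]:
    print(f"{name}: in-sample SSError = {in_sample_sse(x, heights):.2f}, "
          f"LOOCV SSError = {loocv_sse(x, heights):.2f}")
```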