Comparing Data Sets
In the past couple of lessons, we concentrated on data analysis involving one set of data, like arm span or the number of pets in a household. These are called one-variable statistics. In this lesson, we examine the weather page data and our class spreadsheet to see if we can find sets of data that appear to be related. For these two-variable statistics, we can create plots whose 'shape' visually shows relationships or comparisons that might otherwise be hidden in the numbers. On this page, we will look at examples of two kinds of plots in particular: scatter plots and back-to-back stem-and-leaf plots (see the last lesson for an example of a single stem-and-leaf plot).
Predicted and actual temperatures are a good example of two-variable data that can be represented on a scatter plot. If predicted temperatures were 100% accurate, then a plot of the predicted and actual temperatures would lie on a straight line. As we can see below, however, this is not the case!

The dots seem to be all over the place, but they are grouped in a bit of a pattern that rises slightly to the right. This might indicate that as predictions rise, so do the actual temperatures, which we would hope is the case. We can plot a line of "100%" correlation, that is, a line of perfect agreement, by connecting points where the predicted and actual temperatures do, in fact, agree. See the plot below.

We can see that most of the dots are above the line that shows 100% correlation. This indicates that most of the actual temperatures were somewhat lower than the predictions.
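The comparison against the line can also be done by counting. The following is a minimal sketch, not the lesson's own method: the temperature pairs are made up for illustration, the function name is hypothetical, and it assumes predicted temperatures are on the vertical axis and actual temperatures on the horizontal axis, which is what the reading above implies.

```python
# Hypothetical (predicted, actual) temperature pairs in degrees.
# These values are invented for illustration; they are not the lesson's data.
forecasts = [(70, 66), (65, 65), (72, 75), (68, 63), (80, 74), (75, 76)]


def compare_to_agreement_line(pairs):
    """Count points above, on, and below the line predicted == actual.

    Assumes predicted is on the vertical axis and actual on the horizontal
    axis, so a point is 'above' the line when predicted > actual.
    """
    counts = {"above": 0, "on": 0, "below": 0}
    for predicted, actual in pairs:
        if predicted > actual:
            counts["above"] += 1
        elif predicted < actual:
            counts["below"] += 1
        else:
            counts["on"] += 1
    return counts


counts = compare_to_agreement_line(forecasts)
print(counts)
```

With this sample, more points land above the line than below it, which matches the situation where actual temperatures tend to come in lower than predicted.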
Of course, we don't really expect forecasters to hit the temperature exactly every time, and our scatter plot can be used to show how often they are within a certain range of error, say plus or minus 5 degrees. See the graph below.

By comparing the number of dots on or inside the confidence lines to the total number of dots, we can establish a certain degree of confidence that predictions will fall within the +5/-5 degree range of error. There are 31 pieces of data, and 20 of them are on or inside the confidence lines. Since 20/31 is about 0.65, we can say that about 2/3 of the time the predictions will be no more than five degrees off, or that we can be roughly 65% certain that the predictions will be within the +5/-5 degree range.
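The counting above can be sketched in a few lines. This is an illustrative sketch only: the function name `fraction_within` and the sample pairs are assumptions made for the example, while the 20-out-of-31 figure comes from the lesson.

```python
def fraction_within(pairs, tolerance):
    """Return (hits, total, fraction) for forecasts within the tolerance.

    A hit is a pair where |predicted - actual| <= tolerance, which
    corresponds to a dot on or inside the confidence lines.
    """
    hits = sum(1 for predicted, actual in pairs
               if abs(predicted - actual) <= tolerance)
    total = len(pairs)
    return hits, total, hits / total


# Hypothetical (predicted, actual) pairs, invented for illustration.
sample = [(70, 66), (65, 65), (72, 79), (68, 63), (80, 74), (75, 76)]
hits, total, fraction = fraction_within(sample, 5)
print(f"{hits} of {total} within 5 degrees: {fraction:.1%}")

# The lesson's own count: 20 of 31 dots on or inside the lines.
lesson_fraction = 20 / 31
print(f"Lesson data: {lesson_fraction:.1%}")
```

The last line shows why "about 2/3" is a fair summary: 20 divided by 31 is a little under two thirds.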
If we make the range of error smaller, say +3/-3 degrees, then our confidence level that the predictions would be within this range would fall to a smaller number. See graph below.
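A quick way to see this effect is to try several ranges on the same set of errors. The sketch below uses made-up forecast errors and a hypothetical helper named `share_within`; it only illustrates that a narrower range can never capture more of the data than a wider one.

```python
# Hypothetical absolute forecast errors in degrees, |predicted - actual|.
# Invented for illustration; these are not the lesson's data.
errors = [0, 1, 2, 2, 3, 4, 5, 5, 6, 8]


def share_within(errors, tolerance):
    """Fraction of errors that are no larger than the tolerance."""
    return sum(e <= tolerance for e in errors) / len(errors)


# Try progressively narrower ranges of error.
shares = {tol: share_within(errors, tol) for tol in (5, 3, 1)}
for tol, share in shares.items():
    print(f"within +/-{tol} degrees: {share:.0%}")
```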

Back-to-back stem-and-leaf plots can also be used to compare two sets of data. For example, we might set the arm span data and the footprint area data from the class spreadsheet side by side to see how they differ. See the plot below.

The shape of the stem-and-leaf plot reveals much information not readily available in the raw data. The most noticeable observation is that the arm span data is much more tightly clustered than the footprint data, which contains several gaps and is spread out over a far greater range. We can also see the modes of both sets of data by looking for repeated digits in the leaves; there are several in each set of data.
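For readers who want to build such a plot from their own numbers, here is a minimal text-only sketch. It assumes non-negative whole numbers, uses the tens as stems and the ones digit as leaves, and runs on two made-up data sets (not the class arm span or footprint data); the function name `back_to_back` is hypothetical.

```python
from collections import defaultdict


def back_to_back(left, right):
    """Return the rows of a back-to-back stem-and-leaf plot as strings.

    Assumes non-negative integers. The stem is value // 10 and the leaf is
    value % 10. Every stem between the smallest and largest is listed, so
    gaps in the data show up as empty rows.
    """
    left_leaves, right_leaves = defaultdict(list), defaultdict(list)
    for value in left:
        left_leaves[value // 10].append(value % 10)
    for value in right:
        right_leaves[value // 10].append(value % 10)

    low = min(min(left), min(right)) // 10
    high = max(max(left), max(right)) // 10
    rows = []
    for stem in range(low, high + 1):
        # Left-hand leaves grow outward from the stem, so list largest first.
        l = " ".join(str(d) for d in sorted(left_leaves[stem], reverse=True))
        r = " ".join(str(d) for d in sorted(right_leaves[stem]))
        rows.append((l, stem, r))

    width = max(len(l) for l, _, _ in rows)
    return [f"{l:>{width}} | {stem:>2} | {r}".rstrip() for l, stem, r in rows]


# Two hypothetical data sets, invented for illustration.
group_a = [61, 63, 63, 65, 70, 72]
group_b = [48, 55, 55, 71, 89]
lines = back_to_back(group_a, group_b)
print("\n".join(lines))
```

In the printed plot, the left side is bunched into two stems while the right side is spread over five with an empty row, and the repeated leaves (3 on the left, 5 on the right) mark the modes.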
These two examples of statistical plots help us to evaluate and make generalizations about data. Along with the statistical techniques discussed in the previous lesson, they are powerful tools that can help us make decisions about complex issues and relationships.