Data Centers, Spreads and Plots

Modeling Statistical Terms

Sampling

In this lesson, students collect numerical and categorical data about themselves, which is then organized and displayed using a variety of data analysis techniques. Data centers (mean, median, and mode), clusters, gaps, outliers and the range of the data are revealed in order to 'see' what is typical about the data and its distribution.

Columns of cubes are used to model data centers. Here is the set of cube columns for the 'house' (number of people in household) category:

 

cube columns - household size

 

It is easy to see that there are quite a few households with four members, and that the minimum is two and the maximum 8. We would say that the mode is 4, and that the range is 8-2, or 6. The mode is the data item most often repeated, and the range is the difference between the minimum and maximum data items.
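The mode and range described above can also be found by counting. Here is a minimal sketch; the list of household sizes is hypothetical (invented for this example, chosen only so that it has 27 items, a minimum of 2, a maximum of 8 and a mode of 4), and the function names are ours.

```python
from collections import Counter

# HYPOTHETICAL household sizes, invented for illustration only.
# They are not the class's actual data.
household_sizes = [2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4,
                   5, 5, 5, 5, 5, 6, 6, 6, 7, 8, 8]


def modes(data):
    """Return every value tied for the most repeats (there can be more than one)."""
    counts = Counter(data)            # like building one cube column per value
    tallest = max(counts.values())    # height of the tallest column
    return sorted(value for value, count in counts.items() if count == tallest)


def data_range(data):
    """Return the difference between the maximum and minimum data items."""
    return max(data) - min(data)


print(modes(household_sizes))       # [4]
print(data_range(household_sizes))  # 6
```

Returning a list of modes is a design choice: if two columns tie for tallest, both values are reported.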

 

 

If we arrange the data items in numerical order, then we can 'see' the median:

 

cube columns - median

 

The median is the exact center of the data items. Since there are 27 items, we count in 13 from each end, and the center column is the median. If there is an even number of items, then the median is the 'average' (mean) of the two center columns. It is also even more readily noticeable that there are more fours than any other data item.
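The counting-in procedure above can be sketched in a few lines. The household sizes below are hypothetical (invented for this example so that there are 27 items), and the function name is ours.

```python
# HYPOTHETICAL household sizes, invented for illustration only.
household_sizes = [4, 2, 5, 3, 4, 8, 4, 6, 3, 4, 5, 4, 2, 7,
                   4, 3, 5, 4, 6, 3, 4, 5, 8, 4, 3, 6, 5]


def median(data):
    """Sort the items, then take the center one (or the mean of the two centers)."""
    ordered = sorted(data)        # arrange the columns in numerical order
    n = len(ordered)
    middle = n // 2               # for 27 items this is index 13: 13 items on each side
    if n % 2 == 1:
        return ordered[middle]
    # Even number of items: average the two center columns.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median(household_sizes))  # 4
print(median([2, 3, 4, 5]))     # 3.5
```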

 

 

We can also manipulate our cube columns to illustrate the mean. The mean is the even distribution of the data among the data items. When most people think of the word 'average' they are thinking of the mean, also referred to as the arithmetic mean, since we use basic operations to determine the mean number in the data set.

 

cube columns - mean

 

Every column now has four cubes, and 13 of the columns, or just under half, have a fifth. If we distribute those 13 extra cubes evenly among all 27 columns, each column gets 13/27 of a cube more. Then each column contains 4 13/27 cubes, or about 4 1/2 members per household. This is the same as totaling up all the cubes and dividing the amount by 27.
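The leveling arithmetic can be checked with a short sketch. It uses only the counts given in the paragraph above (27 columns, four cubes in each, 13 cubes left over); the function name level_columns is made up for this example.

```python
from fractions import Fraction

# Counts taken from the text: 27 columns, four cubes each, 13 left over.
columns = 27
total_cubes = columns * 4 + 13    # 121 cubes in all


def level_columns(total, count):
    """Share the cubes evenly: whole cubes per column, leftovers, and the exact mean."""
    whole, leftover = divmod(total, count)
    return whole, leftover, Fraction(total, count)


whole, leftover, mean = level_columns(total_cubes, columns)
print(whole, leftover)            # 4 13
print(mean)                       # 121/27
print(round(float(mean), 2))      # 4.48
```

Using Fraction keeps the exact value 121/27, which is the mixed number 4 13/27, instead of rounding right away.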

 

Now we know some statistical information about the household sizes in our class - the median is 4, the mode is 4, and the mean is about 4 1/2.

 

We can also organize our data visually using line plots and stem-and-leaf plots. Here is a line plot for the number of pets in each household:

 

line plot - pets

We can quite easily see information about our pets in the line plot. It's obvious that the large majority of households contain fewer than five pets. There is a definite cluster of data from 0-4 pets. Above four pets, there are several gaps in the data, and it looks as if there are a couple of pieces of data that are far from the typical numbers. We might call 13 and 14 outliers. We can also see that the mode number of pets is one.
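A line plot and its gaps can be sketched in text form. The pet counts below are hypothetical (invented for this example to resemble the description above), the plot is drawn sideways with one X per household, and the function names are ours.

```python
from collections import Counter

# HYPOTHETICAL pet counts, invented for illustration only.
pets = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
        3, 3, 3, 4, 4, 6, 8, 13, 14]


def line_plot(data):
    """Return one row per value from the minimum to the maximum, one X per data item."""
    counts = Counter(data)
    rows = []
    for value in range(min(data), max(data) + 1):
        rows.append((f"{value:>2} | " + "X" * counts[value]).rstrip())
    return "\n".join(rows)


def gaps(data):
    """Return the values between the minimum and maximum that never occur."""
    present = set(data)
    return [v for v in range(min(data), max(data) + 1) if v not in present]


print(line_plot(pets))
print(gaps(pets))  # [5, 7, 9, 10, 11, 12]
```

The empty rows in the printed plot are the gaps, and the long run of empty rows before 13 and 14 is what makes those two values look like outliers.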

 

 

 

We can also construct stem-and-leaf plots to get a feel for how our data is shaped. Here's a plot for the arm span:

 stem-and-leaf plot - arm span

 

 

We can see that the majority of students have arm spans in the 140-159 cm range. We can also see that there are several repeated data items: 140, 142, 150, and 151 cm. 150 cm would be considered the mode of the data. We can count in 13 items from the top or bottom to find the median, which would be 150 cm. Again, plotting the data gives it a 'shape' which helps us to understand it.
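A stem-and-leaf plot can be built by splitting each number into its tens (the stem) and its ones digit (the leaf). Here is a minimal sketch, assuming whole-number measurements in centimeters; the arm spans are hypothetical (invented for this example) and the function name is ours.

```python
from collections import defaultdict

# HYPOTHETICAL arm spans in cm, invented for illustration only.
arm_spans = [128, 135, 140, 140, 142, 142, 147, 150, 150, 150,
             151, 151, 156, 163, 171]


def stem_and_leaf(data):
    """Return a text plot: stem = the value without its ones digit, leaf = the ones digit."""
    groups = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)   # 142 -> stem 14, leaf 2
        groups[stem].append(leaf)
    lines = []
    # Include stems with no leaves so that gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return "\n".join(lines)


print(stem_and_leaf(arm_spans))
```

Because the leaves are listed in order, the median can be found the same way as before, by counting in from the top or the bottom of the plot.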

 

 

 

This page illustrates a few models for basic statistical analysis techniques that students can use to analyze their single-category (1-variable) data samples. In the next lesson, we'll see a couple of ways to compare two sets of data to see if there are relationships between them.