Prediction Unit

UNIT 1—Prediction

Teacher Materials


TEKS Support
Context Overview
Mathematical Development
Teacher Notes
 Preparation Reading—Let the Bones Speak!
 Activity 1—Using Your Head
 Homework 1—Leg Work
 Supplemental Activity 1—Under Investigation
 Activity 2—Measuring Up
 Homework 2—Follow in My Footsteps
 Supplemental Activity 2—Line Up
 Activity 3—I Predict That
 Homework 3—Exercising Judgment
 Activity 4—Forearmed Is Forewarned
 Homework 4—The Nature of Our Relationship
 Activity 5—Dangerous Waters
 Homework 5—Anscombe’s Data
 Activity 6—The Plot Thickens
 Homework 6—You Are What You Eat
 Unit Project—Who Am I?
 Handout 1—CLASS DATA RECORDING SHEET
 Handout 2—TI-83 INSTRUCTIONS FOR FINDING THE LEAST-SQUARES LINE
 Handout 3—EXCEL 4.0 GUIDANCE FOR FINDING THE LEAST-SQUARES LINE
 Handout 4—TI-83 INSTRUCTIONS: CALCULATING PREDICTED VALUES AND ERRORS
Annotated Student Materials
 Preparation Reading—Let the Bones Speak!
 Activity 1—Using Your Head
 Homework 1—Leg Work
 Supplemental Activity 1—Under Investigation
 Activity 2—Measuring Up
 Homework 2—Follow in My Footsteps
 Supplemental Activity 2—Line Up
 Activity 3—I Predict That
 Homework 3—Exercising Judgment
 Activity 4—Forearmed Is Forewarned
 Homework 4—The Nature of Our Relationship
 Activity 5—Dangerous Waters
 Homework 5—Anscombe’s Data
 Activity 6—The Plot Thickens
 Homework 6—You Are What You Eat
 Assessment—Cattle Stocks
 Unit Project—Who Am I?
 Mathematical Summary—Scatter plots
 Key Concepts
Solution to Short Modeling Practice
 Solution to Christmas Tree Farming
Solutions to Practice and Review Problems

TEKS Support


This unit contains activities that support the following knowledge and skills elements of the TEKS.

(1) (A) | X | (4) (A) |
(1) (B) | X | (4) (B) | X
(1) (C) | X | |
(2) (A) | X | (8) (A) |
(2) (B) | X | (8) (B) |
(2) (C) | | (8) (C) | X
(2) (D) | X | |
(3) (A) | X | (9) (A) |
(3) (B) | | (9) (B) |
(3) (C) | | |

The mathematical prerequisites for this unit are

The mathematical topics included or taught in this unit are

The equipment list for this unit is


Context Overview


This unit investigates statistical concepts involved with prediction. Archaeologists, criminologists, and doctors all have an interest in predicting people’s heights from knowledge of another variable, such as the length of bones in the body or the distance between footsteps. In this unit, students collect their own data on height, forearm length, and stride length. Based on their data, they determine models to predict height from forearm or stride length. For the final project, students analyze skeletal data from the Forensic Anthropological Data Bank and determine models to predict height from the length of long bones in the arms and legs.

In addition to predicting height, students investigate an ecological problem. The manatee is an endangered species with a rising death rate. About one-third of all manatee deaths are attributable to human causes, and among these the leading cause is contact with powerboats. Based on data collected by the Florida Department of Environmental Protection, students decide how increases in powerboat registrations are affecting the life of the manatee.


Mathematical Development


The two major mathematical goals of the unit are to explore bivariate data analysis and to further understanding of linear relationships.

This unit expands students’ acquaintance with data analysis, emphasizing not simply describing data but also using data to make predictions. The unit begins by examining a proportional relationship between people’s head length and their height. Based on data collected from their classmates, students determine the best multiplier for this relationship. In addition, they interpret the meaning of slope in this context.

Later in the unit, students display data using dot plots (one-variable data) and scatter plots (two-variable data) and then use their displays to make predictions. They assess the precision of their predictions based on the variability in the data. Midway through the unit, students encounter a major problem: Given a scatter plot, how do you select the "best" line to describe the data? After looking at several methods and comparing the resulting models, students are introduced to the least-squares line. They use residual plots to judge the adequacy of the linear regression model to describe the pattern in a scatter plot. In addition, they examine the effect that outliers have on the least-squares line and learn to select the best variable for making a prediction.


Teacher Notes





Preparation Reading—Let the Bones Speak!

This reading introduces the major contextual theme of the unit and leaves students with a question: What can you predict about a person from a few bones?

Make sure students begin this unit by reading this preparation reading. Information from this reading is used throughout the unit.





Activity 1—Using Your Head

   


Materials Needed

Meter sticks (at least 2)

Rulers

Mathematically, relationships between height and head length, examined from the perspective of artists, focus attention on models of the form y = mx and the meaning of slope. At the end of the activity, students use one such model to predict a person’s height based on the length of his or her skull bone.

Students should work in small groups (3 to 4 students) on this activity. For Item 2(a) each group will have to share its data with another group.

For Item 1 it is not important that students arrive at "correct" answers. What is important is the reasoning that they use to arrive at their answers. For example, some students may decide that the 416-mm tibia belongs to the same person as the 413-mm femur. They may have arrived at this conclusion based on the data provided in Table 1. Other students may argue that the femur is the longest bone in the body, and thus this tibia belongs to the same person as the 508-mm femur. Let students argue this out for themselves. Don’t give them the answer.

If students struggle to make a guess in Item 1(e), remind them to use their general knowledge about people’s heights.

At the conclusion of this activity, discuss Item 2(d and e). Students should understand that, when a residual error is positive, the prediction underestimates the actual value; when a residual error is negative, the prediction overestimates the actual value. You would like to choose a model where, in some sense, the positive and negative errors are balanced or tend to cancel each other out.

Discuss Item 4 with your students to continue developing the concepts of slope and rate of change. Here are some suggested approaches: numeric reasoning, algebraic approach, and graphical approach.

Numeric reasoning:

Make a table of values similar to Example 1. Explain to the students that you are using the notation Δ(Head length) and Δ(Height) as shorthand for the change in head length and the change in height, respectively. Stress that the direction of the change must be consistent. Positive values indicate increases; negative values indicate decreases.

In this table, start with the preliminary head length, make the indicated change to head length, and note the corresponding change in height. Point out to students that each time the head length increases by 1 cm, the height increases by 7.5 cm. This is true no matter how large the preliminary head lengths are. If the head length increases by 2 cm, then the height increases by 15 cm, or 2 × 7.5 cm.

Head length of preliminary sketch (cm) | Δ(Head length) from preliminary to final sketch (cm) | Height of preliminary sketch (cm) | Height of final sketch (cm) | Δ(Height) (cm)
8 | 1 | 60 | 67.5 | 7.5
9 | 1 | 67.5 | 75 | 7.5
10 | 1 | 75 | 82.5 | 7.5
10 | 2 | 75 | 90 | 15
11 | 2 | 82.5 | 97.5 | 15

Example 1. Table of height values
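If a computer is available, the pattern in Example 1 can also be checked with a few lines of Python. This is only a sketch: it assumes the artists' guideline used in the table, height = 7.5 × head length, and the function name `height_from_head` is ours, chosen for illustration.

```python
# Model assumed from Example 1: height = 7.5 * head length (both in cm).
MULTIPLIER = 7.5

def height_from_head(head_length_cm):
    """Predicted height (cm) for a given head length (cm)."""
    return MULTIPLIER * head_length_cm

# Rows of Example 1: (preliminary head length, change in head length), in cm.
rows = [(8, 1), (9, 1), (10, 1), (10, 2), (11, 2)]

changes = []
for head, d_head in rows:
    # Change in height = final height minus preliminary height.
    d_height = height_from_head(head + d_head) - height_from_head(head)
    changes.append(d_height)
    print(f"head {head} cm, change {d_head} cm -> height change {d_height} cm")
```

Each change in height equals 7.5 times the change in head length, whatever the starting head length. That is the distributive law at work: 7.5(x + d) − 7.5x = 7.5d.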

Algebraic approach:

Make sure the students understand the distributive-law-based answer to Item 4.

Graphical approach:

Use Transparency T.1 to illustrate the change in height corresponding to a 1 cm change in head length.



Transparency T.1: Rate of change

Be sure that students see the connections between the three approaches.

When discussing Item 5, note that groups may have arrived at different predictions if they decided that the length of a deceased person’s skull is smaller than the person’s head length when he or she was living. Using the length of the skull to estimate the length of the person’s head introduces a source of uncertainty (or error) into the prediction process. A second source of error may be the artists’ guidelines. They were meant to be rough guides for drawing figures and not precise methods for predicting height.





Homework 1—Leg Work

   

For this assignment, students work with linear models developed by Dr. Mildred Trotter to predict people’s heights based on femur and tibia lengths. At the conclusion of this assignment, they discover that a person’s femur is generally longer than his or her tibia.

This assignment foreshadows results from the final project. Here students are introduced to several of Dr. Trotter’s equations relating height to lengths of leg bones. At the end of this Homework, they should understand that a person’s femur is longer than his or her tibia.

As background, here is some information on Dr. Trotter. Dr. Mildred Trotter had a long and distinguished physical anthropology career that included working as a special consultant to the U.S. government during World War II. Her task during the war involved the identification of skeletal remains of servicemen. At the time, she realized that bone sizes and proportions vary based on age, sex, race, and ethnic background. Forensic scientists and law enforcement agencies are still using Trotter’s formulas for estimating people’s stature based on the lengths of their bones.





Supplemental Activity 1—Under Investigation

   

In this activity students investigate graphs of members of the y = mx + b family and learn how changes in m and b affect the graphs. In addition, students discover that the appearance of a graph can be altered by changing the scaling on one or both of the axes.

When discussing Item 2, note that, because the graph’s origin is often not displayed on the calculator screen when plotting data, students should keep in mind the location of the origin in relation to the graph. Give students practice locating the origin with sample calculator windows. Show them the screens and ask them to sketch the origins. Below are some sample windows.

[100,200] × [50,90]

[–10,–5] × [5,15]

[75,100] × [–40,–10]

[–10,–5] × [50,90]

Note: Generally, students will use windows in the first quadrant or select the standard viewing screen.
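To check students' sketches of the origin for the sample windows, a small helper such as the one below may be useful. It is a sketch only: the function name `origin_position` and the wording of its descriptions are our own, and a window is written as [xmin, xmax] × [ymin, ymax] as in the list above.

```python
def origin_position(xmin, xmax, ymin, ymax):
    """Describe where (0, 0) lies relative to the window [xmin, xmax] x [ymin, ymax]."""
    if xmin <= 0 <= xmax:
        horizontal = "within the x-range"
    elif xmax < 0:
        # Every visible x is negative, so x = 0 lies to the right.
        horizontal = "right of the screen"
    else:
        # Every visible x is positive, so x = 0 lies to the left.
        horizontal = "left of the screen"

    if ymin <= 0 <= ymax:
        vertical = "within the y-range"
    elif ymax < 0:
        # Every visible y is negative, so y = 0 lies above.
        vertical = "above the screen"
    else:
        # Every visible y is positive, so y = 0 lies below.
        vertical = "below the screen"
    return horizontal, vertical

# The four sample windows from the notes.
windows = [(100, 200, 50, 90), (-10, -5, 5, 15), (75, 100, -40, -10), (-10, -5, 50, 90)]
for w in windows:
    print(w, origin_position(*w))
```

For the first window, for instance, the origin lies to the left of and below the screen.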


Activity 2—Measuring Up

   

Materials Needed

Handout 1 (to record class data)

Tape measures, rulers, meter sticks

Background Reading: Dr. Mildred Trotter’s Study of Military Personnel

In this activity, students plan how they will measure and collect data on students’ heights and forearm lengths. Then they collect the data from students in their class.

The quotation below sheds light on how the military measured the height of personnel in the 1940s. (Dr. Trotter’s equations were based on military personnel from this time period.) Share this reading with your class; students may be surprised at the level of detail in the regulation. Can they find the one important detail that’s missing? The regulation appears in a paper by Mildred Trotter and Goldine Gleser (1952).

In Mobilization Regulations, War Department, October 15, 1942 (Regulation 10):

"Directions for taking height. Use a board at least 2 inches wide by 80 inches long, placed vertically, and carefully graduated to 1/4 inch between 58 inches from the floor and the top end. Obtain the height by placing vertically in firm contact with the top of the head, against the measuring rod, an accurately square board of about 6 by 6 by 2 inches, best permanently attached to graduated board by a long cord. The individual should stand erect with back to the graduated board, eyes straight to the front."

As detailed as the regulation appears, something was forgotten. In another set of mobilization regulations dated April 19, 1944, the same essential directions were given with the following sentence added:

"The shoes should be removed when the height is taken."

Mobilization Regulations, War Department, April 19, 1944 (Regulation 10). (Trotter and Gleser 1952, 469-470).





Homework 2—Follow in My Footsteps

   

Determining a standard method for measuring stride length is a bit more complicated than measuring height. For this assignment, students write a set of instructions for measuring a person’s stride length.

At some time prior to Activity 6, you should collect the stride length data from students in the class.





Supplemental Activity 2—Line Up

   

This activity reviews determining an equation for a line given its graph. Students use both slope-intercept and point-slope forms to determine equations of lines from their graphs.

This is an optional review activity for students who are rusty in the use of the slope-intercept and point-slope forms for determining the equation of a line.



Activity 3—I Predict That

   


Materials Needed

Class data on heights and forearm lengths

This activity focuses on the idea of variability in data and its relation to the precision of predictions made from the data. Students analyze one variable, student heights. They assess the precision of using the mean as a predictor of height.

The purpose of this activity is for students to see that precision in prediction is linked to variability in the data. Dot plots are introduced as a graphical tool for analysis of one-variable data, and the mean is suggested as a simple predictor for such data.

Discuss Item 3. Notice that, instead of using the minimum and maximum heights for the prediction interval, we narrow the interval by omitting the three smallest and three largest observations. Although this allows for a more precise prediction (the interval is narrower), omitting data also increases the chance that the prediction will be false. Discuss why the increase in precision is probably worth the increased risk of being wrong.
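The trade-off between a narrower interval and the risk of a wrong prediction can be illustrated numerically. The sketch below uses the 27 heights from the sample class data listed in these notes for Activity 6 as a stand-in for your own class data; the function names `trimmed_interval` and `coverage` are ours.

```python
# Heights (cm) from the sample class data listed for Activity 6 in these notes.
heights = [166.0, 178.0, 171.0, 165.0, 177.5, 166.0, 175.5, 171.0, 184.0,
           184.5, 183.5, 172.0, 164.5, 166.0, 168.0, 178.5, 166.0, 159.0,
           166.0, 154.5, 161.0, 177.0, 161.0, 164.0, 174.0, 164.0, 168.0]

def trimmed_interval(values, k):
    """Interval that remains after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    return ordered[k], ordered[-k - 1]

def coverage(values, low, high):
    """Fraction of the values that fall inside [low, high]."""
    inside = [v for v in values if low <= v <= high]
    return len(inside) / len(values)

full = trimmed_interval(heights, 0)    # minimum to maximum
narrow = trimmed_interval(heights, 3)  # omit three smallest and three largest

print("full range:", full, "covers", coverage(heights, *full))
print("trimmed:   ", narrow, "covers", round(coverage(heights, *narrow), 2))
```

For these data the trimmed interval is 17.5 cm wide instead of 30 cm, but it no longer contains every observed height, which is the increased risk discussed above.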

In Item 4, students consider the mean as a predictor. Point out that asking how far off a prediction might be is another way of asking how large the prediction error might be. Point out that the error depends on the spread (variability) of the data.

In Item 5, note that, whenever you see data that are bimodal (appear roughly as two mounds), you should ask if the data contain two subpopulations. If you can identify the subpopulations, which in this case are the boys and the girls, you should analyze each separately and then compare the results.

In Item 6, the girls’ data are less variable (exhibit less spread) than the entire data set. This reduction in variability allows a more precise prediction.

The purpose for Item 8 is to acquaint students with calculator output from one-variable statistics calculations. Check to see that students understand the mathematical notations for sum and mean.

If you run short of time, students can complete Items 10 and 11 on their own. Note that the term outlier is defined in Item 10. In Item 10, students discover the effect that outliers have on the mean and the importance of adjusting predictions when outliers are present. For Item 11, check to see that students are aware of the link between the precision of predictions and the variability of data. This concept will reappear when students analyze precision of predictions that are based on linear models.





Homework 3—Exercising Judgment

   
This assignment provides students with more experience comparing two sets of data and practice in drawing reasonable conclusions.

In Item 1, although students will notice that, on average, the mothers who smoked had babies that weighed less than those of mothers who did not smoke, they may not notice that all of the babies who weighed under 6 lb. had mothers who smoked.

In Item 2, each set of data contains an outlier that inflates the mean. Be sure that students recognize this, remove the outliers, and compute the means of the data that remain. For Items 2(f) and (g), check that students understand why a scatter plot is an inappropriate way to display these data. Scatter plots are used when there is an assumption that two quantities obtained as matched pairs are related. There is no such pairing here, and no natural reason to pair particular numbers for the two groups.






Activity 4—Forearmed Is Forewarned

   


Materials Needed

Graph paper

Ruler

Spaghetti or toothpicks

This activity emphasizes the use of scatter plots in identifying and describing relationships. Students face the problem of selecting the "best model" to describe the pattern of a scatter plot. In deciding between two contenders for the best model, students analyze both models’ residuals.

Item 2 is designed to connect analyses of one-variable data (discussed in Activity 3) to two-variable settings, bridging dot plots to scatter plots. By drawing a vertical line to specify a single value of the independent variable, students can interpret the data that fall along that line (or close to it) as a vertical dot plot.

For example, to view the variability in heights for students with forearm length 27 cm, draw the vertical line x = 27. Then look at the range of heights for students with 27-cm forearms (or close to 27-cm forearms).
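The "vertical dot plot" idea can be mimicked by selecting the heights of students whose forearm length is near a chosen value. The sketch below uses the sample class data listed in these notes for Activity 6; the tolerance of 0.5 cm for "close to 27 cm" is our assumption, as is the function name `heights_near`.

```python
# (forearm length cm, height cm) pairs from the sample class data for Activity 6.
pairs = [(28.5, 166.0), (29.0, 178.0), (27.2, 171.0), (28.0, 165.0),
         (31.3, 177.5), (28.3, 166.0), (28.6, 175.5), (31.5, 171.0),
         (30.5, 184.0), (30.8, 184.5), (30.5, 183.5), (30.3, 172.0),
         (24.2, 164.5), (27.3, 166.0), (28.0, 168.0), (29.1, 178.5),
         (27.9, 166.0), (28.0, 159.0), (27.4, 166.0), (25.8, 154.5),
         (27.0, 161.0), (30.1, 177.0), (26.5, 161.0), (28.2, 164.0),
         (28.4, 174.0), (26.8, 164.0), (26.4, 168.0)]

def heights_near(pairs, x, tolerance=0.5):
    """Sorted heights of students whose forearm length is within tolerance of x."""
    return sorted(h for forearm, h in pairs if abs(forearm - x) <= tolerance)

# Heights that fall on or near the vertical line x = 27.
slice_27 = heights_near(pairs, 27)
print("heights near x = 27:", slice_27)
print("range:", min(slice_27), "to", max(slice_27))
```

The spread of the selected heights plays the same role for the scatter plot that the spread of a dot plot played in Activity 3.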

After students have calculated a few predicted values and residual errors in Item 4, you may wish to help them use calculator lists to speed their work. See Handout 4 for TI-83 calculator instructions.

For Item 5, students may find it helpful to use a tangible object, such as a strand of uncooked spaghetti (or a toothpick if they’re working on calculator screens), to represent the line. That way, they can easily adjust the line until they are satisfied with how it fits the data.





Homework 4—The Nature of Our Relationship

   

This assignment introduces the ideas of direction (positive or negative), form (linear or nonlinear), and strength (strong or weak) of a relationship.

Note: After students have completed this assignment, review some of the new vocabulary words, such as positively and negatively related, linear and nonlinear form, and weak and strong relationships.





Activity 5—Dangerous Waters

   

Students fit a least-squares line to describe the relationship between the number of manatees killed per year and the number of powerboat registrations. They use their model for analysis and prediction. In addition, they learn to use residual plots to assess whether their model is adequate to describe the data.

For Item 2, you will need to teach students to calculate the equation for the least-squares line on the graphing calculator or computer. Handout 2 contains TI-83 instructions for computing the least-squares line. Handout 3 provides similar instructions for Excel. For other calculators or spreadsheets, check your manual.

Item 4 states two essential criteria related to good fits and defines "residual plot." Stress student understanding of what this plot really means. The randomness of this plot should be the primary criterion for deciding that a model is reasonable. You may refer students to Handout 4 if they need help calculating the residuals on their calculators.

For Item 6, check that students realize that the number of powerboat registrations is in units of 1,000.

You may want to point out that a "good" residual plot looks like a bunch of dots thrown haphazardly at a piece of paper; the dots should appear randomly scattered around the x-axis. If the dots do not look randomly scattered around the x-axis but instead form a clear pattern, you should look for another model to describe your data.

Some calculators give the value of Pearson’s correlation coefficient, r, as part of the output from a linear regression. If this is the case, you may want to tell students that r is a measure of the strength and direction of a linear relationship. However, stress that judging the goodness of a fit should begin with examining the graphs of the original data and the residual plot.
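The caution about relying on r alone can be demonstrated with a short example. The sketch below fits a least-squares line to made-up curved data (y = x squared, not data from this unit) and shows that r is close to 1 even though the residuals follow a clear pattern. The function name `least_squares` is ours.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (slope, intercept, r) for the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Made-up curved data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

slope, intercept, r = least_squares(xs, ys)
# Residual = actual y minus predicted y.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
signs = ["+" if e > 0 else "-" for e in residuals]

print("r =", round(r, 3))
print("residual signs in order of x:", signs)
```

The residuals are positive at both ends and negative in the middle, so the residual plot is not randomly scattered about the x-axis, and a line is not an adequate model despite the large value of r.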





Homework 5—Anscombe’s Data

   

Students fit least-squares lines to four data sets and discover that they get the same equation in all four cases. After examining scatter plots of the data, students learn that the least-squares equation is an adequate model for describing the pattern in only one of the data sets.

This is a famous data set. You will find it in numerous statistics texts.
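If you would like to verify the result before assigning the homework, the sketch below fits a least-squares line to each of the four sets. It assumes the homework uses the standard published values of Anscombe's data; compare them with the values in the student pages before relying on the output. The function name `least_squares` is ours.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Standard published values of Anscombe's four data sets (assumed to match the homework).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
datasets = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

fits = {name: least_squares(xs, ys) for name, (xs, ys) in datasets.items()}
for name, (slope, intercept) in fits.items():
    print(f"Set {name}: y = {intercept:.2f} + {slope:.2f}x")
```

To two decimal places all four fits agree, which is why the scatter plots, not the equations, are needed to tell the data sets apart.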





Activity 6—The Plot Thickens

   


Materials Needed

Tape measures or meter sticks

Partially completed Handout 1

Students decide which of two independent variables, forearm or stride length, is a better predictor of student height. In addition, students work through an analysis illustrating how forensic data can help solve crimes.

For Item 1, one method for deciding which of two independent variables yields more precise predictions for the same dependent variable is to select the relationship that has the smaller average of squared errors. Taking the average instead of the sum adjusts for situations where the scatter plot of one relationship has more data than the scatter plot of another relationship. For example, this method can be used to compare the regression equation for predicting height from the whole class’s height-forearm data with the regression equation based on only the boys’ height-forearm data.

The average of squared errors is one estimate of the variance about the least-squares line. It is, however, not the one generally used by statisticians. Statisticians generally use the unbiased estimator SSE/(n–2) where n is the number of cases or the sample size, but this is not relevant to student work in this unit.
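The comparison can be sketched in Python as follows, using the sample class data listed in these notes for this activity. The function names are ours, and the code reports both the average of squared errors used in this unit and SSE/(n–2); it is an illustration, not the method students are expected to follow on their calculators.

```python
# (height cm, stride length cm, forearm length cm) from the sample class data.
data = [
    (166.0, 58.25, 28.5), (178.0, 68.5, 29.0), (171.0, 58.5, 27.2),
    (165.0, 50.125, 28.0), (177.5, 58.75, 31.3), (166.0, 62.875, 28.3),
    (175.5, 59.125, 28.6), (171.0, 67.75, 31.5), (184.0, 68.875, 30.5),
    (184.5, 66.25, 30.8), (183.5, 79.5, 30.5), (172.0, 70.5, 30.3),
    (164.5, 55.875, 24.2), (166.0, 52.375, 27.3), (168.0, 55.375, 28.0),
    (178.5, 59.75, 29.1), (166.0, 48.375, 27.9), (159.0, 57.125, 28.0),
    (166.0, 64.0, 27.4), (154.5, 57.75, 25.8), (161.0, 63.5, 27.0),
    (177.0, 69.75, 30.1), (161.0, 72.5, 26.5), (164.0, 75.25, 28.2),
    (174.0, 58.5, 28.4), (164.0, 59.75, 26.8), (168.0, 55.25, 26.4),
]
heights = [row[0] for row in data]
strides = [row[1] for row in data]
forearms = [row[2] for row in data]

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def squared_error_summary(xs, ys):
    """Return (SSE / n, SSE / (n - 2)) for the least-squares line through the data."""
    slope, intercept = least_squares(xs, ys)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    n = len(xs)
    return sse / n, sse / (n - 2)

forearm_avg, forearm_unbiased = squared_error_summary(forearms, heights)
stride_avg, stride_unbiased = squared_error_summary(strides, heights)

print("forearm: average squared error", round(forearm_avg, 2),
      " SSE/(n-2)", round(forearm_unbiased, 2))
print("stride:  average squared error", round(stride_avg, 2),
      " SSE/(n-2)", round(stride_unbiased, 2))
```

The variable with the smaller average of squared errors gives the more precise predictions of height for these students.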

To speed the completion of Item 1, you may decide to work the item as a whole class activity.

The sample answers to the remainder of the items in this activity are based on the following data collected from a group of 9th and 10th graders.

Name | Gender | Height (cm) | Stride Length (cm) | Forearm Length (cm)
Scott | Male | 166.0 | 58.25 | 28.5
John | Male | 178.0 | 68.5 | 29.0
Matt | Male | 171.0 | 58.5 | 27.2
Will | Male | 165.0 | 50.125 | 28.0
Michael | Male | 177.5 | 58.75 | 31.3
Jeffrey | Male | 166.0 | 62.875 | 28.3
Even | Male | 175.5 | 59.125 | 28.6
Brad | Male | 171.0 | 67.75 | 31.5
Lonnie | Male | 184.0 | 68.875 | 30.5
William | Male | 184.5 | 66.25 | 30.8
Robert | Male | 183.5 | 79.5 | 30.5
Karim | Male | 172.0 | 70.5 | 30.3
Meredith | Female | 164.5 | 55.875 | 24.2
Lee | Female | 166.0 | 52.375 | 27.3
Pilar | Female | 168.0 | 55.375 | 28.0
Ansley | Female | 178.5 | 59.75 | 29.1
Julie | Female | 166.0 | 48.375 | 27.9
Becton | Female | 159.0 | 57.125 | 28.0
Elizabeth | Female | 166.0 | 64.0 | 27.4
Shannon | Female | 154.5 | 57.75 | 25.8
Jamie | Female | 161.0 | 63.5 | 27.0
Jeris | Female | 177.0 | 69.75 | 30.1
Kat | Female | 161.0 | 72.5 | 26.5
Blaie | Female | 164.0 | 75.25 | 28.2
Frances | Female | 174.0 | 58.5 | 28.4
Eliz | Female | 164.0 | 59.75 | 26.8
Baily | Female | 168.0 | 55.25 | 26.4

For Item 2, if you have not already collected class data on student stride lengths, you should do so. (See Homework 2.) After deciding on a method for collecting the data, each group can be responsible for collecting the data from its members. After groups have collected their data, pool the results. Students should record these results in the last column of Handout 1. If you have already collected the stride-length data, students can read quickly through Items 2 and 3 and begin their work at Item 4.





Homework 6—You Are What You Eat

   

In this assignment students discover the drastic effect that outliers can have on a regression line by comparing models computed with and without outliers. Students also learn to seek the interpretation of outliers in particular settings.

After students have completed this assignment, discuss how the presence of outliers affects the values of m and b in the least-squares equation. The least-squares equation can be very sensitive to outliers, particularly if they occur at the extremes. In these situations, the least-squares line does a poor job of describing the pattern of the majority of the data.

In those cases where you can determine that the outliers are "unusual points" that are not representative of the relationship, remove these points and recalculate the equation of the least-squares line using the remaining data. For example, in Item 2 (the situation with the swimming data) a good argument could be made that the first two times were not "typical" because the swimmer was still learning the butterfly. In this case, it seems reasonable to remove the outliers and refit the model.
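The sensitivity of the least-squares line to an outlier at the extremes can be shown with a small made-up example (these are not the swimming data from Item 2). Eight points lie exactly on y = 2x + 1, and a ninth point at the right-hand end lies far below that line. The function name `least_squares` is ours.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data for illustration: eight points exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2 * x + 1 for x in xs]

# One unusual point at the extreme right, far below the line.
outlier = (9, 2)

m_without, b_without = least_squares(xs, ys)
m_with, b_with = least_squares(xs + [outlier[0]], ys + [outlier[1]])

print(f"without outlier: m = {m_without:.2f}, b = {b_without:.2f}")
print(f"with outlier:    m = {m_with:.2f}, b = {b_with:.2f}")
```

A single extreme point cuts the slope to less than half its original value, so the refitted line no longer describes the other eight points well.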






Unit Project—Who Am I?

This project can be adapted for a wide range of student abilities and time constraints. Students can complete their analysis using a spreadsheet or a graphing calculator. Ideally, students work in groups, and each group should present its work in a formal written report. You may want to have groups give oral presentations in addition to or in place of the written report. Work may also be done individually if more time is available.

Below is a brief set of guidelines for reports. You may decide to give more detailed guidelines of your own design.

If possible, let students decide for themselves how they will complete this project. Encourage them to plan what equations they will need to determine and then divide the work among group members. If some groups struggle, you may need to provide additional structure.

The following is a direct method (not necessarily the best method) of addressing the questions in this project.

Students may use several different approaches in developing equations to predict the heights of Bones 1 and Bones 2. First, they should realize that there are two bones that can be used to predict height: the femur and the ulna. So, they should begin predicting height using each of these independent variables.

Note that the data that appear in the student pages of this project are also provided as computer files, as listed:

Column headings are not included in these files. However, the calculator file is a program that stores the data to named lists. See student pages for the heading labels and units of measure.





Handout 1—CLASS DATA RECORDING SHEET



Female

Name | Forearm length (cm) | Height (cm) | Stride length (cm)

Male

Name | Forearm length (cm) | Height (cm) | Stride length (cm)

                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 
                 




Handout 2—TI-83 INSTRUCTIONS FOR FINDING THE LEAST-SQUARES LINE

Here’s how to use the TI-83 to calculate the least-squares line and its residuals.

  1. Enter the independent variable values into L1 and the dependent variable values into L2. Then make a scatter plot of the data in an appropriate window.
  2. To calculate the equation of the least-squares line, first press STAT, CALC, LinReg(ax + b). To complete the command, you need to tell the calculator which list contains the data for the explanatory variable, which list contains the data for the dependent variable, and where you want the least-squares equation stored. Complete the entry to match the one shown in Figure 1 and then press ENTER.

     Figure 1. Setting up linear regression

  3. Press Y=. Has the regression equation been stored as Y1? Press GRAPH to view the scatter plot with a graph of the least-squares line.
  4. The TI-83 computes the residuals automatically and stores them in a list named RESID whenever you use a built-in regression command. To see the residuals displayed, place the cursor on the header of list L3, press INS, and select RESID from the LIST menu. Your screen should look like Figure 2.

     Figure 2. Entering the residuals

Press ENTER again and the residuals will appear. How many are positive and how many negative? What is the absolute value of the largest residual? How can you use the features on your calculator to find the sum of the residuals?
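For classes working on a computer rather than a TI-83, the same quantities can be computed with a short Python sketch. The data here are made up for illustration, and the variable names are ours; the slope and intercept come from the usual least-squares formulas.

```python
# Made-up data for illustration: xs plays the role of L1, ys the role of L2.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept for y = a*x + b.
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
a = sxy / sxx
b = mean_y - a * mean_x

# Residual = actual y minus predicted y (what the calculator stores in RESID).
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]

positive = sum(1 for e in residuals if e > 0)
negative = sum(1 for e in residuals if e < 0)
largest = max(abs(e) for e in residuals)

print(f"least-squares line: y = {a:.2f}x + {b:.2f}")
print("positive residuals:", positive, " negative residuals:", negative)
print("largest absolute residual:", round(largest, 2))
print("sum of residuals:", round(sum(residuals), 10))
```

The sum of the residuals from a least-squares line is zero apart from rounding, which is a useful check on students' answers to the last question.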





Handout 3—EXCEL 4.0 GUIDANCE FOR FINDING THE LEAST-SQUARES LINE

Complete Activity 5 in your text. Use these instructions to assist your work using Excel 4.0.

  1. Data entry.


    1. Open your saved spreadsheet from Activity 8.
    2. Enter the powerboat and manatee deaths data in columns A and B.
  2. Finding the least-squares equation.


    1. Clear the formula for the predicted y-values by highlighting column C and choosing Clear from the Edit menu.
    2. Move the cursor to the cell in which you have the slope, D2. Enter the formula =SLOPE(B2:B15, A2:A15). Note that the y-values are listed first in the Excel formula. Press Enter to see the result.
    3. Move to an empty cell nearby, say, D4. Type a new label, "y-int." Then move to D5 and enter the formula =INTERCEPT(B2:B15, A2:A15). Press Enter for the result (Figure 1).




    Figure 1. Computing the intercept of the least-squares line in Excel 4.0




Handout 4—TI-83 INSTRUCTIONS: CALCULATING PREDICTED VALUES AND ERRORS

Example: Find the errors using the linear model y = 2x + 1.



Figure 1. Sample data

Recall that prediction errors are defined as Yactual - Ypredicted.

  1. Enter the data in lists L1 and L2. List L2 contains the Yactual. Then enter y = 2x + 1 as Y1. Now, plot the data and graph the line. If you used ZOOM 9 to set the viewing window, your results should match the graph in Figure 2.

     Figure 2. Scatter plot and graph of a line

  2. Next, use the linear model to store in L3 the predicted value of y for each of the values of x in L1. Here’s how to do it:


  3. To get the prediction errors of the y = 2x + 1 model, all you need to do is subtract the predicted values in L3 from the actual y-values in L2, that is, L2 – L3. Here’s how to do it:
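The list arithmetic in these steps can be mirrored in Python. This is a sketch with made-up data (the sample data in Figure 1 are not reproduced here); the names `l1`, `l2`, and `l3` are chosen to match the calculator lists.

```python
# Made-up data for illustration: l1 holds the x-values, l2 the actual y-values.
l1 = [1, 2, 3, 4]
l2 = [4, 4, 8, 8]

def model(x):
    """The linear model from the example, y = 2x + 1."""
    return 2 * x + 1

# Predicted y-values, the counterpart of list L3.
l3 = [model(x) for x in l1]

# Prediction error = Yactual - Ypredicted, that is, L2 - L3.
errors = [actual - predicted for actual, predicted in zip(l2, l3)]

print("predicted:", l3)
print("errors:   ", errors)
```

A positive error means the model underestimates the actual value, and a negative error means it overestimates.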



Annotated Student Materials






Preparation Reading—Let the Bones Speak!



Legend has it that, about 100 years ago, somewhere in Arizona’s Superstition Mountains, a Dutchman by the name of Jacob Walz murdered a group of gold miners in order to claim their mine for himself. Over the years, he would periodically be seen in Phoenix with saddlebags filled with rich ore. Many attempted to follow Walz when he returned to the mine, but he always managed to lose trackers in the rugged wilderness. Walz died in 1891.

For over a century, people have searched without success for the Lost Dutchman Mine. Some have lost not only time and money, but their lives. At least two searchers are known to have been murdered during their quest. Others, unable to meet the physical challenges of the rugged area, never returned from their treks and remain missing.

From time to time, human bones are found in rugged areas such as the Superstition Mountains. Suppose that a contemporary gold digger searching for the Lost Dutchman Mine finds a skull, eight long bones, and numerous bone fragments. He notifies the local authorities who, in turn, send out a team to investigate. After first documenting the exact location and position of the bones at the site, the team records information about the bones, such as their size and general condition. A partial list of information similar to what might be recorded is contained in Table 1.

Bone type | Number found | Length (mm)
Femur | 3 | 413, 414, 508
Tibia | 1 | 416
Ulna | 2 | 228, 290
Radius | 1 | 215
Humerus | 1 | 357
Skull | 1 | 230
Fragments | More than 10 | From 30 to 50 mm

Table 1. Sample record of bones found at site

In such situations, police frequently request help from forensic anthropologists to identify the deceased and determine the cause of death. The bones tell the forensic scientists a story about who the deceased were and frequently how they died. Sometimes the story is an old one, as would be the case if the bones belonged to the gold miners murdered by Walz. Other times, the bones tell a story of more recent crime and provide police with clues that may help them solve a mystery.


Figure 1




Activity 1—Using Your Head

   

 = 1, 2, 3, 4

In this unit you will be asked to think like a forensic scientist. After studying the data in Table 1 of the preparation reading, you will begin to sort out clues about the deceased from their bones. For the final project at the end of this unit, you will write a report detailing their story.

  1. Study the data in Table 1 of the preparation reading.


    1. A forensic scientist would tell you that these bones belonged to at least two people. How would the scientist know this for sure?


    2. There are three femurs. Two of the femur bones are very close in length (413 mm and 414 mm) and probably belong to the same person. So, these three bones most likely belong to two people. It is possible that the bones actually belong to more than two people.

    3. Which bones do you think belonged to the same person? On what assumptions did you base your answer? How sure are you of your answer? (Name the dead people Bones 1, Bones 2, and so on, to make it easier to classify their bones.) You will need to refer to your answers to this item in Homework 1.


    4. Sample answer:

      Assume that there are only two people.

      Bones 1: femurs–413, 414; tibia–416; ulna–228; and radius–215.

      Bones 2: femur–508; ulna–290; humerus–357.

      The skull bone could belong to either Bones 1 or Bones 2.

      Since the two femurs—413 mm and 414 mm—are close in length, they most likely belong to the same person, Bones 1. From the skeleton in Figure 1, it appears reasonable to assume that a person’s tibia should be fairly close in length to her femur, so the 416-mm tibia probably belongs to Bones 1. Assuming that there are only two people, the person with shorter legs probably also has shorter arms. That means that the 228-mm ulna belongs to Bones 1. The length of a person’s radius should be close to the length of the ulna, and hence the 215-mm radius belongs to Bones 1.

      Bones 2, the taller of the two people, has a 508-mm femur and a 290-mm ulna. A person’s humerus should be just a bit longer than his ulna (based on the diagram of the skeleton in the preparation reading). Assuming that there are only two people, the 357 mm humerus belongs to Bones 2. (It’s possible that this bone belongs to a third person.)

      Students probably will be most uncertain to whom the skull and humerus belong. Some students may know that the femur is the longest bone in the body and may conclude that the 416-mm tibia belongs to Bones 2.

    5. Do you think the deceased were male or female? On what evidence did you base your answer?


    6. Sample answer:

      There is very little evidence to suggest whether the decedents are male or female. Because the femur bones of one of the decedents are considerably shorter than the other’s, it’s possible that Bones 1 (shorter femur) is female and Bones 2 is male. However, it could just as easily be the case that the decedents are two males, one short and one tall.

    7. Do you think the deceased were young children or adults? Defend your answer.


    8. Sample answer #1: Most likely the decedents were adults. What would young children be doing out in the Superstition Mountains?

      Sample answer #2: When I measured my own forearm to get an estimate of how long an adult’s ulna might be, it measured about 260 mm. (The actual length of my ulna would be a bit less than this measurement.) So, Bones 2 is probably an adult. Bones 1 might be a child.

    9. Guess the heights of the deceased. How accurate do you think your guesses are?


    10. Sample answer:

      Assuming that Bones 1 is female and Bones 2 is male, my guesses are that Bones 1 is 5 feet 5 inches and Bones 2 is 5 feet 10 inches. These guesses were based on my estimates of average heights for adult females and adult males, respectively. While these guesses are fairly rough, they should be within a foot of the actual heights.

  1. One place to look for some help in estimating heights is artists’ guidebooks for sketching human figures. Artists have found that the rule of thumb, "draw a 14-year-old 7 head-lengths tall," helps them draw teenagers with heads correctly proportioned to their bodies. How closely do the dimensions of real students, such as those in your class, match the ideal relationship suggested by artists?


    1. Within your group, measure each person’s head length (from chin to the top of the head). Record your data in the first two columns of a table similar to the one in Table 2. Be sure to specify your units of measurement at the top of the last four columns. (It may be easiest to record measurements in cm.)



      Name     Head length     Predicted height     Actual height     Residual error: Actual – Predicted

      Table 2. Group head length and height data


    3. Use the relationship "height = 7 head lengths" to predict each person’s height. Record your results in the third column of your table.


    4. See sample answer to (d).

    5. Next, measure each person’s actual height and record your results.


    6. See sample answer to (d).

      In almost every situation in which predictions are made from data, it is useful to examine the residual errors. Residual errors are defined as the difference between the actual value and the predicted value for each point in your data.

    7. Calculate the residual errors corresponding to the people represented in your table by subtracting the predicted heights in column 3 from the actual heights in column 4. Record the results in column 5. So that you have sufficient data to detect patterns, collect the data from another group and add it to the bottom of your table.

      Sample Answer:

      Name       Head length (cm)   Predicted height (cm)   Actual height (cm)   Residual error, Actual – Predicted (cm)
      Jan        23.0               161.0                   168.0                7.0
      Horrace    25.0               175.0                   178.0                3.0
      Betty      24.0               168.0                   166.5                –1.5
      John       25.0               175.0                   184.0                9.0
      Joy        23.5               164.5                   170.5                6.0
      Lora       22.0               154.0                   159.0                5.0



      The answers to items 1(a) through (d) depend on the individuals in the group and how measurements were taken. Expect answers to vary greatly from group to group. If most students are older than 14 years old, expect most of the residual errors to be positive.

    8. If a residual error is positive, what does that tell you about your prediction? What if an error is negative? What if an error is zero?


    9. Positive residual errors indicate that your predictions are too low; negative errors mean predictions are too high; zero residual errors mean the predicted heights match the actual heights.

    10. Are the residual errors fairly evenly divided between positive and negative values? How well did the relationship "height = 7 head lengths" do in predicting the actual heights of members of your group?


    11. Sample answer:

      Most of the residual errors were positive. So, the rule of thumb seems to frequently underestimate students’ actual heights.

    12. Would a multiplier different from 7 do a better job? If so, what multiplier would you choose? What process did you use to determine this multiplier? Why do you think it does a better job than the multiplier 7?


    13. Sample answer based on sample answer to (d):

      Use the average ratio of height to head length: (1/6)(168.0/23.0 + 178.0/25.0 + 166.5/24.0 + 184.0/25.0 + 170.5/23.5 + 159.0/22.0) ≈ 7.2.

      Using 7.2 as the multiplier produces the following residual errors.



      Name       Head length (cm)   Predicted height (cm)   Actual height (cm)   Residual error, Actual – Predicted (cm)
      Jan        23.0               165.6                   168.0                2.4
      Horrace    25.0               180.0                   178.0                –2.0
      Betty      24.0               172.8                   166.5                –6.3
      John       25.0               180.0                   184.0                4.0
      Joy        23.5               169.2                   170.5                1.3
      Lora       22.0               158.4                   159.0                0.6



      Using this multiplier, two of the six residual errors are negative; the size of errors tends to be smaller. In addition, the errors sum to zero so that the amount overestimated balances the amount underestimated.
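The residual calculations in the two sample tables can be checked with a short Python sketch. The data are the sample measurements from the tables above; the function and variable names are illustrative choices, not part of the unit.

```python
# Sample data from the tables above: (name, head length in cm, actual height in cm).
data = [
    ("Jan", 23.0, 168.0),
    ("Horrace", 25.0, 178.0),
    ("Betty", 24.0, 166.5),
    ("John", 25.0, 184.0),
    ("Joy", 23.5, 170.5),
    ("Lora", 22.0, 159.0),
]


def residuals(multiplier):
    """Residual errors (actual - predicted) for height = multiplier * head length."""
    return [round(height - multiplier * head, 1) for _, head, height in data]


# Average of the height / head-length ratios, used as the alternative multiplier.
mean_ratio = sum(height / head for _, head, height in data) / len(data)

print("Average ratio:", round(mean_ratio, 1))
print("Residuals with 7.0:", residuals(7.0))
print("Residuals with 7.2:", residuals(7.2))
print("Sum of residuals with 7.2:", round(sum(residuals(7.2)), 1))
```

Comparing the two residual lists side by side makes it easy to see that the 7.2 multiplier gives a mix of positive and negative errors, while the 7.0 multiplier underestimates almost everyone.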

  1. The relationship between height and head length changes with age. Therefore, artists adjust their guideline based on the age of the person they are drawing.


    1. When drawing sketches of adults (ages 18-50) artists follow this guideline: Draw the figure of an adult approximately seven and one-half head-lengths tall. Write a formula that describes the relationship between height, H, and head length, L, according to the artists’ guideline for drawing an adult.


    2. H = 7.5L

    3. Write two additional formulas, one representing guidelines for drawing sketches of 14-year-olds and the other for drawing sketches of students from your class. (As in (a), use the variables H and L.)


    4. H = 7.0L

      Sample answer: H = 7.2L

    5. Using your graphing calculator, on the same set of axes, graph the equations describing the relationships between height, H, and head length, L, for 14-year-olds, for the students in your class, and for adults. (In other words, sketch the graphs of the formulas that you have written for (a) and (b). You will have to rename variables H and L, y and x, respectively, when you enter the formulas into your calculator.) Adjust the settings for Xmin, Xmax, Ymin, and Ymax so that the x-interval includes all reasonable head lengths and the y-interval includes all reasonable heights. Then make a careful sketch of your three graphs. Be sure that you label each graph with its equation and indicate the scale on each axis.


    6. Sample answer:



      Note that the shaded sections indicate portions of the graphs that represent the real-world relationship between height and head length. (Students may decide to choose a window cropped to capture the shaded sections.)

    7. How are the three graphs the same, and how are they different? What effect does changing the value of the multiplier have on the graph?


    8. These three graphs are all lines that pass through the origin. The multiplier controls the amount of inclination: the larger the multiplier the steeper the incline.

    9. Using the artists’ guidelines for adults, predict the height of a person whose head length measures 23.0 cm. Without doing further calculations, would your estimate be higher or lower if you knew the person was only 13 years old? Explain how you could use your graphs to answer the preceding question.


    10. Predicted height: H = 7.5(23.0) = 172.5 cm. This estimate is likely to be too high. The graph H = 7.5L lies above the graph H = 7.0L for L > 0. So, for each positive entry for L, the value for H will be larger from the first relationship than from the second.

  2. Juan decides to draw a picture of his mother standing by a window. He follows the artists’ guidelines for drawing adults. He makes a preliminary sketch, but then decides that the figure is too small. So, for his final sketch, he draws the head of his figure 1 cm longer than in his preliminary sketch and continues to follow the artists’ guidelines. How much taller than his preliminary sketch is Juan’s final sketch? Justify your answer.


  3. Each time head length is changed by 1 cm, the height gets changed by 7.5 cm. This is true regardless of the size of the head in the preliminary sketch.
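A minimal Python sketch of the artists’ guidelines illustrates both answers: the adult guideline predicts a greater height than the 14-year-old guideline for the same head length, and a 1-cm change in head length always changes the predicted height by the multiplier. The function name and the trial head lengths are illustrative assumptions.

```python
ADULT_MULTIPLIER = 7.5   # artists' guideline for adults
TEEN_MULTIPLIER = 7.0    # artists' guideline for 14-year-olds


def predicted_height(head_length_cm, multiplier):
    """Height predicted by the rule H = multiplier * L."""
    return multiplier * head_length_cm


# Same head length, two guidelines.
print(predicted_height(23.0, ADULT_MULTIPLIER))
print(predicted_height(23.0, TEEN_MULTIPLIER))

# Lengthening the head by 1 cm changes the adult prediction by the same
# amount no matter how large the head was to begin with.
for head in (3.0, 4.0, 10.0):
    change = predicted_height(head + 1, ADULT_MULTIPLIER) - predicted_height(head, ADULT_MULTIPLIER)
    print(head, change)
```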

  4. Think about how you might use one of the artists’ guidelines or the relationship that your group determined between height and head length to make a rough prediction of the height of the person whose skull length was recorded in Table 1.


    1. What assumptions might you make in order to make your prediction?


    2. Sample answer:

      The decedent was most likely an adult. The length of the skull is somewhat smaller than the length of the person’s head.

    3. Recall that the skull measured 230 mm in length. Predict the height of the person in cm. Describe the process you used in making your prediction.


    4. Sample answer:

      Assume that the person was an adult and that head length was 2 cm larger than the skull bone, or 25.0 cm. (The extra 2 cm leaves room for skin and soft tissue and also accounts for shrinkage of the skull due to drying.)

      Prediction: 7.5(25.0 cm) = 187.5 cm.

    5. Does your prediction result in a height that is reasonable for a person? Explain.


    6. The sample answer of 187.5 cm is about 6 ft 2 in. This is a reasonable height for a tall person.

    7. Do you think your prediction is likely to be close to the actual height of the person? Why?


    8. Sample answer:

      First, skull size was used to estimate head length. In addition, the artists’ guidelines are only rough approximations and are not exact for every individual. So the estimate is a very rough one. It may be very far from the person’s actual height.

  5. What information do you think might be helpful in determining better estimates of the heights of the deceased whose bone lengths are recorded in Table 1? How or where might you obtain this information?


  6. Sample answer:

    It would be helpful to know the relationships between bone lengths and heights. Perhaps data could be collected from the class. Perhaps an artists’ handbook would contain relationships between arm lengths and heights or leg lengths and heights. Perhaps you could get data on bones and peoples’ heights from the Internet and then use these data to determine models that predict height from lengths of bones.





Homework 1—Leg Work

   

Dr. Mildred Trotter (1899-1991), a physical anthropologist, was well known for her work in the area of height prediction based on the length of the long bones in the arms and legs.

Here is one of the relationships proposed by Dr. Trotter.

H = 2.38 F + 61.41
First formula

where H is the person’s height (in cm) and F is the length of the femur (in cm).



Figure 2. The femur (thighbone)
  1. Suppose, for most adults, femurs range in size from about 38 cm to 55 cm. According to Dr. Trotter’s formula, how tall is a person with a 38-cm femur? How tall is a person with a 55-cm femur?


  2. Predicted height for a person with a 38-cm femur: 151.85 cm or approximately 152 cm.

    Predicted height for a person with a 55-cm femur: 192.31 cm or approximately 192 cm.

  1. On graph paper, draw a set of axes similar to that shown in Figure 3.
    Figure 3. Axes for height and femur length

    Notice that the horizontal axis is scaled from around 35 cm to 60 cm (a slightly wider range than the minimum and maximum femur lengths) with tick marks every 5 units. A zigzag has been added to indicate that there is a break in this scale between 0 and 35.

    1. Draw a scale on the vertical axis that would be appropriate for data on adult heights (in cm).


    2. Refer to answer in (b).

    3. Sketch a graph of Dr. Trotter’s relationship on the set of axes you have drawn. (You may want to plot several points before drawing the graph.)


    4. Note: Students may choose a different scaling for the vertical axis.

  1. Jason’s femur measures 40 cm. His brother’s measures 41 cm. Based on Dr. Trotter’s first formula, predict the difference in the two brothers’ heights.


  2. Jason’s height: (40)(2.38) + 61.41 = 156.61

    Jason’s brother’s height: (41)(2.38) + 61.41 = 158.99

    Difference: 2.38 cm

  1. The femurs of two men differ by one centimeter. Predict the difference in their heights. Explain how you were able to determine your answer even though the lengths of the two men’s femurs were not given. In addition, tell how you could read off your answer from Dr. Trotter’s first formula.


  2. Each time you increase the length of the femur by one cm, the height changes by 2.38 cm. This difference has the same value as the multiplier of F, the slope of the linear equation.
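The calculations with Dr. Trotter’s first formula can be sketched in Python. The function name is an illustrative choice; the coefficients come from the formula H = 2.38F + 61.41.

```python
def height_from_femur(femur_cm):
    """Dr. Trotter's first formula: H = 2.38 F + 61.41 (both in cm)."""
    return 2.38 * femur_cm + 61.41


# Predicted heights at the ends of the typical adult femur range.
print(round(height_from_femur(38), 2))
print(round(height_from_femur(55), 2))

# Jason (40 cm femur) and his brother (41 cm femur).
jason = height_from_femur(40)
brother = height_from_femur(41)
print(round(jason, 2), round(brother, 2))

# The difference equals the slope, 2.38, whatever the starting femur length.
print(round(brother - jason, 2))
print(round(height_from_femur(51) - height_from_femur(50), 2))
```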

  1. Suppose that a woman is 172.7 cm (about 5 ft 8 in.) tall. Explain how you could use your graph to estimate the length of her femur. What is your estimate?


    1. Draw a horizontal line at approximately H = 172.7 cm. Find the F-coordinate that corresponds to the point where the horizontal line and the graph of Dr. Trotter’s equation intersect.


    2. Sample answer: 47.5 cm.

    3. Write an equation (based on Dr. Trotter’s first formula) that describes how you could predict the length of the femur from a person’s height.


    4. F = (H – 61.41)/2.38; femur is the dependent variable and height the independent variable.

    5. Use your equation in (b) to predict the length of a woman’s femur if the woman is 172.7 cm tall. Compare your answer to the one from (a).


    6. (172.7 – 61.41)/2.38 ≈ 46.8 cm
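Solving the first formula for F can also be checked numerically. In this sketch the function names are illustrative; the second function simply undoes the first.

```python
def height_from_femur(femur_cm):
    """Dr. Trotter's first formula: H = 2.38 F + 61.41 (both in cm)."""
    return 2.38 * femur_cm + 61.41


def femur_from_height(height_cm):
    """The first formula solved for F: F = (H - 61.41) / 2.38."""
    return (height_cm - 61.41) / 2.38


femur = femur_from_height(172.7)
print(round(femur, 1))

# Feeding the result back into the original formula recovers the height.
print(round(height_from_femur(femur), 1))
```

The computed value is slightly smaller than the 47.5 cm read from the graph, which is the kind of discrepancy to expect from a hand-drawn estimate.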

  1. Another of Dr. Trotter’s equations predicts height from the person’s tibia:
  2. Second Formula:

    H = 2.52T + 78.62,

    where H and T are measured in cm.

    1. The length of the tibia described in Table 1 was 416 mm. Using Dr. Trotter’s second formula, predict the person’s height. Is your answer a reasonable height for a person? (Recall that 2.54 cm ≈ 1 in.)


    2. H = (2.52)(41.6) + 78.62 ≈ 183.5 cm. This person is approximately 6 feet tall. This is a reasonable height for a person.

    3. Write a set of algebraic steps to solve the second formula, H = 2.52T + 78.62, for T.

      (A doctor might use such an equation to check that the length of a person’s tibia is normal for a person of that height.)

      Step 1: subtract 78.62 from both sides of the equation.

      H – 78.62 = 2.52T

      Step 2: divide both sides by 2.52.

      (H – 78.62)/2.52 = T or T = (H – 78.62)/2.52

    5. If a person is 172.7 cm tall, use your equation from (b) to predict the length of his or her tibia.


    6. Approximately 37.3 cm.

  1. In the third formula, Dr. Trotter used both the tibia and the femur to predict height:
    H = 1.30(F + T) + 63.29.

    (All measurements are in cm.)


    1. Suppose that students measure the femur and tibia of a skeleton and determine that the femur is 42 cm long and the tibia is 43 cm long. Predict the height of the person using Dr. Trotter’s third formula, H = 1.30(F + T) + 63.29.

      The predicted height is 173.8 cm.

    3. Compare the prediction in (a) with the predicted height using Dr. Trotter’s equation H = 2.38F + 61.41.


    4. The predicted height is (2.38)(42) + 61.41 ≈ 161.4 cm. This is 12.4 cm (or about 5 inches) shorter than the prediction in (a).

    5. Compare the predictions in (a) and (b) with the predicted height using Dr. Trotter’s second formula, H = 2.52T + 78.62.

      The predicted height is (2.52)(43) + 78.62 ≈ 187 cm. This is 13.2 cm larger than the prediction in (a) and 25.6 cm more than the prediction in (b).

    7. You should have found a fairly large discrepancy between your predictions in (a) - (c). One possibility is that you did not get precise measurements of the bone lengths. Suppose that a man is 175 cm tall (about 5 ft 9 in.). Based on Dr. Trotter’s equations in (b) and (c), would you expect his tibia or his femur to be longer and by how much?


    8. Solving 175 = 2.38F + 61.41 for F gives F ≈ 47.7 cm. Solving 175 = 2.52T + 78.62 for T gives T ≈ 38.2 cm. You would expect the femur to be approximately 9.5 cm longer than the tibia.

    9. Repeat part (d) for a person whose height is 160 cm.


    10. Solving 160 = 2.38F + 61.41 for F gives F ≈ 41.4 cm. Solving 160 = 2.52T + 78.62 for T gives T ≈ 32.3 cm. From these equations, it appears that a person 160 cm tall should have a longer femur than tibia.

    11. Based on Dr. Trotter’s equations, is there any evidence that indicates that you may have made faulty measurements? Explain.


    12. Based on the answers to (d) and (e), it appears that a person’s femur should be longer than his or her tibia. When the students measured the bones, they found that the femur of the skeleton was shorter than the tibia. This is the reverse of what Dr. Trotter’s equations indicate. Perhaps the students need to recheck their measurements.
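The comparison of the three formulas, and the check on the suspicious measurements, can be sketched in Python. The function names are illustrative; the coefficients are those of Dr. Trotter’s three formulas as given in this homework.

```python
def height_from_femur(f):
    """First formula: H = 2.38 F + 61.41."""
    return 2.38 * f + 61.41


def height_from_tibia(t):
    """Second formula: H = 2.52 T + 78.62."""
    return 2.52 * t + 78.62


def height_from_both(f, t):
    """Third formula: H = 1.30 (F + T) + 63.29."""
    return 1.30 * (f + t) + 63.29


def femur_from_height(h):
    """First formula solved for F."""
    return (h - 61.41) / 2.38


def tibia_from_height(h):
    """Second formula solved for T."""
    return (h - 78.62) / 2.52


# The students' measurements: femur 42 cm, tibia 43 cm.
femur, tibia = 42, 43
print(round(height_from_both(femur, tibia), 1))
print(round(height_from_femur(femur), 1))
print(round(height_from_tibia(tibia), 1))

# Bone lengths the formulas imply for people 175 cm and 160 cm tall.
for height in (175, 160):
    f = femur_from_height(height)
    t = tibia_from_height(height)
    print(height, round(f, 1), round(t, 1), round(f - t, 1))
```

For both heights the implied femur is several centimeters longer than the implied tibia, which is why a measured femur shorter than the tibia suggests a measurement problem.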

  1. Use one or more of Dr. Trotter’s equations to estimate the heights of two of the people whose bones are described in Table 1 of the preparation reading. Using her equations, do you think these bones might have belonged to at least three people? Do your calculations give you cause to change any of the assumptions that you made in Item 1(b), Activity 1? If so, which assumption(s)?


  2. Bones 1: H = 2.38(41.35) + 61.41 ≈ 159.823 cm or approximately 5 ft 3 in. (The average of the two femurs closest in length, 413.5 mm or 41.35 cm, was used to make this estimation.)

    Bones 2: H = 2.38(50.8) + 61.41 ≈ 182.314 cm or approximately 6 ft.

    Using the tibia length from Table 1, H = (2.52)(41.6) + 78.62 ≈ 183.5 cm or approximately 6 ft.

    Sample answer:

    These calculations do not appear to refute the assumption that there were only two deceased. However, based on these calculations, it appears that the 416-mm tibia might belong to Bones 2 (the taller person) rather than Bones 1.

    In Activity 1 and in this homework, you examined and interpreted equations established by artists and by a scientist. You used some of Dr. Trotter’s models to estimate the heights of Bones 1 and Bones 2 (described in the preparation reading). Dr. Trotter’s formulas may have challenged some of the assumptions that you made in Item 1(b), Activity 1. However, for the equations given in this homework, she assumed that the deceased were adult white males. If this assumption is not valid, your estimates based on Dr. Trotter’s equations may not be accurate.
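The height estimates for Bones 1 and Bones 2 can be reproduced with a short Python sketch. Because Table 1 records lengths in millimeters and Dr. Trotter’s formulas expect centimeters, the sketch converts units first. The function and variable names are illustrative assumptions.

```python
MM_PER_CM = 10
CM_PER_INCH = 2.54


def height_from_femur(femur_cm):
    """First formula: H = 2.38 F + 61.41 (cm)."""
    return 2.38 * femur_cm + 61.41


def height_from_tibia(tibia_cm):
    """Second formula: H = 2.52 T + 78.62 (cm)."""
    return 2.52 * tibia_cm + 78.62


def feet_and_inches(height_cm):
    """Convert a height in cm to (feet, inches rounded to the nearest inch)."""
    total_inches = round(height_cm / CM_PER_INCH)
    return total_inches // 12, total_inches % 12


# Table 1 lengths in mm, converted to cm before use.
bones1_femur_cm = (413 + 414) / 2 / MM_PER_CM   # average of the two similar femurs
bones2_femur_cm = 508 / MM_PER_CM
tibia_cm = 416 / MM_PER_CM

bones1 = height_from_femur(bones1_femur_cm)
bones2 = height_from_femur(bones2_femur_cm)
tibia_owner = height_from_tibia(tibia_cm)

for label, h in (("Bones 1", bones1), ("Bones 2", bones2), ("Tibia", tibia_owner)):
    print(label, round(h, 1), feet_and_inches(h))
```

The tibia-based estimate lands much closer to the Bones 2 estimate than to the Bones 1 estimate, which supports reassigning the tibia to the taller person.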





Supplemental Activity 1—Under Investigation

   

Unlike the artists’ guidelines for drawing figures, Dr. Trotter’s equation,

H = 2.38F + 61.41

(where height, H, and femur length, F, are in cm),

is not a member of the y = mx family, but instead belongs to the larger y = mx + b family. You indicate members of this family by choosing values for m and b. (What were Dr. Trotter’s choices for m and b?)

Recall that Dr. Trotter’s equation

H = 2.38 F + 61.41

was designed to work well for a particular population, adult white males. She later modified her formula by modifying the values of m and b to adjust for age, ethnic background, and gender. To make such adjustments, you will need to know how changes in m and b affect the graph. Complete the following investigation to find out what happens when you make changes to m and b.

Because there are two quantities to change, m and b, it may help to divide the investigation into two parts, as described below.

PART I: KEEP m THE SAME AND CHANGE b.

  1. Choose a value for m and one for b. What is your equation?
  2. Graph your equation.
  3. Choose several other values for b. What equations correspond to these choices?
  4. Graph several of the equations from (3) and the equation from (2) in the same window.

PART II: KEEP b THE SAME AND CHANGE m.

Repeat Part I, reversing the roles of m and b.

  1. Use your graphing calculator to investigate how changing the values of m and b affects the graph of a member of the y = mx + b family.


    1. How does changing the value of b affect the graph of a member of the y = mx + b family? Illustrate using several examples. Continue experimenting with choices for b until you know what b controls on the graph.


    2. Changing the value of b moves the line up (if b is increased) or down (if b is decreased). In addition, the line will cross the y-axis at b.

    3. How does changing the value of m affect the graph of a member of the y = mx + b family? Illustrate using several examples. Continue experimenting with choices for m until you know what m controls on the graph.


    4. The slope, m, determines how steeply the line tilts and (depending on whether m is positive or negative) whether the line tilts upward or downward as you look along the graph from left to right.

    5. The numbers m and b are called the slope and y-intercept, respectively. Do you think slope and y-intercept are descriptive names for m and b? Why?


    6. The value of b determines where the line crosses the y-axis. So, y-intercept is a descriptive name. The value of m determines the steepness of the line or how much it slopes.

      By changing your window settings, you can affect the appearance of a line described by a member of the y = mx + b family without changing the values of m or b. At times, you may want to adjust your window settings to display your graph more effectively. However, you should also be aware that some people, driven by an interest in distorting the truth, will tinker with their window settings until they achieve a graph that satisfies their purpose. Your understanding of how scale change affects the appearance of the line will help you interpret graphs correctly and avoid being misled by their distortions. The next investigation will help you learn the effects on a graph of changing the maximum settings for the horizontal or vertical axis.

  1. In Homework 1, you drew a graph of Dr. Trotter’s equation by hand. Now you will reproduce your hand-drawn graph using a graphing calculator.


    1. Set the viewing window on your calculator to match the scalings on the axes of your hand-drawn graph from Item 2, Homework 1. (For example, set Xmin = 35, Xmax = 60, Xscl = 5. The y-settings will depend on your choice of scale for the vertical axis.) Enter Dr. Trotter’s equation into your calculator and then graph the equation. How does your calculator-produced graph compare with your hand-drawn graph?


    2. Sample answers:

      Hand-drawn graph (left) and calculator produced graph (right) with the same scale settings.



    3. Experiment with changing the scale on the vertical axis by first increasing the value of Ymax and then decreasing the value of Ymax. How would you change the value of Ymax to make the graph of Dr. Trotter’s equation appear very steep? How would you change the value of Ymax to make the graph appear much flatter?


    4. In each case the value of Ymin = 140. Then Ymax was changed from 200 to 300 and then to 160. The appearance of the graph of Dr. Trotter’s formula changed as indicated below.

      The graph appeared flatter when the value of Ymax was increased and became steeper when the value of Ymax was decreased.


    5. Without actually changing the scaling on the horizontal axis, predict what would happen to the appearance of the graph if you changed the value of Xmax from 60 to 120. Why do you think your graph will change as you predicted? Finally, check your prediction by changing the Xmax setting from 60 to 120.


    6. Display 1

      Display 2





      The settings for Display 1 were Xmin = 35, Xmax = 60, Ymin = 140, Ymax = 200. For Display 2, Xmax has been changed to 120.

      For each Display, the equation of the line is the same and each time the x value changes by 1 unit the y-value will change by 2.38 units. However, in Display 2, the distance on the horizontal axis representing 1 unit is smaller than in Display 1 since more units must fit on the same display screen. This makes the line appear steeper.
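The effect of window settings on how steep a line looks can be estimated numerically. The sketch below measures the slope of the drawn line in screen pixels rather than in data units. The screen size of 94 by 62 pixels is an assumed value chosen only for illustration, and the function name is hypothetical.

```python
def apparent_slope(m, xmin, xmax, ymin, ymax, width_px=94, height_px=62):
    """Slope of the drawn line in pixels of rise per pixel of run.

    width_px and height_px are assumed screen dimensions, not values
    taken from the unit.
    """
    px_per_x_unit = width_px / (xmax - xmin)
    px_per_y_unit = height_px / (ymax - ymin)
    return m * px_per_y_unit / px_per_x_unit


M = 2.38  # slope of Dr. Trotter's equation

display1 = apparent_slope(M, 35, 60, 140, 200)     # original window
display2 = apparent_slope(M, 35, 120, 140, 200)    # Xmax raised to 120
ymax_300 = apparent_slope(M, 35, 60, 140, 300)     # Ymax raised to 300
ymax_160 = apparent_slope(M, 35, 60, 140, 160)     # Ymax lowered to 160

print(round(display1, 2), round(display2, 2))
print(round(ymax_300, 2), round(ymax_160, 2))
```

The equation never changes, yet the on-screen slope grows when Xmax is increased or Ymax is decreased, and shrinks when Ymax is increased.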

  1. Answer the following questions without graphing the equations.


    1. Which graph is steeper, the graph of y = 3.48x + 20 or y = 5.78x + 5? How do you know?


    2. The graph of y = 5.78x + 5 is steeper because 5.78 is larger than 3.48. Every time the x-value increases by one unit, the y-value on the graph of y = 5.78x + 5 increases by 5.78 units, compared to only 3.48 units on the graph of y = 3.48x + 20.

    3. Which graph crosses the y-axis at 30, the graph of y = 30x + 15 or y = 15x + 30? How do you know?


    4. The graph of y = 15x + 30. The value for b is 30.

    5. Which graph slants downward as the x values increase: y = (1/2)x + 15 or y = –2x + 5?

      The graph of y = –2x + 5.

      In this activity you discovered how modifying a member of the y = mx + b family by changing the value of m or b affects its graph. You also discovered that rescaling can change your perception of how steeply a line rises or falls, even though you are graphing the same equation. This understanding will come in handy when you want to select members of the y = mx + b family to describe patterns in data.
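The roles of m and b can be summarized in a small Python sketch that answers the three questions above directly from the coefficients, without graphing. The function name and the returned labels are illustrative choices.

```python
def describe_line(m, b):
    """Summarize the graph of y = m x + b from its coefficients alone."""
    if m > 0:
        direction = "rises"
    elif m < 0:
        direction = "falls"
    else:
        direction = "is horizontal"
    return {"steepness": abs(m), "y_intercept": b, "direction": direction}


# Which graph is steeper?
print(describe_line(3.48, 20)["steepness"], describe_line(5.78, 5)["steepness"])

# Which graph crosses the y-axis at 30?
print(describe_line(30, 15)["y_intercept"], describe_line(15, 30)["y_intercept"])

# Which graph slants downward as x increases?
print(describe_line(0.5, 15)["direction"], describe_line(-2, 5)["direction"])
```

Steepness is compared using the absolute value of m, so a line with slope –2 counts as steeper than a line with slope 1/2 even though it falls rather than rises.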





Activity 2—Measuring Up

   

 = 5, 6, 7, 8

Your analysis in Activity 1 and Homework 1 has included parts of a process known as mathematical modeling. The process begins when you identify a problem for which you need an answer or a situation that requires further understanding.

For example, during World War II, the armed services sometimes had problems identifying the remains of dead soldiers. Dr. Mildred Trotter was asked to help. She wondered if there were relationships between the height of a person and the lengths of his long bones.

Having posed this question, Dr. Trotter’s next step was to collect relevant data. She needed measurements of people’s heights and the lengths of their long bones. Her model

H = 2.38 F + 61.41

where femur length is in cm

expresses the relationship she observed between the height measurements and femur-length measurements from her data.

Because it depends on data, the model is only as good as the quality of the data on which it is based. Dr. Trotter took special care to check that her data were collected by people who followed detailed instructions for taking the measurements. In this way, she was able to keep to a minimum the data variability that was due to the measurement process.

Dr. Trotter used lengths of long skeletal bones to predict height. You can’t directly measure the bones in your body. Instead, in this activity, you will design methods for collecting data on students’ heights and the lengths of their forearms. Later you will develop a model to predict classmates’ heights using the lengths of their forearms.

Before you collect your data (height and forearm length from each student in your class), you need to establish a method for taking the measurements. Remember, the worth of your model will depend on the quality of the data that you collect. Everyone who will be doing the measuring must use the same method and then record their data to the same degree of precision (for example, to the nearest eighth of an inch or to the nearest millimeter).

  1. With members of your group, discuss methods for measuring:


    1. the heights of students and
    2. the lengths of their forearms.


    3. Sample answer:

      For height: Stack two meter sticks vertically and tape them to the wall. Have the student to be measured stand in front of the meter sticks. The student should take his or her shoes off and stand up straight. Place a cardboard on top of the student’s head. Hold the cardboard so that it is parallel to the floor. Read the spot where the cardboard touches the meter stick. Record height to the nearest millimeter.

      For forearm: Have student hold arm flat on a desk with his or her hand placed palm down. Measure along outside of the arm from the point on the elbow to the top of the knobby bone at the wrist. Measure to the nearest millimeter.

  1. Test your methods as follows:


    1. Have two different students measure the height of the same person following your method. Are both height measurements roughly the same? Are they recorded to the same degree of precision? If not, modify your method and test it again. Keep modifying your method until there is only a small amount of variation in the measurements taken.
    2. Repeat part a), but this time measure forearm length.
  1. Discuss various groups’ methods for measuring height and forearm length. Then select one method. Write a brief description of the method that the class will use to collect the data.
  2. Measure classmates’ forearms and heights. Record your results on Handout 1, Class Data Recording Sheet. Leave the last column blank. (You will collect more data from your class later.) Be sure to record the units you used for height and forearm length at the tops of those columns.

Note: Save your data for use in Activities 3 and 6.





Homework 2—Follow in My Footsteps

   

Sometimes all that’s left at a crime scene is a few footprints. However, the length of a person’s stride is also related to the person’s height. Now you will develop a method for measuring a person’s stride. Later you will gather these data and then use your measurements in a model to predict height.

To collect reliable data, you need to carefully plan the method you will use to collect the data. Remember, your model will be only as good as the data on which it is based.

Design a method for measuring the length of a person’s stride.

Here are some items to consider.

How will the person walk? Do you plan to measure from heel to heel or heel to toe? Since step lengths for the same person can vary, does it make sense to have the person take more than one step and average the results? If so, how many steps should the person take?

Determine the measurement instrument (e.g., ruler, tape measure, meter stick) you will use to make the measurement.

Specify the precision of the measurement.

After you have decided on your method, test your method as you did the methods for measuring height and forearm length.

When you are satisfied with your method, describe it with a set of written instructions. Give your instructions to a friend to see if someone else understands what you mean. If necessary, revise your instructions. Save them until your class is ready to collect the stride-length data needed later in this unit.

Sample answer:

A tape measure with metric reading will be used for measuring. Measurements will be recorded to the nearest tenth of a centimeter.

Setup: Mark a line about 15 feet long with adhesive tape. Mark the starting position with another piece of tape.

Have the person put his or her heels at the edge of the starting position and then tell the person to take four steps along the marked line. Measure from the starting point to the back of the heel after the person has stopped. Divide by four to get the stride length.





Supplemental Activity 2—Line Up

   

This activity gives you an opportunity to practice determining an equation of a line from its graph.

Figure 4. Graphs of four lines

  1. The line corresponding to y = (1/2)x + 1 has already been labeled with its equation. Recall that the value multiplying x, in this case 1/2, is called the slope of the line.


    1. For this line, what is the value of y when x has value 0? How can you read this information from the equation?


    2. y = 1 when x = 0. This is the y-intercept, the value of b.

    3. Suppose you change the value of x by 2 units, Δx = 2. What is the value of Δy?
    4. When Δx = 2, Δy = 1.

    5. What is the value of Δy/Δx? How is this ratio related to the equation of this line?


    6. Δy/Δx = 1/2; this ratio is the same as the slope.

  1. Next, look at the line corresponding to y = –2x + 1.


    1. Suppose you change the value of x by 3 units so that Δx = 3. What is the value of Δy?


    2. When Δx = 3, Δy = –6.

    3. What is the value of Δy/Δx? How is this ratio related to the equation of this line?


    4. Δy/Δx = –2; this ratio is the same as the slope.

    5. What is the slope of Line B? What is its equation?


    6. Δy = 1 when Δx = 4; slope = Δy/Δx = 1/4; y = (1/4)x.

  1. Find an equation describing Line A.
  2. y = (1/4)x + 3.

  1. How are lines A and B alike? How are they different? How are the equations describing Lines A and B alike? How are they different?


  2. Lines A and B have the same steepness and they are parallel; they cross the y-axis at different locations. The slopes for the two lines are equal, m = 1/4; the y-intercepts are different, b = 3 for Line A and b = 0 for Line B.
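The comparison of Lines A and B can also be checked numerically. The sketch below is only an illustration: the function name `line_through` is hypothetical, and the two points used for each line are assumptions chosen to agree with the equations y = (1/4)x + 3 and y = (1/4)x given in the answers, since the graph in Figure 4 is not reproduced here.

```python
from fractions import Fraction

def line_through(p, q):
    """Return (slope, y_intercept) of the line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    dx = x2 - x1          # change in x
    dy = y2 - y1          # change in y
    if dx == 0:
        raise ValueError("vertical line: slope is undefined")
    m = Fraction(dy, dx)  # slope = (change in y) / (change in x)
    b = y1 - m * x1       # solve y1 = m*x1 + b for b
    return m, b

# Assumed points, consistent with the equations in the answers above.
line_a = line_through((0, 3), (4, 4))
line_b = line_through((0, 0), (4, 1))

print("Line A: slope", line_a[0], "y-intercept", line_a[1])
print("Line B: slope", line_b[0], "y-intercept", line_b[1])
print("Same slope (parallel):", line_a[0] == line_b[0])
```

Using exact fractions keeps the slope as 1/4 rather than a rounded decimal, which matches the way the answers are written.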

  1. "Understory" trees are the short trees among much taller trees in a forest or jungle. Their growth is stunted because of the thick vegetation above them. Although understory trees are shorter than other trees, their crowns can be very wide. Biologists studied two species of understory trees and recorded their measurements in the scatter plot shown in Figure 5, Display 1. To sharpen the relationship between height and width, they drew lines that they thought described the general pattern of the data for each species of tree. (See Figure 5, Display 2.)
  2. Figure 5. Understory trees in a forest

    1. For each species, predict the crown width when the tree height is 4 meters.


    2. Species A: approximately 2.6 m; Species B: approximately 1.5 m.

    3. For each species, predict the tree height when the crown width is 2 meters.


    4. Species A: 3 m; Species B: 6 m.

      The two lines in Display 2 are examples of straight-line relationships between two variables. In this case the variables are tree height and crown width. The official name for such relationships is linear relationships, and the equations that describe these relationships are called linear equations.

      In the section "Let the Bones Speak," you studied linear relationships between bone length and height. Dr. Trotter’s equations are examples of linear equations relating bone length and height variables.

    5. Which of the two lines in Display 2 can be described by a linear equation from the y = mx family? How can you tell? What is the value for m (approximately)? How did you determine m’s value?


    6. The line for species A. The line goes through the origin. The approximate value for m is 2/3. We found this value for m by starting at the origin. Then we got to another point on the line by moving up 2 meters and across 3 meters.

    7. The other line can be described by a linear equation from the y = mx + b family. (The value of b will not be 0 for this line.) Determine an equation for this line.


    8. m ≈ 2/7. We started at the point x = 2.5 and y = 1. We moved 1 unit up and 3.5 units across to get to another point on the line. So the slope is approximately 1/3.5 = 2/7 ≈ 0.29.

      When x = 0, y ≈ 0.25. This gives you the value of b.

      The equation for Species B is y = (2/7)x + (1/4) or approximately y = 0.29x + 0.25.

    9. In your equation for Species B, what does m mean in this context? What does b mean?


    10. m = 2/7 means that the crown width changes, on average, by 2/7 meter each time tree height increases by 1 m. Theoretically, the value of b indicates that trees that are 0 meters tall will have crown widths of approximately 0.25 m. This is nonsense. The value of b does not have meaning relevant to this context. It may suggest that, in the early stages of growth when the height is near zero, the crown grows rapidly.

      Up to this point, most of the linear equations you have worked with have been written in slope-intercept form, meaning as members of the y = mx + b family. Suppose you choose b = 3. Then, all members of the y = mx + 3 family will pass through the point (0,3). (Why?) Or suppose that you wanted your lines to pass through the point (4,3). What equations would you use to describe these lines? The key to the answer is contained in the next question.

  1. Check that the following lines all pass through the point (4,3). Two of the equations specify the same line. Which two?

    1. y – 3 = 2(x – 4)
    2. y = 5(x – 4) + 3
    3. y – 3 = –2(x – 4)
    4. y = 2(x – 4) + 3
    5. y = m(x – 4) + 3 (Even though you don’t know the value of m, you can still check that y has value 3 when x has value 4.)

    The two lines that are the same have equations y – 3 = 2(x – 4) and y = 2(x – 4) + 3.

  1. A scatter plot is shown in Figure 6. Its center is marked with an X at (10,20) and a line is drawn through the X in the general direction of the pattern of dots.
  2. Figure 6. Data with X at center

    1. Explain why the following statement is true: The graph of any linear equation from the y = m(x – 10) + 20 family will pass through the X in this scatter plot.


    2. When x = 10, then y = m(10 – 10) + 20 = 20.

    3. Approximately what is the slope of the line? Explain how you arrived at your answer.


    4. Sample answer: Select two points on the line as far apart as possible. For example, when x = 0, y ≈ 15 and when x = 20, y ≈ 24. This gives a slope of (24 – 15)/20 = 0.45.

    5. Determine an equation for the line.


    6. Sample answer:

      y = 0.45(x – 10) + 20.

    7. Find an equation from the y = mx + b family that is algebraically equivalent to your equation in part (c).


    8. Sample answer:

      y = 0.45x + 15.5.

  1. Here is a general equation for a line: y – k = m(x – h). x and y are variables; m, h, and k are constants.
    1. What letter matches the slope of this line?


    2. m is the slope.

    3. What two letters tell you the coordinates of a point on this line?


    4. When x = h, y = k; (h,k) is a point on the line.

    5. This form of linear equation is called the point-slope form. Why is this a good name?


    6. It’s a good name because, when you know the slope and a point on the line, you know the line’s equation.
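The conversion from point-slope form to slope-intercept form can be sketched in a few lines. This is an illustration only; the function name `point_slope_to_slope_intercept` is hypothetical, and the sample values (slope 0.45 through the point (10, 20)) are taken from the scatter-plot item above.

```python
def point_slope_to_slope_intercept(m, h, k):
    """Rewrite y - k = m(x - h) as y = m*x + b and return (m, b)."""
    b = k - m * h   # expand m(x - h) + k and collect the constant terms
    return m, b

# Line with slope 0.45 through the point (10, 20).
m, b = point_slope_to_slope_intercept(0.45, 10, 20)
print(f"y = {m}x + {b}")

# Check that the point (h, k) really lies on the slope-intercept line.
print("Passes through (10, 20):", abs(m * 10 + b - 20) < 1e-9)
```

The only arithmetic involved is b = k – mh, which is why knowing the slope and one point is enough to pin down the whole line.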





Activity 3—I Predict That

   

The data in Table 3 show the forearm lengths and heights of students in a tenth-grade class.

Female                                          Male
Name        Forearm length (cm)  Height (cm)    Name    Forearm length (cm)  Height (cm)
Alice       24                   157            Allan   26.5                 173
Bia         24.5                 166            Brian   27                   177
Christi     27                   164            Daniel  27                   174
Chantalle   24                   164            Davis   31                   192
Coral-Anne  23                   161            Eric    28                   172
Jennifer    27.5                 164            Kevin   29                   180
Ji-Hyun     27                   167            Lenny   27                   174
Kim         26                   162            Larry   28                   175
Kristen     26                   175            Mike    32                   185
Nancy       28.5                 166            Neil    30                   185
Tanner      26.5                 172            Rob     30                   178
Teresa      25.5                 176

Table 3. Height-forearm data from Class A

  1. Look over the data from Class A. By how much do the heights vary from the shortest student to the tallest?


  2. The heights of the students vary from 157 cm to 192 cm for a total difference of 35 cm.

    Recall the general principle that data can be examined graphically. A graphic representation of the height data might help you to assess the amount of variation in student height. Follow the instructions in Item 2 to construct your own dot plot.


  1. On a piece of graph paper, draw a number line that includes the heights in Figure 7. To make a dot plot, place a dot above each number that corresponds to a student’s height. If two heights are the same, place one of the dots directly above the other. Dots representing Alice’s, Bia’s, Christi’s, and Chantalle’s heights have already been marked. Complete the dot plot for the remaining students.
  2. Figure 7. Partial dot plot of height data from Class A


  1. Suppose another tenth grader joined Class A. Would it be reasonable to predict that the tenth grader would be between 164 cm and 180 cm tall? Explain your answer.


  2. Two sample answers follow. The first answer is more in the spirit of prediction.

    Sample answer #1: There were only three people in Class A with heights below 164 cm and three above 180 cm. The heights of 17 of the 23 students fall within the 164- to 180-cm range. So, if heights of students in Class A are representative of tenth-grade students in general, chances are pretty good that this prediction will be correct. Even if the prediction is wrong, it will be off by at most 12 cm (provided tenth-grade students, in general, have heights similar to those of the students in Class A).

    Sample answer #2: This prediction could be wrong. What if the tenth grader is only 157 cm?
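The counts in sample answer #1 can be checked directly against Table 3. The following is a minimal sketch; the variable names are illustrative, and the list holds the 23 heights from Table 3 (girls first, then boys).

```python
# Heights (cm) of the 23 students in Class A, from Table 3.
heights_cm = [157, 166, 164, 164, 161, 164, 167, 162, 175, 166, 172, 176,
              173, 177, 174, 192, 172, 180, 174, 175, 185, 185, 178]

low, high = 164, 180  # the predicted range

inside = [h for h in heights_cm if low <= h <= high]
below = [h for h in heights_cm if h < low]
above = [h for h in heights_cm if h > high]

# Farthest an observed height falls outside the predicted range.
worst_miss = max(low - min(heights_cm), max(heights_cm) - high)

print(len(inside), "of", len(heights_cm), "heights are inside the range")
print(len(below), "below,", len(above), "above")
print("Largest miss:", worst_miss, "cm")
```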

  1. One way to predict a new student’s height is to take the average of all the heights.


    1. Find the average of all the students’ heights. Mark the average with an "X" on your dot plot.


    2. The average height is approximately 172.1 cm.



    3. Suppose the new student is as short as the shortest student in Class A. How far off was the prediction in part (a)?


    4. If the new student is 157 cm, the prediction will be off by 15.1 cm.

    5. Suppose the new student is as tall as the tallest student in Class A. How far off was the prediction in part (a)?


    6. If the new student is 192 cm, the prediction is off by 19.9 cm.

    7. Do you think taking the average helped to make a good prediction? Explain. Can you suggest a better one?


    8. Sample answer #1: The prediction in part (a) is near the middle of the height data. If the unknown student’s height lies somewhere in the middle of the heights of students in Class A, this prediction will be very close to the actual height of the student. If the student’s height is as small as the smallest student in Class A or as tall as the tallest student, this prediction will be off by at most 19.9 cm. So this prediction seems reasonable.

      Sample answer #2: Perhaps the median height, 173 cm, might be better.

      Sample answer #3: Maybe she should use 174.5 cm, midway between the shortest and tallest student’s heights. Then her prediction would be off by no more than 17.5 cm.
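The three candidate predictions in the sample answers (mean, median, and the value midway between the extremes) can be compared by their worst-case error. This is a sketch under the assumption that the new student’s height falls somewhere between the shortest and tallest heights in Table 3; the helper name `worst_error` is hypothetical.

```python
from statistics import mean, median

# Heights (cm) of the 23 students in Class A, from Table 3.
heights_cm = [157, 166, 164, 164, 161, 164, 167, 162, 175, 166, 172, 176,
              173, 177, 174, 192, 172, 180, 174, 175, 185, 185, 178]

def worst_error(prediction, data):
    """Largest distance between the prediction and any observed value."""
    return max(abs(prediction - x) for x in data)

candidates = {
    "mean": mean(heights_cm),
    "median": median(heights_cm),
    "midrange": (min(heights_cm) + max(heights_cm)) / 2,
}

for name, value in candidates.items():
    print(f"{name}: {value:.1f} cm, worst-case error {worst_error(value, heights_cm):.1f} cm")
```

The midrange gives the smallest worst-case error by construction, but it depends only on the two most extreme students, so it says nothing about where most of the heights lie.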

  1. Notice that the data separate into two groups, one to the left of 170 and the other to the right. Do you think this separation shows the split in height by gender? Figure 8 shows number lines for use with two dot plots, one showing only girls’ heights and the other showing only boys’ heights. They use the same scale.
  2. Figure 8. Number lines for comparative dot plots


    1. On your own paper, draw these number lines and include the data for the two dot plots.
    2. What do these dot plots tell you about the heights of the Class-A tenth-grade girls and boys? Do girls or boys tend to be shorter in Class A?


    3. The girls’ heights are between 157 cm and 176 cm; the boys’, between 172 cm and 192 cm. Even though a few of the girls in Class A are taller than several of the boys, the boys in Class A tend to be taller than the girls.

  1. Suppose the new student’s name is Malisa. Realizing that the new student is a girl may change your prediction.


    1. Find the average for girls’ heights in Class A to predict Malisa’s height. If Malisa is as short as the shortest girl in Class A, how far off is your prediction? What if Malisa is as tall as the tallest girl?


    2. Using the average of the girls’ heights, the predicted height is 166.2 cm. If Malisa is 157 cm, this prediction is off by 9.2 cm; if she is 176 cm, this prediction is off by 9.8 cm.

    3. Do you think this is a better prediction than the prediction made in Item 4? Explain.


    4. The prediction from Item 4, 172.1 cm, is probably too high if you know that the student is a girl. (Only three girls are at least 172 cm tall.) If Malisa’s height is in the middle of the girls’ heights for students in Class A, using the prediction of 166.2 cm will be off by no more than 9.8 cm. On the other hand, the prediction of 172.1 cm may be off by as much as 15.1 cm.

  1. Suppose the new student turns out to be Martin (a boy), not Malisa.


    1. Choose a method for predicting Martin’s height. Give your prediction and describe your method.


    2. Sample answer:

      Possible predictions: Average of boys’ heights is approximately 178.6 cm; median of boys’ heights is 177 cm. Some students may just eyeball what seems to be a typical boy’s height.

    3. If Martin’s height is somewhere between that of the shortest boy and the tallest boy, what is the largest possible error that could have resulted from your prediction?


    4. This answer depends on the answer to part a. Sample answer: If the prediction is 178.6 cm, it could be off by at most 13.4 cm.

    Using the height data from Class A, you have computed at least two and possibly three different averages: an average of all the data, an average for the girls, and an average for the boys.

    The term mean is another name for average. For the remaining items, when you are asked to calculate the mean, just find the sum of the data and then divide by the number of data points. It’s no different from calculating an average.
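The three averages mentioned above can be computed exactly as described: add the data and divide by the number of data points. The sketch below uses the heights from Table 3; the function name `mean` and the list names are illustrative.

```python
# Heights (cm) from Table 3, split by group.
girls_cm = [157, 166, 164, 164, 161, 164, 167, 162, 175, 166, 172, 176]
boys_cm = [173, 177, 174, 192, 172, 180, 174, 175, 185, 185, 178]

def mean(values):
    """Sum of the data divided by the number of data points."""
    return sum(values) / len(values)

mean_all = mean(girls_cm + boys_cm)
mean_girls = mean(girls_cm)
mean_boys = mean(boys_cm)

print(f"All students: {mean_all:.1f} cm")
print(f"Girls:        {mean_girls:.1f} cm")
print(f"Boys:         {mean_boys:.1f} cm")
```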

    If your data have been entered into one of your calculator’s lists, you can use a built-in calculator command to compute the mean. However, you will need to know some mathematical shorthand to understand what your calculator is telling you. (See Table 4.)


    Shorthand Notation    Meaning
    Σx                    The sum of the data
    n                     The number of data
    x̄                     The mean, the sum of the data divided by the number of data

    Table 4. Table of shorthand

  1. For example, suppose you want the mean height of a small group of students from Class A. After you enter the data into your calculator and press a few keys, the screen in Figure 9 appears on your calculator.
  2. Figure 9. One-variable statistics screen

    1. What is the sum of these data?


    2. 972

    3. How many people were in this small group?


    4. 6 people

    5. What was the mean height for the people in this group?


    6. 162 cm

  1. Now apply the ideas of this activity to your class. Measure and compare the heights of students in your class to the heights of students from Class A.


    1. Make two dot plots for your class data similar to the ones that you made for Item 5 (a), one for the boys’ heights and one for the girls’ heights.


    2. Answer depends on your class data.

    3. Enter the boys’ heights and girls’ heights into separate lists in your calculator. What is the mean height for the boys? What is the mean height for the girls?


    4. Answer depends on your class data.

    5. Based on your dot plots and the means of boys’ heights and girls’ heights, do the boys in your class tend to be shorter or taller than the girls?


    6. Most likely, the boys will tend to be taller than the girls.

    7. Compare the data from your class to the data from Class A. Describe the similarities and differences between the two data sets.


    8. Answer depends on your class data. If your students are older than tenth-grade students, the boys in your class may be taller than the boys in Class A. Perhaps the girls’ heights will be similar to the girls in Class A.

    If you find it helpful, you may use your calculator’s built-in statistical capabilities to calculate the means in the remaining problems.

  1. A researcher gathered data on the number of gray hairs on the heads of 25-year-olds. These are the data she found.
    0   23  45  6   8   9   33  15  0   2
    4   10  12  13  34  67  40  38  27  25
    0   13  34  23  56  34  7   789 44  6
    4   0   31  22  5   16  17  11  2   1


    1. Represent these data in a dot plot. (How do you plan to deal with the largest data point?) Then use your dot plot to help you list your data from smallest to largest.
    2. (Students may choose simply to list the largest data point rather than to put a break in the scale.)

      Ordered data: 0, 0, 0, 0, 1, 2, 2, 4, 4, 5, 6, 6, 7, 8, 9, 10, 11, 12, 13, 13, 15, 16, 17, 22, 23, 23, 25, 27, 31, 33, 34, 34, 34, 38, 40, 44, 45, 56, 67, 789

    3. Take the smallest ten numbers and calculate the mean (the average) of these ten data. Then take the largest ten numbers and calculate the mean of these ten data. Which of the two means is a better predictor for the number of gray hairs on the head of a random 25-year-old? Justify your answer.


    4. Mean of smallest 10 data points: (0 + 0 + 0 + 0 + 1 + 2 + 2 + 4 + 4 + 5)/10 = 1.8

      Mean of largest 10 data points: (34 + 34 + 34 + 38 + 40 + 44 + 45 + 56 + 67 + 789)/10 = 118.1

      The first mean, 1.8, appears to be a better prediction than the second mean, 118.1. At least 1.8 is close to several data points, while 118.1 is not near any of the data.

    5. Next, calculate the mean using all the data.


    6. Mean of all data: 1526/40 ≈ 38.2

    7. Statisticians use the term outlier when referring to data much larger or smaller than the rest of the data. How does the outlier in this data set, 789, affect the mean? To find out, calculate the mean again, this time leaving out the outlier.


    8. Mean of data with outlier removed: 737/39 ≈ 18.9

      When 789 was removed, the mean went down considerably. So, the outlier inflated the prediction.

    9. You have calculated four means, two for part (b) and one each in parts (c) and (d). Which one of these means do you think is the better predictor of the number of gray hairs on 25-year-olds? Why?


    10. Prediction: 18.9 (see figure below)

      When predicting the number of gray hairs, if the actual number of gray hairs is very low, using 1.8 would be a more accurate estimate. However, if the actual number is 780, using 118 would be the more accurate estimate. But you won’t know the actual number of gray hairs before you make your prediction; therefore, use 18.9 as your prediction because it is closer to more of the data than the other predictions.
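The four means in this item can be reproduced with a short script. This is a sketch for checking the arithmetic; the variable names are illustrative, and the data are the 40 gray-hair counts listed in the item.

```python
# Number of gray hairs on 40 25-year-olds, as listed in the item.
gray_hairs = [0, 23, 45, 6, 8, 9, 33, 15, 0, 2,
              4, 10, 12, 13, 34, 67, 40, 38, 27, 25,
              0, 13, 34, 23, 56, 34, 7, 789, 44, 6,
              4, 0, 31, 22, 5, 16, 17, 11, 2, 1]

def mean(values):
    return sum(values) / len(values)

ordered = sorted(gray_hairs)

mean_smallest_10 = mean(ordered[:10])     # ten smallest values
mean_largest_10 = mean(ordered[-10:])     # ten largest values, outlier included
mean_all = mean(ordered)                  # all 40 values
mean_no_outlier = mean(ordered[:-1])      # drop the single largest value (789)

print(f"Smallest ten:    {mean_smallest_10:.1f}")
print(f"Largest ten:     {mean_largest_10:.1f}")
print(f"All data:        {mean_all:.1f}")
print(f"Outlier removed: {mean_no_outlier:.1f}")
```

Removing one value out of 40 roughly halves the mean, which shows how strongly a single outlier can pull the average.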

  1. When the researcher (from Item 10) gathered data on the number of gray hairs on the heads of 20-year-olds, the data looked quite different from those for the 25-year-olds. Her data are displayed in the dot plot in Figure 10.
  2. Figure 10. Number of gray hairs on the heads of 20-year-olds

    1. Suppose a 20-year-old student teacher will be visiting your class tomorrow. Predict the number of gray hairs on the student teacher’s head.


    2. The mean is 2.6. Prediction: approximately 3 gray hairs.

    3. If you had bet money on your prediction, would you prefer to predict the number of gray hairs on the head of a 25-year-old or on a 20-year-old? Why?


    4. Because there is less variability in the data for the 20-year-olds, you are more likely to predict a number that is very close to the actual number.

If the data have a lot of variability (in other words, the data are very spread out), it is difficult to make precise predictions. If, instead, the variability in the data is small, so that the data are very concentrated, it is much easier to make fairly precise predictions.





Homework 3—Exercising Judgment

   

Each of the items in this assignment provides an opportunity to compare data from two groups. When you make comparisons to analyze data, use what you have learned from Activity 3 as well as common sense.

  1. Table 5 lists the weights of babies at birth for two groups of babies. The first group was babies whose mothers never smoked. The second group was babies whose mothers smoked at least ten cigarettes per day. From these data, does it appear that smoking has an influence on a baby’s birthweight? Explain your answer.


  2. Never smoked

    6.3 7.3 8.2 7.1 7.8 9.7 6.1 9.6 7.4 7.8 9.4 7.6

    Smoked ten or more cigarettes per day

    6.3 6.4 4.2 9.4 7.1 5.9 6.8 8.2 7.8 5.9 5.4 6.3

    Table 5. Babies’ birthweights

    Comparative dot plots appear below.

    Never Smoked: Average weight—approximately 7.9 pounds

    Smoked: Average weight—approximately 6.6 pounds

    Mothers who smoked had, on average, babies that weighed less than mothers who did not smoke.

    The mothers of the four babies who weighed less than 6 pounds all smoked. None of the babies whose mothers did not smoke weighed less than 6 pounds.
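The comparison of the two groups in Table 5 can be checked numerically. The sketch below assumes the weights are in pounds, as the sample answer states; the variable names are illustrative.

```python
# Birthweights (pounds) from Table 5.
never_smoked = [6.3, 7.3, 8.2, 7.1, 7.8, 9.7, 6.1, 9.6, 7.4, 7.8, 9.4, 7.6]
smoked = [6.3, 6.4, 4.2, 9.4, 7.1, 5.9, 6.8, 8.2, 7.8, 5.9, 5.4, 6.3]

def mean(values):
    return sum(values) / len(values)

# Count the babies in each group who weighed less than 6 pounds.
under_6_never = sum(1 for w in never_smoked if w < 6)
under_6_smoked = sum(1 for w in smoked if w < 6)

print(f"Never smoked: mean {mean(never_smoked):.1f} lb, {under_6_never} under 6 lb")
print(f"Smoked:       mean {mean(smoked):.1f} lb, {under_6_smoked} under 6 lb")
```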

  1. Two groups of high school students were asked how much they typically spend on entertainment. The first group consisted of 12 students who did not exercise; students in the second group exercised at least twice a week. The results of the survey are displayed in Table 6.


  2. Does not exercise

    10 5 20 4 20 20 15 0 8 40 8 15

    Does exercise

    15 15 15 5 10 5 5 6 30 25 30 60

    Table 6. Cost of entertainment (dollars)

    1. From the data in Table 6, make two dot plots using the same scaling on each. Place one dot plot directly above the other.


    2. The comparative dot plots appear below.

    3. What can you learn from your dot plots?


    4. The data for the "exercise" group appear shifted slightly right of the data for the "does not exercise" group. This would indicate that the "exercise" group tends to spend more on entertainment than the "does not exercise" group. The shift is more noticeable if you ignore the largest value in each data set. Then the pattern for the "does not exercise" group spans from $0 to $20 while the pattern for the "exercise" group spans from $5 to $30.

    5. Predict the amount spent on entertainment by a person who exercises. Explain why the average amount spent by the "exercise" group might not be a good choice for your prediction.


    6. The average of these data is approximately $18.42. However, one person in this group spent $60. If the $60 outlier is removed, the average is only about $14.64. A prediction of $14.64 looks more central to the remaining data.

    7. Complete the following sentences: "I predict that a person from the "does not exercise" group will spend _______ on his or her next entertainment. However, given what this group has spent on entertainment in the past, this person might spend as little as _____ or as much as _____. So my prediction might be as far off as _______ ." (Add any comments that you think shed light on your prediction.)


    8. Sample answer: "I predict that a person from the "does not exercise" group will spend $11.36 on his or her next entertainment. However, given what this group has spent on entertainment in the past, this person might spend as little as $0 or as much as $40. So my prediction might be as far off as $28.64. However, it is more likely that it will be off by no more than $11.36."

      If students did not remove the highest value, the average would be $13.75 instead of $11.36.

    9. Now compare your predictions. According to your predictions, which of the two groups of students spends more on entertainment? Does your dot plot support the same conclusion?


    10. Sample answer:

      We predict that a student who exercises will spend $14.64 on entertainment and a student who does not exercise will spend $11.36 on entertainment. The dot plot indicated that students who exercised tended to spend more than students who did not exercise. So the dot plot confirms the higher prediction for the group that exercises.
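The averages quoted in the sample answers, with and without the largest value in each group, can be reproduced as follows. This is a sketch for checking the arithmetic; the helper name `mean_without_max` is hypothetical, and the data come from Table 6.

```python
# Entertainment spending (dollars) from Table 6.
no_exercise = [10, 5, 20, 4, 20, 20, 15, 0, 8, 40, 8, 15]
exercise = [15, 15, 15, 5, 10, 5, 5, 6, 30, 25, 30, 60]

def mean(values):
    return sum(values) / len(values)

def mean_without_max(values):
    """Mean after dropping the single largest value."""
    trimmed = sorted(values)[:-1]
    return mean(trimmed)

for label, data in [("does not exercise", no_exercise), ("does exercise", exercise)]:
    print(f"{label}: mean ${mean(data):.2f}, "
          f"without largest value ${mean_without_max(data):.2f}")
```

Either way the "does exercise" group has the higher average, so dropping the largest value changes the size of the predictions but not the comparison between the groups.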

    11. Make a scatter plot of these data. (In other words, plot the points (15,10), (15,5), and so forth.) Label the vertical axis "does not exercise" and the horizontal axis, "does exercise."


    12. Solution (chart below):

    13. Is it valid to claim that there is a direct connection between the exercise and nonexercise groups? Could you use the typical amount spent by one in the "does exercise" category and predict how much a person in the "does not exercise" category would spend? Explain.


    14. The scatter plot doesn’t make any sense in this situation. The spending levels for the "exercise" group and the "does not exercise" group are not paired. In other words, the x and y coordinates were matched up only by the order in the table. They don’t belong together in any meaningful way.





Activity 4—Forearmed Is Forewarned

   


In Activity 1, you were asked to predict the heights of Bones 1 and Bones 2 (and perhaps Bones 3 or more) whose bones were described in Table 1. At that time, your best guess was most likely based on what you know, in general, about people’s heights. In Activity 3, you were asked to predict students’ heights. Your predictions were based on data from a single variable, student heights. Later, you used your knowledge of whether the student was male or female to improve the precision of your predictions. But you have not yet made any connections between the length of someone’s forearm and his or her height.

For this assignment, use the data from Class A shown in Table 3.

  1. On graph paper, represent the data with a scatter plot. Because forearm length is being used to "explain" or "predict" height, it is called the independent variable. Because height generally changes in response to changes in forearm lengths, height is called the dependent variable. It is customary to display the independent variable on the horizontal axis and the dependent variable on the vertical axis. Remember to label each axis with its variable and an appropriate scale for that variable. To differentiate the boys from the girls, use two colors (or different shapes), one to represent the girls’ data, and the other the boys’.


  2. Student graphs should be similar to the one below. Circles represent boys, and squares girls.


  1. Use your scatter plot to make the following predictions.


    1. If a girl of the same age as the students in Class A had a forearm that measured between 25 and 27 cm, what would you predict for her height? How accurate do you think your prediction would be?


    2. Sample answer:

      There are six female students with forearm lengths between 25 and 27 cm. The heights of these six students are 162, 164, 167, 172, 175, and 176 cm. Use the average of these heights, approximately 169 cm, for the prediction. (Some students may decide to choose the median or some other number between 162 and 176 for their predictions.) If the student’s height turns out to be only 162 cm, this prediction is off by 7 cm. On the other hand, if the student turns out to be 176 cm tall, this prediction is also off by 7 cm. So the largest error would be around 7 cm (provided the students in Class A are similar to all tenth-grade students).
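The selection step in this sample answer (pick the girls whose forearm length is between 25 and 27 cm, then average their heights) can be sketched as follows. The tuple layout and variable names are illustrative; the data are the girls’ rows of Table 3.

```python
# (name, forearm length in cm, height in cm) for the girls in Table 3.
girls = [
    ("Alice", 24, 157), ("Bia", 24.5, 166), ("Christi", 27, 164),
    ("Chantalle", 24, 164), ("Coral-Anne", 23, 161), ("Jennifer", 27.5, 164),
    ("Ji-Hyun", 27, 167), ("Kim", 26, 162), ("Kristen", 26, 175),
    ("Nancy", 28.5, 166), ("Tanner", 26.5, 172), ("Teresa", 25.5, 176),
]

# Keep only the heights of girls whose forearm is 25 to 27 cm (inclusive).
selected = sorted(height for _, forearm, height in girls if 25 <= forearm <= 27)

prediction = sum(selected) / len(selected)
largest_error = max(prediction - min(selected), max(selected) - prediction)

print("Heights used:", selected)
print(f"Predicted height: {prediction:.0f} cm")
print(f"Largest error within this group: {largest_error:.1f} cm")
```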

    3. Predict the height of a tenth-grade boy with a 28.5-cm forearm. Explain how you determined your answer.


    4. Student answers will vary. Here are some important points to consider.

      Only one student in Class A has a 28.5-cm forearm. The student is 166 cm tall and happens to be a girl. There are, however, two boys with forearms of 28 cm. Their heights are 172 cm and 175 cm. So the average height would be 173.5 cm. The height of a boy with a 29-cm forearm is 180 cm. A reasonable prediction might be to select the height that is midway between 173.5 and 180 cm for a predicted height of approximately 177 cm.

    5. Predict the height for a tenth-grade student who has a forearm of 33 cm. How did you do this?


    6. Student answers will vary. This forearm length is larger than any in the data collected from Class A. So, to make the prediction you need to follow the pattern of the data beyond the observed data. A prediction of around 194 cm seems reasonable.

  1. Archaeologists study ancient human life and, similar to artists, they frequently use general rules of proportions. For example, an archaeologist might use the proportion that the forearm of a typical female teenager is 16% of her height.


    1. Translate this relationship into an equation that relates forearm length, x, and height, y. Then test a few points to be sure your equation makes sense.


    2. The 16% model: y = x/0.16 = 6.25x, where y is height (cm) and x is forearm length (cm).

      For the remainder of this activity, this equation will be referred to as the "16% model."

    3. Sketch a graph of the 16% model on the same set of axes as your scatter plot.


    4. Student graphs should be similar to the one below.

    5. Is the 16% model true for all the people in Class A? Justify your answer based on your graph.


    6. No, this relationship is not true for all students in Class A. If it were true, all the data would have to fall along the line representing the 16% model.

  1. Do you think the 16% model fits the boys’ or the girls’ data better? Justify your answer based on the model’s residual errors.


  2. The boys’ data appear to lie closer to the line than the girls’ data. The residual errors appear in the last column of the table that follows.

    Girls’ Data

    Forearm length   Height   Predicted height using   Residual error
    (cm)             (cm)     16% model (cm)           (cm)
    24.3             157      152                      5.1
    24.5             166      153                      12.9
    27.8             164      174                      –9.8
    24.1             164      151                      13.4
    23.7             161      148                      12.9
    27.5             164      172                      –7.9
    27.4             167      171                      –4.3
    26.6             162      166                      –4.3
    26.1             175      163                      11.9
    28.5             166      178                      –12.1
    26.6             172      166                      5.8
    25.4             176      159                      17.3

    Table 7


    Boys’ Data

    Forearm length   Height   Predicted height using   Residual error
    (cm)             (cm)     16% model (cm)           (cm)
    26.5             173      166                      7.4
    27.3             177      171                      6.4
    27.8             174      174                      0.3
    31.4             192      196                      –4.3
    28.3             172      177                      –4.9
    29.2             180      183                      –2.5
    27.3             174      171                      3.4
    27.9             175      174                      0.6
    31.8             185      199                      –13.8
    30.2             185      189                      –3.8
    29.8             178      186                      –8.3

    Table 8

    Average absolute value of the errors for girls’ data: ≈ 9.8 cm; for boys’ data: ≈ 5.1 cm.

    The height-forearm data from Class A are fairly spread out. Picking a line that describes the data, or makes good predictions for heights based on forearm lengths, is somewhat difficult because of the large amount of variability in the data.
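The residual comparison above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, the data are the forearm lengths and heights as printed in Tables 7 and 8, and the averages it produces depend on how the values are rounded.

```python
# (forearm length in cm, height in cm), as printed in Tables 7 and 8.
GIRLS = [(24.3, 157), (24.5, 166), (27.8, 164), (24.1, 164), (23.7, 161),
         (27.5, 164), (27.4, 167), (26.6, 162), (26.1, 175), (28.5, 166),
         (26.6, 172), (25.4, 176)]
BOYS = [(26.5, 173), (27.3, 177), (27.8, 174), (31.4, 192), (28.3, 172),
        (29.2, 180), (27.3, 174), (27.9, 175), (31.8, 185), (30.2, 185),
        (29.8, 178)]


def predict_16_percent(forearm_cm):
    """16% model: the forearm is 16% of the height, so height = forearm / 0.16."""
    return forearm_cm / 0.16


def residuals(data):
    """Residual error = actual height minus predicted height."""
    return [height - predict_16_percent(forearm) for forearm, height in data]


def mean_absolute(values):
    """Average of the absolute values of a list of numbers."""
    return sum(abs(v) for v in values) / len(values)


girls_mae = mean_absolute(residuals(GIRLS))
boys_mae = mean_absolute(residuals(BOYS))
print(f"Average absolute error, girls: {girls_mae:.1f} cm")
print(f"Average absolute error, boys:  {boys_mae:.1f} cm")
```

A smaller average absolute error means the points sit closer to the line, which is the sense in which the model fits one group better than the other.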

 = 15, 16, 17

  1. Below are two methods to help you select a line that describes the pattern of the height-forearm data from Class A. Divide your group in half. Half of your group should use Method #1 and the other half should use Method #2.
  2. Method #1:

    Pick a point that appears to lie in the middle of the points displayed in your scatter plot. What are the coordinates of this point? Now anchor your line to this point and adjust the slope until you find a line that you think best describes the pattern of the data. What is the equation of the line you have selected? How did you decide what line was best?

    Method #2:

    Draw two lines in such a way that the points on your hand-drawn scatter plot are bounded as tightly as possible between these lines. (The lines don’t have to be parallel.) Now draw one line halfway between the two lines that you have drawn. What is the equation of this line? How did you decide which line was closest to the middle of the two outer lines?

    Student answers will vary. Here are sample answers for the two methods.

    Method #1: Selected point (27.2,172.6), the mean of the forearm lengths and heights. Using a graphing calculator, we experimented with equations of the form y = m(x – 27.2) + 172.6 until we found one that split the data about in half with about the same number of points above the line as below the line. The equation of our model is y = 4(x – 27.2) + 172.6. A graph of the data, the anchor point marked with a square, and the line are shown below.



    Method #2 (see figure above). We drew two lines that bounded the data between them. Then we selected a line between these two lines that had approximately as many of the data points above the line as below the line. The results shown in the calculator screen below are similar to the lines that we drew by hand.

    Figure 12

    The equation for our model is y = 4(x – 27) + 170.

  1. In Item 5, your group used two methods to determine a model (equation) that described the data in your scatter plot.


    1. Express your equations in slope-intercept form (y = mx + b) if they are not already in that form. Compare the models determined by the two methods.


    2. Student answers will vary depending on their answers to Item 5.

      Sample answer:

      Method #1: y = 4.0x + 63.8

      Method #2: y = 4.0x + 62.0

      In this case, the two models differed in only the y-intercept. The graph of Model #1 is 1.8 units higher than the graph of Model #2.

    3. Which model, the one from Method #1 or the one from Method #2, appears to describe the pattern of the data better?


    4. Sample answer:

      In this case it is difficult to tell which model fits the data better just by looking at the models and the data. For Model #1, 12 of the data points lie above the line, one lies on the line, and 10 lie below the line (note that (27,174) appears twice in the data). For Model #2, 14 data points lie above the line and 9 below the line. So, if we use the criterion of equal numbers of points above and below the line, Model #1 is better.

      Students could also analyze the prediction errors.

    5. What was your criterion for choosing the better model?


    6. Sample answer:

      The line that was closer to having an equal number of points above and below was the better line.

      Some students may decide that the line with the sum of the errors closer to zero was the better line. Still others might calculate the average of the absolute values of the errors and choose the line for which this average was smaller.

    7. Using your criterion, does your selected model from part (b) appear to fit the data better than the 16% model? Explain.


    8. The 16% model had 11 points above the line, 2 on (or approximately on) the line, and 10 below the line. Using the criterion of choosing a line that splits the data into equal numbers of points above and below the line, the 16% model does as well as the model from Method #1. However, the line from Method #1 seems to capture the pattern of the data better than the 16% model.
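The two steps in Item 6 (rewriting a model in slope-intercept form, then applying a criterion) can be sketched as follows. This is a minimal sketch, not a prescribed method: the helper names are invented for this example, and the four sample points are hypothetical values chosen only to show the bookkeeping, not the Class A data.

```python
def to_slope_intercept(slope, x0, y0):
    """Rewrite y = slope*(x - x0) + y0 as y = slope*x + intercept."""
    return slope, y0 - slope * x0


def compare_to_line(points, slope, intercept, tol=1e-9):
    """Summarize how a line sits relative to the data.

    Returns counts of points above, on, and below the line, the sum of
    the errors, and the average of the absolute values of the errors.
    """
    above = on = below = 0
    errors = []
    for x, y in points:
        error = y - (slope * x + intercept)
        errors.append(error)
        if error > tol:
            above += 1
        elif error < -tol:
            below += 1
        else:
            on += 1
    return {
        "above": above,
        "on": on,
        "below": below,
        "sum_errors": sum(errors),
        "mean_abs_error": sum(abs(e) for e in errors) / len(errors),
    }


# Sample models from Item 5, converted to slope-intercept form.
m1, b1 = to_slope_intercept(4, 27.2, 172.6)   # Method #1
m2, b2 = to_slope_intercept(4, 27.0, 170.0)   # Method #2

# Hypothetical points for illustration only (not the Class A data).
sample_points = [(25.0, 165.0), (27.0, 171.8), (29.0, 178.0), (31.0, 190.0)]
print("Method #1:", compare_to_line(sample_points, m1, b1))
print("Method #2:", compare_to_line(sample_points, m2, b2))
```

Each entry in the summary corresponds to one of the criteria students might propose: equal counts above and below, a sum of errors near zero, or a small average absolute error.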

  1. Use your model from Item 6 (b) or the 16% model (whichever you think is better) to make the following predictions.


    1. Predict the height of a student whose forearm is 27 cm. Use the data from Class A to assess the precision of your prediction.


    2. Sample answer:

      Using the model from Method #1, y = 4x + 63.8, the predicted height is 171.8 cm. Students with forearms of length 27 cm varied in height from 164 cm to 177 cm. If the student is actually 164 cm tall, the prediction will be off by 7.8 cm.

    3. Predict the height of a student whose forearm is 33 cm. Do the data provide any clues to suggest how precise this prediction might be? Explain.


    4. Sample answer:

      Using the model from Method #1, y = 4x + 63.8, the predicted height is about 196 cm. There are no data from students with forearms as long as 33 cm. Hence, it is difficult to assess how far off this prediction might be.

    5. The forearm lengths of two students differ by 1 cm. Predict how much their heights differ. What if their forearm lengths differed by 2 cm? Justify your answers.


    6. Sample answer:

      Using the model from Method #1, y = 4x + 63.8, the predicted difference in height for students whose forearm lengths differ by 1 cm is 4 cm. If the students’ forearm lengths differed by 2 cm, the predicted difference in height would be 8 cm.

    7. What does the value of the slope in your model tell you about people?


    8. For each 1-cm increase in forearm length, you should expect about a 4-cm increase in height.

    The height-forearm data from Class A were fairly scattered about the line that you chose for your model. The amount of scatter (variability in the data) made determining a model that "best" describes the data somewhat difficult.
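The sample answers to Item 7 can be checked with a short sketch. It assumes the Method #1 sample model, y = 4x + 63.8, and the 164 cm to 177 cm range of observed heights quoted in the sample answer; the function name is made up for this example.

```python
def predict_height(forearm_cm, slope=4.0, intercept=63.8):
    """Sample Method #1 model: height = 4 * forearm + 63.8 (all in cm)."""
    return slope * forearm_cm + intercept


height_27 = predict_height(27)
height_33 = predict_height(33)   # beyond the observed data (extrapolation)

# Observed heights for 27-cm forearms ranged from 164 to 177 cm.
observed_range_27 = (164, 177)
worst_miss_27 = max(abs(height_27 - h) for h in observed_range_27)

# The slope is the predicted change in height per 1-cm change in forearm.
change_per_cm = predict_height(28) - predict_height(27)

print(f"Predicted height at 27 cm: {height_27:.1f} cm")
print(f"Largest miss against observed range: {worst_miss_27:.1f} cm")
print(f"Predicted height at 33 cm: {height_33:.1f} cm")
print(f"Predicted change per 1 cm of forearm: {change_per_cm:.1f} cm")
```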





Homework 4—The Nature of Our Relationship

   

In this assignment you will make scatter plots of data sets and, in some cases, fit lines to the scatter plots and make predictions. Pay particular attention to the characteristics of the relationship between the variables.

Do the points in the scatter plot appear to be scattered on either side of a straight line? If so, the scatter plot has linear form and it makes sense to describe it with a linear equation (a member of the y = mx + b family).

Does the pattern made by the points move upward as you look from left to right? If so, the two variables are positively related (as one variable increases the other tends to increase). If the pattern drifts downward, the two variables are negatively related (as one variable increases the other tends to decrease).

  1. Linda heats her house with natural gas. She wonders how her gas consumption is related to how cold the weather is. Table 7 shows the average outside temperature (in degrees Fahrenheit) each winter month and the average amount of natural gas Linda’s house used (in hundreds of cubic feet) each day that month.


    Month                          Sep   Oct   Nov   Dec   Jan   Feb   Mar   Apr   May
    Outdoor temperature (°F)       48    46    38    29    26    28    49    57    65
    Gas used per day (100 cu ft)   5.1   4.9   6.0   8.9   8.8   8.5   4.4   2.5   1.1

    Table 7. Gas usage and temperature data


    1. Make a scatter plot of these data. Which is the independent variable and which is the dependent variable? How did you decide?


    2. The independent variable is temperature and the dependent variable is the gas used per day. People generally want to predict how much fuel they will use, and outside temperature is a key factor in determining fuel usage (see chart below).


    3. Describe in words the characteristics of the relationship between outside temperature and natural gas consumption. Why does the relationship have this direction?


    4. There is very little scatter around a straight line. The direction of the line is downward, so the variables have a negative relationship. An increase in the mean monthly temperature produces a decrease in the monthly gas consumption.

    5. Draw a line that you think best describes the pattern of these data. What is the equation of your line?


    6. Sample answer:

      Using the points (26,8.8) and (65,1.1): y = –0.20x + 14.

    7. Use your equation from (c) to predict the gas used during a month when the average temperature is 60° F.


    8. Sample answer based on the equation in (c) above: y = –0.20(60) + 14 = 2.0, or about 200 cubic feet per day.
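The sample line in part (c) passes through two of the data points. A minimal sketch of that calculation follows; the function name is invented here, and because the coefficients are left unrounded the result can differ slightly from one computed with the rounded equation.

```python
def line_through(p, q):
    """Return (slope, intercept) of the line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Two points from the gas table: (temperature in deg F, gas in 100 cu ft/day).
slope, intercept = line_through((26, 8.8), (65, 1.1))
gas_at_60 = slope * 60 + intercept

print(f"y = {slope:.2f}x + {intercept:.1f}")
print(f"Predicted gas use at 60 deg F: {gas_at_60:.1f} hundred cu ft per day")
```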

  1. The 11 members of a college women’s golf team play a practice round, then the next day play a round in competition on the same course. Their scores appear in Table 8. (A golf score is the number of strokes required to complete the course, so low scores are better.)


    Player        1    2    3    4    5    6    7     8    9    10   11
    Practice      89   90   87   95   86   81   105   83   88   91   79
    Competition   94   85   89   89   81   76   89    87   91   88   80

    Table 8. Golf scores

    1. Make a scatter plot that allows you to study how well the competition score can be predicted from the practice score.


    2. Answer (chart):

    3. Describe the relationships between practice and competition scores. In particular, is there a positive or negative relationship? Explain why you would expect the scores to have a relationship like the one you observe.


    4. The points are loosely scattered around a straight line with a positive association between the variables. This indicates that the better golfer should have lower scores for both practice and competition.

    5. One point falls clearly outside the overall pattern. Circle this point in your plot. A good golfer can have an unusually bad round, or a weaker golfer can have an unusually good round. Can you tell from the data given whether the unusual value is produced by a good player or a poor player? What other data would you need to distinguish between the two possibilities?


    6. The outlier is circled in the scatter plot in (a). It is not possible to tell if this player is a good player who had a bad practice round or a weaker player who had an unusually good competition. You would need more data on this player to decide.

    7. You might expect a player to have about the same score on two rounds played on the same course. Draw on your graph the line that represents the same score on both days. Does this line fit the data well when you ignore the outlier? If you don’t like this line, draw a line that you would prefer to use to predict the competition score from the practice score.


    8. The graph below shows the data with the outlier omitted and the line y = x superimposed. It appears to do a reasonable job in describing these data.

    9. Another golf team member shot a 95 in practice. Predict her score in competition.

    10. Prediction: 95 in competition round.
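One way to check the outlier and the y = x line numerically is sketched below. Flagging the player with the largest gap between the two scores is an assumption made for this illustration; students will normally spot the point by eye on the scatter plot.

```python
# Scores from Table 8, listed in player order 1 through 11.
practice = [89, 90, 87, 95, 86, 81, 105, 83, 88, 91, 79]
competition = [94, 85, 89, 89, 81, 76, 89, 87, 91, 88, 80]

# Error of the "same score both days" line y = x for each player.
differences = [c - p for p, c in zip(practice, competition)]

# Flag the player whose point lies farthest from the line y = x.
outlier_index = max(range(len(differences)), key=lambda i: abs(differences[i]))
outlier_player = outlier_index + 1

# Typical size of the error for the remaining players.
kept = [d for i, d in enumerate(differences) if i != outlier_index]
mean_abs_error = sum(abs(d) for d in kept) / len(kept)

print(f"Player farthest from y = x: player {outlier_player}")
print(f"Average absolute error without that player: {mean_abs_error:.1f} strokes")
```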

    When the relationship between two variables is strong and has linear form, the points in a scatter plot will fall very close to a line. For weaker relationships, the data are more scattered.

  1. Figure 11 contains two displays (scatter plots) showing the relationship between human height and the ulna length (forearm-bone length). One of the scatter plots is based on real data and the other on fictitious data. Which do you think is which? In each display, a line describing the pattern of the data has been added to the scatter plot.


  2. Figure 11. Two scatter plots showing height versus ulna length

    1. Both relationships have a linear form. Which of the two displays shows a stronger linear relationship? Explain.


    2. Display 2: The data in Display 2 appear to lie in a narrower band about the line than the data in Display 1.

    3. Suppose that scientists used the model

      y = 60.2 + 0.42x

      (where x is ulna length in mm and y is height in cm)

      to predict the height of the deceased whose ulna length of 290 mm was recorded in Figure 1. What did scientists predict as his or her height?

    4. y = 60.2 + 0.42(290) = 182 cm

    5. How precise is the scientists’ prediction if Display 1 shows the real data? How precise is the scientists’ prediction if Display 2 shows the real data? Which display better supports the scientists’ prediction?


    6. Sample Answer:

      Display 1: The actual height might be as low as 165 cm and as high as 195 cm.

      Display 2: The actual height might be as low as 175 cm and as high as 185 cm.

      The prediction is more precise if the real data are from Display 2.

    7. Describe the connection between the strength of a linear relationship and the degree of precision with which you can make predictions using that scatter plot.


    8. When the data in a scatter plot form a narrow band about the linear model, the predictions are more precise than when the data form a wide band about the linear model. You can make more precise predictions when you are dealing with a strong relationship than with a weaker one.





Activity 5—Dangerous Waters

   

 = 15, 16

On the coast of Florida lives the manatee, a large, appealingly ugly, and friendly marine mammal. However, the gentle Florida manatee does not live a carefree life. It is one of the most endangered marine mammals in the United States. One major threat to the manatee’s survival is the large numbers of manatees killed each year by powerboats.

Should the Florida Department of Environmental Protection limit the number of registered boats in order to protect the manatee population? Before deciding to limit the number of registrations, the department will have to present a convincing argument to the public.

Table 9 contains data on the number of powerboats registered in Florida (in thousands) and the number of manatees killed.

Year

Powerboat
Registrations
(in thousands)
Manatees Killed
1977 447 13
1978 460 21
1979 481 24
1980 498 16
1981 513 24
1982 512 20
1983 526 15
1984 559 34
1985 585 33
1986 614 33
1987 645 39
1988 675 43
1989 711 50
1990 719 47

Table 9. Powerboat registration and manatee deaths in Florida 1977-1990.

In this activity, you are asked to make a convincing case for whether or not to restrict powerboats. The main steps in building a convincing argument to present to the authorities are:

  1. Find a prediction equation.
  2. Show that the prediction equation does a good job of describing real data.
  3. Use the equation to make your prediction.

Finding the Least-Squares Prediction Equation

  1. Enter the data from the last two columns of Table 9 into calculator lists (or columns A and B of a spreadsheet).
  2. Your graphing calculator (as well as most scientific calculators and spreadsheets) has built-in commands that calculate the equation of the least-squares line, a line that statisticians frequently select as the best-fitting line.

    Least-squares criterion: A line’s score is the sum of the squared (residual) errors. Statisticians frequently refer to this number as the SSE. The best line (the "least-squares line") has the smallest SSE.

    However, you probably won’t see the term "least-squares line" anywhere on your calculator. The general name for determining an equation that fits the data is regression. Because, in this case, you are looking for a linear relationship, the technique used in fitting the "best" line to data is called linear regression.

    1. Use your calculator (or spreadsheet) to calculate the values of the slope and y-intercept of the least-squares line. Then write its equation.


    2. Least-squares equation: y = 0.125x – 41.4

      (The values for slope and y-intercept have been rounded. More decimal places have been retained for the slope. You can round predictions to an appropriate number of decimal places later.)

    3. What do your slope and y-intercept tell you about boats and manatees?


    4. The slope tells you that, each time you increase the number of powerboats by one unit (1000 boats), you can expect 0.125 additional manatee deaths. The value of the y-intercept is meaningless in this context.
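For teachers who want to check the calculator output, here is a hedged sketch of the least-squares computation in plain Python. It uses the textbook formulas (slope = Sxy/Sxx, intercept = mean of y minus slope times mean of x); the function names are made up for this example and are not the calculator's commands.

```python
# Table 9: powerboat registrations (thousands) and manatees killed, 1977-1990.
REGISTRATIONS = [447, 460, 481, 498, 513, 512, 526, 559, 585, 614, 645, 675, 711, 719]
KILLED = [13, 21, 24, 16, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47]


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sse(xs, ys, slope, intercept):
    """Sum of the squared residual errors for the line y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


slope, intercept = least_squares(REGISTRATIONS, KILLED)
best_sse = sse(REGISTRATIONS, KILLED, slope, intercept)

# Any other line, such as one with a slightly different slope, scores worse.
nudged_sse = sse(REGISTRATIONS, KILLED, slope + 0.01, intercept)

print(f"Least-squares line: y = {slope:.3f}x + ({intercept:.1f})")
print(f"SSE of least-squares line: {best_sse:.1f}")
print(f"SSE after nudging the slope: {nudged_sse:.1f}")
```

Comparing the two SSE values shows the least-squares criterion at work: changing the slope or intercept can only increase the score.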


Assessing the Quality of Fit

  1. Plot both the data and the least-squares line so you can see both graphs in the same window. Does the least-squares line appear to fit the data well? Support your answer.


  2. In the plot below, there appear to be about as many points above the line as below the line. Most of the dots are fairly close to the line. The line appears to fit the data fairly well.


  1. In assessing whether a linear regression model is adequate to describe the relationship shown by a scatter plot, statisticians routinely ask if the line meets the following criteria:

    (1) The sum of the errors, also called residuals or residual errors, should be close to 0.
    (2) The data in the scatter plot should be randomly scattered above and below the line.

    1. Set your calculator to compute a list of the residuals (errors) for the least-squares line. Then calculate the sum of the residuals. Is the sum of the residuals close to 0? (How can you use the statistical features on your calculator to compute this sum most efficiently?)


    2. Sample answer:

      We used the one-variable statistical feature on the calculator to get the sum of the data in the list containing the residuals. The value of the sum of the residuals is about –3 × 10^(–12), which is essentially 0.

    3. Does the least-squares line appear to satisfy criterion (2) above? What does this imply about the residuals?


    4. Yes, the dots appear to be randomly scattered above and below the line. This means that the residuals are randomly scattered between those with positive sign and those with negative sign.

    5. Using your calculator, make a scatter plot of the residuals versus the number of powerboat registrations. Do the dots in your plot look randomly scattered above and below the horizontal line y = 0?


    6. The dots in the scatter plot below appear randomly scattered.

      A plot of the residuals versus the independent variable, such as the plot you have made for Item 4(c), is called a residual plot. The residual plot gives you information about how well your model describes the data. For example, the residual plot can help you decide if your model satisfies criterion (2) above.

    7. Explain how the residual plot can help you decide if your model satisfied criterion (2).


    8. Each dot in the residual plot that lies above the line y = 0 is a positive error and hence comes from a point on the scatter plot that lies above the regression line. Each dot that lies below the line y = 0 is a negative error and hence comes from a point on the scatter plot that lies below the regression line. If the points in the residual plot are randomly scattered above and below the line y = 0, that means the points in the original scatter plot are randomly scattered above and below the regression line.

      If the points in the residual plot do not appear randomly scattered, but instead form a strong pattern, your linear model cannot adequately describe the data. In this case, you may need to search for another model, perhaps a new slope or new intercept, or maybe a model that is not linear.

    1. Why do you think you should assess whether your linear regression equation adequately describes the pattern in the scatter plot before you use your equation to make predictions?


    2. If your model does not adequately describe the data, you probably don’t want to use it to make predictions.

    3. Does the least-squares line appear to do a good job of describing the relationship between the number of manatees killed and the number of powerboat registrations? That is, can you use it to assess the outcomes of your recommendations to limit powerboat registrations?


    4. Yes, the residual plot appears to be randomly scattered.
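The two criteria used in this section can be checked with a short sketch. It refits the line from the Table 9 data so that it runs on its own; the variable names are invented for this example, and counting signs is only a rough stand-in for looking at the residual plot.

```python
# Table 9: powerboat registrations (thousands) and manatees killed, 1977-1990.
REGISTRATIONS = [447, 460, 481, 498, 513, 512, 526, 559, 585, 614, 645, 675, 711, 719]
KILLED = [13, 21, 24, 16, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47]

# Least-squares slope and intercept from the standard formulas.
n = len(REGISTRATIONS)
mean_x = sum(REGISTRATIONS) / n
mean_y = sum(KILLED) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(REGISTRATIONS, KILLED))
sxx = sum((x - mean_x) ** 2 for x in REGISTRATIONS)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = actual value minus the value predicted by the line.
residuals = [y - (slope * x + intercept) for x, y in zip(REGISTRATIONS, KILLED)]

# Criterion (1): the sum of the residuals should be close to 0.
residual_sum = sum(residuals)

# Criterion (2), rough check: a mix of positive and negative residuals.
n_positive = sum(1 for r in residuals if r > 0)
n_negative = sum(1 for r in residuals if r < 0)

print(f"Sum of residuals: {residual_sum:.2e}")
print(f"Positive residuals: {n_positive}, negative residuals: {n_negative}")
for x, r in zip(REGISTRATIONS, residuals):
    print(f"{x:4d}  {r:6.2f}")   # pairs to plot in a residual plot
```

The printed sum will be a tiny number near zero rather than exactly zero because of rounding inside the computer, just as on the calculator.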


Making Predictions

  1. Consider recommending a limit on powerboats that reduces the number of manatees killed each year to about 30. What limit should be placed on the number of powerboat registrations in order to stabilize the number of powerboat-related manatee deaths at around 30 per year? Explain how you can use your model and algebra to answer this question.


  2. Sample answer: x = 571.2 units or about 571,000 registrations.

    Solve 30 = 0.125x – 41.4 for x.

  1. Suppose, instead, that you recommend that Florida limit the number of powerboat registrations to 700,000 (slightly below the number of registrations in 1989). Predict the number of manatees that would be killed each year, on average, if this proposal were adopted. Then use your scatter plot from Item 3 to assess the precision of your prediction.


  2. Sample answer:

    y = 0.125(700) – 41.4 = 46.1 or approximately 46 manatees would be killed by powerboats. To assess the precision of this prediction, we look at the data. Since x = 700 is between observations for x = 675 and x = 711, we can base our answer on observations (675,43) and (711,50). The actual numbers of manatees killed might be as small as 43 or as large as 50, even though we have predicted 46 deaths.

  1. You have just considered recommending a powerboat-registration limit of 700,000. Consider what would happen if the limit were raised to 750,000 instead. Use your answer to item 2(b) to complete the following:


  2. Every time the limit on boat registrations is raised by 50,000 registrations, one can predict that, on average, an additional ________ manatees would be killed by powerboats each year.

    Justify your response.

    Because the units for the powerboats are in terms of 1000 boats, a 50,000 increase in the number of powerboats corresponds to an increase of 50 units for the independent variable. Using our model, we would predict an increase of (0.125)(50) = 6.25 deaths. Based on this calculation, we will respond to objectors as follows:

    Every time the limit is raised by 50,000 registrations, we predict that, on average, an additional 6 manatees will be killed by powerboats each year.
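The three calculations in this section all use the rounded sample equation y = 0.125x – 41.4. A minimal sketch, with function names made up for this example:

```python
SLOPE = 0.125       # additional deaths per 1000 registrations (sample answer)
INTERCEPT = -41.4   # rounded y-intercept from the sample answer


def predicted_deaths(registrations_thousands):
    """Predicted manatee deaths per year for a given number of registrations."""
    return SLOPE * registrations_thousands + INTERCEPT


def registrations_for(deaths):
    """Solve deaths = SLOPE * x + INTERCEPT for x (in thousands)."""
    return (deaths - INTERCEPT) / SLOPE


limit_for_30 = registrations_for(30)
deaths_at_700 = predicted_deaths(700)
extra_per_50 = predicted_deaths(750) - predicted_deaths(700)

print(f"Limit for about 30 deaths: {limit_for_30:.1f} thousand registrations")
print(f"Predicted deaths at 700 thousand registrations: {deaths_at_700:.1f}")
print(f"Extra deaths per additional 50 thousand registrations: {extra_per_50:.2f}")
```

The last value equals the slope times 50, which is why it does not depend on the starting limit.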





Homework 5—Anscombe’s Data

   

Tables 10–13 present four sets of data prepared by statistician Frank Anscombe.

  1. Enter each of the four sets of data into a calculator (or your spreadsheet). (Notice that for three of the data sets the x-values are the same.)


    x    10     8      13     9      11     14     6      4      12      7      5
    y    8.04   6.95   7.58   8.81   8.33   9.96   7.24   4.26   10.84   4.82   5.68

    Table 10. Data Set A



    x    10     8      13     9      11     14     6      4      12     7      5
    y    9.14   8.14   8.74   8.77   9.26   8.10   6.13   3.10   9.13   7.26   4.74

    Table 11. Data Set B



    x    8      8      8      8      8      8      8      8      8      8      19
    y    6.58   5.76   7.71   8.84   8.47   7.04   5.25   5.56   7.91   6.89   12.50

    Table 12. Data Set C



    x    10     8      13      9      11     14     6      4      12     7      5
    y    7.46   6.72   12.74   7.11   7.81   8.84   6.08   5.39   8.15   6.42   5.73

    Table 13. Data Set D

    [Anscombe 1973]

  1. Determine the equation of the least-squares line for each of the data sets. Compare your equations for the various sets of data.


  2. All the equations are approximately y = 0.5x + 3

  1. Make a scatter plot for each of the four data sets and draw the regression line on each of the plots.


  2. Answer: See charts below.

  1. In which of the four cases would you be willing to use the fitted regression line to predict y given that x = 14? Explain.


  2. The only data set of the first three that has a linear form is Data Set A. You shouldn’t even try to fit a line unless the data set appears to have a linear form. So, for Data Set A, you could predict that the value of y when x = 14 is 10. Data Set D has an outlier that appears to tilt the line more steeply than the general flow of the data.

  1. What do you think Anscombe wanted students to learn from the four data sets that he created?


  2. He wanted students to see the dangers of not first checking to see if the data have linear form or an outlier before determining the least-squares equation.

  1. Data Set D has an outlier, one data point that appears not to follow the general pattern of the data.


    1. Remove that outlier from the data and recalculate the least-squares equation.


    2. y = 0.35x + 4.0

    3. Does the least-squares line appear to fit the remaining data better than the equation that you calculated in Item 3?


    4. If you ignore the outlier, this line appears to fit the data almost perfectly.
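The answers to this assignment can be verified with a short sketch. It applies the standard least-squares formulas to the four data sets exactly as printed in Tables 10 through 13; the function and variable names are invented for this example.

```python
X_COMMON = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

DATA_SETS = {
    "A": list(zip(X_COMMON,
                  [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])),
    "B": list(zip(X_COMMON,
                  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])),
    "C": list(zip([8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 19],
                  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50])),
    "D": list(zip(X_COMMON,
                  [7.46, 6.72, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])),
}


def least_squares(points):
    """Return (slope, intercept) of the least-squares line for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


fits = {name: least_squares(points) for name, points in DATA_SETS.items()}
for name, (slope, intercept) in fits.items():
    print(f"Data Set {name}: y = {slope:.2f}x + {intercept:.2f}")

# Item 6: drop the outlier (13, 12.74) from Data Set D and refit.
d_without_outlier = [p for p in DATA_SETS["D"] if p != (13, 12.74)]
slope_d, intercept_d = least_squares(d_without_outlier)
print(f"Data Set D without outlier: y = {slope_d:.2f}x + {intercept_d:.2f}")
```

Nearly identical equations for four very different scatter plots is the point of the exercise: the equation alone says nothing about whether a line is appropriate.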





Activity 6—The Plot Thickens

   

There are times when it is possible to make predictions using two or more models, each based on a different independent variable. You will then need to choose which model to use. In this activity, you will learn one method for selecting among these models the one that results in the most precise predictions.

First, how would you determine which relationship will result in the more precise predictions: height versus stride length or height versus forearm length?

Suppose you collected the data in Table 14 from a ninth-grade class. (The asterisks (*) indicate missing data.)

Height (cm)   Stride Length (cm)   Forearm Length (cm)
166.0         58.2                 28.5
164.5         55.9                 27.2
175.0         59.1                 28.6
184.0         68.9                 30.5
161.0         72.5                 26.5
164.0         *                    28.2
171.0         *                    28.4

Table 14. Data on height, stride length, and forearm length

    1. Make a scatter plot of height versus stride length and then one of height versus forearm length. (Use the same scaling on the vertical axis for each plot. For both plots, height should be on the vertical axis.) Which of the two scatter plots shows the stronger relationship? How can you tell?


    2. Answer: See charts below.

      Sample answer:

      Height versus forearm length is the stronger of the two relationships. The dots in the height-forearm scatter plot fall closer to fitting on a line than the dots in the height-stride scatter plot.

    3. Find the least-squares line for each of these relationships. Recall that the least-squares line minimizes the sum of the squared errors. Calculate the average of the squared errors for both of your least-squares lines. Explain why you would expect the stronger relationship to be the one associated with the smaller average squared error.


    4. Height = 160 + 0.167(stride length)

      Height = 10.9 + 5.59 (forearm length)

      Average for height-stride: 342.252/5 ≈ 68.5

      Average for height-forearm: 73.29/7 ≈ 10.5

      The average squared error represents the typical squared vertical distance from a point to the least-squares line. The smaller this distance, the stronger the relationship.

    5. Use your two least-squares equations to predict the height of a person whose stride is 73 cm and whose forearm length is 27 cm. Which of the two estimates is more reliable? Explain.


    6. Sample answer:

      Height-forearm model: Height = 10.9 + 5.59(27) ≈ 161.8 cm

      Height-stride model: Height = 160 + 0.167(73) ≈ 172.2 cm

      The forearm estimate is more reliable. As was found in part (b), the relationship between height and forearm length is stronger than the relationship between height and stride length.

    7. State in your own words how you would choose between two different models for predicting the same quantity.


    8. Select the model associated with the smaller average squared residuals. This should be the model for which the data appear most concentrated along the least-squares line.
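The selection rule in part (d) can be sketched in code. This is an illustration under stated assumptions: it fits each line with the standard least-squares formulas to Table 14 as printed, skips rows with missing values, and uses made-up function names. The coefficients and averages it produces depend on the data as entered and need not match the rounded sample answers exactly.

```python
# Table 14 rows: (height, stride length, forearm length), all in cm.
# None marks a missing measurement (shown as * in the table).
ROWS = [
    (166.0, 58.2, 28.5),
    (164.5, 55.9, 27.2),
    (175.0, 59.1, 28.6),
    (184.0, 68.9, 30.5),
    (161.0, 72.5, 26.5),
    (164.0, None, 28.2),
    (171.0, None, 28.4),
]


def least_squares(points):
    """Return (slope, intercept) of the least-squares line for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(points):
    """Average squared residual of the least-squares line fitted to points."""
    slope, intercept = least_squares(points)
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


# Height is the dependent variable (y) in both relationships.
stride_points = [(stride, height) for height, stride, _ in ROWS if stride is not None]
forearm_points = [(forearm, height) for height, _, forearm in ROWS]

stride_mse = mean_squared_error(stride_points)
forearm_mse = mean_squared_error(forearm_points)
better = "forearm" if forearm_mse < stride_mse else "stride"

print(f"Average squared error, height vs. stride:  {stride_mse:.1f}")
print(f"Average squared error, height vs. forearm: {forearm_mse:.1f}")
print(f"Model with the smaller average squared error: {better}")
```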

Now that you have a method for selecting between two models based on different independent variables, read the following scenario and, together with the members of your group, solve the problem.


The Missing Manatee

A school’s mascot is stolen (see Figure 12). The thief has left clues: a plain black sweater and a set of footprints under a window. The footprints appear to have been made by a man’s sneaker.

The distance between the footprints, from the back of the heel on the first footprint to the back of the heel on the second, reveals that the thief’s steps are approximately 58 cm long. The thief’s forearm length can be estimated from the sweater by measuring from the center of a worn spot on the elbow to the turn at the cuff. The thief’s forearm is between 26 and 27 cm.

Figure 12. The missing manatee

School officials suspect that the thief is a student from a rival high school. You have two tasks: Gather data on stride lengths of students in your class in order to estimate the thief’s height from his or her stride length. Then use the data collected from Activity 2 to predict, as accurately as possible, the height of the thief from the length of his or her forearm.

  1. With your group, discuss methods for getting reliable measurements for stride length. (If you have completed Homework 2, base your discussions on that work.) After your group discussion, the class must decide on the method that will be used to collect the footstep data. Write a brief description of the method that will be used to collect the stride-length data.


  2. See sample answer to Homework 2.

  1. Collect the data. After you finish collecting the data, add your results to Handout 1, used in Activity 2.
  1. What is a typical stride length for students in your class? Is there a difference between what is typical for a girl and a boy? Justify your answer based on the stride-length data.


  2. Sample answers for this activity are based on class data contained in the notes to the teacher.

    The mean stride length for all students in this class is 62 cm. One stride length of 79.5 cm tends to inflate the mean somewhat. The average stride length for girls is 60.34 cm or a bit more than 60 cm; for boys it’s approximately 64 cm. However, if you compare dot plots for the boys’ stride lengths and the girls’ stride lengths, a cluster of girls had stride lengths below 55.5 cm. Very few of the boys fell in this category.

  1. The footprint was made by a man’s sneaker. However, sometimes girls wear men’s sneakers, especially high-top sneakers. So you’ll need to confirm whether or not the thief was male using what you know about the thief’s stride length and forearm length.


    1. Do the class stride-length data tend to confirm that the footprints belonged to a boy and not a girl? Support your answer with statistical evidence.


    2. Sample answer:

      There is a cluster of students whose stride lengths were around 58 cm. Some were boys and some girls. The stride length does not appear to provide helpful information about the gender of the thief.

    3. Do the forearm data tend to confirm that the sweater belonged to a boy and not a girl? Support your answer with statistical evidence.


    4. Sample answer:

      Four people from this class had forearms with measurements between 26 and 27 cm. All of these students were girls. So, the forearm data tend to indicate that the thief was a girl!

      Now predict the height of the thief. You have two possible variables to consider for your prediction, stride length and forearm length. Which is better? Divide the work in Items 6 and 7 among the members of your group.

  1. Determine a relationship between height and forearm length for your actual data, first using the entire data set, then using the girls’ data, and finally using the boys’ data.


    1. Compare the three models. Is there much difference between them? Explain.


    2. Sample answer:

      Class data: y = 75.6 + 3.324x

      Boys’ data: y = 87.2 + 2.954x

      Girls’ data: y = 84.4 + 2.980x

      The slopes for the boys’ data and girls’ data are fairly close. However, the boys’ line is shifted vertically from the girls’ line. It appears, according to these two models, that boys are, on average, 2.8 cm taller than girls with similar forearm lengths.



    3. Assess the strength of the linear relationship for the model based on the class data and the models based on the single gender data. (What numeric measure will you use to assess the strength?)


    4. Sample answer:

      To assess the strength of the linear relationship, we used the average of the squared errors.

      Class data: 720.678/27 ≈ 27

      Boys’ data: 361.158/12 ≈ 30

      Girls’ data: 342.513/15 ≈ 23

      The girls’ model did better than the boys’ or the class models.

    5. Which model would give more precise predictions if, in fact, the thief were male? What if the thief were female? Justify your answer based on your data.


    6. Sample answer:

      You may do better using the class data even though you are told the thief is male. If you are fairly certain that the thief is female, use the girls’ model.

    7. Make a residual plot for each of the models. Do the dots in the plot appear to be randomly scattered, or is a clear pattern apparent? Are there any unusually large residuals?


    8. Answer: See charts below.

  1. Repeat Item 6 for the relationship between height and stride length.


  2. Sample answer:

    Class model: y = 145 + 0.399x

    Boys’ model: y = 137 + 0.590x

    Girls’ model: y = 165 + 0.019x

    Because the slope of the girls’ model is so small, it offers very little help in predicting height from stride length. Because we have decided that it is likely that the thief is a girl, we stop here and use the girls’ model from Item 6.

  1. Select the best model for the job of predicting the height of the thief. Support your selection. Finally, use this model as the basis for completing the following clue.


  2. I predict that the thief is ______ cm tall. But the thief might be as short as _____ or as tall as _____.

    Sample answer:

    Girls’ model based on forearm length: y = 84.4 + 2.980x

    I predict that the thief is between 162 and 165 cm tall. But the thief might be as short as 155 cm or as tall as 168 cm. We came to this conclusion from looking at heights corresponding to girls with forearm lengths near 26 or 27 cm.
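As a quick check on the sample answer above, the girls’ model can be evaluated at both ends of the forearm estimate. The sketch below is illustrative only; the function name predict_height is an assumption and does not appear in the student materials.

```python
# Girls' model from the sample answer: height = 84.4 + 2.980 * forearm (cm).
def predict_height(forearm_cm, intercept=84.4, slope=2.980):
    """Return the predicted height in cm for a forearm length in cm."""
    return intercept + slope * forearm_cm

low = predict_height(26)   # short end of the forearm estimate
high = predict_height(27)  # long end of the forearm estimate
print(f"Predicted height: {low:.1f} to {high:.1f} cm")
```

Rounded to the nearest centimeter, the two values agree with the 162 to 165 cm range in the sample answer. The wider "as short as / as tall as" limits come from the scatter in the class data, not from the equation.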





Homework 6—You Are What You Eat

   

Selecting a line according to the least-squares criterion often produces a line with good properties. That’s why selecting a line using the least-squares criterion is so popular. However, sometimes this line does a terrible job in describing the pattern of the data. In such cases, you may have to adjust your model so that it better describes the pattern in the data.

  1. What is the relationship between the number of calories a food actually has and how many calories people think it has? A food industry group surveyed 3,368 people, asking them to guess the number of calories in several common foods.


    Food                                Guessed calories    Correct calories
    8 oz whole milk                     196                 159
    5 oz spaghetti with tomato sauce    394                 163
    5 oz macaroni with cheese           350                 269
    One slice of wheat bread            117                 61
    One slice white bread               136                 76
    2-oz candy bar                      364                 260
    Saltine cracker                     74                  12
    Medium-size apple                   107                 80
    Medium-size potato                  160                 88
    Cream-filled snack cake             419                 160

    Table 15. Guessed calories and actual calories
    [USA TODAY, October 12, 1983]



    1. The goal is to predict the guessed calories from the actual calories. Enter the data into your calculator and make a scatter plot with this in mind.


    2. Answer: See plot below.

    3. Describe in words the most important features of the scatter plot.


    4. There is a positive relationship between the variables. However, there are two points that appear to be outliers. Those have been circled. The pattern of the rest of the points appears to have a linear form.

    5. Find the regression line for predicting guessed calories from actual calories. Then make a residual plot. Does the regression line adequately describe these data?


    6. Regression line fit to entire data set: There are two outliers that appear to be pulling the regression line up. Most of the residuals are negative. The regression line does not appear to adequately describe these data.

    7. Would you classify any of the data as outliers? If so, identify them. What do they tell you?


    8. Yes, the entries for cream-filled snack cake and spaghetti. People guessed that they had many more calories than they actually did.

    9. If you found outliers, remove them and recalculate the regression line. Compare your new equation to the one from Item 3.


    10. Regression line after removal of the outliers. The residual plot looks more randomly scattered.

    11. Do the calories in a food enable us to predict accurately what people will guess? Explain.


    12. The size of the largest error was approximately 30 calories. The line appears to do a good job in predicting the guessed calories from the actual calories, provided the spaghetti and cream-filled snack cakes are removed from the data.

    13. Interpret the meaning of the slope of your model for predicting guessed calories from actual calories.


    14. The slope was 1.14. This means that the guessed calories will increase by 1.14 each time the actual calories increase by one.
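For teachers who want to check these answers away from the calculator, the sketch below refits the line after dropping the two outliers. It assumes the ordinary least-squares formulas; the short food labels and the helper name least_squares are illustrative choices, not part of the unit.

```python
# (actual calories, guessed calories) for each food in Table 15
foods = {
    "whole milk": (159, 196),
    "spaghetti": (163, 394),
    "macaroni": (269, 350),
    "wheat bread": (61, 117),
    "white bread": (76, 136),
    "candy bar": (260, 364),
    "saltine": (12, 74),
    "apple": (80, 107),
    "potato": (88, 160),
    "snack cake": (160, 419),
}

def least_squares(points):
    """Return (slope, intercept) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Drop the two foods identified as outliers, then refit.
outliers = {"spaghetti", "snack cake"}
kept = [pair for name, pair in foods.items() if name not in outliers]
slope, intercept = least_squares(kept)

# Residual = guessed calories minus the calories predicted by the line.
residuals = [guess - (intercept + slope * actual) for actual, guess in kept]
largest = max(abs(r) for r in residuals)
print(f"slope = {slope:.2f}, intercept = {intercept:.1f}")
print(f"largest residual (absolute value) = {largest:.1f}")
```

The slope may differ from the sample answer in the second decimal place, depending on how the calculator output was rounded.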

  1. A young swimmer’s favorite stroke is the butterfly. Her times are listed below.


  2. Race Number

    25-Yard Butterfly Time
    (seconds)

    1 60.81
    2 66.11
    3 47.32
    4 42.69
    5 43.40
    6 44.82
    7 42.67
    8 45.17
    9 41.20
    10 43.68
    11 42.47
    12 41.74
    13 40.40
    14 42.90

    Table 16. Butterfly times

    1. Use your calculator to make a scatter plot of the data. Describe the nature of the relationship between time and race number. Are any outliers apparent? If so, describe their general location relative to the non-outliers in the scatter plot.


    2. As race number increases, time tends to decrease. The data look fairly linear with the exception of two points in the upper left corner.

    3. What is the least-squares line for these data? Use your calculator to make a residual plot. Based on the residual plot, does this linear model appear to describe the data adequately? Explain.


    4. The charts below represent (from left to right) regression equation, least-squares line, and residual plot.



      The dots in the residual plot appear to have a strong pattern (sort of like a check mark). This indicates that this model is not adequate to describe the relationship between time and race number.

    5. In the scatter plot that you observed in part (b), you should have noted two outliers. These correspond to the times for the first two races. (This swimmer had just started swimming butterfly, and her times were unusually slow.) What effect do these points have on the least-squares line? How can the least-squares criterion be used to explain why these points had this effect?


    6. Since the least-squares line tries to make the sum of the squared distances between the data and the line as small as possible, these outliers have the effect of tilting the line in their direction.

    7. How do you think the least-squares line would change if the two outliers were removed from the data? Explain your reasoning.


    8. If these points were removed, the slope of the line would increase. (The absolute value of the slope is smaller, but because this slope is negative, the slope increases.)

    9. Remove the outliers from the data. What is the equation for the least-squares line now? Again, use your calculator to make a residual plot. Based on the residual plot, does this linear model appear to describe the data adequately (with the exception of the outliers)? Explain.


    10. The dots in the residual plot appear to be randomly scattered.

      Therefore this linear equation adequately describes the relationship between time and race number.

    11. Based on the model from part (e), predict the swimmer’s swim time for her 15th race. For her 16th race. On average, by how much are her times decreasing from race to race? Can this pattern continue indefinitely? Explain.


    12. Time for 15th race, 41.08; for 16th race, 40.75.

      Her times go down by approximately 0.33 second (about 1/3 of a second).

      Eventually, her times must start to level off at some value. Otherwise she would eventually end up swimming the length of a pool in a negative amount of time.
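The predictions above can be reproduced with a short script. This is a sketch under the assumption that the model is the ordinary least-squares line fitted to races 3 through 14; the names fit_line and predict are illustrative.

```python
# Butterfly times from Table 16, indexed by race number 1..14.
times = [60.81, 66.11, 47.32, 42.69, 43.40, 44.82, 42.67,
         45.17, 41.20, 43.68, 42.47, 41.74, 40.40, 42.90]
races = list(range(1, len(times) + 1))

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Remove the first two races (the outliers) before fitting.
slope, intercept = fit_line(races[2:], times[2:])

def predict(race_number):
    """Predicted time in seconds for a given race number."""
    return intercept + slope * race_number

print(f"slope = {slope:.2f} seconds per race")
print(f"race 15: {predict(15):.2f} s, race 16: {predict(16):.2f} s")
```

Evaluating predict at a large race number gives an implausibly small time, which is one way to show students why the linear pattern cannot continue indefinitely.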





Assessment—Cattle Stocks

   

Some archaeologists gather information about the dietary habits of ancient societies. Animal bones give the archaeologists clues about the kinds of meat that were eaten by such groups. Sometimes scientists find enough bones to determine the size of the whole animal. Most of the time, however, only a few bones are found. If they find the leg bone (metacarpus) of a cow, archaeologists can predict its height (Figure 1).

Figure 1. "Reconstructing" a cow from its leg bone

Figure 2. Linear relationship between h and m

  1. Does extension of the graph make any sense for the real situation that is described by the graph? Explain your answer.


  2. The drawn line is based on data. For other values of m the relation is not known. Smaller and larger values of m may be impossible: the height of a cow is limited. The most extreme value (m = 0) should predict a value of 60 for h !!

  1. Find the equation of this line. Write it in the form h = __m + __. (Note here that m is one of the variables, not the slope of the line.)


  2. h = 3*m + 60

  1. Suppose a scientist finds a metacarpus that is 17 cm long. What was the height of the cow?


  2. 111 cm

  1. If a cow is 85 cm tall, how long is its metacarpus?


  2. 3m = 25, so m = 25/3. An answer giving the nearest integer (8) is correct also.

    In a certain excavation, archaeologists found bones of cattle that didn’t fit their expectations. The relationship between the metacarpus and the total height of the cow was different from that for the other cow bones. These animals were apparently of another stock. To distinguish between the two stocks, call the original stock "A" and the new stock "B."

    The equation that describes the relationship between metacarpus and height for stock B is:

    h = 5m + 15.

    This equation is reliable only for 5 ≤ m ≤ 30.

  1. Draw the line for stock B in Figure 2.


  2. Answer: Chart below


  1. In theory, it is possible to tell to which stock a cow belongs when the length of the metacarpus and the height are known. As you can see in Figure 2, at the point of intersection of the two lines it is not possible to identify the stock related to a metacarpus length near 22 cm. Find the exact value of m and h where the lines intersect.


  2. The difference between A and B is 35 for m = 5. For every extra cm, the difference shrinks by 2 cm, so it takes 17.5 steps to arrive at the same value of h. Therefore, the values are: m = 22.5 and h = 127.5. An accurate estimation from the graph (together with a check on the answer) is good too. We don’t expect students to solve a "system of equations" in the traditional way.
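The step-counting argument in this answer can be written out as a few lines of arithmetic. The sketch below is only an illustration; the function names stock_a and stock_b are assumptions.

```python
def stock_a(m):
    """Height (cm) predicted for stock A from metacarpus length m (cm)."""
    return 3 * m + 60

def stock_b(m):
    """Height (cm) predicted for stock B from metacarpus length m (cm)."""
    return 5 * m + 15

start = 5
gap = stock_a(start) - stock_b(start)  # difference in height at m = 5
closing_rate = 5 - 3                   # the gap shrinks by 2 cm per extra cm of m
steps = gap / closing_rate             # number of 1-cm steps until the lines meet

m_cross = start + steps
h_cross = stock_a(m_cross)
print(m_cross, h_cross)
```

Checking that stock_b(m_cross) gives the same height is the "check on the answer" mentioned above.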

    In reality, the relationship between metacarpus and height is not as well defined as is suggested by the graphs of the linear equations. Therefore, it is possible that the length of the metacarpus and the height of a stock-A or -B cow don’t exactly fit the linear equation of stock A or B. Remember that the equations of the two straight lines were found as the least-squares lines from scatter plots.

  1. Suppose that an archaeologist found a metacarpus that was 15 cm long. Based on investigations of the other bones found at the same site, the height of this cow was estimated to be 108 cm. To which stock did this cow belong: A or B? Explain your answer.


  2. Stock A. The expected value should be 105 for stock A and 90 for stock B. So the real value is the nearest to the one of stock A.

  1. Answer Item 7 for m = 10 and h = 70.


  2. The value of h is now between the predicted values for stock A and B: 90 and 65. It is nearer to the value of stock B.

    Because of the scatter, it is sometimes difficult to tell to which stock a cow belongs. For these data, the scatter around the least-squares line is relatively small. Ignoring a few outliers, residual errors rarely exceed 2 cm (positive or negative). So you can say that there is a region around the line where all dots of the scatter can be found.

    Figure 3. Magnification of a section of the graph from Figure 2

    Figure 3 shows an enlarged portion of Figure 2. The line for cattle stock A is drawn, together with the region of variability around that line.

  1. Interpret the shaded area that is drawn in Figure 3.


  2. The top line of the shaded area is parallel to the given line for stock A. The vertical distance to this line is exactly 2 cm. This can be checked for m = 21: the value of h is 125, and that is 2 more than the value on the given line. The lower straight line is exactly 2 cm below the given line. So the shaded area shows the region of variability around the line for stock A.

  1. Complete Figure 3 by adding the line and the region that belong to cattle stock B.
  1. Use Figure 3 to find out the values of m and h that do not help in identifying the stock of the animal.


  2. The part of the picture that is shaded twice is the "problematic" region. From Figure 2 it is clear that problems arise for values of m between 20.5 and 24.5. We don’t expect exact borders, although they can be found by intersecting two lines (two times) with equations that can be derived from the given lines.
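For reference, the exact borders can be found by intersecting the edges of the two bands. The sketch below assumes each band extends 2 cm above and below its line, as described for stock A; the helper name crossing is illustrative.

```python
TOLERANCE = 2  # cm above and below each least-squares line

def crossing(slope1, intercept1, slope2, intercept2):
    """Return m where h = slope1*m + intercept1 meets h = slope2*m + intercept2."""
    return (intercept1 - intercept2) / (slope2 - slope1)

# Stock A: h = 3m + 60.  Stock B: h = 5m + 15.
# The overlap begins where B's upper edge reaches A's lower edge ...
m_start = crossing(3, 60 - TOLERANCE, 5, 15 + TOLERANCE)
# ... and ends where B's lower edge passes A's upper edge.
m_end = crossing(3, 60 + TOLERANCE, 5, 15 - TOLERANCE)
print(m_start, m_end)
```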

    Archaeologists are interested not only in the kinds of food that ancient people consumed, but also in the quantities. Using the height of a cow, archaeologists can roughly estimate the total amount of meat. The relationship between height and total amount of meat is almost linear. Figure 4 shows the data for cattle of the two stocks, A and B.


Height (cm)    Amount of meat for stock A (kg)    Amount of meat for stock B (kg)
110            400                                380
120            470                                435

Figure 4. Amount of meat given the height and stock


  1. The length of a metacarpus is 21 cm. This bone can belong to a cow of either stock A or stock B. Make an accurate estimation of the amount of meat for both possibilities.


  2. Linear extrapolation using the table. For stock A: m = 21 means h = 123 cm. For every cm increment of the height, the amount of meat grows by 7 kg. So, for h = 123, there is 470 + 3*7 = 491 kg. In the same way you find for stock B: h = 120 and the amount of meat is given directly in the table: 435 kg.
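The same estimate can be sketched in code. It assumes, as the answer does, that the amount of meat changes linearly with height between and beyond the two table rows; the function name meat_kg is illustrative.

```python
def meat_kg(height, low_row, high_row):
    """Estimate meat (kg) from height (cm) using the line through two table rows."""
    (h1, kg1), (h2, kg2) = low_row, high_row
    rate = (kg2 - kg1) / (h2 - h1)  # kg of meat per cm of height
    return kg1 + rate * (height - h1)

m = 21                     # metacarpus length in cm
height_a = 3 * m + 60      # stock A line
height_b = 5 * m + 15      # stock B line

meat_a = meat_kg(height_a, (110, 400), (120, 470))
meat_b = meat_kg(height_b, (110, 380), (120, 435))
print(height_a, meat_a)
print(height_b, meat_b)
```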





Unit Project—Who Am I?

Reconsider the data from Bones 1 and Bones 2 (in the preparation reading). The table in Figure 1 contains the information about those bones.

Bones 1
(Possibly female)

Bones 2
(Taller of the two)

Uncertain

Femur: 413, 414

Femur: 508

Skull: 230

Ulna: 228

Ulna: 290

Humerus: 357

   

Radius: 215

   

Tibia: 416

Figure 1. Classification of bones discussed in preparation reading

Table 1 contains actual data from the Forensic Anthropology Data Bank (FDB) at the University of Tennessee.

The FDB contains metric, nonmetric, demographic, and other kinds of data on skeletons from all over the United States. These individuals most likely came through the medico-legal channels as unidentified bodies, then went to forensic anthropologists for analysis and identification.

1 168 307 240 258 448 384 368

1 178 336 247 261 463 404 390

1 161 294 213 227 413 335 322

1 155 324 262 279 465 395 375

1 165 314 243 258 432 364 364

1 168 303 223 244 441 355 342

1 165 311 231 254 436 362 360

1 173 312 248 266 483 405 401

1 165 322 229 246 448 368 352

1 163 298 221 245 443 355 361

1 153 280 218 234 410 345 344

1 165 294 220 235 448 354 353

1 170 311 235 253 440 360 347

1 160 316 214 226 437 356 348

1 159 292 223 233 419 346 336

1 163 315 228 251 438 356 347

1 165 303 237 249 451 356 348

1 165 308 234 248 439 348 344

1 165 315 227 240 448 363 353

1 175 316 244 260 473 390 374

1 180 333 256 278 475 391 381

1 168 321 230 248 450 365 362

1 163 299 219 236 435 357 339

1 165 304 246 264 467 392 383

1 160 309 236 248 432 364 358

1 158 319 246 268 442 371 364

1 165 325 242 250 448 378 365

1 170 335 248 263 474 400 382

1 182 334 254 273 514 420 407

1 165 307 230 248 452 363 355

1 163 297 240 260 435 356 356

1 143 282 216 233 398 334 318

1 154 297 228 248 423 344 334

1 171 342 272 290 485 418 407

1 162 303 237 262 433 367 364

1 150 308 220 247 383 352 341

1 157 288 201 215 429 363 350

1 158 314 239 263 432 371 358

1 162 306 250 268 444 355 352

1 159 310 238 255 449 362 352

2 169 337 254 273 460 396 385

2 153 296 223 243 407 337 338

2 175 339 256 271 470 390 381

2 179 343 242 263 464 378 371

2 179 352 253 269 484 407 397

2 198 354 263 292 508 417 412

2 173 327 256 276 463 383 387

2 180 357 268 278 494 401 390

2 178 344 254 269 464 371 366

2 175 339 245 272 456 374 366

2 177 343 250 266 483 361 365

2 180 353 260 281 490 420 415

2 170 303 235 249 435 366 361

2 191 364 263 278 511 430 417

2 188 349 269 288 498 427 423

2 179 323 256 276 486 398 400

2 180 350 263 280 480 419 418

2 181 350 263 282 488 391 381

2 178 337 272 272 475 393 390

2 172 344 255 281 470 400 393

2 188 360 269 283 510 422 416

2 189 347 272 283 547 432 445

2 177 330 246 262 462 386 370

2 166 322 242 258 442 373 374

2 186 332 267 283 478 391 388

2 177 322 245 265 457 397 395

2 176 332 259 274 458 382 378

2 180 323 251 275 448 390 387

2 173 335 253 273 497 404 389

2 175 330 253 274 470 384 382

2 169 313 252 265 472 391 385

2 175 336 256 274 464 388 377

2 181 390 284 303 521 440 435

2 193 356 297 318 522 451 433

2 182 362 275 293 499 424 405

2 169 322 249 266 426 366 356

2 180 337 265 281 482 412 399

2 185 363 286 302 520 429 420

2 180 355 274 292 490 422 424

2 170 378 272 291 512 404 390

2 180 370 278 292 523 429 420

2 175 333 260 273 484 398 386

2 168 342 262 280 484 404 385

2 170 347 269 291 476 396 393

2 166 315 240 260 456 377 362

2 185 363 295 309 524 446 427

2 191 382 299 316 537 479 466

Table 1. Data from Forensic Anthropology Data Bank (FDB)

Key to Data (in order from left to right): sex (1 = female, 2 = male), height (cm), humerus (mm), radius (mm), ulna (mm), femur (mm), tibia (mm), fibula (mm)

Use the data in Table 1 to answer the following items. Present your findings in a formal report. All of your conclusions must be supported by statistical analysis using the data in Table 1.

  1. Determine several models to predict people’s height from the lengths of various long bones in their arms and legs. Explain which of these models you would prefer to use and why.
  2. Based on these data, do you agree that Bones 1 is female? Do the data provide any information that would help you determine whether Bones 2 is male or female?
  3. Determine relationships between pairs of long bones that would help you decide whether the bones in the "uncertain" column belong to Bones 1 or Bones 2. (Or is there strong evidence that one of these bones belongs to a third person?)
  4. Finally, predict the heights of Bones 1 and Bones 2. Explain why you chose the model that you did to make your predictions.
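Students who prefer a computer to a calculator could organize the analysis along the following lines. This is only a sketch: it copies just the first five rows of Table 1, so its output should not be used as a conclusion, and the names fit_line and sum_squared_errors are assumptions, not part of the project.

```python
# Columns follow the key to Table 1: sex, height (cm), humerus, radius,
# ulna, femur, tibia, fibula (bone lengths in mm).
# Only the first five rows are copied here; enter the rest before analyzing.
rows = [
    (1, 168, 307, 240, 258, 448, 384, 368),
    (1, 178, 336, 247, 261, 463, 404, 390),
    (1, 161, 294, 213, 227, 413, 335, 322),
    (1, 155, 324, 262, 279, 465, 395, 375),
    (1, 165, 314, 243, 258, 432, 364, 364),
]
HEIGHT, FEMUR = 1, 5  # column positions within each row

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared residuals (actual minus predicted)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

femurs = [row[FEMUR] for row in rows]
heights = [row[HEIGHT] for row in rows]
slope, intercept = fit_line(femurs, heights)
sse = sum_squared_errors(femurs, heights, slope, intercept)
print(f"height = {intercept:.1f} + {slope:.3f} * femur, SSE = {sse:.1f}")
```

Changing FEMUR to another column position gives a model for a different bone, and filtering rows on the first column gives separate models for females and males.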




Mathematical Summary—Scatter plots

In many situations, one is confronted with questions such as "Are values of quantity 1 related to values of quantity 2?" For example, a forensic scientist might ask, "Is height related to femur length?" In general, such questions suggest the use of graphs called scatter plots.

Since the question implies that one quantity might help predict values of the other quantity, it is common to refer to the quantities as the independent and dependent variables, respectively. A scatter plot is a graph in which the dependent variable’s values are represented on the vertical axis and the independent variable’s values are represented on the horizontal axis. This is also referred to as a graph of the dependent variable versus the independent variable.

A scatter plot is an ideal tool in looking for patterns in a relationship between two quantities.

Linear Relationships

Linear relationships between two variables can be described by graphs, equations, tables, and arrow diagrams.

Two common forms for the equation of a line are

the slope-intercept form, y = mx + b,

and

the point-slope form, y – k = m(x – h).

Given any two points on a line, you can determine the value of m, the slope of the line, by computing the ratio Δy/Δx between the two points. Note that, in each of these forms, the slope appears as the number multiplying the independent variable, x.


Equivalence

Two linear equations are equivalent if they have the same slope and both pass through the same point.

For example, the graph of the equation

y – 5 = 3(x – 1)

is a line that passes through the point (1,5) and has slope 3. The equation

y = 3x + 2

is an equivalent equation because its slope is also 3 and the point (1,5) is a solution:

y = 3(1) + 2
y = 5

So its graph also passes through the point (1,5). You could verify that fact by substituting the pair (1,5) for x and y into each equation.
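One way to see the equivalence numerically is to evaluate both equations at several values of x. The sketch below is illustrative; the function names are assumptions.

```python
def point_slope(x):
    """y from y - 5 = 3(x - 1), solved for y."""
    return 5 + 3 * (x - 1)

def slope_intercept(x):
    """y from y = 3x + 2."""
    return 3 * x + 2

# Both forms pass through (1, 5) ...
print(point_slope(1), slope_intercept(1))

# ... and agree at every other x tried, as equivalent equations must.
agree = all(point_slope(x) == slope_intercept(x) for x in range(-5, 6))
print(agree)
```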


Fitting and Evaluating Equations

The main question of this unit is, "How can you identify and describe a relationship between two variables so that you can predict values of one variable from values of the other?"

First, collect data on the two variables. As noted above, a scatter plot is a useful display for gaining insight into possible relationships. From the scatter plot, check the direction (positive, negative, or neither) and the form (linear or nonlinear) of the relationship.

If a scatter plot has a linear form, you can "fit" a line to the data and use the equation of your line to make predictions. The principal tool in evaluating the fit of your line is the set of residual errors–the differences between the actual and predicted values of the dependent variable. Different criteria based on the residual errors can be used to determine the "best-fitting" line. Unfortunately, the "best-fitting" line according to one criterion is not always the best according to another. However, a "good" fit should always have residuals that are randomly scattered around the horizontal axis.

One of the most commonly used criteria for determining the "best-fitting" line is called the least-squares criterion. The least-squares line has the smallest sum of the squared errors (residuals). Also referred to as the regression line, it is popular because it generally does a good job of describing data that have a linear form. However, when outliers are present or when the scatter plot does not have a linear form, the least-squares line, or any other line, can do a very poor job of describing the pattern of a scatter plot.

A plot of the residuals versus the independent variable can be very helpful in spotting outliers or nonlinear data. Such plots can display outliers more prominently than a scatter plot of the original data. Also, if the data have a nonlinear form, a residual plot will show a strong pattern.

When outliers are present, removing the outliers and refitting a linear model to the remaining data may produce a better prediction model. However, when data have a nonlinear form, no line will adequately describe the pattern of the data. In this situation, look for a different kind of model.


The Precision of a Prediction

The precision of a prediction is linked to the variability inherent in the data. For example, suppose you had the following data on student heights (in cm): 150, 152, 154, 156, 158. If you were asked to predict the height of a student in this group, you might decide to choose the mean height of 154 cm for your prediction. In this case, the actual height could be as short as 150 cm or as tall as 158 cm; so you could be as far off as 4 cm. You can use a similar approach when dealing with relationships between two variables by examining the variability in the residuals.
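The height example can be worked through in a few lines. This sketch simply restates the arithmetic in the paragraph above; the variable names are illustrative.

```python
heights = [150, 152, 154, 156, 158]  # student heights in cm

prediction = sum(heights) / len(heights)               # the mean height
worst_error = max(abs(h - prediction) for h in heights)

print(f"prediction = {prediction} cm")
print(f"could be off by as much as {worst_error} cm")
```

Replacing the heights with a list of residuals, and the mean with zero, gives the analogous check for a two-variable model.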


Choosing Between Two Linear Models

In some situations, you may have two independent variables that are linearly related to the same dependent variable. In this case, it is generally best to base your predictions on the independent variable that has the stronger linear relationship with the dependent variable. Strong relationships have low variability, so one way of determining the strength of the linear relationship is to use the sum of the squared errors. For example, you could select the least-squares line associated with the independent variable that has the smaller sum of squared residuals. If the data on the two independent variables contain different numbers of observations, select the least-squares line associated with the independent variable that has the smaller average squared error.
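The comparison described above might be sketched as follows. The two residual lists are hypothetical values made up for illustration (they are not class data), and the function name average_squared_error is an assumption.

```python
def average_squared_error(residuals):
    """Sum of squared residuals divided by the number of observations."""
    return sum(r * r for r in residuals) / len(residuals)

# Hypothetical residuals (cm) from two least-squares lines that predict
# the same dependent variable. The lists have different lengths, so the
# average squared error is compared rather than the raw sum.
residuals_by_model = {
    "forearm": [1.5, -2.0, 0.5, -1.0, 1.0],
    "stride": [4.0, -3.0, 5.0, -6.0],
}

scores = {name: average_squared_error(r) for name, r in residuals_by_model.items()}
better = min(scores, key=scores.get)
print(scores)
print("Prefer the model based on:", better)
```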





Key Concepts

Dot plot: Display in which dots are placed above a number line to represent the values of data for a single variable

Independent variable: The variable on which a prediction is based; the variable that "explains" the dependent variable. Mathematicians frequently use the letter "x" to represent this in noncontextual situations.

Least-squares criterion: Choose the line with the smallest sum of squared errors (SSE).

Least-squares line: The line that satisfies the least-squares criterion

Linear equation: An equation relating two variables, x and y, that can be put in the form y = mx + b

Linear form: The form of a scatter plot for which it is possible to draw a line that describes the general flow of the data

Linear regression: Fitting a line to data using the least-squares criterion

Negative relationship: A relationship between two variables in which one variable tends to decrease while the other increases

Nonlinear form: The form of a scatter plot on which the general flow of the data is not well described by a straight line

Outlier: In a collection of data, an individual data point that falls outside the general pattern of the other data

Point-slope form: y – k = m(x – h); a form for a linear equation where (h, k) is a point on the line and m is the slope of the line

Positive relationship: A relationship between two variables in which both variables tend to increase together

Regression: Fitting lines or curves to data

Residual errors: Actual value of the dependent variable minus the predicted value

Residual plot: A scatter plot of the residuals versus the independent variable

Dependent variable: The variable that is to be predicted; the variable that "responds" to changes in the independent variable. Mathematicians frequently use the letter "y" to represent this in noncontextual situations.

Scatter plot: A plot of ordered pairs of data

Slope-intercept form: y = mx + b; a form for a linear equation where m is the slope and b is the y-intercept

SSE: The sum of the squared errors

Strong relationship: A scatter plot of the data lies in a narrow band.

Weak relationship: A scatter plot of the data does not lie in a narrow band; the points are more scattered.

Versus: When used in the phrase y versus x, it describes a scatter plot of y and x in which y is the dependent variable and x is the independent variable.




Solution to Short Modeling Practice






Solution to Christmas Tree Farming



Table 3

Elevation    Mean Max Temp    Mean Min Temp
1,000'       67.3°F           40.0°F
2,000'       63.8°F           36.5°F
3,000'       60.3°F           33.0°F
4,000'       56.8°F           29.5°F
5,000'       53.3°F           26.0°F
6,000'       49.8°F           22.5°F
7,000'       46.3°F           19.0°F

2.

Yes, both the maximum and minimum temperature-elevation relationships are linear.

Yes

Tract 1

2,000-ft elevation

mean max temp = 63.8°F
mean min temp = 36.5°F

 

3,100-ft elevation

mean max temp = 60.0°F
mean min temp = 32.7°F

Tract 2

3,200-ft elevation

mean max temp = 59.6°F
mean min temp = 32.3°F

 

4,200-ft elevation

mean max temp = 56.1°F
mean min temp = 28.8°F

Tract 3

5,500-ft elevation

mean max temp = 51.6°F
mean min temp = 24.3°F

Tract 1 = Douglas fir

Tract 2 = Douglas fir and noble fir

Tract 3 = Noble fir

Yes, Tract 2 is a favorable site for both of these species of trees.
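The tract temperatures listed above follow from the linear pattern in Table 3, where both temperatures fall 3.5°F for each 1,000 ft of elevation. The sketch below assumes that pattern holds between the tabulated elevations; the function name mean_temps is illustrative.

```python
def mean_temps(elevation_ft):
    """Return (mean max, mean min) temperature in °F at a given elevation."""
    # Table 3: 67.3°F max and 40.0°F min at 1,000 ft, dropping 3.5°F per 1,000 ft.
    drop = 3.5 * (elevation_ft - 1000) / 1000
    return 67.3 - drop, 40.0 - drop

for elevation in (2000, 3100, 3200, 4200, 5500):
    high, low = mean_temps(elevation)
    print(f"{elevation:>5} ft: max {high:.2f}°F, min {low:.2f}°F")
```

The printed values match the tract figures once rounded to one decimal place.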


Solutions to Practice and Review Problems



Exercise 1


Exercise 2


Exercise 3

  1. F1D1 = F2D2

    0.2F1 = 500D2

    F1 = 500D2/0.2 = 2500D2

  2. F1 = 2500 * 18 = 45,000 g or 45 kg


Exercise 4


Exercise 5


Exercise 6



  1. X

    Water Pressure Versus Depth        Air Pressure Versus Altitude

    Depth (feet)   Pressure (Atm)      Altitude (1000 ft)   Pressure (Atm)
    0              1                   0                    1.00
    33             2                   10                   0.69
    66             3                   20                   0.46
    100            4                   30                   0.30
    133            5                   40                   0.19
    166            6                   50                   0.11
    200            7                   60                   0.07
    300            10
    400            13
    500            16

  2. The graph of pressure versus depth is linear.

Exercise 7

  1. 83 + 88 + 79 + 69 + 75 + 84 + 87 + 85 + 78 + 80 + 84 + 86 = 978
    978 ÷ 12 = 81.5

  2. 69 75 78 79 80 83 84 84 85 86 87 88
    (83 + 84) ÷ 2 = 167 ÷ 2 = 83.5
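Both results can be confirmed with Python's standard statistics module. The sketch is illustrative; the variable name values is an assumption, since the exercise statement is not reproduced here.

```python
from statistics import mean, median

values = [83, 88, 79, 69, 75, 84, 87, 85, 78, 80, 84, 86]

print(sum(values))     # total of the twelve values
print(mean(values))    # total divided by 12
print(median(values))  # average of the two middle values after sorting
```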

Exercise 8


Exercise 9

  1. W = 2.7d, where W is the weight gain in pounds and d is the number of days. Solve for d: d = W/2.7.

  2. F = 22.3d, where F is the feed consumption in pounds, and d is the number of days. Solve for d: d = F/22.3.

  3. W/2.7 = F/22.3

    22.3W = 2.7F

    F = (22.3/2.7)W

  4. F = (22.3/2.7)(50) ≈ 412.96, or about 413 lb of feed.

  5. 10 is greater than 5; –10 is farther from zero than –5.


Exercise 10


Exercise 11


Exercise 12

  1. Probability ≈ 0.00273

  2. The theoretical probability is also ≈ 0.00273.

  3. (0.00273) (0.00273) ≈ 0.0000075, or about 0.00075%.


Exercise 13

  1. P(Small and large) = (0.030) (0.018), or about 0.054%.

  2. It is likely that only one bottle has the match-up and tends to leak for this reason.


Exercise 14

  1. Since each mouse costs $4.90, the cost for n mice is…
    P = 4.90n

    where P is the cost to purchase the mice in dollars, and
       n is the number of mice purchased.

  2. The cost of breeding is the sum of the overhead costs and the food cost.

    B = 830 + 0.60n
    where B is the cost in dollars of breeding the mice per week, and
       n is the number of mice.

  3. Equate P and B, to find what value of n makes them equal.

    4.90n = 830 + 0.60 n

    Then solve for n. Subtract 0.60 n from both sides.

    4.90n – 0.60 n = 830 + 0.60 n – 0.60 n
    4.30n = 830

    Then, divide both sides by 4.30.

    n = 193 mice (rounded)

  4. If, for example, 200 mice were needed, the cost to breed them would be

    B = 830 + 0.60 (200)
    B = 950.00, or $950.00

    while the cost for purchasing would be

    P = 4.90 (200)
    P = 980.00, or $980.00

    Thus, it would be cheaper to breed the mice if more than 193 mice per week were needed.
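The break-even calculation can be checked with a short script. This is a sketch using the two cost equations from the solution; the function names are illustrative.

```python
def purchase_cost(n):
    """Weekly cost in dollars of buying n mice at $4.90 each."""
    return 4.90 * n

def breeding_cost(n):
    """Weekly cost in dollars of breeding n mice: $830 overhead plus $0.60 each."""
    return 830 + 0.60 * n

# Setting 4.90n = 830 + 0.60n gives n = 830 / (4.90 - 0.60).
break_even = 830 / (4.90 - 0.60)
print(f"break-even at about {break_even:.0f} mice")

# Compare the two costs at 200 mice, as in the solution.
print(f"breed 200: ${breeding_cost(200):.2f}, buy 200: ${purchase_cost(200):.2f}")
```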


Exercise 15

  1. L = 40T + 110

    where L is the cost for labor and equipment in dollars, and
       T is the time required, in hours.

  2. You know that 1 bag of sand is used in 10 minutes. A proportion can be solved to find how much sand is used in 60 minutes (one hour).

    Number of bags = (1 bag × 60 min) ÷ 10 min
    Number of bags = 6 bags

    Thus, since 6 bags are used each hour, the sand will cost $24 (6 bags × $4 per bag) each hour. So, …

    S = 24 T

    where S is the cost for sand in dollars, and
       T is the number of hours the sandblaster is used.

  3. The total cost, C, will be S + L, or

    C = 24T + 40T + 110
    C = 64T + 110

    which is in standard form. The variables are C and T, the coefficient is 64, and the constant is 110.

  4. For 2 hours, T = 2,

    C = 64 (2) + 110
    C = 238

    and the cost is $238.

    For 4 hours, T = 4,

    C = 64 (4) + 110
    C = 366

    and the cost is $366.

    For 6 hours, T = 6,

    C = 64 (6) + 110
    C = 494

    and the cost is $494.
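The three totals can be generated from the cost equations directly. The sketch below is illustrative; the function names are assumptions.

```python
def labor_cost(hours):
    """Labor and equipment cost in dollars: L = 40T + 110."""
    return 40 * hours + 110

def sand_cost(hours):
    """Sand cost in dollars: 6 bags per hour at $4 per bag, so S = 24T."""
    return 24 * hours

def total_cost(hours):
    """Total cost in dollars: C = S + L = 64T + 110."""
    return sand_cost(hours) + labor_cost(hours)

for hours in (2, 4, 6):
    print(f"{hours} hours: ${total_cost(hours)}")
```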