Selected Response Tasks

Overview

  • Type of objectives: Conceptual understanding, skills
  • Number of students: Large group
  • Teacher prep time: Depends on the quality of the questions; questions requiring more than rote recall take much longer to prepare
  • Class time: One class period
  • Scoring time: Short
  • Scoring method: Answer key
  • Possible problems: Questions that do not reflect objectives; wrong answers keyed; does not reflect the type of work students will be called upon to do as adults; cannot probe in-depth understanding
  • Possible values: Can test a large number of students quickly; scoring is objective; can cover a wide variety of topics and skills quickly

Frequently Asked Questions

What are selected response tasks?

These are tasks in which students select an answer from given options or supply a response of one or two words. They include multiple choice, true/false, matching, and fill-in-the-blank items.

What are some of the strengths and weaknesses of selected response tasks?

Multiple choice, true/false, matching, and fill-in-the-blank assessments furnish snapshots of performance. They can be developed to assess many problem-solving skills, but generally the skills are tested in isolation. These common forms of assessment are blamed for some of the current problems in education. All too often, teacher and student alike tend to focus on "Will this be on the test?" more than "Is this important to learn?" However, the ease of administration and scoring and the statistical reliability of these items will probably keep them in use.

Multiple choice appears to be the best of the selected response assessment methods. True/false items tend to be tricky and require a tremendous number of items to maintain statistical reliability. Matching items often give away many of the answers. Fill-in-the-blank items are generally scored with a single keyed response; if a different correct response is given, it probably will not receive credit, causing student frustration. For these reasons, multiple choice will probably remain a large part of large-scale assessments.

What are the major criteria for selected response tasks?

The following are some of the common criteria used to evaluate selected response items:

  • The students who answer the item correctly should do so because they understand the material, not because they have good test-taking skills.
  • The item matches the objective it is testing.
  • The item is clearly stated.
  • The reading level of all items is appropriate.
  • The item avoids negatives, especially in the case of true/false items.
  • One item does not give away the answer to another.
  • The item is not tricky.
  • The item clearly has a correct response.
  • The number of items reflects the weight given to and time spent on various objectives.
  • The items ask for more than rote memory.

What are some of the strengths and weaknesses of true-false tests?

True-false tests are excellent for testing memorization of factual information. They are quickly prepared and scored. True-false tests require about twice as many items as a multiple choice test to have the same reliability. Test items that contain negatives are very difficult to interpret and should be avoided.

What are some of the strengths and weaknesses of matching items?

Matching items, like true-false items, tend to cover factual information but are usually a bit more difficult to construct. The use of negatives in either part of a matching exercise can cause confusion and should be avoided.

What are some of the strengths and weaknesses of fill-in-the-blank items?

Fill-in-the-blank items are easily constructed and easy to score when a key is used. Frequently, however, students answer with a correct response that is not given in the key. The answer is marked incorrect, causing many hurt feelings among parents, students, and teachers. This incorrect marking occurs much more frequently with fill-in-the-blank items than with true-false or multiple choice tests.
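
The scoring problem described above is easy to see in a short sketch. The example below is purely illustrative (the item, the keyed answer, and the score function are hypothetical, not taken from this section): strict comparison against a single keyed string gives no credit to any other defensible response.

    # Hypothetical fill-in-the-blank item scored against a single keyed response.
    key = {"The powerhouse of the cell is the ____.": "mitochondrion"}

    def score(item, response):
        # Strict comparison with the one keyed answer; any other wording fails.
        return response.strip().lower() == key[item]

    item = "The powerhouse of the cell is the ____."
    print(score(item, "mitochondrion"))  # True
    print(score(item, "mitochondria"))   # False: a defensible answer receives no credit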

What are some of the strengths and weaknesses of multiple choice?

Good thought-provoking multiple choice items can be fashioned, but this requires a great deal of time and content expertise. The difficulty of preparation can be minimized by using item models and banking good items.

What are some modifications of multiple choice items?

Some multiple choice questions have multiple correct answers. A wise test-taker treats each option as a true/false question. The method used for scoring can dramatically affect the way a test taker should answer the questions, and students who fail to understand the importance of the scoring method can be severely penalized in this form of testing.
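
To see how much the scoring rule matters, the sketch below compares two plausible schemes for a multiple-answer item; the option labels, the keyed answers, and both function names are assumptions made for this illustration. Under all-or-nothing scoring a partially correct response earns nothing, while scoring each option as its own true/false decision rewards the same response.

    # Hypothetical multiple-answer item with options A-E and keyed answers B and D.
    KEY = {"B", "D"}
    OPTIONS = ("A", "B", "C", "D", "E")

    def all_or_nothing(selected):
        # Full credit only when the selection matches the key exactly.
        return 1.0 if set(selected) == KEY else 0.0

    def per_option(selected):
        # Score each option as a separate true/false decision.
        correct = sum(1 for option in OPTIONS if (option in selected) == (option in KEY))
        return correct / len(OPTIONS)

    response = {"B"}                  # a partially correct response
    print(all_or_nothing(response))   # 0.0 -- earns nothing under one rule
    print(per_option(response))       # 0.8 -- earns most of the credit under the other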

Justified multiple choice is another modification of multiple choice questions. In this form the student is required to justify choices and, in some cases, refute incorrect options. This form of multiple choice assessment gives the teacher an opportunity to see whether the student really knows the answer or is just guessing. The disadvantage of justified multiple choice is that it has all of the problems of any multiple choice question, and it takes away the speed of scoring, which is the major strength of multiple choice.

What are some common problems with multiple choice items?

The "better" items presented in this section are still low-level items; they are better only in the sense that they improve on the poor items that precede them. The questions that follow are typical of those a test writer would ask to facilitate construction of high-quality test items.

Do the questions ask the student for more than rote memory?

Poor example:

Which of the following is a sex-linked characteristic?

  A. Blood type
  B. Eye color
  C. Color-blindness*
  D. Tongue rolling

NOTE: Look at each question and ask, "What is the question asking the student to do?" In this case, the item above asks the student to recall, and the recall is rather trivial.

Better example:

Color-blindness is a sex-linked characteristic. What are the chances of a normal male and a color-blind female having a color-blind son?

  A. 0%
  B. 25%
  C. 75%
  D. 100%*

NOTE: The answer to this question requires the student to make a prediction based on an understanding of the heredity of sex-linked characteristics. It is not necessary for the student to have memorized that color-blindness is sex-linked.
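
One way to check the keyed response is to enumerate the cross directly. The sketch below is a minimal illustration (the gamete labels are my own): a normal-vision father (X_A, Y) crossed with a color-blind mother (X_a, X_a) produces sons whose only X chromosome comes from the mother, so every son is color-blind, which is 100%.

    # Enumerate the cross of a normal-vision father with a color-blind mother.
    # X_a carries the recessive color-blindness allele; labels are illustrative.
    father_gametes = ["X_A", "Y"]
    mother_gametes = ["X_a", "X_a"]

    offspring = [(f, m) for f in father_gametes for m in mother_gametes]
    sons = [child for child in offspring if "Y" in child]   # sons receive Y from the father
    color_blind_sons = [s for s in sons if "X_a" in s]      # and their only X from the mother

    print(len(color_blind_sons) / len(sons))  # 1.0, i.e., 100% of sons are color-blind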

Do the questions really meet the objectives being taught?

Objective: Understand the transfer of heat.

Poor example:

Which temperature is equivalent to 1273 K?

  A. 100 °C
  B. 212 °F
  C. 1000 °C*
  D. –2731 °C

NOTE: If the objective had been, "identify temperature equivalents," then the sample above would match the objective. The question below requires an understanding of heat transfer and is a better match to the given objective.

Additional problems: Kelvin is not preceded by a degree sign, unlike Celsius or Fahrenheit. These types of details are often missed when writing items but are usually caught when items are reviewed.

Options A and B can be eliminated by a knowledgeable student because 100 °C and 212 °F are equivalent. Since both cannot be correct, such a student can increase the chance of choosing the correct response to 50%. As with all test items, the students who answer correctly should do so because they understand the material, not because they have good test-taking skills.
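
The arithmetic behind both observations is simple to verify; the sketch below is illustrative only (the helper function names are mine), converting 1273 K to roughly 1000 °C and confirming that options A and B name the same temperature.

    # Standard conversions: °C = K - 273.15 and °F = °C * 9/5 + 32.
    def kelvin_to_celsius(kelvin):
        return kelvin - 273.15

    def celsius_to_fahrenheit(celsius):
        return celsius * 9 / 5 + 32

    print(kelvin_to_celsius(1273))      # 999.85, about 1000 °C (option C)
    print(celsius_to_fahrenheit(100))   # 212.0, so options A and B are the same temperature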

Better example:

Which diagram correctly shows the direction of net heat flow in three metal bars?

[Four diagrams of heat flow in metal bars]

Are the questions tricky?

Poor example:

Who invented bacteria?

  A. van Leeuwenhoek
  B. Schleiden and Schwann
  C. Hooke
  D. Pasteur
  E. None of these*

NOTE: This item depends on an understanding of the term "invented" rather than an understanding of the contributions of individuals, since no one invented bacteria.

Additional problems: The use of "all of these" or "none of these" should be avoided, since these options can frequently be justified as the answer even when they are not intended to be the answer.

Better example:

Who was the first to identify microorganisms as a cause of some diseases?

  A. van Leeuwenhoek
  B. Schleiden and Schwann
  C. Hooke
  D. Pasteur*

Do the questions make it clear what the student is being asked to do?

Poor example:

Every organic compound—

  A. is produced by animals.
  B. contains carbon.*
  C. has a low melting point.
  D. is soluble in water.

NOTE: This type of poor item construction is often found in multiple choice tests. Some items may have only a single word followed by a dash, leaving the student to wonder, "What does the test maker want?" Test takers can be taught to treat this type of item as four true/false items, one for each option. The first option, for example, would be read by students as, "Every organic compound is produced by animals. True or false."

Better example:

Which of these is found in every organic compound?

  A. Nitrogen
  B. Sulfur
  C. Carbon*
  D. Oxygen

NOTE: This question still leaves one wondering why it is important to know that organic compounds contain carbon.

Do questions avoid negatives?

Poor example:

Which of these is not a characteristic of animals?

  A. They eat other organisms.
  B. They can move.
  C. They do not make their own food.
  D. They have cell walls.*

NOTE: The use of a negative in the stem and in option C makes this item doubly difficult to read and interpret. Option B is correct for some animals, but not all, making it a weak option. If negatives are used, the negative word should be in boldface or all caps. If a negative is used in the stem of the item, NO answer option should contain a negative.

Better example:

Which of these is a characteristic that all animals have?

  A. Four legs
  B. Food-getting behavior*
  C. Chlorophyll
  D. Blood

Does a question clue answers to other questions?

Poor example:

Vitamin B, found in rice hulls, helps prevent—

  A. beriberi*
  B. scurvy
  C. measles
  D. botulism

As Westerners introduced better milling techniques to remove rice hulls, natives of South Sea islands began to develop beriberi due to a lack of—

  A. calcium
  B. a vitamin*
  C. citric acid
  D. protein

NOTE: The second item can lead the student to the correct answer for the first item. Items that clue each other should not appear in the same test. Many students with good test-taking skills will get the correct answer, but not because they understand the information.

Are the responses in logical order?

Poor example:

[Graph of seed germination]

Which seeds are probably the slowest to germinate?

  A. Mango*
  B. Tomato
  C. Peach
  D. Watermelon

Better example:

Which seeds are probably the slowest to germinate?

  A. Watermelon
  B. Tomato
  C. Mango*
  D. Peach

NOTE: Options should be listed in a logical order; options with numbers, in particular, should be in ascending or descending order. This helps the student find the correct option quickly.

Does the question have one clearly correct answer?

Poor example:

Tundra and taiga biomes are found at high latitudes, but similar biomes can also be found—

  A. at high altitudes*
  B. at high longitude
  C. along coastlines
  D. in the interior of continents

NOTE: Each of the options can be justified in some way:

  A. at high altitudes* (the intended answer)
  B. at high longitude (true where the longitude line happens to intersect a high latitude)
  C. along coastlines (true where the coastline is at a high latitude)
  D. in the interior of continents (true if the interior has high mountains)

NOTE: This problem is the most difficult to detect, especially if you are writing and editing your own items. Alternate correct options are often overlooked if the person sees the answer as "obvious," if the person has a misconception, or if the incorrect options can be justified in some way. In the item above the best answer is clearly option A. However, the justification of the other options makes this type of item very difficult to defend. Only very knowledgeable reviewers can catch this type of item flaw.

If students write test items as a review for the test, the teacher can easily spot problem areas, since student-written items will often have multiple correct answers. Because of this, student test writing can become a powerful teaching and learning tool.

Do the options contain repetitious information?

Poor example:

When is the earth farthest from the sun?

  A. During the spring in the Northern Hemisphere
  B. During the summer in the Northern Hemisphere*
  C. During the fall in the Northern Hemisphere
  D. During the winter in the Northern Hemisphere

Better example:

The earth is farthest from the sun when the season in the Northern Hemisphere is

  A. Spring
  B. Summer*
  C. Winter
  D. Fall

NOTE: Students who understand the information may miss items that have long wordy options. This problem can be avoided by putting more information in the stem of the item.

Are all the options parallel in structure and degree of specificity?

Poor example:

Which of the following is a compound?

  A. Kinetic energy
  B. Blowing wind
  C. Water*
  D. A mineral

NOTE: Items that lack parallel options often contain more than one justifiable answer. For example, water is clearly the answer, but some minerals are compounds, making Option D justifiable. To minimize frustration for students who tend to ponder each answer, avoid options that can be justified in some way. All options should be parallel in grammatical construction as well as in degree of specificity.

Better example:

Which of the following is a compound?

  A. Sulfur
  B. Iron
  C. Water*
  D. Hydrogen

NOTE: This example provides one clearly correct answer and all of the options are substances.

Do the justifications for the options indicate why a student might choose the incorrect options?

A test writer should justify each of the options, if not in writing, at least mentally.

Poor example:

What is the source of MOST of the salt in the oceans?

  A. Dead organisms (wrong answer)
  B. Snow (wrong answer)
  C. Rocks*
  D. Direct sunlight (student guesses)

NOTE: "Wrong answer" or "Student guesses" are not justifications, but are frequently given by writers as rationales for using options as wrong answers. "Off the wall" options are wasted because so few students choose them.

Better example:

What is the source of MOST of the salt in the oceans?

  A. Dead organisms (formation of oil might be confused with salt formation)
  B. Glaciers (glacier melt might be thought to contain salt)
  C. Rocks*
  D. Meteorites (meteorite composition might be thought to be salt)

NOTE: The inclusion of rationales does not guarantee good items, but it does make the writer provide options that at least seem reasonable. The item might still focus on trivia, as this item does. If students are writing test items, they should provide rationales for the options. This can inform the teacher of many misconceptions students hold. It also requires students to investigate the information they are trying to learn in greater depth.

Do the items contain options that are all-inclusive?

Poor example:

What type of gene is responsible for the five-finger trait in humans?

  A. dominant
  B. recessive*
  C. sex-linked
  D. sex-influenced

NOTE: Dominant and recessive are all-inclusive. A test-wise student can narrow the choices to these two, raising the chance of guessing the correct response to 50%; without reading the question, one could guess the answer to be either dominant or recessive. Genes are generally taught as dominant, recessive, or incompletely dominant; as a result, there is not a fourth plausible option for this particular item.

Does the writer have a misconception?

Poor example:

The five-finger trait occurs more commonly in any human population than the six-finger trait. The five-finger trait is probably—

  A. dominant*
  B. recessive
  C. hybrid
  D. incomplete

NOTE: The keyed response for this item should be recessive. Review by others is necessary to reveal this type of error; no one writer can be expected to know all of the science there is to know, so multiple reviewers are a must. Student-developed questions will often alert teachers to misconceptions held by students, because students will mark the wrong answer as the correct one.

Are there opposites in the options?

Poor example:

Which of these will result if all colors are mixed?

  A. black
  B. white
  C. green
  D. yellow

NOTE: A test-wise student will narrow this item to the opposites, thus increasing the chance of getting it right by guessing to 50%. This item is also ambiguous: are the colors light or pigment? White is the correct response if light is intended, and black is the correct response if pigment is intended.

Is the wording and/or the art of the item open to more than one interpretation?

Poor example:

Which of these models shows the orbit of the Earth?

[Four depictions of Earth orbiting the sun]

NOTE: Is the answer supposed to be a mathematical or scale model? Does the view correct for perspective? When using drawings, you must constantly be aware of the problems of depicting materials and scale. Multiple reviewers help prevent interpretation problems. Everyone reading the items should be encouraged to voice their concerns, no matter how trivial they seem.

Is the factual information given in the test accurate and logical?

Poor example:

[Graph of length of onion root in centimeters]

NOTE: Onion roots do not generally grow to a length of 300 centimeters, and even if they did, they would not grow as fast as the graph indicates. Multiple reviewers can help identify illogical data or inaccurate factual information. This graph also lacks numbers on the x-axis.

How does the teacher prepare for assessment with Selected Response Tasks?

Prepare a unit assessment instrument that addresses each of the following:

  • Applying conceptual understanding, not recall of factual information
  • Applying decision-making skills
  • Applying problem-solving skills
  • Applying science investigation skills

Use the criteria provided in the Selected Response Section to help you design the best possible test.

Provide an answer key.

Describe how the assessment instrument can be used to improve student learning.