Conducting Research in Psychology Measuring the Weight of Smoke 4th Edition by Brett W. Pelham – Test Bank
Sample Test
Chapter 3 – Moving From Fact to Truth:
Validity, Reliability, and Measurement
Chapter Summary
This chapter began by describing four forms of validity (internal, external, construct and conceptual). We
reviewed important distinctions between these four forms of validity, but the
primary point to remember is that each of the four forms of validity has to do
with a different form of psychological accuracy or “truth.” Am I correct in
inferring causality from a correlation (internal validity)? Am I correct when I
try to generalize my laboratory findings to the real world (external validity)?
Am I correct when I argue that my manipulation makes people angry rather than
afraid (construct validity)? Am I correct in concluding that my series of three
field studies validated one theory rather than another (conceptual validity)?
In a similar fashion, the three forms of reliability we reviewed (test-retest, internal consistency, interrater) are
different from one another, but each focuses on an issue related to
consistency. Is my test sorting people in a consistent manner over time
(test-retest reliability)? Are the individual items of my test contributing to a
consistent image of people (internal consistency)? Are different judges
consistently making similar judgments of identical stimuli (interrater
reliability)? After discussing validity and reliability, we reviewed levels of
measurement (i.e., measurement scales). We argued that higher-level measurements such as ratio
scales typically yield more information about participants, but we also noted
that it can be tricky to decide whether a psychological scale truly
qualifies as an interval or ratio scale. The summary in the text of Chapter 3
also noted that you could think of this entire textbook as a detailed recipe
for maximizing the validity and reliability of psychological research.
Sample Answers for the Study Questions from the Textbook
1. What is the difference between external and internal validity? Which of these two
forms of validity is maximized by the use of a) random assignment to
conditions, b) random selection, and c) the elimination of confounds?
External validity refers to how well the findings of an experiment generalize
to the general population, while internal validity concerns whether changes
in the dependent variable were caused by manipulation of the independent
variable. Internal validity is maximized by random assignment to conditions
and the elimination of confounds, whereas external validity is maximized by
random selection.
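The difference between random selection and random assignment can be made concrete with a short sketch. It assumes a made-up population of 1,000 labeled people and a sample of 40; the function names and the fixed seed are illustrative choices, not anything specified in the textbook.

```python
import random

def random_selection(population, n, rng):
    """Draw n people at random from the population (supports external validity)."""
    return rng.sample(population, n)

def random_assignment(sample, rng):
    """Shuffle the sample and split it into two conditions (supports internal validity)."""
    shuffled = list(sample)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
population = [f"person_{i}" for i in range(1000)]  # hypothetical population

sample = random_selection(population, 40, rng)     # who gets into the study
treatment, control = random_assignment(sample, rng)  # which condition each person gets

print(len(treatment), len(control))
```

Selection decides who enters the study at all, so it bears on generalizing to the population. Assignment decides which condition each selected person receives, so it bears on whether the groups are comparable before the manipulation.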
2. Reliability does not generally guarantee validity. Consider the specific case of external
validity. What is the difference between external validity and reliability? Can
an experimental finding be both reliable and externally valid?
External validity is how well the experiment generalizes to the
general population, while reliability is the repeatability of a measure or
observation. If a finding has external validity, it must also be reliable.
However, a reliable finding does not require external validity. A reliable
measure may be able to accurately group people in ways that are consistent, but
this does not necessarily extrapolate to the general population. A finding that
is both reliable and generalizable to the overall population will be most
helpful to researchers.
3. Explain the logic of inferring whether or not a test will have high test-retest
reliability by examining the internal consistency of the test. Why are measures
with higher internal consistency better suited for testing psychological
theories regarding covariation?
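If a test has 10 questions and each question addresses the same
concept, then participants who answer the questions in a consistent way show
that the test has high internal consistency. Internal consistency can be thought
of as test-retest reliability assessed over an extremely short waiting period,
so a test whose items agree with one another is also likely to yield consistent
scores over time. Measures with higher internal consistency are better suited for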
testing psychological theories regarding covariation because it is necessary to
gain repeated results to show that one variable corresponds to changes in
another. The more items on a test that address this relationship, the better a
psychologist can infer that the two variables are related.
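One common way to put a number on internal consistency is Cronbach's alpha, which this sample answer does not name; the sketch below is offered only as an illustration. The two small data sets are invented: each row is one hypothetical participant and each column is one item.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (participants) by columns (items)."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: items that rise and fall together across participants.
agreeing_items = [[5, 4, 5], [2, 2, 1], [4, 4, 4], [1, 2, 1], [3, 3, 3]]
# Hypothetical data: items that do not track one another.
scattered_items = [[5, 1, 3], [2, 4, 1], [4, 2, 4], [1, 5, 2], [3, 3, 5]]

alpha_agreeing = cronbach_alpha(agreeing_items)
alpha_scattered = cronbach_alpha(scattered_items)  # can even be negative

print(round(alpha_agreeing, 2), round(alpha_scattered, 2))
```

When the items agree, alpha is close to 1. When they do not, alpha drops sharply, which is the sense in which the items fail to give a consistent image of a person.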
4. The subtitle of this book alludes to a story in which Sir Walter Raleigh placed a
wager that he could measure something elusive, namely the weight of smoke. Both
construct validity and conceptual validity refer to things more elusive than
the weight of smoke, because both try to link real observations to hypothetical
constructs or abstract theories. What are these two forms of validity, and how
do they differ from one another? How might you determine if a given study, or
program of research, is high in these two forms of validity?
Construct validity is the extent to which the independent and
dependent variables of an experiment truly represent the abstract, hypothetical
variables of interest to the researcher. Conceptual validity is how well a
specific research hypothesis maps onto the broader theory that it was designed
to test. A given study would be high in construct validity if it has
operational definitions that accurately reflect the variable that is to be
measured. However, many psychological concepts are very abstract, so operational
definitions may differ. In that case, construct validity would be high if other
experts on the subject agree that the operational definitions are a true
reflection of the variable in question. A study or group of studies would be
high in conceptual validity if they make unique predictions that logically come
from a theory. This is similar to construct validity but on a broader scale.
5. Suppose you are interested in measuring people’s weight. Provide an example of a
nominal, ordinal, interval, and ratio scale that would accomplish this. If you
are assessing a psychological construct (e.g., self-esteem) with an externally
valid scale that measures from 0 (“low self-esteem”) to 7 (“high self-esteem”),
which levels of measurement are likely satisfied and which are questionable?
A nominal scale associated with weight would consist of labeling
people “fat,” “skinny,” “average,” etc. An ordinal scale of weight would be
lining up a group of people in order from the heaviest to the lightest. An interval
scale requires equal units, so measuring each person’s weight in pounds would
satisfy it. Because a scale in pounds also has a true zero point, it further
qualifies as a ratio scale, which allows people to be compared in terms of how
heavy they are relative to one another. For example, a 100 lb. child is twice as
heavy as a 50 lb. child. When assessing a psychological construct such as
self-esteem on a 0-to-7 scale, the nominal and ordinal levels are likely
satisfied, whereas the interval and ratio levels are questionable. Most
psychological measures come so close to being interval scales that we refer to
them as such. A ratio interpretation is much harder to defend: it is very
difficult to measure psychological constructs with an absolute zero point, so it
is difficult to state that one score reflects twice as much self-esteem as another.
It is very difficult to obtain a baseline of a measure such as
self-esteem, so researchers must use caution when referring to a scale as a
ratio measure.
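Temperature offers a familiar illustration of why a true zero point matters for "twice as much" statements. The sketch below compares two arbitrary temperatures (10 and 20 degrees Celsius, chosen only for illustration) on the Celsius scale, which has no true zero, and on the Kelvin scale, which does.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin; Kelvin has a true zero point."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0  # arbitrary example temperatures

# Differences are meaningful on an interval scale and agree across both scales.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful when the scale has a true zero.
ratio_c = warm_c / cool_c  # 2.0, but "twice as hot" is not a valid reading
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # about 1.035

print(diff_c, round(diff_k, 2), ratio_c, round(ratio_k, 3))
```

The same caution applies to a 0-to-7 self-esteem scale: differences between scores may be roughly interpretable, but a score of 6 should not be read as twice the self-esteem of a score of 3.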
Testbank
Multiple-Choice Questions
1. The central point behind the three strange stories that begin Chapter 3 (e.g., the story about the beginner’s golf shot) is that:
A) most laypeople have no appreciation of concepts like reliability and validity
B) single observations cannot be studied scientifically
C) almost all individual measurements include components of error or luck
D) reliability and validity are not the same thing
ANS: C
REF: Three Strange Stories
2. The relative accuracy or correctness of a psychological statement is its:
A) reliability
B) validity
C) efficacy
D) confirmability
ANS: B
REF: Validity
3. Studies that provide good information about causal relations between variables are high in:
A) restrictive validity
B) construct validity
C) external validity
D) internal validity
ANS: D
REF: Validity
4. High levels of empathy usually go hand in hand with high levels of helping behavior. That is, people who are very high in empathy are usually more likely to help others. This is an example of:
A) generalizability
B) covariation
C) conceptual validity
D) construct validity
ANS: B
REF: Validity
5. Which of the following is NOT one of John Stuart Mill’s requirements for establishing a causal relation between two variables?
A) temporal sequence
B) eliminating confounds
C) covariation
D) external validity
ANS: D
REF: Validity
6. Because high levels of cloudiness usually go hand in hand with (i.e., are correlated with) high levels of rainfall, Rogelio concludes that rain causes cloudiness. Rogelio has failed to take into account information regarding:
A) temporal sequence
B) the elimination of noise
C) covariation
D) the problem of induction
ANS: A
REF: Validity
7. Psychology has been criticized by many people because its research generally uses white college students as participants. Critics would argue that focusing on such a limited group is a threat to:
A) construct validity
B) external validity
C) reliability
D) sequential reasoning
ANS: B
REF: Validity MSC: WWW
8. According to the text, ruling out other probable causes of an event before concluding that one thing (e.g., testosterone) is the cause of another (e.g., aggression) is known as:
A) competitive elimination
B) eliminating confounds
C) construct validity
D) temporal consistency
ANS: B
REF: Validity
9. To say that one is worried about confounds is to say that one is worried about the:
A) problem of deduction
B) trade-off between internal and external validity
C) third variable problem
D) problem of covariation
ANS: C
REF: Validity
10. The best synonym for external validity is:
A) generalizability
B) specificity
C) coherence
D) suitability
ANS: A
REF: Validity MSC: WWW
11. Most researchers would argue that _________ studies tend to be high in external validity.
A) experimental
B) quasi-experimental
C) passive observational
D) within-subjects
ANS: C
REF: Validity MSC: WWW
12. The quality of a researcher’s operational definitions is closely associated with:
A) restrictive validity
B) construct validity
C) external validity
D) internal validity
ANS: B
REF: Validity
13. The degree to which a hypothesis successfully maps onto the broader theory that it was designed to test is known as:
A) conceptual validity
B) construct validity
C) hypothesis validity
D) internal validity
ANS: A
REF: Validity MSC: WWW
14. The three basic forms of reliability emphasized in the text are:
A) internal reliability, external reliability, and restrictive reliability
B) interobserver reliability, interrater reliability, and test-retest reliability
C) interobserver agreement, internal consistency, and temporal consistency
D) interrater agreement, internal agreement, and temporal agreement
ANS: C
REF: Reliability
15. For typical studies, what is an ideal time frame to wait before re-testing participants on a measure?
A) 1 day
B) 2-4 weeks
C) 6 months
D) 2-4 years
ANS: B
REF: Reliability
16. If we wanted to formally assess the reliability of a set of judgments (scores) made by a group of boxing or figure skating judges, which form of reliability would we need to assess?
A) consensual agreement
B) internal consistency
C) temporal consistency
D) interobserver agreement
ANS: D
REF: Reliability
17. Test-retest reliability can only be assessed:
A) when a survey contains two or more different items
B) when two different forms of a measure are administered at the same time
C) when participants fill out the same measure on more than one occasion
D) when multiple judges independently assess the same behavior or performance
ANS: C
REF: Reliability
18. According to the text, one way to think about ____________________ is that it represents test-retest reliability as assessed over an extremely __________ waiting period.
A) inter-item validity; long
B) internal consistency; short
C) interrater agreement; short
D) interrater agreement; long
ANS: B
REF: Reliability
19. Maura and Link each developed multiple-item survey measures of self-perceived creativity. You read all of the items in both surveys, and as far as you can tell, the items all seem clear and valid. However, Maura developed a 10-item measure, and Link developed a 16-item measure. Whose measure should you probably prefer?
A) Maura’s measure because it is more parsimonious
B) Maura’s measure because participants are less likely to be bored by it
C) Link’s measure because it is likely to be higher in internal validity
D) Link’s measure because it is likely to be higher in internal consistency
ANS: D
REF: Reliability, Validity, and the “More Is Better” Rule
20. Which is the easiest to assess statistically?
A) construct validity
B) test-retest reliability
C) conceptual validity
D) interval / ratio
ANS: B
REF: Reliability, Validity, and the “More Is Better” Rule
21. The difference between the Celsius and the Kelvin scales for measuring temperature is that only the Kelvin scale has a true zero point. Thus, the Celsius scale is a(n) ________ measurement scale whereas the Kelvin scale is a(n) ________ measurement scale.
A) nominal / ordinal
B) ratio / ordinal
C) ordinal / interval
D) interval / ratio
ANS: D
REF: Measurement Scales
22. Your social security number is best thought of as a(n):
A) nominal score
B) ordinal score
C) interval score
D) ratio score
ANS: A
REF: Measurement Scales MSC: WWW
23. Most common psychological scores (e.g., an IQ score, a self-esteem score, a score on a 7-point Likert scale) are:
A) nominal scores
B) ordinal scores
C) interval scores
D) ratio scores
ANS: C
REF: Measurement Scales
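The “more is better” reasoning behind question 19 (a 10-item versus a 16-item measure) can be illustrated with the Spearman-Brown formula, a standard psychometric result that this test bank does not itself cite. The sketch assumes, purely for illustration, that the 10-item measure has a reliability of .80 and that the extra items are of comparable quality to the existing ones.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor.

    Assumes the added items are comparable in quality to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

short_reliability = 0.80   # hypothetical reliability of a 10-item measure
length_factor = 16 / 10    # going from 10 items to 16 items

long_reliability = spearman_brown(short_reliability, length_factor)
print(round(long_reliability, 3))
```

Under these assumptions the longer measure is predicted to be somewhat more reliable, which is consistent with preferring the 16-item measure when the items in both surveys seem equally clear and valid.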
Chapter 5 – How Do We Misinterpret?
Common Threats to Validity
Chapter Summary
Chapter 5 began by organizing some common threats to validity
around three broad themes. Specifically, we noted that (1) people are different, (2) people
change and (3) the process of studying people changes people.
Thus, for example, the methodological problem of regression toward the mean is
a specific example of how people change. We used this simple organizational
scheme to help students realize that there are really only a few general things
that can go wrong in psychological research. After discussing these three types
of threats to validity, we also suggested an alternative way of organizing the
threats, by discussing confounds versus artifacts. Because
there are a great number of ways in which confounds can crop up in research,
many of the later chapters in this text will elaborate on the concept of
confounds and describe specific types of confounds and the specific threats
they pose to validity. Later chapters will also elaborate on the fact that some
specific research methods (e.g., cross-sectional questionnaires) more often
raise concerns about confounds whereas others (e.g., laboratory experiments)
more often raise concerns about artifacts. Fortunately, just as there are many
unique kinds of confounds and artifacts, there are also many unique things
researchers can do to correct these problems. A primary goal of this book from
this point forward is to help you learn how to identify and eliminate confounds
and artifacts (i.e., threats to validity), so that they do not undermine your
own ability to interpret and conduct psychological research.
Sample Answers for the Study Questions from the Textbook
1. What are the differences between artifacts and confounds? How do these terms relate
to a) internal versus external validity and b) random selection versus random
assignment?
Artifacts are important but overlooked variables that are held
constant in a given study or set of studies. Confounds are additional variables
in a study that vary systematically with the independent variable and also vary
systematically with the dependent variable. Confounds are a threat to internal
validity because they can lead to a false association between the dependent and
independent variables. Artifacts, on the other hand, are a threat to external
validity because it may be that the independent and dependent variables are
associated under the limited conditions of the experiment. Random selection is
commonly associated with artifacts because the results from a study done on a
specific group (e.g., Western college students) may not transfer over to
another age group or culture. Random assignment is associated with confounds
because there may be a variable within participants who are assigned to a
condition that makes them more likely to drop out of the experiment.
2. People are different, and this fact leads to two threats to validity: the
third-variable problem and the selection bias. How do these two threats relate
to the concept of artifacts versus confounds? Which is a threat to internal
validity and which to external validity?
The third-variable problem is essentially a confound. It is a
variable that goes unnoticed by the researcher and is the cause of the change
in the dependent variable rather than the change being caused by the
independent variable. Selection bias is linked to artifacts because an
imperfect sampling method that is not random will lead to results that do not
generalize to the entire population. The third-variable problem is a threat to
internal validity and selection bias is a threat to external validity.
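A small simulation can show how a third variable produces an association between two variables that have no causal connection to each other. In the sketch below, the hypothetical variable z influences both x and y, and x has no effect on y. The variable names, sample size, and noise levels are all assumptions made for illustration.

```python
import random
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after statistically controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # the unnoticed third variable
x = [zi + rng.gauss(0, 1) for zi in z]    # x depends on z, not on y
y = [zi + rng.gauss(0, 1) for zi in z]    # y depends on z, not on x

r_xy = pearson(x, y)
r_partial = partial_corr(r_xy, pearson(x, z), pearson(y, z))
print(round(r_xy, 2), round(r_partial, 2))
```

The raw correlation between x and y is substantial, yet it shrinks to roughly zero once z is controlled. This is the pattern a researcher would hope to detect when ruling out a confound.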
3. In the summer of 2004, a rural county in Texas had three separate instances of
high school drivers causing serious automobile accidents. In the ten years
prior to this summer, there had been only three such accidents involving teen
drivers. The local superintendent of schools responded by having all high
school students of driving age take both beginner and advanced driver’s
education courses. There were no such accidents the following summer. Why might
you be cautious about concluding that the new driver’s education classes
prevented automobile accidents?
I would be cautious about attributing the absence of serious
automobile accidents the following summer to the new driver’s education classes
because it is likely that regression toward the mean was responsible for the
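decrease of accidents. Three serious accidents in the 10
years prior to the summer of 2004 works out to a long-run average of about 0.3
per year, so the three accidents in the summer of 2004 were an extreme departure
from that average. An extreme count like this would be expected to fall back
toward the average the following summer even if the driver’s education courses
had no effect.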
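The arithmetic behind this caution can be sketched with a simple probability model. The sketch assumes that serious accidents arrive independently at a steady rate, so that the yearly count follows a Poisson distribution, and it treats the prior record of three accidents in ten years as a rate of 0.3 per year. Both are modeling assumptions, not claims made in the text.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of exactly k events when events occur at the given average rate."""
    return rate ** k * exp(-rate) / factorial(k)

rate = 3 / 10  # three serious accidents over the ten prior years

# Chance of a year with no serious accidents, with no intervention at all.
p_zero = poisson_pmf(0, rate)

# Chance of a year as bad as the summer of 2004 (three or more accidents).
p_three_or_more = 1 - sum(poisson_pmf(k, rate) for k in range(3))

print(round(p_zero, 2), round(p_three_or_more, 4))
```

Under these assumptions, an accident-free year is the most likely outcome with or without the new courses, and a three-accident year is rare. An accident-free summer following the courses is therefore weak evidence that the courses worked.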
4. The text discusses both heterogeneous and homogeneous attrition. What are these
concepts, and how do they relate to internal versus external validity? How do
they relate to artifacts versus confounds?
Heterogeneous attrition occurs when the attrition (aka
mortality) rates in two or more conditions of an experiment are noticeably
different. Homogeneous attrition occurs when the attrition rates throughout all
the conditions of an experiment are equal. Internal validity is affected by
heterogeneous attrition because it is difficult to make comparisons between
conditions when one has lost more participants than the other. Homogeneous
attrition is linked to external validity because the results of the study may
not be generalizable, since they may only be able to be attributed to the type
of participants who chose to complete the study. Heterogeneous attrition is
likely to be caused by a confound, whereas homogeneous attrition is likely
caused by an artifact.
5. The act of studying people may change them. List three safeguards researchers can
take to prevent these effects from introducing confounds or artifacts into
psychological research.
Three safeguards that researchers can take are: 1) Conduct a
true experiment with a control condition in order to identify testing effects
and separate them from the experimental treatment. 2) To eliminate the effects
associated with mortality, communicate the importance of the study and try to
make the participants see how critical it is for them to continue until the
end. Offering rewards for completion of the study may help people stick with
it. 3) Use a double-blind procedure to help eliminate the effects of both
participant reaction bias and experimenter bias.
Hands-On Activity 2: Regression Toward the Mean
Some instructors may feel that this activity draws a little too
much attention to a relatively minor methodological issue, but I believe that
regression toward the mean plays a big role in a lot of casual and scientific
observations. I also think that students have a hard time really understanding
this concept and are typically forced to simply accept this methodological
principle on faith. This exercise virtually runs itself, and students who
complete it should have a very clear sense of the role of measurement error and
reliability in regression toward the mean. The key to the exercise, of
course, is that it makes visible what is normally invisible – the difference
between “true scores” and “measured scores.” To make this more salient, you
might want to pause after you have sent people to opposite halves of the room
(based on their pretest scores) and ask people in each of the two groups to
identify the number of dice they will be rolling. You might also ask people to
wear name tags that designate either their true scores (7.0 or 10.5) or the
number of dice that they will be rolling during the two rounds of the activity
(2 or 3).
Presumably, after observing regression toward the mean in the
posttest scores of both groups, most students will be able to articulate the
role of measurement error in producing regression toward the mean.
Specifically, they should be able to see that a lack of perfect reliability in
measurement (i.e., good or bad luck) caused some students to be
“mis-categorized” based on their pretest scores. On the posttest, such
mis-categorized people will score closer to their true score than to their
falsely inflated or deflated pretest score.
If students cannot generate (or appreciate) the answer to the
second question (the fact that there wouldn’t usually be any regression toward
the mean if measurement were perfectly reliable), you might want to repeat the
exercise based on people’s true scores. In this case, you should see that on
both the pretest and the posttest, people’s scores hovered respectively around
7.0 and 10.5 in the groups of true low and high rollers. Of course, this does
not mean that you will never observe regression toward the mean if all
categorizations are based on true scores, but it means that there will not be a
systematic bias in this direction. In any specific set of observations, it will
be just as likely (among both groups) that the posttest scores are higher than
the pretest scores as it is that they are lower.
The final thought question is designed to help students realize
that as luck or measurement error makes a larger and larger contribution to
people’s scores on a measure (i.e., as the reliability of a measure gets lower
and lower) regression toward the mean becomes increasingly likely. The six- and
seven-sided dice example represents a case in which the true scores of the high
and low rollers are not very different and in which people’s observed scores on
any one given occasion might differ greatly based on chance. In such a case, of
course, we should typically observe a great deal of regression toward the mean.
I often ask students to contrast this activity with a hypothetical activity in
which we carefully measured people’s heights on two occasions. In the case of
height, we would expect to observe little or no evidence of regression toward
the mean.
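Instructors who want to preview the pattern before running the activity could simulate it. The sketch below assumes that half of the class rolls two dice (true score 7.0) and half rolls three dice (true score 10.5), and that people are sorted into "high" and "low" groups at the median pretest score. The exact sorting rule and the class size are assumptions, since the activity handout is not reproduced here.

```python
import random
from statistics import mean

def roll(n_dice, rng):
    """One measured score: the sum of n_dice fair six-sided dice."""
    return sum(rng.randint(1, 6) for _ in range(n_dice))

def simulate(n_per_type=5000, seed=1):
    rng = random.Random(seed)
    dice = [2] * n_per_type + [3] * n_per_type  # true low and true high rollers
    pre = [roll(d, rng) for d in dice]
    post = [roll(d, rng) for d in dice]

    # Sort people by their measured pretest scores (assumed median split).
    cutoff = sorted(pre)[len(pre) // 2]
    high = [i for i in range(len(pre)) if pre[i] >= cutoff]
    low = [i for i in range(len(pre)) if pre[i] < cutoff]

    # Alternatively, sort people by their true scores (number of dice).
    two = list(range(n_per_type))
    three = list(range(n_per_type, 2 * n_per_type))

    def avg(scores, idx):
        return mean([scores[i] for i in idx])

    return {
        "high_pre": avg(pre, high), "high_post": avg(post, high),
        "low_pre": avg(pre, low), "low_post": avg(post, low),
        "two_dice_pre": avg(pre, two), "two_dice_post": avg(post, two),
        "three_dice_pre": avg(pre, three), "three_dice_post": avg(post, three),
    }

result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

When groups are formed from measured pretest scores, the high group's posttest mean drops and the low group's posttest mean rises. When groups are formed from true scores, the pretest and posttest means stay near 7.0 and 10.5, with no systematic drift in either direction.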
Testbank
Multiple-Choice Questions
1. A research design in which someone tests a claim about a variable by exposing a person to the variable and showing that the person thought, felt, or behaved as expected is referred to as:
A) a pseudo-experiment
B) a quasi-experiment
C) a clinical trial
D) an experiment
ANS: A
REF: People are Different MSC: WWW
2. Madeline plans to stand outside of a BMW dealership and ask the people she sees who they think will win the 2012 presidential election. Her study will most likely suffer from which of the following methodological problems?
A) selection bias
B) history
C) maturation
D) the Hawthorne effect
ANS: A
REF: People are Different
3. The Literary Digest error concerning the outcome of the 1936 U.S. Presidential election was apparently caused by:
A) selection bias
B) nonresponse bias
C) both selection bias and nonresponse bias
D) both selection bias and regression toward the mean
ANS: C
REF: People are Different MSC: WWW
4. One of the pairs of terms below consists of two very similar threats to validity. Which pair?
A) history and maturation
B) history and regression toward the mean
C) experimenter bias and experimental mortality
D) selection bias and testing effects
ANS: A
REF: People Change
5. Regression toward the mean occurs because:
A) measurement is almost always biased in one way or another
B) measurements are usually a mixture of true scores and error
C) no two measurements are ever exactly the same
D) the act of taking a test usually influences people’s future scores on the test
ANS: B
REF: People Change
6. During the first quarter of his freshman year in high school, Dinky received a very low score on a vocabulary test. Three months later Dinky took the test again, and he scored much higher on the test. Dinky’s improvement can be explained by:
A) maturation
B) regression toward the mean
C) testing effects
D) all of the above (all are good explanations)
ANS: D
REF: The Process of Studying People Changes People
7. The tendency for people to change their behaviors just because they have been asked what they intend to do in the future is known as:
A) retroactive interference
B) the Hawthorne effect
C) the mere measurement effect
D) causation
ANS: C
REF: The Process of Studying People Changes People
8. Both testing effects and:
A) regression toward the mean lead to increases in people’s scores
B) history can lead to either increases or decreases in people’s scores
C) experimenter bias are based on laboratory experimenters’ behavior toward participants
D) Hawthorne effects are ways in which studying people changes people
ANS: D
REF: The Process of Studying People Changes People MSC: WWW
9. Which of the following threats to validity could often be thought of as a form of attitude polarization?
A) the Hawthorne effect
B) testing effects
C) regression toward the mean
D) participant expectancies
ANS: B
REF: The Process of Studying People Changes People
10. Which of the following represents the most serious threat to internal validity?
A) selection bias
B) nonresponse bias
C) heterogeneous attrition
D) homogeneous attrition
ANS: C
REF: The Process of Studying People Changes People
11. In an experimental study of cooperation, the experimenter makes people in the experimental condition feel like they have no choice but to cooperate with a confederate. Kermit was assigned to this condition of the study and felt that he was being treated like a puppet. As a result, he actively tried to disconfirm the experimenter’s hypothesis by refusing to cooperate. This is an example of:
A) participant expectancies
B) demand characteristics
C) participant reactance
D) evaluation apprehension
ANS: C
REF: The Process of Studying People Changes People MSC: WWW
12. Demand characteristics refer to:
A) pressure participants feel to finish a study even when they feel uncomfortable
B) pressure to give socially desirable answers to survey questions
C) cues for authority that encourage research participants to respond honestly
D) subtle cues in an experiment that suggest to participants how they should behave
ANS: D
REF: The Process of Studying People Changes People
13. Which of the following threats to validity CANNOT be corrected by simply adding a control group to a researcher’s design?
A) history
B) regression toward the mean
C) testing effects
D) participant reaction bias
ANS: D
REF: The Process of Studying People Changes People
14. Which of the following procedures or techniques requires little or no active deception?
A) the use of a cover story
B) the use of a confederate
C) the use of unobtrusive observations
D) the use of a bogus pipeline
ANS: C
REF: The Process of Studying People Changes People MSC: WWW
15. Rosenthal and Fode’s study of “maze-bright” and “maze-dull” rats provides an excellent example of:
A) experimenter bias
B) demand characteristics
C) Heisenberg effects
D) participant mortality
ANS: A
REF: The Process of Studying People Changes People
16. The Implicit Association Test (IAT) assesses people’s unconscious associations about objects. The IAT would be used in an instance when the experimenter is trying to:
A) conduct a double-blind experiment
B) reduce experimenter bias
C) introduce confounds
D) minimize participant reaction bias
ANS: D
REF: The Process of Studying People Changes People
17. In their research on the door-in-the-face technique and blood donation, Cialdini and Ascani (1976) were concerned about the possibility of experimenter bias. What steps did they take to eliminate or reduce this methodological problem?
A) They kept the experimenter blind to participants’ conditions.
B) They made use of a double-blind procedure.
C) They deceived the participants.
D) They deceived the experimenters.
ANS: D
REF: The Process of Studying People Changes People MSC: WWW
18. Experimenter bias and ____________ can become very similar in some experiments.
A) regression toward the mean
B) maturation
C) participant expectancies
D) attrition
ANS: C
REF: The Process of Studying People Changes People
19. The most common threat to the internal validity of research designs is probably:
A) experimenter bias
B) confounds
C) participant expectancies
D) regression toward the mean
ANS: B
REF: Moving from Three Threats to Two: Confounds and Artifacts
20. Whereas confounds threaten _________, artifacts threaten _________.
A) validity; reliability
B) reliability; validity
C) internal validity; external validity
D) external validity; internal validity
ANS: C
REF: Moving from Three Threats to Two: Confounds and Artifacts
21. By replicating an experiment while using a different specific way of manipulating the independent variable, a researcher can often reduce concerns about:
A) archetypes
B) belief perseverance
C) confounds
D) demand characteristics
ANS: C
REF: Moving from Three Threats to Two: Confounds and Artifacts
22. Lincoln conducted a successful experiment on modeling (i.e., social learning or copying) and helping behavior among American high school students. He then replicated this same experiment (using exactly the same independent and dependent variables) in a sample of Japanese senior citizens. Lincoln probably hoped that his replication study would reduce concerns about:
A) artifacts
B) linguistic biases
C) confounds
D) demand characteristics
ANS: A
REF: Moving from Three Threats to Two: Confounds and Artifacts