QUASI-EXPERIMENTAL AND SINGLE-CASE EXPERIMENTAL DESIGNS

In the natural world, the environment or situation you find yourself in can be dynamic. You need look no further than a college classroom. Suppose, for example, that you take a college exam in which the average student scored a 50%. Why were the exam grades so low? Was the professor ineffective in their teaching? Did the students study for the exam? Was the exam itself not fair? Was the material being studied too difficult or at too high a level? In this example, the answer can be difficult to identify because the environment is constrained by preexisting factors—the time, date, content, professor, and students enrolled in the course were not assigned by a researcher but instead were determined by the school and students. Accounting for these preexisting factors is important to determine why the exam grades were low.

In other situations, we can have difficulty obtaining large samples of participants. If a company were a small business, then it would have few employees, or if a behavioral disorder were rare, then it would afflict few people. In these cases, we probably could not obtain a large sample, so it would be advantageous to observe the behavior of only one or a few individuals. For example, we could observe an employee after a merger as new policy changes successively go into effect, or we could observe a patient’s health across multiple phases of treatment. In each case, we follow one participant and observe their behavior over time.

In this chapter, we introduce quasi-experimental designs used in science to make observations in settings that are constrained by preexisting factors. We also introduce many methods used to assess the behavior of a single participant or subject using single-case experimental designs, typically used when a large sample cannot be obtained.

QUASI-EXPERIMENTAL DESIGNS

Suppose we hypothesize that high school graduates who attend college will value an education more than those who do not attend college. To test this hypothesis, we could select a sample of high school graduates from the same graduating class and divide them into two groups: those who attended college (Group College) and those who did not attend college (Group No College). We could then have all participants complete a survey in which higher scores on the survey indicate a higher value placed on obtaining an education. If the hypothesis is correct and we set up this study correctly, then participants in Group College should show higher scores on the survey than participants in Group No College.

Notice in this example that participants controlled which group they were assigned to—they either attended college or did not. Hence, in this example, the factor of interest (whether or not students attended college) was a quasi-independent variable. When a factor in a study is not manipulated (i.e., quasi-independent), this typically means that the study is a type of quasi-experimental research design. In this chapter, we separate the content into two major sections: quasi-experimental designs and single-case experimental designs. We begin this chapter with an introduction to the type of research design illustrated here: the quasi-experimental research design.

9.1 AN OVERVIEW OF QUASI-EXPERIMENTAL DESIGNS

In this major section, we introduce a common type of research design called the quasi-experimental research design. The quasi-experimental research design, also defined in Chapter 6, is structured similar to an experiment, except that this design does one or both of the following:

  1. It includes a quasi-independent variable (also defined in Chapter 6).
  2. It lacks an appropriate comparison/control group.

A quasi-experimental research design is the use of methods and procedures to make observations in a study that is structured similar to an experiment, but the conditions and experiences of participants lack some control because the study lacks random assignment, includes a preexisting factor (i.e., a variable that is not manipulated), or does not include a comparison/control group.

A quasi-independent variable is a preexisting variable that is often a characteristic inherent to an individual, which differentiates the groups or conditions being compared in a research study. Because the levels of the variable are preexisting, it is not possible to randomly assign participants to groups.

In the example used to introduce this section, the preexisting factor was college attendance (yes, no). The researchers did not manipulate or randomly assign participants to groups. Instead participants were assigned to Group College or Group No College based on whether they attended college prior to the study. In other words, the participants, not the researcher, controlled which group they were assigned to. In this way, the study described to introduce this section was a quasi-experiment—the study was structured like an experiment in that differences in how students value college were compared between groups, but it lacked a manipulation (of the groups: whether students attended or did not attend college) and randomization (of assigning participants to each group).

Hence, a quasi-experiment is not an experiment because, as illustrated in Figure 9.1, the design does not meet all three requirements for demonstrating cause. In the college attendance study, for example, additional unique characteristics of participants, other than whether or not they attended college, could also be different between groups and therefore could also be causing differences between groups. For example, levels of motivation and academic ability may also be different between people who attend and do not attend college. When other possible causes cannot be ruled out, the design does not demonstrate cause.

Figure 9.1 ⦁ A Simplified Distinction Between Experiments, Quasi-Experiments, and Nonexperiments

The line represents the requirements for demonstrating cause: randomization, manipulation, and comparison/control. A quasi-experiment lacks at least one of these requirements and so fails to demonstrate cause.

In this major section, we introduce four categories of quasi-experimental research designs used in the behavioral sciences:

  1. One-group designs (posttest only and pretest-posttest)
  2. Nonequivalent control group designs (posttest only and pretest-posttest)
  3. Time series designs (basic, interrupted, and control)
  4. Developmental designs (longitudinal, cross-sectional, and cohort-sequential)

Quasi-experiments include a quasi-independent variable and/or lack a control group.

9.2 QUASI-EXPERIMENTAL DESIGN: ONE-GROUP DESIGNS

In some situations, researchers ask questions that require the observation of a single group. When only one group is observed, the study lacks a comparison group and so does not demonstrate cause; that is, the study is a quasi-experiment. Two types of one-group quasi-experiments are the following:

  1. One-group posttest-only design
  2. One-group pretest-posttest design

One-Group Posttest-Only Design

The type of quasi-experiment most susceptible to threats to internal validity is the one-group posttest-only design, which is also called the one-shot case study (Campbell & Stanley, 1966). Using the one-group posttest-only design, a researcher measures a dependent variable for one group of participants following a treatment. For example, as illustrated in Figure 9.2, after a professor gives a lecture (the treatment), they may record students’ grades on an exam out of 100 possible points (the dependent variable) to test their learning.

The major limitation of this design is that it lacks a comparison or control group. Consider, for example, the exam scores following the lecture. If exam scores are high following the lecture, can we conclude that the lecture is effective? How can we know for sure if scores would have been high even without the lecture? We cannot know this because we have nothing to compare this outcome to; we have no control group. Hence, the design is susceptible to many threats to internal validity, such as history effects (unanticipated events that can co-occur with the exam) and maturation effects (natural changes in learning). In all, these limitations make the one-group posttest-only design a poor research design.

A one-group posttest-only design is a quasi-experimental research design in which a dependent variable is measured for one group of participants following a treatment.

One-group designs lack a control group.

One-Group Pretest-Posttest Design

One way to minimize problems related to having no control or comparison group is to measure the same dependent variable in one group of participants before (pretest) and after (posttest) a treatment. Using this type of research design, called a one-group pretest-posttest design, we measure scores before and again following a treatment, then compare the difference between pretest and posttest scores. The advantage is that we can compare scores after a treatment to scores on the same measure in the same participants prior to the treatment. The disadvantage is that the one-group design does not include a no-treatment control group and therefore is still prone to many threats to internal validity, including those associated with observing the same participants over time (e.g., testing effects and regression toward the mean).

A one-group pretest-posttest design is a quasi-experimental research design in which the same dependent variable is measured in one group of participants before (pretest) and after (posttest) a treatment is administered.

Figure 9.2 ⦁ The One-Group Posttest-Only Quasi-Experimental Design

To illustrate the one-group pretest-posttest design, we will look at the research example illustrated in Figure 9.3. Kimport and Hartzell (2015) measured state anxiety (a type of undesired current stress that is temporary or changes as experiences or conditions change) among psychiatric inpatients from two general adult units at a private psychiatric hospital before and after a structured clay therapy in which patients could use and mold clay for up to 10 minutes. Their results showed that state anxiety was significantly reduced from before to after the therapy. A limitation of this design is that all participants received the therapy; no participants were randomly assigned to a comparison condition. This means that any other factors related to state anxiety could also explain the findings. Such factors include changes in the conditions or experiences of patients other than the therapy, such as patient interactions with the researchers, how long they actually used the clay during the therapy, and distractions during the therapy (e.g., noises, decorations in the setting). These factors were largely beyond the control of the researchers and therefore could have also influenced the results. In addition, because the study lacked a control group with patients who had no therapy at all, the design was susceptible to many threats to internal validity, as stated previously. Indeed, Kimport and Hartzell (2015) directly acknowledged that “control groups in future research may prevent additional confounds from occurring including the novelty effect, which implies that the treatment may have been effective simply because it was new for the participants” (p. 188). Thus, it is possible that any type of new therapy intervention could have been effective.

Figure 9.3 ⦁ The One-Group Pretest-Posttest Quasi-Experimental Design

Source: Based on a design used by Kimport and Hartzell (2015).
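The comparison at the heart of a one-group pretest-posttest design can be sketched in a few lines of Python. The scores below are made-up values for eight hypothetical participants, not data from Kimport and Hartzell (2015), and the function name `paired_summary` is our own. The sketch computes each participant's change score, the mean change, and a paired t statistic, which is one common way (assumed here, not stated in the text) to summarize a pretest-posttest difference.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical anxiety scores for the same eight participants (illustration only)
pretest = [52, 48, 60, 55, 47, 58, 50, 54]
posttest = [45, 44, 57, 49, 46, 50, 43, 50]


def paired_summary(pre, post):
    """Return (mean change, paired t statistic, degrees of freedom)."""
    if len(pre) != len(post):
        raise ValueError("Each participant needs both a pretest and a posttest score")
    # Change score for each participant: posttest minus pretest
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = mean(diffs)
    # Standard error of the mean change, based on the sample standard deviation
    standard_error = stdev(diffs) / sqrt(n)
    return mean_diff, mean_diff / standard_error, n - 1


mean_change, t_statistic, df = paired_summary(pretest, posttest)
print(f"Mean change = {mean_change}, t({df}) = {t_statistic:.2f}")
```

Even a large negative mean change here would show only that scores dropped from pretest to posttest. Without a control group, the calculation cannot separate the treatment from history, maturation, testing, or novelty effects.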

9.3 QUASI-EXPERIMENTAL DESIGN: NONEQUIVALENT CONTROL GROUP DESIGNS

In some cases, when it is not possible to randomly assign participants to groups, researchers can use nonequivalent control groups. A nonequivalent control group is a type of control group that is matched upon certain preexisting characteristics similar to those observed in a treatment group, but to which participants are not randomly assigned. For example, suppose a professor uses a new lecture method in your research methods class and a traditional method in another research methods class, then compares grades on the topic lectured. The classes are matched on certain characteristics: Both classes are on the same topic (research methods), offered at the same school, and taught by the same professor. However, the class taught using the traditional method is a nonequivalent control group because students in that class chose to enroll in the class, so they were not randomly assigned to that class. Any preexisting differences between students who tend to enroll in one class over another, called selection differences, could therefore explain any differences observed between the two classes. Two types of nonequivalent control group quasi-experiments are the following:

  1. Nonequivalent control group posttest-only design
  2. Nonequivalent control group pretest-posttest design

A nonequivalent control group is a control group that is matched upon certain preexisting characteristics similar to those observed in a treatment group but to which participants are not randomly assigned. In a quasi-experiment, a dependent variable measured in a treatment group is compared to that in the nonequivalent control group.

Selection differences are any differences, which are not controlled by the researcher, between individuals who are selected from preexisting groups or groups to which the researcher does not randomly assign participants.

Nonequivalent Control Group Posttest-Only Design

Using the nonequivalent control group posttest-only design, a researcher measures a dependent variable following a treatment in one group and compares that measure to a nonequivalent control group that does not receive the treatment. The nonequivalent control group will have characteristics similar to the treatment group, but participants will not be randomly assigned to this group, typically because it is not possible to do so. For example, as illustrated in Figure 9.4, suppose a professor gives a new teaching method in their research methods class and gives a traditional method in another research methods class, then tests all students on the material taught. In this example, the nonequivalent control group was selected because it matched characteristics in the treatment group (e.g., all students were taking a research methods class). Students, however, enrolled themselves in each class; random assignment was not used, so the comparison is a nonequivalent control group.

A nonequivalent control group posttest-only design is a quasi-experimental research design in which a dependent variable is measured following a treatment in one group and in a nonequivalent control group that does not receive the treatment.

Figure 9.4 ⦁ The Nonequivalent Control Group Posttest-Only Quasi-Experimental Design

A key limitation of this research design is that it is particularly susceptible to the threat of selection differences. In the example illustrated in Figure 9.4, because students enrolled in their college classes, they, not the researcher, controlled which class they enrolled in. Therefore, any preexisting differences between students who choose one section of a class over another, such as how busy the students’ daily schedules are or how motivated they are to attend earlier or later classes, may actually be causing differences in grades between classes. For this reason, the nonequivalent control group posttest-only design demonstrates only that a treatment is associated with differences between groups, not that a treatment caused differences between groups, if any were observed.

Nonequivalent control group designs include a “matched” or nonequivalent control group.
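To see how selection differences alone can produce a difference between classes, consider the following simulation sketch in Python. Everything in it is an assumption made for illustration: grades depend only on a student's motivation, the teaching method has no effect at all, and more motivated students are somewhat more likely to enroll in the new-method class. The function name `simulate_gap` and all numeric values are hypothetical.

```python
import random
from statistics import mean


def simulate_gap(self_selected, n_students=10_000, seed=1):
    """Mean grade difference (new method minus traditional) when the method has no true effect."""
    rng = random.Random(seed)
    new_method, traditional = [], []
    for _ in range(n_students):
        motivation = rng.gauss(0, 1)
        # Grades depend on motivation only; the teaching method adds nothing here
        grade = 70 + 5 * motivation + rng.gauss(0, 5)
        if self_selected:
            # Assumption: more motivated students tend to pick the new-method class
            enrolls_new = motivation + rng.gauss(0, 1) > 0
        else:
            # Random assignment: a coin flip decides the class
            enrolls_new = rng.random() < 0.5
        (new_method if enrolls_new else traditional).append(grade)
    return mean(new_method) - mean(traditional)


gap_self_selected = simulate_gap(self_selected=True)
gap_randomized = simulate_gap(self_selected=False)
print(f"Self-selected classes: gap = {gap_self_selected:.2f} points")
print(f"Randomly assigned classes: gap = {gap_randomized:.2f} points")
```

Under these assumptions, the self-selected classes differ by several points even though the method does nothing, whereas the randomly assigned classes differ by close to zero. This is why a posttest difference between nonequivalent groups shows an association, not a cause.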

Nonequivalent Control Group Pretest-Posttest Design

A nonequivalent control group pretest-posttest design is a quasi-experimental research design in which a dependent variable is measured in one group of participants before (pretest) and after (posttest) a treatment and that same dependent variable is also measured at pretest and posttest in another nonequivalent control group that does not receive the treatment.

One way to minimize problems related to not having a comparison group is to measure a dependent variable in one group of participants observed before (pretest) and after (posttest) a treatment and measure that same dependent variable at pretest and posttest in another nonequivalent control group that does not receive the treatment. This type of design is called the nonequivalent control group pretest-posttest design. The advantage of this design is that we can compare scores before and after a treatment in a group that receives the treatment and in a nonequivalent control group that does not receive the treatment. While the nonequivalent control group will have characteristics similar to the treatment group, participants are not randomly assigned to this group, typically because it is not possible to do so. Hence, selection differences still can possibly explain observations made using this research design.

To illustrate the nonequivalent control group pretest-posttest design, we will look at the research example in Figure 9.5. Heinicke, Zuckerman, and Cravalho (2017) evaluated the effectiveness of online Readiness Assessment Tests (RATs)—quizzes or tests given prior to class to inform the instructor of where students are struggling the most, from which they can adapt course lectures—on overall class exam grades. Heinicke et al. (2017) hypothesized that implementing RATs into coursework would improve overall grades and class performance in general. To test this hypothesis, college students enrolled in one of two sections of a Psychology of Exceptional Children course were recruited to participate. In one section, the RATs were a required part of the course (the treatment group; Section B); in the other section, the course was structured the same except that RATs were not part of the coursework (the nonequivalent control group; Section A). Knowledge of course material was assessed both prior to and at the end of the course. As shown in Figure 9.6, while knowledge of course material was not different prior to the course, students who were in the class with RATs incorporated (the treatment group) showed overall higher grades on the final assessment by the end of the course compared with students in the nonequivalent control group who did not have RATs incorporated into the course.

A key limitation of this research design is that it is particularly susceptible to the threat of selection differences. In the example illustrated in Figure 9.5, because students enroll in college classes, they, not the researcher, control what classes they will be in. Any preexisting differences between students who choose one class over another, then, could also be causing differences between classes. For example, Heinicke et al. (2017) acknowledged that because students were not randomly assigned to classes, the differences in overall class performance between those who did versus did not have RATs incorporated into their course could also be due to other “potential extraneous variables, such as the timing of the class (e.g., Section B met at 10:00 a.m., whereas Section A met at 8:00 a.m.)” (p. 137). Hence, the nonequivalent control group pretest-posttest design, like the posttest-only design, demonstrates only that a treatment is associated with differences between groups, not that a treatment caused differences between groups, if any were observed.

Figure 9.5 ⦁ The Nonequivalent Control Group Pretest-Posttest Quasi-Experimental Design

Source: Based on a design used by Heinicke et al. (2017).

Figure 9.6 ⦁ The Overall Grades on a Final Assessment Between Groups That Did Versus Did Not Have RATs Incorporated Into the Course

Source: Data are adapted from those reported by Heinicke et al. (2017).

RAT = Readiness Assessment Tests.
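The logic of comparing pretest-to-posttest change across a treatment group and a nonequivalent control group can be sketched as a comparison of gains. The scores below are hypothetical and are not the data reported by Heinicke et al. (2017); the names `mean_gain` and `difference_in_gains` are our own.

```python
from statistics import mean

# Hypothetical knowledge scores (percent correct) for two class sections
treatment = {"pre": [40, 45, 50, 42, 48], "post": [80, 85, 88, 79, 83]}
control = {"pre": [41, 46, 49, 43, 46], "post": [70, 76, 78, 72, 74]}


def mean_gain(group):
    """Average posttest score minus average pretest score for one group."""
    return mean(group["post"]) - mean(group["pre"])


# Pretest means are compared first to check that the groups started at a similar level
pretest_gap = mean(treatment["pre"]) - mean(control["pre"])

# How much more the treatment group gained than the nonequivalent control group
difference_in_gains = mean_gain(treatment) - mean_gain(control)

print(f"Pretest gap: {pretest_gap}")
print(f"Treatment gain: {mean_gain(treatment)}, control gain: {mean_gain(control)}")
print(f"Difference in gains: {difference_in_gains}")
```

A pretest gap near zero makes one kind of selection difference less plausible, but it does not rule out others, such as class meeting time. The difference in gains therefore remains evidence of an association only.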

9.4 QUASI-EXPERIMENTAL DESIGN: TIME SERIES DESIGNS

In some situations, researchers observe one or two preexisting groups at many points in time before and after a treatment, and not just at one time, using time series quasi-experimental designs. Using these types of designs, we compare the pattern of change over time from before to after a treatment. Three types of time series quasi-experimental designs are as follows:

  1. Basic time series design
  2. Interrupted time series design
  3. Control time series design

Basic Time Series Design

When researchers manipulate the treatment, they use a basic time series design to make a series of observations over time before and after a treatment. The advantage of measuring a dependent variable at multiple times before and after a treatment is that it eliminates the problem associated with only having a snapshot of behavior. To illustrate, suppose we test a treatment for improving alertness during the day. To use the basic time series design, we record alertness at multiple times before and after we give participants the treatment, as illustrated in Figure 9.7. Notice in the figure that a pretest (at 12 p.m.) and posttest (at 4 p.m.) measure can be misleading because the pattern observed before and after the treatment recurred without the treatment at the same time the day before and the day after the treatment was given. The basic time series design allows us to uniquely see this pattern by making a series of observations over time.

A basic time series design is a quasi-experimental research design in which a dependent variable is measured at many different points in time in one group before and after a treatment that is manipulated by the researcher is administered.

Figure 9.7 ⦁ The Time Series Quasi-Experimental Design

A time series design is used to compare the pattern of behavior before and after the treatment. In this example, the pattern that occurs before and after the treatment recurs at the same time of day, even without the treatment.

Using the basic time series design, the researcher manipulates or controls when the treatment will occur. The advantage of this design is that we can identify if the pattern of change in a dependent variable before and after the treatment occurs only during that period and not during other periods when the treatment is not administered. The disadvantage of this design is that only one group is observed, so we cannot compare the results in the treatment group to a group that never received the treatment.

Time series designs include many observations made before and after a treatment.

In a basic time series design, we manipulate the treatment; in an interrupted time series design, the treatment is naturally occurring.
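The alertness example can be sketched in Python to show why a single pretest-posttest comparison can mislead. The ratings below are hypothetical values on an assumed 1-to-10 scale, and we assume the treatment was given between 12 p.m. and 4 p.m. on the treatment day only.

```python
from statistics import mean

# Hypothetical alertness ratings (1 = very drowsy, 10 = very alert)
alertness = {
    "day before": {"12 p.m.": 4, "4 p.m.": 7},
    "treatment day": {"12 p.m.": 4, "4 p.m.": 7},
    "day after": {"12 p.m.": 5, "4 p.m.": 8},
}


def afternoon_change(day_scores):
    """Change in alertness from the 12 p.m. rating to the 4 p.m. rating."""
    return day_scores["4 p.m."] - day_scores["12 p.m."]


changes = {day: afternoon_change(scores) for day, scores in alertness.items()}

# A single pretest-posttest comparison looks only at the treatment day
treatment_change = changes["treatment day"]

# The time series adds the same comparison on days without the treatment
typical_change = mean(
    change for day, change in changes.items() if day != "treatment day"
)
change_beyond_typical = treatment_change - typical_change

print(f"Change on treatment day: {treatment_change}")
print(f"Typical change on other days: {typical_change}")
print(f"Change beyond the typical pattern: {change_beyond_typical}")
```

With these values, alertness rises by 3 points on the treatment day, but it rises by the same amount on the days without the treatment, so the change beyond the typical daily pattern is zero.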

Interrupted Time Series Design

In some situations, researchers will measure a dependent variable multiple times before and after a naturally occurring treatment or event. Examples of a naturally occurring treatment or event include a scheduled medical procedure, a wedding, a natural disaster, a change in public policy, a new law, and a political scandal. These events occur beyond the control of the researcher, so the researcher loses control over the timing of the manipulation. In these situations, when multiple measurements are taken before and after a naturally occurring treatment, researchers use the interrupted time series design.

An interrupted time series design is a quasi-experimental research design in which a dependent variable is measured at many different points in time in one group before and after a treatment that naturally occurred.

As an example of the interrupted time series design, Fuller, Sahlqvist, Cummins, and Ogilvie (2012) measured the impact of two strikes by London Underground (the “Tube”) train drivers on the use of a public bicycle share program that provided bicycles (“Boris bikes”) at docking stations around London for a small fee. For this study, the researchers recorded the number of trips per day on the Boris bikes. In Figure 9.8, the solid vertical lines show the dates for each 24-hour strike. Note that each time there was a strike, the number of trips on the Boris bikes spiked, showing evidence that the Tube strikes were related to an increase in usage of the Boris bikes.

An advantage of the interrupted time series design is that we can identify if the pattern of change in a dependent variable changes from before to following a naturally occurring treatment or event. The disadvantage of this design, like that for the basic time series design, is that only one group is observed, so we cannot compare the results in the treatment group to a group that never received or was never affected by a treatment. To address this disadvantage, we can include a matched or nonequivalent control group, as described in the next section.

Figure 9.8 ⦁ Total Number of Trips per Day on the Boris Bikes

On the day of each strike, there was a sudden increase in total number of trips. Data are reproduced with permission from those reported by Fuller et al. (2012).
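One simple way to summarize an interrupted time series is to compare the value on each event day with the average of the days on which no event occurred. The sketch below uses made-up daily counts, not the data from Fuller et al. (2012), and the function name `event_day_ratios` is hypothetical.

```python
from statistics import mean

# Hypothetical number of bicycle trips per day (in thousands); days are indexed from 0
daily_trips = [100, 104, 98, 102, 150, 101, 99, 103, 97, 148, 100, 96]

# Days on which the naturally occurring event (a strike) took place
strike_days = {4, 9}


def event_day_ratios(series, event_days):
    """Ratio of each event-day value to the mean of all non-event days."""
    baseline = mean(
        count for day, count in enumerate(series) if day not in event_days
    )
    return {day: series[day] / baseline for day in sorted(event_days)}


ratios = event_day_ratios(daily_trips, strike_days)
for day, ratio in ratios.items():
    print(f"Day {day}: {ratio:.2f} times the typical number of trips")
```

A ratio well above 1 on each event day, and only on those days, is the kind of pattern that suggests the event is related to the change. Because the researcher did not control the event and observed only one group, the pattern still does not demonstrate cause.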

Control Time Series Design

A basic or interrupted time series design that includes a matched or nonequivalent control group is called a control time series design. As an example of a control time series design, Hacker et al. (2017) used such a design to test the effects of the implementation of a behavioral health child screening mandate in Massachusetts on the rates of behavioral health screenings. To compare their time series data, they also included rates of behavioral health screenings during the same time period in California, where such a policy was not implemented. California, then, was a nonequivalent control group that was matched because “it has a [similar] large diverse and stable Medicaid population [with] no competing mandate” (Hacker et al., 2017, p. 26). As shown in Figure 9.9, the implementation of the mandate was associated with an increase in behavioral health screenings in Massachusetts; no increase was observed in California during this same period.

A control time series design is a basic or interrupted time series quasi-experimental research design that also includes a nonequivalent control group that is observed during the same period as a treatment group but does not receive the treatment.

Figure 9.9 ⦁ Rate of Behavioral Health Screening in Massachusetts and California Before and After the 2008 Behavioral Health Child Screening Mandate in Massachusetts

Source: Reprinted with permission from Psychiatric Services (Copyright © 2017). American Psychiatric Association. All Rights Reserved.

The behavioral health child screening mandate was associated with an increase in behavioral health screenings in Massachusetts; no change was seen in California (the matched control).

As a caution, while the addition of the matched control group strengthens the design, keep in mind that the residents in each state are preexisting groups in that residents chose to live in those locations (or were born in those locations); the researcher did not randomly assign them to live in those locations. It is therefore possible, like that for all other designs that use a nonequivalent control group, that selection differences (such as differences in access residents have to care and even the costs of care for residents in each state) could have caused the differences observed in behavioral health screening rates between the states. For this reason, we conclude that the mandate was associated with an increase in behavioral health screenings, not that the mandate caused the increase.

Table 9.1 summarizes each quasi-experimental research design described in this chapter. In the next section, we introduce a special case of quasi-experiments used in developmental research.

Table 9.1 ⦁ The Quasi-Experimental Research Designs

| Type of Quasi-Experimental Design | Description | Key Limitation |
| --- | --- | --- |
| One-group posttest only | Observe one group after (posttest) a treatment. | No control group for comparison |
| One-group pretest-posttest | Observe one group before (pretest) and after (posttest) a treatment. | No control group for comparison |
| Nonequivalent control group posttest only | Observe treatment and nonequivalent control groups after (posttest) a treatment. | No random assignment between groups |
| Nonequivalent control group pretest-posttest | Observe treatment and nonequivalent control groups before (pretest) and after (posttest) a treatment. | No random assignment between groups |
| Basic time series design | Make many observations over time before and after a treatment manipulated by the researcher. | No control group for comparison |
| Interrupted time series design | Make many observations over time before and after a naturally occurring treatment. | No control group for comparison |
| Control time series design | A time series design with a matched or nonequivalent control group. | No random assignment between groups |

Learning Check 1 ✓

1. The quasi-experimental research design is structured similar to an experiment, except ____________ [complete the sentence].

2. State the type of quasi-experimental research design described in each of the following examples:

  A. A researcher records the time (in seconds) it takes a group of participants to complete a computer-based task following an online “how-to” course.

  B. A researcher records the rate of traffic accidents on a section of highway each month for 2 years before and 2 years after the speed limit on that section of highway is reduced.

  C. A researcher records employee satisfaction before and after a training seminar, then compares satisfaction scores for employees at a local branch to the scores for those at the main branch who did not receive the seminar.

Answers:

  1. The research design includes a quasi-independent variable and/or lacks an appropriate or equivalent control group.
  2. A. One-group posttest-only design; B. Interrupted time series design; C. Nonequivalent control group pretest-posttest design.

9.5 QUASI-EXPERIMENTAL DESIGN: DEVELOPMENTAL DESIGNS

An important area of research is the study of changes that occur across the life span. This type of research aims to understand how people or species change as they develop or age. The unique aspect of this area of research is that age, which is the factor being studied, is a quasi-independent variable. Age is a preexisting factor in that the researcher cannot manipulate the age of a participant. Because developmental designs do not include a manipulation, they are also commonly categorized as nonexperimental designs. However, in this chapter, we describe them under the quasi-experimental category because, as you will see, each design is analogous to a quasi-experimental design already introduced in this chapter. Regardless of which category developmental designs fit best, it is most important to note that while developmental designs can demonstrate that variables differ by age, they do not demonstrate what causes variables to differ by age; more controlled procedures, such as those in an experiment, are needed.

The study of developmental changes across the life span is a special case, in that the focus of the field is on a factor that is inherent to the participants (their age). Therefore, researchers have developed research designs specifically adapted to study changes across the life span. Three types of developmental research designs are the following:

  1. Longitudinal design
  2. Cross-sectional design
  3. Cohort-sequential design

Longitudinal Design

Using a research design called the longitudinal design, we can observe changes across the life span by observing the same participants over time as they age. Using this design, researchers observe the same participants and measure the same dependent variable at different points in time or at different ages. The longitudinal design is similar to the one-group pretest-posttest quasi-experimental research design in that one group of participants is observed over time. In a strictly longitudinal design, however, changes at different ages are tested, but no treatment is administered.

A longitudinal design is a developmental research design used to study changes across the life span by observing the same participants at different points in time and measuring the same dependent variable at each time.

To illustrate the longitudinal design, consider the research example illustrated in Figure 9.10. Vrangalova (2015) tested the hypothesis that casual sex among college students is related to their well-being. To test this hypothesis, the researcher had a sample of 528 undergraduate students complete an online survey at the beginning (Time 1) and again at the end (Time 2) of 1 academic year. In support of the hypothesis, students who reported having engaged in “hookups for anonymous reasons” (p. 945) between Time 1 and Time 2 had lower self-esteem and higher depression and anxiety scores compared with those who did not report engaging in this activity. This study highlights a key advantage of the longitudinal design in that changes in participant behavior can be recorded over extended periods (e.g., Hawkley, Thisted, & Cacioppo, 2009), even 1 year or more.

The disadvantage of the longitudinal design is that it is prone to many threats to internal validity associated with observing participants over time. For example, many participants may drop out of the study over time (attrition). One possibility is that those who are most motivated to complete the study will remain at the end, so it could be motivation to complete the study, not age, that is associated with any changes observed. In addition, participants could learn how to take the assessments (testing effect) or settle down during the study so that assessments at Time 2 actually reflect their true score (regression toward the mean) on the measures recorded. Finally, the longitudinal design can require substantial resources, money, recruitment efforts, and time to complete, particularly for studies that last years or even decades.

Importantly, participant characteristics, referred to as individual differences, can further be used to explain any differences or changes observed in a longitudinal study. For this reason, many researchers who use this design will record additional measures at Time 1/Time 2 so that they can control for these factors prior to evaluating differences over time. For example, Vrangalova (2015) recorded a variety of participant characteristics, such as demographic background, personality traits, and prior casual and romantic sex (prior to the start of the study), to ensure that such factors could be controlled for (i.e., identified or eliminated as possible reasons or explanations for the results), prior to evaluating the differences described in their study. Measuring participant characteristics, then, is a practical way to control for factors that you anticipate may influence differences over time in a longitudinal study.

Figure 9.10 ⦁ The Longitudinal Design

Based on a design used by Vrangalova (2015). The structure of the longitudinal design is to observe the same participants across time.

Age is the quasi-independent variable using a developmental research design.

Cross-Sectional Design

An alternative developmental design that does not require observing the same participants over time is the cross-sectional design. Using this design, the researcher observes a cross-section of participants who are grouped based on their age. The cross-sectional design is similar to a nonequivalent control group quasi-experimental design in that the different age groups act as nonequivalent control groups. Each age group is called a cohort, which is any group of individuals who share common statistical traits or characteristics, or experiences within a defined period. For example, a cohort could be a group of people who were born in the same year, served in the same war, or attended the same school. For developmental research, cohorts in a cross-sectional analysis are related in terms of when participants were born.

A cross-sectional design is a developmental research design in which participants are grouped by their age and participant characteristics are measured in each age group.

A cohort is a group of individuals who share common statistical traits or characteristics, or experiences within a defined period.

To illustrate the cross-sectional design, we will look at the research example illustrated in Figure 9.11. Phillips (2008) selected a sample of 99 community college students and 320 middle school and high school students. Each group represented a different age group or cohort. The researcher measured the identity style of students in each cohort (community college vs. middle school and high school) using the Identity Style Inventory Revised for a Sixth-Grade Reading Level (ISI-6G; White, Wampler, & Winn, 1998). Results showed that the identity style of a student is different for precollege and college-aged cohorts.

The advantage of a cross-sectional design is that participants are observed one time in each cohort. Observing participants one time eliminates many threats to internal validity associated with observing participants over time. Factors such as attrition, testing effects, and regression toward the mean are typically not a concern when participants are observed only one time.

Figure 9.11 ⦁ The Cross-Sectional Design

Based on a design used by Phillips (2008). Notice that participants are grouped based on their age using the cross-sectional design.

However, a disadvantage of the cross-sectional design is the possibility of cohort effects (or generation effects), which occur when preexisting differences between members of a cohort can explain an observed result. For example, suppose we use a cross-sectional design to measure how often 20-year-olds, 40-year-olds, and 80-year-olds send text messages. In this example, we are likely to find that texting decreases with age. However, there is also a cohort effect due to the generational gap in advances of technology. An 80-year-old participant was raised when cell phones, and therefore texting, did not yet exist. This cohort effect of differences in experience or familiarity with texting across the life span can alternatively explain why texting appears to decrease with age, without appealing to age as the primary explanation. For this reason, researchers must be cautious to consider any possible cohort effects prior to the conduct of a cross-sectional study.

Table 9.2 summarizes the two developmental research designs described here. These two research designs, the longitudinal design and the cross-sectional design, can also be used together, as is described next.

A cohort effect, or a generation effect, is a threat to internal validity in which differences in the characteristics of participants in different cohorts or age groups confound or alternatively explain an observed result.

Cohort-Sequential Design

To combine the advantages of longitudinal and cross-sectional developmental research designs, we can use a cohort-sequential design. Using the cohort-sequential design, two or more cohorts are observed from or at different points in time (cross-sectional design), and over time (longitudinal design). Figure 9.12 illustrates this design when three cohorts are observed, with each cohort also observed over time. Note that this design requires only that the longitudinal observations overlap across the cohorts. With only two cohorts observed, it is also common for some of the same participants to be represented in each cohort, as described in the research example that follows on physical activity among adolescent girls (Pate et al., 2009).

A cohort-sequential design is a developmental research design that combines longitudinal and cross-sectional techniques by observing different cohorts of participants over time at overlapping times.

Figure 9.12 ⦁ The Cohort-Sequential Design

In this example of a cohort-sequential design, three cohorts of participants born as part of Generation X (oldest cohort), Millennials, or Generation Z (youngest cohort) are observed on some measure over time. The shaded boxes indicate when each group was observed. In this example, each cohort was observed twice, and the times of longitudinal observations overlapped.

Table 9.2 ⦁ Potential Limitations of the Longitudinal and Cross-Sectional Research Designs

Each potential limitation is listed with its status for the two developmental research designs (longitudinal and cross-sectional).

Threats to internal validity

History and maturation?
  Longitudinal: Yes, because participants are observed more than one time, and the design lacks a control group.
  Cross-sectional: Possibly, because the control groups (by age) are nonequivalent.

Regression and testing effects?
  Longitudinal: Yes, because participants are observed more than one time.
  Cross-sectional: No, because participants are observed only one time.

Heterogeneous attrition?
  Longitudinal: Yes, because participants are observed more than one time.
  Cross-sectional: Possibly, but not likely because participants are observed only one time.

Cohort effects?
  Longitudinal: No, because participants from the same cohort are observed over time.
  Cross-sectional: Yes, because participants are grouped based on their age, which is a cohort.

Additional potential limitations

Time-consuming?
  Longitudinal: Yes, studies can range from months to years in length.
  Cross-sectional: No, a cross-section of the life span is observed at one time.

Costly/expensive?
  Longitudinal: Yes, keeping track of participants costs time, recruitment, and money.
  Cross-sectional: Possibly, but this design is typically less costly/expensive than a longitudinal study.

As an example of how the cohort-sequential design can be applied when the same participants are represented in each cohort, Pate et al. (2009) measured age-related changes in physical activity among adolescent girls. In their study, physical activity was measured in sixth-grade girls, and physical activity was again measured 2 years later when the girls were in eighth grade. Part of their sample was longitudinal in that the same girls from sixth grade were sampled again when they were in eighth grade. Also, by chance, some girls were sampled only one time because some sixth-grade girls did not participate in eighth grade and some eighth-grade girls included in the study did not participate when they were in sixth grade. The advantage of using this cohort-sequential design is that researchers can do the following:

  • Account for threats to internal validity associated with observing participants over time because part of the sample is a cross-section of age groups.
  • Account for cohort effects because part of the sample includes the same participants observed over time in each age group or cohort.

9.6 Ethics in Focus

Development and Aging

Ethical concerns related to age are often focused on those who are very young and those who are very old. For younger participants, researchers must obtain consent from a parent, caregiver, or legal guardian to study minors, who are children under the age of 18 years. On the other extreme, older individuals require special permissions particularly when they are deemed no longer functionally or legally capable. Additional concerns also arise for the ethical treatment of clinical populations, such as those suffering trauma or disease at any stage of development. In all, you should follow three rules to ensure that such groups or cohorts are treated in an ethical manner:

  1. Obtain assent when necessary. In other words, ensure that informed consent is obtained from the participant only after all possible risks and benefits have been clearly identified.
  2. Obtain permission from a parent, caregiver, legal guardian, or another legally capable individual, such as a medical professional, when a participant is a minor or when a participant is functionally or legally incapable of providing consent.
  3. Clearly show that the benefits of a study outweigh the costs. For any group that is studied, that group (younger, older, or incapable) should specifically benefit from participating in the research with minimal costs.

Learning Check 2 ✓

  1. State the developmental research design that is described by each of the following phrases:
     A. Observing participants over time
     B. Observing groups at one time only
     C. Prone to testing effects
     D. Prone to cohort effects
  2. A __________ is a group of individuals who share common statistical or demographic characteristics.

Answers:

  1. A. Longitudinal, B. Cross-sectional, C. Longitudinal, D. Cross-sectional; 2. cohort.

SINGLE-CASE EXPERIMENTAL DESIGNS

In this section, we begin by identifying a new research design to test the following research hypothesis: Giving encouragement to students who are at risk of dropping out of school will keep them on task in the classroom. To test this hypothesis, we could measure the time (in minutes) that an at-risk student stays on task. We could observe the student for a few days with no encouragement. Then we could observe the student for a few days with encouragement given as they work on the task. We could then again observe the student for a few more days with no encouragement. If the hypothesis is correct and we set up this study correctly, then we should expect to find that the time (in minutes) spent on task was high when the encouragement was given but low during the observation periods before and after when no encouragement was given. The unique feature of this design is that only one participant was observed.

In this final section, we introduce the research design that was illustrated here: the single-case experimental design.

9.7 AN OVERVIEW OF SINGLE-CASE DESIGNS

In some cases, often in areas of applied psychology, medicine, and education, researchers want to observe and analyze the behavior of a single participant using a research design called the single-case experimental design. A single-case design is unique in that a single participant serves as their own control; multiple participants can also be observed as long as each individual serves as their own control (Antia, Guardino, & Cannon, 2017; Kazdin, 2016). In addition, the dependent variable measured in a single-case design is analyzed for each individual participant and is not averaged across groups or across participants. By contrast, all other experimental research designs, introduced in Chapters 10 through 12, are grouped designs.

A single-case experimental design is an experimental research design in which a participant serves as their own control and the dependent variable measured is analyzed for each individual participant and is not averaged across groups or across participants.

For a single-case design to be an experimental design, it must meet the following three key elements of control required to draw cause-and-effect conclusions:

  1. Randomization (random assignment). Using single-case designs, each participant can be randomly assigned to experience many phases or treatments controlled by the researcher.
  2. Manipulation (of variables that operate in an experiment). The researcher must manipulate the phases or treatments that are experienced by each participant such that the factor or independent variable is not preexisting.
  3. Comparison/control group. Each participant acts as their own control or comparison. For the single-case designs described here, comparisons can be made across multiple baseline phases (reversal design), participants (multiple-baseline design), or treatments (changing-criterion design).

An advantage of analyzing the data one participant at a time is that it allows for the critical analysis of each individual measure, whereas averaging scores across groups can give a spurious appearance of orderly change. To illustrate this advantage, suppose that a researcher measures the body weight in grams of four rat subjects before and after an injection of a drug believed to cause weight loss. The hypothetical data, provided in Table 9.3, show that rat subjects as a group lost 25 grams on average. However, Rat C actually gained weight following the injection. An analysis of each individual rat could be used to explain this outlier; a grouped design would often disregard this outlier as “error” so long as weight loss was large enough on average.

The single-case design, which is also called the single-subject, single-participant, or small n design, is most often used in applied areas of psychology, medicine, and education.

Table 9.3 ⦁ The Value of an Individual Analysis

Subject    Baseline weight (g)    Weight following drug treatment (g)    Weight loss (g)
Rat A      320                    305                                    15
Rat B      310                    280                                    30
Rat C      290                    295                                    −5
Rat D      360                    300                                    60
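The arithmetic behind Table 9.3 can be checked with a short sketch. The weights are copied from the table; the variable names are illustrative only. The point is that the group average hides the one subject that gained weight.

```python
from statistics import mean

# Weights in grams, copied from Table 9.3: (baseline, after drug treatment).
weights = {
    "Rat A": (320, 305),
    "Rat B": (310, 280),
    "Rat C": (290, 295),
    "Rat D": (360, 300),
}

# Weight loss per subject (positive = lost weight, negative = gained weight).
loss = {rat: before - after for rat, (before, after) in weights.items()}

# Grouped view: one average across all subjects.
group_mean_loss = mean(loss.values())

# Individual view: which subjects moved in the opposite direction?
gainers = [rat for rat, grams in loss.items() if grams < 0]

print("Loss per subject:", loss)
print("Mean loss across the group:", group_mean_loss)
print("Subjects that gained weight:", gainers)
```

The group mean is 25 grams, matching the text, while the individual analysis flags Rat C as the only subject that gained weight.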

9.8 SINGLE-CASE BASELINE-PHASE DESIGNS

Single-case designs are typically structured by alternating baseline and treatment phases over many trials or observations. In this major section, we introduce three types of single-case experimental research designs:

  1. Reversal design
  2. Multiple-baseline design
  3. Changing-criterion design

Reversal Design

One type of single-case design, called the reversal design (or ABA design), involves observing a single participant prior to (A), during (B), and following (A) a treatment or manipulation. The reversal design is structured into phases, represented alphabetically with an A or a B. Each phase consists of many observations or trials. The researcher begins with a baseline phase (A), in which no treatment is given, then applies a treatment in a second phase (B), and again returns to a baseline phase (A) in which the treatment is removed. This type of research design can be represented as follows:

A (baseline phase) → B (treatment phase) → A (baseline phase)

A reversal design, or ABA design, is a single-case experimental design in which a single participant is observed before (A), during (B), and after (A) a treatment or manipulation.

A phase is a series of trials or observations made in one condition.

The baseline phase (A) is a phase in which a treatment or manipulation is absent.

If the treatment in Phase B causes a change in the dependent variable, then the dependent variable should change from baseline to treatment, then return to baseline levels when the treatment is removed. For example, we opened this section with the hypothesis that giving encouragement to students who are at risk of dropping out of school will keep them on task in the classroom. To test this hypothesis, we measured the time in minutes that an “at-risk” student spent on task in a class with no encouragement (baseline, A) for a few trials, then with encouragement (treatment, B) for a few trials, and again with no encouragement (baseline, A) for a few more trials. If the encouragement (the treatment) was successful, then the time (in minutes) spent on task would be higher when the encouragement was given but lower during the observation periods before and after when no encouragement was given. The second baseline phase minimizes the possibility of threats to internal validity. Adding another B and A phase would further minimize the possibility of threats to internal validity because the pattern of change would be repeated using multiple treatment phases.

A visual inspection of the data, and not inferential statistics, is used to analyze the data when only a single participant is observed. To analyze the data in this way, we look for two types of patterns that indicate that a treatment caused an observed change, as illustrated in Figure 9.13:

  1. A change in level is displayed graphically, as shown in Figure 9.13 (top graph), when the levels of the dependent variable in the baseline phases are obviously less than or greater than the levels of the dependent variable in the treatment phase.
  2. A change in trend is displayed graphically, as shown in Figure 9.13 (bottom graph), when the direction or pattern of change in the baseline phases is different from the pattern of change in the treatment phase. In the typical case, a dependent variable gradually increases or decreases in the treatment phase but is stable or does not change in the baseline phases.
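The text relies on visual inspection rather than statistics, but the two patterns can be made concrete with a rough numerical companion. The sketch below summarizes each phase of an ABA design by its mean (level) and its least-squares slope across trials (trend). The minutes-on-task values and the helper name `slope` are hypothetical, invented only for illustration.

```python
from statistics import mean


def slope(values):
    """Least-squares slope of values against trial index 0, 1, 2, ..."""
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical minutes on task per trial (not data from the text).
change_in_level = {"A1": [5, 5, 5, 5], "B": [15, 15, 15, 15], "A2": [5, 5, 5, 5]}
change_in_trend = {"A1": [5, 5, 5, 5], "B": [6, 8, 10, 12], "A2": [5, 5, 5, 5]}

for label, phases in [("level", change_in_level), ("trend", change_in_trend)]:
    for phase, values in phases.items():
        print(label, phase, "mean:", mean(values), "slope:", slope(values))
```

In the first data set the phases differ in mean but all slopes are zero (a change in level); in the second, only the treatment phase has a nonzero slope (a change in trend).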

The reversal design is typically conducted in applied areas of research to investigate possible solutions that can benefit individuals or society. For this reason, one advantage of the design is that it can be used to apply treatments that are beneficial to participants. Often this means that researchers will be asked by ethics committees to end their study with a treatment phase (B), which was the phase that was beneficial to the participant. For this reason, many reversal designs are at least four phases, or ABAB, so as not to return to baseline to end an experiment.

A limitation of the reversal design is that the change in a dependent variable in a treatment phase must return to baseline levels when the treatment is removed. However, in many areas of research, such as studies on learning, a return to baseline is not possible. When a participant is taught a new skill, for example, it is often not possible to undo what the participant learned—as fully expected, the behavior will not return to baseline. In these situations, when it is not possible for changes in a dependent variable to return to baseline, a reversal design cannot be used.

Figure 9.13 ⦁ Two Ways to Identify if a Treatment Caused Changes in a Dependent Variable

A change in level (top graph) and a change in trend (bottom graph) make it possible to infer that some treatment is causing an effect or a change in behavior.

Source: Republished with permission of John Wiley and Sons Inc, from Enhancing capacity to make sexuality-related decisions in people with an intellectual disability. Dukes, E. & McGuire, B. E., Journal of Intellectual Disability Research, 53 (8), 2009; permission conveyed through Copyright Clearance Center, Inc.

Multiple-Baseline Design

For situations in which it is not possible for changes in a dependent variable to return to baseline levels following a treatment phase, researchers can use the multiple-baseline design. The multiple-baseline design is a single-case design in which the treatment is successively administered over time to different participants, for different behaviors, or in different settings. This design allows researchers to systematically observe changes caused by a treatment without the need of a second baseline phase and can be represented as follows:

A multiple-baseline design is a single-case experimental design in which a treatment is successively administered over time to different participants, for different behaviors, or in different settings.

By representing the multiple-baseline design in this way, a case refers to a unique time, behavior, participant, or setting. Baseline periods are extended in some cases prior to giving a treatment. If the treatment causes an effect following a baseline phase for each case, then the change in level or pattern should begin only when the baseline phase ends, which is different for each case. If this occurs, then we can be confident that the treatment is causing the observed change. This design minimizes the likelihood that something other than the treatment is causing the observed changes if the changes in a dependent variable begin only after the baseline phase ends for each case.

To illustrate the multiple-baseline design, we will look at the research example illustrated in Figure 9.14. Dukes and McGuire (2009) used a multiple-baseline design to measure the effectiveness of a sex education intervention, which they administered to multiple participants with a moderate intellectual disability. The researchers recorded participant knowledge of sexual functioning using the Sexual Consent and Education Assessment (SCEA K-Scale; Kennedy, 1993), on which higher scores indicate greater ability to make decisions about sex. Each participant was given a baseline phase for a different number of weeks. Scores on the SCEA K-Scale were low in this baseline phase. As shown in Figure 9.14 for three participants, only after the baseline period ended and the intervention was administered did scores on the scale increase. Scores also remained high for 4 weeks after the program ended. Hence, the results showed a change in level from baseline to intervention for each participant.

Each participant in the sex education study received the intervention (or the treatment) in successive weeks: Tina (Week 11), Josh (Week 12), and Debbie (Week 13). Because the treatment was administered at different times, and changes in the dependent variable only occurred once the treatment was administered, the pattern showed that the treatment, and not other factors related to observing participants over time, caused the observed changes in SCEA K-Scale scores.
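The logic of the staggered onsets can be sketched in code. The onset weeks below come from the text; the weekly scores, the threshold of 5, and the helper name are hypothetical stand-ins, not the published data.

```python
# Week in which each participant began the intervention (from the text).
onset_week = {"Tina": 11, "Josh": 12, "Debbie": 13}

# Hypothetical weekly scores for weeks 9 through 15 (invented for illustration).
weeks = list(range(9, 16))
scores = {
    "Tina":   [2, 3, 8, 9, 9, 8, 9],
    "Josh":   [3, 2, 2, 8, 9, 8, 9],
    "Debbie": [2, 2, 3, 2, 9, 8, 9],
}


def first_week_at_or_above(series, threshold):
    """Return the first week whose score reaches the threshold, else None."""
    for week, score in zip(weeks, series):
        if score >= threshold:
            return week
    return None


# If scores rise only at each participant's own onset week, an event shared by
# all participants (history, maturation) is an unlikely explanation.
change_week = {name: first_week_at_or_above(s, 5) for name, s in scores.items()}
staggered = change_week == onset_week

print("First week at or above threshold:", change_week)
print("Change coincides with each onset:", staggered)
```

Had all three series risen in the same week regardless of onset, the comparison would fail, pointing to something other than the treatment.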

Figure 9.14 ⦁ Results from a Multiple-Baseline Design for Three Participants Receiving a Sex Education Intervention

Source: Republished with permission of John Wiley and Sons Inc, from Enhancing capacity to make sexuality-related decisions in people with an intellectual disability. Dukes, E. & McGuire, B. E., Journal of Intellectual Disability Research, 53 (8), 2009; permission conveyed through Copyright Clearance Center, Inc.

The advantage of a multiple-baseline design is that it can be used when we do not expect the dependent variable to return to baseline after a treatment, such as when we study learning on some measure, as illustrated in Figure 9.14 for our example. The limitation of a multiple-baseline design is that the design is used when only a single type of treatment is administered. This same limitation applies to the reversal design. For situations when we want to administer successive treatments, then, we require a different type of single-case experimental design.

A and B indicate the phases in a reversal design.

The length of the baseline phase is varied using a multiple-baseline design.

Changing-Criterion Design

For research situations in which we want to change a criterion or treatment after the participant meets an initial criterion or responds to one particular treatment, we can use a changing-criterion design. Using the changing-criterion design, we begin with a baseline phase, which is followed by many successive treatment phases to determine if participants can reach different levels or criteria in each treatment phase. The criterion can be changed as often as necessary or until some final criterion is met. For a three-treatment study, the changing-criterion design can be represented as follows:

A changing-criterion design is a single-case experimental design in which a baseline phase is followed by successive treatment phases in which some criterion or target level of behavior is changed from one treatment phase to the next. The participant must meet the criterion of one treatment phase before the next treatment phase is administered.

To illustrate the changing-criterion design, we will look at the research example illustrated in Figure 9.15. Gentry and Luiselli (2008) used the changing-criterion design to increase the number of bites that Sam, a fictitious name for the 4-year-old boy being observed, would take of a nonpreferred food (i.e., a food he did not like) during a supper meal. In a baseline phase, Sam ate the food with no manipulation. Then a series of manipulations followed. Sam was instructed to spin an arrow that would fall on a number indicating the number of bites of a nonpreferred food that Sam would need to consume during supper to gain a reward, which in this study was his favorite play activity. The initial criterion was a spinner with a 1 and a 2 on it. This criterion was increased over time, until the options on the spinner were 5 and 6 (bites) to meet the criterion to gain a reward. As shown in Figure 9.15, each time the criterion, or the number of bites required to gain a reward, was increased, Sam’s eating behavior correspondingly increased.

Two advantages of the changing-criterion design are that it does not require a reversal to baseline of an otherwise effective treatment and that it enables experimental analysis of a gradually improving behavior. A limitation of the design is that the target behavior must already be in the participant’s repertoire. For example, the number of bites of food is well within the abilities of a healthy child. In addition, researchers should be cautious to not increase or decrease the criterion too soon or by too much, which may impede the natural learning rate of the participant being observed.
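As a minimal sketch of how data from a changing-criterion design might be checked, the code below asks two questions: did the participant meet the criterion in every treatment phase, and did behavior rise each time the criterion rose? The minimum criteria follow the spinner ranges listed in the Figure 9.15 caption; the bite counts per meal are hypothetical and are not the values reported by Gentry and Luiselli (2008).

```python
from statistics import mean

# (phase label, minimum bites required for the reward, hypothetical bites per meal)
phases = [
    ("baseline", 0, [0, 0, 0]),
    ("criterion 1-2", 1, [1, 2, 2]),
    ("criterion 2-3", 2, [2, 3, 3]),
    ("criterion 3-5", 3, [3, 4, 5]),
    ("criterion 4-6", 4, [5, 4, 6]),
    ("criterion 5-6", 5, [5, 6, 6]),
]

# Did every meal in a phase reach that phase's minimum criterion?
met = {label: all(b >= minimum for b in bites) for label, minimum, bites in phases}

# Does the average number of bites rise from each phase to the next?
phase_means = [mean(bites) for _, _, bites in phases]
tracks_criterion = all(a < b for a, b in zip(phase_means, phase_means[1:]))

print("Criterion met in each phase:", met)
print("Behavior rises with each criterion change:", tracks_criterion)
```

Behavior that steps up in line with each new criterion, rather than drifting upward on its own schedule, is what supports attributing the change to the treatment.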

Each successive treatment phase in a changing-criterion design is associated with a change in criterion.

Figure 9.15 ⦁ A Changing-Criterion Design to Increase the Number of Bites of Nonpreferred Food for a Single Child (Sam)

At baseline, Sam ate no bites, and then Sam spun an arrow that displayed different criteria for a reward. He began with 1–2 bites, then 2–3 bites, 3–5 bites, 4–6 bites, and finally 5–6 bites in order to receive the reward. The changing criterion is highlighted in each treatment phase. Notice that as the criterion was increased, Sam increased the number of bites he took of nonpreferred food. Data based on those presented by Gentry and Luiselli (2008).

Learning Check 3 ✓

  1. Why is the single-case design regarded as an experimental research design?
  2. Identify whether each of the following is an example of a reversal design, a multiple-baseline design, or a changing-criterion design:
     A. A researcher gives a child successively greater levels of positive reinforcement after an initial baseline phase to reduce how often the child bites their nails. The successive treatments are administered until the child has reached a level where they are no longer biting their nails.
     B. A researcher records the duration of time a participant stays on task in a dance recital 4 days before, 4 days during, and 4 days after a behavioral intervention strategy is implemented.
     C. A researcher records the quality of artistic strokes made by three participants. Each participant was given a treatment phase after 3, 4, or 5 days of a baseline phase; no baseline phase was given after the treatment was administered.
  3. For a single-case experimental study, why would a researcher use a multiple-baseline design instead of a reversal design?

Answers:

  1. Because it meets the three key elements of control required to demonstrate cause and effect: randomization, manipulation, and comparison; 2. A. Changing-criterion design, B. Reversal design, C. Multiple-baseline design; 3. A multiple-baseline design would be used when it is not possible for changes in a dependent variable to return to baseline.

9.9 VALIDITY, STABILITY, MAGNITUDE, AND GENERALITY

The analysis of single-case experimental research designs is based largely on a visual inspection of the data in a graph and is not based on statistical analyses that require data to be grouped across multiple participants or groups. The specific visual features in a graph that indicate the validity of an observation are described in this section.

Internal Validity, Stability, and Magnitude

Recall from Chapter 6 that internal validity is the extent to which we can demonstrate that a manipulation or treatment causes a change in a dependent measure. Importantly, the extent to which we establish experimental control of all other possible causes is directly related to the internal validity of a research study. The greater the control we establish, the higher the internal validity.

A single-case design requires a visual analysis of the graphical data of a single participant. The level of control and therefore the internal validity of a single-case design can be determined when the following two features are observed in a graph using this type of analysis:

  1. The stability in the pattern of change across phases
  2. The magnitude or size of the change across phases

Stability is the consistency in the pattern of change in a dependent measure in each phase of a design. The more stable or consistent changes in a dependent measure are in each phase, the higher the internal validity of a research design.

In a visual inspection of a graph, the stability of a measure is indicated by the consistency in the pattern of change in each phase. The stability of a dependent measure is illustrated in Figure 9.16. Data in a given phase can show a stable level (as in Figure 9.16a), can show a stable trend (as in Figure 9.16b), or can be unstable (as in Figure 9.16c). The stability of a measure in each phase is important because when a measure is unstable, changes are occurring in a dependent variable even when the researcher is not manipulating the behavior. When a dependent measure is stable, we can be confident that any changes in level or trend were caused by the manipulation, because changes only occurred between each phase and were otherwise stable or consistent within each phase. Therefore, the more stable a measure, the greater the control and the higher the internal validity in an experiment.

Figure 9.16 ⦁ A Stable Level (a), a Stable Trend (b), and an Unstable Response (c)

Graphs (a) and (b) show a response that indicates high internal validity, whereas graph (c) indicates low internal validity.

Another level of control can be demonstrated by the magnitude of change, which is the size of the change in a dependent measure observed between phases. When a measure is stable within each phase, we look at the magnitude of changes between phases. For a treatment to be causing changes in a dependent measure, we should observe immediate changes as soon as the treatment phase is administered. We can observe an immediate change in level (as shown in Figure 9.17a), or we can observe an immediate change in trend (as shown in Figure 9.17b). The greater the magnitude of changes between phases, the greater the control and the higher the internal validity in a single-case experiment.

Magnitude is the size of the change in a dependent measure observed between phases of a design. The larger the magnitude of changes in a dependent measure between phases, the higher the internal validity of a research design.

Internal validity is related to the stability and magnitude of change across phases in a single-case design.

Figure 9.17 ⦁ Internal Validity and Control

The graphs identify an immediate change in level (top row, a) or a change in trend (bottom row, b) that would indicate a high level of control and high internal validity.
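Visual inspection remains the primary analysis for single-case data, but the ideas of level, trend, stability, and magnitude described above can also be sketched numerically. The following is a minimal illustration, not a standard procedure: the data are hypothetical, and the function names `slope` and `summarize_phase` are introduced here only for this example. Within each phase, the mean stands in for the level, the least-squares slope for the trend, and the standard deviation for how much the measure varies around its level (a rough stand-in for stability when the data are expected to show a stable level). The difference in level between phases stands in for magnitude.

```python
from statistics import mean, pstdev


def slope(values):
    """Least-squares slope of values plotted against session index 0, 1, 2, ...

    Assumes at least two observations in the phase.
    """
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    numerator = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(values))
    denominator = sum((i - x_bar) ** 2 for i in range(n))
    return numerator / denominator


def summarize_phase(values):
    """Describe one phase: its level, its trend, and its spread around the level."""
    return {
        "level": mean(values),     # average response in the phase
        "trend": slope(values),    # change per session within the phase
        "spread": pstdev(values),  # smaller spread suggests a more stable level
    }


# Hypothetical data: nail-biting episodes per day for one participant.
baseline = [10, 11, 10, 9, 10]
treatment = [4, 5, 4, 4, 3]

base = summarize_phase(baseline)
treat = summarize_phase(treatment)

# Magnitude of change in level between phases (negative means a reduction).
change_in_level = treat["level"] - base["level"]

print("Baseline:", base)
print("Treatment:", treat)
print("Change in level:", change_in_level)
```

In this hypothetical case, both phases show little spread and a nearly flat trend, and the level drops sharply as soon as treatment begins, which is the pattern the text associates with high internal validity. The numbers only summarize what a graph would show; they do not replace the visual analysis.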

External Validity and Generality

Recall from Chapter 6 that external validity is the extent to which observations generalize beyond the constraints of a study. A single-case design is typically associated with low population validity, which is a subcategory of external validity. In other words, it is not possible to know whether the results in the sample would also be observed in the population from which the sample was selected because single-case experimental designs are associated with very small sample sizes. However, the results in a single-case design can have high external validity in terms of generalizing across behaviors, across subjects or participants, and across settings. The following is an example of each way to generalize results to establish the external validity of a single-case experiment:

As an example of generalizing across behaviors, a psychotherapist may examine the extent to which the causes of spousal abuse also cause child abuse. In this example, the therapist generalizes across behaviors, from spousal abuse (Behavior 1) to child abuse (Behavior 2).

As an example of generalizing across subjects or participants, an animal researcher may examine the generality of foraging behavior across multiple rat subjects, or a clinical researcher may examine the effectiveness of a behavioral therapy to improve symptoms of depression across multiple participants. In each case, the researcher is generalizing across multiple subjects or participants.

As an example of generalizing across settings, a child psychologist may want to determine the extent to which characteristics of child play behavior during recess generalize to characteristics of play behavior during class time. In this example, the researcher generalizes across settings, from child play behavior during recess (Setting 1) to child play behavior during class time (Setting 2).

External validity is related to the generality of findings in a single-case design.

9.10 Ethics in Focus: The Ethics of Innovation

Many single-case experiments look at early treatments for behavioral disorders or simply bad habits such as smoking or nail biting. When these types of behaviors are studied using a single-case design, the treatment is typically hypothesized to have benefits, such as reducing symptoms of the behavioral disorder or reducing the frequency of bad habits. Researchers will end an experiment with the treatment phase that was most beneficial, so as to maximize the benefits that participants receive. In a reversal design, this means that researchers end the study in a B phase (e.g., ABAB). A multiple-baseline design and a changing-criterion design already end in a treatment phase. Adding a treatment phase or otherwise adapting a single-case design is quite manageable for researchers because they observe only one or a few subjects or participants in a single-case experiment. Observing such a small sample size allows researchers the flexibility to make changes, such as when they add or omit treatments to maximize benefits to participants.

The flexibility of a single-case design also allows for greater “investigative play” (Hayes, 1981, p. 193) or greater freedom to ask innovative or new questions about treatments with unknown causes or with unknown costs or benefits. Single-case designs allow for the conduct of such innovative research to rigorously evaluate potential, yet untested, treatments with small samples; this allows researchers to test the treatment without exposing such a treatment to large groups of participants, particularly when the potential costs of implementing such a treatment are largely unknown or untested. In this way, single-case designs can be used as an initial research design for testing some of the most innovative research in the behavioral sciences.

Learning Check 4 ✓

  1. Perform a visual inspection of the following data. Does the graph illustrate a study with high internal validity? Explain.

  2. A researcher uses a single-case design to record the number of minutes spent studying in a baseline phase and a calming music treatment phase with a student who studied in a library and the same student who studied in a college dormitory room. Based on this description, can the researcher generalize across behaviors, across participants, or across settings?
  3. Single-case designs allow for greater freedom to ask innovative or new questions about treatments with unknown causes or with unknown costs or benefits. Why can a single-case design be an ethically appropriate research design to test the effectiveness of such treatments?

Answers:

  1. Yes, because the data at baseline are stable, and there is a change in trend from baseline to treatment; 2. Generalize across settings; 3. Because single-case designs are used with small samples, thereby testing the treatment without exposing such a treatment to large groups of participants.

Chapter Summary

LO 1 Define and identify a quasi-experiment and a quasi-independent variable.

  1. A quasi-experimental research design is structured similar to an experiment, except that this design lacks random assignment, includes a preexisting factor (i.e., a variable that is not manipulated), or does not include a comparison/control group.
  2. A quasi-independent variable is a preexisting variable that is often a characteristic inherent to an individual, which differentiates the groups or conditions being compared in a research study. Because the levels of the variable are preexisting, it is not possible to randomly assign participants to groups.

LO 2 Identify and describe two one-group quasi-experimental research designs: the posttest-only and pretest-posttest designs.

  1. The one-group posttest-only design is a quasi-experimental research design in which a dependent variable is measured for one group of participants following a treatment.
  2. The one-group pretest-posttest design is a quasi-experimental research design in which the same dependent variable is measured in one group of participants before and after a treatment is administered.

LO 3 Identify and describe two nonequivalent control group quasi-experimental research designs: the posttest-only and pretest-posttest designs.

  1. A nonequivalent control group is a control group that is matched upon certain preexisting characteristics similar to those observed in a treatment group, but to which participants are not randomly assigned. When a nonequivalent control group is used, selection differences can potentially explain an observed difference between an experimental and a nonequivalent control group.
  2. The nonequivalent control group posttest-only design is a quasi-experimental research design in which a dependent variable is measured following a treatment in one group and is compared with a nonequivalent control group that does not receive the treatment.
  3. The nonequivalent control group pretest-posttest design is a quasi-experimental research design in which a dependent variable is measured in one group of participants before (pretest) and after (posttest) a treatment, and that same dependent variable is also measured at pretest and posttest in a nonequivalent control group that does not receive the treatment.

LO 4 Identify and describe three time series quasi-experimental research designs: basic, interrupted, and control designs.

  1. The basic time series design is a quasi-experimental research design in which a dependent variable is measured at many different points in time in one group before and after a treatment that is manipulated by the researcher is administered.
  2. The interrupted time series design is a quasi-experimental research design in which a dependent variable is measured at many different points in time in one group before and after a treatment that naturally occurs.
  3. A control time series design is a basic or interrupted time series quasi-experimental research design that also includes a nonequivalent control group that is observed during the same period as a treatment group but does not receive the treatment.

LO 5 Identify and describe three developmental quasi-experimental research designs: longitudinal, cross-sectional, and cohort-sequential designs.

  1. A longitudinal design is a developmental research design used to study changes across the life span by observing the same participants over time and measuring the same dependent variable each time.
  2. A cross-sectional design is a developmental research design in which participants are grouped by their age and participant characteristics are measured in each age group. Each age group is a cohort, so this design is prone to cohort effects, which occur when unique characteristics in each cohort can potentially explain an observed difference between groups.
  3. A cohort-sequential design is a developmental research design that combines longitudinal and cross-sectional techniques by observing different cohorts of participants over time at overlapping times.

LO 6 Define the single-case experimental design.

The single-case experimental design is an experimental research design in which a participant serves as their own control and the dependent variable measured is analyzed for each individual participant and is not averaged across groups or across participants. This design meets the three requirements to demonstrate cause and effect: randomization, manipulation, and comparison/control.

LO 7 Identify and describe three types of single-case research designs: the reversal, multiple-baseline, and changing-criterion designs.

  1. The reversal design is a single-case experimental design in which a single participant is observed before (A), during (B), and after (A) a treatment or manipulation.
  2. The multiple-baseline design is a single-case experimental design in which a treatment is successively administered over time to different participants, for different behaviors, or in different settings.
  3. The changing-criterion design is a single-case experimental design in which a baseline phase is followed by successive treatment phases in which some criterion or target level of behavior is changed from one treatment phase to the next. The participant must meet the criterion of one treatment phase before the next treatment phase is administered.

LO 8 Identify in a graph the stability and magnitude of a dependent measure and explain how each is related to the internal validity of a single-case design.

  1. The stability of a measure is the consistency in the pattern of change in a dependent measure in each phase of a design. The more stable or consistent changes in a dependent measure are in each phase, the higher the internal validity of a research design.
  2. The magnitude of change in a measure is the size of the change in a dependent measure observed between phases of a design. A measure can have a change in level or a change in trend. The larger the magnitude of change, the greater the internal validity of a research design.

LO 9 Identify three ways that researchers can strengthen the external validity of a result using a single-case design.

A single-case design is typically associated with low population validity (a subcategory of external validity). However, three ways that researchers can strengthen the external validity of a result using a single-case design are to generalize across behaviors, across subjects or participants, and across settings.

REVIEW QUESTIONS

A quasi-experimental research design is structured similar to an experiment, with what two exceptions?

State whether each of the following factors is an example of an independent variable or a quasi-independent variable. Only state “quasi-independent variable” for participant variables that cannot be manipulated.

The age of participants

Time allotted for taking an exam

A participant’s work experience

Time of day a study is conducted

A participant’s state of residence

Amount of sugar added to a drink

How does a one-group pretest-posttest design improve on the posttest-only quasi-experimental design? What is the major limitation of all one-group designs?

What is a nonequivalent control group, and why does this type of group make it difficult to determine cause and effect using a nonequivalent control group quasi-experimental design?

What is the key difference between the basic and interrupted time series quasi-experimental research designs?

Name the developmental research design described in each of the following examples:

A researcher measures job satisfaction in a sample of employees on their first day of work and again 1 year later.

A researcher records the number of nightmares per week reported in a sample of 2-year-old, 4-year-old, and 8-year-old foster children.

(A) Cohort effects are a threat to what type of validity? (B) Which developmental research design is most susceptible to cohort effects?

Why is the single-case design regarded as an experimental research design?

A reversal design is used to test the hypothesis that low lighting in a room reduces how quickly students read. As shown in Graph 1 for one student, a student reads passages of similar length in a room with normal lighting (baseline), then in the same room with dim lighting (treatment), and then again with normal lighting. Do the results shown in the figure support the hypothesis? Explain.

What is the most likely reason that a researcher uses a multiple-baseline design instead of a reversal design?

Define the changing-criterion design and explain when the design is used.

Are the baseline data shown in Graph 2 stable? Do the baseline data in the figure indicate high or low internal validity?

A researcher examines the generality of a behavioral treatment for overeating by testing the same treatment to treat overworking. In this example, is the researcher generalizing across behaviors, across participants, or across settings?

A researcher examines if the effectiveness of a new learning system used in a classroom is also effective when used in a home (for homeschooled children). In this example, is the researcher generalizing across behaviors, across participants, or across settings?

ACTIVITIES

Use an online database, such as PsycINFO, to search scientific research articles for any topic you are interested in. Perform two searches. In the first search, enter a search term related to your topic of interest, and enter the term longitudinal to find research that used this design in your area of interest. Select and print one article. In the second search, again enter a search term related to your topic of interest, and this time enter the term cross-sectional to find research that used this design in your area of interest. Again, select and print one article. Once your searches are complete, complete the following assignment:

Write a summary of each article, and explain how each research design differed.

Describe at least two potential threats to internal validity in each study.

Include the full reference information for both articles at the end of the assignment.

A researcher proposes that having a pet will improve health.

Write a research plan to test this hypothesis using a single-case experimental design.

What is the predicted outcome or pattern, if the hypothesis that having a pet will improve health were correct?

Graph the expected results.

Identify the extent to which your results demonstrate high or low internal validity.

SURVEY AND CORRELATIONAL RESEARCH DESIGNS

You have probably made or heard the popular comment “Is it just me, or [fill in the blank here]?” This question is really a survey that asks others to indicate their level of agreement with some viewpoint, for example, “Is it just me, or is it hot in here?” or “Is it just me, or was this exam difficult?” We largely ask such questions to gauge the opinions of others. Many examples likely occur every day, from completing a customer satisfaction survey to asking your friends what they plan to order at a restaurant to get a better idea of what you might want to order. Really, we could survey people to measure all sorts of constructs, including love, attachment, personality, motivation, cognition, and many other constructs studied by behavioral scientists.

We can also identify how constructs such as love, attachment, personality, motivation, and cognition are related to other factors or behaviors such as the likelihood of depression, emotional well-being, and physical health. In everyday situations, you may notice relationships between temperature and aggression (e.g., the hotter it is outside, the more often you see people fighting at a sports stadium) or between class participation and grades (e.g., students with higher grades tend to also participate more in class).

Hence, there is a natural tendency for us to engage the world under the assumption that behavior does not occur in isolation. Instead, behavior is related to or influenced by other factors in the environment. It is therefore not uncommon at all for humans to observe the world by asking people to answer questions about themselves or by observing how human behavior is related to other factors such as health and well-being. The same is true in science. In this chapter, we describe how we can use the scientific method to evaluate or survey participant responses and identify relationships between factors.
SURVEY DESIGNS

Many research designs can be used to test the same hypotheses. This chapter is separated into two major sections; each section describes a nonexperimental research design: survey designs and correlational designs. To introduce how each design can be used to test the same hypothesis, we begin each major section by developing a new research design to test the same hypothesis.

Suppose we hypothesize that texting while driving is more prevalent among younger age groups, as has been tested in the published literature (Hayashi, Rivera, Modico, Foreman, & Wirth, 2017; Quisenberry, 2015; Srinivas, White, & Omar, 2014). We could use a survey research design by asking a sample of young college students who drive to indicate in a questionnaire how often they use text messaging while driving (per month). If the hypothesis is correct and we set up this study correctly, we should find that a high percentage of young drivers use text messaging while driving. We will return to this hypothesis with a new way to answer it when we introduce correlational designs. We begin this chapter with an introduction to the research design that was illustrated here: the survey research design.

8.1 AN OVERVIEW OF SURVEY DESIGNS

A nonexperimental research design used to describe an individual or a group by having participants complete a survey or questionnaire is called the survey research design. A survey, which is a common measurement tool in the behavioral sciences, is a series of questions or statements to which participants indicate responses. A survey can also be called a questionnaire or self-report because many surveys specifically include questions in which participants report about themselves: their attitudes, opinions, beliefs, activities, emotions, and so on.

The survey research design is the use of a survey, administered either in written form or orally, to quantify, describe, or characterize an individual or a group.
A survey can be administered in printed form, or it can be distributed orally in an interview. While a survey can be used as a measurement tool in many research designs, the survey research design specifically refers to the use of surveys to quantify, describe, or characterize an individual or a group. In this chapter, we introduce the types and writing of questions included in surveys, describe how to administer surveys, and discuss some limitations associated with using surveys in the behavioral sciences.

A survey is a series of questions or statements, called items, used in a questionnaire or an interview to measure the self-reports or responses of respondents.

8.2 TYPES OF SURVEY ITEMS

A survey consists of many questions or statements to which participants respond. A survey is sometimes called a scale, and the questions or statements in the survey are often called items. As an example of a scale with many items, the Estimated Daily Intake Scale for Sugar (EDIS-S) was developed as an 11-item scale (Privitera & Wallace, 2011). Hence, the scale or survey has 11 items or statements to which participants respond on a 7-point scale from 1 (completely disagree) to 7 (completely agree). Notice that each item, listed in Table 8.1, is a statement about how much sugar participants consume in their diets.

There are three types of questions or statements that can be included in a survey: open-ended items, partially open-ended items, and restricted items. Each type of item is described here.

Open-Ended Items

When researchers want participants to respond in their own words to a survey item, they include an open-ended item in the survey. An open-ended item is a question or statement that is left completely “open” for response. It allows participants to give any response they feel is appropriate with no limitations.
For example, the following four items are open-ended questions that were asked in a focus group, which is largely a guided discussion among a targeted group of participants to explore a topic. In this study, the aim was to understand the reasons or nature of binge-watching television shows (Flayelle, Maurage, & Billieux, 2017, p. 471):

  1. What are your watching habits and practices regarding TV series?
  2. Why do you indulge in TV series watching?
  3. How do you feel right after watching an episode?
  4. Do you sometimes consider yourself as a “TV series addict”?

An open-ended item is a question or statement in a survey that allows the respondent to give any response in their own words, without restriction.

Table 8.1 ⦁ The 11 Items for the EDIS-S

Item  Statement
1.    I tend to eat cereals that have sugar in them.
2.    I tend to put a lot of syrup on my pancakes or waffles.
3.    I often eat candy to snack on when I am hungry.
4.    I tend to crave foods that are high in sugar.
5.    I tend to snack on healthier food options.
6.    I tend to consume a low-sugar diet.
7.    I often snack on sugary foods when I am hungry.
8.    When I crave a snack, I typically seek out sweet-tasting foods.
9.    I tend to eat foods that are most convenient, even if they contain a lot of sugar.
10.   I like consuming sweet-tasting foods and drinks each day.
11.   I tend to avoid consuming a high-sugar diet.

Open-ended items can also be given as a statement and not a question. For example, the researchers could have asked participants in the focus group to respond to the following survey item: “Describe an experience you had while binge-watching a TV series and describe the emotions you felt during that experience.” In this example, the open-ended item is phrased as a statement and not a question; however, the response will still be open ended. Open-ended items are most often used with the qualitative research design because the responses in the survey are purely descriptive.
For example, one descriptive result from the study by Flayelle et al. (2017) was that while most participants acknowledged that binge-watching TV shows can be addicting, they did not recognize themselves as being “addicted” in terms of a diagnosis. The focus group study by Flayelle et al. (2017) was purely descriptive; it was a qualitative research study.

For quantitative research designs, the challenge with using open-ended questions or items is in coding the open-ended responses of participants. It is difficult to anticipate how participants will respond to an open-ended item, so the researcher must develop methods to code patterns or similarities in participant responses. Coding the responses to open-ended items, however, requires researchers to do both of the following:

  1. Tediously anticipate and list all possible examples of potential responses in terms of how participants might write or express their responses.
  2. Use multiple raters and additional statistical analyses to make sure the coding is accurate.

For the reasons listed here, open-ended survey items are not often used in quantitative research, with partially open-ended or restricted items being favored among quantitative researchers. That said, when the response to an open-ended item is numeric, these questions can be readily used with quantitative research. For example, a survey item can ask for demographic information (age in years, income in dollars, height in inches or centimeters, weight in pounds or kilograms) or behavioral patterns. A survey used to assess social media and website behavior could ask the following: “Not counting e-mail, about how many minutes or hours per week do you use the web?” (Stern, Bilgen, McClain, & Hunscher, 2017, p. 718). As long as the unit of measurement is defined for the respondent (e.g., years, dollars, minutes), such answers will be readily quantified.
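To make the idea of checking coding accuracy with multiple raters concrete, the sketch below computes Cohen's kappa, one commonly used index of agreement between two raters that corrects for agreement expected by chance. The text does not name a specific statistic, so the choice of kappa is an assumption made for illustration. The category labels ("escape," "social," "habit") and the ratings are hypothetical and do not come from any study cited here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one category per response."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same nonempty set of responses.")
    n = len(rater_a)

    # Proportion of responses the two raters coded identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1:
        # Both raters used a single identical category for every response.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two raters to eight open-ended responses.
rater_a = ["escape", "escape", "social", "habit", "social", "escape", "habit", "social"]
rater_b = ["escape", "social", "social", "habit", "social", "escape", "habit", "habit"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this hypothetical example the raters agree on 6 of 8 responses (75%), but because some agreement would occur by chance, kappa is lower than the raw agreement rate. A low kappa would suggest that the coding categories need clearer definitions before the responses are analyzed.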
Partially Open-Ended Items Researchers can include items, called partially open-ended items, which give participants a few restricted answer options and then a last one that allows participants to respond in their own words in case the few restricted options do not fit with the answer they want to give. The open-ended option is typically stated as “other” with a blank space provided for the participant’s open-ended response. For example, another focus group study evaluated what educators and students thought about using patients as teachers in medical education (Jha, Quinton, Bekker, & Roberts, 2009). Their study included the following partially open-ended item: A partially open-ended item is a question or statement in a survey that includes a few restricted answer options and then a last one that allows participants to respond in their own words in case the few restricted options do not fit with the answer they want to give. In what capacity do you (students or faculty or other) view the role of patients as teachers? 1. Teaching 2. Assessment 3. Curriculum development 4. Other ____ (Jha et al., 2009, p. 455) In this item, participants either chose an option provided (teaching, assessment, or curriculum development) or provided their own open-ended response (other ___). For the researchers, it is easier to manage the participant responses, or data, when an open-ended item includes a few restricted options. To enter participant responses, researchers can code each answer option as a number. The last open-ended option could be coded further, or just analyzed without further coding. For example, we could report only the percentage of participants choosing the last open-ended option, without analyzing the specific open-ended responses given. In this way, coding and Open-ended items are largely used in qualitative research; restricted items are largely used in quantitative research. 
Restricted Items The most commonly used survey item in quantitative research, called a restricted item, includes a restricted number of answer options. A restricted item does not give participants an option to respond in their own words; instead, the item is restricted to the finite number of options provided by the researcher. Restricted items are often given with a rating scale, which is often referred to as a Likert scale when the scale varies between 5 and 7 points. A Likert scale, named after Rensis Likert (1932) who was the first to use such a scale, is a finite number of points for which a participant can respond to an item in a survey, typically between 5 and 7 points. A restricted item, also called a closed-ended item, is a question or statement in a survey that includes a restricted number of answer options to which participants must respond. A Likert scale is a numeric response scale used to indicate a participant’s rating or level of agreement with a question or statement. Two common applications of rating scales are to have participants use the scale to describe themselves or to indicate their level of agreement. For example, Becker, Helseth, Frank, Escobar, and Weeks (2018) used the following 5-point rating scale to select participants to their study based on their level of concern for their child’s substance use (each number value is a “point” on the scale): analyzing partially open-ended items can be less tedious than for open-ended items. How concerned are you about your child’s substance use? (Circle one)    1 2 3 4 5 Not concerned at all Extremely concerned As another example of using a rating scale to indicate a level of agreement, Nagels, Kircher, Steines, Grosvald, and Straube (2015) used the following 5-point rating scale for an item included in an assessment of individual differences in gesture perception and production: I like talking to people who gesture a lot when they talk. 
(Circle one)
1 (Not agree)   2   3   4   5 (Fully agree)

The rating scale itself does not necessarily have to be numeric. For example, many young children cannot count, let alone use numeric scales to describe themselves. In these cases, pictorial scales, such as the “self-assessment manikin” (SAM) shown in Figure 8.1 (Bradley & Lang, 1994), can be used. This type of scale can be used with children, such as in a study in which children used the SAM scale to indicate their mood or how they feel, or how much they like foods or drinks they consume (Privitera, 2016). The scale can also be used with adults. For example, Kunze, Arntz, and Kindt (2015) used the SAM scale to evaluate participant responses to a variety of stimuli used in a study on fear conditioning. In each study, the researchers coded responses using a numeric scale from 1 (saddest) to 5 (happiest) so that responses with the images could be recorded as numeric values and then analyzed.

Figure 8.1 ⦁ A Pictorial Rating Scale Used to Measure Emotion and Liking in Children
Source: Adapted from Bradley and Lang (1994).

The main advantage of using restricted items is that survey responses can be easily entered or coded for the purposes of statistical analysis. The main limitation of using restricted items is that the analysis is restricted to the finite number of options provided to participants. However, when the options available to participants are exhaustive of all options they could choose, this limitation is minimal.

Learning Check 1 ✓

1. State the type of survey item for each of the following items:
A. How do you feel about the effectiveness of your professor’s teaching style?
B. On a scale from 1 (very ineffective) to 7 (very effective), how would you rate your professor?
C. Is your professor’s greatest strength their (a) timeliness to class, (b) knowledge of the material, (c) concern for students, or (d) other ___ (please explain)?
2. A __________ is a numeric response scale used to indicate a participant’s rating or level of agreement with a question or statement.

Answers: 1. A. Open-ended item, B. Restricted item, C. Partially open-ended item; 2. rating scale or Likert scale.

8.3 RULES FOR WRITING SURVEY ITEMS

Writing survey items is a thoughtful endeavor. The items that you write must be valid and reliable. In other words, the items must actually measure what you are trying to measure (valid), and the responses in the survey should be consistently observed across participants and over time (reliable). When an item or a measure is not valid and reliable, it is often due to a measurement error, or variability in responding due to poorly written survey items. In this section, we describe the following 10 rules used to write valid and reliable survey items that can minimize the likelihood of measurement error:

1. Keep it simple.
2. Avoid miscategorizing response items.
3. Avoid double-barreled items.
4. Use neutral or unbiased language.
5. Minimize the use of negative wording.
6. Avoid the response set pitfall.
7. Use rating scales consistently.
8. Limit the points on a rating scale.
9. Label or anchor the rating scale points.
10. Minimize item and survey length.

Keep It Simple

Everyone who takes a survey should be able to understand it. The best strategy is to use less than a high school–level vocabulary in writing the survey items. We use this strategy to make sure that participants’ responses reflect their actual responses and are not given because they are confused about what the question is asking. For example, we could have participants rate how full they feel by asking, “How satiated do you feel?” However, some participants may not know that to be satiated means to have a satisfied appetite, so it would be better to plainly ask, “How full do you feel?” In sum, keep the language simple.

Use simple words or language in a survey.
Avoid Miscategorizing Response Items

Miscategorizing response items can occur both in terms of (1) creating too many or too few categories and (2) using categories when you should avoid using them at all. Let us look at each way in which miscategorization can be avoided.

The first way in which miscategorization can occur is the creation of too many or too few categories for a response. As an example of too few categories, suppose you ask participants to identify their political affiliation, and you allow them two options or categories to reply to: Republican, Democrat. This is too few categories because other affiliations exist (e.g., Moderate, Independent) that should also be added as options. Likewise, you can have too many categories. As a general rule, you should try to limit responses to no more than nine categories, unless more are necessary. For example, education level can certainly be captured well within nine categories, unless it is important to categorize beyond this (e.g., a survey evaluating each grade level from K-12).

The second way miscategorization can occur is the use of categories that should have been avoided altogether. Ultimately, you need to be aware of the analysis options for a survey. Interval and ratio scale data (“scaled” data) are more informative and allow for greater flexibility in terms of analysis than do nominal data (“categorical” data). For this reason, it is often best to avoid using categories when a response otherwise could be numeric.
In these cases, instead of listing categorical response options, it is often best to simply state the item as an open-ended item (e.g., age in years, income in dollars, height in inches or centimeters, weight in pounds or kilograms, or time in minutes) or give it as a Likert-type scale (e.g., political affiliation on a 7-point scale from 1 being extremely liberal to 7 being extremely conservative; for an example, see American National Election Studies, 2015). The critical step is to make sure to define the unit of measurement for the response. If you categorize response items first, then you never actually collect the data points within those ranges. For example, you cannot know the actual income of someone who selected a range within which their actual income falls. However, if you record the actual data for responses that do not need to be categorized (e.g., record the actual income of each respondent), then you can always go back and group or categorize the data later. By having the original data, you have more flexibility in terms of options later for analyzing the results of your survey.

Avoid listing too few or too many response categories and avoid using categories if responses could otherwise be scaled.

Avoid Double-Barreled Items

Ask only one question or give only one statement for each item. Double-barreled items are survey items that ask participants for one response to two different questions or statements. For example, to study relationship satisfaction, we could ask participants to indicate their level of agreement with the following statement:

I enjoy the time we spend together and dislike the time we are apart.
1 (Strongly disagree)   2   3   4   5 (Strongly agree)

This item for relationship satisfaction is double-barreled.
It is not necessarily true that people who enjoy the time they spend with their partner also dislike the time they are apart. Anytime a sentence uses a conjunction, such as and, it is likely that the item is double-barreled. The solution is to split the question into separate items. For example, we can change the double-barreled item into two separate items, each with a separate rating scale. We could write the first item as “I enjoy the time we spend together” and the second item as “I dislike the time we are apart” to allow participants to give a separate response to each individual item.

Do not use double-barreled items in a survey.

Use Neutral or Unbiased Language

Do not use loaded terms, or words that produce an emotional reaction, such as language that is offensive or could potentially be considered offensive by a respondent. Offensive language is not only inappropriate, but it can also lead people to respond in reaction to the language used. In other words, responses may be caused by the choice of wording in a survey item and may not reflect the honest response of the participant. To avoid the use of potentially offensive language, the American Psychological Association (APA; 2020) provides guidelines for using appropriate language. Some suggestions include explicitly reporting information about gender identities of participants, if known, rather than assuming cisgender identities, and capitalizing Black and White to identify those racial and ethnic groups. More guidelines are provided in the publication manual (APA, 2020) and in Table 15.2 in Chapter 15.

Likewise, do not use leading terms or leading questions, or words or questions that indicate how people should respond to an item. For example, a leading question would be, “How bad are your problems with your boss?” In this example, it is implied that you have a problem, which may or may not be true. So, the word bad is a leading term in this sentence, and it should be removed.
A better way to phrase this question would be, “What is the nature of your relationship with your boss?” In this case, you are not implying what the nature of that relationship is, and the respondent is not being led toward one response or another. Thus, the solution for fixing survey items with loaded terms or leading questions is often to simply rephrase or rewrite the item to avoid this pitfall. Use appropriate and unbiased language in surveys. Minimize the Use of Negative Wording The use of negative wording can trick participants into misunderstanding a survey item. Negative wording is the use of words in a sentence or an item that negates or indicates the opposite of what was otherwise described. The rule is to avoid asking participants in a survey item what they would not do, which can require rephrasing a sentence or survey item. For example, the survey item “How much do you not like working?” can be rephrased to “How much do you dislike working?” It may seem like a small change, but it can effectively reduce confusion. Avoid using negative wording in a survey item. Avoid the Response Set Pitfall When respondents notice an obvious pattern in the responses they provide, they will often use that same pattern to respond to future items in that survey. For example, suppose we ask participants to indicate their level of agreement with the following items on a 5-point scale to measure relationship satisfaction: I enjoy the time I spend with my partner. The time I spend with my partner makes me happy. I look forward to the time I spend with my partner. On the 5-point scale, suppose 1 indicates low satisfaction and 5 indicates high satisfaction. For each item, high satisfaction would be a rating of 5. If there were 20 questions like this, then participants would start to see a pattern, such as high ratings always indicate greater satisfaction. 
If participants are generally satisfied in their relationship, they may begin marking 5 for each item without reading many of the items because they know what the scale represents. However, the ratings participants give would reflect the fact that they saw a pattern and may not necessarily reflect their true ratings for each item. To avoid this problem, called a response set, mix up the items in a survey so that ratings are not all on the same end of the scale for a given measure. A response set is the tendency for participants to respond the same way to all items in a survey when the direction of ratings is the same for all items in the survey. To illustrate how to avoid the response set pitfall, the following are four items from the EDIS-S (Privitera & Wallace, 2011), which is used to estimate how much sugar participants consume in their diet. Participants rate each item on a 7-point scale from 1 (completely disagree) to 7 (completely agree). I tend to eat cereals that have sugar in them. I often eat candy to snack on when I am hungry. I tend to crave foods that are high in sugar. I tend to consume a low-sugar diet. Notice that the last item is flipped—if participants eat a lot of sugar, then they would rate on the low end of the scale for this last listed item only. Because a few items in the EDIS-S are flipped like this, responses on this scale are unlikely to result from a response set. However, for the scale to make sense, higher overall ratings must indicate greater daily intake of sugar. The first three items are stated such that higher scores do indicate greater daily intake of sugar. Suppose, for example, a participant rates the first three items a 7. The participant’s total score so far, then, is 7 × 3 items = 21. The fourth item is a reverse coded item, meaning that we need to code responses for the item in reverse order. The participant rates their response on the 1 to 7 scale, but when we score it, we will reverse it to a 7 to 1 scale. 
Hence, a 1 is scored as a 7, a 2 as a 6, a 3 as a 5, a 4 remains a 4, a 5 is scored as a 3, a 6 as a 2, and a 7 as a 1. By doing so, a 7 for the reverse coded item again indicates the highest intake of sugar, and a 1 indicates the lowest intake of sugar, consistent with the scale for the other items in the survey. Returning to our example, suppose that our participant rates the fourth item a 2. We reverse code this item and score it as a 6, then calculate the total score, which is 21 (first three items) + 6 (fourth item) = 27 (total score). Because the fourth item was reverse coded, the survey can now be scored such that higher scores indicate greater daily intake of sugar, and the survey can also be written so as to avoid a response set pitfall.

A reverse coded item is an item that is phrased in the semantically opposite direction of most other items in a survey and is scored by coding or entering responses for the item in reverse order from how they are listed.

Do not include predictable response patterns in a survey.

Use Rating Scales Consistently

Another rule is to use only one rating scale at a time. In the simplest scenario, use only one scale if possible. The EDIS-S, for example, uses a level of agreement scale for all items in the survey from 1 (completely disagree) to 7 (completely agree). Having only one response scale makes it clear how respondents must respond to all items in a survey. If a survey must use two or more different scales, then the items in the survey should be grouped from one type of scale to the next. Begin with all items for one scale (e.g., items with a scale rated from very dissatisfied to very satisfied); then give directions to clearly indicate a change in the scale for the next group of items (e.g., items with a scale rated from not at all to all the time). Consistent use of rating scales in a survey ensures that participants’ responses reflect their true ratings for each item and not some confusion about the meaning of the scale used.
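The reverse coding arithmetic from the EDIS-S example (three items rated 7 and a reverse coded fourth item rated 2) can be sketched in a few lines of Python. This is only an illustration: the function names `reverse_code` and `total_score` are hypothetical, and the sketch assumes every item uses the same 1 to 7 scale.

```python
def reverse_code(rating, low=1, high=7):
    """Flip a rating on a low-to-high scale (on a 1-7 scale, 1 -> 7 and 2 -> 6)."""
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} is outside the {low}-{high} scale")
    return (low + high) - rating


def total_score(ratings, reversed_items, low=1, high=7):
    """Sum item ratings, reverse coding the items whose positions are listed.

    ratings: ratings in item order
    reversed_items: set of zero-based positions of reverse coded items
    """
    total = 0
    for position, rating in enumerate(ratings):
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside the {low}-{high} scale")
        if position in reversed_items:
            total += reverse_code(rating, low, high)
        else:
            total += rating
    return total


# Ratings from the worked example: items 1-3 rated 7, item 4 (reverse coded) rated 2
ratings = [7, 7, 7, 2]
print(total_score(ratings, reversed_items={3}))  # 21 + 6 = 27
```

Subtracting the rating from the sum of the scale end points (1 + 7 = 8) reproduces the mapping given in the text, including the midpoint 4 remaining a 4.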
Be clear about the rating scale(s) used in surveys.

Limit the Points on a Rating Scale

To construct a response scale, keep the scale between 3 and 10 points (Komorita & Graham, 1965; Matell & Jacoby, 1971). Experts in psychometrics, a field involved in the construction of measurement scales, suggest that response scales should have a midpoint or intermediate response level. Having fewer than 3 points on a response scale violates this suggestion, and response scales with more than 10 points can be too confusing. As a general rule, use 3 to 10 points on the rating scale for each item in a survey.

There are two exceptions to the rule of limiting a rating scale to 3 to 10 points. One exception is that a 2-point scale is appropriate for dichotomous scales in which only two responses are possible. For example, dichotomous scales with true/false, yes/no, or agree/disagree as the response options are acceptable. A second exception is that bipolar scales, those that have points above and below a zero point, can be 3 to 10 points above and below the zero point. Hence, a bipolar scale, such as the one shown below with 11 points, can have up to 21 points, or 10 points above and 10 points below zero.

Anchor or label the end points of a rating scale.

Bipolar scales are response scales that have points above (positive values) and below (negative values) a zero point.

How do you feel about your ability to find a job that will make you happy?
−5 (Extremely pessimistic)   −4   −3   −2   −1   0 (No opinion)   1   2   3   4   5 (Extremely optimistic)

Label or Anchor the Rating Scale Points

To clearly indicate what a rating scale means, we can use anchors or adjectives given at the end points of a rating scale. Anchors are often listed below the end points on a rating scale, such as those given for the bipolar scale for the previous rule. Notice also in the bipolar scale that the midpoint is labeled.
Indeed, we can include anchors for the end points and label every other point on a scale if we choose.

Anchors are adjectives that are given to describe the end points of a rating scale to give the scale greater meaning.

Minimize Item and Survey Length

As a general rule, you want to make sure that each item in a survey is as brief as possible. Being concise is important to ensure that respondents read the full item before responding. Likewise, a survey itself can be too long, although it is difficult to determine or define what constitutes “too long.” The best advice is to write the survey to be as short and concise as possible, yet still able to convey or measure what it is intended to measure.

Keep in mind that participants will fatigue or simply get tired of answering survey items. If this occurs, then a participant may start to “browse” survey items or even make up responses just to “get the survey over with.” This is not to say that all participants will do this, but some will, and we want to avoid this problem of fatigue. The obvious solution is to make the survey and the items in the survey as brief or concise as possible. A survey that is not longer than 10 to 15 minutes is typically preferred to one that takes an hour to complete. The time to complete a survey tends to be more important than the number of items in the survey. For example, a survey with a few open-ended items may take 15 to 20 minutes to complete, whereas a survey that has 30 restricted items may take only a minute or two to complete. To minimize survey length, then, the key goal is to minimize how long (over time) it takes a person to complete all items in a survey, not necessarily to minimize the number of items in the survey per se.

Minimize item and survey length.

Learning Check 2 ✓

Which rule or rules for writing survey items does each of the following items violate?
Note: Assume that each item is rated on a 5-point scale from 1 (completely false) to 5 (completely true): I am a likable person and enjoy the company of others. On a scale from 1 (very unlikely) to 13 (very likely), what are the chances you will win money in a casino? Reverse discrimination against White Americans is a big problem in America. Misogynistic men do not make good boyfriends. True or false: How an item is worded can affect the reliability and validity of responses given for that item. Answers: 1. A. Avoid double-barreled items, B. Limit the points on a rating scale, C. Use neutral or unbiased language, D. Keep it simple and minimize the use of negative wording; 2. True. 8.4 ADMINISTERING SURVEYS Once a survey is constructed, it is administered to participants who will respond to the survey. A survey can be written (in print or electronically) or spoken (such as in an interview). A written survey can be administered in person, by mail, or using the Internet. An interview survey can be administered face to face, by telephone, or in focus groups. Each method of administering a survey is described in this section. Written Surveys In-person surveys. A method that can effectively get participants to respond to a survey is to be physically present while participants complete the survey. The reason that more participants are willing to complete a survey administered in person is that you, the researcher, can be there to explain the survey, observe participants take the survey, and answer any questions they may have while they complete the survey. This method is more time-consuming, however, because it requires the researcher to be present while each and every participant completes the survey. Mail surveys. An alternative that can require less of the researcher’s time is to submit surveys in the mail. However, mail surveys are associated with higher rates of potential respondents choosing not to complete and return them to the researcher. 
Mail surveys can also be costly in terms of both the time it takes to prepare the surveys (e.g., printing and addressing surveys) and the money spent to send them out to potential respondents (e.g., postage stamps and envelopes). The following are four strategies that can increase how many people complete and return a mail survey: Include a return envelope with the return postage already paid. Let potential respondents know in advance that the survey is being sent. Include a cover letter detailing the importance of completing the survey. Include a gift for the potential respondent to keep, such as a pen or gift card. Internet surveys. A popular and cost-effective survey option is to administer surveys online. This option is inexpensive in that a survey can be administered to a large group of potential respondents with little more than a click of a button. Online surveys can be administered via links provided in an e-mail or using online survey construction sites, such as Qualtrics or SurveyMonkey.com (see Dillman, Smyth, & Christian, 2014). The main concern for using online surveys is that the results of these surveys may be limited to individuals who have access to computers with online capabilities, and to individuals who know enough about using computers that they can complete and submit the survey correctly. A written survey can be administered in person, by mail, or using the Internet. Interview Surveys Face-to-face interviews. A researcher could administer a survey orally to one participant at a time or to a small group. The advantage of a face-to-face interview is that the researcher can control how long it takes to complete the survey inasmuch as it is the researcher asking the questions. The drawback of face-to-face interviews is that they require the interviewer to be present for each survey and can be prone to interviewer bias, meaning that the interviewer’s demeanor, words, or expressions in an interview may influence the responses of a participant. 
For this reason, face-to-face interviews, while used in quantitative research, tend to be more commonly applied in qualitative research for which interviewing is a primary method used to describe an individual or a group. Interviewer bias is the tendency for the demeanor, words, or expressions of a researcher to influence the responses of a participant when the researcher and the participant are in direct contact. Telephone interviews. An interview can also be administered via the telephone. Phone interviews can be interpersonal (e.g., the researcher asks the questions) or automated (e.g., computer-assisted technology asks the questions). One advantage of automated telephone interviews is that they can save time and reduce the likelihood of interviewer bias. Another advantage is that telephone surveys can be administered at random by generating telephone numbers at random from within the area or region using random digit dialing. The key disadvantage of telephone interviews is that they often result in few people willingly agreeing to complete the survey. Also, the passage of new laws restricting telephone surveying has made this method of administering surveys less common in the behavioral sciences. Random digit dialing is a strategy for selecting participants in telephone interviews by generating telephone numbers to dial or call at random. Focus-group interviews. Sometimes researchers use surveys that are aimed at getting people to share ideas or opinions on a certain topic or issue. A survey that is structured to get participants to interact is called a focus group, which is a small group of about three to eight people. Questions or survey items in a focus group are mostly open ended, and the researcher plays more of a moderator role than an interviewer role. The goal of a focus group is to get participants talking to each other to get them to share their ideas and experiences on a predetermined topic. The conversations are typically recorded and then analyzed. 
While focus groups can reveal new directions and ideas for a given research topic, they are associated with the same problems mentioned for face-to-face interviews. Each survey administration method described here can vary substantially on how effectively researchers obtain representative samples. Obtaining representative samples is important because surveys are often used for the purpose of learning about characteristics in a population of interest. For example, we sample a few potential voters to identify the candidate who is likely to obtain the most votes in the population, not just among those sampled. Therefore, it is important that the sample we select to complete a survey is representative of the population. Administering a survey in person or face to face can make it more likely that we can obtain a representative sample. Administering the survey by mail, telephone, or Internet, on the other hand, can limit the representativeness of our sample because often only a small proportion of those who receive the survey will respond and actually complete the survey. Issues related to this problem of response rate are discussed in Section 8.5. An interview survey can be administered face-to-face, by telephone, or in focus groups. 8.5 SURVEYS, SAMPLING, AND NONRESPONSE BIAS When administering a survey, it is important to obtain a high survey response rate, which is the portion of participants who agree to complete a survey among all those who were asked to complete the survey. When the response rate is low, the concern is that any results from the survey will be limited to only those people who were actually willing to complete the survey. When the response rate is high, we can be more confident that the sample of those who completed the survey is representative of the larger population of interest. Response rate is the portion of participants who agree to complete a survey among all individuals who were asked to complete the survey. 
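The definition of response rate amounts to a simple proportion, sketched below in Python. The function name `response_rate` and the counts are hypothetical and serve only to illustrate the arithmetic; the 75% figure is the benchmark cited in this section, not a value computed from data.

```python
def response_rate(completed, invited):
    """Proportion of people asked to complete a survey who actually completed it."""
    if invited <= 0:
        raise ValueError("invited must be a positive count")
    if not 0 <= completed <= invited:
        raise ValueError("completed must be between 0 and invited")
    return completed / invited


# Hypothetical counts: 400 people asked, 180 completed the survey
RECOMMENDED_MINIMUM = 0.75  # benchmark cited in this section

rate = response_rate(completed=180, invited=400)
print(f"Response rate: {rate:.0%}")
if rate < RECOMMENDED_MINIMUM:
    print("Below the recommended minimum; nonresponse bias is a concern.")
```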
Issues related to response rates center on the possibility of a nonresponse bias (see Chapter 5), which occurs when participants choose not to complete a survey or choose not to respond to specific items in a survey. Although at least a 75% response rate should be obtained to minimize bias, the typical response rate to surveys in published peer-reviewed research is less than 50% (Baruch, 1999; Baruch & Holtom, 2008; Shih & Fan, 2008; Stoop, 2015). The problem of low response rates is that people who respond to surveys are probably different from those who do not respond. Because we cannot collect data from people who fail to respond, it is difficult to know the exact characteristics of this group of nonresponders. For this reason, we cannot know for sure whether survey results of those who do respond are representative of the larger population of interest, which includes those who do not respond to surveys.

While the low response rates in published research can be problematic, there is good reason for journals to publish these results. Although low response rates can limit the population validity (a subtype of external validity; see Chapter 6) of results from a survey, researchers are not always interested in generalizing results to a population. To establish some external validity, researchers often use survey results to instead generalize to a theory, called theoretical generalization, or generalize to other observations, called empirical generalization. Each type of generalization is illustrated in Figure 8.2, with an example given for each type. As long as survey results are rooted in existing theories and data, researchers “can afford to be lenient [to some extent] about sample quality in academic research” (Blair & Zinkhan, 2006, p. 6).

Theoretical generalization is the extent to which results in a survey or another research study are consistent with predictions made by an existing theory.
Empirical generalization is the extent to which results in a survey or another research study are consistent with data obtained in previous research studies.

Figure 8.2 ⦁ Two Types of Generalization for the Results in Survey Research

8.6 Ethics in Focus: Handling and Administering Surveys

To show respect for persons, which is a key principle in the Belmont Report and the APA (2020) code of conduct, the researcher has certain ethical responsibilities regarding how to handle and administer surveys in a research study. The following are four responsible and appropriate ways to handle and administer surveys:

The survey itself should not be offensive or stressful to the respondents. The respondents should, under reasonable circumstances, be satisfied or comfortable with their survey experience such that they would not feel distress if asked to complete the survey again. If they would feel distress, then the survey may pose potential psychological risks to the respondents.

Do not coerce respondents into answering questions or completing a survey. All respondents should be informed prior to completing the survey (typically in an informed consent form) that they can skip or choose not to answer any survey items, or the entire survey, without penalty or negative consequence.

Do not harass respondents in any way for recruitment purposes. Because of high nonresponse rates, researchers often actively recruit potential respondents through e-mail or phone call reminders. The potential respondents must not view these recruitment efforts as harassing or intrusive.

Protect the confidentiality or anonymity of respondents. Personally identifiable information of respondents should be protected at all times. If the researcher requires respondents to provide personally identifiable information, then such information should be safeguarded while in the possession of the researcher.

Learning Check 3 ✓

1. State three ways that a written survey can be administered.
2. State three ways that an interview survey can be administered.
3. Is the typical response rate for survey research that is published in the behavioral sciences less than or greater than 50%?
4. True or false: A survey should not be offensive or stressful to the respondent.

Answers: 1. A written survey can be administered in person, by mail, and over the Internet; 2. An interview survey can be administered face to face, by telephone, and in focus groups; 3. Less than 50%; 4. True.

CORRELATIONAL DESIGNS

In the chapter opening, we stated the following hypothesis: Texting while driving is more prevalent among younger age groups. To test this hypothesis, we used a survey design at the start of this chapter. However, we could use a correlational research design to test this hypothesis as well. To use the correlational design, for example, we could ask a sample of participants who drive to indicate in a questionnaire their age (in years) and how often they use text messaging while driving (per month). If the hypothesis is correct and we set up this study correctly, then we should expect to find that increased texting is associated with younger drivers.

Notice that we used survey data (based on responses in a questionnaire) to record the data. Surveys are often used with a correlational research design. However, keep in mind that anytime we use data to determine whether two or more factors are related/correlated, we are using the correlational design, even if we used a survey or questionnaire to record the data. In this final section, we introduce the research design that was illustrated here: the correlational research design.

8.7 THE STRUCTURE OF CORRELATIONAL DESIGNS

It is often difficult to determine that one factor causes changes in another factor.
For example, in the texting-while-driving study used to introduce each major section in this chapter, we cannot reasonably determine that being younger causes drivers to text more while driving because we cannot control for other possible factors that can cause a change in texting behavior. Other possible factors include how often people drive, how busy their daily lives are, how many friends they have, how good a driver they think they are, or whether they believe texting while driving is dangerous. In these situations, when it is difficult to control for other possible factors that could be causing changes in behavior, we use the correlational research design to determine the extent to which two factors are related, not the extent to which one factor causes changes in another factor.

A correlational research design is the measurement of two or more factors to determine or estimate the extent to which the values for the factors are related or change in an identifiable pattern.

To set up a correlational research design, we make two or more measurements for each individual observed. For the purposes of introducing the correlational research design, we will introduce situations in which only two measurements are made. Each measurement is for a different variable, and we believe the two variables are related. For example, one economic factor, income, is related to obesity (Lovasi, Hutson, Guerra, & Neckerman, 2013; S. Newton, Braithwaite, & Akinyemiju, 2017; Su, Esqueda, Li, & Pagán, 2012) in that individuals with lower income tend to be more obese. The correlation establishes the extent to which two factors are related, such that values for one variable (income level) may predict changes in the values of a second variable (severity of obesity).

A correlation can be established in any setting. In a naturalistic setting, for example, we could measure the correlation between customer satisfaction in a restaurant and timeliness to serve patrons.
In a laboratory setting, for example, we could expose participants to a fearful stimulus and record how fearful they rate the stimulus and their corresponding physiological stress response. Using existing data records, we could use legal documents to identify the correlation between duration of marriage (in years) and race, age, socioeconomic status, and any number of other demographic characteristics. In each example, we make two measurements for each individual (or document when using existing records), one measurement for each of the two variables being examined.

Once we measure two variables, we then compute a statistical measure called the correlation coefficient to identify the extent to which the values of the two variables or factors are related or change in an identifiable pattern. The correlation coefficient ranges from −1.0 (the values for two factors change in opposite directions) to +1.0 (the values for two factors change in the same direction), and it is used to identify a pattern in terms of the direction and strength of a relationship between two factors. Each way of describing the relationship between two factors is introduced in this section.

The correlation coefficient is a statistic used to measure the strength and direction of the linear relationship, or correlation, between two factors. The value of r can range from −1.0 to +1.0.

In behavioral research, we mostly describe the linear (or straight-line) relationship between two factors. For this reason, we will limit this introduction to the direction and strength of a linear relationship between two factors.

8.8 DESCRIBING THE RELATIONSHIP BETWEEN VARIABLES

The direction of a relationship between two factors is described as being positive or negative. The strength of a relationship between two factors is described by the value of the correlation coefficient, r, with values closer to r = ±1.0 indicating a stronger relationship between two factors.
The direction and strength of a correlation can be readily identified in a graph called a scatter plot. To construct a scatter plot (also called a scatter diagram or scatter gram), we plot each pair of values, called data points, along the x-axis and y-axis of a graph to see whether a pattern emerges.

A scatter plot, also called a scatter diagram or scatter gram, is a graphical display of discrete data points (x, y) used to summarize the relationship between two factors.

Data points are the x- and y-coordinates for each point plotted in a scatter plot.

The extent to which two factors are related is determined by how far data points fall from a regression line when the data points are plotted in a graph. The regression line is the best-fitting or closest-fitting straight line to a set of data points. The best-fitting straight line is the one that minimizes the distance that all data points fall from it. We will use the regression line to illustrate the direction and strength of the relationship between two factors using the correlational research design.

The regression line is the best-fitting straight line to a set of data points. A best-fitting line is the line that minimizes the distance that all data points fall from it.
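To make the idea of a best-fitting line concrete, the following is a minimal sketch in Python. It assumes the common least-squares criterion, in which "distance" means the squared vertical distance of each data point from the line; the chapter itself does not name a criterion. The data and the function names (fit_line, total_squared_distance) are hypothetical and used only for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (b, a) for the least-squares line Y = bX + a."""
    mean_x, mean_y = mean(xs), mean(ys)
    # Sum of products of deviations, and sum of squared deviations of X
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = ss_xy / ss_x          # slope
    a = mean_y - b * mean_x   # y-intercept
    return b, a


def total_squared_distance(xs, ys, b, a):
    """Sum of squared vertical distances of the data points from Y = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


# Hypothetical paired scores for five individuals
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, a = fit_line(xs, ys)
best = total_squared_distance(xs, ys, b, a)
# An arbitrary alternative line, chosen only for comparison
other = total_squared_distance(xs, ys, 0.5, 2.5)

print(f"Y = {b:.2f}X + {a:.2f}")
print(f"squared distance: best-fitting = {best:.2f}, other line = {other:.2f}")
```

Under this criterion, the fitted line has a smaller total squared distance from the data points than the comparison line, which is what "best-fitting" means in this sketch.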
The Direction of a Relationship

In a scatter plot, a positive correlation means that as values of one factor increase, values of a second factor also increase; as values of one factor decrease, values of a second factor also decrease. If two factors have values that change in the same direction, we can graph the correlation using a straight line. In Figure 8.3, values on the y-axis increase as values on the x-axis increase.

A positive correlation is a positive value of r that indicates that the values of two factors change in the same direction: As the values of one factor increase, values of the second factor also increase; as the values of one factor decrease, values of the second factor also decrease.

Figure 8.3a shows a perfect positive correlation, which occurs when each data point falls exactly on a straight line, although this is rare. More commonly, as shown in Figure 8.3b, a positive correlation is greater than 0 but less than 1.0, where the values of two factors change in the same direction but not all data points fall exactly on the regression line.

Figure 8.3 ⦁ A Perfect Positive (a) and a Positive (b) Linear Correlation
Both the table and the scatter plot show the same data for (a) and (b).

A negative correlation means that as values of one factor increase, values of the second factor decrease. If two factors have values that change in the opposite direction, we can graph the correlation using a straight line. In Figure 8.4, values on the y-axis decrease as values on the x-axis increase.

A negative correlation is a negative value of r that indicates that the values of two factors change in different directions, meaning that as the values of one factor increase, values of the second factor decrease.

Figure 8.4a shows a perfect negative correlation, which occurs when each data point falls exactly on a straight line, although this is also rare.
More commonly, as shown in Figure 8.4b, a negative correlation is greater than −1.0 but less than 0, where the values of two factors change in the opposite direction but not all data points fall exactly on the regression line.

The Strength of a Relationship

A zero correlation (r = 0) means that there is no linear pattern or relationship between two factors. This outcome is rare because usually by mere chance at least some values of one factor, X, will show some pattern or relationship with values of a second factor, Y. The closer a correlation coefficient is to r = 0, the weaker the correlation and the less likely that two factors are related; the closer a correlation coefficient is to r = ±1.0, the stronger the correlation and the more likely that two factors are related.

The pattern of a set of data points can indicate the extent to which two factors are related. A positive correlation is given with a plus (+) sign; a negative correlation is given with a minus (−) sign.

Figure 8.4 ⦁ A Perfect Negative (a) and a Negative (b) Linear Correlation
Both the table and the scatter plot show the same data for (a) and (b).

The closer a set of data points falls to a regression line, the stronger the correlation.

The strength of a correlation reflects how consistently values for each factor change. When plotted in a graph, a stronger correlation means that the values for each factor change in a related pattern: The data points fall closer to a regression line, or the straight line that best fits a set of data points. Figure 8.5 shows two positive correlations between exercise (Factor X) and body image satisfaction (Factor Y), and Figure 8.6 shows two negative correlations between absences in class (Factor X) and quiz grades (Factor Y). In both figures, the closer a set of data points falls to the regression line, the stronger the correlation; hence, the closer a correlation coefficient is to r = ±1.0.
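The link between strength and consistency can be illustrated with a short Python sketch. The two sets of scores below are hypothetical (they are not the data in Figure 8.5), and pearson_r is an illustrative helper that assumes the standard Pearson computation for interval or ratio data. Both sets of Y scores rise with X, but one set falls close to a straight line and the other is more scattered.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between paired scores (illustrative helper)."""
    mean_x, mean_y = mean(xs), mean(ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / sqrt(ss_x * ss_y)


# Hypothetical data: exercise (Factor X) and body image satisfaction (Factor Y)
exercise = [1, 2, 3, 4, 5, 6]
consistent = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]  # points fall close to a line
scattered = [4.0, 1.0, 6.0, 3.0, 5.0, 8.0]   # same upward trend, more scatter

r_consistent = pearson_r(exercise, consistent)
r_scattered = pearson_r(exercise, scattered)

print(f"consistent scores: r = {r_consistent:+.3f}")
print(f"scattered scores:  r = {r_scattered:+.3f}")
```

Both correlations are positive, but the more consistent scores produce a value of r much closer to +1.0.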
The Correlation Coefficient

The most commonly used formula for computing r is the Pearson correlation coefficient, which is used to determine the strength and direction of the relationship between two factors on an interval or a ratio scale of measurement. Alternative formulas for computing a correlation with many scales of measurement exist, as identified in Table 8.2 (also given in Chapter 14, Table 14.2); however, each of these alternative formulas was derived from the formula for the Pearson correlation coefficient, so only the Pearson formula will be described in this section.

The Pearson correlation coefficient is used to measure the direction and strength of the linear relationship of two factors in which the data for both factors are on an interval or a ratio scale of measurement.

Figure 8.6 ⦁ The Consistency of Scores for a Negative Correlation
Both figures show approximately the same regression line, but the data points in (b) are more consistent because they fall closer to the regression line than in (a).

The formula for the Pearson correlation coefficient is a measure of the variance of data points from a regression line that is shared by the values of two factors (X and Y), divided by the total variance measured:

$$r = \frac{\text{variance shared by } X \text{ and } Y}{\text{total variance measured}}$$

The correlation coefficient, r, measures the variance of X and the variance of Y, which constitutes the total variance that can be measured. The total variance is placed in the denominator of the formula for r. The variance in the numerator, called covariance, is the amount or proportion of the total variance that is shared by X and Y. The larger the covariance, the closer data points will fall to the regression line.
When all data points for X and Y fall exactly on a regression line, the covariance equals the total variance, making the formula for r equal to +1.0 or −1.0, depending on the direction of the relationship between two factors. The farther that data points fall from the regression line, the smaller the covariance will be compared with the total variance in the denominator, resulting in a value of r closer to 0.

Covariance is the extent to which the values of two factors (X and Y) vary together. The closer data points fall to the regression line, the more the values of two factors vary together.

Table 8.2 ⦁ The Scales of Measurement for Factors Tested Using Correlation Coefficients

Correlation Coefficient    Scale of Measurement for Correlated Variables
Pearson                    Both factors are interval or ratio data.
Spearman                   Both factors are ranked or ordinal data.
Point-Biserial             One factor is dichotomous (nominal data), and a second factor is continuous (interval or ratio data).
Phi                        Both factors are dichotomous (nominal data).

Figure 8.7 ⦁ Covariance Between X and Y
Each circle represents the variance of a factor. The variances of two factors covary inasmuch as the two circles overlap. The more overlap or shared variance of two factors, the more the two factors are related.

If we conceptualize covariance as circles, as illustrated in Figure 8.7, then the variance of each factor (X and Y) will be contained within each circle. The two circles, then, contain the total measured variance. The covariance of X and Y reflects the extent to which the total variance or the two circles overlap. In terms of computing r, the overlap or covariance is placed in the numerator; the total variance contained within each circle is placed in the denominator. The more the two circles overlap, the more the covariance (in the numerator) will equal the independent variances contained within each circle (in the denominator), and the closer r will be to ±1.0.
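The ratio of shared variance to total variance can be sketched in a few lines of Python. This sketch assumes the standard computational form of the Pearson formula, in which the numerator is the sum of products of deviations for X and Y and the denominator is the square root of the product of the two sums of squares; that form is not printed in this section, and the function names and data are hypothetical.

```python
from math import sqrt


def sum_of_products(xs, ys):
    """Sum of products of deviations from the means of xs and ys."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def pearson_r(xs, ys):
    shared = sum_of_products(xs, ys)  # numerator: how X and Y vary together
    # denominator: variability of X and of Y measured separately
    total = sqrt(sum_of_products(xs, xs) * sum_of_products(ys, ys))
    return shared / total


x = [1, 2, 3, 4, 5]
on_line_up = [3, 5, 7, 9, 11]    # every point falls on Y = 2X + 1
on_line_down = [10, 8, 6, 4, 2]  # every point falls on Y = -2X + 12
off_line = [2, 4, 5, 4, 5]       # points scatter around a rising line

r_up = pearson_r(x, on_line_up)
r_down = pearson_r(x, on_line_down)
r_off = pearson_r(x, off_line)

print(f"{r_up:+.3f} {r_down:+.3f} {r_off:+.3f}")
```

When every data point falls on the line, the numerator matches the denominator in size and r is +1.0 or −1.0; for the scattered data the numerator is smaller than the denominator, so r falls between 0 and +1.0 (approximately +.77 here).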
Learning Check 4 ✓

1. The value of the __________________ provides an estimate of the strength and direction of the relationship between two factors.
2. A professor measures a negative correlation between time spent partying and grades. Interpret this result.
3. A researcher records a correlation of r = +.02.
   A. Identify the direction of this correlation.
   B. Identify the strength of this correlation.
4. How will the data points appear in a graph for two factors with values that change consistently?

Answers: 1. correlation coefficient (r); 2. As time spent partying increases, grades decrease; 3. A. The direction of the correlation is positive, B. The strength of the correlation is weak because .02 is close to 0; 4. The data points will fall close to the regression line.

8.9 LIMITATIONS IN INTERPRETATION

Fundamental limitations using the correlational method require that a significant correlation be interpreted with caution. Among the many considerations for interpreting a significant correlation, in this section we consider causality, outliers, and restriction of range.

Causality

Using a correlational design, we do not manipulate an independent variable, and we make little effort to control for other possible factors that may also vary with the two variables we measured. For this reason, a significant correlation does not show that one factor causes changes in a second factor (i.e., causality). To illustrate, suppose we measure a significant negative correlation between the self-rated mood of participants and the amount of food they eat daily (in calories per day). We will look at four possible interpretations for this correlation:

1. Decreases in how people feel (mood) can cause an increase in the amount they eat (eating). This possibility cannot be ruled out.
2. Increases in the amount people eat (eating) can cause a decrease in how people feel (mood). So the direction of causality can be in the opposite direction. Hence, instead of changes in mood causing changes in eating, maybe changes in eating cause changes in mood. This possibility, called reverse causality, cannot be ruled out either.
3. The two factors could be systematic, meaning that they work together to cause a change. If two factors are systematic, then Conclusions 1 and 2 could both be correct. The worse people feel, the more they eat, and the more people eat, the worse they feel. This possibility, that each factor causes the other, cannot be ruled out either.
4. Changes in both factors may be caused by a third unanticipated confound or confound variable. Perhaps biological factors, such as increased parasympathetic activity, make people feel worse and increase how much they want to eat. So, it is increased parasympathetic activity that could be causing changes in both mood and eating. This confound variable and any number of additional confound variables could be causing changes in mood and eating and cannot be ruled out either.

Reverse causality is a problem that arises when the direction of causality between two factors, A and B, cannot be determined: Changes in Factor A could cause changes in Factor B, or changes in Factor B could cause changes in Factor A.

Correlation does not demonstrate cause.

A confound or confound variable is an often unanticipated variable not accounted for in a research study that could be causing or associated with observed changes in one or more measured variables.

Figure 8.8 summarizes each possible explanation for an observed correlation between mood and eating. The correlational design cannot distinguish between these four possible explanations. Instead, a significant correlation shows that two factors are related. It does not provide an explanation for how or why they are related.
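The fourth explanation, a confound variable, can be illustrated with a small simulation in Python. Everything in this sketch is hypothetical: The scores are randomly generated rather than taken from a study, and the variable names are chosen only to mirror the mood and eating example. By construction, mood and eating never influence each other; each depends only on the confound plus its own random noise.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between paired scores (illustrative helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / sqrt(ss_x * ss_y)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 2000

# Hypothetical unmeasured confound (for example, a biological factor)
confound = [rng.gauss(0, 1) for _ in range(n)]

# Mood worsens and eating increases as the confound increases.
# Neither variable appears in the expression that generates the other.
mood = [-c + rng.gauss(0, 1) for c in confound]
eating = [c + rng.gauss(0, 1) for c in confound]

r_mood_eating = pearson_r(mood, eating)
print(f"r between mood and eating = {r_mood_eating:+.2f}")
```

The simulated mood and eating scores are negatively correlated even though neither one causes the other, which is why a significant correlation alone cannot rule out a confound.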
Figure 8.8 ⦁ Four Potential Explanations for a Significant Correlation
Because factors are measured but not manipulated using the correlational method, any one of these possibilities could explain a significant correlation.

Figure 8.9 ⦁ The Effects of an Outlier
(a) The graph displays a typical correlation between income and education, with more education being associated with higher income. (b) The graph shows the same data with an additional outlier of a child movie star who earns $1 million. The inclusion of this outlier changed the direction and the strength of the correlation.

Outliers

Another limitation that can obscure the relationship between two factors is the presence of an outlier in the data. An outlier is a score that falls substantially above or below most other scores in a data set and can alter the direction and the strength of an observed correlation. Figure 8.9a shows data for the relationship between income and education without an outlier in the data. Figure 8.9b shows how an outlier, such as the income earned by a child movie star, changes the relationship between two factors. Notice in Figure 8.9 that the outlier changed both the direction and the strength of the correlation.

An outlier is a score that falls substantially above or below most other scores in a data set.

Outliers can change the strength and the direction of a correlation or relationship between two factors.

Restriction of Range

When interpreting a correlation, it is also important to avoid making conclusions about relationships that fall beyond the range of data measured. The restriction of range problem occurs when the range of data measured in a sample is restricted or smaller than the range of data in the general population.
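As a rough illustration of the restriction of range problem, the following Python sketch compares the correlation in a small hypothetical population with the correlation in a sample drawn from only the middle of its range. The scores are invented for this purpose (they are not the data behind any figure in this chapter), and pearson_r is an illustrative helper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between paired scores (illustrative helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / sqrt(ss_x * ss_y)


# Hypothetical population of 12 paired scores with a clear upward trend
pop_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
pop_y = [1, 2, 3, 4, 6, 5, 5, 6, 7, 8, 9, 10]

# A sample restricted to the middle of the range of X (scores from 5 to 8)
sample = [(x, y) for x, y in zip(pop_x, pop_y) if 5 <= x <= 8]
sample_x = [x for x, _ in sample]
sample_y = [y for _, y in sample]

r_population = pearson_r(pop_x, pop_y)
r_restricted = pearson_r(sample_x, sample_y)

print(f"full range:       r = {r_population:+.3f}")
print(f"restricted range: r = {r_restricted:+.3f}")
```

In this sketch, the full range of scores shows a strong positive correlation, whereas the restricted sample shows no linear relationship at all, even though both come from the same set of scores.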
Restriction of range is a problem that arises when the range of data for one or both correlated factors in a sample is limited or restricted, compared with the range of data in the population from which the sample was selected.

Figure 8.10 shows how the range of data measured in a sample can lead to erroneous conclusions about the relationship between two factors in a given population. This figure shows the positive correlation for a hypothetical population (top graph) and the correlations in three possible samples we could select from this population (smaller bottom graphs). Notice that, depending on the range of data measured, we could identify a positive correlation, a negative correlation, or zero correlation from the same population, although the data in the population are actually positively correlated. To avoid the problem of restriction of range, the direction and the strength of a correlation should only be generalized to a population within the limited range of measurements observed in the sample.

Figure 8.10 ⦁ The Effects of Restriction of Range
In this population, shown in the top graph, there is a positive correlation between two factors (r = +.855). Also depicted are three possible samples we could select from this population. Range PC shows a positive correlation (r = +.891), Range ZC shows a zero correlation (r = 0), and Range NC shows a negative correlation (r = −.598), all within the same population. Because different ranges of data within the same population can show very different patterns, correlations should never be interpreted beyond the range of data measured in a sample.

8.10 CORRELATION, REGRESSION, AND PREDICTION

The correlation coefficient, r, is used to measure the extent to which two factors (X and Y) are related. The value of r indicates the direction and strength of a correlation.
When r is negative, the values of two factors change in opposite directions; when r is positive, the values of two factors change in the same direction. The closer r is to ±1.0, the stronger the correlation, and the more closely two factors are related.

We can use the information provided by r to predict values of one factor, given known values of a second factor. Recall that the strength of a correlation reflects how closely a set of data points fits to a regression line (the straight line that most closely fits a set of data points). We can use the value of r to compute the equation of a regression line and then use this equation to predict values of one factor, given known values of a second factor in a population. This procedure is called linear regression (also called regression).

Linear regression, also called regression, is a statistical procedure used to determine the equation of a regression line to a set of data points and to determine the extent to which the regression equation can be used to predict values of one factor, given known values of a second factor in a population.

To use linear regression, we identify two types of variables: the predictor variable and the criterion variable. The predictor variable (X) is the variable with values that are known and can be used to predict values of the criterion variable; the predictor variable is plotted on the x-axis of a graph. The criterion variable (Y) is the variable with unknown values that we are trying to predict, given known values of the predictor variable; the criterion variable is plotted on the y-axis of a graph.

The predictor variable (X) is the variable with values that are known and can be used to predict values of another variable.

The criterion variable (Y) is the to-be-predicted variable with unknown values that can be predicted or estimated, given known values of the predictor variable.

If we know the equation of the regression line, we can predict values of the criterion variable, Y, so long as we know values of the predictor variable, X. To do so, we use the following equation of a straight line:
Y = bX + a

In this equation, Y is a value we plot for the criterion variable, X is a value we plot for the predictor variable, b is the slope of a straight line, and a is the y-intercept (where the line crosses the y-axis). Given a set of data, researchers can find the values of a and b and then use the equation they found to predict outcomes of Y.

To illustrate the use of the regression line to predict outcomes, consider a study conducted by Chen, Dai, and Dong (2008). In this study, participants completed a revised version of the Aitken Procrastination Inventory (API), and their level of procrastination was recorded. The researchers found that the following regression equation could be used to predict procrastination (Y) based on known scores on the API (X):

Ŷ = 0.146X − 2.922

In this equation, Ŷ is the predicted value of Y given known scores on the API, a = −2.922, and b = 0.146. Using this information, we could have a student complete the API, substitute their API score for X in the equation, and solve for Ŷ to find the procrastination level we predict for that student.

The advantage of using linear regression is that we can use the equation of the regression line to predict how people will behave or perform. A caution of using this procedure, however, is that weaker correlations, or those closer to r = 0, will produce less accurate predictions using the equation of the regression line because the data points will fall far from it. Likewise, the stronger the correlation, or the closer to r = ±1.0, the more accurate the predictions made using the equation of the regression line because the data points will fall closer to it.

A correlation cannot describe data beyond the range of data observed in a sample.

The equation of the regression line can be used to predict outcomes of a criterion variable.

Learning Check 5 ✓

A correlational design does not demonstrate cause. Why?
True or false: An outlier can influence both the direction and the strength of an observed correlation.

__________ occurs when the range of data for one or both correlated factors in a sample is limited or restricted, compared with the range of data in the population from which the sample was selected.

What procedure is used to predict outcomes of one factor given known values of a second factor?

Answers: 1. Because we do not manipulate an independent variable, and we make little effort to control for other possible factors that may also vary with the two variables we measured; 2. True; 3. Restriction of range; 4. Linear regression.

8.11 SPSS in Focus: Correlation and Linear Regression

The correlational design will likely require the use of a correlation coefficient or linear regression to statistically analyze measured data. In this section, we describe how to compute each type of statistic using SPSS.

Pearson Correlation Coefficient

To compute a Pearson correlation coefficient using SPSS, suppose we test the hypothesis that greater mobile phone use is associated with increased stress, as has been tested in published research studies (see Lee, 2016; Murdock, Gorman, & Robbins, 2015). To measure mobile phone use, we can use the 27-item Mobile Phone Problem Use Scale (MPPUS; Bianchi & Phillips, 2005). To measure stress, we can use the 10-item Perceived Stress Scale (PSS; S. Cohen & Williamson, 1988). Using these measures, we will enter the data shown in Figure 8.11 to compute a Pearson correlation coefficient using SPSS.

Click on the Variable View tab and enter MPPUS in the Name column; enter PSS in the Name column below it. Reduce the value to 0 in the Decimals column for both rows. By default, SPSS should identify the variables, MPPUS and PSS, as being scaled, which they are, so there is no need to change this in the Measure column.

Click on the Data View tab.
Enter the data for MPPUS in the first column; enter the data for PSS in the second column.

Go to the menu bar and click Analyze, then Correlate, and then Bivariate to display a dialog box. Using the arrows, move both variables into the Variables box.

Select OK or select Paste and click the Run command.

Figure 8.11 ⦁ A Table and Scatter Plot Showing the Relationship Between Mobile Phone Use and Perceived Stress
The regression line is given in the scatter plot. Both the table and the scatter plot show the same data.

The SPSS output table, shown in Table 8.3, is set up in a matrix with MPPUS and PSS listed in the rows and columns. Each cell in the matrix gives the direction and strength of the correlation (r = .540 for mobile phone use and stress; this value is shown with an asterisk for significant correlations), the significance (p = .014; how to interpret a p value is described in Chapters 10–12 and 14), and the sample size (N = 20). To make a decision, if a correlation is significant, then the decision is that the correlation observed in the sample will also be observed in the larger population from which the sample was selected. We can report a correlation in a research journal using the guidelines provided in the Publication Manual of the American Psychological Association (APA, 2020). Using these guidelines, we state the value of r, the p value, and the sample size as shown:

A Pearson correlation indicates that greater mobile phone use is associated with greater perceived stress, r = .540, p = .014, N = 20.

Linear Regression

For situations in which we want to know whether values for one factor predict values for a second factor, we use linear regression. As an example, suppose we conduct a test similar to that computed by Privitera and Wallace (2011). In that study, the researchers tested whether scores on the 11-item EDIS-S predicted how much people like the taste of sugar.
To measure liking for sugar, participants drank sugar water and rated how much they liked the taste on a 100-millimeter line scale in which higher ratings indicated greater liking for the sugar water. Using data similar to those observed in the study by Privitera and Wallace (2011), we will enter the data shown in Figure 8.12 to compute linear regression using SPSS.

Click on the Variable View tab and enter EDISS in the Name column; enter liking in the Name column below it. Reduce the value to 0 in the Decimals column for both rows. By default, SPSS should identify the variables, EDISS and liking, as being scaled, which they are, so there is no need to change this in the Measure column.

Click on the Data View tab. Enter the data for the EDIS-S in the first column; enter the data for liking in the second column.

Go to the menu bar and click Analyze, then Regression, then Linear . . . to display a dialog box. Using the arrows, move the criterion variable, liking, into the Dependent box; move the predictor variable, EDISS, into the Independent(s) box.

Select OK or select Paste and click the Run command.

Figure 8.12 ⦁ A Table and Scatter Plot Showing the Relationship Between EDIS-S Scores and Liking for Sugar Water
The regression line is given in the scatter plot. Both the table and the scatter plot show the same data.

Table 8.4 ⦁ SPSS Output for Linear Regression

The SPSS output table, shown in Table 8.4, displays three ways to analyze the data. The top table shows the proportion of variance, R² = .797, which is an estimate for how well EDIS-S scores predict ratings of liking for the sugar water. Values closer to 1.0 indicate better predictions. Results for the regression analysis are given in the middle table. Based on the results in that table, we conclude that EDIS-S scores (the predictor variable) do significantly predict a liking for sugar water (the criterion variable), as indicated by the p value in the Sig.
column (how to interpret a p value is described further in Chapters 10–12 and 14). To make a decision, if a regression analysis is significant, then the decision is that the predictive relationship observed in the sample will also be observed in the larger population from which the sample was selected. To determine the direction of the relationship between EDIS-S scores and liking for the sugar water, we look at the standardized beta coefficient given in the bottom table. In this example, the beta (β) coefficient is positive, β = +.893, indicating that higher scores on the EDIS-S predict higher ratings for the sugar water. Based on guidelines in the Publication Manual of the American Psychological Association (APA, 2020), we report the results of a regression analysis with one predictor variable by including the value of R2 (top table), the value of β (bottom table), and the results of the regression analysis (middle table; reported as an F value) as shown: A regression analysis showed that EDIS-S scores significantly predicted ratings of liking for sugar water, β = +.893, F(1, 13) = 50.978, p < .001 (R2 = .797). Testing the Assumptions of Parametric Testing Note that a Pearson correlation test is a parametric statistic. This means that it can only be used when the data being tested are on an interval or ratio scale. Additionally, this test has the following assumptions, all of which must be met: Normality. We assume that the data points are normally distributed, such that (a) the population of X and Y scores are normal; (b) for each X score, the distribution of Y scores is normal; and (c) for each Y score, the distribution of X scores is normal. Linearity. We assume that the best way to describe a pattern of data is using a straight line, as opposed to other shapes such as curvilinear shapes. Homoscedasticity. We assume that there is an equal (“homo”) variance or scatter (“scedasticity”) of data points dispersed along the regression line. 
When the variance of data points along the regression line is not equal, the Pearson correlation coefficient (r) tends to underestimate the strength of a correlation.

Likewise, a regression analysis test is a parametric statistic in that it can only be used when the data being tested are on an interval or ratio scale. Additionally, this test has the following assumptions, all of which must be met:

Normality of errors. We assume that the errors for each individual value of X (the predictor variable) are normally distributed.

Independence of errors. The residuals (or errors) between the observed value and the predicted value should be independent (i.e., not significantly correlated) for each value of the predictor variable.

Linearity. We assume that the best way to predict the value of Y (the variable we are trying to predict) given known values of X (the predictor) is using the equation for a straight line, as opposed to equations for other shapes such as curvilinear shapes.

Homoscedasticity. We again assume that there is an equal (“homo”) variance or scatter (“scedasticity”) of data points dispersed along the regression line.

Keep in mind that satisfying the assumptions for the correlation and regression analysis is critically important. For each example using SPSS in this chapter, the data are given such that the assumptions for conducting the tests were met. That said, always take caution to test the assumptions when using parametric tests. Ensuring that the assumptions for a given parametric test are met is a critical step to optimize the accuracy of the conclusions you draw from such testing. See also Table 14.2 in Chapter 14 (p. 400) for a summary of alternative tests to the Pearson correlation when the data being tested are not on an interval or ratio scale.

Chapter Summary

LO 1 Identify and construct open-ended, partially open-ended, and restricted survey items.
An open-ended item is a question or statement in a survey that allows the respondent to give any response in their own words, without restriction. This type of question is most often used in qualitative research.

A partially open-ended item is a question or statement in a survey that includes a few restricted answer options and then a last option that allows participants to respond in their own words in case the few restricted options do not fit with the answer they want to give.

A restricted item is a question or statement in a survey that includes a restricted number of answer options to which participants must respond. This type of question is most often used in quantitative research.

LO 2 Identify 10 rules for writing valid and reliable survey items.

Ten rules for writing valid and reliable survey items are as follows:

  1. Keep it simple.
  2. Avoid miscategorizing response items.
  3. Avoid double-barreled items.
  4. Use neutral or unbiased language.
  5. Minimize the use of negative wording.
  6. Avoid the response set pitfall.
  7. Use rating scales consistently.
  8. Limit the points on a rating scale.
  9. Label or anchor the rating scale points.
  10. Minimize item and survey length.

LO 3 Describe methods of administering written surveys and interview surveys.

A survey can be written (in print or electronically) or spoken (such as in an interview). A written survey can be administered in person, by mail, or using the Internet. An interview survey can be administered face to face, by telephone, or in focus groups. In-person and face-to-face surveys have the best response rates. In addition, written surveys are preferred to interview surveys in quantitative research partly because interviews are prone to a possible interviewer bias.

LO 4 Explain how response rates to surveys can limit the interpretation of survey results.

The problem of low response rates is that people who respond to surveys are probably different from those who do not respond. Because we cannot collect data from people who fail to respond, it is difficult to know the exact characteristics of this group of nonresponders. For this reason, we cannot know for sure whether survey results of those who do respond are representative of the larger population of interest, which includes those who do not respond to surveys.

LO 5 Identify how to appropriately handle and administer surveys.

To appropriately handle and administer surveys, ensure that the survey itself is not offensive or stressful to the respondent; do not coerce respondents into answering questions or completing a survey; do not harass respondents in any way for recruitment purposes; and protect the confidentiality or anonymity of respondents.

LO 6 Identify and describe the direction and strength of a correlation.

The correlation coefficient, r, is used to measure the extent to which two factors (X and Y) are related. The value of r indicates the direction and strength of a correlation. When r is negative, the values for two factors change in opposite directions; when r is positive, the values for two factors change in the same direction. The closer r is to ±1.0, the stronger the correlation, and the more closely two factors are related.

When plotted in a graph, the strength of a correlation is reflected by the distance that data points fall from the regression line. The closer that data points fall to a regression line, or the straight line that best fits a set of data points, the stronger the correlation or relationship between two factors.

LO 7 Explain how causality, outliers, and restriction of range can limit the interpretation of a correlation coefficient.
Three considerations that must be made to accurately interpret a correlation coefficient are as follows: (1) correlations do not demonstrate causality, (2) outliers can change the direction and the strength of a correlation, and (3) the direction and the strength of a correlation should never be generalized beyond the range of data measured in a sample (restriction of range).

LO 8 Explain how linear regression can be used to predict outcomes.

We can use the information provided by r to predict values of one factor, given known values of a second factor, using a procedure called linear regression. Specifically, we can use the value of r to compute the equation of a regression line and then use this equation to predict values of one factor, given known values of a second factor in a population. Using the following equation of the regression line, Y = bX + a, we can predict values of the criterion variable, Y, so long as we know values of the predictor variable, X.

LO 9 Compute the Pearson correlation coefficient and linear regression using SPSS.

SPSS can be used to compute the Pearson correlation coefficient using the Analyze, Correlate, and Bivariate options in the menu bar. These actions will display a dialog box that allows you to identify the variables and to run the correlation (for more details, see Section 8.11).

SPSS can be used to compute linear regression using the Analyze, Regression, and Linear . . . options in the menu bar. These actions will display a dialog box that allows you to identify the variables and to run the linear regression (for more details, see Section 8.11).
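
The link between r and the regression line summarized in LO 8 can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure: the scores in x and y are hypothetical values made up for the example, and the function names pearson_r and regression_line are our own. The sketch assumes the standard least-squares result that the slope is b = r(SD of Y / SD of X) and the intercept is a = (mean of Y) − b(mean of X).

```python
import math

def pearson_r(x, y):
    """Pearson correlation: sum of products / sqrt(SS_x * SS_y)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    return sp / math.sqrt(ss_x * ss_y)

def regression_line(x, y):
    """Return slope b and intercept a of Y = bX + a, computed from r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    sd_y = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    b = pearson_r(x, y) * sd_y / sd_x
    a = my - b * mx
    return b, a

# Hypothetical scores for illustration only (not data from the text)
x = [1, 2, 3, 4, 5]   # predictor variable
y = [2, 4, 5, 4, 5]   # criterion variable

r = pearson_r(x, y)
b, a = regression_line(x, y)
predicted = b * 4 + a   # predict Y for X = 4, a value inside the measured range

print(f"r = {r:.3f}, Y = {b:.1f}X + {a:.1f}, predicted Y at X = 4: {predicted:.1f}")
```

The prediction is made for a value of X inside the range of the sample, in keeping with the restriction-of-range caution in LO 7. With a single predictor, the standardized beta coefficient equals r, so R² = r²; this agrees with the regression output reported earlier, where β = .893 and .893² ≈ .797.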

The Use of the Test Statistic

These are pages 285-287 from the text (referring to the activities)

In an experiment, participants are selected from one population, then randomly assigned to groups.

Once participants have been assigned to groups, we conduct the experiment and measure the same dependent variable in each group. For example, suppose we test the hypothesis that music can inspire greater creativity. Studies are quite common in this area of research (see He, Wong, & Hui, 2017; Kokotsaki, 2011; H. Newton, 2015). To test this hypothesis, we can select a sample of participants from a single population and randomly assign them to one of two groups. In Group Music, participants listen to classical music for 10 minutes; in Group No Music, different participants listen to a lecture about music for 10 minutes. Listening to classical music versus a lecture is the manipulation. After the manipulation, participants in both groups are given 5 minutes to write down as many uses as they can think of for a paper clip. If the hypothesis is correct, then Group Music should come up with more practical uses for a paper clip than Group No Music. The number of practical uses for a paper clip, then, is the dependent variable measured in both groups.

To compare differences between groups, we will compute a test statistic, which is a mathematical formula that allows us to determine whether the manipulation (music vs. no music) or error variance (other factors attributed to individual differences) is likely to explain differences between the groups. In most cases, researchers measure data on an interval or a ratio scale of measurement. In our example, the number of practical uses for a paper clip is a ratio scale measure. In these situations, when data are interval or ratio scale, the appropriate test statistic for comparing differences between two independent samples is the two-independent-sample t test. This test statistic follows a common form:

t = (mean differences between groups) / (mean differences attributed to error)

A two-independent-sample t test, also called an independent-sample t test, is a statistical procedure used to test hypotheses concerning the difference in interval or ratio scale data between two group means, in which the variance in the population is unknown.

The numerator of the test statistic is the actual difference between the two groups. For example, suppose that participants in Group Music came up with five practical uses for a paper clip on average, and Group No Music came up with two practical uses on average. The mean difference, then, between the two groups is 3 (5 − 2 = 3). We divide the mean difference between two groups by the value for error variance in the denominator. The smaller the error variance, the larger the value of the test statistic will be. In this way, the smaller the error variance, or the less overlap in scores between groups, the more likely we are to conclude that the manipulation, not factors attributed to individual differences, is causing differences between groups. To illustrate further, we will work through this example using SPSS.

10.7 SPSS in Focus Two-Independent-Sample t Test

In Section 10.4, we used data originally given in Figure 10.4 to illustrate that the more overlap in scores between groups, the larger the error variance. We will use these same data, reproduced in Table 10.3, and assume that they represent the number of practical uses for a paper clip from the classical music and creativity study. We will use SPSS to compute a two-independent-sample t test for each data set given in Table 10.3: one test for the no-overlap example and one test for the overlap example.

  1. Click on the Variable View tab and enter Groups in the Name column. In the second row, enter NoOverlap in the Name column. In the third row, enter Overlap in the Name column. We will enter whole numbers in each column, so reduce the value to 0 in the Decimals column in each row. We can also define the scale of measurement for each variable. Go to the Measure column to select Nominal from the dropdown menu for Groups (because this is a categorical variable), then select Scale from the dropdown menu for NoOverlap and for Overlap.
  2. In the first row (labeled Groups), click on the Values column and click on the small gray box with three dots. To label the groups, in the dialog box, enter 1 in the value cell and No Music in the label cell, and then click Add. Then enter 2 in the value cell and Classical Music in the label cell, and then click Add. Select OK.
  3. Click on the Data View tab. In the first column (labeled Groups), enter 1 five times, then 2 five times, which are the codes we entered in Step 2 for each group. In the second column (labeled NoOverlap), enter the scores for the No Music group next to the 1s and enter the scores for the Classical Music group next to the 2s for the no-overlap data given in Table 10.3 (left side). In the third column (labeled Overlap), enter the scores for the No Music group next to the 1s and enter the scores for the Classical Music group next to the 2s for the overlap data given in Table 10.3 (right side). Figure 10.7 shows how the data should appear.
  4. Go to the menu bar and click Analyze, then Compare Means, and Independent-Samples T Test to bring up a dialog box, which is shown in Figure 10.8.
  5. Use the arrows to move the data for NoOverlap and Overlap into the Test Variable(s): cell. SPSS will compute a separate t test for each of these sets of data. Select Groups and use the arrow to move this column into the Grouping Variable: cell. Two question marks will appear in that cell.
  6. To define the groups, click Define Groups . . . to bring up a new dialog box. Enter 1 in the Group 1: box, and enter 2 in the Group 2: box, and then click Continue. Now a 1 and 2 will appear in the Grouping Variable box instead of question marks.
  7. Select OK or select Paste and click the Run command.

Table 10.3 ⦁ Data to Enter Into SPSS

No-Overlap Example               Overlap Example
No Music    Classical Music      No Music    Classical Music
1           4                    0           1
1           4                    0           4
2           5                    2           4
3           6                    3           8
3           6                    5           8
M = 2       M = 5                M = 2       M = 5

The data are reproduced from those given in Figure 10.4 (no-overlap example) and Figure 10.5 (overlap example).

The output table, shown in Table 10.4, gives the results for both data sets; key results are circled and described in the table. Read the first row of each cell because we will assume that the variances were equal between groups. In the Mean Difference column, notice that the mean difference between the two groups is the same for both data sets; the mean difference is −3.0. However, notice in the Std. Error Difference column that the error variance is much smaller for the no-overlap data. The mean difference is the numerator for the test statistic, and the Std. Error Difference (or error variance) is the denominator. If you divide those values, you will obtain the value of the test statistic, given in the t column. The Sig. (2-tailed) column gives the p value, which is the likelihood that individual differences, or anything other than the music manipulation, caused the 3-point effect. The results show that when scores do not overlap between groups, the likelihood that individual differences explain the 3-point effect is p = .001; however, when scores do overlap between groups, this likelihood is much larger, p = .105.
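
The division described above can be checked by hand. The following Python sketch is our own illustration, not part of the SPSS procedure: it uses the scores as read from Table 10.3, assumes equal variances (the first row of the SPSS output), and computes the mean difference, the standard error of the difference, and t. The function name independent_t is hypothetical.

```python
import math
from statistics import mean, variance

def independent_t(group1, group2):
    """Two-independent-sample t test with a pooled (equal-variance) error term."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2                      # N - 2 degrees of freedom
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # denominator
    mean_diff = mean(group1) - mean(group2)                 # numerator
    return mean_diff, std_error, mean_diff / std_error, df

# Scores from Table 10.3 (Group 1 = No Music, Group 2 = Classical Music)
no_overlap = {"no_music": [1, 1, 2, 3, 3], "classical": [4, 4, 5, 6, 6]}
overlap = {"no_music": [0, 0, 2, 3, 5], "classical": [1, 4, 4, 8, 8]}

for label, data in (("No-overlap", no_overlap), ("Overlap", overlap)):
    diff, se, t, df = independent_t(data["no_music"], data["classical"])
    print(f"{label}: mean difference = {diff}, SE = {se:.3f}, t({df}) = {t:.3f}")
```

Both data sets have the same numerator (−3), but the standard error is about 0.632 for the no-overlap data and about 1.643 for the overlap data, which yields t(8) = −4.743 and t(8) = −1.826, respectively. The sketch stops short of the p value, which requires the t distribution; a library routine such as scipy.stats.ttest_ind could supply it if SciPy is available.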

Figure 10.7 ⦁ SPSS Data View for Step 3

The criterion in the behavioral sciences is p ≤ .05. When p ≤ .05, we conclude that the manipulation caused the effect because the likelihood that anything else caused the effect is less than 5%. When p > .05, we conclude that individual differences, or something else, caused the effect because the likelihood is greater than 5% that something else, typically attributed to individual differences, is indeed causing the effect. In this way, the smaller the error variance or overlap in scores between groups, the more likely we are to conclude that differences between groups were caused by the manipulation and not individual differences.

Figure 10.8 ⦁ SPSS Dialog Box for Steps 4 to 6

Also given in Table 10.4, the two-independent-sample t test is associated with N − 2 degrees of freedom, in which N is the total sample size. We report the results of a t test in a research journal using guidelines given in the Publication Manual of the American Psychological Association (APA, 2020). Using these guidelines, we report the results computed here by stating the value of the test statistic, the degrees of freedom (df), and the p value for each t test as shown:

A two-independent-sample t test showed that classical music significantly enhanced participant creativity when the data did not overlap, t(8) = −4.743, p = .001; the results were not significant when the data did overlap, t(8) = −1.826, p = .105.

Table 10.4 ⦁ SPSS Output Table for the Two-Independent-Sample t Test

To read the table, assume equal variances. Notice that the error variance is smaller when scores do not overlap between groups, thereby making the value of the test statistic larger.

Adding groups can allow for more informative conclusions of observed results.

These are pages 291-293 in the text (referring to the activities)

The Use of the Test Statistic

Once participants have been assigned to groups, we conduct the experiment and measure the same dependent variable in each group. For example, suppose we want to test the hypothesis that gym patrons will crave more high-fat foods after an intense workout, compared with an easy or moderate aerobic workout. To test this hypothesis, we could create the three exercise levels (easy, moderate, or intense) and randomly assign patrons to each group.

To compare differences between groups, we will compute a test statistic, which allows us to determine whether the manipulation (easy, moderate, or intense workout) or error variance (other factors attributed to individual differences) is likely to explain differences between the groups. In most cases, researchers measure data on an interval or a ratio scale of measurement. In our example, the number of high-fat foods chosen is a ratio scale measure. In these situations, when data are interval or ratio scale, the appropriate test statistic for comparing differences among two or more independent samples is the one-way between-subjects analysis of variance (ANOVA). The term one-way indicates the number of factors in a design. In this example, we have one factor or independent variable (type of workout). This test statistic follows a common form:

F = (variability between groups) / (variability attributed to error)

The one-way between-subjects ANOVA is a statistical procedure used to test hypotheses for one factor with two or more levels concerning the variance among group means. This test is used when different participants are observed at each level of a factor and the variance in a given population is unknown.

An ANOVA is computed by dividing the variability in a dependent measure attributed to the manipulation or groups, by the variability attributed to error or individual differences. When the variance attributed to error is the same as the variance attributed to differences between groups, the value of F is 1.0, and we conclude that the manipulation did not cause differences between groups. The larger the variance between groups relative to the variance attributed to error, the larger the value of the test statistic, and the more likely we are to conclude that the manipulation, not individual differences, is causing an effect or a mean difference between groups.

The one-way between-subjects ANOVA informs us only that the means for at least one pair of groups are different—it does not tell us which pairs of groups differ. For situations in which we have more than two groups in an experiment, we compute post hoc tests or “after the fact” tests to determine which pairs of groups are different. Post hoc tests are used to evaluate all possible pairwise comparisons, or differences in the group means between all possible pairings of two groups. In the exercise and food cravings experiment, we would use the one-way between-subjects ANOVA to determine if the manipulation (easy, moderate, or intense workout groups) caused the mean number of high-fat foods that participants craved to be different or to vary between groups. We would then compute post hoc tests to determine which pairs of group means were different. To illustrate further, we will work through this research example using SPSS. The data for this example, as well as steps for analyzing these data, are described in Figure 10.11.

A post hoc test is a statistical procedure computed following a significant ANOVA to determine which pair or pairs of group means significantly differ. These tests are needed with more than two groups because multiple comparisons must be made.

A pairwise comparison is a statistical comparison for the difference between two group means. A post hoc test evaluates all possible pairwise comparisons for an ANOVA with any number of groups.
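
To make the idea of "all possible pairwise comparisons" concrete, the short Python sketch below lists every pairing of the three workout groups and counts the pairings for any number of groups. It is our own illustration; the helper name n_pairwise is hypothetical, and the count uses the standard formula k(k − 1)/2 for k groups.

```python
from itertools import combinations

def n_pairwise(k):
    """Number of distinct pairings among k group means."""
    return k * (k - 1) // 2

groups = ["Easy", "Moderate", "Intense"]
pairs = list(combinations(groups, 2))   # each pair of groups appears once

for first, second in pairs:
    print(f"{first} vs. {second}")
print(f"{len(groups)} groups -> {n_pairwise(len(groups))} pairwise comparisons")
```

With two groups there is only one comparison, so no post hoc test is needed; with three groups there are three comparisons, and the number grows quickly as groups are added (four groups give six).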

Reducing error variance increases power—or likelihood of observing an effect, assuming it exists.

10.9 SPSS in Focus One-Way Between-Subjects ANOVA

We will use SPSS to compute the one-way between-subjects ANOVA for the data given in Figure 10.11 in Step 1. For these data, we will test the hypothesis that patrons at a gym will crave more high-fat foods after an intense aerobic workout, compared with an easy or moderate aerobic workout. There are two commands that we could use to analyze these data; we will use the One-Way ANOVA command.

  1. Click on the Variable View tab and enter Groups in the Name column. In the Values column, click on the small gray box with three dots. To label the groups, in the dialog box, enter 1 in the Value cell and Easy in the Label cell, then click Add. Then enter 2 in the Value cell and Moderate in the Label cell, and then click Add. Then enter 3 in the Value cell and Intense in the Label cell, and then click Add. Then click OK.
  2. Still in the Variable View, enter Foods in the Name column in the second row. Reduce the value to 0 in the Decimals column for each row. Go to the Measure column to select Nominal from the dropdown menu for Groups (because this is a categorical variable), then select Scale from the dropdown menu for Foods.
  3. Click on the Data View tab. In the first column (labeled Groups), enter 1 five times, 2 five times, and 3 five times, which are the codes we entered in Step 1 for each group. In the Foods column, enter the data for each group as shown in Figure 10.12a.
  4. Go to the menu bar and click Analyze, then Compare Means, and One-Way ANOVA to bring up the dialog box shown in Figure 10.12b.
  5. Using the appropriate arrows, move Groups into the Factor: box. Move Foods into the Dependent List: box.
  6. Click the Post Hoc option to bring up the new dialog box shown in Figure 10.12c. Select Tukey, which is a commonly used post hoc test. Click Continue.
  7. Select OK, or select Paste and click the Run command.

The output table, shown in Table 10.5, gives the results for the one-way between-subjects ANOVA. The numerator of the test statistic is the variance between groups, 46.667, and the denominator is the variance attributed to individual differences or error, 2.167. When you divide those two values, you obtain the value of the test statistic, 21.538. The Sig. column gives the p value, which in our example shows that the likelihood that anything other than the exercise manipulation caused differences between groups is p < .001. We decide that the group manipulation caused the differences when p < .05; hence, we decide that the manipulation caused group differences. However, remember that this result does not tell us which groups are different; it tells us only that at least one pair of group means differ significantly.
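
The ratio described above can be sketched in Python. The data for this example are given only in Figure 10.11, so the scores below are hypothetical values chosen for easy arithmetic; they are not the data from the text, and the function name one_way_anova is our own. The sketch computes the variance between groups (mean square between), the variance attributed to error (mean square error), and their ratio, F.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_error) for a list of independent groups."""
    scores = [s for g in groups for s in g]
    grand_mean = mean(scores)
    df_between = len(groups) - 1            # number of groups minus 1
    df_error = len(scores) - len(groups)    # total sample size minus number of groups
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((s - mean(g)) ** 2 for g in groups for s in g)
    ms_between = ss_between / df_between    # numerator of F
    ms_error = ss_error / df_error          # denominator of F
    return ms_between / ms_error, df_between, df_error

# Hypothetical scores for illustration only (not the data in Figure 10.11)
easy, moderate, intense = [1, 2, 3], [2, 3, 4], [6, 7, 8]

f_value, df_b, df_e = one_way_anova([easy, moderate, intense])
print(f"F({df_b}, {df_e}) = {f_value:.3f}")
```

For these hypothetical scores, the mean square between groups is 21 and the mean square error is 1, so F(2, 6) = 21.000. Applying the same division to the rounded values reported in Table 10.5 gives 46.667 / 2.167 ≈ 21.54; the small difference from the reported 21.538 presumably reflects rounding of the mean squares.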

To determine which groups are different, we conducted post hoc tests, shown in Table 10.6. On the left, you see Easy, Moderate, and Intense labels for the rows. You read the table as comparisons across the rows. The first comparison on the first line in the table is Easy and Moderate. If there is an asterisk next to the value given in the Mean Difference column, then those two groups significantly differ (note that the p value for each comparison is also given in the Sig. column for each comparison). The next comparison is Easy and Intense in the top left boxed portion of the table. For all comparisons, the results show that people choose significantly more high-fat foods following an intense workout, compared with a moderate and an easy workout.

Also given in Table 10.5, the one-way between-subjects ANOVA is associated with two sets of degrees of freedom (df): one for the variance between groups and one for the variance attributed to error. Using APA (2020) guidelines, we report the results of an ANOVA by stating the value of the test statistic, both degrees of freedom, and the p value for the F test, and indicate the results of the post hoc test if one was computed as shown:

A one-way between-subjects ANOVA showed that the number of high-fat foods chosen significantly varied by the type of workout participants completed, F(2, 12) = 21.538, p < .001. Participants chose significantly more high-fat foods following a moderate or intense workout compared to an easy workout (Tukey’s honestly significant difference, p ≤ .003).

Figure 10.11 ⦁ The Steps for Analyzing Differences Between More Than Two Group Means

QUASI-EXPERIMENTAL AND SINGLE-CASE EXPERIMENTAL DESIGNS

In the natural world, the environment or situation you find yourself in can be dynamic. You need look no further than a college classroom. Suppose, for example, that you take a college exam in which the average student scored a 50%. Why were the exam grades so low? Was the professor ineffective in their teaching? Did the students study for the exam? Was the exam itself not fair? Was the material being studied too difficult or at too high a level? In this example, the answer can be difficult to identify because the environment is constrained by preexisting factors—the time, date, content, professor, and students enrolled in the course were not assigned by a researcher but instead were determined by the school and students. Accounting for these preexisting factors is important to determine why the exam grades were low.

In other situations, we can have difficulty obtaining large samples of participants. If a company were a small business, then it would have few employees, or if a behavioral disorder were rare, then it would afflict few people. In these cases, we probably could not obtain a large sample, so it would be advantageous to observe the behavior of only one or a few individuals. For example, we could observe an employee after a merger as new policy changes successively go into effect, or we could observe a patient’s health across multiple phases of treatment. In each case, we follow one participant and observe their behavior over time.

In this chapter, we introduce quasi-experimental designs used in science to make observations in settings that are constrained by preexisting factors. We also introduce many methods used to assess the behavior of a single participant or subject using single-case experimental designs, typically used when a large sample cannot be obtained.

QUASI-EXPERIMENTAL DESIGNS

Suppose we hypothesize that high school graduates who attend college will value an education more than those who do not attend college. To test this hypothesis, we could select a sample of high school graduates from the same graduating class and divide them into two groups: those who attended college (Group College) and those who did not attend college (Group No College). We could then have all participants complete a survey in which higher scores on the survey indicate a higher value placed on obtaining an education. If the hypothesis is correct and we set up this study correctly, then participants in Group College should show higher scores on the survey than participants in Group No College.

Notice in this example that participants controlled which group they were assigned to—they either attended college or did not. Hence, in this example, the factor of interest (whether or not students attended college) was a quasi-independent variable. When a factor in a study is not manipulated (i.e., quasi-independent), this typically means that the study is a type of quasi-experimental research design. In this chapter, we separate the content into two major sections: quasi-experimental designs and single-case experimental designs. We begin this chapter with an introduction to the type of research design illustrated here: the quasi-experimental research design.

9.1 AN OVERVIEW OF QUASI-EXPERIMENTAL DESIGNS

In this major section, we introduce a common type of research design called the quasi-experimental research design. The quasi-experimental research design, also defined in Chapter 6, is structured similar to an experiment, except that this design does one or both of the following:

It includes a quasi-independent variable (also defined in Chapter 6).

It lacks an appropriate comparison/control group.

A quasi-experimental research design is the use of methods and procedures to make observations in a study that is structured similar to an experiment, but the conditions and experiences of participants lack some control because the study lacks random assignment, includes a preexisting factor (i.e., a variable that is not manipulated), or does not include a comparison/control group.

A quasi-independent variable is a preexisting variable that is often a characteristic inherent to an individual, which differentiates the groups or conditions being compared in a research study. Because the levels of the variable are preexisting, it is not possible to randomly assign participants to groups.

In the example used to introduce this section, the preexisting factor was college attendance (yes, no). The researchers did not manipulate or randomly assign participants to groups. Instead, participants were assigned to Group College or Group No College based on whether they attended college prior to the study. In other words, the participants, not the researcher, controlled which group they were assigned to. In this way, the study described to introduce this section was a quasi-experiment—the study was structured like an experiment in that differences in how students value college were compared between groups, but it lacked a manipulation (of the groups: whether students attended or did not attend college) and randomization (of assigning participants to each group).

Hence, a quasi-experiment is not an experiment because, as illustrated in Figure 9.1, the design does not meet all three requirements for demonstrating cause. In the college attendance study, for example, additional unique characteristics of participants, other than whether or not they attended college, could also be different between groups and therefore could also be causing differences between groups. For example, levels of motivation and academic ability may also be different between people who attend and do not attend college. When other possible causes cannot be ruled out, the design does not demonstrate cause.
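
To see how a preexisting factor can produce a group difference by itself, consider the following minimal simulation sketch. Everything in it is an assumption made for illustration: the variable names, the scoring model, and the numbers are hypothetical and are not taken from any study. In the simulation, attending college has no effect at all on survey scores; only motivation does, and more motivated participants are more likely to choose college.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible


def simulate_value_score(motivation):
    # Hypothetical model: the survey score depends only on motivation,
    # NOT on college attendance (the true effect of college is zero).
    return 50 + 10 * motivation + random.gauss(0, 5)


college, no_college = [], []
for _ in range(2000):
    motivation = random.gauss(0, 1)
    # Participants self-select: more motivated people are more likely to attend.
    attends = motivation + random.gauss(0, 1) > 0
    score = simulate_value_score(motivation)
    (college if attends else no_college).append(score)

# Difference between group means, even though college has no causal effect here.
gap = statistics.mean(college) - statistics.mean(no_college)
print(f"Group College mean minus Group No College mean: {gap:.1f}")
```

Group College scores higher in this sketch even though attendance was given no causal role, which is why a difference between self-selected groups cannot by itself demonstrate cause.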

Figure 9.1 ⦁ A Simplified Distinction Between Experiments, Quasi-Experiments, and Nonexperiments

The line represents the requirements for demonstrating cause: randomization, manipulation, and comparison/control. A quasi-experiment lacks at least one of these requirements and so fails to demonstrate cause.

In this major section, we introduce four categories of quasi-experimental research designs used in the behavioral sciences:

  1. One-group designs (posttest only and pretest-posttest)
  2. Nonequivalent control group designs (posttest only and pretest-posttest)
  3. Time series designs (basic, interrupted, and control)
  4. Developmental designs (longitudinal, cross-sectional, and cohort-sequential)

Quasi-experiments include a quasi-independent variable and/or lack a control group.

9.2 QUASI-EXPERIMENTAL DESIGN: ONE-GROUP DESIGNS

In some situations, researchers ask questions that require the observation of a single group. When only one group is observed, the study lacks a comparison group and so does not demonstrate cause; that is, the study is a quasi-experiment. Two types of one-group quasi-experiments are the following:

  1. One-group posttest-only design
  2. One-group pretest-posttest design

One-Group Posttest-Only Design

The type of quasi-experiment most susceptible to threats to internal validity is the one-group posttest-only design, which is also called the one-shot case study (Campbell & Stanley, 1966). Using the one-group posttest-only design, a researcher measures a dependent variable for one group of participants following a treatment. For example, as illustrated in Figure 9.2, after a professor gives a lecture (the treatment), they may record students’ grades on an exam out of 100 possible points (the dependent variable) to test their learning.

The major limitation of this design is that it lacks a comparison or control group. Consider, for example, the exam scores following the lecture. If exam scores are high following the lecture, can we conclude that the lecture is effective? How can we know for sure if scores would have been high even without the lecture? We cannot know this because we have nothing to compare this outcome to; we have no control group. Hence, the design is susceptible to many threats to internal validity, such as history effects (unanticipated events that can co-occur with the exam) and maturation effects (natural changes in learning). In all, these limitations make the one-group posttest-only design a poor research design.

A one-group posttest-only design is a quasi-experimental research design in which a dependent variable is measured for one group of participants following a treatment.

One-group designs lack a control group.

One-Group Pretest-Posttest Design

One way to minimize problems related to having no control or comparison group is to measure the same dependent variable in one group of participants before (pretest) and after (posttest) a treatment. Using this type of research design, called a one-group pretest-posttest design, we measure scores before and again following a treatment, then compare the difference between pretest and posttest scores. The advantage is that we can compare scores after a treatment to scores on the same measure in the same participants prior to the treatment. The disadvantage is that the one-group design does not include a no-treatment control group and therefore is still prone to many threats to internal validity, including those associated with observing the same participants over time (e.g., testing effects and regression toward the mean).

A one-group pretest-posttest design is a quasi-experimental research design in which the same dependent variable is measured in one group of participants before (pretest) and after (posttest) a treatment is administered.

Figure 9.2 ⦁ The One-Group Posttest-Only Quasi-Experimental Design

To illustrate the one-group pretest-posttest design, we will look at the research example illustrated in Figure 9.3. Kimport and Hartzell (2015) measured state anxiety (a type of undesired current stress that is temporary or changes as experiences or conditions change) among psychiatric inpatients from two general adult units at a private psychiatric hospital before and after a structured clay therapy in which patients could use and mold clay for up to 10 minutes. Their results showed that state anxiety was significantly reduced from before to after the therapy. A limitation of this design is that participants were not randomly assigned to groups. This means that any other factors related to state anxiety could also explain the findings. Factors include changes in the conditions or experiences of patients other than the therapy, such as patient interactions with the researchers, how long they actually used the clay during the therapy, and distractions during the therapy (e.g., noises, decorations in the setting). These factors were largely beyond the control of the researchers and therefore could have also influenced the results. In addition, because the study lacked a control group with patients who had no therapy at all, the design was susceptible to many threats to internal validity, as stated previously. Indeed, Kimport and Hartzell (2015) directly acknowledged that “control groups in future research may prevent additional confounds from occurring including the novelty effect, which implies that the treatment may have been effective simply because it was new for the participants” (p. 188). Thus, it is possible any type of new therapy intervention could have been effective.

Figure 9.3 ⦁ The One-Group Pretest-Posttest Quasi-Experimental Design

Source: Based on a design used by Kimport and Hartzell (2015).
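
The comparison made in a one-group pretest-posttest design can be sketched in a few lines of Python. The scores below are hypothetical values invented for illustration; they are not the data reported by Kimport and Hartzell (2015), and the variable names are our own. The sketch computes a difference score for each participant and a related-samples t statistic, which is one common way (though not the only way) to evaluate a pretest-posttest change.

```python
import math
import statistics

# Hypothetical state-anxiety scores for the same eight patients (not real data).
pretest = [48, 52, 45, 60, 55, 50, 47, 58]
posttest = [40, 47, 41, 52, 50, 44, 45, 49]

# Difference score for each participant: posttest minus pretest.
changes = [post - pre for pre, post in zip(pretest, posttest)]
mean_change = statistics.mean(changes)

# Related-samples t statistic: mean change divided by its standard error.
standard_error = statistics.stdev(changes) / math.sqrt(len(changes))
t_statistic = mean_change / standard_error

print(f"Mean change: {mean_change}")
print(f"t statistic: {t_statistic:.2f}")
```

Even a large change in this kind of analysis shows only that scores differed from pretest to posttest. Without a control group, the analysis cannot separate the treatment from history, testing, or novelty effects.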

9.3 QUASI-EXPERIMENTAL DESIGN: NONEQUIVALENT CONTROL GROUP DESIGNS

In some cases, researchers can use nonequivalent control groups, when it is not possible to randomly assign participants to groups. A nonequivalent control group is a type of control group that is matched upon certain preexisting characteristics similar to those observed in a treatment group, but to which participants are not randomly assigned. For example, suppose a professor gives a new lecture method to your research methods class and gives a traditional method in another research methods class, then compares grades on the topic lectured. The classes are matched on certain characteristics: Both classes are on the same topic (research methods), offered at the same school, and taught by the same professor. However, the class taught using the traditional method is a nonequivalent control group because students in that class chose to enroll in the class, so they were not randomly assigned to that class. Any preexisting differences between students who tend to enroll for one class over another, called selection differences, could therefore explain any differences observed between the two classes. Two types of nonequivalent control group quasi-experiments are the following:

A nonequivalent control group is a control group that is matched upon certain preexisting characteristics similar to those observed in a treatment group but to which participants are not randomly assigned. In a quasi-experiment, a dependent variable measured in a treatment group is compared to that in the nonequivalent control group.

Selection differences are any differences, which are not controlled by the researcher, between individuals who are selected from preexisting groups or groups to which the researcher does not randomly assign participants.

  1. Nonequivalent control group posttest-only design
  2. Nonequivalent control group pretest-posttest design

Nonequivalent Control Group Posttest-Only Design

Using the nonequivalent control group posttest-only design, a researcher measures a dependent variable following a treatment in one group and compares that measure to a nonequivalent control group that does not receive the treatment. The nonequivalent control group will have characteristics similar to the treatment group, but participants will not be randomly assigned to this group, typically because it is not possible to do so. For example, as illustrated in Figure 9.4, suppose a professor gives a new teaching method in their research methods class and gives a traditional method in another research methods class, then tests all students on the material taught. In this example, the nonequivalent control group was selected because it matched characteristics in the treatment group (e.g., all students were taking a research methods class). Students, however, enrolled themselves in each class; random assignment was not used, so the comparison is a nonequivalent control group.

A nonequivalent control group posttest-only design is a quasi-experimental research design in which a dependent variable is measured following a treatment in one group and in a nonequivalent control group that does not receive the treatment.

Figure 9.4 ⦁ The Nonequivalent Control Group Posttest-Only Quasi-Experimental Design

A key limitation of this research design is that it is particularly susceptible to the threat of selection differences. In the example illustrated in Figure 9.4, because students enrolled in their college classes, they, not the researcher, controlled which class they enrolled in. Therefore, any preexisting differences between students who choose one section of a class over another, such as how busy the students’ daily schedules are or how motivated they are to attend earlier or later classes, may actually be causing differences in grades between classes. For this reason, the nonequivalent control group posttest-only design demonstrates only that a treatment is associated with differences between groups, not that a treatment caused differences between groups, if any were observed.

Nonequivalent control group designs include a “matched” or nonequivalent control group.

Nonequivalent Control Group Pretest-Posttest Design

A nonequivalent control group pretest-posttest design is a quasi-experimental research design in which a dependent variable is measured in one group of participants before (pretest) and after (posttest) a treatment and that same dependent variable is also measured at pretest and posttest in another nonequivalent control group that does not receive the treatment.

One way to minimize problems related to not having a comparison group is to measure a dependent variable in one group of participants observed before (pretest) and after (posttest) a treatment and measure that same dependent variable at pretest and posttest in another nonequivalent control group that does not receive the treatment. This type of design is called the nonequivalent control group pretest-posttest design. The advantage of this design is that we can compare scores before and after a treatment in a group that receives the treatment and in a nonequivalent control group that does not receive the treatment. While the nonequivalent control group will have characteristics similar to the treatment group, participants are not randomly assigned to this group, typically because it is not possible to do so. Hence, selection differences still can possibly explain observations made using this research design.

To illustrate the nonequivalent control group pretest-posttest design, we will look at the research example in Figure 9.5. Heinicke, Zuckerman, and Cravalho (2017) evaluated the effectiveness of online Readiness Assessment Tests (RATs)—quizzes or tests given prior to class to inform the instructor of where students are struggling the most, from which they can adapt course lectures—on overall class exam grades. Heinicke et al. (2017) hypothesized that implementing RATs into coursework would improve overall grades and class performance in general. To test this hypothesis, college students enrolled in one of two sections of a Psychology of Exceptional Children course were recruited to participate. In one section, the RATs were a required part of the course (the treatment group; Section B); in the other section, the course was structured the same except that RATs were not part of the coursework (the nonequivalent control group; Section A). Knowledge of course material was assessed both prior to and at the end of the course. As shown in Figure 9.6, while knowledge of course material was not different prior to the course, students who were in the class with RATs incorporated (the treatment group) showed overall higher grades on the final assessment by the end of the course compared with students in the nonequivalent control group who did not have RATs incorporated into the course.

A key limitation of this research design is that it is particularly susceptible to the threat of selection differences. In the example illustrated in Figure 9.5, because students enroll in college classes, they, not the researcher, control what classes they will be in. Any preexisting differences between students who choose one class over another, then, could also be causing differences between classes. For example, Heinicke et al. (2017) acknowledged that because students were not randomly assigned to classes, the differences in overall class performance between those who did versus did not have RATs incorporated into their course could also be due to other “potential extraneous variables, such as the timing of the class (e.g., Section B met at 10:00 a.m., whereas Section A met at 8:00 a.m.)” (p. 137). Hence, the nonequivalent control group pretest-posttest design, like the posttest-only design, demonstrates only that a treatment is associated with differences between groups, not that a treatment caused differences between groups, if any were observed.

Figure 9.5 ⦁ The Nonequivalent Control Group Pretest-Posttest Quasi-Experimental Design

Source: Based on a design used by Heinicke et al. (2017).

Figure 9.6 ⦁ The Overall Grades on a Final Assessment Between Groups That Did Versus Did Not Have RATs Incorporated into the Course

Source: Data are adapted from those reported by Heinicke et al. (2017).

RAT = Readiness Assessment Tests.
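
The logic of the nonequivalent control group pretest-posttest design, comparing the pretest-to-posttest gain in each group, can be sketched as follows. All scores are hypothetical and are not the data reported by Heinicke et al. (2017); the function name `mean_gain` is our own and is used only for illustration.

```python
import statistics

# Hypothetical knowledge scores (percent correct); not real data.
treatment_pre = [52, 48, 55, 50]    # section with RATs, before the course
treatment_post = [80, 76, 84, 78]   # section with RATs, end of the course
control_pre = [51, 49, 54, 50]      # section without RATs, before the course
control_post = [70, 66, 73, 69]     # section without RATs, end of the course


def mean_gain(pre, post):
    """Mean posttest score minus mean pretest score for one group."""
    return statistics.mean(post) - statistics.mean(pre)


treatment_gain = mean_gain(treatment_pre, treatment_post)
control_gain = mean_gain(control_pre, control_post)

# How much more the treatment group gained than the nonequivalent control group.
difference_in_gains = treatment_gain - control_gain
print(f"Treatment gain: {treatment_gain}, control gain: {control_gain}")
print(f"Difference in gains: {difference_in_gains}")
```

Comparing gains rather than posttest scores alone helps when groups start at different levels, but it does not remove selection differences: any preexisting factor that affects how much students improve could still explain the difference in gains.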

9.4 QUASI-EXPERIMENTAL DESIGN: TIME SERIES DESIGNS

In some situations, researchers observe one or two preexisting groups at many points in time before and after a treatment, and not just at one time, using designs called the time series quasi-experimental designs. Using these types of designs, we compare the pattern of change over time from before to following a treatment. Three types of time series quasi-experimental designs are as follows:

  1. Basic time series design
  2. Interrupted time series design
  3. Control time series design

Basic Time Series Design

When researchers manipulate the treatment, they use a basic time series design to make a series of observations over time before and after a treatment. The advantage of measuring a dependent variable at multiple times before and after a treatment is that it eliminates the problem associated with only having a snapshot of behavior. To illustrate, suppose we test a treatment for improving alertness during the day. To use the basic time series design, we record alertness at multiple times before and after we give participants the treatment, as illustrated in Figure 9.7. Notice in the figure that a pretest (at 12 p.m.) and posttest (at 4 p.m.) measure can be misleading because the pattern observed before and after the treatment recurred without the treatment at the same time the day before and the day after the treatment was given. The basic time series design allows us to uniquely see this pattern by making a series of observations over time.

A basic time series design is a quasi-experimental research design in which a dependent variable is measured at many different points in time in one group before and after a treatment that is manipulated by the researcher is administered.

Figure 9.7 ⦁ The Time Series Quasi-Experimental Design

A time series design is used to compare the pattern of behavior before and after the treatment. In this example, the pattern that occurs before and after the treatment recurs at the same time of day, even without the treatment.

Using the basic time series design, the researcher manipulates or controls when the treatment will occur. The advantage of this design is that we can identify if the pattern of change in a dependent variable before and after the treatment occurs only during that period and not during other periods when the treatment is not administered. The disadvantage of this design is that only one group is observed, so we cannot compare the results in the treatment group to a group that never received the treatment.
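
A minimal sketch of the alertness example shows why a single pretest-posttest comparison can mislead. The ratings are hypothetical values chosen to mirror the pattern described for Figure 9.7; the scale and variable names are assumptions made for illustration.

```python
import statistics

# Hypothetical alertness ratings (1 = drowsy, 10 = fully alert); not real data.
# Each day is measured at 12 p.m. and 4 p.m.; the treatment is given only
# on the "treatment day", between the two measurements.
alertness = {
    "day before": (4, 7),
    "treatment day": (5, 8),
    "day after": (4, 7),
}

# Change from 12 p.m. to 4 p.m. on each day.
changes = {day: later - earlier for day, (earlier, later) in alertness.items()}

# Average change on the days when no treatment was given.
no_treatment_changes = [
    change for day, change in changes.items() if day != "treatment day"
]

# Change on the treatment day beyond what recurs without the treatment.
treatment_specific_change = changes["treatment day"] - statistics.mean(
    no_treatment_changes
)
print(changes)
print(f"Change unique to the treatment day: {treatment_specific_change}")
```

Looking only at the treatment day, alertness rises by 3 points. The series of observations shows the same rise on the days without the treatment, so none of the change is unique to the treatment in this sketch.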

Time series designs include many observations made before and after a treatment.

In a basic time series design, we manipulate the treatment; in an interrupted time series design, the treatment is naturally occurring.

Interrupted Time Series Design

In some situations, researchers will measure a dependent variable multiple times before and after a naturally occurring treatment or event. Examples of a naturally occurring treatment or event include a scheduled medical procedure, a wedding, a natural disaster, a change in public policy, a new law, and a political scandal. These events occur beyond the control of the researcher, so the researcher loses control over the timing of the manipulation. In these situations, when multiple measurements are taken before and after a naturally occurring treatment, researchers use the interrupted time series design.

An interrupted time series design is a quasi-experimental research design in which a dependent variable is measured at many different points in time in one group before and after a treatment that naturally occurred.

As an example of the interrupted time series design, Fuller, Sahlqvist, Cummins, and Ogilvie (2012) measured the impact of two London Underground (the “Tube”) strikes by the train drivers on the usage of bicycle travel using a public bicycle share program that provided bicycles (“Boris bikes”) at docking stations around London for a small fee. For this study, the researchers recorded the number of trips per day on the Boris bikes. In Figure 9.8, the solid vertical lines show the dates for each 24-hour strike. Note that in their study, each time there was a strike, the number of trips on the Boris bikes spiked, showing evidence that the Tube strikes by train drivers were related to an increase in usage of the Boris bikes.

An advantage of the interrupted time series design is that we can identify whether the pattern of a dependent variable changes from before to following a naturally occurring treatment or event. The disadvantage of this design, like that for the basic time series design, is that only one group is observed, so we cannot compare the results in the treatment group to a group that never received or was never affected by a treatment. To address this disadvantage, we can include a matched or nonequivalent control group, as described in the next section.

Figure 9.8 ⦁ Total Number of Trips per Day on the Boris Bikes

On the day of each strike, there was a sudden increase in total number of trips. Data are reproduced with permission from those reported by Fuller et al. (2012).
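
The comparison at the heart of an interrupted time series can be sketched by separating the days on which the event occurred from all other days. The trip counts and strike positions below are hypothetical and are not the data reported by Fuller et al. (2012); they only mimic the pattern of a spike on each strike day.

```python
import statistics

# Hypothetical trips per day, in thousands (not real data).
daily_trips = [20, 21, 19, 22, 30, 20, 21, 19, 20, 31, 21, 20]
strike_days = {4, 9}  # positions in the series on which a strike occurred

strike_trips = [trips for day, trips in enumerate(daily_trips) if day in strike_days]
ordinary_trips = [
    trips for day, trips in enumerate(daily_trips) if day not in strike_days
]

# How far strike days sit above the typical ordinary day.
excess_trips = statistics.mean(strike_trips) - statistics.mean(ordinary_trips)

# Whether every strike day exceeds the busiest ordinary day in the series.
strikes_exceed_all = min(strike_trips) > max(ordinary_trips)

print(f"Mean excess on strike days: {excess_trips:.1f} thousand trips")
print(f"Every strike day above the busiest ordinary day: {strikes_exceed_all}")
```

Because the many ordinary days establish what usage normally looks like, a spike that appears only on strike days is harder to dismiss as ordinary day-to-day variation. The researcher still did not control when the strikes occurred, so the result shows an association rather than cause.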

Control Time Series Design

A basic or interrupted time series design that includes a matched or nonequivalent control group is called a control time series design. For example, Hacker et al. (2017) used a control time series design to test the effects of the implementation of a behavioral health child screening mandate in Massachusetts on the rates of behavioral health screenings. To compare their time series data, they also included rates of behavioral health screenings during the same time period in California, where such a policy was not implemented. California, then, was a nonequivalent control group that was matched because “it has a [similar] large diverse and stable Medicaid population [with] no competing mandate” (Hacker et al., 2017, p. 26). As shown in Figure 9.9, the implementation of the mandate was associated with an increase in behavioral health screenings in Massachusetts; no increase was observed in California during this same period.

A control time series design is a basic or interrupted time series quasi-experimental research design that also includes a nonequivalent control group that is observed during the same period as a treatment group but does not receive the treatment.

Figure 9.9 ⦁ Rate of Behavioral Health Screening in Massachusetts and California Before and After the 2008 Behavioral Health Child Screening Mandate in Massachusetts

Source: Reprinted with permission from Psychiatric Services, (Copyright © 2017). American Psychiatric Association. All Rights Reserved. The behavioral health child screening mandate was associated with an increase in behavioral health screenings in Massachusetts; no change seen in California (the matched control).

As a caution, while the addition of the matched control group strengthens the design, keep in mind that the residents in each state are preexisting groups in that residents chose to live in those locations (or were born in those locations); the researcher did not randomly assign them to live in those locations. It is therefore possible, like that for all other designs that use a nonequivalent control group, that selection differences (such as differences in access residents have to care and even the costs of care for residents in each state) could have caused the differences observed in behavioral health screening rates between the states. For this reason, we conclude that the mandate was associated with an increase in behavioral health screenings, not that the mandate caused the increase.

Table 9.1 summarizes each quasi-experimental research design described in this chapter. In the next section, we introduce a special case of quasi-experiments used in developmental research.

Table 9.1 ⦁ The Quasi-Experimental Research Designs

| Type of Quasi-Experimental Design | Description | Key Limitation |
| --- | --- | --- |
| One-group posttest only | Observe one group after (posttest) a treatment. | No control group for comparison |
| One-group pretest-posttest | Observe one group before (pretest) and after (posttest) a treatment. | No control group for comparison |
| Nonequivalent control group posttest only | Observe treatment and nonequivalent control groups after (posttest) a treatment. | No random assignment between groups |
| Nonequivalent control group pretest-posttest | Observe treatment and nonequivalent control groups before (pretest) and after (posttest) a treatment. | No random assignment between groups |
| Basic time series design | Make many observations over time before and after a treatment manipulated by the researcher. | No control group for comparison |
| Interrupted time series design | Make many observations over time before and after a naturally occurring treatment. | No control group for comparison |
| Control time series design | A time series design with a matched or nonequivalent control group. | No random assignment between groups |

Learning Check 1 ✓

1. The quasi-experimental research design is structured similar to an experiment, except ____________ [complete the sentence].

2. State the type of quasi-experimental research design described in each of the following examples:

A. A researcher records the time (in seconds) it takes a group of participants to complete a computer-based task following an online “how-to” course.

B. A researcher records the rate of traffic accidents on a section of highway each month for 2 years before and 2 years after the speed limit on that section of highway is reduced.

C. A researcher records employee satisfaction before and after a training seminar, then compares satisfaction scores for employees at a local branch to the scores for those at the main branch who did not receive the seminar.

Answers:

  1. the research design includes a quasi-independent variable and/or lacks an appropriate or equivalent control group; 2. A. One-group posttest-only design, B. Interrupted time series design, C. Nonequivalent control group pretest-posttest design.

9.5 QUASI-EXPERIMENTAL DESIGN: DEVELOPMENTAL DESIGNS

An important area of research examines changes that occur across the life span. This type of research aims to understand how people or species change as they develop or age. The unique aspect of this area of research is that age, which is the factor being studied, is a quasi-independent variable. Age is a preexisting factor in that the researcher cannot manipulate the age of a participant. Because this design does not include a manipulation, it is also commonly categorized as a nonexperimental design. However, in this chapter, we describe this under the quasi-experimental category because, as you will see, each design is analogous to a quasi-experimental design already introduced in this chapter. Regardless of the category that developmental designs fit best with, it is most important to note that while developmental designs can demonstrate that variables differ by age, they do not demonstrate what causes variables to differ by age; more controlled procedures, such as those used in an experiment, are needed to do so.

The study of developmental changes across the life span is a special case, in that the focus of the field is on a factor that is inherent to the participants (their age). Therefore, researchers have developed research designs specifically adapted to study changes across the life span. Three types of developmental research designs are the following:

  1. Longitudinal design
  2. Cross-sectional design
  3. Cohort-sequential design

Longitudinal Design

Using a research design called the longitudinal design, we can observe changes across the life span by observing the same participants over time as they age. Using this design, researchers observe the same participants and measure the same dependent variable at different points in time or at different ages. The longitudinal design is similar to the one-group pretest-posttest quasi-experimental research design in that one group of participants is observed over time. In a strictly longitudinal design, however, changes at different ages are tested, but no treatment is administered.

A longitudinal design is a developmental research design used to study changes across the life span by observing the same participants at different points in time and measuring the same dependent variable at each time.

To illustrate the longitudinal design, consider the research example illustrated in Figure 9.10. Vrangalova (2015) tested the hypothesis that casual sex among college students is related to their well-being. To test this hypothesis, the researcher had a sample of 528 undergraduate students complete an online survey at the beginning (Time 1) and again at the end (Time 2) of 1 academic year. In support of the hypothesis, students who reported having engaged in “hookups for anonymous reasons” (p. 945) between Time 1 and Time 2 had lower self-esteem and higher depression and anxiety scores compared with those who did not report engaging in this activity. This study highlights a key advantage of the longitudinal design in that changes in participant behavior can be recorded over extended periods (e.g., Hawkley, Thisted, & Cacioppo, 2009), even 1 year or more.

The disadvantage of the longitudinal design is that it is prone to many threats to internal validity associated with observing participants over time. For example, many participants may drop out of the study over time (attrition). One possibility is that those who are most motivated to complete the study will remain at the end, so it could be motivation to complete the study, not age, that is associated with any changes observed. In addition, participants could learn how to take the assessments (testing effect) or settle down during the study so that assessments at Time 2 actually reflect their true score (regression toward the mean) on the measures recorded. Finally, the longitudinal design can require substantial resources, money, recruitment efforts, and time to complete, particularly for studies that last years or even decades.

Importantly, participant characteristics, referred to as individual differences, can further be used to explain any differences or changes observed in a longitudinal study. For this reason, many researchers who use this design will record additional measures at Time 1/Time 2 so that they can control for these factors prior to evaluating differences over time. For example, Vrangalova (2015) recorded a variety of participant characteristics, such as demographic background, personality traits, and prior casual and romantic sex (prior to the start of the study), to ensure that such factors could be controlled for (i.e., identified or eliminated as possible reasons or explanations for the results), prior to evaluating the differences described in their study. Measuring participant characteristics, then, is a practical way to control for factors that you anticipate may influence differences over time in a longitudinal study.

Figure 9.10 ⦁ The Longitudinal Design

Based on a design used by Vrangalova (2015). The structure of the longitudinal design is to observe the same participants across time.

Age is the quasi-independent variable using a developmental research design.

Cross-Sectional Design

An alternative developmental design that does not require observing the same participants over time is the cross-sectional design. Using this design, the researcher observes a cross-section of participants who are grouped based on their age. The cross-sectional design is similar to a nonequivalent control group quasi-experimental design in that the different age groups act as nonequivalent control groups. Each age group is called a cohort, which is any group of individuals who share common statistical traits or characteristics, or experiences within a defined period. For example, a cohort could be a group of people who were born in the same year, served in the same war, or attended the same school. For developmental research, cohorts in a cross-sectional analysis are related in terms of when participants were born.

A cross-sectional design is a developmental research design in which participants are grouped by their age and participant characteristics are measured in each age group.

A cohort is a group of individuals who share common statistical traits or characteristics, or experiences within a defined period.

To illustrate the cross-sectional design, we will look at the research example illustrated in Figure 9.11. Phillips (2008) selected a sample of 99 community college students and 320 middle school and high school students. Each group represented a different age group or cohort. The researcher measured the identity style of students in each cohort (community college vs. middle school and high school) using the Identity Style Inventory Revised for a Sixth-Grade Reading Level (ISI-6G; White, Wampler, & Winn, 1998). Results showed that the identity style of a student is different for precollege and college-aged cohorts.

The advantage of a cross-sectional design is that participants are observed one time in each cohort. Observing participants one time eliminates many threats to internal validity associated with observing participants over time. Factors such as attrition, testing effects, and regression toward the mean are typically not a concern when participants are observed only one time.


Figure 9.11 ⦁ The Cross-Sectional Design

Based on a design used by Phillips (2008). Notice that participants are grouped based on their age using the cross-sectional design.

However, a disadvantage of the cross-sectional design is the possibility of cohort effects (or generation effects), which occur when preexisting differences between members of different cohorts can explain an observed result. For example, suppose we use a cross-sectional design to measure how often 20-year-olds, 40-year-olds, and 80-year-olds send text messages. In this example, we are likely to find that texting decreases with age. However, there is also a cohort effect due to the generational gap in advances in technology. An 80-year-old participant was raised when cell phones, and therefore texting, did not yet exist. This cohort effect of differences in experience or familiarity with texting across the life span can alternatively explain why texting appears to decrease with age, without appealing to age as the primary explanation. For this reason, researchers must be careful to consider any possible cohort effects before conducting a cross-sectional study.

Table 9.2 summarizes the two developmental research designs described here. These two research designs, the longitudinal design and the cross-sectional design, can also be used together, as is described next.

A cohort effect, or a generation effect, is a threat to internal validity in which differences in the characteristics of participants in different cohorts or age groups confound or alternatively explain an observed result.

Cohort-Sequential Design

To combine the advantages of longitudinal and cross-sectional developmental research designs, we can use a cohort-sequential design. Using the cohort-sequential design, two or more cohorts are observed at different points in time (cross-sectional design) and over time (longitudinal design). Figure 9.12 illustrates this design when three cohorts are observed, with each cohort also observed over time. Note that this design requires only that the longitudinal observations overlap across the cohorts. With only two cohorts observed, it is also common for some of the same participants to be represented in each cohort, as described in a research example later in this section.

A cohort-sequential design is a developmental research design that combines longitudinal and cross-sectional techniques by observing different cohorts of participants over time at overlapping times.

Figure 9.12 ⦁ The Cohort-Sequential Design

In this example of a cohort-sequential design, three cohorts of participants born as part of Generation X (oldest cohort), Millennials, or Generation Z (youngest cohort) are observed on some measure over time. The shaded boxes indicate when each group was observed. In this example, each cohort was observed twice, and the times of longitudinal observations overlapped.

Table 9.2 ⦁ Potential Limitations of the Longitudinal and Cross-Sectional Research Designs

Potential limitations are listed for each developmental research design (longitudinal and cross-sectional).

Threats to internal validity

History and maturation?
Longitudinal: Yes, because participants are observed more than one time, and the design lacks a control group.
Cross-sectional: Possibly, because the control groups (by age) are nonequivalent.

Regression and testing effects?
Longitudinal: Yes, because participants are observed more than one time.
Cross-sectional: No, because participants are observed only one time.

Heterogeneous attrition?
Longitudinal: Yes, because participants are observed more than one time.
Cross-sectional: Possibly, but not likely because participants are observed only one time.

Cohort effects?
Longitudinal: No, because participants from the same cohort are observed over time.
Cross-sectional: Yes, because participants are grouped based on their age, which is a cohort.

Additional potential limitations

Time-consuming?
Longitudinal: Yes, studies can range from months to years in length.
Cross-sectional: No, a cross-section of the life span is observed at one time.

Costly/expensive?
Longitudinal: Yes, keeping track of participants costs time, recruitment, and money.
Cross-sectional: Possibly, but this design is typically less costly/expensive than a longitudinal study.

As an example of how the cohort-sequential design can be applied when the same participants are represented in each cohort, Pate et al. (2009) measured age-related changes in physical activity among adolescent girls. In their study, physical activity was measured in sixth-grade girls, and physical activity was again measured 2 years later when the girls were in eighth grade. Part of their sample was longitudinal in that the same girls from sixth grade were sampled again when they were in eighth grade. Also, by chance, some girls were sampled only one time because some sixth-grade girls did not participate in eighth grade and some eighth-grade girls included in the study did not participate when they were in sixth grade. The advantage of using this cohort-sequential design is that researchers can do the following:

Account for threats to internal validity associated with observing participants over time because part of the sample is a cross-section of age groups.

Account for cohort effects because part of the sample includes the same participants observed over time in each age group or cohort.

9.6 Ethics in Focus

Development and Aging

Ethical concerns related to age are often focused on those who are very young and those who are very old. For younger participants, researchers must obtain consent from a parent, caregiver, or legal guardian to study minors, who are children under the age of 18 years. On the other extreme, older individuals require special permissions particularly when they are deemed no longer functionally or legally capable. Additional concerns also arise for the ethical treatment of clinical populations, such as those suffering trauma or disease at any stage of development. In all, you should follow three rules to ensure that such groups or cohorts are treated in an ethical manner:

Obtain assent when necessary. In other words, ensure that informed consent is obtained from the participant only after all possible risks and benefits have been clearly identified.

Obtain permission from a parent, caregiver, legal guardian, or another legally capable individual, such as a medical professional, when a participant is a minor or when a participant is functionally or legally incapable of providing consent.

Clearly show that the benefits of a study outweigh the costs. For any group that is studied, that group (younger, older, or incapable) should specifically benefit from participating in the research with minimal costs.

Learning Check 2 ✓

  1. State the developmental research design that is described by each of the following phrases:
     A. Observing participants over time
     B. Observing groups at one time only
     C. Prone to testing effects
     D. Prone to cohort effects
  2. A __________ is a group of individuals who share common statistical or demographic characteristics.

Answers:

  1. A. Longitudinal, B. Cross-sectional, C. Longitudinal, D. Cross-sectional; 2. cohort.

SINGLE-CASE EXPERIMENTAL DESIGNS

In this section, we begin by identifying a new research design to test the following research hypothesis: Giving encouragement to students who are at risk of dropping out of school will keep them on task in the classroom. To test this hypothesis, we could measure the time (in minutes) that an at-risk student stays on task. We could observe the student for a few days with no encouragement. Then we could observe the student for a few days with encouragement given as they work on the task. We could then again observe the student for a few more days with no encouragement. If the hypothesis is correct and we set up this study correctly, then we should expect to find that the time (in minutes) spent on task was high when the encouragement was given but low during the observation periods before and after when no encouragement was given. The unique feature of this design is that only one participant was observed.

In this final section, we introduce the research design that was illustrated here: the single-case experimental design.

9.7 AN OVERVIEW OF SINGLE-CASE DESIGNS

In some cases, often in areas of applied psychology, medicine, and education, researchers want to observe and analyze the behavior of a single participant using a research design called the single-case experimental design. A single-case design is unique in that a single participant serves as their own control; multiple participants can also be observed as long as each individual serves as their own control (Antia, Guardino, & Cannon, 2017; Kazdin, 2016). In addition, the dependent variable measured in a single-case design is analyzed for each individual participant and is not averaged across groups or across participants. By contrast, all other experimental research designs, introduced in Chapters 10 through 12, are grouped designs.

A single-case experimental design is an experimental research design in which a participant serves as their own control and the dependent variable measured is analyzed for each individual participant and is not averaged across groups or across participants.

For a single-case design to be an experimental design, it must meet the following three key elements of control required to draw cause-and-effect conclusions:

Randomization (random assignment). Using single-case designs, each participant can be randomly assigned to experience many phases or treatments controlled by the researcher.

Manipulation (of variables that operate in an experiment). The researcher must manipulate the phases or treatments that are experienced by each participant such that the factor or independent variable is not preexisting.

Comparison/control group. Each participant acts as their own control or comparison. For the single-case designs described here, comparisons can be made across multiple baseline phases (reversal design), participants (multiple-baseline design), or treatments (changing-criterion design).

An advantage of analyzing the data one participant at a time is that it allows for the critical analysis of each individual measure, whereas averaging scores across groups can give a spurious appearance of orderly change. To illustrate this advantage, suppose that a researcher measures the body weight in grams of four rat subjects before and after an injection of a drug believed to cause weight loss. The hypothetical data, provided in Table 9.3, show that rat subjects as a group lost 25 grams on average. However, Rat C actually gained weight following the injection. An analysis of each individual rat could be used to explain this outlier; a grouped design would often disregard this outlier as “error” so long as weight loss was large enough on average.

The single-case design, which is also called the single-subject, single-participant, or small n design, is most often used in applied areas of psychology, medicine, and education.

Table 9.3 ⦁ The Value of an Individual Analysis

Subject    Baseline weight (g)    Weight following drug treatment (g)    Weight loss (g)
Rat A      320                    305                                    15
Rat B      310                    280                                    30
Rat C      290                    295                                    −5
Rat D      360                    300                                    60
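The arithmetic behind Table 9.3 can be checked with a short sketch. The weights are taken from the table; the variable names are illustrative only. The code computes each rat's weight loss, the group average, and the subjects whose individual result runs against the group pattern.

```python
# Weights in grams from Table 9.3: (baseline, after drug treatment).
weights = {
    "Rat A": (320, 305),
    "Rat B": (310, 280),
    "Rat C": (290, 295),
    "Rat D": (360, 300),
}

# Weight loss per rat: baseline weight minus weight following treatment.
loss = {rat: before - after for rat, (before, after) in weights.items()}

# Group-level summary: a single average across all four rats.
mean_loss = sum(loss.values()) / len(loss)

# Individual-level check: rats with a negative loss actually gained weight.
gained = [rat for rat, grams in loss.items() if grams < 0]

print(loss)       # per-rat weight loss in grams
print(mean_loss)  # group average weight loss in grams
print(gained)     # rats that gained weight
```

The group average is 25 grams, yet the individual analysis flags Rat C, whose weight increased by 5 grams. A grouped analysis reports only the average, so this reversal would not be visible.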

9.8 SINGLE-CASE BASELINE-PHASE DESIGNS

Single-case designs are typically structured by alternating baseline and treatment phases over many trials or observations. In this major section, we introduce three types of single-case experimental research designs:

Reversal design

Multiple-baseline design

Changing-criterion design

Reversal Design

One type of single-case design, called the reversal design (or ABA design), involves observing a single participant prior to (A), during (B), and following (A) a treatment or manipulation. The reversal design is structured into phases, represented alphabetically with an A or a B. Each phase consists of many observations or trials. The researcher begins with a baseline phase (A), in which no treatment is given, then applies a treatment in a second phase (B), and again returns to a baseline phase (A) in which the treatment is removed. This type of research design can be represented as follows:

A (baseline phase) → B (treatment phase) → A (baseline phase)

A reversal design, or ABA design, is a single-case experimental design in which a single participant is observed before (A), during (B), and after (A) a treatment or manipulation.

A phase is a series of trials or observations made in one condition.

The baseline phase (A) is a phase in which a treatment or manipulation is absent.

If the treatment in Phase B causes a change in the dependent variable, then the dependent variable should change from baseline to treatment, then return to baseline levels when the treatment is removed. For example, we opened this section with the hypothesis that giving encouragement to students who are at risk of dropping out of school will keep them on task in the classroom. To test this hypothesis, we measured the time in minutes that an “at-risk” student spent on task in a class with no encouragement (baseline, A) for a few trials, then with encouragement (treatment, B) for a few trials, and again with no encouragement (baseline, A) for a few more trials. If the encouragement (the treatment) was successful, then the time (in minutes) spent on task would be higher when the encouragement was given but lower during the observation periods before and after when no encouragement was given. The second baseline phase minimizes the possibility of threats to internal validity. Adding another B and A phase would further minimize the possibility of threats to internal validity because the pattern of change would be repeated using multiple treatment phases.

A visual inspection of the data, and not inferential statistics, is used to analyze the data when only a single participant is observed. To analyze the data in this way, we look for two types of patterns that indicate that a treatment caused an observed change, as illustrated in Figure 9.13:

A change in level is displayed graphically, as shown in Figure 9.13 (top graph), when the levels of the dependent variable in the baseline phases are obviously less than or greater than the levels of the dependent variable in the treatment phase.

A change in trend is displayed graphically, as shown in Figure 9.13 (bottom graph), when the direction or pattern of change in the baseline phases is different from the pattern of change in the treatment phase. In the typical case, a dependent variable gradually increases or decreases in the treatment phase but is stable or does not change in the baseline phases.
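As a rough numerical companion to the visual inspection, the two patterns can be summarized for an ABA design as follows. This is only a sketch: the minutes-on-task values are hypothetical and not taken from any study, and the helper `slope` is an illustrative function defined here. A change in level shows up as a difference in phase means; a change in trend shows up as a difference in the slope of the observations within each phase.

```python
from statistics import mean

def slope(values):
    """Least-squares slope of values against trial number 0, 1, 2, ..."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den

# Hypothetical minutes on task per trial (assumed values for illustration).
level_example = {
    "A1": [10, 11, 10, 9],   # first baseline phase
    "B": [20, 21, 19, 20],   # treatment phase
    "A2": [10, 9, 11, 10],   # second baseline phase
}
trend_example = {
    "A1": [10, 10, 10, 10],
    "B": [12, 14, 16, 18],
    "A2": [10, 10, 10, 10],
}

# Change in level: compare the mean of each phase.
level_means = {phase: mean(v) for phase, v in level_example.items()}

# Change in trend: compare the within-phase slope of each phase.
trend_slopes = {phase: slope(v) for phase, v in trend_example.items()}

print(level_means)
print(trend_slopes)
```

In the first data set the treatment phase sits at a clearly higher level than both baseline phases. In the second, the baseline phases are flat while the treatment phase rises by 2 minutes per trial.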

The reversal design is typically conducted in applied areas of research to investigate possible solutions that can benefit individuals or society. For this reason, one advantage of the design is that it can be used to apply treatments that are beneficial to participants. Often this means that researchers will be asked by ethics committees to end their study with a treatment phase (B), which was the phase that was beneficial to the participant. For this reason, many reversal designs are at least four phases, or ABAB, so as not to return to baseline to end an experiment.

A limitation of the reversal design is that the change in a dependent variable in a treatment phase must return to baseline levels when the treatment is removed. However, in many areas of research, such as studies on learning, a return to baseline is not possible. When a participant is taught a new skill, for example, it is often not possible to undo what the participant learned; as is fully expected, the behavior will not return to baseline. In these situations, when it is not possible for changes in a dependent variable to return to baseline, a reversal design cannot be used.

Figure 9.13 ⦁ Two Ways to Identify if a Treatment Caused Changes in a Dependent Variable

A change in level (top graph) and a change in trend (bottom graph) make it possible to infer that some treatment is causing an effect or a change in behavior.

Source: Republished with permission of John Wiley and Sons Inc, from Enhancing capacity to make sexuality-related decisions in people with an intellectual disability. Dukes, E. & McGuire, B. E., Journal of Intellectual Disability Research, 53 (8), 2009; permission conveyed through Copyright Clearance Center, Inc.

Multiple-Baseline Design

For situations in which it is not possible for changes in a dependent variable to return to baseline levels following a treatment phase, researchers can use the multiple-baseline design. The multiple-baseline design is a single-case design in which the treatment is successively administered over time to different participants, for different behaviors, or in different settings. This design allows researchers to systematically observe changes caused by a treatment without the need of a second baseline phase and can be represented as follows:

A multiple-baseline design is a single-case experimental design in which a treatment is successively administered over time to different participants, for different behaviors, or in different settings.

By representing the multiple-baseline design in this way, a case refers to a unique time, behavior, participant, or setting. Baseline periods are extended in some cases prior to giving a treatment. If the treatment causes an effect following a baseline phase for each case, then the change in level or pattern should begin only when the baseline phase ends, which is different for each case. If this occurs, then we can be confident that the treatment is causing the observed change. This design minimizes the likelihood that something other than the treatment is causing the observed changes if the changes in a dependent variable begin only after the baseline phase ends for each case.

To illustrate the multiple-baseline design, we will look at the research example illustrated in Figure 9.14. Dukes and McGuire (2009) used a multiple-baseline design to measure the effectiveness of a sex education intervention, which they administered to multiple participants with a moderate intellectual disability. The researchers recorded participant knowledge of sexual functioning using the Sexual Consent and Education Assessment (SCEA K-Scale; Kennedy, 1993), on which higher scores indicate greater ability to make decisions about sex. Each participant was given a baseline phase for a different number of weeks. Scores on the SCEA K-Scale were low in this baseline phase. As shown in Figure 9.14 for three participants, only after the baseline period ended and the intervention was administered did scores on the scale increase. Scores also remained high for 4 weeks after the program ended. Hence, the results showed a change in level from baseline to intervention for each participant.

Each participant in the sex education study received the intervention (or the treatment) in successive weeks: Tina (Week 11), Josh (Week 12), and Debbie (Week 13). Because the treatment was administered at different times, and changes in the dependent variable only occurred once the treatment was administered, the pattern showed that the treatment, and not other factors related to observing participants over time, caused the observed changes in SCEA K-Scale scores.
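The staggered logic of this example can be sketched in a few lines. The start weeks (11, 12, and 13) come from the example above; the weekly scores are hypothetical placeholders, NOT the published data, and `first_change_week` is an illustrative helper. The sketch checks whether each participant's scores first rise above their own baseline in the same week their intervention began.

```python
# Week in which each participant's intervention began (from the example above).
start_week = {"Tina": 11, "Josh": 12, "Debbie": 13}

# Hypothetical weekly scores for Weeks 9-16; these are NOT the published data.
weeks = list(range(9, 17))
scores = {
    "Tina":   [3, 3, 8, 9, 9, 9, 9, 9],
    "Josh":   [2, 2, 2, 7, 8, 8, 8, 8],
    "Debbie": [4, 4, 4, 4, 9, 9, 9, 9],
}

def first_change_week(name):
    """First week whose score exceeds every score in the participant's baseline phase."""
    baseline = [s for w, s in zip(weeks, scores[name]) if w < start_week[name]]
    ceiling = max(baseline)
    for w, s in zip(weeks, scores[name]):
        if s > ceiling:
            return w
    return None  # scores never rose above baseline

change_week = {name: first_change_week(name) for name in scores}

# The pattern supports a treatment effect when each change coincides with
# that participant's own treatment onset rather than with a shared calendar week.
staggered = all(change_week[n] == start_week[n] for n in scores)

print(change_week)
print(staggered)
```

If some outside event had caused the change, the scores of all three participants would be expected to shift in the same week, and `change_week` would not line up with the staggered start weeks.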


Figure 9.14 ⦁ Results from a Multiple-Baseline Design for Three Participants Receiving a Sex Education Intervention

Source: Republished with permission of John Wiley and Sons Inc, from Enhancing capacity to make sexuality-related decisions in people with an intellectual disability. Dukes, E. & McGuire, B. E., Journal of Intellectual Disability Research, 53 (8), 2009; permission conveyed through Copyright Clearance Center, Inc.

The advantage of a multiple-baseline design is that it can be used when we expect that a dependent variable will not return to baseline after treatment, such as when we study learning on some measure, as illustrated in Figure 9.14 for our example. The limitation of a multiple-baseline design is that the design is used when only a single type of treatment is administered. This same limitation applies to the reversal design. For situations when we want to administer successive treatments, then, we require a different type of single-case experimental design.

A and B indicate the phases in a reversal design.

The length of the baseline phase is varied using a multiple-baseline design.

Changing-Criterion Design

For research situations in which we want to change a criterion or treatment after the participant meets an initial criterion or responds to one particular treatment, we can use a changing-criterion design. Using the changing-criterion design, we begin with a baseline phase, which is followed by many successive treatment phases to determine if participants can reach different levels or criteria in each treatment phase. The criterion can be changed as often as necessary or until some final criterion is met. For a three-treatment study, the changing-criterion design can be represented as follows:

A changing-criterion design is a single-case experimental design in which a baseline phase is followed by successive treatment phases in which some criterion or target level of behavior is changed from one treatment phase to the next. The participant must meet the criterion of one treatment phase before the next treatment phase is administered.

To illustrate the changing-criterion design, we will look at the research example illustrated in Figure 9.15. Gentry and Luiselli (2008) used the changing-criterion design to increase the number of bites that Sam, a fictitious name for the 4-year-old boy being observed, would take of a nonpreferred food (i.e., a food he did not like) during a supper meal. In a baseline phase, Sam ate the food with no manipulation. Then a series of manipulations followed. Sam was instructed to spin an arrow that would fall on a number indicating the number of bites of a nonpreferred food that Sam would need to consume during supper to gain a reward, which in this study was his favorite play activity. The initial criterion was a spinner with a 1 and a 2 on it. This criterion was increased over time, until the options on the spinner were 5 and 6 (bites) to meet the criterion to gain a reward. As shown in Figure 9.15, each time the criterion, or the number of bites required to gain a reward, was increased, Sam’s eating behavior correspondingly increased.

Two advantages of the changing-criterion design are that it does not require a reversal to baseline of an otherwise effective treatment and that it enables experimental analysis of a gradually improving behavior. A limitation of the design is that the target behavior must already be in the participant’s repertoire. For example, the number of bites of food is well within the abilities of a healthy child. In addition, researchers should be cautious to not increase or decrease the criterion too soon or by too much, which may impede the natural learning rate of the participant being observed.
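The logic of a changing-criterion analysis can be sketched as follows. The criterion ranges are those listed for Figure 9.15; the bite counts are hypothetical placeholders, NOT the data reported by Gentry and Luiselli (2008). The rule in `phase_met` is a simplifying assumption made for this sketch: a phase counts as met when every meal reaches at least the lowest value on the spinner.

```python
# Criterion ranges (bites) for each treatment phase, as listed for Figure 9.15.
criteria = [(1, 2), (2, 3), (3, 5), (4, 6), (5, 6)]

# Hypothetical bites per meal in each phase; NOT the data reported in the study.
bites = [
    [1, 2, 2],
    [2, 3, 3],
    [3, 4, 5],
    [4, 5, 6],
    [5, 6, 6],
]

def phase_met(low, observed):
    """Assumed rule: every meal in the phase reaches at least the lowest spinner value."""
    return all(b >= low for b in observed)

# Did the participant meet the criterion in each successive phase?
met = [phase_met(low, obs) for (low, _), obs in zip(criteria, bites)]

# Does the average behavior step upward each time the criterion is raised?
phase_means = [sum(obs) / len(obs) for obs in bites]
tracks_criterion = all(a < b for a, b in zip(phase_means, phase_means[1:]))

print(met)
print(tracks_criterion)
```

Evidence for a treatment effect in this design comes from the behavior moving in step with each change in criterion, which is what `tracks_criterion` summarizes for these placeholder values.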

Each successive treatment phase in a changing-criterion design is associated with a change in criterion.


Figure 9.15 ⦁ A Changing-Criterion Design to Increase the Number of Bites of Nonpreferred Food for a Single Child (Sam)

At baseline, Sam ate no bites, and then Sam spun an arrow that displayed different criteria for a reward. He began with 1–2 bites, then 2–3 bites, 3–5 bites, 4–6 bites, and finally 5–6 bites in order to receive the reward. The changing criterion is highlighted in each treatment phase. Notice that as the criterion was increased, the number of bites Sam took of nonpreferred food increased as well. Data based on those presented by Gentry and Luiselli (2008).

Learning Check 3 ✓

  1. Why is the single-case design regarded as an experimental research design?
  2. Identify whether each of the following is an example of a reversal design, a multiple-baseline design, or a changing-criterion design:
     A. A researcher gives a child successively greater levels of positive reinforcement after an initial baseline phase to reduce how often the child bites their nails. The successive treatments are administered until the child has reached a level where they are no longer biting their nails.
     B. A researcher records the duration of time a participant stays on task in a dance recital 4 days before, 4 days during, and 4 days after a behavioral intervention strategy is implemented.
     C. A researcher records the quality of artistic strokes made by three participants. Each participant was given a treatment phase after 3, 4, or 5 days of a baseline phase; no baseline phase was given after the treatment was administered.
  3. For a single-case experimental study, why would a researcher use a multiple-baseline design instead of a reversal design?

Answers:

  1. Because it meets the three key elements of control required to demonstrate cause and effect: randomization, manipulation, and comparison; 2. A. Changing-criterion design, B. Reversal design, C. Multiple-baseline design; 3. A multiple-baseline design would be used when it is not possible for changes in a dependent variable to return to baseline.

9.9 VALIDITY, STABILITY, MAGNITUDE, AND GENERALITY

The analysis of single-case experimental research designs is based largely on a visual inspection of the data in a graph and is not based on statistical analyses that require data to be grouped across multiple participants or groups. The specific visual features in a graph that indicate the validity of an observation are described in this section.

Internal Validity, Stability, and Magnitude

Recall from Chapter 6 that internal validity is the extent to which we can demonstrate that a manipulation or treatment causes a change in a dependent measure. Importantly, the extent to which we establish experimental control of all other possible causes is directly related to the internal validity of a research study. The greater the control we establish, the higher the internal validity.

A single-case design requires a visual analysis of the graphical data of a single participant. The level of control and therefore the internal validity of a single-case design can be determined when the following two features are observed in a graph using this type of analysis:

  1. The stability in the pattern of change across phases
  2. The magnitude or size of the change across phases

Stability is the consistency in the pattern of change in a dependent measure in each phase of a design. The more stable or consistent changes in a dependent measure are in each phase, the higher the internal validity of a research design.

In a visual inspection of a graph, the stability of a measure is indicated by the consistency in the pattern of change in each phase. The stability of a dependent measure is illustrated in Figure 9.16. Data in a given phase can show a stable level (as in Figure 9.16a), can show a stable trend (as in Figure 9.16b), or can be unstable (as in Figure 9.16c). The stability of a measure in each phase is important because when a measure is unstable, changes are occurring in a dependent variable even when the researcher is not manipulating the behavior. When a dependent measure is stable, we can be confident that any changes in level or trend were caused by the manipulation, because changes only occurred between each phase and were otherwise stable or consistent within each phase. Therefore, the more stable a measure, the greater the control and the higher the internal validity in an experiment.


Figure 9.16 ⦁ A Stable Level (a), a Stable Trend (b), and an Unstable Response (c)

Graphs (a) and (b) show a response that indicates high internal validity, whereas graph (c) indicates low internal validity.

Another level of control can be demonstrated by the magnitude of change, which is the size of the change in a dependent measure observed between phases. When a measure is stable within each phase, we look at the magnitude of changes between phases. For a treatment to be causing changes in a dependent measure, we should observe immediate changes as soon as the treatment phase is administered. We can observe an immediate change in level (as shown in Figure 9.17a), or we can observe an immediate change in trend (as shown in Figure 9.17b). The greater the magnitude of changes between phases, the greater the control and the higher the internal validity in a single-case experiment.

Magnitude is the size of the change in a dependent measure observed between phases of a design. The larger the magnitude of changes in a dependent measure between phases, the higher the internal validity of a research design.

Internal validity is related to the stability and magnitude of change across phases in a single-case design.
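Stability and magnitude are judged visually, but a simple numerical sketch can make the two ideas concrete. The values below are hypothetical and chosen only for illustration. Here the spread of scores within a phase (population standard deviation) stands in for stability, and the difference between phase means stands in for magnitude; both are assumptions of this sketch, not measures prescribed by the text.

```python
from statistics import mean, pstdev

# Hypothetical observations per trial (assumed values for illustration).
baseline = [10, 11, 10, 9, 10]   # stable level
treatment = [20, 21, 19, 20, 20]  # stable level after treatment begins
unstable = [5, 18, 9, 22, 6]      # unstable response

def describe(phase):
    """Summarize one phase: its level (mean) and its within-phase spread."""
    return {"mean": mean(phase), "spread": pstdev(phase)}

# Stability: a smaller within-phase spread indicates a more stable measure.
stable_summary = describe(baseline)
unstable_summary = describe(unstable)

# Magnitude: size of the change in level between phases.
magnitude = mean(treatment) - mean(baseline)

print(stable_summary)
print(unstable_summary)
print(magnitude)
```

The stable baseline varies by less than one unit around its mean, so a 10-unit jump at the start of treatment stands out clearly. The same jump would be hard to attribute to a treatment if the baseline looked like the unstable series.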


Figure 9.17 ⦁ Internal Validity and Control

The graphs identify an immediate change in level (top row, a) or a change in trend (bottom row, b) that would indicate a high level of control and high internal validity.

External Validity and Generality

Recall from Chapter 6 that external validity is the extent to which observations generalize beyond the constraints of a study. A single-case design is typically associated with low population validity, which is a subcategory of external validity. In other words, it is not possible to know whether the results in the sample would also be observed in the population from which the sample was selected because single-case experimental designs are associated with very small sample sizes. However, the results in a single-case design can have high external validity in terms of generalizing across behaviors, across subjects or participants, and across settings. The following is an example of each way to generalize results to establish the external validity of a single-case experiment:

As an example of generalizing across behaviors, a psychotherapist may examine the extent to which causes of spousal abuse generalize to or also similarly cause child abuse. In this example, the therapist generalizes across behaviors, from spousal abuse (Behavior 1) to child abuse (Behavior 2).

As an example of generalizing across subjects or participants, an animal researcher may examine the generality of foraging behavior across multiple rat subjects, or a clinical researcher may examine the effectiveness of a behavioral therapy to improve symptoms of depression across multiple participants. In each case, the researcher is generalizing across multiple subjects or participants.

As an example of generalizing across settings, a child psychologist may want to determine the extent to which characteristics of child play behavior during recess generalize to characteristics of play behavior during class time. In this example, the researcher generalizes across settings, from child play behavior during recess (Setting 1) to child play behavior during class time (Setting 2).

External validity is related to the generality of findings in a single-case design.

9.10 Ethics in Focus

The Ethics of Innovation

Many single-case experiments look at early treatments for behavioral disorders or simply bad habits such as smoking or nail biting. When these types of behaviors are studied using a single-case design, the treatment is typically hypothesized to have benefits, such as reducing symptoms of the behavioral disorder or reducing the frequency of bad habits. Researchers will end an experiment with the treatment phase that was most beneficial, so as to maximize the benefits that participants receive. In a reversal design, this means that researchers end the study in a B phase (e.g., ABAB). A multiple-baseline design and a changing-criterion design already end in a treatment phase. Adding a treatment phase or otherwise adapting a single-case design is quite manageable for researchers because they observe only one or a few subjects or participants in a single-case experiment. Observing such a small sample size allows researchers the flexibility to make changes, such as when they add or omit treatments to maximize benefits to participants.

The flexibility of a single-case design also allows for greater “investigative play” (Hayes, 1981, p. 193) or greater freedom to ask innovative or new questions about treatments with unknown causes or with unknown costs or benefits. Single-case designs allow researchers to rigorously evaluate potential, yet untested, treatments with small samples; researchers can thus test a treatment without exposing large groups of participants to it, particularly when the potential costs of implementing the treatment are largely unknown or untested. In this way, single-case designs can be used as an initial research design for conducting some of the most innovative research in the behavioral sciences.

Learning Check 4 ✓

  1. Perform a visual inspection of the following data. Does the graph illustrate a study with high internal validity? Explain.

  2. A researcher uses a single-case design to record the number of minutes spent studying in a baseline phase and a calming music treatment phase with a student who studied in a library and the same student who studied in a college dormitory room. Based on this description, can the researcher generalize across behaviors, across participants, or across settings?
  3. Single-case designs allow for greater freedom to ask innovative or new questions about treatments with unknown causes or with unknown costs or benefits. Why can a single-case design be an ethically appropriate research design to test the effectiveness of such treatments?

Answers:

  1. Yes, because the data at baseline are stable, and there is a change in trend from baseline to treatment; 2. Generalize across settings; 3. Because single-case designs are used with small samples, thereby testing the treatment without exposing large groups of participants to it.

Chapter Summary

LO 1 Define and identify a quasi-experiment and a quasi-independent variable.

  1. A quasi-experimental research design is structured similarly to an experiment, except that this design lacks random assignment, includes a preexisting factor (i.e., a variable that is not manipulated), or does not include a comparison/control group.
  2. A quasi-independent variable is a preexisting variable that is often a characteristic inherent to an individual, which differentiates the groups or conditions being compared in a research study. Because the levels of the variable are preexisting, it is not possible to randomly assign participants to groups.

LO 2 Identify and describe two one-group quasi-experimental research designs: the posttest-only and pretest-posttest designs.

  1. The one-group posttest-only design is a quasi-experimental research design in which a dependent variable is measured for one group of participants following a treatment.
  2. The one-group pretest-posttest design is a quasi-experimental research design in which the same dependent variable is measured in one group of participants before and after a treatment is administered.

LO 3 Identify and describe two nonequivalent control group quasi-experimental research designs: the posttest-only and pretest-posttest designs.

  1. A nonequivalent control group is a control group that is matched upon certain preexisting characteristics similar to those observed in a treatment group, but to which participants are not randomly assigned. When a nonequivalent control group is used, selection differences can potentially explain an observed difference between an experimental and a nonequivalent control group.
  2. The nonequivalent control group posttest-only design is a quasi-experimental research design in which a dependent variable is measured following a treatment in one group and is compared with a nonequivalent control group that does not receive the treatment.
  3. The nonequivalent control group pretest-posttest design is a quasi-experimental research design in which a dependent variable is measured in one group of participants before (pretest) and after (posttest) a treatment, and that same dependent variable is also measured at pretest and posttest in a nonequivalent control group that does not receive the treatment.

LO 4 Identify and describe three time series quasi-experimental research designs: basic, interrupted, and control designs.

  1. The basic time series design is a quasi-experimental research design in which a dependent variable is measured at many different points in time in one group before and after a treatment that is manipulated by the researcher is administered.
  2. The interrupted time series design is a quasi-experimental research design in which a dependent variable is measured at many different points in time in one group before and after a treatment that naturally occurs.
  3. A control time series design is a basic or interrupted time series quasi-experimental research design that also includes a nonequivalent control group that is observed during the same period as a treatment group but does not receive the treatment.

LO 5 Identify and describe three developmental quasi-experimental research designs: longitudinal, cross-sectional, and cohort-sequential designs.

  1. A longitudinal design is a developmental research design used to study changes across the life span by observing the same participants over time and measuring the same dependent variable each time.
  2. A cross-sectional design is a developmental research design in which participants are grouped by their age and participant characteristics are measured in each age group. Each age group is a cohort, so this design is prone to cohort effects, which occur when unique characteristics in each cohort can potentially explain an observed difference between groups.
  3. A cohort-sequential design is a developmental research design that combines longitudinal and cross-sectional techniques by observing different cohorts of participants over time at overlapping times.

LO 6 Define the single-case experimental design.

The single-case experimental design is an experimental research design in which a participant serves as their own control and the dependent variable measured is analyzed for each individual participant and is not averaged across groups or across participants. This design meets the three requirements to demonstrate cause and effect: randomization, manipulation, and comparison/control.

LO 7 Identify and describe three types of single-case research designs: the reversal, multiple-baseline, and changing-criterion designs.

  1. The reversal design is a single-case experimental design in which a single participant is observed before (A), during (B), and after (A) a treatment or manipulation.
  2. The multiple-baseline design is a single-case experimental design in which a treatment is successively administered over time to different participants, for different behaviors, or in different settings.
  3. The changing-criterion design is a single-case experimental design in which a baseline phase is followed by successive treatment phases in which some criterion or target level of behavior is changed from one treatment phase to the next. The participant must meet the criterion of one treatment phase before the next treatment phase is administered.
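The rule that a participant must meet the criterion of one treatment phase before the next phase begins can be sketched in a few lines of code. This is an illustration under stated assumptions, not a procedure from the chapter: the session data and criteria are hypothetical, lower counts are assumed to be better (e.g., cigarettes smoked per day), and a single session at or below the target is treated as meeting the criterion. In practice, a researcher would likely require stable responding over several sessions before moving on.

```python
def run_changing_criterion(observations, criteria):
    """Assign each observation to a treatment phase.

    observations: per-session counts of the target behavior (hypothetical data).
    criteria: target levels, one per treatment phase, in the order administered.
    Assumption: lower is better, so a session meets the criterion when its
    count is at or below the current target.
    """
    phase = 0
    log = []
    for value in observations:
        met = value <= criteria[phase]
        log.append((phase + 1, criteria[phase], value, met))
        # Advance to the next phase only after the current criterion is met.
        if met and phase < len(criteria) - 1:
            phase += 1
    return log


# Hypothetical sessions with three successively stricter criteria.
sessions = [18, 16, 15, 14, 12, 10, 11, 9, 5]
log = run_changing_criterion(sessions, criteria=[15, 10, 5])
for phase, target, value, met in log:
    print(f"phase {phase} (criterion <= {target}): observed {value}, met={met}")
```

Each row of the log shows which criterion was in force for a session, so the successive treatment phases can be read directly from the output.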

LO 8 Identify in a graph the stability and magnitude of a dependent measure and explain how each is related to the internal validity of a single-case design.

  1. The stability of a measure is the consistency in the pattern of change in a dependent measure in each phase of a design. The more stable or consistent changes in a dependent measure are in each phase, the higher the internal validity of a research design.
  2. The magnitude of change in a measure is the size of the change in a dependent measure observed between phases of a design. A measure can have a change in level or a change in trend. The larger the magnitude of change, the greater the internal validity of a research design.
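Stability and magnitude are judged in this chapter by visual inspection of a graph, but a rough numerical summary can make the ideas concrete. The sketch below is an illustration only: the data are hypothetical, and the choice of statistics is an assumption made here (the phase mean for level, the standard deviation for stability, and a least-squares slope for trend), not a method prescribed by the chapter.

```python
from statistics import mean, pstdev


def slope(values):
    """Least-squares slope of values against session index 0, 1, 2, ..."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def summarize(phase):
    """Level (mean), spread (standard deviation), and trend (slope) of one phase."""
    return {"level": mean(phase), "spread": pstdev(phase), "trend": slope(phase)}


# Hypothetical data: a stable baseline followed by a declining treatment phase.
baseline = [10, 11, 10, 9, 10]
treatment = [6, 5, 4, 3, 2]

a = summarize(baseline)
b = summarize(treatment)
change_in_level = b["level"] - a["level"]  # difference between phase means
change_in_trend = b["trend"] - a["trend"]  # difference between phase slopes

print("baseline:", a)
print("treatment:", b)
print("change in level:", change_in_level)
print("change in trend:", change_in_trend)
```

A small spread within each phase corresponds to a stable measure, and a large difference in level or trend between phases corresponds to a large magnitude of change. These numbers supplement, rather than replace, inspection of the graph.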

LO 9 Identify three ways that researchers can strengthen the external validity of a result using a single-case design.

A single-case design is typically associated with low population validity (a subcategory of external validity). However, three ways that researchers can strengthen the external validity of a result using a single-case design are to generalize across behaviors, across subjects or participants, and across settings.

REVIEW QUESTIONS

  1. A quasi-experimental research design is structured similarly to an experiment, with what two exceptions?
  2. State whether each of the following factors is an example of an independent variable or a quasi-independent variable. Only state “quasi-independent variable” for participant variables that cannot be manipulated.
     a. The age of participants
     b. Time allotted for taking an exam
     c. A participant’s work experience
     d. Time of day a study is conducted
     e. A participant’s state of residence
     f. Amount of sugar added to a drink
  3. How does a one-group pretest-posttest design improve on the posttest-only quasi-experimental design? What is the major limitation of all one-group designs?
  4. What is a nonequivalent control group, and why does this type of group make it difficult to determine cause and effect using a nonequivalent control group quasi-experimental design?
  5. What is the key difference between the basic and interrupted time series quasi-experimental research designs?
  6. Name the developmental research design described in each of the following examples:
     a. A researcher measures job satisfaction in a sample of employees on their first day of work and again 1 year later.
     b. A researcher records the number of nightmares per week reported in a sample of 2-year-old, 4-year-old, and 8-year-old foster children.
  7. (A) Cohort effects are a threat to what type of validity? (B) Which developmental research design is most susceptible to cohort effects?
  8. Why is the single-case design regarded as an experimental research design?
  9. A reversal design is used to test the hypothesis that low lighting in a room reduces how quickly students read. As shown in Graph 1 for one student, a student reads passages of similar length in a room with normal lighting (baseline), then in the same room with dim lighting (treatment), and then again with normal lighting. Do the results shown in the figure support the hypothesis? Explain.
  10. What is the most likely reason that a researcher uses a multiple-baseline design instead of a reversal design?
  11. Define the changing-criterion design and explain when the design is used.
  12. Are the baseline data shown in Graph 2 stable? Do the baseline data in the figure indicate high or low internal validity?
  13. A researcher examines the generality of a behavioral treatment for overeating by testing the same treatment to treat overworking. In this example, is the researcher generalizing across behaviors, across participants, or across settings?
  14. A researcher examines whether a new learning system used in a classroom is also effective when used in a home (for homeschooled children). In this example, is the researcher generalizing across behaviors, across participants, or across settings?

ACTIVITIES

  1. Use an online database, such as PsycINFO, to search scientific research articles for any topic you are interested in. Perform two searches. In the first search, enter a search term related to your topic of interest, and enter the term longitudinal to find research that used this design in your area of interest. Select and print one article. In the second search, again enter a search term related to your topic of interest, and this time enter the term cross-sectional to find research that used this design in your area of interest. Again, select and print one article. Once your searches are complete, complete the following assignment:
     a. Write a summary of each article, and explain how each research design differed.
     b. Describe at least two potential threats to internal validity in each study.
     c. Include the full reference information for both articles at the end of the assignment.
  2. A researcher proposes that having a pet will improve health.
     a. Write a research plan to test this hypothesis using a single-case experimental design.
     b. What is the predicted outcome or pattern, if the hypothesis that having a pet will improve health were correct?
     c. Identify the extent to which your results demonstrate high or low internal validity.
     d. Graph the expected results.

Mrs F 80 year old Muslim woman admitted to ward

Consider the following case study:

An ethical dilemma

Mrs F is an 80 year old Muslim woman admitted to your ward. She has limited English and is accompanied by her husband (also with little English) and their 2 sons who speak English fluently.

Mrs F has advanced bladder cancer with urinary retention and a pain score of 6/10. She requires an indwelling catheter (IDC) to be inserted to relieve her pain and urinary retention. She has no other medical conditions. From handover you know that she has no EPOA or ACHD and has capacity.

Her doctor comes in and speaks to the sons who say that their culture means that there is no need to talk to their mother about the procedure and they can make the decision for her. The doctor agrees with them and orders an IDC on free drainage to be inserted.

When you come in to do her vital signs prior to the IDC being inserted and while her family are outside she tries to ask you what is happening, grabbing her lower abdomen and crying.

Q4 What would you do in this situation?

In answering the question you need to make an Ethical Decision using the following as they relate to the case study:

·      Code of Ethics (ICN 2012)

·      Informed decision making and Consent.

·      Cultural Competency

·      Ethical concepts and principles in nursing

–  Autonomy – “Self-determination” – consider Mrs F’s rights.

–  Beneficence – “above all, do good” – what is best in Mrs F’s interests?

–  Non-maleficence – “above all, do no harm” – what is in Mrs F’s interests?

–  Confidentiality – who should have Mrs F’s information?

–  Justice – “fairness” – what is fair for Mrs F?

–  Rights – what are Mrs F’s rights?

–  Veracity – “telling the truth” – what does this mean for Mrs F?

SOLUTION: Mrs F 80 year old Muslim woman admitted to ward

In this situation, several ethical principles and considerations come into play:

  1. Autonomy: Mrs. F has the right to make decisions about her own medical care, provided she has capacity. Despite the cultural beliefs of her sons, it is imperative to respect Mrs. F’s autonomy and involve her in the decision-making process to the extent possible.
  2. Beneficence: The primary goal of healthcare professionals is to do good for the patient. In Mrs. F’s case, relieving her pain and discomfort through the insertion of the indwelling catheter (IDC) is in her best interest from a medical perspective.
  3. Non-maleficence: Healthcare providers must strive to do no harm to their patients. In this context, withholding necessary medical treatment, such as the IDC insertion, could potentially harm Mrs. F by allowing her pain and urinary retention to persist.
  4. Confidentiality: Mrs. F’s medical information should be kept confidential and shared only with those involved in her care or with her explicit consent. While her sons may be involved in her care, Mrs. F should be given the opportunity to share her concerns and preferences privately with healthcare providers.
  5. Justice: Fairness requires that Mrs. F’s cultural background and beliefs be respected while also ensuring that she receives appropriate medical care that aligns with her best interests and preferences.
  6. Rights: Mrs. F has the right to receive adequate information about her medical condition and proposed treatments, to make decisions about her care, and to have her autonomy respected.
  7. Veracity: Healthcare providers have an ethical obligation to be truthful with their patients. Mrs. F should be provided with clear and honest information about her medical condition, the proposed IDC insertion, and the potential benefits and risks associated with the procedure.

Given these ethical considerations, the appropriate course of action would be to:

  • Respect Mrs. F’s autonomy by involving her in the decision-making process to the extent possible, ensuring that she understands the proposed procedure and its implications.
  • Provide Mrs. F with culturally sensitive and linguistically appropriate information about the IDC insertion, addressing any concerns or questions she may have.
  • Advocate for Mrs. F’s rights to receive appropriate medical care while also respecting her cultural beliefs and preferences.
  • Ensure that Mrs. F’s medical information is kept confidential and shared only with those directly involved in her care, including her sons if she consents to their involvement.
  • If Mrs. F is unable to provide informed consent due to language barriers or other factors, efforts should be made to facilitate communication through interpretation services or other means to ensure that her preferences and wishes are understood and respected. If necessary, involving an ethics committee or seeking legal guidance may be appropriate to resolve any conflicts between Mrs. F’s autonomy and her family’s cultural beliefs.