Tuesday, February 4, 2014

Choosing the Right Growth Measure

State education agencies and school districts are increasingly using measures based on student test-score growth in their systems for evaluating school and teacher performance. In many cases, these systems inform high-stakes decisions such as which schools to close and which teachers to retain. Performance metrics tied directly to student test-score growth are appealing because although schools and teachers differ dramatically in their effects on student achievement, researchers have had great difficulty linking these performance differences to characteristics that are easily observed and measured.

The question of how best to measure student test-score growth for the purpose of school and teacher evaluation has fueled lively debates nationwide.

This study examines three competing approaches to measuring growth in student achievement.

The first approach, based on aggregated student growth percentiles (SGPs), has been adopted for use in evaluation systems in several states. An SGP describes how a student’s performance on a standardized test compares to the performance of all students who received the same score in the previous year (or who have the same score history in cases with multiple years of data). For example, an SGP of 67 for a 4th-grade student would indicate that the student performed better than two-thirds of students with the same 3rd-grade score. An SGP of 25 would indicate that the student performed better than only one-quarter of students with the same 3rd-grade score.

To produce a growth measure for a district, school, or teacher, the SGPs for individual students are combined, usually by calculating the median SGP for all students in the relevant unit. The number of years of student-level data used to calculate median SGPs can vary. In our analysis, we use the median SGP of students enrolled in a given school over five years.
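As a rough illustration of the calculation described above, the sketch below computes a simplified SGP for each student and then the median SGP for each school. It is not the method used in the study: operational SGP systems typically estimate percentiles with quantile regression, while this version simply ranks each student among peers with exactly the same prior-year score. The function names, the record layout, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def student_growth_percentiles(records):
    """Simplified SGP: percent of same-prior-score peers who scored lower.

    records: list of (student_id, school, prior_score, current_score).
    The peer group includes the student, which is a simplifying assumption.
    """
    peers = defaultdict(list)
    for _, _, prior, current in records:
        peers[prior].append(current)

    sgps = {}
    for student_id, _, prior, current in records:
        group = peers[prior]
        below = sum(1 for score in group if score < current)
        sgps[student_id] = 100.0 * below / len(group)
    return sgps


def median_sgp_by_school(records, sgps):
    """Aggregate student SGPs to the school level using the median."""
    by_school = defaultdict(list)
    for student_id, school, _, _ in records:
        by_school[school].append(sgps[student_id])
    return {school: median(values) for school, values in by_school.items()}


# Hypothetical data: four students who all had the same prior-year score.
sample = [
    ("a", "X", 300, 310),
    ("b", "X", 300, 320),
    ("c", "Y", 300, 330),
    ("d", "Y", 300, 340),
]
sample_sgps = student_growth_percentiles(sample)
school_medians = median_sgp_by_school(sample, sample_sgps)
```

Note that nothing in this calculation refers to student demographics or school context. Only the prior score defines the comparison group.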

A key feature of the SGP approach is that it does not take into account student characteristics, such as race and poverty status, or schooling environments. Advocates of SGPs, and of “sparse” growth models more generally, view this as an advantage; they worry that methods that do take into account student or school-level demographic characteristics effectively set lower expectations for disadvantaged students. Critics of SGP-type metrics counter that not taking these differences into account may in fact penalize schools that serve disadvantaged students, which tend to have lower rates of test-score growth for reasons that may be at least partly out of their control.

A second approach, by far the most common among researchers studying school and teacher effects, is a one-step value-added model. Many versions of the value-added approach exist. The version we use takes into account student background characteristics and schooling environment factors, including students’ socioeconomic status (SES), while simultaneously calculating school-average student test-score growth. Specifically, we calculate growth for schools based on math scores while taking into account students’ prior performance in both math and communication arts; characteristics that include race, gender, free or reduced-price lunch eligibility (FRL), English-language-learner status, special education status, mobility status, and grade level; and school-wide averages of these student characteristics.
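A minimal sketch of the one-step idea follows, assuming a linear model in which current scores depend on a prior score, an FRL indicator, and one indicator per school, all estimated in a single regression. It is far sparser than the model described above (it omits the second prior test, the other student characteristics, and the school-wide averages), and the helper names and sample data are hypothetical. The least-squares solver is written out in plain Python so the example has no dependencies.

```python
def solve_least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def one_step_value_added(records):
    """records: list of (school, prior_score, frl, current_score).

    Student controls and school indicators enter the same regression, so
    school effects are estimated simultaneously with the adjustments.
    Effects are centered on the average school.
    """
    schools = sorted({r[0] for r in records})
    X = [[float(prior), float(frl)]
         + [1.0 if school == s else 0.0 for s in schools]
         for school, prior, frl, _ in records]
    y = [float(r[3]) for r in records]
    coefs = solve_least_squares(X, y)
    effects = coefs[2:]
    mean_effect = sum(effects) / len(effects)
    return {s: e - mean_effect for s, e in zip(schools, effects)}


# Hypothetical noise-free data generated from
# score = 10 + 0.5 * prior - 5 * frl + school effect (A: +2, B: -2).
mixed = [
    ("A", 100, 0, 62), ("A", 120, 1, 67), ("A", 140, 0, 82),
    ("B", 100, 1, 53), ("B", 120, 0, 68), ("B", 140, 1, 73),
]
one_step_effects = one_step_value_added(mixed)
```

Because both schools in this toy data enroll FRL and non-FRL students, the regression can separate the FRL adjustment from the school effects. If FRL status lined up perfectly with school membership, the two could not be distinguished and the solver would fail on a singular system.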

Researchers have gravitated toward the value-added approach because, under some assumptions, it provides accurate information on the causal effects of individual schools or individual teachers on student performance. But interpreting growth measures based on the one-step value-added approach in this way requires assuming that the available measures of student and school SES, and the specific methods used to adjust for differences in SES, are both adequate. If the measures are insufficient and the academic growth of disadvantaged students is lower than that of more advantaged students in ways not captured by the model, the one-step value-added approach will be biased in favor of high-SES schools at the expense of low-SES schools.

The third approach we consider is also based on value-added but is carried out in two steps instead of one in order to force comparisons between schools and teachers serving students with similar characteristics. In the first step, we measure the relationship between student achievement and student and school characteristics. In the second step, we calculate a growth measure for each school using test-score data that have been adjusted for student and school characteristics in the first step.

By design, this third approach fully adjusts student test scores for differences in student and school characteristics. In fact, it may overadjust for the role of such differences. For example, suppose that students eligible for free or reduced-price lunch attend schools that are truly inferior in quality, on average, to the schools attended by ineligible students. The average gap in school quality between these groups would be eliminated in the first step of the two-step value-added procedure, and thus would not carry over to the estimated growth measures. Consequently, it is important to interpret the results using this approach accurately, as they do not necessarily reflect differences in the causal effects of schools and teachers on student performance.
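The two steps, and the overadjustment scenario just described, can be sketched as follows. This is an illustrative simplification under stated assumptions, not the study's specification: the first step regresses scores on only a prior score and an FRL indicator, the second step averages the resulting residuals by school, and the function names and data are hypothetical.

```python
from collections import defaultdict


def solve_least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def two_step_growth(records):
    """records: list of (school, prior_score, frl, current_score)."""
    # Step 1: relate scores to student characteristics, with no school terms.
    X = [[1.0, float(prior), float(frl)] for _, prior, frl, _ in records]
    y = [float(r[3]) for r in records]
    coefs = solve_least_squares(X, y)

    # Step 2: average the adjusted scores (residuals) within each school.
    residuals = defaultdict(list)
    for (school, _, _, score), row in zip(records, X):
        fitted = sum(c * x for c, x in zip(coefs, row))
        residuals[school].append(score - fitted)
    return {s: sum(v) / len(v) for s, v in residuals.items()}


# Hypothetical data from score = 10 + 0.5 * prior - 5 * frl + school effect,
# where school A (+2) enrolls only non-FRL students and B (-2) only FRL.
segregated = [
    ("A", 100, 0, 62), ("A", 120, 0, 72), ("A", 140, 0, 82),
    ("B", 100, 1, 53), ("B", 120, 1, 63), ("B", 140, 1, 73),
]
segregated_growth = two_step_growth(segregated)

# Same score equation, but both schools enroll a mix of students.
mixed = [
    ("A", 100, 0, 62), ("A", 120, 1, 67), ("A", 140, 0, 82),
    ("B", 100, 1, 53), ("B", 120, 0, 68), ("B", 140, 1, 73),
]
mixed_growth = two_step_growth(mixed)
```

In the segregated data, the first step attributes the entire gap between the schools to FRL status, so both schools end up with growth measures of approximately zero even though school A was constructed to be more effective. In the mixed data the gap survives, although it is smaller than the built-in difference because part of it is still absorbed by the FRL adjustment.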

The researchers argue that the third approach is still the best choice for use in an evaluation system aimed at increasing student achievement.
