Monday, October 21, 2024

Stabilizing School Performance Measures Can Help States Identify Schools and Student Groups Most in Need of Support

The Every Student Succeeds Act of 2015 requires states to use a variety of indicators, including standardized tests and attendance records, to designate schools for support and improvement based on schoolwide performance and the performance of groups of students within schools. Schoolwide and group-level performance indicators are also diagnostically relevant for district-level and school-level decisionmaking outside the formal accountability context. Like all measurements, performance indicators are subject to measurement error, with some having more random error than others. Measurement error can have an outsized effect on smaller groups of students, rendering their measured performance unreliable, which can lead to misidentification of the groups with the greatest needs. Many states address the reliability problem by excluding from accountability student groups smaller than an established threshold, but this approach sacrifices equity, which requires counting students in all relevant groups.
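To see why small groups are so noisy, consider the sampling error of a simple proficiency rate. The sketch below is illustrative only (the rates and group sizes are hypothetical, not taken from the study), but it shows how the margin of error balloons as group size falls:

    # Illustrative only: the sampling error of an observed proficiency rate
    # shrinks with group size, which is why small groups yield unreliable
    # indicators. The numbers here are hypothetical, not from the study.
    import math

    def proficiency_se(p: float, n: int) -> float:
        """Standard error of an observed proficiency rate for a group of n students."""
        return math.sqrt(p * (1 - p) / n)

    for n in (10, 30, 100, 500):
        se = proficiency_se(0.5, n)
        # Approximate 95% margin of error: +/- 2 standard errors.
        print(f"n={n:>3}: SE = {se:.3f} (margin of error ~ +/-{2 * se:.0%})")

For a group of 10 students, a true proficiency rate of 50 percent can easily be observed anywhere from roughly 20 to 80 percent, which is far too wide a band to support high-stakes designations.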

With the aim of improving reliability, particularly for small groups of students, this study applied a stabilization model called Bayesian hierarchical modeling to group-level data (with groups assigned according to demographic designations) within schools in New Jersey. Stabilization substantially improved the reliability of test-based indicators, including proficiency rates and median student growth percentiles. The stabilization model used in this study was less effective for non-test-based indicators, such as chronic absenteeism and graduation rate, for several reasons related to their statistical properties. When stabilization is applied to the indicators best suited for it (such as proficiency and growth), it leads to substantial changes in the lists of schools designated for support and improvement. These results indicate that, applied correctly, stabilization can increase the reliability of performance indicators for the processes that use them, simultaneously improving accuracy and equity.

A new REL Mid-Atlantic report shows how stabilizing indicators of school performance improves their accuracy by reducing the impact of random variation. Using advanced statistical modeling (Bayesian hierarchical modeling) and data provided by the New Jersey Department of Education, REL Mid-Atlantic stabilized scores for most of the student groups and indicators used in New Jersey's accountability processes. Bayesian stabilization adjusts a student group's score on an accountability indicator based on the group's own past scores and on scores on the same indicator from the same student group in other schools.
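As a rough intuition for how this adjustment works, the sketch below implements the simplest version of the shrinkage idea: an observed rate is pulled toward a prior expectation, with a weight that depends on how precisely the rate is measured. The study's actual hierarchical model is richer than this (it pools information across years and schools simultaneously), and every name and number below is a hypothetical illustration, not the study's method:

    # A minimal sketch of the shrinkage idea behind Bayesian stabilization.
    # This toy version shrinks one group's observed rate toward a prior
    # mean, with a weight set by how reliably the rate is measured.
    # All variable names and numbers are hypothetical.

    def stabilize(observed_rate: float, n: int,
                  prior_mean: float, prior_var: float) -> float:
        """Precision-weighted (shrinkage) estimate of a group's true rate.

        observed_rate: the group's raw indicator value (e.g., proficiency rate)
        n:             number of students in the group
        prior_mean:    expected rate given other groups/years (from the model)
        prior_var:     variance of true rates around that expectation
        """
        # Sampling variance of the observed rate; larger for small groups.
        sampling_var = observed_rate * (1 - observed_rate) / n
        # Shrinkage weight: how much to trust the observed rate.
        w = prior_var / (prior_var + sampling_var)
        return w * observed_rate + (1 - w) * prior_mean

    # A small group's extreme score is pulled strongly toward the prior...
    print(stabilize(observed_rate=0.20, n=12, prior_mean=0.50, prior_var=0.01))
    # ...while a large group's score barely moves.
    print(stabilize(observed_rate=0.20, n=400, prior_mean=0.50, prior_var=0.01))

In this toy example the 12-student group's rate of 20 percent is pulled up to about 37 percent, while the 400-student group's identical raw rate moves only to about 21 percent: large groups are measured precisely enough that the data mostly speak for themselves.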

Stabilization made test-based performance indicators more reliable, especially for small groups of students. Non-test-based indicators, however, such as high school graduation rates and chronic absenteeism, were less suitable for stabilization. When the accountability designation process was simulated using stabilized test-based indicators, the list of schools that would have been identified for support and improvement changed, reflecting the reduced effect of random data variation on scores. This study demonstrates how stabilization can improve the accuracy of performance results for small student groups, making them less likely to be flagged in accountability processes because of random variation rather than true performance. States can use this approach to include more student groups in accountability considerations by reducing the minimum group size (currently 20-30 students in most states) while maintaining accuracy.
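To illustrate why the identification lists change, here is a small hypothetical simulation (none of its numbers come from the study, and it stands in for the study's much more detailed simulation): it generates groups with known true rates, "identifies" the lowest-scoring groups using raw versus stabilized rates, and counts how many of the truly lowest-performing groups each method catches:

    # Hypothetical simulation of why stabilization changes identification
    # lists: with raw rates, small groups dominate the extremes by chance;
    # shrinkage reduces those false flags. Illustrative numbers only.
    import random

    random.seed(1)
    PRIOR_MEAN, PRIOR_VAR = 0.50, 0.01  # assumed distribution of true rates

    groups = []
    for g in range(500):
        true_rate = random.gauss(PRIOR_MEAN, PRIOR_VAR ** 0.5)
        true_rate = min(max(true_rate, 0.05), 0.95)
        n = random.choice([10, 25, 60, 150, 400])  # group sizes vary
        observed = sum(random.random() < true_rate for _ in range(n)) / n
        # Floor the variance so rates of exactly 0 or 1 don't get infinite trust.
        sampling_var = max(observed * (1 - observed), 0.01) / n
        w = PRIOR_VAR / (PRIOR_VAR + sampling_var)
        stabilized = w * observed + (1 - w) * PRIOR_MEAN
        groups.append((true_rate, n, observed, stabilized))

    def flagged(key):
        """Indices of the 25 lowest-scoring groups under a given score."""
        return set(sorted(range(len(groups)), key=key)[:25])

    truly_lowest = flagged(lambda i: groups[i][0])
    by_raw = flagged(lambda i: groups[i][2])
    by_stab = flagged(lambda i: groups[i][3])

    print("overlap with truly lowest, raw:       ", len(by_raw & truly_lowest))
    print("overlap with truly lowest, stabilized:", len(by_stab & truly_lowest))

Under the raw scores, the flagged list fills up with small groups whose observed rates happened to be extreme; the stabilized list tracks the truly lowest-performing groups more closely, which is the behavior the report documents for New Jersey's test-based indicators.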

