Reliability of the Tests

The NCDPI Score-Frequency data gives an impression of precision, of exactness and completeness. However, students are susceptible to illness, emotional stress, and so on; the same tests given to the same students on different days would produce at least slightly different results. There are also influences such as a change in principal, teacher turnover, shifts in student ethnic composition, the number of students who need English language development, and so on. All of this contributes to the randomness associated with almost all testing. There is also ordinary sampling variance: schools with very small numbers of students in an ethnic category may show exaggerated variability. In the aggregate of 1400 schools, small differences may not be of consequence, but that need not hold when looking more closely at these tests. The year-to-year comparison shown in Figure I.2 obscures these variations and leaves a misleading impression of regularity.
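To make the small-count variability concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a school’s observed GLP percent behaves like a binomial draw around a fixed true rate; the correlated influences listed above would only widen the spread.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumption: a school's "true" proficiency rate is 50%,
# and its observed GLP percent is a binomial draw over its tested
# students. Smaller schools then show much wider swings.
true_rate = 0.50
for n_students in (10, 40, 160, 640):
    draws = rng.binomial(n_students, true_rate, size=10_000) / n_students
    # Standard deviation of the observed GLP percent, in percentage points
    print(f"n={n_students:4d}: sd of observed GLP% ~ {100 * draws.std():.1f} points")
```

With ten tested students the observed GLP percent swings by roughly sixteen points of standard deviation; with several hundred students it settles to a point or two.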

The classification of students as grade-level proficient, and the reporting of schools' GLP percentages, are consequences of the tests. Thus far in these notes I have been considering aggregations of data, taking advantage of the law of large numbers. Now I will consider two questions and show that they have at best doubtful answers. The first is how much confidence can be put in the GLP percent for the same school, for the same grade and subject, from year to year. That is, when tracking grades and subjects from year to year and looking at individual schools, how consistent are the GLP percents? A negative answer implies that the GLP percent scores are themselves of questionable usefulness.
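As one illustration of how the first question could be examined, the sketch below computes the year-to-year correlation of school-level GLP percents. The file name and the column names (school_id, year, grade, subject, glp_pct) are assumptions made for illustration, not NCDPI’s actual field names.

```python
import pandas as pd

# Hypothetical long-format table of the score data; names are assumed.
df = pd.read_csv("ncdpi_glp.csv")

reading3 = df[(df.subject == "Reading") & (df.grade == 3)]
wide = reading3.pivot(index="school_id", columns="year", values="glp_pct")

# Correlation of a school's GLP percent in successive years; a weak
# correlation would support the doubts raised in the text.
print(wide["2013-14"].corr(wide["2014-15"]))
```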

A further consideration is whether these data can be used to compare year-to-year changes between schools. If policy decisions are based on comparisons of GLP percent between schools, then the variations apparent in the plots below cast doubt on their usefulness.

I. Year to Year Changes by Grade

I.1 All Schools

Figure I.1.1 compares the changes in Reading GLP percent by school from one year to the next, 2013-14 to 2014-15, with the changes in the succeeding year, 2014-15 to 2015-16. Red denotes Title I schools, and filled circles denote schools with either under 20% or over 80% GLP percent. Each school is annotated with the school ID, number of students, and GLP percent for the earliest year. The designation of a school as Title I is an administrative decision made by the school district (Local Education Agency, or LEA). However, there is a strong association between Title I status and a school’s overall EDS percentage: roughly, a school over 40% EDS will be classified as a Title I school.[DOEt1] The distribution of Title I funds to individual schools is made by the LEA administrators and is not discussed here.
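The sketch below shows how a plot of this kind can be constructed from the assumed table built above. The `title1` column is an assumed Title I flag, and the per-school text annotations are omitted for brevity.

```python
import numpy as np
import matplotlib.pyplot as plt

# Continuing from `reading3` and `wide` above; `title1` is an assumed
# boolean column marking Title I schools.
meta = reading3.drop_duplicates("school_id").set_index("school_id")
title1 = meta["title1"].reindex(wide.index).fillna(False)

d1 = wide["2014-15"] - wide["2013-14"]   # change, 2013-14 -> 2014-15
d2 = wide["2015-16"] - wide["2014-15"]   # change, 2014-15 -> 2015-16
glp0 = wide["2013-14"]                   # starting GLP percent

extreme = (glp0 < 20) | (glp0 > 80)      # filled circles in the figure
for mask, filled in ((extreme, True), (~extreme, False)):
    colors = np.where(title1[mask], "red", "black")
    plt.scatter(d1[mask], d2[mask], s=25,
                facecolors=colors if filled else "none", edgecolors=colors)

plt.axhline(0, lw=0.5, color="gray")
plt.axvline(0, lw=0.5, color="gray")
plt.xlabel("GLP% change, 2013-14 to 2014-15")
plt.ylabel("GLP% change, 2014-15 to 2015-16")
plt.show()
```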

Figure I.1.1. All Schools Reading Grade 3

The figure shows several important aspects of the grade 3, 4, and 5 GLP percent test results. There is a grouping in the center of the plot, that is, around small changes. This is to be expected and reflects the ordinary realities of changing administrative and teaching staff and shifts in student population and school mission, such as a change to or from magnet status. Schools in the upper right-hand quadrant performed better in 2015-16 than in 2014-15 and also in 2014-15 than in 2013-14; the lower left-hand quadrant contains schools that performed worse in both years, and so on. One might expect some schools to do remarkably better or worse as interventions, population shifts, staff changes, or finances change. However, the size of the “cloud”, in which schools shift up or down by more than ten percent from year to year, requires explanation. Are the schools in the cloud distinguished by some internal or external circumstances? Are there additional influences that might be found in other data sources? Are there changes in student preparation or in the amount of time allocated to practice? To what extent do such large changes cast doubt on the usefulness of the tests and testing procedures? I address some aspects of these questions below, but more extensive work remains to be done.
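One rough way to gauge the size of the cloud, under the same assumptions as the sketches above, is to count the schools whose GLP percent moved by more than ten points in either year:

```python
# d1 and d2 as computed above; schools missing a year contribute False.
big_swing = (d1.abs() > 10) | (d2.abs() > 10)
print(f"{big_swing.sum()} of {big_swing.count()} schools "
      f"({100 * big_swing.mean():.0f}%) moved more than 10 points "
      "in at least one year")
```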

I.2 Slices Across GLP Percentiles

The sweeping together of data for all schools, as in Figure I.1.1, may obscure insights available from various slice-and-dice expositions. Figures I.2.1A-C, below, show the same Grade 3 Reading year-to-year first differences as Figure I.1.1, but sliced by each school's starting (2013-14) GLP percentage. Three divisions are presented: schools with lower starting GLP (in the bottom 40th percentile), schools in the middle (from the 40th to the 60th percentile), and schools with higher starting GLP (above the 60th percentile).
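Using the same assumed table, the three divisions can be reproduced with percentile cut points on the starting GLP percent:

```python
# Cut points at the 40th and 60th percentiles of the starting (2013-14)
# GLP percent (`glp0` from the earlier sketch).
lo, hi = glp0.quantile([0.40, 0.60])
bottom = wide[glp0 <= lo]                  # Figure I.2.1A
middle = wide[(glp0 > lo) & (glp0 <= hi)]  # Figure I.2.1B
top = wide[glp0 > hi]                      # Figure I.2.1C
```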

In the first plot, Figure I.2.1A, the schools started in the bottom 40th percentile. The schools are shifted rightward, toward higher GLP percent in 2014-15 than in 2013-14, although many schools moved leftward, with lower GLP percent in 2014-15 than in 2013-14. There appears to be a general tendency for the GLP percent in 2015-16 to be higher than in 2014-15 (above the horizontal line denoted “0”). That is encouraging, but about fifty schools showed improvement of over twenty percentage points (above the horizontal line denoted “20”), a swing large enough to raise the reliability questions discussed above.

Figure I.2.1A. Statewide 2013-14 GLP% in Bottom 40th Percentile


Figure I.2.1B. Statewide 2013-14 GLP% Between 40th and 60th Percentile


Figure I.2.1C. Statewide 2013-14 GLP% Above 60th Percentile


II. Tracking a “Wave”

II.1 All Schools

The second question I address is how much confidence can be put in the GLP percent for the same school and subject when following a cohort from year to year. When tracking grade 3 to grade 4, and grade 4 to grade 5, and looking at individual schools, how consistent are the GLP percents? Figure II.1 shows that cohort tracking has much the same appearance as the same-grade comparison shown above. Red denotes Title I schools, and filled circles denote schools with either under 20% or over 80% GLP percent. Each school is annotated with the school ID, number of students, and GLP percent for the earliest year. It appears that the usefulness of GLP percent comparisons for cohorts is likewise limited.
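A sketch of how the cohort (“wave”) differences behind Figure II.1 might be assembled, using the same assumed column names as before. Note that pairing grades at the school level ignores students who change schools between grades.

```python
def grade_year(grade, year):
    """GLP percent per school for one Reading grade/year combination."""
    sel = df[(df.subject == "Reading") & (df.grade == grade) & (df.year == year)]
    return sel.set_index("school_id")["glp_pct"]

g3 = grade_year(3, "2013-14")
g4 = grade_year(4, "2014-15")
g5 = grade_year(5, "2015-16")

c1 = g4 - g3   # cohort change, grade 3 -> 4 (x-axis of Figure II.1)
c2 = g5 - g4   # cohort change, grade 4 -> 5 (y-axis of Figure II.1)
```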

Figure II.1. All Schools Reading Cohort

II.2 Slices Across GLP Percentiles

This slicing is similar to that of Section I.2. Figures II.2.1A-C show the same Reading cohort first differences as Figure II.1, but sliced by each school's starting (2013-14) GLP percentage. The same three divisions are presented: schools in the bottom 40th percentile of starting GLP, schools between the 40th and 60th percentiles, and schools above the 60th percentile.

Figure II.2.1A. Statewide 2013-14 GLP% in Bottom 40th Percentile

Figure II.2.1B. Statewide 2013-14 GLP% Between 40th and 60th Percentile

Figure II.2.1C. Statewide 2013-14 GLP% Above 60th Percentile

III. Wake County Schools

The appearance of the data for a specific LEA is similar to that for the state as a whole. Figure III.1 shows Wake County, following the wave that starts with Grade 3 Reading in 2013-14.

Figure III.1. Wake County Reading Cohort

This report was run on 2021-05-08.