The 5Essentials reporting website uses three types of related but different scores:
- Measure Score
- Essential Score
- 5Essentials Score
All scores provide information about a school’s performance relative to a benchmark and determine the color shown for measures and essentials. The 2011 Chicago Public Schools (CPS) average for high schools and elementary schools is the benchmark for all clients outside of Illinois. For schools in Illinois, including CPS, the benchmark is the 2013 Illinois state average for each grade level.
Extensive empirical evidence indicates a strong correlation between how far a school’s score lies from the benchmark mean and school improvement.
The measure and essential scores are on a 1-99 scale. These scale scores are neither a percentile rank nor a percentage.
The measure scores are calculated by first combining responses from multiple survey questions. Rather than taking a simple average of the raw responses, we use a method called Rasch analysis, which accounts for missing and unreliable responses. More technical details on how this works are given below.
After the measure scores are calculated, we compare them to the benchmark and place them on this 1-99 scale. Every twenty points is exactly one standard deviation wide and corresponds to a different color. This means that improving by twenty points, or an entire color, is a substantial improvement on that measure.
The essential scores are the average of all of the measure scores for that essential. These scores are also on a 1-99 scale.
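As a minimal sketch, the essential score can be computed as an unweighted mean of the measure scores in that essential. The function name `essential_score` and the example scores are hypothetical, and the sketch assumes every measure in the essential has a score.

```python
from statistics import mean


def essential_score(measure_scores: list[float]) -> float:
    """Average the 1-99 measure scores that make up one essential.

    Assumption: an unweighted mean over measures that all have scores.
    """
    if not measure_scores:
        raise ValueError("an essential needs at least one measure score")
    return mean(measure_scores)


# Hypothetical measure scores for the measures in one essential.
print(essential_score([62, 48, 71]))  # about 60.3
```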
The overall 5Essentials score is a summary indicator that describes a school’s performance on the Essentials combined. It is calculated by converting the color category of each individual Essential into a numeric value and adding these values together.
More technical details on the scoring process for the Measure, Essential, and 5Essentials scores are provided at the end of this page.
About the Benchmark
The CPS Benchmark. The 2011 Chicago Public Schools (CPS) average is the benchmark used to report survey results for many clients. CPS is a very diverse system, with over 650 large, small, public, charter, and other types of schools. The original research that identified the 5Essentials as a leading indicator of school improvement was conducted using two decades of CPS data.
Because high schools can be substantially different from elementary schools, we compare high schools (grades 9-12) to the benchmark high school average and elementary schools (grades K-8, K-5, and 6-8) to the benchmark elementary school average.
The benchmark is used to generate meaningful categories for each measure score:
- “Very strong”: at least 1.5 standard deviations above the benchmark.
- “Strong”: between 0.5 and 1.5 standard deviations above the benchmark.
- “Neutral”: between 0.5 standard deviations below and 0.5 standard deviations above the benchmark.
- “Weak”: 0.5 to 1.5 standard deviations below the benchmark.
- “Very weak”: at least 1.5 standard deviations below the benchmark.
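The categories above can be sketched as a simple threshold function. Because a measure score is 50 plus 20 times the number of benchmark standard deviations (see the technical details at the end of this page), the cut points at ±0.5 and ±1.5 standard deviations fall at scale scores of 40, 60, 20, and 80. The function names are hypothetical, and how values exactly on a cut point are assigned is an assumption: this sketch places them in the category farther from the benchmark.

```python
def category(z: float) -> str:
    """Map a distance from the benchmark, in benchmark standard deviations,
    to a CPS-site category label.

    Assumption: values exactly on a cut point (+/-0.5, +/-1.5) are placed
    in the category farther from the benchmark.
    """
    if z >= 1.5:
        return "Very strong"
    if z >= 0.5:
        return "Strong"
    if z > -0.5:
        return "Neutral"
    if z > -1.5:
        return "Weak"
    return "Very weak"


def category_from_score(score: float) -> str:
    """Same mapping for a 1-99 scale score, where score = 50 + 20 * z."""
    return category((score - 50.0) / 20.0)


for s in (85, 65, 50, 35, 10):
    print(s, category_from_score(s))
```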
The Illinois Benchmark. For schools in Illinois (including CPS), we use the 2013 state average as the benchmark. As described above, we compare schools within grade levels. The CPS benchmark compares schools within only two grade levels: elementary and high school. In contrast, the Illinois benchmark makes these comparisons within four grade levels: elementary (K-8), primary (K-5), middle school (6-8), and high school (9-12).
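The choice of benchmark and grade-level comparison group can be summarized as a lookup. This is a hypothetical sketch: the dictionary names, the grade-span strings used as keys, and the `comparison_group` function are illustrative only, and real schools whose grade spans differ from these four would need their own rule.

```python
# Grade-span strings used as dictionary keys are assumptions for illustration.
CPS_GROUPS = {
    "K-8": "elementary",
    "K-5": "elementary",
    "6-8": "elementary",
    "9-12": "high school",
}
ILLINOIS_GROUPS = {
    "K-8": "elementary",
    "K-5": "primary",
    "6-8": "middle school",
    "9-12": "high school",
}


def comparison_group(grade_span: str, in_illinois: bool) -> tuple[str, str]:
    """Return the (benchmark, grade-level group) a school is compared against."""
    if in_illinois:  # Illinois schools, including CPS
        return "Illinois 2013", ILLINOIS_GROUPS[grade_span]
    return "CPS 2011", CPS_GROUPS[grade_span]


print(comparison_group("6-8", in_illinois=True))   # ('Illinois 2013', 'middle school')
print(comparison_group("6-8", in_illinois=False))  # ('CPS 2011', 'elementary')
```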
The categories on the Illinois reports (Illinois.5-essentials.org) have different colors and labels from those used for other clients, although the scoring process is identical for all clients:
- “Most implementation”: at least 1.5 standard deviations above the benchmark.
- “More implementation”: between 0.5 and 1.5 standard deviations above the benchmark.
- “Average implementation”: between 0.5 standard deviations below and 0.5 standard deviations above the benchmark.
- “Less implementation”: 0.5 to 1.5 standard deviations below the benchmark.
- “Least implementation”: at least 1.5 standard deviations below the benchmark.
Note: CPS schools appear on the Illinois reporting site (Illinois.5-essentials.org) as well as their own CPS reporting site (cps.5-essentials.org). The CPS reporting site uses the labels “very strong”, “strong”, etc. Although the labels differ between the two sites, the scores themselves are identical.
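Because both sites share the same cut points, the two label sets correspond one to one. The hypothetical lookup below makes that correspondence explicit. The names `CPS_TO_ILLINOIS` and `site_label` and the site keys are illustrative only, and the sketch ignores the different colors.

```python
# One shared set of cut points; only the displayed label depends on the site.
CPS_TO_ILLINOIS = {
    "Very strong": "Most implementation",
    "Strong": "More implementation",
    "Neutral": "Average implementation",
    "Weak": "Less implementation",
    "Very weak": "Least implementation",
}


def site_label(cps_label: str, site: str) -> str:
    """Return the label shown on the given site ("cps" or "illinois")."""
    if site == "cps":
        return cps_label
    if site == "illinois":
        return CPS_TO_ILLINOIS[cps_label]
    raise ValueError(f"unknown site: {site!r}")


print(site_label("Strong", "illinois"))  # More implementation
```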
More Technical Details on Scoring
Measure Score: A measure score is a summary indicator describing how teachers or students responded to the specific questions making up each measure; several measures together make up an essential. A measure score is calculated using a method called Rasch analysis, which uses statistical models to combine survey questions into a raw score for each respondent. This method has several advantages over simply averaging responses: a) it allows us to ask a relatively small number of questions and still get a valid and reliable indicator that captures a broad range of experiences; b) it handles missing data easily; and c) it provides a “standard error” for each individual, which tells us how reliable that person’s responses were.
No raw score is perfectly accurate; survey measures always contain some random error. The standard error helps us estimate how accurate each raw score is.
Based on Rasch analysis, we know which items are “easier” to endorse and which ones are more “difficult” to endorse. For example, consider the Safety measure on the student survey:
How safe do you feel:
- In the hallways and bathrooms of the school.
- Outside around the school.
- Traveling between home and school.
- In your classes.
We know from Rasch analysis that the second item is the most “difficult” to endorse and the fourth item is the “easiest”. That is, students tend to report feeling safest in their classes and least safe outside around the school. This makes sense: areas with less adult supervision tend to feel less safe to students.
We therefore expect students to respond more favorably to the fourth item than to the second. Statistically and substantively, if a student reports feeling very safe outside around the school, she should also feel very safe in her classes.
It is extremely unlikely that a student reports feeling not safe in his classes but very safe outside around the school. Because such response patterns are so unusual, they are likely due to some kind of error, such as a student not paying attention to the questions.
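One standard way to quantify how surprising a response pattern is under a Rasch model is the outfit mean-square statistic: the average squared standardized residual across a respondent’s answers. Values near 1 are expected under the model, and values well above 1 flag unexpected patterns. The sketch below is an illustration only, not necessarily how 5Essentials scoring treats unusual responses. It assumes the dichotomous Rasch model (each safety item collapsed to “feels safe” = 1 or not = 0), hypothetical item difficulties that follow the ordering described above, and a hypothetical student measure θ = 1.0. Both example patterns have the same raw score, and in the dichotomous Rasch model the same raw score on the same items yields the same measure, so both are evaluated at one shared θ.

```python
import math


def p_endorse(theta: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(endorse) for a person at theta."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def outfit_mnsq(responses, difficulties, theta):
    """Mean of squared standardized residuals over answered items (None = missing)."""
    z_squared = []
    for x, d in zip(responses, difficulties):
        if x is None:
            continue
        p = p_endorse(theta, d)
        z_squared.append((x - p) ** 2 / (p * (1.0 - p)))
    return sum(z_squared) / len(z_squared)


# Hypothetical difficulties: hallways, outside, traveling, classes.
# "Outside" is hardest to endorse and "classes" is easiest.
DIFFICULTIES = [0.3, 1.0, 0.6, -1.2]
THETA = 1.0  # hypothetical student measure

expected = [1, 0, 1, 1]  # not safe outside, safe in class
unusual = [1, 1, 1, 0]   # safe outside, but not safe in class

print(round(outfit_mnsq(expected, DIFFICULTIES, THETA), 2))  # 0.57
print(round(outfit_mnsq(unusual, DIFFICULTIES, THETA), 2))   # 2.8
```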
Rasch analysis produces a measure score and a standard error for each individual student. This standard error represents how unreliable the student’s overall responses are. It also takes missing data into account: a student who has responded to only two of four items still gets an individual measure score, but with a large standard error.
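To illustrate how a Rasch measure and its standard error behave with missing data, here is a minimal sketch that assumes the dichotomous Rasch model and the same hypothetical item difficulties as in the safety example. It estimates the person measure by maximum likelihood with damped Newton-Raphson steps and takes the standard error as one over the square root of the Fisher information, skipping unanswered items (`None`). The function name `person_measure` is hypothetical, and the actual 5Essentials software may use a different model or estimator.

```python
import math


def p_endorse(theta: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(endorse) for a person at theta."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def person_measure(responses, difficulties, tol=1e-10, max_iter=100):
    """Maximum-likelihood measure and standard error, skipping missing (None) items."""
    answered = [(x, d) for x, d in zip(responses, difficulties) if x is not None]
    raw = sum(x for x, _ in answered)
    if raw == 0 or raw == len(answered):
        # All-0, all-1, or empty patterns have no finite ML estimate.
        raise ValueError("extreme or empty response pattern")
    theta = 0.0
    for _ in range(max_iter):
        probs = [p_endorse(theta, d) for _, d in answered]
        info = sum(p * (1.0 - p) for p in probs)  # Fisher information at theta
        step = (raw - sum(probs)) / info          # Newton-Raphson step
        step = max(-1.0, min(1.0, step))          # damp large steps
        theta += step
        if abs(step) < tol:
            break
    info = sum(p_endorse(theta, d) * (1.0 - p_endorse(theta, d)) for _, d in answered)
    return theta, 1.0 / math.sqrt(info)


DIFFICULTIES = [0.3, 1.0, 0.6, -1.2]  # hypothetical: hallways, outside, traveling, classes

theta_all, se_all = person_measure([1, 0, 1, 1], DIFFICULTIES)
theta_two, se_two = person_measure([1, 0, None, None], DIFFICULTIES)
print(f"all four items: theta={theta_all:.2f}, SE={se_all:.2f}")
print(f"two items only: theta={theta_two:.2f}, SE={se_two:.2f}")
```

With only two answered items, the student still receives a measure, but its standard error is larger than for the student who answered all four.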
When we create the school-level measure scores, we weight students by the inverse of the standard error. That is, students with lots of missing data or extremely unusual response patterns are down-weighted.
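A minimal sketch of this weighting, assuming the school-level value is a weighted mean of individual measures with each student weighted by 1 / SE, exactly as stated above. The function name and data are hypothetical, and the further adjustment described in the next paragraph is not modeled here.

```python
def school_measure(measures, standard_errors):
    """Weighted mean of individual measures, weighting each student by 1 / SE."""
    weights = [1.0 / se for se in standard_errors]
    weighted_sum = sum(w * m for w, m in zip(weights, measures))
    return weighted_sum / sum(weights)


# Hypothetical students: the third has a large SE (e.g., many missing items).
measures = [1.0, 1.2, -2.0]
ses = [0.5, 0.5, 2.0]
print(school_measure(measures, ses))   # pulled toward the two precise students
print(sum(measures) / len(measures))   # simple mean, for comparison
```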
After computing the raw scores, we adjust each school’s average score by the average standard error of that school’s respondents. We do this so that, when we generate the school-level measure score, a statistically more accurate score receives greater weight than a less accurate one. The result is the adjusted score.
In order to compare scores over time, we standardize each school’s adjusted score $A_s$ to the benchmark: we subtract the average raw score for the benchmark, $\mu_B$, and divide by the school-level standard deviation for the benchmark, $\sigma_B$:
$$z_s = \frac{A_s - \mu_B}{\sigma_B}$$
This standardized adjusted score is put on a (roughly) 1 to 99 scale by multiplying by 20 and adding 50, and the result is truncated so that the scale ranges from 1 to 99. Writing $z_s$ for the standardized adjusted score of school $s$, the measure score is
$$S_s = \min\left(99,\ \max\left(1,\ 50 + 20\,z_s\right)\right)$$
A score of 1 therefore represents a score roughly 2.5 or more standard deviations below the given benchmark, and a score of 99 a score roughly 2.5 or more standard deviations above it. Because of this truncation, unusually extreme standardized adjusted scores have only a limited effect on the school’s final measure score.
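The standardization and truncation steps translate directly into code. This is a sketch only: the function name and the benchmark values in the example are hypothetical, and the sketch does not round, since the text does not say whether published scores are rounded.

```python
def measure_scale_score(adjusted: float, bench_mean: float, bench_sd: float) -> float:
    """Standardize to the benchmark, then map onto the truncated 1-99 scale."""
    z = (adjusted - bench_mean) / bench_sd       # distance in benchmark SDs
    return min(99.0, max(1.0, 50.0 + 20.0 * z))  # truncate to [1, 99]


# Hypothetical benchmark mean 0.0 and school-level SD 0.4.
for adjusted in (0.2, -0.3, 1.5):
    print(adjusted, measure_scale_score(adjusted, 0.0, 0.4))
```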
Essential Score: An essential score is a summary indicator that describes the school’s performance on each particular essential. It is the average of all of the measure scores making up that essential.
5Essentials Score: The overall 5Essentials score is a summary indicator that describes a school’s performance on the 5Essentials as a group. It is calculated by adding up the school’s performance on each individual essential: light green (strong) or dark green (very strong) on an essential counts as +1; yellow (neutral) or gray counts as 0; and orange (weak) or dark red (very weak) counts as -1. For example, a school with 1 green, 2 yellows, and 2 reds would have a net score of -1 (1 + 0 + 0 - 1 - 1 = -1). Each 5Essentials score has a corresponding color and improvement label:
- Well Organized (+3, +4, or +5) – Dark Green
- Organized (+1 or +2) – Light Green
- Moderately Organized (0) – Yellow
- Partially Organized (-1 or -2) – Orange
- Not Yet Organized (-3, -4, or -5) – Red
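The point rules and labels above can be sketched as follows. The lowercase color strings used as dictionary keys and the function name `five_essentials` are hypothetical; the point values and label bands are the ones listed above.

```python
# Points per essential color, as described above.
POINTS = {
    "dark green": 1, "light green": 1,
    "yellow": 0, "gray": 0,
    "orange": -1, "dark red": -1,
}


def five_essentials(colors: list[str]) -> tuple[int, str, str]:
    """Sum the points for the five essential colors and look up the overall label."""
    if len(colors) != 5:
        raise ValueError("expected one color per essential (five in total)")
    total = sum(POINTS[c] for c in colors)
    if total >= 3:
        label, color = "Well Organized", "Dark Green"
    elif total >= 1:
        label, color = "Organized", "Light Green"
    elif total == 0:
        label, color = "Moderately Organized", "Yellow"
    elif total >= -2:
        label, color = "Partially Organized", "Orange"
    else:
        label, color = "Not Yet Organized", "Red"
    return total, label, color


# The worked example above: 1 green, 2 yellows, 2 reds.
print(five_essentials(["light green", "yellow", "yellow", "dark red", "dark red"]))
# (-1, 'Partially Organized', 'Orange')
```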
For a primer on Rasch analysis, see http://ccsr.uchicago.edu/downloads/9585ccsr_rasch_analysis_primer.pdf