- Measure Score
- Essential Score
- 5Essentials Score
All scores provide information about a school’s performance relative to a benchmark and determine the color for Measures and Essentials. For more information on benchmarks, click here.
Extensive empirical evidence indicates that there is a strong correlation between a score's distance from the mean and school improvement.
The Measure and Essential scores are on a 1-99 scale. These scale scores are neither a percentile rank nor a percentage.
The Measure scores are calculated by first combining responses from multiple survey questions. When we combine the responses together, we do not take a simple average of the raw response data. Instead, we use a method called Rasch analysis, which takes into account missing responses and unreliable responses. See below for more technical details about how this works.
After these Measure scores are calculated, we compare them to the benchmark and put the score on this 1-99 scale. Every twenty points is exactly one standard deviation wide and has a different color. This means that improving by twenty points, or an entire color, is a substantial improvement on that Measure.
The Essential scores are the average of all of the Measure scores for that Essential. These scores are also on a 1-99 scale.
The overall 5Essentials score is a summary indicator that describes a school’s performance on the combined Essentials. This score is calculated by adding together the school’s performance on each individual Essential, with the score color categories receiving a numeric value.
More technical details on the scoring process for the Measure, Essential, and 5Essentials scores follow.
More Technical Details on Scoring
Measure Score: A Measure score is a summary indicator describing how teachers or students responded to the specific questions making up each Measure; a number of Measures together comprise an Essential. A Measure score is calculated using a method called Rasch analysis. This method uses statistical models to combine survey questions into a raw score for each responding individual. It has several advantages over simply averaging responses: a) it allows us to ask a relatively small number of questions and still get a valid and reliable indicator that measures a broad range of experiences; b) it can handle missing data easily; and c) it provides a “standard error” for each individual, which tells us how reliable the person’s responses were.
We know that no raw score is going to be perfectly true—survey Measures always have some random errors. The standard error helps us estimate how accurate raw scores are.
Based on Rasch analysis, we know which items are “easier” to endorse and which ones are more “difficult” to endorse. For example, consider the Safety Measure on the student survey:
How safe do you feel:
- In the hallways and bathrooms of the school.
- Outside around the school.
- Traveling between home and school.
- In your classes.
We know from Rasch analysis that the second item is the most “difficult” to endorse and the fourth item is the “easiest.” That is, students tend to report feeling relatively safest in their classes and least safe outside and around the school. This makes sense: areas with less adult supervision tend to feel less safe to students.
We expect that students will respond more favorably to the fourth item than the second; statistically and substantively, it just makes sense that if a student reports feeling very safe outside and around the school, she should also feel very safe in her classes.
It is extremely unlikely that a student reports feeling not safe in his classes but very safe outside and around the school. Since these response patterns are so unusual, it’s likely that they’re due to some kind of error, such as students not paying attention to the questions.
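To make the idea of item “difficulty” concrete, here is a minimal sketch of the simplest (yes/no) form of the Rasch model. It is an illustration only: the survey's actual model handles multi-category responses, and the difficulty values and the student's level below are hypothetical numbers chosen to mirror the ordering described above, not estimates from real data.

```python
import math


def endorsement_probability(person_level: float, item_difficulty: float) -> float:
    """Dichotomous Rasch model: chance that a respondent endorses an item.

    Both arguments are on the same logit scale. The probability is 0.5
    when the person's level equals the item's difficulty.
    """
    return 1.0 / (1.0 + math.exp(-(person_level - item_difficulty)))


# Hypothetical difficulties (in logits) for the four Safety items.
# Higher means harder to endorse. Illustrative values only.
item_difficulty = {
    "hallways_bathrooms": 0.2,
    "outside_school": 1.0,      # hardest to endorse
    "travel_home_school": 0.5,
    "in_classes": -1.0,         # easiest to endorse
}

student_level = 0.5  # hypothetical student

probabilities = {
    item: endorsement_probability(student_level, difficulty)
    for item, difficulty in item_difficulty.items()
}

for item, p in probabilities.items():
    print(f"{item}: {p:.2f}")
```

Under these assumed values, the same student is more likely to endorse the “in classes” item than the “outside around the school” item, which is why the reverse pattern looks unusual to the model.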
Rasch analysis produces a Measure score at the individual (student) level as well as a standard error at the individual level. This standard error represents how unreliable the student’s overall responses are. It also takes missing data into account: a student who has responded to only two of the four items still gets an individual Measure score, but with a large standard error.
When we create the school-level Measure scores, we weight students by the inverse of the standard error. That is, students with lots of missing data or extremely unusual response patterns are down-weighted.
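The down-weighting can be sketched as a weighted average in which each student's weight is one divided by their standard error. This is a simplified illustration under that assumption; the function name and the sample numbers are hypothetical, and the production calculation may differ in its details.

```python
def weighted_school_score(scores, standard_errors):
    """Average individual scores, weighting each by 1 / standard error."""
    if len(scores) != len(standard_errors):
        raise ValueError("scores and standard_errors must have the same length")
    if not scores:
        raise ValueError("at least one respondent is required")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")

    weights = [1.0 / se for se in standard_errors]
    weighted_sum = sum(w * s for w, s in zip(weights, scores))
    return weighted_sum / sum(weights)


# Hypothetical respondents: the third has a large standard error
# (for example, many skipped items), so it counts for less.
scores = [2.0, 1.0, -3.0]
standard_errors = [0.5, 0.5, 2.0]

print(weighted_school_score(scores, standard_errors))  # 1.0
print(sum(scores) / len(scores))                       # 0.0 (unweighted)
```

In this made-up example the unreliable third response pulls a simple average down to 0.0, while the weighted average stays at 1.0.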
After computing the raw score, we adjust the average score for each school by the average respondent’s standard error for the school. We do this so that, when we generate the school-level Measure score, a statistically more accurate score receives greater weight than a less accurate one. The result is the adjusted score.
In order to compare scores over time, we standardize the adjusted scores to the benchmark, which means we subtract the average raw score for the benchmark and divide by the school-level standard deviation for the benchmark.
This standardized adjusted score is put on a (roughly) 1 to 99 scale by multiplying by 20 and adding 50. The scores are then truncated so that the scale ranges from 1 to 99. A score of 1 represents a score that is roughly 2.5 or more standard deviations below the given benchmark, and a score of 99 represents a score that is roughly 2.5 or more standard deviations above it. Some standardized adjusted scores may be unusually extreme; truncation limits the effect of such scores on the school’s final Measure score.
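The standardize, rescale, and truncate steps can be sketched in a few lines. The function and parameter names are hypothetical, and whether the final score is also rounded is not stated here, so this sketch leaves it unrounded.

```python
def to_scale_score(adjusted_score: float,
                   benchmark_mean: float,
                   benchmark_sd: float) -> float:
    """Convert an adjusted school score to the 1-99 reporting scale."""
    if benchmark_sd <= 0:
        raise ValueError("benchmark_sd must be positive")

    # Standardize against the benchmark.
    standardized = (adjusted_score - benchmark_mean) / benchmark_sd

    # One standard deviation equals twenty scale points, centered at 50.
    scaled = 20.0 * standardized + 50.0

    # Truncate so the reported score stays between 1 and 99.
    return min(99.0, max(1.0, scaled))


# Hypothetical benchmark with mean 0.0 and standard deviation 1.0.
print(to_scale_score(0.0, 0.0, 1.0))   # 50.0 (at the benchmark)
print(to_scale_score(1.0, 0.0, 1.0))   # 70.0 (one standard deviation above)
print(to_scale_score(-5.0, 0.0, 1.0))  # 1.0  (truncated)
```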
Essential Score: An Essential score is a summary indicator that describes the school’s performance on each particular Essential. It is the average of all of the Measure scores making up that Essential.
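As a small illustration, the Essential score is a plain average of its Measure scores. The Measure names and values below are hypothetical.

```python
from statistics import mean


def essential_score(measure_scores):
    """Average the 1-99 Measure scores that make up one Essential."""
    if not measure_scores:
        raise ValueError("an Essential needs at least one Measure score")
    return mean(measure_scores)


# Hypothetical Measure scores for a single Essential.
measures = {"measure_a": 62.0, "measure_b": 48.0, "measure_c": 55.0}

print(essential_score(list(measures.values())))  # 55.0
```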
5Essentials Score: The overall 5Essentials score is a summary indicator that describes a school’s performance on the 5Essentials as a group. This score is calculated by adding up the school’s performance on each individual Essential. Being light green (strong) or dark green (very strong) on an Essential counts as +1; yellow (neutral) or gray counts as 0; and orange (weak) or dark red (very weak) counts as -1. For example, if a school has 1 green, 2 yellows, and 2 reds, the school would have a net score of -1 (1 + 0 + 0 - 1 - 1 = -1). Each 5Essentials score has a corresponding color and improvement label.
- Well Organized (+3, +4, or +5) – Dark Green
- Organized (+1 or +2) – Light Green
- Moderately Organized (0) – Yellow
- Partially Organized (-1 or -2) – Orange
- Not Yet Organized (-3, -4, or -5) – Red
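The tally and label lookup described above can be sketched as follows. The function names and the lowercase color strings are hypothetical conventions chosen for this example; the point values and label ranges come from the description above.

```python
# Points each Essential color contributes to the overall score.
COLOR_POINTS = {
    "dark green": 1,
    "light green": 1,
    "yellow": 0,
    "gray": 0,
    "orange": -1,
    "dark red": -1,
}


def five_essentials_score(essential_colors):
    """Sum the points for the colors of the five Essentials."""
    if len(essential_colors) != 5:
        raise ValueError("expected exactly five Essential colors")
    return sum(COLOR_POINTS[color] for color in essential_colors)


def organization_label(net_score):
    """Map a net score from -5 to +5 to its improvement label."""
    if not -5 <= net_score <= 5:
        raise ValueError("net score must be between -5 and +5")
    if net_score >= 3:
        return "Well Organized"
    if net_score >= 1:
        return "Organized"
    if net_score == 0:
        return "Moderately Organized"
    if net_score >= -2:
        return "Partially Organized"
    return "Not Yet Organized"


# The example from the text: 1 green, 2 yellows, and 2 reds.
colors = ["light green", "yellow", "yellow", "dark red", "orange"]
net = five_essentials_score(colors)

print(net)                      # -1
print(organization_label(net))  # Partially Organized
```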
For Illinois reports, the color and improvement labels differ slightly. The labels are listed below; "Most Implementation" is the darkest purple, and each subsequent label is a lighter shade.
- Most Implementation (+3, +4, or +5)
- More Implementation (+1 or +2)
- Average Implementation (0)
- Less Implementation (-1 or -2)
- Least Implementation (-3, -4, or -5)
For a primer on Rasch analysis, click here.