There’s a chance for learning in the NYC teacher scores

As a journalist, I would have published the scores.
The argument isn’t whether or not the New York Times should have published NYC teacher evaluation scores.
They are a newspaper. The scores are news. Their job is to publish them. They publish the news.
If they’d sat on the scores, if they’d held them internally, if they’d published pieces of them or only profiled certain teachers, they would have been compromising and editorializing.
The coverage of the scores has certainly had an editorializing effect on how the scores are consumed. As José pointed out the other day, the person telling the story affects the narrative.
Now they’re out there, and a conversation has been stoked around the use, intent, validity of the scores.
As it should be.
As a teacher, I abhor the scores.
These scores (and value-added measures in general) are imperfect, imprecise, skewed, and dangerous tools. Let’s make that argument. Let’s make that argument better and more profoundly than those who stand by the scores.
If ever the teaching profession was faced with a teachable moment, this is it. Isn’t this what we do? We make complex issues accessible to those standing on unfamiliar ground and help them come to deep understanding. If we’re right (and we are) the truth of the argument against the scores will become apparent through education.
Yes, resent that time, money, energy must be spent on this. Detest the scores the same way you detest poor grammar, ignorance of culture and history, or imperfect proofs. Then, find a way to teach toward understanding.
This is one of those few moments in the teaching profession’s wheelhouse. Let’s not miss it by admiring another problem so long that we forget to teach through it.
Teachers are better than that.
This is where unions can take the lead.
It is time for the AFT and NEA to hike up their big-kid pants and lead their membership not through dues or rallies, but through teaching.
I mean this in two ways. First, teachers are historically challenged when it comes to telling their stories. There’s every reason to believe this inability is only going to be exacerbated when faced with an issue as emotionally charged and personal as the NY scores. If teachers are going to respond and educate, they’re going to need guidance. Every union head in every school across the country should be leading trainings in how to create talking points and craft effective editorials. If there is a conversation to be had about how we measure teachers, let teachers lead it and educate teachers in how best to have those conversations.
Second, after these PR primers, help teachers organize forums and community meetings to build understanding of the scores and all their imperfections. Use the presence of the NYC conversation to move preemptively against other imperfect and unfair measures of teachers. These should have been the moves the moment the courts allowed the publishing of the scores. There’s still time to make this a thoughtful, productive conversation. All it will take is all it has ever taken – teaching.

Thanks to Paul and José for helping me figure out my thinking on this one.

Things I Know 273 of 365: Value added isn’t

Value-added assessment is a new way of analyzing test data that can measure teaching and learning. Based on a review of students’ test score gains from previous grades, researchers can predict the amount of growth those students are likely to make in a given year. Thus, value-added assessment can show whether particular students – those taking a certain Algebra class, say – have made the expected amount of progress, have made less progress than expected, or have been stretched beyond what they could reasonably be expected to achieve.

– The Center for Greater Philadelphia

Professor Andrew Ho came and spoke to my school reform class tonight about the idea of value added and its space in the conversation on American education.

We started by looking at a scatterplot of local restaurants plotted by their Zagat rating against their Zagat average price per meal.

Ho then plotted a regression line through the scatterplot and took note of one restaurant that had a higher score than predicted for its cost.

The temptation was to claim our overachieving restaurant was a good buy for the money. Who’d expect a restaurant with such inexpensive food to have such a high rating?

Then he asked us what we didn’t see.

Portions, ambiance, quality, location, service, selection, etc.

Any of these will be familiar to anyone who’s debated with a group of friends when trying to pick a restaurant.

His point was simple: expectations change depending on what you base them on.

Ho relabeled the axes – this year’s test results, previous year’s test results.
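
To make the mechanics concrete, here is a minimal sketch of the computation behind both versions of the plot (my own illustration with invented numbers, not Ho’s code): fit a line through the points, then treat each point’s distance above or below the line as its “value added.”

```python
import numpy as np

# Invented data: one row per student. The same arithmetic applies to the
# restaurant plot (x = price per meal, y = Zagat rating).
prior = np.array([61.0, 70.0, 75.0, 82.0, 88.0, 93.0])    # last year's scores
current = np.array([64.0, 68.0, 79.0, 80.0, 91.0, 94.0])  # this year's scores

# Fit the regression line: predicted = intercept + slope * prior.
slope, intercept = np.polyfit(prior, current, 1)
predicted = intercept + slope * prior

# "Value added" is just the residual: actual minus predicted.
residuals = current - predicted
for p, c, r in zip(prior, current, residuals):
    verdict = "above expectation" if r > 0 else "below expectation"
    print(f"prior={p:.0f}  actual={c:.0f}  residual={r:+.1f}  ({verdict})")
```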

He asked us what we didn’t see.

Content, delivery, socioeconomic status, race, home life, sports, after-school activities, tutoring, mentoring, etc.

This is to say nothing of the possibility that there is a natural spread to knowledge and growth beyond the influence of any teacher, or that the different combinations of teachers in a student’s life in a given year could have varying effects on achievement.

A psychometrician, statistician and policy researcher, Ho then laid some data on us from the research on value added:

  • Estimates of value added are unstable across models, across the courses a teacher might teach, and across years (a toy simulation of these decile shifts follows this list).
  • Across different value-added models, teacher effectiveness ratings differ by at least 1 decile for 56%-80% of teachers and by at least 3 deciles for 0%-14% of teachers (this is reassuring).
  • Across courses taught, between 39% and 54% of teachers differ by at least 3 deciles.
  • Across years, between 19% and 41% of teachers differ by at least 3 deciles.
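
To get a feel for what those decile shifts mean, here is a small simulation (my own, with made-up numbers rather than Ho’s data): rate the same teachers under two value-added models that share a common signal plus independent noise, bin each model’s ratings into deciles, and count how far each teacher moves.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 500

# Two models see the same underlying teacher signal plus independent noise,
# so they agree only partially.
signal = rng.normal(size=n_teachers)
model_a = signal + rng.normal(scale=0.8, size=n_teachers)
model_b = signal + rng.normal(scale=0.8, size=n_teachers)

def deciles(scores):
    """Bin scores into deciles 1 (bottom 10%) through 10 (top 10%)."""
    ranks = scores.argsort().argsort()  # 0 = lowest score
    return ranks * 10 // len(scores) + 1

shift = np.abs(deciles(model_a) - deciles(model_b))
print(f"differ by >= 1 decile: {np.mean(shift >= 1):.0%}")
print(f"differ by >= 3 deciles: {np.mean(shift >= 3):.0%}")
```

Even with a strong shared signal, a noticeable fraction of teachers lands several deciles apart under the two models, which is the instability the research describes.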

He then made a point that’s come up time and again in my statistics course, “Any test measures, at best, a representative sample of the target domain.”

But we’re not seeing samples that are representative. According to Ho, “In practice, it is an unrepresentative sample that skews heavily toward the quickly and cheaply measurable.” We’re not learning about the population. Put differently, we can’t know all that we want to know. Anyone who says differently is selling something.

When questioned on teacher assessment in his recent Twitter Town Hall, Sec. Duncan said he favored multiple forms of assessment in gauging teacher effectiveness. Nominally, Ho explained, this makes sense, but in effect it can have unintended negative consequences.

Here too, Ho cautioned against the current trend. Yes, value added is often used in concert with observation data or other similar measures. If those observations are counted as “meets expectations” or “does not meet expectations” and all teachers meet expectations, though, we have a problem. The effect is to mute the observation measure’s impact in the composite. While value added may be nominally weighted at 50%, if it is the only aspect of the composite accounting for variance, “the contribution of these measures is usually much higher than reported, as teacher effectiveness ratings discriminate much better (effective weights) than other ratings.”
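
A toy calculation (my own illustration, not Ho’s) makes the “effective weight” point concrete: if nearly every teacher earns the same observation rating, the observation component contributes almost no variance to the composite, and value added ends up deciding virtually the whole ranking despite its nominal 50% weight.

```python
import numpy as np

rng = np.random.default_rng(1)
n_teachers = 1000

# Nominal composite: 50% observation rating, 50% value-added rating.
# Suppose 98% of teachers "meet expectations," so observations barely vary.
observation = (rng.random(n_teachers) < 0.98).astype(float)  # almost all 1.0
value_added = rng.normal(size=n_teachers)                    # spread out

composite = 0.5 * observation + 0.5 * value_added

# The components are (nearly) independent, so their variances add. The share
# of composite variance each contributes is its effective weight.
var_obs = np.var(0.5 * observation)
var_va = np.var(0.5 * value_added)
total = np.var(composite)  # ~ var_obs + var_va
print(f"effective weight of observations: {var_obs / total:.1%}")
print(f"effective weight of value added:  {var_va / total:.1%}")
```

On these assumptions the observation score’s effective weight collapses to roughly 2%, even though it is nominally half the composite.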

Ho’s stated goal was to demystify value added. In that he succeeded.

He left us with his two concerns:

  • The current incentive structures are so obviously flawed, and the mechanisms for detecting and discouraging unintended responses to incentives are not in place.
  • The simplifying assumptions driving “value added,” including a dramatic overconfidence about the scope of defensible applications of educational tests (“numbers is numbers!”), will lead to a slippery slope toward less and less defensible accountability models.

I’d hate to think we’re more comprehensive in our selection of restaurants than in our assessment of teachers.