Monday, January 21, 2008

New York Measuring Teachers by Test Scores

This is a really, really important development -- which is why it made the front page of today's NY Times -- and is a bold and courageous step by Bloomberg and Klein, because the UFT will fight this tooth and nail.

New York City has embarked on an ambitious experiment, yet to be announced, in which some 2,500 teachers are being measured on how much their students improve on annual standardized tests.

The move is so contentious that principals in some of the 140 schools participating have not told their teachers that they are being scrutinized based on student performance and improvement.

Nobody disputes that teacher quality is, by far, the single most important factor in student learning and achievement, so it logically follows that teacher quality is the single most important thing that needs to be measured to improve our schools and close the achievement gap.
 
At first glance, it wouldn't seem so difficult to measure this.  The primary job of teachers is to impart knowledge to children (don't even get me started on building self-esteem -- self-esteem is not taught, but rather comes from genuine knowledge and achievement), so you simply test kids at the beginning of the year, test them again at the end of the year and see how much they have learned.
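
For the spreadsheet-minded, here is a minimal sketch of that naive start-of-year versus end-of-year comparison in Python. The scores, the classroom names and the `average_gain` helper are all invented for illustration; this is not the method New York City or any of the researchers quoted here actually uses.

```python
from statistics import mean


def average_gain(scores):
    """Average end-of-year minus start-of-year score for one classroom.

    `scores` is a list of (start_of_year, end_of_year) pairs, one per student.
    """
    return mean(end - start for start, end in scores)


# Hypothetical test scores on a common scale (made up for illustration).
classroom_a = [(410, 455), (380, 430), (500, 540)]
classroom_b = [(420, 425), (390, 392), (510, 508)]

gain_a = average_gain(classroom_a)
gain_b = average_gain(classroom_b)

print(f"Classroom A average gain: {gain_a:.1f}")
print(f"Classroom B average gain: {gain_b:.1f}")
```

A raw difference like this makes no adjustment for where students started or what they face outside school.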
 
Of course, it's not this simple: though I think most tests do a pretty good job of measuring how well a child can read or do math, tests aren't perfect and some subject areas are harder to measure than others; students who live in broken, violent and/or dysfunctional homes or who don't speak English well or who qualify for Special Ed are obviously more of a challenge to teach; sometimes students have more than one teacher in a particular area (both an English and History teacher, for example, might be teaching a student how to write better); etc.
 
However, it's important not to let the perfect be the enemy of the good.  As this researcher correctly points out, a well-designed measurement system is effective at identifying top and bottom performers:

William Sanders, a researcher in North Carolina who was one of the first to begin evaluating teachers and schools based on student test score improvements, said that while such a system could be used to make broad judgments, it was difficult to use it with precision enough to find differences among teachers who are simply average.

“Can you distinguish the top teachers? Yes,” Mr. Sanders said. “Can you distinguish the bottom teachers? The answer is yes, too. But it would be risky to make decisions using information at the classroom level for teachers who are just in the middle. You might miss a lot that way.”

But what about Randi Weingarten's claim that “There is no way that any of this current data could actually, fairly, honestly or with any integrity be used to isolate the contributions of an individual teacher”?  It's a clever but disingenuous argument: she's saying that because the system isn't perfect, it's therefore useless.  But nobody is claiming that this system should be the only tool used to evaluate teachers, as this principal correctly notes:

“This should simply be one more way to think about things,” said Frank A. Cimino, the principal of P.S. 193 in Brooklyn, who said he was participating in the experiment. “It is going to tell you some things you don’t know, but it will miss the other things that go on in a classroom.”

Then Randi argues: “These tests were never intended and have never been validated for the use of evaluating teachers.”  In fact, numerous studies show that evaluation systems based on student test scores identify effective teachers very accurately.  For example, see page 2 of this presentation I posted here www.tilsonfunds.com/Personal/Teacherquality.pdf, which shows results from a Hamilton Project study: student math performance in Los Angeles in the third year is strongly affected by teacher quality as measured in the previous two years.  Pages 4-7 show similar results for both math and reading in Dallas.
 
Pages 14 and 15 underscore the fact that years of experience (after the first two years) and what type of certification a teacher has do not predict student achievement, as today's article correctly points out:

But experts are grappling with how to determine what makes a good teacher. Neither graduate programs in education schools nor previous academic records are reliable predictors, they say. The federal No Child Left Behind law requires that districts place a “highly qualified” teacher in every classroom, which typically means one who has completed a certification program, but this, too, is not necessarily a good indicator of quality.

“It seems hard to know who is going to be effective in the classroom until they are actually in the classroom,” said Thomas J. Kane, a professor of education and economics at Harvard, who is conducting several research projects on teacher quality in New York City, and who is involved in the new effort.

Mr. Kane said there was little evidence that teachers with the “right paper qualifications” were any more effective than those without them. “But most school districts spend very little time trying to assess how good teachers are in their first couple of years, when it is most important,” he said.

Turning to another lame argument, Randi says:

Ms. Weingarten said the system was not needed. “Any real educator can know within five minutes of walking into a classroom if a teacher is effective,” she said.

This is nonsense as well.  Any "real educator" will tell you that it's extremely difficult to determine if a teacher is effective based on five minutes (or five hours, five days or, sometimes, even five months!) of classroom observation.
 
That said, the main problem principals face is not figuring out who their best and worst teachers are, but rather finding a way to overcome union resistance and the contract to reward and incent top performers and, more importantly, get rid of teachers who are unwilling or unable to effectively impart knowledge to children.
 
This is why having data is so important.  Today, because classroom observation and other teacher evaluation metrics are so subjective, it's virtually impossible to deny a teacher tenure or remove a teacher for being ineffective (especially given the UFT's policy of grieving every single teacher removal case, no matter how egregious the circumstances).  With an effective measurement system rooted in hard data, these problems can be addressed.
 
Randi's real problem is that any system that identifies bottom performers is terrifying to those teachers -- and their union.  Hence, while Randi claims she's in favor of identifying and (if they don't improve) removing underperforming teachers, this is just PR -- here are Randi's true feelings: “If one permitted this, it would be one of the worst decisions of my professional life.”  On this, we can agree.  Allow me to translate what she's saying: "If I don't fight this tooth and nail, my members -- a substantial fraction of whom are ineffective and thus rightly terrified of being identified and losing their jobs -- will vote me out."
 
To see how substantial a fraction, see page 3 of my slides at www.tilsonfunds.com/Personal/Teacherquality.pdf.  Bain & Co. did a study in the late 1990s that measured what students at various grade levels in Boston public schools had learned during a year, mapped it back to individual teachers and discovered that one-third (!) of reading/English and math teachers failed to impart any knowledge during an entire school year!  To be clear, it wasn't that the students taught by the bottom third of teachers learned somewhat less than one year of material over the course of a school year -- they didn't learn anything at all!  I'm sure you will be shocked -- SHOCKED -- to hear that the study was attacked by the teachers' union so vociferously that Bain buried it (fortunately, I have friends at Bain).
