Measuring Teaching: Gates Foundation

March 7, 2012 at 8:08 am 15 comments

We’ve written before in this blog about the importance of finding good measures of teaching quality. I didn’t know that the Gates Foundation was working on this, or that it is developing a set of five instruments. Bill Gates just published an op-ed in the NYTimes arguing for exactly this kind of multifaceted evaluation of teachers, and arguing that publicly humiliating poorly performing teachers would not be effective.

The MET project: installment two
The Bill and Melinda Gates Foundation has released an update to its preliminary findings for its Measures of Effective Teaching (MET) project, investigating the properties of five instruments for classroom observation: Framework for Teaching (FFT), Classroom Assessment Scoring System (CLASS), Protocol for Language Arts Teaching Observations (PLATO), Mathematical Quality of Instruction (MQI), and UTeach Teacher Observation Protocol (UTOP). Researchers assessed each instrument using two criteria: reliability and validity. All five instruments were positively associated with student achievement gains. Evaluators found that reliably characterizing a teacher’s practice required averaging scores over multiple observations. They also found that combining observation scores with evidence of student achievement gains on state tests and student feedback improved predictive power and reliability. Combining observation scores, student feedback, and student achievement gains was better than graduate degrees or years of teaching experience at predicting a teacher’s student achievement gains with another group of students on the state tests. Combining observation scores, student feedback, and student achievement gains on state tests was also better than graduate degrees or years of teaching experience at identifying teachers whose students performed well on other measures.
See the report:
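Why does one observation fall short? The report’s own method isn’t described in this post, so what follows is only a minimal sketch under a stated assumption. It uses the standard Spearman-Brown prophecy formula, which predicts the reliability of the average of k parallel observations from the reliability r of a single one: k·r / (1 + (k - 1)·r). The function name and the single-lesson reliability of 0.40 are hypothetical, chosen for illustration. They are not figures from the MET report.

```python
def spearman_brown(single_reliability: float, k: int) -> float:
    """Predicted reliability of the mean of k parallel observations.

    Assumes the observations are parallel measures (same true score,
    equal error variance), which is the premise of Spearman-Brown.
    """
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    r = single_reliability
    # Averaging k observations shrinks error variance relative to true variance.
    return k * r / (1 + (k - 1) * r)


if __name__ == "__main__":
    # Hypothetical single-lesson reliability; NOT a number from the MET report.
    single_lesson = 0.40
    for k in (1, 2, 4):
        print(k, round(spearman_brown(single_lesson, k), 2))
    # prints: "1 0.4", "2 0.57", "4 0.73" (one pair per line)
```

Under these assumptions, going from one observed lesson to two raises the predicted reliability from 0.40 to about 0.57, and four lessons reach about 0.73. That is the intuition behind averaging scores across several lessons and observers.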



15 Comments

  • 1. Bonnie  |  March 7, 2012 at 8:13 am

    Read these two articles (one from the NYTimes and one from the Washington Post) for a look at what happens when these measures are used to humiliate teachers rather than help them improve. I cannot imagine why anyone would want to go into teaching in the current climate.

  • 2. Baker Franke  |  March 7, 2012 at 8:47 am

    “All five instruments were positively associated with student achievement gains.”

    If student “achievement”, which seems to mean scores on standardized tests, is the primary bar against which we rate teachers, I quit. I hope that these tools for measuring teaching have other facets besides how the students do that year. I can tell you that much of my “good” teaching happens with students who had good teaching long before they got to me – I teach HS Comp. Sci. I ride the coattails of their primary school teachers, who primed them well to be good students; I get credit for their “achievement” when really, in many cases, it has little to do with me. Similarly, college and university profs benefit from whatever good I can impart to my students. You’re welcome.

    Didn’t you cite an article in this very column (maybe I saw it elsewhere) discussing this issue? The problem isn’t teachers, it’s schools as a whole. Teachers do a lot outside the classroom to make schools decent places, and that is part of the job too.

    What if the teacher who gets the best student “achievement” gains is a total asshole, making for a poor, cancerous professional climate in the school? Anyone who’s taught in a school knows these people exist, and knows that the student population as a whole would benefit from their removal.

    • 3. Bradley Beth  |  March 7, 2012 at 11:11 am


      I can’t speak to the details of the others, but the UTOP is not testing-oriented. Rather, it’s based on a rubric for qualitatively documenting the teacher/student/classroom environment. Nothing new there, but it assumes a few novel premises:

      1. Observers should be well versed in the content, so that content and pedagogy can be observed together. Previously, observation protocols assumed that content was irrelevant.

      2. Observations are done by at least 2 independent observers who have been extensively trained on use of the protocol.

      3. The UTOP is largely style-agnostic, in the sense that the observation does not hinge on a lesson being specifically lecture-, inquiry-, project-, etc. oriented.

      So, in your case, I imagine a situation as follows:

      You are teaching a lesson on selection sort. An observer *with knowledge of this topic* who has been trained on the UTOP documents your behaviors as well as the students’. It is the first day you have ever introduced sorting, so you work through a cool, physical activity that has students sorting themselves by height.

      The next day you teach insertion sort. A *different* observer *with knowledge of this topic* who has been trained on the UTOP documents your behaviors as well as the students’. Sorting is old hat by now, so you do a direct instruction lecture with guided practice.

      Now using the UTOP, these two independent observers should be able to “rate” you (with agreement) based on criteria such as:

      A. The instructional strategies and activities used in this lesson clearly connected to students’ prior knowledge and experience.

      B. The significance of the … content, including how it fits into the “big picture” of the discipline, was made explicit to the students.

      C. The teacher’s questioning strategies developed student conceptual understanding of important … content (e.g. emphasizing higher order questions, appropriately using “wait time,” identifying prior conceptions and misconceptions).

      Although high UTOP ratings may correlate with high student test scores, test scores are not part of the measurement.

      I know personally that the developers of the instrument wanted to provide an alternative to testing-based assessment of teachers. It’s one thing to say that testing-based performance assessment is wrong, but the politicians and state agencies rightly respond with “what then?” That good teaching practices lead to higher test scores is obvious, but demonstrating that to the bottom-liners is important pragmatically.

      • 4. Baker Franke  |  March 7, 2012 at 12:16 pm

        Well that all sounds fine. And I get what you’re saying. I’m all for good and better teacher measurement since most of the time I have no idea if what I’m doing is working or not.

        I think a big step is making explicit what you implied but the article did not: that teacher behavior might be predictive of student achievement, but not causal. The measurement of a teacher should somehow capture how well the teacher gives the students in his/her classroom the best odds of achieving, rather than looking at the achievement itself.

        What I’ve been thinking about is how to figure out a Moneyball approach to teaching: Billy Beane looked for players who got on base a lot, not necessarily players who got a lot of hits or scored a lot of runs. He put together a team of high-OBP (on-base percentage) guys, believing that it put the odds of scoring runs in the team’s favor, since guys who get on base don’t make outs, etc.

        If we could find a way to measure teaching like this (which it sounds like we might be onto), so that a teacher was measured by the degree to which their behaviors increased the odds of student learning rather than by whether or not the student actually achieved, that would be huge. It would be a step away from “blame the teacher” and toward a whole-school approach, where a teacher is recognized as an important piece of the puzzle, but not directly and personally responsible for the success or failure of an individual student.

        • 5. Baker Franke  |  March 7, 2012 at 12:19 pm

          …but of course, for a Moneyball approach we do need some kind of data to go on, so data collection is necessary. But it’s going to take a lot of re-education about how that data will be used before people (like me) let their guard down and stop feeling threatened.

        • 6. Bradley Beth  |  March 7, 2012 at 12:45 pm

          Agreed all around. As far as causality goes, I’d say there’s more to it than prediction and less than causation. In math terms, good teacher behaviors are “necessary but not sufficient” for classroom learning to take place.

          Policymakers seem to be gravitating toward *necessary and sufficient* (i.e., “it is all the teachers’ responsibility”) OR *not necessary* (i.e., “teachers are glorified babysitters; let’s replace them with the Khan Academy”).

          “I’m all for good and better teacher measurement since most of the time I have no idea if what I’m doing is working or not.”

          Yes, it really isn’t helpful to teachers if it isn’t oriented toward the process of teaching.

          Funny anecdote time:

          I taught AP CS in Texas. The district admins decided our TAKS scores were too low in general, and mandated that principals take a “data-driven” approach to correcting this. Our principal decided the best way to do this was to have each teacher comb over their students’ test scores disaggregated by item so that they could tailor instruction to problem areas. Her measure for teacher performance was? *drumroll* How many times each teacher logged into the system to view their students’ data.

          I remember objecting in the staff meeting, particularly to the fact that the data were *static*. If they never change, what’s the point of viewing them over and over? And of course, how many times I log in does not equate to how I use the data. So I looked at the data, made notes, wrote a script to log me in and out automatically, let it run over the weekend, and then never looked at the site again. At the end of the year, she must have been astounded at how good a teacher I was: second place in the metrics was a score of 36; mine was over 58,000.

  • 7. Baker Franke  |  March 7, 2012 at 12:50 pm

    Data in the hands of decision-makers can be a dangerous thing.

  • 8. Leigh Ann Sudol-DeLyser  |  March 7, 2012 at 12:50 pm

    Notice that several of these assessments rely on the subjective evaluation of administrators. They are going to need professional development as well.

  • 9. Garth  |  March 7, 2012 at 12:51 pm

    “since most of the time I have no idea if what I’m doing is working or not.” As a CS teacher I live in this zone.

  • 10. gfrblxt  |  March 8, 2012 at 7:59 am

    I’m following the discussions above with great interest, and I wonder: how long does it take to train the observers for the UTOP ratings? It strikes me that you would need a LOT of observers to pull this off, even for one school.

    • 11. Bri Morrison  |  March 8, 2012 at 8:27 am

      I would think computer-based training could be used to train the observers here, with lots of problem-based learning examples (videos) in which trainees compare their evaluations to an “expert” evaluation.

  • 12. Alicia  |  March 8, 2012 at 12:45 pm

    Disclaimers: (1) I work for the UTeach Institute. My opinions don’t reflect my organization and are completely my own. (2) I’m married to the other Beth commenting here. 🙂 (3) I’ve been trained on the UTOP, but I know nothing about the other MET measures. The UTOP is an observational protocol used to measure a set of really deep indicators of teaching and learning in STEM (and only STEM).

    @Bonnie – Agree. As far as the UTOP goes, the intention is to use it for both evaluation and intensive professional development.

    @Baker – The UTOP doesn’t even look at student achievement. It’s an observational measure based entirely on PCK (pedagogical content knowledge) and instruction. In my limited experience with it, it goes about as deep into really effective teaching as any of these instruments I’ve seen. For example, it doesn’t focus as exclusively on student engagement as many observational instruments do; students can be engaged and not learning, or learning inaccurate content. Absolutely agree with your other points. And the UTOP was designed in large part by a “master teacher” at UT Austin, one who was successful for years in the classroom and now trains UTeach teachers at UT Austin.

    @Bradley – I would agree that the UTOP is style-agnostic, although UTeach definitely (and admittedly) encourages a very project-/problem-/inquiry-based approach. I also agree that UTeach and the developers absolutely want to avoid measuring teacher effectiveness by student achievement (alone). That’s one thing UTeach has taken a hit on for years. Funders who want to fund UTeach replication ask, “How do your teachers’ students do on standardized tests?” That’s the one thing they want to know. We haven’t measured this because we think it’s a much more complicated question, and one that we shouldn’t really be asking (over others).

    @Baker – Again, speaking only from my experience, I would say that UTeach is absolutely ANTI blame the teacher. We have to walk a fine line because that sometimes involves saying things (like, “We don’t think it’s valuable to measure our teachers’ effectiveness by their students’ achievement”) that are not what funders/politicians want to hear, and they’re the ones who control the purse strings that allow us to continue doing the work we do (which we are all passionate about and think is very good and out of the box).

    @LeighAnn – True, although I think the UTOP is not designed to be used by administrators. If it were, they’d need a ton of training and PD to go with it.

    @gfrblxt & @Bri – I can tell you that the training I did was the mini-training, and it took an entire day. We really should have met again to finish, and then probably spent another day getting on the same page. (We weren’t actually going to be using this as observers, so we didn’t go through that process.) The developers also envision, after the major training, a recalibration process (viewing videos and rating online) before each round of observations. It’s extremely time-intensive, but it’s GOOD (again, my opinion).

    Sorry to only talk about the UTOP, but it’s the one I know, and I can’t say enough good things about UTeach or about the excellent intentions of its developers (who are the same people who developed the UTOP and who are absolutely, without question PRO teacher). It’s absolutely my favorite thing about working here… everyone is completely passionate about what they do and wants to see good teachers, good support from politicians/funders/the public, good learning, … Always the goal. We just have a really long way to go.

  • […] I’ve written before about computer science pedagogical content knowledge (PCK). Phil Sadler and his colleagues just published a wonderful study about the value of PCK. He found that science teachers need to know science, but the most effective science teachers also know what students get wrong: their misconceptions, their learning difficulties, and the symptoms of misunderstanding. I got a chance to ask him about this paper, and he said that one implication he sees is that the work offers a way to measure PCK, and measuring something important about teaching is both hard and important. […]

  • […] How would one measure extraordinary, innovative teaching?  We have a difficult time measuring regular teaching! […]

