Evaluation as a Voice for the Kids in the Back of the Room

June 5, 2010 at 4:48 pm

I’ve been lurking recently on a conversation between K-12 computing teachers about learning in their classes.  One teacher (of elementary school children) spoke of how much programming her kids do.  She said that they easily use variables, tests, and iteration.  A respondent spoke of all that his young daughter could do in programming, including creating elaborate stories that she shared with others on the Internet.

So what exactly do those claims mean? I’m sure that they mean exactly what they say, and that each speaker is telling the truth. The first teacher certainly has kids who are mastering variables, conditionals, and iteration. The proud father is certainly describing what his daughter can do. But are broader claims being made? Is the teacher telling us that all her students are mastering all those things? Or are the teacher and father saying that because some kids are mastering these concepts, it’s possible for any student to master them? Is the teacher making a claim about the methods or tools being used, that those are particularly effective techniques?

What if those broader claims are wrong?  What if not all students in the teacher’s class are mastering these concepts?  What if the techniques are not effective for all students?  What if it’s not possible for all students to learn to the same level with the same techniques?  I’ll bet that some of these broader claims are wrong.  But how would anyone know? How would the teacher know?

A Different Take on Evaluation

I often hear education developers and researchers bemoaning standardized testing — that it’s inadequate for measuring student learning (probably true), that it can cramp the style of a teacher (also probably true), that it encourages teachers to teach to the test (also probably true), and that good, veteran teachers already know how things are going in their classes.  That last part is the one that I now believe is false.  Without some kind of careful, systematic measurement, most teachers do not know what is going on in their classes.  University teachers are particularly ill-informed as to what’s going on in their large lecture classes. The more that I learn about the Media Computation classes, the more I realize that I only knew about averages in the class, not the distribution of learning and performance across the class.

I do mean evaluation here.  Assessment is about measuring and improving students’ learning. Evaluation is about measurement that leads to decisions.  Most decisions about computing education (and probably education more generally, but my data are drawn from computing education) are made without measurement, without data, based on a teacher’s gut instinct — and that “gut instinct” is most certainly wrong.  “Gut instinct” told us that the Earth is flat, that the sun and moon revolve around the Earth, and that the only programming language we need ever learn is C and its variants.  A teacher’s “gut instinct” is probably informed by data. But which data, and how does a teacher avoid bias in interpreting those data?

Teachers are heavily influenced by the best students in the class.  In her book on Computers and Classroom Culture, Janet Schofield found that teachers tended to talk to the best students the most — they were the most interesting to the teachers, and the ones that the teachers wanted to please the most.  Thus, when a teacher says that something is “working” or “not working” without systematic, evaluative data, the teacher is most often talking about only the upper end of the class distribution.  As has been discussed previously in this blog, the higher-knowledge and lower-knowledge students have very different needs and respond differently to educational interventions.

Evaluation should not be about limiting the teacher; it does not even need to create a perfectly accurate picture of students’ learning.  Evaluation should give voice to the lower-ability kids, the kids who often hide out in the back of the class.  The teacher easily hears the kids at the front of the class, and most often tunes the class to their needs.  Careful, systematic evaluation is about hearing all the students, including those that the teacher may not be excited about hearing.

Examples from Media Computation

Lana Yarosh’s study of our Media Computation data structures class is a case in point.  Lana interviewed seven students in the class, analyzed those transcripts, and then developed a survey for the whole class in order to check the generality of her interview claims. One of the interviewed students hated the media content: “I didn’t take this class to learn how to make pretty pictures.” Then in the survey, 11% of all the students agreed with the statement that “Working with media is a waste of time that could be used to learn the material in greater depth.” However, 60–70% (the percentage differed for each statement) agreed that media made the class more interesting and that they did extra work on at least one assignment in order to make it look “cool.”  That last one is important: extra work means more time on task, which creates more opportunity to learn.
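Checking whether interview themes generalize is mostly a matter of tabulating agreement rates across the whole class. A minimal sketch in Python, using invented responses rather than Lana’s actual survey data:

```python
# Tabulate Likert-style agreement rates per survey statement.
# The responses below are invented for illustration only.
from collections import Counter

responses = {
    "media is a waste of time": ["agree"] * 11 + ["disagree"] * 89,
    "media made the class more interesting": ["agree"] * 65 + ["disagree"] * 35,
}

def agreement_rate(answers):
    """Fraction of respondents who agreed with a statement."""
    counts = Counter(answers)
    return counts["agree"] / len(answers)

for statement, answers in responses.items():
    print(f"{statement}: {agreement_rate(answers):.0%} agree")
```

The point is not the arithmetic, which is trivial, but that every student in the class contributes one row, rather than only the students who speak up.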

We don’t know the grade distribution of the students in Lana’s study.  Our hypothesis is that the 11% who didn’t like the media were drawn mostly from the top students.  Those are the students who want the content, as quickly as possible and with as few frills as possible.

At Georgia Tech, the teachers of the Media Computation data structures class are under a lot of pressure to reduce the amount of media coverage in the class.  The top students want less media.  Those top students are the ones who become undergraduate teaching assistants (TAs), and those TAs are not keen on the media, so the teacher also has the TAs pushing for reducing the media content.  Without Lana’s study, the teacher would have no way to know that the majority of students actually like the media content.

Davide Fossati has been studying the Media Computation CS1 class recently.  We know that the Media Computation class has led to reduced failure rates, but we don’t know if it leads to similar learning to our other CS1’s. We have three different CS1’s at Georgia Tech: the Robotics one for CS and Science majors, the MATLAB one for Engineering majors, and the Media Computation one for Liberal Arts, Architecture, Management, and most Computational Media majors.  After the robotics course, CS majors take a course on object-oriented programming in Java.  (I actually designed the Media Computation data structures course to come between the CS1 and the Java course, but almost no student takes that path because of the cost of an extra course.  Design and implementation are different things.)  We can compare the three different CS1’s by looking at performance in that Java course.  That’s not a measure of learning, but it is an equal comparison point — it’s an evaluation measurement, even if it’s not a good assessment measurement.

Davide has looked at four years’ worth of data from these courses.  What he finds is that there is a significant difference between the three CS1’s in terms of performance in the Java course, but those differences are grade dependent.  Students who get A’s in these three classes perform identically in the Java course.  However, students who get lower grades in the Media Computation class do significantly worse in the Java course than comparable students from the other two CS1’s.
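A grade-dependent comparison like this amounts to grouping students by CS1 course and CS1 grade band, then comparing follow-on performance within each band rather than overall. A minimal sketch, with fabricated records standing in for Davide’s data:

```python
# Compare follow-on Java-course GPA by CS1 course, within CS1 grade bands.
# The student records below are fabricated for illustration only.
from collections import defaultdict

students = [
    # (cs1_course, cs1_grade, java_gpa)
    ("robotics", "A", 3.6), ("matlab", "A", 3.5), ("mediacomp", "A", 3.6),
    ("robotics", "B", 3.0), ("matlab", "B", 2.9), ("mediacomp", "B", 2.4),
    ("robotics", "C", 2.5), ("matlab", "C", 2.4), ("mediacomp", "C", 1.9),
]

# Group Java GPAs by (CS1 grade band, CS1 course).
groups = defaultdict(list)
for course, grade, java_gpa in students:
    groups[(grade, course)].append(java_gpa)

for (grade, course), gpas in sorted(groups.items()):
    mean = sum(gpas) / len(gpas)
    print(f"CS1 grade {grade}, {course}: mean Java GPA {mean:.2f}")
```

Comparing only overall means would blur exactly the effect Davide found: identical outcomes in the A band, diverging outcomes below it.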

Davide’s data are the first that we have that give us a comparison point between our three CS1’s.  His data tell us that the top students (just under half the class) are doing just fine, but the rest of the students, while passing, are not reaching the same level of performance.  Maybe that’s acceptable; that’s the decision part of using evaluation data.  My point is that without Davide’s data, we’d have no idea that the lower half of the students (in my metaphor, the students in the back of the room) were not performing comparably to the other students.  They would not have a voice in the decisions to be made about the course and how to support all the students in the course.

Conclusion: Consider Threats to Validity

As teachers, we’re used to making the decisions.  We make decisions about what “counts” towards grades and what doesn’t.  We decide standards on tests and grades for the class.  We also make decisions about what to teach and how.

Teachers often tell me about making changes in their class because “students don’t like this” or “don’t think that’s worthwhile” or “aren’t doing well enough in that subject.”  That may be right.  I am asking teachers to consider the possibility that they’re wrong, that they are only looking at some of the students.

In our human-centered computing classes at Georgia Tech, we talk about “threats to validity.”  We consider the possibility that our claims are wrong, and how we could know if we are wrong.  Teachers should do this, too.

Next time you make a class change because of a claim about “the students in the class,” please gather some evaluative data from all the students in the class.  Consider the possibility that you’re not hearing from the whole class.  Do a survey to find out what the whole class thinks.  Use an exam question that lets you see how the whole class is performing. Gather some systematic data that can speak for those kids hiding out in the back of your class.
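Looking at the whole distribution, not just the average, is what lets the back of the room show up in the data. A minimal sketch with invented exam scores: two classes with the same mean can hide very different numbers of struggling students.

```python
# Two classes with identical mean scores but very different spreads:
# the average alone would hide the struggling students in class B.
scores_a = [55, 60, 70, 80, 85, 90, 95, 65, 75, 75]
scores_b = [20, 30, 95, 100, 90, 100, 95, 25, 95, 100]

def summarize(scores):
    """Return the mean and the share of students scoring below 60."""
    mean = sum(scores) / len(scores)
    below_60 = sum(1 for s in scores if s < 60) / len(scores)
    return mean, below_60

for name, scores in [("class A", scores_a), ("class B", scores_b)]:
    mean, below = summarize(scores)
    print(f"{name}: mean {mean:.0f}, {below:.0%} scoring below 60")
```

Both classes average 75, but three times as many students in class B are below 60; a report of the mean alone speaks only for the front of the room.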

11 Comments

  • 1. Robert Talbert  |  June 7, 2010 at 6:41 am

    Some consideration about tenure and promotion has to be factored into this discussion too. Too many promotion and tenure systems at colleges and universities have virtually no basis in actual student learning but rather in the end-of-semester evaluation, which is all about what students like/dislike or find worthwhile/not worthwhile. So for a pre-tenured professor, or one who is near promotion, objective student learning evaluation data is all well and good – but at the end of the day, what “really matters” is what students feel about the course.

    That’s a bit cynical, and the best promotion and tenure systems balance student impressions with more objective evidence, in the form of impressions from deans and colleagues who don’t have any objective data to work with either but who at least can tell student learning from non-learning. But it is an unfortunate fact that objective measurement and analysis of student learning data is virtually invisible in a lot of P&T processes, and pragmatism on the part of professors often wins out over principled and scientific gathering and use of data.

  • 2. Mark Guzdial  |  June 8, 2010 at 1:29 pm

    Hi Robert,

    Evaluation isn’t just about learning. Sometimes you care about pass/fail rates, or student attitudes. Even if your goal is to influence end-of-semester student attitudes, my point in this piece is that broad, systematic collection of data is necessary to give all the students a voice, to inform decisions and change.

    Cheers,
    Mark

  • 3. Lisa Kaczmarczyk  |  June 10, 2010 at 12:27 am

    Hi Mark,

    RE the first part of your post. I had several conversations with high school teachers over the past few months, as I was trying to learn more about their situation on the ground. I heard something repeatedly that I had not anticipated. They told me that it is not the bottom students that they are most concerned about but the ones in the middle. They said that the top students have plenty of resources available and tend to do well in most situations, as you note. They then commented that there are lots of resources available for the students having the most trouble – their school systems have put a focus on putting into place programs, resources, for those on the brink of being left completely behind. The teachers I spoke with (several subjects, all science or math or computing and very dedicated) said that it is the kids in the middle who are really getting left behind because they aren’t considered good enough for gifted and talented programs and they aren’t doing poorly enough to merit attention to help them excel – which these teachers felt that many of them could easily do.

    I have also been thinking of this in the context of the admissions interviews I do for my undergraduate institution (have been doing for 20 odd years) and I see similar symptoms. The really top students jump out at you and have often been provided special extras of one sort or another; others, who could presumably do as well (speculation on my part now) but haven’t been noticed, end up not shining. This has been bothering me.

    So I think maybe we need to talk about these middle kids that are falling through the cracks in a different way.

    • 4. Mark Guzdial  |  June 10, 2010 at 1:32 pm

      Thanks, Lisa. It’s a good point, which meshes with my main point. Good evaluative data is broad and systemic. Just listening to the smart kids is not. We need to make educational decisions for more than just the top kids.

      Cheers,
      Mark

  • 5. Alan Kay  |  June 10, 2010 at 1:57 pm

    Hi Mark,

    It certainly seems important to gather a good model of how well a student has been learning.

    This is done in music, art, and sports learning by teachers observing actual performance of real processes, and through discussions (for example of harmonic theory). It is rare for such teachers to be off the beam about their students.

    Part of the reason this works is the nature of “real performance”, but I think a large additional reason is that the student-teacher ratio is generally very different from what seem to be absurd “factory methods” in use in university and lower grades.

    The great art teacher Betty Edwards will actually guarantee great results if she can get monetary support for 1 assistant for every 7 students (for her curriculum, this is the most students whose progress can be assessed and helped by one person as the class ensues).

    If the student teacher ratio could be different, then there are many more really good avenues for real assessment. For example, essay writing and equivalents are a very good way to assess a wide range of subject learning, but it takes time for assessors to really read and judge essays (they’ve been given up on in California for Language Arts despite considerable evidence as to their usefulness and power).

    The band-aid over the festering wound here might be trying to do assessment in a cost- and time-effective manner, even if the measures not only don’t work, but create distracting, misleading indicators.

    Similarly, we can easily make up 4 or 5 categories of students on the basis of the kinds of help they need to learn a subject. Poor, middle, and talented are too few. But despite the logic and need for this, the various school and university systems try to shove students through very similar processes (I think largely for cost reasons).

    Given that most parents are the product of pretty bad schooling, the better student teacher ratio in the home is difficult to leverage.

    Until the “great computer tutor in the sky” arrives, we should probably put an enormous amount of effort into restructuring most schooling into peer-peer instruction and assessment.

    Best wishes,

    Alan

    • 6. Mark Guzdial  |  June 11, 2010 at 8:45 am

      Hi Alan,

      I completely agree that, with enough people, we could do good assessment. With good assessment, good evaluation is easy. Yes, we have a “factory” model of education, which is ineffective. My concern (in this blog post) is that we do not even recognize that we have a factory model, and pretend that the five students we talk to out of a class of 300 are somehow representative. My best teaching experience ever was teaching CS1 (MediaComp) to a group of 12 (!) students in a residential setting on a campus in Oxford. I really knew all the students and what each could do. I could do reasonable assessment, and I could trust my evaluation of the class. That doesn’t scale well to the 150-seat lecture hall where we normally teach that same class.

      Best wishes,
      Mark

  • 7. Jeff Graham  |  June 10, 2010 at 2:50 pm

    Not so sure about peer to peer instruction. I’ve seen that work (I did it when I was in elementary school) and I’ve seen it fail miserably (in college classrooms). Of course, I can say the same about every other form of instruction that I’ve seen as well. I don’t think there is a magic bullet instruction technique that works for everyone. Perhaps if we could categorize the students based on which instruction method works best for them we could get somewhere. No idea how that would work; maybe the brain scientists will work that out someday.

  • 8. Alan Kay  |  June 11, 2010 at 9:25 am

    Hi Mark and Jeff,

    I don’t think the factory model can be made to work well enough — so attempts to patch it are a waste of time and (worse) distractive from what should be addressed.

    Though “peer-peer” is the term used, it is a bit of a misnomer (since we are really talking about e.g. 7th graders teaching 5th graders, or upper division students helping lower division students). In some grad schools in the past (e.g. CMU, Utah, etc.) it was part of the ethics of the grad students to bring the newly anointed up to speed. CMU went to great lengths (the “immigration courses” etc.) and these were highly effective.

    Leaving the new ways to cheat the system aside, the key issue is maintaining quality control over the process (this is a key issue already in most classrooms because of what the teachers don’t understand about the subject matter — including in university).

    And categorizing the students wrt the kinds of help they need was just what I was referring to.

    I think, for starters, that taking Betty Edwards’ 1:7 ratio as a rule of thumb, and making “5” categories of learners, would produce much better results, even without a big and difficult cognitive study.

    Cheers,

    Alan

  • 9. Jeff Graham  |  June 13, 2010 at 11:03 am

    That makes more sense to me. The “peer to peer” we did as elementary students was 6th graders (like myself) tutoring 4th graders. Finding ways to facilitate that kind of interaction is probably a very good idea. I also think the sort of reform that you are advocating would be a definite improvement.

  • 10. A Voice from the Back of the Room at Software Carpentry  |  June 16, 2010 at 9:02 am

    […] Guzdial recently posted another thought-provoking piece, this one about how teachers are biased toward assessing a class’s progress by their […]

  • […] or are “easy” based on little evidence, often just discussion with the top students (as Davide Fossati and I found). If we’re going to make computing education work for everyone, we have to ask, “What […]

