Evaluation as a Voice for the Kids in the Back of the Room

I’ve been lurking recently on a conversation between K-12 computing teachers about learning in their classes.  One teacher (of elementary school children) spoke of how much programming her kids do.  She said that they easily use variables, tests, and iteration.  A respondent spoke of all that his young daughter could do in programming, including creating elaborate stories that she shared with others on the Internet.

So what exactly do those claims mean? I’m sure that they mean exactly what they say, and that each speaker is telling the truth. The first teacher certainly has kids who are mastering variables, conditionals, and iteration. The proud father is certainly describing what his daughter can do. But are there broader claims being made? Is the teacher telling us that all her students are mastering all those things? Or are the teacher and father saying that because some kids are mastering these concepts, it’s possible for any student to master those topics? Is the teacher making a claim about the methods or tools being used, that those are particularly effective techniques?

What if those broader claims are wrong?  What if not all students in the teacher’s class are mastering these concepts?  What if the techniques are not effective for all students?  What if it’s not possible for all students to learn to the same level with the same techniques?  I’ll bet that some of these broader claims are wrong.  But how would anyone know? How would the teacher know?

A Different Take on Evaluation

I often hear education developers and researchers bemoaning standardized testing — that it’s inadequate for measuring student learning (probably true), that it can cramp the style of a teacher (also probably true), that it encourages teachers to teach to the test (also probably true), and that good, veteran teachers already know how things are going in their classes. That last part is the one that I now believe is false. Without some kind of careful, systematic measurement, most teachers do not know what is going on in their classes. University teachers are particularly ill-informed as to what’s going on in their large lecture classes. The more that I learn about the Media Computation classes, the more I realize that I only knew about averages in the class, not the distribution of learning and performance across the class.

I do mean evaluation here. Assessment is about measuring and improving students’ learning. Evaluation is about measurement that leads to decisions. Most decisions about computing education (and probably education more generally, but my data are drawn from computing education) are made without measurement, without data, based on a teacher’s gut instinct — and that “gut instinct” is most certainly wrong. “Gut instinct” told us that the Earth is flat, that the sun and moon revolved around the Earth, and that the only programming language we need ever learn is C and its variants. A teacher’s “gut instinct” is probably informed by data. But which data, and how does a teacher avoid bias in interpreting those data?

Teachers are heavily influenced by the best students in the class. In her book on Computers and Classroom Culture, Janet Schofield found that teachers tended to talk to the best students the most — they were the most interesting to the teachers, and the ones that the teachers wanted to please the most. Thus, when a teacher says that something is “working” or “not working” without systematic, evaluative data, the teacher is most often talking about only the upper end of the class distribution. As has been discussed previously in this blog, the higher-knowledge and lower-knowledge students have very different needs and respond differently to educational interventions.

Evaluation should not be about limiting the teacher, nor does it even need to create an accurate picture of students’ learning. Evaluation should give voice to the lower-ability kids, the kids who often hide out in the back of the class. The teacher easily hears the kids at the front of the class, and most often tunes the class to their needs. Careful, systematic evaluation is about hearing all the students, including those that the teacher may not be excited about hearing.

Examples from Media Computation

Lana Yarosh’s study of our Media Computation data structures class is a case in point. Lana interviewed seven students in the class, analyzed those transcripts, and then developed a survey for the whole class in order to check the generality of her interview claims. One of the interviewed students hated the media content: “I didn’t take this class to learn how to make pretty pictures.” Then in the survey, 11% of all the students agreed with the statement that “Working with media is a waste of time that could be used to learn the material in greater depth.” However, 60–70% (the figure varied by statement) agreed that media made the class more interesting and said that they did extra work on at least one assignment in order to make it look “cool.” That last one is important: extra work means more time on task, which creates more opportunity to learn.

We don’t know the grade distribution of the students in Lana’s study.  Our hypothesis is that the 11% who didn’t like the media were drawn mostly from the top students.  Those are the students who want the content, as quickly as possible and with as few frills as possible.

At Georgia Tech, the teachers of the Media Computation data structures class are under a lot of pressure to reduce the amount of media coverage in the class. The top students want less media. Those top students are the ones who become undergraduate teaching assistants (TAs), and those TAs are not keen on the media, so the teacher also has the TAs pushing for reducing the media content. Without Lana’s study, the teacher would have no way to know that the majority of students actually like the media content.

Davide Fossati has been studying the Media Computation CS1 class recently.  We know that the Media Computation class has led to reduced failure rates, but we don’t know if it leads to similar learning to our other CS1’s. We have three different CS1’s at Georgia Tech: the Robotics one for CS and Science majors, the MATLAB one for Engineering majors, and the Media Computation one for Liberal Arts, Architecture, Management, and most Computational Media majors.  After the robotics course, CS majors take a course on object-oriented programming in Java.  (I actually designed the Media Computation data structures course to come between the CS1 and the Java course, but almost no student takes that path because of the cost of an extra course.  Design and implementation are different things.)  We can compare the three different CS1’s by looking at performance in that Java course.  That’s not a measure of learning, but it is an equal comparison point — it’s an evaluation measurement, even if it’s not a good assessment measurement.

Davide has looked at four years’ worth of data from these courses. What he finds is that there is a significant difference between the three CS1’s in terms of performance in the Java course, but those differences are grade-dependent. Students who get A’s in these three classes perform identically in the Java course. However, students who get lower grades in the Media Computation class do significantly worse in the Java course than those from the other two CS1’s.
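Davide’s actual analysis is more sophisticated than this, but the core idea — stratify students by CS1 grade before comparing Java-course performance, so that grade-dependent differences aren’t averaged away — can be sketched in a few lines. The course names, grade points, and every record below are invented for illustration; they are not the study’s data:

```python
from collections import defaultdict

# Hypothetical records: (cs1_course, cs1_grade, java_course_grade_points).
# All numbers are made up for illustration; they are not the study's data.
records = [
    ("MediaComp", "A", 3.7), ("MediaComp", "A", 3.5), ("MediaComp", "B", 2.1),
    ("MediaComp", "C", 1.8), ("Robotics",  "A", 3.6), ("Robotics",  "B", 2.9),
    ("Robotics",  "C", 2.6), ("MATLAB",    "A", 3.6), ("MATLAB",    "B", 3.0),
    ("MATLAB",    "C", 2.5),
]

def mean_java_grade_by_stratum(records):
    """Average Java-course grade for each (CS1 course, CS1 grade) stratum."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for course, grade, java_points in records:
        sums[(course, grade)] += java_points
        counts[(course, grade)] += 1
    return {key: sums[key] / counts[key] for key in sums}

strata = mean_java_grade_by_stratum(records)
# Comparing within a grade band is what reveals that A students look alike
# across CS1's while lower-grade students may not.
for grade in ("A", "B", "C"):
    row = {c: strata.get((c, grade)) for c in ("MediaComp", "Robotics", "MATLAB")}
    print(grade, row)
```

An overall average per course would have hidden exactly the pattern Davide found; the stratified table is what lets the lower-grade students show up in the comparison at all.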

Davide’s data are the first that we have that give us a comparison point between our three CS1’s. His data tell us that the top students (just under half the class) are doing just fine, but the rest of the students, while passing, are not reaching the same level of performance. Maybe that’s acceptable — that’s part of the decision part of using evaluation data. My point is that without Davide’s data, we’d have no idea that the lower half of the students (in my metaphor, the students in the back of the room) were not performing comparably to the other students. They would not have a voice in the decisions to be made about the course and how to support all the students in the course.

Conclusion: Consider Threats to Validity

As teachers, we’re used to making the decisions.  We make decisions about what “counts” towards grades and what doesn’t.  We decide standards on tests and grades for the class.  We also make decisions about what to teach and how.

Teachers often tell me about making changes in their class because “students don’t like this” or “don’t think that’s worthwhile” or “aren’t doing well enough in that subject.”  That may be right.  I am asking teachers to consider the possibility that they’re wrong, that they are only looking at some of the students.

In our human-centered computing classes at Georgia Tech, we talk about “threats to validity.”  We consider the possibility that our claims are wrong, and how we could know if we are wrong.  Teachers should do this, too.

Next time you make a class change because of a claim about “the students in the class,” please gather some evaluative data from all the students in the class.  Consider the possibility that you’re not hearing from the whole class.  Do a survey to find out what the whole class thinks.  Use an exam question that lets you see how the whole class is performing. Gather some systematic data that can speak for those kids hiding out in the back of your class.
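As one minimal sketch of what such systematic data-gathering could look like, the toy tally below computes agreement across the whole class rather than relying on the loudest voices. The question wording and every response value are invented, not drawn from any real survey:

```python
from collections import Counter

# Invented Likert responses (1 = strongly disagree ... 5 = strongly agree)
# to a hypothetical statement like "The media content helps me learn".
responses = [5, 4, 4, 2, 5, 3, 1, 4, 5, 2, 4, 4, 3, 5, 1, 4]

def percent_agree(responses, threshold=4):
    """Percentage of the whole class answering at or above `threshold`."""
    agree = sum(1 for r in responses if r >= threshold)
    return 100.0 * agree / len(responses)

print(f"{percent_agree(responses):.1f}% agree or strongly agree")
# The full distribution matters more than the average: the 1s and 2s
# are the kids in the back of the room.
print(Counter(responses))
```

The point of reporting the full `Counter` alongside the percentage is the same as the point of this post: an average can look fine while a visible minority of the class is struggling or unhappy.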

June 5, 2010 at 4:48 pm
