Question Everything: How We Teach Intro CS is Wrong
I’ve been interested in John Sweller and Cognitive Load Theory since reading Ray Lister’s ACE keynote paper from a couple of years back. I’ve wanted to learn more, and as all professors will tell you, the best way to learn something (or at least, to find time to learn it) is to teach it. So, I assigned several papers on the topic (the first three papers in the References) to my educational technology class. Those three papers set me on a paper finding-and-reading spree which is having a dramatic effect on how I think about how we teach computing.
Let me start out with a strawman position of how we teach introductory computing. I know it won’t be how you and your institution teach computing, but I hope that it’s got some elements of how most American schools teach CS1:
We lecture students on how to use a particular element of computing, say variables, assignments, and function calls. We walk through 2-3 examples of using those elements, and assign reading from a textbook that might present 1-2 of those examples with maybe a couple others. We then assign a programming assignment that the students have never seen before whose solution will require use of those programming elements, often with authentic tools or IDEs like those used by experts.
The original 1985 Sweller and Cooper paper had five studies with similar set-ups. There are two groups of students, each of which is shown two worked-out algebra problems. Our experimental group then gets eight more algebra problems, completely worked out. Our control group solves those eight more problems. As you might imagine, the control group takes five times as long to complete the eight problems as the experimental group takes to simply read them. Both groups then get new problems to solve. The experimental group solves the problems in half the time and with fewer errors than the control group. Not doing problem-solving leads to better problem-solving skills than doing problem-solving. That’s when Educational Psychologists began to question “learning by doing” and the idea that we best teach problem-solving by having students solve problems.
The paper by Kirschner, Sweller, and Clark is the most outspoken and most interesting of the papers in this thread of research. Their title states their basic premise: “Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching.” They describe the problem like this:
On one side of this argument are those advocating the hypothesis that people learn best in an unguided or
minimally guided environment, generally defined as one in which learners, rather than being presented with essential information, must discover or construct essential information for themselves (e.g., Bruner, 1961; Papert, 1980; Steffe & Gale, 1995). On the other side are those suggesting that novice learners should be provided with direct instructional guidance on the concepts and procedures required by a particular discipline and should not be left to discover those procedures by themselves (e.g., Cronbach & Snow, 1977; Klahr & Nigam, 2004; Mayer, 2004; Shulman & Keisler, 1966; Sweller, 2003).
What exactly is minimal instruction? And are they really describing us? I think this quote describes how we work in computing education (as modeled above) pretty well:
There seem to be two main assumptions underlying instructional programs using minimal guidance. First they challenge students to solve “authentic” problems or acquire complex knowledge in information-rich settings based on the assumption that having learners construct their own solutions leads to the most effective learning experience. Second, they appear to assume that knowledge can best be acquired through experience based on the procedures of the discipline (i.e., seeing the pedagogic content of the learning experience as identical to the methods and processes or epistemology of the discipline being studied; Kirschner, 1992).
That is, “people should learn to program by constructing programs from the basic information on the language, and they should do it in the same way that experts do it.” The paper then goes on to present all the evidence showing that this “minimally-guided instruction” does not work.
After a half-century of advocacy associated with instruction using minimal guidance, it appears that there is no body of research supporting the technique. In so far as there is any evidence from controlled studies, it almost uniformly supports direct, strong instructional guidance rather than constructivist-based minimal guidance during the instruction of novice to intermediate learners.
Now, in the same issue of Educational Psychologist there were rebuttals to KSC, and more have been written since then, including one by my colleague Cindy Hmelo-Silver. What’s striking about these rebuttals is that they basically say, “But not problem-based and inquiry-based learning! Those are actually guided, scaffolded forms of instruction.” What’s even more striking is that no one challenges KSC on the basic premise: that putting introductory students in the position of discovering information for themselves is a bad idea! In general, the Educational Psychology community (from the papers I’ve read) says that the model I describe above is an ineffective way to teach introductory students.
So what’s wrong with my strawman? First, way too few examples. The original Sweller and Cooper paper used ten examples of just a simple algebraic manipulation. In the studies where worked examples have been used with computer programming (yes, it has been used with computer programming, and yes, it works there, too) by Pete Pirolli and Mimi Recker, the examples were in the 6-12 range for something like recursion. 2-5 examples are too few.
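To make “worked example” concrete for programming, here is a hypothetical sketch of what a single fully worked recursion example might look like. The student is given the complete solution plus a trace to study, not a problem to solve; the function and the input are my own illustration, not from the Pirolli and Recker studies:

```python
# A worked example as a student would receive it: the complete solution
# together with a step-by-step trace, to be studied rather than written.

def sum_list(numbers):
    """Return the sum of a list of numbers, recursively."""
    if not numbers:       # Base case: an empty list sums to 0.
        return 0
    # Recursive case: the first element plus the sum of the rest.
    return numbers[0] + sum_list(numbers[1:])

# The trace the student studies alongside the code:
#   sum_list([3, 5, 2])
#   = 3 + sum_list([5, 2])
#   = 3 + (5 + sum_list([2]))
#   = 3 + (5 + (2 + sum_list([])))
#   = 3 + (5 + (2 + 0))
#   = 10

print(sum_list([3, 5, 2]))  # prints 10
```

In the worked-example regime, a student would study six to twelve such examples, each varying slightly, before ever being asked to write a recursive function from scratch.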
Second, we should not be asking students to solve something new. They should practice with the same information until they demonstrate that they have learned it. What KSC show is that the solve-something-new approach takes too much time and leads to too little learning, because it overloads the cognitive ability of the learner. Only the very best students can succeed when simply thrown in front of a speeding interpreter or compiler. (I think that explains the “two humps” pretty darn well: it’s not the discipline, but how we teach it, that results in the bimodal distribution.)
What should we do instead? That’s a big open question. Lots of options have been explored in this literature, from using tools like intelligent tutors to focusing on program “completion” problems (van Merriënboer and Krammer in 1987 got great results using completion rather than program generation). I think that a lot of TeachScheme, with DrScheme language levels, has many of the right characteristics. Media Computation provides the right number of examples with minimal changes during practice, but still suffers from the speeding interpreter problem. The bottom line is that asking students to program doesn’t lead to them learning programming. As Richard Mayer (the famous educational psychologist) puts it, it promotes behavioral activity too early in the learning process, when learners should be cognitively active. (How often have you seen students programming without thinking much?) We need to figure out what’s the right kind of practice for CS, and how much, and how to motivate students to take seriously examples and practice, as opposed to full programming.
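A completion problem, in the van Merriënboer and Krammer sense, hands the student a mostly worked program with only a small, focused piece left to supply. Here is a hypothetical sketch of one (the task and function are my own illustration, not from their 1987 study), shown with one correct completion filled in:

```python
# A completion problem: most of the program is provided, and only a
# small, focused piece is left for the student to supply.

def count_evens(numbers):
    """Count how many numbers in the list are even."""
    count = 0
    for n in numbers:
        # STUDENT TASK: write the test that decides when to count n.
        # One correct completion:
        if n % 2 == 0:
            count += 1
    return count

print(count_evens([1, 2, 3, 4, 6]))  # prints 3
```

The student reads and understands the provided structure (the loop, the accumulator, the return) and generates only the one missing line, so the cognitive load stays on the concept being practiced rather than on assembling a whole program.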
This literature is not saying that students should never program. Rather, it’s saying that programming is a bad way to start. Students need the opportunity to gain knowledge first, before programming, just as with reading. Later, there is an expertise reversal effect, where the worked example effect first disappears and then reverses: more advanced students do learn better with real programming, real problem-solving. There is a place for minimally guided student activity, including programming. It’s just not at the beginning.
Finally, it turns out that textbooks are not the best medium for teaching programming. The Atkinson, Derry et al. article does a wonderful job of surveying what we know about teaching with examples. One of the findings is the Modality Effect. Using text to explain something that is visual (like a program) leads to more extraneous cognitive processing than using audio for the explanation. Given that we work with computers capable of multimedia, we should be able to do much better than textbooks.
Overall, I find this literature unintuitive. It seems obvious to me that the way to learn to program is by programming. It seems obvious to me that real programming can be motivating. But KSC respond to this, too.
Why do outstanding scientists who demand rigorous proof for scientific assertions in their research continue to use and, indeed, defend on the basis of intuition alone, teaching methods that are not the most effective?
This literature doesn’t offer a lot of obvious answers for how to do computing education better. It does, however, provide strong evidence that what we’re doing now is wrong, and it offers pointers to how other disciplines have done it better. It’s a challenge to us to question our practice. It’s up to us to use these lessons and models to improve our teaching practice.
- Atkinson, R. K., Derry, S. J., Renkl, A., & Wortham, D. W. (2000). Learning from examples: Instructional principles from the worked examples research. Review of Educational Research, 70, 181–214.
- Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist, 41(2), 75–86.
- Sweller, J., & Cooper, G. A. (1985). The use of worked examples as a substitute for problem solving in learning algebra. Cognition and Instruction, 2(1), 59–89. doi:10.1207/s1532690xci0201_3
- Hmelo-Silver, C. E., Duncan, R. G., & Chinn, C. A. (2007). Scaffolding and achievement in problem-based and inquiry learning: A response to Kirschner, Sweller, and Clark (2006). Educational Psychologist, 42(2), 99–107.