Question Everything: How We Teach Intro CS is Wrong
I’ve been interested in John Sweller and Cognitive Load Theory since reading Ray Lister’s ACE keynote paper from a couple of years back. I’ve wanted to learn more, and as all professors will tell you, the best way to learn something (or at least, to find time to learn it) is to teach it. So, I assigned several papers on the topic (the first three papers in the References) to my educational technology class. Those three papers set me on a paper finding-and-reading spree which is having a dramatic effect on how I think about how we teach computing.
Let me start out with a strawman position of how we teach introductory computing. I know it won’t be how you and your institution teach computing, but I hope that it’s got some elements of how most American schools teach CS1:
We lecture students on how to use a particular element of computing, say variables, assignments, and function calls. We walk through 2-3 examples of using those elements, and assign reading from a textbook that might present 1-2 of those examples with maybe a couple others. We then assign a programming assignment that the students have never seen before whose solution will require use of those programming elements, often with authentic tools or IDEs like those used by experts.
The original 1985 Sweller and Cooper paper had five studies with similar set-ups. There are two groups of students, each of which is shown two worked-out algebra problems. Our experimental group then gets eight more algebra problems, completely worked out. Our control group solves those eight more problems. As you might imagine, the control group takes five times as long to complete the eight problems as the experimental group takes to simply read them. Both groups then get new problems to solve. The experimental group solves the problems in half the time and with fewer errors than the control group. Not doing problem-solving leads to better problem-solving skills than doing problem-solving. That’s when Educational Psychologists began to question “learning by doing” and the idea that we best teach problem-solving by having students solve problems.
The paper by Kirschner, Sweller, and Clark is the most outspoken and most interesting of the papers in this thread of research. Their title states their basic premise: “Why Minimal Guidance During Instruction Does Not Work: An Analysis of the Failure of Constructivist, Discovery, Problem-Based, Experiential, and Inquiry-Based Teaching.” They describe the problem like this:
On one side of this argument are those advocating the hypothesis that people learn best in an unguided or
minimally guided environment, generally defined as one in which learners, rather than being presented with essential information, must discover or construct essential information for themselves (e.g., Bruner, 1961; Papert, 1980; Steffe & Gale, 1995). On the other side are those suggesting that novice learners should be provided with direct instructional guidance on the concepts and procedures required by a particular discipline and should not be left to discover those procedures by themselves (e.g., Cronbach & Snow, 1977; Klahr & Nigam, 2004; Mayer, 2004; Shulman & Keisler, 1966; Sweller, 2003).
What exactly is minimal instruction? And are they really describing us? I think this quote describes how we work in computing education (as modeled above) pretty well:
There seem to be two main assumptions underlying instructional programs using minimal guidance. First they challenge students to solve “authentic” problems or acquire complex knowledge in information-rich settings based on the assumption that having learners construct their own solutions leads to the most effective learning experience. Second, they appear to assume that knowledge can best be acquired through experience based on the procedures of the discipline (i.e., seeing the pedagogic content of the learning experience as identical to the methods and processes or epistemology of the discipline being studied; Kirschner, 1992).
That is, “people should learn to program by constructing programs from the basic information on the language, and they should do it in the same way that experts do it.” The paper then goes on to present all the evidence showing that this “minimally-guided instruction” does not work.
After a half-century of advocacy associated with instruction using minimal guidance, it appears that there is no body of research supporting the technique. In so far as there is any evidence from controlled studies, it almost uniformly supports direct, strong instructional guidance rather than constructivist-based minimal guidance during the instruction of novice to intermediate learners.
Now, in the same issue of Educational Psychologist there were rebuttals to KSC, and more have been written since then, including one by my colleague Cindy Hmelo-Silver. What’s striking about these rebuttals is that they basically say, “But not problem-based and inquiry-based learning! Those are actually guided, scaffolded forms of instruction.” What’s even more striking is that no one challenges KSC on the basic premise: that putting introductory students in the position of discovering information for themselves is a bad idea! In general, the Educational Psychology community (from the papers I’ve read) says that the model I describe above is an ineffective way to teach introductory students.
So what’s wrong with my strawman? First, way too few examples. The original Sweller and Cooper paper used ten examples of just a simple algebraic manipulation. In the studies where worked examples have been used with computer programming (yes, it has been used with computer programming, and yes, it works there, too) by Pete Pirolli and Mimi Recker, the examples were in the 6-12 range for something like recursion. 2-5 examples are too few.
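To make “worked example” concrete for programming, here is a hypothetical sketch of what a single fully worked recursion example might look like. The student is given the complete solution plus a trace to study, not a problem to solve; the function and the input are my own illustration, not from the Pirolli and Recker studies:

```python
# A worked example as a student would receive it: the complete solution
# together with a step-by-step trace, to be studied rather than written.

def sum_list(numbers):
    """Return the sum of a list of numbers, recursively."""
    if not numbers:       # Base case: an empty list sums to 0.
        return 0
    # Recursive case: the first element plus the sum of the rest.
    return numbers[0] + sum_list(numbers[1:])

# The trace the student studies alongside the code:
#   sum_list([3, 5, 2])
#   = 3 + sum_list([5, 2])
#   = 3 + (5 + sum_list([2]))
#   = 3 + (5 + (2 + sum_list([])))
#   = 3 + (5 + (2 + 0))
#   = 10

print(sum_list([3, 5, 2]))  # prints 10
```

In the worked-example regime, a student would study six to twelve such examples, each varying slightly, before ever being asked to write a recursive function from scratch.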
Second, we should not be asking students to solve something new. They should practice with the same information until they demonstrate that they have learned it. What KSC show is that the solve-something-new approach takes too much time and leads to too little learning, because it overloads the cognitive ability of the learner. Only the very best students can succeed when simply thrown in front of a speeding interpreter or compiler. (I think that explains the “two humps” pretty darn well: it’s not the discipline, but how we teach it, that results in the bimodal distribution.)
What should we do instead? That’s a big open question. Lots of options have been explored in this literature, from using tools like intelligent tutors to focusing on program “completion” problems (van Merriënboer and Krammer in 1987 got great results using completion rather than program generation). I think that a lot of TeachScheme, with DrScheme language levels, has many of the right characteristics. Media Computation provides the right number of examples with minimal changes during practice, but still suffers from the speeding interpreter problem. The bottom line is that asking students to program doesn’t lead to them learning programming. As Richard Mayer (the famous educational psychologist) puts it, it promotes behavioral activity too early in the learning process, when learners should be cognitively active. (How often have you seen students programming without thinking much?) We need to figure out what’s the right kind of practice for CS, and how much, and how to motivate students to take seriously examples and practice, as opposed to full programming.
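A completion problem, in the van Merriënboer and Krammer sense, hands the student a mostly worked program with only a small, focused piece left to supply. Here is a hypothetical sketch of one (the task and function are my own illustration, not from their 1987 study), shown with one correct completion filled in:

```python
# A completion problem: most of the program is provided, and only a
# small, focused piece is left for the student to supply.

def count_evens(numbers):
    """Count how many numbers in the list are even."""
    count = 0
    for n in numbers:
        # STUDENT TASK: write the test that decides when to count n.
        # One correct completion:
        if n % 2 == 0:
            count += 1
    return count

print(count_evens([1, 2, 3, 4, 6]))  # prints 3
```

The student reads and understands the provided structure (the loop, the accumulator, the return) and generates only the one missing line, so the cognitive load stays on the concept being practiced rather than on assembling a whole program.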
This literature is not saying that students should never program. Rather, it’s saying that programming is a bad way to start. Students need the opportunity to gain knowledge first, before programming, just as with reading. Later, there is an expertise reversal effect, where the worked example effect first disappears and then reverses: more advanced students do learn better with real programming, real problem-solving. There is a place for minimally guided student activity, including programming. It’s just not at the beginning.
Finally, it turns out that textbooks are not the best medium for teaching programming. The Atkinson, Derry et al. article does a wonderful job of surveying what we know about teaching with examples. One of the findings is the Modality Effect. Using text to explain something that is visual (like a program) leads to more extraneous cognitive processing than using audio for the explanation. Given that we work with computers capable of multimedia, we should be able to do much better than textbooks.
Overall, I find this literature unintuitive. It seems obvious to me that the way to learn to program is by programming. It seems obvious to me that real programming can be motivating. But KSC respond to this, too.
Why do outstanding scientists who demand rigorous proof for scientific assertions in their research continue to use and, indeed, defend on the basis of intuition alone, teaching methods that are not the most effective?
This literature doesn’t offer a lot of obvious answers for how to do computing education better. It does, however, provide strong evidence that what we’re doing now is wrong, and it offers pointers to how other disciplines have done it better. It’s a challenge to us to question our practice. It’s up to us to use these lessons and models to improve our teaching practice.
- Atkinson, R. K., Derry, S. J., Renkl, A., & Wortham, D. W. (2000). Learning from examples: Instructional principles from the worked examples research. Review of Educational Research, 70, 181–214.
- Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist, 41(2), 75–86.
- Sweller, J., & Cooper, G. A. (1985). The use of worked examples as a substitute for problem solving in learning algebra. Cognition and Instruction, 2(1), 59–89. doi:10.1207/s1532690xci0201_3
- Hmelo-Silver, C. E., Duncan, R. G., & Chinn, C. A. (2007). Scaffolding and achievement in problem-based and inquiry learning: A response to Kirschner, Sweller, and Clark (2006). Educational Psychologist, 42(2), 99–107.