From Design of Everyday Things to Teaching of Everyday People: Human error? Student error? No, Bad Design

June 26, 2017 at 7:00 am

This summer, I’m teaching a Human-Computer Interaction course in Barcelona. I’ve never taught HCI before, and I’m enjoying it. Like many HCI courses, ours started by reading Don Norman’s Design of Everyday Things.

He begins Chapter 5, “Human Error? No, Bad Design,” with these paragraphs:

Most industrial accidents are caused by human error: estimates range between 75 and 95 percent. How is it that so many people are so incompetent? Answer: They aren’t. It’s a design problem.

If the number of accidents blamed upon human error were 1 to 5 percent, I might believe that people were at fault. But when the percentage is so high, then clearly other factors must be involved. When something happens this frequently, there must be another underlying factor.

That quote really struck me. We have lots of education systems with more “error” than 1 to 5 percent. Educators call this kind of “error” the failure rate: the rate of students who do not succeed in the course (i.e., who do not complete the course with a passing grade). Why don’t we critique education systems in the same way that Norman critiques user interfaces?

The issue is particularly relevant for the CS for All movement. “All” is 100%. Is our goal that 100% of students get access to computing education, or that 100% of students succeed in learning computing? The former is a cop-out: we want underprivileged kids to access computing education (which was likely developed for privileged kids, because culturally-relevant CS education is important but challenging to build), but we don’t care if they succeed? When it comes to textual literacy or mathematical literacy, the goal is all citizens, 100%. How do we design computing courses in which 100% of the students succeed?

I posed the question to my class on a slide, along with another Norman quote.

There was a lot of hmmm-ing in the class when I put up that slide. I had them discuss in groups for just a couple of minutes (our room is set up with chairs in groups of three, which works out pretty well for short, small-group discussions), and then share their thoughts. Several students were surprised that MOOCs typically have around a 90% withdrawal rate. One student asked about MOOCs that students pay for. Our OMS CS courses cost students a few hundred dollars each, and typically have WDF rates (withdrawing or earning a D or F, a non-passing grade) between 25% and 33%, compared to 3–7% for the same in-person MS CS courses. Is that an acceptable failure rate?

There was a lot more concern about the first question. One student asked, “But what if a student regularly fails CS classes?” I asked if the system is regularly failing that student. Someone asked, “But what about the 60% of the students who succeed in the course?” I pointed out that many people successfully use a UNIX command line, but it’s a pretty classic case of a poor user interface. Some users might be able to use a poorly designed system, but everyone is better off with a better designed system.

I didn’t let the discussion go on for too long, because this is an HCI course and not an educational design class. (Barbara is teaching the educational technology design course here.) I saw this as an opportunity to apply DOET thinking to systems they were familiar with.

I am interested in hearing what the readers of this blog post think. What amount of error/failure should we tolerate in an educational system before we call the system badly designed? I understand that Norman is aiming to design computing systems that users can successfully use to achieve their goals, while educational systems have an explicit purpose to change the students and their goals. However, Norman’s users and our students are all irrational, flawed humans, and we still want them to succeed. I’ll end with another quote from DOET, with a small annotation from me.

“The problem with the designs of most engineers (and maybe teachers?) is that they are too logical. We have to accept human behavior the way it is, not the way we would wish it to be.”



Comments

  • 1. Paul Gestwicki  |  June 26, 2017 at 8:12 am

    Did anyone push back on the claim that the UNIX command line is badly designed? It seems like this could lead to a good discussion of fitness for purpose and balancing various design goals.

    • 2. Mark Guzdial  |  June 26, 2017 at 8:52 am

      Sure — it’s called the anti-Mac interface. The UNIX command line certainly leads to far more error than do GUIs.

      But that’s missing the point of my post. How much error do we tolerate before we find fault with the system?

      • 3. Paul Gestwicki  |  June 26, 2017 at 8:57 am

        It seems to me that identifying design goals and addressing fitness for purpose of designs *is* the point of your post. Isn’t that exactly what you’re asking us to consider in terms of CS4All and CS education generally?

        That’s what made me wonder if any students asked about it: a design cannot be good or bad without context. Seems like rich grounds for deeper discussion of HCI practice. One could continue to make analogies, particularly considering (for example) for whom CLIs were designed and for whom CS education was designed.

        I’ll check out the linked article later—it’s not one I’ve read before.

        • 4. Mark Guzdial  |  June 26, 2017 at 9:06 am

          I’m actually asking a broader question. Norman claims that any more than 5% error is due to system design error, rather than human error. His notion of “systems” is broad, e.g., it includes workplace accidents.

Error in an education context might be students giving up or failing. (I’m open to considering other definitions.) In many education situations, error rates are far higher than 5%. Is that indicative of system design error?

The question I’m asking is, “How much student failure do we tolerate before we decide our design has failed?” If your answer is, “it depends on the design goals,” I’d like to know why that matters. Norman is talking about “systems” broadly. Can we have education systems where we expect high failure rates, and where that’s somehow good for the students? What is the argument that any more than 5% student withdrawal or failure is acceptable?

          • 5. Paul Gestwicki  |  June 26, 2017 at 9:39 am

            Thanks for the clarification.

            I think the devil’s advocate position is easily found, since I hear echoes of it in the hallway: we’ve not designed a system to educate everybody, but only to educate the students who have a particular background or status. If _those_ students get educated, then it doesn’t really matter what happens to the rest, whether they are 5% or 40%. Put another way, a WDF rate of 40% is not error if you have not defined your design goals to consider it as error—it’s more like noise the system is designed to filter out. The fact that this theme echoes in the ivory tower is, I think, why I saw in your post a question of design goals and fitness functions.

            To be clear, I don’t advocate that position, but I think a lot of people have, perhaps implicitly, sometimes explicitly. I think it’s hard to converse with people about WDF rates as error if they don’t see it as such.

            I’ve thought quite a bit about failure-as-educational-error since you posted the link to Ko’s essay. Much of it resonates with me. And yet, I find myself unable (so far) to articulate why 100% feels like the wrong goal, when I have undergraduates whose problems often seem to far exceed motivation and inspiration. I’m still trying to sort that one out.

            Cheers!

  • 6. Alfred Thompson  |  June 26, 2017 at 8:49 am

When I look at test results, I always look for questions that a lot of students got wrong. The next step is to try to figure out whether the question was written poorly or whether I just didn’t teach the material well enough. It always seemed to me that the goal for a teacher should be for everyone to pass the course with a good grade, ideally one indicating solid learning of the material. Shooting for a curve always seemed antithetical to being a good educator.

If students don’t do well, it should be clearly their fault, not the teacher’s; that means they, the students, are not doing their part. My job is to do all I can to meet students where they are and help them learn.

  • 7. Rob St. Amant  |  June 26, 2017 at 10:04 am

    For your HCI class, I’ll mention that usability is a tricky issue; Gilbert Cockton lays out the history of usability in HCI here.

    https://www.interaction-design.org/literature/book/the-encyclopedia-of-human-computer-interaction-2nd-ed/usability-evaluation

The bottom line for usability, roughly speaking, is that systems can’t be evaluated in the abstract, independent of a target audience of users and the tasks they want to perform (although we sometimes leave that implicit in HCI; when we talk about design flaws, we indicate that the target audience of users is hypothetically everyone). UNIX in particular has some subtle and some obvious design flaws, but it’s reasonably well-designed for its intended target audience of experienced, knowledgeable computer people and what they want to do. Saul Greenberg’s dissertation back in 1988 contains one of the largest-scale user studies of different types of UNIX users up to that point (and probably since); experienced users had a command-line error rate of a little over 5%.

As for CS education, I think there’s another interesting connection with HCI, in that we sometimes consider the difference between mandatory and discretionary use of applications; games are an interesting example of the latter. Some types of games walk a fine line between making tasks hard enough to be challenging for players, yet not so hard that players eventually give up. It turns out, at least from what I’ve read in the popular press, that completion rates for video games, even the most popular and successful ones, are 10% to 20%, not too far off from MOOCs. Those games have been designed to provide incremental value as they’re being played, so completion isn’t everything. If the same were true of MOOCs on discretionary topics, where students aren’t necessarily expecting to finish, that might be okay, as long as the students are getting something out of the time they put in.

    For mandatory/required classes, though, I agree that a high failure rate is a problem with the course, because it *should* be designed such that anyone could pass it.

  • 8. acbart  |  June 26, 2017 at 10:42 am

Normative, ipsative, and criterion-based: how do we measure our success rate in CS Ed? Here, a criterion-based metric has been proposed: no more than 5% of the class can fail. When writing my dissertation, I compared our Computational Thinking course to other courses in the “Quantitative and Symbolic Reasoning” bucket (http://imgur.com/a/fUxsQ) for a normative comparison. When you wrote your “Exploring Hypotheses about Media Computation” paper, you used an ipsative assessment to improve your CS1’s dropout rate (“… a course using Media Computation will have a higher retention rate than a traditional course.”).

It’s hard to make an argument that one of these approaches is better than the others. Criterion-based measures are usually formed by looking at many normative distributions, and so aren’t that different in practice. Ipsative measures can be unsatisfactory and insufficient for making honest comparisons. Looking at it normatively doesn’t always help that much, if we assume that most of our introductory courses aren’t doing a very good job.

But questions of what we’re basing this on aside, does 5% feel like the right number? Currently, we have about a 10% DFW rate in our CT course. I think we’re nearing the bottom of what we can reasonably do to bring those DFW students to success: most of them are students who stopped working for reasons outside my control or who had medical crises. I’m not sure I could squeeze out another 5% without really lowering my standards for success. And that’s in a non-majors class, where my expectations are very different from what I want out of CS majors.

    Ultimately, I think my big reaction is that assessment is really, really hard (e.g., Allison’s work) and we aren’t good enough at it yet to really be able to micromanage our pass/fail rates too much. Whatever arbitrary number we choose as success is tied very heavily to how we measure success in the first place.

  • 9. alanone1  |  June 26, 2017 at 11:14 am

Underlying all of this is that “biology means variation,” and variation will extend through cultural and other learning.

The entry point is that in any good design for human use, we would like to start the learning from where particular learners are (and that will be from different places, cf. above).

    The other big principle is that a good design of artifact or curricula or user interface will not have gratuitous difficulties.

Now consider the violin. The only real gratuitous difficulty it has is the wooden friction tuning pegs. This has been fixed by adding screw-type fine tuners at the other end. And let me just mention in passing the cornetto (perhaps the hardest western instrument to make sound good).

But these instruments reward the hours/years of learning with incredibly expressive and beautiful timbres. The human definitely adapts to what is needed here, and this is ultimately a good thing.

    It is OK for these instruments to be this difficult to learn because they are ultimately a choice.

The question at hand is “what about 100% learning something?” (where “biology is variation” means “not really 100%, but as close as possible”). And the learners are not given a choice.

A threshold needs to be drawn that helps define “fluency,” and the goal is to get everyone over it. (Sound like the goals for reading and writing, mathematics, and science, etc.?) The “fluency” line needs to be drawn really carefully, because it is where people get fooled when the ideas of fluency are too weak (as they are for all the established subjects).

Biologically we are -less- set up for “powerful idea inventions” than for the built-ins like language. (It’s worth looking at “fluency” just for oral language, how long it takes to develop it, the percentages of reaching it, etc.)

    Finally, we can see that what Don is right about is pretty low on the list of what needs to be done to teach a “powerful invented idea” to the level of fluency.

The most important miss here is that really difficult things that are not particularly set up for human brains are going to create lots of mismatches.

We can take a cue from baseball: if you do -not- get a hit 70% of the time, you are still considered to be a very good, “fluent” hitter. But if you fail to catch a flyball more than 2% of the time, you are considered a “not-fluent” fielder.

The first is not an “error” in baseball, but is the -overhead- of trying to do something really difficult. The second is “technique,” which can be learned well; fluency means being able to do it virtually all the time, and it is considered an “error” if you don’t.

    In the part of education that is about helping learners get fluent in “invented powerful ideas”, part of the process will be some big changes in almost all learners combined with various degrees of discomfort, even “pain”.

    If the difficulties are all needed to effect the changes, then they need to be embraced, and the approach needs to be a much larger set of strategies and tactics that are much more often found in both sports and music learning than in the academic subjects.

    • 10. Mark Guzdial  |  June 26, 2017 at 4:07 pm

      Thanks, Alan — your analysis helps me connect Don’s ideas to CS ed issues. Given that there is going to be variation, it’s possible that there can be a mismatch between instructional method and variance in the individual. How do we know when we don’t have enough variance in our instructional methods to be a good match to enough of the population to achieve a literate culture? Maryanne Wolf pointed out that Egyptians didn’t develop a literate culture because hieroglyphics were too hard to learn and too few people achieved fluency. How do we know when the pedagogical methods we have are as good as they can get, and our inability to achieve thresholds and fluency are due to the inherent difficulty of the material — as opposed to a lack of invention in how we teach?

      • 11. alanone1  |  June 27, 2017 at 2:14 am

        Hi Mark — I think sports and music analogies can help up to a point (analogies should always be suspect, but they can be useful too!)

Even though there is no question that there are “talent factors” in most endeavors and areas of learning — Gladwell is quite off on this — the good news is that putting in a lot of work practicing, etc., can allow most people to achieve fluency.

        So a good “equation” is
        ability = talent + skill + will

        At the very top levels of most endeavors, even the highly talented need to do lots of practicing. But these levels are higher than “fluency”.

        The good news is that “fluency” is generally enjoyable, and this means that most people can learn enough to enjoy most pursuits.

        The questions for the “invented powerful ideas” are “what are fluency levels for them?” and “can these levels be obtained without high levels of talent, but with good efforts via skill and will?”

        I think there are no definitive answers for most of these at this point.

One of the most interesting and recent “invented powerful ideas” is the idea of universal human rights, which requires a society and societies to learn to go against genetic predispositions for revenge, tribalism, “the other,” competition, etc. One of the most important questions for our time — right up there with “citizens fluently understanding science” — is “can equity be learned?” And, if so, how can we learn how to teach and learn it?

        • 12. jasonbrennan  |  July 1, 2017 at 2:36 pm

          Hi Alan,

It still seems like a high failure rate at gaining fluency in a powerful idea (e.g., science) indicates a failure in the learning environment, at least in terms of imparting learning strategies to students. For example, grit and growth mindset from Carol Dweck’s research suggest to me that these sorts of strategies are more important than any “natural-born talent” a learner may have, and probably as important as (or inseparable from) a class’s curriculum itself.

          Tangentially related, can you point us to any good literature / thinking on how to determine fluency thresholds? (I am currently reading Maryanne Wolf’s “Proust and the Squid” so maybe she touches on the subject later).

          • 13. alanone1  |  July 2, 2017 at 2:28 am

            Hi Jason

            I’ll write a longer reply to this “later”. But consider again the “equation”:
            ability = talent ‡ skill ‡ will

            • 14. alanone1  |  July 2, 2017 at 2:58 am

              … where the “‡” is some form of combination, not really “addition”.

It’s worth thinking about what “-talent-, as a ‘potential’ ” might mean. It’s “something” innate that is a “start” and a “channel.” For example, it’s almost universally indicated from evidence that almost all human babies have “something” already set up for learning language from their environment. The “something” includes (a) various kinds of sensitivities — to certain kinds of sounds, to what’s going on, etc. — that draw attention to speech-like things going on, and (b) “inner motivation” to imitate and learn. Etc. Because of “biology is variation,” there will be a distribution of how strong the “somethings” are.

              There can be failures in the learning environment here (such as the tragic children who are locked in closets from birth), but in general, language learning is driven by the learner, and mostly refined by the environment.

              And, of course, trying (and thinking trying is a good thing) is critical — so an environment that doesn’t encourage this is going to be stunting for many.

Another slant on “biology is variation” is to think about what this means for just the two dimensions of “talent” and “motivation” — let’s just think about only (arbitrarily) 5 “bins” for each.

              For any subject laid before a group, there will be some children who “take to it quickly and ‘naturally’ ” (maybe 5-10%). They don’t need a lot of help. Three more subgroups need different kinds of help. And there will usually be a subgroup we don’t know how to help.

Similarly for motivation. Some children are self-motivated from the get-go (and often they are the kids who “take to it quickly”). Other subgroups need different kinds of motivation, e.g., some will want to do things mainly because others are doing them. Some will need to see that the thing relates to other things they are interested in. And so forth.

              The two dimensions are not completely orthogonal, so let’s posit 20 different combinations instead of 25. (This is a lot for a teacher to deal with! And the good ones find ways to do it.)

              Much of -real- Montessori education is about how teachers and the designed environment can — invisibly to the children — get them to take charge of their learning — just as they do for language and other built-ins.

There’s no question that the setup of the environment makes a big difference, as does the belief (or not) in one’s own possibilities.

However, there is still “talent,” and this really does help, especially in the beginnings of learning and after fluency has developed.

              To your second question, much of the assessment in sports and music learning (and driving a car, etc.) has partly to do with seeing what the learner can do -while doing- the basic activity — for example, can they think about what they are doing while doing it, can they talk about it, what else are they aware of, etc.?

For reading: while reading out loud something not seen before, if it’s a story, can the reader act out the story (indicating they can read ahead and think about the meaning of the story while also saying it)? There are similar assessments for reading prose out loud. Does the reader remember? Does the reader read faster than a human can speak? And so forth.

              For math, what can be done with something, especially in conversation with a fluent practitioner?

These are all indications. And most of these also have assessment possibilities that are “off-line” (not in conversation), having to do with both consuming and creating.

  • 16. Chris Johnson  |  June 27, 2017 at 3:50 pm

    Norman appeals to probability. If enough failures happen, it does seem probable that some of them are due to poor design. But that’s not enough for a class action lawsuit. Both plaintiff and defendant study the failures and tease apart user error from design error.

    You ask, “How much student failure do we tolerate before we decide our design has failed?” We can’t and shouldn’t decide from the fail rate alone. You suggest that fail rates in the workplace are akin to fail rates in a course. When I buy a teapot and only use it once or twice, the teapot hasn’t failed. Yet when a student attends my class once or twice in a semester, should that student’s F be considered when evaluating the design of the course? I hope not.

The follow-up investigations into these failures are the only thing that will provide an accurate picture. Unfortunately, the plaintiff is usually under 22 years old and unlikely to make a claim. It is up to the defendant to file suit against itself. This is hard.

    • 17. Mark Guzdial  |  June 28, 2017 at 2:05 am

Chris, you’re arguing from a straw-man position. I know of many cases (including my own kids and my own students) where students fail despite attending every class and doing every assignment. As teachers and faculty, we are designing education systems. Surely we can do better than considering each case separately.

      • 18. Chris Johnson  |  June 28, 2017 at 6:03 am

        Of course. I am only arguing against using the raw fail rate as evidence of broken design. Not against the reality of broken design.

        For me, at least, the number of non-participants in my introductory programming class is also significant.

        Strangely, just a couple of weeks ago, I went into a local business, and when the clerk asked me for my name, he provided it himself. I looked at him in surprise, and he said that he’d been a student in one of my classes. I actually practice learning and remembering my students with an app on my phone, but I didn’t recognize this man. I asked him if he attended class. He said yes. I asked him if he attended lab. He said yes. I asked him if he turned anything in. He said yes. I was feeling like a failure for not recognizing him at all.

But when I got home and checked the gradebook, I saw that the only thing he had turned in was a little get-to-know-you sheet on the first day of class. He had never been to lab, never touched the homework, and only visited the course discussion board twice. I don’t understand why he tried to tell me otherwise.

  • 19. David Karger  |  June 30, 2017 at 8:09 pm

In general, tools in HCI are assessed for their effectiveness at performing a relatively narrow task—editing a document, transferring a balance, etc. I’d argue that “education” is too broad a task to assess in the same way. A better match might be to assess the effectiveness of a particular lesson at teaching a particular language construct, for example.

