The problem with sorting students into CS classes: We don’t know how, and we may institutionalize inequity

April 1, 2019 at 7:00 am 10 comments

One of the more interesting features of the ACM SIGCSE ITiCSE (Innovation and Technology in CS Education) conference is its “working groups” (see description here). Groups of attendees from around the world work together before and at the conference on an issue of interest, then publish a report on what happened. This is the mechanism that Mike McCracken used when he organized the first Multi-Institutional, Multi-National (MIMN) study in CS Ed (see paper here). This year’s Working Group #9 caught my eye (see list here).

The description of what the group wants to explore is interesting: How can we measure what will lead to success in introductory computer science?

The main issues are the following.

The ability to predict skill in the absence of prior experience

The value of programming language neutrality in an assessment instrument

Stigma and other perception issues associated with students’ performance, especially among groups underrepresented in computer science

It’s a deep and interesting question that several research groups have explored. Probably the most famous of these is “The Camel has Two Humps.” If you read that paper, be sure to read Caspersen et al.’s (unsuccessful) attempt to replicate the results (here), Simon’s work with Dehnadi and Bornat to replicate the results (again unsuccessful, here), and then finally the retraction of the original results (here). Bennedsen and Caspersen have a nice survey paper about what we know about predictive factors from ICER 2005, and there was a paper at SIGCSE 2019 that used multiple data sources to predict success in multiple CS courses (here). The questions as I see them are (a) what are the skills and knowledge that improve success in learning to program, (b) how can we measure to determine if they are there, and (c) how can we teach those skills explicitly if they are not.

Elizabeth Patitsas explored the question of whether there are innate differences between students that lead to success or failure in introductory CS (see paper here). She does not prove that there is no so-called Geek Gene. We can’t prove that something does not exist. She does show (a) that grades at one institution over many years are (mostly) not bimodal, and (b) that some faculty see bimodal grade distributions even if the distribution is normal. If there were something else going on (Geek Gene, aptitude, whatever), you wouldn’t expect that much normality. So she gives us evidence to doubt the Geek Gene hypothesis, and she gives us a reasonable alternative hypothesis. But it isn’t definitive proof; that’s what Ahadi and Lister argued at ICER 2013. We have to do more research to better understand the problem.
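
To make “bimodal versus normal” concrete, here is a minimal sketch of one way to check a set of final grades for two humps. It is not the analysis Patitsas ran. It uses Sarle’s bimodality coefficient (a technique swapped in for illustration), computed from plain population moments as a simplification, and the grade data are simulated under made-up assumptions (the means, spreads, and sample sizes are hypothetical).

```python
import random
import statistics

# Sarle's coefficient equals 5/9 for a uniform distribution; larger values
# are commonly read as a hint of bimodality (or of heavy skew).
BIMODAL_THRESHOLD = 5.0 / 9.0


def bimodality_coefficient(xs):
    """Sarle's bimodality coefficient from skewness and excess kurtosis.

    Simplification: uses population (uncorrected) central moments.
    """
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    correction = 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (skew ** 2 + 1.0) / (excess_kurtosis + correction)


def simulate_grades(rng, n, modes):
    """Draw n scores from an equal-weight mixture of (mean, sd) normal modes."""
    return [rng.gauss(*rng.choice(modes)) for _ in range(n)]


rng = random.Random(2019)  # fixed seed so the sketch is repeatable

# Hypothetical classes: one with a single hump, one with two clear humps.
one_hump = simulate_grades(rng, 2000, [(75.0, 10.0)])
two_humps = simulate_grades(rng, 2000, [(55.0, 5.0), (90.0, 5.0)])

bc_one_hump = bimodality_coefficient(one_hump)
bc_two_humps = bimodality_coefficient(two_humps)

print(f"one hump:  {bc_one_hump:.2f} (threshold {BIMODAL_THRESHOLD:.2f})")
print(f"two humps: {bc_two_humps:.2f} (threshold {BIMODAL_THRESHOLD:.2f})")
```

A coefficient above the threshold is only a hint: a strongly skewed single-hump distribution can also exceed it, so a check like this complements looking at the histogram rather than replacing it.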

Are Patitsas’s results suspect because they’re from an elite school? Maybe. Asking that question is really common among CS faculty: Lecia Barker found that it’s one of the top reasons why CS faculty ignore CS Ed research. We discount findings from places unlike ours. That’s why Multi-Institutional, Multi-National (MIMN) studies are such a brilliant idea. They control for institutional and even national biases. (A MIMN study showing the effectiveness of Peer Instruction is in the top-10 SIGCSE papers list.)

In my research group, we’re exploring spatial reasoning as one of those skills that may be foundational (though we don’t yet know how or why). We can measure spatial reasoning, and we can (pretty easily) teach it. We have empirically shown that wealth (more specifically, socioeconomic status (SES)) leads to success in computing (see this paper), and we have a literature review identifying other privileges that likely lead to success in CS (see Miranda Parker’s lit review).

I am concerned about the goals of the working group. The title is “Towards an Ability to Direct College Students to an Appropriately Paced Introductory Computer Science Course.” The first line of the description is:

We propose a working group to investigate methods of proper placement of university entrance-level students into introductory computer science courses.

The idea is that we might have two (or more) different intro courses, one at a normal pace and one at a remedial pace. The overall goal is to meet student needs. There is good evidence that having different intro courses is a good practice. Most institutions that I know that have multiple introductory courses choose based on major, or allow students to choose based on interest or on expectations of abilities. It’s a different thing to have different courses assigned by test.

If we don’t know what those skills are that might predict success in CS, how are you going to measure them? And if you do build a test that sorts students, what will you actually be sorting on? It’s hard to build a CS placement test that doesn’t actually sort on wealth, prior experience, and other forms of privilege.

If the test sorts on privilege, it is institutionalizing inequity. Poor kids go into one pile, and rich kids go into the other. Kids who have access to CS education go into one pile, everyone else into another.

Why build the test at all? To build a test to sort people into classes presumes that there are differences that cannot be mitigated by teaching. Building such a test presumes that there is a constant answer to “What does it take to succeed in CS1?” If we had such a test, would the results be predictive for classes that both use and don’t use pair programming? Peer instruction? Parsons problems?

I suggest a different perspective on the problem. We can get so much further if we instead improve success in CS1. Let’s make the introductory course one that more students will succeed in.

There’s so much evidence that we can improve success rates with better teaching methods and revised curriculum. Beth Simon, Leo Porter, Cynthia Lee, and I have been teaching workshops for the last four years to new CS faculty on how to teach better. It works: I learned Peer Instruction from them, and use it successfully today. My read on the existing literature suggests that everyone benefits from active learning, and the less privileged students benefit the most (see Annie Murphy Paul’s articles).

One of the reasons why spatial reasoning is so interesting to explore is that (a) it does seem related to learning computing and (b) it is teachable. Several researchers have shown that spatial skills can be increased relatively easily, and that the improved skills are long-lasting and do transfer to new contexts. Rather than sort people, we’re better off teaching people the skills that we want them to have, using research-informed methods that have measurable effects.

Bottom line: We are dealing with extraordinary enrollment pressures in CS right now. We do need to try different things, and offering multiple introductory courses is a good idea. Let’s manage the enrollment with the best of our research results. Let’s avoid institutionalizing inequities.



10 Comments

1. Megan | April 1, 2019 at 12:10 pm

It sounds almost as if the group is planning on using some cognitive diagnostic assessment to assess ability and sort students. If a multidimensional assessment is used, students may be “sorted” based on their abilities in different domains and targeted remediation can be employed within a course. Alternatively, if the approach as apparently described (remedial/normal) in a unidimensional assessment of ability/skills is taken, remediation can still be applied in that normal course, as you have alluded to.

The issue in that case really becomes the design of the remediation tools/modules and the adoption of them into a course (both by the instructor and the student).

Of course, SES and other variables may be associated with performance on one or more of these outcomes, which is why streaming the remediation within a course would be preferred.

IRT models would also be up to this task, if used appropriately.

See, e.g.,:

Embretson, S.E. (2012). A multicomponent latent trait model for diagnosis. Psychometrika, 78 (1), 14-36.

Gorin, J. (2007). Test design with cognition in mind. Educational Measurement, Issues and Practice, 4, 21–35.

Henson, R., Templin, J., & Willse, J. (2009). Defining a family of cognitive diagnosis models using log linear models with latent variables. Psychometrika, 74, 191-210.
2. BKM | April 1, 2019 at 1:01 pm

I actually considered applying to be on that working group, although the dates did not work for me. We are very much struggling with this. And I think this is where we get into the issue that different schools have different needs. I am at an institution where most of the students are first in their family to go to college, and Pell-eligible. Ethnically and racially, this is one of the most diverse schools in the country. Our students largely come from urban schools, and rarely come in with AP in CS, or really, any other computer science background. They have never done robotics teams or afterschool Scratch clubs, or anything else that is typical of more privileged students.

We only offer one version of the introductory sequence, but divide it across 3 semesters instead of 2 so the students have more time to learn the material. We have small classes and teach in a very hands-on style. And yet, we see tremendous disparities in their ability to make it through the introductory courses. And yes, we have bimodal distributions. I don’t care what the research says, because I see it firsthand every semester. The final grades in those introductory courses are heavily skewed to A’s, D’s, and F’s. Furthermore, the students getting the A’s complain that everything is too slow, and they often transfer out. So we are definitely trying to figure out how to tell in advance which students are going to struggle so that we could either place them differently or give them extra support in some way. Since no one has computing experience coming in, we need a way to do this that does not simply ask them about computing. We worry less about class and privilege than you might, because most students here have neither.
3. Alisha A. Waller | April 4, 2019 at 12:23 pm

I really agree with your social justice arguments and the caution against institutionalizing inequity. I’m curious though, how are you measuring “spatial reasoning”? I’ve long suspected that the Purdue spatial rotations test suffers from biases similar to those of the Physics Concept Inventory. Does a student’s past experience with the materials being manipulated in space, e.g. Lego bricks versus fabric, influence their scores on tests of spatial reasoning?
- 4. Mark Guzdial | April 4, 2019 at 12:29 pm
  
  We have used the Purdue test in the past, but share your suspicions. We’re now trying to invent new measures. See Amber Solomon’s ICER paper on developing a taxonomy of gesture in CS classes, and Katie Cunningham’s ICER paper on sketching and tracing. We’re looking at gesture, sketching, and spatial language as indicators of spatial thinking.
5. Jean Lebonpain | April 8, 2019 at 3:37 am

I have a few problems with this post, that are obvious from the title «The problem with sorting students into CS classes: We don’t know how, and we may institutionalize inequity».

First, of course there are problems with sorting students into CS classes. There are problems with everything complex, especially when humans are involved. It is all a matter of tradeoffs: are the problems with sorting students into CS classes worse than the problems with not sorting students into CS classes?

Second, “we don’t know how”. Again, it is simplistic to treat such know-how as binary: either we know how to do it perfectly or we don’t know how. Again, it is a matter of tradeoffs (cf. supra): given the way we could sort students into CS classes, would the costs outweigh the benefits? For whom (e.g., “false positive” vs. “false negative”)?

Third, “we may institutionalize inequity” fails to understand the consequences of the distinction between correlation and causation. If, for whatever reason, something like the g factor of general cognitive abilities is correlated with something like socioeconomic status, then any assessment involving cognition “may institutionalize inequity”. Should we give up, then?

I have had students skipping my CS class telling me that it had nothing to do with me, but the pace was much too slow for them. I also had students struggle so much that the rest of the class resented them for bringing the class to a halt with their endless streams of questions because they could never seem to grasp the material.

It does a disservice to all students to pretend that they should all share the same class regardless of skills/abilities. That we can’t predict perfectly how students will fare (how much material they can cover, at which pace) or that such prediction would be correlated with socio-economic factors should not be an excuse to let the students down in trying to give them the best educational experience (as in, fit for them).
- 6. Mark Guzdial | April 8, 2019 at 8:39 am
  
  Thanks for visiting, Jean. I agree that the title doesn’t sell the story. I’m not great at coming up with titles that capture the story well. I do believe that the post expands on the title.
  
  Yes, it is worse to sort students than not to sort students. Let’s consider dyslexia. I’ve seen some estimates that 10-12% of the student population has dyslexia. Dyslexia impacts a student’s ability to read, and particularly in later grades, impacts a student’s ability to learn. But we don’t separate off dyslexic students in their own classes, because of the stigma and inequitable education that would likely result. Instead, we give the students what they need in those classes.
  
  As I describe in the post, we have yet to identify differences that make a difference (in terms of predicting student success) other than privilege. As Patitsas and Parker and Porter have shown us, it’s not clear that we can predict future success, and it’s not clear that students are separating into bimodal distributions. Yes, it’s hard, but it’s also not worth doing.
  
  Probably a better title would have been “Don’t sort CS students, teach them better.” That was the real point of my post. We don’t know how to predict success, but our ability to teach computer science well more than mitigates student differences.
  
  In your class where students were skipping, did you use peer instruction, pair programming, or POGIL? If not, then I suggest that you might include them. The best of our empirical results suggests that that’s how we provide the best CS educational experience.
  - 7. gasstationwithoutpumps | April 8, 2019 at 12:58 pm
    
    Dyslexic students are often separated from the rest of the students—not at the elite university level, where only the dyslexics who have learned well how to compensate remain, but at lower levels it is very common to separate students by reading level.
    
    For that matter, many university subjects sort students, either explicitly by denying entrance to courses to non-majors or by letting students self-select among different levels of the same material. It is common to have 3 or 4 different calculus series and 3 or 4 different physics series, for students who want to learn at different paces and to different depths. Why is this acceptable in physics and math, but not in computer science?
    - 8. Mark Guzdial | April 8, 2019 at 1:05 pm
      
      I don’t have the same problem with selection as I do with testing and enforcement when we don’t have the science to do the testing well.
    - 9. Mark Guzdial | April 9, 2019 at 8:51 am
      
      Kevin, I gave you a brief response from my cell phone yesterday. Now that I have a keyboard, let me try a better one, please.
      
      Dyslexic students are often given special training, but (at least in the schools with which I’m familiar) not put in separate classes. That’s an important distinction. Students who need remediation should get it, but they also need their social network.
      
      Separation by major is still a choice — the student chose the major. Harvey Mudd and Union College are successful with offering different types of CS1 at different paces/depths, and letting the student choose.
      
      I am opposed to testing students into separate classes, when the test will likely just measure privilege — especially when there’s such a better option available. Just teach better! Teaching with active learning makes everyone’s learning better.
      
      AERA was last week, and there were lots of relevant tweets. I liked this one especially.
      
      Teach all Ss like you teach your honors Ss. The minute you start thinking “This S can’t handle this math, I’m going to give them an easier one,” is the minute you have decided to be unjust. @TheJLV #NCTMSD2019 pic.twitter.com/RpYOn6jLDK
      
      — Nicole Bridge (@NicoleBridge1) April 5, 2019
      
      - 10. gasstationwithoutpumps | April 9, 2019 at 12:03 pm
        
        I understand the distinction you are making between students choosing what to learn and placement tests for students, but I’m not sure it holds up to close scrutiny.
        
        In many institutions, students do not get free choice of major: the “impacted” majors (like computer science) are often rationed by one means or another (first-come-first-served, separate admissions criteria, GPA, high-failure-rate gateways, …). So restrictions on courses by major are not entirely a self-selection mechanism and often measure privilege (knowing to apply early, taking advantage of appeal mechanisms, having had prior exposure to the material of gateway classes, …).
        
        Self-selection can also serve students badly, by students choosing an “easy” path rather than a more challenging one, resulting in a proliferation of light-weight courses to serve students who don’t “need” the real thing. Look at almost any physics or math department—there has been a proliferation of courses for biologists and others who need to put “physics” and “calculus” on their transcripts, but have no interest in learning the material and not much application of it in their field. The resulting courses are often poorly taught, as neither the teachers nor the students care much. The same could happen to CS courses (and may already be happening).
        
        About dyslexia—some schools do provide supplemental reading instruction, which is often insufficient and ineffective. Others just track students into different reading groups. Students with dyslexia are often sorted out—our town has a private school specializing in dyslexia, because the public schools aren’t really able to handle more than the mildest forms of dyslexia. (I know two students who attended that private school, and neither was well served by it.) At the level of R1 universities, we usually see only the students with milder dyslexia, because dyslexics with more severe versions have been weeded out by the K–12 system.
        
        Teaching better is always a desirable goal, but the advice to teach all students as if they were honors students is probably not sound. The honors students need challenges that pull them way out of their comfort zone—challenges that would cause less prepared or less dedicated students to give up. Finding the right level of challenge and encouragement for each student is difficult, particularly in large courses.


Computing Ed Research – Guzdial's Take