Hadi Partovi’s keynote at CrossRoads 2018: CS Ed Research needs to up its game

February 11, 2019 at 7:00 am 12 comments

I was scheduled to be on a panel at last year’s InfoSys Crossroads 2018 conference, but my father passed the week before, so I couldn’t make it.  Several people told me that I had to see the video of Hadi Partovi’s talk, because “he talks about CS Education Research.”  The facial expression suggested I should feel warned.

I enjoyed the talk. Hadi made some critical and completely fair statements about CS Ed Research today. I recommend watching.


The really research-y stuff is around 30 minutes into the talk.  He talks about CS competing with other fields (a point that Joan Ferrini-Mundy also made to ECEP), and that CS has to make the argument for itself. I most appreciated the point (around 31:50) that no CS Ed research meets the “What Works Clearinghouse” standards for “works without reservations.” He’s right, and it’s a fair criticism.  Maybe it’s too early, but at some point, we have to be able to meet the evaluation standards applied to all other subjects.  Code.org is stepping up their game, with the first evaluation demonstrating that their CSP curriculum significantly increases the rate at which students take and pass the AP CSP exam (see paper here).

I learned a lot from his talk, especially his points about how to scale. I love the advice to “Execute first, raise funding after.”  Since I’m trying to restart my research program at Michigan, I am thinking about that advice a lot. I hear Hadi saying that I need to be executing on something interesting first, before trying to sell funders on the interesting thing I want to do. Writing proposals about things I’m not executing on will probably not be convincing, and it delays my progress on the things I’m passionate about.

Of course, I vehemently agree with his argument that the future of CS Ed is to integrate CS into other subjects.




12 Comments

  • 1. alanone1  |  February 11, 2019 at 7:12 am

    But what if AP CSP is a very poor representation of computing, and especially of (real) “computer science”?

    • 2. Mark Guzdial  |  February 11, 2019 at 8:26 am

      A fair concern, but separable in my mind. We should be able to teach CSP. Our track record for teaching programming is pretty poor (e.g., some surveys suggest 30-70% failure-or-give-up rates every semester at many schools). Their results suggest that the Code.org curriculum is effective for teaching CSP. A necessary but not sufficient step to learning how to teach more.

    • 3. alfredtwo  |  February 11, 2019 at 10:40 am

      A lot of people in the CS Ed community focus too much on the AP courses IMHO. It is a wedge into getting CS in some schools where AP is highly valued but long term we can do better.

  • 4. Ben Shapiro  |  February 11, 2019 at 11:21 am

    Thanks for the summary.

    I think we should be very skeptical that meeting the What Works bar is a good goal. Striving for that as an indication of CS Ed Research making progress may lead us away from valuing the adaptive knowledge that teachers on the ground have, and toward celebrating efforts that constrain them rather than building systems of improvement that empower them.

    See Figure 1 here for some valuable contrasts: http://www.diplomasnow.org/wp-content/uploads/2015/07/what-is-improvement-science.pdf

    For more on this topic, see Bryk, Gomez, Grunow, & LeMahieu’s Learning to Improve. (Disclosure: Louis Gomez was my advisor)

    • 5. Mark Guzdial  |  February 11, 2019 at 11:35 am

      In my CS Education Research class, we just studied the Pea and Papert debate from “Educational Researcher” (30 years ago?). My students were kind of bummed that I didn’t give them a definitive answer of who was right and who was wrong. Papert was saying that we need to use technology to create a new kind of education, and we’re “technocentrist” to study the technology alone. Pea was saying that we shouldn’t assume that what we’re inventing works, and we should be carefully evaluating everything we put in front of kids. Both are right. It feels like we continually revisit these issues.

      “May lead us away” presumes a single-threaded research space. We can have efforts that explore how to achieve measurable progress, while we also push to empower teachers and re-invent educational culture and practices. We don’t all do the same things.

      • 6. alanone1  |  February 11, 2019 at 11:48 am

        Not to beat the poor horse further, but in many cases “careful evaluation” is not even close to being “careful enough”. For example, over quite a few years of longitudinal experience in schools we gradually found that it really takes about three years in a classroom with the same teacher(s) before a really “careful evaluation” can be accomplished.

        A good example where this didn’t happen was the Pea and Kurland Bank Street School “study” of LOGO. This paper had a huge negative influence on the possible use of LOGO, and it was much too narrow and short a view to justify the conclusions.

        “The softer the discipline the tougher you have to be.” I feel that so many things around computing and also computing education are very soft, and there is a general lack of toughness to deal with this in ways that will really make progress.

        Better characterizations of “what is actually needed,” and much longer and deeper experiments that involve the whole system of ideas and participants, are needed before most of this can be taken seriously.

        • 7. Ben Shapiro  |  February 11, 2019 at 11:57 am

          Thanks, both of you, for your responses.

          I agree with Pea’s main point in that writing, which you’ve ably summarized. I also agree with Alan that the evaluation was too narrow.

          To your point about things not needing to be single threaded. That’s true. But I also think there is a need to actively resist the What Works/RCT perspective on evaluation. To challenge its legitimacy. To declare: this is not something that is acceptable to us, so if you’re going to spawn a thread, let it not be that.

          The RCT paradigm is, in my opinion, the education equivalent of the No Deficit Spending framework that has for too long dominated the American political sphere. It is a powerful political tool, but not one that has led to greater equality and opportunity in our society. There are times when RCTs and deficit aversion make sense, but probably neither does in this context and at this scale.

          Additional reading for those who are interested: Penuel, W. R., & Fishman, B. J. (2012). Large-scale intervention research we can use. Journal of Research in Science Teaching, 49(3), 281-304.

          • 8. Mark Guzdial  |  February 11, 2019 at 9:24 pm

            Why, Ben? DBIR (Penuel and Fishman) is a terrific idea. In my work, I’ve never done an RCT, but have used DBR methods in several studies (most recently, Barb’s ICER paper on our ebooks), so I’m much more informed about DBR methods than RCT methods. But I don’t see the harm of aiming for RCT style studies. They are expensive, but compared to (say) military spending, they’re reasonable.

            I don’t see the connection between RCTs and No Deficit Spending. I don’t like the “What Works Clearinghouse” either — clearly, there are lots of things that work but don’t meet their standards. But I don’t see why it’s wrong to do randomized control trials, to do evaluations with a goal of avoiding confirmation bias, and to be honest and humble about what we might be getting wrong. It’s easy for researchers and teachers to believe things are working that aren’t really, to fool ourselves. Isn’t it worthwhile to spend effort to test our beliefs and assumptions?

            • 9. Ben Shapiro  |  February 12, 2019 at 10:16 pm

              The connection between the RCT discourse and the No Deficit Spending discourse is that both sound reasonable, but ultimately don’t pan out as well as one might expect.

              After all, of course we want to measure things to know if they “work”.

              And of course one, acting as an individual, shouldn’t get in the habit of spending more money than they have. But as the Keynesians point out, sometimes the returns to society over the longer term are worth it.

              As an example, I was in Wisconsin when Walker and his cronies did all they could to strip down spending on the university because “we just can’t afford it.” Except every dollar spent on the University of Wisconsin paid dividends many times over in terms of overall economic impact.

              In short, avoiding deficit spending might seem sensible but doesn’t necessarily pan out how one might expect.

              The same is true about RCTs. Yes, of course we should want to test our beliefs and assumptions. But there’s a big, consequential difference between us, as researchers, designing studies to test theories that we have about pedagogy, learning, or whatever, and us, working together with teachers, developing products that are actually implementable at scale. Just because we can show an effect in an RCT doesn’t mean we are actually producing answers of a form that will be taken up and used at scale.

              RCTs are all about comparing treatments, and part of that framework is the need to push hard for implementation with fidelity to the original plan (after all, if you don’t have fidelity, then how do you compare treatments to one another?). One could even drop a teacher (and their students) from a sample if the researcher judges them as so much lacking fidelity as to not really be a case of the treatment. What that means is that it’s a framework for research that ultimately devalues teachers’ agency, choice, and ability to adapt and modify things as they feel make sense for their students. It means that those well-motivated deviations are failures, noise that a researcher should screen out.

              But that’s a really weird way to think about making change in schools. As Cynthia Coburn’s work shows us, teachers have LOTS of power to resist innovations that they don’t want. You may have data showing that your innovation is the best innovation, but if a teacher doesn’t think that it’s constructed in a way that works for them, you’re not going to get very far in shaping what happens in the classroom.

              Don’t we want teachers that have expertise enough in the content matter, the curriculum, their students, to be able to adapt the content and curriculum for their students? Don’t we want researchers to listen to and study these adaptations?

              I want professional development, at scale, that helps teachers to do these things well AND that helps researchers to appreciate the perspectives, expertise, and needs that are motivating those adaptations. I want co-design partnerships that allow both kinds of stakeholders to share the knowledge that they have, whether that be the situated, narrative knowledge that teachers have, the quantitative evaluation data that researchers are trained to collect and analyze, and the domain expertise that both kinds of participants can contribute. That’s what DBIR and Improvement Science are all about. And it tends to be the opposite of what an RCT perspective gives you.

              • 10. Ben  |  February 12, 2019 at 10:26 pm

                P.S. To put it another way: go reread Ann Brown’s design experiments paper. The buzzing, blooming confusion is real. RCTs smooth all that away. But implementation requires dwelling in it.

        • 11. Mark Guzdial  |  February 11, 2019 at 9:18 pm

          I believe most people, including Roy Pea, would agree that the Bank Street study was blown out of proportion. For many years, Janet Kolodner and I had all of our learning sciences and technologies students read the chapters by Noss and Hoyles (1996) that dissect the Pea and Kurland studies and, much more interestingly, explore why these small-n studies still shut down Logo, even though they only showed a lack of impact and clearly stated that students didn’t learn enough Logo to expect much impact. There were larger political movements at work, probably similar to the ones that did in MACOS.

          The debate between Pea and Papert was interesting because it wasn’t just about those studies. They were arguing about bigger questions. What is careful enough? How much do we put students through without evidence of effectiveness? How do we avoid confirmation bias? The “Emperor of All Maladies” made me more of a proponent of evaluation. With the best intentions and theory, we can still get it wrong pretty spectacularly. You’re clearly right that we have to give interventions a serious opportunity to establish themselves, especially given all that we have learned about the sociocultural character of learning. The issues that Pea and Papert raised are still front and center in education, and especially in computing education.

          • 12. alanone1  |  February 12, 2019 at 1:22 am

            Hi Mark

            It’s not the small n that was the problem — unless it was the small n of the years needed for a good study of teachers and students — but the whole nature of the study.

            And, yes, there was probably larger politics involved.

            I agree with you that we need to test our beliefs and assumptions. I think we did more and longer studies than pretty much any group — and did learn a few things — but most of it was not worth publishing (from any reasonable scientific sense), so we didn’t.

            We were able to take this attitude because we only took funding that didn’t require results. When we finally couldn’t get “no strings” funding, we quit doing the research.

            As I said above: “The softer the discipline the tougher you have to be”. And this means both choosing the aims of the educational boost in a tough way, and also being tough about doing the actual work needed to assess.

            Our experience over many years indicated that “positive results” don’t necessarily mean one is doing something good, and “negative results” don’t necessarily mean one is doing something bad. This is partly from the near term difficulties, and partly from the deep needs for longitudinal studies — which are really difficult to do and get clear pictures from (and which funders flee from funding).

