No, Really – Programming is Hard and CS Flipped Classrooms are Complicated: ITICSE 2016 Award-Winning Papers
I only recently started digging into the papers from the ITICSE 2016 conference (see Table of Contents link at ACM Digital Library here). There were two papers that caught my attention.
First, the best paper award went to one of my former PhD students, Brian Dorn: An Empirical Analysis of Video Viewing Behaviors in Flipped CS1 Courses, by Suzanne L. Dazo, Nicholas R. Stepanek, Robert Fulkerson, and Brian Dorn. Brian has this cool piece of technology where students can view videos, annotate them, be challenged to answer questions from specific places in the videos, and have discussions. They used it for teaching a flipped CS1 class, where students were required to watch videos before class and then engage in more active learning opportunities in class. The real trick, as you might imagine (and as the paper details), is getting students to watch the videos at all. I liked both the techniques for prodding students to watch the videos and the fascinating results showing the relationship between watching the videos and learning.
ITICSE 2016 recognized two “commended” papers this year. I haven’t found the listing of which papers they were, but I did learn that one of them is Learning to Program is Easy by Andrew Luxton-Reilly. I enjoyed reading the paper and recommend it, even though I disagree with the conclusion captured in the title. He does a good job of exploring the evidence that programming is hard (and even uses this blog as a foil, since I’ve claimed several times that programming is hard), and overall the paper is a terrific synthesis of a bunch of computing education papers (40 references is a lot for a six-page ITICSE paper).
His argument that programming is easy has two parts:
- First, children do it. As he says in the abstract, “But learning to program is easy — so easy that children can do it.” That’s a false comparison: what children do when programming is not the same “programming” defined in most of the literature that Andrew cites. The evidence that programming is hard comes mostly from higher-ed CS classes, and what goes on in introductory University CS classes is dramatically different from what children do. We saw that in the WIPSCE 2014 Fields and Kafai paper, and those results were replicated in a recent ICER 2016 paper. These are two different activities.
- Second, what higher-education CS teachers expect at the end of the first course is too much. He presents significant evidence that students do achieve what CS teachers expect, but at the end of the second course, not the first. The paper from Morrison, Decker, and Margulieux supports the argument that students think and work very differently, and much more successfully, by the end of the second CS course than in the first.
I see Andrew’s argument as evidence that programming is hard. The problem is that Andrew doesn’t define the target: what level of ability counts as “programming”? I believe the level of ability described by the McCracken Working Group, by the FCS1/SCS1 exams, and by most teachers as the outcome of CS1 (all cited in Andrew’s paper) is the lowest level of “programming ability.” That it takes two courses to reach even that level is what I would call hard.
I’ve been reading a terrific book, Proust and the Squid: The Story and Science of the Reading Brain by Maryanne Wolf. It’s the story of how humans invented reading, how we teach reading, and how reading changes our brains (physically and in terms of cognitive function). Oral language is easy. We are literally wired for that. Reading is hard. We are not wired for that, and much of the invention of reading is about inventing how to teach reading. Unless you can teach reading to a significant part of your population, you don’t develop a literate culture, and your written language doesn’t succeed.
Much of the invention of written language is about making it easier to learn and teach, because learning to read is so hard. Have you ever thought about why our Latin alphabet is ordered? Why do we talk about our ABCs and sing a song about them? We don’t actually need the letters to be ordered in order to read. Ordering the alphabet makes it easier to memorize, and learning to read is a lot about memorization, about drill-and-practice that makes the translation of symbols to sounds to words to concepts effortless (or at least, all System 1, in Kahneman’s terms). That makes learning to read easier, but it is still a cognitively complex task that takes a significant amount of time to master. It’s hard.
Programming is hard like written language is hard. (It’s not even possible to program unless you already know how to read.) Programming is particularly hard because the concepts we’re mapping to are unfamiliar; they are not part of our daily experience. We only see programming as easy because of our expert blind spot: we have already learned these concepts and made those mappings. We have constructed understandings of iteration, conditional execution, and variable storage. It is difficult for experts to appreciate how hard it is to develop those concepts, and the evidence from children programming suggests that most children who program don’t have those concepts.
I remain unconvinced by Andrew’s argument, but I recommend the paper for a great summary of literature and an interesting read.
I have written about this Dagstuhl Seminar (see earlier post). The formal report is now available.
This seminar discussed educational outcomes for first-year (university-level) computer science. We explored which outcomes were widely shared across both countries and individual universities, best practices for assessing outcomes, and research projects that would significantly advance assessment of learning in computer science. We considered both technical and professional outcomes (some narrow and some broad) as well as how to create assessments that focused on individual learners. Several concrete research projects took shape during the seminar and are being pursued by some participants.
Learning Curves, Given vs Generated Subgoal Labels, Replicating a US study in India, and Frames vs Text: More ICER 2016 Trip Reports
My Blog@CACM post for this month is a trip report on ICER 2016. I recommend Andy Ko’s excellent ICER 2016 trip report for another take on the conference. You can also see the Twitter live feed with hashtag #ICER2016.
I write in the Blog@CACM post about three papers (and reference two others), but I could easily write reports on a dozen more; the work was that interesting and that well done. I’m going to give four more mini-summaries here, of results that are more confusing or surprising than the ones I included in the CACM blog post.
This year was the first time we had a neck-and-neck race for the attendee-selected award, the “John Henry” award. The runner-up was Learning Curve Analysis for Programming: Which Concepts do Students Struggle With? by Kelly Rivers, Erik Harpstead, and Ken Koedinger. Tutoring systems can track errors on knowledge components over multiple practice problems, and tutoring system developers can show these lovely decreasing error curves as students get more practice, curves which clearly demonstrate learning. Kelly wanted to see if she could do the same with open-ended editing of code, outside a tutoring system. She used AST nodes as a proxy for programming “concepts” and measured errors in the use of the various constructs. It didn’t work, for reasons Kelly explains in her paper. It was a nice example of an interesting and promising idea that didn’t pan out, with careful explanation to guide the next try.
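To make the learning-curve idea concrete, here is a minimal sketch (my own illustration, not Kelly’s actual analysis; the Use struct, the “ForLoop” concept label, and the toy data are all hypothetical). Given a log of which concept a student exercised at which practice opportunity, and whether that use was an error, we tally an error rate per opportunity; a downward-sloping curve is the signature of learning.

```cpp
// A minimal sketch of learning-curve analysis over code submissions.
// Assumes we've already extracted, per submission, which concepts
// (e.g., AST node types) were used and whether each use was an error.
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Use {
    std::string kc;   // hypothetical knowledge component, e.g. "ForLoop"
    int opportunity;  // 0th, 1st, 2nd... time practicing this concept
    bool error;       // was the construct used incorrectly?
};

int main() {
    // Toy data: three students whose "ForLoop" errors decrease with practice.
    std::vector<Use> log = {
        {"ForLoop", 0, true},  {"ForLoop", 0, true},  {"ForLoop", 0, false},
        {"ForLoop", 1, true},  {"ForLoop", 1, false}, {"ForLoop", 1, false},
        {"ForLoop", 2, false}, {"ForLoop", 2, false}, {"ForLoop", 2, false},
    };

    // concept -> per-opportunity (errors, attempts) tallies
    std::map<std::string, std::vector<std::pair<int, int>>> curves;
    for (const auto& u : log) {
        auto& c = curves[u.kc];
        if (c.size() <= static_cast<size_t>(u.opportunity))
            c.resize(u.opportunity + 1, {0, 0});
        c[u.opportunity].second += 1;              // one more attempt
        if (u.error) c[u.opportunity].first += 1;  // one more error
    }

    // The learning curve is the error rate at each successive opportunity;
    // if students are learning the concept, it should slope downward.
    for (const auto& [name, points] : curves) {
        std::cout << name << ":";
        for (const auto& [errs, total] : points)
            std::cout << " " << (100.0 * errs / total) << "%";
        std::cout << "\n";
    }
    return 0;
}
```

Part of what made Kelly’s result interesting is that real student code resists exactly this tidy mapping: AST nodes turn out to be a noisy stand-in for the concepts students are actually practicing.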
I mentioned in this blog previously that Briana Morrison and Lauren Margulieux had a replication study (see paper here), written with Adrienne Decker and using participants from Adrienne’s institution. I hadn’t read the paper when I wrote that first blog post, and I was amazed by their results. Recall that they had an unexpected result where changing contexts for subgoal labeling worked better (i.e., led to better performance) than keeping students in the same context. The weird contextual-transfer problems they’d seen previously went away in the second (follow-on) CS class. The weird result was replicated in the first class at this new institution, so we know it’s not just one strange student population, and now we know that it’s a novice problem. That’s fascinating, but it still doesn’t really explain why. Even more interesting: once the context-transfer issues went away, students did better when they were given subgoal labels than when they generated them. That’s not what happens in other fields. Why is CS different? It’s such an interesting trail that they’re exploring!
Mike Hewner and Shitanshu Mishra replicated Mike’s dissertation study about how students choose CS as a major, but in Indian institutions rather than US institutions: When Everyone Knows CS is the Best Major: Decisions about CS in an Indian context. The results that came out of the Grounded Theory analysis were quite different! Mike had found that US students use enjoyment as a proxy for ability: “If I like CS, I must be good at it, so I’ll major in that.” But Indian students already believed CS was the best major, and the social pressures were completely different. Indian students chose CS if they had no other plans. CS was the default choice.
One of the more surprising results was from Thomas W. Price, Neil C.C. Brown, Dragan Lipovac, Tiffany Barnes, and Michael Kölling, Evaluation of a Frame-based Programming Editor. They asked a group of middle school students in a short laboratory study (not optimal, but an acceptable starting place) to program in Java or in Stride, the new frame-based language and editing environment from the BlueJ/Greenfoot team. They found no statistically significant differences between the two languages in number of objectives completed, student frustration/satisfaction, or amount of time spent on the tasks. Yes, the Java students got more syntax errors, but that didn’t seem to have a significant impact on performance or satisfaction. I found that totally unexpected. This is a result that cries out for more exploration and explanation.
There’s a lot more I could say, from Colleen Lewis’s terrific ideas to reduce the impact of CS stereotypes to a promising new method of expert heuristic evaluation of cognitive load. I recommend reviewing the papers while they’re still free to download.
When I visited Mumbai for LaTICE 2016, I mentioned meeting Yogendra Pal. I was asked to be a reader for his thesis, which I found fascinating. I’m pleased to report that he has now graduated and his thesis, A Framework for Scaffolding to Teach Vernacular Medium Learners, is available here: https://www.cse.iitb.ac.in/~sri/students/#yogendra.
I learned a lot from Yogendra’s thesis, like what “vernacular medium learners” means. Here’s the problem that he’s facing (and that Yogendra faced as a student). Students go through primary and secondary school learning in one language (Hindi, in Yogendra’s personal case and in his thesis), and then come to University to study Computer Science. Do you teach them (what Yogendra calls “Medium of Instruction” or MoI) in English, or in Hindi? Note that English is pervasive in Computer Science, e.g., almost all our programming languages use English keywords.
Here’s Yogendra’s bottom-line finding: “We find that self-paced video-based environment is more suitable for vernacular medium students than a classroom environment if English-only MoI are used.” Yogendra uses a design-based research methodology: he measures the students, tries something based on his current hypothesis, measures them again, compares what he thought would happen to what he saw, revises his hypothesis, and then iterates. Some of the scaffolds he tested may seem obvious (like using a slower pace), but a strength of the thesis is that he develops a rationale for each of his changes and tests them. Eventually, he came to this surprising (to me) and interesting result: it’s better to teach in Hindi in the classroom, and in English when students are learning from self-paced videos.
The stories at the beginning of the thesis are insightful and moving. I hadn’t realized what a handicap it is to be learning English in a class that is taught in English. It’s obvious that the learners would struggle with the language. What I hadn’t realized was how hard it is to raise your hand and ask questions: maybe you have a question only because you don’t know the language, and maybe you’ll expose yourself to ridicule because you phrase the question wrong.
Yogendra describes solutions that the Hindi-speaking students tried, and where those solutions fell short. The students used English-to-English dictionaries; they didn’t want English-Hindi dictionaries, because they wanted to become fluent in English, but they needed help with the complicated (especially technical) words. They tried using online videos for additional explanations of concepts, but most of those were made by American or British speakers, and when you’re still learning English, switching from an Indian accent to another accent is a barrier to understanding.
The middle chapters are a detailed description of Yogendra’s attempts to scaffold student learning. He tried to teach all in Hindi, but some English technical terms (like “execute”) have no direct translation in Hindi. He selected other Hindi words to represent those technical terms, but the words he chose were unusual and not well known to the students. Perhaps the most compelling insight for me in these chapters was how important it was, to both the students and the teachers, that the students learn English, even when the Hindi materials were measurably better for learning in some conditions.
In the end, he found that Hindi-language screencasts led to statistically significantly better learning when the learners (who had received primary and secondary school instruction in Hindi) were in a classroom, but that English-language screencasts led to statistically significantly better learning when the learners watched the screencasts self-paced. When students are self-paced, they can rewind and re-watch things that are confusing, so it’s okay to struggle with the English. In the classroom, the lecture just goes on by, so it works best in Hindi for the students who learned in Hindi in school.
Yogendra tells a convincing story. It’s an interesting question of how these lessons transfer to other contexts. For example, what are the issues for Spanish-speaking students learning CS in the United States? In a general form, can we use the lessons from this thesis to make CS learning accessible to more ESL (English as a Second Language) learners?
I didn’t realize that Computing at School has its own YouTube channel: https://www.youtube.com/c/computingatschooltv
This episode, with Sue Sentance talking about computing education research, is particularly relevant for this blog: https://www.youtube.com/watch?v=T-NaxSaXtRA
Some of the articles mentioned by Sue (from Miles Berry on the CAS site):
- Lister, R., 2011. Concrete and other neo-Piagetian forms of reasoning in the novice programmer. In Proceedings of the Thirteenth Australasian Computing Education Conference - Volume 114 (pp. 9-18). Australian Computer Society.
- Cutts, Q., Cutts, E., Draper, S., O’Donnell, P. and Saffrey, P., 2010. Manipulating mindset to positively influence introductory programming performance. In Proceedings of the 41st ACM Technical Symposium on Computer Science Education (pp. 431-435). ACM.
- Sorva, J., 2013. Notional machines and introductory programming education. ACM Transactions on Computing Education, 13(2), p. 8.
- Zingaro, D. and Porter, L., 2015. Tracking student learning from class to exam using isomorphic questions. In Proceedings of the 46th ACM Technical Symposium on Computer Science Education (pp. 356-361). ACM.
- Werner, L. and Denning, J., 2009. Pair programming in middle school: What does it look like? Journal of Research on Technology in Education, 42(1), pp. 29-49.
- Price, T.W. and Barnes, T., 2015. Comparing textual and block interfaces in a novice programming environment. In Proceedings of the Eleventh Annual International Conference on International Computing Education Research (pp. 91-99). ACM.
- Weintrop, D. and Wilensky, U., 2015. To block or not to block, that is the question: Students’ perceptions of blocks-based programming. In Proceedings of the 14th International Conference on Interaction Design and Children (pp. 199-208). ACM.
- Margulieux, L.E., Catrambone, R. and Guzdial, M., 2013. Subgoal labeled worked examples improve K-12 teacher performance in computer programming training. In Proceedings of the 35th Annual Conference of the Cognitive Science Society (pp. 978-983).
- Kafai, Y.B. and Vasudevan, V., 2015. Constructionist gaming beyond the screen: Middle school students’ crafting and computing of touchpads, board games, and controllers. In Proceedings of the Workshop in Primary and Secondary Computing Education (pp. 49-54). ACM.
- Sentance, S. and Csizmadia, A., 2016. Computing in the curriculum: Challenges and strategies from a teacher’s perspective. Education and Information Technologies, pp. 1-27.
Preview ICER 2016: Ebooks Design-Based Research and Replications in Assessment and Cognitive Load Studies
The International Computing Education Research (ICER) Conference 2016 is September 8-12 in Melbourne, Australia (see website here). There were 102 papers submitted, and 26 papers accepted for a 25% acceptance rate. Georgia Tech computing education researchers are justifiably proud — we submitted three papers to ICER 2016, and we had three acceptances. We’re over 10% of all papers at ICER 2016.
One of the papers extends the ebook work that I’ve reported on here (see here where we made the ebooks available, and our paper on usability and usage from WiPSCE 2015). Identifying Design Principles for CS Teacher Ebooks through Design-Based Research (click on the title to get to the ACM DL page), by Barbara Ericson, Kantwon Rogers, Miranda Parker, Briana Morrison, and me, takes a design-based research perspective on our ebooks work. We describe our theory for the ebooks, then describe the iterations of what we designed, what happened when we deployed it (data-driven), and how we then re-designed.
Two of our papers are replication studies; I’m grateful to the ICER reviewers and community for seeing the value of replication. The first is Replication, Validation, and Use of a Language Independent CS1 Knowledge Assessment by Miranda Parker, me, and Shelly Engleman. This is Miranda’s paper expanding on her SIGCSE 2016 poster introducing the SCS1, a validated and language-independent measure of CS1 knowledge. The paper does a great survey of validated measures of learning, explains her process, and then presents what one can and can’t claim with a validated instrument.
The second is Learning Loops: A Replication Study Illuminates Impact of HS Courses by Briana Morrison, Adrienne Decker, and Lauren Margulieux. Briana and Lauren have both now left Georgia Tech, but they were still here when they did this work, so we’re claiming them. Readers of this blog may recall Briana and Lauren’s confusing SIGCSE 2016 results, which suggested that cognitive load in CS textual programming is so high that it blows away our experimental instructional treatments. Was that an aberration? With Adrienne Decker’s help (and her students as participants), they replicated the study. I’ll give away the bottom line: it wasn’t an aberration. One new finding is that, with respect to understanding loops, students who did not have high school CS classes caught up during the experiment with those who did.
We’re sending three of our Human-Centered Computing PhD students to the ICER 2016 Doctoral Consortium. These folks will be in the DC on Sept 8, and will present posters to the conference on the afternoon of Sept 9.
- Barbara Ericson will be presenting her results with Dynamically Adaptive Parsons Problems. I’ve seen some of the pilot study results from this summer, and they’re fascinating.
- Amber Solomon is just starting her second year working with me. She did the evaluation of the AR Design Studio classroom. Both she and I are fascinated by Steve Cooper’s ICER 2015 results, where spatial reasoning training influenced CS performance and reduced SES differences. She’s been doing a study on CS grades, SES, and spatial reasoning in a non-majors class. She’ll be presenting on The Role of Spatial Reasoning in Learning Computer Science.
- Kayla DesPortes works with my colleague Betsy DiSalvo on the learning that happens in MakerSpaces. She’s designing new kinds of physical interfaces to reduce cognitive load and improve learning when working with electronics, which she’ll be talking about at her poster: Learning and Collaboration in Physical Computing.
I needed to look up a paper on Andreas Stefik’s page the other day and came across this fascinating new paper from him:
Phillip Merlin Uesbeck, Andreas Stefik, Stefan Hanenberg, Jan Pedersen, and Patrick Daleiden. 2016. An empirical study on the impact of C++ lambdas and programmer experience. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). ACM, New York, NY, USA, 760-771.
(You can download it for free from his publications page: http://web.cs.unlv.edu/stefika/research.html.)
Since this is Stefik, he carefully describes what his paper is saying and what it’s not saying. For example, he and his students measured C++ lambdas versus iterators, in C++ (not a particularly pleasant syntax to work with).
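To make the comparison concrete, here is a minimal sketch of the two styles (the task, summing the even numbers in a vector, is my own illustration, not one of the study’s experimental tasks):

```cpp
// The same computation written in the two styles the study compared:
// an explicit iterator loop versus std::for_each with a lambda.
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v = {1, 2, 3, 4, 5, 6};

    // Iterator style: the control flow is spelled out step by step.
    int sum_iter = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it) {
        if (*it % 2 == 0) sum_iter += *it;
    }

    // Lambda style: the same logic passed as an anonymous function.
    // The capture list ([&sum_lambda]) and parameter syntax are extra
    // notation that novices must contend with.
    int sum_lambda = 0;
    std::for_each(v.begin(), v.end(), [&sum_lambda](int x) {
        if (x % 2 == 0) sum_lambda += x;
    });

    std::cout << sum_iter << " " << sum_lambda << "\n";  // prints: 12 12
    return 0;
}
```

Both versions express identical logic; what differs is the notation and the extra concepts (captures, anonymous functions) that the lambda form demands of a novice.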
The results are quite interesting. One graph in the paper is what caught my eye: for professionals, iterators and lambdas work just about the same, but for novices, iterators blow lambdas away. Lambda-using students took more time to complete tasks and received more compiler errors (though that might be a good thing, in terms of using the compiler to find and correct bugs). Most interesting was how the differences disappeared with experience. Quoting from the abstract:
Finally, experienced users were more likely to complete tasks, with or without lambdas, and could do so more quickly, with experience as a factor explaining 45.7% of the variance in our sample in regard to completion time.
This is an example of my “Test, don’t trust” principle (see earlier blog post). I was looking up Stefik’s paper because I received an email from someone who simply claimed, “And I’m using functional notation because it’s much easier for novices than procedural or object-oriented.” That may be true, but it ought to be tested.