Learning Curves, Given vs Generated Subgoal Labels, Replicating a US study in India, and Frames vs Text: More ICER 2016 Trip Reports
September 16, 2016 at 7:07 am
My Blog@CACM post for this month is a trip report on ICER 2016. I recommend Amy Ko’s excellent ICER 2016 trip report for another take on the conference. You can also see the Twitter live feed with hashtag #ICER2016.
I write in the Blog@CACM post about three papers (and reference two others), but I could easily write reports on a dozen more. The findings were that interesting and that well done. I’m going to give four more mini-summaries here, where the results are more confusing or surprising than those I included in the CACM Blog post.
This year was the first time we had a neck-and-neck race for the attendee-selected award, the "John Henry" award. The runner-up was Learning Curve Analysis for Programming: Which Concepts do Students Struggle With? by Kelly Rivers, Erik Harpstead, and Ken Koedinger. Tutoring systems can track errors on knowledge concepts over multiple practice problems, and tutoring system developers can show those lovely decreasing error curves as students get more practice, which clearly demonstrate learning. Kelly wanted to see if she could do the same with open-ended editing of code, outside of a tutoring system. She tried using AST graphs as a proxy for programming "concepts," measuring errors in students' use of the various constructs. It didn't work, as Kelly explains in her paper. It was a nice example of an interesting and promising idea that didn't pan out, presented with a careful explanation to guide the next attempt.
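To make the learning-curve idea concrete, here is a minimal sketch (my illustration, not the authors' actual pipeline): treat each AST construct type in a submission as a "concept," log whether each attempt had an error, and compute a per-construct error rate across successive practice opportunities. Everything here (the function names, the simple cumulative error rate, the toy submissions) is an assumption for illustration only; the paper's analysis is considerably more sophisticated.

```python
import ast
from collections import defaultdict

def constructs_used(source):
    """Return the AST node types (e.g., 'For', 'If', 'FunctionDef') used in the code."""
    tree = ast.parse(source)
    return {type(node).__name__ for node in ast.walk(tree)}

# opportunities[construct] -> list of 0/1 error flags, in the order attempts were seen
opportunities = defaultdict(list)

def record_attempt(source, had_error):
    """Log one submission: every construct it exercises gets one more practice opportunity."""
    for construct in constructs_used(source):
        opportunities[construct].append(1 if had_error else 0)

def error_curve(construct):
    """Cumulative error rate after each practice opportunity for one construct."""
    flags = opportunities[construct]
    return [sum(flags[:i + 1]) / (i + 1) for i in range(len(flags))]

# Hypothetical example: two attempts at a loop, the second one correct.
record_attempt("for i in range(3):\n    print(i)", had_error=True)
record_attempt("for i in range(3):\n    print(i * 2)", had_error=False)
print(error_curve("For"))  # [1.0, 0.5] -- the error rate drops with practice
```

Even this toy version exposes the attribution problem: an erroneous submission counts as an error against every construct it contains, whether or not that construct actually caused the trouble.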
I mentioned in this blog previously that Briana Morrison and Lauren Margulieux had a replication study (see paper here), written with Adrienne Decker and using participants from Adrienne's institution. I hadn't read the paper when I wrote that first blog post, and I was amazed by their results. Recall that they had this unexpected result where changing contexts for subgoal labeling worked better (i.e., led to better performance) than keeping students in the same context. The weird contextual-transfer problems that they'd seen previously went away in the second (follow-on) CS class (see the snapshot from their slides below). The weird result was replicated in the first class at this new institution, so we know it's not just one strange student population, and now we know that it's a novice problem. That's fascinating, but it still doesn't really explain why. Even more interesting: once the context-transfer issues went away, students did better when they were given subgoal labels than when they generated them. That's not what happens in other fields. Why is CS different? It's such an interesting trail that they're exploring!
Mike Hewner and Shitanshu Mishra replicated Mike's dissertation study about how students choose CS as a major, but in Indian institutions rather than US institutions: When Everyone Knows CS is the Best Major: Decisions about CS in an Indian context. The results that came out of the Grounded Theory analysis were quite different! Mike had found that US students use enjoyment as a proxy for ability ("If I like CS, I must be good at it, so I'll major in that"). But Indian students already thought CS was the best major, and the social pressures were completely different. So, Indian students chose CS if they had no other plans; CS was the default choice.
One of the more surprising results was from Thomas W. Price, Neil C.C. Brown, Dragan Lipovac, Tiffany Barnes, and Michael Kölling, Evaluation of a Frame-based Programming Editor. They asked a group of middle school students in a short laboratory study (not an ideal setting, but an acceptable starting place) to program in Java or in Stride, the new frame-based language and editing environment from the BlueJ/Greenfoot team. They found no statistically significant differences between the two languages in number of objectives completed, student frustration/satisfaction, or amount of time spent on the tasks. Yes, the Java students got more syntax errors, but that didn't seem to have a significant impact on performance or satisfaction. I found that totally unexpected. This is a result that cries out for more exploration and explanation.
There’s a lot more I could say, from Colleen Lewis’s terrific ideas to reduce the impact of CS stereotypes to a promising new method of expert heuristic evaluation of cognitive load. I recommend reviewing the papers while they’re still free to download.
Entry filed under: Uncategorized. Tags: computing education research, educational psychology, ICER, learning sciences.
1. Michael Kölling | October 24, 2016 at 7:13 am
Hi Mark,
Just a quick remark about one specific aspect of your post: the evaluation of the frame-based editor.
I think that your summary here is a little too pessimistic. It's true there was no difference in subjective satisfaction, but the Stride group (frame editor) achieved more of the objectives than the Java (text editor) group in the same time (Figure 5 in the paper). For some of the later objectives, the difference is quite significant.
The Java group also spent more time idle towards the end of the experiment, while the Stride group continued to work more effectively. (It is not clear what causes this. Lost interest? Difficulty? Worth a look.)
It’s true that it would be nice to repeat a similar study with more participants to get more reliable data, but for a first study this looks quite promising to me!
Regards,
Michael
2. Mark Guzdial | October 24, 2016 at 9:47 am
Hi Michael,
You (i.e., All y’all, since I’m here in the South) say in the paper:
Yes, it’s more, but it’s not statistically significantly more. So, it could be spurious. As you say in the paper, yes, there are big differences on later objectives, but it’s also the case that fewer Java students even attempted those, so it’s not really a fair comparison.
I’m not at all pessimistic. It’s fascinating to study both frame-based and blocks-based programming languages. I’m surprised that Stride didn’t blow Java out of the water. That’s not an anti-Stride statement. That’s a statement about how interesting and complicated student performance in programming is.
Cheers,
Mark
3. How to Teach Computational Literacy/Thinking: Wolfram’s Language and Code.org’s Response | Computing Education Blog | October 31, 2016 at 7:18 am
[…] What gets used in daily practice by professionals is the result of historical and cultural factors that in no way imply that we made choices optimized for what is best for learners. Fortran won over Lisp because (in part) we didn’t know how to compile Lisp efficiently, but we do now and we know how to teach Lisp well. C++ and then Java won over Pascal because of perceptions of what industry wanted, not because of data and not because Pascal was shown to be ineffective for learners. What we know about what is “natural” for learners first thinking about programming strongly implies that Wolfram’s functional structures are easier for learners than loops and declarations. We should strive to make decisions for what we use in classrooms based on evidence, not on what is professional practice, nor what we decide based on social defense mechanisms. […]
4. Graduating Dr. Briana Morrison: Posing New Puzzles for Computing Education Research | Computing Education Blog | December 16, 2016 at 7:00 am
[…] At ICER 2016, she presented a replication study of her first given vs. generated subgoals study. […]