Code Smells might suggest a different and better Notional Machine: Maybe students want more than one main()

March 18, 2019 at 7:00 am

There is a body of research that looks for “code smells” in Scratch projects. “Code smells” are characteristics of code that suggest a deeper problem (see Wikipedia description here). I have argued that these shouldn’t be applied to Scratch, that we’re confusing software engineering with what students are doing with computing (see post here).

One of the smells is having code lying around that isn’t actually executed from the Go button, the green flag in Scratch. The argument is that code that’s not executed from the Go button is unreachable. That’s a very main()-oriented definition of what matters. There was a discussion on Twitter about that “smell” and why it’s inappropriate to apply to Scratch. I know that when I program in GP (another block-based programming language), I often leave little bits of maintenance code lying around that I might use to set the world’s state.
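To make the main()-oriented definition concrete, here is a minimal sketch in Python of what such a check might look like. It is a deliberately naive model, not the implementation of any actual Scratch analysis tool: the `Script` class, the trigger names, and the example project are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Script:
    """A stack of blocks. `trigger` is None when no hat block is attached."""
    name: str
    trigger: Optional[str]  # hypothetical trigger names, e.g. "green_flag"


def unreachable_from_green_flag(scripts: List[Script]) -> List[str]:
    """Naive smell check: flag every script the green flag does not start."""
    return [s.name for s in scripts if s.trigger != "green_flag"]


# Hypothetical project: one script started by the flag, one loose
# maintenance script, and one script started by a key press.
project = [
    Script("start game", "green_flag"),
    Script("reset world state", None),
    Script("jump", "key_pressed"),
]

flagged = unreachable_from_green_flag(project)
print(flagged)
```

Under these assumptions the check flags the maintenance script, which the programmer left there on purpose and can still run by clicking on it.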

There’s another possibility for code lying around that isn’t connected and thus doesn’t execute properly: it should execute properly. There’s evidence that novice students are pretty comfortable with the idea of programs/functions/code chunks executing in parallel. They want more than one main() at once. It’s our programming systems that can’t handle this idea well. Our languages need to step up to the notional machines that students can and want to use.

For example, in Squeak eToys, it’s pretty common to create multiple scripts to control one object. In one such example, one script is continually telling the car to turn, and the other script is continually telling the car to go forward. The overall effect is that the car turns in circles.
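The two-script car can be imitated in a few lines of Python. This is only a sketch under stated assumptions: it models the scripts as taking turns, one step each per tick, which is a simplification and not a description of how eToys schedules scripts. The `Car` class, the 5-degree turn, and the 5-unit step are invented for illustration.

```python
import math


class Car:
    """Hypothetical stand-in for an eToys object with a position and heading."""

    def __init__(self):
        self.x = 0.0
        self.y = 0.0
        self.heading = 0.0  # degrees


def turn_script(car):
    # Script 1: keep turning by 5 degrees.
    car.heading = (car.heading + 5) % 360


def forward_script(car):
    # Script 2: keep moving forward by 5 units along the current heading.
    car.x += 5 * math.cos(math.radians(car.heading))
    car.y += 5 * math.sin(math.radians(car.heading))


def run_ticks(car, scripts, ticks):
    """Run one step of every script per tick; return the farthest distance reached."""
    farthest = 0.0
    for _ in range(ticks):
        for script in scripts:
            script(car)
        farthest = max(farthest, math.hypot(car.x, car.y))
    return farthest


car = Car()
farthest = run_ticks(car, [turn_script, forward_script], ticks=72)
print(round(car.x, 6), round(car.y, 6), car.heading, round(farthest, 1))
```

Neither script mentions a circle. After 72 ticks (72 × 5 = 360 degrees) the car is back where it started, having traced a full loop.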

I was on Kayla DesPortes’s dissertation committee (now at NYU!). She asked novice programmers to write a script to make two lights on an Arduino blink. She gave them the code to blink one light: In a Forever loop, they raise the voltage on a pin high, then wait a bit, then lower the voltage, then wait a bit. That makes a single light blink.

The obvious thing that more than half of the participants in her study did was to duplicate the code, either putting it in parallel or putting it in sequence. One block blinked the light on one pin, and the other block blinked the light on the other pin. However, both blocks were Forever loops. Only one script can execute on the Arduino at a time.

On the Arduino, what the students did was buggy. It “smelled” because the second or parallel Forever block would never execute.
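The difference between the two notional machines can be sketched in Python, with generators standing in for Forever blocks. This is an illustration, not the code from the study: the pin numbers 13 and 12 are assumptions, the waits are left out, and `run_in_parallel` is a simple round-robin stand-in for whatever parallel execution a student might have had in mind.

```python
from itertools import chain, islice


def blink_forever(pin):
    """One Forever block: endlessly yields the pin writes it would perform."""
    while True:
        yield (pin, "HIGH")
        yield (pin, "LOW")


def run_in_sequence(*blocks):
    # Single-threaded machine: the second block starts only after the
    # first one finishes, and a Forever block never finishes.
    return chain(*blocks)


def run_in_parallel(*blocks):
    # The machine the students seemed to expect: every block gets a turn.
    while True:
        for block in blocks:
            yield next(block)


# Look at the first 8 pin writes under each machine.
sequential = list(islice(run_in_sequence(blink_forever(13), blink_forever(12)), 8))
parallel = list(islice(run_in_parallel(blink_forever(13), blink_forever(12)), 8))

pins_seq = {pin for pin, _ in sequential}
pins_par = {pin for pin, _ in parallel}
print(pins_seq, pins_par)
```

The same two blocks are correct on one machine and buggy on the other: in sequence only pin 13 is ever written, while the round-robin version drives both pins.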

These examples suggest that parallel execution of scripts might be normal and even expected for novices. Maybe parallel execution is an attribute of a notional machine that is natural and even easier for students than trying to figure out how to do everything in one loop. Maybe concurrency is more natural than sequentiality.
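For comparison, here is roughly what “everything in one loop” asks of the programmer. It is a sketch in Python against a simulated millisecond clock, not Arduino code; the LED names, the toggle intervals, and the function name are all made up for the example.

```python
def blink_two_in_one_loop(toggle_a_ms, toggle_b_ms, total_ms, tick_ms=1):
    """Blink two LEDs from a single loop by tracking when each toggles next."""
    intervals = {"A": toggle_a_ms, "B": toggle_b_ms}
    state = {"A": False, "B": False}
    next_toggle = dict(intervals)  # first toggle happens one interval in
    log = []
    now = 0
    while now < total_ms:
        now += tick_ms  # simulated clock; real hardware would read a timer
        for led in ("A", "B"):
            if now >= next_toggle[led]:
                state[led] = not state[led]
                next_toggle[led] += intervals[led]
                log.append((now, led, state[led]))
    return log


log = blink_two_in_one_loop(toggle_a_ms=300, toggle_b_ms=500, total_ms=1500)
a_toggles = [entry for entry in log if entry[1] == "A"]
b_toggles = [entry for entry in log if entry[1] == "B"]
print(len(a_toggles), len(b_toggles))
```

The loop works, but the programmer now has to invent per-light bookkeeping (`next_toggle`, `state`) that the two-Forever-block version never needed.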

Something that “smells” to a software engineer might actually be easier to understand for a layperson.

14 Comments

  • 1. alanone1  |  March 18, 2019 at 7:08 am

    This is quite nuts! (It should be clear just what really smells out there …)

    It also happens that one of the results of Etoys was that children especially did much better with parallel logic than they did with lots of conditionals (this makes perfect sense if one thinks about it for a bit).

  • 2. ukstemblog  |  March 18, 2019 at 8:21 am

    Hi Mark, this is interesting as I’ve just concluded some similar research based on the ‘Crumble’ platform. Looking at orphaned blocks to see if there was a link between this dead code lying around and eventual success of their code. It was the first step in developing a tool to direct classroom support to where it is most needed. It seems like there isn’t or at least I couldn’t find one. So I’m looking at other behaviours now to see if there are other aspects I should be looking at.

  • 3. orcmid  |  March 18, 2019 at 9:55 am

    The blinkers example strikes me as not standing far enough back to envision the situation.

    If they blink alternately, as on a typical light bar, one can start them in opposite states and blink them together inside one loop with whatever delays make that work. There are other ways to do that.
    To create more arbitrariness, and any number of blinkers, it is useful to create events, possibly firing of interval timers. That is, each blinker is tied to a separate timing event and one can experiment with varying the timing interval, using randomness, adding colors, all of that.

    What simply applying parallelism de novo does is compound an existing absence of rationality, because the problems of when to have it and when not to have it, and of synchronization, become even greater mysteries.

    When you create “dead” code that you can exercise interactively in testing or anything else, do you have a way to annotate what that is for so it doesn’t smell?

    Learning to distinguish what code is versus what it is for (except, maybe when toying around and simply exploring the tool) seems like something you want to provide a gentle nudge toward from the beginning, creating an opening for the recognition of abstraction and layering. Isn’t that an important element of CS and so-called computational thinking?

    • 4. Mark Guzdial  |  March 18, 2019 at 10:16 am

      Dennis, I suggest that you read the Commonsense Computing papers (by Gary Lewandowski, Beth Simon, and others). People find parallelism quite natural.

      What does “standing far enough back” mean for a novice? There are no levels of abstraction yet, nor layering. You may be thinking about this as an expert software engineer, and this is falling within your expert blind spot.

      • 5. orcmid  |  March 18, 2019 at 11:39 am

        It is of course the case that starting computing as a 19-year-old in 1959 does not have me experienced with the roadmap you favor.

        By the way, multiple main() is strange question-begging. That is an odd extremely-low-level consideration. Seems like tinkering to me.

        I notice how often you ask me to read someone else. Links would be very handy. Such as doi:10.1145/1785414.1785438 (ACM digital library) or https://www.researchgate.net/publication/234801018_Commonsense_Understanding_of_Concurrency_Computing_Students_and_Concert_Tickets

        I came up with “Commonsense Understanding of Concurrency: Computing Students and Concert Tickets,” C.ACM 53, 7 (July 2010), 60-70. That’s a great paper. It is about CS1-taking undergraduates, yes? The Concert Tickets example is terrific, and the discussions are wonderful about “stepping” back and dealing with shared-data concurrency issues, and they even got into the seat-selection problem as well as the ticket purchase problem. It is interesting to me that in the initial study that provoked the larger one, the results are quite different. The real-world setting and formulation, separated from programming as such, seems to be the benefit of the second study. I recommend this paper too.

        These students identified the problems of race conditions (whether knowledgeable of the term) and also the problem of particular seats. They demonstrated conceptual knowledge and proposed strategies, but all had problems around the in-progress reservations of a seat that might not be completed. (Counting tickets seems easier; handling concurrent choices shows how it is not.) A great grounded problem. There was no programming or exploration of the mechanisms available in support of concurrency (not just parallelism). The conclusion is that it might be helpful to introduce such things earlier in the curriculum before spending so much on fluency with deterministic procedures first.

        To me, it is an amazing inference to go from this to conclusions about people finding parallelism quite natural. I refer you back to the initial study in that paper where there was too much assumption of atomicity and waving away of the prospect of race conditions.

        Or, perhaps, you have another paper in mind?

        • 6. orcmid  |  March 18, 2019 at 11:43 am

          PS: The brief abstract, “Innate understanding of concurrency helps beginners solve CS problems with multiple processes executing at the same time.” leaves too much to the imagination, I fear, although the text is much more circumspect, befitting the care of the particular study.

    • 7. gasstationwithoutpumps  |  March 18, 2019 at 11:08 am

      Running two different blinky lights with different unrelated periods is easy in a parallel system, but much harder in a single-thread system. You have to introduce all the machinery needed to implement parallelism (task switching, event schedulers, interrupts, …) in order to do a simple task like having one blink at π times a second and one at once every π seconds.

      Although there is some benefit to students learning how parallelism is implemented on a single-thread machine, that is probably not the best place to start.

      A lot of real hardware and software is inherently parallel and enforcing single-thread behavior is a difficult problem, both at a low level (metastability and synchronizers for signals that cross clock-domain boundaries) and at a high level (cache coherency in distributed databases).

    • 8. orcmid  |  March 23, 2019 at 11:29 am

      Point of clarification: I do not propose introduction of layers of abstraction specifically (and I don’t think of it as limited to computational arrangements). I said that standing back and addressing what a computation is for rather than what it is provides an opening for recognition of that kind of thing, related to the purposive use of computations. It is useful to foster that informally as a counter to simple fascination with what a computation is.

      Context matters and I would hope that is emphasized relatively early.

  • 9. Raul Miller  |  March 18, 2019 at 11:13 am

    Thinking about this from my perspective (decades of computing work):

    There are several approaches to the “more than one main” concept.

    1) main (the real one) can dispatch to the alternates. But you (or someone) need(s) to decide how that part works.

    2) In a repl environment (or a debugging environment or a development environment, or …) alternate code paths can be brought up by the user. Here, colorizing or otherwise marking off the supposed “unreachable” stuff can be wise.

    3) In a finished product, the use of eval or other such “bad practices” (pointers, user constructed names, etc.) can bring in “dead code”.

    4) When it’s unnecessary, dead code can be painful and unpleasant.

    5) the details of getting two “forever loops” to run “simultaneously” tend to be “advanced topics” (time slicing schedulers or parallel machines). That said, if she had had two arduinos that would have been a simple approach.

    6) On the internet (an environment which includes a few hostile people, a lot of friendlies, and also people making decisions who don’t adequately understand…) any of the above variant perspectives can take a turn for the worse.

  • 10. orcmid  |  March 18, 2019 at 11:45 am

    LOL. After reading @gasstationwithoutpumps and Raul, I think I can be excused from operating from a Software Engineering expert blind spot.

  • 11. orcmid  |  March 18, 2019 at 11:59 am

    PPS. I am curious why the eToys example is thought to be about parallelism, rather than the separate setting of two properties on the same object. Is this so much about timing, or is it just about the means of interactively specifying those (dynamic?) properties?

  • 12. Mark Miller  |  March 18, 2019 at 8:36 pm

    Mark, although the terminology was less “colorful,” the idea dates back to at least Ira Goldstein, whose system Mycroft used “Rational Form Criteria” to suggest possible bugs in student Logo programs. I don’t think it applies to children programming in the same sense that it applies in a professional software engineering context, but if we want to build smarter systems that can help children solve their own bugs by asking Socratic questions, say, then these sorts of things can be helpful. For example, if a student program to draw a figure with the turtle has fd(100) followed immediately by fd(100), the odds are pretty good that they might have wanted a rt(90) or other rotation in between. A beginner might not realize that values other than 100 are permissible, but even after a day or so, the student would simply combine the distances to make fd(200) if that’s what they actually wanted. So, bottom line, I think there can be a role for systems that can analyze student code in this way. Best, Mark

  • 13. Mike  |  March 20, 2019 at 3:57 pm

    I’m curious about novices and parallelism. In particular, it seems like the examples above (turn AND press the gas pedal; blink the left AND blink the right LEDs) fall into the ‘Embarrassingly Parallel’ category of parallel task.

    I’m curious: how well do novices do when one needs synchronization to coordinate more complicated work?

    I guess there’s an implied question here too: do we care how well they do with synchronization? “Wouldn’t it be great if easy parallel problems were easy, and we’ll cover the harder ones later on in the curriculum like we’re currently doing” seems like a reasonable answer.

  • […] programming. They are going to solve different problems for different purposes in different ways (a point I made in this blog post several years ago). Few US teachers in K-12 are taught how to teach good software engineering practice — […]

