Language Choice = f(Number of Copies)

August 26, 2009 at 11:03 am 13 comments

Last night, a user reported a bug in our latest version of JES, the Jython IDE that we use in our Media Computation classes. In cleaning up the code for release, one of the developers renamed the short variable “pict” to “picture” in all but one spot. The function that broke (with a “name not found” error in the Jython function) is writePictureTo, a really important function for being able to share the images resulting from playing with Media Computation. This was particularly disappointing because this release was a big one (e.g., moving from one-based to zero-based indexing) and was our most careful development effort (e.g., long testing cycle with careful bug tracking). But at the end, there was a “simple clean-up” that certainly (pshaw!) wasn’t worth re-running the regression tests, or so the developer thought. And now, Versions 3.2.1 and 4.2.1 (for zero- and one-based indexing in the media functions) will be out later today.

This has got me wondering about the wisdom of developing an application used by hundreds, if not thousands, of students in Python (or Jython). I’ve done other “largish” (defined here, for a non-Systems-oriented CS professor, as “anything that takes more than three days to code”) systems in Python. I built a case library, called STABLE, which generated multiple levels of scaffolding from a small set of base case material. Running the STABLE generator was aggravating because it would run for a while…then hit one of my typos. Over and over, I would delete all the HTML pages generated so far, make the 5-second fix, and start the run all over. It was annoying, but it wasn’t nearly as painful as this bug, which requires everyone who downloaded JES 3.2/4.2 to download it again.

I’m particularly sensitized to this issue after this summer, where I taught workshops (too often) where I literally switched Python<->Java every day. I became aware of the strengths and weaknesses of each for playing around with media. Python is by far more fun for trying out a new idea, generating a new kind of sound or image effect. But this bug wouldn’t have happened in Java! The compiler would have caught the misnamed variable. I built another “largish” system in Squeak (Swiki), which also would have caught this bug at compile time.
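To make the failure mode concrete, here is a minimal sketch of a half-finished rename. It is not the actual JES source: the function name `write_picture_to`, its body, and the file names are stand-ins invented for illustration. The point is only that Python accepts the definition without complaint and reports the stale name when the function is finally called.

```python
def write_picture_to(picture, path):
    # Hypothetical stand-in for the broken function: the parameter was
    # renamed to `picture`, but this one line still uses the old name `pict`.
    return "writing %s to %s" % (pict, path)


# Defining the function above raised no error. Python looks up `pict`
# only when this line of the body actually executes.
message = None
try:
    write_picture_to("beach.jpg", "out.jpg")
except NameError as err:
    message = str(err)
    print("Failed at call time:", message)
```

Any code path that never calls the function never sees the error, which is why only a test that exercises it (or a user) finds the problem.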

My growing respect for good compilers doesn’t change my attitude about good first languages for students of computing.  The first language should be fun, with minimal error messages (even at compile time), with rapid response times and lots of opportunities for feedback.   So where does one make the transition, as a student?  Why is it important to have good compilers in one place and not in the other?

I am not a software engineering researcher, so I haven’t thought about this as deeply as they have. My gut instinct is that your choice of language is a function (at least in part) of the number of copies of the code that will ever exist. If you’re building an application that’s going to live on hundreds, thousands, or millions of boxes, then you have to be very careful, because correcting a bug is very expensive. You need a good compiler helping you find mistakes. However, if you’re building an application for the Web, I can see why dynamic, scripting languages make so much sense. They’re fun and flexible (letting you build new features quickly, as Paul Graham describes), and fixing a bug is cheap and easy. If there’s only one copy of the code, it’s as easy as fixing a piece of code for yourself.
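Part of what a compiler does here can be approximated by a tool that reads the code without running it. The following is a deliberately crude sketch under stated assumptions: it uses the standard `ast` module of present-day Python 3 (not the Jython that JES runs on), the helper name `unbound_names` and the `SAMPLE` program are invented for illustration, and it ignores scoping entirely by treating a name as known if it is bound anywhere in the file. Real checkers are far more careful.

```python
import ast
import builtins


def unbound_names(source):
    """Return sorted (line, name) pairs for names that are read somewhere
    in `source` but never bound anywhere in it (and are not builtins)."""
    tree = ast.parse(source)
    bound = set(dir(builtins))
    loads = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loads.append((node.lineno, node.id))
            else:  # Store or Del: the name is bound somewhere
                bound.add(node.id)
        elif isinstance(node, ast.arg):  # function parameters
            bound.add(node.arg)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.alias):  # import x.y / import x as z
            bound.add((node.asname or node.name).split(".")[0])
        elif isinstance(node, ast.ExceptHandler) and node.name:
            bound.add(node.name)  # except SomeError as name
    return sorted((line, name) for line, name in loads if name not in bound)


# Hypothetical program with one leftover use of the old name `pict`.
SAMPLE = '''
def write_picture_to(picture, path):
    data = encode(pict)
    return save(data, path)

def encode(picture):
    return str(picture)

def save(data, path):
    return (data, path)
'''

print(unbound_names(SAMPLE))
```

Because the check never executes the program, it can flag the leftover name even when no test calls the function. The trade-off is that it would miss a stale name that happens to be bound somewhere else in the file.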

First-time programmers should only be writing code for themselves.  It should be a fun, personal, engaging experience.  They should use programming languages that are flexible and responsive, without a compiler yelling at them.  (My students using Java always complain about “DrJava’s yelling at me in yellow!” when the error system highlights the questionable line of code.)  But they should also be told in no uncertain terms that they should not believe that they are creating code for others.  If they want to produce application software for others, they need to step up to another level of discipline and care in what they do, and that usually means new tools.

I still strongly believe that the first course in computing should not be a course in software engineering. Students should not have to learn the discipline of creating code for others while just starting to make sense of the big ideas of computing. The first course should be personal, about making code for your expression, your exploration, and your ideas. But when students start building code for others, engineering practice and discipline are required. Just don’t start there.



13 Comments

  • 1. Darrin Thompson  |  August 26, 2009 at 11:35 am

    Sample size of 1? As you said, you haven’t researched it and really neither have I. But I really like python so Internet custom clearly indicates I must flame you.

    Releases are the thing in software. The true computational value to whomever you serve is held up until you release software.

    If you have a lot of customers who demand things work well out of the box, well, you need a well disciplined process that learns from experience.

    If the best you can do is wonder if you should have written it all in another language then no worries. You won’t have so many customers soon.

    If you learn that your release process contains too much variation and you change the inputs to the system, or you remove some assignable cause of variation, well then you are cooking with gas.

    Release engineering is not very computational.

    There might be a _reason_ someone thought it a good idea to release changes with inadequate testing.

    There might be a _cultural_ problem, where people are consistently rewarded for committing without regression testing.

    There might be a _technical_ problem where the tests are too slow and have become a bottleneck that people have to work around to get work done.

    Bottom line, I like python. Don’t mess with python. Don’t say Java is better than python. 😉

    • 2. Mark Guzdial  |  August 26, 2009 at 11:51 am

      It’s an Internet custom to flame someone you disagree with? Talk about a _cultural_ problem!

      I really like Python. I don’t think Python solves all problems.

      • 3. Darrin Thompson  |  August 26, 2009 at 12:14 pm

        In all seriousness, I think you are biasing your sample to weight a brown paper bag error, easily avoided, far too high, and discarding the real email from the universe to you as spam.

        This is a Deming, not a Turing problem.

        And kudos for sharing. I’d probably have hidden a mistake like that in shame.

  • 4. Eugene Wallingford  |  August 26, 2009 at 12:02 pm

    I agree, Mark: the first course in computing should not be a course in software engineering. But I’d also say that, when working on code in an interpreted language, running the tests at build-time or check-in time is essential. When that becomes part of one’s standard practice, building software for others to use works out pretty well even though the language is interpreted. The challenge then becomes creating a strong test suite.

    • 5. Mark Guzdial  |  August 26, 2009 at 12:18 pm

      Agreed, Eugene. The tests should have been run. But people do make mistakes. The compiler catches that kind of error even without running the tests.

      • 6. Eugene Wallingford  |  August 26, 2009 at 1:11 pm

        True — people do make mistakes. I’m certainly not casting any blame here. I am intrigued by your software-for-self/software-for-others distinction. I like it when my compiler catches errors like this for me, but… still I am uneasy to give up the freedom that a language like Smalltalk, Scheme, Ruby, or Python gives me when I am writing code, even when the value of your f() rises.

        Maybe we need to build a refactoring browser for Python, so that a tool could at least help us see possible cases such as this one.

        Now I see in a later comment your mention of a trade-off between process and language, a la space vs time. Interesting. I want to think about that one!

  • 7. Bill Mill  |  August 26, 2009 at 12:03 pm

    So you had the proper process in place to catch an error like this, nobody ran it, and you blame the error on the language?

    Seems to me like a process problem in this case, not a language problem.

    • 8. Mark Guzdial  |  August 26, 2009 at 12:20 pm

      Isn’t there a process/language trade-off, sort of like a space/time tradeoff?

      • 9. Darrin Thompson  |  August 26, 2009 at 12:26 pm

        It wouldn’t be linear.

        This is going to be your most commented post _ever_.

        I suppose next you are going to tell us you think new students shouldn’t use vi?

      • 10. Bill Mill  |  August 26, 2009 at 4:06 pm

        I don’t think it’s all that useful to think of it as a tradeoff, since as Darrin points out their relationship is not linear. Neither, of course, is the relationship between space and time always linear, but often they are and that’s when the tradeoff model is useful for thinking about them.

        Rather, I think that you found one of the small class of bugs which a compiler will always catch but an interpreted program can allow to drift into runtime, and this class of bugs tempts many to say “use java to stop this class of bugs!”.

        Unfortunately, the much larger class of bugs not found by the compiler needs testing in any language. This is why I think you have a process problem; the bug you hit just happens to be in a small class of bugs caught by the compiler, but the deeper problem is that you made a release without testing it (AFAICT – please correct me if I’m wrong).

        I hope that, in the future, there will be more of a process/language tradeoff to speak of. I think that when we have a gradually typed post-Python post-Haskell language where modules migrate towards type safety as they mature, it will be more possible to speak of it, but that for now we’re stuck with less fundamental, more practical, constraints.


  • 12. Mark Guzdial  |  August 26, 2009 at 4:03 pm

    Eugene blogged on this post, and came up with a better statement than what I did: There is a trade-off between process and tools. You can generate great code for many people using Python, if you are disciplined in your process. Alternatively, you can use tools (including languages) that will ensure testing and type-checking and other kinds of checks, even if you are not disciplined enough in your process. To produce code for many, the checks have to be there, either from your discipline or from your tools.

  • 13. Lloyd Smith  |  August 28, 2009 at 6:24 pm

    Oddly enough, I was finishing up chapter 1 today and one thing I talked about was why Python is great for writing small programs, while C++ and Java are better for writing huge programs. I gave an example of misspelling a variable name – a C++ or Java compiler would catch it and give you an error message, but Python would happily give you a new variable. If that mistake happens in a million lines of code, it can be a nightmare to find it. On the other hand, it’s nice to skip the Java and C++ overhead when you don’t need it.

