A Task-Specific Programming Language for Web Scraping in which learners are successful very quickly

April 15, 2019 at 7:00 am

One of the papers that has most influenced my thinking about task-specific programming languages is Rousillon: Scraping Distributed Hierarchical Web Data by Sarah Chasins, Maria Mueller, and Rastislav Bodik.

Rousillon is a programming by demonstration system that generates a program in Helena, a task-specific programming language for Web scraping (grabbing data out of Web pages). Below is a flow diagram describing how Rousillon generates a program in Helena.

Check out the Helena program at the far right of that diagram. Yeah, it’s a block-based programming language — for adults. The choice of blocks was made explicitly to avoid syntax errors. If the user wants to modify the synthesized code, she is guided to what can be changed — change a slot in a block, or delete or move a block.  It’s a purposeful choice to improve the user experience of programming with Helena.

Helena doesn’t do statistics. It doesn’t do visualizations. It does one thing extremely well. Maybe it’s competition for R, because people do use R for Web scraping. But it’s far easier to do Web scraping in Helena.
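To give a sense of what writing a scraper by hand in a general-purpose language involves, here is a minimal sketch in Python using only the standard library. It is not Helena or Selenium code, and it is not from the paper. The page fragment, the class names (`author`, `name`, `paper`), and the `scrape` function are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, invented for this example:
# each author block holds a name and one or more papers.
PAGE = """
<div class="author"><span class="name">Ada</span>
  <span class="paper">Notes</span><span class="paper">Sketches</span></div>
<div class="author"><span class="name">Grace</span>
  <span class="paper">Compilers</span></div>
"""


class AuthorPaperScraper(HTMLParser):
    """Collects (author, paper) rows from the hypothetical markup above."""

    def __init__(self):
        super().__init__()
        self.rows = []        # flattened (author, paper) pairs
        self._field = None    # class of the <span> currently being read
        self._author = None   # most recent author name seen

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._field = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # whitespace between tags
        if self._field == "name":
            self._author = text
        elif self._field == "paper":
            # Pair each paper with the author it is nested under.
            self.rows.append((self._author, text))


def scrape(html):
    """Return a list of (author, paper) tuples found in html."""
    parser = AuthorPaperScraper()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Calling `scrape(PAGE)` returns one row per paper, each paired with its author. Even for this toy page, the programmer has to know the page structure, track state across tags, and get the syntax right, which is the kind of overhead that demonstration plus blocks is meant to remove.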

The graph below is the one that blew me away. They ran a study comparing Rousillon and Selenium, a comparable Web data scraping system. Everybody using Rousillon completed both tasks in a few minutes. Most people using Selenium couldn’t finish the tasks (a bar that goes all the way to the top means that person ran out of time before completing).

But here’s the part that is just astonishing. Notice the black border on some of the Selenium bars in that graph? Those are the people who knew Selenium already. NONE of the Rousillon users knew the language beforehand. That time for the Rousillon users includes training, and they still beat out the Selenium users. A task that a trained Selenium user can solve in 25 minutes, a complete novice can solve with Rousillon and Helena in 10 minutes.

Here’s a goal for task-specific programming languages for adult end-users: It should be easily learned. End-users want to be able to succeed at simple tasks nearly immediately. They want to achieve their task, and any time spent learning is time away from completing the task. It’s about reducing the costs of integrating programming into the adult’s context.

The goal isn’t exactly the same when we’re talking about learners in non-CS classes. I think it’s about balancing two challenges:

  • If the learning is generative and will be used again, then some additional learning is valuable. For example, learning about vectors in MATLAB or lists in Racket makes sense, because those concepts apply whenever you use those languages. (I have a blog post coming next week that talks about generativity and other goals for task-specific programming languages.)
  • But we don’t want to turn algebra, history, or economics classes into CS classes. We don’t want to make mathematics or social studies teachers feel like they’re now teaching CS. When are you teaching too much CS? Perhaps a rule of thumb is when the teacher is teaching more than they could learn in a single professional development session. That’s just a guess.


3 Comments

  • 1. Raul Miller  |  April 15, 2019 at 11:55 am

    My one question here would have to do with the scope of that study.

    More specifically: are there tasks where Selenium’s approach gets people going faster than Helena’s approach? (If so, would those tasks be useful? To who? Would those be tasks with a long time-to-develop — indicating something, perhaps about the depth of the language — or would those tasks be “simple, easy stuff” with relatively broad applicability? Or would no one be able to find such tasks because of inherent limits of the architecture?) I do not know about any of this, but my exposure to analogous “x better than y” studies in the past nags at me to wonder about this kind of thing.

    I’m very encouraged by this study, and I’m going to try using Helena today on something (probably nothing important), to get a better feel for its capabilities. But I imagine it has some easy room for improvement, and I’m looking forward to seeing that, also.

    • 2. Mark Guzdial  |  April 16, 2019 at 1:29 pm

      I’m sure that you’re right. There’s got to be a limit to any task-specific programming language. I’m not sure that it’s about improving it. That’s like putting an electric motor on a toddler’s tricycle — yes, it’s faster and can go uphill easier, but it misses the point of what the tricycle was good for.

      Yesterday I got into a discussion with Arnold Pears about what a task-specific language is for, via the tweet about this blog post.

  • 3. zamanskym  |  April 17, 2019 at 8:31 am

    I love your last two points but as usual, am concerned with implementation and specifically what’s imposed by states and school boards.

    It’s still unclear what actual CS (or CT or whatever it is we’re talking about) is important and necessary for all students in K12 and while I agree that we don’t want to turn our ____ class into CS classes while leveraging CS to empower those students and teachers I suspect the beancounters will gladly say “yep, we’ve got CS in ___ class so we’re done with CS at this level.”

    We’ve already seen it happen.

