Archive for April 15, 2019

A Task-Specific Programming Language for Web Scraping in which learners are successful very quickly

One of the papers that has most influenced my thinking about task-specific programming languages is Rousillon: Scraping Distributed Hierarchical Web Data by Sarah Chasins, Maria Mueller, and Rastislav Bodik.

Rousillon is a programming by demonstration system that generates a program in Helena, a task-specific programming language for Web scraping (grabbing data out of Web pages). Below is a flow diagram describing how Rousillon generates a program in Helena.

Check out the Helena program at the far right of that diagram. Yeah, it’s a block-based programming language — for adults. The choice of blocks was made explicitly to avoid syntax errors. If the user wants to modify the synthesized code, she is guided to what can be changed — change a slot in a block, or delete or move a block.  It’s a purposeful choice to improve the user experience of programming with Helena.

Helena doesn’t do statistics. It doesn’t do visualizations. It does one thing extremely well.  Maybe it’s competition for R, because people do use R for Web scraping.  But it’s far easier to do web-scraping in Helena.

The below graph is the one that blew me away. They ran a study comparing Rousillon and Selenium, a comparable Web data scraping system. Everybody using Rousillon completed both tasks in a few minutes. Most people using Selenium couldn’t finish the tasks (if the bar goes all the way up to the top, people ran out of time before completing).

But here’s the part that is just astonishing. Notice the black border on some of the Selenium bars in that graph? Those are the people who knew Selenium already. NONE of the Rousillon users knew the language before hand. That time for the Rousillon users includes training — and they still beat out the Selenium users.  A task that a trained Selenium user can solve in 25 minutes, a complete novice can solve with Rousillon and Helena in 10 minutes.

Here’s a goal for task-specific programming languages for adult end-users: It should be easily learned. End-users want to be able to succeed at simple tasks nearly immediately. They want to achieve their task, and any time spent learning is time away from completing the task. It’s about reducing the costs of integrating programming into the adult’s context.

The goal isn’t exactly the same when we’re talking about learners in non-CS classes. I think it’s about balancing two challenges:

  • If the learning is generative and will be used again, then some additional learning is valuable. For example, learning about vectors in MATLAB and lists in Racket make sense — you’ll use them whenever you will use either language. It’s a concept that can be applied whenever you use those languages. (I have a blog post that talks about generativity and other goals for task-specific programming languages next week.)
  • But we don’t want to turn algebra, history, or economics classes into CS classes. We don’t want to make mathematics or social studies teachers feel like like they’re now teaching CS. When is it too much CS that you’re teaching?  Perhaps a rule of thumb is when the teacher is teaching more than they could learn in a single professional development session. That’s just a guess.

April 15, 2019 at 7:00 am 3 comments

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 10,184 other subscribers


Recent Posts

Blog Stats

  • 2,054,191 hits
April 2019

CS Teaching Tips