The First Multi-Lingual, Valid Measure of CS1 Knowledge: Allison Tew Defends

August 19, 2010 at 6:42 am 17 comments

Allison Elliott Tew has been working for five years to be able to figure out how we can compare different approaches to teaching CS1.  As Alan Kay noted in his comments to my recent previous post on computing education research, there are lots of factors, like who is taking the class and what they’re doing in the class.  But to make a fair comparison in terms of the inputs, we need a stable measure of the output.  Allison made a pass in 2005, but became worried when she couldn’t replicate her results in later semesters.  She decided that the problem was that we had no scientific tool that we could rely on to measure CS1 knowledge.  We have had no way of measuring what students learn in CS1, in a way that was independent of language or approach, that was reliable and valid.  Allison set out to create one.

Allison defends this week.  She took a huge gamble — at the end of her dissertation work, she collected two multiple choice question exams from each of 952 subjects.  If you get that wrong, you can’t really try again.

She doesn’t need to. She won.

Her dissertation had three main questions.

(1) How do you do this?  All the standard educational assessment methods involve comparing new methods to old methods in order to validate them.  How do you bootstrap a new test when one has never been created before?  She developed a multi-step process for validating her exam, and she carefully defined the range of the test using a combination of text analysis and curriculum standards.

(2) Can you use pseudo-code to make the test language-independent?  First, she developed 3 open-ended versions of her test in MATLAB, Python, and Java, then had subjects take those.  By analyzing those, she was able to find three distractors (wrong answers) for every question that covered the top three wrong answers in each language — which by itself was pretty amazing.  I wouldn’t have guessed that the same mistakes would be made in all three languages.

Then she developed her pseudo-code test.  She ran subjects through two sessions (counter-balanced). In one session, they took the test in their “native” language (whatever their CS1 was in), and in another (a week later, to avoid learning effects), the pseudo-code version.

The pseudo-code and native language tests were strongly correlated.  The social scientists say that, in this kind of comparison, a correlation statistic r over 0.37 is considered the same test.  She beat that on every language.

Notice that the Python correlation was only .415.  She then split out the Python CS1 with only CS majors, from the one with mostly non-majors.  That’s the .615 vs. the .372 — CS majors will always beat non-majors.  One of her hypotheses was that this transfer from native code to pseudo-code would work best for the best students.  She found that that was true.  She split her subjects into quartiles and the top quartile was significantly different than the third, the third from the second, and so on.  I think that this is really important for all those folks who might say, “Oh sure, your students did badly.  Our students would rock that exam!”  (As I mentioned, the average score on the pseudo-code test was 33.78%, and 48.61% on the “native” language test.)  Excellent!  Allison’s test works even better as a proxy test for really good students.  Do show us better results, then publish it and tell us how you did it!

(3) Then comes the validity argument — is this testing really testing what’s important?  Is it a good test?  Like I said, she had a multi-step process. First, she had a panel of experts review her test for reasonableness of coverage. Second, she did think-alouds with 12 students to make sure that they were reading the exam the way she intended.  Third, she ran IRT analysis to show that her problems were reasonable.  Finally, she correlated performance on her pseudo-code test (FCS1) with the final exam grades.  That one is the big test for me — is this test measuring what we think is important, across two universities and four different classes?  Another highly significant set of correlations, but it’s this scatterplot that really tells the story for me.

Next, Allison defends, and takes a job as a post-doc at University of British Columbia.  She plans to make her exam available for other researchers to use — in comparison of CS1 approaches and languages.  Want to know if your new Python class is leading to the same learning as your old Java class?  This is your test!  But she’ll never post it for free on the Internet.  If there’s any chance that a student has seen the problems first, the argument for validity fails.  So, she’ll be carefully controlling access to the test.

Allison’s work is a big deal.  We need it in our “Georgia Computes!” work, as do our teachers.  As we change our approaches to broaden participation, we need to show that learning isn’t impacted.  In general, we need it in computing education research.  We finally have a yardstick by which we can start comparing learning.  This isn’t the final and end-all assessment.  For example, there are no objects in this test, and we don’t know if it’ll be valid for graphical languages.  But it’s the first test like this, and that’s a big step. I hope that others will follow the trail Allison made so that we end up with lots of great learning measures in computing education research.

Entry filed under: Uncategorized. Tags: , , , , , .

Secondary school IT education failing in UK A New Classroom for a New Kind of Computing Student: Brian Defends

17 Comments Add your own

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trackback this post  |  Subscribe to the comments via RSS Feed


Enter your email address to follow this blog and receive notifications of new posts by email.

Join 7,982 other followers

Feeds

Recent Posts

Blog Stats

  • 1,794,436 hits
August 2010
M T W T F S S
 1
2345678
9101112131415
16171819202122
23242526272829
3031  

CS Teaching Tips


%d bloggers like this: