The First Multi-Lingual, Valid Measure of CS1 Knowledge: Allison Tew Defends
August 19, 2010 at 6:42 am 17 comments
Allison Elliott Tew has spent five years figuring out how we can compare different approaches to teaching CS1. As Alan Kay noted in his comments on my recent post on computing education research, there are lots of factors, like who is taking the class and what they’re doing in the class. But to make a fair comparison in terms of the inputs, we need a stable measure of the output. Allison made a pass in 2005, but became worried when she couldn’t replicate her results in later semesters. She decided that the problem was that we had no scientific tool that we could rely on to measure CS1 knowledge: no reliable, valid way of measuring what students learn in CS1 that was independent of language or approach. Allison set out to create one.
Allison defends this week. She took a huge gamble: at the end of her dissertation work, she collected two multiple-choice exams from each of 952 subjects. If you get a study of that size wrong, you can’t really try again.
She doesn’t need to. She won.
Her dissertation had three main questions.
(1) How do you do this? All the standard educational assessment methods involve comparing new methods to old methods in order to validate them. How do you bootstrap a new test when one has never been created before? She developed a multi-step process for validating her exam, and she carefully defined the range of the test using a combination of text analysis and curriculum standards.
(2) Can you use pseudo-code to make the test language-independent? First, she developed 3 open-ended versions of her test in MATLAB, Python, and Java, then had subjects take those. By analyzing those, she was able to find three distractors (wrong answers) for every question that covered the top three wrong answers in each language — which by itself was pretty amazing. I wouldn’t have guessed that the same mistakes would be made in all three languages.
Then she developed her pseudo-code test. She ran subjects through two sessions (counter-balanced). In one session, they took the test in their “native” language (whatever their CS1 was in), and in another (a week later, to avoid learning effects), the pseudo-code version.
The pseudo-code and native language tests were strongly correlated. The social scientists say that, in this kind of comparison, two tests with a correlation statistic r over 0.37 are considered the same test. She beat that in every language.
Notice that the Python correlation was only .415. She then split out the Python CS1 with only CS majors from the one with mostly non-majors. That’s the .615 vs. the .372. CS majors will always beat non-majors. One of her hypotheses was that this transfer from native code to pseudo-code would work best for the best students. She found that that was true. She split her subjects into quartiles and the top quartile was significantly different from the third, the third from the second, and so on. I think that this is really important for all those folks who might say, “Oh sure, your students did badly. Our students would rock that exam!” (As I mentioned, the average score on the pseudo-code test was 33.78%, and 48.61% on the “native” language test.) Excellent! Allison’s test works even better as a proxy test for really good students. Do show us better results, then publish it and tell us how you did it!
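To make the comparison concrete, here is a minimal sketch of how a correlation like this can be computed from paired scores, overall and per class. It uses the standard Pearson r, which is an assumption about the statistic; the post only calls it "a correlation statistic r". The scores, group labels, and function names are all made up for illustration and are not Allison's data or code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def r_by_group(records):
    """Group (label, native, pseudo) records by label and correlate each group."""
    groups = {}
    for label, native, pseudo in records:
        natives, pseudos = groups.setdefault(label, ([], []))
        natives.append(native)
        pseudos.append(pseudo)
    return {label: pearson_r(n, p) for label, (n, p) in groups.items()}


# Hypothetical records: (class label, native-language score, pseudo-code score).
records = [
    ("majors", 18, 14), ("majors", 22, 17), ("majors", 15, 10), ("majors", 25, 21),
    ("non-majors", 12, 9), ("non-majors", 16, 8), ("non-majors", 10, 7), ("non-majors", 14, 11),
]

overall = pearson_r([r[1] for r in records], [r[2] for r in records])
print(f"overall r = {overall:.3f}")
for label, r in r_by_group(records).items():
    print(f"{label}: r = {r:.3f}")
```

Splitting by class before correlating is what separates a single pooled Python number into one figure for the majors class and one for the mostly non-majors class.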
(3) Then comes the validity argument: is this test really testing what’s important? Is it a good test? Like I said, she had a multi-step process. First, she had a panel of experts review her test for reasonableness of coverage. Second, she did think-alouds with 12 students to make sure that they were reading the exam the way she intended. Third, she ran IRT analysis to show that her problems were reasonable. Finally, she correlated performance on her pseudo-code test (FCS1) with the final exam grades. That one is the big test for me: is this test measuring what we think is important, across two universities and four different classes? Another highly significant set of correlations, but it’s this scatterplot that really tells the story for me.
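For readers who haven't met IRT (item response theory), here is a rough sketch of the kind of question it asks of each problem. The post doesn't say which IRT model Allison used, so this assumes the common two-parameter logistic (2PL) model purely for illustration. The item names and parameter values are hypothetical and have nothing to do with the actual FCS1 items.

```python
from math import exp


def p_correct(ability, discrimination, difficulty):
    """2PL item response function: probability that a student answers correctly."""
    return 1.0 / (1.0 + exp(-discrimination * (ability - difficulty)))


# Hypothetical items as (discrimination, difficulty); not taken from the FCS1.
items = {
    "easy item": (1.2, -1.0),
    "hard item": (1.2, 1.5),
    "weak item": (0.2, 0.0),  # barely separates strong from weak students
}

for name, (a, b) in items.items():
    low = p_correct(-2.0, a, b)   # a weak student
    high = p_correct(2.0, a, b)   # a strong student
    print(f"{name}: P(correct) goes from {low:.2f} to {high:.2f}")
```

Under this model, a "reasonable" problem is one whose chance of a correct answer climbs clearly as student ability rises. An item whose curve stays nearly flat tells you little about who knows the material.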
Next, Allison defends, and takes a job as a post-doc at the University of British Columbia. She plans to make her exam available for other researchers to use in comparing CS1 approaches and languages. Want to know if your new Python class is leading to the same learning as your old Java class? This is your test! But she’ll never post it for free on the Internet. If there’s any chance that a student has seen the problems first, the argument for validity fails. So, she’ll be carefully controlling access to the test.
Allison’s work is a big deal. We need it in our “Georgia Computes!” work, as do our teachers. As we change our approaches to broaden participation, we need to show that learning doesn’t suffer. In general, we need it in computing education research. We finally have a yardstick by which we can start comparing learning. This isn’t the be-all and end-all assessment. For example, there are no objects in this test, and we don’t know if it’ll be valid for graphical languages. But it’s the first test like this, and that’s a big step. I hope that others will follow the trail Allison made so that we end up with lots of great learning measures in computing education research.
Entry filed under: Uncategorized. Tags: assessment, computing education research, GaComputes, Java, MATLAB, Python.
1.
gasstationwithoutpumps | August 19, 2010 at 10:45 am
My son’s first “official” computer science course at age 13 was in LISP, using Dr. Scheme. I don’t know how well the pseudo-code test would generalize to courses based on LISP dialects, rather than the very similar Java/Python/MATLAB languages.
(I suspect that my son would still have aced the test, since Dr. Scheme was not the first language he’d learned and he had a solid understanding of C, but several of the other students in the class were seeing Dr. Scheme as their first programming language.)
2.
Aaron Lanterman | August 20, 2010 at 1:31 am
That’s awesome – where was this?
3.
Mark Guzdial | August 20, 2010 at 9:19 am
It’s this morning at 10 am in TSRB first floor.
4.
Aaron Lanterman | August 21, 2010 at 9:33 pm
Ooops! I wasn’t clear – actually my “where was this” was referring to gasstationwithoutpumps – I was wondering where a 13-year old could get a CS course using Dr. Scheme…
5.
Two successful defenses, and off to the UK! « Computing Education Blog | August 20, 2010 at 12:29 pm
[…] 20, 2010 I am pleased to announce that Allison Elliott Tew and Brian Dorn both defended their dissertations successfully. Dr. Elliott Tew actually walked […]
6.
gasstationwithoutpumps | August 21, 2010 at 11:09 pm
My son got the intro to Dr. Scheme at a private school, Georgianna Bruce Kirby Preparatory School. He was an 8th grader, but most of the students in the class were 9–12th graders. He took it in place of the 8th grade science class that most students took (because he was certain he would be bored to tears in the science class). Based on what he heard from the other 8th graders, he made the right choice.
The class started with HTML and CSS for about a month, then spent the rest of the year on Dr. Scheme. They used an on-line text book How to Design Programs.
7.
Those SIGCSE Reviewers say the darnedest things! « Computing Education Blog | October 26, 2010 at 11:52 am
[…] I’m thrilled with how our group did. We submitted three papers: Allison’s on her dissertation work, Lijun’s on our community support for CS teachers, and Davide Fossati’s on his […]
8.
Heading off to SIGCSE 2011! « Computing Education Blog | March 7, 2011 at 9:01 am
[…] Allison Elliott Tew presents her dissertation in 6 pages and 25 minutes. This will be the first presentation of her instrument, the first language-independent, validated test of CS1 knowledg… — should lead to a wild discussion. Later in the afternoon, I’m on a panel about the […]
9.
Computer scientists need to understand education research methods for CE21 « Computing Education Blog | February 13, 2012 at 10:01 am
[…] and that you still achieve learning outcomes (against some reasonable measure of learning, like Allison’s test or the outcome measures being developed for CS:Principles). You don’t need to do a […]
10.
Could we replace all of CS1′s nationwide with one good on-line CS course? « Computing Education Blog | March 9, 2012 at 10:31 am
[…] Sophomore year? Getting more students into internships in the summer? Getting more access? Or getting more learning — and then, about CS knowledge or CS skills? And which knowledge and […]
11.
College Degree, No Class Time Required: Just Religious Faith in Tests « Computing Education Blog | February 11, 2013 at 1:14 am
[…] unsupported (almost religious) faith in our ability to construct tests, especially online tests. Building reliable and valid assessments is part of my research, and it’s really hard. Can I come up with assessments that are at least as good as having […]
12.
SIGCSE 2013 Preview: Measuring attitudes in introductory computing « Computing Education Blog | February 15, 2013 at 1:49 am
[…] general, we have too few research measures in computing education research. Allison’s dissertation work stands alone as the only validated language-independent measure of…. Brian and Allison have been following a careful process of developing the Computing Attitudes […]
13.
MOOC roundup | Gas station without pumps | July 28, 2013 at 1:04 pm
[…] unsupported (almost religious) faith in our ability to construct tests, especially online tests. Building reliable and valid assessments is part of my research, and it’s really […]
14.
A 10 year retrospective on research on Media Computation: ICER 2013 preview | Computing Education Blog | August 9, 2013 at 1:23 am
[…] would learn as much in MediaComp as in our traditional CS1 class. Answering that question led to Allison Elliott Tew’s excellent work on FCS1. The bottom line, though, is that we still don’t […]
15.
How to Learn Computer Programming Efficiently through Computer Games: Michael Lee and Gidget | Computing Education Blog | July 29, 2015 at 7:32 am
[…] an assessment of computing knowledge based on Allison Elliott Tew’s work on FCS1 (see here). He did a nice job validating it using Amazon’s Mechanical […]
16.
SIGCSE 2016 Preview: Miranda Parker replicated the FCS1 | Computing Education Blog | March 2, 2016 at 8:01 am
[…] first language-independent and validated measure of introductory computer science knowledge (see this post summarizing the work). The FCS1 was a significant accomplishment, but it didn’t get used much. Allison had […]
17.
Proposal #3 to Change CS Education to Reduce Inequity: Call a truce on academic misconduct cases for programming assignments | Computing Education Research Blog | July 30, 2020 at 7:00 am
[…] exam scores of the students from four courses at two universities who were part of her study (see post here, with diagram of this scatterplot). Her test (all multiple choice) was predicted the grade of the […]