MATLAB and APL: Meeting Cleve Moler

I have often described MATLAB as “APL with a normal character set,” but I didn’t actually know anything about how MATLAB came to be or if there was any relationship.  Last night, I got to ask the man who invented MATLAB, Cleve Moler, at the IEEE Computer Society Awards Dinner, where Cleve was named a “Computer Society Pioneer.”  When I introduced myself as coming from Georgia Tech, he took notice.  “Georgia Tech is a big MATLAB user!”  We teach 1200 Engineering students a semester in MATLAB.

Cleve developed MATLAB (in Fortran) as a Matrix Calculator (explicitly, a “MATrix LABoratory”) for his students.  There was no explicit tie to APL, but he saw the connections.  He said that he’s always seen MATLAB as “portable APL” because he used a traditional character set.
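
To make that concrete (my sketch, in modern MATLAB syntax rather than Cleve's original Fortran-based version): the whole point of a matrix laboratory is that solving a linear system is a one-liner.

```matlab
% A minimal example of what the original MATrix LABoratory was for:
% solving the linear system A*x = b.
A = [2 1; 1 3];   % coefficient matrix
b = [3; 5];       % right-hand side
x = A \ b;        % backslash solves A*x = b; here x = [0.8; 1.4]
disp(x)
```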

It’s not just the character set though.  “Iverson showed me J.  I wanted MATLAB to be understandable by normal people.”  He said that someone once converted a program he’d written in MATLAB into APL.  “I asked what that was.  They told me, ‘That’s your program!’ I couldn’t recognize it.”  APL is about being uniform about everything, but MATLAB “is a mishmosh of all kinds of things.”
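
To illustrate the character-set point (my contrast, not an example from the conversation): summing the first ten integers in APL uses special glyphs, while MATLAB spells the same idea with ordinary characters and named functions.

```matlab
% In APL:  +/⍳10   (plus-reduce over the index vector 1..10)
% In MATLAB, the same computation needs no special keyboard:
total = sum(1:10);   % 1:10 builds the vector 1..10; total is 55
```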

Others joined in the conversation.  “What do you think about Mathematica?”  Cleve responded, “Mathematica is APL for the 21st century.  Mathematica has a uniformity about it.”

Cleve’s ideas about what makes a language usable “by normal people” are interesting.  The success of MATLAB, used by so many people in so many different contexts, domains, and application areas, gives him real authority for making such claims.  He sees a “mishmosh” as being easier for people to understand than uniformity.  Marvin Minsky famously said that the brain is likely a “kluge.”  Do we actually prefer messy languages, with less uniformity, perhaps as a reflection of our “kluge” nature?

June 14, 2012 at 10:11 am 4 comments

The First Multi-Lingual, Valid Measure of CS1 Knowledge: Allison Tew Defends

Allison Elliott Tew has been working for five years to figure out how we can compare different approaches to teaching CS1.  As Alan Kay noted in his comments on my recent post on computing education research, there are lots of factors, like who is taking the class and what they’re doing in the class.  But to make a fair comparison of the inputs, we need a stable measure of the output.  Allison made a first pass in 2005, but became worried when she couldn’t replicate her results in later semesters.  She decided that the problem was that we had no scientific instrument we could rely on to measure CS1 knowledge: no way of measuring what students learn in CS1 that is independent of language or approach, and that is both reliable and valid.  Allison set out to create one.

Allison defends this week.  She took a huge gamble: at the end of her dissertation work, she collected two multiple-choice exams from each of 952 subjects.  If you get that wrong, you can’t really try again.

She doesn’t need to. She won.

Her dissertation had three main questions.

(1) How do you do this?  All the standard educational assessment methods validate a new test by comparing it to an existing, validated one.  How do you bootstrap a new test when no such test has ever been created?  She developed a multi-step process for validating her exam, and she carefully defined the scope of the test using a combination of text analysis and curriculum standards.

(2) Can you use pseudo-code to make the test language-independent?  First, she developed three open-ended versions of her test, in MATLAB, Python, and Java, and had subjects take those.  By analyzing the responses, she was able to find three distractors (wrong answers) for every question that covered the top three wrong answers in each language, which by itself was pretty amazing.  I wouldn’t have guessed that the same mistakes would be made in all three languages.

Then she developed her pseudo-code test.  She ran subjects through two sessions (counter-balanced). In one session, they took the test in their “native” language (whatever their CS1 was in), and in another (a week later, to avoid learning effects), the pseudo-code version.

The pseudo-code and native-language tests were strongly correlated.  Social scientists hold that, in this kind of comparison, a correlation coefficient r above 0.37 means the two instruments can be treated as the same test.  She beat that threshold for every language.
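
For readers who want to see the mechanics, here’s a toy sketch in MATLAB of checking that threshold.  The scores are invented for illustration; they are not Allison’s data.

```matlab
% Toy illustration: checking the r > 0.37 "same test" threshold
% between two forms of an exam.  Scores below are made up.
native = [55 70 42 88 63 35 77 50];   % native-language test scores (%)
pseudo = [48 65 40 80 58 30 70 45];   % pseudo-code test scores (%)
R = corrcoef(native, pseudo);         % 2x2 matrix of Pearson correlations
r = R(1,2);                           % correlation between the two forms
fprintf('r = %.3f (same-test threshold: 0.37)\n', r);
```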

In her results, the Python correlation was only .415.  She then split out the Python CS1 taken only by CS majors from the one taken mostly by non-majors.  That’s the .615 vs. the .372: CS majors will always beat non-majors.  One of her hypotheses was that this transfer from native code to pseudo-code would work best for the best students.  She found that it was true.  She split her subjects into quartiles, and the top quartile was significantly different from the third, the third from the second, and so on.  I think that this is really important for all those folks who might say, “Oh sure, your students did badly.  Our students would rock that exam!”  (For the record, the average score was 33.78% on the pseudo-code test and 48.61% on the “native” language test.)  Excellent!  Allison’s test works even better as a proxy test for really good students.  Do show us better results, then publish it and tell us how you did it!

(3) Then comes the validity argument: is this test really testing what’s important?  Is it a good test?  Like I said, she had a multi-step process.  First, she had a panel of experts review her test for reasonableness of coverage.  Second, she did think-alouds with 12 students to make sure that they were reading the exam the way she intended.  Third, she ran an item response theory (IRT) analysis to show that her problems were reasonable.  Finally, she correlated performance on her pseudo-code test (the FCS1) with final exam grades.  That one is the big test for me: is this test measuring what we think is important, across two universities and four different classes?  Another highly significant set of correlations, but it’s her scatterplot of FCS1 scores against final exam grades that really tells the story for me.

Next, Allison defends, and then takes a job as a post-doc at the University of British Columbia.  She plans to make her exam available for other researchers to use in comparisons of CS1 approaches and languages.  Want to know if your new Python class is leading to the same learning as your old Java class?  This is your test!  But she’ll never post it for free on the Internet: if there’s any chance that a student has seen the problems first, the argument for validity fails.  So she’ll be carefully controlling access to the test.

Allison’s work is a big deal.  We need it in our “Georgia Computes!” work, as do our teachers.  As we change our approaches to broaden participation, we need to show that learning isn’t harmed.  More generally, we need it in computing education research: we finally have a yardstick by which we can start comparing learning.  This isn’t the be-all and end-all assessment.  For example, there are no objects in this test, and we don’t know if it will be valid for graphical languages.  But it’s the first test like this, and that’s a big step.  I hope that others will follow the trail Allison blazed, so that we end up with lots of great learning measures in computing education research.

August 19, 2010 at 6:42 am 17 comments

