Posts tagged ‘assessment’

How do we test the cultural assumptions of our assessments?

I’m teaching a course on user interface software development for about 260 students this semester. We just had a midterm where I felt I bobbled one of the assessment questions because I made cultural assumptions. I’m wondering how I could have avoided that.

I’m a big fan of multiple-choice, fill-in-the-blank, and Parsons problems on my assessments. I use my Parsons problem generator a lot (see link here). For example, on this midterm, students had to arrange the scrambled parts of an HTML file to achieve a given DOM tree, and there were two JavaScript programs (using constructors and prototypes) that they had to unscramble.

I typically ask some definitional questions about user interfaces at the start, about ideas like signifiers, affordances, learned associations, and metaphors. Like Dan Garcia (see his CS-Ed Podcast), I believe in starting out the exam with some easy things, to buoy confidence. They’re typically only worth a couple points, and I try to make the distractors fun. Here’s an example:

Since we watched in lecture a nice video starring Don Norman explaining “Norman doors,” I was pretty sure that anyone who actually attended lecture that day would know that the answer was the first one in the list. Still, maybe a half-dozen students chose the second item.

Here’s the one that bothered me much more.

I meant for the answer to be the first item on the list. In fact, almost the exact words were on the midterm exam review, so that students who studied the review guide would know immediately what we wanted. (I do know that working memory doesn’t actually store more for experts — I made a simplification to make the definition easier to keep in mind.)

Perhaps a dozen students chose the second item: “Familiarity breeds contempt. Experts’ contempt for their user interfaces allows them to use them without a sense of cognitive overload.” I had several students ask me during the exam, “What’s contempt?” I realized that many of my students didn’t know the word or the famous phrase (which dates back to Chaucer).

Then one student actually wrote on his exam, “I’m assuming that contempt means learned contentment.” If you make that assumption, the item doesn’t sound ridiculous: “Familiarity breeds learned contentment. Experts’ learned contentment for their user interfaces allows them to use them without a sense of cognitive overload.”

I had accidentally created an assessment that expected a particular cultural context. The midterm was developed over several weeks, and reviewed by my co-instructor, graduate student instructor, five undergraduate assistants, and three undergraduate graders. We’re a pretty diverse bunch. We had found and fixed perhaps a dozen errors in the exam during the development period. We’d never noted this problem.

I’m not sure how I could have avoided this mistake. How does one remain aware of one’s own cultural assumptions? I’m thinking of the McLuhan quote: “I don’t know who discovered water, but it wasn’t a fish.” I feel bad for the students who got this problem wrong because they didn’t know the quote or the meaning of the word “contempt.” What do you think? How might I have discovered the cultural assumptions in my assessment?

March 16, 2020 at 1:57 pm 15 comments

BDSI – A New Validated Assessment for Basic Data Structures: Guest Blog Post from Leo Porter and colleagues

Leo Porter, Michael Clancy, Cynthia Lee, Soohyun Nam Liao, Cynthia Taylor, Kevin C. Webb, and Daniel Zingaro have developed a new concept inventory that they are making available to instructors and researchers. They have written this guest blog post to describe their new instrument and explain why you should use it. I’m grateful for their contribution!

We recently published a Concept Inventory for Basic Data Structures at ICER 2019 [1] and hope it will be of use to you in your classes and/or research.

The BDSI is a validated instrument to measure student knowledge of basic data structures concepts [1]. To validate the BDSI, we engaged faculty at a diverse set of institutions to decide on topics, help with question design, and ensure the questions are valued by instructors. We also conducted over one hundred interviews with students to identify common misconceptions and to ensure students properly interpret the questions. Lastly, we ran pilots of the instrument at seven different institutions and performed a statistical evaluation to confirm that the questions are properly interpreted and discriminate well between students of different abilities.
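As a rough illustration of what it means for questions to “discriminate between students’ abilities” (a toy sketch, not the BDSI team’s actual statistical procedure), classical item analysis computes, for each question, a difficulty (fraction of students answering correctly) and a discrimination index (how much better the top-scoring half of students does on that item than the bottom half):

```python
# Toy classical item analysis: NOT the BDSI authors' procedure, just an
# illustration of difficulty and discrimination indices.
# scores[s][q] is 1 if student s answered question q correctly, else 0.

def item_analysis(scores):
    n_students = len(scores)
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    # Rank students by total score; compare the top half vs. the bottom half.
    ranked = sorted(range(n_students), key=lambda s: totals[s], reverse=True)
    half = n_students // 2
    top, bottom = ranked[:half], ranked[-half:]
    results = []
    for q in range(n_items):
        difficulty = sum(scores[s][q] for s in range(n_students)) / n_students
        discrimination = (sum(scores[s][q] for s in top) -
                          sum(scores[s][q] for s in bottom)) / half
        results.append((difficulty, discrimination))
    return results
```

An item with near-zero (or negative) discrimination is a candidate for revision, since strong and weak students answer it equally well.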

What Our Assessment Measures

The BDSI measures student performance on Basic Data Structure concepts commonly found in a CS2 course.  To arrive at the topics and content of the exam, we worked with fifteen faculty at thirteen different institutions to ensure broad applicability.  The resulting topics on the CI include: Interfaces, Array-Based Lists, Linked-Lists, and Binary Search Trees. If you are curious about the learning goals or want more details on the process we used in arriving at these goals, please see our SIGCSE 2018 publication [2].

Why Validated Assessments are Great for Instructors

Suppose you want to know how well your students understand various topics in your CS2 course. How could you figure out how much your students are learning relative to other schools? You could, perhaps, get a final exam from another school and use it in your class to compare results, but that exam may not be a good fit for your course. Moreover, you may find flaws in some of the questions and wonder whether students interpret them properly. Instead, you can use a validated assessment. The advantage of a validated assessment is that there is general agreement that it measures what you want to measure, and that it accurately measures student thinking. As such, you can compare your findings to results from other schools that have used the instrument, to determine whether your students are learning particular topics better or worse than cohorts at similar institutions.

Why Validated Assessments are Great for Researchers

As CS researchers, we often experiment with new ways to teach courses.  For example, many people use Media Computation or Peer Instruction (PI), two complementary pedagogical approaches developed over the past several decades.  It’s important to establish whether these changes are helping our students. Do more students pass? Do fewer students withdraw? Do more students continue studying CS?  Does it boost outcomes for under-represented groups? Answering these questions using a variety of courses can give us insight into whether what we do corresponds with our expectations.

One important question is: using our new approach, do students learn more than before?  Unfortunately, answering this is complicated by the lack of standardized, validated assessments.  If students score 5% higher on an exam when studying with PI vs. not studying with PI, all we know is that PI students did better on that exam.  But exams are designed by one instructor, for one course at one institution, not for the purposes of cross-institution, cross-cohort comparisons.  They are not validated. They do not take into account the perspectives of other CS experts. When students answer a question on an exam correctly, we assume that it’s because they know the material; when they answer incorrectly, we assume it’s because they don’t know the material.  But we don’t know: maybe the exam contains incidental cues that subtly influence how students respond.

A Concept Inventory (CI) solves these problems.  Its rigorous design process leads to an assessment that can be used across schools and cohorts, and can be used to validly compare teaching approaches.

How to Obtain the BDSI

The BDSI is available via the Google group. If you’re interested in using it, please join the group and add a post with your name, institution, and how you plan to use the BDSI.

How to Use the BDSI

The BDSI is designed to be given as a post-test after students have completed the covered material.  Because the BDSI was validated as a full instrument, it is important to use the entire assessment, and not alter or remove any of the questions.  We ask that instructors not make copies of the assessment available to students after giving the BDSI, to try to avoid the questions becoming public.  We likewise recommend giving participation credit, but not correctness credit, to students for taking the BDSI, to avoid incentivizing cheating.  We have found giving the BDSI as part of a final review session, collecting the assessment from students, and then going over the answers to be a successful methodology for having students take it. 

Want to Learn More?

If you’re interested in learning more about how to build a CI, please come to our talk at SIGCSE 2020 (from 3:45-4:10pm on Thursday, March 12th) or read our paper [3].  If you are interested in learning more about how to use validated assessments, please come to our Birds of a Feather session on “Using Validated Assessments to Learn About Your Students” at SIGCSE 2020 (5:30-6:20pm on Thursday, March 12th) or our tutorial on using the BDSI at CCSC-SW 2020 (March 20-21).

References:

[1] Leo Porter, Daniel Zingaro, Soohyun Nam Liao, Cynthia Taylor, Kevin C. Webb, Cynthia Lee, and Michael Clancy. 2019. BDSI: A Validated Concept Inventory for Basic Data Structures. In Proceedings of the 2019 ACM Conference on International Computing Education Research (ICER ’19).

[2] Leo Porter, Daniel Zingaro, Cynthia Lee, Cynthia Taylor, Kevin C. Webb, and Michael Clancy. 2018. Developing Course-Level Learning Goals for Basic Data Structures in CS2. In Proceedings of the 49th ACM Technical Symposium on Computer Science Education (SIGCSE ’18).

[3] Cynthia Taylor, Michael Clancy, Kevin C. Webb, Daniel Zingaro, Cynthia Lee, and Leo Porter. 2020. The Practical Details of Building a CS Concept Inventory. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE ’20).

February 24, 2020 at 7:00 am Leave a comment

Adaptive Parsons problems, and the role of SES and Gesture in learning computing: ICER 2018 Preview

 

Next week is the 2018 International Computing Education Research Conference in Espoo, Finland. The proceedings are (as of this writing) available here: https://dl.acm.org/citation.cfm?id=3230977. Our group has three of the 28 papers accepted this year.

“Evaluating the efficiency and effectiveness of adaptive Parsons problems” by Barbara Ericson, Jim Foley, and Jochen (“Jeff”) Rick

These are the final studies from Barb Ericson’s dissertation (I blogged about her defense here). In her experiment, she compared four conditions: students learning through writing code, through fixing code, through solving Parsons problems, and through solving her new adaptive Parsons problems. She had a control group this time (different from her Koli Calling paper) that did turtle graphics between the pre-test and post-test, so that she could be sure there wasn’t just a testing effect of a pre-test followed by a post-test. The bottom line was basically what she predicted: learning did occur, with no significant difference between treatment groups, but the Parsons problems groups took less time. Our ebooks now include some of her adaptive Parsons problems, so she can compare performance across many students on adaptive and non-adaptive forms of the same problem. She finds that students solve the adaptive problems more often, and with fewer attempts, than the non-adaptive forms. So, adaptive Parsons problems lead to the same amount of learning, in less time, with fewer failures. (Failures matter, since self-efficacy is a big deal in computer science education.)

“Socioeconomic status and Computer science achievement: Spatial ability as a mediating variable in a novel model of understanding” by Miranda Parker, Amber Solomon, Brianna Pritchett, David Illingworth, Lauren Margulieux, and Mark Guzdial

(Link to last version I reviewed.)

This study is a response to the paper Steve Cooper presented at ICER 2015 (see blog post here), where they found that spatial reasoning training erased performance differences between higher and lower socioeconomic status (SES) students, while the comparison class had higher-SES students performing better than lower-SES students. Miranda and Amber wanted to test this relationship at a larger scale.

Why should wealthier students do better in CS? The most common reason I’ve heard is that wealthier students have more opportunities to study CS — they have greater access. Sometimes that’s called preparatory privilege.

Miranda and Amber and their team wanted to test whether access is really the right intermediate variable. They gave students at two different universities four tests:

  • Part of Miranda’s SCS1 to measure performance in CS.
  • A standardized test of SES.
  • A test of spatial reasoning.
  • A survey about the amount of access they had to CS education, e.g., formal classes, code clubs, summer camps, etc.

David and Lauren did the factor analysis and structural equation modeling to compare two hypotheses: Does higher SES lead to greater access which leads to greater success in CS, or does higher SES lead to higher spatial reasoning which leads to greater success in CS? Neither hypothesis accounted for a significant amount of the differences in CS performance, but the spatial reasoning model did better than the access model.
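The two hypotheses being compared are mediation models: SES → access → CS performance versus SES → spatial ability → CS performance. A drastically simplified sketch of the product-of-coefficients idea behind such comparisons (the paper used factor analysis and structural equation modeling, which this does not reproduce; the variable names are illustrative):

```python
# Crude product-of-coefficients sketch of a mediation model.
# This is a toy illustration, NOT the factor analysis and structural
# equation modeling the paper actually used.

def slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) /
            sum((xi - mx) ** 2 for xi in x))

def indirect_effect(x, mediator, y):
    # Path a (x -> mediator) times path b (mediator -> y).
    # A real mediation analysis would also control for x when
    # estimating path b; this sketch does not.
    return slope(x, mediator) * slope(mediator, y)

# With real data (variable names hypothetical), one would compare:
#   indirect_effect(ses, access, cs_score)    # access hypothesis
#   indirect_effect(ses, spatial, cs_score)   # spatial hypothesis
```

A real analysis would also test whether each indirect effect differs significantly from zero, rather than just comparing their sizes.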

There are some significant limitations of this study. The biggest is that they gathered data at universities. A lot of SES variance just disappears when you look at college students — they tend to be wealthier than average.

Still, the result is important for challenging the prevailing assumption about why wealthier kids do better in CS. Moreover, spatial reasoning is an interesting variable because it’s relatively inexpensive to teach. It’s expensive to prepare CS teachers and get them into all schools, but Steve showed that we can teach spatial reasoning within an existing CS class and reduce SES differences.

“Applying a Gesture Taxonomy to Introductory Computing Concepts” by Amber Solomon, Betsy DiSalvo, Mark Guzdial, and Ben Shapiro

(Link to last version I saw.)

We were a bit surprised (quite pleasantly!) that this paper got into ICER. I love the paper, but it’s different from most ICER papers.

Amber is interested in the role that gestures play in teaching CS. She started this paper from a taxonomy of gestures seen in other STEM classes. She observed a CS classroom and used her observations to provide concrete examples of the gestures seen in other kinds of classes. This isn’t a report of empirical findings. This is a report of using a lens borrowed from another field to look at CS learning and teaching in a new way.

My favorite part of this paper is when Amber points out which parts of CS gestures don’t really fit in the taxonomy. It’s one thing to point to lines of code – that’s relatively concrete. It’s another thing to “point” to referenced data, e.g., gesturing at the two elements you’re comparing or swapping when explaining a sort. What exactly/concretely are we pointing at? Arrays are neither horizontal nor vertical — that distinction doesn’t really exist in memory. Arrays have no physical representation, but we act (usually) as if they’re laid out horizontally in front of us. What assumptions are we making in order to use gestures in our teaching? And what if students don’t share those assumptions?

August 10, 2018 at 7:00 am 5 comments

A Generator for Parsons problems on LaTeX exams and quizzes

I just finished teaching my Introduction to Media Computation a few weeks ago to over 200 students. After Barb finished her dissertation on Parsons problems this semester, I decided that I should include Parsons problems on my last quiz, on the final exam study guide, and on the final exam. Parsons problems are a great fit for this assessment task. We know that Parsons problems are a more sensitive measure of learning than code-writing problems; they’re just as effective as code-writing or code-fixing problems for learning (so they’re good for a study guide); and they take less time than code writing or fixing.

Barb’s work used an interactive tool for providing adaptive Parsons problems, but I needed to use paper for the quiz and final exam. There have been several paper-based implementations of Parsons problems, and Barb guided me in developing mine.

But I realized that there’s a challenge to doing a bunch of Parsons problems like this. Scrambling code is pretty easy, but what happens when you find that you got something wrong? The quiz, study guide, and final exam would all iterate several times as we developed them and tested them with the teaching assistants. How could I make sure that the scrambled code and the right answer always stayed aligned?

I decided to build a gadget in LiveCode to do it.

I paste the correctly ordered code into the field on the left. When I press “Scramble,” a random ordering of the code appears (in a Verbatim LaTeX environment) along with the right answers, to be used in the LaTeX exam class. If you want to list a number of points to be associated with each correct line, you can put a number into the field above the solution field. If empty, no points will be explicitly allocated in the exam document.

I’d then paste both of those fields into my LaTeX source document. (I usually also pasted in the original source code in the correct order, so that I could fix the code and re-run the scramble when I inevitably found that I did something wrong.)
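The key idea in the gadget — generating the scrambled Verbatim block and the answer key from the same shuffle, so the two can never drift out of sync — can be sketched in Python. The LaTeX details here are illustrative rather than the LiveCode tool’s exact output:

```python
import random

def scramble_for_latex(solution_code, seed=None):
    """Scramble correctly ordered code once, and emit both the
    student-facing LaTeX Verbatim block and the matching answer key
    from that single shuffle, so they cannot drift out of sync.
    Illustrative sketch, not the LiveCode gadget's exact output."""
    lines = solution_code.strip("\n").split("\n")
    order = list(range(len(lines)))
    random.Random(seed).shuffle(order)
    scrambled = "\\begin{Verbatim}[numbers=left]\n"
    scrambled += "\n".join(lines[i] for i in order)
    scrambled += "\n\\end{Verbatim}"
    # Answer key: for each line of the solution, the number it was
    # given in the scrambled listing.
    key = [order.index(i) + 1 for i in range(len(lines))]
    answers = "Correct order: " + ", ".join(str(n) for n in key)
    return scrambled, answers
```

Because both outputs come from one shuffle, fixing a bug in the solution code and re-running the scramble regenerates a consistent question-and-key pair, which was exactly the alignment problem described above.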

The wording of the problem was significant, and Barb coached me on the best practice: allow students to write just the line numbers, but encourage them to write the whole lines, because the latter imposes less cognitive load on them.

Unscramble the code below that halves the frequency of the input sound.

Put the code in the right order on the lines below. You may write the line numbers of the scrambled code in the right order, or you can write the lines themselves (or both). (If you include both, we will grade the code itself if there’s a mismatch.)

The problem as the student sees it looks like this:

The exam class can also automatically generate a version of the exam with answers, for use in grading. I didn’t solve any of the really hard problems in my script, like how to deal with lines that could be put in any order. When I ran into that problem, I just edited the answer fields to list the acceptable options.
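As an aside for readers unfamiliar with the Media Computation example being unscrambled: one way to halve a sound’s frequency is to copy every sample twice, stretching the waveform to double its length. A plain-Python sketch on a list of sample values (not the JES/Media Computation code the students actually unscrambled):

```python
def halve_frequency(samples):
    """Return a new list of samples at half the original frequency:
    each input sample is copied twice, so the waveform stretches to
    twice its length and the pitch drops by an octave.
    Plain-Python sketch, not the course's JES Media Computation code."""
    target = []
    for value in samples:
        target.append(value)  # copy the sample once...
        target.append(value)  # ...and again, doubling its duration
    return target
```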

I am making the LiveCode source available here: http://bit.ly/scrambled-latex-src

LiveCode generates executables very easily. I have generated Windows, MacOS, and Linux executables and put them in a (20 Mb, all three versions) zip here: http://bit.ly/scrambled-latex

I used this generator probably 10-20 times in the last few weeks of the semester. I have been reflecting on this experience as an example of end-user programming. I’ll talk about that in the next blog post.

June 8, 2018 at 2:00 am 3 comments

Attending the amazing 2017 Computing at School conference #CASConf17

On June 17, Barbara and I attended the Computing at School conference in Birmingham, England (which I wrote about here). The slides from my talk are below. I highly recommend the summary from Duncan Hull, which I quote at the bottom.

CAS was a terrifically fun event, packed full with 300 attendees. I misjudged the length of my talk (I tend to talk too fast), so instead of a brief Q&A, almost half the session was available for Q&A. Interacting with the audience to answer teachers’ questions was more fun (and hopefully more useful and entertaining) than me talking for longer. The session was well received, based on the tweets I read. In fact, that’s probably the best way to get a sense of the whole day — on Twitter, hashtag #CASConf17. (I’m going to try to embed some tweets with pictures below.)

Barbara’s two workshops on Media Computation in Python using our ebooks went over really well.

I enjoyed my interactions all day long. I was asked about research results in just about every conversation — the CAS teachers are eager to see what computing education research can offer them.  I met several computing education research PhD students, which was particularly exciting and fun. England takes computing education research seriously.

Miles Berry demonstrated Project Quantum by having participants answer questions from the database.  That was an engaging and fascinating interactive presentation.

Linda Liukas gave a terrific closing keynote. She views the world from a perspective that reminded me of Mitchel Resnick’s Lifelong Kindergarten and Seymour Papert’s playfulness. I was inspired.

The session that most made me think was from Peter Kemp on the report that he and co-authors have just completed on the state of computing education in England. That one deserves a separate blog post – coming Wednesday.

Check out Duncan’s summary of the conference:

The Computing At School (CAS) conference is an annual event for educators, mostly primary and secondary school teachers from the public and private sector in the UK. Now in its ninth year, it attracts over 300 delegates from across the UK and beyond to the University of Birmingham, see the brochure for details. One of the purposes of the conference is to give teachers new ideas to use in their classrooms to teach Computer Science and Computational Thinking. I went along for my first time (*blushes*) seeking ideas to use in an after school Code Club (ages 7-10) I’ve been running for a few years and also for approaches that undergraduate students in Computer Science (age 20+) at the University of Manchester could use in their final year Computer Science Education projects that I supervise. So here are nine ideas (in random brain dump order) I’ll be putting to immediate use in clubs, classrooms, labs and lecture theatres:

Source: Nine ideas for teaching Computing at School from the 2017 CAS conference | O’Really?

My talk slides:

July 10, 2017 at 7:00 am 1 comment

Assessing Learning In Introductory Computer Science: Dagstuhl Seminar Report now Available

I have written about this Dagstuhl Seminar (see earlier post). The formal report is now available.

This seminar discussed educational outcomes for first-year (university-level) computer science. We explored which outcomes were widely shared across both countries and individual universities, best practices for assessing outcomes, and research projects that would significantly advance assessment of learning in computer science. We considered both technical and professional outcomes (some narrow and some broad) as well as how to create assessments that focused on individual learners. Several concrete research projects took shape during the seminar and are being pursued by some participants.

Source: DROPS – Assessing Learning In Introductory Computer Science (Dagstuhl Seminar 16072)

September 26, 2016 at 7:26 am Leave a comment

Preview ICER 2016: Ebooks Design-Based Research and Replications in Assessment and Cognitive Load Studies

The International Computing Education Research (ICER) Conference 2016 is September 8-12 in Melbourne, Australia (see website here). There were 102 papers submitted, and 26 papers accepted for a 25% acceptance rate. Georgia Tech computing education researchers are justifiably proud — we submitted three papers to ICER 2016, and we had three acceptances. We’re over 10% of all papers at ICER 2016.

One of the papers extends the ebook work that I’ve reported on here (see here where we made them available and our paper on usability and usage from WiPSCE 2015). Identifying Design Principles for CS Teacher Ebooks through Design-Based Research (click on the title to get to the ACM DL page), by Barbara Ericson, Kantwon Rogers, Miranda Parker, Briana Morrison, and me, uses a Design-Based Research perspective on our ebooks work. We describe our theory for the ebooks, then describe the iterations of what we designed, what happened when we deployed them (data-driven), and how we then re-designed.

Two of our papers are replication studies — we’re so grateful to the ICER reviewers and community for seeing the value of replication studies. The first is Replication, Validation, and Use of a Language Independent CS1 Knowledge Assessment by Miranda Parker, me, and Shelly Engleman. This is Miranda’s paper expanding on her SIGCSE 2016 poster introducing the SCS1, a validated and language-independent measure of CS1 knowledge. The paper does a great survey of validated measures of learning, explains her process, and then presents what one can and can’t claim with a validated instrument.

The second is Learning Loops: A Replication Study Illuminates Impact of HS Courses by Briana Morrison, Adrienne Decker, and Lauren Margulieux. Briana and Lauren have both now left Georgia Tech, but they were still here when they did this paper, so we’re claiming them. Readers of this blog may recall Briana and Lauren’s confusing result from SIGCSE 2016, which suggested that cognitive load in CS textual programming is so high that it blows away our experimental instructional treatments. Was that an aberration? With Adrienne Decker’s help (and student participants), they replicated the study. I’ll give away the bottom line: it wasn’t an aberration. One new finding is that students who did not have high school CS classes caught up with those who did in the experiment, with respect to understanding loops.

We’re sending three of our Human-Centered Computing PhD students to the ICER 2016 Doctoral Consortium. These folks will be in the DC on Sept 8, and will present posters to the conference on Sept 9 afternoon.

September 2, 2016 at 7:53 am 17 comments
