Posts tagged ‘assessment’

How do we test the cultural assumptions of our assessments?

I’m teaching a course on user interface software development for about 260 students this semester. We just had a midterm where I felt I bobbled one of the assessment questions because I made cultural assumptions. I’m wondering how I could have avoided that.

I’m a big fan of multiple choice, fill-in-the-blank, and Parsons problems on my assessments. I use my Parsons problem generator a lot (see link here). For example, on this midterm, students had to arrange the scrambled parts of an HTML file to produce a given DOM tree, and there were two JavaScript programs (using constructors and prototypes) that they had to unscramble.

I typically ask some definitional questions about user interfaces at the start, about ideas like signifiers, affordances, learned associations, and metaphors. Like Dan Garcia (see his CS-Ed Podcast), I believe in starting out the exam with some easy things, to buoy confidence. They’re typically only worth a couple points, and I try to make the distractors fun. Here’s an example:

Since we watched in lecture a nice video starring Don Norman explaining “Norman doors,” I was pretty sure that anyone who actually attended lecture that day would know that the answer was the first one in the list. Still, maybe a half-dozen students chose the second item.

Here’s the one that bothered me much more.

I meant for the answer to be the first item on the list. In fact, almost the exact words were on the midterm exam review, so that students who studied the review guide would know immediately what we wanted. (I do know that working memory doesn’t actually store more for experts — I made a simplification to make the definition easier to keep in mind.)

Perhaps a dozen students chose the second item: “Familiarity breeds contempt. Experts’ contempt for their user interfaces allows them to use them without a sense of cognitive overload.” I had several students ask me during the exam, “What’s contempt?” I realized that many of my students didn’t know the word or the famous phrase (which dates back to Chaucer).

Then one student actually wrote on his exam, “I’m assuming that contempt means learned contentment.” If you make that assumption, the item doesn’t sound ridiculous: “Familiarity breeds learned contentment. Experts’ learned contentment for their user interfaces allows them to use them without a sense of cognitive overload.”

I had accidentally created an assessment that expected a particular cultural context. The midterm was developed over several weeks, and reviewed by my co-instructor, graduate student instructor, five undergraduate assistants, and three undergraduate graders. We’re a pretty diverse bunch. We had found and fixed perhaps a dozen errors in the exam during the development period. We’d never noted this problem.

I’m not sure how I could have avoided this mistake. How does one remain aware of one’s own cultural assumptions? I’m thinking of the McLuhan quote: “I don’t know who discovered water, but it wasn’t a fish.” I feel bad for the students who got this problem wrong because they didn’t know the quote or the meaning of the word “contempt.” What do you think? How might I have discovered the cultural assumptions in my assessment?

March 16, 2020 at 1:57 pm 15 comments

BDSI – A New Validated Assessment for Basic Data Structures: Guest Blog Post from Leo Porter and colleagues

Leo Porter, Michael Clancy, Cynthia Lee, Soohyun Nam Liao, Cynthia Taylor, Kevin C. Webb, and Daniel Zingaro have developed a new concept inventory that they are making available to instructors and researchers. They have written this guest blog post to describe their new instrument and explain why you should use it. I’m grateful for their contribution!

We recently published a Concept Inventory for Basic Data Structures at ICER 2019 [1] and hope it will be of use to you in your classes and/or research.

The BDSI is a validated instrument to measure student knowledge of Basic Data Structure Concepts [1].  To validate the BDSI, we engaged faculty at a diverse set of institutions to decide on topics, help with question design, and ensure the questions are valued by instructors.  We also conducted over one hundred interviews with students in order to identify common misconceptions and to ensure students properly interpret the questions. Lastly, we ran pilots of the instrument at seven different institutions and performed a statistical evaluation of the instrument to ensure the questions are properly interpreted and discriminate between students’ abilities well.
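To make that last step concrete, here is a minimal sketch in Python (with invented pilot data; this is not the BDSI team’s actual analysis) of one standard check in that kind of statistical evaluation: a classical item-discrimination index, the corrected item-total correlation, which flags questions that fail to separate stronger from weaker students.

import numpy as np

def item_discrimination(responses):
    """responses: a (num_students, num_items) array of 0/1 item scores.
    Returns the corrected item-total correlation for each item."""
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    discriminations = []
    for i in range(n_items):
        item = responses[:, i]
        # Total score on the *other* items, so an item is not correlated with itself.
        rest = responses.sum(axis=1) - item
        discriminations.append(np.corrcoef(item, rest)[0, 1])
    return np.array(discriminations)

# Hypothetical pilot data: 6 students x 4 items (made up for illustration).
pilot = [[1, 1, 0, 1],
         [1, 0, 0, 1],
         [0, 0, 0, 0],
         [1, 1, 1, 1],
         [0, 1, 0, 0],
         [1, 1, 0, 1]]
print(item_discrimination(pilot))

In a pilot like the ones described above, items with low or negative values would be candidates for revision or removal.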

What Our Assessment Measures

The BDSI measures student performance on Basic Data Structure concepts commonly found in a CS2 course.  To arrive at the topics and content of the exam, we worked with fifteen faculty at thirteen different institutions to ensure broad applicability.  The resulting topics on the CI include: Interfaces, Array-Based Lists, Linked-Lists, and Binary Search Trees. If you are curious about the learning goals or want more details on the process we used in arriving at these goals, please see our SIGCSE 2018 publication [2].

Why Validated Assessments are Great for Instructors

Suppose you want to know how well your students understand various topics in your CS2 course. How could you figure out how much your students are learning relative to other schools? You could, perhaps, get a final exam from another school and use it in your class to compare results, but an exam from another institution may not be a good fit for your course. Moreover, you may find flaws in some of the questions and wonder whether students interpret them properly. Instead, you can use a validated assessment. The advantage of using a validated assessment is that there is general agreement that it measures what you want to measure and that it accurately measures student thinking. As such, you can compare your findings to results from other schools that have used the instrument, to determine whether your students are learning particular topics better or worse than cohorts at similar institutions.

Why Validated Assessments are Great for Researchers

As CS researchers, we often experiment with new ways to teach courses.  For example, many people use Media Computation or Peer Instruction (PI), two complementary pedagogical approaches developed over the past several decades.  It’s important to establish whether these changes are helping our students. Do more students pass? Do fewer students withdraw? Do more students continue studying CS?  Does it boost outcomes for under-represented groups? Answering these questions using a variety of courses can give us insight into whether what we do corresponds with our expectations.

One important question is: using our new approach, do students learn more than before?  Unfortunately, answering this is complicated by the lack of standardized, validated assessments.  If students score 5% higher on an exam when studying with PI vs. not studying with PI, all we know is that PI students did better on that exam.  But exams are designed by one instructor, for one course at one institution, not for the purposes of cross-institution, cross-cohort comparisons.  They are not validated. They do not take into account the perspectives of other CS experts. When students answer a question on an exam correctly, we assume that it’s because they know the material; when they answer incorrectly, we assume it’s because they don’t know the material.  But we don’t know: maybe the exam contains incidental cues that subtly influence how students respond.

A Concept Inventory (CI) solves these problems.  Its rigorous design process leads to an assessment that can be used across schools and cohorts, and can be used to validly compare teaching approaches.
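As a small illustration of the kind of comparison a validated instrument makes meaningful, here is a minimal Python sketch with invented scores (not data from any real deployment of the BDSI): comparing two cohorts taught with different approaches using a t-test and an effect size.

import numpy as np
from scipy import stats

# Hypothetical CI scores for two cohorts (e.g., taught with and without
# Peer Instruction); both the scores and the labels are assumptions.
cohort_a = np.array([7, 9, 6, 8, 10, 7, 8, 9])
cohort_b = np.array([6, 7, 5, 8, 7, 6, 7, 8])

# Welch's t-test: does the difference look larger than chance?
t, p = stats.ttest_ind(cohort_a, cohort_b, equal_var=False)

# Cohen's d with a pooled standard deviation: how large is the difference?
pooled_sd = np.sqrt((cohort_a.var(ddof=1) + cohort_b.var(ddof=1)) / 2)
d = (cohort_a.mean() - cohort_b.mean()) / pooled_sd

print(f"t = {t:.2f}, p = {p:.3f}, Cohen's d = {d:.2f}")

Because the instrument is the same everywhere, an effect size computed this way is comparable across institutions and cohorts in a way that scores on a home-grown exam are not.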

How to Obtain the BDSI

The BDSI is available via the Google group. If you’re interested in using it, please join the group and add a post with your name, institution, and how you plan to use the BDSI.

How to Use the BDSI

The BDSI is designed to be given as a post-test after students have completed the covered material.  Because the BDSI was validated as a full instrument, it is important to use the entire assessment, and not alter or remove any of the questions.  We ask that instructors not make copies of the assessment available to students after giving the BDSI, to try to avoid the questions becoming public.  We likewise recommend giving participation credit, but not correctness credit, to students for taking the BDSI, to avoid incentivizing cheating.  We have found giving the BDSI as part of a final review session, collecting the assessment from students, and then going over the answers to be a successful methodology for having students take it. 

Want to Learn More?

If you’re interested in learning more about how to build a CI, please come to our talk at SIGCSE 2020 (from 3:45-4:10pm on Thursday, March 12th) or read our paper [3].  If you are interested in learning more about how to use validated assessments, please come to our Birds of a Feather session on “Using Validated Assessments to Learn About Your Students” at SIGCSE 2020 (5:30-6:20pm on Thursday, March 12th) or our tutorial on using the BDSI at CCSC-SW 2020 (March 20-21).

References:

[1] Leo Porter, Daniel Zingaro, Soohyun Nam Liao, Cynthia Taylor, Kevin C. Webb, Cynthia Lee, and Michael Clancy. 2019. BDSI: A Validated Concept Inventory for Basic Data Structures. In Proceedings of the 2019 ACM Conference on International Computing Education Research (ICER ’19).

[2] Leo Porter, Daniel Zingaro, Cynthia Lee, Cynthia Taylor, Kevin C. Webb, and Michael Clancy. 2018. Developing Course-Level Learning Goals for Basic Data Structures in CS2. In Proceedings of the 49th ACM Technical Symposium on Computer Science Education (SIGCSE ’18).

[3] Cynthia Taylor, Michael Clancy, Kevin C. Webb, Daniel Zingaro, Cynthia Lee, and Leo Porter. 2020. The Practical Details of Building a CS Concept Inventory. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE ’20).

February 24, 2020 at 7:00 am Leave a comment

Adaptive Parsons problems, and the role of SES and Gesture in learning computing: ICER 2018 Preview

 

Next week is the 2018 International Computing Education Research Conference in Espoo, Finland. The proceedings are (as of this writing) available here: https://dl.acm.org/citation.cfm?id=3230977. Our group has three of the 28 papers accepted this year.

“Evaluating the efficiency and effectiveness of adaptive Parsons problems” by Barbara Ericson, Jim Foley, and Jochen (“Jeff”) Rick

These are the final studies from Barb Ericson’s dissertation (I blogged about her defense here). In her experiment, she compared four conditions: students learning through writing code, through fixing code, through solving Parsons problems, and through solving her new adaptive Parsons problems. She had a control group this time (different from her Koli Calling paper) that did turtle graphics between the pre-test and post-test, so that she could be sure there wasn’t just a testing effect of a pre-test followed by a post-test. The bottom line was basically what she predicted: learning did occur, with no significant difference between treatment groups, but the Parsons problems groups took less time. Our ebooks now include some of her adaptive Parsons problems, so she can compare performance across many students on adaptive and non-adaptive forms of the same problem. She finds that students complete more of the adaptive problems, and with fewer attempts. So, adaptive Parsons problems lead to the same amount of learning, in less time, with fewer failures. (Failures matter, since self-efficacy is a big deal in computer science education.)

“Socioeconomic status and Computer science achievement: Spatial ability as a mediating variable in a novel model of understanding” by Miranda Parker, Amber Solomon, Brianna Pritchett, David Illingworth, Lauren Margulieux, and Mark Guzdial

(Link to last version I reviewed.)

This study is a response to the paper Steve Cooper presented at ICER 2015 (see blog post here), where they found that spatial reasoning training erased performance differences between higher and lower socioeconomic status (SES) students, while the comparison class had higher-SES students performing better than lower-SES students. Miranda and Amber wanted to test this relationship at a larger scale.

Why should wealthier students do better in CS? The most common reason I’ve heard is that wealthier students have more opportunities to study CS — they have greater access. Sometimes that’s called preparatory privilege.

Miranda and Amber and their team wanted to test whether access is really the right intermediate variable. They gave students at two different universities four tests:

  • Part of Miranda’s SCS1 to measure performance in CS.
  • A standardized test of SES.
  • A test of spatial reasoning.
  • A survey about the amount of access they had to CS education, e.g., formal classes, code clubs, summer camps, etc.

David and Lauren did the factor analysis and structural equation modeling to compare two hypotheses: Does higher SES lead to greater access which leads to greater success in CS, or does higher SES lead to higher spatial reasoning which leads to greater success in CS? Neither hypothesis accounted for a significant amount of the differences in CS performance, but the spatial reasoning model did better than the access model.
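The paper itself uses factor analysis and structural equation modeling. As a rough illustration of the underlying idea (toy simulated data and plain regressions, not the authors’ method), here is a sketch that compares the indirect effect of SES on CS performance through each candidate mediator: the product of the SES-to-mediator coefficient and the mediator-to-CS coefficient, controlling for SES.

import numpy as np
import statsmodels.api as sm

# Toy simulated data; the coefficients below are assumptions for illustration only.
rng = np.random.default_rng(0)
n = 200
ses = rng.normal(size=n)
spatial = 0.5 * ses + rng.normal(size=n)   # hypothetical: SES influences spatial ability
access = 0.3 * ses + rng.normal(size=n)    # hypothetical: SES influences access to CS
cs_score = 0.6 * spatial + 0.1 * access + rng.normal(size=n)

def indirect_effect(mediator):
    # Path a: SES -> mediator
    a = sm.OLS(mediator, sm.add_constant(ses)).fit().params[1]
    # Path b: mediator -> CS score, controlling for SES
    X = sm.add_constant(np.column_stack([ses, mediator]))
    b = sm.OLS(cs_score, X).fit().params[2]
    return a * b

print("indirect effect via spatial ability:", indirect_effect(spatial))
print("indirect effect via access:", indirect_effect(access))

A full structural equation model also handles measurement error and compares overall model fit, which this sketch leaves out.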

There are some significant limitations of this study. The biggest is that they gathered data at universities. A lot of SES variance just disappears when you look at college students — they tend to be wealthier than average.

Still, the result is important for challenging the prevailing assumption about why wealthier kids do better in CS. Moreover, spatial reasoning is an interesting variable because it’s relatively inexpensive to teach. It’s expensive to prepare CS teachers and get them into all schools. Steve showed that we can teach spatial reasoning within an existing CS class and reduce SES differences.

“Applying a Gesture Taxonomy to Introductory Computing Concepts” by Amber Solomon, Betsy DiSalvo, Mark Guzdial, and Ben Shapiro

(Link to last version I saw.)

We were a bit surprised (quite pleasantly!) that this paper got into ICER. I love the paper, but it’s different from most ICER papers.

Amber is interested in the role that gestures play in teaching CS. She started this paper from a taxonomy of gestures seen in other STEM classes. She observed a CS classroom and used her observations to provide concrete examples of the gestures seen in other kinds of classes. This isn’t a report of empirical findings. This is a report of using a lens borrowed from another field to look at CS learning and teaching in a new way.

My favorite part of this paper is when Amber points out which parts of CS gestures don’t really fit the taxonomy. It’s one thing to point to lines of code – that’s relatively concrete. It’s another thing to “point” to referenced data, e.g., when explaining a sort, you gesture at the two elements you’re comparing or swapping. What exactly/concretely are we pointing at? Arrays are neither horizontal nor vertical — that distinction doesn’t really exist in memory. Arrays have no physical representation, but we act (usually) as if they’re laid out horizontally in front of us. What assumptions are we making in order to use gestures in our teaching? And what if students don’t share those assumptions?

August 10, 2018 at 7:00 am 7 comments

A Generator for Parsons problems on LaTeX exams and quizzes

I just finished teaching my Introduction to Media Computation course a few weeks ago to over 200 students. After Barb finished her dissertation on Parsons problems this semester, I decided that I should include Parsons problems on my last quiz, on the final exam study guide, and on the final exam. Parsons problems are a great fit for this assessment task. We know that Parsons problems are a more sensitive measure of learning than code-writing problems, that they’re just as effective as code writing or code fixing for learning (so they’re good for a study guide), and that they take less time than writing or fixing code.

Barb’s work used an interactive tool for providing adaptive Parsons problems. I needed to use paper for the quiz and final exam. There have been several paper-based implementations of Parsons problems, and Barb guided me in developing mine.

But I realized that there’s a challenge to doing a bunch of Parsons problems like this. Scrambling code is pretty easy, but what happens when you find that you got something wrong? The quiz, study guide, and final exam were all going to go through several iterations as we developed them and tested them with the teaching assistants. How could I make sure that the scrambled code and the right answer always stayed aligned?

I decided to build a gadget in LiveCode to do it.

I paste the correctly ordered code into the field on the left. When I press “Scramble,” a random ordering of the code appears (in a Verbatim LaTeX environment) along with the right answers, to be used in the LaTeX exam class. If you want to list a number of points to be associated with each correct line, you can put a number into the field above the solution field. If empty, no points will be explicitly allocated in the exam document.

I’d then paste both of those fields into my LaTeX source document. (I usually also pasted in the original source code in the correct order, so that I could fix the code and re-run the scramble when I inevitably found that I did something wrong.)
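The gadget itself is written in LiveCode (source linked below), but the core transformation is small. Here is a minimal Python sketch of the same idea (not the original tool): scramble the lines of a correct solution, emit a verbatim block for the LaTeX exam document, and keep an answer key that stays aligned with the scramble.

import random

def scramble_for_exam(correct_code, seed=None):
    lines = [ln for ln in correct_code.strip("\n").split("\n") if ln.strip()]
    order = list(range(len(lines)))
    random.Random(seed).shuffle(order)

    # The scrambled listing, numbered as the student will see it.
    scrambled = "\\begin{verbatim}\n"
    for shown_number, original_index in enumerate(order, start=1):
        scrambled += f"{shown_number}. {lines[original_index]}\n"
    scrambled += "\\end{verbatim}\n"

    # Answer key: the scrambled line numbers, listed in correct-solution order.
    key = [order.index(i) + 1 for i in range(len(lines))]
    answer_key = "Correct order: " + ", ".join(str(k) for k in key)
    return scrambled, answer_key

# Hypothetical example (not one of the actual exam problems).
block, key = scramble_for_exam("def halve(x):\n    return x / 2\n\nprint(halve(8))", seed=42)
print(block)
print(key)

Because the answer key is derived from the same shuffle, regenerating the problem after fixing the source code keeps the scrambled listing and the solution aligned automatically.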

The wording of the problem was significant. Barb coached me on the best practice: you allow students to write just the line number, but encourage them to write the whole line, because writing the line out imposes less cognitive load on them.

Unscramble the code below that halves the frequency of the input sound.

Put the code in the right order on the lines below. You may write the line numbers of the scrambled code in the right order, or you can write the lines themselves (or both). (If you include both, we will grade the code itself if there’s a mismatch.)

The problem as the student sees it looks like this:

The exam class can also automatically generate a version of the exam with answers, for use in grading. I didn’t solve any of the really hard problems in my script, like how to deal with lines that could be put in any order. When I found that problem, I just edited the answer fields to list the acceptable options.

I am making the LiveCode source available here: http://bit.ly/scrambled-latex-src

LiveCode generates executables very easily. I have generated Windows, MacOS, and Linux executables and put them in a zip (20 MB, all three versions) here: http://bit.ly/scrambled-latex

I used this generator probably 10-20 times in the last few weeks of the semester. I have been reflecting on this experience as an example of end-user programming. I’ll talk about that in the next blog post.

June 8, 2018 at 2:00 am 5 comments

Attending the amazing 2017 Computing at School conference #CASConf17

On June 17, Barbara and I attended the Computing at School conference in Birmingham, England (which I wrote about here). The slides from my talk are below. I highly recommend the summary from Duncan Hull, which I quote at the bottom.

CAS was a terrifically fun event, packed with 300 attendees. I underestimated the length of my talk (I tend to talk too fast), so instead of a brief Q&A, almost half the session was available for questions. Interacting with the audience to answer teachers’ questions was more fun (and hopefully more useful and entertaining) than me talking for longer. The session was well received, based on the tweets I read. In fact, that’s probably the best way to get a sense for the whole day — on Twitter, hashtag #CASConf17. (I’m going to try to embed some tweets with pictures below.)

Barbara’s two workshops on Media Computation in Python using our ebooks went over really well.

I enjoyed my interactions all day long. I was asked about research results in just about every conversation — the CAS teachers are eager to see what computing education research can offer them.  I met several computing education research PhD students, which was particularly exciting and fun. England takes computing education research seriously.

Miles Berry demonstrated Project Quantum by having participants answer questions from the database.  That was an engaging and fascinating interactive presentation.

Linda Liukas gave a terrific closing keynote. She views the world from a perspective that reminded me of Mitchel Resnick’s Lifelong Kindergarten and Seymour Papert’s playfulness. I was inspired.

The session that most made me think was from Peter Kemp on the report that he and co-authors have just completed on the state of computing education in England. That one deserves a separate blog post – coming Wednesday.

Check out Duncan’s summary of the conference:

The Computing At School (CAS) conference is an annual event for educators, mostly primary and secondary school teachers from the public and private sector in the UK. Now in its ninth year, it attracts over 300 delegates from across the UK and beyond to the University of Birmingham, see the brochure for details. One of the purposes of the conference is to give teachers new ideas to use in their classrooms to teach Computer Science and Computational Thinking. I went along for my first time (*blushes*) seeking ideas to use in an after school Code Club (ages 7-10) I’ve been running for a few years and also for approaches that undergraduate students in Computer Science (age 20+) at the University of Manchester could use in their final year Computer Science Education projects that I supervise. So here are nine ideas (in random brain dump order) I’ll be putting to immediate use in clubs, classrooms, labs and lecture theatres:

Source: Nine ideas for teaching Computing at School from the 2017 CAS conference | O’Really?

My talk slides:

July 10, 2017 at 7:00 am 1 comment

Assessing Learning In Introductory Computer Science: Dagstuhl Seminar Report now Available

I have written about this Dagstuhl Seminar (see earlier post). The formal report is now available.

This seminar discussed educational outcomes for first-year (university-level) computer science. We explored which outcomes were widely shared across both countries and individual universities, best practices for assessing outcomes, and research projects that would significantly advance assessment of learning in computer science. We considered both technical and professional outcomes (some narrow and some broad) as well as how to create assessments that focused on individual learners. Several concrete research projects took shape during the seminar and are being pursued by some participants.

Source: DROPS – Assessing Learning In Introductory Computer Science (Dagstuhl Seminar 16072)

September 26, 2016 at 7:26 am Leave a comment

Preview ICER 2016: Ebooks Design-Based Research and Replications in Assessment and Cognitive Load Studies

The International Computing Education Research (ICER) Conference 2016 is September 8-12 in Melbourne, Australia (see website here). There were 102 papers submitted, and 26 papers accepted for a 25% acceptance rate. Georgia Tech computing education researchers are justifiably proud — we submitted three papers to ICER 2016, and we had three acceptances. We’re over 10% of all papers at ICER 2016.

One of the papers extends the ebook work that I’ve reported on here (see here where we made them available and our paper on usability and usage from WiPSCE 2015). Identifying Design Principles for CS Teacher Ebooks through Design-Based Research (click on the title to get to the ACM DL page), by Barbara Ericson, Kantwon Rogers, Miranda Parker, Briana Morrison, and me, takes a design-based research perspective on our ebooks work. We describe our theory for the ebooks, then describe the iterations of what we designed, what happened when we deployed (data-driven), and how we then re-designed.

Two of our papers are replication studies — I’m so grateful to the ICER reviewers and community for seeing the value of replication studies. The first is Replication, Validation, and Use of a Language Independent CS1 Knowledge Assessment by Miranda Parker, me, and Shelly Engleman. This is Miranda’s paper expanding on her SIGCSE 2016 poster introducing the SCS1, a validated and language-independent measure of CS1 knowledge. The paper does a great survey of validated measures of learning, explains her process, and then presents what one can and can’t claim with a validated instrument.

The second is Learning Loops: A Replication Study Illuminates Impact of HS Courses by Briana Morrison, Adrienne Decker, and Lauren Margulieux. Briana and Lauren have both now left Georgia Tech, but they were still here when they did this paper, so we’re claiming them. Readers of this blog may recall Briana and Lauren’s confusing results from SIGCSE 2016, which suggest that cognitive load in CS textual programming is so high that it blows away our experimental instructional treatments. Was that an aberration? With Adrienne Decker’s help (and student participants), they replicated the study. I’ll give away the bottom line: it wasn’t an aberration. One new finding is that students who did not have high school CS classes caught up with those who did in the experiment, with respect to understanding loops.

We’re sending three of our Human-Centered Computing PhD students to the ICER 2016 Doctoral Consortium. These folks will be in the DC on Sept 8, and will present posters to the conference on Sept 9 afternoon.

September 2, 2016 at 7:53 am 17 comments

Crowd-sourcing high-quality CS Ed Assessments: CAS’s Project Quantum

A bold new project from the UK’s Computing at School initiative aims to create high-quality assessments for their entire computing curriculum, across grade levels. The goal is to generate crowd-sourced problems with quality-control checks, producing a large online resource of free assessments. It’s a remarkable idea — I’ve not heard of anything at this scale before. If it works, it’ll be a significant education outcome, as well as an enormous resource for computing educators.

I’m a bit concerned whether it can work. Let’s use open-source software as a comparison. While there are many great open-source projects, most of them die off.  There simply aren’t enough programmers in open-source to contribute to all the great ideas and keep them all going.  There are fewer people who can write high-quality assessment questions in computing, and fewer still who will do it for free. Can we get enough assessments made for this to be useful?

Project Quantum will help computing teachers check their students’ understanding, and support their progress, by providing free access to an online assessment system. The assessments will be formative, automatically marked, of high quality, and will support teaching by guiding content, measuring progress, and identifying misconceptions. Teachers will be able to direct pupils to specific quizzes and their pupils’ responses can be analysed to inform future teaching. Teachers can write questions themselves, and can create quizzes using their own questions or questions drawn from the question bank. A significant outcome is the crowd-sourced quality-checked question bank itself, and the subsequent anonymised analysis of the pupils’ responses to identify common misconceptions.

Source: CAS Community | Quantum: tests worth teaching to

May 25, 2016 at 7:51 am 3 comments

A Dagstuhl Discussion about Social and Professional Practices

Another of the breakouts that I was in at the recent Dagstuhl seminar on assessment in CS learning focused on how we teach and assess social and professional practices in CS classes. This was a small group: Amy Ko, Lisa Kaczmarczyk, Jan Erik Moström, and me.

Amy and her students have been studying (via interviews and surveys) what makes a great engineer.

  • They’re good at decision-making.
  • They’re good at shifting levels of abstraction, e.g., describing how a line of code relates to a business strategy.
  • They have some particular inter-personal skills. They program ego-less-ly. They have empathy, e.g., “not an asshole.”
  • Senior engineers often spend a lot of time being teachers for more junior engineers.

Since I’ve worked with Lijun Ni on high school CS teachers, I know some of the social and professional practices of teachers. They have content knowledge, and they have pedagogical content knowledge. They know how to teach. They know how to identify and diagnose student misunderstandings, and they know techniques for addressing these.

We know some techniques for teaching these practices. We can have students watch professionals, by shadowing or using case-based systems like the Ask systems. We can put students in apprenticeships (like student teaching or internships) or in design teams. We could even use games and other simulations. We have to convey authenticity — students have to believe that these are the real social and professional practices. An interesting question we came up with: how would you know if you had covered the set of social and professional practices?

Here’s the big question: How similar are these sets? They seem quite different to me, and these are just two possible communities of practice for students in an intro course. Are there social and professional practices that we might teach in the same intro CS — for any community of practice that the student might later join? My sense is that the important social and professional practices are not in the intersection. The most important are unique to the community of practice.

How would we know if we got there? How would you assess student learning about social and professional practice? Knowledge isn’t enough — we’re talking about practice. We have to know that they’d do the right things. And if you found out that they didn’t have the right practices, is it still actionable? Can we “fix” practices while in undergrad? Maybe students will just do the right things when they actually get out there?

The countries with low teacher attrition spend a lot of time on teacher on-boarding. In Japan, the whole school helps to prepare a new teacher, and the whole school feels a sense of failure if the first-year teacher doesn’t pass the required certification exam. The US tends not to do much on-boarding — in schools for teachers, or in industry for software engineers (as Begel and Simon found in their studies at Microsoft). On-boarding seems like a really good place, to me, for teaching professional practice. And since the student is then doing the job, assessment is job assessment.

The problems of teaching and assessing professional practice are particularly hard when you’re trying to design a new community of practice. We’d like computing to be more diverse, to be more welcoming to women and to people from under-represented groups. We’d want cultural sensitivity to be a practice for software professionals. How would you design that? How do you define a practice for a community that doesn’t exist yet? How do you convince students about the authenticity?

It’s an interesting set of problems, and some interesting questions to explore, but I came away dubious. Is this something that we can do effectively in school?  Perhaps it’s more effective to teach professional practices in the professional context?

March 9, 2016 at 8:00 am 2 comments

SIGCSE 2016 Preview: Miranda Parker replicated the FCS1

I’ve been waiting a long time to write this post, though I do so even now with some trepidation.

In 2010, Allison Elliott Tew completed her dissertation on building FCS1, the first language-independent and validated measure of introductory computer science knowledge (see this post summarizing the work). The FCS1 was a significant accomplishment, but it didn’t get used much. Allison had concerns about the test becoming freely available and no longer useful as a research instrument.

Miranda Parker joined our group and replicated the FCS1. She created an isomorphic test (which we’re calling SCS1 for Secondary CS1 instrument — it comes after the first). She then followed a rigorous process for replicating a validated instrument, including think-aloud protocols to check usability (do the problems read as she meant them?), large-scale counter-balanced study using both tests, and analysis, including correlational and item-response theory (IRT) analysis. Her results support that SCS1 is effectively identical to FCS1, but also point out the weaknesses of both tests and why we need more and better assessments.
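For readers who have not seen this kind of study, here is a minimal Python sketch (invented scores, not Miranda’s data or her full analysis) of one basic check in a counterbalanced equivalence study: correlate each student’s scores on the two forms and compare their means.

import numpy as np
from scipy import stats

# Hypothetical paired scores from students who took both forms.
fcs1 = np.array([14, 18, 9, 21, 16, 12, 19, 15])
scs1 = np.array([13, 19, 10, 20, 17, 11, 18, 16])

r, p_corr = stats.pearsonr(fcs1, scs1)    # do the two forms rank students similarly?
t, p_mean = stats.ttest_rel(fcs1, scs1)   # do the two forms differ in overall difficulty?
print(f"correlation r = {r:.2f} (p = {p_corr:.3f}); paired t = {t:.2f} (p = {p_mean:.3f})")

The actual validation also included think-aloud interviews and item-response theory analysis, which go well beyond this sketch.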

(Note: Complaining in this paragraph — some readers might just want to skip this.) As the first time anyone had ever replicated a validated CS research instrument, the process is a significant result. SIGCSE reviewers did not agree. The Associate Chair’s comment on our rejected paper said, “Two reviewers had concerns about appropriateness of this paper for SIGCSE: #XXX because it didn’t directly address improved learning, and #YYY because replicating the FCS1 wasn’t deemed to be as noteworthy as the original work.” An assessment tool doesn’t improve learning, and a first-ever replication is not publishable.

Miranda was hesitant to release SCS1 for use (e.g., post in my blog, send emails on CSEd-Research email lists) until the result was peer-reviewed. A disadvantage that my students have suffered for having an advisor who blogs — some reviewers have rejected my students’ papers because my blogging made it discoverable who did the research, and thus our papers can’t be sufficiently anonymized to meet those reviewers’ standards. So, I haven’t talked about SCS1, despite my pleasure and pride in Miranda’s accomplishment.

I’m posting this now because Miranda does have a poster on SCS1 at the SIGCSE 2016 Technical Symposium. Come see her at the 3-5 pm Poster Session on Friday. Miranda had a major success in her first year as a PhD student, and the research community now has a new validated research instrument.

Here’s the trepidation part: her paper on the replication process was just rejected for ITICSE. There’s no Associate Chair for ITICSE, so there’s no meta-review that gives the overall reasons.  One reviewer raised some concerns about the statistics, which we’ll have to investigate.  Another reviewer strongly disagrees with the idea of a replication, much like the #YYY reviewer at SIGCSE. One reviewer complained that this paper was awfully similar to a paper by Elliott Tew and Guzdial, so maybe it shouldn’t be published.  I’m not sure how we convince SIGCSE and ITICSE reviewers that replication is important and something that most STEM disciplines are calling for more of. (Particularly aggravating point: Because FCS1 is not freely available, the reviewer doesn’t believe that FCS1 is “valid, consistent, and reliable” without inspecting it — as if you can tell those characteristics just by looking at the test?)

I’m talking about SCS1 now because she has her poster accepted, so she has a publication on that.  We really want to publish her process and in particular, the insights we now have about both instruments.  We’ll have to wait to publish that — and I hope the reviewers of the next conference don’t give us grief because I talked about the result here.

Contact Miranda at scs1assessment@gmail.com for access to the test.

March 2, 2016 at 8:00 am 10 comments

Say Goodbye to Myers-Briggs, the Fad That Won’t Die

Once in our Learning Sciences seminar, we all took the Myers-Briggs test on day 1 of the semester, and again at the end.  Almost everybody’s score changed.  So, why do people still use it as some kind of reliable test of personality?

A test is reliable if it produces the same results from different sources. If you think your leg is broken, you can be more confident when two different radiologists diagnose a fracture. In personality testing, reliability means getting consistent results over time, or similar scores when rated by multiple people who know me well. As my inconsistent scores foreshadowed, the MBTI does poorly on reliability. Research shows “that as many as three-quarters of test takers achieve a different personality type when tested again,” writes Annie Murphy Paul in The Cult of Personality Testing, “and the sixteen distinctive types described by the Myers-Briggs have no scientific basis whatsoever.” In a recent article, Roman Krznaric adds that “if you retake the test after only a five-week gap, there’s around a 50% chance that you will fall into a different personality category.”

via Say Goodbye to MBTI, the Fad That Won’t Die | LinkedIn.

November 5, 2013 at 1:53 am 5 comments

1st “BOOC” on Scaling-Up What Works about to start at Indiana University

I talked with Dan Hickey about this — it’s an interesting alternative to MOOCs, and the topic is relevant for this blog.

In the fall semester of 2013, IU School of Education researcher and Associate Professor Dr. Daniel Hickey will be leading an online course. The 11-week course will begin on September 9 and is being called a “BOOC,” or “Big Open Online Course.” The main topic being taught is “Educational Assessment: Practices, Principles, and Policies.” Here students will develop “WikiFolios,” endorse each other’s work, and earn bona fide Digital Badges based on the work they complete. Additionally, the course provides an opportunity for Dr. Hickey to observe how these activities translate from the same for-credit, online course that initially seated 25 students to the new “BOOC” format hosting 500 participants. Regarding his small-scale experimental study, Dr. Hickey stated:

“I feel like I came up with some nice strategies for streamlining the course and making it a little less demanding which I think is necessary for an open, non-credit course. I learned ways to shorten the class, to get it from the normal 15 week semester to the 11 weeks. I condensed some of the assignments and gave students options; they do performance or portfolio assessment, they don’t do both. I thought that was pretty good for students.”

via 1st “BOOC” To Begin In September, Scaling-Up What Works | BOOC at Indiana University.

September 5, 2013 at 1:46 am Leave a comment

Learning for today versus learning for tomorrow: Teaching evaluations

Really interesting set of experiments that give us new insight into the value of teaching evaluations.  The second is particularly striking and points to the difficulty of measuring teaching quality — good today isn’t the same as good tomorrow.

When you measure performance in the courses the professors taught (i.e., how intro students did in intro), the less experienced and less qualified professors produced the best performance. They also got the highest student evaluation scores. But the more experienced and qualified professors’ students did best in follow-on courses (i.e., their intro students did best in advanced classes). The authors speculate that the more experienced professors tend to “broaden the curriculum and produce students with a deeper understanding of the material” (p. 430). That is, because they don’t teach directly to the test, they do worse in the short run but better in the long run.

via Do the Best Professors Get the Worst Ratings? | Psychology Today.

June 26, 2013 at 1:20 am 1 comment

Minerva Project Announces Annual $500,000 Prize for Professors: Measured how?

How would one measure extraordinary, innovative teaching?  We have a difficult time measuring regular teaching!

The Minerva Project, a San Francisco venture with lofty but untested plans to redefine higher education, said on Monday that starting next year it would award an annual $500,000 prize to a faculty member at any institution in the world who has demonstrated extraordinary, innovative teaching.

via Minerva Project Announces Annual $500,000 Prize for Professors – NYTimes.com.

May 17, 2013 at 1:48 am 3 comments

SIGCSE 2013 Preview: Measuring attitudes in introductory computing

Brian Dorn and Allison Elliott Tew have been working on a new assessment instrument for measuring attitudes towards computing.  They published a paper at ICER 2012 on its development, and the new SIGCSE 2013 paper is on its initial uses.

In general, we have too few research measures in computing education research.  Allison’s dissertation work stands alone as the only validated language-independent measure of CS1.  Brian and Allison have been following a careful process of developing the Computing Attitudes Survey (CAS).  They’re developing their instrument based on a measure created for Physics. The Physics instrument has already been adapted for Chemistry and Biology, so the process of adaptation is well-defined.

What’s particularly cool about CAS is that it can be used as a pre-test/post-test.  What were the attitude effects of a particular intervention?  The SIGCSE 2013 paper describes use of CAS in a set of pre-test/post-test situations.

Here comes the remarkable part. In the other fields, an introductory course actually leads to decreased interest in the field (more specifically, to attitudes less like those of experts in the field). But not in computer science! The CAS indicates increased interest in the field after the first course.

Why is that? I like the hypothesis that Brian and Allison suggest. Students have some clue what physics, biology, and chemistry are — but their idea is probably significantly wrong about real practice, and real practice is more rigorous than they thought. Students have almost no clue what computer science is. They probably have misconceptions, but they are not tightly held — we’ve found that high school students’ perceptions of what CS is can be changed pretty easily. After a first CS course, students realize that it’s more interesting than they thought, so attitudes become more expert-like and positive.

February 15, 2013 at 1:49 am 5 comments
