Posts tagged ‘assessment’
I have written about this Dagstuhl Seminar (see earlier post). The formal report is now available.
This seminar discussed educational outcomes for first-year (university-level) computer science. We explored which outcomes were widely shared across both countries and individual universities, best practices for assessing outcomes, and research projects that would significantly advance assessment of learning in computer science. We considered both technical and professional outcomes (some narrow and some broad) as well as how to create assessments that focused on individual learners. Several concrete research projects took shape during the seminar and are being pursued by some participants.
Preview ICER 2016: Ebooks Design-Based Research and Replications in Assessment and Cognitive Load Studies
The International Computing Education Research (ICER) Conference 2016 is September 8-12 in Melbourne, Australia (see website here). There were 102 papers submitted, and 26 papers accepted for a 25% acceptance rate. Georgia Tech computing education researchers are justifiably proud — we submitted three papers to ICER 2016, and we had three acceptances. We’re over 10% of all papers at ICER 2016.
One of the papers extends the ebook work that I’ve reported on here (see here where we made them available and our paper on usability and usage from WiPSCE 2015). Identifying Design Principles for CS Teacher Ebooks through Design-Based Research (click on the title to get to the ACM DL page), by Barbara Ericson, Kantwon Rogers, Miranda Parker, Briana Morrison, and me, uses a Design-Based Research perspective on our ebooks work. We describe our theory for the ebooks, then describe the iterations of what we designed, what happened when we deployed (data-driven), and how we then re-designed.
Two of our papers are replication studies — so grateful to the ICER reviewers and communities for seeing the value of replication studies. The first is Replication, Validation, and Use of a Language Independent CS1 Knowledge Assessment by Miranda Parker, me, and Shelly Engleman. This is Miranda’s paper expanding on her SIGCSE 2016 poster introducing the SCS1 validated and language-independent measure of CS1 knowledge. The paper does a great survey of validated measures of learning, explains her process, and then presents what one can and can’t claim with a validated instrument.
The second is Learning Loops: A Replication Study Illuminates Impact of HS Courses by Briana Morrison, Adrienne Decker, and Lauren Margulieux. Briana and Lauren have both now left Georgia Tech, but they were still here when they did this paper, so we’re claiming them. Readers of this blog may recall Briana and Lauren’s confusing result from SIGCSE 2016, which suggests that cognitive load in CS textual programming is so high that it blows away our experimental instructional treatments. Was that an aberration? With Adrienne Decker’s help (and student participants), they replicated the study. I’ll give away the bottom line: It wasn’t an aberration. One new finding is that students who did not have high school CS classes caught up with those who did in the experiment, with respect to understanding loops.
We’re sending three of our Human-Centered Computing PhD students to the ICER 2016 Doctoral Consortium. These folks will be in the DC on Sept 8, and will present posters to the conference on the afternoon of Sept 9.
- Barbara Ericson will be presenting her results with Dynamically Adaptive Parsons Problems. I’ve seen some of the pilot study results from this summer, and they’re fascinating.
- Amber Solomon is just starting her second year working with me. She did the evaluation on the AR Design Studio classroom. She and I are both fascinated by Steve Cooper’s results from ICER 2015 where spatial reasoning training influenced CS performance and reduced SES differences. She’s been doing a study on CS grades, SES, and spatial reasoning in a non-majors class. She’ll be presenting on The Role of Spatial Reasoning in Learning Computer Science.
- Kayla DesPortes works with my colleague Betsy DiSalvo on the learning that happens in MakerSpaces. She’s designing new kinds of physical interfaces to reduce cognitive load and improve learning when working with electronics, which she’ll be talking about at her poster: Learning and Collaboration in Physical Computing.
A bold new effort from the UK’s Computing at School project aims to create high-quality assessments for their entire computing curriculum, across grade levels. The goal is to generate crowd-sourced problems with quality control checks to produce a large online resource of free assessments. It’s a remarkable idea; I’ve not heard of anything at this scale before. If it works, it’ll be a significant education outcome, as well as an enormous resource for computing educators.
I’m a bit concerned whether it can work. Let’s use open-source software as a comparison. While there are many great open-source projects, most of them die off. There simply aren’t enough programmers in open-source to contribute to all the great ideas and keep them all going. There are fewer people who can write high-quality assessment questions in computing, and fewer still who will do it for free. Can we get enough assessments made for this to be useful?
Project Quantum will help computing teachers check their students’ understanding, and support their progress, by providing free access to an online assessment system. The assessments will be formative, automatically marked, of high quality, and will support teaching by guiding content, measuring progress, and identifying misconceptions. Teachers will be able to direct pupils to specific quizzes and their pupils’ responses can be analysed to inform future teaching. Teachers can write questions themselves, and can create quizzes using their own questions or questions drawn from the question bank. A significant outcome is the crowd-sourced quality-checked question bank itself, and the subsequent anonymised analysis of the pupils’ responses to identify common misconceptions.
Another of the breakouts that I was in at the recent Dagstuhl seminar on assessment in CS learning focused on how we teach and assess social and professional practices in CS classes. This was a small group: Andy Ko, Lisa Kaczmarczyk, Jan Erik Moström, and me.
Andy and his students have been studying (via interviews and surveys) what makes a great engineer.
- They’re good at decision-making.
- They’re good at shifting levels of abstraction, e.g., describing how a line of code relates to a business strategy.
- They have some particular inter-personal skills. They program ego-less-ly. They have empathy, e.g., “not an asshole.”
- Senior engineers often spend a lot of time being teachers for more junior engineers.
Since I’ve worked with Lijun Ni on high school CS teachers, I know some of the social and professional practices of teachers. They have content knowledge, and they have pedagogical content knowledge. They know how to teach. They know how to identify and diagnose student misunderstandings, and they know techniques for addressing these.
We know some techniques for teaching these practices. We can have students watch professionals, by shadowing or using case-based systems like the Ask systems. We can put students in apprenticeships (like student teaching or internships) or in design teams. We could even use games and other simulations. We have to convey authenticity: students have to believe that these are the real social and professional practices. An interesting question we came up with: How would you know if you covered the set of social and professional practices?
Here’s the big question: How similar are these sets? They seem quite different to me, and these are just two possible communities of practice for students in an intro course. Are there social and professional practices that we might teach in the same intro CS course, for any community of practice that the student might later join? My sense is that the important social and professional practices are not in the intersection. The most important are unique to the community of practice.
How would we know if we got there? How would you assess student learning about social and professional practice? Knowledge isn’t enough — we’re talking about practice. We have to know that they’d do the right things. And if you found out that they didn’t have the right practices, is it still actionable? Can we “fix” practices while in undergrad? Maybe students will just do the right things when they actually get out there?
The countries with low teacher attrition spend a lot of time on teacher on-boarding. In Japan, the whole school helps to prepare a new teacher, and the whole school feels a sense of failure if the first-year teacher doesn’t pass the required certification exam. The US tends not to have much on-boarding, whether at schools for teachers or in industry for software engineers (as Begel and Simon found in their studies at Microsoft). On-boarding seems like a really good place, to me, for teaching professional practice. And since the student is then doing the job, assessment is job assessment.
The problems of teaching and assessing professional practice are particularly hard when you’re trying to design a new community of practice. We’d like computing to be more diverse, to be more welcoming to women and to people from under-represented groups. We’d want cultural sensitivity to be a practice for software professionals. How would you design that? How do you define a practice for a community that doesn’t exist yet? How do you convince students about the authenticity?
It’s an interesting set of problems, and some interesting questions to explore, but I came away dubious. Is this something that we can do effectively in school? Perhaps it’s more effective to teach professional practices in the professional context?
I’ve been waiting a long time to write this post, though I do so even now with some trepidation.
In 2010, Allison Elliott Tew completed her dissertation on building FCS1, the first language-independent and validated measure of introductory computer science knowledge (see this post summarizing the work). The FCS1 was a significant accomplishment, but it didn’t get used much. Allison had concerns about the test becoming freely available and no longer useful as a research instrument.
Miranda Parker joined our group and replicated the FCS1. She created an isomorphic test (which we’re calling SCS1 for Secondary CS1 instrument, since it comes after the first). She then followed a rigorous process for replicating a validated instrument, including think-aloud protocols to check usability (do the problems read as she meant them?), a large-scale counter-balanced study using both tests, and analysis, including correlational and item-response theory (IRT) analysis. Her results support that SCS1 is effectively identical to FCS1, but also point out the weaknesses of both tests and why we need more and better assessments.
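For readers who haven’t run a correlational analysis, here is a minimal sketch of the general idea in Python. It is not the analysis from Miranda’s study. The function name `pearson_r` and the score lists are hypothetical, invented only for illustration. When the same people take both tests, a Pearson correlation near 1 says their scores rise and fall together.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for five test takers on each instrument (not real data).
fcs1_scores = [10, 14, 9, 20, 17]
scs1_scores = [11, 13, 9, 19, 18]

print(round(pearson_r(fcs1_scores, scs1_scores), 3))
```

A high correlation alone doesn’t make two tests interchangeable, which is why the process described above also includes think-alouds and IRT analysis.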
(Note: Complaining in this paragraph — some readers might just want to skip this.) As the first time anyone had ever replicated a validated CS research instrument, the process is a significant result. SIGCSE reviewers did not agree. The Associate Chair’s comment on our rejected paper said, “Two reviewers had concerns about appropriateness of this paper for SIGCSE: #XXX because it didn’t directly address improved learning, and #YYY because replicating the FCS1 wasn’t deemed to be as noteworthy as the original work.” An assessment tool doesn’t improve learning, and a first-ever replication is not publishable.
Miranda was hesitant to release SCS1 for use (e.g., post on my blog, send emails on CSEd-Research email lists) until the result was peer-reviewed. My students have suffered a disadvantage for having an advisor who blogs: some reviewers have rejected my students’ papers because my blogging made it discoverable who did the research, and thus our papers can’t be sufficiently anonymized to meet those reviewers’ standards. So, I haven’t talked about SCS1, despite my pleasure and pride in Miranda’s accomplishment.
I’m posting this now because Miranda does have a poster on SCS1 at the SIGCSE 2016 Technical Symposium. Come see her at the 3-5 pm Poster Session on Friday. Miranda had a major success in her first year as a PhD student, and the research community now has a new validated research instrument.
Here’s the trepidation part: her paper on the replication process was just rejected for ITICSE. There’s no Associate Chair for ITICSE, so there’s no meta-review that gives the overall reasons. One reviewer raised some concerns about the statistics, which we’ll have to investigate. Another reviewer strongly disagrees with the idea of a replication, much like the #YYY reviewer at SIGCSE. One reviewer complained that this paper was awfully similar to a paper by Elliott Tew and Guzdial, so maybe it shouldn’t be published. I’m not sure how we convince SIGCSE and ITICSE reviewers that replication is important and something that most STEM disciplines are calling for more of. (Particularly aggravating point: Because FCS1 is not freely available, the reviewer doesn’t believe that FCS1 is “valid, consistent, and reliable” without inspecting it — as if you can tell those characteristics just by looking at the test?)
I’m talking about SCS1 now because she has her poster accepted, so she has a publication on that. We really want to publish her process and in particular, the insights we now have about both instruments. We’ll have to wait to publish that — and I hope the reviewers of the next conference don’t give us grief because I talked about the result here.
Contact Miranda at email@example.com for access to the test.
Once in our Learning Sciences seminar, we all took the Myers-Briggs test on day 1 of the semester, and again at the end. Almost everybody’s score changed. So, why do people still use it as some kind of reliable test of personality?
A test is reliable if it produces the same results from different sources. If you think your leg is broken, you can be more confident when two different radiologists diagnose a fracture. In personality testing, reliability means getting consistent results over time, or similar scores when rated by multiple people who know me well. As my inconsistent scores foreshadowed, the MBTI does poorly on reliability. Research shows “that as many as three-quarters of test takers achieve a different personality type when tested again,” writes Annie Murphy Paul in The Cult of Personality Testing, “and the sixteen distinctive types described by the Myers-Briggs have no scientific basis whatsoever.” In a recent article, Roman Krznaric adds that “if you retake the test after only a five-week gap, there’s around a 50% chance that you will fall into a different personality category.”
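To make the test-retest idea concrete, here is a minimal sketch in Python. The function name `fraction_changed` and the type labels are hypothetical, invented for illustration and not taken from any study. It computes the share of test takers who land in a different category on a second sitting, which is the kind of figure the quotes above report.

```python
def fraction_changed(first, second):
    """Share of test takers whose category differs between two sittings."""
    if len(first) != len(second):
        raise ValueError("both sittings must cover the same test takers")
    if not first:
        raise ValueError("need at least one test taker")
    # Count the people whose label on the retest differs from the first sitting.
    changed = sum(1 for a, b in zip(first, second) if a != b)
    return changed / len(first)

# Hypothetical four-letter types for four people, in the same order (not real data).
day_one = ["INTJ", "ENFP", "ISTP", "ENTJ"]
end_of_term = ["INTP", "ENFP", "ISFP", "ENTJ"]

print(fraction_changed(day_one, end_of_term))
```

A reliable instrument would keep this fraction close to zero over a short gap.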
I talked with Dan Hickey about this — it’s an interesting alternative to MOOCs, and the topic is relevant for this blog.
In the fall semester of 2013, IU School of Education Researcher and Associate Professor Dr. Daniel Hickey will be leading an online course. The 11-week course will begin on September 9 and is being called a ‘BOOC’ or “Big Open Online Course”. The main topic being taught is “Educational Assessment: Practices, Principles, and Policies”. Here students will develop “WikiFolios”, endorse each other’s work, and earn bona fide Digital Badges based on the work they complete. Additionally, the course provides an opportunity for Dr. Hickey to observe how these activities translate from the same for-credit, online course that initially seated 25 students to the new ‘BOOC’ format hosting 500 participants. Of his small-scale experimental study, Dr. Hickey stated:
“I feel like I came up with some nice strategies for streamlining the course and making it a little less demanding which I think is necessary for an open, non-credit course. I learned ways to shorten the class, to get it from the normal 15 week semester to the 11 weeks. I condensed some of the assignments and gave students options; they do performance or portfolio assessment, they don’t do both. I thought that was pretty good for students.”