Computer scientists need to understand education research methods for CE21
February 13, 2012 at 10:01 am 7 comments
At the CE21 meeting earlier this month, I got asked a similar question more than once. “I have got this great class on X for high school teachers. I want to ‘evaluate it’. Um…how many teachers do I need?” I’m pretty sure I really heard the quote marks around “evaluate it,” because I’m pretty sure that the question-asker really had little idea what that meant.
I used this story as an example in my educational technology class last week. It’s worth exploring why that’s not answerable as-is. “How many teachers do I need?” depends on the research question that you’re trying to answer. There are lots of questions one might ask about a “great class for high school teachers.” Which one are you trying to explore?
- Maybe you think you’ve solved a particular problem that high school teachers face in learning computer science, like struggling with data structures or fitting the course material into their daily lives. I’m particularly interested in that latter problem. To answer that question, you need to talk to the teachers, to get an understanding of whether the teachers faced the problem and whether your class helped them get past it. You’re not going to interview 20 people and do something useful with all of that data (interview transcripts). At least 3-5 participants, and probably no more than 10-12, would let you answer your question.
- Maybe you think that your class in X is better than other classes in X. Then, you need to do a comparison study. My rudimentary knowledge of statistics suggests that you need 40-50 teachers with about half taking each course so that you can compare them on some learning or performance measure.
- Maybe you think that your class can scale dramatically well, that you really have a solution to the CS10K challenge: your class can educate thousands of teachers in the next four years. That’s great, but to be convincing, you’re going to have to show that you can run your class at scale (maybe 100 teachers at once would be convincing) and that you still achieve learning outcomes (against some reasonable measure of learning, like Allison’s test or the outcome measures being developed for CS:Principles). You don’t need to do a comparison to something else if you’re trying to demonstrate scale, and you certainly aren’t going to interview all those participants.
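For the comparison-study case in the list above, a standard normal-approximation sample-size formula gives a rough sense of where a number like 40-50 comes from. The sketch below is only an illustration, not a substitute for a real power analysis. The function name, the 5% significance level, the 80% power target, and the effect sizes tried are all assumptions made for the example.

```python
import math
from statistics import NormalDist


def teachers_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed in EACH group for a two-sided
    comparison of two group means.

    effect_size is the expected difference between the groups in
    standard-deviation units (Cohen's d). This uses the normal
    approximation, which runs slightly low for small samples.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes: medium, large, very large.
for d in (0.5, 0.8, 1.0):
    n = teachers_per_group(d)
    print(f"effect size {d}: {n} per group, {2 * n} total")
```

Under these assumptions, a large effect (0.8 standard deviations) needs about 25 teachers per group, which is in the neighborhood of 40-50 in total, while a medium effect (0.5) needs about 63 per group. The answer depends heavily on the effect size you expect, which is why the number cannot be fixed in advance of the research question.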
There are other possible research questions, with other appropriate evaluation mechanisms. Do you think that your intervention is going to result in systemic change? Then you need a longitudinal study. Do you think that you have a class that will draw more teachers into CS teaching? Then your real target audience is outside your classroom, and you need to do an evaluation that extends outside your classroom.
The greatest challenge facing the CE21 community is that the community is filled with computer scientists. Computer science too rarely asks questions involving human beings, so we have too little practice defining the right kinds of methods. The CE21 meeting had a few education researchers, who seemed not too comfortable with computer science, and there was way too little collaboration between the two groups. If we want to do education research that means something, we need to learn how to ask research questions that involve humans and to figure out the right methods.
1.
John Pane | February 13, 2012 at 1:20 pm
Hi Mark,
I agree with your general point that the study design depends on the research questions you are asking. Just a couple of comments…
Qualitative researchers do know how to do something useful with 20+ interviews. They generally use qualitative analysis software to “code” the text and identify themes. Examples of the software are ATLAS.ti and NVivo. So the decision of how many interviews to conduct shouldn’t necessarily be driven by trying to keep the number small, but, once again, by sizing the study appropriately to address the research questions. If the numbers get larger you can also consider using surveys, supplemented by interviews to help figure out what should be on the surveys or to help provide richer context for what you receive in the survey responses.
40-50 teachers might work for sufficient statistical power in a randomized experiment. I usually start with a rule of thumb of 60 (30 treatment, 30 control), but the necessary number could be even larger. To do power analysis you need to understand (or estimate) some things about the intervention, how effects will be assessed, and the research context. For example: pilot data on the effects of the intervention is very useful; it is usually easier to achieve large effects on proximal measures than more distal measures (think, unit test versus end-of-year broad achievement assessment); and if the teachers are clustered in schools or districts this can vastly affect your power and require even larger numbers (requires hierarchical data analysis). There is software to help with power calculations (e.g. Optimal Design http://sitemaker.umich.edu/group-based/optimal_design_software), but you probably need someone experienced in education research to help determine appropriate input parameters.
Best,
John
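The clustering point in the comment above can be made concrete with the standard design-effect approximation, 1 + (m - 1) × ICC, where m is the number of teachers per school and ICC is the intraclass correlation. This is a minimal sketch that assumes equal-sized clusters. The function names and the example values (5 teachers per school, an ICC of 0.25) are hypothetical, and it does not replace the hierarchical analysis that tools like Optimal Design support.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from sampling teachers in equal-sized clusters
    (for example, schools) rather than independently."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(n_unclustered, cluster_size, icc):
    """Total participants needed once clustering is accounted for,
    given the number needed if everyone were independent."""
    return math.ceil(n_unclustered * design_effect(cluster_size, icc))


# Hypothetical scenario: the 60-teacher rule of thumb, with teachers
# recruited 5 per school and an assumed ICC of 0.25.
print(design_effect(5, 0.25))              # 2.0
print(clustered_sample_size(60, 5, 0.25))  # 120
```

With these made-up inputs the required sample doubles, which illustrates why clustering can call for much larger numbers than a simple rule of thumb suggests.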
2.
Mark Guzdial | February 13, 2012 at 1:40 pm
Agreed on both points. Some of my students have used some of the software for dealing with larger numbers of interviews, and yes, a combination of interview and survey is something that we have done and are doing now. And yes, there are methods that folks better versed in statistics than I am can use to determine power and the needed number of participants. Thanks!
3.
gasstationwithoutpumps | February 13, 2012 at 9:59 pm
The amount of data you need depends on the effect size. If you have no idea how powerful the effect is, you can’t guess how many subjects you need. Doing a preliminary experiment to estimate the effect size, then redoing the experiment with enough people to ensure that an effect of that size is measurable with sufficient significance, seems like the only reasonable approach.
Most of the education research I’ve seen is only the collection of the preliminary data to form a hypothesis, without the (expensive) followup to make sure that it is not just a statistical fluke. This may be part of the reason why so few of the research-based curricula survive long in the real world (there may be plenty of other reasons also).
4.
Mark Guzdial | February 13, 2012 at 10:06 pm
Not all education research is about curriculum. Allison Elliott Tew’s work was about developing an instrument. Some education research is asking about specific intervention questions (like about media or technology), or about understanding a student population (e.g., our work characterizing high school teachers or graphics designers). Sometimes the right data to collect has nothing to do with statistics.
5.
Jill Denner | February 15, 2012 at 12:28 pm
I was one of the education researchers at the CE21 meeting, and it is one of the few places that we have the opportunity to meet with computer scientists. We need more opportunities to build these interdisciplinary collaborations, and NSF is starting to fund this effort. Check out Fostering Interdisciplinary Research on Education (FIRE), which funds collaboration. And the Computer Science Collaboration Project, which builds connections across disciplines, as well as research and practice http://www.cscproject.org.
6. Computer scientists need to understand education research methods for CE21 | INTERNET | February 20, 2012 at 11:28 am
[…] questions that involve humans and to figure out the right methods. Original version here […]
7.
Michelle | March 15, 2012 at 4:02 pm
This is exactly why I’m focusing my efforts on education, sociology, and social and cognitive psychology. There’s a huge deficit in the number of people with experience and expertise in social science research, which is fundamentally what we’re trying to do.
It’s an interesting problem when looking at the results, too. Computer scientists are used to results of experiments that are consistent and fairly cut-and-dried. “This algorithm ran this fast under these conditions.” Research dealing with people is much murkier – validity of qualitative results is (as you know) different than quantitative, and mixed methods is more powerful but even more complicated. People believe statistics but they respond to stories. Computer scientists are suspicious of small n’s and quotes.
I hope to attend CE21 someday.