Big data: are we making a big mistake? Yes, especially in education
This is an important article that gets at some of my concerns about using MOOCs to inform education research. The sampling bias described in the excerpt below is one of my responses to the claim that we can inform education research by analyzing MOOC results. We can only learn from the students who stay and generate data. If 90% of the students leave, the data tells us nothing about them. Making claims about computing education based on the 10% who complete a CS MOOC (who are also mostly white or Asian, male, wealthy, and well-educated) is bad science.
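A toy simulation makes the sampling-bias point concrete. It is only a sketch, not a model of any real MOOC: the `prep` score, the dropout rule, and the function name `simulate` are all assumptions invented for illustration. The dropout rule is chosen so that about 10% of students complete, matching the hypothetical figure above.

```python
import random
import statistics


def simulate(n_students=10_000, seed=0):
    """Return (prep scores of all students, prep scores of completers).

    Assumptions (hypothetical, for illustration only):
    - each student has a "prior preparation" score drawn uniformly from [0, 1]
    - the chance of completing the course is 0.3 * prep**2, which averages
      out to a completion rate of about 10% across the population
    """
    rng = random.Random(seed)
    everyone = [rng.random() for _ in range(n_students)]
    completers = [p for p in everyone if rng.random() < 0.3 * p ** 2]
    return everyone, completers


if __name__ == "__main__":
    everyone, completers = simulate()
    rate = len(completers) / len(everyone)
    print(f"completion rate:            {rate:.1%}")
    print(f"mean prep, all students:    {statistics.mean(everyone):.2f}")
    print(f"mean prep, completers only: {statistics.mean(completers):.2f}")
```

Under these assumptions the completers are noticeably better prepared than the population they came from, so any conclusion drawn from completers alone describes them and not the students who left.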
Cheerleaders for big data have made four exciting claims, each one reflected in the success of Google Flu Trends: that data analysis produces uncannily accurate results; that every single data point can be captured, making old statistical sampling techniques obsolete; that it is passé to fret about what causes what, because statistical correlation tells us what we need to know; and that scientific or statistical models aren’t needed because, to quote “The End of Theory”, a provocative essay published in Wired in 2008, “with enough data, the numbers speak for themselves”.
Unfortunately, these four articles of faith are at best optimistic oversimplifications. At worst, according to David Spiegelhalter, Winton Professor of the Public Understanding of Risk at Cambridge university, they can be “complete bollocks. Absolute nonsense.”