Big data: are we making a big mistake? Yes, especially in education

April 14, 2014 at 8:59 am 6 comments

Important article that gets at some of my concerns about using MOOCs to inform education research.  The sampling bias mentioned in the article below is one of my responses to the claim that we can inform education research by analyzing the results of MOOCs. We can only learn from the data of participants. If 90% of the students go away, we can’t learn about them. Making claims about computing education based on the 10% who complete a CS MOOC (and mostly white/Asian, male, wealthy, and well-educated at that) is bad science.
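A minimal sketch of that sampling problem, using entirely made-up numbers. The cohort, the "prior preparation" score, and the rule that only the best-prepared 10% finish are all assumptions for illustration, not data from any real MOOC. It shows how far an average computed on completers can sit from the average of everyone who enrolled.

```python
from statistics import mean

# Hypothetical cohort: 1,000 enrolled students, each with a made-up
# "prior preparation" score from 0 to 99 (ten students per score).
enrolled = [score for score in range(100) for _ in range(10)]

# Assumption for illustration only: the 10% who finish are exactly the
# best-prepared students. Real attrition is noisier than this.
completion_rate = 0.10
cutoff = int(len(enrolled) * completion_rate)
completers = sorted(enrolled, reverse=True)[:cutoff]

population_mean = mean(enrolled)    # what we would like to know
completer_mean = mean(completers)   # what the completion data can show

print(f"all enrolled: {population_mean:.1f}")
print(f"completers:   {completer_mean:.1f}")
```

However many rows the completer data set has, it contains no information about the 900 students who left, so more data of the same kind does not close the gap.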

Cheerleaders for big data have made four exciting claims, each one reflected in the success of Google Flu Trends: that data analysis produces uncannily accurate results; that every single data point can be captured, making old statistical sampling techniques obsolete; that it is passé to fret about what causes what, because statistical correlation tells us what we need to know; and that scientific or statistical models aren’t needed because, to quote “The End of Theory”, a provocative essay published in Wired in 2008, “with enough data, the numbers speak for themselves”.

Unfortunately, these four articles of faith are at best optimistic oversimplifications. At worst, according to David Spiegelhalter, Winton Professor of the Public Understanding of Risk at Cambridge university, they can be “complete bollocks. Absolute nonsense.”

via Big data: are we making a big mistake? –



6 Comments

  • 1. Raul Miller  |  April 14, 2014 at 10:00 am

    On the flip side, calling into question the value of “big data” also calls into question the value of education itself:

    Logically speaking, there must be some cases where having lots of information is valuable, otherwise what we are saying is that there’s only a subjective benefit to education, and no objective benefit.

    The fundamental reasons for doubting “big data” claims have to do with questions about the accuracy of the data and questions about the relevance of the data. Essentially, we are saying that “big data might be wrong because people cannot be trusted”.

    Not that that’s necessarily a bad thing.

    But perhaps we should at least be honest about it?

    But if people cannot be trusted in the gathering of information, the same would naturally hold in the conveying of information.

    In the realm of politics we deal with untrustworthy people using systems of checks and balances, which do not work very well except that we say they work better than any available alternative. Education seems to be evolving similar processes.

    Anyways, I would not hesitate to call out MOOC flaws – an awareness of flaws is crucial to treating those flaws. But at the same time, I’d be careful to point out that in some cases it’s MOOCs which are raising awareness of flaws in traditional education.

    But that does not make this an easy subject to talk about, nor reason about.

    • 2. Mark Guzdial  |  April 14, 2014 at 11:30 am

      Having lots of data can often be useful — if it’s the data you want. I don’t see why it’s a matter of trust of people. Rather, I fear that it’s a natural reaction to the “bigness,” e.g., “We have so much data here! It must mean something!” One certainly can do big data analyses of educational data. The Pittsburgh Science of Learning Center does that. They are careful about the claims they make with regards to their sampling.

      • 3. Raul Miller  |  April 14, 2014 at 11:48 am

        So we are in agreement?

        • 4. Mark Guzdial  |  April 14, 2014 at 2:57 pm

          I’m sorry I was unclear. Why is it a matter of trust?

  • 5. shriramkrishnamurthi  |  April 14, 2014 at 10:24 am

    Out of curiosity, how many universities administer course evaluation surveys to students who drop a course?

  • 6. Fiona Harvey  |  April 27, 2014 at 1:44 am

    Reblogged this on a networked education developer and commented:
    This is very true – if MOOCs are to be truly successful for on-campus students, then any decisions we make based on data for educational enhancements need to be judged against wider criteria. The question is, how can we attract participants from a broader base? Maybe more collaborations with colleagues in other institutions, or even better community connections.

