Why Big Data (Mostly) Can’t Help Improve Teaching

Here’s a nifty video summary of a doctoral dissertation by Derek Muller that a client pointed out to me:

[Embedded YouTube video: summary of Derek Muller’s dissertation]

The basic gist is that students have pre-conceived notions that are wrong, and it is very hard to dislodge those mistaken notions. If you show them a video with an accurate explanation, the students will say that the video was clear and helpful, but they will misremember it as confirming their (mistaken) preconceived notions. In short, they won’t learn. In contrast, if you show them a video that starts by directly stating and then refuting their misconception, they like the video less and say it is confusing, but they actually learn more. This is a really important pedagogical point to know whether you are giving traditional in-class lectures, writing curricular materials, or creating one of those oh-so-modern video lectures that all the cool kids are into these days.

It’s also a good example of the kind of insight that big data is completely blind to. And it gives us good reason to be skeptical of the idea that we can take large lecture courses online, turn them into REALLY large lecture courses (with nice videos), and expect new and more effective pedagogies to rise out of the data because, you know, science or something. That is more of a hope (or a fantasy) than a plan to improve education.

Let’s say you have one of those ultra-hip MOOC platforms with a bazillion courses running on it and a hadoop thingamabob back end that’s tied to a flux capacitor, an oscillating overthruster, and a machine that goes “ping!” You’ve got all the big data toys. And let’s say that, among the many thousands of lecture videos being used on your platform, a bunch of them are designed the way Muller’s work suggests is best practice. Some of these were done this way consciously with awareness of the research. Some were done this way on purpose but based on intuitions by classroom teachers. They don’t have a name for what they’re doing, and they don’t really think about it as a general pedagogical strategy, but they have learned from experience that there are certain spots in their courses where they have to confront some misconceptions head-on. And then some of the videos may be in the Muller format completely accidentally. For example, maybe there’s a video of students working through a problem together. The first idea they come up with is the misconception, but they talk it through together and come up with the right answer in the end. This wasn’t planned, and the teacher who posts the video may not even be aware of why this sequence of events makes the video effective. Maybe she believes in the value of watching students work through the problem together and posts lots of student conversation videos, some of which end up being in Muller’s format and some of which don’t. Let’s assume that many of these videos are effective at teaching the concepts they are trying to teach, and let’s also assume that they are effective for the reason that Muller hypothesizes.

The first question is whether our super-duper, trans-warp-capable, dilithium crystal-powered big data cluster would even identify these videos as noteworthy. The answer is maybe, but probably not reliably so. Muller set up a controlled experiment with one variable designed to test a well-formed hypothesis. He was measuring whether this style of video was more effective than the alternative of a more traditional lecture delivery. In science, this is called a “control of variables strategy.” In product development, it’s called “A/B testing” or “split testing.”
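
For readers who haven’t run a split test, here is a minimal sketch of what the analysis might look like. Everything in it is an assumption for illustration: the learning-gain numbers are invented (they are not Muller’s data), the variable names are hypothetical, and a permutation test is just one reasonable way to compare two groups, not necessarily the method Muller used.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a one-sided p-value for mean(group_b) > mean(group_a).

    Repeatedly shuffles the pooled scores and counts how often a random
    split produces a difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_permutations


# Hypothetical post-test minus pre-test gains (invented numbers).
exposition_gains = [0, 1, 0, 2, 1, 0, 1, 1, 0, 2]   # plain, accurate explanation
refutation_gains = [3, 2, 4, 1, 3, 5, 2, 4, 3, 2]   # states, then refutes, the misconception

observed_diff, p_value = permutation_test(exposition_gains, refutation_gains)
print(f"difference in mean gain: {observed_diff:.2f}, one-sided p ~ {p_value:.4f}")
```

The point of the sketch is the design, not the statistics: only one thing differs between the two groups, and the outcome is a measured learning gain rather than a student’s opinion of the video.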

Big data usually doesn’t work that way. Instead of creating a tightly controlled set of conditions, it usually looks at what’s available “in the wild” and relies on the massive numbers of examples it has plus the power of computers to do lots of comparisons really fast to come up with inferences. Let’s say, for example, that you’re a medical researcher trying to figure out the role of genetics in a particular type of cancer. There are many, many genes that could be involved, and it may be that a bunch of them are involved but interact in complex ways. And, of course, environmental factors such as diet or exposure to carcinogens, as well as a certain amount of chance, can all impact whether a particular individual gets cancer. The good news is that, while there are many variables, they are finite in number, mostly known and measurable, and mostly have a quantifiable and reasonably regular impact on the cancer outcome (if you understand all the interactions sufficiently well). If you have a large enough database of patients with enough genetic material and good details on the non-genetic factors that you think probably contribute to the likelihood that they will get cancer, then a big data approach will probably help. There are regular patterns in the data. The main challenge is sifting through the mountains of data to find the patterns that are already there. Big data is good for that kind of problem.

But education doesn’t work that way. The same video may impact different students very differently, due to variables that mostly aren’t in our computer systems. For one thing, classes can be taught in many, many different ways, some of which matter and some of which don’t. Again, if we were doing a split test in a MOOC context, we could control the variables and see what happens when you change just one video in a class that is otherwise the same for many students. That approach has significant research value, but it’s not big data magic. It’s educators who come up with hypotheses and test them using a large data set. Students are also very different, in important ways that often don’t show up in the data that we have in our online systems. Silicon Valley is not going to make us magically smarter about teaching.

Now, big data enthusiasts might argue that I’m not thinking big enough in terms of the data set, and that could make a difference. Knewton, for example, claims that their system can track students across courses and semesters and test hypotheses about them over time. For example, suppose a student is struggling with word problems in a math class. It’s possible that the student is having difficulty translating English into math variables, or trouble identifying the important variables in the first place. Those are both math-related issues. But it’s also possible that the student just has poor English decoding skills in general. Knewton claims that their system can hold all of these hypotheses about the student and then test them (presumably using some sort of Bayesian analysis) across all the courses. If there is evidence in the English class that the student is struggling with basic reading, then that hypothesis gets elevated. And maybe that student gets extra reading lessons slipped in between math lessons. It sounds really cool. I haven’t seen evidence that it actually works yet, and to the degree that it does, it raises other questions about whether you need all student educational interactions to be on the platform in order to get the value, who owns the data, and so on. Put this one in the “maybe someday” category for now.
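
To make the “hold several hypotheses and elevate one” idea concrete, here is a toy sketch of a single Bayesian update. It is not Knewton’s method, which I can only guess at. The hypothesis names, the prior probabilities, and the likelihoods are all invented for illustration.

```python
def bayes_update(prior, likelihood):
    """Return posterior probabilities given priors and P(evidence | hypothesis)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical starting beliefs about why the student struggles with word problems.
prior = {
    "translating_english_to_math": 0.4,
    "identifying_variables": 0.4,
    "general_reading_difficulty": 0.2,
}

# Hypothetical evidence: a low score on a reading quiz in the English class.
# Each value is an assumed probability of seeing that score under the hypothesis.
likelihood_of_low_reading_score = {
    "translating_english_to_math": 0.2,
    "identifying_variables": 0.2,
    "general_reading_difficulty": 0.8,
}

posterior = bayes_update(prior, likelihood_of_low_reading_score)
for hypothesis, probability in posterior.items():
    print(f"{hypothesis}: {probability:.2f}")
```

With these made-up numbers, the reading hypothesis goes from the least likely explanation to the most likely one after a single piece of evidence from another course. That is the appeal, and it is also why the approach depends on the evidence from that other course being in the system at all.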

But even granting that you can get sufficiently rich information about the students, there’s another hard problem. Let’s say that, thanks to the upgrade in your big data infinite improbability drive made possible by your new Spacely’s space sprocket, your system is able to flag at least a critical mass of videos taught in the Muller method as having a bigger educational impact on the students than the average educational video by some measure you have identified. Would the machine be able to infer that these videos belong in a common category in terms of the reason for their effectiveness? Would it be able to figure out what Muller did? There are lots of reasons why a video might be more effective than average. And many of those reasons are internal to the narrative structure of the video. The machine only knows things like the format of the video, the length, what kind of class it’s in, who the creator is, when it was made, and so on. Other than the external characteristics of the video file, it mostly knows what we tell it about the contents. It has no way to inspect the video and deduce that a particular presentation strategy is being used. We are nowhere close to having a machine that is smart enough to do what Muller did and identify a pattern in the narrative of the speaker. Now, if an educational researcher were to read Muller’s research, tag a critical mass of the relevant videos in the system as being in this style, and ask the machine to find other videos that might be similar, it’s possible that big data could help. It might come back with something like, “Here are some videos that seem to have roughly the same kind and size of effect on test scores as the ones with the Muller tag.” Maybe. Even then, you’d have to have human researchers go through the videos the computer flagged (and there might be a lot of them) to see which ones really use the same strategy and which ones don’t. That would be better than nothing, but it’s far from magic.

By the way, the low-tech method commonly used now is even worse. Not only is it useless, it’s actually harmful. A/B tests are rarely done on curricular materials, but surveys and focus groups where students self-report the effectiveness of the materials are common, particularly among textbook publishers. And in that situation, the videos that the students report to be harder and more confusing would actually be the more effective ones. But, lacking any measure of the videos’ real effect on learning other than the survey, the publishers (or teachers) generally would toss out the more effective videos in favor of the less effective ones.

Whether we’re talking about machine learning or human learning about how to improve education, the real problem is that we don’t have a vocabulary to talk about these teaching strategies, so we can’t formulate, test, and independently verify our hypotheses. In the machine learning example, we could create an arbitrary “Muller” tag in the system, but we don’t have a common language among teachers where we say “Oh, yeah, he’s using the confront-the-misconceptions (CTM) lecture strategy for that one. I prefer doing a predict-observe-explain (POE) experiment to accomplish the same thing.” If we had a widely adopted language that describes the details of why instructors think a particular aspect of their lecture or their discussion prompt or their experiment assignment is effective at teaching, then big data could be helpful because we could tag all our videos with pedagogical descriptions. We could make our theories about teaching and learning visible to the system in a way that it would be more able to test. And, perhaps even more importantly, human researchers could be more effective at collaborating with each other on testing theories of teaching and learning. Right now, what we’re trying to do is a little like trying to conduct physics research before somebody has invented calculus. You can do some things around the edges, but you can’t describe the really important hypotheses about causes and effects in learning situations with any precision. And if you can’t describe them with precision, then you can’t test them, and you certainly can’t get a machine to understand them.
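
Here is a small sketch of what “making our theories visible to the system” could look like once content carries pedagogical tags. The CTM and POE labels come from the example above. The catalog structure, the field names, and the learning-gain numbers are hypothetical and invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical catalog: ids, tags, and measured learning gains are made up.
videos = [
    {"id": "v1", "tags": ["CTM"], "gain": 0.30},
    {"id": "v2", "tags": ["CTM"], "gain": 0.40},
    {"id": "v3", "tags": ["POE"], "gain": 0.35},
    {"id": "v4", "tags": [], "gain": 0.10},
    {"id": "v5", "tags": [], "gain": 0.20},
]


def mean_gain_by_tag(catalog):
    """Group videos by pedagogical tag and average their learning gains."""
    gains = defaultdict(list)
    for video in catalog:
        # Videos with no tags are collected under a placeholder label.
        for tag in video["tags"] or ["(untagged)"]:
            gains[tag].append(video["gain"])
    return {tag: mean(values) for tag, values in gains.items()}


summary = mean_gain_by_tag(videos)
for tag, average in sorted(summary.items()):
    print(f"{tag}: mean gain {average:.2f}")
```

A comparison like this is only correlational, so it generates hypotheses rather than confirming them. But without the tags, the system has nothing to group by in the first place.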

More on this in a future post.

About Michael Feldstein

Michael Feldstein is co-Publisher of e-Literate, co-Producer of e-Literate TV, and Partner in MindWires Consulting. For more information, see his profile page.
This entry was posted in Educational Pattern Languages, Tools, Toys, and Technology (Oh my!).

17 Responses to Why Big Data (Mostly) Can’t Help Improve Teaching

  1. I’m still surprised to read negative things about MOOCs. I don’t see a business case for them but while they are available I think anyone that can, should take advantage of them. As for big data, I agree but do we really use the argument that students learn in different ways in shaping the current curriculum? MOOCs are following the same old model that is still being played out in most classrooms today. I think the change should start in the real world before we start judging the MOOCs.

  2. I’m having a little trouble connecting your criticism to my post, Chris. I really have nothing to say here about MOOCs in general, other than a really peripheral side-snark about how videos are the new hotness. MOOCs and big data are not the same thing. My point is that there’s a lot of hype about how putting these courses into a platform where we can gather a lot of data will magically yield new insights, and that we have reason to be skeptical of that hype. This is a criticism that neither invalidates all the various benefits that MOOCs purport to offer nor is it limited only to MOOCs. The same criticism could be applied to claims by textbook publishers about how they’re going to use data to revolutionize education through adaptive learning, for example.

  3. Michael, when you say…

    “Let’s say you have one of those ultra-hip MOOC platforms with a bazillion courses running on it and a hadoop thingamabob back end that’s tied to a flux capacitor, an oscillating overthruster, and a machine that goes “ping!” You’ve got all the big data toys.”

    …the structure of those two sentences serves to link MOOCs and big data in the reader’s mind, as it also does when you discuss split tests “in a MOOC context.” If you want readers to assume you *don’t* have MOOCs as a primary target, obsessing with video delivery of content and piling on the snark are bad ways to signal that intent.

    Likewise, your response to Chris, “My point is that there’s a lot of hype about how putting these courses into a platform where we can gather a lot of data will magically yield new insights” comes without links. Where is this hype? I’m sure you have something in particular in mind, but posts that are long on denouncing nameless idiots and short on URLs are a little harder to comment on, because we can’t see how seriously we should take the material you are reacting to.

  4. First of all, Clay, welcome. I’m a fan of your work.

    That said, you are somewhat late to the party. While I do try to make my blog posts as accessible to newcomers as possible, when I am trying to make a long and complex point I sometimes assume that (1) my readers have been following along with the blog and have some of the same context that I do, and (2) they are also familiar with my sometimes-flip tone and don’t take it too seriously. I don’t think that people who believe in big data in education are idiots. But my regular readers are well aware of the fact that both MOOCs and big data in education are incessantly hyped, so incessantly that it didn’t occur to me that I would need to back up the references with…uh…references. For the latest examples, see, respectively, last Friday’s piece by Steve Kolowich (whose writing I generally like a lot) and yesterday’s piece by Thomas Friedman (whose writing I also generally like). You can’t listen to a speech of more than 10 minutes by Coursera’s Daphne Koller, edX’s Anant Agarwal, or Udacity’s Sebastian Thrun without hearing them talk about “new pedagogies” and connecting the coming revolution to big data in some way or other (Daphne in particular; see, for example, her TED talk). I certainly don’t think any of them are idiots by any stretch of the imagination. Nor are the MOOC providers the only culprits. The Kolowich piece focuses a lot on Knewton, which is a start-up that mostly works with textbook companies. But anyone who has been following the MOOC developments reasonably closely will be familiar with the relatively pervasive assumption that all that data in these platforms will yield big new learning insights. The joke isn’t that there are a few idiots hyping MOOCs and big data; the joke is that you can’t go ten minutes without hearing normally bright and respectable people wax poetic about MOOCs, big data or, often, both. It’s sort of an assumption that’s woven into the fabric of the narrative about education and technology. It’s the ed tech version of “…a noun, a verb, and 9/11.” Perhaps I need to adjust my style to the reality that e-Literate is attracting a wider audience now.

    And one reason that I’m “obsessing with the video delivery of content” is because it’s integral to the argument. At this point, most xMOOCs are mainly videos and quizzes, with some open discussion that nobody quite knows how to manage at the scale that you get in those classes (and which was really incidental to early xMOOC designs). Sal Khan has been the model. Rather than citing sources for that one, the best evidence I can suggest is that you sample a few MOOCs. By and large, there isn’t a lot of variation in the design patterns yet. The snark, of course, is aimed at the fact that there’s nothing particularly innovative about video delivery of lectures. There should be, but largely isn’t yet, some serious examination of what it says about the state of our F2F pedagogy if teachers can be easily replaced by videos. But that’s an en passant reference to an argument that my regular readers have heard from me before in more detail. The real point is that, if video is so central to your pedagogy, then you’d better be able to extract the relevant semantic features that make it effective if you’re going to apply big data methods and expect to get anything of value. Garbage in, garbage out. And in order to tag videos with the relevant pedagogical features so that it is accessible to the machines, we need to have a vocabulary that we currently do not have.

  5. Also, it should be said that I have a difficult time resisting an opportunity to make a flux capacitor joke.

  6. ThoughtKast says:

    Interesting post & comments. Fascinating subject, too. In my view big data is a tool for teaching, not for learning. Teaching improvement requires accumulation of experience and adaptation to a pedagogical style to maximise learning. It is a long-term process. Learning, on the other hand, requires immediate feedback; it is a short-term process limited by its underlying neurological structures.

    MOOCS and big data are good friends because they use the observation of patterns that most of us miss to improve the process. This works very well and I think they will accelerate their improvement as more students join online. I am sure the second generation of courses on Coursera will improve notably based on lessons learned from the study of big data.

    Are MOOCS going to replace F2F teaching? No. But they will complement it very well.

    Anyone who says that MOOCS will replace teaching will have a hard time convincing people, because the evidence doesn’t support that assumption at all. However, MOOCS work very well for students who know how to use new learning material in an independent way. This is an essential skill that F2F should focus on. F2F is about providing students with the right scaffolding – this requires short-term personal interactions for which big data is not really suitable. Those who have the right scaffolding can take advantage of MOOCS.

    F2F can benefit from big data if teachers use a commonly accepted pedagogical notation system. This allows large scale sharing of pedagogical patterns that allows the selection of the best models, with direct application in the classroom (for better scaffolding). Learning Design is attempting to formalise this approach (http://larnacadeclaration.org/)

  7. This post is actually a setup for one I plan to do about Learning Design and the Larnaca Declaration, so I agree with you wholeheartedly there. My point is that big data is of highly limited usefulness without the tagging that comes with something like LD. I don’t think big data will yield much useful information from MOOCs. I think learning analytics can, but that’s not the same thing.

  8. John Whitmer says:

    Hi Michael,

    Thanks for the interesting post and being willing to challenge some of the precepts of the hype cycle we’re living in. I’d suggest that, as with your final recommendation for “tagging” with LD or another pedagogy, your post conflates “big data” with “recommendation engines or adaptive learning environments” – which are one area that big data is being used for, but only one deployment.

    To me, it seems a bit dangerous to suggest they’re the same – we are so early in our understanding of learning analytics that, to me, it seems important to specify the area that’s being hyped – and what isn’t being hyped or recognized. I’ve asked MOOC providers (well, OK, just one) about what they’re doing with big data – and they readily admit that they’re at a very early stage. I think pieces like this are something they’d be interested in – and something we can design algorithms to take into consideration.

    But then again, that wouldn’t make for such an exciting headline …

  9. It is absolutely critical to distinguish between adaptive learning environments, other kinds of learning analytics, and big data approaches. I have a lot more optimism about the former than the latter at the moment. But you absolutely do need those tags for big data as well. I’ll get into this a little more in my next post on the topic, but the basic point is that the context that you can infer from the typical generic learning environment with untagged content will not let you mine for the important attributes of the pedagogical design that may actually impact learning.

  10. John Whitmer says:

    Hi Michael – not to get too wonky on you (and feel free to not accept this post), but in my lit review and research study I found that we can infer about 25% of the variation in final student grade in both fully online and hybrid courses. There are four studies that have come to the same conclusion by using “clickometry”.

    This isn’t as accurate as some of the rhetoric might lead us to believe – but 25% of the variation for an individual course is actually VERY good compared to the standard demographics and other measures that we use.

    And at the same time, I’m on board with you that these findings point to the need for more sophisticated analytics and methods – like tagged content. Of course, that approach requires that faculty implement that standard – which is harder to execute than just externally analyzing lots of data.

    Have you seen the Tin Can API? Interesting emerging standard in that direction.

  11. John Whitmer says:

    Here’s a link to my dissertation study referenced above: http://johnwhitmer.net/dissertation-study/

    And the Tin Can API: http://tincanapi.com/

  12. Zack Dergen says:

    The problem with those assumptions, Michael, is that this post is being shared far and wide (as you might know if you get link notifications, though perhaps all the Facebook shares are hidden from you). Many of the people sharing it are anti-MOOC advocates (e.g., Siva Vaidhyanathan) promoting your post as anti-MOOC. This is not itself your fault, of course, but we surely bear some responsibility for contextualizing our work when it is presented in a medium (blogging) so commonly understood as consisting of a series of more or less standalone elements.

  13. John, that’s an interesting result, but what it tells us is mainly that there is a correlation between students who tend to be active in class and students that tend to pass the class. This is valuable information if you’re trying to build an early warning system but not much use if you’re trying to develop a “new pedagogy,” where you find teaching methods that lead to higher comprehension, retention (as in memory, not as in staying in school), near and far transfer, etc.

    Zack, I’m sorry, but I don’t feel terribly responsible for the fact that somebody on the internet may be misinterpreting something that I wrote. Having reread my post several times now, I do not think that I was careless in my use of language or humor. I am skeptical of the usefulness of big data in MOOCs as they are currently designed. If somebody wants to use my position as an argument against all things MOOCish, there’s nothing I can do about that. I’m trying to move a conversation forward. I’m not going to give up on long discussions that engage people who have already thought about this stuff just because somebody out there isn’t going to bother looking at the larger context.

  14. Before I saw all these posts, I personally cracked up at “hadoop thingamabob back end that’s tied to a flux capacitor, an oscillating overthruster, and a machine that goes ‘ping!’” I got the humor as well as the context of how videos are the new hotness! Great article!

  15. Pingback: Apollo Group's Technology Investments |e-Literate

  16. Pingback: A Taxonomy of Adaptive Analytics Strategies |e-Literate

  17. Pingback: Why Big Data (Mostly) Can’t Help Improve Teaching – | Flexibility Enables Learning