Last month, I wrote this narrow defense of automated essay grading, hoping to clear the air on a new and controversial technology. In that post's prolific comments section, Laura Gibbs made a comment echoing what I’ve heard from every teacher I speak to.
I am waiting for someone to show me a real example of this “useful supplement” provided by the computer that is responding to natural human language use – I understand what you want it to be, but I would contend that natural human language use is so complex (complex for a computer to apprehend) that trying to give writing mechanics feedback on spontaneously generated student writing will lead only to confusion for the students.
When we talk about machine learning being used to automatically grade writing, most people don't know what that looks like. Because they don’t know the technology, they make it up. As far as I can tell, this is based on a combination of decades-old technology like Microsoft Word’s green grammar squiggles, clever new applications like Apple’s Siri personal assistant, and downright fiction, like Tony Stark’s snarky talking suits. What you get from this cross is a weird and incompetent artificial intelligence pointing out commas and giving students high grades for hiding the word “defenestration” in an essay.
My cofounder at LightSIDE Labs, David Adamson, taught in a high school for six years. If we were endeavoring to build something that was this unhelpful for teachers, he would have walked out a long time ago. In fact, though, David is a researcher in his own right. David’s Ph.D. research isn't as focused on machine learning and algorithms as my own; instead, his work brings him into Pittsburgh public schools, talking with students and teachers, and putting technology where it can make a difference. In this post, rather than focus on essay evaluation and helping students with writing - which will be the subject of future posts - I'm going to explore the things he's already doing in classrooms.
Building computers that talk to students
David builds conversational agents. These agents are computer programs that sit in chatrooms for small-group discussion in class projects, looking by all appearances like a moderator or TA logged in elsewhere. They’re not human, however – they’re totally automated. They have a small library of lines that they can inject into the discussion, which can be automatically modified slightly in context. They use language technology, including machine learning as well as simpler techniques, to process what students are saying as they work together. The agent has to decide what to say and when.
Those pre-scripted lines aren’t thrown in arbitrarily. In fact, they’re descended from decades of research into education and getting classroom discussion right. This line of research is called Accountable Talk, and in fact there’s an entire course coming up on Coursera about how to use this theory productively. The whole thing is built on fairly basic principles:
First, students should be accountable to each other in a conversation. If you’re only sharing your own ideas and not building off of the ideas of others, then it’s just a bunch of people thinking alone, who happen to be in a chatroom together. You don’t get anything out of the discussion. Next, your thought process should be built off of connecting the dots, making logical conclusions, and reasoning about the connections between facts. Finally, those facts that you’re basing your decision-making on should be explicit. They should come from explicit sources and you should be able to point to them in your argument for why your beliefs are correct.
David’s agents are framed around Accountable Talk, doing what teachers know leads to a good discussion. Instead of giving students instructions or trying to evaluate whether they were right or wrong, they merely ask good questions at the right times. Agents were trained to look for places where students made a productive, substantial claim – the type of jumping-off point that Accountable Talk encourages. He never tried to correct those claims, though; he didn’t even evaluate whether they were right or wrong. He was just looking for the chance to make a difference in the discussion.
He used those automated predictions as a springboard for collaborative discussion. Agents were programmed to try to match student statements to existing facts about a specific chemistry topic. “So, let me get this right. You’re saying...” More often than not, he also programmed the agents to lean on other students for help. “[Student 2], can you repeat what [Student 1] just said, in your own words? Do you agree or disagree? Why?” Automated prompts like this leave the deep thinking to students. Instead of following computer instructions by rote, the students were being pushed into deeper discussions. Agents give the authority to students, asking them to lead and not taking on the role of a teacher and looming over them.
Sometimes computers fail
In the real world, intervention to help students requires confidence that you’re giving good advice. If David’s agents always spout unhelpful nonsense, students will learn to ignore them. Perhaps worst of all, if the agent tries to reward students for information it thinks is correct, a wrong judgment means students get literally the opposite of helpful teaching. With all of this opportunity for downside, reliability seems like it would be the top priority. How can you build a system that’s useful for intervening in small groups if it makes big mistakes?
This is mostly accounted for by crafting the right feedback, designing agents that are tailored to the technology's strengths and avoiding weaknesses. In large part this comes down to avoiding advice that's so clear-cut that big mistakes are possible. Grammar checking and evaluations of accuracy within a sentence are doomed to fail almost from the start. If your goal with a machine learning system is to correct every mistake that every student makes, you’re going to need to be very confident, and because this is a statistics game we're playing, that kind of technology is going to disappoint. Moreover, even when you get it right, what has a student gained by being told to fix a run-on sentence? At best, an improvement at small-scale grammar understanding. This is not going to sweep anyone off their feet.
By basing his conversational agents on the tenets of a good discussion, David was able to gain a lot of ground with what is, frankly, pretty run-of-the-mill machine learning. Whiz-bang technology is secondary to technology that does something that helps. When the system works, it skips the grammar lessons. Instead, it jumps into the conversation at just the right time to encourage students to think for themselves.
Sometimes, though, the agent misfires. When using machine learning, this is something you just have to accept. What we care about is that this doesn’t hurt students or start teaching wrong ideas. So let’s think about the cases where an agent can make a wrong decision: first, where the agent could have given feedback but didn’t, and second, where the agent gives the wrong feedback at the wrong time.
First, the easy case. Sometimes a student will say something brilliant and the agent will fail to catch it. Here, the balance of authority between agent and student matters. If students get used to the idea that the agent is a teacher, they’ll be looking for it to tell them they got every answer right. This is a danger zone for us - with statistical machine learning, we don’t want to back ourselves into a corner where the system has to make judgments every single time. Sometimes, we'll be uncertain. If an agent misses its cue and the students expect a teacher's authoritative judgment of correctness, the group will flounder when the agent misses a beat, or take the discussion into their own hands and leave the agent in the dust.
Let’s see what that looks like in practice. Here’s a transcript from an earlier study, when conversational agents weren't as heavily rooted in the theory of Accountable Talk:
Tutor: You should now move on to discussing Condition C.
Tutor: [Student A], now would be a good time to ask [Student B] to build on what [Student C] is saying.
[Student B]: I’m so confused!
[Student A]: [Student B], would you like to build onto what [Student C] is saying? And me too!
Tutor: When you agree, write down your predictions on your worksheet.
This is dysfunctional; the tutor isn't asking discussion-generating questions, and it's pushing forward regardless of context. Focusing on the task single-mindedly doesn't give students room to think.
By taking on the role of facilitator, though, the agent’s comments aren’t expected every time. We can use the system to chime in with an Accountable Talk question when we’re highly confident that the machine learning system is making the right prediction, but if the system stays silent, the students won’t even notice it’s missing; the agent is purely there to help scaffold learning, rather than judge correctness. When you're augmenting teaching, rather than replacing humans entirely, you can afford to be cautious. Look at how the agent interacts when we start asking questions that are better suited to what we know we can do:
[Student D]: ok, boiling pt will go up and vdW will go up for all of them consecutively... right?
Tutor: Do you concur with [Student D]? Why, or why not?
[Student E]: hmm not necessarily
[Student F]: area goes up for each
[Student E]: would it?
[Student E]: im not sure
[Student D]: yea for sure area goes up
[Student F]: dipole increases first one
In this example excerpt the tutor didn’t give an instruction or evaluate anything that the first quoted student line. It simply asked a basic question in response to machine learning evaluating that spot as a good opening. The comments from these new agents use Accountable Talk principles, and get student groups discussing ideas.
Of course, these systems aren’t perfect. What we’re finding out, though, is that we can frame the discussion right for automated assessment by not trying to make our automated system the perfect arbiter of truth. What I’m describing isn’t a dire portrait of machines taking over the education system. It’s agents contributing meaningfully to learning by cautiously intervening when appropriate, using machine learning for educated guessing about when it's time to get students to think more deeply. These agents are tireless and can be placed into every discussion in every online small group at all times - something a single teacher in a large class will never be able to do.
The results with these agents were clear: students learned significantly more than students who didn’t get the support. Moreover, when students were singled out and targeted by agent questioning, they participated more and led a more engaged, more assertive conversation with the other students. The agent didn’t have to give students remedial grammar instructions to be valuable; the data showed that the students took their own initiative, with the agents merely pushing them in the right direction. Machine learning didn’t have to be perfect. Instead, machine learning figured out the right places to ask questions, and worked towards making students think for themselves. This is how machine learning can help students.
For helping students, automated feedback works.
We should be exercising caution with machine learning. Skeptics are right to second guess interventions from technologists who aren't working with students. The goal is often to replace teachers, not help them, especially with the promise of tantalizingly quick cost savings. Yes – if you want to make standardized testing cheaper, machine learning works. I don't to dismiss this entirely - we can, in fact, save schools and states a lot of money on existing standardized tests - but if that's as far as your imagination takes you, you're missing the point. What’s important isn’t that we can test students more, and more quickly, with less money. Focus on this: we can actually help students.
Not every student is going to get one-on-one time daily with a trained writing tutor. Many are never going to see a writing tutor individually in their entire education. For these students, machine learning is stepping in, with instant help. These systems aren’t going to make the right decision every time in every sentence. We need to know that, and we need to work with it. Rather than toss out technology promising the moon, look carefully at what it can do. Shift expectations as necessary. In David's case, the shift was about authority. He empowered students to take up their own education, and chimed in when it saw an opportunity; it positioned the automated system as guide rather than dictator.
This goes way beyond grading, and way beyond grammar checking. Machine learning helps students when teachers aren't there. Getting automated feedback right leads to students thinking, discussing ideas, and learning more - and that's what matters. In my next post, I'd like to launch off from here and talk about what these lessons mean not just for discussion, but for writing. Stay tuned.
A last note
The work I described from David is part of an extended series of more than 20 papers and journal articles from my advisor at Carnegie Mellon, Carolyn Rosé, and her students. While I won’t give a bibliography for a decade of research, some of the newest work is published as:
- “Intensification of Group Knowledge Exchange with Academically Productive Talk Agents,” in this year’s CSCL conference.
- “Enhancing Scientific Reasoning and Explanation Skills with Conversational Agents,” submitted to IEEE Transactions on Learning Technologies.
- “Towards Academically Productive Talk Supported by Conversational Agents,” in the 2012 conference on Intelligent Tutoring Systems.
I’ve asked David to watch this post’s comments section, and I’m sure he’ll be happy to directly answer any questions you have.