Getting students useful feedback from machine learning

Last month, I wrote this narrow defense of automated essay grading, hoping to clear the air on a new and controversial technology. In that post’s prolific comments section, Laura Gibbs made a comment echoing what I’ve heard from every teacher I speak to.

I am waiting for someone to show me a real example of this “useful supplement” provided by the computer that is responding to natural human language use – I understand what you want it to be, but I would contend that natural human language use is so complex (complex for a computer to apprehend) that trying to give writing mechanics feedback on spontaneously generated student writing will lead only to confusion for the students.

When we talk about machine learning being used to automatically grade writing, most people don’t know what that looks like. Because they don’t know the technology, they make it up. As far as I can tell, this is based on a combination of decades-old technology like Microsoft Word’s green grammar squiggles, clever new applications like Apple’s Siri personal assistant, and downright fiction, like Tony Stark’s snarky talking suits. What you get from this cross is a weird and incompetent artificial intelligence pointing out commas and giving students high grades for hiding the word “defenestration” in an essay.

My cofounder at LightSIDE Labs, David Adamson, taught in a high school for six years. If we were endeavoring to build something that was this unhelpful for teachers, he would have walked out a long time ago. In fact, though, David is a researcher in his own right. David’s Ph.D. research isn’t as focused on machine learning and algorithms as my own; instead, his work brings him into Pittsburgh public schools, talking with students and teachers, and putting technology where it can make a difference. In this post, rather than focus on essay evaluation and helping students with writing – which will be the subject of future posts – I’m going to explore the things he’s already doing in classrooms.

Building computers that talk to students

David builds conversational agents. These agents are computer programs that sit in chatrooms for small-group discussion in class projects, looking by all appearances like a moderator or TA logged in elsewhere. They’re not human, however – they’re totally automated. They have a small library of lines that they can inject into the discussion, which can be automatically modified slightly in context. They use language technology, including machine learning as well as simpler techniques, to process what students are saying as they work together. The agent has to decide what to say and when.

Those pre-scripted lines aren’t thrown in arbitrarily. In fact, they’re descended from decades of research into education and getting classroom discussion right. This line of research is called Accountable Talk, and in fact there’s an entire course coming up on Coursera about how to use this theory productively. The whole thing is built on fairly basic principles:

First, students should be accountable to each other in a conversation. If you’re only sharing your own ideas and not building off of the ideas of others, then it’s just a bunch of people thinking alone, who happen to be in a chatroom together. You don’t get anything out of the discussion. Next, your thought process should be built off of connecting the dots, making logical conclusions, and reasoning about the connections between facts. Finally, those facts that you’re basing your decision-making on should be explicit. They should come from explicit sources and you should be able to point to them in your argument for why your beliefs are correct.

David’s agents are framed around Accountable Talk, doing what teachers know leads to a good discussion. Instead of giving students instructions or trying to evaluate whether they are right or wrong, the agents merely ask good questions at the right times. They were trained to look for places where students made a productive, substantial claim – the type of jumping-off point that Accountable Talk encourages. David never had the agents correct those claims, though; they didn’t even evaluate whether the claims were right or wrong. They were just looking for the chance to make a difference in the discussion.

He used those automated predictions as a springboard for collaborative discussion. Agents were programmed to try to match student statements to existing facts about a specific chemistry topic. “So, let me get this right. You’re saying…” More often than not, he also programmed the agents to lean on other students for help. “[Student 2], can you repeat what [Student 1] just said, in your own words? Do you agree or disagree? Why?” Automated prompts like this leave the deep thinking to students. Instead of following computer instructions by rote, the students were being pushed into deeper discussions. Agents give the authority to students, asking them to lead and not taking on the role of a teacher and looming over them.
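As a rough illustration of the moves described above, here is a minimal Python sketch. It is not David’s implementation: the fact list, the word-overlap matching, the `min_overlap` cutoff, and every function name are assumptions invented for this example, and the real agents rely on machine learning rather than simple word matching.

```python
import re

# Hypothetical reference facts for a chemistry unit (invented for illustration).
FACTS = [
    "boiling point rises as van der Waals forces get stronger",
    "larger surface area means stronger van der Waals forces",
    "a bigger dipole moment means stronger attraction between molecules",
]

# Pre-scripted lines with slots that get filled in from context.
REVOICE = "So, let me get this right. You're saying that {fact}?"
ASK_PEER = ("{peer}, can you repeat what {speaker} just said, "
            "in your own words? Do you agree or disagree? Why?")


def words(text):
    """Lowercase a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_fact(statement, facts=FACTS, min_overlap=2):
    """Return the fact sharing the most words with the statement, or None.

    Word overlap is a crude stand-in for a trained model.
    """
    scored = [(len(words(statement) & words(fact)), fact) for fact in facts]
    score, fact = max(scored)
    return fact if score >= min_overlap else None


def respond(speaker, statement, peers):
    """Pick a scripted line for this turn, or None to stay silent."""
    fact = best_fact(statement)
    if fact is None:
        return None
    others = [p for p in peers if p != speaker]
    if others:
        # Lean on another student rather than judging the claim.
        return ASK_PEER.format(peer=others[0], speaker=speaker)
    return REVOICE.format(fact=fact)
```

Note that nothing in the sketch decides whether the student is right. The match only tells the agent that a substantive claim is on the table, and the scripted line hands the thinking back to the group.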

Sometimes computers fail

In the real world, intervention to help students requires confidence that you’re giving good advice. If David’s agents always spout unhelpful nonsense, students will learn to ignore them. Perhaps worst of all, if the agent tries to reward students for information it thinks is correct, a wrong judgment means students get literally the opposite of helpful teaching. With all of this opportunity for downside, reliability seems like it would be the top priority. How can you build a system that’s useful for intervening in small groups if it makes big mistakes?

This is mostly accounted for by crafting the right feedback, designing agents that are tailored to the technology’s strengths and avoiding weaknesses. In large part this comes down to avoiding advice that’s so clear-cut that big mistakes are possible. Grammar checking and evaluations of accuracy within a sentence are doomed to fail almost from the start. If your goal with a machine learning system is to correct every mistake that every student makes, you’re going to need to be very confident, and because this is a statistics game we’re playing, that kind of technology is going to disappoint. Moreover, even when you get it right, what has a student gained by being told to fix a run-on sentence? At best, an improvement at small-scale grammar understanding. This is not going to sweep anyone off their feet.

By basing his conversational agents on the tenets of a good discussion, David was able to gain a lot of ground with what is, frankly, pretty run-of-the-mill machine learning. Whiz-bang technology is secondary to technology that does something that helps. When the system works, it skips the grammar lessons. Instead, it jumps into the conversation at just the right time to encourage students to think for themselves.

Sometimes, though, the agent misfires. When using machine learning, this is something you just have to accept. What we care about is that this doesn’t hurt students or start teaching wrong ideas. So let’s think about the cases where an agent can make a wrong decision: first, where the agent could have given feedback but didn’t, and second, where the agent gives the wrong feedback at the wrong time.

First, the easy case. Sometimes a student will say something brilliant and the agent will fail to catch it. Here, the balance of authority between agent and student matters. If students get used to the idea that the agent is a teacher, they’ll be looking for it to tell them they got every answer right. This is a danger zone for us – with statistical machine learning, we don’t want to back ourselves into a corner where the system has to make judgments every single time. Sometimes, we’ll be uncertain. If an agent misses its cue and the students expect a teacher’s authoritative judgment of correctness, the group will flounder when the agent misses a beat, or take the discussion into their own hands and leave the agent in the dust.

Let’s see what that looks like in practice. Here’s a transcript from an earlier study, when conversational agents weren’t as heavily rooted in the theory of Accountable Talk:

Tutor: You should now move on to discussing Condition C.

Tutor: [Student A], now would be a good time to ask [Student B] to build on what [Student C] is saying.

[Student B]: I’m so confused!

[Student A]: [Student B], would you like to build onto what [Student C] is saying? And me too!

Tutor: When you agree, write down your predictions on your worksheet.

This is dysfunctional; the tutor isn’t asking discussion-generating questions, and it’s pushing forward regardless of context. Focusing on the task single-mindedly doesn’t give students room to think.


By taking on the role of facilitator, though, the agent’s comments aren’t expected every time. We can use the system to chime in with an Accountable Talk question when we’re highly confident that the machine learning system is making the right prediction, but if the system stays silent, the students won’t even notice it’s missing; the agent is purely there to help scaffold learning, rather than judge correctness. When you’re augmenting teaching, rather than replacing humans entirely, you can afford to be cautious. Look at how the agent interacts when we start asking questions that are better suited to what we know we can do:

[Student D]: ok, boiling pt will go up and vdW will go up for all of them consecutively… right?

Tutor: Do you concur with [Student D]? Why, or why not?

[Student E]: hmm not necessarily

[Student F]: area goes up for each

[Student E]: would it?

[Student E]: im not sure

[Student D]: yea for sure area goes up

[Student F]: dipole increases first one

In this example excerpt, the tutor didn’t give an instruction or evaluate anything in the first quoted student line. It simply asked a basic question, because the machine learning model had flagged that spot as a good opening. The comments from these new agents use Accountable Talk principles and get student groups discussing ideas.
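The cautious policy described above, where the agent chimes in only when the model is highly confident and otherwise stays silent, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual system: `opportunity_score` stands in for whatever probability a trained classifier would produce, the 0.8 threshold is an arbitrary number chosen for the example, and `toy_score` is a made-up keyword scorer that exists only so the sketch runs.

```python
from typing import Callable, Optional

PROMPT = "Do you concur with {student}? Why, or why not?"


def maybe_prompt(student: str,
                 message: str,
                 opportunity_score: Callable[[str], float],
                 threshold: float = 0.8) -> Optional[str]:
    """Return a facilitation question only when the model is confident.

    opportunity_score is a hypothetical stand-in for a trained classifier
    that estimates the probability (0 to 1) that this message is a good
    opening for discussion. Below the threshold the agent says nothing.
    """
    if opportunity_score(message) >= threshold:
        return PROMPT.format(student=student)
    return None  # silence is an acceptable outcome for a facilitator


def toy_score(message: str) -> float:
    """Toy keyword scorer for demonstration only; not a real model."""
    cues = ("will go up", "because", "right?")
    hits = sum(cue in message.lower() for cue in cues)
    return min(1.0, 0.2 + 0.35 * hits)


if __name__ == "__main__":
    claim = "ok, boiling pt will go up and vdW will go up for all of them... right?"
    print(maybe_prompt("[Student D]", claim, toy_score))
    print(maybe_prompt("[Student E]", "im not sure", toy_score))
```

Raising the threshold makes the agent quieter and less likely to misfire; lowering it makes the agent chattier and more error-prone. Because the agent is a facilitator rather than a judge, erring toward silence costs little.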

Of course, these systems aren’t perfect. What we’re finding out, though, is that we can frame the discussion right for automated assessment by not trying to make our automated system the perfect arbiter of truth. What I’m describing isn’t a dire portrait of machines taking over the education system. It’s agents contributing meaningfully to learning by cautiously intervening when appropriate, using machine learning for educated guessing about when it’s time to get students to think more deeply. These agents are tireless and can be placed into every discussion in every online small group at all times – something a single teacher in a large class will never be able to do.

The results with these agents were clear: students learned significantly more than students who didn’t get the support. Moreover, when students were singled out and targeted by agent questioning, they participated more and led a more engaged, more assertive conversation with the other students. The agent didn’t have to give students remedial grammar instructions to be valuable; the data showed that the students took their own initiative, with the agents merely pushing them in the right direction. Machine learning didn’t have to be perfect. Instead, machine learning figured out the right places to ask questions, and worked towards making students think for themselves. This is how machine learning can help students.

For helping students, automated feedback works.

We should be exercising caution with machine learning. Skeptics are right to second-guess interventions from technologists who aren’t working with students. The goal is often to replace teachers, not help them, especially with the promise of tantalizingly quick cost savings. Yes – if you want to make standardized testing cheaper, machine learning works. I don’t want to dismiss this entirely – we can, in fact, save schools and states a lot of money on existing standardized tests – but if that’s as far as your imagination takes you, you’re missing the point. What’s important isn’t that we can test students more, and more quickly, with less money. Focus on this: we can actually help students.

Not every student is going to get one-on-one time daily with a trained writing tutor. Many are never going to see a writing tutor individually in their entire education. For these students, machine learning is stepping in, with instant help. These systems aren’t going to make the right decision every time in every sentence. We need to know that, and we need to work with it. Rather than toss out technology promising the moon, look carefully at what it can do. Shift expectations as necessary. In David’s case, the shift was about authority. He empowered students to take up their own education, with the agent chiming in only when it saw an opportunity; that positioned the automated system as guide rather than dictator.

This goes way beyond grading, and way beyond grammar checking. Machine learning helps students when teachers aren’t there. Getting automated feedback right leads to students thinking, discussing ideas, and learning more – and that’s what matters. In my next post, I’d like to launch off from here and talk about what these lessons mean not just for discussion, but for writing. Stay tuned.

A last note

The work I described from David is part of an extended series of more than 20 papers and journal articles from my advisor at Carnegie Mellon, Carolyn Rosé, and her students. While I won’t give a bibliography for a decade of research, some of the newest work is published as:

  • “Intensification of Group Knowledge Exchange with Academically Productive Talk Agents,” in this year’s CSCL conference.
  • “Enhancing Scientific Reasoning and Explanation Skills with Conversational Agents,” submitted to IEEE Transactions on Learning Technologies.
  • “Towards Academically Productive Talk Supported by Conversational Agents,” in the 2012 conference on Intelligent Tutoring Systems.

I’ve asked David to watch this post’s comments section, and I’m sure he’ll be happy to directly answer any questions you have.

About Elijah Mayfield

Founder of LightSIDE Labs, a small company in Pittsburgh focusing on machine learning for automated writing assessment in education. Also a Ph.D. student at Carnegie Mellon's Language Technologies Institute, though that role is, by necessity, on hold for a while.

27 Responses to Getting students useful feedback from machine learning

  1. tom abeles says:

    What would be interesting is to know how far this differs from the v. early program, Eliza, since the sense of the interactions seems much like “her” and the reactions similar.

  2. Joe McCarthy says:

    Interesting, on several dimensions.

    I like the idea of Accountable Talk, and the emphasis on facilitative vs. authoritative engagement with students. I suspect some human educators could benefit from approaching interactions with students with a similar orientation.

    I also like the modest / cautious approach described here. A little AI goes a long way, and a lot of AI typically fails (CYC comes to mind). In many AI problems, the key is figuring out where to insert the human in the loop, but in this domain, it appears the key is to figure out where to insert the AI in the loop.

    I’m reminded of Joseph Weizenbaum’s ELIZA program, an early AI program from the 1960s that simulated a Rogerian psychotherapist, using fairly primitive patterns of human language to generate approximately appropriate output, defaulting to “tell me more” when no pattern was matched. Weizenbaum was surprised and somewhat disturbed by how readily many users of his program revealed so much about themselves, even though they knew they were interacting with a computer program.

    I’m also reminded of Sherry Turkle’s objections to the use of Paro, the baby harp seal robot, in nursing homes, arguing that elderly users are telling their problems and life stories to a robot who cannot understand them. My own judgment is that the simple act of articulating problems and life stories – to any recipient (sentient or not) – can have a valuable therapeutic impact.

    I think this is a similar situation, wherein the goal is simply to encourage students to tell their stories – or ask their questions – without judgment or deep[er] response. I often find that my students benefit from simply articulating their questions (in class or in a human-moderated discussion forum): by making their questions, assumptions and prior steps explicit, they often arrive at the answers themselves. If a computer program can provide some simple facilitation in this process, it sounds like a net gain to me.

  3. Elijah,

    Another discussion-provoking and informative post. Thank you again for sharing.

    Though honestly I find this automated feedback even more disturbing than the machine grading that you wrote about in your last post. You say the goal of this tool is to help students – yet are we now resorting to computers to help students learn in discussion settings, which is one key area where critical thinking skills can be developed? This is most frightening.

    You suggest in your post the context for using automated feedback in this instance, “First, students should be accountable to each other in a conversation. If you’re only sharing your own ideas and not building off of the ideas of others, then it’s just a bunch of people thinking alone, who happen to be in a chatroom together.”

    Two comments on this:

    1) a teacher can teach the skill of asking questions and developing a conversation to students by modeling for students: how to moderate a discussion, how to get peers to build on each others questions and points, how to ask questions, how to get a conversation back on track.

    2) I have been in an online course where students were assigned the role of moderating a discussion board in the kind of scenario you speak of. The moderating team (usually two or three students) would be assigned one week as moderators. The teams were to ask questions, promote discussion etc. [similar to what the automated feedback program does but with more depth]. The team summarized the week’s discussion, highlighted the main themes and key points for the class and posted it in the forum two or three days after the discussion week ended. Teams all received instruction and support from the prof.

    When I state my opinions here on disagreeing with your points, I am not targeting your products specifically, but speak to the entire concept of machine grading and automated feedback. We are moving in a dangerous direction by de-humanizing human interaction – education should be the last place we automate.

  4. Debbie: I think a big consideration – one that needs to be looked at very carefully for this type of technology – is where you’re going to put this technology. With conversational agents even more than with essay grading, though, this is technology to add help for students where they’re not getting any right now, and to change the types of learning that are feasible in large groups.

    When David goes into a classroom he usually works with 30-40 students at a time in a single classroom; he immediately breaks them into a dozen or so small groups, working in parallel, each of which has a conversational agent in their chatroom. If you want to support small group discussion, there’s simply no comparable intervention that you can get from a single teacher in crowded classrooms.

    The agents that David builds ask less complex questions than a teacher might if they were working with students one on one, but that’s not the model we’re comparing against. They also don’t attempt to replace the role of the teacher wholesale – it’s almost purely additive, as a way to support small group learning as a supplement to the traditional lecture format without completely overwhelming a teacher. That, to me, is encouraging human interaction, rather than hurting it.

  5. The point about modeling is an interesting one. There’s an opportunity, in the context that Elijah describes in his comment, for the teacher to lead a meta-conversation with the students about the role of the software. What kind of behavior is the agent modeling? Can you deduce the patterns from its behavior? When and how is that behavior helpful? And do you think that you could do what the computer is doing, only better?

    The same technology that can produce these prompts can detect the language being modeled when used by students. So another possibility—either alternative or complementary—is to give the teacher and students visibility into when students in group work appear to be using facilitating language.

  6. It should be noted that there are people at Pitt’s Learning Research and Development Center that are explicitly working on machine learning for recognizing the types of Accountable Talk moves that students are using in a chatroom context – not classifying them on a “right”/“wrong” spectrum but instead being descriptive and automatically detecting the strategies they’re using, if any. I’ve only had brief conversations with them on this topic and I’m not sure how successful they’ve been.

    More speculatively, I’ve heard discussion – though I don’t know how far it’s gotten – about predicting not just when those students did use accountable talk, but when they *might have* used it if they had been trained to do so. I don’t think any formal quantitative attempts have been made there, mostly because it’s hard to get training data or even know what training data would look like (who gets to proclaim that a student missed an opportunity?). I wouldn’t be surprised if that’s the direction their research heads in, though.

  7. Elijah and Michael

    First, I need to clarify something about David’s conversational agents – when you speak of agents for small chat rooms, I am assuming you are speaking of chatrooms as synchronous online discussions? You mention that they are “looking by all appearances like a moderator”. How is the agent identified by the student?

    I also respectfully disagree – you don’t need a computer to support human interaction. You need a human to teach students how to interact and think like a human.

    Michael, the concern I have with your suggestion that the teacher focus on the agent, encouraging students to model the agent, is that the computer/technology becomes the focus, not the learning. Also, we are then in danger of highlighting the computer as the expert, not the teacher. The teacher is far more valuable here.

  8. Debbie – I agree with you that humans are better models. The solution of having a human moderator in every small group discussion isn’t a feasible one, though – the teacher:student ratio just doesn’t work. I also agree that students can learn from one another without an agent. The claim definitely isn’t that we’re starting from zero, and we ought to respect that students can improve simply through practice, if you don’t include any technological doodads.

    Instead, what the papers from David and his colleagues show is that there are gains on top of that! In their collection of papers, the vast majority show learning gains, that the students were more engaged in the discussion, that they were asking deeper questions, that they enjoyed themselves more, that they felt more self-efficacious (believed in their own ability to learn), and other related positive effects. These are all things that they’ve reported from their classroom studies representing a decade of work thinking about these issues.

    I should caution that you don’t see every positive effect in every study – there’s no magic bullet. Different studies test for different things, experiment with conditions, and sometimes they test a hypothesis that goes wrong. They’ve had some studies where the students walked away feeling better about themselves but the presence of an agent had no effect whatsoever on learning. The aggregate story, though, is that agents are a clear improvement on top of just having students work together.

  9. Elijah
    Thanks for your thoughts and willingness to engage in discussion! And a most interesting discussion it is.

    I do think we will agree to disagree :). You mention that having a human moderator (teacher) in every small discussion isn’t feasible – yet it can be if students are taught how to do so. My 10th grader frequently had seminar discussions in his English class this past year. There are 32 students in the class. They were often divided into small groups for what they called ‘Socratic Discussions’ on books they were reading. The teacher provided detailed instructions on how each group was to run the discussion – there were also focused, specific topics and questions that were provided to each group. This seemed to work quite well. The fact that my son talked about the results of some of the discussions at the dinner table suggests they were successful.

    Thank you again for a great post and a good discussion. Debbie

  10. dadamsonlightside says:

    The conversational agents I’ve developed for my research do bear a passing similarity to Weizenbaum’s Eliza, in that students often doubt that they’re interacting with a machine – although the intent is never to deceive. As Joe points out, responding appropriately to students’ contributions can open up opportunities for rich interactions and knowledge-building from the students themselves.

    The agents I work with are, I’d argue, more interesting and effective than a decades-old chatterbot prototype. For a start, we work with educators and content experts to build a model of what meaningful, on-topic contributions from students look like for a given unit. We can use these models to notice student turns that *might* be good opportunities for elaboration or reflection – but to respond to every such statement (or to pick between them at random) could quickly kill a productive back-and-forth between students – knowing *how* and *when* to respond (and when not to) is just as important as knowing what to respond to. We employ a conversational model (based on the principles of Accountable Talk) to identify points in an ongoing discussion where a revoice move, or a prompt for others to agree or disagree with a particular student statement, would be productive and non-disruptive. The studies we’ve run with this framework have shown positive effects – not just for learning (as measured between pre and post tests), but in the quality and intensity of student discourse, both during the activity and back in the classroom.

    These small-group chat sessions with automated facilitation provide the sort of modeling opportunities that Debbie and Michael are talking about – students get to practice participating in rich, accountable science talk in a safe, managed environment, and can transfer it to the classroom. Anecdotally, the teachers I’ve worked with have shared their own amazement – that students who never spoke up in class, who they had to pull teeth to engage with, were active contributors in the small-group chats, and continued to participate in followup classroom discussions.

    This, indeed, is the point of Accountable Talk. By making automated on-demand feedback available, we’re not transferring authority to the machine – we’re granting it to the students.

  11. simSchool says:

    “help for students where they’re not getting any right now” is I think the crux of the matter and why we do need to explore automated feedback, adaptive apps that listen and do their best to respond, apps that generate ideas and support divergent thinking, etc. A graphics artist recently pointed out this site to me: and it got me to thinking about structures for generative conversations, narratives, and prompts for student responses.

  12. @Elijah: another great post that helps on the way to enlightenment about what AI and NLP-based edtech can do. As you know, I’m on the same line as you: we need to empower teachers through technology so they can do more with than without it.

    @Debbie: I think it’s great that you fire up this discussion and I understand your reticence to machines teaching humans how to interact. That said, I think the point is that we may be able to bring (socratic or not) discussion training to more classrooms if we help the teacher with a chatbot. The same goes for high quality writing skills practice: how many students have the privilege to get detailed writing coaching from a human teacher on a frequent basis?

    @dadamsonlightside: are you also familiar with Mark Humphrys’ work? He actually managed to pass the Turing test in the ’80s with his chatbot ‘MGonz’ and is still working at Dublin City University.

  13. Pingback: Need-to-Watch-Videos: Three Clips that Promote Thinking Outside-of-the-Box | online learning insights

  14. Laura Gibbs says:

    Elijah, just saw this article (out of town all week) – anyway, since you have not addressed writing in this essay, I will wait until your next post to provide detailed comments. It sounds like you are going to abandon the whole idea of the actual teaching of writing as a skill (which includes writing mechanics), and instead focus on writing just as an alternative mode for students to present content to computers for automated evaluation, as with the oral conversational agents described in this post. If that is indeed the case, I will be sad to see it. Many college professors have abandoned the actual teaching of writing and they have done so for all kinds of reasons; if the technologists also abandon that cause, then the students are really going to be in trouble when it comes to the development of actual writing skills, especially those writing skills which are entirely separate from oral language skills.

    As for the dismissive comment about students not gaining anything of real value by getting accurate feedback about run-on sentences and other writing errors… OUCH. That kind of scornful attitude towards writing instruction makes me highly dubious that anything really useful for the teaching of writing will emerge from this endeavor, but I will wait to see what the next post offers.

  15. epurser says:

    Gosh, hot topic! thanks e-Literate and commentators for your stimulating words, enjoyed thinking about this…. shouldn’t be using my Saturday to reply, have far too much other stuff to do but here goes…

    I’d been wondering where the ‘accountable conversation’ mooc was coming from… good strategy to crowd-source appraisal that way, of whatever flavour.

    Longer term, I’ve been curious about automated text analysis and text generation for many years, watching the PENMAN project… and now with massification, moocification, widening participation, internationalization and English language domination of global HE, this area seems to be hotting up rapidly with real and manufactured need for a million new apps to help with all aspects of language teaching, faster feedback, and support for mass numbers in e-learning spaces….

    Just wondering, a bit rhetorically, about how your ‘agents’ (secret or otherwise) are being programmed… which corpora lie behind them and how data is tagged…

    I’m assuming by ‘substance’ in Accountable Talk is meant key words and collocations (going by the transcript fragments here – boiling pt, go up, vdW, consecutively, area, dipole, increase, first)? If this is how a ‘substantial claim’ is identified by the software, (so that on that basis students can be encouraged into further interaction and hopefully deep and meaningful dialogue and learning)… I figure that identification must be based on cross-referencing some discipline-specific corpora representing the discourse, which has been made searchable by a particular kind of grammatical tagging – and it’s that tagging that I’d be most interested to know about, because that’s the bit that needs to be got right, I figure. The value of the corpus lies in how we search them, which depends on how we tag the data. And all theories of language are not created equal…

    The control agent (sorry I just can’t get the image of 86 out of my head now) doesn’t add any further ideational meaning, it seems from what I read in the post here, just questions that might lead students’ building up of ‘field’….but the program doesn’t actually introduce new lexical items that are not already being used by students?

    Whatever it’s doing, I wouldn’t say there’s no lessons in ‘grammar’ going on.. unless you’re defining grammar lesson as = correction of syntactic or punctuation ‘error’ only… I think of language education in terms of modeling – pointing out what’s actually going on in normal discourse, so people can see it, rather than gloss over it in a rush and not notice and learn from the patterns.

    I was also wondering about the transcript snippets, whether there’s anything there that couldn’t be achieved by mastery quizzes? Seemed they were just checking comprehension of established ‘facts’ or process descriptions, rather than ‘discussing ideas’ deeply or otherwise. How were “the tenets of a good discussion” determined, seeing everything else in the system hinges on that…. and how learning was measured… so many questions – perhaps I’ll just have to read more, to get a better sense of what you’re doing. thanks for the refs.

    I’ll look forward to future posts on writing as such, but I guess I’m with Laura on not wanting to skip ‘grammar’ that might help people understand how language works in relation to discourse and the construction of logic and ideational meaning… I’d want to focus on it, if I were programming educational interactions to predict, scaffold and generally facilitate the development of peer to peer learning…. I think everyone needs to understand ‘grammar’ better… especially when half or most of a cohort are not comfortable in academic English, and actually want feedback on their use of the language… (and never get it, because their teachers of Engineering or Chemistry or whatever aren’t able to talk about language very effectively, and don’t think it’s their job to do so).

    There probably is positive potential for robo-assistance in eLearning spaces, but if the initial analysis of language isn’t sophisticated and educationally useful, we might as well all go home now (along with the ‘Grammarly’s of this world). I reckon you need a coherent, integrated, evidence-based theory of context, discourse semantics, genre, register and lexico-grammar informing the programming…. or the outcome might be a tad limited… and if a limited system were misunderstood by institutional administrators as being the beez neez and answer to all financial prayers and way to go in getting rid of expensive teachers, well …. hmmm (not that such a thing could ever possibly happen of course..)

  16. Laura Gibbs says:

    Well, I had said I was not going to comment so much on this post because it really doesn’t say anything about feedback on writing per se, but epurser has raised so many good questions that I feel obliged to chime in. My take is that the kind of feedback which is being presented here as a machine intervention is the kind of thing that goes really well ANYWAY if you create opportunities for peers to respond to each other on work that is of interest to them; I really don’t see that we need machines for that at all. In my classes (online), the students quickly learn to do a really good job of giving each other feedback, on their own, with no monitoring from me. They do get lots of modeling from me in the feedback that I give them too of course, plus the experience that comes from receiving peer feedback in turn and being aware of what is valuable and what is not, as well as just some basic guidelines and instructions given throughout the class (for example, the excellent video from Mrs. Yollis’ students:

    So, for example, here is a typical kind of comment that one student left for another student in my class, one I’ve just chosen here totally at random from all the hundreds and hundreds of comments my students leave for each other during the semester – Hey there Hannah! Came back to read your third story and was very impressed! I like to think I know a lot of fairy tales, but this was one I had never heard of before. This made it very interesting to read, as I didn’t know how it was supposed to end normally! It was the perfect story for the format of your storybook, as it was a tale that had many opportunities to turn tragic. Thus, the fairy had many places where she could have stepped in and altered the course. I like how King Edward is pretty much silent for the entire duration of the scene. It shows that he’s finally actually paying attention. He has completely relinquished his reluctance at this point, and is completely wrapped up in the story, just like the reader. It’s another nice step in the progression of his character that we’ve been seeing. Like many others, including myself, you ran into the problem of having to try and find a way to fit a story into your frametale. I’ve seen several others who have mentioned this story being difficult because they know where they want their frametale to go at this point, and so a really specific myth is required. You seem to have conquered this, though, so I can’t wait to come back and see the conclusion!

    As that comment chosen at random shows, students are able to do a really good job of responding to each other on that level of content – especially if they are given a little bit of direct instruction about strategies for how to do that early on in the class (a great skill set for them to learn, too). I would contend that the peer feedback they can provide one another in that regard is so superior to machine feedback that a comparison would be laughable – a contention which is indeed supported by the very generic and non-responsive examples of machine feedback provided in the blog post here.

    So, in addition to emphasizing the enormous value of peer feedback (human to human), I also want to make a stand here once again in defense of the teaching of grammar – using the word grammar as a kind of catchphrase (an admittedly sloppy catchphrase, but commonly in use) for writing mechanics of all kinds – spelling, punctuation, sentence structure, etc. These are exactly the kinds of things that are PECULIAR to written language and which students often struggle with exactly because they are not things learned naturally from the speaking-and-listening that occupies so many hours of people’s everyday lives and, indeed, so much of the school day. In my world (a world of college students at a typical large-ish 4-year public university, mostly native speakers), it is accurate feedback (and yes, error correction) in the realm of grammar that students really need help with – either from the teacher or (in some quixotic techno-utopia) from the machine tutor. They need help from the teacher because they are unable to help themselves very much with that in their otherwise extremely useful and insightful peer feedback. Given that they are often all struggling with writing mechanics to some degree, it is not something natural or easy to include in peer feedback, and boy, did I see that as an enormous problem in both of the Coursera writing courses in which I participated – grammar feedback from unskilled writers led to absurdities that would be funny if the students were not in fact agonizing over all of it. Indeed, even students with higher-level skills often do not have that additional boost of confidence and experience which could allow them to really take on a guiding or instructional role with their peers when it comes to writing mechanics.

    So, with those thoughts in mind, I await the next post where we will find out whether the machine tutor can indeed engage with writing on this level or whether it is more like a checking of content mastery through student writing processed as a kind of series of short-answer fill-in-the-blank responses (a content quiz in disguise), as opposed to truly discursive writing.

  17. tom abeles says:

    Great literature is often read at several levels or grades. Cervantes, Shakespeare, Sartre, etc. And the academic journals are filled with nuanced exchanges. How can the “chatbots” or variants thereof handle these levels of complexity in a meaningful manner? In other words can they get beyond the idea of giving students the confidence in themselves to participate in complex exchanges and/or “discovering” ideas like a sculpture hidden in a block of marble?

    The comment about the “chemist” saying that issues of grammar, etc is “not my job” is very prevalent and we see this in the writings in scholarly journals and in presentations at conferences. This filters down in the education system where often even “discussions” are seen as expected performance. This raises issues of communication beyond the classroom and the difference when the student is in class or participating in self-directed learning worlds such as Whyville where there may or may not be automatons with which to engage.

    Neal Stephenson in his SF novel, The Diamond Age, with its hybrid “book” seems to have struggled with these issues.

  18. Phil Hill says:

    Tom, and on the subject of professors compartmentalizing and “not my job”, have you seen the article in today’s Chronicle?

  19. The discussion here has taken a decidedly (and unexpectedly) sociolinguistic turn! This is both exciting for me and a little dangerous to dive into. Exciting, because I’ve spent several years thinking about how to capture sociolinguistics computationally and what you can get out of machine learning when talking about deeper discursive structures. Dangerous, because so much of this qualitative research is simplified and made coarse when you move to an automated process. Talking with a sociolinguistic vocabulary makes people assume that the computer algorithms are doing smarter things than they really are. I’ll take a stab at it now but will save most of this for future posts.

    Emily and Laura, you’re rightly hesitant to take my definition of “grammar” at face value. In this post, I use that as a catch-all to refer to intrasentential syntactic and lexical correction, with cheap and easy examples including subject-verb agreement, pluralization, punctuation marks, and run-on sentences or sentence fragments. This is what most bloggers seem to picture when they think about automated grading. What puzzles me is that this doesn’t really bear any resemblance to any system I’ve seen in modern research, either in the learning sciences or especially in computational linguistics. I have a hard time figuring out where these guesses are being drawn from.

    Emily is talking about a different notion of grammar, I think. From the vocabulary she’s using I’m guessing she has experience with the systemic functional community. They take an incredibly expansive view of grammatical and interpersonal metaphor. I’ve published about a dozen papers on using machine learning to formalize systemic functional approaches to language, and I love the work they do – it inspires a lot of my own thinking. Their definition of “grammar” extends far beyond what I’ve described above. There are troubles with systemic functional approaches to language, mostly to do with reliability of their interpretations, but that’s beyond my point here and something I can get into later.

    What’s important is knowing when to stop yourself as a machine learning researcher. There are whole swaths of feedback that just don’t work well with machine learning. On the higher, conceptual end, there are comments which are inherently subjective. Once you get into the level of pragmatics and intertextual, multi-level interpretation, you could get half a dozen humans in a room and they’d all disagree in nuanced ways with each other. What happens when I build a system that generates one of those views, but not the others? Is it wrong? Hard to say, but it’s certainly not “trustworthy” in the eyes of the people who disagree with its interpretation. Even if they wouldn’t necessarily fault a human for taking a stance, they likely will reject that same opinion from an automated algorithm.

    On the other hand, if you get into the level of mechanics, you get the opposite problem. Here, there’s no room for interpretation at all (with some exceptions, like optional comma placement). A machine learning system that’s wrong is just wrong, and giving bad feedback. The payoff for giving correct feedback on mechanics 80% of the time is massively undone by the 20% of the time that it’s giving patently wrong feedback, and the system can easily turn out to be a net negative. I’ll be going into this in my next post, but this is what needs to be thought about carefully – what’s the tradeoff between good automated feedback (when it works) and damage from automated feedback (when it doesn’t)? It’s the potential damage of those systems that I was disparaging, Laura, and not necessarily their inclusion in an overall writing curriculum.
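
    As a rough illustration of that tradeoff, here is a minimal sketch. It assumes a simple linear model in which each correct comment is worth a fixed benefit and each wrong comment costs a fixed harm. The function names and the benefit and harm weights are hypothetical, chosen only to show the arithmetic; they are not measurements from any real system.

```python
def net_value_per_comment(accuracy, benefit_when_right, harm_when_wrong):
    """Expected value of one automated comment under a simple linear model.

    All three inputs are illustrative assumptions, not measured quantities.
    """
    return accuracy * benefit_when_right - (1.0 - accuracy) * harm_when_wrong


def break_even_accuracy(benefit_when_right, harm_when_wrong):
    """Accuracy at which expected benefit exactly cancels expected harm."""
    return harm_when_wrong / (benefit_when_right + harm_when_wrong)


accuracy = 0.80

# Case 1 (hypothetical): a wrong comment hurts as much as a right one helps.
equal_weights = net_value_per_comment(accuracy, 1.0, 1.0)

# Case 2 (hypothetical): a wrong comment does five times the damage,
# for example because students stop trusting the correct comments too.
costly_errors = net_value_per_comment(accuracy, 1.0, 5.0)

print("equal weights, net value positive:", equal_weights > 0)
print("costly errors, net value positive:", costly_errors > 0)
print("break-even accuracy with costly errors:", break_even_accuracy(1.0, 5.0))
```

    Under these made-up weights, 80% accuracy comes out ahead only when errors are cheap; once a wrong comment costs five times what a right one earns, the break-even accuracy rises to five in six, and an 80%-accurate system is a net negative. The real weights are an empirical question.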

    This gets to the fundamental point, I guess – finding the balance. I think there’s a place for automated feedback that’s at a level that allows students to gain critical thinking skills, to revise and improve their writing (and, in the case of this blog post, their discussion skills), and to generally communicate better with the feedback than they would without. I think there are dangers to making this approach too specific or too abstract, and I think discussions like this comment thread offer a good window into how to get the technology right. Personally, I’d like to think that every comment from an English teacher or writing tutor is input that can eventually go into a tool that’s useful and productive, not just cost-saving.

    It’d be a waste to let this technology flounder merely because it’s not as good as humans, or because it’s somewhat crude in its first implementation. I’m surely not going to take a stand that the technology is perfectly defensible in every way as is – instead, I’m going to say it’s getting many things right, already, and has huge opportunity for improvement, moving forward. And that’s exciting!

  20. epurser says:

    I’m excited by it all too, Elijah, and you’re very right, it’s SFL I’m coming from with my questions about how sentences are being described in your system.

    I just read through the initial thread about writing that preceded this one on secret agents in online conversation spaces, and am feeling very happy to have happened upon this blog – there’s so much thoughtful discussion going on in it, it’s great fun! I’m really looking forward to your coming posts. I just don’t have time unfortunately this or next week to write much, have a conference to write for and be at, but do look forward to returning soon and engaging further.

    @Phil I also just read through the linked articles in your comment above – yes well, I can certainly relate to all that! Though where I live (in Australia) we don’t tend to locate English language and academic literacy education as you do in the US in first year writing classes, we have similar challenges with teachers in faculties who imagine themselves to be teaching something other than English and expect ‘someone else’ to ‘fix up’ all the normal, language learning problems their students experience… the ‘conduit metaphor’ Reddy wrote about so many moons ago still so dominates all discourse, that folks in electrical engineering, science, health, business, whatever, much prefer to think of themselves as doing what they do ‘in’ English, and imagine that ‘English’ is something that students should arrive in the faculties fully fluently speaking and writing… the banal idea that the way we teach disciplines ‘might have something to do with’ how easily or otherwise students learn to make sense of the discourse and become writers of the knowledge seems incomprehensible, odd or maybe revolutionary in some parts of the university…. (but it’s certainly not the case that disciplines can be characterised as awake or asleep when it comes to recognising the role of dialogue in learning and the need to pay close attention to how language works… the culture is very mixed in every discipline I find – and some great practice peer-review and small group scaffolded writing does go on in Chemistry…).

    but coming back to what Elijah’s on about, I certainly see the potential power of making the patterns of language more visible to everyone – and while I’m very interested in what the Mick O’Donnells and Christian Matthiessens of this world are doing, I’m actually using non-SFL tools in my teaching, because it’s easier for most of the students I work with…. just getting student writers to explore simple-to-use online tools like Tom Cobb’s Vocab Profiler, John Morley’s Phrasebank or a concordancer like Just The Word can be very useful, strategically, and showing them how to create their own searchable corpus of what they’re reading, can also, if used wisely, change the way they see ‘English’, and on what basis they might develop their repertoire and sense of choice, much more quickly than otherwise happens.

    The challenge where I work is that half the students are L2, and in most post-grad courses and research programs, where I focus my attention, ALL the students are using English as an additional language – so the need for fast language development is acute. But the biggest challenge of all is getting teaching academics across the disciplines to seriously contemplate how a course might be re-designed around the learning needs of the students we have (gosh, what a concept!) when those needs are by definition linguistic. The tendency is still to teach as though students are or ought to be L1, and construe them as deficient if they struggle, and send them off to have their English ‘fixed’ somewhere outside the curriculum – because the desire is to keep language invisible, and pretend that developing ‘knowledge’ has nothing to do with developing language is very strong. But from my point of view, the joys of actually seeing how disciplinary discourse works, and showing people how to look at and think a bit differently about the language under their noses every day, so they start paying attention and talking about it as a normal part of teaching, makes it all worth bothering (I think!).

    I find this whole area worth pursuing, whatever the current limitations and inelegancies, because I can see how it could be good to have more of the ‘big data’ of well planned and searchable corpora flowing into the daily language learning environments and curriculum development practices of universities, where more and more students need to pay close attention to language.

  21. Laura Gibbs says:

    Elijah, thanks for your detailed reply! About where the grammar model is coming from (“What puzzles me is that this doesn’t really bear any resemblance to any system I’ve seen in modern research, either in the learning sciences or especially in computational linguistics. I have a hard time figuring out where these guesses are being drawn from”), I think the answer is simple: IT COMES FROM WHAT TEACHERS DO. So, insofar as these automated systems are represented as being like what teachers do, we naturally expect your system to help students learn to write correctly. Error detection is part of the grading process (and so it is an issue in automated systems like robograders that are used only to mark tests) and, even more importantly, both error detection and strategies for error correction (either providing the correction and/or prompting the student to correct) is a huge part of the feedback process as we work to help students master these writing skills. Insofar as you are building an automated system to provide students with feedback on their writing, you should not be surprised that people expect the system to be able to provide “grammar” feedback since that is, in fact, a big part of what teachers do as they provide feedback to students. It is even more the case for teachers who are working with non-native speakers of English. Native speakers only have to negotiate the differences between spoken language and its written form (and those differences are numerous and complex); with non-native speakers, an even wider range of language learning has to be supported.

    If your system is NOT going to be able to do that at all, then it seems to me that it is not a substitute or supplement to the teacher’s role, but is instead a supplement to peer feedback and in that case I have to question why we need an automated system at all. It’s clear that logistics of time mean that teachers cannot always give every student a lot of individualized feedback (although in such cases I always urge teachers to redesign their assignments to cope with that problem; better less writing with more feedback rather than the reverse) … but there can and should be room for LOTS of peer feedback; the only obstacle to abundant peer feedback is usually logistics (for which computers are great, giving us the chance to build truly networked classrooms online) and also lack of training that empowers the teacher to build peer feedback systems into their class design (how to build peer feedback assignments, how to help students learn to give good feedback).

    Can your computerized system provide feedback that is even close to being equal to peer feedback? I doubt that this will be the case, so I really have to question the goal here: if the computer cannot provide the truly expert feedback that a professional teacher does, and if the computer only provides feedback roughly comparable to that of peers, then why use the computer at all for feedback? I would instead far prefer to see development of computer resources that take advantage of what computers are truly good at: providing a platform for sharing and interaction among humans (facilitating more teacher feedback, more peer feedback, more self-reflection by students themselves and revision revision revision of all their writing), as well as providing truly smart tutorials (building writing tutorials is far more feasible than asking computers to respond to human writing which is being created to communicate with a human audience). I will await your further posts to persuade me that there is actually something of value to be gained from attempting to have the computer itself provide feedback rather than providing the digital environment for the best possible human feedback.

  22. Laura – thanks for all of your comments. They’re thoughtful and have given me pause when thinking about the design of appropriate automated feedback, and pausing every once in a while is important. I think rather than attempt a reply here, I’ll instead incorporate much of what I’d say into my next blog post.

    In the meantime, I’d point to articles like this one as the type of commentary I’m speaking of:

    I’d love your feedback on that feedback. To me, it’s clear that the writer isn’t basing their editorial on any real-world system, which seems egregious if you’re writing in the Washington Post. It’s humor, yes, but it’s still going to get attention. What aspects of what he’s mocking, though, would you say resemble what teachers do? This is the type of straw man that’s being battered in popular commentary; I’m not sure it matches either automated systems or human teachers, and that’s what worries me.

  23. Laura Gibbs says:

    I had not seen that article in the Washington Post – it’s funny because in some ways it exemplifies the stereotype of what computerized feedback can mean (and for most people, this means Microsoft Word grammar feedback, a true nightmare), and it also exemplifies the kind of bad feedback that writing teachers themselves sometimes provide… computers have not cornered the market on giving bad feedback which is non-responsive to the author’s meaning, of course!

    At the same time, I have to emphasize that there is indeed a place for error detection (and sometimes error correction) in the feedback that we give to students. Starting with the Gettysburg Address is not a good place for that; instead, we need to look at real student writing and find out as much as possible from students about what kind of feedback is useful to them and also what kind of feedback is most feasible for teachers to provide.

    Just to give you a sense of how I work, I’ve pasted in comments from a student story that I pulled out at random (it’s part of a very dramatic myth about Medusa before she was turned into a monster; this student has chosen to do her class project on Medusa – such a great project, and it is a topic no one had done for this class before, a whole Storybook all about Medusa). As you can see, I mark my comments with ==>, just inserting them into the student’s own writing. Making comments like this is something that is my full-time job, since I teach online courses that are writing-intensive; basically I am a writing tutor to appx. 80-100 students per week. I rely a lot on cutting-and-pasting canned comments re: spelling, punctuation, sentence structure from a giant GoogleDoc but I am also making individualized comments all the time based on the actual meaning of each story. In general, I really enjoy the stories that the students write, so making these comments is all just a pleasure for me. I picked this chunk out at random just to give a sense of how it works – the total story is appx. 1000 words of student writing. It takes me usually 20-30 minutes to comment on a story the first time I see it; when they turn in the revised version, it goes much faster of course. They revise each story at least once and, depending on how much revision is needed, sometimes twice (but that’s pretty rare; they usually do a very good job revising). You can see the final products here: (that is my Myth-Folklore class; I also teach a course in Indian Epics which uses the same approach). The workload balances out since in any given week some students are doing revisions; I spend appx. 30 hours per week doing feedback.

    I provided above a sample of the kind of peer feedback students leave each other, so maybe by seeing my feedback you can get a sense of the contrast. The students provide mostly content-oriented feedback, and I do that too of course, but I’m also concerned with helping students grapple with punctuation and sentence structure, something that a lot of them really do have trouble with (in part because they rarely get sentence by sentence feedback like this to help them see what mistakes they are making).


    “Medusa is it, right?” Poseidon said, making me jump.

    “Oh, Poseidon
    ==> a vocative needs a comma both before and after:

    I didn’t see you there, you scared me, goodness.
    ==> poor Medusa: she has every reason to be scared! this is a run-on sentence, also known as a comma splice:

    What are you even doing here?” I said shaking. I should have known he was trouble, I had heard all of the rumors
    ==> another run-on sentence; see note above

    about how Poseidon tried to sleep with everyone of all sexes.
    ==> that definitely sounds like the world of Greek mythology!

    I clumsily stepped backwards as he moved toward me.
    ==> nice visual details! hearing it told by Medusa herself in first-person is a powerful choice!

    “Medusa you aren’t afraid of me.
    ==> a vocative needs a comma; see note above

    Don’t be so shy, you are one of the most beautiful creatures I have ever seen.
    ==> comma splice; see note above

    My love come sit right here with me.”
    ==> vocative; see note above

    I was nervous, straightening my dress trying to avert my eyes elsewhere, just hoping he would leave.
    ==> and trying
    (otherwise, it would be the dress trying to avert her eyes, if you see what I mean)

    I knew this would upset Athena but I was more terrified of Poseidon at the time.
    ==> great! this makes perfect sense – poor Medusa had a god AND a goddess to worry about, but Poseidon was clearly the greater danger at that intense moment!

    (and so the story goes on… it does not end well for Medusa, as you can imagine!)

  24. Laura Gibbs says:

    Hmmm, just got a note that my comment is awaiting moderation which I don’t remember happening here before. Perhaps it triggered a spam filter because it contains hyperlinks. I saved it, though, so if it doesn’t show up soon, Elijah, you can send me an email ([email protected]) and I’ll send it to you by email; it contains a sample of how I do feedback – definitely different from the Washington Post article, ha ha… 🙂

  25. Yup, it was the hyperlinks. We try to strike a balance between not requiring commenters to register and not spending our entire day deleting comment spam. It works pretty well most of the time, but hyperlinks do set off the trigger.

    Anyway, your comment is cleared.

  26. Laura Gibbs says:

    Wow, that was fast – thank you, Michael!!!

  27. epurser says:

    just back from a conference on language education at NUSingapore, where some pretty great software development for multimodal text analysis is happening…. and just read the refs you posted here Elijah, thankyou so much for those.

    One quick question today about your secret agent software based on ‘accountable talk’ – what happens if students contributing to an online discussion that’s being monitored and prompted by a robot are not writing accurate English? Are inaccuracies and ambiguities questioned, or do they just get ignored? In other words, to what extent does the deepening engagement in the conversation that is the goal of the intervention depend on lexico-grammatical accuracy of posts by students?

    I ask because I’d think the greatest need for something like this, if there is one, would be in environments where language is most problematic, being learned, and students are in desperate need of the targeted extensive feedback that is often not currently being given in STEM disciplines that imagine themselves to have non-linguistic learning objectives (disciplines where the majority of participants are keen to avoid language and/or are not yet highly proficient in the language of instruction, and generally don’t recognise the role of dialogue in all learning – and indeed where those comments also apply to teachers!)
