A few weeks ago, Audrey Watters wrote a great piece on her concerns about robo-grading of essays. (I tend to take a lot of inspiration from the things that annoy Audrey, in part because they usually annoy me too.) Here’s the crux of her argument:
According to Steve Kolowich’s Inside Higher Ed story, [educational researcher Mark] Shermis “acknowledges that [Automated Essay Scoring] software has not yet been able to replicate human intuition when it comes to identifying creativity. But while fostering original, nuanced expression is a good goal for a creative writing instructor, many instructors might settle for an easier way to make sure their students know how to write direct, effective sentences and paragraphs. ‘If you go to a business school or an engineering school, they’re not looking for creative writers,’ Shermis says. ‘They’re looking for people who can communicate ideas. And that’s what the technology is best at’ evaluating.”
Why are nuance and originality just the purview of the creative writing department? Why are those things seen here as indirect or ineffective? Why do we think creativity is opposed to communication? Is writing then just regurgitation?
What sorts of essays gain high marks among the SAT graders – human now or robot in the future? Are these the sorts of essays that students will be expected to write in college? Is this the sort of writing that a citizen/worker/grown-up will be expected to produce? Or, for the sake of speed and cost effectiveness, in Vander Ark’s formulation, are we promoting one mode of writing for standardized assessments at the K–12 level, only to realize when students get to college and to the job market that, alas, they still don’t know how to write?
How can we get students to write more? How can we help them find their voice and hone their craft? How do we create authentic writing assignments and experiences – ones that appeal to real issues and real discourse communities, not just to robot graders? How do we encourage students to find something to say and to write that something well? Is that by telling them that their work will be assessed by an automaton?
How do we support the instructors who have to read student papers and offer them thinking and writing guidance? When we talk about saving time and money here, whose bottom line are we really looking out for? Who’s really interested in this robot grader technology? And why? [Emphasis added.]
This is a classic case of a market gone awry. Machine learning is sold as an “efficiency” tool because there is money in squeezing cost out of education. In and of itself, there’s nothing wrong with wanting education to be cost-effective. David Wiley’s formulation of “standard deviations per dollar” has both a numerator and a denominator, and you can improve the ratio either by raising the numerator (effectiveness) or by lowering the denominator (cost). The problem with obsessing over the denominator is that you start forgetting that “cost-effective” has to be effective. If you want to know what the ongoing industrialization of education looks like in the post-industrial world, robo-grading is it. We are reducing evaluation to the lowest common denominator, and that denominator is measured in dollars.
But it doesn’t have to be that way. What if we looked at machine learning (the technology that makes robo-grading possible) with the goal of raising the numerator, i.e., effectiveness, while keeping cost the same? How could the technology be used as a force multiplier for good teachers, helping them focus on what they do best in roughly the way that flipping the classroom is supposed to? If the goal is teaching better rather than just teaching cheaper, then what is machine learning good for?