The "Course Signals" story originally covered here has recently gone international, with Britain's prestigious Times Higher Education magazine picking up the Inside Higher Ed story and publishing it as an "Editor's Pick". Hopefully this will push the Course Signals team to answer questions asked of them nearly two months ago, questions that have still not been satisfactorily answered.
We realize those watching the posts on e-Literate over the past couple of weeks may have some questions about what the "Course Signals issue" is, what it isn't, and why it is so important for the educational technology community to make sure Purdue accounts for the issue recently discovered in their statistical approach. This explainer should get you up to speed.
What is Course Signals? Why is it important?
Course Signals is a software product developed at Purdue University to increase student success through the use of analytics to alert faculty, students, and staff to potential problems. Using a formula that takes into account a variety of predictors and current behaviors (e.g. previous GPA, attendance, running scores), Course Signals can help spot potential academic problems before traditional methods might. That formula labels student status in a given course according to a green-yellow-red scheme that clearly indicates whether students are in danger of the dreaded DWIF (dropping out, withdrawing, getting an incomplete, or failing).
While the product is used to improve in-class student performance, it is most often discussed in a larger frame, as a product that increases long-term student success. It has won prestigious awards for its approach to retention, and it is particularly important in the analytics field: its reported ability to increase retention by 21% makes it one of the most effective interventions out there, and suggests that technological solutions to student success can significantly outperform more traditional measures.
What problems were found in the data supporting the retention effects?
Purdue had been claiming that taking classes using CS technology led to better retention. Several anomalies in the data led to the discovery that the experiment may suffer from a "reverse-causality" problem.
One such anomaly was an odd "dose-response" curve. With many effective interventions, as exposure to the intervention increases, the desired benefit increases as well. In the recent Purdue data, taking one Course Signals-enhanced course was shown to have a very slight negative effect, while taking two had a very strong benefit.
The story became even more complex when older data was examined. Early in the program taking one CS-enhanced course had a very substantial impact on retention, nearly equal to taking two CS-enhanced classes. But as the program expanded over the years, taking one CS-enhanced class started to show no impact at all. This behavior is not consistent with Course Signals causing higher retention.
I hypothesized a simple model to explain this shift: rather than students taking more CS courses retaining at a higher rate, what was really happening was that the students who dropped out mid-year were taking fewer CS classes because they were taking fewer classes, period. In other words, the retention/CS link existed, but not in a meaningful way. Unlike the Purdue model, where taking CS-enhanced courses caused retention, this "reverse-causality" model explained why, as participation expanded, taking one CS-enhanced course might move from being a strong predictor to having no predictive force at all.
Michael Feldstein picked up on this analysis, and prodded the Purdue team for a response. When no response came, Alfred Essa, head of R & D and Analytics at McGraw-Hill, took my "back-of-the-envelope" model, and built it out into a full-fledged simulation. The simulation confirmed the reverse-causality model explained the data anomalies very well, much better than Purdue's causal model. Purdue's response to the simulation did not address the serious issues raised.
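To make the mechanism concrete, here is a minimal sketch of how a retention gap can appear even when CS-enhanced courses have no causal effect at all. It is not Alfred Essa's simulation and uses no Purdue data. The function names, the 20% dropout rate, the four-courses-per-term load, and the assumption that each course is independently CS-enhanced with the same probability are all invented for illustration. It also does not try to reproduce the one-course anomaly described above, only the basic artifact.

```python
from math import comb


def binom_pmf(n, k, p):
    """Chance of exactly k CS-enhanced courses among n, each enhanced with probability p."""
    if k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def apparent_retention(p_cs, dropout_rate=0.2, courses_per_term=4):
    """Retention rate by number of CS courses when CS has NO effect on retention.

    Hypothetical assumptions (not from Purdue's study):
      * retained students take two terms of courses, mid-year dropouts take one;
      * dropping out is independent of CS exposure;
      * every course is CS-enhanced with probability p_cs, where 0 < p_cs < 1.
    """
    full_year = 2 * courses_per_term   # courses taken by a retained student
    half_year = courses_per_term       # courses taken by a mid-year dropout

    def prob(n, group):
        # Probability that a student with n courses lands in the given group.
        if group == "2+":
            return 1.0 - binom_pmf(n, 0, p_cs) - binom_pmf(n, 1, p_cs)
        return binom_pmf(n, group, p_cs)

    rates = {}
    for group in (0, 1, "2+"):
        retained = (1 - dropout_rate) * prob(full_year, group)
        dropped = dropout_rate * prob(half_year, group)
        rates[group] = retained / (retained + dropped)
    return rates


if __name__ == "__main__":
    for p_cs in (0.05, 0.4):
        print(p_cs, apparent_retention(p_cs))
```

Under these made-up settings, with 5% of courses CS-enhanced, retention comes out at roughly 77% for students with no CS courses, 87% for those with one, and 94% for those with two or more, even though overall retention is fixed at 80% and CS exposure changes nothing. Students who stay simply take more courses, so they accumulate more CS-enhanced ones.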
Does this mean Course Signals does not work?
It depends. Purdue has yet to respond to the new information in any meaningful way, and until they either release revised estimates that control for this effect or release their data for third-party analysis, we don't know the full story. Additionally, there are some course-level effects seen in early Signals testing that will be unaffected by the issue.
However, Purdue's recent response to Inside Higher Ed indicates that they did not control for the reverse-causality issue at all. If this is true, then the likelihood is that the retention impact of Course Signals will be positive, but significantly below the 21% they have been claiming.
But positive impact is good, right?
Not really. The great insight regarding educational interventions of the past decade or so is what we might term "Hattie's Law", after researcher John Hattie. Most educational interventions have some effect. Doing something is usually better than doing nothing. The question that administrators face is not which interventions "work", but which interventions "work better than average."
At a 21% impact on retention, Course Signals was clearly in the "better than average" category, and its unparalleled dominance in that area suggested that the formula and approach embraced by Course Signals formed the best possible path forward.
Halve that impact and everything changes. Peer coaching models such as InsideTrack have shown impact in the 10-15% range. Increased student aid has shown moderate impact, as have streamlined registration and course access initiatives.
Additionally, other analytics packages exist that have taken a different route than Course Signals. Up until now, they have lived in the shadow of Purdue's success. If CS impact is shown to be significantly reduced, it may be time to give those approaches a second look.
What is unaffected by the new analysis?
Until Purdue fixes and reruns their analysis, it is hard to know what the effects might be. However, there were a number of claims Purdue made that were not based on longitudinal analysis, and these should stand. For instance, students in Course Signals do tend to get more A's and fewer F's, and that data would be unaffected by this issue.
While that's good, it's not the major intent of at least some institutions interested in the system. What makes systems like this particularly attractive is their ability to pay for themselves over time by increasing retention.
There remains a question as to how a system that boosts grades could fail to boost retention. There are a couple of potential hypotheses. First of all, it is quite possible that when the numbers are rerun there will still be a significant, though reduced, retention effect, and that reduced effect is still congruent with the better scores.
Alternatively, it could be that students score highly in Course Signals-enhanced courses, but at the expense of other courses. My daughter's math teacher has a very strict policy on math homework which has whipped her into shape in that class, but this means she often delays studying for other things. Students with finite time resources can rearrange their time, but not always expand it.
Finally, for some nontrivial number of students, retention problems are not due to grades. Not to push the reverse-causality logic too far, but for some students low grades could be a sign of financial or domestic difficulty; fixing the grade would not address the larger problem.
What are the larger cultural implications?
As Michael has outlined in a different post, there are major cultural implications to this error, ones which partially indict the research analytics community's approach to research. To my knowledge, the study was never peer-reviewed outside of its inclusion in conference proceedings, but it is one of the most referenced studies in learning analytics.
Technology does move fast enough that old publication cycles do not serve the industry well. But if pre-publication peer review does not exist, there are a host of things we need to make post-publication review work. We need to release more underlying data, invite more criticism, and separate the PR arm of many organizations from their research arm (or at least ensure more autonomy). Additionally, we may need to place more rigorous controls on conference presentation, and make sure that presentations making strong statistical claims undergo a more thorough and professional review.
The cultural implications of an error like this going undetected this long in a community that is supposedly a community of data analysts are also stunning, and will be the subject of a future post. For the moment we are still waiting for Purdue to engage honestly with the critique, and re-run their numbers after controlling for this effect. Hopefully that will happen later this week.
UPDATE: As Doug notes below, the paper did undergo a full peer review before its inclusion in the LAK conference. I was aware of that, but reading through the post, I realize that is not clear. As I mentioned, we're looking at putting together a more detailed analysis on how we got here after we know better what the damage is, and will walk through those issues more thoroughly at that time. In the meantime, I'd love to start a conversation about that issue in the comments. Let's assume that some analytics is sugar water, and some is useful medicine. How do we create a culture and a process that helps us separate one from the other? What's preventing us from doing that now?