There has been a lot of interest in Tuesday's guest post by Steve Lattanzio from MetaMetrics, which described an alternative approach to college rankings that relies on algorithmic analysis of thousands of variables from the College Scorecard rather than the typical cherry-picking of variables and subjective analysis. Several good questions posted on social media and in blog comments have asked for more information on the algorithms and the assumptions behind them.
While we linked to a corresponding article with more results and more detail on the methodology, we should have made that link more obvious. That article gives a much deeper description of the assumptions and methods used, including references to the theory underpinning the approach. We have updated Tuesday's post with a direct link and have included the links in this postscript as well.
The article "A New School of Thought for Our Thoughts on Schools" describes the challenge:
The solution that we propose is to use neural networks to perform representational learning on the data. In other words, instead of manually going through the dataset and engineering a handful of features, we propose to use neural networks to automatically encode (autoencode) the information, including information about where data are missing, in a lower-dimensional space. Similar to principal components analysis (PCA), autoencoding via neural networks is a dimension-reducing technique, but it is more adept at handling variables that are nonlinearly related. In fact, it could be thought of as a more generalized version of PCA. Of course, such compression is lossy, but much of the information lost will be uninteresting noise and redundancies.
The approach breaks the 3,599 variables into a number of categories, which then pass through successive layers of the neural network to produce a two-dimensional representation.
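To make the idea concrete, here is a minimal, hypothetical sketch of autoencoding as nonlinear dimension reduction: a one-hidden-layer autoencoder that compresses standardized data down to a 2D code, trained by plain gradient descent. The data sizes, architecture, and training details are illustrative only; the article's actual network is larger and handles missing data explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the Scorecard matrix: 200 "schools" x 20 variables
# (the real dataset has 3,599 variables).
X = rng.normal(size=(200, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each column

n_in, n_code = X.shape[1], 2                  # compress to two dimensions
W1 = rng.normal(scale=0.1, size=(n_in, n_code)); b1 = np.zeros(n_code)
W2 = rng.normal(scale=0.1, size=(n_code, n_in)); b2 = np.zeros(n_in)

def forward(X):
    code = np.tanh(X @ W1 + b1)               # nonlinear encoder
    recon = code @ W2 + b2                    # linear decoder
    return code, recon

lr, initial_loss = 1.0, None
for _ in range(5000):
    code, recon = forward(X)
    err = recon - X
    loss = (err ** 2).mean()                  # reconstruction error
    if initial_loss is None:
        initial_loss = loss
    # Backpropagation by hand: chain rule through decoder, then encoder.
    d_recon = 2 * err / err.size
    dW2, db2 = code.T @ d_recon, d_recon.sum(axis=0)
    d_code = d_recon @ W2.T
    d_pre = d_code * (1 - code ** 2)          # tanh derivative
    dW1, db1 = X.T @ d_pre, d_pre.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

code, recon = forward(X)
print(code.shape)   # (200, 2)
```

Note the connection to the "generalized PCA" point in the quote above: if the `tanh` nonlinearity were replaced with the identity, this network would reduce to a linear encoder/decoder pair whose optimal solution spans the same subspace as the top two principal components.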
This summary won't answer every question, but it should point readers to the source that describes this additional detail.
Throughout this discussion, I want to remind readers that in the original post Steve was quite deliberate about what this research does not claim.
Out of an abundance of concern that the results of this experiment would be misrepresented, we'll immediately point out that we make no claim that the rankings in this piece are the proper method for ranking these institutions, and we caution anyone against thinking of them as such.
The real goal is further described in the New School article's concluding paragraph:
The methodology described in this paper and the pedagogical use-cases provide a rich framework for advanced analytics of post-secondary education—something that the consequence of the industry and the unwieldiness of the data demands. It is our hope that a future proliferation of similar work will promote further transparency in the post-secondary school market, more holistic approaches to data use, and ultimately more complete, fairer, and objective metrics that empower students to make the best decisions.