It’s Called Data Analysis And Not Data Synthesis For A Reason

By Phil Hill. Posted on January 18, 2016

I’ve never been a big TED Talks fan, but recently I’ve been exploring some of the episodes, partly due to peer pressure.

@PhilOnEdTech @mfeldstein67 y'all should do a weekly PTI style podcast rundown of the issues raised each week in edtech.

— Glenda Morgan (@morganmundum) January 15, 2016

In the process I ran across a talk by Sebastian Wernicke, who has a bioinformatics background but now seems to specialize in giving talks. The talk in question, “How to use data to make a hit TV show”, starts by contrasting two data approaches to binge-TV production: Amazon used data analysis to choose a new show concept, leading to Alpha House, while Netflix used data to examine lots of show components but then let humans draw conclusions and “take a leap of faith”, leading to House of Cards. These anecdotes set up his description of where data fits and where it doesn’t, and this mirrors what Michael and I are seeing in the broad application of personalized learning.

As we described in our most recent EdSurge article:

Bottom Line: Personalized learning is not a product you can buy. It is a strategy that good teachers can implement.

While Wernicke is not addressing education, he describes the same underlying issue in a memorable way (particularly starting at 8:18).

Now, personally I’ve seen a lot of this struggle with data myself, because I work in computational genetics, which is also a field where lots of very smart people are using unimaginable amounts of data to make pretty serious decisions like deciding on a cancer therapy or developing a drug. And over the years, I’ve noticed a sort of pattern or kind of rule, if you will, about the difference between successful decision-making with data and unsuccessful decision-making, and I find this a pattern worth sharing, and it goes something like this.

So whenever you’re solving a complex problem, you’re doing essentially two things. The first one is, you take that problem apart into its bits and pieces so that you can deeply analyze those bits and pieces, and then of course you do the second part. You put all of these bits and pieces back together again to come to your conclusion. And sometimes you have to do it over again, but it’s always those two things: taking apart and putting back together again.

And now the crucial thing is that data and data analysis is only good for the first part. Data and data analysis, no matter how powerful, can only help you taking a problem apart and understanding its pieces. It’s not suited to put those pieces back together again and then to come to a conclusion. There’s another tool that can do that, and we all have it, and that tool is the brain. If there’s one thing a brain is good at, it’s taking bits and pieces back together again, even when you have incomplete information, and coming to a good conclusion, especially if it’s the brain of an expert.

And that’s why I believe that Netflix was so successful, because they used data and brains where they belong in the process. They use data to first understand lots of pieces about their audience that they otherwise wouldn’t have been able to understand at that depth, but then the decision to take all these bits and pieces and put them back together again and make a show like “House of Cards,” that was nowhere in the data. Ted Sarandos and his team made that decision to license that show, which also meant, by the way, that they were taking a pretty big personal risk with that decision. And Amazon, on the other hand, they did it the wrong way around. They used data all the way to drive their decision-making, first when they held their competition of TV ideas, then when they selected “Alpha House” to make as a show. Which of course was a very safe decision for them, because they could always point at the data, saying, “This is what the data tells us.” But it didn’t lead to the exceptional results that they were hoping for.

So data is of course a massively useful tool to make better decisions, but I believe that things go wrong when data is starting to drive those decisions. No matter how powerful, data is just a tool . . .

We are not the only people to describe this distinction. Tony Bates’ latest blog post describes a crossroads we face between automation and empowerment:

The key question we face is whether online learning should aim to replace teachers and instructors through automation, or whether technology should be used to empower not only teachers but also learners. Of course, the answer will always be a mix of both, but getting the balance right is critical.

What I particularly like about the Wernicke description is that he gets at the difference between analysis (detailed examination of the elements or structure of something, typically as a basis for discussion or interpretation) and synthesis (combination or composition, in particular).[1] Data is uniquely suited to the former; the human mind is uniquely suited to the latter.

This is not to say that data and analytics can never be used to put information back together, but it is crucial to understand that there is a world of difference between data for analysis and data for synthesis. In the world of education, the difference shows up in whether data is used to empower learners and teachers or is used to attempt to automate the learning experience.

[1] Using Google’s definitions.

Comments

Kate Bowles says

January 18, 2016 at 7:58 PM

Phil, this is a very useful sorting out, thank you. Some of us are also beginning to see how very small scale ethnographic or narrative work can help us to assemble more nuanced conclusions than data currently allows, in part because the heavy lifting that data does depends on aggregation, and aggregation depends on the decision that outliers and complications that don’t fit can be cleaned out so that we get more manageable groupings.

For me this formulation of analysis and synthesis provides a useful location for narrative work, on the synthesis side. This is important because qualitative, narrative or ethnographic work has been placed under enormous pressure to stay on the analysis side — at a basic level, for example, narrative coding is an attempt to iron out the wrinkles that necessarily characterise individual human behaviour, and prioritise the search for themes. This enables narrative to play on the court that data rules, but possibly at a cost of understanding what narrative helps us to see in the first place.

The persistent challenges to narrative are scale and transferability, but they’re not prohibitive. We just need to figure this out. Is this something that higher education can take from industrial ethnography, and why do we not see more of it in edtech, do you think?

Phil Hill says

January 18, 2016 at 9:00 PM

Kate, thanks for the note. Narrative is a great way to look at synthesis in education. Great points there.

As for scale and transferability, I have two inputs (for now):
– There is natural scale by removing the analysis / decomposition and giving eyes and ears to instructors. This is not MOOC-type scale, but smaller scale that is more effective.
– Perhaps we can think of narrative & synthesis as governors setting appropriate limits on scale. Go ahead and try to scale beyond the ability to add narrative, but don’t expect nuance or deeper learning.

Why do we not see similar insights as in industrial ethnography? Hmm, not sure. What are your thoughts?

Phil Hill says

January 18, 2016 at 9:09 PM

Kate, did you see this article in the Chronicle today? “The Faulty Foundation of American Colleges”:
http://chronicle.com/article/The-Faulty-Foundation-of/234905

It gets into standardization vs. individuality. Here is the intro, which seems quite relevant here as well as to your Twitter chat:

In the early 1950s, at the dawn of jet-powered flight, the U.S. Air Force confronted a troubling problem: Its pilots could not keep control of their planes. The two official designations for the mishaps were incidents and accidents, ranging from bungled landings to aircraft-obliterating fatalities. At the worst point, 17 pilots crashed in a single day. The military initially pinned the blame on the men in the cockpits, citing “pilot error” in crash reports. To remedy this, the Air Force elevated its recruiting standards and changed up its flight school — to little effect. The pilots, meanwhile, vehemently denied responsibility, insisting something was not quite right with the aircraft. Yet when engineers tested the planes’ mechanics and electronics, everything worked as designed.

So what was causing the mysterious performance failures? The pilots had got it right. The problem lay in the design of the cockpit. The military had such a difficult time identifying the problem — and its doctrine-shattering solution — because it went against everything they thought they knew about designing for pilots.

In 1926, Army scientists had measured the size of hundreds of male pilots (the possibility of female pilots was never a serious consideration) and used the data to standardize all cockpits and controls to fit an average-size airman. The assumption was that this would maximize the pool of pilot talent while minimizing the costs of manufacturing. That standardized-design dogma went unchallenged for the next three decades. But in 1950 a junior researcher named Lt. Gilbert Daniels finally asked, How many pilots really were average?

Using body measurements for 4,063 airmen, Daniels calculated the average of the 10 physical dimensions believed to be most relevant for cockpit design, including leg length and wrist circumference. They formed the dimensions of the “average pilot,” which Daniels generously defined as someone whose measurements lay within the middle 30 percent of values. Next he compared each individual pilot, one by one, with that average pilot. Before Daniels crunched his numbers, the consensus in the Air Force was that a sizable number of pilots, perhaps a majority, would be within the average range on all 10 dimensions. That’s why he was stunned when he tabulated the actual number: zero.

Out of 4,063 pilots, not a single airman fit within the average range on all dimensions. One might have longer-than-average arms, but shorter-than-average legs. Another might have a big chest but small hips. Even more astonishing, Daniels discovered that even if you picked just three dimensions — height, chest circumference, and sleeve length — less than 4 percent of pilots were average on all three. A cockpit standardized to fit the average actually fits no one. Forcing individuals to conform to a standardized cockpit ensures that nobody will ever perform at his (or her!) full potential.

Paul-Olivier Dehaye says

January 19, 2016 at 6:12 AM

Since you mention aviation, this is also very relevant:

http://www.vanityfair.com/news/business/2014/10/air-france-flight-447-crash

The failure mode is very interesting: humans failed to understand that it was then up to them to do the synthesis of the information they were given. There was no brain driving this plane, artificial or biological, just a constant stream of data.

Phil Hill says

January 19, 2016 at 9:50 AM

That (Vanity Fair) was a fascinating article. Glad you brought it up.

Trackbacks

Blue Canary says:

January 19, 2016 at 5:47 PM

[…] is a stark symptom of an over-reliance on data. The second item is Phil Hill’s ‘It’s Called Data Analysis And Not Data Synthesis For A Reason’ post on e-literate. Phil uses a TED talk from a computational geneticist named Sebastian […]

What Analytics Aren’t | edtechdigest.com says:

April 14, 2016 at 6:30 AM

[…] ratings systems is a stark symptom of an over-reliance on data. The second item is Phil Hill’s ‘It’s Called Data Analysis And Not Data Synthesis For A Reason’ post on e-literate. Phil uses a TED talk from a computational geneticist named Sebastian […]
