It has now been four weeks since Chegg announced a data breach compromising personal information of up to 40 million users. Cue the crickets, because the only coverage in ed tech press thus far is from EdWeek, which focuses on the K-12 market. That's a shame, because if ed tech companies want a case study to help understand the implications of FBI warnings or the European Union's new General Data Protection Regulation (GDPR), this example from Chegg should be illustrative. The same goes for institutions.
As a recap, on September 19th Chegg discovered a data breach dating back to April, in which "an unauthorized party" accessed a database holding "a Chegg user’s name, email address, shipping address, Chegg username, and hashed Chegg password" but no financial information or social security numbers. The company has not disclosed, or is unsure of, how many of the 40 million users had their personal information stolen. On September 25th Chegg notified the SEC about the breach, focusing on guidance for company financials. The company then started notifying users and "certain regulatory authorities" on September 26th.
A "hashed password" is the result of a typical process in which the entered password is converted to a random-looking string of characters that is not intended to be decrypted. On each subsequent login the software applies the same hash to the entered password and compares the hashed values, not the passwords themselves, to see whether they are identical. While this practice of one-way hashing is well-known, far too many web sites (including in ed tech) use plain text, reversible encryption, or poor cryptography schemes.
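To make the hash-and-compare idea concrete, here is a minimal sketch in Python. It is not Chegg's implementation, which has not been disclosed; the function names, the choice of PBKDF2 from the standard library, and the iteration count are all assumptions made for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor (an assumption, not a recommendation)


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a one-way hash from the password and a per-user salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(password: str):
    """Return the (salt, hash) pair a site would store instead of the password."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)


def verify(attempt: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the login attempt the same way and compare the two hashes."""
    return hmac.compare_digest(hash_password(attempt, salt), stored_hash)


salt, stored = register("correct horse battery staple")
print(verify("correct horse battery staple", salt, stored))  # matching password
print(verify("wrong guess", salt, stored))                   # non-matching password
```

The site never needs to keep the password itself: only the salt and the derived hash are stored, and a login succeeds when the freshly computed hash matches the stored one.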
This 2016 article in Wired gives a good overview of hashing and data breaches and notes that the level of compromise depends on the details.
In theory, no one, not a hacker or even the web service itself, should be able to take those hashes and convert them back into passwords. But in practice, some hashing schemes are significantly harder to reverse than others. The collection of 177 million LinkedIn accounts stolen in 2012 that went up for sale on a dark web market last week, for instance, had actually been hashed. But the company used only a simple hashing function called SHA1 without extra protections, allowing almost all the hashed passwords to be trivially cracked. The result is that hackers were able to not only access the passwords, but also try them on other websites, likely leading to Mark Zuckerberg having his Twitter and Pinterest accounts hacked over the weekend.
By contrast, a breach at the crowdfunding site Patreon last year exposed passwords that had been hashed with a far stronger function called bcrypt, the fact of which likely kept the full cache relatively secure in spite of the breach.
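The difference the Wired excerpt describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the user names, passwords, and word list are hypothetical, and because bcrypt is not in the Python standard library, salted PBKDF2 stands in for it as the slow hash.

```python
import hashlib
import os


def sha1_unsalted(password: str) -> str:
    """Fast, unsalted hash: the same password always gives the same digest."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def slow_salted(password: str, salt: bytes) -> str:
    """Salted, deliberately slow hash (PBKDF2 here, standing in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000).hex()


# Hypothetical leaked table: two users happen to share a password.
leaked = {"alice": sha1_unsalted("password123"), "bob": sha1_unsalted("password123")}

# An attacker hashes a word list once and looks every leaked digest up in it.
wordlist = ["letmein", "password123", "qwerty"]
lookup = {sha1_unsalted(word): word for word in wordlist}
cracked = {user: lookup.get(digest) for user, digest in leaked.items()}
print(cracked)  # both accounts fall to the same precomputed table

# With per-user salts the same password no longer produces the same hash,
# so one precomputed table cannot cover every account.
salt_a, salt_b = os.urandom(16), os.urandom(16)
print(slow_salted("password123", salt_a) == slow_salted("password123", salt_b))
```

The slow hash also raises the cost of each individual guess, which is why the details of the hashing scheme matter so much when judging how serious a breach is.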
What is problematic with the Chegg data breach is that no further information has been made public and there has yet to be any interest from the broader ed tech press to dig up answers. We have no idea how serious this breach is, and I do not believe that the users with compromised personal information have had any updates since the initial email blast and associated post.
Less than one week before the Chegg discovery of the data breach, the FBI put out a warning about ed tech and K-12 schools, but the details could easily be applied to higher education.
The FBI is encouraging public awareness of cyber threat concerns related to K-12 students. The US school systems’ rapid growth of education technologies (EdTech) and widespread collection of student data could have privacy and safety implications if compromised or exploited.
EdTech can provide services for adaptive, personalized learning experiences, and unique opportunities for student collaboration. Additionally, administrative platforms for tracking academics, disciplinary issues, student information systems, and classroom management programs, are commonly served through EdTech services.
There is also the GDPR angle described in the EdWeek article.
One of the first to call attention to the Chegg breach was Hill, an education consultant and market analyst for the company MindWires Consulting who posted a blog and a tweet about the SEC disclosure. [snip]
One of the more pressing questions is whether the breach will draw the scrutiny of data-privacy regulators, said Hill in an interview. He pointed to the new rules put in place as part of GDPR, the sweeping European data privacy regulation that took effect earlier this year.
The European policy has come into focus recently with the admission by social media giant Facebook — which has a major presence in schools — that hackers gained access to 50 million of its accounts. European authorities have said they are investigating how many users on the continent were affected, and whether it would trigger GDPR enforcement.
The Facebook breach was no doubt more problematic, as it exposed far more personal information as well as access to Facebook Login, thus compromising third-party platforms. But both data breaches involve consumer-based systems and similar numbers of users. In legal terms, however, GDPR is based on protecting citizens of the European Union. When I asked a Chegg spokesman about the GDPR-based notifications, the reply came in general terms.
We actually do have an office in Berlin. Chegg’s customer base is principally US-based, and the core focus of our business is the United States. We are providing notice to the particular regulatory agencies, in the US and internationally, including Europe.
GDPR has been criticized for creating requirements that are impossible to fully comply with, and there are two aspects worth covering here: Supervisory Authority and Notification of Data Breach. This article gives a good summary of the first, including whom to notify: the Supervisory Authority.
For most companies, choosing a GDPR Lead Supervisory Authority is a straightforward decision. A company based in Paris, France would appoint the supervisory authority in France as the lead supervisory authority. A UK-based company would choose the Information Commissioner’s Office (ICO), which is the supervisory authority for the UK.
For companies that operate in multiple EU member states, the lead supervisory authority would normally be the supervisory authority in the country where the company’s headquarters is or where its main business location is in the EU. More specifically, it would be the Supervisory Authority in the country where the final decisions are made about data collection and processing.
A U.S. company that does not have a base in an EU member state has a problem. If it does not have a base in an EU member state where data processing decisions are made, it will not benefit from the one-stop-shop mechanism. Even if a company has a representative in an EU member state, that does not trigger the one-stop-shop mechanism.
The company must therefore deal with the supervisory authority in every member state where the company is active, through its local representative.
In Chegg's case, presumably the Berlin office allows the company to use the one-stop-shop mechanism of a lead authority. But smaller ed tech companies may not have this benefit and may have to interact with regulators in many different countries.
What about notification requirements in the case of a data breach? The relevant section is Article 33 of GDPR where Chegg would be a "controller" [emphasis added].
- In the case of a personal data breach, the controller shall without undue delay and, where feasible, not later than 72 hours after having become aware of it, notify the personal data breach to the supervisory authority competent in accordance with Article 55, unless the personal data breach is unlikely to result in a risk to the rights and freedoms of natural persons. Where the notification to the supervisory authority is not made within 72 hours, it shall be accompanied by reasons for the delay.
- The processor shall notify the controller without undue delay after becoming aware of a personal data breach.
- The notification referred to in paragraph 1 shall at least:
  - describe the nature of the personal data breach including where possible, the categories and approximate number of data subjects concerned and the categories and approximate number of personal data records concerned;
  - communicate the name and contact details of the data protection officer or other contact point where more information can be obtained;
  - describe the likely consequences of the personal data breach;
  - describe the measures taken or proposed to be taken by the controller to address the personal data breach, including, where appropriate, measures to mitigate its possible adverse effects.
In this case, counting 72 hours from the September 19 discovery, Chegg would have had to notify its Lead Supervisory Authority of the details described above by September 22. According to the SEC form, initial notifications to regulators beyond the SEC started September 26.
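The date arithmetic behind that deadline can be checked with a short Python sketch. Two assumptions are baked in: the year is 2018, the year GDPR took effect, and the clock starts at the very beginning of September 19, since the actual time of day Chegg became aware of the breach is not public.

```python
from datetime import datetime, timedelta

# Assumptions: year 2018, and awareness at 00:00 on September 19.
# The real time of day is unknown, so the true deadline could fall
# up to a day later than the one computed here.
became_aware = datetime(2018, 9, 19)
notified_regulators = datetime(2018, 9, 26)

# Article 33 window: 72 hours after becoming aware of the breach.
deadline = became_aware + timedelta(hours=72)
days_late = (notified_regulators - deadline).days

print(deadline.date())  # 2018-09-22
print(days_late)        # 4
```

Even allowing for uncertainty about the exact time of discovery, the September 26 notifications fall several days outside the 72-hour window, which under Article 33 would mean the notification had to be accompanied by reasons for the delay.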
Would there be a lawsuit based on this delayed notification? We don't know yet, but one important distinction is that in the EU the process must go through the official data regulators. Article 77 of GDPR specifies these actions.
Without prejudice to any other administrative or judicial remedy, every data subject shall have the right to lodge a complaint with a supervisory authority, in particular in the Member State of his or her habitual residence, place of work or place of the alleged infringement if the data subject considers that the processing of personal data relating to him or her infringes this Regulation.
In other words, a country regulator must decide whether it wants to pursue action against Chegg. In the US, by contrast, individuals can file similar complaints or lawsuits directly against the company that suffered the data breach. The intention of GDPR is to go after the big tech companies - Google, Facebook, etc - and Chegg may be too low-profile to warrant close attention. Although up to 40 million users may be involved, it is unknown how many are EU citizens.
Will there be further fallout for Chegg beyond the initial flurry of financial news that helped drive down its stock price by 21 percent since the notification? It looks like the biggest issue is job security for US lawyers, as there have been at least four dozen lawsuits seeking class-action status filed with the general theme of the company not securing its systems properly or not notifying investors of the risks of data security. I have no idea if any of these will stick, but Chegg's initial focus on SEC and financial notifications seems well-placed.
In the meantime, other ed tech companies would do well to view this data breach as a case study and an opportunity to figure out how secure their systems are, and whether they would be able to comply with GDPR (or whether they would be required to do so). More broadly, how many companies collecting personal information adequately protect their hashed passwords? How many know what to do in the case of a data breach? Now is the time to find out and take action, before the next event occurs.
I will repeat my call that Chegg needs to more fully disclose the details of the incident to the general public. There has been no new information shared by Chegg based on its investigation. I would add that this subject should get more attention from ed tech press.
Update: Based on interaction with the executive director of OpsecEdu, the description of common password security approaches has been changed to not state that most use one-way hashing.