This blog post is a joint submission with BakerHostetler’s Data Privacy Monitor blog.
During the final panel of Thomson Reuters’ 17th Annual eDiscovery & Information Governance in Practice Forum, Thomas Barnett, Ignatius Grande, and Sandra Rampersaud led a lively discussion on Managing Big Data, Dark Data, and Risk. While the exchange covered Information Governance 101 topics, such as the explosion of social media and the corresponding year-over-year growth of new data, the panelists also raised a further set of concerns about “dusty” and “dark” data: data unknown to many organizations and unmanaged by many more.
Dusty and Dark Data
Dusty data is data that the organization, or someone within it, knows about at least vaguely, but that remains cloaked in mystery and obscured by time. Dark data is data an organization keeps without knowing it, lurking entirely in the shadows. Put more simply, dusty data is the “known unknown”; dark data is the “unknown unknown.” But while dark data has at least received some scholarly attention as a concept, dusty data has received far less. Both are important, and both present risk; however, they present different risks and should be treated differently.
You Need Both Responsibility and Authority to Drive Change
With limited time, the panel treated dusty and dark data as a single category when outlining a pragmatic framework for identifying, classifying, and managing that data within the context of information governance and legal holds. Their proposal, while complicated in practice, would rely on a clear mandate from a decision maker or stakeholder who would empower someone within the organization with both the responsibility to undertake the project and the authority to implement change (e.g., to assign resources and, perhaps most importantly, to spend money where necessary).
One of These Risks Is Not Like the Other
We agree with the approach, but we think that additional analysis, one that distinguishes dusty data from dark data and proceeds custodian by custodian, may further help organizations dealing with these issues. Logically, dusty data represents the more immediate risk because someone already knows about it. That risk can surface as quickly as an offhand statement in a Rule 30(b)(6) or fact deposition, such as a witness mentioning “the room we keep legacy data and terminated employee information in but that we don’t go into very often.” In contrast, “unknown unknown” dark data is, by definition, far less likely to come up in testimony or interviews precisely because no one knows about it.
A Strategic and Methodical Approach
Dusty data is the more immediate threat. But while employee interviews offer a quick way to identify dusty data, the interview process also compounds the risk: the more people who know about the data, the more likely its existence will surface at a deposition or surprise counsel at trial. This makes the authority component of the panel's proposal essential. Interviewing alone is not enough; a decision maker needs to craft a strategic solution and implement it promptly. Organizations wrestling with these issues should consider them carefully across the entire range of Information Governance, including E-Discovery, privacy, and data security. This approach does not ignore the additional risk that dark data presents; instead, it focuses first on the risk associated with the known unknowns before taking the deeper, client-directed dive into the organization's true mysteries.