The introduction of Transparent Predictive Coding to Symantec’s Clearwell eDiscovery Platform helps organizations defensibly reduce the time and cost of document review. Predictive coding refers to machine learning technology that can be used to automatically predict how documents should be classified based on limited human input. As expert reviewers tag documents in a training set, the software identifies common criteria across those documents, which it uses to “predict” the responsiveness of the remaining case documents. The result is that fewer irrelevant and non-responsive documents need to be reviewed manually – thereby accelerating the review process, increasing accuracy and allowing organizations to reduce the time and money spent on traditional page-by-page attorney document review.
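The paragraph above describes the general workflow (experts tag a training set, the software learns from those tags and scores the remaining documents) but not the algorithm behind it. Purely as a toy illustration of that workflow, the sketch below uses a small naive Bayes text classifier as a stand-in. It is not Symantec's method. The class name `NaiveBayesCoder`, the tokenizer and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesCoder:
    """Hypothetical toy two-class multinomial naive Bayes: responsive vs. non-responsive."""

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, tagged_docs: list[tuple[str, bool]]) -> None:
        """tagged_docs holds (text, is_responsive) pairs coded by expert reviewers."""
        for text, responsive in tagged_docs:
            self.doc_counts[responsive] += 1
            self.word_counts[responsive].update(tokenize(text))

    def predict_proba(self, text: str) -> float:
        """Estimated probability that an untagged document is responsive (train first)."""
        tokens = tokenize(text)
        vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (True, False):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed class prior plus Laplace-smoothed word likelihoods
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            log_scores[label] = score
        # Normalise in log space to avoid underflow on long documents
        top = max(log_scores.values())
        pos = math.exp(log_scores[True] - top)
        neg = math.exp(log_scores[False] - top)
        return pos / (pos + neg)


training_set = [
    ("pricing agreement with distributor for Q3", True),
    ("distributor pricing terms attached", True),
    ("lunch order for friday", False),
    ("fantasy football league standings", False),
]
coder = NaiveBayesCoder()
coder.train(training_set)

# Rank the remaining (untagged) documents by predicted responsiveness
remaining = ["revised distributor pricing", "football tickets friday"]
for doc in sorted(remaining, key=coder.predict_proba, reverse=True):
    print(f"{coder.predict_proba(doc):.2f}  {doc}")
```

With this tiny training set, the pricing document scores about 0.88 and the football document about 0.16, so reviewers would see the pricing document first. A production system would use far larger training sets, richer features and a more carefully validated model.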
Given the cost, speed and accuracy improvements that predictive coding promises, its adoption may seem to be a no-brainer. Yet predictive coding technology hasn’t been widely adopted in eDiscovery, largely because the technology and the process still seem opaque and complex. Symantec’s Transparent Predictive Coding was developed to address these concerns and to provide the level of defensibility legal teams need to adopt predictive coding as a mainstream technology for eDiscovery review. Transparent Predictive Coding gives reviewers complete visibility into the training and prediction process and delivers context for more informed, defensible decision-making.
Early adopters like Falcon Discovery have already witnessed the benefits of Transparent Predictive Coding. Falcon is a managed services provider that leverages a mix of top legal talent and cutting-edge technologies to help corporate legal departments, and the law firms that serve them, manage discovery and compliance challenges across matters. Recently, we spoke with Don McLaughlin, founder and CEO of Falcon Discovery, about the firm’s experiences with, and lessons learned from, using Transparent Predictive Coding.
1. Why did Falcon Discovery decide to evaluate Transparent Predictive Coding?
Predictive coding is obviously an exciting development for the eDiscovery industry, and we want to be able to offer Falcon’s clients the time and cost savings that it can deliver. At the same time, there is an element of risk. For example, not all solutions provide the same level of visibility into the prediction process, and helping our clients manage eDiscovery in a defensible manner is of paramount importance. Over the past several years we have tested and/or used a number of software solutions that include some form of assisted review or prediction technology. We were impressed that Symantec took the time and did the research to integrate best practices into its predictive coding technology. This includes elements like integrated, dynamic statistical sampling, which takes the guesswork out of measuring review accuracy. The ability to measure accuracy across the entire review set provides a more complete picture and helps address key issues that have come to light in recent predictive coding court cases such as Da Silva Moore.
2. What’s something you found unique or different from other solutions you evaluated?
I would say one of the biggest differentiators is that Transparent Predictive Coding uses both content and metadata in its algorithms to capture the full context of an e-mail or document, which appealed to us for two reasons. First, you often have to consider metadata during review, both for sensitive issues like privilege and to focus on important communications between specific individuals during specific time periods. Second, using both can yield more accurate results with less work, because the software has a more complete picture of the important elements in an e-mail or document. Evaluating documents faster is critical to our clients’ bottom line and enables more effective litigation risk analysis, while minimizing the chance of overlooking privileged or responsive documents.
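The interview does not say how Transparent Predictive Coding represents metadata internally. One common way to let a text classifier weigh content and metadata together is to append prefixed metadata tokens to a document's word features. The sketch below assumes that approach; the function name `extract_features` and its fields are hypothetical.

```python
import re
from datetime import date


def extract_features(body: str, sender: str, recipients: list[str], sent: date) -> list[str]:
    """Hypothetical helper: combine content terms with metadata tokens in one feature list.

    Metadata tokens carry prefixes (from:, to:, month:, from_domain:) so a
    classifier can weigh who communicated, and when, separately from what was said.
    """
    content = re.findall(r"[a-z0-9']+", body.lower())
    metadata = [f"from:{sender.lower()}"]
    metadata += [f"to:{r.lower()}" for r in recipients]
    metadata.append(f"month:{sent.year}-{sent.month:02d}")
    # The sender's domain can help surface, e.g., mail from outside counsel (privilege)
    metadata.append(f"from_domain:{sender.lower().split('@')[-1]}")
    return content + metadata


features = extract_features(
    "Please review the draft pricing terms",
    "Alice@Example.com",
    ["bob@lawfirm.example"],
    date(2011, 6, 14),
)
print(features)
```

Because the metadata tokens share a vocabulary with the content words, any bag-of-words classifier can learn, for instance, that messages between particular people in a particular month tend to be responsive.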
3. So what were some of the success metrics that you logged?
Using Transparent Predictive Coding, Falcon was able to achieve extremely high levels of review accuracy with only a fraction of the time and review effort. If you look at academic studies of linear search and review, even under ideal conditions you often get somewhere between 40% and 60% accuracy. With Transparent Predictive Coding we are seeing accuracy measures closer to 90%: we are often achieving 90% recall and 80% precision by reviewing only a small fraction (under 10%) of the data population that you might otherwise review document by document. For the appropriate case and population of documents, this enables us to cut review time and costs by 90% compared to pure linear review. Of course, this is on top of the significant savings from using other technologies to intelligently cull the data to a more relevant review set before Transparent Predictive Coding is even applied. This means that our clients can understand the key issues, and identify potential ‘smoking gun’ material, much earlier in a case.
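To make the recall and precision figures concrete, the arithmetic below uses invented counts for a hypothetical 100,000-document population, chosen only to reproduce the 90% recall, 80% precision and under-10% review share mentioned above. They are not Falcon's case data.

```python
def recall(true_pos: int, false_neg: int) -> float:
    """Share of all responsive documents that the review found."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Share of documents flagged responsive that really are responsive."""
    return true_pos / (true_pos + false_pos)


# Hypothetical collection: 100,000 documents, 10,000 of them responsive.
collection = 100_000
tp, fn, fp = 9_000, 1_000, 2_250  # flagged and responsive, missed, flagged but not responsive
reviewed_by_hand = 8_000          # hypothetical documents coded manually (8% of the collection)

print(f"recall    = {recall(tp, fn):.0%}")                 # 90%
print(f"precision = {precision(tp, fp):.0%}")              # 80%
print(f"reviewed  = {reviewed_by_hand / collection:.0%}")  # 8%
print(f"volume avoided vs. linear review = {1 - reviewed_by_hand / collection:.0%}")  # 92%
```

Recall answers "how much of what matters did we find?", while precision answers "how much of what we flagged actually matters?". A defensible review has to report both, since either can be inflated at the expense of the other.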
4. How do you anticipate using this technology for Falcon’s clients?
I think it’s easy for people to get swept up by the “latest and greatest” technology or gadget and assume it’s the silver bullet for everything we’ve been toiling over. Take, for example, the smartphone camera: great for a lot of (maybe even most) situations, but sometimes you’re going to want that super zoom lens or even (gasp!) regular film. By the same token, it’s important to recognize that predictive coding is not an across-the-board substitute for other important eDiscovery review technologies and targeted manual review. That said, we’ve used Clearwell to help our clients lower the time and cost of the eDiscovery process on hundreds of cases now, and one of the main benefits is that the solution offers the flexibility to use any number of advanced analytics tools to meet the specific requirements of the case at hand. We’re obviously excited to be able to introduce our clients to this predictive coding technology and the time and cost benefits it can deliver, but this is in addition to other Clearwell tools, like advanced keyword search, concept or topic clustering, domain filtering, discussion threading and so on, that can and should be used together with predictive coding.
5. Based on your experience, do you have advice for others who may be looking to defensibly reduce the time and cost of document review with predictive coding technology?
The goal of the eDiscovery process is not perfection. At the end of the day, whether you employ a linear review approach and/or leverage predictive coding technology, you need to be able to show that what you did was reasonable and achieved an acceptable level of recall and precision. One of the things you notice with predictive coding is that as you review more documents, the recall and precision scores go up, but at a decreasing rate. A key element of a reasonable approach to predictive coding is measuring your review accuracy using a proven statistical sampling methodology. This includes measuring recall and precision accurately to ensure the predictive coding technology is performing as expected. We’re excited to be able to deliver this capability to our clients out of the box with Clearwell, so they can make more informed decisions about their cases early on and, when necessary, address proportionality concerns with opposing parties and the court.
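The interview does not specify the sampling methodology built into Clearwell. As a generic sketch only, assuming a simple random validation sample coded by human reviewers and a Wilson score interval (one standard choice among several), recall and precision can be reported with a margin of error as below. The function names, seed and sample counts are hypothetical.

```python
import math
import random


def draw_sample(doc_ids: list[str], k: int, seed: int) -> list[str]:
    """Simple random sample (without replacement) of documents for human validation."""
    return random.Random(seed).sample(doc_ids, k)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z=1.96 gives ~95% confidence."""
    if n == 0:
        raise ValueError("no documents in this part of the sample")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate(sample: list[tuple[bool, bool]]) -> dict[str, tuple[float, tuple[float, float]]]:
    """sample holds (reviewer_says_responsive, system_flagged) pairs."""
    # Documents reviewers coded responsive: did the system flag them? (recall)
    hits_on_responsive = [flag for truth, flag in sample if truth]
    # Documents the system flagged: did reviewers agree? (precision)
    hits_on_flagged = [truth for truth, flag in sample if flag]
    out = {}
    for name, outcomes in (("recall", hits_on_responsive), ("precision", hits_on_flagged)):
        k, n = sum(outcomes), len(outcomes)
        out[name] = (k / n, wilson_interval(k, n))
    return out


population = [f"DOC-{i:06d}" for i in range(100_000)]
to_review = draw_sample(population, 1_500, seed=2012)
# Reviewers would now code each sampled document; these results are hypothetical.
sample = ([(True, True)] * 108 + [(True, False)] * 12
          + [(False, True)] * 27 + [(False, False)] * 1_353)
for metric, (point, (low, high)) in estimate(sample).items():
    print(f"{metric}: {point:.0%} (95% CI {low:.0%} to {high:.0%})")
```

The interval narrows as the sample grows, which is one way to argue proportionality: you can show how much additional review would be needed to tighten the estimate, and whether that cost is justified.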
To find out more about Transparent Predictive Coding, visit http://go.symantec.com/predictive-coding