
Posts Tagged ‘workflow’

From A to PC – Running a Defensible Predictive Coding Workflow

Tuesday, September 11th, 2012

So far in our ongoing predictive coding blog series, we’ve touched on the “whys” and “whats” of predictive coding, and now I’d like to address the “hows” of using this new technology. Given that predictive coding is groundbreaking technology in the world of eDiscovery, it’s no surprise that a different workflow is required in order to run the review process.

The traditional linear review process uses a “brute force” approach of manually reading each document and coding it for responsiveness and privilege. To reduce the high cost of this process, many organizations now farm documents out to contract attorneys for review. Contract attorneys, however, often possess less expertise and knowledge of the issues, which means multiple review passes along with additional checks and balances are needed to ensure review accuracy. As a result, a significant number of documents end up being reviewed multiple times, which in turn increases the cost of review. When you step away from an “eyes-on” review of every document and use predictive coding to leverage the expertise of more experienced attorneys, you will naturally aim to review as few documents as possible while still achieving the best possible results.

How do you review the minimum number of documents with predictive coding? For starters, organizations should prepare the case by performing an early case assessment (ECA) to cull down to the review population prior to review. While some may suggest that predictive coding can be run without any ECA up front, you will save a significant amount of review time if you put in the effort to cull out the plainly irrelevant documents in your case. Doing so prevents a “junk in, junk out” situation in which leaving too much junk in the case forces you to review junk documents throughout the predictive coding workflow.

Next, it is important to segregate documents that are unsuitable for predictive coding. Most predictive coding solutions rely on the extracted text content within documents to operate. That means any documents that contain no extracted text, such as photographs and engineering schematics, should be routed to manual review so they are not overlooked by the predictive coding engine. The same concept applies to any other document with limitations that prevent its text from being evaluated, such as encrypted and password-protected files. All of these documents should be reviewed separately so as not to miss any relevant documents.
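For illustration only, here is a minimal sketch of that segregation step; the field names (extracted_text, is_encrypted, is_password_protected) are hypothetical placeholders rather than any particular platform's schema.

```python
def split_for_prediction(documents):
    """Route documents the text-based engine cannot evaluate into a manual-review queue."""
    predictable, manual_review = [], []
    for doc in documents:
        has_text = bool(doc.get("extracted_text", "").strip())
        unreviewable = doc.get("is_encrypted", False) or doc.get("is_password_protected", False)
        if has_text and not unreviewable:
            predictable.append(doc)      # eligible for predictive coding
        else:
            manual_review.append(doc)    # e.g., photos, schematics, encrypted files
    return predictable, manual_review
```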

After culling down to your review population, the next step in preparing to use predictive coding is to create a Control Set by drawing a randomly selected statistical sample from the document population. Once the Control Set is manually reviewed, it will serve two main purposes. First, it will allow you to estimate the population yield, otherwise referred to as the percentage of responsive documents contained within the larger population. (The size of the Control Set may need to be adjusted to ensure the yield is properly taken into account.) Second, it will serve as your baseline for a true “apples-to-apples” comparison of your prediction accuracy across iterations as you move through the predictive coding workflow. The Control Set only needs to be reviewed once, up front, and can then be used to measure accuracy throughout the workflow.
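For readers who want to see the arithmetic behind Control Set sizing, here is a rough sketch using the standard sample-size formula for estimating a proportion; it is illustrative only (the function and its defaults are hypothetical) and is not any particular product's methodology.

```python
import math

def control_set_size(z=1.96, margin_of_error=0.02, expected_yield=0.5):
    """Approximate sample size needed to estimate yield within +/- margin_of_error.

    Uses the normal-approximation formula n = z^2 * p * (1 - p) / e^2.
    With an unknown yield, p = 0.5 gives the most conservative (largest) estimate.
    """
    p = expected_yield
    return math.ceil((z ** 2) * p * (1 - p) / (margin_of_error ** 2))

print(control_set_size())                      # 2401 documents at 95% confidence, +/- 2%
print(control_set_size(margin_of_error=0.05))  # 385 documents at 95% confidence, +/- 5%
```

Note that a low-yield population may still call for a larger Control Set than the formula alone suggests, so that enough responsive documents are captured to measure recall reliably.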

It is essential that the documents in the Control Set are selected randomly from the entire population. While some believe that other sampling approaches give better peace of mind, they may actually result in unnecessary review. For example, other workflows recommend sampling from the documents that are not predicted to be relevant to see whether anything was left behind. If you instead create a proper Control Set from the entire population, you get precision and recall metrics that are representative of the entire population, which in turn includes the documents that are not predicted to be relevant.

Once the Control Set is created, you can begin training the software to evaluate documents against the review criteria in the case. Selecting the optimal set of documents to train the system (commonly referred to as the training set or seed set) is one of the most important steps in the entire predictive coding workflow because it establishes the system’s initial accuracy, so it should be chosen carefully. Some suggest creating the initial training set by taking a random sample from the population (much like how the Control Set is selected) instead of proactively selecting responsive documents. The important thing to understand, however, is that the items used for training should adequately represent the responsive items. Selecting responsive documents for inclusion in the training set matters because most eDiscovery cases have low yield – the prevalence of responsive documents within the overall document population is low. If enough responsive documents are not included in the training set, the system will not be able to learn how to identify responsive items effectively.

An effective method for selecting the initial training set is to use a targeted search to locate a small set of documents (typically between 100 and 1,000) that is expected to be roughly 50% responsive. For example, you may choose to focus on only the key custodians in the case and apply a combination of tighter keyword, date range, and similar search criteria. You do not have to perform exhaustive searches, but a high quality initial training set will likely minimize the amount of additional training needed to achieve high prediction accuracy.

After the initial training set is selected, it must then be reviewed. It is extremely important that the review decisions made on any training items are as accurate as possible since the system will be learning from these items, which typically means that the more experienced case attorneys should handle this review. Once review is finished on all of the training documents, the system can learn from the tagging decisions and predict the responsiveness or non-responsiveness of the remaining documents.

While you could now generate predictions for every other document in the population, the most important set to predict on at this stage is the Control Set. Not only is this typically faster than applying predictions to all the documents in the case, but you need predictions on every Control Set document in order to assess the accuracy of the predictions. With both predictions and tagging decisions on each of the Control Set documents, you can calculate precision and recall metrics that extrapolate to the entire review population.

At this point, the accuracy of the predictions is unlikely to be optimal, and thus the iterative process begins. To increase accuracy, you must select additional documents to use for training the system. Much like the initial training set, this additional training set must be selected carefully. The best documents to add are those the system would otherwise be unable to predict accurately, and the software can often identify this set mathematically more effectively than human reviewers can choose it by hand. Once these documents are selected, you simply continue the iterative process of training, predicting and testing until your precision and recall reach an acceptable point. Following this workflow results in a set of documents identified as responsive by the system, along with trustworthy and defensible accuracy metrics.
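As a purely illustrative sketch of this train-predict-test loop (not any vendor's implementation), the following assumes a generic TF-IDF plus logistic regression classifier standing in for the predictive coding engine, an attorney_review() stub standing in for manual tagging decisions, and uncertainty sampling as one common way software might pick the next training batch.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

def attorney_review(doc_ids):
    """Placeholder: senior case attorneys tag each document 1 (responsive) or 0."""
    raise NotImplementedError

def training_loop(all_texts, control_ids, control_labels, seed_ids,
                  target_recall=0.80, target_precision=0.70,
                  batch_size=200, max_rounds=10):
    X = TfidfVectorizer(max_features=50000).fit_transform(all_texts)
    trained_ids = list(seed_ids)
    trained_labels = list(attorney_review(seed_ids))

    for round_num in range(1, max_rounds + 1):
        model = LogisticRegression(max_iter=1000)
        model.fit(X[trained_ids], trained_labels)

        # Test against the Control Set only: the apples-to-apples benchmark.
        control_pred = model.predict(X[control_ids])
        recall = recall_score(control_labels, control_pred)
        precision = precision_score(control_labels, control_pred, zero_division=0)
        print(f"Round {round_num}: recall={recall:.2f}, precision={precision:.2f}")
        if recall >= target_recall and precision >= target_precision:
            break

        # Add the documents the model is least certain about (probability near 0.5).
        probs = model.predict_proba(X)[:, 1]
        already_used = set(trained_ids) | set(control_ids)
        candidates = [i for i in np.argsort(np.abs(probs - 0.5)) if i not in already_used]
        next_batch = candidates[:batch_size]
        trained_ids += next_batch
        trained_labels += list(attorney_review(next_batch))

    return model
```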

You cannot simply produce all of these documents at this point, however. The documents must still go through a privilege screen to remove any documents that should not be produced, along with any other review measures you usually apply to your responsive documents. This does, however, open up the possibility of applying additional rounds of predictive coding on top of this set of responsive documents. For example, after running the privilege screen, you can train on the privilege tag and attempt to identify additional privileged documents in your responsive set that were missed.

The important thing to keep in mind is that predictive coding is meant to strengthen your current review workflows. While we have outlined one possible workflow that utilizes predictive coding, the flexibility of the technology lends itself to be utilized for a multitude of other uses, including prioritizing a linear review. Whatever application you choose, predictive coding is sure to be an effective tool in your future reviews.

The Malkovich-ization of Predictive Coding in eDiscovery

Tuesday, August 14th, 2012

In the 1999 Academy Award-winning movie, Being John Malkovich, there’s a scene where the eponymous character is transported into his own body via a portal and everyone around him looks exactly like him.  All the characters can say is “Malkovich” as if this single word conveys everything to everyone.

In the eDiscovery world it seems lately like predictive coding has been Malkovich-ized, in the sense that it’s the start and end of every discussion. We here at eDiscovery 2.0 are similarly unable to break free of predictive coding’s gravitational pull – but we’ve attempted to give the use of this emerging technology some context, in the form of a top ten list.

So, without further ado, here are the top ten important items to consider with predictive coding and eDiscovery generally…

1. Perfection Is Not Required in eDiscovery

While not addressing predictive coding per se, it’s important to understand the litmus test for eDiscovery efforts. Regardless of the tools or techniques utilized to respond to document requests in electronic discovery, perfection is not required. The goal should be to create a reasonable and repeatable process to establish defensibility in the event you face challenges by the court or an opposing party. Make sure the predictive coding application (and broader eDiscovery platform you choose) functions correctly, is used properly and can generate reports illustrating that a reasonable process was followed. Remember, making smart decisions to establish a repeatable and defensible process early will inevitably reduce the risk of downstream problems.

2. Predictive Coding Is Just One Tool in the Litigator’s Tool-belt

Although the right predictive coding tools can reduce the time and cost of document review and improve accuracy rates, they are not a substitute for other important technology tools. Keyword search, concept search, domain filtering, and discussion threading are only a few of the other important tools in the litigator’s tool-belt that can and should be used together with predictive coding. Invest in an eDiscovery platform that contains a wide range of seamlessly integrated eDiscovery tools that work together to ensure the simplest, most flexible, and most efficient eDiscovery process.

3. Using Predictive Coding Tools Properly Makes All the Difference

Electronic discovery applications, like most technology solutions, are only effective if deployed properly. Since many early-generation tools are not intuitive, learning how to use a given predictive coding tool properly is critical to eDiscovery success. To maximize chances for success and minimize the risk of problems, select trustworthy predictive coding applications supported by reputable providers and make sure to learn how to use the solutions properly.

4. Predictive Coding Isn’t Just for Big Cases

Sometimes predictive coding applications must be purchased separately from other eDiscovery tools; other times additional fees may be required to use predictive coding. As a result, many practitioners only consider predictive coding for the largest cases, to ensure the cost of eDiscovery doesn’t exceed the value of the case. If possible, invest in an electronic discovery solution that includes predictive coding as part of an integrated eDiscovery platform containing legal hold, collection, processing, culling, analysis, and review capabilities at no additional charge. Since the cost of using different predictive coding tools varies dramatically, make sure to select a tool at the right price point to maximize economic efficiencies across multiple cases, regardless of size.

5. Investigate the Solution Providers

All predictive coding applications are not created equal. The tools vary significantly in price, usability, performance and overall reputation. Although the availability of trustworthy and independent information comparing different predictive coding solutions is limited, information about the companies creating these different applications is available. Make sure to review independent research from analysts such as Gartner, Inc., as part of the vetting process instead of starting from scratch.

6. Test Drive Before You Buy

Savvy eDiscovery technologists take steps to ensure that the predictive coding application they are considering works within their organization’s environment and on their organization’s data. Product demonstrations are important, but testing products internally through a proof of concept evaluation is even more important if you are contemplating bringing an eDiscovery platform in house. Additionally, check company references before investing in a solution to find out how others feel about the software they purchased and the level of product support they receive.

7. Defensibility Is Paramount

Although predictive coding tools can save organizations money through increased efficiency, the relative newness and complexity of the technology can create risk. To avoid this risk, choose a predictive coding tool that is easy to use, developed by an industry leading company and fully supported.

8. Statistical Methodology and Product Training Are Critical

The underlying statistical methodology behind any predictive coding application is critical to the defensibility of the entire eDiscovery process. Many providers fail to incorporate a product workflow for selecting a properly sized control set in certain situations. Unfortunately, this oversight could unwittingly result in misrepresentations to the court and opposing parties about the system’s performance. Select providers capable of illustrating the statistical methodology behind their approach and that are capable of providing proper training on the use of their system.

9. Transparency Is Key

Many practitioners are legitimately concerned that early-generation predictive coding solutions operate as a “black box,” meaning the way they work is difficult to understand and/or explain. Since it is hard to defend technology that is difficult to understand, selecting a solution and process that can be explained in court is critical. Make sure to choose a predictive coding solution that is transparent to avoid allegations by opponents that your tool is “black box” technology that cannot be trusted.

10. Align with Attorneys You Trust

The fact that predictive coding is relatively new to the legal field and can be more complex than traditional approaches to eDiscovery highlights the importance of aligning with trusted legal counsel. Most attorneys defer legal technology decisions to others on their legal team and have little practical experience using these solutions themselves. Conversational knowledge about these tools isn’t enough given the confusion, complexity, and risk related to selecting the wrong tool or using the applications improperly. Make sure to align with an attorney who possesses hands-on experience and who can articulate specific reasons for preferring a particular solution or approach.

Hopefully this top ten list can help ensure that your use of “predictive coding” isn’t Malkovich-ized – meaning you understand when, how and why you’re deploying this particular eDiscovery technology. Without the right context, the eDiscovery industry risks overusing this term and in turn over-hyping this exciting next chapter in process improvement.

Why Half Measures Aren’t Enough in Predictive Coding

Thursday, July 26th, 2012

In part 2 of our predictive coding blog series, we highlighted some of the challenges in measuring and communicating the accuracy of computer predictions. But what exactly do we mean when we refer to accuracy? In this post, I will cover the various metrics used to assess the accuracy of predictive coding.

The most intuitive method for measuring the accuracy of predictions is to simply calculate the percentage of documents the software predicted correctly.  If 80 out of 100 documents are correctly predicted, the accuracy should be 80%. This approach is one of the standard methods used in many other disciplines. For example, a test score in school is often calculated by taking the number of questions answered correctly, dividing that by the total number of questions on the test, then multiplying the resulting number by 100 to get a percentage value. Wouldn’t it make sense to apply the same method for measuring the accuracy of predictive coding? Surprisingly, the answer is actually, “no.”

This approach is problematic because in eDiscovery the goal is not to determine the number of all documents tagged correctly, but rather the number of responsive documents tagged correctly. Let’s assume there are 50,000 documents in a case and each document has been reviewed by a human and computer, resulting in the human-computer comparison chart shown below.

Row   Human review      Computer prediction   Documents
#1    Responsive        Responsive                2,000
#2    Responsive        Non-responsive            6,000
#3    Non-responsive    Non-responsive           40,000
#4    Non-responsive    Responsive                2,000
      Total                                      50,000

Based on this chart, we can see that out of 50,000 total documents, the software predicted 42,000 documents (sum of row #1 and #3) correctly and therefore its accuracy is 84% (42,000/50,000).

However, analyzing the chart closely reveals a very different picture. The results of the human review show that there are 8,000 total responsive documents (sum of row #1 and #2), but the software found only 2,000 of those (row #1). This means the computer found only 25% of the truly responsive documents. This is called Recall.

Also, of the 4,000 documents that the computer predicted as responsive (the sum of row #1 and #4), only 2,000 are actually responsive (row #1), meaning the computer is right only 50% of the time when it predicts a document to be responsive. This is called Precision.

So, why are Recall and Precision so low – only 25% and 50%, respectively – when the computer’s predictions are correct for 84% of the documents? Because the software did very well at predicting non-responsive documents.  Based on the human review, there are 42,000 non-responsive documents (sum of row #3 and #4), of which the software correctly identified 40,000, meaning the computer is right 95% (40,000/42,000) of the time when it predicts a document non-responsive. While the software is right only 50% of the time when predicting a document responsive, it is right 95% of the time when predicting a document non-responsive, which is why its overall predictions across all documents are right 84% of the time.

In eDiscovery, parties are required to take reasonable steps to find documents.  The example above illustrates that the “percentage of correct predictions across all documents” metric may paint an inaccurate view of the number of responsive documents found or missed by the software. This is especially true when most of the documents in a case are non-responsive, which is the most common scenario in eDiscovery. Therefore, Recall and Precision, which accurately track the number of responsive documents found and missed, are better metrics for measuring accuracy of predictions, since they measure what the eDiscovery process is seeking to achieve.

However, measuring and tracking both metrics independently can be cumbersome, especially if the end goal is to achieve higher accuracy on both measures overall.  A single metric called F-measure, which combines Precision and Recall into their harmonic mean (F = 2 × Precision × Recall / (Precision + Recall)), can be used instead. A higher F-measure typically indicates higher precision and recall, and a lower F-measure typically indicates lower precision and recall.
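To make the relationship among these metrics concrete, here is a short calculation using the counts from the comparison chart above; the variable names are purely illustrative.

```python
# Counts from the worked example above
true_positives = 2000     # row #1: responsive, predicted responsive
false_negatives = 6000    # row #2: responsive, predicted non-responsive
true_negatives = 40000    # row #3: non-responsive, predicted non-responsive
false_positives = 2000    # row #4: non-responsive, predicted responsive

total = true_positives + false_negatives + true_negatives + false_positives

accuracy = (true_positives + true_negatives) / total                 # 0.84
recall = true_positives / (true_positives + false_negatives)         # 0.25
precision = true_positives / (true_positives + false_positives)      # 0.50
f_measure = 2 * precision * recall / (precision + recall)            # ~0.33

print(f"Accuracy {accuracy:.0%}, Recall {recall:.0%}, "
      f"Precision {precision:.0%}, F-measure {f_measure:.1%}")
```

Despite the 84% overall accuracy, the low Recall and Precision (and therefore the low F-measure) show how much the non-responsive documents dominate the headline number.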

These three metrics – Precision, Recall and F-measure – are the most widely accepted standards for measuring the accuracy of computer predictions. As a result, users of predictive coding are looking for solutions that provide a way to measure prediction accuracy in all three. The most advanced solutions have built-in measurement workflows and tracking mechanisms.

There is no standard threshold for Recall, Precision or F-measure. It is up to the parties involved in eDiscovery to determine a “reasonable” percentage based on time, cost and risk trade-offs. A higher percentage means higher accuracy – but it also means higher eDiscovery costs, as the software will likely require more training. For high-risk matters, 80%, 90% or even higher Recall may be required, while for lower-risk matters 70% or even 60% may be acceptable. It should be noted that academic studies analyzing the effectiveness of linear review show widely varying review quality. One study comparing the accuracy of manual review with technology assisted review found that manual review achieved, on average, 59.3% recall compared with an average recall of 76.7% for technology assisted review such as predictive coding.

FOIA Matters! — 2012 Information Governance Survey Results for the Government Sector

Thursday, July 12th, 2012

At this year’s EDGE Summit in April, Symantec polled attendees about a range of government-specific information governance questions. The attendees were primarily members of IT and Legal teams, along with Freedom of Information Act (FOIA) agents, government investigators and records managers. The main purpose of the EDGE survey was to gather attendees’ thoughts on what information governance means for their agencies, discern what actions were being taken to address Big Data challenges, and assess how far along agencies were in their information governance implementations pursuant to the recent Presidential Mandate.

As my colleague Matt Nelson’s blog recounts from the LegalTech conference earlier this year, information governance and predictive coding were among the hottest topics at the LTNY 2012 show and in the industry generally. The EDGE Summit correspondingly held sessions on those two topics, as well as delved deeper into questions that are unique to the government. For example, when asked what the top driver for implementation of an information governance plan in an agency was, three out of four respondents answered “FOIA.”

The fact that FOIA was listed as the top driver for government agencies planning to implement an information governance solution is in line with data reported by the Department of Justice (DOJ) from 2008-2011 on the number of requests received. In 2008, 605,491 FOIA requests were received. This figure grew to 644,165 in 2011. While the increase in FOIA requests is not enormous percentage-wise, what is significant is the reduction in backlogs for FOIA requests. In 2008, there was a backlog of 130,419 requests, which had been reduced to 83,490 by 2011. This is likely due to the implementation of newer and better technology, coupled with the fact that the current administration has made FOIA request processing a priority.

In 2009, President Obama directed agencies to adopt “a presumption in favor” of FOIA requests for greater transparency in the government. Agencies have been under pressure from the President to improve the response time to (and completeness of) FOIA requests. Washington Post reporter Ed O’Keefe wrote,

“a study by the National Security Archive at George Washington University and the Knight Foundation, found approximately 90 federal agencies are equipped to process FOIA requests, and of those 90, only slightly more than half have taken at least some steps to fulfill Obama’s goal to improve government transparency.”

Agencies are increasingly more focused on complying with FOIA and will continue to improve their IT environments with archiving, eDiscovery and other proactive records management solutions in order to increase access to data.

Not far behind FOIA requests on the list of reasons to implement an information governance plan were “lawsuits” and “internal investigations.” Fortunately, any comprehensive information governance plan will axiomatically address FOIA requests since the technology implemented to accomplish information governance inherently allows for the storage, identification, collection, review and production of data regardless of the specific purpose. The use of information governance technology will not have the same workflow or process for FOIA that an internal investigation would require, for example, but the tools required are the same.

The survey also found that the top three most important activities surrounding information governance were: email/records retention (73%), data security/privacy (73%) and data storage (72%). These concerns are being addressed modularly by agencies with technology like data classification services, archiving, and data loss prevention technologies. In-house eDiscovery tools are also important as they facilitate the redaction of personally identifiable information that must be removed in many FOIA requests.

It is clear that agencies recognize the importance of managing email/records for the purposes of FOIA, and this is an area of concern not only because of the data explosion, but also because 53% of respondents reported they are responsible for classifying their own data. Respondents have connected the concept of information governance with records management and the ability to execute more effectively on FOIA requests. Manual classification is rapidly becoming obsolete as data volumes grow, and is being replaced by automated solutions in successfully deployed information governance plans.

Perhaps the most interesting piece of data from the survey was what respondents disclosed about what is preventing governmental agencies from implementing information governance plans. The top inhibitors for the government were “budget,” “internal consensus” and “lack of internal skill sets.” Contrasted with the 2012 LegalTech survey findings on information governance, where respondents came predominantly from the private sector, the government’s concerns and implementation timelines are slightly different. In the EDGE survey, only 16% of the government respondents reported that they have implemented an information governance solution, compared with 19% of the LegalTech audience. This disparity is partly because the government lacks the budget and the proper internal committee of stakeholders to sponsor and deploy a plan, but the relatively low numbers in both sectors indicate the nascent state of information governance.

In order for a successful information governance plan to be deployed, “it takes a village,” to quote Secretary Clinton. Without prioritizing coordination between IT, legal, records managers, security, and the other necessary departments on data management, merely having the budget only purchases the technology and does not ensure true governance. In this year’s survey, 95% of EDGE respondents were actively discussing information governance solutions. Over the next two years, the percentage of agencies that will have deployed a solution is expected to more than triple, from 16% to 52%. With the directive on records management due this month from the National Archives and Records Administration (NARA), government agencies will have clear guidance on best practices for records management, and this will aid the adoption of automated archiving and records classification workflows.

The future is bright with the initiative by the President and NARA’s anticipated directive to examine the state of technology in the government. The EDGE survey results support the forecast, provided budget can be obtained, that agencies will be in an improved state of information governance within the next two years. This will improve FOIA request compliance, make litigation involving the government more efficient, and increase agencies’ ability to conduct internal investigations effectively.

Many would have projected that the drivers of information governance in the government would be litigation, internal investigations, and FOIA requests, in that order. And yet, FOIA has recently taken on a more important role given the Obama administration’s focus on transparency and the increased number of requests by citizens. While any one of the drivers could have facilitated the updates in process and technology the government clearly needs, FOIA has positive momentum behind it and seems to be the primary impetus driving information governance. Fortunately, archiving and eDiscovery technology, only two parts of the information governance continuum, can help with all three of the aforementioned drivers, albeit with different workflows.

Later this month we will examine NARA’s directive and what the impact will be on the government’s technology environment – stay tuned.

7th Circuit eDiscovery Pilot Program Tackles Technology Assisted Review With Mock Arguments

Tuesday, May 22nd, 2012

The 7th Circuit eDiscovery Pilot Program’s Mock Argument is the first of its kind and is slated for June 14, 2012.  It is not surprising that the Seventh Circuit’s eDiscovery Pilot Program would be the first to host an event like this on predictive coding, as the program has been a progressive model across the country for eDiscovery protocols since 2009.  The predictive coding event is open to the public (registration required) and showcases the expertise of leading litigators, technologists and experts from all over the United States.  Speakers include: Jason R. Baron, Director of Litigation at the National Archives and Records Administration; Maura R. Grossman, Counsel at Wachtell, Lipton, Rosen & Katz; Dr. David Lewis, Technology Expert and co-founder of the TREC Legal Track; Ralph Losey, Partner at Jackson Lewis; Matt Nelson, eDiscovery Counsel at Symantec; Lisa Rosen, President of Rosen Technology Resources; Jeff Sharer, Partner at Sidley Austin; and Tomas Thompson, Senior Associate at DLA Piper.

The eDiscovery 2.0 blog has extensively covered the three recent predictive coding cases currently being litigated, and while real court cases are paramount to the direction of predictive coding, the 7th Circuit program will proactively address a scenario that has not yet been considered by a court.  In Da Silva Moore, the parties agreed to the use of predictive coding, but couldn’t subsequently agree on the protocol.  In Kleen, plaintiffs want defendants to redo their review process using predictive coding even though the production is 99% complete.  And, in Global Aerospace, the defendant proactively petitioned to use predictive coding over plaintiff’s objections.  By contrast, the 7th Circuit’s hypothetical mock argument anticipates another likely predictive coding scenario: one in which a defendant already has an in-house solution deployed and argues against the use of predictive coding before discovery has begun.

Traditionally, courts have been reluctant to bless or admonish particular technologies, preferring instead to rule on the reasonableness of an organization’s process and to depend on expert testimony for issues beyond that scope.  It is expected that predictive coding will follow suit; however, because so little is understood about how the technology works, interest has been generated in a way the legal technology industry has not seen before, as evidenced by this tactical program.

* * *

The hypothetical dispute is a complex litigation matter pending in a U.S. District Court involving a large public corporation that has been sued by a smaller high-tech competitor for alleged anticompetitive conduct, unfair competition and various business torts.  The plaintiff has filed discovery requests that include documents and communications maintained by the defendant corporation’s vast international sales force.  To expedite discovery and level the playing field in terms of resources and costs, the Plaintiff has requested the use of predictive coding to identify and produce responsive documents.  The defendant, wary of the latest (and untested) eDiscovery technology trends, argues that the organization already has a comprehensive eDiscovery program in place.  The defendant will further argue that the technological investment and defensible processes in-house are more than sufficient for comprehensive discovery, and in fact, were designed in order to implement a repeatable and defensible discovery program.  The methodology of the defendant is estimated to take months and result in the typical massive production set, whereas predictive coding would allegedly make for a shorter discovery period.  Because of the burden, the defendant plans to shift some of these costs to the plaintiff.

Ralph Losey will play the Magistrate Judge. Defense counsel will be Martin T. Tully (partner at Katten Muchin Rosenman LLP), with Karl Schieneman (of Review Less/ESI Bytes) as the litigation support manager for the corporation; plaintiff’s counsel will be Sean Byrne (eDiscovery solutions director at Axiom), with Herb Roitblat (of OrcaTec) as plaintiff’s eDiscovery consultant.

As the hottest topic in the eDiscovery world, the promises of predictive coding include: increased search accuracy for relevant documents, decreased cost and time spent for manual review, and possibly greater insight into an organization’s corpus of data allowing for more strategic decision making with regard to early case assessment.  The practical implications of predictive coding use are still to be determined and programs like this one will flesh out some of those issues before they get to the courts, which is good for practitioners and judges alike.  Stay tuned for an analysis of the arguments, as well as a link to the video.

Courts Increasingly Cognizant of eDiscovery Burdens, Reject “Gotcha” Sanctions Demands

Friday, May 18th, 2012

Courts are becoming increasingly cognizant of the eDiscovery burdens that the information explosion has placed on organizations. Indeed, the cases from 2012 are piling up in which courts have rejected demands that sanctions be imposed for seemingly reasonable information retention practices. The recent case of Grabenstein v. Arrow Electronics (D. Colo. April 23, 2012) is another notable instance of this trend.

In Grabenstein, the court refused to sanction a company for eliminating emails pursuant to a good faith document retention policy. The plaintiff had argued that drastic sanctions (evidentiary, adverse inference and monetary) should be imposed on the company since relevant emails regarding her alleged disability were not retained, in violation of both its eDiscovery duties and an EEOC regulatory retention obligation. The court disagreed, finding that sanctions were inappropriate because the emails were not deleted before the duty to preserve was triggered: “Plaintiff has not provided any evidence that Defendant deleted e-mails after the litigation hold was imposed.”

Furthermore, the court declined to issue sanctions of any kind even though it found that the company deleted emails in violation of its EEOC regulatory retention duty. The court adopted this seemingly incongruous position because the emails were overwritten pursuant to a reasonable document retention policy:

“there is no evidence to show that the e-mails were destroyed in other than the normal course of business pursuant to Defendant’s e-mail retention policy or that Defendant intended to withhold unfavorable information from Plaintiff.”

The Grabenstein case reinforces the principle that reasonable information retention and eDiscovery processes can and often do trump sanctions requests. Just like the defendant in Grabenstein, organizations should develop and follow a retention policy that eliminates data stockpiles before litigation is reasonably anticipated. Grabenstein also demonstrates the value of deploying a timely and comprehensive litigation hold process to ensure that relevant electronically stored information (ESI) is retained once a preservation duty is triggered. These principles are consistent with various other recent cases, including a decision last month in which pharmaceutical giant Pfizer defeated a sanctions motion by relying on its “good faith business procedures” to eliminate legacy materials before a duty to preserve arose.

The Grabenstein holding also spotlights the role that proportionality can play in determining the extent of a party’s preservation duties. The Grabenstein court reasoned that sanctions would be inappropriate since plaintiff managed to obtain the destroyed emails from an alternative source. Without expressly mentioning “proportionality,” the court implicitly drew on Federal Rule of Civil Procedure 26(b)(2)(C) to reach its “no harm, no foul” approach to plaintiff’s sanctions request. Rule 26(b)(2)(C)(i) empowers a court to limit discovery when it is “unreasonably cumulative or duplicative, or can be obtained from some other source that is more convenient, less burdensome, or less expensive.” Given that plaintiff actually had the emails in question and there was no evidence suggesting other ESI had been destroyed, proportionality standards tipped the scales against the sanctions request.

The Grabenstein holding is good news for organizations looking to reduce their eDiscovery costs and burdens. By refusing to accede to a tenuous sanctions motion and by following principles of proportionality, the court sustained reasonableness over “gotcha” eDiscovery tactics. If courts adhere to the Grabenstein mantra that preservation and production should be reasonable and proportional, organizations truly stand a better chance of seeing their litigation costs and burdens reduced accordingly.

District Court Upholds Judge Peck’s Predictive Coding Order Over Plaintiff’s Objection

Monday, April 30th, 2012

In a decision that advances the predictive coding ball one step further, United States District Judge Andrew L. Carter, Jr. upheld Magistrate Judge Andrew Peck’s order in Da Silva Moore, et. al. v. Publicis Groupe, et. al. despite Plaintiff’s multiple objections. Although Judge Carter rejected all of Plaintiff’s arguments in favor of overturning Judge Peck’s predictive coding order, he did not rule on Plaintiff’s motion to recuse Judge Peck from the current proceedings – a matter that is expected to be addressed separately at a later time. Whether or not a successful recusal motion will alter this or any other rulings in the case remains to be seen.

Finding that it was within Judge Peck’s discretion to conclude that the use of predictive coding technology was appropriate “under the circumstances of this particular case,” Judge Carter summarized Plaintiff’s key arguments listed below and rejected each of them in his five-page Opinion and Order issued on April 26, 2012.

  • the predictive coding method contemplated in the ESI protocol lacks generally accepted reliability standards,
  • Judge Peck improperly relied on outside documentary evidence,
  • Defendant MSLGroup’s (“MSL’s”) expert is biased because the use of predictive coding will reap financial benefits for his company,
  • Judge Peck failed to hold an evidentiary hearing and adopted MSL’s version of the ESI protocol on an insufficient record and without proper Rule 702 consideration

Since Judge Peck’s earlier order is “non-dispositive,” Judge Carter identified and applied the “clearly erroneous or contrary to law” standard of review in rejecting Plaintiffs’ request to overturn the order. Central to Judge Carter’s reasoning is his assertion that any confusion regarding the ESI protocol is immaterial because the protocol “contains standards for measuring the reliability of the process and the protocol builds in levels of participation by Plaintiffs.” In other words, Judge Carter essentially dismisses Plaintiff’s concerns as premature on the grounds that the current protocol provides a system of checks and balances that protects both parties. To be clear, that doesn’t necessarily mean Plaintiffs won’t get a second bite of the apple if problems with MSL’s productions surface.

For now, however, Judge Carter seems to be saying that although Plaintiffs must live with the current order, they are by no means relinquishing their rights to a fair and just discovery process. In fact, the existing protocol allows Plaintiffs to actively participate in and monitor the entire process closely. For example, Judge Carter writes that, “if the predictive coding software is flawed or if Plaintiffs are not receiving the types of documents that should be produced, the parties are allowed to reconsider their methods and raise their concerns with the Magistrate Judge.”

Judge Carter also specifically addresses Plaintiff’s concerns related to statistical sampling techniques, which could ultimately prove to be their meatiest argument. A key area of disagreement between the parties is whether or not MSL is reviewing enough documents to ensure relevant documents are not completely overlooked even if this complex process is executed flawlessly. Addressing this point, Judge Carter states that, “If the method provided in the protocol does not work or if the sample size is indeed too small to properly apply the technology, the Court will not preclude Plaintiffs from receiving relevant information, but to call the method unreliable at this stage is speculative.”

Although most practitioners are focused on seeing whether and how many of these novel predictive coding issues play out, it is important not to overlook two key nuggets of information lining Judge Carter’s Opinion and Order. First, Judge Carter’s statement that “[t]here simply is no review tool that guarantees perfection” serves as an acknowledgement that “reasonableness” is the standard by which discovery should be measured, not “perfection.” Second, Judge Carter’s acknowledgement that manual review with keyword searches may be appropriate in certain situations should serve as a wake-up call for those who think predictive coding technology will replace all predecessor technologies. To the contrary, predictive coding is a promising new tool to add to the litigator’s tool belt, but it is not necessarily a replacement for all other technology tools.

Plaintiffs in Da Silva Moore may not have received the ruling they were hoping for, but Judge Carter’s Opinion and Order makes it clear that the courthouse door has not been closed. Given the controversy surrounding this case, one can assume that Plaintiffs are likely to voice many of their concerns at a later date as discovery proceeds. In other words, don’t expect all of these issues to fade away without a fight.

First State Court Issues Order Approving the Use of Predictive Coding

Thursday, April 26th, 2012

On Monday, Virginia Circuit Court Judge James H. Chamblin issued what appears to be the first state court Order approving the use of predictive coding technology for eDiscovery. Tuesday, Law Technology News reported that Judge Chamblin issued the two-page Order in Global Aerospace Inc., et al, v. Landow Aviation, L.P. dba Dulles Jet Center, et al, over Plaintiffs’ objection that traditional manual review would yield more accurate results. The case stems from the collapse of three hangars at the Dulles Jet Center (“DJC”) that occurred during a major snow storm on February 6, 2010. The Order was issued at Defendants’ request after opposing counsel objected to their proposed use of predictive coding technology to “retrieve potentially relevant documents from a massive collection of electronically stored information.”

In Defendants’ Memorandum in Support of their motion, they argue that a first pass manual review of approximately two million documents would cost two million dollars and only locate about sixty percent of all potentially responsive documents. They go on to state that keyword searching might be more cost-effective “but likely would retrieve only twenty percent of the potentially relevant documents.” On the other hand, they claim predictive coding “is capable of locating upwards of seventy-five percent of the potentially relevant documents and can be effectively implemented at a fraction of the cost and in a fraction of the time of linear review and keyword searching.”

In their Opposition Brief, Plaintiffs argue that Defendants should produce “all responsive documents located upon a reasonable inquiry,” and “not just the 75%, or less, that the ‘predictive coding’ computer program might select.” They also characterize Defendants’ request to use predictive coding technology instead of manual review as a “radical departure from the standard practice of human review” and point out that Defendants cite no case in which a court compelled a party to accept a document production selected by a “’predictive coding’ computer program.”

Considering predictive coding technology is new to eDiscovery and first generation tools can be difficult to use, it is not surprising that both parties appear to frame some of their arguments curiously. For example, Plaintiffs either mischaracterize or misunderstand Defendants’ proposed workflow given their statement that Defendants want a “computer program to make the selections for them” instead of having “human beings look at and select documents.” Importantly, predictive coding tools require human input for a computer program to “predict” document relevance. Additionally, the proposed approach includes an additional human review step prior to production that involves evaluating the computer’s predictions.

On the other hand, some of Defendants’ arguments also seem to stray a bit off course. For example, Defendants seem to unduly minimize the value of using other tools in the litigator’s tool belt, such as keyword search or topic grouping, to cull data prior to using potentially more expensive predictive coding technology. To broadly state that keyword searching “likely would retrieve only twenty percent of the potentially relevant documents” seems to ignore two facts. First, keyword search for eDiscovery is not dead. To the contrary, keyword searches can be an effective tool for broadly culling data prior to manual review and for conducting early case assessments. Second, the success of keyword searches and other litigation tools depends as much on the end user as the technology. In other words, the carpenter is just as important as the hammer.

The Order issued by Judge Chamblin, the current Chief Judge for the 20th Judicial Circuit of Virginia, states that “Defendants shall be allowed to proceed with the use of predictive coding for purposes of the processing and production of electronically stored information.”  In a handwritten notation, the Order further provides that the processing and production is to be completed within 120 days, with “processing” to be completed within 60 days and “production to follow as soon as practicable and in no more than 60 days.” The Order does not mention whether or not the parties are required to agree upon a mutually agreeable protocol, an issue that has plagued the court and the parties in the ongoing Da Silva Moore, et. al. v. Publicis Groupe, et. al. for months.

Global Aerospace is the third known predictive coding case on record, but appears to present yet another set of unique legal and factual issues. In Da Silva Moore, Judge Andrew Peck of the Southern District of New York rang in the New Year by issuing the first known court order endorsing the use of predictive coding technology.  In that case, the parties agreed to the use of predictive coding technology, but continue to fight like cats and dogs to establish a mutually agreeable protocol.

Similarly, in the 7th Federal Circuit, Judge Nan Nolan is tackling the issue of predictive coding technology in Kleen Products, LLC, et. al. v. Packaging Corporation of America, et. al. In Kleen, Plaintiffs basically ask that Judge Nolan order Defendants to redo their production even though Defendants have spent thousands of hours reviewing documents, have already produced over a million documents, and their review is over 99 percent complete. The parties have already presented witness testimony in support of their respective positions over the course of two full days and more testimony may be required before Judge Nolan issues a ruling.

What is interesting about Global Aerospace is that Defendants proactively sought court approval to use predictive coding technology over Plaintiffs’ objections. This scenario is different than Da Silva Moore because the parties in Global Aerospace have not agreed to the use of predictive coding technology. Similarly, it appears that Defendants have not already significantly completed document review and production as they had in Kleen Products. Instead, the Global Aerospace Defendants appear to have sought protection from the court before moving full steam ahead with predictive coding technology and they have received the court’s blessing over Plaintiffs’ objection.

A key issue that the Order does not address is whether or not the parties will be required to decide on a mutually agreeable protocol before proceeding with the use of predictive coding technology. As stated earlier, the inability to define a mutually agreeable protocol is a key issue that has plagued the court and the parties for months in Da Silva Moore, et. al. v. Publicis Groupe, et. al. Similarly, in Kleen, the court was faced with issues related to the protocol for using technology tools. Both cases highlight the fact that regardless of which eDiscovery technology tools are selected from the litigator’s tool belt, the tools must be used properly in order for discovery to be fair.

Judge Chamblin left the barn door wide open for Plaintiffs to lodge future objections, perhaps setting the stage for yet another heated predictive coding battle. Importantly, the Judge issued the Order “without prejudice to a receiving party” and notes that parties can object to the “completeness or the contents of the production or the ongoing use of predictive coding technology.”  Given the ongoing challenges in Da Silva Moore and Kleen, don’t be surprised if the parties in Global Aerospace Inc. face some of the same process-based challenges as their predecessors. Hopefully some of the early challenges related to the use of first generation predictive coding tools can be overcome as case law continues to develop and as next generation predictive coding tools become easier to use. Stay tuned as the facts, testimony, and arguments related to Da Silva Moore, Kleen Products, and Global Aerospace Inc. cases continue to evolve.

The 2012 EDGE Summit (21st Century Technology for Information Governance) Debuts In Nation’s Capital

Monday, April 23rd, 2012

The EDGE Summit this week is one of the most prestigious eDiscovery events of the year, as well as arguably the largest for the government sector. This year’s topics and speakers are top notch. The opening keynote speaker will be the Director of Litigation for the National Archives and Records Administration (NARA), Mr. Jason Baron. The EDGE Summit will be Mr. Baron’s first appearance since the deadline passed for the 480 agencies to submit their reports to his agency, reports NARA will use to construct the Directive required by the Presidential Mandate. Attendees will be eager to hear what steps NARA is taking to implement a Directive to the government later this year, and the potential impact it will have on how the government approaches its eDiscovery obligations. The Directive will be a significant step in attempting to bring order to the government’s Big Data challenges and to unify agencies around a similar approach to an information governance plan.

Also speaking at EDGE is the renowned Judge Facciola, who will be discussing the anticipated updates the American Bar Association (ABA) is expected to make to the Model Rules of Professional Conduct. He plans to speak on the challenges that lawyers are facing in the digital age, and what that means with regard to competency as a practicing lawyer. He will also focus on the government lawyer and how they can better meet their legal obligations through education, training, or knowing when and how to find the right expert. Whether the government is the investigating party for law enforcement, the producing party under the Freedom of Information Act (FOIA), or the defendant in civil litigation, Judge Facciola will also discuss what he sees in his courtroom every day and where the true knowledge gaps are in the technological understanding of many lawyers today.

While the EDGE Summit offers CLE credit, it also has a very unique practical aspect. There will be a FOIA-specific lab, a lab on investigations, one on civil litigation and early case assessment (ECA), and one on streamlining the eDiscovery workflow process. Those who attend the labs will get hands-on experience with technology that few educational events offer. It is rare to get in the driver’s seat of the car on the showroom floor and actually drive, which is what EDGE is providing for end users and interested attendees. When talking about the complex problems government agencies face today with Big Data, records management, information governance, eDiscovery, compliance, security, and more, it is necessary to give users a way to truly visualize how these technologies work.

Another key draw at the Summit will be the panel discussions, which will feature experienced government lawyers who have been on the front lines of litigation and have very unique perspectives. The legal hold panel will cover some exciting aspects of the evolution of manual versus automated processes for legal hold. Mr. David Shonka, the Deputy General Counsel of the Federal Trade Commission, is on the panel, and he will discuss the defensibility of the process the FTC used and the experience his department had with two 30(b)(6) witnesses in Federal Trade Commission v. Lights of America, Inc. (C.D. Cal. Mar. 2011). The session will also cover how issuing a legal hold is imperative once the duty to preserve has been triggered. There is a whole new generation of lawyers managing the litigation hold process in an automated way, and it will be great to discuss both the manual and automated approaches and talk about best practices for government agencies. There will also be a session on predictive coding and discussion of the recent cases that have involved the use of technology assisted review. While we are not at the point of mainstream adoption for predictive coding, it is quite exciting to think about the government going from a paper world straight into solutions that would help them manage their unique challenges as well as save them time and money.

Finally, the EDGE Summit will conclude with closing remarks from The Hon. Michael Chertoff, former Secretary of the U.S. Department of Homeland Security from 2005 to 2009. Mr. Chertoff presently provides high-level strategic counsel to corporate and government leaders on a broad range of security issues, from risk identification and prevention to preparedness, response and recovery. All of these issues now involve data and how to search, collect, analyze, protect and store it. Security is one of the most important aspects of information governance. The government has unique challenges, including its size and many geographic locations, records management requirements, massive data volumes and case loads, investigations, and heightened security and defense intelligence risks. This year, in particular, will be a defining year; not only because of the Presidential Mandate, but because of the information explosion and the strain on the global economy. This is why the sector needs to come together to share best practices and hear success stories.  Otherwise, it won’t be able to keep up with the data explosion that’s threatening private and public sectors alike.

eDiscovery Down Under: New Zealand and Australia Are Not as Different as They Sound, Mate!

Thursday, March 29th, 2012

Shortly after arriving in Wellington, New Zealand, I picked up the Dominion Post newspaper and read its lead article: a story involving U.S. jurisdiction being exercised over billionaire NZ resident Mr. Kim Dotcom. The article reinforced the challenges we face with the blurred legal and data governance issues presented by the globalization of the economy and the expansive reach of the internet. Originally from Germany, and having changed his surname to reflect the origin of his fortune, Mr. Dotcom has become all too familiar in NZ of late. He has just purchased two opulent homes in NZ and has become an internationally controversial figure for internet piracy. Mr. Dotcom’s legal troubles arise out of his internet business, which enables illegal downloads of pirated material between users and allegedly powers the largest copyright infringement in global history. It is estimated that his website accounts for 4% of the world’s internet traffic, which means there could be tons of discovery in this case (or, cases).

The most recent legal problems Mr. Dotcom faces are with U.S. authorities who want to extradite him to face copyright charges involving an alleged $500 million in harm caused by his Megaupload file-sharing website. From a criminal and record-keeping standpoint, Mr. Dotcom's issues highlight the need for and use of appropriate technologies. To establish a case against him, U.S. intelligence agencies likely deployed search technologies to piece together Mr. Dotcom's activities, banking information, emails and the data transfers on his site. In a case like this, where agencies need to collect, search and cull email from many different geographies and data sources down to just the relevant information, technologies that link email conversation threads and provide transparent insight into the collection set would deliver immense value. Additionally, the Immigration bureau in New Zealand has been required to release hundreds of documents about Mr. Dotcom's residency application that were requested under the Official Information Act (OIA). The records Immigration had to produce were likely pulled from its archive or records management system in NZ and then redacted for private information before being produced to the public.
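To make the idea of thread linking concrete, here is a minimal, purely illustrative sketch in Python (not drawn from any vendor's product or from the tools actually used in this case) of how a set of collected emails could be grouped into conversation threads using standard RFC 5322 headers; the helper names are hypothetical:

# Hypothetical sketch: group collected emails into conversation threads
# so reviewers can assess or cull whole conversations at once.
from email import message_from_string
from collections import defaultdict

def thread_id(msg):
    # Use the oldest Message-ID in the References chain when present,
    # falling back to the message's own Message-ID.
    refs = (msg.get("References") or "").split()
    return refs[0] if refs else msg.get("Message-ID", "")

def group_into_threads(raw_messages):
    # Map a thread identifier to the list of parsed messages in that thread.
    threads = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        threads[thread_id(msg)].append(msg)
    return threads

Real review platforms use far more robust threading (subject normalization, near-duplicate detection, missing-header handling), but the underlying idea of keying on message identifiers is the same.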

The same tools we use here in the U.S. for investigatory and compliance purposes, as well as for litigation, are needed in Australia and New Zealand to build a criminal case or to comply with the OIA. Information governance technology adoption in APAC is trending first toward government agencies, which are purchasing archiving and eDiscovery technologies more rapidly than private companies. Why is this? One reason could be that because governments in APAC have a larger responsibility for healthcare, education and the protection of privacy, they are more invested in meeting compliance requirements and staying off the front page of the news for shortcomings. APAC private enterprises that are small or mid-sized and not yet doing international business do not have the same archiving and eDiscovery needs that large government agencies do, nor do they face litigation the way their American counterparts do. Large global companies, however, should assume that no matter where they are based, they may be exposed to litigation wherever they do business.

An interesting NZ use case at the enterprise level is that of Transpower (the quasi-governmental energy agency), where compliance with both private- and public-sector requirements is mandatory. Transpower is an organisation that is government-owned yet operates for a profit. Sally Myles, an experienced records manager who recently came to Transpower to head up information governance initiatives, says,

“We have to comply with the Public Records Act of 2005; public requests for information are frequent, as we are under constant scrutiny about where we will develop our plants. We also must comply with the Privacy Act of 1993. My challenge is to get the attention of our leadership to demonstrate why we need to make these changes and show them a plan for implementation as well as cost savings.”

Myles’ comments indicate that NZ is facing many of the same information challenges we face here in the U.S.: storage, records management and searching for meaningful information within the organisation.

Australia, New Zealand and U.S. Commonalities

In Australia and NZ, litigation is not seen as a compelling business driver the way it is in the U.S. This is because many of the information governance needs of organisations are driven by regulatory, statutory and compliance requirements, and the environment is not as litigious as it is in the U.S. The Official Information Act in NZ and the Freedom of Information Act in Australia are analogous to the Freedom of Information Act (FOIA) here in the U.S. The requirements to produce public records alone justify the use of technology to manage large volumes of data and produce appropriately redacted information to the public, regardless of litigation. Additionally, there are now cases like DuPont or Mr. Dotcom's that make the risk of litigation involving the U.S. very real. The fact that implementing an information governance product suite will also prepare a company for litigation is a beneficial by-product for many entities, since they need technology for record-keeping and privacy reasons anyway. In essence, the same capabilities are achieved at the end of the day, regardless of the impetus for implementing a solution.

The Royal Commission – The Ultimate eDiscovery Vehicle

One way to think about Australian Royal Commissions (RCs) is as the Australian version of a U.S. government investigation. A key difference, however, is that a U.S. government investigation is typically into private companies, whereas a Royal Commission is typically an investigation into a government body after a major tragedy, initiated by the Head of State. An RC is an ad hoc, formal, public inquiry into a defined issue with considerable discovery powers. These powers can be greater than those of a judge, but they are restricted to the scope and terms of reference of the Commission. RCs are called to look into matters of great importance and usually have very large budgets. The RC is charged with researching the issue, consulting experts both within and outside of government, and developing findings to recommend changes to the law or other courses of action. RCs have immense investigatory powers, including summoning witnesses under oath, offering indemnities, seizing documents and other evidence (sometimes including those normally protected, such as classified information), holding hearings in camera if necessary and, in a few cases, compelling government officials to aid in the execution of the Commission.

These expansive powers give the RC the opportunity to employ state-of-the-art technology and to skip the slow bureaucratic decision-making processes found within government when it comes to implementing technological change. For this reason, eDiscovery will initially continue to grow in the government sector at a more rapid pace than in the private sector in the Asia-Pacific region. This is because litigation is less prevalent in the Asia-Pacific, and because the RC is a unique investigatory vehicle with the most far-reaching authority for discovering information. Moreover, the timeframes for RCs are tight and their scopes are broad, making them hair-on-fire situations that move quickly.

While the APAC information management environment does not have exactly the same drivers the U.S. market does, it has the same archiving, eDiscovery and technology needs for different reasons. The APAC archiving and eDiscovery market will likely be driven by the government, as records, search and production requirements are the main compliance needs in Australia and NZ. APAC organisations would be well served by beginning to implement key elements of an information governance plan modularly, as globalization is driving us all toward a more common and automated approach to data management.