Posts Tagged ‘electronic discovery costs’

The Top 3 Forensic Data Collection Myths in eDiscovery

Wednesday, August 7th, 2013

Confusion about establishing a legally defensible approach for collecting data from computer hard drives during eDiscovery has existed for years. The confusion stems largely from the fact that traditional methodologies die hard and legal requirements are often misunderstood. The most traditional approach to data collection entails making forensic copies or mirror images of every custodian hard drive that may be relevant to a particular matter. This practice is still commonly followed because many believe collecting every shred of potentially relevant data from a custodian’s computer is the most efficient approach to data collection and the best way to avoid spoliation sanctions.


In reality, courts typically do not require parties to collect every shred of electronically stored information (ESI) as part of a defensible eDiscovery process and organizations wedded to this process are likely wasting significant amounts of time and money. If collecting everything is not required, then why would organizations waste time and money following an outdated and unnecessary approach? The answer is simple – many organizations fall victim to 3 forensic data collection myths that perpetuate inefficient data collection practices. This article debunks these 3 myths and provides insight into more efficient data collection methodologies that can save organizations time and money without increasing risk.


Myth #1: “Forensic Copy” and “Forensically Sound” are Synonymous


For many, the confusion begins with a misunderstanding of the terms “forensic copy” and “forensically sound.” The Sedona Conference, a leading nonprofit research and educational institute dedicated to the advanced study of law, defines a forensic copy as follows:


An exact copy of an entire physical storage media (hard drive, CD-ROM, DVD-ROM, tape, etc.), including all active and residual data and unallocated or slack space on the media. Forensic copies are often called “images” or “imaged copies.” (See: The Sedona Conference Glossary: E-Discovery & Digital Information Management, 3rd Edition, Sept. 2010).


Forensically sound, on the other hand, refers to the integrity of the data collection process and relates to the defensibility of how ESI is collected. Among other things, electronic files should not be modified or deleted during collection and a proper chain of custody should be established in order for the data collection to be deemed forensically sound. If data is not collected in a forensically sound manner, then the integrity of the ESI that is collected may be suspect and could be excluded as evidence.


Somehow over time, many have interpreted the need for a forensically sound collection to require forensic copies of hard drives to be made. In other words, they believe an entire computer hard drive must be collected for a collection to be legally defensible (forensically sound). In reality, entire hard drives (forensic copies) or even all active user files need not be copied as part of a defensible data collection process. What is required, however, is the collection of ESI in a forensically sound manner regardless of whether an entire drive is copied or only a few files.
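To make the distinction concrete, here is a minimal sketch of a targeted collection of a single file that still records the integrity checks described above. It is an illustration only, not a substitute for a proper forensic tool: the function names, the custody record fields, and the choice of SHA-256 are assumptions made for this example, and a real collection would also need to address issues such as write protection and access timestamps.

```python
import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_file(source: Path, dest_dir: Path, collector: str) -> dict:
    """Copy one file and return a custody record (hypothetical format)."""
    hash_before = sha256_of(source)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copy_path = dest_dir / source.name
    shutil.copy2(source, copy_path)  # copy2 also tries to preserve timestamps
    hash_copy = sha256_of(copy_path)
    hash_after = sha256_of(source)  # confirm the original was not altered
    return {
        "source": str(source),
        "copy": str(copy_path),
        "collector": collector,
        "collected_at_utc": datetime.now(timezone.utc).isoformat(),
        "sha256": hash_before,
        "copy_matches_source": hash_copy == hash_before,
        "source_unchanged": hash_after == hash_before,
    }
```

The point of the sketch is that the integrity checks attach to the process, not to the amount of data copied: the same hashing and record-keeping apply whether one file or an entire drive is collected.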


Myth #2: Courts Require Forensic Copies for Most Cases


Making forensic copies of custodian hard drives is often important as part of criminal investigations, trade secret theft cases, and other matters where the recovery and analysis of deleted files, internet browsing history, and other non-user generated information is important to a case. However, most large civil matters only require the production of user-generated files like emails, Microsoft Word documents, and other active files (as opposed to deleted files).


Unnecessarily making forensic copies results in more downstream costs in the form of increased document processing, attorney review, and vendor hosting fees because more ESI is collected than necessary. The simple rule of thumb is that the more ESI collected at the beginning of a matter, the higher the downstream eDiscovery costs. That means casting a narrow collection net at the beginning of a case rather than “over-collecting” more ESI than legally required can save significant time and money.


Federal Rule of Civil Procedure 34 and case law help dispel the myth that forensic copies are required for most civil cases. The notes to Rule 34(a)(1) state that,


Rule 34(a)…is not meant to create a routine right of direct access to a party’s electronic information system, although such access might be justified in some circumstances. Courts should guard against undue intrusiveness resulting from inspecting or testing such systems.

More than a decade ago, the Tenth Circuit validated the notion that opposing parties should not be routinely entitled to forensic copies of hard drives. In McCurdy Group v. Am. Biomedical Group, Inc., 9 Fed. Appx. 822 (10th Cir. 2001) the court held that skepticism concerning whether a party has produced all responsive, non-privileged documents from certain hard drives is an insufficient reason standing alone to warrant production of the hard drives: “a mere desire to check that the opposition has been forthright in its discovery responses is not a good enough reason.” Id. at 831.

On the other hand, Ameriwood Indus. v. Liberman, 2006 U.S. Dist. LEXIS 93380 (E.D. Mo. Dec. 27, 2006), is a good example of a limited situation where making a forensic copy of a hard drive might be appropriate. In Ameriwood, the court referenced Rule 34(a)(1) to support its decision to order a forensic copy of the defendant’s hard drive in a trade secret misappropriation case because the defendant “allegedly used the computer itself to commit the wrong….” In short, courts expect parties to take a reasonable approach to data collection. A reasonable approach to collection only requires making forensic copies of computer hard drives in limited situations.


Myth #3: Courts Have “Validated” Some Proprietary Collection Tools


Confusion about computer forensics, data collection, and legal defensibility has also been stoked as the result of overzealous claims by technology vendors that courts have “validated” some data collection tools and not others. This has led many attorneys to believe they should play it safe by only using tools that have ostensibly been “validated” by courts. Unfortunately, this myth exacerbates the over-collection of ESI problem that frequently costs organizations time and money.


The notion that courts are in the business of validating particular vendors or proprietary technology solutions has been summarily dismissed by one of the leading eDiscovery attorneys and computer forensic examiners on the planet. In his article titled, We’re Both Part of the Same Hypocrisy, Senator, Craig Ball explains that courts generally are not in the business of “validating” specific companies and products. To make his point, Mr. Ball pointedly states that:


just because a product is named in passing in a court opinion and the court doesn’t expressly label the product a steaming pile of crap does not render the product ‘court validated.’ 


In a nod to the fact that the defensibility of the data collection process is dependent on the methodology as much as the tools used, Mr. Ball goes on to explain that, “the integrity of the process hinges on the carpenter, not the hammer.”




In the past decade, ESI collection tools have evolved dramatically to enable the targeted collection of ESI from multiple data sources in an automated fashion through an organization’s computer network. Rather than manually connecting a collection device to every custodian hard drive or server to identify and collect ESI for every new matter, new tools enable data to be collected from multiple custodians and data sources within an organization using a single collection tool. This streamlined approach saves organizations time and money without sacrificing legal defensibility or forensic soundness.


Choosing the correct collection approach is important for any organization facing regulatory scrutiny or routine litigation because data collection represents an early and important step in the eDiscovery process. If data is overlooked, destroyed, altered, or collected too slowly, the organization could face embarrassment and sanctions. On the other hand, needlessly over-collecting data could result in unnecessary downstream processing and review expenses. Properly assessing the data collection requirements of each new matter and understanding modern collection technologies will help you avoid the top 3 forensic data collection myths and save your organization time and money.

The Need for a More Active Judiciary in eDiscovery

Wednesday, July 24th, 2013

Various theories have been advanced over the years to determine why the digital age has caused the discovery process to spiral out of control. Many believe that the sheer volume of ESI has led to the increased costs and delays that now characterize eDiscovery. Others place the blame on the quixotic advocacy of certain lawyers who seek “any and all documents” in their quest for the proverbial smoking gun. While these factors have undoubtedly contributed to the current eDiscovery frenzy, there is still another key reason that many cognoscenti believe has impacted discovery: a lack of judicial involvement. Indeed, in a recent article published by the University of Kansas Law Review, Professor Steven Gensler and Judge Lee Rosenthal argue that many of the eDiscovery challenges facing lawyers and litigants could be addressed in a more efficient and cost-effective manner through “active case management” by judges. According to Professor Gensler and Judge Rosenthal, a meaningful Rule 16 conference with counsel can enable “the court to ensure that the lawyers and parties have paid appropriate attention to planning for electronic discovery.”

To facilitate this vision of a more active judiciary in the discovery process, the Advisory Committee has proposed a series of changes to the Federal Rules of Civil Procedure. Most of these changes are designed to improve the effectiveness of the Rule 26(f) discovery conference and to encourage courts to provide input on key discovery issues at the outset of a case.

Rules 26 and 34 – Improving the Effectiveness of the Rule 26(f) Discovery Conference

One way the Committee felt that it could enable greater judicial involvement in case management was to have the parties conduct a more meaningful Rule 26(f) discovery conference. Such a step is significant since courts generally believe that a successful conference is the lynchpin for conducting discovery in a proportional manner.

To enhance the usefulness of the conference, the Committee recommended that Rule 26(f) be amended to specifically require the parties to discuss any pertinent issues surrounding the preservation of ESI. This provision is calculated to get the parties thinking proactively about preservation problems that could arise later in discovery. It is also designed to work in conjunction with the proposed amendments to Rule 16(b)(3) and Rule 37(e). Changes to the former would expressly empower the court to issue a scheduling order addressing ESI preservation issues. Under the latter, the extent to which preservation issues were addressed at a discovery conference or in a scheduling order could very well affect any subsequent motion for sanctions relating to a failure to preserve relevant ESI.

Another amendment to Rule 26(f) would require the parties to discuss the need for a “clawback” order under Federal Rule of Evidence 502. Though underused, Rule 502(d) orders generally reduce the expense and hassle of litigating issues surrounding the inadvertent disclosure of ESI protected by the attorney-client privilege. To ensure this overlooked provision receives attention from litigants, the Committee has drafted a corresponding amendment to Rule 16(b)(3) that would enable the court to weigh in on Rule 502 related issues in a scheduling order.

The final step the Committee has proposed for increasing the effectiveness of the Rule 26(f) conference is to amend Rule 26(d) and Rule 34(b)(2) to enable parties to serve Rule 34 document requests prior to that conference. These “early” requests, which are not deemed served until the conference, are “designed to facilitate focused discussion during the Rule 26(f) conference.” This, the Committee hopes, will enable the parties to subsequently prepare document requests that are more targeted and proportional to the issues in play.

Rule 16 – Greater Judicial Input on Key Discovery Issues

As mentioned above, the Committee has suggested adding provisions to Rule 16(b)(3) that track those in Rule 26(f) so as to provide the opportunity for greater judicial input on certain eDiscovery issues at the outset of a case. In addition to these changes, Rule 16(b)(3) would also allow a court to require that the parties caucus with the court before filing a discovery-related motion. The purpose of this provision is to encourage judges to informally resolve discovery disputes before the parties incur the expense of fully engaging in motion practice. According to the Committee, various courts have used similar arrangements under their local rules that have “proven highly effective in reducing cost and delay.”


Whether these changes succeed depends on how committed the courts are to using the proposed case management tools. Without more active involvement from the courts, the newly proposed initiatives regarding cooperation and proportionality may very well fall by the wayside and remain noble but unmet expectations. If the amendments are enacted, compliance with them is likely the only way to ensure they succeed.

The Gartner 2013 Magic Quadrant for eDiscovery Software is Out!

Wednesday, June 12th, 2013

This week marks the release of the 3rd annual Gartner Magic Quadrant for e-Discovery Software report.  In the early days of eDiscovery, most companies outsourced almost every sizeable project to vendors and law firms so eDiscovery software was barely a blip on the radar screen for technology analysts. Fast forward a few years to an era of explosive information growth and rising eDiscovery costs and the landscape has changed significantly. Today, much of the outsourced eDiscovery “services” business has been replaced by eDiscovery software solutions that organizations bring in house to reduce risk and cost. As a result, the enterprise eDiscovery software market is forecast to grow from $1.4 billion in total software revenue worldwide in 2012 to $2.9 billion by 2017. (See Forecast:  Enterprise E-Discovery Software, Worldwide, 2012 – 2017, Tom Eid, December, 2012).
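For a sense of scale, the growth rate implied by that forecast can be worked out with a short calculation. This is a back-of-the-envelope sketch, not a figure taken from the Gartner report: it assumes five annual compounding periods between 2012 and 2017, and the function name is chosen here for illustration.

```python
def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start into end."""
    return (end / start) ** (1 / years) - 1


# Forecast figures quoted above, in billions of US dollars.
revenue_2012 = 1.4
revenue_2017 = 2.9

rate = compound_annual_growth_rate(revenue_2012, revenue_2017, 2017 - 2012)
print(f"Implied annual growth: {rate:.1%}")
```

Under those assumptions the implied rate falls between 15 and 16 percent a year, which means the market would roughly double over the five-year period.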

Not surprisingly, today’s rapidly growing eDiscovery software market has become significant enough to catch the attention of mainstream analysts like Gartner. This is good news for company lawyers who are used to delegating enterprise software decisions to IT departments and outside law firms, because today those same company lawyers are involved in eDiscovery and other information management software purchasing decisions for their organizations. While these lawyers understand the company’s legal requirements, they do not necessarily understand how to choose the best technology to address those requirements. Conversely, IT representatives understand enterprise software, but they do not necessarily understand the law. Gartner bridges this information gap by providing in-depth and independent analysis of the top eDiscovery software solutions in the form of the Gartner Magic Quadrant for e-Discovery Software.

Gartner’s methodology for preparing the annual Magic Quadrant report is rigorous. Providers must meet quantitative requirements such as revenue and significant market penetration to be included in the report. If these threshold requirements are met then Gartner probes deeper by meeting with company representatives, interviewing customers, and soliciting feedback to written questions. Providers that make the cut are evaluated across four Magic Quadrant categories as either “leaders, challengers, niche players, or visionaries.” Where each provider ends up on the quadrant is guided by an independent evaluation of each provider’s “ability to execute” and “completeness of vision.” Landing in the “leaders” quadrant is considered a top recognition.

The nine Leaders in this year’s Magic Quadrant have four primary characteristics (See figure 1 above).

The first is whether the provider has functionality that spans both sides of the electronic discovery reference model (EDRM) (left side – identification, preservation, litigation hold, collection, early case assessment (ECA) and processing and right-side – processing, review, analysis and production). “While Gartner recognizes that not all enterprises — or even the majority — will want to perform legal-review work in-house, more and more are dictating what review tools will be used by their outside counsel or legal-service providers. As practitioners become more sophisticated, they are demanding that data change hands as little as possible, to reduce cost and risk. This is a continuation of a trend we saw developing last year, and it has grown again in importance, as evidenced both by inquiries from Gartner clients and reports from vendors about the priorities of current and prospective customers.”

We see this as consistent with the theme that providers with archiving solutions designed to automate data retention and destruction policies generally fared better than those without archiving technology. The rationale is that part of a good end-to-end eDiscovery strategy includes proactively deleting data organizations do not have a legal or business need to keep. This approach decreases the amount of downstream electronically stored information (ESI) organizations must review on a case-by-case basis so the cost savings can be significant.

Not surprisingly, whether or not a provider offers technology assisted review or predictive coding capabilities was another factor in evaluating each provider’s end-to-end functionality. The industry has witnessed a surge in predictive coding case law since 2012 and judicial interest has helped drive this momentum. However, a key driver for implementing predictive coding technology is the ability to reduce the amount of ESI attorneys need to review on a case-by-case basis. Given the fact that attorney review is the most expensive phase of the eDiscovery process, many organizations are complementing their proactive information reduction (archiving) strategy with a case-by-case information reduction plan that also includes predictive coding.

The second characteristic Gartner considered was that Leaders’ business models clearly demonstrate that their focus is software development and sales, as opposed to the provision of services. Gartner acknowledged that the eDiscovery services market is strong, but explains that the purpose of the Magic Quadrant is to evaluate software, not services. The justification is that “[c]orporate buyers and even law firms are trending towards taking as much e-Discovery process in house as they can, for risk management and cost control reasons. In addition, the vendor landscape for services in this area is consolidating. A strong software offering, which can be exploited for growth and especially profitability, is what Gartner looked for and evaluated.”

Third, Gartner believes the solution provider market is shrinking and that corporations are becoming more involved in buying decisions instead of deferring technology decisions to their outside law firms. Therefore, those in the Leaders category were expected to illustrate a good mix of corporate and law firm buying centers. The rationale behind this category is that law firms often help influence corporate buying decisions so both are important players in the buying cycle. However, Gartner also highlighted that vendors who get the majority of their revenues from the “legal solution provider channel” or directly from “law firms” may soon face problems.

The final characteristic Gartner considered for the Leaders quadrant is related to financial performance and growth. In measuring this component, Gartner explained that a number of factors were considered. Primary among them is whether the Leaders are keeping pace with or even exceeding overall market growth. (See “Forecast:  Enterprise E-Discovery Software, Worldwide, 2012 – 2017,” Tom Eid, December, 2012).

Companies landing in Gartner’s Magic Quadrant for eDiscovery Software have reason to celebrate their position in an increasingly competitive market. To review Gartner’s full report yourself, click here. In the meantime, please feel free to share your own comments below as the industry anxiously awaits next year’s Magic Quadrant Report.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Breaking News: Over $12 million in Attorney Fees Awarded in Patent Case Involving Predictive Coding

Thursday, February 14th, 2013

A federal judge for the Southern District of California rang in the month of February by ordering plaintiffs in a patent-related case to pay a whopping $12 million in attorney fees. The award included more than $2.8 million in “computer assisted” review fees and, to add insult to injury, the judge tacked on an additional $64,316.50 in Rule 11 sanctions against plaintiffs’ local counsel. Plaintiffs filed a notice of appeal on February 13th, but regardless of the final outcome, the case is chock-full of important lessons about patent litigation, eDiscovery and the use of predictive coding technology.

The Lawsuit

In Gabriel Technologies Corp. v. Qualcomm Inc., plaintiffs filed a lawsuit seeking over $1 billion in damages. Among their eleven causes of action were claims for patent infringement and misappropriation of trade secrets. The Court eventually dismissed or granted summary judgment in defendants’ favor as to all of plaintiffs’ claims, making defendants the prevailing party and prompting defendants’ subsequent request for attorneys’ fees.

In response to defendants’ motion for attorney fees, U. S. District Judge Anthony J. Battaglia relied on plaintiffs’ repeated email references to “the utter lack of a case” and their inability to identify the alleged patent inventors to support his finding that their claims were brought in “subjective bad faith” and were “objectively baseless.” Given these findings, Judge Battaglia determined that an award of attorney fees was warranted.

The Attorney Fees Award

The judge then turned to the issue of whether or not defendants’ fee request for $13,465,331.01 was reasonable. He began by considering how defendants itemized their fees, which were broken down as follows:

  • $10,244,053 for their outside counsel Cooley LLP (“Cooley”);
  • $391,928.91 for document review performed by Black Letter Discovery, Inc. (“Black Letter”); and
  • $2,829,349.10 for a document review algorithm generated by outside vendor H5.

The court also considered defendants’ request that plaintiffs’ local counsel be held jointly and severally liable for the entire fee award based on the premise that local counsel is required to certify that all pleadings are legally tenable and “well-grounded in fact” under Federal Rule of Civil Procedure 11.

Following a brief analysis, Judge Battaglia found the overall request “reasonable,” but reduced the fee award by $1 million. In lieu of holding local counsel jointly liable, the court chose to sanction local counsel in the amount of $64,316.50 (identical to the amount of local counsel’s fees) for failing to “undertake a reasonable investigation into the merits of the case.”
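The figures above can be checked with a few lines of arithmetic. The sketch below simply adds the three itemized amounts and subtracts the $1 million reduction as a lump sum; how the court allocated that reduction across the individual items is not stated here, so the lump-sum treatment is an assumption, as are the variable names.

```python
from decimal import Decimal

# Itemized fee request as listed above.
fees = {
    "Cooley (outside counsel)": Decimal("10244053.00"),
    "Black Letter (document review)": Decimal("391928.91"),
    "H5 (computer-assisted review)": Decimal("2829349.10"),
}

requested = sum(fees.values(), Decimal("0"))
reduction = Decimal("1000000.00")   # applied here as a single lump sum
awarded = requested - reduction
sanction = Decimal("64316.50")      # Rule 11 sanction, separate from the fee award

h5_share = fees["H5 (computer-assisted review)"] / requested

print(f"Requested: ${requested:,.2f}")
print(f"Awarded:   ${awarded:,.2f}")
print(f"H5 share of request: {h5_share:.1%}")
```

The itemized amounts sum to the $13,465,331.01 requested, and the reduced award of $12,465,331.01 is the source of the “over $12 million” figure in the headline. The computer-assisted review line alone accounts for roughly a fifth of the request.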

Three Lessons Learned

The case is important on many fronts. First, the decision makes clear that filing baseless patent claims can lead to financial consequences more severe than many lawyers might expect. If the decision is reviewed and upheld on appeal, counsel in the Ninth Circuit accustomed to fending off unsubstantiated patent or misappropriation claims will be armed with an important new tool to ward off would-be patent trolls.

Second, Judge Battaglia’s decision to order Rule 11 sanctions should serve as a wake-up call for local counsel. The ruling reinforces the fact that merely rubber-stamping filings and passively monitoring cases is a risky proposition. Gabriel Technologies illustrates the importance of properly monitoring lead counsel and the consequences of not complying with the mandate of Rule 11 whether serving as lead or local counsel.

The final lesson relates to curbing the costs of eDiscovery and the importance of understanding tools like predictive coding technology. The court left the barn door wide open for plaintiffs to attack defendants’ predictive coding and other fees as “unreasonable,” but plaintiffs didn’t bite. In evaluating H5’s costs, the court determined that Cooley’s review fees were reasonable because Cooley used H5’s “computer-assisted” review services to apparently cull down 12 million documents to a more reasonable number of documents prior to manual review. Although one would expect this approach to be less expensive than paying attorneys to review all 12 million documents, $2,829,349.10 is still an extremely high price to pay for technology that is expected to help cut traditional document review costs by as much as 90 percent.

Plaintiffs were well-positioned to argue that predictive coding technology should be far less expensive because the technology allows a fraction of documents to be reviewed at a fraction of the cost compared to traditional manual review. These savings are possible because a computer is used to evaluate how human reviewers categorize a small subset of documents in order to construct and apply an algorithm that ranks the remaining documents by degree of responsiveness automatically. There are many tools on the market that vary drastically in quality and price, but a price tag approaching $3 million is extravagant and should certainly raise a few eyebrows in today’s predictive coding market. Whether or not plaintiffs missed an opportunity to challenge the reasonableness of defendants’ document review approach may never be known. Stay tuned to see if these and other arguments surface on appeal.
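The general idea of learning from a small coded subset and ranking the rest can be sketched in a few lines. This is a deliberately simplified toy, not the method used by H5 or any commercial product: it assumes a basic word-weighting approach with add-one smoothing, and every name and sample document below is invented for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split a document into lowercase words."""
    return text.lower().split()


def train(seed_set: list[tuple[str, bool]]) -> dict[str, float]:
    """Learn a per-word weight from documents a human reviewer already coded."""
    responsive, other = Counter(), Counter()
    for text, is_responsive in seed_set:
        (responsive if is_responsive else other).update(tokenize(text))
    vocabulary = set(responsive) | set(other)
    total_r = sum(responsive.values()) + len(vocabulary)
    total_o = sum(other.values()) + len(vocabulary)
    # Add-one smoothing keeps words seen in only one class from producing log(0).
    return {
        word: math.log((responsive[word] + 1) / total_r)
        - math.log((other[word] + 1) / total_o)
        for word in vocabulary
    }


def rank(weights: dict[str, float], documents: list[str]) -> list[str]:
    """Order unreviewed documents from most to least likely responsive."""
    def score(text: str) -> float:
        return sum(weights.get(word, 0.0) for word in tokenize(text))
    return sorted(documents, key=score, reverse=True)


# Hypothetical seed set coded by a human reviewer (True = responsive).
seed = [
    ("patent license royalty draft", True),
    ("patent claim infringement analysis", True),
    ("lunch menu friday", False),
    ("holiday party schedule", False),
]
unreviewed = [
    "friday lunch schedule",
    "royalty terms for patent license",
    "quarterly parking notice",
]

ranked = rank(train(seed), unreviewed)
print(ranked)
```

In this toy example the royalty document rises to the top of the ranking and the lunch schedule falls to the bottom, so reviewers could start with the documents most likely to matter. Production tools are far more sophisticated, but the cost argument rests on the same principle: humans code a sample, and software prioritizes or culls the remainder.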

Q&A with Allison Walton of Symantec and Laura Zubulake, Author of Zubulake’s e-Discovery: The Untold Story of my Quest for Justice

Monday, February 4th, 2013

The following is my Q&A with Laura Zubulake, Author of Zubulake’s e-Discovery: The Untold Story of my Quest for Justice.

Q: Given your case began in 2003, and the state of information governance today, do you believe that adoption has been too slow? Do you think organizations in 2013, ten years later, have come far enough in managing their information?

A: From a technology standpoint, the advancements have been significant. The IT industry has come a long way with regard to the tools available to conduct eDiscovery. Alternatively, surveys indicate a significant percentage of organizations do not prioritize information management and have not established eDiscovery policies and procedures. This is disappointing. The fact that organizations apparently do not understand the value of proactively managing information only puts them at a competitive disadvantage and at increased risk.

 Q: Gartner predicts that the market will be $2.9 billion by 2017. Given this prediction, don’t you think eDiscovery is basically going to be absorbed as a business process and not something so distinct as to require outside 3rd party help? 

A: First, as a former financial executive those predictions, if realized, are reasonably attractive. Any business that can generate double-digit revenue growth until 2017, in this economy and interest rate environment, is worthy of note (assuming costs are controlled). Second, here I would like to distinguish between information governance and eDiscovery. I view eDiscovery as a subset of a broader information governance effort. My case while renowned for eDiscovery, at its essence, was about information. I insisted on searching for electronic documents because I understood the value and purpose of information. I could not make strategic decisions without, what I refer to as, “full” information. The Zubulake opinions were a result of my desire for information, not the other way around. I believe corporations will increasingly recognize the need to proactively manage information for business, cost, legal, and risk purposes. As such, I think information governance will become more of a business process, just like any management, operational, product, and finance process.

With regard to eDiscovery, I think there will continue to be a market for outside third-party assistance. eDiscovery requires specific skills and technologies. Companies lacking financial resources and expertise, and requiring assistance to address the volume of data will likely deem it economical to outsource eDiscovery efforts. As with any industry, eDiscovery will evolve.  The sector has grown quickly. There will be consolidation. Eventually, the fittest will survive.

Q: What do you think about the proposed changes to the FRCP regarding preservation? 

A: As a former plaintiff (non-attorney), eDiscovery was (to me) about preservation. Very simply, documents could not be collected, reviewed, and produced if they had not been preserved. Any effort to clarify preservation rules would benefit all parties; uncertainty created challenges. Of course, there needs to be a balance between overwhelming corporations with legal requirements and costs versus protecting a party’s rights to evidence. Apparently, the current proposals do not specifically pertain to preservation. They concern the scope of discovery and proportionality and thus indirectly address the issue of preservation. While this would be helpful, it is not ideal. Scope is, in part, a function of relevance, a frequently debated concept. What was relevant to me might not have been relevant to others. Regarding proportionality, my concern is perspective. Too often I find that discussions about proportionality stem from the defendant’s perspective. Rarely do I hear the viewpoint of the plaintiff represented. Although not all plaintiffs are individuals, often the plaintiff is the relatively under-resourced party. Deciding whether the burden of proposed discovery outweighs its likely benefits is not a science. As I wrote in my book:

Imagine if the Court were to have agreed with [the Defendant’s] argument and determined the burden of expense of the proposed discovery in my case outweighed its likely benefit. Not only would the Zubulake opinions not have come to fruition, but also I would have been denied my opportunity to prove my claims. 

Q: Lastly, what other trends are you seeing in the area of eDiscovery and what predictions do you have for the market in 2013?

A: eDiscovery Morphs. Organizations will realize that eDiscovery should be part of a broader information governance effort. Information governance will become a division within a corporation with separate accountable management from which operations, legal, IT, and HR professionals can source and utilize information to achieve goals. Financial markets will increasingly reward companies (with higher multiples) who proactively manage information.

Reorganization. Organizations will recognize that while information is their most valuable asset, it is fearless: it crosses functions, divisions, and borders, and does not care if it overwhelms an entity with volume, costs, and risks. Organizational structures will need to adapt and accommodate the ubiquitous nature of information. A systems thinking framework (understanding how processes influence one another within a whole) will increasingly replace a business silo structure. Information and communication, managed proactively and globally, will improve efficiency, enhance profitability, reduce costs, increase compliance, and mitigate risks.

Search. Algorithms will become an accepted search tool, although keyword, concept, cluster, and other searches will still play a role. For years, law enforcement, government, and Wall Street have used algorithms; the concept is not new and not without peril (significant market corrections were the result of algorithms gone wrong). Parties confronted with volumes of data and limited resources will have no choice but to agree to computer assistance. However, negative perceptions and concerns about algorithms will only change when there is a case where the parties initiate and voluntarily agree to their use.

Education. Within information governance efforts, organizations will increasingly establish training for employees. Employees need to be educated about the origination, maintenance, use, disposal, risks, rules, and regulations associated with ESI. A goal should be to lessen the growth of data and encourage smart and efficient communications. Education is a cost-control and risk-mitigating effort.

BYOD Reconsidered. Thinking a BYOD to work policy is cost-effective will be questioned and should be evaluated on a risk-adjusted basis. When companies analyze the costs (cash outlay) of providing employees with devices versus the unquantifiable costs associated with the lack of control, disorganization, and increased risks – it will become clear BYOD has the potential to be very expensive.

Government Focus. I had the privilege of addressing the Dept. of Justice’s Civil E-Discovery training program. It was evident to me that eDiscovery is one of the department’s focuses. With recent headlines concerning emails uncovering evidence (e.g. Fast and Furious), government entities (state and federal) will increasingly adopt rules, procedures, and training to address ESI. This brings me back to your first question—have organizations come far enough in managing their information? Government efforts to focus on eDiscovery will incentivize more corporations to (finally) address eDiscovery and information governance challenges.

Stay tuned for more breaking news coverage with industry luminaries.

For Westerners Seeking Discovery From China, Fortune Cookie Reads: Discovery is Uncertain, and Will Likely Be Hard

Monday, January 7th, 2013

In a recent Inside Counsel article, we explored the eDiscovery climate in China and some of the most important differences between the Chinese and U.S. legal systems. There is an increased interest in China and the legal considerations surrounding doing business with Chinese organizations, which we also covered on this Inside Counsel webcast.

 Five highlights from this series include:

1.  Conflicting Corporate Cultures- In general, business in China is done in a way that relies heavily on relationships. This can easily cause a conflict of interest for organizations and put them at risk for violations under the FCPA and UK Bribery Act. The concept that “relationships are gold,” or Guanxi, is crucial to conducting successful business in China. However, organizations must walk a fine line, which creates a need for strong local counsel and guidance. Moreover, Chinese businesses don’t share the same definitions the Western world does for concepts like information governance, legal hold, or privacy.

 2.   FCPA and the UK Bribery Act- Both of these regulations are very troublesome for those doing business in China, yet necessary for regulating white-collar crime. In order to do business in China one must walk a fine line developing close relationships, without going too far and participating in bribery or other illegal acts. There are increased levels of prosecution under both of these statutes as businesses globalize.

3.  Drastically Different Legal Systems- The Chinese legal system is very different than those of common law jurisdictions. China’s legal system is based on civil law and there is no requirement for formal pre-litigation discovery. For this reason, litigants may find it very difficult to successfully procure discovery from Chinese parties. Chinese companies have been historically slow to cooperate with U.S. regulatory bodies and many discovery requests in civil litigation can take up to a year for a response. A copy of our eDiscovery passport on China can be found here, along with other important countries.

4.  State Secrets- In addition to the differences between common and civil law jurisdictions, China has strict laws protecting state secrets. Anything deemed a state secret would not be discoverable, and an attempt to remove state secrets from China could result in criminal prosecution. The definition of a state secret under People’s Republic of China law includes a wide range of information and is more ambiguous than Western definitions about national security (for example, the Chinese definitions are less defined than those in the U.S. Patriot Act). Politically sensitive data is susceptible to the government’s scrutiny and protection, regardless of whether it is possessed by PRC citizens or officials working for foreign corporations- there is no distinction or exception for civil discovery.

5.  Globalization- Finally, it is no secret that the world has become one huge marketplace. The rapid proliferation of information creation as well as the clashing of disparate legal systems creates real discovery challenges. However, there are also abundant opportunities for lawyers that become specialized in the Asia Pacific region today. Lawyers that are particularly adept in eDiscovery and Asia will flourish for years to come.

For more, read here…

Predictive Coding 101 & the Litigator’s Toolbelt

Wednesday, December 5th, 2012

Query your average litigation attorney about the difference between predictive coding technology and other more traditional litigation tools and you are likely to receive a wide range of responses. The fact that “predictive coding” goes by many names, including “computer-assisted review” (CAR) and “technology-assisted review” (TAR) illustrates a fundamental problem: what is predictive coding and how is it different from other tools in the litigator’s technology toolbelt™?

 Predictive coding is a type of machine-learning technology that enables a computer to “predict” how documents should be classified by relying on input (or “training”) from human reviewers. The technology is exciting for organizations attempting to manage skyrocketing eDiscovery costs because the ability to expedite the document review process and find key documents faster has the potential to save organizations thousands of hours of time. In a profession where the cost of reviewing a single gigabyte of data has been estimated to be around $18,000, narrowing days, weeks, or even months of tedious document review into more reasonable time frames means massive savings for thousands of organizations struggling to keep litigation expenditures in check.

 Unfortunately, widespread adoption of predictive coding technology has been relatively slow due to confusion about how predictive coding differs from other types of CAR or TAR tools that have been available for years. Predictive coding, unlike other tools that automatically extract patterns and identify relationships between documents with minimal human intervention, requires a deeper level of human interaction. That interaction involves significant reliance on humans to train and fine-tune the system through an iterative, hands-on process. Some common TAR tools used in eDiscovery that do not include this same level of interaction are described below:

  •  Keyword search: Involves inputting a word or words into a computer which then retrieves documents within the collection containing the same words. Also known as Boolean searching, keyword search tools typically include enhanced capabilities to identify word combinations and derivatives of root words among other things.
  •  Concept search: Involves the use of linguistic and statistical algorithms to determine whether a document is responsive to a particular search query. This technology typically analyzes variables such as the proximity and frequency of words as they appear in relationship to a keyword search. The technology can retrieve more documents than keyword searches because conceptually related documents are identified, whether or not those documents contain the original keyword search terms.
  •  Discussion threading: Utilizes algorithms to dynamically link together related documents (most commonly e-mail messages) into chronological threads that reveal entire discussions. This simplifies the process of identifying participants to a conversation and understanding the substance of the conversation.
  •  Clustering: Involves the use of algorithms to automatically organize a large collection of documents into different topical categories based on similarities between documents. Reviewing documents organized categorically can help increase the speed and efficiency of document review. 
  •  Find similar: Enables the automated retrieval of other documents related to a particular document of interest. Reviewing similar documents together accelerates the review process, provides full context for the document under review, and ensures greater coding consistency.
  •  Near-duplicate identification: Allows reviewers to easily identify, view, and code near-duplicate e-mails, attachments, and loose files. Some systems can highlight differences between near-duplicate documents to help simplify document review.
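
To make two of the tools above concrete, here is a minimal sketch of a Boolean keyword search and a near-duplicate check. It is an illustration only, not how any particular eDiscovery product works: the sample documents, the three-word "shingle" size, and the 0.7 similarity cut-off are all assumptions chosen for the example, and the Jaccard similarity measure is one common way to compare documents rather than the method any vendor is known to use.

```python
import re
from itertools import combinations

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def keyword_search(docs, required_terms):
    """Return ids of documents that contain every required term (Boolean AND)."""
    terms = {t.lower() for t in required_terms}
    return [doc_id for doc_id, text in docs.items()
            if terms <= set(tokenize(text))]

def shingles(text, size=3):
    """Build the set of overlapping word sequences used to compare documents."""
    words = tokenize(text)
    return {tuple(words[i:i + size])
            for i in range(max(len(words) - size + 1, 1))}

def jaccard(text_a, text_b):
    """Share of shingles two documents have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a), shingles(text_b)
    return len(a & b) / len(a | b)

def near_duplicates(docs, threshold=0.7):
    """Return pairs of document ids whose similarity meets the threshold."""
    return [(x, y) for x, y in combinations(sorted(docs), 2)
            if jaccard(docs[x], docs[y]) >= threshold]

# Hypothetical sample collection (not taken from any real matter).
docs = {
    "A": "Please send the revised pricing contract to legal today",
    "B": "Please send the revised pricing contract to legal tomorrow",
    "C": "Lunch on Friday?",
}

hits = keyword_search(docs, ["pricing", "contract"])
pairs = near_duplicates(docs, threshold=0.7)
```

In this toy collection, the keyword search retrieves documents A and B, and the near-duplicate check pairs A with B because they differ by a single word. Neither tool learns from reviewer decisions, which is the distinction drawn above.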

Unlike the TAR tools listed above, predictive coding technology relies on humans to review a small fraction of the overall document population, which ultimately results in a fraction of the review costs. The process entails feeding decisions about how to classify a small number of case documents called a training set into a computer system. The computer then relies on the human training decisions to generate a model that is used to predict how the remaining documents should be classified. The information generated by the model can be used to rank, analyze, and review the documents quickly and efficiently. Although documents can be coded with multiple designations that relate to various issues in the case during eDiscovery, many times predictive coding technology is simply used to segregate responsive and privileged documents from non-responsive documents in order to expedite and simplify the document review process.
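
The train-then-predict process described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method used by any commercial predictive coding tool (vendors rarely disclose their algorithms): it swaps in a basic naive Bayes word-frequency classifier, and the training documents, labels, and function names are hypothetical.

```python
import math
import re
from collections import Counter

LABELS = ("responsive", "non-responsive")

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(training_set):
    """Build a word-frequency model from (text, label) pairs coded by reviewers.

    Assumes the training set holds at least one document for each label.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in training_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set().union(*word_counts.values())
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}

def prediction_score(model, text):
    """Estimate the probability (0.0 to 1.0) that a document is responsive."""
    total_docs = sum(model["doc_counts"].values())
    log_probs = {}
    for label in LABELS:
        counts = model["word_counts"][label]
        denominator = sum(counts.values()) + len(model["vocab"])
        log_p = math.log(model["doc_counts"][label] / total_docs)
        for word in tokenize(text):
            if word in model["vocab"]:  # words never seen in training are ignored
                log_p += math.log((counts[word] + 1) / denominator)
        log_probs[label] = log_p
    # Convert the two log scores into a normalized probability.
    peak = max(log_probs.values())
    weights = {label: math.exp(lp - peak) for label, lp in log_probs.items()}
    return weights["responsive"] / sum(weights.values())

def rank(model, docs):
    """Rank unreviewed documents from most to least likely responsive."""
    scored = [(doc_id, prediction_score(model, text)) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical training set: decisions a human reviewer fed into the system.
training_set = [
    ("pricing agreement with our competitor on containerboard", "responsive"),
    ("competitor pricing call scheduled for monday", "responsive"),
    ("team lunch on friday at noon", "non-responsive"),
    ("parking garage closed on monday", "non-responsive"),
]
model = train(training_set)

unreviewed = {"doc-1": "competitor pricing memo", "doc-2": "lunch on friday"}
ranked = rank(model, unreviewed)
```

With this toy training set, the model ranks doc-1 above doc-2 because its words appeared more often in documents the reviewer marked responsive. A real matter would involve far larger training sets and statistical validation of the results.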

 Training the predictive coding system is an iterative process that requires attorneys and their legal teams to evaluate the accuracy of the computer’s document prediction scores at each stage. A prediction score is simply a percentage value assigned to each document that is used to rank all the documents by degree of responsiveness. If the accuracy of the computer-generated predictions is insufficient, additional training documents can be selected and reviewed to help improve the system’s performance. Multiple training sets are commonly reviewed and coded until the desired performance levels are achieved. Once the desired performance levels are achieved, informed decisions can be made about which documents to produce.
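
One way to picture the evaluation step is to compare the computer's prediction scores against a sample of documents that humans have already coded. The sketch below assumes precision and recall as the performance measures, a 0.70 score cut-off, and a 75 percent recall target; all three are illustrative assumptions rather than requirements drawn from case law or from any specific tool, and the sample data is made up.

```python
def evaluate(sample, cutoff):
    """Compare prediction scores with human coding on a reviewed sample.

    sample: list of (prediction_score, human_marked_responsive) pairs.
    Returns (precision, recall) for documents scored at or above the cutoff.
    """
    true_pos = sum(1 for score, responsive in sample if score >= cutoff and responsive)
    false_pos = sum(1 for score, responsive in sample if score >= cutoff and not responsive)
    false_neg = sum(1 for score, responsive in sample if score < cutoff and responsive)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall

def needs_more_training(sample, cutoff, recall_target):
    """Signal another training round when recall falls short of the target."""
    _, recall = evaluate(sample, cutoff)
    return recall < recall_target

# Hypothetical sample: scores from the system paired with reviewer decisions.
sample = [
    (0.95, True), (0.85, True), (0.75, False), (0.65, True),
    (0.40, False), (0.20, False), (0.10, True), (0.05, False),
]

precision, recall = evaluate(sample, cutoff=0.70)
another_round = needs_more_training(sample, cutoff=0.70, recall_target=0.75)
```

In this sample, two of the three documents scored at or above the cut-off are truly responsive, but the system finds only two of the four responsive documents overall, so the sketch signals that another training set should be reviewed.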

For example, if the legal team’s analysis of the computer’s predictions reveals that within a population of 1 million documents, only the 300,000 documents with prediction scores in the 70 percent range and higher appear to be responsive, the team may elect to produce only those 300,000 documents to the requesting party. The financial consequences of this approach are significant because a majority of the documents (here, the remaining 700,000) can be excluded from expensive manual review by humans. The simple rule of thumb in eDiscovery is that the fewer documents requiring human review, the more money saved since document review is typically the most expensive facet of eDiscovery.
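
The arithmetic behind this example can be worked through as follows. The document counts come from the example above; the review cost of $1.00 per document is purely a hypothetical figure chosen to keep the math simple, not an industry benchmark.

```python
def review_savings(total_docs, docs_at_or_above_cutoff, cost_per_doc):
    """Estimate how many documents, and how much cost, a score cut-off removes
    from manual review. cost_per_doc is a hypothetical input, not a benchmark."""
    excluded = total_docs - docs_at_or_above_cutoff
    return {
        "excluded_docs": excluded,
        "excluded_share": excluded / total_docs,
        "avoided_cost": excluded * cost_per_doc,
    }

# 1 million documents, 300,000 of which score in the 70 percent range or higher.
result = review_savings(
    total_docs=1_000_000,
    docs_at_or_above_cutoff=300_000,
    cost_per_doc=1.00,  # assumed for illustration only
)
```

Under these assumptions, 700,000 documents (70 percent of the population) fall below the cut-off, and at the assumed rate that avoids $700,000 in manual review.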

Hype and confusion surrounding the promise of predictive coding technology have led some to believe that the technology renders other TAR tools obsolete. To the contrary, predictive coding technology should be viewed as one of many different types of tools in the litigator’s technology toolbelt™ that often can and should be used together. Choosing which of these tools to use and how to use them depends on the case and requires balancing factors such as discovery deadlines, cost, and complexity. Many believe the choice about which tools should be used for a particular matter, however, should be left to the producing party as long as the tools are used properly and in a manner that is “just” for both parties as mandated by Rule 1 of the Federal Rules of Civil Procedure.

The notion that parties should be able to choose which tools they use during discovery recently garnered support in the 7th Federal Circuit. In Kleen Products, LLC, et al. v. Packaging Corporation of America, et al., Judge Nolan was faced with exploring plaintiffs’ claim that the defendants should be required to supplement their use of keyword searching tools with more advanced tools in order to better comply with their duty to produce documents. Plaintiffs’ argument hinged largely on the assumption that using more advanced tools would result in a more thorough document production. In response to this argument, Judge Nolan referenced the Sedona Best Practices Recommendations & Principles for Addressing Electronic Document Production during a hearing between the parties to suggest that the carpenter (end user) is best equipped to select the appropriate tool during discovery. Sedona Principle 6 states that:

“[r]esponding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.”

Even though the parties in Kleen Products ultimately postponed further discussion about whether tools like predictive coding technology should be used when possible during discovery, the issue remains important because it is likely to resurface again and again as predictive coding momentum continues to grow. Some will argue that parties who fail to leverage modern technology tools like predictive coding are attempting to game the legal system to avoid thorough document productions.  In some instances, that argument could be valid, but it should not be a foregone conclusion.

Although there will likely come a day when predictive coding technology is the status quo for managing large-scale document review, that day has not yet arrived. Predictive coding technology is a type of machine learning technology that has been used in other disciplines for decades. However, predictive coding tools are still very new to the field of law. As a result, most predictive coding tools lack transparency because they provide little if any information about the underlying statistical methodologies they apply. The issue is important because the misapplication of statistics could have a dramatic effect on the thoroughness of document productions. Unfortunately, these nuanced issues are sometimes misunderstood or overlooked by predictive coding proponents, a problem that could ultimately result in unfairness to requesting parties and stall broader adoption of otherwise promising technology.

Further complicating matters is the fact that several solution providers have introduced new predictive coding tools in recent months to try and capture market share. In the long term, competition is good for consumers and the industry as a whole. In the short term, however, most of these tools are largely untested and vary in quality and ease of use, thereby adding more confusion to would-be consumers. The unfortunate end result is that many lawyers are shying away from using predictive coding technology until the pros and cons of various technology solutions and their providers are better understood.  Market confusion is often one of the biggest stumbling blocks to faster adoption of technology that could save organizations millions and the current predictive coding landscape is a testament to this fact.

Eliminating much of the current confusion through education is the precise goal of Symantec’s Predictive Coding for Dummies book. The book addresses everything from predictive coding case law and defensible workflows to key factors that should be considered when evaluating different predictive coding tools. The book strives to provide attorneys and legal staff accustomed to using traditional TAR tools like keyword searching with a baseline understanding of a new technological approach that many find confusing. We believe providing the industry with this basic level of understanding will help ensure that predictive coding technology and related best practices standards will evolve in a manner that is fair to both parties, ultimately expediting rather than slowing broader adoption of this promising new technology. To learn more, download a free copy of Predictive Coding for Dummies and feel free to share your feedback and comments below.

5 questions with Ralph Losey about the New Electronic Discovery Best Practices (EDBP) Model for Attorneys

Tuesday, November 6th, 2012

The eDiscovery world is atwitter with two new developments – one is Judge Laster’s opinion in the EORHB case where he required both parties to use predictive coding. The other is the new EDBP model, created by Ralph Losey (and team) to “provide a model of best practices for use by law firms and corporate law departments.” Ralph was kind enough to answer a few questions for eDiscovery 2.0:

1. While perhaps not fair, I’ve already heard the EDBP referred to as the “new EDRM.” If busy folks could only read one paragraph on the distinction, could you set them straight?

“EDRM, the Electronic Discovery Reference Model, covers the whole gamut of an e-discovery project. The model provides a well-established, nine-step workflow that helps beginners understand e-discovery. EDBP, Electronic Discovery Best Practices, is focused solely on the activities of lawyers. The EDBP identifies a ten-step workflow of the rendition of legal services in e-discovery. Moreover, EDBP.com attempts to capture and record what lawyers specializing in the field now consider the best practices for each of these activities.”

“By the way, although I have a copyright on these diagrams, anyone may freely use the diagrams. We encourage that. We are also open to suggestions for best practices from any practicing lawyer. We anticipate that this will be a constantly evolving model and collection of best practices.”

2. Given the lawyer-centric focus, what void are you attempting to fill with the EDBP?

“I was convinced by my friend Jason Baron of the need for standards in the world of e-discovery. It is too much of a wild west out there now, and we need guidance. But as a private lawyer I am also cognizant of the dangers of creating minimum standards for lawyers that could be used as a basis for malpractice suits. It is not an appropriate thing for any private group to do. It is a judicial matter that will arise out of case law and competition. So after a lot of thought we realized that minimum standards should only be articulated for the non-legal-practice part of e-discovery, in other words, standards should be created for vendors only and their non-legal activities. The focus for lawyers should be on establishing best practices, not minimum standards. I created this graphic using the analogy of a full tank of gas to visualize this point and explained it in my blog Does Your CAR (“Computer Assisted Review”) Have a Full Tank of Gas?”


“This continuum of competence applies not only to the legal service of Computer Assisted Review (CAR), aka Technology Assisted Review (TAR), but to all legal services. The goal of EDBP is to help lawyers avoid negligence by staying far away from minimum standards and focus instead on the ideals, the best practices.”


3. The EDBP has ten steps. While assuredly unfair, what step contains the most controversy/novelty compared to business as usual in the current e-Discovery world?

“None really. That’s the beauty of it. The EDBP just documents what attorneys already do. The only thing controversial about it, if you want to call it that, is that it established another frame of reference for e-discovery in addition to the EDRM. It does not replace EDRM. It supplements it. Most lawyers specializing in the field will get EDBP right away.”


“I suppose you could say giving Cooperation its very own key place in a lawyer’s work flow might be somewhat controversial, but there is no denying that the rules, and best practices, require lawyers to talk to each other and at least try to cooperate. Failing that, all the judges and experts I have heard suggest that you should initiate early motion practice and not wait until the end. There seems to be widespread consensus in the e-discovery community on the key role of cooperative dialogues with opposing counsel and the court, so I do not think it is really controversial, but may still be news to the larger legal community. In fact, all of these best practices may not be well-known to the average Joe Litigator, which just shows the strong need for an educational resource like EDBP.”

4. Why not use “information governance” instead of “litigation readiness” on the far left hand side of the EDBP?

“There is far more to getting a client ready for litigation than helping them with their information governance. Plus, remember, this is not a workflow for vendors or management or records managers. It is not a model for an entire e-discovery team. This is a workflow only for what lawyers do.”

5. Given your recent, polarizing article urging law firms to get out of the eDiscovery business, how does the EDBP model either help or hinder that exhortation?

“This article was part of my attempt to clarify the line between legal e-discovery services and non-legal e-discovery services. EDBP is a part of that effort because it is only concerned with the law. It does not include non-legal services. As a practicing lawyer my core competency is legal advice, not processing ESI and software. Many lawyers agree with me on this, so I don’t think my article was polarizing so much as it is exposing, kind of like the young kid who pointed out that the emperor had no clothes.

The professionals in law firm lit support departments will eventually calm down when they realize no jobs are lost in this kind of outsourcing, and it all stays in the country. The work just moves from law firms, which also do some e-discovery, to businesses, most of which only do e-discovery. I predict that when this kind of outsourcing catches on, it will be common for the vendor with the outsourcing contract to hire as many of the law firm’s lit-support professionals as possible.

My Emperor’s no-clothes expose applies to the vendor side of the equation too. Vendors, like law firms, should stick to their core competence and stay away from providing legal advice. UPL is a serious matter. In most states it is a crime. Many vendors may well be competent to provide legal services, but they do not have a license to do so, not to mention their lack of malpractice insurance.

I am trying to help the justice system by clarifying and illuminating the line between law and business. It has become way too blurred to the detriment of both. Much of this fault lies on the lawyer-side as many seem quite content to unethically delegate their legal duties to non-lawyers, rather than learn this new area of law. I am all for the team approach. I have been advocating it for years in e-DiscoveryTeam.com. But each member of the team should know their strengths and limitations and act accordingly. We all have different positions to play on the team. We cannot all be quarterbacks.”

6. [Bonus Question] “EDBP” doesn’t just roll off the tongue. Given your prolific creativity (I seem to recall hamsters on a trapeze at one point in time), did you spend any cycles on a more mellifluous name for the new model?

“There are not many four-letter dot-com domain names out there for purchase, and none for free, and I did not want to settle for dot-net like EDRM did. I am proud, and a tad poorer, to have purchased what I think is a very good four-letter domain name, EDBP.com. After a few years EDBP will flow off your tongue too. After all, it has an internal rhyme – ED BP. Just add a slight pause to the name, ED … BP, and it flows pretty well, thank you.”

Thanks Ralph.  We look forward to seeing how this new model gains traction. Best of luck.

New Gartner Report Spotlights Significance of Email Archiving for Defensible Deletion

Thursday, November 1st, 2012

Gartner recently released a report that spotlights the importance of using email archiving as part of an organization’s defensible deletion strategy. The report – Best Practices for Using Email Archiving to Eliminate PST and Mailbox Quota Headaches (Alan Dayley, September 21, 2012) – specifically focuses on the information retention and eDiscovery challenges associated with email storage on Microsoft Exchange and how email archiving software can help address these issues. As Gartner makes clear in its report, an archiving solution can provide genuine opportunities to reduce the costs and risks of email hoarding.

The Problem: PST Files

The primary challenge that many organizations are experiencing with Microsoft Exchange email is the unchecked growth of messages stored in personal storage table (PST) files. Used to bypass storage quotas on Exchange, PST files are problematic because they increase the costs and risks of eDiscovery while circumventing information retention policies.

That the unrestrained growth of PST files could create problems downstream for organizations should come as no surprise. Various court decisions have addressed this issue, with the DuPont v. Kolon Industries litigation foremost among them. In the DuPont case, a $919 million verdict and 20-year product injunction largely stemmed from the defendant’s inability to prevent the destruction of thousands of pages of email formerly stored in PST files. That spoliation resulted in a negative inference instruction to the jury and the ensuing verdict against the defendant.

The Solution: Eradicate PSTs with the Help of Archiving Software and Retention Policies

To address the PST problem, Gartner suggests following a three-step process to help manage and then eradicate PSTs from the organization. This includes educating end users regarding both the perils of PSTs and the ease of access to email through archiving software. It also involves disabling the creation of new PSTs, a process that should ultimately culminate with the elimination of existing PSTs.

In connection with this process, Gartner suggests deployment of archiving software with a “PST management tool” to facilitate the eradication process. With the assistance of the archiving tool, existing PSTs can be discovered and migrated into the archive’s central data repository. Once there, email retention policies can begin to expire stale, useless and even harmful messages that were formerly outside the company’s information retention framework.
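
The discover-then-expire idea can be illustrated with a short sketch. It does not use or imitate the interface of any archiving product; it simply walks a folder tree looking for .pst files and then applies a retention period to a set of message dates. The function names, the three-year retention period, the sample messages, and the legal hold exclusion are all assumptions made for illustration.

```python
from datetime import date, timedelta
from pathlib import Path

def find_pst_files(root):
    """Return the .pst files found anywhere under root (case-insensitive)."""
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.suffix.lower() == ".pst")

def expired_messages(messages, retention_days, today, on_hold=frozenset()):
    """Return ids of messages older than the retention period.

    messages: dict mapping a message id to the date it was received.
    on_hold: ids subject to a legal hold, which are never expired.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(message_id for message_id, received in messages.items()
                  if received < cutoff and message_id not in on_hold)

# Hypothetical messages migrated from PST files into a central repository.
messages = {
    "m1": date(2009, 1, 15),
    "m2": date(2012, 6, 1),
    "m3": date(2008, 3, 2),
}

to_expire = expired_messages(
    messages,
    retention_days=3 * 365,   # assumed three-year retention period
    today=date(2012, 11, 1),
    on_hold={"m3"},           # assumed to be under legal hold
)
```

Here only m1 is eligible for deletion: m2 is still within the retention period, and m3 is old enough to expire but is preserved because it is on legal hold. Excluding held messages is what keeps a deletion policy of this kind defensible.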

With respect to the development of retention policies, organizations should consider engaging in a cooperative internal process involving IT, compliance, legal and business units. These key stakeholders must be engaged and collaborate if workable policies are to be created. The actual retention periods should take into account the types of email generated and received by an organization, along with the enterprise’s business, industry and litigation profile.

To ensure successful implementation of such retention policies and also address the problem of PSTs, an organization should explore whether an on premise or cloud archiving solution is a better fit for its environment. While each method has its advantages, Gartner advises organizations to consider whether certain key features are included with a particular offering:

Email classification. The archiving tool should allow your organization to classify and tag the emails in accordance with your retention policy definitions, including user-selected, user/group, or key-word tagging.

User access to archived email. The tool must also give end users appropriate and user-friendly access to their archived email, thus eliminating concerns over their inability to manage their email storage with PSTs.

Legal and information discovery capabilities. The search, indexing, and e-discovery capabilities of the archiving tool should also match your needs or enable integration into corporate e-discovery systems.

While perhaps not a panacea for the storage and eDiscovery problems associated with email, on premise or cloud archiving software should provide various benefits to organizations. Indeed, such technologies have the potential to help organizations store, manage and discover their email efficiently, cost effectively and in a defensible manner. Where properly deployed and fully implemented, organizations should be able to reduce the nettlesome costs and risks connected with email.

Judicial Activism Taken to New Heights in Latest EORHB (Hooters) Predictive Coding Case

Monday, October 29th, 2012

Ralph Losey, an attorney for Jackson Lewis, reported last week that a Delaware judge took matters into his own hands by proactively requiring both parties to show cause as to why they should not use predictive coding technology to manage electronic discovery. Predictive coding advocates around the globe will eagerly trumpet Judge Laster’s move as another judicial stamp of approval for predictive coding much the same way proponents lauded Judge Peck’s order in Da Silva Moore, et al. v. Publicis Groupe, et al. In Da Silva Moore, Judge Peck stated that computer-assisted review is “acceptable in appropriate cases.” In stark contrast to Da Silva Moore, the parties in EORHB, Inc., et al. v. HOA Holdings, LLC, not only never agreed to use predictive coding technology, there is no indication they ever initiated the discussion with one another, let alone with Judge Laster. In addition to attempting to dictate the technology tool to be used, Judge Laster also directed the parties to use the same vendor. Apparently, Judge Laster not only has the looks of Agent 007, he shares James Bond’s bold demeanor as well.

Although many proponents of predictive coding technology will see Judge Laster’s approach as an important step forward toward broader acceptance of predictive coding technology, the directive may sound alarm bells for others. The approach contradicts the apparent judicial philosophy applied in Kleen Products, LLC, et. al. v. Packaging Corporation of America, et. al. — a 7th Circuit case also addressing the use of predictive coding technology. During one of many hearings between the parties in Kleen, Judge Nan Nolan stated that “the defendant under Sedona 6 has the right to pick the [eDiscovery] method.”  Judge Nolan’s statement is a nod to Principle 6 of the Sedona Best Practices Recommendations & Principles for Addressing Electronic Document Production which states:

“[r]esponding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.”

Many attorneys shudder at the notion that the judiciary should choose (or at least strongly urge) the specific technology tools parties must use during discovery. The concern is based largely on the belief that many judges lack familiarity with the wide range of eDiscovery technology tools that exist today. For example, keyword search, concept search, and email threading represent only a few of the many technology tools in the litigator’s tool belt that can be used in conjunction with predictive coding tools to accelerate document review and analysis. The current challenge is that predictive coding technology is relatively new to the legal industry and much more complex than some of the older tools in the litigator’s tool belt. Not surprisingly, this complexity combined with an onslaught of new entrants to the predictive coding market has generated a lot of confusion about how to use predictive coding tools properly.

Current market confusion is precisely what Judge Laster and the parties in EORHB must overcome in order to successfully advance the adoption of predictive coding tools within the legal community. Key to the success of this mission is the recognition that predictive coding pitfalls are not always easy to identify, let alone avoid. However, if these pitfalls are properly identified and navigated, then Judge Laster’s mission may be possible.

Identifying pitfalls is challenging because industry momentum has led many to erroneously assume that all predictive coding tools work the same way. The momentum has been driven by the potential for organizations to save millions in document review costs with predictive coding technology. As a result, vendors are racing to market at breakneck speed to offer their own brand of predictive coding technology. Those without their own solutions are rapidly forming partnerships with those who have offerings so they too can capitalize on the predictive coding financial bonanza that many believe is around the corner. This rush to market has left the legal and academic communities with little time to build consensus about the best way to vet a wide range of new technology offerings. More specifically, the predictive coding craze has fostered an environment in which individual predictive coding tools often receive too little scrutiny.

The harsh reality is that not all predictive coding tools are created equal. For example, some providers erroneously call their solution “predictive coding technology” when the solution they offer is merely a type of clustering and/or concept searching technology that has been commonly used for over a decade. Even among predictive coding tools that are perceived as legitimate, pricing varies so widely that using some tools may not even be economically feasible considering the value of the case at hand. Some solution providers charge a premium to use their predictive coding tools and require additional expenditures in the form of consulting fees, while other tools are integrated within easy-to-use eDiscovery platforms at no additional cost.

If the court and parties decide that using predictive coding technology in EORHB makes economic sense, they must understand the importance of statistics and transparency to ensure a level playing field. The widespread belief that all predictive coding technologies surpass the accuracy of human review is a pervasive misperception that continues to drive confusion in the industry. The assumption is false not only because these tools must be used correctly to yield reliable results, but also because the underlying statistical methodology applied by the tools must be sound for the tools to work properly and exceed the accuracy of human review. (See Predictive Coding for Dummies for a more comprehensive explanation of predictive coding and statistics.)
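To make the role of statistics concrete, here is a minimal sketch of how one might measure a tool's accuracy on a random validation sample that human reviewers have also coded. It is an illustration under stated assumptions, not a description of how any particular predictive coding product works: the function names, the input format, and the choice of a Wilson score interval at roughly 95% confidence are all hypothetical choices made for this example.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if n == 0:
        raise ValueError("sample must contain at least one document")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


def review_metrics(sample):
    """Estimate recall and precision from a coded validation sample.

    `sample` is a hypothetical input format: a list of
    (tool_says_relevant, reviewer_says_relevant) boolean pairs.
    """
    tp = sum(1 for tool, human in sample if tool and human)
    fp = sum(1 for tool, human in sample if tool and not human)
    fn = sum(1 for tool, human in sample if not tool and human)
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("sample has no relevant or no predicted-relevant documents")
    return {
        "recall": tp / (tp + fn),
        "recall_interval": wilson_interval(tp, tp + fn),
        "precision": tp / (tp + fp),
        "precision_interval": wilson_interval(tp, tp + fp),
    }


if __name__ == "__main__":
    # Made-up sample: 80 hits, 20 misses, 10 false alarms, 890 correct rejections.
    sample = ([(True, True)] * 80 + [(False, True)] * 20
              + [(True, False)] * 10 + [(False, False)] * 890)
    print(review_metrics(sample))
```

The point of reporting an interval rather than a single number is that a recall figure drawn from a small sample can look impressive while still being consistent with much weaker performance, which is one reason the sampling method and sample size matter when parties compare a tool against human review.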

The underlying statistical methodology used by most tools today is rarely disclosed, which should automatically raise red flags for Judge Laster. In fact, this lack of transparency has led many to characterize most predictive coding tools as “black box” technologies, meaning that inadequate information about how the tools apply statistics makes it difficult to trust the results. There are differing schools of thought about the proper application of statistics in predictive coding that have largely been ignored to date. Hopefully Judge Laster and the parties will use the present case as an opportunity to clarify some of this confusion so that the adoption of predictive coding technology within the legal community is accelerated in a way that involves sufficient scrutiny of the processes and tools used.

Judge Laster and the parties in EORHB are presented with a unique opportunity to address many important issues related to the use of predictive coding technology that are often misunderstood and overlooked. Hopefully the parties will use predictive coding technology and engage in a dialogue that highlights the importance of selecting the right predictive coding tool, using that tool correctly, and applying statistics properly. If the court and the parties shed light on these three areas, Judge Laster’s predictive coding mission may be possible.