Posts Tagged ‘Sedona Conference’

In Re: Biomet Order Addresses Hot Button Predictive Coding Issue

Friday, December 20th, 2013

United States District Court Judge for the Northern District of Indiana, Robert L. Miller, Jr., recently addressed what has arguably become the hottest predictive coding issue since Judge Andrew J. Peck’s February 2012 order in Da Silva Moore v. Publicis Groupe. The issue is whether parties who use predictive coding technology to assist with document productions should disclose to the other side the non-responsive documents used to train their system.

Judge Peck Opens the Predictive Coding Door

In Da Silva Moore, Judge Peck became the first judge to state that the use of predictive coding technology is “acceptable in appropriate cases.” Since the decision, some litigation attorneys have criticized the predictive coding protocol the parties established. Central to that criticism is the inclusion of a provision requiring the voluntary disclosure of non-privileged documents used to train the predictive coding system.

Fearing a judicial trend, many attorneys have argued that the Federal Rules of Civil Procedure (Rules) simply do not require the disclosure of non-responsive documents under any circumstances. Others argue that a little cooperation and transparency between adversaries isn’t a bad thing when one party saves money and time and the other receives a more thorough production. Not surprisingly, both sides have eagerly awaited judicial guidance.

Judge Miller Tackles the Hot Button Issue

In In Re: Biomet, Judge Miller provided that long-awaited guidance by holding that Rule 26 does not require a party to disclose seed set documents used to train a predictive coding system. The order came on the heels of an earlier April 2013 order denying plaintiffs’ motion to compel Biomet to re-do earlier document productions (unless plaintiffs paid). The plaintiffs argued that Biomet’s decision to use key word search terms and de-duplication techniques to cull 19.5 million documents down to 2.5 million before using predictive coding technology “tainted” the production process. More specifically, plaintiffs contended that using keywords to filter out documents likely excluded responsive documents that should have been produced. Judge Miller found plaintiffs’ arguments unconvincing, largely due to the fact that Biomet had already spent approximately $1.07 million on eDiscovery.
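To make the plaintiffs' "tainting" argument concrete, here is a minimal sketch of a keyword-and-de-duplication cull. It is not Biomet's actual process: the documents, the keyword list and the cull function are all hypothetical, and exact-match hashing stands in for whatever de-duplication technique was really used.

```python
import hashlib

def cull(documents, keywords):
    """Drop exact duplicates, then split the rest by keyword hit.

    documents: list of (doc_id, text) pairs (hypothetical sample data).
    Returns (kept, dropped) lists of document ids.
    """
    seen = set()
    kept, dropped = [], []
    for doc_id, text in documents:
        # Hash the normalized text so identical copies are reviewed only once.
        digest = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document
        seen.add(digest)
        lowered = text.lower()
        if any(keyword in lowered for keyword in keywords):
            kept.append(doc_id)
        else:
            dropped.append(doc_id)
    return kept, dropped

docs = [
    ("D1", "Patient reported hip implant failure after surgery."),
    ("D2", "Patient reported hip implant failure after surgery."),  # duplicate of D1
    ("D3", "The device loosened and had to be revised."),  # arguably responsive, no keyword hit
    ("D4", "Lunch menu for the quarterly meeting."),
]
kept, dropped = cull(docs, keywords=["hip implant", "metal-on-metal"])
print(kept)     # ['D1']
print(dropped)  # ['D3', 'D4']
```

Document D3 is the kind of record plaintiffs worried about: it never reaches the predictive coding stage because it happens not to contain a search term.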

Four months later plaintiffs filed another motion requesting more transparency into Biomet’s predictive coding process. Plaintiffs moved to compel Biomet to disclose and identify the initial seed set documents used to train the predictive coding system to distinguish between a responsive and non-responsive document. Plaintiffs reasoned that knowing which documents Biomet coded as responsive and non-responsive was necessary to measure the accuracy of Biomet’s production. In the order denying plaintiffs’ request, Judge Miller stated:

“As I understand it, a predictive coding algorithm offers up a document, and the user tells the algorithm to find more like that document or that the user doesn’t want more documents like what was offered up. The Steering Committee wants the whole seed set Biomet used for the algorithm’s initial training. That request reaches well beyond the scope of any permissible discovery by seeking irrelevant or privileged documents used to tell the algorithm what not to find. That the Steering Committee has no right to discover irrelevant or privileged documents seems self-evident.”
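For readers who want to see what "training" on a seed set can look like, the following is a rough sketch only. It uses a tiny naive Bayes text classifier as a stand-in, which is an assumption; it is not the algorithm used by Biomet's vendor or by any particular predictive coding product, and the seed documents are invented.

```python
import math
from collections import Counter

LABELS = ("responsive", "non-responsive")

def tokenize(text):
    return text.lower().split()

def train(seed_set):
    """seed_set: list of (text, label) pairs coded by a reviewer.

    Assumes the seed set has at least one document for each label.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["responsive"]) | set(word_counts["non-responsive"])
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the higher naive Bayes log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in the seed set
            # Add-one smoothing so an unseen word does not zero out a label.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

seed_set = [
    ("hip implant failure complaint", "responsive"),
    ("implant revision surgery report", "responsive"),
    ("cafeteria lunch menu", "non-responsive"),
    ("holiday party schedule", "non-responsive"),
]
model = train(seed_set)
print(predict(model, "surgeon complaint about implant"))  # responsive
print(predict(model, "lunch schedule"))                   # non-responsive
```

The sketch shows why both sides care about the non-responsive examples: they shape what the system learns to ignore just as much as the responsive examples shape what it learns to find.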

Judge Miller continued by acknowledging plaintiffs’ argument that Biomet was not proceeding in the cooperative spirit endorsed by the Sedona Conference Cooperation Proclamation and the 7th Circuit Pilot Program. However, he stated that:

“[N]either the Sedona Conference nor the Seventh Circuit project expands a federal district court’s powers, so they can’t provide me with authority to compel discovery of information not made discoverable by the Federal Rules.”

In particular, Judge Miller pointed to the language contained in FRCP 26(b)(1) as a basis for his decision. He concluded that because the plaintiffs knew of the “existence and location” of each discoverable document Biomet used in the seed set, Biomet had complied with its production obligation. Surprisingly, Judge Miller’s analysis did not specifically address what some may argue is the key language in FRCP 26(b)(1), which states:

“For good cause, the court may order discovery of any matter relevant to the subject matter involved in the action. Relevant information need not be admissible at the trial if the discovery appears reasonably calculated to lead to the discovery of admissible evidence.”

Judge Miller went on to criticize Biomet’s “unexplained lack of cooperation” and urged Biomet to rethink its refusal to at least reveal the responsive documents used in the seed set. His comments indicated that plaintiffs’ position would be stronger if they had only requested the identification of the non-privileged and non-responsive seed set. However, he ultimately refused to compel the identity of any of the seed set documents because he lacked “any discretion in this dispute.”

Is the Issue Resolved?

Even though Judge Miller explained that he lacked “any discretion in this dispute,” some future litigants are likely to argue that Rule 26 provides judges with the discretion to order the disclosure of documents that are both non-responsive and non-privileged where appropriate. For example, proponents of disclosure are likely to argue that coding decisions applied to training documents could have a significant impact on the discovery of admissible evidence. If training documents are coded accurately, the likelihood of discovering any admissible evidence that exists increases. On the other hand, opponents of disclosure are likely to respond sharply that sharing non-responsive documents has not been required in the past and should not be required in the future. In fact, following Da Silva Moore, some have argued that even keywords are work-product protected and should not be disclosed.


In Re: Biomet appears to be the first case addressing whether parties are obligated to share non-responsive documents used to train a predictive coding system, but it likely won’t be the last. First, the decision is not binding. Second, Judge Miller did not thoroughly address key language contained within Rule 26(b)(1), which invites further analysis. Finally, the legal industry is struggling to define predictive coding best practices and to understand the range of different predictive coding technology solutions. Given the current confusion, demands for more predictive coding transparency are likely to continue as the market evolves. Don’t expect this hot button issue to cool off any time soon.

*Blog post co-authored by Matt Nelson and Adam Kuhn

ADR Offers Unique Solutions to Address Common eDiscovery Challenges

Friday, May 3rd, 2013

Much of the writing in the eDiscovery community focuses on the consequences of a party failing to adequately accomplish one of the nine boxes of the Electronic Discovery Reference Model (EDRM). Breaking news posts frequently report on spoliation findings and sanctions, which are typically issued for failure to suspend auto-deletion or to properly circulate a written litigation hold notice. This raises the question: aside from becoming perfectly adept in all nine boxes of the EDRM, how else can an organization protect itself from discovery wars and sanctions?

One way is to explore the possibilities Alternative Dispute Resolution (ADR) has to offer. While there is no substitute for the proper implementation of information governance processes, technology, and the people who manage them, there are alternative and creative ways to minimize exposure. This is not to say that ESI is less discoverable in ADR, but rather that, with the proper agreements in place, the way ESI is handled in the event of a dispute can be addressed proactively. That is because although parties are free to use the Federal Rules of Civil Procedure in ADR proceedings, they are not constrained by them. In other words, ADR proceedings can provide parties with the flexibility to negotiate and tailor their own discovery rules to address the specific matter and issues at hand.

Arbitration is a practical and preferred way to resolve disputes because it is quick, relatively inexpensive and commonly binding. With enough foresight, parties can preemptively limit the scope of discovery in their agreements to ensure the just and speedy resolution of a matter. Practitioners who are well versed in electronic discovery will be the best positioned to counsel clients in the formation of their agreements upfront, obviating protracted discovery. While a similar type of agreement can be reached and protection can be achieved with the Meet and Confer Conference in civil litigation, ADR offers a more private forum, giving the parties more contractual power and fewer unwanted surprises.

For example, JAMS includes this phrase in one of their model recommendations:

JAMS recognizes that there is significant potential for dealing with time and other limitations on discovery in the arbitration clauses of commercial contracts. An advantage of such drafting is that it is much easier for parties to agree on such limitations before a dispute has arisen. A drawback, however, is the difficulty of rationally providing for how best to arbitrate a dispute that has not yet surfaced. Thus, the use of such clauses may be most productive in circumstances in which parties have a good idea from the outset as to the nature and scope of disputes that might thereafter arise.

Thus, arbitration is an attractive option for symmetrical litigation where the merits of the case are high stakes and neither party wants to delve into a discovery war. A fair amount of early case assessment would be necessary as well, so parties have a full appreciation about what they are agreeing to include or not include in the way of ESI.  Absent a provision to use specific rules (American Arbitration Association or Federal Arbitration Act), the agreement between parties is the determining factor as to how extensive the scope of discovery will be.

In Mitsubishi Motors v. Soler Chrysler-Plymouth, Inc., 473 U.S. 614, 625 (1985), the U.S. Supreme Court explained that the “‘liberal federal policy favoring arbitration agreements’…is at bottom a policy guaranteeing the enforcement of private contractual agreements. As such, assuming an equal bargaining position or, at least an informed judgment, courts will enforce stipulations regarding discovery, given the policy of enforcing arbitration agreements by their terms.” Please also see an excellent explanation of Discovery in Arbitration by Joseph L. Forstadt for more information.

Cooperation amongst litigants in discovery has long been a principle of the revered Sedona Conference. ADR practitioners facing complex discovery questions are looking to Sedona’s Cooperation Proclamation for guidance with an eye toward negotiation, educating themselves on ways to further minimize distractions and costs in discovery. An example of one such event is at The Center for Negotiation and Dispute Resolution at UC Hastings, which is conducting a mock Meet and Confer on May 16, 2013. The event highlights the need for all practitioners, whether at the Rule 26(f) conference in litigation or the preliminary hearing in arbitration, to give electronic discovery issues the same weight they give claims and damages early on in the dispute.

It is also very important for arbitrators, especially given the power they have over a matter, to understand the consequences of their rulings. Discovery is typically under the sole control of the arbitrator in a dispute, and only in very select circumstances can relief be granted by the court. An arbitrator who knows nothing about eDiscovery could miss something material and affect the entire outcome adversely. For parties that have identified and addressed these issues proactively, there is more protection and certainty in arbitration. Typically, the primary focus of an arbitrator is enforcing the contract between parties, not being an eDiscovery expert.

It is also important to caution against giving up rights to discovery by entering into mutual agreements that unreasonably limit discovery. This approach is somewhat reminiscent of the days when lawyers would agree not to conduct discovery because neither knew how. Now, while efficiency and cost savings are a priority, we must guard against a similar paradigm emerging, this time because we may know too much about how to shield relevant ESI.

As we look to the future, especially for serial litigants, one can imagine a perfect world in arbitration for predictive coding. In the Federal courts, we have seen over the past two years or so an emergence of the use of predictive coding technologies. However, even when the parties agree, which they don’t always, they still struggle with achieving a meeting of the minds on the protocol. These disputes have at times overshadowed the advantage of using predictive coding because discovery disputes and attorney’s fees have overtaken any savings. In ADR there is a real opportunity for similarly situated parties to agree via contract, upfront on tools, methodologies and scope. Once these contracts are in place, both parties are bound to the same rules and a just and speedy resolution of a matter can take place.

Dueling Predictive Coding for Dummies Books Part Deux

Friday, December 7th, 2012

Long time readers of the eDiscovery 2.0 blog know we like to take advantage of every opportunity we have to discuss Charlie Sheen and eDiscovery.  While Charlie Sheen’s antics may have died down, the evolution and discussion of e-Discovery technology continues unabated. Thanks to Sharon Nelson and a recent blog post on her ride the lightning site, we’ve decided that there is no way we can pass up the opportunity to stretch the Charlie Sheen/eDiscovery analogy once again.

In the 1993 movie Hot Shots Part Deux, Charlie Sheen plays the main character in a Rambo parody that has similarities to the original Rambo movies starring Sylvester Stallone.  Not surprisingly, the parody is focused on comedic value and is a far cry from the original Rambo movies that helped make Stallone a Hollywood icon.  In recent months, those in the litigation community have watched an analogous situation play out with two competing books about predictive coding technology.

In September, the legal publication ALM (American Lawyer Media) reported that two competing Predictive Coding for Dummies books were published by Symantec and Recommind respectively.

The ALM article, titled Predictive Coding Vendors Duel for ‘Dummies’, did not provide an in-depth analysis of either book, but a recent blog posted to ride the lightning by Terry Dexter provided an analysis and review of both books that many have eagerly anticipated. The conclusion? The Predictive Coding for Dummies sequel is a far cry from the original.

Here is the actual text of Mr. Dexter’s analysis for your reading pleasure:

Predictive Coding For Dummies®, Symantec(TM) Special Edition by Matthew D. Nelson, Esq. Copyright © 2012 from John Wiley & Sons, Inc. 111 River St. Hoboken, NJ 07030-5774 ISBN 978-1-118-48198-1 (pbk); ISBN 978-1-118-48237-7 (ebk)

Predictive Coding For Dummies®, Recommind Special Edition author(s) not listed, Copyright © 2013 from John Wiley & Sons, Inc. 111 River St.Hoboken, NJ 07030-5774 ISBN 978-1-118-52167-0 (pbk); ISBN 978-1-118-52230-1 (ebk)

Not being known as someone who won’t accept a challenge, I read both books cover to cover (several times).  In full disclosure, I am not an attorney (or played one on TV); I am simply a techno-geek with a Bachelor of Arts in English and strong interest in the tools, techniques and methods involved with electronic discovery (eDis). This review is based upon my reading and understanding of Predictive Coding, which, in turn, is based upon a combination of 30 years in Information Science & Technology and extensive research into the wild wooly world of electronic discovery. Any and all comments are mine and not that of Sharon Nelson (the individual) or Sensei Enterprises, Inc.

Up first: Predictive Coding For Dummies®, Symantec(TM) Special Edition by Matthew D. Nelson, Esq.

My initial impression of this book was good. The format follows the standard “Dummies” format and structure while legal and technical concepts are presented in a clear, easily understood manner. Nelson’s writing flows from one paragraph to another and doesn’t introduce new terms without first explaining them. The reader is immediately informed as to the what and why of electronic discovery.  From the third paragraph onward, the reader is gradually immersed into a sometimes murky world.

This excerpt from the Introduction sets the tone:

“Predictive coding technology is a new approach to attorney document review that can be used to help legal teams significantly reduce the time and cost of eDiscovery. Despite the promise of predictive coding technology, the technology is relatively new to the legal field, and significant confusion about the proper use of these tools is pervasive. This book helps eliminate that confusion by providing a wealth of information about predictive coding technology, related terminology, and the proper use of these tools.”

Specific comments:

Beyond the excellent writing, this book contains many positives and negatives; some of which I present here.

Positives:


  1. The cost in terms of timeliness, accuracy and productivity is compared to manual review. 
  2. Nelson introduces the Electronic Discovery Reference Model (EDRM) within the first three (3) pages. The subsequent discussions regarding potential costs is emphasized by illustrating the enormity of the potential volume of Electronically Stored Information (ESI). This early introduction is also valuable when process defensibility is introduced.
  3. The concepts of sanctions, privileged information, human v machine reading/review and risk are easily distinguished. Again, the “whys” for such concepts, easily understandable to a First Year Law Student, are easily understood for the layperson.
  4. The inclusion of website addresses to provide additional information is most welcome. Indeed, references to a predictive coding cost estimate page and to a Ralph Losey article helped me gain a deeper understanding of the planning and execution of a PC effort.
  5. A separate step in Nelson’s work flow considers Privileged Information. While no one on either side of a litigation struggle want to divulge such data, it can and does happen. Predictive Coding is not presented as a palliative cure-all for such ‘ooopsies’; however the book does go far in helping the reader comprehend the necessity of conducting separate actions to reduce if not eliminate the probability of such an event occurring.
  6. The three prominent eDis cases (DaSilva-Moore, Kleen Products and Global  Aerospace) are discussed relative to First Generation PC tools and Judicial Guidance.

Negatives:


  1. Clearly, this book is written and produced to influence litigators and law firms to orient themselves towards Symantec and Clearwell. Hints are subtly placed throughout the book. While not explicitly mentioning any names, the implication is clear and gets more obvious starting at Chapter 6. A more neutral, objective content makes more sense to someone who is already familiar with the eDis process.
  2. There is no discussion on the difficulties of using Optical Character Recognition (OCR) or different character set based ESI. All data is presumed to be 100% compatible and ANSI compliant.


This is an excellent book to give to clients, new litigation support personnel, paralegals, etc. involved in the beginnings of any litigation where the use of Electronic Discovery tools is likely.

Next up: Predictive Coding For Dummies®, Recommind Special Edition author(s) not listed

My initial impression was guarded. The format follows the standard “Dummies” format and structure but the content reads like someone mashed several marketing ‘White Papers’ together. This impression is further supported when comparing copyright dates with the Symantec book. Indeed, a stark comparison between these tomes is like comparing apples to oranges.

Positives:


  1. It’s short.

Negatives:


  1. Only nine (9) pages (25%) have any direct relationship with the subject matter. Twenty-eight (28) pages (~77%) are more closely related to marketing collateral. The very topic of Predictive Coding is introduced to the reader at page 11!
  2. The reader is constantly bombarded with the cost differential between manual and automated document review. Figure 1-2 in this book compares savings in 3 types of cases (IP, Second Request & Tort). Linear (Manual) Review is compared to Predictive Coding and, of course, PC wins every time. However there is no mention as to the style of the PC effort (and related costs) – were documents reviewed in house or by a services provider?
  3. There is zero mention of risk, sanctions or privileged information. In fact, a reader may develop the idea that any Predictive Coding tool takes care of any such occurrences.
  4. There is no discussion on the difficulties of using Optical Character Recognition (OCR) or different character set based ESI. All ESI is presumed to be 100% compatible and ANSI compliant.
  5. What are ‘Frankenstacks’? This book is supposed to help IT Managers who already understand the hurdles of application incompatibility.
  6. The book is very difficult to read. The workflow discussion does not follow the accompanying diagram (Figure 2-1) and even introduces the concept of ‘Predictive Analysis’ without any further discussion.
  7. The book makes blatant reference to Recommind’s product. Indeed the content of the entire document builds to the conclusion that only Recommind has the capability to successfully conduct electronic discovery.


This is a very poorly written book using a style that insults the reader’s intelligence. A cursory Bing or Google search would be a better investment in time and money.

Interestingly, only one day after Mr. Dexter’s review, another review by Jeffrey Reed was posted to ride the lightning criticizing both books. For those of us in a profession that thrives on advocacy, it probably comes as no surprise that two people could have different views of the same book. Unfortunately, inconsistent reviews might leave some to wonder which book they should read.  The good news is that both books are free so we invite you to read them both and draw your own conclusions.  As always, we also invite your feedback.

To download a copy of Symantec’s Predictive Coding for Dummies book click here.

Mission Impossible? The eDiscovery Implications of the ABA’s New Ethics Rules

Thursday, August 30th, 2012

The American Bar Association (ABA) recently announced changes to its Model Rules of Professional Conduct that are designed to address digital age challenges associated with practicing law in the 21st century. These changes emphasize that lawyers must understand the ins and outs of technology in order to provide competent representation to their clients. From an eDiscovery perspective, such a declaration is particularly important given the lack of understanding that many lawyers have regarding even the most basic supporting technology needed to effectively satisfy their discovery obligations.

With respect to the actual changes, the amendment to the commentary language from Model Rule 1.1 was most significant for eDiscovery purposes. That rule, which defines a lawyer’s duty of competence, now requires that attorneys discharge that duty with an understanding of the “benefits and risks” of technology:

To maintain the requisite knowledge and skill, a lawyer should keep abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology, engage in continuing study and education and comply with all continuing legal education requirements to which the lawyer is subject.

This rule certainly restates the obvious for experienced eDiscovery counsel. Indeed, the Zubulake series of opinions from nearly a decade ago laid the groundwork for establishing that competence and technology are irrevocably and inextricably intertwined. As Judge Scheindlin observed in Zubulake V, “counsel has a duty to effectively communicate to her client its discovery obligations so that all relevant information is discovered, retained, and produced.” This includes being familiar with the client’s retention policies, in addition to its “data retention architecture;” communicating with the “client’s information technology personnel” and arranging for the “segregation and safeguarding of any archival media (e.g., backup tapes) that the party has a duty to preserve.”

Nevertheless, Model Rule 1.1 is groundbreaking in that it formally requires lawyers in those jurisdictions following the Model Rules to be up to speed on the impact of eDiscovery technology. In 2012, that undoubtedly means counsel should become familiar with the benefits and risks of predictive coding technology. With its promise of reduced document review costs and decreased legal fees, counsel should closely examine predictive coding solutions to determine whether they might be deployed in some phase of the document review process (e.g., prioritization, quality assurance for linear review, full scale production). Yet caution should also be exercised given the risks associated with this technology, particularly the well-known limitations of early generation predictive coding tools.

In addition to predictive coding, lawyers would be well served to better understand traditional eDiscovery technology tools such as keyword search, concept search, email threading and data clustering. Indeed, there is significant confusion regarding the continued viability of keyword searching given some prominent judicial opinions frowning on so-called blind keyword searches. However, most eDiscovery jurisprudence and authoritative commentators confirm the effectiveness of keyword searches that involve some combination of testing, sampling and iterative feedback.
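As a rough illustration of the "testing, sampling and iterative feedback" idea, the sketch below draws a random sample from the documents a keyword search left behind (the null set) and estimates how many responsive documents slipped through. The function name, the sample documents and the is_responsive stand-in for attorney review are all hypothetical; a real protocol would use far larger samples and a proper confidence calculation.

```python
import random

def estimate_elusion(null_set, is_responsive, sample_size, seed=0):
    """Estimate the share of responsive documents missed by a keyword search.

    null_set: documents that did NOT hit any keyword.
    is_responsive: callable standing in for a human reviewer's judgment.
    Returns (estimated_rate, missed_documents_in_sample).
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(null_set, min(sample_size, len(null_set)))
    if not sample:
        return 0.0, []
    missed = [doc for doc in sample if is_responsive(doc)]
    return len(missed) / len(sample), missed

# Ten documents the search terms "hip implant" and "metal-on-metal" did not hit.
null_set = [
    "device loosened and was revised",
    "cup component recalled by manufacturer",
    "lunch menu for friday",
    "parking garage closure notice",
    "holiday party schedule",
    "benefits enrollment reminder",
    "quarterly travel policy update",
    "printer maintenance window",
    "office supply order form",
    "fire drill announcement",
]

def reviewer(doc):
    # Stand-in for attorney review: flags documents about the device itself.
    return "device" in doc or "recalled" in doc

rate, missed = estimate_elusion(null_set, reviewer, sample_size=10)
print(rate)  # 0.2, because the sample here covers the whole null set
```

The missed documents are the feedback step: words such as "loosened" or "recalled" suggest new search terms to test in the next iteration.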

Whether the technology involves predictive coding, keyword searching, attorney client privilege reviews or other areas of eDiscovery, the revised Model Rules appear to require counsel to understand the benefits and risks of these tools. Moreover, this is not simply a one-time directive. Because technology is always changing, lawyers should continue to stay abreast of changes and developments. This continuing duty of competence is well summarized in The Sedona Conference Best Practices Commentary on the Use of Search & Retrieval Methods in E-Discovery:

Parties and the courts should be alert to new and evolving search and information retrieval methods. What constitutes a reasonable search and information retrieval method is subject to change, given the rapid evolution of technology. The legal community needs to be vigilant in examining new and emerging techniques and methods which claim to yield better search results.

While the challenge of staying abreast of these complex technological changes is difficult, it is certainly not “mission impossible.” Lawyers untrained in the areas of technology have often developed the tremendous skill sets required for dealing with other areas of complexity in the law. Perhaps the wise but encouraging reminder from Anthony Hopkins to Tom Cruise in Mission Impossible II will likewise spur reluctant attorneys to accept this difficult, though not impossible, task: “Well this is not Mission Difficult, Mr. Hunt, it’s Mission Impossible. Difficult should be a walk in the park for you.”

The Increasing Importance of Cross-Border eDiscovery and Data Protection Awareness

Thursday, June 28th, 2012

Some of the hot news in the legal world suggests that cross-border eDiscovery and data protection issues are gaining increasing importance in the eyes of organizations and their counsel. Recent court cases, international political discussions and industry publications confirm this trend.

On the judicial front, courts appear to weigh in more frequently on cross-border eDiscovery disputes. Unfortunately, their decisions are frequently inconsistent and often fail to provide a clear message for how organizations should approach these complex matters. For example, the U.S. federal court in Manhattan recently issued two diametrically opposed decisions on whether parties must use the Hague Convention for obtaining written discovery from the People’s Republic of China. In Tiffany (NJ) LLC v. Forbse (S.D.N.Y. May 23, 2012), the court ordered the plaintiff to use the Hague Convention to obtain the sought-after discovery in China from two non-party banks. This stands in sharp contrast to the Gucci America, Inc. v. Weixing Li (S.D.N.Y. May 18, 2012) decision, which allowed the plaintiffs to bypass the Hague treaty and obtain discovery through a Rule 45 subpoena. The principal difference between the orders was their respective authors, with each judge reaching a different conclusion as to the merits of proceeding with the Hague Convention. These conflicting decisions, largely the result of the U.S. Supreme Court’s decision in Aerospatiale, confirm that the jurisprudence on cross-border data requests is quite unsettled and will remain an issue of considerable interest for the foreseeable future.

Of no less importance to multinational corporations is the issue of cross-border data protection. In fact, data protection seems to be outpacing eDiscovery in terms of significance to organizations. This was apparent last week at The Sedona Conference Working Group Six (WG6) annual meeting held in Toronto. Though the custom is to not disseminate specifics about Sedona meetings, cross-border data protection laws and their impact on organizations were a significant theme at the conference. This should come as no surprise given the number of data protection laws both enacted and proposed in the past year now confronting organizations. The Philippines, Singapore, Australia, New Zealand, the European Union, the United States and several other countries have either implemented or are considering new data protection laws. Moreover, even the U.S. and the European Union have announced a desire for rapprochement over their continuing differences on data protection and privacy.

Leading industry analyst firm Gartner also weighed in on these issues in its recently released Magic Quadrant for E-Discovery Software. In that Magic Quadrant report, Gartner concluded that eDiscovery was reaching across international boundaries and impacting organizations across the globe. With cross-border litigation on the rise and new regulations in the United Kingdom affecting the financial services industry, Gartner has predicted that “demand for e-discovery products and services will accelerate.”

All of which suggests the need for greater awareness on the eDiscovery and data protection front. Organizations looking to obtain a better understanding of these issues can find resources at their fingertips through the American Bar Association and The Sedona Conference. These not-for-profit entities have issued publications that provide a good starting point for obtaining an understanding of the issues and highlighting best practices for addressing global eDiscovery and data protection laws. Symantec has likewise made resources available in this regard. In addition to its eDiscovery Passports™, Symantec is recording a series of podcasts with industry thought leaders that spotlight key cross-border considerations for organizations. The first of these podcasts features Chris Dale, a well-known international lawyer in this field, who discusses some key aspects of disclosure in the United Kingdom impacting organizations around the world.

Obtaining a greater awareness of cross-border eDiscovery and data protection should ultimately help companies meet the legal challenges accompanying globalization. And greater awareness will likely lead to better corporate practices, which can in turn reduce risks, fees and lost opportunities.

Proportionality Demystified: How Organizations Can Get eDiscovery Right by Following Four Key Principles

Tuesday, April 17th, 2012

Talk to most any organization about legal issues and invariably the subject of eDiscovery will be raised. The skyrocketing costs and lengthy delays associated with data preservation and document review provide ample justification for organizations to be on the alert about eDiscovery. While these costs and delays tend to make the eDiscovery landscape appear bleak, a positive development on this front is emerging for organizations. That development is the emphasis that many courts are now placing on “proportionality” for addressing eDiscovery disputes.

Though initially embraced by only a few cognoscenti after the 1983 and 2000 amendments to the Federal Rules of Civil Procedure (FRCP), proportionality standards are now being championed by various district and circuit courts. As more opinions analyzing proportionality are issued, several key principles are becoming apparent in this developing body of jurisprudence. To better understand these principles, it is instructive to review some of the top proportionality cases issued this year and last. These cases provide a roadmap of best practices that, if followed, will help courts, clients and counsel reduce the costs and burdens connected with eDiscovery.

1. Discourage Unnecessary Discovery

Case: Bottoms v. Liberty Life Assur. Co. of Boston (D. Colo. Dec. 13, 2011)

Summary: The court dramatically curtailed the written discovery that plaintiff sought to propound on the defendant. Plaintiff had requested leave in this ERISA action to serve “sweeping” interrogatories and document requests to resolve the limited issue of whether the defendant had improperly denied her long term disability benefits. Drawing on the proportionality standards under Federal Rule 26(b)(2)(C), the court characterized the proposed discovery as “patently overbroad” and as seeking materials that were “largely irrelevant.” The court ultimately ordered the defendant to respond to some aspects of plaintiff’s interrogatories and document demands, but not before limiting their nature and scope.

Proportionality Principle No. 1: The Bottoms case emphasizes what courts have been advocating for years: that organizations should do away with unnecessary discovery. That means no more “robotically recycling discovery requests propounded in earlier actions.” Instead, counsel must “stop and think” to ensure that its discovery is narrowly tailored in accordance with Rule 26(b)(2)(C). As Bottoms teaches, “the responsibility for conducting discovery in a reasonable, proportionate manner rests in the first instance with the parties and their attorneys.”

2. Encourage Reasonable Discovery Efforts

Case: Larsen v. Coldwell Banker Real Estate Corp. (C.D. Cal. Feb. 2, 2012)

Summary: In Larsen, the court rejected the plaintiffs’ assertion that the defendants should be made to redo their production of 9,000 pages of documents. The plaintiffs had argued that re-production of the documents was necessary to address certain discrepancies – including missing emails – in the production. The court disagreed, holding instead that plaintiffs had failed to establish that such discrepancies had “prevented them in any way from obtaining information relevant to a claim or defense under Fed.R.Civ.P. 26(b)(1).”

The court also reasoned that a “do over” would violate the principles of proportionality codified in Rule 26(b)(2)(C). After reciting the proportionality language from Rule 26 and referencing The Sedona Principles, the court determined that “the burden and expense to Defendants in completely reproducing its entire ESI production far outweighs any possible benefit to Plaintiffs.” There were too few discrepancies identified to justify the cost of redoing the production.

Proportionality Principle No. 2: The Larsen decision provides a simple reminder that organizations’ discovery efforts must be reasonable, not perfect. This reminder bears repeating as litigants frequently use eDiscovery sideshows to leverage lucrative settlements without having to address the merits of their claims or defenses. Such a practice, likened to a “cancerous growth” given its destructive nature, underscores that discovery devices should be used to “facilitate litigation rather than as weapons to wage litigation.” Calcor Space Facility, Inc. v. Superior Court, 53 Cal.App.4th 216, 221 (1997). Similar to the theme raised in our post regarding the predictive coding dispute in the Kleen Products case, principles of proportionality rightly emphasize the reasonable nature of parties’ obligations in discovery.

3. Discourage Dilatory Discovery Tactics

Case: Escamilla v. SMS Holdings Corporation (D. Minn. Oct. 21, 2011)

Summary: The court rejected an argument that proportionality standards should excuse the individual defendant from paying for additional discovery ordered by the court. The defendant essentially argued that Rule 26(b)(2)(C)(iii) foreclosed the ordered discovery given his limited financial resources. This position was unavailing, however, given that “the burden and expense of this discovery was self-inflicted by [the defendant].” As it turns out, the ordered discovery was necessary to address issues created in the litigation by the defendant’s failure to preserve relevant evidence. Moreover, there were no alternative means available for obtaining the sought-after materials. Given the unique nature of the evidence and the defendant’s misconduct, the court held that the “burden of the additional discovery [did] not outweigh its likely benefit.”

Proportionality Principle No. 3: The Escamilla decision reinforces a common refrain among proportionality cases: that proportionality is foreclosed to those parties who create their own burdens. Like the defense of unclean hands, proportionality essentially requires a litigant to approach the court with a clean slate of conduct in discovery. This is confirmed by The Sedona Conference Comment on Proportionality in Electronic Discovery, which declares that “[c]ourts should disregard any undue burden or expense that results from a responding party’s own conduct or delay.”

4. Encourage Better Information Governance Practices

Case: Salamone v. Carter’s Retail, Inc. (D.N.J. Jan. 28, 2011)

Summary: The court denied a motion for protective order that the defendant clothing retailer filed to stave off the collection and analysis of over 13,000 personnel files. The retailer had argued that proportionality precluded the search and review of the personnel files. In support of its argument, the retailer asserted that the nature, format, location and organization of the records made their review and production too burdensome: “ ‘the burden of production . . . outweigh[s] any benefit to plaintiffs’ considering the ‘disorganization of the information, the lack of accessible format, the significant amount of labor and costs involved, and defendant’s management structure’.”

In rejecting the retailer’s position, the court criticized its information retention system as the culprit for its burdens. That the retailer “maintains personnel files in several locations without any uniform organizational method,” the court reasoned, “does not exempt Defendant from reasonable discovery obligations.” After weighing the various factors that comprise the proportionality analysis under Rule 26(b)(2)(C), the court concluded that the probative value of production outweighed the resulting burden and expense on the retailer.

Proportionality Principle No. 4: Having an intelligent information governance process in place could have addressed the cost and logistics headaches that the retailer faced. Had the records at issue been digitized and maintained in a central archive, the retailer’s collection burdens would have been significantly minimized. Furthermore, integrating these “upstream” data retention protocols with “downstream” eDiscovery processes could have expedited the review process. The Salamone case teaches that an integrated information governance process, supported by effective, enabling technologies, will likely help organizations reach the objectives of proportionality by reducing the extent of discovery burdens and making them more commensurate with the demands of litigation.


The foregoing cases exemplify how proportionality principles can help lawyers and litigants conduct eDiscovery in an efficient and cost-effective manner. And by faithfully observing these standards, courts, clients and counsel can better follow the mandate from Federal Rule 1 “to secure the just, speedy, and inexpensive determination of every action and proceeding.”

2012: Year of the Dragon – and Predictive Coding. Will the eDiscovery Landscape Be Forever Changed?

Monday, January 23rd, 2012

2012 is the Year of the Dragon – which is fitting, since no other Chinese Zodiac sign represents the promise, challenge, and evolution of predictive coding technology more than the Dragon. The few who have embraced predictive coding technology exemplify symbolic traits of the Dragon that include being unafraid of challenges and willing to take risks. In the legal profession, taking risks typically isn’t in a lawyer’s DNA, which might explain why predictive coding technology has seen lackluster adoption among lawyers despite the hype. This blog explores the promise of predictive coding technology, examines why predictive coding has not been widely adopted in eDiscovery, and explains why 2012 is likely to be remembered as the year of predictive coding.

What is predictive coding?

Predictive coding refers to machine learning technology that can be used to automatically predict how documents should be classified based on limited human input.  In litigation, predictive coding technology can be used to rank and then “code” or “tag” electronic documents based on criteria such as “relevance” and “privilege” so organizations can reduce the amount of time and money spent on traditional page by page attorney document review during discovery.

Generally, the technology works by prioritizing the most important documents for review by ranking them.  In addition to helping attorneys find important documents faster, this prioritization and ranking of documents can even eliminate the need to review documents with the lowest rankings in certain situations. Additionally, since computers don’t get tired or day dream, many believe computers can even predict document relevance better than their human counterparts.
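The ranking idea can be pictured with a small sketch. It assumes a tool has already assigned each document a relevance score between 0 and 1; the scores, document names, and the 0.2 cutoff below are hypothetical, and real products will differ in how they score and where (or whether) they draw a cutoff.

```python
def prioritize(scores, review_cutoff=0.2):
    """Rank documents by predicted relevance, highest first.

    scores: dict mapping a document ID to a score in [0, 1] (hypothetical input).
    review_cutoff: illustrative threshold below which documents are set aside.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    review_queue = [doc for doc, score in ranked if score >= review_cutoff]
    set_aside = [doc for doc, score in ranked if score < review_cutoff]
    return review_queue, set_aside


# Hypothetical scores for four documents.
scores = {
    "email_017": 0.91,
    "memo_203": 0.08,
    "contract_044": 0.67,
    "newsletter_310": 0.02,
}
review_queue, set_aside = prioritize(scores)
print("Review first:", review_queue)
print("Lowest ranked:", set_aside)
```

Whether the lowest-ranked documents can actually go unreviewed is a legal judgment for the parties and the court, not something the cutoff decides on its own.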

Why hasn’t predictive coding gone mainstream yet?

Given the promise of faster and less expensive document review, combined with higher accuracy rates, many are perplexed as to why predictive coding technology hasn’t been widely adopted in eDiscovery.  The answer really boils down to one simple concept – a lack of transparency.

Difficult to Use

First, early predictive coding tools attempt to apply a complicated new technological approach to a document review process that has traditionally been very simple.  Instead of relying on attorneys to read each and every document to determine relevance, the success of today’s predictive coding technology typically depends on review decisions input into a computer by one or more experienced senior attorneys.  The process commonly involves a complex series of steps that include sampling, testing, reviewing, and measuring results in order to fine tune an algorithm that will eventually be used to predict the relevancy of the remaining documents.

The problem with early predictive coding technologies is that the majority of these complex steps are done in a ‘black box’. In other words, the methodology and results are not always clear, which increases the risk of human error and makes the integrity of the electronic discovery process difficult to defend. For example, the methodology for selecting a statistically valid sample is not always intuitive to the end user. This fundamental problem could result in improper sampling techniques that could taint the accuracy of the entire process. Similarly, the process must often be repeated several times in order to improve accuracy rates. Even if accuracy is improved, it may be difficult or impossible to explain how accuracy thresholds were determined or to explain why coding decisions were applied to some documents and not others.

Accuracy Concerns

Early predictive coding tools also tend to lack transparency in the way the technology evaluates the language contained in each document.  Instead of evaluating both the text and metadata fields within a document, some technologies actually ignore document metadata.  This omission means a privileged email sent by a client to her attorney, Larry Lawyer, might be overlooked by the computer if the name “Larry Lawyer” is only part of the “recipient” metadata field of the document and isn’t part of the document text.  The obvious risk is that this situation could lead to privilege waiver if it is inadvertently produced to the opposing party.

Another practical concern is that some technologies do not allow reviewers to make a distinction between relevant and non-relevant language contained within individual documents. For example, early predictive coding technologies are not intelligent enough to know that only the second paragraph on page 95 of a 100-page document contains relevant language. The inability to discern what language led to the determination that the document is relevant could skew results when the computer tries to identify other documents with the same characteristics. This lack of precision increases the likelihood that the computer will retrieve an over-inclusive set containing many irrelevant documents. In information retrieval terms this is a problem of low precision rather than of recall, and it is important because it increases the number of documents requiring manual review, which directly impacts eDiscovery cost.
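To make the cost of over-inclusive retrieval concrete, the sketch below computes two standard information retrieval measures: precision (the share of retrieved documents that are actually relevant) and recall (the share of relevant documents that were retrieved). The function name and document IDs are illustrative assumptions and are not drawn from any particular review tool.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the tool flags eight documents, four of which are
# relevant, while five documents in the collection are truly relevant.
flagged = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
truly_relevant = {"d1", "d2", "d3", "d4", "d9"}

precision, recall = precision_recall(flagged, truly_relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example half of the flagged documents are irrelevant, so half of the follow-on manual review effort is spent on documents that did not need it.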

Waiver & Defensibility

Perhaps the biggest concern with early predictive coding technology is the risk of waiver and concerns about defensibility.  Notably, there have been no known judicial decisions that specifically address the defensibility of these new technology tools even though some in the judiciary, including U.S. Magistrate Judge Andrew Peck, have opined that this kind of technology should be used in certain cases.

The problem is that today’s predictive coding tools are difficult to use, complicated for the average attorney, and the way they work simply isn’t transparent.  All these limitations increase the risk of human error.  Introducing human error increases the risk of overlooking important documents or unwittingly producing privileged documents.  Similarly, it is difficult to defend a technological process that isn’t always clear in an era where many lawyers are still uncomfortable with keyword searches.  In short, using black box technology that is difficult to use and understand is perceived as risky, and many attorneys have taken a wait-and-see approach because they are unwilling to be the guinea pig.

Why is 2012 likely to be the year of predictive coding?

The word transparency may seem like a vague term, but it is the critical element missing from today’s predictive coding technology offerings.  2012 is likely to be the year of predictive coding because improvements in transparency will shine a light into the black box of predictive coding technology that hasn’t existed until now.  In simple terms, increasing transparency will simplify the user experience and improve accuracy which will reduce longstanding concerns about defensibility and privilege waiver.

Ease of Use

First, transparent predictive coding technology will help minimize the risk of human error by incorporating an intuitive user interface into a complicated solution.  New interfaces will include easy-to-use workflow management consoles to guide the reviewer through a step-by-step process for selecting, reviewing, and testing data samples in a way that minimizes guesswork and confusion.  By automating the sampling and testing process, the risk of human error can be minimized which decreases the risk of waiver or discovery sanctions that could result if documents are improperly coded.  Similarly, automated reporting capabilities make it easier for producing parties to evaluate and understand how key decisions were made throughout the process, thereby making it easier for them to defend the reasonableness of their approach.

Intuitive reports also help the producing party measure and evaluate confidence levels throughout the testing process until appropriate confidence levels are achieved.  Since confidence levels can actually be measured as a percentage, attorneys and judges are in a position to negotiate and debate the desired level of confidence for a production set rather than relying exclusively on the representations or decisions of a single party.  This added transparency allows the type of cooperation between parties called for in the Sedona Cooperation Proclamation and gives judges an objective tool for evaluating each party’s behavior.
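One common way to put a number on this kind of measurement is a sampling estimate with a margin of error. The sketch below is an assumption about how such a figure might be produced, not a description of any vendor's report: it uses the textbook normal approximation for a proportion, and the sample size and counts are hypothetical.

```python
import math


def proportion_interval(hits, sample_size, z=1.96):
    """Estimate a proportion and its approximate interval.

    Uses the normal approximation; z=1.96 corresponds to roughly
    95% confidence. Returns (estimate, lower bound, upper bound).
    """
    p = hits / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical: reviewers check a random sample of 400 documents that the
# tool coded as non-relevant and find 12 that are actually relevant.
rate, low, high = proportion_interval(12, 400)
print(f"estimated miss rate: {rate:.1%} (about {low:.1%} to {high:.1%})")
```

Here the estimated miss rate is 3%, give or take roughly 1.7 percentage points. A figure like this gives the parties something specific to negotiate over, such as how large the sample should be or what rate is acceptable.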

Accuracy & Efficiency

2012 is also likely to be the year of transparent predictive coding technology because technical limitations that have impacted the accuracy and efficiency of earlier tools will be addressed.  For example, new technology will analyze both document text and metadata to avoid the risk that responsive or privileged documents are overlooked.  Similarly, smart tagging features will enable reviewers to highlight specific language in documents to determine a document’s relevance or non-relevance so that coding predictions will be more accurate and fewer non-relevant documents will be recalled for review.

Conclusion - Transparency Provides Defensibility

The bottom line is that predictive coding technology has not enjoyed widespread adoption in the eDiscovery process due to concerns about simplicity and accuracy that breed larger concerns about defensibility.  Defending the use of black box technology that is difficult to use and understand is a risk that many attorneys simply are not willing to take, and these concerns have deterred widespread adoption of early predictive coding technology tools.  In 2012, next generation transparent predictive coding technology will usher in a new era of computer-assisted document review that is easy to use, more accurate, and easier to defend. Given these exciting technological advancements, I predict that 2012 will not only be the year of the dragon, it will also be the year of predictive coding.

Amending the FRCP: More Questions than Answers

Friday, October 14th, 2011

Outcry from many in the legal community has caused a number of groups to consider whether the Federal Rules of Civil Procedure (FRCP) should be amended. The dialogue began in earnest a year ago at the Duke Civil Litigation Conference and picked up speed following an eDiscovery “mini-conference” held in Dallas last month (led by the Discovery Subcommittee appointed by the Advisory Committee on Civil Rules). The rules amendment topic is so hot that the Sedona Conference (WG1) spent most of its two-day annual meeting discussing the need for amendments and evaluating a range of competing proposals.

During this dialogue (which I can’t quote verbatim) a number of things became clear to me…

1.  This rules amendment quandary is a bit of a chicken and egg riddle — meaning that it’s hard to cast support wholeheartedly for a rules change if there isn’t a good consensus for what a particular change would accomplish and what the long term consequences might be as technology quickly morphs.  As an example, if there was a redefined preservation trigger that started the duty to preserve when there was a reasonable “certainty” of litigation (versus a mere “likelihood”), would this really make a material impact?  Or, would this inquiry still be as highly fact specific as it is today?  Would this still be similarly prone to the 20/20 hindsight judgment that’s inevitable as well?

2. While it is clear that preservation has become a more complex and risk laden process, it’s not clear that this “pain” is causally related to the FRCP.  In the notes from the Dallas mini-conference, a pending Sedona survey was quoted, referencing the fact that preservation challenges were overwhelmingly increasing:

“[S]ome trends can be noted. 95% (of the surveyed members) agreed that preservation issues were more frequent. 75% said that development was due to the proliferation of information.”

3. Another camp of stakeholders complains that the existing rules (as amended in 2006) aren’t being followed by practitioners or understood by the judiciary. While this may be the case, it raises the critical question: If folks aren’t following the amended rules (utilizing proportionality, leveraging FRE 502, etc.), is it really reasonable to think that any new rules would be followed this time around?

4. The role of technology in easing the preservation burden represents another murky area for debate.  For example, it could be argued that preservation pains (i.e., costs) are only really significant for organizations that haven’t deployed state of the art information governance solutions (e.g., legal hold solutions, email archives, records retention software, etc.) to make the requisite tasks less manual.

5. And finally, even assuming that the FRCP is magically re-jiggered to ease preservation costs, this would only impact organizations with litigation in Federal court. This leaves many still exposed to varying standards for the preservation trigger, scope and associated sanctions.

So, in the end, it’s unclear what the future holds for an amended FRCP landscape.  Given the range of divergent perspectives, differing viewpoints on potential solutions and the time necessary to navigate the Rules Enabling Act, the only thing that’s clear is that the cavalry isn’t coming to the rescue any time soon.  This means that organizations with significant preservation pains should endeavor to better utilize the rules that are on the books and deploy enabling technologies where possible.

A Judicial Perspective: Q&A With Former United States Magistrate Judge Ronald J. Hedges Regarding Possible Discovery Related Rule Changes

Friday, September 9th, 2011

If you have been following my previous posts regarding possible amendments to the Federal Rules of Civil Procedure (Rules), then you know I promised a special interview with former United States Magistrate Judge Ron Hedges.  The timing of the discussion is perfect considering that a “mini-conference” is being hosted by a Federal Rules Discovery Subcommittee today (September 9th) in Dallas, TX.  The debate will focus on whether or not the Rules should be amended to address evidence preservation and sanctions.  I am attending the mini-conference and will summarize my observations as part of my next post.  In the meantime, please enjoy reading the dialogue below for a glimpse into Judge Hedges’ perspective regarding possible Rule amendments.

Nelson: You were recently quoted in a Law Technology News (LTN) article written by Evan Koblentz as saying, “I don’t see a need to amend the rules” because these rules haven’t been around long enough to see what happens.  Isn’t almost five years long enough?

Judge Hedges: No.  For the simple reason that both attorneys and judges continue to need education on the 2006 amendments and, more particularly, they need to understand the technologies that create and store electronic information.  The amendments establish a framework within which attorneys and judges make daily decisions on discovery.  I have not seen any objective evidence that the framework is somehow failing and needs further amendment.

Nelson: You also said the “big problem” is that people don’t talk enough.  What did you mean?  Hasn’t the Sedona Cooperation Proclamation made a difference?

Judge Hedges: The centerpiece of the 2006 amendments (at least in my view) is Rule 26(f).  I think it is fair to say that the legal community’s response to 26(f) has been, to say the least, varied. Civil actions with large volumes of ESI that may be discoverable under Rule 26(b)(1) cry out for extensive 26(f) meet-and-confer discussions that may take a number of meetings and require the presence of party representatives from, for example, IT.  There is an element of trust required between adversary counsel (with the concurrence of the parties they represent) that may be difficult to establish – but some cooperation is necessary to make 26(f) work.  Overlay that reality with our adversary system and the duty of attorneys to zealously advocate on behalf of their clients and you can understand why cooperation isn’t always a top priority for some attorneys.

However, “transparency” in discussing ESI is essential, along with advocacy and the need to maintain appropriate confidentiality. That’s where the Sedona Conference Proclamation can make a big difference. Has the Proclamation done that? It’s too early to reach a conclusion on that question, but the Proclamation is often cited and, as education progresses in eDiscovery, I am confident that the Proclamation will be recognized as a means to realize the just, speedy, and inexpensive resolution of litigation, as articulated under Rule 1.

Nelson: You also mentioned that the Federal Rules Advisory Committee might be running afoul of the Rules Enabling Act.  Can you explain?

Judge Hedges: There is a distinction between “procedural” and “substantive” rules.  The Rules Enabling Act governs the adoption of the former.  Rule 502 of the Federal Rules of Evidence is an example of a substantive rule that was proposed by the Judicial Conference.  However, since Rule 502 is a rule dealing with substantive privilege and waiver issues, it had to be enacted into law through an Act of Congress.  I am concerned that proposals to further amend the Federal Rules of Civil Procedure may cross the line from procedural to substantive.  I am not prepared to suggest at this time, however, that anything I have seen has crossed the line.  Stay tuned.

Nelson: If you had to select one of the three options currently being considered (see page 264), which option would you select and why?

Judge Hedges: To start, I would not choose option 1, which presumes that the Rules can reach pre-litigation conduct consistent with the Rules Enabling Act.  My concern here is also that, in the area of electronic information, a too-specific rule risks “overnight” obsolescence, just as the Electronic Communications Privacy Act, enacted in 1986, is considered by a number of commentators to be, at best, obsolescent.  Note also that I did not use the word “stored” when I mentioned electronic information, as courts have already required that so-called ephemeral information be preserved.  Nor would I choose option 2.  Absent seeing more than the brief description of the category on page 264, it seems to me that option 2 is likely to do nothing more than be a restatement of the existing law on when the duty to preserve is “triggered.”

So, by default, I am forced to choose option 3.  I presume a rule would say something like, “sanctions may not be imposed on a party for loss of ESI (or “EI”) if that party acted reasonably in making preservation decisions.”  There are a number of problems here. First, in a jurisdiction which allows the imposition of at least some sanction for negligence, all the rule would likely do is be interpreted to foreclose “serious” sanctions. Isn’t that correct? Or is the rule intended to supersede existing variances in the law of sanctions?  At that point, does the rule become “substantive”?   Second, how will “reasonableness” be defined?  Reasonableness supposes the existence of a duty – in this case, a duty to preserve.  For example, is there a duty to preserve ephemeral data that a party knows is relevant?  We come back full circle to where we began.

Remember, Rule 37(f) (now 37(e)) was intended to provide some level of protection against the imposition of sanctions, just as the categories are intended to.  Right?  And five years later 37(e) remains defined variously to be a “safe harbor” or a “lighthouse” by some lawyers such as Jonathan Redgrave or an “uncharted minefield” by others like me.

Nelson: What about heightened pleading standards after the Iqbal and Twombly decisions?  Do these decisions have any relevance to electronic discovery and the topic at hand?

Judge Hedges: Let me begin by saying that I am no fan of Twombly or Iqbal. The decisions, however well intended, have led to undue cost and delay all too often. Not only is motion to dismiss practice costly for parties, but it imposes great burdens on the United States Courts and, as often as not, leads to at least one other round of motion practice as plaintiffs are given leave to re-plead. All the while, parties have preservation obligations to fulfill and, in the hope of saving expense, discovery is often stayed until a motion is “finally” decided. I would like to see objective evidence of the delay and cost of this motion practice (and I expect that the Administrative Office of the United States Courts has statistical evidence already). I would also like to see objective evidence from defendants distinguishing between the cost of motion practice and later discovery costs.

Putting all that aside, and if I had to accept one option, I would choose to allow some discovery that is integrated to the motion practice.  First, even without the filing of a responsive pleading, there should be a 26(f) meet-and-confer to discuss, if nothing else, the nature and scope of preservation and the possibility of securing a Rule 502(d) order. Second, while I have serious concerns about “pre-answer discovery” for a number of reasons, I would have the parties make 26(a)(1) disclosures while a motion to dismiss is pending or leave to re-plead has been granted in order to address the likely “asymmetry of information” between a plaintiff and a moving defendant.  Once the disclosures are made, I would allow the plaintiff to secure some information identified in the disclosures to allow re-pleading and perhaps obviate the need for continued motion practice.

All of this would, of course, require active judicial management.  And one would hope that Congress, which seems so interested in conserving resources, would recognize the vital role of the United States Courts in securing justice for everyone and give adequate funding to the Courts.

Bit by Bit: Building a Better eDiscovery Collection Solution

Friday, July 29th, 2011

Is there a place in eDiscovery today for hard drive imaging and bit-by-bit copies, which collect deleted items or slack/unused hard disk space? The answer is yes, with some important limitations. For the vast majority of matters, ESI can be collected without imaging drives or utilizing proprietary container files. However, I occasionally still encounter folks who are victims of the dated and costly misconception that eDiscovery always requires the bit-level imaging of hard drives.

There are situations, though, where the existence of data (as opposed to its content) is central to the matter – when companies suspect employees of stealing proprietary information or when employees leave a company under suspicious circumstances.  In these and other similar situations, it may make sense to have the employee’s workstation hard drive imaged for full forensic analysis.  Even in these scenarios, I find that companies are more likely to hire an external investigator to perform this task to allay suspicions of tampering or bias, and the company generally would prefer that this investigator be the one to testify about this sensitive data acquisition.  Then, for ESI beyond the target employee’s hard drive, other collection methods may be used.  As we’re now midway through 2011 – a year in which I expect to see eDiscovery fully embraced by many corporations as a true business process – I wanted to analyze why the forensic disk image myth still exists, where it came from, and what the law really requires of an eDiscovery collections process.

Traditionally, cases that mentioned full forensic imaging of hard drives began their captions with United States v. or State v. because they were criminal matters.  In traditional civil litigation – even the behemoth eDiscovery cases that get all the bloggers blogging – forensic imaging simply is not required or needed.  In fact, in most cases, it will dramatically increase the cost associated with electronic discovery – this process adds unnecessary complexity in downstream phases of eDiscovery and leads to vast over-collection.  Why collect the Microsoft Office suite 50 times when what you are really required to preserve and collect are the files created with those programs?  When using disk imaging, program files are collected, which drives up storage costs and requires the post-collection step of deNISTing (removing system files based on the NIST list).  Why not leave those system files behind and perform a targeted collection of only user-created content?  In addition, the primary rules governing civil litigation – the Federal Rules of Civil Procedure and Federal Rules of Evidence – simply do not require exact duplication of electronic files.  I am amazed that there are so many experts who are still pushing full forensic imaging and duplication in every case.  In fact, this goes against best practices published by The Sedona Conference and EDRM and set out in the E-Discovery textbook co-authored by Judge Shira A. Scheindlin.
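To make the deNISTing step concrete, here is a minimal sketch in Python.  It assumes you already hold a set of known system-file hashes (for example, exported from the NIST list) as lowercase SHA-1 hex strings; the function names `file_sha1` and `denist` are hypothetical and do not come from any particular product.

```python
import hashlib


def file_sha1(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def denist(paths, known_hashes):
    """Keep only files whose hash is NOT in the known system-file set.

    `known_hashes` is an assumed input: an iterable of hex digests
    taken from a reference list of known program and system files.
    """
    known = {h.lower() for h in known_hashes}
    return [p for p in paths if file_sha1(p) not in known]
```

A targeted collection avoids most of this work in the first place, because the program files never enter the collection.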

In comment 8c of the Sedona Principles, the authors call making forensic image backups of computers “the first step of an expensive, complex, and difficult process of data analysis that can divert litigation into side issues and satellite disputes involving the interpretation of potentially ambiguous forensic evidence.”  The comment goes on to say that “it should not be required unless exceptional circumstances warrant the extraordinary cost and burden.”  In a whitepaper authored for EDRM by three eDiscovery experts from KPMG, LLC, the authors discussed the high cost of forensic bit-level imaging and, instead, suggested that targeted collection of ESI would be sufficient in the vast majority of non-criminal matters.  They state, “[t]he challenge of Smart EDM [Evidence and Discovery Management] is to obtain targeted files in a forensically sound manner – chain-of-custody established, proven provenance, and metadata intact – without having to resort to drive imaging.”

In Electronic Discovery and Digital Evidence: Cases and Materials, written by Judge Shira A. Scheindlin, Daniel J. Capra, and The Sedona Conference, the authors state that,

“because imaging software is commonly available, and because the vast majority of training programs in the field of electronic discovery revolve around forensics, there is a growing tendency to want to ‘image everything.’  But unless an argument can be made that the matter at hand will benefit from a forensic collection and additional examination, there is no reason to do a forensic collection just because the technology exists to do it.”

So, with the top experts in the field saying the days of “image everything” should be over, why does it still happen?  Why are the victims of this antiquated workflow still paying the exorbitant costs of a solution that does not really meet their requirements?  Perhaps a historical perspective will be helpful in explaining.

Why Drive Imaging and Proprietary Containers?

I do not think there is any debate on the benefit of having a bit-level image of a hard drive in a criminal investigation.  However, traditionally, the investigators using these methods needed a way to get the imaged drive safely back to a lab for further analysis.  Companies or law enforcement agencies that hired third-party investigators to image drives had to transport the data, maintain chain of custody, and preserve all contents in an unalterable state through several phases of the investigation.  And, in criminal matters, it was especially important to maintain the integrity of the evidence when the electronic evidence was central to the government’s case.  Remember, the burden of proof in a criminal matter is “beyond a reasonable doubt” (along with a host of constitutional considerations).  Alteration of key evidence could certainly create reasonable doubt and hose the prosecution’s case (or, worse, the evidence gets tossed by the Court before the trial even begins).  The container file ensures that no matter who handles the evidence, checksums can prove that the contents were not altered since the initial imaging.

Many vendors now offer logical image containers as an alternative to doing a full bit-level image of the drive.  However, in corporate eDiscovery, this is still overkill because the tools and solutions being used downstream still have to unpack or parse these proprietary container formats for processing and analysis.  In fact, even software from the vendors who created these container formats must “crack them open” to get to the contents within.  This seems to add a layer of complexity that has not been needed since the days of the external examiner coming in with her forensic toolkit to do drive images. The format was created to solve a very specific problem, and little thought was given to the use of this format in a holistic process like what is typically seen in civil eDiscovery.   There is no longer a need for a container for portability of evidence because it is most likely going to be processed in place after collection while residing on a secure evidence store on the company’s network.  I have heard “what if our collections methods are challenged?”  And to that, I would respond that we are not in criminal court and that the requirement in civil court is reasonableness, not perfection.  Now, if an employee is suspected of wrongdoing and the potential deletion of files will dramatically alter the case, then by all means, hire a forensic investigator and follow all of the protocols established over the last several decades in computer forensic science.

Fast forward to the 21st century

Corporations are bringing eDiscovery in-house; they are building a business process around it to minimize risk and drive enormous cost savings, and in today’s world of civil litigation, there simply is not a need for these drive images or proprietary containers.  First of all, the burden of proof in a civil matter is “by a preponderance of the evidence.”  What this means is that the burden is satisfied if there is a greater than 50% chance that a proposition is true.  This is a much lower standard than in criminal cases.  But, burden of proof goes more to the weight evidence is given by the court or jury.  Before that is even considered, evidence must pass several hurdles of admissibility.  As we will explore, these standards of admissibility have also been the recipients of significant bolstering from vendors over the years.

The Path to Admissibility

There are several hurdles to admissibility for any type of evidence, and because they are not within the scope of this post, I will forgo any discussion of relevance, FRE 403, or the hearsay rules.  I will focus on the issues that tend to be associated with electronic evidence: authentication and the “best evidence rule”.  There are some examiners and perhaps even vendors that would argue electronic evidence is simply not admissible if not collected using bit-level imaging (and sometimes 2 copies – one that is referred to by examiners as the “best evidence” copy and another “working copy” to be analyzed).  This is simply not true.  What we will find is that the collection method will go more to the weight of the evidence rather than the minimum showing needed for admissibility (hence, the discussion of burden of proof above).

All evidence must be authenticated pursuant to FRE 901.  This is a “don’t pass Go” threshold requirement for admissibility.  FRE 901 is satisfied by “evidence sufficient to support a finding that the matter in question is what its proponent claims.”  Notwithstanding a “self-authenticating” piece of evidence pursuant to FRE 902, the proponent must establish the identity of the exhibit by stipulation, circumstantial evidence, or the testimony of a witness with knowledge of its identity and authorship.  Typically, objections to this process would tend to go toward whether the exhibit is an original, was altered, or the witness with whom the proponent is attempting to authenticate the exhibit is not able to do so based on lack of personal knowledge or some other defect.  Mostly these objections deal with the authenticity of the contents of the exhibit, and the rules in Article X of the FRE are helpful here.  Rule 1001 defines an “original” with respect to data stored in a computer or similar device as “any printout or other output readable by sight, shown to reflect the data accurately.”  This is a far cry from a bit-by-bit forensic image!  Rule 1002 – often referred to as the “Best Evidence Rule” – requires that “[t]o prove the content of a writing, recording, or photograph, the original writing, recording, or photograph is required, except as otherwise provided in these rules or by Act of Congress.”  Not only do these rules not require exact duplication of the electronic files, but they do not require imaging the entire 80GB hard drive to collect the 100MB of files that are potentially relevant to the case.  What they do require, though, is the ability to show that a document being proffered is the same document that was originally created.  In Re Vee Vinhnee, 336 B.R. 437, 444 (B.A.P. 9th Cir. 2005). Also, Judge Grimm sets out an extremely comprehensive analysis of what is required for the admissibility of electronic evidence in civil litigation in Lorraine v. Markel American Insurance Company, 241 F.R.D. 534 (D.Md. May 4, 2007).  In Lorraine, he notes that In Re Vee Vinhnee may set out the most demanding test for admissibility of ESI.

Maintaining Forensic Integrity

So, how do I combat the claims that “they must have altered that document” or “Your Honor, I swear that line about ‘acceptable losses’ was not in the safety memo when I created it”?  This is where a hash value becomes a wonderful thing.  Computing the hash of an electronic file, or computing a hexadecimal checksum based on analysis of the contents of an electronic document, is essentially like recording the DNA of an electronic file.  If the file is altered, its hash value would be different.  So, by computing the hash value at the source, in transit, and at the destination, I can ensure that the electronic file is in exactly the same state as it was at the source (or, that the collected document is the same as the document originally created).  Now, add the ability to report on that information and those container files and full forensic disk images really do become extreme overkill.
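For readers who want to see what that looks like in practice, the following is a minimal sketch of hashing a file at the source and again at the destination.  The choice of SHA-256 and the helper names `sha256_of` and `collect_file` are my own assumptions for illustration; any collection tool will have its own equivalent.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_file(source, destination):
    """Copy one file and confirm the copy hashes the same as the source.

    Returns the hash so it can be recorded in a collection report.
    Raises ValueError if the copy differs from the source.
    """
    source_hash = sha256_of(source)
    # copy2 also attempts to preserve file metadata such as timestamps
    copied_path = shutil.copy2(source, destination)
    destination_hash = sha256_of(copied_path)
    if source_hash != destination_hash:
        raise ValueError(f"hash mismatch collecting {source}")
    return source_hash
```

Changing even a single character in the file produces a different digest, which is what makes the before-and-after comparison meaningful.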

The important distinction here is that the term “forensic” does not refer to a type of technology or the products of a specific vendor – despite claims and propaganda to the contrary.  Forensic refers to the methodology used by the person collecting the evidence – whether it is fingerprints from a weapon or electronic files from an employee’s laptop.  Forensic imaging, however, refers to the process by which an entire hard disk is copied bit by bit to create an exact duplicate of that hard drive in a forensic manner.  It is entirely possible for a collection of ESI to be “forensically sound” by simply employing the technique described above of taking hash values at each stage of the process to be able to prove that the files were not altered during collection.  As long as chain of custody is also maintained (much easier to do now that we are not using multiple tools, vendors, locations, and people to do the job), then the process should meet the threshold admissibility requirements of the Federal Rules of Evidence.
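One way to picture hash values and chain of custody working together is a simple log that records who handled a file at each stage and what its hash was at that moment.  This is only an illustrative sketch: the `CustodyEvent` record, its fields, and the stage names are hypothetical and are not drawn from any rule or product.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CustodyEvent:
    stage: str        # e.g. "source", "transit", "destination"
    handler: str      # person or system that handled the file
    sha256: str       # hash of the file contents at this stage
    recorded_at: str  # UTC timestamp in ISO 8601 format


def record_event(log, stage, handler, content):
    """Append a custody entry for the given file contents (bytes)."""
    event = CustodyEvent(
        stage=stage,
        handler=handler,
        sha256=hashlib.sha256(content).hexdigest(),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
    log.append(event)
    return event


def unaltered(log):
    """True if the log is non-empty and every stage saw the same hash."""
    return bool(log) and len({event.sha256 for event in log}) == 1
```

A report generated from a log like this is the kind of showing that answers a claim that the file changed somewhere between the custodian's laptop and production.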

Opponents will still bring up claims that the evidence must have been altered, or the expert familiar only with forensic imaging technologies will try to use the argument that only vendor X’s technology is “court vetted,” so any other method is not acceptable.  But, to these opponents, I would argue two points:

  1. No technology is “court vetted”.  The operator’s use of the technology in the specific case (in a specific jurisdiction) was acceptable to the court to meet the threshold showings required by FRE 901, 1001, and 1002 – as well as any rules of procedure governing the production of discovery in either a civil or criminal matter.  Wow – that would be a very long footnote on a marketing slide…probably why it is not usually mentioned.
  2. The process is forensically sound, and you can prove that the documents were not altered from collection through production by referencing the hash value and maintaining copies of the original native files analyzed on a secured preservation store.  This would exceed the requirements of FRE 901, 1001, and 1002 – but would provide protection against claims going to the “weight” of the evidence by opponents who would cry foul.

What Now?

So, where does all of this leave us?  First, in the vast majority of civil litigation matters where electronic discovery is being performed, forensic bit by bit imaging of computer hard drives is simply not required.  Vendors have promoted this practice over the years, but all this has done is over-complicate the eDiscovery process for many unsuspecting litigants and dramatically increase costs because the model simply does not scale.  Moreover, the effort and cost required to deal with these full drive images downstream in the process are often overlooked by these vendors and overzealous consultants.  Next, we now know there is a better way – targeted, forensically sound collection of ESI using streamlined and automated solutions that maintain custodian relationship – even for shared data sources – throughout the eDiscovery lifecycle, preventing form of production disputes and other calamities that have plagued this industry for the last decade.  There is a better way to collect ESI that will provide exponential cost savings all the way to production.